
Large language models (LMs) can complete abstract reasoning tasks, but they are susceptible to many of the same kinds of mistakes that humans make. Andrew Lampinen, Ishita Dasgupta, and colleagues conducted extensive tests on state-of-the-art LMs and humans, focusing on three distinct types of reasoning tasks: natural language inference, judging the logical validity of syllogisms, and the Wason selection task. Through these experiments, the authors found that LMs exhibit content effects similar to those seen in humans. Both humans and LMs are more likely to mistakenly judge an invalid argument as valid when its semantic content is sensible and believable; for instance, an argument like “all roses are flowers; some flowers fade quickly; therefore, some roses fade quickly” has a believable conclusion but does not actually follow from its premises.
The researchers also found that LMs perform just as poorly as humans on the Wason selection task. In this task, participants are shown four cards, each displaying a letter or a number (e.g., ‘D,’ ‘F,’ ‘3,’ and ‘7’), and are asked which cards they must flip over to check whether a rule such as “if a card has a ‘D’ on one side, then it has a ‘3’ on the other side” holds. Humans often choose to flip cards that provide no useful information about the rule’s validity, while neglecting the card that tests the contrapositive. In this example, humans would likely flip the card labeled ‘3,’ even though the rule does not imply that a card with a ‘3’ has a ‘D’ on the reverse side, and would often ignore the ‘7,’ which could actually reveal a violation. Interestingly, LMs replicate this error pattern and other similar mistakes, showing an overall error rate comparable to that of humans.
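To make the logic of the task concrete, the short Python sketch below (an illustration for this summary, not code from the study; the card sets and function names are assumptions) enumerates which of the four visible cards could possibly falsify the rule “if ‘D’ then ‘3’,” and therefore which are worth flipping.

```python
# Illustrative sketch of the Wason selection task logic.
# Each card has a letter on one side and a number on the other; the rule to test is
# "if a card has 'D' on one side, then it has '3' on the other side".

LETTERS = ["D", "F"]
NUMBERS = ["3", "7"]

def rule_holds(letter: str, number: str) -> bool:
    """Material conditional: 'D' on one side implies '3' on the other."""
    return letter != "D" or number == "3"

def is_informative(visible: str) -> bool:
    """A card is worth flipping only if its hidden side could falsify the rule,
    i.e. the rule's truth depends on what is on the hidden face."""
    if visible in LETTERS:
        outcomes = {rule_holds(visible, n) for n in NUMBERS}
    else:
        outcomes = {rule_holds(l, visible) for l in LETTERS}
    return False in outcomes  # flipping could reveal a violation

for card in ["D", "F", "3", "7"]:
    print(card, "-> flip" if is_informative(card) else "-> uninformative")
# Only 'D' and '7' need to be flipped: 'D' checks the rule directly and '7' checks
# the contrapositive, whereas 'F' and '3' can never falsify the rule.
```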
Moreover, the performance of both humans and LMs on the Wason selection task improves markedly when rules about arbitrary letters and numbers are replaced with rules about socially relevant relationships. For example, when the rule is framed in terms of people’s ages and whether a person is drinking alcohol or soda, participants perform better. This suggests that content and context play a significant role in logical reasoning for both humans and LMs. According to the authors, LMs trained on human data appear to inherit some human-like foibles in reasoning. Therefore, much like humans, LMs may require formal training to improve their logical reasoning capabilities.