Large language models (LLMs) have shown impressive performance on various reasoning and problem-solving tasks. However, questions remain about how these reasoning abilities work and where their limits lie.
In a new study, researchers at the University of California, Los Angeles, and Amazon comprehensively examined the capabilities of LLMs at deductive and inductive reasoning. Their findings show that while LLMs can be very good at finding the rules of a task from solved examples, they are limited in following specific instructions. The findings can have important implications for how we use LLMs in applications that require reasoning.
Inductive vs. deductive reasoning
Reasoning can be broadly categorized into two distinct types: deductive and inductive. Deductive reasoning, often described as “top-down” logic, starts with a general principle or rule and applies it to infer specific conclusions. For example, when given the formula for converting Celsius temperature to Fahrenheit, you can use it to calculate new measurements.
Inductive reasoning, on the other hand, takes a “bottom-up” approach. It involves observing specific instances or examples and drawing general conclusions or patterns from them. For example, you can observe several Celsius and Fahrenheit measurements on a thermometer and try to infer the formula that converts one to the other.
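The thermometer example can be made concrete with a small sketch. It is only an illustration of the two directions of reasoning, not anything taken from the study: the function names and the sample readings are assumptions, and the inductive step assumes the unknown rule is linear and fits it with ordinary least squares.

```python
def celsius_to_fahrenheit(c):
    """Deductive direction: apply a known general rule to a specific case."""
    return c * 9 / 5 + 32


def infer_linear_rule(pairs):
    """Inductive direction: recover (slope, intercept) from observed
    (celsius, fahrenheit) readings, assuming the rule is linear."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative readings (assumed, not from the article).
observations = [(0, 32), (100, 212), (37, 98.6), (-40, -40)]
slope, intercept = infer_linear_rule(observations)

print(celsius_to_fahrenheit(25))           # rule applied to a new measurement
print(round(slope, 6), round(intercept, 6))  # rule recovered from examples
```

In the first function the rule is given and only has to be applied; in the second, the rule itself is the output.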
Both types of reasoning are essential for intelligence but involve different cognitive processes. And while LLMs are often evaluated on their reasoning abilities, most research doesn’t make a clear distinction between their inductive and deductive capabilities.
A new framework for testing LLM reasoning
The researchers at Amazon and UCLA designed a series of experiments to evaluate the inductive and deductive reasoning capabilities of LLMs. To ensure a fair and consistent comparison, the experiments used a similar task structure across different contexts, with each context specifically emphasizing either deductive or inductive reasoning.
For instance, in an arithmetic task, the researchers tested the LLMs’ ability to apply a given mathematical function to solve problems (deductive reasoning) and their ability to infer the underlying mathematical function from a set of input-output examples (inductive reasoning).
To further disentangle inductive reasoning from deductive reasoning, the researchers developed SolverLearner, a two-step framework that isolates and evaluates the inductive reasoning process in LLMs.
SolverLearner first prompts the LLM to generate a function that maps input data points to their corresponding output values based solely on a set of input-output examples. This step focuses on the LLM’s ability to learn the underlying pattern or rule from the data.
In the second step, SolverLearner uses an external code interpreter to execute the proposed function on new test data. This separation ensures that the LLM is not involved in applying the function, preventing its deductive reasoning abilities from influencing the evaluation of its inductive reasoning.
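The two steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the researchers' implementation: `propose_function_source` is a hypothetical stand-in for the LLM call and simply returns a hard-coded candidate, the function name `solver` is an assumed convention, and Python's built-in `exec` plays the role of the external code interpreter.

```python
def propose_function_source(examples):
    """Step 1 (hypothetical stub): in the framework, an LLM would read the
    input-output examples and return source code for a mapping function.
    Here a fixed candidate is returned so the sketch runs offline."""
    return "def solver(x):\n    return 2 * x + 1\n"


def run_with_interpreter(source, test_inputs):
    """Step 2: execute the proposed function outside the model.
    Untrusted generated code should be sandboxed in any real setting."""
    namespace = {}
    exec(source, namespace)
    solver = namespace["solver"]
    return [solver(x) for x in test_inputs]


def accuracy(predictions, expected):
    """Fraction of test cases where the executed function matches."""
    correct = sum(p == e for p, e in zip(predictions, expected))
    return correct / len(expected)


# Illustrative data (assumed): the hidden rule is y = 2x + 1.
train_examples = [(1, 3), (2, 5), (3, 7)]
test_inputs = [10, 20]
test_expected = [21, 41]

source = propose_function_source(train_examples)
predictions = run_with_interpreter(source, test_inputs)
score = accuracy(predictions, test_expected)
print(predictions, score)
```

Because the interpreter, not the model, applies the function to the test inputs, any error in the score can be attributed to the rule that was induced rather than to how it was applied.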
“By focusing on inductive reasoning and setting aside LLM-based deductive reasoning, we can isolate and investigate inductive reasoning of LLMs in its pure form via SolverLearner,” the researchers write.
LLMs show contrasting strengths in inductive and deductive reasoning
The researchers used SolverLearner to evaluate the inductive and deductive reasoning capabilities of GPT-3.5 and GPT-4 across various tasks, including syntactic reasoning, arithmetic operations, and spatial reasoning.
The results showed that both LLMs consistently exhibited remarkable inductive reasoning capabilities, achieving near-perfect accuracy on tasks that required them to learn from examples and infer the underlying mapping function.
However, the LLMs struggled when tasked with applying specific rules or instructions, particularly when those instructions involved scenarios not commonly encountered during their training. The gap was most pronounced on “counterfactual” reasoning tasks that differ from conventional cases. For example, the LLMs perform well on deductive reasoning involving base 10 arithmetic but perform very poorly on unconventional numerical bases, such as 11 and 9.
The findings suggest that LLMs might be better at learning by example and discovering patterns in data than at following explicit instructions. This has important implications for the use of LLMs in real-world scenarios. While LLMs might appear on the surface to follow logical instructions impressively, there is a good chance that they are just following patterns they observed during their training, which means their performance will degrade as soon as the examples they see deviate from their training distribution.
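To see why unconventional bases make a useful counterfactual test, consider how the same digit strings add up under different bases. The sketch below is an illustration only: the helper names and the operands are assumptions, not items from the study's benchmark.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base(n, base):
    """Render a non-negative integer as a digit string in the given base."""
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, remainder = divmod(n, base)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


def add_in_base(a, b, base):
    """Add two digit strings that are both written in `base`."""
    return to_base(int(a, base) + int(b, base), base)


# Same operands, three different rules for carrying.
for base in (10, 9, 11):
    print(base, add_in_base("27", "15", base))
```

The procedure is identical in every base; only the point at which a carry occurs changes. A model that has memorized base 10 patterns instead of applying the stated rule would be expected to give the base 10 answer regardless of the base it is told to use.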
On the other hand, SolverLearner provides a framework that ensures the model learns the correct rules that map the inputs to the outputs. However, SolverLearner is only applicable in settings where a verification mechanism such as a code interpreter is available.
This study is a sobering reminder that we still have a lot to learn about the abilities of these black boxes, which are becoming part of a growing number of applications.