This post has explored a hot topic: do LLMs have reasoning capabilities, or at least some form of reasoning?
The research presented here offers a different view, arguing that LLMs are essentially sophisticated pattern-matching machines. In summary, these studies point out that:
- LLMs are trained on enormous numbers of tokens, so there is a real risk of data contamination for the major benchmark datasets. Even if a model has never seen a given math problem, it has likely encountered many similar examples.
- Thanks to their vast knowledge base and innate pattern-recognition abilities (courtesy of the attention mechanism and in-context learning [19]), they can solve most problems.
- Their fragility to problem variations, token bias, and noise strongly suggests that LLMs are not capable of formal reasoning. Recent results show that even with advanced prompting techniques, models remain vulnerable to noisy, irrelevant (and potentially misleading) information; see the sketch after this list.
- These models can match patterns, but they do not appear to understand any of the mathematical concepts that solving a problem actually rests on.
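To make the fragility tests concrete, here is a minimal Python sketch in the spirit of GSM-Symbolic [18]. It is illustrative only: the template, names, and noise clause are invented for this example, and the sketch merely generates perturbed variants with their ground-truth answers, leaving the actual model call to the reader.

```python
import random

# Minimal sketch of a GSM-Symbolic-style robustness probe (illustrative only):
# turn one math word problem into many variants by swapping names and numbers,
# optionally injecting an irrelevant clause. A genuine reasoner should solve
# every variant; a pattern matcher often fails once the surface form drifts.

TEMPLATE = (
    "{name} picks {a} apples on Monday and {b} apples on Tuesday. "
    "{noise}How many apples does {name} have in total?"
)
NAMES = ["Sophie", "Liam", "Mei", "Omar"]  # hypothetical names
NOISE = "Five of Tuesday's apples are slightly smaller than average. "

def make_variants(n: int, with_noise: bool = False, seed: int = 0):
    """Generate n (prompt, ground-truth answer) pairs for the model under test."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n):
        a, b = rng.randint(2, 50), rng.randint(2, 50)
        prompt = TEMPLATE.format(
            name=rng.choice(NAMES), a=a, b=b,
            noise=NOISE if with_noise else "",
        )
        variants.append((prompt, a + b))  # the noise clause never changes a + b
    return variants

# Each prompt would be sent to the model; the accuracy spread across variants,
# and the clean-vs-noisy gap, is the fragility signal discussed above.
for prompt, answer in make_variants(3, with_noise=True):
    print(answer, "<-", prompt)
```

Under this setup, a drop in accuracy between the original phrasing and its variants (or between clean and noisy variants) is exactly the pattern-matching signature the studies above describe.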
These findings do not negate the usefulness of LLMs; rather, they challenge the claim that LLMs can reason. The results suggest that an LLM can be seen as a machine with extraordinary memory that nevertheless cannot reason (or, one might say, the most sophisticated "stochastic parrot" built to date). This is not to belittle the remarkable technology behind them, which remains a testament to human ingenuity. Further research will likely be needed to understand the capabilities of LLMs more deeply and to develop new model architectures that can actually reason.
References
- Jiang, 2024, A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners, https://arxiv.org/abs/2406.11050
- Shi, 2023, Large Language Models Can Be Easily Distracted by Irrelevant Context, https://proceedings.mlr.press/v202/shi23a.html
- Schaeffer, 2023, Are Emergent Abilities of Large Language Models a Mirage?, https://arxiv.org/pdf/2304.15004
- Wei, 2022, Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, https://arxiv.org/abs/2201.11903
- Sprague, 2024, To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning, https://arxiv.org/abs/2409.12183
- Valmeekam, 2023, PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change
- Kambhampati, 2024, Can Large Language Models Reason and Plan?, https://arxiv.org/abs/2403.04121
- Razeghi, 2022, Impact of Pretraining Term Frequencies on Few-Shot Reasoning, https://arxiv.org/abs/2202.07206
- Mirzadeh, 2024, GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models, https://arxiv.org/abs/2410.05229
- Valmeekam, 2024, LLMs Still Can’t Plan; Can LRMs? A Preliminary Evaluation of OpenAI’s o1 on PlanBench, https://www.arxiv.org/abs/2409.13373
- Lu, 2022, Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity, https://aclanthology.org/2022.acl-long.556/
- Zhao, 2021, Calibrate Before Use: Improving Few-shot Performance of Language Models, https://proceedings.mlr.press/v139/zhao21c.html
- Rogers, 2024, Position: Key Claims in LLM Research Have a Long Tail of Footnotes, https://openreview.net/forum?id=M2cwkGleRL
Thanks for reading!
I hope you enjoyed it and learned something new from this blog!
About the author
Salvatore Raieli
Senior data scientist | about science, machine learning, and AI. Top writer in Artificial Intelligence
Links in this article
[1]https://github.com/SalvatoreRa/tutorial/blob/main/artificial%20intelligence/FAQ.md#large-language-models:~:text=Large%20Language%20Models,-What%20is%20a
[2]https://en.wikipedia.org/wiki/Natural_language_processing
[3]https://openai.com/index/introducing-openai-o1-preview/
[4]https://aibusiness.com/nlp/chatgpt-update-claims-reasoning-capabilities-industry-reacts
[5]https://gluebenchmark.com/
[6]https://super.gluebenchmark.com/
[7]https://deepgram.com/learn/hellaswag-llm-benchmark-guide
[8]https://paperswithcode.com/area/reasoning
[9]https://arxiv.org/pdf/2406.11050
[10]https://www.promptingguide.ai/techniques
[11]https://ngsf.in/2021/09/19/intelligence-as-an-emergent-property-in-biological-systems/
[12]https://github.com/SalvatoreRa/tutorial/blob/main/artificial%20intelligence/FAQ.md#large-language-models:~:text=What%20does%20it%20mean%20emergent%20properties%3F%20what%20it%20is%20the%20scaling%20law%3F
[13]https://arxiv.org/pdf/2409.12183
[14]https://openai.com/index/learning-to-reason-with-llms/
[15]https://www.lakera.ai/blog/what-is-in-context-learning
[16]https://www.technologyreview.com/2023/08/30/1078670/large-language-models-arent-people-lets-stop-testing-them-like-they-were/
[17]https://paperswithcode.com/dataset/gsm8k
[18]https://machinelearning.apple.com/research/gsm-symbolic
[19]http://ai.stanford.edu/blog/understanding-incontext/
Original article:
https://towardsdatascience.com/the-savant-syndrome-is-pattern-recognition-equivalent-to-intelligence-242aab928152