TODO | Pluto's blog

From System 1 to System 2: A Survey of Reasoning Large Language Models

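Paper overview. Title: From System 1 to System 2: A Survey of Reasoning Large Language Models. Core theme: the cognitive evolution of AI from fast intuition to deliberate reasoning. Key insight: reasoning LLMs mark a major shift from System 1 to System 2 thinking ...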

Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

Under construction. Paper translation: https://dppemvhuzp.feishu.cn/docx/Rp4YdgRXAohJBaxWqL7cO9FPnJf?from=from_copylink ...

Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

Paper overview. Title: Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought. Institutions: SynthLabs.ai, Stanford University, UC Berkeley. Core innovation: the Meta Chain-of-Thought (Meta-CoT) framework, a major leap from CoT to deep reasoning ...

Cube-LLM

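https://news.cafa.edu.cn/MobileNews/independenWeixinContent?contentId=225334751 Abstract: Multi-modal large language models (MLLMs) have shown incredible capabilities in a variety of 2D vision and language tasks. We extend MLLMs' perceptual capabilities to ground and reason about images in 3-dimensional space. To that end, we first develop a large-scale 2D and 3D pre-training dataset called LV3D by combining multiple existing 2D and 3D recognition datasets under a common task formulation, namely multi-turn question answering. Next, we introduce a new MLLM named Cube-LLM and pre-train it on LV3D. We show that pure data scaling yields strong 3D perception capability without any 3D-specific architectural design or training objective. Cube-LLM exhibits intriguing properties similar to LLMs: (1) Cube-LLM can apply chain-of-thought prompting to improve 3D understanding from 2D context information. (2) Cube-LLM can follow complex and diverse instructions and adapt to versatile input and output formats. (3) Cube-LLM can be visually prompted, for example with a 2D box or a set of candidate 3D boxes from specialists. Our experiments on outdoor benchmarks show that Cube-LLM significantly outperforms existing baselines, by 21.3 points of APBEV on the Talk2Car dataset for 3D grounded reasoning and by 17.7 points on the DriveLM dataset for complex reasoning about driving scenarios. Cube-LLM also shows competitive results on general MLLM benchmarks such as refCOCO for 2D grounding, with an average score of (87.0), as well as on visual question answering benchmarks for complex reasoning such as VQAv2, GQA, SQA, and POPE. Our project is available at https://janghyuncho.github.io/Cube-LLM. ...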

3D-LLM: Injecting the 3D World into Large Language Models

https://blog.51cto.com/u_16282361/7841645 Advantages of 3D-LLM: By taking the 3D representations of scenes as input, LLMs are blessed with twofold advantages: (1) long-term memories about the entire scene can be stored in the holistic 3D representations, instead of episodic partial-view observations. (2) 3D properties such as affordances and spatial relationships can be reasoned from 3D representations, far beyond the scope of language-based or 2D image-based LLMs. ...