Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search

Paper overview · Title: Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search · Institution: AI Lab · Base models: Llama-3.1-8B-Instruct, DeepSeek-Math-7B-Instruct · Paper: https://arxiv.org/abs/2501.01478 ...

February 20, 2025 · 6 min · 2612 words · ZhaoYang

s1: Simple test-time scaling

Affiliation: Stanford · Code: https://github.com/simplescaling/s1 · Base model: Qwen2.5-32B-Instruct · Paper: https://arxiv.org/abs/2501.19393 ...

February 16, 2025 · 6 min · 2985 words · ZhaoYang

Sky-T1: Train your own O1 preview model within $450

Original blog post: https://novasky-ai.github.io/posts/sky-t1/ · Code: https://github.com/NovaSky-AI/SkyThought ...

February 16, 2025 · 1 min · 235 words · ZhaoYang

LIMO: Less Is More for Reasoning

Affiliation: SJTU · Code: https://github.com/GAIR-NLP/LIMO · Base model: Qwen2.5-32B-Instruct · Paper: https://arxiv.org/pdf/2502.03387 ...

February 15, 2025 · 6 min · 2705 words · ZhaoYang