DocPuzzle: A Process-Aware Benchmark for Evaluating Realistic Long-Context Reasoning Capabilities
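Paper Overview
Paper title: DocPuzzle: A Process-Aware Benchmark for Evaluating Realistic Long-Context Reasoning Capabilities
Main contribution: Proposes a new benchmark designed specifically to evaluate the long-context reasoning capabilities of large language models
Dataset size: 100 expert-level question-answering problems spanning 5 real-world domains ...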