原博客地址:https://novasky-ai.github.io/posts/sky-t1/

代码:https://github.com/NovaSky-AI/SkyThought

费用: less than $450

组织: the NovaSky team at UC Berkeley

基座模型: Qwen2.5-32B-Instruct

模型训练:

训练了3个epoch,学习率1e-5,batch_size : 96,在八卡H100上使用 DeepSpeed Zero-3 offload 训练了19小时,训练框架为Llama-Factory。

The model is trained with 3 epochs, learning rate 1e-5 and batch size 96. The model training finishes in 19 hours on 8 H100 with DeepSpeed Zero-3 offload (~ $450 according to Lambda Cloud pricing). We use Llama-Factory to perform training.

模型测评