原博客地址:https://novasky-ai.github.io/posts/sky-t1/
代码:https://github.com/NovaSky-AI/SkyThought
费用: less than $450
组织: the NovaSky team at UC Berkeley
基座模型: Qwen2.5-32B-Instruct
模型训练:
训练了3个epoch,学习率1e-5,batch_size : 96,在八卡H100上使用 DeepSpeed Zero-3 offload 训练了19小时,训练框架为Llama-Factory。
The model is trained with 3 epochs, learning rate 1e-5 and batch size 96. The model training finishes in 19 hours on 8 H100 with DeepSpeed Zero-3 offload (~ $450 according to Lambda Cloud pricing). We use Llama-Factory to perform training.