Training | Peng Tan's AI Blog

Reflect, Retry, Reward: A New Paradigm for Self-Evolving Large Language Models

📅 2025-07-04 ⏱️ 6 min read 📝 2385 characters

#Reflect, Retry, Reward #LLM #training

training

Fine-Tuning

This post covers common fine-tuning challenges and how to overcome them, then walks through fine-tuning DeepSeek-R1 on a consumer-grade GPU with Unsloth.

📅 2025-02-26 ⏱️ 7 min read 📝 2421 characters

#training #finetuning #DeepSeek-R1