Can GRPO Be 10x More Efficient? Kwai AI’s SRPO Suggests Yes
The success of OpenAI’s o1 series and DeepSeek-R1 has demonstrated the power of large-scale reinforcement learning (RL) in eliciting sophisticated reasoning behaviors and enhancing the capabilities of large language models (LLMs). Recent community efforts have focused on mathematical reasoning, but the core training methodologies behind these models often remain unclear. Kwai AI’s SRPO suggests that Group Relative Policy Optimization (GRPO) can be made 10x more efficient, which could lead to significant advances in the field.