#grpo — TECH Dashboard

Entries page 1/1 · 4 total

Mon, Jun 1 · 2 entries

blog · local-llm · 4w ago · zenn-llm

Why Does GRPO Collapse During Long Training Runs? GSPO, Qwen's "Sequence-Level" Answer

Medium priority · technical post · Local LLM / Open Models · Published Jun 1

AI summary: Drawing on primary sources (arXiv 2507.18071 and Qwen's official materials), this post explains why GRPO, a reinforcement-learning method for reasoning models, collapses during long training runs because of variance in its token-level importance ratios, and how GSPO, the sequence-level optimisation method proposed by Qwen, stabilises training.
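A minimal sketch of the contrast this summary describes, not the post's or the paper's code. It assumes the sequence-level ratio is the length-normalised likelihood ratio, i.e. the geometric mean of the per-token ratios; the function names and the log-probabilities are hypothetical, and clipping and advantage estimation are left out.

```python
import math

def token_ratios(new_logps, old_logps):
    """Per-token importance ratios pi_new / pi_old, one value per generated token."""
    return [math.exp(n - o) for n, o in zip(new_logps, old_logps)]

def sequence_ratio(new_logps, old_logps):
    """Single ratio for the whole response: exp of the mean token log-ratio
    (assumed length-normalised form, equal to the geometric mean of token ratios)."""
    diffs = [n - o for n, o in zip(new_logps, old_logps)]
    return math.exp(sum(diffs) / len(diffs))

# Made-up per-token log-probabilities for one sampled response.
old_logps = [-1.0, -2.0, -0.5, -3.0]
new_logps = [-0.8, -2.6, -0.4, -2.7]

tok = token_ratios(new_logps, old_logps)    # varies token by token
seq = sequence_ratio(new_logps, old_logps)  # one averaged value per response

print("token-level ratios:", [round(r, 3) for r in tok])
print("sequence-level ratio:", round(seq, 3))
```

In this toy case the token ratios spread well above and below 1 while the averaged sequence ratio stays close to 1, which is the kind of noise reduction the summary attributes to optimising at the sequence level.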

#llm #open-model #zenn +9

zenn.dev →

paper · research · 4w ago · arxiv-cs-lg

VeriGate: Verifier-Gated Step-Level Supervision for GRPO

Medium priority · paper/research · Papers / Benchmarks · Published Jun 1

AI summary: VeriGate augments GRPO with verifier-gated step-level supervision to offset the coarseness of outcome-only rewards. By assigning fine-grained rewards to each reasoning step, it aims to improve the training efficiency and accuracy of reasoning models.

#arxiv #paper #grpo +8

arxiv.org →

Tue, Mar 31 · 1 entry

🔥 HOT · NEW · blog · local-llm · 3mo ago · huggingface-blog

TRL v1.0: Post-Training Library Built to Move with the Field

High priority · technical post · Local LLM / Open Models · Published Mar 31

AI summary: Hugging Face released v1.0 of TRL, its library for LLM post-training, which brings together major methods such as SFT, DPO, and GRPO. The release stabilizes the API and adds vLLM integration, multi-node distributed training, and stronger VLM support, making it a mature version suited to production use.

#huggingface #open-model #trl +7

huggingface.co →

Tue, Mar 10 · 1 entry

NEW · blog · local-llm · 3mo ago · huggingface-blog

Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries

Medium priority · technical post · Local LLM / Open Models · Published Mar 10

AI summary: Hugging Face compares 16 open-source reinforcement-learning libraries and maps out the challenges of making RL training for LLMs asynchronous and keeping token generation efficient. It distills design patterns that raise throughput by separating training from inference and by supporting off-policy data.

#huggingface #open-model #rlhf +7

huggingface.co →
