#post-training — TECH Dashboard

Entries page 1/1 · 3 total

Mon, Jun 1 · 1 entry

paper research 3w ago ·

arxiv-cs-lg

Representation Collapse in Sequential Post-Training of Large Language Models

Medium priority · paper/research · Papers / Benchmarks · Published Jun 1

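AI summary: A paper analyzing how an LLM's internal representations collapse when multiple post-training stages are applied in sequence. It discusses the mechanism behind this problem, which does not appear with a single instruction-tuning pass, and possible countermeasures.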

EN arXiv:2605.30524v1 · Abstract: Large language models are now adapted through chains of post-training stages rather than through a single instruction-tuning pass. This paper studies wh

#arxiv #paper #llm +5

arxiv.org →


Thu, Apr 16 · 1 entry

blog gemini 2mo ago ·

google-developers

MaxText Expands Post-Training Capabilities: Introducing SFT and RL on Single-Host TPUs

Medium priority · technical post · Gemini / Gemma · Published Apr 16

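AI summary: Google has extended MaxText so that post-training with supervised fine-tuning (SFT) and reinforcement learning (RL) can run on single-host TPUs. The integration with Tunix allows open models such as Gemma to be customized efficiently with fewer resources.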

EN Google has extended MaxText with post-training support, enabling supervised fine-tuning (SFT) and reinforcement learning (RL) workflows on single-host TPUs through integration with the Tunix library, making it easier to customize open models like Gemma.

#google #maxtext #tpu +4

developers.googleblog.com →


Tue, Mar 31 · 1 entry

NEW blog local-llm 2mo ago ·

huggingface-blog

TRL v1.0: Post-Training Library Built to Move with the Field

Medium priority · technical post · Local LLM / Open Models · Published Mar 31

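AI summary: Hugging Face has released v1.0 of TRL, its library for LLM post-training. The release unifies major methods such as SFT, DPO, and GRPO, and reaches production-ready maturity with a stabilized API, vLLM integration, multi-node distributed training, and strengthened VLM support.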


#huggingface #open-model #trl +5

huggingface.co →

