
Bridging the Stability-Expressivity Gap: Synthetic Data Scaling and Preference Alignment for Low-Resource Spoken Language Models

arXiv cs.CL · arxiv.org · 2026/05/29 13:00


AI 3-line summary

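A study that addresses the trade-off between stability and expressiveness in low-resource spoken language models by scaling synthetic data and applying preference alignment.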

English summary

arXiv:2605.27383v1 Announce Type: new Abstract: Spoken Language Models (SLMs) have emerged as a promising paradigm for speech synthesis by bypassing explicit grapheme-to-phoneme pipelines.
However, th

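Spoken Language Models (SLMs) are drawing attention as a new paradigm for speech synthesis that does not require a grapheme-to-phoneme pipeline. In low-resource settings, however, achieving both stable generation and expressiveness remains an open challenge.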

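To close this "stability-expressivity gap", the paper proposes an approach that combines synthetic data scaling with preference alignment. For the detailed method and experimental results, see the original paper (arXiv:2605.27383).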

Spoken Language Models (SLMs) have emerged as an appealing alternative for speech synthesis, sidestepping explicit grapheme-to-phoneme pipelines. Despite their promise, low-resource settings expose a fundamental tension between generation stability and expressive naturalness, a gap that has limited practical deployment.

This paper proposes addressing that gap through two complementary strategies: scaling synthetic training data and applying preference alignment methods. The combination aims to make SLMs both reliable and expressive even when real supervised data is scarce. Specific benchmarks, datasets, and quantitative results are not available from the abstract snippet alone; readers should consult the full paper at arXiv:2605.27383 for experimental details and conclusions.
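
The abstract snippet does not say which preference alignment method the paper uses. Purely as an illustration, and assuming a Direct Preference Optimization (DPO) style objective, which is one widely used approach and may differ from the paper's, the loss for a single preferred/rejected output pair could be sketched as follows. The function name, argument names, and the default `beta` are hypothetical.

```python
import math


def dpo_pair_loss(policy_logp_chosen, policy_logp_rejected,
                  ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO-style loss for one preference pair (illustrative assumption only).

    Each argument is the total log-probability that the trainable policy or
    the frozen reference model assigns to the preferred ("chosen") or the
    dispreferred ("rejected") output, e.g. a sequence of speech tokens.
    """
    # How much more the policy likes each output than the reference does.
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    logits = beta * (chosen_margin - rejected_margin)

    # Loss is -log(sigmoid(logits)), computed in a numerically stable way.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# The loss falls below log(2) once the policy prefers the chosen output
# more strongly than the reference model does.
baseline = dpo_pair_loss(-10.0, -12.0, -10.0, -12.0)
improved = dpo_pair_loss(-8.0, -14.0, -10.0, -12.0)
print(baseline, improved)
```

In a setup like this, the preference pairs could in principle be built from synthetic generations, for example by ranking a stable sample above an unstable one, but how the paper actually constructs its pairs is not stated in the snippet.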

#arxiv #paper #spoken-language-model #speech-synthesis #low-resource #preference-alignment #synthetic-data #tts

Source: arXiv cs.CL (T1)
Source Avg: ★ 2.0
Type: Paper
Importance: ★ Normal (top 93% in Papers / Benchmarks)
Half-life: 🏛️ Long-term (architecture)
Lang: EN
Collected: 2026/05/30 07:00


The body text and summaries on this page are automatically generated by AI. Please check the original article (arxiv.org) for accuracy.
