Sketch-and-Verify: Structured Inference-Time Scaling via Program Sketching (a structured inference-time scaling method based on program sketches)
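- The paper proposes Sketch-and-Verify, a method for structuring the inference-time scaling of LLMs.
- It generates the skeleton of a solution as a program sketch and has a verifier check the filled-in holes, so that compute is allocated efficiently.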
English summary
- The paper proposes Sketch-and-Verify, a structured inference-time scaling method in which program sketches outline a solution, a search fills the holes, and a verifier checks each filled candidate, aiming to allocate compute more efficiently than naive sampling.
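Inference-time scaling, which spends additional compute at inference time, is drawing attention as a way to improve the performance of large language models (LLMs). The paper, Sketch-and-Verify, proposes a method that controls this spending through the structure of a program sketch rather than through unstructured mass sampling.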
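The central idea is to bring the concept of a program sketch, long known in software synthesis, into LLM inference. A sketch is an incomplete program in which the outline of a solution (control structure and function skeletons) is given in advance and the undetermined parts are left as holes. The method adopts a two-stage design: the LLM first generates a sketch, after which candidate hole fillings are searched and a verifier checks the validity of each one. This is expected to narrow the sampling space structurally while concentrating the compute budget where it is most useful.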
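As background, chain-of-thought-style scaling that unrolls a long reasoning process, as shown by OpenAI's o1-series models and DeepSeek-R1, is powerful, but its cost tends to grow linearly or worse. In response, approaches that combine search and verification have been studied, including Tree-of-Thoughts, Self-Consistency, Best-of-N, and process reward models. Sketch-and-Verify belongs to this lineage and may suit code generation and formally verifiable tasks particularly well. Connections can also be seen to prior work that bridges symbolic structure and neural models, such as the Sketch synthesis tool from Armando Solar-Lezama's program-synthesis work and Program-Aided Language Models (PAL).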
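Note that the URL given (arXiv 2605 series) appears to carry a future date, so the consistency of the metadata deserves caution. Practical challenges also appear to remain, such as the generality of the method, the difficulty of verifier design, and the tuning of sketch granularity. Even so, how to structure inference-time compute is a central theme for future LLM research, and the proposal has reference value as one form of it.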
Inference-time scaling, the practice of spending extra compute at decoding time to lift model accuracy, has become one of the most active research fronts in large language models. This paper, Sketch-and-Verify, argues that the dominant approaches of brute-force sampling and long chain-of-thought are wasteful, and proposes instead to structure the extra compute through program sketches.
The core idea borrows from classical program synthesis. A sketch is a partial program in which the high-level control flow and the overall shape of the solution are fixed by the author, while specific expressions are left as holes to be filled by a synthesizer. The authors adapt this pattern to LLM reasoning: the model first produces a sketch that captures the skeleton of a candidate solution, a search procedure then generates concrete completions for the holes, and a verifier checks each filled-in candidate. By committing early to structural decisions, the search space at scaling time becomes a tree of well-formed variants rather than a flat pool of independent samples.
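The sketch, fill, and verify loop described above can be illustrated with a toy example. What follows is a minimal sketch under stated assumptions, not the paper's implementation: the skeleton `make_candidate`, the hole candidate tables, and the input/output examples used as a verifier are all hypothetical stand-ins for what an LLM and a real verifier would provide.

```python
from itertools import product

# Hypothetical sketch: the control flow (a left-to-right fold) is fixed,
# and two holes remain open: the initial value and the combining step.
# In the setting described above an LLM would emit this skeleton.
def make_candidate(init, combine):
    def solve(xs):
        acc = init
        for x in xs:
            acc = combine(acc, x)
        return acc
    return solve

# Hypothetical hole candidates (stand-ins for model proposals),
# keyed by a readable source string.
HOLE_INIT = {"0": 0, "1": 1}
HOLE_COMBINE = {
    "acc + x": lambda acc, x: acc + x,
    "acc * x": lambda acc, x: acc * x,
    "max(acc, x)": lambda acc, x: max(acc, x),
}

# Assumed verifier: input/output examples for "product of a list".
EXAMPLES = [([2, 3, 4], 24), ([5], 5), ([], 1)]

def verify(solve):
    """Accept a filled-in candidate only if it matches every example."""
    return all(solve(xs) == expected for xs, expected in EXAMPLES)

def sketch_and_verify():
    """Enumerate every hole filling and keep the ones the verifier accepts."""
    accepted = []
    for (init_src, init), (comb_src, comb) in product(
        HOLE_INIT.items(), HOLE_COMBINE.items()
    ):
        if verify(make_candidate(init, comb)):
            accepted.append((init_src, comb_src))
    return accepted

print(sketch_and_verify())  # [('1', 'acc * x')]
```

Every candidate explored here is a well-formed variant of the same skeleton, which is the sense in which the search space becomes a tree of variants rather than a pool of unrelated samples.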
This design echoes a broader trend toward hybrid neuro-symbolic reasoning. Methods such as Tree-of-Thoughts, Self-Consistency, Best-of-N sampling, and process reward models all try to convert raw compute into better answers by adding some form of branching and scoring. Program-Aided Language Models and tool-augmented reasoning take a different route, offloading subproblems to interpreters. Sketch-and-Verify sits between these lines: it keeps generation inside the model but constrains it with an explicit structural prior, which is likely to be most effective on tasks where correctness can be mechanically checked, such as code synthesis, mathematical derivations, or constraint satisfaction.
There are several reasons to find the approach attractive. First, sketches act as a form of compute budgeting; expensive search effort is directed only at the parts of the solution that the model is genuinely uncertain about. Second, verifier feedback becomes more informative because failures localize to specific holes rather than to entire chains. Third, the framework is compatible with existing scaling tricks, so it could plausibly be stacked with reward models or self-refinement loops.
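One way to read the first two points is in terms of verifier calls. The example below is a hedged illustration and assumes something the summary does not confirm: that each hole can be checked against its own local specification. The sketch, the candidate tables, and the specs (`FILTER_SPEC`, `MAP_SPEC`) are hypothetical. Under that assumption, checking holes separately costs a number of calls that adds across holes, while checking whole candidates costs a number that multiplies.

```python
from itertools import product

# Hypothetical two-hole sketch for "keep the even numbers, then square them":
#   [ MAP(x) for x in xs if FILTER(x) ]
FILTERS = {
    "x > 0": lambda x: x > 0,
    "x % 2 == 0": lambda x: x % 2 == 0,
    "x % 2 == 1": lambda x: x % 2 == 1,
}
MAPS = {
    "x + x": lambda x: x + x,
    "x * x": lambda x: x * x,
    "x ** 3": lambda x: x ** 3,
}

# Assumed per-hole specifications: (argument, expected result) pairs.
FILTER_SPEC = [(1, False), (2, True), (3, False), (4, True)]
MAP_SPEC = [(2, 4), (3, 9), (4, 16)]

def whole_candidate_search():
    """Verify every complete filling end to end; a failure names no hole."""
    calls, accepted = 0, []
    for (f_src, f), (m_src, m) in product(FILTERS.items(), MAPS.items()):
        calls += 1
        if [m(x) for x in [1, 2, 3, 4] if f(x)] == [4, 16]:
            accepted.append((f_src, m_src))
    return accepted, calls

def per_hole_search():
    """Verify each hole against its own spec; failures localize to that hole."""
    calls = 0

    def fill(candidates, spec):
        nonlocal calls
        passing = []
        for src, fn in candidates.items():
            calls += 1
            if all(fn(arg) == want for arg, want in spec):
                passing.append(src)
        return passing

    filters = fill(FILTERS, FILTER_SPEC)
    maps = fill(MAPS, MAP_SPEC)
    return list(product(filters, maps)), calls

print(whole_candidate_search())  # ([('x % 2 == 0', 'x * x')], 9)
print(per_hole_search())         # ([('x % 2 == 0', 'x * x')], 6)
```

Both searches find the same filling, but the per-hole version needs 3 + 3 checks instead of 3 × 3, and each rejected candidate is attributed to a specific hole.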
The paper also raises questions that remain open. The quality of the initial sketch is critical: a wrong skeleton cannot be repaired by hole-filling, so the method may inherit the brittleness of one-shot planning. Designing verifiers that are both sound and cheap is nontrivial outside formally checkable domains. The right granularity of holes will also likely need tuning per task: if holes are too coarse, the search explodes, and if they are too fine, the structural benefit fades. Readers should also note that the cited arXiv identifier appears unusual, so bibliographic details may warrant verification.
Even so, the broader message is in line with where the field seems to be heading. As frontier labs from OpenAI to DeepSeek invest in reasoning-time compute, methods that impose explicit structure on that compute, rather than simply spending more of it, are likely to play a growing role. Sketch-and-Verify is a concrete proposal in that direction and a useful reference point for anyone designing inference-time systems.
The body text and summary on this page were automatically generated by AI. Please check the original article (arxiv.org) for accuracy.