#gpu-kernel — TECH Dashboard

NEW · paper · research · 5h ago

arxiv-cs-lg

RaMP: Runtime-Aware Megakernel Polymorphism for Mixture-of-Experts

AI summary: RaMP proposes a polymorphic approach to megakernels for Mixture-of-Experts (MoE) inference, switching the kernel implementation at runtime according to workload characteristics. This mitigates the inefficiencies caused by dynamic expert routing and improves MoE execution performance on GPUs.

#arxiv #paper #mixture-of-experts #gpu-kernel

arxiv.org →
