GPU Memory Bandwidth Optimization — Tag

DiffusionGemma: The First Diffusion LLM (dLLM) Natively Supported in vLLM

vLLM natively supports a discrete diffusion language model that replaces sequential token-by-token generation with parallel block denoising, spending extra compute to ease memory-bandwidth pressure and significantly reduce latency.

vLLM Blog · Jun 10, 2026

Tag: GPU Memory Bandwidth Optimization (1 article)