Tag: Model Architecture (6 articles)

Towards Speed-of-Light Text Generation with Nemotron-Labs Diffusion Language Models

NVIDIA's new diffusion language models generate tokens in parallel and refine them iteratively, potentially breaking the latency limits of traditional autoregressive models and enabling self-correction.

Hugging Face Blog · May 23, 2026

EMO: Pretraining mixture of experts for emergent modularity

AI2 releases EMO, a new MoE model pretrained to enable emergent modularity, allowing users to selectively use just 12.5% of experts for a task while maintaining near full-model performance.

Hugging Face Blog · May 9, 2026

Building a Fast Multilingual OCR Model with Synthetic Data

NVIDIA trained Nemotron OCR v2 on 12 million synthetic images, reaching high accuracy (NED as low as 0.035) and high speed (34.7 pages/second on a single A100 GPU) across six languages. The result suggests that synthetic data can relieve the multilingual data bottleneck in OCR.

Hugging Face Blog · Apr 18, 2026

Holotron-12B - High Throughput Computer Use Agent

Holotron-12B optimizes inference efficiency and long-context handling, targeting high-throughput computer-use agents.

Hugging Face Blog · Mar 17, 2026

Why We Think

Lilian Weng reviews how giving models more time to "think" at inference, through test-time compute and chain-of-thought reasoning, can improve reasoning and decision-making, and what this implies for future model design.

Lilian Weng · May 1, 2025

Which tokens does a hybrid model predict better?

Hybrid models significantly outperform pure Transformers in semantic understanding and dynamic context tracking, but lag in verbatim repetition, revealing a clear architectural division of labor.

Hugging Face Blog ·