Tag: Synthetic Data (4 articles)

Data for Agents

NVIDIA experts argue that open data and synthetic data are key to building reliable AI agents: open data for explainability, synthetic data for scaling without exposing secrets.

Hugging Face Blog · Jul 9, 2026

Task-Seeded Synthetic Q&A Generation for Nemotron Pretraining

NVIDIA introduces a task-seeded synthetic data generation pipeline that achieves double-digit benchmark improvements in Nemotron-3 Nano pretraining, signaling a new paradigm for synthetic data usage.

Hugging Face Blog · Jun 4, 2026

How to Ground a Korean AI Agent in Real Demographics with Synthetic Personas

NVIDIA, in collaboration with Korean institutions, released a dataset of 6 million synthetic personas to ground AI agents in authentic Korean demographics and cultural context, moving beyond simple Western defaults.

Hugging Face Blog · Apr 21, 2026

Building a Fast Multilingual OCR Model with Synthetic Data

NVIDIA trained the Nemotron OCR v2 model on 12 million synthetic images. Across six languages, the model reaches a normalized edit distance (NED) as low as 0.035 and a throughput of 34.7 pages/second on a single A100 GPU, demonstrating that synthetic data is a key solution to the multilingual data bottleneck in OCR.

Hugging Face Blog · Apr 18, 2026