Tag: Long-Text Processing (3 articles)

GLM-5.2 is probably the most powerful text-only open weights LLM

Z.ai releases the 753B-parameter open-weights GLM-5.2, which tops key benchmarks while consuming an excessive number of tokens, signaling a new era of brute-force open-source AI.

Simon Willison · Jun 18, 2026

Evaluating Long-Context Question & Answer Systems

A comprehensive guide to evaluating long-context Q&A systems covering metrics, dataset construction, and benchmark reviews across narrative and technical domains.

eugeneyan.com · Apr 5, 2026

Evaluating Long-Context Question & Answer Systems

Long-context Q&A systems face challenges such as information overload and multi-hop reasoning; to improve user experience, evaluation should focus on answer faithfulness and helpfulness.

Eugene Yan · Jun 22, 2025