Introducing the Ettin Reranker Family
Hugging Face has released six Ettin reranker models of varying sizes, designed to significantly improve the accuracy of search and RAG systems at low cost through a 'retrieve-then-rerank' two-stage architecture.
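The two-stage idea works like this. A cheap first stage scans the whole corpus and keeps a few candidates. A more expensive reranker then rescores only those candidates to fix their order. The sketch below is a toy, self-contained illustration of that flow. Both scoring functions are assumptions made for illustration. `toy_rerank_score` is a hypothetical stand-in for a neural cross-encoder such as an Ettin model and does not use any real Ettin API.

```python
from typing import Callable, List, Tuple

def tokenize(text: str) -> List[str]:
    """Toy tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

def first_stage_retrieve(query: str, corpus: List[str], k: int) -> List[int]:
    """Stage 1 (cheap, recall-oriented): rank every document by query-term
    overlap and keep the top-k indices. A real system might use BM25 or an
    embedding index here."""
    query_terms = set(tokenize(query))
    overlap = [len(query_terms & set(tokenize(doc))) for doc in corpus]
    # sorted() is stable, so documents with equal overlap keep corpus order.
    ranked = sorted(range(len(corpus)), key=lambda i: -overlap[i])
    return ranked[:k]

def toy_rerank_score(query: str, doc: str) -> float:
    """HYPOTHETICAL stand-in for a cross-encoder reranker (e.g. an Ettin model).
    A real reranker reads query and document together with a neural network;
    this toy only rewards term overlap plus a bonus for the exact query phrase."""
    q_tokens, d_tokens = tokenize(query), tokenize(doc)
    overlap = len(set(q_tokens) & set(d_tokens))
    phrase_bonus = 2.0 if " ".join(q_tokens) in " ".join(d_tokens) else 0.0
    return overlap + phrase_bonus

def retrieve_then_rerank(
    query: str,
    corpus: List[str],
    k: int = 10,
    top_n: int = 3,
    scorer: Callable[[str, str], float] = toy_rerank_score,
) -> List[Tuple[int, float]]:
    """Stage 2 (expensive, precision-oriented): score only the k stage-1
    candidates and return the best top_n as (doc_index, score) pairs."""
    candidates = first_stage_retrieve(query, corpus, k)
    scored = [(i, scorer(query, corpus[i])) for i in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

if __name__ == "__main__":
    corpus = [
        "reranker encoder cross comparison table",
        "a cross encoder reranker scores query and passage jointly",
        "bicycles have two wheels",
        "encoder models",
    ]
    print(retrieve_then_rerank("cross encoder reranker", corpus, k=2, top_n=2))
```

In the first stage, documents 0 and 1 tie on term overlap. The reranker then promotes document 1, which contains the query as an exact phrase, so the call returns [(1, 5.0), (0, 3.0)]. The reranker runs on only k documents rather than the whole corpus, and this is what keeps the second stage affordable.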
IBM releases two multilingual embedding models under the open-source Apache 2.0 license. The compact 97-million-parameter version outperforms all models of similar size across a range of benchmarks, demonstrating the potential of 'small but mighty' models for specific tasks.
IBM's Granite 4.1 series shows that a carefully engineered data pipeline and multi-stage training can let an 8B dense model match or exceed an earlier 32B MoE model, highlighting a paradigm shift in which data quality matters more than parameter count.
NVIDIA releases Nemotron 3 Nano Omni, a hybrid Mamba-Transformer model for long-context multimodal understanding of documents, audio, and video. It leads multiple benchmarks and offers an efficient new option for AI agents that handle complex real-world tasks.
Microsoft releases VibeVoice, an MIT-licensed Whisper-style speech model with built-in speaker diarization, capable of locally transcribing up to one hour of audio on a Mac.
DeepSeek's V4 series delivers near-frontier performance at a fraction of the cost (Pro at $1.74 per million input tokens, Flash at just $0.14 per million), potentially resetting the cost-effectiveness standard for open-weight models.
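To make the quoted prices concrete, here is a small cost calculation. It covers input tokens only, because output-token prices are not given here. The 50-million-token workload is a hypothetical example. The dictionary keys "Pro" and "Flash" are plain labels for this sketch and are not official API model IDs. `Decimal` is used so that currency arithmetic does not pick up binary floating-point rounding.

```python
from decimal import Decimal

# Input-token prices quoted in the text, in USD per million input tokens.
# Keys are illustrative labels only, not API model identifiers.
PRICE_PER_MILLION_INPUT = {
    "Pro": Decimal("1.74"),
    "Flash": Decimal("0.14"),
}

def input_cost_usd(tier: str, input_tokens: int) -> Decimal:
    """Cost of the input side of a workload; output-token pricing is not modeled."""
    return PRICE_PER_MILLION_INPUT[tier] * input_tokens / Decimal(1_000_000)

if __name__ == "__main__":
    workload = 50_000_000  # hypothetical: 50M input tokens across many agent runs
    pro = input_cost_usd("Pro", workload)
    flash = input_cost_usd("Flash", workload)
    print(f"Pro:   ${pro}")
    print(f"Flash: ${flash}")
    print(f"Pro/Flash ratio: {(pro / flash).quantize(Decimal('0.01'))}x")
```

At these rates, 50 million input tokens cost $87.00 on Pro and $7.00 on Flash. Flash is roughly 12.4 times cheaper on input.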
DeepSeek-V4 makes million-token context windows practically usable for long-running AI agents by dramatically cutting inference costs and memory usage through its novel hybrid attention architecture.
Alibaba's Qwen team releases Qwen3.6-27B, a dense 27B-parameter model that outperforms the previous generation's 397B MoE flagship on coding benchmarks, signaling a turning point for efficient, local-first coding models.
NVIDIA trained the Nemotron OCR v2 model on 12 million synthetic images, reaching high accuracy (normalized edit distance, or NED, as low as 0.035; lower is better) and high speed (34.7 pages per second on a single A100 GPU) across six languages, demonstrating that synthetic data is a key solution to the multilingual data bottleneck in OCR.
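The sketch below shows what an NED score means. It computes character-level Levenshtein distance between the OCR output and the reference text, then divides by the length of the longer string. That normalization is a common convention, but it is an assumption here, because the exact NED definition used for Nemotron OCR v2 is not given. Under this convention, 0.035 corresponds to about 7 character edits per 200 characters of text.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance; insert, delete, substitute each cost 1."""
    # Distance is symmetric, so put the shorter string in b to keep rows small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

def normalized_edit_distance(prediction: str, reference: str) -> float:
    """Assumed NED: edit distance / length of the longer string (0.0 = exact)."""
    if not prediction and not reference:
        return 0.0
    return levenshtein(prediction, reference) / max(len(prediction), len(reference))

if __name__ == "__main__":
    reference = "a" * 200
    prediction = "b" * 7 + "a" * 193  # 7 substituted characters
    print(normalized_edit_distance(prediction, reference))
```

Seven substitutions in a 200-character reference give 7 / 200 = 0.035. Dividing by the longer string keeps the score between 0 and 1, even when the OCR output contains extra characters.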
Simon Willison's well-known 'pelican riding a bicycle' benchmark shows a smaller, locally run Alibaba Qwen3.6 model outperforming the much larger, cloud-based Claude Opus 4.7 at creative SVG generation, revealing the surprising potential of open-source models for specific tasks.
LangChain's evaluations show that open models like GLM-5 and MiniMax M2.7 now match closed frontier models on core agent tasks such as file operations and tool use, at a fraction of the cost and with lower latency.