Training and Finetuning Multimodal Embedding & Reranker Models with Sentence Transformers
Hugging Face releases a new tutorial demonstrating how fine-tuning multimodal embedding models can yield performance far surpassing that of general-purpose large models in specific domains (like visual document retrieval), with a fine-tuned model even outperforming models that have 4x as many parameters.
Hugging Face Blog · Apr 16, 2026
AI Document Classification: A Practical Guide to Automated Sorting and Tagging
AI document classification automates sorting and tagging by understanding content and context, freeing enterprises from labor-intensive manual classification and serving as a crucial step toward automating document workflows.
LlamaIndex Blog
Building a Financial Document Pipeline with LlamaParse
LlamaParse's 'agentic parsing' capability automatically transforms messy financial PDFs (like pay stubs and brokerage statements) into structured data and enables cross-document analysis, significantly boosting automation in workflows like loan underwriting.
LlamaIndex Blog
Building a Financial Due Diligence Agent with LiteParse
LlamaIndex demonstrates a financial due diligence AI agent built with just 600 lines of code and no vector database, leveraging LiteParse to extract PDF layout information for precise, highlighted source citations in answers.
LlamaIndex Blog
How Agentic AI Improves Document Extraction Accuracy and Automation
The article argues that a 'plan-act-verify' agent loop shifts document processing from mechanical pattern matching to a cognitive task involving spatial awareness and contextual reasoning, overcoming the limitations of traditional OCR.
LlamaIndex Blog
OCR Accuracy Explained: What Impacts Performance and How to Improve It
OCR accuracy is not a single number but a systems engineering problem shaped by image quality, document complexity, the choice of evaluation metrics, and post-processing.
LlamaIndex Blog
OCR for KYC: Why Standard Text Extraction Falls Short of Compliance Requirements
The article reveals the fundamental shortcomings of traditional OCR in financial KYC compliance, highlighting its failure with real-world documents and proposing 'Agentic OCR' as the solution.
LlamaIndex Blog
Why Single-Pass Extraction Fails and What Deep Extraction Actually Solves
Single-pass extraction fails silently on complex documents, while deep extraction uses an iterative, agent-driven verification loop to achieve near-perfect accuracy, making it essential for production workflows.
LlamaIndex Blog