In a result that’s turning heads across the AI research world, Z.ai has landed at the top of a major document AI leaderboard with a model a fraction of the size of its rivals.
GLM-OCR, Z.ai’s newly released optical character recognition model, scored 94.62% on OmniDocBench v1.5, outperforming larger systems like PaddleOCR-VL-1.5 and DeepSeek-OCR2. The surprise? GLM-OCR runs on just 0.9 billion parameters, challenging the assumption that bigger models automatically mean better results.
A leaderboard win that breaks the “bigger is better” rule
OmniDocBench is designed to stress-test document AI systems on real-world complexity: dense text, messy layouts, formulas, tables, and multi-field extraction. GLM-OCR didn’t just edge out competitors—it showed consistent strength across all of those categories.
That matters because most enterprise OCR workloads aren’t clean scans. They’re invoices, contracts, research papers, and regulatory documents where tables blur into text and equations break traditional OCR pipelines.
Z.ai says GLM-OCR processes PDFs at 1.86 pages per second, putting it in the same performance class as far larger models while using significantly fewer compute resources.
“Introducing GLM-OCR: SOTA performance, optimized for complex document understanding,” Z.ai (@Zai_org) wrote in its announcement on February 3, 2026. “With only 0.9B parameters, GLM-OCR delivers state-of-the-art results across major document understanding benchmarks, including formula recognition, table recognition, and information extraction.”
Why GLM-OCR punches above its weight
The model is built on Z.ai’s GLM-V architecture and uses a technique called Multi-Token Prediction, which allows it to generate multiple tokens in a single inference step. In practical terms, that means faster processing without sacrificing accuracy—an especially useful tradeoff for document-heavy workflows.
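The speedup from multi-token prediction comes down to simple step-count arithmetic, which the toy sketch below illustrates. This is not Z.ai’s implementation: the stub predictor stands in for a real model forward pass (actual MTP heads are trained jointly with the base model), but the decode-loop accounting is the same.

```python
# Toy illustration of multi-token prediction (MTP) vs. standard
# one-token-at-a-time decoding. The stub "model" below just emits
# k placeholder tokens per call; the point is the step count.

def decode(predict, prompt, n_tokens, tokens_per_step):
    """Run the decode loop, counting forward passes (inference steps)."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < n_tokens:
        out.extend(predict(out, tokens_per_step))
        steps += 1
    return out[len(prompt):len(prompt) + n_tokens], steps

def stub_predict(context, k):
    # Stand-in for a model forward pass: emit k placeholder tokens.
    return [f"tok{len(context) + i}" for i in range(k)]

# Generating 128 tokens one at a time takes 128 forward passes...
_, baseline_steps = decode(stub_predict, ["<doc>"], 128, 1)
# ...while predicting 4 tokens per step needs only 32.
_, mtp_steps = decode(stub_predict, ["<doc>"], 128, 4)
print(baseline_steps, mtp_steps)  # 128 32
```

Since each forward pass dominates inference latency, cutting the step count by the per-step token budget translates almost directly into throughput, which is why the technique suits page-by-page document workloads.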
Instead of aiming to be a general-purpose vision-language model, GLM-OCR is tightly optimized for document understanding. That focus appears to be paying off.
Open-source—and ready to deploy
Unlike many leaderboard-topping models, GLM-OCR isn’t locked behind an API or restrictive license. Z.ai released it under the MIT license on Hugging Face, making it free for commercial use.
It also ships with immediate support for popular inference and deployment stacks like Ollama and vLLM. That lowers friction for teams looking to run OCR locally, on private infrastructure, or at scale.
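As a concrete illustration of that workflow, the sketch below builds an OCR-style request for an OpenAI-compatible endpoint of the kind vLLM exposes (e.g. via `vllm serve`). The model ID, prompt, and URL are placeholders, not values confirmed by Z.ai.

```python
# Sketch: building an OCR request for a locally served vision model
# behind an OpenAI-compatible endpoint (e.g. started with `vllm serve`).
# "GLM-OCR" and the URL below are placeholders, not confirmed values.
import base64

def build_ocr_request(image_path, model="GLM-OCR",
                      prompt="Extract all text from this document."):
    """Build an OpenAI-style chat payload with an inline base64 image."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    }

# Sending the payload requires a running server, so it is left commented:
# import json, urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=json.dumps(build_ocr_request("page.png")).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```

Because the payload follows the widely adopted OpenAI chat format, the same request shape works against any compatible backend, which is part of what makes local deployment low-friction.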
For startups and enterprises alike, that combination—top-tier accuracy, small footprint, and permissive licensing—is hard to ignore.
Part of a broader shift in document AI
GLM-OCR’s release fits into a wider trend emerging in early 2026, particularly from Chinese research teams: compact, task-specific models are gaining ground over massive, general-purpose systems.
Instead of chasing raw parameter counts, these teams are targeting efficiency—models that are “just big enough” to excel at invoices, forms, and academic documents. As costs and energy usage come under greater scrutiny, that approach is resonating.
Why this matters
For businesses processing millions of documents, OCR quality directly impacts automation, compliance, and cost. A smaller model that delivers state-of-the-art accuracy can dramatically reduce infrastructure requirements while improving reliability.
GLM-OCR’s leaderboard win suggests the next phase of document AI may be less about scale—and more about smart specialization.
Conclusion
GLM-OCR shows that in document AI, precision engineering is starting to beat brute force—and that could reshape how OCR systems are built and deployed going forward.