UTSW’s New AI Tool Hits 99% Accuracy in Medical Data Extraction

Manual data extraction from medical records has long slowed research. Now, UT Southwestern’s new AI tool reports up to 99% accuracy, a result that could change how researchers collect, organize, and analyze clinical data.

UT Southwestern Medical Center has unveiled a groundbreaking AI-driven pipeline that could fundamentally transform how medical researchers handle clinical data. By leveraging a powerful large language model (LLM), this new system rapidly pulls key details from complex, free-text pathology reports with stunning accuracy—up to 99% in tumor classification and 97% in identifying metastasis.

The tool, developed by a multidisciplinary team of data scientists, pathologists, clinicians, and statisticians, has already shown promise in parsing over 2,200 kidney cancer reports. Researchers say the implications reach far beyond just kidney cancer—this model could pave the way for similar breakthroughs in many areas of medicine.

Key Takeaways:

  • Up to 99% accuracy in identifying kidney tumor types from free-text pathology reports
  • Saves hundreds of hours of manual chart review per study
  • Potential for broad application beyond kidney cancer
  • Validated across 3,500+ internal pathology reports
  • Collaborative approach between AI experts and medical professionals ensured precision

The Old Way: Manual Review

Traditionally, creating structured datasets from free-text medical records has been a painstaking process. Researchers often spend weeks—even months—sifting through complex narratives just to pull relevant clinical facts.

“Constructing accurate datasets from medical records is extremely time-consuming,” explained David Hein, M.S., Data Scientist at UT Southwestern. “Our goal was to simplify and accelerate that process using AI.”

The newly developed pipeline uses a large language model to scan narrative pathology reports and identify detailed information like tumor types, size, location, and spread. Not only does it extract the data—it also standardizes it for immediate use in analysis.
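To make "standardizes it for immediate use" concrete, here is a minimal Python sketch of how a model's answer could be turned into a typed record. It is an illustration only, not the UT Southwestern pipeline: the `PathologyRecord` class, its field names, and the JSON shape are all assumptions, and the model response is a hard-coded stand-in rather than a real LLM call.

```python
import json
from dataclasses import dataclass
from typing import Optional


# Hypothetical schema: field names are illustrative, not taken from the UTSW pipeline.
@dataclass
class PathologyRecord:
    tumor_type: str
    size_cm: Optional[float]
    location: Optional[str]
    metastasis: Optional[bool]


def parse_llm_output(raw_json: str) -> PathologyRecord:
    """Turn a JSON string (as a model might be prompted to return) into a typed record."""
    data = json.loads(raw_json)
    size = data.get("size_cm")
    return PathologyRecord(
        # Trim and lowercase so the same label is always spelled the same way.
        tumor_type=str(data.get("tumor_type", "unknown")).strip().lower(),
        # Sizes may arrive as strings; convert them to numbers when present.
        size_cm=float(size) if size is not None else None,
        location=data.get("location"),
        metastasis=data.get("metastasis"),
    )


# Stand-in for a model response; no real model is called here.
example_response = (
    '{"tumor_type": " Clear Cell RCC ", "size_cm": "4.2", '
    '"location": "left kidney", "metastasis": false}'
)
record = parse_llm_output(example_response)
print(record)
```

Keeping every report in one fixed shape like this is what lets the extracted values go straight into a spreadsheet or statistical analysis.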


A Human-Centered AI Breakthrough

One of the toughest challenges in training AI for this task is the sheer variety in how doctors phrase things. As Dr. Payal Kapur, Professor of Pathology and Urology, pointed out:

“Clinicians use hundreds of terms to describe the same findings. It’s not just a yes or no—it’s nuanced, narrative detail.”

To overcome this, the team ran multiple training and testing cycles, refining the AI model with input from medical professionals at every step. The result: a tool that interprets context and captures intricate, human-written medical narratives with accuracy as high as 99%.
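The terminology problem described above can be shown with a small Python sketch. Note that this uses a plain lookup table, a far simpler technique than the language model the team built, and is meant only to show what "many terms, one finding" looks like in data. The `CANONICAL_TERMS` table and the `normalize_term` helper are hypothetical, and the listed variants are a handful of examples, not a clinical vocabulary.

```python
# Illustrative variants only; a real vocabulary would be curated by pathologists.
CANONICAL_TERMS = {
    "clear cell renal cell carcinoma": "clear cell RCC",
    "clear cell rcc": "clear cell RCC",
    "ccrcc": "clear cell RCC",
    "papillary renal cell carcinoma": "papillary RCC",
    "prcc": "papillary RCC",
    "chromophobe renal cell carcinoma": "chromophobe RCC",
}


def normalize_term(text: str) -> str:
    """Map a free-text phrase to a canonical label, or 'unmapped' if unknown."""
    key = " ".join(text.lower().split())  # lowercase and collapse whitespace
    return CANONICAL_TERMS.get(key, "unmapped")


print(normalize_term("Clear  Cell Renal Cell Carcinoma"))
print(normalize_term("ccRCC"))
print(normalize_term("oncocytoma"))
```

A fixed table like this breaks down quickly once phrasing varies beyond the listed entries, which is the gap a context-aware model is meant to close.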

Tested. Trusted. Ready to Scale.

To validate the model, the team tested it on over 3,500 internal kidney cancer pathology reports. The model’s performance remained consistent—thanks in large part to UT Southwestern’s Kidney Cancer Program, which provided curated, high-quality datasets that enabled the AI to learn effectively.

“It’s the collaboration that made this work,” said Dr. James Brugarolas, Director of the Kidney Cancer Program. “When you have data scientists working hand-in-hand with doctors and pathologists, the results speak for themselves.”
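Validation of the kind described in this section, comparing what the model extracted with what human reviewers recorded, comes down to a simple calculation. The Python sketch below assumes one label per report and exact-match scoring; the `accuracy` function and the sample labels are made up for illustration and are not data or code from the study.

```python
def accuracy(predicted: list[str], reference: list[str]) -> float:
    """Fraction of reports where the extracted label matches the manual label."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("no reports to score")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Made-up labels for illustration; these are not data from the study.
model_labels = ["clear cell RCC", "papillary RCC", "chromophobe RCC", "clear cell RCC"]
manual_labels = ["clear cell RCC", "papillary RCC", "clear cell RCC", "clear cell RCC"]
print(f"accuracy = {accuracy(model_labels, manual_labels):.2f}")
```

In this toy case the model agrees with the reviewers on three of four reports, so the score is 0.75. The same calculation over thousands of manually reviewed reports yields figures like the ones quoted in this article.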

More Than Just Kidney Cancer

While this study centered on kidney cancer, the team sees enormous potential for applying this technique across other diseases and medical specialties.

“There’s no one-size-fits-all model,” said Dr. Andrew Jamieson, Assistant Professor and lead AI developer. “But the strategies we used—like iterative training, cross-specialty collaboration, and validation—can help others build their own models.”

The broader implication? AI could become the backbone of modern medical research, dramatically accelerating timelines and improving data integrity.

Why It Matters

In an era where data drives discovery, this kind of automation isn’t just a convenience—it’s a research multiplier. The ability to quickly mine rich clinical data could shorten the timeline between discovery and treatment, potentially saving lives.

And unlike traditional systems, which require massive human labor to build usable datasets, this AI model learns from narrative language—making it especially adaptable across disciplines.

What’s Next?

UT Southwestern researchers plan to refine the system and explore its applications in other tumor types and chronic conditions. With continued investment and support, it could become a core tool in every research hospital’s arsenal.

As the medical field increasingly embraces AI, human-guided models like this—where doctors shape how the machine learns—will likely be the gold standard.
