How We’re Using AI to Extract Medical Chart Reviews

AI-driven medical chart review is gaining traction across clinical research and real-world evidence generation. Many tools now claim to extract structured data from unstructured documents, promising faster access to insights and reduced manual workload. 

But most solutions stop at surface-level text extraction. The real opportunity lies in moving beyond that, using models that understand clinical context, structure outputs intelligently, and adapt to the complexity of real-world documentation. 

By combining robust OCR with a fine-tuned medical language model, we are building a system that turns scanned medical PDFs into consistent, analysis-ready data. 

Turning PDFs into Structured Data 

Extracting clinical data from unstructured PDFs starts with recognising the formats that show up across real-world documentation. Discharge summaries, lab reports and consultation notes are common examples, each with their own layout and level of consistency. 

Here is how it works: 

  • Step 1: OCR built for real documents 
    Scanned or image-based PDFs are processed using a high-performance OCR engine. Even low-resolution files or inconsistent layouts are converted into readable, searchable text. 

  • Step 2: A transformer model tuned for clinical language 
    The text is processed by a customised large language model that has been fine-tuned to recognise clinical terminology and context. It identifies key data points such as diagnoses, medications, lab values and dates, and organises them into structured JSON.
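As a minimal sketch of how Step 2's output might be normalised downstream (the schema keys, function name and example reply here are illustrative, not our production code):

```python
import json

# Hypothetical target schema for the structured output described above.
EXPECTED_KEYS = ("diagnoses", "medications", "lab_values", "dates")

def structure_chart_output(raw_model_json: str) -> dict:
    """Parse the model's JSON reply and normalise it into a fixed schema.

    Missing keys default to empty lists, so downstream analysis code can
    rely on a consistent shape regardless of how the source document varies.
    """
    data = json.loads(raw_model_json)
    return {key: data.get(key, []) for key in EXPECTED_KEYS}

# Illustrative model reply for a discharge summary:
reply = '{"diagnoses": ["type 2 diabetes"], "medications": ["metformin"], "dates": ["2024-03-01"]}'
record = structure_chart_output(reply)
# record["lab_values"] is [] even though the model omitted that key
```

Enforcing one fixed schema at this stage is what makes the output "analysis-ready": every document, however formatted, lands in the same shape.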

The result is structured clinical data that reduces processing time and supports faster, more reliable decision-making. 

Strong Performance, Consistent Results 

Building a tool that works on paper is easy; building one that works on real-world documents is not. Our goal was to create a system that could extract clinically relevant information from real-world documents with precision and consistency. To get there, we tested multiple model variants, refining each one against a curated dataset of medical PDFs. 

We are now able to offer a solution that reliably extracts the right information from unstructured clinical documents, even when formats vary. On average we observe 95.13 percent precision and an F1 score of 88.02 percent, with some models reaching 100 percent. These levels of performance remain consistent across repeated use. 
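For readers who want to relate these two figures: F1 is the harmonic mean of precision and recall, so if the reported precision and F1 come from the same evaluation, the implied average recall can be solved for directly (roughly 82 percent here). A quick worked check:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def implied_recall(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, given P and F1."""
    return f1 * precision / (2 * precision - f1)

recall = implied_recall(0.9513, 0.8802)  # ~0.819
```

This is only back-of-the-envelope arithmetic on the averages quoted above, not an additional measurement.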

This level of precision means fewer errors, less rework and greater confidence in the data being used to inform critical decisions. It marks a step forward in how structured insights can be generated from real-world clinical content, quickly, accurately and at scale. 

We Are Applying This Elsewhere Too 

Our approach is not limited to chart reviews. We are using similar models to support automated evidence synthesis in our tool, ANCHOR.

In a recent evaluation, automation reduced screening time by over 30 hours. It also improved precision and recall across both abstract and full-text screening phases. That means less time spent reviewing irrelevant content. 

Why It Matters 

Manual chart reviews persist because automation has often fallen short. Existing tools have been difficult to scale or too rigid to deal with real-world documents. 

This system is different. It is accurate, stable and built to handle clinical complexity. It fits into existing workflows and makes structured data available faster and with less effort. 

For anyone tasked with turning information into insight, that makes a real difference. 
 

Want to know what the ideal medical chart review should look like? 
Download our whitepaper for a practical breakdown of the process, the trade-offs to consider, and how to get it right. Download the whitepaper 

Curious how this connects with oncology? 
Watch our recent interview with health economist Sabrina Müller, where she talks through how OncoCase is transforming oncology documentation into valuable, low-burden research data. Watch the interview 
