Automating the extraction of biomedical concepts from patient records for a biotech start-up

case study

Sven Van Hoorebeeck

In a sector where an estimated 80% of data lives in unstructured text, one biotech start-up saw an opportunity. From patient records to doctors’ notes, the information was there but impossible to process at scale. With clinical trials on the line, the company needed a way to extract insights in seconds, not hours. That’s when BrightWolves stepped in to build a custom NLP pipeline that could turn complex medical text into structured, actionable data.

Challenge

While NLP held huge promise, building a production-ready solution for medical records came with key challenges. The start-up first had to secure reliable data sources and build robust pipelines for parsing, cleaning, and enriching sensitive data. On top of that, significant effort went into correctly labeling the datasets used to train supervised models. Finally, the system had to learn application-specific vocabulary to avoid misclassification and semantic confusion, which required custom ML models tailored to the medical domain.

In projects like this, it’s not just about applying the latest NLP models; it’s about understanding the domain, designing for scalability, and ensuring the output actually supports critical decisions. That’s where the real value is created.

Approach

Starting from the client’s specifications, we designed a custom NLP pipeline tailored to biomedical text processing.

We began by selecting the language models best suited to the domain, so that the pipeline could handle the complexity of medical records accurately. The work was then structured around the following key actions:

Select domain-specific NLP models, such as BioBERT and SciBERT, and integrate them into the spaCy framework for seamless text processing.
Design custom heuristics based on model outputs to accurately extract relevant biomedical concepts from patient records.
Automate data integration by directly inserting structured outputs into the client’s back-end database, enabling real-time access and searchability.

This modular approach ensured flexibility, scalability, and alignment with the client’s long-term data strategy.
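
To make the second and third actions more concrete, here is a minimal sketch of how heuristics on model outputs and database insertion could fit together. It is not the client's actual implementation. The `Prediction` structure, the confidence threshold, the negation cues, the `concept` table, and the in-memory SQLite database are all assumptions chosen for illustration, and the predictions are hard-coded where a real pipeline would take them from an NER component (for example a spaCy pipeline wrapping a biomedical model).

```python
import sqlite3
from dataclasses import dataclass


# Hypothetical shape of one model prediction (an assumption for this sketch).
@dataclass(frozen=True)
class Prediction:
    text: str     # surface form found in the record
    label: str    # predicted concept type, e.g. "DISEASE" or "DRUG"
    start: int    # character offset of the span in the record
    score: float  # model confidence in [0, 1]


# Illustrative heuristic parameters, not values from the project.
MIN_SCORE = 0.80
NEGATION_CUES = ("no ", "denies ", "without ", "negative for ")
NEGATION_WINDOW = 25  # characters inspected before each span


def is_negated(record: str, pred: Prediction) -> bool:
    """Return True if a negation cue appears shortly before the span."""
    window = record[max(0, pred.start - NEGATION_WINDOW):pred.start].lower()
    # Only look at the current sentence: drop everything up to the last period.
    window = window.rsplit(".", 1)[-1]
    return any(cue in window for cue in NEGATION_CUES)


def extract_concepts(record: str, preds: list) -> list:
    """Keep confident, non-negated predictions as (label, normalized text)."""
    concepts = []
    for pred in preds:
        if pred.score < MIN_SCORE or is_negated(record, pred):
            continue
        concepts.append((pred.label, pred.text.strip().lower()))
    return concepts


def store_concepts(conn: sqlite3.Connection, patient_id: str, concepts: list) -> None:
    """Insert extracted concepts into a hypothetical `concept` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS concept (patient_id TEXT, label TEXT, text TEXT)"
    )
    conn.executemany(
        "INSERT INTO concept VALUES (?, ?, ?)",
        [(patient_id, label, text) for label, text in concepts],
    )
    conn.commit()


# Made-up record and predictions, for demonstration only.
record = (
    "Patient reports type 2 diabetes. Denies chest pain. "
    "Currently on Metformin. Possible anemia."
)
preds = [
    Prediction("type 2 diabetes", "DISEASE", record.index("type 2 diabetes"), 0.97),
    Prediction("chest pain", "SYMPTOM", record.index("chest pain"), 0.93),
    Prediction("Metformin", "DRUG", record.index("Metformin"), 0.95),
    Prediction("anemia", "DISEASE", record.index("anemia"), 0.42),
]

concepts = extract_concepts(record, preds)
conn = sqlite3.connect(":memory:")  # stand-in for the client's back-end database
store_concepts(conn, "patient-001", concepts)
rows = conn.execute("SELECT label, text FROM concept ORDER BY rowid").fetchall()
print(rows)
```

In this sketch the negated symptom ("Denies chest pain") and the low-confidence prediction are filtered out before anything reaches the database, so only the two confident, affirmed concepts are stored. Keeping the filtering rules separate from the storage step is one way to keep such a pipeline modular: either part can be replaced without touching the other.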

Impact

The NLP pipeline enabled the client to build a functional MVP, boosting classification accuracy and improving patient-to-trial matching. This milestone helped them demonstrate the potential of their solution and secure buy-in for further development. It also highlighted the importance of high-quality training data, laying the groundwork for future improvements as the client scales toward a full production rollout.

Summary

A biotech start-up needed to extract insights from unstructured medical text to accelerate clinical trials and decision-making.
BrightWolves built a custom NLP pipeline using domain-specific models like BioBERT, integrating heuristics and automation to process patient records accurately and at scale.
The system was designed to be modular, scalable, and aligned with the client’s long-term data strategy.
The result was a high-performing MVP that improved patient-to-trial matching, secured stakeholder buy-in, and set the foundation for future growth.