Health Discovery


Health Discovery is a text mining and machine learning platform for analyzing large amounts of patient data. With Health Discovery, medical documents can be analyzed and searched for diagnoses, symptoms, prescriptions, special findings, and other criteria. Heterogeneous patient data in both structured and unstructured forms can be harmonized and analyzed by text mining, and can be accessed and searched via a unified interface.

Health Discovery enables meaningful predictions regarding diagnoses and the course of therapy. Patient cohorts can be assembled with just a few mouse clicks, whether for feasibility studies and patient recruitment in clinical trials, for diagnostic support in rare diseases, or to support medical coding specialists in medical service billing.

Overview Health Discovery


The text mining pipeline from Averbis contains a variety of tools for analyzing unstructured patient data. We analyze physicians’ reports, pathology and radiology reports, and many other data sources. From these we extract diagnoses, prescriptions, laboratory values, localizations, personal characteristics, and many other types of information. Text mining takes the specific characteristics of medical language into account: negations (“rule out”) and statements on diagnostic certainty (“suspicion of”) are recognized and assigned to the corresponding entities.
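As an illustration of how negation and certainty cues can be attached to an extracted entity, here is a minimal sketch of a trigger-phrase approach. It is not the Averbis pipeline: the trigger lists, the `Finding` class, the `annotate` function, and the 30-character window are all assumptions made for this example.

```python
import re
from dataclasses import dataclass

# Hypothetical trigger phrases; a real system would use a larger, curated lexicon.
NEGATION_TRIGGERS = ("rule out", "no evidence of", "without")
UNCERTAINTY_TRIGGERS = ("suspicion of", "suspected", "possible")


@dataclass
class Finding:
    term: str
    negated: bool
    uncertain: bool


def annotate(sentence: str, terms: list[str], window: int = 30) -> list[Finding]:
    """Flag each term found in the sentence using triggers in the text just before it."""
    lowered = sentence.lower()
    findings = []
    for term in terms:
        match = re.search(re.escape(term.lower()), lowered)
        if match is None:
            continue
        # Only inspect a short span of text preceding the term.
        context = lowered[max(0, match.start() - window):match.start()]
        findings.append(Finding(
            term=term,
            negated=any(t in context for t in NEGATION_TRIGGERS),
            uncertain=any(t in context for t in UNCERTAINTY_TRIGGERS),
        ))
    return findings
```

Calling `annotate("CT to rule out pulmonary embolism.", ["pulmonary embolism"])` marks the term as negated, while the same term in a plain statement carries neither flag.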

TEXT MINING from physician reports

A variety of information about a patient’s hospital stay can be extracted from physicians’ reports. Among other things, we extract the following information.


Diagnoses

  • ICD-10 Code
  • Degree of certainty


Prescriptions

  • Active substance
  • Trade name
  • Strength
  • Form
  • Dosing

Laboratory Values

  • Parameters
  • Value
  • Unit

Time Aspects

  • Hospital Duration
  • Creation Date of Letter

TEXT MINING from pathology reports

Pathology reports are often recorded in an unstructured manner by means of speech recognition. To encode the information from pathology reports, we extract the following information:

  • Tumor
  • Diagnosis (ICD-10)
  • Localization (ICD-O)
  • Histology (ICD-O)
  • TNM
  • Grading
  • Lymph Node Status
  • Sentinel Lymph Node
  • Receptor Status
  • Voting
  • Site

Anonymization of findings

Identifying and removing personal characteristics also requires text mining methods. Once these features are removed, the documents can be shared beyond hospital borders, for example for research purposes. The features we identify are based on the “Safe Harbor” identifiers.

  • First and Last Name
  • Age
  • Biometric Data (e.g., height, weight)
  • Date
  • Addresses
  • Contact Data (e.g., telephone, mail)
  • Facilities (e.g., hospital)
  • Others (e.g., IDs, religion)
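To make the idea of removing such identifiers concrete, the following is a minimal sketch that masks three of the categories above with placeholders. It is an assumption-laden illustration rather than the method used in Health Discovery: the `PATTERNS` table and the `deidentify` function are hypothetical, and names, addresses, and facilities generally need dictionaries or trained models rather than regular expressions.

```python
import re

# Hypothetical patterns for three identifier types only.
# Insertion order matters: dates are masked before phone numbers are searched.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}[./]\d{1,2}[./]\d{2,4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\+?\d[\d /-]{6,}\d"),
}


def deidentify(text: str) -> str:
    """Replace every match of each pattern with a placeholder such as [DATE]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Replacing matches with typed placeholders instead of deleting them keeps the sentence readable and shows later readers what kind of information was removed.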

Semantic search

Semantic search allows large amounts of patient data to be searched. With just a few mouse clicks, patient data can be filtered using medical criteria such as illness, medication, or abnormal laboratory values, or by other patient characteristics such as age and sex. The search takes linguistic variations such as synonyms and hierarchies into account, as well as negated statements and information on diagnostic certainty (“suspected of”). As a result, searches are highly accurate while the hit list remains complete.
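The role of synonyms and hierarchies in such a search can be sketched with a small query-expansion example. Everything here is assumed for illustration: the `SYNONYMS` and `NARROWER` tables stand in for a real terminology, and `expand` and `search` are hypothetical helpers, not part of any Averbis API. The sketch also ignores negation and certainty, which a real search would have to respect.

```python
import re

# Hypothetical mini-terminology: concept -> synonyms, and concept -> narrower concepts.
SYNONYMS = {
    "myocardial infarction": ["heart attack"],
    "stemi": ["st-elevation myocardial infarction"],
}
NARROWER = {
    "myocardial infarction": ["stemi"],
}


def expand(concept: str) -> set[str]:
    """Collect the concept, its synonyms, and the same for all narrower concepts."""
    terms = {concept, *SYNONYMS.get(concept, [])}
    for child in NARROWER.get(concept, []):
        terms |= expand(child)
    return terms


def search(concept: str, documents: dict[str, str]) -> list[str]:
    """Return sorted IDs of documents that mention any expanded term as a whole word."""
    patterns = [re.compile(rf"\b{re.escape(term)}\b") for term in expand(concept)]
    return sorted(
        doc_id for doc_id, text in documents.items()
        if any(p.search(text.lower()) for p in patterns)
    )
```

With this expansion, a query for "myocardial infarction" also finds documents that only say "heart attack" or "STEMI", which is how completeness of the hit list is kept without the user listing every variant.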

Machine Learning

Machine learning makes meaningful predictions based on patient data. In the field of rare diseases, we make predictions for diagnosis, and for eye diseases, we predict therapeutic outcomes. In the context of DRG coding, we propose billing-relevant diagnosis and procedure codes, and in cardiology, we compare guidelines with patient data and identify patients who need a pacemaker. These and many other use cases can be realized using Machine Learning from Information Discovery.


The terminology management system from Averbis enables maintenance and extension of standard and application-specific terminologies. We give you access to a variety of medical terminologies such as ICD-10, SNOMED CT, and LOINC. The table below provides an overview of terminologies that we work with or have already worked with. We enrich these terminologies with a variety of synonyms so that we can achieve a high degree of completeness when analyzing patient data.

Terminology      Category
ATC              Medications
FMA              Anatomy
ICD-10           Indications
ICD-O            Oncology
LOINC            Laboratory Values
MedDRA           Misc
MeSH             Misc
NCI-Thesaurus    Misc
OPS              Therapies
RadLex           Radiology
RxNorm           Medications
Uberon           Anatomy
UCUM             Units


Health Discovery can be fully customized to meet unique business needs, offering a complete set of open and standard APIs for building rich analytic apps and embedding text analytics into existing solutions. Its modular architecture allows seamless integration into existing infrastructures and third-party applications.

More than 1000 Document Formats

Health Discovery supports more than 1000 different document formats, providing you with full flexibility.


In this video, we’ll show you how to configure Health Discovery as a text mining and machine learning platform in a few minutes to analyze large volumes of medical documents for diagnoses, symptoms, prescriptions, special findings, and more.
