Text Mining of Microbiology Findings

For diagnosis confirmation and reimbursement optimization

For clinics and hospitals under cost pressure, complete coding of diagnoses and therapies is essential for billing. Completeness becomes a particular challenge when medical documentation from other areas, such as the laboratory, has to be taken into account. The new version of Health Discovery addresses precisely this challenge.

In addition to clinical notes and surgery and pathology reports, microbiological reports can now also be analyzed. Numerous DRG-relevant secondary diagnoses that were previously often overlooked can be recognized and coded.

View of a microbiology finding in the annotation editor of Health Discovery


Case study microbiology: 10% uncoded ICD secondary diagnoses

The new module has already been successfully tested in a large German hospital. For this purpose, more than 70,000 microbiological reports were analyzed using Health Discovery. The result: more than 12,500 ICD codes were derived from positive microbiological findings. About 10% of these codes had not previously been coded by the hospital’s medical controllers.

Top 10 ICD-10-Codes (German version)


Assessment of CCL relevance

A comparison with the Complication and Comorbidity Level (CCL) showed that of the approximately 12,500 predicted codes, more than 7,500 represent CCL-relevant secondary diagnoses. Of the diagnoses not yet coded, a total of 825 were found to be CCL-relevant.
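The comparison described above can be sketched as a set difference between the codes derived from the findings and the codes already recorded for a case. This is only a minimal illustration: the function name, the sample codes, and the CCL lookup set are assumptions made for this sketch and are not taken from the study or from Health Discovery.

```python
def find_uncoded(predicted_codes, coded_codes, ccl_relevant_codes):
    """Return (missing, missing_and_ccl_relevant) for one hospital case."""
    # Codes derived from the findings that nobody has coded yet.
    missing = sorted(set(predicted_codes) - set(coded_codes))
    # Subset of the missing codes that appears in the CCL lookup.
    relevant = [code for code in missing if code in ccl_relevant_codes]
    return missing, relevant


# Illustrative placeholder data, not taken from the study.
predicted = ["B95.6", "B96.2", "A04.7"]
already_coded = ["B96.2"]
ccl_relevant = {"A04.7"}  # hypothetical CCL lookup, for illustration only

missing, relevant = find_uncoded(predicted, already_coded, ccl_relevant)
print("not yet coded:", missing)
print("of which CCL-relevant:", relevant)
```

In practice the lookup would come from the DRG catalogue in use, and the comparison would run per case rather than over one flat list.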

The new microbiology module can now be used by all Health Discovery customers and, through the partnerships with Cerner, is also available to all hospitals that use Meta-KIS as DRG-optimization software.


Averbis Partner TriNetX is providing Covid-19 researchers with up-to-date patient-level clinical data

Our partner TriNetX, through its global network of 150 healthcare organizations, is providing Covid-19 researchers with up-to-date patient-level clinical data for those diagnosed with the virus to help develop supportive, curative, and preventative therapies for the disease.

Crucial types of information about COVID-19 patients are currently only available in unstructured form, such as doctors’ letters, progress reports, and radiology reports. At present, COVID-19 is very often mentioned in these documents without a diagnosis having been made (e.g. the patient is afraid of having COVID-19, or the patient’s wife has returned from a corona risk area). To improve the detection of positive COVID-19 diagnoses, Averbis and the technical team of TriNetX have jointly optimized the NLP-based diagnosis detection for COVID-19.

It is now possible to incorporate facts from unstructured documents when building a study cohort of COVID-19 patients.
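The difficulty described above, separating an actual diagnosis from a mere mention, can be illustrated with a small rule-based filter. This is a deliberately naive sketch and not the method used by Averbis or TriNetX: the mention pattern, the cue list, and the function name are assumptions chosen for this example.

```python
import re

# Terms that count as a mention of the disease (illustrative, not exhaustive).
MENTION = re.compile(r"\b(covid-19|sars-cov-2|corona)\b", re.IGNORECASE)

# Hypothetical context cues suggesting the mention is not a diagnosis
# of the patient (fear, family members, travel history, exclusion).
NON_DIAGNOSTIC_CUES = (
    "afraid of",
    "fear of",
    "wife",
    "husband",
    "risk area",
    "ruled out",
    "negative for",
)


def is_positive_mention(sentence: str) -> bool:
    """Return True if the sentence mentions the disease without a blocking cue."""
    if not MENTION.search(sentence):
        return False
    lowered = sentence.lower()
    return not any(cue in lowered for cue in NON_DIAGNOSTIC_CUES)


examples = [
    "Patient tested positive for COVID-19.",
    "Patient is afraid of having Covid-19.",
    "The patient's wife has returned from a corona risk area.",
]
for sentence in examples:
    print(is_positive_mention(sentence), "-", sentence)
```

A fixed cue list like this breaks quickly on real clinical text, which is why production systems model negation, experiencer, and certainty explicitly.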

Thank you for delivering so fast!

(Photo by Drew Hays on Unsplash)



mineRARE: Semantic text-mining of electronic medical records as diagnostic decision support tool to search for rare neurologic diseases such as Pompe disease, Fabry disease and Niemann-Pick type C disease


Background and aims: Diagnosis of rare neurogenetic disorders is often challenging, particularly adult-onset presentations, with long diagnostic delays and misdiagnosis. As therapies become available, it is increasingly important to identify patients with rare neurologic diseases.

Methods: This multicenter project on ten rare neurogenetic diseases was approved by local Ethics committees and data protection authorities of six German University medical centers. Semantic text mining software structures medical data by ranking documents according to probability of disease, based on disease-specific lists of weighted signs and symptoms. Software and search algorithms were optimised in a pilot phase. Existing electronic medical records from the Department of Neurology of each center, corresponding to 10 years of activity, were screened, and patients ranked by probability of having the respective disease. An experienced team of physicians reviewed the data for the top ranked patients and those without a confirmed diagnosis were contacted for testing for the respective disease.
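The ranking idea in the methods can be sketched as a weighted term score per patient record. This is a simplified illustration only: the study's software, term lists, and weights are not published here, so the terms, the weights, and the function names below are assumptions, and plain substring matching stands in for semantic text mining.

```python
def score_document(text, weighted_terms):
    """Sum the weights of all listed signs and symptoms found in the text."""
    lowered = text.lower()
    return sum(weight for term, weight in weighted_terms.items() if term in lowered)


def rank_patients(records, weighted_terms):
    """Return (patient_id, score) pairs, highest score first."""
    scored = [(pid, score_document(text, weighted_terms)) for pid, text in records.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical weights for illustration; not the study's actual list.
example_terms = {
    "proximal muscle weakness": 3.0,
    "elevated creatine kinase": 2.0,
    "respiratory insufficiency": 2.5,
}

records = {
    "p1": "Proximal muscle weakness and elevated creatine kinase.",
    "p2": "Headache for two days.",
    "p3": "Admitted with respiratory insufficiency.",
}

ranking = rank_patients(records, example_terms)
print(ranking)
```

The top of such a ranking is what the physicians in the study would then review manually.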

Results: In the pilot phase, 4 patients with Pompe disease and 4 heterozygous NPC1 mutation carriers were identified in Munich. More than 400,000 datasets from four centers were analysed for three diseases: Niemann-Pick type C disease, Pompe disease and Fabry disease. Four novel Pompe patients and 3 heterozygous NPC1 or NPC2 mutation carriers were identified, who had not previously been diagnosed. Data from more centers will be provided.

Conclusion: Electronic medical records-based diagnostic data mining seems to be a promising tool to help diagnose rare neurologic diseases. It may allow effective screening, re-evaluation of patients with uncertain diagnosis, and identification of patients for clinical trials.

This research was supported by research grants from Sanofi Genzyme and Actelion.

Cite this article as:

Catarino C, Grandjean A, Doss S, Mücke M, Tunc S, Schmidt K, Schmidt J, Young P, Bäumer T, Kornblum C, Endres M, Daumke P, Klopstock T, Schoser B. mineRARE: Semantic text-mining of electronic medical records as diagnostic decision support tool to search for rare neurologic diseases such as Pompe disease, Fabry disease and Niemann-Pick type C disease. European Journal of Neurology 07/2017; 24(Suppl 1):75-75.


Information Discovery: New Version 4.11 is available!


Information Discovery is a next-generation text analytics platform that allows you to gain insights into your unstructured data and explore key information in the most flexible way possible. Information Discovery collects and analyzes all kinds of documents, such as patents, research literature, databases, websites, and company-internal repositories.

In recent months, Averbis has published versions 4.9 to 4.11 in quick succession, with a variety of improvements and extensions. Highlights include two new modules, the Terminology Editor for maintaining terminology and the Annotation Editor for visualizing text mining results, as well as horizontal scalability of text mining pipelines for a variety of big data scenarios.

Here are the most important innovations in versions 4.9 to 4.11 at a glance:

  • With the newly developed Terminology Editor, terminology can now be edited directly in Information Discovery and seamlessly integrated in text mining pipelines.
  • Text analysis pipelines can now be distributed horizontally on any number of machines. The distribution can be fully controlled via the graphical administration within Information Discovery.
  • The new Annotation Editor lets you graphically visualize and edit text analysis results.
  • The Evaluations Workbench allows the comparison of different text mining pipelines regarding precision, recall, F1, and standard deviation.
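For readers unfamiliar with the metrics named in the last item, the following sketch shows how precision, recall, and F1 are commonly computed when comparing a pipeline's annotations with a gold standard, and how a standard deviation across documents can be derived. It is not the Evaluations Workbench implementation; the annotation format (start, end, label) and all names are assumptions for this example.

```python
import statistics


def precision_recall_f1(predicted, gold):
    """Compare two collections of annotations by exact match."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations as (start, end, label) tuples.
gold = [(0, 5, "Drug"), (10, 18, "Dose"), (25, 31, "Disease")]
pipeline_a = [(0, 5, "Drug"), (10, 18, "Dose"), (40, 44, "Drug"), (50, 55, "Dose")]

p, r, f1 = precision_recall_f1(pipeline_a, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")

# Standard deviation of F1 across several documents (illustrative values).
f1_per_document = [0.5, 0.75, 1.0]
f1_stdev = statistics.stdev(f1_per_document)
print(f"stdev={f1_stdev:.2f}")
```

Exact-match comparison is the strictest variant; evaluations sometimes also count partially overlapping spans.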

Information Discovery: New Version 4.8.0 available!


Information Discovery is a next-generation text analytics platform that allows you to gain insights into your unstructured data and explore key information in the most flexible way possible. Information Discovery collects and analyzes all kinds of documents, such as patents, research literature, databases, websites, and company-internal repositories.

With the new version, we have revised our Information Discovery platform and significantly improved the interfaces and the configuration options.

The key new features of Version 4.8.0 include:

  • New! Graphic configuration and administration of text analysis pipelines
  • New! Monitoring of throughput numbers at the pipeline level and component level
  • New! RESTful text mining web services via the Swagger framework
  • New! Additional connectors to file systems and databases
  • New! Numerous improvements for scalable and shared applications
  • Various bug fixes

XplOit – Research to automate the Prediction of Disease Progression


BMBF has started a project to support and strengthen research on software-based forecasts, individualized treatment of disease processes, and applications in transplantation medicine

Individualized mathematical/systems medical models for the development of disease processes have the potential to predict future health events and individual therapy outcomes. Such predictive models can assist clinicians in diagnosing and treating their patients, and help patients to better understand their diseases. In order to develop a mathematical prediction model, a large range of varied clinical patient data must be gathered from information systems, curated, and extensively analyzed. Privacy must be ensured when dealing with personal and sensitive patient information. The complex predictive models obtained from analysis of the data must first be checked in elaborate clinical trials in terms of their prediction accuracy before they can be used in practice. Previously, this has been achieved by only a few models.

The BMBF project “XplOit” will facilitate and accelerate the complex process of preparing and combining clinical data, developing and validating models, and making them available for clinical use. This should be possible with a new generation of advanced semantic data integration and information extraction tools, which are integrated with a modeling workbench in an IT platform.


© Fraunhofer IBMT, MEV Verlag

The “XplOit” platform is initially designed for the development and validation of predictive models to improve treatment after stem cell transplantation. The transplantation of hematopoietic stem cells from donors is used, for example, for the treatment of various forms of leukemia. Life-threatening complications can occur, such as viral infections or graft-versus-host reactions, and disease relapses are also common. At present, it is not yet possible to predict with high accuracy in which patients these complications will occur, and life-saving measures are often introduced too late. Using the “XplOit” platform, precise forecast models will be developed individually for each patient to predict possible complications and either prevent them through timely clinical intervention or counteract them at an early stage.

Launched in March 2016 and designed to run for 5 years, the collaborative project “XplOit” is implemented by an experienced international, multidisciplinary team of experts from the fields of medicine, systems biology, computational linguistics, and medical and bioinformatics. It is coordinated by the Fraunhofer Institute for Biomedical Engineering IBMT, which also holds the lead in development of the “XplOit” platform, and contributes core components for extracting information, integration, and analysis. The Institute for Formal Ontology and Medical Information Science at the University of Saarland is mainly responsible for the so-called Semantic Integration Framework platform. The company Averbis develops tools for the extraction of information from clinical text documents. Computer scientists from the Department of Pediatric Oncology and Hematology of the University of Saarland are in charge of data protection and develop anonymizing tools and parts of the modeling workbench system.

The modeling itself is done by the Max Planck Institute for Computer Science and the Department of Clinical Pharmacy of the University of Saarland. Clinical expertise and data are contributed by the Department of Internal Medicine I (oncology, hematology, clinical immunology, rheumatology) and the Institute of Virology at the University of Saarland, as well as by the Department of Bone Marrow Transplantation and the Institute of Virology at the University Hospital Essen. Coordinated by the Institute of Virology at the University of Saarland, the clinical partners will validate the predictive models for stem cell transplantation developed using the “XplOit” platform.

A basic version of the “XplOit” platform is to be available to the university hospitals and model developers in autumn 2018. First prototype predictive models for stem cell transplantation medicine are expected in early 2019. XplOit is funded by the Federal Ministry of Education and Research under the initiative i:DSem (Integrative Data Semantics in Systems Medicine).

Duration: 01.03.2016 – 28.02.2021

Contact person(s):

Dr.-Ing. Gabriele Weiler / Dipl.-Inform. Stephan Kiefer
XplOit project coordinators
Fraunhofer-Institut für Biomedizinische Technik IBMT
Joseph-von-Fraunhofer-Weg 1
66280 Sulzbach

Phone: +49 6894/980-156


Project partners:

Fraunhofer-Institut für Biomedizinische Technik IBMT, St. Ingbert (coordinator)

Universität des Saarlandes

Max-Planck-Institut für Informatik, Saarbrücken

Universitätsklinikum Essen

Averbis GmbH, Freiburg


SEMCARE: New Platform for Information Management in the Healthcare Industry

Information Discovery specially designed for the healthcare sector: that was the goal of the SEMCARE (Semantic Data Platform for Healthcare) research project. Together with European partners from industry, academia, and clinical settings, we acted as coordinator in developing the new data analytics platform Information Discovery for Healthcare through this EU-funded project.

The software developed in SEMCARE supports clinics in diagnosing illnesses and selecting suitable treatments, and makes it easier for them to choose suitable patients for clinical studies. In the SEMCARE project, we created a platform based on our proven information discovery technology to combine the newest text-mining technologies with multilingual semantics. Specific medical language features, terminology, and vocabulary were integrated, making it possible to harmonize and analyze structured and unstructured patient data from a variety of sources. Medical documents can be searched based on diagnoses, symptoms, regulations, and other criteria. This makes it possible to assemble patient groups based on specific characteristics, for instance for clinical trials, with just a few mouse clicks.

Screenshot Averbis Information Discovery

Text mining meets patient data

Information Discovery for Healthcare was developed to pool specific patient data according to defined clinical criteria. The technology makes it possible to find and evaluate individual criteria such as age, sex, diagnosis, indication, symptoms, or laboratory results in documents from various sources. Powerful full-text search capabilities and semantic textual analysis are combined here into a hybrid semantic full-text search. This makes it possible to semantically integrate heterogeneous unstructured and structured data sources for the purpose of identifying information and documents, as well as to complete individual data assessment and representation.

In order to provide basic functionality, the software requires only access to the text documents, independent of their format, whether it be PDF, RTF, TXT, or others. Data export to platforms like i2b2 and tranSMART is supported. Open interfaces also make it possible to integrate the system into existing hospital information systems.

Our Proof of Concept

The platform has already been evaluated in three European pilot locations in London, Rotterdam, and Graz during the SEMCARE project term. Information Discovery for Healthcare was used in these locations in the field of cardiovascular disease. High-risk patients were successfully identified using specific biomarkers and combinations of symptoms. Due to their ischemic heart disease, these patients are at risk of dying from sudden cardiac arrhythmia. Early recognition of this danger can be used to provide timely care with suitable treatments and lower the mortality of this patient group.

Privacy protection in accordance with international standards

Information Discovery for Healthcare was developed in a manner that conforms with statutory regulations on privacy protection. A multi-layer concept of data usage regulates access rights, guaranteeing the highest level of security:

  • Installation exclusively in individual hospitals: no higher-level collection of data from multiple hospitals into centralized data banks
  • Integration into local access control systems: data from Information Discovery may only be viewed if the user has unrestricted access to the electronic patient data system
  • Automated, retrospective analysis of patient data: limited access to data from the user’s own department
  • No clinical data leaves the hospital

One platform – Many uses


Information Discovery for Healthcare is a modern analysis platform that gives users the opportunity to complete comprehensive semantic searches of medical documents. It provides insight into structured and unstructured patient data, and facilitates flexible and comprehensive data analyses and correlations. With its comprehensive functionality, the platform is suitable for use in a wide variety of applications in the healthcare industry.


Information Discovery: New version 4.5.0 available!


Information Discovery is a next-generation text analytics platform that allows you to gain insights into your unstructured data and explore key information in the most flexible way possible. Information Discovery collects and analyzes all kinds of documents, such as patents, research literature, databases, websites, and company-internal repositories.

With the new version, we have revised Information Discovery once more and again taken the opportunity to refine the user interface in detail.

The key new features of Version 4.5.0 include:

  • New! Significantly improved performance of the file system crawler
  • New! Additional configuration options with regard to indexing
  • New! Configurable facets: Facets can now be configured as AND or OR links, rendering search queries easier and quicker to configure. You can now also immediately refresh the next entries dynamically within the facets.
  • Various bug fixes
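The difference between AND-linked and OR-linked facets mentioned in the list above can be illustrated with a small filter function. This is a conceptual sketch, not Information Discovery's implementation or configuration syntax; the document structure, the facet name, and the function name are assumptions for this example.

```python
def filter_by_facet(documents, facet, selected, mode="AND"):
    """Keep documents whose facet values match the selected entries.

    AND: a document must carry every selected value.
    OR:  a document must carry at least one selected value.
    """
    if mode not in ("AND", "OR"):
        raise ValueError("mode must be 'AND' or 'OR'")
    selected = set(selected)
    if not selected:
        return list(documents)
    result = []
    for doc in documents:
        values = set(doc.get(facet, ()))
        matches = selected <= values if mode == "AND" else bool(selected & values)
        if matches:
            result.append(doc)
    return result


# Hypothetical documents with a multi-valued "topic" facet.
docs = [
    {"id": 1, "topic": ["patents", "chemistry"]},
    {"id": 2, "topic": ["patents"]},
    {"id": 3, "topic": ["biology"]},
]

and_hits = [d["id"] for d in filter_by_facet(docs, "topic", ["patents", "chemistry"], "AND")]
or_hits = [d["id"] for d in filter_by_facet(docs, "topic", ["patents", "chemistry"], "OR")]
print(and_hits, or_hits)
```

AND narrows the result set with every additional selection, while OR widens it, which is why the choice is worth exposing per facet.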

Big Data in Healthcare

18%: proportion of studies in Europe that conclude the recruitment process on time
7%: proportion of studies in the United States that conclude the recruitment process on time
8 million USD: cost per day of a delayed drug launch

Healthcare systems worldwide are currently going through major transformations brought on by increasing regulation, record public debt, and shrinking budgets. Traditionally separate and fragmented sectors of the industry such as healthcare providers, payers and drug companies are now looking at ways to work together and coordinate efforts to improve patient safety and healthcare quality while reducing costs.

In these times of reduced income from payers and a decline in R&D productivity, manufacturers seek to develop drugs that combine cost effectiveness and targeting with a high value. The concept of personalized medicine offers the chance of improved healthcare, better patient outcomes, and less harm. Clinical data perform an important function in this scenario, whether through being able to identify who is likely to respond well to a given treatment, or by speeding up and improving drug submissions to regulatory authorities.

Personalized medicine, however, has created various “big data” challenges for medical trials, including how to effectively collect, manage, and examine the increasing amount and velocity of patient data involved. The life sciences industry has seen an enormous upsurge in documented patient data in the last 10 years. This has been propelled by changes that include advances in genome sequencing technologies; the adoption of Electronic Health Records (EHRs) by different healthcare systems; the sharing of clinical-trial data; and the explosion in data from patient registries, social media networks, and medical and non-medical devices (e.g. smartphones and fitness monitors). These changes have given rise to a profusion of data from diverse sources such as genomics, clinical trials, EHRs, and research studies.

The adoption of advanced analytical tools is more necessary than ever to develop insights from these data. Manufacturers are thus positioning themselves to create more targeted therapies and to revolutionize the way that biopharmaceutical drugs are discovered, developed, and marketed. Where there is no access to clinical data, difficulties in identifying and recruiting suitable patients and in finding trial sites are the main causes of trial delays. Delayed trials waste precious resources and curtail access to new drugs.

Half of clinical trials today fail to reach the target sample size needed for the study; just 18% of Europe-based studies and 7% of US-based studies complete enrolment on time. A single day of delay in a drug reaching the market can cost pharmaceutical companies up to 8 million US dollars.

Averbis’ mission is to facilitate collaboration between pharmaceutical and medtech companies and healthcare providers by building tools and services that give real-time access to large patient populations. We reduce inefficiency and unnecessary expenses in clinical studies, enabling pharmaceutical companies to get new therapeutics on the market faster. We improve clinical research by leveraging patient data and providing tools and services for semantic harmonization and better data quality. The network enables collaboration with other member providers, advancing translational research efforts. We allow hospitals to fund their research via pharmaceutical company sponsorships, increase their participation in industry-sponsored clinical studies, and enhance their grant competitiveness. We help prevent unnecessary amendments to clinical studies and aid in identifying clinical trial sites that provide access to a sufficient number of patients meeting the inclusion and exclusion criteria. In this way, unsuccessful candidates can be eliminated early in the clinical trial process.



In mid-2016, a new set of regulations concerning product compliance with European industry standards will be put in place for pharmaceutical and life science organizations. Identification of Medicinal Products (IDMP) is a framework of detailed descriptions of substances, composition and dosage forms, production procedures, and packaging. These IDMP norms require identification of all pharmaceutical products according to certain data standards, laid out by ISO (International Organization for Standardization). ISO has come up with five IDMP standards; these are aimed at accurately identifying medicinal products for human use, with a high degree of certainty.


With Europe being the first region to adopt the Identification of Medicinal Products (IDMP) standards, by July 1st, 2016, time is running out for life science organizations to comply. It is anticipated that the U.S. Food and Drug Administration (FDA) and the Japanese Pharmaceutical and Medical Devices Agency (PMDA) will follow the lead of the European Medicines Agency (EMA). EMA is the first regulatory agency that requires life science organizations to comply with the ISO standards. In the case of noncompliance, organizations will face severe fines. Life science organizations will therefore need to quickly establish a data standardization process that is robust, reliable, and flexible enough to meet these varying regulatory demands.


The main challenges from the perspective of implementing compliance are:

  • An enormously ambitious time schedule. Life science organizations are obliged to adhere to the requirements prior to the July 2016 deadline. But the final EMA implementation guidelines were only made available in late 2015. This leaves organizations with a time slot of less than a year to comply with the new standards.
  • The need to collaborate between business units. The product master data required to satisfy the IDMP standards is present in a wide set of units and systems within life science organizations and their suppliers.
  • A massively unstructured data pool. So far, most data has been submitted to regulators in heterogeneous formats such as PDF, DOC, and TXT files. Details about substances, such as the Summary of Product Characteristics (SmPC), Manufacturing Licenses, Chemistry, Manufacturing and Control (CMC) documents, and others, are spread across a wide set of source systems within the organizations.

Regarding the last point, text mining technologies are crucial for overcoming the complexity and heterogeneity of unstructured data. Text mining refers to the process of deriving high-quality information from text, a process that is capable of quickly extracting relevant information from text sources and structuring heterogeneous data contained in various IDMP relevant sources, such as SmPC documents. Relevant information includes product names; ingredients; excipients; pharmaceutical dosage forms, strengths, and units; undesirable side effects; and much more. This information is mapped to standardized vocabulary as defined in the above-mentioned ISO standards and as shown in the following picture.

Identification of Medicinal Products (IDMP)

Let’s take undesirable effects as an example to show the complexity of the task and the capabilities of advanced text mining solutions. Side effects are usually listed in section 4.8 “Undesirable Effects” of SmPC documents. They are mostly presented in different table formats but can also be found in text passages. Side effects are coded, that is, mapped to the MedDRA vocabulary. The information to be extracted consists of multiple items from the table. For example, each adverse event is usually listed together with its System Organ Class (SOC) and frequency. Text mining solutions must be able to extract such complex information and relations from tables and automatically map them to MedDRA codes.
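The extraction target described above, an adverse event together with its System Organ Class and frequency, mapped to a coded vocabulary, can be sketched as follows. This is an illustration under stated assumptions: the table rows are invented, the lookup is a plain dictionary, and the codes are placeholders rather than real MedDRA codes. It is not the Averbis solution.

```python
def map_adverse_events(rows, term_to_code):
    """Attach a vocabulary code to each (soc, frequency, term) table row.

    Rows whose term is not in the lookup keep code=None so they can be
    reviewed manually instead of being dropped silently.
    """
    mapped = []
    for soc, frequency, term in rows:
        code = term_to_code.get(term.strip().lower())
        mapped.append({"soc": soc, "frequency": frequency, "term": term, "code": code})
    return mapped


# Invented rows in the style of an SmPC section 4.8 table.
table_rows = [
    ("Nervous system disorders", "Common", "Headache"),
    ("Gastrointestinal disorders", "Very common", "Nausea "),
    ("Skin and subcutaneous tissue disorders", "Rare", "Unlisted reaction"),
]

# Placeholder codes for illustration only; these are NOT real MedDRA codes.
lookup = {"headache": "PLACEHOLDER-001", "nausea": "PLACEHOLDER-002"}

coded_events = map_adverse_events(table_rows, lookup)
for event in coded_events:
    print(event)
```

Real SmPC tables vary in layout, so the hard part in practice is recovering the rows reliably before a lookup like this can be applied.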

If a text mining solution is able to fulfill these needs with high precision, then it will save pharmaceutical companies a lot of time and money. It creates a consistent and homogenized set of product master data, permitting significant analysis. This delivers new insights in areas including post-marketing surveillance, competitor analysis, supply chain, and sales/marketing. Expenses associated with managing, integrating, maintaining, and reconciling data across functions and sites are reduced.

Averbis offers solutions that enable organizations to comply with IDMP, within the given timeframe, and to leverage data assets. Please contact us to find out more.
