Negation recognition in healthcare

Why negation detection?

The recognition of negations is an important task in medical text mining: during the treatment of patients many examinations are carried out and both the presence and absence of symptoms and conspicuous findings are documented. The following are examples of negated findings.

  • Patient shows no meningism, normal light reaction, no indication of scleral icterus, oral mucosa and pharyngeal ring are irritation-free and inconspicuous, thyroid gland not enlarged, nausea did not occur.

Human language makes this harder: double negations and pseudo-negations look like negations but are not. In the following examples the findings are not negated, even though the same sentence contains clear indicator words (“inconspicuous”, “not”, “excluded”).

  • Apart from a slight skin rash, the physical examination not inconspicuous.
  • Tumor cannot be excluded with certainty.
  • Spleen and liver not palpable due to ascites.

Machine learning forms the basis for improved negation detection

Rule-based approaches recognize these complex patterns with a large set of rules. For this, Averbis uses the text mining rule engine UIMA RUTA, which is developed by Averbis and made available to the community as open source. Many negation patterns can already be recognized reliably using rules.
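To illustrate the idea of rule-based negation detection, here is a minimal sketch in Python. It is not UIMA RUTA syntax and not the Averbis rule set. The trigger lists and the function name `is_negated` are assumptions made for this example, chosen to cover the sentences quoted above.

```python
import re

# Hypothetical pseudo-negation phrases (assumption): they contain a negation
# word, but the finding itself is not negated.
PSEUDO_NEGATIONS = [
    r"cannot be excluded",
    r"not inconspicuous",
    r"not palpable due to",
]

# Hypothetical negation triggers (assumption); a real rule set is far larger.
NEGATION_TRIGGERS = [r"\bno\b", r"\bnot\b", r"\bwithout\b", r"\bexcluded\b"]


def is_negated(sentence: str) -> bool:
    """Return True if a negation trigger remains after masking pseudo-negations."""
    text = sentence.lower()
    # Mask pseudo-negations first so their trigger words are not counted.
    for pattern in PSEUDO_NEGATIONS:
        text = re.sub(pattern, " ", text)
    return any(re.search(trigger, text) for trigger in NEGATION_TRIGGERS)


print(is_negated("Thyroid gland not enlarged."))
print(is_negated("Tumor cannot be excluded with certainty."))
```

Even this toy version shows why rule sets grow quickly: every new pseudo-negation needs its own exception.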

Internal analyses have shown that reliably detecting complex negation patterns makes the rules enormously complex, and maintainability suffers greatly as a result. For this reason, we have added a machine learning approach to this procedure in recent weeks. To train it, we compiled and annotated a large internal data set of English and German texts. Additionally, we used data from the public i2b2-2010 dataset.

For each diagnosis, the ML model receives the words before and after the diagnosis, whether the diagnosis is part of an enumeration, and whether a negation indicator such as “none” or “not” is present. The rest of the decision-making process is fully automated. We corrected individual errors of the ML model by applying specific post-processing rules.
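The features described above could be extracted roughly as follows. This is a sketch under assumptions: whitespace tokenization, a context window of three tokens, a comma-based enumeration check and the indicator list are all invented for illustration and do not reflect the actual Averbis feature set.

```python
from typing import Dict, List

# Assumed indicator list; the post names only "none" and "not" as examples.
NEGATION_INDICATORS = {"no", "not", "none", "without"}


def normalize(token: str) -> str:
    """Lower-case a token and strip surrounding punctuation."""
    return token.lower().strip(".,;:")


def extract_features(tokens: List[str], start: int, end: int, window: int = 3) -> Dict[str, object]:
    """Build context features for the diagnosis at tokens[start:end]."""
    before = [normalize(t) for t in tokens[max(0, start - window):start]]
    after = [normalize(t) for t in tokens[end:end + window]]
    # Crude enumeration check (assumption): a comma directly before or after the diagnosis.
    in_enumeration = bool(
        tokens[end - 1].endswith(",") or (start > 0 and tokens[start - 1].endswith(","))
    )
    return {
        "words_before": before,
        "words_after": after,
        "in_enumeration": in_enumeration,
        "has_negation_indicator": any(t in NEGATION_INDICATORS for t in before + after),
    }


tokens = "Patient shows no meningism, normal light reaction".split()
print(extract_features(tokens, start=3, end=4))
```

A feature dictionary like this can be passed to any standard classifier; which model Averbis uses is not stated in the post.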

Negation status of diagnoses can be determined with high accuracy

By combining the rule-based and ML methods, we were able to increase the performance of negation detection to an F1 score of 96% (English) and 95% (German). An in-depth analysis of the errors showed that the ML approach was also able to identify faulty annotations in the gold standard.
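For readers unfamiliar with the metric: the F1 score is the harmonic mean of precision and recall. The sketch below computes it for a binary "negated / not negated" decision. The labels are made-up toy data, not the evaluation data behind the figures above.

```python
from typing import Sequence, Tuple


def precision_recall_f1(gold: Sequence[bool], predicted: Sequence[bool]) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 for the positive class (here: negated)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)        # correctly detected negations
    fp = sum(1 for g, p in pairs if not g and p)    # false alarms
    fn = sum(1 for g, p in pairs if g and not p)    # missed negations
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels for illustration only.
gold = [True, True, True, False, False]
predicted = [True, True, False, True, False]
print(precision_recall_f1(gold, predicted))
```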


Averbis named AI Champion

From Baden-Württemberg to the World

Time to celebrate: We are proud and thrilled to be named “KI Champion” by the state of Baden-Württemberg, Germany!

As part of a virtual award ceremony, Minister of Economics Dr. Nicole Hoffmeister-Kraut honored a total of nine winners of the “KI Champions Baden-Württemberg” competition for the first time on August 11, 2020. The competition honors companies and research institutions whose solutions are outstanding examples of AI “Made in BW”.

The award inspires us even more to advance research and processes in healthcare through AI and Natural Language Processing.

A big thank you to the jury and to Minister of Economics, Dr. Hoffmeister-Kraut!



In conversation with Dr. Jochen Spuck (CTO of EconSight)

EconSight is a new generation, independent and neutral consulting company in the best Swiss sense. Dr. Jochen Spuck, Chief Technology Officer (CTO) at EconSight in Basel, has prepared a scientific analysis for the Bertelsmann Foundation on the topic of “WORLD CLASS PATENTS IN FUTURE TECHNOLOGIES”, in which the Patent Monitor from Averbis is used.

(Read the complete article here, for the methodology see page 66)

The Bertelsmann Stiftung is one of the most influential think tanks in Germany. “The Bertelsmann Stiftung is a place where we look to the future without party-political boundaries and develop impulses for change.” Reinhard Mohn, Founder

Dr. Spuck, most patent analyses use either very broad technology fields or very specific patent classes to categorize the contents. EconSight strikes a balance between these two approaches by developing specific technology definitions in order to capture the technological activities of companies, research institutions, regions and countries in the best possible way. For this purpose you use an AI-based application, our Patent Monitor. Can you explain why you chose Averbis?

Firstly, because we have had very positive experiences for several years with the results for industrial customers, which go far beyond what can be achieved with cluster engines or other automatic categorization tools on the market. Secondly, because Averbis is a competent and “close” supplier with whom you can easily work toward solutions instead of discussing problems.


Where do you see the biggest challenge in the area of worldwide patent administration?

On the one hand, the mass, since the number of patent applications is increasing more and more and at the same time more and more patents are available in languages other than English, especially Chinese and Korean. On the other hand, the dynamics of technologies, triggered by digitalization. The classical classification is often too complex and too slow, which makes it difficult to bring meaningful thematic structures into the patent landscape. And without these, the patent world remains closed to non-experts.


What potential do you see for AI-based software solutions to support patent analysts with their various questions in the future?

We call it adaptive classification or categorization, i.e. categorization that can be quickly adapted to requirements, and we believe it is the ideal solution for patent analysts. It has to work on different levels: for large topics such as “robotics” as well as at the freedom-to-operate (FTO) level, for example “left-hand threaded screw for knee joints”. In addition, AI is increasingly being used to overcome the language barrier once and for all. AI will offer real added value if it can analyse technical similarities in patents on a large scale in order to make disruptive technological developments visible.


In your opinion, what will the world of patents look like in 10 years? Where is development heading?

Automated prior art searches, AI novelty searches and semi-automated FTOs will completely change the landscape of patent analysts and examiners at the offices. AI will not only do the research but also produce the inventions themselves. AI will be used to make trends visible earlier, and new methods for automatic predictions will be investigated. Patent quality indicators will be available that are based on similarities in content, in addition to or instead of citations, and patents will become recognised indicators of the development and commitment of companies, e.g. in the climate and environmental sector (based on AI-supported categorisations). In addition, the use of AI will extend from patents to scientific literature and will significantly expand the search possibilities.


Thank you very much for this informative conversation and your valuable time, Dr. Spuck.


Sonja Fix about Averbis

After more than 10 years in a consultancy for the life sciences, I joined Averbis in 2018 as Project Manager Healthcare. From the very beginning, I was fascinated by the fact that Averbis has retained the “start-up mentality”, but can draw on the experience of more than a decade of working with hospitals and service providers in the healthcare sector.

Together with our clients and partners, developing the potential of text mining & AI for the healthcare sector is an exciting and multifaceted task, because our use cases range from medical research to the optimization of administrative processes. Where else are informed decisions and the best possible use of resources as important as in healthcare? It makes me proud that I can make a contribution through my work at Averbis.

Applying new technologies in the medical environment, with its high quality standards and data protection requirements, is challenging. The mix of competences in the Averbis team helps us to address the complex needs of our clients. I like how openly and cooperatively we work with our clients. And it is particularly important to me that I always have teammates at my side at Averbis who are prepared to go the extra mile for excellent results!

Sonja Fix
Project Manager Healthcare


Averbis Health Discovery – Anonymisation of medical documents

  • Averbis Health Discovery enables reliable de-identification of medical documents in accordance with the HIPAA Safe Harbor method.
  • Sensitive patient information remains protected.
  • Patient information can be used for medical research, quality assurance and clinical studies in a privacy-compliant manner.

Secondary use of routine medical data in research
The provision of clinical raw data is an indispensable basis for numerous applications in medical research: e.g. aggregated patient data can help to identify disease mechanisms, reduce recruitment times of patients in clinical studies or improve medication safety monitoring. However, such projects often fail because the raw data contain personal information and there is no way to quickly and reliably de-identify large volumes of data. This is exactly where Averbis Health Discovery can help.

High need for protection of clinical data
Personal medical data is highly sensitive and subject to strict data protection regulations. All personal information must be removed before the data can be released for medical research. In addition to personal names and dates of birth, telephone numbers, names of medical staff and relatives, many other text passages are also part of the information that needs to be protected. Depending on the application, information such as dates should not be completely removed, but should be coarsened to one or more years for specific studies.

Best possible protection through a combination of AI and pattern recognition
Averbis Health Discovery supports you in the HIPAA-compliant de-identification of personal data in medical free texts. To ensure the best possible protection of data, current deep learning technologies are combined with pattern-based procedures. For example, names, location information, occupational information and other features are recognized by means of artificial intelligence, while the identification of e-mail addresses and dates is additionally secured by pattern recognition.
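The pattern-based part of such a pipeline can be pictured as a small set of regular expressions that mark e-mail addresses and dates. The following sketch is an assumption-laden illustration: the patterns, the `Span` type and the function `mark_identifiers` are hypothetical and deliberately simple, and they are not the patterns used in Averbis Health Discovery.

```python
import re
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int
    end: int
    label: str


# Simplified illustrative patterns (assumption): numeric dates with "." or "/"
# separators and a four-digit year, and basic e-mail addresses.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "DATE": re.compile(r"\b\d{1,2}[./]\d{1,2}[./]\d{4}\b"),
}


def mark_identifiers(text: str) -> List[Span]:
    """Return character spans of all pattern matches, sorted by position."""
    spans = [
        Span(match.start(), match.end(), label)
        for label, pattern in PATTERNS.items()
        for match in pattern.finditer(text)
    ]
    return sorted(spans)


text = "Seen on 12.11.2019, contact j.doe@example.org for results."
for span in mark_identifiers(text):
    print(span.label, text[span.start:span.end])
```

Patterns like these are cheap and predictable, which is why they make a useful safety net next to a learned model for strictly formatted identifiers.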

Individually adaptable to your needs
The marking of personal data and its further processing are logically separated in Averbis Health Discovery. This lets you treat the identified characteristics differently depending on requirements. Would you like to retain the patient’s year of birth in one study and remove it completely in another? The de-identification is flexibly adaptable and provides reliable protection in every application.
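One way to picture this separation: the marking step produces annotations (start, end, label), and a separate, study-specific policy decides what happens to each label. The sketch below is hypothetical. The handler names, the policy dictionary and the assumption that a marked date ends in a four-digit year are all made up for illustration.

```python
from typing import Callable, Dict, List, Tuple

Annotation = Tuple[int, int, str]  # (start offset, end offset, label)


def redact(value: str) -> str:
    """Default handler: remove the value completely."""
    return "[REDACTED]"


def keep_year(value: str) -> str:
    """Coarsen a date to its year; assumes the value ends with a 4-digit year."""
    return value[-4:]


def apply_policy(text: str, annotations: List[Annotation], policy: Dict[str, Callable[[str], str]]) -> str:
    """Apply the handler chosen per label; labels without a handler are redacted."""
    # Replace from right to left so earlier offsets stay valid.
    for start, end, label in sorted(annotations, reverse=True):
        handler = policy.get(label, redact)
        text = text[:start] + handler(text[start:end]) + text[end:]
    return text


text = "Born 03.05.1961 in Freiburg."
annotations = [(5, 15, "DATE")]
print(apply_policy(text, annotations, {"DATE": keep_year}))  # study that keeps the year
print(apply_policy(text, annotations, {}))                   # study that removes the date
```

Because the annotations stay the same, switching between studies only means swapping the policy, not re-running the detection.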

De-identification fast and safe
The de-identification module of Averbis Health Discovery is an indispensable tool for distributed medical research and is used wherever personal information has to be protected. It supports the privacy-compliant handling of medical documents in clinical studies, quality assurance and medical research. De-identification, like all other modules of Averbis Health Discovery, is available for different languages.


Why is INFORMATION DISCOVERY a highly interesting tool in the NOW(-future) for the life science industry?

As Head of Platform, Christian Gaege is responsible for the Averbis text analytics and machine learning platform Information Discovery.

Christian, can you explain briefly what exactly Information Discovery is?
Information Discovery is the leading application platform for natural language processing (NLP) for the life science industry.
Information Discovery combines state of the art machine learning, terminologies and a powerful rule engine to uncover facts and relations in unstructured text data. It includes a large variety of components to identify document language, entities like companies and persons, part of speech, abbreviations, measurements, temporal expressions, keywords and negations. Domain specific components (to detect information like diagnoses, medications and laboratory values for health care scenarios) are available as add-ons.
Data scientists and software developers can easily extend Information Discovery to tailor the text analysis functionality to their specific needs. For example, we’ve recently achieved an accuracy of over 98 % within just a few hours of optimization for a specific customer scenario.

What sets it apart from other tools in this area?
As we all know, more and more data is available in unstructured form (e.g. chat messages and social media posts, but also office documents, scientific papers, patents and patient records) but it’s hard to leverage this data.
Information Discovery helps you to exploit this potential for your business:
• BI on your unstructured data – extract facts from your documents and apply queries and analytics to get insights.
• Document classification – train your own machine learning model to automatically classify documents in any categories such as relevant or irrelevant. No machine learning experience is required.
• Multilingual support.
• On premise installation or private cloud to protect sensitive data.
• Seamless integration into existing workflows or other applications through extensive API.
• No vendor lock-in due to open standards.

What benefits do I gain?
• Gain competitive advantages from insights that were previously not accessible due to lack of structured information.
• Reduce development time by using Information Discovery as an off-the-shelf solution for natural language processing in your own products.
• Save money and reduce time spent on manual processes like document pre-classification.
• Benefit from Averbis’ many years of experience in the life science environment.
• Be able to handle the increasing amount of unstructured data.

What fascinates you personally the most about this tool, Christian?
The wide range of use cases in which our customers use Information Discovery. For example, the software is currently used in the health sector to identify patients with a positive COVID-19 diagnosis and distinguish them from corona suspect cases.

Last but not least, a personal question: work hard, play hard. Tell us about your favourite activity after work.
As a balance to work I like to do sports and spend time with my 3 children. Besides, I love music and try to improve my guitar playing.

Thank you for these valuable insights, Christian.

A demo version is available if you are curious and would like to try it out yourself.
Simply contact us!


Text Mining of Microbiology Findings

For diagnosis confirmation and reimbursement optimization

For clinics and hospitals under cost pressure, complete coding of diagnoses and therapies is essential for the billing of services. Completeness becomes a particular challenge when medical documentation from other areas, such as the laboratory, has to be taken into account. The new version of Health Discovery addresses precisely this challenge.

In addition to clinical notes, surgery and pathology reports, microbiological reports can now also be analyzed. Numerous DRG-relevant secondary diagnoses that were previously often overlooked can be recognized and coded.

View of a microbiology finding in the annotation editor of Health Discovery


Case study microbiology: 10% uncoded ICD secondary diagnoses

The new module has already been successfully tested in a large German hospital.  For this purpose, a total of more than 70,000 microbiological reports were analysed using Health Discovery. The result: more than 12,500 ICD codes were derived from positive microbiological findings. About 10% of these codes had not been coded by the hospital’s medical controllers so far.

Top 10 ICD-10-Codes (German version)


Assessment of CCL relevance

A comparison with the clinical complexity level CCL (Complication or Comorbidity level) showed that of the approximately 12,500 predicted codes, more than 7,500 represent CCL-relevant secondary diagnoses. Of the diagnoses not yet coded, a total of 825 were found to be CCL-relevant.
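The figures from the case study can be put in relation to each other with a little arithmetic. The inputs below are the rounded values quoted in the text ("more than" and "about" in the original), so the derived numbers are approximations.

```python
# Rounded figures as quoted in the case study (approximations).
predicted_codes = 12_500          # ICD codes derived from positive findings
uncoded_share = 0.10              # share not yet coded by the hospital
ccl_relevant_predicted = 7_500    # CCL-relevant among all predicted codes
ccl_relevant_uncoded = 825        # CCL-relevant among the uncoded codes

uncoded_codes = round(predicted_codes * uncoded_share)
ccl_share_all = ccl_relevant_predicted / predicted_codes
ccl_share_uncoded = ccl_relevant_uncoded / uncoded_codes

print(f"Previously uncoded codes: about {uncoded_codes}")
print(f"CCL-relevant share of all predicted codes: {ccl_share_all:.0%}")
print(f"CCL-relevant share of uncoded codes: {ccl_share_uncoded:.0%}")
```

Under these rounded inputs, roughly two thirds of the previously uncoded diagnoses would be CCL-relevant.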

The new microbiology module can now be used by all Health Discovery customers and, through the partnership with Cerner, is also available to all hospitals that use Meta-KIS as DRG-optimization software.


Averbis Partner TriNetX is providing Covid-19 researchers with up-to-date patient-level clinical data

Our partner TriNetX, through its global network of 150 healthcare organizations, is providing Covid-19 researchers with up-to-date patient-level clinical data for those diagnosed with the virus to help develop supportive, curative, and preventative therapies for the disease.

Crucial types of information from COVID-19 patients are currently only available in unstructured form, such as doctors’ letters, progress reports and radiology reports. In the current situation, COVID-19 is very often mentioned in these documents without a confirmed diagnosis (e.g. the patient is afraid of having COVID-19, or the patient’s wife has returned from a corona risk area). In order to improve the detection of positive COVID-19 diagnoses, Averbis and the technical team of TriNetX have jointly optimized the NLP-based diagnosis detection for COVID-19.

It is now possible to incorporate facts from unstructured documents when building a study cohort of COVID-19 patients.

Thank you for delivering so fast!

(Photo by Drew Hays on Unsplash)



mineRARE: Semantic text-mining of electronic medical records as diagnostic decision support tool to search for rare neurologic diseases such as Pompe disease, Fabry disease and Niemann-Pick type C disease


Background and aims: Diagnosis of rare neurogenetic disorders is often challenging, particularly adult-onset presentations, with long diagnostic delays and misdiagnosis. As therapies become available, it is increasingly important to identify patients with rare neurologic diseases.

Methods: This multicenter project on ten rare neurogenetic diseases was approved by local Ethics committees and data protection authorities of six German University medical centers. Semantic text mining software structures medical data by ranking documents according to probability of disease, based on disease-specific lists of weighted signs and symptoms. Software and search algorithms were optimised in a pilot phase. Existing electronic medical records from the Department of Neurology of each center, corresponding to 10 years of activity, were screened, and patients ranked by probability of having the respective disease. An experienced team of physicians reviewed the data for the top ranked patients and those without a confirmed diagnosis were contacted for testing for the respective disease.

Results: In the pilot phase, 4 patients with Pompe disease and 4 heterozygous NPC1 mutation carriers were identified in Munich. More than 400,000 datasets from four centers were analysed for three diseases: Niemann-Pick type C disease, Pompe disease and Fabry disease. Four novel Pompe patients and 3 heterozygous NPC1 or NPC2 mutation carriers were identified, who had not previously been diagnosed. Data from more centers will be provided.

Conclusion: Electronic medical records-based diagnostic data mining seems to be a promising tool to help diagnosing rare neurologic diseases. It may allow effective screening, re-evaluation of patients with uncertain diagnosis, and identification of patients for clinical trials.
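The ranking step described in the Methods, scoring records against a disease-specific list of weighted signs and symptoms, can be sketched as follows. This is a simplified illustration only: plain substring matching stands in for the semantic text mining, and the terms, weights and function names are hypothetical and not taken from the mineRARE project.

```python
from typing import Dict, List, Tuple


def score_record(text: str, weighted_terms: Dict[str, float]) -> float:
    """Sum the weights of all listed signs and symptoms found in the record."""
    lowered = text.lower()
    return sum(weight for term, weight in weighted_terms.items() if term in lowered)


def rank_patients(records: Dict[str, str], weighted_terms: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (patient id, score) pairs, highest score first."""
    scored = [(pid, score_record(text, weighted_terms)) for pid, text in records.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical weights and toy records for illustration only.
weights = {"proximal muscle weakness": 3.0, "elevated ck": 2.0}
records = {
    "A": "Proximal muscle weakness and elevated CK.",
    "B": "Headache.",
    "C": "Elevated CK only.",
}
for patient_id, score in rank_patients(records, weights):
    print(patient_id, score)
```

In the project, the top-ranked patients were then reviewed by physicians; the score only prioritizes records for review and does not make a diagnosis.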

This research was supported by research grants from Sanofi Genzyme and Actelion.

Cite this article as:

Catarino C, Grandjean A, Doss S, Mücke M, Tunc S, Schmidt K, Schmidt J, Young P, Bäumer T, Kornblum C, Endres M, Daumke P, Klopstock T, Schoser B. mineRARE: Semantic text-mining of electronic medical records as diagnostic decision support tool to search for rare neurologic diseases such as Pompe disease, Fabry disease and Niemann-Pick type C disease. European Journal of Neurology 07/2017; 24(Suppl 1):75-75.


Information Discovery: New Version 4.11 is available!


Information Discovery is a next-generation text analytics platform that allows you to gain insights into your unstructured data and explore key information in the most flexible way possible. Information Discovery collects and analyzes all kinds of documents, such as patents, research literature, databases, websites and company-internal repositories.

In recent months, Averbis has published versions 4.9 to 4.11 with a variety of improvements and extensions within a very short time. Highlights include the new modules of Terminology Editor for the maintenance of terminology and Annotation Editor for visualization of text mining results, as well as the horizontal scalability of text mining pipelines for a variety of big data scenarios.

Here are the most important innovations in versions 4.9 to 4.11 at a glance:

  • With the newly developed Terminology Editor, terminology can now be edited directly in Information Discovery and seamlessly integrated in text mining pipelines.
  • Text analysis pipelines can now be distributed horizontally on any number of machines. The distribution can be fully controlled via the graphical administration within Information Discovery.
  • The new Annotation Editor lets you graphically visualize and edit text analysis results.
  • The Evaluations Workbench allows the comparison of different text mining pipelines regarding precision, recall, F1, and standard deviation.