Interested in joining our team?
Please contact anne-christin.hauschild(at)med.uni-goettingen.de
The research group Clinical Decision Support focuses on the development of medical and bioinformatics methods for the analysis of medical data, with the goal of optimizing diagnostics and therapy. For this purpose, diverse clinical, molecular and medical image data are integrated and analyzed using statistical and machine learning methods. The resulting models are used for the stratification of diseases and the identification of potential markers.
Machine learning has been successfully applied in many areas of biomedical research. However, many challenges remain that hinder its transfer into clinical practice. In particular, the limited sample sizes and systematic biases of single cohorts lead to data sparsity and heterogeneity in medical registry and omics data. Moreover, a lack of interpretable and reliable predictions results in a lack of trust in otherwise highly accurate models. We address these challenges with different computational architectures and algorithms: federated learning for global model generation, transfer learning for data integration and small sample sizes, online and time-critical event prediction, and explainable artificial intelligence methods for model and prediction interpretability.
Leverage the Unreachable - Federated Learning for Privacy-Preserving Model Aggregation
Sensitive patient information such as clinical data and medical registry data is often stored in critical healthcare infrastructure distributed across institutes. The analysis of such data harbours privacy risks and thus falls under a variety of legal regulations such as the General Data Protection Regulation (GDPR), often making the application of traditional machine learning algorithms impossible. Essentially, data exchange among institutions over the internet poses a roadblock to big-data-based medical innovation. Therefore, we are developing federated learning (FL) algorithms that follow a privacy-by-design architecture. FL techniques aim to overcome the barrier of exchanging raw patient data and move towards large-scale medical data mining. The idea is to build a generalized global model without access to a shared dataset by merging locally trained models that capture the essence of the data. As part of this focus, we will soon launch the BMBF-funded FAIrPaCT project (see projects). Recently, Maryam Moradpour joined our group and will focus on the development of federated algorithms for non-partially overlapping biomedical datasets.
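The merging step can be illustrated with a minimal federated-averaging sketch. The local training routine, the two-site toy data and all names below are illustrative assumptions, not the actual FAIrPaCT implementation:

```python
# Minimal federated-averaging sketch: each site trains a local model and only
# shares its parameters, never raw patient data. Toy data, not a real cohort.
import numpy as np

def local_train(X, y, lr=0.1, epochs=200):
    """Train a logistic-regression model locally with plain gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))    # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)    # gradient step on the local data
    return w

def federated_average(local_weights, sample_sizes):
    """Merge local models into one global model, weighted by cohort size."""
    return np.average(local_weights, axis=0, weights=np.asarray(sample_sizes, float))

rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0])
sites = []
for n in (80, 120):                          # two hospitals of different sizes
    X = rng.normal(size=(n, 2))
    y = (1 / (1 + np.exp(-X @ true_w)) > rng.random(n)).astype(float)
    sites.append((X, y))

weights = [local_train(X, y) for X, y in sites]
global_w = federated_average(weights, [len(y) for _, y in sites])
print(global_w)  # one global parameter vector; no raw data was exchanged
```

Only the parameter vectors cross institutional boundaries here; real FL systems add secure aggregation and communication layers on top of this basic idea.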
Leveraging the Vast Known - Transfer Learning to Address the Challenges of Data Heterogeneity and Sparsity
Data sparsity and heterogeneity are two of the biggest challenges in the medical data science domain. Transfer learning can mitigate these issues and has been successfully employed in medical imaging, for instance by utilizing models pre-trained on general image databases such as ImageNet to identify skin cancer. It can be equally beneficial in biomedical research, where a lack of sufficient data is even more frequent. Significant technological advances in next-generation sequencing have led to a large number of focused studies with distinct purposes, spread across many data sets that are not yet fully utilized.
Data heterogeneity is therefore one of the big challenges for integrative analysis, which requires merging data from various evolving fields. For example, different data types require different processing, even if the data sets originate from the same sequencing technique. In addition, batch effects and noise often make biomedical data analysis difficult. Thus, our group develops transfer learning methods that overcome these challenges by leveraging large-scale public data sets from one domain and transferring this knowledge to improve tasks in a different target domain with smaller data sets, in order to identify meaningful biomedical insights. Recently, Youngjun Park developed a method to transfer knowledge between data sets from different sequencing technologies and to handle batch effects and noise in the target domain. Currently, our team is developing domain adaptation and zero-shot learning methods for proper knowledge transfer to a different biological domain.
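The basic pattern behind such approaches can be sketched as pre-training on a large source data set and warm-starting the model on a small target data set. This toy example is a simplified illustration under invented data, not our group's actual pipeline:

```python
# Minimal transfer-learning sketch: pre-train on a large "source" cohort,
# then fine-tune on a small, noisier "target" cohort sharing the features.
import numpy as np

def train(X, y, w_init=None, lr=0.1, epochs=100):
    """Gradient-descent logistic regression, optionally warm-started."""
    w = np.zeros(X.shape[1]) if w_init is None else w_init.copy()
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0, 0.5])

def make(n, noise=0.0):
    X = rng.normal(size=(n, 3))
    logits = X @ true_w + noise * rng.normal(size=n)
    y = (1 / (1 + np.exp(-logits)) > rng.random(n)).astype(float)
    return X, y

X_src, y_src = make(5000)            # large public source data set
X_tgt, y_tgt = make(20, noise=0.5)   # small, noisy target data set

w_src = train(X_src, y_src)                                 # pre-train
w_transfer = train(X_tgt, y_tgt, w_init=w_src, epochs=20)   # fine-tune

print(w_transfer)
```

With only 20 target samples, training from scratch would be unreliable; starting from the source model transfers the bulk of the signal and lets the short fine-tuning run adapt to the target domain.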
Just-In-Time Prediction - Online and time-critical event prediction
The majority of current clinical and omics research is restricted to investigations of cross-sectional snapshots of specific diseases. However, most diseases traverse various stages during their development, or evolve into different subtypes depending on influences such as genetics, environment or medication. Thus, an analysis that focuses on a single time point of a patient's metabolite or protein abundance may overlook potential biomarker patterns, in particular when identifying disease subtypes and optimizing treatment. Novel technologies have paved the way for longitudinal analysis and online monitoring of fast-progressing diseases or changing patient vitals, for instance parameters of lung function or metabolites in exhaled air. Our group develops methods and packages, such as the R package LoBrA for longitudinal linear spline analysis and more advanced recurrent neural networks, to model real-world longitudinal data. For example, we applied the LoBrA package to breath metabolomic data of rats with progressing sepsis to illuminate the potential of computer-aided metabolic breath analysis as an early alarm system for time-critical diseases such as sepsis. Moreover, we will employ our methodologies to investigate clinical variables captured during mechanical ventilation and to model the progression of acute respiratory failure. Dr. Zully Ritter and Stefan Rühlicke are currently working on the ENSURE project, whose aim is to implement a clinical decision support system for diagnostics in the emergency department and to evaluate its usability and accuracy compared to existing guidelines.
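The core idea of a longitudinal linear spline, splitting a time series into phases with different slopes, can be sketched as follows. This is a simplified one-knot Python illustration with synthetic data, not the LoBrA package itself (which is written in R and models mixed effects):

```python
# One-knot linear spline fit for a single longitudinal series: a baseline
# phase and a response phase with a different slope after the knot.
import numpy as np

def fit_linear_spline(t, y, knot):
    """Least-squares fit of y ~ b0 + b1*t + b2*max(t - knot, 0)."""
    design = np.column_stack([np.ones_like(t), t, np.maximum(t - knot, 0.0)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef  # [intercept, baseline slope, slope change after the knot]

t = np.linspace(0, 10, 50)
y = np.where(t < 5, 1.0, 1.0 + 0.8 * (t - 5))   # flat, then rising signal
coef = fit_linear_spline(t, y, knot=5.0)
print(coef.round(2))  # intercept ≈ 1, baseline slope ≈ 0, slope change ≈ 0.8
```

A sudden change in the fitted slope after the knot is exactly the kind of pattern a single cross-sectional snapshot would miss.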
Opening the Black Box - Achieving model and prediction interpretability by explainable artificial intelligence methods
In domains such as clinical decision support systems (CDSS), there is typically a lack of trust in black-box machine learning models that do not allow tracing back the factors that lead to a specific decision. The field of eXplainable Artificial Intelligence (XAI) tries to increase the transparency and trustworthiness of AI models and thus to foster their use as CDSS. However, until now, their usage has been primarily limited to data science experts. In one of her current projects, Ms. Beinecke is developing a Graphical User Interface (GUI) for visualizing and analyzing XAI attributions on graph datasets such as protein-protein interaction networks, to allow researchers from other domains to gain a deeper understanding of the underlying models. The GUI is part of a human-in-the-loop platform that will allow graph manipulation based on the insights gained through the XAI attributions. In addition, she is working on an extensive benchmarking study on the trustworthiness of XAI methods across different types and sizes of, in particular, biomedical data, to develop a guideline that will aid the acceptance of AI methods as clinical decision support systems (CDSS). Moreover, Zully Ritter is employing classical tools such as SHAP and LIME to reveal which specific parameters, and with what weight, contribute to a correct prediction in the decision-making process. These methodologies are essential for increasing the acceptance of CDSS, especially in critical and post-hospital care: they not only make the decision understandable, but are most important when the ML prediction differs from that of the clinicians.
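The spirit of such model-agnostic attributions can be illustrated with permutation importance, a deliberately simple stand-in for tools like SHAP or LIME: shuffle one feature at a time and see how much the model's accuracy drops. Model and data here are illustrative only:

```python
# Model-agnostic attribution via permutation importance: a large accuracy
# drop after shuffling a feature marks that feature as important.
import numpy as np

def permutation_importance(predict, X, y, rng):
    base_acc = np.mean(predict(X) == y)
    drops = []
    for j in range(X.shape[1]):
        X_perm = X.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])  # destroy feature j
        drops.append(base_acc - np.mean(predict(X_perm) == y))
    return np.array(drops)  # larger drop = more important feature

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = (X[:, 0] > 0).astype(int)                   # only feature 0 matters
predict = lambda X: (X[:, 0] > 0).astype(int)   # a "model" using feature 0

drops = permutation_importance(predict, X, y, rng)
print(drops.round(2))  # feature 0 shows a large drop, the others none
```

SHAP and LIME refine this idea with game-theoretic attributions and local surrogate models, respectively, but the question they answer is the same: which inputs drive this prediction?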
Clinical Decision Support Systems (CDSS) based on machine learning models
Machine learning (ML) models are an essential tool that, integrated into portable devices or a clinical setup, can be used by medical staff as clinical decision support. Patient data, including vital signs, symptoms, and outcome diagnoses, are frequently used. Some considerations, like physiological filters, are essential in the pre-processing and data-cleaning steps. Zully Ritter is working on practical applications of machine learning models, including diagnostic prediction for emergency department patients. In such a clinical setup, the assessment of a patient's diagnosis is a time-critical process for initiating adequate treatment strategies, and thus represents one of the most important steps for patient outcomes in emergency care. ML models save time by applying, within an adequate time frame, what has been learned from the data of many previously analyzed patients to the clinical case at hand. In critical cases, this can be life-saving for emergency patients.
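A physiological filter of the kind mentioned above can be sketched as a plausibility check on vital signs before model training. The ranges below are illustrative assumptions, not clinically validated cut-offs:

```python
# Physiological plausibility filter for vital-sign pre-processing: values
# outside a plausible range (e.g. sensor glitches) are marked as missing.
# The ranges are illustrative assumptions, not clinical reference values.
PLAUSIBLE_RANGES = {
    "heart_rate": (20, 250),       # beats per minute
    "spo2": (50, 100),             # oxygen saturation, %
    "temperature": (30.0, 43.0),   # body temperature, degrees Celsius
}

def filter_implausible(record):
    """Replace values outside the plausible range with None (missing)."""
    cleaned = {}
    for key, value in record.items():
        lo, hi = PLAUSIBLE_RANGES.get(key, (float("-inf"), float("inf")))
        cleaned[key] = value if lo <= value <= hi else None
    return cleaned

record = {"heart_rate": 82, "spo2": 350, "temperature": 36.8}  # glitchy SpO2
print(filter_implausible(record))
```

Marking implausible values as missing, rather than silently keeping or dropping them, lets downstream imputation and modelling steps handle them explicitly.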
Breath metabolomics in clinical diagnostics and monitoring
Odors and vapors of the body and breath have been known for their diagnostic power for millennia. More recently, this knowledge was confirmed in clinical studies by successfully training dogs and mice to detect diseases by sniffing specific volatile organic profiles. Analogous to the vertebrate nose, analytical technologies exist that are capable of capturing such metabolites. The science of analyzing the aggregation of all metabolites within the breath of an organism is called breathomics. The crucial task is to identify discriminating patterns that are predictive of certain diseases. Additionally, like other diagnostic technologies, breath analysis is influenced by various sources of systematic or random noise. The field needs to move from separability to predictability by evolving from pilot studies to large-scale screening studies. Therefore, there is a need for further standardization and automation in managing, analyzing and evaluating this novel type of metabolomics data. To achieve this, several challenges remain to be addressed: data accumulation and heterogeneity; manual peak finding; unknown metabolites; robust statistics and biomarkers; background noise and confounding factors; heterogeneous diseases and disease stages; and usability, maintainability, and re-usability.
Cancer diagnostics and treatment optimization
While over the past decades basic and clinical cancer research and trials have led to promising new therapies and improved survival for a number of cancer types, cancer remains a major cause of morbidity and mortality. This research has focused on a variety of aspects, from understanding cancer development to dissecting progression and evaluating susceptibility to therapy. This has produced an even more diverse range of data sets, from clinical variables such as patient symptoms, laboratory parameters like hormonal changes, and data from invasive interventions like tumor size and location, to modern molecular factors such as gene and protein abundance. In a variety of collaborations we integrated such data to evaluate, for instance, whether cancers of the respiratory system can be identified via metabolites in patients' breath. Other studies focused on the potential of gene expression data to achieve molecular tumor subtyping, or on the identification of involved molecular functions or metabolic pathways that could potentially lead to novel drug targets. Finally, we investigated a combination of clinical, laboratory and surgical data to predict prostate cancer recurrence (PCR) after successful primary treatment. Currently, we are collaborating with the Gastroenterology departments of the UMG, the Klinikum rechts der Isar (Technical University Munich) and the Philipps-University Marburg to build a privacy-preserving federated learning framework to predict treatment response.
Antimicrobial Drug Resistance Prediction
A general aim of microbiological diagnostics is to rapidly identify pathogens in a patient's specimen and to generate information on their susceptibility to antibiotic drugs, allowing clinicians to devise informed treatment plans. This has been increasingly complicated by the frequent occurrence of antimicrobial drug resistance (AMR) in pathogenic bacteria, which has become an important problem for human and veterinary health settings on a global scale. The spread of multi-drug-resistant pathogens may even render entire sets of antimicrobial drugs useless, which often results in expensive and difficult-to-treat infections in humans and animals. While the use of ML methods on mass-spectrometry data for microbial identification is established, only a few studies have applied machine learning algorithms to predict resistance. A particular challenge is the heterogeneity amongst mass-spectrometric data sets from different devices, institutes, or even time points. Thus, we aim to build robust models that generalize to other data sets of similar structure.
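One simple way to reduce such device- or site-specific shifts before training a shared model is per-cohort standardization of the spectral features. The sketch below is purely illustrative with synthetic data, not our actual pre-processing pipeline:

```python
# Per-cohort standardization: z-score each feature within each cohort to
# remove systematic offsets between measurement devices or sites.
import numpy as np

def standardize_per_cohort(X, cohort_ids):
    """Z-score each feature within each cohort separately."""
    X_out = np.empty_like(X, dtype=float)
    for c in np.unique(cohort_ids):
        mask = cohort_ids == c
        mu = X[mask].mean(axis=0)
        sd = X[mask].std(axis=0) + 1e-9      # avoid division by zero
        X_out[mask] = (X[mask] - mu) / sd
    return X_out

rng = np.random.default_rng(7)
X_a = rng.normal(loc=0.0, size=(100, 4))     # device A
X_b = rng.normal(loc=5.0, size=(100, 4))     # device B, systematic shift
X = np.vstack([X_a, X_b])
cohorts = np.array([0] * 100 + [1] * 100)

X_std = standardize_per_cohort(X, cohorts)
# after standardization, both cohort means sit at roughly zero
print(X_std[cohorts == 0].mean(), X_std[cohorts == 1].mean())
```

This removes the constant offset between devices, though it cannot correct batch effects that are correlated with the labels; those require more careful designs.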
Modeling Lung Protective Ventilation in Intensive Care
Mechanical ventilation is the most important supportive and life-sustaining measure for severe acute respiratory failure in patients with Acute Respiratory Distress Syndrome (ARDS). Conservative estimates suggest that about 150,000 patients per year suffer from ARDS in Europe alone. Ensuring pulmonary gas exchange buys time to treat the underlying cause of the disease. The COVID-19 pandemic led to a massive increase in ARDS cases worldwide and has highlighted the possibilities, but also the limitations and problems, associated with mechanical ventilation. The concept of lung-protective ventilation aims at preventing injurious side effects while enhancing protection by monitoring several variables that are captured during mechanical ventilation, such as the tidal volume, the peak airway pressure or the positive end-expiratory pressure (PEEP). However, current therapeutic approaches, often addressing only a few variables, do not sufficiently account for the treatment corridors to be specified for the settings of mechanical ventilation, and thus fail to deliver personalized mechanical ventilation for the individual patient. We aim to integrate existing clinical data from patients and model organisms in order to develop statistical models that investigate the complex interactions amongst clinical ventilation parameters, predict the success of mechanical ventilation, and predict pending risk for individual patients. These models will be the first step towards a (causal) model for personalized ventilation settings.
Psychiatric Pharmacogenomics Research
In the era of large genomic data sets and machine learning (ML), secure data management is one of the key challenges when developing biomedical software, handling personal, clinical and genomic patient information. Particularly in psychiatric and pharmacogenomic research, where state-of-the-art ML approaches are slowly gaining attention, data confidentiality is a continuous concern. Therefore, privacy-preserving technologies will play a crucial role to pave the way for the application of modern ML in medical diagnostics and treatment optimization of psychiatric disorders. We aim to implement federated machine learning approaches for personalized psychiatric diagnostics and pharmacogenomic research for treatment optimization.
Predicting Heart Attacks
Not only retrospective data but also open-source data can be used for developing machine learning models that can be used to predict an outcome for new patients or subjects. To demonstrate this, Zully Ritter implemented all steps from data science to web-app development, as shown in CorMeum, an app to predict heart attacks.
CDSS in Clinical Setups
Another important application is predicting post-hospital care needs by analyzing routine data from health insurance companies. In such cases, time series or recurrent events from the same patient need to be understood for proper machine learning model development. ML models intended for real-world use need to be validated by quantifying their performance and testing under laboratory conditions to determine their limitations, potential, and usability before being deployed as clinical support.
These projects are only possible through interdisciplinarity and the inclusion of different key institutions that possess the know-how, the adequate data, and an environment for exchange and critical self-analysis of all steps of project development, from project aims and model creation to deployment and laboratory testing before implementation in clinical trials. More details about the current projects, such as ENSURE (ENtwicklung Smarter Notfall-Algorithmen dURch Erklärbare KI-Verfahren: development of smart emergency algorithms using explainable AI methods) and KI-THRUST (Potenziale KI-gesTützter VorHersageveRfahren aUf BaSis von RouTinendaten: potential of AI-supported prediction methods based on routine data), in which machine learning models are implemented as clinical decision support for use in clinical setups, can be found on the corresponding project pages below.
List of collaborative projects
FAIrPaCT - Federated Artificial Intelligence fRamework for PAncreatic Cancer Treatment optimisation
The goal of our consortium, consisting of the University Medical Center Göttingen, the University Hospital Giessen and Marburg and the Technical University Munich, is to develop a software system supported by federated artificial intelligence called FAIrPaCT that will enable the analysis of clinical patient data and molecular cancer cell data from patients with pancreatic cancer across institutes. Pancreatic cancer is a highly aggressive malignancy with a rising incidence, predicted to become the second leading cause of cancer-related death by 2030 in the industrialised world. Due to its extraordinarily aggressive, locally invasive tumour biology with a tendency to distant metastases, and the exceptionally high and heterogeneous resistance to conventional chemotherapy, therapy is often difficult.
Our project combines three of the largest patient cohorts (KFO5002, KFO325, SFB1321) on pancreatic cancer in Germany, which are unique in size and heterogeneity. In combination with innovatively tailored federated artificial intelligence methods, this enables us to train robust, high-performance models to estimate the probability of success for specific treatment approaches. Moreover, the FAIrPaCT framework will enable the identification of important parameters that drive treatment response, so-called markers. These can provide key details about the molecular mechanisms that influence therapy success and thus can support the development of improved drugs and enable personalised treatment strategies. To address the needs of stakeholders such as medical and computational researchers and patient communities, all software packages will be open-source, data will be FAIR, and results will be published. Finally, moving towards cross-cohort analysis enables us to benefit from heterogeneous local data, build more robust and clinically relevant models, identify globally relevant markers, and ultimately take a step further towards artificial intelligence-supported precision medicine.
In cooperation with Prof. Dr. Michael Altenbuchinger, the CDSS group offers the course "Advanced Statistical Learning for Data Science" in the winter semester in the B.Sc. and M.Sc. programs Applied Computer Science and Data Science. In addition, bachelor, master and doctoral theses on our research topics are offered at any time (see GWDG-Pad). Please feel free to contact us if you are interested.