Doctoral Students

Ekin Celikkan
GFZ - KIT

Contact

Ekin Celikkan
Bayesian Machine Learning with Uncertainty Quantification for Detecting Weeds in Crop Lands from Low Altitude Remote Sensing (2022 - )

Supervisors:

Martin Herold (GFZ)

Nadja Klein (KIT)

 

Weeds are a major cause of crop yield and quality losses, accounting for roughly 12% of global crop production. Farmers use different approaches, such as chemical herbicides or biological control, to eliminate weeds. However, excessive herbicide use pollutes soils, water, and air, putting above- and below-ground wildlife biodiversity at risk. Alternative weed mitigation strategies must therefore be designed and promoted. Site-specific weed management (SSWM) has been proposed as one such approach: weed management strategies are varied within a crop field to suit the weed population's variation in density, location, and composition. The first step in implementing an SSWM strategy is the accurate and timely detection and mapping of weeds. Capturing detailed within-field variability requires remote sensing data of high temporal, spatial, and spectral resolution, a requirement that can only be met by low-altitude remote sensing platforms and lightweight hyperspectral imaging sensors that can be deployed locally and under varying conditions.

This research aims to use multi-temporal, hyperspectral time-series imagery from low-altitude remote sensing, combined with field data and advanced machine learning (ML) techniques, to detect and discriminate weeds in croplands. To achieve this aim, the project will i) monitor the weed population during different growth stages and preprocess the hyperspectral and field data for better weed detection, ii) test combinations of ML and image processing algorithms for detecting weeds across a range of field and growth-stage conditions, and iii) apply the framework in multi-temporal analyses to track weed distribution and conditions for uptake by precision farming techniques in our study regions.

 

Full-length publications

  1. E. Celikkan, M. Saberioon, M. Herold and N. Klein (2023). Semantic Segmentation of Crops and Weeds with Probabilistic Modeling and Uncertainty Quantification. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, pp. 582-592.
  2. E. Celikkan, T. Kunzmann, Y. Yeskaliyev, S. Itzerott, N. Klein, and M. Herold (2025). WeedsGalore: A Multispectral and Multitemporal UAV-Based Dataset for Crop and Weed Segmentation in Agricultural Maize Fields. Proceedings of the Winter Conference on Applications of Computer Vision (WACV), pp. 4767-4777.

 

Conference presentations

-

Daniel Collin
GFZ - TU Berlin

Contact

Daniel Collin
Predicting geomagnetic conditions on the Earth from multi-spectral images of the Sun by combining data science and physical models (2022 - )

Supervisors:

Yuri Shprits (GFZ)

Guillermo Gallego (TU)

 

Space weather is a term used to describe hazardous events in the near-Earth space environment that can have adverse effects. Power grids, telecommunication infrastructure and space assets show significant vulnerability to space weather events originating from the Sun. While these effects are largely invisible to the naked eye, in the 21st century operations in space and on the ground significantly depend on the accurate knowledge and forecast of the conditions in the space environment.

Using ground observations of the Sun and physics-based models of the Solar System, it is possible to quantify space weather hazards (e.g., solar wind speed and density) and the risks they pose to Earth's technology. These methods, however, do not utilize the vast amount of observations available from space, and they are computationally demanding and predominantly physics-based, making them difficult to run in real time and to feed with all available measurements.

We propose to leverage the growing number of highly detailed multi-spectral images of the Sun to improve predictions of the solar wind streams arriving at the Earth. We will exploit the capabilities of modern computer vision and machine learning (ML) techniques to register solar images, analyze them, and assimilate them into an empirical (data-driven) model. Through this approach we will develop a novel, data-driven framework directly connecting solar disturbances to their consequences for space weather.

 

Full-length publications

  1. D. Collin, Y. Shprits, S.J. Hofmeister, S. Bianco, and G. Gallego (2025). Forecasting High-Speed Solar Wind Streams from Solar Images. Space Weather. https://doi.org/10.1029/2024SW004125

 

Conference presentations

  1. D. Collin, S. Bianco, G. Gallego, and Y. Shprits. Forecasting solar wind speed from solar EUV images. (Oral and poster presentation), International Workshop on Machine Learning and Computer Vision in Heliophysics, Sofia, Bulgaria, 19-21 April 2023.
  2. D. Collin, S. Bianco, G. Gallego, and Y. Shprits. Forecasting solar wind speed by machine learning based on coronal hole characteristics. (Poster presentation), EGU General Assembly, Vienna, Austria, 24–28 April 2023. https://doi.org/10.5194/egusphere-egu23-6968
  3. D. Collin, S. Bianco, G. Gallego, and Y. Shprits. Forecasting solar wind speed from solar EUV images. (Oral presentation), IUGG General Assembly, Berlin, Germany, 11-20 July 2023. https://doi.org/10.57757/IUGG23-2070
  4. D. Collin, S. Bianco, G. Gallego, and Y. Shprits. Forecasting solar wind speed from coronal holes. (Oral presentation), AGU Fall Meeting, San Francisco, USA, 11–15 December 2023.
  5. D. Collin, Y. Shprits, S. Bianco, F. Inceoglu, S. Hofmeister, and G. Gallego. Forecasting solar wind speed from coronal holes and active regions. (Poster presentation), EGU General Assembly, Vienna, Austria, 14–19 April 2024. https://doi.org/10.5194/egusphere-egu24-18676
  6. D. Collin, Y. Shprits, S. Hofmeister, S. Bianco, and G. Gallego. Forecasting solar wind speed with solar images. (Oral and poster presentation), International Magnetosphere Coupling Workshop IV, Potsdam, Germany, 3–7 June 2024.
  7. D. Collin, Y. Shprits, S. Bianco, S. Hofmeister, and G. Gallego. Forecasting Solar Wind Speed from Solar EUV Images. (Poster presentation), Helmholtz AI Conference, Düsseldorf, Germany, 12-14 Jun 2024.
  8. D. Collin, Y. Shprits, S. Bianco, S. Hofmeister, and G. Gallego. Using Distributional Regression to Improve Solar Wind Speed Forecasting from Solar Images. (Oral presentation), European Space Weather Week, Coimbra, Portugal, 4-8 Nov 2024.
  9. D. Collin, Y. Shprits, L. Chiarabini, S. Hofmeister, S. Bianco, N. Klein, and G. Gallego. Forecasting Solar Wind Speed From Solar Images Using Distributional Regression. (Oral presentation), Machine Learning and Computer Vision in Heliophysics, Sofia, Bulgaria, 7-9 April 2025.
  10. D. Collin, Y. Shprits, L. Chiarabini, S. Hofmeister, S. Bianco, N. Klein, and G. Gallego. Solar Wind Speed Forecasting From Solar Images Using Distributional Regression. (Oral presentation), EGU General Assembly, Vienna, Austria, 27 Apr–2 May 2025. https://doi.org/10.5194/egusphere-egu25-11587
  11. D. Collin, Y. Shprits, L. Chiarabini, S. Hofmeister, N. Klein, and G. Gallego. Probabilistic Solar Wind Speed Forecasting Using Deep Distributional Regression. (Oral and poster presentation), European Space Weather Week, Umea, Sweden, 27-31 October 2025.

 

Hannes Drobek
MDC – FU Berlin

Contact

Hannes Drobek
Mapping Molecular Landscapes of Organelles and Cells in CryoET Data by Molecular Simulations and AI (2025 - )

Supervisors:

Mikhail Kudryashev (MDC)

Cecilia Clementi (FU) 

 

The molecular landscapes of cells define cellular functions and are responsible for diseases; therefore, understanding the structure of and interactions between biomolecules is highly important. Cryo-electron tomography (CryoET) is a microscopy method that uniquely allows imaging of natively preserved cells at molecular resolution and has the potential to determine molecular structures at high resolution. The key problem in CryoET is the identification of the molecules imaged in the tomograms, as most of the biomolecules in cells are small and look similar at CryoET resolution.

The objective of the project is to combine imaging data from CryoET with orthogonal data sources: structures of molecules solved experimentally and/or predicted by AlphaFold; information about interactions between molecules from protein coevolution and cross-linking mass spectrometry analysis; and, in the future, other biophysical sources. Furthermore, as proteins typically do not adopt a single structure but an ensemble of conformations, we propose to employ coarse-grained molecular simulations to sample the structural landscapes of proteins and protein complexes and use them for annotating the tomograms.

This will allow us to predict the potentially complex behavior of biological systems and probe the effects of mutations in key molecular complexes. The ultimate aim of the project is the development of methods for automatic annotation of molecules in human cells. This will help build an understanding of molecular networks, determine the structures of protein complexes, and quantify the effects of disease-related phenotypes.

Lauren Gerber
MDC - Charité - HHI

Contact

Lauren Gerber
Understanding Host-Microbiome Interactions (HMIs) Using Deep Learning Models (DLMs) and Explainable AI (XAI) (2025 - )

Supervisors:

Prof. Dr. Sofia K. Forslund-Startceva (MDC, ECRC)

Prof. Dr. Wojciech Samek (HHI)

 

 

Human microbiomes are integral ecological microbial communities that have co-evolved with their human hosts, shaping key physiological processes. The diversity and dynamics of human microbiomes influence communication with multiple systems in their hosts. This is especially relevant to gut microbiomes and their hosts’ immune systems. Together, these host-microbiome interactions (HMIs) affect an individual’s susceptibility to disease onset, progression, and response to treatment. Computational approaches, particularly deep learning models (DLMs), are more frequently being adopted to unravel the complexities of HMIs.

Although several computational models have been developed to study HMIs, the mechanisms underlying interindividual variations in microbiome composition and their influence on health and disease are not yet fully understood. While traditional machine learning models (MLMs) have been increasingly applied to analyze HMIs, they are less adept at directly handling raw sequencing data. MLMs also require manual feature engineering, which can limit the discovery of latent patterns in the data.

In contrast, DLMs may be better suited to handle raw sequencing data, which is heterogeneous and sequential. These models can automatically extract features, which may enable the detection of novel taxa and discovery of hidden relationships among low-abundance taxa that might otherwise be discarded during taxonomic feature table generation. DLMs can also integrate multiple data types, supporting multi-omics analysis and offering a more holistic view of interconnected biological mechanisms. These insights may reveal potential drivers of disease onset and progression.

A major challenge associated with DLMs is model opacity. DLMs function as ‘black boxes,’ thereby lacking transparency regarding which features contribute to specific disease predictions. To address this limitation, explainable AI (XAI) methods can be applied to elucidate the decision-making process of DLMs. XAI helps to verify whether identified feature contributions are biologically plausible rather than artifacts, confounders, or noise. This, in turn, improves the trustworthiness of the DLMs and highlights features that may influence disease outcomes.

This project tests existing DLMs built on different architectures, including models built for both single- and multi-omics data, for disease prediction in the context of HMIs. The goal is to evaluate performance differences on new datasets, and subsequently apply post-hoc XAI methods to identify features driving predictions. The findings will then be assessed to determine whether they fit within the scope of current biological knowledge. Ultimately, this approach seeks to investigate the mechanisms inherent in HMIs and improve understanding of how interindividual variations in microbiome composition are associated with differences in health and disease outcomes.
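
As a toy illustration of the post-hoc XAI step described above — not the project's actual models or data — the following sketches gradient-times-input attribution on a hypothetical logistic model over microbial taxa abundances. All weights, features, and values are illustrative assumptions:

```python
import numpy as np

# Toy stand-in for a trained DLM: a logistic model over microbial
# feature abundances (hypothetical taxa, illustrative weights only).
w = np.array([1.5, -2.0, 0.0, 0.7])   # "learned" weights per taxon
b = -0.2

def predict(x):
    """Predicted disease probability for one sample."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def gradient_x_input(x):
    """Simple post-hoc attribution: d(prediction)/dx * x."""
    p = predict(x)
    grad = p * (1.0 - p) * w          # chain rule through the sigmoid
    return grad * x

x = np.array([0.8, 0.1, 2.3, 0.5])    # relative abundances of 4 taxa
scores = gradient_x_input(x)
# A taxon with zero model weight receives zero attribution regardless of
# its abundance, so implausible "drivers" can be flagged for inspection.
print(scores.round(4))
```

Attribution maps like this are what a biologist would then check against known mechanisms to separate signal from confounders or noise.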

 

Paolo Graniero
FU Berlin - HZB

Contact

Paolo Graniero
Optimization of Solar Energy Yield and Specific Load Conditions Considering Electric Busses in Public Transportation (2019 - )

Supervisors:

Natalia Kliewer (FU)

Carolin Ulbrich (HZB)

Rutger Schlatmann (HZB)

 

The number of photovoltaic (PV) systems installed worldwide is steadily increasing, along with their share in the energy produced in the power grid. Getting the maximum benefit from these PV systems involves two primary considerations: maximizing power generation over the entire lifetime of the PV modules and optimally integrating them into the power grid.

Getting the maximum output from a given PV installation requires continuous monitoring to detect underperformance and failures as early as possible. However, this type of monitoring is seldom implemented, particularly outside of utility-scale photovoltaic plants or research-focused installations. These facilities are equipped with all the instruments required for accurate monitoring, in contrast to residential and commercial installations.

The capabilities of modern data-analytic methods can help raise the proportion of PV installations that can be monitored. In addition, data-driven modeling can make monitoring a PV system simpler: data-driven models do not require detailed information about the system, and they can combine data from sources available outside of plants and research centers, which standard, physics-based modeling approaches cannot take advantage of.

A key obstacle to integrating PV systems into the power grid is the stochastic nature of solar energy. This stochasticity can lead to adverse effects such as over- and under-production and challenge the stability of power grids. In addition to monitoring, integration therefore requires the most accurate possible forecasting of expected power to control the grid optimally and avoid unnecessary stresses on it. Data-driven methodologies can help make such best practices more accessible and more widely available, benefiting both PV system owners and power grid managers.

In this project, we investigate data-driven monitoring and forecasting systems for photovoltaic installations. As an application, we study the potential and benefits of PV power monitoring and forecasting for load management in support of electric mobility integration in the public transport system. We focus specifically on electric buses and the load generated by charging their batteries. If the charging tasks are performed in an uncontrolled way, the large electric loads of the batteries might destabilize the power grid and reduce the environmental benefits of such electric vehicles. Load management, by contrast, coordinates the charging processes to benefit as much as possible from renewable energy sources while preserving power grid stability.

The coordination of energy generation from PV systems and energy consumption by electric vehicles is a key to a green transition in the transport sector.

 

Full-length publications

  1. P. Graniero, M. Khenkin, H. Köbler, N.T. Putri Hartono, R. Schlatmann, A. Abate, E. Unger, T.J. Jacobsson and C. Ulbrich (2023). The challenge of studying perovskite solar cells’ stability with machine learning. Front. Energy Res., Sec. Solar Energy, 11. https://doi.org/10.3389/fenrg.2023.1118654
  2. N.T.P. Hartono, H. Köbler, P. Graniero, et al. (2023). Stability follows efficiency based on the analysis of a large perovskite solar cells ageing dataset. Nat Commun, 14, 4869. https://doi.org/10.1038/s41467-023-40585-3

 

Conference presentations

  1. P. Graniero, A. Louwen, R. Schlatmann, and C. Ulbrich. Comparison of different data sources for Machine Learning algorithms in photovoltaic output power estimation. (Poster presentation), 37th EU PVSEC, Online, 7-11 September 2020.
  2. P. Graniero, D. Rößler, C. Ulbrich, and N. Kliewer. Potentials and challenges for integration of electric bus fleets and PV-systems. (Oral presentation), 31st European Conference on Operational Research (EURO 2021), Athens, Greece / Online, 11-14 July 2021.
  3. P. Graniero. Data-driven mitigation measures in advanced PV plant monitoring. (Oral presentation), Intersolar Conference, Munich, Germany, 6-7 October 2021.
  4. P. Graniero. Comparison of Unsupervised Algorithms for PV Fault Detection, and Data Sources for Power Nowcasting. (Oral presentation), COST Action PEARL PV Conference – Enabling the Terawatt Transition, Enschede, The Netherlands, 14-16 March 2022.
  5. P. Graniero, G.A.F. Basulto, R. Schlatmann, R. Klenk, and C. Ulbrich. Online Implementation of a Multiple Linear Regression Model for CIGS Photovoltaic Module Performance. (Poster presentation), WCPEC-8, Milan, Italy, 26-30 September 2022.

Zixin Hu
GFZ – TU Berlin

Contact

Zixin Hu
FloodChat - Global assessment of flood adaptation and risk through large language models and global open data (2025 - )

Supervisors:

PD Dr. Heidi Kreibich

Prof. Dr. Andrea Cominola

 

With millions of people exposed, global riverine flood risk is one of the major natural hazards worldwide, causing damage worth US$ billions and thousands of fatalities every year. As climate change and urban expansion accelerate, effective adaptation is urgently required to counteract the increasing trend of flood damage. Amidst increasing calls for accelerated climate adaptation, including the recent UNEP report, a pivotal question remains: What are the status, effectiveness, and potential of adaptation efforts to reduce future flood risks? 

FloodChat will answer this question by developing an automated data processing framework for the global assessment of flood adaptation, based on Large Language Models (LLMs) and other machine learning techniques, to process public flood adaptation documents and quantitatively assess the effects of adaptation measures on risk dynamics at the global scale. Recent developments in LLMs have revolutionized text processing and demonstrated their unprecedented potential. LLMs can efficiently analyze and synthesize vast collections of documents, providing interpretable and concise results that support informed knowledge generation. Chatbots have been used, for instance, to answer questions about climate change from IPCC reports. Based on global open data, empirical data, and public adaptation reports from globally distributed case studies (including the UNDRR Knowledge Base and the World Bank Climate Change Knowledge Portal) and probabilistic machine learning (e.g., Bayesian regression), the treatment effect of various adaptation measures on flood damage will be quantified. Damage in situations with and without an implemented measure is compared while controlling for confounding variables (i.e., hazard, exposure, vulnerability) that can also significantly affect damage. Our planned assessment of adaptation reports and the effectiveness of adaptation measures complements and enhances the efforts of the Global Adaptation Mapping Initiative, in line with the priorities set for global adaptation research.
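
The with/without-measure comparison under confounder control can be sketched with a minimal Bayesian linear regression (closed-form posterior mean under a Gaussian prior). The data below are synthetic and the variables are illustrative stand-ins, not FloodChat's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic case-study data (illustrative only): flood damage driven by
# hazard intensity (confounder) and an adaptation measure (treatment).
hazard = rng.uniform(0.0, 1.0, n)                     # e.g., normalized water depth
measure = (rng.uniform(size=n) < 0.5).astype(float)   # measure implemented?
damage = 2.0 * hazard - 0.8 * measure + rng.normal(0.0, 0.1, n)

# Design matrix: intercept, confounder, treatment indicator.
X = np.column_stack([np.ones(n), hazard, measure])

# Bayesian linear regression with prior w ~ N(0, alpha^{-1} I) and noise
# precision beta: the posterior mean has a ridge-like closed form.
alpha, beta = 1e-3, 100.0
A = alpha * np.eye(3) + beta * X.T @ X
w_post = np.linalg.solve(A, beta * X.T @ damage)

effect = w_post[2]   # expected damage change from the measure,
                     # holding the hazard confounder fixed
print(f"estimated treatment effect: {effect:.2f}")
```

Including the confounder in the design matrix is what separates the measure's effect from damage differences caused merely by harsher hazards at unprotected sites.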

The overarching goal of FloodChat is to provide a quantitative global assessment of adaptation efforts to reduce future flood risks.

Viktoriia Huryn
MDC - Charité

Contact

Viktoriia Huryn
Multi-resolution models for single-cell genomics data (2022 - )

Supervisors:

Uwe Ohler (MDC)

Markus Schuelke-Gerstenfeld (Charité)

 

Single-cell genomics can obtain molecular data for tens of thousands of cells simultaneously. A typical experiment is carried out on a complex sample that contains different cell types, and can measure different cellular properties, such as the number of messenger RNA molecules per cell. Typical tasks include identifying distinct cell types (e.g. via unsupervised embeddings, [1]) or inferring a pseudo-temporal ordering of cells along developmental stages. A particular opportunity arises from single-cell genome accessibility data, which provides information about which of several million gene switches, so-called regulatory regions, are accessible/on or inaccessible/off [2].

These data can be analyzed at multiple resolutions: at the level of whole regions, to identify where active switch regions are and to infer which genes they may regulate, or at the level of short DNA sequence patterns within the regions, which are recognized by proteins to specifically activate the switches in, e.g., different cell types.

Models to utilize the power of single-cell genomics data, and accessibility in particular, are still in their infancy. The main challenge is that the higher number of cells (i.e. samples) is accompanied by high dropout: the readout covers only a few percent of all variables, and the resulting discrete count data is sparse. Additionally, ground-truth experimental data only exist for a handful of scenarios, making it hard to develop practically useful methods that work beyond simulated data.

The project will utilize data from the Schuelke lab to develop deep neural network approaches in the Ohler lab that enable flexible multi-resolution analyses: the goal is to devise models that are able to infer both active regulatory regions and the functional sequence patterns in them, while (a) leveraging data from smaller or larger cell neighborhoods as needed; (b) accounting for confounders such as variable dropout and cell type mixtures; and (c) utilizing auxiliary data from other single-cell experiments.

 

Full-length publications

-

Conference presentations

  1. V. Huryn, R. Monti, A.A. Rakowski, and V. Döring. Disentanglement learning for functional genomics data. (Poster presentation), Kipoi Summit. New Horizons in Computational Regulatory Genomics. Zugspitze, Germany, 25-26 September 2023.

Hendrik Junkawitsch
Helmholtz - HU

Contact

Hendrik Junkawitsch
Explainable Artificial Intelligence for Quantum Chemistry and Spectroscopy (2025 - )

Supervisors:

Prof. Ulf Leser (HU)
Prof. Annika Bande (Helmholtz)

 

Machine learning is increasingly applied in quantum chemistry and spectroscopy. Among spectroscopic techniques, X-ray Absorption Spectroscopy (XAS) is an important method for investigating the local chemical environment of elements in materials. The analysis of characteristic absorption intensities as a function of photon energy enables XAS to reveal structural properties such as oxidation states, coordination numbers, and interatomic distances around an absorbing atom. Consequently, XAS is invaluable in catalysis, environmental chemistry, battery and solar cell research, and bioinorganic chemistry.

However, acquiring and interpreting XAS spectra remain challenging and rely on significant experimental efforts by specialists or on computationally intensive methods. Here, machine learning provides an alternative by enabling trained models to predict spectra orders of magnitude faster than conventional methods, infer local structures directly from spectral data, and identify complex correlations between geometry and spectral features.

In this project, we aim to develop interpretable machine learning methods for predicting spectra from 3D structures using geometric Graph Neural Networks (GNNs). We focus on building chemically diverse benchmarking datasets that enable fair comparisons between methods and specifically aim for explainable models that capture physically and chemically meaningful relationships between structures and spectra.
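
A minimal, untrained sketch of the message-passing idea behind such graph neural networks follows; the graph, atom features, weights, and 50-point spectrum grid are all hypothetical placeholders, not the project's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy molecular graph: 4 atoms, bonds as an adjacency matrix.
# In a geometric GNN, node features would encode element type and geometry.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
H = rng.normal(size=(4, 8))              # initial atom embeddings

W1 = rng.normal(size=(8, 8)) * 0.1       # message-passing weights (untrained)
W_out = rng.normal(size=(8, 50)) * 0.1   # readout to a 50-point energy grid

def gnn_spectrum(A, H):
    """One round of neighbor aggregation, then a pooled spectrum readout."""
    deg = A.sum(axis=1, keepdims=True)
    H = np.tanh((A @ H) / deg @ W1 + H @ W1)  # aggregate neighbors + transform
    graph_emb = H.mean(axis=0)                # permutation-invariant pooling
    return graph_emb @ W_out                  # predicted absorption intensities

spectrum = gnn_spectrum(A, H)
print(spectrum.shape)   # one predicted intensity per grid point
```

The mean-pooling readout makes the prediction invariant to atom ordering, one of the symmetry properties that motivates GNNs for molecular property prediction.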

Daniel León Periñán
MDC - TU Berlin - Charité

Contact

Daniel León Periñán
Towards molecular digital pathology: leveraging spatial transcriptomics and deep learning to predict gene expression from tissue morphology in solid tumors (2022 - )

Supervisors:

Nikolaus Rajewsky (MDC)

Klaus-Robert Müller (TU)

Frederich Klauschen (Charité)

 

Despite enormous progress in understanding, early diagnosis, and treatment, solid tumors still account for a quarter of deaths. Solid tumors arise from somatic cells through the accumulation of molecular alterations that eventually lead to their uncontrolled proliferation, invasion of healthy tissues and, ultimately, distant metastases. In current clinical practice, the best treatment is chosen by assessing clinical presentation, histological tumor type, and molecular characteristics, such as oncogenic mutations and, in some cases, gene expression profiles. The evaluation of histomorphological tumor properties in combination with molecular profiles guides risk-adjusted, personalized therapies that aim to optimize outcomes for individual patients. However, diagnostic molecular profiling currently relies mostly on techniques performed on bulk tissue, without providing any spatial information about the observed molecular tumor properties. While this can already make the interpretation of mutational profiles challenging, spatially resolved profiling becomes indispensable for the molecular analysis of the tumor microenvironment, which is composed of multiple different cell types whose complex interactions influence therapy response. In this context, single-cell sequencing techniques offer a unique opportunity to provide both high spatial and high molecular resolution in the context of complex tumor histology.

Recently, the fruitful interdisciplinary collaboration between clinical, technological, and computational researchers has cultivated rapid progress in the field of digital pathology. AI-based models provide support to diagnostic pathology, for instance by automating tumor identification, tissue classification, and cell detection from tumor histology images. Current computational models, however, are limited in their ability to predict tumor molecular characteristics due to the scarcity of paired imaging and molecular training data. The development of spatial transcriptomics assays aims to fill this gap by enabling unbiased, transcriptome-wide profiling of mRNA expression in intact tissue sections. Thus, they represent an ideal source of such paired morphological and molecular data, with unprecedented resolution, for training the next generation of digital pathology algorithms.

The project aims to push forward the field of digital pathology by predicting gene expression in tissue space from histomorphology alone. To achieve that, we will train deep learning models on the high-resolution gene expression maps provided by spatial transcriptomics co-registered with the corresponding histomorphology images. Advances in explainable AI approaches will be leveraged to reveal which morphological features and areas are exploited by such models to predict gene expression. This would be vital to allow for a transparent decision process, crucial for medical applications, and would also deepen our understanding of tumor biology by correlating tissue and cellular composition/histomorphology of the tumor and its microenvironment with function. We anticipate such a computational approach to have a positive impact on clinical practices by facilitating the prediction of molecular properties from routine diagnostic H&E images and thus to complement or even partially replace molecular testing.

 

Full-length publications

  1. T.M. Pentimalli, S. Schallenberg, D. León-Periñán, ..., F. Klauschen, and N. Rajewsky (2025). Combining spatial transcriptomics and ECM imaging in 3D for mapping cellular interactions in the tumor microenvironment. Cell Systems. https://doi.org/10.1016/j.cels.2025.101261
  2. M. Schott, D. León-Periñán, ..., T.M. Pentimalli, ..., N. Karaiskos, and N. Rajewsky (2024). Open-ST: High-resolution spatial transcriptomics in 3D. Cell. https://doi.org/10.1016/j.cell.2024.05.055

Conference presentations

  1. D. León-Periñán, N. Karaiskos, M. Schott, E. Splendiani, E. Senel, and N. Rajewsky. Computational methods for high-resolution spatial transcriptomics. (Poster presentation), VIB Spatial Omics, Ghent, Belgium, 13-14 June 2024.

Oleksii Martynchuk
DLR - TU Berlin

Contact

Oleksii Martynchuk
Identification of rock falls in Mars Reconnaissance Orbiter images using machine learning (2020 - )

Supervisors:

Jürgen Oberst (DLR)

Odej Kao (TU)

 

Mars has kilometer-thick polar ice caps, which represent a unique laboratory to study the dynamics of ice sheets and the possible effects of climate change on a planet other than Earth. Most of the north polar cap margins are characterized by steep scarps, where avalanches and ice-block falls are frequently observed. These cause a measurable scarp retreat, depending on the season and the associated solar heating cycle. Rock falls also represent useful sources for seismic experiments and the exploration of the planet's subsurface. The High-Resolution Imaging Science Experiment (HiRISE) on board the Mars Reconnaissance Orbiter (MRO) has been monitoring the planet for more than a decade, returning images at resolutions up to 25 cm, many of them in stereo. However, the identification of such small "mass wasting" effects as block and rock falls is far from trivial, considering the vast number of images with their differing illumination and viewing geometries. The image analysis requires new approaches to automated detection involving machine learning that go beyond traditional classification and regression schemes. In this project we apply modern data science techniques, such as deep convolutional networks, to identify and measure rock sizes and volumes. The techniques are to be trained on established areas and ultimately applied to new sets of images to investigate the recent climate evolution and support the seismic exploration of the planet.
 

Full-length publications

-

Conference presentations

-

Abhay Mehta
DESY - HU Berlin

Contact

Abhay Mehta
Context awareness in real-time image classification for ground-based gamma-ray telescopes (2022 - )

Supervisors:

David Berge (DESY)

Matthias Weidlich (HU)

 

The sensitivity of ground-based gamma-ray telescopes is ultimately limited by their ability to reconstruct the properties of gamma rays from the particle showers produced when they interact with the atmosphere, and to reject the much more numerous background of showers from charged cosmic rays.

At this time, array sensitivity can be considered "software limited" and as such has continuously improved over the past 20 years through advanced image reconstruction and classification algorithms [1] and multivariate classification techniques such as boosted decision trees [2] and neural networks. Modern machine learning techniques can clearly improve this performance further. Yet telescope arrays typically make observations under a huge range of context conditions. It is therefore of utmost importance to integrate contextual data on the operation of the telescope, as well as on atmospheric conditions, into any data analysis pipeline. This is challenging not only because of the heterogeneity of contextual data, but also from a computational point of view.

This PhD project sets out to develop data processing pipelines that combine deep learning techniques for gamma-ray telescope data with diverse types of contextual data to dramatically improve the telescopes’ sensitivity. To this end, we will draw on initial results on employing state-of-the-art machine learning techniques to gamma-ray telescope data [3] as well as technical insights into integration of static datasets into stream processing pipelines [4].

 

Full-length publications

  1. Mehta, A., Parsons, D., Holch, T. L., Berge, D., & Weidlich, M. (2025). Convolution and graph-based deep learning approaches for gamma/hadron separation in imaging atmospheric Cherenkov telescopes. In Proceedings of Science (Vol. 501). 39th International Cosmic Ray Conference (ICRC2025). 
  2. Kostunin, D., Sotnikov, V., Golovachev, S., Mehta, A., Holch, T. L., & Jones, E. (2025). Agent-based code generation for the Gammapy framework. arXiv.

Conference presentations

  1. A. Mehta. Machine Learning for Imaging Atmospheric Cherenkov Telescope (IACT) Background Rejection. H.E.S.S. Collaboration Meeting, Bordeaux, France, 24-29.09.2023.
  2. A. Mehta. CNN-GNN-based gamma/hadron classifiers for the H.E.S.S. experiment. H.E.S.S. Collaboration Meeting, Obergurgl, Austria, 30.03-03.04.2025.
  3. A. Mehta. Deep Learning for Ground-based Gamma-ray Astronomy. International Cosmic Ray Conference (ICRC2025), Geneva, Switzerland, 15-24.07.2025.

Lusinè Nazaretyan
Charité - MDC - HU Berlin

Contact

Lusinè Nazaretyan
Identification of Disease-causing Genetic Variants by Genome-wide Predictions of Human Variant Effects (2020 - )

Supervisors:

Martin Kircher (Charité)

Dieter Beule (MDC)

Ulf Leser (HU)

 

More than 15 years after the initial sequencing of the human genome, exome and whole-genome sequencing are widely performed for research and clinical applications. Despite much progress, pinpointing the few phenotypically causal variants among the millions of variants in our genomes remains a major challenge. To illustrate the problem, the NCBI dbSNP and ClinVar databases report almost 700 million variants discovered in healthy and diseased humans, but only about 500,000 (<0.1%) are clinically or functionally characterized. For diagnostics, it is critical to identify the causal variants among thousands of variants with no physiological effect.
Many algorithms for predicting the functional impact of variants have been proposed in the past, but they are largely limited to highly conserved positions in protein-coding sequences and do not interpret variants genome-wide. Kircher et al. previously developed a computational method (Combined Annotation Dependent Depletion, CADD) that combines diverse annotations – from large-scale epigenetic experiments and comparisons of genomes across species to gene model annotations. Using a linear model, CADD integrates the available information in a unified framework and quantifies organismal deleteriousness on a whole-genome and variant-specific scale. While CADD has been successfully applied in thousands of disease studies, its best performance is still observed for the interpretation of variants in and around protein-coding genes.
The reasons for this are manifold, ranging from the lack of domain-specific features (e.g. for non-coding and regulatory sequences or 3D genome architecture) to shortcomings in the actual model (e.g. non-linearity, missing feature interactions, mislabeled training data). Here, we propose a joint effort between a group that routinely analyses and interprets genetic data from individual patients, families or large research cohorts and a research group developing computational methods for variant prioritization, to significantly improve the current method and advance the automatic reporting of potentially clinically relevant variants from genetic data. For this purpose, our project has the following aims: (1) Establishing model training for unlabeled or mislabeled data, for example by semi-supervised and iterative learning approaches. (2) Systematic exploration of feature interactions and non-linearity, including but not limited to alternative learning approaches like boosting trees and neural networks, feature transformations, or automated selection of interaction terms. (3) Integration of large sets of correlated annotations (e.g. ENCODE, IHEC) through dimensionality reduction, orthogonalization approaches, parallelization and training via subsampling, or hierarchical integration of models. (4) Using additional and genome-wide available measures of sequence constraint, like population variant density and sequence-dependent mutational load, to complement species conservation.
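A toy sketch of aim (1), iterative semi-supervised learning, can be given in plain Python. The nearest-centroid classifier, two-feature variants, and class names below are hypothetical stand-ins for the actual CADD feature set and models.

```python
import math

def centroid(pts):
    return [sum(p[i] for p in pts) / len(pts) for i in range(len(pts[0]))]

def self_train(labeled, unlabeled, per_round=2):
    """Iteratively move the most confidently classified unlabeled points
    (largest distance margin to the nearest class centroid) into the
    labeled pool, then retrain on the enlarged pool."""
    labeled, unlabeled, assigned = list(labeled), list(unlabeled), {}
    while unlabeled:
        groups = {}
        for x, label in labeled:
            groups.setdefault(label, []).append(x)
        cents = {label: centroid(pts) for label, pts in groups.items()}
        scored = []
        for x in unlabeled:
            d = sorted((math.dist(x, c), label) for label, c in cents.items())
            margin = d[1][0] - d[0][0]  # confidence = gap to runner-up class
            scored.append((margin, x, d[0][1]))
        scored.sort(key=lambda s: -s[0])
        for _, x, label in scored[:per_round]:
            labeled.append((x, label))
            assigned[tuple(x)] = label
            unlabeled.remove(x)
    return assigned

# hypothetical two-feature variants; class names are illustrative only
labeled = [([0.0, 0.1], "benign"), ([1.0, 0.9], "deleterious")]
unlabeled = [[0.1, 0.0], [0.9, 1.0], [0.2, 0.2], [0.8, 0.8]]
assigned = self_train(labeled, unlabeled)
print(assigned[(0.2, 0.2)], assigned[(0.8, 0.8)])  # → benign deleterious
```

The same loop structure applies when the base learner is a boosting tree or neural network, which is where mislabeled-data detection (aim 1) and feature-interaction modeling (aim 2) come in.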

 

Full-length publications

  1. M. Schubach, L. Nazaretyan and M. Kircher (2023). The Regulatory Mendelian Mutation score for GRCh38. GigaScience, 12, giad024. https://doi.org/10.1093/gigascience/giad024
  2. M. Schubach, T. Maass, L. Nazaretyan, S. Röner, and Martin Kircher (2024). CADD v1.7: Using protein language models, regulatory CNNs and other nucleotide-Level scores to improve genome-wide variant predictions. Nucleic Acids Research, 52, D1, D1143-D1154. https://doi.org/10.1093/nar/gkad989

 

Conference presentations

  1. L. Nazaretyan, M. Schubach, and M. Kircher. The Regulatory Mendelian Mutation (ReMM) score for GRCh38. (Poster presentation), ISMB/ECCB 2021, Online, 25-30 July, 2021.
  2. L. Nazaretyan, M. Schubach, and M. Kircher. The Regulatory Mendelian Mutation (ReMM) score for GRCh38. (Poster presentation), ESHG 2021, Online, August 28–31, 2021.
  3. L. Nazaretyan, M. Kircher, and U. Leser. Benchmarking machine learning methods for identification of mislabeled data. (Poster presentation), ECCB 2022, Sitges, Spain, September 18-21, 2022.
  4. M. Schubach, T. Maass, L. Nazaretyan, S. Röner, and M. Kircher. CADD v1.7: Using protein language models, regulatory CNNs and other nucleotide-level scores to improve genome-wide variant predictions. (Poster presentation), Genome Informatics, Cold Spring Harbor, 6-9 December 2023.

Elizabeth Robertson
DLR - TU Berlin

Contact

Elizabeth Robertson
Building a Photonic Processor for Energy-Efficient AI (2020 - )

Supervisors:

Janik Wolters (DLR)

Guillermo Gallego (TU)

 

Classical digital computer architectures are visibly approaching their technological and physical limits. Thus, there is growing interest in developing post-digital computing approaches to overcome these limitations. Besides quantum computers, approaches that emulate neuromorphic processes represent a promising alternative because they mimic the massively parallel, energy-efficient computations carried out by the human brain. Such computations constitute the building blocks of the pattern recognition algorithms underpinning the success of machine learning (ML). Optically integrated systems promise 2–3 orders of magnitude higher energy efficiency compared to today's electronic approaches. Among other benefits, post-digital computing concepts will enable numerous new applications for ML in data centers, security systems, autonomous vehicles, drones and satellites – any area where massive amounts of computation are needed but power and time are limited. In this project we will realize ML with optical neural networks. That is, we want to use light to power machine learning, owing to the potential advantages that an optical neural network (ONN) has over one emulated on conventional GPU chips. Moreover, we will investigate the potential of neuromorphic computing hardware for ML on low-power autonomous systems.
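The core ONN primitive — a fixed matrix-vector multiply performed by light propagation, followed by a physical nonlinearity — can be sketched numerically. The saturable-absorption form of the nonlinearity and the weights below are illustrative assumptions, not the project's actual hardware model.

```python
def matvec(M, v):
    # in hardware, this multiply is performed passively by light propagation
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def atomic_nonlinearity(I, sat=1.0):
    # assumed saturable-transmission form: weak signals pass, strong ones saturate
    return I / (1.0 + abs(I) / sat)

def onn_forward(x, layers):
    """Alternate 'optical' linear layers with a physical nonlinearity;
    the final layer is a linear readout."""
    for M in layers[:-1]:
        x = [atomic_nonlinearity(v) for v in matvec(M, x)]
    return matvec(layers[-1], x)

# illustrative fixed weights (in an ONN these would be set by the optics)
W1 = [[1, 0], [0, 1], [1, 1]]
W2 = [[1, 1, 1]]
y = onn_forward([0.5, 0.25], [W1, W2])
print(y)
```

The energy argument rests on the matvec lines: they represent multiply-accumulate operations that cost (almost) no energy when done with light, leaving only the nonlinearity and readout to the electronics.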

 

Full-length publications

  1. L. Jaurigue, E. Robertson, J. Wolters, and K. Lüdge (2021). Reservoir computing with delayed input for fast and easy optimisation. Entropy, 23, 1560. https://doi.org/10.3390/e23121560
  2. M. Yang, E. Robertson, L. Esguerra, K. Busch and J. Wolters (2023). Optical convolutional neural network with atomic nonlinearity. https://doi.org/10.48550/arXiv.2301.09994 [Preprint]
  3. L. Meßner, E. Robertson, L. Esguerra, K. Lüdge and J. Wolters (2023). Multiplexed random-access optical memory in warm cesium vapor. https://doi.org/10.48550/arXiv.2301.04885 [Preprint]
  4. E. Robertson, L. Esguerra, L. Meßner, G. Gallego, and J. Wolters (2024). Machine-learning optimal control pulses in an optical quantum memory experiment. Phys. Rev. Applied, 22, 024026. https://doi.org/10.1103/PhysRevApplied.22.024026

 

Conference presentations

  1. E. Robertson, L. Jaurigue, L. Meßner, L. Esguerra, G. Gallego, K. Lüdge, and J. Wolters. A scheme for optical reservoir computers with atomic memory. (Poster presentation), Hot Vapor Workshop, Stuttgart / Online, 22-24 March 2021.
  2. L. Esguerra, L. Meßner, E. Robertson, N.V. Ewald, M. Gündoğan, and J. Wolters (2022). Optimization and readout-noise analysis of a hot vapor EIT memory on the Cs D1 line. Quantum Physics. https://doi.org/10.48550/arXiv.2203.06151 [Preprint]

Saran Rajendran Sari
GFZ

Contact

Saran Rajendran Sari
Unsupervised Geo-Dynamic Separation (2025 - )

Supervisors:

J. Saynisch-Wagner (GFZ)

Jürgen Kurths (HU)

 

This project aims to develop an unsupervised geo-dynamic separation approach to disentangle satellite-based Earth observations into their underlying dynamic components, such as tides, convection, and turbulence. Focusing initially on oceanic sea surface temperature (SST), the method seeks to improve estimates of oceanic heat uptake and, consequently, our understanding of climate change. The approach employs two neural networks, the Separator-NN and the Reconstructor-NN, trained adversarially in a novel dynamics-focused masked autoencoder framework. The Separator-NN learns to produce disjoint dynamic components, while the Reconstructor-NN evaluates the forecast skill lost when one component is removed, serving as a formal unsupervised separation criterion. This eliminates reliance on prior physical models or supervised labels, reducing bias and enabling generalization to various geophysical datasets. Beyond the Earth sciences, the proposed framework can extend to complex systems such as epidemiology, traffic dynamics, and ecological monitoring, offering a versatile and scalable tool for dynamic process separation across disciplines.
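The forecast-skill criterion can be illustrated with a toy, non-neural stand-in: a synthetic "tidal" component plus noise is separated by phase averaging, and a seasonal-persistence forecast measures how much predictability each component carries (the Reconstructor-NN's role, in spirit). Signal parameters and the separator are invented for illustration.

```python
import math, random

random.seed(0)
P, n = 24, 480  # period of the synthetic "tidal" component, series length
signal = [math.sin(2 * math.pi * t / P) + random.gauss(0, 0.3) for t in range(n)]

# crude separator: phase averaging estimates the periodic component
phase_mean = [sum(signal[t] for t in range(p, n, P)) / (n // P) for p in range(P)]
periodic = [phase_mean[t % P] for t in range(n)]
residual = [s - c for s, c in zip(signal, periodic)]

def forecast_error(x, lag=P):
    """Seasonal-persistence forecast x[t] ~ x[t-lag]; the mean squared error
    measures how much predictable structure a component carries."""
    errs = [(x[t] - x[t - lag]) ** 2 for t in range(lag, len(x))]
    return sum(errs) / len(errs)

print(forecast_error(periodic), forecast_error(residual))
```

The periodic component is perfectly forecastable (error 0) while the residual is not: removing the periodic component destroys forecast skill, which is exactly the signature the Reconstructor-NN turns into a separation criterion.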

Jonas Schaible
HZB - FU Berlin

Contact

Jonas Schaible
Data-driven performance optimization of coloured and textured solar modules (2022 - )

Supervisors:

Christiane Becker (HZB)

Christof Schütte (FU)

Sven Burger (ZIB)

 

With the growing share of photovoltaic (PV) solar energy in the global energy generation capacity, building-integrated photovoltaics (BIPV) gains increasing importance. For BIPV, aesthetic factors, such as color, play a larger role than for large industrial PV fields. The structural color of PV modules and surface texturing for minimizing reflective losses and for self-cleaning have to be considered for accurately estimating the optical performance and, subsequently, the annual energy yield. This raises several issues which are hardly considered in the state-of-the-art numerical methods used for planning PV systems and estimating their performance: Which specific module surface treatments ensure maximum energy yield? How does the module appear visually to an observer throughout the day and year? How do the spectral albedo of the surroundings, local weather conditions and local shadowing affect the result? How can the complex environments of building-integrated and bifacial solar modules be simulated as efficiently as possible?

This project aims to develop an all-encompassing optical modeling and optimization toolbox for individual PV modules, considering the full optical circumstances from specifically textured module surfaces to local solar irradiance conditions. Together with urban planners and architects, desirable color effects will be identified. Metrics will be developed to quantify the aesthetic effect of the color appearance of BIPV modules. The PhD student will model PV modules with different textured surfaces and coloring techniques, and account for shadowing and spectral reflection of the surroundings, e.g. from overgrown ground and trees.

 

Full-length publications

  1. J. Schaible, H. Winarto, V. Škorjanc, D. Yoo, L. Zimmermann, K. Jäger, I. Sekulic, P.-I. Schneider, S. Burger, A. Wessels, B. Bläsi, and C. Becker (2024). Optimizing Aesthetic Appearance of Perovskite Solar Cells Using Color Filters. Solar RRL. https://doi.org/10.1002/solr.202400627

Conference presentations

  1. J. Schaible, B. Nouri, T. Kotzab, M. Loevenich, N. Blum, A. Hammer, K. Jäger, C. Becker and S. Wilbert. Application of Nowcasting to Reduce the Impact of Irradiance Ramps on PV Power Plants. (Oral presentation), EU PVSEC, Lisbon, Portugal, 18-22 September 2023.
  2. J. Schaible, H. Winarto, D. Yoo, L. Zimmermann, A. Wessels, K. Jäger, B. Bläsi, S. Burger, C. Becker. On aesthetical appearance of colored perovskite solar modules. (Oral presentation), SPIE Photonics Europe, Strasbourg, France, 8-12 April 2024.
  3. J. Schaible, H. Winarto, D. Yoo, L. Zimmermann, A. Wessels, K. Jäger, B. Bläsi, S. Burger, C. Becker. On aesthetical appearance of colored perovskite solar modules. (Poster presentation), 16th Annual Meeting Photonic Devices (AMPD2024), Berlin, Germany, 17-19 April 2024.
  4. J. Schaible, M. Götz, S. Burger, C. Becker, and K. Jäger. A New Algorithm to Design PV with Arbitrary Color. European Photovoltaic Solar Energy Conference and Exhibition (EU PVSEC), Bilbao, Spain, 22-26 September 2025.
  5. J. Schaible, H. Winarto, D. Yoo, L. Zimmermann, A. Wessels, K. Jäger, I. Sekulic, P.-I. Schneider, B. Bläsi, S. Burger, and C. Becker. Optimization strategies for colorful thin film solar cells. SPIE Photonics West 2025, San Francisco, USA, 27-31 January 2025.

Karen Schmieder
Max Delbrück Center (MDC) - (BIH/Charité)

Contact

Karen Schmieder
Metabolic Reprogramming of Immune Cells During Age-Dependent Immune Responses: A Multi-Modal Data Analysis (2025 - )

Supervisors:

Jana Wolf (MDC)

Birgit Sawitzki (BIH/Charité)

 

The immune system orchestrates complex, dynamic interactions involving intercellular signalling, gene regulation, and molecular networks. Crucially, immune cell activation is tightly coupled with cellular metabolic reprogramming, a process wherein immune cells dynamically adjust their metabolic pathways—such as glycolysis and lipid metabolism—in response to activation signals. These metabolic shifts are essential for supporting immune effector functions. Aging significantly impacts these metabolic adaptations, resulting in distinct metabolomic profiles that influence both the magnitude and quality of immune responses.

To investigate how aging modulates immune-metabolic interactions, we integrate single-cell RNA sequencing (scRNA-seq) data and bulk metabolomics measurements collected from young and aged mice following immune challenges. Through this multi-modal approach, we characterize age-dependent temporal dynamics of immune cell responses and metabolic states. These integrated data sets will be embedded within genome-scale metabolic networks (GEMs), computational models consisting of thousands of interconnected metabolic reactions. GEMs have previously been successfully utilized to characterize cell-type-specific metabolic profiles and may thus provide a powerful framework to deepen our understanding of age-associated immune-metabolic reprogramming.

 

Carl Stadie
AWI - TU Berlin

Contact

Carl Stadie
Capturing the Arctic Driftwood Carbon Pool with Remote Sensing Based Deep Learning and Foundation Models (2025 - )

Supervisors:

Guido Grosse (AWI)
Begüm Demir (TU Berlin)

My project aims to study driftwood deposits along Arctic coasts as a window into how polar environments store and move carbon. By mapping where these deposits accumulate and how they change, we can learn about past storms, river discharge, sea ice, and coastal processes, and, most importantly, the carbon stored in driftwood deposits. In the project, we build large-scale maps of driftwood from Earth-observation imagery and apply modern deep learning methods to detect deposits consistently at scale. We combine these maps with supporting data to estimate how much wood (and therefore carbon) is present and how it varies across regions and through time. The results will offer the first broad picture of the Arctic driftwood carbon pool. This information can help climate scientists, ecologists, and coastal managers improve reconstructions of Arctic change and refine carbon-budget estimates. Finally, the project shows how satellite data and data-driven methods can open new perspectives on remote, rapidly changing ecosystems.

Relevant publications or conference presentations: 

  1. Stadie, C., Brandt, M., Nitze, I. et al. Large driftwood accumulations along arctic coastlines and rivers. Sci Rep 15, 32500 (2025). https://doi.org/10.1038/s41598-025-17426-y

Yifan Tian
DESY - HU Berlin

Contact

Yifan Tian
Data Stream Processing for Transient Detection in Gamma-Ray Observatories

Supervisors:

Matthias Weidlich (HU)

David Berge (DESY) 

 

Gamma-ray transients are brief bursts of gamma-ray radiation from astrophysical sources, such as gamma-ray bursts (GRBs) and active galactic nuclei (AGN). Detecting and analyzing these transients is crucial for understanding these extreme astrophysical processes. 

Traditional search methods for gamma-ray transients often rely on obtaining a probability density function (PDF) for the positional uncertainty region and aim to cover that region as quickly as possible. This ‘brute force’ approach can be suboptimal due to the limited visibility and field of view of gamma-ray telescopes. It also ignores important information, such as the expected energy spectrum of the transient source, the anticipated changes in its emission over time, and the instrument’s performance under various observing conditions. Moreover, the large positional uncertainty of gravitational-wave (GW) events as detected by current GW detectors challenges detection with telescopes of limited field of view. This is particularly true for complex multi-site observatories such as the Cherenkov Telescope Array Observatory (CTAO). 

This project will develop an integrated computational framework that combines Bayesian inference, machine learning, and domain-specific knowledge of astrophysical sources and instruments in response to real-time alerts. In parallel, a low-latency, high-throughput system will be built to support near-real-time decision-making on streaming data. By focusing observations on the most promising regions of the sky at optimal times, guided by predicted source behavior and instrument performance, the framework aims to maximize the detection probability of high-energy gamma-ray transients. 
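The scheduling idea — pointing where localization probability and predicted detectability jointly peak — can be sketched as a greedy tile ranking. Tile names and numbers below are invented for illustration, not real CTAO quantities.

```python
def rank_tiles(tiles, n_obs=2):
    """Greedy follow-up plan: observe the tiles maximizing expected detection
    probability = localization probability x predicted detectability, where
    'detectability' folds in visibility, instrument response and the expected
    source spectrum at observation time."""
    scored = sorted(tiles, key=lambda t: t["loc_prob"] * t["detectability"],
                    reverse=True)
    return [t["name"] for t in scored[:n_obs]]

# invented sky tiles from a hypothetical GW localization map
tiles = [
    {"name": "A", "loc_prob": 0.40, "detectability": 0.10},  # likely, but near horizon
    {"name": "B", "loc_prob": 0.25, "detectability": 0.90},
    {"name": "C", "loc_prob": 0.20, "detectability": 0.80},
    {"name": "D", "loc_prob": 0.15, "detectability": 0.30},
]
print(rank_tiles(tiles))  # → ['B', 'C']
```

The point of the example: tiles B and C outrank A even though A holds the most localization probability, because A is barely observable. The project's Bayesian framework replaces the static "detectability" number with model-based, time-dependent predictions computed on streaming data.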

The successful implementation of this framework promises to provide deeper insights into high-energy astrophysical phenomena and enhance the scientific return from CTAO observations. 

Christian Utama
FU Berlin - HZB

Contact

Christian Utama
Explainable Artificial Intelligence and Trust in the Energy Sector (2020 - )

Supervisors:

Christian Meske (FU)

Rutger Schlatmann (HZB)

 

Today, because of the complexity of the underlying machine learning models, AI often appears as a ‘black box’: the internal learning and optimization processes are not completely comprehensible. To tackle this trade-off between performance and transparency, methods of “Explainable Artificial Intelligence” (XAI) have been developed to increase the transparency of the underlying models without decreasing their performance. In this context, the PhD project aims to apply XAI methods to machine learning models in two use cases in the field of energy, especially photovoltaics, to increase the models’ explainability and hence trustworthiness.

The first use case is linked to the existing HEIBRiDS project “Optimization of solar energy yield..”. The control strategies in this project incorporate predictive and prescriptive data analytics based on machine learning approaches, which, however, represent black boxes. The XAI research proposed here makes it possible both to improve the trust in and acceptability of AI-based solutions and to generate findings that sustainably improve product design or system configurations.

The second use case relates to the “Combinatorial materials discovery” pursued in the HZB research groups Unold/Schorr and Abdi/van de Krol, which focuses on the exploration of light-absorbing semiconductors and catalysts for solar energy conversion devices by combinatorial high-throughput methods. This generates high-dimensional sets of data, which have to be searched and analyzed for structure-property-function relationships and from which guidance for further experiments is sought. For this purpose, we aim to use machine learning approaches to automate time-consuming procedures in the analysis of the multidimensional datasets. The introduction of machine learning in combination with XAI methods is hence expected to accelerate development processes, to provide new physical understanding and eventually to support the efficient development of materials for solar conversion devices.
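One widely used XAI instrument, permutation feature importance, can be sketched from scratch on a toy PV model. The model form, coefficients, and data ranges are invented for illustration and are not the project's actual use case.

```python
import random

random.seed(1)

# toy PV model: power depends strongly on irradiance, weakly on module temperature
def pv_power(irradiance, temperature):
    return 0.2 * irradiance * (1 - 0.004 * (temperature - 25))

data = [(random.uniform(100, 1000), random.uniform(5, 40)) for _ in range(200)]
target = [pv_power(g, t) for g, t in data]

def mse(rows):
    return sum((pv_power(g, t) - y) ** 2 for (g, t), y in zip(rows, target)) / len(rows)

def permutation_importance(col):
    """Error increase when one input column is shuffled: a model-agnostic
    measure of how much the model relies on that feature."""
    shuffled = [r[col] for r in data]
    random.shuffle(shuffled)
    permuted = [tuple(v if i != col else s for i, v in enumerate(r))
                for r, s in zip(data, shuffled)]
    return mse(permuted) - mse(data)

imp_irradiance = permutation_importance(0)
imp_temperature = permutation_importance(1)
print(imp_irradiance > imp_temperature)  # irradiance dominates, as built in
```

Because the measure is model-agnostic, the same procedure applies to the black-box predictive controllers and fault-detection models studied in the project.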

The results of the PhD project will be synthesized to contribute knowledge regarding a) the validity and reliability of XAI methods in the field of photovoltaic, b) new opportunities for data pre-processing and data analytics based on XAI outcomes, c) consequent effects on the development of (and trust towards) energy systems, and d) the findings’ transferability to other fields in the energy sector.

 

Full-length publications

  1. C. Utama, C. Meske, J. Schneider, and C. Ulbrich (2022). Reactive power control in photovoltaic systems through (explainable) artificial intelligence. Applied Energy, 328. https://doi.org/10.1016/j.apenergy.2022.120004
  2. C. Utama, B. Karg, C. Meske, and S. Lucia (2022). Explainable artificial intelligence for deep learning-based model predictive controllers. In Proceedings of the 26th International Conference on System Theory, Control and Computing (ICSTCC), 464-471. https://doi.org/10.1109/ICSTCC55426.2022.9931794
  3. C. Utama, C. Meske, J. Schneider, R. Schlatmann, and C. Ulbrich (2023). Explainable artificial intelligence for photovoltaic fault detection: A comparison of instruments. Solar Energy, 249, 139–151. https://doi.org/10.1016/j.solener.2022.11.018

 

Conference presentations

  1. C. Utama, C. Meske, J. Schneider, and C. Ulbrich. Reactive power control in photovoltaic systems through (explainable) artificial intelligence. (Poster presentation), 8th World Conference on Photovoltaic Energy Conversion (WCPEC-8), Milan, Italy, 26-30 September 2022.
  2. C. Utama, B. Karg, C. Meske, and S. Lucia. Explainable artificial intelligence for deep learning-based model predictive controllers. (Oral presentation), 26th International Conference on System Theory, Control and Computing (ICSTCC), Online, 19-21 October 2022.

Femke van Geffen
AWI - TU Berlin

Contact

Femke van Geffen
New routines to explore modern genomic data to assess ancient DNA records from the Last Ice Age (2019 - )

Supervisors:

Ulrike Herzschuh (AWI)

Begüm Demir (TU)

 

The Polar Terrestrial Environmental Systems Research Group at the Alfred-Wegener-Institut conducts multiple field trips a year that, among other goals, aim to collect data to monitor the land cover and vegetation dynamics in the Arctic region. Vegetation dynamics can provide insight into the effects of global warming on the environment. The aim of the current project is to employ machine learning and deep learning methods to analyse these data and gain better insights into the dynamics of the vegetation species and how they are changing over time. To accomplish this goal, various types of remote sensing data are used, such as Sentinel-2 and Landsat 7/8 imagery as well as drone data collected in the field. The ultimate goal is to develop a fusion method that can utilise the available data to create a comprehensive overview of vegetation dynamics of the past, present and future.

 

Full-length publications

  1. F. van Geffen, B. Heim, F. Brieger, ..., U. Herzschuh, and S. Kruse. SiDroForest: A comprehensive forest inventory of Siberian boreal forest investigations including drone-based point clouds, individually labelled trees, synthetically generated tree crowns and Sentinel-2 labelled image patches. Earth Syst. Sci. Data, 14, 4967–4994, 2022. https://doi.org/10.5194/essd-14-4967-2022
  2. F. van Geffen, R. Hänsch, B. Demir, S. Kruse, U. Herzschuh and B. Heim (2025). A Benchmark Dataset for Sentinel-2 Based Forest Type Classification in the Siberian Summergreen-Evergreen Forest Transition Zone. in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, https://doi.org/10.1109/JSTARS.2025.3562912

 

Conference presentations

  1. F. van Geffen, B. Heim, U. Herzschuh, L. Pestryakova, E. Zakharov, R. Hänsch, B. Demir, B. Kleinschmit, M. Förster, and S. Kruse. SiDro Forest: Siberian drone-mapped forest inventory. (Oral presentation), Arctic Science Summit Week, Lisbon, Portugal / Online, 19-26 March 2021.
  2. F. van Geffen, B. Heim, U. Herzschuh, L. Pestryakova, E. Zakharov, R. Hänsch, B. Demir, B. Kleinschmit, M. Förster, and S. Kruse. SiDro Forest: Siberian drone-mapped forest inventory. (PICO presentation), EGU General Assembly, Online, 19-30 April 2021. https://doi.org/10.5194/egusphere-egu21-15106 

Emma Vinson
AWI - KIT - HU

Contact

Emma Vinson
Reconstructing spatial climate variability patterns using Bayesian hierarchical models (2025 - )

Supervisors:

Thomas Laepple (AWI)

Nadja Klein (KIT)

Tobias Krüger (HU)

 

Understanding Earth’s natural climate variability over the past millennia is crucial for anticipating future climate trends and their associated risks. While climate models are essential for assessing plausible future climate trajectories, they often underestimate temperature variability at supra-decadal and regional scales. This mismatch can affect how we attribute climate change and design effective regional adaptation strategies. 
 
Paleoclimate records — such as marine sediments, ice cores, tree rings and other proxies — offer a window into Earth’s climate over thousands of years. However, these records are sparse, irregular in space and time, and carry uncertainties in dating and measurement. Despite these challenges, the vast number of available proxy records provides an opportunity to better understand climate variability — if analyzed with appropriate methods. 

This research aims to develop a Bayesian framework for analyzing climate signals in the frequency domain using multiple, spatially dependent power spectra. The project will: 
i) create statistical models that account for sparse, irregular, and uncertain spatio-temporal climate data, 
ii) incorporate geographical relationships through hierarchical structures, 
iii) integrate diverse types of paleoclimate records and their uncertainties into a single model, and 
iv) ensure computational efficiency to handle large datasets spanning thousands of locations and thousands of years. 
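The frequency-domain starting point — estimating spectral power from an irregularly sampled proxy record — can be sketched via least-squares sinusoid fitting (Lomb-Scargle in spirit). The synthetic 100-year cycle, sampling times, and noise level below are assumptions for illustration.

```python
import math, random

def ls_power(times, values, freq):
    """Squared amplitude of the best-fitting sinusoid at one frequency,
    obtained by least squares; valid for irregular sampling and assuming
    the record is roughly mean-centered."""
    c = [math.cos(2 * math.pi * freq * t) for t in times]
    s = [math.sin(2 * math.pi * freq * t) for t in times]
    scc = sum(ci * ci for ci in c)
    sss = sum(si * si for si in s)
    scs = sum(ci * si for ci, si in zip(c, s))
    bc = sum(x * ci for x, ci in zip(values, c))
    bs = sum(x * si for x, si in zip(values, s))
    det = scc * sss - scs * scs  # 2x2 normal equations, solved in closed form
    a = (bc * sss - bs * scs) / det
    b = (bs * scc - bc * scs) / det
    return a * a + b * b

# synthetic proxy record: 100-year cycle, irregular ages, measurement noise
random.seed(2)
times = sorted(random.uniform(0, 1000) for _ in range(300))
values = [math.sin(2 * math.pi * t / 100) + random.gauss(0, 0.5) for t in times]

p_true = ls_power(times, values, 1 / 100)
p_off = ls_power(times, values, 1 / 37)
print(p_true > p_off)  # power concentrates at the true 100-year period
```

The Bayesian hierarchical model extends this building block by treating such spectra at many locations jointly, propagating dating and measurement uncertainty instead of fixing `times` and `values`.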
 
By combining these elements, the project will produce spatial maps of temperature variability across the Earth during the late Holocene. This novel approach — focusing on the frequency domain, applying Bayesian methods, and aggregating multiple types of paleoclimate records — will yield a more precise understanding of how temperature has varied over decades to millennia at different geographical scales. 
 
 

Piet Lennart Wagner
HZB - HU

Contact

Piet Lennart Wagner
Harnessing Language Models for Knowledge Discovery in Specialized Scientific Domains (2025 - )

Supervisors:

Alan Akbik (HU) 

Thomas Unold (HZB) 

 

The large body of scientific publications and the constant generation of experimental data in materials research present significant opportunities for discovery and innovation. However, the integration of these data sources is often slowed by inefficient, manually-intensive processes. This project aims to enhance materials research by developing tools that connect experimental results with relevant information extracted from scientific literature through natural language processing (NLP) techniques. 
Our approach involves creating a framework to bridge the gap between experimental workflows and literature-derived insights. We will develop tools to automatically identify and retrieve literature relevant to ongoing experiments, extracting critical information on materials properties and processing conditions. This information will be integrated directly into the experimental workflow, improving the efficiency and effectiveness of materials discovery and optimization. 
A high-throughput combinatorial materials exploration workflow will be used as a model system. This system produces compositional gradient samples, which are characterized using automated optical characterization techniques. By leveraging this advanced experimental setup, we generate and analyze large datasets, providing a testbed for the integrated tools. 
The innovation of this project lies in the application of NLP to extract useful information from the extensive body of scientific literature. Our tools will parse and interpret complex scientific texts, extracting relevant data on materials composition, properties, and processing methods. This data will then be mapped onto experimental data, facilitating a dynamic research environment where literature-derived knowledge informs experimental design and decision-making. 
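As a deliberately simplistic stand-in for this extraction step, a regex can pull numeric property mentions from abstract-like text. The mini-corpus, pattern, and band-gap values below are invented; a real pipeline would use trained NLP models rather than hand-written patterns.

```python
import re

# hypothetical mini-corpus of abstract sentences (invented for illustration)
abstracts = [
    "The CsPbI3 films showed a band gap of 1.73 eV after annealing at 350 C.",
    "We report a direct bandgap of 1.55 eV for the perovskite absorber.",
    "Annealing at 500 C increased grain size but left the optical gap unchanged.",
]

# matches "band gap of <value> eV" and "bandgap of <value> eV"
PATTERN = re.compile(r"band\s?gap of (\d+\.\d+)\s*eV", re.IGNORECASE)

def extract_bandgaps(texts):
    hits = []
    for t in texts:
        for m in PATTERN.finditer(t):
            hits.append(float(m.group(1)))
    return hits

print(extract_bandgaps(abstracts))  # → [1.73, 1.55]
```

The third sentence illustrates why patterns are not enough: "optical gap" carries the same concept but escapes the regex, which is exactly the gap that learned extraction models are meant to close.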


By integrating literature-based insights into the experimental workflow, this project aims to accelerate the pace of materials innovation. Researchers will have immediate access to relevant findings from the literature, enabling more informed hypotheses and experimental strategies. 


Ultimately, this project seeks to improve the landscape of materials research by using NLP to connect literature and experimental data in a practical way. The expected outcomes include a significant acceleration in the discovery and optimization of new photovoltaic materials, fostering innovation and advancing the frontiers of materials science.

Xiaoyan Yu
MDC - Charité - Uni Potsdam

Contact

Xiaoyan Yu
Deep Learning with sparse annotations for the analysis of lung tissue microscopy images (2020 - )

Supervisors:

Dagmar Kainmueller (MDC)

Andreas Hocke (Charité)

Marina Höhne (Uni Potsdam)

 

The field of digital pathology is growing, and expectations that artificial intelligence will enable automated diagnosis are rising. A key step towards automated diagnosis and other clinical tasks is semantic instance segmentation of medical images, i.e. the pixel-wise recognition of objects and their classification into meaningful categories. Semantic instance segmentation is also a classical task in computer vision, where neural networks have been highly successful on natural images thanks to large annotated datasets. While this can in principle be achieved for microscopy images, there are problems: new types of samples are continually being added to medical image datasets, and annotating samples of each type for training neural networks is not feasible. Moreover, diverse and constantly evolving microscopy modalities make the task even more complex. To date, there are no automated methods that perform the desired image analyses at the necessary level of accuracy without extensive manual input. This particularly holds for microscopy data of cells in heterogeneous tissue, where the cost of accurately outlining cell boundaries, be it as part of a manual analysis or of generating training data for deep learning methods, restricts the feasibility of high-content studies.

In this project, we aim to overcome this restriction by leveraging “sparse” annotations for training deep neural networks for semantic instance segmentation, which amounts to weak supervision. We will develop a model for learning pixel-accurate instance segmentation purely from center point annotations, which is an unsolved problem for clusters of densely packed objects like cells in tissue. Beyond center point annotations, we will investigate alternative sparse annotations, such as image-level labels, in terms of their potential to be generated by crowd workers. In the extreme case, we want to explore the possibility of solving the problem in an unsupervised manner, i.e. performing semantic instance segmentation on medical images without any labels, and to probe the limits of such an approach.
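A classical weak-supervision baseline for the center-point setting — threshold the foreground, then assign each foreground pixel to its nearest annotated point — can be sketched as follows. The image, threshold, and point annotations are synthetic, and this baseline is precisely what fails for densely packed cells, motivating the learned approach.

```python
import math

def masks_from_points(image, points, fg_thresh=0.5):
    """Assign every foreground pixel to its nearest annotated center point
    (a Voronoi-style partition restricted to the foreground mask)."""
    h, w = len(image), len(image[0])
    labels = [[0] * w for _ in range(h)]  # 0 = background
    for y in range(h):
        for x in range(w):
            if image[y][x] > fg_thresh:
                nearest = min(range(len(points)),
                              key=lambda i: math.dist((y, x), points[i]))
                labels[y][x] = nearest + 1  # instance ids start at 1
    return labels

# synthetic 8x8 image with two bright "cells" and one point annotation each
image = [[0.0] * 8 for _ in range(8)]
for y in (1, 2):
    for x in (1, 2):
        image[y][x] = 1.0
for y in (5, 6):
    for x in (5, 6):
        image[y][x] = 1.0
points = [(1.5, 1.5), (5.5, 5.5)]
labels = masks_from_points(image, points)
print(labels[1][1], labels[6][6], labels[0][0])  # → 1 2 0
```

For well-separated objects this partition is already pixel-accurate; for touching cells the straight Voronoi boundary cuts through instances, which is the failure mode the project's learned models are designed to fix.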

 

Full-length publications

  1. J.L. Rumberger, X. Yu, P. Hirsch, M. Dohmen, V.E. Guarino, A. Mokarian, L. Mais, J. Funke, and D. Kainmueller (2021). How shift equivariance impacts metric learning for instance segmentation. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).

 

Conference presentations

-