PEtab SciML: The missing layer for scalable and flexible scientific machine learning modeling in biology
Abstract
Mechanistic ordinary differential equation (ODE) models are powerful tools for studying dynamic biological systems. However, their predictive power is constrained by gaps, biases, and inconsistencies in the literature on which they are built. They also typically require quantitative time-lapse data for training, which is time-consuming to collect. Machine-learning approaches, by contrast, can capture complex patterns from data, but they are often harder to interpret and typically require large training datasets. Hybrid scientific machine learning (SciML) models offer a promising way to combine the strengths of both approaches by integrating mechanistic models with flexible data-driven modules.
Despite this promise, the use of SciML in biology remains limited by insufficient infrastructure. Dedicated software is needed because coding end-to-end differentiable workflows for gradient-based training of hybrid models is technically challenging. In addition, model exchange is hindered by the lack of a standardized, reproducible format for specifying SciML training problems, analogous to the PEtab standard for ODE models. To address these challenges, we developed PEtab-SciML, an extension of the PEtab format, and implemented support for it in the state-of-the-art modeling toolboxes PEtab.jl and AMICI. In this seminar, I will introduce the PEtab-SciML format. Using real-data examples, I will show how PEtab-SciML enables the integration of diverse data modalities into dynamic model training, such as learning the kinetic parameters of an ODE model from omics and protein sequence data. I will also show how it supports machine-learning-based black-boxing of complex model components, such as quarantine strength in an SIR model. Finally, I will show how PEtab-SciML enables the use of efficient training strategies, such as curriculum learning, that make SciML models easier to train and apply in practice.