Dynamics-based data science in biology

Life science has long been a rich subject of research and continues to develop rapidly. One of its major aims is to elucidate the mechanisms of biological processes on the basis of biological big data. Many statistics-based methods have been proposed to extract the essential information from such data, including classification, regression, clustering, statistical comparison, dimensionality reduction and component analysis. However, because temporal data have been lacking, these methods mainly elucidate static features or steady-state behavior of living organisms. A biological system is inherently dynamic, and as time-series data accumulate, there is a need for dynamics-based approaches, grounded in physical and biological laws, that reveal the dynamic features and complex behavior of biological systems [1]. In this perspective, we review three dynamics-based data science approaches for studying dynamical bio-processes: dynamical network biomarkers (DNB), landscapes of differentiation dynamics (LDD) and autoreservoir neural networks (ARNN). All three are data-driven, model-free approaches, yet each is grounded in a theoretical framework of nonlinear dynamics: ordinary differential equations (ODE), partial differential equations (PDE) and artificial neural networks (NN), respectively. Figure 1A and B illustrate dynamical bio-processes and their omics data in biomedical fields, while Fig. 1C summarizes the three approaches, which serve as typical examples of studying biological systems from a data-driven dynamical perspective.

Figure 1. Dynamics-based data science approaches. (A) Illustration of a dynamic biological process, which usually comprises a series of stages or states. (B) Illustration of omics data obtained from experiments. (C) Three examples of dynamics-based data science approaches. The first column shows the dynamical network biomarker (DNB) framework, which provides early-warning signals for detecting the pre-disease state (tipping point) from omics data of different health states, based on bifurcation theory for ordinary differential equations (ODE). The second column shows the landscape of differentiation dynamics (LDD) framework, which distinguishes cell types, provides pseudo-time trajectories for cell clusters (C1, C2, C3 and C4), and constructs a potential landscape of the cell differentiation process from single-cell RNA-sequencing (scRNA-seq) data, based on diffusion-map theory and a partial differential equation (PDE), namely the Fokker-Planck equation with source-sink terms. The third column shows the autoreservoir neural network (ARNN) framework, which predicts short time series from high-dimensional data by spatiotemporal information transformation (STI) while significantly saving computing resources, based on the delay-embedding and generalized embedding theorems of dynamical systems.
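The DNB criterion can be made concrete with one commonly used form of the composite index, CI = SD_in * PCC_in / PCC_out, which combines three conditions expected near a tipping point: the variables of a candidate module fluctuate strongly (high average standard deviation SD_in), become strongly correlated with one another (high average absolute Pearson correlation PCC_in), and decorrelate from the rest of the network (low PCC_out). The Python sketch below assumes the candidate module is already known; the function names and the synthetic "normal" and "pre-disease" data are hypothetical, and identifying the module from real data (for example by clustering) is not shown.

```python
import numpy as np


def dnb_composite_index(samples: np.ndarray, module: list[int]) -> float:
    """CI = SD_in * PCC_in / PCC_out for the samples of one state.

    samples: shape (n_samples, n_genes), all measured in the same state.
    module:  column indices of the candidate DNB (assumed already chosen).
    """
    module_set = set(module)
    others = [j for j in range(samples.shape[1]) if j not in module_set]
    corr = np.abs(np.corrcoef(samples, rowvar=False))  # |Pearson r| between genes

    # Average standard deviation of the module members.
    sd_in = samples[:, module].std(axis=0, ddof=1).mean()

    # Average |r| over distinct pairs inside the module (diagonal excluded).
    block = corr[np.ix_(module, module)]
    k = len(module)
    pcc_in = (block.sum() - np.trace(block)) / (k * (k - 1))

    # Average |r| between module members and all remaining genes.
    pcc_out = corr[np.ix_(module, others)].mean()
    return float(sd_in * pcc_in / pcc_out)


# Hypothetical synthetic data: the first five genes share one latent driver whose
# amplitude is small in a "normal" state and large in a "pre-disease" state.
rng = np.random.default_rng(0)
MODULE = [0, 1, 2, 3, 4]


def simulate_state(latent_sd: float, n_samples: int = 50, n_genes: int = 20) -> np.ndarray:
    x = rng.normal(size=(n_samples, n_genes))
    x[:, MODULE] += rng.normal(scale=latent_sd, size=(n_samples, 1))
    return x


ci_normal = dnb_composite_index(simulate_state(latent_sd=0.2), MODULE)
ci_pre = dnb_composite_index(simulate_state(latent_sd=3.0), MODULE)
print(f"CI normal = {ci_normal:.2f}, CI pre-disease = {ci_pre:.2f}")
```

In this toy setting the index rises sharply for the state with the strong shared fluctuation, which is the qualitative signal DNB looks for across successive states.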


The three methods above demonstrate the effectiveness of dynamics-based data science approaches on biological data: all show strong power in solving biological questions and are complementary to traditional statistics-based data science approaches. In addition, dynamics-based data science approaches can be applied to detecting dynamical causality by examining the continuity of the cross-mapping function between observed variables. From a methodological viewpoint, we can summarize how to build a dynamics-based data-driven approach for studying biological dynamics as follows.
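The cross-mapping criterion for dynamical causality can be illustrated with a simplified version of convergent cross mapping (CCM): if X drives Y, the delay-embedded reconstruction of Y carries information about X, so neighbouring points on Y's reconstructed manifold correspond to neighbouring values of X. The Python sketch below is not necessarily the exact procedure the authors have in mind; the coupled logistic maps, the embedding dimension `dim=2`, the delay `tau=1`, the choice of `dim + 1` neighbours and the exponential weights are all illustrative assumptions.

```python
import numpy as np


def delay_embed(series: np.ndarray, dim: int, tau: int) -> np.ndarray:
    """Rows are delay vectors (s_t, s_{t-tau}, ..., s_{t-(dim-1)tau})."""
    start = (dim - 1) * tau
    n = len(series)
    return np.column_stack([series[start - k * tau : n - k * tau] for k in range(dim)])


def cross_map_skill(source: np.ndarray, target: np.ndarray, dim: int = 2, tau: int = 1) -> float:
    """Estimate `target` from neighbours on the delay-embedded manifold of `source`.

    Returns the Pearson correlation between the estimate and the true target.
    High skill is read as evidence that `target` influences `source`.
    """
    emb = delay_embed(source, dim, tau)
    tgt = target[(dim - 1) * tau :]  # align target values with embedding rows
    dists = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a point may not be its own neighbour
    nbrs = np.argsort(dists, axis=1)[:, : dim + 1]  # dim + 1 nearest neighbours
    d = np.take_along_axis(dists, nbrs, axis=1)
    w = np.exp(-d / np.maximum(d[:, :1], 1e-12))  # weights decay with relative distance
    w /= w.sum(axis=1, keepdims=True)
    estimate = (w * tgt[nbrs]).sum(axis=1)
    return float(np.corrcoef(estimate, tgt)[0, 1])


def coupled_logistic(n: int, beta_yx: float = 0.1, beta_xy: float = 0.0):
    """Two logistic maps; beta_yx is the effect of X on Y, beta_xy that of Y on X."""
    x, y = np.empty(n), np.empty(n)
    x[0], y[0] = 0.4, 0.2
    for t in range(n - 1):
        x[t + 1] = x[t] * (3.8 - 3.8 * x[t] - beta_xy * y[t])
        y[t + 1] = y[t] * (3.5 - 3.5 * y[t] - beta_yx * x[t])
    return x[100:], y[100:]  # drop the initial transient


x, y = coupled_logistic(1_100)  # X drives Y; Y does not act on X
skill_x_from_y = cross_map_skill(source=y, target=x)  # probes X -> Y
skill_y_from_x = cross_map_skill(source=x, target=y)  # probes Y -> X
print(f"X from M_Y: {skill_x_from_y:.2f}, Y from M_X: {skill_y_from_x:.2f}")
```

The sketch omits the convergence check that gives CCM its name; a fuller test would repeat `cross_map_skill` on increasingly long prefixes of the series and look for skill that rises and then saturates.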

Taken together, we conclude that dynamics-based data-driven approaches are, in their principles and advantages, explicable, quantifiable and generalizable. ‘Explicable’ means that every term in such an approach has a physical or biological meaning. ‘Quantifiable’ means that the system can be measured by objective criteria and compared through quantitative indicators. ‘Generalizable’ means that the method can be improved by adding new factors or extended to other systems with appropriate modifications. In particular, dynamics-based data science approaches exploit essential features of dynamical systems as reflected in data, e.g. strong fluctuations near a bifurcation point, the low dimensionality of a center manifold or an attractor, and phase-space reconstruction from a single variable by the delay-embedding theorem, and can thus provide information that differs from, or adds to, that of traditional statistics-based data science approaches. We believe that dynamics-based data science approaches will play an important role in systematic research in biology and medicine in the future.
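The first of these features, strong fluctuations near a bifurcation point, can be seen by linearizing the dynamics around a stable state. In a toy one-variable model x_{t+1} = a x_t + σ ε_t with |a| < 1 and unit Gaussian noise ε_t, the stationary variance is σ²/(1 - a²), which diverges as the dominant eigenvalue a approaches 1, that is, as the stable state loses stability, while the lag-1 autocorrelation approaches a (critical slowing down). The Python sketch below checks this numerically; the model, the noise level `SIGMA` and the chosen values of `a` are illustrative assumptions, not part of any of the three methods.

```python
import numpy as np

SIGMA = 0.1  # noise amplitude (illustrative)


def simulate_linearized(a: float, n: int = 50_000, seed: int = 0) -> np.ndarray:
    """Iterate x_{t+1} = a * x_t + SIGMA * eps_t, a noisy linearization around a stable state."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(size=n)
    x = np.zeros(n)
    for t in range(n - 1):
        x[t + 1] = a * x[t] + SIGMA * eps[t]
    return x[1_000:]  # discard burn-in from the initial condition x_0 = 0


results = {}
for a in (0.5, 0.9, 0.99):  # a -> 1 means the system is approaching the bifurcation
    x = simulate_linearized(a)
    results[a] = {
        "variance": x.var(),
        "theory": SIGMA**2 / (1 - a**2),
        "lag1_autocorr": np.corrcoef(x[:-1], x[1:])[0, 1],
    }
    print(a, results[a])
```

Both the empirical variance and the lag-1 autocorrelation grow as `a` approaches 1, which is why rising fluctuations can serve as an early-warning signal of an approaching transition.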

FUNDING

This work was supported by the National Key R&D Program of China (2017YFA0505500), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB38040400), the Japan Science and Technology Agency Moonshot R&D (JPMJMS2021), the Japan Agency for Medical Research and Development (JP20dm0307009), Japan Society for the Promotion of Science KAKENHI (JP15H05707), and the National Natural Science Foundation of China (31930022, 31771476 and 12026608).

Conflict of interest statement. None declared.