Flux Balance Analysis (Part 2)

2024. 12. 6. 10:22·Bioinformatics

In Part 1, we delved into the mathematical basis of FBA. In this article, we will explore the pipeline of the personalized multi-omics FBA model and apply it to real-world data.

 

Introduction

Let's recap Part 1:

  • FBA is a mathematical method used to simulate the metabolic profiles of cells.
  • Why simulate metabolic profiles instead of directly measuring them?
    • Current metabolic profiling methods are still costly and low-throughput compared with transcriptomic profiling.
    • By modifying FBA to incorporate sample-specific omics data, we can leverage the already abundant transcriptomics data to extrapolate the metabolic profiles of the collected samples.
  • FBA predicts optimal metabolic flux distributions through linear programming.
    • The steady-state assumption ($ Sv = 0 $) is a double-edged sword: it turns flux prediction into a tractable linear program, but because the system is underdetermined, the predicted flux distribution depends heavily on the chosen lower and upper bounds.
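As a reminder, the linear program that FBA solves can be written in standard notation (symbols may differ slightly from Part 1). Here $ S $ is the $ m \times n $ stoichiometric matrix ($ m $ metabolites, $ n $ reactions), $ v $ is the flux vector, and $ c $ holds the objective weights:

$$\max_{v} \; c^{\top} v \quad \text{subject to} \quad S v = 0, \quad lb_j \le v_j \le ub_j \;\; (j = 1, \dots, n)$$

The equality $ Sv = 0 $ is the steady-state assumption. The bounds $ lb_j $ and $ ub_j $ are what the personalized model sets from sample-specific data.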

These are crucial points from Part 1 that are essential for understanding Part 2.

 

Personalized FBA

As mentioned, the predicted flux distribution is heavily influenced by the user-defined lower and upper bounds. In addition, traditional FBA models do not incorporate kinetic or thermodynamic constraints, which strongly affect metabolic fluxes and the directionality of individual reactions. To address these limitations, Lewis et al. (2021) used transcriptomic, genomic, kinetic, and thermodynamic data to constrain the maximal flux of each reaction.

 

In the first study, FBA was modified to incorporate gene expression and somatic mutation data from solid tumors in The Cancer Genome Atlas (TCGA). In the second study, the objective values produced by the modified FBA model were combined with gene expression and somatic mutation features to build a meta-learner that predicted tumor radiation response more accurately than gene expression alone.

 

Stoichiometric matrix

Overview of the Recon3D database, from Brunk et al. (2018).

 

The stoichiometric matrix $ S $ is defined by the metabolic network provided by `Recon3D`, a community-curated human metabolic network reconstruction. Recon3D includes the stoichiometry for 4,140 unique metabolites, 13,547 reactions, and 3,268 genes. Of the 8,401 metabolite entries in the network, those without a KEGG database ID were filtered out, leaving 1,681 metabolites for which predictions were made.

 

Turnover numbers and missense mutation

To set up the linear program, we need lower and upper bounds for every reaction flux $ v_j \; (j = 1, \dots, n) $, where $ n $ is the number of reactions (the columns of $ S $). Recall the Michaelis-Menten equation:

 

$$v = \frac{d[P]}{dt} = V_{\text{max}} \frac{[S]}{K_M + [S]} = k_{\text{cat}} [E]_0 \frac{[S]}{K_M + [S]}$$

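where $ V_{\mathrm{max}} = k_{\mathrm{cat}} [E]_0 $ is the limiting rate, $ [E]_0 $ is the total enzyme concentration, $ K_M $ is the Michaelis constant, and $ [S] $ is the substrate concentration (not to be confused with the stoichiometric matrix $ S $).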

 

The turnover number $ k_{\mathrm{cat}} $ measures how fast an enzyme works at saturation. For an enzyme with a single active site, it is the number of substrate molecules converted per unit time by one enzyme molecule that is fully saturated with substrate (catalytic efficiency proper is $ k_{\mathrm{cat}}/K_M $). Because the rate can never exceed $ V_{\mathrm{max}} = k_{\mathrm{cat}} [E]_0 $, experimentally measured turnover numbers can be used to set physiologically feasible upper bounds on reaction fluxes. Turnover numbers whose enzyme, substrate, organism, and environmental conditions match those of the Recon3D network were extracted from the `BRENDA` database.
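The following minimal Python sketch of the Michaelis-Menten rate shows why $ V_{\mathrm{max}} $ works as a flux cap. The function names and parameter values are hypothetical and purely illustrative, not BRENDA measurements. The rate approaches $ k_{\mathrm{cat}} [E]_0 $ as $ [S] $ grows but never exceeds it, and it equals half of it when $ [S] = K_M $.

```python
def michaelis_menten_rate(kcat, enzyme_conc, substrate_conc, km):
    """Reaction rate v = kcat * [E]0 * [S] / (KM + [S])."""
    vmax = kcat * enzyme_conc
    return vmax * substrate_conc / (km + substrate_conc)


def flux_upper_bound(kcat, enzyme_conc):
    """v approaches Vmax = kcat * [E]0 as [S] grows, so Vmax caps the flux."""
    return kcat * enzyme_conc


# Hypothetical parameters: kcat in 1/s, concentrations in uM.
kcat, enzyme, km = 50.0, 0.25, 10.0
ub = flux_upper_bound(kcat, enzyme)
for s in (1.0, 10.0, 100.0, 10_000.0):
    v = michaelis_menten_rate(kcat, enzyme, s, km)
    print(f"[S] = {s:>8}: v = {v:.3f} (upper bound {ub})")
```

In the FBA setting, only `flux_upper_bound` matters: substrate concentrations are not modeled, so the saturated rate is the natural worst-case cap.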

 

Distribution of Envision scores in TCGA samples, from Lewis et al. (2021).

 

A `missense mutation` is a point mutation that changes a single amino acid in a protein, which can alter the protein's function. A score for each missense mutation is obtained from the `Envision` database (Gray et al., 2017). Envision uses experimental data from ~28,500 missense mutations to train a set of gradient-boosting regression models that predict the percent change in protein activity caused by a mutation. Scores are normalized so that 1 corresponds to wild-type activity: scores below 1 indicate loss of function and scores above 1 indicate gain of function.

 

For each patient with available somatic mutation data, the turnover number of a reaction is multiplied by the Envision score of any missense mutation in the gene encoding its enzyme. Interestingly, "exclusion of genomic data did not affect the majority of objective value calculations, nor comparisons between radiation-sensitive and -resistant tumor models", suggesting that including mutational data has little effect on the model's outputs.
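The adjustment can be sketched as follows. This is a minimal illustration, not the authors' code, and several parts are assumptions:

  • the gene names and scores are made up;
  • combining several mutations in the same gene by multiplying their scores is an assumption;
  • gene-protein-reaction rules (isozymes, enzyme complexes) are ignored, although a real Recon3D pipeline would have to handle them.

```python
from math import prod


def mutation_adjusted_kcat(kcat, gene, envision_scores):
    """Scale a turnover number by the Envision scores of one patient's
    missense mutations in the gene encoding the enzyme.

    envision_scores: dict mapping gene -> list of Envision scores
    (1.0 ~ wild-type activity, < 1 loss of function, > 1 gain of function).
    Assumption: several mutations in one gene combine multiplicatively.
    An unmutated gene keeps its original kcat (the empty product is 1).
    """
    return kcat * prod(envision_scores.get(gene, []))


# Hypothetical patient with two mutated genes.
patient = {"GENE_A": [0.5], "GENE_B": [0.8, 0.9]}
print(mutation_adjusted_kcat(20.0, "GENE_A", patient))  # loss of function lowers the cap
print(mutation_adjusted_kcat(20.0, "GENE_C", patient))  # unmutated gene: unchanged
```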

 

Turnover numbers and gene expression

$ V_{\mathrm{max}} = k_{\mathrm{cat}} [E]_0 $

 

The limiting rate $ V_{\mathrm{max}} $ is proportional to the total enzyme concentration $ [E]_0 $. Enzyme abundances were predicted from each sample's gene expression data using the ordinary differential equation (ODE) models developed by Schwanhäusser et al. (2011). However, these models were available for only 955 of the 3,268 genes in the Recon3D network, so the missing abundances were imputed by linear regression. For each sample, the ODE-predicted protein abundances were regressed on the measured expression values of the corresponding genes. The fitted line was then used to predict the enzyme abundance for the genes without a model.
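Here is a minimal NumPy sketch of the per-sample imputation step. It assumes the ODE-model predictions are already available as an array with NaN for genes that lack a model. The function name and toy values are hypothetical. The fit uses raw values; whether the original analysis transformed the data (for example, to log scale) before fitting is not stated here. The last lines also assume one gene per reaction, purely to show how the completed abundances could feed $ V_{\mathrm{max}} = k_{\mathrm{cat}} [E]_0 $.

```python
import numpy as np


def impute_enzyme_abundance(expression, predicted_abundance):
    """Fill in missing enzyme abundances for one sample.

    expression: measured expression of every network gene in this sample.
    predicted_abundance: ODE-model protein abundance per gene, NaN where no model exists.
    A least-squares line (abundance = slope * expression + intercept) is fit on the
    genes that have predictions and used to predict the rest.
    """
    expression = np.asarray(expression, dtype=float)
    abundance = np.array(predicted_abundance, dtype=float)  # copy; the input is not modified
    known = ~np.isnan(abundance)
    slope, intercept = np.polyfit(expression[known], abundance[known], deg=1)
    abundance[~known] = slope * expression[~known] + intercept
    return abundance


# Toy sample: five genes, the last two lack an ODE model (hypothetical values).
expr = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [3.0, 5.0, 7.0, np.nan, np.nan]
enzyme = impute_enzyme_abundance(expr, pred)

# Hypothetical turnover numbers, one gene per reaction for simplicity.
kcat = np.array([2.0, 1.0, 0.5, 4.0, 1.5])
upper_bounds = kcat * enzyme  # V_max = k_cat * [E]_0 per reaction
print(enzyme, upper_bounds)
```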

 

Objective function

As mentioned, users can define any objective function of interest. To compare flux distributions between radiation-resistant and -sensitive tumors, the flux through the NADPH consumption reaction NADPH $\rightarrow$ NADP$^+$ + H$^+$ was maximized, reflecting the cell's reducing capacity. To generate features for a predictive model, a separate optimization was run for each metabolite retained after the KEGG filtering, using the following sink reaction as the objective:

 

$$1\;\mathrm{met}[\mathrm{all}] \rightarrow \emptyset$$

 

This sink reaction is added, and its flux maximized, separately for each metabolite ("met") across all ("all") cellular compartments: "This creates an artificial sink for a particular metabolite in the Recon3D metabolic network, resulting in the maximization of reaction fluxes generating this metabolite."

 

The `maximized objective value` for each metabolite (not the individual optimized reaction fluxes) is then used as a predictor in an XGBoost model. In essence, FBA serves as a feature engineering step.
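The sink-and-maximize procedure can be sketched with SciPy's `linprog` on a hypothetical three-metabolite toy network (not Recon3D). For each metabolite in turn, a sink column is appended to $ S $ and its flux is maximized under $ Sv = 0 $ and the reaction bounds. The resulting objective values form one sample's feature vector. Changing a single upper bound, as a sample-specific $ k_{\mathrm{cat}} [E]_0 $ would, changes the features. This is an illustration under these assumptions, not the authors' pipeline.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (not Recon3D). Rows = metabolites A, B, C;
# columns = reactions R1: uptake -> A, R2: A -> B, R3: A -> C, R4: B -> C.
S = np.array([
    [1.0, -1.0, -1.0,  0.0],  # A
    [0.0,  1.0,  0.0, -1.0],  # B
    [0.0,  0.0,  1.0,  1.0],  # C
])


def max_production(S, bounds, met):
    """Append an artificial sink 'met -> nothing' and maximize its flux
    subject to S_ext @ v = 0 and the reaction bounds; return the objective value."""
    n_mets = S.shape[0]
    sink = np.zeros((n_mets, 1))
    sink[met, 0] = -1.0  # the sink consumes one unit of the chosen metabolite
    S_ext = np.hstack([S, sink])
    c = np.zeros(S_ext.shape[1])
    c[-1] = -1.0  # linprog minimizes, so minimize -v_sink to maximize v_sink
    res = linprog(c, A_eq=S_ext, b_eq=np.zeros(n_mets),
                  bounds=list(bounds) + [(0.0, None)], method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return -res.fun


def feature_vector(S, upper_bounds):
    """One sample's features: the maximized sink flux for every metabolite,
    with irreversible reactions bounded by sample-specific upper bounds."""
    bounds = [(0.0, ub) for ub in upper_bounds]
    return np.array([max_production(S, bounds, m) for m in range(S.shape[0])])


# Two hypothetical samples that differ only in the upper bound of R2 (A -> B).
X = np.vstack([
    feature_vector(S, [10.0, 6.0, 3.0, 100.0]),
    feature_vector(S, [10.0, 2.0, 3.0, 100.0]),
])  # rows = samples, columns = metabolites A, B, C
print(X)
```

For the first sample the features are 10, 6, and 9 (A is capped by uptake, B by R2, and C by R2 + R3). Lowering the R2 bound from 6 to 2 changes them to 10, 2, and 5. A matrix like `X`, one row per patient, is what a downstream classifier such as XGBoost would consume.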

 

Summary of results

Summary of the personalized multi-omics FBA model, from Lewis et al. (2021).

 

 

The AUROC for classifying tumors as radiation-resistant or -sensitive from gene expression alone was around 0.85. When mutation data and FBA objective values were added in an ensemble model, the AUROC rose to 0.906.

 

Shapley values of each data modality.

 

The performance of the XGBoost model trained solely on the metabolite objective values was not reported. However, the Shapley values of the metabolite base model were comparable to or higher than those of the gene expression base model, suggesting that the objective values derived from FBA carry substantial predictive power.