In Part 1, we delved into the mathematical basis of FBA. In this article, we will explore the pipeline of the personalized multi-omics FBA model and apply it to real-world data.
Introduction
Let's recap Part 1:
- FBA is a mathematical method used to simulate the metabolic profiles (reaction flux distributions) of cells.
  - Why simulate metabolic profiles instead of measuring them directly?
    - Current metabolic profiling methods remain inefficient, so metabolomic data are far scarcer than transcriptomic data.
    - By modifying FBA to incorporate sample-specific omics data, we can leverage the already abundant transcriptomics data to infer the metabolic profiles of the collected samples.
- FBA predicts optimal metabolic flux distributions through linear programming.
  - The steady-state assumption in FBA is a double-edged sword: it turns the problem into a tractable linear program, but the resulting system is underdetermined, so the choice of lower and upper bounds can significantly influence the predicted flux distributions.

These points from Part 1 are essential for understanding Part 2.
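To make the last two points concrete, here is a minimal sketch of FBA as a linear program, using SciPy's `linprog` (our choice of solver, not necessarily what the original studies used). The three-reaction network, its bounds, and the helper name `fba` are hypothetical; the point is only that, under $ S v = 0 $, changing a single upper bound changes the predicted optimum.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network.
# Metabolites (rows): A, B.  Reactions (columns):
#   v0: uptake -> A,   v1: A -> B,   v2: B -> (sink, the objective)
S = np.array([
    [1, -1,  0],  # A is produced by v0 and consumed by v1
    [0,  1, -1],  # B is produced by v1 and consumed by v2
])

def fba(lower, upper, objective_index):
    """Maximize v[objective_index] subject to S v = 0 and lower <= v <= upper."""
    c = np.zeros(S.shape[1])
    c[objective_index] = -1.0  # linprog minimizes, so negate the objective
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                  bounds=list(zip(lower, upper)), method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return -res.fun, res.x

lb = [0, 0, 0]
loose, _ = fba(lb, [10, 1000, 1000], objective_index=2)  # limited by uptake
tight, _ = fba(lb, [10, 4, 1000], objective_index=2)     # limited by v1's upper bound
print(loose, tight)
```

In this linear pathway, steady state forces all three fluxes to be equal, so the optimum is simply the tightest upper bound along the chain: the uptake bound in the first call and the bound on v1 in the second.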
Personalized FBA
As mentioned, the predicted flux distribution is heavily influenced by the user-defined lower and upper bounds. In addition, traditional FBA models do not incorporate kinetic or thermodynamic constraints, which strongly affect metabolic fluxes and the directionality of individual reactions. To address these issues, Lewis et al. (2021) used transcriptomic, genomic, kinetic, and thermodynamic data in two studies to constrain the maximal flux of each reaction in a sample-specific way.
In the first study, FBA was modified to account for gene expression and somatic mutation data of solid tumors from The Cancer Genome Atlas (TCGA). In the second study, the objective values predicted by the modified FBA model were combined with gene expression and somatic mutation variables to build a meta-learner with improved accuracy in predicting tumor radiation response.
Stoichiometric matrix
The stoichiometric matrix $ S $ is defined by the metabolic network provided by `Recon3D`, a community-curated reconstruction of human metabolism. Recon3D includes the stoichiometry for 4,140 unique metabolites, 13,547 reactions, and 3,268 genes. Because the same metabolite can appear in several cellular compartments, the network contains 8,401 compartment-specific metabolite entries; those without a KEGG database ID were filtered out, leaving 1,681 metabolites for which predictions were made.
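A minimal sketch, assuming a Recon3D-like `metabolite[compartment]` naming scheme, of how $ S $ can be assembled from a reaction list and how prediction targets might be filtered by KEGG annotation. The reactions, the `kegg` mapping, and the metabolite `x` (standing in for an unannotated metabolite) are hypothetical, and keeping every metabolite in $ S $ while filtering only the targets is an assumption of this sketch. A real pipeline would more likely load the model with a dedicated library such as COBRApy.

```python
# Hypothetical reactions in a "metabolite[compartment]" naming scheme;
# coefficients are negative for substrates and positive for products.
reactions = {
    "R1": {"glc[c]": -1, "g6p[c]": 1},
    "R2": {"g6p[c]": -1, "nadp[c]": -1, "nadph[c]": 1, "x[c]": 1},
    "R3": {"nadph[m]": -1, "nadp[m]": 1},
}
# Hypothetical annotation table: base metabolite ID -> KEGG ID (None = no KEGG ID).
kegg = {"glc": "C00031", "g6p": "C00092", "nadp": "C00006", "nadph": "C00005", "x": None}

def base_id(met):
    """Strip the compartment suffix: 'nadph[m]' -> 'nadph'."""
    return met.split("[")[0]

def build_stoichiometric_matrix(reactions):
    """Return (metabolites, reaction IDs, S) where S[i][j] is the coefficient of metabolite i in reaction j."""
    rxn_ids = list(reactions)
    mets = sorted({m for coeffs in reactions.values() for m in coeffs})
    row = {m: i for i, m in enumerate(mets)}
    S = [[0] * len(rxn_ids) for _ in mets]
    for j, rxn in enumerate(rxn_ids):
        for met, coeff in reactions[rxn].items():
            S[row[met]][j] = coeff
    return mets, rxn_ids, S

mets, rxn_ids, S = build_stoichiometric_matrix(reactions)
# In this sketch S keeps every metabolite; only the prediction targets are filtered.
targets = [m for m in mets if kegg.get(base_id(m))]
print(len(mets), "compartment-specific entries,", len({base_id(m) for m in mets}), "unique metabolites")
print("prediction targets:", targets)
```

NADP+ and NADPH each appear in two compartments here, which is why the count of compartment-specific entries exceeds the count of unique metabolites, mirroring the 8,401 versus 4,140 gap in Recon3D.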
Turnover numbers and missense mutation
To perform linear programming, we need to set lower and upper bounds for each reaction flux $ v_j \; (j = 1, \dots, n) $, where $ n $ is the number of reactions in the network. Recall the Michaelis-Menten equation:
$$v = \frac{d[P]}{dt} = V_{\text{max}} \frac{[S]}{K_M + [S]} = k_{\text{cat}} [E]_0 \frac{[S]}{K_M + [S]}$$
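where $ V_{\mathrm{max}} = k_{\mathrm{cat}} [E]_0 $ is the limiting rate, $ [E]_0 $ is the total enzyme concentration, $ [S] $ is the substrate concentration, and $ K_M $ is the Michaelis constant.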
The turnover number $ k_{\mathrm{cat}} $ measures how fast an enzyme works when it is saturated with substrate. For an enzyme with a single active site, $ k_{\mathrm{cat}} $ is the maximum number of substrate molecules converted to product per enzyme molecule per unit time. Because the saturation term $ [S]/(K_M + [S]) $ is always below 1, every reaction rate satisfies $ v < k_{\mathrm{cat}} [E]_0 $, so experimentally measured turnover numbers can be used to set physiologically feasible upper bounds for reactions. Turnover numbers were extracted from the `BRENDA` database, matching the enzyme, substrate, organism, and environmental conditions to those of the Recon3D reactions.
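A quick numerical check of the bound argument, using hypothetical values for $ k_{\mathrm{cat}} $, $ [E]_0 $, and $ K_M $: the Michaelis-Menten rate climbs toward $ k_{\mathrm{cat}} [E]_0 $ as $ [S] $ grows but never reaches it, and equals exactly half of it when $ [S] = K_M $.

```python
def michaelis_menten(kcat, e0, s, km):
    """Michaelis-Menten rate v = kcat * [E]0 * [S] / (K_M + [S])."""
    return kcat * e0 * s / (km + s)

# Hypothetical parameters: kcat in 1/s, [E]0 and K_M in the same concentration units as [S].
kcat, e0, km = 50.0, 2.0, 0.5
vmax = kcat * e0  # the upper bound on the rate

for s in [0.05, 0.5, 5.0, 500.0]:
    v = michaelis_menten(kcat, e0, s, km)
    print(f"[S] = {s:>6}: v = {v:7.3f}, v / Vmax = {v / vmax:.3f}")
```

Since no substrate concentration lets $ v $ exceed $ k_{\mathrm{cat}} [E]_0 $, this product is a safe upper bound even when $ [S] $ and $ K_M $ are unknown.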
A `missense mutation` is a point mutation that replaces one amino acid in a protein with another, which can alter the protein's function. Scores for each missense mutation are obtained from the `Envision` database (Gray et al., 2017). Envision uses experimental data from ~28,500 missense mutations to train a set of gradient-boosting regression models that predict the percent change in protein activity caused by a mutation. Scores are normalized so that values below 1 indicate a loss of function and values above 1 indicate a gain of function.
For each patient with available somatic mutation data, the turnover number of a reaction is multiplied by the Envision score of any missense mutation located in the associated gene. Interestingly, "exclusion of genomic data did not affect the majority of objective value calculations, nor comparisons between radiation-sensitive and -resistant tumor models", suggesting that the mutational data had little effect on the model's predictions.
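The mutation adjustment can be sketched as below. The gene names, turnover numbers, and scores are hypothetical, and multiplying the scores together when a gene carries several mutations is our assumption; the description above only states that the turnover number is multiplied by the Envision score.

```python
from math import prod

def mutation_adjusted_kcat(kcat, envision_scores):
    """Scale a turnover number by the Envision scores of missense mutations in its gene.

    Envision scores: 1 = wild-type activity, < 1 = loss of function, > 1 = gain of function.
    Multiplying the scores of several mutations together is an assumption of this sketch.
    """
    return kcat * prod(envision_scores)  # prod([]) == 1, so unmutated genes keep their kcat

# Hypothetical inputs for one patient.
kcat_by_gene = {"GENE_A": 120.0, "GENE_B": 35.0}  # turnover numbers (1/s)
patient_scores = {"GENE_A": [0.4]}                # one loss-of-function missense mutation

adjusted = {gene: mutation_adjusted_kcat(k, patient_scores.get(gene, []))
            for gene, k in kcat_by_gene.items()}
print(adjusted)
```

A loss-of-function score shrinks the upper bound of the affected reaction, while genes without mutations keep their original turnover numbers.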
Turnover numbers and gene expression
$$V_{\mathrm{max}} = k_{\mathrm{cat}} [E]_0$$
The limiting rate $ V_{\mathrm{max}} $ is proportional to the total enzyme concentration $ [E]_0 $. Enzyme abundances $ [E] $ were predicted from each sample's gene expression data using the ordinary differential equation models developed by Schwanhäusser et al. (2011). However, models were available for only 955 of the 3,268 genes in the Recon3D network. To impute the missing values, a linear regression was fitted for each sample between the predicted protein abundances and the measured expression values of the corresponding genes, and the fitted model was then used to predict protein abundances for the remaining genes.
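A minimal sketch of the per-sample imputation, followed by the resulting $ k_{\mathrm{cat}} [E]_0 $ upper bound. Gene names and values are hypothetical, the regression is fitted on the values as given (whether the original work transformed them, for example to a log scale, is not stated here), and clamping negative predictions at zero is our own safeguard.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def impute_abundance(expression, modeled):
    """For one sample: regress modeled protein abundance on gene expression over the genes
    that have a model, then predict abundance for the genes that lack one."""
    known = [g for g in expression if g in modeled]
    a, b = fit_line([expression[g] for g in known], [modeled[g] for g in known])
    # Clamping negative predictions at zero is an assumption of this sketch.
    return {g: modeled[g] if g in modeled else max(0.0, a + b * expression[g])
            for g in expression}

# Hypothetical single sample: G4 has no abundance model.
expression = {"G1": 1.0, "G2": 2.0, "G3": 3.0, "G4": 5.0}
modeled = {"G1": 10.0, "G2": 20.0, "G3": 30.0}
abundance = impute_abundance(expression, modeled)

# Upper bound in the spirit of V_max = k_cat * [E]_0 (hypothetical k_cat values, 1/s).
kcat = {"G1": 3.0, "G4": 2.0}
upper_bound = {g: kcat[g] * abundance[g] for g in kcat}
print(abundance, upper_bound)
```

Fitting one regression per sample lets each tumor's own expression-to-abundance relationship drive the imputation, rather than a single pooled fit.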
Objective function
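As mentioned, users can define any objective function of interest. To study the differential flux distribution between radiation-resistant and -sensitive groups, the oxidation of NADPH (NADPH $\rightarrow$ NADP$^+$ + H$^+$) was maximized; because NADPH cannot accumulate at steady state, maximizing this reaction measures the network's capacity to regenerate NADPH. To use flux-derived values as features in a predictive model, a separate objective was maximized for each of the 1,681 metabolites retained after KEGG filtering, using the following artificial reaction: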
$$1\,\mathrm{met}[\mathrm{all}] \rightarrow \emptyset$$
This reaction serves as the objective and is maximized for each metabolite ("met") in all cellular compartments ("all"). "This creates an artificial sink for a particular metabolite in the Recon3D metabolic network, resulting in the maximization of reaction fluxes generating this metabolite."
The `maximized objective value` for each metabolite (not the individual reaction flux values) is then used as a feature in an XGBoost model. In essence, FBA serves as a feature-engineering step.
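The per-metabolite sink objective can be sketched with SciPy's `linprog`: for each metabolite, append a column to $ S $ that consumes it, maximize that column's flux, and record the optimum as a feature. The toy network, bounds, and the function name `sink_objective` are hypothetical, and compartments are ignored for brevity.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network. Rows: metabolites; columns: reactions
#   v0: uptake -> A,   v1: A -> B,   v2: A -> C
metabolites = ["A", "B", "C"]
S = np.array([
    [1, -1, -1],  # A: produced by v0, consumed by v1 and v2
    [0,  1,  0],  # B: produced by v1, no consuming reaction
    [0,  0,  1],  # C: produced by v2, no consuming reaction
])
lower = [0, 0, 0]
upper = [10, 6, 1000]

def sink_objective(S, lower, upper, met_index):
    """Add an artificial sink (met -> nothing) and return its maximal steady-state flux."""
    sink = np.zeros((S.shape[0], 1))
    sink[met_index, 0] = -1.0                       # the sink consumes the metabolite
    S_ext = np.hstack([S, sink])
    c = np.zeros(S_ext.shape[1])
    c[-1] = -1.0                                    # linprog minimizes, so negate the sink flux
    bounds = list(zip(lower, upper)) + [(0, None)]  # sink is irreversible, no upper bound
    res = linprog(c, A_eq=S_ext, b_eq=np.zeros(S.shape[0]),
                  bounds=bounds, method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return -res.fun

# One objective value per metabolite: these would become the model's features.
features = {m: sink_objective(S, lower, upper, i) for i, m in enumerate(metabolites)}
print(features)
```

Without the sink, B and C have no consuming reaction, so steady state would force the fluxes producing them to zero. The sink is what lets the LP report how much of each metabolite the network can make under the given bounds: B is capped by the bound on v1, while A and C are capped by uptake.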
Summary of results
The AUROC for classifying radiation-resistant versus radiation-sensitive tumors from gene expression alone was around 0.85. When mutational information and FBA objective values were incorporated into an ensemble model, the AUROC increased to 0.906.
The performance of an XGBoost model trained solely on the metabolite values was not shown. However, the Shapley values of the metabolite-based base model were comparable to or higher than those of the gene expression base model, suggesting that the objective values derived from FBA carry substantial predictive power.