Data-driven Approaches to Predict and Utilize Enzyme Promiscuity for Novel Biosynthesis Design

Enzyme substrate promiscuity has significant implications for metabolic engineering. The ability to predict the space of possible enzymatic side reactions is crucial for elucidating underground metabolic networks in microorganisms, as well as for harnessing novel biosynthetic capabilities of enzymes to produce desired chemicals. Reaction rule-based cheminformatics platforms have been implemented to computationally enumerate possible promiscuous reactions, relying on existing knowledge of enzymatic transformations to inform novel reactions. However, past versions of curated reaction rules have been limited by 1) a lack of comprehensiveness in representing all possible transformations, 2) an inability to associate new reactions with the most promising candidate enzymes, and 3) the need to prune enumerated reactions to improve computational efficiency in pathway expansion. This work first focused on curating a set of the most generalized enzymatic reaction rules. All rules were automatically abstracted from atom-mapped MetaCyc reactions and verified to uniquely cover all common enzymatic transformations without redundancies or errors, resulting in a minimal yet comprehensive rule set. Next, this work sought to improve the accuracy of enzyme annotations for new reactions by generating a set of reaction rules that reflect realistic ranges of promiscuity consistent with known enzyme sequences and substrates. An unsupervised learning approach was used to generate reaction rules with adjustable levels of specificity, and this level was optimized to correctly recover the highest number of enzyme-substrate associations. Finally, to address the combinatorial explosion that imposes a heavy computational burden during the enumeration of reaction pathways, an enzymatic reaction feasibility classifier based on the XGBoost machine learning framework was developed to quickly assess whether new enzymatic reactions are feasible. This classifier can be used as a filter to prune infeasible reactions during the enumeration of reaction networks, improving both the accuracy and the runtime of retrobiosynthesis platforms. Significant effort went into strategically generating higher-confidence, high-quality synthetic negative data for the training dataset, and into extensive experimentation with methods for featurizing reactions in a meaningful way, in order to produce a high-performing model that is widely applicable across different scenarios. Altogether, these data-driven computational approaches to studying enzyme promiscuity will further advance its applications in biocatalysis design, metabolic engineering, and the understanding of microbial metabolism.
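
The pruning idea described in the abstract can be illustrated with a small sketch. This is not the implementation from the thesis: the function names (`expand_network`, `toy_apply_rules`, `toy_feasibility`), the 0.5 threshold, and the toy compounds and scores are all assumptions made for illustration. In practice the feasibility scorer would be a trained classifier and the rule application would operate on chemical structures. The sketch only shows why filtering during enumeration saves work: a reaction that is pruned never contributes its product to the next round of expansion.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical representation: (substrate, rule_id, product), all plain strings.
Reaction = Tuple[str, str, str]


def expand_network(
    seeds: Iterable[str],
    apply_rules: Callable[[str], Iterable[Tuple[str, str]]],
    feasibility: Callable[[Reaction], float],
    threshold: float = 0.5,  # assumed cutoff, not taken from the source
    max_steps: int = 2,
) -> List[Reaction]:
    """Breadth-first expansion that keeps only reactions scored as feasible."""
    known = set(seeds)
    frontier = set(seeds)
    kept: List[Reaction] = []
    for _ in range(max_steps):
        next_frontier = set()
        for substrate in sorted(frontier):  # sorted for deterministic output
            for rule_id, product in apply_rules(substrate):
                reaction = (substrate, rule_id, product)
                if feasibility(reaction) < threshold:
                    continue  # pruned: its product is never expanded further
                kept.append(reaction)
                if product not in known:
                    known.add(product)
                    next_frontier.add(product)
        frontier = next_frontier
    return kept


# Toy stand-ins for a rule engine and a trained feasibility classifier.
TOY_RULES = {
    "A": [("r1", "B"), ("r2", "C")],
    "B": [("r3", "D")],
    "C": [("r4", "E")],
}
TOY_SCORES = {
    ("A", "r1", "B"): 0.9,
    ("A", "r2", "C"): 0.1,
    ("B", "r3", "D"): 0.8,
    ("C", "r4", "E"): 0.95,
}


def toy_apply_rules(compound: str) -> List[Tuple[str, str]]:
    return TOY_RULES.get(compound, [])


def toy_feasibility(reaction: Reaction) -> float:
    return TOY_SCORES.get(reaction, 0.0)


if __name__ == "__main__":
    for reaction in expand_network(["A"], toy_apply_rules, toy_feasibility):
        print(reaction)
```

In this toy network, the low-scoring reaction from A to C is discarded in the first step, so the reaction from C to E is never enumerated even though its own score is high. Setting `threshold=0.0` disables the filter and recovers the full, unpruned network.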
