
Genome-scale classification of metabolic reactions

Computer processing of chemical reactions requires specific methods for representing them numerically. We have proposed the MOLMAP method for numerically encoding the structural transformations brought about by a chemical reaction. The chemical bonds in each reactant and each product are classified by a Kohonen self-organizing map (SOM) on the basis of their calculated physicochemical properties. This yields a fixed-length numerical fingerprint (the MOLMAP) describing the types of bonds present in each molecule. Subtracting the MOLMAPs of the products from those of the reactants gives a MOLMAP of the reaction. Bond types that map to the same neuron on both sides cancel out, so the reaction MOLMAP represents the types of bonds that disappeared from the reactants (positive values) as well as those created in the products (negative values).
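The encoding steps above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the published implementation: the 2x2 `codebook` stands in for a SOM already trained on bond descriptors, the two numbers per bond are hypothetical placeholders for calculated physicochemical properties, each MOLMAP entry simply counts the bonds whose winning neuron is that position, and the MOLMAPs of several reactants (or products) are summed. The function names are illustrative only.

```python
import numpy as np

# Hypothetical trained SOM, flattened: a 2x2 grid gives 4 neurons,
# each holding weights for 2 (hypothetical) bond properties.
codebook = np.array([[0.0, 0.0],
                     [1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0]])


def winning_neuron(codebook: np.ndarray, bond: np.ndarray) -> int:
    """Index of the neuron whose weights are closest (Euclidean) to the bond descriptor."""
    return int(np.argmin(np.linalg.norm(codebook - bond, axis=1)))


def molmap(codebook: np.ndarray, bonds: np.ndarray) -> np.ndarray:
    """Fixed-length fingerprint: number of the molecule's bonds mapped to each neuron."""
    fingerprint = np.zeros(len(codebook))
    for bond in bonds:
        fingerprint[winning_neuron(codebook, bond)] += 1
    return fingerprint


def reaction_molmap(codebook: np.ndarray, reactants, products) -> np.ndarray:
    """Summed reactant MOLMAPs minus summed product MOLMAPs."""
    r = sum((molmap(codebook, m) for m in reactants), np.zeros(len(codebook)))
    p = sum((molmap(codebook, m) for m in products), np.zeros(len(codebook)))
    return r - p


# Toy reaction A + B -> C, one row of hypothetical descriptors per bond.
reactant_a = np.array([[0.1, 0.1], [0.9, 0.1]])
reactant_b = np.array([[0.1, 0.9]])
product_c = np.array([[0.1, 0.1], [0.9, 0.9]])

reaction = reaction_molmap(codebook, [reactant_a, reactant_b], [product_c])
print(reaction.tolist())  # [0.0, 1.0, 1.0, -1.0]
```

In this toy reaction the bond mapped to neuron 0 is present on both sides and cancels; the bonds at neurons 1 and 2 disappear (positive entries) and a bond at neuron 3 is created (negative entry). Whatever the size of the molecules, the reaction descriptor always has one entry per neuron.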

This numerical representation of chemical reactions enabled us to automatically classify a genome-scale database of more than 3,000 enzymatic reactions (the reactome) with well-established machine learning techniques, namely SOMs (again) and Random Forests. The resulting classification agreed with the official Enzyme Commission (EC) numbers for most reactions (Figure). Several promising results emerged from this application. The ability to automatically identify similarities between enzymatic reactions in databases, including reactions with no official EC number, is important, for example, for the design of biotechnological processes and for the genomic reconstruction of metabolic pathways, in which genes are annotated on the basis of sequence similarity to genes encoding specific enzymatic functions. Assigning EC numbers from the reaction formula can also assist in the internal validation of the EC system, and has revealed similarities between metabolic reactions that are hidden by very different EC numbers.

Figure: A self-organizing map trained with 3741 enzymatic reactions from the KEGG database, and its relationship with EC numbers. Reactions were distributed on the map based exclusively on similarities between their MOLMAP descriptors, calculated from the reaction formulas. Each position of the grid (a neuron) is colored according to the EC class of the reactions that activate it.
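How a map like the one in the Figure can be trained and then used to classify reactions can be sketched as follows, again with NumPy. This is an illustrative sketch, not the authors' code: the 3x3 grid, the linear learning-rate and neighborhood schedules, the synthetic 4-component "reaction MOLMAPs" and the class names "EC 1" and "EC 2" are all hypothetical. Each neuron is colored with the majority class of the training reactions that activate it, and a new reaction receives the class of the closest labeled neuron.

```python
import numpy as np
from collections import Counter


def best_matching_unit(weights: np.ndarray, x: np.ndarray) -> int:
    """Neuron whose weight vector is closest (Euclidean) to x."""
    return int(np.argmin(np.linalg.norm(weights - x, axis=1)))


def train_som(data, rows, cols, epochs=30, lr0=0.5, seed=0):
    """Online Kohonen training with a Gaussian neighborhood on a rows x cols grid."""
    rng = np.random.default_rng(seed)
    # Initialize each neuron with a randomly chosen training vector.
    weights = data[rng.choice(len(data), rows * cols)].astype(float)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    sigma0 = max(rows, cols) / 2.0
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for i in rng.permutation(len(data)):
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood radius shrinks
            bmu = best_matching_unit(weights, data[i])
            d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)  # squared grid distances
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (data[i] - weights)
            step += 1
    return weights


def label_neurons(weights, data, labels):
    """Color each neuron with the majority class of the reactions that activate it."""
    hits = {}
    for x, y in zip(data, labels):
        hits.setdefault(best_matching_unit(weights, x), []).append(y)
    return {n: Counter(ys).most_common(1)[0][0] for n, ys in hits.items()}


def predict(weights, neuron_labels, x):
    """Class of the closest neuron that was activated by at least one training reaction."""
    labeled = np.array(sorted(neuron_labels))
    dists = np.linalg.norm(weights[labeled] - x, axis=1)
    return neuron_labels[int(labeled[np.argmin(dists)])]


# Synthetic data: two hypothetical classes scattered around prototype descriptors.
rng = np.random.default_rng(42)
prototypes = {"EC 1": np.array([1.0, -1.0, 0.0, 0.0]),
              "EC 2": np.array([0.0, 0.0, 1.0, -1.0])}
X, y = [], []
for ec, proto in prototypes.items():
    for _ in range(20):
        X.append(proto + rng.normal(0.0, 0.1, size=4))
        y.append(ec)
X = np.array(X)

som = train_som(X, rows=3, cols=3)
neuron_labels = label_neurons(som, X, y)
print(predict(som, neuron_labels, np.array([0.9, -1.1, 0.0, 0.1])))
```

Falling back to the closest labeled neuron is one possible choice for reactions that land on a neuron no training reaction activated. Random Forests, the other method mentioned above, would instead be trained directly on the reaction MOLMAPs as feature vectors.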

Refs:

D. A. R. S. Latino, J. Aires-de-Sousa, "Assignment of EC Numbers to Enzymatic Reactions with MOLMAP Reaction Descriptors and Random Forests", J. Chem. Inf. Model. 2009, 49(7), 1839-1846.

D. A. R. S. Latino, Q.-Y. Zhang, J. Aires-de-Sousa, "Genome-scale classification of metabolic reactions and assignment of EC numbers with self-organizing maps", Bioinformatics 2008, 24(19), 2236-2244.

D. A. R. S. Latino, J. Aires-de-Sousa, "Genome-Scale Classification of Metabolic Reactions: A Chemoinformatics Approach", Angew. Chem. Int. Ed. 2006, 45(13), 2066-2069.