Predicting the Genotype-Phenotype Map of Complex Traits

Predicting phenotypic development in a changing environment from the genotype of a complex organism is one of the most important and challenging questions in modern biology. This challenge can be addressed by establishing a framework that identifies and maps the mechanistic basis of the processes leading from genotype to phenotype. The central rationale of this framework rests on the genetic, developmental and regulatory dissection of phenotypic changes in response to different environments. First, a phenotype is genetically complex because it involves many genes that display pervasive interactions with other genes and with environmental factors. Second, the formation of any phenotype involves a series of developmental events and biological alterations that entail cell growth, differentiation and morphogenesis. Third, DNA polymorphisms affect variation in a phenotype by perturbing transcripts, metabolites and proteins in transcriptional and regulatory networks. In this editorial, I attempt to provide a big picture of each of these three aspects of phenotypic dissection. Genotype-phenotype prediction can be enabled by integrating mathematical models of developmental processes, from morphogenesis to pattern formation, with models of how transcript, protein and metabolite abundance affects high-order phenotypes through a series of biochemical steps.

Wu, J Biomet Biostat 2012, 3:4. http://dx.doi.org/10.4172/2155-6180.1000e109. Volume 3, Issue 4, 1000e109. Citation: Wu R (2012) Predicting the Genotype-Phenotype Map of Complex Traits in Plants. J Biomet Biostat 3:e109.
doi:10.4172/2155-6180.1000e109 J Biomet Biostat ISSN:2155-6180 JBMBS, an open access journal

The current theory of complex trait genetics is based on the hypothesis that variants in the DNA sequence, such as single-nucleotide polymorphisms (SNPs), insertions or deletions (indels), and copy number variants, act in concert to determine the phenotypic value of a trait through functional alterations in the activity, expression level, stability, and splicing of the RNA and proteins they encode. Genetic mapping, which attributes a phenotypic trait to its underlying quantitative trait loci (QTLs) using polymorphic markers, is a powerful means of locating QTLs on the genome and estimating the effects of their genetic actions and interactions [2]. As a routine technique of genetic analysis, QTL mapping has been instrumental in studying the genetic architecture of complex traits [3].

Developmental Dissection of Complex Traits
Development includes a broad spectrum of processes. In plants, for example, these processes include the formation of a complete embryo from a zygote, seed germination, the elaboration of a mature vegetative plant from the embryo, the formation of flowers, fruits, and seeds, and many of the plant's responses to its environment. Each of these processes is fundamental in determining the size, shape and production of all higher plants. For this reason, knowledge of the genetic basis of the variation in each process is important for understanding adaptive evolution and deriving elite domestic crop varieties. Traditional approaches that map QTLs with phenotypes measured at particular times fail to capture the dynamic structure and pattern of the process. In contrast, two new statistical methods, called functional mapping (implemented in the software package FunMap [4,5]) and systems mapping, integrate the biological mechanisms and dynamic processes of the trait into the genetic mapping framework through mathematical and computational models [6-11]. Functional mapping unifies the strengths of statistics, genetics, and developmental biology, thus facilitating tests of the interplay between genetic action and development.
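To make the idea concrete, functional mapping can be written as a mixture likelihood in which each QTL genotype has its own growth curve. The sketch below assumes a logistic growth curve and a first-order autoregressive covariance structure; both are illustrative choices rather than requirements of the approach, and all symbols are introduced here for illustration only.

```latex
% Assumed setup: n progeny, J QTL genotypes, trait measured at times t_1, ..., t_T.
L(\Theta) = \prod_{i=1}^{n} \sum_{j=1}^{J} \omega_{j|i}\,
            f_j(\mathbf{y}_i ; \boldsymbol{\mu}_j, \boldsymbol{\Sigma}),
\qquad
\boldsymbol{\mu}_j = \bigl(g_j(t_1), \ldots, g_j(t_T)\bigr),
\qquad
g_j(t) = \frac{a_j}{1 + b_j e^{-r_j t}},
\qquad
\Sigma_{kl} = \sigma^2 \rho^{|k - l|}.

% Test for a QTL that affects the whole trajectory:
H_0 : (a_j, b_j, r_j) \equiv (a, b, r) \ \text{for all } j
\quad \text{vs.} \quad
H_1 : \text{at least one genotype has different curve parameters},
\qquad
\mathrm{LR} = -2\bigl[\ln L(\tilde{\Theta}) - \ln L(\hat{\Theta})\bigr].
```

Here \omega_{j|i} is the conditional probability that progeny i carries QTL genotype j given its marker genotypes, f_j is a multivariate normal density, \mathbf{y}_i is the vector of T measurements on progeny i, and \tilde{\Theta} and \hat{\Theta} are the maximum likelihood estimates under H_0 and H_1. Because the QTL is tested through the curve parameters (a_j, b_j, r_j) rather than through single-time phenotypes, the test addresses the whole trajectory at once.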
The principle of functional mapping can be expanded to map ontogenetic QTLs that govern all developmental events in a plant's lifetime [12]. Previous work on functional mapping focused on identifying QTLs for a particular phase of development using a mathematical model of growth trajectories during that phase. QTLs identified in this way therefore cannot be assumed to affect the whole landscape of ontogenetic growth and development. In ontogenetic QTL mapping in plants, three major issues remain to be resolved:
1. Modeling change points or structural changes for a sequence of plant developmental events: Plant development includes several distinct sequential phases: seed development from a complete embryo, vegetative growth after the germination of the seed, flower growth, pollination, and seed formation. Development within each phase is a quantitative change over time, and mathematical models describe its pattern well. However, connecting any two adjacent phases through mathematical models is challenging because the phases are qualitatively different from each other. New mathematical and statistical models for functional mapping are needed to map the QTLs that regulate the transition of development from one phase to the next.
2. Modeling epistatic interactions among genes from different regions of the seed: Seed development in flowering plants is triggered by a double-fertilization process that leads to the differentiation of the embryo, endosperm, and seed coat, which are the major regions of the seed and are essential for seed viability and plant reproduction. Many different developmental and physiological events occur within each seed region during development, and these events are programmed, in part, by the activity of different genes. Seed development, therefore, is the result of a mosaic of different gene expression programs occurring in parallel in different seed compartments. Statistical models have been developed for integrating genes from different regions [13,14], but new models should be developed for understanding how these genes are organized into unique regulatory circuits within the plant genome to "make a seed."
3. Developing functional genome-wide association studies: To date, most QTLs in plants have been identified by linkage mapping with biparental crosses. With the advent of high-throughput SNP data, genome-wide association studies (GWAS) will soon come of age in plants. Functional mapping has been integrated into the setting of GWAS, leading to a new statistical approach, functional genome-wide association studies or fGWAS [15]. Through fGWAS, a more complete set of genes for plant traits can be identified and, more importantly, the physiological and developmental pathways in which these genes function and interact with each other can be characterized. In a platform developed for fGWAS, a Bayesian lasso has been implemented to select a subset of significant SNPs for developmental trajectories of traits, with the model solved using a Markov chain Monte Carlo algorithm [16,17].
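The change-point problem raised in the first issue above can be illustrated with a deliberately simple case. The sketch below assumes a single transition between two phases, each with linear growth, and locates the transition by grid search over candidate split points. The function names and the simulated trajectory are hypothetical. This is not the model proposed in the editorial, only a minimal instance of detecting a structural change in a developmental time series.

```python
def fit_line(ts, ys):
    """Ordinary least-squares line through (ts, ys); returns (intercept, slope, rss)."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in ts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_t
    rss = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(ts, ys))
    return intercept, slope, rss


def find_change_point(ts, ys, min_points=3):
    """Grid-search the split index that minimises the total RSS of two separate lines."""
    best = None
    for k in range(min_points, len(ts) - min_points + 1):
        _, slope1, rss1 = fit_line(ts[:k], ys[:k])
        _, slope2, rss2 = fit_line(ts[k:], ys[k:])
        total = rss1 + rss2
        if best is None or total < best["rss"]:
            best = {"index": k, "time": ts[k], "slopes": (slope1, slope2), "rss": total}
    return best


# Hypothetical trajectory: slow growth before t = 10, faster growth from t = 10 onwards.
times = list(range(20))
heights = [1.0 + 0.5 * t if t < 10 else 7.0 + 2.0 * (t - 10) for t in times]

result = find_change_point(times, heights)
print("estimated change time:", result["time"])
print("phase slopes:", result["slopes"])
```

In a mapping context, the estimated change time and the phase-specific slopes could in principle be allowed to differ among QTL genotypes, so that a QTL for the timing of a developmental transition is tested in the same way as a QTL for growth rate. That extension is an assumption of this note and is not shown here.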

Regulatory Dissection of Complex Traits
The formation of any trait can be modeled as a dynamic system in which various biological parts coordinate to determine a final phenotype through genetic regulation. The behavior and outcome of this system, i.e., the trait phenotype, can be changed by altering the pathways of one or more parts. Achieving this requires a profound understanding of how the different parts are coordinated and organized into a whole system and of the genetic roots of their function.
As the cost of methods for measuring mRNA, protein and other indicators continues to fall, it becomes reasonable to design experiments that capture the dynamic processes of phenotypic formation across timescales. With these data, we can reconstruct biological networks by incorporating the transcriptome (the set of RNA transcripts), proteome (the set of proteins), and metabolome (the entire range of metabolites taking part in a biological process) into the functional mapping and systems mapping of final phenotypes.
The quantitative relationship between the variation in activity of each of the thousands of gene-protein or protein-metabolite couples in a cellular system can be described by a high-dimensional system of differential equations (DEs). DEs of the kind that model electronic networks in engineering have been successfully used to map QTLs involved in phenotypic variation [10]. DEs can model several critical features of a regulatory network, such as the time displacement of genetic, protein-synthetic and post-translational events, their different timescales and their half-lives [18]. By integrating a system of DEs for regulatory pathways with functional mapping, we have developed a new model, called network mapping, for mapping the underlying transcriptional, proteomic, and metabolomic QTLs and the interaction networks among these different types of QTLs [19]. The model can test which pathways are most important in producing final phenotypes and how genes control these pathways. The regulatory network can be predicted by combining environmental and genetic perturbations through network mapping.
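A two-variable example shows how a DE system can encode half-lives and genotype effects. The sketch below assumes a single gene whose mRNA level m and protein level p follow dm/dt = alpha - d_m m and dp/dt = beta m - d_p p, with degradation rates derived from half-lives and a transcription rate alpha that depends on the QTL genotype. The equations, parameter values, genotype labels and function name are all hypothetical. It is a toy stand-in for the high-dimensional systems discussed above, not the network mapping model itself.

```python
import math


def simulate(alpha, beta, half_life_m, half_life_p, t_end=200.0, dt=0.01):
    """Integrate dm/dt = alpha - d_m*m and dp/dt = beta*m - d_p*p with classical RK4.

    Returns the (mRNA, protein) levels reached at time t_end, starting from (0, 0).
    """
    d_m = math.log(2) / half_life_m  # degradation rate implied by the mRNA half-life
    d_p = math.log(2) / half_life_p  # degradation rate implied by the protein half-life

    def deriv(m, p):
        return alpha - d_m * m, beta * m - d_p * p

    m, p = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        k1m, k1p = deriv(m, p)
        k2m, k2p = deriv(m + 0.5 * dt * k1m, p + 0.5 * dt * k1p)
        k3m, k3p = deriv(m + 0.5 * dt * k2m, p + 0.5 * dt * k2p)
        k4m, k4p = deriv(m + dt * k3m, p + dt * k3p)
        m += dt * (k1m + 2 * k2m + 2 * k3m + k4m) / 6.0
        p += dt * (k1p + 2 * k2p + 2 * k3p + k4p) / 6.0
    return m, p


# Hypothetical genotype-specific transcription rates at one regulatory QTL.
transcription_rate = {"QQ": 2.0, "Qq": 1.5, "qq": 1.0}

steady_state = {}
for genotype, alpha in transcription_rate.items():
    steady_state[genotype] = simulate(alpha, beta=1.0, half_life_m=2.0, half_life_p=10.0)
    m, p = steady_state[genotype]
    print(f"{genotype}: mRNA = {m:.2f}, protein = {p:.2f}")
```

Because the genotype enters only through a parameter of the DE system, a QTL in this setting is detected as a difference in DE parameters among genotypes rather than as a difference in a single measured value. The short mRNA half-life and long protein half-life also give the two variables different timescales, which is the kind of feature the text above attributes to DE models.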

Outlook
Genetic analysis of complex traits has now developed to a point at which available approaches allow us to comprehend the genetic architecture of a complex trait and elucidate the rules for translating genetic variation among individuals into phenotypic variation of the trait. Because the number of genes is usually large, estimating the genetic effect of each gene becomes highly challenging. The following three strategies are recommended to confront this challenge:
1. Develop and use geometric series to model intrinsic changes of genetic effects over a sequence of genes on chromosomes with fewer parameters. The predictive model built on geometric series has been confirmed by results from quantitative genetic analyses of many traits [20].
2. Derive a variable selection approach, such as the lasso, to analyze all markers and their interactions at the same time [16]. Because the number of markers may be much larger than the number of samples, special regression approaches equipped with a penalty need to be developed. Shrinking the effects of a majority of markers to zero yields highly sparse estimates of marker effects.
3. Develop a new model for studying gene-environment interactions and charting the genetic basis of phenotypic plasticity for dynamic traits. A previously developed dynamic model [21,22] can be extended to quantify the genetic control of phenotypic plasticity over a range of discrete environments.
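The second strategy, penalized regression with more markers than samples, can be sketched with the ordinary (non-Bayesian) lasso solved by cyclic coordinate descent. This is a simpler stand-in for the Bayesian lasso cited above, chosen because it fits in a few lines. The simulated genotypes, effect sizes, penalty value and function names are all hypothetical.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold, returning exactly 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X b, and b starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of marker j with the residual, with its own contribution added back.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


# Simulated data (assumption): 30 individuals, 60 markers coded -1/0/1, two causal markers.
random.seed(1)
n_samples, n_markers = 30, 60
X = [[random.choice([-1.0, 0.0, 1.0]) for _ in range(n_markers)] for _ in range(n_samples)]
true_effects = {5: 3.0, 20: -2.0}
y = [sum(effect * row[j] for j, effect in true_effects.items()) + random.gauss(0.0, 0.2)
     for row in X]

beta = lasso_coordinate_descent(X, y, lam=0.3)
selected = {j: round(b, 2) for j, b in enumerate(beta) if b != 0.0}
print("markers with non-zero estimated effects:", selected)
```

The soft-thresholding step is what sets most marker effects exactly to zero, which gives the sparse estimates described in strategy 2. In practice the penalty `lam` would be chosen by cross-validation or, in a Bayesian formulation, given a prior and sampled together with the marker effects.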
Future genetic studies should focus on the mechanistic and process analysis of complex phenotypes by combining genetic approaches with the developmental and regulatory principles underlying trait formation and progression. Such an approach enables geneticists to perform the genetic, developmental and regulatory dissection of complex traits and to predict the phenotype from the genotype. Computational biologists should collaborate with experimental biologists to study the genetic architecture, developmental interactions and regulatory networks of complex traits. This will not only allow conceptual models to be tested and validated with real data, but is also likely to yield new insight into the genetic roots that drive the formation and development of complex phenotypes.