With the development of high-throughput next-generation sequencing and other advanced technologies, a large number of gene expression profiles have been produced. Many of these profiles are available from public databases [1-3]. A challenging research problem that has drawn considerable attention is inferring gene regulatory networks from these expression data. A gene regulatory network is represented by a directed graph in which nodes represent transcription factors or mRNAs and edges represent transcriptional regulatory relationships between pairs of nodes.
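The directed-graph representation described above can be illustrated with a minimal Python sketch. The class name `RegulatoryNetwork`, its methods, and the gene names are hypothetical and chosen only for illustration; they do not come from any particular tool or dataset.

```python
from collections import defaultdict


class RegulatoryNetwork:
    """Directed graph in which an edge (regulator -> target) stands for a
    transcriptional regulatory relationship. Illustrative sketch only."""

    def __init__(self):
        # Maps each regulator to the set of targets it regulates.
        self._targets = defaultdict(set)

    def add_edge(self, regulator, target):
        self._targets[regulator].add(target)

    def has_edge(self, regulator, target):
        # .get avoids creating an empty entry for unknown regulators.
        return target in self._targets.get(regulator, set())

    def edges(self):
        # Sorted list of (regulator, target) pairs.
        return sorted((r, t) for r, ts in self._targets.items() for t in ts)


# Hypothetical gene names, for illustration only.
net = RegulatoryNetwork()
net.add_edge("TF_A", "gene_1")
net.add_edge("TF_A", "gene_2")
net.add_edge("TF_B", "gene_1")
```

Because the graph is directed, `net.has_edge("TF_A", "gene_1")` is true while `net.has_edge("gene_1", "TF_A")` is false: the edge records which node regulates which.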
Maetschke et al. categorized existing network inference methods into three groups: unsupervised, supervised and semi-supervised. Supervised algorithms can achieve the highest accuracy among network inference methods, but they require a large number of positive and negative training examples. An example here refers to an edge between two nodes in a network. A positive example is a known interaction between two genes, while a negative example is an interaction that is known not to exist between two genes. Negative examples are difficult to obtain in many organisms, so some researchers instead use unknown interactions between genes as negative examples. Unsupervised algorithms infer networks based solely on gene expression profiles and do not need any training examples. Their accuracy is usually low, but they are useful for organisms for which training data are not available. Semi-supervised algorithms often exploit positive-unlabeled (PU) learning techniques: they take a small sample of positive examples and a large number of unlabeled examples to train a classification model, and then use the trained model to predict a network.
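To make the notion of training examples concrete, the following minimal Python sketch shows two things: a simple unsupervised score computed from expression profiles alone, and how candidate edges might be labeled as positive, negative or unlabeled for supervised or PU learning. Absolute Pearson correlation is used here only as one simple stand-in for an unsupervised score; it is not the method of any paper cited above. The function names, gene names and expression values are all hypothetical.

```python
from itertools import permutations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def unsupervised_scores(expr):
    """Score every ordered gene pair by |Pearson r|, using no training examples.
    Correlation is symmetric, so it cannot by itself orient an edge."""
    return {(g, h): abs(pearson(expr[g], expr[h]))
            for g, h in permutations(expr, 2)}


def label_examples(genes, known_positive, known_negative=frozenset()):
    """Label each candidate edge: 1 = positive, 0 = negative, None = unlabeled."""
    labels = {}
    for pair in permutations(genes, 2):
        if pair in known_positive:
            labels[pair] = 1
        elif pair in known_negative:
            labels[pair] = 0
        else:
            labels[pair] = None  # unknown interaction
    return labels


# Made-up expression profiles (one value per experiment), illustration only.
expr = {
    "TF_A": [1.0, 2.0, 3.0, 4.0],
    "gene_1": [2.0, 4.0, 6.0, 8.0],
    "gene_2": [4.0, 1.0, 3.0, 2.0],
}
scores = unsupervised_scores(expr)
labels = label_examples(expr, known_positive={("TF_A", "gene_1")})
unlabeled = [pair for pair, y in labels.items() if y is None]
```

With one known positive edge and no known negatives, the remaining five ordered pairs stay unlabeled. A PU learner would train on the positive and unlabeled sets directly, whereas treating the unlabeled pairs as negatives corresponds to the workaround some researchers adopt when true negatives are unavailable.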
Marbach et al. [5,6] developed an in silico benchmark suite within the DREAM (Dialogue on Reverse Engineering Assessment and Methods) project [7,8] and assessed the performance of 29 network inference methods. They concluded that reliable network inference from gene expression data remains an unsolved problem. Madhamshettiwar et al. evaluated nine state-of-the-art gene regulatory network inference methods using 38 simulated datasets. These authors observed that the performance of the evaluated methods depends on many factors, including features of the data, network size and topology, and parameter settings. Indeed, parameter settings often affect the accuracy of a network inference method, and identifying the optimal parameter values is a very challenging task.
Citation: Jason T. L. Wang (2015) Inferring Gene Regulatory Networks: Challenges and opportunities. J Data Mining Genomics Proteomics 6:e118. doi: 10.4172/2153-0602.1000e118