Measuring Inequalities in Gene Co-expression Networks of HIV-1 Infection Using the Lorenz Curve and Gini Coefficient

Volume 5 • Issue 1 • 1000148 J Data Mining Genomics Proteomics ISSN: 2153-0602 JDMGP, an open access journal

The Gini methodology is a family of mathematical models that describe various relations within or between variables [1,2]. Its basic concept is the Gini coefficient (also known as the Gini index or Gini ratio), which measures the inequality of a distribution (e.g., income) with values ranging from 0 (complete equality) to 1 (complete inequality), and which has been widely used in economics to quantify income inequality within a country [3,4]. Because it can analyze data with both normalized and non-normalized distributions [2], the Gini coefficient and its derived statistical algorithms have been extended to disciplines as diverse as social science, chemistry and engineering. Recently, the Gini methodology has also been introduced to biology for inferring transcription regulation relationships from gene expression data [5], and for exploring the symbiosis and pathogenesis of human immunodeficiency virus type 1 (HIV-1) infection [6].

, where n is the number of genes in the network and X(i) is the i-th connectivity value sorted in increasing order, 0 ≤ X(1) ≤ X(2) ≤ … ≤ X(n). We observed that the Lorenz curve of the GCN at the AIDS stage deviates more markedly from the diagonal line than those of the other three GCNs (Figure 1B). Likewise, the Gini coefficient of the GCN at the AIDS stage is much higher than those of the other three GCNs. These results indicate dramatic changes in transcriptional regulation at the last stage of HIV infection.
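As an illustration of how a Lorenz curve and Gini coefficient can be obtained from a vector of gene connectivities, the following minimal Python sketch may help. It assumes a common sorted-value form of the Gini coefficient, G = Σ(2i − n − 1)·X(i) / (n·ΣX(i)), which may differ from the exact expression used in this study; the function names `lorenz_curve` and `gini` and the example values are hypothetical.

```python
import numpy as np


def _sorted_nonnegative(x):
    """Validate a connectivity vector and return it sorted in increasing order."""
    x = np.sort(np.asarray(x, dtype=float))
    if x.size == 0 or np.any(x < 0) or x.sum() == 0:
        raise ValueError("expected non-negative values with a positive sum")
    return x


def lorenz_curve(x):
    """Return (cumulative share of genes, cumulative share of connectivity)."""
    x = _sorted_nonnegative(x)
    gene_share = np.linspace(0.0, 1.0, x.size + 1)
    value_share = np.concatenate(([0.0], np.cumsum(x))) / x.sum()
    return gene_share, value_share


def gini(x):
    """Gini coefficient from sorted values (assumed standard form, see lead-in)."""
    x = _sorted_nonnegative(x)
    n = x.size
    i = np.arange(1, n + 1)  # ranks 1..n of the sorted values
    return float(np.sum((2 * i - n - 1) * x) / (n * x.sum()))


# Hypothetical connectivities: equal values give 0; one dominant hub raises the value.
equal = gini([4, 4, 4, 4])
hub = gini([0, 0, 0, 1])
```

In this form the largest attainable value for n genes is (n − 1)/n, so the coefficient approaches 1 only for large networks. The further the Lorenz curve bends below the diagonal, the larger the coefficient.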

HIV-1 is a virus that can cause acquired immunodeficiency syndrome (AIDS), leading to thousands of deaths per year worldwide due to the lack of effective vaccines and a cure. As a powerful systems biology approach, gene co-expression networks (GCNs) have recently been applied to investigate the molecular mechanisms of HIV-1 infection by organizing genes into a network in which two genes with similar expression patterns are connected by an edge [6-8]. An in-depth statistical analysis of HIV-related network properties will help to discover new biomarkers and signatures of HIV-1 infection.

Here we applied the Gini methodology to explore inequalities in GCNs constructed with 943 genes differentially expressed in human lymphatic tissues of uninfected subjects and infected patients at different stages of HIV-1 infection (the acute, the asymptomatic, and the AIDS stages). More details about the microarray data generation and normalization, and the selection of differentially expressed genes, can be found in Xu et al. [9]. To construct the GCNs, the similarity of expression patterns between two genes was measured with the Pearson correlation coefficient (PCC). Two genes were connected in a GCN if the significance level (p-value) of their PCC was lower than 0.05. The p-values were estimated with a permutation method by shuffling the gene expression data in the microarray dataset.

Application of Gini Coefficient to Estimate the Contribution of Positive and Negative Connectivity to the Connectivity Inequality (CI)
In the GCN, the connectivity of a gene is composed of positive and negative connectivity, which represent its connections to other genes with positive and negative PCC values, respectively. The contribution of positive and negative connectivity to the overall inequality of connectivity in the network is defined based on the decomposition of Gini coefficient (1):

$$CI = S_p \, \tau(X_p, X) \, CI_p + S_n \, \tau(X_n, X) \, CI_n$$

, where $CI_p$ and $CI_n$ are the inequality of positive and negative connectivity, respectively. $S_p$ and $S_n$ ($S_n = 1 - S_p$) are two Gini share measures that represent the percentages of positive and negative connections in the whole network, respectively. $\tau(X_p, X)$ and $\tau(X_n, X)$ are two Gini correlation coefficients ranging from -1 to 1, indicating the contribution of positive and negative connectivity to the CI, respectively. As shown in Figure 1D, the Gini share of positive connectivity is markedly higher than that of negative connectivity in all four networks, indicating that positive regulation is the dominant relation in the networks for uninfected subjects and for patients at different stages of HIV-1 infection. Interestingly, the positive regulations were enhanced at the first two stages of HIV-1 infection. In contrast, the negative regulations were enhanced at the AIDS stage. From the HIV-uninfected to the AIDS stage, the Gini correlation of negative connectivity changed more significantly than that of positive connectivity (Figure 1E), indicating that positive and negative co-expression associations might play different roles in the pathogenesis of HIV infection.
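The decomposition of connectivity inequality into positive and negative parts can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that total connectivity is the sum of the positive connectivity and the magnitude of the negative connectivity, that the Gini coefficient is computed in its covariance form 2·cov(X, F(X))/mean(X), and that the Gini correlation is cov(X_k, F(X))/cov(X_k, F(X_k)), where F is the empirical rank-based distribution. All function names and input values are hypothetical.

```python
import numpy as np


def empirical_cdf(x):
    """Rank-based empirical CDF; tied values share their average rank."""
    x = np.asarray(x, dtype=float)
    n = x.size
    order = np.argsort(x, kind="stable")
    ranks = np.empty(n)
    ranks[order] = np.arange(1, n + 1)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    mean_rank = np.bincount(inverse, weights=ranks) / counts
    return mean_rank[inverse] / n


def cov(a, b):
    """Population covariance (divides by n)."""
    return float(np.mean((a - a.mean()) * (b - b.mean())))


def gini_decomposition(x_pos, x_neg):
    """Return overall CI and, per component, its share, Gini, tau and contribution.

    Assumes each component has a positive mean and is not constant across genes.
    """
    x_pos = np.asarray(x_pos, dtype=float)
    x_neg = np.asarray(x_neg, dtype=float)  # magnitudes of negative connectivity
    total = x_pos + x_neg
    f_total = empirical_cdf(total)
    ci = 2 * cov(total, f_total) / total.mean()
    parts = {}
    for name, comp in (("positive", x_pos), ("negative", x_neg)):
        f_comp = empirical_cdf(comp)
        share = comp.mean() / total.mean()                # Gini share S_k
        ci_comp = 2 * cov(comp, f_comp) / comp.mean()     # inequality CI_k
        tau = cov(comp, f_total) / cov(comp, f_comp)      # Gini correlation
        parts[name] = {"share": share, "gini": ci_comp, "tau": tau,
                       "contribution": share * tau * ci_comp}
    return ci, parts


# Hypothetical connectivities for five genes.
ci, parts = gini_decomposition([5, 1, 8, 3, 9], [1, 2, 1, 4, 2])
```

Under these definitions the two contributions sum exactly to the overall CI, and the two shares sum to 1, which gives a quick consistency check on any input.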

Application of Gini Coefficient to Measure the Inequality of Edge Weights in GCNs
Besides the connectivity, the edge weights (i.e., correlation values) in the GCNs also changed during HIV-1 infection. For a given gene i, the changes in the correlation strengths can be calculated using the differential co-expression (dC) measure with the formula gene i and j in two networks, respectively. In this study, we observed differences in the inequality of edge weights between the GCNs of HIV-1 infection (Figure 2). At the acute and asymptomatic stages of HIV-1 infection, the edge weights are more equal than those in the network for uninfected subjects. However, the edge weights become dramatically unequal in the network for patients at the AIDS stage (Figure 2). On this basis, a novel measure, "delta Gini", was introduced to capture the differences in the inequality of edge weights between two networks. Although the delta Gini and dC were significantly correlated in most network comparisons (except AIDS vs. Uninfected) (Figure 3), the delta Gini provided additional information about the changes of edge weights between two networks. First, the delta Gini ranges from -1 to 1, with positive values indicating that the inequality of edge weights increased and negative values indicating that it decreased. Second, the delta Gini can help identify candidate biomarkers of HIV-1 that rank low by dC. For instance, MRC1 is a mannose receptor that interacts with several HIV proteins to promote viral spread [13-15], and it has a delta Gini value of -0.44 (rank=2) and a dC value of 0.96 (rank=173) when comparing the networks constructed for patients at the AIDS stage and for uninfected subjects. Similarly, PPFIBP1, which plays roles in HIV-1 replication, also has a high rank of delta Gini (value=-0.42; rank=3) but a low   Table 1.
These results indicate that the Gini algorithm would be a complementary approach to dC for comparing two GCNs.
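One way to read the delta Gini is as a per-gene difference between the Gini coefficients of that gene's edge weights in two networks. The Python sketch below illustrates that reading only; the exact definition used in this study is not stated above. It assumes that edge weights enter as absolute correlation values, that self-correlations are excluded, and that the sign convention is "comparison network minus reference network". The function names and the small matrices are hypothetical.

```python
import numpy as np


def gini(x):
    """Gini coefficient of non-negative values (sorted-value form)."""
    x = np.sort(np.asarray(x, dtype=float))
    total = x.sum()
    if total == 0:
        return 0.0  # all weights zero: treated as perfectly equal (assumption)
    n = x.size
    i = np.arange(1, n + 1)
    return float(np.sum((2 * i - n - 1) * x) / (n * total))


def delta_gini(w_reference, w_comparison):
    """Per-gene change in edge-weight inequality between two networks.

    Both inputs are square gene-by-gene correlation matrices with genes
    in the same order. A positive value means the gene's edge weights are
    more unequal in the comparison network than in the reference network.
    """
    w_ref = np.abs(np.asarray(w_reference, dtype=float))
    w_cmp = np.abs(np.asarray(w_comparison, dtype=float))
    if w_ref.shape != w_cmp.shape or w_ref.shape[0] != w_ref.shape[1]:
        raise ValueError("expected two square matrices of equal shape")
    n = w_ref.shape[0]
    off_diagonal = ~np.eye(n, dtype=bool)  # drop each gene's self-correlation
    result = np.empty(n)
    for g in range(n):
        keep = off_diagonal[g]
        result[g] = gini(w_cmp[g, keep]) - gini(w_ref[g, keep])
    return result


# Hypothetical 4-gene example: uniform weights versus a sparse network.
reference = np.full((4, 4), 0.5)
np.fill_diagonal(reference, 1.0)
comparison = np.array([[1.0, 0.9, 0.0, 0.0],
                       [0.9, 1.0, 0.0, 0.0],
                       [0.0, 0.0, 1.0, 0.5],
                       [0.0, 0.0, 0.5, 1.0]])
deltas = delta_gini(reference, comparison)
```

Because each Gini coefficient lies between 0 and 1, the difference stays within -1 to 1, consistent with the range described above. Ranking genes by this value alongside dC would then show which genes shift in the distribution of their edge weights rather than in their overall correlation strength.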