Mitochondrial Haplogroups Associated with Japanese Centenarians, Alzheimer’s Patients, Parkinson’s Patients, Type 2 Diabetes Patients, Healthy Non-Obese Young Males, and Obese Young Males

Copyright: © 2011 Takasaki S. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Introduction
Mitochondria are essential cytoplasmic organelles generating cellular energy in the form of adenosine triphosphate by oxidative phosphorylation. Because most cells contain hundreds of mitochondria, each having multiple copies of mitochondrial DNA (mtDNA), each cell contains several thousand mtDNA copies. The mutation rate for mtDNA is very high, and when mtDNA mutations occur the cells contain a mixture of wild-type and mutant mtDNAs. As the mutations accumulate, the percentage of mutant mtDNAs increases and the amount of energy produced within the cell can decline until it falls below the level necessary for the cell to function normally. When this bioenergetic threshold is crossed, disease symptoms appear and become progressively worse. Mitochondrial diseases encompass an extraordinary assemblage of clinical problems, usually involving tissues that require large amounts of energy, such as heart, muscle, kidney, and endocrine tissues [1][2][3].
Although mtDNA mutations have been reported to be related to aging and a wide variety of diseases, such as Parkinson's disease (PD), Alzheimer's disease (AD), type 2 diabetes, and various kinds of cancer [4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20], those reports focused on the amino acid replacements caused by mtDNA mutations. Mitochondrial functions can of course be affected directly by amino acid replacements, but they can also be affected indirectly by mutations in mtDNA control regions. It is therefore important to examine the relations between all mtDNA mutations and centenarians or disease patients.
In the present study the relations between each of eight classes of Japanese people and their mitochondrial single nucleotide polymorphism (mtSNP) frequencies at mtDNA positions throughout the mitochondrial genome were examined using a classification method based on the predictions of a radial basis function (RBF) network [21,22] and using a modified version of that classification method [23]. This examination revealed new mitochondrial haplogroups characteristic of these classes, and the relations between the haplogroups and classes differ from those reported previously [15,16,24,25].

Materials and Methods

mtSNPs for the eight classes of people
Tanaka et al. sequenced the complete mitochondrial genomes of 672 Japanese individuals to construct an East Asia mitochondrial DNA (mtDNA) phylogeny [26]. Using these sequences and other published Asian sequences, they constructed the phylogenetic tree for macrohaplogroups M and N [26][27][28]. Kong et al. recently corrected the above sequences by re-sequencing the dubious fragments and segments [29]. In the present study the mtSNPs in 112 Japanese semi-supercentenarians were obtained from the report by Bilal et al. [25], and the mtSNPs in the other classes (96 Japanese centenarians, 96 Japanese Alzheimer's disease (AD) patients, 96 Japanese Parkinson's disease (PD) patients, 96 Japanese type 2 diabetes (T2D) patients, 96 Japanese T2D patients with angiopathy, 96 Japanese healthy non-obese young males, and 96 Japanese healthy obese young males) were obtained from the GiiB Human Mitochondrial Genome Polymorphism Database (http://mtsnp.tmig.or.jp/mtsnp).

Abstract
There are strong connections among mitochondria, aging, and a wide variety of diseases. In this paper, the relations between eight classes of Japanese people (96 centenarians, 112 semi-supercentenarians (over 105 years old but less than 116 years old), 96 Alzheimer's disease (AD) patients, 96 Parkinson's disease (PD) patients, 96 type 2 diabetes (T2D) patients, 96 T2D patients with angiopathy, 96 healthy non-obese young males, and 96 healthy obese young males) and their mitochondrial single nucleotide polymorphism (mtSNP) frequencies at individual mtDNA positions of the entire mitochondrial genome were examined using the radial basis function (RBF) network and a modified RBF method. New findings of mitochondrial haplogroups were obtained for individual classes. The centenarians were found to be associated with the haplogroups D4b2a, B5b, and M7b2; the semi-supercentenarians with B4c1a, F1, B4c1c1, B4c1b1, and M1; the AD patients with G2a and N9b1; the PD patients with N9a, G1a, B4e, and M7a1a; the T2D patients with D4b2b, M8a, and B5b; the T2D patients with angiopathy with N9a2, D4b1, and G2a; the healthy non-obese young males with D4g, N9a, D4b2b, and B4b/d/e; and the healthy obese young males with M7b2, D4b2b, B4c1, and M7a1a. These results are different from the previously reported haplogroup classifications. As the proposed analysis method can predict a person's mtSNP constitution and probabilities of becoming a centenarian, AD patient, PD patient, or T2D patient, it may be useful in the initial diagnosis of various diseases or longevity.

mtSNP classification using an RBF network
An RBF network is an artificial neural network used for supervised learning problems such as regression, classification, and time series prediction. In supervised learning a function is inferred from the examples (training set) that a teacher supplies. The elements in the training set are paired values of the independent (input) variable and dependent (output) variable. The RBF network shown in Figure 1 was trained in this supervised manner, and the mtSNP classification was carried out individually for each of the eight classes of people. In the mtSNP classification for the centenarians, the mtSNPs of the centenarians were regarded as correct and the mtSNPs of the other seven classes of people (i.e., semi-supercentenarians, AD patients, PD patients, T2D patients, T2D patients with angiopathy, non-obese young males, and obese young males) were regarded as incorrect. The mtSNP classifications for the other seven classes were carried out in the same way as that for the centenarians (Figure 1).
The mitochondrial genome sequences of the eight classes of people were divided into two sets, one of training data and the other of validation data, and the processes of the classifications were carried out in two phases: training and validation. The steps are described in detail elsewhere [30].
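The one-class-against-the-rest scheme and the two-phase procedure described above can be illustrated with a minimal sketch. It is not the network used in this study (those details are given in [30]); it assumes a binary encoding of mtSNPs (1 = variant present at a position), Gaussian basis functions centered on randomly chosen training individuals, output weights fitted by least squares, and entirely synthetic toy data. All function names and parameter values are hypothetical.

```python
import numpy as np

def rbf_features(X, centers, width):
    """Gaussian activation of every individual with respect to every center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def add_bias(H):
    """Append a constant column so the output layer has a bias term."""
    return np.hstack([H, np.ones((H.shape[0], 1))])

def train_rbf(X, y, n_centers=10, width=2.0, seed=0):
    """Training phase: pick centers from the training set, fit output weights."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(n_centers, len(X)), replace=False)
    centers = X[idx]
    H = add_bias(rbf_features(X, centers, width))
    weights, *_ = np.linalg.lstsq(H, y, rcond=None)
    return centers, width, weights

def predict_rbf(model, X):
    """Network output clipped to [0, 1], read here as a rough probability."""
    centers, width, weights = model
    H = add_bias(rbf_features(X, centers, width))
    return np.clip(H @ weights, 0.0, 1.0)

# Synthetic toy data (assumption): rows are individuals, columns are mtDNA positions.
rng = np.random.default_rng(1)
n_positions = 20
target = (rng.random((40, n_positions)) < 0.05).astype(float)
others = (rng.random((120, n_positions)) < 0.05).astype(float)
target[:, :5] = 1.0   # the target class carries variants at five positions
others[:, :5] = 0.0   # the other classes do not
X = np.vstack([target, others])
y = np.concatenate([np.ones(40), np.zeros(120)])  # 1 = "correct", 0 = "incorrect"

# Split into training and validation sets, then run the two phases.
order = rng.permutation(len(X))
train_idx, valid_idx = order[:120], order[120:]
model = train_rbf(X[train_idx], y[train_idx])
p_valid = predict_rbf(model, X[valid_idx])
accuracy = np.mean((p_valid >= 0.5) == (y[valid_idx] == 1.0))
print(f"validation accuracy: {accuracy:.2f}")
```

Repeating the same procedure with each of the eight classes in turn as the "correct" class gives one set of predicted probabilities per class.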

Modified classification based on probabilities predicted by the RBF network
[Figure 1: RBF network with an input layer (X1, X2, X3, X4, X5, ..., XTN), a hidden layer, and an output layer.]

Since an RBF network can predict the probabilities that persons with certain mtSNPs belong to certain classes, these predicted probabilities were used to identify mtSNP features. By examining the relations between individual mtSNPs and the persons with high predicted probabilities of belonging to one of these classes, other mtSNPs useful for distinguishing between the members in different classes were identified. The modified classification method based on the probabilities predicted by the RBF network was carried out in the following way [23].
1) Select the target class to be analyzed.
2) Rank individuals according to their predicted probabilities of belonging to the target class.
3) Either select individuals whose probabilities are greater than a certain value or select the desired number of individuals and set them as a modified cluster.
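The three steps above can be sketched as a small selection routine. This is an illustrative sketch only: the function name, the individual IDs, and the probability values are hypothetical, and the predicted probabilities are assumed to have been produced already by the trained network.

```python
def select_modified_cluster(probabilities, threshold=None, top_n=None):
    """Build a modified cluster for one target class.

    probabilities: dict mapping an individual ID to the predicted probability
    of belonging to the target class. Give exactly one of threshold or top_n.
    """
    if (threshold is None) == (top_n is None):
        raise ValueError("give exactly one of threshold or top_n")
    # Step 2: rank individuals from highest to lowest predicted probability.
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    # Step 3: keep individuals above the cut-off, or the desired number of them.
    if threshold is not None:
        return [ind for ind, p in ranked if p > threshold]
    return [ind for ind, _ in ranked[:top_n]]

# Step 1: the target class is fixed; these are hypothetical predictions for it.
predicted = {"ind01": 0.91, "ind02": 0.48, "ind03": 0.77, "ind04": 0.12, "ind05": 0.55}

above_half = select_modified_cluster(predicted, threshold=0.5)
top_two = select_modified_cluster(predicted, top_n=2)
print(above_half)
print(top_two)
```

The two selection modes correspond to the two analyses reported in the Results: a fixed number of highest-probability individuals, and all individuals above a probability cut-off.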

Results and Discussion
Associations between Asian/Japanese haplogroups and mtSNPs for the eight classes of people
[Table legend: O, haplogroups classified in the highest 15 individuals.]
[Table 3: Haplogroup-class relations determined using the individuals whose predicted probabilities were greater than 50%. Column headings: Centenarians (69), Semi-supercentenarians (18), AD patients (65), PD patients (24), T2D patients (7), T2D patients with angiopathy (72), Non-obese young males (16), Obese young males (58).]

The predicted probabilities of the highest cluster (classification ID 1) for the eight classes of people were 66.7% for both the centenarians and the semi-supercentenarians, 88.3% for the AD patients, 60% for the PD patients, 50% for the T2D patients, 81.3% for the T2D patients with angiopathy, 57.1% for the healthy non-obese young males, and 62.5% for the healthy obese young males. As individual classes have different predicted probabilities, for each of the eight classes the 15 individuals with the highest predicted probabilities were selected to examine the relations between Asian/Japanese haplogroups and mtSNPs [26][27][28]. For the centenarians the association between the haplogroups and mtSNPs that was based on the highest 15 individuals is shown in detailed form in Figure [...].
The relations between the haplogroups for the eight classes of people are listed in Table 2. The haplogroup M7a1a was common in PD patients and obese young males; M7b2 was common in centenarians and obese young males; G2a was common in AD patients and T2D patients with angiopathy; D4b2b was common in T2D patients, non-obese young males, and obese young males; B5b was common in centenarians and T2D patients; and N9a was common in PD patients and non-obese young males.
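The shared haplogroups listed above can be recovered mechanically from the per-class haplogroup lists. The sketch below uses the lists given in the Abstract; the variable names are illustrative.

```python
from collections import defaultdict

# Haplogroups associated with each class, as listed in the Abstract.
haplogroups_by_class = {
    "centenarians": ["D4b2a", "B5b", "M7b2"],
    "semi-supercentenarians": ["B4c1a", "F1", "B4c1c1", "B4c1b1", "M1"],
    "AD patients": ["G2a", "N9b1"],
    "PD patients": ["N9a", "G1a", "B4e", "M7a1a"],
    "T2D patients": ["D4b2b", "M8a", "B5b"],
    "T2D patients with angiopathy": ["N9a2", "D4b1", "G2a"],
    "non-obese young males": ["D4g", "N9a", "D4b2b", "B4b/d/e"],
    "obese young males": ["M7b2", "D4b2b", "B4c1", "M7a1a"],
}

# Invert the mapping: for each haplogroup, which classes carry it?
classes_by_haplogroup = defaultdict(list)
for cls, groups in haplogroups_by_class.items():
    for group in groups:
        classes_by_haplogroup[group].append(cls)

# A haplogroup is "common" when it appears in more than one class.
shared = {g: c for g, c in classes_by_haplogroup.items() if len(c) > 1}

for group, classes in sorted(shared.items()):
    print(group, "->", ", ".join(classes))
print(len(classes_by_haplogroup), "distinct haplogroups;", len(shared), "shared")
```

With these lists the inversion yields 21 distinct haplogroups, 6 of which are shared between classes.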
Then the individuals whose probabilities predicted using the modified classification method were greater than 50% were selected and their nucleotide distributions at individual mtDNA positions were examined. The individuals selected were 69 centenarians, 18 semi-supercentenarians, 65 AD patients, 24 PD patients, 7 T2D patients, 72 T2D patients with angiopathy, 16 healthy non-obese young males, and 58 healthy obese young males. The associations between the haplogroups and mtSNPs for the eight classes of people are shown in Figure 3A to H (provided in the Supplementary). The relations among the haplogroups for the eight classes of people are listed in Table 3. In the mtSNP analysis for the individuals whose probabilities were greater than 50% there were 30 haplogroups for the eight classes, whereas in the analysis for the 15 individuals with the highest predicted probabilities there were only 21 haplogroups. As a result, the ratios of individual haplogroups of the individuals whose probabilities were greater than 50% tended to be lower than those of the 15 individuals with the highest predicted probabilities. In addition, the analysis based on the individuals with probabilities greater than 50% yielded 12 common haplogroups among the eight classes of people, whereas the analysis based on the 15 individuals with the highest predicted probabilities yielded only 6 common haplogroups among the eight classes.
Although the semi-supercentenarians were classified into similar haplogroups in the two cases (case 1: the 15 individuals with the highest predicted probabilities; case 2: the individuals whose predicted probabilities were greater than 50%), their ratios were lower in case 2. That is, in case 2 the ratios of B4c1a, F1, M1, B4c1b1, and B4c1c1 decreased respectively from 40% to 33%, from 27% to 22%, from 7% to 6%, from 7% to 6%, and from 13% to 6%. In addition, a new haplogroup, M7b2 (6%), was classified in case 2.
Although the PD patients were classified into the same haplogroups in the two cases, their ratios were different in the two cases. That is, the G1a ratio was 20% in case 1 and 13% in case 2, the N9a ratio was 20% in case 1 and 21% in case 2, the M7a1a ratio was 13% in case 1 and 17% in case 2, and the B4e ratio was 13% in case 1 and 8% in case 2.
Although the T2D patients were classified into the same haplogroups in both cases, the haplogroup ratios were different in the two cases. That is, the D4b2b ratio was 40% in case 1 and 29% in case 2, the M8a ratio was 27% in case 1 and 14% in case 2, and the B5b ratio was 13% in case 1 and 14% in case 2.

Table 5: Differences between the standard statistical technique and the proposed method.
Item | Statistical technique | Proposed method
Technique | Relative relations between target and normal data | Supervised learning (RBF) by using correct and incorrect data
Analysis position | Each locus of mtDNA polymorphisms (independent position) | Entire loci of mtDNA polymorphisms (successive positions)
Input (required data) | Target (individual cases) and control (normal data) | Correct (individual cases) and incorrect (others except correct)
Output (results) | Odds ratio or relative risk | Clusters with predictions
Analysis | Check odds ratio or relative risk at each position | Check individuals in clusters based on prediction probabilities
The healthy non-obese young males were classified into the same haplogroups in both cases and their ratios were also nearly the same in the two cases. That is, the haplogroup D4g, N9a, D4b2b, and B4b/d/e ratios in cases 1 and 2 were respectively 33% and 31%, 27% and 25%, 13% and 12%, and 7% and 12%. This similarity of haplogroup ratios is due to the numbers of selected individuals being nearly the same in both cases: 15 in case 1 and 16 in case 2.
Then the relations between the haplogroups of pairs of related classes of people (centenarians and semi-supercentenarians, AD patients and PD patients, T2D patients and T2D patients with angiopathy, and healthy non-obese young males and healthy obese young males) were examined in cases 1 and 2.
Centenarians and semi-supercentenarians: Although in case 1 these two classes of people had no common haplogroups, in case 2 they had two common haplogroups M7b2 and F1.

AD patients and PD patients: Although AD and PD are both brain diseases, these patients had no common haplogroups in either case.
T2D patients and T2D patients with angiopathy: Although these two classes of people had no common haplogroups in case 1, they had a common haplogroup B5b in case 2.

Healthy non-obese young males and healthy obese young males: These two classes of people had a common haplogroup D4b2b in case 1 and had two common haplogroups D4g and D4b2b in case 2. Common haplogroups were found more often in case 2 because more haplogroups were classified in that case.
As there were 112 individuals in the class of semi-supercentenarians, changes in haplogroup classifications with changes in the number of highest-probability individuals selected were examined. As shown in Table 4, the number of haplogroups classified increased with the number of highest-probability individuals selected: 6 for the top 15, 9 for the top 30, 11 for the top 45, 14 for the top 60, 15 for the top 75, 16 for the top 90, and 17 for all 112 semi-supercentenarians. The ratios of the haplogroups B4c1a and F1 were respectively 40% and 27% for the 15 individuals with the highest predicted probabilities, but they decreased as the number of selected individuals increased and finally became respectively 5% and 4% when all 112 individuals were used. Although the haplogroup D4a was not classified when the 15 individuals with the highest predicted probabilities were used, its ratio was 3% when the 30 individuals with the highest predicted probabilities were used and was 29% when the 45 individuals with the highest predicted probabilities were used. This indicates that most of the semi-supercentenarians belonging to D4a had predicted probabilities in the range 45% to 74%. Table 4 thus implies that the characteristic features of the semi-supercentenarians appear when an appropriate number of individuals is selected; for the semi-supercentenarians, that number may be 45.
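The dilution of the B4c1a and F1 ratios can be checked with simple arithmetic. The sketch below rests on two assumptions that are not stated in the text: that 40% and 27% of 15 individuals correspond to 6 and 4 individuals, and that no further B4c1a or F1 individuals appear among the lower-probability semi-supercentenarians.

```python
def ratio_percent(count, n_selected):
    """Share of the selected individuals that belong to one haplogroup."""
    return 100.0 * count / n_selected

# Assumed counts, inferred from the rounded percentages for the top 15.
b4c1a_count = 6   # 6 / 15 = 40%
f1_count = 4      # 4 / 15 is about 27%

# Ratios when 15 individuals and when all 112 individuals are used.
dilution = {
    n: (ratio_percent(b4c1a_count, n), ratio_percent(f1_count, n))
    for n in (15, 112)
}

for n, (b4c1a, f1) in dilution.items():
    print(f"n = {n:3d}: B4c1a {b4c1a:.1f}%, F1 {f1:.1f}%")
```

Under these assumptions the same individuals account for roughly 5% and 4% of the full class of 112, consistent with the values reported for Table 4.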

Comparison with previous works
After analyzing the results of a large-scale study using hospital-based sampling data, Fuku et al. reported that the mitochondrial haplogroup F in Japanese individuals is associated with a significantly increased risk of type 2 diabetes mellitus (T2DM) (odds ratio 1.53, P=0.0032) [16]. In the present study, on the other hand, the haplogroups (risks) of the T2D patients were D4b2b (40%), M8a (27%), and B5b (13%), and those of the T2D patients with angiopathy were N9a (47%), D4b1 (20%), and G2a (13%) (Figures 2E and 2F, provided in the Supplementary). There were therefore substantial differences between the analysis of Fuku et al. [16] and the results of this study.

Differences between statistical technique and the proposed method
Although the previously reported methods analyzed the relations between mtSNPs and Japanese T2D patients, centenarians, and semi-supercentenarians by using standard statistical techniques [15,25], the mutual relations among the other classes of people (AD patients, PD patients, healthy non-obese young males, and healthy obese young males) were not investigated. The differences between and mutual relations among the eight classes of people were described in this study. In addition, the predicted probabilities of associations between mtSNPs and the eight classes of people cannot be obtained by the statistical techniques used in the previous methods, whereas the proposed method is able to compute them from the results obtained when learning the mtSNPs of individual classes.
Although the previous methods used standard statistical techniques, an RBF network was used in the present study because the relations among individual mtSNPs for the eight classes of people should be analyzed as mutual mtSNP connections in the entire population of mtSNPs.
The differences between the standard statistical technique and the proposed method are listed in Table 5. In the statistical technique, odds ratios or relative risks are analyzed on the basis of relative relations between target and control data at each polymorphic mtDNA locus. In the proposed method, on the other hand, clusters indicating predicted probabilities are examined on the basis of the RBF network using correct and incorrect data for the entire set of polymorphic mtDNA loci. The statistical technique determines characteristics of haplogroups by using independent mtDNA polymorphisms that indicate high odds ratios, whereas the proposed method determines them by checking individuals with high predicted probabilities. This means that the statistical technique uses the results of independent mutation positions, whereas the proposed method uses the results of all mutation positions. Because the two methods differ in these ways, which method is better will need to be determined in future research. Furthermore, the proposed method may be useful in the initial diagnosis of various diseases or longevity on the basis of the individual predicted probabilities.
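For comparison, the per-locus quantity used by the standard statistical technique can be sketched as follows. The odds ratio and its approximate 95% confidence interval (log method) are standard formulas; the counts are hypothetical and do not come from any of the studies cited here.

```python
import math

def odds_ratio(case_with, case_without, control_with, control_without):
    """Odds ratio for carrying a variant at one locus, cases versus controls."""
    return (case_with * control_without) / (case_without * control_with)

def odds_ratio_ci95(case_with, case_without, control_with, control_without):
    """Odds ratio with an approximate 95% confidence interval (log method)."""
    estimate = odds_ratio(case_with, case_without, control_with, control_without)
    se_log = math.sqrt(
        1 / case_with + 1 / case_without + 1 / control_with + 1 / control_without
    )
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper

# Hypothetical counts at a single polymorphic mtDNA position:
# 30 of 96 cases and 20 of 96 controls carry the variant.
estimate, lower, upper = odds_ratio_ci95(30, 66, 20, 76)
print(f"odds ratio {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Each locus is evaluated independently in this calculation, whereas the RBF-based method takes all positions of an individual as one input vector and returns a predicted probability per individual.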