Relationship between Ancestry Inferred by Molecular Analysis, Self-Report and Hetero-Classification

Pablo Abdon da Costa Francez1,2*, Adriano Ruiz Lima2, Rivelton Riverson Pereira de Almeida2 and Sidney Emanuel Batista dos Santos3

1 Department of Forensic Laboratories, Laboratory for Forensic Genetics, Technical and Scientific Police in Amapá, Macapá, Amapá, Brazil
2 Faculdade SEAMA, Macapá, Amapá, Brazil
3 Department of Pathology, Laboratory of Human and Medical Genetics, Federal University of Pará, Belém, Pará, Brazil


Introduction
The genetic variation observed among human populations is relevant to epidemiological studies and can be used to infer parental lineages in admixed populations. Although the biogeography of certain population groups in some parts of the world is culturally and genetically well established, other groups have a relatively recent history of admixture, with ancestors from different continents. This is the case of the Brazilian population, which was formed by admixture among three phylogeographic parental groups (Europeans, Africans and Native Americans), in proportions that vary by region [1,2].
Self-reported ancestry has been described as highly correlated with the genetic structure of well-defined, stratified phylogeographic groups such as Europeans, Africans or Asians [3][4][5]. In highly admixed populations, however, both declared ancestry and the anthropometric features usually employed, such as skin color, are controversial and unreliable indicators of ancestry [5][6][7][8]. In this context, molecular genetic markers could be of great value in reducing the potential consequences of population stratification [6,9,10].
Setting aside sociological and anthropological considerations, the Brazilian Institute of Geography and Statistics (Instituto Brasileiro de Geografia e Estatística, IBGE) treats race and skin color as equivalent in its demographic classification. In most Brazilian genetic association studies, population structure is controlled only by self-reported skin color, evaluation of anthropometric features, genealogical analysis or the interviewer's subjective opinion [11][12][13][14][15].
While grouping individuals by self-reported ancestry is a typical procedure for defining phylogeographic groups in studies of the Brazilian population, the recent use of molecular markers to infer genetic ancestry has revealed great genetic heterogeneity [16][17][18][19][20][21][22]. One problem with conducting association studies in admixed populations using only self-reported ancestry or skin color as proxies for ethnicity is the possibility of spurious associations, producing false-positive or false-negative results [23][24][25][26][27][28][29].
A growing number of publications have reported the use of ancestry-informative markers (AIMs), whose allele frequencies differ significantly among populations of different geographic origins and which can be used to estimate individual admixture and to identify population substructure. Most studies have used single nucleotide polymorphisms (SNPs) [30][31][32][33], but insertion/deletion markers (INDELs) amplified as small DNA fragments [29,34] and short tandem repeats (STRs) [35] have also been employed.
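To make the notion of an ancestry-informative marker concrete, the following Python sketch scores candidate markers by δ, the absolute difference in the frequency of the same allele between two parental populations, which is one commonly used measure of AIM informativeness. The marker names, frequencies and the 0.4 cut-off are hypothetical illustrations, not values from this study's panel.

```python
def delta(p_pop1: float, p_pop2: float) -> float:
    """Absolute difference in the frequency of the same allele in two populations."""
    return abs(p_pop1 - p_pop2)


def select_aims(freqs: dict[str, tuple[float, float]], threshold: float = 0.4) -> list[str]:
    """Return marker names whose delta reaches the (hypothetical) threshold,
    sorted from most to least informative."""
    scored = {name: delta(p1, p2) for name, (p1, p2) in freqs.items()}
    kept = [name for name, d in scored.items() if d >= threshold]
    return sorted(kept, key=lambda name: scored[name], reverse=True)


# Hypothetical insertion-allele frequencies (population 1, population 2).
example = {
    "INDEL_A": (0.85, 0.10),  # delta = 0.75
    "INDEL_B": (0.55, 0.45),  # delta = 0.10, poorly informative
    "INDEL_C": (0.20, 0.70),  # delta = 0.50
}
print(select_aims(example))  # ['INDEL_A', 'INDEL_C']
```

Markers with δ close to zero, like the hypothetical INDEL_B, carry little information for distinguishing parental contributions.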
Studies of genetic variation using DNA polymorphisms distributed throughout the genome have improved our understanding of the history and diversity of human populations and have provided systems for the genetic identification of individuals. Insertion/deletion polymorphisms (INDELs) are length polymorphisms produced by the insertion or deletion of one or more nucleotides in the genome. In recent years, INDELs have received attention in many studies, which have used them for a wide variety of purposes, including investigations of relatedness, analyses of the genetic structure of human populations and as genetic markers in natural populations [37][38][39][40].
The objective of this work was to evaluate the relationship between self-reported/hetero-classified ancestry, skin color and individual genetic ancestry estimated by using 48 insertion/deletion polymorphisms in an admixed population sample from the Brazilian Amazon region.

Samples
Peripheral blood samples were collected from 130 healthy, unrelated individuals (63 male, 67 female; mean age 37.5 years, range 18 to 69) attending routine examinations at the Clinical Analysis Laboratory (Laboratório de Análise Clínica UNILAB) in the city of Macapá (0°02′20″ N; 51°03′59″ W), State of Amapá, northern Brazil. After blood collection, digital photographs of all 130 volunteers were taken and recorded. All participants in this study signed a free and informed consent form.

Molecular analysis
DNA was extracted from peripheral blood mononuclear cells using the phenol-chloroform protocol [40]. DNA was quantified with a NanoDrop 1000 spectrophotometer (Thermo Scientific, Wilmington, DE, USA). PCR amplification conditions and primer sequences were as described by Santos et al. [41] (Supplementary Table 1). The amplification products were separated by capillary electrophoresis on an ABI 3130 Genetic Analyzer (Applied Biosystems) and analyzed with GeneScan and GeneMapper software.

Enquiry
Self-report: In the interviews, each of the 130 study subjects reported their skin color, their predominant ancestry and the percentages of European, African and Native American ancestry they believed they had.

Hetero-classification:
The recorded digital images of each study subject were shown to a second group of 40 volunteers (24 women and 16 men; mean age 25.3 years, range 18 to 58), selected mostly among university students or graduates, who signed a free and informed consent form agreeing to take part in the research. These volunteers assessed the skin color, predominant ancestry and ancestry percentages of the 130 photographed subjects (hetero-classification).
No time limit was set for the evaluation of the digital images by the volunteers (hetero-classification); most evaluations took between 40 minutes and one and a half hours.
The hetero-classified skin color and predominant ancestry of each of the 130 subjects were defined by majority of responses: a subject was assigned to a given skin color category (white, light brown, medium brown, dark brown or black) or predominant ancestry category (African, European or Native American) if at least 20 (50%) of the 40 evaluators assigned them to that category.
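The consensus rule described above can be sketched as follows. The function name is hypothetical; the 20-of-40 threshold comes from the text, while treating a 20/20 tie between two categories as unclassified is an assumption, since the text does not say how such ties were resolved.

```python
from collections import Counter
from typing import Optional


def consensus(labels: list[str], min_votes: int = 20) -> Optional[str]:
    """Return the category chosen by at least `min_votes` evaluators, else None.

    With 40 evaluators and min_votes=20, two categories can both reach 20 votes;
    such ties are left unclassified here (an assumption, not stated in the study).
    """
    if not labels:
        return None
    ranked = Counter(labels).most_common()
    top_label, top_votes = ranked[0]
    if top_votes < min_votes:
        return None  # no category reached the threshold
    if len(ranked) > 1 and ranked[1][1] == top_votes:
        return None  # tie at the top
    return top_label


# 40 hypothetical evaluations of one photographed subject.
votes = ["light brown"] * 24 + ["white"] * 10 + ["medium brown"] * 6
print(consensus(votes))  # light brown
```

The same function applies unchanged to predominant-ancestry labels (African, European, Native American).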
For hetero-classified ancestry percentages, each of the 40 volunteers indicated the percentage of African, Native American and European ancestry they believed each subject had, based on the physical characteristics observed in the photographs. The hetero-classified ancestry percentages for each individual were then calculated as the arithmetic mean of the percentages given by the 40 volunteers. This approach (hetero-classified ancestry) simulates a common situation in criminal cases, in which eyewitnesses describe the physical characteristics of offenders.
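The averaging step amounts to a per-component arithmetic mean. This sketch assumes each evaluator's answer is stored as a dictionary of the three percentages; three invented evaluators stand in for the 40 used in the study.

```python
from statistics import fmean

COMPONENTS = ("African", "Native American", "European")


def hetero_percentages(estimates: list[dict[str, float]]) -> dict[str, float]:
    """Arithmetic mean of each ancestry component across all evaluators."""
    return {c: fmean(e[c] for e in estimates) for c in COMPONENTS}


# Three hypothetical evaluators (the study used 40).
estimates = [
    {"African": 30, "Native American": 20, "European": 50},
    {"African": 40, "Native American": 30, "European": 30},
    {"African": 20, "Native American": 25, "European": 55},
]
print(hetero_percentages(estimates))
# {'African': 30.0, 'Native American': 25.0, 'European': 45.0}
```

If every evaluator's three percentages sum to 100, the averaged percentages also sum to 100, so no renormalization is needed.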

Statistical analysis
The genotypes of the 130 samples were analyzed for interethnic admixture using the STRUCTURE software (http://pritch.bsd.uchicago.edu/software/structure2_1.html), and the FST matrix and WPGMA tree were computed with the GDA program. The tree was displayed with the TreeView software (http://taxonomy.zoology.gla.ac.uk/rod/treeview.html) [42,43].
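WPGMA clustering repeatedly merges the two closest clusters and sets the distance from the merged cluster to every other cluster to the simple average of the two previous distances, without weighting by cluster size. The sketch below is a from-scratch illustration, not the GDA implementation; the population labels and distances are hypothetical, and placing each node at half the merge distance is one common convention.

```python
from itertools import combinations


def wpgma(names, dist):
    """Build a WPGMA tree.

    names: taxon labels; dist: {(a, b): distance} for every unordered pair.
    Returns (tree, merges): tree as nested tuples, merges as (a, b, node_height).
    """
    d = {frozenset(pair): value for pair, value in dist.items()}
    clusters = list(names)
    merges = []
    while len(clusters) > 1:
        # Closest pair among the current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        merges.append((a, b, d[frozenset((a, b))] / 2))
        merged = (a, b)
        clusters = [c for c in clusters if c != a and c != b]
        # WPGMA update: simple average, ignoring how many taxa each cluster holds.
        for c in clusters:
            d[frozenset((merged, c))] = (d[frozenset((a, c))] + d[frozenset((b, c))]) / 2
        clusters.append(merged)
    return clusters[0], merges


# Hypothetical pairwise distances (e.g. FST) among four populations.
dist = {
    ("P1", "P2"): 0.01, ("P1", "P3"): 0.06, ("P2", "P3"): 0.07,
    ("P1", "P4"): 0.20, ("P2", "P4"): 0.18, ("P3", "P4"): 0.22,
}
tree, merges = wpgma(["P1", "P2", "P3", "P4"], dist)
print(tree)  # ('P4', ('P3', ('P1', 'P2')))
```

The nested tuples can be turned into a Newick string for display in a tree viewer such as TreeView.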
The enquiry results were compared with the DNA-based ancestry estimates of the 130 subjects. These comparisons were assessed with the G test and linear regression, to determine whether the phenotypes (ancestry and skin color) indicated by self-report and by hetero-classification correlated with the genetic ancestry inferred from the AIMs, using the BioStat 5.0 program (Ayres et al.).

Ethics Committee
This study was submitted for analysis to the Ethics Committee of SEAMA College and was given approval, under protocol: 133/2010.

Results and Discussion
DNA samples from the 130 study subjects were analyzed, comparing them with the genetic profile of the three parental groups (Africans, Europeans and Native Americans) using the 48 AIMs described earlier.
The STRUCTURE program estimated a mean ancestry for the population of Macapá of 21% African, 50% European and 29% Native American. These ancestry percentages agree with those estimated in previous studies of the same population using biparentally inherited genetic markers [42,43] and of other admixed Brazilian populations [44][45][46][47][48] (Supplementary Tables 2 and 3).
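STRUCTURE estimates ancestry with a Bayesian MCMC model. As a simpler illustration of what an individual ancestry estimate is, the sketch below swaps in a different technique: a maximum-likelihood EM algorithm that treats the parental allele frequencies as known, similar in spirit to supervised admixture analyses. It is not what STRUCTURE does. Genotypes are assumed to be coded as the number of insertion alleles (0, 1 or 2), and all frequencies and genotypes are hypothetical. A population mean ancestry such as the one reported above corresponds to averaging the individual estimates.

```python
def estimate_admixture(genotypes, freqs, n_iter=1000, tol=1e-10, eps=1e-6):
    """EM estimate of one individual's ancestry proportions.

    genotypes: insertion-allele counts per locus (0, 1, 2) or None if missing.
    freqs: freqs[l][k] = insertion-allele frequency at locus l in parental population k,
           treated as known (a simplification relative to STRUCTURE).
    Returns a list of K proportions that sum to 1.
    """
    K = len(freqs[0])
    # Keep only typed loci; clip frequencies so no allele has zero probability.
    data = [(g, [min(max(p, eps), 1 - eps) for p in f])
            for g, f in zip(genotypes, freqs) if g is not None]
    q = [1.0 / K] * K
    for _ in range(n_iter):
        expected = [0.0] * K
        for g, f in data:
            p_ins = sum(qk * fk for qk, fk in zip(q, f))  # P(insertion allele | q)
            p_del = 1.0 - p_ins
            for k in range(K):
                # E-step: expected number of this locus' 2 allele copies from population k.
                expected[k] += g * q[k] * f[k] / p_ins + (2 - g) * q[k] * (1 - f[k]) / p_del
        # M-step: new proportions = expected copies per population / total copies.
        new_q = [e / (2 * len(data)) for e in expected]
        converged = max(abs(a - b) for a, b in zip(new_q, q)) < tol
        q = new_q
        if converged:
            break
    return q


# Hypothetical frequencies for 4 loci in (European, African, Native American) samples.
freqs = [
    [0.90, 0.10, 0.50],
    [0.15, 0.85, 0.20],
    [0.60, 0.05, 0.95],
    [0.20, 0.70, 0.10],
]
people = [[2, 0, 1, 0], [1, 2, 0, 1]]
estimates = [estimate_admixture(g, freqs) for g in people]
# Population mean ancestry = average of the individual estimates.
mean = [sum(col) / len(estimates) for col in zip(*estimates)]
print([round(x, 2) for x in mean])
```

With only four loci the estimates are very imprecise; a real panel such as the 48 AIMs used here gives far more information per individual.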
When the genetic profile obtained with this AIM panel for Macapá was compared with those of other Brazilian populations (southern regions, Belém do Pará, and "quilombola" African-descendant communities) and of the parental populations (Europeans, Africans and Native Americans), the population of Macapá clustered with that of Belém and diverged more from the other populations (Supplementary Figures 1-3). This result is consistent with the geographic proximity of the two populations, their similar colonization history and the fact that a significant percentage of the population of Amapá originates from the neighboring State of Pará.

Enquiry (hetero-classification)
Comparing the study subjects' self-reported and hetero-classified skin color, 70% of the subjects who self-reported as having white skin were also hetero-classified as having white skin. For the other skin colors, no such clear relationship between self-report and hetero-classification was observed when each category was evaluated separately. However, when the categories were grouped into three (W-LB = white and light brown, MB = medium brown, and DB-B = dark brown and black), 73.5% of the subjects who self-reported as having white or light brown skin were also hetero-classified in the same group. Likewise, 68.7% of the subjects who self-reported as having dark brown or black skin were also hetero-classified similarly (Supplementary Table 4).
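The grouping of the five skin-color categories into three, and the resulting concordance percentages, can be computed as in the sketch below. The function and data are hypothetical; subjects left unclassified by the hetero-classification consensus (represented here as None) are counted as disagreements, which is an assumption.

```python
GROUP_OF = {
    "white": "W-LB", "light brown": "W-LB",
    "medium brown": "MB",
    "dark brown": "DB-B", "black": "DB-B",
}


def concordance(self_reported, hetero):
    """Percentage of subjects in each self-reported group hetero-classified into the same group.

    hetero entries may be None (no consensus); these count as disagreements (assumption).
    """
    totals, agree = {}, {}
    for s, h in zip(self_reported, hetero):
        group = GROUP_OF[s]
        totals[group] = totals.get(group, 0) + 1
        agree[group] = agree.get(group, 0) + (GROUP_OF.get(h) == group)
    return {group: 100 * agree[group] / totals[group] for group in totals}


self_rep = ["white", "light brown", "black", "dark brown", "medium brown", "white"]
hetero = ["light brown", "white", "dark brown", "medium brown", "medium brown", None]
print(concordance(self_rep, hetero))
```

Grouping increases agreement because confusions between adjacent shades (for example, white vs light brown) no longer count as disagreements.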
These results indicate that self-report and hetero-classification largely agree on skin color for persons at the extremes of the range (white/light brown and dark brown/black).
Comparing the predominant ancestry obtained by self-report and by hetero-classification, 63.6% of the subjects self-reported as having predominantly African ancestry were also hetero-classified as having predominantly African ancestry. With regard to predominantly European ancestry, 66.1% of the subjects were both self-reported and hetero-classified as having predominantly European ancestry. Among the subjects self-reported as having predominantly Native American ancestry, 44% were also hetero-classified as predominantly Native American (Supplementary Table 5).
Comparing skin color with ancestry, both by self-report (Supplementary Table 6) and by hetero-classification (Supplementary Table 7), the G test showed significant differences in ancestry percentages among individuals with different skin colors. European ancestry predominated among individuals who self-reported and/or were hetero-classified as having white or light brown skin, and African ancestry predominated among those who self-reported and/or were hetero-classified as having dark brown or black skin.

Genetic Ancestry
Comparing the genetically estimated ancestry percentages with the self-reported and hetero-classified ancestry data, individuals with 60% to 70% European genetic ancestry were self-reported and hetero-classified as predominantly European in 79% and 68.4% of cases, respectively. Among individuals with more than 70% European genetic ancestry, the corresponding figures were 100% and 93.3%. A similar pattern was observed for African ancestry: individuals with 30% to 50% African genetic ancestry were self-reported and hetero-classified as predominantly African in 43% and 53.6% of cases, respectively, and individuals with more than 50% African genetic ancestry were self-reported and hetero-classified as predominantly African in 100% of cases (Tables 1 and 2; Figure 1). However, because the subgroup with more than 50% African ancestry was very small, these results should be confirmed by additional studies with larger samples.
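Percentages of this kind come from binning subjects by genetic ancestry and, within each band, counting how many reported the matching predominant ancestry. The sketch below uses half-open bands [low, high); whether the study's band edges were inclusive is not stated, so this is an assumption, and the data are invented.

```python
def percent_matching(genetic_pct, reported, low, high, category):
    """Among subjects with genetic ancestry in [low, high), the percentage whose
    reported predominant ancestry equals `category`; None if the band is empty."""
    in_band = [r for g, r in zip(genetic_pct, reported) if low <= g < high]
    if not in_band:
        return None
    return 100 * sum(r == category for r in in_band) / len(in_band)


# Hypothetical European genetic ancestry (%) and self-reported predominant ancestry.
european_pct = [65.0, 62.0, 68.0, 75.0, 80.0, 40.0]
self_report = ["European", "African", "European", "European", "European", "African"]
print(percent_matching(european_pct, self_report, 60, 70, "European"))            # 66.66666666666667
print(percent_matching(european_pct, self_report, 70, float("inf"), "European"))  # 100.0
```

Small bands produce unstable percentages, which is why the result for the subgroup above 50% African ancestry needs confirmation in larger samples.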
When the DNA-based ancestry percentages were compared with self-reported and hetero-classified skin color, European and African ancestry percentages were correlated with skin color. The higher the estimated European genetic ancestry, the greater the number of individuals self-reported or hetero-classified as having white or light brown skin; similarly, the higher the African ancestry, the greater the number of individuals self-reported or hetero-classified as having dark brown or black skin (Tables 3-5). Figures 2-4 compare the ancestry percentages (self-reported, hetero-classified and genetic) with the skin color of the study subjects: European ancestry clearly decreases and African ancestry increases as skin color darkens. Linear regression revealed statistically significant correlations between the genetic ancestry percentages and the ancestry percentages estimated by self-report and by hetero-classification (Supplementary Figures 4-12).
An important point is that, notwithstanding this correlation, self-report and hetero-classification overestimated the African and, to a lesser extent, the Native American contributions and underestimated the European contribution relative to genetic ancestry. This can be explained by the strong association observed between skin color and ancestry: a person with black or brown skin is automatically assumed to have a high degree of African ancestry, regardless of other characteristics. The same result was reported by Santos et al. [39].
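Over- and underestimation of this kind can be quantified as the mean signed difference, in percentage points, between reported (or hetero-classified) and genetic ancestry for each component; positive values indicate overestimation. The function and numbers below are hypothetical illustrations.

```python
from statistics import fmean


def mean_signed_difference(reported, genetic):
    """Mean of (reported - genetic) in percentage points; > 0 means overestimation."""
    return fmean(r - g for r, g in zip(reported, genetic))


# Hypothetical paired percentages for three subjects.
african_bias = mean_signed_difference([40, 50, 30], [20, 35, 15])    # +16.67
european_bias = mean_signed_difference([40, 30, 50], [60, 50, 65])  # -18.33
print(round(african_bias, 2), round(european_bias, 2))  # 16.67 -18.33
```

Unlike a correlation coefficient, which can be high even when one variable is systematically shifted, the signed difference captures exactly this kind of systematic bias.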
Some studies on ancestry [49] suggest that attempts to correlate AIM-based genetic ancestry with phenotypic characteristics such as skin color, or other features relevant to criminal investigations, would not be applicable in Brazil, given the high degree of admixture of its population. However, many of these studies used AIMs whose allele frequencies do not differ significantly among the parental populations. These AIMs were often selected for studies involving European populations and, when employed in Brazil, ended up overestimating European ancestry and minimizing the African and Native American contributions, making the Brazilian population appear much more homogeneous than it actually is.
Another recurrent problem in many studies is the use of modern African and Native American samples as representative of the parental populations that formed the Brazilian population, without taking into account the great genetic diversity of current African populations and the several population bottlenecks that Native American populations underwent during colonization, which substantially reduced the variability that existed in the past. To circumvent these problems, it is essential to investigate thoroughly the origin of the African populations that contributed to the admixed population under study, and to use the largest possible sample of Native Americans in order to compensate for the loss of diversity observed in modern populations.
The MULTI-INDELS panels described in this study were tested on different forensic samples and at various DNA concentrations, with satisfactory results, and proved to be a promising technology for criminal investigations [50]. This molecular tool was first used in a criminal investigation to help identify two skeletons found in a small boat adrift in the Atlantic Ocean near the northern coast of the Brazilian Amazon region, in the state of Amapá. Although it had been speculated that the crew were African, analysis with the MULTI-INDEL AIMs showed that they were admixed individuals, predominantly of African ancestry but with a significant percentage of European and Native American ancestry, indicating that they probably came from the Americas, possibly from northern South America or the Caribbean [51].

Conclusion
In conclusion, this AIM panel allowed inferring percentages of … Furthermore, MULTI-INDELS panels are easy to process, yield results quickly and require no significant adjustments in established forensic genetics laboratories, since they use the same chemistry and the same simple workflow already established for STRs.