A Simple Method of Generating Complex DNA Profiles Utilizing Alu-based Markers with Applications in Forensics, Paternity, Genetic Mapping, Population Studies and Ancestry

Abstract
As DNA profiling systems become more complex, advancements to a relatively simple technique are presented that promote greater accessibility and usefulness for a variety of applications. In contrast, other simple tools commonly used to teach students about forensics and human populations have notable drawbacks. Two Alutetraplex systems, each utilizing four Alu presence/absence variants in a single reaction, were therefore developed to provide a simple methodology for generating complex profiles. A third Alutetraplex system is presented here, escalating the number of possible genotypes to 531,441, with all alleles of the 12 dimorphic markers being relatively common. Reproducible results were attained even with crude DNA preparations that had been stored frozen for several years and subjected to multiple freeze-thaw cycles. The incorporation of GelRed DNA stain in place of the highly toxic ethidium bromide promotes greater accessibility, particularly in a classroom setting. This study demonstrates the effectiveness of this profiling system as a simple but informative methodology for analyzing paternity and for genetic mapping of human traits, and provides data illustrating its potential in assessing the ancestry or geographic origins of an individual.

Introduction
The D1S80 variable number tandem repeat (VNTR) and the PV92 dimorphic Alu element are DNA profiling tools commonly used for educational purposes, and they are useful in teaching students about forensics and human population studies [1]. However, these simple, low-cost profiling tools have notable drawbacks. Limited resolution makes it difficult to assess alleles of the D1S80 locus, particularly on agarose gels, whereas the PV92 locus consists of only two alleles (presence or absence of the Alu), giving three possible genotypes, and is therefore not highly informative. PV92 is an example of an intermediate-frequency (IF) Alu, in which both the presence and absence forms are common in human populations, a condition also known as a dimorphism. These variants result from a relatively recent integration event of the Alu retrotransposon (Figure 1) and are therefore not fixed in the human genome. Approximately 1,200 of the over one million Alu elements within the human genome are dimorphic [2]. The presence and absence alleles can be distinguished by the polymerase chain reaction (PCR) using primers that flank the element (Figure 1). A number of IF Alu variants have been identified [3,4]; combining several of these markers in a single reaction therefore increases the number of possible genotypes exponentially without increasing the number of reactions. A major obstacle in generating these profiles by multiplex PCR, however, is the formation of heteroduplexes, because very similar Alu elements are found in each presence variant, which limits the capability of standard Taq DNA polymerase under a variety of conditions [5-7]. The Failsafe system (Epicentre Biotechnologies, Madison, WI), designed for difficult or multiplex PCR, has been demonstrated to work well with up to four IF Alu loci in a single reaction, referred to as an Alutetraplex, with the primary adjustment being the modification of primer concentrations [6,7]. Alutetraplexes offer a relatively low-cost technology that is simple to use, requires only standard laboratory equipment, and yet provides relatively complex DNA profiles. This technique gives individuals, particularly students, the opportunity to perform forensic analyses by using their own DNA samples and matching them to an unknown, a sample selected by the instructor [7].
Figure 1: Schematic diagram of an Alu integration into a chromosome, yielding a dimorphic presence/absence marker that can be analyzed by PCR using forward (F) and reverse (R) primers that flank the integration. By agarose gel electrophoresis the three possible genotypes can be determined, as the Alu integration yields a fragment approximately 300 bp larger than the absence (ancestral) allele.
Together with a database constructed from anonymous submissions [8], the technique has also allowed students to determine the proportion of individuals sharing a genotype and to perform population analyses such as genetic distances and assessment of Hardy-Weinberg equilibrium. The two developed Alutetraplexes [7] generate 6,561 possible genotypes ($3^4 \times 3^4$), far exceeding the 435 possible D1S80 genotypes. Additionally, one IF Alu locus has been found to consist of four alleles [9], providing a unique tool for studying human populations, particularly because the evolutionary order in which the alleles arose can be determined. Along with forensic and population analyses, this simple technique demonstrates how DNA markers can be used in paternity testing [7].
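The genotype counts quoted above follow from simple counting, sketched here in Python. The function names are illustrative only, and the figure of 29 D1S80 alleles is an assumption inferred from the quoted total of 435 (29 × 30 / 2); it is not stated in the text.

```python
def biallelic_genotypes(n_markers: int) -> int:
    """Each dimorphic marker has 3 genotypes (+/+, +/-, -/-),
    so n independent markers give 3**n combined genotypes."""
    return 3 ** n_markers


def multiallelic_genotypes(n_alleles: int) -> int:
    """Unordered pairs of alleles at a single locus: k(k+1)/2."""
    return n_alleles * (n_alleles + 1) // 2


# Two tetraplexes (8 markers) versus three tetraplexes (12 markers).
print(biallelic_genotypes(8), biallelic_genotypes(12))
# Assumed 29 D1S80 alleles (inferred, see lead-in).
print(multiallelic_genotypes(29))
```

Each additional tetraplex multiplies the number of possible profiles by 3^4 = 81 while adding only one reaction per sample.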
Polymorphic DNA markers of various types have been used in positional cloning to map the chromosomal position of a gene associated with a trait, with the objective of eventually identifying that gene [10]. A variable number tandem repeat (VNTR) or a microsatellite marker has greater potential than biallelic markers such as restriction fragment length polymorphisms (RFLPs), because the individual with the trait is more likely to be heterozygous for the marker. However, large data sets of biallelic markers, such as a map of 1.42 million single nucleotide polymorphisms (SNPs) distributed throughout the human genome [11], provide a highly useful tool for identifying biomedically important genes. An Alutetraplex consists of four biallelic markers in a single reaction. Therefore, if three tetraplex reactions were available, the chance of finding a useful marker (a heterozygous genotype) would increase, giving the student an opportunity to perform a mapping experiment and evaluate significance by generating Lod scores [12]. This type of study would involve a condition that poses no health risk, such as the autosomal dominant compelling helio-ophthalmic outburst (ACHOO) syndrome [13]. This syndrome is found in roughly 30% of the population and involves sneezing on initial exposure to bright light, also known as the photic sneeze reflex [14].
To enhance the Alutetraplex methodology for various applications, a third tetraplex of IF Alu variants was developed in this study. By escalating the number of possible genotypes that can be determined, this simple technique becomes a more prominent tool in forensic analysis. Alternative techniques incorporating as many as 30-32 Alu dimorphisms, in studies of specific populations, have demonstrated high discriminatory power, with the probability of a genotype match ranging from $3.7 \times 10^{-13}$ [15] to $5.53 \times 10^{-14}$ [16]. The use of a third set of markers also improves the ability to resolve paternity. For example, if the mother is heterozygous for all four markers in one tetraplex, the results are uninformative, but additional markers dramatically reduce this possibility and further aid in excluding individuals as the father. Additionally, this study presents the use of Alu-based tetraplexes as a tool for demonstrating genetic mapping of human traits.
The practical application of this method is supported by the use of crude rapid-extraction DNA preparations from buccal cells that had been stored at -20°C for several years and subjected to multiple freeze-thaw cycles. Additionally, the use of GelRed (Biotium, Hayward, CA) for analyzing tetraplex gels is presented as a safe alternative to the highly toxic standard DNA stain, ethidium bromide, making this technique more accessible, especially for classroom use. Overall, generating a third tetraplex increases the number of markers across different chromosomal sites and markedly enhances the methodology as a simple, rapid, and low-cost tool for paternity, population, genetic mapping, and forensic analyses; it may also provide information regarding ancestry or geographic origins.

DNA samples
Crude preparations of DNA were obtained from cheek swabs using the Epicentre BuccalAmp DNA Extraction Kit protocol (www.epibio.com). Approval was obtained through the institutional human subjects review panel.

Genetic mapping
Two families, in which the father and two children, but not the mother, had the ACHOO syndrome, volunteered their DNA samples for this study. The Alutetraplex PCR conditions and primers are provided in a previous study [7]. Alu variants that are heterozygous in the father and homozygous in the mother are useful in assessing linkage. Lod scores, which statistically measure linkage, were determined using the formula [12]

$$\mathrm{Lod}(\theta) = \log_{10}\left[\frac{\tfrac{1}{2}\,(1-\theta)^{\mathit{nr}}\,\theta^{\mathit{r}}}{(0.5)^{\mathit{nr}+\mathit{r}}} + \frac{\tfrac{1}{2}\,(1-\theta)^{\mathit{nr}'}\,\theta^{\mathit{r}'}}{(0.5)^{\mathit{nr}'+\mathit{r}'}}\right]$$

where nr is the number of non-recombinants and r the number of recombinants, assuming that one allele is linked with the trait on the same chromosome, and nr' and r' are the numbers of non-recombinants and recombinants, respectively, under the assumption that the alternative allele is initially on the same chromosome as the allele for the trait.
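As a hedged illustration of the Lod formula above, the following Python sketch evaluates it for a single family. The function and parameter names are illustrative and not taken from the cited method; the example inputs describe a hypothetical family in which two affected children inherit the same paternal allele, so one phase gives two non-recombinants and the other gives two recombinants.

```python
import math


def lod_score(theta: float, nr: int, r: int, nr_alt: int, r_alt: int) -> float:
    """Phase-unknown Lod score at recombination fraction theta.

    nr, r         -- non-recombinants / recombinants under one phase
    nr_alt, r_alt -- the same counts under the alternative phase
    """
    def phase_term(n_nonrec: int, n_rec: int) -> float:
        # Likelihood under linkage (prior 1/2 for this phase),
        # divided by the likelihood under independent assortment.
        linked = 0.5 * (1 - theta) ** n_nonrec * theta ** n_rec
        unlinked = 0.5 ** (n_nonrec + n_rec)
        return linked / unlinked

    return math.log10(phase_term(nr, r) + phase_term(nr_alt, r_alt))


# Two children sharing the same paternal allele, evaluated at several thetas.
for theta in (0.1, 0.2, 0.3, 0.4):
    print(theta, round(lod_score(theta, nr=2, r=0, nr_alt=0, r_alt=2), 3))
```

Because Lod scores from independent families are summed, per-family values computed this way can be added directly when data from several families are pooled.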

Generation of a new Alutetraplex
Two Alutetraplexes had previously been generated [6,7] and were reproduced in this study with the same protocols, but using the safer GelRed stain. This work entails the development of a third tetraplex, Alutetraplex 3. Intermediate-frequency Alu elements and primer sets were selected from those identified by Roy-Engel et al. [3] and Carroll et al. [4]. Selections were based on having alleles that could be distinguished by size on a 2% agarose gel. Primer sets that consistently worked well together were retained, and additional sets were then added until four markers were identified that generated interpretable and reproducible results. Alutetraplex 3 was generated by PCR amplification using 1X Buffer E (FailSafe buffer containing dNTPs and MgCl2; Epicentre Biotechnologies, Madison, WI) and primers (sequences provided in Table 1), with the following cycling conditions: 94°C for 2 min, followed by 32 cycles of 94°C for 30 s, 58°C for 30 s, and 72°C for 1 min 30 s, with a final extension step at 72°C for 5 min. Amplified products were separated on a 2% agarose gel in 1X TAE buffer with 1X GelRed stain (Biotium, Hayward, CA) incorporated in the gel. Gels were run at 135 volts until the bromophenol blue band was at the bottom of, or had just run off, the gel.

Refinements of Alu-based tetraplex reactions
Reproducible results of the two previously designed Alutetraplex reactions [7] were obtained (Figure 2) using rapidly extracted DNA preparations. For a laboratory course, the BuccalAmp Extraction Kit is advantageous because it generates ready-to-use DNA in just several minutes. However, because it relies only on heating to lyse the epithelial cells and degrade compounds inhibitory to amplification, as does the commonly used Chelex method [1], the preparations are crude relative to the purity of DNA obtained using affinity-based kits such as QIAamp. Crude preparations have been associated with lower stability and decreased amplification potential after long-term storage and multiple freeze-thaw cycles [17]. Because of the impurities, accurate quantitation could not be obtained using a NanoDrop 2000c spectrophotometer (Thermo Scientific, Waltham, MA): OD260/OD280 ratios of eleven samples isolated by the most recent class ranged from 1.05 to 1.38 (mean = 1.13 ± 0.09). Therefore, reactions incorporated the same volume of DNA rather than the same concentration. The samples used in the two Alu-based tetraplexes had been stored for several years with multiple freeze-thaw cycles and still gave interpretable results (Figure 2). Highly purified DNA isolated from peripheral lymphocytes [18], as anticipated, worked well in this study (data not shown). The much less toxic GelRed DNA stain was incorporated into these gels in place of ethidium bromide, with all eight alleles discernible in each tetraplex.

Incorporation of Alu-based tetraplexes in paternity analysis
Although Alu-based tetraplexes provide a unique tool to demonstrate paternity [7], increasing the number of biallelic markers improves the possibility of excluding individuals as the father and strengthens verification of the correct individual. Individual 6 (Figures 2A and 2B) represents the maternal sample. Because she is heterozygous for all four Alu loci in tetraplex 1 (Figure 2A), those data are uninformative for a paternity analysis. Therefore, the use of a second tetraplex was necessary to perform the analysis (Figure 2B). All the alleles in the children (samples 7 and 8) that were not transmitted by the mother (6) can be accounted for by the father (sample 5).
To help explain the concept, we could ask whether the individual represented by sample 9 could be the father. Since individual 7 has the 60- allele, whereas the mother (6) and individual 9 do not, individual 9 is ruled out as the father of this child. We could rule out individual 9 as the father of individual 8 in this case only because we can distinguish between two slightly different-sized 225+ alleles [9]. Individual 8 has the smaller 225+ allele (more details regarding this variant will be discussed), whereas neither the mother nor individual 9 does. If Alu 225 were scored only as + and - variants, we could not have ruled out individual 9 as the father of this child. The generation of a third Alutetraplex would therefore enhance the paternity analysis by providing additional potentially informative markers.
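The exclusion reasoning above can be sketched in a few lines of Python. This is a minimal illustration that assumes simple Mendelian transmission with no mutation or null alleles; the function names and the example genotypes are hypothetical and are not read from Figure 2.

```python
def possible_child_genotypes(mother: tuple, father: tuple) -> set:
    """All unordered genotypes a child could inherit at one marker:
    one allele from the mother and one from the father."""
    return {tuple(sorted((m, f))) for m in mother for f in father}


def is_excluded(mother: tuple, child: tuple, alleged_father: tuple) -> bool:
    """True if the child's genotype cannot arise from this pair."""
    return tuple(sorted(child)) not in possible_child_genotypes(mother, alleged_father)


def excluded_by_any_marker(mother: dict, child: dict, alleged_father: dict) -> bool:
    """A single incompatible marker is enough to exclude the alleged father."""
    return any(
        is_excluded(mother[marker], child[marker], alleged_father[marker])
        for marker in child
    )


# Hypothetical case: the child carries an absence allele ("-") that
# neither the mother nor the alleged father has.
print(is_excluded(mother=("+", "+"), child=("+", "-"), alleged_father=("+", "+")))
```

Allele labels are arbitrary strings, so a locus scored with two distinguishable presence alleles (as for Alu 225) can be represented by giving each presence allele its own label.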

Alu-based tetraplex reactions and their use in genetic mapping studies
Heterozygous markers are required to perform genetic mapping studies (outlined in Figure 3). However, heterozygosity of the mother in this pedigree is problematic: she does not exhibit the trait, and when heterozygous offspring come from two heterozygous parents, the paternally inherited allele, which is the one being assessed for linkage to the trait, cannot be determined. In this study Alutetraplex 1 was not informative because the mother was heterozygous for all markers (Figure 2A). However, Alutetraplex 2 provided markers informative for genetic mapping (Figures 2B and 4). In family 1, three of the four markers in the father (who has the trait) were heterozygous and the mother was homozygous for all four markers, allowing three genetic loci to be assessed for linkage. If the two children, both with the trait, shared the same paternal allele, the result would lean toward linkage, whereas if they had different paternal alleles, independent assortment would be inferred (Figure 3).
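The informativeness rule described above (father heterozygous, mother homozygous) and the deduction of the paternal allele can be sketched as follows. The helper names and the example genotypes are hypothetical; the sketch assumes a biallelic marker and error-free genotyping.

```python
def is_heterozygous(genotype: tuple) -> bool:
    return genotype[0] != genotype[1]


def is_informative(father: tuple, mother: tuple) -> bool:
    """Informative for mapping: the father (who has the trait) is
    heterozygous and the mother is homozygous."""
    return is_heterozygous(father) and not is_heterozygous(mother)


def paternal_allele(child: tuple, mother: tuple) -> str:
    """With a homozygous mother, remove her contribution from the child's
    genotype; what remains came from the father.
    Raises ValueError if the child lacks the maternal allele."""
    alleles = list(child)
    alleles.remove(mother[0])
    return alleles[0]


def children_share_paternal_allele(children: list, mother: tuple) -> bool:
    """True when every child inherited the same paternal allele."""
    return len({paternal_allele(child, mother) for child in children}) == 1


father, mother = ("+", "-"), ("-", "-")
children = [("+", "-"), ("+", "-")]
if is_informative(father, mother):
    print(children_share_paternal_allele(children, mother))
```

A shared paternal allele among affected children leans toward linkage, while differing paternal alleles point toward independent assortment, matching the reasoning outlined for Figure 3.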
Genetic mapping using the Alu 9 dimorphism showed that both children obtained the presence allele from the father (Figures 2B and 4), and Lod scores based on linkage at 10%, 20%, 30%, and 40% (theta values in the formula) yield values of 0.215, 0.134, 0.065, and 0.017, respectively. These are all positive numbers directed toward linkage but represent too small a sample size to generate a significant value. In contrast, the children had obtained different alleles for the Alu 60 and Alu 225 markers from the father, thereby generating Lod scores of -0.444, -0.194, -0.076, and -0.018 for the same theta values, respectively. These negative values shift the inference toward non-linkage for these chromosomal locations. Values are additive; therefore, as data are accumulated from different families, the Lod scores can be summed. A total greater than or equal to +3 provides significant support that the marker is linked to the gene for the trait, whereas a total less than or equal to -2 provides significant support that the marker is not linked to the gene. When the second family was analyzed with the same marker set, the father was heterozygous for only Alu 225, but the mother was as well, with the children also being heterozygous, and therefore none of the markers were informative. Additional markers, such as those incorporated into a third Alutetraplex, increase the probability of demonstrating the additive effect by potentially generating informative data from more than one family.

Development of Alutetraplex 3
In this investigation a third Alu-based tetraplex was generated (Figure 5), consisting of markers 109, 65, 201, and 241 (Table 1). The data were reproducible and the variants easily discernible. As a rule of thumb, uncertainty can be resolved by noting the size of the fragment and the intensity of the alternative allele. A case in which this guideline can be used is sample 3 (Figure 5), where there is a barely visible band at approximately the size of the 201+ allele, although it is clearly a smaller fragment than that found in lanes 1, 4, and 5, but not as small as the notably visible 241+ allele. Additionally, the relative intensity of the 201- allele for sample 3 is similar to that of sample 2, which is clearly homozygous for the absence allele. In accordance with this rationale is the comparison of the 109 alleles between samples 2 and 3, with 109- displayed as a much more intense band in the homozygous sample 3 than in the heterozygous sample 2. As verification of the correct interpretation, the data were consistent with individual testing of the primer sets (data not shown). This result was also consistent with repeated attempts using the markers 109, 65, and 201, which consistently worked well together. Several fourth markers were attempted (data not shown) to finalize this tetraplex. Marker 241 was found to work consistently well using the Failsafe system, with modifications of primer concentrations as previously described for building Alutetraplexes [6].

Alu-Based tetraplex reactions for forensic analysis and ancestry
With two Alu-based tetraplexes, 6,561 genotypes are possible. Therefore, in a laboratory course of 12 students, using one of the samples as the unknown, identification of the unknown individual is highly likely. To date, with over 60 samples used, no two individuals have had the identical genotype for the eight markers (data not shown). With more widespread use of this system, such as in various college and university classrooms, profiles can be submitted anonymously (e.g. http://www.emudnaprofiledatabase.org/index.php), increasing the data available for forensic analysis and potentially strengthening the use of this system for assessing ancestry. The generation of additional Alu-based tetraplexes would also be a valuable asset in pursuing these objectives, increasing the complexity of the possible outcomes while maintaining the simplicity of the technique. With the addition of four dimorphic Alu variants, the markers on the three gels yield 531,441 possible genotypes. Genotype and allele frequencies have been assessed in various populations (Table 2), supporting these markers as IF variants and hence as effective tools in forensic analysis. Based on the known values, students can then assess the probability of generating their genotype. Hardy-Weinberg equilibrium is assumed for each population, as none of the generated data deviated statistically from expected values. As an example of the potential for three Alu-based tetraplexes, the probability of obtaining the genotype 51A ++, 182 +-, ACE --, TPA25 +- ...
In contrast, the student can ascertain the probability of this genotype in his/her ethnic group. If African American ancestry was expected, the value would be $1.71 \times 10^{-7}$, whereas if Egyptian, the expected value would be $1.07 \times 10^{-6}$. Although only small data sets were evaluated, the probability that the individual is of African American descent is about six-fold lower, approaching an order of magnitude, than the probability of Egyptian descent. Additionally, Fst calculations, measuring unbiased genetic distances between populations, identify three markers (Alu 50, Alu 241, Alu 9) associated with very high differentiation (values of 0.25 or greater) between the African American and Egyptian populations (Table 3). The Alu 9 marker is exceptionally strong in distinguishing individuals of these populations. Therefore, Alu-based markers warrant consideration as applicable markers for inferring the ancestry and ethnicity of an unknown sample.
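The per-population genotype probabilities discussed above can be sketched as a product of Hardy-Weinberg genotype frequencies across markers, assuming the markers are independent. The two marker names are taken from the text, but the allele frequencies and population labels below are hypothetical placeholders, not the Table 2 values, and the function names are illustrative.

```python
from math import prod


def genotype_probability(genotype: str, presence_freq: float) -> float:
    """Hardy-Weinberg probability of a dimorphic genotype, where
    presence_freq is the population frequency of the + allele."""
    p = presence_freq
    q = 1.0 - p
    return {"++": p * p, "+-": 2 * p * q, "--": q * q}[genotype]


def profile_probability(profile: dict, presence_freqs: dict) -> float:
    """Multiply per-marker probabilities (markers assumed independent)."""
    return prod(
        genotype_probability(genotype, presence_freqs[marker])
        for marker, genotype in profile.items()
    )


profile = {"ACE": "--", "TPA25": "+-"}
population_a = {"ACE": 0.40, "TPA25": 0.50}  # hypothetical frequencies
population_b = {"ACE": 0.60, "TPA25": 0.20}  # hypothetical frequencies

prob_a = profile_probability(profile, population_a)
prob_b = profile_probability(profile, population_b)
print(prob_a, prob_b, prob_a / prob_b)
```

The ratio of the two profile probabilities indicates which population makes the genotype more likely. Applied to the two values quoted in the text, the ratio $1.07 \times 10^{-6} / 1.71 \times 10^{-7}$ is about 6.3.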

Discussion
The Alu-based tetraplex methodology, which analyzes four IF dimorphic Alu markers simultaneously, provides a simple tool to generate complex DNA profiles. With basic laboratory equipment and the use of GelRed DNA stain, the system is accessible to a wide range of users, particularly in a genetics- or genomics-based laboratory course. Students can use a rapid extraction kit to profile their own DNA, which has been shown to increase their interest in the material [19]. In this investigation a third Alutetraplex was developed to increase the usefulness of this profiling tool while maintaining the simplicity of the system. With a third Alutetraplex, this DNA profiling system is enhanced by increasing the possible number of genotypes to 531,441, with minimal additional procedural requirements: obtaining four additional primer sets and analyzing one additional PCR set. Utilizing three tetraplexes instead of 12 individual reactions for each sample saves time, reagents, and consumables. Additionally, because the analyses are performed on mini agarose gels, cost and time are further minimized. These are important considerations for a class-based laboratory. The generated profiles were demonstrated to be applicable to forensic, paternity, population, and genetic mapping studies.
Ancestry or personalized genetic histories (PGHs) can be assessed with polymorphic DNA markers that differ substantially in allele frequencies across populations. These are referred to as ancestral informative markers (AIMs) [20]. A portion of Alu dimorphisms may be classified as AIMs based on distinct variation supported by weighted allele frequencies and Fst values between ethnic groups or populations of different geographic origins [21-24]. With a limited data set [3,4,21,24,25], very high genetic differentiation between African American and Egyptian populations was supported for three of the 12 Alu markers used in the tetraplexes. Additionally, when a randomly selected genotype was assessed, the chance of the individual having an African American background was nearly an order of magnitude less than that of an Egyptian background. Combined, these findings illustrate the potential of this system to distinguish a sample among ethnic groups. The use of 100 Alu dimorphisms to infer geographic origins as either from Europe, Asia, Africa, or India correctly assigned 14 of 18 tested unknown samples to one of these four major global populations and properly determined the other four as having mixed ancestry [26].
As anticipated, ancestry is not limited to one of four global populations. Additionally, Cordaux et al. [23] identified two dimorphic Alus in which the presence allele was restricted to African populations. The development of a highly comprehensive map of Alu and other mobile element insertion polymorphisms [27] continues to build population data, advancing the power of these markers in assessing ancestry. Therefore, continued analysis of Alu dimorphisms and advancements of techniques to study these variants are anticipated to provide important contributions to the field.
Sets of approximately 30 Alu dimorphisms [15,16] have been demonstrated to have resolving power approaching that of the Combined DNA Index System (CODIS) used by the FBI, which involves 13 short tandem repeat (STR) loci consisting of 8-22 alleles and offers an approximate $10^{-15}$ probability of an exact match [28,29]. Advantages of the Alu-based system include the stability of Alu integrations and the lack of separate integrations into the exact same location, making these homoplasy-free markers, whereas STRs have the potential to change in number of repeats [30]. With additional Alu dimorphisms being identified by techniques such as anchored PCR [23], subtractive hybridization [31], and DNA sequencing of genomes [32], Alu-based systems have great potential to advance as a formidable tool in forensic analysis. One approach analyzes the Alu dimorphisms individually with PCR and gel electrophoresis [16]. This is a simple technique that provides clear results without specialized equipment and is feasible for standard molecular biology laboratories.
An alternative novel method of Alu dimorphism analysis is a multiplex PCR system that uses fluorescent dyes and is based on the target site duplication, in which size differences of 3-6 nucleotides, analyzed by capillary gel electrophoresis, distinguish the alleles [15]. In contrast to these techniques, although only 12 Alu variants are currently being analyzed, the tetraplex system provides a mechanism to analyze several variants at once, uses fewer reagents and less time than individual analysis of variants, and does not require specialized equipment or fluorescent dyes for labeling. Time and cost are important considerations for basic laboratories, particularly in an educational setting. The development of a third Alu-based tetraplex escalated the profile possibilities and enhances this tool for use in paternity, genetic mapping, population, and forensic studies, in addition to its potential use in assessing ancestry and/or geographic origins. Future development of a fourth profile of IF Alu markers would raise the number of possible genotypes to 43,046,721, moving the system toward a range suitable for potential commercial use.
A further advancement of the Alu-based system arose with the finding of additional Alu presence alleles that post-date the integration event [9]. A small but noticeable size variant of the Alu 225 marker, as seen in Alutetraplex 2 (Figure 2B), demonstrated a Mendelian pattern of inheritance for a three-allele system, and the presence allele was therefore previously investigated further [9]. Interestingly, three distinct presence alleles were identified in addition to the absence form, and the evolutionary order of the alleles could be assessed. A PCR with restriction enzyme digest protocol was developed to distinguish the four alleles. The system yielded additional findings not observed by presence/absence scoring. One presence allele was absent in the analyzed European Caucasians and South Americans, whereas a different presence allele was absent in all tested Asian individuals from various countries, indicative of distinct geographic variation. The African American and various African populations exhibited all four alleles, consistent with the African origins hypothesis. This finding suggested that there may be more informative allelic variation that post-dates Alu integration events, and this hypothesis is currently under investigation.
The application to genetic mapping was also enhanced by the development of the third Alutetraplex through the provision of additional biallelic markers. Since not all biallelic markers generate informative data, as shown in this study, this advancement increases the possibility of identifying informative markers for students to determine Lod scores in order to assess linkage of the marker to the trait. In this case, mapping of the gene for the ACHOO syndrome was chosen because roughly 30% of the population have this trait, which poses no health risk [14], and identifying families to participate would therefore be relatively easy. Since Lod scores are additive, a proposed database can be generated in which different laboratory courses incorporate their data for the 12 different markers until significance for linkage or non-linkage is reached; students would thus be participating in and contributing real data to an ACHOO syndrome mapping consortium.
In conclusion, although large-scale projects and advanced technologies are being used to incorporate dimorphic Alu markers in forensic applications, the Alutetraplex system with the newly developed third profile provides a tool that is relatively simple technologically, inexpensive, and accessible for classroom use, and that allows for highly informative forensic analyses along with an encouraging potential to assess ancestry. This simple method to generate complex profiles offers several advantages over the standard D1S80 VNTR and single Alu variant methods, particularly in an upper-level college course. Also presented was its use in genetic mapping to demonstrate positional cloning methodology. Lastly, this technique has the potential to advance further, particularly with the growing number of identified dimorphic Alu elements.