Genetic Analysis of Ten Gonosomal STR Loci in an Italian Population Using the Elucigene QST*R-XY Amplification Kit

Genotyping of X-chromosomal short tandem repeats (X-STRs) is an emerging tool in forensic genetics because of their inheritance pattern, and a large number of markers have been characterized. Quantitative fluorescence polymerase chain reaction (QF-PCR) analyses of STR markers on the X chromosome are performed routinely in medical genetics laboratories for the rapid detection of X-chromosome aneuploidy. In this study, 595 Italian participants were genotyped at 10 gonosomal STRs (DXS6803, DXS981, DXS6807, DXS1187, XHPRT, DXS7423, DXS6809, DXYS267, DXYS218, and DYS448) using a commercially available QF-PCR kit. Here, we report the allele architecture of DXS1187 and DXYS218, which have not previously been characterized for forensic use. The presence and extent of genetic linkage and linkage disequilibrium between all X-STRs were estimated. Allele and haplotype frequencies in the Italian population were assessed and are reported together with statistical parameters.


Introduction
Analyses of X-chromosome markers are useful supplemental tools for genetic investigation, kinship analysis, deficiency paternity cases, and the interpretation of complex profiles in DNA mixtures. Several X-chromosomal markers have been characterized by the forensic DNA community, and assays have been developed to detect X-chromosomal short tandem repeats (X-STRs) [1]. Commercial kits that analyze STR markers for medical purposes have also emerged. Quantitative fluorescence polymerase chain reaction (QF-PCR) is a routine technique in prenatal genetic diagnosis that allows for rapid, simple, and inexpensive diagnoses of common aneuploidies [2]. As in current forensic DNA kits, highly polymorphic STRs are amplified using fluorescent dye-labelled primers, detected by capillary electrophoresis, and analyzed using GeneMapper software [3]. QF-PCR results are interpreted according to the number and the fluorescence intensity of alleles at each locus. The methods and chemistry of QF-PCR diagnostic kits are similar to those currently used in forensic DNA analyses, but these kits have not been evaluated formally by the forensic DNA community. We examined the Elucigene QST*R-XY kit (Gen-Probe Life Sciences Ltd, Abingdon, UK), a DNA-based multiplexed assay for the rapid prenatal determination of sex chromosomal aneuploidies, including Klinefelter and Turner syndromes. This 12-plex QF-PCR enables the identification of the Amelogenin marker, which amplifies non-polymorphic sequences on the X (104 bp) and Y (110 bp) chromosomes, and of the non-polymorphic Y-specific SRY marker, which permits gender determination. Elucigene QST*R-XY targets (1) the pseudoautosomal STR markers, DXYS267 and DXYS218, located on both the X and Y chromosomes; (2) the X-specific markers, DXS6803, DXS981, DXS6807, DXS1187, XHPRT, DXS7423, and DXS6809; and (3) the Y-specific marker, DYS448 [4].
In this study, we describe the allele frequencies of these gonosomal STR markers in an Italian population, and we characterize the STR structures of the novel markers, DXS1187 and DXYS218. Statistical and genetic parameters confirm the informativity and usefulness of the Elucigene QST*R-XY kit for forensic purposes.

Materials and Methods
Samples and data collection
DNA was extracted from buccal swabs of 595 unrelated healthy volunteer donors born in Central Italy (284 males, 311 females) using a QIAamp DNA Blood mini kit (Qiagen, Hilden, Germany). All individuals provided their written informed consent.
All DNA samples were amplified using Elucigene QST*R-XY kits (Gen-Probe Life Sciences Ltd, Abingdon, UK) by dispensing 2 μl of DNA into a 0.2-ml PCR vial containing 10 μl of QST*R-XY reaction mix. Samples were amplified on an Applied Biosystems 9800 Fast Thermal Cycler according to the following program: initial denaturation at 95°C for 15 min; 26 cycles of 95°C for 30 s, 59°C for 90 s, and 72°C for 90 s; and a final extension at 72°C for 30 min. PCR products were separated and detected on an ABI 3130xl Genetic Analyzer (Applied Biosystems, Foster City, CA). Raw data were analyzed using GeneMapper ID 3.2 software (Applied Biosystems), according to the manufacturer's recommendations.

Analysis of STR structures
To establish a correspondence between electrophoretic lengths (bp) and allele structures, we sequenced the STRs of several homozygous/hemizygous samples. Elucigene QST*R-XY enables the detection of 10 STRs, of which 8 have been described previously for forensic purposes [5][6][7][8][9][10][11][12][13][14]. We extensively sequenced and characterized the structures of the novel markers, DXS1187 and DXYS218. Forward and reverse primers (Invitrogen) were designed and used for PCR and sequencing (Table 1). PCR was performed in 25 µl reaction volumes, each containing 5-10 ng of genomic DNA, 0.7 µM of each primer, 200 µM of each dNTP, 2 mM MgCl2, 1 U of AmpliTaq Gold DNA polymerase (Applied Biosystems), and 10X PCR buffer (Applied Biosystems), using a GeneAmp® PCR System 9700 (Applied Biosystems). Samples were amplified according to the following program: initial denaturation at 96°C for 10 min; 30 amplification cycles of 96°C for 1 min, 57-60°C for 1 min, and 72°C for 1 min; and a final extension at 72°C for 30 min. Following enzymatic purification with 1 U exonuclease I and 2 U alkaline phosphatase (Ambion, Austin, TX), samples were sequenced using a BigDye Terminator v3.1 Cycle Sequencing kit (Applied Biosystems) according to the manufacturer's instructions. Sequencing reaction products were purified from the residual dye terminators using a BigDye XTerminator Purification kit (Applied Biosystems). Repeat structures were determined by direct sequencing of PCR products using an ABI 3130xl Genetic Analyzer (Applied Biosystems).

Table 1 (columns: Marker; Primer sequence (5'-3'))

ARLEQUIN 3.10 [15] software was used to calculate sex-specific allele frequencies, gene diversity, and heterozygosity. Allele frequency distributions observed for the loci in male and female samples were compared using an exact test of population differentiation. Deviations from Hardy-Weinberg equilibrium at each locus were tested for statistical significance using female data, according to Guo and Thompson [16]. ARLEQUIN 3.10 was also used to estimate the linkage disequilibrium (LD) between all pairs of loci by the exact test. All statistical tests and all calculations of standard deviations (SDs) were based on 10,000 randomizations. To determine linkage values and estimate a genetic map of the markers, we submitted the physical positions of the examined STRs to the Rutgers Map Interpolator application in the Rutgers Combined Linkage-Physical Map of the human genome (v.2) [http://compgen.rutgers.edu/Default.aspx] [17]. The Rutgers Map is a high-resolution genetic map that assembles the largest set of polymorphic markers publicly available. The Rutgers Map includes sequence-based positional data, recombination-based data, and genotype data from the CEPH and deCODE pedigrees, the Marshfield map, and the SNP Consortium. Genetic localizations and recombination values were estimated using the Kosambi mapping function [18]. The ChrX-STR application [http://www.chrx-str.org] was used to determine the polymorphism information content (PIC) [19], the power of discrimination (PD) for males and females [20], and the mean exclusion chance (MEC) [20][21][22].
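As an illustration of how the parameters named above follow from allele frequencies, the sketch below computes PIC, PD for males and females, and one MEC variant for a single X-STR locus. It is a minimal sketch, not the ChrX-STR implementation: the formulas are the commonly cited closed forms (PIC after Botstein, MEC for trios after Desmarais), the text does not state which MEC variant was reported, and the function name `xstr_parameters` and the toy frequencies are hypothetical.

```python
from typing import Dict


def xstr_parameters(freqs: Dict[str, float]) -> Dict[str, float]:
    """Forensic parameters for one X-STR locus from its allele frequencies.

    Assumed formulas (commonly cited forms, not taken from the paper):
      PIC       = 1 - sum(p^2) - (sum(p^2))^2 + sum(p^4)
      PD_male   = 1 - sum(p^2)
      PD_female = 1 - 2 * (sum(p^2))^2 + sum(p^4)
      MEC_trio  = 1 - sum(p^2) + sum(p^4) - (sum(p^2))^2
    """
    p = list(freqs.values())
    if abs(sum(p) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    s2 = sum(x ** 2 for x in p)  # sum of squared frequencies
    s4 = sum(x ** 4 for x in p)  # sum of fourth powers
    return {
        "PIC": 1 - s2 - s2 ** 2 + s4,
        "PD_male": 1 - s2,
        "PD_female": 1 - 2 * s2 ** 2 + s4,
        "MEC_trio": 1 - s2 + s4 - s2 ** 2,
    }


# Toy locus with four equally frequent alleles (hypothetical data).
example = xstr_parameters({"12": 0.25, "13": 0.25, "14": 0.25, "15": 0.25})
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

With these closed forms, PIC and the trio MEC are algebraically identical, so they coincide for any set of frequencies; other MEC variants cited in the literature would give different values.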

Allelic designation and estimation of frequencies
Sequencing data were used to assign alleles in accordance with the ISFG recommendations [23]. Fragment sizes were analyzed by comparison with allelic ladders prepared in house by pooling sequenced alleles. Control DNA from cell line 9947A (Applied Biosystems) was used as a reference standard. Repeat structures and alleles observed among the 10 STRs are summarized in Table 2. We examined in detail the structures of two loci that had not previously been characterized for forensic purposes, DXS1187 and DXYS218. Sequencing of 45 chromosomes identified DXS1187 as a tetrameric repeat marker, with 10 alleles and a (GATA)2-GAT-(GATA)n sequence structure. According to the recommended nomenclature [23], the proposed designation of DXS1187 alleles ranges from 12 to 20. For DXYS218, sequencing of 42 chromosomes revealed a repeat structure of (AGAT)2-GAT-AGAT-ACAT-(AGAT)n on both the X and Y chromosomes and alleles ranging from 11 to 17.
Sequence data obtained from the other eight STRs were consistent with published structures [5][6][7][8][9][10][11][12][13][14]. As previously described [11], for the XHPRT marker in one sample we observed an architecture consisting of an AG dinucleotide deletion 48 bp downstream of a 12-repeat (AGAT) sequence, producing an allele fragment that is shorter by two bases. At the DXYS267 locus, corresponding to the DYS393 marker [13], Elucigene QST*R-XY PCR detects the Y-chromosomal del(TTAG) polymorphism located 90 bp downstream of the STR. Therefore, allele frequencies for the DXYS267 marker were estimated using only female data. Similarly, DXS6807 sequencing indicated that Elucigene QST*R-XY PCR amplifies a del(AATAA) polymorphism, 62 bp downstream of the STR, associated with allele 11.
The exact test on female data showed no significant deviation from Hardy-Weinberg equilibrium. No significant differences in allele frequencies were found at any locus between the male and female subgroups. For this reason, samples were pooled, and the allele frequencies of the ten X and Y chromosomal (XY)-STRs in the Italian population are reported (Table 3). At each locus, 6-13 alleles were observed.
The allele frequencies of the investigated X-STRs were similar to those reported in other European populations and in Italy [24][25][26]. No significant differences were detected when an exact test of population differentiation was performed to compare our marker frequencies with those obtained in other studies of Italian populations (100,000 steps in the Markov chain).
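The pooling of male and female samples described above amounts to counting chromosomes rather than individuals. The sketch below illustrates this for an X-specific locus, where each male contributes one allele and each female two. It is an illustrative sketch only: the function name and the toy genotypes are hypothetical, and it does not apply to the pseudoautosomal loci, which are present on both sex chromosomes in males (DXYS267 frequencies were in any case estimated from female data only).

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def pooled_allele_frequencies(
    male_alleles: Iterable[str],
    female_genotypes: Iterable[Tuple[str, str]],
) -> Dict[str, float]:
    """Pooled allele frequencies at an X-specific locus.

    Males are hemizygous (one allele each); females carry two alleles.
    """
    counts = Counter(male_alleles)
    for first, second in female_genotypes:
        counts[first] += 1
        counts[second] += 1
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


# Number of X chromosomes sampled in this study (284 males, 311 females).
N_MALES, N_FEMALES = 284, 311
n_x_chromosomes = N_MALES + 2 * N_FEMALES
print(n_x_chromosomes)

# Toy data (hypothetical): two males and two females.
toy = pooled_allele_frequencies(["12", "13"], [("12", "12"), ("13", "14")])
print(toy)
```

For the sample sizes given in Materials and Methods, the pooled frequencies at each X-specific locus are therefore based on 906 X chromosomes, assuming every sample was typed successfully at that locus.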

Linkage and LD
Unlike autosomal STRs, gonosomal markers are syntenic. Both physical dependencies between loci (linkage) and dependencies between alleles at different associated loci (LD) may occur at gonosomal markers and can affect statistical analyses. In kinship testing, if loci are closely linked and in LD, it is strongly recommended to use haplotype frequencies of clustered STRs rather than single STR frequencies [27][28][29].
Linkage measures the co-segregation of closely positioned loci within families and provides an estimate of the genetic distance between loci. Alleles at X-STR loci recombine exclusively in female meioses and at a frequency dependent upon their genetic distance. Table 4 summarizes the recombination fraction theta (θ) values and the genetic distances between the considered X-STR markers obtained from the Rutgers Map.
Genetic distances confirm the distribution of our X-STRs into the typical four linkage groups on Xp22.32 (DXS6807), Xq11/21 (DXS981-DXS6803-DXS6809), Xq26 (DXS1187-XHPRT), and Xq28 (DXS7423). The linkage groups are located at distances ranging from 35 to 78 cM (θ from 0.30 to 0.46) and were regarded as indicative of independent genotypes. The DXS1187-XHPRT cluster spans approximately 1.7 cM, whereas adjacent markers in the DXS981-DXS6803-DXS6809 cluster are separated by 6.59 cM and 8.72 cM, respectively. Similar genetic distances were published recently in a large recombination study [30].
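The relation between the genetic distances and the recombination fractions quoted above can be reproduced with the Kosambi mapping function, theta = 0.5 * tanh(2d), with d in Morgans. The snippet below is a minimal sketch of that conversion; the function name `kosambi_theta` is hypothetical, and the three input distances are simply the values mentioned in the text.

```python
import math


def kosambi_theta(distance_cm: float) -> float:
    """Recombination fraction from genetic distance (cM), Kosambi mapping function."""
    d_morgans = distance_cm / 100.0
    return 0.5 * math.tanh(2.0 * d_morgans)


# Distances taken from the text: the DXS1187-XHPRT cluster and the
# smallest and largest distances between linkage groups.
for cm in (1.7, 35.0, 78.0):
    print(f"{cm:5.1f} cM -> theta = {kosambi_theta(cm):.3f}")
```

For 35 cM and 78 cM this gives recombination fractions of about 0.30 and 0.46, in line with the range reported for the linkage groups, while 1.7 cM corresponds to a recombination fraction of about 0.017.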
LD measures the non-random association of alleles at different loci at the population level. For closely linked markers, strong LD may be observed. Considerable LD implies a deviation of population-specific haplotype frequencies from the product of the corresponding allele frequencies. When this occurs, haplotype frequencies cannot be inferred from single-allele frequencies and must instead be estimated directly from population data. LD was assessed for all possible pairwise comparisons of loci and between X and Y STRs. Significant LD was detected only for the closest markers, DXS1187-XHPRT (p=0.043 ± 0.001). This result was expected given the distance between these two X-chromosomal loci (1.7 cM) and the smaller number of observed haplotypes compared to those expected (37 vs. 99). However, after Bonferroni correction for multiple tests, no significant association was detected in any pairwise comparison.
Because of the large number of possible haplotypes, a solid evaluation of LD between DXS1187-XHPRT would have required an extremely large sample size. Thus, haplotype frequencies were estimated directly from our population (
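The two ideas in this paragraph, the deviation of a haplotype frequency from the product of allele frequencies and the Bonferroni adjustment, can be sketched as follows. This is an illustration under stated assumptions rather than the ARLEQUIN procedure: the function names are hypothetical, the haplotype and allele frequencies are toy values, and the number of tests (21, the number of pairs among seven X-specific loci) is an assumption, since the text does not state how many comparisons were made once the Y-linked marker is included.

```python
def ld_coefficient(hap_freq: float, freq_a: float, freq_b: float) -> float:
    """D = observed haplotype frequency minus the product of allele frequencies."""
    return hap_freq - freq_a * freq_b


def bonferroni_significant(p_value: float, n_tests: int, alpha: float = 0.05) -> bool:
    """True if p_value remains significant after Bonferroni correction."""
    return p_value < alpha / n_tests


# Toy frequencies (hypothetical): D > 0 means the haplotype is more
# common than expected under linkage equilibrium.
print(ld_coefficient(hap_freq=0.10, freq_a=0.20, freq_b=0.30))

# p-value reported for DXS1187-XHPRT; N_TESTS is an assumption.
N_TESTS = 7 * 6 // 2
print(bonferroni_significant(0.043, 1))
print(bonferroni_significant(0.043, N_TESTS))
```

Because 0.043 exceeds 0.05 / m for any m of 2 or more, the conclusion that the association does not survive correction does not depend on the exact number of comparisons assumed here.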

Marker informativity
Statistical parameters were calculated for the XY-STRs and for the two considered clusters (Table 5). The heterozygosities and PIC values for most of the markers exceeded 0.7. In particular, DXS981 showed 13 alleles and a particularly large PIC of 0.815. Conversely, DXYS218 was the least polymorphic and least informative locus, with a PIC of 0.590. The genetic diversity of DYS448, corresponding to the PD, was estimated as 0.689. MEC values ranged from 0.385 to 0.815 for the individual XY-STR markers, whereas increases to 0.988 and 0.934 were observed in cluster I (DXS981-DXS6803-DXS6809) and cluster II (DXS1187-XHPRT), respectively. The low MEC values we obtained for the pseudoautosomal DXYS218 and DXYS267 STRs highlight the reduced efficiency of these markers compared to X-STR systems for kinship testing involving a daughter.

Conclusion
The application of X-STR marker analyses to forensics has been widely recognized, and several X-STR systems have been validated and adapted into multiplex amplification kits for forensic purposes. X-chromosome markers are especially helpful for deficiency cases, such as paternity testing without maternal genotype information or kinship analyses in which only remote relatives are available. X-chromosomal markers are more efficient than autosomal markers in these cases because they are associated with a larger MEC for a comparable PIC. In the present study, an Italian population was genotyped at 10 gonosomal STR markers using the commercially available Elucigene QST*R-XY kit. Most of these markers had already been validated for forensic use. We characterized the following two additional loci to determine their forensic utility: DXYS218, located in the pseudoautosomal region (PAR), and DXS1187, located in the region Xq26.1 near the XHPRT locus. Moreover, we would like to point out that this kit, widely available in diagnostic, non-forensic genetics laboratories, can also be used for fast segregation analysis of the X chromosome during prenatal or postnatal genetic linkage analysis.