ISSN: 2157-7145
Journal of Forensic Research

DNASF: A Statistical Package to Analyze the Distribution and Polymorphism of CODIS STR Loci in a Heterogeneous Population

Nuzhat A Akram*

Department of Genetics, University of Karachi, Karachi, Pakistan

*Corresponding Author:
Nuzhat A Akram
Department of Genetics
University of Karachi
Karachi 75270, Pakistan
Tel: 923002589170
E-mail: [email protected]

Received date: October 15, 2012; Accepted date: October 27, 2012; Published date: October 29, 2012

Citation: Akram NA, Farooqi SR (2012) DNASF: A Statistical Package to Analyze the Distribution and Polymorphism of CODIS STR Loci in a Heterogeneous Population. J Forensic Res 3:170. doi:10.4172/2157-7145.1000170

Copyright: © 2012 Akram NA, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract

Short Tandem Repeat (STR) markers are moderately repetitive DNA segments that serve efficiently as core sequences for human identification. Their use as identification markers involves many technical and statistical issues. DNASF (DNA Statistics for Forensics) is a package of statistical programs designed to analyze STR distribution in a heterogeneous population. It includes two software programs, DNA Forensics GenePro and DNA Forensics, and a Microsoft Excel workbook, DNA AF. They compute a number of parameters used to estimate the forensic utility of STR loci, including genetic diversity, unbiased heterozygosity, Shannon information index, polymorphism information content, probability of exclusion and power of discrimination. In these programs each individual or subpopulation is defined on the basis of two variables, namely paternal ethnicity and mother tongue. The options for the two variables consist mainly of Indian subcontinent ethnicities and native languages, but this does not undermine the software's utility for researchers working on other populations. The input data are CODIS STR genotype data for DNA Forensics GenePro and allele frequency data for DNA Forensics. DNA AF can calculate allele frequencies and other descriptive statistics from genotype data. Each component of DNASF is user friendly and is provided with a set of instructions. For the validation studies, genotype data of five Pakistani subpopulations and allele frequency data of fifteen world populations were used. The validation studies indicate that DNASF is a reliable and effective tool for forensic investigations.

Keywords

STR polymorphism; Statistical package; Software; Forensic parameters; Validation studies

Introduction

Currently, STR loci are the most informative genetic markers for identity testing [1]. The high degree of STR polymorphism has shown great promise for DNA typing in forensic applications [2-6]. The American FBI has designated thirteen STR loci as a core set to be used in determining an individual's identity [7-9]. However, their use as human identification markers in a population is subject to various issues. One is the level of polymorphism, which determines their efficiency for human identification purposes; another is the presence of substructure in a population, i.e. the presence of genetically differentiated subpopulations within a population [10]. Subpopulation is a generic term indicating a cluster within a heterogeneous population [11]. Profile frequencies calculated from population averages might be seriously misleading for particular subpopulations [12-16]. Extensive studies of a wide variety of databases show that there are indeed substantial frequency differences among the major racial and linguistic groups, and within these groups there is often a statistically significant departure from random proportions. The National Research Council (NRC) reports of 1992 and 1996 suggested the use of random samples from “relatively genetically homogeneous” populations [14,17]. Construction of subpopulation databases can play a crucial role in establishing confidence in the results of DNA typing.

Advances in current DNA typing technology have made the construction of such databases at finer levels of population stratification straightforward. However, software tools are needed for various purposes such as storing, tracking, comparing and analyzing such databases [1,18]. A wide variety of software has been written to facilitate the task of managing, error checking, and analyzing genotype and phenotype data for genetic studies [19]. DNASF is a statistical package that offers STR data entry and performs otherwise cumbersome analyses for the estimation of forensic parameters in a user-friendly manner. It comprises two software programs and a Microsoft Excel workbook. Each component is provided with a user guide/manual. All the components, along with their user guides/manuals, are available from the corresponding author.

In April 2011, the FBI laboratory proposed an expanded set of core STR loci for the United States in order to reduce the likelihood of adventitious matches [20-22]. However, the current battery of STR loci is still validated for the analysis of single-source DNA profile cases. More autosomal STR loci are needed for kinship and DNA mixture analyses [22,23]. Therefore, only the thirteen STRs published in the FBI CODIS program in 1997 (http://www.cstl.nist.gov/strbase/fbicore.htm) are included in the program by default, along with the list of their alleles (http://www.cstl.nist.gov/strbase/).

Materials and Methods

DNASF

DNASF consists of two software programs: DNA Forensics GenePro and DNA Forensics. It also includes a Microsoft Excel workbook, DNA AF.

DNA Forensics GenePro: The software needs genotype data for the estimation of forensic parameters (Figure 1). Each subpopulation is defined on the basis of paternal ethnicity and mother tongue. Each individual's genotype data are entered under his or her subpopulation.

Figure 1: Screenshot of DNA Forensics GenePro.

DNA Forensics: The software needs allele frequency data as its input file for the estimation of the forensic parameters of a population or subpopulation, which is defined on the basis of paternal ethnicity and mother tongue (Figure 2).

Figure 2: Screenshot of DNA Forensics.

DNA AF: This is an Excel workbook which can calculate various statistics from genotype data (Figure 3). These include the allele frequencies, their variances and 95% confidence intervals, the numbers of heterozygotes and homozygotes, the number of chromosomes genotyped and the sample size of the population. It can also calculate the various forensic parameters.
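The descriptive statistics listed for DNA AF can be illustrated with a short Python sketch. This is not the workbook's implementation: the function name `allele_stats`, the binomial variance p(1 - p)/2n, the normal-approximation 95% interval and the sample genotypes are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def allele_stats(genotypes):
    """Summarise one locus from a list of (allele_a, allele_b) genotypes."""
    n_individuals = len(genotypes)
    n_chromosomes = 2 * n_individuals  # two alleles per individual
    counts = Counter(allele for pair in genotypes for allele in pair)
    homozygotes = sum(1 for a, b in genotypes if a == b)

    alleles = {}
    for allele, count in sorted(counts.items()):
        p = count / n_chromosomes
        # Assumption: binomial sampling variance with a normal-approximation CI.
        variance = p * (1 - p) / n_chromosomes
        half_width = 1.96 * sqrt(variance)
        alleles[allele] = {
            "frequency": p,
            "variance": variance,
            "ci95": (max(0.0, p - half_width), min(1.0, p + half_width)),
        }

    return {
        "sample_size": n_individuals,
        "chromosomes": n_chromosomes,
        "homozygotes": homozygotes,
        "heterozygotes": n_individuals - homozygotes,
        "alleles": alleles,
    }


# Hypothetical genotypes for a single locus (illustrative values only).
sample = [(6, 9), (9, 9), (6, 7), (7, 9.3), (6, 6)]
result = allele_stats(sample)
```

With so few individuals the normal approximation is crude; an exact binomial interval may be preferable for rare alleles.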

Figure 3: Preview of Microsoft Excel worksheet DNA AF.

Validation studies

1. Five Pakistani subpopulations, namely Balochi, Muhajir, Pathan, Punjabi and Sindhi, were genotyped for three CODIS STR loci: CSF1PO, TPOX and TH01. The genotype data were analyzed with the statistical program Powerstat (http://www.promega.com/geneticidtools/powerstats/) and with DNA Forensics GenePro for the estimation of forensic parameters. The parameters include heterozygosity, polymorphism information content, probability of exclusion, match probability and power of discrimination. Regression analysis was performed on three of the parameters, namely match probability, power of discrimination and polymorphism information content, to estimate the accuracy of computation (Figures 4-6).

Figure 4: Regression analysis between DNA Forensics GenePro and Powerstat for Power of Discrimination (PD) of five Pakistani subpopulations.

Figure 5: Regression analysis between DNA Forensics GenePro and Powerstat for Match Probability (MP) of five Pakistani subpopulations.

Figure 6: Regression analysis between DNA Forensics GenePro and Powerstat for Polymorphism Information Content (PIC) of five Pakistani subpopulations.

2. Allele frequencies for each of the three STR loci across the five Pakistani subpopulations were calculated using the Excel workbook DNA AF. The allele frequencies were used as input data for DNA Forensics. Forensic parameters were calculated, and regression analyses were performed between them and the parameters calculated by Powerstat to estimate the accuracy of the software (Figures 7-9).

Figure 7: Regression analysis between DNA Forensics and Powerstat for Match Probability (MP) of five Pakistani subpopulations.

Figure 8: Regression analysis between DNA Forensics and Powerstat for Power of Discrimination (PD) of five Pakistani subpopulations.

Figure 9: Regression analysis between DNA Forensics and Powerstat for Polymorphism Information Content (PIC) of five Pakistani subpopulations.

3. The forensic parameters heterozygosity and power of discrimination were calculated for the loci CSF1PO, TPOX and TH01 by DNA Forensics using allele frequency data reported in the literature (Table 1) (http://dnaa.bravehost.com/index.html). The calculated parameters were then compared with those reported in the literature. Regression analysis was performed to estimate the accuracy of computation (Figures 10-15).

S.No.  World population            Ref. No.a  CSF1PO            TPOX              TH01
                                              H        PD       H        PD       H        PD
1      African Jordanian           37         0.779    0.917    0.743    0.896    0.769    0.912
2      African Mozambique          2          0.755    0.899    0.776    0.915    0.743    0.889
3      Bangladeshis                8          0.702    0.862    0.715    0.868    0.787    0.921
4      Bohemians                   35         0.739    0.88     0.629    0.804    0.79     0.917
5      Brazilian                   11         0.749    0.88     0.683    0.848    0.807    0.928
6      Ecuadorian                  26         0.721    0.863    0.598    0.753    0.638    0.793
7      Andhra Pradesh Dravidian    1 4        0.735    0.88     0.716    0.865    0.773    0.908
8      Iranian                     29         0.718    0.862    0.654    0.832    0.794    0.921
9      Kashmiris                   24         0.731    0.873    0.658    0.814    0.798    0.922
10     Northern Greece             16         0.722    0.871    0.624    0.809    0.799    0.928
11     South African Whites        12         0.728    0.873    0.645    0.822    0.75     0.899
12     South African Blacks        12         0.781    0.916    0.788    0.921    0.718    0.867
13     Tibet Lassa                 20         0.732    0.883    0.577    0.762    0.659    0.826
14     Nepal                       18         0.726    0.882    0.661    0.824    0.689    0.864
15     Bhutan                      17         0.722    0.875    0.618    0.797    0.689    0.859

H: Heterozygosity; PD: Power of Discrimination.

Table 1: Fifteen world populations’ heterozygosities and power of discrimination for the loci CSF1PO, TPOX and TH01 calculated by DNA Forensics.

Figure 10: Regression analysis between DNA Forensics and reported literature for CSF1PO heterozygosity of fifteen world populations.

Figure 11: Regression analysis between DNA Forensics and reported literature for TPOX heterozygosity of fifteen world populations.

Figure 12: Regression analysis between DNA Forensics and reported literature for TH01 heterozygosity of fifteen world populations.

Figure 13: Regression analysis between DNA Forensics and reported literature for Power of Discrimination (PD) of CSF1PO of fifteen world populations.

Figure 14: Regression analysis between DNA Forensics and reported literature for Power of Discrimination (PD) of TPOX of fifteen world populations.

Figure 15: Regression analysis between DNA Forensics and reported literature for Power of Discrimination (PD) of TH01 of fifteen world populations.

Calculations for forensic parameters

1. Unbiased Heterozygosity (H)

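Unbiased heterozygosity is calculated as $H = 2n\,(1 - \sum_i p_i^2) / (2n - 1)$, where n is the number of individuals sampled (so that 2n is the number of chromosomes examined), $p_i$ is the frequency of the i-th allele, and $\sum_i p_i^2$ is the expected frequency of homozygotes [24]. If the number of individuals sampled for the STR locus is 30, then H is calculated as

$$H = \frac{2 \times 30 \left(1 - \sum_i p_i^2\right)}{(2 \times 30) - 1} \qquad (1)$$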
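As a minimal sketch of this calculation (not DNASF's own code), the function below assumes that n counts sampled individuals, as in the 30-individual example, and that each p_i is the frequency of the i-th allele. The function name and the allele frequencies are hypothetical.

```python
def unbiased_heterozygosity(allele_freqs, n_individuals):
    """Unbiased heterozygosity: 2n * (1 - sum(p_i^2)) / (2n - 1)."""
    sum_sq = sum(p * p for p in allele_freqs)  # expected homozygosity
    two_n = 2 * n_individuals                  # chromosomes examined
    return two_n * (1 - sum_sq) / (two_n - 1)


# Hypothetical allele frequencies at one locus, 30 individuals sampled.
freqs = [0.4, 0.3, 0.2, 0.1]
h = unbiased_heterozygosity(freqs, 30)
print(round(h, 4))  # 60 * 0.7 / 59
```

The factor 2n/(2n - 1) corrects the small-sample bias of the plain estimate 1 - Σ p_i², so the two values converge as the sample grows.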

2. Probability of Identity (PI) and Power of Discrimination (PD)

PI is derived by the formula $PI = \sum_i (x_i)^2 + \sum_{i<j} (x_{ij})^2$, where $x_i$ stands for the frequency of homozygotes for the i-th allele and is equal to $p_i^2$, while $x_{ij}$ stands for the frequency of heterozygotes and is equal to $2 p_i p_j$; $p_i$ and $p_j$ stand for the frequencies of the i-th and j-th alleles of a locus. PD is defined as [19]

$$PD = 1 - \left[\sum_i (x_i)^2 + \sum_{i<j} (x_{ij})^2\right] = 1 - PI \qquad (2)$$
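The PI and PD definitions can be sketched in Python as follows. The sketch assumes genotype frequencies are derived from allele frequencies under Hardy-Weinberg proportions (p_i² for homozygotes, 2·p_i·p_j for heterozygotes) and that each heterozygous genotype is counted once; the function names and input frequencies are illustrative, not taken from DNASF.

```python
from itertools import combinations


def probability_of_identity(allele_freqs):
    """PI = sum of squared expected genotype frequencies."""
    homozygous = sum((p * p) ** 2 for p in allele_freqs)
    # combinations() yields each unordered allele pair (i < j) exactly once.
    heterozygous = sum((2 * p * q) ** 2
                       for p, q in combinations(allele_freqs, 2))
    return homozygous + heterozygous


def power_of_discrimination(allele_freqs):
    """PD = 1 - PI."""
    return 1 - probability_of_identity(allele_freqs)


# Hypothetical allele frequencies at one locus.
freqs = [0.4, 0.3, 0.2, 0.1]
pi_value = probability_of_identity(freqs)
pd_value = power_of_discrimination(freqs)
```

Tools that work from genotype data may instead square the observed genotype frequencies, so values computed from allele frequencies can differ slightly when a sample departs from Hardy-Weinberg proportions.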

3. Polymorphism Information Content (PIC)

PIC is defined as

$$PIC = 1 - \sum_{i=1}^{n} p_i^2 - 2 \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} p_i^2 p_j^2 \qquad (3)$$

where $p_i$ and $p_j$ stand for the frequencies of the i-th and j-th alleles of a locus and n is here the number of alleles [25].
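A short Python sketch of the PIC calculation is given below. It assumes the inner sum runs over j > i, as in the standard definition of PIC, and uses hypothetical allele frequencies; the function name is illustrative only.

```python
def polymorphism_information_content(allele_freqs):
    """PIC = 1 - sum(p_i^2) - 2 * sum over i<j of p_i^2 * p_j^2."""
    n = len(allele_freqs)
    sum_sq = sum(p * p for p in allele_freqs)
    cross = sum(
        allele_freqs[i] ** 2 * allele_freqs[j] ** 2
        for i in range(n - 1)
        for j in range(i + 1, n)  # each unordered pair counted once
    )
    return 1 - sum_sq - 2 * cross


# Hypothetical allele frequencies at one locus.
freqs = [0.4, 0.3, 0.2, 0.1]
pic = polymorphism_information_content(freqs)
```

For two equally frequent alleles the function returns 0.375, the maximum PIC attainable by a biallelic marker.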

Results

1. Regression analysis between Powerstat and DNA Forensics GenePro shows a coefficient of determination of 1, which confirms the accuracy of the software in calculating the parameters (Figures 4-6).

2. Regression analysis between Powerstat and DNA Forensics for the Pakistani subpopulations shows a coefficient of determination approaching 1 (0.88 to 0.99), which confirms the accuracy of the software in calculating the parameters (Figures 7-9).

3. Regression analysis between the values calculated by DNA Forensics and those reported in the literature shows a coefficient of determination from 0.80 to 0.999 for heterozygosity, which confirms the accuracy of the software in calculating this parameter (Figures 10-12). The coefficient of determination is lower for power of discrimination, ranging from 0.14 (CSF1PO) to 0.54 (TPOX) (Figures 13-15). However, the p values are significant (<<0.05) only for the coefficients of determination of TPOX and TH01.
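The comparisons above rest on the coefficient of determination from a simple linear regression of one program's output on the other's. The sketch below shows one way to compute it in plain Python. The paired values are hypothetical placeholders, not data from this study, and significance testing (the p values) is not covered.

```python
def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples of length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    # For one predictor, R^2 equals the squared Pearson correlation.
    return (s_xy * s_xy) / (s_xx * s_yy)


# Hypothetical PD values for five subpopulations from two programs.
program_a = [0.862, 0.873, 0.880, 0.899, 0.917]
program_b = [0.860, 0.875, 0.881, 0.897, 0.918]
r2 = r_squared(program_a, program_b)
```

A value of 1 means the two sets of estimates are perfectly linearly related; confirming that they are also numerically identical would additionally require a slope of 1 and an intercept of 0.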

Discussion

Molecular information from highly variable DNA markers is widely used to identify individuals or evidence for forensic purposes [4,26,27]. Various software programs have been introduced to make the use of DNA for identification purposes a fast and trouble-free process. DNASF comprises relatively simple programs; nonetheless, their utility for the forensic community should not be underestimated, as they provide the basic calculations considered essential for forensic investigations.

Moreover, the programs encourage the researcher to categorize the population under study into smaller groups that can be differentiated on the basis of their paternal ethnicity and/or mother tongue. This is in accordance with the recommendations of the National Research Council (1996) to construct DNA databases of genetically homogeneous populations [17]. There may be important differences in allele frequencies among subpopulations or ethnic groups, and these differences may influence the calculations. For example, the match probability may be higher when a person is compared to his or her own ethnic group than when he or she is compared to the whole population. When population substructure is ignored, the match probability is simply the relative frequency of the defendant’s profile in the suspected population of the culprit [15]. Essentially, this treats each human population as large and randomly mating, ignoring possible subpopulations. People in these subpopulations could tend to mate within their subpopulation, which would lead to allele frequencies different from those estimated from the overall population [28]. To estimate these possible differences, it is necessary to build databases of each subpopulation or ethnic group within a larger population. Another measure to minimize the effect of background relatedness among the subpopulations on forensic calculations is the use of the inbreeding coefficient (θ) [29,30]. In 1994, Balding and Nichols proposed a method for calculating match probabilities which makes use of this inbreeding coefficient [31].
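To illustrate how θ enters a match probability, the following Python sketch uses the single-locus formulas commonly attributed to Balding and Nichols (also given in the 1996 NRC report). It is offered as an illustration under that assumption and is not described in the source as part of DNASF; the function name, allele frequencies and θ value are hypothetical.

```python
def match_probability(p_i, p_j=None, theta=0.0):
    """Single-locus genotype match probability with a theta correction.

    Pass only p_i for a homozygote (allele i twice); pass p_i and p_j
    for a heterozygote. theta = 0 gives the Hardy-Weinberg values.
    """
    denominator = (1 + theta) * (1 + 2 * theta)
    if p_j is None:
        numerator = ((2 * theta + (1 - theta) * p_i)
                     * (3 * theta + (1 - theta) * p_i))
    else:
        numerator = (2 * (theta + (1 - theta) * p_i)
                     * (theta + (1 - theta) * p_j))
    return numerator / denominator


# Hypothetical allele frequencies and theta value.
homozygote_plain = match_probability(0.1)              # p_i^2 when theta = 0
homozygote_theta = match_probability(0.1, theta=0.03)  # larger with theta > 0
heterozygote_plain = match_probability(0.1, 0.2)       # 2 * p_i * p_j
```

For rare alleles a positive θ increases the match probability, which is the conservative direction when the suspect and the true source may share a subpopulation.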

Another feature of DNASF is that it provides a number of options for the user. For example, options are given for paternal ethnicity, mother tongue, CODIS STR loci and their alleles. Data entry by the user is kept to a minimum, thus reducing the chances of error. Although the options for paternal ethnicity and mother tongue consist mainly of ethnicities and native languages of Indian subcontinent populations, this does not undermine the software's utility for researchers working on other populations. They can use the software by selecting the options ‘unknown’ or ‘other’ for paternal ethnicity or mother tongue. Moreover, the validation studies of the statistical package indicate that it is a reliable and efficient tool for researchers working on CODIS STR loci.

References
