Journal of Forensic Research

Open Access
ISSN: 2157-7145
Research Article Open Access
DNASF: A Statistical Package to Analyze the Distribution and Polymorphism of CODIS STR Loci in a Heterogeneous Population
Nuzhat A Akram*
Department of Genetics, University of Karachi, Karachi, Pakistan
Corresponding Author : Nuzhat A Akram
Department of Genetics
University of Karachi
Karachi 75270, Pakistan
Tel: 923002589170
E-mail: smr.akram@gmail.com
Received October 15, 2012; Accepted October 27, 2012; Published October 29, 2012
Citation: Akram NA, Farooqi SR (2012) DNASF: A Statistical Package to Analyze the Distribution and Polymorphism of CODIS STR Loci in a Heterogeneous Population. J Forensic Res 3:170. doi: 10.4172/2157-7145.1000170
Copyright: © 2012 Akram NA, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract
Short Tandem Repeat (STR) markers are moderately repetitious DNA segments that serve efficiently as core markers for human identification. Their use as identification markers involves many technical and statistical issues. DNASF (DNA Statistics for Forensics) is a package of statistical programs designed to analyze the STR distribution in a heterogeneous population. It includes two software programs, DNA Forensics GenePro and DNA Forensics, and a Microsoft Excel workbook, DNA AF. They compute a number of parameters used to estimate the forensic utility of STR loci, including genetic diversity, unbiased heterozygosity, Shannon information index, polymorphism information content, probability of exclusion and power of discrimination. In these programs each individual or subpopulation is defined on the basis of two variables, namely paternal ethnicity and mother tongue. The options for the two variables consist mainly of Indian subcontinent ethnicities and native languages, but this does not undermine the utility of the software for researchers working on other populations. The input data are CODIS STR genotype data for DNA Forensics GenePro and allele frequency data for DNA Forensics. DNA AF can calculate allele frequencies and other descriptive statistics from genotype data. Each component of DNASF is user friendly and is provided with a set of instructions. For the validation studies, genotype data of five Pakistani subpopulations and allele frequency data of fifteen world populations were used. The validation studies indicate that DNASF is a reliable and effective tool for forensic investigations.

Keywords
STR polymorphism; Statistical package; Software; Forensic parameters; Validation studies
Introduction
Currently, STR loci are the most informative genetic markers for identity testing [1]. The high degree of STR polymorphism has shown great promise for DNA typing in forensic applications [2-6]. The American FBI has designated thirteen STR loci as a core set to be used in establishing an individual's identity [7-9]. However, their use as human identification markers in a population is subject to various issues. One is the level of polymorphism, which determines their efficiency for human identification purposes; another is the presence of substructure in a population, i.e. the presence of genetically differentiated subpopulations within a population [10]. Subpopulation is a generic term indicating a cluster within a heterogeneous population [11]. Profile frequencies calculated from population averages might be seriously misleading for particular subpopulations [12-16]. Extensive studies from a wide variety of databases show that there are indeed substantial frequency differences among the major racial and linguistic groups, and within these groups there is often a statistically significant departure from random proportions. The National Research Council (NRC) reports of 1992 and 1996 suggested the use of random samples from "relatively genetically homogeneous" populations [14,17]. Construction of subpopulation databases can therefore play a crucial role in establishing confidence in the results of DNA typing.
Advances in current DNA typing technology have made the construction of such databases at finer levels of population stratification straightforward. However, software tools are needed for purposes such as storing, tracking, comparing and analyzing such databases [1,18]. A wide variety of software has been written to facilitate the task of managing, error checking, and analyzing genotype and phenotype data for genetic studies [19]. DNASF is a statistical package that handles STR data entry and the otherwise cumbersome analyses needed to estimate forensic parameters in a user friendly manner. It comprises two software programs and a Microsoft Excel workbook. Each component is provided with a user guide/manual. All the components, along with their user guides/manuals, are available from the corresponding author.
In April 2011, the FBI laboratory proposed an expanded set of core STR loci for the United States in order to reduce the likelihood of adventitious matches [20-22]. However, the current battery of STR loci is still validated for the analysis of single source DNA profile cases. More autosomal STR loci are needed for kinship and DNA mixture analyses [22,23]. Therefore, only the thirteen STRs published in the FBI CODIS program in 1997 (http://www.cstl.nist.gov/strbase/fbicore.htm) are included in the program by default, along with the list of their alleles (http://www.cstl.nist.gov/strbase/).
Materials and Methods
DNASF consists of two software programs, DNA Forensics GenePro and DNA Forensics. It also includes a Microsoft Excel workbook, DNA AF.
DNA Forensics GenePro: The software needs genotype data for the estimation of forensic parameters (Figure 1). Each subpopulation is defined on the basis of paternal ethnicity and mother tongue. Each individual's genotype data are entered under his or her subpopulation.
DNA Forensics: The software needs allele frequency data as an input file for the estimation of the forensic parameters of a population or subpopulation, which is defined on the basis of paternal ethnicity and mother tongue (Figure 2).
DNA AF: This is an Excel workbook that calculates various statistics from genotype data (Figure 3). These include each allele's frequency with its variance and 95% confidence interval, the numbers of heterozygotes and homozygotes, the number of chromosomes genotyped and the sample size of the population. It can also calculate the various forensic parameters.
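The kind of bookkeeping described for DNA AF can be sketched in a few lines of Python. This is only an illustration, not the workbook's actual implementation: the function names, the record layout and the genotype values are hypothetical, and the variance and confidence interval use the binomial sampling variance with a normal approximation, which is an assumption because the source does not state which estimator DNA AF uses.

```python
from collections import Counter, defaultdict
from math import sqrt


def group_by_subpopulation(records):
    """Group genotypes by the (paternal ethnicity, mother tongue) pair."""
    groups = defaultdict(list)
    for paternal_ethnicity, mother_tongue, genotype in records:
        groups[(paternal_ethnicity, mother_tongue)].append(genotype)
    return dict(groups)


def allele_statistics(genotypes):
    """Summarise one locus from a list of (allele_a, allele_b) genotypes."""
    n_individuals = len(genotypes)
    if n_individuals == 0:
        raise ValueError("at least one genotype is required")
    n_chromosomes = 2 * n_individuals
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    homozygotes = sum(1 for a, b in genotypes if a == b)
    alleles = {}
    for allele, count in counts.items():
        p = count / n_chromosomes
        # Assumption: binomial sampling variance and a normal-approximation CI.
        variance = p * (1.0 - p) / n_chromosomes
        half_width = 1.96 * sqrt(variance)
        alleles[allele] = {
            "frequency": p,
            "variance": variance,
            "ci95": (max(0.0, p - half_width), min(1.0, p + half_width)),
        }
    return {
        "sample_size": n_individuals,
        "chromosomes": n_chromosomes,
        "homozygotes": homozygotes,
        "heterozygotes": n_individuals - homozygotes,
        "alleles": alleles,
    }


# Hypothetical records for one locus: (paternal ethnicity, mother tongue, genotype).
records = [
    ("Sindhi", "Sindhi", (6, 9)),
    ("Sindhi", "Sindhi", (9, 9)),
    ("Sindhi", "Sindhi", (6, 9.3)),
    ("Sindhi", "Sindhi", (7, 9)),
    ("Punjabi", "Punjabi", (8, 9)),
]
groups = group_by_subpopulation(records)
sindhi = allele_statistics(groups[("Sindhi", "Sindhi")])
```

Keying each group on the pair of variables mirrors how the text defines a subpopulation; a real data set would of course need far more than a handful of genotypes per group for the interval to be meaningful.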
Validation studies
1. Five Pakistani subpopulations, namely Balochi, Muhajir, Pathan, Punjabi and Sindhi, were genotyped for three CODIS STR loci: CSF1PO, TPOX and TH01. The genotype data were analyzed with the statistical program Powerstat (http://www.promega.com/geneticidtools/powerstats/) and with DNA Forensics GenePro for the estimation of forensic parameters. The parameters include heterozygosity, polymorphism information content, probability of exclusion, match probability and power of discrimination. Regression analysis was performed on three of the parameters, namely match probability, power of discrimination and polymorphism information content, to estimate the accuracy of computation (Figures 4-6).
2. Allele frequencies for each of the three STR loci across the five Pakistani subpopulations were calculated using the Excel workbook DNA AF. The allele frequencies were used as input data for DNA Forensics. Forensic parameters were calculated, and regression analyses were performed between them and the parameters calculated by Powerstat to estimate the accuracy of the software (Figures 7-9).
3. The forensic parameters heterozygosity and power of discrimination were calculated for the loci CSF1PO, TPOX and TH01 by DNA Forensics using allele frequency data reported in the literature (Table 1) (http://dnaa.bravehost.com/index.html). The calculated parameters were then compared with those reported in the literature. Regression analysis was performed to estimate the accuracy of computation (Figures 10-15).
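The comparisons above rest on the coefficient of determination of a simple linear regression between two sets of values for the same parameter. A minimal sketch of that quantity follows; the function name and the example values are hypothetical and are not taken from the study, and the source does not say which tool was used to run the regressions.

```python
def coefficient_of_determination(x, y):
    """R^2 of a simple linear regression of y on x (squared Pearson r)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain some variation")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical values of one parameter from a reference tool and from the
# software under test, one value per subpopulation.
reference_values = [0.71, 0.74, 0.69, 0.76, 0.73]
tested_values = [0.71, 0.74, 0.69, 0.76, 0.73]
r_squared = coefficient_of_determination(reference_values, tested_values)
```

When the two programs return identical values, as in this made-up example, the coefficient is 1 up to floating-point rounding; any disagreement between them lowers it.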
Calculations for forensic parameters
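1. Unbiased Heterozygosity (H)
Unbiased heterozygosity is calculated as $H = 2n\,(1 - \sum_i p_i^2)/(2n - 1)$, where n is the number of individuals sampled (so that 2n chromosomes are examined) and $p_i$ is the frequency of the i-th allele, so that $\sum_i p_i^2$ is the expected frequency of homozygotes [24]. If the number of individuals sampled for the STR locus is 30, then H is calculated as
$$H = \frac{2 \times 30 \left(1 - \sum_i p_i^2\right)}{(2 \times 30) - 1} \tag{1}$$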
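As a worked illustration of equation (1), the following sketch evaluates the unbiased heterozygosity for a sample of 30 individuals. The function name and the allele frequencies are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def unbiased_heterozygosity(allele_freqs, n_individuals):
    """Unbiased heterozygosity: 2n(1 - sum(p_i^2)) / (2n - 1)."""
    if n_individuals < 1:
        raise ValueError("n_individuals must be at least 1")
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    expected_homozygosity = sum(p * p for p in allele_freqs)
    two_n = 2 * n_individuals  # number of chromosomes examined
    return two_n * (1.0 - expected_homozygosity) / (two_n - 1)


# Hypothetical allele frequencies at one locus, 30 individuals sampled.
freqs = [0.5, 0.3, 0.2]
h = unbiased_heterozygosity(freqs, 30)
```

With these frequencies the sum of squares is 0.38, so H = 60 × 0.62 / 59, slightly above the uncorrected value of 0.62; the correction factor 2n/(2n - 1) shrinks towards 1 as the sample grows.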
2. Probability of Identity (PI) and Power of Discrimination (PD)
PI is derived by the formula $PI = \sum_i x_i^2 + \sum_{i<j} x_{ij}^2$, where $x_i$ is the frequency of homozygotes for the i-th allele and is equal to $p_i^2$, while $x_{ij}$ is the frequency of heterozygotes and is equal to $2 p_i p_j$; $p_i$ and $p_j$ are the frequencies of the i-th and j-th alleles of a locus. PD is defined as [19]
$$PD = 1 - \left(\sum_i x_i^2 + \sum_{i<j} x_{ij}^2\right) = 1 - PI \tag{2}$$
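Equation (2) can be checked numerically with a short sketch. It follows the definitions given in the text, taking genotype frequencies as the values expected from the allele frequencies ($p_i^2$ and $2 p_i p_j$); the function names and the two-allele example are hypothetical.

```python
from itertools import combinations


def probability_of_identity(allele_freqs):
    """PI: sum of squared expected genotype frequencies at one locus."""
    homozygote_term = sum((p * p) ** 2 for p in allele_freqs)
    heterozygote_term = sum(
        (2 * p * q) ** 2 for p, q in combinations(allele_freqs, 2)
    )
    return homozygote_term + heterozygote_term


def power_of_discrimination(allele_freqs):
    """PD = 1 - PI."""
    return 1.0 - probability_of_identity(allele_freqs)


# Hypothetical locus with two equally frequent alleles.
pi_value = probability_of_identity([0.5, 0.5])
pd_value = power_of_discrimination([0.5, 0.5])
```

For two alleles at frequency 0.5 the genotype frequencies are 0.25, 0.5 and 0.25, so PI = 0.0625 + 0.25 + 0.0625 = 0.375 and PD = 0.625. A program that works from observed genotype counts instead of expected frequencies may return somewhat different values for the same sample.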
3. Polymorphism Information Content (PIC)
PIC is defined as
$$PIC = 1 - \sum_{i=1}^{n} p_i^2 - 2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} p_i^2 p_j^2 \tag{3}$$
where $p_i$ and $p_j$ are the frequencies of the i-th and j-th alleles of a locus and n here denotes the number of alleles [25].
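A minimal sketch of equation (3) is given here for illustration; the function name and example frequencies are hypothetical. The double sum runs over unordered pairs of distinct alleles.

```python
from itertools import combinations


def polymorphism_information_content(allele_freqs):
    """PIC = 1 - sum(p_i^2) - 2 * sum over i<j of p_i^2 * p_j^2."""
    sum_of_squares = sum(p * p for p in allele_freqs)
    cross_term = sum(
        2 * (p * p) * (q * q) for p, q in combinations(allele_freqs, 2)
    )
    return 1.0 - sum_of_squares - cross_term


# Hypothetical locus with two equally frequent alleles.
pic = polymorphism_information_content([0.5, 0.5])
```

For two alleles at frequency 0.5 this gives 1 - 0.5 - 0.125 = 0.375, which is below the corresponding expected heterozygosity of 0.5; PIC never exceeds the expected heterozygosity because the cross term is non-negative.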
Results
1. Regression analysis between Powerstat and DNA Forensics GenePro shows a coefficient of determination of 1, which confirms the accuracy of the software in calculating the parameters (Figures 4-6).
2. Regression analysis between Powerstat and DNA Forensics for the Pakistani subpopulations shows a coefficient of determination approaching 1 (0.88 to 0.99), which supports the accuracy of the software in calculating the parameters (Figures 7-9).
3. Regression analysis between the values calculated by DNA Forensics and those reported in the literature shows a coefficient of determination from 0.80 to 0.999 for heterozygosity, which supports the accuracy of the software in calculating this parameter (Figures 10-12). For power of discrimination the coefficient of determination is lower, ranging from 0.14 (CSF1PO) to 0.54 (TPOX) (Figures 13-15). However, the p values are significant (p << 0.05) only for the coefficients of determination of TPOX and TH01.
Discussion
Molecular information from highly variable DNA markers is widely used to identify individuals or evidence for forensic purposes [4,26,27]. Various software programs have been introduced to make the use of DNA for identification purposes a fast and trouble-free process. DNASF comprises relatively simple programs; nonetheless, their utility for the forensic community should not be underestimated, as they provide the basic calculations considered essential for forensic investigations.
Moreover, the programs encourage the researcher to categorize the population under study into smaller groups that can be differentiated on the basis of their paternal ethnicity and/or mother tongue. This is in accordance with the recommendations of the National Research Council (1996) to construct DNA databases of genetically homogeneous populations [17]. There may be important differences in allele frequencies between subpopulations or ethnic groups, and these differences may influence the calculations. For example, the match probability may be higher when a person is compared to his or her own ethnic group than when he or she is compared to the whole population. When population substructure is ignored, the match probability is simply the relative frequency of the defendant's profile in the suspected population of the culprit [15]. Essentially, this treats each human population as large and randomly mating, ignoring possible subpopulations. People in these subpopulations could tend to mate within their subpopulation, which would lead to allele frequencies different from those estimated from the overall population [28]. To estimate these possible differences, it is necessary to build databases for each subpopulation or ethnic group within a larger population. Another measure to minimize the effect of background relatedness among the subpopulations on forensic calculations is the use of the inbreeding coefficient (θ) [29,30]. In 1994, Balding and Nichols proposed a method for calculating match probabilities that makes use of this inbreeding coefficient [31].
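To make the role of θ concrete, the sketch below uses the commonly cited form of the Balding and Nichols conditional match probabilities for a single locus. These formulas are not given in this article and are not claimed to be part of DNASF; the function name and the allele frequencies are hypothetical.

```python
def match_probability(p_i, p_j=None, theta=0.0):
    """Single-locus match probability with a theta correction.

    With p_j omitted the genotype is treated as homozygous (i, i);
    otherwise as heterozygous (i, j). theta = 0 gives p_i^2 or 2*p_i*p_j.
    """
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must lie in [0, 1)")
    denominator = (1.0 + theta) * (1.0 + 2.0 * theta)
    if p_j is None:
        numerator = (2.0 * theta + (1.0 - theta) * p_i) * (
            3.0 * theta + (1.0 - theta) * p_i
        )
    else:
        numerator = 2.0 * (theta + (1.0 - theta) * p_i) * (
            theta + (1.0 - theta) * p_j
        )
    return numerator / denominator


# Hypothetical allele frequencies and an illustrative theta of 0.03.
homozygote_plain = match_probability(0.1)
homozygote_theta = match_probability(0.1, theta=0.03)
heterozygote_plain = match_probability(0.1, 0.2)
heterozygote_theta = match_probability(0.1, 0.2, theta=0.03)
```

Setting θ to zero recovers the random-mating values, while a positive θ raises the match probability, which is the conservative direction when the suspect and the true source may share a subpopulation.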
Another feature of DNASF is that it provides a number of predefined options for the user, for example for paternal ethnicity, mother tongue, CODIS STR loci and their alleles. Data entry by the user is kept to a minimum, thus reducing the chances of error. Although the options for paternal ethnicity and mother tongue consist mainly of the ethnicities and native languages of Indian subcontinent populations, this does not undermine the utility of the software for researchers working on other populations. They can use the software by selecting the options 'unknown' or 'other' for paternal ethnicity or mother tongue. Moreover, the validation studies of the statistical package indicate that it is a reliable and efficient tool for researchers working on CODIS STR loci.



Tables at a glance
Table 1
Figures at a glance
Figures 1-15