Received date: March 10, 2011; Accepted date: March 24, 2011; Published date: March 29, 2011
Citation: Moroni R, Gasbarra D, Arjas E, Lukka M, Ulmanen I (2011) Effects of Reference Population and Number of STR Markers on Positive Evidence in Paternity Testing. J Forensic Res 2:119. doi:10.4172/2157-7145.1000119
Copyright: © 2011 Moroni R, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Three sets of commonly used autosomal short tandem repeat (STR) markers (containing 15, 10 and 9 markers) and 14 databases from populations in Africa, America, Asia and Europe were used to investigate how the choice of population database and the number of markers influence the statistical evidence usually produced in favour of paternity. The study was based on a sample of 100 randomly chosen Finnish paternity trios collected during paternity testing case work, none of which showed an exclusion with 15 STR markers. The Paternity Index, Probability of Paternity, Typical Paternity Index and Probability of Exclusion were computed, and descriptive statistics were provided separately for trios (mother, child and putative father) and duos (obtained from trios by ignoring the genetic information of the mother), for all combinations of markers and databases. In trio cases the differences between the results obtained are not statistically significant. However, the use of 15 STR markers is recommended, especially in duo cases.
Paternity Testing; Reference Population; Database Effect; Marker Effect
STR: Short Tandem Repeat; PF: Putative Father; M: Mother; C: Child; PI: Paternity Index; W: Probability of Paternity; PE: Probability of Exclusion; TPI: Typical Paternity Index; RMNE: Random Man Not Excluded
In recent years, a growing number of disputed paternities have involved parents from different ethnic backgrounds. We examined how sensitive the Paternity Index (PI), the Probability of Paternity (W) and the Probability of Exclusion (PE) [1-3] are to the choice of population STR database (the 'Database Effect') and to the number of STR markers considered (the 'Marker Effect'). To evaluate this, we performed paternity tests on 100 Finnish trios and duos with three sets of commonly used STR markers (containing 15, 10 and 9 markers) and 14 population databases, representing different marker allele frequencies. Such frequencies are estimated for each specific population and generally vary between populations; as a consequence, probabilities calculated with different population data take different values. Concerning the 'Database Effect', we carried out a comparative statistical analysis of PI and W, calculated first with the allele frequencies of the putative father's own population (Finnish) and then with other reference population databases. A good measure of this effect is obtained from the analysis of the Typical Paternity Index (TPI). Furthermore, we examined whether there is a correlation between a measure of genetic distance between populations (Nei's distance) and the Typical Paternity Index Ratio (TPI Ratio). Some of these measures are explained in more detail in the Materials and Methods section. Concerning the 'Marker Effect', we investigated the changes in the PI when performing the test with only the Finnish allele frequencies but with three different sets of markers. Forensic paternity testing is conducted using DNA profiles obtained by genotyping several highly polymorphic short tandem repeat (STR) markers, located on different chromosomes in order to ensure independent segregation.
Mutation can sometimes cause an apparent discrepancy with the rules of inheritance between parent and child, but incompatibility at three or more independent DNA markers is widely regarded as sufficient evidence to exclude paternity. In the simplest case of disputed paternity, a man, denoted PF, is claimed by a mother M to be the true father of a child C. In a standard paternity case, two hypotheses compete: a prosecution hypothesis (Hp), under which PF is the true father of C, and a defence hypothesis (Hd), under which some other man is the true father of C. Hence, if PF is not the true father of C, an unknown man, randomly drawn from a population that we call the reference population, is assumed to be the true father of C; we suppose him to be unrelated to both PF and M.
A total of 100 Finnish standard paternity 'trio' cases (putative father, mother and child) with no genetic inconsistencies between mother and child or between father and child were collected during paternity testing case work. The Finnish origin of fathers and mothers was inferred from surnames. The STR loci were genotyped with the AmpFlSTR Profiler kit (9 markers), the AmpFlSTR SGM Plus kit (10 markers) and the combination of the Profiler and SGM Plus kits (15 markers: TH01, D3S1358, FGA, TPOX, CSF1PO, D5S818, D13S317, vWA, D8S1179, D21S11, D18S51, D19S433, D16S539, D7S820, D2S1338). A database of allele frequencies for 14 populations (Finland, Poland, Turkey, Vojvodina, Extremadura, Italy, Belgium, Kosovo, Mexico, Taiwan, Korea, El Salvador, Somalia, Mozambique) was compiled from previously published studies. Probabilities of paternity were computed for every trio and every duo (motherless cases, obtained from trios by ignoring the genetic information of the mother) for all combinations of markers and databases. In particular, the Paternity Index, Probability of Paternity, Typical Paternity Index and Probability of Exclusion were computed and summarized with descriptive statistics. To measure the Database Effect, we compared the TPI of each population with the Finnish TPI. The TPI is the harmonic mean of the PIs and is considered a good measure of typical performance. Nei's distance was calculated between the Finnish population and each of the other 13 populations to investigate whether the PI is influenced by the genetic distance between the reference population and the putative father's population.
The database of allele frequencies for the 15 STR loci in 14 populations was compiled from studies already published in journals. The populations considered are: Poland (Central Europe), Turkey (Southwestern Asia), Vojvodina (Autonomous Province in Serbia, Southeastern Europe), Extremadura (Autonomous Community in Spain, Southern Europe), Italy (Southern Europe) [11-13], Belgium (Northwestern Europe), Kosovo (Province in Serbia, Southeastern Europe), Mexico (Northern America), Taiwan (Eastern Asia), Korea (Eastern Asia), El Salvador (Central America), Somalia (Eastern Africa), Mozambique (Southeastern Africa) and Finland (Northern Europe), whose allele frequencies were kindly supplied by THL.
Paternity testing and related measures
As PI and W have been widely explained elsewhere [1-3], we do not discuss them further in this article. The PI was calculated using uniform prior probabilities. The Probability of Exclusion [1,3] was calculated from the following relation:
$$\mathrm{PE} = 1 - \mathrm{RMNE} \qquad (1)$$
where RMNE denotes the Random Man Not Excluded, that is, the proportion of men in the reference population who would not be excluded because they carry an obligate allele at every marker. The 'obligate allele' is the allele that must come from the biological father, under the hypothesis that M is the mother of the disputed child C and that no mutation occurred. For a given marker j, with j = 1, …, N, the mother M and the child C can share one or two alleles. If M and C share one allele, there is one obligate allele, denoted i, that the biological father must contribute (the child's allele not explained by the mother; for a homozygous child it is the shared allele itself). Letting $p_{ji}$ be its frequency in the reference population, we have
$$\mathrm{RMNE}_j = 1 - (1 - p_{ji})^2 \qquad (2)$$
If M and C share the same two alleles i and k, either allele could be the obligate one, since the true father could have contributed either i or k. In this case we define
$$\mathrm{RMNE}_j = 1 - (1 - p_{ji} - p_{jk})^2 \qquad (3)$$
The overall RMNE is obtained as follows, assuming that the markers are independent:

$$\mathrm{RMNE} = \prod_{j=1}^{N} \mathrm{RMNE}_j$$
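To make the computation concrete, Equations (1) to (3) and the product over independent markers can be sketched in Python. This is a minimal illustration, not the software used in the study: it assumes the obligate allele(s) at each marker have already been identified by comparing mother and child, and the function names and toy allele frequencies are hypothetical.

```python
from math import prod

def rmne_locus(freqs, obligate):
    """RMNE for one marker (Eq. 2 with one obligate allele, Eq. 3 with two).

    freqs: dict mapping allele -> frequency in the reference population.
    obligate: tuple of the allele(s) the biological father could have passed on.
    """
    # Combined frequency of the obligate allele(s) in the reference population.
    p = sum(freqs[allele] for allele in obligate)
    # A random man is not excluded if at least one of his two alleles is obligate.
    return 1.0 - (1.0 - p) ** 2

def overall_rmne(loci):
    # Markers assumed independent: multiply the per-marker RMNE values.
    return prod(rmne_locus(freqs, obligate) for freqs, obligate in loci)

def probability_of_exclusion(loci):
    # Eq. (1): PE = 1 - RMNE.
    return 1.0 - overall_rmne(loci)

# Hypothetical two-marker case with toy allele frequencies.
loci = [
    ({"12": 0.10, "13": 0.25, "14": 0.65}, ("12",)),    # one obligate allele
    ({"8": 0.20, "9": 0.30, "10": 0.50}, ("8", "9")),   # mother and child share both alleles
]
print(overall_rmne(loci), probability_of_exclusion(loci))
```

In this toy case the two markers give RMNE of approximately 0.19 × 0.75 = 0.1425, hence PE of approximately 0.8575; each added marker multiplies RMNE by a factor below 1, which is why more markers raise the power of exclusion.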
RMNE is closely related to the power of a paternity test: its complement, the Probability of Exclusion (PE), is the probability of excluding a falsely accused man. The Typical Paternity Index (TPI) is the harmonic mean of the PIs obtained in the paternity tests, and it is considered a good measure of typical performance. For every population P, the TPI was calculated over all cases c = 1, …, 100 as follows:

$$\mathrm{TPI}_P = \frac{100}{\sum_{c=1}^{100} 1/\mathrm{PI}_{P,c}}$$

where $\mathrm{PI}_{P,c}$ is the Paternity Index of case c computed with the allele frequencies of population P.
To measure the 'Database Effect', we compared the TPI of each population with the Finnish TPI. Given a population P with TPI denoted $\mathrm{TPI}_P$, we calculated the following TPI Ratio:

$$\text{TPI Ratio}_P = \frac{\mathrm{TPI}_P}{\mathrm{TPI}_{\mathrm{Finland}}}$$
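The TPI (the harmonic mean of the case PIs for one reference population) and the TPI Ratio (a population's TPI divided by the Finnish TPI) can be sketched in Python as below. This is an illustrative sketch with hypothetical function names, assuming the per-case PIs are available as a list for each population; the standard library's `statistics.harmonic_mean` does the averaging.

```python
from statistics import harmonic_mean

def typical_paternity_index(pis):
    # TPI for one reference population: harmonic mean of the PIs of all cases.
    return harmonic_mean(pis)

def tpi_ratio(tpi_population, tpi_finland):
    # Database Effect measure: above 1 means larger typical PIs than with Finland.
    return tpi_population / tpi_finland

# Toy PIs (hypothetical): one small PI pulls the harmonic mean down strongly.
print(typical_paternity_index([1e2, 1e3, 1e4]))  # about 270.27

# Duo TPIs reported in the Results for 15 markers (Mozambique and Finland).
print(round(tpi_ratio(7.8919e4, 7.6197e3), 3))
```

The toy PIs show that the harmonic mean is dominated by the smallest PIs, so a few very large PIs have little influence on the TPI. Applied to the duo TPIs reported for 15 markers (Mozambique 7.8919E+04, Finland 7.6197E+03), the ratio is about 10.357, consistent with the reported 10.3573 up to rounding.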
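Nei's genetic distance (1972) was calculated between the Finnish population and each of the other 13 populations (see Table 1). Let A and B be two populations, and denote the markers by j = 1, …, N and the alleles of marker j by i = 1, …, n_j. Nei's distance is:

$$D_{AB} = -\ln \frac{\sum_{j=1}^{N}\sum_{i=1}^{n_j} p^{A}_{ji}\, p^{B}_{ji}}{\sqrt{\sum_{j=1}^{N}\sum_{i=1}^{n_j} \left(p^{A}_{ji}\right)^2 \; \sum_{j=1}^{N}\sum_{i=1}^{n_j} \left(p^{B}_{ji}\right)^2}}$$

where $p^{A}_{ji}$ is the frequency of the ith allele of the jth marker in population A and $p^{B}_{ji}$ is the frequency of the ith allele of the jth marker in population B.

Table 1: Nei's distance between the Finnish population and each of the other populations considered.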
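Nei's (1972) standard distance is minus the natural logarithm of J_AB / sqrt(J_A J_B), where J_AB sums the products of the two populations' allele frequencies over all markers and alleles, and J_A and J_B sum the squared frequencies within each population. A direct Python transcription might look as follows. It is a sketch under two assumptions of ours: frequencies are stored as nested dictionaries (marker, then allele), and an allele absent from one population's table is given frequency 0 there; published analyses may instead assign a minimum allele frequency. The function name and toy frequencies are hypothetical.

```python
from math import log, sqrt

def nei_distance(freqs_a, freqs_b):
    """Nei's (1972) standard genetic distance between populations A and B.

    freqs_a, freqs_b: dict marker -> dict allele -> frequency.
    Assumption: an allele missing from one population's table has frequency 0 there.
    Raises ValueError (log of 0) if the populations share no alleles at all.
    """
    j_ab = j_a = j_b = 0.0
    for marker in freqs_a.keys() & freqs_b.keys():  # markers typed in both
        for allele in freqs_a[marker].keys() | freqs_b[marker].keys():
            x = freqs_a[marker].get(allele, 0.0)
            y = freqs_b[marker].get(allele, 0.0)
            j_ab += x * y
            j_a += x * x
            j_b += y * y
    # Averaging over markers (the 1/N factors) cancels in the ratio, so sums suffice.
    return -log(j_ab / sqrt(j_a * j_b))

pop_a = {"TH01": {"6": 0.5, "9.3": 0.5}}   # toy frequencies
pop_b = {"TH01": {"6": 0.9, "9.3": 0.1}}
print(nei_distance(pop_a, pop_b))
```

Identical frequency tables give a distance of 0, and the distance grows as the populations' allele frequencies diverge.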
All 100 paternity cases were investigated, and in every case the PI was greater than 10,000; moreover, no genetic inconsistencies were observed between mother and child or between father and child.
The 'Database Effect' is defined as the effect on the PI of performing the paternity test with a reference population different from the putative father's own. Results for standard trios with 15 markers (Table 2) show that there is a 'Database Effect' on the PI, but it is not strong enough to change the final conclusion of the paternity test. In fact, the minimum PI is 1.4558E+05 (obtained with Kosovo's allele frequencies), which gives a Probability of Paternity of 0.99999. The maximum PI is 6.3693E+17, obtained with El Salvador. The TPI gives an idea of the typical performance of the PI: its minimum value (2.7222E+06) is obtained with Poland and its maximum (2.8747E+07) with Mozambique. The lowest values of these parameters appear to be associated with populations genetically close to the Finnish one. A similar behaviour is seen in the TPI Ratios, which we consider a good measure of the 'Database Effect'. All TPI Ratio values except Poland's, which is the lowest (0.9281), are at least 1; in particular, the two highest values are obtained with Mozambique (9.8010) and Somalia (8.9809). These two populations also have the two largest Nei's distances from the Finnish population.
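The article does not state the formula linking W to the PI. A common choice, consistent with the uniform prior mentioned in the Methods, is Bayes' theorem with a prior probability of paternity of 0.5, which gives W = PI / (PI + 1); this reproduces the W values reported here (for example, PI = 1.4558E+05 gives 0.99999 and PI = 3 gives 0.75). The sketch below assumes this relation; the function name is hypothetical.

```python
def probability_of_paternity(pi, prior=0.5):
    # Bayes' theorem: posterior odds = PI * prior odds. With prior = 0.5
    # (uniform prior, an assumption here) this reduces to W = PI / (PI + 1).
    return pi * prior / (pi * prior + (1.0 - prior))

# PI values reported in the Results.
for pi in (1.4558e5, 3.29e2, 1.9e1, 3.0):
    print(f"PI = {pi:.4e}  ->  W = {probability_of_paternity(pi):.5f}")
```

Because W depends on the PI only through PI / (PI + 1), W approaches 1 very quickly: any PI above 10,000 already gives W above 0.9999.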
| Population | TPI | Min PI | Max PI | TPI Ratio |
|---|---|---|---|---|
Table 2: Main results for trio cases using 15 markers.
Among motherless cases (Table 3), PIs are generally smaller than in standard trios, but again the final conclusion of the test does not change. As in trio cases, the lowest values of the parameters are associated with the Finnish population or with populations genetically close to it. In particular, the minimum PI (3.2900E+02) is obtained with Finnish as the reference population, and the corresponding Probability of Paternity is 0.9970. The maximum PI (1.9706E+15) is obtained with El Salvador. The TPI takes its minimum value (7.6197E+03) with Finnish allele frequencies and its maximum (7.8919E+04) with Mozambique. The TPI Ratios show the presence of a 'Database Effect', and the results are slightly less variable than in standard trio cases. All the ratios are at least 1, and the maximum TPI Ratio (10.3573) is obtained with Mozambique.
| Population | TPI | Min PI | Max PI | TPI Ratio |
|---|---|---|---|---|
Table 3: Main results for duo cases using 15 markers.
Figures 1 and 2 suggest a positive linear association between Nei's distance and the TPI Ratio, and the variability of the TPI Ratio appears to increase with increasing Nei's distance. In particular, with 15 markers there is a strong positive correlation between the TPI Ratio and Nei's distance in both duo and trio cases. This is confirmed by the high values of the Multiple Correlation Coefficient R (0.893 for trios and 0.905 for duos). The R-squared values (0.798 for trios and 0.818 for duos) show that the linear regression model explains 79.8% and 81.8% of the variation in TPI Ratio for trios and duos, respectively.
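The R and R² values come from regressing the TPI Ratio on Nei's distance. With a single predictor, the multiple correlation coefficient R equals the absolute value of Pearson's correlation, and R² is its square. A minimal Python sketch follows; the function name is hypothetical and the toy data are not the study's (the Nei's distances of Table 1 and the per-population TPI Ratios would be the inputs).

```python
from math import sqrt

def regression_r(x, y):
    """R and R^2 for a simple linear regression of y (TPI Ratio) on x (Nei's distance).

    With one predictor, R is the absolute value of Pearson's correlation.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = abs(sxy) / sqrt(sxx * syy)
    return r, r * r

# Toy data (hypothetical), not the study's Nei's distances or TPI Ratios.
print(regression_r([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]))  # (0.5, 0.25)
```

As a consistency check, squaring the reported R values gives 0.893² ≈ 0.797 and 0.905² ≈ 0.819, matching the reported R² (0.798 and 0.818) up to rounding of R.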
Concerning W and PE with 15 markers, in trio cases these two probabilities exceed 0.9999 for every choice of reference population, whereas in duo cases W and PE can also take values between 0.99 and 0.9999.

With the 10-marker set, all the measures considered take smaller values than with the 15-marker set, owing to the reduced amount of information involved in the test. In standard trio cases a 'Database Effect' is still present but, as with the 15-marker set, the choice of reference population does not substantially change the final result of the paternity test. The minimum PI is once again obtained with Finland (1.0290E+03), giving a Probability of Paternity of 0.999. The maximum PI is 1.7618E+14, obtained with El Salvador. As with the 15-marker set, the lowest values of the parameters are obtained with Finland or with allele frequencies from populations genetically close to it. The minimum TPI (2.7645E+04) is obtained with Finland and the maximum (1.7555E+05) with Mozambique. The 'Database Effect' is well represented by the TPI Ratios; once again, the maximum value (6.3502) is obtained with a population genetically far from the Finnish one (Mozambique). In duo cases, the values of every parameter are lower than in trio cases. The minimum PI (1.9000E+01), obtained with Finland, implies a Probability of Paternity of 0.95. The minimum TPI (5.3223E+02, Finland) gives a Probability of Paternity of 0.9981, and the maximum TPI is obtained with Somalia (3.1026E+03). The minimum TPI Ratio is 1 (Finland) and the maximum is 5.8295, obtained with Somalian allele frequencies.
With the 10-marker set, motherless cases lose enough information that the Probability of Paternity (W) can fall below the commonly used threshold of 0.99.
There is a clear positive linear association between Nei's distance and the TPI Ratio in both trio and duo cases. The Multiple Correlation Coefficient R is high (0.926 for trios and 0.869 for duos), and the model explains 85% and 75% of the variation in TPI Ratio for trios and duos, respectively. The variability of the TPI Ratio appears to increase with increasing Nei's distance.
Reducing the number of markers lowers W and PE in both trio and duo cases, for every choice of reference population. In particular, for trio cases W and PE take values greater than 0.999. For duo cases, most of the results are greater than 0.95; only one case, using Polish frequencies, produced a W value below 0.95.
With the third set, containing 9 markers, the results are quite similar to those obtained with 10 markers. The final results of the paternity test do not change for standard trio cases, but among motherless cases the PI may not be high enough to produce a Probability of Paternity of at least 0.99. This is probably due to the joint effect of the small number of markers and the lack of maternal information. In trio cases, the minimum observed PI is 1.6900E+02, corresponding to a Probability of Paternity of 0.994, obtained with Vojvodina allele frequencies. The maximum observed PI is 2.7757E+08, obtained with Korea. The typical value of the PI is again described by the TPI: its minimum is 2.7382E+03 (Poland), with a Probability of Paternity of 0.9996, and its maximum is 7.3046E+03 (Somalia). The TPI Ratios show a weak 'Database Effect', with a minimum of 0.9570 (Poland) and a maximum of 2.5530 (Somalia). Note that Vojvodina and Belgium also have TPI Ratios smaller than 1.

In duo cases, the minimum PI is 3.0000E+00, obtained with two populations, Poland and Vojvodina, which leads to a Probability of Paternity of 0.75. The highest PI is 8.1235E+06, obtained with Korea. The TPI Ratio has its minimum with Poland (0.8804) and its maximum (3.3337) with Mozambique. It seems that using Finnish allele frequencies with the 9-marker set does not yield the best results. Studying the relation between Nei's distance and the TPI Ratio with 9 markers gives some interesting results: the model fits the 'duo cases, 9 markers' data particularly well, and this is the data set with the least information. The coefficient R is high for both trios and duos, 0.854 and 0.928 respectively.
The R-squared values show that the model explains 72.9% and 86.2% of the variation in TPI Ratio in trios and duos, respectively. The variability of the TPI Ratio appears to increase with increasing Nei's distance, especially in trio cases.
W and PE decrease further, especially in duo cases. For trio cases, W and PE take values greater than 0.99 for every choice of reference population, whereas for duo cases they can even fall below 0.95. Once again, values obtained in motherless cases are smaller than those obtained in trios.
The following results were obtained using only the Finnish population as reference, in order to investigate how the set of markers affects the paternity measures. Table 4 shows that trio cases are only weakly influenced by the number of markers: reducing it decreases both W and PE, but the final conclusion of the paternity test does not change. Moreover, W and PE follow the same distribution in trios. Duo cases are different: their results are strongly influenced by the number of markers. In particular, W and PE are smaller than in trios, and in some cases the conclusion of the test changes, with W below the commonly used threshold of 0.99. These results suggest that, when the amount of information is limited, as is typical of duo cases, the 15-marker set should be used.
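The 'Marker Effect' can be illustrated with a short Python sketch. It assumes, as is usual for independently segregating markers, that the overall PI is the product of per-marker PIs, so dropping markers removes factors that are typically above 1. The marker lists below are our reading of the two kits' contents; their union matches the 15 markers listed in the Methods, but they should be checked against the manufacturer's documentation. The function name and the per-marker PIs are hypothetical.

```python
from math import prod

# Our reading of the kit contents (to be checked against the manufacturer's
# documentation); their union matches the 15 markers listed in the Methods.
PROFILER = {"D3S1358", "vWA", "FGA", "TH01", "TPOX", "CSF1PO",
            "D5S818", "D13S317", "D7S820"}
SGM_PLUS = {"D3S1358", "vWA", "D16S539", "D2S1338", "D8S1179",
            "D21S11", "D18S51", "D19S433", "TH01", "FGA"}
COMBINED = PROFILER | SGM_PLUS

def combined_pi(locus_pis, markers):
    # Independent markers assumed: overall PI = product of per-marker PIs.
    return prod(locus_pis[m] for m in markers)

# Hypothetical case in which every marker contributes a PI of 2.
locus_pis = {marker: 2.0 for marker in COMBINED}
for name, markers in (("9 markers", PROFILER), ("10 markers", SGM_PLUS),
                      ("15 markers", COMBINED)):
    print(name, combined_pi(locus_pis, markers))
```

With these toy values the overall PI grows from 512 (9 markers) to 1024 (10 markers) and 32768 (15 markers), mirroring the decrease in PI, W and TPI observed when fewer markers are used.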
Classes: (1) < 0.95; (2) 0.95 to 0.99; (3) 0.99 to 0.999; (4) 0.999 to 0.9999; (5) > 0.9999
Table 4: Frequency distribution of the Probability of Paternity and the Probability of Exclusion, Finnish reference population.
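Assigning W or PE values to the five classes of Table 4 is a simple binning step. In the hypothetical Python sketch below, a value equal to a class limit is placed in the higher class; the legend does not specify this, so it is our assumption.

```python
from bisect import bisect_right
from collections import Counter

# Upper limits of classes (1) to (4); class (5) is everything above 0.9999.
# Assumption: a value equal to a limit is placed in the higher class.
LIMITS = [0.95, 0.99, 0.999, 0.9999]

def table4_class(prob):
    return bisect_right(LIMITS, prob) + 1

def class_frequencies(probs):
    counts = Counter(table4_class(p) for p in probs)
    return {k: counts[k] for k in range(1, 6)}

# Hypothetical W values.
print(class_frequencies([0.75, 0.95, 0.997, 0.99999]))  # {1: 1, 2: 1, 3: 1, 4: 0, 5: 1}
```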
The TPI gives an idea of the typical performance of the PI. Table 5 shows that, for both trios and duos, TPIs become smaller as the number of markers is reduced; once again, duo cases are the most strongly affected.
Table 5: TPI for duos and trios using different sets of markers.
The results suggest that for trio cases there is no need to use a specific population database, whatever set of markers is used, as the Probability of Paternity is always greater than 0.99, 0.999 and 0.9999 for 9, 10 and 15 markers, respectively. The Probability of Paternity and the Probability of Exclusion display similar distributions. Duo cases show a notable database effect, and their values of the Probability of Paternity and the Probability of Exclusion are considerably smaller than in trio cases. It is advisable to use a kit with the highest number of markers in order to achieve reliable results. The analysis of TPI Ratios shows, for every set of markers and for both trio and duo cases, a positive linear relation between Nei's distance and TPI Ratio: the more genetically distant the reference population is from the Finnish one, the larger the TPI. In particular, if the putative father is the true father of the disputed child, Finnish allele frequencies will produce smaller PIs than the allele frequencies of a population that is genetically far from the Finnish one. It would be interesting to investigate whether this behaviour holds in general for every population or is specific to the Finnish population.

The 'Marker Effect' is the more critical one, as it influences both the PI and the Probability of Paternity, especially in motherless cases, where using the 9-marker set can lead to an exclusion from paternity. Results obtained with less information (fewer markers) are clearly smaller than those obtained with the 15-marker set. It would be fruitful to repeat the experiment with a different set of markers, composed of markers with a high level of entropy.