Department of Statistics, Federal University of Technology Akure, Akure, Nigeria
Received Date: June 01, 2017; Accepted Date: July 31, 2017; Published Date: August 07, 2017
Citation: Kupolusi JA, Adebola FB (2017) On Application of Development of Test Statistic for Testing Unequal Group Variances. J Appl Computat Math 6: 359. doi: 10.4172/2168-9679.1000359
Copyright: © 2017 Kupolusi JA, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
In this paper, a previously proposed test statistic for testing the equality of means under unequal population variances is applied. When the group variances differ, using the pooled sample variance as a single value for all the variances gives inappropriate results. This problem is commonly referred to as the Behrens–Fisher problem in k-sample location problems. A proposed unbiased sample harmonic mean of variances, $S_H^2$, was examined and found useful for unequal variances, a setting that has received considerable attention in the medical and biological sciences. Little or nothing has been achieved in the social sciences, which form a major part of this work. Data on road crashes in the six geopolitical zones of Nigeria from 2004 to 2013 were used to check the consistency of the results with the literature, and the proposed test statistic was found useful and relevant. Using this test statistic, the number of road crashes was found to differ significantly in some geopolitical zones of Nigeria, a difference that would ordinarily remain hidden when the pooled sample variance is used.
Statistic; Variances; Harmonic variances; Unequal population variances
The conventional ANOVA test statistic for testing the equality of g population means, $H_0: \mu_1 = \mu_2 = \cdots = \mu_g = \mu$, against the non-directional alternative $H_1: \mu_i \neq \mu$ for at least one $i$, $i = 1, 2, \ldots, g$, is not appropriate when the variances are heterogeneous. Instead, we might be tempted to run all possible pairwise comparisons of the population means. If we assume that all g distributions are approximately normal with means $\mu_1, \mu_2, \ldots, \mu_g$ and a common variance $\sigma^2$ [1-5], we would need to run a t-test for each of the $g(g-1)/2$ pairs of means.
This procedure can be tedious and time-consuming. A more important but less apparent disadvantage of running multiple t-tests to compare means is that the probability of falsely rejecting at least one hypothesis increases as the number of t-tests increases (Ott, 1984). This motivates the Bonferroni multiple comparison procedure (Neter and Wasserman, 1974). One difficulty in discussing the Behrens–Fisher problem and its proposed solutions is that there are various interpretations of what is meant by “the Behrens–Fisher problem”. These differences involve not only what counts as a relevant solution [6-8], but even the basic statement of what is being considered.
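The short Python sketch below shows how quickly the error rate grows. It counts the pairwise comparisons among g means and computes the chance of at least one false rejection, $1 - (1 - \alpha)^m$, together with the Bonferroni per-test level $\alpha/m$. The formula $1 - (1 - \alpha)^m$ assumes the m tests are independent. Pairwise t-tests that share groups are not independent, so treat the result as an illustration. The Bonferroni bound $m\alpha$ on the familywise error needs no such assumption. The function names are hypothetical.

```python
def n_pairwise(g: int) -> int:
    """Number of distinct pairs among g group means."""
    return g * (g - 1) // 2


def familywise_error(alpha: float, m: int) -> float:
    """P(at least one false rejection) among m tests, ASSUMING the tests are independent."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_level(alpha: float, m: int) -> float:
    """Per-test level that keeps the familywise error at or below alpha."""
    return alpha / m


g, alpha = 6, 0.05  # six geopolitical zones, 5% level
m = n_pairwise(g)
print(m)
print(round(familywise_error(alpha, m), 4))
print(round(bonferroni_level(alpha, m), 5))
```

With g = 6 zones there are 15 pairwise tests. At α = 0.05, the illustrative familywise error is about 0.54.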
Solutions to the problem have been presented from either a classical or a Bayesian point of view, and a solution from one point of view may be notionally invalid when judged from the other. If consideration is restricted to classical statistical inference, one may seek solutions that are simple to apply in practice, preferring this simplicity to some inaccuracy in the corresponding probability statements. Where exact significance levels are required, there may be an additional requirement that the procedure make maximum use of the statistical information in the dataset.
A proposed unbiased sample harmonic mean of variances was examined and found useful for unequal variances, a setting that has received considerable attention in the medical and biological sciences. Little or nothing has been achieved in the social sciences, which form a major part of this work. Data on road crashes in the six geopolitical zones of Nigeria from 2004 to 2013 are considered to check the consistency of the results with the literature, and the proposed test statistic was found useful and relevant [9-13].
Road traffic crashes have become a recurring phenomenon in Nigeria and constitute a menace in modern times. Both developed and developing nations have suffered varying degrees of road accidents, but developing countries clearly dominate, and Nigeria has the second highest rate of road traffic crashes among 193 ranked countries of the world. Deaths from reckless driving are the third leading cause of death in Nigeria. Oladepo and Brieger (1986) argued that three-quarters of all accidents on Nigerian roads involve fatalities.
The earliest proposed solution appeared in a paper by Behrens (1929). Fisher (1935; 1941) acknowledged some errors in Behrens' work but claimed to justify Behrens' solution by the use of fiducial inference. This solution (the BF test) compares the value of the sample statistic with a critical value given by an asymptotic series involving the sample variances and sample sizes. Sukhatme (1938) published tables of the 5% and 1% significance levels of the BF test. In the 1940s, W. G. Cochran produced an empirical approximation based on Student's t-table by inspecting Sukhatme's tables. His approximation was passed around by word of mouth and subsequently incorporated into a number of textbooks. This led Cochran in 1964 to publish an account of the accuracy of his approximation for the two-sample problem. For k = 2, Cochran (1964) suggested that $t = (\bar{x}_1 - \bar{x}_2)/\sqrt{s_1^2/n_1 + s_2^2/n_2}$ could be compared to an approximate critical value $t' = (w_1 t_1 + w_2 t_2)/(w_1 + w_2)$, where $w_i = s_i^2/n_i$ and $t_i$ is the critical value of Student's t with $n_i - 1$ degrees of freedom.
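The following is a minimal sketch of the two-sample comparison described above, written with the Python standard library. The function names are hypothetical. The critical values $t_1$ and $t_2$ are passed in from a t-table rather than computed. The example values are made up for illustration and are not taken from this paper.

```python
import math


def two_sample_t(mean1, var1, n1, mean2, var2, n2):
    """Statistic (x1bar - x2bar) / sqrt(s1^2/n1 + s2^2/n2), with no pooling of variances."""
    return (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)


def cochran_critical_value(var1, n1, t1, var2, n2, t2):
    """Cochran's weighted critical value (w1*t1 + w2*t2) / (w1 + w2), with w_i = s_i^2 / n_i.

    t1 and t2 are Student-t critical values with n1 - 1 and n2 - 1 degrees of
    freedom, supplied by the caller (for example, read from a t-table).
    """
    w1, w2 = var1 / n1, var2 / n2
    return (w1 * t1 + w2 * t2) / (w1 + w2)


# Hypothetical two-sample illustration (not data from this paper):
# n1 = 10 -> t(0.975, 9) = 2.262; n2 = 16 -> t(0.975, 15) = 2.131
t_stat = two_sample_t(12.0, 4.0, 10, 10.5, 9.0, 16)
t_crit = cochran_critical_value(4.0, 10, 2.262, 9.0, 16, 2.131)
print(round(t_stat, 3), round(t_crit, 3), abs(t_stat) > t_crit)
```

When $n_1 = n_2$, the two table values coincide and $t'$ reduces to the ordinary t critical value with $n - 1$ degrees of freedom.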
In a series of papers, Welch (1938; 1947; 1951) disputed Fisher's use of fiducial inference and rejected the claim that Behrens' solution had been justified. He presented an approximate solution to the BF problem and published an asymptotic solution, which was further studied by Aspin (1948; 1949).
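Welch's approximate solution is usually applied today as Welch's t statistic with the Welch–Satterthwaite approximate degrees of freedom. The sketch below uses that common form; it is not a transcription of Welch's original papers. The function name is hypothetical. The sketch is applied to the Zone A and Zone B summaries reported in the application section of this paper.

```python
import math


def welch_test(mean1, var1, n1, mean2, var2, n2):
    """Welch's t statistic and the Welch-Satterthwaite approximate degrees of freedom."""
    se1, se2 = var1 / n1, var2 / n2  # squared standard errors of each sample mean
    t = (mean1 - mean2) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# North Central (Zone A) vs North East (Zone B) summary statistics from this paper
t, df = welch_test(2791.8, 1903122.334, 10, 1089.7, 89367.57492, 10)
print(round(t, 2), round(df, 2))
```

Zone A's variance dominates, so the approximate degrees of freedom come out close to $n_A - 1 = 9$ rather than the pooled value of 18.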
Scheffé (1943) obtained a statistic for the BF problem by minimizing the length of the confidence interval for the difference between the means of two normal populations with unequal variances, based on Student's t distribution. Calculating his confidence interval involved taking differences between sample values from the two samples, with the values paired for subtraction at random. Consequently, the length of his confidence interval depended on the arrangement of the sample values after random pairing.
Lee and Gurland (1975) proposed a test for the two-sample case with a critical value that depended on the sample variances, the sample sizes and the nominal significance level. Computing the critical value involves solving a nonlinear minimization problem. Abidoye et al. (2007) showed that the harmonic mean of the group variances better represents a series of unequal group variances and is estimated by $S_H^2$. They also showed that the sampling distribution of $S_H^2$ is approximated by the chi-square distribution.
Consequently, the test statistic for the hypothesis set in equation (1.1) is
Now p-value= (2.5)
where $t_r$ denotes the regular t-distribution and r is the appropriate degrees of freedom for the t-test. Abidoye et al. (2013a) showed that the harmonic mean of the group variances better represents a series of unequal group variances and is estimated by $S_H^2$. They showed that the sampling distribution of $S_H^2$ is approximated by the chi-square distribution.
Consequently, the test statistic for the hypotheses set in equation (2.6) is t=
Now p-value= (2.9.2)
where λ can be λ1 or λ2, $t_r$ denotes the regular t-distribution and r is the appropriate degrees of freedom for the t-test. The degrees of freedom r for the harmonic mean of variances have been determined to be $r = 22.096 + 0.266(n-g) - 0.000029(n-g)^2$ (Abidoye et al., 2013a).
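The empirical degrees-of-freedom formula can be evaluated directly. The sketch below applies it to the road-crash application, where g = 6 zones each contribute 10 annual observations, so n = 60. The function name is hypothetical.

```python
def harmonic_variance_df(n: int, g: int) -> float:
    """Empirical degrees of freedom r for the harmonic-mean-of-variances t test.

    r = 22.096 + 0.266 (n - g) - 0.000029 (n - g)^2, as reported from Abidoye et al. (2013a).
    n is the total sample size and g the number of groups.
    """
    m = n - g
    return 22.096 + 0.266 * m - 0.000029 * m ** 2


# Road-crash application: g = 6 zones, 10 years each, so n = 60
print(round(harmonic_variance_df(60, 6), 3))  # 36.375
```

The result is not an integer. A printed t-table therefore has to be interpolated; alternatively, use a t-distribution routine that accepts real-valued degrees of freedom.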
Method of the proposed test statistic
We are interested in applying a developed procedure to test the hypothesis:
, for at least one i
Where the error term
The hypothesis of equation (1) can be split into two cases, as explained for the Bonferroni test statistic and by Dunnett (1964), Gupta et al. (2006) and Abidoye et al. (2007).
Then, equation 1 can be written as
Consequently the hypothesis set is written as
case I or case II (4)
Suppose that the unbiased estimate of yi is
as obtained from the distribution of order statistics by Abidoye (2012), Abidoye et al. (2013c) and Abidoye et al. (2014).
Distribution of harmonic variance
Abidoye et al. (2013a) explained that the harmonic mean of the group variances better represents a series of unequal group variances and is estimated by $S_H^2$. They showed that the sampling distribution of $S_H^2$ is approximated by the chi-square distribution. Their estimator is given by the equations below.
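The sketch below shows why the harmonic mean behaves differently from the pooled variance. It computes the plain harmonic mean $g / \sum_i (1/S_i^2)$ of the six zone variances reported in the application section of this paper, alongside the pooled variance. This is an assumption-laden illustration: the paper's unbiased estimator $S_H^2$ may include a small-sample correction that is not shown here, so the code gives the uncorrected harmonic mean. The variable names are hypothetical.

```python
from statistics import harmonic_mean, mean

# Sample variances S_i^2 of annual road crashes for Zones A-F (summary statistics in this paper)
zone_variances = [1903122.334, 89367.57492, 388122.3962,
                  91584.97743, 141165.7438, 3861198.669]

# Plain harmonic mean g / sum(1 / S_i^2); NOT necessarily the paper's unbiased S_H^2
s_h2 = harmonic_mean(zone_variances)

# With equal n_i, the pooled variance sum((n_i - 1) S_i^2) / sum(n_i - 1) is the arithmetic mean
s_p2 = mean(zone_variances)

print(f"harmonic mean of variances:   {s_h2:,.1f}")
print(f"pooled (arithmetic) variance: {s_p2:,.1f}")
```

The harmonic mean (about 184,000) is far below the pooled value (about 1,079,000). The reason is that the harmonic mean is dominated by the smallest variances, those of the North East and South East zones.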
Consequently, the test statistic for the hypotheses set in equation (3.9.1) is
Now p-value= (3.9.5)
where λ can be λ1 or λ2, $t_r$ denotes the regular t-distribution and r is the appropriate degrees of freedom for the t-test. The degrees of freedom r for the harmonic mean of variances have been determined to be $r = 22.096 + 0.266(n-g) - 0.000029(n-g)^2$ (Abidoye et al., 2013a).
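The p-value expressions in this section did not survive extraction. The sketch below assumes a two-sided test, p-value $= P(|t_r| \ge |t|)$, and evaluates it for a possibly non-integer r using only the Python standard library. It does so by integrating the Student t density with Simpson's rule. The two-sided form and the function names are assumptions, not the authors' stated method. If SciPy is available, `2 * scipy.stats.t.sf(abs(t), r)` computes the same quantity.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Student t density; df may be non-integer."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_value(t_stat: float, df: float, steps: int = 2000) -> float:
    """P(|T_df| >= |t_stat|) via composite Simpson integration of the density on [0, |t|]."""
    a = abs(t_stat)
    steps += steps % 2  # Simpson's rule needs an even number of intervals
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central)


# Check against a tabulated value: the two-sided 5% critical value of t with 9 df is 2.262157
print(round(two_sided_p_value(2.262157, 9), 4))
```

As a check, `two_sided_p_value(2.262157, 9)` is approximately 0.05. This matches the tabulated two-sided 5% critical value of t with 9 degrees of freedom.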
This paper uses secondary data on road crashes collected by the Federal Road Safety Commission (FRSC), Nigeria. The data were grouped into the six geopolitical zones (North Central, North East, North West, South East, South South and South West). Table 1 gives the data for ten consecutive years, covering the period 2004 to 2013.
Applying Levene's test of equality of variances to the data in Table 1 shows that the variances differ from zone to zone. This violates the assumption of equal group variances commonly made in the literature.
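For reference, here is a sketch of the mean-centred Levene statistic in plain Python. The source does not state which variant or software was used for Table 2, so the mean-centred form is an assumption. Note that SciPy's `scipy.stats.levene` defaults to `center='median'` (the Brown–Forsythe variant); passing `center='mean'` gives the form below. The toy data are hypothetical stand-ins for the yearly counts of Table 1.

```python
from statistics import mean


def levene_statistic(groups):
    """Levene's W using absolute deviations from each group mean.

    W = (N - k) / (k - 1) * sum_i n_i (Zbar_i - Zbar)^2 / sum_i sum_j (Z_ij - Zbar_i)^2,
    with Z_ij = |Y_ij - Ybar_i|. Returns W and its F degrees of freedom (k - 1, N - k).
    """
    k = len(groups)
    z = []
    for g in groups:
        g_mean = mean(g)
        z.append([abs(y - g_mean) for y in g])  # absolute deviations from the group mean
    n_total = sum(len(zi) for zi in z)
    z_bar = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zb - z_grand) ** 2 for zi, zb in zip(z, z_bar))
    within = sum((zij - zb) ** 2 for zi, zb in zip(z, z_bar) for zij in zi)
    w = (n_total - k) / (k - 1) * between / within
    return w, (k - 1, n_total - k)


# Hypothetical toy data; for Table 1 these would be six lists of ten yearly counts
w, dfs = levene_statistic([[1, 2, 3], [2, 4, 6]])
print(w, dfs)
```

For six zones observed over ten years, W is referred to an F distribution with (5, 54) degrees of freedom.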
Table 1: Geopolitical data for road crashes in Nigeria from 2004 to 2013.
Hence, the conventional t-test statistic cannot be used, and the statistic proposed in this paper is applied instead. The following summary statistics were obtained from the data in Tables 1 and 2.
| Levene statistic | df1 | df2 | p-value |
Table 2: Levene test for variance equality.
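North Central (Zone A): $\bar{y}_A = 2791.8$, $S_A^2 = 1903122.334$, $n_A = 10$
North East (Zone B): $\bar{y}_B = 1089.7$, $S_B^2 = 89367.57492$, $n_B = 10$
North West (Zone C): $\bar{y}_C = 2150.8$, $S_C^2 = 388122.3962$, $n_C = 10$
South East (Zone D): $\bar{y}_D = 896.1$, $S_D^2 = 91584.97743$, $n_D = 10$
South South (Zone E): $\bar{y}_E = 1227.8$, $S_E^2 = 141165.7438$, $n_E = 10$
South West (Zone F): $\bar{y}_F = 3299.7$, $S_F^2 = 3861198.669$, $n_F = 10$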
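The differences of means used below can be reproduced from these summary statistics. The sketch computes the overall mean and each zone's deviation $y_i = \bar{y}_i - \bar{y}$. It simply lists all six deviations and reports the smallest and largest signed ones, which may not be exactly how the authors define the minimum and maximum differences. The variable names are hypothetical.

```python
zone_means = {"A": 2791.8, "B": 1089.7, "C": 2150.8,
              "D": 896.1, "E": 1227.8, "F": 3299.7}

# Every zone has n_i = 10, so the overall mean equals the mean of the zone means
grand_mean = sum(zone_means.values()) / len(zone_means)

# Deviation of each zone mean from the overall mean, y_i = ybar_i - ybar
deviations = {zone: m - grand_mean for zone, m in zone_means.items()}

print(f"overall mean: {grand_mean:.3f}")
for zone, y in deviations.items():
    print(f"Zone {zone}: {y:9.3f}")

lowest = min(deviations, key=deviations.get)
highest = max(deviations, key=deviations.get)
print("smallest deviation:", lowest, "largest deviation:", highest)
```

The overall mean is 1909.317, which matches the value used for $y_1$ and $y_2$. The South West (Zone F) has the largest positive deviation and the South East (Zone D) the largest negative one.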
We therefore consider the minimum and maximum differences of means (Table 3).
Table 3: Table of means.
$y_1 = 2791.8 - 1909.317 = 882.483$
$y_2 = 1089.7 - 1909.317 = -819.617$, where 1909.317 is the overall mean number of road crashes across the six zones.
Then, the minimum difference of means is
From the data set above
The hypothesis to be tested is $H_0: \mu_A = \mu_B = \cdots = \mu_F = \mu$
against $H_1: \mu_i \neq \mu$ for at least one $i$, $i = A, B, \ldots, F$.
And the test statistic is
where $r = 22.096 + 0.266(n-g) - 0.000029(n-g)^2$, as defined in Abidoye et al. (2013)
$r = 22.096 + 0.266(60-6) - 0.000029(60-6)^2 = 22.096 + 14.364 - 0.085 \approx 36.375$
Hence, we reject $H_0$ and conclude, at the 5% level of significance, that the average numbers of road crashes in the six geopolitical zones are not all equal to the overall average. Zone F (South West) has the highest mean number of reported road crashes (3299.7) and may need special attention.
Next we consider the maximum difference of means,
The hypothesis to be tested is
This led to the rejection of $H_0$ and the conclusion that the mean numbers of road crashes in the six geopolitical zones are not all the same at the 5% level of significance.
We have applied a test statistic developed by Abidoye et al. (2013) for testing the equality of means under unequal population variances to road crashes in the six geopolitical zones of Nigeria. The distribution of road crashes across these zones shows that the South West has the highest number of reported road crashes and needs special attention. Since the sample harmonic mean of the variances is approximately chi-square distributed, the modified t statistic is appropriate and eliminates the Behrens–Fisher problem. Hence, the sample harmonic mean of variances is preferred to the pooled sample variance in k-sample location problems.