Received date: June 25, 2012; Accepted date: August 27, 2012; Published date: September 01, 2012
Citation: Oyeka ICA, Okeh UM (2012) A Nonparametric Method for Testing H0: μ2 = (α/β)μ1 + ∂. J Biomet Biostat S7-021. doi:10.4172/2155-6180.S7-021
Copyright: ©2012 Oyeka ICA. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
This paper proposes a non-parametric statistical method for the analysis of non-homogeneous two-sample data, in which the sampled populations may be measurements on as low as the ordinal scale. The test statistic intrinsically and structurally provides for the possible presence of tied observations between the sampled populations, thereby obviating the need to require these populations to be continuous or even numeric. The proposed method can easily be modified for use with data that are not necessarily non-homogeneous. The method is illustrated with some data and shown to compare favorably with some existing methods.
Mann-Whitney; Rank; PCV; Smokers; Non-smokers
Suppose a researcher has collected two random samples of sizes m and n from two populations X and Y respectively. The researcher’s interest may be in testing the null hypothesis H0: μ2 = (α/β)μ1 + ∂ versus either a two-sided or any of the one-sided alternatives, where α and β are non-zero real numbers and ∂ is any real number, including zero. This situation may arise when, for instance, in the health delivery system, interest is in testing the hypothesis that the effective dosage of a certain treatment drug is at least “c” times that of a control drug, where c = α/β, or the hypothesis that the bed occupancy ratio of general hospitals is less than “c” times that of private hospitals. In business studies, interest may be in determining whether the cost of a certain line of product in a certain retail shop or market is on average at least “c” times higher than the cost in another retail shop or market, or whether females on average earn at least “c” times less than their male counterparts of equal skills. In education and public affairs, interest may be in whether students of a certain instructor, or candidates under a certain panel of judges, score at least “c” times more than students of another instructor or candidates under another panel of judges; whether female students take at least “c” times as many hours as male students to complete a certain task; or whether the rate at which a certain set of trial judges deliver judgments during the year is at most “c” times the rate at which a second set of trial judges do so.
In each of these and similar situations the parametric t-test cannot properly be used to test the hypothesis without first applying an appropriate data transformation. This is because multiplying or dividing the data by a constant (here c ≠ 0) changes the variance of the data by the square of the constant, thereby violating the assumption of homogeneity of variance necessary for the valid application of the parametric t-test.
A non-parametric method often used to test this type of hypothesis is the Mann-Whitney U test. To test the hypothesis one simply lists unchanged all the observations in one of the samples, multiplies each of the observations in the other sample by the specified constant c, and then adds or subtracts the constant d = ∂ before listing the two sets together. All the listed observations are then ranked and subjected to the Mann-Whitney analysis as usual. The Mann-Whitney U test is, however, encumbered by the problem of ties in the data. When these ties are many, the power of the Mann-Whitney U test is seriously compromised [1-4].
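The ranking procedure just described can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the function names `midranks` and `mann_whitney_scaled` are hypothetical, the variance used is the usual large-sample form mn(m+n+1)/12 without any tie correction, and the two-sided P-value comes from the normal approximation.

```python
from math import sqrt, erf

def midranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            ranks[k] = mean_rank
        pos = end + 1
    return ranks

def mann_whitney_scaled(x, y, c, d):
    """Mann-Whitney U after replacing each y by c*y + d (hypothetical helper)."""
    adjusted = [c * v + d for v in y]
    m, n = len(x), len(adjusted)
    ranks = midranks(list(x) + adjusted)
    r_y = sum(ranks[m:])                     # rank sum of the adjusted sample
    u = m * n + n * (n + 1) / 2 - r_y        # U based on the ranks of y
    mean = m * n / 2
    var = m * n * (m + n + 1) / 12           # no tie correction applied
    z = (u - mean) / sqrt(var)
    p_two_sided = 1 - erf(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, z, p_two_sided
```

For example, `mann_whitney_scaled([5, 6, 7], [10, 12, 14], 0.5, 0.0)` adjusts the second sample to 5, 6, 7, so every value is tied with one value in the other sample and the statistic sits exactly at its null mean.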
We here propose a non-parametric method for analyzing these types of data that structurally adjusts for any possible ties in the data.
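Let uij = 1 if xi > c·yj + d, uij = 0 if xi = c·yj + d, and uij = -1 if xi < c·yj + d (Eqn 1), for i = 1, 2, ..., m; j = 1, 2, ..., n.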
From Eqn 4, we have that the second term of the variance of W is not affected by any possible ties between the observations from population X and observations from population Y. The first term has, by the specifications in Equations 1, 2 and 3, been adjusted for any possible ties in the data. If these adjustments had not been made, any ties in the data would have been completely ignored and possibly erroneously assumed to be absent. The number of non-tied pairs would then be taken as equal to mn (Eqn 3), so that the proportion of non-tied pairs would automatically, and erroneously, be set equal to 1. This would lead to an overestimation of the variance of W relative to its true value when provision has been made for the presence of ties. The variance is inflated for fairly large m and n (m, n ≥ 8), and the inflation increases for smaller values of m and n. This bias cannot become zero unless there are no ties whatsoever between observations from population X and observations from population Y, or the ties are so few that in practice it is reasonable to assume that their effect is negligibly small and can be ignored. This assumption is not necessary here because, as already pointed out, the model specifications in Equations 1, 2 and 3 have taken care of any ties that may exist in the data. This is one of the limitations of the Mann-Whitney U test, which does not make provision for the possible existence of ties between the sampled populations. It is therefore unable to use the information available from all the samples, but uses only information on the non-tied observations within the sampled populations. This approach will therefore tend to increase the variance of the Mann-Whitney U statistic and hence reduce its efficiency and power.
The probabilities Π+, Π0 and Π- can easily be obtained as the relative frequencies of occurrence of 1, 0 and -1 in the frequency distribution of the mn values of uij, i = 1, 2, …, m and j = 1, 2, …, n. If we let f+, f0 and f- be respectively the frequencies of occurrence of 1, 0 and -1 in the frequency distribution of the mn values of uij, then Π+, Π0 and Π- are estimated by f+/mn, f0/mn and f-/mn respectively.
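The frequency counts described here can be illustrated with a short Python sketch. It assumes that uij is scored 1, 0 or -1 according to whether xi is greater than, equal to, or less than the adjusted value c·yj + d; this sign convention, and the names `pairwise_scores` and `tie_adjusted_proportions`, are assumptions made for illustration only.

```python
def pairwise_scores(x, y, c=1.0, d=0.0):
    """Return the m-by-n table of scores u[i][j] in {1, 0, -1}.

    Assumed convention: 1 if x_i > c*y_j + d, 0 if equal, -1 if smaller.
    """
    adjusted = [c * v + d for v in y]
    return [[(xi > yj) - (xi < yj) for yj in adjusted] for xi in x]

def tie_adjusted_proportions(u):
    """Count f+, f0, f- over all mn scores and return relative frequencies."""
    flat = [score for row in u for score in row]
    mn = len(flat)
    f_plus, f_zero, f_minus = flat.count(1), flat.count(0), flat.count(-1)
    return {
        "f_plus": f_plus, "f_zero": f_zero, "f_minus": f_minus,
        "pi_plus": f_plus / mn, "pi_zero": f_zero / mn, "pi_minus": f_minus / mn,
    }
```

With x = [3, 5] and y = [4, 5, 6] (c = 1, d = 0) the six scores are -1, -1, -1, 1, 0, -1, so f+ = 1, f0 = 1 and f- = 4. Note that after multiplying by a non-integer c, exact ties depend on floating-point equality, so in practice it may be worth rounding the adjusted values to the measurement precision first.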
Also from equations 8 and 12, we have that
Testing of hypothesis
Interest may be in testing the null hypothesis that observations from population X are on the average equal to c times observations from population Y plus ∂, versus any of the alternative hypotheses that observations from population X are stochastically greater (less) than c times observations from population Y plus ∂. Expressed in terms of Π+ and Π-, this is equivalent to testing the null hypothesis H0: Π+ - Π- = 0 against the corresponding alternative (Equation 14).
If m and n are sufficiently large (m, n ≥ 8) such that mn is at least 30, then under the null hypothesis H0, the test statistic
is approximately normally distributed with mean zero and variance 1. It may be used as a test statistic for H0 and compared with an appropriately chosen z critical value at some α level of significance. H0 is rejected at the α level of significance if
otherwise H0 is accepted.
Or equivalently using the chi-square test
which has approximately the chi-square distribution with 1 degree of freedom for sufficiently large m and n. The null hypothesis of Equation 14 is rejected at the α level of significance if
Otherwise the null hypothesis is accepted.
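A rough numerical sketch of the z and chi-square versions of the test is given below. It rests on assumptions that should be checked against Equations 15 and 16: that W = f+ - f-, and that its estimated variance has the form mn[(Π+ + Π-) - (Π+ - Π-)²] with the Π's replaced by their sample estimates. That form treats the mn scores as if they were independent and is consistent with the discussion of the tie-adjusted first term above, but it is an assumption of this sketch. The function name `sign_score_test` is hypothetical.

```python
from math import sqrt, erf

def sign_score_test(f_plus, f_zero, f_minus):
    """Large-sample z and chi-square statistics from the score frequencies.

    Assumed forms (not confirmed by the text):
      W      = f+ - f-
      Var(W) = mn * ((p+ + p-) - (p+ - p-)**2)
    """
    mn = f_plus + f_zero + f_minus
    p_plus, p_minus = f_plus / mn, f_minus / mn
    w = f_plus - f_minus
    var_w = mn * ((p_plus + p_minus) - (p_plus - p_minus) ** 2)
    if var_w <= 0:
        raise ValueError("variance estimate is zero; the test is undefined")
    z = w / sqrt(var_w)
    chi_square = z * z                       # 1 degree of freedom
    p_two_sided = 1 - erf(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return {"W": w, "var_W": var_w, "z": z,
            "chi_square": chi_square, "p_two_sided": p_two_sided}
```

Because the chi-square statistic here is simply z squared, the two versions always lead to the same two-sided decision; only the z form distinguishes the direction of a one-sided alternative.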
Confidence interval for Π+ - Π-
A 100(1-α)% confidence interval for Π+ - Π- is given as:
That is, a 100(1-α)% confidence interval for Π+ - Π- is given by Equation 19.
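As an illustration only, a large-sample interval of the usual "estimate ± critical value × standard error" form can be computed as follows. The standard error used here, the square root of [(Π+ + Π-) - (Π+ - Π-)²]/mn evaluated at the sample estimates, is an assumption of this sketch rather than a transcription of Equation 19, and the function name `difference_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def difference_interval(f_plus, f_zero, f_minus, alpha=0.05):
    """Approximate 100(1 - alpha)% interval for PI+ - PI-.

    Assumed standard error: sqrt(((p+ + p-) - (p+ - p-)**2) / mn).
    """
    mn = f_plus + f_zero + f_minus
    p_plus, p_minus = f_plus / mn, f_minus / mn
    diff = p_plus - p_minus
    std_err = sqrt(((p_plus + p_minus) - diff ** 2) / mn)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    return diff - z_crit * std_err, diff + z_crit * std_err
```

The interval is not clipped to the range [-1, 1], so with very few pairs its limits can fall outside the values that Π+ - Π- can actually take.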
Note that the proposed method is relatively more efficient and hence is likely to be more powerful than the Mann-Whitney U test. To show this we have that the relative efficiency of the Mann-Whitney U statistic to the proposed test statistic W is
That is, the proposed test statistic is relatively more efficient and hence likely to be more powerful than the Mann-Whitney U statistic when there are tied observations (Π0 ≥ 0) between the sampled populations, provided the combined sample size is at least 12.
The Packed Cell Volume (PCV) levels measured in percent of random samples of male non-smokers (X) and smokers (Y) from a certain population are presented in Table 1.
Table 1: PCV levels of Non-Smokers (X) and Smokers (Y).
Interest is to test the null hypothesis that the median PCV level of non-smokers is 3 percent higher than 95 percent of the median PCV level of smokers.
To use the proposed method to test the null hypothesis we first list the PCV levels of non-smokers (X) unchanged and then list the PCV levels of smokers (Y) after multiplying each by 0.95 and then adding 0.03, giving the adjusted values shown in Table 1. The values of uij (Eqn 1) for the illustrative data of Table 1 are shown in Table 2.
Table 2: Values of uij (Eqn 1) for the PCV levels of Table 1 (non-smokers = X; smokers = adjusted Y).
From Table 2 we have that
Therefore the test statistic for the null hypothesis H0 of equation 14 is from equation 15
which is highly statistically significant. Hence we would reject H0 and conclude that the median PCV level of non-smokers is much higher than that of smokers. For any chosen α level, a 100(1-α) percent confidence interval for Π+ - Π- is estimated from Eqn 17 as
It may be instructive to compare the proposed method with the Mann-Whitney U test using the illustrative data. To do this we pool the PCV levels xi of non-smokers and the adjusted PCV levels of smokers, and then rank the combined sample observations from the smallest, assigned the rank 1, to the largest, assigned the rank 32. Tied PCV levels are, as usual, assigned their mean ranks. The results are shown in Table 3.
| PCV levels of non-smokers (xi) | Rank of xi | Adjusted PCV levels of smokers | Rank of xj2 |
| Sum of ranks (R.j) | 346 |
Table 3: Ranks of PCV levels of Nonsmokers and smokers for use with the Mann-Whitney U test.
Now based on the rank of smokers, (R.2=Ry) the value of the Mann-Whitney U test is
with mean and estimated variance given by
While the standard deviation is given by
Hence the Mann-Whitney U test statistic is
(P-value = 0.0314), which is statistically significant, again leading to a rejection of the null hypothesis. However, although both methods here lead to a rejection of the null hypothesis, the sizes of the attained significance levels suggest that the Mann-Whitney U test, at least for the present data, is likely to lead to the acceptance of a false null hypothesis (Type II error) more frequently, and hence is likely to be less powerful than the proposed method. This is probably because the proposed test statistic, unlike the Mann-Whitney U test statistic, intrinsically and structurally provides for the possible presence of ties between the sampled populations and hence is able to use most of the information available in the data set. For the same reason the proposed method is also likely to be more powerful than the median test, which is known to be usually less powerful than the Mann-Whitney U test.
We have in this paper presented a non-parametric statistical method for the analyses of non-homogeneous populations that may be measurements on as low as the ordinal scale and need not be continuous. The proposed test statistic is intrinsically and structurally adjusted to provide for the possibility of tied observations between the sampled populations. The method can also be easily modified and used on two sample data that are not necessarily non-homogeneous. The proposed method is illustrated with some data and shown to be at least as powerful as the Mann-Whitney U test.