
^{1}Department of Applied Statistics, Nnamdi Azikiwe University, Awka, Nigeria

^{2}Department of Industrial Mathematics and Applied Statistics, Ebonyi State University, Abakaliki, Nigeria

**\*Corresponding Author:** Okeh UM, B.Sc, M.Sc, Department of Industrial Mathematics and Applied Statistics, Ebonyi State University, Abakaliki, Nigeria

**E-mail:**[email protected]

**Received date:** June 25, 2012; **Accepted date:** August 27, 2012; **Published date:** September 01, 2012

**Citation:** Oyeka ICA, Okeh UM (2012) A Nonparametric Method for Testing H_{0}: μ_{2} = (α/β)μ_{1} + ∂. J Biomet Biostat S7-021. doi:10.4172/2155-6180.S7-021

**Copyright:** ©2012 Oyeka ICA. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


This paper proposes a non-parametric statistical method for the analysis of non-homogeneous two-sample data, in which the sampled populations may be measured on as low as the ordinal scale. The test statistic intrinsically and structurally provides for the possible presence of tied observations between the sampled populations, so the populations need not be continuous or even numeric. The proposed method can easily be modified for use with data that are not necessarily non-homogeneous. The method is illustrated with some data and shown to compare favorably with some existing methods.

**Keywords:** Mann-Whitney; Rank; PCV; Smokers; Non-smokers

Suppose a researcher has collected two random samples of sizes m and n from two populations X and Y respectively. The researcher's interest may be in testing the null hypothesis H_{0}: μ_{2} = (α/β)μ_{1} + ∂ versus either a two-sided or one of the one-sided alternatives, where α and β are non-zero real numbers and ∂ is any real number, including zero.

This situation arises in many fields. In the health delivery system, interest may be in testing the hypothesis that the effective dosage of a certain treatment drug is at least "c" times that of a control drug, where c = α/β, or the hypothesis that the bed occupancy ratio of general hospitals is less than "c" times that of private hospitals. In business studies, interest may be in determining whether the cost of a certain line of product in one retail shop or market is on the average at least "c" times the cost in another retail shop or market, or whether females on the average earn at least "c" times less than their male counterparts of equal skills. In education and public affairs, interest may be in whether students of a certain instructor, or candidates under a certain panel of judges, score at least "c" times more than students of another instructor or candidates under another panel of judges; whether female students take at least "c" times as many hours as male students to complete a certain task; or whether the rate at which one set of trial judges delivers judgments during the year is at most "c" times the rate at which a second set of trial judges delivers judgments.

In each of these and similar situations the parametric t-test cannot properly be used to test the hypothesis without first applying an appropriate data transformation. Multiplying or dividing the data by a constant (here c ≠ 0) changes the variance of the data by the square of the constant, thereby violating the assumption of homogeneity of variance necessary for the valid application of the parametric t-test.

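
The variance argument above can be checked numerically. The following is a minimal sketch, assuming a small hypothetical sample and the illustrative constant c = 0.95; neither the sample nor the helper name `scaled` comes from the paper.

```python
from statistics import pvariance

def scaled(sample, c):
    """Multiply every observation by the constant c."""
    return [c * value for value in sample]

# Hypothetical sample, used only to illustrate the scaling effect.
sample = [38, 41, 43, 44, 40]
c = 0.95

var_original = pvariance(sample)
var_scaled = pvariance(scaled(sample, c))

# The ratio of the two variances equals c squared (up to rounding error),
# so two samples with equal variances no longer have equal variances
# once one of them has been rescaled.
ratio = var_scaled / var_original
print(ratio, c ** 2)
```
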
A non-parametric method often used to test this type of hypothesis is the Mann-Whitney U test. To test the hypothesis, one lists all the observations in one sample unchanged, multiplies each observation in the other sample by the specified constant c = α/β, and then adds the constant d = ∂ before pooling the two lists. The pooled observations are then ranked and subjected to the usual Mann-Whitney analysis. The Mann-Whitney U test is, however, encumbered by the problem of ties in the data. When ties are many, the power of the Mann-Whitney U test is seriously compromised [1-4].
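
The pooling step described above, and the ties it can produce, might be sketched as follows. The function name `pooled_ties`, the toy samples, and the rounding of adjusted values to one decimal place are assumptions made for illustration; they are not specified in the paper.

```python
from collections import Counter

def pooled_ties(x, y, c, d, ndigits=1):
    """Adjust the second sample to c*y + d, pool it with the first sample,
    and report every pooled value that occurs more than once."""
    # Assumption: adjusted values are rounded to `ndigits` decimal places.
    adjusted = [round(c * value + d, ndigits) for value in y]
    pooled = [float(value) for value in x] + adjusted
    counts = Counter(pooled)
    ties = {value: k for value, k in counts.items() if k > 1}
    return adjusted, ties

# Toy samples (hypothetical), with c = 0.95 and d = 0.03.
adjusted, ties = pooled_ties([38, 40, 39], [41, 42], c=0.95, d=0.03)
print(adjusted)  # [39.0, 39.9]
print(ties)      # {39.0: 2}
```

Even when neither raw sample contains a repeated value, the adjustment can create ties across the two samples, which is the situation the proposed method is designed to handle.
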

We here propose a non-parametric method for analyzing these types of data that structurally adjusts for any possible ties in the data.

(1)

for i = 1,2,...,m. j = 1,2,..., n.
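
The quantity u_{ij} takes the values 1, 0 and -1, as tabulated later in Table 2. A minimal sketch, assuming u_{ij} is 1 when x_{i} exceeds the adjusted value c·y_{j} + d, -1 when it is smaller, and 0 when the two are tied (a reading consistent with the entries of Table 2), is given below. The function name `u` and its default arguments are illustrative only.

```python
def u(x_i, y_j, c=1.0, d=0.0):
    """Return 1, 0 or -1 according to whether x_i is greater than,
    equal to, or less than the adjusted value c*y_j + d."""
    adjusted = c * y_j + d
    if x_i > adjusted:
        return 1
    if x_i < adjusted:
        return -1
    return 0

# Checks against already-adjusted smoker values (c = 1, d = 0):
print(u(38, 45.6))  # -1: 38 is below 45.6
print(u(39, 39.0))  # 0: tied observations
print(u(38, 36.1))  # 1: 38 is above 36.1
```
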

(2)

(3)

Define

(4)

(5)

(6)

From Eqn 4, we have that

(7)

Therefore

(8)

Also

(9)

Now

(10)

(11)

is not affected by any possible ties between the observations from population X and the observations from population Y. The first term has, by the specifications in Equations 1, 2 and 3, been adjusted for any possible ties in the data. If these adjustments had not been made, the presence of any ties in the data would have been completely ignored, and ties possibly erroneously assumed to be absent, implying that would be equal to mn from Eqn 3, so that would then automatically and erroneously have been set equal to 1. This would lead to an overestimation of the variance of W with an error that is equal to of its true value when provision has been made for the presence of ties, resulting in the inflation of the variance by of its true value for fairly large m and n (m, n ≥ 8), increasing for smaller values of m and n. This bias cannot become zero unless there are no ties whatsoever between observations from population X and observations from population Y, or the ties are so few that in practice it is reasonable to assume that their effect is negligibly small and can be ignored.

This assumption is not necessary here because, as already pointed out, the model specifications in Equations 1, 2 and 3 take care of any ties that may exist in the data. The lack of such a provision is one of the limitations of the Mann-Whitney U test, which does not provide for the possibility of ties between the sampled populations. It is therefore unable to use the information available from all the samples and uses only information on the non-tied observations within the sampled populations. This approach will therefore tend to increase the variance of the Mann-Whitney U statistic and hence reduce its efficiency and power [3].

**Estimating Π^{+}, Π^{0} and Π^{-}**

The probabilities Π^{+}, Π^{0} and Π^{-} can be estimated as the relative frequencies of occurrence of 1, 0 and -1 among the mn values of u_{ij}, i = 1, 2, …, m and j = 1, 2, …, n. If we let f^{+}, f^{0} and f^{-} be respectively the frequencies of occurrence of 1, 0 and -1 in the frequency distribution of the mn values of u_{ij}, then

$$\hat{\Pi}^{+} = \frac{f^{+}}{mn}; \qquad \hat{\Pi}^{0} = \frac{f^{0}}{mn}; \qquad \hat{\Pi}^{-} = \frac{f^{-}}{mn}$$

(12)
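
The relative-frequency estimates described above can be sketched as follows. This is an illustration only: the function name `estimate_probabilities` and the small u matrix are hypothetical, and the estimates are simply the counts of 1, 0 and -1 divided by mn.

```python
def estimate_probabilities(u_matrix):
    """Estimate the probabilities of 1, 0 and -1 as relative frequencies
    over all m*n entries of the u matrix."""
    values = [entry for row in u_matrix for entry in row]
    total = len(values)                 # m * n
    f_plus = values.count(1)
    f_zero = values.count(0)
    f_minus = values.count(-1)
    return {
        "pi_plus": f_plus / total,
        "pi_zero": f_zero / total,
        "pi_minus": f_minus / total,
    }

# Hypothetical 2 x 3 matrix of u values.
estimates = estimate_probabilities([[1, 0, -1], [1, 1, -1]])
print(estimates["pi_plus"], estimates["pi_zero"], estimates["pi_minus"])
```

The three estimates always sum to 1, because every u_{ij} falls into exactly one of the three categories.
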

Also from equations 8 and 12, we have that

(13)

**Testing of hypothesis**

Interest may be in testing the null hypothesis that observations from population X are on the average equal to c times observations from population Y plus ∂, versus either of the alternative hypotheses that observations from population X are stochastically greater (less) than c times observations from population Y plus ∂, or symbolically, versus say which when expressed in terms of is equivalent to the null hypothesis (Equation 14)

(14)

If m and n are sufficiently large (m, n ≥ 8) such that mn is at least 30, then under the null hypothesis H_{0}, the test statistic

(15)

is approximately normally distributed with mean zero and variance 1. It may be used as a test statistic for H_{0} and compared with an appropriately chosen z critical value at the α level of significance. H_{0} is rejected at the α level of significance if

(16)

otherwise H_{0} is not rejected.
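
The comparison with a z critical value might be sketched as follows, assuming the usual large-sample decision rule for a standard normal test statistic. The function name `reject_h0` and the `alternative` labels are hypothetical; here `alpha` denotes the significance level, not the constant α in the null hypothesis.

```python
from statistics import NormalDist

def reject_h0(z, alpha=0.05, alternative="two-sided"):
    """Decide whether to reject H0 for an observed standard normal
    statistic z at significance level alpha."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "two-sided":
        return abs(z) >= std_normal.inv_cdf(1 - alpha / 2)
    if alternative == "greater":
        return z >= std_normal.inv_cdf(1 - alpha)
    if alternative == "less":
        return z <= std_normal.inv_cdf(alpha)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")

print(reject_h0(2.5))                        # True
print(reject_h0(1.0))                        # False
print(reject_h0(-1.86, alternative="less"))  # True
```
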

Or equivalently using the chi-square test

(17)

which has approximately the chi-square distribution with 1 degree of freedom for sufficiently large m and n. The null hypothesis of Equation 14 is rejected at the α level of significance if

(18)

Otherwise the null hypothesis is not rejected.
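
The equivalence between the two-sided z test and the chi-square test with 1 degree of freedom rests on the fact that the square of a standard normal variable has that chi-square distribution. A short numerical check, using the hypothetical value z = 1.96 and the helper name `chi2_1df_pvalue` introduced here for illustration, is:

```python
from math import erfc, sqrt
from statistics import NormalDist

def chi2_1df_pvalue(x):
    """Upper-tail probability of a chi-square variable with 1 degree of
    freedom, using P(chi2 > x) = erfc(sqrt(x / 2))."""
    return erfc(sqrt(x / 2.0))

z = 1.96                                     # hypothetical observed z value
p_z = 2 * (1 - NormalDist().cdf(abs(z)))     # two-sided normal P-value
p_chi = chi2_1df_pvalue(z * z)               # chi-square P-value of z squared

# The two P-values agree up to floating-point rounding.
print(p_z, p_chi)
```
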

**Confidence interval for Π ^{+} - Π^{-}**

A 100(1-α)% confidence interval for Π^{+} - Π^{-} is given as:

That is, a 100(1-α)% confidence interval for Π^{+} - Π^{-} is

(19)

Note that the proposed method is relatively more efficient and hence is likely to be more powerful than the Mann-Whitney U test. To show this we have that the relative efficiency of the Mann-Whitney U statistic to the proposed test statistic W is

(20)

That is, the proposed test statistic is relatively more efficient and hence likely to be more powerful than the Mann-Whitney U statistic when there are tied observations (Π^{0} ≥ 0) between the sampled populations, provided the combined sample size is at least 12.

The Packed Cell Volume (PCV) levels measured in percent of random samples of male non-smokers (X) and smokers (Y) from a certain population are presented in **Table 1**.

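Observation | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Smokers, y_{j} | 48 | 38 | 41 | 49 | 41 | 41 | 46 | 41 | 46 | 42 | 38 | 39 | 37 | 33 | | | | |
Non-smokers, x_{i} | 38 | 44 | 43 | 44 | 40 | 38 | 40 | 41 | 40 | 43 | 40 | 46 | 42 | 44 | 42 | 38 | 49 | 39 |
Adjusted smokers, 0.95y_{j} + 0.03 | 45.6 | 36.1 | 39.0 | 46.6 | 39.0 | 39.0 | 43.7 | 39.0 | 43.7 | 39.9 | 36.1 | 37.1 | 35.2 | 31.4 | | | | |

**Table 1:** PCV levels (percent) of Non-Smokers (X) and Smokers (Y), with the adjusted smoker values rounded to one decimal place.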

Interest is to test the null hypothesis that the median PCV level of non-smokers is 3 percent higher than 95 percent of the median PCV level of smokers.

To use the proposed method to test the null hypothesis, we first list the PCV levels of non-smokers (X) unchanged and then list the PCV levels of smokers (Y) after multiplying each by 0.95 and adding 0.03; these adjusted values appear in the last row of **Table 1**. The values of u_{ij} (Eqn 1) for the illustrative data of **Table 1** are shown in **Table 2**.

x_{i} | i | j=1 | j=2 | j=3 | j=4 | j=5 | j=6 | j=7 | j=8 | j=9 | j=10 | j=11 | j=12 | j=13 | j=14 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Adjusted smoker value | | 45.6 | 36.1 | 39.0 | 46.6 | 39.0 | 39.0 | 43.7 | 39.0 | 43.7 | 39.9 | 36.1 | 37.1 | 35.2 | 31.4 |
38 | 1 | -1 | 1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | 1 |
44 | 2 | -1 | 1 | 1 | -1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
43 | 3 | -1 | 1 | 1 | -1 | 1 | 1 | -1 | 1 | -1 | 1 | 1 | 1 | 1 | 1 |
44 | 4 | -1 | 1 | 1 | -1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
40 | 5 | -1 | 1 | 1 | -1 | 1 | 1 | -1 | 1 | -1 | 1 | 1 | 1 | 1 | 1 |
38 | 6 | -1 | 1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | 1 |
40 | 7 | -1 | 1 | 1 | -1 | 1 | 1 | -1 | 1 | -1 | 1 | 1 | 1 | 1 | 1 |
41 | 8 | -1 | 1 | 1 | -1 | 1 | 1 | -1 | 1 | -1 | 1 | 1 | 1 | 1 | 1 |
40 | 9 | -1 | 1 | 1 | -1 | 1 | 1 | -1 | 1 | -1 | 1 | 1 | 1 | 1 | 1 |
43 | 10 | -1 | 1 | 1 | -1 | 1 | 1 | -1 | 1 | -1 | 1 | 1 | 1 | 1 | 1 |
40 | 11 | -1 | 1 | 1 | -1 | 1 | 1 | -1 | 1 | -1 | 1 | 1 | 1 | 1 | 1 |
46 | 12 | 1 | 1 | 1 | -1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
42 | 13 | -1 | 1 | 1 | -1 | 1 | 1 | -1 | 1 | -1 | 1 | 1 | 1 | 1 | 1 |
44 | 14 | -1 | 1 | 1 | -1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
42 | 15 | -1 | 1 | 1 | -1 | 1 | 1 | -1 | 1 | -1 | 1 | 1 | 1 | 1 | 1 |
38 | 16 | -1 | 1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | 1 |
49 | 17 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
39 | 18 | -1 | 1 | 0 | -1 | 0 | 0 | -1 | 0 | -1 | -1 | 1 | 1 | 1 | 1 |

**Table 2:** Values of u_{ij} (Eqn 1) for the PCV levels of Table 1 (columns: adjusted smoker values; rows: non-smokers X).
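
The entries of Table 2 and their frequencies can be recomputed from the listed data. The sketch below assumes that u_{ij} is the sign of x_{i} minus the adjusted smoker value, and uses the x_{i} and adjusted values as listed in Table 2; the counts it prints are the output of this sketch, not figures quoted from the paper.

```python
# Non-smoker PCV levels x_i and adjusted smoker values, as listed in Table 2.
x = [38, 44, 43, 44, 40, 38, 40, 41, 40, 43, 40, 46, 42, 44, 42, 38, 49, 39]
y_adj = [45.6, 36.1, 39.0, 46.6, 39.0, 39.0, 43.7,
         39.0, 43.7, 39.9, 36.1, 37.1, 35.2, 31.4]

def sign(a, b):
    """Return 1 if a > b, -1 if a < b, and 0 if a equals b."""
    return (a > b) - (a < b)

# One row per non-smoker, one column per adjusted smoker value.
u_table = [[sign(xi, yj) for yj in y_adj] for xi in x]

flat = [entry for row in u_table for entry in row]
f_plus, f_zero, f_minus = flat.count(1), flat.count(0), flat.count(-1)
mn = len(x) * len(y_adj)

pi_plus, pi_zero, pi_minus = f_plus / mn, f_zero / mn, f_minus / mn

print(mn)                        # 252
print(f_plus, f_zero, f_minus)   # 173 4 75
print(round(pi_plus - pi_minus, 4))
```

The four ties all arise from the non-smoker value 39 meeting the four adjusted smoker values of 39.0.
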

From **Table 2** we have that

Therefore the test statistic for the null hypothesis H_{0} of equation 14 is from equation 15

which is highly statistically significant. Hence we would reject H_{0} and conclude that the median PCV level of non-smokers is much higher than that of smokers. For any chosen α level, a 100(1-α) percent confidence interval for Π^{+} - Π^{-} is estimated from Eqn 19 as

It may be instructive to compare the proposed method with the Mann-Whitney U test using the illustrative data. To do this we pool the PCV levels x_{i} of non-smokers with the adjusted PCV levels of smokers and then rank the combined sample observations, with the smallest assigned the rank 1 and the largest the rank 32. Tied PCV levels are, as usual, assigned their mean ranks. The results are shown in **Table 3**.

PCV levels of non-smokers (x_{i}) | Rank of x_{i} (r_{ix}) | Adjusted PCV levels of smokers | Rank of adjusted smoker value (r_{jy}) |
---|---|---|---|
38 | 7 | 45.6 | 29 |
44 | 27 | 36.1 | 3.5 |
43 | 22.5 | 39.0 | 11 |
44 | 27 | 46.6 | 31 |
40 | 16.5 | 39.0 | 11 |
38 | 7 | 39.0 | 11 |
40 | 16.5 | 43.7 | 24.5 |
41 | 19 | 39.0 | 11 |
40 | 16.5 | 43.7 | 24.5 |
43 | 22.5 | 39.9 | 14 |
40 | 16.5 | 36.1 | 3.5 |
46 | 30 | 37.1 | 5 |
42 | 20.5 | 35.2 | 2 |
44 | 27 | 31.4 | 1 |
42 | 20.5 | | |
38 | 7 | | |
49 | 32 | | |
39 | 11 | | |
Sum of ranks (R._{j}) | 346 | | 182 |

**Table 3:** Ranks of PCV levels of non-smokers and smokers for use with the Mann-Whitney U test.
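
The mean-rank assignment used in Table 3 can be reproduced with a short routine. This is a minimal sketch: the function name `midranks` is hypothetical, and the data are the non-smoker values and adjusted smoker values as listed in Table 3.

```python
def midranks(values):
    """Rank values from smallest (rank 1) to largest, giving tied values
    the mean of the ranks they jointly occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1   # positions are 0-based, ranks 1-based
        for k in order[pos:end + 1]:
            ranks[k] = mean_rank
        pos = end + 1
    return ranks

x = [38, 44, 43, 44, 40, 38, 40, 41, 40, 43, 40, 46, 42, 44, 42, 38, 49, 39]
y_adj = [45.6, 36.1, 39.0, 46.6, 39.0, 39.0, 43.7,
         39.0, 43.7, 39.9, 36.1, 37.1, 35.2, 31.4]

ranks = midranks(x + y_adj)
rank_sum_x = sum(ranks[:len(x)])
rank_sum_y = sum(ranks[len(x):])
print(rank_sum_x, rank_sum_y)  # 346.0 182.0
```

The two rank sums add up to 32(33)/2 = 528, as they must for a combined sample of 32 observations.
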

Now based on the rank of smokers, (R._{2}=R_{y}) the value of the Mann-Whitney U test is

With mean and estimated variance is given by

While the standard deviation is given by

Hence the Mann-Whitney U test statistic is

(P-value = 0.0314), which is statistically significant, again leading to a rejection of the null hypothesis. However, although both methods here lead to a rejection of the null hypothesis, the sizes of the attained significance levels indicate that the Mann-Whitney U test, at least for the present data, is likely to lead to an acceptance of a false null hypothesis (Type II error) more frequently and hence is likely to be less powerful than the proposed method. This is probably because the proposed test statistic, unlike the Mann-Whitney U test statistic, intrinsically and structurally provides for the possible presence of ties between the sampled populations and hence is able to use most of the information available in the data set. For the same reason the proposed method is also likely to be more powerful than the median test, which is known to be usually less powerful than the Mann-Whitney U test.
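
The Mann-Whitney calculation can be retraced from the rank sum of smokers in Table 3. The sketch below assumes the standard large-sample formulas, with no correction for ties or continuity and a one-sided P-value; these choices are assumptions, made because they give a P-value close to the quoted 0.0314.

```python
from math import sqrt
from statistics import NormalDist

m, n = 18, 14          # numbers of non-smokers and smokers
rank_sum_y = 182       # sum of ranks of smokers, from Table 3

# U statistic based on the smokers' rank sum.
u_stat = rank_sum_y - n * (n + 1) / 2          # 182 - 105 = 77

# Large-sample mean and variance of U under H0 (no tie correction assumed).
u_mean = m * n / 2                             # 126
u_var = m * n * (m + n + 1) / 12               # 693
u_sd = sqrt(u_var)

z = (u_stat - u_mean) / u_sd
p_one_sided = NormalDist().cdf(z)              # lower-tail probability

print(u_stat, u_mean, u_var)
print(round(z, 2), round(p_one_sided, 3))
```

With z rounded to -1.86, standard normal tables give a lower-tail probability of 0.0314; the unrounded value from this sketch differs only in the fourth decimal place.
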

We have in this paper presented a non-parametric statistical method for the analysis of non-homogeneous two-sample data from populations that may be measured on as low as the ordinal scale and need not be continuous. The proposed test statistic is intrinsically and structurally adjusted to provide for the possibility of tied observations between the sampled populations. The method can also be easily modified and used on two-sample data that are not necessarily non-homogeneous. The proposed method is illustrated with some data and shown to be at least as powerful as the Mann-Whitney U test.

1. Freund JE (1992) Mathematical Statistics (5th edition). Prentice Hall Inc, New York.
2. Hollander M, Wolfe DA (1999) Nonparametric Statistical Methods (2nd edition). Wiley, New York.
3. Gibbons JD (1970) Nonparametric Statistical Inference. McGraw-Hill, New York.
4. Oyeka Cyprain A (1996) An Introduction to Applied Statistical Methods (3rd edition). Nobern Avocation Publishing Company, Enugu, Nigeria.
