Intrinsically Ties Adjusted Sign Test by Ranks

This paper proposes and discusses a non-parametric statistical method for the analysis of paired or matched sample data based on ranks rather than on the raw scores themselves. The proposed method intrinsically and structurally adjusts the test statistic for the possible presence of tied observations between the sampled populations and hence obviates the need to require these populations to be continuous. The number k used in the ranking may be any real number and does not affect the result of the analysis. The proposed method can be used with both numeric and non-numeric measurements on as low as the ordinal scale and is easily modified for use with one-sample data. The method is illustrated with some data and shown to compare favorably with the usual sign test and the Wilcoxon signed rank sum test in cases where these two methods may equally be used in data analysis.

Department of Applied Statistics, Nnamdi Azikiwe University, Awka, Nigeria

Citation: Oyeka ICA (2012) Intrinsically Ties Adjusted Sign Test by Ranks. J Biom Biostat 3:154. doi:10.4172/2155-6180.1000154

Research Article, Open Access. J Biom Biostat, ISSN: 2155-6180, Volume 3, Issue 8, 1000154.

Now let $x_{i1}$ be assigned the rank $r_{i1} = k$, $k - 0.5$ or $k - 1$ if $x_{i1}$ is a higher (larger), the same (equal), or lower (smaller) score or observation than $x_{i2}$. Similarly, let $x_{i2}$ be assigned the rank $r_{i2} = k$, $k - 0.5$ or $k - 1$ if $x_{i2}$ is a higher (larger), the same (equal), or lower (smaller) score or observation than $x_{i1}$, for $i = 1, 2, \ldots, n$, where $(x_{i1}, x_{i2})$ is the $i$th pair of sample observations and $k$ is any real number.

Where

$\pi^{+} + \pi^{0} + \pi^{-} = 1$ (4)

Define

$W = \sum_{i=1}^{n} r_i \cdot u_i$ (5)


Introduction
If a researcher has paired sample data that satisfy the usual assumptions of normality and continuity, then the parametric paired-sample t-test may be used to determine whether the sampled populations have equal means or some other measure of central tendency. However, if these assumptions are not satisfied by the data, then the parametric t-test cannot be properly used for data analysis. Use of non-parametric methods is then indicated and preferable. Non-parametric methods that readily suggest themselves include the ordinary sign test for paired samples and the Wilcoxon Signed Rank Sum test [1][2][3]. The problem with these tests is that they require the data being analyzed to be continuous numeric measurements, and their power is often reduced if there are tied observations between the sampled populations. Methods for adjusting for ties that occur across samples are available but are often difficult to use in practice. Some of these methods include ignoring the tied observations and reducing the total sample size accordingly; randomly assigning the tied observations to one or the other portion of the data, dichotomized in some way by some chosen measure of interest; or assigning them their mean ranks [1,3-5]. These approaches do not, however, always recommend themselves because of the reduction in the power of the resulting test statistic, especially if the tied observations are not few. Siegel [6] has developed a method of estimating the effect of ties and correcting for them in the test statistic, but the calculations are often cumbersome and tedious in practical applications. Oyeka and others [5] have developed a method that intrinsically and structurally adjusts the test statistic for the possible presence of ties in the data and hence obviates the need to require the sampled populations to be continuous. However, this method cannot be used without modification to analyze non-numeric measurements on the ordinal scale.
Although the method, because it adjusts for ties, has been shown to be more powerful than the ordinary sign test and the Wilcoxon Signed Rank Sum test, it is nevertheless limited in that it is based on the raw data rather than on the ranks of the paired observations; basing the test on ranks is expected to further increase its power [2]. In this paper, we propose a modified sign test that is based on the ranks of the paired sample observations rather than only on the observations themselves, that intrinsically and structurally adjusts for any possible ties in the data, and that can be used with measurements on as low as the ordinal scale, whether numeric or non-numeric.

The proposed method
Let $(x_{i1}, x_{i2})$ be the $i$th pair of observations randomly drawn from populations $X_1$ and $X_2$ for $i = 1, 2, \ldots, n$. Populations $X_1$ and $X_2$ should be measurements on at least the ordinal scale, but they need not be (1) continuous, (2) normally distributed, (3) numerical or (4) independent.
Interest here is to develop a statistical method for the analysis of paired sample data that may be non-numeric measurements on at least the ordinal scale, using ranks assigned to these measurements. The proposed method makes provision for the possible presence of any tied observations between the sampled populations. Now let $R_{.1} = \sum_{i=1}^{n} r_{i1}$ and $R_{.2} = \sum_{i=1}^{n} r_{i2}$; that is, $R_{.1}$ and $R_{.2}$ are respectively the sums of the ranks assigned to the sample observations from populations $X_1$ and $X_2$.
Here $t$ is the number of tied observations between populations $X_1$ and $X_2$. Substituting in Equation 9 gives an expression in the $\pi$ probabilities that is independent of the real number $k$.
Here $\pi^{+}$, $\pi^{0}$ and $\pi^{-}$ are respectively the probabilities that, in a randomly selected pair of observations, the observation drawn from population $X_1$ is on the average higher (greater) than, the same as (equal to), or lower (smaller) than the observation drawn from population $X_2$.
The sample estimates of these probabilities are respectively $\hat{\pi}^{+} = f^{+}/n$, $\hat{\pi}^{0} = f^{0}/n$ and $\hat{\pi}^{-} = f^{-}/n$, where $f^{+}$, $f^{0}$ and $f^{-}$ are respectively the numbers of 1s, 0s and -1s in the frequency distribution of the $n$ values of these numbers in $u_i$, $i = 1, 2, \ldots, n$. A null hypothesis that is often of general interest is that the difference between the medians of the sampled populations is some constant value. In other words, a null hypothesis that may be tested is that the difference between the probability that an observation from one of the sampled populations is on the average greater than the observation from the other sampled population and the probability that it is less is some constant value $\beta_0$, say.
Notationally, the null hypothesis may be expressed as:

$H_0: \pi^{+} - \pi^{-} = \beta_0$ (13)

This null hypothesis may be tested using a test statistic $\chi^2$, expressed in terms of the estimates $\hat{\pi}^{+}$ and $\hat{\pi}^{-}$, which under $H_0$ has approximately the chi-square distribution with 1 degree of freedom for sufficiently large $n$. $H_0$ is rejected at the $\alpha$ level of significance if $\chi^2 \geq \chi^2_{1-\alpha;1}$; otherwise $H_0$ is accepted.
In particular, under the null hypothesis usually tested in paired sample problems ($\beta_0 = 0$), the test statistic $\chi^2$ takes a simpler form in $\hat{\pi}^{+}$ and $\hat{\pi}^{-}$. The modified sign test $W$ has been shown to be more efficient and hence more powerful than the usual Wilcoxon Signed Rank Sum Test $T^{+}$ [7]. However, the proposed modified sign test by ranks, which may for the moment be denoted by $W_r$, is also more efficient than the modified sign test statistic $W$ based on only raw scores. To show this, we note the relative efficiency of $W_r$ to $W$, which is a function of $\pi^{+}$ and $\pi^{-}$. Hence, for all sample sizes, the modified sign test by ranks is more efficient and thus more powerful than the corresponding modified sign test based on only raw scores whenever there are tied observations between the sampled populations.
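The estimation step described in this section can be sketched in a few lines of Python. The sketch below is a minimal illustration under stated assumptions, not the paper's rank-based statistic: it computes the relative-frequency estimates of the three probabilities from a list of $u_i$ values, and then forms a generic large-sample chi-square statistic for the raw-score sum $W = \sum u_i$, using the elementary fact that a variable taking values 1, 0, -1 has variance $\pi^{+} + \pi^{-} - (\pi^{+} - \pi^{-})^2$. The function names and the sample `u_sample` are hypothetical.

```python
from collections import Counter


def estimate_probabilities(u):
    """Relative frequencies of 1, 0 and -1 among the u_i values."""
    n = len(u)
    counts = Counter(u)
    return counts[1] / n, counts[0] / n, counts[-1] / n


def large_sample_chi_square(u, beta0=0.0):
    """Generic chi-square statistic for H0: pi_plus - pi_minus = beta0.

    Assumption: this uses the raw-score sum W = sum(u_i), whose estimated
    variance is n * (p_plus + p_minus - (p_plus - p_minus) ** 2).
    It is an illustration only and may differ from the rank-based statistic.
    Raises ZeroDivisionError if every u_i is the same non-zero value or
    if every pair is tied.
    """
    n = len(u)
    p_plus, _, p_minus = estimate_probabilities(u)
    w = sum(u)
    var_w = n * (p_plus + p_minus - (p_plus - p_minus) ** 2)
    return (w - n * beta0) ** 2 / var_w


# Hypothetical u_i values for ten pairs (not data from the paper).
u_sample = [1, 1, 0, 1, -1, 1, 0, 1, 1, -1]
estimates = estimate_probabilities(u_sample)
chi_square = large_sample_chi_square(u_sample)
print(estimates, round(chi_square, 4))
```

Tied pairs stay in the sample here: they contribute to $n$ and to the estimate of $\pi^{0}$, which is the sense in which the statistic is adjusted for ties rather than computed after discarding them.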

Illustrative example
We here illustrate the proposed method with two examples. The first is on ordinal non-numeric scores and the second is on numeric scores as follows.
1. A health insurance company every year assesses the vital signs of its clients for the purpose of determining the annual insurance premium payable. In this process, the company scores its clients from A+ (excellent health) through C (fair health) down to F (poorest health; fail). Clients with excellent health pay the lowest annual health premium, while clients with very poor scores pay the highest annual premium. The scores earned by a random sample of 15 clients of this health insurance company during the past two consecutive years are as follows:
2. The members of each of a random sample of 15 newly married couples (husband and wife) are asked to state their preferred family sizes (desired number of children), with the following results:

As noted above, the data of example 1, being ordinal non-numeric measurements, may only be analyzed using modified sign tests, using either raw scores or ranks. Thus, to analyze the data of example 1 using the proposed method, we assign the rank $r_{i1} = k$, $k - 0.5$ or $k - 1$ to the letter grade assigned to a client in year 1 if it is higher than, the same as, or lower than his grade in year 2. Similarly, we assign the rank $r_{i2} = k$, $k - 0.5$ or $k - 1$ if the grade earned by the client in year 2 is higher than, the same as, or lower than the grade he earned in year 1. The results are presented in Table 1 together with the differences between these ranks and other statistics.
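The ranking rule just described can be sketched as follows. This is a minimal illustration, assuming a hypothetical seven-point grade scale (the text names only A+, C and F) and hypothetical grade pairs; it shows that the rank differences $r_i = r_{i1} - r_{i2}$ take the values 1, 0 and -1 whatever real number $k$ is used.

```python
# Hypothetical ordinal scale, worst to best; only A+, C and F are named in the text.
GRADE_ORDER = ["F", "E", "D", "C", "B", "A", "A+"]
GRADE_LEVEL = {grade: level for level, grade in enumerate(GRADE_ORDER)}


def pair_ranks(a, b, k=1.0, key=lambda v: v):
    """Ranks (r_i1, r_i2) for one pair: k if higher, k - 0.5 if tied, k - 1 if lower."""
    level_a, level_b = key(a), key(b)
    if level_a > level_b:
        return k, k - 1
    if level_a == level_b:
        return k - 0.5, k - 0.5
    return k - 1, k


def rank_differences(pairs, k=1.0, key=lambda v: v):
    """Differences r_i = r_i1 - r_i2 for every pair."""
    return [r1 - r2 for r1, r2 in (pair_ranks(a, b, k, key) for a, b in pairs)]


# Hypothetical (year 1, year 2) grades for five clients.
pairs = [("A", "B"), ("C", "C"), ("D", "A+"), ("B", "B"), ("A+", "C")]
diffs_k1 = rank_differences(pairs, k=1.0, key=GRADE_LEVEL.get)
diffs_k2 = rank_differences(pairs, k=7.5, key=GRADE_LEVEL.get)
print(diffs_k1)              # [1.0, 0.0, -1.0, 0.0, 1.0]
print(diffs_k1 == diffs_k2)  # True
```

For numeric data such as the family size preferences of example 2, the same functions can be called with the default `key`, since the observations are then compared directly.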
Interest is to use the proposed method to determine whether the median scores of clients are the same for the two years, that is, whether clients are likely to pay equal insurance premiums in each of the two years (Equation 13, with $\beta_0 = 0$). To do this, we use the values in column 4 of Table 1. Table 2 presents the application of the proposed method to the numeric data of example 2.
Applying the proposed method to the numeric data on family size preferences by couples, using column 5 of Table 2, gives a result that is also statistically significant. However, the relative sizes of the chi-square values and the corresponding attained significance levels show that the proposed modified sign test by ranks is likely to be more powerful than its counterpart based on only raw scores, in that the latter is likely to accept a false null hypothesis (Type II error) more frequently than the former, which is able to use more information from the data being analyzed.

Table 1 column headings: Client No. | Score in year 1 ($x_{i1}$) | Score in year 2 ($x_{i2}$) | $u_i$ (1 if the year 1 score is higher than the year 2 score; 0 if the two are the same; -1 if the year 1 score is lower) | Rank of $x_{i1}$ ($r_{i1}$) | Rank of $x_{i2}$ ($r_{i2}$) | Difference ($r_i = r_{i1} - r_{i2}$)

The modified sign test based on only raw scores gives a result which is again statistically significant, although not as strongly significant as the one based on ranks. It may be instructive to analyze the numeric data of example 2 using the paired-sample parametric t test and the ordinary sign test for comparative purposes. Using the parametric t test, we obtain the sample mean difference in couple family size preference. The test statistic for the null hypothesis of equal population medians ($H_0: d_0 = 0$) is not now statistically significant, a conclusion that is occasioned by the fact that tied observations are excluded from the analysis.

Finally, it may also be instructive to compare the results obtained here using the modified sign test with what would have been obtained if the usual Wilcoxon Signed Rank Sum test were used to analyze the data of example 2. To do this, we rank the non-zero absolute differences $|d_i|$ of Table 2 from the smallest to the largest. The results are shown in column 11 of Table 2. From this column, we obtain a chi-square value which, with 1 degree of freedom, is not now statistically significant, leading to an acceptance of the null hypothesis of equal family size desires by newly married husbands and wives. Thus, the modified sign test by ranks is more likely to reject a false null hypothesis and accept a true one than the usual Wilcoxon Signed Rank Sum test. The problem with the Wilcoxon Signed Rank Sum test, unlike the modified sign tests, is that it ignores tied observations and is based on only non-zero absolute differences, which consequently leads to loss of power.
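To make concrete how the two comparison methods discard tied pairs, the following sketch computes the ordinary sign test (with an exact two-sided binomial p-value) and the Wilcoxon statistic $T^{+}$ on hypothetical family size data. The data, the function names, and the use of mid-ranks for equal absolute differences are assumptions for illustration; they are not the values of Table 2.

```python
from math import comb


def ordinary_sign_test(x1, x2):
    """Ordinary sign test: tied pairs are dropped before testing."""
    diffs = [a - b for a, b in zip(x1, x2)]
    n_plus = sum(d > 0 for d in diffs)
    n_minus = sum(d < 0 for d in diffs)
    m = n_plus + n_minus  # effective sample size after excluding ties
    if m == 0:
        return n_plus, n_minus, 1.0
    tail = min(n_plus, n_minus)
    # Exact two-sided p-value under Binomial(m, 1/2).
    p_value = 2 * sum(comb(m, j) for j in range(tail + 1)) / 2 ** m
    return n_plus, n_minus, min(1.0, p_value)


def wilcoxon_t_plus(x1, x2):
    """Sum of the ranks of positive differences among non-zero |d_i|."""
    diffs = [a - b for a, b in zip(x1, x2) if a != b]
    ordered = sorted(abs(d) for d in diffs)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2  # mid-rank of positions i+1..j
        i = j
    return sum(rank_of[abs(d)] for d in diffs if d > 0)


# Hypothetical preferred family sizes for ten couples.
husbands = [4, 3, 5, 2, 4, 3, 7, 3, 4, 5]
wives = [3, 3, 4, 2, 4, 2, 5, 4, 4, 4]
sign_result = ordinary_sign_test(husbands, wives)
t_plus = wilcoxon_t_plus(husbands, wives)
print(sign_result)  # (5, 1, 0.21875)
print(t_plus)       # 18.0
```

In this hypothetical sample four of the ten pairs are tied, so both procedures work with only six pairs, which illustrates the loss of information described above.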

Summary and Conclusion
In this paper, we developed a non-parametric modified sign test for the analysis of paired or matched sample data which accommodates paired ordinal data. The proposed test statistic is intrinsically and structurally adjusted to allow for the possible presence of tied observations between the sampled populations and hence obviates the need to require these populations to be continuous. The number k employed in the ranking of the paired observations may be any real number and does not affect the results of the analysis. The proposed method may be used with both numeric and non-numeric measurements on at least the ordinal scale. The method is also easily modified for use in the analysis of one-sample data. The proposed method is illustrated with some data and shown to be more powerful than the usual sign tests and at least as powerful as the modified sign test based on only raw scores or observations.