Received Date: October 14, 2010; Accepted Date: November 22, 2010; Published Date: November 24, 2010
Citation: Fernandez-Duran JJ, Gregorio-Dominguez MM (2010) A Likelihood Ratio Test for Homogeneity in Circular Data. J Biomet Biostat 1:107. doi:10.4172/2155-6180.1000107
Copyright: © 2010 Fernandez-Duran JJ, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Testing for the homogeneity of density functions of circular random variables is useful in many settings, including the study of wind patterns, paleocurrent trends, the seasonality of human-related events such as homicides and suicides, and the seasonality in the appearance of diseases. In this paper, we considered density functions that are members of the flexible family of circular distributions based on nonnegative trigonometric (Fourier) sums (series) developed by Fernandez-Duran . We constructed a test based on the likelihood ratio and applied it to simulated and real datasets.
Circular data; Nonnegative Fourier series
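If Θ is a circular random variable taking values θ ∈ (0, 2π), then the circular density function based on nonnegative trigonometric sums (the NNTS density), developed by Fernandez-Duran  and based on Fejér , is expressed as
$$f(\theta; M, \underline{c}) = \frac{1}{2\pi}\left\|\sum_{k=0}^{M} c_k e^{ik\theta}\right\|^2 = \frac{1}{2\pi}\sum_{k=0}^{M}\sum_{m=0}^{M} c_k\bar{c}_m e^{i(k-m)\theta},$$
where $\underline{c} = (c_0, c_1, \ldots, c_M)^\top$, the $c_k = c_{rk} + i c_{ck}$ are complex numbers for k = 0, …, M, and $\bar{c}_k = c_{rk} - i c_{ck}$ is the conjugate of $c_k$. The following constraint on the c parameters is imposed so that the density integrates to one:
$$\sum_{k=0}^{M}\|c_k\|^2 = \sum_{k=0}^{M}\left(c_{rk}^2 + c_{ck}^2\right) = 1,$$
with $c_{c0} = 0$ and $c_{r0} \geq 0$, i.e., $c_0$ is a nonnegative real number. The total number of free c parameters is therefore 2M. Here M is the total number of terms in the sum that defines the NNTS density; it equals the maximum number of modes of the density and is an additional unknown parameter.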
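As an illustration of this parametrization, here is a minimal Python sketch, assuming the parameters are held in a plain list of complex numbers (c_0, …, c_M). The helper names `normalize_c` and `nnts_density` are hypothetical and are not functions of the CircNNTSR library. Rescaling enforces the unit-norm constraint, and multiplying every c_k by one common unit-modulus factor makes c_0 real and nonnegative without changing the density.

```python
import cmath
import math


def normalize_c(c):
    """Rescale c so that sum_k |c_k|^2 = 1 and rotate it so that c_0 is real and >= 0."""
    norm = math.sqrt(sum(abs(ck) ** 2 for ck in c))
    c = [complex(ck) / norm for ck in c]
    if abs(c[0]) > 0:
        # A common unit-modulus factor leaves |sum_k c_k e^{ik theta}| unchanged.
        phase = c[0].conjugate() / abs(c[0])
        c = [ck * phase for ck in c]
    return c


def nnts_density(theta, c):
    """NNTS density f(theta) = |sum_{k=0}^{M} c_k e^{ik theta}|^2 / (2 pi)."""
    s = sum(ck * cmath.exp(1j * k * theta) for k, ck in enumerate(c))
    return abs(s) ** 2 / (2 * math.pi)


# Hypothetical parameter values for an M = 2 density.
c = normalize_c([1.0, 0.5 - 0.3j, 0.2 + 0.4j])

# An equally spaced Riemann sum integrates a trigonometric polynomial of
# degree 2M < 64 exactly (up to rounding), so this checks the normalization.
n_grid = 64
total = sum(nnts_density(2 * math.pi * j / n_grid, c) for j in range(n_grid)) * 2 * math.pi / n_grid
print(round(total, 10))  # 1.0
```

The density is nonnegative by construction (it is a squared modulus), which is why no extra positivity constraint is needed on c.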
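Equivalently, the NNTS density can be expressed as
$$f(\theta; M, \underline{c}) = \frac{1}{2\pi} + \frac{1}{\pi}\sum_{k=1}^{M}\left[a_k\cos(k\theta) + b_k\sin(k\theta)\right],$$
where $a_k - i b_k = \sum_{v=0}^{M-k} c_{v+k}\bar{c}_v$ for k = 1, 2, …, M. Note that for the NNTS family, the k-th trigonometric moment is $\varphi_k = E\left(e^{ik\Theta}\right) = a_k + i b_k$. The case M = 0 corresponds to the uniform circular density on (0, 2π). The NNTS family of circular distributions is very flexible for modeling datasets that present multimodality and/or skewness.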
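The equivalence between the two representations can be checked numerically. The sketch below, with hypothetical helper names and an illustrative M = 1 parameter vector, computes a_k and b_k from c via a_k - i b_k = sum_v c_{v+k} conj(c_v), compares the cosine-sine form with the squared-modulus form of the density, and reads off the first trigonometric moment as a_1 + i b_1.

```python
import cmath
import math


def nnts_ab(c):
    """Cosine/sine coefficients from c: a_k - i b_k = sum_{v=0}^{M-k} c_{v+k} conj(c_v)."""
    M = len(c) - 1
    a, b = [], []
    for k in range(1, M + 1):
        d = sum(c[v + k] * c[v].conjugate() for v in range(M - k + 1))
        a.append(d.real)
        b.append(-d.imag)
    return a, b


def density_from_ab(theta, a, b):
    """f(theta) = 1/(2 pi) + (1/pi) sum_k [a_k cos(k theta) + b_k sin(k theta)]."""
    s = sum(ak * math.cos(k * theta) + bk * math.sin(k * theta)
            for k, (ak, bk) in enumerate(zip(a, b), start=1))
    return 1 / (2 * math.pi) + s / math.pi


# Hypothetical M = 1 example: |0.8|^2 + |0.36 + 0.48i|^2 = 1 and c_0 is real.
c = [0.8 + 0j, 0.36 + 0.48j]
a, b = nnts_ab(c)
phi1 = complex(a[0], b[0])  # first trigonometric moment E[e^{i Theta}] = a_1 + i b_1

# Cross-check against the squared-modulus form of the density at a few angles.
for theta in (0.0, 1.0, 2.5, 4.0):
    direct = abs(c[0] + c[1] * cmath.exp(1j * theta)) ** 2 / (2 * math.pi)
    assert math.isclose(direct, density_from_ab(theta, a, b), rel_tol=1e-9)
```

For this parameter vector, a_1 = 0.288 and b_1 = -0.384, so the first trigonometric moment is 0.288 - 0.384i.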
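The cumulative distribution function of an NNTS density is easily calculated as:
$$F(\theta; M, \underline{c}) = \frac{1}{2\pi}\left[\theta + \sum_{k=0}^{M}\sum_{\substack{m=0\\ m\neq k}}^{M} c_k\bar{c}_m\,\frac{e^{i(k-m)\theta}-1}{i(k-m)}\right], \quad \theta \in (0, 2\pi).$$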
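A short sketch of this closed-form distribution function, assuming c is a list of complex coefficients; the diagonal terms are written as θ times the sum of |c_k|^2, which equals θ under the unit-norm constraint. It checks the formula against a midpoint-rule integral of the density. Function names and parameter values are illustrative only.

```python
import cmath
import math


def nnts_density(theta, c):
    """f(theta) = |sum_k c_k e^{ik theta}|^2 / (2 pi)."""
    return abs(sum(ck * cmath.exp(1j * k * theta) for k, ck in enumerate(c))) ** 2 / (2 * math.pi)


def nnts_cdf(theta, c):
    """F(theta) = [theta sum_k |c_k|^2
                   + sum_{k != m} c_k conj(c_m) (e^{i(k-m) theta} - 1) / (i(k-m))] / (2 pi)."""
    total = theta * sum(abs(ck) ** 2 for ck in c) + 0j  # diagonal terms k = m
    for k, ck in enumerate(c):
        for m, cm in enumerate(c):
            if k != m:
                total += ck * cm.conjugate() * (cmath.exp(1j * (k - m) * theta) - 1) / (1j * (k - m))
    # Off-diagonal terms come in conjugate pairs, so the sum is real.
    return total.real / (2 * math.pi)


# Hypothetical M = 1 parameters satisfying |c_0|^2 + |c_1|^2 = 1 with c_0 real.
c = [0.8 + 0j, 0.36 + 0.48j]

# Compare the closed form with a midpoint-rule integral of the density on (0, 1.5).
n = 20000
h = 1.5 / n
numeric = sum(nnts_density((j + 0.5) * h, c) for j in range(n)) * h
print(abs(numeric - nnts_cdf(1.5, c)) < 1e-8)  # True
```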
The original support of the NNTS circular distribution is the interval (0, 2π), but in applications related to the occurrence of events over time, it is more common to use the interval (0, 1) as the support. In that case, we transform the variable Θ to the variable $T = \Theta/(2\pi)$, with density function
$$f_T(t; M, \underline{c}) = 2\pi f(2\pi t; M, \underline{c})$$
and distribution function
$$F_T(t; M, \underline{c}) = F(2\pi t; M, \underline{c})$$
for t ∈ (0, 1). In the case that M = 0, the NNTS distribution is the uniform distribution on (0, 1), with $f_T(t) = 1$ and $F_T(t) = t$ for t ∈ (0, 1).
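Because T is only a rescaling of Θ, its density and distribution function can be obtained from the (0, 2π) versions, as in this sketch. The helper names are hypothetical, c is a list of complex coefficients, and the one-element list c = [1] represents the M = 0 (uniform) model.

```python
import cmath
import math


def nnts_density(theta, c):
    """f(theta) = |sum_k c_k e^{ik theta}|^2 / (2 pi) on (0, 2 pi)."""
    return abs(sum(ck * cmath.exp(1j * k * theta) for k, ck in enumerate(c))) ** 2 / (2 * math.pi)


def nnts_cdf(theta, c):
    """F(theta) on (0, 2 pi), from the closed form in terms of the c_k."""
    total = theta * sum(abs(ck) ** 2 for ck in c) + 0j
    for k, ck in enumerate(c):
        for m, cm in enumerate(c):
            if k != m:
                total += ck * cm.conjugate() * (cmath.exp(1j * (k - m) * theta) - 1) / (1j * (k - m))
    return total.real / (2 * math.pi)


def density_t(t, c):
    """Density of T = Theta / (2 pi) on (0, 1): f_T(t) = 2 pi f(2 pi t)."""
    return 2 * math.pi * nnts_density(2 * math.pi * t, c)


def cdf_t(t, c):
    """Distribution function of T: F_T(t) = F(2 pi t)."""
    return nnts_cdf(2 * math.pi * t, c)


# With M = 0 (c = [1]) the model is uniform on (0, 1): f_T(t) = 1 and F_T(t) = t.
print(density_t(0.3, [1.0]), cdf_t(0.3, [1.0]))
```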
Consider grouped data $n_1, n_2, \ldots, n_R$, where $n_i$ is the total number of occurrences of the event of interest in the i-th interval, $(R_{i-1}, R_i)$, of the R intervals into which (0, 1) has been partitioned, i = 1, …, R, with $R_0 = 0$ and $R_R = 1$. For example, when a year is the unit of time and months are the intervals, $R_{i-1}$ and $R_i$ are the beginning and end, respectively, of the i-th month in yearly terms, and R = 12. In the case of grouped data, the likelihood function, $L(\underline{c})$, is given by
$$L(\underline{c}) = \prod_{i=1}^{R}\left[F_T(R_i; M, \underline{c}) - F_T(R_{i-1}; M, \underline{c})\right]^{n_i},$$
where $F_T$ is the NNTS distribution function on (0, 1).
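The grouped-data log-likelihood can be evaluated directly from the distribution function of T. This sketch assumes month boundaries computed from a 365-day (non-leap) year and uses hypothetical counts; the multinomial coefficient is dropped because it does not depend on c, and `grouped_loglik` is a hypothetical name.

```python
import cmath
import math


def cdf_t(t, c):
    """F_T(t) = F(2 pi t) for the NNTS model with complex coefficients c."""
    theta = 2 * math.pi * t
    total = theta * sum(abs(ck) ** 2 for ck in c) + 0j
    for k, ck in enumerate(c):
        for m, cm in enumerate(c):
            if k != m:
                total += ck * cm.conjugate() * (cmath.exp(1j * (k - m) * theta) - 1) / (1j * (k - m))
    return total.real / (2 * math.pi)


def grouped_loglik(counts, bounds, c):
    """sum_i n_i log[F_T(R_i) - F_T(R_{i-1})], with bounds = [R_0, R_1, ..., R_R]."""
    ll = 0.0
    for n_i, lo, hi in zip(counts, bounds[:-1], bounds[1:]):
        if n_i > 0:  # empty intervals contribute nothing
            ll += n_i * math.log(cdf_t(hi, c) - cdf_t(lo, c))
    return ll


# Month boundaries in yearly terms, assuming a 365-day year.
days_in_month = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
bounds = [0.0]
for d in days_in_month:
    bounds.append(bounds[-1] + d / 365)

counts = [10] * 12  # hypothetical monthly counts, not data from the paper
print(grouped_loglik(counts, bounds, [1.0]))  # M = 0: sum_i n_i log(days_i / 365)
```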
Additional properties of the NNTS circular models are presented by Fernandez-Duran [1,3]. The maximum likelihood estimates of the parameters of the NNTS model are obtained using an efficient Newton-like algorithm that was developed by Fernandez-Duran  and implemented in the statistical software R  in the library CircNNTSR [6,7].
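The Newton-like algorithm of CircNNTSR is not reproduced here. As a generic stand-in, the following sketch maximizes the ungrouped log-likelihood by gradient ascent on the complex unit sphere: take a step along the Wirtinger gradient, renormalize so that the c_k still satisfy the unit-norm constraint, and halve the step until the log-likelihood increases. This is a different technique from the authors' algorithm and only finds a local maximum; the function names, the uniform starting point, and the iteration limit are assumptions.

```python
import cmath
import math
import random


def loglik(c, thetas):
    """Ungrouped NNTS log-likelihood sum_j log(|sum_k c_k e^{ik theta_j}|^2 / (2 pi))."""
    total = 0.0
    for th in thetas:
        dens = abs(sum(ck * cmath.exp(1j * k * th) for k, ck in enumerate(c))) ** 2 / (2 * math.pi)
        if dens <= 0.0:
            return -math.inf
        total += math.log(dens)
    return total


def fit_nnts(thetas, M, iters=200):
    """Gradient ascent with renormalization and step halving (not CircNNTSR's algorithm)."""
    c = [1.0 + 0j] + [0j] * M  # start from the uniform density
    ll, step, n = loglik(c, thetas), 1.0, len(thetas)
    for _ in range(iters):
        # Wirtinger gradient: d ll / d conj(c_k) = sum_j conj(e^{ik theta_j}) / conj(s_j)
        grad = [0j] * (M + 1)
        for th in thetas:
            w = [cmath.exp(1j * k * th) for k in range(M + 1)]
            s = sum(ck * wk for ck, wk in zip(c, w))
            for k in range(M + 1):
                grad[k] += w[k].conjugate() / s.conjugate()
        improved = False
        while step > 1e-12:
            trial = [ck + step * gk / n for ck, gk in zip(c, grad)]
            norm = math.sqrt(sum(abs(x) ** 2 for x in trial))
            trial = [x / norm for x in trial]  # back onto sum_k |c_k|^2 = 1
            trial_ll = loglik(trial, thetas)
            if trial_ll > ll:
                c, ll, improved = trial, trial_ll, True
                step *= 2.0
                break
            step /= 2.0
        if not improved:
            break  # no ascent step found: approximately at a local maximum
    if abs(c[0]) > 0:  # make c_0 real and nonnegative; the density is unchanged
        phase = c[0].conjugate() / abs(c[0])
        c = [ck * phase for ck in c]
    return c, ll


rng = random.Random(1)
thetas = [rng.vonmisesvariate(3.1962, 1.5510) for _ in range(100)]
c_hat, ll_hat = fit_nnts(thetas, M=2)
print(ll_hat, -len(thetas) * math.log(2 * math.pi))  # fitted vs. uniform log-likelihood
```

Because a step is only accepted when it increases the log-likelihood, the fitted value can never fall below that of the uniform starting point.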
Previous work on homogeneity tests for circular data includes that of Mardia and Spurr  for l-modal and axial von Mises distributions with modes separated by a constant arc. In addition, Mardia  developed tests against shift-type alternatives, based on uniform scores, as an extension of Wheeler and Watson . In particular, Rao's test of homogeneity [11,12] tests for the equality of the mean directions and dispersions of N circular populations by considering the means of the cosine and sine values, $\bar{X}_j$ and $\bar{Y}_j$ for j = 1, …, N, and by comparing the values of $\bar{Y}_j/\bar{X}_j$ to test the equality of the mean directions and the values of $\bar{X}_j^2 + \bar{Y}_j^2$ to test the equality of the dispersions . Watson's two-sample test of homogeneity uses the squared differences between the sample distribution functions as the test statistic . Tests for the equality of the mean directions and dispersion parameters of von Mises populations, developed by Mardia , Watson and Williams , Stephens , Watson , and Fisher , are closely related to the homogeneity tests. Similarly, Harrison et al. [19,20] and Stephens [21,22] have developed two-way and multi-way ANOVAs for circular data. Recently, Grimshaw et al. , Chen et al. , and Fu et al.  developed homogeneity tests for mixtures of von Mises distributions using likelihood ratio tests.
In contrast to previous work that used von Mises densities or mixtures of von Mises densities as the population densities, the main objective of the present paper is to develop a likelihood ratio test for the homogeneity of circular populations whose density functions are members of the flexible NNTS family, for both ungrouped and grouped observations.
This paper is divided into five sections, including this introduction. The second section outlines the development of the likelihood ratio test for homogeneity. In the third section, the proposed test is applied to simulated and real datasets. The fourth section includes a simulation exercise to study the type I error and power of the proposed test. Lastly, in the fifth section, the conclusions of the present work are presented.
Likelihood ratio homogeneity test
Ungrouped observations: Let $\theta_{k1}, \ldots, \theta_{kn_k}$ for k = 1, …, N be independent random samples from N different continuous circular populations, and let $F_k(\theta)$ and $f_k(\theta)$ be their respective distribution and density functions. A test of homogeneity has a null hypothesis given as
$$H_0: F_1 = F_2 = \cdots = F_N \quad (H_0: f_1 = f_2 = \cdots = f_N)$$
and an alternative hypothesis given as
$H_a$: $F_1, \ldots, F_N$ are not all the same ($H_a$: $f_1, \ldots, f_N$ are not all the same).
By considering that, for k = 1, …, N, $F_k$ ($f_k$) is a member of the NNTS family of circular distributions with parameters $M_k$ and $\underline{c}_k$, a test for homogeneity can be constructed from the likelihood ratio, Λ, defined as
$$\Lambda = \frac{\sup_{H_0} L}{\sup_{\Omega} L} = \frac{L_{H_0}(\hat{\underline{c}}_0)}{L_{\Omega}(\hat{\underline{c}}_1, \ldots, \hat{\underline{c}}_N)},$$
where $L_{H_0}(\hat{\underline{c}}_0)$ is the maximum of the likelihood function over the restricted parameter space specified by $H_0$, and $\hat{\underline{c}}_0$ is the maximum likelihood estimate under the null hypothesis. Note that we assume $M_k = M_0$ for k = 1, …, N and that the value of $M_0$ is specified before the test is applied. The researcher can select a value for $M_0$ by considering the maximum number of modes in all of the considered populations (taking into account the sample size of each population), because the NNTS models are nested with respect to increasing values of the parameter M. The quantity $L_{\Omega}(\hat{\underline{c}}_1, \ldots, \hat{\underline{c}}_N)$ is the maximum of the likelihood function over the complete parameter space Ω, and $\hat{\underline{c}}_1, \ldots, \hat{\underline{c}}_N$ are the unrestricted maximum likelihood estimates of the parameter vectors of the N considered circular populations. The likelihood function for the ungrouped data under the null hypothesis is
$$L_{H_0}(\underline{c}_0) = \prod_{k=1}^{N}\prod_{j=1}^{n_k} f(\theta_{kj}; M_0, \underline{c}_0),$$
where $f(\theta; M_0, \underline{c}_0)$ is the NNTS circular density common to all of the observations in all of the considered populations. The likelihood under the complete parameter space is
$$L_{\Omega}(\underline{c}_1, \ldots, \underline{c}_N) = \prod_{k=1}^{N}\prod_{j=1}^{n_k} f(\theta_{kj}; M_0, \underline{c}_k),$$
where $f(\theta; M_0, \underline{c}_k)$ is the NNTS circular density of the k-th population, with parameters $M_0$ and $\underline{c}_k$. Maximizing $L_{\Omega}$ is equivalent to maximizing, for each k = 1, …, N, the likelihood function of the k-th population, $\prod_{j=1}^{n_k} f(\theta_{kj}; M_0, \underline{c}_k)$, with respect to the corresponding parameter vector $\underline{c}_k$.
By asymptotic likelihood theory, under $H_0$, $-2\ln(\Lambda)$ converges in distribution to a chi-squared random variable whose number of degrees of freedom equals the difference in the number of free parameters between the unrestricted and restricted parameter spaces, that is, $2M_0N - 2M_0 = 2M_0(N-1)$.
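Once the restricted and unrestricted log-likelihoods have been maximized, the statistic and its asymptotic p-value are simple to compute. Because the degrees of freedom 2M_0(N - 1) are always even, the chi-squared upper tail has the closed form exp(-x/2) times the sum of (x/2)^i / i! for i < df/2, so this sketch needs no statistics library. The function names and the log-likelihood values in the example are hypothetical.

```python
import math


def chi2_sf_even(x, df):
    """Upper tail P(chi^2_df > x) for a positive even df:
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def nnts_lrt(separate_logliks, pooled_loglik, M0):
    """Return -2 ln(Lambda), its degrees of freedom 2*M0*(N - 1), and the asymptotic p-value.

    separate_logliks: maximized log-likelihoods of the N populations fitted separately.
    pooled_loglik: maximized log-likelihood of all observations under H0.
    """
    N = len(separate_logliks)
    stat = 2.0 * (sum(separate_logliks) - pooled_loglik)
    df = 2 * M0 * (N - 1)
    return stat, df, chi2_sf_even(stat, df)


# Hypothetical maximized log-likelihoods for two populations and for the pooled sample.
print(nnts_lrt([-90.0, -88.5], -180.0, M0=1))  # -2 ln(Lambda) = 3.0 on 2 d.f., p = exp(-1.5), about 0.223
```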
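Grouped observations: When the observed data consist of grouped observations, i.e., only the total number of occurrences of the circular random variables in the different intervals is observed, the likelihood function under the null hypothesis of homogeneity is
$$L_{H_0}(\underline{c}_0) = \prod_{k=1}^{N}\prod_{i=1}^{I_k}\left[F_T(R_{k,i}; M_0, \underline{c}_0) - F_T(R_{k,i-1}; M_0, \underline{c}_0)\right]^{n_{ki}},$$
where $F_T$ is the NNTS distribution function on (0, 1) and $n_{ki}$ is the total number of observations of population k in its i-th interval, $(R_{k,i-1}, R_{k,i})$. The total number of grouping intervals for population k is denoted by $I_k$. Note that the grouping intervals may differ across populations. The likelihood function for the whole parameter space is given by
$$L_{\Omega}(\underline{c}_1, \ldots, \underline{c}_N) = \prod_{k=1}^{N}\prod_{i=1}^{I_k}\left[F_T(R_{k,i}; M_0, \underline{c}_k) - F_T(R_{k,i-1}; M_0, \underline{c}_k)\right]^{n_{ki}}.$$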
In practice, the grouping intervals are commonly the same for all of the observed populations.
Ungrouped simulated data
Simulated von Mises data: A random sample of 100 observations from a von Mises density with location parameter µ = 3.1962 and concentration parameter κ = 1.5510 was generated. We divided the dataset into two: the first dataset included the first 50 observations, and the second included the last 50 observations. The results of applying the NNTS homogeneity likelihood ratio test to these datasets are included in Table 1. NNTS models of order M0 = 0, 1, …, 6 were fitted to each dataset of 50 observations (Population 1 and Population 2) and to the complete dataset of 100 observations (Populations 1 and 2). The columns in Table 1 contain the maximized log-likelihoods, the observed value of the likelihood ratio statistic, -2ln(Λ), the number of degrees of freedom (d.f.), and the corresponding p-values obtained by comparing the observed test statistic with a chi-squared distribution with 2M0 degrees of freedom. For M0 = 1, 2, …, 6, the observed values of the NNTS likelihood ratio statistic led, as expected, to not rejecting the null hypothesis of homogeneity.
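A sketch of this simulation setup using Python's standard library, assuming `random.Random.vonmisesvariate` as the von Mises generator and an arbitrary seed. The paper does not state which generator or seed was used, so these draws will not reproduce Table 1.

```python
import random

rng = random.Random(2010)  # arbitrary seed, used only for reproducibility
mu, kappa = 3.1962, 1.5510
sample = [rng.vonmisesvariate(mu, kappa) for _ in range(100)]  # angles in [0, 2*pi]

population_1 = sample[:50]   # first 50 observations
population_2 = sample[50:]   # last 50 observations
pooled = population_1 + population_2  # the combined sample fitted under H0

# Degrees of freedom 2*M0*(N - 1) for N = 2 populations and M0 = 1, ..., 6.
dfs = {M0: 2 * M0 * (2 - 1) for M0 in range(1, 7)}
print(dfs)  # {1: 2, 2: 4, 3: 6, 4: 8, 5: 10, 6: 12}
```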
|M0||Population 1||Population 2||Populations 1 and 2||-2ln(Λ)||d.f.||χ2 p-value|
Table 1: Results of the NNTS likelihood ratio test for homogeneity when applied to two samples of 50 data points from a von Mises population with µ = 3.1962 and κ = 1.5510.
Simulated circular uniform data: We generated seven random samples of 50 data points from a circular uniform distribution. Table 2 contains the results of the NNTS likelihood ratio test. For all of the considered values of M0, the null hypothesis of homogeneity was not rejected.
|M0||Pop. 1||Pop. 2||Pop. 3||Pop. 4||Pop. 5||Pop. 6||Pop. 7||Pops. 1 to 7||-2ln(Λ)||d.f.||χ2 p-value|
Table 2: Results of the NNTS likelihood ratio test for homogeneity when applied to seven samples of 50 data points from a circular uniform distribution.
Ungrouped real data
Hurricane occurrence data: There has been recent debate about whether the number of tropical storms and hurricanes that occur yearly in the North Atlantic Ocean has increased. Instead of analyzing a possible increase in their number, we analyzed whether there is evidence of a change in the starting dates of the tropical storms and hurricanes. The starting dates were obtained from http://weather.unisys.com/archive/index.html. This website includes tropical storms and hurricanes that occurred between 1851 and 2008 and is based on the HURDAT database of the National Oceanic and Atmospheric Administration (NOAA). The dates were converted to numbers between 0 and 1 representing the fraction of the year that had elapsed at the starting date of each tropical storm or hurricane, and these values were then multiplied by 2π to convert them to circular data. We then compared the starting dates of the storms that occurred between 1951 and 1970 with those of the storms that occurred between 1971 and 2008. Table 3 contains the results of the NNTS homogeneity test. For all of the considered values of M0, we did not reject the null hypothesis of homogeneity.
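The date conversion can be sketched as follows. The exact day-count convention is not stated in the text, so this version assumes that January 1 maps to 0 and divides by the actual length of the year (365 or 366 days); `date_to_angle` and `period_of` are hypothetical names.

```python
import datetime
import math


def date_to_angle(d):
    """Map a date to 2*pi times the fraction of its year that has elapsed.

    Assumption: January 1 maps to 0 and the fraction uses the actual year
    length (365 or 366 days); the original convention is not specified.
    """
    start = datetime.date(d.year, 1, 1)
    year_length = (datetime.date(d.year + 1, 1, 1) - start).days
    return 2 * math.pi * (d - start).days / year_length


def period_of(d):
    """Assign a storm start date to one of the two compared periods (None if outside both)."""
    if 1951 <= d.year <= 1970:
        return "1951-1970"
    if 1971 <= d.year <= 2008:
        return "1971-2008"
    return None


d = datetime.date(2007, 7, 2)  # hypothetical start date, 182 days after January 1
print(date_to_angle(d), period_of(d))
```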
Table 3: Results of the NNTS homogeneity test for the hurricane data.
Wind direction data: Mardia and Jupp  analyzed a dataset of wind directions, in degrees, at Gorleston, England, recorded between 11 a.m. and 12 noon on Sundays in 1968 and classified by season. The data are shown in Table 4.
Table 4: Wind directions in degrees at Gorleston, England .
Mardia and Jupp applied to this dataset a homogeneity test based on the ranks of the angles and Rayleigh's uniformity test, known as the uniform scores test , and rejected the null hypothesis of homogeneity at the 5% level (p-value = 0.046). We applied the NNTS homogeneity test to this dataset, and the results are shown in Table 5.
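For reference, a minimal sketch of the uniform scores statistic for N samples: each observation's rank r in the pooled sample of size n is replaced by the uniform score 2πr/n, and W = 2 Σ_k (C_k² + S_k²)/n_k, where C_k and S_k are the sums of the cosines and sines of the scores in sample k. The sketch assumes no ties and uses the large-sample chi-squared approximation with 2(N - 1) degrees of freedom, which may be rough for small samples. The toy angles are hypothetical, not the Gorleston data, and `uniform_scores_test` is a hypothetical name.

```python
import math


def uniform_scores_test(samples):
    """Uniform scores test for N circular samples (continuous data, no ties assumed).

    Returns W, its degrees of freedom 2(N - 1), and the asymptotic chi-squared p-value.
    """
    pooled = sorted((x, k) for k, sample in enumerate(samples) for x in sample)
    n = len(pooled)
    C = [0.0] * len(samples)
    S = [0.0] * len(samples)
    for rank, (_, k) in enumerate(pooled, start=1):
        score = 2 * math.pi * rank / n  # uniform score of this observation
        C[k] += math.cos(score)
        S[k] += math.sin(score)
    W = 2 * sum((C[k] ** 2 + S[k] ** 2) / len(sample) for k, sample in enumerate(samples))
    df = 2 * (len(samples) - 1)
    # Even d.f.: P(chi^2_df > W) = exp(-W/2) * sum_{i < df/2} (W/2)^i / i!
    half, term, total = W / 2, 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return W, df, math.exp(-half) * total


# Toy example with hypothetical angles in radians: the two samples do not overlap.
print(uniform_scores_test([[0.1, 0.2], [3.0, 3.1]]))  # W = 4 (up to rounding), 2 d.f., p = exp(-2)
```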
Table 5: NNTS homogeneity tests: Results for the wind directions dataset (Mardia and Jupp, 2000, pp. 137).
In contrast to the uniform scores test results of Mardia and Jupp , we did not reject the null hypothesis of homogeneity for the considered values of M0. This difference may be due to the small sample sizes.  also analyzed this dataset using the von Mises model and concluded that it was not possible to reject the null hypotheses of equal mean directions and equal concentration parameters.  also analyzed this dataset; by applying a uniform scores test to the Winter, Spring, and Autumn data, Fisher concluded that there was little evidence that the wind distributions in these three seasons differed.
Himalayan molasse data: Fisher  analyzed two samples of cross-bed measurements collected from the Himalayan molasse in Pakistan . The first sample consisted of 35 measurements of the Rakhi Nala ripple cross-beds, and the second consisted of 104 measurements of the Chaudhwan Zam large bedforms. Fisher suggested that the samples were drawn from von Mises distributions with different shapes but not necessarily different mean directions, applied a bootstrap test for a difference in mean directions, and did not find evidence that the mean directions differed. The results of the NNTS homogeneity test are shown in Table 6. For M0 = 2 and 3, the null hypothesis of homogeneity was rejected at the 1% significance level; for M0 = 4, 5, and 6, it was rejected at the 5% significance level; and for M0 = 1, it was not rejected (p-value = 0.3465). This example illustrates the need to select a sensible value of M0 before applying the test.
|M0||Set 1||Set 2||Combined set (1 and 2)||-2ln(Λ)||d.f.||χ2 p-value|
Table 6: NNTS homogeneity tests results for the cross-bed measurements from the Himalayan molasse dataset (Wells, 1990).
Grouped real data
Suicides and homicides data: The Mexican Statistical Agency (INEGI) reports the monthly numbers of suicides and homicides in Mexico every year. To compare the monthly patterns of suicides and homicides, we applied the NNTS homogeneity test to the monthly numbers of suicides and homicides for 2005 and 2007. Table 7 shows the monthly grouped raw data.
Table 7: Suicides and homicides in Mexico for the years 2005 and 2007.
Table 8 shows the results of applying the NNTS homogeneity test to the number of suicides and homicides in Mexico in 2005. The null hypothesis of homogeneity was rejected at the 5% significance level when using M0 = 1,2,3,4,5 and 6. Note that the case M0 = 6 corresponds to the saturated model.
|M0||Suicides 2005||Homicides 2005||Suicides and Homicides 2005||-2ln(Λ)||d.f.||χ2 p-value|
Table 8: NNTS homogeneity test results: Suicides and homicides in Mexico in 2005.
For the 2007 data, Table 9 shows the results of the NNTS homogeneity test: for M0 = 1, 2, 4, and 6, the test did not reject the null hypothesis of homogeneity, and only for M0 = 3 was it rejected at the 5% significance level. The change in the seasonal patterns of suicides and homicides from 2005 to 2007 may be the result of mis-reporting, over-reporting, or under-reporting of these events.
|M0||Suicides 2007||Homicides 2007||Suicides and Homicides 2007||-2ln(Λ)||d.f.||χ2 p-value|
Table 9: NNTS homogeneity test results: Suicides and homicides in Mexico in 2007.
Power and type I error of the NNTS test of homogeneity
To study the power and type I error rate of the NNTS test of homogeneity, we simulated data from three different models: a uniform circular distribution, a von Mises distribution with µ = 3.1962 and κ = 1.5510, and an NNTS density with M = 4. Figure 1 presents the graphs of the considered models. We generated 100 simulated datasets with sample sizes of 20, 50, and 100 for each of the three models. Comparing simulated datasets from different models allowed us to study the power of the test (the probability that the test rejects a false null hypothesis of homogeneity), and comparing simulated datasets from the same model allowed us to study its type I error (the probability that the test rejects a true null hypothesis of homogeneity).
Table 10 contains the power values obtained by simulating 100 different datasets for the considered models, applying the NNTS homogeneity likelihood ratio test, and counting the number of times that the NNTS test correctly rejected the null hypothesis of homogeneity for the different values of M0 at significance levels of 1%, 5%, and 10%. For comparison purposes, we also applied Watson's two-sample test of homogeneity and Rao's test of homogeneity for the mean directions and dispersions, using the R library circular . When we tested a uniform population against a von Mises or NNTS population, the NNTS and Watson's tests gave 100% correct decisions, but Rao's test had lower power, particularly when testing for the equality of mean directions. This result for Rao's test is a consequence of the very similar mean values of the simulated models (see Figure 1); however, Rao's test also showed low power in some cases when testing for the equality of dispersions. When comparing von Mises and NNTS populations with two datasets of sample size 20, Watson's and Rao's tests have higher power than the NNTS test. As the sample sizes increase, the power of the NNTS test increases much more quickly than that of Watson's test and Rao's test, and when both sample sizes are equal to 100, the NNTS test has much higher power than both. Generally, the maximum power of the NNTS test is obtained when M0 = 4, which corresponds to the true value of M.
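The power and type I error entries are rejection proportions over simulated datasets: with datasets from different models the proportion estimates the power, and with datasets from the same model it estimates the type I error. A minimal sketch with a hypothetical helper name and hypothetical p-values:

```python
def rejection_rates(p_values, alphas=(0.01, 0.05, 0.10)):
    """Share of simulated datasets whose p-value is below each significance level."""
    return {alpha: sum(p < alpha for p in p_values) / len(p_values) for alpha in alphas}


# Hypothetical p-values from four simulated datasets (the paper used 100 per setting).
print(rejection_rates([0.001, 0.03, 0.2, 0.07]))  # {0.01: 0.25, 0.05: 0.5, 0.1: 0.75}
```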
|Model 1||n1||Model 2||n2||α||M0 = 1||2||3||4||5||6||7||8||9||10||Watson||Rao Mean||Rao Disp.|
Table 10: NNTS homogeneity test: Statistical power
Table 11 contains the type I errors obtained by simulating 100 different datasets of the considered models, applying the NNTS homogeneity likelihood ratio test, and counting the number of times that the NNTS test erroneously rejected the null hypothesis of homogeneity for the different values of M0. For comparison, we also applied Watson's two-sample test of homogeneity and Rao's test of homogeneity for the mean directions and dispersions. When testing a uniform population against another uniform population, the NNTS test yielded type I error probabilities equal to zero, whereas Watson's and Rao's tests had higher type I error probabilities. When the homogeneity tests were applied to datasets simulated from the von Mises density, the type I errors of the NNTS tests were similar to or smaller than those of Watson's and Rao's tests. Lastly, when testing for homogeneity in NNTS populations, the type I errors of the NNTS test and of Watson's and Rao's tests were similar. Overall, the type I errors of the NNTS test were consistent with the significance levels used to perform the test.
|Model||n1||n2||α||M0 = 1||2||3||4||5||6||7||8||9||10||Watson||Rao Mean||Rao Disp.|
Table 11: NNTS homogeneity test: Type I errors.
For the grouped data, we simulated 100 datasets of sizes 20, 50, and 100 for two circular populations on the interval (0,1), partitioned into 12 subintervals of equal length. Table 12 presents the probabilities for each model. For Model 1, the probabilities were obtained using the formula for a simple sinusoidal curve  , and for Model 2,  for k = 1, …, 12 were used. The results in Table 13 and Table 14, where n1 and n2 denote the sizes of the two simulated samples, show that the NNTS test for homogeneity had good power and type I errors consistent with the considered significance levels. As expected, the NNTS model with M0 = 1 had the best power because we simulated from simple sinusoidal models.
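A sketch of drawing grouped samples for this design, assuming that the counts in the 12 equal-length subintervals follow a multinomial distribution. The probabilities below come from a hypothetical sinusoid evaluated at the subinterval midpoints; they are not the Model 1 or Model 2 probabilities of Table 12, and `simulate_grouped` is a hypothetical name.

```python
import math
import random

R = 12  # equal-length subintervals of (0, 1)
# Hypothetical sinusoidal probabilities at subinterval midpoints (not the paper's models).
raw = [1 + 0.5 * math.cos(2 * math.pi * (k - 0.5) / R) for k in range(1, R + 1)]
probs = [r / sum(raw) for r in raw]


def simulate_grouped(n, probs, rng):
    """Multinomial draw: counts of n observations falling in each subinterval."""
    counts = [0] * len(probs)
    for i in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[i] += 1
    return counts


rng = random.Random(7)  # arbitrary seed
counts_model_1 = simulate_grouped(50, probs, rng)
print(counts_model_1)
```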
|Subinterval||Model 1||Model 2|
Table 12: Probabilities of the simulated models for the grouped data.
|n1||n2||α||M0 = 1||2||3||4||5||6|
Table 13: NNTS homogeneity test for grouped data: Statistical power.
|n1||n2||α||M0 = 1||2||3||4||5||6|
Table 14: NNTS homogeneity test for grouped data: Type I Errors.
The family of circular distributions based on nonnegative trigonometric sums (NNTS) can model circular populations that present multimodality and/or skewness. This flexibility makes NNTS models suitable candidates for constructing homogeneity tests for circular data, in which samples from different circular populations are used to test the equality of the populations' circular distributions. In this paper, a likelihood ratio homogeneity test for circular data was constructed by assuming that the distributions of the different populations are, or can be approximated by, members of the NNTS family. Importantly, this test can be applied to ungrouped or grouped observations from these populations. The test was applied to simulated data, confirming its suitability, and to real datasets to draw conclusions about the homogeneity of circular populations. A simulation study showed that the type I error and power of the proposed test are similar to or better than those of Watson's test and Rao's test for the tested populations (uniform, von Mises, and NNTS).
The authors thank the referees for their useful comments and are grateful to the Asociación Mexicana de Cultura A.C. for its support.