A Likelihood Ratio Test for Homogeneity in Circular Data


Introduction
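If Θ is a circular random variable, Θ ∈ (0,2π), then the circular density function based on nonnegative trigonometric sums (the NNTS density), developed by Fernandez-Duran [1] and based on Fejer [2], is expressed as

$$f(\theta; M, c) = \left\| \sum_{k=0}^{M} c_k e^{ik\theta} \right\|^2 = \sum_{k=0}^{M} \sum_{m=0}^{M} c_k \bar{c}_m e^{i(k-m)\theta},$$

where $c_k = c_{rk} + i c_{ck}$ are complex numbers and $\bar{c}_k = c_{rk} - i c_{ck}$ are their conjugates. For the density to integrate to one, the parameters must satisfy $\sum_{k=0}^{M} \|c_k\|^2 = 1/(2\pi)$, with $c_{c0} = 0$ and $c_{r0} \geq 0$, i.e., $c_0$ is a nonnegative real number. The total number of free parameters in c is equal to 2M. Also, M is the total number of terms in the sum that defines the NNTS density, which is equal to the maximum number of modes of the density and is an additional unknown parameter.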
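Equivalently, the NNTS density can be expressed as

$$f(\theta; M, c) = \frac{1}{2\pi} + \frac{1}{\pi} \sum_{k=1}^{M} \left( a_k \cos(k\theta) + b_k \sin(k\theta) \right),$$

where

$$a_k - i b_k = 2\pi \sum_{\nu=0}^{M-k} c_{\nu+k} \bar{c}_{\nu}$$

for k = 1,2,…,M. Note that for the NNTS family, the k-th trigonometric moment is equal to $E(e^{ik\Theta}) = a_k + i b_k$. The case M = 0 corresponds to a uniform circular density on (0,2π). The NNTS family of circular distributions is flexible enough to model datasets that present multimodality and/or skewness.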
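The following Python sketch illustrates how an NNTS density and its trigonometric moments could be evaluated. It assumes the squared-modulus form of the density, with complex coefficients normalized so that the sum of their squared moduli equals 1/(2π). The function names and the example coefficients are hypothetical, and the code is not taken from the CircNNTSR library.

```python
import cmath
import math

def normalize(c):
    """Rescale complex coefficients so that sum(|c_k|^2) == 1/(2*pi) (assumed constraint)."""
    norm2 = sum(abs(ck) ** 2 for ck in c)
    scale = math.sqrt(1.0 / (2.0 * math.pi * norm2))
    return [ck * scale for ck in c]

def nnts_density(theta, c):
    """Squared modulus of the trigonometric sum with coefficients c[0..M]."""
    s = sum(ck * cmath.exp(1j * k * theta) for k, ck in enumerate(c))
    return abs(s) ** 2

def trig_moment(k, c):
    """k-th trigonometric moment E(exp(i*k*Theta)) = a_k + i*b_k."""
    M = len(c) - 1
    return 2.0 * math.pi * sum(c[v] * c[v + k].conjugate() for v in range(M - k + 1))

# Hypothetical coefficients for M = 2; c_0 is real and nonnegative.
c = normalize([1.0 + 0j, 0.5 - 0.3j, 0.2 + 0.1j])

# Midpoint-rule check that the density integrates to one over (0, 2*pi).
n = 2000
h = 2.0 * math.pi / n
total = sum(nnts_density((j + 0.5) * h, c) for j in range(n)) * h
```

Because the density is the squared modulus of a complex sum, it is nonnegative for any choice of coefficients, which is why the parameters can be estimated without explicit positivity constraints.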
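The cumulative distribution function of an NNTS density is easily calculated as

$$F(\theta; M, c) = \frac{\theta}{2\pi} + \frac{1}{\pi} \sum_{k=1}^{M} \left[ \frac{a_k}{k} \sin(k\theta) + \frac{b_k}{k} \left( 1 - \cos(k\theta) \right) \right].$$

The original support of the NNTS circular distribution is the interval (0,2π), but in applications that are related to the occurrence of events over time, it is more common to use the interval (0,1) as the support. In that case, we transform the variable Θ to the variable T by $T = \Theta/(2\pi)$, which has density

$$f_T(t; M, c) = 1 + 2 \sum_{k=1}^{M} \left( a_k \cos(2\pi k t) + b_k \sin(2\pi k t) \right)$$

and distribution function

$$F_T(t; M, c) = t + \frac{1}{\pi} \sum_{k=1}^{M} \left[ \frac{a_k}{k} \sin(2\pi k t) + \frac{b_k}{k} \left( 1 - \cos(2\pi k t) \right) \right]$$

for t ∈ (0,1). In the case that M = 0, the NNTS distribution is equivalent to a uniform distribution, with $f_T(t; M=0) = 1$ and $F_T(t; M=0) = t$ for t ∈ (0,1).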
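As an illustration of the change of support, the sketch below evaluates the density and the distribution function of T on (0,1) from the trigonometric moments. It assumes that T = Θ/(2π) and that the density is written in terms of a_k and b_k; the function names and the example values of a_1 and b_1 are hypothetical.

```python
import math

def nnts_pdf_unit(t, a, b):
    """Density of T on (0, 1); a[k-1] and b[k-1] hold a_k and b_k."""
    return 1.0 + 2.0 * sum(
        ak * math.cos(2.0 * math.pi * k * t) + bk * math.sin(2.0 * math.pi * k * t)
        for k, (ak, bk) in enumerate(zip(a, b), start=1)
    )

def nnts_cdf_unit(t, a, b):
    """Distribution function of T, obtained by integrating the density term by term."""
    return t + sum(
        (ak * math.sin(2.0 * math.pi * k * t)
         + bk * (1.0 - math.cos(2.0 * math.pi * k * t))) / (math.pi * k)
        for k, (ak, bk) in enumerate(zip(a, b), start=1)
    )

# Hypothetical moments for M = 1.
a, b = [0.3], [0.1]
print(nnts_cdf_unit(0.25, a, b))

# M = 0 (empty lists) gives the uniform case: density 1 and distribution function t.
print(nnts_pdf_unit(0.4, [], []), nnts_cdf_unit(0.4, [], []))
```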
For the case of grouped data $n_1, n_2, \ldots, n_R$, where $n_i$ is the total number of occurrences of the event of interest in the i-th interval, $(R_{i-1}, R_i)$, of a partition of (0,1) with endpoints $R_i$, i = 0,1,…,R, there are R such intervals. For example, when a year is used as the unit of time and months are used as the intervals, $R_{i-1}$ and $R_i$ are the beginning and end, respectively, of the i-th month in yearly terms, and R = 12. In the case of grouped data, the likelihood function, $L(M, c \mid n_1, n_2, \ldots, n_R)$, is given, up to a multiplicative constant that does not depend on the parameters, by

$$L(M, c \mid n_1, n_2, \ldots, n_R) = \prod_{i=1}^{R} \left[ F_T(R_i; M, c) - F_T(R_{i-1}; M, c) \right]^{n_i}.$$

Additional properties of the NNTS circular models are presented by Fernandez-Duran [1,3]. The maximum likelihood estimates of the parameters of the NNTS model are obtained using an efficient Newton-like algorithm that was developed by Fernandez-Duran [4] and implemented in the statistical software R [5] in the library CircNNTSR [6,7].
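The grouped-data likelihood can be sketched in Python as follows, assuming that each interval contributes its probability mass (the difference of the distribution function at its endpoints) raised to the observed count. The function name `grouped_loglik`, the monthly counts, and the sinusoidal distribution function are hypothetical illustrations rather than values from the paper.

```python
import math

def grouped_loglik(counts, edges, cdf):
    """Log-likelihood of grouped counts, up to the multinomial constant.

    counts[i] is the number of events in (edges[i], edges[i + 1]);
    cdf is any distribution function on (0, 1).
    """
    ll = 0.0
    for n_i, lo, hi in zip(counts, edges[:-1], edges[1:]):
        p_i = cdf(hi) - cdf(lo)  # probability mass of the interval
        if n_i > 0:
            ll += n_i * math.log(p_i)
    return ll

# Twelve equal-length "months" on (0, 1) and hypothetical monthly counts.
edges = [i / 12 for i in range(13)]
counts = [8, 9, 12, 14, 15, 13, 11, 9, 8, 7, 7, 7]

def uniform_cdf(t):
    return t

def sinusoidal_cdf(t):
    # Distribution function for M = 1 with hypothetical a_1 = 0.2 and b_1 = 0.
    return t + 0.2 * math.sin(2.0 * math.pi * t) / math.pi

print(grouped_loglik(counts, edges, uniform_cdf))
print(grouped_loglik(counts, edges, sinusoidal_cdf))
```

Working with the logarithm avoids numerical underflow when the counts are large, and the omitted multinomial constant cancels in any likelihood ratio.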
Previous work on homogeneity tests for circular data includes that of Mardia and Spurr [8] for l-modal and axial von Mises distributions with modes separated by a constant arc. In addition, Mardia [9] developed tests against shift-type alternatives, based on uniform scores as an extension of Wheeler and Watson [10]. Rao's test of homogeneity [11,12] tests for the equality of the mean directions and dispersions of N circular populations by considering the sample means of the cosines and sines of each population, $C_j$ and $S_j$, with $U_j = C_j^2 + S_j^2$ used for testing the equality of the dispersions [13]. Watson's two-sample test of homogeneity uses the squared differences between the sample distribution functions as the test statistic [14]. Tests for the equality of the mean directions and dispersion parameters of von Mises populations, developed by Mardia [9], Watson and Williams [15], Stephens [16], Watson [17], and Fisher [18], are closely related to homogeneity tests. Similarly, Harrison et al. [19,20] and Stephens [21,22] have developed two-way and multi-way ANOVAs for circular data. Recently, Grimshaw et al. [23], Chen et al. [24], and Fu et al. [25] developed homogeneity tests for mixtures of von Mises distributions using likelihood ratio tests.

In contrast to previous work that used von Mises or mixtures of von Mises densities as the population densities, the main objective of the present paper is to develop a likelihood ratio test for the homogeneity of circular populations by considering their density functions as members of the flexible NNTS family, for ungrouped and grouped observations. This paper is divided into five sections, including this introduction. The second section outlines the development of the likelihood ratio test for homogeneity. In the third section, the proposed test is applied to simulated and real datasets. The fourth section includes a simulation exercise to study the type I error and power of the proposed test. Lastly, the fifth section presents the conclusions of the present work.
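Likelihood ratio homogeneity test
Consider independent samples from N circular populations with distribution functions $F_1, \ldots, F_N$ (density functions $f_1, \ldots, f_N$). The test of homogeneity has the null hypothesis $H_0: F_1 = \cdots = F_N$ ($H_0: f_1 = \cdots = f_N$) and an alternative hypothesis given as $H_a$: $F_1, \ldots, F_N$ are not all the same ($H_a$: $f_1, \ldots, f_N$ are not all the same). By considering that for k = 1,…,N, $F_k$ ($f_k$) is a member of the NNTS family of circular distributions with parameters $M_k$ and $c_k$, a test for homogeneity can be constructed by considering the likelihood ratio, Λ. With a common value $M_0$ for all of the populations, this ratio is defined as

$$\Lambda = \frac{L_0(M_0, \hat{c}_0)}{L(M_0, \hat{c}_1, \ldots, \hat{c}_N)},$$

where $\hat{c}_0$ is the maximum likelihood estimate of the common parameter vector under the null hypothesis and $\hat{c}_1, \ldots, \hat{c}_N$ are the unrestricted maximum likelihood estimates, i.e., $\hat{c}_k$ is the unrestricted maximum likelihood estimate of the parameter c in the k-th of the N considered circular populations. The likelihood function for the ungrouped data under the null hypothesis is calculated as

$$L_0(M_0, c_0) = \prod_{k=1}^{N} \prod_{j=1}^{n_k} f(\theta_{kj}; M_0, c_0),$$

where $\theta_{kj}$ is the j-th of the $n_k$ observations from population k and $f(\cdot; M_0, c_0)$ is the common NNTS circular density function for all of the observations in all of the considered populations. The likelihood under the complete parameter space corresponds to

$$L(M_0, c_1, \ldots, c_N) = \prod_{k=1}^{N} \prod_{j=1}^{n_k} f_k(\theta_{kj}; M_0, c_k),$$

where $f_k(\cdot; M_0, c_k)$ corresponds to the NNTS circular density of the k-th population, with parameters $M_0$ and $c_k$. Maximizing each likelihood over its parameter space gives the estimates that define Λ. By the likelihood asymptotic theory, −2ln(Λ) converges to a chi-squared distributed random variable with a number of degrees of freedom that is equal to the difference in the number of free parameters in the unrestricted and restricted parameter spaces. Because each NNTS density of order $M_0$ has $2M_0$ free parameters, this difference is $2M_0 N - 2M_0 = 2M_0(N-1)$.

Table 4: Wind directions (in degrees) at Gorleston, England, classified by season.
Winter: 50, 120, 190, 210, 220, 250, 260, 290, 290, 320, 320, 340
Spring: 0, 20, 40, 60, 160, 170, 200, 220, 270, 290, 340, 350
Summer: 10, 10, 20, 20, 30, 30, 40, 150, 150, 150, 170, 190, 290
Autumn: 30, 70, 110, 170, 180, 190, 240, 250, 260, 260, 290, 350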

Grouped observations:
In the case that the observed data consist of grouped observations, i.e., only the total number of occurrences of the circular random variables in the different intervals is observed, the likelihood function for the null hypothesis of homogeneity is calculated as

$$L_0(M_0, c_0) = \prod_{k=1}^{N} \prod_{i=1}^{I_k} \left[ F(R_{ki}; M_0, c_0) - F(R_{k,i-1}; M_0, c_0) \right]^{n_{ki}},$$

where $n_{ki}$ is the total number of observations of population k in its i-th interval, $(R_{k,i-1}, R_{ki})$. The total number of grouping intervals for population k is denoted by $I_k$. Note that the grouping intervals may be different for each population. The likelihood function for the whole parameter space is given by

$$L(M_0, c_1, \ldots, c_N) = \prod_{k=1}^{N} \prod_{i=1}^{I_k} \left[ F_k(R_{ki}; M_0, c_k) - F_k(R_{k,i-1}; M_0, c_k) \right]^{n_{ki}}.$$

In practice, the grouping intervals are commonly the same for all of the observed populations.
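For either ungrouped or grouped observations, the test statistic and its p-value can be computed from the maximized log-likelihoods. The sketch below assumes that a common order M_0 is used for all N populations, so that the degrees of freedom equal 2M_0(N − 1), which follows from the count of 2M free parameters per NNTS density. Because this number is always even, the chi-squared survival function has a closed form and no external library is required. The function names and the log-likelihood values are hypothetical.

```python
import math

def chi2_sf_even(x, df):
    """Survival function of a chi-squared variable with even df (closed form)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j  # (x/2)^j / j!
        total += term
    return math.exp(-half) * total

def nnts_lrt(loglik_pooled, logliks_separate, M0):
    """Return (-2 ln Lambda, degrees of freedom, p-value).

    loglik_pooled: maximized log-likelihood under homogeneity (one common c).
    logliks_separate: maximized log-likelihoods of the N separately fitted populations.
    """
    N = len(logliks_separate)
    stat = -2.0 * (loglik_pooled - sum(logliks_separate))
    df = 2 * M0 * (N - 1)
    return stat, df, chi2_sf_even(max(stat, 0.0), df)

# Hypothetical maximized log-likelihoods for two populations fitted with M0 = 2.
stat, df, p_value = nnts_lrt(-170.0, [-84.0, -83.5], M0=2)
print(stat, df, p_value)
```

The null hypothesis of homogeneity would be rejected when the p-value falls below the chosen significance level. The case M_0 = 0 is excluded because both models are then uniform and there are no parameters to compare.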

Examples
Ungrouped simulated data
Simulated von Mises data: A random sample of 100 observations from a von Mises density, with a location parameter µ = 3.1962 and a dispersion parameter κ = 1.5510, was generated. We divided the dataset in two. The first dataset included the first 50 observations, and the second included the last 50 observations. The results of applying the NNTS homogeneity likelihood ratio test to these datasets are included in Table 1. The NNTS models of order $M_0$ = 0,1,…,6 were fitted to each dataset of 50 observations (Population 1 and Population 2) and to the complete dataset of 100 observations (Populations 1 and 2). The columns in Table 1 contain the values of the maximized log-likelihood, the observed value of the likelihood ratio statistic (−2ln(Λ)), the number of degrees of freedom (d.f.), and the corresponding p-values when comparing the observed test statistic with the distribution of a chi-squared random variable. For $M_0$ = 1,2,…,6, the observed values of the NNTS likelihood ratio statistic resulted, as expected, in the decision not to reject the null hypothesis of homogeneity.
Simulated circular uniform data: We generated seven random samples of 50 data points from a circular uniform distribution. Table 2 contains the results of the NNTS likelihood ratio test. For all of the considered values of $M_0$, the null hypothesis of homogeneity was not rejected.
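The simulated von Mises dataset could be generated along the following lines. This is a minimal sketch using the von Mises sampler in Python's standard library; the seed is arbitrary, so the sample is not the one analyzed in the paper, and the helper `mean_direction` is a hypothetical name.

```python
import math
import random

rng = random.Random(2024)  # arbitrary seed, used only for reproducibility
mu, kappa = 3.1962, 1.5510

# 100 angles from a von Mises density with mean direction mu and concentration kappa.
sample = [rng.vonmisesvariate(mu, kappa) for _ in range(100)]

# Split into two "populations" that share the same distribution by construction.
population_1, population_2 = sample[:50], sample[50:]

def mean_direction(angles):
    """Sample mean direction in [0, 2*pi)."""
    s = sum(math.sin(x) for x in angles)
    c = sum(math.cos(x) for x in angles)
    return math.atan2(s, c) % (2.0 * math.pi)

print(mean_direction(population_1), mean_direction(population_2))
```

Because both halves come from the same density, a homogeneity test applied to them should reject at roughly the nominal significance level over repeated simulations.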

Ungrouped real data
Hurricane occurrence data: Recently, there has been debate about whether the number of tropical storms and hurricanes that occur yearly in the North Atlantic Ocean has increased. Instead of analyzing the possibility of an increase in the number of tropical storms and hurricanes, we analyzed whether there is evidence of a change in their starting dates. The starting dates of the tropical storms and hurricanes were obtained from http://weather.unisys.com/archive/index.html. This website includes tropical storms and hurricanes that occurred between 1851 and 2008 and is based on the HURDAT database of the National Oceanic and Atmospheric Administration (NOAA). The dates were converted to numbers between 0 and 1 to represent the fraction of the year that had elapsed at the starting date of the tropical storm or hurricane. These values were then multiplied by 2π to convert them to circular data. Then, we compared the starting dates of the storms that occurred between 1951 and 1970 to those that occurred between 1971 and 2008. Table 3 contains the results of the NNTS homogeneity test. For all of the considered values of $M_0$, we did not reject the null hypothesis of homogeneity.
Wind direction data: Mardia and Jupp [26] analyzed a dataset of the wind directions in degrees at Gorleston, England, between 11 a.m. and 12 p.m. on Sundays in 1968, as classified by seasons. The data are shown in Table 4.
The authors applied to this dataset a homogeneity test based on the ranks of the angles and the uniformity test of Rayleigh, known as the uniform scores test [27]. They concluded that the null hypothesis was rejected at the 5% level (p-value = 0.046). We applied the NNTS homogeneity test to this dataset, and the results are shown in Table 5. In contrast to the uniform scores test results of Mardia and Jupp [26], we did not reject the null hypothesis of homogeneity for the considered values of $M_0$. This contrast may be due to the small sample sizes. Mardia and Jupp [26] also analyzed this dataset using the von Mises model and concluded that it was not possible to reject the null hypotheses of equal mean directions and equal concentration parameters. Fisher [28] also analyzed this dataset. By applying a uniform scores test to the Winter, Spring, and Autumn data, Fisher concluded that there was little evidence that the wind distributions in these three seasons differed.
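The conversion of starting dates to circular data can be illustrated with the short Python sketch below. It assumes day-level resolution and divides by the length of the particular year, so that leap years are handled; the paper does not state these details, and the function name and the example date are hypothetical.

```python
import math
from datetime import date

def date_to_circular(d):
    """Return (fraction of the year elapsed at date d, corresponding angle in radians)."""
    start = date(d.year, 1, 1)
    end = date(d.year + 1, 1, 1)
    fraction = (d - start).days / (end - start).days
    return fraction, 2.0 * math.pi * fraction

# Hypothetical starting date of a storm.
fraction, angle = date_to_circular(date(2005, 7, 2))
print(fraction, angle)
```

Treating the year as a circle means that a storm starting in late December is close to one starting in early January, which is lost if the dates are analyzed on a linear scale.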
Himalayan molasse data: Fisher [28] analyzed two samples of cross-bed measurements that were collected from the Himalayan molasse in Pakistan [29]. The first sample consisted of 35 measurements of the Rakhi Nala ripple cross-beds, and the second sample consisted of 104 measurements of the Chaudhwan Zam large bedforms. Fisher suggested that the samples were drawn from von Mises distributions with different shapes but not necessarily with different mean directions and applied a bootstrap test for different means. Fisher did not find evidence that the mean directions differed. The results of the NNTS homogeneity test are shown in Table 6. For $M_0$ equal to 2 and 3, the null hypothesis of homogeneity was rejected at the 1% significance level. For $M_0$ = 4, 5, and 6, the null hypothesis was rejected at the 5% significance level, and for $M_0$ = 1, the null hypothesis was not rejected because the p-value was equal to 0.3465. This example illustrates the necessity of selecting a sensible value for $M_0$ before applying the test.

Table 7 shows the monthly grouped raw data. Table 8 shows the results of applying the NNTS homogeneity test to the number of suicides and homicides in Mexico in 2005. The null hypothesis of homogeneity was rejected at the 5% significance level when using $M_0$ = 1, 2, 3, 4, 5, and 6. Note that the case $M_0$ = 6 corresponds to the saturated model.
For the data from the year 2007, Table 9 shows the results of the NNTS homogeneity test where, for $M_0$ = 1, 2, 4, and 6, the test did not reject the null hypothesis of homogeneity. Only for $M_0$ = 3 was the null hypothesis of homogeneity rejected at the 5% significance level. The change in the seasonal patterns of suicides and homicides from 2005 to 2007 may be the result of the misreporting, over-reporting, or underreporting of these crimes.

Power and type I error of the NNTS test of homogeneity
To study the power of the NNTS test of homogeneity and its type I error rate, we simulated data from three different models: a uniform circular distribution, a von Mises distribution with µ = 3.1962 and κ = 1.5510, and an NNTS density with M = 4. Figure 1 presents the graphs for the considered models. We generated 100 simulated datasets of sample sizes equal to 20, 50, and 100 for each of the three models.

[Table: simulation results by model pair, sample size, and significance level]
When comparing the simulated datasets from the different models, we studied the power of the test (the probability that the test will reject a false null hypothesis of homogeneity), and when comparing the simulated datasets from the same model, we studied the type I error of the test (the probability that the test will reject a true null hypothesis of homogeneity). Table 10 contains the values of the power of the test that were obtained by simulating 100 different datasets for the considered models, applying the NNTS homogeneity likelihood ratio test, and counting the number of times that the NNTS test correctly rejected the null hypothesis of homogeneity for the different values of $M_0$ and significance levels of 1%, 5%, and 10%. For comparison purposes, we also applied Watson's two-sample test of homogeneity and Rao's test of homogeneity for the mean directions and dispersions using the R library circular [30]. When we tested a uniform population against a von Mises or NNTS population, the NNTS and Watson's tests gave 100% correct decisions, but Rao's test had lower power, specifically when testing for the equality of mean directions. The results for Rao's test are a consequence of the fact that the mean values of the simulated models are very similar (see Figure 1); however, there are also cases with low power when testing for the equality of dispersions using Rao's test. When comparing von Mises and NNTS populations in two datasets that have a sample size equal to 20, Watson's and Rao's tests have higher power than the NNTS test. As the sample sizes increase, the power of the NNTS test increases much more quickly than that of Watson's test and Rao's test. When both sample sizes are equal to 100, the NNTS test has much higher power than Watson's test and Rao's test. Generally, the maximum power of the NNTS test is obtained when $M_0$ = 4, which corresponds to the true M value.
For the grouped data, we simulated 100 datasets of sizes 20, 50, and 100 for two circular populations in the interval (0,1), which was partitioned into subintervals of equal length 1/12. Table 12 presents the probabilities for each model. For Model 1, the probabilities were obtained using the formula for a simple sinusoidal curve [31]. As shown in Tables 13 and 14, where $n_1$ and $n_2$ are the number of observations that were simulated from models 1 and 2, respectively, the NNTS test for homogeneity presented good power and type I errors that were congruent with the considered significance levels. As expected, the NNTS model with $M_0$ = 1 presented the best power because we were simulating from simple sinusoidal models.

Conclusions
The family of circular distributions that is based on nonnegative trigonometric sums (NNTS) is able to model circular populations that present multimodality and/or skewness. This flexibility of the NNTS models makes them suitable candidates for the construction of homogeneity tests for circular data when samples from different circular populations are used to test the equality of the populations' circular distributions. In this paper, a likelihood ratio homogeneity test for circular data was constructed by considering that the distributions of the different populations are, or can be approximated by, members of the NNTS family. Importantly, this test can be applied to ungrouped or grouped observations from these populations. The test was applied to simulated data, confirming the suitability of the test. The test was also applied to real datasets to obtain conclusions about the homogeneity of the circular populations. A simulation study showed that the type I error and the power of the proposed test are similar to or better than those of Watson's test and Rao's test for the tested populations (uniform, von Mises, and NNTS).