Statistical Analysis of Case-Control Data of Endometrial Cancer Based on New Asymmetry Models

Background: This study concerns the data from the Los Angeles study of endometrial cancer reported in Breslow and Day, obtained from 59 matched pairs using four dose levels of conjugated oestrogen. We propose new statistical models with an easy interpretation, as an approach to assessing these data more appropriately. Methods: We propose new statistical models for analyzing the endometrial cancer data, apply them to the data, and compare and assess the models considered. Results: We found a preferable model that fits the data better than some existing models. Under this model, the average dose of oestrogen for the case in a matched pair tends to be greater than that for the control in the pair. Conclusions: We proposed two kinds of statistical models and concluded that the average dose for the case tends to be greater than that for the control.

Citation: Yamamoto K, Tomizawa S (2012) Statistical Analysis of Case-Control Data of Endometrial Cancer Based on New Asymmetry Models. J Biomet Biostat 3:147. doi:10.4172/2155-6180.1000147. Journal of Biometrics & Biostatistics, ISSN: 2155-6180, an open access journal, Volume 3, Issue 5.


Introduction
Consider the data in Table 1, taken directly from Breslow and Day [1]. These are from the Los Angeles study of endometrial cancer and were obtained from 59 matched pairs using four dose levels of conjugated oestrogen: (1) none, (2) 0.1-0.299 mg, (3) 0.3-0.625 mg, and (4) 0.626+ mg (per day). For these data, we are interested in (a) how many times higher the probability that the average dose of oestrogen for the case in a matched pair is in category i and that for the control in the pair is in category j (< i) is than the probability that the average dose for the case is in category j and that for the control is in category i, and (b) how many times higher the probability that the average dose for the case in a pair is in category i or above and that for the control is in category j (< i) or below is than the probability that the average dose for the case is in category j or below and that for the control is in category i or above. In particular, we are interested in how many times higher the probability that the average dose for the case in a pair is not zero (i.e., in categories 2, 3, or 4) and that for the control is zero (i.e., in category 1) is than the probability that the average dose for the case is zero and that for the control is not zero. That is, we are interested in the structure of asymmetry between the probabilities for the average dose of the case and that of the control in a pair.
Agresti considered an asymmetry model called the linear diagonals-parameter symmetry (LDPS) model [2]. Miyamoto et al. considered an asymmetry model called the cumulative linear diagonals-parameter symmetry (CLDPS) model [3], and applied this model to the data in Table 1.
The present paper (1) reviews some asymmetry models, (2) proposes new asymmetry models which are generalizations of the LDPS model and CLDPS model, and (3) analyzes the data in Table 1 using these new models.

Reviews of models
Consider an r × r square contingency table with the same row and column classifications, as in Table 1. Let $p_{ij}$ denote the probability that an observation falls in the $i$th row and $j$th column of the table ($i = 1,\ldots,r$; $j = 1,\ldots,r$). As a model which indicates the structure of asymmetry for $\{p_{ij}\}$, the LDPS model is given as

$$p_{ij} = \delta^{\,i-j}\, p_{ji} \quad (i > j).$$

For the endometrial cancer data in Table 1, this model indicates that the probability that the average dose of oestrogen for the case in a matched pair is in category $i$ and that for the control in the pair is in category $j$ ($< i$) is $\delta^{\,i-j}$ times higher than the probability that the average dose for the case is in category $j$ and that for the control is in category $i$. If $\delta > 1$, then the average dose of oestrogen for the case in a pair tends to be greater than that for the control in the pair. A special case of the LDPS model obtained by putting $\delta = 1$ is the symmetry (S) model [4,5]. Also, the LDPS model with $\{\delta^{\,i-j}\}$ replaced by $\{\gamma\delta^{\,i-j}\}$ is the two ratio-parameter symmetry (2RPS) model [6]. A special case of the 2RPS model obtained by putting $\delta = 1$ is the conditional symmetry (CS) model [7].
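The LDPS structure can be illustrated with a minimal sketch. It assumes the model has the form p_ij = δ^(i-j) p_ji for i > j, which is the form implied by the interpretation above. The table weights and the value δ = 2 are hypothetical and are not taken from Table 1; the function names are illustrative only.

```python
def make_ldps_table(upper, diag, delta):
    """Build an r x r probability table satisfying p_ij = delta**(i-j) * p_ji for i > j.

    upper[j][i] (j < i) holds hypothetical unnormalised weights above the diagonal,
    diag[i] holds hypothetical diagonal weights.
    """
    r = len(diag)
    w = [[0.0] * r for _ in range(r)]
    for i in range(r):
        w[i][i] = diag[i]
        for j in range(i):
            w[j][i] = upper[j][i]                     # cell above the diagonal
            w[i][j] = delta ** (i - j) * upper[j][i]  # mirrored cell below the diagonal
    total = sum(sum(row) for row in w)
    return [[x / total for x in row] for row in w]


def below_over_above(p):
    """Return {(i, j): p_ij / p_ji} for i > j, using 1-based category labels."""
    r = len(p)
    return {(i + 1, j + 1): p[i][j] / p[j][i] for i in range(r) for j in range(i)}


# Hypothetical 3 x 3 example (not the endometrial cancer data).
upper = [[0.0, 0.05, 0.02],
         [0.0, 0.0, 0.04],
         [0.0, 0.0, 0.0]]
diag = [0.30, 0.20, 0.10]
p = make_ldps_table(upper, diag, delta=2.0)
ratios = below_over_above(p)
for cell, ratio in sorted(ratios.items()):
    print(cell, round(ratio, 6))
```

In this sketch the ratio depends only on the distance i - j from the main diagonal, which is the defining feature of the model: cells one step from the diagonal share the ratio δ, cells two steps away share δ².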
Let, for $i > j$,

$$G_{ij} = \sum_{s=i}^{r}\sum_{t=1}^{j} p_{st}, \qquad G_{ji} = \sum_{s=1}^{j}\sum_{t=i}^{r} p_{st}.$$

For the endometrial cancer data, (1) $G_{ij}$ for $i > j$ is the cumulative probability that the average dose for the case in a pair is in category $i$ or above and that for the control in the pair is in category $j$ or below, and (2) $G_{ji}$ for $i > j$ is the cumulative probability that the average dose for the case in a pair is in category $j$ or below and that for the control in the pair is in category $i$ or above.
As a model which indicates the structure of asymmetry for $\{G_{ij}\}$, $i \neq j$, the CLDPS model is defined by

$$G_{ij} = \Delta^{\,i-j}\, G_{ji} \quad (i > j).$$

The CLDPS model is different from the LDPS model. For the endometrial cancer data in Table 1, the CLDPS model indicates that the probability that the average dose for the case in a pair is in category $i$ or above and that for the control in the pair is in category $j$ ($< i$) or below is $\Delta^{\,i-j}$ times higher than the probability that the average dose for the case is in category $j$ or below and that for the control is in category $i$ or above. If $\Delta > 1$, then the average dose for the case in a pair tends to be greater than that for the control in the pair. Also, the CLDPS model with $\{\Delta^{\,i-j}\}$ replaced by $\{\Gamma\Delta^{\,i-j}\}$ is the cumulative two ratios-parameter symmetry (C2RPS) model [8].
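The cumulative probabilities can be computed directly from a table of cell probabilities. The sketch below assumes G_ij sums p_st over s ≥ i and t ≤ j when i > j, and over s ≤ i and t ≥ j when i < j, which matches the verbal description above. The 3 × 3 table is hypothetical and is not the data of Table 1.

```python
def cumulative_G(p, i, j):
    """Return G_ij for 1-based categories i != j.

    i > j: rows i..r and columns 1..j (row category i or above, column j or below).
    i < j: rows 1..i and columns j..r (row category i or below, column j or above).
    """
    if i == j:
        raise ValueError("G_ij is defined only for i != j")
    r = len(p)
    if i > j:
        rows, cols = range(i - 1, r), range(0, j)
    else:
        rows, cols = range(0, i), range(j - 1, r)
    return sum(p[s][t] for s in rows for t in cols)


# Hypothetical cell probabilities (they sum to 1).
p = [[0.30, 0.05, 0.02],
     [0.10, 0.20, 0.04],
     [0.08, 0.11, 0.10]]

r = len(p)
cum_ratios = {(i, j): cumulative_G(p, i, j) / cumulative_G(p, j, i)
              for i in range(2, r + 1) for j in range(1, i)}
for cell, ratio in sorted(cum_ratios.items()):
    print(cell, round(ratio, 4))
```

Under the CLDPS model these ratios G_ij / G_ji would equal Δ^(i-j); for an arbitrary table such as this one they need not follow that pattern, which is what a goodness-of-fit test assesses.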

New models
We shall propose two kinds of new models. First, consider a generalization of the LDPS model as follows: for a fixed integer $K$ ($K = 0, 1, 2, \ldots; -1, -2, \ldots$),

$$p_{ij} = \delta^{\,K+(i-j)}\, p_{ji} \quad (i > j).$$

We shall denote this model by LDPS(K). Then the LDPS(0) model is equivalent to the LDPS model. Also, the LDPS(-r) model is equivalent to another LDPS model, proposed by Tomizawa [9]. For the endometrial cancer data, the LDPS(K) model indicates that the probability that the average dose of oestrogen for the case in a pair is in category $i$ and that for the control in the pair is in category $j$ ($< i$) is $\delta^{\,K+(i-j)}$ times higher than the probability that the average dose for the case is in category $j$ and that for the control is in category $i$. If $\delta > 1$ with $K \geq 1$, then the average dose for the case in a pair tends to be greater than that for the control in the pair, and the tendency is stronger under the LDPS(K) model than under the LDPS model, because

$$\delta^{\,K+(i-j)} > \delta^{\,i-j} \quad (i > j).$$

Secondly, consider a generalization of the CLDPS model as follows: for a fixed integer $K$ ($K = 0, 1, 2, \ldots; -1, -2, \ldots$),

$$G_{ij} = \Delta^{\,K+(i-j)}\, G_{ji} \quad (i > j).$$

We shall denote this model by CLDPS(K). Then the CLDPS(0) model is equivalent to the CLDPS model. For the endometrial cancer data, the CLDPS(K) model indicates that the probability that the average dose of oestrogen for the case in a pair is in category $i$ or above and that for the control in the pair is in category $j$ ($< i$) or below is $\Delta^{\,K+(i-j)}$ times higher than the probability that the average dose for the case is in category $j$ or below and that for the control is in category $i$ or above. If $\Delta > 1$ with $K \geq 1$, then the average dose for the case in a pair tends to be greater than that for the control in the pair, and the tendency is stronger under the CLDPS(K) model than under the CLDPS model, because

$$\Delta^{\,K+(i-j)} > \Delta^{\,i-j} \quad (i > j).$$

The CLDPS(K) model is different from the LDPS(K) model. The CLDPS(K) model indicates how the cumulative probabilities $\{G_{ij}\}$ for $i > j$ are asymmetric to $\{G_{ji}\}$, and the LDPS(K) model indicates how the cell probabilities $\{p_{ij}\}$ for $i > j$ are asymmetric to $\{p_{ji}\}$.
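The effect of the shift K on the multipliers can be tabulated with a short sketch. It assumes the multiplier has the form base^(K + (i - j)) for i > j, where the base stands for δ in the LDPS(K) model or Δ in the CLDPS(K) model. The base value 1.5 and the function name are hypothetical.

```python
def asymmetry_multiplier(base, K, i, j):
    """Multiplier base**(K + (i - j)) linking cell (i, j) to cell (j, i), for i > j."""
    if i <= j:
        raise ValueError("the multiplier is defined for i > j")
    return base ** (K + (i - j))


r = 4          # four dose categories, as in the endometrial cancer data
base = 1.5     # hypothetical parameter value greater than 1
for K in (0, 1, 3):
    row = {(i, j): round(asymmetry_multiplier(base, K, i, j), 3)
           for i in range(2, r + 1) for j in range(1, i)}
    print("K =", K, row)
```

With a base above 1, every multiplier for K ≥ 1 exceeds the corresponding K = 0 multiplier by the constant factor base^K, which is the sense in which the tendency is stronger than under the unshifted model.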
For the endometrial cancer data, we are also interested in how many times higher the probability that the average dose of oestrogen for the case in a pair is not zero (i.e., in categories 2, 3, or 4) and that for the control in the pair is zero (i.e., in category 1) is than the probability that the average dose for the case is zero and that for the control is not zero. Under the CLDPS(K) model, the former probability is $\Delta^{K+1}$ times higher than the latter (this is the case $i = 2$, $j = 1$ of the model), although we cannot see such a structure under the LDPS(K) model.

The number of degrees of freedom for the LDPS(K) model, and likewise for the CLDPS(K) model, is $r(r-1)/2 - 1$, which is one less than that for the S model and one more than that for the 2RPS (C2RPS) model.

Analysis of Data
We shall analyze the endometrial cancer data in Table 1 using the models in the preceding sections. Table 2 gives the values of the likelihood ratio test statistic $G^2$ for each model. Note that the LDPS(0) model is equivalent to the LDPS model, and the CLDPS(0) model is equivalent to the CLDPS model. The S model fits these data poorly. Therefore, we infer that the probability that the average dose of oestrogen for the case in a matched pair is in category $i$ and that for the control in the pair is in category $j$ ($< i$) is not equal to the probability that the average dose for the case is in category $j$ and that for the control is in category $i$.
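As an illustration of how a value of $G^2$ is obtained, the following sketch computes the likelihood ratio statistic for the S model only, using the standard result that the maximum likelihood estimate of the expected frequency in an off-diagonal cell under symmetry is (n_ij + n_ji)/2. The counts are hypothetical and are not the data of Table 1; the other models in Table 2 require iterative fitting, which this sketch does not attempt.

```python
import math


def g2_symmetry(n):
    """Likelihood ratio statistic G^2 and degrees of freedom for the symmetry model.

    n is an r x r table of observed counts. Diagonal cells are fitted exactly
    and contribute nothing; cells with a zero count contribute zero.
    """
    r = len(n)
    g2 = 0.0
    for i in range(r):
        for j in range(r):
            if i != j and n[i][j] > 0:
                m = (n[i][j] + n[j][i]) / 2.0   # fitted value under symmetry
                g2 += 2.0 * n[i][j] * math.log(n[i][j] / m)
    df = r * (r - 1) // 2
    return g2, df


# Hypothetical 3 x 3 table of counts.
counts = [[10, 2, 1],
          [6, 8, 3],
          [5, 7, 9]]
g2, df = g2_symmetry(counts)
print(round(g2, 3), df)
```

A large value of $G^2$ relative to the chi-squared distribution with the stated degrees of freedom indicates a poor fit of the symmetry model.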
Among the LDPS(K) models for various K, the LDPS(0) model (i.e., the LDPS model) provides the best fit with 5 degrees of freedom, and it fits better than the CS model, which also has 5 degrees of freedom (Table 2).
Also, the LDPS(0) model is a special case of the 2RPS model, obtained by putting $\gamma = 1$. Since the 2RPS model fits these data well, we shall test the hypothesis $\gamma = 1$ (i.e., the hypothesis that the LDPS(0) model holds) under the assumption that the 2RPS model holds. This can be tested using the difference between the likelihood ratio statistic $G^2$ for the LDPS(0) model and that for the 2RPS model. The difference is 0.06 with 1 degree of freedom. Therefore, we do not reject the hypothesis $\gamma = 1$ in the 2RPS model at the 0.05 level (p = 0.806). Thus the LDPS(0) model would be preferable to the 2RPS model for these data.
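The p-value of such a difference test can be checked with a few lines. This sketch uses the identity that, for a chi-squared variable with 1 degree of freedom, the upper tail probability at x equals erfc(sqrt(x/2)). The only input taken from the text is the reported difference 0.06; the function name is illustrative.

```python
import math


def chi2_sf_1df(x):
    """Upper tail probability P(X > x) for a chi-squared variable with 1 degree of freedom."""
    if x < 0:
        raise ValueError("the statistic must be non-negative")
    return math.erfc(math.sqrt(x / 2.0))


difference = 0.06                  # reported G^2 difference between LDPS(0) and 2RPS
p_value = chi2_sf_1df(difference)
print(round(p_value, 2))           # close to the reported p = 0.806
print(p_value > 0.05)              # the hypothesis gamma = 1 is not rejected at the 0.05 level
```

Because the reported difference is rounded to two decimals, the computed p-value may differ slightly in the third decimal from the value given in the text.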
Next, among the CLDPS(K) models for various K, the CLDPS(3) model provides the best fit with 5 degrees of freedom (Table 2). The CLDPS(3) model fits these data better than the CLDPS(0) model, both having 5 degrees of freedom.
Also, the CLDPS(3) model is a special case of the C2RPS model, obtained by putting $\Gamma = \Delta^{3}$. Since the C2RPS model fits these data well, we shall test the hypothesis $\Gamma = \Delta^{3}$ (i.e., the hypothesis that the CLDPS(3) model holds) under the assumption that the C2RPS model holds. The difference between the likelihood ratio statistic $G^2$ for the CLDPS(3) model and that for the C2RPS model is 0.02 with 1 degree of freedom. Therefore, we do not reject the hypothesis $\Gamma = \Delta^{3}$ in the C2RPS model at the 0.05 level (p = 0.888). Thus the CLDPS(3) model is preferable to the C2RPS model for these data.

Therefore, for the endometrial cancer data in Table 1, the CLDPS(3) model is the best-fitting model among the models given in Table 2. Hence, under the CLDPS(3) model, the probability that the average dose of oestrogen for the case in a matched pair is in category $i$ or above and that for the control in the pair is in category $j$ ($< i$) or below is estimated to be $\Delta^{3+(i-j)}$ times higher than the probability that the average dose for the case is in category $j$ or below and that for the control is in category $i$ or above.
In particular, under the CLDPS(3) model, the probability that the average dose for the case in a pair is not zero (i.e., in categories 2, 3, or 4) and that for the control in the pair is zero (i.e., in category 1) is estimated to be 4.502 ($= \Delta^{4}$) times higher than the probability that the average dose for the case is zero and that for the control is not zero. Also, under the CLDPS(3) model, the probability that the average dose for the case in a pair is 0.626+ mg/day (i.e., in category 4) and that for the control in the pair is zero (i.e., in category 1) is estimated to be 9.552 ($= \Delta^{6}$) times higher than the probability that the average dose for the case is zero and that for the control is 0.626+ mg/day.
Since the estimated value of $\Delta^{3+(i-j)}$ is greater than 1 for $i > j$, under the CLDPS(3) model it is estimated that the average dose for the case in a pair tends to be greater than that for the control in the pair.
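The two reported multipliers can be checked against each other. Under the CLDPS(3) model the multiplier for categories i > j is Δ^(3 + (i - j)), so the value 4.502 corresponds to Δ^4 (i = 2, j = 1) and the value 9.552 to Δ^6 (i = 4, j = 1). The sketch below recovers Δ from the first reported figure and recomputes the second; the variable names are illustrative and the recovered Δ is only as precise as the rounded input.

```python
delta_pow4 = 4.502                 # reported estimate of Delta**4 (i = 2, j = 1, K = 3)
delta = delta_pow4 ** 0.25         # implied estimate of Delta
delta_pow6 = delta ** 6            # implied multiplier for i = 4, j = 1

print(round(delta, 3))
print(round(delta_pow6, 2))        # agrees with the reported 9.552 up to rounding

# Multipliers Delta**(3 + (i - j)) for all pairs i > j among four categories.
K, r = 3, 4
multipliers = {(i, j): delta ** (K + (i - j))
               for i in range(2, r + 1) for j in range(1, i)}
print(all(m > 1.0 for m in multipliers.values()))
```

Because the implied Δ exceeds 1, every multiplier in the table exceeds 1, which is the basis for the conclusion that the average dose for the case tends to be greater than that for the control.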

Discussion
For the endometrial cancer data in Table 1

Conclusions
We have proposed two kinds of asymmetry models, namely, the LDPS(K) model and the CLDPS(K) model. The LDPS(K) model is useful for seeing the structure of asymmetry of the cell probabilities $\{p_{ij}\}$, and the CLDPS(K) model is useful for seeing the structure of asymmetry of the cumulative probabilities $\{G_{ij}\}$.
For the endometrial cancer data in Table 1, we have seen using the CLDPS(3) model that the average dose of oestrogen for the case in a matched pair tends to be greater than that for the control in the pair. In particular, there is a structure of strong asymmetry such that the probability that the average dose for the case in a pair is 0.626+ mg/day and that for the control in the pair is zero is 9.552 times higher than the probability that the average dose for the case is zero and that for the control is 0.626+ mg/day.

Table 2: Values of the likelihood ratio chi-squared statistic $G^2$ for models applied to the data in Table 1. (The symbols * and ** mean significant at the 0.05 and 0.01 levels, respectively.)