S Coefficient of Concordance in Diagnostic Screening Tests

Research Article

Open Access

Oyeka ICA¹ and Okeh UM²*

¹ Department of Applied Statistics, Nnamdi Azikiwe University, Awka, Nigeria

² Department of Industrial Mathematics and Applied Statistics, Ebonyi State University, Abakaliki, Nigeria

*Corresponding author:

Okeh UM
Department of Industrial Mathematics and Applied Statistics
Ebonyi State University Abakaliki, Nigeria
E-mail: uzomaokey@ymail.com

Received January 16, 2013; Published February 28, 2013

Citation: Oyeka ICA, Okeh UM (2013) S Coefficient of Concordance in Diagnostic Screening Tests. 2:640 doi:10.4172/scientificreports.640

Copyright: © 2013 Oyeka ICA, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract

This paper proposes a measure of the strength of association, agreement or concordance between state of nature or condition in a population and test results in diagnostic screening tests. The proposed measure, here termed the “S Coefficient of Concordance” because it is a function of the sensitivity and specificity of the screening test, is normed to lie between 0 and 1, assuming the value 0 when there is independence, complete discordance or disagreement, and the value 1 when there is complete agreement or similarity between state of nature or condition and test results. Unlike the traditional odds ratio, the proposed S statistic by its specification and formulation excludes the number of subjects in the sampled population who test negative even though they actually have the condition, and the number of subjects who test positive even though they actually do not have the condition, since these numbers are usually not known in diagnostic screening tests. The method develops the standard deviation of the proposed measure as well as test statistics that enable the testing of any desired hypothesis about not only the proposed S measure but also the sensitivity and specificity of the screening test. The method is illustrated with some sample data, and the proposed S measure is shown to be relatively more efficient, and hence likely to be more powerful, than the traditional odds ratio measure of association.

Keywords

Concordance; Coefficient; Diagnostic screening tests; Odds ratio; Sensitivity; Specificity

Introduction

Ordinarily, in measuring the strength of association between two variables of classification in cross-sectional or longitudinal studies, especially in medical research, the odds ratio, relative risk and other such measures are preferred to the phi coefficient because, unlike the latter, the former two measures are invariant under the three commonly used study methods [1,2]. However, because the odds ratio and relative risk are not easy to interpret and understand, some researchers prefer to use Berkson’s simple difference or Sheps’ relative difference between rates as measures of association in medical research [1]. Unfortunately, these last two measures are not invariant under the various designs. Also, none of these measures of association theoretically has an upper bound that would quickly indicate how large the measure could possibly be under perfect disagreement or agreement between the variables of classification [3]. When used in the analysis of data obtained from diagnostic screening tests, a probably more serious problem with the traditional odds ratio and relative risk as measures of association is that they often include in their specifications and formulations the number of subjects testing negative among the subjects in the population known or believed to actually have the condition in nature, and the number of subjects testing positive among those subjects known or believed not to actually have the condition in nature [4]. These sample values are in fact usually not readily known in diagnostic screening test results and ought not to be used in such analysis without prior modification of the measures’ formulation. Thus the expressions for the traditional odds ratio, its standard deviation and its significance test statistic are, strictly speaking, improperly specified and sometimes include unknown sample values.

When the prevalence rate of a condition in a population is known, a measure of association between state of nature and test results in screening tests should ideally be based on true and false rates of the test that factor in the prevalence rate. But the prevalence rates of conditions in a population are not always known, so measures of association that include prevalence rates have limited application [1,5].

In this paper we propose a measure of the strength of association or concordance between state of nature or condition and screening test results that does not require knowledge of the prevalence rate of the condition in the population. The proposed measure is based only on the sample data usually obtained in diagnostic screening tests, namely: the total number of subjects studied, consisting of the number of subjects drawn from the population known or believed to actually have the condition and the number drawn from the same population known or believed not to actually have the condition; the number testing positive among those subjects known to have the condition; and the number testing negative among those subjects known not to have the condition. The proposed statistic, here termed the “S” coefficient of concordance because it is a function of only the sensitivity and specificity of the screening test, is normed to have values between 0 and 1, unlike the traditional odds ratio and relative risk measures of association, which have no upper bounds.

Estimates of the standard error and test statistic appropriate for use with data from diagnostic screening tests are also proposed.

The Proposed Method

Suppose a medical researcher or clinician collects a random sample of n_.1 subjects from a population known or believed to have a certain condition in nature and another random sample of n_.2 subjects from the same population believed not to have the condition in nature giving a total random sample of n=n_..=n_.1+n_.2 subjects to be studied. Research interest is in confirming through a diagnostic screening test whether or not a randomly selected subject from the population has or does not have the condition of interest.

Let B be the event that a randomly selected subject from the population has the condition in nature and $\bar{B}$ be the event that the randomly selected subject does not have the condition in nature. Let A and $\bar{A}$ be respectively the events that a randomly selected subject from the population tests and does not test positive to the condition in the screening test. The results from such a screening test may be presented as in table 1.

Table 1: Format for the presentation of results of a diagnostic screening test, classified by state of nature (condition).

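To develop the proposed S measure of association or concordance we may let

$$u_{i1} = \begin{cases} 1, & \text{if the } i\text{th subject known or believed to have the condition tests positive} \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$

for i=1,2,...,n_.1

Let

$$\pi_1 = P(u_{i1} = 1) \qquad (2)$$

Also let

$$W_1 = \sum_{i=1}^{n_{.1}} u_{i1} \qquad (3)$$

Now

$$E(u_{i1}) = \pi_1; \quad \mathrm{Var}(u_{i1}) = \pi_1(1-\pi_1) \qquad (4)$$

Also from Equations 3 and 4 we have that the expected value of W₁ is

$$E(W_1) = n_{.1}\pi_1 \qquad (5)$$

Similarly from Equation 4 the variance of W₁ is

$$\mathrm{Var}(W_1) = n_{.1}\pi_1(1-\pi_1) \qquad (6)$$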

Now π₁ is the probability that a subject tests positive, or equivalently the proportion of subjects testing positive, among the subjects in the population known or believed to have the condition in nature. It is therefore a measure of the sensitivity Se of the screening test.

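The sample estimate of π₁ is

$$\hat{\pi}_1 = \frac{W_1}{n_{.1}} = \frac{f^{+}}{n_{.1}} \qquad (7)$$

where f⁺ is the total number of 1s in u_i1, which is the number of subjects testing positive among the n_.1 subjects known or believed to have the condition in nature. The sample estimate of the variance of $\hat{\pi}_1$ is, from Equations 6 and 7,

$$\mathrm{Var}(\hat{\pi}_1) = \frac{\hat{\pi}_1(1-\hat{\pi}_1)}{n_{.1}} \qquad (8)$$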

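A researcher may for some reason wish to test the null hypothesis that the sensitivity of a test is some specified value π₁₀ = S_e0, that is, the null hypothesis

$$H_0: \pi_1 = \pi_{10} \quad \text{versus} \quad H_1: \pi_1 \neq \pi_{10} \qquad (9)$$

The null hypothesis of Equation 9 is tested using the test statistic

$$\chi^2 = \frac{(\hat{\pi}_1 - \pi_{10})^2}{\mathrm{Var}(\hat{\pi}_1)} = \frac{n_{.1}(\hat{\pi}_1 - \pi_{10})^2}{\hat{\pi}_1(1-\hat{\pi}_1)} \qquad (10)$$

which has approximately the chi-square distribution with 1 degree of freedom. The null hypothesis is rejected at the α level of significance if

$$\chi^2 \geq \chi^2_{1-\alpha;1} \qquad (11)$$

Otherwise H₀ is accepted.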
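The test of a hypothesised sensitivity can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the statistic is taken to be the squared difference between the sample proportion and the hypothesised value, divided by the estimated variance p̂(1 − p̂)/n; the function name `proportion_chi_square`, the counts (23 positives among 28 subjects) and the hypothesised value 0.5 are all hypothetical and are not taken from the article's data. The same function would apply unchanged to specificity.

```python
import math


def proportion_chi_square(successes, n, p0):
    """Chi-square statistic (1 df) for H0: proportion = p0.

    Assumes the variance is estimated as p_hat * (1 - p_hat) / n.
    """
    if not 0 < successes < n:
        raise ValueError("need 0 < successes < n so the estimated variance is positive")
    p_hat = successes / n
    var_hat = p_hat * (1.0 - p_hat) / n
    chi2 = (p_hat - p0) ** 2 / var_hat
    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return p_hat, var_hat, chi2, p_value


# Hypothetical data: 23 of 28 subjects with the condition test positive.
se_hat, se_var, se_chi2, se_p = proportion_chi_square(23, 28, p0=0.5)
CRITICAL_5PCT_1DF = 3.841  # upper 5% point of chi-square with 1 df
print(f"sensitivity = {se_hat:.4f}, variance = {se_var:.6f}")
print(f"chi-square = {se_chi2:.3f}, p = {se_p:.2g}, reject H0: {se_chi2 >= CRITICAL_5PCT_1DF}")
```

Because the variance is estimated at the sample proportion rather than at the hypothesised value, the sketch refuses samples in which every subject or no subject tests positive, where that estimate would be zero.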

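Now the estimated proportion of all sampled subjects who have the condition in nature and also test positive to the condition is, from Equation 7,

$$\frac{n_{.1}}{n}\hat{\pi}_1 = \frac{f^{+}}{n} \qquad (12)$$

The corresponding estimated variance is

$$\mathrm{Var}\left(\frac{n_{.1}}{n}\hat{\pi}_1\right) = \frac{n_{.1}\hat{\pi}_1(1-\hat{\pi}_1)}{n^2} \qquad (13)$$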

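Furthermore let

$$u_{i2} = \begin{cases} 1, & \text{if the } i\text{th subject known or believed not to have the condition tests negative} \\ 0, & \text{otherwise} \end{cases} \qquad (14)$$

for i=1,2,…,n_.2

Let

$$\pi_2 = P(u_{i2} = 1) \qquad (15)$$

Also let

$$W_2 = \sum_{i=1}^{n_{.2}} u_{i2} \qquad (16)$$

Now

$$E(u_{i2}) = \pi_2; \quad \mathrm{Var}(u_{i2}) = \pi_2(1-\pi_2) \qquad (17)$$

Also

$$E(W_2) = n_{.2}\pi_2 \qquad (18)$$

and

$$\mathrm{Var}(W_2) = n_{.2}\pi_2(1-\pi_2) \qquad (19)$$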

Now π₂ is the proportion of subjects testing negative among the subjects known or believed not to have the condition in nature. Note that π₂ is actually the specificity Sp of the test, that is, the proportion of subjects testing negative among those subjects in the population known or believed not to have the condition in nature. Its sample estimate is

$$\hat{\pi}_2 = \frac{1}{n_{.2}}\sum_{i=1}^{n_{.2}} u_{i2} \qquad (20)$$

The corresponding estimated variance is

$$\mathrm{Var}(\hat{\pi}_2) = \frac{\hat{\pi}_2(1-\hat{\pi}_2)}{n_{.2}} \qquad (21)$$

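Again, for some reason a researcher may wish to test the null hypothesis that the specificity of a test is some specified value π₂₀ = S_p0, that is, the null hypothesis

$$H_0: \pi_2 = \pi_{20} \quad \text{versus} \quad H_1: \pi_2 \neq \pi_{20} \qquad (22)$$

This null hypothesis is tested using the test statistic

$$\chi^2 = \frac{(\hat{\pi}_2 - \pi_{20})^2}{\mathrm{Var}(\hat{\pi}_2)} = \frac{n_{.2}(\hat{\pi}_2 - \pi_{20})^2}{\hat{\pi}_2(1-\hat{\pi}_2)} \qquad (23)$$

which has approximately the chi-square distribution with 1 degree of freedom. The null hypothesis is rejected at the α level of significance if Equation 11 is satisfied; otherwise H₀ is accepted.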

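Now the estimated proportion of all sampled subjects who are known or believed not to have the condition and also test negative in the screening test is

$$\frac{n_{.2}}{n}\hat{\pi}_2 \qquad (24)$$

with estimated variance

$$\mathrm{Var}\left(\frac{n_{.2}}{n}\hat{\pi}_2\right) = \frac{n_{.2}\hat{\pi}_2(1-\hat{\pi}_2)}{n^2} \qquad (25)$$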

The expected value of $\frac{n_{.1}}{n}\hat{\pi}_1$, namely $\frac{n_{.1}}{n}\pi_1$, is the proportion of the population who are known or believed to have the condition in nature and also test positive in the screening test. Similarly the expected value of $\frac{n_{.2}}{n}\hat{\pi}_2$, namely $\frac{n_{.2}}{n}\pi_2$, is the proportion of the population who are known or believed not to have the condition in nature and also test negative in the screening test. Their sum, namely

$$\pi = S = \frac{n_{.1}\pi_1 + n_{.2}\pi_2}{n} \qquad (26)$$

is the proportion of all the subjects in the population who either are known or believed to have the condition in nature and also test positive, or are known or believed not to have the condition in nature and also test negative in the screening test. The larger the value of π=S, the smaller is its complement, namely the proportion of subjects in the population who either are known or believed to have the condition in nature but test negative, or are known or believed not to have the condition in nature but test positive in the screening test, and vice versa.

In effect this relationship implies that the larger the value of π=S, the stronger is the association, agreement or concordance between state of nature or condition and test results; the smaller the value, the weaker is the agreement or concordance between state of nature and test results.

Now since π=S is a probability, it can only assume values between 0 and 1 inclusive. In other words, 0 ≤ S ≤ 1. The proposed measure of association, agreement or concordance π=S assumes the value 0 when there is independence, complete discordance or lack of any agreement between state of nature or condition and test results. In this case there is no association whatsoever between state of nature and test results, so that knowledge of a subject’s test result is of no use in predicting or determining whether or not the subject has the condition in nature. If S=1, then there is perfect positive agreement or concordance between state of nature or condition and test results, so that knowledge of a subject’s test result would enable accurate prediction of the subject’s true condition in nature. Normally S assumes intermediate values between 0 and 1.

The sample estimate of S, the proposed measure of concordance, agreement or strength of association between state of nature or condition and test results in diagnostic screening tests, is therefore $\hat{S}$, which is a weighted mean of the sensitivity and specificity of the screening test, estimated from Equation 26 as

$$\hat{S} = \hat{\pi} = \frac{n_{.1}\hat{\pi}_1 + n_{.2}\hat{\pi}_2}{n} \qquad (27)$$

Note that S is invariant under the three common study methods used for medical research and assumes all possible values between 0 and 1 inclusive. A researcher may wish to test the null hypothesis of no association, that is, of independence, H₀: S=0, between state of nature or condition and test results. However, a more general hypothesis would be

$$H_0: S = S_0 \quad \text{versus} \quad H_1: S \neq S_0 \qquad (28)$$

Now to estimate the variance of $\hat{S}$, we note that by their specifications in Equations 1 and 14, u_i1 and u_i2 are uncorrelated, so that from Equations 13 and 25 we have that

$$\mathrm{Var}(\hat{S}) = \frac{n_{.1}\hat{\pi}_1(1-\hat{\pi}_1) + n_{.2}\hat{\pi}_2(1-\hat{\pi}_2)}{n^2} \qquad (29)$$

Hence the test statistic for the null hypothesis of Equation 28 is

$$\chi^2 = \frac{(\hat{S} - S_0)^2}{\mathrm{Var}(\hat{S})} \qquad (30)$$

which has approximately the chi-square distribution with one degree of freedom for sufficiently large n and may be used to test the null hypothesis of Equation 28. The null hypothesis is rejected at the α level of significance if Equation 11 is satisfied; otherwise H₀ is accepted.
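The computation of the S coefficient, its variance and its test statistic can be sketched as follows. The sketch assumes that S is estimated as the sample-size-weighted mean of the estimated sensitivity and specificity, that its variance is the sum of the two binomial variance terms divided by n², and that the statistic is the squared deviation from S₀ divided by that variance. The function name `s_coefficient` and the counts (23 of 28 and 42 of 49) are hypothetical illustrations, not the article's data.

```python
import math


def s_coefficient(true_pos, n_cond, true_neg, n_free, s0=0.0):
    """Estimate S, its variance, and the 1-df chi-square statistic for H0: S = s0.

    true_pos: subjects testing positive among the n_cond subjects with the condition
    true_neg: subjects testing negative among the n_free subjects without it
    """
    n = n_cond + n_free
    sens = true_pos / n_cond
    spec = true_neg / n_free
    # Weighted mean of sensitivity and specificity; equals (true_pos + true_neg) / n.
    s_hat = (n_cond * sens + n_free * spec) / n
    # The two samples are independent, so their variance contributions add.
    var_s = (n_cond * sens * (1 - sens) + n_free * spec * (1 - spec)) / n ** 2
    if var_s == 0:
        raise ValueError("estimated variance is zero; the statistic is undefined")
    chi2 = (s_hat - s0) ** 2 / var_s
    return s_hat, var_s, chi2


# Hypothetical counts: 23 of 28 with the condition test positive,
# 42 of 49 without the condition test negative.
s_hat, var_s, chi2_s = s_coefficient(23, 28, 42, 49, s0=0.0)
print(f"S = {s_hat:.4f}, se(S) = {math.sqrt(var_s):.4f}, chi-square = {chi2_s:.2f}")
print("reject H0 at the 5% level:", chi2_s >= 3.841)  # 3.841: upper 5% point, 1 df
```

Only the two concordant counts and the two sample sizes enter the calculation, which is the property the method relies on.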

As noted above, the usual practice has often been to use the traditional odds ratio or relative risk to assess the strength of association between state of nature or condition and test results in diagnostic screening tests. A problem with this approach, however, is that the number of subjects testing positive among those known to be free of the condition and the number of subjects testing negative among those known to have the condition of interest are not usually known, and hence neither are their derivatives n_1. and n_2., the total numbers of subjects that would test positive and negative, respectively. Hence all calculations based on these unknown values, including measures of association, their precision and test statistics, are fundamentally faulty and without modification would yield misleading and unsatisfactory results.

The proposed S measure of concordance and the associated test statistic, which are functions of the sensitivity and specificity of the screening test, are not encumbered by these problems. Furthermore, the proposed method enables the testing of any desired hypotheses concerning the sensitivity and specificity of the test, which provides useful information and is an additional advantage over existing methods.

Illustrative Example

A research scientist collected a random sample of n_.1=28 subjects known or believed to have breast cancer from a certain community, and also a random sample of n_.2=49 subjects known or believed not to have breast cancer from the same community. Research interest is to confirm through a diagnostic screening test whether each of the sampled subjects has or does not have breast cancer. The results of the screening test are presented in table 2.

We here use the data of table 2 to illustrate the proposed method.

Table 2: Results of the Screening Test for Breast Cancer in a Certain Community.

Results and Discussions

Now from table 2 and Equations 7 and 20 we have that the estimated sensitivity and specificity of the test are respectively,

and

These values show that the screening test is quite sensitive and specific. The corresponding estimated variances are, from Equations 8 and 21 respectively,

and

Now from Equation 12 we have that the estimated proportion of all sample subjects who have the condition in nature and also test positive is

with estimated variance, from Equation 13, given as

Similarly from Equation 24 we have that the estimated proportion of the sampled population who are known or believed not to have the condition and also test negative in the screening test is

Whose estimated variance is from Equation 25

Therefore from Equation 27, we have that the sample estimate of the proposed measure of association or coefficient of concordance between state of nature (existence of breast cancer in the population) and screening test results (presence of breast cancer as revealed by the test) is

This estimated value of

suggests that there is a high level of agreement or concordance between actual existence of breast cancer in the population and the screening test results.

The sample variance of this estimate is from Equation 29

Hence the test statistic for the null hypothesis of no association between state of nature and test results, that is, the null hypothesis of Equation 28 with S₀=0, is from Equation 30

which with one degree of freedom is highly statistically significant indicating a high level of association or concordance between the existence of breast cancer in the sampled population and screening test results.

Now if we had used the traditional odds ratio O as a measure of association, we would have from table 2 that the estimated sample odds ratio is O=27.600, with estimated standard deviation se(O)=17.664.

The corresponding chi-square Test statistic is

which with 1 degree of freedom is also highly statistically significant. Note however that the estimated value of the proposed coefficient of concordance or measure of association is

which by specification is always at most 1.0, its upper limit, compared with the estimated odds ratio O=27.600, which theoretically has no upper bound. Also the standard error of

is only

which is much less than the estimated standard error of O, se(O)=17.664, suggesting that the proposed measure is much more precise and efficient than the traditional odds ratio. Also note that although the proposed method and the traditional odds ratio method here both lead to a rejection of the null hypothesis of independence or no association, the relative sizes of the calculated chi-square values suggest that, at least for the present example, the odds ratio method is likely to lead to an acceptance of a false null hypothesis (Type II error) more often, and hence is likely to be less powerful, than the proposed S method.
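The odds ratio side of this comparison can be sketched under explicit assumptions. The standard error below uses the usual large-sample (delta-method) formula, se(O) ≈ O·√(1/a + 1/b + 1/c + 1/d), which is an assumption about how such a value would be obtained and may differ from the authors' calculation. The four cell counts are hypothetical: they are chosen only to agree with n_.1=28 and n_.2=49 and to give an odds ratio of 27.6, and they are not claimed to be the article's data. The function name `odds_ratio_with_se` is likewise illustrative.

```python
import math


def odds_ratio_with_se(a, b, c, d):
    """Sample odds ratio and its delta-method standard error.

    a: have the condition, test positive    b: have the condition, test negative
    c: free of the condition, test positive d: free of the condition, test negative
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # se of ln(odds ratio)
    return odds_ratio, odds_ratio * se_log_or


# Hypothetical 2x2 counts (not the article's table): 28 = 23 + 5, 49 = 7 + 42.
a, b, c, d = 23, 5, 7, 42
or_hat, se_or = odds_ratio_with_se(a, b, c, d)

# S and its standard error need only a, d and the two sample sizes.
n1, n2 = a + b, c + d
n = n1 + n2
s_est = (a + d) / n
se_s = math.sqrt((a * b / n1 + c * d / n2)) / n  # n1*p1*(1-p1) = a*b/n1, likewise for n2

print(f"odds ratio = {or_hat:.3f}, se = {se_or:.3f}")
print(f"S = {s_est:.4f}, se = {se_s:.4f}")
```

With these counts the delta-method standard error comes out at about 17.7. Note that the odds ratio needs the discordant counts b and c, whereas S uses only a, d and the two sample sizes.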

Summary and Conclusion

We have in this paper proposed a measure of association, agreement or concordance between state of nature or condition in a population and test results in diagnostic screening tests, formulated as a function of the sensitivity and specificity of the screening test. Unlike the traditional odds ratio, which by specification has no upper limit and assumes the value 1 if there is no association, the proposed statistic, the coefficient of concordance S, is normed to assume values between 0 and 1 inclusive, with 0 indicating perfect discordance or independence and 1 indicating perfect agreement or concordance between state of nature and screening test result. Also, unlike the traditional odds ratio, the proposed measure of association S does not include in its specification and formulation the number of subjects in the sampled population testing negative among those who are known or believed to have the condition in nature, or the number of subjects testing positive among those who are known or believed not to have the condition in nature. These sample values are usually not known in diagnostic screening tests and cannot be validly used without modification in the analysis of data from diagnostic screening tests.

The standard error of the proposed statistic and a test statistic for its significance are developed. It is shown using sample data that the proposed test statistic is more efficient and powerful than the traditional odds ratio method.

References

1. Fleiss JL (1981) Statistical Methods for Rates and Proportions. (2nd edn), John Wiley & Sons, New York, USA.

2. Raslich MA, Markert RJ, Stutes SA (2007) Selecting and interpreting diagnostic tests. Biochemia Medica 17: 151-161.

3. Agresti A (2007) An Introduction to Categorical Data Analysis. (2nd edn), John Wiley & Sons, New Jersey.

4. Bossuyt PMM (2006) Clinical evaluation of medical tests: still a long road to go. Biochemia Medica 16: 103-106.

5. Greenberg RS, Daniels SR, Flanders WD, Eley JW, Boring JR (2001) Medical Epidemiology. Lange-McGraw-Hill, London, UK.