Received Date: May 23, 2014; Accepted Date: June 25, 2014; Published Date: June 30, 2014
Citation: Pasic N, Bryant D, Naudie D, Willits K (2014) Diagnostic Validity of the Physical Examination Maneuvers for Hip Pathology: A Systematic Review. Orthopedic Muscul Syst 3: 157. doi: 10.4172/2161-0533.1000157
Copyright: © 2014 Pasic N, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Background: A number of physical examination maneuvers are used to diagnose hip pathology, but their diagnostic validity is unclear. We conducted a systematic review to evaluate current knowledge regarding the diagnostic validity of physical examination maneuvers for hip pathology.
Methods: We conducted a literature search of the electronic databases MEDLINE, CINAHL, EMBASE, Cochrane, and SPORTDiscus. The methodological quality of each eligible study was assessed and classified according to Sackett and Haynes’ phases of diagnostic research, whereby Phase I and II studies represent proof of concept and Phase III studies are applicable to a clinical setting.
Results: Eight studies were classified as phase III diagnostic studies, four of which were methodologically rigorous. In diagnosing labral tears of the hip, neither the impingement test (sensitivity=0.51-0.78, specificity=0.10-0.89) nor the FABER test (sensitivity=0.60, specificity=0.75) demonstrated evidence to support clinical use. In diagnosing gluteal tendon pathology, the Trendelenburg test demonstrated some evidence for use in a clinical setting (sensitivity=0.23-0.73, specificity=0.77-0.94).
Conclusion: The diagnostic validity of clinical tests to diagnose the presence or absence of hip pathology remains uncertain. The majority of studies supporting validity of these tests lacked methodological rigor, and thus cannot provide evidence to support the use of a test in clinical practice.
Keywords: Diagnosis; Hip; Maneuvers; SPORTDiscus
Physical examination maneuvers may be an important adjunct in diagnosing hip pathology. Several methods are available to study the potential diagnostic value of physical examination maneuvers; the appropriate method is determined by the research question and dictates the strength of evidence that can be produced. Sackett and Haynes [1] defined four phases of diagnostic research to evaluate the validity of diagnostic studies.
Studies classified as phase I determine whether test results in patients known to have the target disorder differ from those in patients who do not have the target disorder. Phase II studies determine whether patients with positive test results are more likely to have the target disorder than patients with negative test results, again in samples of patients already known to be disease positive or disease negative. Neither phase I nor phase II studies include a sample of patients representative of typical clinical practice, and therefore they cannot provide evidence to support the use of the test in routine clinical practice. They are, however, the first step toward designs that evaluate test validity in a clinical setting.
The final two phases that Sackett and Haynes describe are phase III and phase IV. Phase III studies ask whether the diagnostic test distinguishes between patients with and without the target disorder among patients in whom it is clinically reasonable to suspect the disorder. Once a clinical test has been shown to have diagnostic validity in a phase III study, a phase IV study can be conducted to establish whether patients who undergo the test have better health outcomes than similar untested patients.
Both phase III and phase IV designs require the researcher to select a representative sample of patients for whom the clinician would face diagnostic uncertainty. Because of this requirement, the results of these studies can provide evidence of the validity of the diagnostic test in a clinical setting. The strength of this evidence depends on the methodological rigor with which the study was conducted.
Methodological rigor can be briefly described by three criteria: 1) the sample must be representative of patients for whom clinicians would face diagnostic uncertainty; 2) the results of the diagnostic test must not influence who undergoes the gold standard; and 3) the gold standard must be evaluated by investigators who are blinded to the diagnostic test results. To determine the level of evidence supporting the clinical value of hip physical examination maneuvers, the existing literature can be classified according to Sackett and Haynes’ criteria.
We identified two systematic reviews on physical examination maneuvers for hip pathology [2,3]. Burgess et al. conducted a systematic review of diagnostic tests for labral pathology of the hip. Of the 21 articles included in their final review, only five focused on physical exam tests. Furthermore, these five studies were heterogeneous with respect to the gold standard used and the physical exam test(s) performed. The authors also noted that patients in these studies often did not face diagnostic uncertainty, and that reporting of physical exam test results was inconsistent. Tijssen et al. performed a systematic review of physical tests for diagnosing femoroacetabular impingement (FAI) or labral tears of the hip. Of the 21 articles included in their final review, only 14 met Sackett and Haynes’ criteria for phasing diagnostic studies, and only three were of good methodological rigor. Tijssen et al. also noted that the patients included in these studies poorly represented those seen in typical clinical practice. Consequently, they found that no test was sensitive or specific enough to confirm or exclude a diagnosis of FAI or labral pathology in clinical practice.
The purpose of this study is to conduct a systematic review of the literature evaluating the validity of physical examination tests for acetabular labral lesions, osteoarthritis, instability, and femoroacetabular impingement. We will classify the existing literature according to Sackett and Haynes’ phases of diagnostic research to determine the strength of evidence supporting the use of these tests in a clinical setting.
We conducted a systematic search of the online bibliographic databases MEDLINE (1966 through week 1 of June 2013), CINAHL (1982 through week 1 of June 2013), EMBASE (1980 through week 1 of June 2013), SPORTDiscus (1982 through week 1 of June 2013), and the Cochrane Library (1985 through week 1 of June 2013) to identify eligible studies reporting on the diagnostic validity of physical examination maneuvers for the hip. A summary of our search results can be found in Table 1. In addition, we performed a review of the reference lists of potentially relevant papers to ensure the completeness of the initial search.
|Search Terms||Embase||Medline OVID||CINAHL||SPORTDiscus||Cochrane|
|Acetabul* or acetabular labrum or hip joint*||32382|
|1 or 2 or 3 or 4||113187||77706||8928||5788|
|Labrum tear* or labral tear* or hip impingement or hip instability or ((femoroacetabular or femoroacetabular or femoro-acetabular or femoral acetabular) adj2 impingement) or FAI||3132|
|6 or 7 or 8||13089||9363||6103||823|
|Range of motion||14671|
|Test or testing or tests or examination or maneuver* or clinical sign* or range of motion||3528812|
|((Thomas or Ely or Ober or Trendelenburg or faber or Patrick or Stinchfield or McCarthy or Anvil or Hibb or impingement or fair or fadir or flexion-adduction-internal rotation or flexion-abduction-external rotation) adj3 (test or maneuver or sign))||797|
|10 or 11 or 12 or 13 or 14 or 15||3547091||1987766||418634||130258|
|“sensitivity and specificity”||197347|
|Likelihood ratio* or valid* or reliable or reliability or sensitivity or specificity||1939311|
|17 or 18 or 19 or 20 or 21||2103471||2010689||170204||41627|
|5 and 9 and 16 and 22||185||208||86||10||5|
|22 and limits||16||21||8||3||1|
Table 1: Summary of Search Results
Selection of studies
The titles and abstracts of articles identified by the initial search strategies were reviewed and assessed by a single reviewer. Eligible studies were those that provided one or more of the following: 1) a description of at least one physical examination maneuver for diagnosing hip pathology (including acetabular labral lesions, osteoarthritis, instability, and femoroacetabular impingement); 2) a comparison of the results of a physical examination maneuver with an appropriate gold standard (surgery, magnetic resonance arthrography, or radiographs in the case of osteoarthritis); and/or 3) details of sensitivity, specificity, likelihood ratios, or a receiver operating characteristic (ROC) curve, or sufficient detail for these values to be calculated. All titles and abstracts that met the eligibility criteria, and any marked as uncertain, were obtained in full text and reviewed by a single reviewer (NP) using the same eligibility criteria.
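The accuracy measures named in the third eligibility criterion all derive from the same 2x2 cross-tabulation of physical exam result against gold standard. The sketch below assumes such a table is available; `TwoByTwo` and `accuracy_measures` are hypothetical helpers for illustration, not tools used in this review, and the counts in the usage line are invented.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class TwoByTwo:
    """Physical exam result cross-tabulated against the gold standard (hypothetical helper)."""
    tp: int  # exam positive, gold standard positive
    fp: int  # exam positive, gold standard negative
    fn: int  # exam negative, gold standard positive
    tn: int  # exam negative, gold standard negative

def accuracy_measures(t: TwoByTwo) -> dict:
    """Sensitivity, specificity and likelihood ratios from a 2x2 table."""
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    # A perfectly specific test has no false positives, so LR+ is unbounded.
    lr_positive = sensitivity / (1 - specificity) if specificity < 1 else math.inf
    # A test with zero specificity makes LR- unbounded.
    lr_negative = (1 - sensitivity) / specificity if specificity > 0 else math.inf
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
    }

# Invented counts for illustration only.
print(accuracy_measures(TwoByTwo(tp=30, fp=10, fn=20, tn=40)))
```

An ROC curve would repeat this calculation at each cut-off of a graded test; for the binary maneuvers discussed here a single table usually suffices.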
Evidence supporting clinical use of physical examination tests
Eligible studies were classified according to Sackett and Haynes’ phases of diagnostic research. Studies were classified as approaching phase I if they only described the physical examination maneuver without testing its accuracy, or if they assessed the results of the test only in a population of patients previously diagnosed with the target disorder.
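One way to make the phase definitions concrete is to express them as a decision rule over a few design attributes. This is a simplified sketch, not the procedure used in this review (phases were assigned by reviewer judgment): the `StudyDesign` fields and `classify_phase` function are hypothetical, and the phase I versus phase II split (whether the analysis starts from the test result) is an interpretive assumption based on the definitions given earlier.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StudyDesign:
    """Hypothetical design attributes used to assign a Sackett and Haynes phase."""
    measures_accuracy: bool                # exam results compared against a reference standard
    only_previously_diagnosed: bool        # every patient already has the target disorder
    diagnostically_uncertain_sample: bool  # clinicians would genuinely suspect the disorder
    compares_patient_outcomes: bool = False         # tested vs untested patients followed up
    analysis_starts_from_test_result: bool = False  # phase II framing (assumption)

def classify_phase(d: StudyDesign) -> str:
    # Phase IV: outcomes of tested vs untested patients in a clinically realistic sample.
    if d.compares_patient_outcomes and d.diagnostically_uncertain_sample:
        return "Phase IV"
    # Description only, or testing confined to patients already known to have the disorder.
    if not d.measures_accuracy or d.only_previously_diagnosed:
        return "Approaching Phase I"
    # Accuracy measured among patients for whom the diagnosis is genuinely uncertain.
    if d.diagnostically_uncertain_sample:
        return "Phase III"
    # Known-positive vs known-negative samples: phase I, or phase II if framed by test result.
    return "Phase II" if d.analysis_starts_from_test_result else "Phase I"

# A case-control design (confirmed OA patients plus healthy controls) lands in phase I.
print(classify_phase(StudyDesign(True, False, False)))
```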
The methodological quality of each eligible study was also assessed against five criteria: 1) selection of a representative sample; 2) choice of an appropriate gold standard; 3) blinding of the physical examination interpreter to imaging; 4) blinding of the gold standard interpreter to physical examination and imaging results; and 5) the presence or absence of verification bias.
Verification bias was classified as present if any physical examination test or imaging influenced the decision to refer patients for the gold standard, and as uncertain if the study included only patients who underwent arthroscopy. Quality was not a criterion for including or excluding studies from the systematic review; however, it informed the interpretation of the results.
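The five quality criteria, including the three-way rating of verification bias, can be recorded as a simple checklist. The sketch below is illustrative only: the class and field names are hypothetical, and it assumes (the text does not say so explicitly) that an "uncertain" verification-bias rating does not satisfy the fifth criterion.

```python
from dataclasses import dataclass
from enum import Enum

class VerificationBias(Enum):
    PRESENT = "present"      # exam or imaging influenced referral for the gold standard
    ABSENT = "absent"
    UNCERTAIN = "uncertain"  # e.g., only patients who underwent arthroscopy were included

@dataclass(frozen=True)
class QualityAssessment:
    """Hypothetical record of the five quality criteria for one study."""
    representative_sample: bool
    appropriate_gold_standard: bool
    exam_evaluator_blinded_to_imaging: bool
    gold_standard_evaluator_blinded: bool  # blinded to exam and imaging results
    verification_bias: VerificationBias

    def criteria_met(self) -> int:
        # Booleans sum as 0/1; only an explicit ABSENT rating satisfies criterion 5 (assumption).
        return sum([
            self.representative_sample,
            self.appropriate_gold_standard,
            self.exam_evaluator_blinded_to_imaging,
            self.gold_standard_evaluator_blinded,
            self.verification_bias is VerificationBias.ABSENT,
        ])

    def is_rigorous(self) -> bool:
        return self.criteria_met() == 5

# An arthroscopy-only sample cannot be rated fully rigorous under this assumption.
print(QualityAssessment(True, True, True, True, VerificationBias.UNCERTAIN).is_rigorous())
```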
The initial search strategy yielded a total of 494 results (185 EMBASE, 208 MEDLINE OVID, 86 CINAHL, 10 SPORTDiscus, 5 Cochrane). Initial screening of titles and abstracts reduced the number of potentially relevant articles to 16, 21, 8, 3, and 1, respectively. Removal of duplicates and full-text review of these articles yielded 12 papers meriting inclusion in the review. A secondary search of reference lists yielded two additional eligible studies. Therefore, 14 studies met the eligibility criteria for this review [4-19].
Table 2 summarizes the results of the quality assessment of the included studies. Eleven studies (79%) were prospective, and ten (71%) included a representative sample. The majority of studies (71%) did not use effective blinding techniques. The gold standard evaluator was blinded to the physical examination results in only five studies. The physical examination evaluator was blinded to the results of the gold standard in twelve studies and to the results of other imaging in seven studies. Verification bias was evident in four studies.
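The study flow and the headline percentages can be checked with simple arithmetic. The sketch below uses only counts stated in the two paragraphs above; the screened total of 49 is derived by summing the per-database counts and applies before duplicate removal. Variable and function names are illustrative.

```python
# Counts taken from the text; nothing here is new data.
initial_hits = {"EMBASE": 185, "MEDLINE OVID": 208, "CINAHL": 86, "SPORTDiscus": 10, "Cochrane": 5}
after_title_abstract = {"EMBASE": 16, "MEDLINE OVID": 21, "CINAHL": 8, "SPORTDiscus": 3, "Cochrane": 1}
from_database_search = 12
from_reference_lists = 2

total_hits = sum(initial_hits.values())
total_screened_in = sum(after_title_abstract.values())  # before duplicates are removed
total_included = from_database_search + from_reference_lists

def percent_of_included(count: int) -> int:
    """Whole-number percentage of the included studies."""
    return round(100 * count / total_included)

print(total_hits, total_screened_in, total_included)  # 494 49 14
print(percent_of_included(11), percent_of_included(10))  # 79 71
```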
|Study||Prospective (P) or Retrospective (R)||Diagnostically uncertain patients||Gold standard evaluator blinded to results of the physical examination?||Physical examination evaluator blinded to gold standard?||Physical examination evaluator blinded to other imaging?||Verification bias present?|
Table 2: Summary of Study Quality Assessment
Table 3 summarizes the study characteristics and the findings from the classification of research phases. Five of the studies were classified as approaching phase I, one was classified as phase I, eight were classified as phase III, and none of the studies were classified as phase IV. These studies provided varying levels of evidence to support 11 physical examination tests as described below.
|Study||Sample Size (n)||Study Population||Physical Exam Tests||Diagnosis||Reference Standard||Phase of Study Design|
| ||Males = 17, Females = 13; Age = 25-54||Patients with hip pain >3 months and a positive impingement sign||Impingement test||Labral tear and/or FAI||MRA||Approaching Phase I|
| ||Males = 0, Females = 24; Age = 36-72||Patients with a documented description of lateral hip pain with elicitable tenderness over the greater trochanter||Trendelenburg’s sign||Gluteus medius tear||MRI||Phase III|
| ||Males = 17, Females = 13; Age = 17-62||Patients with unclear intra-articular hip pain||Impingement test||Labral tear||Arthroscopy||Approaching Phase I|
| ||Males = 25, Females = 24; Age = 18-68||Patients with the primary complaint of hip pain in the anterior, posterior, lateral, and/or groin regions||FABER, impingement test||Labral tear||Diagnostic intra-articular injection||Phase III|
| ||Males = 16, Females = 9; Age = 16-56||Patients with clinical suspicion||FABER||Any hip joint pathology (labral tear/detachment/frayed labrum)||Arthroscopy||Phase III|
| ||Males = 13, Females = 5; Age = 17-48||Patients presenting with groin pain||Internal-rotation-flexion-axial compression maneuver, Thomas test||Labral tear||MRA||Phase III|
| ||Males = 5, Females = 54; Age = 16-64||Patients with dysplastic hip OA||Maximum flexion and internal rotation test, maximum flexion and external rotation test||Labral tear||Arthroscopy||Approaching Phase I|
| ||Males = 32, Females = 40; Mean age = 58.6 (range not provided)||Patients over 40 with a chief complaint of unilateral pain in the buttock, groin, or anterior thigh||FABER, scour test, squat test||OA||Plain radiograph||Phase III|
| ||Males = 15, Females = 54; Age = 27-81||Patients presenting with hip pain||Impingement test||Labral tear||MRI||Phase III|
| ||Males = 65, Females = 0; Age = 19-23||Asymptomatic males||FABER, impingement test||FAI||XR||Phase III|
| ||Males = 18, Females = 0; Age = 32-56||Patients who underwent periacetabular osteotomy||Impingement test, FABER, resisted straight leg raise||Labral tear||MRA||Approaching Phase I|
| ||Males = 20, Females = 20; Age = 37-76||20 patients with confirmed OA and 20 healthy controls||Trendelenburg test||OA||XR||Phase I|
| ||Males = 3, Females = 37; Age = 33-78||Patients with lateral hip pain||Trendelenburg test||Gluteal tendon pathology||MRI||Phase III|
| ||Males = 9, Females = 12; Age = 17-65||Patients with unilateral labral tears||Impingement test, FABER, McCarthy test||Labral tear||MRA||Approaching Phase I|
Table 3: Summary of Study Characteristics and Phase Classification
Six studies evaluated the use of the impingement test in diagnosing hip pathology. The target diagnosis was a labral tear in four of these studies [4-8], a labral tear and/or FAI in one [4], and FAI in one. Three of these studies were classified as approaching phase I, while the remaining three were classified as phase III. Of the phase III studies seeking a labral tear diagnosis, Martin et al. (sensitivity=0.78, specificity=0.10) did not use effective blinding techniques or control for verification bias. Hananouchi et al. (sensitivity=0.51, specificity=0.89) controlled for verification bias and appropriately blinded the physical exam test evaluators; however, it was unclear whether adequate blinding techniques were used throughout the study. In using the impingement test to diagnose FAI, Kapron et al. conducted a methodologically rigorous study (sensitivity=0.09, specificity=1.00), but in an active, asymptomatic male population, which undermines the generalizability of their findings. Therefore, there is little evidence that the impingement test is of clinical value in the diagnostic workup of labral tears of the hip or FAI.
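Sensitivity and specificity can be combined into likelihood ratios, LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity, which express how much a positive or negative result shifts the probability of disease. The sketch below applies these standard formulas to the impingement test values reported above; it is an illustrative recalculation, not an analysis reported by the included studies, and the function name is hypothetical.

```python
import math

def likelihood_ratios(sensitivity: float, specificity: float) -> tuple:
    """Return (LR+, LR-); LR+ is unbounded when specificity is 1.0."""
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else math.inf
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else math.inf
    return lr_pos, lr_neg

# Impingement test values as reported in the text above.
reported = {
    "Martin et al. (labral tear)": (0.78, 0.10),
    "Hananouchi et al. (labral tear)": (0.51, 0.89),
    "Kapron et al. (FAI)": (0.09, 1.00),
}

for study, (sens, spec) in reported.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{study}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

With Martin et al.'s values, LR+ is about 0.87 and LR− about 2.2, so a positive impingement test would, if anything, slightly lower the probability of a labral tear. Hananouchi et al.'s values give LR+ of about 4.6 and LR− of about 0.55. Kapron et al.'s specificity of 1.00 makes LR+ unbounded, which in asymptomatic volunteers says little about performance in symptomatic patients.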
Three studies evaluated the use of the Trendelenburg test. One was a phase I study for diagnosing osteoarthritis, while the other two were phase III studies diagnosing gluteal tendon pathology (sensitivity=0.23, specificity=0.94) and gluteus medius tears (sensitivity=0.73, specificity=0.77), respectively. Both phase III studies were of good methodological rigor. Therefore, there is moderate evidence supporting the use of the Trendelenburg test in clinical practice for the workup of lateral hip pain.
Six studies evaluated the use of the FABER test for diagnosing labral tears, FAI, or osteoarthritis. Of these, three were approaching phase I studies [6,13,14] and three were phase III studies [7,9,15]. Among the phase III studies, Sutlive et al. used the FABER test to diagnose hip osteoarthritis in a methodologically rigorous study (sensitivity=0.62, specificity=0.75). Martin et al., seeking a labral tear diagnosis, did not use effective blinding techniques or control for verification bias (sensitivity=0.6, specificity=0.18). Kapron et al. conducted a methodologically rigorous study of FAI in an active, asymptomatic male population (sensitivity=0.02, specificity=1.00). Consequently, there is minimal evidence to support the clinical use of the FABER test in diagnosing osteoarthritis, labral tears, or FAI.
One phase III study evaluated the ability of the Thomas test to diagnose labral tears (sensitivity=0.25). This study was of questionable methodological quality and did not provide data from which a specificity value could be calculated. Thus, there is no evidence to support the use of the Thomas test in the diagnostic workup of a suspected labral tear.
One phase III study evaluated the effectiveness of the flexion-internal rotation-axial compression test (sensitivity=0.75, specificity=0.43) in diagnosing labral tears. The use of blinding in this study was questionable; therefore, there is limited evidence supporting the use of the flexion-internal rotation-axial compression test in clinical practice.
One methodologically rigorous phase III study evaluated the effectiveness of the scour test (sensitivity=0.62, specificity=0.75) in diagnosing osteoarthritis. Because this is the only such study, further evidence is required to support the use of the scour test in clinical practice.
One methodologically rigorous phase III study evaluated the effectiveness of the squat test (sensitivity=0.24, specificity=0.96) in diagnosing osteoarthritis. As with the scour test, additional studies are required to validate the squat test as a diagnostic test for osteoarthritis.
The maximum flexion and internal rotation test, maximum flexion and external rotation test, resisted straight leg raise test, and McCarthy test were each described only in approaching phase I studies. The methodological features and characteristics of these studies can be found in Tables 2 and 3, respectively.
The results of this systematic review demonstrate that the diagnostic validity of hip physical examination maneuvers is uncertain. For a diagnostic test to be clinically useful, it should have at least two independent phase III studies supporting its validity [1]. In addition, these studies must be methodologically rigorous. Only four studies [9,11,12,15] met all five criteria for methodological rigor.
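The two-study rule can be applied mechanically to the phase III evidence summarized in the Results. The sketch below is an illustrative tally, not part of the review's methods: the records paraphrase the text, studies described as having "unclear" or "questionable" rigor are coded as not rigorous (an assumption), and the two Trendelenburg studies (gluteal tendon pathology and gluteus medius tears) are grouped as "gluteal pathology". Counting is per test and target diagnosis, since studies of the same test for different diagnoses do not corroborate each other.

```python
from collections import Counter

# (test, target diagnosis, phase, methodologically rigorous) paraphrased from the Results.
# "Unclear" or "questionable" rigor is coded False (an assumption).
EVIDENCE = [
    ("Impingement test", "labral tear", "III", False),   # Martin et al.
    ("Impingement test", "labral tear", "III", False),   # Hananouchi et al.
    ("Impingement test", "FAI", "III", True),            # Kapron et al.
    ("Trendelenburg test", "gluteal pathology", "III", True),
    ("Trendelenburg test", "gluteal pathology", "III", True),
    ("FABER test", "osteoarthritis", "III", True),       # Sutlive et al.
    ("FABER test", "labral tear", "III", False),         # Martin et al.
    ("FABER test", "FAI", "III", True),                  # Kapron et al.
    ("Thomas test", "labral tear", "III", False),
    ("Flexion-internal rotation-axial compression test", "labral tear", "III", False),
    ("Scour test", "osteoarthritis", "III", True),
    ("Squat test", "osteoarthritis", "III", True),
]

def supported_tests(records, minimum=2):
    """(test, diagnosis) pairs backed by at least `minimum` rigorous phase III studies."""
    counts = Counter(
        (test, diagnosis)
        for test, diagnosis, phase, rigorous in records
        if phase == "III" and rigorous
    )
    return sorted(pair for pair, n in counts.items() if n >= minimum)

print(supported_tests(EVIDENCE))  # [('Trendelenburg test', 'gluteal pathology')]
```

The FABER test has two rigorous phase III studies overall, but for different diagnoses, so it does not meet the rule for any single diagnosis.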
For a phase III or phase IV sample to be representative, it must be selected from a population of patients with hip complaints for whom the physician would face diagnostic uncertainty. This includes patients across the full spectrum of the disease of interest, with and without concomitant pathology. One way to ensure that a sample is representative is to recruit consecutive patients with hip complaints. The study by Kapron et al. used a strictly asymptomatic population, which, while diagnostically uncertain for FAI, likely undermines its generalizability to the diagnostic workup of FAI and underestimates the sensitivity and specificity values. However, the remaining phase III studies consisted of patients facing diagnostic uncertainty.
A second criterion for assessing internal validity is that those interpreting the physical examination tests should be blinded to the results of the gold standard (and other imaging), and vice versa, to prevent biased interpretation. Most studies identified through our review effectively blinded the physical examination evaluator to the results of the gold standard but rarely discussed blinding to previous imaging. In contrast, only five studies demonstrated adequate blinding of the gold standard interpreter to the results of the physical examination tests. Inadequate blinding of interpreters may overestimate a test’s sensitivity and specificity, as the interpreter may feel pressure (conscious or unconscious) to agree with imaging results.
Lastly, the results of the diagnostic test should not influence who undergoes the gold standard (verification bias). All patients for whom the clinician faces diagnostic uncertainty (i.e., consecutive patients with hip complaints) should undergo the gold standard; otherwise the sample may include only patients with more severe disease, which will overestimate sensitivity. For example, Martin et al. performed MRA only on patients with hip pain who were candidates for surgery, thus overestimating the sensitivity values reported in their study. Importantly, sensitivity and specificity estimates can be biased even when it is not the test under investigation that influences the decision about who undergoes the gold standard.
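The direction of verification bias can be illustrated with expected counts. In the hypothetical cohort below (invented numbers, not data from any included study), every test-positive patient receives the gold standard but only a quarter of test-negative patients do. `verified_accuracy` is a hypothetical helper, and this models partial verification driven by the test result, which is only one of the mechanisms described above; it is a sketch, not a reanalysis of Martin et al.

```python
def verified_accuracy(tp, fp, fn, tn, p_verify_pos, p_verify_neg):
    """Apparent sensitivity and specificity among patients who receive the gold standard.

    Uses expected counts: test-positive patients are verified with probability
    p_verify_pos and test-negative patients with probability p_verify_neg.
    """
    v_tp, v_fp = tp * p_verify_pos, fp * p_verify_pos
    v_fn, v_tn = fn * p_verify_neg, tn * p_verify_neg
    return v_tp / (v_tp + v_fn), v_tn / (v_tn + v_fp)

# Hypothetical cohort: 100 patients with the disorder, 100 without (invented numbers).
TP, FP, FN, TN = 60, 30, 40, 70

true_sens, true_spec = verified_accuracy(TP, FP, FN, TN, 1.0, 1.0)
biased_sens, biased_spec = verified_accuracy(TP, FP, FN, TN, 1.0, 0.25)
print(f"complete verification: sens={true_sens:.2f}, spec={true_spec:.2f}")
print(f"partial verification:  sens={biased_sens:.2f}, spec={biased_spec:.2f}")
```

Restricting the analysis to verified patients raises apparent sensitivity from 0.60 to about 0.86 and lowers apparent specificity from 0.70 to about 0.37, because most test-negative patients, including the false negatives, never reach the gold standard.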
The Trendelenburg test for diagnosing gluteal pathology is the only physical exam maneuver with two methodologically rigorous phase III studies supporting its clinical use. While these studies demonstrate good specificity, their sensitivity estimates differ widely (0.23 and 0.73), suggesting that additional research is required before the Trendelenburg test can be considered a gold standard.
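To see why the disagreement in sensitivity matters, the two phase III Trendelenburg estimates can be converted into post-test probabilities with Bayes' theorem in odds form (post-test odds = pre-test odds × likelihood ratio). The pre-test probability of 0.5 used below is a hypothetical value chosen for illustration; neither study reports it, and the helper names are hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity (0 < specificity < 1)."""
    return sensitivity / (1 - specificity), (1 - sensitivity) / specificity

def post_test_probability(pretest_probability, likelihood_ratio):
    """Bayes' theorem in odds form: post-test odds = pre-test odds x LR."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

PRETEST = 0.5  # hypothetical; not reported by either study

trendelenburg_estimates = {
    "gluteal tendon pathology": (0.23, 0.94),
    "gluteus medius tears": (0.73, 0.77),
}

for target, (sens, spec) in trendelenburg_estimates.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    after_pos = post_test_probability(PRETEST, lr_pos)
    after_neg = post_test_probability(PRETEST, lr_neg)
    print(f"{target}: positive -> {after_pos:.2f}, negative -> {after_neg:.2f}")
```

Under this assumption, a positive test raises the probability to roughly 0.79 or 0.76, depending on which estimates are used, whereas a negative test lowers it only to about 0.45 with the low-sensitivity estimate but to about 0.26 with the higher one. The rule-in value is similar across studies; the rule-out value hinges on the unresolved sensitivity.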
The diagnostic validity of the impingement test was evaluated by six studies, three of which [7-9] provided phase III evidence. Only one of the studies assessing the impingement test was methodologically rigorous, and its study population was asymptomatic. There is currently minimal evidence to support the use of the impingement test in clinical practice.
Of the six studies evaluating the FABER test, three were phase III studies [7,9,15]. However, each of these three studies sought a different diagnosis. Consequently, additional phase III studies of the FABER test are required to estimate its diagnostic validity more accurately.
The remaining physical examination tests (Table 3) included in this systematic review have little or no evidence to support their use in routine clinical practice. Most of these tests have been evaluated in only one study, and most of these studies lack internal validity (Table 2). Therefore, further research is required before the validity of these physical examination tests can be accurately assessed.
The majority of the diagnostic maneuvers discussed evaluate pain rather than objective findings. Given the inherent subjectivity of pain reporting, this has likely reduced sensitivity values.
Strengths of this study include its methodological design, including a rigorous database search strategy and the inclusion of foreign studies. As with any systematic review, this study is limited by the quality of the available data. Furthermore, some heterogeneity was noted among the included studies in terms of the gold standard used, diagnosis sought, study type (prospective/retrospective), blinding techniques, and physical exam tests used.
Previous systematic reviews of this nature have proved clinically fruitful when sufficient data are available, and have shaped clinical practice both in orthopedics and in medicine in general. We hope the findings of this study will serve as an impetus for further study of the diagnostic validity of specific hip physical exam tests.
The diagnostic validity of clinical tests to diagnose the presence or absence of hip pathology remains uncertain. The majority of studies supporting validity of these tests lacked methodological rigor, and thus cannot provide evidence to support the use of a test in clinical practice.