

Retest Repeatability of Motor and Musculoskeletal Fitness Tests for Public Health Monitoring of Adult Populations | OMICS International
ISSN: 2165-7025
Journal of Novel Physiotherapies

Retest Repeatability of Motor and Musculoskeletal Fitness Tests for Public Health Monitoring of Adult Populations

Jaana Helena Suni1*, Marjo Birgitta Rinne1 and Jonatan R Ruiz2
1Urho Kaleva Kekkonen Institute for Health Promotion Research (UKK Institute), Finland
2Department of Physical Education and Sport, School of Sport Sciences, University of Granada, Spain
Corresponding Author : Jaana Helena Suni
Urho Kaleva Kekkonen Institute for Health Promotion Research (UKK Institute)
Kaupinpuistonkatu 1, 33500 Tampere, Finland
Tel: +358 3 2829 265
Fax: +358 3 2829 200
E-mail: [email protected]
Received December 03, 2013; Accepted February 20, 2014; Published February 22, 2014
Citation: Suni JH, Rinne MB, Ruiz JR (2014) Retest Repeatability of Motor and Musculoskeletal Fitness Tests for Public Health Monitoring of Adult Populations. J Nov Physiother 4:194. doi: 10.4172/2165-7025.1000194
Copyright: © 2014 Suni JH, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.



Background and purpose: Physical fitness reflects the effects of regular physical activity and is an important prognostic factor of health. Field tests of fitness can be used for public health monitoring, but their accuracy for this purpose needs to be determined. The purpose was to evaluate the adequacy of retest repeatability of nine tests of motor and musculoskeletal fitness against preset criteria.

Methods: One-week test-retest measurements with a single-tester design were conducted. The participants were volunteers, 25 women and 26 men between ages 22 and 64. The main repeatability estimates of within-subject variation were the typical error of measurement and the coefficient of variation (CV). For ordinal scale measurements, weighted Kappa coefficients were calculated. CV values of 10% or lower and Kappa coefficients >0.60 were rated as adequately reliable for public health monitoring. In addition, the systematic change and percent change in the mean were calculated.

Results: Six tests out of nine showed adequate repeatability (CV ≤ 10% or Kappa coefficient >0.60): agility in figure-of-eight run, handgrip strength, lower extremity power in vertical jump, upper body muscular strength and trunk stabilization in modified push-up, muscular endurance in static back extension, and shoulder-neck mobility. A retest learning effect was detected for the backwards tandem walk and modified push-up tests. Reliability of two tests was not studied due to a ceiling effect in results towards maximum values. The high level of physical activity and fitness of the participants reduced the intra- and inter-individual variation of the test results compared with the general population.

Conclusions: The present study provided further knowledge on reliable fitness tests suitable for population monitoring of musculoskeletal functioning and health. These tests may also be used as outcome measures to follow the effects of exercise interventions targeted at participants with musculoskeletal problems.

Keywords: Physical fitness; Reliability; Surveillance; Adult populations; Musculoskeletal functioning; Health
Abbreviations: CV: Coefficient of Variation
Introduction
Adequate physical activity (PA) and fitness are considered key factors in current public health promotion. Physical fitness also reflects the effects of regular physical activity. It has been recommended that assessment and monitoring of PA and fitness could be part of a public health strategy [1,2] to deliver community interventions that are likely to increase population PA and fitness levels. Furthermore, data collected from populations should provide evidence-based information for health policy planning [1,2].
There is strong evidence that low performance levels in several factors of physical fitness are risk factors for various health problems, including the major non-communicable diseases [3-5] and musculoskeletal disability related to mobility limitations [6-8], and there is increasing evidence for low back pain (LBP) [9-11]. Field-based methods of fitness that show a meaningful relationship with health and physical functional ability are needed for promotion of PA and fitness for health. At best, these assessments of health-related fitness [12] can be used to monitor the level of fitness in different populations, and to identify those with increased health risks due to inadequate levels of fitness [8,13].
In order to apply fitness tests to large populations the test methods need to be safe, economic and easy to administer under conditions available in ordinary communities [12]. Furthermore, their measurement error (reliability) needs to be established in relation to the measurement purposes [14]. Regarding population based fitness measurements, the testers categorize subjects into different levels of performance (i.e. into fitness classes), make comparisons between individuals and groups, and monitor fitness changes over time [14].
Retest repeatability concerns the consistency of the observed values when the measurement is repeated in the same environment with the same tester and participants. Within-subject variation is the most important type of repeatability measure: the smaller the within-subject variation (i.e. typical measurement error), the better the precision of single measurements and the better the observation of changes [15,16]. In order to correctly categorize individuals into fitness classes, the typical (standard) measurement error of the test needs to be smaller than the average range of the applied fitness classes [17]. Correct categorization is a critical issue in targeting interventions to low-fit population groups and individuals with increased risk of diseases or disability, and in epidemiological follow-up studies estimating the predictive effect of fitness level on future morbidity and mortality.
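The categorization criterion above can be illustrated with a minimal sketch. The class cut-offs below are hypothetical values chosen only for illustration (they are not taken from the study or from any published fitness classification), and the function names are likewise assumptions.

```python
def average_class_range(cutoffs):
    """Mean width of the fitness classes bounded by consecutive cut-offs."""
    widths = [upper - lower for lower, upper in zip(cutoffs, cutoffs[1:])]
    return sum(widths) / len(widths)


def precise_enough(typical_error, cutoffs):
    """True if the typical error is smaller than the average class range."""
    return typical_error < average_class_range(cutoffs)


# Hypothetical cut-offs (kg) separating four handgrip fitness classes.
hypothetical_cutoffs = [30.0, 36.0, 42.0, 48.0, 54.0]

class_range = average_class_range(hypothetical_cutoffs)   # 6.0 kg
ok = precise_enough(1.6, hypothetical_cutoffs)             # 1.6 kg < 6.0 kg
```

With these made-up classes, a typical error of 1.6 kg would be well below the 6 kg class width, so a single measurement would usually place a person in the correct class or, at worst, in a neighbouring one.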
Systematic change in the mean, a non-random change in the measurement value between two test sessions, is an important issue when volunteers perform a series of test trials as part of a monitoring program [15,16]. A typical example of a systematic change in physical fitness testing results is a learning effect (bias), i.e. the participants perform better in the second test session than in the first because they benefit from the experience of the first test session. To indicate true improvement or deterioration, the change in a person's fitness level over time needs to be bigger than the systematic change in the mean. The true change is also a critical issue in epidemiological studies that estimate the predictive effects of fitness changes on future morbidity and mortality [18,19].
The purpose of this study was to evaluate the retest repeatability of a set of health-related fitness tests which have potential to be used in public health monitoring to predict changes in future health and musculoskeletal functioning.
Materials and Methods
Volunteers from two departments of the University of Tampere, Finnish Railway Company Ltd, and a private gear manufacturing enterprise were recruited by sending an invitation letter with detailed information on the study by email to persons in supervising positions, who then forwarded the message. Altogether 51 subjects participated in the study, 25 women and 26 men aged 22 to 64. The mean age of the men was 42.5 (SD 13.4) and of the women 39.4 (SD 9.8). Background characteristics of the participants are given in Table 1.
The subjects were asked to contact the health promotion research institute by phone or email to make appointments for two measurement sessions approximately a week apart. The mean number of days between the two measurement sessions was 7.0 (SD 2.1, range from 2 to 15); 82% (n=40) had the retest within 5-9 days.
A pre-testing health screening was conducted according to the safety model of the Health-Related Fitness Test Battery for Middle-aged Adults [20]. It included a modified version of the Physical Activity Readiness Questionnaire (MPAR-Q) [21], and questions on perceived health status, overall level of physical activity [22] (including weekly frequency and intensity), and type of exercise. One experienced fitness tester was responsible for conducting all the fitness measurements.
Assessment of health-related fitness
Nine fitness tests were individually performed by each subject in a pre-set sequence with standard methods [17,23,24]: three motor tests measuring balance and agility, and six musculoskeletal tests measuring shoulder-neck flexibility, muscular strength of the upper and lower extremities, and endurance of the trunk muscles. The fitness items, with a description of their purpose, exclusion criteria, test performance and instructions, and scoring, are presented in the Appendix.
Statistical analyses
The mean with standard deviation (SD) or the median, and the minimum and maximum of the test-retest results in women and men are presented as descriptive statistics. The estimates of repeatability for interval scale measurements were calculated as suggested by Hopkins [15]. (1) The typical (standard) error of measurement (s), indicating within-subject variation, was calculated as the standard deviation of the test-retest differences (SDdiff) divided by the square root of two: s = SDdiff/√2. (2) To compare the repeatability of different tests, the typical error was also presented as the relative measurement error, i.e. the coefficient of variation (CV): the typical error divided by the mean of the two tests, CV = s / mean[(test_i + retest_i)/2], expressed as a percentage. (3) The systematic change in the mean with its 95% confidence interval (CI) was calculated using the paired samples t-test in SPSS 17.0 for Windows software (SPSS Inc., Chicago, IL). The mean change between the two measurement sessions was considered statistically significant if the 95% CI did not include zero. (4) The percent change in the mean performance between the first and second test session was also calculated. (5) In addition, the Bland and Altman plots (data not shown) were screened for heteroscedasticity [14]. (6) For ordinal scale measurements, the weighted Kappa coefficient was used to estimate the reproducibility between the two test sessions.
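The interval-scale estimates (1) to (4) can be sketched in a few lines of Python. This is an illustrative sketch rather than the SPSS procedure used in the study: the scores are made up, the function name `repeatability` is hypothetical, the critical t value is supplied by the caller, and taking the percent change relative to the first-session mean is an assumption, since the text does not state the denominator.

```python
import math
import statistics


def repeatability(test, retest, t_crit):
    """Summarise test-retest repeatability for paired interval-scale scores.

    t_crit is the two-sided 95% critical value of Student's t for
    len(test) - 1 degrees of freedom, supplied by the caller.
    """
    diffs = [r - t for t, r in zip(test, retest)]
    n = len(diffs)
    sd_diff = statistics.stdev(diffs)             # SDdiff (sample SD)
    typical_error = sd_diff / math.sqrt(2)        # s = SDdiff / sqrt(2)
    pair_means = [(t + r) / 2 for t, r in zip(test, retest)]
    cv_percent = 100 * typical_error / statistics.mean(pair_means)
    mean_change = statistics.mean(diffs)          # systematic change in the mean
    half_width = t_crit * sd_diff / math.sqrt(n)  # paired t-test interval
    ci_95 = (mean_change - half_width, mean_change + half_width)
    # Assumption: percent change is taken relative to the first-session mean.
    percent_change = 100 * mean_change / statistics.mean(test)
    return {
        "typical_error": typical_error,
        "cv_percent": cv_percent,
        "mean_change": mean_change,
        "ci_95": ci_95,
        "percent_change": percent_change,
        "significant": ci_95[0] > 0 or ci_95[1] < 0,  # CI excludes zero
    }


# Made-up scores for five participants (not study data).
test = [10.0, 12.0, 11.0, 13.0, 14.0]
retest = [11.0, 12.0, 12.0, 13.0, 15.0]
summary = repeatability(test, retest, t_crit=2.776)  # t for 4 degrees of freedom
```

For these made-up scores the typical error is about 0.39 units and the CV about 3.1%, while the 95% CI of the mean change spans zero, so the 5% mean improvement would not count as a statistically significant systematic change.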
The rationale for the criteria on adequacy of repeatability in the present study
Currently there are no consensus statements or standards for acceptable measurement precision for monitoring physical fitness at the population level. Other risk factors of non-communicable diseases, such as blood pressure, cholesterol and waist circumference, have been systematically monitored among large populations in several countries around the world. According to data from Australia [25], the CV for intra-individual variation of waist circumference was less than 1%, and less than 5% for blood pressure. The intra-individual variation in total cholesterol measurements is 7% and the additional analytical error attributable to laboratory personnel is 6% (personal communication from a chief biochemist of a university hospital). Due to this fairly large total error (13%), it has been recommended that cholesterol level should be measured more than once before making decisions on medication.
We used the CVs reported above as reference points when defining the adequacy of retest reliability for fitness tests intended for public health monitoring. The CV values of intra-individual variation (i.e. typical error) were rated as follows: less than or equal to 5% as highly reliable, values between 6 and 10% as adequately reliable, and those of 11% or more as not reliable for public health monitoring. Altman [26] has rated the Kappa coefficients as follows: 0.81 to 1.00 = very good, 0.61 to 0.80 = good, 0.41 to 0.60 = moderate, 0.21 to 0.40 = fair and ≤ 0.20 = poor. In the present study, weighted Kappa estimates higher than 0.60 were considered adequately repeatable for the purpose of public health monitoring.
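The rating rules above translate directly into two small lookup functions. The sketch below is illustrative only; the function names are hypothetical, and because the text defines the CV bands for whole percentages (≤5, 6 to 10, ≥11), treating fractional values with the cut-offs ≤5 and ≤10 is an assumption.

```python
def rate_cv(cv_percent):
    """Rate a CV (in percent) against the study's preset bands.

    Assumption: fractional values such as 5.4% are not addressed in the
    text; this sketch simply applies the cut-offs <=5 and <=10.
    """
    if cv_percent <= 5:
        return "highly reliable"
    if cv_percent <= 10:
        return "adequately reliable"
    return "not reliable"


def rate_kappa(kappa):
    """Rate a Kappa coefficient using Altman's categories."""
    if kappa > 0.80:
        return "very good"
    if kappa > 0.60:
        return "good"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "poor"


def adequate_for_monitoring(cv_percent=None, kappa=None):
    """Apply the preset criteria: CV <= 10% or weighted Kappa > 0.60."""
    if cv_percent is not None:
        return cv_percent <= 10
    return kappa is not None and kappa > 0.60
```

For example, `rate_cv(2)` gives "highly reliable", `rate_cv(15)` gives "not reliable", and `rate_kappa(0.85)` gives "very good".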
Results
Descriptive results of the test-retest measurements in women and men are presented in Table 2. In general, the men had somewhat better fitness in all tests except the static back extension, in which women had better endurance strength. In the shoulder-neck test, women had better results for the right side than the men: in both the first and second session, 13 women out of 25 had the maximum score of 3 (no movement restriction), while the respective figures for men were six and seven out of 26. There was a strong ceiling effect towards maximum values in one motor fitness test (balance in one-leg stand) and one musculoskeletal test (muscular endurance in dynamic sit-up). Due to this, repeatability analyses were not conducted for these two test items.
The main results of the retest repeatability analyses for interval scale measurements are presented in Table 3. Regarding the tests of motor fitness, the smallest within-subject variation as the percent typical error (i.e. CV) was found for the running figure-of-eight test (2%), with the standard deviation of the test-retest difference (SDdiff) being 0.22 s. Percent change in the mean between the two test sessions was less than 1%, and there was no sign of systematic bias. Repeatability of the backwards tandem walk test indicated more intra-individual variation (SDdiff=3.2 s), with the CV of 15% being the highest of all the tests analysed (Table 3). Furthermore, the performance improved significantly from test to retest.
Of the musculoskeletal tests, handgrip strength showed small intra-individual variation (SDdiff=2.3 kg), the CV being the second lowest (4%) of all tests. However, the 95% CI of the change in the mean indicated a small, statistically significant improvement in handgrip strength from the first test session to the retest session (Table 3). Lower extremity power in the jump-and-reach test also had fairly small intra-individual variation (SDdiff=3.4 cm) and a CV of 6%, and a very low 1% change in the mean with no indication of systematic bias. The CV of intra-individual variation (SDdiff=1.6 rep.) in the modified push-up test was 8%, the percent change in the mean also being 8%. The retest performance improved on average by 1.1 repetitions, which was statistically significant (Table 3). Endurance time in the static back extension test (SDdiff=18.6 s) had a CV of 10%; however, the mean change was small, with a percent change of 1% (Table 3).
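The SDdiff values quoted in the text can be converted to typical errors with the relation s = SDdiff/√2 from the statistical methods. The short sketch below does only that arithmetic; the dictionary keys are labels chosen here for readability, and the rounded results are derived from the values in the text rather than copied from Table 3.

```python
import math

# SDdiff values as reported in the Results text, with their units.
sd_diff = {
    "figure-of-eight run (s)": 0.22,
    "backwards tandem walk (s)": 3.2,
    "handgrip strength (kg)": 2.3,
    "jump-and-reach (cm)": 3.4,
    "modified push-up (rep.)": 1.6,
    "static back extension (s)": 18.6,
}

# Typical error s = SDdiff / sqrt(2) for each test.
typical_error = {name: sd / math.sqrt(2) for name, sd in sd_diff.items()}

for name, s in typical_error.items():
    print(f"{name}: typical error = {s:.2f}")
```

The figure-of-eight value works out to roughly 0.16 s, which agrees with the typical error cited for this test in the Discussion.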
In the shoulder-neck flexibility test, the result was based on visual observation by the tester using ordinal scale criteria with three categories (Appendix). The weighted Kappa coefficient was good (0.80) for the right side and very good (0.85) for the left side. The number of discordant pairs from test to retest was eight for the right side (2 improved, 6 deteriorated) and six for the left side (2 improved, 4 deteriorated). None of the ratings changed from 1 to 3 or vice versa.
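For readers unfamiliar with the weighted Kappa coefficient, the following sketch computes it from scratch for a three-category ordinal score. The text does not say whether linear or quadratic weights were used, so the linear default (`power=1`) is an assumption; the ratings are made up and the function name is hypothetical.

```python
def weighted_kappa(first, second, categories, power=1):
    """Weighted Kappa for two sets of ordinal ratings of the same subjects.

    power=1 gives linear disagreement weights, power=2 quadratic ones.
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(first)
    # Observed proportions in the k x k test-retest table.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(first, second):
        observed[index[a]][index[b]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (abs(i - j) / (k - 1)) ** power  # 0 on the diagonal
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row[i] * col[j]
    return 1 - weighted_observed / weighted_expected


# Made-up ratings (1 to 3) for eight subjects, not study data.
session_1 = [1, 1, 2, 2, 3, 3, 3, 2]
session_2 = [1, 2, 2, 2, 3, 3, 2, 2]
kappa = weighted_kappa(session_1, session_2, categories=[1, 2, 3])
```

Because disagreements are weighted by their distance on the ordinal scale, a change from 1 to 3 would lower the coefficient more than a change between adjacent categories.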
Discussion
We studied different aspects of reliability of nine field tests of motor and musculoskeletal fitness with potential for public health monitoring among working-aged populations. The following six tests showed adequate repeatability for population monitoring according to the preset criteria (CV ≤ 10%, Kappa coefficient >0.60, see methods) on intra-individual variation within test-retest sessions one week apart: agility in figure-of-eight run, handgrip strength, lower extremity power in vertical jump (jump-and-reach), upper body muscular strength and trunk stabilization in modified push-up, muscular endurance in static back extension, and shoulder-neck mobility. The backwards tandem walk test had unacceptably high intra-individual variation and there was a systematic learning effect. Tests of one-leg balance and dynamic sit-ups lacked discriminatory power (Table 2) and were excluded from the repeatability analyses.
Repeatability of the motor fitness tests
The running figure-of-eight test had the lowest intra-individual variation of all the tests (CV 2%), and a very small change in the mean. Absolute measures of repeatability for the test were first introduced by Vartiainen et al. [24] in young healthy men. They reported the typical error to be 0.14 s, which is similar to the present finding of 0.16 s in healthy middle-aged men and women. Among 50 young women (athletes and non-athletes) [27], a zigzag run test had higher repeatability correlations (intraclass correlation coefficient, i.e. ICC, was 0.97) than any of the five studied hop tests (range of ICC from 0.84 to 0.94, standard error of measurement 0.26 s), which may indicate that participants adapt more easily to running than to hopping. The running figure-of-eight test was originally designed for clinical use to monitor patients after knee trauma operations [28]. The test requires both agility and speed of movement (power). More recently, a high performance level in the test was associated with high quality of life in elderly women [29]. Furthermore, the test result seems to differentiate patients with head injury (contusion) from healthy control persons [30]. Thus, the figure-of-eight test seems to offer a reliable and meaningful health-related test mode that is feasible from early adulthood to old age.
The repeatability of the backwards tandem walk test was poor (CV 15%). Originally the test was used as an indicator of dynamic balance in elderly women [31], and the authors reported a high correlation (r=0.94) between the repeated measures one week apart. A high correlation (ICC=0.85) between two tests and the 95% limits of agreement in the Bland and Altman plots were also reported by Rinne et al. [23]. Their conclusion was that the repeatability was "reasonable." The most recent findings by Vartiainen et al. [24] agree quite well with the present findings: ICC 0.71, SDdiff 2.2 s, and typical error 1.6 s with a corresponding CV of 12% (the latter calculated by the present authors). Among the elderly, both poor static and poor dynamic balance are strong predictors of mobility disability [8,32] and falls [33].
Repeatability of the musculoskeletal fitness tests
In the present study, measurements of handgrip strength were highly repeatable (CV 4%) among working-aged men and women despite a small learning effect. Earlier studies have also reported high repeatability correlations [34,35]; however, there is considerable variation and insufficient reporting on the measurement protocols, values recorded, and results used (mean value, from one, two or three attempts, with either hand or the dominant hand alone) [35]. For instance, body, wrist and handle positioning have an effect on both strength values and test repeatability [35]. These factors were all controlled in the present study (see Appendix). Low grip strength has been associated with a greater likelihood of premature mortality and the development of disability in middle-aged and elderly populations [36,37], and it is measured to detect sarcopenia [34].
The intra-individual variation in test-retest sessions of the jump-and-reach test (i.e. vertical jump) was adequate (CV 6%), with no systematic bias and a very low change in the mean (1%). The finding is in line with a former study, which reported a standard error of 3.0 cm [17]. According to the review by Hopkins et al. [16], the CV of tape measures of vertical jump height among athletes varied between 3.8 and 4.8%. Athletes are likely to be measured regularly; however, our results with more "novice" jumpers are well in line with these former results. In the present study protocol, a practice trial was performed before the two test trials in both test sessions. This is important because the CV between subsequent trials during the same measurement session [16] is much smaller between the second and third trials (0.2%) than between the first and second trials (1.2%). Vertical jump requires the ability to activate fast motor units in a short period of time, i.e. muscular power. The importance of lower extremity power has been emphasized in relation to mobility disability and ultimately risk of falls in elderly populations, especially in women [38-40]. Because muscular power starts to decline after 40 years of age, which is earlier than maximal muscular strength or endurance [41], population monitoring should start before this age to emphasize the importance of power-type muscular training for maintenance of mobility functioning at older age.
The modified push-up test results showed acceptable within-subject variation (CV 8%); however, there was a statistically significant mean improvement in test-retest results of 1.1 repetitions (95% CI 0.7 to 1.6), indicating a learning effect. The participants were allowed to practice the performance technique once during a push-up cycle in order not to cause fatigue before the actual testing. In a previous investigation with a less selected study population, the test-retest learning effect was even bigger (mean difference 3.0; 95% CI from 2.1 to 3.9) [17]. A practice test session before actual testing would probably help to overcome the bias of learning. We were not able to find any studies reporting absolute measures of repeatability on conventional push-up tests with which to compare the results of the modified test. Low fitness in the modified push-up test has been associated with poor perceived health, and with low back dysfunction and pain among the middle-aged [42]. Recently, an increased risk for low back pain was reported in previously healthy conscripts with a poor fitness level in trunk muscular endurance and aerobic performance [11]. The strongest risk factor at entry was poor fitness in both back-lift and conventional push-up tests, i.e. co-impairment (hazard ratio 2.8; 95% confidence interval 1.4-5.9).
The intra-individual variation in the static back (trunk) extension test was barely acceptable (CV 10%). The test is a modification of the Sørensen back endurance test. A review by Essendrop et al. [43] concluded that repeatability statistics other than Pearson correlations are lacking for the Sørensen test. Keller et al. [44] reported a CV of 21% and stated that the test-retest variation was too high for use in follow-up studies. The results of the present study were somewhat better (CV 10%), possibly due to the aforementioned selection of the subjects. It is well accepted that motivation is strongly related to back endurance test performance due to the uncomfortable static position and long testing time. Despite the poor general repeatability of static trunk extension tests, there is evidence that poor endurance predicts the incidence of LBP [11,45-47].
In the shoulder-neck mobility test, the weighted Kappa coefficients were clearly acceptable (≥0.80) for population monitoring. The results were somewhat better than in a former study where the weighted Kappa was 0.60 for the right side and 0.61 for the left [17]. This test could be a practical tool to screen for subjects with stooped upper body posture (see Appendix), which is commonly related to a high amount of sitting and computer work in children, adolescents and the working-age population. In time, the stooped posture leads to restricted shoulder joint mobility and changes in vertebral level mobility of the lower cervical and upper thoracic spine.
Limitations of the study
The main limitation of the study was the homogeneity of the participants in terms of their high level of physical activity and fitness, which prevented evaluation of the repeatability of two tests (one-leg stand, dynamic sit-up) and probably reduced the intra- and inter-individual variation in all test results. The results of the present study are likely to overestimate the level of repeatability in situations where the studied fitness tests are applied to the general population.
Strengths of the study
Reasonable precision for estimates of reliability requires approximately 50 study participants [15]; the present study included 51 subjects. Furthermore, only one experienced fitness tester conducted the measurements, which emphasizes the role of intra-individual variation as the source of error [15]. The selected statistical methods and outcome measures agree with expert recommendations [14-16,26]. We used Bland-Altman plots to screen for possible heteroscedasticity [14], but decided not to publish them in the present paper because no such trends were detected. We chose not to report the 95% limits of agreement, because they are considered too wide (2.8 times the typical error) to detect important trustworthy changes [15].
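The factor of about 2.8 follows from the definition of the typical error: since SDdiff = s√2, limits of agreement of ±1.96 × SDdiff equal ±1.96 × √2 × s. The sketch below only reproduces this arithmetic under a normal approximation (1.96 rather than a t value) and ignores any bias in the mean; the function name is hypothetical and the handgrip figure is the SDdiff quoted in the Results.

```python
import math

Z_95 = 1.96                       # normal approximation for 95% coverage
LOA_FACTOR = Z_95 * math.sqrt(2)  # about 2.77 typical errors


def loa_half_width(typical_error):
    """Half-width of the 95% limits of agreement for a given typical error."""
    return LOA_FACTOR * typical_error


# Handgrip strength: SDdiff = 2.3 kg, so the typical error is about 1.6 kg.
handgrip_typical_error = 2.3 / math.sqrt(2)
handgrip_loa = loa_half_width(handgrip_typical_error)  # about 4.5 kg
```

Under these assumptions the handgrip limits of agreement would be roughly ±4.5 kg around the mean change, which illustrates how much wider they are than the typical error itself.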
Musculoskeletal disorders constitute major public health problems, and are the most common causes of work disability and consequent absence from work [48]. The results of the present study provide further knowledge on reliable measurement tools for population monitoring of musculoskeletal functioning and health. Physiotherapists could also use the tests as outcome measures of activity limitations in exercise interventions targeted at improving the physical functioning of working-aged individuals with musculoskeletal problems.
The study was conducted as part of the project ALPHA (Instruments for Assessing Levels of Physical Activity 2007-2009), funded by the European Commission, DG SANCO. ALPHA aimed at providing a set of research-based instruments for assessing levels of physical activity, its underlying factors (e.g. built environment, transport, and workplace), as well as physical fitness in a comparable way within the EU.
