Longitudinal Comparison of Desktop and fMRI Scanner Versions of the Computerized Test of Information Processing in Multiple Sclerosis: A Pilot Study

Deficits in information processing speed are common in individuals with multiple sclerosis (MS), but adequate detection of these deficits is fraught with methodological difficulties. The Computerized Test of Information Processing (CTIP) is a relatively new measure that addresses some of these methodological concerns. The equivalency of a desktop and an fMRI scanner version of the CTIP at detecting cognitive impairment and monitoring cognition over time was examined in a sample of individuals with MS. Six individuals with a confirmed diagnosis of MS completed both the desktop and a modified scanner version of the CTIP at baseline and 3-year follow-up. Both forms of the task remained equally sensitive at detecting cognitive impairment across the two time points. Similarly, reaction time performance was generally equivalent across both forms, although participants made more errors on the scanner task. No participant demonstrated reliable change in performance over time. A modified version of the CTIP adapted for use in an fMRI environment holds promise for the detection of information processing speed deficits in MS.


Introduction
Cognitive impairment presents in roughly 40-60% of individuals with multiple sclerosis (MS) [1,2], affecting a wide variety of cognitive domains. Areas of dysfunction typically include attention, working memory and executive functioning [3]; however, a number of recent studies have suggested that the primary cognitive deficit that occurs in MS is a reduction in information processing speed (IPS) [4,5].
Processing speed reflects the amount of time required to carry out a cognitive task, or the amount of work conducted over a certain period of time [6]. Deficits in IPS represent a significant impairment in MS given that these deficits may underlie dysfunction in other cognitive domains. DeLuca et al. (2004) formalized this idea in the Relative Consequence Model, which suggests that the fundamental difficulty in processing speed observed in individuals with MS may underlie other areas of impairment, such as working memory [4,7,8] and episodic memory [9,10].
The Paced Auditory Serial Addition Test (PASAT) is generally acknowledged in the MS literature to be one of the most sensitive measures of IPS deficits in MS [11,12]. Research has shown, however, that PASAT performance can be influenced by age [13], education [14], and mathematical ability [15]. In addition, the PASAT is often reported to be a frustrating and aversive task for most individuals, regardless of cognitive status [12], and is prone to significant practice effects with serial administrations [12,16].
In light of the above, a relatively new measure of IPS, the Computerized Test of Information Processing (CTIP) [17], was developed. The CTIP is composed of three reaction time (RT) tasks which measure the speed at which individuals can respond to different types of stimuli. The time required for a response to the stimuli can be considered a reflection of the speed at which information processing occurs. The three RT tasks which make up the CTIP present with progressively increasing cognitive demands with each successive task. These are accompanied by increased RTs in both individuals with MS as well as healthy controls [18]. Preliminary studies have demonstrated, however, an even greater increase in RT with increasing cognitive demands for individuals with MS when compared to controls, and that this discrepancy increases with increased task complexity [18,19]. Unlike the PASAT, research has shown the CTIP is not influenced by an individual's mathematical ability [19], is perceived as less aversive and less difficult when compared to the PASAT [20], and is free from practice effects [21], making it ideal for serial testing. The CTIP has, thus, been suggested as a viable alternative tool for assessing IPS deficits in MS [18,19].
Research has shown that the CTIP can be used as a sensitive measure to detect cognitive impairment in MS. Reicker et al. (2007) found that when RT scores for a group of MS patients and healthy controls were converted to cut-off percentiles (at the 50th, 10th, and 5th percentile), a significantly greater number of MS patients fell at or below each cut-off for all three subtests of the CTIP when compared to healthy controls. These results support the clinical utility of the CTIP as a sensitive measure of the cognitive impairment typically observed in MS. The current study examined the utility of both the traditional desktop form and a functional MRI (fMRI) version of the CTIP at detecting cognitive impairment, as well as change in impairment over time.
While the utility of the CTIP in a desktop form has been relatively well established, its usefulness in conjunction with fMRI remains less clear. Along with the current advances being made in fMRI comes the need to adapt more sophisticated approaches to neuropsychological assessment to the scanner environment. IPS deficits in individuals with MS have been assessed within the MRI scanner using modified versions of the PASAT [22,23] and the Symbol Digit Modalities Test (SDMT) [24]. Only one study to date, however, has examined CTIP performance within the scanner [25]. While this study examined levels of performance for both an MS and a healthy control group on a scanner form of the task, no direct comparison to performance on the traditional desktop form was made. The equivalency of performance between a scanner and a desktop form of the task was examined in the present study.
Few studies to date have examined the utility of an IPS measure at detecting cognitive change over time in conjunction with fMRI. It is well established that cognitive deficits often progress over time; however, the rate of progression is often slow, with changes being detectable only in a subset of individuals and only after at least a 3-year interval [26,27]. The Reliable Change Index (RCI) is a frequently used method of detecting meaningful change over time [28]. This method allows for the examination of significant change over time above and beyond that expected from practice effects alone (a significant problem with the serial administration of neuropsychological tests). With this in mind, the current study administered the CTIP both in the fMRI environment and on the desktop after a three-year interval. RCI scores were calculated to examine the utility of each form of the test in detecting cognitive change over time.
The primary objective of the study was to determine whether the desktop and scanner versions of the CTIP are equivalent in both detecting cognitive impairment and monitoring cognition (IPS specifically) over time. Level of impairment on both forms of the task was examined in order to compare the sensitivity of each form of the CTIP as well as to determine whether those individuals who showed impairment at baseline continued to show impairment three years later. Performance on the two versions was examined at both time points to determine their equivalency. The current study also analyzed RCI scores on both forms of the task in order to examine any meaningful change over the three-year period.
First, it was hypothesized that the same number of individuals would show impairment on one form of the CTIP as on the other, and that a similar degree of impairment would be observed at both baseline and 3-year follow-up on both forms. Second, it was hypothesized that performance on the desktop version of the CTIP would be equivalent to performance on the scanner version at both time points. Finally, it was hypothesized that those individuals who showed reliable change over time on one form of the CTIP would similarly show change on the other. Note that patterns of neural activation will be discussed elsewhere, as the goal of the current paper is to focus on the detection of cognitive impairment.

Method

Participants
Six individuals ranging in age from 44 to 54 years (mean=47.67, SD=3.72) with a confirmed diagnosis of MS (McDonald, 2001 criteria) and an Expanded Disability Status Scale (EDSS) score of <3.0 (excluding cerebral/mental functions) were recruited through the MS Clinic of the Ottawa Hospital. Four individuals presented with a relapsing-remitting (RRMS) disease course, and two with a secondary progressive course (SPMS). All participants were female, with education levels as follows: 3 completed high school, 1 completed college, and 2 completed university undergraduate degrees. Disease duration ranged from 1 to 26 years since the date of diagnosis (mean=9.67, SD=9.40). The Quick Test was administered as an estimate of premorbid intellectual ability, with participants required to achieve a Full Scale IQ (FSIQ) score of at least 90 to be eligible for the study, to ensure that any cognitive deficits were not due to premorbid intellectual limitations. These individuals presented with obvious cognitive impairment as assessed by their neurologist, themselves, or a close friend or relative and were free from previous neurological, medical, or psychiatric illnesses that may have impaired cognition. In addition, a neuropsychological battery had been administered to all patients that included measures of attention, memory and learning, and executive function. Five of the six MS patients scored at least one standard deviation below the mean (based on standardized normative data) on 5 or more tests at baseline. As such, there is objective evidence to support the subjective opinion that these individuals were cognitively impaired. Patients with significant physical impairment were not included in the study in order to ensure there was no contribution of motor difficulties to their overall level of performance on the tasks. All participants were fluent in English.

Procedure and measures
The study was approved by the Ottawa Hospital Research Ethics Board. After informed consent was obtained, individuals were administered the CTIP both on the desktop (as part of a larger neuropsychological battery of tests) as well as in the MRI scanner. The desktop form of the CTIP is composed of three reaction time (RT) measures: Simple RT (SRT) (i.e. press key for "X"), Choice RT (CRT) (i.e. press left key for "KITE"; press right key for "DUCK"), and Semantic Search RT (SSRT) (i.e. press left key if presented word is not in a particular semantic category; press right key if presented word belongs in a particular semantic category). Each of these three tasks contains 3 blocks of 10 trials each for a total of 30 trials.
The scanner version of the CTIP was as similar to the published desktop version as possible. Given the nature of fMRI, however, differences in design were introduced. The fMRI version presented blocks of stimuli for the three RT measures and a rest condition in a pseudo-random order. SRT was assessed by presenting instructions on a screen for 3 seconds which read "Press for X". CRT was assessed by having individuals press with their right index finger for "DUCK" and their left index finger for "KITE". Finally, SSRT was assessed by presenting instructions which read "Press Right for Match, Left for No Match". The Rest condition began with the instruction "REST" presented on the screen, followed by a focal cross to view while no responses were being made. Ten stimuli were presented, one at a time, in each block for the three RT tasks. CRT and SSRT conditions were balanced with 5 right and 5 left finger responses required and the interstimulus intervals between stimulus presentations for each type of  [17]. The number of MS patients falling at or below the 50th, 10th, and 5th percentiles was determined. The 10th and 5th percentiles represent cut-off values commonly utilized in clinical settings to determine an impaired level of performance on neuropsychological tests. Given that no normative data for the scanner form of the CTIP are currently available, percentile scores for each individual on each measure were calculated in a manner similar to the desktop form as outlined above. Chi-square tests were used to determine if any relationship existed between the time of testing (baseline vs. 3-year) and whether performance fell above or below the designated cut-offs on each form of the task (i.e. was there a difference in the number of people impaired at baseline and follow-up).
In addition, chi-square analyses were performed to determine if a relationship existed between the two forms of the task and whether performance fell above or below the designated cut-offs at each time point (i.e. was there a difference in the ability to detect impairment between forms).
Hypothesis 2: To determine the equivalency in performance between the two forms, performance was assessed using percent-change scores and the number of errors produced on each form of the task. Percent-change scores examine the degree to which performance on one subtest has changed when compared to performance on a previous subtest. In this way, one can subtract out the motor component (as assessed by the SRT task) from both the CRT and SSRT tasks. This is important when one considers that individuals with MS often present with motor impairment which may impact their ability to perform the task successfully. In addition, one can further subtract the CRT task from the more complex SSRT task in order to examine more specifically the higher level of cognitive processing required when completing the SSRT task.
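The exact formula for the percent-change score is not given here. The following is a minimal sketch, assuming the conventional definition (the RT difference between two subtests expressed as a percentage of the simpler subtest's RT). The function name and the median RT values are hypothetical and are not data from the study.

```python
def percent_change(simpler_rt: float, harder_rt: float) -> float:
    """Percent increase in RT from the simpler subtest to the harder one.

    Assumed definition: (harder - simpler) / simpler * 100.
    """
    if simpler_rt <= 0:
        raise ValueError("reaction time must be positive")
    return (harder_rt - simpler_rt) / simpler_rt * 100.0


# Hypothetical median RTs in milliseconds for one participant.
srt, crt, ssrt = 400.0, 600.0, 900.0

# CRT relative to SRT: cost of the choice decision beyond the motor response.
crt_srt = percent_change(srt, crt)

# SSRT relative to CRT: cost of semantic search beyond a simple choice.
ssrt_crt = percent_change(crt, ssrt)

print(f"CRT-SRT: {crt_srt:.1f}%  SSRT-CRT: {ssrt_crt:.1f}%")
```

Expressing each subtest relative to the one below it means a participant who is uniformly slow for motor reasons is not penalized twice, which is the rationale described above.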
Separate 2 x 2 (Version [i.e. desktop vs. scanner] x Time) repeated measures analyses of variance (ANOVAs) were performed. For the first, the dependent variable was the CRT-SRT percent-change score. A similar 2 x 2 (Version x Time) repeated measures ANOVA was performed with the SSRT-CRT percent-change score. Significant interactions were followed by analyses of simple effects. In addition, equivalency between forms was determined by examining the number of errors made. This analysis used a 2 x 2 x 2 (Version x Task x Time) repeated measures ANOVA. The dependent variable was the number of errors made on the choice and semantic search subtests and, once again, significant interactions were followed up by analyses of simple effects. It should be noted that the SRT task was not included in this analysis given that no errors can be produced (i.e. no right/wrong choice can be made on this task).
Hypothesis 3: In order to examine reliable change over time, RCI scores were calculated for each individual on each measure. RCI scores were calculated based on normative data as follows: The test-retest reliability coefficient ($r$) was obtained through the published normative data for the desktop form. The standard error of measurement ($SE_m$) was calculated by: $SE_m = SD_1\sqrt{1-r}$, where $SD_1$ represents the standard deviation of the normative sample at Time 1. The standard error of the difference ($SE_{Diff}$; i.e. the distribution spread of change scores expected if no change occurred) was calculated by: $SE_{Diff} = \sqrt{2(SE_m)^2}$. Once these values were obtained, an RCI score was obtained using the following formula: $RCI = (T_2 - T_1)/SE_{Diff}$, where $T_2$ and $T_1$ represent an individual's median RT obtained on the task at 3 years and baseline, respectively.
Traditionally, a correction for practice effects is made when obtaining an RCI score [29]. Given that past research has shown the CTIP to be free from practice effects [21], this correction was deemed unnecessary. As no normative data currently exist for the scanner version of the CTIP, RCI scores for the scanner form were calculated from the published desktop norms in the manner outlined above. RCI scores beyond ±1.64 (i.e. an absolute value greater than 1.64) were considered to represent a statistically significant change in performance over time.
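The RCI computation described above can be sketched as follows. The function names and all numeric inputs are hypothetical illustrations, not values from the published norms, and the threshold is treated as an absolute value greater than 1.64, which is one reading of the criterion stated above.

```python
import math


def reliable_change_index(t1: float, t2: float, sd1: float, r: float) -> float:
    """RCI without a practice-effect correction.

    t1, t2: an individual's median RT at baseline and at follow-up
    sd1:    standard deviation of the normative sample at Time 1
    r:      test-retest reliability coefficient of the measure
    """
    if not 0.0 <= r < 1.0:
        raise ValueError("r must satisfy 0 <= r < 1")
    se_m = sd1 * math.sqrt(1.0 - r)        # standard error of measurement
    se_diff = math.sqrt(2.0 * se_m ** 2)   # standard error of the difference
    return (t2 - t1) / se_diff


def is_reliable_change(rci: float, threshold: float = 1.64) -> bool:
    """True when the RCI lies beyond +/- threshold."""
    return abs(rci) > threshold


# Hypothetical inputs (ms): a 40 ms slowing, normative SD of 100 ms, r = 0.75.
rci = reliable_change_index(t1=500.0, t2=540.0, sd1=100.0, r=0.75)
print(f"RCI = {rci:.3f}, reliable change: {is_reliable_change(rci)}")
```

With these illustrative inputs, $SE_m = 50$ and $SE_{Diff} \approx 70.7$, so a 40 ms slowing falls well inside the band expected from measurement error alone. Because the dependent measure is RT, a positive RCI would indicate slowing and a negative RCI would indicate faster responding.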

Results
The computerized statistical package PASW Version 18 for Windows was used for data analyses. A significance level of α ≤ 0.05 was used throughout.

Impairment
The number and percentage of individuals scoring at or below the 50th, 10th, and 5th percentiles on each subtest of the desktop and scanner forms of the CTIP are presented in Table 1.
Chi-square analyses revealed that there were no differences in the number of people impaired between baseline and follow-up for any of the CTIP tasks. Thus, the amount of impairment detected remained stable over time. When the number of individuals falling at or below each cut-off was compared across forms at baseline, a relationship between CTIP version and the number of individuals who fell below the 10th and 5th percentiles of the SRT task was noted (SRT 10th percentile: $\chi^2(1, N=12)=5.33$, $p=0.040$; SRT 5th percentile: $\chi^2(1, N=12)=5.33$, $p=0.040$). Thus, the scanner version detected impairment in more people on the SRT task than did the desktop version.
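As an illustration of the Version x Impairment comparison, the sketch below computes a Pearson chi-square statistic (without continuity correction) for a 2 x 2 table. The scanner count of 5 of 6 participants below cut-off is taken from the Discussion. The desktop count of 1 of 6 is an assumption, since Table 1 is not reproduced here; it was chosen only because it yields the reported statistic. The function name is hypothetical, and no p-value is computed because the test variant used to obtain the reported p is not stated.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for the table [[a, b], [c, d]].

    No continuity correction is applied.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Rows: CTIP version. Columns: (at or below cut-off, above cut-off).
scanner = (5, 1)  # 5 of 6 below cut-off, as stated in the Discussion
desktop = (1, 5)  # assumed for illustration; Table 1 is not available here

chi2 = chi_square_2x2(*scanner, *desktop)
print(f"chi-square(1, N=12) = {chi2:.2f}")
```

With expected cell counts this small (3 per cell), an exact test is generally preferable to the chi-square approximation, which is worth bearing in mind when interpreting the result.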

Performance
CTIP mean percent-change scores for both the desktop and scanner versions are listed in Table 2. The number of errors made on each subtest for each CTIP version is presented in Table 3.
In contrast, when the number of errors made was considered as the dependent variable, a significant main effect was found for Version (F(1,3)=10.14, p=0.050). This suggests that, unlike when percent-change scores were used as a performance measure, the desktop and scanner forms of the CTIP do differ in terms of their performance equivalency when accuracy of responding is considered. These results occurred at both time points. Examination of the means shows that, in general, individuals made a larger number of errors while completing the CTIP within the scanner environment. In addition, a main effect for Task (F(1,2)=37.23, p=0.026) was found, with individuals making a larger number of errors on the SSRT task when compared to the CRT task on both forms of the CTIP at both time points.

Change over time
Individual RCI scores for all subtests on both CTIP forms are presented in Table 4. Examination of these scores indicates that individuals did not demonstrate a statistically significant change in level of performance from baseline to 3-year follow-up on either CTIP form. This suggests that performance remained stable over time at an individual level, as well as the group level noted above. Neither significant improvement nor significant decline was noted on either form of the task.

Discussion
The primary objective of the study was to examine the equivalency of the desktop and scanner forms of the CTIP in: 1) detecting cognitive impairment and monitoring impairment over time, 2) measuring level of performance, and 3) monitoring individual change in cognition over time.

Impairment
When the equivalency at detecting cognitive impairment over time was analyzed within each form, results showed that both forms of the task remained equally sensitive at detecting cognitive impairment across the two time points. That is, the number of individuals falling at or below each cut-off percentile remained consistent between baseline and 3-year follow-up on both forms of the task. These results were present across all three subtests. This suggests that either form of the CTIP can serve as a sensitive measure for detecting cognitive impairment over time.
When equivalency in detecting impairment was compared between the two forms, however, a greater number of individuals fell below the 10th and 5th percentiles on the scanner form of the SRT task. Indeed, the percentage of the sample (albeit a very small sample) falling below cut-off was 83% (i.e. 5/6 individuals). Although this might suggest that a large proportion of the participants demonstrated slowed simple reaction time, perhaps secondary to a primary motor deficit related to MS, the fact that this was not observed on the desktop form suggests that this is not an appropriate explanation. The question of why participants would be slower in the scanner than on the desktop on an apparently equivalent task remains. It is possible that the physical set-up of the task within the scanner adds a level of complexity to the task; for example, the participant is not able to view the response pads while performing the task. It may also be explained, in part, by the pseudo-random order in which the three subtests of the CTIP are presented within the scanner. Whereas the desktop form of the task presents blocks of each subtest in a consistent manner across the task and in order of increasing task complexity (i.e. all SRT trials followed by all CRT trials and finally all SSRT trials), the scanner form of the task presents the subtests in such a way that the order of trials, and subsequent complexity, cannot be easily predicted. While the pseudo-random order of the tasks in the scanner remained consistent across all participants, it was impossible for an individual to know which subtest would be presented next within the task. The longer reaction times noted for the scanner SRT task (as reflected by a greater number of individuals being classified below the 10th and 5th percentiles) may therefore be attributable to adjustments in response patterns made by an individual between each task.
In the scanner, individuals complete only 10 trials of each task at one time (as opposed to all 30) and may therefore not develop the same within-task practice as seen on the desktop form, given the smaller number of trials presented each time. The longer reaction times noted for the SRT task within the scanner may therefore also be attributable to the lack of within-task practice effects occurring over the 10 trials of this task. This inherent difference between the two tasks leads to a further methodological issue related to how impairment is defined. As noted above, there are no normative data available for the scanner version of the CTIP. As such, for current purposes "impairment" was defined using cut-offs derived from the desktop normative data. This may be inappropriate. The large degree of "impairment" detected on the scanner SRT task may not be actual impairment, but rather may be related to the structural differences between the two tasks as described. Thus, if the scanner version of the CTIP is to be used appropriately in the future, it will be necessary to establish normative data from a healthy control sample in order to more accurately detect true impairment.

Performance
It is important to note that true motor dysfunction may cause an individual's SRT to be substantially longer than expected. As individuals with MS can often present with impairments in motor functioning, this makes interpretation of results on the CRT and SSRT tasks more difficult, as both tasks rely on the same basic motor response as assessed by the SRT task. Consideration must be made then as to how to control for generalized effects of motor dysfunction in order to determine if results on the subsequent subtests are attributable solely to differences observed in SRT or to actual differences in information processing speed. For this reason percent-change scores were used in our performance analyses. In this way, results from the CRT and SSRT tasks can be interpreted as being solely reflective of the more cognitive components of these tasks.
Overall, performance on the scanner and desktop forms of the CTIP was similar across the two forms and at both time points when the percent-change scores were used as the performance measure. These findings were similar when both the SSRT-CRT and CRT-SRT percent-change scores were used. Given that performance on the desktop form of the task has been relatively well established as an appropriate measure of IPS deficits in MS [18,19,25], this equivalency in performance between the two forms suggests that the scanner form of the task can serve as a reliable measure for assessing IPS deficits within a scanner environment when the percent-change scores are used (and when the normative issues discussed above are addressed).
If one considers the number of errors made on each task as a measure of performance, a difference was noted across the two versions. In general, individuals were more likely to produce a higher number of errors on the scanner version of the task. These results were true for both the CRT and SSRT tasks, and at both time points. This suggests that although the versions are equal in their ability to detect processing speed deficits, they are not equivalent with regard to performance accuracy. The reason for this discrepancy may again be related to the physical set up of performing the task inside an MRI scanner and/or to the structural differences in the two versions of the task. The pseudo-random order of stimulus presentation in the scanner may make it more difficult for individuals to perform accurately given that they may have insufficient time to habituate to the particular task they are performing at any given time. A previous report by our group demonstrated that individuals with MS made more errors compared to a healthy control sample but that both groups reached levels of accuracy above 90% for both the CRT and SSRT conditions [25]. Thus, although more errors were made in the scanner, the level of accuracy remained high and thus the clinical significance of error differences between versions may be negligible.

Change over time
The results discussed thus far have focused on group comparisons in order to examine differences between performance on the two forms of the CTIP at baseline and 3-year follow-up. The issue remains, however, that group analyses of test scores often mask significant individual differences in performance, particularly with small sample sizes such as this. For this reason, RCI scores were calculated for each individual in order to assess an individual's level of change over time. Results showed that individuals did not demonstrate a significant change in performance over time on either version of the task, or on any of the three subtests. While these results suggest that individuals showed no improvement in cognitive performance over time, it is equally important to note that no significant decline was observed, suggesting stability in an individual's level of cognitive performance over time. Past research has demonstrated that change in cognition in MS is a slow process and that although there is a general trend for decline over time [30], it typically takes a number of years before change can be detected from a psychometric standpoint, and cognitive change in the short term is generally restricted to a sub-population (i.e. those impaired at baseline are more likely to show declines over time) [31,32]. The lack of change in the current sample may be due to an insufficient test-retest interval, and perhaps also to a methodological limitation of the study related to enrollment. Participants were enrolled based on a subjective impression of cognitive impairment by the patient, a family member or their treating neurologist. More stringent enrollment criteria of clearly defined impairment at baseline may have made it more likely that change would be detected over time.

Conclusions
In summary, the results suggest that, overall, the fMRI and desktop versions of the CTIP are relatively equivalent at monitoring cognitive impairment and individual change in cognition over time. Although both versions are similarly able to detect impairment on tasks of information processing speed, a difference was noted between the two forms when detecting impairment on a measure of motor performance (the SRT task), perhaps due to the methodological considerations noted above. Similarly, differences were noted in their performance equivalency depending on whether one considered processing speed or response accuracy. Whereas the two forms of the task were equivalent in performance levels when assessing how quickly an individual can respond to stimuli (i.e. percent-change score), subtle differences were noted in levels of performance between forms when assessing how accurately individuals respond (i.e. number of errors produced), with a greater number of errors being produced on the scanner form. Nonetheless, the high degree of accuracy on both forms suggests that this difference may be of negligible significance.
One aspect of the current study that limits its generalizability is the small sample size, in keeping with a pilot study. In addition, the lack of normative data for the scanner version of the CTIP makes the calculation of scanner-appropriate percentile cut-offs and RCI scores difficult. Lastly, the enrollment criteria should be more stringent in future studies to increase the likelihood of detecting cognitive change over time. The lack of such enrollment criteria in the current study is clearly a significant weakness and the authors recognize this shortcoming.
The CTIP has demonstrated itself to be a useful tool in the detection of information processing speed deficits in MS [18,19]. Similarly, the use of the CTIP in the fMRI environment holds promise with regard to the detection of information processing speed deficits in an MS population [25].