Received date: January 19, 2015; Accepted date: February 15, 2015; Published date: February 18, 2015
Citation: Kropmans TJB, Griffin L, Cunningham D, Walsh D, Setyonugroho W, et al. (2015) Back to the Future: Electronic Marking of Objective Structured Clinical Examinations and Admission Interviews Using an Online Management Information System in Schools of Health Sciences. J Health Med Informat 6:182. doi: 10.4172/2157-7420.1000182
Copyright: © 2015 Kropmans TJB, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Abstract Background: The Objective Structured Clinical Examination (OSCE) and Multi Mini Interviews (MMI) are established tools in the repertoire of clinical assessment methods in Schools of Medicine and Health Sciences worldwide. The use of OSCEs facilitates the assessment of psychomotor skills as well as knowledge and attitudes. Identified benefits of OSCE assessment include development of students’ confidence in their clinical skills and preparation for clinical practice. However, a number of challenges exist with the traditional paper methodology, including documentation errors and inadequate student feedback; electronic assessment is therefore the future. Objectives: To explore electronic OSCE delivery and evaluate the benefits of using an electronic OSCE management system. Design: A pilot study was conducted using electronic software in the management of a five-station OSCE assessment with a cohort of first year undergraduate medical students delivered over two consecutive years (n=383) in one higher education institution in Ireland. Methods: All OSCE documentation was converted to electronic format. Assessors were trained in the use of the OSCE management software package and laptops were procured to facilitate electronic management of the OSCE assessment. Following the OSCE assessment, assessors were invited to evaluate the experience. Results: Electronic software facilitated the storage and analysis of overall group and individual results, thereby offering considerable time savings. Submission of electronic forms was allowed only when fully completed, thus removing the potential for missing data. Conclusions: Analysis of results highlights issues around inter-rater reliability and validity of measurement tools. Regression analysis, as a standard setting method, increases fairness of result calculations as compared to static cut-off scores.
Objective structured clinical examination; OSCE; Multi mini interview; MMI; e-Assessment; Borderline regression analysis; Generalizability theory
The Objective Structured Clinical Examination (OSCE) is a well-established method of assessing clinical competence among health practitioners and students, and the same station-based format is used in Multi Mini Interviews (MMIs) to assess the non-cognitive skills of students that apply for a health sciences degree. The OSCE originated in the UK as an objective means to assess medical students’ skills. The examination involves students progressing through a series of stations where they are assessed by an examiner against pre-determined marking criteria. In this series of stations, clinical tasks, non-cognitive skills for admission interviews, or both are assessed.
Several authors have highlighted the importance of using OSCEs as an assessment method in health care education [1,4-6]. The OSCE facilitates the assessment of students’ competency in clinical skills in a controlled simulated environment rather than in the practice setting. According to McWilliam and Botwinski, students recognise the value of the OSCE experience to their education.
A number of benefits have been attributed to the use of OSCEs, including the development of students’ confidence, the preparation of students for clinical practice and the achievement of deeper, more meaningful learning. Importantly, the use of OSCEs facilitates the assessment of psychomotor skills as well as knowledge and attitudes. OSCEs provide students with feedback on their clinical performance and facilitate the identification of strengths and weaknesses. The OSCE has been reviewed positively as an assessment method for clinical competence and for responding to student diversity in education. However, there are a number of notable disadvantages associated with OSCEs. In particular, some students find them stressful, and they are resource intensive in terms of staff, equipment and clinical skills laboratories. Nevertheless, Alinier suggests that the educational benefits outweigh the issues associated with resources.
Traditionally, OSCEs have been assessed with paper based methods. However, a number of issues have been highlighted with this method, including illegible handwriting, missing details (students’ names and student numbers) and lost assessment sheets. Furthermore, manual calculation of results and their entry into a database is time-consuming and subject to human error, and feedback is rarely provided to students on their performance after paper based assessments. Despite these issues, there is a scarcity of literature on the use of computers or dedicated OSCE software in the assessment of OSCEs. Segall et al compared the usability of paper and pencil quizzes with Personal Digital Assistant (PDA) based quizzes and found that the PDA based quiz was more efficient than, and superior to, the traditional method. Similarly, Treadwell compared the conduct of a paper based OSCE with an electronic method. The findings indicated that the electronic method was just as effective as, and more efficient (less time consuming) than, the traditional paper based method. In addition, the electronic system was highly rated by the assessors, who found it less intrusive and reported that they had more time to observe the students than when using the paper assessment. Schmitz highlights a number of advantages of using an electronic handheld device to assess OSCEs, including speed of data gathering, simplicity of data evaluation and fast automatic feedback. Segall et al support computer based assessment, suggesting that grading is more accurate, feedback is immediate, security is enhanced and less time is spent by instructors on grading and data entry. Cunningham and Kropmans developed an OSCE Management Information System (OMIS) that is currently used within 19 prestigious universities worldwide to retrieve, store and analyse all OSCE and MMI data.
The aim of this study was to explore the benefits of an online OSCE Management Information System for School of Medicine OSCEs, by means of the analysis of two cohort studies [12,13].
In this cross sectional study we analysed the outcome of a fully electronically administered Objective Structured Clinical Examination (OSCE) taken by two cohorts of students, assessing the clinical outcome of the first year MD139 module Medical Professionalism using an in-house developed online OSCE Management Information System (OMIS). The MD139 OSCE comprised 5 individual stations. Both consecutive student cohorts (i.e. those from the 2012-2013 and 2013-2014 academic years) completed a urine analysis station, a chest X-ray station, a BMI station, a Vital Signs station and finally a Basic Life Support station, each of which was of 5 minutes duration. The total number of first year students that completed the OSCE was 383. The 2012-2013 cohort comprised 213 students, whereas the 2013-2014 cohort comprised 170 students. The station checklists for both OSCEs were identical. The novel online OSCE Management Information System, which was developed in-house at the National University of Ireland Galway, was used to administer both examinations [12,13]. OMIS retrieves, stores and analyses assessment data electronically. Feedback can be sent to students electronically using the Student Feedback Email System. We used item checklists to assess student competency with each task. The number of items per assessment form varied from 13, for the Basic Life Support station, to 22, for the Urine Analysis and Vital Signs stations, with a maximum score of 60 marks for all clinical stations. The overall professional impression of the examiners was rated on a 5-point Likert Global Rating Scale (GRS) which included the following options: Fail (0), Borderline (1), Pass (2), Good (3) and Excellent (4). The numerical values of the GRS options were not incorporated in the final student scores, but were instead used for standard setting using an online Borderline Regression Analysis function that is built into the OMIS.
The static pre-determined cut-off score for medicine studies is 50% (<50% means a fail score; >50% means a pass score).
OMIS produces an online analysis of items, overall total (raw) scores and adjusted scores obtained by standard setting of student performance after regression analysis. The mean, standard deviation (SD), minimum, maximum, range and mid-range are produced instantly, in real time, during the examination. Internal consistency of the OSCE station item forms (Cronbach’s Alpha) is used to provide insight into how consistently the items in each station predict the student’s overall score for that station. Standard setting with the borderline methods (Borderline Group Average and Borderline Regression Method) yields a ‘flexible cut-off score’ for each individual station, complementary to the general static cut-off score of 50%. The overall average regression cut-off score is used to adjust the average overall raw score of the students. Borderline Group Average, which is based upon calculation of the average mark of those students that were globally rated by their examiners as ‘borderline’, is the simplest method to use. A complete Borderline Regression Analysis, which is performed over all item marks matched with all of the global ratings (from Fail to Excellent), can also be used. The flexible cut-off score is calculated with the FORECAST method as BRM cut-off score = Intercept + 1 × Slope, since Borderline is coded as 1 (Figure 1). All analysis reports and data were exported to Excel to facilitate further detailed analysis.
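The Borderline Regression Method described above can be sketched in a few lines of Python. This is a minimal illustration and not the OMIS implementation: the function name `borderline_regression_cut_score` and the example data are hypothetical, and an ordinary least-squares line is assumed, which is what a spreadsheet FORECAST calculation fits.

```python
from statistics import mean


def borderline_regression_cut_score(global_ratings, station_scores, borderline=1):
    """Return the station score predicted at the borderline rating.

    Fits station_score = intercept + slope * global_rating by ordinary
    least squares and evaluates the line at `borderline` (coded 1 in the text).
    """
    if len(global_ratings) != len(station_scores):
        raise ValueError("ratings and scores must have the same length")
    if len(global_ratings) < 2:
        raise ValueError("at least two students are needed to fit a line")

    x_bar = mean(global_ratings)
    y_bar = mean(station_scores)
    sxx = sum((x - x_bar) ** 2 for x in global_ratings)
    if sxx == 0:
        raise ValueError("all global ratings are identical; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar)
              for x, y in zip(global_ratings, station_scores))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept + borderline * slope


# Hypothetical station: ratings 0 (Fail) to 4 (Excellent), raw scores out of 60.
ratings = [0, 1, 1, 2, 2, 3, 3, 4]
scores = [22, 30, 34, 41, 45, 50, 54, 60]
cut = borderline_regression_cut_score(ratings, scores)
print(f"cut-off: {cut:.1f} / 60 = {100 * cut / 60:.1f}%")
```

Because every student contributes to the fitted line, a cut-off score is obtained even for a station in which no student was rated Borderline, provided the ratings are not all identical.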
Dynamic cut-off score
Figure 1: Single borderline score regression analysis illustrating the effect of a regression analysis in which station scores (Y-axis) are plotted against the professional opinion of the examiners (X-axis) (adapted from John Patterson, honorary senior lecturer at the Centre for Medical Education of the Barts and London School of Medicine and Dentistry and Assessment Consultant).
Data were exported to perform a Generalizability Coefficient analysis using a G- and D-study with EduG software. The G-study generates information about whether the outcome can be generalised to other medicine OSCEs. The D-study provides information on how the generalizability of results can be improved.
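The logic of a G-study followed by a D-study can be illustrated with a short Python sketch. It is not the EduG procedure used in this study: it assumes the simplest possible design, in which every student is scored once on every station (students crossed with stations), and the function names and data are hypothetical. The G-study step estimates the student and residual variance components from ANOVA mean squares; the D-study step projects the relative generalizability coefficient for a different number of stations.

```python
def variance_components(scores):
    """Estimate (student variance, residual variance) for a crossed design.

    `scores` is a list of rows, one per student, each holding one score
    per station. Requires at least 2 students and 2 stations.
    """
    n_p, n_s = len(scores), len(scores[0])
    if n_p < 2 or n_s < 2:
        raise ValueError("need at least two students and two stations")

    grand = sum(sum(row) for row in scores) / (n_p * n_s)
    student_means = [sum(row) / n_s for row in scores]
    station_means = [sum(col) / n_p for col in zip(*scores)]

    ss_students = n_s * sum((m - grand) ** 2 for m in student_means)
    ss_stations = n_p * sum((m - grand) ** 2 for m in station_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_residual = ss_total - ss_students - ss_stations

    ms_students = ss_students / (n_p - 1)
    ms_residual = ss_residual / ((n_p - 1) * (n_s - 1))

    # Negative estimates are truncated at zero, a common convention.
    var_students = max((ms_students - ms_residual) / n_s, 0.0)
    return var_students, ms_residual


def projected_g(var_students, var_residual, n_stations):
    """Relative G coefficient for an OSCE with `n_stations` stations."""
    denominator = var_students + var_residual / n_stations
    if denominator == 0:
        raise ValueError("no variance in the data")
    return var_students / denominator


# Hypothetical scores: 4 students x 5 stations, marks out of 60.
data = [
    [48, 52, 41, 55, 58],
    [39, 47, 45, 50, 60],
    [55, 49, 50, 57, 54],
    [30, 44, 38, 41, 56],
]
var_p, var_res = variance_components(data)
for n in (5, 10, 15, 18):
    print(n, "stations -> G =", round(projected_g(var_p, var_res, n), 3))
```

In this simplified design the residual term is divided by the number of stations, so the projected coefficient rises as stations are added, which is the kind of information a D-study provides.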
The summary of results for the 2012-2013 cohort (n=213), as produced by the OMIS, demonstrated an overall internal consistency (how well the OSCE predicts the overall outcome) of 0.696, where the Cronbach’s Alpha per individual station varied between 0.486 for the Basic Life Support station and 0.769 for the Vital Signs station. In classical psychometric terms internal consistency was moderate. The overall average student performance for the clinical stations was 80.5%, with a minimum score of 16 out of 60 and a maximum of 100% (60 out of 60). The overall average student performance for the Basic Life Support station was 93.7%, with a minimum score of 21 out of 60 and a maximum score of 60 out of 60 (Figure 2).
The summary of results for the 2013-2014 cohort (n=170), as produced by OMIS, demonstrated an overall internal consistency of 0.666, whereby the Cronbach’s Alpha per station varied between 0.426 for the Basic Life Support station and 0.842 for the Vital Signs station. In classical psychometric terms internal consistency was moderate. The overall average performance of the students was 84.8% for the clinical stations, with a minimum score of 20 out of 60 and a maximum score of 100% (60 out of 60). The overall average performance of students was 88.5% for the Basic Life Support station, with a minimum score of 28 out of 60 and a maximum score of 100% (60 out of 60) (Figure 3).
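For readers who wish to reproduce an internal consistency figure from exported item marks, Cronbach’s Alpha can be computed as sketched below. The function name `cronbach_alpha` and the data are hypothetical; how OMIS computes the statistic internally is not described here, so the standard textbook formula is assumed.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's Alpha for a students x items table of marks.

    `item_scores` is a list of rows, one per student, each holding the
    marks for the k items of one station checklist.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    n_students = len(item_scores)
    k = len(item_scores[0])
    if n_students < 2 or k < 2:
        raise ValueError("need at least two students and two items")

    item_variance_sum = sum(variance(col) for col in zip(*item_scores))
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical checklist: 5 students, 4 items each marked 0-3.
marks = [
    [3, 2, 3, 3],
    [2, 2, 1, 2],
    [3, 3, 3, 2],
    [1, 0, 1, 1],
    [2, 1, 2, 3],
]
print(f"alpha = {cronbach_alpha(marks):.3f}")
```

The same function can be applied per station (items as columns) or to the whole OSCE (station totals as columns), which corresponds to the per-station and overall values reported above.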
Borderline regression analysis
Borderline Group Average is a simple way of calculating a cut-off score as the average score of those students that were rated as ‘borderline performers’ (i.e. the examiners were not sure whether the student performance should be marked as fail or pass). Where there are only a small number of students in this category, Borderline Group Average estimates may be very unreliable, as shown in the 2012-2013 cohort, where in some stations only 1 or 6 student performances were marked as ‘borderline’ (i.e. 1 borderline score for the Vital Signs station and 6 for the Basic Life Support station). The average score of these students was 60.0% (1 student) for the Vital Signs station and 70.3% (6 students) for the Basic Life Support station. Borderline Group Average cannot provide a cut-off score for those stations in this cohort where no students were marked as borderline performers. A fully fledged Borderline Regression Analysis would, however, provide this information because it includes all Global Ratings from Fail to Excellent (Figure 1).
A similar situation arises with the Borderline Group Average of the 2013-2014 cohort, within which 42 student performances were regarded as borderline (i.e. 9 for the Urine Analysis station, 0 for the Chest X-ray station, 4 for the BMI station, 11 for the Vital Signs station and 18 for the Basic Life Support station). Due to the small numbers of students, the cut-off scores of 67.8%, N/A, 63.8%, 69.2% and 81.5% respectively may be unreliable, but they provide an indication of the difficulty of each station (BGS>50 means an easy station; BGS<50 means a difficult station).
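The Borderline Group Average, including the N/A case for a station without borderline students, can be sketched as follows. The record layout, the function name `borderline_group_average` and the data are hypothetical and serve only to illustrate the calculation.

```python
from collections import defaultdict
from statistics import mean

BORDERLINE = 1  # Global Rating Scale code for 'Borderline' in the text


def borderline_group_average(records, max_score=60):
    """Map each station to (n_borderline, average % of borderline students).

    `records` is an iterable of (station, global_rating, raw_score) tuples.
    The average is None when no student in a station was rated borderline,
    which corresponds to an N/A entry.
    """
    borderline_scores = defaultdict(list)
    for station, rating, raw_score in records:
        bucket = borderline_scores[station]  # registers the station even if empty
        if rating == BORDERLINE:
            bucket.append(raw_score)

    summary = {}
    for station, raw_scores in borderline_scores.items():
        average = 100 * mean(raw_scores) / max_score if raw_scores else None
        summary[station] = (len(raw_scores), average)
    return summary


# Hypothetical records: (station, global rating 0-4, raw score out of 60).
example = [
    ("Vital Signs", 1, 36),
    ("Vital Signs", 3, 52),
    ("Chest X-ray", 2, 45),
    ("Chest X-ray", 4, 58),
    ("Basic Life Support", 1, 40),
    ("Basic Life Support", 1, 44),
]
for station, (n, avg) in borderline_group_average(example).items():
    print(station, n, "N/A" if avg is None else f"{avg:.1f}%")
```

With only one or two borderline students per station, as in this example, the resulting averages rest on very little data, which is the reliability concern raised above.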
A fully fledged Borderline Regression Analysis is embedded into the OSCE Management Information System software whereby the forecast method is used to calculate new cut-off scores for each station taking into account the ‘difficulty of the station’ and the ‘hawk and dove effect’ of different examiners involved in the OSCE.
Figure 4 shows the item maximum scores for each of the four OSCE stations (60 for each of the clinical stations), along with each station’s mean score (out of 10, EU based) and standard deviation. Borderline Group Average could be calculated for all stations. With Borderline Regression Method 1, cut-off scores are calculated for all stations based upon the item scores and Global Rating Scores of all students, ranging from 72% for station 3 (70% with the ‘group’ average) to 53% for station 5.
Figure 4: OMIS screenshot: Borderline regression summary table for the 2012-2013 cohort showing Item Scores (I1-I30); Item Total Scores (60); Raw Station Scores and Standard Deviation (SD); the number of students achieving a Borderline Score; and finally the Borderline Group Average score (BRA) and Borderline Regression score (BR method 1 and Figure 1).
The summary of results section provided instant information about the scores of each individual student from the two cohorts in these two consecutive OSCEs. Although the average results were quite high in both cohorts, 16 students in the first and 13 in the second cohort failed one or two of the stations using a ‘static pass mark’ set at 50% prior to the start of the exam. Owing to the availability of a Global Rating Score facility and an appropriate number of students (n ≥ 100), we performed a Borderline Group Average analysis. This analysis is based upon the overall professional impression of the examiner evaluating a student’s performance and incorporates the difficulty of the stations and the variability among examiners. The examiner marks this overall performance as fail, borderline, pass, good or excellent (Borderline Regression Method 1). In the borderline group feature (Figures 5 and 6), the average performance of these ‘borderline performing students’ is substantially above the static pass mark of 50% in all stations of both cohorts of students. Where N/A is indicated, no students were marked as borderline performing students in that station in the second cohort. ‘Borderline performance’ is an indicator of examiner uncertainty with regard to whether or not a student should pass or fail. A high regression outcome indicates that a station is ‘easy’ to pass, and a low outcome that it is ‘difficult’ to pass. Whereas the ‘Angoff Method’ is a standard setting method used prior to an examination, Borderline Regression Analysis is a standard setting method used after the examination has taken place and is based upon the professional impression of the examiner evaluating students’ performances according to a Global Rating Scale.
In addition to the Borderline Group Average, OMIS provides a fully fledged Borderline Regression Analysis that takes all scores into account and matches them with the professional impression of the examiners using a regression analysis (Figure 1). We used the simple forecast method, in an Excel template, using all item total marks and the Global Rating Scale in the regression equation. All station dynamic cut-off scores were above the 50% static cut-off score, indicating that the stations were too easy to pass according to the professional impression of the examiners.
The OSCE design used for this clinical skills examination of undergraduate medical students demonstrated poor generalizability of results in this 5 station OSCE. The generalizability would improve by introducing more stations (e.g. an OSCE with 5-10 stations). However, current coefficients do not achieve the standards suggested in other research literature on OSCEs [17,19-22]. The generalizability of results is only appropriate in OSCEs with a minimum of 15-18 stations [23-25].
Scores of ‘borderline performing students’ were well above the ‘static cut-off score’ in all stations, indicating that the OSCE should be classified as ‘easy to pass’. Making station designers aware of these high marks, training them on existing pre-recorded scenarios and using well described rubrics might reduce the amount of error and should be the focus of additional research. The system allows student feedback, giving students an opportunity to benchmark themselves against the group and to receive relevant, timely feedback on their performance, but this feature was not used in these cohorts. Future research should focus on the impact of instant feedback on the performance of students, which could then be used in future comparisons. Although not the subject of this study, the overall impact on time reduction in running the OSCEs and on students’ and examiners’ behavior during assessment needs to be further researched. In contrast to our previous paper based approach, results and feedback could be released immediately after the exam was finished.