Department of Management, University of Tehran, Qom Campus, Tehran, Iran.
Accepted date: January 5, 2011; Published date: February 27, 2011
The purpose of this research was to determine high school teachers' skill in designing physics exam questions. The statistical population was all physics exam sheets for the two semesters of one school year, from which a sample of 364 exam sheets was drawn using multistage cluster sampling. Two experts assessed the sheets, and the data were analyzed using appropriate indices together with the z-test and chi-squared test. We found that the designed exams had suitable coefficients of validity and reliability. The difficulty level of the exams was high. No significant difference was found between male and female teachers in terms of the coefficients of validity and reliability, but a significant difference in difficulty level was found (P<0.001): female teachers had designed more difficult questions. We found no significant relationship between the teachers' gender and the coefficient of discrimination of the exams.
Teacher-built exam; content validity; face validity; reliability; coefficient of discrimination; coefficient of difficulty.
Examination and testing are an important part of the teaching-learning process, allowing teachers to evaluate their students during and at the end of an educational course. Many teachers dislike preparing and grading exams, and most students dread taking them. Yet tests are powerful educational tools that serve at least four functions. First, tests help you evaluate students and assess whether they are learning what you expect them to learn. Second, well-designed tests serve to motivate students and help them structure their academic efforts. Crooks, McKeachie, and Wergin report that students study in ways that reflect how they think they will be tested. Over the last 40 years, most exams used to evaluate students have been designed by teachers. Some teachers may have used tests designed by external exam designers, but such tests have not been sufficiently effective. Given the importance of teacher-designed tests in the evaluation of students, much research has been done in this area. In theory, the best test for a subject is one that covers all educational objectives of the course. However, such a test would be too long to be practical to prepare. Therefore, instead of including all content and objectives, one may choose questions that are representative of the whole subject and its objectives. Such a test is said to have content validity.
The content validity of a teacher-designed test can be assessed from a sample of its questions. When a test lacks content validity, two problems may arise. First, students are given no opportunity to demonstrate the skills that were omitted from the test when those skills are needed. Second, unrelated questions may instead be included in the test and answered incorrectly. An important point here is that face validity should not be confused with content validity. Face validity measures whether a test appears, on inspection, to be measuring what it is logically supposed to measure and whether students consider the test questions appropriate.
Based on the above, an ideal test, in addition to measuring what it is supposed to measure, must produce consistent results at different times. This characteristic is called reliability. Other measures of an ideal test are the difficulty level and the discriminant index. The percentage of individuals who answer a question correctly is known as the difficulty coefficient, denoted by P. The discriminant index measures how well a question discriminates between the strong and weak groups. In this study, we evaluated these quality measures (validity, reliability, difficulty, and discrimination) in teacher-designed tests for the first year of high school.
The statistical population in this study consisted of all papers for the final physics exams of the first and second semesters of the first year of high school in Qom province of Iran, from which a sample of 364 papers was drawn by multistage cluster sampling. In the first stage, one of the four education districts was chosen; in the second stage, three schools were randomly selected; and in the third stage, a number of exam papers were selected from each school in proportion to the number of its students.
In this study, the content validity of the exam questions was assessed in two ways. The first method used a two-dimensional table, with the educational goals on one dimension and the content of the course materials on the other. The second method was a Likert-scale questionnaire in which two physics education experts evaluated the extent to which the exam questions matched the course contents. To assess the face validity of the teacher-built exams, we used a 12-item questionnaire answered by two physics experts.
To assess the reliability of the tests, we needed several experts to mark the exam papers independently, so that one marker's opinion would not influence another's. In this study, two teachers marked the exam papers separately, and Kendall's coefficient of concordance was used to check the agreement between the two markings.
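Kendall's coefficient of concordance (W) can be computed directly from the rank sums. Below is a minimal Python sketch, assuming each marker's marks have already been converted to ranks over the same set of exam papers (the example rankings are illustrative only):

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance (W) for agreement among
    m raters, each ranking the same n items.
    `rankings` is a list of m lists; each inner list holds one
    rater's ranks (1..n) for the n items, in the same item order."""
    m = len(rankings)        # number of raters (markers)
    n = len(rankings[0])     # number of items (exam papers)
    # Sum of ranks each item received across all raters
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean_sum = sum(rank_sums) / n
    # S: sum of squared deviations of the rank sums from their mean
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    # W = 12S / (m^2 (n^3 - n)); 1 = perfect agreement, 0 = none
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Two markers ranking five exam papers in perfect agreement
print(kendalls_w([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]))  # -> 1.0
```

A value near 1 indicates the two markers ordered the papers almost identically; the significance of W is then checked against a chi-squared approximation, as reported in the results below.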
Because all of the physics exam questions were open-ended, we used the following formula to calculate the difficulty coefficient (DifCo) of question i:

DifCo(i) = (MS(i) + MW(i)) / (NB × Mi)

where
MS(i) = sum of marks of the strong group on question i
MW(i) = sum of marks of the weak group on question i
NB = number of students in both groups
Mi = total mark of question i
In addition, the discriminant coefficient (DisCo) of question i was calculated based on the following formula:

DisCo(i) = (MS(i) - MW(i)) / (ng × mi)

where
MS(i) = sum of marks of the strong group on question i
MW(i) = sum of marks of the weak group on question i
ng = number of students in one group
mi = total mark of question i
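The two coefficients can be sketched in Python as follows, assuming each group's marks on a question are stored per student (the sample marks are illustrative only):

```python
def difficulty_coefficient(strong_marks, weak_marks, max_mark):
    """Difficulty coefficient for an open-ended question:
    (sum of strong-group marks + sum of weak-group marks)
    divided by (students in both groups x total mark)."""
    nb = len(strong_marks) + len(weak_marks)  # NB: students in both groups
    return (sum(strong_marks) + sum(weak_marks)) / (nb * max_mark)

def discriminant_coefficient(strong_marks, weak_marks, max_mark):
    """Discriminant coefficient for an open-ended question:
    (strong-group marks - weak-group marks) divided by
    (students in one group x total mark); assumes equal group sizes."""
    ng = len(strong_marks)                    # ng: students in one group
    return (sum(strong_marks) - sum(weak_marks)) / (ng * max_mark)

# Hypothetical question worth 4 marks: the strong group scores
# near the maximum, the weak group scores poorly
strong = [4, 4, 3, 4]
weak = [1, 0, 2, 1]
print(difficulty_coefficient(strong, weak, 4))    # -> 0.59375
print(discriminant_coefficient(strong, weak, 4))  # -> 0.6875
```

In this example a bit over half of the available marks were earned overall (moderate difficulty), while the large gap between the groups gives a high discriminant coefficient.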
The percentages of papers were almost equal in terms of students’ sex (49% males and 51% females). The characteristics of the exam questions are summarized in Table 1.
Table 1: Exam characteristics by book chapters.
Table 1 shows that about half of the physics questions were on concepts (52.5%), with smaller percentages on knowledge (39.3%) and application (8.2%). There were no questions on analysis, synthesis, or evaluation in the exams.
As stated before, the agreement between the teachers' evaluations was measured with Kendall's coefficient of concordance; its value was 0.54, significant at p = 0.002. Kendall's coefficient for the face validity of the questions, based on the expert teachers' evaluations, was 0.49, significant at p < 0.006. The reliability coefficient based on the markers' evaluations was 0.975 and significant (p < 0.003). The estimated difficulty coefficients ranged from DifCo(min) = 0.01 to DifCo(max) = 1, with a standard error of 0.20, indicating that the questions had a moderate difficulty level. The discriminant coefficients ranged from DisCo(min) = 0 to DisCo(max) = 1, with a standard error of 0.21, indicating that the questions had good discriminating power.
We also found no significant difference in content validity or reliability between female and male teachers. We then compared the difficulty and discriminant coefficients between female and male teachers; the test results are shown in Tables 2 and 3.
Difficulty level | # of questions from female teachers | # of questions from male teachers | Chi-squared value | Degrees of freedom | p-value
Table 2: Chi-square test for comparison of difficulty coefficients between female and male teachers.
Discriminant level | # of questions from female teachers | # of questions from male teachers | Chi-squared value | Degrees of freedom | p-value
Table 3: Chi-square test for comparison of discriminant coefficients between female and male teachers.
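The comparisons in Tables 2 and 3 are chi-squared tests of independence on a contingency table of question counts. A minimal stdlib-only Python sketch follows; the observed counts are hypothetical, made up for illustration, and the critical value 5.991 (df = 2, alpha = 0.05) is a standard table constant, not a figure from this study:

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for an
    r x c contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof

# Hypothetical counts of questions by difficulty level
# (easy / medium / hard), one row per teacher sex
observed = [[20, 50, 80],   # female teachers
            [45, 55, 40]]   # male teachers
stat, dof = chi_squared(observed)
# Compare stat with the critical value 5.991 (df = 2, alpha = 0.05):
# a larger statistic means difficulty level depends on teacher sex
print(stat, dof)
```

With these illustrative counts the statistic well exceeds the critical value, which is the pattern Table 2 reports for difficulty; for Table 3 the analogous test fell short of significance.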
Table 2 shows a significant relationship between the difficulty level of the questions and the sex of the teacher: female teachers tend to design more difficult physics questions than male teachers.
Table 3 shows no relationship between the teacher’s sex and the discriminant level of the questions.
One of the important issues in any teaching and learning system is the quality of the students it produces. There should be standards for exam questions so that all educational organizations produce output of the same high quality. Although the achievement of students in their course of study is important, the performance of teachers is also of great importance, and one factor in that performance is good examination and good marking. Exam questions play a vital role in students' achievement, so the difficulty level, discrimination, validity and reliability of exam questions must be ensured in order to have good outputs. In this study, we found that some of these factors can differ with the teacher's sex: female teachers tend to design more difficult questions than male teachers, which may be because of the performance of female students. We also found that a high percentage of the exam questions concentrated on concepts (52.5%) and knowledge (39.3%), with only a small percentage on application. This may be due to the nature of quantitative sciences like physics, and these percentages may of course change with the topic of the course. In summary, teachers need to be assessed and evaluated during their teaching to ensure the quality of their performance.
This research was funded by Education and Teaching Organization of Qom Province of Iran.