Page 196 - DEDU504_EDUCATIONAL_MEASUREMENT_AND_EVALUATION_ENGLISH

Educational Measurement and Evaluation



                                              Commercial, national, norm-referenced “achievement” tests include the California
                                              Achievement Test (CAT); Comprehensive Test of Basic Skills (CTBS), which includes
                                              the “Terra Nova”; Iowa Test of Basic Skills (ITBS) and Tests of Academic Proficiency
                                              (TAP); Metropolitan Achievement Test (MAT); and Stanford Achievement Test (SAT,
                                              not to be confused with the college admissions SAT). “IQ,” “cognitive ability,”
                                              “school readiness,” and developmental screening tests are also NRTs.

                                  15.4 Accuracy of Test Score

                                  The items on the test are only a sample of the whole subject area: There are often thousands of
                                  questions that could be asked, but a test may have just a few dozen. A test score is
                                  therefore an estimate of how well the student would do if she could be asked all the possible
                                  questions.
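The effect of sampling only a few dozen questions can be illustrated with a rough calculation. This is a minimal sketch, assuming the test items are drawn at random from the full pool of possible questions and answered independently, so that the usual standard error of a proportion applies. The function name `sampling_standard_error` and the figures used (a student who could answer 70% of all possible questions, and tests of 10, 40, or 160 items) are illustrative assumptions, not values from the text.

```python
import math


def sampling_standard_error(true_proportion: float, n_items: int) -> float:
    """Standard error of the proportion correct on a random sample of items.

    Assumes items are sampled at random and answered independently
    (a simplification; real tests are not built this way).
    """
    if not 0.0 <= true_proportion <= 1.0:
        raise ValueError("true_proportion must be between 0 and 1")
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return math.sqrt(true_proportion * (1.0 - true_proportion) / n_items)


# Hypothetical student who could answer 70% of all possible questions.
for n in (10, 40, 160):
    se = sampling_standard_error(0.70, n)
    print(f"{n:>3} items: score typically off by about {100 * se:.1f} percentage points")
```

Under these assumptions, quadrupling the number of items only halves the uncertainty, which is one reason a short test gives a coarse estimate.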
                                  All tests have “measurement error.” No test is perfectly reliable. A score that appears as an
                                  absolute number -- say, Jamal’s 63 -- is really an estimate. For example, Jamal’s “true score” is
                                  probably between 56 and 70, but it could be even further off. Sometimes results are reported in
                                  “score bands,” which show the range within which a test-taker’s “true score” probably lies.
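A score band of this kind is commonly built from the standard error of measurement (SEM) used in classical test theory, SEM = SD × sqrt(1 − reliability). The sketch below shows one way such a band could be computed. The text does not say how Jamal's band was obtained, so the standard deviation of 15, the reliability of 0.78, and the function names are assumptions chosen only so that a one-SEM band around 63 lands near 56 to 70.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory SEM: SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed: float, sd: float, reliability: float, z: float = 1.0):
    """Return (low, high) = observed score plus or minus z SEMs."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


# Assumed values (not from the text): SD = 15, reliability = 0.78.
low, high = score_band(63, sd=15, reliability=0.78)
print(f"Score band: {low:.0f} to {high:.0f}")  # prints "Score band: 56 to 70"
```

A band of one SEM on either side covers the true score only about two times in three, which is consistent with the caution that the true score "could be even further off."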
                                  There are many other possible causes of measurement error: A student may be having a bad day.
                                  Test-taking conditions often differ from place to place (they are not adequately
                                  “standardized”). Different versions of the same test are not exactly the same.
                                  Sub-scores on tests are even less precise: This is mostly because a sub-test often has very few
                                  items. A score band for Juanita’s math sub-test might show that her score is between
                                  the 33rd and 99th percentile because only a handful of questions were asked.
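The loss of precision on short sub-tests can be estimated with the Spearman-Brown formula, which predicts how reliability changes when a test is lengthened or shortened by a factor k: r_new = k·r / (1 + (k − 1)·r). This is a minimal sketch, assuming the sub-test items behave like the items on the full test. The 60-item test with reliability 0.90 and the 6-item sub-test are hypothetical figures, not taken from the text.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after changing test length by length_factor.

    length_factor = new number of items / original number of items.
    Assumes the added or removed items are comparable to the rest.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)


# Hypothetical figures: a 60-item test with reliability 0.90,
# and a 6-item sub-test drawn from it.
full_reliability = 0.90
sub_reliability = spearman_brown(full_reliability, 6 / 60)
print(f"Full test: {full_reliability:.2f}, 6-item sub-test: {sub_reliability:.2f}")
```

With these assumed numbers the predicted sub-test reliability falls to roughly 0.47, low enough to produce very wide score bands like the one described for Juanita.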
                                  Scores for young children are much less reliable than for older students: This is because young
                                  children’s moods and attention are more variable. Also, young children develop quickly and
                                  unevenly, so even an accurate score today could be wrong next month.
                                  What do score increases mean? If your child’s or your school’s score goes up on a
                                  norm-referenced test, does that mean she knows more or the school is better? Maybe yes, maybe not.
                                  Schools cannot teach everything. They teach some facts, some procedures, some concepts, some
                                  skills -- but not others. Often, schools focus most on what is tested and stop teaching many things
                                  that are not tested. When scores go up, it does not necessarily mean the students know more; it
                                  may mean only that they know more of what is on that test.
                                  For example, history achievement test “A” could have a question on Bacon’s Rebellion (a rebellion
                                  by Black slaves and White indentured servants against the plantation owners in colonial Virginia).
                                  Once teachers know Bacon’s Rebellion is covered on the exam, they are more likely to teach
                                  about it. But if those same students are given history test “B,” which does not ask about Bacon’s
                                  Rebellion but does ask about Shays’ Rebellion, which the teacher has not taught, the students will
                                  not score as well.
                                  Teaching to the test explains why scores usually go down when a new test is used: A district or
                                  state usually uses an NRT for five to ten years. Each year, the score goes up as teachers become
                                  familiar with what is on the test. When a new test is used, the scores suddenly drop. The students
                                  don’t know less; different things are simply being tested.




                                          Multiple-choice and short-answer questions do not measure most knowledge that students
                                          need to do well in college, qualify for good jobs, or be active and informed citizens.





        190                                 LOVELY PROFESSIONAL UNIVERSITY