Putting Learning to the Test

Frequent testing leads to assurance of learning—but only if the tests are done right.
College students have a hard time remembering what they’ve learned, and their poor retention has been well documented. For example, in a 1980 study, 1,220 college students were retested seven years after they had taken a two-semester economics course. On average, they scored only 9.8 percent higher on course content than a group that had never taken the course.

Even while students are still in school, many of them appear to do well all semester, then fail the final. Others cram for the exam, but don’t retain enough of the material to move on to more advanced courses. The problem may not be that students aren’t studying enough—it may be that they’re not being tested enough.

That conclusion is drawn from early research in the science of learning, an emerging field that has come about as public policy and mandated high-stakes testing have focused attention on learning outcomes. In this field, scientists seek to identify instructional conditions that promote robust student learning—specifically, learning that is retained for long durations, transfers to novel situations, or serves as a foundation for future learning. Starting this year, the National Science Foundation plans to fund multiple large-scale, long-term centers focused on the science of learning.

So far, it appears that a somewhat counterintuitive take on a well-known phrase is re-emerging as a theme: Practice makes perfect. But not just any practice. Researchers have confirmed that testing—a form of practice—produces better recall than repeated study and simple review sessions.

This “testing effect” may seem to counter the conventional wisdom that repeated studying is what enhances learning, whereas tests are just necessary evils—they motivate students to spread out their studying and they allow teachers to assign grades. Yet, the testing effect has been documented in at least 40 years of research involving students at all levels.

Examining the “Testing Effect”

Recent studies have highlighted the testing effect and elevated interest in the science of learning. Jeffrey Karpicke of Purdue University in West Lafayette, Indiana, and Henry Roediger of Washington University in St. Louis, Missouri, have produced testing research funded by the U.S. Department of Education. Their work has appeared in scholarly journals such as Science, Psychological Science, and the Journal of Memory and Language.

With different subjects and different content, Karpicke and Roediger conducted multiple studies investigating retention after different study and test sequences. These included study-test-restudy-retest, study-restudy-restudy-test, and study-test-retest-retest. They also altered what was studied. If students had demonstrated mastery of some content on an earlier test, they could exclude that content from further studying or subsequent testing.

If learning were solely a function of studying, then we would expect that study-study-study-test would yield the best results, while study-test-test-test would have a neutral or detrimental effect. Furthermore, if there were no need to study material after it had been mastered, the traditional paradigm of studying something and then moving on would yield the same results as intensive studying interspersed with testing, or intensive studying followed by multiple tests.

However, Karpicke and Roediger affirmed that testing is not a neutral event and that it is not a good idea to skip over mastered material when studying. Students who followed the study-test-test-test pattern had superior long-term recall of content when compared with students who had followed the study-study-study-test or study-test-study-test sequences. Furthermore, both the study-test-study-test and study-test-test-test models yielded much better results than models in which students, once they had demonstrated mastery of content on a test, no longer had to study or be tested on it.

In short, practice made perfect: Students who were repeatedly tested on the full material did best of all.

Testing Tips

So how do faculty create tests that truly assess what students are learning? Research in testing, learning, and assessment suggests these nine strategies for improving learning—before and after a test.

1. Give frequent assignments.
Before they even issue the first test, professors should give students meaningful assignments that require them to work with the material that will be covered in an exam. When students have to outline, apply, and synthesize information, they learn better than they do when they simply read or re-read material. For instance, in a statistics class, students might present case studies involving various techniques and the class might discuss new scenarios where the techniques would apply.

2. Emphasize practical applications.
It’s easier for students to remember concepts when they’re related to practical applications than when they’re presented as abstractions. Therefore, in most business courses, theory should be kept to a minimum, used only to help students understand key issues. Of course, this depends on the students’ needs. In a terminal course for business students, such as statistics, much of the theory is irrelevant. But in a statistics class for math majors, students need to understand all the formulas and how to generalize from them.

Once students understand one application, they can more readily see how it applies in similar instances, which allows them to transfer what they’ve learned to novel situations. Such transfer of knowledge, from generalized principles to specific situations, is at the heart of all learning.

3. Identify critical skills.
Faculty should make it clear at all levels—from course and syllabus to chapter and classroom—what crucial skills they expect students to learn. For example, at the chapter level in a statistics class, a goal might be for students to understand problems that are addressed in designated books. At the course level, a specific goal might be for them to be able to explain the logic of significance testing. A more general objective might be for them to become critical consumers of scientific studies.

4. Carefully design the test.
Frequent testing isn’t beneficial if tests aren’t well-designed. Professors should make sure that questions are worded clearly and that one question does not give away the answer to another. Constructing a test takes advanced skill, patience, and more time than many professors expect. Faculty need to plan their test content and questions just as carefully as they plan the outlines or frameworks they use for teaching.

5. Test relevant skills.
A test is only valid if its questions are built around knowledge the professor has communicated to students and expects them to have mastered. It’s easy to develop a poor test that has numerous questions addressing relatively obscure points, especially if the professor is drawing questions from an item bank—but that doesn’t help students with long-term retention of key concepts.

6. Prepare the right tests.
Learning is enhanced when students must generate answers instead of simply recognizing answers that are provided. That’s why essay tests with open-ended questions are better than most multiple-choice or true-false tests. Properly constructed multiple-choice questions can assess skills almost as well, but those questions are harder to write and generally aren’t found in abundance in test banks.

7. Ask the right questions.
Tests should require students to use their problem-solving and reasoning skills. Faculty should avoid simple recall questions such as what or when; instead, they should pose thought-provoking questions built around why, how, and what if. Such questions require students to work more actively with the material, which is itself a form of practice and therefore leads to better retention.

8. Assess frequently.
Frequent testing enhances both short-term and long-term learning and encourages students to study continuously throughout the semester. Assessments come in many forms, including quizzes, class presentations, and critiques. As previously mentioned, cumulative content tests—exams that include what has been mastered along with new material—are more effective than non-overlapping assessments of separate content.

9. Provide timely feedback.
Frequent assessments not only measure how much students are learning, but also reveal precisely what they are learning. If testing shows that there are portions of the material that students haven’t learned—or haven’t learned well—those portions can be retaught, perhaps in a different way. Professors can correct misunderstood material before it has become ingrained in a student’s mind. If repeated testing is used as feedback, it can lead to better teaching.

Sometimes “erroneous learning” is a side effect of testing, since testing can lead to long-term retention of misconceptions. On open-ended questions, a constructed response that appears to be reasonable tends to be remembered, even when it is wrong. On a multiple-choice test, the incorrect answers can be learned instead of the correct ones. This side effect also can be reduced with timely, relevant feedback.

It’s an enormous mistake to give students their corrected tests and allow them to glance at their results only briefly before turning the papers back in. Students should be able to keep these assessments so they can review their past errors—and retain the right answers over the long term.

Use It, It’s Yours

Dale Carnegie taught us that if we want to remember names, we can’t simply hear them repeated; we must say them often. Mark Twain taught us how to expand our vocabularies: “Use a new word correctly three times, and it’s yours.” Similarly, it’s a generally held belief that people learn a language more easily if they immerse themselves in it and speak it daily, instead of just reading a textbook. Testing has the same effect—it encourages long-term retention of information.

Unfortunately, in many classrooms, both faculty and students view testing as a nuisance that takes away from instruction time. The typical college paradigm promotes minimal testing, usually just a midterm and a final, and students often put off studying until the last minute. They obtain better grades than they would have if they hadn’t studied at all, and they feel confident that they’ve mastered the subject matter. However, these are superficial, short-term gains, and they come at the expense of long-term learning and retention.

For true learning, it’s better for professors to test early, test often—and test everything. As the term progresses, faculty should treat each test like a practice final. For students, that kind of active practice will make them letter-perfect.

Lawrence M. Rudner is vice president for research at the Graduate Management Admission Council in McLean, Virginia, and a visiting professor teaching statistics for EMBA students at the Goethe School of Business in Frankfurt, Germany.