A Special Report on Formative Testing

Dr. Greg Sadesky explains the significance of formative assessment for your learning outcomes.

I think it’s safe to say that very few people actually enjoy taking tests. That said, the tests that are the topic of this blog may be an exception: Formative Tests. A formative test is one that is built for learning, rather than to determine whether you pass or fail. So, a formative test might give you information on which areas or topics you seem to have mastered and which others might be fruitful areas for future growth.

Because a formative test is supposed to provide you with clear, actionable, and accurate information about your mastery or non-mastery of parts of a curriculum or a set of knowledge and skills, you might suppose that the report you receive after taking such a test is a critical factor in how good that test is. You’d be right. In fact, the following question about the intended report is a good criterion for the development of any assessment:

“Will the report that I intend to produce based on this assessment be soundly supported by the proposed assessment design and implementation?”

Formative tests are usually created to provide the user with an informative report, perhaps something like the following:

In this report, mastery of particular topics is shown by the length and colour of the bars. The length presumably corresponds to the number of questions answered correctly (but could be some other kind of scaled ability estimate), and the colour represents the priority for future study. The graph also shows approximately where the test taker’s performance resides relative to some kind of acceptable standard, as implied by the thick black line.

So, on the basis of this report, the test interpretation guidelines might advise the test taker to study the topics shown as red (7 & 8) as the first priority, followed by those in yellow (1, 6, & 2). In a perfect world, instructional content that addresses each area would be available so that the learner could go seamlessly between assessment and instruction. The assessment may even be set up to allow the test taker to see the items they answered incorrectly, and rationales for the correct and incorrect responses.
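The interpretation rule described above can be sketched in a few lines of Python. This is only an illustration: the `TopicResult` type, the 70% standard, and the 50% cut-off for red are assumptions made for the example, not values taken from the report.

```python
from dataclasses import dataclass


@dataclass
class TopicResult:
    topic: int    # topic number as shown on the report
    correct: int  # questions answered correctly
    total: int    # questions asked on this topic


def priority(result, standard=0.7, red_below=0.5):
    """Colour one topic; both thresholds are assumed values for illustration."""
    proportion = result.correct / result.total
    if proportion < red_below:
        return "red"      # first priority for study
    if proportion < standard:
        return "yellow"   # second priority
    return "green"        # at or above the acceptable standard


def study_plan(results):
    """List red topics first, then yellow; green topics are left out."""
    order = {"red": 0, "yellow": 1}
    flagged = [(r.topic, priority(r)) for r in results]
    flagged = [(topic, colour) for topic, colour in flagged if colour in order]
    # sorted() is stable, so topics of the same colour keep their input order
    return sorted(flagged, key=lambda item: order[item[1]])


results = [
    TopicResult(1, 6, 10),
    TopicResult(3, 9, 10),
    TopicResult(7, 3, 10),
    TopicResult(8, 4, 10),
]
plan = study_plan(results)
```

With these made-up scores, Topics 7 and 8 come out as red and Topic 1 as yellow, while Topic 3 does not appear in the plan at all.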

Validity and reliability matter in formative assessment, too!

I’ve often been asked why more detailed information on assessment performance can’t be provided, for example, at the level of individual competencies or learning objectives. The answer is that, in all probability, not enough items have been asked at that fine a level; whatever information is available about a specific competency wouldn’t be reliable and would therefore be prone to overinterpretation.

But beyond a sufficient number of questions, for the bars in the graph to be a truthful reflection of a test taker’s strengths and weaknesses, the items representing a topic must reflect the full range of that topic. In other words, the formative assessment must be valid and reliable. What does this mean in concrete terms? Validity is often broken down into relevance and representativeness: each item is relevant to the topic it is tied to, and taken together, the set of questions is a good representation of the topic as a whole. Reliability is the requirement that there be enough questions per topic that the judgment of mastery is justified; the conclusion that a test taker needs to study Topics 7 & 8 should be warranted based on test performance across multiple items.
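To see why a handful of items cannot support a judgment about a single competency, it can help to look at how uncertain a proportion-correct score is. The sketch below uses the textbook standard error of a proportion and a rough normal-approximation interval. That is a simplification chosen for illustration (it behaves poorly with very few items) and not a method prescribed by any particular testing program; the function names are my own.

```python
import math


def standard_error(proportion_correct, n_items):
    """Standard error of a proportion-correct score based on n_items questions."""
    return math.sqrt(proportion_correct * (1 - proportion_correct) / n_items)


def approx_interval(correct, n_items, z=1.96):
    """Rough 95% interval for the true proportion (normal approximation)."""
    p = correct / n_items
    half_width = z * standard_error(p, n_items)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Half of the items correct, first on 4 items and then on 20 items
few_low, few_high = approx_interval(2, 4)
many_low, many_high = approx_interval(10, 20)
```

With 2 of 4 items correct, the interval stretches across nearly the whole scale, so the score says almost nothing about mastery. With 10 of 20 correct, the interval is less than half as wide, which is the practical reason reports are given at the topic level rather than for each competency.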

Innovation in Formative Assessment

From the above description, I hope it’s clear that to get a nuanced picture of student strengths and weaknesses, the results from many test questions need to be considered. If that’s true, and credentialing tests often cover 100 or more competencies, how can formative testing help students prepare for these kinds of exams? There are several possible answers to this question.

Answer 1. Take the information you can get. Consider the graph shown in Figure 1, which tells the student that they need to study more for Areas 7 & 8. It may be the case that the student has already mastered parts of Area 8, but the assessment doesn’t tell them which parts. So, they may just decide to study this area in more detail, knowing that at least they don’t have to study Area 3. On the other hand, it may be that within Area 8 certain skills and knowledge are typically mastered before others, in which case a better approach to studying Area 8 is available …which leads me to Answer 2.

Answer 2. Formative test developers should use construct mapping. A construct map identifies the knowledge and skill components that are associated with mastery of a domain. What’s more, those specific components are most often associated with specific levels of mastery within a knowledge domain. In this case, the components are ordered from lower to higher levels of mastery. After a test taker completes a formative test, the score they’ve achieved, either at the topic level or overall, can be used to create a description of the skills they have most likely mastered and those they have not, based on the construct map. In this way, detailed accounts of performance can be given without asking many questions at that level of detail. The implication for the learner is that they can direct their efforts toward the topics and sub-topics indicated by both their performance and the progression outlined in the construct map, efficiently addressing gaps.
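As a rough sketch of how a score might be turned into a description through a construct map, consider the Python below. The skill descriptions and cut-off scores are entirely hypothetical; a real construct map would come from subject-matter experts and empirical data.

```python
# Hypothetical construct map: (minimum score, skill description), ordered
# from lower to higher levels of mastery. All values are made up.
CONSTRUCT_MAP = [
    (0, "recognizes key terms"),
    (40, "applies routine procedures"),
    (60, "interprets unfamiliar cases"),
    (80, "evaluates competing approaches"),
]


def describe_mastery(score, construct_map=CONSTRUCT_MAP):
    """Split the map into skills most likely mastered and most likely not."""
    likely_mastered = [skill for cutoff, skill in construct_map if score >= cutoff]
    likely_not_mastered = [skill for cutoff, skill in construct_map if score < cutoff]
    return likely_mastered, likely_not_mastered


mastered, to_study = describe_mastery(65)
```

Here a score of 65 yields three skills that are probably mastered and one ("evaluates competing approaches") to study next, even though no question was tagged to an individual skill. The description is only the most probable one for that score, which is the limitation the next answer addresses.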

Answer 3. Get adaptive. A more personalized, evidence-based version of Answer 2 capitalizes on the construct map but provides more than just the ‘most probable’ set of knowledge and skill that the test taker possesses. In this model, based on adaptive testing, a test taker would receive an item of average difficulty according to the construct map. If they answered the question correctly, they would be providing evidence that they understand not only the knowledge or skill at which the question is aimed, but also the prerequisite, or easier, skills on the construct map. The next question might be slightly harder, and a correct answer would again provide evidence for the test taker’s mastery of it and of easier concepts. When the test reaches a point on the scale where the test taker begins to answer some questions incorrectly, the test engine could ask several more questions based on the set of knowledge and skill at that level in the construct map and stop when a clear picture of what is and is not known has been gained. As with Answer 2, a clear description of student mastery at a fine grain size can be obtained, but in the present case, this description would be based more on targeted data collection and processing and is therefore more personalized.
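The adaptive procedure just described can be sketched as a simple staircase. To be clear about assumptions: this is not an item response theory engine, the `ask` callback and the `extra_items` parameter are hypothetical, and the step down to easier levels after a failure at the starting level is my own addition to make the sketch complete.

```python
def run_adaptive_test(ask, n_levels, extra_items=3):
    """Estimate how many construct-map levels (counted from the easiest) are mastered.

    ask(level) presents one item at that level and returns True if it is
    answered correctly. Levels run from 0 (easiest) to n_levels - 1.
    """

    def passes(level):
        # A single correct answer counts as evidence of mastery at this level.
        if ask(level):
            return True
        # After an incorrect answer, ask several more items at the same level
        # and decide by majority (an assumed stopping rule).
        follow_up_correct = sum(ask(level) for _ in range(extra_items))
        return follow_up_correct * 2 > extra_items

    level = n_levels // 2  # start with an item of average difficulty
    if passes(level):
        # Climb to slightly harder levels until one is not passed.
        while level + 1 < n_levels and passes(level + 1):
            level += 1
        return level + 1
    # Starting level not passed: step down until an easier level is passed.
    while level > 0 and not passes(level - 1):
        level -= 1
    return level


# Simulated test taker who reliably answers items at levels 0 to 2 only
mastered_levels = run_adaptive_test(lambda level: level < 3, n_levels=6)
```

For this simulated test taker the function returns 3, meaning levels 0 to 2 are treated as mastered. The count could then be matched against the construct map to produce the fine-grained description, with most of the questions having been asked near the boundary of what the test taker knows.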

I hope you find this blog on formative assessment informative! As part of this series on formative assessment, I’ll be having a conversation with Dr. Chris Beauchamp about his ideal formative test, and about more controversial issues like whether creating test prep products constitutes a threat to the impartiality of credentialing organizations.