MathEd.net: Sorting Out the Summative: When Standards-Based Grading Meets the End of the Semester

Many teachers who choose to use standards-based grading (SBG) eventually find themselves facing the reality of their school's grading policies and tradition: the expectation of final, summative grades that are reported as percentages and letters. So regardless of how hard you try to focus on quality feedback instead of grades all semester long (for good reason), there comes a time when, for reasons probably beyond your control, you have to turn levels and descriptions of student understanding into numbers. This is SBG's "Monday Morning Problem," and it doesn't always get addressed in theory. But this week is finals week for my basic statistics students, so for me the time has come to convert standards-based formative grades into a summative grade, including calculating final exam grades. Here I'll try to describe the two steps I'll take to calculate my students' grades: (a) conversion of their formative scores into a summative score and (b) scoring and inclusion of the final exam into their semester grades.

Formative to Summative
Besides giving students a lot of written and verbal feedback about where they should try to improve, I've been using the simplest of measures to record their performance on class objectives: either students (a) "get it," (b) "sort of get it," or (c) "don't get it/haven't demonstrated it." You could think of these as "green light," "yellow light," and "red light," respectively. I've tried discerning more levels of understanding in a gradebook and it only seems to lead to confusion and indecision (both for me and students), so I'm sticking to three levels, as suggested in Her & Webb (2004). If I need more detail, I can always go back to the copies of the work students have submitted and the comments I've made.

The gradebook we have for class is pretty primitive and as far as I can tell it only accepts numbers, so I mark my three levels as either a 2, a 1, or a 0. It doesn't take much explaining to students that a 1 shouldn't be viewed as "out of two" and therefore worth 50%. I do tell them, though, that in order to receive credit for the course they should average at least a 1 across all objectives. In other words, you can't pass the class without averaging at least some understanding across the objectives.

Around here and in many other places, 70% seems to be the low end of passing grades. (We're not messing with Ds.) So if a student with all 1s should get at least a 70%, and a student with all 2s maxes out at 100, and we choose a linear function between the two, the "conversion formula" to percentages is simply:

percentage = 30 * objective score average + 40
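As a quick check of the formula above, here is a minimal Python sketch. The function name and the sample score lists are my own illustrations, not part of any gradebook; the only thing taken from the post is the 0/1/2 scale and the line through (1, 70) and (2, 100).

```python
def objective_average_to_percentage(scores):
    """Convert a list of 0/1/2 objective scores to a percentage grade.

    Uses the linear conversion from the post: 30 * average + 40.
    The function name and inputs are hypothetical, for illustration only.
    """
    average = sum(scores) / len(scores)
    return 30 * average + 40


# Hypothetical students: all 1s, all 2s, and an even mix of 1s and 2s.
all_ones = objective_average_to_percentage([1, 1, 1, 1])   # 70.0
all_twos = objective_average_to_percentage([2, 2, 2, 2])   # 100.0
mixed = objective_average_to_percentage([2, 1, 2, 1])      # 85.0
```

Note that an average below 1 falls under 70 on this line (an average of 0 maps to 40), which matches the rule that students need to average at least a 1 to pass.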

If you feel a little dirty at this point because you know you just reduced all the various skills, knowledge, and abilities of your students into a single number, I say join the club. If you didn't feel that way I wouldn't have expected you to be using standards-based grading to begin with.

A "No Surprises" Approach to Final Exam Grades
Designing a final exam is often tricky business. It can't possibly assess everything in the course, but we generally want it to include the major topics and themes for the class and be possible to complete in the time allowed. We also have to think about difficulty. Trust me, your students are!

Teachers want their finals to be challenging, but they don't want to have that sinking feeling as they grade the exams that maybe the test was too hard. For whatever reason, sometimes students perform poorly and averaging the final exam grade into their other grades will look like a disaster. But ask yourself: What am I more confident in, my careful judgments of students' ability as demonstrated over an entire semester, or a fleeting, one-time judgment of students' ability on a single assessment during the most stressful time of the year? If you're using standards-based grading, I already know how you'll answer that question. If not, consider this example: I have a student who I know can do stats. She's turned in good work. She's asked quality questions. We've had good discussions. But I also know she has seven final exams this week. I still think she'll do fine, but I'll understand if she's not at her best. And I need a grading system that reflects that understanding.

In order to free myself to still give challenging, yet reasonable, assessments, without risking any huge surprises when grades are calculated, I perform a little statistical magic that ensures that the distribution of final exam grades has the same center and spread as the class grades before the final. I'm sure many of you try "curving" your exam scores some other way, such as letting the top score count as the total possible, or even having a pre-set distribution in mind of how many As, Bs, Cs, etc. you'll allow (which is not a good idea, generally, for reasons described by Krumboltz & Yeh, 1996). I prefer my method because it accounts for the distribution of grades, not just the top score, and the distribution is determined by the students, not arbitrarily by me. Allow me to demonstrate with a few examples.

Suppose before the final the average percentage grade is 85 and the standard deviation of those grades is 10. Then I grade my final exams and find that the average final exam grade is 60 with a standard deviation of 18. Ouch. But don't worry -- statistics will come to our rescue.

Provided you know a little basic descriptive statistics, the conversion is simple. For each student's final exam score, find out how many standard deviations above or below the mean they scored on the final (their final exam z-score), and match that with the same number of standard deviations above or below the mean they'd fall on the pre-final grade distribution (their pre-final z-score). Consider the following students and the class and exam statistics above:

Suppose Student A scores a 51 on the final exam. That's 0.5 standard deviations below the mean. (51 - 60 = -9, and -9/18 = -0.5.) So where is 0.5 standard deviations below the mean on the pre-final distribution? If that mean is 85 and the SD is 10, then 0.5 standard deviations below the mean is 80. So I record an 80 for that student instead of a 51.
Suppose Student B scores a 75 on the final exam. That's about 0.83 standard deviations above the mean. (75 - 60 = 15, and 15/18 ≈ 0.83.) So where is 0.83 standard deviations above the mean on the pre-final distribution? With an SD of 10, that's about 8.3 points above an 85, so I record their exam grade as a 93.3.
Suppose Student C scores a 60 on the final exam. That's the same as the mean, so zero standard deviations above or below. That conversion is super-easy: their final exam grade is the mean of the pre-final distribution, an 85.
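The three conversions above can be sketched in a few lines of Python. This is only an illustration of the z-score matching described in the post; the function and parameter names are mine, and the means and standard deviations are passed in directly rather than computed from a class list.

```python
def rescale_exam_score(exam_score, exam_mean, exam_sd, prefinal_mean, prefinal_sd):
    """Map a raw final exam score onto the pre-final grade distribution.

    Step 1: find the exam z-score (SDs above or below the exam mean).
    Step 2: place that same z-score on the pre-final distribution.
    Names are hypothetical; exam_sd must be nonzero.
    """
    z_score = (exam_score - exam_mean) / exam_sd
    return prefinal_mean + z_score * prefinal_sd


# Class statistics from the example: exam mean 60, SD 18; pre-final mean 85, SD 10.
student_a = rescale_exam_score(51, 60, 18, 85, 10)  # 80.0
student_b = rescale_exam_score(75, 60, 18, 85, 10)  # about 93.3
student_c = rescale_exam_score(60, 60, 18, 85, 10)  # 85.0
```

Nothing in this sketch caps the result, so a student far above the exam mean could land above 100; whether to clip such values is a choice the post doesn't address.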

For an example of how to set up a spreadsheet to do this, see https://docs.google.com/spreadsheet/ccc?key=0Anne5Z-jCkqhdDVtemkyaGhnRWFfclJoa0dIUVQ5RVE. I recommend making a copy of it for yourself and seeing what happens as you change values.
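If you'd rather experiment outside a spreadsheet, here is a rough Python equivalent that works on whole lists of scores. It is a sketch under my own assumptions, not a copy of the spreadsheet: the function name and the tiny three-student class are made up, and I use the population standard deviation (`statistics.pstdev`); a spreadsheet might use the sample standard deviation instead, which would shift the results slightly.

```python
from statistics import mean, pstdev


def rescale_exam_scores(exam_scores, prefinal_grades):
    """Rescale every exam score so the exam distribution takes on the
    mean and SD of the pre-final grades.

    Assumes population SD (pstdev); this is an illustrative choice.
    """
    exam_mean, exam_sd = mean(exam_scores), pstdev(exam_scores)
    pre_mean, pre_sd = mean(prefinal_grades), pstdev(prefinal_grades)
    if exam_sd == 0:
        # Everyone earned the same exam score, so everyone gets the pre-final mean.
        return [float(pre_mean) for _ in exam_scores]
    return [pre_mean + (score - exam_mean) / exam_sd * pre_sd
            for score in exam_scores]


# Hypothetical data for a three-student class.
exam_scores = [42, 60, 78]
prefinal_grades = [75, 85, 95]
rescaled = rescale_exam_scores(exam_scores, prefinal_grades)
# The rescaled list has (up to rounding) the same mean and SD as prefinal_grades.
```

Changing the numbers in `prefinal_grades` shows the effect directly: the rescaled exam scores always inherit that list's center and spread, however hard or easy the exam turned out to be.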

This is not a perfect system (and I welcome critiques of its imperfections in the comments), but it does take away the element of surprise if the final exam happens to be way too easy or too difficult, or if other circumstances prevent grades from working out the way you'd expect. Yes, this is a norm-referenced system instead of a criterion-referenced system, meaning that the grades students earn on the final are determined largely by how they compare to their classmates and the class average. The good news is this: both the teacher and the students have an incentive before the final to master as many objectives as possible, and that is criterion-referenced. A high pre-final average helps everyone get a high final exam average, and a small pre-final standard deviation minimizes variability in final exam scores.

References

Her, T., & Webb, D. C. (2004). Retracing a path to assessing for understanding. In T. A. Romberg (Ed.), Standards-based mathematics assessment in middle school: Rethinking classroom practice (pp. 200-220). New York, NY: Teachers College Press.

Krumboltz, J. D., & Yeh, C. J. (1996). Competitive grading sabotages good teaching. Phi Delta Kappan, 78(4), 324-326. Retrieved from http://www.jstor.org/stable/20405782