### Sorting Out the Summative: When Standards-Based Grading Meets the End of the Semester


#### Formative to Summative
Besides giving students a lot of written and verbal feedback about where they should try to improve, I've been using the simplest of measures to record their performance on class objectives: students either (a) "get it," (b) "sort of get it," or (c) "don't get it/haven't demonstrated it." You could think of these as "green light," "yellow light," and "red light," respectively. I've tried discerning more levels of understanding in a gradebook, and it only seems to lead to confusion and indecision (for both me and my students), so I'm sticking to three levels, as suggested by Her & Webb (2004). If I need more detail, I can always go back to copies of the work students have submitted and the comments I've made.

The gradebook we have for class is pretty primitive and, as far as I can tell, it only accepts numbers, so I mark my three levels as 2, 1, or 0. It doesn't take much explaining for students to see that a 1 shouldn't be viewed as "out of two" and therefore worth 50%. I do tell them, though, that to receive credit for the course they must average at least a 1 across all objectives. In other words, you can't pass the class without demonstrating, on average, at least some understanding of every objective.
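As a minimal sketch of this bookkeeping, the Python below encodes the three levels as 2, 1, and 0 and applies the pass rule (an average of at least 1 across all objectives). The dictionary `LEVELS` and the functions `objective_average` and `earns_credit` are hypothetical names for illustration; they are not features of any particular gradebook.

```python
# Hypothetical encoding of the three gradebook levels described above.
LEVELS = {
    "get it": 2,          # green light
    "sort of get it": 1,  # yellow light
    "don't get it": 0,    # red light: not yet demonstrated
}


def objective_average(marks):
    """Average numeric score across all objectives for one student."""
    scores = [LEVELS[m] for m in marks]
    return sum(scores) / len(scores)


def earns_credit(marks):
    """Course credit requires averaging at least a 1 across all objectives."""
    return objective_average(marks) >= 1


student = ["get it", "sort of get it", "don't get it", "sort of get it"]
print(objective_average(student))  # 1.0
print(earns_credit(student))       # True
```

A red light on one objective can be offset by a green light on another, which matches the "average" wording of the rule rather than a per-objective minimum.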

Around here, as in many other places, 70% seems to be the low end of passing grades. (We're not messing with Ds.) So if a student with all 1s should get at least 70%, a student with all 2s maxes out at 100%, and we choose a linear function between the two, the "conversion formula" to percentages is simply:

percentage = 30 * objective score average + 40
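The formula comes from fitting a line through the two anchor points: an average of 1 maps to 70, and an average of 2 maps to 100. A short Python sketch, with hypothetical helper names, that derives the slope and intercept from those points and then applies them:

```python
def linear_through(x1, y1, x2, y2):
    """Return (slope, intercept) of the line through (x1, y1) and (x2, y2)."""
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


# Anchor points from the text: all 1s -> 70%, all 2s -> 100%.
SLOPE, INTERCEPT = linear_through(1, 70, 2, 100)  # (30.0, 40.0)


def to_percentage(objective_average):
    """Convert an objective score average (0 to 2) to a percentage grade."""
    return SLOPE * objective_average + INTERCEPT


print(to_percentage(1.0))  # 70.0
print(to_percentage(1.5))  # 85.0
print(to_percentage(2.0))  # 100.0
```

Because the line is extended rather than clipped, an average of 0 maps to 40%, so a student with no demonstrated objectives lands at 40 rather than 0.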

If you feel a little dirty at this point because you just reduced all the various skills, knowledge, and abilities of your students to a single number, I say join the club. If you didn't feel that way, I wouldn't have expected you to be using standards-based grading to begin with.

#### A "No Surprises" Approach to Final Exam Grades
Designing a final exam is often tricky business. It can't possibly assess everything in the course, but we generally want it to include the major topics and themes for the class and be possible to complete in the time allowed. We also have to think about difficulty. Trust me, your students are!

Teachers want their finals to be challenging, but they don't want to get that sinking feeling, while grading, that the test was too hard. Sometimes, for whatever reason, students perform poorly, and averaging the final exam grade into their other grades looks like a disaster. But ask yourself: What am I more confident in, my careful judgments of students' ability as demonstrated over an entire semester, or a fleeting, one-time judgment of students' ability on a single assessment during the most stressful time of the year? If you're using standards-based grading, I already know how you'll answer that question. If not, consider this example: I have a student who I know can do stats. She's turned in good work. She's asked quality questions. We've had good discussions. But I also know she has seven final exams this week. I still think she'll do fine, but I'll understand if she's not at her best. And I need a grading system that reflects that understanding.

In order to free myself to still give challenging, yet reasonable, assessments without risking any huge surprises when grades are calculated, I perform a little statistical magic that ensures the distribution of final exam grades has the same center and spread as the distribution of class grades before the final. I'm sure many of you "curve" your exam scores some other way, such as letting the top score count as the total possible, or even having a preset distribution in mind of how many As, Bs, Cs, etc. you'll allow (which is generally not a good idea, for reasons described by Krumboltz & Yeh, 1996). I prefer my method because it accounts for the whole distribution of grades, not just the top score, and that distribution is determined by the students, not arbitrarily by me. Allow me to demonstrate with a couple of examples.

Suppose before the final the average percentage grade is 85 and the standard deviation of those grades is 10. Then I grade my final exams and find that the average final exam grade is 60 with a standard deviation of 18. Ouch. But don't worry -- statistics will come to our rescue.

Provided you know a little basic descriptive statistics, the conversion is simple. For each student's final exam score, find how many standard deviations above or below the mean they scored on the final (their final exam z-score), then find the point that same number of standard deviations above or below the mean of the pre-final grade distribution. In symbols: z = (final exam score - exam mean) / exam SD, and recorded grade = pre-final mean + z * pre-final SD. Consider the following students, using the class and exam statistics above:

• Suppose Student A scores a 51 on the final exam. That's 0.5 standard deviations below the mean. (51 - 60 = -9, and -9/18 = -0.5.) So where is 0.5 standard deviations below the mean on the pre-final distribution? If that mean is 85 and the SD is 10, then 0.5 standard deviations below the mean is 80. So I record an 80 for that student instead of a 51.
• Suppose Student B scores a 75 on the final exam. That's about 0.83 standard deviations above the mean. (75 - 60 = 15, and 15/18 ≈ 0.83.) So where is 0.83 standard deviations above the mean on the pre-final distribution? About 8.3 percentage points above 85 (0.83 * 10 = 8.3), so I record their exam grade as a 93.3.
• Suppose Student C scores a 60 on the final exam. That's exactly the mean, so zero standard deviations above or below. That conversion is super-easy: their final exam grade is the pre-final mean, an 85.
For an example of how to set up a spreadsheet to do this, see https://docs.google.com/spreadsheet/ccc?key=0Anne5Z-jCkqhdDVtemkyaGhnRWFfclJoa0dIUVQ5RVE. I recommend making a copy of it for yourself and seeing what happens as you change values.
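For readers who would rather script the conversion than use a spreadsheet, here is a minimal Python sketch of the same z-score mapping. `convert_final_score` is a hypothetical name; the class statistics are the ones from the example above, and the sketch assumes the exam standard deviation is nonzero.

```python
def convert_final_score(raw, exam_mean, exam_sd, prefinal_mean, prefinal_sd):
    """Map a raw final exam score onto the pre-final grade distribution.

    The student keeps the same z-score: z = (raw - exam_mean) / exam_sd,
    and the recorded grade is prefinal_mean + z * prefinal_sd.
    Assumes exam_sd is nonzero.
    """
    z = (raw - exam_mean) / exam_sd
    return prefinal_mean + z * prefinal_sd


# Statistics from the example: pre-final mean 85, SD 10;
# final exam mean 60, SD 18.
for name, raw in [("A", 51), ("B", 75), ("C", 60)]:
    grade = convert_final_score(raw, 60, 18, 85, 10)
    print(f"Student {name}: raw {raw} -> recorded {grade:.1f}")
```

Running it reproduces the three worked examples: 80.0 for Student A, 93.3 for Student B, and 85.0 for Student C.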

This is not a perfect system (and remarks about its imperfections are welcome in the comments), but it does take away the element of surprise if the final exam happens to be way too easy or too difficult, or if other circumstances prevent grades from working out the way you'd expect. Yes, this is a norm-referenced system rather than a criterion-referenced one, meaning that the grades students earn on the final are determined largely by how they compare to their classmates and the class average. The good news is this: both the teacher and the students have an incentive before the final to master as many objectives as possible, and that part is criterion-referenced. A high pre-final average helps everyone get a high final exam average, and a small pre-final standard deviation minimizes variability in the converted final exam scores.
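The claim that converted grades inherit the pre-final center and spread can be checked directly. The sketch below applies the conversion to a whole class and confirms the result has the pre-final mean and standard deviation. The raw scores are made up for illustration, and using the population SD (`statistics.pstdev`) is an assumption; the sample SD (`statistics.stdev`) behaves the same way as long as the same convention is used for both distributions.

```python
import statistics


def convert_class(raw_scores, prefinal_mean, prefinal_sd):
    """Rescale a class's final exam scores to the pre-final mean and SD.

    Assumes population standard deviation for the exam scores; use the
    same convention that produced prefinal_sd.
    """
    exam_mean = statistics.mean(raw_scores)
    exam_sd = statistics.pstdev(raw_scores)
    if exam_sd == 0:
        raise ValueError("all exam scores are identical; z-scores are undefined")
    return [prefinal_mean + (x - exam_mean) / exam_sd * prefinal_sd
            for x in raw_scores]


# Hypothetical raw final exam scores (mean 60); pre-final mean 85, SD 10.
raw = [42, 51, 60, 60, 69, 78]
converted = convert_class(raw, 85, 10)

print(round(statistics.mean(converted), 6))    # 85.0
print(round(statistics.pstdev(converted), 6))  # 10.0
```

Nothing in this sketch caps grades at 100, so a student far enough above the exam mean can receive a converted grade over 100 (here the top raw score of 78 converts to about 100.5); whether to clip that is a separate policy choice.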

#### References

Her, T., & Webb, D. C. (2004). Retracing a path to assessing for understanding. In T. A. Romberg (Ed.), Standards-based mathematics assessment in middle school: Rethinking classroom practice (pp. 200-220). New York, NY: Teachers College Press.

Krumboltz, J. D., & Yeh, C. J. (1996). Competitive grading sabotages good teaching. Phi Delta Kappan, 78(4), 324-326. Retrieved from http://www.jstor.org/stable/20405782

1. I think it makes sense to have final, overall grades be norm-referenced... if the school agrees that overall grades are for sorting kids into levels. So much stress and angst around final grades comes from differing ideas about what the darn things mean! I also love that you're just plain-old using statistics on the norm-referenced part of your grades. The worst, worst, worst thing about a lot of people's grading is that the final grade is used as a ranking mechanism AND it's not made up of directly-compared data. The teacher decides that assignment 1 is worth x% and assignment 2 is worth y%, and goes from there - with no understanding that those percentages, AND the actual measurement of assignments 1 and 2 themselves, are fraught with error!

ActiveGrade has a neat way to do criterion-referenced overall grades - see the blog post at http://bit.ly/vDShRS for more specifics.

2. I really appreciate how incredibly thoughtful you are about the complex issues of grading and, more importantly, how a child internalizes what that grade represents. As the principal of an elementary school that uses a traditional report card, I know how hard it is for teachers to synthesize a quarter's worth of work into a single grade. To combat the confusion, we have moved to an SB tracking system that provides real-time reports to parents. Parents can see at any time the level of mastery their child has achieved on the core standards that have been taught. When the final report card is created, a standard-by-standard summary is attached to it.

This is obviously easier in an elementary environment, as the final grade is not attached to graduation requirements or evaluated on college applications. The reality is that parents are much more interested in the standard-by-standard report than they are in the actual report card.

3. Riley: Sorting/ranking works well with single letters or numbers, and any school that is trying to do otherwise with their grades but is still hanging on to the single reported measure probably has a lot of thinking to do. And you're giving me flashbacks of crazy things I saw teachers do with their grades. I'd cringe when I'd hear a teacher say things like, "I'm going to make their book report worth 1000 points so kids will have to turn it in or fail the course." Meanwhile, they forgot they weighted book reports as only 15% of a student's grade. Besides, 1000 points? I can discern 3 or 4 levels of student performance. Never 1000. Not even 10.

Trenton: I think there's a tradition of SBG in elementary school. I remember getting a report card with a whole bunch of descriptors for skills, behavior, knowledge, etc. with marks for S (satisfactory), N (needs improvement) and I (improved). Instead of elementary schools trying to grade more like high schools, maybe we should have been doing it the other way around. At the college level I have much more freedom than I had as a high school teacher, but with the right administrative support and professional development I see no good reasons that SBG can't be successful in high school. (Particularly if schools drop their current grading software and adopt Riley's or something like it.)

4. I applaud your efforts (in the midst of finals week!) to share your decision making and statistical methods that relate standards based grading to the percent-based grading scheme still required by most secondary schools and universities.

5. The "I'm going to adjust THIS variable in my gradebook so that students have to do THIS work" is so backwards. I don't know... I get it, and I even did it a few times, but it's a kind of desperate duct-tape solution to the problem of motivating students to do the work you want them to do.

I love SBG because we talk about the learning we want students to do instead of the work we want them to do.

6. I love your description of this as the "Monday morning problem of SBG" :-) An extra wrinkle in the problem is when you have to compare the results of your class with other classes (who most likely didn't use SBG).