It certainly won't be a surprise if Derek asks a question that focuses on issues of validity and causal inference. He mentioned it to me personally and put it in a study guide, so studying it now will be time well spent. I feel like I've had a tendency to read the validity literature a bit too quickly or superficially, so this is a good opportunity for me to revisit some of the papers I've looked at over the past couple of years. Here's the list I've put together for myself:
AERA/APA/NCME. (1999). Standards for educational and psychological testing. Washington, D.C.: American Educational Research Association. [Just the first chapter, "Validity."]
Angoff, W. H. (1988). Validity: An evolving concept. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 19–32). Hillsdale, NJ: Lawrence Erlbaum.
Borsboom, D., Cramer, A. O. J., Kievit, R. A., Scholten, A. Z., & Franic, S. (2009). The end of construct validity. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions, and applications (pp. 135–170). Information Age Publishing.
Brookhart, S. M. (2003). Developing measurement theory for classroom assessment purposes and uses. Educational Measurement: Issues and Practice, 22(4), 5–12. doi:10.1111/j.1745-3992.2003.tb00139.x
Chatterji, M. (2003). Designing and using tools for educational assessment. Boston, MA: Allyn & Bacon. [Chapter 3, "Quality of Assessment Results: Validity, Reliability, and Utility"]
Cronbach, L. J. (1988). Five perspectives on validity argument. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 3–17). Hillsdale, NJ: Lawrence Erlbaum.
Eisenhart, M. A., & Howe, K. R. (1992). Validity in educational research. In M. D. LeCompte, W. L. Millroy, & J. Preissle (Eds.), The handbook of qualitative research in education (pp. 642–680). San Diego, CA: Academic Press.
Gorin, J. S. (2006). Test design with cognition in mind. Educational Measurement: Issues and Practice, 25(4), 21–35. doi:10.1111/j.1745-3992.2006.00076.x
Haertel, E. H., & Herman, J. L. (2005). A historical perspective on validity arguments for accountability testing. In J. L. Herman & E. H. Haertel (Eds.), Uses and misuses of data for educational accountability and improvement (104th Yearbook of the National Society for the Study of Education, pp. 1–34). Malden, MA: Wiley-Blackwell.
Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81(396), 945–960. doi:10.2307/2289069
Kane, M. T. (1992). An argument-based approach to validity. Psychological Bulletin, 112(3), 527–535. doi:10.1037/0033-2909.112.3.527
Leighton, J. P., & Gierl, M. J. (2007). Defining and evaluating models of cognition used in educational measurement to make inferences about examinees’ thinking processes. Educational Measurement: Issues and Practice, 26(2), 3–16. doi:10.1111/j.1745-3992.2007.00090.x
Linn, R. L., & Baker, E. L. (1996). Can performance-based student assessments be psychometrically sound? In J. B. Baron & D. P. Wolf (Eds.), Performance-based student assessment: Challenges and possibilities (pp. 84–103). Chicago, IL: The University of Chicago Press.
Messick, S. (1988). The once and future issues of validity: Assessing the meaning and consequences of measurement. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 33–45). Hillsdale, NJ: Lawrence Erlbaum.
Michell, J. (2009). Invalidity in validity. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions, and applications (pp. 111–133). Information Age Publishing.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin. [Probably Chapters 1–3 and 11, if not more.]
Shepard, L. A. (1993). Evaluating test validity. Review of Research in Education, 19(1), 405–450.
Shepard, L. A. (1997). The centrality of test use and consequences for test validity. Educational Measurement: Issues and Practice, 16(2), 5–24. doi:10.1111/j.1745-3992.1997.tb00585.x
Zumbo, B. D. (2009). Validity as contextualized and pragmatic explanation, and its implications for validation practice. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions, and applications (pp. 65–82). Information Age Publishing.
Thankfully, I've read some of these papers recently for my Advances in Assessment course, so the amount of reading I have to do is appreciably less than it might appear. As is my habit, I'll study them in chronological order, in the hope of getting a sense of how the field's thinking and practice regarding these ideas have evolved over the past several decades.
Although I have little other graduate school experience to compare it to, I feel like this reading list represents what sets a PhD apart, particularly one earned at an R1 university. It's not glamorous, and its relevance to day-to-day teaching and learning in classrooms might not be immediately obvious. But without attending to issues like validity and causal inference, it's much harder to be sure about what we know and how we're using that knowledge. Issues of validity should be at the heart of any assessment or measurement, and when they're attended to properly, we greatly improve our ability to advance educational theory and practice.