Student Assessment

Meg Gorzycki, Ed.D.

Why Think about Assessment?

Assessments of student work and proficiency can have important implications for students’ academic progress and entry into a given profession. They may also reveal important information about:

  • Students’ developmental needs (Hughes & Scott-Clayton, 2011)
  • Whether programs meet accreditation standards (Martell, 2007)
  • The quality of instruction (Upcraft & Schuh, 1996)

It is important, therefore, that assessments:

  1. Measure, qualify, and evaluate exactly what students have been told will be measured, qualified, and evaluated
  2. Use criteria that are appropriate to the course outcomes and the developmental level of the course work
  3. Achieve strong consistency in the way standards are applied to each student
  4. Apply criteria and standards that reflect competencies widely embraced in a profession or discipline

The following tutorial will address:

  • Formative Assessment
  • Summative Assessment
  • Formative Assessment Techniques
  • Multiple Choice Tests
  • Rubrics

Formative Assessment

Formative assessment provides information on the quality of student work or student understanding at a given moment in time. It generally does not generate a grade; its purpose is to give students guidance on how to improve their work before submitting it for a summative grade. Formative assessments are excellent teaching tools (Boston, 2002; Miller et al., 1998; Garfield, 1994) as they:

  • Provide instructors with a means of helping students identify and understand essential criteria for mastery and understand why those criteria matter
  • Provide instructors with a means of helping students understand the subtle differences between varying qualities of work
  • Provide instruments with which students can rehearse their peer- and self-evaluation skills

Summative Assessment

Summative assessment represents a final evaluation of student work. Summative assessments are often associated with final exams that call upon students to recall the cumulative knowledge of a course, or with culminating projects that represent students’ synthesis and application of the knowledge, concepts, and skills addressed in the course. Summative assessments generate grades.

Formative Assessment Techniques

The decision to use formative assessments is influenced by the purpose and context of the assessment. Formative assessments are often shaped by the developmental level of the class, the kind of knowledge (declarative, procedural, conditional) targeted by the course, and logistical considerations such as time and available resources (Nilson, 2010). Formative assessments are versatile and can be used to:

  • Review prior knowledge when introducing new material
  • Spot check students’ progress mid-way through a lesson or unit of lessons
  • Rehearse skills embodied in an upcoming summative assessment
  • Prompt student reflection about their values and attitudes

As Table 1 suggests, the techniques for formative assessments may engage students in a variety of activities. The student learning outcomes in the table represent the cardinal course outcomes, and the logistics of assessment address where the formative assessments are inserted into the course and what is necessary to facilitate them. Note that assessments may address all or only part of a particular outcome.

Table 1: Techniques for Formative Assessments with Student Learning Outcomes in a Course on Ethics and Media

Row 1

  • Student Learning Outcome: Students will trace the evolution of media monopoly in the US, identify the major owners of media, and describe the impact of media monopoly
  • Purpose of Formative Assessment: To review course materials prior to a summative exam
  • Formative Assessment: Students will work in pairs to create a timeline of major events and list key consequences of media monopoly
  • Logistics of Assessment: Students will work for 20 minutes in pairs, using their notes and readings, and exchange their findings with another pair; each pair will assess the timelines and lists for thoroughness

Row 2

  • Student Learning Outcome: Students will identify the salient arguments for and against the censorship of entertainment, identify which ethical principles are involved, and construct a thesis defending their own position on the controversy
  • Purpose of Formative Assessment: To review students’ prior knowledge about ethical principles and censorship as a means of introducing the unit on censorship and entertainment
  • Formative Assessment: Students will take a 12-point quiz targeting their knowledge about ethical principles and censorship of entertainment in the US
  • Logistics of Assessment: Students will work individually on the quiz; the instructor will walk through the answers, directing students to take notes and answering questions

Row 3

  • Student Learning Outcome: Students will identify the salient arguments for and against the censorship of entertainment, identify which ethical principles are involved, and construct a thesis defending their own position on the controversy
  • Purpose of Formative Assessment: To critique students’ compositions based on their thesis regarding the ethics of censorship and offer insights about how to rewrite an essay
  • Formative Assessment: Students submit essays to the instructor, who will use the same rubric students used to compose the essay to assess the quality of student work and provide comments on how to improve it
  • Logistics of Assessment: Students will “grade” essays and, in a subsequent lesson, review some of the strengths and limitations commonly found in them; students will then be directed to rewrite their essays for summative credit

Multiple Choice Tests

Multiple choice exams evolved subsequent to the introduction of quantitative measurement of student achievement by William Farish in 1792 and were introduced to the classroom in 1914 at the University of Kansas (Hogan, 2007; Lemann, 2000). While decreasing the time and effort required to grade exams, and while enabling quantitative studies of large numbers of tests, multiple choice tests are controversial because they do not encourage students to read or comprehend deeply (Farr, Pritchard, & Smitten, 1990), they do not measure high-level thinking (Haladyna, Downing, & Rodriguez, 2002; Roediger & Marsh, 2005), and they tend to reduce the amount of time students spend studying (Kulhavey, Dwyer, & Silver, 1975; Scouller, 1998).

The decision to use multiple choice exams should take into account the purpose of the exam and the weight it will bear in representing student competencies.

As Table 2 indicates, the objective of exams may vary even if the content of the exam remains the same. In Table 2, both prompts direct students’ attention to Victorian Realism in literature, one by inviting students to identify themes from a bank of options, the other by inviting students to form and defend an opinion. Each prompt targets students’ mastery of declarative knowledge, but the essay prompt also targets students’ analytical and writing skills.

Table 2: A Comparison of Multiple Choice and Essay Prompts in Assessments of Student Work in a Course on Victorian Literature

Test Style: Multiple Choice

Prompt:

  1. Which of the following themes was not distinct in Victorian Realism?

     a. The indifference of the affluent to the poor
     b. The crippling effects of criminalizing debt
     c. The ruin brought about by ignorance and vanity
     d. The improvement of civilization wrought by colonialism

Purpose: To identify themes in Victorian Realism

Test Style: Essay

Prompt: Compose a 2-3 paragraph essay in response to the following: Which of the following statements would most likely have been asserted by a Victorian Realist? Use examples to support your claims.

     a. Let all abandon self-regulation to the muses somewhere between the heart and heaven that we may drink deep of life and dreams
     b. In clanking of the mill, the cranking of cogs—the lessons learned young by busy hands discipline the blood to survive

Purpose: To interpret and synthesize ideas; to develop a thesis and evidence to support it; to articulate ideas in a scholarly and grammatically correct format

In determining whether to use the multiple choice style, instructors may consider not only the purpose of the prompts, but the purpose of the assessment itself. If the purpose of the assessment is to pre-test students’ prior knowledge, to administer a formative progress report of student learning, or to facilitate a review, it may not matter that the prompts do not generate higher-level thinking. If the purpose of the assessment is to assess original and higher-level thinking and to represent a summative achievement, the multiple choice option may not be optimal.

As illustrated in Tables 3 and 4, multiple choice questions may be based on graphic images, such as charts, tables, and graphs, or on narratives.

Table 3: A Multiple Choice Prompt Based on a Graphic Display of Data

Graph: Book Reading Patterns. The General Reading Habits of America, Pew Research Center, 2012. [Graph not reproduced here.]
Prompts

1. The data in the chart above implies that:

  a. People without college degrees have little interest in reading
  b. People with high incomes read more than poor people
  c. People who read books generally do not read more than 21 a year

2. The data in the chart above reveals that:

  a. About twice as many people aged 50-65 read 1-5 books a year as read over 21 books a year
  b. The number of people who read more than 21 books a year is evenly distributed by ethnicity
  c. The number of males who read 1-5 books a year is almost double the number of females who read 1-5 books a year

Table 4: A Multiple Choice Prompt Based on an Excerpt from an Article in a Scholarly Journal

Excerpt: Biederman, J. (2005). Attention-deficit/hyperactivity disorder: A selective overview. Biological Psychiatry, 57(11), 1215-1220. [Abstract]

Attention-deficit/hyperactivity disorder (ADHD) is a multifactorial and clinically heterogeneous disorder that is associated with tremendous financial burden, stress to families, and adverse academic and vocational outcomes. Attention-deficit/hyperactivity disorder is highly prevalent in children worldwide, and the prevalence of this disorder in adults is increasingly recognized. Studies of adults with a diagnosis of childhood-onset ADHD indicate that clinical correlates—demographic, psychosocial, psychiatric, and cognitive features—mirror findings among children with ADHD. Predictors of persistence of ADHD include family history of the disorder, psychiatric comorbidity, and psychosocial adversity. Family studies of ADHD have consistently supported its strong familial nature. Psychiatric disorders comorbid with childhood ADHD include oppositional defiant and conduct disorders, whereas mood and anxiety disorders are comorbid with ADHD in both children and adults. Pregnancy and delivery complications, maternal smoking during pregnancy, and adverse family environment variables are considered important risk factors for ADHD. The etiology of ADHD has not been clearly identified, although evidence supports neurobiologic and genetic origins. Structural and functional imaging studies suggest that dysfunction in the fronto-subcortical pathways, as well as imbalances in the dopaminergic and noradrenergic systems, contribute to the pathophysiology of ADHD. Medication with dopaminergic and noradrenergic activity seems to reduce ADHD symptoms by blocking dopamine and norepinephrine reuptake. Such alterations in dopaminergic and noradrenergic function are apparently necessary for the clinical efficacy of pharmacologic treatments of ADHD.

Prompts

1. The abstract indicates that:

a. ADHD is caused by hyperactivity, defiance, anxiety, and mood swings

b. The family dynamics of adults with ADHD are different from those of children with ADHD

c. Neuroimaging points to dysfunctions in more than one part of the brain of those with ADHD

2. The abstract implies that:

a. ADHD can be treated but at present cannot be cured

b. Symptoms of ADHD can only be clinically observed by neuroimaging

c. Women with ADHD should not become pregnant or raise children

Research has found that 75% of scholars agree that, if one is to use multiple choice tests, the tests should (Haladyna, Downing, & Rodriguez, 2002):

  1. Target important, not trivial, information
  2. Contain clear directions
  3. Embed a central idea in the stem (the statement or question to be resolved)
  4. Avoid disclosing clues
  5. Make distractor answers plausible
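For instructors who keep their test banks electronically, the structural parts of these guidelines can even be checked automatically. The following is an illustrative sketch only (in Python; the class and field names are hypothetical, not drawn from any cited source): it models an item as a stem, a keyed answer, and distractors, and checks the guidelines that lend themselves to automation.

```python
from dataclasses import dataclass, field

@dataclass
class MultipleChoiceItem:
    stem: str          # the central idea, phrased as a single question
    correct: str       # the keyed answer
    distractors: list = field(default_factory=list)  # plausible wrong answers

    def is_well_formed(self) -> bool:
        """Check the guidelines that can be automated: one central question
        in the stem, exactly one key, and at least two distinct distractors."""
        return (
            len(self.stem.split("?")) <= 2          # at most one question mark
            and bool(self.correct)                   # a key is present
            and len(self.distractors) >= 2           # enough distractors
            and self.correct not in self.distractors # key is not duplicated
        )

item = MultipleChoiceItem(
    stem="Which of the following themes was not distinct in Victorian Realism?",
    correct="The improvement of civilization wrought by colonialism",
    distractors=[
        "The indifference of the affluent to the poor",
        "The crippling effects of criminalizing debt",
    ],
)
print(item.is_well_formed())  # True
```

Whether the distractors are genuinely plausible, and whether the stem targets important rather than trivial information, of course still requires an instructor’s judgment.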

Rubrics

The purpose of a rubric is to improve the consistency of grading and to present a clear and transparent set of criteria and standards against which work will be assessed. Research on the use of rubrics in higher education indicates that using rubrics can lead to improvements in student learning despite the challenges rubrics present for rater reliability (Reddy & Andrade, 2010). While a rubric’s reliability refers to the consistency of ratings generated by those who use it to grade or assess student work, its validity refers to the appropriateness of the rubric’s construction, content, and criteria (Moskal & Leydens, 2000). To ensure a rubric’s reliability and validity, it is helpful for instructors to work with colleagues who teach in the same program and to hold regular conversations about student progress and about the kinds of criteria and standards that ought to be common across student assessments.
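The rater reliability described above can be estimated in simple ways when two colleagues score the same set of papers with the same rubric. The following sketch is illustrative only (in Python, with invented ratings): it computes the exact-agreement rate, that is, the fraction of essays on which both raters assigned the same rubric level.

```python
def agreement_rate(rater_a, rater_b):
    """Fraction of essays on which both raters assigned the same level."""
    assert len(rater_a) == len(rater_b), "both raters must score every essay"
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

a = [3, 4, 2, 3, 1, 4]  # rater A's levels for six essays (invented data)
b = [3, 3, 2, 3, 2, 4]  # rater B's levels for the same essays
print(agreement_rate(a, b))  # 4 of 6 essays matched
```

Exact agreement is the simplest measure; more forgiving statistics that credit near-misses or correct for chance agreement exist, but even this simple rate can anchor a department’s conversation about how consistently a rubric is being applied.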

Rubrics are helpful as they clarify expectations, may target core outcomes, tasks, and behaviors, and provide students with specific information about the strengths and limitations of their work; but they are not a replacement for instruction and often require explanation so that students understand the criteria and standards embodied in them (Andrade, 2005).

Table 5 provides an example of a rubric for a 5-page writing assignment in a course on 20th-century history. Students were directed to identify the point at which the Cold War became inevitable and defend their thesis. They were instructed to quote or paraphrase primary sources at least three times in the essay and to cite them properly. The rubric may have provided the instructor with an instrument that simplified the grading process, but many decisions went into the rubric before it could be used, including:

  1. What is the purpose of the 5-page writing assignment?
  2. What is the best way to translate the purposes of the essay into grading criteria?
  3. What is the best way to factor the developmental level of students into the standards?
  4. What departmental expectations for student achievement should be factored into the rubric?
  5. What kind of weight should this project have relative to other course requirements?
  6. What is needed to ensure alignment between the directions for the assignment and the rubric?

Table 5: Rubric for Assessing Essay on the Origins of the Cold War

Criterion: Thesis

  • Emerging (1): Little to no development of background and context for thesis; thesis unclear
  • Adequate (2): Marginal development of background and context for thesis; thesis vague
  • Proficient (3): Good development of background and context for thesis; thesis clear
  • Exemplary (4): Robust development of background and context for thesis; thesis very clear

Criterion: Supports

  • Emerging (1): Poor grasp of history; little depth; few insights and poor logic
  • Adequate (2): Moderate grasp of history; developed in sufficient depth; marginal insights
  • Proficient (3): Good grasp of history; developed in some depth with logical insights
  • Exemplary (4): Excellent understanding of history; developed in depth with sharp, logical insights

Criterion: Use of Sources

  • Emerging (1): Poor selection and use of sources; number insufficient; link between sources and assertions underdeveloped
  • Adequate (2): Adequate selection and use of required sources; links between sources and thesis sometimes clear and logical
  • Proficient (3): Good selection and use of required sources; links between sources and thesis largely clear and logical
  • Exemplary (4): Outstanding selection and use of multiple sources; links sources to thesis clearly and logically

Criterion: Grammar

  • Emerging (1): Poor use of grammar, spelling, syntax, citation, and APA formatting; major errors consistent
  • Adequate (2): Inconsistent use of grammar, spelling, syntax, citation, and APA formatting; many errors
  • Proficient (3): Consistent use of grammar, spelling, syntax, citation, and APA formatting with few minor errors
  • Exemplary (4): Exemplary and consistent use of grammar, spelling, syntax, citation, and APA formatting
 

Because it is sometimes difficult to distinguish between levels of proficiency, instructors may help students improve their understanding by taking class time to critique sample essays with the same rubric that will be used to assess their work, explaining what makes one essay better than another. The activity keeps students engaged and prepares them to be more critical in their own proofreading, with particular sensitivity to the quality of discrete elements of their work.

References

Andrade, H. G. (2005). Teaching with rubrics: The good, the bad, and the ugly. College Teaching, 53(1), 27-31.

Biggs, J. B. (1979) Individual differences in study processes and the quality of learning outcomes. Higher Education, 8, 281-304.

Boston, C. (2002). The concept of formative assessment. Practical Assessment, Research & Evaluation, 8(9), 1-5.

Crook, T. J. (1988). The impact of classroom evaluation practices on students. Review of Educational Research, 58, 438-481.

Cross, K. P., & Angelo, T. A. (1988). Classroom Assessment Techniques. A Handbook for Faculty.

Farr, R., Pritchard, R. & Smitten, B. (1990). A description of what happens when an examinee takes a multiple choice reading comprehension test. Journal of Educational Measurement, 27, 209-226.

Funk, S. C., & Dickson, K. L. (2011). Multiple-choice and short-answer exam performance in a college classroom. Teaching of Psychology, 38(4), 273-277.

Garfield, J. B. (1994). Beyond testing and grading: Using assessment to improve student learning. Journal of Statistics Education, 2(1), 1-11.

Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple choice item-writing guidelines and classroom assessment. Applied Measurement in Education, 15, 309-334.

Hogan, R. (2007). The historical development of program evaluation: Exploring the past and present. Online Journal of Workforce Education and Development, 2, 1-14.

Hughes, K. L., & Scott-Clayton, J. (2011). Assessing developmental assessment in community colleges. Community College Review, 39(4), 327-351.

Kulhavey, R., Dwyer, R., & Silver, L. (1975). The effects of note-taking and test expectancy on the learning of text material. Journal of Educational Research, 68, 363-365.

Lemann, N. (2000). The big test. New York, NY: Farrar, Strauss, and Giroux.

Martell, K. (2007). Assessing student learning: Are business schools making the grade? The Journal of Education for Business, 82(4), 189-195.

Miller, A. H., Imrie, B. W., & Cox, K. (1998). Student assessment in higher education: a handbook for assessing performance. Psychology Press.

Moskal, B. M., & Leydens, J. A. (2000). Scoring rubric development: Validity and reliability. Practical Assessment, Research & Evaluation, 7(10), 71-81.

Nilson, L. B. (2010). Teaching at its best: A research-based resource for college instructors. San Francisco, CA: Jossey-Bass.

Reddy, Y. M., & Andrade, H. (2010). A review of rubric use in higher education. Assessment & Evaluation in Higher Education, 35(4), 435-448.

Roediger III, H. L., & Marsh, E. J. (2005). The positive and negative consequences of multiple-choice testing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31(5), 1155.

Scouller, K. (1998). The influence of assessment methods on students’ learning approaches: Multiple choice questions examination versus assignment essay. Higher Education, 35, 453-472.

Stiggins, R. J., Griswold, M. M., & Wikelund, K. R. (1989). Measuring thinking skills through classroom assessment. Journal of Education Measurement, 26, 233-246.

Upcraft, M. L., & Schuh, J. H. (1996). Assessment in student affairs: A guide for practitioners. San Francisco, CA: Jossey-Bass.

Additional Resources

Miller, R., & Leskes, A. (2005). Levels of assessment. Association of American Colleges and Universities.

Hersh, R. H. & Keeling, R. P. (2013). Changing institutional culture to promote assessment of higher learning. National Institute for Learning Outcomes Assessment.

Nine Principles of Assessment

Stiggins, R. J. (1987). Design and development of performance assessments. Educational Measurement: Issues and Practice, 6(3), 33-42.

Using Rubrics... and More!