Designing Effective Exams

Perhaps one of the most challenging aspects of teaching a course is developing exams. Whether you use low-stakes, frequent assessments (e.g., quizzes, polling questions) or high-stakes, infrequent assessments (e.g., midterms, final exams), well-designed exams can be very useful measures of student achievement of your course learning objectives and outcomes.

Ideally, effective exams have four characteristics (Piontek, 2008):

  • Valid: The exam is aligned with the learning objectives and measures the learning it was intended to measure.
  • Reliable: The exam consistently differentiates between different levels of student learning.
  • Recognizable: The exam is well-aligned with the teaching methods and learning activities.
  • Realistic: The exam is “doable” with respect to the time and resources students can bring into the exam, as well as the time and resources instructors have to grade it. 

Valid and reliable exam questions present students with a task that is both important and clearly understood, and one that can be answered correctly by anyone who has achieved the intended learning outcome or course objective(s). Below we outline some general guidelines for writing and grading exams.

General Guidelines for Designing Exams

Align the exam with your course goals and learning objectives

Exam items should align with the learning objectives of the course, unit, or lesson. Avoid testing for unimportant details, unrelated bits of information, and material that is irrelevant to the learning objectives. For example, if your learning objective involves memorization, then you should assess for remembering or understanding; if you hope students will develop problem-solving capacities, your exams should focus on assessing students’ application and analysis skills. Many instructors use the Revised Bloom’s Taxonomy (Anderson et al., 2001) when constructing questions, to assess both the kinds of knowledge to be learned (Knowledge Dimension) and the processes used to learn (Cognitive Process Dimension).

Decide on the types of exam items that should be used

There are many different types of exam items that can be used to assess students’ knowledge and skills. As noted in Table 1, each type of exam item may be better suited to measuring some learning objectives than others and each has its advantages and disadvantages in terms of ease of design, implementation, and scoring.

Table 1: Advantages and Disadvantages of Different Types of Exam Items

True-False
  Advantages: Many items can be administered in a relatively short time; moderately easy to write; easily scored.
  Disadvantages: Limited primarily to testing knowledge of information; easy to guess correctly, even if the material has not been mastered.

Multiple-Choice
  Advantages: Can be used to assess a broad range of content in a brief period; skillfully written items can measure higher-order cognitive skills; can be scored quickly.
  Disadvantages: Difficult and time consuming to write good items; possible to assess higher-order cognitive skills, but most items assess only knowledge; some correct answers can be guessed.

Matching
  Advantages: Items can be written quickly; a broad range of content can be assessed; scoring can be done efficiently.
  Disadvantages: Higher-order cognitive skills are difficult to assess.

Short Answer or Completion
  Advantages: Many can be administered in a brief amount of time; relatively efficient to score; moderately easy to write.
  Disadvantages: Difficult to identify defensible criteria for correct answers; limited to questions that can be answered or completed in very few words.

Essay
  Advantages: Can be used to measure higher-order cognitive skills; relatively easy to write questions; difficult for respondents to get the correct answer by guessing.
  Disadvantages: Time consuming to administer and score; difficult to identify reliable criteria for scoring; only a limited range of content can be sampled during any one testing period.

Adapted from Table 10.1 of Worthen et al., 1993, p. 261.

Plan your exam using an exam blueprint

Once you know the learning objectives and the question types you want to include in your exam, you should create an exam blueprint. An exam blueprint, also known as an exam plan, is a simple two-dimensional table that reflects the importance of the concepts and the emphasis they were given in the course/unit/lesson. At the initial stage of test planning, you can use the matrix to determine the proportion of items that you need in each cell of the table.

The blueprint identifies the objectives and skills that are to be tested and the relative weight on the exam given to each. The blueprint can help you ensure that you are obtaining the desired coverage of topics and level of learning. Once you create your exam blueprint you can begin writing your questions. As you develop questions for each section of the exam, record the number of questions in the cells of the matrix. The table below shows the distribution of 30 questions on an exam covering three different topics.

Table 2: Sample exam blueprint for a 30-item exam displaying the number of items by topic and cognitive level

             Topic/Objective 1   Topic/Objective 2   Topic/Objective 3   Total
Remember             2                   2                   1           5 (16.7%)
Understand           3                   2                   4           9 (30%)
Apply                3                   4                   2           9 (30%)
Analyze              0                   2                   2           4 (13.3%)
Evaluate             1                   0                   1           2 (6.7%)
Create               1                   0                   0           1 (3.3%)
Total            10 (33.3%)          10 (33.3%)          10 (33.3%)         30

It's important to note that it's just fine to have many questions at the lower cognitive process levels (Remember, Understand) if that aligns with your learning goals. This is particularly likely in introductory courses.
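If you track your blueprint in a spreadsheet or a few lines of code, the row and column tallies in Table 2 fall out automatically. Below is a minimal sketch in Python; the level names, topic labels, and per-cell counts are placeholders copied from Table 2, so substitute your own as you draft questions.

```python
# Tally an exam blueprint: rows are cognitive levels, columns are topics.
# The counts below reproduce Table 2; replace them with your own.
blueprint = {
    "Remember":   [2, 2, 1],
    "Understand": [3, 2, 4],
    "Apply":      [3, 4, 2],
    "Analyze":    [0, 2, 2],
    "Evaluate":   [1, 0, 1],
    "Create":     [1, 0, 0],
}
topics = ["Topic/Objective 1", "Topic/Objective 2", "Topic/Objective 3"]

total = sum(sum(row) for row in blueprint.values())
print(f"Total items: {total}")

# Row tallies: how many items target each cognitive level
for level, counts in blueprint.items():
    n = sum(counts)
    print(f"{level:<11} {n:>2} items ({100 * n / total:.1f}%)")

# Column tallies: how many items cover each topic
for j, topic in enumerate(topics):
    n = sum(row[j] for row in blueprint.values())
    print(f"{topic}: {n} items ({100 * n / total:.1f}%)")
```

Updating the counts as you write items makes it easy to see when one topic or cognitive level is drifting away from the weight you intended.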

Formatting the exam

Well-formatted exams not only make taking the exam less confusing and less time consuming for students, they also make grading the exam easier, especially when grading is done by hand. Consider the following suggestions:

  • Provide clear directions at the beginning of the exam. It is also helpful to indicate the number of questions and the points for each question.
  • Avoid splitting exam questions between two different pages.
  • Group similar content items together.

Additional strategies for building exams

Here are some additional guidelines for how you can prepare effective exams:

  • Create exam items while you prepare class lessons.
  • Make note of questions that students ask frequently during class.
  • Make note of common misconceptions students make during class or in homework.
  • Invite students to submit items at the end of class or at other times.
  • Collaborate with a peer or your GSIs to write and review questions.
  • Develop a “bank” of exam items using a variety of types of items that you can draw from.
  • Consider how long it will take students to complete the exam; different exam items will take students different amounts of time and effort to complete (a rough estimating sketch follows this list).
  • Do not write the exam in one day; allow time for editing and revising.
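A rough length check is to multiply item counts by per-item time estimates and compare the sum to the exam period. The sketch below is a minimal Python example; the per-item minutes are illustrative rules of thumb, not fixed standards, so adjust them for your students, your material, and the complexity of your items.

```python
# Rough estimate of exam completion time. The per-item minutes are
# illustrative assumptions only; higher-order items usually need more time.
minutes_per_item = {"multiple_choice": 1.0, "short_answer": 2.0, "essay": 15.0}
counts = {"multiple_choice": 25, "short_answer": 4, "essay": 1}

total_minutes = sum(counts[kind] * minutes_per_item[kind] for kind in counts)
print(f"Estimated completion time: {total_minutes:.0f} minutes")  # 48 minutes
```

If the estimate approaches the full exam period, consider trimming items so that slower but well-prepared students are not penalized.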

Writing Multiple Choice Questions

Guidelines for Creating Multiple Choice Questions

A standard multiple-choice exam item consists of two basic parts: a problem (stem) and a list of suggested solutions (alternatives). The stem may be in the form of either a question or an incomplete statement, and the list of alternatives contains one correct or best alternative (answer) and a number of incorrect or inferior alternatives (distractors). Here are some general guidelines for constructing multiple-choice question stems and alternatives:

  • Design each item to measure an important learning outcome
  • State the stem as clearly, directly, and simply as possible 
  • The problem should be self-contained in the stem
  • Avoid the use of negatives in the stem
  • Include in the stem any word(s) that might otherwise be repeated in each alternative
  • There should be only one correct answer among the alternatives
  • All the alternatives should be parallel with respect to grammatical structure, length, and complexity
  • Keep the alternatives short
  • Distractors should be plausible and focus on common errors or misconceptions
  • Spread correct answers equally among all the choices
  • Develop items to assess higher-order thinking, rather than simple recall or understanding questions (see below)

Writing Multiple Choice Questions to Assess Higher-Order Thinking

In addition to testing simple recall or comprehension, multiple choice questions can be used to assess students' higher-level thinking. According to Bloom's Taxonomy, a widely accepted framework for defining levels of learning, higher-level thinking involves tasks that ask students to go beyond recalling or explaining basic concepts: applying that knowledge to new situations, and analyzing and evaluating course content. If your course objectives include these skills, it is important to develop assessment measures that gauge students' ability to do such complex thinking. While essays and papers are well suited to this task, you can also write multiple choice questions that test this type of thinking. This site, developed by Cynthia Brame, includes a fuller explanation of Bloom's taxonomy, as well as specific examples of higher-order questions. It offers the following guidance and options for getting started:

  • Use Bloom's higher-order thinking categories and verbs associated with them to define the skills you wish to assess
  • Use specific examples, such as case studies or clinical vignettes, as the basis for the question, asking students to make choices, identify a rule or concept from a specific example, or interpret visuals to draw conclusions
  • Have students choose from responses that each represent different reasons underlying an answer or action
  • Use multiple questions about a single data set or scenario 

Writing Essay Questions

Essays can tap complex thinking by requiring students to organize and integrate information, interpret information, construct arguments, give explanations, evaluate the merit of ideas, and carry out other types of reasoning (Piontek, 2008). Table 3 below provides examples of essay question stems for assessing a variety of reasoning skills. Essay items allow you to evaluate how well students are able to communicate their reasoning, and they are usually less time consuming to construct than multiple-choice items that measure reasoning. The major disadvantages of essays include the amount of time you will need to spend reading and scoring student responses, as well as developing and using rubrics to ensure fairness and consistency in grading.

Guidelines for Developing Essay Items

The following guidelines can help you develop effective essay questions.

  • Restrict the use of essay questions to educational outcomes that are difficult to measure using other formats. Other assessment formats are better for measuring recall knowledge (e.g., true-false, fill-in-the-blank, multiple-choice); the essay is able to measure deep understanding and mastery of complex information.
  • Construct the item to elicit the skills and knowledge specified in the educational outcomes. When constructing essay items, start by identifying the specific skills and knowledge that will be assessed. As noted earlier, Table 3 provides examples of essay question stems for assessing a variety of reasoning skills.
  • Write the item so that students clearly understand the specific task. Once you have identified the specific skills and knowledge, word the question clearly and concisely so that it communicates to students the specific task(s) you expect them to complete (e.g., state, formulate, evaluate, use the principle of, create a plan for, etc.). If the language is ambiguous, or students feel they are guessing at “what the instructor wants me to do,” the ability of the item to measure the intended skill or knowledge decreases.

For many educational objectives aimed at higher order reasoning skills, creating a series of essay items that elicit different aspects of students' skills and knowledge can be more efficient than attempting to create one question to capture multiple objectives. By using multiple essay items, you can capture a variety of skills and knowledge while also covering a greater breadth of course content.

Table 3: Sample Essay Item Stems for Assessing Reasoning Skills

Comparing
  Describe the similarities and differences between...
  Compare the following two methods for...

Relating Cause and Effect
  What are the major causes of...
  What would be the most likely effects of...

Justifying
  Which of the following alternatives do you favor, and why?
  Explain why you agree or disagree with the following statement.

Summarizing
  State the main points included in...
  Briefly summarize the contents of...

Generalizing
  Formulate several valid generalizations for the following data.
  State a set of principles that can explain the following events.

Inferring
  In light of the information presented, what is most likely to happen when...
  How would person X be likely to react to the following issue?

Classifying
  Group the following items according to...
  What do the following items have in common?

Creating
  List as many ways as you can think of for/to...
  Describe what would happen if...

Applying
  Using the principles of...as a guide, describe how you would solve the following problem.
  Describe a situation that illustrates the principle of...

Analyzing
  Describe the reasoning errors in the following paragraph.
  List and describe the main characteristics of...

Synthesizing
  Describe a plan for proving that...
  Write a well-organized report that shows...

Evaluating
  Describe the strengths and weaknesses of...
  Using the given criteria, write an evaluation of...

Grading and Feedback

Grading can be a constructive process for both our students and for us. It can give them the opportunity to improve their knowledge and writing skills and it can give us feedback on our teaching and evaluation methods. By being consistent and fair, we can minimize the inevitably unpleasant aspects of passing judgment on someone's efforts.

Strategies for grading objective questions (Multiple-choice, True/False, Matching)

Although designing unambiguous multiple choice exam questions can be time consuming, they are often easier to grade than essay and short answer questions. But difficulties can still arise. In the case of multiple choice questions, if students are doing worse than chance would predict on a particular question (e.g., fewer than 25% answer a 4-alternative item correctly), it may be a signal that the question was poorly worded.

Item analysis is an excellent way to periodically check the effectiveness of your exam items. It identifies items that are not functioning well, enabling you to revise them, remove them from your exam, or revise your instruction, whichever is appropriate. Two common item statistics are item difficulty and item discrimination.

  • Item difficulty, also referred to as the p-value, is typically reported as the percentage of those taking the exam who choose the correct answer for an item. The higher the percentage, the easier the item. In classical testing theory (McCowan and McCowan, 1999), the optimal difficulty guideline for a 4-alternative multiple choice question is 63%.
  • Item discrimination, also referred to as the Point-Biserial Correlation (PBS), measures how well an item differentiates between higher and lower scorers on an exam. The higher the value, the more discriminating the item. Items that discriminate well are answered correctly more often by the higher-scoring students and have a high positive correlation. Good discrimination indexes are 0.4 and above; poor ones are 0.2 and below.

This Google doc contains a sample item analysis for a 5-question exam showing the item difficulty (P-value), item discrimination (PBS), and distractor frequency (A-E).
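If your testing platform does not report these statistics, they are straightforward to compute. Below is a minimal sketch in Python (using NumPy); it assumes a students-by-items matrix of 0/1 scores and correlates each item with the rest-score (the total minus the item itself), a common variant of the point-biserial that keeps an item from inflating its own discrimination.

```python
import numpy as np

def item_analysis(responses):
    """Return item difficulty and discrimination for a students x items 0/1 matrix."""
    R = np.asarray(responses, dtype=float)
    difficulty = R.mean(axis=0)        # p-value: proportion answering correctly
    totals = R.sum(axis=1)
    discrimination = np.empty(R.shape[1])
    for j in range(R.shape[1]):
        rest = totals - R[:, j]        # total score excluding item j
        if R[:, j].std() == 0 or rest.std() == 0:
            discrimination[j] = np.nan  # undefined if all scores are identical
        else:
            # Point-biserial: Pearson correlation of a 0/1 item with the rest-score
            discrimination[j] = np.corrcoef(R[:, j], rest)[0, 1]
    return difficulty, discrimination

# Hypothetical results for 6 students on 3 items (1 = correct, 0 = incorrect)
responses = [
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 1],
    [0, 0, 0],
]
p, pbs = item_analysis(responses)
print("Difficulty (p-values):", np.round(p, 2))
print("Discrimination (PBS): ", np.round(pbs, 2))
```

Items with difficulty near chance (about 25% for a 4-alternative question) or with discrimination at or below 0.2 are good candidates for revision or removal.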

Strategies for grading essay and short answer questions

Although essay and short answer questions are more labor-intensive to grade than objective questions, they provide more insight into students’ critical thinking skills. Here are some strategies to help streamline the essay/short answer grading process:

  1. Develop a rubric or answer key to guide your grading. A rubric or answer key provides an instructor with guidelines to consistently and efficiently evaluate student performance on an assignment or exam, and increases the fairness, accuracy, and consistency of the scoring process.
    • Outline what constitutes an expected answer (criteria for knowledge and skills). Using predetermined criteria increases the fairness, accuracy, and consistency of the scoring process. Identifying the criteria in advance also decreases the likelihood that you will be influenced by the initial answers you read when you begin to grade the exam.
    • Select an appropriate scoring method and point value for the exam item. Scoring methods for written communication are generally described as analytic, where each identified criterion is assigned separate points; holistic, where a single score reflects an overall judgment of quality; or single-point, where there are only two performance levels possible. When assigning a point value, it is helpful to think through how you might give partial credit if a student gets some elements of an answer correct. For example, if an essay question involves four discrete criteria, assigning a point value that is divisible by four makes grading easier (see the sketch after this list).
  2. Clarify the role of writing mechanics and other factors independent of the learning objective being measured. For example, you can outline for students how various elements of written communication (grammar, spelling, punctuation, organization and flow, use of vocabulary/terminology, use of scientific notation or formulas) figure into the scoring criteria. You should also decide whether you will decrease scores for, or ignore, the inclusion of information irrelevant to the question.
  3. Grade all student responses to the same question, rather than reading the whole exam of each student. Grading the same essay question for all students creates a more uniform standard of scoring, as it is easier to remember the criteria for scoring each answer. In addition to helping with grading reliability, this promotes grading equity and may provide a more holistic view of how the class as a whole answered each question.
  4. Create anonymity for students’ responses while scoring. Don’t look at students’ names when you read the exam, or have your students write an ID number on the exam instead. This way you will reduce grader bias. Shuffling the set of exams after grading each question also helps create anonymity. The shuffling also creates a random order in which to grade each essay item, making it less likely that you will identify a pattern in an individual student’s answers or base your score on previous impressions of that student.
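As a concrete illustration of analytic scoring with partial credit, here is a minimal sketch in Python. The criteria names and point values are hypothetical; the point is simply that each criterion carries its own points, so partial credit falls out of the per-criterion scores.

```python
# Analytic scoring: each criterion is worth its own points (hypothetical
# criteria; a 12-point essay item split evenly across four criteria).
rubric = {"thesis": 3, "evidence": 3, "analysis": 3, "organization": 3}

# Points awarded to one student per criterion (partial credit per criterion)
awarded = {"thesis": 3, "evidence": 2, "analysis": 1, "organization": 3}

total, maximum = sum(awarded.values()), sum(rubric.values())
print(f"Score: {total}/{maximum}")  # Score: 9/12
```

A holistic scorer, by contrast, would assign one overall score; the analytic breakdown costs a little more time per response but makes the partial-credit reasoning visible to students.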

Grading as a teaching team

When there are multiple sections of a course all using a common exam, it is important to think about issues of consistency and fairness. There are a couple of ways to achieve those goals. For example, you could divide exam questions among the instructors teaching the course so that each question across all sections is graded by a single individual. Another option is to have a norming session in which instructors grade a small subset of exams and compare their grading to ensure consistency and raise any questions they may have. These approaches are not mutually exclusive: instructors could norm a specific question and then grade all of the responses in a section, repeating the process with the next question. In all cases, having explicit rubrics will contribute to consistency.

Returning exams and giving students feedback

Providing feedback to students on exams and other assessments is a crucial element of the learning process. Effective feedback is timely, relevant, specific, and actionable. Here are some general strategies:

  1. Return exams promptly. If this is not possible, post an answer key and/or scoring rubric as soon as possible after students finish the exam. This provides students with feedback on their performance while it is still fresh in their memory.
  2. Provide feedback to the class as a whole regarding the following:
    • Items most missed
    • Mistakes most frequently made
    • What was done particularly well
  3. When providing feedback to individual students, do not overwhelm a student with too much information. Instead, provide targeted feedback on the specific questions or concepts they missed or scored poorly on. For example, prompt students to “think out loud”: have them answer the question and explain their thinking process aloud.