Different Kinds of Outcomes and How to Assess Them


Flashlight Evaluation Handbook Table of Contents

  1. Introduction

  2. Is Everyone Supposed to Benefit in the Same Ways?

    1. Yes, to some degree: Uniform Impacts

    2. No, to some degree: Unique Uses

  3. Additional Defining Questions About Benefits

    1. Outcomes or Value-Added?

    2. Competitors or Criteria?

    3. When is "After"?

    4. Same Outcomes, or Just Similar?

  4. Several Categories of Benefits and How to Assess Each of Them

    1. Access (how many people can learn? what kinds of people?)

    2. Better Outcomes on Traditional Goals

    3. New Outcomes, Better Outcomes?

    4. Variety of Offerings Available to Learners

    5. Net Cost and Stress; Revenues

  5. Summary of Chapter

 

1. Introduction

Imagine that we want to study the benefits of two types of technology-supported activity (a course, major, service, or an institution's effort to produce an outcome such as good writing skill or high employment rates for graduates) in order to decide which of the two activities produces better outcomes. We'll refer to these competing activities as Program A and Program B. A and B might be, for example, two versions of the same program ('before an innovation' versus 'with the innovation'), two competing pilot programs, or a real activity and a hypothetical alternative.

Let’s simplify the outcomes assessment problem a bit by assuming that the educational benefits of interest are who was able to learn from the program (what types of people, how many people, etc.), what they learned, and the consequences of 'who learned what.'

This chapter of the Flashlight Evaluation Handbook explores three key questions that you would need to answer in order to design such a study.

  1. Are the program’s outcomes intended to be the same for all its beneficiaries? If not, how can you assess them?
  2. To help design assessment procedures, how can we be more specific than merely saying that the technology is meant to cause “better educational outcomes”?
  3. What kinds of data about benefits might help the people running the program to improve those benefits (paralleling the way that activity-based cost data ought to help policy makers control costs)?

 

2. Is Everyone Supposed to Benefit in the Same Ways?

What’s a typical example of the kind of outcome goal that ought to be measured? “All students should learn to think critically (though perhaps to different degrees of skill).” “All students should get jobs (perhaps at different salaries).” In other words, the goals assume that everyone is supposed to benefit in the same ways. If that were true, it would certainly make things simpler to measure: the analyst could devise one test of the intended benefit (e.g., a test of critical thinking skill) and apply it to all the beneficiaries. But what if some students are gaining in critical thinking while others are mainly improving their creativity and still others are gaining in interpersonal skills?

As those examples indicate, there are two ways to look at almost any educational program. One perspective focuses on program benefits that are the same for everyone (“uniform impacts”), while the other focuses on benefits that are qualitatively different and somewhat unpredictable for each learner (“unique uses”) (Balestri, Ehrmann, et al., 1986; Ehrmann and Zúñiga, 1997, 2002). This section of the chapter explains these complementary perspectives on education. The following section will use these ideas to suggest ways to assess specific types of benefits.

 

A. Uniform Impacts

To some degree, all students in an educational program are supposed to learn the same things.  As shown in Figure 1, such learning by two people can be represented by two parallel arrows. The length of each person’s arrow represents the amount of growth during (and sometimes after) the program.   Students usually enter a program with differing levels of knowledge, grow to differing degrees, and leave with differing levels of achievement. The uniform impact perspective assumes that the desired direction of growth is the same for all students.

In an English course, for example, uniform impact assessment might measure student understanding of subject-verb agreement, or skill in writing a five-paragraph essay, or even love of the novels of Jane Austen. The analyst picks one or more such dimensions of learning and then assesses all learners using the same test(s). I’ve labeled this perspective “uniform impact” because it assumes that the purpose of the program is to benefit all learners in the same, predesigned way.

 

B. Unique Uses

However, that same English course (or other educational activity) can also be assessed by asking how each learner benefited the most, no matter what that benefit might have been.  I’ve termed this perspective “unique uses” because it assumes that each student is a user of the program and that, as unique human beings, learners each make somewhat different and somewhat unpredictable uses of the opportunities that the program provides.

In that English course, for example, one student may fall in love with poetry, another may gain clarity in persuasive writing, a third may fall in love with literature, and a fourth may not benefit much at all (see Figure 2).

Faculty members cope with this kind of diversity all the time. An instructor may give three students each an “A” but award the “A” for a different reason in each case. The only common denominator is some form of excellence or major growth that relates to the general aims of the course.  There are multiple possibilities for growth and it’s likely that different students will grow in different directions. 

Notice that uniform impact methods tend to miss a lot when benefits are better described in unique uses terms. In that English class, for example, imagine that the instructor had decided to grade all students only on poetry skills. One student would pass and the other three would fail. Or imagine that the instructor tested all students on poetry, persuasive writing, and love of literature, and only passed students who did well on all three tests: everyone would fail the course. Meanwhile, an instructor using a unique uses approach (seeking excellence in at least one dimension of learning) would pass three of the four students.

Uniform impact and unique uses are both valid, and usually are both valid for the same program. The challenge for the analyst is to make sure that the assessment approaches are in tune with the program’s goals and performance. If, for example, the program’s goals are strongly “unique uses” then it is inappropriate to employ only “uniform impact” measures, and vice versa.

How can unique uses benefits be assessed?  Most unique uses assessments follow these steps:

  1. Decide which students to assess. All of them? A random sample? A stratified random sample?
  2. Assess the students one at a time. Ask the student what the most important benefit(s) of the program have been for him or her. (At this point, the respondent’s statement should be treated as a hypothesis, not a proven fact.) This hypothesis about benefits can also be created or fine-tuned by asking the instructor(s), peers, or job supervisors about the program’s benefits for that student.
  3. Gather data bearing on this hypothesis. If the student said that the program helped her get a job, what data might help you decide whether to believe the assertion?  (For example, did the student really get a job? If the student said that certain skills learned in the program were important in getting the job, did the interviewer notice those skills?)  If appropriate, assess the benefit for the student (for example, if the benefit is a skill, assess how skilled the student is).
  4. If appropriate, quantify the benefit for that student. Panels of expert judges are sometimes useful for this purpose. Their expertise may come from their experience with programs of this type.  (This is exactly what teachers do when they grade essays.)
  5. Identify patterns of benefits.  Was each student completely unique? Or, more likely, did certain types of students seem to benefit in similar ways? These findings about patterns of benefit may suggest ways in which the program can be improved. For example, suppose program faculty consider “learning how to learn” to be only a minor goal of the program. But 50% of their graduates report that “learning how to learn” was the single most important benefit of taking the program. In that case, the faculty might want to put more resources into “learning how to learn” in the future.
  6. Synthesize data from the sample of students in order to evaluate the program’s success. (A minimal sketch of how steps 4-6 might be tallied follows this list.)
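To make steps 4-6 concrete, here is a minimal sketch, in Python, of one way per-student records might be tallied to reveal patterns of benefit. The data layout, field names, and rating scale are hypothetical illustrations, not part of any Flashlight tool.

```python
from collections import Counter

# Hypothetical per-student records from steps 2-4: each student's main
# reported benefit, whether the evidence bore it out (step 3), and an
# expert-panel rating of the size of the benefit (step 4, 0-5 scale).
records = [
    {"student": "S1", "benefit": "persuasive writing", "confirmed": True,  "rating": 4},
    {"student": "S2", "benefit": "love of poetry",     "confirmed": True,  "rating": 5},
    {"student": "S3", "benefit": "love of literature", "confirmed": True,  "rating": 3},
    {"student": "S4", "benefit": "none reported",      "confirmed": False, "rating": 0},
]

# Step 5: look for patterns. Which benefits recur, and how often?
confirmed = [r for r in records if r["confirmed"]]
pattern = Counter(r["benefit"] for r in confirmed)
for benefit, count in pattern.most_common():
    print(f"{benefit}: {count} of {len(records)} students")

# Step 6: one simple synthesis. What share of the sample had a confirmed
# benefit, and what was the mean panel rating among those students?
share = len(confirmed) / len(records)
mean_rating = sum(r["rating"] for r in confirmed) / len(confirmed)
print(f"Confirmed benefit for {share:.0%} of sample; mean rating {mean_rating:.1f}")
```

In a real study the tallying is the easy part; the judgment lies in steps 2-4, where each student's hypothesis about benefits is tested against evidence.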


3. Additional Defining Questions About Benefits

Here are some additional questions to ask yourself before you begin assessing benefits.

Outcomes or Value-Added? When studying benefits, are you interested in outcomes (the state of things after the student completes the program) or in value-added (how much the student’s understanding grew between the beginning of the program and the end)? Outcomes can often be improved simply by recruiting more skilled incoming students, while value-added is more a result of the education itself.
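As a minimal illustration of the distinction, here is a sketch with invented scores; a real study would need comparable pre-tests and post-tests.

```python
# Hypothetical matched pre/post test scores for the same three students.
pre  = {"S1": 55, "S2": 80, "S3": 40}
post = {"S1": 75, "S2": 85, "S3": 70}

# Outcome: the state of things at the end of the program.
mean_outcome = sum(post.values()) / len(post)

# Value-added: growth from entry to exit, per student and on average.
gains = {s: post[s] - pre[s] for s in pre}
mean_gain = sum(gains.values()) / len(gains)

print(f"Mean outcome: {mean_outcome:.1f}; mean value-added: {mean_gain:.1f}")
# A program that recruits stronger entrants can raise mean_outcome
# without raising mean_gain, which is why the two measures can disagree.
```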

Competitors or Criteria? It often makes sense to define value by comparing the target triad (the combination of technology and activity under study) with its most realistic competitor: if you didn't do it this way, what would you do instead? The answer is only occasionally "nothing." More often there would be some alternative triad (a different activity and/or alternative technologies). For example, imagine an institution evaluating the pilot test of an online calendaring system. The evaluation might compare an alternate online system and also a paper-based, ad hoc calendaring system, looking at the benefits and costs of each of those three options. Criterion-based evaluations, in contrast, set a standard of acceptable benefits and ask whether the technology/activity combination achieved that level of benefit. Similar criteria might be set for costs. So the evaluation might be designed to answer the question, "Did this intervention improve retention at least 10% at a cost of less than $100/student retained?"
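To show what a criterion-based design can look like in practice, here is a minimal sketch based on the example question above. Reading the 10% threshold as a relative gain in retention is an assumption, and all names and numbers are illustrative.

```python
def meets_criteria(retention_before, retention_after,
                   total_cost, extra_students_retained):
    """Example criteria: retention improved by at least 10% (relative),
    at a cost under $100 per additional student retained."""
    improved_enough = retention_after >= 1.10 * retention_before
    cheap_enough = (extra_students_retained > 0
                    and total_cost / extra_students_retained < 100)
    return improved_enough and cheap_enough

# Illustrative numbers: retention rises from 70% to 78%, and the
# intervention costs $6,000 while retaining 65 additional students.
print(meets_criteria(0.70, 0.78, 6000, 65))  # True: ~11% gain, ~$92/student
```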

When is “after”? Imagine two programs about literature: A and B. Program A teaches a thousand facts about novels that can be easily memorized but that are quickly forgotten soon after taking the final exam. In contrast, Program B teaches students to love novels so that they continue reading and rereading books after the course ends. Program B also encourages students to join or organize book clubs so that they can talk with friends about the books they’ve been reading. Program B’s students finish with less factual knowledge than students from Program A but, over the years, Program B graduates become increasingly knowledgeable about literature. An exam taken immediately after the completion of the two programs might show higher scores for graduates of Program A. But in another exam, given three months later, Program B’s students might outscore Program A’s. Two years later, the advantage of Program B over Program A might be even larger. There are many factors to consider in deciding when to assess benefits. The purpose of the program is one of those considerations.

Same Outcomes, or Just Similar? When comparing learning outcomes of Programs A and B, ask whether the two programs are trying to teach exactly the same things. If they are, comparing benefits is easier: use the same assessment measure for both programs. That’s the assumption that many people make about assessment: that the most fair and appropriate approach is to use the same test of outcomes on the two competing programs.

But that equivalence of goals is rare, especially when technologies are used differently. Instead the two programs usually have goals that only overlap, as shown in Figure 3.

Imagine that Program A is taught mainly via lecture in a classroom.  The competition, Program B, uses videotapes of that faculty member’s lectures supported by an online seminar that is led by an adjunct staff member.  Goals distinctive to Program A include benefits of face-to-face contact with a tenured faculty member. Goals distinctive to Program B might include benefits of greater student freedom to explore topics of individual interest, greater in-depth exploration of certain topics in the online seminar, and learning how to collaborate online with other students.  A study of benefits that only attended to the common goals (learning of course content, for example) would miss some of the major reasons for choosing one program over the other. In cases such as these it’s important to assess all the important goals, not just those that are common to the competing programs.

 

4. Several Categories of Benefits and How to Assess Each of Them

There are many categories of benefit from technology use for education, including:

  A. Enrollment and attrition (access to education)
  B. Better outcomes on traditional goals (teaching-learning effectiveness)
  C. New outcomes not previously sought or emphasized (e.g., computer-dependent aspects of disciplines such as geographic information systems in geography)
  D. Variety of offerings available to each learner
  E. Controlling net costs and stress; increasing net revenue
  F. Consequences of A, B, and C for the graduate (e.g., employment)
  G. Consequences of A, B, and C for the community in its economic, social, spiritual, and political life
  H. Consequences of gains in personal and program efficiency (e.g., writing more because it’s easier to use a word processor than a typewriter)
  I. Helping the institution attract and retain students and staff who expect a certain degree of technology access
  J. Helping the institution attract and retain support from outside constituencies who expect to see a certain level of technological infrastructure

This chapter focuses on methods for analyzing benefits A through E.

 

A. Access benefits

Some programs are designed to produce gains in access to education: people who couldn’t otherwise have taken courses of this type; people who can now take more courses; people who would have been less likely to pass such courses. 

The uniform impact perspective usually invites attention to changes in total enrollment and retention, either for all learners (total enrollment) or for a particular target group (e.g., students of color). Assessing changes in enrollment obviously requires counting students (not as easy as it sounds) and, sometimes, getting data to indicate why they are enrolled. For example, evaluators of distance learning programs need to know not only how many students are enrolled but also how many of those course enrollments would have occurred even without the distance learning program.
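Here is a minimal sketch of that "net new enrollment" arithmetic, assuming the program surveys its enrollees about whether they would have enrolled anyway; the data layout and field name are hypothetical.

```python
# Hypothetical survey of distance-learning enrollees: would this
# enrollment have occurred even without the program?
enrollments = [
    {"student": "S1", "would_have_enrolled_anyway": False},
    {"student": "S2", "would_have_enrolled_anyway": True},
    {"student": "S3", "would_have_enrolled_anyway": False},
]

# Count only the enrollments the program can plausibly claim credit for.
net_new = sum(1 for e in enrollments if not e["would_have_enrolled_anyway"])
print(f"{net_new} of {len(enrollments)} enrollments attributable to the program")
```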

The unique uses perspective raises the question of whether particular types of students are especially aided or impeded by program features.  For example, do online programs tend to attract and retain students who are more comfortable in that environment than in a face-to-face class?

It’s important to look at these unique uses issues in enrollment and retention. Historically, changes in educational structures have opened access for some groups while restricting access for others (Ehrmann, 1999a). The analyst and the policy maker need to consider whether the net change is positive, whether the groups who benefit especially need that benefit, and whether the groups that are impeded are the same groups that past arrangements have already excluded.

 

B. Better Outcomes on Traditional Goals

In this situation, the goals of the two competing programs are the same.

In a uniform impact assessment, it’s appropriate to use objective tests of performance for students from Programs A and B. A high degree of skill is often needed to design objective tests, but only a little skill is needed to “grade” the results: how much time did the student take to finish the task? Did the project designed by the engineering student actually function? How many questions were answered correctly? Rubrics are one of several useful tools for assessing outcomes.
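As a tiny illustration of why grading objective tests needs little skill once the test is designed, here is a sketch in which responses are simply compared with an answer key; both the key and the responses are hypothetical.

```python
# Hypothetical answer key and one student's responses.
answer_key = ["b", "d", "a", "c", "b"]
responses  = ["b", "d", "c", "c", "b"]

# "Grading" is a mechanical comparison, question by question.
correct = sum(r == k for r, k in zip(responses, answer_key))
print(f"{correct} of {len(answer_key)} correct")  # prints "4 of 5 correct"
```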

One sign that a unique uses perspective is important for assessment is that there is more than one way to define “successful learning.” In that case, a high degree of expertise is usually needed to assess and grade student work, e.g., evaluating an essay or term paper, or judging a student project. Rubrics can also be used when employing a unique uses perspective.

Note: Flashlight tools such as the Current Student Inventory and Flashlight Online contain many questions related to Chickering and Gamson's "Seven Principles of Good Practice in Undergraduate Education" (families of activity such as improving faculty-student contact, active learning, and time on task). Our tools focus in part on these activities because research shows that they tend to improve learning outcomes. So, especially if your study asks whether outcomes on traditional goals are improving, you may find many useful items in these tools for assessing whether your uses of technology have contributed to those improvements.

 

C. New Outcomes, Better Outcomes?

Computers are often used in order to change the goals of instruction: a new course of study in e-business or computer music; education in how to solve problems in a virtual team; an increased emphasis on complex problem solving and abstract thinking in a course where computers can now handle the skills that once required memorization of rote problem-solving methods. So part of the value comes from outcomes that are unique to one program or the other. This brings us back to the challenge of comparing programs whose goals are at least somewhat different (Figure 3) or even wholly different.

In these cases, Programs A and B use different projects and tests to assess student learning. Even if we discover that students in Program A scored 5 points higher on test A than students in Program B did on test B, that tells us nothing about which program is more valuable. What about giving students in both programs a test that includes everything in both Programs A and B? Testing students on something they weren’t taught often leads to rebellion.

There are at least two feasible ways to assess learning outcomes in programs with different goals.

Criterion-based assessment: It is sometimes possible to assess learning against a standard. Program A is teaching pilots to fly airplanes while Program B is teaching students to ride bicycles. Program A’s students all learn to fly, while Program B teaches only half its students to ride a bicycle without falling over. In that sense Program A is more successful than Program B, even though different tests have been used.

But that kind of comparison doesn’t deal with the value of teaching people to be pilots versus bicycle riders, and that’s a tough question. The second approach is a panel of expert judges: suppose advocates of Program A and Program B could agree on such a panel to assess their programs. Those judges would be given materials describing the programs’ goals and teaching methods, the tests and projects used to assess student learning, and the results of the assessments (test scores, student projects). Using these materials, the judges could then compare the two programs. For example, suppose a disciplinary association in graphic arts were considering two ways of teaching, one of which was more technology-intensive than the other. A panel of employers and graduate school representatives might examine data about entering students, the curricula, tests, and artwork from seniors. The panel would then report on which program they preferred, and why.

 

D. Variety of Offerings Available to Learners

Education is being transformed by our uses of technology (e.g., Ehrmann, 1999a).  One benefit of that change is the variety of offerings, learning resources, experts and peers that are potentially available to each learner.  How might the analyst assess the value of this variety – both what’s offered, and what’s actually used?

The uniform impact perspective treats all learners and potential learners as equal. For example, in comparing Programs A and B, the analyst might ask how many sources of information are used by students doing research papers. In comparing a virtual university to a campus-based institution, the analyst might compare the ways and places where faculty members were educated: does the virtual institution offer a more varied set of teachers than the campus?

The unique uses perspective focuses on the different experiences of each learner. It tends to direct attention toward the ways in which different types of students exploit the available resources. Perhaps a unique uses evaluation would conclude that Virtual University A, with its flexibility and ability to reach out for resources, fostered a greater variety of student learning than did Campus B, whose students learned more in lock step, using similar academic resources for similar purposes.

 

E. Controlling Net Costs and Stress; Increasing Net Revenue

The outcomes of technology use in the cost dimension are not limited to the cost of technology, or even the costs (some of them hidden) of supporting its use. Technology rarely produces educational benefits of the types discussed in sections A-D unless the program is reorganized. And the reorganized program may cost more, or less, than its predecessor to operate. Because higher education is a labor-intensive process, these costs consist partly of the ways that people use their time. So the analysis of cost outcomes and the analysis of work outcomes (e.g., from fulfillment to stress or even burnout) are inseparable: controlling costs ideally includes reducing those uses of time that are aversive while maintaining or even increasing those uses of time that are fulfilling and that advance the program's objectives. For more on how to conduct this kind of analysis, see the Flashlight Cost Analysis Handbook (one of the other benefits institutions receive as part of their subscription).

 

5. Summary

Before designing the particular instruments for studying outcomes, you will need to answer at least three defining questions:

a) Is the program mainly trying to attain the same benefits for all learners (uniform impacts)? Or is the program also designed to help each learner make unique use of its opportunities? Most college and university programs have both goals, and each set of outcomes needs to be assessed differently. In particular, when studying unique uses, one needs to assess each student in the sample separately and then afterward synthesize these assessments in order to evaluate the program.

b) Is the study going to consider educational value-added (students at the end of the program contrasted with students at the beginning) or only outcomes? If value-added is to be evaluated, then some kind of pre-test is necessary.

c) Is the study going to measure benefits as the program is concluding (e.g., a final examination), and/or some time after the program ends (e.g., at a time when students would actually be making use of what they learned in the program)? During this waiting time, some knowledge and skill will diminish while other educational outcomes may improve (if the student continues to use them).

 

6. References

Ehrmann, Stephen C. (1999a), "Access and/or Quality: Redefining Choices in the Third Revolution," Educom Review, September, pp. 24-27, 50-51. On the Web at http://www.tltgroup.org/resources/or%20quality.htm

Ehrmann, Stephen C. (1999b), "What Outcomes Assessment Misses," in Architecture for Change: Information as Foundation. Washington, DC: American Association for Higher Education. On the Web at http://www.tltgroup.org/programs/outcomes.html  

 

Figures

(The figure images are not reproduced here. As described in the text: Figure 1 shows two learners' growth as parallel arrows, with arrow length representing amount of growth; Figure 2 shows learners in the same course benefiting in qualitatively different directions; Figure 3 shows two competing programs whose goals only partly overlap.)
Table 1. Two Complementary Perspectives on Education

Purpose of education
  Uniform impact: Produce certain (observable) outcomes for all beneficiaries.
  Unique uses: Help each person learn something valuable.

Role of the student
  Uniform impact: Object, who is impacted by the intervention.
  Unique uses: Subject, who makes use of the intervention (educational opportunity).

The best improvements in education
  Uniform impact: Those whose outcomes are replicable, so past excellence portends identical excellence in the future, even in different settings.
  Unique uses: Those that can consistently produce excellent, creative outcomes, where different settings stimulate new kinds of excellence.

Variation among beneficiaries
  Uniform impact: Quantitative.
  Unique uses: Qualitative and quantitative.

Most important outcomes
  Uniform impact: The objectives that the educator used in planning the program.
  Unique uses: The outcomes that turn out to be most important for each subject.

How to assess learning
  Uniform impact: (1) Ask the educator to state goals in advance; (2) create assessment procedures that can measure progress toward those goals.
  Unique uses: (1) Observe user and educator goals; (2) gather data about subjects' achievements and problems; (3) choose "connoisseurs" to assess what has happened to each learner, and then to evaluate the meaning of the collective experience.

"Did the intervention cause the outcome?" How can you tell?
  Uniform impact: Statistics, using control groups.
  Unique uses: Historical analysis of a chain of events for each subject; control groups are less important.

Quantitative and qualitative data: which method uses which data?
  Uniform impact: Use both.
  Unique uses: Use both.


Note: This essay is adapted from a chapter of the same name, written by the author, for the second edition of the Technology Costing Methodology Handbook, published by the Western Cooperative for Educational Technology (forthcoming).
