Contents
- Introduction
- Is Everyone Supposed to Benefit in the Same Way?
  - Yes, to some degree: Uniform Impacts
  - No, to some degree: Unique Uses
- Additional Defining Questions About Benefits
  - Outcome or Value-Added?
  - Competitors or Criteria?
  - When is "After"?
  - Same Outcome? or Just Similar?
- Several Categories of Benefits and How to Assess Each of Them
  - Access (how many people can learn? what kinds of people?)
  - Better Outcomes on Traditional Goals
  - New Outcomes, Better Outcomes?
  - Variety of Offerings Available to Learners
  - Net Cost and Stress; Revenues
- Summary of Chapter
Introduction
Imagine that we want to study the
benefits of two types of technology-supported activity (a
course, major, service, or an institution's effort to
produce an outcome such as good writing skill or high
employment rates for graduates) in order to decide which of
the two activities produces better outcomes. We'll refer to
these competing activities as Program A and Program B. A and
B might be different versions of the same program ('before
an innovation' versus 'with the innovation'), a comparison
of two competing pilot programs, or a comparison of a real
activity with a hypothetical alternative, for example.
Let’s simplify the outcomes assessment
problem a bit by assuming that the educational benefits of
interest are who was able to learn from the program (what
types of people, how many people, etc.), what they learned,
and the consequences of 'who learned what.'
This chapter of the Flashlight
Evaluation Handbook explores three key questions that
you would need to answer in order to design such a study.
- Are the program’s outcomes
intended to be the same for all its beneficiaries? If
not, how can you assess them?
- To help design assessment
procedures, how can we be more specific than merely
saying that the technology is meant to cause
“better educational outcomes?”
- What kinds of data about
benefits might help the people running the program to
improve those benefits (paralleling the way that
activity based cost data ought to be able to help policy
makers control costs)?
Is Everyone Supposed to Benefit in the Same Way?
What’s a typical example of the kind of
outcome goal that ought to be measured? “All students should
learn to think critically (though perhaps to different
degrees of skill).” “All students should get jobs (perhaps
at different salaries).” In other words,
the goals assume that everyone is supposed to benefit in the
same ways. If that were true, it would
certainly make things simpler to measure – the analyst could
devise one test of achievement of benefit (e.g., a test of
critical thinking skill) and apply it to all the
beneficiaries. But what if some students
are gaining in critical thinking while others are mainly
improving their creativity and still others are gaining in
interpersonal skills?
As those examples indicate, there are
two ways to look at almost any educational program.
One perspective focuses on program benefits that are
the same for everyone (“uniform impacts”) while the other
perspective focuses on benefits that are qualitatively different
and somewhat unpredictable for each learner (“unique uses”)
(Balestri, Ehrmann, et al., 1986; Ehrmann and Zúñiga, 1997,
2002). This section of the chapter
explains these complementary perspectives on education. The
following section will use these ideas to suggest ways to
assess specific types of benefits.
Yes, to some degree: Uniform Impacts
To some degree, all students in an
educational program are supposed to learn the same things.
As shown in Figure 1, such
learning by two people can be represented by two parallel
arrows. The length of each person’s arrow represents the
amount of growth during (and sometimes after) the program.
Students usually enter a program with differing
levels of knowledge, grow to differing degrees, and leave
with differing levels of achievement. The uniform impact
perspective assumes that the desired direction of growth is
the same for all students.
In an English course, for example,
uniform impact assessment might measure student
understanding of subject-verb agreement, or skill in writing
a five-paragraph essay, or even love of the novels of Jane
Austen. The analyst picks one or more
such dimensions of learning and then assesses all learners
using the same test(s). I’ve labeled
this perspective “uniform impact” because it assumes that
the purpose of the program is to benefit all learners in the
same, predesigned way.
No, to some degree: Unique Uses
However, that same English course (or
other educational activity) can also be assessed by asking
how each learner benefited the most, no matter what
that benefit might have been. I’ve
termed this perspective “unique uses” because it assumes
that each student is a user of the program and that, as
unique human beings, learners each make somewhat different
and somewhat unpredictable uses of the opportunities that
the program provides.
In that English course, for example,
one student may fall in love with poetry, while another
gains clarity in persuasive writing, and a third falls in
love with literature, and a fourth doesn’t benefit much at
all. (See Figure 2.)
Faculty members cope with this kind of
diversity all the time. An instructor may give three
students each an “A” but award the “A” for a different
reason in each case. The only common denominator is some
form of excellence or major growth that relates to the
general aims of the course. There are
multiple possibilities for growth and it’s likely that
different students will grow in different directions.
Notice that uniform impact methods tend
to miss a lot when benefits are better described in unique
uses terms. In that English class for example, imagine that
the instructor had decided to grade all students only on
poetry skills. One student would pass and the others would
fail. Or imagine that the instructor tested all students on
poetry, persuasive writing, and love of literature, and only
passed students who did well on all three tests: everyone
would fail the course. Meanwhile, an
instructor using a unique uses approach (seeking excellence
in at least one dimension of learning) would pass three of
the four students.
The uniform impact and unique uses perspectives are both valid, and usually both apply to the same program. The challenge for the analyst is to
make sure that the assessment approaches are in tune with
the program’s goals and performance. If, for example, the
program’s goals are strongly “unique uses” then it is
inappropriate to employ only “uniform impact” measures, and
vice versa.
How can unique uses benefits be
assessed? Most unique uses assessments
follow these steps:
- Decide which students to
assess. All of them? A random sample? A stratified
random sample?
- Assess the students one at a
time. Ask the student what the most important benefit(s)
of the program have been for him or her. (At this point,
the respondent’s statement should be treated as a
hypothesis, not a proven fact.) This hypothesis about
benefits can also be created or fine-tuned by asking the
instructor(s), peers, or job supervisors about the
program’s benefits for that student.
- Gather data bearing on this
hypothesis. If the student said that the program helped
her get a job, what data might help you decide whether
to believe the assertion? (For
example, did the student really get a job? If the
student said that certain skills learned in the program
were important in getting the job, did the interviewer
notice those skills?) If
appropriate, assess the benefit for the student (for
example, if the benefit is a skill, assess how skilled
the student is).
- If appropriate, quantify the
benefit for that student. Panels of expert judges are
sometimes useful for this purpose. Their expertise may
come from their experience with programs of this type.
(This is exactly what teachers do when they grade
essays.)
- Identify patterns of
benefits. Was each student
completely unique? Or, more likely, did certain types of
students seem to benefit in similar ways? These findings
about patterns of benefit may suggest ways in which the
program can be improved. For example, suppose program
faculty consider “learning how to learn” to be only a
minor goal of the program. But 50% of their graduates
report that “learning how to learn” was the single most
important benefit of taking the program. In that case,
the faculty might want to put more resources into
“learning how to learn” in the future.
- Synthesize data from the
sample of students in order to evaluate the program’s
success.
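For analysts who keep interview results in a spreadsheet or a simple script, the following minimal sketch (in Python) illustrates two of those steps: drawing a stratified random sample of students, and then tallying the primary benefit each assessed student was coded as gaining in order to look for patterns. The field names (student_id, stratum, primary_benefit) and the records are hypothetical; any real coding scheme would have to be developed for your own program.

```python
import random
from collections import Counter

def stratified_sample(students, stratum_key, per_stratum, seed=0):
    """Draw an equal-sized random sample of students from each stratum."""
    rng = random.Random(seed)
    by_stratum = {}
    for s in students:
        by_stratum.setdefault(s[stratum_key], []).append(s)
    sample = []
    for group in by_stratum.values():
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample

def benefit_patterns(assessed_students):
    """Tally the primary benefit coded for each assessed student."""
    return Counter(s["primary_benefit"] for s in assessed_students)

# Hypothetical records, one per interviewed student.
students = [
    {"student_id": 1, "stratum": "full-time", "primary_benefit": "learning how to learn"},
    {"student_id": 2, "stratum": "part-time", "primary_benefit": "persuasive writing"},
    {"student_id": 3, "stratum": "full-time", "primary_benefit": "learning how to learn"},
    {"student_id": 4, "stratum": "part-time", "primary_benefit": "learning how to learn"},
]

sample = stratified_sample(students, "stratum", per_stratum=2)
print(benefit_patterns(sample))  # e.g., Counter({'learning how to learn': ..., ...})
```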
Additional Defining Questions About Benefits
Here are some additional questions to
ask yourself before you begin assessing benefits.
Outcomes or Value-Added? When studying benefits, are
you interested in outcomes (the state of things after the
student completes the program) or in value-added (how much
did the student's understanding of, say, mathematics improve
from the beginning of the course to the end)? Outcomes can
often be improved simply by recruiting more skilled incoming
students, while value-added is more directly a result of the
education itself.
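To make the distinction concrete, here is a minimal sketch, with invented scores, contrasting an outcome comparison (the mean post-test score) with a value-added comparison (the mean gain from pre-test to post-test).

```python
def mean(xs):
    return sum(xs) / len(xs)

# Invented pre-test and post-test scores for two programs, on the same 0-100 scale.
program_a = {"pre": [70, 75, 80], "post": [78, 82, 86]}  # better-prepared entrants
program_b = {"pre": [50, 55, 60], "post": [66, 72, 78]}

for name, scores in [("A", program_a), ("B", program_b)]:
    outcome = mean(scores["post"])                             # state of things at the end
    value_added = mean(scores["post"]) - mean(scores["pre"])   # growth during the program
    print(f"Program {name}: outcome = {outcome:.1f}, value-added = {value_added:.1f}")

# Program A ends with the higher outcomes, but Program B produces the larger value-added.
```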
Competitors or Criteria? It often makes sense to
define value by comparing the target triad with its most
realistic competitor: if you didn't do it this way, what
would you do instead? The answer is only occasionally
"nothing." More often there would be some alternative triad
(a different activity and/or alternative technologies). For
example, imagine an institution evaluating the pilot test of
an online calendaring system. The evaluation might compare it
with an alternative online system and also with a paper-based,
ad hoc calendaring system, looking at the benefits and costs
of each of those three options.
Criterion-based evaluations, by contrast, set
a standard of acceptable benefits and ask whether the
technology/activity combination achieved that level of
benefit. Similar criteria might be set for costs. So the
evaluation might be designed to answer the question, "Did
this intervention improve retention by at least 10% at a cost
of less than $100 per student retained?"
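A criterion of that kind reduces to simple arithmetic once the data are in hand. The sketch below, using invented figures (and reading "10%" as a relative improvement in the retention rate), shows one way to check both halves of the criterion.

```python
# Invented figures for a pilot retention intervention.
baseline_retention = 0.70   # retention rate before the intervention
new_retention = 0.78        # retention rate with the intervention
cohort_size = 500           # students in the cohort
program_cost = 3_500        # total cost of the intervention, in dollars

improvement = (new_retention - baseline_retention) / baseline_retention
extra_students_retained = (new_retention - baseline_retention) * cohort_size
cost_per_student_retained = program_cost / extra_students_retained

meets_criterion = improvement >= 0.10 and cost_per_student_retained < 100
print(f"Retention improved {improvement:.0%}; "
      f"${cost_per_student_retained:.0f} per additional student retained; "
      f"criterion met: {meets_criterion}")
```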
When is
“after”? Imagine two programs
about literature: A and B. Program A teaches a thousand
facts about novels that can be easily memorized but that are
quickly forgotten soon after taking the final exam. In
contrast, Program B teaches students to love novels so that
they continue reading and rereading books after the course
ends. Program B also encourages students to join or organize
book clubs so that they can talk with friends about the
books they’ve been reading. Program B’s
students finish with less factual knowledge than students
from Program A but, over the years, Program B graduates
become increasingly knowledgeable about literature. An exam
taken immediately after the completion of the two programs
might show higher scores for graduates of Program A.
But in another exam, given three months later,
Program B’s students might outscore Program A’s.
Two years later, the advantage of Program B over
Program A might be even larger. There
are many factors to consider in deciding when to assess
benefits. The purpose of the program is one of those
considerations.
Same Outcomes, or
Just Similar? When comparing learning outcomes of
Programs A and B, ask whether the two programs are trying to
teach exactly the same things. If they are, comparing
benefits is easier: use the same assessment measure for both
programs. That’s the assumption that
many people make about assessment: the fairest and most
appropriate approach is to use the same test of outcomes
on the two competing programs.
But that equivalence of goals is rare,
especially when technologies are used differently. Instead,
the two programs usually have goals that only partially
overlap, as shown in Figure 3.
Imagine that Program A is taught mainly
via lecture in a classroom. The
competition, Program B, uses videotapes of that faculty
member’s lectures supported by an online seminar that is led
by an adjunct staff member. Goals
distinctive to Program A include benefits of face-to-face
contact with a tenured faculty member. Goals distinctive to
Program B might include benefits of greater student freedom
to explore topics of individual interest, greater in-depth
exploration of certain topics in the online seminar, and
learning how to collaborate online with other students.
A study of benefits that only attended to the common
goals (learning of course content, for example) would miss
some of the major reasons for choosing one program over the
other. In cases such as these it’s important to assess all
the important goals, not just those that are common to the
competing programs.
Several Categories of Benefits and How to Assess Each of Them
There are many categories of benefit
from technology use for education, including:
- A. Enrollment and attrition (access to education)
- B. Better outcomes on traditional goals (teaching-learning effectiveness)
- C. New outcomes not previously sought or emphasized (e.g., computer-dependent aspects of disciplines such as geographic information systems in geography)
- D. Variety of offerings available to each learner
- E. Controlling net costs and stress; increasing net revenue
- F. Consequences of A, B, and C for the graduate (e.g., employment)
- G. Consequences of A, B, and C for the community in its economic, social, spiritual, and political life
- H. Consequences of gains in personal and program efficiency (e.g., writing more because it's easier to use a word processor than a typewriter)
- I. Helping the institution attract and retain students and staff who expect a certain degree of technology access
- J. Helping the institution attract and retain support from outside constituencies who expect to see a certain level of technological infrastructure
This chapter focuses on methods for
analyzing benefits A, B, C, and D.
Access (how many people can learn? what kinds of people?)
Some programs are designed to produce
gains in access to education: people who couldn’t otherwise
have taken courses of this type; people who can now take
more courses; people who would have been less likely to pass
such courses.
The uniform impact perspective usually
invites attention to changes in enrollment and retention,
either for all learners or for a particular target group
(e.g., students of color).
To assess changes in enrollment obviously requires
counting students (not as easy as it sounds) and, sometimes,
getting data to indicate why they are enrolled. For example,
evaluators of distance learning programs need to know not
only how many students are enrolled but also how many of
those course enrollments would have occurred even without
the distance learning program.
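Where such survey data exist, the adjustment can be as simple as the sketch below: discounting total enrollments by the share of surveyed students who say they would have enrolled even without the distance learning option. The numbers and the survey question are hypothetical.

```python
# Hypothetical data for one online course offering.
total_enrollments = 420
survey_responses = 150       # students answering the "would you have enrolled anyway?" question
would_enroll_anyway = 90     # students answering "yes"

share_anyway = would_enroll_anyway / survey_responses
estimated_new_enrollments = total_enrollments * (1 - share_anyway)
print(f"Roughly {estimated_new_enrollments:.0f} of {total_enrollments} enrollments "
      "appear attributable to the distance learning option.")
```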
The unique uses perspective raises the
question of whether particular types of students are
especially aided or impeded by program features.
For example, do online programs tend to attract and
retain students who are more comfortable in that environment
than in a face-to-face class?
It’s important to look at these unique
uses issues in enrollment and retention.
Historically, changes in educational structures have opened
access for some groups while restricting access for others
(Ehrmann, 1999a). The analyst and the
policy maker need to deal with whether the net change is
positive, whether the groups who benefit especially need
that benefit, and whether the groups that are impeded are
groups that have been excluded by past arrangements as well.
Better Outcomes on Traditional Goals
In this situation, the goals of the two competing programs are the same.
In a uniform impact assessment, it’s
appropriate to use objective tests of student performance
with students from Programs A and B. A high
degree of skill is often needed to design objective tests,
but only a low amount of skill is needed to “grade” the
results: how much time did the student take to finish the
task? Did the project designed by the engineering student
actually function? How many questions were answered
correctly?
Rubrics are one of several useful tools for assessing
outcomes.
One sign that a unique uses perspective
is important for assessment is that there is more than one
way to define “successful learning.” In that case, a high degree of
expertise is usually needed to assess and grade student
work, e.g., evaluating an essay or term paper, judging a
student project.
Rubrics can
also be used when employing a unique uses perspective.
Note: Flashlight tools such as the
Current Student Inventory
and
Flashlight
Online contain many questions related to Chickering and
Gamson's "Seven Principles of
Good Practice in Undergraduate Education" (families of
activity such as improving faculty-student contact, active
learning, and time on task). Our tools focus in part on
these activities because research shows that they tend to
improve learning outcomes. So, especially if your study is
assessing whether outcomes on traditional goals are
improving, you may find many useful items in these
Flashlight tools for studying whether your uses of
technology have contributed to those improvements.
New Outcomes, Better Outcomes?
Computers are often used in order to
change the goals of instruction: a new course of study in
e-business or computer music; education in how to solve
problems in a virtual team; an increased emphasis on complex
problem solving and abstract thinking in a course where
computers can now handle the skills that once required
memorization of rote problem-solving methods. So part of the
value comes from outcomes that are unique to one program or
the other. This brings us back to the challenge of comparing
programs whose goals are at least somewhat different (Figure
3) or even wholly different.
In these cases, Program A and Program B
use different projects and tests to assess student learning.
Even if we discover that students in Program A scored
5 points higher on test A than students in Program B did on
test B, that tells us nothing about which program is more
valuable. What about giving students in
both programs a test that includes everything taught in both
programs? Testing students on
something they weren’t taught often leads to rebellion.
There are at least two feasible ways to
assess learning outcomes in programs with different goals.
Criterion-based assessment: It is
sometimes possible to assess learning against a standard.
Suppose Program A is teaching pilots to fly airplanes while
Program B is teaching students to ride bicycles. All of
Program A’s students learn to fly, while Program B teaches
only half its students to ride a bicycle without falling
over. In that sense Program A is more successful than
Program B, even though different tests have been used.
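Because each program is judged against its own standard, the comparison boils down to each program's success rate against its own criterion. A trivial sketch, with invented cohort sizes standing in for the example's "all" and "half":

```python
# Each program judged against its own criterion (invented cohort sizes).
programs = {
    "A (flight training)": {"met_criterion": 30, "enrolled": 30},  # all students learn to fly
    "B (bicycle riding)":  {"met_criterion": 20, "enrolled": 40},  # half learn to ride
}

for name, r in programs.items():
    rate = r["met_criterion"] / r["enrolled"]
    print(f"Program {name}: {rate:.0%} of students met the program's own criterion")
```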
But that kind of comparison doesn’t
deal with the value of teaching people to be pilots versus
bicycle riders, and that’s a tough question.
Panels of expert judges: Suppose advocates of Program A and
Program B could agree on a panel of expert judges to assess
their programs. Those judges would be given
materials describing the programs’ goals and teaching
methods, the tests and projects used to assess student
learning, and the results of the assessments (test scores,
student projects). Using these
materials, the judges could then compare the two programs.
For example, suppose a disciplinary association in
graphic arts was considering two ways of teaching, one of
which was more technology-intensive than the other. A panel
of employers and graduate school representatives might
examine data about entering students, the curricula, tests,
and artwork from seniors. The panel
would then report on which program it preferred, and why.
Variety of Offerings Available to Learners
Education is being transformed by our
uses of technology (e.g., Ehrmann, 1999a).
One benefit of that change is the variety of
offerings, learning resources, experts and peers that are
potentially available to each learner.
How might the analyst assess the value of this variety –
both what’s offered, and what’s actually used?
The uniform impact perspective treats
all learners and potential learners as equal. For example,
in comparing Program A and B, the analyst might ask how many
sources of information are used by students doing research
papers. In comparing a virtual
university to a campus-based institution, the analyst might
compare the ways and places where faculty members were
educated: does the virtual institution offer a more varied
set of teachers than the campus?
The unique uses perspective focuses on
the different experiences of each learner. It tends to
direct attention toward the ways in which different types of
students exploit the available resources.
Perhaps a unique uses evaluation would conclude that
Virtual University A, thanks to its flexibility and its
ability to reach out for resources, fostered a greater
variety of student learning than did Campus B, whose students
learned more in lock step, using similar academic resources
for similar purposes.
Net Cost and Stress; Revenues
The outcomes of technology use in the cost dimension are
not limited to the cost of technology, or even the costs
(some of them hidden) of supporting its use. Technology
rarely produces educational benefits of the types discussed
in sections A-D unless the program is reorganized. And the
reorganized program may cost more, or less, than its
predecessor to operate. Because higher education is a
labor-intensive process, these costs consist partly of the
ways that people use their time. So the analysis of cost
outcomes and
the analysis of work outcomes (e.g., from fulfillment to
stress or even burnout) are inseparable: controlling costs
ideally includes reducing those uses of time that are
aversive while maintaining or even increasing those uses of
time that are fulfilling and that advance the program's
objectives. For more on how to conduct this kind of
analysis, see the
Flashlight Cost Analysis Handbook (one of the other
benefits institutions receive as part of their
subscription).
Summary of Chapter
Before designing the particular
instruments for studying outcomes, you will need to answer
at least three defining questions:
a) Is the program mainly trying to attain the same
benefits for all learners (uniform impacts)? Or is the
program also designed to help each learner make unique use
of its opportunities? Most college and university programs
have both goals, and each set of outcomes needs to be
assessed differently. In particular, when studying unique
uses, one needs to assess each student in the sample
separately and then afterward synthesize these assessments
in order to evaluate the program.
b) Is the study going to consider educational
value-added (students at the end of the program contrasted
with students at the beginning) or only outcomes?
If value-added is to be evaluated, then some kind of
pre-test is necessary.
c) Is the study going to measure benefits as the program
is concluding (e.g., final examination), and/or some time
after the program ends (e.g., at a time when students would
actually be making use of what they learned in the program)?
During this waiting time, some knowledge and skill
will diminish while other educational outcomes may improve
(if the student continues to use them).
References
Ehrmann, Stephen C. (1999a), "Access and/or Quality: Redefining Choices in the Third Revolution," Educom Review, September, pp. 24-27, 50-51. On the Web at http://www.tltgroup.org/resources/or%20quality.htm
Ehrmann, Stephen C. (1999b), "What Outcomes Assessment Misses," in Architecture for Change: Information as Foundation. Washington, DC: American Association for Higher Education. On the Web at http://www.tltgroup.org/programs/outcomes.html
Figures
Figure 1. Uniform impact: learning by two people, represented as parallel arrows whose lengths show each person's growth.
Figure 2. Unique uses: different learners in the same program growing in different directions.
Figure 3. The partially overlapping goals of two competing programs.
Table 1. Two Complementary Perspectives on Education

Purpose of education
- Uniform impact: Produce certain (observable) outcomes for all beneficiaries.
- Unique uses: Help each person learn something valuable.

Role of student?
- Uniform impact: Object, who is impacted by the intervention.
- Unique uses: Subject, who makes use of the intervention (educational opportunity).

The best improvements in education?
- Uniform impact: Those improvements whose outcomes are replicable, so past excellence portends identical excellence in the future, even in different settings.
- Unique uses: Those improvements that can consistently produce excellent, creative outcomes, where different settings stimulate new kinds of excellence.

Variation among beneficiaries
- Uniform impact: Quantitative.
- Unique uses: Qualitative and quantitative.

Most important outcomes?
- Uniform impact: The objectives that the educator used in planning the program.
- Unique uses: The outcomes that turn out to be most important for each subject.

How to assess learning
- Uniform impact: 1. Ask the educator to state goals in advance. 2. Create assessment procedures that can measure progress toward those goals.
- Unique uses: 1. Observe user and educator goals. 2. Gather data about subjects' achievements and problems. 3. Choose "connoisseurs" to assess what's happened to each learner, and then to evaluate the meaning of the collective experience.

"Did the intervention cause the outcome?" How can you tell?
- Uniform impact: Statistics, using control groups.
- Unique uses: Historical analysis of a chain of events for each subject; a control group is less important.

Quantitative and qualitative data: which method uses which data?
- Uniform impact: Use both.
- Unique uses: Use both.
Note: This essay is adapted from a chapter of the same name, written by the author, for the second edition of the Technology Costing Methodology Handbook, published by the Western Cooperative for Educational Technology (forthcoming).