These
materials are for use only by institutions that subscribe to
The TLT Group, to participants in TLT Group workshops that
feature this particular material, and
to invited guests. The TLT Group is a non-profit whose
existence is made possible by subscription and registration
fees. If you or your institution are not yet among
our subscribers,
we invite you to
join us, use these materials, help us
continue to improve them, and, through your subscription,
help us develop new materials! If you have questions
about your rights to use, adapt or share these materials,
please ask us (info @ tltgroup.org).
The Traditional Paradigm and its Problems
1. The Project And Its Grant-Funded Evaluation Are Usually Over Before Much Of The Project's Impact On People's Lives Has Begun
2. The Paradigm Implies That "Traditional" Education Is Uniform And Well Understood
3. The Paradigm Implies That We Understand What This Innovation Is And What It's For, In Advance
4. The Paradigm Implies That The Project's Objectives Are The Same For Everyone
5. The Paradigm Implies That The Innovative Program Can Have An Impact On People's Lives, By Itself
6. Relevance of the Flashlight Program
References
References on uniform impacts and unique uses
The Traditional Paradigm and its Problems
It's easy to describe the ideal way to
evaluate a grant to use technology to improve education:
- First, figure out the desired impact on people's lives, what will matter to them (and to the funder, perhaps).
- Then figure out a way to measure that attribute in people.
- When the new software or approach is implemented, do a pretest of that attribute and then measure again after the grant has had its impact on them. At the same time, do a similar pair of tests on comparable people going through traditional education of this sort. Discover, if you're lucky, that the innovative program has resulted in more learning than the traditional method (see the sketch after this list).
- Turn in this hard data to the funder as part of your final report on your grant and disseminate it in the wider world.
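To make the pretest/post-test step concrete, here is a minimal sketch, not drawn from the handbook itself: it assumes hypothetical score data for an innovative section and a traditional (control) section, and compares gain scores with an independent-samples t-test from the scipy library. The scores, group sizes, and choice of test are illustrative assumptions only.

```python
# Minimal sketch of the traditional evaluation paradigm: pretest and
# post-test scores for an "innovative" group and a "traditional" control
# group, compared on gain scores. All numbers are hypothetical.
from scipy import stats

innovative_pre = [62, 58, 71, 65, 60, 68, 55, 63]
innovative_post = [81, 74, 88, 79, 77, 85, 70, 80]
traditional_pre = [61, 64, 59, 70, 57, 66, 62, 60]
traditional_post = [72, 75, 68, 79, 66, 74, 71, 69]

# Gain score for each student: post-test minus pretest.
innovative_gain = [post - pre for pre, post in zip(innovative_pre, innovative_post)]
traditional_gain = [post - pre for pre, post in zip(traditional_pre, traditional_post)]

# Compare mean gains across the two groups with an independent-samples t-test.
t_stat, p_value = stats.ttest_ind(innovative_gain, traditional_gain)

print(f"Mean gain, innovative group:  {sum(innovative_gain) / len(innovative_gain):.1f}")
print(f"Mean gain, traditional group: {sum(traditional_gain) / len(traditional_gain):.1f}")
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

A real design would also need checks that the two groups are comparable and that attrition has not biased the samples; the sketch omits those for brevity.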
That paradigm
sounds simple (although not easy) to do. Unfortunately, a number
of fundamental problems often make that sort of evaluation
impossible or inadequate, and potentially misleading.
In this
essay, we'll consider a few of those difficulties with the
paradigm and what sorts of approaches can complement or replace
it.
1. The Project And Its Grant-Funded
Evaluation Are Usually Over Before Much Of The Project's Impact
On People's Lives Has Begun
Ordinarily, development projects have a
chance to make a trial run or two, but the kinds of ultimate
impacts that funders care about may well not have happened by
the time evaluators must stop collecting data and start writing
reports. Thus, the report usually dwells on problems of
implementation and "feel good" stories about process.
Response:
Extend the grant well beyond the end of the rest of the project.
Or fund another project later on whose chief purpose is
evaluation.
2. The Paradigm Implies That
"Traditional" Education Is Uniform And Well Understood
"Traditional education" as a label covers
even more ground than the label "innovative." Many evaluations
are flawed in that they provide virtually no description of what
was done in the control (traditional) group.
Response:
Be just as searching in the evaluation of the "traditional" as
in the evaluation of the "innovation" and just as descriptive in
the report.
3. The Paradigm Implies That We
Understand What This Innovation Is And What It's For, In Advance
The paradigm implies foreknowledge because
it requires the development of a pretest that students take
before using the innovation (or being used by it) and a parallel
post-test. That's not possible if one isn't clear about what is
to be taught. What's so difficult about that?
A first
problem with this image is the problem of the moving target: the
use of technology usually is associated with a change in the
aims of an assignment, course, or course of study. Statistics
provides just one example of this problem. A couple of decades
ago, it was normal to teach statistics to students who used only
paper and pencil to do assignments: that placed certain limits on the techniques that could be taught. Today's statistics
course (taught to students using advanced calculators or other
computers) not only teaches new statistical techniques that
require such machines; it also teaches new skills of thinking
(e.g., strategies for investigating data by analyzing images
created with the data). To evaluate the impact of computing on
learning over the last couple of decades, one would have needed
to anticipate the new techniques and skills and somehow incorporate them or their analogues in tests given years
earlier. Without that, one is faced with a meaningless
comparison -- students scored an average of 80% on their exams
two decades ago and still average 80% (on fundamentally
different exams). Are they learning "more"? The "same"? "Less"?
The question has no meaning.
Response:
There is no completely satisfactory response. One might:
- Evaluate whether the conditions of learning have improved (e.g., time on task is usually correlated with improved learning outcomes, so has time on task improved?);
- Evaluate the tests themselves (if students averaged, say, a "B" five years ago and average a "B" today, do external evaluators believe that this year's tests and assignments are more up to date and perhaps tougher than five years ago? Or have they been "dumbed down"?).
A second problem with assuming that we understand the innovation in advance is that innovative projects evolve, often quite rapidly.
Innovations change as the implications of the underlying ideas
emerge. They change as the capabilities of new hardware and
software become better understood. They change as external
decisionmakers perceive and react to the innovation.
Response:
It's always a good idea to periodically step back and ask, "What
is this initiative? What is it for? Where is it going?" The
answers may well have changed in surprising ways over a period
of months. A project whose chief objective a year ago was to
help students understand the sociology of organizations may well have acquired a new goal of instilling multicultural insight in
the last twelve months.
A third
problem relating to the identity of the innovation might be
called "rapture of the technology." The evaluator erroneously
assumes that the technology is the innovation. A project
a decade ago provides a dramatic illustration of this mistake.
In the mid 1980s, Brown University began developing Intermedia.
(If you don't happen to remember it, visualize today's World
Wide Web -- in fact something better than today's Web in the
ease of creating links in a shared body of text and graphics --
but existing only on a local area network.) Brown's project was
ambitious: to develop Intermedia and test it by letting faculty
develop course materials in it and then teach in it. Ultimately,
two professors were chosen to participate in the experiment as
development of Intermedia got underway. One of the two was
George Landow, a full professor in the Department of English.
The project
had an unusually large fraction of its grant devoted to
evaluation. Step 1 was for a team of anthropologists to observe
Landow's literature course in its primal state: pre-Intermedia.
Step 2 was for them to observe his whole course again, a year
later, when he was using the prototype Intermedia system. In the
interim, Landow advised on Intermedia development and crafted
his course materials -- a web of essays and graphics on Dickens,
literature, and the Victorian period.
There was a
slip between Step 1 and Step 2, unfortunately: Intermedia fell
behind schedule and was not ready for use in time. The
evaluators were scheduled to observe Landow, however, so they
did, even though he was still teaching without technology. They
were surprised to observe that he had become a much better
teacher: the experience of developing courseware had changed
Landow -- he used more graphics, he had students creating
pictures of the linkages between concepts, and he put more
emphasis on student participation in the seminar. (When they
observed Landow again, a year later when he did have
Intermedia, the course had improved still further.)
The point
here is that Intermedia itself was only part of the innovation:
the full innovation, in Landow's hands at least, was a more
learner-centered, more visual, more associative approach to the
teaching of this literature course. Intermedia was a powerful
tool that helped him implement this approach (Beeman et al., 1988).
Response: Be
systemic in defining the boundaries of the innovation. It might
include not only software but staff (and their skills), changes
in infrastructure, changes in the "market," and so on.
4. The Paradigm Implies That The
Project's Objectives Are The Same For Everyone
There are two ways to look at almost any
educational program, both valid:
- As an effort to impact all learners in the same way (though some learners will almost always achieve more in that direction than others); and
- As a resource that different learners will use differently, with qualitatively different and somewhat unpredictable consequences.
The first
perspective can be called "uniform impact" because it
assumes that the educator is trying to shape learning and to do
so in the same way for all learners. For example, when an
English faculty member grades students on grammar, a uniform
impact standard is probably being applied.
The second
perspective can be called "unique uses" because it
assumes that each learner actively interprets and makes use of
the resource. Because human beings are unique, because of
accidents, and because the teacher and other students will react
to this learner, the educational outcomes will be particular to
that learner. What matters is how good the learning outcome is
(within a broad range of possibilities) rather than whether it
matches a behavioral standard being applied to all students. For
example, when an English faculty member grades student essays
and gives two students B's for completely different reasons, a
unique uses perspective is probably being applied.
Each of these
two perspectives implies a different approach to evaluation.
(See attached table for details and references.) The paradigm
described at the start of this essay embodies a uniform impact
approach: state the goal for all students in advance, and then
test their learning. The unique uses perspective on that same
innovation can't use that approach to evaluation. The same test
can't be used for all students because one is searching for
learning outcomes that may well be qualitatively different from
one student to the next.
Instead, a
unique uses evaluation begins with a broad search for important
outcomes, good and bad, and then assesses each case one at a
time. The English faculty member assigns an essay topic, "What I
did on my Summer Vacation," and then grades the essays one at a
time. The accreditor asks certain broad questions, reads the
self-studies, and then visits the campuses to examine how well
each one is achieving its own goals. First the cases are
assessed individually (What were the most important outcomes for
this student? How plausible is it that what I'm seeing is indeed
an outcome of the program we're evaluating?). Then the evaluator
considers what they imply, collectively, about the innovation.
The two
perspectives make complementary and quite different assumptions
about the nature of excellence in a program. In the uniform
impact paradigm, an excellent program is one that fosters high
achievement along a particular dimension for all students; it's
even better if that program design can be replicated in new
settings with comparable results. In contrast, a unique uses
perspective values excellence that is unpredictable, fresh, and
appropriately different in different settings. Shakespeare has
been valued as an instructional resource for so long in large
part for that reason: it always seems capable of yielding new
interpretations and new instructional applications, and students
can catch fire for quite different reasons and with quite
different and surprising consequences.
One
wonderful, multi-faceted evaluation of an innovation can be
found in Network-Based Classrooms: Promises and Realities
(Bruce, Peyton, and Batson, 1993). Its focus is the teaching of
composition through the medium of real-time writing. The volume
includes a variety of data and perspectives on this complex
innovation, including both uniform impact evaluation (a study of
essays using a few quite specific prearranged criteria) and
unique uses (e.g., a study of a comparable number of essays by a
renowned English faculty member with no prior relationship with
this project, who read the essays, assessed each one
individually on its merits, and then evaluated what they implied
about the environment in which students had been learning).
These two
perspectives, both important for almost any evaluation, are
particularly important when technology is in use. Most
instructional uses of information technology are meant to be
empowering, i.e., to create fresh choices for instructors and
learners. When students communicate more, when they work on
projects, when they collaborate, the diversity of potential
outcomes for learners increases. Any evaluation which uses only
the uniform impact perspective will miss some of the most
important consequences of this type of innovation.
Response:
Don't fall into the trap of thinking that uniform impact
evaluation is sufficient just because it is (often) quantitative
and just because it is (often) necessary. To capture the
outcomes of the empowering features of an innovation, a unique
uses evaluation will almost always be necessary.
5. The Paradigm Implies That The
Innovative Program Can Have An Impact On People's Lives, By
Itself
That's true sometimes, for some of the
people, but it's often not true for grant-funded projects in
their impacts on most of the people. Or at least it isn't true
if one is looking for unaided impact that is still perceptible
for most people some time after they have finished the program.
People are asking whether the innovation acted unaided when they ask, "How do you know the impact was caused by the innovation and not (also) by X?"
In the case
of George Landow, for example, Intermedia played a role in the
instructional improvement, but so did other influences on
Landow's teaching (and on the students).
In fact, large impacts on people's lives usually derive from large, coherent patterns of individual events. Each
event (assignment, encounter with a faculty member, encounter
with a peer, etc.) may have different effects on each student,
or fail to affect that student perceptibly. But coherent
patterns of instructional events (e.g., a set of courses that
lead to a degree, a writing across the curriculum program, a set
of courses and services designed to increase access for
under-prepared students) are more likely to have a predictable,
perceptible effect on most graduates' lives than are single
assignments or courses that are not related to anything else in
the college.
Response:
Consider whether your project should be understood as part of a
larger pattern of instruction and improvement in the
institution, even though the rest of the pattern is not
grant-funded and even though it might not use your technology.
For example, your project may be (partly) designed to improve
collaborative learning skills through uses of e-mail. It may not
be possible to evaluate your project adequately unless you also
attend to the extent to which the rest of the curriculum is
succeeding or failing in fostering collaborative learning. For
example, a potentially great project might look terrible if that project is the only part of the curriculum that's trying to teach a complex skill, while a relatively mediocre innovation might look deceptively good if it fits into a larger pattern of teaching and learning that is pulling in the same direction.
Considering a
grant-funded project as part of a larger pattern of change
implies two quite different evaluative tasks -- finding out:
- Is that larger pattern of improvement having a good effect on learning outcomes for graduates? Or are there other reasons to think that it will, aside from direct evidence (e.g., research in other institutions that shows that this teaching and learning strategy usually has good outcomes)?
- If so, is your innovation playing a useful role in the maintenance or growth of that larger change in strategy? Or, on the other hand, is your innovation neutral or even interfering with that improvement in overall teaching and learning strategy?
6. Relevance of the Flashlight
Program
The
Flashlight Program consists of a set of evaluation tool kits and
manuals for the creation of local studies of the educational
uses of information technology. The first tool kit, the
Flashlight Current Student Inventory (Ehrmann and Zúñiga, 1997), has been site-licensed by almost 600 institutions.
Flashlight is
not universally useful. There's probably no grant-funded
technology project for which Flashlight would be a sufficient
tool kit and there are certainly grant-funded technology
projects for which the Flashlight tools would be irrelevant.
Nonetheless, the foregoing lessons about evaluating projects and
larger scale changes in curricula are embodied in Flashlight:
- Flashlight studies can be carried out repeatedly over a period of years, so that the evaluation isn't limited to the period of a single grant.
- Flashlight tools are suitable for studying "traditional" education (in its variety) alongside technology-enhanced courses.
- The Flashlight approach is flexible enough to help in the discussion of what the innovation is, and how it might be changing.
- Flashlight tools can be used for both uniform impact and unique uses studies of the same course of study.
- Flashlight helps explain why good and bad outcomes occur by helping document the processes that usually produce such outcomes.
- The Flashlight approach can be used to diagnose why an innovation is progressing unevenly and to give project staff guidance on how to improve the chances of success.
References
Beeman, William O., Anderson,
Kenneth T., Bader, Gail, Larkin, James, McClard, Anne P.,
McQuillan, Patrick, and Shields, Mark. (1988). Intermedia: A
Case Study of Innovation in Higher Education. Providence,
RI: Institute for Research in Information and Scholarship, Brown
University.
Bruce,
Bertram, Peyton, Joy, and Batson, Trent. (Eds.) (1993).
Network-Based Classrooms: Promises and Realities. New York:
Cambridge University Press.
Two Complementary Philosophies of
Education and Evaluation
Stephen C. Ehrmann, Ph.D.,
Director, The Flashlight Program
|  | Uniform Impact | Unique Uses |
| --- | --- | --- |
| Purpose of education | Produce certain (observable) outcomes for all beneficiaries | Help each person learn something valuable, (almost) no matter what that thing is |
| How is the beneficiary seen? | An object, who is impacted by the intervention | A subject, who makes use of the intervention |
| The best improvements in education? | Those whose outcomes are replicable, so past performance predicts future performance, even in somewhat different settings | Those that can consistently be used to produce excellent, sometimes unpredictable outcomes for individual subjects |
| Variation among beneficiaries | Quantitative | Qualitative and quantitative |
| Most important outcomes? | The common ones (plus or minus) | Those that are most important for each subject (plus or minus) |
| How to judge value | 1. Ask the intervening educator about goals; 2. create evaluation procedures that can measure progress along that goal (measuring value can be entirely or mostly embodied in the method of assessment) | 1. Observe user and intervenor goals; 2. gather data about subjects' achievements and problems; 3. choose "connoisseurs" to assess what has happened to each subject, and then to evaluate the meaning of the collective experience |
| "Did the intervention cause the outcome?" How can you tell? | Statistics, using control groups | Historical analysis of a chain of events for each subject; control groups are less important |
| Quantitative and qualitative data: which method uses which data? | Use both | Use both |
REFERENCES ON UNIFORM IMPACTS AND
UNIQUE USES
The longest discussion of these concepts
appears in The Flashlight Evaluation Handbook by Ehrmann
and Zúñiga (Washington, DC: The TLT Group, 1997); this volume is
currently distributed only to institutions and organizations
that license the Flashlight evaluation tools. For some further
material by the author on these ideas, see:
- The evaluation chapter in D.P. Balestri, S.C. Ehrmann, et al., Ivory Towers, Silicon Basements: Learner-Centered Computing in Postsecondary Education, McKinney, TX: Academic Computing, 1988.
- Stephen C. Ehrmann, "Assessing the Open End of Learning: Roles for New Technologies," Liberal Education, LXXIV:3 (May-June 1988), pp. 5-11.
- Stephen C. Ehrmann, "Gauging the Educational Value of a College's Investments in Technology," EDUCOM Review, XXVI:3-4 (Fall/Winter 1991), pp. 24-28.
- For the classic reference to the role of artistic judgment in evaluation, see the work of Elliot Eisner, e.g., The Educational Imagination, New York: Macmillan, 1979.