How Not to Evaluate Grant-funded Projects

 



These materials are for use only by institutions that subscribe to The TLT Group, by participants in TLT Group workshops that feature this particular material, and by invited guests. The TLT Group is a non-profit whose existence is made possible by subscription and registration fees. If you or your institution are not yet among our subscribers, we invite you to join us, use these materials, help us continue to improve them, and, through your subscription, help us develop new materials!  If you have questions about your rights to use, adapt, or share these materials, please ask us (info @ tltgroup.org).

The Traditional Paradigm and its Problems
  1. The Project And Its Grant-Funded Evaluation Are Usually Over Before Much Of The Project's Impact On People's Lives Has Begun
  2. The Paradigm Implies That "Traditional" Education Is Uniform And Well Understood
  3. The Paradigm Implies That We Understand What This Innovation Is And What It's For, In Advance
  4. The Paradigm Implies That The Project's Objectives Are The Same For Everyone
  5. The Paradigm Implies That The Innovative Program Can Have An Impact On People's Lives, By Itself
  6. Relevance of the Flashlight Program
References
References on uniform impacts and unique uses

 

The Traditional Paradigm and its Problems

It's easy to describe the ideal way to evaluate a grant to use technology to improve education:

  1. First, figure out the desired impact on people's lives, what will matter to them (and to the funder, perhaps).

  2. Then figure out a way to measure that attribute in people.

  3. When the new software or approach is implemented, do a pretest of that attribute and then measure again after the grant has had its impact on them. At the same time, do a similar pair of tests on comparable people going through traditional education of this sort. Discover, if you're lucky, that the innovative program has resulted in more learning than the traditional method.

  4. Turn in this hard data to the funder as part of your final report on your grant and disseminate it in the wider world.
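
To make step 3 concrete, here is a minimal sketch of the kind of pretest/posttest comparison the paradigm assumes. It is written in Python using the scipy library; the scores, group sizes, and variable names are hypothetical placeholders rather than data from any real study, and a real evaluation would also need to address sampling, test equivalence, and effect size.

  # Sketch of the paradigm's step 3: compare learning gains in an innovative
  # section against a comparable "traditional" section. All data are hypothetical.
  from scipy import stats

  # Hypothetical pretest and posttest scores (percent correct) for students
  # in the innovative, technology-based section...
  innovative_pre  = [62, 70, 58, 75, 66, 71, 60, 68]
  innovative_post = [78, 85, 72, 88, 80, 84, 74, 81]

  # ...and for a comparable group taught by the "traditional" method.
  traditional_pre  = [61, 69, 59, 74, 65, 70, 62, 67]
  traditional_post = [70, 77, 66, 80, 72, 76, 68, 73]

  # Gain scores: how much each student improved between pretest and posttest.
  innovative_gain  = [post - pre for pre, post in zip(innovative_pre, innovative_post)]
  traditional_gain = [post - pre for pre, post in zip(traditional_pre, traditional_post)]

  # Independent-samples t-test on the gains: did the innovative group gain more?
  result = stats.ttest_ind(innovative_gain, traditional_gain)

  print(f"Mean gain, innovative section:  {sum(innovative_gain) / len(innovative_gain):.1f}")
  print(f"Mean gain, traditional section: {sum(traditional_gain) / len(traditional_gain):.1f}")
  print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3f}")

The rest of this essay is about why this tidy comparison is so often unavailable, inadequate, or misleading in practice.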

     

That paradigm sounds simple to describe, although not necessarily easy to carry out. Unfortunately, a number of fundamental problems often make that sort of evaluation impossible, inadequate, or even misleading.

 

In this essay, we'll consider a few of those difficulties with the paradigm and what sorts of approaches can complement or replace it.

 

1. The Project And Its Grant-Funded Evaluation Are Usually Over Before Much Of The Project's Impact On People's Lives Has Begun

Ordinarily, development projects have a chance to make a trial run or two, but the kinds of ultimate impacts that funders care about may well not have happened by the time evaluators must stop collecting data and start writing reports. Thus, the report usually dwells on problems of implementation and "feel good" stories about process.

Response: Extend the grant well beyond the end of the rest of the project. Or fund another project later on whose chief purpose is evaluation.

 

2. The Paradigm Implies That "Traditional" Education Is Uniform And Well Understood

"Traditional education" as a label covers even more ground than the label "innovative." Many evaluations are flawed in that they provide virtually no description of what was done in the control (traditional) group.

Response: Be just as searching in the evaluation of the "traditional" as in the evaluation of the "innovation" and just as descriptive in the report.

 

3. The Paradigm Implies That We Understand What This Innovation Is And What It's For, In Advance

The paradigm implies foreknowledge because it requires the development of a pretest that students take before using the innovation (or being used by it) and a parallel post-test. That's not possible if one isn't clear about what is to be taught. What's so difficult about that?

 

A first problem with this image is the problem of the moving target: the use of technology is usually associated with a change in the aims of an assignment, course, or course of study. Statistics provides just one example of this problem. A couple of decades ago, it was normal to teach statistics to students who used only paper and pencil to do assignments: that placed certain limits on the techniques that could be taught. Today's statistics course (taught to students using advanced calculators or other computers) not only teaches new statistical techniques that require such machines; it also teaches new skills of thinking (e.g., strategies for investigating data by analyzing images created with the data). To evaluate the impact of computing on learning over the last couple of decades, one would have needed to anticipate the new techniques and skills and somehow incorporate them, or their analogues, in tests given years earlier. Without that, one is faced with a meaningless comparison -- students scored an average of 80% on their exams two decades ago and still average 80% (on fundamentally different exams). Are they learning "more"? The "same"? "Less"? The question has no meaning.

Response: There is no completely satisfactory response. One might:

  1. Evaluate whether the conditions of learning have improved (e.g., time on task is usually correlated with improved learning outcomes, so has time on task improved?);

     

  2. Evaluate the tests themselves (if students averaged, say, a "B" five years ago and average a "B" today, do external evaluators believe that this year's tests and assignments are more up to date and perhaps tougher than those of five years ago? Or have they been "dumbed down"?).

A second problem with assuming that the innovation is understood in advance is that innovative projects evolve, often quite rapidly. Innovations change as the implications of the underlying ideas emerge. They change as the capabilities of new hardware and software become better understood. They change as external decision makers perceive and react to the innovation.

Response: It's always a good idea to step back periodically and ask, "What is this initiative? What is it for? Where is it going?" The answers may well have changed in surprising ways over a period of months. A project whose chief objective a year ago was to help students understand the sociology of organizations may well have acquired, in the intervening twelve months, a new goal of instilling multicultural insight.

 

A third problem relating to the identity of the innovation might be called "rapture of the technology": the evaluator erroneously assumes that the technology is the innovation. A project from a decade ago provides a dramatic illustration of this mistake. In the mid-1980s, Brown University began developing Intermedia. (If you don't happen to remember it, visualize today's World Wide Web -- in fact something better than today's Web in the ease of creating links in a shared body of text and graphics -- but existing only on a local area network.) Brown's project was ambitious: to develop Intermedia and test it by letting faculty develop course materials in it and then teach in it. Ultimately, two professors were chosen to participate in the experiment as development of Intermedia got underway. One of the two was George Landow, a full professor in the Department of English.

 

The project had an unusually large fraction of its grant devoted to evaluation. Step 1 was for a team of anthropologists to observe Landow's literature course in its primal state: pre-Intermedia. Step 2 was for them to observe his whole course again, a year later, when he was using the prototype Intermedia system. In the interim, Landow advised on Intermedia development and crafted his course materials -- a web of essays and graphics on Dickens, literature, and the Victorian period.

There was a slip between Step 1 and Step 2, unfortunately: Intermedia fell behind schedule and was not ready for use in time. The evaluators were scheduled to observe Landow, however, so they did, even though he was still teaching without technology. They were surprised to observe that he had become a much better teacher: the experience of developing courseware had changed Landow -- he used more graphics, he had students creating pictures of the linkages between concepts, and he put more emphasis on student participation in the seminar. (When they observed Landow again, a year later when he did have Intermedia, the course had improved still further.)

The point here is that Intermedia itself was only part of the innovation: the full innovation, in Landow's hands at least, was a more learner-centered, more visual, more associative approach to the teaching of this literature course. Intermedia was a powerful tool that helped him implement this approach (Beeman et al., 1988).

Response: Be systemic in defining the boundaries of the innovation. It might include not only software but staff (and their skills), changes in infrastructure, changes in the "market," and so on.

 

4. The Paradigm Implies That The Project's Objectives Are The Same For Everyone

There are two ways to look at almost any educational program, both valid:

  • As an effort to impact all learners in the same way (though some learners will almost always achieve more in that direction than others); and

  • As a resource that different learners will use differently with qualitatively different and somewhat unpredictable consequences.

The first perspective can be called "uniform impact" because it assumes that the educator is trying to shape learning and to do so in the same way for all learners. For example, when an English faculty member grades students on grammar, a uniform impact standard is probably being applied.

 

The second perspective can be called "unique uses" because it assumes that each learner actively interprets and makes use of the resource. Because human beings are unique, because of accidents, and because the teacher and other students will react to this learner, the educational outcomes will be particular to that learner. What matters is how good the learning outcome is (within a broad range of possibilities) rather than whether it matches a behavioral standard being applied to all students. For example, when an English faculty member grades student essays and gives two students B's for completely different reasons, a unique uses perspective is probably being applied.

 

Each of these two perspectives implies a different approach to evaluation. (See attached table for details and references.) The paradigm described at the start of this essay embodies a uniform impact approach: state the goal for all students in advance, and then test their learning. The unique uses perspective on that same innovation can't use that approach to evaluation. The same test can't be used for all students because one is searching for learning outcomes that may well be qualitatively different from one student to the next.

 

Instead, a unique uses evaluation begins with a broad search for important outcomes, good and bad, and then assesses each case one at a time. The English faculty member assigns an essay topic, "What I did on my Summer Vacation," and then grades the essays one at a time. The accreditor asks certain broad questions, reads the self-studies, and then visits the campuses to examine how well each one is achieving its own goals. First the cases are assessed individually (What were the most important outcomes for this student? How plausible is it that what I'm seeing is indeed an outcome of the program we're evaluating?). Then the evaluator considers what they imply, collectively, about the innovation.

 

The two perspectives make complementary and quite different assumptions about the nature of excellence in a program. In the uniform impact paradigm, an excellent program is one that fosters high achievement along a particular dimension for all students; it's even better if that program design can be replicated in new settings with comparable results. In contrast, a unique uses perspective values excellence that is unpredictable, fresh, and appropriately different in different settings. Shakespeare has been valued as an instructional resource for so long in large part for that reason: it always seems capable of yielding new interpretations and new instructional applications, and students can catch fire for quite different reasons and with quite different and surprising consequences.

 

One wonderful, multi-faceted evaluation of an innovation can be found in Network-Based Classrooms: Promises and Realities (Bruce, Peyton, and Batson, 1993). Its focus is the teaching of composition through the medium of real-time writing. The volume includes a variety of data and perspectives on this complex innovation, including both uniform impact evaluation (a study of essays using a few quite specific prearranged criteria) and unique uses evaluation (e.g., a study in which a renowned English faculty member with no prior relationship to the project read a comparable number of essays, assessed each one individually on its merits, and then evaluated what they implied about the environment in which the students had been learning).

 

These two perspectives, both important for almost any evaluation, are particularly important when technology is in use. Most instructional uses of information technology are meant to be empowering, i.e., to create fresh choices for instructors and learners. When students communicate more, when they work on projects, when they collaborate, the diversity of potential outcomes for learners increases. Any evaluation which uses only the uniform impact perspective will miss some of the most important consequences of this type of innovation.

Response: Don't fall into the trap of thinking that uniform impact evaluation is sufficient just because it is (often) quantitative and just because it is (often) necessary. To capture the outcomes of the empowering features of an innovation, a unique uses evaluation will almost always be necessary.

 

5. The Paradigm Implies That The Innovative Program Can Have An Impact On People's Lives, By Itself

That's true sometimes, for some of the people, but for grant-funded projects it is often not true for most of the people. Or at least it isn't true if one is looking for an unaided impact that is still perceptible for most people some time after they have finished the program. Evaluators are implicitly asking whether the innovation acted unaided when they ask, "How do you know the impact was caused by the innovation and not (also) by X?"

 

In the case of George Landow, for example, Intermedia played a role in the instructional improvement, but so did other influences on Landow's teaching (and on the students).

In fact, large impacts on people's lives usually derive from large, coherent patterns of individual events. Each event (an assignment, an encounter with a faculty member, an encounter with a peer, etc.) may have different effects on each student, or may fail to affect that student perceptibly. But coherent patterns of instructional events (e.g., a set of courses that lead to a degree, a writing-across-the-curriculum program, a set of courses and services designed to increase access for under-prepared students) are more likely to have a predictable, perceptible effect on most graduates' lives than are single assignments or courses that are not related to anything else in the college.

 

Response: Consider whether your project should be understood as part of a larger pattern of instruction and improvement in the institution, even though the rest of the pattern is not grant-funded and even though it might not use your technology. For example, your project may be (partly) designed to improve collaborative learning skills through uses of e-mail. It may not be possible to evaluate your project adequately unless you also attend to the extent to which the rest of the curriculum is succeeding or failing in fostering collaborative learning. A potentially great project might look terrible if it is the only part of the curriculum trying to teach a complex skill, while a relatively mediocre innovation might look deceptively good if it fits into a larger pattern of teaching and learning that is already going well.

Considering a grant-funded project as part of a larger pattern of change implies two quite different evaluative tasks -- finding out:

  1. Is that larger pattern of improvement having a good effect on learning outcomes for graduates? Or are there other reasons to think that it will, aside from direct evidence (e.g., research in other institutions that shows that this teaching and learning strategy usually has good outcomes)?

  2. If so, is your innovation playing a useful role in the maintenance or growth of that larger change in strategy? Or, on the other hand, is your innovation neutral or even interfering with that improvement in overall teaching and learning strategy?

 

6. Relevance of the Flashlight Program

The Flashlight Program consists of a set of evaluation tool kits and manuals for the creation of local studies of the educational uses of information technology. The first tool kit, the Flashlight Current Student Inventory (Ehrmann and Zúñiga, 1997) has been site licensed by almost 600 institutions.

 

Flashlight is not universally useful. There's probably no grant-funded technology project for which Flashlight would be a sufficient tool kit and there are certainly grant-funded technology projects for which the Flashlight tools would be irrelevant. Nonetheless, the foregoing lessons about evaluating projects and larger scale changes in curricula are embodied in Flashlight:

  1. Flashlight studies can be carried out repeatedly over a period of years, so that the evaluation isn't limited to the period of a single grant.

  2. Flashlight tools are suitable for studying "traditional" education (in its variety) alongside technology-enhanced courses.

  3. The Flashlight approach is flexible enough to help in the discussion of what the innovation is, and how it might be changing.

  4. Flashlight tools can be used for both uniform impact and unique uses studies of the same course of study.

  5. Flashlight helps explain why good and bad outcomes occur by helping document the processes that usually produce such outcomes.

  6. The Flashlight approach can be used to diagnose why an innovation is progressing unevenly and to give project staff guidance on how to improve the chances of success.

References
Beeman, William O., Anderson, Kenneth T., Bader, Gail, Larkin, James, McClard, Anne P., McQuillan, Patrick, and Shields, Mark. (1988). Intermedia: A Case Study of Innovation in Higher Education. Providence, RI: Institute for Research in Information and Scholarship, Brown University.

 

Bruce, Bertram, Peyton, Joy, and Batson, Trent. (Eds.) (1993). Network-Based Classrooms: Promises and Realities. New York: Cambridge University Press.

 

 

Two Complementary Philosophies of
Education and Evaluation

Stephen C. Ehrmann, Ph.D., Director, The Flashlight Program

 

Purpose of Education
  Uniform Impact: Produce certain (observable) outcomes for all beneficiaries.
  Unique Uses: Help each person learn something valuable, (almost) no matter what that thing is.

How is the beneficiary seen?
  Uniform Impact: As an object, who is impacted by the intervention.
  Unique Uses: As a subject, who makes use of the intervention.

The best improvements in education?
  Uniform Impact: Those whose outcomes are replicable, so that past performance predicts future performance, even in somewhat different settings.
  Unique Uses: Those that can consistently be used to produce excellent, sometimes unpredictable outcomes for individual subjects.

Variation among beneficiaries
  Uniform Impact: Quantitative.
  Unique Uses: Qualitative and quantitative.

Most important outcomes?
  Uniform Impact: The common ones (plus or minus).
  Unique Uses: Those that are most important for each subject (plus or minus).

How is value judged?
  Uniform Impact: (1) Ask the intervening educator about goals; (2) create evaluation procedures that can measure progress toward those goals (measuring value can be entirely or mostly embodied in the method of assessment).
  Unique Uses: (1) Observe user and intervenor goals; (2) gather data about subjects' achievements and problems; (3) choose "connoisseurs" to assess what has happened to each subject, and then to evaluate the meaning of the collective experience.

"Did the intervention cause the outcome?" How can you tell?
  Uniform Impact: Statistics, using control groups.
  Unique Uses: Historical analysis of a chain of events for each subject; a control group is less important.

Quantitative and qualitative data: which method uses which data?
  Uniform Impact: Use both.
  Unique Uses: Use both.

 

References on Uniform Impacts and Unique Uses

The longest discussion of these concepts appears in The Flashlight Evaluation Handbook by Ehrmann and Zúñiga (Washington, DC: The TLT Group, 1997); this volume is currently distributed only to institutions and organizations that license the Flashlight evaluation tools. For some further material by the author on these ideas, see:

  1. The evaluation chapter in D. P. Balestri, S. C. Ehrmann, et al., Ivory Towers, Silicon Basements: Learner-Centered Computing in Postsecondary Education, McKinney, TX: Academic Computing, 1988.

  2. Stephen C. Ehrmann, "Assessing the Open End of Learning: Roles for New Technologies," Liberal Education, LXXIV:3 (May-June 1988), pp. 5-11.

  3. Stephen C. Ehrmann, "Gauging the Educational Value of a College's Investments in Technology," EDUCOM Review, XXVI:3-4 (Fall/Winter 1991), pp. 24-28.

  4. For the classic reference to the role of artistic judgment in evaluation, see the work of Elliot Eisner, e.g., The Educational Imagination, NY: Macmillan, 1979.


 
