Monday, August 09, 2010

Scoring Points or Making Progress?

The smaller the stakes, the more intense the backstabbing - or so the old saying about academia goes.  But sometimes the backstabbing is bad even when the stakes are high.

I went to an aid conference last year where the speakers took turns ridiculing each other's work.   The speakers in question were all among the top in their field, and all had tenure, so to my mind they had little reason to be insecure.  And many of them are even personal friends!  The atmosphere seemed almost childish - though most everyone there was over forty.

I asked one of them afterwards why they all seemed intent on attacking and triumphantly exposing a shortcoming or flaw in the others' presentations, with the implication that the whole approach was baloney.   Why not build on what was valid in others' presentations and then refine one's own view, so as to advance our collective understanding of the world more quickly?

"That's not the way it works," my friend told me.  "You advance knowledge by mud-slinging, not by listening.  That's the way it has always been and always will be."

I just can't accept this. People need to get past their own egos and actually talk to - and learn from - each other, especially when the stakes are high.  In that context, I was really happy to have a good conversation about RCTs (Randomized Controlled Trials) with some friends and highly qualified colleagues on Facebook.  I learned a lot from it, and it may even generate a paper.  I am reproducing my original post, along with their comments (with their permission) below.

Traditional Evaluations are Not Scalable

It’s a standard trope of this blog to point out that there’s no panacea in global development. That’s true of impact evaluation, too. It’s a tool for identifying worthwhile development efforts, but it is not the only tool.  We can’t go back to assuming that good intentions lead to good results, but there must be room for judgment and experience in with the quantifiable data.

That is Alanna Shaikh guest blogging at AidWatch.   She describes two limitations to evaluation discussed by Steve Lawry of the Hauser Center at Harvard.  Excessive reliance on evaluation, Lawry says, stifles innovation and artificially constrains aid agencies to initiatives that can be easily measured with data.

I would add a third limitation.  Formal evaluations, including the gold standard of randomized controlled trials, are not scalable.  We simply do not have the time and resources to do centralized, in-depth evaluations of everything.  The only way forward is to establish a decentralized, implicit form of evaluation in which beneficiaries and other stakeholders can provide feedback about quality and relevance of aid projects.

This is how markets work.  The magazine Consumer Reports does a great job of evaluating products.  But it evaluates a minuscule proportion of all the products produced each year in the economy.  So who evaluates the other 99.9999% of products?  The consumer.  If consumers buy a product, it keeps getting produced.  If not, it doesn't. Does this system work perfectly?  Of course not.  Does it work better than any alternative we have found?  By far.


    • Michael Clemens A lot of the opposition to RCTs seems to be opposition to a *requirement* for RCTs. How can an RCT by itself 'stifle innovation'? You can certainly stifle innovation by requiring one single evaluation method for all projects, which would be dumb. Using RCTs in the specific settings in which they can be used is just one additional way to generate information (very good quality information) to add to the marketplace.
      August 1 at 1:07am

    • Dennis Whittle
      Michael - you are completely right, and said it better than I could have - RCTs are a valuable part of a toolkit and very useful for certain things. I am making a broader point about traditional evaluations in general, which are (a) too expensive and cumbersome, (b) often suffer from the "you can tell a project's position or you can tell its velocity, but not both at the same time" problem, and (c) have little or no effect on future actions. The effective feedback loop issue is the killer. The last speech I gave before I left the World Bank was the keynote at the Evaluation Unit's annual retreat. The title was "I Have a Dream," and the second line was "that project staff would come read evaluations before they started a new project."
      August 1 at 6:02am

    • April Harding
      Sorry I missed that speech Dennis. I hope you saved it 'cause operational staff still don't read impact evaluation results (or don't have time to). You could give it again some time.

      My problem with RCTs (or the RCT movement) is that they are not, by and large, answering the policy- and program-relevant questions - so those expensive evaluations are generating answers to unimportant questions (or partial answers to important questions). My sense is that this phenomenon is "crowding out" much-needed research on critically important questions. Yet researchers won't shift to the more important questions because the methods you can apply aren't as rigorous - hence they won't get published, or they will lose status in their social milieu, or whatever.
      In the worst cases, researchers then take their answers to one question (e.g. the slope of the demand curve for a health product) and pretend to have answered the important policy question (e.g. the most effective design of a program to increase use of the health product).
      August 1 at 6:54am

    • Michael Clemens
      Thanks April, that is really interesting. I can be too much of a cheerleader for randomization, because I've seen firsthand that it's possible to work in an area where people have said that rigorous evaluation can't be done -- international migration -- and yet I found several ways to do it, by finding natural experiments. But what I need to pay more attention to is that there are huge numbers of programs in which there isn't a convenient natural experiment and it would be very costly to design an experiment into it, but people still need to have a good idea of the impact. And as you say, many researchers just don't have an incentive to touch such a project.

      Maybe what's needed is the same evolution that took place in medicine, after that field began to adopt randomization for some purposes. In medicine researchers can publish results from Phase I trials (almost never randomized), Phase II trials (sometimes randomized), and Phase III trials (usually randomized). Many patients who need a solution *now* go for experimental treatments that are still in Phase I or II. In other words, for many patients it doesn't matter if there's ever a Phase III because if it doesn't work they won't be around to see it. And researchers have an incentive to do all three flavors because all three are publishable.

      So an important question is: How can the NGO/government/foundation world generate incentives for the creation of Phase I/II-style research? By creating a career path for people who do it, online journals for them to publish in, and so on, the way universities today and existing journals provide a career path for people who do exclusively Phase III-style research. This alternative research world could support those who do Phase I/II work even if the decision point for many projects must arrive before there's a Phase III, or even regardless of whether or not a Phase III is possible.

      Just brainstorming! And interested in what you think.
      August 1 at 7:39am

    • April Harding
      Michael, I think this is one of the most important issues we can think about - how to incentivize more research along that spectrum.

      To take the health program example:
      In health policy/program design for developing countries you see two types of studies: studies doing all kinds of analysis on household survey data (driven by availability of data); and experiments with RCTs (driven by the range of things discussed here). Only a small portion of the burning questions can be answered with HHS data and experiments/RCTs.

      If you look at health policy and program research in developed countries, policy decisions are informed by a vast literature (usually referred to as health services research) with standardized data from much broader sources (e.g. not just households, but also from providers and from payers and often from policy implementers). And you see a much broader range of methods applied - where the most rigorous method is selected to suit the question and data. Clearly, part of what enables this is the existence of health services research journals (and reviewers) who understand the range of methodologies and data, so that good research can get published across the range of data sources and methodologies. But another enabling factor is the large investments made to collect and provide the standardized data in an open, user-friendly format. It would make a huge difference if development assistance funding were programmed toward providing this data (expanding on USAID's incredible contribution via the Demographic and Health Surveys). Of course, these funds would generate results in the medium to long term - and who the heck funds initiatives with that kind of time horizon?

      Our pal Bill said something once, something like: there is no development economics, there is just economics. I often feel like saying something similar: there is no special field/techniques for researching social programs and policies in developing countries, there is just high-quality social policy research. We need to nudge the field away from its love affair with RCTs (which is not to say become unconcerned with rigor) and away from its dependence on narrow data sources (which can only shed light on a small fraction of the questions we need answered).
      August 1 at 10:53am

    • Michael Clemens Wow, this is fantastic, April, really clear. Want to write a little 'Note' on this subject with me? It would be great and people would read it.
      August 1 at 7:26pm

    • Dennis Whittle May I remind you both that I own the copyright to this excellent discussion since it occurred on my wall? I want 10% of the royalties and 5% of the movie rights.

      August 2 at 9:07am via Email

    • Marc Maxson
      Dennis - I cringe at metaphors based on Heisenberg's 10^-34 less-than-certainty principle (maybe because I'm a scientist?). There are plenty of examples of tradeoffs, but there has got to be a better way than pseudo-quoting scientific principles beyond their reasonable domain.

      On the metaphor to Phase I,II,III trials in medicine - it's worth paying attention to the biggest innovation to the system in the last decade. In 2005-ish, all drug trials had to be registered BEFORE they began in order to be considered as evidence in the subsequent FDA approval process. Some companies dropped marginal "me-too" drugs as a result, but more good drugs made it into FDA approval and with more reliable evidence. And best of all - the drug companies that would start and restart trials in phase III until they got the result they wanted were no longer able to bias their results.

      If you look back at the shouting over this rule, you'll see lots of people complaining that it would stifle innovation since drug companies would become more conservative and drop drugs before testing them all. I'm for that - it drives the cost of healthcare BACK DOWN. Isn't there a way to keep RCTs in check through a similar process to the drug RCT registry? That way negative data is always public and traceable, and RCTs are used for hypotheses that are a good bet, not a goose chase. That money could be used in some other less rigorous way for all the goose chases.
      August 2 at 11:05pm

    • Michael Clemens Thanks very much Marc, this is really fascinating.
      August 3 at 4:12am

    • Mari Kuraishi Wow. This has got to be the most substantive discussion I've seen on FB ever. Not to mention the most open-minded--and that's on any platform.
      August 3 at 10:22am

    • April Harding Agree. This is a great discussion.
      Michael, let's get together to discuss writing a policy note when you are back in DC.

      August 3 at 2:56pm

    • April Harding
      Chris Blattman just commented on his blog on the issue of development research being led by data rather than bringing to bear the range of relevant methodologies (he categorizes them into quantitative and qualitative). He is discussing a recent paper by Bamberger, Rao and Woolcock. Seems relevant for our policy note, Michael.
      I loved his recommendation: Marry someone who specializes in the "other" category of methodologies. That oughta help keep you honest. Failing that, find co-authors who fit the bill.
      August 3 at 9:49pm

    • Dennis Whittle
      Just catching up here. A couple of things. Marc, my use of the uncertainty principle metaphor is only half in jest. Using RCTs keeps the investigator from really trying to understand the phenomenon because of the need for double-blind techniques to avoid bias. But if the investigator really tries to understand the phenomenon by talking to the subjects, they bias the results, for a variety of reasons in my experience (which I can expand on later).

      Second, somewhere I saw an analysis of the proportion of commonly accepted medical treatments that fall into Categories I, II, and III (RCTs) above. Only 1/3 or less were RCT-based. And a surprising number were Category I. This could have several implications. The distribution could be optimal in some sense. Or a number of Category I and II treatments could be useless or even harmful.

      Speaking of distributions, in medicine there is surprisingly little understanding of the distribution of patient response to treatments. If the distribution has a very low variance, that has one implication, but if the distribution has fat tails or even follows a power law, then the implications are radically different.

      August 4 at 10:20am · Marc Maxson likes this.
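
[An aside on that last point, since it's easy to illustrate: here is a toy simulation, with entirely made-up numbers, of two hypothetical treatments that have the same average effect of about 1 unit. Treatment A helps nearly everyone a little; Treatment B does nothing for most patients and a lot for a few. An evaluation that reports only the mean effect would score them identically, yet the implications for policy and targeting are radically different.]

```python
import random

random.seed(0)
N = 100_000

# Treatment A: low-variance response - nearly everyone improves by ~1 unit.
a = [random.gauss(1.0, 0.1) for _ in range(N)]

# Treatment B: same mean effect of ~1 unit, but 90% of patients get
# no benefit at all while 10% get a large effect of 10 units.
b = [10.0 if random.random() < 0.1 else 0.0 for _ in range(N)]

mean_a = sum(a) / N
mean_b = sum(b) / N
print(f"mean effect A: {mean_a:.2f}   mean effect B: {mean_b:.2f}")

# The headline averages match, but the share of patients who see
# essentially no benefit (< 0.1 units) is wildly different.
frac_a_none = sum(1 for x in a if x < 0.1) / N
frac_b_none = sum(1 for x in b if x < 0.1) / N
print(f"share with effect < 0.1:  A: {frac_a_none:.0%}   B: {frac_b_none:.0%}")
```

[B's fat tail is the interesting case: who responds, why, and at what risk all hinge on it, and none of that is visible in the mean.]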

    • Dennis Whittle
      (cont.) Speaking of Category I, it is worth checking out Seth Roberts's blog (HT: Mari). Roberts is an academic who does a huge amount of self-experimentation to find out what works on him personally. His view is that there is far too little creativity brought to bear to come up with novel hypotheses about the pathways of disease. This is because it is so freakin' expensive to do RCTs. He is kind of all over the place but worth reading anyway.

      Third, I (manifestly) agree with Blattman about marrying someone who specializes in a different category of methodology. I can't imagine being married to someone who thought just like me. It is sometimes infuriating, like the tension in a good piece of music, but that makes the resolution all the richer.

      Fourth, I wonder more broadly whether there is space for some type of moderated blog a la Posner and Becker for development. The difference would be that the "Posner" and the "Becker" would vary week to week. I would love to moderate something like that. Would it be at AidWatch? CGD? Something new?
      August 4 at 10:27am

    • April Harding I though posner and becker just took turns blogging, and occasionally commented on each other? Is that what you mean? Or do you have in mind a bigger group of bloggers? Or a blog with an overseer who blogged but also invited other bloggers on interesting topics?
      August 4 at 10:31am

    • Dennis Whittle Ah, my memory was that they commented on each other's blogs and had a running conversation. I am talking more about that style.
      August 4 at 10:34am

    • Marc Maxson
      Interesting Dennis (about Cat I, II, III). I do know these statements are true about modern medicine (and somewhat disturbing):

      1. Leading cause of death = going to the hospital
      2. 50% of American women over 50 were being given medicinal horse piss at one point, despite there being no RCT to prove its impact. (Later the Framingham study and the Nurses' Health Study both showed estrogen replacement therapy to be marginally effective, and possibly harmful.)
      3. More knowledge about medical outcomes and risk factors has come out of the Framingham, Nurses' Health, and Seventh-day Adventist studies than any RCT.

      I repeat! NONE of these are RCTs. All are longitudinal observations of an isolated, well-defined social group with well-controlled monitoring methods. Hence, I've been setting up the GlobalGiving Kenya storytelling project to mirror these approaches.

      4. Isaac Asimov: "Life is the cruelest of all teachers, because it kills every student in the end." Medicine is quite the same, although the intermediate phases appear to beat all alternatives.
      August 4 at 10:32pm

    • Marc Maxson April - I vote for married couples co-blogging and taking point-counterpoint. That would make for interesting reading.
      August 4 at 10:35pm