The discussion around evaluating large-scale development projects is an important one. Michael Clemens and Gabriel Demombynes offer a critique of the Millennium Village Project (MVP) and its research methods. Their paper, however, misunderstands the MVP's aims and evaluation methods. We respond briefly here to clarify those basic misunderstandings about the project's goals and approaches to evaluation.
1. Goals of the Millennium Villages Project
The MVP is a ten-year project running through 2015, the deadline year for the Millennium Development Goals (MDGs), and has a corresponding time horizon for evaluation. The primary aim of the project is to achieve the Millennium Development Goals in the Project sites, as a contribution to the broader fulfillment of the MDGs.
To achieve this goal, there are several key organizing principles.
The MVP emphasizes the deployment of low-cost interventions that have been proven in earlier studies, in areas of smallholder agriculture, public health, primary education, local infrastructure, and business development. The purpose is not to re-test the individual interventions, but rather to demonstrate the feasibility of their joint implementation.
The MVP aims to design and document effective delivery systems for these interventions that can be implemented within the per capita financial envelope of local and national resources plus the external aid levels committed by the donor countries at Monterrey (2002) and Gleneagles (2005).
The MVP emphasizes community-based management of the delivery systems, as well as the feasibility of tailoring these systems to meet diverse local needs across Sub-Saharan Africa's major agro-ecological zones.
The MVP places local teams as the primary managers, problem solvers, and trouble-shooters of the MVP systems.
The MVP is also generating a range of products to support national and Africa-wide MDG scaling up.
The MVP is developing community-based information and management systems, training manuals and courses, computer and mobile-phone-based applications for monitoring and enhancing intervention coverage in rural areas, policy briefs, standards for optimizing institutional designs, and other “institutional and management capital”.
The MVP is carefully measuring the costs of various delivery systems and interventions, for use in national and global needs assessments and budgeting.
The MVP is developing a thorough evidence base of the institutional capacities and needs of local communities in a diverse array of ecological conditions.
The MVP is building an Africa-wide team of expertise that is working with local, national, regional, and AU officials on MDG scaling up, drawing from lessons learned in the villages.
At the same time, it is worth clarifying what the MVP is not trying to do:
The MVP is not testing a rigid protocol for implementing MDG-based systems. The emphasis is on community learning, design, and local context. Through careful documentation and codification of these learning mechanisms, the MVP supports a broader, ecologically sensitive scaling up.
The MVP is not claiming or aiming to provide a unique or “optimal” model for achieving the MDGs. It aims, rather, to offer a workable and effective model that fits the relevant budget and resource constraints. Given the shortfalls of the MDGs throughout rural Africa, success of a working model is vital, even if it is not necessarily the perfect model. (The MVP team is not aware of any other initiative that is rigorously documenting processes and costs, and designing systems, to achieve the MDGs at the scale of hundreds of thousands of people across Africa's many distinct eco-zones.)
2. How are the Millennium Villages Actually Being Evaluated?
Clemens and Demombynes inaccurately describe the June 2010 publication, Harvests of Development (HOD), as an MVP evaluation report. As a result, they make several incorrect claims regarding the M&E systems in the MVP. This error is curious because HOD says on the first page:
This report highlights the early results after three years of implementation across five initial Millennium Village sites in Ghana, Kenya, Malawi, Nigeria, and Uganda. Progress toward achieving the MDGs are derived from recently completed mid-term (year three) surveys. All data contained in this report compare baseline values to year three assessments, among a sample of several hundred households across each cluster. (…)
Further scientific results, including comparisons with other villages, will be published later this year, including in peer-reviewed scientific literature. We therefore emphasize the provisional nature of the results presented here, both in the sense that they are after only the third year of a ten-year project, and in that they represent only part of the third-year evaluation underway this year. We are presenting these partial results now in order to foster a better public understanding of the Project and its potential to help reduce extreme poverty, hunger, and disease in rural Africa. We hope that this report contributes to the public discussion in the lead-up to the MDG Summit in September 2010. [p.4, emphasis added.]
In short, HOD is not a formal evaluation of the project, and treating it as such is incorrect and prejudicial to the project.
In fact, the MVP has systems in place to foster learning at a number of levels, which is important given the complexity and localization of the interventions and the diverse range of settings.
The project has implemented quarterly performance monitoring of dozens of MDG-relevant indicators. This laborious process has by now produced a monitoring tool that can be used for general MDG scaling up (as it is now being used in Nigeria, Timor-Leste, and other locations).
The project has several process evaluation modules to examine various issues of local systems design and effectiveness. These help to answer design-level questions such as: “How can community health workers armed with a cell-phone effectively diagnose and treat malaria at the household level?” or “What are the most appropriate mechanisms to provide small-scale loans for agriculture?”
The Project is carefully examining several natural experiments such as how the elimination of user-fees at a particular primary care facility affects health service utilization and access to vital interventions.
A further economic costing module documents project spending by year, sector and stakeholder, to provide detailed estimates of the costs of various interventions and systems, information that is essential for scaling up and that is very difficult to obtain other than through implementation projects such as the MVP.
Finally, detailed impact assessments draw upon socio-economic, health, biological and anthropometric surveys within the villages alongside biophysical data on crop yields. In Harvests of Development, the MVP chose not to present many of these data, as they will appear first in peer-reviewed scientific publications.
Many of these impact assessments rely in part on data comparing the MVs and “control” villages, though it must be emphasized that the control villages cannot be “no-intervention” villages. They are simply comparison villages outside of the MV project sites.
3. Regarding Clemens and Demombynes' Arguments
On Rigor, Objectivity and Transparency
Clemens and Demombynes claim that the choice of villages was somehow “subjective” rather than rigorous and evidence-based. In fact, this issue has already been discussed at length in a peer-reviewed and registered evaluation protocol (The Lancet, protocol number 09PRT-8648).
As outlined in the protocol, the original Millennium Village selection was driven by a range of objective criteria with the aim of maximizing “external validity” and enhancing learning across diverse settings. Sites were chosen “purposively” to represent over 95% of the agro-ecological zones on the continent – reflecting a variety of systems-level challenges, disease profiles, and baseline levels of infrastructure and capacity. Within each country, selection criteria included rural areas with high rates of poverty and where at least 20% of children were undernourished. Clemens and Demombynes in fact underscore the extreme poverty of the MV sites by highlighting that the baseline situation of the Millennium Villages is generally worse than national or local DHS indicators. These data refute any insinuation that the Millennium Villages were somehow systematically advantaged at the outset of the project.
Comparison villages were selected at random from among a panel of matched candidates. An objective inventory was used to assess nearly two dozen village-level characteristics known to be associated with study outcomes, with one of three candidates chosen at random to undergo detailed monitoring. These are not pure “control” villages in the sense that governments, NGOs and other development partners are currently involved in scaling up many of the same interventions contained in the Millennium Villages package. However, as these may not be occurring in the comparison villages with the same cross-disciplinary scope and intensity as in the Millennium Villages, the Project is well-equipped to document MDG-related progress, link this progress to a costed intervention package, and examine potential synergies that may result from working across sectors. Data on baseline equivalence between intervention and comparison sites will be presented in forthcoming scientific publications. Where differences exist, they will be adjusted for in the final analysis.
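The matching-and-random-draw procedure described above can be sketched in a few lines. This is purely illustrative: the village names, characteristics, and scores are all invented, and the actual inventory covered nearly two dozen village-level characteristics rather than the two shown here.

```python
import random

# Illustrative sketch (invented data) of the comparison-village selection:
# candidate villages are scored for similarity to a Millennium Village on
# an inventory of characteristics, the three closest matches are kept, and
# one of the three is drawn at random for detailed monitoring.

def pick_comparison(candidates, mv_profile, rng=random.Random(0)):
    # Rank candidates by similarity to the MV profile (smaller distance = closer match).
    def distance(profile):
        return sum(abs(profile[k] - mv_profile[k]) for k in mv_profile)

    top_three = sorted(candidates, key=lambda c: distance(c["profile"]))[:3]
    # One of the three matched candidates is then chosen at random.
    return rng.choice(top_three)["name"]

mv = {"poverty_rate": 0.60, "child_undernutrition": 0.25}
panel = [
    {"name": "village_a", "profile": {"poverty_rate": 0.58, "child_undernutrition": 0.24}},
    {"name": "village_b", "profile": {"poverty_rate": 0.61, "child_undernutrition": 0.27}},
    {"name": "village_c", "profile": {"poverty_rate": 0.30, "child_undernutrition": 0.10}},
    {"name": "village_d", "profile": {"poverty_rate": 0.59, "child_undernutrition": 0.26}},
]
chosen = pick_comparison(panel, mv)  # one of the three well-matched villages
```

The design choice illustrated here is that randomization happens only among villages already matched on observable characteristics, which is why the comparison villages are similar at baseline without being pure “controls.”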
Clemens and Demombynes' second major argument is that MVP results should be discounted in light of broader secular improvements, as highlighted through difference-in-difference methods. The authors' application of this method reflects a basic misunderstanding of the MVP's goals and purpose, and also of the broader dynamics of scale-up happening throughout Africa. Progress is indeed occurring throughout rural Africa, because the same kinds of interventions (e.g. bed nets, medicines, mobile-based health care) are being taken up more widely. However, progress within the MVP sites is likely faster and more inclusive because the interventions are better financed, better delivered, and more systematically managed. That Clemens and Demombynes show changes within MVP sites generally outpacing national trends supports the hypothesis that additional intensification of scale-up will only further accelerate progress. The fact that the local leadership at the MV sites continues to play a constructive role in supporting the scale-up of those interventions only strengthens the point.
Consider an example. The MVP introduces agricultural support programs in Year 1 to provide 100kg of subsidized fertilizer and 10kg of modern seeds to 10,000 local farmers, in a country where most farmers use less than 10kg of fertilizer per household and very few have access to modern seeds. In the first season, the MVP farmers increase their yield by 2 tons per hectare. The national government then decides that it wants to provide similar support to every smallholder farm household in the country, and is successful in mobilizing ODA support to do so, but the country can only afford to subsidize 50kg to each farmer along with 5kg of seeds. The result is that the national average yield increases by 1 ton per hectare.
In this example, a simple difference-in-difference model could be falsely interpreted as showing that the MVP interventions increased the yield by only 1 ton, and that to claim a 2 ton increase overstates the results of the interventions. This is obviously an erroneous conclusion. The MVP intervention (100kg of fertilizer and 10kg of seed) is working exactly as advertised: raising yields by 2 tons. Outside of the villages, half the intervention package is producing half the results. There is no “over-reporting” of results, merely an increased treatment dose in the MVs, which is exactly the correct point.
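The stylized arithmetic above can be written out explicitly. The gains (2 tons/ha in the MVP, 1 ton/ha nationally) come from the example; the baseline yield levels are assumed purely for concreteness.

```python
# Stylized difference-in-difference (DiD) arithmetic for the fertilizer example.
# All figures are hypothetical, taken from the illustration in the text;
# baseline yields are assumed for concreteness.

mvp_baseline = 1.0       # tons/ha before the intervention (assumed)
mvp_year1 = 3.0          # full package (100kg fertilizer, 10kg seed): +2 tons/ha

national_baseline = 1.0  # tons/ha before the national program (assumed)
national_year1 = 2.0     # half-dose package (50kg fertilizer, 5kg seed): +1 ton/ha

mvp_gain = mvp_year1 - mvp_baseline                  # 2.0 tons/ha
national_gain = national_year1 - national_baseline   # 1.0 ton/ha

# A naive DiD estimate subtracts the national trend from the MVP trend:
did_estimate = mvp_gain - national_gain              # 1.0 ton/ha

# But the national "trend" is itself driven by a scaled-down version of the
# same intervention, so the DiD figure understates the effect of the full
# package: the full intervention really raised MVP yields by 2 tons/ha.
print(f"MVP gain: {mvp_gain} t/ha; naive DiD estimate: {did_estimate} t/ha")
```

The point of the sketch is that when the comparison group receives a partial dose of the same treatment, a difference-in-difference estimate measures the marginal effect of the extra dose, not the full effect of the intervention package.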
It is important to note that the MVP would be entirely successful if, at the end of a 10-year horizon, its lessons were fully scaled up nationwide, and there were no statistically significant differences between the progress indicators in the MVs and in national averages, with the MV-type interventions being fully implemented nationwide.
This logic is crucial in light of the widespread scaling up of MDG-type interventions that has taken place throughout Africa since the MVP launched across 10 countries in 2006. Development assistance for malaria, AIDS, TB, immunizations, and several other interventions has increased markedly, and the results have been very positive where these interventions have been applied.
On Cluster Randomized Trials
Clemens and Demombynes' main recommendation is that the Millennium Villages should be subject to nothing short of a cluster randomized trial (CRT) with a minimum of 20 pairs of clusters. While a relative latecomer to development economics, cluster randomized designs have been a common feature of public health evaluations for nearly three decades.1 A CRT is a useful methodology for evaluating discrete interventions delivered at the level of groups or populations – such as schools, workplaces or wider communities. The design allows for an assessment of intervention effects by comparing randomly selected groups that receive the intervention with groups that do not.
While CRTs have well-acknowledged limitations2, they are most useful when two important criteria are met. First, the intervention tested should be unproven. Ideally, the intervention should be simple, focused, and standardized, with a clear pathway linking exposure and outcome (e.g. testing the effect of community-wide treatment for intestinal parasites on levels of infection, or the effect of seat-belt laws on motor vehicle deaths). CRT designs are less well suited to the evaluation of complex, adaptive systems such as the Millennium Villages, with multi-component interventions operating across numerous sectors. Second, the optimal scientific scenario for a CRT is one in which comparison communities are untouched by the intervention – meaning they are not in any way exposed to the program being evaluated. If these conditions are met, the CRT design can provide an unbiased assessment of intervention effects.
There is growing skepticism regarding the appropriateness of the CRT design for the evaluation of large-scale health and development programs in low-income countries.3 There are fairly straightforward reasons for this. First, progress towards the MDG targets is less about designing novel interventions and technologies, and much more about creating effective local systems to put these proven interventions into practice. The main research question is not simply “does it work?”, but rather how to overcome complex implementation and financial challenges in a diverse range of poor and hard-to-reach communities. The CRT design does very little to help answer this question. Second, untouched comparison communities, if they ever existed, are unlikely to exist now. In real-world settings, pure “controlled conditions” for well-established interventions are impractical, and in this light, the relevance of applying CRT evaluation methodologies is much more limited. Commenting on these issues, a recent Lancet article argues that “a reductionist approach to evaluation based on isolation of program effects is no longer appropriate for scaling up of initiatives to reach the MDGs in most low-income countries.”3
4. Should we wait 15 years?
Clemens and Demombynes also suggest that efforts to take interventions to scale should wait at least 15 years, until long-term effects can be demonstrated and sustained. This assertion cannot be taken seriously. There is a deep, established, proven record of what needs to be done in many crucial areas. The most pressing issues are to design effective systems and financing for implementation. It would be the height of folly to delay the introduction of long-lasting insecticide-treated bed nets, antiretroviral medicines, high-yield seed varieties, micro-finance, fertilizers, emergency obstetrical care, and the dozens of other proven interventions. The key is to find ways to implement them with systematic monitoring, evaluation, budgets, management, scale-up and feedback. Those are the core purposes of the MVP. Economists like Clemens and Demombynes should stop believing that the alleviation of suffering needs to wait for their cluster randomized trials. Fifty years of research in countless areas of agriculture, health, education, infrastructure, and business development have proven many effective interventions. The key now is to apply this knowledge systematically and at scale. We need a systems approach for that purpose, and the MVP provides the foundations for exactly such an approach.
The Millennium Villages present one important and constructive initiative to design effective local systems to achieve the MDGs in impoverished rural Africa. The project is not a rigid blueprint, but rather a multi-faceted and community-based design process, intended to generate scalable lessons, processes, tools and systems, all subject to detailed and careful documentation. The various sites are showing that significant and rapid progress can be made at a modest and scalable cost, and that there are likely synergies to working across multiple sectors simultaneously. The progress in the MVs is much faster than in the rural areas outside of the MVs. The project is shedding considerable light, through carefully documented evidence, on how rural communities in Africa can make rapid progress towards achieving the MDGs.
1 Blackburn, H. (1983). “Research and demonstration projects in community cardiovascular disease prevention.” Journal of Public Health Policy 4: 398-422.
2 Sorenson, G., K. Emmons, et al. (1998). “Implications of the Results of Community Intervention Trials.” Annual Review of Public Health 19: 379-416.
3 Victora, C. G., R. E. Black, et al. (2010). “Measuring impact in the Millennium Development Goal era and beyond: a new approach to large-scale effectiveness evaluations.” The Lancet, published online July 8, 2010 (PMID: 20619886).
Dr. Paul Pronyk is the Director of Monitoring and Evaluation for the Millennium Villages Project at the Earth Institute, Columbia University. He is based in New York.
Dr. John McArthur is the CEO and Executive Director of Millennium Promise. He is based in New York. Follow John on Twitter @mcarthur
Dr. Prabhjot Singh is Assistant Professor of International and Public Affairs and leads the System Design Group of the Millennium Villages Project. He is based at the Center on Globalization and Sustainable Development, the Earth Institute, Columbia University, New York.
Dr. Jeffrey Sachs is the Director of The Earth Institute, Quetelet Professor of Sustainable Development, and Professor of Health Policy and Management at Columbia University. Dr. Sachs is also President and Co-Founder of Millennium Promise.