Impact evaluations are studies that seek to establish whether the changes observed following an intervention program were caused by the intervention rather than by other factors (Khandker et al., 2010, p. 7). They may be undertaken in a variety of contexts and for a variety of purposes, but in general they involve comparing the outcomes that were observed with the outcomes that would be expected to have prevailed had the program not been undertaken. A report by the European Commission describes the purpose of impact evaluations as being to “enable policymakers to rule out alternative explanations for changes in circumstances or accomplishments that might be observed” (2012, p. 1), and to provide evidence that “enables policymakers to assess the effectiveness of interventions, and moreover, make comparisons between interventions and assess their relative performance” (ibid., p. 1). In other words, impact evaluations are a way for policy makers to answer the question “what works?” when it comes to interventions, so that they can decide which interventions to invest in and which are counterproductive or simply not cost-effective.
Researchers also emphasise the importance of impact evaluations in informing decisions about program and policy development. Campbell argued that reforms can be treated as experiments, and that the purpose of evaluation is to test the effectiveness of reforms as one would measure the results of an experiment (Campbell, 1979). Tilley (2000) writes that impact evaluation gives policy makers time and space to learn about the effects of a program before making large-scale decisions about its future or how widely it should be rolled out. To fulfil these purposes, all impact evaluation studies attempt to investigate cause and effect: to look for changes or outcomes that are directly attributable to an intervention or treatment.
However, the evaluation discipline continues to debate the ways in which researchers might investigate and attribute cause. Experimental methods, which rely upon the use of a “counterfactual” to attribute cause, have become popular (Scriven, 2008). Randomised controlled trials (RCTs), a subset within the experimental evaluation tradition, have become idolised as “true experiments” or the “gold standard” for attributing cause (Tilley, 2000). Other designs, including the wide variety of quasi-experimental designs, have come to be viewed as less rigorous and less acceptable in certain circles within the evaluation and policy communities (Scriven, 2008). This trend has driven what Scriven refers to as “the exclusionary policy” (2008), under which programs and policies that are not supported by evidence from RCTs often fail to receive approval or funding. Some leading evaluators have condemned this policy and urged the discipline to abandon methodological absolutism in favour of matching appropriate methods to the task at hand (Morén & Blom, 2003). For some, the perception that rigour is to be found only in experimental designs and methods is giving way to an understanding that a plurality of viable, rigorous methodologies exists, and that it is the evaluator’s job to work out which one is appropriate and feasible for the task at hand, given the circumstances. Scriven argues that, rather than there being one optimal research method, “The optimal procedure is simply to require very high standards in matching designs from the wide range that can yield high confidence levels, to the problem and the resources available” (Scriven, 2008, p. 23).
In keeping with this movement, this paper does not seek to take part in a simplistic argument for or against experimental methods as a whole; instead, it considers the circumstances under which experimental methods may not be appropriate for an impact evaluation. It first addresses the feasibility of RCT designs in different scenarios, and then discusses the generalisability of results. It then considers “realistic evaluation” as an alternative to the assumed superiority of experimental methods. It concludes that the “gold standard” RCT design is an effective method for impact evaluation under ideal circumstances, but that alternative approaches such as quasi-experimental designs are more effective at informing decision-making in many of the situations that prevail in the real world.
Critics of RCTs point to how difficult it can be to conduct a randomised controlled trial that adheres to best practice in certain circumstances and policy contexts (Scriven, 2008). One of the more common circumstances in which experimental designs are difficult is when it is unfeasible, prohibitively costly, or unethical to establish a suitable matched control group (Chen & Rossi, 1987). Control groups are a way of establishing a counterfactual, which is the central mechanism through which experimental methods infer cause. A counterfactual comparison sets what happened after an intervention against what would have happened in its absence. By constructing equivalent experimental and control groups and applying the intervention to the experimental group only, experimental evaluators can compare outcomes across the two groups. If the two groups can be shown to have been sufficiently comparable before the intervention (ideally achieved by random allocation), any difference between them after the intervention can be attributed to that intervention.
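The counterfactual logic described above can be sketched as a small simulation. This is a hypothetical illustration, not drawn from any of the cited studies: the helper `simulate_trial`, the outcome model, the assumed effect size of 2.0 and the self-selection rule are all assumptions chosen for clarity. It contrasts random allocation with a scenario in which units with higher baselines select themselves into treatment.

```python
import random
import statistics


def simulate_trial(n=10_000, true_effect=2.0, randomise=True, seed=1):
    """Hypothetical trial: return treated-minus-control difference in mean outcomes.

    Each unit has a baseline level that also drives its outcome. With
    randomise=True, treatment is allocated by coin flip; with randomise=False,
    units with above-average baselines select themselves into treatment.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        baseline = rng.gauss(10.0, 3.0)  # pre-existing difference between units
        if randomise:
            gets_treatment = rng.random() < 0.5  # random allocation
        else:
            gets_treatment = baseline > 10.0  # self-selection on baseline
        outcome = baseline + rng.gauss(0.0, 1.0)
        if gets_treatment:
            treated.append(outcome + true_effect)
        else:
            control.append(outcome)
    # The control group's mean stands in for the counterfactual outcome.
    return statistics.mean(treated) - statistics.mean(control)


print(round(simulate_trial(randomise=True), 2))   # near the assumed effect of 2.0
print(round(simulate_trial(randomise=False), 2))  # inflated by baseline differences
```

Under random allocation the estimate should land close to the assumed effect. Under self-selection, the naive comparison also attributes the pre-existing baseline gap to the intervention, which is why evaluators need to show that the groups were comparable beforehand.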
However, establishing a matched control group through random allocation is not always possible. One solution to this feasibility challenge is to employ one of the many quasi-experimental designs that do not use randomisation to create control groups. These designs can improve the feasibility of an experimental approach by finding creative yet effective ways to establish meaningful comparisons and baselines, to control for threats to internal validity, and to produce rigorous causal claims. For example, in the interrupted time series design, the effects of an intervention are assessed from changes in repeated measurements taken before and after the intervention is implemented (Penfold & Zhang, 2013). This controls for differences between participants that may influence the observed effects, because individual participants act as their own control. The approach also avoids some forms of bias that affect control group designs, such as environmental factors that affect the control group differently from the treatment group because of their different locations or contexts (Tilley, 2000).
Quasi-experimental designs are therefore highly appropriate in many circumstances. The question is often less whether experimentation is an appropriate approach and more which form of experimental design is most fit for purpose, a judgement that must include an assessment of feasibility.
A second criticism of experimental designs relates to generalisability. Researchers have raised doubts about whether the effects of some types of intervention implemented in one context can be generalised to other contexts with different circumstances. Morén and Blom argue that “randomized controlled trials do not take contexts into much consideration” (2003, p. 38), which for them means that what works in one context may be useless in another. Accordingly, Stern et al. argue that experiments can answer the question “did it work here?” but not “will it work for us elsewhere?” (Stern et al., 2012).
The situations in which generalisability is a particular problem for impact evaluations are those in which context is a strong mediating factor. This is true for many social programs in the justice, social work, education and development sectors. Stern et al. (2012) write that perhaps as few as 5% of development programs are suitable for RCTs. Morén and Blom (2003) point out that experimental approaches often neglect the fact that social work and development practices are carried out under what they call “open conditions”. They argue that social work is highly contextual and involves complex and dynamic client-worker relationships, and that these are not confounding variables to be controlled for and homogenised by an experimental approach, but important factors in an intervention to be harnessed. While the impact of a drug might be measured relatively easily in the pharmaceutical world, social programs are not simple “treatments” or “dosages”, like medications, delivered to passive recipients (Chen & Rossi, 1983). As Pawson argues, “Programs do not ‘work’, rather it is the action of stakeholders that makes them work, and the causal potential of an initiative takes the form of providing reasons and resources to enable participants to change” (Pawson, 2002, p. 215).
Tilley (2000) reflects on a case study that illustrates the pitfalls of applying an experimental design inappropriately in a complex social situation. Tilley reviews Sherman’s work on a mandatory arrest policy aimed at reducing domestic violence (Sherman, 1992). In the study discussed, a randomised control group was constructed, and the evaluation found that repeat assaults were lower in the intervention group than in the control group. On the back of the study, many American cities were encouraged to adopt a mandatory arrest policy for domestic violence as a means of reducing repeat assaults. However, once the policy was implemented in other cities, it became clear that the intervention did not reduce re-offending everywhere. Sherman suggested that the mixed findings might be explained by different causal mechanisms operating under different economic and community conditions. He hypothesised that where employment was high, arrest might produce shame in the perpetrator, who was then less likely to re-offend. Where employment was lower and communities less stable, arrest was more likely to trigger anger in the perpetrator, which contributed to higher rates of repeat offending. Tilley (2000) uses this case as a clear example of a treatment whose effect varied by context: what produces an effect in one place may not necessarily produce it in another. Here, the experimental approach established a cause-and-effect relationship but failed to capture the mechanisms that mediated it. As a result, a policy was generalised to cities where it was inappropriate.
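The point that an effect can depend on context can be made concrete with a toy simulation. The numbers here are entirely invented and are not Sherman's data. The helper `trial_in_context`, the 30% baseline repeat-offence rate and the plus or minus 10 percentage point effects are hypothetical. They stand in for the two mechanisms Sherman hypothesised. Each context runs its own correctly randomised trial, yet the trials disagree in sign.

```python
import random


def trial_in_context(effect, n_per_arm=5_000, base_rate=0.30, seed=0):
    """Hypothetical randomised trial in one context. effect is the assumed
    change in the probability of a repeat offence caused by the intervention.
    Returns treated minus control repeat-offence rate."""
    rng = random.Random(seed)
    control = sum(rng.random() < base_rate for _ in range(n_per_arm))
    treated = sum(rng.random() < base_rate + effect for _ in range(n_per_arm))
    return (treated - control) / n_per_arm


# Assumed mechanisms, not real data: in context A the intervention
# lowers repeat offending; in context B it raises it.
contexts = {"A (shame mechanism)": -0.10, "B (anger mechanism)": +0.10}
estimates = {name: trial_in_context(eff, seed=i)
             for i, (name, eff) in enumerate(contexts.items())}
for name, est in estimates.items():
    print(f"{name}: {est:+.3f}")
# Averaging the two site estimates gives a value near zero that hides both effects.
print(f"pooled: {sum(estimates.values()) / len(estimates):+.3f}")
```

Both trials are internally valid, but neither result transfers to the other context. A single pooled figure would suggest that the intervention does nothing. This is the gap that the question "what works for whom in what circumstances?" is meant to address.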
Tilley puts forward a view of causation that is at odds with what he calls the traditional “successionist”, or closed-system, model of causality underlying experimental designs. Like many researchers, Tilley argues that in social situations cause and effect cannot be reduced to simplistic mechanisms. As Morén and Blom put it, “Clients do not react like billiard balls that are hit, rather, interventions are always mediated by and through clients’ responses and choices” (Morén & Blom, 2003, p. 37). As such, many evaluators who work with social programs prefer a more generative approach to impact evaluation, which Pawson and Tilley (1997) describe as realistic evaluation. For them, social interventions are so complex and so context-sensitive that even basic interventions may not have the same effect in different contexts. While traditional experimentation asks “Does this work?” or “What works?”, realistic evaluation asks “What works for whom in what circumstances?” (Pawson, 2002). Realistic evaluators begin with the assumption that interventions will vary in their impact depending on the conditions in which they are introduced; consistency is not necessarily the objective. The key problem for evaluation research is to find out how, and under what conditions, a given measure will produce its impacts. Armed with a detailed understanding of how interventions produce varying impacts in different circumstances, evaluators seek to help policy makers decide which policies to implement in which conditions. This understanding also helps policy makers to improve a program or take the next step towards a particular outcome. In combination, these two objectives are, in many ways, the practical goal behind the need to generalise findings in the first place (Tilley, 2000).
Experimental approaches to evaluation, like every methodology, have many strengths. When feasible, an experimental impact evaluation is very effective in answering the question “Has this particular intervention made a difference here?” However, conditions do not always allow the use of a counterfactual to perform causal analysis. Similarly, conditions often limit the extent to which the results of an experimental design can be generalised. This is particularly the case when a program aims to have a complex social effect in a complex social environment. Here, a realist approach to impact evaluation may be more practical. It allows researchers and policy makers to walk away with a detailed understanding of the causal mechanisms behind the cause-and-effect relationship, and of how these mechanisms may or may not be present in every context.