Impact evaluations are essential studies aimed at determining whether observed changes following an intervention were indeed caused by that intervention, rather than by external factors (Khandker et al., 2010, p. 7). These evaluations can be conducted in various contexts and serve multiple purposes, generally involving a comparison of actual outcomes to those that would likely have occurred without the intervention. According to a report by the European Commission, the primary goal of impact evaluations is to “enable policymakers to rule out alternative explanations for changes in circumstances or accomplishments that might be observed” (2012, p. 1). Additionally, they provide evidence that allows policymakers to assess the effectiveness of different interventions and compare their relative performance (ibid., p. 1). Thus, impact evaluations help answer the crucial question of “what works?” in the realm of interventions, guiding decisions about which programs to support and which to abandon due to ineffectiveness or lack of cost-efficiency.
Researchers emphasize the significance of impact evaluations in informing decisions related to program and policy development. Campbell (1979) posits that reforms function as experiments, with evaluations serving to measure their effectiveness akin to experimental results. Tilley (2000) notes that impact evaluations provide policymakers with the necessary time and space to understand the effects of a program before making large-scale decisions about its future or scope. To fulfill these purposes, all impact evaluation studies aim to investigate cause and effect, seeking changes or outcomes that can be directly attributed to an intervention or treatment.
Despite the consensus on the importance of impact evaluations, the evaluation discipline continues to debate the methods through which researchers can investigate and attribute causation. Experimental methods, particularly those employing a “counterfactual” to establish cause, have gained popularity (Scriven, 2008). Randomized Control Trials (RCTs), a specific type of experimental evaluation, are often regarded as the “gold standard” for attributing causation (Tilley, 2000). Conversely, other experimental designs, including various quasi-experimental approaches, are sometimes viewed as less rigorous or acceptable within certain evaluation and policy circles (Scriven, 2008). This trend has led to what Scriven describes as “the exclusionary policy” (2008), where programs not supported by RCTs frequently struggle to secure approval or funding. Prominent evaluators have criticized this policy, advocating for a dismissal of methodological absolutism in favor of aligning appropriate methods with the task at hand (Morén & Blom, 2003). The belief that rigor is exclusive to experimental designs is gradually shifting toward a recognition of diverse, robust methodologies. Evaluators are encouraged to identify the most suitable and feasible approach for each specific circumstance. Scriven (2008, p. 23) asserts that instead of seeking a single optimal research method, “the optimal procedure is simply to require very high standards in matching designs from the wide range that can yield high confidence levels, to the problem and the resources available.”
This paper aims not to engage in a simplistic argument for or against experimental methods but rather to explore the conditions under which such methods may be inappropriate for impact evaluations. It first examines the feasibility of RCTs in various scenarios, followed by a discussion on the generalizability of results. Additionally, it considers the position of “realistic evaluation” as an alternative to the perceived superiority of experimental methods. While RCTs can be effective under ideal circumstances, alternative approaches, such as quasi-experimental designs, may be better suited for informing decision-making in many real-world situations.
Critics of RCTs highlight the challenges in conducting trials that adhere to best practices in specific contexts (Scriven, 2008). One common issue arises when establishing a suitable matched control group is unfeasible, excessively costly, or unethical (Chen & Rossi, 1987). Control groups are critical for establishing a counterfactual, which is essential for inferring causation through experimental methods. A counterfactual entails comparing outcomes following an intervention with those that would have occurred in its absence. By creating equivalent experimental and control groups and applying the intervention solely to the experimental group, evaluators can compare the effects in each group. If sufficient comparability between the groups can be established prior to the intervention (ideally through random allocation), any observed differences post-intervention can be attributed to the intervention itself.
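As a rough numerical illustration of this counterfactual logic (not drawn from any of the studies cited here), the following Python sketch simulates a randomly allocated treatment and control group and recovers the intervention's effect as the difference in group means. All figures are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical example: 200 units randomly allocated to treatment or control.
n = 200
treated = rng.random(n) < 0.5  # random allocation makes the two groups comparable on average

# Simulated outcomes: a shared baseline plus noise, with an extra +2.0 for treated units.
baseline = rng.normal(loc=50.0, scale=5.0, size=n)
outcome = baseline + np.where(treated, 2.0, 0.0) + rng.normal(scale=1.0, size=n)

# Because allocation was random, the control group approximates the counterfactual,
# so the difference in group means estimates the intervention's effect.
effect_estimate = outcome[treated].mean() - outcome[~treated].mean()
print(f"Estimated effect: {effect_estimate:.2f} (simulated true effect: 2.00)")
```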
However, random allocation for matched control groups is not always feasible. One solution to this challenge is the use of various quasi-experimental designs that do not rely on randomization to form control groups. Such designs can enhance the feasibility of an experimental approach by creatively establishing meaningful comparisons and baselines, controlling for threats to internal validity, and producing rigorous causal claims. For example, in an interrupted time series design, the effects of an intervention are assessed by examining changes in measurements before and after its implementation (Penfold & Zhang, 2013). This method controls for participant differences that may influence the observed effects since individual participants serve as their own control. Additionally, this approach mitigates biases that may arise from control group designs, such as environmental factors that differentially affect treatment and control groups due to their distinct locations or contexts (Tilley, 2000).
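The segmented-regression logic commonly used in interrupted time series analysis can likewise be sketched with invented data: the pre-intervention trend serves as the series' own counterfactual, and the coefficient on the post-intervention indicator estimates the level change attributable to the intervention. The sketch below is illustrative only, not a reconstruction of any cited study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical monthly series: 24 pre-intervention and 24 post-intervention observations.
months = np.arange(48)
post = (months >= 24).astype(float)           # 1 after the intervention, 0 before
time_since = np.where(post == 1, months - 24, 0)

# Simulated outcome: a gentle upward trend with a level drop of 4 units at the intervention.
y = 30 + 0.2 * months - 4.0 * post + rng.normal(scale=1.0, size=48)

# Segmented regression: intercept, pre-intervention trend, level change at the
# intervention, and change in trend afterwards. The series acts as its own control.
X = np.column_stack([np.ones(48), months, post, time_since])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"Estimated level change at the intervention: {coefs[2]:.2f} (simulated: -4.00)")
```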
Quasi-experimental designs are therefore well suited to many settings where random allocation is impractical. The question is not simply whether experimentation is appropriate, but which form of experimental design best fits the purpose, including an assessment of its feasibility.

A second critique of experimental designs concerns generalizability. Researchers have questioned whether the effects of an intervention observed in one context can be expected to hold in other contexts with differing circumstances. Morén and Blom argue that “randomized controlled trials do not take contexts into much consideration” (2003, p. 38), suggesting that what proves effective in one situation may not be applicable elsewhere. Similarly, Stern et al. (2012) assert that while experiments can answer the question “did it work here?”, they often fail to address “will it work for us elsewhere?”
The contexts where generalizability poses a significant challenge for impact evaluations often involve strong mediating factors. This is especially true for many social programs in the realms of justice, social work, education, and development. Stern (2012) posits that only about 5% of development programs are suitable for RCTs. Morén and Blom (2003) contend that experimental approaches frequently overlook the complex and dynamic nature of social work and development practices, which occur under what they term “open conditions.” They argue that these intricate client-worker relationships should not be viewed as confounding variables to be controlled but rather as vital components of an intervention that need to be harnessed. In contrast to the pharmaceutical realm, where the impact of a drug can be more straightforwardly measured, social programs are not simple “treatments” administered to passive recipients (Chen & Rossi, 1983). As Pawson (2002, p. 215) states, “Programs do not ‘work’; rather, it is the action of stakeholders that makes them work, and the causal potential of an initiative takes the form of providing reasons and resources to enable participants to change.”
Tilley (2000) provides a compelling case study that illustrates the pitfalls of inappropriately applying an experimental design in a complex social situation. He reviews Sherman’s work on a mandatory arrest policy designed to reduce domestic violence (Sherman, 1992). In this study, a randomized control group was established, revealing that repeat assaults were lower in the intervention group compared to the control group. Consequently, many American cities adopted the mandatory arrest policy to mitigate repeat assaults. However, subsequent implementation in various cities revealed that the intervention did not yield the same reduction in re-offending everywhere. Sherman suggested that the mixed results could be attributed to different causal mechanisms influenced by varying economic and community conditions. For instance, in areas with high employment, an arrest might instill shame in the perpetrator, reducing the likelihood of re-offending. In contrast, in communities with lower employment and stability, arrests could provoke anger, leading to higher rates of repeat offenses. This case exemplifies how the experimental approach established a cause-and-effect relationship but failed to capture the mediating mechanisms, resulting in a generalized policy that was inappropriate for certain cities.
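The same point can be made numerically. The following sketch uses entirely invented figures (not Sherman's data) to simulate an intervention whose effect has opposite signs in two contexts; the pooled experimental estimate lands near zero, concealing both context-specific effects.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical illustration (invented numbers, not Sherman's data): the same intervention
# has opposite effects in two community contexts.
n = 1000
high_employment = rng.random(n) < 0.5
treated = rng.random(n) < 0.5

# Simulated change in re-offending: the intervention lowers it where employment is high
# (-0.3) but raises it where employment is low (+0.3).
context_effect = np.where(high_employment, -0.3, 0.3)
reoffending = 1.0 + np.where(treated, context_effect, 0.0) + rng.normal(scale=0.2, size=n)

# A pooled comparison averages the two contexts and lands near zero,
# while splitting by context recovers the opposing effects.
pooled = reoffending[treated].mean() - reoffending[~treated].mean()
print(f"Pooled estimate: {pooled:+.2f}")
for label, mask in [("high employment", high_employment), ("low employment", ~high_employment)]:
    sub = reoffending[mask & treated].mean() - reoffending[mask & ~treated].mean()
    print(f"{label}: {sub:+.2f}")
```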
Tilley challenges the traditional experimental model's notion of causation, advocating for a more nuanced understanding of social situations. He argues that cause and effect cannot be simplified into straightforward mechanisms. As Morén and Blom (2003, p. 37) assert, “Clients do not react like billiard balls that are hit; rather, interventions are always mediated by and through clients’ responses and choices.” Consequently, many evaluators working with social programs favor a generative approach to impact evaluation. Pawson and Tilley (1997) refer to this as realistic evaluation. They contend that the complexities and context sensitivity of social interventions mean that even basic interventions may not yield the same effects across different contexts. While traditional experimentation poses the question, “Does this work?” or “What works?”, realistic evaluation asks, “What works for whom in what circumstances?” (Pawson, 2002). Realistic evaluators assume that interventions will have varying impacts depending on the conditions in which they are implemented, with consistency not necessarily being the goal. The primary challenge for evaluation research is to identify how and under what conditions a given measure will produce its effects. By gaining a detailed understanding of how interventions can yield different impacts across varying circumstances, evaluators empower policymakers to make informed decisions about which policies to implement in specific contexts. This understanding also aids policymakers in improving or advancing towards desired outcomes. Together, these two objectives represent the practical rationale behind the need to generalize findings in the first place (Tilley, 2000).
Experimental approaches to evaluation, like any methodology, possess numerous strengths. When feasible, an experimental impact evaluation effectively addresses the question: “Has this particular intervention made a difference here?” However, real-world conditions do not always permit the use of counterfactuals for causal analysis. Additionally, these conditions often hinder the ability of experimental designs to achieve generalizability. This issue is particularly pronounced when a program aims to produce complex social effects within intricate social environments. In such cases, a realist approach to impact evaluation may prove more practical, allowing researchers and policymakers to gain a comprehensive understanding of the causal mechanisms underlying the cause-and-effect relationship and how these mechanisms may vary across different contexts.
| Methodological Approach | Strengths | Limitations |
|---|---|---|
| Randomized Control Trials (RCTs) | High internal validity, strong causal inference | Feasibility issues, generalizability concerns |
| Quasi-Experimental Designs | Greater feasibility, context-sensitive | Potentially lower internal validity |
| Realistic Evaluation | Focus on context and mechanisms, flexible | Complexity in analysis, requires detailed understanding |