Policy pilots are often understood to be synonymous with their evaluation. The assumption is that a government would not initiate a pilot if the intention were not to evaluate it. This conflation is most obvious in the notion of the ‘policy experiment’, a term that carries connotations of measuring effectiveness through experimental methods, specifically by using randomised controlled trials (RCTs). In English health policy discourse, with its proximity to clinical medicine, the notion of policy experimentation becomes skewed towards using pilots as a way to assess policy effectiveness in quantifiable terms through RCTs.
Yet piloting and evaluation are two quite separate activities in policy terms. Piloting is about testing a policy in practice through its time-bound and geographically limited implementation. Evaluation is the study of this implementation and the effects it achieves, in the health field mostly (but not always) through independent academic research. Whether an experimental research design should be used in the evaluation thus depends on the purpose of the pilots, and not on the intrinsic merits of an experimental evaluation design, as my co-authors and I argue in a new paper in Evaluation.
Policy piloting can be driven by all sorts of motivations and serve a range of policy purposes, not all of them aimed at establishing policy effectiveness. Piloting can be motivated by policy-makers' desire to promote their policy in the hope that change in participating sites will ultimately spread more widely, or by the desire to create opportunities for identifying and overcoming implementation barriers. In some cases, pilots are used to encourage innovation and thus to generate promising new approaches where previous policies have failed (e.g. new ways of integrating health and social care in the face of frustration with previous attempts).
These motivations, while legitimate in their own right, have little in common with the notion of testing policy effectiveness in the abstract and under the assumption of genuine uncertainty (‘equipoise’) that is the hallmark of RCTs. While we contend that these other motivations are valid reasons for piloting in policy terms, they are often in conflict with evaluations that build on RCTs as the ‘gold standard’ for measuring effectiveness.
For policy-makers, there is a strong contemporary pull towards RCTs when commissioning evaluation. As evidence-informed policy has become the new orthodoxy in policy-making in England, policy-makers need to be seen to be obtaining and using evidence of ‘what works’. They show that they are doing so, in part, by initiating policy pilots and commissioning evaluations. However, conducting an RCT on an underdeveloped policy pilot risks producing spurious findings and is likely to be more symbolic than informative. It also risks directing attention away from the objective of identifying alternative or otherwise innovative policy approaches, if this is the aim of the pilot programme.
Therefore, the focus on RCTs and the like as the preferred method of measuring the effectiveness of pilots may not always be appropriate. There is a risk that the policy being piloted – given its novelty and uncertainties around implementation – is insufficiently developed to allow for measuring cause and effect at the scale required to produce meaningful conclusions. In addition, the type and scale of challenges to implementation and the degree of ‘system fit’ are often underestimated and are genuinely difficult to predict at the beginning of a pilot. The complexity of many policy initiatives means that there is often substantial uncertainty about how the policy can and should be implemented in practice. This uncertainty, however, is different from the uncertainty about policy effectiveness that RCTs are best at addressing. Finally, in many pilots it is not possible for the researcher to have the level of control over the intervention that is necessary for experimental evaluations to succeed, especially if these require researchers and managers to randomise service users and to provide services according to strict research protocols.
While RCTs have their place in informing policy and should certainly be considered when commissioning evaluations, it is questionable whether they are truly as compatible with policy piloting as policy-makers in the English healthcare field would like us to believe.
Ettelt S, Mays N, Allen P (2015): The Multiple Purposes of Policy Piloting and their Consequences: Three Examples from National Health and Social Care Policy in England. Journal of Social Policy 44 (2): 319-337.
Ettelt S, Mays N, Allen P (2015): Policy Experiments – Investigating Effectiveness or Confirming Direction. Evaluation 21 (3): 292-307.