Appendix B: Designs With and Without Control Groups

The designs in the main body of the text focus on data for the group of people or the area receiving the response. To determine whether the response is the cause of a drop in the problem, it is helpful to use a control group. Also, control groups are critical to obtaining reasonable estimates of the amount of spatial displacement and diffusion of benefits (see Problem-Solving Tools Guide No. 10, Analyzing Crime Displacement and Diffusion). Control groups can be added to either the pre-post design or the time series design.

In this appendix, we will look at five designs, including the two examined in the body of this guide. We will use data from an evaluation of a problem-solving effort to reduce injurious and fatal vehicle crashes in Cincinnati. The evaluation used a multiple time series design and a very complex statistical analysis process to get a precise estimate of the number of lives saved and injuries averted by implementing a response to injury-related traffic accidents. The authors found that such accidents declined 5.7 to 10.3 percent in Cincinnati compared to the comparison areas.9 This evaluation was possible because of a long-standing partnership between the Cincinnati Police Department and the University of Cincinnati’s Institute of Crime Science (based within the School of Criminal Justice).m

Here, we will not replicate the analysis conducted in the published paper. Instead, we will use the data to illustrate how conclusions about the effectiveness of the response can change, depending on the evaluation design used. We will start by using the data on Cincinnati traffic accidents to illustrate a design that should not be used: a static comparison group. This will be our baseline. We will then show why the pre-post design is an improvement. Then we will show why a control group is useful. Following this, we will return to the time series design. We will conclude by showing a time series design with a control group. This brief tutorial is an introduction to evaluation designs, meant only to illustrate their basic logic.

Static Comparison Design

Let’s assume that a year after the Cincinnati Police Department’s Traffic Division launched its response, which was meant to decrease the number of injury-related traffic crashes, you are asked to determine whether it made a difference. A common design, which is not recommended, is to compare the number of injury accidents in Cincinnati to the numbers of those in nearby jurisdictions which did not use the response. The logic is that these nearby agencies would be exposed to the same traffic conditions and drivers, so they should have a similar level of accidents. That is, you are assuming that if the response worked, Cincinnati should have fewer accidents than the comparison, and that if Cincinnati had not used the response, its level of accidents would be similar to that of the comparison area.

Figure B1 shows the results. Cincinnati is contained within Hamilton County, so Hamilton County (without Cincinnati) is the comparison. Dividing the number of accidents over a 12-month period by the driving population of each jurisdiction (or by road miles driven in each area) would control for population differences. We do not do this here, for a simple reason: the principal problem with this design is that the comparison area is systematically different from the response area (they have different driving populations, one area has more highways than the other, the population is older in one area than in the other, and so on). Population is just one of many ways in which the two areas can differ, so adjusting for it alone would not make them comparable.

Figure B1: Static Comparison Design

Static Comparison Design

In a Static Comparison Design, you compare the problem in an area or group that received a response to a similar area or group that did not. The time period for both is after the response.

The area or group not receiving a response provides an indicator of the level of the problem in the response area or group, if the response had not been applied.

Here, the Cincinnati Police Department’s response takes place only within the city. Comparing the 12 months following the response in Cincinnati to the same 12 months in the surrounding county shows fewer injury accidents in the city.

Though this seems to show that the response works, it is weak evidence, because Cincinnati usually has fewer injury accidents than the county.

You should avoid using this type of evaluation design as it has a high risk of producing misleading results. How misleading can be appreciated by comparing the results in Figure B1 to the results in the next set of figures, which illustrate better designs.
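The weakness of the static comparison can be seen with a little arithmetic. The sketch below uses the 12-month counts reported in Table B1 of this appendix and assumes they cover the same periods as the figures; the variable and function names are illustrative only.

```python
# 12-month injury-crash counts as reported in Table B1 of this appendix.
counts = {
    "county": {"before": 3421, "after": 3257},
    "cincinnati": {"before": 3215, "after": 2840},
}


def gap(period):
    """County count minus Cincinnati count for one 12-month period."""
    return counts["county"][period] - counts["cincinnati"][period]


static_gap = gap("after")   # all that a static comparison can see
prior_gap = gap("before")   # what a static comparison ignores

print(f"After-only gap (county minus city): {static_gap}")
print(f"Gap before the response began:      {prior_gap}")
```

Because the city already had fewer injury crashes than the county before the response began, the after-only gap cannot be read as the effect of the response.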

Pre-Post Without a Control Group Design

We discussed this design in the main body of the guide, so we will revisit it only briefly here. Figure B2 shows the results of the evaluation of the response that was designed to lower the incidence of traffic injuries in Cincinnati. The comparison is between the 12 months before the response and the 12 months after. We use a full-year comparison because it controls for seasonal changes in accidents. A shorter period (e.g., the September before the response compared to the September after it) is highly susceptible to random changes in accidents that a response cannot address. With this design, we act as if the before measure is an accurate indicator of the number of accidents Cincinnati would have had if no response had been applied. Therefore, the difference between the before and after measures of the problem is an indicator of the reduction due to the response.

Pre-post designs are simple. They are most useful if your principal interest is in determining whether the problem did decline and you are not going to make a strong claim that the response was a major cause of the decline.
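As a minimal sketch of the pre-post calculation, the function below computes the percent change from a before count to an after count. The Cincinnati counts are the ones reported in Table B1 of this appendix; the function name is our own and is not part of the original evaluation.

```python
def percent_change(before, after):
    """Percent change from the before count to the after count."""
    if before == 0:
        raise ValueError("before count must be nonzero")
    return (after - before) / before * 100.0


# Cincinnati injury crashes: 12 months before vs. 12 months after.
cincinnati_change = percent_change(3215, 2840)
print(f"Cincinnati pre-post change: {cincinnati_change:.2f}%")
```

A negative result indicates a decline. On its own, this figure says only that the problem dropped, not why it dropped.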

Pre-Post With a Control Group Design

If we combine the static comparison design’s use of a control with the pre-post design’s use of a pre-response measure of the problem, we can improve the evaluation. The control area or group does not receive the response, even though it has a problem similar to that of the area or group that receives the response. The purpose of the control group is to demonstrate what would have occurred if no response had been taken. Knowing this can help you eliminate some alternative explanations for the decline in the problem.

This design is illustrated in Figure B3. Here we see that the county outside Cincinnati had a decline in injury-causing vehicle crashes from before to after the response inside Cincinnati. This indicates that even without a response, Cincinnati might have experienced a similar decline. However, Cincinnati’s decline in vehicle-injury accidents is more than twice the decline in Hamilton County (11.6% versus 4.8%). This suggests that the response in Cincinnati added to the general decline.

Figure B2: Pre-Post Design

Pre-Post Design

In a Pre-post Design (also called a before-after design), you compare the level of the problem after the response to the level of the problem before the response. You do not use a comparison, or control, area. You assume that the before measurement of the problem is a reasonable indicator of what the problem would be like if you had not used the response. So a meaningful decline in the problem from before to after is an indicator of success.

Here, the number of crashes with injuries is higher before the Cincinnati Police Department’s response than it is after. This is consistent with an effective response.

The limitations are obvious. First, something else could have occurred at about the same time as the response that could have caused the decline in accidents. Second, it is possible that injury-related accidents were trending downward before the response was implemented.

Figure B3: Pre-Post With Control Design

Pre-Post With Control Design

This design combines features of the static comparison and pre-post designs. You compare the before-after differences of the two groups. Here, injury accidents in Cincinnati declined 11.6% from before the response to after. The Hamilton County injury accidents declined only 4.8%. We use percent decline because the two areas had different numbers of accidents before the response.

The Hamilton County decline (4.8%) is assumed to be the decline Cincinnati would have experienced if it had not engaged in a problem-solving response to injury accidents. To determine the impact of the response, we subtract 4.8 from 11.6 (since the 4.8% decline would presumably have occurred whether or not a response was implemented). The Cincinnati police response may have produced a decline of 6.8 percentage points in injury-related vehicle crashes.

The reason this design is better than the static comparison and pre-post designs is that it removes from consideration many alternative causes for the Cincinnati decline. For example, changes in state law or gas prices would affect both the county and the city, so they cannot explain the different rates of decline. Similarly, if injury accidents were trending downward throughout the state, this too would influence both jurisdictions, so it cannot explain the difference.

The principal limitation of this design is that there might have been something (other than the response) that occurred in Cincinnati and not the county (or vice versa) that pushed injury accidents down in the city.

Whereas in a pre-post design effectiveness is measured by calculating the percent change, when a control group is used we compare the difference between the percent declines, as illustrated for this example in Table B1. Here we see that the control area had a reduction of 164 crashes, which, when divided by the before number (3,421), is a 4.8 percent drop. Cincinnati had a reduction of 375, which, when divided by the before number (3,215), is an 11.6 percent drop. Subtracting the percent decline in the control from the percent decline in the response yields a net reduction of 6.8 percentage points (dividing -6.8 by -4.8 gives about 1.42, which shows that the Cincinnati drop was about 142 percent greater than the county’s drop, or roughly 2.4 times as large).

Table B1: Calculating Effectiveness With a Pre-post With Control Design
                          Before    After    Difference    % Difference
Control (County)           3,421    3,257          -164            -4.8
Response (Cincinnati)      3,215    2,840          -375           -11.6

Response % difference - Control % difference                       -6.8
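The arithmetic behind Table B1 can be sketched in a few lines. The counts below are those reported in the table; the class and variable names are our own. Carrying more decimal places gives changes of about -4.79 and -11.66 percent and a net difference of about -6.87 points, so the table's figures reflect rounding.

```python
from dataclasses import dataclass


@dataclass
class Area:
    """One row of a pre-post with control table."""
    name: str
    before: int
    after: int

    @property
    def difference(self) -> int:
        return self.after - self.before

    @property
    def percent_change(self) -> float:
        return self.difference / self.before * 100.0


control = Area("Control (County)", 3421, 3257)
response = Area("Response (Cincinnati)", 3215, 2840)

# Net effect: response percent change minus control percent change.
net_effect = response.percent_change - control.percent_change
# How many times larger the response-area decline is than the control's.
ratio = response.percent_change / control.percent_change

for area in (control, response):
    print(f"{area.name:<22} {area.difference:>5} {area.percent_change:>7.2f}%")
print(f"Net difference: {net_effect:.2f} percentage points")
print(f"Response decline is {ratio:.1f} times the control decline")
```

Percent changes are compared, rather than raw differences, because the two areas started from different numbers of crashes.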

Time Series Design

This design was also discussed in the main body of the guide. Figure B4 shows the time series for the Cincinnati vehicle crashes with injuries. The horizontal dashed lines indicate the average (mean) number of such crashes per month, before and after the response. The before data are used to determine what might have occurred in Cincinnati if no response had been made. Here a simple comparison of these averages suggests an effective response. Typically, an analyst will use a more complex statistical procedure to remove the effects of trends (here, a downward trend) and seasonal cycles. This gives a more precise estimate of the impact. However, this type of analysis is far beyond what can be explained in this introductory guide.

Figure B4: Time Series Design

Time Series Design

A time series design extends the pre- and post-response measurement periods. Instead of comparing two twelve-month periods, here we look at the monthly changes for over seven years. This allows us to see trends and cycles. Two things become obvious. First, the number of accidents jumps up and down a great deal. This is often the case with police problems, and is the reason we should be wary of month-to-month comparisons: random fluctuations will dominate any systematic changes. Prevention is aimed at producing a systematic change, which the random fluctuation can hide. The second thing we can see is that there is a slow downward trend from May 2003 to around May 2009. So even if the Cincinnati police had done nothing new, injury accidents would probably have continued to go down.

The vertical line shows when the response began. Before the response, there were 287 crashes in an average month (dashed line). After the response, there were only 220 crashes in an average month. Typically, a complex statistical analysis of these data is used to tease out the impact of the response. Time series analysis can distinguish the real response effect from trends and seasonal cycles, as well as from other measurable factors (e.g., changes in gas prices).
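The simple comparison of monthly averages described above can be sketched as follows. This is only the before-versus-after mean comparison, not the more complex time series analysis used in the published study. The function name and the monthly counts are hypothetical, invented purely for illustration; they are not the Cincinnati data.

```python
from statistics import mean


def mean_shift(monthly_counts, response_start):
    """Return (before_mean, after_mean, difference) for a monthly series.

    response_start is the index of the first month under the response.
    """
    before = monthly_counts[:response_start]
    after = monthly_counts[response_start:]
    if not before or not after:
        raise ValueError("need at least one month on each side of the response")
    before_mean, after_mean = mean(before), mean(after)
    return before_mean, after_mean, after_mean - before_mean


# HYPOTHETICAL monthly crash counts, invented for illustration only.
example = [300, 280, 290, 270, 230, 210, 220, 215]
before_mean, after_mean, diff = mean_shift(example, response_start=4)
print(f"Before: {before_mean}, after: {after_mean}, change: {diff}")
```

Applied to the averages reported for Cincinnati (287 before, 220 after), the same subtraction gives a change of -67 crashes per month. Because this comparison ignores the pre-existing downward trend, it tends to overstate the effect of the response.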

Multiple Time Series Designs

When two or more time series are used, the design is called a multiple time series. This design can rule out most alternative explanations for the change in the problem. Figure B5 illustrates a multiple time series. This example illustrates the usefulness of adding a control time series. If we had simply looked at Figure B4, we could legitimately have assumed that much of the decline in Cincinnati’s injury-related crashes was due to the downward trend that preceded the response. In Figure B5 we see that the trend influenced the surrounding county as well as the city. The difference in the average number of crashes per month between the city and county grew larger after the response: before, the county had an average of 35 more crashes than the city in a month; after, the county had an average of 42 more crashes than the city in a month. As with the simple time series design, analysts use highly complex statistical techniques to analyze these data. The study these data come from illustrates some of the complexity involved.

Figure B5: Multiple Time Series Design

Multiple Time Series Design

A multiple time series design adds one or more control areas to the analysis. This helps eliminate possible causes unrelated to the response. Here, both the city and county had very similar trends prior to the response, though the city tended to be consistently lower than the county in injury accidents (about 287 v. 322 crashes per month). After the response, the gap between the county and the city grew (220 v. 262 crashes per month). No factor common to both jurisdictions could account for this, so the only explanation is that something different occurred in Cincinnati, relative to the county. The response is one such explanation.

As with the time series design, complex statistical analysis is needed to separate the impact of the response from the impact of other factors and random fluctuation.
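The growing gap between city and county can be checked with a simple difference of differences on the monthly averages reported in this appendix. This is a rough sketch under our own variable names, and it is not the statistical model used in the published study: it ignores trends, seasonality, and random fluctuation.

```python
# Average injury crashes per month, as reported in this appendix.
city = {"before": 287, "after": 220}
county = {"before": 322, "after": 262}

city_change = city["after"] - city["before"]        # response area
county_change = county["after"] - county["before"]  # control area

# Change in the city beyond the change seen in the control area.
net_change = city_change - county_change

# Equivalent view: how far the county sits above the city in each period.
gap_before = county["before"] - city["before"]
gap_after = county["after"] - city["after"]

print(f"City change: {city_change}, county change: {county_change}")
print(f"Net change in the city: {net_change} crashes per month")
print(f"Gap before: {gap_before}, gap after: {gap_after}")
```

The net change and the widening of the gap are the same quantity seen two ways: the city's monthly average fell by about 7 more crashes than the county's.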

The principal advantage of using a multiple time series design is that it can eliminate a large number of alternative explanations for an improvement in the problem. The only possible alternative to the claim that the response caused the decline is that something occurred in Cincinnati at about the same time as the response was implemented, and this thing did not occur in the county (or it occurred in the county but not the city). So the results of a multiple time series design, though solid, are not certain. However, for practical purposes, these results probably exceed the level of certainty we need in order to consider the response to have been successful.

m Thanks to Drs. Nick Corsaro and Robin Engel of the Institute of Crime Science, within the School of Criminal Justice, University of Cincinnati, for making these data available. The Institute of Crime Science provides scientific consulting services to police and other law enforcement agencies, including complex evaluations.

n This design is usually referred to as a “non-equivalent control group design” to draw attention to the fact that members of the treatment (response) group and members of the control group may be different in important ways that could affect the outcome of the evaluation.