# Conducting Impact Evaluations

There are two parts to impact evaluations. The first involves measuring the problem. The second involves systematically comparing changes in measures, using an evaluation design. Evaluation designs are created to provide the maximum evidence that the response was the primary cause of the change in the measure. Weak designs may be adequate for demonstrating that the problem declined, but they provide little assurance that the response caused the decline. Strong designs provide much greater assurance that the response caused the decline.

## Measures

Impact evaluations require measures of the problem before and after the response. You should start deciding how to measure the problem during the scanning stage, and have made final decisions about measures by the time you have completed the analysis. This will allow you to use information collected during the analysis to describe the problem before the response. During the assessment stage, you take measures of the problem after implementing the response. You use the same measures before and after the response. Clearly, you must plan the evaluation well in advance of the assessment.

### Quantitative Measures

Quantitative measures involve numbers. The number of burglaries in an apartment complex is a quantitative measure. You can count such measures before and after the response, and note the difference. Quantitative measures allow you to use math to estimate the response's impact. For example, you might find that burglaries dropped 10 percent from the period before the response to the period after it.
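The before-and-after arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `percent_change` and the burglary counts are hypothetical, chosen only to reproduce a 10 percent drop.

```python
def percent_change(before: float, after: float) -> float:
    """Return the percent change from `before` to `after` (negative means a decline)."""
    if before == 0:
        raise ValueError("percent change is undefined when the 'before' count is zero")
    return (after - before) * 100.0 / before


# Hypothetical counts: 40 burglaries in the year before the response, 36 in the year after.
change = percent_change(40, 36)
print(f"Change in burglaries: {change:.1f} percent")
```

A negative result indicates that the measure fell. On its own it says nothing about why it fell.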

### Qualitative Measures

Qualitative measures allow comparisons, but you cannot apply math to them. Though most evaluations use quantitative measures, qualitative measures can be extremely useful. Here is an example. Suppose you are trying to address a problem of gang-related violence in a neighborhood. From your analysis, you know that much of this violence stems from escalating turf disputes, and that graffiti is a useful indicator of intergang tension that can lead to violence.

You count the number of reported gunshots, gun injuries and gun fatalities in the year before and the year after the response. These are quantitative measures. You also take monthly photos of known graffiti hot spots both before and after the response. By comparing the photos, you note that before the response, gang graffiti was quite common, and non-gang graffiti was rare. Further, many of the markings suggested that rival gangs were overwriting each other's graffiti. After the response, you find there is little gang graffiti, but non-gang graffiti has increased. Further, there is no evidence of overwriting in the little gang graffiti that you do find. This qualitative information reinforces the quantitative information by indicating that the response may have reduced gang tensions, or that the gangs have declined.

Maps can provide another qualitative measure. They are very useful for showing crime and disorder patterns. Though the number of crimes is a quantitative measure, the size and shape of the crime patterns are largely qualitative. You can use changes in these patterns to assess the effectiveness of responses.

### Measurement Validity

You must make sure that quantitative and qualitative measures record the problem, and not something else. For example, counts of drug arrests are often better measures of police activity than of changes in a drug problem. You should use arrest data as a measure of the problem only if you are sure that police enforcement efforts and techniques have remained constant. Similarly, systematic covert surveillance of a drug-dealing hot spot before and after a response could be a valid measure if the surveillance has remained unchanged and undetected by drug dealers.

Measures are seldom definitively valid or invalid; rather, they are more or less valid than alternative measures. The more indirect the measure, the less valid. Surveillance entails direct observation. Arrest statistics are indirect. They involve the activities of the drug dealers and customers (the aspects of the problem you may be most interested in), but they also involve citizen decisions to bring the problem to police attention, and police decisions about whether (and how) to intervene. These citizen and police decisions may not always reflect the underlying reality of the problem. For example, changes in police overtime policies or the presence of special antidrug squads can change the number of arrests, even if the drug problem remains constant. For this reason, the number of drug dealer arrests is a less direct, and often poor, measure of a drug problem.

Here is another example of direct and indirect measures of a problem. In this example, what constitutes a direct measure and an indirect measure depends on how you define the problem. Suppose you are addressing a prostitution cruising problem. Men drive into a neighborhood on Friday and Saturday nights, looking for prostitutes to pick up. This annoys the residents, and they call the police. You have a choice of two measures for this problem.

The first is a quantitative measure taken from automatic traffic counters strategically placed on the critical streets three months before the response, and left there for three months after. These devices measure traffic flow. You use the difference between the average Friday and Saturday night traffic volume and the average volume during the rest of the week as an estimate of the traffic due to prostitution.
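The traffic-counter estimate above amounts to a difference of two averages. The following sketch shows one way to compute it, assuming the counter data has already been reduced to nightly totals. The function name `excess_weekend_traffic`, the data layout, and all counts are hypothetical.

```python
from statistics import mean


def excess_weekend_traffic(nightly_counts):
    """Estimate prostitution-related traffic as the average Friday/Saturday
    night volume minus the average volume on the other nights.

    nightly_counts: list of (weekday_index, vehicles) pairs, using Python's
    weekday convention (Monday=0 ... Sunday=6).
    """
    weekend = [n for day, n in nightly_counts if day in (4, 5)]  # Friday, Saturday
    rest = [n for day, n in nightly_counts if day not in (4, 5)]
    if not weekend or not rest:
        raise ValueError("need counts for both Friday/Saturday and other nights")
    return mean(weekend) - mean(rest)


# Hypothetical one-week samples taken before and after the response.
before = [(0, 100), (1, 110), (2, 90), (3, 100), (4, 220), (5, 240), (6, 100)]
after = [(0, 100), (1, 105), (2, 95), (3, 100), (4, 140), (5, 150), (6, 100)]

print("Excess before:", excess_weekend_traffic(before))
print("Excess after:", excess_weekend_traffic(after))
```

In practice you would feed in the full three months of counts on each side of the response rather than a single week.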

You base your second measure on interviews of residents conducted three months before and three months after the response. You ask residents to assess the prostitution problem, using a numerical scale (0 = none, 1 = minor, 2 = moderate, 3 = heavy).

If you have defined your problem as prostitution-related traffic, traffic volume is a more direct measure than residents' assessments. Not all of the difference between the Friday and Saturday traffic level and the level for the rest of the week is due to prostitution, but a large part of it probably is. So this is a reasonable approach to measuring the problem. Asking residents, however, is fraught with difficulties. Their current perceptions of prostitution may be colored by past observations. They may not see much of the prostitution traffic, particularly if they are staying indoors to avoid the problem. They may misperceive activities as prostitution-related, when they are not.

If, on the other hand, you have defined the problem as residents' perceptions of prostitution-related traffic, the interviews are a more direct measure than the traffic counts. Prostitution-related traffic may not have changed, but the residents think it has. By this measure, the response has been a success. But if prostitution-related traffic has declined precipitously, and the residents are unaware of it, then, by this measure, the response has not worked.

Of course, you can use multiple measures. In this example, you could measure both the prostitution-related traffic and the residents' perceptions of it. Only if both declined would you have an unambiguous success. If the traffic counters indicated a drop in traffic, but the interviews showed that the residents were unaware of it, then you could alter the response to address their perceptions.
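The logic of combining the two measures can be written out explicitly. The sketch below is only illustrative: `classify_outcome` and its return labels are hypothetical, and averaging the 0-to-3 interview ratings is a simplifying assumption made here, not something the guide prescribes.

```python
from statistics import mean


def classify_outcome(traffic_before, traffic_after, ratings_before, ratings_after):
    """Compare both measures before and after the response.

    traffic_*: estimated prostitution-related traffic (vehicles per night).
    ratings_*: residents' ratings on the 0 (none) to 3 (heavy) scale.
    """
    traffic_down = traffic_after < traffic_before
    # Assumption: the mean rating is used as a rough summary of perceptions.
    perception_down = mean(ratings_after) < mean(ratings_before)

    if traffic_down and perception_down:
        return "unambiguous success"
    if traffic_down:
        return "traffic declined, perceptions did not"
    if perception_down:
        return "perceptions improved, traffic did not decline"
    return "no decline on either measure"


# Hypothetical data: traffic fell sharply, but residents' ratings did not move.
print(classify_outcome(130, 45, [3, 3, 2, 3, 2], [3, 2, 3, 3, 2]))
```

The second outcome corresponds to the case described above, in which you could alter the response to address residents' perceptions.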

In addition to taking the most direct measure of the problem possible, you also need to make sure you measure the problem systematically and follow the same measurement process throughout the entire evaluation. If, after the response, you photograph graffiti hot spots from different angles and distances than those used before, then it will be difficult to make valid comparisons. If the hot spots you photograph after the response are not the same ones you photographed before, then the validity of your comparison is highly questionable. This is because any difference noted might be due to how you collected the data, rather than to a real change in the problem.

In short, you want to make sure that any difference noted in the problem is due to changes in the problem, and not to changes in the way you measured it. One way of thinking about this is to compare it with physical evidence-gathering at a crime scene. The reason there are strict protocols for gathering and handling evidence is that we do not want to mistake the evidence gatherers' activities for those of the offender. The same holds true in evaluations.

### Selecting Valid Measures

How do you select specific measures for your problem? There is no simple answer to this question that can be applied to any problem-solving effort. The guides in this series suggest measures for specific problems. If you are working on a problem not covered in a guide, then the simplest approach is to use one or more of the indicators you used to identify and analyze the problem. It is important, however, to think carefully about problem definition. As we saw in the prostitution example, seemingly minor changes in how we define the problem can have significant implications for measurement.

Clearly, you need to think about evaluation measures as soon as you begin the problem-solving process. If you wait until after you have implemented the response, then you might miss the chance to get valid "before" measures.

## Criteria for Claiming Cause

There are two goals for a problem-solving assessment. The first is to determine if the problem has changed. We are particularly interested in whether it has declined. Only after establishing whether the problem has changed does the second goal (determining if the response caused the change) make sense.

If the problem has not changed, and if you do not intend to use a similar response to address other problems, then you don't need to worry about cause, and the evaluation is relatively simple. If, however, the problem has changed, and if you will likely use the response again, then it is important to determine if the response in fact caused the change. If the problem declined for reasons other than the response, then using the response to address similar problems is unlikely to reduce them. If the problem got worse for reasons other than the response, then the response might still be a useful way to address other problems. Consequently, it is important to understand what criteria we require to claim a response caused a change in a problem.

The concept of cause may seem pretty straightforward, but it is not. To be able to confidently proclaim that a response caused a problem to decline, you need to meet four criteria. The first three criteria are relatively straightforward and can often be met. The fourth criterion cannot be met with absolute certainty.

### There Is a Plausible Explanation of How the Response Reduces the Problem

The technical term for this criterion is "mechanism." Wherever possible in this guide, commonly understood language has been substituted for the technical language of evaluators. Footnotes provide the technical terms for those interested in further study.

The first criterion for claiming cause is that you have a plausible explanation of how the response reduces the problem. You should base this explanation on a detailed problem analysis, preferably augmented by prior research and theory. The fact that others used a similar response and reduced their problem is not an explanation. Such information is useful, but you still need to explain how the problem reduction occurred. Absent a convincing explanation, you do not know whether the response was successful by accident, whether the response was successful due to the particular situation in which it was first applied (and thus will not work on your problem), or whether the response is generally useful.

Here is an example to illustrate what is meant by a "plausible explanation." Suppose you have been working on a street prostitution problem, and you know that the prostitutes congregate along a three-block stretch of road (on B Street, between First and Fourth streets), one block off of a very busy thoroughfare (A Street). Each numbered street has traffic lights (see Figure 2, left side), and all of the streets are two-way. Between A and B streets are a largely vacant old warehouse and a light industrial area. The prostitutes and customers use this abandoned property. Customers enter B Street from A Street using the numbered streets, and circle the blocks looking for prostitutes.

Between B and C streets is an old residential neighborhood of single-family homes called the Elms. C Street has become a thriving entertainment and arts area, and older Elms residents are selling their homes to younger, affluent couples. Residents complain about the traffic and noise, the harassing calls of the prostitutes and customers, and the litter (drink containers, condoms and other debris).

To address this problem, residents propose a series of street changes. B Street will be made one-way north, and Elm Street one-way west, while Fourth Street will be made one-way east between A and B streets. The other numbered streets will be blocked off from A Street, and their traffic lights will be removed. A new traffic light will be placed at the intersection of Elm and A streets, but only left turns from Elm onto A will be permitted. Another traffic light will be placed at the intersection of Elm and C streets. The right side of Figure 2 shows these changes.

Figure 2: Street layout before and after a response to prostitution

Why do the residents think this will work? We hope their explanation is plausible; that is, it is logical and takes into account the known facts. The residents claim the area is a hotbed of prostitution activity in large part because the streets facilitate solicitation. Customers can quickly cruise around the block looking for "dates." Changing the street patterns in the manner described will make circular cruising more time-consuming. If customers do not make a contact on the first pass, they will spend much more time on the return trip. Because the changes reduce the convenience of prostitution, fewer customers will come to the area, and the problem will decline. In addition, the streamlined traffic flow will make it easier for the police to detect prostitution-related activities. By observing customers and prostitutes, you can verify the cruising behavior. If this explanation is logically consistent with the available information, and if there is no obvious contradictory information, then the residents have leaped the first hurdle for establishing a causal connection.

A plausible explanation does not guarantee that the response will work; many plausible ideas do not work when tested. But it does make the response a more likely candidate for a successful solution. The explanation has added credibility in that previous research describes the relationship between prostitution and circular driving patterns,3 and also indicates that reducing the ease of neighborhood traffic movement sometimes reduces crime.4 Further, it is consistent with the theory of situational crime prevention, particularly the strategy of increasing offenders' effort.5

In summary, the first step in claiming cause is to have a plausible explanation of (1) how the problem occurs, and (2) how the response reduces it. This explanation should also cover when, where and why the response works. If prepared at the time you are crafting the response, the explanation can help guide planning and implementation. The more specific the explanation, the better the response and the more informative the assessment. Ideally, the explanation will also describe the circumstances in which the response is unlikely to work. This can aid in both the process and the impact evaluations.

### The Response and the Level of the Problem Are Related

The technical term for this criterion is "association." Typically, association is measured by the correlation between the response and the level of the problem.

The second criterion for claiming cause is that there be a relationship between the presence of the response and a decline in the problem (and between the absence of the response and an increase in the problem).

Let's go back to the prostitution problem. How would we demonstrate a relationship here? Just north of the Elms is a similar neighborhood (also between A and C streets, with a deteriorated light industrial area to the west, and the thriving C Street development to the east), but the streets do not allow for easy circular driving. Now if the ease of circular driving is associated with prostitution, then we should see little or no prostitution in this other neighborhood. This would imply that changing the Elms' street patterns might be helpful. However, if there is prostitution in this area, too, then there is not a strong link between prostitution and ease of circular driving, and this suggests that changing the street patterns may not be effective. Either way, the evidence would not be strong, but the findings could be helpful.

There is yet another way to examine a relationship. We might also measure the problem before and after the street changes. If we see high levels of prostitution-related traffic (or high levels of resident perceptions of it) before the changes, but low levels after the changes, we will have evidence of a relationship.

So the second hurdle to jump in claiming causation is to demonstrate that the problem is bigger in the absence of the response than when the response is in place. Though it is tempting to declare victory at this stage, we must surmount two other hurdles before we can be confident that the response caused the decline in the problem.
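As noted at the start of this section, association is typically measured by a correlation. The sketch below computes a Pearson correlation between an indicator of whether the response was in place (0 or 1) and the level of the problem in each period. The function `pearson_r` is written out by hand for clarity, and the weekly figures are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when either sequence is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: three weeks before the street changes, three weeks after.
response_in_place = [0, 0, 0, 1, 1, 1]
problem_level = [130, 120, 125, 50, 45, 40]  # e.g., excess weekend traffic

r = pearson_r(response_in_place, problem_level)
print(f"Correlation between response and problem level: {r:.2f}")
```

A strongly negative value means the problem is lower when the response is present. It demonstrates a relationship only; it does not show that the response came first or that nothing else explains the decline.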

### The Response Occurs Before the Problem Declines

The technical term for this criterion is "temporal order."

The third criterion for claiming cause simply requires that the response precede the decline in the problem. Since it is impossible for a response to have an effect before it is implemented, this criterion makes a lot of sense. There's one major caveat here: in defining "response," we include publicity about the response, whether intentional or accidental. A widespread media campaign may precede a drunken driving intervention, so that even before the intervention, potential drunken drivers may alter their behavior. In this case, the media campaign is part of the response. A decline in drunken driving before the media campaign would be evidence that something other than the response caused the decline. But a decline after the media campaign, but before the intervention, could be credited to the response.

Despite the obvious simplicity of this criterion, it is surprisingly common to see violations of it. Throughout the 1990s, homicides declined in large U.S. cities. In the middle of the decade, a few years into the downward trend, several cities implemented crime reduction strategies and gained substantial publicity. As homicides continued to decline in these cities, proponents claimed that the reductions were due to the new strategies. However, homicides had been declining before the changes, so it is difficult to attribute the decline to them. In short, the purported cause of the decline followed the decline. If, on the other hand, the cities had implemented the changes in 1990, the claim that the changes caused the drop in homicides would be more plausible.

There is another reason to be skeptical that the changes in policing caused the decline in homicides. Homicides declined in other large cities that had not implemented the same changes. For a more detailed examination of the police contribution to the homicide decline in the 1990s, see Eck and Maguire (2000).

To demonstrate that the response preceded the problem's decline, you must know when the response began (including publicity about it), and have measures of the problem before and after the response. This is a before-after (or pre-post) evaluation design. We saw this design in the prostitution example, when we described ways of demonstrating a relationship. We used a number of examples of pre-post designs in the section on measurement. Pre-post is the most common evaluation design, but it is not particularly strong; that is, a simple pre-post design can show a decline, but it is insufficient for establishing what caused it.
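One simple check on temporal order is to ask whether the problem was already falling before the response began, as in the homicide example above. The sketch below fits a least-squares slope to the periods that precede the response. The function names, the index marking the start of the response, and the annual counts are all hypothetical.

```python
def slope(values):
    """Least-squares slope of `values` against period number (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two periods to estimate a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def pre_response_trend(series, response_index):
    """Slope of the problem measure over the periods before the response began.

    response_index: position in `series` of the first period in which the
    response (including publicity about it) was in place.
    """
    return slope(series[:response_index])


# Hypothetical annual counts; the response starts in the fourth year (index 3).
counts = [100, 92, 85, 78, 70, 63]
trend = pre_response_trend(counts, 3)
if trend < 0:
    print(f"Problem was already declining before the response (slope {trend:.1f} per year)")
else:
    print(f"No decline before the response (slope {trend:.1f} per year)")
```

A negative pre-response slope does not rule out an effect, but it means a simple before-after comparison cannot separate the response from the trend that was already under way.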

Despite its simplicity, this criterion can be difficult to meet. But even if you can show that the decline in the problem followed the response, you need to meet one more criterion before you can definitively claim that the response caused the decline.

### There Are No Plausible Alternative Explanations

The technical term for this criterion is "non-spuriousness." A spurious relationship is a hypothesized relationship between two or more variables that is false or misleading.

Let's continue with the prostitution problem. You have an explanation, you have demonstrated a relationship, and you have shown that the response preceded the decline in the problem. You now need to make sure that nothing else could have caused the decline. Recall that the C Street corridor and the Elms were going through a series of changes. New residents and the remaining older residents were trying to clean up the area. One thing they did was to ask the police to help. Did they do anything else? Suppose the Elms' Neighborhood Association (ENA) and the C Street Corridor Business Association (CSCBA) identified the owners of the abandoned and vacant property and put pressure on them to clean it up, denying prostitutes access to it. And suppose that this change occurred at about the same time the street changes did. So you could think of the ENA and the CSCBA as the cause of both the street and the land-use changes. If the land-use changes were the real cause of the decline in prostitution, and the street changes were irrelevant, you would still see a relationship between the street changes and the decline, and you would still see the response before the decline. Nevertheless, something else caused the decline.

Figure 3 diagrams the notion of an alternative explanation. The upper half shows what you believe: the response caused the decline in the problem (as indicated by the arrow). This belief may come from a variety of valid sources. Nevertheless, something else caused the response, and something else caused the decline (lower half of the figure). Here, more "something else" led to more response and, at the same time, a reduction in the problem. The absence of an arrow between the response and the decline in the problem shows that the response was irrelevant to the decline. An outsider, observing more response and less of the problem, might conclude that the response caused the decline. In situations like this, the observed relationship between the response and the decline is misleading. The possibility of a misleading relationship is a threat to an evaluation's validity.

Figure 3: Alternative explanations

There is a related concern that should also be mentioned. The "something else" might not have prompted your problem-solving effort (as was the case in the prostitution example); rather, it might have occurred by coincidence at about the same time as your response. Practically speaking, it might not matter if the "something else" occurred at the same time as your response, or if the "something else" caused both the response and the decline. In neither case did the response cause the drop in the problem.

To demonstrate a causal connection between the response and the decline, you need to provide sound evidence that there is no "something else." To do so, you need to show that there are no reasonable explanations for the decline, other than the response. You do this by carefully examining the most obvious counterclaims and assessing evidence for them.

Ruling out alternative explanations is difficult. You can never do so definitively because there are many possible causes of problem fluctuations. All you can do is rule out the most obvious alternative explanations for the decline. In many respects, it is similar to demonstrating that a suspect committed a crime. The standards of evidence vary, depending on the decision being made. Stronger evidence is required to establish guilt in criminal court than to secure a warrant for an arrest. But in neither case is absolute evidence of guilt required. We can never prove that a response caused a decline in a problem because we cannot rule out all possible alternative explanations. We can make better or worse cases for such claims, however. And this is where the evaluation design comes in. Some designs allow for stronger statements of causality than others, just as some prosecutions are more plausible to a jury than others.
