How widespread are crime increases? Here is one analytical method
Updated: Feb 2, 2021
Many cities across the US suffered crime increases over the last few years, and especially from 2019 to 2020. But were those increases spread across a whole city or focused in a few areas? There may be operational benefits to focusing crime prevention if crime increases are concentrated. Dispersion analysis measures the relative dispersion of a crime increase across a region and allows for the identification of particular spatial units that contribute more heavily to driving up the overall jurisdictional crime rate.
The approach works as follows. Take a city (or region) with a certain number of sub-areas (beats, districts, precincts, or whatever), which we will call units. Count crime in each unit at two time periods, which we call t1 and t2. Order the units by the change from t1 to t2, starting with the greatest crime count increase (the worst place). The analysis answers this question: given the region had an overall crime increase, how many units would need to negate their individual crime increase before the overall region no longer had a crime increase? And in what order?
There are two ways to 'negate' the crime increase. One method (the remove method) is to remove the unit entirely from the calculation. Analytically this removes both the t1 and t2 counts for that unit from the city (region) totals and then recalculates the crime change as if the removed unit never existed. A more realistic approach (the match method) is to zero out the change, as if the unit's crime count stayed exactly the same. So t2 gets assigned (or matched) with the same crime count as t1, making the change for that unit zero. This is like imagining city management had been able to contain crime increases and maintain through t2 the same crime count as t1.
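To make the two methods concrete, here is a minimal base R sketch of the ordering-and-negating logic described above. It is not the code inside the crimedispersion package: the function name dispersion_sketch, the toy counts, and the choice to stop as soon as the regional change is zero or below are all my assumptions for illustration.

```r
# Hypothetical helper for illustration only (not the crimedispersion package).
# units: unit names; t1, t2: crime counts per unit at the two time periods.
dispersion_sketch <- function(units, t1, t2, method = c("match", "remove")) {
  method <- match.arg(method)
  stopifnot(length(units) == length(t1), length(t1) == length(t2))
  stopifnot(sum(t2) > sum(t1))  # only meaningful for an overall increase

  # Worst unit first: largest count increase from t1 to t2
  ord <- order(t2 - t1, decreasing = TRUE)
  u <- units[ord]
  a <- t1[ord]
  b <- t2[ord]

  if (method == "match") {
    # Each unit in turn is held at its t1 count; the t1 total never changes
    region_t1 <- rep(sum(a), length(u))
    region_t2 <- sum(b) - cumsum(b - a)
  } else {
    # Each unit in turn is dropped from both totals
    region_t1 <- sum(a) - cumsum(a)
    region_t2 <- sum(b) - cumsum(b)
  }
  chg <- region_t2 - region_t1

  # Assumption: stop at the first step where the region no longer shows an increase
  n_needed <- which(chg <= 0)[1]

  steps <- data.frame(unit = u, adjusted = seq_along(u),
                      unit_t1 = a, unit_t2 = b,
                      region_t1 = region_t1, region_t2 = region_t2,
                      chg = chg, pct = 100 * chg / region_t1)

  list(steps = steps[seq_len(n_needed), ],
       n_needed = n_needed,
       odi = n_needed / length(u))
}

# Made-up counts for five units
toy_units <- c("A", "B", "C", "D", "E")
toy_t1 <- c(10, 20, 30, 40, 50)
toy_t2 <- c(18, 25, 31, 38, 47)

res_match  <- dispersion_sketch(toy_units, toy_t1, toy_t2, method = "match")
res_remove <- dispersion_sketch(toy_units, toy_t1, toy_t2, method = "remove")
res_match$steps
```

In this sketch both methods flag the same units, because removing a unit and matching a unit change the regional count difference by the same amount. What differs is the percentage change, since the remove method also shrinks the t1 total it is measured against.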
Why is this interesting? If you only have to negate ten percent of a region, that might be much more useful for crime prevention than if a crime increase was more widely dispersed across 50 percent of a region. Here's an example.
A worked example from Philadelphia
Here is a simple worked R example using open data. All data are made publicly available by the Philadelphia Police Department through OpenDataPhilly. This example assumes you are using RStudio and have some basic familiarity with R.
Download crime data
Start by downloading Philadelphia crime data for 2018 and 2019 to your local machine. Paste the following straight into your browser, or click on it. It should download a file called "incidents_part1_part2.csv". It might take a few seconds as the file is about 25 MB.
Move the file from your download folder to a working folder and rename it. For this example I called my file "Phl2018crime.csv".
Now do the same procedure for the following link, renaming your file "Phl2019crime.csv".
Install the package and load the data
Open RStudio. Here we create an R script in RStudio, load a couple of libraries, set the working directory to the place where your example data are lurking, and import the data. I like the 'rio' package for data import/output, but choose whatever works for you. Enter and run the following script, making sure that whatever is at the setwd command points to your data.
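# Packages for data import and data wrangling
library(rio)
library(dplyr)

# Point this at the folder holding the two downloaded files
setwd("C:/Users/yourname/DispersionExample")

phl2018 <- import("Phl2018crime.csv")
phl2019 <- import("Phl2019crime.csv")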
In Philadelphia, crime data are sorted by general UCR categories rather than specific individual UCR types. So create two new data sets (called sub2018 and sub2019) comprising just the simple/other assaults, coded 800. At the same time we have to combine the district number (dc_dist) and the beat (psa) to create the unique geographic area used in Philadelphia policing. PSAs (police service areas) are subdivisions of districts, so we should put them together. With the distpsa variable created (mutate) we count crimes in each PSA. The following command does this for both 2018 and 2019.
sub2018 <- phl2018 %>%
  filter(ucr_general == 800) %>%
  mutate(distpsa = paste0(dc_dist, psa)) %>%
  count(distpsa)

sub2019 <- phl2019 %>%
  filter(ucr_general == 800) %>%
  mutate(distpsa = paste0(dc_dist, psa)) %>%
  count(distpsa)
If you want to see how many other/simple assaults were reported in 2018 and 2019, run:
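One way to get those totals, as a sketch: count() stores its tallies in a column called n, so summing that column gives the yearly total. The two small data frames below are made-up stand-ins so the snippet runs on its own; with the real data you would skip them and use the sub2018 and sub2019 objects created above.

```r
# Made-up stand-ins for the per-PSA counts (replace with the real sub2018/sub2019)
sub2018 <- data.frame(distpsa = c("11", "12", "13"), n = c(10L, 20L, 30L))
sub2019 <- data.frame(distpsa = c("11", "12", "13"), n = c(12L, 19L, 35L))

# Yearly totals are the sums of the per-PSA counts
total2018 <- sum(sub2018$n)
total2019 <- sum(sub2019$n)

# Percentage change from t1 (2018) to t2 (2019)
pct_change <- 100 * (total2019 - total2018) / total2018

total2018
total2019
pct_change
```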
Now we have to get the data from 2018 and 2019 into a single data frame in the form the crimedispersion package expects. The package needs a single data frame with three columns: a column with the name of the unit area, a column with the t1 crime count, and a column with the t2 crime count.
You create the single data frame (which I call v20182019) and at the same time, rename the columns to something recognizable. The last line of this command finds any areas with zero crime counts and makes sure they have a zero in them (the join can create an NA value where a unit appears in only one year, so we replace these with zero).
v20182019 <- sub2018 %>%
  full_join(sub2019, by = 'distpsa') %>%
  rename(crime2018 = n.x, crime2019 = n.y) %>%
  mutate_if(is.numeric, coalesce, 0)
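To see why the zero-fill step matters, here is a small base R sketch of the same idea on made-up counts. It uses merge(all = TRUE), base R's full outer join, in place of dplyr's full_join so that it runs with no extra packages; the toy data frames and PSA labels are assumptions for illustration.

```r
# Toy counts: PSA "13" has no recorded incidents in 2019, PSA "14" none in 2018
toy2018 <- data.frame(distpsa = c("11", "12", "13"), n = c(5L, 7L, 2L))
toy2019 <- data.frame(distpsa = c("11", "12", "14"), n = c(6L, 4L, 1L))

# Full outer join keeps every PSA seen in either year
joined <- merge(toy2018, toy2019, by = "distpsa", all = TRUE,
                suffixes = c(".x", ".y"))
names(joined) <- c("distpsa", "crime2018", "crime2019")

# A PSA missing from one year comes through as NA, not zero
had_na <- anyNA(joined)

# Replace the NA values with zero counts
joined$crime2018[is.na(joined$crime2018)] <- 0L
joined$crime2019[is.na(joined$crime2019)] <- 0L
joined
```

Without the fill, any sum or difference involving those units would come out as NA rather than a number.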
Download and install the crimedispersion package
First you install a package that can allow you to work with GitHub:
Load the package:
With the development tools installed, install the crimedispersion package.
Load the package:
Okay, now the fun bit. You call the function by passing the name of the data frame, and the names of the three columns. In this example, we want the result to go into a holder we will call output. You can see the passed information is the name of the data frame, and then the three data fields.
output <- crimedispersion(v20182019, 'distpsa', 'crime2018', 'crime2019')
Understanding the result
I have written the program to output a range of results, all stored in a list that you have associated with output (though you can use any name you like). You access them with the double square bracket option. There are five result options:
1. A data frame with the ordered removal list. The data fields are:
unit - the unit that was removed at this stage
adjusted - the number of units removed
unit_t1 - the crime count in this unit at t1
unit_t2 - the crime count in this unit at t2
region_t1 - the crime count in the overall region at t1
region_t2 - the region crime count at t2 (after units have been removed or matched)
chg - the difference t1 to t2 in the region after adjustments
pct - the percentage difference t1 to t2 after adjustment
You can copy the dispersion data frame to a new one (e.g. new.df) with a simple command:
new.df <- output[[1]]
Options 3, 4, and 5 provide the analytical results for:
[[3]] - the number of units that have to be removed for the crime rate to go from positive to negative
[[4]] - the offense dispersion index (ODI), essentially the number of units that have to be removed as a proportion of the overall number of units.
[[5]] - the non-contributory dispersion index (NCDI), which is a measure of the ratio of areas that had increases in crime, though were not central to the crime increase. See the citation for details.
Finally, option [[2]] generates a plot based on the data frame at output[[1]]. You can access it using this command:
The plot option needs a little more explaining.
The plot output
You can of course create your own plot based on the data frame at output[[1]], but the automatic plot is available to you in the results. For the Philadelphia study, you get this:
What you can see is that in Philadelphia, citywide simple/other assaults increased 4.42% from 2018 to 2019. To zero out this crime increase, you would need to change 13 of the 66 spatial units to their 2018 crime level before the overall crime increase would no longer be an increase. This ratio is an offense dispersion index of 0.197. Starting with unit (PSA) 192, you can see what the crime rate would have been, had each area in turn been held to its 2018 crime count. As PSA 242 is given the same crime count in 2019 as it had in 2018, the city crime rate would not have had an increase.
Comparing crime types
Repeat the exercise, but this time look at motor vehicle theft. This means replacing the 800 in this command with 700, as shown.
sub2018 <- phl2018 %>%
  filter(ucr_general == 700) %>%
  mutate(distpsa = paste0(dc_dist, psa)) %>%
  count(distpsa)

sub2019 <- phl2019 %>%
  filter(ucr_general == 700) %>%
  mutate(distpsa = paste0(dc_dist, psa)) %>%
  count(distpsa)
The comparison chart is illustrative. Even though the vehicle theft crime rate change from 2018 to 2019 is a larger increase (7.55%) than simple assaults, the substantial increase was more focused. With an ODI of 0.121, the city would only have had to hold 12 percent of the PSAs (12.1% specifically, or 8 of the city's 66 PSAs) to their 2018 crime rate, and the entire city would have had a slight crime decrease for 2019. The 2018 to 2019 vehicle theft increase is therefore much more concentrated, and less dispersed, than the simple/other assaults.
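The two ODI values quoted for Philadelphia can be checked by hand, since the index is just the number of units that had to be adjusted divided by the total number of units. A quick sketch using the figures reported above (the variable names are mine):

```r
n_units <- 66            # PSAs in the city

odi_assault <- 13 / n_units   # simple/other assaults: 13 PSAs adjusted
odi_vehicle <- 8 / n_units    # motor vehicle theft: 8 PSAs adjusted

round(c(assault = odi_assault, vehicle_theft = odi_vehicle), 3)
```

The lower value for vehicle theft is what signals the more concentrated increase.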
Caveats and references
The 'crimedispersion' package is still under development and will get enhanced only as-and-when I get time away from everything else I am doing. Also, at the moment it only works when you have a crime rate increase from t1 to t2. I'll figure out something for crime rate decreases at some point later. Suggestions welcome.
To cite or just read a bit more about crime dispersion, read:
Ratcliffe, JH (2010) The spatial dependency of crime increase dispersion, Security Journal, 23(1): 18-36. You can read the entire article here. Just click on its name at number 40.