Geocoding survey
On 20th May 2002 I posted a short message on the NIJ Crime Mapping Research Center (now the Mapping and Analysis for Public Safety [MAPS] program) list server, asking list members for their average success rates with geocoding. The main part of the message read as follows:
Hi all,
Simple question: What is your average geocoding hit rate?
Answers off list please, to jhr@temple.edu. I'll summarise all replies back to the list. All I'd like is your best guess as to your average geocoding hit rate every time you geocode local crime data. It would also help if you told me your agency size and GIS software, though this isn't essential. I don't want to turn this into a 500 question survey, just a simple one question quiz!
By 28th May 2002, I had received replies from 39 individuals, describing their geocoding experiences with numerous different agencies. In the end I had usable responses with regard to 43 different agencies.
Range of responses
E-mails received ranged from a single word answer, to short essays (one person wrote 1,300 words on the issue). Some people kindly described their agency, the city size and the population that their agency covers. This helped me learn a little about the geography of the US! A number of people also described their geocoding process. I've tried to extract the useful parts of their comments below.
The largest agency to respond had 4,500 law enforcement officers, while a number of small agencies had fewer than 100 (of the 21 respondents who described their agency, the smallest had 46 sworn officers). 11 respondents described the population that they covered, the largest being 900,000 and the smallest 31,000. Of the people that explained their geocoding engine, 21 were ESRI users. A couple of others had locally developed proprietary products. I know that there are MapInfo users out there, but either they didn't respond to the post, or they kept their 'MapInfo-ness' a secret! Of the 21 ESRI product users, 12 were using ArcView 3.1 or 3.2. The leap to 8.1 is still to be made by many, it seems.
Average geocoding rates
The survey generated 43 useful responses in relation to agencies, from 39 individuals. My own geocoding experiences have not been included in the survey.
The mean average geocoding hit rate was 87.5%, with a standard deviation of 14.1%. The lowest was 41%, while the highest was 99.7%. Slightly more than two thirds of the responses were 90% or greater.
Geocoding hit rate and agency size were negatively correlated (in other words, as agency size increases, the geocoding hit rate decreases). However, only 21 people provided agency size details and the result is not statistically significant.
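For anyone wanting to run the same kind of summary on their own figures, the sketch below shows one way to do it. The survey write-up does not say which correlation coefficient was used, so Pearson's r is an assumption here, and the officer counts and hit rates are made-up illustrative numbers, not the survey responses. No significance test is included.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical (made-up) figures: sworn officers and hit rate (%) per agency.
officers = [46, 120, 300, 850, 1500, 4500]
hit_rates = [98.0, 95.0, 92.0, 90.0, 85.0, 80.0]

summary = {
    "mean": mean(hit_rates),            # mean hit rate
    "sd": stdev(hit_rates),             # sample standard deviation
    "min": min(hit_rates),
    "max": max(hit_rates),
    # Share of agencies reporting a hit rate of 90% or greater.
    "share_90_plus": sum(r >= 90 for r in hit_rates) / len(hit_rates),
    # A negative r means larger agencies tend to report lower hit rates.
    "r": pearson_r(officers, hit_rates),
}

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With only a couple of dozen agencies, a negative r on its own says little, which is why the result above is reported as not statistically significant.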
Users' experiences
A number of respondents kindly described their geocoding process. From these responses it is possible to determine a number of key points in the geocoding process, points that were reiterated by a number of people.
1. Hit rate is dependent on crime type. Crimes at addresses, such as domestic incidents and burglaries, have a higher hit rate than others.
2. Keeping a master table of difficult addresses is worth it. Those people that made the effort to track down unknown addresses and geocode them for future use returned high hit rates.
3. Having trained, dedicated staff who are focussed on the GIS side of things increases the geocoding hit rate. A number of respondents mentioned that when they had a staff member concentrate on the GIS solely, the accuracy and therefore value of the GIS improved noticeably.
4. Geocoding at the point of entry, checking addresses as they are entered into the database, assured the highest hit rates. This corroborates similar comments in Ratcliffe, 2000, page 315. (The full citation for this paper is: Ratcliffe, J.H. (2000) Implementing and integrating crime mapping into a police intelligence environment, International Journal of Police Science and Management, 2 (4): 313-323. It can be downloaded from here.)
Working with geocoding engines
ESRI products allow a user to determine a score, out of 100, above which a match between a database address and a geocoding street segment is agreed upon. With initial geocoding (usually termed 'automatic' geocoding) the choice of a suitable match score is important. Lowering this score can increase the hit rate, but decrease the accuracy.
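The trade-off between hit rate and accuracy can be illustrated with a small sketch. This is not the ESRI API: the Candidate class, the auto_geocode function, the scores and the 'correct' flags are all hypothetical, invented purely to show the effect of moving the minimum match score.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    address: str      # address as recorded in the crime database
    best_score: int   # best match score (0-100) from a hypothetical engine
    correct: bool     # whether that best match is really the right location


def auto_geocode(candidates, min_score):
    """Accept every record whose best score is at least min_score.

    Returns (hit_rate, accuracy): the share of records matched, and the
    share of matched records that are actually in the right place.
    """
    matched = [c for c in candidates if c.best_score >= min_score]
    hit_rate = len(matched) / len(candidates)
    accuracy = sum(c.correct for c in matched) / len(matched) if matched else 0.0
    return hit_rate, accuracy


# Made-up records; the third has letter O's typed in place of zeros.
records = [
    Candidate("100 MAIN ST", 100, True),
    Candidate("100 MAIN STREET", 92, True),
    Candidate("1OO MAIN ST", 78, True),
    Candidate("100 MAINE AV", 65, False),
    Candidate("MAIN / 1ST", 40, False),
]

strict = auto_geocode(records, 80)  # fewer matches, all of them correct here
loose = auto_geocode(records, 60)   # more matches, one of them wrong
print("min score 80:", strict)
print("min score 60:", loose)
```

In this toy data, dropping the minimum score from 80 to 60 doubles the hit rate but lets in a match on the wrong street, which is the risk described above.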
Manual (or 'interactive') geocoding appears to be most valuable when it is not just a clean-up step conducted after automatic geocoding, but is also used to learn which addresses are commonly misspelled or not recognised by the geocoding engine. Using the manual geocoding process to refine future geocoding runs appears to be a highly worthwhile venture.
Finally, address scrubbers, programs that search through an address database and correct some of the more common mistakes, appear to be a valuable tool for a first pass prior to geocoding.
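A minimal sketch of what such a scrubber might look like follows. The abbreviation list and the table of learned corrections are hypothetical examples; in practice they would be built up from the addresses that fail during interactive geocoding in a particular jurisdiction.

```python
import re

# Hypothetical tables; real ones would come from local experience.
ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "BOULEVARD": "BLVD"}
LEARNED_CORRECTIONS = {"CITY HALL": "1 MAIN ST", "MAINE ST": "MAIN ST"}


def scrub(address):
    """Normalise an address string before it is passed to a geocoder."""
    text = re.sub(r"[.,]", " ", address.upper())  # drop full stops and commas
    text = re.sub(r"\s+", " ", text).strip()      # collapse runs of whitespace
    # Standardise street-type words to a single abbreviation.
    text = " ".join(ABBREVIATIONS.get(word, word) for word in text.split(" "))
    # Replace place names and misspellings recorded during manual geocoding.
    for wrong, right in LEARNED_CORRECTIONS.items():
        text = re.sub(rf"\b{re.escape(wrong)}\b", right, text)
    return text


print(scrub("  100 main  street. "))
print(scrub("City Hall"))
print(scrub("25 Maine Street"))
```

Running the scrubber as a first pass means the geocoding engine sees consistent spellings, and every address fixed by hand once can be added to the corrections table so it does not fail again.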
A final word on geocoding hit rates
Using Monte Carlo simulation of a declining hit rate, I used a number of real crime patterns, aggregated to census tracts, to calculate a statistically significant geocoding hit rate below which maps failed to accurately reflect a notional 100% hit rate map. The limit found in this analysis was 85%, suggesting that 23% of respondents in the survey are geocoding below that minimum acceptable level (i.e. below 85%).
You can read more about this analysis in the paper: Ratcliffe, J.H. (2001) On the accuracy of TIGER type geocoded address data in relation to cadastral and census areal units, International Journal of Geographical Information Science. 15 (5): 473-485. It can be downloaded from here.
Survey respondents: The roll of honour!
Cabrina Scott, College Station, TX
Anne Davis, Phoenix, AZ
Bob Feliciano, Whittier, CA
Tim Burns, Clearwater, FL
Stacy Belledin, Jacksonville, FL
Julie Wartell, ILJ
Safa Egilmez, Malibu/Lost Hills, CA
Kristina Shull, Kirkland, WA
Carolyn Arbogast, Manatee, FL
John Markovic, Vera Institute
Wil Gorr, Carnegie Mellon University
Eric Arrington, Durham, NC
Julie Cooper, Irvine, CA
Patrick Pence, Tallahassee, FL
Karen Kane, Calgary, Canada
Susan Wernicke, Overland PD
Kim David, Thornton, CO
Gini Connolly, Fort Worth, TX
Tom Callahan, Woburn, MA
Uyen (Win) Pham, Washington, DC
Eileen Rudisill, Philadelphia, PA
Cory L. Becker, Sugar Land, TX
Kevin J. Switala, Philadelphia, PA
Bryan Hill, Glendale, AZ
Deborah Osborne, Buffalo, NY
Jessica McCann, Baltimore, MD
Helen Cook, Bromley, UK
Ron Rasmussen, Seattle, WA
Steve Sullivan, Thousand Oaks, CA
Wes Westerfeld, Baltimore, MD
Jenette Johnson, Los Gatos, CA
Eric Y. Kim, Wisconsin
Jody Murphy, Brooklyn Park, MN
Raymond Wickline, Metro PD, Washington DC
Ken Johnson, Seattle, WA
Tom Casady, Lincoln, NE
Megan Ambrosio, Newark, NJ
Stan Lenhart, Eugene, OR
Pat Creamer, Mobile, AL
Thanks to everyone!