  • Jerry Ratcliffe

Problem with the near repeat calculator

On 8th October 2019, Dr Toby Davies and Dr Wouter Steenbeek informed me that, in the course of developing and testing R and Python code to conduct a near repeat analysis, they had discovered an inconsistency between their results and those from the Near Repeat Calculator. They identified a mismatch between the analyzed bandwidth ranges and the intervals reported by the near repeat calculator.

More specifically, where near repeat calculator results are reported at an interval of ‘a to b’, these are in fact computed on the basis of the interval ‘(a-1) to and including (b-1)’. For example, a spatial band of ‘201 to 400 meters’ in fact includes incidents that are 200 up to, but not including, 400 meters apart. In interval notation, that which is labelled in the near repeat calculator as ‘a to b’ in fact represents the interval [a-1, b). What this means is that where the near repeat calculator labels bandwidths as:


Same location, 1 to 200 meters, 201 to 400 meters, 401 to 600 meters, … the results indicate:

Same location (0 meters), 0 or 1 to and including 199.999… meters, 200 to and including 399.999… meters, 400 to and including 599.999… meters.


And where the near repeat calculator labels bandwidths:

0 to 14 days, 15 to 28 days, 29 to 42 days, the results indicate:

0 to and including 13 days, 14 to and including 27 days, 28 to and including 41 days.
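
The mapping between the printed labels and the bands actually analyzed can be sketched in Python. This is an illustration only, not the calculator's own code: the function names `reported_label` and `true_band_index` are hypothetical, and the sketch assumes equal-width bands that follow the half-open interval [a-1, b) described above, with same-location pairs (0 meters) treated as a separate band.

```python
def reported_label(index, width, first_low):
    """Label printed for the band at 0-based `index` (hypothetical helper).

    first_low is the lower bound shown on the first label:
    1 for the spatial bands ('1 to 200'), 0 for the temporal bands ('0 to 14').
    """
    low = first_low if index == 0 else index * width + 1
    high = (index + 1) * width
    return f"{low} to {high}"


def true_band_index(value, width):
    """0-based band a value actually falls in, assuming half-open
    bands [k * width, (k + 1) * width)."""
    return int(value // width)


# Temporal example: 14-day bands, first label '0 to 14'.
for days in (13, 14, 27, 28):
    label = reported_label(true_band_index(days, 14), 14, first_low=0)
    print(f"{days} days apart is counted under the label '{label} days'")

# Spatial example: 200-meter bands, first label '1 to 200'
# (pairs exactly 0 meters apart belong to the separate 'Same location' band).
for meters in (199.9, 200.0, 400.0):
    label = reported_label(true_band_index(meters, 200), 200, first_low=1)
    print(f"{meters} meters apart is counted under the label '{label} meters'")
```

Under these assumptions, a pair of incidents exactly 14 days apart is counted under ‘15 to 28 days’, and a pair exactly 200 meters apart under ‘201 to 400 meters’, which is the one-unit shift described above.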


I made this error, regret the error and offer my apologies to everyone who has used the near repeat calculator. In testing the software, I used a variety of small data sets, and tracked individual code lines through the software, and therefore remain astounded and deeply embarrassed that I missed this coding error.

What does this mean? For the spatial bands, as Dr Davies and Dr Steenbeek kindly note, “the labels reported by the Near Repeat Calculator are almost the same as the true bandwidths”. The substantive conclusions for most applications are probably not affected, though this does not mitigate the error. This may not be the case for the temporal bands. If users are using large datasets and broad bandwidths then the effects may not be consequential; however, they may be when users are using the much smaller bandwidths that were (frankly) unanticipated when the near-repeat hypothesis was developed. Again, this is not mitigation and I regret the error. The bottom line is that the results may not be accurate, and I am indebted to Dr Steenbeek and Dr Davies for pointing out the error.
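
One rough way to see why narrow bands are more exposed than broad ones is to express the one-unit label offset as a share of the bandwidth. The sketch below is a crude indicator under that assumption, not a measure of how much any particular result changes; `offset_share` is a hypothetical helper, and the 7-day and 1-day bandwidths are illustrative values rather than settings taken from any specific analysis.

```python
def offset_share(width, offset=1):
    """One-unit label offset as a share of the bandwidth (hypothetical helper)."""
    return offset / width


bands = [
    ("200-meter spatial band", 200),
    ("14-day temporal band", 14),
    ("7-day temporal band (hypothetical)", 7),
    ("1-day temporal band (hypothetical)", 1),
]

for name, width in bands:
    print(f"{name}: offset is {offset_share(width):.1%} of the bandwidth")
```

On this measure the offset is 0.5% of a 200-meter band but roughly 7% of a 14-day band, and it grows as the bandwidth shrinks.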


Given that some people have used the near repeat calculator for research and for analytical crime reporting, I wanted to get this out as soon as I learned of the error and had checked it myself. I’m frankly devastated that this has occurred, and can only at this point offer my sincere apology to everyone affected. I hope that the substantive conclusions from most existing analysis are still valid, though recognize this is only a hope of mitigation.

The software has been removed from the website until I figure out the next step. The error has been identified and a revised version will likely be released in the future. Until then, analysts are encouraged to use the R and Python code developed by Dr Davies and Dr Steenbeek. Links follow. In the meantime, if you would like to discuss your analysis with me, please reach me at jhr@temple.edu.

The code for Python and R is at:

https://github.com/wsteenbeek/NearRepeat

and

https://github.com/tobydavies/NearRepeat

© 2020-21 by Jerry Ratcliffe