Aoristic analysis in R
What is aoristic analysis?
If you are new to aoristic analysis and want to know what it is, or want links to relevant papers, please see the main aoristic analysis page. That page also links to code for aoristic analysis in SPSS and to an Excel spreadsheet from Andrew Wheeler.
The rest of this page relates to the 'aoristic' package available for R.
Aoristic package functions and development
The development site is hosted at GitHub, though the CRAN version is recommended for stability.
The CRAN site is directly accessed here:
The help associated with the package is directly available here:
This section of the page assumes that you have a basic understanding of R, including how to use packages and manipulate data. If you want a longer explanation that makes fewer assumptions, please see the chapter in the Groff and Haberman book described in the references section of this webpage.
I've written an R package called 'aoristic', available through CRAN. To install it into R or RStudio and then load it, type:
install.packages("aoristic")
library(aoristic)
In the package, I included an example data set from the NYPD open data crime archive. It covers burglaries in the borough of Manhattan for the first six months of 2019. You load the data set into memory with:
data(NYburg)
You can see the basic layout of the data with:
The data fields are:
CMPLNT_FR_DT - the from (or start) date of the burglary
CMPLNT_FR_TM - the from (or start) time of the burglary, stored in Microsoft Excel time format as a fraction of a day
CMPLNT_TO_DT - the end (or to) date of the burglary
CMPLNT_TO_TM - the end (or to) time of the burglary, again stored as a fraction of a day
X_COORD_CD - the X coordinate of the crime location, in the local projected coordinate system
Y_COORD_CD - the Y coordinate of the crime location, in the local projected coordinate system
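The start and end dates and times each have to be combined into a single date/time object in R. First, we convert the start and end times, stored as fractions of a day, into hour:minute text. Multiplying by 86400 (the number of seconds in a day) gives seconds after midnight. round() removes floating-point error that could otherwise push a time just below a whole minute, and tz = "UTC" stops R from shifting the result into your computer's local time zone:
NYburg$CMPLNT_FR_TM <- format(as.POSIXct(round(NYburg$CMPLNT_FR_TM * 86400), origin = "1970-01-01", tz = "UTC"), "%H:%M")
NYburg$CMPLNT_TO_TM <- format(as.POSIXct(round(NYburg$CMPLNT_TO_TM * 86400), origin = "1970-01-01", tz = "UTC"), "%H:%M")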
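As a quick, self-contained check of this conversion, the sketch below uses a few made-up fraction-of-a-day values rather than the NYPD data. It shows the arithmetic on its own, with rounding and a fixed UTC time zone so that the clock times are not shifted:

```r
# Made-up Excel-style times: each value is a fraction of 24 hours (0.5 = noon).
frac_day <- c(0, 0.25, 0.5, 0.770833333333333)

# Convert to whole seconds after midnight; round() absorbs floating-point error.
secs <- round(frac_day * 86400)

# Treat the seconds as an offset from midnight UTC and keep only hours:minutes.
# tz = "UTC" prevents R from displaying the times in the local time zone.
hhmm <- format(as.POSIXct(secs, origin = "1970-01-01", tz = "UTC"), "%H:%M")
print(hhmm)
```

Without round(), the last value falls just short of 66600 seconds and formats as 18:29 instead of 18:30.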
Then we create two new variables that combine the dates with the correctly-formatted times.
NYburg$STARTDateTime <- paste(NYburg$CMPLNT_FR_DT,NYburg$CMPLNT_FR_TM, sep=' ')
NYburg$ENDDateTime <- paste(NYburg$CMPLNT_TO_DT,NYburg$CMPLNT_TO_TM, sep=' ')
The final step is to use the convenient 'lubridate' package to convert these strings into recognized date-time objects. If it is not already installed, install it first with install.packages("lubridate"), then load it:
library(lubridate)
NYburg$STARTDateTime <- ymd_hm(NYburg$STARTDateTime, tz = "")
NYburg$ENDDateTime <- ymd_hm(NYburg$ENDDateTime, tz = "")
You can see the result with:
With the data in the correct format, you can now use the aoristic package.
aor.chk.df <- aoristic.datacheck(NYburg, 'X_COORD_CD', 'Y_COORD_CD', 'STARTDateTime', 'ENDDateTime')
This function checks the source data and tells you (in the console window) if any rows have missing end date/times, and how many (if any) rows have illogical date sequences where the end date and time precede the start date and time.
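To make the two checks concrete, here is a minimal base-R sketch of the same idea on toy data. The helper name check_times is hypothetical. This is not the package's own code, and aoristic.datacheck may report more detail:

```r
# Hypothetical helper (not part of the aoristic package): count rows with a
# missing end time and rows whose end time precedes their start time.
check_times <- function(start_time, end_time) {
  c(missing_end      = sum(is.na(end_time)),
    end_before_start = sum(end_time < start_time, na.rm = TRUE))
}

# Toy data: row 2 has no end time, row 3 ends before it starts.
start_time <- as.POSIXct(c("2019-01-01 08:00", "2019-01-02 22:00", "2019-01-03 09:00"), tz = "UTC")
end_time   <- as.POSIXct(c("2019-01-01 17:00", NA, "2019-01-03 07:00"), tz = "UTC")

res <- check_times(start_time, end_time)
print(res)
```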
aor.df <- aoristic.df(NYburg, 'X_COORD_CD', 'Y_COORD_CD', 'STARTDateTime', 'ENDDateTime')
This function creates the aoristic data frame with aoristic weights (probabilities) across all 168 hours of the week. It takes a bit of time to run, so be patient. The console window reports various diagnostics on completion.
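The core idea behind the weights can be sketched in a few lines of base R. This is a minimal equal-weight version. It assumes that every one-hour block touched by an event window gets an equal share of the event's total weight of 1. It also assumes hours are numbered from Sunday 00:00-00:59 (hour 1) to Saturday 23:00-23:59 (hour 168), which matches this page's statement that hour 85 is Wednesday 12:00-12:59. The helper names are hypothetical, and the package's own handling of partial hours or very long events may differ:

```r
# Hypothetical helpers illustrating equal-weight aoristic apportioning;
# this is not the aoristic package's own implementation.

# Hour-of-week index: 1 = Sunday 00:00-00:59 ... 168 = Saturday 23:00-23:59.
hour_of_week <- function(t) {
  lt <- as.POSIXlt(t)
  lt$wday * 24L + lt$hour + 1L   # wday runs from 0 (Sunday) to 6 (Saturday)
}

# Spread one event's weight of 1 evenly over every hour block it touches.
# Assumes a non-missing end time that is not before the start time.
aoristic_weights <- function(start_time, end_time) {
  first  <- as.POSIXct(trunc(start_time, units = "hours"))
  last   <- as.POSIXct(trunc(end_time, units = "hours"))
  blocks <- seq(first, last, by = "hour")
  # tabulate() counts blocks per hour of the week, so events that wrap
  # around the week still add up correctly.
  tabulate(hour_of_week(blocks), nbins = 168) / length(blocks)
}

# Two toy events, in UTC to sidestep daylight-saving gaps (2019-01-02 is a Wednesday).
start_time <- as.POSIXct(c("2019-01-02 11:30", "2019-01-02 12:10"), tz = "UTC")
end_time   <- as.POSIXct(c("2019-01-02 13:15", "2019-01-02 12:50"), tz = "UTC")

# One column per event; rowSums() gives the cumulative weight for each hour of the week.
w <- vapply(seq_along(start_time),
            function(i) aoristic_weights(start_time[i], end_time[i]),
            numeric(168))
total <- rowSums(w)
round(total[84:86], 3)   # Wednesday 11:00, 12:00 and 13:00 blocks
```

The second toy event lies entirely within Wednesday 12:00-12:59, so its full weight of 1 lands in hour 85. The first event is split three ways across hours 84 to 86.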
aor.df.sum <- aoristic.summary(aor.df)
This function takes the aoristic data frame and creates a summary data frame with the results aggregated by hour and day of the week. You can optionally send the output to an Excel spreadsheet or a JPEG image with these options:
p <- aoristic.plot(aor.df)
This function takes the aoristic data frame and creates the same plot as the JPEG option above, but does not create the summary data frame. The resulting ggplot, like the JPEG output, resembles the grid chart below.
Each cell shows the cumulative aoristic weight for all events that might have occurred in that hour block. The color coding is fixed, and ranges from dark blue to dark red (for the highest probability cells).
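The grid is essentially a 24 × 7 table of summed weights. As a sketch, assume the cumulative weights for all events are held in a 168-element vector numbered from Sunday 00:00, the same numbering under which hour 85 is Wednesday 12:00. Base R's matrix() then fills that vector column by column into hours by days. The toy weights are made up, and the layout of the package's own chart may differ:

```r
# Made-up cumulative weights for the 168 hours of the week
# (1 = Sunday 00:00-00:59, ..., 168 = Saturday 23:00-23:59).
weekly <- numeric(168)
weekly[c(84, 86)] <- 1   # Wednesday 11:00 and 13:00 blocks
weekly[85] <- 2.5        # Wednesday 12:00 block

# Filling column by column puts hour of day in rows and day of week in columns.
grid <- matrix(weekly, nrow = 24,
               dimnames = list(hour = sprintf("%02d:00", 0:23),
                               day  = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")))

grid["12:00", "Wed"]   # the highest-weight cell
colSums(grid)          # total weight per day of the week
```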
The aoristic.graph function produces a graphic in the plot window showing eight vertical bar graphs. Seven of them cover the 24 hours of each day of the week (in blue), and the eighth covers the entire week (in brown). There is an option to add tick marks to the graphs to aid comparison of each day with the overall summary; see ?aoristic.graph.
The aoristic.map function lets you see where the points are for each hour of the week, along with their probability (aoristic) score or weight. It is a rudimentary function. If you want more spatial functionality, either explore mapping in R or export the data frame to your favorite GIS. The command takes the aoristic data frame created by aoristic.df as one parameter and the hour you want to see as the other. To see hour 85, you can type:
If you want to know which hour number corresponds to which day and hour of the week, type:
The NYPD Manhattan borough burglary output for hour 85 (Wednesday from 12:00 to 12:59) looks like this:
Additional aoristic functionality
Functions in the 'aoristic' package that are not covered on this page are described in the package help. For example, to better understand how the package handles missing or illogical date-time combinations, type:
Or if you want to get a reference data frame that shows which number relates to which hour of the week, type: