As we all know, Colorado is considered one of the scariest places on earth. Denver, CO has had an enormous influx of people over the last decade and it is still ramping up.
So why did I pick Denver?
That’s simple, I have lived in Colorado for the majority of my life and want to know more about my capital city.
Exploration of Data
Data provided by https://www.denvergov.org/opendata/dataset/city-and-county-of-denver-crime
What we’ll do in this post
- Import the crime.csv data set
- Format the data
- Plot the total number of incidents reported by year
Let’s dive in!
Import the necessary libraries
library(dplyr) library(lubridate) library(ggplot2) options("stringsAsFactors" = TRUE)
Load the crime data set provided
It is possible to load the data straight from the URL, however, it’s over 80MB in size, so I simply downloaded it. It is updated regularly, so this post may need to be refreshed from time to time.
#### # Data from: http://data.denvergov.org/dataset/city-and-county-of-denver-crime # File name: crime.csv CWD = getwd() data = read.csv(paste(CWD,'/data/crime.csv',sep='')) ####
Format the data
I added columns for year, month, day and hour into the dataframe in order to simplify life. It takes a bit more time and ram upfront but I prefer to see it that way.
#Format FIRST_OCCURRENCE_DATE as.Date and use as crime date (for now) data$date = as.Date(data$FIRST_OCCURRENCE_DATE) #Create new columns for grouping data$year = year(data$date) data$month = month(data$date) data$day = day(data$date) data$hour = hour(data$FIRST_OCCURRENCE_DATE) print(colnames(data))
ggplot2 will provide a decent chart to show us the number of incidents each year.
#Sum up all incidents IS_CRIME AND IS_TRAFFIC maxYear = max(data$year) maxMonthYTD = max(data$month[data$year==maxYear]) df = data %>% group_by(year,month) %>% filter(month < maxMonthYTD) %>% summarise(incidents = sum(IS_CRIME) + sum(IS_TRAFFIC)) %>% arrange(month) p = ggplot(df) p + geom_bar(aes(x = factor(year), weight = incidents)) + ggtitle('Incidents Reported by Year') + xlab('Year') + ylab('Incidents') + theme(plot.title = element_text(hjust = 0.5))
Adding Some Color
Looking at the same plot but adding in colors for each month of the year.
#Stack bars in colors to view individual months p = ggplot(df,aes(x=factor(year),y=incidents,fill=factor(month))) p + geom_bar(stat='identity') + ggtitle('Incidents Reported by Year') + xlab('Year') + ylab('Incidents') + theme(plot.title = element_text(hjust = 0.5)) + guides(fill = guide_legend(title='Month'))
How much is labeled as TRAFFIC or CRIME
Looking at the same plot but separating out ISTRAFFIC from ISCRIME
Stack bars in colors to view individual months
tmp= data tmp$crimeType[tmp$IS_CRIME == 1] = 'Crime' tmp$crimeType[tmp$IS_CRIME == 0] = 'Traffic' tmp$crimeType = factor(tmp$crimeType) df = tmp %>% group_by(year,crimeType) %>% filter(month < maxMonthYTD) %>% summarise(crimeIncidents = sum(IS_CRIME) + sum(IS_TRAFFIC)) %>% arrange(year) p = ggplot(df,aes(x=factor(year),y=crimeIncidents,fill=crimeType)) p + geom_bar(stat='identity') + ggtitle('Incidents Reported by Year') + xlab('Year') + ylab('Incidents') + theme(plot.title = element_text(hjust = 0.5)) + guides(fill = guide_legend(title='Incident Type'))
Having isolated only months that have occurred in each year, we’ve seen volume increase most years. The most rapid growth seemed to occur between 2012 – 2014. It appears as if traffic violations seem to be roughly flat and the growth in crimes is much higher. I’ll have to dig into those years in order to see if there’s evidence of a change in crime rate or if something else is hiding in the data.
What I’ll do in the next crime posts
- Determine year-over-year differences in crime
- Dig into the apparent crime rate growth from 2012 – 2014
- Look for patterns by location
- Answer the question: What types of crimes have grown the most in the last 5 years?
Code used in this post is on my GitHub