This past weekend, I attended an event put on by the Cook County State’s Attorney’s Office focusing on the SAO’s public case-level data. It was a super informative, caffeine-filled couple of days.

I put this post together hoping to share some of the things I learned. It’s structured as a “couch to 5k” type exercise (yay!), from the very fundamentals of R to an analysis of monthly drug felonies by race.

1. Installing R.

For Windows:

  • Download R (R is a language and environment for statistical computing and graphics).
  • Once the setup file is downloaded, run the file and follow the instructions to install R on your machine (you can use the default options).
  • Download RStudio (RStudio is a free and open-source integrated development environment for R).
  • Once the setup file is downloaded, run the file and follow the instructions to install RStudio on your machine (you can use the default options).

For Mac:

  • Download R
  • Once the setup file is downloaded, move the file to the Applications folder and follow the instructions to install R on your machine (you can use the default options).
  • Download RStudio
  • Once the setup file is downloaded, move the file to the Applications folder and follow the instructions to install RStudio on your machine (you can use the default options).

2. Loading the data in RStudio

Step 1: Downloading the data

  1. The data tables with the case level data that we will be using are available
  2. To download the “Initiation” table, click “Initiation”.
  3. On the new tab that opens once you click this, click “Export” and then “CSV” to download the data as a CSV file.
  4. Repeat this for the other three files, “Dispositions”, “Sentencing”, and “Intake”, on the main page here.
  5. If you’re using a Mac, create a new folder on the desktop and name it “SAO_Data”. If you’re using Windows, create a new folder in your “Documents” folder and name it “SAO_Data”.
  6. Move the downloaded files to this new folder.

Step 2: Loading the data in RStudio

1. Once you have installed R and RStudio, downloaded all the data tables, and moved them to the “SAO_Data” folder, open RStudio by clicking the RStudio icon on your desktop. If you don’t see the RStudio icon on your desktop, open Run (if you’re using Windows) or Finder (if you’re using Mac) and look for RStudio there.

2. Once it opens, you should see 4 panels on your screen:


3. In the bottom left panel (the “console”), copy and paste the following command and press Enter:
For Windows users:
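
The command for this step points R at the folder that holds the downloaded files (its "working directory"). Here is a minimal sketch, assuming you created “SAO_Data” inside Documents as described in Step 1. On most Windows installs R treats "~" as the Documents folder, but that is an assumption about your setup, so adjust the path if R says the folder is missing.

```r
# Assumed location: Documents/SAO_Data (on Windows, "~" usually means Documents)
data_dir <- "~/SAO_Data"

if (dir.exists(data_dir)) {
  setwd(data_dir)   # read.csv("Initiation.csv") will now look in this folder
} else {
  message("Folder not found, check the path: ", path.expand(data_dir))
}
getwd()             # prints the current working directory
```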


For Mac users:
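
On a Mac, the same idea applies: tell R which folder holds the downloaded files. Here is a minimal sketch, assuming you created “SAO_Data” on the desktop as described in Step 1. The path is an assumption about your setup, so adjust it if you put the folder somewhere else.

```r
# Assumed location: the SAO_Data folder on the desktop ("~" is your home folder)
data_dir <- "~/Desktop/SAO_Data"

if (dir.exists(data_dir)) {
  setwd(data_dir)   # read.csv("Initiation.csv") will now look in this folder
} else {
  message("Folder not found, check the path: ", path.expand(data_dir))
}
getwd()             # prints the current working directory
```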


4. Now you’re ready to load the four tables (yay!). Run the following command in the console:

initiation <- read.csv("Initiation.csv")

This will load the CSV file into a “data frame” (R’s term for a data set). Repeat this for each of the other files (dispositions, sentencing, and intake), replacing the file name and the data frame name in the command above:

disposition <- read.csv("Dispositions.csv")
sentencing <- read.csv("Sentencing.csv")
intake <- read.csv("Intake.csv")

5. Once you have run all four commands, you should see the data frames appear in the top right panel.
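
You can also check a loaded table from the console instead of the panel. Here is a small sketch that writes a tiny made-up CSV to a temporary file (so it runs anywhere) and then inspects it. The file contents are invented for illustration; only the column names CASE_ID and GENDER come from the real data.

```r
# Write a tiny made-up CSV to a temporary file so the example runs anywhere
tmp <- tempfile(fileext = ".csv")
writeLines(c("CASE_ID,GENDER", "1,Female", "2,Male"), tmp)

toy <- read.csv(tmp)   # same call used for the real tables
dim(toy)               # number of rows, then number of columns
names(toy)             # column names
head(toy)              # first few rows
```

With the real data, the same checks would be dim(sentencing), names(sentencing), and so on.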

3. Overview of Data

  • Sentencing: The sentencing data presented here reflects the judgment imposed by the court on people who have been found guilty. Each row represents a charge that has been sentenced.
  • Dispositions: The disposition data presented here reflects the culmination of the fact-finding process that leads to the resolution of a case. Each row represents a charge that has been disposed of.
  • Initiation: The initiation results data presented here reflects all of the arrests that came through the door of the State’s Attorney’s Office (SAO). An initiation is how an arrest turns into a “case” in the courts. Most cases are initiated through a process known as felony review, in which SAO attorneys decide whether or not to prosecute. Cases may also be indicted by a grand jury or, in narcotics cases, filed directly by law enforcement (labeled “BOND SET (Narcotics)” in this data). Included in this data set are the defendant counts by initiation and year. This data includes felony cases handled by the Criminal, Narcotics, and Special Prosecution Bureaus. It does not include information about cases processed through the Juvenile Justice and Civil Actions Bureaus.
  • Intake: The intake data presented here reflects the cases brought in for review. Each row represents a potential defendant in a case.

Here’s a case-level matrix to help further understand the data sets.

4. Exploring the Data

View the data

To view the data either click the desired table in the top right panel or use this code in the console:
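
One way to do this from the console is View(), which opens a table in a spreadsheet-style tab in RStudio. Here is a sketch using a made-up stand-in table so it runs on its own; the values are invented for illustration.

```r
# Made-up stand-in for the real sentencing table
toy_sentencing <- data.frame(
  CASE_ID = c(101, 102, 103),
  SENTENCE_TYPE = c("Prison", "Probation", "Prison")
)

if (interactive()) View(toy_sentencing)  # opens a spreadsheet-style viewer in RStudio
head(toy_sentencing)                     # prints the first rows in the console
str(toy_sentencing)                      # lists each column and its type
```

With the real data loaded, that would be View(sentencing).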


To get an overview of a column, you can create a frequency table (this tells us every value of the SENTENCE_TYPE column and how many times that value appears in the data set).
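
A frequency table like this can be built with table(). Here is a sketch on a made-up stand-in table; the values are invented for illustration and are not real counts.

```r
# Made-up stand-in for the real sentencing table
toy_sentencing <- data.frame(
  SENTENCE_TYPE = c("Prison", "Probation", "Prison", "Jail", "Probation", "Prison")
)

counts <- table(toy_sentencing$SENTENCE_TYPE)  # one count per distinct value
counts
sort(counts, decreasing = TRUE)                # most common values first
```

With the real data loaded, that would be table(sentencing$SENTENCE_TYPE).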


This uses the table function to count the values in the SENTENCE_TYPE column within the sentencing data frame. Below are the results you should see.


Filter the data

One way to filter data is creating new subsets of data based on certain criteria. Below are some examples:

sentence_female <- sentencing[which(sentencing$GENDER == "Female"),] ## keeps only the rows where GENDER is Female
sentence_under21 <- sentencing[which(sentencing$AGE_AT_INCIDENT <= 21),] ## keeps only the rows where age is 21 or younger
sentence_probation <- sentencing[which(sentencing$SENTENCE_TYPE %in% c("Probation", "2nd Chance Probation")),] ## %in% means 'is one of'
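
If you're wondering why these filters wrap the condition in which(): it drops rows where the condition can't be evaluated because the value is missing (NA). Here is a small sketch on made-up data showing the difference; the toy table and its values are invented for illustration.

```r
# Made-up data with one missing age
toy <- data.frame(
  GENDER = c("Female", "Male", "Female", "Male"),
  AGE_AT_INCIDENT = c(19, NA, 35, 21)
)

with_which    <- toy[which(toy$AGE_AT_INCIDENT <= 21), ]  # NA row is dropped
without_which <- toy[toy$AGE_AT_INCIDENT <= 21, ]         # NA row comes back as a row of NAs

nrow(with_which)     # 2
nrow(without_which)  # 3
```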

Another way to filter the data to make it more manageable is to select only specific columns for a new data frame.

cases <- sentencing[,c("CASE_ID", "CASE_PARTICIPANT_ID", 
                     "AGE_AT_INCIDENT", "GENDER", "RACE", 
                     "LENGTH_OF_CASE_in_Days","INCIDENT_CITY")] ##This will create a new data frame that only includes these specific columns

Merging with Another Data set

With this data, it is helpful to combine multiple data sets into one. For example, if I wanted to look at information from both the sentencing and disposition data frames, I could run the following code:

merged <- merge(sentencing, disposition, by=c("CASE_ID","CASE_PARTICIPANT_ID","CHARGE_ID"))

Note: the merge() call above assumes that the columns CASE_ID, CASE_PARTICIPANT_ID, and CHARGE_ID exist in both data sets. If the column names were different, you would need to name the matching column in each data set explicitly, like this:

merge(data1, data2, by.x="column_in_data1", by.y="column_in_data2")
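
One more thing worth knowing about merge(): by default it keeps only the rows whose key values appear in both tables, so unmatched rows quietly disappear. Here is a sketch on made-up tables; the values are invented, and the CHARGE_DISPOSITION column name is used here only as a hypothetical example.

```r
# Made-up tables: case 3 has no disposition, case 4 has no sentence
sent <- data.frame(CASE_ID = c(1, 2, 3),
                   SENTENCE_TYPE = c("Prison", "Probation", "Jail"))
disp <- data.frame(CASE_ID = c(1, 2, 4),
                   CHARGE_DISPOSITION = c("Plea Of Guilty", "Plea Of Guilty", "Nolle Prosecution"))

inner <- merge(sent, disp, by = "CASE_ID")                # only cases 1 and 2
left  <- merge(sent, disp, by = "CASE_ID", all.x = TRUE)  # keeps case 3, with NA for the disposition

nrow(inner)  # 2
nrow(left)   # 3
```

Comparing nrow() before and after a merge is a quick way to see how many rows matched.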

5. Keep going!

Those were some basics to get you started — now comes the fun!

I’m including an example script to look through and play around with. It walks you through what exploring monthly drug felonies by race would look like in R.

  1. Click here.
  2. Download the file.
  3. Open in RStudio.
  4. Step through the code and comments.
  5. Explore the data for yourself!
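
To give a feel for where this is headed, here is a rough sketch of counting drug cases by month and race using only the tools from this post. This is not the example script, just a toy version on made-up data. The ARREST_DATE and OFFENSE_CATEGORY column names and the "Narcotics" label are assumptions for illustration, and dates in a real CSV do not load as Date values, so they would first need converting with as.Date() and a matching format.

```r
# Made-up data; column names other than RACE are assumed for illustration
toy <- data.frame(
  ARREST_DATE = as.Date(c("2018-01-05", "2018-01-20", "2018-02-11",
                          "2018-02-14", "2018-02-20")),
  OFFENSE_CATEGORY = c("Narcotics", "Narcotics", "Narcotics",
                       "Retail Theft", "Narcotics"),
  RACE = c("Black", "White", "Black", "White", "Black")
)

drug <- toy[which(toy$OFFENSE_CATEGORY == "Narcotics"), ]  # keep only drug cases
drug$MONTH <- format(drug$ARREST_DATE, "%Y-%m")            # e.g. "2018-01"
monthly <- table(drug$MONTH, drug$RACE)                    # rows are months, columns are races
monthly
```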