Project Background and Motivations
Crime rates in and around the University are a prime concern for many members of the University community and for prospective students. This past October, the University saw a slew of armed robberies in Hyde Park and on campus, and Reddit posts about the University of Chicago regularly ask how safe it is to attend the University and live in Hyde Park. However, research shows that perceived crime rates often diverge from actual ones. According to a 2023 study by the American Psychological Association, perceptions of local and national crime rates vary distinctly with the type of media a person consumes: users of crime-reporting applications like Citizen perceive higher local crime rates regardless of their area’s actual crime rate. Gallup polls show that people’s perceptions of local crime rates across the country are at a 30-year high. Yet according to the Federal Bureau of Investigation, the national violent crime rate is 53 percent, and the property crime rate 39 percent, of what each was 30 years ago.
The Jeanne Clery Disclosure of Campus Security Policy and Campus Crime Statistics Act, commonly called the Clery Act, requires universities to report to the student body crime statistics covering general criminal offenses, violence against women, hate crimes, and arrests or referrals for disciplinary action, along with campus safety information. The federal act also requires universities to maintain a daily crime log and a fire log, which the University of Chicago Police Department (UCPD) publishes on the University’s behalf.
However, the UCPD’s crime and fire log shows only up to five incidents at a time, has no built-in map of incident locations, and lets users filter only by date range. Although the log is somewhat cumbersome to navigate, UCPD does give users access to the historical record of its crime and fire logs. By comparison, DePaul University’s crime and fire log shows only a brief description of each event and its date as a month and day, e.g. January 27.
To make the data more accessible, The Maroon’s Technology team developed the UChicago Police Department Incident Reporter, a web application consisting of two distinct components: one to scrape and clean the data for every incident reported to UCPD from July 2011 to the present, and another to visualize it. The project strives to paint a more nuanced picture of incidents reported to UCPD and highlight any patterns that may exist within the incident data.
What is web scraping?
Web scrapers are applications that download webpages, parse out meaningful information, and typically save that information for later analysis. Scraping is automated because doing the same work by hand is slow and tedious. For example, to collect all UCPD incident data between January 1 and January 22, 2023, one would have to manually click through 21 pages of the log and do an immense amount of copying and pasting. Meaningfully parsing that data (getting latitude and longitude for each location, creating dates from reported times, standardizing incident types, etc.) would take at least an hour. A web scraper does the same task in under 20 seconds.
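To make the download, parse, and save steps concrete, here is a minimal sketch in Python using only the standard library. It is not The Maroon’s scraper: it assumes, purely for illustration, that each incident is a row of an HTML table whose cells hold the incident type, location, and reported time in that order, and it writes CSV as a stand-in for real storage. The download step (for example with `urllib.request.urlopen`) is left out so the parser can run on a made-up sample string.

```python
import csv
import io
from html.parser import HTMLParser

FIELDS = ["incident", "location", "reported"]  # assumed column order

# Made-up sample markup; the real UCPD page may look nothing like this.
SAMPLE = """
<table>
  <tr><th>Incident</th><th>Location</th><th>Reported</th></tr>
  <tr><td>Theft</td><td>5800 S. Example Ave.</td><td>1/3/23 2:15 PM</td></tr>
</table>
"""


class IncidentTableParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being read, or None outside a <tr>
        self._cell = None  # text fragments of the current <td>, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows use <th>, so they end up empty
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_incidents(html):
    """Parse step: turn one page of markup into a list of dictionaries."""
    parser = IncidentTableParser()
    parser.feed(html)
    parser.close()
    return [dict(zip(FIELDS, row)) for row in parser.rows]


def save_csv(incidents):
    """Save step: serialize parsed incidents as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(incidents)
    return buffer.getvalue()


if __name__ == "__main__":
    print(parse_incidents(SAMPLE))
```

Running the file prints one dictionary for the made-up row; the header row is skipped because it contains `<th>` cells rather than `<td>` cells.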
UCPD Web Data Reporting Application
Current features of the web scraping application are:
- Scraping: Incidents are scraped at regular intervals, starting from the reported date of the last saved incident.
- For example, if the last saved incident was reported on September 30, 2022, the scraper application will download, parse, and save the incident data from that date forward.
- Incident formatting: Incidents are converted from raw text into structured data types that make them easier to analyze now and in the future.
- Each address is looked up with the Google Maps API and labeled according to the results it returns.
- Incident types are normalized to improve readability and to ease machine-learning categorization in future analysis. Normalization includes lemmatization, spelling correction, and consistent capitalization.
- Incident categorization: Incidents categorized solely as ‘Information’ are run through an XGBoost model to surface incident details that UCPD’s own categorization does not record.
- Incident visualization: Incidents are saved to a cloud datastore and rendered in graphics that update as new incidents are added.
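The incremental behavior described under Scraping can be sketched as splitting the span from the last saved incident’s date through today into date-range windows, since date range is the one filter the UCPD log offers. This is a hypothetical illustration: the window size, the `scrape_windows` and `drop_duplicates` names, and the fields used as a duplicate key are all assumptions.

```python
from datetime import date, timedelta


def scrape_windows(last_saved, today, window_days=7):
    """Split the span from last_saved through today into date-range windows.

    Starting at last_saved itself (not the following day) re-fetches that day,
    so incidents reported later on the same day are not missed.
    """
    if window_days < 1:
        raise ValueError("window_days must be at least 1")
    windows = []
    start = last_saved
    while start <= today:
        end = min(start + timedelta(days=window_days - 1), today)
        windows.append((start, end))
        start = end + timedelta(days=1)
    return windows


def drop_duplicates(saved_keys, scraped):
    """Return only incidents not already saved; updates saved_keys in place.

    The (incident, location, reported) key is a hypothetical choice of fields.
    """
    fresh = []
    for incident in scraped:
        key = (incident["incident"], incident["location"], incident["reported"])
        if key not in saved_keys:
            saved_keys.add(key)
            fresh.append(incident)
    return fresh


if __name__ == "__main__":
    # Last saved incident reported on September 30, 2022, as in the example above.
    for start, end in scrape_windows(date(2022, 9, 30), date(2022, 10, 14)):
        print(start, "to", end)
```

Because the last saved date is fetched again, some incidents come back twice; pairing the windows with a duplicate check keeps the re-fetch safe.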
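Address lookup can be sketched against the Google Maps Platform Geocoding API, which returns JSON with a `status` field and a list of `results`, each carrying a `formatted_address` and a `geometry.location` latitude and longitude. The helper names and the `, Chicago, IL` suffix appended to each address are assumptions for illustration, not details from the project; only `geocode` touches the network and needs a real API key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(address, api_key, city_suffix=", Chicago, IL"):
    """Build a Geocoding API request URL.

    city_suffix is a hypothetical default meant to keep short street
    addresses from matching streets of the same name in other cities.
    """
    query = urlencode({"address": address + city_suffix, "key": api_key})
    return GEOCODE_URL + "?" + query


def parse_geocode(payload):
    """Return (formatted_address, lat, lng) for the top result, or None."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None  # e.g. "ZERO_RESULTS" or an error status
    top = payload["results"][0]
    location = top["geometry"]["location"]
    return top["formatted_address"], location["lat"], location["lng"]


def geocode(address, api_key):
    """Fetch and parse one address (requires network access and a valid key)."""
    with urlopen(geocode_url(address, api_key), timeout=10) as response:
        return parse_geocode(json.load(response))
```

Keeping the parsing separate from the HTTP call lets the labeling logic be tested on saved responses without spending API quota.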
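The normalization steps in the list above can be approximated with the standard library alone. In this hypothetical sketch, a small hand-written plural map stands in for real lemmatization (a library such as NLTK would normally handle that), `difflib.get_close_matches` corrects misspellings against an assumed vocabulary, and `str.title` enforces consistent capitalization. The word lists are invented for the example.

```python
import difflib
import re

# Hypothetical vocabulary; a real one would be built from the log itself.
CANONICAL_WORDS = ["theft", "battery", "burglary", "robbery", "armed",
                   "information", "damage", "property", "criminal",
                   "trespass", "vehicle", "motor"]

# Crude stand-in for lemmatization: explicit plural -> singular mappings.
LEMMAS = {"thefts": "theft", "robberies": "robbery",
          "burglaries": "burglary", "vehicles": "vehicle"}


def normalize_word(word):
    """Lemmatize via the lookup table, then fix close misspellings."""
    word = LEMMAS.get(word, word)
    if word in CANONICAL_WORDS:
        return word
    match = difflib.get_close_matches(word, CANONICAL_WORDS, n=1, cutoff=0.8)
    return match[0] if match else word


def normalize_incident_type(raw):
    """Lowercase, keep letter runs only, fix each word, then title-case."""
    words = re.findall(r"[a-z]+", raw.lower())
    return " ".join(normalize_word(w) for w in words).title()


if __name__ == "__main__":
    for raw in ["ARMED  ROBBERIES", "burglery", "theft "]:
        print(repr(raw), "->", normalize_incident_type(raw))
```

The `cutoff` of 0.8 is a tradeoff: lower values fix more typos but risk merging genuinely different words.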
The Maroon firmly believes in the power of open data and that all public UCPD data should be easily viewable. However, because the platform is built for data transparency rather than investigation, incident reports involving sexual assault, medical evacuations, domestic violence, and the like have been omitted from the map. These incidents remain useful in aggregate and can serve as indicators of overall safety in Hyde Park with respect to reported incidents of a sensitive nature. A complete list of incident types omitted from location-specific visuals is available here.
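Omitting sensitive incidents from the map while still counting them in aggregate amounts to filtering one output but not the other. A minimal sketch, assuming incidents are dictionaries with a `type` field; the set of omitted type names below is a hypothetical subset, not The Maroon’s published list.

```python
from collections import Counter

# Hypothetical subset of omitted types; the real list is published separately.
OMITTED_FROM_MAP = {"Sexual Assault", "Medical Evacuation", "Domestic Violence"}


def split_for_display(incidents):
    """Return (map_points, type_counts).

    Omitted incidents never reach the map, but every incident still counts
    toward the aggregate totals by type.
    """
    map_points = [i for i in incidents if i["type"] not in OMITTED_FROM_MAP]
    type_counts = Counter(i["type"] for i in incidents)
    return map_points, type_counts
```

Filtering at display time rather than at ingestion keeps the full dataset available for aggregate analysis.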
The data in the UChicago Police Department Incident Reporter visualizations is updated every weekday morning. Bugs in the processing pipeline are fixed regularly, and new features are added as time allows. The project aims to give Hyde Park residents a more objective view of neighborhood crime rates and to build trust between the people who patrol the area and those who live within its borders.
Technology editor’s note: If you’d like to see any additional analysis in this project, The Maroon invites you to open an issue on the UCPD Incident Reporting GitHub repository. We cannot promise to get to every request, but The Maroon welcomes any recommendations or pull requests that make this application more useful to the UChicago and Hyde Park communities.