With the clusters defined, I dug deeper into the types of events within each one. Northern regions, especially around New York and Chicago, showed higher instances of violence against civilians. Central U.S. clusters were dominated by protests, which were frequent but often peaceful. Southern regions revealed a more mixed profile, including battles and unrest. I also calculated the average fatalities within each cluster, which helped me understand severity in addition to frequency. These regional differences were striking and raised questions about local governance, community response, and law enforcement practices. The visuals and stats from this week added critical depth to the report.
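The per-cluster breakdown described above can be sketched with pandas. This is a minimal illustration rather than the project's actual code: the column names (`cluster`, `event_type`, `fatalities`) and the toy rows are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical toy rows; column names are assumptions, not the real dataset schema.
events = pd.DataFrame({
    "cluster": [0, 0, 1, 1, 1, 2],
    "event_type": [
        "Violence against civilians", "Protests",
        "Protests", "Protests", "Protests",
        "Battles",
    ],
    "fatalities": [2, 0, 0, 0, 1, 3],
})

# Share of each event type within each cluster (each row sums to 1).
type_share = pd.crosstab(events["cluster"], events["event_type"], normalize="index")

# Mean fatalities per cluster as a rough severity measure.
mean_fatalities = events.groupby("cluster")["fatalities"].mean()
```

Reading `type_share` row by row shows which event type dominates a cluster, while `mean_fatalities` separates frequent but low-severity clusters from rarer, more lethal ones.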
K-Means Clustering and Elbow Method
This week, I implemented K-Means clustering on the spatial data. I used the geopy package to calculate real-world distances and applied the elbow method to find the ideal number of clusters. The curve flattened around k=4, so I proceeded with that value. After fitting the model, I interpreted the clusters: one group concentrated in the Northern U.S., another in the Central U.S., one in the South, and one covering scattered high-density zones. I assigned a color to each cluster and visualized them using matplotlib. The map pointed to geographic differences in the type and intensity of political events across regions. It was exciting to finally see structure emerge from the noise.
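The elbow method can be sketched with scikit-learn as below. This is an illustration under stated assumptions, not the project's code: the points are synthetic, the four centers are made up, and latitude/longitude are treated as planar coordinates (scikit-learn's `KMeans` uses Euclidean distance, so the geopy distance step mentioned above is not reproduced here).

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic (longitude, latitude) points around four made-up centers.
# These are NOT rows from the dataset; they only give the sketch something to fit.
rng = np.random.default_rng(0)
centers = np.array([[-74.0, 40.7], [-87.6, 41.9], [-95.4, 29.8], [-122.7, 45.5]])
coords = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Fit K-Means for a range of k and record the inertia
# (within-cluster sum of squared distances) of each fit.
ks = list(range(1, 9))
inertias = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(coords).inertia_
    for k in ks
]

# The "elbow" is the k after which inertia stops dropping sharply;
# in practice it is read off a plot of inertias against ks.
model = KMeans(n_clusters=4, n_init=10, random_state=0).fit(coords)
labels = model.labels_
```

Plotting `inertias` against `ks` with matplotlib gives the elbow curve, and `labels` can be passed as the color argument of a scatter plot to draw one color per cluster.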
Feature Selection and Location Mapping
After cleaning the dataset, I focused on choosing features for clustering. Latitude, longitude, year, and event type stood out as the most meaningful. I started mapping event locations using scatter plots to get a visual sense of clustering patterns. Early patterns showed high densities in specific areas like Washington D.C., Portland, and parts of California. This confirmed that location-based grouping would be effective. I also calculated event frequency per state, which revealed states like California and New York had higher counts of protest-related activity. These initial visuals were important for narrowing down clustering methods and identifying possible biases.
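Counting events per state, as described above, might look like the following sketch. The `state` and `event_type` column names and the sample rows are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical rows; not taken from the dataset.
events = pd.DataFrame({
    "state": ["California", "California", "New York", "Oregon", "New York", "California"],
    "event_type": ["Protests", "Protests", "Protests", "Battles", "Protests", "Battles"],
})

# Total events per state, most active first.
events_per_state = events["state"].value_counts()

# Protest-only counts per state.
is_protest = events["event_type"] == "Protests"
protests_per_state = events.loc[is_protest, "state"].value_counts()
```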
Dataset Introduction and Initial Cleaning
This week, we officially transitioned to Project 2 using the ACLED U.S. political violence dataset. I began by loading the dataset and exploring its basic structure: thousands of entries covering protest events, civilian-targeted violence, and battles across different U.S. states. The first issues I noticed were inconsistent naming in event types and missing location fields. I standardized the event-type labels and filtered out rows lacking critical spatial data. After that, I created a checklist of features that might influence regional political unrest, such as year, latitude/longitude, and number of fatalities. This setup prepared me for spatial analysis in the coming weeks.
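The two cleaning steps above (standardizing event-type labels and dropping rows without coordinates) can be sketched in pandas. The column names and the label variants shown are assumptions; the actual inconsistencies in the dataset are not listed in these notes.

```python
import pandas as pd

# Hypothetical raw rows with made-up label variants and missing coordinates.
raw = pd.DataFrame({
    "event_type": ["Protests", " protests", "BATTLES", "Violence against civilians"],
    "latitude": [38.9, None, 29.8, 40.7],
    "longitude": [-77.0, -122.7, None, -74.0],
    "fatalities": [0, 0, 2, 1],
})

# Normalize whitespace and capitalization so variants collapse into one label.
raw["event_type"] = raw["event_type"].str.strip().str.capitalize()

# Drop rows that cannot be placed on a map.
clean = raw.dropna(subset=["latitude", "longitude"]).reset_index(drop=True)
```

Checking `raw["event_type"].unique()` before and after the normalization step is a quick way to confirm that the variants were merged as intended.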
Project Finalization
With my statistical work completed, I finalized the full report this week. I included cumulative plots, clustering maps, and Monte Carlo simulations to strengthen the robustness of the analysis. The report showed that, in the police shootings data, Black individuals were on average 7 years younger than White individuals. I used both visuals and statistical tests to support this conclusion, making sure each method reinforced the others. In the final discussion, I proposed that systemic bias, economic disparity, and policing patterns could contribute to these differences. I also provided policy recommendations focused on bias training and community oversight. The report was submitted by the March 7 deadline. It was a comprehensive analysis that highlighted both the human impact and the data-driven patterns behind police shootings in the U.S.
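One common way to use Monte Carlo simulation for a difference in mean age is a permutation test, sketched below with the standard library only. The exact simulation design used in the report is not described here, so this is an assumed stand-in; the function name `permutation_p_value` and the ages are made up for illustration and are not values from the report's data.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_iter=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_iter):
        # Shuffle group membership, then recompute the mean difference.
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_iter

# Made-up ages for illustration only.
ages_a = [22, 25, 27, 30, 31, 33, 35, 28]
ages_b = [30, 34, 36, 38, 40, 41, 45, 37]
observed_diff, p_value = permutation_p_value(ages_a, ages_b, n_iter=2000)
```

A small `p_value` means that random relabeling of the groups rarely produces a gap as large as the observed one, which is the sense in which the simulation supports the statistical tests.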