Final Thoughts and Reflections

With both major projects now completed, I took some time this week to reflect on everything I’ve learned over the past few months. From analyzing racial and age disparities in police shootings to uncovering patterns of political violence across the U.S., the experience has been intense and eye-opening. Project 1 helped me understand how statistical tools like t-tests and Cohen’s d can quantify bias and show meaningful differences between demographic groups. The visualizations and cumulative plots revealed age gaps that go beyond numbers—they highlight systemic issues.

In Project 2, I switched gears toward unsupervised learning and spatial clustering. Using K-means on real-world event data showed how regional and temporal patterns emerge from political unrest. The clusters not only showed where violence was happening but also what kind of unrest was occurring, whether protests, battles, or violence against civilians. I also learned the importance of careful data cleaning, feature selection, and visualization in driving analysis that makes sense both statistically and socially.

Across both projects, the biggest takeaway was how data science can contribute to social understanding and policy change. These aren’t just numbers—they’re lives, locations, and events that matter. I’m grateful to have had the opportunity to dig deep into such meaningful topics. Looking forward, I hope to keep combining data science with real-world issues to build more awareness and create impact through insight.

Wrapping Up and Policy Implications

In the final stretch, I completed the written report and finalized conclusions. The clustering model helped us uncover meaningful patterns in political violence—each region had distinct behaviors in both event type and intensity. One of the key takeaways was the rise in protest activity over recent years, paired with a decline in average fatalities. This may reflect changing public behavior or improved law enforcement protocols. Our discussion emphasized how clustering tools can be useful for early-warning systems and policy decisions. The project wrapped up on a strong note, with a clear narrative, sound statistics, and actionable insights. Submission completed!

Final Visualizations and Storytelling

As we neared the project deadline, I refined our visuals for clarity and impact. I redesigned the scatter plot to highlight each cluster’s dominant event type, added legends, and used subtle color themes for accessibility. I also created a composite map that showed cluster boundaries, centroid locations, and event types with different shapes. These visuals helped communicate the narrative we were building—that political unrest in the U.S. has regional signatures, with the North experiencing more direct violence and Central regions showing higher protest frequency. I also began drafting the final blog summary for the project.
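For reference, here's a minimal sketch of how a scatter plot along these lines could be put together, with one color per cluster and one marker shape per event type. The column names (latitude, longitude, cluster, event_type) and the marker mapping are assumptions for illustration, not the exact code from the project.

```python
# Sketch: cluster scatter with event-type markers (column names are placeholders).
import matplotlib.pyplot as plt
import pandas as pd

def plot_clusters(df: pd.DataFrame) -> None:
    """Plot events colored by cluster, with a distinct marker per event type."""
    colors = plt.cm.tab10.colors  # colorblind-friendly qualitative palette
    markers = {"Protests": "o", "Battles": "^", "Violence against civilians": "s"}

    fig, ax = plt.subplots(figsize=(9, 6))
    for cluster_id, cluster_df in df.groupby("cluster"):
        for event_type, sub in cluster_df.groupby("event_type"):
            ax.scatter(
                sub["longitude"], sub["latitude"],
                color=colors[int(cluster_id) % len(colors)],
                marker=markers.get(event_type, "x"),
                alpha=0.6, s=20,
                label=f"Cluster {cluster_id} - {event_type}",
            )
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    ax.set_title("Political unrest events by cluster and event type")
    ax.legend(fontsize=7, loc="upper right")
    plt.tight_layout()
    plt.show()
```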


Yearly Trends and Fatality Reduction

To understand how political unrest evolved over time, I grouped the data by year and event type. A notable trend emerged: between 2018 and 2020, there was a clear spike in protest activity, especially in the Central U.S. Simultaneously, the average number of fatalities per event dropped across all regions. I plotted line graphs to show this temporal trend and created heatmaps to track changes in activity by state. These visuals suggested a shift in how unrest is expressed—more protest, less violence. This could be due to improved response protocols or changing public engagement. I’ll explore this further in the discussion section of the report.
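A rough sketch of how these yearly breakdowns could be computed; the column names (year, event_type, state, fatalities) are placeholders rather than the exact fields used in the project.

```python
# Sketch of the yearly trend plots and state-by-year heatmap (placeholder column names).
import matplotlib.pyplot as plt
import pandas as pd

def yearly_trends(df: pd.DataFrame) -> None:
    # Event counts per year and event type -> line plot showing the protest spike.
    counts = df.groupby(["year", "event_type"]).size().unstack(fill_value=0)
    counts.plot(kind="line", marker="o", figsize=(9, 4), title="Events per year by type")

    # Mean fatalities per event, per year -> the severity trend.
    df.groupby("year")["fatalities"].mean().plot(
        kind="line", marker="s", figsize=(9, 3), title="Mean fatalities per event"
    )

    # State-by-year activity heatmap.
    pivot = df.groupby(["state", "year"]).size().unstack(fill_value=0)
    fig, ax = plt.subplots(figsize=(8, 10))
    im = ax.imshow(pivot.values, aspect="auto", cmap="viridis")
    ax.set_xticks(range(len(pivot.columns)))
    ax.set_xticklabels(pivot.columns)
    ax.set_yticks(range(len(pivot.index)))
    ax.set_yticklabels(pivot.index, fontsize=6)
    fig.colorbar(im, ax=ax, label="Event count")
    plt.tight_layout()
    plt.show()
```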

Cluster Insights and Event-Type Analysis

Now that the clusters were defined, I dove deeper into the types of events within each. Northern regions, especially around New York and Chicago, showed higher instances of violence against civilians. Central U.S. clusters were dominated by protests, often peaceful but frequent. Southern regions revealed a more mixed profile, including battles and unrest. I also calculated the average fatalities within each cluster, which helped me understand severity in addition to frequency. These regional differences were striking and raised questions about local governance, community response, and law enforcement practices. The visuals and stats from this week added critical depth to the report.
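A small sketch of the kind of per-cluster summary described above, again with placeholder column names (cluster, event_type, fatalities).

```python
# Sketch: event-type mix and severity per cluster (placeholder column names).
import pandas as pd

def summarize_clusters(df: pd.DataFrame) -> pd.DataFrame:
    # Share of each event type within a cluster (each row sums to 1).
    type_share = (
        df.groupby(["cluster", "event_type"]).size()
          .unstack(fill_value=0)
          .apply(lambda row: row / row.sum(), axis=1)
    )
    # Severity: mean fatalities per event in each cluster.
    severity = df.groupby("cluster")["fatalities"].mean().rename("mean_fatalities")
    return type_share.join(severity)
```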

K-Means Clustering and Elbow Method

This week, I implemented K-Means clustering on the spatial data. I used the geopy package to calculate real-world distances and applied the elbow method to find the ideal number of clusters. The curve flattened around k=4, so I proceeded with that. After fitting the model, I interpreted the clusters: one group concentrated in Northern U.S., another in Central, one in the South, and one covering scattered high-density zones. I assigned colors to each cluster and visualized them using matplotlib. This confirmed geographic differences in the type and intensity of political events across regions. It was exciting to finally see structure emerge from the noise.
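Here's a simplified sketch of the elbow search and final fit. It uses scikit-learn's KMeans on standardized coordinates rather than geopy-based distances, so treat it as an approximation of the approach, not the exact pipeline; column names are placeholders.

```python
# Sketch: elbow curve and final K-Means fit on scaled coordinates (placeholder columns).
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def fit_kmeans(df, k_max=10, final_k=4):
    X = StandardScaler().fit_transform(df[["latitude", "longitude"]])

    # Elbow method: inertia (within-cluster sum of squares) for k = 1..k_max.
    inertias = []
    for k in range(1, k_max + 1):
        km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
        inertias.append(km.inertia_)
    plt.plot(range(1, k_max + 1), inertias, marker="o")
    plt.xlabel("k")
    plt.ylabel("Inertia")
    plt.title("Elbow curve")
    plt.show()

    # Final model at the chosen k (the curve flattened around k = 4).
    model = KMeans(n_clusters=final_k, n_init=10, random_state=42).fit(X)
    df = df.copy()
    df["cluster"] = model.labels_
    return df, model
```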

Feature Selection and Location Mapping

After cleaning the dataset, I focused on choosing features for clustering. Latitude, longitude, year, and event type stood out as the most meaningful. I started mapping event locations using scatter plots to get a visual sense of clustering patterns. Early patterns showed high densities in specific areas like Washington D.C., Portland, and parts of California. This confirmed that location-based grouping would be effective. I also calculated event frequency per state, which revealed states like California and New York had higher counts of protest-related activity. These initial visuals were important for narrowing down clustering methods and identifying possible biases.
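A minimal sketch of these exploratory steps, assuming placeholder column names (latitude, longitude, state):

```python
# Sketch: initial location scatter and per-state event counts (placeholder columns).
import matplotlib.pyplot as plt
import pandas as pd

def explore_locations(df: pd.DataFrame) -> pd.Series:
    # Quick look at where events concentrate geographically.
    plt.figure(figsize=(8, 5))
    plt.scatter(df["longitude"], df["latitude"], s=5, alpha=0.3)
    plt.xlabel("Longitude")
    plt.ylabel("Latitude")
    plt.title("Event locations")
    plt.show()

    # Event frequency per state, highest first.
    state_counts = df["state"].value_counts()
    print(state_counts.head(10))
    return state_counts
```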

Dataset Introduction and Initial Cleaning

This week, we officially transitioned to Project 2 using the ACLED U.S. political violence dataset. I began by loading the dataset and exploring the basic structure—this included thousands of entries covering protest events, civilian-targeted violence, and battles across different U.S. states. One of the first issues I noticed was inconsistent naming in event types and missing location fields. I standardized event-type labels and filtered out rows lacking critical spatial data. After that, I created a checklist of features that might influence regional political unrest, such as year, latitude/longitude, and number of fatalities. This setup prepared me for spatial analysis in the coming weeks.
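As a rough sketch, the loading and cleaning steps could look like the following. The file name and the column names (event_type, latitude, longitude, fatalities, state) are placeholders based on a typical ACLED export and may not match the actual file.

```python
# Sketch: load the ACLED export, standardize labels, drop rows without coordinates.
import pandas as pd

def load_and_clean(path: str = "acled_us.csv") -> pd.DataFrame:
    df = pd.read_csv(path)

    # Standardize event-type labels (strip whitespace, consistent casing).
    df["event_type"] = df["event_type"].str.strip().str.title()

    # Drop rows missing the spatial fields needed for clustering.
    df = df.dropna(subset=["latitude", "longitude"])

    # Keep only the features flagged as relevant for regional analysis.
    keep = ["year", "latitude", "longitude", "event_type", "fatalities", "state"]
    return df[[c for c in keep if c in df.columns]]
```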

Project Finalization

With my statistical work completed, I finalized the full report this week. I included cumulative plots, clustering maps, and Monte Carlo simulations to enhance the robustness of the analysis. The report showed that Black individuals were, on average, 7 years younger than White individuals in police shootings. I used both visuals and statistical tests to support this conclusion, making sure each method reinforced the others. In the final discussion, I proposed that systemic bias, economic disparity, and policing patterns could contribute to these differences. I also provided policy recommendations that focused on bias training and community oversight. The report was submitted by the March 7 deadline. It was a comprehensive analysis that highlighted both the human impact and the data-driven patterns behind police shootings in the U.S.
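To illustrate the kind of Monte Carlo resampling used as a robustness check, here is a minimal bootstrap sketch of a confidence interval for the mean age gap; the array names are illustrative, not the actual variables from the analysis.

```python
# Sketch: bootstrap confidence interval for the mean age gap (illustrative names).
import numpy as np

def bootstrap_age_gap(black_ages, white_ages, n_iter=10_000, seed=0):
    rng = np.random.default_rng(seed)
    gaps = np.empty(n_iter)
    for i in range(n_iter):
        # Resample each group with replacement and record the mean difference.
        b = rng.choice(black_ages, size=len(black_ages), replace=True)
        w = rng.choice(white_ages, size=len(white_ages), replace=True)
        gaps[i] = w.mean() - b.mean()
    # 95% confidence interval for the age gap.
    return np.percentile(gaps, [2.5, 97.5])
```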

Statistical Methods Applied

This week, I applied statistical tests to validate the patterns observed in my visualizations. I used the Shapiro-Wilk test to assess normality in the age distributions, followed by a two-sample t-test to compare ages between Black and White individuals. To measure the size of the effect, I calculated Cohen’s d and found a moderate value of 0.58, which suggested the age difference wasn’t just statistically significant but also meaningful in real-world terms. As a backup check, I ran a permutation test, which returned an empirical p-value of essentially zero (none of the permuted samples produced a gap as large as the observed one), strongly supporting the age gap. At the same time, I reviewed how armed status and fleeing behavior varied by race, though those results were less pronounced. These tests gave me the statistical foundation to back up my initial observations.
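A compact sketch of these checks with SciPy and NumPy, assuming two arrays of ages (variable names are illustrative):

```python
# Sketch: normality check, t-test, Cohen's d, and a permutation test (illustrative names).
import numpy as np
from scipy import stats

def compare_ages(black_ages, white_ages, n_perm=10_000, seed=0):
    # Normality check on each group's age distribution.
    w1, p1 = stats.shapiro(black_ages)
    w2, p2 = stats.shapiro(white_ages)
    print("Shapiro-Wilk p-values:", p1, p2)

    # Two-sample t-test on the mean ages (Welch's, not assuming equal variances).
    t_res = stats.ttest_ind(black_ages, white_ages, equal_var=False)
    print("t-test:", t_res.statistic, t_res.pvalue)

    # Cohen's d with a pooled standard deviation.
    n1, n2 = len(black_ages), len(white_ages)
    pooled_sd = np.sqrt(
        ((n1 - 1) * np.var(black_ages, ddof=1) + (n2 - 1) * np.var(white_ages, ddof=1))
        / (n1 + n2 - 2)
    )
    d = (np.mean(white_ages) - np.mean(black_ages)) / pooled_sd
    print("Cohen's d:", d)

    # Permutation test: shuffle group membership and compare mean differences.
    rng = np.random.default_rng(seed)
    combined = np.concatenate([black_ages, white_ages])
    observed = abs(np.mean(white_ages) - np.mean(black_ages))
    count = 0
    for _ in range(n_perm):
        rng.shuffle(combined)
        perm_gap = abs(combined[:n1].mean() - combined[n1:].mean())
        count += perm_gap >= observed
    print("Permutation p-value:", count / n_perm)
```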