Statistical Methods Applied

This week, I applied statistical tests to validate the patterns observed in my visualizations. I used the Shapiro-Wilk test to assess normality in age distributions, followed by t-tests to check for age differences between Black and White individuals. To measure the size of the effect, I calculated Cohen’s d and found a moderate value of 0.58. This suggested the age difference wasn’t just statistically significant; it was also meaningful in real-world terms. I also conducted a permutation test as a backup check, which reported a p-value of 0.0. A permutation p-value cannot truly be zero, so this result means that none of the shuffled samples produced an age gap as large as the observed one, which is strong evidence that the gap is not due to chance. At the same time, I reviewed how armed status and fleeing behavior varied by race, though those results were less pronounced. These tests gave me the statistical foundation to back up my initial observations.
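To make the effect-size and permutation steps concrete, here is a minimal sketch, not the exact code I ran. The function names, the number of permutations, and the synthetic ages are all assumptions for illustration. The Shapiro-Wilk and t-test steps are available in SciPy as `scipy.stats.shapiro` and `scipy.stats.ttest_ind` and are left out here to keep the example small.

```python
import numpy as np


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    pooled_var = (
        (a.size - 1) * a.var(ddof=1) + (b.size - 1) * b.var(ddof=1)
    ) / (a.size + b.size - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


def permutation_p_value(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_perm):
        # Shuffle group labels by permuting the pooled values.
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[: a.size].mean() - shuffled[a.size:].mean())
        hits += int(diff >= observed)
    # Add-one correction: the estimate can never be exactly zero.
    return (hits + 1) / (n_perm + 1)


# Synthetic ages for illustration only; these are NOT the project's data.
rng = np.random.default_rng(42)
group_a = rng.normal(33, 11, size=300)
group_b = rng.normal(40, 13, size=300)

d = cohens_d(group_b, group_a)
p = permutation_p_value(group_a, group_b)
print(f"Cohen's d = {d:.2f}, permutation p = {p:.4f}")
```

The add-one correction in the last line of `permutation_p_value` is a common convention: when no shuffled sample beats the observed gap, the result is reported as 1 / (n_perm + 1) rather than 0.0.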

Visual Exploration

After preprocessing the dataset, I moved on to exploratory visualizations. I created cumulative distribution function (CDF) plots to examine age differences across racial groups. These plots quickly revealed a noticeable age gap: Black victims appeared markedly younger than White victims. Using Seaborn and Pandas, I built several more plots that showed the same trend. I also began examining other factors like fleeing status and whether the individual was armed. The visuals raised new questions about systemic bias and how age may play a role in police decision-making. These insights helped me shape the direction of my statistical testing. I saved snapshots of these plots and added them to the final report draft.
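For readers unfamiliar with CDF plots, the sketch below shows what one computes: for each age, the fraction of a group at or below that age. The helper name `ecdf` and the sample ages are made up for illustration and do not come from the dataset.

```python
import numpy as np


def ecdf(values):
    """Return sorted values and the fraction of observations <= each one."""
    x = np.sort(np.asarray(values, dtype=float))
    x = x[~np.isnan(x)]  # drop missing ages
    y = np.arange(1, x.size + 1) / x.size
    return x, y


# Made-up ages for illustration only; not real data.
ages_a = [22, 25, 31, 19, 40, 28]
ages_b = [35, 29, 48, 52, 41, 38]

xa, ya = ecdf(ages_a)
xb, yb = ecdf(ages_b)

# Share of each group aged 30 or younger, read off the two curves.
share_a = np.mean(xa <= 30)
share_b = np.mean(xb <= 30)
print(f"aged <= 30: group A {share_a:.2f}, group B {share_b:.2f}")
```

When one group's curve sits to the left of the other's, that group is younger across the distribution. With a DataFrame, Seaborn's `ecdfplot(data=df, x="age", hue="race")` should draw the same curves directly, assuming columns named `age` and `race`.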

Getting Started

This week, I began exploring the Washington Post police shootings dataset for our first project. My focus was on understanding the structure, identifying missing or inconsistent values, and familiarizing myself with key variables. Early on, I noticed inconsistent entries in fields like “weapon type” and “fleeing status.” To prepare for deeper analysis, I standardized these variables and filtered the data to include only Black and White individuals. I also reviewed the column data types and made sure categorical and numerical values were properly formatted. These initial cleaning steps laid the groundwork for my statistical analysis. As I cleaned the dataset, I started thinking about questions related to age, race, and potential disparities in how police shootings affect different groups. I documented these points as a foundation for hypothesis testing in later stages of the project.
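The cleaning steps described above can be sketched roughly as follows. The column names (`age`, `race`, `armed`, `flee`), the race codes `B` and `W`, and the toy rows are assumptions for illustration; the real file may use different names and values.

```python
import pandas as pd

# Toy rows standing in for the real file; column names and values are
# assumptions for illustration, not taken from the actual dataset.
raw = pd.DataFrame({
    "age": ["34", "27", None, "45", "19"],
    "race": ["B", "W", "W", "H", "B"],
    "armed": [" Gun", "gun", "KNIFE ", None, "unarmed"],
    "flee": ["Not fleeing", "car", "Car", "foot", None],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Normalize free-text categories: trim whitespace, lowercase, label gaps.
    for col in ["armed", "flee"]:
        out[col] = (
            out[col].str.strip().str.lower().fillna("unknown").astype("category")
        )
    # Ages arrive as strings here; coerce anything unparseable to NaN.
    out["age"] = pd.to_numeric(out["age"], errors="coerce")
    # Keep only the two groups compared in the analysis.
    out = out[out["race"].isin(["B", "W"])].reset_index(drop=True)
    return out


cleaned = clean(raw)
print(cleaned.dtypes)
print(cleaned)
```

Labeling missing categories as "unknown" instead of dropping those rows keeps them available for the age comparison, which is one reasonable choice among several.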


Diving into the Washington Post Police Shootings Dataset

This week marked the beginning of my exploration of the U.S. police shootings dataset assigned for our first project. I spent time familiarizing myself with its structure and scanning for any inconsistencies or missing data. Early on, I noticed irregular entries in fields like weapon type and fleeing status. To make future analysis smoother, I standardized these columns. I also started forming some initial research questions, especially around age patterns in shootings and potential racial disparities.