
Mass Shootings in the US

Introducing the Issue

In this project I explore data on mass shootings in the United States. I chose this topic because these shootings happen all across the country. As most people know, the United States is one of the most dangerous places in the world when it comes to mass shootings, whether they happen at a festival, a parade, a school, or somewhere else. I wanted to find out where most of the shootings happened, which states have the most shootings, and which states are on the lower end of the spectrum.

Summed up in a sentence: I wanted to find where the most mass shootings occurred and where they were the most deadly.

​

Introducing the data

​

The dataset I use comes from Kaggle, a website that hosts many datasets. It has information about mass shootings in the United States over the last 50 years. I picked it because the information is well gathered, with as few missing details as possible. At first glance it seemed to have no missing values, and it had a lot of interesting columns that I could pull data from and make charts with.
Link - https://www.kaggle.com/datasets/zusmani/us-mass-shootings-last-50-years

​

Pre-Processing the data

​

Now I will go through what I did with the data. The very first thing I did after importing the data was check the columns to see what I had to work with. Shortly after, I checked the data types of those columns. I was also checking for null values so that I could get rid of any I found. My dataset turned out to have almost no null values; the one exception was a single column that was not usable because all of its values were null. I dropped that entire column and then double-checked for remaining null values, and surprisingly there were none. I made sure the rest of the data could be used to make accurate visualizations, which brings me to my next point.
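
The null check described above can be sketched roughly as follows. This is a minimal illustration on a small made-up frame, not the actual dataset: the rows are hypothetical, and the column names are borrowed from the code at the end of this page.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real CSV; the values are made up.
sample = pd.DataFrame({
    "State": ["Texas", "Nevada", "Ohio"],
    "# Killed": [5, 3, 2],
    "# Injured": [10, 7, 1],
    "Operations": [np.nan, np.nan, np.nan],  # column with no usable values
})

# Count missing values per column to see where the nulls are.
null_counts = sample.isnull().sum()

# Drop every column whose values are all null, then re-check.
cleaned = sample.dropna(axis="columns", how="all")
remaining_nulls = int(cleaned.isnull().sum().sum())
```

Using `how="all"` only removes columns that are completely empty, so a column with a few missing entries would be kept and could be handled separately.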

​

Visualization

This scatter plot shows the number of people who were either injured or killed in mass shootings in each state. We can see that a high concentration of mass shootings leading to casualties is in Texas. There are multiple reasons this could be the case. I believe it is mainly because of the gun laws in Texas; however, that is beside the point. We can also see an outlier that took place in Nevada.
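
One way to check this reading of the scatter plot is to total the casualties per state. The following is a minimal sketch, assuming the `State`, `# Injured`, and `# Killed` columns used in the plot; the rows are made-up placeholders, not figures from the dataset.

```python
import pandas as pd

# Made-up placeholder rows; real values would come from the Kaggle CSV.
sample = pd.DataFrame({
    "State": ["Texas", "Texas", "Nevada", "Ohio"],
    "# Injured": [4, 6, 20, 1],
    "# Killed": [3, 2, 10, 2],
})

# Total injured and killed per state.
per_state = sample.groupby("State")[["# Injured", "# Killed"]].sum()

# Number of incidents per state (one row per incident).
per_state["Incidents"] = sample.groupby("State").size()

# Combined casualties, sorted so the highest totals come first.
per_state["Casualties"] = per_state["# Injured"] + per_state["# Killed"]
ranked = per_state.sort_values("Casualties", ascending=False)
```

Keeping incidents and casualties in separate columns helps distinguish a state with many shootings from a state with one very large event, which is the kind of outlier the plot shows for Nevada.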

 

Storytelling

 

I think I have learned a lot from the visualizations that I created. The most important thing I found is: don't judge a state by its rumors. What I mean is that states with a really high population density in smaller areas, like New York and California, are not where mass shootings are most concentrated. Mass shootings can take place anywhere, so you always have to be on the lookout for signs that something doesn't feel right. I also discovered that Texas has the densest concentration of mass shootings.

 

Impact

 

I feel this data could affect people's perspectives on some states and on what should and shouldn't be allowed with guns. I think it will give people insight into which states have the most mass shootings and may need to have their gun laws looked at. The data could also be used in different ways around the world, and some of those uses are definitely bad. For example, with the number of ill-intentioned people in this world, someone who doesn't like America might use it to bring down the reputation of a particular state. Another set of data that would be valuable, and that I could have used to draw more correlations, is how many of the people convicted of these crimes either showed signs of mental illness or were mentally ill.

 

References

​

https://www.kaggle.com/datasets/zusmani/us-mass-shootings-last-50-years

 

Code

 

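import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt  # needed for plt.xticks below

# Load the dataset and take a first look at it
mass_shoot = pd.read_csv("mass_shooting_data.csv")
mass_shoot.head()
mass_shoot.dtypes
mass_shoot.columns

# Look at the rows for a single state and count all rows
mass_shoot[mass_shoot['State'] == 'North Carolina']
len(mass_shoot)

# Drop the unusable column, then reset the index
mass_shoot = mass_shoot.drop(columns="Operations")
mass_shoot = mass_shoot.reset_index(drop=True)

# Pairwise plots of the numeric columns
sns.pairplot(data=mass_shoot)

# Change the figure size to width=20, height=10
sns.set(rc={"figure.figsize": (20, 10)})

# Injured per state, colored by the number killed
sns.scatterplot(data=mass_shoot, x='State', y='# Injured', hue='# Killed').set_title('Injuries/Casualties per State')
plt.xticks(rotation=90)  # rotate the state names so they stay readable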

​
