Note: This is a project post, so if you're not interested in learning how I did this, just scroll down and mouse over the viz 💁 (and then scroll back up and read this when you realize you need an explanation of what you're looking at).
The Supreme Court is a complex and hard-to-follow topic for most people, but it is arguably the most important government entity for determining the direction of this country. It's difficult to follow the news about the current nominee, but it's important for the average American to understand how the next justice will fare when he (or she!) takes a seat on the court.
The visualization below is the first iteration of an ongoing passion project in which I've gathered every Supreme Court opinion since 1790 (don't worry, I only took a sample of 10k opinions for this iteration) and am mapping the opinions in various ways in an attempt to simplify the court for those who want to understand it without cracking open a history book.
What am I looking at? Every Supreme Court justice in the history of the court grouped by the similarity of their speech patterns. The size of each of their bubbles represents the uniqueness of their own speech. Hover over each bubble for more info!
How did you do this? For the data science part of this project, I used Term Frequency-Inverse Document Frequency (TF-IDF), which means that distinctive words a justice has in common with another justice are strongly up-weighted, while very common words that all justices use all the time are strongly down-weighted (for example, "habeas" holds more weight than "the").
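To make the weighting concrete, here is a minimal sketch of TF-IDF in plain Python. It is not the code from the notebooks: the three "opinions" are made-up one-liners, the justice names are placeholders, and the formula (term frequency times the log of total documents over documents containing the term) is one common variant among several.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Map each document name to a {term: weight} dict.

    docs: dict of name -> raw text. Tokenization is a naive
    lowercase whitespace split, purely for illustration.
    """
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: number of documents each term appears in.
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    weights = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        total = len(tokens)
        weights[name] = {
            # tf = share of this document's words; idf = log(N / df).
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return weights


# Hypothetical toy corpus, one short "opinion" per placeholder justice.
opinions = {
    "justice_a": "the writ of habeas corpus is granted",
    "justice_b": "the petition for habeas relief is denied",
    "justice_c": "the judgment of the court is affirmed",
}

weights = tf_idf(opinions)
print(weights["justice_a"]["the"])     # appears in every opinion, so weight is zero
print(weights["justice_a"]["habeas"])  # appears in only two opinions, so weight is positive
```

Because "the" shows up in every document, its inverse document frequency is log(1) = 0 and it drops out entirely, while "habeas" keeps a positive weight.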
The visualization pulls from this analysis to group similar justices into associated bubbles (I used a K-means clustering algorithm, if you want to get technical). The code for all of this is in various Jupyter Notebooks in this repo. The visualization was made with D3; my code for it can be found here.
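If you're curious what K-means actually does, here is a bare-bones sketch in plain Python. Again, this is an illustration rather than the notebook code: the two-number vectors are invented stand-ins for TF-IDF weights, the justice names are placeholders, and the starting centroids are simply the first k points (real implementations usually pick them more carefully).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=20):
    """Cluster points into k groups; returns (labels, centroids).

    Naive initialization: the first k points become the starting
    centroids, which keeps this toy example deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical vectors: imagine each number is the TF-IDF weight of one term.
justices = {
    "justice_a": (0.9, 0.1),
    "justice_b": (0.1, 0.8),
    "justice_c": (0.8, 0.2),
    "justice_d": (0.2, 0.9),
}

labels, centroids = k_means(list(justices.values()), k=2)
for name, label in zip(justices, labels):
    print(name, "-> cluster", label)
```

In this toy data, justice_a and justice_c land in one cluster and justice_b and justice_d in the other, which is the same kind of grouping the bubbles represent, just with far more dimensions in the real analysis.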
Why did you do this? For this iteration of the project, I wanted to get a sense of whether similarities in speech patterns could serve as a predictor of what we could expect a nominee to be like once he's on the court. These clusters demonstrate that Garland is more similar to Chief Justice Roberts in language patterns than to any other justice currently on the court.
What's next? There are so many things I want to do with this, both with data science and dataviz. Next I will be pulling in voting record and mapping that against the opinion text. After this, I plan to do a timeline of cases, associated news stories and text summarization for each opinion.