
Better Know a Justice

Supreme Court data is bigger than Justice Taft's bathtub


Note: This is a project post, so if you're not interested in learning how I did this, just scroll down and mouse over the viz 💁 (and then scroll back up and read this when you realize you need an explanation for what you're looking at).


The Supreme Court is a complex, difficult-to-understand topic for anyone without a law degree, which is why it makes an interesting subject for natural language processing.


The visualization below is the first iteration of an ongoing project I am doing with 200 years of Supreme Court opinions (a sample of 10k opinions for this iteration) that I gathered from around the web.




What am I looking at?   Every Supreme Court justice in the history of the court, grouped by the similarity of their speech patterns. The size of each bubble represents how unique that justice's speech is. Hover over each bubble for more info!

How did you do this?    For the data science part of this project, I used Term Frequency-Inverse Document Frequency (TF-IDF), which strongly up-weights distinctive words that only a few justices use (so two justices who share such words look similar) and strongly down-weights very common words that all justices use all the time (for example, "habeas" holds more weight than "the"). After I vectorized the words, I performed K-means clustering (four clusters worked best across several iterations). The code for all of this is in various Jupyter Notebooks in this repo. The visualization was made with D3, and my code for this visualization can be found here.
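To make those two steps concrete, here is a minimal sketch in plain Python. It is not the notebook code from the repo: the function names, the toy one-sentence "opinions", the smoothed IDF formula, and the hand-picked starting centroids are all assumptions for illustration. A real run would use each justice's full text and a library implementation with random restarts.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalized TF-IDF vectors), one vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vocab = sorted(df)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF (an assumed variant): a term found in every
        # document gets the minimum factor, 1.
        row = [counts[t] / len(tokens) * (math.log((1 + n_docs) / (1 + df[t])) + 1)
               for t in vocab]
        norm = math.sqrt(sum(w * w for w in row))
        vectors.append([w / norm for w in row])
    return vocab, vectors


def kmeans(points, seed_indices, n_iter=10):
    """Plain k-means; the initial centroids are the points at seed_indices."""
    centroids = [list(points[i]) for i in seed_indices]
    labels = []
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical stand-ins for each justice's collected opinions.
opinions = {
    "justice_a": "the court denied habeas corpus relief",
    "justice_b": "habeas corpus relief was granted by the court",
    "justice_c": "the court upheld a tax on interstate commerce",
    "justice_d": "a tax on interstate commerce was upheld by the court",
}

vocab, vectors = tfidf_vectors(list(opinions.values()))
weights_a = dict(zip(vocab, vectors[0]))

# Seeds are fixed by hand (one habeas text, one tax text) to keep this deterministic.
labels = kmeans(vectors, seed_indices=[0, 2])
clusters = dict(zip(opinions, labels))

print("habeas vs. the:", weights_a["habeas"], weights_a["the"])
print(clusters)
```

On this toy input, "habeas" ends up with more weight than "the" for justice_a, and the two habeas sentences fall into one cluster while the two tax sentences fall into the other. The number of clusters is simply the number of seeds passed in, so trying several values means rerunning `kmeans` with different seed lists.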

Why did you do this?    For this iteration of the project, I wanted to see whether similarities in speech patterns could predict what a nominee might be like once on the court. These clusters suggest that Merrick Garland is more similar to Chief Justice Roberts than to anyone else on the court.

What's next?   What isn't next?    There are so many things I want to do with this, both with data science and dataviz. Next, I will pull in voting records and triangulate them with the opinion text. After that, I plan to build a timeline of justices with associated news stories, plus a text summarization of each opinion.