IMDB Public Data Breakdown

Inspired by: 

Max Woolf

When I stumbled across Max Woolf’s blog post about Internet Movie Database (IMDB) data visualization, I knew what I wanted to do for my final project for Modeling and Analytics in business school at Tulane University. As a veteran of the film industry, I wanted to stay close to home with the dataset that I chose. I also wanted to challenge myself and go outside the scope of work. Little did I know that the dataset itself would throw me outside my comfort zone.

Before graduate school, the biggest Excel spreadsheet I had opened was probably 200 rows. The files I would be working with in this project (pictured right) would have over 10,000,000 observations. I was naïve enough to assume that I would not have to follow Mr. Woolf’s lead and that Excel could handle a data set like this. I must have restarted my computer ten times before I realized that Excel could not handle a data set the size of IMDB’s.

Starting IMDB Dataframes

I ended up learning a lot about the mechanics of R as a language and how useful it can be for future data science projects. It’s a mix between Excel, EViews, and SQL. Below you can see my very noob tactic of commenting basically every line to make sure I understood the commands. It will be interesting to see how future classes teach languages, as it seems to be a fight between Python, R, and SQL depending on what you are doing.

R Code Overview

The original question for the data set was whether the age of a lead actor or actress had any effect on the ratings of the shows. The data showed very clearly that the top age accepted for actresses was much lower than for their male counterparts.
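For readers who want to try the lead-age question themselves, here is a minimal sketch in Python rather than the R used for this project. It assumes rows shaped like the public IMDB files title.basics, title.principals, and name.basics (columns tconst, startYear, ordering, nconst, category, birthYear), and it treats the first-billed acting credit as the lead, which is my simplification. The IDs and years in the sample rows are made up for illustration.

```python
MISSING = r"\N"  # the IMDB TSV files mark missing values with \N


def lead_ages(titles, principals, names):
    """Return {tconst: (category, age at release)} for each title's lead."""
    birth_year = {n["nconst"]: n["birthYear"] for n in names}
    start_year = {t["tconst"]: t["startYear"] for t in titles}

    # Assumption: the acting credit with the lowest `ordering` is the lead.
    leads = {}
    for p in sorted(principals, key=lambda row: int(row["ordering"])):
        if p["category"] in ("actor", "actress") and p["tconst"] not in leads:
            leads[p["tconst"]] = p

    ages = {}
    for tconst, p in leads.items():
        born = birth_year.get(p["nconst"], MISSING)
        released = start_year.get(tconst, MISSING)
        if MISSING in (born, released):
            continue  # skip titles where either year is unknown
        ages[tconst] = (p["category"], int(released) - int(born))
    return ages


# Hypothetical sample rows, not real IMDB records.
titles = [
    {"tconst": "ttA", "startYear": "1995"},
    {"tconst": "ttB", "startYear": "2001"},
    {"tconst": "ttC", "startYear": MISSING},
]
principals = [
    {"tconst": "ttA", "ordering": "2", "nconst": "nmX", "category": "actor"},
    {"tconst": "ttA", "ordering": "1", "nconst": "nmY", "category": "actress"},
    {"tconst": "ttB", "ordering": "1", "nconst": "nmZ", "category": "director"},
    {"tconst": "ttB", "ordering": "2", "nconst": "nmX", "category": "actor"},
    {"tconst": "ttC", "ordering": "1", "nconst": "nmY", "category": "actress"},
]
names = [
    {"nconst": "nmX", "birthYear": "1960"},
    {"nconst": "nmY", "birthYear": "1970"},
    {"nconst": "nmZ", "birthYear": MISSING},
]

ages = lead_ages(titles, principals, names)
# ages == {"ttA": ("actress", 25), "ttB": ("actor", 41)}
```

On the full files, the same joins would need something that streams or chunks the data, since ten million rows is exactly the scale that broke Excel for me.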

That’s when I moved to my next observation…


Why does the age trend upward?

My theory is that one of the main reasons we see an upward trend in age for women and men towards the end of the 20th century is the adoption of the internet. My belief is that, with fewer restrictions on how image was promoted, individuals had more opportunity to spread their personal brands. Traditional conglomerates controlled the pipeline of media and the choice of consumption, but once the internet arrived, that power slowly slipped to the talent.


Check out my presentation below and let me know what you think!

IMDB Public Data Breakdown