About
IMDB is the biggest movie database online, with data publicly available for download. We downloaded data from 1913 to 2014 using modified Ruby scripts based on Hadley’s originals, and parsed it into a format suitable for analysis. The output is a tidy dataset with 5944 films and the following fields:
- title. Title of the movie.
- year. Year of release.
- budget. Total budget in US dollars.
- length. Length in minutes.
- rating. Average rating from IMDB users. The raw ratings are on a scale of 1 (worst) to 10 (best).
- votes. Number of IMDB users who rated this movie.
- r1-10. Distribution of votes for each rating, as a percentage of all votes, recorded as the midpoint of the nearest decile: 0 = no votes, 4.5 = 1-9% of votes, 14.5 = 11-19% of votes, etc. Due to rounding errors these may not sum to 100.
- mpaa. MPAA rating (e.g., R, PG-13).
- boxoffice. Total ticket sales in US dollars.
- actor, actress, director, and writer. String variables giving these people’s names.
- action, animation, comedy, drama, documentary, romance, short. Binary variables indicating whether the movie was classified as belonging to that genre.
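As an illustration of how the r1-10 fields can be read, here is a minimal Python sketch that treats the ten midpoint values as vote shares and uses them to approximate a film's mean rating. The values in `r` are hypothetical and not taken from the dataset, and `approx_mean_rating` is a helper invented for this example. Because the shares may not sum to 100, the sketch divides by their actual total.

```python
# Hypothetical r1..r10 values for one film (illustrative, not from the dataset).
# Each value is the midpoint of the percentage band of votes for that rating.
r = {1: 4.5, 2: 4.5, 3: 4.5, 4: 4.5, 5: 14.5,
     6: 14.5, 7: 24.5, 8: 14.5, 9: 4.5, 10: 4.5}


def approx_mean_rating(dist):
    """Approximate the mean rating, weighting each rating by its vote share."""
    total = sum(dist.values())
    if total == 0:
        return None  # every band is 0, i.e. no votes recorded
    # Normalize by the actual total, since the shares may not sum to 100.
    return sum(rating * share for rating, share in dist.items()) / total


total_share = sum(r.values())
mean_rating = approx_mean_rating(r)
print("total share:", total_share)
print("approximate mean rating:", round(mean_rating, 2))
```

The result is only a rough estimate, since each band midpoint stands in for a range of possible percentages. It will generally differ from the rating field.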
We analyzed this dataset, and this dashboard presents some of the charts we made using the ezplot R package.
Made by Cabaceo LLC.