Raw Data

A CSV was pulled from the IMDb website here. It didn't have all of the data I needed to do anything interesting, but it did have a "Const" column containing an unambiguous identifier for every movie. I used this column to query the OMDb API and pull the rest of the data of interest with the following code:

                      
    import numpy as np
    import requests as req  # req is assumed to be the requests library

    # main_df (the IMDb export) and key (the OMDb API key) are assumed to be defined earlier.
    for count, (index, row) in enumerate(main_df.iterrows(), start=1):
        try:
            print(f"#{count}: Getting data for {row['Title']}")
            ID = row['Const']
            call = f"http://www.omdbapi.com/?i={ID}&apikey={key}"
            data = req.get(call).json()
            main_df.loc[index, 'Box_Office'] = data['BoxOffice']
            main_df.loc[index, 'Rated'] = data['Rated']
            main_df.loc[index, 'Production'] = data['Production']
            main_df.loc[index, 'Country'] = data['Country']
            # Ratings is a list of {'Source': ..., 'Value': ...} entries.
            for rating in data['Ratings']:
                if rating['Source'] == 'Rotten Tomatoes':
                    main_df.loc[index, 'Rotten_Tomatoes_Rating'] = rating['Value']
                elif rating['Source'] == 'Metacritic':
                    main_df.loc[index, 'Metacritic_Rating'] = rating['Value']
            # The DVD release date may be absent from the response.
            main_df.loc[index, 'Home_Release'] = data.get('DVD', np.nan)
            main_df.loc[index, 'Awards_Blurb'] = data['Awards']
            main_df.loc[index, 'Languages'] = data['Language']
        except Exception:
            # Skip titles whose request or parsing fails and carry on.
            print(f"Unable to get data for {row['Title']}")
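
The loop above leans on a blanket exception handler to skip failed lookups. One possible variation, sketched below, separates the network call from the field extraction so the extraction can be checked without hitting the API. This is not the code used for the dataset: `extract_fields` and `fetch_movie` are hypothetical helper names, the sketch uses the standard library's `urllib` in place of `requests` to avoid third-party dependencies, and it assumes OMDb flags a failed lookup with `"Response": "False"`. Unlike the loop above, a missing key becomes NaN instead of aborting the whole row.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

NAN = float("nan")

# Maps OMDb response keys to the column names used in the loop above.
FIELD_MAP = {
    "BoxOffice": "Box_Office",
    "Rated": "Rated",
    "Production": "Production",
    "Country": "Country",
    "DVD": "Home_Release",
    "Awards": "Awards_Blurb",
    "Language": "Languages",
}
RATING_MAP = {
    "Rotten Tomatoes": "Rotten_Tomatoes_Rating",
    "Metacritic": "Metacritic_Rating",
}


def extract_fields(data):
    """Flatten one OMDb JSON response (a dict) into column -> value pairs."""
    if data.get("Response") == "False":  # assumed OMDb failure flag
        return {}
    out = {col: data.get(key, NAN) for key, col in FIELD_MAP.items()}
    for rating in data.get("Ratings", []):
        col = RATING_MAP.get(rating.get("Source"))
        if col is not None:
            out[col] = rating.get("Value", NAN)
    return out


def fetch_movie(imdb_id, key, timeout=10):
    """Query OMDb for one IMDb ID; urlopen raises on HTTP errors."""
    query = urlencode({"i": imdb_id, "apikey": key})
    with urlopen(f"http://www.omdbapi.com/?{query}", timeout=timeout) as resp:
        return extract_fields(json.load(resp))
```

Inside the loop, the returned dict could then be written back one column at a time, for example `for col, value in fetch_movie(row['Const'], key).items(): main_df.loc[index, col] = value`.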
                      
                    

After a little cleaning, I had the dataset. The table below is abridged to the first 100 rows and the 10 columns of interest so that it fits on the page. The unabridged dataset can be downloaded using the button below.
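
The cleaning step itself is not shown here. As a rough sketch of what part of it might look like, the helper below turns the kinds of strings the API tends to return into the floats seen in the table. The input formats (`"$47,095,453"`, `"50%"`, `"44/100"`, `"N/A"`) are assumptions about the raw API output, and `to_number` is a hypothetical name, not the function actually used.

```python
import math
import re


def to_number(value):
    """Convert strings such as '$47,095,453', '50%' or '44/100' to a float.

    Returns NaN for missing markers like 'N/A' and for anything unparseable.
    """
    if isinstance(value, (int, float)):
        return float(value)
    if not isinstance(value, str):
        return math.nan
    text = value.strip().replace(",", "")
    # Keep only the first number: drops '$', '%' and a '/100' style suffix.
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else math.nan
```

Applied to a column, this could be used as `main_df['Box_Office'] = main_df['Box_Office'].map(to_number)`, and likewise for the two rating columns.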

| | Const | Title | IMDb Rating | Rotten Tomatoes Rating | Metacritic Rating | Num Votes | Box Office | Rated | Awards Blurb | Awards |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | tt0035423 | Kate & Leopold | 6.4 | 50.0 | 44.0 | 77905.0 | 47095453.0 | PG-13 | Nominated for 1 Oscar. Another 1 win & 4 nominations. | {'Oscar.': 1, 'win': 1, 'nominations.': 5} |
| 1 | tt0065643 | The Dirty Mind of Young Sally | 5.2 | NaN | NaN | 226.0 | NaN | X | NaN | {} |
| 2 | tt0067716 | Schlock | 5.7 | 71.0 | NaN | 1221.0 | NaN | PG | 1 win & 1 nomination. | {'win': 1, 'nomination.': 1} |
| 3 | tt0068156 | 1776 | 7.6 | 69.0 | NaN | 7588.0 | NaN | G | Nominated for 1 Oscar. Another 1 win & 1 nomination. | Could not parse awards |
| 4 | tt0068168 | Across 110th Street | 7.0 | 81.0 | NaN | 4923.0 | NaN | R | NaN | {} |
| 5 | tt0068280 | Black Girl | 7.0 | NaN | NaN | 364.0 | NaN | PG | NaN | {} |
| 6 | tt0068364 | The Cheerleaders | 5.4 | NaN | NaN | 1128.0 | NaN | R | NaN | {} |
| 7 | tt0068369 | Child's Play | 6.0 | NaN | NaN | 928.0 | NaN | PG | 1 nomination. | {'nomination.': 1} |
| 8 | tt0068370 | Children Shouldn't Play with Dead Things | 5.4 | 42.0 | NaN | 3991.0 | NaN | PG | NaN | {} |
| 9 | tt0068451 | The Day the Clown Cried | NaN | NaN | NaN | NaN | NaN | NaN | NaN | {} |
| 10 | tt0068452 | Deadhead Miles | 5.5 | NaN | NaN | 193.0 | NaN | R | NaN | {} |
| 11 | tt0068528 | The Effect of Gamma Rays on Man-in-the-Moon Marigolds | 7.5 | 73.0 | NaN | 2092.0 | NaN | PG | Nominated for 1 Golden Globe. Another 4 wins & 2 nominations. | {'Golden': 1, 'wins': 4, 'nominations.': 3} |
| 12 | tt0068595 | Flesh Gordon | 4.7 | 67.0 | NaN | 3839.0 | NaN | R | 1 nomination. | {'nomination.': 1} |
| 13 | tt0068622 | Gargoyles | 6.2 | 40.0 | NaN | 2508.0 | NaN | Not Rated | Won 1 Primetime Emmy. Another 1 nomination. | {'Primetime': 1, 'nomination.': 1} |
| 14 | tt0068638 | The Getaway | 7.4 | 86.0 | 55.0 | 27341.0 | NaN | PG | Nominated for 1 Golden Globe. Another 1 win & 1 nomination. | Could not parse awards |
| 15 | tt0068644 | Go Ask Alice | 6.2 | NaN | NaN | 864.0 | NaN | NaN | NaN | {} |
| 16 | tt0068649 | The Gore Gore Girls | 5.3 | NaN | NaN | 2160.0 | NaN | X | 1 nomination. | {'nomination.': 1} |
| 17 | tt0068687 | The Heartbreak Kid | 6.9 | 92.0 | NaN | 3878.0 | NaN | PG | Nominated for 2 Oscars. Another 3 wins & 6 nominations. | {'Oscars.': 2, 'wins': 3, 'nominations.': 8} |
| 18 | tt0068699 | High Plains Drifter | 7.5 | 96.0 | 69.0 | 48781.0 | NaN | R | NaN | {} |
| 19 | tt0068720 | The House Without a Christmas Tree | 8.2 | NaN | NaN | 616.0 | NaN | Unrated | Won 1 Primetime Emmy. Another 1 nomination. | {'Primetime': 1, 'nomination.': 1} |
| 20 | tt0068762 | Jeremiah Johnson | 7.6 | 95.0 | 75.0 | 27997.0 | NaN | GP | 1 win & 1 nomination. | {'win': 1, 'nomination.': 1} |
| 21 | tt0068850 | Love and Pain and the Whole Damn Thing | 6.8 | NaN | NaN | 734.0 | NaN | PG | NaN | {} |
| 22 | tt0068853 | The Life and Times of Judge Roy Bean | 7.0 | 80.0 | NaN | 7121.0 | NaN | PG | Nominated for 1 Oscar. Another 2 nominations. | {'Oscar.': 1, 'nominations.': 3} |
| 23 | tt0068918 | Manson | 7.3 | NaN | NaN | 904.0 | NaN | R | NaN | {} |
| 24 | tt0068931 | The Mechanic | 6.9 | 29.0 | 51.0 | 12224.0 | NaN | PG | NaN | {} |
| 25 | tt0069002 | The Night Strangler | 7.5 | 86.0 | NaN | 2749.0 | NaN | Not Rated | 1 nomination. | {'nomination.': 1} |
| 26 | tt0069087 | The Pig Keeper's Daughter | 4.7 | NaN | NaN | 609.0 | NaN | NaN | NaN | {} |
| 27 | tt0069113 | The Poseidon Adventure | 7.1 | 79.0 | 70.0 | 40077.0 | NaN | PG | Won 1 Oscar. Another 4 wins & 13 nominations. | {'Oscar.': 1, 'wins': 4, 'nominations.': 13} |
| 28 | tt0069282 | Slither | 6.2 | 86.0 | NaN | 1189.0 | NaN | PG | 1 nomination. | {'nomination.': 1} |
| 29 | tt0069291 | Snowball Express | 6.5 | NaN | NaN | 1465.0 | NaN | G | NaN | {} |
| 30 | tt0069404 | Travels with My Aunt | 6.5 | 57.0 | NaN | 1930.0 | NaN | PG | Won 1 Oscar. Another 10 nominations. | {'Oscar.': 1, 'nominations.': 10} |
| 31 | tt0069449 | Up the Sandbox | 5.9 | 60.0 | NaN | 1264.0 | NaN | R | NaN | {} |
| 32 | tt0069686 | The Men Who Made the Movies: Alfred Hitchcock | 7.3 | NaN | NaN | 296.0 | NaN | NaN | NaN | {} |
| 33 | tt0069704 | American Graffiti | 7.4 | 96.0 | 97.0 | 80259.0 | NaN | PG | Nominated for 5 Oscars. Another 9 wins & 8 nominations. | {'Oscars.': 5, 'wins': 9, 'nominations.': 13} |
| 34 | tt0069754 | The Baby | 6.1 | 93.0 | NaN | 2820.0 | NaN | PG | NaN | {} |
| 35 | tt0069762 | Badlands | 7.8 | 98.0 | 93.0 | 64036.0 | NaN | PG | Nominated for 1 BAFTA Film Award. Another 3 wins. | Could not parse awards |
| 36 | tt0069765 | Bang the Drum Slowly | 6.9 | 92.0 | 80.0 | 5255.0 | NaN | PG | Nominated for 1 Oscar. Another 2 wins. | Could not parse awards |
| 37 | tt0069768 | Battle for the Planet of the Apes | 5.5 | 36.0 | 40.0 | 28321.0 | NaN | G | 2 nominations. | {'nominations.': 2} |
| 38 | tt0069792 | Black Caesar | 6.5 | 58.0 | NaN | 2995.0 | NaN | R | NaN | {} |
| 39 | tt0069796 | Black Snake | 5.1 | NaN | NaN | 533.0 | NaN | R | NaN | {} |
| 40 | tt0069797 | Blade | 5.5 | NaN | NaN | 204.0 | NaN | R | NaN | {} |
| 41 | tt0069808 | Blume in Love | 6.4 | 83.0 | NaN | 899.0 | NaN | R | 1 nomination. | {'nomination.': 1} |
| 42 | tt0069822 | Breezy | 7.0 | NaN | 68.0 | 4210.0 | NaN | R | NaN | {} |
| 43 | tt0069834 | Cahill U.S. Marshal | 6.6 | NaN | NaN | 5800.0 | NaN | PG | NaN | {} |
| 44 | tt0069840 | The Candy Snatchers | 6.4 | 83.0 | NaN | 1240.0 | NaN | R | NaN | {} |
| 45 | tt0069865 | Charley Varrick | 7.5 | 83.0 | NaN | 10715.0 | NaN | PG | Won 1 BAFTA Film Award. Another 1 nomination. | {'BAFTA': 1, 'nomination.': 1} |
| 46 | tt0069883 | Cinderella Liberty | 6.8 | 63.0 | NaN | 1859.0 | NaN | R | Nominated for 3 Oscars. Another 1 win & 6 nominations. | {'Oscars.': 3, 'win': 1, 'nominations.': 9} |
| 47 | tt0069890 | Cleopatra Jones | 5.9 | 79.0 | NaN | 2568.0 | NaN | PG | NaN | {} |
| 48 | tt0069895 | The Crazies | 6.1 | 67.0 | 63.0 | 11011.0 | NaN | R | NaN | {} |
| 49 | tt0069897 | Coffy | 6.8 | 79.0 | 60.0 | 9724.0 | NaN | R | NaN | {} |
| 50 | tt0069919 | Cops and Robbers | 6.3 | NaN | NaN | 677.0 | NaN | PG | NaN | {} |
| 51 | tt0069945 | Dark Star | 6.3 | 80.0 | 66.0 | 21782.0 | NaN | G | 1 win & 2 nominations. | {'win': 1, 'nominations.': 2} |
| 52 | tt0069946 | The Day of the Dolphin | 6.1 | 42.0 | NaN | 2637.0 | NaN | PG | Nominated for 2 Oscars. Another 1 win & 2 nominations. | {'Oscars.': 2, 'win': 1, 'nominations.': 4} |
| 53 | tt0069952 | Deadly Weapons | 3.9 | NaN | NaN | 849.0 | NaN | R | NaN | {} |
| 54 | tt0069957 | Deep Throat Part II | 4.0 | NaN | NaN | 467.0 | NaN | R | NaN | {} |
| 55 | tt0069966 | Detroit 9000 | 6.4 | 25.0 | NaN | 920.0 | NaN | R | NaN | {} |
| 56 | tt0069976 | Dillinger | 7.0 | 92.0 | NaN | 4559.0 | NaN | R | NaN | {} |
| 57 | tt0069992 | Don't Be Afraid of the Dark | 6.7 | 67.0 | NaN | 3306.0 | NaN | Unrated | NaN | {} |
| 58 | tt0069994 | The Forgotten | 5.1 | NaN | NaN | 2821.0 | NaN | R | NaN | {} |
| 59 | tt0070016 | Charlotte's Web | 6.9 | 76.0 | 73.0 | 16537.0 | NaN | G | 1 win. | {'win.': 1} |
| 60 | tt0070022 | Electra Glide in Blue | 7.2 | 60.0 | NaN | 4714.0 | NaN | PG | Nominated for 1 Golden Globe. Another 1 nomination. | Could not parse awards |
| 61 | tt0070030 | Emperor of the North Pole | 7.3 | 63.0 | NaN | 5566.0 | NaN | PG | NaN | {} |
| 62 | tt0070046 | Executive Action | 6.7 | 67.0 | NaN | 2452.0 | NaN | PG | NaN | {} |
| 63 | tt0070047 | The Exorcist | 8.0 | 83.0 | 81.0 | 354510.0 | NaN | R | Won 2 Oscars. Another 14 wins & 17 nominations. | {'Oscars.': 2, 'wins': 14, 'nominations.': 17} |
| 64 | tt0070068 | 40 Carats | 6.4 | NaN | NaN | 625.0 | NaN | PG | Nominated for 1 Golden Globe. Another 1 nomination. | Could not parse awards |
| 65 | tt0070077 | The Friends of Eddie Coyle | 7.5 | 100.0 | NaN | 8459.0 | NaN | R | NaN | {} |
| 66 | tt0070079 | From the Mixed-Up Files of Mrs. Basil E. Frankweiler | 6.4 | 52.0 | NaN | 469.0 | NaN | G | NaN | {} |
| 67 | tt0070107 | Ginger in the Morning | 5.2 | NaN | NaN | 242.0 | NaN | PG | NaN | {} |
| 68 | tt0070112 | The Girl Most Likely to... | 7.5 | NaN | NaN | 1192.0 | NaN | Not Rated | NaN | {} |
| 69 | tt0070115 | The Glass Menagerie | 7.3 | NaN | NaN | 1033.0 | NaN | TV-PG | Won 4 Primetime Emmys. Another 3 nominations. | {'Primetime': 4, 'nominations.': 3} |
| 70 | tt0070121 | Godspell: A Musical Based on the Gospel According to St. Matthew | 6.5 | 64.0 | NaN | 3080.0 | NaN | G | 2 nominations. | {'nominations.': 2} |
| 71 | tt0070157 | The Harrad Experiment | 4.7 | NaN | NaN | 677.0 | NaN | R | NaN | {} |
| 72 | tt0070158 | Harry in Your Pocket | 6.4 | NaN | NaN | 1110.0 | NaN | PG | NaN | {} |
| 73 | tt0070165 | Heavy Traffic | 6.6 | 89.0 | NaN | 2837.0 | NaN | R | NaN | {} |
| 74 | tt0070183 | Hit! | 6.1 | NaN | NaN | 420.0 | NaN | R | NaN | {} |
| 75 | tt0070199 | The Men Who Made the Movies: Howard Hawks | 6.9 | NaN | NaN | 269.0 | NaN | NaN | NaN | {} |
| 76 | tt0070207 | I Love a Mystery | 6.5 | NaN | NaN | 102.0 | NaN | NaN | NaN | {} |
| 77 | tt0070212 | The Iceman Cometh | 7.2 | 91.0 | NaN | 1391.0 | NaN | PG | 3 wins & 1 nomination. | {'wins': 3, 'nomination.': 1} |
| 78 | tt0070222 | Invasion of the Bee Girls | 5.0 | 43.0 | NaN | 2408.0 | NaN | R | NaN | {} |
| 79 | tt0070238 | Jeremy | 6.9 | NaN | NaN | 706.0 | NaN | PG | Nominated for 1 Golden Globe. Another 1 win & 1 nomination. | Could not parse awards |
| 80 | tt0070239 | Jesus Christ Superstar | 7.4 | 52.0 | 64.0 | 24783.0 | NaN | G | Nominated for 1 Oscar. Another 3 wins & 11 nominations. | {'Oscar.': 1, 'wins': 3, 'nominations.': 12} |
| 81 | tt0070242 | Jimi Hendrix | 7.8 | NaN | NaN | 1477.0 | NaN | R | NaN | {} |
| 82 | tt0070248 | Jonathan Livingston Seagull | 6.1 | 8.0 | NaN | 1813.0 | NaN | G | Nominated for 2 Oscars. Another 2 wins & 3 nominations. | {'Oscars.': 2, 'wins': 2, 'nominations.': 5} |
| 83 | tt0070284 | Lady Ice | 4.8 | NaN | NaN | 366.0 | NaN | PG | NaN | {} |
| 84 | tt0070287 | The Last American Hero | 6.4 | 80.0 | 74.0 | 1474.0 | NaN | PG | 1 win. | {'win.': 1} |
| 85 | tt0070290 | The Last Detail | 7.5 | 89.0 | 89.0 | 21585.0 | NaN | R | Nominated for 3 Oscars. Another 6 wins & 6 nominations. | {'Oscars.': 3, 'wins': 6, 'nominations.': 9} |
| 86 | tt0070291 | The Last of Sheila | 7.3 | 86.0 | NaN | 5144.0 | NaN | PG | 1 win. | {'win.': 1} |
| 87 | tt0070292 | The Laughing Policeman | 6.4 | 57.0 | NaN | 2370.0 | NaN | R | NaN | {} |
| 88 | tt0070322 | Lisa, Bright and Dark | 6.7 | NaN | NaN | 103.0 | NaN | NaN | NaN | {} |
| 89 | tt0070332 | Lolly-Madonna XXX | 6.5 | NaN | NaN | 533.0 | NaN | PG | NaN | {} |
| 90 | tt0070334 | The Long Goodbye | 7.6 | 93.0 | 87.0 | 25309.0 | NaN | R | 1 win & 1 nomination. | {'win': 1, 'nomination.': 1} |
| 91 | tt0070337 | Lost Horizon | 5.4 | 20.0 | NaN | 2134.0 | NaN | G | NaN | {} |
| 92 | tt0070350 | The Mack | 6.7 | 60.0 | NaN | 2596.0 | NaN | R | NaN | {} |
| 93 | tt0070355 | Magnum Force | 7.2 | 75.0 | 58.0 | 54901.0 | NaN | R | 2 nominations. | {'nominations.': 2} |
| 94 | tt0070379 | Mean Streets | 7.3 | 95.0 | 96.0 | 97354.0 | NaN | R | 5 wins & 5 nominations. | {'wins': 5, 'nominations.': 5} |
| 95 | tt0070497 | Outrage | 7.2 | NaN | NaN | 278.0 | NaN | NaN | NaN | {} |
| 96 | tt0070509 | The Paper Chase | 7.2 | 83.0 | 65.0 | 7004.0 | NaN | PG | Won 1 Oscar. Another 2 wins & 4 nominations. | {'Oscar.': 1, 'wins': 2, 'nominations.': 4} |
| 97 | tt0070510 | Paper Moon | 8.1 | 92.0 | 77.0 | 41326.0 | NaN | PG | Won 1 Oscar. Another 7 wins & 10 nominations. | {'Oscar.': 1, 'wins': 7, 'nominations.': 10} |
| 98 | tt0070608 | Robin Hood | 7.6 | 54.0 | 57.0 | 112863.0 | NaN | G | Nominated for 1 Oscar. Another 1 win & 1 nomination. | Could not parse awards |
| 99 | tt0070622 | Sssssss | 5.4 | 30.0 | NaN | 3224.0 | NaN | PG | 1 nomination. | {'nomination.': 1} |
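
Several rows in the table above show "Could not parse awards" in the Awards column. The sketch below is one regular-expression approach that copes with those blurbs. It is not the parser that produced the table: `parse_awards` and its output keys are hypothetical, and it assumes every blurb follows the patterns visible above ("Won N X.", "Nominated for N X.", "Another N wins & M nominations."). The `wins` and `nominations` counts here are only the "other" ones, excluding the named award.

```python
import re

# "Won 2 Oscars." / "Nominated for 1 BAFTA Film Award."
MAJOR = re.compile(r"(Won|Nominated for) (\d+) ([^.]+)\.")
# "4 wins" or "1 win"
WINS = re.compile(r"(\d+) wins?\b")
# "13 nominations" or "1 nomination"
NOMS = re.compile(r"(\d+) nominations?\b")


def parse_awards(blurb):
    """Parse an awards blurb into a small dict of counts."""
    result = {"major_won": {}, "major_nominated": {}, "wins": 0, "nominations": 0}
    if not isinstance(blurb, str):
        return result  # NaN or missing blurb
    for verb, count, name in MAJOR.findall(blurb):
        bucket = "major_won" if verb == "Won" else "major_nominated"
        result[bucket][name] = int(count)
    wins = WINS.search(blurb)
    noms = NOMS.search(blurb)
    if wins:
        result["wins"] = int(wins.group(1))
    if noms:
        result["nominations"] = int(noms.group(1))
    return result
```

It could be applied with something like `main_df['Awards'] = main_df['Awards_Blurb'].map(parse_awards)`, assuming the blurb column keeps the name used in the fetch loop.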