Web scraping movie data from IMDb
Many different types of movies come out every year!
From comedies to horror movies, from children's movies to documentaries, there’s a movie for everyone.
We’ll go over how to scrape movie data from the popular movie website IMDb.
We’ll extract the following data:
- Movie title
- Movie page link
- Year
- Movie genre
- IMDb rating
- Meta score
- Director
- Main cast
- Votes
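If you plan to work with the results in code later, it can help to picture one scraped row as a small record. The sketch below is only an illustration in Python: the class name `MovieRecord`, the field names, and the placeholder values are all assumptions made for this example, not names that ParseHub or IMDb define. Most fields are optional because some values (a Meta score, for instance) may be missing for a given movie.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MovieRecord:
    """One scraped movie. Field names are illustrative, not ParseHub's."""
    title: str
    link: str
    year: Optional[int] = None
    genres: List[str] = field(default_factory=list)
    imdb_rating: Optional[float] = None
    meta_score: Optional[int] = None  # may be absent for some movies
    director: Optional[str] = None
    main_cast: List[str] = field(default_factory=list)
    votes: Optional[int] = None


# Placeholder values only, to show the shape of a record.
example = MovieRecord(
    title="Example Movie",
    link="https://www.imdb.com/title/tt0000000/",
    year=2000,
    genres=["Drama"],
    imdb_rating=8.0,
    votes=1000,
)
print(example)
```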
Before we get started, you’ll need to download a free web scraper. While there are many options available, we think you’ll enjoy ParseHub!
It’s free to use and has plenty of features for projects like this one.
Web scraping movie data from IMDb
If you want to follow along with this project, you can use the IMDb “Top 1000” list.
Let’s get started!
- Download and install ParseHub. Click on the New Project button and submit the URL into the text box. The website will now render inside the app.
- A Select command will automatically be created. While using the Select command, click on the first movie title on the page. The title you selected will turn green, and ParseHub will highlight in yellow the other elements it suggests extracting.
- Click on the next movie title that is in yellow to select them all. You may need to do this 2-3 times to teach ParseHub what to extract. The rest of the movie titles will now be highlighted in green.
- On the left sidebar, rename your selection to something more appropriate. We’re going to name it “movie”.
- Click on the PLUS (+) sign next to your movie selection and choose the Relative Select command.
- Click on the first movie that is highlighted in orange, then click on the rating below it. An arrow will appear showing the association you have created. You may need to repeat this step to fully train the web scraper. Rename your selection to “rating”.
- Repeat steps 5-6 to extract the remaining data: year, movie genre, Meta score, director, main cast and votes.
Adding pagination
If we were to start our project now, it would only extract the first page of movies. We will now show you how to add pagination to your web scraping project.
- Click the PLUS (+) sign next to your page selection and choose the Select command.
- Using the Select command, scroll down to the Next Page link. Click on it to select it and rename your selection to next_button.
- Click on the icon next to your next_button selection to expand it. Delete the two commands under the next_button selection.
- Click on the PLUS (+) sign next to your next_button selection and add a Click command.
- A pop-up will appear asking you if this is a “next page” link. Click on Yes and enter the number of times you’d like to repeat this process. In this case, we will repeat it 4 times.
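To get a feel for how much data a given repeat count produces, the arithmetic can be sketched in a few lines of Python. This assumes each repeat loads exactly one additional page on top of the starting page. The function names are made up for this example, and the figure of 50 movies per page is only an assumed value, so check the live list before relying on it.

```python
def pages_scraped(next_clicks: int) -> int:
    """Pages visited: the starting page plus one page per click on Next."""
    if next_clicks < 0:
        raise ValueError("next_clicks must be non-negative")
    return next_clicks + 1


def max_movies(next_clicks: int, movies_per_page: int) -> int:
    """Upper bound on scraped rows, assuming every visited page is full."""
    return pages_scraped(next_clicks) * movies_per_page


# movies_per_page=50 is an assumption for illustration only.
print(pages_scraped(4))   # 5
print(max_movies(4, 50))  # 250
```

Under these assumptions, repeating 4 times covers 5 pages in total, so scraping the whole list would need a larger repeat count.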
Running your Scrape
It is now time to run your scrape. To do this, click on the green Get Data button on the left sidebar. Here you will be able to test, schedule, or run your scrape job.
For larger projects, we recommend that you always test your job before running it. In this case, we will run it right away.
Once your run is completed, you will be able to download the results as an Excel or JSON file.
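If you choose the JSON download and want to work with it outside a spreadsheet, a short Python script can flatten it. The sketch below assumes the export is a JSON object with one key per top-level selection (here “movie”) holding a list of rows. The keys inside each row depend entirely on how you named your selections, so `name`, `url`, `rating` and `year` are placeholders, as are the sample values and the helper names `flatten` and `to_csv`. An inline string stands in for the downloaded file so the example runs on its own.

```python
import csv
import io
import json

# Inline stand-in for a downloaded export; structure and values are assumptions.
sample_export = """
{
  "movie": [
    {"name": "Example Movie A", "url": "https://www.imdb.com/title/tt0000001/",
     "rating": "8.5", "year": "2001"},
    {"name": "Example Movie B", "url": "https://www.imdb.com/title/tt0000002/",
     "rating": "7.9"}
  ]
}
"""


def flatten(export_text, selection="movie"):
    """Return (column names, rows) for one top-level selection."""
    data = json.loads(export_text)
    rows = data.get(selection, [])
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:  # keep first-seen column order
                columns.append(key)
    return columns, rows


def to_csv(columns, rows):
    """Render rows as CSV text, leaving missing fields empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


columns, rows = flatten(sample_export)
csv_text = to_csv(columns, rows)
print(csv_text)
```

To use it on a real download, read the file's contents with `open(...)` and pass the text to `flatten` in place of `sample_export`.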
Closing Thoughts
Pretty easy, huh?
Now you know how to scrape movie data from IMDb without any coding skills!
We understand that projects can get quite complicated. If you run into any trouble, you can contact our customer support team using our live chat.
What will you scrape?
Happy Scraping!