Amazon offers numerous services to its Prime members.
One thing it does not offer, though, is easy access to its product data.
There’s currently no way to simply export Amazon product data to a spreadsheet for your business needs, whether for competitor research, comparison shopping or building an API for your app project.
Web scraping easily solves this issue.
Amazon and Web Scraping
Web scraping lets you extract the specific data you want from the Amazon website into a spreadsheet or JSON file. You can even automate the process so that it runs on a daily, weekly or monthly basis to keep your data up to date.
For this task, we will use ParseHub, an incredibly powerful web scraper. To make things even better, ParseHub is free to download.
Scraping Amazon Product Data
For this example, we will scrape product data from Amazon.com’s results page for “computer monitor”. We will extract information both from the results page and from each of the individual product pages.
- First, download and install ParseHub, the web scraper we will use for this project.
- Open ParseHub, click on “New Project” and enter the URL of Amazon’s results page. The page will now be rendered inside the app.
Scraping Amazon Results Page
- Once the site is rendered, click on the product name of the first result on the page. In this case, we will ignore the sponsored listings. The name you’ve clicked will become green to indicate that it’s been selected.
- The rest of the product names will be highlighted in yellow. Click on the second one on the list. Now all of the items will be highlighted in green.
- On the left sidebar, rename your selection to product. You will notice that ParseHub is now extracting the product name and URL for each product.
- On the left sidebar, click the PLUS(+) sign next to the product selection and choose the Relative Select command.
- Using the Relative Select command, click on the first product name on the page and then on its listing price. You will see an arrow connect the two selections.
- Expand the new command you’ve created and then delete the URL that is also being extracted by default.
- Repeat the last three steps to also extract each product’s star rating, number of reviews and image. Make sure to rename your new selections accordingly.
Pro Tip: The method above will only extract the image URL for each product. Want to download the actual image file from the site? Read our guide on how to scrape and download images with ParseHub.
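The results-page fields arrive as plain text, so you will often want to tidy them after export. The Python sketch below is one way to do that, assuming the selections were renamed name, url, price, rating, reviews and image (hypothetical field names; use whatever names you chose) and that the values look like “$129.99”, “4.5 out of 5 stars” and “1,234”. Amazon’s actual formatting may differ, so treat the patterns as a starting point.

```python
import re


def parse_price(text):
    """Convert a price string such as "$1,299.99" to a float (None if no number is found)."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text or "")
    return float(match.group().replace(",", "")) if match else None


def parse_rating(text):
    """Take the first number from a rating string such as "4.5 out of 5 stars"."""
    match = re.search(r"\d+(?:\.\d+)?", text or "")
    return float(match.group()) if match else None


def parse_review_count(text):
    """Strip everything but digits from a count such as "1,234" and return an int."""
    digits = re.sub(r"[^\d]", "", text or "")
    return int(digits) if digits else None


def clean_product(row):
    """Clean one exported product row; the field names here are hypothetical."""
    return {
        "name": row.get("name"),
        "url": row.get("url"),
        "price": parse_price(row.get("price")),
        "rating": parse_rating(row.get("rating")),
        "reviews": parse_review_count(row.get("reviews")),
        "image": row.get("image"),
    }
```

Missing or unparseable values come back as None rather than raising, so one odd listing does not stop the whole cleanup.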
We have now selected all the data we wanted to scrape from the results page. Your project should now look like this:
Scraping Amazon Product Page
Now, we will tell ParseHub to click on each of the products we’ve selected and extract additional data from each page. In this case, we will extract the product ASIN, Screen Size and Screen Resolution.
- First, on the left sidebar, click on the 3 dots next to the main_template text.
- Rename your template to search_results_page. Templates help ParseHub keep different page layouts separate.
- Now use the PLUS(+) button next to the product selection and choose the “Click” command. A pop-up will appear asking whether this link is a “next page” button. Click “No”, and in the Create New Template field enter a new template name; in this case, we will use product_page.
- ParseHub will now automatically create this new template and render the Amazon product page for the first product on the list.
- Scroll down to the “Product Information” section of the page and, using the Select command, click on the first element of the list. In this case, it will be the Screen Size item.
- As before, keep selecting items until they all turn green. Rename this selection to labels.
- Expand the labels selection and remove the begin new entry in labels command.
- Now click the PLUS(+) sign next to the labels selection and use the Conditional command. This lets us pull only specific pieces of information from these items.
- For our first Conditional command, we will use the following expression:
- We will then use the PLUS(+) sign next to our conditional command to add a Relative Select command. We will now use this Relative Select command to first click on the Screen Size text and then on the actual measurement next to it (in this case, 21.5 inches).
- ParseHub will now extract the product’s screen size into its own column. To pull other information, copy and paste the Conditional command you just created and edit its expression. For example, the ASIN expression will be:
- Lastly, make sure your conditional selections are aligned properly so that none of them are nested inside one another. You can drag and drop the selections to fix this. The final template should look like this:
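To make the idea behind the Conditional commands concrete, here is a rough Python analogy. It is not ParseHub’s expression syntax; it simply shows the same filtering logic: walk through every label in the Product Information list and keep a value only when its label name matches the column you want. The list-of-pairs input shape and the column names are hypothetical.

```python
# Hypothetical mapping: output column -> text to look for in the label name.
WANTED = {
    "screen_size": "Screen Size",
    "asin": "ASIN",
    "resolution": "Resolution",
}


def pick_fields(labels, wanted=WANTED):
    """Mimic one Conditional command per column.

    labels is assumed to be a list of {"name": ..., "value": ...} dicts,
    one per row of the product's information table.
    """
    row = {column: None for column in wanted}
    for label in labels:
        name = (label.get("name") or "").strip().lower()
        for column, needle in wanted.items():
            # Keep the first matching value; ignore labels we did not ask for.
            if needle.lower() in name and row[column] is None:
                row[column] = (label.get("value") or "").strip()
    return row
```

Because every condition is checked independently for each label, the columns stay side by side in one row, which is the same reason the Conditional commands in ParseHub should not be nested inside one another.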
Now, you might want to scrape several pages’ worth of data for this project. So far, we are only scraping page 1 of the search results. Let’s set up ParseHub to navigate through the first 10 pages of results.
- On the left sidebar, return to the search_results_page template. You may also need to switch the browser tab back to the search results page.
- Click on the PLUS(+) sign next to the page selection and choose the Select command.
- Then select the Next page link at the bottom of the Amazon page. Rename the selection to next_button.
- By default, ParseHub will extract the text and URL from this link, so expand your new next_button selection and remove these 2 commands.
- Now, click on the PLUS(+) sign of your next_button selection and use the Click command.
- A pop-up will appear asking if this is a “Next” link. Click Yes and enter the number of pages you’d like to navigate to. In this case, we will scrape 9 additional pages.
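The next_button setup follows a simple loop: scrape the current page, follow its “Next” link, and stop after a fixed number of extra pages or when there is no next link. The Python sketch below models that loop offline. The get_page function is a hypothetical stand-in for loading and scraping a page; it is not part of ParseHub.

```python
def crawl(start_url, get_page, max_extra_pages=9):
    """Follow "next" links the way the next_button Click command does.

    get_page(url) is a hypothetical callable returning (products, next_url),
    where next_url is None on the last page. The start page is always
    visited, plus at most max_extra_pages further pages.
    """
    results = []
    url = start_url
    pages_visited = 0
    while url is not None and pages_visited <= max_extra_pages:
        products, next_url = get_page(url)
        results.extend(products)
        pages_visited += 1
        url = next_url
    return results, pages_visited
```

With max_extra_pages=9, the loop visits page 1 plus 9 more, matching the 10 pages configured above, and it stops early if the results run out first.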
Running and Exporting your Project
Now that we are done setting up the project, it’s time to run our scrape job.
On the left sidebar, click on the "Get Data" button and click on the "Run" button to run your scrape. For longer projects, we recommend doing a Test Run to verify that your data will be formatted correctly.
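If you would rather start runs from your own scripts, for example as part of the daily, weekly or monthly refresh mentioned earlier, ParseHub also offers a REST API. The sketch below assumes the v2 “run project” endpoint (a POST to /api/v2/projects/{PROJECT_TOKEN}/run with an api_key parameter and an optional start_url); please verify the path and parameters against ParseHub’s API documentation before relying on it. The project token and API key are placeholders you would take from your own account.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL for ParseHub's v2 REST API.
API_BASE = "https://www.parsehub.com/api/v2"


def build_run_request(project_token, api_key, start_url=None):
    """Build (but do not send) a form-encoded POST that starts a project run."""
    params = {"api_key": api_key}
    if start_url:
        params["start_url"] = start_url  # optional: override the project's start URL
    url = f"{API_BASE}/projects/{urllib.parse.quote(project_token)}/run"
    body = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def start_run(project_token, api_key, start_url=None):
    """Send the request and decode the JSON response describing the new run."""
    request = build_run_request(project_token, api_key, start_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping the request construction separate from sending it makes it easy to inspect what will be sent before hooking start_run up to a scheduler such as cron.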
After the scrape job is completed, you will be able to download all the information you requested as a handy spreadsheet or as a JSON file.
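If you download the JSON file and feed it into your own Python pipeline, nested objects may need flattening before they fit in a table. The sketch below assumes the export holds a list of products under a top-level "product" key, mirroring the selection name used in this tutorial; that shape is an assumption, so adjust list_key (and check the nesting) to match your actual file.

```python
import csv
import io
import json


def flatten(item, prefix=""):
    """Flatten nested dicts into one level, joining keys with underscores."""
    flat = {}
    for key, value in item.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        elif isinstance(value, list):
            # Keep lists as JSON text so they fit in a single CSV cell.
            flat[name] = json.dumps(value, ensure_ascii=False)
        else:
            flat[name] = value
    return flat


def export_to_csv(json_text, list_key="product"):
    """Turn a JSON export (assumed shape: {list_key: [...]}) into CSV text."""
    data = json.loads(json_text)
    rows = [flatten(item) for item in data.get(list_key, [])]
    # Use the union of all keys, in first-seen order, as the header.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)  # missing fields become ""
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Collecting the header from every row, rather than from the first one, matters here because products do not always have the same fields; a listing without a rating simply gets an empty cell.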
And that’s it! You are now ready to scrape Amazon data to your heart’s content.
But why stop there? With the skills you’ve just learned, you could scrape almost any other site.