For many people, Walmart might seem more like a retail giant than an e-commerce company.
But don’t be fooled: Walmart has been working hard to compete with Amazon in the modern era of e-commerce.
It has even introduced an offer to rival Amazon Prime shipping: free one-day shipping on orders over $35.
Walmart’s online inventory has grown massively over the last few years as a result, so there is a lot of value to be unlocked from having access to its inventory data.
What if you could easily download data for a variety of products from Walmart’s site into a spreadsheet? Web scraping is the answer.
Walmart and Web Scraping
A web scraper will allow us to choose any product category from Walmart’s website and extract the information from these products into a more useful format.
Scraping Walmart Product Data
- First, make sure to download and install ParseHub for free. Once installed, boot it up and log in.
- Then, click on “New Project” and enter the URL for the results page you’d like to scrape. In this case, we will scrape Walmart’s results page for the term “tablet”.
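If you plan to scrape several search terms, it can help to generate the results-page URLs rather than copying them by hand. The sketch below assumes Walmart's search URL takes the term in a `q` query parameter. That pattern and the `search_url` helper are assumptions for illustration, so compare the output with the address your browser shows before pasting it into ParseHub.

```python
from urllib.parse import urlencode

# Assumed base URL and parameter name; verify against your browser's address bar.
SEARCH_BASE = "https://www.walmart.com/search"


def search_url(term: str) -> str:
    """Build a results-page URL for a search term (hypothetical helper)."""
    return f"{SEARCH_BASE}?{urlencode({'q': term})}"


print(search_url("tablet"))
print(search_url("android tablet"))
```

`urlencode` takes care of escaping spaces and special characters, which matters for multi-word search terms.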
Scraping Walmart Results Page
- Once you’ve submitted the URL, the page will render in the app, ready for you to select the first element on the list.
- To start, click on the name of the first item on the list. It will be highlighted in green to indicate that it has been selected.
- You will notice that the next few listings on the page are highlighted in yellow. Click on the second one on the list to select them all; they will all now be highlighted in green. On the left sidebar, rename your selection to product.
- ParseHub is now pulling the product’s name and listing URL. On the left sidebar, use the PLUS(+) next to the product selection and select the “Relative Select” command.
- Using the “Relative Select” command, click on the title of one of the listings and then on the price next to it. An arrow will appear to show the association. You might need to press Ctrl+1 while hovering over the listing’s price to select the full price amount. Rename your selection to price.
- Repeat the previous two steps to also scrape the product’s star rating. Rename your selection to rating.
- Since the rating is represented by images, we will have to edit our extraction accordingly. First, expand your rating selection.
- Now, click on the “Extract rating” command, and on the dropdown below, select “aria-label Attribute”. ParseHub will now extract the product’s rating and number of reviews.
- Lastly, delete the extract URL command below it.
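The aria-label value arrives as a single text string, so the rating and the review count usually need to be split apart after export. Here is a minimal sketch, assuming the label reads something like "4.5 out of 5 Stars. 1,234 reviews". The exact wording on Walmart's site may differ, and `parse_rating_label` is a hypothetical helper, so adjust the pattern to match your own data.

```python
import re
from typing import Optional, Tuple

# Assumed label shape: "<rating> out of 5 Stars. <count> reviews".
LABEL_PATTERN = re.compile(
    r"(?P<rating>\d+(?:\.\d+)?)\s+out of\s+5.*?(?P<count>[\d,]+)\s+review",
    re.IGNORECASE,
)


def parse_rating_label(label: str) -> Optional[Tuple[float, int]]:
    """Return (rating, review_count), or None if the label does not match."""
    match = LABEL_PATTERN.search(label)
    if match is None:
        return None
    rating = float(match.group("rating"))
    # Strip thousands separators before converting the count.
    count = int(match.group("count").replace(",", ""))
    return rating, count


print(parse_rating_label("4.5 out of 5 Stars. 1,234 reviews"))
print(parse_rating_label("No rating yet"))
```

Returning `None` for labels that do not match keeps unrated products from breaking the rest of your clean-up.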
Scraping Walmart Product Pages
You might want to pull even more information than what is present on the search results page, such as the Walmart product number and other details.
To do this, we will make ParseHub click on every listing in the results to scrape more information about each one.
- First, click on the PLUS(+) sign next to the product selection and choose the “Click” command.
- A pop-up will appear asking if your selection is a “next page” button. Click “No” and select “Create New Template”. Name it product_page and click on “Create New Template”.
- The first product page on the list will open in ParseHub, with a new Select command. Click on the product title to start scraping. Rename the selection to product.
- Since we are already pulling the product name from the search results page, we will expand the selection and remove the extract command.
- Now, use the Relative Select command (as you did for the price in the previous section) to select the Walmart product number. Rename your selection to number.
- Repeat the previous step to create a Relative Select command that extracts the product image as well. You will have to press Ctrl+2 to select the element with the image URL in it. Rename your selection to image.
Pro-Tip: Want to not only extract the image URL but also download the images themselves? Check out our guide on how to scrape and download images from any website.
After you have selected any additional data you want to scrape from the product page, we can go back and set up ParseHub to scrape several search result pages, rather than just one.
- On the left sidebar, click on “main_template” to return to the template we originally worked on.
- Next, click on the PLUS(+) button next to the page selection and use the “Select” command.
- Using this new “Select” command, click on the next page link at the bottom of the search results page and rename your selection to next.
- Expand the next selection and remove its extract command.
- Using the PLUS(+) sign next to the next selection, add a Click command.
- A pop-up will appear asking if your selection is a “next page” button. Click “Yes” and enter the number of times you’d like ParseHub to repeat this process. In this case we will repeat 4 more times.
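The “next page” setting amounts to a bounded loop: scrape the current page, follow the next link, and stop after a fixed number of repeats or when no link remains. The sketch below illustrates that logic in Python with an in-memory stand-in for the site. `scrape_pages`, `fetch_page`, and the `FAKE_SITE` data are all hypothetical, and this is not a description of how ParseHub works internally.

```python
from typing import Callable, List, Optional, Tuple

# A page fetcher returns the items on a page plus the next page's key (or None).
Page = Tuple[List[str], Optional[str]]


def scrape_pages(
    start: str,
    fetch_page: Callable[[str], Page],
    max_repeats: int,
) -> List[str]:
    """Collect items from the start page and up to max_repeats further pages."""
    items: List[str] = []
    current: Optional[str] = start
    pages_left = 1 + max_repeats  # the first page plus the repeats
    while current is not None and pages_left > 0:
        page_items, next_key = fetch_page(current)
        items.extend(page_items)
        current = next_key
        pages_left -= 1
    return items


# Hypothetical stand-in for seven pages of search results, two items each.
FAKE_SITE = {
    f"page{n}": ([f"item{n}a", f"item{n}b"], f"page{n + 1}" if n < 7 else None)
    for n in range(1, 8)
}

results = scrape_pages("page1", FAKE_SITE.__getitem__, max_repeats=4)
print(len(results))
```

With 4 repeats the loop visits five pages in total, which is why the first page is counted separately from the repeats.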
Running and Exporting your Project
Once you have fully set up your project, use the “Get Data” button on the left sidebar to run your scrape job. For larger and more complex projects, we recommend running a Test Run first to ensure that your data will be extracted correctly.
Once your scrape job is completed, you will be able to download the data you’ve selected as a convenient spreadsheet.
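Exported prices come through as text, so a short clean-up pass is often the first step before any analysis. Here is a minimal sketch using Python's standard `csv` module. The column names (`product_name`, `product_price`) and the sample rows are assumptions for illustration, so match them to the headers in your own export.

```python
import csv
import io
from typing import Dict, List, Optional

# Hypothetical sample standing in for the downloaded file.
SAMPLE_CSV = """product_name,product_price
Example Tablet A,$129.00
Example Tablet B,"$1,049.99"
Example Tablet C,
"""


def parse_price(text: str) -> Optional[float]:
    """Convert a string such as '$1,049.99' to a float; None if blank or invalid."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None


def load_products(handle) -> List[Dict[str, object]]:
    """Read rows from an open CSV file and attach a numeric price."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "name": row["product_name"],
            "price": parse_price(row["product_price"] or ""),
        })
    return rows


products = load_products(io.StringIO(SAMPLE_CSV))
priced = [p for p in products if p["price"] is not None]
cheapest = min(priced, key=lambda p: p["price"])
print(cheapest["name"], cheapest["price"])
```

To read a real export, replace `io.StringIO(SAMPLE_CSV)` with an opened file, for example `open("walmart_tablets.csv", newline="", encoding="utf-8")`, where the filename is whatever you saved the download as.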
You can now use the steps above to scrape data from any product category or search term on Walmart’s website.