When trying to source materials from overseas, Alibaba is a great place to start.
You can find not only thousands of items but also suppliers with track records and ratings.
But sorting through all the listings to find the best supplier for your business can take a lot of time. And you surely do not want to make a hasty decision about such a key component of your business.
This is where a web scraper can help.
Alibaba and Web Scraping
A web scraper can easily extract all the information you need from a website into a convenient spreadsheet for further analysis.
In this case, we will use ParseHub, a free and powerful web scraper, to pull information from Alibaba’s search results page for the term “phone case”.
Want to learn more about web scraping? Check out our guide on what web scraping is and how it is used.
Scraping Alibaba Product Data
Now, we will walk you through scraping Alibaba product data into a spreadsheet.
Getting Started
- Download and install ParseHub for free, then open it.
- Click on “New Project” and submit the URL you will be scraping. In this case, we will submit the URL for Alibaba’s search results page for the term “phone case”.
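If you want to scrape several search terms, it can help to generate the results-page URLs instead of copying them by hand. The sketch below is only an illustration: the base URL and the `SearchText` parameter name are assumptions, so compare the output with the address your browser shows after you search on Alibaba.

```python
from urllib.parse import urlencode

# Assumed base URL and query parameter name; verify both against the
# address bar after running a search on Alibaba yourself.
SEARCH_BASE = "https://www.alibaba.com/trade/search"


def build_search_url(term: str) -> str:
    """Return a search results URL for the given term."""
    # urlencode escapes spaces and special characters in the term.
    return f"{SEARCH_BASE}?{urlencode({'SearchText': term})}"


print(build_search_url("phone case"))
```

The resulting string is what you would paste into the “New Project” URL field.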
Scraping Alibaba Results Page
Once submitted, the page will be rendered in ParseHub, and you can select your first element to extract.
- Start by clicking on the name of the first product on the page. It will be highlighted green to indicate it has been selected.
- The rest of the product names will be highlighted in yellow. Click on the second one on the page to select them all. On the left sidebar, rename your selection to product.
- Using the PLUS(+) sign next to the product selection, choose the Relative Select command.
- Using the Relative Select command, click on the first product name and then on its price. An arrow will appear to indicate the selection.
- On the left sidebar, rename your selection to price.
- Repeat steps 3-5 to create new Relative Select commands that extract additional product data, such as minimum order size, seller’s age, country, seller name, review score, number of reviews and response rate.
- In this case, we have decided to stop ParseHub from also extracting the target URL from the review_score and reviews commands. This is done by expanding the selection and deleting the extraction.
- Up to this point, your project should look like this:
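Separately from the project view in ParseHub, it may help to picture the data these selections produce. The sketch below assumes each product becomes one entry with its Relative Select results stored as fields on it. The field names reuse the selection names from this walkthrough, and every value is an invented placeholder.

```python
import csv
import io

# Placeholder data: one entry per "product" selection, with each
# Relative Select stored as a field. All values are invented.
results = {
    "product": [
        {"name": "Example case A", "price": "$1.00", "country": "CN"},
        {"name": "Example case B", "price": "$2.50"},  # no country found
    ]
}


def to_rows(data, fields):
    """Flatten nested results into one row per product.

    A field that was not found for a product becomes an empty cell.
    """
    return [[item.get(f, "") for f in fields] for item in data["product"]]


fields = ["name", "price", "country"]
rows = to_rows(results, fields)

# Write the rows as CSV text, the way a spreadsheet would lay them out.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(fields)
writer.writerows(rows)
print(buffer.getvalue())
```

Each Relative Select you add simply becomes another column in this layout.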
Scraping Alibaba Product Pages
Now, there might be additional product information you’d want to scrape from within the actual product pages. If you’re not interested in extracting further information, skip to the next part. Otherwise, read on.
- First, we’ll have to tell ParseHub to click on the title of each listing on the page. To do this, we will use the PLUS(+) sign next to the product selection and choose the Click command.
- A click setup screen will appear asking if this is a “next page” button. Click No, select “Create New Template” and name it product_page. Then click on the Create New Template button.
- ParseHub will now render the first product page and let you select new data to extract.
- For this example, we will scrape information from the Quick Details table. We will start by selecting the first label in the table, which will then be highlighted in green.
- The rest of the labels will be highlighted in yellow. Click on the second one to select them all. Rename the selection to labels.
- Now, expand the labels selection and remove the “Begin new entry in labels” command.
- Now, use the PLUS(+) sign next to the labels selection to add a Conditional command (you will have to expand this menu to show the command).
- For our first conditional, we will use $e.text.contains("Brand Name")
- Now use the PLUS(+) sign next to the conditional command and use the Relative Select command to select the text next to the Brand Name label.
- You can now copy/paste your conditional selection to extract additional fields. Just make sure to update the conditional expression and drag the elements so they are not nested within themselves. Your final project should look like this:
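The logic behind these conditionals can be sketched in Python: walk over every label, and keep the neighbouring value only when the label text contains the string you are looking for, much like `$e.text.contains("Brand Name")`. This is an illustration of the idea rather than ParseHub's own implementation; the table rows, field names and helper function are invented for the example.

```python
# Placeholder (label, value) pairs standing in for the rows of a
# Quick Details table. All values are invented.
quick_details = [
    ("Brand Name:", "ExampleBrand"),
    ("Material:", "TPU"),
    ("Place of Origin:", "Example City"),
]

# Output field -> text its label must contain. Each entry plays the
# role of one copied Conditional command with an updated expression.
wanted = {"brand": "Brand Name", "material": "Material"}


def extract_fields(rows, wanted):
    """Keep only the values whose label contains a wanted string."""
    extracted = {}
    for label, value in rows:
        for field, needle in wanted.items():
            if needle in label:
                extracted[field] = value
    return extracted


product = extract_fields(quick_details, wanted)
print(product)
```

Labels that match no condition, such as the place of origin here, are skipped, which is why each extra field needs its own conditional.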
Adding Pagination
ParseHub is now extracting info on every product on the first page of results. Let’s now set it up to extract info from the second page and onwards.
- Using the tabs on the left side of the application, return to your main_template. You might also need to use the browser tabs to return to the search results page.
- Using the PLUS(+) button next to your page selection, choose the Select command.
- Use the command to select the “next page” button at the bottom of the page. Rename your selection to next.
- Expand your new next selection and remove the extract command.
- Now use the PLUS(+) button next to your next selection and choose the Click command.
- A click setup window will appear asking you if this is a “next page” button. Click Yes and enter the number of times you’d like to repeat the process. In this case, we will repeat it 4 more times.
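Repeating the “next page” click 4 more times means the project covers 5 pages in total: the first page plus 4 follow-ups. The loop below is a minimal sketch of that behaviour. The page numbers, `get_next` and `extract` are hypothetical stand-ins for the real site, not ParseHub functions.

```python
def scrape_with_pagination(first_page, get_next, extract, repeats):
    """Extract the first page, then follow "next" up to `repeats` times."""
    page = first_page
    collected = list(extract(page))
    for _ in range(repeats):
        page = get_next(page)
        if page is None:  # no "next page" button left
            break
        collected.extend(extract(page))
    return collected


# Hypothetical stand-ins: pages are plain numbers and the pretend
# site has 10 pages of results.
LAST_PAGE = 10


def get_next(page):
    return page + 1 if page < LAST_PAGE else None


def extract(page):
    return [f"page {page} item"]


items = scrape_with_pagination(1, get_next, extract, repeats=4)
print(len(items))
```

Note that the loop also stops early if it runs out of pages, so asking for more repeats than there are pages is harmless in this sketch.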
Running and Exporting your Project
Your project is now ready. To run your scrape job, click on the Get Data button on the left sidebar and select Run.
Pro-Tip: For longer jobs, we recommend doing a Test Run first to confirm that your data is being scraped correctly.
Now ParseHub is off to collect the data you have requested. You will receive an email notification once your scrape is complete.
Final Thoughts
Once your scrape is complete, you will be able to download the data as an Excel spreadsheet.
Having access to this valuable information can be the difference between starting your business on the right foot or selecting the wrong supplier.
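As a starting point for that analysis, here is a small sketch that shortlists suppliers by lowest listed price and review score. It assumes prices arrive as text such as "$1.20-$3.50" and scores as text such as "4.8"; the rows, column names and thresholds are invented, so adjust them to match your own export.

```python
import re


def parse_price_range(text):
    """Return (low, high) as floats, or None if the text has no number."""
    cleaned = text.replace(",", "")  # drop thousands separators
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", cleaned)]
    if not numbers:
        return None
    return min(numbers), max(numbers)


# Invented rows standing in for a few lines of the exported data.
rows = [
    {"seller": "Supplier A", "price": "$1.20-$3.50", "review_score": "4.8"},
    {"seller": "Supplier B", "price": "$0.90", "review_score": "4.1"},
    {"seller": "Supplier C", "price": "$2.00-$2.40", "review_score": "4.9"},
]


def shortlist(rows, max_low_price, min_score):
    """Keep sellers that are cheap enough and rated highly enough."""
    keep = []
    for row in rows:
        prices = parse_price_range(row["price"])
        if prices is None:
            continue  # skip listings without a readable price
        low, _high = prices
        if low <= max_low_price and float(row["review_score"]) >= min_score:
            keep.append(row["seller"])
    return keep


print(shortlist(rows, max_low_price=1.50, min_score=4.5))
```

The same filter can be extended with the other columns from this project, such as minimum order size or response rate.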
Have any questions about ParseHub and web scraping? Message us at hello[at]parsehub.com