In this tutorial, we will show you how to scrape new or used books and other products from the ThriftBooks website!
ThriftBooks was founded in 2003 in the United States and has since sold over 165 million books on its e-commerce platform. Although ThriftBooks hosts its own e-commerce website, it also sells used books on Amazon and eBay. ThriftBooks also offers used DVDs, CDs, video games, and old tapes, which are often purchased online as collectibles. Aside from collectors, many consumers buy from ThriftBooks to save money on the books and media they want.
Let’s begin scraping ThriftBooks!
Step 1: Extracting Products
- Firstly, open ParseHub on your PC, Mac or Linux system.
- Click “New Project” on the ParseHub application to start a new project.
- Enter the ThriftBooks URL you wish to scrape. We will be scraping all new and used bestsellers with this URL:
https://www.thriftbooks.com/browse/#b.s=bestsellers-desc&b.p=1&b.pp=30&b.nr
- Click the first book’s title to extract it. The rest should turn yellow.
- Click the next book’s title to train the algorithm. All 30 books on this page should now be extracted.
- Rename this selection on the left to “product”.
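The start URL used in this step carries its settings in the part after the `#`. As a small illustration, the Python sketch below splits that fragment into its key/value pairs. Reading `b.p` as the page number and `b.pp` as the number of books per page is an assumption based on the values in the URL (page 1, 30 books), not something ThriftBooks documents, and the helper name `fragment_params` is made up for this example.

```python
from urllib.parse import urlsplit, parse_qsl

# The start URL from the tutorial, copied as-is.
START_URL = (
    "https://www.thriftbooks.com/browse/"
    "#b.s=bestsellers-desc&b.p=1&b.pp=30&b.nr"
)


def fragment_params(url):
    """Return the key/value pairs stored after the '#' in the URL.

    keep_blank_values=True keeps keys such as 'b.nr' that have no value.
    """
    fragment = urlsplit(url).fragment
    return dict(parse_qsl(fragment, keep_blank_values=True))


params = fragment_params(START_URL)
for key, value in params.items():
    print(f"{key!r}: {value!r}")
```

If the assumption holds, the `b.pp` value lines up with the 30 books you should see selected on the first page.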
Step 2: Scraping Additional Details
- Begin by clicking the PLUS(+) button next to your product selection.
- Choose “Relative Select” and click the first book’s title.
- Now click the book’s author. An arrow should connect the title to the author.
- Rename this selection on the left to “author”.
- Repeat these steps for the price, and rename the selection on the left to “price”.
Note: to get even more book information, you will need to click into each book, create a new template, and scrape relative data with the Relative Select tool.
Step 3: Pagination
- To scrape from multiple pages, scroll all the way down until you see the page nav bar.
- Click the PLUS(+) button next to your page selection and choose “Select”.
- Click the next page button to extract it, and rename the selection to “pagination”.
- Expand the selection and delete the extraction.
- Click the PLUS(+) button next to your pagination selection and choose “Click”.
- Choose yes, as this is a next-page button.
- Finally, choose the number of additional pages you wish to scrape. Choosing zero will scrape every single page!
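To make the page-count setting concrete, here is a tiny Python model of the rule described above: the first page is always scraped, the number you enter is how many more pages follow, and zero means no limit. This is only an illustration of the behavior as this tutorial describes it, not ParseHub's actual implementation. The function name `pages_to_visit` and the `total_pages` parameter are hypothetical.

```python
def pages_to_visit(total_pages, extra_pages):
    """Return the page numbers a run would cover under this model.

    total_pages: how many result pages the site has (hypothetical input).
    extra_pages: the value entered in the last pagination step;
                 0 is treated as "keep clicking next until the last page".
    """
    if extra_pages == 0:
        last_page = total_pages
    else:
        # Page 1 plus the requested extra pages, capped at the last page.
        last_page = min(total_pages, 1 + extra_pages)
    return list(range(1, last_page + 1))


print(pages_to_visit(total_pages=10, extra_pages=2))
print(pages_to_visit(total_pages=10, extra_pages=0))
```

With 30 books per page, entering 2 would therefore cover three pages, or up to 90 books.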
Step 4: Begin Scraping
You are now ready to begin scraping on ParseHub’s servers!
First, click the green “Get Data” button on the left pane. You can Test, Run or Schedule your scrape. We chose to run the scrape a single time, but testing can be useful to parse through your project, and scheduling is great for monitoring prices and data.
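If you would rather start a run from a script than from the “Get Data” button, ParseHub also has a REST API. The Python sketch below builds the request that starts a run. The endpoint path and the `api_key` parameter reflect our understanding of the public v2 API, so please check them against the current API documentation before relying on them. `YOUR_API_KEY` and `YOUR_PROJECT_TOKEN` are placeholders, and the function names are our own.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of the ParseHub REST API (v2); verify in the API docs.
API_BASE = "https://www.parsehub.com/api/v2"


def run_project_request(api_key, project_token):
    """Build (but do not send) the POST request that starts a run."""
    url = f"{API_BASE}/projects/{project_token}/run"
    body = urllib.parse.urlencode({"api_key": api_key}).encode()
    return urllib.request.Request(url, data=body, method="POST")


def start_run(api_key, project_token):
    """Send the request and return the decoded JSON response."""
    request = run_project_request(api_key, project_token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example usage (makes a real network call, so it is left commented out):
# run = start_run("YOUR_API_KEY", "YOUR_PROJECT_TOKEN")
# print(run)
```

Keeping the request builder separate from the function that sends it lets you inspect the URL and body without touching the network.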
If you followed our steps, your data should look similar to this:
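If you export the results as JSON, the sketch below shows one way to load them in Python. The layout is an assumption: a top-level “product” list whose items carry “name”, “author”, and “price” keys, matching the selection names used in this tutorial. Your export may differ. The titles, authors, and prices in the sample are placeholders, not real ThriftBooks data.

```python
import json
from decimal import Decimal

# Placeholder export; the structure is assumed, the values are invented.
SAMPLE_EXPORT = """
{
  "product": [
    {"name": "Placeholder Title A", "author": "Placeholder Author A", "price": "$4.99"},
    {"name": "Placeholder Title B", "author": "Placeholder Author B", "price": "$12.50"}
  ]
}
"""


def load_books(raw_json):
    """Turn the exported JSON text into a list of simple book records."""
    data = json.loads(raw_json)
    books = []
    for item in data.get("product", []):
        price_text = item.get("price", "")
        books.append({
            "title": item.get("name"),
            "author": item.get("author"),
            # Strip the leading dollar sign; Decimal avoids float rounding.
            "price": Decimal(price_text.lstrip("$")) if price_text else None,
        })
    return books


books = load_books(SAMPLE_EXPORT)
for book in books:
    print(book["title"], "|", book["author"], "|", book["price"])
```

Prices are kept as `Decimal` values so they can be compared or summed without floating-point surprises. A real export may need extra cleanup, for example for thousands separators or missing fields.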
Need help scraping e-commerce websites?
Contact our live chat support!
Happy Scraping! 💻