How to Scrape Audible Audiobooks
Audible is an Amazon service that lets you listen to over 200,000 audiobooks, and it is one of the best-known audiobook services in the world. Audiobooks are a convenient way to enjoy your favourite fiction and non-fiction books: you can listen while driving, cooking, cleaning and even falling asleep! Audible offers many exclusive titles that you cannot get from other audiobook services, as well as sleep tracks, meditations and podcasts. In this blog post, we will scrape audiobook listings from Audible’s website, along with each title’s URL, author, price and rating.
In order to follow along with this tutorial and start scraping for free, download ParseHub.
Starting Your Scraping Project
- To begin, sign up and install ParseHub on your computer.
- Once opened, you can log in to your account and begin your new project.
- Click the “New Project” button to create your Audible scraping project.
- Enter the Audible landing page URL: https://www.audible.ca/?ref=a_search_t1_nav_header_logo&pf_rd_p=d39c273e-babe-486c-ac71-452efa62a1a6&pf_rd_r=QQQW3P2GJDRWPNTQ39H0
- Once the page loads, click the search bar at the top right to select it. For the input, enter any search term you like; we will use “money” to scrape audiobooks about making money.
- Click the PLUS(+) button next to the “page” element, choose “Select”, then click the magnifying glass icon.
- Expand this selection and remove the extracted element.
- Click the PLUS(+) icon on this selection and choose “Click”.
- Choose “No” on the popup as this is not a next page button.
- Finally, create a new template, and you will be taken to the search results page for “money” audiobooks!
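The landing page URL used in the steps above carries tracking query parameters (`ref`, `pf_rd_p`, `pf_rd_r`). As an optional aside, here is a small sketch using Python's standard library that lists those parameters and produces a shorter URL. It assumes the page loads the same way without them, which we have not verified, and the helper name `strip_query` is our own:

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit

# The landing page URL from the tutorial, split across lines for readability.
LANDING_URL = (
    "https://www.audible.ca/?ref=a_search_t1_nav_header_logo"
    "&pf_rd_p=d39c273e-babe-486c-ac71-452efa62a1a6"
    "&pf_rd_r=QQQW3P2GJDRWPNTQ39H0"
)

def strip_query(url):
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

# Names of the query parameters present in the tutorial's URL.
param_names = sorted(parse_qs(urlsplit(LANDING_URL).query))
clean_url = strip_query(LANDING_URL)

print(param_names)
print(clean_url)
```

If the shorter URL behaves differently in ParseHub, simply keep using the full URL from the steps.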
Extracting Audiobooks
- To begin your first selection, click the first audiobook’s title.
- The rest of the titles should turn yellow. Click the next audiobook’s title to train the ParseHub algorithm.
- All 20 audiobooks on the first page should now be selected. Rename the selection in the left pane to “audiobook”.
- In the preview below, you should now see each audiobook’s title and URL!
Extracting More Audiobook Data
To scrape additional information for each audiobook, such as the author, price and rating, we use the Relative Select command, which ties each attribute to its own audiobook title:
- First, click the PLUS(+) icon next to the audiobook selection from the last step.
- Choose “Relative Select” and click the first audiobook’s title.
- An arrow will now appear. Click the first author’s name.
- All the authors will now be extracted. Rename this selection in the left pane to “author”.
- Add another “Relative Select”, click the first audiobook’s title, and then click its price.
- Click the second title and then its price to train the algorithm. Rename this selection to “price” in the left pane.
- Repeat this for the ratings with another Relative Select command, and rename the selection to “rating”.
- To extract just the star ratings, click the extraction and expand it.
- Tick the “Use regex” box and enter this expression: (\d out of \d)
- Finally, you should have star ratings, such as “5 out of 5”!
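To see what the expression in the last steps captures, here is a small sketch that runs it with Python's `re` module. It assumes ParseHub's regex behaves like Python's for this simple pattern. The sample strings are made up, and `DECIMAL_PATTERN` is a hypothetical variant of ours, not something from ParseHub:

```python
import re

# The expression from the tutorial: one digit, " out of ", one digit.
TUTORIAL_PATTERN = re.compile(r"(\d out of \d)")

# Hypothetical variant that also accepts one decimal place, e.g. "4.5 out of 5".
DECIMAL_PATTERN = re.compile(r"(\d(?:\.\d)? out of \d)")

def extract_rating(text, pattern):
    """Return the first captured rating in text, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None

# Made-up sample strings, not real Audible data.
whole = "5 out of 5 stars 1,234 ratings"
fractional = "4.5 out of 5 stars 87 ratings"

print(extract_rating(whole, TUTORIAL_PATTERN))       # 5 out of 5
print(extract_rating(fractional, TUTORIAL_PATTERN))  # 5 out of 5 (the "4." is dropped)
print(extract_rating(fractional, DECIMAL_PATTERN))   # 4.5 out of 5
```

If the ratings you scrape include decimals, the tutorial's expression would keep only the digit right before “out of”, so a variant along the lines of the second pattern may be worth trying.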
Adding Pagination
Our project is almost ready to scrape! Right now your scrape would extract the first 20 audiobooks only. To scrape more audiobooks from the following pages, we need to add pagination:
- Click the PLUS(+) icon next to the “page” element and click “Select”.
- Scroll down to the page navigation and click the right arrow to select it.
- Rename this selection on the left to “pagination”.
- Expand the selection and delete the two “Extract” elements, as they would add unneeded columns to your data.
- Click the PLUS(+) button next to the pagination selection and choose “Click”.
- A popup will appear asking if this is a next page button. Click “Yes”.
- You can now choose how many extra pages you want ParseHub to scrape. We will choose 2, which means 3 pages in total.
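The page count in the last step is simple arithmetic: the starting page plus one page per extra click. A minimal sketch follows, assuming every results page is full and holds 20 audiobooks like the first page did. The function names are ours, for illustration only:

```python
RESULTS_PER_PAGE = 20  # the first results page in this tutorial listed 20 audiobooks

def pages_scraped(extra_pages):
    """Total pages visited: the starting page plus each next-page click."""
    if extra_pages < 0:
        raise ValueError("extra_pages must be zero or greater")
    return 1 + extra_pages

def max_rows(extra_pages, per_page=RESULTS_PER_PAGE):
    """Upper bound on scraped rows, assuming every page is full."""
    return pages_scraped(extra_pages) * per_page

print(pages_scraped(2))  # 3
print(max_rows(2))       # 60
```

With the setting of 2 used here, you can expect at most 60 audiobooks in your results.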
Bypassing Blocks
At the time of this blog post, scraping the 3 pages from Audible’s website did not require IP Rotation. If you are scraping a lot of pages, you might need it. Note that IP Rotation is a paid feature of ParseHub. It lets ParseHub switch through its dedicated proxies and user agents to bypass blocks on websites. To turn it on, click the settings cog at the top left of the screen, click “Settings”, and tick the “Rotate IP Addresses” checkbox.
Starting The Scrape
Now that you have your selections and pagination set up, plus IP Rotation if you need it, you are ready to begin scraping!
To begin your scrape, click the green “Get Data” button on the left pane. You can now Test, Run or Schedule your scrape. Click “Run” to run the scrape a single time. The scrape will now begin, and you will be notified when it is over. Finally, you can download the data as a CSV or JSON file, or retrieve it through ParseHub’s API!
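Once you have downloaded the data, you may want to tidy it up yourself. The sketch below assumes the JSON export has a top-level key named after the “audiobook” selection, holding one object per title with `name`, `url`, `author`, `price` and `rating` fields. That shape is our assumption and the sample values are made up, so check it against your own file:

```python
import csv
import io
import json

# Made-up sample in the export shape we assume; not real Audible data.
SAMPLE_EXPORT = """
{
  "audiobook": [
    {"name": "Example Title One", "url": "https://example.com/one",
     "author": "By: A. Author", "price": "$19.99", "rating": "5 out of 5"},
    {"name": "Example Title Two", "url": "https://example.com/two",
     "author": "By: B. Writer", "price": "$24.50"}
  ]
}
"""

FIELDS = ["name", "url", "author", "price", "rating"]

def load_rows(raw_json):
    """Parse the export and return one dict per audiobook with every field present."""
    data = json.loads(raw_json)
    rows = []
    for item in data.get("audiobook", []):
        # Missing values (for example an unrated title) become empty strings.
        rows.append({field: item.get(field, "") for field in FIELDS})
    return rows

def to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

rows = load_rows(SAMPLE_EXPORT)
print(len(rows))
print(to_csv(rows))
```

To use it on a real export, read the downloaded file's contents into a string and pass that to `load_rows` in place of `SAMPLE_EXPORT`.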
If you followed our steps correctly, your export should contain one row per audiobook, with its title, URL, author, price and rating.
Congratulations, you just finished our tutorial! If you followed along you should now be a master at scraping Audible’s website!
Are you interested in scraping Amazon as well? You can scrape Amazon products with ParseHub, which is very similar to scraping Audible…
If you run into any technical issues with ParseHub or require any further assistance, feel free to send us a live chat message on our website. Our support team will be more than happy to help you out!
Happy Scraping! 📊