Scraping JavaScript content can be quite a challenge, mostly because many web scrapers cannot effectively load, browse or scrape dynamic JavaScript content on the web.
However, there are now free web scrapers that can easily extract data from JavaScript websites into a CSV or JSON file.
A Free and Powerful Web Scraper
For this project, we will use ParseHub, a free and powerful web scraper that can extract data from any website.
We will extract data from Amazon, using ParseHub to interact with the search bar, perform a search and scrape the content loaded dynamically on the search results page.
Make sure to download and install ParseHub for free before getting started.
Browsing and Scraping JavaScript Content
Now, let’s get started with our project.
- Install and open ParseHub. Click on “New Project” and enter the URL you will be scraping data from. In this case, we will scrape data from Amazon.com. The page will then render inside the app.
- A Select command will be created by default. Let’s use it to select and interact with a JavaScript element on the page, the search bar. Start by clicking on the search bar.
- You will now be able to create an input to enter text in the search bar. For this example, we will use the term “laptop”. You will notice that the term is also filled in inside the page and recommendations pop up, showing that ParseHub is successfully interacting with a JavaScript element.
- Go back to your first selection and rename it “search_bar”.
- Now let’s set ParseHub to click on the search button and load the search results page.
- Click on the PLUS(+) sign next to your “page” selection and choose the “Select” command.
- With the Select command, click on the search button to select it. It will be highlighted in green to indicate that it has been selected. Rename your selection to “button”.
- Now click on the PLUS(+) sign next to the “button” selection and choose the click command.
- A pop-up will appear asking you if this is a “next page” button. Click on “No”, rename your template to “results_template” and click on the “Create New Template” button. The search results page will load inside the app.
Want to set up ParseHub to search through a list of keywords? Check out our guide on how to enter a list of keywords into a search box.
Extracting Data from a Search Results Page
Let’s now set up ParseHub to extract data from the Amazon search results page.
- With the Select command created by default, click on the name of the first non-sponsored product on the page. It will be highlighted in green to indicate that it has been selected.
- Now click on the second product name on the page to select them all. They will now all be highlighted in green. Rename your selection to “product”.
- ParseHub is now pulling the name and URL for each listing on the page.
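To picture what this selection produces, here is a minimal Python sketch. It assumes the JSON export groups listings under a top-level key named after the selection (“product”), with each entry holding a “name” and a “url” field. That shape, the sample values and the helper name parse_products are assumptions for illustration, so check them against your own export.

```python
import json

# Hypothetical sample mirroring the assumed export shape: one top-level key
# per selection ("product"), each entry holding a name and a URL.
SAMPLE_EXPORT = """
{
  "product": [
    {"name": "Example Laptop A", "url": "https://www.amazon.com/dp/EXAMPLE1"},
    {"name": "Example Laptop B", "url": "https://www.amazon.com/dp/EXAMPLE2"}
  ]
}
"""


def parse_products(json_text, selection="product"):
    """Return (name, url) pairs for every listing under the given selection."""
    data = json.loads(json_text)
    pairs = []
    for item in data.get(selection, []):
        # Skip entries missing either field rather than failing on them.
        if "name" in item and "url" in item:
            pairs.append((item["name"], item["url"]))
    return pairs


if __name__ == "__main__":
    for name, url in parse_products(SAMPLE_EXPORT):
        print(f"{name}: {url}")
```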
Want to learn how to scrape even more data from Amazon, such as pricing and product details? Check out our guide on how to extract product data from Amazon.
Closing Thoughts
To extract the data you have selected, click on the green Get Data button in the left sidebar.
Here you will be able to test, schedule or run your scrape job.
In this case, we will run it right away. Once your scrape is complete you will be able to download it as a CSV or JSON file.
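Once the file is downloaded, you may want to tidy it up before using it. The sketch below reads a JSON export, drops listings that share a URL and writes the rest to a CSV file. It assumes the export has a top-level “product” list whose entries carry “name” and “url” fields, and the function name and file names are hypothetical, so adjust them to match your own download.

```python
import csv
import json
from pathlib import Path


def export_unique_products(json_path, csv_path, selection="product"):
    """Read a downloaded JSON export, drop duplicate URLs, and write a CSV.

    Returns the number of rows written.
    """
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    seen = set()
    rows = []
    for item in data.get(selection, []):
        url = item.get("url")
        # Ignore entries without a URL and any URL already collected.
        if url is None or url in seen:
            continue
        seen.add(url)
        rows.append({"name": item.get("name", ""), "url": url})

    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["name", "url"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

For example, export_unique_products("run_results.json", "products.csv") would write one row per unique listing, where both file names are placeholders for your own paths.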
You now know how to interact with and extract JavaScript content from the web.
What website will you scrape first?