Search engines surface a wealth of valuable information, from news to relevant websites. You can extract this data to help you write up-to-date articles and industry analyses, do keyword research and find SEO opportunities.
Constantly browsing these search engines to find what you’re interested in can be a tedious process.
We are ParseHub, and today we’ll show you how you can scrape a search engine results page like Bing to collect data on multiple websites and pages.
Getting Started
With a web scraper like ParseHub, we will be able to scrape the websites that target a keyword. For each result, we will extract the page title, meta description and URL.
Make sure to download and install ParseHub for free before you get started.
Now let’s begin!
Web scraping a search engine like Bing
For this project, we are going to scrape the web pages that target the term “data science”.
How to scrape search engine data
- Download and install ParseHub. Click on the New Project button and submit the URL of the results page into the text box. The website will now render inside the app.
- A select command will automatically be created. While using the select command, click on the first organic title (not an ad) that is on the results page. You should notice the headline you selected will be in green. ParseHub will now highlight in yellow the other elements it suggests you extract.
- Click on the next headline that is in yellow to select them all. You may need to do this 2-3 times to teach ParseHub what to extract. The rest of the page titles will now be highlighted in green.
- On the left sidebar, rename your headline selection to something more appropriate. We’re going to name it “page_title”.
- Click on the PLUS (+) sign next to your page title and choose the relative select command.
- Click on the first page title that is highlighted in orange, then click on the description below it. An arrow will appear showing the association you have created. You may need to repeat this step to fully train the web scraper. Rename your selection to “description”.
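To make the idea behind these steps concrete, here is a minimal Python sketch of the same kind of extraction: one record per result, with a title, its link and the description associated with it. It is not how ParseHub works internally, and the HTML snippet, the "result" class and the ResultParser and extract_results names are hypothetical stand-ins rather than Bing's real markup.

```python
from html.parser import HTMLParser

# Hypothetical, simplified markup. A real results page is far more complex.
SAMPLE_HTML = """
<ol>
  <li class="result">
    <h2><a href="https://example.com/intro">Intro to Data Science</a></h2>
    <p>A beginner-friendly overview of data science.</p>
  </li>
  <li class="result">
    <h2><a href="https://example.org/careers">Data Science Careers</a></h2>
    <p>What data scientists do and how to become one.</p>
  </li>
</ol>
"""


class ResultParser(HTMLParser):
    """Collects one record per <li class="result"> block."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._current = None  # record currently being filled in
        self._field = None    # field that incoming text belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "result":
            self._current = {"page_title": "", "url": "", "description": ""}
        elif self._current is not None and tag == "a":
            # The link gives us both the URL and the title text.
            self._current["url"] = attrs.get("href", "")
            self._field = "page_title"
        elif self._current is not None and tag == "p":
            # The paragraph under the title plays the role of "description".
            self._field = "description"

    def handle_data(self, data):
        if self._current is not None and self._field:
            self._current[self._field] += data

    def handle_endtag(self, tag):
        if tag in ("a", "p"):
            self._field = None
        elif tag == "li" and self._current is not None:
            self.results.append(self._current)
            self._current = None


def extract_results(html):
    parser = ResultParser()
    parser.feed(html)
    return parser.results


results = extract_results(SAMPLE_HTML)
for row in results:
    print(row["page_title"], "|", row["url"])
```

Keeping the description inside the same record as its title mirrors what the relative select command does: each description stays attached to the headline it belongs to.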
Adding pagination
If we were to run our project now, we would only extract the results on the first page. We will now show you how to add pagination to your web scraping project.
- Click the PLUS (+) sign next to your page selection and choose the “Select” command.
- Using the Select command, scroll all the way down to the Next Page link. Click on it to select it and rename your selection to next_button.
- Click on the icon next to your next_button selection to expand it.
- Delete the two commands under the next_button selection.
- Click on the PLUS (+) sign next to your next_button selection and add a Click command.
- A pop-up will appear asking you if this is a “next page” link. Click on Yes and enter the number of times you’d like to repeat this process. In this case, we will repeat it 4 times.
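The pagination logic above can be sketched as a simple loop: scrape the current page, follow the next link, and stop after a set number of clicks or when there is no next page. This is only an illustration. FAKE_PAGES, fetch_page and scrape_with_pagination are hypothetical names, the "pages" live in memory rather than on a real site, and the sketch assumes that 4 clicks means 5 pages are scraped in total.

```python
# Hypothetical stand-in for a results site: six in-memory pages, each with
# two titles and a pointer to the next page (None on the last page).
FAKE_PAGES = {
    n: {
        "titles": [f"Result {n}-{i}" for i in (1, 2)],
        "next": n + 1 if n < 6 else None,
    }
    for n in range(1, 7)
}


def fetch_page(key):
    """Pretend to load a page. A real scraper would request and parse HTML."""
    return FAKE_PAGES[key]


def scrape_with_pagination(start, max_clicks):
    """Collect titles from the start page, clicking "next" up to max_clicks times."""
    titles = []
    key = start
    clicks = 0
    while True:
        page = fetch_page(key)
        titles.extend(page["titles"])
        # Stop when there is no next page or the click budget is used up.
        if page["next"] is None or clicks >= max_clicks:
            break
        key = page["next"]
        clicks += 1
    return titles


titles = scrape_with_pagination(1, max_clicks=4)
print(len(titles), "titles collected")
```

Capping the number of clicks keeps the run bounded even when the site has many more result pages than you need.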
Running your Scrape
It is now time to run your scrape. To do this, click on the green Get Data button on the left sidebar. Here you will be able to test, schedule, or run your scrape job.
For larger projects, we recommend that you always test your job before running it. In this case, we will run it right away.
Once your run is completed, you will be able to download it as an Excel or JSON file.
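If you choose the JSON download, a few lines of Python are enough to turn it into rows you can analyze. The sketch below assumes the export is a JSON object with a "page_title" list whose items carry "name", "url" and "description" keys. That layout, the sample data and the helper names are assumptions for illustration, so check your own file and adjust the keys to match.

```python
import csv
import io
import json

# Assumed shape of the export. Inspect your own file before relying on it.
SAMPLE_EXPORT = """
{
  "page_title": [
    {"name": "Intro to Data Science",
     "url": "https://example.com/intro",
     "description": "A beginner-friendly overview."},
    {"name": "Data Science Careers",
     "url": "https://example.org/careers"}
  ]
}
"""


def export_to_rows(export_text):
    """Flatten the assumed export layout into a list of uniform dicts."""
    data = json.loads(export_text)
    rows = []
    for item in data.get("page_title", []):
        rows.append({
            "page_title": item.get("name", ""),
            "url": item.get("url", ""),
            # Some results may lack a description, so default to empty.
            "description": item.get("description", ""),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["page_title", "url", "description"]
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = export_to_rows(SAMPLE_EXPORT)
csv_text = rows_to_csv(rows)
print(csv_text)
```

To work with a real download, read the file's contents into a string and pass that string to export_to_rows in place of SAMPLE_EXPORT.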
Closing Thoughts
You now know how to scrape a search engine results page like Bing. The great thing about ParseHub is that you can schedule your project to run every hour, day or week, depending on what you need. This way you always have the latest results and can see how they change over time, for example after an algorithm update.
If you run into any issues during this project, reach out to us via the live chat on our site and we will be happy to assist you with your project.
Happy Scraping!