The data on a website might sometimes be presented in an inconvenient way.
You might want to extract the info on a website as an Excel spreadsheet.
This way, you can actually use the data and realize its full value.
In either case, a web scraping tool can turn a website into a simple spreadsheet. In this article, we’ll show you how to set up a free web scraper and quickly extract the data you need.
ParseHub: A Powerful and Free Web Scraper
To achieve our goal for this project, we will use ParseHub, a free and powerful web scraper that can turn any website into a handy spreadsheet or API for you to use.
Wondering what a web scraper is and how it works? Read our definitive guide on web scraping.
For this guide, we will only focus on the spreadsheet side of things.
So, before we get started, make sure to download and install ParseHub for free.
Example Web Scraping Project
For the sake of example, let’s assume we own an imaginary company that sells napkins, paper plates, plastic utensils, straws, and other consumable restaurant items (all of them fully recyclable, since our imaginary company is ahead of the competition).
For a company like this, an Excel spreadsheet with the contact information of every fast food restaurant in town would be incredibly valuable and a great way to build a leads database.
So, we will extract the data from a Yelp search result page into an Excel spreadsheet.
- Make sure you’ve downloaded ParseHub and have it up and running on your computer.
- Find the specific webpage(s) you’d like to scrape. In this example, we will use Yelp’s result page for Fast Food restaurants in Toronto.
Create a Project
- In ParseHub, click on “New Project” and enter the URL to scrape.
- Once submitted, the URL will load inside ParseHub and you will be able to start selecting the information you want to extract.
Identify and Select Data to Scrape
- Let’s start by selecting the business name of the first result on the page. Do this by clicking on it. It will then turn green.
- You will notice that all the business names on the page will turn yellow. Click on the next one to select all of them.
- You will notice that ParseHub is now set to extract the business name for every result on the page plus the URL it is linking to. All business names will now also be green.
- On the left sidebar, click on the selection you’ve just created and rename it to business.
- Then click on the PLUS(+) sign on the selection and choose Relative Select. This will allow us to extract more data, such as the address and phone number of each business.
- Using Relative Select, click on the first business name and then on the phone number next to it. Rename this Relative Select to phone.
- Using Relative Select again, do the same for the business address. Rename this Relative Select to address. We’ll do the same for the business category.
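To make the structure of this selection concrete, here is a minimal Python sketch of the rows it should yield: one record per business, with each Relative Select becoming a column. The column names mirror the selection names used above, but the exact headers in a real export may differ. The sample values and the `to_csv` helper are made up for illustration and are not ParseHub output.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class Business:
    # One row per "business" selection; each Relative Select adds a column.
    # Column names are assumptions based on the selection names in this guide.
    business: str
    business_url: str
    phone: str
    address: str
    category: str


def to_csv(rows):
    """Serialize the records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Business)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


# Placeholder values, not real scraped data.
sample_rows = [
    Business("Example Burger", "https://example.com/biz/example-burger",
             "(416) 555-0100", "1 Example St", "Fast Food"),
    Business("Sample Fries", "https://example.com/biz/sample-fries",
             "(416) 555-0101", "2 Sample Ave", "Fast Food"),
]

csv_text = to_csv(sample_rows)
print(csv_text)
```

Thinking of the project as "one row per business, one column per selection" makes it easier to spot a Relative Select that was attached to the wrong parent.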
Note that this setup only captures the first page of search results. We will now tell ParseHub to scrape the next 5 pages of results as well.
- Click on the PLUS(+) sign next to the “Select Page” item, choose the Select command and select the “Next” link at the bottom of the page.
- Rename this selection to Pagination.
- ParseHub will automatically pull the URL for this link into the spreadsheet. In this case, we will remove these URLs since we do not need them. Click on the icon next to the selection name and delete the 2 extract commands.
- Now, click on the PLUS(+) sign next to your Pagination selection and use the click command.
- A window will pop up asking if this is a Next Page link. Click “Yes” and enter the number of times you’d like this cycle to repeat. For this example, we will do it 5 times. Then, click on Repeat Current Template.
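Conceptually, the pagination commands above describe a loop: scrape the current page, follow the “Next” link, and run the same template again until the click limit is reached or no “Next” link remains. The Python sketch below models that logic on an in-memory stand-in for the site. `FAKE_SITE`, `scrape_with_pagination`, and the page keys are hypothetical names for illustration and do not reflect how ParseHub works internally.

```python
# Hypothetical stand-in for a paginated results site with 8 pages:
# each page has some rows and the key of the next page (or None on the last).
FAKE_SITE = {
    f"page{i}": {
        "rows": [f"business {i}-{j}" for j in range(1, 4)],
        "next": f"page{i + 1}" if i < 8 else None,
    }
    for i in range(1, 9)
}


def scrape_with_pagination(site, start, max_clicks):
    """Collect rows from the start page, then follow "Next" up to max_clicks times."""
    rows = []
    current = start
    clicks = 0
    while current is not None:
        page = site[current]
        rows.extend(page["rows"])   # run the same template on this page
        if clicks == max_clicks:    # click limit reached, stop here
            break
        current = page["next"]      # "click" the Next link (None if there is none)
        clicks += 1
    return rows


results = scrape_with_pagination(FAKE_SITE, "page1", max_clicks=5)
print(len(results))
```

With a limit of 5 clicks, the loop covers the first page plus up to five more, and it stops early if the site runs out of pages.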
Scrape and Export Data
Now that you are all set up, it’s time to actually scrape the data and extract it.
- Click on the green Get Data button on the left sidebar.
- Here you can test your scrape run, schedule it for the future or run it right away. In this case, we will run it right away, although we recommend always testing your scrapes before running them.
- Now ParseHub is off to scrape all the data you’ve selected. You can either wait on this screen or leave ParseHub; you will be notified once your scrape is complete. In this case, our scrape was completed in under 2 minutes!
- Once your data is ready to download, click on the CSV/Excel button. Now you can save and rename your file.
Depending on the website you are scraping data from, your CSV file might not display correctly in Excel. In our case, apostrophes were not formatted correctly in the sheet.
If you run into these issues, you can quickly solve them by using the import feature in Excel.
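One common cause of garbled apostrophes is a CSV saved as UTF-8 without a byte order mark (BOM), which can lead Excel to guess a different encoding when the file is opened directly. As an alternative to the import feature, a small Python sketch like the one below rewrites the file with a BOM. It assumes the export is UTF-8 encoded; the `add_utf8_bom` helper and the file names are placeholders for illustration.

```python
import codecs
import tempfile
from pathlib import Path


def add_utf8_bom(src, dst):
    """Copy a UTF-8 CSV to dst, prefixing a byte order mark if it lacks one."""
    data = Path(src).read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        data = codecs.BOM_UTF8 + data   # the BOM signals UTF-8 to Excel
    Path(dst).write_bytes(data)


# Demo on a throwaway file; swap in the path of your own export.
workdir = Path(tempfile.mkdtemp())
original = workdir / "results.csv"
fixed = workdir / "results_excel.csv"
original.write_text("business\nJoe\u2019s Burgers\n", encoding="utf-8")

add_utf8_bom(original, fixed)
print(fixed.read_bytes().startswith(codecs.BOM_UTF8))
```

Working on raw bytes leaves the rows and line endings untouched, so the only difference between the two files is the three-byte prefix.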
Turning a Website into an Excel Spreadsheet
And that is all there is to it.
You can now use the power of web scraping to collect info from any website just like we did in this example.
Will you use it to generate more business leads? Or maybe to scrape competitor pricing info? Or maybe you can use it to power up your next Fantasy Football bracket.