Web scraping Real Estate Data - How to scrape house prices
As many cities continue to develop homes and communities, the demand for real estate continues to grow!
Many real estate websites are useful for buyers, sellers and real estate agents alike. These websites give you valuable data like prices, images, area, rooms and bathrooms. You can also use them to find commercial properties.
The great thing about these websites is that they carry MLS (Multiple Listing Service) listings, meaning you get access to many of the available real estate listings in one place.
However, trying to find and extract real estate data manually can be a long and tedious process.
Today, we’ll show you how to use a real estate scraper like ParseHub. We’ll teach you how to scrape a real estate website to extract useful information you can use for:
- Price comparison
- Property lists for clients
- Industry insights
Before we show you the steps, do note that some real estate websites have blockers that prevent you from scraping their website. Our customer support team will be more than happy to help you with any of your web scraping projects.
So let’s get started!
Choosing a real estate scraper
To get started, you will need to download and install our free real estate scraper. While there are several web scrapers available, we think you'll enjoy ParseHub. It's free to use and comes with a suite of features like scheduling and IP rotation.
Web scraping a real estate website’s data
For this example, we are going to scrape Royal LePage. We are going to scrape residential properties that meet the following criteria:
- For sale in Calgary
- Price range of $400,000 to $700,000
- Located in the SW quadrant
You can use this link if you want to follow along.
Scraping the Real Estate Results Page
1. Once ParseHub is downloaded and installed, open the app, click on “New Project” and use the URL from the Royal LePage results page. The page will now be rendered inside the app.
2. Once the website is rendered, a selection function will automatically be created. If not, you can click on the plus sign next to the page selection.
3. Click on the first address listing on the page. The address you’ve clicked will become green to indicate that it’s been selected.
4. ParseHub will now suggest the other elements you want to extract. The remaining addresses on the page will be highlighted in yellow. Click on the second address on the list. All of the items that were previously highlighted in yellow are now green because they are selected.
5. On the left sidebar, rename your selection to “Address”. You will notice that ParseHub is now extracting the address and URL for each listing.
6. On the left sidebar, click the PLUS(+) sign next to the address selection and choose the Relative Select command.
7. Using the Relative Select command, click on the first address on the page and then on its price. You will see an arrow connecting the two selections.
8. Expand the new command you’ve created and then delete the URL that is also being extracted by default.
9. Repeat step 7 to also extract the number of rooms, property type and city. Make sure to rename your new selections accordingly.
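To make the result of these selections more concrete, here is a minimal Python sketch of how one scraped listing might be represented once exported. The field names (address, url, price, rooms, property_type, city) mirror the selections above but are assumptions, and the sample record is made up; the actual keys in your export depend on how you name your selections. The helper turns a price string into a number so it can be checked against the $400,000 to $700,000 range used in this search.

```python
import re
from dataclasses import dataclass


@dataclass
class Listing:
    # Hypothetical field names that mirror the selections made above.
    address: str
    url: str
    price: str
    rooms: str
    property_type: str
    city: str


def parse_price(text: str) -> int:
    """Drop everything except digits, e.g. "$549,900" -> 549900.

    Assumes whole-dollar prices; a value with cents would need extra handling.
    """
    digits = re.sub(r"[^\d]", "", text)
    if not digits:
        raise ValueError(f"no digits found in price: {text!r}")
    return int(digits)


def in_range(listing: Listing, low: int = 400_000, high: int = 700_000) -> bool:
    """Check that a listing falls inside the price range used in this search."""
    return low <= parse_price(listing.price) <= high


# Made-up sample record, for illustration only.
sample = Listing(
    address="123 Example St SW",
    url="https://example.com/listing/123",
    price="$549,900",
    rooms="3",
    property_type="House",
    city="Calgary",
)
print(parse_price(sample.price), in_range(sample))
```

Keeping the price as scraped text and converting it only when needed means the original value is never lost if a listing uses an unexpected format.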
We have now selected all the data we wanted to scrape from the results page. Your project should now look like this:
Scraping more data from each real estate listing
Now, we will tell ParseHub to click on each listing we’ve selected and extract additional data from each page. In this case, we will extract:
- Images
- Property description
- Property information
First, on the left sidebar, click on the three dots next to the main_template text.
Rename your template to “listing_results_page” or anything you see fit. Templates help ParseHub keep different page layouts separate, and will help you organize your project.
1. Now use the PLUS (+) button next to your “Address” selection and choose the “Click” command. A pop-up will appear asking you if this link is a “next page” button. Click “No”, then enter a new template name next to “Create New Template”. In this case, we will use listing_page.
2. ParseHub will now automatically create this new template and render the first property listing on the results page.
3. Click on one of the images of the property. It will be highlighted in green and all the other suggested images will be highlighted in yellow. Click on the next image in yellow to extract the images.
Note: For this example, we are only scraping the first few images that are part of the carousel. You can learn how to scrape all of the images from a carousel here.
4. Click on the PLUS (+) sign next to the page command and choose the Select command.
5. While using the Select command, click on the property information description. These descriptions usually contain keywords that buyers search for when looking for a place, e.g. “open concept”.
6. Rename your selection to “property_information” or anything you see fit.
7. Now let’s extract the building features! Click on the PLUS (+) sign next to the page command and choose the Select command, then click on one of the labels under “building features”. Once you’ve selected a label, click on the next label that is highlighted in yellow.
8. Rename your selection to “building_features”.
9. Click on the PLUS (+) sign next to your “building_features” selection and choose the Relative Select command.
10. Click on the first label, then click on the feature next to it. You may need to do this a couple of times to teach ParseHub what you want to extract.
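Since each building feature ends up as a label paired with a value, it can help to collapse those pairs into one lookup per listing after export. The sketch below assumes each pair arrives as a small dictionary with a "name" key for the label and a "feature" key for the value; both key names and the sample rows are hypothetical and will depend on how you named your selections.

```python
def features_to_dict(rows, label_key="name", value_key="feature"):
    """Collapse a list of label/value rows into a single dictionary."""
    result = {}
    for row in rows:
        # Tidy the label: trim whitespace and any trailing colon.
        label = (row.get(label_key) or "").strip().rstrip(":")
        if not label:
            continue  # skip rows where no label was captured
        result[label] = (row.get(value_key) or "").strip()
    return result


# Made-up rows, for illustration only.
rows = [
    {"name": "Heating:", "feature": "Forced air"},
    {"name": "Parking", "feature": " Attached garage "},
    {"name": "", "feature": "value without a label"},
]
print(features_to_dict(rows))
```

A dictionary makes it easy to look up a single feature per property, for example when comparing parking or heating across listings.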
Your listing_page template should look like this:
Adding Pagination (optional)
We can add pagination to this project depending on how many listings you want to scrape. Since this current project has 2 results pages, let’s show you how you can deal with pagination.
Let’s set up ParseHub to navigate to the next results pages.
1. On the left sidebar, return to the listing_results_page template. You might also need to change the browser tab to the search results page.
2. Click on the PLUS (+) sign next to the page selection and choose the Select command.
3. Then select the Next page link at the bottom of the Royal LePage website. Rename the selection to next_button.
4. By default, ParseHub will extract the text and URL from this link, so expand your new next_button selection and remove these 2 commands.
5. Now, click on the PLUS (+) sign of your next_button selection and use the Click command.
6. A pop-up will appear asking if this is a “Next” link. Click Yes and enter the number of pages you’d like to navigate to. In this case, we will scrape 1 additional page.
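The logic behind this setting can be pictured as a simple loop: scrape the current page, follow the Next link, and stop once the chosen number of additional pages has been visited. The Python sketch below only illustrates that idea on made-up, in-memory pages; it is not how ParseHub is implemented, and the page structure and function name are assumptions.

```python
def crawl(pages, start, max_extra_pages):
    """Collect listings from the start page plus up to max_extra_pages more."""
    collected = []
    current = start
    followed = 0  # how many times the Next link has been followed
    while current is not None:
        page = pages[current]
        collected.extend(page["listings"])
        if followed >= max_extra_pages:
            break  # page limit reached, stop following Next
        current = page.get("next")  # None when there is no Next link
        followed += 1
    return collected


# Made-up results pages, for illustration only.
pages = {
    "page1": {"listings": ["listing A", "listing B"], "next": "page2"},
    "page2": {"listings": ["listing C"], "next": "page3"},
    "page3": {"listings": ["listing D"], "next": None},
}
print(crawl(pages, "page1", max_extra_pages=1))
```

With a limit of 1, the loop scrapes the first page and one more, which matches the single additional page chosen in this example.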
Your final project should look something like this:
Running and Exporting your Real Estate Scraping Project
Now our project is ready to scrape Royal LePage. To do this, click on the green “Get Data” button on the left sidebar.
You’ll be brought to this page:
This is where you can test, run or schedule your project. For longer and bigger projects, we recommend doing a Test Run just to make sure your data will be extracted and formatted correctly.
But for this project, click on the “Run” button to begin your scrape.
Once ParseHub is done scraping the website, you will be notified by email and you’ll be able to download your extracted data as an Excel/CSV or as a JSON file.
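If you download the JSON file, a few lines of Python are enough to turn it into a trimmed-down CSV for sharing. This is a minimal sketch under some assumptions: that the export is an object whose "Address" key holds a list of listing records, and that each record has "name", "url" and "price" fields. Check your own export for the real key names, since they follow the names you gave your selections. The sample data is made up.

```python
import csv
import io
import json


def load_listings(raw_json, list_key="Address"):
    """Return the list of listing records stored under list_key."""
    data = json.loads(raw_json)
    return data.get(list_key, [])


def to_csv(rows, fields):
    """Write only the chosen fields of each record to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    for row in rows:
        # Missing fields become empty cells instead of raising an error.
        writer.writerow({field: row.get(field, "") for field in fields})
    return buffer.getvalue()


# Made-up export, for illustration only.
RAW = """
{
  "Address": [
    {"name": "123 Example St SW", "url": "https://example.com/1", "price": "$549,900"},
    {"name": "456 Sample Ave SW", "url": "https://example.com/2"}
  ]
}
"""

listings = load_listings(RAW)
print(to_csv(listings, ["name", "price"]))
```

To work with a real file, you could read its contents with `open(path, encoding="utf-8").read()` and pass the result to `load_listings`.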
Closing thoughts
Now you know how to scrape a real estate website like Royal LePage to create a list of properties for sale.
This list can be used for price comparison, shared with clients, or mined for industry insights.
Please note that some real estate websites will block web scrapers from extracting data. If that happens, our IP rotation feature can help.
If you need help with any of your projects, you can reach our customer support team through our live chat or our contact page. They will be more than happy to help!
Happy Scraping!