You can find almost anything on Craigslist, from your next apartment to a missed connection on the subway.
There are so many listings on Craigslist that it can be hard to sift through them all and compare them efficiently.
Wouldn’t it be convenient if you could extract all the info from a set of listings into a spreadsheet?
Craigslist and Web Scraping
With web scraping, we can easily extract all the info we want from Craigslist listings. In this case, we will scrape the rental housing results for Toronto. This information can be used for apartment hunting or for analyzing the current rental market.
We will also use ParseHub, a powerful and free web scraper that can easily deal with websites like Craigslist.
- First, make sure to download and open ParseHub.
- In ParseHub, click on New Project and submit the search results page we will be scraping. The webpage will render inside the app and you will be able to start selecting data to export.
Scraping Craigslist Data
- Once the webpage you’ve submitted renders, click on the title of the first listing on the page. It will be highlighted in green to show that it has been selected.
- The rest of the titles on the screen will be highlighted in yellow. Click on the second one to select them all. They will all now be selected and highlighted in green.
- In the left sidebar, rename your selection to listing. ParseHub is now extracting the listing title and its URL.
- Next to the listing selection, use the PLUS(+) sign and choose the Relative Select command. Use this command to click on the title of the first listing and then on its price. An arrow will appear to highlight the selection. Rename your selection to price.
- Use the icon next to the price selection to expand your selection and remove the price_url extraction, since it’s pulling the listing URL again.
- Repeat the Relative Select step to also select each listing’s bedroom info and location. Rename your selections accordingly.
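To make the idea behind Relative Select more concrete, here is a minimal Python sketch of the same pattern: each field is looked up relative to its own listing, so a listing with no price still lines up with the right title. The markup, class names, and field names are made up for illustration and are not Craigslist’s actual HTML or ParseHub’s internals.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup. Real search result pages are more complex.
SAMPLE = """
<ul>
  <li><a href="https://example.org/1">Bright 1BR near subway</a><span class="price">$1,850</span><span class="hood">Downtown</span></li>
  <li><a href="https://example.org/2">Basement apartment</a><span class="hood">North York</span></li>
</ul>
"""

def extract_listings(markup):
    """Return one record per listing, reading each field relative to that listing."""
    root = ET.fromstring(markup.strip())
    rows = []
    for item in root.iter("li"):
        link = item.find("a")
        price = item.find("span[@class='price']")
        hood = item.find("span[@class='hood']")
        rows.append({
            "name": link.text,
            "url": link.get("href"),
            # A missing element becomes None instead of shifting other rows.
            "price": price.text if price is not None else None,
            "location": hood.text if hood is not None else None,
        })
    return rows

rows = extract_listings(SAMPLE)
for row in rows:
    print(row)
```

Selecting all prices on the page independently of their listings would lose this pairing whenever a listing has no price, which is why the price is tied to the listing selection.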
Scraping Craigslist Listing Pages
Now we will tell ParseHub to click on each listing on the page and extract additional data from each listing.
- First, click on the PLUS(+) sign next to your listing selection and choose the click command.
- A pop-up will appear asking if this is a “next page” button. Click no and choose Create New Template. Name your new template listing_template.
- The first listing will automatically open, and you will be able to make your first selection.
- We will start by selecting the title of the listing. Rename your selection to title.
- Use the PLUS(+) sign next to the title selection and use Relative Select to make a new selection to extract.
- In this case, we’ve made selections for the listing details and date.
- For the date extraction, you will notice that the extracted information shows times as relative timestamps (e.g., 2 hours ago). To improve this extraction, expand your date selection and click on the “extract date” command. In the Extract dropdown, choose “title Attribute”. ParseHub will now extract the full time and date of publication.
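The difference between the visible text and the title attribute can be sketched in Python. This is only an illustration: the sample markup and the date format are assumptions, and the real page may store the timestamp differently.

```python
from datetime import datetime
from html.parser import HTMLParser

class TimeTitleParser(HTMLParser):
    """Collect (title attribute, visible text) for each <time> element."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._title = None
        self._text = None  # None means we are outside a <time> element

    def handle_starttag(self, tag, attrs):
        if tag == "time":
            self._title = dict(attrs).get("title")
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "time" and self._text is not None:
            self.results.append((self._title, "".join(self._text).strip()))
            self._text = None

# Hypothetical markup: relative text on screen, full timestamp in the title.
SAMPLE_HTML = '<p>Posted <time title="2024-03-05 14:30">2 hours ago</time></p>'

parser = TimeTitleParser()
parser.feed(SAMPLE_HTML)
title, visible = parser.results[0]

# The format string matches the sample above, not necessarily the real site.
posted_at = datetime.strptime(title, "%Y-%m-%d %H:%M")
print(visible, "->", posted_at.isoformat())
```

The relative text changes every time the page is loaded, while the attribute value stays fixed, which makes it the better choice for a spreadsheet.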
Interested in also scraping images from each listing? Check our guide on how to scrape and download images from any website.
Dealing with Navigation
So far, we’ve told ParseHub to extract data from the first page of results and each of its listings. But you might want to scrape even more information.
We will now tell ParseHub to scrape listings from the next couple of pages of results.
- First, use the left sidebar to go back to the main_template. Also, click on the browser tab for the search results page.
- Use the PLUS(+) sign next to the page selection and choose the Select command.
- Using the Select command, scroll to the bottom of the page and select the “next” link on the page. Rename your selection to next.
- Expand your next selection and remove both of the extractions that were created by default.
- Now use the PLUS(+) sign on your next selection and choose the Click command.
- A pop-up will appear asking if this is a “next” button. Click Yes and enter the number of times you’d like to repeat this sequence. In this case, we will repeat it 5 times.
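The navigation logic configured above amounts to a bounded loop: scrape the current page, follow the “next” link, and stop after a fixed number of clicks or when there is no next page. Below is a rough Python sketch of that loop over a made-up in-memory set of pages. It assumes the repeat count means the number of times the “next” link is followed, and it does not reflect how ParseHub implements navigation internally.

```python
# Hypothetical pages: each has some listings and the key of the next page.
PAGES = {
    "page1": {"listings": ["a", "b"], "next": "page2"},
    "page2": {"listings": ["c"], "next": "page3"},
    "page3": {"listings": ["d", "e"], "next": None},
}

def scrape_pages(pages, start, max_next_clicks):
    """Collect listings from the start page, following "next" a limited number of times."""
    collected = []
    current = start
    clicks = 0
    while current is not None:
        page = pages[current]
        collected.extend(page["listings"])
        if clicks >= max_next_clicks:
            break  # click limit reached, stop here
        current = page["next"]  # None when there is no further page
        clicks += 1
    return collected

print(scrape_pages(PAGES, "page1", 1))
print(scrape_pages(PAGES, "page1", 5))
```

With a limit of 1 the sketch covers the first two pages; with a limit of 5 it stops early because the sample only has three pages.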
Running your Scrape Job
We are now ready to run our scrape job. Click on the Get Data button on the left sidebar and select Run.
After your job is completed, you will be able to download your scrape as an Excel spreadsheet.
Pro-Tip: When working with longer scrape jobs, we recommend you run a Test Run first to verify that your project has been set up correctly.
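Once the results are downloaded, a few lines of Python are enough for a first look at the rental market. The sketch below assumes the data has been saved as CSV and uses hypothetical column names (listing_name, listing_price, listing_location); adjust them to match the names of your own selections.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical sample standing in for an exported results file.
SAMPLE_CSV = """listing_name,listing_price,listing_location
Bright 1BR near subway,"$1,850",Downtown
Cozy studio,"$1,400",Downtown
2BR with balcony,"$2,600",North York
Basement apartment,,North York
"""

def median_price_by_location(csv_file):
    """Return the median listed price for each location, skipping rows without a price."""
    prices = defaultdict(list)
    for row in csv.DictReader(csv_file):
        # Keep only the digits, so "$1,850" becomes 1850.
        digits = "".join(ch for ch in row["listing_price"] if ch.isdigit())
        if digits:
            prices[row["listing_location"]].append(int(digits))
    return {loc: statistics.median(vals) for loc, vals in prices.items()}

result = median_price_by_location(io.StringIO(SAMPLE_CSV))
for location, median in result.items():
    print(location, median)
```

To run this against a real file, replace the io.StringIO sample with open("results.csv", newline=""), where results.csv is a placeholder for your own file name.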
Now that your first Craigslist scrape is complete, you can transfer over these skills to scraping other product categories on Craigslist.
For example, you could scrape iPhone or car prices to make sure you’re getting the best deal possible for your next purchase.
Interested in learning more about web scraping? Read our guide on web scraping and its many uses.