Quora is the new hub for many of the internet’s questions.
With over 300 million users, it holds a huge amount of information about what people want to know.
You might therefore be interested in scraping data from Quora to uncover insights about the market, your industry, your target audience and more.
Today, we will go over how to scrape data from Quora using a free web scraper. You will then be able to extract all the data as a CSV or JSON file.
A Free Web Scraper for Quora
In order to complete this project, we will use ParseHub, a free and powerful web scraper that can work with any website. Make sure to download ParseHub for free before getting started.
For this example, we will scrape questions and related data from Quora’s Smart Phone News community.
Scraping Data from Quora
Now it’s time to start setting up our web scraping project.
- Install and open ParseHub. Click on “New Project” and enter the URL for the page you will be scraping. In this case, we will be scraping Quora’s Smart Phone News community. Once you submit the URL, the page will render inside the app.
- A Select command will be created by default. Start by clicking on the first question on the page to select it. It will be highlighted in green to indicate that it has been selected. The rest of the questions on the page will be highlighted in yellow. In the left sidebar, rename your selection to “question”.
- Now click on the second question on the page to select them all. They will all now be highlighted in green.
- We can now extract more data from this page. Let’s start with the number of answers for each post. Use the PLUS (+) sign next to your “question” selection and choose the Relative Select command.
- Using the Relative Select command, click on the first question on the list and then on the number of answers under it. An arrow will appear to show the association you’re creating. Rename your new selection to “answers”.
- Expand your “answers” selection by clicking on the icon next to it.
- Delete the URL extraction under your “answers” selection since this is data we’ve already extracted.
- Your project should now contain a “question” selection with an “answers” relative selection nested under it.
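For orientation, the selections above should yield one record per question. The snippet below is a minimal sketch of what that data might look like once exported as JSON, and of how the answer count could be turned into a number in Python. The keys `name` and `url`, the sample values, and the helper names are assumptions for illustration; a real export may be shaped differently.

```python
import json
import re

# Hypothetical sample of exported data; the exact keys and values in a real
# export may differ from this illustration.
SAMPLE_EXPORT = """
{
  "question": [
    {"name": "Which phone has the best camera?",
     "url": "https://example.com/q1",
     "answers": "12 answers"},
    {"name": "Is 5G worth it yet?",
     "url": "https://example.com/q2",
     "answers": "No answer yet"}
  ]
}
"""


def parse_answer_count(text):
    """Return the first integer found in the text, or 0 if there is none."""
    match = re.search(r"\d+", text.replace(",", ""))
    return int(match.group()) if match else 0


def tidy_questions(raw_json):
    """Turn the exported JSON into a list of dicts with a numeric answer count."""
    data = json.loads(raw_json)
    return [
        {
            "question": item.get("name", ""),
            "url": item.get("url", ""),
            "answers": parse_answer_count(item.get("answers", "")),
        }
        for item in data.get("question", [])
    ]


if __name__ == "__main__":
    for row in tidy_questions(SAMPLE_EXPORT):
        print(row)
```

Keeping the count as an integer makes it easy to sort or filter questions by how much discussion they attracted.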
We will now extract even more data from Quora.
Extracting Additional Data
Let’s now tell ParseHub to click on each question on the page and extract more data.
- Start by clicking on the PLUS (+) sign next to your “question” selection and choose the “click” command.
- A pop-up will appear asking whether this is a “next page” button. Click “No”, name your new template “question_page”, and click on the green “Create New Template” button.
- The page for the first question will now render inside the app, and a Select command will be created by default.
- Use this select command to extract any additional data you’d want from this page. In this case, we will extract the name of the top answer’s author. We will do this by clicking on it. Rename your selection to “author”.
- To extract more data, click on the PLUS (+) sign next to your “author” selection and choose the Select command. Then use this command to click on more data to extract. We will also extract the date on which the top answer was posted.
Dealing with Infinite Scroll
ParseHub is now extracting the data we’ve selected from the first few questions on the questions page. This page uses infinite scroll to load more questions, so we will set up ParseHub to load and scrape more of them.
- First, use the tabs on the right side of the screen to return to your main template. Then use the browser tab to return to the main questions page.
- Click on the PLUS(+) sign next to the “page” selection, click on “Advanced” and select the Extract command.
- Rename this selection to listing_value and replace the $location.href expression with the digit 0.
- Drag the extract command you’ve just created to the top of the command list, above the “question” select command.
- Use the icon next to the “question” selection to expand all its commands. Hover over the “question” selection and hold the Shift key to make the PLUS(+) sign appear. Use the PLUS(+) sign to add an Extract command.
- Rename this new Extract command to “remove” and, under the extract dropdown, choose “Delete element from page”.
- In the same way (hover over the “question” selection while holding the Shift key), add another Extract command and name it “listing_value”. In the command settings below, replace the $location.href expression with the digit 1.
- Now click on the PLUS(+) sign next to the “page” selection and add a Conditional command. Edit the expression of this command to “listing_value”.
- Using the PLUS(+) sign on this conditional, add a select command and select the section on the website that contains all the questions on the feed. You might need to use Ctrl+2 while hovering over it to select it. Rename this selection to “feed”.
- Expand your new “feed” command and remove the extract command.
- Click on the PLUS(+) sign on the “feed” command. Use it to add a Scroll command.
- Click on the PLUS(+) sign on the “feed” command again, this time to add a Go To Template command. A pop-up will appear; accept it with its default settings.
- Now click on the “Go to Template” command and enter the number of times you’d like to repeat this process in the “Repeat This Template” field. In Quora, each repeat represents 20 questions scraped. In this case, we will repeat it 4 more times.
- Lastly, click on the three dots on your left sidebar next to the main_template text and untick “No Duplicates”.
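Taken together, the commands above behave like a loop: scrape the questions currently on the page, remove them, scroll so the feed loads more, and run the template again. The Python sketch below models that control flow with a simulated feed. It is not ParseHub code; the `FakeFeed` class, the function names, and the assumption that the first pass also yields 20 questions are illustrative only.

```python
class FakeFeed:
    """Simulated infinite-scroll feed that loads a fixed batch per scroll."""

    def __init__(self, batch_size=20):
        self.batch_size = batch_size
        self.next_id = 0
        self.visible = []
        self.scroll()  # the first batch is present when the page loads

    def scroll(self):
        """Load one more batch of questions into the visible page."""
        new_items = [f"question {self.next_id + i}" for i in range(self.batch_size)]
        self.next_id += self.batch_size
        self.visible.extend(new_items)


def scrape_with_scroll(feed, repeats):
    """Scrape the visible questions, then scroll and repeat `repeats` more times."""
    scraped = []
    passes_left = repeats
    while True:
        listing_value = 0  # mirrors the extract command at the top of the template
        while feed.visible:
            scraped.append(feed.visible.pop(0))  # extract, then delete from page
            listing_value = 1  # at least one question was found on this pass
        if not listing_value or passes_left == 0:
            break
        feed.scroll()  # conditional branch: scroll the feed to load more
        passes_left -= 1  # one "Go to Template" repeat has been used
    return scraped


if __name__ == "__main__":
    results = scrape_with_scroll(FakeFeed(), repeats=4)
    print(len(results))  # initial pass plus 4 repeats, 20 questions each
```

Deleting each question after it is extracted is what keeps later passes from collecting the same items again, and the `listing_value` flag stops the loop once a pass finds nothing new.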
Running Your Scrape
It’s now time to run your scrape job and extract all the data you’ve selected.
Start by clicking on the green “Get Data” button on the left sidebar. Here you can Test, Schedule or Run your web scraping project. In this case, we will run it right away.
ParseHub will now go and scrape the data you’ve selected. Once your scrape is completed you will be able to download it as a CSV or JSON file.
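If you download the results as JSON, you may want to reshape them yourself before analysis. The following is a minimal sketch, assuming the export is a dictionary with a `question` list whose items carry `name`, `answers`, `author` and `date` keys. Those keys follow the selection names used in this project but are assumptions, so check them against your own file.

```python
import csv
import io
import json

FIELDS = ["question", "answers", "author", "date"]


def flatten(export):
    """Flatten the exported dict into one row per question."""
    rows = []
    for item in export.get("question", []):
        rows.append({
            "question": item.get("name", ""),
            "answers": item.get("answers", ""),
            "author": item.get("author", ""),
            "date": item.get("date", ""),
        })
    return rows


def write_csv(rows, fileobj):
    """Write the rows to an open text file object as CSV with a header."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Hypothetical stand-in for the contents of a downloaded JSON file.
    export = json.loads(
        '{"question": [{"name": "Is 5G worth it yet?", "answers": "3 answers",'
        ' "author": "Jane Doe", "date": "Jan 5"}]}'
    )
    buffer = io.StringIO()
    write_csv(flatten(export), buffer)
    print(buffer.getvalue())
```

Missing fields are written as empty cells, so questions without a top answer still appear in the output.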
You now know how to scrape data from Quora with a free web scraper.
We know projects can get quite complex. If you run into any issues during your project, reach out to us via the live chat on our site and we will be happy to assist you.