The Google Play Store offers a wide variety of apps, movies, and books for both phones and computers.
It hosts free and paid apps of every kind, from games to business tools. You’ve probably heard the saying, “there’s an app for that.”
Today we’re going to show you how to scrape apps from the Google Play Store.
You can use this data for research and development, and to see what exactly makes a great app.
So let’s get started!
Scraping the Google Play Store
For this project, we are going to scrape the top free apps on the Google Play Store.
We’ll scrape each app’s name and URL, the developer’s name and URL, the most recent reviews, and the number of installs.
We will be using a free web scraper, ParseHub. If you would like to follow along, you can download ParseHub for free here.
To follow along with this example, you can use this link here.
1. Download and install ParseHub. Click the “New Project” button and enter the URL in the text box. The website will now render inside ParseHub.
2. A select command will be created automatically. With the select command active, click the first app on the page. The app you selected will be highlighted in green, and ParseHub will suggest the other apps you may want to extract in yellow.
3. Click the next app highlighted in yellow to select them all. You may need to repeat this 2-3 times to teach ParseHub what to extract. The rest of the apps will then be highlighted in green.
4. In the left sidebar, rename your app selection to something more descriptive. We’ll name it “app_name”.
5. Click the PLUS (+) sign next to your app_name selection and choose the Relative Select command.
6. Click the first app highlighted in orange, then click the developer below it. An arrow will appear showing the association you have created. You may need to repeat this step to fully train the web scraper. Rename your selection to “app_developer”.
You should see that both the name and the URL are being extracted.
Adding the scroll function
Since the results page loads more apps as you scroll down, we need to tell the web scraper to scroll in order to get all of the apps.
If you ran the project now, only the first few apps would be extracted. Here’s how to deal with a scrolling page.
1. Click the PLUS (+) sign beside the page selection and choose Select. Then select the main element that contains the list of apps (the main div).
2. After you've selected the main div, expand your new selection and delete the extracted data.
3. With the main div selected, you can add the scroll function. In the left sidebar, click the PLUS (+) sign next to the main selection, click Advanced, then select the Scroll function.
4. Tell ParseHub how many times to scroll. Depending on how long the page is, you may need a bigger number, but for now let’s set it to 3 times and make sure it's aligned to the bottom.
5. Now drag your new scroll function to the top of your project.
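To see why the scroll count in step 4 matters, here is a toy model of a scroll-to-load page in Python. It is only an illustration: the numbers used (12 apps visible at first, 12 more per scroll, 60 apps in total) are hypothetical, not measurements of Google Play, and ParseHub performs the actual scrolling for you.

```python
# Toy model of a "scroll to load more" page.
# All numbers passed in are hypothetical; this does not describe Google Play's real behaviour.

def items_loaded(initial: int, per_scroll: int, total: int, scrolls: int) -> int:
    """Return how many items are rendered after scrolling to the bottom `scrolls` times."""
    loaded = initial + per_scroll * scrolls
    return min(loaded, total)  # the page can never render more items than it has


def scrolls_needed(initial: int, per_scroll: int, total: int) -> int:
    """Return the smallest number of scrolls after which every item is rendered."""
    if initial >= total:
        return 0
    missing = total - initial
    return -(-missing // per_scroll)  # ceiling division without floats


if __name__ == "__main__":
    # Assumed values: 12 apps at first, 12 more per scroll, 60 apps in the list.
    for n in range(6):
        print(n, "scrolls ->", items_loaded(12, 12, 60, n), "apps")
    print("scrolls needed:", scrolls_needed(12, 12, 60))
```

With these assumed numbers, 3 scrolls would render 48 of the 60 apps, which is why a longer page may need a larger scroll count.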
For more on dealing with pagination and infinite scroll, see:
Web Scraper Pagination: How to Scrape Multiple Pages on a Website
Extracting data from each app - Click command
Now, we will tell ParseHub to click on each app we’ve selected and extract additional data from each page. In this case, we will extract:
- Most recent reviews
- Number of downloads
First, on the left sidebar, click on the 3 dots next to the main_template text.
Rename your template to “app_results_page” or anything you see fit. Templates help ParseHub keep different page layouts separate, and will help you organize your project.
1. Now use the PLUS (+) button next to the app selection and choose the “Click” command. A pop-up will ask whether this link is a “next page” button. Click “No”, and next to Create New Template enter a new template name. In this case, we will use app_page.
2. ParseHub will now automatically create this new template and render the page of the first app from the results page.
3. With the select command active, click the most recent review. Check the preview table below to make sure you’re extracting the right data.
4. While using the same select command, click on the second review below it. ParseHub should now be collecting the most recent reviews that are present on the page.
5. Rename your selection to “App_reviews” or anything you see fit.
6. Click the PLUS (+) sign next to the page selection, choose Select, then click the number of installs under the “Installs” label. If you want to extract other additional information, select its label first, then use a Relative Select command to pick the value that belongs to it.
7. Rename your selection to “Number_of_installs”.
8. Be sure to double-check that the data you want is extracted properly by looking at the preview table.
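The number of installs is scraped as text, not as a number. If you plan to sort or compare apps by popularity later, a small Python helper can convert it. This is a sketch that assumes labels in the form “10,000,000+” or “1M+”; check your own preview table for the format you actually get. The function name parse_installs is hypothetical.

```python
from typing import Optional

# Multipliers for abbreviated labels such as "1M+" (assumed format).
_SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_installs(text: str) -> Optional[int]:
    """Convert an install label like '10,000,000+' or '1M+' into its integer lower bound.

    Returns None when the text cannot be parsed (for example an empty cell).
    """
    cleaned = text.strip().replace(",", "").rstrip("+").strip()
    if not cleaned:
        return None
    multiplier = 1
    if cleaned[-1].upper() in _SUFFIXES:
        multiplier = _SUFFIXES[cleaned[-1].upper()]
        cleaned = cleaned[:-1]
    try:
        return int(float(cleaned) * multiplier)
    except ValueError:
        return None  # not a number we recognise


if __name__ == "__main__":
    for label in ["10,000,000+", "1M+", "500+", ""]:
        print(repr(label), "->", parse_installs(label))
```

The trailing “+” is dropped, so each value is the lower bound of the range the store displays rather than an exact count.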
Running your Scrape
It’s now time to run your scrape job and extract all the data you’ve selected.
Start by clicking on the green “Get Data” button on the left sidebar. Here you can Test, Schedule or Run your web scraping project. In this case, we will run it right away.
ParseHub will now go and scrape the data you’ve selected. Once your scrape is complete, you will be able to download it as a CSV or JSON file.
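If you download the JSON file, you may want to flatten it into one row per app for a spreadsheet. The Python sketch below assumes the export nests each app’s data under the selection names used in this tutorial (app_name, app_developer, App_reviews, Number_of_installs), with text stored under a “name” key and links under a “url” key. That structure is an assumption, so compare it with your own file and adjust the keys. The helper names are hypothetical.

```python
import csv
import json
import sys
from typing import Any, Dict, List, TextIO

FIELDNAMES = ["app_name", "app_url", "developer_name", "developer_url", "installs", "reviews"]


def _text(value: Any) -> str:
    """Return the text of an extracted value, whether it is a plain string or a {"name": ...} dict."""
    if isinstance(value, dict):
        return str(value.get("name", ""))
    return "" if value is None else str(value)


def flatten_results(data: Dict[str, Any]) -> List[Dict[str, str]]:
    """Turn the (assumed) nested export into one flat row per app."""
    rows = []
    for app in data.get("app_name", []):
        developer = app.get("app_developer") or {}
        reviews = app.get("App_reviews") or []
        rows.append({
            "app_name": _text(app.get("name")),
            "app_url": _text(app.get("url")),
            "developer_name": _text(developer.get("name")),
            "developer_url": _text(developer.get("url")),
            "installs": _text(app.get("Number_of_installs")),
            # Join all scraped reviews into one cell so each app stays on a single row.
            "reviews": " | ".join(_text(r) for r in reviews),
        })
    return rows


def write_csv(rows: List[Dict[str, str]], out: TextIO) -> None:
    """Write the flattened rows as CSV with a header line."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (file names are placeholders): python flatten_export.py run_results.json > apps.csv
    with open(sys.argv[1], encoding="utf-8") as f:
        write_csv(flatten_results(json.load(f)), sys.stdout)
```

Keeping every app on one row, with its reviews joined into a single cell, makes the file easy to sort and filter in a spreadsheet.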
Closing thoughts
Now you know how to scrape apps from the Google Play Store! You can use this data for competitor research, product development, and even brand monitoring.
We understand that web scraping projects can get quite complicated. If you run into any problems or have any questions, you can contact our customer support team using our live chat where we’ll be more than happy to assist you!
Happy scraping!