Scrapy vs ParseHub: A Web Scraper Comparison
Looking for the best web scraper for your project?
Allow us to compare two of the most popular options on the market.
Scrapy and ParseHub are both very powerful and useful web scraping tools. Today, we will put both tools head-to-head to determine which is the best for your scraping project.
Scrapy Introduction
Scrapy is probably the most popular open-source framework for web scraping. It's been around since at least 2008. It started out as an open-source release of a Python framework built for scraping a large number of websites for a commercial enterprise.
The framework turned out to be so successful on its own that its creators formed a company around it: ScrapingHub.
ParseHub Introduction
ParseHub is a full-fledged web scraper. It comes as a free desktop app with premium features. Hundreds of users and businesses around the world use ParseHub daily for their web scraping needs.
ParseHub was built to be an incredibly versatile web scraper with useful features such as a user-friendly UI, page navigation, IP rotations and more.
In this article, we will first compare the visual web scraping tool ParseHub to Scrapy as an open-source Python project. We will then compare ParseHub to the ScrapingHub paid service, which runs Scrapy spiders for a fee.
ParseHub and Scrapy Comparison (Plus Portia)
Comparing ParseHub to Scrapy is somewhat of an apples-to-oranges comparison because one is a UI tool and the other is a programming library. A more apples-to-apples comparison would be to the associated open-source project Portia, also built by ScrapingHub.
We’ve gone ahead and compared Portia and ParseHub in an in-depth guide.
But since Scrapy is so established, we will confine this article to the first comparison.
ParseHub Features vs Scrapy Features
FEATURE | PARSEHUB | SCRAPY
Authoring environment | Desktop app (Mac, Windows and Linux) | Python plus scrapy command-line tool
Scraper logic | Variables, loops, conditionals, function calls (via templates) | Variables, loops, conditionals, function calls (arbitrary python)
Javascript, Ajax and dynamic content | Yes | With external libraries
Pop-ups, infinite scroll, hover content | Yes | With external libraries
Debugging | Visual debugger | Python logs
Knowledge of HTML and HTTP | None required | Required
Selecting elements | Point-and-click, CSS selectors, XPath | CSS selectors, XPath
Transforming data | Regex, javascript expressions | Regex, arbitrary python
Speed | Fast parallel execution | Fast parallel execution
Hosting | Hosted on cloud of hundreds of ParseHub servers | Hosted on your local machine or your own servers. Can pay for ScrapingHub to host it for you.
IP Rotation | Included in paid plans | Must pay external service
Sites (AKA spiders, scrapers, projects) | Free plan: 5, $99/month: 20, $499/month: 120 | Limited by your infrastructure or as sold by Scrapy Cloud
Support | Free professional support | Community support
Data export | CSV, JSON, API | CSV, JSON, API
Run-time configuration | Passed in as a JSON object | Passed in the command line, arbitrary python
ParseHub offers most of the web scraping power and scale of Scrapy in a much easier-to-use package. Because we're actually big fans of Scrapy, we still recommend it for a few situations:
- Tight integration with an existing Python codebase and infrastructure
- Crawling hundreds of websites and grabbing all of the HTML code
ParseHub Pricing vs Scrapy Pricing
Scrapinghub is a paid service for running web scrapers (AKA spiders or projects) created with the open-source Python framework Scrapy. It is equivalent to ParseHub's "run on server" and "run on a schedule" services, which are integrated into the ParseHub desktop app.
At first glance, the main difference between the two services appears to be their pricing. ParseHub packages capabilities into conventional software-as-a-service (SaaS) plans: Free, Standard ($99) and Professional ($499). Scrapinghub prices its service in $9 "Scrapy Cloud units", similar to infrastructure-as-a-service (IaaS) offerings such as Amazon EC2.
ParseHub clearly defines how many pages a minute it will provide for each plan. Scrapinghub offers additional "concurrent crawls" for $9 each. You’d have to calculate how many “Scrapy Cloud units” you would need to run your project at the same speed as a ParseHub paid plan for a closer dollar-to-dollar comparison.
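As a rough starting point for that calculation, the sketch below works out how many $9 units fit inside the price of each ParseHub paid plan. It uses only the prices quoted in this article and assumes that units are billed monthly, like the plans. It says nothing about how many pages a unit can crawl, which you would need to measure for your own project. The function name is made up for the example.

```python
# Prices quoted in this article, in US dollars.
STANDARD_PLAN_USD = 99
PROFESSIONAL_PLAN_USD = 499
SCRAPY_CLOUD_UNIT_USD = 9  # assumed to be a monthly price per unit


def units_within_budget(budget_usd, unit_price_usd=SCRAPY_CLOUD_UNIT_USD):
    """Return how many whole units can be bought for a given budget."""
    return budget_usd // unit_price_usd


for plan, price in [("Standard", STANDARD_PLAN_USD),
                    ("Professional", PROFESSIONAL_PLAN_USD)]:
    units = units_within_budget(price)
    print(f"{plan} (${price}): up to {units} Scrapy Cloud units "
          f"(${units * SCRAPY_CLOUD_UNIT_USD})")
```

In other words, the Standard plan price covers 11 units and the Professional plan price covers 55. The open question is whether that many units match the speed of the corresponding ParseHub plan.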
Additional Features
ParseHub bundles all its features in a single package that you can upgrade or downgrade as needed. ScrapingHub, by contrast, splits several web scraping features into separate products, and their costs can quickly add up when you go with the paid options.
For example, ParseHub and Scrapinghub both offer IP rotation, but Scrapinghub sells it in a separate service, Crawlera, starting at $25 a month and up to $500 or more a month.
Free Plans
Both services offer a free plan that includes multiple projects and hundreds of pages or more.
We recommend you try out the free plans for both tools first before making a decision on paid plans. Visit our download page to start web scraping for free with ParseHub now.
Final Thoughts: ParseHub vs ScrapingHub
As we mentioned earlier, ParseHub vs Scrapinghub is somewhat of an apples-to-oranges comparison. ParseHub is designed to work at a higher level, bundling together most of the features that Scrapinghub sells separately.
ParseHub is also a better choice if you do not have the technical knowledge to build and deploy spiders on your own.
You may also work with a business that deals with 'Big Data' and data engineering services.
Scrapinghub is a good choice if you are already convinced that Scrapy is for you. If you are just starting out, we encourage you to try ParseHub, which will get you up and running more easily and quickly.
[This post was originally written on July 15, 2016 and updated on August 9, 2019]