We just posted a data driven analysis of how to make your post rank successfully on HackerNews, GrowthHackers and Inbound. If you are interested in running a successful content marketing strategy you might want to analyze data from your own blog or your competitors blogs. For example, you can combine the social shares, word count, title length, pageviews and url structure of all of the posts on your blog to get an accurate picture of what works and what doesn’t.
In this post I will show you how to:
- Part I - Use ParseHub to scrape posts from a forum
- Part II - Clean up and prepare the results scraped by ParseHub in Excel
- Part III - Use pivot tables to organize, analyze and sort your data to find key insights
We applied all of the steps in this tutorial when analyzing what makes a successful post on HackerNews, GrowthHackers and Inbound. Check it out to get some ideas on how you can use web scraping for marketing.
1. Using ParseHub to scrape posts from a forum
Here is the data that we want to get from GrowthHackers:
- post title
- title word count
- website the post came from
- upvotes
- comments
- author
- category
- link
Let's set-up ParseHub to crawl GrowthHackers and get this data.
-
Download the ParseHub desktop application, open it and log-in.
-
Enter the following url in your browser https://growthhackers.com/page/2/ - click
New Project and Start project on this URL. -
Select the first two posts by clicking on them. All of the posts on the page will now be selected and extracted for you.
-
Rename the selection posts.
- Add a third Extract command by clicking on the "plus" button next to the command "Begin new entry in posts and opening the Advanced menu.
- Rename the command “Extract title_length”.
- In the text box, add the following:
$e.text.split(' ').length
ParseHub will automatically count how many words are in each title for you and extract it into your results. Now you don’t have to use Excel formulas to get this data.
- Add a Relative Select command by clicking on the "plus" button next to "Begin new entry in posts".
- Click on the first post title and click on the name of the source to draw an arrow between them. All of the sources for all of the posts are now selected and extracted for you.
- Rename the selection source.
Loading more posts using ParseHub
- I will travel to the next page to find more stories. This will happen indefinitely if I do not tell ParseHub to stop somewhere, though! Click on the the "plus" button next to the "Select page" command to choose a Conditional command.
- In the Expression text box, enter:
posts.length < 300
- Now scroll down until you see the “More Posts” button.
- Click on the "plus" button next to "If posts.length < 300 and add a Select command.
- Click on the "More Posts" button and rename the selection more.
- Add a Click command from the "plus" button next to "Select and Extract more", to tell ParseHub to click on the “More Posts” button.
- Say that you would to Go to Existing Template main_template.
Extracting data with ParseHub
- It's time! Click on Get Data.
- Choose to Run and Save and Run. This will run the project and get the last 300 posts from GrowthHackers.
- To download your data, wait a few minutes for the project to finish running. You will see buttons to download the CSV or JSON of your results.
OPTIONAL: If you want to run the same project on a schedule and get data from GrowthHackers at the beginning of every month you can use our scheduling feature. This way you will be able to combine data about posts from many months together.
- To schedule your crawls, under Get Data click on Schedule.
- Now select choose to schedule Every month on 1 at 0:0 to extract your data at midnight of the first of every month.
3. Clean up and prepare your data in Excel
I have Excel 2011 for Mac so that is what I will use for the tutorial screenshots and formulas - however, you can do everything I do in Windows Excel - just check out this Excel tutorial.
Make sure you have the results downloaded and opened in Excel.
I like to rename and save a version of the results. Also, I clean up the headings and
rename “post_title” to “title”, “post_upvotes” to “upvotes” and etc.
Step 1: Remove Duplicates
- Select all of the data by clicking on the left corner of the border between the label for row 1 and column A.
- Click on the Data tab. Under the tools label, click on the Remove Duplicates icon.
- A pop-up will appear. Now click on Remove Duplicates again.
Step 2: Add up the total engagement for each post
Create a new column beside post_upvotes and rename it engagement. Now add up the post comments and upvotes. We will use post engagement as a benchmark for which posts are outperforming each other.
=E2+F2
Step 3: Find the day each post is published
Create a new column beside date and rename it day. Now enter the following formula into the cell. J2 represents the cell where you have the date - 11-Jun-15. If you your date value is in another column make sure to change this part of the formula.
=TEXT(J2,"DDDD")
Step 4: Analyze the post titles
There are a few things you can do to see what works in the title or not. Here are just a few:
- Find out if the titles with a number in them perform better. Create a new column and rename it has_number. Use the formula below and make sure to replace A2 if your title is in a different column.
=IF(COUNT(FIND({0,1,2,3,4,5,6,7,8,9},A2))>0, "Yes", "No")
- See if titles with a dollar sign or a colon perform better. Use the two formulas below in separate columns
=IF(ISNUMBER(SEARCH("$",A2)),"Yes", "No")
=IF(ISNUMBER(SEARCH(":",A2)),"Yes", "No")
- Find out the most popular first word of top trending post. Make a column and rename it first_word. Enter the formula below into a cell.
=LEFT(A2,SEARCH(" ",A2)-1)
- Here is a different way to find the title length in Excel. Remember - we did this in ParseHub. But it you want to use an Excel formula to count the number of words in each title, here it is:
=IF(LEN(TRIM(A2))=0,0,LEN(TRIM(A2))-LEN(SUBSTITUTE(A2," ",""))+1)
3. Use pivot tables to analyze your data
You might be wondering, how do I create a pivot table in Excel to get the most of my data? Keep reading and I will show you how to analyze which day is the best to post your content and how many words you should make your title.
What day is the best to post content?
Let's create a pivot table to analyze all of the posts in our excel sheet. We will be able to compare each day of the week to the average amount of comments and upvotes each post gets.
Step 1: Create your pivot table
Select the entire sheet of data. Click on the Data tab and find and click on the Pivot Table icon. Excel will automatically create a new sheet for you with an example pivot table.
Step 2: Organize your data points
You will see a pivot table builder pop-up. Let’s drag and drop the values based on the data we want to analyze.
Let’s see if posting on a specific day creates more upvotes and comments and total engagement. Since we want to organize the data by days, we want to use the day to define our row label. Just drag and drop the day to the Row Labels box. Now drag the upvotes, comments and engagement into the Values box. Count of title will automatically appear for us.
Let’s get the average upvotes, comments and engagement for all of the posts, posted on a specific day. To do this, click on the i symbol beside the Count of upvotes, Count of comments and Count of engagement. In the pop-up select Average from the scroll box and click ok.
Step 3: Sort and analyze your results
Sort the days in the pivot table from the greatest engagement to the least engagement. Click on any row under the “engagement” column and select the data tab. Now click on the sort icon in the left corner and select descending from the dropdown.
In this example, from the 275 posts that we are analyzing, Monday has the highest engagement per post. Saturday is not a very popular day to post but yields a similar engagement per post as Monday. As you can see below, Sunday and Saturday have a similar amount of posts but Sunday is the worst day to post and has almost half as many comments and upvotes as Saturday. So, if you had to pick - post on Saturday instead of Sunday.
Does the word count of the title matter?
You can repeat similar steps to above to analyze if the title length influencers the number of comments and upvotes.
Step 1: Create your pivot table and organize your data
Drag and drop title_length into the **Row Labels” text box. Drag and drop engagement,
upvotes and comments into the Values text box.
Click on the i symbol and select the Average from the scroll box for engagement, upvotes and comments.
Step 2: Sort and analyze your results
Sort the Count of titles to eliminate outliers from our results - remove the title word count that only contains 1 or 2 posts. Click on the little icon beside the Row Labels column heading and uncheck all of the checkboxes for word counts 19 and up.
Select any cell under the Average engagement heading and open the Data tab. Now click on the Sort icon and select Descending from the dropdown. If you want the most engagement on your posts aim for between 11 and 15 words in your title.
Of course you can keep going to analyze the rest of the data. We scraped posts from forums such as HackerNews, GrowthHackers and Inbound.org and used Excel and pivot tables to analyze exactly what you need to do to get your post to rank on the first page.
Happy data-hunting!