How to Scrape UFC Stats

If you're a fan of mixed martial arts (MMA), then you know that the Ultimate Fighting Championship (UFC) is the biggest and baddest organization in the business. Founded in 1993, the UFC has been through a lot of changes over the years, but one thing has remained constant: it has always been a place for the best mixed martial artists in the world to test their skills against one another. Today, the UFC is home to some of the biggest names in MMA. With its popularity on the rise, so too is the number of stats and analytics available for fans to consume. In this blog post, we'll show you how to scrape UFC stats from the official UFC Stats website, so that you can nerd out like never before!

To begin scraping UFC data, you will need to download ParseHub and register for a free account.

Let’s begin scraping UFCStats.com!

Making Your First Selection

  1. Begin by logging into ParseHub and clicking the “New Project” button.
  2. Enter this URL to scrape from the official UFC Stats website and click “Start project on this URL”: http://ufcstats.com/statistics/events/completed
  3. Click the first completed event and it should turn green. Click the next event, and all the events on the first page should then be extracted.
  4. Rename this selection to “event” on the left pane.

Extracting Relative Data

  1. Begin by clicking the PLUS(+) button next to the “event” selection and choose “Relative Select”.
  2. Click the first event and an arrow will appear. Click the date under the event to extract it.
  3. All the relative dates will now be extracted. Rename this selection to “date”.
  4. Repeat steps 1 to 3, this time pointing the arrow to the location.
  5. You might need to click the next event and its location to train the algorithm.
  6. Rename this selection on the left to “location”.
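
For readers who want to see the idea behind Relative Select in code, here is a minimal Python sketch that pairs each event link with the date and location found in the same table row. It is only an illustration, not how ParseHub works internally. The sample markup, the class names ("date", "location"), and the names EventTableParser and parse_events are all hypothetical; the real page's HTML may differ.

```python
from html.parser import HTMLParser

# Hypothetical markup, loosely modeled on an events table.
# The real page's tags and class names may differ.
SAMPLE_HTML = """
<table>
  <tr>
    <td><a href="/event/1">Example Event 1</a>
        <span class="date">January 01, 2021</span></td>
    <td class="location">Las Vegas, Nevada, USA</td>
  </tr>
  <tr>
    <td><a href="/event/2">Example Event 2</a>
        <span class="date">January 08, 2021</span></td>
    <td class="location">Abu Dhabi, United Arab Emirates</td>
  </tr>
</table>
"""


class EventTableParser(HTMLParser):
    """Build one record per row: the event link plus its date and location."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._row = None    # record being built for the current <tr>
        self._field = None  # field that the next text node belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr":
            self._row = {"event": "", "url": "", "date": "", "location": ""}
        elif self._row is None:
            return  # ignore anything outside a table row
        elif tag == "a":
            self._row["url"] = attrs.get("href", "")
            self._field = "event"
        elif attrs.get("class") in ("date", "location"):
            self._field = attrs["class"]

    def handle_data(self, data):
        if self._row is not None and self._field:
            self._row[self._field] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("a", "span", "td"):
            self._field = None
        elif tag == "tr" and self._row is not None:
            self.events.append(self._row)
            self._row = None


def parse_events(html):
    """Return a list of event records parsed from the given HTML string."""
    parser = EventTableParser()
    parser.feed(html)
    return parser.events


if __name__ == "__main__":
    for event in parse_events(SAMPLE_HTML):
        print(event)
```

Each record keeps the event, date, and location together, which is the same relationship the “date” and “location” selections capture relative to “event”.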

Extracting Fighters and Stats

To get additional information, such as the fighters in each event and their stats, we need to go into each UFC event page.

  1. Begin by clicking the PLUS(+) button next to the “event” selection we made earlier.
  2. Choose the “Click” option.
  3. On the pop-up asking if this is a next page button, choose “No”, as we are not doing pagination just yet.
  4. Create a new template called event_details; the event page will then load.
  5. Click the first fighter to extract their name, then click the next two fighters to extract all the fighters. Rename this selection to “fighter” on the left.
  6. Click the PLUS(+) button next to the “fighter” selection and choose “Relative Select”.
  7. Click the first fighter, then their STR value, which is the number of strikes they made in the fight.
  8. Repeat this for the next two fighters to train the algorithm, and rename this selection to “strikes”.
  9. Repeat the relative selection: click the first fighter and then the method of winning. Do the same for the next winner, and rename this selection to “win_method”.

Awesome job, you have now extracted multiple data points! You can repeat these steps for other stats, such as the weight class, the number of rounds, and much more.

Scraping Subsequent Pages

In our other tutorials, we talked about pagination. This website has no next page button, so we have to use a more complex method to paginate through the pages. ParseHub's help documentation covers pagination with no next button.

  1. Start by clicking the PLUS(+) button next to your “page” selection.
  2. Choose “Select” and click the page 1 button.
  3. Expand the selection and delete the two extractions.
  4. Rename this selection to “current_page”.
  5. Click the PLUS(+) button next to your “current_page” selection, and choose “Relative Select”.
  6. Click the page 1 button, and then point the arrow to the number 2, which is the next page.
  7. Rename this selection to “next_page” and delete the extractions.
  8. Click the PLUS(+) button next to the “next_page” selection and choose “Click”.
  9. Choose “Yes” on the pop-up, as this is a next page button.
  10. Enter the number of additional pages you want to scrape. We will choose 2, which is 3 pages in total.
  11. The next page will now load. Make sure the “current_page” selection is now on page 2.
  12. You will notice that the “next_page” Relative Select also points to page 2. Click the number 2 and point the arrow to page 3 to correct it.
  13. Click the browse button, go to page 3, and make sure everything is now correct.
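
The “current_page” and “next_page” logic above can be sketched in a few lines of Python. This is a simplified illustration rather than ParseHub's actual implementation: the function names and the list of page labels are hypothetical. It also shows why 2 additional pages means 3 pages in total.

```python
def next_page_label(page_labels, current):
    """Return the label that follows `current`, or None on the last page."""
    index = page_labels.index(current)
    if index + 1 < len(page_labels):
        return page_labels[index + 1]
    return None


def pages_to_visit(page_labels, start, additional_pages):
    """Start on `start`, then follow the next page up to `additional_pages` times."""
    visited = [start]
    current = start
    for _ in range(additional_pages):
        following = next_page_label(page_labels, current)
        if following is None:
            break  # ran out of pages before using up the limit
        visited.append(following)
        current = following
    return visited


if __name__ == "__main__":
    # Hypothetical pager showing four page-number buttons.
    labels = ["1", "2", "3", "4"]
    print(pages_to_visit(labels, "1", additional_pages=2))
```

The key point is that the next page is always found relative to the current one, so the same rule keeps working on page 2, page 3, and beyond.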

Phew, that was a lot of steps compared to websites that have a next-page button; if you run into any problems, make sure you contact our support for more help!

Bypassing Blocks

Whenever you scrape large amounts of data, you will most likely require ParseHub's IP Rotation. At the time of this blog post, we only scraped 3 pages and did not need any bypassing. If you get empty results, you have most likely been blocked. In that case, go into the project settings and enable IP Rotation. Note that this is a paid feature. You can also take it a step further and use your own custom residential proxies.
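
If you post-process your results in a script, a small check like the one below can flag an empty export so you know when to look at IP Rotation. It is only a sketch: the function name looks_blocked is hypothetical, and it assumes the export is a dictionary with your selections stored under the “event” key.

```python
def looks_blocked(export, key="event"):
    """Return True when the export has no items under `key`.

    Assumes the export is a dict keyed by selection name; an empty or
    missing list is treated as a sign the scrape may have been blocked.
    """
    items = export.get(key) if isinstance(export, dict) else None
    return not items


if __name__ == "__main__":
    print(looks_blocked({"event": []}))
    print(looks_blocked({"event": [{"name": "Example Event 1"}]}))
```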

Starting the UFC Scrape

Now that you have made your first selection, added relative selections, clicked into each event, and set up pagination, it is time to start your scrape! To begin, click the big green “Get Data” button on the left pane. You may now choose to Test, Run or Schedule your scrape. Testing is a good way to check that everything is working. In our case, we will Run this scrape to get our 3 pages of results!

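If you followed this tutorial correctly, each event in your data should include its date and location, along with the fighters, strikes, and win methods from its event page.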
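
To work with the results outside ParseHub, a minimal Python sketch like the following could flatten a JSON export into one row per fighter and write it as CSV. The export shape shown here is an assumption based on the selection names used in this tutorial (“event”, “date”, “location”, “fighter”, “strikes”, “win_method”), and the sample values are placeholders rather than real fight data. Check your own export before relying on it.

```python
import csv
import io

# Assumed export shape with placeholder values; your real export may differ.
SAMPLE_EXPORT = {
    "event": [
        {
            "name": "Example Event 1",
            "date": "January 01, 2021",
            "location": "Las Vegas, Nevada, USA",
            "fighter": [
                {"name": "Fighter A", "strikes": "50", "win_method": "KO/TKO"},
                {"name": "Fighter B", "strikes": "30", "win_method": "KO/TKO"},
            ],
        }
    ]
}

FIELDS = ["event", "date", "location", "fighter", "strikes", "win_method"]


def flatten(export):
    """Turn the nested export into a flat list with one row per fighter."""
    rows = []
    for event in export.get("event", []):
        for fighter in event.get("fighter", []):
            rows.append({
                "event": event.get("name", ""),
                "date": event.get("date", ""),
                "location": event.get("location", ""),
                "fighter": fighter.get("name", ""),
                "strikes": fighter.get("strikes", ""),
                "win_method": fighter.get("win_method", ""),
            })
    return rows


def to_csv(rows):
    """Serialize the flat rows to a CSV string with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(flatten(SAMPLE_EXPORT)))
```

One row per fighter keeps the data easy to filter and count in a spreadsheet, at the cost of repeating the event details on every row.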

Big shout out to Reddit user u/ReekFirstOfHisName who used ParseHub to scrape UFC data for their research paper. Here are the results acquired:

  • 5932 Total UFC Fights from UFC 1 to 259
  • 670 Knockouts
  • 1252 TKOs
  • 1252 Submissions
  • 2636 Decision Wins
  • 45 Draws
  • 18 Disqualifications
  • 59 No Contests
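
As a quick sanity check, the figures above can be added up and turned into percentages with a short Python sketch. The numbers are copied from the list above; the variable names are just for illustration.

```python
# Outcome counts copied from the list above.
RESULTS = {
    "Knockouts": 670,
    "TKOs": 1252,
    "Submissions": 1252,
    "Decision Wins": 2636,
    "Draws": 45,
    "Disqualifications": 18,
    "No Contests": 59,
}

REPORTED_TOTAL = 5932  # "Total UFC Fights" from the list above


def outcome_shares(results):
    """Return each outcome's share of all fights, as a percentage."""
    total = sum(results.values())
    return {name: round(100 * count / total, 1) for name, count in results.items()}


if __name__ == "__main__":
    total = sum(RESULTS.values())
    print("Sum of outcomes:", total, "| matches reported total:", total == REPORTED_TOTAL)
    for name, share in outcome_shares(RESULTS).items():
        print(f"{name}: {share}%")
```

The outcome counts sum to the reported total of 5932 fights, so the breakdown is internally consistent.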

We hope you enjoyed this blog post and tutorial on scraping UFC events, fighters and fight stats!

If you have any questions about web scraping, data extraction, or need help with using ParseHub, feel free to contact us on our website.

Happy Scraping! 🥊