Intro to modelling

Want to learn how to create your own predictive model using sports or racing data, but you don’t know where to start? We’re here to help.

The Data Scientists at Betfair have put together the first few steps we suggest you take to get you started on your data modelling journey. We also run occasional data modelling workshops to help you get the basics down – reach out and let us know if you’re interested in being notified about upcoming data events.

Choose your language

There are lots of programming languages to choose from. For our data modelling workshops we work in R and Python, as they’re both relatively easy to learn and designed for working with data.

If you’re new to these languages, here are some resources that will help get you set up.

Language 1: R

Language 2: Python

Find a data source

Finding quality data is crucial to being able to create a successful model. We have lots of historical Exchange data that we’re happy to share, and there are lots of other sources of sports or racing specific data available online, depending on what you’re looking for.

For our workshops we use historical NBA odds data from the Exchange (which you can download directly from here), along with NBA game data from a variety of sources, including:

Learn to program

Okay, so easier said than done, but you don't actually need a high level of programming knowledge to be able to build a decent model, and there are so many excellent resources available online that the barrier to entry is much lower than it's been in the past.

These are some of our favourites if you want to learn to use R or Python for data modelling:

  • Dataquest – free coding resource for learning both Python and R for data science
  • DataCamp – another popular free resource to learn both R and Python for data science
  • Codecademy – free online programming courses with community engagement

We've also shared an R repo for connecting with our API, which might make that part of the learning process easier if you go down that path.

Learn how to model data

We’ve put together some articles to give you an introduction to some of the different approaches you can take to modelling data, but again there are also lots of resources available online. Here are some good places to start:

  • Work through the modelling tutorials we've put together using AFL and soccer data
  • This Introduction to Tennis Modelling gives a good overview of ranking-based models, regression-based models and point-based models
  • How we used Elo and machine learning as different approaches to modelling the World Cup
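
To give a feel for what a ranking-based model looks like in code, here is a minimal Python sketch of the standard Elo rating update. It is only an illustration, not the model used in any of the articles above: the team names and results are made up, and the starting rating of 1500 and K-factor of 20 are assumed values you would normally tune for your sport.

```python
def expected_score(rating_a, rating_b):
    """Probability-like expected score for A against B (standard Elo formula)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a, rating_b, score_a, k=20.0):
    """Return new (rating_a, rating_b) after one game.

    score_a is 1.0 if A won, 0.5 for a draw, 0.0 if A lost.
    k (the K-factor) is an assumed value, not a recommendation.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Hypothetical results: (team A, team B, score for team A)
results = [
    ("Team A", "Team B", 1.0),
    ("Team B", "Team C", 0.5),
    ("Team A", "Team C", 0.0),
]

START_RATING = 1500.0  # assumed starting rating for every team
ratings = {}
for team_a, team_b, score_a in results:
    ra = ratings.get(team_a, START_RATING)
    rb = ratings.get(team_b, START_RATING)
    ratings[team_a], ratings[team_b] = update_ratings(ra, rb, score_a)

# Print teams from highest to lowest rating
for team, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{team}: {rating:.1f}")
```

Once each team has a rating, `expected_score` gives a rough win probability for an upcoming match, which you can compare against the probability implied by the market odds.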

Get your hands dirty

The best way to learn is by doing. Make sure you have a solid foundation of knowledge to work from, then get excited, get your hands dirty and see what you can create! Here are a final few thoughts to help you decide where to go from here: