
Greyhound form FastTrack tutorial

Building a model from greyhound historic data to place bets on Betfair


Workshop


Overview

This tutorial will walk through how to retrieve historic greyhound form data from FastTrack by accessing their Data Download Centre (DDC). We will then build a simple model on the data to demonstrate how we can then easily start betting on Betfair using the Betfair API. The tutorial will be broken up into four sections:

  1. Download historic greyhound data from FastTrack DDC
  2. Build a simple machine learning model
  3. Retrieve today's race lineups from FastTrack and Betfair API
  4. Run model on today's lineups and start betting

Requirements

# Import libraries
import betfairlightweight
from betfairlightweight import filters
from datetime import datetime
from datetime import timedelta
from dateutil import tz
import math
import numpy as np
import pandas as pd
from scipy.stats import zscore
from sklearn.linear_model import LogisticRegression
import fasttrack as ft

1. Download historic greyhound data from FastTrack

Create a FastTrack object

Enter your FastTrack security key and create a Fasttrack object with it. Creating the object also checks whether the key is valid; if it is, a "Valid Security Key" message will be printed. The resulting 'greys' object lets us call the functions that interact with the FastTrack DDC.

seckey = "your_security_key"
greys = ft.Fasttrack(seckey)
Valid Security Key

Find a list of greyhound tracks and FastTrack track codes

Call the listTracks function which creates a DataFrame containing all the greyhound tracks, their track codes and their state.

track_codes = greys.listTracks()
track_codes.head()
track_name track_code state
0 Albury 223 NSW
1 Armidale 225 NSW
2 Bathurst 226 NSW
3 Broke Hill 227 NSW
4 Bulli 202 NSW

Later on in this tutorial, we will be building a greyhound model on QLD tracks only so let's create a list of the QLD FastTrack track codes which will be used later to filter our data downloads for only QLD tracks.

tracks_filter = list(track_codes[track_codes['state'] == 'QLD']['track_code'])
tracks_filter
['400',
 '409',
 '401',
 '402',
 '403',
 '404',
 '405',
 '406',
 '407',
 '408',
 '410',
 '411',
 '412',
 '414',
 '413']

Call the getRaceResults function

Call the getRaceResults function which will retrieve race details and historic results for all races between two dates. The function takes in two parameters and one optional third parameter. Two DataFrames are returned, the first contains all the details for each race and the second contains the dog results for each race.

getRaceResults(dt_start, dt_end, tracks = None)

  • dt_start: the start date of the results you want to retrieve (str yyyy-mm-dd)
  • dt_end: the end date of the results you want to retrieve (str yyyy-mm-dd)
  • tracks: optional parameter which will restrict the download to only races in this list. If left blank, all tracks will be downloaded (list of str)

In this example, we'll retrieve data from 2018-01-01 to 2021-06-15 and restrict the download to our tracks_filter list which contains only the QLD track codes.

race_details, dog_results = greys.getRaceResults('2018-01-01', '2021-06-15', tracks_filter)
Getting meets for each date ..

100%|██████████████████████████████████████████████████████████████████████████████| 1262/1262 [10:34<00:00,  1.99it/s]

Getting historic results details ..

100%|██████████████████████████████████████████████████████████████████████████████| 2045/2045 [22:22<00:00,  1.52it/s]

race_details.head()
@id RaceNum RaceName RaceTime Distance RaceGrade Track date
0 285107231 1 UBET - DOWNLOAD THE APP 06:24PM 520m Maiden Albion Park 01 Jan 18
1 285107232 2 THIRTY TALKS @ STUD 06:47PM 600m Restricted Win Albion Park 01 Jan 18
2 285107233 3 BOX 1 PHOTOGRAPHY 07:02PM 331m Grade 5 Albion Park 01 Jan 18
3 285107234 4 ASPLEY LEAGUES CLUB 07:26PM 395m Mixed 4/5 Albion Park 01 Jan 18
4 285107235 5 TWITTER @ BRISGREYS 07:52PM 520m Mixed 3/4 Albion Park 01 Jan 18
dog_results.head()
@id Place DogName Box Rug Weight StartPrice Handicap Margin1 Margin2 PIR Checks Comments SplitMargin RunTime Prizemoney RaceId TrainerId TrainerName
0 124886323 1 MERLOT HAYZE 2 2 27.5 $5.10 None 3.00 None 32 0 32 5.76 30.46 1260.00 285107231 12979 T Trigg
1 1362060038 2 SPIN THAT WHEEL 1 1 28.4 $2.70F None 3.00 3.14 11 0 11 5.67 30.68 360.00 285107231 160421 C Schmidt
2 1770370034 3 SOMERVILLE 8 8 32.7 $11.70 None 6.25 3.29 23 0 23 5.75 30.91 180.00 285107231 69795 L Green
3 108391387 4 SYFY LEGEND 6 6 30.4 $8.30 None 15.75 9.43 54 0 54 5.81 31.57 0.00 285107231 82013 S Kleinhans
4 2032540059 5 GET MESSI 5 5 34.4 $10.20 None 17.25 1.57 46 0 46 5.80 31.68 0.00 285107231 87148 S Lawrance

Here we do some basic data manipulation and cleansing to get the variables into a format we can work with. We also add a few variables that will be handy down the track. Nothing too special here.

race_details['Distance'] = race_details['Distance'].apply(lambda x: int(x.replace("m", "")))
race_details = race_details.rename(columns = {'@id': 'FastTrack_RaceId'})
race_details['date_dt'] = pd.to_datetime(race_details['date'], format = '%d %b %y')
race_details['trackdist'] = race_details['Track'] + race_details['Distance'].astype(str)

dog_results = dog_results.rename(columns = {'@id': 'FastTrack_DogId', 'RaceId': 'FastTrack_RaceId'})
dog_results['StartPrice'] = dog_results['StartPrice'].apply(lambda x: None if x is None 
    else float(x.replace('$', '').replace('F', '')))
dog_results = dog_results[~dog_results['Box'].isnull()]
dog_results = dog_results.merge(
    race_details[['FastTrack_RaceId', 'Distance', 'RaceGrade', 'Track', 'date_dt', 'trackdist']], 
    how = 'left',
    on = 'FastTrack_RaceId'
)
dog_results['RunTime'] = dog_results['RunTime'].astype(float)
dog_results['Prizemoney'] = dog_results['Prizemoney'].astype(float)
dog_results['win'] = dog_results['Place'].apply(lambda x: 1 if x in ['1', '1='] else 0)

print("Number of races in dataset: " + str(dog_results['FastTrack_RaceId'].nunique()))
Number of races in dataset: 20760
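
The StartPrice clean-up is easy to check in isolation. The sketch below applies the same rule to a few values copied from the dog_results output above. parse_start_price is a hypothetical helper name (it is not part of the fasttrack package), and it assumes the only decorations on a price are a leading '$' and a trailing 'F', which presumably flags the favourite.

```python
def parse_start_price(raw):
    """Convert a FastTrack-style price string such as '$2.70F' to a float."""
    if raw is None:
        return None
    # Strip the currency symbol and the assumed favourite flag
    return float(raw.replace("$", "").replace("F", ""))


samples = ["$5.10", "$2.70F", None]
print([parse_start_price(s) for s in samples])
```

Keeping the rule in a named function rather than an inline lambda makes it simple to extend if other suffixes turn up in the data.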


2. Build a simple machine learning model

* NOTE: This model is not profitable. It is provided for educational purposes only. *

Construct some simple features

We'll start by constructing some simple features. Normally we'd explore the data, but the objective of this tutorial is to demonstrate how to connect to FastTrack and Betfair so we'll skip the exploration step and jump straight to model building to generate some probability outputs.

dog_results = dog_results.sort_values(by = ['FastTrack_DogId', 'date_dt'])
dog_results = dog_results.set_index('date_dt')

# Normalise the runtimes for each trackdist so we can compare runs across different track distance combinations. 
# We are making an unrealistic assumption that a dog that can run a good time on one trackdistance can run a 
# good time on a different trackdistance
dog_results['RunTime_norm'] = dog_results.groupby('trackdist')['RunTime'].transform(lambda x: zscore(x, nan_policy = 'omit'))

# transform keeps each result aligned with the original rows (also on recent pandas versions), 
# and shift(1) excludes the current run from its own feature

# Feature 1 - Total prize money won over the last 365 Days
dog_results['Prizemoney_365D'] = dog_results.groupby('FastTrack_DogId')['Prizemoney'].transform(lambda x: x.rolling("365D").sum().shift(1))
dog_results['Prizemoney_365D'] = dog_results['Prizemoney_365D'].fillna(0)

# Feature 2 - Number of runs over the last 365D
dog_results['runs_365D'] = dog_results.groupby('FastTrack_DogId')['win'].transform(lambda x: x.rolling("365D").count().shift(1))
dog_results['runs_365D'] = dog_results['runs_365D'].fillna(0)

# Feature 3 - win % over the last 365D
dog_results['wins_365D'] = dog_results.groupby('FastTrack_DogId')['win'].transform(lambda x: x.rolling("365D").sum().shift(1))
dog_results['wins_365D'] = dog_results['wins_365D'].fillna(0)
dog_results['win%_365D'] = dog_results['wins_365D'] / dog_results['runs_365D']

# Feature 4 - Best runtime over the last 365D
dog_results['RunTime_norm_best_365D'] = dog_results.groupby('FastTrack_DogId')['RunTime_norm'].transform(lambda x: x.rolling("365D").min().shift(1))

# Feature 5 - Median runtime over the last 365D
dog_results['RunTime_norm_median_365D'] = dog_results.groupby('FastTrack_DogId')['RunTime_norm'].transform(lambda x: x.rolling("365D").median().shift(1))

dog_results.head(10)
FastTrack_DogId Place DogName Box Rug Weight StartPrice Handicap Margin1 Margin2 ... Track trackdist win RunTime_norm Prizemoney_365D runs_365D wins_365D win%_365D RunTime_norm_best_365D RunTime_norm_median_365D
date_dt
2018-04-08 -2143477289 3 SUNBURNT SWAMPY 3 3 31.6 11.2 None 4.75 1.86 ... Albion Park Albion Park331 0 0.856147 0.0 0.0 0.0 NaN NaN NaN
2018-04-15 -2143477289 6 SUNBURNT SWAMPY 4 4 31.1 38.9 None 12.75 0.14 ... Albion Park Albion Park331 0 0.991574 175.0 1.0 0.0 0.0 0.856147 0.856147
2018-04-22 -2143477289 6 SUNBURNT SWAMPY 5 5 30.7 29.1 None 9.50 4.57 ... Albion Park Albion Park331 0 1.194715 175.0 2.0 0.0 0.0 0.856147 0.923861
2018-07-15 -2143477289 3 SUNBURNT SWAMPY 3 3 31.9 38.1 None 10.00 0.00 ... Albion Park Albion Park331 0 0.675578 175.0 3.0 0.0 0.0 0.856147 0.991574
2018-09-02 -2143477289 6 SUNBURNT SWAMPY 2 2 32.8 11.7 None 8.25 3.57 ... Albion Park Albion Park331 0 0.607864 350.0 4.0 0.0 0.0 0.675578 0.923861
2018-09-09 -2143477289 7 SUNBURNT SWAMPY 6 6 32.6 41.0 None 12.75 3.71 ... Albion Park Albion Park331 0 1.262428 350.0 5.0 0.0 0.0 0.607864 0.856147
2018-09-16 -2143477289 4 SUNBURNT SWAMPY 1 1 32.3 18.0 None 1.50 0.43 ... Albion Park Albion Park331 0 -0.385268 350.0 6.0 0.0 0.0 0.607864 0.923861
2018-10-14 -2143477289 5 SUNBURNT SWAMPY 8 8 32.3 5.5 None 11.25 1.29 ... Albion Park Albion Park331 0 1.217286 350.0 7.0 0.0 0.0 -0.385268 0.856147
2018-11-18 -2143477289 7 SUNBURNT SWAMPY 3 3 32.8 21.0 None 9.25 1.71 ... Albion Park Albion Park331 0 1.262428 350.0 8.0 0.0 0.0 -0.385268 0.923861
2019-05-26 -2143477289 4 SUNBURNT SWAMPY 7 7 31.7 71.0 None 11.00 1.86 ... Albion Park Albion Park331 0 0.517579 350.0 9.0 0.0 0.0 -0.385268 0.991574

10 rows × 31 columns
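
To see what the rolling("365D") and shift(1) combination does, here is a minimal sketch for a single dog. The dates and prize money values are illustrative (chosen to resemble the first rows of the output above), and the variable names are assumptions made for this example only.

```python
import pandas as pd

# Prize money for one hypothetical dog, indexed by race date
prizemoney = pd.Series(
    [175.0, 0.0, 0.0, 175.0, 0.0],
    index=pd.to_datetime(
        ["2018-04-08", "2018-04-15", "2018-04-22", "2018-07-15", "2019-05-26"]
    ),
)

# Sum over the trailing 365 days, current run included ...
rolling_sum = prizemoney.rolling("365D").sum()
# ... then shift down one row so each run only sees earlier runs
prizemoney_365d = rolling_sum.shift(1).fillna(0)

print(prizemoney_365d.tolist())
```

The first run has no history, so it is filled with 0. Because the shift happens after the rolling sum, each row carries the window that ended at the dog's previous run. After a long layoff the feature can therefore still include runs older than 365 days, as on the final row here, which appears to be the same behaviour as the 2019-05-26 row in the tutorial output.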

Convert all features into Z-scores within each race so that the features are on a relative basis when fed into the model

dog_results = dog_results.sort_values(by = ['date_dt', 'FastTrack_RaceId'])

for col in ['Prizemoney_365D', 'runs_365D', 'win%_365D',
            'RunTime_norm_best_365D', 'RunTime_norm_median_365D']:
    dog_results[col + '_Z'] = dog_results.groupby('FastTrack_RaceId')[col].transform(lambda x: zscore(x, ddof = 1))

dog_results['runs_365D_Z'] = dog_results['runs_365D_Z'].fillna(0)
dog_results['win%_365D_Z'] = dog_results['win%_365D_Z'].fillna(0)
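
As a minimal sketch of the within-race scaling, the example below computes the Z-score by hand with the sample standard deviation (ddof = 1) on made-up data. The column names and values are hypothetical, and within_race_zscore is an illustrative stand-in for scipy's zscore, not a replacement for it.

```python
import pandas as pd

runs = pd.DataFrame({
    "race_id": [1, 1, 1, 2, 2, 2],
    "prizemoney_365d": [100.0, 200.0, 300.0, 0.0, 500.0, 1000.0],
})


def within_race_zscore(x):
    # Sample standard deviation (ddof=1), mirroring zscore(x, ddof=1)
    return (x - x.mean()) / x.std(ddof=1)


runs["prizemoney_365d_z"] = (
    runs.groupby("race_id")["prizemoney_365d"].transform(within_race_zscore)
)
print(runs)
```

Both races end up on the same scale even though the raw prize money differs by an order of magnitude. When every dog in a race shares the same value, the standard deviation is zero and the Z-score is undefined, which is presumably why the null Z-scores for runs and win % are filled with 0.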

Train the model

Next, we'll train our model. To keep things simple, we'll choose a Logistic Regression from the sklearn package.

For modelling purposes, we'll only keep data from 2019 onwards: our features use the last 365 days of history, so data in 2018 won't capture an entire 365-day period. We'll also only keep races where every dog has a value for every feature. The last piece of code just double-checks that the DataFrame has no null values.

dog_results = dog_results.reset_index()
dog_results = dog_results.sort_values(by = ['date_dt', 'FastTrack_RaceId'])

# Only keep data from 2019 onwards
model_df = dog_results[dog_results['date_dt'] >= '2019-01-01']
feature_cols = ['Prizemoney_365D_Z', 'runs_365D_Z', 'win%_365D_Z',
                'RunTime_norm_best_365D_Z', 'RunTime_norm_median_365D_Z']
model_df = model_df[['date_dt', 'FastTrack_RaceId', 'DogName', 'win', 'StartPrice'] + feature_cols]

# Only train model off of races where each dog has a value for each feature
races_exclude = model_df[model_df.isnull().any(axis = 1)]['FastTrack_RaceId'].drop_duplicates()
model_df = model_df[~model_df['FastTrack_RaceId'].isin(races_exclude)]

# checking if any null values
model_df.drop(columns = 'StartPrice').isnull().values.any()
False

We will use pre-2021 as our train dataset and post-2021 as our test dataset which gives us an approximate 80/20 split of train to test data.

Note that one issue with training our model this way is that we are training each dog result individually and not in conjunction with the other dogs in the race. Therefore the probabilities are not guaranteed to add up to 1.

# Split the data into train and test data
train_data = model_df[model_df['date_dt'] < '2021-01-01'].reset_index(drop = True)
test_data = model_df[model_df['date_dt'] >= '2021-01-01'].reset_index(drop = True)

train_x, train_y = train_data[feature_cols], train_data['win']
test_x, test_y = test_data[feature_cols], test_data['win']

logit_model = LogisticRegression()
logit_model.fit(train_x, train_y)

test_data['prob_unscaled'] = logit_model.predict_proba(test_x)[:,1]
test_data.groupby('FastTrack_RaceId')['prob_unscaled'].sum()
FastTrack_RaceId
626218700    0.840901
626218701    0.731972
626218702    0.754034
626218703    0.986967
626218704    0.990238
               ...   
680757815    1.178215
680757816    0.847067
680757817    1.043633
680757818    0.805511
680757819    0.782609
Name: prob_unscaled, Length: 2491, dtype: float64

To correct for this, we'll apply a scaling factor to the model's raw outputs to force them to sum to 1. A better way to do this would be to use a conditional logistic regression which in the training process would ensure probabilities sum to unity.

# Scale the raw model outputs so they sum to unity within each race
test_data['prob_scaled'] = test_data.groupby('FastTrack_RaceId')['prob_unscaled'].transform(lambda x: x / sum(x))
test_data.groupby('FastTrack_RaceId')['prob_scaled'].sum()
FastTrack_RaceId
626218700    1.0
626218701    1.0
626218702    1.0
626218703    1.0
626218704    1.0
            ... 
680757815    1.0
680757816    1.0
680757817    1.0
680757818    1.0
680757819    1.0
Name: prob_scaled, Length: 2491, dtype: float64
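
The scaling step can be illustrated on a toy example. This sketch assumes a DataFrame with hypothetical race_id and prob_unscaled columns and divides each raw probability by its race total.

```python
import pandas as pd

preds = pd.DataFrame({
    "race_id": [1, 1, 1, 2, 2],
    "prob_unscaled": [0.30, 0.25, 0.20, 0.60, 0.60],
})

# Broadcast each race's total back onto its rows, then divide
race_totals = preds.groupby("race_id")["prob_unscaled"].transform("sum")
preds["prob_scaled"] = preds["prob_unscaled"] / race_totals

print(preds.groupby("race_id")["prob_scaled"].sum())
```

Race 1 sums to 0.75 before scaling and race 2 to 1.20, so one is scaled up and the other down. The ranking of dogs within a race is unchanged by the scaling.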

As a rudimentary check, let's see how many races the model correctly predicts using the highest probability in a given race as our pick. We'll also do the same for the starting price odds as a comparison.

The model predicts the winner in 33% of races which is not great given the starting price predicts it in 41.7% of races ... but it will do for our purposes!

# Create a boolean column for whether a dog has the highest model prediction in a race. Do the same for the starting price 
# as a comparison
test_data['model_win_prediction'] = test_data.groupby('FastTrack_RaceId')['prob_scaled'].transform(lambda x: x == max(x))
test_data['odds_win_prediction'] = test_data.groupby('FastTrack_RaceId')['StartPrice'].transform(lambda x: x == min(x))

print("Model predicts the winner in {:.2%} of races".format(
    len(test_data[(test_data['model_win_prediction'] == True) & (test_data['win'] == 1)]) / test_data['FastTrack_RaceId'].nunique()
    ))
print("Starting Price Odds predicts the winner in {:.2%} of races".format(
    len(test_data[(test_data['odds_win_prediction'] == True) & (test_data['win'] == 1)]) / test_data['FastTrack_RaceId'].nunique()
    ))
Model predicts the winner in 32.96% of races
Starting Price Odds predicts the winner in 41.75% of races
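
One caveat with comparing each value to the race maximum (or minimum) is that tied runners are all flagged as the pick. As an alternative sketch, not the method used above, idxmax selects exactly one runner per race. The data and column names below are made up for illustration.

```python
import pandas as pd

results = pd.DataFrame({
    "race_id": [1, 1, 1, 2, 2, 2],
    "prob_scaled": [0.5, 0.3, 0.2, 0.4, 0.4, 0.2],
    "win": [1, 0, 0, 0, 0, 1],
})

# idxmax returns the row label of the first maximum in each race
top_pick_rows = results.groupby("race_id")["prob_scaled"].idxmax()
hit_rate = results.loc[top_pick_rows, "win"].mean()

print("Top pick wins in {:.2%} of races".format(hit_rate))
```

In race 2 two runners tie on 0.4; idxmax keeps only the first, so the race contributes one pick rather than two.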


3. Retrieve today's race lineups

Retrieve today's lineups from FastTrack

Now that we have trained our model, we want to get today's races from FastTrack and run the model over them.

We have two options from FastTrack:

  • Basic Plus Format: Contains basic information about the dog lineups such as box, best time, trainer, owner, ratings, speed ratings ...
  • Full Plus Format: Contains everything in the basic format with additional information such as previous start information.

getBasicFormat(dt, tracks = None)

getFullFormat(dt, tracks = None)

The calls will return two dataframes, one with the race information and one with the individual dog information. Again, the tracks parameter is optional and if left blank, all tracks will be returned.

As we are only after the dog lineups to run our model on, let's just grab the basic format and again only restrict for QLD tracks.

qld_races_today, qld_dogs_today = greys.getBasicFormat('2021-06-16', tracks_filter)
qld_races_today.head()
Getting meets for each date ..

100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  2.08it/s]

Getting dog lineups ..

100%|████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.43it/s]

@id RaceNum RaceName RaceTime RaceTimeDateUTC Distance RaceGrade PrizeMoney1 PrizeMoney2 PrizeMoney3 ... Handicap TAB GradeCode VICGREYS RaceComment Track Date Quali TipsComments_Bet TipsComments_Tips
0 680665206 1 TAB ORIGIN GREYHOUNDS TOMORROW 03:32PM 16 Jun 21 05:32AM 395m Novice Non Penalty $1750 $500 $250 ... None None NNP None "" Albion Park 16 Jun 21 None None None
1 680665207 2 TERRY HILL VS BEN HANNANT 03:52PM 16 Jun 21 05:52AM 395m Maiden Heat $1750 $500 $250 ... None None MH None "" Albion Park 16 Jun 21 None None None
2 680665208 3 QLD VS NSW TOMORROW @BRISGREYS 04:17PM 16 Jun 21 06:17AM 395m Maiden Heat $1750 $500 $250 ... None None MH None "" Albion Park 16 Jun 21 None None None
3 680665209 4 ORIGIN SPRINT TOMORROW NIGHT 04:38PM 16 Jun 21 06:38AM 395m Maiden Heat $1750 $500 $250 ... None None MH None "" Albion Park 16 Jun 21 None None None
4 680665210 5 BEN HANNANT?S QLD MAROONS 04:57PM 16 Jun 21 06:57AM 395m Grade 5 Heat $1750 $500 $250 ... None None 5H None "" Albion Park 16 Jun 21 None None None

5 rows × 27 columns

Create a list of the QLD tracks running today, which will be used later when we fetch the Betfair data.

# Qld tracks running today
qld_tracks_today = list(qld_races_today['Track'].unique())
qld_tracks_today
['Albion Park', 'Ipswich']

Retrieve today's lineups from the Betfair API

The FastTrack lineups contain all the dogs in a race, including reserves and scratched dogs. As we only want to run our model on final lineups, we'll need to connect to the Betfair API to update our lineups for any scratchings.

Let's first login to the Betfair API. Enter in your username, password and API key and create a betfairlightweight object.

my_username = "your_username"
my_password = "your_password"
my_app_key = "your_app_key"

trading = betfairlightweight.APIClient(my_username, my_password, app_key=my_app_key)
trading.login_interactive()
<LoginResource>

Next, we'll call the list_events operation which will return all the greyhound events in Australia over the next 24 hours.

# Create the market filter
greyhounds_event_filter = filters.market_filter(
    event_type_ids=[4339],
    market_countries=['AU'],
    market_start_time={
        'to': (datetime.utcnow() + timedelta(days=1)).strftime("%Y-%m-%dT%H:%M:%SZ")
    }
)

# Get a list of all greyhound events as objects
greyhounds_events = trading.betting.list_events(
    filter=greyhounds_event_filter
)

# Create a DataFrame with all the events by iterating over each event object
greyhounds_events_today = pd.DataFrame({
    'Event Name': [event_object.event.name for event_object in greyhounds_events],
    'Event ID': [event_object.event.id for event_object in greyhounds_events],
    'Event Venue': [event_object.event.venue for event_object in greyhounds_events],
    'Country Code': [event_object.event.country_code for event_object in greyhounds_events],
    'Time Zone': [event_object.event.time_zone for event_object in greyhounds_events],
    'Open Date': [event_object.event.open_date for event_object in greyhounds_events],
    'Market Count': [event_object.market_count for event_object in greyhounds_events]
})

greyhounds_events_today.head()
Event Name Event ID Event Venue Country Code Time Zone Open Date Market Count
0 Bend (AUS) 16th Jun 30618018 Bendigo AU Australia/Sydney 2021-06-16 01:37:00 36
1 WPrk (AUS) 16th Jun 30618017 Wentworth Park AU Australia/Sydney 2021-06-16 09:05:00 40
2 MBdg (AUS) 16th Jun 30618832 Murray Bridge AU Australia/Adelaide 2021-06-16 01:55:00 36
3 Cran (AUS) 16th Jun 30618160 Cranbourne AU Australia/Sydney 2021-06-16 08:44:00 34
4 Ball (AUS) 16th Jun 30618165 Ballarat AU Australia/Sydney 2021-06-16 08:58:00 60
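
The 'to' bound of the market filter is a UTC timestamp string. As a minimal sketch, the helper below builds that string with explicit format directives and a timezone-aware datetime. betfair_timestamp is a hypothetical name introduced here; the explicit %H:%M:%S form is used because the %T shorthand is not among the directives the Python documentation guarantees on every platform, and datetime.utcnow() is deprecated from Python 3.12.

```python
from datetime import datetime, timedelta, timezone


def betfair_timestamp(moment):
    """Format an aware datetime as a UTC string such as 2021-06-16T05:32:00Z."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# End of the 24 hour window starting now
window_end = datetime.now(timezone.utc) + timedelta(days=1)
print(betfair_timestamp(window_end))
```

The returned string can be passed as the 'to' value of market_start_time in the same way as in the filter above.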

Next, let's fetch the market ids. As we know the meets we're interested in today, let's restrict the market pull request for only the QLD tracks that are running today.

greyhounds_events_today = greyhounds_events_today[greyhounds_events_today['Event Venue'].isin(qld_tracks_today)]
greyhounds_events_today.head()
Event Name Event ID Event Venue Country Code Time Zone Open Date Market Count
7 Ipsw (AUS) 16th Jun 30618813 Ipswich AU Australia/Queensland 2021-06-16 08:55:00 40
9 APrk (AUS) 16th Jun 30618188 Albion Park AU Australia/Queensland 2021-06-16 05:32:00 27
market_catalogue_filter = filters.market_filter(
    event_ids = list(greyhounds_events_today['Event ID']),
    market_type_codes = ['WIN']
)

market_catalogue = trading.betting.list_market_catalogue(
    filter=market_catalogue_filter,
    max_results='1000',
    sort='FIRST_TO_START',
    market_projection=['MARKET_START_TIME', 'MARKET_DESCRIPTION', 'RUNNER_DESCRIPTION', 'EVENT', 'EVENT_TYPE']
)

win_markets = []
runners = []

for market_object in market_catalogue:
    win_markets.append({
        'event_name': market_object.event.name,
        'event_id': market_object.event.id,
        'event_venue': market_object.event.venue,
        'market_name': market_object.market_name,
        'market_id': market_object.market_id,
        'market_start_time': market_object.market_start_time,
        'total_matched': market_object.total_matched
        })

    for runner in market_object.runners:
        runners.append({
            'market_id': market_object.market_id,
            'runner_id': runner.selection_id,
            'runner_name': runner.runner_name
            })

win_markets_df = pd.DataFrame(win_markets)
runners_df = pd.DataFrame(runners)

For matching purposes, we'll need to extract the race number from the market_name. Also let's add another field 'local_start_time' as the market_start_time field is in UTC format.

# Extract race number from market name
win_markets_df['race_number'] = win_markets_df['market_name'].apply(
    lambda x: x[1:3].strip() if x[0] == 'R' else None)

# Function that converts a naive time from an old timezone to a newly specified timezone
def change_timezone(time, oldtz, newtz):
    from_zone = tz.gettz(oldtz)
    to_zone = tz.gettz(newtz)
    newtime = time.replace(tzinfo = from_zone).astimezone(to_zone).replace(tzinfo = None)
    return newtime

# Add in a local_start_time variable. These are QLD tracks, so use the Queensland timezone
# (Australia/Sydney gives the same result in June but differs during daylight saving)
win_markets_df['local_start_time'] = win_markets_df['market_start_time'].apply(lambda x: \
                    change_timezone(x, 'UTC', 'Australia/Brisbane'))

win_markets_df.head()
event_name event_id event_venue market_name market_id market_start_time total_matched race_number local_start_time
0 APrk (AUS) 16th Jun 30618188 Albion Park R1 395m Nvce 1.184472300 2021-06-16 05:32:00 0.0 1 2021-06-16 15:32:00
1 APrk (AUS) 16th Jun 30618188 Albion Park R2 395m Heat 1.184472302 2021-06-16 05:52:00 0.0 2 2021-06-16 15:52:00
2 APrk (AUS) 16th Jun 30618188 Albion Park R3 395m Heat 1.184472304 2021-06-16 06:17:00 0.0 3 2021-06-16 16:17:00
3 APrk (AUS) 16th Jun 30618188 Albion Park R4 395m Heat 1.184472306 2021-06-16 06:38:00 0.0 4 2021-06-16 16:38:00
4 APrk (AUS) 16th Jun 30618188 Albion Park R5 395m Heat 1.184472308 2021-06-16 06:57:00 0.0 5 2021-06-16 16:57:00
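
Both steps can be sketched with the standard library alone. The example below pulls the race number out with a regular expression and converts a naive UTC time using a fixed UTC+10 offset, which holds for Queensland because it does not observe daylight saving. race_number and utc_to_qld are hypothetical helper names, and "R10 520m Gr5" is a made-up market name used to show a two-digit race.

```python
import re
from datetime import datetime, timedelta, timezone

# Queensland stays on UTC+10 all year (no daylight saving)
QLD_TZ = timezone(timedelta(hours=10))


def race_number(market_name):
    """Return the race number as a string from names like 'R1 395m Nvce', else None."""
    match = re.match(r"R(\d+)\b", market_name)
    return match.group(1) if match else None


def utc_to_qld(naive_utc):
    """Treat a naive datetime as UTC and return naive Queensland local time."""
    aware = naive_utc.replace(tzinfo=timezone.utc)
    return aware.astimezone(QLD_TZ).replace(tzinfo=None)


print(race_number("R1 395m Nvce"), race_number("R10 520m Gr5"))
print(utc_to_qld(datetime(2021, 6, 16, 5, 32)))
```

The race number is kept as a string so that it has the same form as the value sliced from market_name in the tutorial code; whichever form is used, it needs to match the type of the FastTrack race number before merging.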

To match the dog names from Betfair and FastTrack, we'll also need to remove the rug number from the start of the runner_name in the runners_df DataFrame.

# Remove dog number from runner_name
runners_df['runner_name'] = runners_df['runner_name'].apply(lambda x: x[(x.find(" ") + 1):].upper())

# Merge on the race number and event venue onto runners_df
runners_df = runners_df.merge(
     win_markets_df[['market_id', 'event_venue', 'race_number']],
     how = 'left',
     on = 'market_id')
runners_df.head()
market_id runner_id runner_name event_venue race_number
0 1.184472300 36594055 LITTLE MISS VANE Albion Park 1
1 1.184472300 39860314 MULGOWIE BELLE Albion Park 1
2 1.184472300 39860315 NIGHT CAPERS Albion Park 1
3 1.184472300 38079770 WRONG GIRL HARRY Albion Park 1
4 1.184472300 37616746 IM ON FIRE Albion Park 1

Merge race lineups from FastTrack and Betfair

Before we can merge, we'll need to make some minor formatting changes to the FastTrack names so they match the Betfair names. Betfair excludes all apostrophes and full stops in its naming convention, so we'll create a Betfair-equivalent dog name on the dataset with these characters removed. We'll also tag the race number onto the lineups dataset for merging purposes.

qld_races_today = qld_races_today.rename(columns = {'@id': 'FastTrack_RaceId'})
qld_races_today = qld_races_today[['FastTrack_RaceId', 'Date', 'Track', 'RaceNum', 'RaceName', 
                                   'RaceTime', 'Distance', 'RaceGrade']]
qld_dogs_today = qld_dogs_today.rename(columns = {'@id': 'FastTrack_DogId', 'RaceId': 'FastTrack_RaceId'})
qld_dogs_today = qld_dogs_today.merge(
    qld_races_today[['FastTrack_RaceId', 'Track', 'RaceNum']],
    how = 'left',
    on = 'FastTrack_RaceId'
    )
qld_dogs_today['DogName_bf'] = qld_dogs_today['DogName'].apply(lambda x: x.replace("'", "").replace(".", "").replace("Res", "").strip())

Now we can merge on the FastTrack and Betfair lineup dataframes by dog name, track and race number. We'll check that all selections have been matched by making sure there are no null dog ids.

# Match on the fastTrack dogId to the runners_df
runners_df = runners_df.merge(
    qld_dogs_today[['DogName_bf', 'Track', 'RaceNum', 'FastTrack_DogId']],
    how = 'left',
    left_on = ['runner_name', 'event_venue', 'race_number'],
    right_on = ['DogName_bf', 'Track', 'RaceNum'],
    ).drop(['DogName_bf', 'Track', 'RaceNum'], axis = 1)

# Check all betfair selections are matched to a fastTrack dogId by checking if there are any null dogIds
runners_df['FastTrack_DogId'].isnull().any()
False
runners_df.head()
market_id runner_id runner_name event_venue race_number FastTrack_DogId
0 1.184472300 36594055 LITTLE MISS VANE Albion Park 1 434800466
1 1.184472300 39860314 MULGOWIE BELLE Albion Park 1 510731455
2 1.184472300 39860315 NIGHT CAPERS Albion Park 1 415994834
3 1.184472300 38079770 WRONG GIRL HARRY Albion Park 1 443645048
4 1.184472300 37616746 IM ON FIRE Albion Park 1 448841452
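
The name matching relies on both sources being reduced to the same key. The sketch below restates the two string rules as small functions and checks that they agree on one example. The function names are hypothetical, and the raw input formats ("5. Im On Fire" for Betfair, "I'M ON FIRE" for FastTrack) are assumptions for illustration rather than confirmed output from either API.

```python
def betfair_runner_to_key(runner_name):
    """Drop the leading rug number token and upper-case the rest."""
    return runner_name[runner_name.find(" ") + 1:].upper()


def fasttrack_dog_to_key(dog_name):
    """Remove apostrophes, full stops and the 'Res' tag (presumably marking reserves)."""
    return dog_name.replace("'", "").replace(".", "").replace("Res", "").strip()


# Assumed raw formats, for illustration only
betfair_key = betfair_runner_to_key("5. Im On Fire")
fasttrack_key = fasttrack_dog_to_key("I'M ON FIRE")
print(betfair_key, "|", fasttrack_key, "|", betfair_key == fasttrack_key)
```

If the null check on FastTrack_DogId ever returns True, printing the unmatched runner_name values next to keys built this way is a quick way to spot which character rule is missing.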

4. Run model on today's lineups and start betting

Create model features for the runners

First we have to create the same model features we used in our logistic regression model on the dogs in the runners_df DataFrame. As our features use historic data over the last 365 days, we'll need to filter our historic results dataset (created in step 1) for only the dog ids we are interested in and only over the last 365 days.

runners_historicdata = dog_results[dog_results['FastTrack_DogId'].isin(runners_df['FastTrack_DogId'])]
runners_historicdata = runners_historicdata.sort_values(by = ['FastTrack_DogId', 'date_dt'])
runners_historicdata = runners_historicdata[runners_historicdata['date_dt'] >= (datetime.now() - timedelta(days = 365))]

Next we create the features. As our trained model requires a non-null value in each of the features, we'll exclude all markets where at least one dog has a null feature.

# Create the feature variables over the last 365 days
runners_features = runners_historicdata.groupby('FastTrack_DogId').agg(
    Prizemoney_365D = ('Prizemoney', 'sum'),
    RunTime_norm_best_365D = ('RunTime_norm', 'min'),
    RunTime_norm_median_365D = ('RunTime_norm', 'median'),
    runs_365D = ('FastTrack_RaceId', 'count'),
    wins_365D = ('win', 'sum')
    ).reset_index()

runners_features['win%_365D'] = runners_features['wins_365D'] / runners_features['runs_365D']
runners_features = runners_features.drop('wins_365D', axis = 1)

runners_df = runners_df.merge(runners_features,
                              how = 'left',
                              on = 'FastTrack_DogId')

# Only run on races where every dog has non-null features
markets_exclude = runners_df[runners_df.isnull().any(axis = 1)]['market_id'].drop_duplicates()
runners_df = runners_df[~runners_df['market_id'].isin(markets_exclude)]

print("{0} markets are excluded".format(str(len(markets_exclude))))

# Convert the feature variables into Z-scores
for col in ['Prizemoney_365D', 'runs_365D', 'win%_365D',
            'RunTime_norm_best_365D', 'RunTime_norm_median_365D']:
    runners_df[col + '_Z'] = runners_df.groupby('market_id')[col].transform(
        lambda x: zscore(x, ddof = 1))

runners_df['runs_365D_Z'] = runners_df['runs_365D_Z'].fillna(0)
runners_df['win%_365D_Z'] = runners_df['win%_365D_Z'].fillna(0)
6 markets are excluded

Attach the model output onto the runners_df DataFrame. We will also scale the probabilities to sum to unity (same as what we did when assessing the trained model outputs in step 2).

Let's also add a column for model fair odds which is just the reciprocal of the prob_scaled. We'll also add another column for the minimum back odds we're willing to take assuming we'd only bet off a 10% model overlay.

runners_df['prob_unscaled'] = logit_model.predict_proba(runners_df[feature_cols])[:,1]
runners_df['prob_scaled'] = runners_df.groupby('market_id')['prob_unscaled'].transform(lambda x: x / sum(x))
runners_df['model_fairodds'] = 1 / runners_df['prob_scaled']
runners_df['min_odds'] = (0.1 + 1) / runners_df['prob_scaled']
runners_df[['market_id', 'runner_name', 'event_venue', 'prob_scaled', 'model_fairodds', 'min_odds']].head()
market_id runner_name event_venue prob_scaled model_fairodds min_odds
0 1.184472300 LITTLE MISS VANE Albion Park 0.056184 17.798518 19.578370
1 1.184472300 MULGOWIE BELLE Albion Park 0.376277 2.657620 2.923382
2 1.184472300 NIGHT CAPERS Albion Park 0.325158 3.075425 3.382967
3 1.184472300 WRONG GIRL HARRY Albion Park 0.152564 6.554620 7.210082
4 1.184472300 IM ON FIRE Albion Park 0.089817 11.133812 12.247193

Now we can start betting!

For demonstration, we'll only bet on one market, but it's just as easy to set it up to bet on all markets based on your model probabilities. Let's take the first market only and create a separate DataFrame from runners_df with only the runners in that market.

market_id = win_markets_df['market_id'][0]
bet_df = runners_df[runners_df['market_id'] == market_id].reset_index(drop = True)
bet_df
market_id runner_id runner_name event_venue race_number FastTrack_DogId Prizemoney_365D RunTime_norm_best_365D RunTime_norm_median_365D runs_365D win%_365D Prizemoney_365D_Z runs_365D_Z win%_365D_Z RunTime_norm_best_365D_Z RunTime_norm_median_365D_Z prob_unscaled prob_scaled model_fairodds min_odds
0 1.184472300 36594055 LITTLE MISS VANE Albion Park 1 434800466 2175.0 -0.049480 0.538154 12.0 0.083333 -0.114708 0.70791 -0.774043 1.632343 0.269920 0.037524 0.056184 17.798518 19.578370
1 1.184472300 39860314 MULGOWIE BELLE Albion Park 1 510731455 1850.0 -1.029058 -0.897651 2.0 0.500000 -0.818108 -0.97759 1.600323 -0.174870 -1.638741 0.251304 0.376277 2.657620 2.923382
2 1.184472300 39860315 NIGHT CAPERS Albion Park 1 415994834 2500.0 -1.513865 0.231713 5.0 0.200000 0.588691 -0.47194 -0.109221 -1.069288 -0.137442 0.217164 0.325158 3.075425 3.382967
3 1.184472300 38079770 WRONG GIRL HARRY Albion Park 1 443645048 1750.0 -1.151977 0.743528 4.0 0.250000 -1.034539 -0.64049 0.175703 -0.401644 0.542930 0.101893 0.152564 6.554620 7.210082
4 1.184472300 37616746 IM ON FIRE Albion Park 1 448841452 2865.0 -0.926976 1.059778 16.0 0.062500 1.378664 1.38211 -0.892762 0.013458 0.963332 0.059986 0.089817 11.133812 12.247193

One thing we have to ensure is that the odds we place adhere to the Betfair price increment structure. For example, odds of 19.578370 are not valid odds to place a bet on; if we tried, we would get an INVALID_ODDS error. For more information, see Betfair's documentation on valid price increments.

We'll create a function that rounds odds up to the nearest valid price increment and apply this to our min_odds field.

def roundUpOdds(odds):
    if odds < 2:
        return math.ceil(odds * 100) / 100
    elif odds < 3:
        return math.ceil(odds * 50) / 50
    elif odds < 4:
        return math.ceil(odds * 20) / 20
    elif odds < 6:
        return math.ceil(odds * 10) / 10
    elif odds < 10:
        return math.ceil(odds * 5) / 5
    elif odds < 20:
        return math.ceil(odds * 2) / 2
    elif odds < 30:
        return math.ceil(odds * 1) / 1
    elif odds < 50:
        return math.ceil(odds * 0.5) / 0.5
    elif odds < 100:
        return math.ceil(odds * 0.2) / 0.2
    elif odds < 1000:
        return math.ceil(odds * 0.1) / 0.1
    else:
        return odds

bet_df['min_odds'] = bet_df['min_odds'].apply(roundUpOdds)
bet_df
market_id runner_id runner_name event_venue race_number FastTrack_DogId Prizemoney_365D RunTime_norm_best_365D RunTime_norm_median_365D runs_365D win%_365D Prizemoney_365D_Z runs_365D_Z win%_365D_Z RunTime_norm_best_365D_Z RunTime_norm_median_365D_Z prob_unscaled prob_scaled model_fairodds min_odds
0 1.184472300 36594055 LITTLE MISS VANE Albion Park 1 434800466 2175.0 -0.049480 0.538154 12.0 0.083333 -0.114708 0.70791 -0.774043 1.632343 0.269920 0.037524 0.056184 17.798518 20.00
1 1.184472300 39860314 MULGOWIE BELLE Albion Park 1 510731455 1850.0 -1.029058 -0.897651 2.0 0.500000 -0.818108 -0.97759 1.600323 -0.174870 -1.638741 0.251304 0.376277 2.657620 2.94
2 1.184472300 39860315 NIGHT CAPERS Albion Park 1 415994834 2500.0 -1.513865 0.231713 5.0 0.200000 0.588691 -0.47194 -0.109221 -1.069288 -0.137442 0.217164 0.325158 3.075425 3.40
3 1.184472300 38079770 WRONG GIRL HARRY Albion Park 1 443645048 1750.0 -1.151977 0.743528 4.0 0.250000 -1.034539 -0.64049 0.175703 -0.401644 0.542930 0.101893 0.152564 6.554620 7.40
4 1.184472300 37616746 IM ON FIRE Albion Park 1 448841452 2865.0 -0.926976 1.059778 16.0 0.062500 1.378664 1.38211 -0.892762 0.013458 0.963332 0.059986 0.089817 11.133812 12.50
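One caveat with roundUpOdds is binary floating point: a price that already sits on a valid increment can be pushed up one tick. For instance, 1.1 * 100 evaluates to 110.00000000000001, so roundUpOdds(1.1) returns 1.11. The sketch below is one possible alternative that does the same banded rounding with Python's decimal module. The names PRICE_BANDS and round_up_odds_decimal are made up for this illustration, and the bands simply mirror the thresholds used in roundUpOdds.

```python
from decimal import Decimal, ROUND_CEILING

# (exclusive upper bound of the band, tick size), mirroring roundUpOdds
PRICE_BANDS = [
    (Decimal("2"), Decimal("0.01")),
    (Decimal("3"), Decimal("0.02")),
    (Decimal("4"), Decimal("0.05")),
    (Decimal("6"), Decimal("0.1")),
    (Decimal("10"), Decimal("0.2")),
    (Decimal("20"), Decimal("0.5")),
    (Decimal("30"), Decimal("1")),
    (Decimal("50"), Decimal("2")),
    (Decimal("100"), Decimal("5")),
    (Decimal("1000"), Decimal("10")),
]

def round_up_odds_decimal(odds):
    """Round odds up to the next tick of its band using exact decimal arithmetic."""
    price = Decimal(str(odds))  # str() avoids importing the float's binary error
    for upper, tick in PRICE_BANDS:
        if price < upper:
            ticks = (price / tick).to_integral_value(rounding=ROUND_CEILING)
            return float(ticks * tick)
    return float(price)  # 1000 and above: returned unchanged, as in roundUpOdds

print(round_up_odds_decimal(1.1))
print(round_up_odds_decimal(19.578370))
```

On the five min_odds values in bet_df this gives the same results as roundUpOdds; the two only differ on inputs that are already valid prices but are not exactly representable as floats.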

Now that we have valid minimum odds that we want to bet on for each selection, we'll start betting. The following function will place a standard limit bet on Betfair on the specified market_id and selection_id at the specified size and price.

# Create a function to place a bet using betfairlightweight
def placeBackBet(instance, market_id, selection_id, size, price):
    order_filter = filters.limit_order(
        size = size,
        price = price,
        persistence_type = "LAPSE"
    )
    instructions_filter = filters.place_instruction(
        selection_id = str(selection_id),
        order_type = "LIMIT",
        side = "BACK",
        limit_order = order_filter
    )
    order  = instance.betting.place_orders(
        market_id = market_id,
        instructions = [instructions_filter]
    )
    # order.status is the overall result reported by Betfair (e.g. SUCCESS or FAILURE)
    print("Bet placed on selection {0} is {1}".format(selection_id, order.status))
    return order

Now let's loop through the runners in bet_df and place a bet of $5 on each runner at the minimum odds we're willing to take.

for selection_id, min_odds in zip(bet_df['runner_id'], bet_df['min_odds']):
    placeBackBet(trading, market_id, selection_id, 5, min_odds)
Bet placed on selection 36594055 is SUCCESS
Bet placed on selection 39860314 is SUCCESS
Bet placed on selection 39860315 is SUCCESS
Bet placed on selection 38079770 is SUCCESS
Bet placed on selection 37616746 is SUCCESS
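It can help to work out what this set of bets is worth before relying on it. The sketch below is illustrative arithmetic only: it assumes every $5 bet is fully matched at exactly its min_odds price and ignores commission, neither of which is guaranteed for a limit order with LAPSE persistence. The odds are the rounded min_odds values from bet_df above.

```python
stake = 5.0

# Rounded min_odds per runner, taken from bet_df above
min_odds = {
    "LITTLE MISS VANE": 20.0,
    "MULGOWIE BELLE": 2.94,
    "NIGHT CAPERS": 3.4,
    "WRONG GIRL HARRY": 7.4,
    "IM ON FIRE": 12.5,
}

total_staked = stake * len(min_odds)

# A winning back bet returns stake * odds; every other stake is lost
net_if_wins = {
    name: round(stake * odds - total_staked, 2)
    for name, odds in min_odds.items()
}

for name, net in net_if_wins.items():
    print(f"{name}: {net:+.2f}")
```

Under these assumptions the total outlay is $25, and the result is negative if either of the two shortest-priced runners wins. That is expected here: backing every runner is purely a demonstration of the order-placement mechanics, not a staking strategy.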

And success! We have downloaded historical greyhound form data from FastTrack, built a simple model, and bet off this model using the Betfair API.


Disclaimer

Note that whilst models and automated strategies are fun and rewarding to create, we can't promise that your model or betting strategy will be profitable, and we make no representations in relation to the code shared or information on this page. If you're using this code or implementing your own strategies, you do so entirely at your own risk and you are responsible for any winnings/losses incurred. Under no circumstances will Betfair be liable for any loss or damage you suffer.
