Greyhound form FastTrack tutorial
Building a model from greyhound historic data to place bets on
Deprecation
The FastTrack API has been replaced by the Topaz API, and the tutorial below will not work with the new Topaz API. It is displayed here as learning material only. Please visit the Topaz API tutorial.
Overview
This tutorial will walk through how to retrieve historic greyhound form data from FastTrack by accessing their Data Download Centre (DDC). We will then build a simple model on the data to demonstrate how we can then easily start betting on Betfair using the Betfair API. The tutorial will be broken up into four sections:
- Download historic greyhound data from FastTrack DDC
- Build a simple machine learning model
- Retrieve today's race lineups from FastTrack and Betfair API
- Run model on today's lineups and start betting
# Import libraries
import betfairlightweight
from betfairlightweight import filters
from datetime import datetime
from datetime import timedelta
from dateutil import tz
import math
import numpy as np
import pandas as pd
from scipy.stats import zscore
from sklearn.linear_model import LogisticRegression
import fasttrack as ft
1. Download historic greyhound data from FastTrack
Create a FastTrack object
Enter your FastTrack security key and create a FastTrack object with it; this will also check whether the key is valid. If the key is valid, a "Valid Security Key" message will be printed. The created 'greys' object will allow us to call a bunch of functions that interact with the FastTrack DDC.
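Putting that together, a minimal sketch, assuming the fasttrack helper module exposes a Fasttrack class that validates the key on construction:

# Your FastTrack security key (the value below is a placeholder)
seckey = 'your-fasttrack-security-key'
# Create the FastTrack object; this should print "Valid Security Key" if the key checks out
greys = ft.Fasttrack(seckey)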
Find a list of greyhound tracks and FastTrack track codes
Call the listTracks function which creates a DataFrame containing all the greyhound tracks, their track codes and their state.
Later on in this tutorial, we will be building a greyhound model on QLD tracks only so let's create a list of the QLD FastTrack track codes which will be used later to filter our data downloads for only QLD tracks.
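Something like the following should do it (note that the column names returned by listTracks are assumptions here):

# DataFrame of all greyhound tracks, their FastTrack codes and their state
track_codes = greys.listTracks()
# Keep only the QLD track codes for filtering our downloads later
tracks_filter = list(track_codes[track_codes['state'] == 'QLD']['track_code'])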
Call the getRaceResults function
Call the getRaceResults function which will retrieve race details and historic results for all races between two dates. The function takes in two parameters and one optional third parameter. Two DataFrames are returned, the first contains all the details for each race and the second contains the dog results for each race.
getRaceResults(dt_start, dt_end, tracks = None)
- dt_start: the start date of the results you want to retrieve (str, yyyy-mm-dd)
- dt_end: the end date of the results you want to retrieve (str, yyyy-mm-dd)
- tracks: optional parameter which restricts the download to only races at the tracks in this list; if left blank, all tracks will be downloaded (list of str)
In this example, we'll retrieve data from 2018-01-01 to 2021-06-15 and restrict the download to our tracks_filter list, which contains only the QLD track codes.
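Using the signature above, the call itself is a one-liner:

# Retrieve race details and dog results for QLD tracks between the two dates
race_details, dog_results = greys.getRaceResults('2018-01-01', '2021-06-15', tracks_filter)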
Here we do some basic data manipulation and cleansing to get the variables into a format we can work with, and add a few variables that will be handy down the track. Nothing too special here.
race_details['Distance'] = race_details['Distance'].apply(lambda x: int(x.replace("m", "")))
race_details = race_details.rename(columns = {'@id': 'FastTrack_RaceId'})
race_details['date_dt'] = pd.to_datetime(race_details['date'], format = '%d %b %y')
race_details['trackdist'] = race_details['Track'] + race_details['Distance'].astype(str)
dog_results = dog_results.rename(columns = {'@id': 'FastTrack_DogId', 'RaceId': 'FastTrack_RaceId'})
dog_results['StartPrice'] = dog_results['StartPrice'].apply(
    lambda x: None if x is None else float(x.replace('$', '').replace('F', '')))
dog_results = dog_results[~dog_results['Box'].isnull()]
dog_results = dog_results.merge(
race_details[['FastTrack_RaceId', 'Distance', 'RaceGrade', 'Track', 'date_dt', 'trackdist']],
how = 'left',
on = 'FastTrack_RaceId'
)
dog_results['RunTime'] = dog_results['RunTime'].astype(float)
dog_results['Prizemoney'] = dog_results['Prizemoney'].astype(float)
dog_results['win'] = dog_results['Place'].apply(lambda x: 1 if x in ['1', '1='] else 0)
print("Number of races in dataset: " + str(dog_results['FastTrack_RaceId'].nunique()))
2. Build a simple machine learning model
* NOTE: This model is not profitable. It is provided for educational purposes only. *
Construct some simple features
We'll start by constructing some simple features. Normally we'd explore the data, but the objective of this tutorial is to demonstrate how to connect to FastTrack and Betfair so we'll skip the exploration step and jump straight to model building to generate some probability outputs.
dog_results = dog_results.sort_values(by = ['FastTrack_DogId', 'date_dt'])
dog_results = dog_results.set_index('date_dt')
# Normalise the run times within each trackdist so we can compare runs across different track/distance
# combinations. This makes the simplifying (and unrealistic) assumption that a dog that can run a good
# time at one track/distance combination can run a good time at a different one
dog_results['RunTime_norm'] = dog_results.groupby('trackdist')['RunTime'].transform(lambda x: zscore(x, nan_policy = 'omit'))
# Feature 1 - Total prize money won over the last 365 Days
dog_results['Prizemoney_365D'] = dog_results.groupby('FastTrack_DogId')['Prizemoney'].apply(lambda x: x.rolling("365D").sum().shift(1))
dog_results['Prizemoney_365D'].fillna(0, inplace = True)
# Feature 2 - Number of runs over the last 365D
dog_results['runs_365D'] = dog_results.groupby('FastTrack_DogId')['win'].apply(lambda x: x.rolling("365D").count().shift(1))
dog_results['runs_365D'].fillna(0, inplace = True)
# Feature 3 - win % over the last 365D
dog_results['wins_365D'] = dog_results.groupby('FastTrack_DogId')['win'].apply(lambda x: x.rolling("365D").sum().shift(1))
dog_results['wins_365D'].fillna(0, inplace = True)
dog_results['win%_365D'] = dog_results['wins_365D'] / dog_results['runs_365D']
# Feature 4 - Best runtime over the last 365D
dog_results['RunTime_norm_best_365D'] = dog_results.groupby('FastTrack_DogId')['RunTime_norm'].apply(lambda x: x.rolling("365D").min().shift(1))
# Feature 5 - Median runtime over the last 365D
dog_results['RunTime_norm_median_365D'] = dog_results.groupby('FastTrack_DogId')['RunTime_norm'].apply(lambda x: x.rolling("365D").median().shift(1))
dog_results.head(10)
Convert all features into Z-scores within each race so that the features are on a relative basis when fed into the model.
dog_results = dog_results.sort_values(by = ['date_dt', 'FastTrack_RaceId'])
for col in ['Prizemoney_365D', 'runs_365D', 'win%_365D',
'RunTime_norm_best_365D', 'RunTime_norm_median_365D']:
dog_results[col + '_Z'] = dog_results.groupby('FastTrack_RaceId')[col].transform(lambda x: zscore(x, ddof = 1))
dog_results['runs_365D_Z'].fillna(0, inplace = True)
dog_results['win%_365D_Z'].fillna(0, inplace = True)
Train the model
Next, we'll train our model. To keep things simple, we'll choose a Logistic Regression from the sklearn package.
For modelling purposes, we'll only keep data from 2019 onwards, as our features use the last 365 days of history and data in 2018 won't capture an entire 365 day period. We'll also only keep races where each dog has a value for every feature. The last piece of code double checks that the DataFrame has no null values.
dog_results = dog_results.reset_index()
dog_results = dog_results.sort_values(by = ['date_dt', 'FastTrack_RaceId'])
# Only keep data from 2019 onwards
model_df = dog_results[dog_results['date_dt'] >= '2019-01-01']
feature_cols = ['Prizemoney_365D_Z', 'runs_365D_Z', 'win%_365D_Z',
'RunTime_norm_best_365D_Z', 'RunTime_norm_median_365D_Z']
model_df = model_df[['date_dt', 'FastTrack_RaceId', 'DogName', 'win', 'StartPrice'] + feature_cols]
# Only train model off of races where each dog has a value for each feature
races_exclude = model_df[model_df.isnull().any(axis = 1)]['FastTrack_RaceId'].drop_duplicates()
model_df = model_df[~model_df['FastTrack_RaceId'].isin(races_exclude)]
# checking if any null values
model_df.drop(columns = 'StartPrice').isnull().values.any()
We will use pre-2021 as our train dataset and post-2021 as our test dataset which gives us an approximate 80/20 split of train to test data.
Note that one issue with training our model this way is that we are training each dog result individually and not in conjunction with the other dogs in the race. Therefore the probabilities are not guaranteed to add up to 1.
# Split the data into train and test data
train_data = model_df[model_df['date_dt'] < '2021-01-01'].reset_index(drop = True)
test_data = model_df[model_df['date_dt'] >= '2021-01-01'].reset_index(drop = True)
train_x, train_y = train_data[feature_cols], train_data['win']
test_x, test_y = test_data[feature_cols], test_data['win']
logit_model = LogisticRegression()
logit_model.fit(train_x, train_y)
test_data['prob_unscaled'] = logit_model.predict_proba(test_x)[:,1]
test_data.groupby('FastTrack_RaceId')['prob_unscaled'].sum()
To correct for this, we'll apply a scaling factor to the model's raw outputs to force them to sum to 1. A better way to do this would be to use a conditional logistic regression, which would ensure the probabilities sum to unity as part of the training process.
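A minimal version of that scaling step (the same renormalisation we apply again to live markets in step 4):

# Scale the raw model outputs so the probabilities in each race sum to 1
test_data['prob_scaled'] = test_data.groupby('FastTrack_RaceId')['prob_unscaled'].apply(lambda x: x / sum(x))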
As a rudimentary check, let's see how many races the model correctly predicts using the highest probability in a given race as our pick. We'll also do the same for the starting price odds as a comparison.
The model predicts the winner in 33% of races, which is not great given the starting price predicts it in 41.7% of races ... but it will do for our purposes!
# Create a boolean column for whether a dog has the highest model prediction in a race. Do the same for
# the starting price as a comparison
test_data['model_win_prediction'] = test_data.groupby('FastTrack_RaceId')['prob_scaled'].apply(lambda x: x == max(x))
test_data['odds_win_prediction'] = test_data.groupby('FastTrack_RaceId')['StartPrice'].apply(lambda x: x == min(x))
print("Model predicts the winner in {:.2%} of races".format(
len(test_data[(test_data['model_win_prediction'] == True) & (test_data['win'] == 1)]) / test_data['FastTrack_RaceId'].nunique()
))
print("Starting Price Odds predicts the winner in {:.2%} of races".format(
len(test_data[(test_data['odds_win_prediction'] == True) & (test_data['win'] == 1)]) / test_data['FastTrack_RaceId'].nunique()
))
3. Retrieve today's race lineups
Retrieve today's lineups from FastTrack
Now that we have trained our model, we want to get today's races from FastTrack and run the model over them.
We have two options from FastTrack:
- Basic Plus Format: Contains basic information about the dog lineups such as box, best time, trainer, owner, ratings, speed ratings ...
- Full Plus Format: Contains everything in the basic format with additional information such as previous start information.
getBasicFormat(dt, tracks = None)
getFullFormat(dt, tracks = None)
Each call returns two DataFrames, one with the race information and one with the individual dog information. Again, the tracks parameter is optional; if left blank, all tracks will be returned.
As we are only after the dog lineups to run our model on, let's just grab the basic format, again restricted to QLD tracks only. We'll also create a list of the QLD tracks running today, which will be used later when we fetch the Betfair data.
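A sketch of those two steps (the exact date string format expected by getBasicFormat is an assumption here):

# Today's date as a yyyy-mm-dd string
todays_date = datetime.today().strftime('%Y-%m-%d')
# Basic format lineups, restricted to QLD tracks
qld_races_today, qld_dogs_today = greys.getBasicFormat(todays_date, tracks_filter)
# QLD tracks racing today, used to filter the Betfair markets below
todays_tracks = list(qld_races_today['Track'].unique())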
Retrieve today's lineups from the Betfair API
The FastTrack lineups contain all the dogs in a race, including reserves and scratched dogs. As we only want to run our model on final lineups, we'll need to connect to the Betfair API to update our lineups for any scratchings.
Let's first log in to the Betfair API. Enter your username, password and API key and create a betfairlightweight object.
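A minimal interactive login might look like this (the credentials below are placeholders):

# Placeholders - substitute your own Betfair username, password and app key
trading = betfairlightweight.APIClient('your_username', 'your_password', app_key='your_app_key')
trading.login_interactive()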
Next, we'll call the list_events operation which will return all the greyhound events in Australia over the next 24 hours.
# Create the market filter
greyhounds_event_filter = filters.market_filter(
event_type_ids=[4339],
market_countries=['AU'],
market_start_time={
        'to': (datetime.utcnow() + timedelta(days=1)).strftime("%Y-%m-%dT%H:%M:%SZ")
}
)
# Get a list of all greyhound events as objects
greyhounds_events = trading.betting.list_events(
filter=greyhounds_event_filter
)
# Create a DataFrame with all the events by iterating over each event object
greyhounds_events_today = pd.DataFrame({
'Event Name': [event_object.event.name for event_object in greyhounds_events],
'Event ID': [event_object.event.id for event_object in greyhounds_events],
'Event Venue': [event_object.event.venue for event_object in greyhounds_events],
'Country Code': [event_object.event.country_code for event_object in greyhounds_events],
'Time Zone': [event_object.event.time_zone for event_object in greyhounds_events],
'Open Date': [event_object.event.open_date for event_object in greyhounds_events],
'Market Count': [event_object.market_count for event_object in greyhounds_events]
})
greyhounds_events_today.head()
Next, let's fetch the market ids. As we know the meets we're interested in today, let's restrict the market catalogue request to only the QLD tracks that are running today.
market_catalogue_filter = filters.market_filter(
    event_ids = list(greyhounds_events_today[greyhounds_events_today['Event Venue'].isin(todays_tracks)]['Event ID']),
market_type_codes = ['WIN']
)
market_catalogue = trading.betting.list_market_catalogue(
filter=market_catalogue_filter,
max_results='1000',
sort='FIRST_TO_START',
market_projection=['MARKET_START_TIME', 'MARKET_DESCRIPTION', 'RUNNER_DESCRIPTION', 'EVENT', 'EVENT_TYPE']
)
win_markets = []
runners = []
for market_object in market_catalogue:
win_markets.append({
'event_name': market_object.event.name,
'event_id': market_object.event.id,
'event_venue': market_object.event.venue,
'market_name': market_object.market_name,
'market_id': market_object.market_id,
'market_start_time': market_object.market_start_time,
'total_matched': market_object.total_matched
})
for runner in market_object.runners:
runners.append({
'market_id': market_object.market_id,
'runner_id': runner.selection_id,
'runner_name': runner.runner_name
})
win_markets_df = pd.DataFrame(win_markets)
runners_df = pd.DataFrame(runners)
For matching purposes, we'll need to extract the race number from the market_name. Let's also add a 'local_start_time' field, as the market_start_time field is in UTC.
# Extract race number from market name
win_markets_df['race_number'] = win_markets_df['market_name'].apply(
lambda x: x[1:3].strip() if x[0] == 'R' else None)
# Function that converts a time from one timezone to another
def change_timezone(time, oldtz, newtz):
from_zone = tz.gettz(oldtz)
to_zone = tz.gettz(newtz)
newtime = time.replace(tzinfo = from_zone).astimezone(to_zone).replace(tzinfo = None)
return newtime
# Add in a local_start_time variable
win_markets_df['local_start_time'] = win_markets_df['market_start_time'].apply(lambda x: \
change_timezone(x, 'UTC', 'Australia/Sydney'))
win_markets_df.head()
To match the dog names from Betfair and FastTrack, we'll also need to remove the rug number from the start of the runner_name in the runners_df DataFrame.
# Remove dog number from runner_name
runners_df['runner_name'] = runners_df['runner_name'].apply(lambda x: x[(x.find(" ") + 1):].upper())
# Merge on the race number and event venue onto runners_df
runners_df = runners_df.merge(
win_markets_df[['market_id', 'event_venue', 'race_number']],
how = 'left',
on = 'market_id')
runners_df.head()
Merge race lineups from FastTrack and Betfair
Before we can merge, we'll need to make some minor formatting changes to the FastTrack names so we can match them onto the Betfair names. Betfair excludes all apostrophes and full stops in its naming convention, so we'll create a Betfair-equivalent dog name on the dataset by removing these characters. We'll also tag the race number onto the lineups dataset for merging purposes.
qld_races_today = qld_races_today.rename(columns = {'@id': 'FastTrack_RaceId'})
qld_races_today = qld_races_today[['FastTrack_RaceId', 'Date', 'Track', 'RaceNum', 'RaceName',
'RaceTime', 'Distance', 'RaceGrade']]
qld_dogs_today = qld_dogs_today.rename(columns = {'@id': 'FastTrack_DogId', 'RaceId': 'FastTrack_RaceId'})
qld_dogs_today = qld_dogs_today.merge(
qld_races_today[['FastTrack_RaceId', 'Track', 'RaceNum']],
how = 'left',
on = 'FastTrack_RaceId'
)
qld_dogs_today['DogName_bf'] = qld_dogs_today['DogName'].apply(lambda x: x.replace("'", "").replace(".", "").replace("Res", "").strip())
Now we can merge on the FastTrack and Betfair lineup dataframes by dog name, track and race number. We'll check that all selections have been matched by making sure there are no null dog ids.
# Match on the fastTrack dogId to the runners_df
runners_df = runners_df.merge(
qld_dogs_today[['DogName_bf', 'Track', 'RaceNum', 'FastTrack_DogId']],
how = 'left',
left_on = ['runner_name', 'event_venue', 'race_number'],
right_on = ['DogName_bf', 'Track', 'RaceNum'],
).drop(['DogName_bf', 'Track', 'RaceNum'], axis = 1)
# Check all betfair selections are matched to a fastTrack dogId by checking if there are any null dogIds
runners_df['FastTrack_DogId'].isnull().any()
4. Run model on today's lineups and start betting
Create model features for the runners
First we have to create the same model features we used in our logistic regression model on the dogs in the runners_df DataFrame. As our features use historic data over the last 365 days, we'll need to filter our historic results dataset (created in step 1) for only the dog ids we are interested in and only over the last 365 days.
runners_historicdata = dog_results[dog_results['FastTrack_DogId'].isin(runners_df['FastTrack_DogId'])]
runners_historicdata = runners_historicdata.sort_values(by = ['FastTrack_DogId', 'date_dt'])
runners_historicdata = runners_historicdata[runners_historicdata['date_dt'] >= (datetime.now() - timedelta(days = 365))]
Next we create the features. As our trained model requires a non-null value in each of the features, we'll exclude all markets where at least one dog has a null feature.
# Create the feature variables over the last 365 days
runners_features = runners_historicdata.groupby('FastTrack_DogId').agg(
Prizemoney_365D = ('Prizemoney', 'sum'),
RunTime_norm_best_365D = ('RunTime_norm', 'min'),
RunTime_norm_median_365D = ('RunTime_norm', 'median'),
runs_365D = ('FastTrack_RaceId', 'count'),
wins_365D = ('win', 'sum')
).reset_index()
runners_features['win%_365D'] = runners_features['wins_365D'] / runners_features['runs_365D']
runners_features = runners_features.drop('wins_365D', axis = 1)
runners_df = runners_df.merge(runners_features,
how = 'left',
on = 'FastTrack_DogId')
# Only run on races where every dog has non-null features
markets_exclude = runners_df[runners_df.isnull().any(axis = 1)]['market_id'].drop_duplicates()
runners_df = runners_df[~runners_df['market_id'].isin(markets_exclude)]
print("{0} markets are excluded".format(str(len(markets_exclude))))
# Convert the feature variables into Z-scores
for col in ['Prizemoney_365D', 'runs_365D', 'win%_365D',
'RunTime_norm_best_365D', 'RunTime_norm_median_365D']:
runners_df[col + '_Z'] = runners_df.groupby('market_id')[col].transform(
lambda x: zscore(x, ddof = 1))
runners_df['runs_365D_Z'].fillna(0, inplace = True)
runners_df['win%_365D_Z'].fillna(0, inplace = True)
Attach the model output onto the runners_df DataFrame. We will also scale the probabilities to sum to unity (the same as when we assessed the trained model outputs in step 2). Let's also add a column for model fair odds, which is just the reciprocal of prob_scaled, and another column for the minimum back odds we're willing to take, assuming we'd only bet with at least a 10% model overlay.
runners_df['prob_unscaled'] = logit_model.predict_proba(runners_df[feature_cols])[:,1]
runners_df['prob_scaled'] = runners_df.groupby('market_id')['prob_unscaled'].apply(lambda x: x / sum(x))
runners_df['model_fairodds'] = 1 / runners_df['prob_scaled']
runners_df['min_odds'] = (0.1 + 1) / runners_df['prob_scaled']
runners_df[['market_id', 'runner_name', 'event_venue', 'prob_scaled', 'model_fairodds', 'min_odds']].head()
Now we can start betting!
For demonstration, we'll only bet on one market, but it's just as easy to set it up to bet on all markets based on your model probabilities. Let's take the first market only and create a separate DataFrame from runners_df with only those runners in that market.
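For example, taking whichever market appears first in runners_df:

# Keep only the runners in the first market
first_market_id = runners_df['market_id'].iloc[0]
bet_df = runners_df[runners_df['market_id'] == first_market_id].copy()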
One thing we have to ensure is that the odds we place adhere to the Betfair price increment structure. For example, odds of 19.578370 are not valid odds to place a bet at; if we tried, we would get an INVALID_ODDS error. For more information on valid price increments click here. We'll create a function that rounds odds up to the nearest valid price increment and apply it to our min_odds field.
# Round odds up to the nearest valid Betfair price increment
# (0.01 below 2, 0.02 to 3, 0.05 to 4, 0.1 to 6, 0.2 to 10,
# 0.5 to 20, 1 to 30, 2 to 50, 5 to 100, 10 to 1000)
def roundUpOdds(odds):
if odds < 2:
return math.ceil(odds * 100) / 100
elif odds < 3:
return math.ceil(odds * 50) / 50
elif odds < 4:
return math.ceil(odds * 20) / 20
elif odds < 6:
return math.ceil(odds * 10) / 10
elif odds < 10:
return math.ceil(odds * 5) / 5
elif odds < 20:
return math.ceil(odds * 2) / 2
elif odds < 30:
return math.ceil(odds * 1) / 1
elif odds < 50:
return math.ceil(odds * 0.5) / 0.5
elif odds < 100:
return math.ceil(odds * 0.2) / 0.2
elif odds < 1000:
return math.ceil(odds * 0.1) / 0.1
else:
return odds
bet_df['min_odds'] = bet_df['min_odds'].apply(lambda x: roundUpOdds(x))
bet_df
Now that we have valid minimum odds for each selection, we'll start betting. The following function places a standard limit bet on Betfair on the specified market_id and selection_id at the specified size and price.
# Create a function to place a bet using betfairlightweight
def placeBackBet(instance, market_id, selection_id, size, price):
order_filter = filters.limit_order(
size = size,
price = price,
persistence_type = "LAPSE"
)
instructions_filter = filters.place_instruction(
selection_id = str(selection_id),
order_type = "LIMIT",
side = "BACK",
limit_order = order_filter
)
order = instance.betting.place_orders(
market_id = market_id,
instructions = [instructions_filter]
)
print("Bet Place on selection {0} is {1}".format(str(selection_id), order.__dict__['_data']['status']))
return order
Now let's loop through the runners in bet_df and place a bet of $5 on each runner at the minimum odds we're willing to take.
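A minimal version of that loop, using the placeBackBet function defined above:

# Place a $5 back bet on each runner at our minimum acceptable odds
for row in bet_df.itertuples():
    placeBackBet(trading, row.market_id, row.runner_id, 5.00, row.min_odds)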
And success! We have downloaded historical greyhound form data from FastTrack, built a simple model, and bet off this model using the Betfair API.
Disclaimer
Note that whilst models and automated strategies are fun and rewarding to create, we can't promise that your model or betting strategy will be profitable, and we make no representations in relation to the code shared or information on this page. If you're using this code or implementing your own strategies, you do so entirely at your own risk and you are responsible for any winnings/losses incurred. Under no circumstances will Betfair be liable for any loss or damage you suffer.