Greyhound form FastTrack tutorial

Building a model from greyhound historic data to place bets on Betfair

Workshop

Overview

This tutorial will walk through how to retrieve historic greyhound form data from FastTrack by accessing their Data Download Centre (DDC). We will then build a simple model on the data to demonstrate how we can then easily start betting on Betfair using the Betfair API. The tutorial will be broken up into four sections:

Download historic greyhound data from FastTrack DDC
Build a simple machine learning model
Retrieve today's race lineups from FastTrack and Betfair API
Run model on today's lineups and start betting

Requirements

You will need a Betfair API app key. If you don't have one, please follow the steps outlined on The Automation Hub.
You will need your own FastTrack security key. Please note - The FastTrack DDC has been moved across to the new Topaz API as of December 2023. This means that, while we can source a key for you, the code in this tutorial will not work for the Topaz API. We will be updating this tutorial in early 2024 to reflect the new Topaz API nomenclature and documentation. (Additionally, only Australia and New Zealand customers are eligible for a free FastTrack key). If you would like to be considered for a FastTrack Topaz key, please email data@betfair.com.au.
This notebook and accompanying files are shared on betfair-downunder's GitHub.
You can watch our workshop working through this tutorial on YouTube.

# Import libraries
import betfairlightweight
from betfairlightweight import filters
from datetime import datetime
from datetime import timedelta
from dateutil import tz
import math
import numpy as np
import pandas as pd
from scipy.stats import zscore
from sklearn.linear_model import LogisticRegression
import fasttrack as ft

1. Download historic greyhound data from FastTrack

Create a FastTrack object

Enter your FastTrack security key and create a Fasttrack object with it; creating the object also checks whether the key is valid. If the key is valid, a "Valid Security Key" message will be printed. The resulting 'greys' object lets us call a range of functions that interact with the FastTrack DDC.

seckey = "your_security_key"
greys = ft.Fasttrack(seckey)

Valid Security Key

Find a list of greyhound tracks and FastTrack track codes

Call the listTracks function which creates a DataFrame containing all the greyhound tracks, their track codes and their state.

track_codes = greys.listTracks()
track_codes.head()

	track_name	track_code	state
0	Albury	223	NSW
1	Armidale	225	NSW
2	Bathurst	226	NSW
3	Broke Hill	227	NSW
4	Bulli	202	NSW

Later on in this tutorial, we will be building a greyhound model on QLD tracks only so let's create a list of the QLD FastTrack track codes which will be used later to filter our data downloads for only QLD tracks.

tracks_filter = list(track_codes[track_codes['state'] == 'QLD']['track_code'])
tracks_filter

['400',
 '409',
 '401',
 '402',
 '403',
 '404',
 '405',
 '406',
 '407',
 '408',
 '410',
 '411',
 '412',
 '414',
 '413']

Call the getRaceResults function

Call the getRaceResults function which will retrieve race details and historic results for all races between two dates. The function takes in two parameters and one optional third parameter. Two DataFrames are returned, the first contains all the details for each race and the second contains the dog results for each race.

getRaceResults(dt_start, dt_end, tracks = None)

dt_start: the start date of the results you want to retrieve (str yyyy-mm-dd)
dt_end: the end date of the results you want to retrieve (str yyyy-mm-dd)
tracks: optional parameter which will restrict the download to only races in this list. If left blank, all tracks will be downloaded (list of str)

In this example, we'll retrieve data from 2018-01-01 to 2021-06-15 and restrict the download to our tracks_filter list which contains only the QLD track codes.

race_details, dog_results = greys.getRaceResults('2018-01-01', '2021-06-15', tracks_filter)

Getting meets for each date ..

100%|██████████████████████████████████████████████████████████████████████████████| 1262/1262 [10:34<00:00,  1.99it/s]

Getting historic results details ..

100%|██████████████████████████████████████████████████████████████████████████████| 2045/2045 [22:22<00:00,  1.52it/s]

race_details.head()

	@id	RaceNum	RaceName	RaceTime	Distance	RaceGrade	Track	date
0	285107231	1	UBET - DOWNLOAD THE APP	06:24PM	520m	Maiden	Albion Park	01 Jan 18
1	285107232	2	THIRTY TALKS @ STUD	06:47PM	600m	Restricted Win	Albion Park	01 Jan 18
2	285107233	3	BOX 1 PHOTOGRAPHY	07:02PM	331m	Grade 5	Albion Park	01 Jan 18
3	285107234	4	ASPLEY LEAGUES CLUB	07:26PM	395m	Mixed 4/5	Albion Park	01 Jan 18
4	285107235	5	TWITTER @ BRISGREYS	07:52PM	520m	Mixed 3/4	Albion Park	01 Jan 18

dog_results.head()

	@id	Place	DogName	Box	Rug	Weight	StartPrice	Handicap	Margin1	Margin2	PIR	Comments	SplitMargin	RunTime	Prizemoney	RaceId	TrainerId	TrainerName
0	124886323	1	MERLOT HAYZE	2	2	27.5	$5.10	None	3.00	None	32	32	5.76	30.46	1260.00	285107231	12979	T Trigg
1	1362060038	2	SPIN THAT WHEEL	1	1	28.4	$2.70F	None	3.00	3.14	11	11	5.67	30.68	360.00	285107231	160421	C Schmidt
2	1770370034	3	SOMERVILLE	8	8	32.7	$11.70	None	6.25	3.29	23	23	5.75	30.91	180.00	285107231	69795	L Green
3	108391387	4	SYFY LEGEND	6	6	30.4	$8.30	None	15.75	9.43	54	54	5.81	31.57	0.00	285107231	82013	S Kleinhans
4	2032540059	5	GET MESSI	5	5	34.4	$10.20	None	17.25	1.57	46	46	5.80	31.68	0.00	285107231	87148	S Lawrance

Here we do some basic data manipulation and cleansing to get the variables into a format that we can work with. We also add a few variables that will be handy down the track. Nothing too special here.

race_details['Distance'] = race_details['Distance'].apply(lambda x: int(x.replace("m", "")))
race_details = race_details.rename(columns = {'@id': 'FastTrack_RaceId'})
race_details['date_dt'] = pd.to_datetime(race_details['date'], format = '%d %b %y')
race_details['trackdist'] = race_details['Track'] + race_details['Distance'].astype(str)

dog_results = dog_results.rename(columns = {'@id': 'FastTrack_DogId', 'RaceId': 'FastTrack_RaceId'})
dog_results['StartPrice'] = dog_results['StartPrice'].apply(lambda x: None if x is None
    else float(x.replace('$', '').replace('F', '')))
dog_results = dog_results[~dog_results['Box'].isnull()]
dog_results = dog_results.merge(
    race_details[['FastTrack_RaceId', 'Distance', 'RaceGrade', 'Track', 'date_dt', 'trackdist']], 
    how = 'left',
    on = 'FastTrack_RaceId'
)
dog_results['RunTime'] = dog_results['RunTime'].astype(float)
dog_results['Prizemoney'] = dog_results['Prizemoney'].astype(float)
dog_results['win'] = dog_results['Place'].apply(lambda x: 1 if x in ['1', '1='] else 0)

print("Number of races in dataset: " + str(dog_results['FastTrack_RaceId'].nunique()))

Number of races in dataset: 20760

2. Build a simple machine learning model

* NOTE: This model is not profitable. It is provided for educational purposes only. *

Construct some simple features

We'll start by constructing some simple features. Normally we'd explore the data, but the objective of this tutorial is to demonstrate how to connect to FastTrack and Betfair so we'll skip the exploration step and jump straight to model building to generate some probability outputs.

dog_results = dog_results.sort_values(by = ['FastTrack_DogId', 'date_dt'])
dog_results = dog_results.set_index('date_dt')

# Normalise the runtimes for each trackdist so we can compare runs across different track distance combinations.
# We are making an unrealistic assumption that a dog that can run a good time on one track distance can run a
# good time on a different track distance
dog_results['RunTime_norm'] = dog_results.groupby('trackdist')['RunTime'].transform(lambda x: zscore(x, nan_policy = 'omit'))

# Feature 1 - Total prize money won over the last 365 Days
dog_results['Prizemoney_365D'] = dog_results.groupby('FastTrack_DogId')['Prizemoney'].apply(lambda x: x.rolling("365D").sum().shift(1))
dog_results['Prizemoney_365D'].fillna(0, inplace = True)

# Feature 2 - Number of runs over the last 365D
dog_results['runs_365D'] = dog_results.groupby('FastTrack_DogId')['win'].apply(lambda x: x.rolling("365D").count().shift(1))
dog_results['runs_365D'].fillna(0, inplace = True)

# Feature 3 - win % over the last 365D
dog_results['wins_365D'] = dog_results.groupby('FastTrack_DogId')['win'].apply(lambda x: x.rolling("365D").sum().shift(1))
dog_results['wins_365D'].fillna(0, inplace = True)
dog_results['win%_365D'] = dog_results['wins_365D'] / dog_results['runs_365D']

# Feature 4 - Best runtime over the last 365D
dog_results['RunTime_norm_best_365D'] = dog_results.groupby('FastTrack_DogId')['RunTime_norm'].apply(lambda x: x.rolling("365D").min().shift(1))

# Feature 5 - Median runtime over the last 365D
dog_results['RunTime_norm_median_365D'] = dog_results.groupby('FastTrack_DogId')['RunTime_norm'].apply(lambda x: x.rolling("365D").median().shift(1))

dog_results.head(10)

	FastTrack_DogId	Place	DogName	Box	Rug	Weight	StartPrice	Handicap	Margin1	Margin2	...	Track	trackdist	win	RunTime_norm	Prizemoney_365D	runs_365D	wins_365D	win%_365D	RunTime_norm_best_365D	RunTime_norm_median_365D
date_dt
2018-04-08	-2143477289	3	SUNBURNT SWAMPY	3	3	31.6	11.2	None	4.75	1.86	...	Albion Park	Albion Park331	0	0.856147	0.0	0.0	0.0	NaN	NaN	NaN
2018-04-15	-2143477289	6	SUNBURNT SWAMPY	4	4	31.1	38.9	None	12.75	0.14	...	Albion Park	Albion Park331	0	0.991574	175.0	1.0	0.0	0.0	0.856147	0.856147
2018-04-22	-2143477289	6	SUNBURNT SWAMPY	5	5	30.7	29.1	None	9.50	4.57	...	Albion Park	Albion Park331	0	1.194715	175.0	2.0	0.0	0.0	0.856147	0.923861
2018-07-15	-2143477289	3	SUNBURNT SWAMPY	3	3	31.9	38.1	None	10.00	0.00	...	Albion Park	Albion Park331	0	0.675578	175.0	3.0	0.0	0.0	0.856147	0.991574
2018-09-02	-2143477289	6	SUNBURNT SWAMPY	2	2	32.8	11.7	None	8.25	3.57	...	Albion Park	Albion Park331	0	0.607864	350.0	4.0	0.0	0.0	0.675578	0.923861
2018-09-09	-2143477289	7	SUNBURNT SWAMPY	6	6	32.6	41.0	None	12.75	3.71	...	Albion Park	Albion Park331	0	1.262428	350.0	5.0	0.0	0.0	0.607864	0.856147
2018-09-16	-2143477289	4	SUNBURNT SWAMPY	1	1	32.3	18.0	None	1.50	0.43	...	Albion Park	Albion Park331	0	-0.385268	350.0	6.0	0.0	0.0	0.607864	0.923861
2018-10-14	-2143477289	5	SUNBURNT SWAMPY	8	8	32.3	5.5	None	11.25	1.29	...	Albion Park	Albion Park331	0	1.217286	350.0	7.0	0.0	0.0	-0.385268	0.856147
2018-11-18	-2143477289	7	SUNBURNT SWAMPY	3	3	32.8	21.0	None	9.25	1.71	...	Albion Park	Albion Park331	0	1.262428	350.0	8.0	0.0	0.0	-0.385268	0.923861
2019-05-26	-2143477289	4	SUNBURNT SWAMPY	7	7	31.7	71.0	None	11.00	1.86	...	Albion Park	Albion Park331	0	0.517579	350.0	9.0	0.0	0.0	-0.385268	0.991574

10 rows × 31 columns
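To make the rolling-window mechanics concrete, here is a minimal sketch on made-up data for a single dog (the dates and prize money are illustrative, not FastTrack output). It shows that `.shift(1)` makes each run see the window as it stood at the dog's previous run, which keeps the current result out of its own features. The `closed="left"` variant is an assumption about what one might prefer, not something this tutorial uses: it counts back 365 days from the current run instead.

```python
import pandas as pd

# Hypothetical prize money for one dog, indexed by race date
runs = pd.Series(
    [175.0, 0.0, 0.0, 175.0],
    index=pd.to_datetime(["2018-04-08", "2018-04-15", "2018-04-22", "2019-04-20"]),
    name="Prizemoney",
)

# Rolling 365-day sum that still includes the current run
including_current = runs.rolling("365D").sum()

# The tutorial's approach: shift by one run so the current result is excluded
prior_only = including_current.shift(1)

# Alternative (assumption): window covers the 365 days strictly before the current run
closed_left = runs.rolling("365D", closed="left").sum()

print(pd.DataFrame({
    "Prizemoney": runs,
    "including_current": including_current,
    "prior_only": prior_only,
    "closed_left": closed_left,
}))
```

In the last row the two approaches differ: the shifted value still counts the 2018-04-08 prize money, because it was inside the window at the previous run, while `closed="left"` drops it because it is more than 365 days before the current run.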

Convert all features into Z-scores within each race so that the features are on a relative basis when fed into the model

dog_results = dog_results.sort_values(by = ['date_dt', 'FastTrack_RaceId'])

for col in ['Prizemoney_365D', 'runs_365D', 'win%_365D',
            'RunTime_norm_best_365D', 'RunTime_norm_median_365D']:
    dog_results[col + '_Z'] = dog_results.groupby('FastTrack_RaceId')[col].transform(lambda x: zscore(x, ddof = 1))

dog_results['runs_365D_Z'].fillna(0, inplace = True)
dog_results['win%_365D_Z'].fillna(0, inplace = True)
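As a small illustration of what the within-race Z-score does, the sketch below uses three made-up races. Races A and B have very different raw values but identical relative standings, so they produce the same Z-scores. Race C, where every dog has the same value, has zero spread and produces NaN, which is presumably why the NaN values for runs and win % are filled with 0 above.

```python
import pandas as pd
from scipy.stats import zscore

# Made-up races: same relative ordering in A and B, no variation in C
toy = pd.DataFrame({
    "FastTrack_RaceId": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "runs_365D": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0, 5.0, 5.0, 5.0],
})

# Z-score within each race (sample standard deviation, as in the tutorial)
toy["runs_365D_Z"] = toy.groupby("FastTrack_RaceId")["runs_365D"].transform(
    lambda x: zscore(x, ddof=1)
)

# Treat "no variation in this race" as an average score of 0
toy["runs_365D_Z_filled"] = toy["runs_365D_Z"].fillna(0)

print(toy)
```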

Train the model

Next, we'll train our model. To keep things simple, we'll choose a Logistic Regression from the sklearn package.

For modelling purposes, we'll only keep data from 2019 onwards: our features use the last 365 days of history, so runs in 2018 don't have a full 365-day lookback. We'll also only keep races where each dog has a value for each feature. The last piece of code just double-checks that the DataFrame has no null values.

dog_results = dog_results.reset_index()
dog_results = dog_results.sort_values(by = ['date_dt', 'FastTrack_RaceId'])

# Only keep data from 2019 onwards
model_df = dog_results[dog_results['date_dt'] >= '2019-01-01']
feature_cols = ['Prizemoney_365D_Z', 'runs_365D_Z', 'win%_365D_Z',
                'RunTime_norm_best_365D_Z', 'RunTime_norm_median_365D_Z']
model_df = model_df[['date_dt', 'FastTrack_RaceId', 'DogName', 'win', 'StartPrice'] + feature_cols]

# Only train model off of races where each dog has a value for each feature
races_exclude = model_df[model_df.isnull().any(axis = 1)]['FastTrack_RaceId'].drop_duplicates()
model_df = model_df[~model_df['FastTrack_RaceId'].isin(races_exclude)]

# checking if any null values
model_df.drop(columns = 'StartPrice').isnull().values.any()

False
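The race-level exclusion above is easy to misread as a row-level filter, so here is a minimal sketch on made-up rows. One dog in race 2 is missing a feature, and the whole of race 2 is dropped, not just that dog.

```python
import numpy as np
import pandas as pd

# Made-up data: dog C in race 2 is missing a feature value
toy = pd.DataFrame({
    "FastTrack_RaceId": [1, 1, 2, 2],
    "DogName": ["A", "B", "C", "D"],
    "runs_365D_Z": [0.5, -0.5, np.nan, 0.7],
})

# Races containing at least one row with a null value
races_exclude = toy[toy.isnull().any(axis=1)]["FastTrack_RaceId"].drop_duplicates()

# Drop every runner in those races
kept = toy[~toy["FastTrack_RaceId"].isin(races_exclude)]
print(kept)
```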

We will use pre-2021 as our train dataset and post-2021 as our test dataset which gives us an approximate 80/20 split of train to test data.

Note that one issue with training our model this way is that we are training each dog result individually and not in conjunction with the other dogs in the race. Therefore the probabilities are not guaranteed to add up to 1.

# Split the data into train and test data
train_data = model_df[model_df['date_dt'] < '2021-01-01'].reset_index(drop = True)
test_data = model_df[model_df['date_dt'] >= '2021-01-01'].reset_index(drop = True)

train_x, train_y = train_data[feature_cols], train_data['win']
test_x, test_y = test_data[feature_cols], test_data['win']

logit_model = LogisticRegression()
logit_model.fit(train_x, train_y)

test_data['prob_unscaled'] = logit_model.predict_proba(test_x)[:,1]
test_data.groupby('FastTrack_RaceId')['prob_unscaled'].sum()

FastTrack_RaceId
626218700    0.840901
626218701    0.731972
626218702    0.754034
626218703    0.986967
626218704    0.990238
               ...   
680757815    1.178215
680757816    0.847067
680757817    1.043633
680757818    0.805511
680757819    0.782609
Name: prob_unscaled, Length: 2491, dtype: float64

To correct for this, we'll apply a scaling factor to the model's raw outputs to force them to sum to 1. A better way to do this would be to use a conditional logistic regression which in the training process would ensure probabilities sum to unity.

# Scale the raw model output so they sum to unity
test_data['prob_scaled'] = test_data.groupby('FastTrack_RaceId')['prob_unscaled'].apply(lambda x: x / sum(x))
test_data.groupby('FastTrack_RaceId')['prob_scaled'].sum()

FastTrack_RaceId
626218700    1.0
626218701    1.0
626218702    1.0
626218703    1.0
626218704    1.0
            ... 
680757815    1.0
680757816    1.0
680757817    1.0
680757818    1.0
680757819    1.0
Name: prob_scaled, Length: 2491, dtype: float64
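For reference, the same scaling can be sketched with `transform`, which returns a result aligned to the original rows. This is a sketch on made-up probabilities, not a change to the tutorial's method; it may be a safer form where the output shape of `groupby(...).apply(...)` differs between pandas versions.

```python
import pandas as pd

# Made-up raw model outputs for two races
toy = pd.DataFrame({
    "FastTrack_RaceId": [1, 1, 1, 2, 2],
    "prob_unscaled": [0.30, 0.20, 0.10, 0.45, 0.45],
})

# Sum of raw probabilities in each race, repeated on every row of that race
race_totals = toy.groupby("FastTrack_RaceId")["prob_unscaled"].transform("sum")

# Divide each runner's raw probability by its race total
toy["prob_scaled"] = toy["prob_unscaled"] / race_totals

print(toy)
print(toy.groupby("FastTrack_RaceId")["prob_scaled"].sum())
```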

As a rudimentary check, let's see how many races the model correctly predicts using the highest probability in a given race as our pick. We'll also do the same for the starting price odds as a comparison.

The model predicts the winner in 33% of races which is not great given the starting price predicts it in 41.7% of races ... but it will do for our purposes!

# Create a boolean column for whether a dog has the highest model prediction in a race. Do the same for the starting price
# as a comparison
test_data['model_win_prediction'] = test_data.groupby('FastTrack_RaceId')['prob_scaled'].apply(lambda x: x == max(x))
test_data['odds_win_prediction'] = test_data.groupby('FastTrack_RaceId')['StartPrice'].apply(lambda x: x == min(x))

print("Model predicts the winner in {:.2%} of races".format(
    len(test_data[(test_data['model_win_prediction'] == True) & (test_data['win'] == 1)]) / test_data['FastTrack_RaceId'].nunique()
    ))
print("Starting Price Odds predicts the winner in {:.2%} of races".format(
    len(test_data[(test_data['odds_win_prediction'] == True) & (test_data['win'] == 1)]) / test_data['FastTrack_RaceId'].nunique()
    ))

Model predicts the winner in 32.96% of races
Starting Price Odds predicts the winner in 41.75% of races

3. Retrieve today's race lineups

Retrieve today's lineups from FastTrack

Now that we have trained our model, we want to get today's races from FastTrack and run the model over them.

We have two options from FastTrack:

Basic Plus Format: Contains basic information about the dog lineups such as box, best time, trainer, owner, ratings, speed ratings ...
Full Plus Format: Contains everything in the basic format with additional information such as previous start information.

getBasicFormat(dt, tracks = None)

getFullFormat(dt, tracks = None)

The calls will return two dataframes, one with the race information and one with the individual dog information. Again, the tracks parameter is optional and if left blank, all tracks will be returned.

As we are only after the dog lineups to run our model on, let's just grab the basic format and again only restrict for QLD tracks.

qld_races_today, qld_dogs_today = greys.getBasicFormat('2021-06-16', tracks_filter)
qld_races_today.head()

Getting meets for each date ..

100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  2.08it/s]

Getting dog lineups ..

100%|████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.43it/s]

	@id	RaceNum	RaceName	RaceTime	RaceTimeDateUTC	Distance	RaceGrade	PrizeMoney1	PrizeMoney2	PrizeMoney3	...	Handicap	TAB	GradeCode	VICGREYS	RaceComment	Track	Date	Quali	TipsComments_Bet	TipsComments_Tips
0	680665206	1	TAB ORIGIN GREYHOUNDS TOMORROW	03:32PM	16 Jun 21 05:32AM	395m	Novice Non Penalty	$1750	$500	$250	...	None	None	NNP	None	""	Albion Park	16 Jun 21	None	None	None
1	680665207	2	TERRY HILL VS BEN HANNANT	03:52PM	16 Jun 21 05:52AM	395m	Maiden Heat	$1750	$500	$250	...	None	None	MH	None	""	Albion Park	16 Jun 21	None	None	None
2	680665208	3	QLD VS NSW TOMORROW @BRISGREYS	04:17PM	16 Jun 21 06:17AM	395m	Maiden Heat	$1750	$500	$250	...	None	None	MH	None	""	Albion Park	16 Jun 21	None	None	None
3	680665209	4	ORIGIN SPRINT TOMORROW NIGHT	04:38PM	16 Jun 21 06:38AM	395m	Maiden Heat	$1750	$500	$250	...	None	None	MH	None	""	Albion Park	16 Jun 21	None	None	None
4	680665210	5	BEN HANNANT?S QLD MAROONS	04:57PM	16 Jun 21 06:57AM	395m	Grade 5 Heat	$1750	$500	$250	...	None	None	5H	None	""	Albion Park	16 Jun 21	None	None	None

5 rows × 27 columns

Create a list of the QLD tracks running today, which will be used later when we fetch the Betfair data.

# Qld tracks running today
qld_tracks_today = list(qld_races_today['Track'].unique())
qld_tracks_today

['Albion Park', 'Ipswich']

Retrieve today's lineups from the Betfair API

The FastTrack lineups contain all the dogs in a race, including reserves and scratched dogs. As we only want to run our model on final lineups, we'll need to connect to the Betfair API to update our lineups for any scratchings.

Let's first login to the Betfair API. Enter in your username, password and API key and create a betfairlightweight object.

my_username = "your_username"
my_password = "your_password"
my_app_key = "your_app_key"

trading = betfairlightweight.APIClient(my_username, my_password, app_key=my_app_key)
trading.login_interactive()

<LoginResource>

Next, we'll call the list_events operation which will return all the greyhound events in Australia over the next 24 hours.

# Create the market filter
greyhounds_event_filter = filters.market_filter(
    event_type_ids=[4339],
    market_countries=['AU'],
    market_start_time={
        'to': (datetime.utcnow() + timedelta(days=1)).strftime("%Y-%m-%dT%H:%M:%SZ")
    }
)

# Get a list of all greyhound events as objects
greyhounds_events = trading.betting.list_events(
    filter=greyhounds_event_filter
)

# Create a DataFrame with all the events by iterating over each event object
greyhounds_events_today = pd.DataFrame({
    'Event Name': [event_object.event.name for event_object in greyhounds_events],
    'Event ID': [event_object.event.id for event_object in greyhounds_events],
    'Event Venue': [event_object.event.venue for event_object in greyhounds_events],
    'Country Code': [event_object.event.country_code for event_object in greyhounds_events],
    'Time Zone': [event_object.event.time_zone for event_object in greyhounds_events],
    'Open Date': [event_object.event.open_date for event_object in greyhounds_events],
    'Market Count': [event_object.market_count for event_object in greyhounds_events]
})

greyhounds_events_today.head()

	Event Name	Event ID	Event Venue	Country Code	Time Zone	Open Date	Market Count
0	Bend (AUS) 16th Jun	30618018	Bendigo	AU	Australia/Sydney	2021-06-16 01:37:00	36
1	WPrk (AUS) 16th Jun	30618017	Wentworth Park	AU	Australia/Sydney	2021-06-16 09:05:00	40
2	MBdg (AUS) 16th Jun	30618832	Murray Bridge	AU	Australia/Adelaide	2021-06-16 01:55:00	36
3	Cran (AUS) 16th Jun	30618160	Cranbourne	AU	Australia/Sydney	2021-06-16 08:44:00	34
4	Ball (AUS) 16th Jun	30618165	Ballarat	AU	Australia/Sydney	2021-06-16 08:58:00	60

Next, let's fetch the market ids. As we know the meets we're interested in today, let's restrict the market pull request for only the QLD tracks that are running today.

greyhounds_events_today = greyhounds_events_today[greyhounds_events_today['Event Venue'].isin(qld_tracks_today)]
greyhounds_events_today.head()

	Event Name	Event ID	Event Venue	Country Code	Time Zone	Open Date	Market Count
7	Ipsw (AUS) 16th Jun	30618813	Ipswich	AU	Australia/Queensland	2021-06-16 08:55:00	40
9	APrk (AUS) 16th Jun	30618188	Albion Park	AU	Australia/Queensland	2021-06-16 05:32:00	27

market_catalogue_filter = filters.market_filter(
    event_ids = list(greyhounds_events_today['Event ID']),
    market_type_codes = ['WIN']
)

market_catalogue = trading.betting.list_market_catalogue(
    filter=market_catalogue_filter,
    max_results='1000',
    sort='FIRST_TO_START',
    market_projection=['MARKET_START_TIME', 'MARKET_DESCRIPTION', 'RUNNER_DESCRIPTION', 'EVENT', 'EVENT_TYPE']
)

win_markets = []
runners = []

for market_object in market_catalogue:
    win_markets.append({
        'event_name': market_object.event.name,
        'event_id': market_object.event.id,
        'event_venue': market_object.event.venue,
        'market_name': market_object.market_name,
        'market_id': market_object.market_id,
        'market_start_time': market_object.market_start_time,
        'total_matched': market_object.total_matched
        })

    for runner in market_object.runners:
        runners.append({
            'market_id': market_object.market_id,
            'runner_id': runner.selection_id,
            'runner_name': runner.runner_name
            })

win_markets_df = pd.DataFrame(win_markets)
runners_df = pd.DataFrame(runners)

For matching purposes, we'll need to extract the race number from the market_name. Also let's add another field 'local_start_time' as the market_start_time field is in UTC format.

# Extract race number from market name
win_markets_df['race_number'] = win_markets_df['market_name'].apply(
    lambda x: x[1:3].strip() if x[0] == 'R' else None)

# Function that returns the time in a newly specified timezone given a time and its old timezone
def change_timezone(time, oldtz, newtz):
    from_zone = tz.gettz(oldtz)
    to_zone = tz.gettz(newtz)
    newtime = time.replace(tzinfo = from_zone).astimezone(to_zone).replace(tzinfo = None)
    return newtime

# Add in a local_start_time variable
win_markets_df['local_start_time'] = win_markets_df['market_start_time'].apply(lambda x: \
                    change_timezone(x, 'UTC', 'Australia/Sydney'))

win_markets_df.head()

	event_name	event_id	event_venue	market_name	market_id	market_start_time	race_number	local_start_time
0	APrk (AUS) 16th Jun	30618188	Albion Park	R1 395m Nvce	1.184472300	2021-06-16 05:32:00	1	2021-06-16 15:32:00
1	APrk (AUS) 16th Jun	30618188	Albion Park	R2 395m Heat	1.184472302	2021-06-16 05:52:00	2	2021-06-16 15:52:00
2	APrk (AUS) 16th Jun	30618188	Albion Park	R3 395m Heat	1.184472304	2021-06-16 06:17:00	3	2021-06-16 16:17:00
3	APrk (AUS) 16th Jun	30618188	Albion Park	R4 395m Heat	1.184472306	2021-06-16 06:38:00	4	2021-06-16 16:38:00
4	APrk (AUS) 16th Jun	30618188	Albion Park	R5 395m Heat	1.184472308	2021-06-16 06:57:00	5	2021-06-16 16:57:00
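The slice-based extraction above relies on the race number occupying at most two characters straight after the leading "R". As an alternative, here is a hedged sketch using a regular expression. `extract_race_number` is a hypothetical helper, and any market names below that do not appear in the table above are made up for illustration.

```python
import re

def extract_race_number(market_name):
    """Return the race number as a string, or None if the name has no 'R<digits>' prefix."""
    match = re.match(r"R(\d+)\b", market_name)
    return match.group(1) if match else None

# "R1 395m Nvce" appears in the table above; the other names are hypothetical
for name in ["R1 395m Nvce", "R10 520m Gr5", "To Be Placed"]:
    print(name, "->", extract_race_number(name))
```

The helper keeps the race number as a string, matching the tutorial's own extraction. Applying it would look like `win_markets_df['market_name'].apply(extract_race_number)`.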

To match the dog names from Betfair and FastTrack, we'll also need to remove the rug number from the start of the runner_name in the runners_df DataFrame.

# Remove dog number from runner_name
runners_df['runner_name'] = runners_df['runner_name'].apply(lambda x: x[(x.find(" ") + 1):].upper())

# Merge on the race number and event venue onto runners_df
runners_df = runners_df.merge(
     win_markets_df[['market_id', 'event_venue', 'race_number']],
     how = 'left',
     on = 'market_id')
runners_df.head()

	market_id	runner_id	runner_name	event_venue	race_number
0	1.184472300	36594055	LITTLE MISS VANE	Albion Park	1
1	1.184472300	39860314	MULGOWIE BELLE	Albion Park	1
2	1.184472300	39860315	NIGHT CAPERS	Albion Park	1
3	1.184472300	38079770	WRONG GIRL HARRY	Albion Park	1
4	1.184472300	37616746	IM ON FIRE	Albion Park	1

Merge race lineups from FastTrack and Betfair

Before we can merge, we need to make some minor formatting changes to the FastTrack names so they match the Betfair names. Betfair excludes all apostrophes and full stops in its naming convention, so we'll create a Betfair-equivalent dog name on the dataset with these characters removed. We'll also add the race number to the lineups dataset for merging purposes.

qld_races_today = qld_races_today.rename(columns = {'@id': 'FastTrack_RaceId'})
qld_races_today = qld_races_today[['FastTrack_RaceId', 'Date', 'Track', 'RaceNum', 'RaceName', 
                                   'RaceTime', 'Distance', 'RaceGrade']]
qld_dogs_today = qld_dogs_today.rename(columns = {'@id': 'FastTrack_DogId', 'RaceId': 'FastTrack_RaceId'})
qld_dogs_today = qld_dogs_today.merge(
    qld_races_today[['FastTrack_RaceId', 'Track', 'RaceNum']],
    how = 'left',
    on = 'FastTrack_RaceId'
    )
qld_dogs_today['DogName_bf'] = qld_dogs_today['DogName'].apply(lambda x: x.replace("'", "").replace(".", "").replace("Res", "").strip())
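To see what the name clean-up does, the sketch below wraps the same chain of replacements in a hypothetical helper, `to_betfair_name`, and runs it on made-up names. The "Res." suffix in the last example is an assumption about how a reserve might be marked, used only to show the effect of the `replace("Res", "")` step.

```python
def to_betfair_name(fasttrack_name):
    """Apply the same clean-up as the tutorial: drop apostrophes, full stops and the text 'Res'."""
    return (
        fasttrack_name
        .replace("'", "")
        .replace(".", "")
        .replace("Res", "")
        .strip()
    )

# All names except IM ON FIRE are made up for illustration
for name in ["IM ON FIRE", "MISS O'BRIEN", "ST. PATRICK", "FAST DOG Res."]:
    print(repr(name), "->", repr(to_betfair_name(name)))
```

Note that `replace("Res", "")` removes that exact, case-sensitive text wherever it occurs in the name, not only at the end, so it is worth checking the unmatched runners after the merge.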

Now we can merge on the FastTrack and Betfair lineup dataframes by dog name, track and race number. We'll check that all selections have been matched by making sure there are no null dog ids.

# Match on the fastTrack dogId to the runners_df
runners_df = runners_df.merge(
    qld_dogs_today[['DogName_bf', 'Track', 'RaceNum', 'FastTrack_DogId']],
    how = 'left',
    left_on = ['runner_name', 'event_venue', 'race_number'],
    right_on = ['DogName_bf', 'Track', 'RaceNum'],
    ).drop(['DogName_bf', 'Track', 'RaceNum'], axis = 1)

# Check all betfair selections are matched to a fastTrack dogId by checking if there are any null dogIds
runners_df['FastTrack_DogId'].isnull().any()

False

runners_df.head()

	market_id	runner_id	runner_name	event_venue	race_number	FastTrack_DogId
0	1.184472300	36594055	LITTLE MISS VANE	Albion Park	1	434800466
1	1.184472300	39860314	MULGOWIE BELLE	Albion Park	1	510731455
2	1.184472300	39860315	NIGHT CAPERS	Albion Park	1	415994834
3	1.184472300	38079770	WRONG GIRL HARRY	Albion Park	1	443645048
4	1.184472300	37616746	IM ON FIRE	Albion Park	1	448841452
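If the null check above ever returns True, it helps to see exactly which runners failed to match. The following is a minimal sketch on made-up frames (the "UNKNOWN DOG" row is hypothetical) that uses the `indicator` option of `DataFrame.merge` to list unmatched Betfair selections.

```python
import pandas as pd

# Made-up Betfair side; UNKNOWN DOG has no FastTrack counterpart
betfair = pd.DataFrame({
    "runner_name": ["IM ON FIRE", "NIGHT CAPERS", "UNKNOWN DOG"],
    "event_venue": ["Albion Park"] * 3,
    "race_number": ["1", "1", "1"],
})

# Made-up FastTrack side
fasttrack = pd.DataFrame({
    "DogName_bf": ["IM ON FIRE", "NIGHT CAPERS"],
    "Track": ["Albion Park"] * 2,
    "RaceNum": ["1", "1"],
    "FastTrack_DogId": ["448841452", "415994834"],
})

# indicator=True adds a _merge column saying where each row came from
merged = betfair.merge(
    fasttrack,
    how="left",
    left_on=["runner_name", "event_venue", "race_number"],
    right_on=["DogName_bf", "Track", "RaceNum"],
    indicator=True,
)

# Rows present only on the Betfair side did not find a FastTrack dog
unmatched = merged.loc[merged["_merge"] == "left_only",
                       ["runner_name", "event_venue", "race_number"]]
print(unmatched)
```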

4. Run model on today's lineups and start betting

Create model features for the runners

First we have to create the same model features we used in our logistic regression model on the dogs in the runners_df DataFrame. As our features use historic data over the last 365 days, we'll need to filter our historic results dataset (created in step 1) for only the dog ids we are interested in and only over the last 365 days.

runners_historicdata = dog_results[dog_results['FastTrack_DogId'].isin(runners_df['FastTrack_DogId'])]
runners_historicdata = runners_historicdata.sort_values(by = ['FastTrack_DogId', 'date_dt'])
runners_historicdata = runners_historicdata[runners_historicdata['date_dt'] >= (datetime.now() - timedelta(days = 365))]

Next we create the features. As our trained model requires a non-null value in each of the features, we'll exclude all markets where at least one dog has a null feature.

# Create the feature variables over the last 365 days
runners_features = runners_historicdata.groupby('FastTrack_DogId').agg(
    Prizemoney_365D = ('Prizemoney', 'sum'),
    RunTime_norm_best_365D = ('RunTime_norm', 'min'),
    RunTime_norm_median_365D = ('RunTime_norm', 'median'),
    runs_365D = ('FastTrack_RaceId', 'count'),
    wins_365D = ('win', 'sum')
    ).reset_index()

runners_features['win%_365D'] = runners_features['wins_365D'] / runners_features['runs_365D']
runners_features = runners_features.drop('wins_365D', axis = 1)

runners_df = runners_df.merge(runners_features,
                              how = 'left',
                              on = 'FastTrack_DogId')

# Only run on races where every dog has non-null features
markets_exclude = runners_df[runners_df.isnull().any(axis = 1)]['market_id'].drop_duplicates()
runners_df = runners_df[~runners_df['market_id'].isin(markets_exclude)]

print("{0} markets are excluded".format(str(len(markets_exclude))))

# Convert the feature variables into Z-scores
for col in ['Prizemoney_365D', 'runs_365D', 'win%_365D',
            'RunTime_norm_best_365D', 'RunTime_norm_median_365D']:
    runners_df[col + '_Z'] = runners_df.groupby('market_id')[col].transform(
        lambda x: zscore(x, ddof = 1))

runners_df['runs_365D_Z'].fillna(0, inplace = True)
runners_df['win%_365D_Z'].fillna(0, inplace = True)

6 markets are excluded

Attach the model output onto the runners_df DataFrame. We will also scale the probabilities to sum to unity (same as what we did when assessing the trained model outputs in step 2).

Let's also add a column for model fair odds which is just the reciprocal of the prob_scaled. We'll also add another column for the minimum back odds we're willing to take assuming we'd only bet off a 10% model overlay.

runners_df['prob_unscaled'] = logit_model.predict_proba(runners_df[feature_cols])[:,1]
runners_df['prob_scaled'] = runners_df.groupby('market_id')['prob_unscaled'].apply(lambda x: x / sum(x))
runners_df['model_fairodds'] = 1 / runners_df['prob_scaled']
runners_df['min_odds'] = (0.1 + 1) / runners_df['prob_scaled']
runners_df[['market_id', 'runner_name', 'event_venue', 'prob_scaled', 'model_fairodds', 'min_odds']].head()

	market_id	runner_name	event_venue	prob_scaled	model_fairodds	min_odds
0	1.184472300	LITTLE MISS VANE	Albion Park	0.056184	17.798518	19.578370
1	1.184472300	MULGOWIE BELLE	Albion Park	0.376277	2.657620	2.923382
2	1.184472300	NIGHT CAPERS	Albion Park	0.325158	3.075425	3.382967
3	1.184472300	WRONG GIRL HARRY	Albion Park	0.152564	6.554620	7.210082
4	1.184472300	IM ON FIRE	Albion Park	0.089817	11.133812	12.247193
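As a worked example of the overlay arithmetic, take MULGOWIE BELLE from the table above. The sketch below reproduces the fair odds and minimum odds, and checks that backing at the minimum odds corresponds to a 10% expected profit per unit staked if the model probability were correct. This ignores Betfair commission, which is an assumption made to keep the example simple.

```python
prob_scaled = 0.376277   # MULGOWIE BELLE, from the table above
overlay = 0.10           # only bet when the price is at least 10% above fair odds

# Fair odds are the reciprocal of the model probability
model_fairodds = 1 / prob_scaled

# Minimum acceptable back odds with the overlay applied
min_odds = (1 + overlay) / prob_scaled

# Expected profit per unit staked at min_odds, if prob_scaled were the true probability
expected_profit = prob_scaled * min_odds - 1

print(round(model_fairodds, 4), round(min_odds, 4), round(expected_profit, 4))
```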

Now we can start betting!

For demonstration, we'll only bet on one market, but it's just as easy to set it up to bet on all markets based on your model probabilities. Let's take the first market only and create a separate DataFrame from runners_df with only those runners in that market.

market_id = win_markets_df['market_id'][0]
bet_df = runners_df[runners_df['market_id'] == market_id].reset_index(drop = True)
bet_df

	market_id	runner_id	runner_name	event_venue	race_number	FastTrack_DogId	Prizemoney_365D	RunTime_norm_best_365D	RunTime_norm_median_365D	runs_365D	win%_365D	Prizemoney_365D_Z	runs_365D_Z	win%_365D_Z	RunTime_norm_best_365D_Z	RunTime_norm_median_365D_Z	prob_unscaled	prob_scaled	model_fairodds	min_odds
0	1.184472300	36594055	LITTLE MISS VANE	Albion Park	1	434800466	2175.0	-0.049480	0.538154	12.0	0.083333	-0.114708	0.70791	-0.774043	1.632343	0.269920	0.037524	0.056184	17.798518	19.578370
1	1.184472300	39860314	MULGOWIE BELLE	Albion Park	1	510731455	1850.0	-1.029058	-0.897651	2.0	0.500000	-0.818108	-0.97759	1.600323	-0.174870	-1.638741	0.251304	0.376277	2.657620	2.923382
2	1.184472300	39860315	NIGHT CAPERS	Albion Park	1	415994834	2500.0	-1.513865	0.231713	5.0	0.200000	0.588691	-0.47194	-0.109221	-1.069288	-0.137442	0.217164	0.325158	3.075425	3.382967
3	1.184472300	38079770	WRONG GIRL HARRY	Albion Park	1	443645048	1750.0	-1.151977	0.743528	4.0	0.250000	-1.034539	-0.64049	0.175703	-0.401644	0.542930	0.101893	0.152564	6.554620	7.210082
4	1.184472300	37616746	IM ON FIRE	Albion Park	1	448841452	2865.0	-0.926976	1.059778	16.0	0.062500	1.378664	1.38211	-0.892762	0.013458	0.963332	0.059986	0.089817	11.133812	12.247193

One thing we have to ensure is that the odds we place adhere to the Betfair price increments structure. For example, odds of 19.578370 are not valid odds to place a bet at. If we tried, we would get an INVALID_ODDS error. For more information on valid price increments, see Betfair's price increments documentation.

We'll create a function that rounds odds up to the nearest valid price increment and apply this to our min_odds field.

def roundUpOdds(odds):
    if odds < 2:
        return math.ceil(odds * 100) / 100
    elif odds < 3:
        return math.ceil(odds * 50) / 50
    elif odds < 4:
        return math.ceil(odds * 20) / 20
    elif odds < 6:
        return math.ceil(odds * 10) / 10
    elif odds < 10:
        return math.ceil(odds * 5) / 5
    elif odds < 20:
        return math.ceil(odds * 2) / 2
    elif odds < 30:
        return math.ceil(odds * 1) / 1
    elif odds < 50:
        return math.ceil(odds * 0.5) / 0.5
    elif odds < 100:
        return math.ceil(odds * 0.2) / 0.2
    elif odds < 1000:
        return math.ceil(odds * 0.1) / 0.1
    else:
        return odds

bet_df['min_odds'] = bet_df['min_odds'].apply(roundUpOdds)
bet_df

	market_id	runner_id	runner_name	event_venue	race_number	FastTrack_DogId	Prizemoney_365D	RunTime_norm_best_365D	RunTime_norm_median_365D	runs_365D	win%_365D	Prizemoney_365D_Z	runs_365D_Z	win%_365D_Z	RunTime_norm_best_365D_Z	RunTime_norm_median_365D_Z	prob_unscaled	prob_scaled	model_fairodds	min_odds
0	1.184472300	36594055	LITTLE MISS VANE	Albion Park	1	434800466	2175.0	-0.049480	0.538154	12.0	0.083333	-0.114708	0.70791	-0.774043	1.632343	0.269920	0.037524	0.056184	17.798518	20.00
1	1.184472300	39860314	MULGOWIE BELLE	Albion Park	1	510731455	1850.0	-1.029058	-0.897651	2.0	0.500000	-0.818108	-0.97759	1.600323	-0.174870	-1.638741	0.251304	0.376277	2.657620	2.94
2	1.184472300	39860315	NIGHT CAPERS	Albion Park	1	415994834	2500.0	-1.513865	0.231713	5.0	0.200000	0.588691	-0.47194	-0.109221	-1.069288	-0.137442	0.217164	0.325158	3.075425	3.40
3	1.184472300	38079770	WRONG GIRL HARRY	Albion Park	1	443645048	1750.0	-1.151977	0.743528	4.0	0.250000	-1.034539	-0.64049	0.175703	-0.401644	0.542930	0.101893	0.152564	6.554620	7.40
4	1.184472300	37616746	IM ON FIRE	Albion Park	1	448841452	2865.0	-0.926976	1.059778	16.0	0.062500	1.378664	1.38211	-0.892762	0.013458	0.963332	0.059986	0.089817	11.133812	12.50
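One caveat with rounding on floats is that the multiplication can land just above a whole number, in which case math.ceil bumps the price up a full increment. For example, 1.1 * 100 evaluates to 110.00000000000001 in Python, so odds of exactly 1.1 would be rounded up to 1.11. The sketch below is one possible alternative, not part of the original workshop code. It uses the same price bands as roundUpOdds but does the arithmetic with the standard library's decimal module. The names PRICE_BANDS and round_up_odds_decimal are our own.

```python
from decimal import Decimal, ROUND_CEILING

# (upper bound of band, increment) pairs mirroring the bands used in roundUpOdds
PRICE_BANDS = [
    (Decimal('2'), Decimal('0.01')),
    (Decimal('3'), Decimal('0.02')),
    (Decimal('4'), Decimal('0.05')),
    (Decimal('6'), Decimal('0.1')),
    (Decimal('10'), Decimal('0.2')),
    (Decimal('20'), Decimal('0.5')),
    (Decimal('30'), Decimal('1')),
    (Decimal('50'), Decimal('2')),
    (Decimal('100'), Decimal('5')),
    (Decimal('1000'), Decimal('10')),
]

def round_up_odds_decimal(odds):
    """Round odds up to the next valid increment using exact decimal arithmetic."""
    d = Decimal(str(odds))
    for upper, step in PRICE_BANDS:
        if d < upper:
            # Number of whole increments needed to reach or exceed the odds
            ticks = (d / step).to_integral_value(rounding=ROUND_CEILING)
            return float(ticks * step)
    # At or above 1000 the odds are returned unchanged, as in roundUpOdds
    return float(d)

# The min_odds values from the bet_df output above
for raw_odds in [19.578370, 2.923382, 3.382967, 7.210082, 12.247193]:
    print(raw_odds, '->', round_up_odds_decimal(raw_odds))
```

For the five runners above this gives the same prices as roundUpOdds (20.0, 2.94, 3.4, 7.4 and 12.5), while odds that already sit on a valid increment, such as 1.1, are left where they are.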

Now that we have valid minimum odds that we want to bet on for each selection, we'll start betting. The following function will place a standard limit bet on Betfair on the specified market_id and selection_id at the specified size and price.

# Create a function to place a bet using betfairlightweight
def placeBackBet(instance, market_id, selection_id, size, price):
    # Limit order: the stake and the minimum price we'll accept.
    # LAPSE cancels any unmatched portion when the market turns in-play.
    order_filter = filters.limit_order(
        size = size,
        price = price,
        persistence_type = "LAPSE"
    )
    # Wrap the limit order in a BACK instruction for this selection
    instructions_filter = filters.place_instruction(
        selection_id = str(selection_id),
        order_type = "LIMIT",
        side = "BACK",
        limit_order = order_filter
    )
    # Send the instruction to Betfair for the given market
    order = instance.betting.place_orders(
        market_id = market_id,
        instructions = [instructions_filter]
    )
    print("Bet Place on selection {0} is {1}".format(str(selection_id), order.__dict__['_data']['status']))
    return order

Now let's loop through the runners in bet_df and place a bet of $5 on each runner at the minimum odds we're willing to take.

for selection_id, min_odds in zip(bet_df['runner_id'], bet_df['min_odds']):
    placeBackBet(trading, market_id, selection_id, 5, min_odds)

Bet Place on selection 36594055 is SUCCESS
Bet Place on selection 39860314 is SUCCESS
Bet Place on selection 39860315 is SUCCESS
Bet Place on selection 38079770 is SUCCESS
Bet Place on selection 37616746 is SUCCESS
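Before sending real money, it can help to build the instructions first and inspect them. The sketch below is a dry run only: it uses plain dictionaries and makes no API calls. The key names follow the field names of Betfair's placeOrders request as we understand them, and build_back_instructions is a hypothetical helper of our own, not part of betfairlightweight. Whether a list built this way can be passed straight to place_orders in a single call is an assumption we have not tested here.

```python
def build_back_instructions(runners, size):
    """Build one LIMIT BACK instruction per (selection_id, price) pair."""
    instructions = []
    for selection_id, price in runners:
        instructions.append({
            'selectionId': int(selection_id),
            'side': 'BACK',
            'orderType': 'LIMIT',
            'limitOrder': {
                'size': size,
                'price': price,
                'persistenceType': 'LAPSE',
            },
        })
    return instructions

# Runner IDs and rounded minimum odds taken from the bet_df output above
runners = [
    (36594055, 20.0),
    (39860314, 2.94),
    (39860315, 3.4),
    (38079770, 7.4),
    (37616746, 12.5),
]

instructions = build_back_instructions(runners, size=5)
for instruction in instructions:
    print(instruction)
```

Printing the instructions makes it easy to check each stake and price by eye before any bet is placed.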

And success! We have downloaded historical greyhound form data from FastTrack, built a simple model, and bet off this model using the Betfair API.

Disclaimer

Note that whilst models and automated strategies are fun and rewarding to create, we can't promise that your model or betting strategy will be profitable, and we make no representations in relation to the code shared or information on this page. If you're using this code or implementing your own strategies, you do so entirely at your own risk and you are responsible for any winnings/losses incurred. Under no circumstances will Betfair be liable for any loss or damage you suffer.