How to Automate IV: Automate your own Model
This is an archived version of How to Automate IV; the latest version is available here.
For this tutorial we will be automating the model that Bruno taught us how to make in the Greyhound Modelling Tutorial. This tutorial follows on logically from How to Automate III. If you haven't already, make sure you take a look at the rest of the series before continuing here, as they cover some key concepts!
Saving and loading in our model
To generate our predictions we have two options: we can generate the predictions in the same notebook we used to train our model and then read those predictions into this notebook, or we can save the trained model and read it into this notebook.
For this tutorial we have chosen to save the model, as it is a bit less confusing and easier to manage, although there are some pieces of code we may have to write twice (copy and paste). So first we will need to run the code from that tutorial and then save the model. This is super simple as we can just copy and paste the complete code provided at the end of the tutorial or download it from Github. Then we can run one extra line of code (which I have copied from the documentation page) at the end of the notebook to save the model.
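If you want a reference, a minimal sketch of the save step using joblib is below; trained_model is a placeholder for whatever your fitted model variable is called in the modelling tutorial, and the filename matches the one we load further down:
from joblib import dump
# trained_model is a placeholder - swap in the name of your fitted model object
dump(trained_model, 'logistic_regression.joblib')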
Now that the file is saved, let's read it into this notebook:
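from joblib import load
# Load the model we saved from the Greyhound Modelling Tutorial (these are the same lines used in the complete code at the end of this article)
brunos_model = load('logistic_regression.joblib')
brunos_model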
Generating predictions for today
Now that we have the model loaded in, we need the data to generate our predictions for today's races!
# Import libraries required to download today's races
import os
import sys
# Allow imports from src folder
module_path = os.path.abspath(os.path.join('../src'))
if module_path not in sys.path:
sys.path.append(module_path)
from datetime import datetime, timedelta
from dateutil.relativedelta import relativedelta
from dateutil import tz
from pandas.tseries.offsets import MonthEnd
from sklearn.preprocessing import MinMaxScaler
import itertools
import numpy as np
import pandas as pd
from nltk.tokenize import regexp_tokenize
# settings to display all columns
pd.set_option("display.max_columns", None)
import fasttrack as ft
from dotenv import load_dotenv
load_dotenv()
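# Validate the FastTrack API connection (same setup as in the complete code at the end of this article)
# client and track_codes are used by the download loop below
api_key = os.getenv('FAST_TRACK_API_KEY')
client = ft.Fasttrack(api_key)
track_codes = client.listTracks()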
# Import race data excluding NZ races
au_tracks_filter = list(track_codes[track_codes['state'] != 'NZ']['track_code'])
# Time window to import data
# First day of the month 46 months back from now
date_from = (datetime.today() - relativedelta(months=46)).replace(day=1).strftime('%Y-%m-%d')
# First day of previous month
date_to = (datetime.today() - relativedelta(months=1)).replace(day=1).strftime('%Y-%m-%d')
# Dataframes to populate data with
race_details = pd.DataFrame()
dog_results = pd.DataFrame()
# For each month, either fetch data from API or use local CSV file if we already have downloaded it
for start in pd.date_range(date_from, date_to, freq='MS'):
start_date = start.strftime("%Y-%m-%d")
end_date = (start + MonthEnd(1)).strftime("%Y-%m-%d")
try:
filename_races = f'FT_AU_RACES_{start_date}.csv'
filename_dogs = f'FT_AU_DOGS_{start_date}.csv'
filepath_races = f'../data/{filename_races}'
filepath_dogs = f'../data/{filename_dogs}'
print(f'Loading data from {start_date} to {end_date}')
if os.path.isfile(filepath_races):
# Load local CSV file
month_race_details = pd.read_csv(filepath_races)
month_dog_results = pd.read_csv(filepath_dogs)
else:
# Fetch data from API
month_race_details, month_dog_results = client.getRaceResults(start_date, end_date, au_tracks_filter)
month_race_details.to_csv(filepath_races, index=False)
month_dog_results.to_csv(filepath_dogs, index=False)
# Combine monthly data
race_details = race_details.append(month_race_details, ignore_index=True)
dog_results = dog_results.append(month_dog_results, ignore_index=True)
except:
print(f'Could not load data from {start_date} to {end_date}')
This piece of code we copied and pasted from the Greyhound Modelling Tutorial is fantastic! It has downloaded/read in a ton of historic data! There is an issue though: we don't have the data for today's races, or for any races that have occurred this month. This is because the code above only downloads data up until the end of last month.
For example, if we are in the middle of June, then any races in the first two weeks of June won't be downloaded by the chunk of code above. An issue is that if we download it now, when tomorrow rolls around it won't include the extra races that have finished today.
So, the simple but inefficient solution is that every single day we redownload all the races that have already concluded this month. (Ideally you have some sort of database set up or you store and download your data in a daily format instead of the monthly format)
current_month_start_date = pd.Timestamp.now().replace(day=1).strftime("%Y-%m-%d")
current_month_end_date = (pd.Timestamp.now().replace(day=1)+ MonthEnd(1))
current_month_end_date = (current_month_end_date - pd.Timedelta('1 day')).strftime("%Y-%m-%d")
print(f'Start date: {current_month_start_date}')
print(f'End Date: {current_month_end_date}')
# Download data for races that have concluded this current month up until today
# Start and end dates for current month
current_month_start_date = pd.Timestamp.now().replace(day=1).strftime("%Y-%m-%d")
current_month_end_date = (pd.Timestamp.now().replace(day=1)+ MonthEnd(1))
current_month_end_date = (current_month_end_date - pd.Timedelta('1 day')).strftime("%Y-%m-%d")
# Files names
filename_races = f'FT_AU_RACES_{current_month_start_date}.csv'
filename_dogs = f'FT_AU_DOGS_{current_month_start_date}.csv'
# Where to store files locally
filepath_races = f'../data/{filename_races}'
filepath_dogs = f'../data/{filename_dogs}'
# Fetch data from API
month_race_details, month_dog_results = client.getRaceResults(current_month_start_date, current_month_end_date, au_tracks_filter)
# Save the files locally and replace any out of date fields
month_race_details.to_csv(filepath_races, index=False)
month_dog_results.to_csv(filepath_dogs, index=False)
What we are really interested in are races that are scheduled for today as we want to use our model to predict their ratings. So, let's write some code we can run in the morning that will download the data for the day:
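# Download the data for todays races (this is the same chunk used in the complete code at the end of this article)
todays_date = pd.Timestamp.now().strftime("%Y-%m-%d")
todays_races, todays_dogs = client.getFullFormat(dt=todays_date, tracks=au_tracks_filter)
# The todays_races dataframe doesn't come with a date column, so let's add that on
todays_races['date'] = pd.Timestamp.now().strftime('%d %b %y')
todays_races.head(1)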
# It also seems that in todays_dogs dataframe Box is labeled as RaceBox instead, so let's rename it
# We can also see that there are some specific dogs that have "Res." as a suffix of their name, i.e. they are reserve dogs,
# We will treat this later
todays_dogs = todays_dogs.rename(columns={"RaceBox":"Box"})
todays_dogs.tail(3)
# Appending todays data to this months data
month_dog_results = pd.concat([month_dog_results,todays_dogs],join='outer')[month_dog_results.columns]
month_race_details = pd.concat([month_race_details,todays_races],join='outer')[month_race_details.columns]
# Appending this months data to the rest of our historical data
race_details = race_details.append(month_race_details, ignore_index=True)
dog_results = dog_results.append(month_dog_results, ignore_index=True)
Cleaning our data and feature creation
Originally I thought that, now that we have all the data, we could easily copy and paste the code used in the Greyhound Modelling Tutorial to clean our data and create the features.
But after staring at weird predictions and spending hours trying to work out why some things weren't working, I realised that for the most part we can copy and paste code, but when working with live data we do need to make a few changes. I'll point them out when we get to them, but the main things that tripped me up are the data types the FastTrack API returns and the need for a system to work around reserve dogs.
The first thing that tripped me up was that FastTrack_DogId comes through as a string for the live data, and because everything looks like it works, it took ages to find this error. So, let's make sure we deal with it here using:
## Cleanse and normalise the data
# Clean up the race dataset
race_details = race_details.rename(columns = {'@id': 'FastTrack_RaceId'})
race_details['Distance'] = race_details['Distance'].apply(lambda x: int(x.replace("m", "")))
race_details['date_dt'] = pd.to_datetime(race_details['date'], format = '%d %b %y')
# Clean up the dogs results dataset
dog_results = dog_results.rename(columns = {'@id': 'FastTrack_DogId', 'RaceId': 'FastTrack_RaceId'})
# New line of code (rest of this code chunk is copied from bruno's code)
dog_results['FastTrack_DogId'] = pd.to_numeric(dog_results['FastTrack_DogId'])
# Combine dogs results with race attributes
dog_results = dog_results.merge(
race_details,
how = 'left',
on = 'FastTrack_RaceId'
)
# Convert StartPrice to probability
dog_results['StartPrice'] = dog_results['StartPrice'].apply(lambda x: None if x is None else float(x.replace('$', '').replace('F', '')) if isinstance(x, str) else x)
dog_results['StartPrice_probability'] = (1 / dog_results['StartPrice']).fillna(0)
dog_results['StartPrice_probability'] = dog_results.groupby('FastTrack_RaceId')['StartPrice_probability'].apply(lambda x: x / x.sum())
# Discard entries without results (scratched or did not finish)
dog_results = dog_results[~dog_results['Box'].isnull()]
dog_results['Box'] = dog_results['Box'].astype(int)
# Clean up other attributes
dog_results['RunTime'] = dog_results['RunTime'].astype(float)
dog_results['SplitMargin'] = dog_results['SplitMargin'].astype(float)
dog_results['Prizemoney'] = dog_results['Prizemoney'].astype(float).fillna(0)
dog_results['Place'] = pd.to_numeric(dog_results['Place'].apply(lambda x: x.replace("=", "") if isinstance(x, str) else 0), errors='coerce').fillna(0)
dog_results['win'] = dog_results['Place'].apply(lambda x: 1 if x == 1 else 0)
# Normalise some of the raw values
dog_results['Prizemoney_norm'] = np.log10(dog_results['Prizemoney'] + 1) / 12
dog_results['Place_inv'] = (1 / dog_results['Place']).fillna(0)
dog_results['Place_log'] = np.log10(dog_results['Place'] + 1).fillna(0)
dog_results['RunSpeed'] = (dog_results['RunTime'] / dog_results['Distance']).fillna(0)
## Generate features using raw data
# Calculate median winner time per track/distance
win_results = dog_results[dog_results['win'] == 1]
median_win_time = pd.DataFrame(data=win_results[win_results['RunTime'] > 0].groupby(['Track', 'Distance'])['RunTime'].median()).rename(columns={"RunTime": "RunTime_median"}).reset_index()
median_win_split_time = pd.DataFrame(data=win_results[win_results['SplitMargin'] > 0].groupby(['Track', 'Distance'])['SplitMargin'].median()).rename(columns={"SplitMargin": "SplitMargin_median"}).reset_index()
median_win_time.head()
# Calculate track speed index
median_win_time['speed_index'] = (median_win_time['RunTime_median'] / median_win_time['Distance'])
median_win_time['speed_index'] = MinMaxScaler().fit_transform(median_win_time[['speed_index']])
median_win_time.head()
# Compare dogs finish time with median winner time
dog_results = dog_results.merge(median_win_time, on=['Track', 'Distance'], how='left')
dog_results = dog_results.merge(median_win_split_time, on=['Track', 'Distance'], how='left')
# Normalise time comparison
dog_results['RunTime_norm'] = (dog_results['RunTime_median'] / dog_results['RunTime']).clip(0.9, 1.1)
dog_results['RunTime_norm'] = MinMaxScaler().fit_transform(dog_results[['RunTime_norm']])
dog_results['SplitMargin_norm'] = (dog_results['SplitMargin_median'] / dog_results['SplitMargin']).clip(0.9, 1.1)
dog_results['SplitMargin_norm'] = MinMaxScaler().fit_transform(dog_results[['SplitMargin_norm']])
dog_results.head()
# Calculate box winning percentage for each track/distance
box_win_percent = pd.DataFrame(data=dog_results.groupby(['Track', 'Distance', 'Box'])['win'].mean()).rename(columns={"win": "box_win_percent"}).reset_index()
# Add to dog results dataframe
dog_results = dog_results.merge(box_win_percent, on=['Track', 'Distance', 'Box'], how='left')
# Display example of barrier winning probabilities
print(box_win_percent.head(8))
The second thing that we need to add is related to reserve dogs. It took me ages to come up with this solution, so if you have a better one, please submit a pull request.
Basically, a single greyhound can be a reserve dog for multiple races on the same day, and each of those races appears as a new row in our dataframe. For example, 'MACI REID' is a reserve dog for three different races on 2022-09-02:
When we try to lag our data by using .shift(1) like in Bruno's original code, it will produce the wrong values for our features. In the above example only the first race, The Gardens Race 4 (the third row), will have correct data, but all the rows under it will have incorrectly calculated features. We need each of the following rows to be the same as the third row. The solution I have come up with is a little bit complicated, but it gets the job done:
# Please submit a pull request if you have a better solution
temp = rolling_result.reset_index()
temp = temp[temp['date_dt'] == pd.Timestamp.now().normalize()]
temp.groupby(['FastTrack_DogId','date_dt']).first()
rolling_result.loc[pd.IndexSlice[:, pd.Timestamp.now().normalize()], :] = temp.groupby(['FastTrack_DogId','date_dt']).first()
Basically, for each greyhound we just take the first row of today's data (which is correct) and set the rest of today's rows to the same values.
# Generate rolling window features
dataset = dog_results.copy()
dataset = dataset.set_index(['FastTrack_DogId', 'date_dt']).sort_index()
# Use rolling window of 28, 91 and 365 days
rolling_windows = ['28D', '91D', '365D']
# Features to use for rolling windows calculation
features = ['RunTime_norm', 'SplitMargin_norm', 'Place_inv', 'Place_log', 'Prizemoney_norm']
# Aggregation functions to apply
aggregates = ['min', 'max', 'mean', 'median', 'std']
# Keep track of generated feature names
feature_cols = ['speed_index', 'box_win_percent']
for rolling_window in rolling_windows:
print(f'Processing rolling window {rolling_window}')
rolling_result = (
dataset
.reset_index(level=0).sort_index()
.groupby('FastTrack_DogId')[features]
.rolling(rolling_window)
.agg(aggregates)
.groupby(level=0) # Thanks to Brett for finding this!
.shift(1)
)
# My own dodgy code to work with reserve dogs
temp = rolling_result.reset_index()
temp = temp[temp['date_dt'] == pd.Timestamp.now().normalize()]
temp.groupby(['FastTrack_DogId','date_dt']).first()
rolling_result.loc[pd.IndexSlice[:, pd.Timestamp.now().normalize()], :] = temp.groupby(['FastTrack_DogId','date_dt']).first()
# Generate list of rolling window feature names (eg: RunTime_norm_min_365D)
agg_features_cols = [f'{f}_{a}_{rolling_window}' for f, a in itertools.product(features, aggregates)]
# Add features to dataset
dataset[agg_features_cols] = rolling_result
# Keep track of generated feature names
feature_cols.extend(agg_features_cols)
# Replace missing values with 0
dataset.fillna(0, inplace=True)
display(dataset.head(8))
# Only keep data after 2018-12-01
model_df = dataset.reset_index()
feature_cols = np.unique(feature_cols).tolist()
model_df = model_df[model_df['date_dt'] >= '2018-12-01']
# This line was originally part of Bruno's tutorial, but we don't run it in this script
# model_df = model_df[['date_dt', 'FastTrack_RaceId', 'DogName', 'win', 'StartPrice_probability'] + feature_cols]
# Only train model off of races where each dog has a value for each feature
races_exclude = model_df[model_df.isnull().any(axis = 1)]['FastTrack_RaceId'].drop_duplicates()
model_df = model_df[~model_df['FastTrack_RaceId'].isin(races_exclude)]
Generate predictions
Now this is the part that gets a bit hairy, so I am going to split it up into two parts. The good thing is that the coding will remain relatively simple.
The two things that I want to do are place live bets and save our predictions so that we can use them in the simulator we will create in Part V.
Let's save our historical ratings for our simulator first, as it's quick and straightforward, and then move on to placing live bets:
Getting data ready for our simulator
Feeding our predictions through the simulator is entirely optional but, in my opinion, it is where the real sauce is made. The idea is that while we are testing our model live, we can also use the simulator to test what would happen with different staking methodologies, market timings and bet placement, to optimise our model. This way you can have one model but test out different strategies to optimise its performance. The thing is, I have had a play with the simulator already and we can't simulate market_catalogue unless you have recorded it yourself (which is what I'll be using to get market_id and selection_id to place live bets). The simulator we will use later on only takes your ratings, market_id and selection_id, so we need our data in a similar format to what we had in How to Automate III. In other words, since we don't have market_catalogue in the simulator, we need another way to get the market_id and selection_id.
My hacky workaround is to generate the probabilities like normal (since the data is historical, we don't need to deal with reserve dogs and scratchings), then get the market_id and selection_id from the Betfair datascience greyhound model by merging on DogName and date. We can take the code we wrote in How to Automate III that downloads the greyhound ratings and convert it into a function that downloads the ratings for a date range.
# Generate predictions like normal
# Range of dates that we want to simulate later '2022-03-01' to '2022-04-01'
todays_data = model_df[(model_df['date_dt'] >= pd.Timestamp('2022-03-01').strftime('%Y-%m-%d')) & (model_df['date_dt'] < pd.Timestamp('2022-04-01').strftime('%Y-%m-%d'))]
dog_win_probabilities = brunos_model.predict_proba(todays_data[feature_cols])[:,1]
todays_data['prob_LogisticRegression'] = dog_win_probabilities
todays_data['renormalise_prob'] = todays_data.groupby('FastTrack_RaceId')['prob_LogisticRegression'].apply(lambda x: x / x.sum())
todays_data['rating'] = 1/todays_data['renormalise_prob']
todays_data = todays_data.sort_values(by = 'date_dt')
todays_data
def download_iggy_ratings(date):
"""Downloads the Betfair Iggy model ratings for a given date and formats it into a nice DataFrame.
Args:
date (datetime): the date we want to download the ratings for
"""
iggy_url_1 = 'https://betfair-data-supplier-prod.herokuapp.com/api/widgets/iggy-joey/datasets?date='
iggy_url_2 = date.strftime("%Y-%m-%d")
iggy_url_3 = '&presenter=RatingsPresenter&csv=true'
iggy_url = iggy_url_1 + iggy_url_2 + iggy_url_3
# Download todays greyhounds ratings
iggy_df = pd.read_csv(iggy_url)
# Data cleaning
iggy_df = iggy_df.rename(
columns={
"meetings.races.bfExchangeMarketId":"market_id",
"meetings.races.runners.bfExchangeSelectionId":"selection_id",
"meetings.races.runners.ratedPrice":"rating",
"meetings.races.number":"RaceNum",
"meetings.name":"Track",
"meetings.races.runners.name":"DogName"
}
)
# iggy_df = iggy_df[['market_id','selection_id','rating']]
iggy_df['market_id'] = iggy_df['market_id'].astype(str)
iggy_df['date_dt'] = date
# Set market_id and selection_id as index for easy referencing
# iggy_df = iggy_df.set_index(['market_id','selection_id'])
return(iggy_df)
# Download historical ratings over a time period and convert into a big DataFrame.
back_test_period = pd.date_range(start='2022-03-01', end='2022-04-01')
frames = [download_iggy_ratings(day) for day in back_test_period]
iggy_df = pd.concat(frames)
iggy_df
# format DogNames to merge
todays_data['DogName'] = todays_data['DogName'].apply(lambda x: x.replace("'", "").replace(".", "").replace("Res", "").strip())
iggy_df['DogName'] = iggy_df['DogName'].str.upper()
# Merge
backtest = iggy_df[['market_id','selection_id','DogName','date_dt']].merge(todays_data[['rating','DogName','date_dt']], how = 'inner', on = ['DogName','date_dt'])
backtest
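# Save the merged ratings so we can feed them into the simulator in Part V (same lines as in the complete code below)
backtest.to_csv('backtest.csv', index=False)  # csv format
# backtest.to_pickle('backtest.pkl')  # pickle format (faster, but can't open in excel)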
Perfect, with our hacky solution we have managed to merge around a month's worth of data relatively quickly and save it in csv format. With all the merging it seems we have only lost around 1,000 - 2,000 rows of data out of 27,000, which seems a small price to pay.
Getting data ready for placing live bets
Placing live bets is pretty simple but we have one issue. FastTrack Data alone is unable to tell us how many greyhounds will run in the race. For example, this race later today (2022-07-04) has 8 runners + 2 reserves:
If we predict probabilities and renormalise now, we will calculate incorrect probabilities.
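To make that concrete, here's a toy sketch with made-up numbers showing how renormalising over all ten listed dogs (including the two reserves) dilutes the probabilities of the eight dogs that actually run:
import numpy as np
# Hypothetical raw model outputs for the 8 confirmed runners plus the 2 reserves (made-up numbers)
raw_probs = np.array([0.30, 0.20, 0.15, 0.12, 0.10, 0.08, 0.05, 0.04, 0.18, 0.16])
wrong = raw_probs / raw_probs.sum()            # renormalised over all 10 listed dogs
right = raw_probs[:8] / raw_probs[:8].sum()    # renormalised over the 8 dogs that actually run
print(round(wrong[:8].sum(), 2))  # ~0.75 - the real runners no longer sum to 1
print(round(right.sum(), 2))      # 1.0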
I've spent a really long time thinking about this and testing different methods that didn't work or weren't optimal. The best (and least complicated) solution I have come up with is to predict probabilities on the FastTrack data first. Then, a few minutes before the jump when all the line-ups have been confirmed, we use market_catalogue from the Betfair API to merge on our predicted probabilities, merging on DogName, Track and RaceNum. If we merge on these three fields, it will bypass any issues with reserve dogs and scratchings. Then we can renormalise the probabilities live within Flumine.
# Select todays data
todays_data = model_df[model_df['date_dt'] == pd.Timestamp.now().strftime('%Y-%m-%d')]
# Generate runner win predictions
dog_win_probabilities = brunos_model.predict_proba(todays_data[feature_cols])[:,1]
todays_data['prob_LogisticRegression'] = dog_win_probabilities
# We no longer renormalise the probabilities in this chunk of code, we do it in Flumine instead
# todays_data['renormalise_prob'] = todays_data.groupby('FastTrack_RaceId')['prob_LogisticRegression'].apply(lambda x: x / x.sum())
# todays_data['rating'] = 1/todays_data['renormalise_prob']
# todays_data = todays_data.sort_values(by = 'date_dt')
todays_data
Before we merge, let's make some minor formatting changes to the FastTrack names so we can match them onto the Betfair names. Betfair excludes all apostrophes and full stops in their naming convention, so we'll create a Betfair-equivalent dog name on the dataset by removing these characters. We also need to do this for the tracks; sometimes FastTrack will name tracks differently to Betfair, e.g. Sandown Park on Betfair is known as Sandown (SAP) in the FastTrack database.
# Prepare data for easy reference in flumine
todays_data['DogName_bf'] = todays_data['DogName'].apply(lambda x: x.replace("'", "").replace(".", "").replace("Res", "").strip())
todays_data.replace({r'Sandown \(SAP\)': 'Sandown Park'}, regex=True, inplace=True)  # escape the brackets so the regex matches the literal track name
todays_data = todays_data.set_index(['DogName_bf','Track','RaceNum'])
todays_data.head()
If you look closely at the dataframe above you might notice that reserve dogs have a Box number of 9 or 10. There is only ever a maximum of 8 greyhounds per race, so we will need to adjust this somehow. I didn't notice this issue for quite a while, but the good thing is the website gives us the info we need to adjust:
We can see that Rhinestone Ash is a reserve dog and has the number 9; if you click on rules, you can see which box it is starting from:
The problem is, my web scraping is pretty poor and it would take significant time for me to learn. But after going through the documentation again, changes to boxes are actually available through the API under the clarifications attribute of marketDescription. You will be able to access this within Flumine as market.market_catalogue.description.clarifications, but it's a bit weird. It returns box changes as a string that looks like the my_string example in Brett's code below.
Originally I had planned to leave this article as it is, since I've never worked with anything like this before and it's already getting pretty long. However, a huge shoutout to the Betfair Quants community, and especially Brett, who provided his solution to working with box changes.
from nltk.tokenize import regexp_tokenize
# my_string is an example string, that you will need to get live from the api via: market.market_catalogue.description.clarifications.replace("<br/> Dog","<br/>Dog")
my_string = "<br/>Box changes:<br/>Dog 9. Tralee Blaze starts from box no. 8<br/><br/>Dog 6. That Other One starts from box no. 2<br/><br/>"
print(f'HTML Comment: {my_string}')
pattern1 = r'(?<=<br/>Dog ).+?(?= starts)'
pattern2 = r"(?<=\bbox no. )(\w+)"
runners_df = pd.DataFrame (regexp_tokenize(my_string, pattern1), columns = ['runner_name'])
runners_df['runner_name'] = runners_df['runner_name'].astype(str)
# Remove dog name from runner_number
runners_df['runner_number'] = runners_df['runner_name'].apply(lambda x: x[:(x.find(" ") - 1)].upper())
# Remove dog number from runner_name
runners_df['runner_name'] = runners_df['runner_name'].apply(lambda x: x[(x.find(" ") + 1):].upper())
runners_df['Box'] = regexp_tokenize(my_string, pattern2)
runners_df
Brett's solution is amazing; there is only one problem. Currently our code is structured so that we generate our predictions in the morning, well before the races start. To implement the above fix, we need to generate our predictions just before each race starts so we can incorporate the box information.
This means we need to write a little bit more code to make it happen, but we are almost there.
So now my plan is to update the old data and generate probabilities just before the race. Just before the jump, my code structure will look like this:
- pull any data on box changes from the Betfair API
- convert the box change data into a dataframe named runners_df using Brett's code
- in my original dataframe named todays_data, replace any Box data with the runners_df data, otherwise leave it untouched
- then merge the box_win_percent dataframe back onto the todays_data dataframe
- now we can predict probabilities again and then renormalise them
It may sound a little complicated, but as we already have Brett's code there are only a few extra lines we need to write. This is what we will add into our Flumine strategy along with Brett's code:
# Running Brett's code gives us a nice dataframe named runners_df that we can work with
# Replace any old Box info in our original dataframe with the data available in runners_df
runners_df = runners_df.set_index('runner_name')
todays_data.loc[(runners_df.index[runners_df.index.isin(dog_names)],track,race_number),'Box'] = runners_df.loc[runners_df.index.isin(dog_names),'Box'].to_list()
# Merge box_win_percent back on (drop the stale column first so we don't end up with duplicate columns)
todays_data = todays_data.drop(columns = 'box_win_percent', axis = 1)
todays_data = todays_data.reset_index().merge(box_win_percent, on = ['Track', 'Distance','Box'], how = 'left').set_index(['DogName_bf','Track','RaceNum'])
# Generate probabilities using Bruno's model
todays_data.loc[(dog_names,track,race_number),'prob_LogisticRegression'] = brunos_model.predict_proba(todays_data.loc[(dog_names,track,race_number)][feature_cols])[:,1]
# Renormalise probabilities
probabilities = todays_data.loc[dog_names,track,race_number]['prob_LogisticRegression']
todays_data.loc[(dog_names,track,race_number),'renormalised_prob'] = probabilities/probabilities.sum()
# Convert probabilities to ratings
todays_data.loc[(dog_names,track,race_number),'rating'] = 1/todays_data.loc[dog_names,track,race_number]['renormalised_prob']
Now everything is done, and we can finally move on to placing our bets.
Automating our predictions
Now that we have our data nicely set up, we can reference our probabilities by getting the DogName, Track and RaceNum from the Betfair polling API and then renormalise the probabilities to calculate ratings with only a few lines of code. The rest is the same as How to Automate III.
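For reference, the lookup inside the strategy below boils down to a single MultiIndex .loc call; dog_name, track and race_number are the values we pull out of the polling API:
model_price = todays_data.loc[dog_name, track, race_number]['rating']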
# Import libraries for logging in
import betfairlightweight
from flumine import Flumine, clients
# Credentials to login and logging in
trading = betfairlightweight.APIClient('username','password',app_key='appkey')
client = clients.BetfairClient(trading, interactive_login=True)
# Login
framework = Flumine(client=client)
# Code to login when using security certificates
# trading = betfairlightweight.APIClient('username','password',app_key='appkey', certs=r'C:\Users\zhoui\openssl_certs')
# client = clients.BetfairClient(trading)
# framework = Flumine(client=client)
# Import libraries and logging
from flumine import BaseStrategy
from flumine.order.trade import Trade
from flumine.order.order import LimitOrder
from flumine.markets.market import Market
from betfairlightweight.filters import streaming_market_filter
from betfairlightweight.resources import MarketBook
import re
import pandas as pd
import numpy as np
import datetime
import logging
logging.basicConfig(filename = 'how_to_automate_4.log', level=logging.INFO, format='%(asctime)s:%(levelname)s:%(message)s')
Let's create a new class for our strategy called FlatBetting that finds the best available back and lay prices 60 seconds before the jump. If either of those prices has value, we will place a flat $5 bet at that price. This code is almost the same as How to Automate III.
Since we are now editing our todays_data dataframe inside our Flumine strategy, we will also need to convert todays_data to a global variable, which is a simple one-liner:
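# This is the one-liner - it is declared inside process_market_book in the strategy code that follows
global todays_data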
I also wanted to call out one gotcha that Brett found, which is almost impossible to spot unless you are keeping a close eye on your logs. Sometimes the polling API and streaming API don't match up when there are scratchings, so we need to check that they do:
class FlatBetting(BaseStrategy):
def start(self) -> None:
print("starting strategy 'FlatBetting' using the model we created in the Greyhound modelling in Python Tutorial")
def check_market_book(self, market: Market, market_book: MarketBook) -> bool:
if market_book.status != "CLOSED":
return True
def process_market_book(self, market: Market, market_book: MarketBook) -> None:
# Convert dataframe to a global variable
global todays_data
# At the 60 second mark:
if market.seconds_to_start < 60 and market_book.inplay == False:
# get the list of dog_names, name of the track/venue and race_number/RaceNum from Betfair Polling API
dog_names = []
track = market.market_catalogue.event.venue
race_number = market.market_catalogue.market_name.split(' ',1)[0] # comes out as R1/R2/R3 .. etc
race_number = re.sub("[^0-9]", "", race_number) # only keep the numbers
for runner_cata in market.market_catalogue.runners:
dog_name = runner_cata.runner_name.split(' ',1)[1].upper()
dog_names.append(dog_name)
# Check if there are box changes, if there are then use Brett's code
if market.market_catalogue.description.clarifications != None:
# Brett's code to get Box changes:
my_string = market.market_catalogue.description.clarifications.replace("<br/> Dog","<br/>Dog")
pattern1 = r'(?<=<br/>Dog ).+?(?= starts)'
pattern2 = r"(?<=\bbox no. )(\w+)"
runners_df = pd.DataFrame (regexp_tokenize(my_string, pattern1), columns = ['runner_name'])
runners_df['runner_name'] = runners_df['runner_name'].astype(str)
# Remove dog name from runner_number
runners_df['runner_number'] = runners_df['runner_name'].apply(lambda x: x[:(x.find(" ") - 1)].upper())
# Remove dog number from runner_name
runners_df['runner_name'] = runners_df['runner_name'].apply(lambda x: x[(x.find(" ") + 1):].upper())
runners_df['Box'] = regexp_tokenize(my_string, pattern2)
# Replace any old Box info in our original dataframe with data available in runners_df
runners_df = runners_df.set_index('runner_name')
todays_data.loc[(runners_df.index[runners_df.index.isin(dog_names)],track,race_number),'Box'] = runners_df.loc[runners_df.index.isin(dog_names),'Box'].to_list()
# Merge box_win_percent back on (drop the stale column first so we don't end up with duplicate columns):
todays_data = todays_data.drop(columns = 'box_win_percent', axis = 1)
todays_data = todays_data.reset_index().merge(box_win_percent, on = ['Track', 'Distance','Box'], how = 'left').set_index(['DogName_bf','Track','RaceNum'])
# Generate probabilities using Bruno's model
todays_data.loc[(dog_names,track,race_number),'prob_LogisticRegression'] = brunos_model.predict_proba(todays_data.loc[(dog_names,track,race_number)][feature_cols])[:,1]
# renormalise probabilities
probabilities = todays_data.loc[dog_names,track,race_number]['prob_LogisticRegression']
todays_data.loc[(dog_names,track,race_number),'renormalised_prob'] = probabilities/probabilities.sum()
# convert probabilities to ratings
todays_data.loc[(dog_names,track,race_number),'rating'] = 1/todays_data.loc[dog_names,track,race_number]['renormalised_prob']
# Use both the polling api (market.catalogue) and the streaming api at once:
for runner_cata, runner in zip(market.market_catalogue.runners, market_book.runners):
# Check the polling api and streaming api matches up (sometimes it doesn't)
if runner_cata.selection_id == runner.selection_id:
# Get the dog_name from polling api then reference our data for our model rating
dog_name = runner_cata.runner_name.split(' ',1)[1].upper()
# Rest is the same as How to Automate III
model_price = todays_data.loc[dog_name,track,race_number]['rating']
### If you have an issue such as:
# Unknown error The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
# Then do model_price = todays_data.loc[dog_name,track,race_number]['rating'].item()
# Log info before placing bets
logging.info(f'dog_name: {dog_name}')
logging.info(f'model_price: {model_price}')
logging.info(f'market_id: {market_book.market_id}')
logging.info(f'selection_id: {runner.selection_id}')
# If best available to back price is > rated price then flat $5 back
if runner.status == "ACTIVE" and runner.ex.available_to_back[0]['price'] > model_price:
trade = Trade(
market_id=market_book.market_id,
selection_id=runner.selection_id,
handicap=runner.handicap,
strategy=self,
)
order = trade.create_order(
side="BACK", order_type=LimitOrder(price=runner.ex.available_to_back[0]['price'], size=5.00)
)
market.place_order(order)
# If best available to lay price is < rated price then flat $5 lay
if runner.status == "ACTIVE" and runner.ex.available_to_lay[0]['price'] < model_price:
trade = Trade(
market_id=market_book.market_id,
selection_id=runner.selection_id,
handicap=runner.handicap,
strategy=self,
)
order = trade.create_order(
side="LAY", order_type=LimitOrder(price=runner.ex.available_to_lay[0]['price'], size=5.00)
)
market.place_order(order)
As the model we have built is a greyhound model for Australian racing, let's point our strategy at Australian greyhound win markets:
greyhounds_strategy = FlatBetting(
market_filter=streaming_market_filter(
event_type_ids=["4339"], # Greyhounds markets
country_codes=["AU"], # Australian markets
market_types=["WIN"], # Win markets
),
max_order_exposure= 50, # Max exposure per order = 50
max_trade_count=1, # Max 1 trade per selection
max_live_trade_count=1, # Max 1 unmatched trade per selection
)
framework.add_strategy(greyhounds_strategy)
And add our auto-terminate and bet logging from the previous tutorials:
# import logging
import datetime
from flumine.worker import BackgroundWorker
from flumine.events.events import TerminationEvent
# logger = logging.getLogger(__name__)
"""
Worker can be used as followed:
framework.add_worker(
BackgroundWorker(
framework,
terminate,
func_kwargs={"today_only": True, "seconds_closed": 1200},
interval=60,
start_delay=60,
)
)
This will run every 60s and will terminate
the framework if all markets starting 'today'
have been closed for at least 1200s
"""
# Function that stops automation running at the end of the day
def terminate(
context: dict, flumine, today_only: bool = True, seconds_closed: int = 600
) -> None:
"""terminate framework if no markets
live today.
"""
markets = list(flumine.markets.markets.values())
markets_today = [
m
for m in markets
if m.market_start_datetime.date() == datetime.datetime.utcnow().date()
and (
m.elapsed_seconds_closed is None
or (m.elapsed_seconds_closed and m.elapsed_seconds_closed < seconds_closed)
)
]
if today_only:
market_count = len(markets_today)
else:
market_count = len(markets)
if market_count == 0:
# logger.info("No more markets available, terminating framework")
flumine.handler_queue.put(TerminationEvent(flumine))
# Add the stopped to our framework
framework.add_worker(
BackgroundWorker(
framework,
terminate,
func_kwargs={"today_only": True, "seconds_closed": 1200},
interval=60,
start_delay=60,
)
)
import os
import csv
import logging
from flumine.controls.loggingcontrols import LoggingControl
from flumine.order.ordertype import OrderTypes
logger = logging.getLogger(__name__)
FIELDNAMES = [
"bet_id",
"strategy_name",
"market_id",
"selection_id",
"trade_id",
"date_time_placed",
"price",
"price_matched",
"size",
"size_matched",
"profit",
"side",
"elapsed_seconds_executable",
"order_status",
"market_note",
"trade_notes",
"order_notes",
]
class LiveLoggingControl(LoggingControl):
NAME = "BACKTEST_LOGGING_CONTROL"
def __init__(self, *args, **kwargs):
super(LiveLoggingControl, self).__init__(*args, **kwargs)
self._setup()
# Changed file path and checks if the file orders_hta_4.csv already exists; if it doesn't then create it
def _setup(self):
if os.path.exists("orders_hta_4.csv"):
logging.info("Results file exists")
else:
with open("orders_hta_4.csv", "w") as m:
csv_writer = csv.DictWriter(m, delimiter=",", fieldnames=FIELDNAMES)
csv_writer.writeheader()
def _process_cleared_orders_meta(self, event):
orders = event.event
with open("orders_hta_4.csv", "a") as m:
for order in orders:
if order.order_type.ORDER_TYPE == OrderTypes.LIMIT:
size = order.order_type.size
else:
size = order.order_type.liability
if order.order_type.ORDER_TYPE == OrderTypes.MARKET_ON_CLOSE:
price = None
else:
price = order.order_type.price
try:
order_data = {
"bet_id": order.bet_id,
"strategy_name": order.trade.strategy,
"market_id": order.market_id,
"selection_id": order.selection_id,
"trade_id": order.trade.id,
"date_time_placed": order.responses.date_time_placed,
"price": price,
"price_matched": order.average_price_matched,
"size": size,
"size_matched": order.size_matched,
"profit": 0 if not order.cleared_order else order.cleared_order.profit,
"side": order.side,
"elapsed_seconds_executable": order.elapsed_seconds_executable,
"order_status": order.status.value,
"market_note": order.trade.market_notes,
"trade_notes": order.trade.notes_str,
"order_notes": order.notes_str,
}
csv_writer = csv.DictWriter(m, delimiter=",", fieldnames=FIELDNAMES)
csv_writer.writerow(order_data)
except Exception as e:
logger.error(
"_process_cleared_orders_meta: %s" % e,
extra={"order": order, "error": e},
)
logger.info("Orders updated", extra={"order_count": len(orders)})
def _process_cleared_markets(self, event):
cleared_markets = event.event
for cleared_market in cleared_markets.orders:
logger.info(
"Cleared market",
extra={
"market_id": cleared_market.market_id,
"bet_count": cleared_market.bet_count,
"profit": cleared_market.profit,
"commission": cleared_market.commission,
},
)
framework.add_logging_control(
LiveLoggingControl()
)
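Finally, the only thing left to do is kick the framework off, just like in the previous tutorials in this series (assuming you want it to start as soon as you run the script):
framework.run()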
Conclusion and next steps
Boom! We now have an automated script that downloads all the data we need in the morning, generates a set of predictions, places flat stake bets, logs all bets and switches itself off at the end of the day. All we need to do is hit play in the morning!
We have now written automation code for three different strategies; however, we haven't actually backtested any of our strategies or models yet. So for the final part of the How to Automate series we will be writing code to simulate the Exchange so we can backtest and optimise our strategies. Make sure not to miss it, as this is where I believe the sauce is made (not that I have made significant sauce).
Complete code
Run the code from your IDE by using py <filename>.py, making sure you amend the path to point to your input data.
from joblib import load
import os
import sys
# Allow imports from src folder
module_path = os.path.abspath(os.path.join('../src'))
if module_path not in sys.path:
sys.path.append(module_path)
from datetime import datetime, timedelta
from dateutil.relativedelta import relativedelta
from dateutil import tz
from pandas.tseries.offsets import MonthEnd
from sklearn.preprocessing import MinMaxScaler
import itertools
import numpy as np
import pandas as pd
from nltk.tokenize import regexp_tokenize
# settings to display all columns
pd.set_option("display.max_columns", None)
import fasttrack as ft
from dotenv import load_dotenv
load_dotenv()
# Import libraries for logging in
import betfairlightweight
from flumine import Flumine, clients
# Import libraries and logging
from flumine import BaseStrategy
from flumine.order.trade import Trade
from flumine.order.order import LimitOrder
from flumine.markets.market import Market
from betfairlightweight.filters import streaming_market_filter
from betfairlightweight.resources import MarketBook
import re
import pandas as pd
import numpy as np
import datetime
import logging
logging.basicConfig(filename = 'how_to_automate_4.log', level=logging.INFO, format='%(asctime)s:%(levelname)s:%(message)s')
# import logging
from flumine.worker import BackgroundWorker
from flumine.events.events import TerminationEvent
import csv
from flumine.controls.loggingcontrols import LoggingControl
from flumine.order.ordertype import OrderTypes
logger = logging.getLogger(__name__)
brunos_model = load('logistic_regression.joblib')
brunos_model
# Validate FastTrack API connection
api_key = os.getenv('FAST_TRACK_API_KEY',)
client = ft.Fasttrack(api_key)
track_codes = client.listTracks()
# Import race data excluding NZ races
au_tracks_filter = list(track_codes[track_codes['state'] != 'NZ']['track_code'])
# Time window to import data
# First day of the month 46 months back from now
date_from = (datetime.today() - relativedelta(months=46)).replace(day=1).strftime('%Y-%m-%d')
# First day of previous month
date_to = (datetime.today() - relativedelta(months=1)).replace(day=1).strftime('%Y-%m-%d')
# Dataframes to populate data with
race_details = pd.DataFrame()
dog_results = pd.DataFrame()
# For each month, either fetch data from API or use local CSV file if we already have downloaded it
for start in pd.date_range(date_from, date_to, freq='MS'):
start_date = start.strftime("%Y-%m-%d")
end_date = (start + MonthEnd(1)).strftime("%Y-%m-%d")
try:
filename_races = f'FT_AU_RACES_{start_date}.csv'
filename_dogs = f'FT_AU_DOGS_{start_date}.csv'
filepath_races = f'../data/{filename_races}'
filepath_dogs = f'../data/{filename_dogs}'
print(f'Loading data from {start_date} to {end_date}')
if os.path.isfile(filepath_races):
# Load local CSV file
month_race_details = pd.read_csv(filepath_races)
month_dog_results = pd.read_csv(filepath_dogs)
else:
# Fetch data from API
month_race_details, month_dog_results = client.getRaceResults(start_date, end_date, au_tracks_filter)
month_race_details.to_csv(filepath_races, index=False)
month_dog_results.to_csv(filepath_dogs, index=False)
# Combine monthly data
race_details = race_details.append(month_race_details, ignore_index=True)
dog_results = dog_results.append(month_dog_results, ignore_index=True)
except:
print(f'Could not load data from {start_date} to {end_date}')
race_details.tail()
current_month_start_date = pd.Timestamp.now().replace(day=1).strftime("%Y-%m-%d")
current_month_end_date = (pd.Timestamp.now().replace(day=1)+ MonthEnd(1))
current_month_end_date = (current_month_end_date - pd.Timedelta('1 day')).strftime("%Y-%m-%d")
print(f'Start date: {current_month_start_date}')
print(f'End Date: {current_month_end_date}')
# Download data for races that have concluded this current month up until today
# Start and end dates for current month
current_month_start_date = pd.Timestamp.now().replace(day=1).strftime("%Y-%m-%d")
current_month_end_date = (pd.Timestamp.now().replace(day=1)+ MonthEnd(1))
current_month_end_date = (current_month_end_date - pd.Timedelta('1 day')).strftime("%Y-%m-%d")
# Files names
filename_races = f'FT_AU_RACES_{current_month_start_date}.csv'
filename_dogs = f'FT_AU_DOGS_{current_month_start_date}.csv'
# Where to store files locally
filepath_races = f'../data/{filename_races}'
filepath_dogs = f'../data/{filename_dogs}'
# Fetch data from API
month_race_details, month_dog_results = client.getRaceResults(current_month_start_date, current_month_end_date, au_tracks_filter)
# Save the files locally and replace any out of date fields
month_race_details.to_csv(filepath_races, index=False)
month_dog_results.to_csv(filepath_dogs, index=False)
dog_results
# This is super important: I spent literally hours before I found out this was causing errors
dog_results['@id'] = pd.to_numeric(dog_results['@id'])
# Append the extra data to our data frames
race_details = race_details.append(month_race_details, ignore_index=True)
dog_results = dog_results.append(month_dog_results, ignore_index=True)
# Download the data for todays races
todays_date = pd.Timestamp.now().strftime("%Y-%m-%d")
todays_races, todays_dogs = client.getFullFormat(dt= todays_date, tracks = au_tracks_filter)
# display is for ipython notebooks only
# display(todays_races.head(1), todays_dogs.head(1))
# It seems that the todays_races dataframe doesn't have the date column, so let's add that on
todays_races['date'] = pd.Timestamp.now().strftime('%d %b %y')
todays_races.head(1)
# It also seems that in todays_dogs dataframe Box is labeled as RaceBox instead, so let's rename it
# We can also see that there are some specific dogs that have "Res." as a suffix of their name, i.e. they are reserve dogs,
# We will treat this later
todays_dogs = todays_dogs.rename(columns={"RaceBox":"Box"})
todays_dogs.tail(3)
# Appending todays data to this months data
month_dog_results = pd.concat([month_dog_results,todays_dogs],join='outer')[month_dog_results.columns]
month_race_details = pd.concat([month_race_details,todays_races],join='outer')[month_race_details.columns]
# Appending this months data to the rest of our historical data
race_details = race_details.append(month_race_details, ignore_index=True)
dog_results = dog_results.append(month_dog_results, ignore_index=True)
race_details
## Cleanse and normalise the data
# Clean up the race dataset
race_details = race_details.rename(columns = {'@id': 'FastTrack_RaceId'})
race_details['Distance'] = race_details['Distance'].apply(lambda x: int(x.replace("m", "")))
race_details['date_dt'] = pd.to_datetime(race_details['date'], format = '%d %b %y')
# Clean up the dogs results dataset
dog_results = dog_results.rename(columns = {'@id': 'FastTrack_DogId', 'RaceId': 'FastTrack_RaceId'})
# New line of code (rest of this code chunk is copied from bruno's code)
dog_results['FastTrack_DogId'] = pd.to_numeric(dog_results['FastTrack_DogId'])
# Combine dogs results with race attributes
dog_results = dog_results.merge(
race_details,
how = 'left',
on = 'FastTrack_RaceId'
)
# Convert StartPrice to probability
dog_results['StartPrice'] = dog_results['StartPrice'].apply(lambda x: None if x is None else float(x.replace('$', '').replace('F', '')) if isinstance(x, str) else x)
dog_results['StartPrice_probability'] = (1 / dog_results['StartPrice']).fillna(0)
dog_results['StartPrice_probability'] = dog_results.groupby('FastTrack_RaceId')['StartPrice_probability'].apply(lambda x: x / x.sum())
# Discard entries without results (scratched or did not finish)
dog_results = dog_results[~dog_results['Box'].isnull()]
dog_results['Box'] = dog_results['Box'].astype(int)
# Clean up other attributes
dog_results['RunTime'] = dog_results['RunTime'].astype(float)
dog_results['SplitMargin'] = dog_results['SplitMargin'].astype(float)
dog_results['Prizemoney'] = dog_results['Prizemoney'].astype(float).fillna(0)
dog_results['Place'] = pd.to_numeric(dog_results['Place'].apply(lambda x: x.replace("=", "") if isinstance(x, str) else 0), errors='coerce').fillna(0)
dog_results['win'] = dog_results['Place'].apply(lambda x: 1 if x == 1 else 0)
# Normalise some of the raw values
dog_results['Prizemoney_norm'] = np.log10(dog_results['Prizemoney'] + 1) / 12
dog_results['Place_inv'] = (1 / dog_results['Place']).fillna(0)
dog_results['Place_log'] = np.log10(dog_results['Place'] + 1).fillna(0)
dog_results['RunSpeed'] = (dog_results['RunTime'] / dog_results['Distance']).fillna(0)
## Generate features using raw data
# Calculate median winner time per track/distance
win_results = dog_results[dog_results['win'] == 1]
median_win_time = pd.DataFrame(data=win_results[win_results['RunTime'] > 0].groupby(['Track', 'Distance'])['RunTime'].median()).rename(columns={"RunTime": "RunTime_median"}).reset_index()
median_win_split_time = pd.DataFrame(data=win_results[win_results['SplitMargin'] > 0].groupby(['Track', 'Distance'])['SplitMargin'].median()).rename(columns={"SplitMargin": "SplitMargin_median"}).reset_index()
median_win_time.head()
# Calculate track speed index
median_win_time['speed_index'] = (median_win_time['RunTime_median'] / median_win_time['Distance'])
median_win_time['speed_index'] = MinMaxScaler().fit_transform(median_win_time[['speed_index']])
median_win_time.head()
# Compare dogs finish time with median winner time
dog_results = dog_results.merge(median_win_time, on=['Track', 'Distance'], how='left')
dog_results = dog_results.merge(median_win_split_time, on=['Track', 'Distance'], how='left')
# Normalise time comparison
dog_results['RunTime_norm'] = (dog_results['RunTime_median'] / dog_results['RunTime']).clip(0.9, 1.1)
dog_results['RunTime_norm'] = MinMaxScaler().fit_transform(dog_results[['RunTime_norm']])
dog_results['SplitMargin_norm'] = (dog_results['SplitMargin_median'] / dog_results['SplitMargin']).clip(0.9, 1.1)
dog_results['SplitMargin_norm'] = MinMaxScaler().fit_transform(dog_results[['SplitMargin_norm']])
dog_results.head()
# Calculate box winning percentage for each track/distance
box_win_percent = pd.DataFrame(data=dog_results.groupby(['Track', 'Distance', 'Box'])['win'].mean()).rename(columns={"win": "box_win_percent"}).reset_index()
# Add to dog results dataframe
dog_results = dog_results.merge(box_win_percent, on=['Track', 'Distance', 'Box'], how='left')
# Display example of barrier winning probabilities
print(box_win_percent.head(8))
dog_results[dog_results['FastTrack_DogId'] == 592253143].tail()[['date_dt','Place','DogName','RaceNum','Track','Distance','win','Prizemoney_norm','Place_inv','Place_log']]
# Generate rolling window features
dataset = dog_results.copy()
dataset = dataset.set_index(['FastTrack_DogId', 'date_dt']).sort_index()
# Use rolling window of 28, 91 and 365 days
rolling_windows = ['28D', '91D', '365D']
# Features to use for rolling windows calculation
features = ['RunTime_norm', 'SplitMargin_norm', 'Place_inv', 'Place_log', 'Prizemoney_norm']
# Aggregation functions to apply
aggregates = ['min', 'max', 'mean', 'median', 'std']
# Keep track of generated feature names
feature_cols = ['speed_index', 'box_win_percent']
for rolling_window in rolling_windows:
print(f'Processing rolling window {rolling_window}')
rolling_result = (
dataset
.reset_index(level=0).sort_index()
.groupby('FastTrack_DogId')[features]
.rolling(rolling_window)
.agg(aggregates)
.groupby(level=0) # Thanks to Brett for finding this!
.shift(1)
)
# My own dodgy code to work with reserve dogs
temp = rolling_result.reset_index()
temp = temp[temp['date_dt'] == pd.Timestamp.now().normalize()]
temp.groupby(['FastTrack_DogId','date_dt']).first()
rolling_result.loc[pd.IndexSlice[:, pd.Timestamp.now().normalize()], :] = temp.groupby(['FastTrack_DogId','date_dt']).first()
# Generate list of rolling window feature names (eg: RunTime_norm_min_365D)
agg_features_cols = [f'{f}_{a}_{rolling_window}' for f, a in itertools.product(features, aggregates)]
# Add features to dataset
dataset[agg_features_cols] = rolling_result
# Keep track of generated feature names
feature_cols.extend(agg_features_cols)
# Replace missing values with 0
dataset.fillna(0, inplace=True)
# display(dataset.head(8)) # display is only for ipython notebooks
# Only keep data after 2018-12-01
model_df = dataset.reset_index()
feature_cols = np.unique(feature_cols).tolist()
model_df = model_df[model_df['date_dt'] >= '2018-12-01']
# This line was originally part of Bruno's tutorial, but we don't run it in this script
# model_df = model_df[['date_dt', 'FastTrack_RaceId', 'DogName', 'win', 'StartPrice_probability'] + feature_cols]
# Only train model off of races where each dog has a value for each feature
races_exclude = model_df[model_df.isnull().any(axis = 1)]['FastTrack_RaceId'].drop_duplicates()
model_df = model_df[~model_df['FastTrack_RaceId'].isin(races_exclude)]
# Generate predictions like normal
# Range of dates that we want to simulate later '2022-03-01' to '2022-04-01'
todays_data = model_df[(model_df['date_dt'] >= pd.Timestamp('2022-03-01').strftime('%Y-%m-%d')) & (model_df['date_dt'] < pd.Timestamp('2022-04-01').strftime('%Y-%m-%d'))]
dog_win_probabilities = brunos_model.predict_proba(todays_data[feature_cols])[:,1]
todays_data['prob_LogisticRegression'] = dog_win_probabilities
todays_data['renormalise_prob'] = todays_data.groupby('FastTrack_RaceId')['prob_LogisticRegression'].apply(lambda x: x / x.sum())
todays_data['rating'] = 1/todays_data['renormalise_prob']
todays_data = todays_data.sort_values(by = 'date_dt')
todays_data
def download_iggy_ratings(date):
"""Downloads the Betfair Iggy model ratings for a given date and formats it into a nice DataFrame.
Args:
date (datetime): the date we want to download the ratings for
"""
iggy_url_1 = 'https://betfair-data-supplier-prod.herokuapp.com/api/widgets/iggy-joey/datasets?date='
iggy_url_2 = date.strftime("%Y-%m-%d")
iggy_url_3 = '&presenter=RatingsPresenter&csv=true'
iggy_url = iggy_url_1 + iggy_url_2 + iggy_url_3
# Download todays greyhounds ratings
iggy_df = pd.read_csv(iggy_url)
# Data cleaning
iggy_df = iggy_df.rename(
columns={
"meetings.races.bfExchangeMarketId":"market_id",
"meetings.races.runners.bfExchangeSelectionId":"selection_id",
"meetings.races.runners.ratedPrice":"rating",
"meetings.races.number":"RaceNum",
"meetings.name":"Track",
"meetings.races.runners.name":"DogName"
}
)
# iggy_df = iggy_df[['market_id','selection_id','rating']]
iggy_df['market_id'] = iggy_df['market_id'].astype(str)
iggy_df['date_dt'] = date
# Set market_id and selection_id as index for easy referencing
# iggy_df = iggy_df.set_index(['market_id','selection_id'])
return(iggy_df)
# Download historical ratings over a time period and convert into a big DataFrame.
back_test_period = pd.date_range(start='2022-03-01', end='2022-04-01')
frames = [download_iggy_ratings(day) for day in back_test_period]
iggy_df = pd.concat(frames)
iggy_df
# format DogNames to merge
todays_data['DogName'] = todays_data['DogName'].apply(lambda x: x.replace("'", "").replace(".", "").replace("Res", "").strip())
iggy_df['DogName'] = iggy_df['DogName'].str.upper()
# Merge
backtest = iggy_df[['market_id','selection_id','DogName','date_dt']].merge(todays_data[['rating','DogName','date_dt']], how = 'inner', on = ['DogName','date_dt'])
backtest
# Save predictions for if we want to backtest/simulate it later
backtest.to_csv('backtest.csv', index=False) # Csv format
# backtest.to_pickle('backtest.pkl') # pickle format (faster, but can't open in excel)
todays_data[todays_data['FastTrack_RaceId'] == '798906744']
# Select todays data
todays_data = model_df[model_df['date_dt'] == pd.Timestamp.now().strftime('%Y-%m-%d')]
# Generate runner win predictions
dog_win_probabilities = brunos_model.predict_proba(todays_data[feature_cols])[:,1]
todays_data['prob_LogisticRegression'] = dog_win_probabilities
# We no longer renormalise the probabilities in this chunk of code, we do it in Flumine instead
# todays_data['renormalise_prob'] = todays_data.groupby('FastTrack_RaceId')['prob_LogisticRegression'].apply(lambda x: x / x.sum())
# todays_data['rating'] = 1/todays_data['renormalise_prob']
# todays_data = todays_data.sort_values(by = 'date_dt')
todays_data
# Prepare data for easy reference in flumine
todays_data['DogName_bf'] = todays_data['DogName'].apply(lambda x: x.replace("'", "").replace(".", "").replace("Res", "").strip())
todays_data.replace({r'Sandown \(SAP\)': 'Sandown Park'}, regex=True, inplace=True)  # escape the brackets so the regex matches the literal track name
todays_data = todays_data.set_index(['DogName_bf','Track','RaceNum'])
todays_data.head()
# Credentials to login and logging in
trading = betfairlightweight.APIClient('username','password',app_key='appkey')
client = clients.BetfairClient(trading, interactive_login=True)
# Login
framework = Flumine(client=client)
# Code to login when using security certificates
# trading = betfairlightweight.APIClient('username','password',app_key='appkey', certs=r'C:\Users\zhoui\openssl_certs')
# client = clients.BetfairClient(trading)
# framework = Flumine(client=client)
class FlatBetting(BaseStrategy):
def start(self) -> None:
print("starting strategy 'FlatBetting' using the model we created in the Greyhound modelling in Python Tutorial")
def check_market_book(self, market: Market, market_book: MarketBook) -> bool:
if market_book.status != "CLOSED":
return True
def process_market_book(self, market: Market, market_book: MarketBook) -> None:
# Convert dataframe to a global variable
global todays_data
# At the 60 second mark:
if market.seconds_to_start < 60 and market_book.inplay == False:
# get the list of dog_names, name of the track/venue and race_number/RaceNum from Betfair Polling API
dog_names = []
track = market.market_catalogue.event.venue
race_number = market.market_catalogue.market_name.split(' ',1)[0] # comes out as R1/R2/R3 .. etc
race_number = re.sub("[^0-9]", "", race_number) # only keep the numbers
for runner_cata in market.market_catalogue.runners:
dog_name = runner_cata.runner_name.split(' ',1)[1].upper()
dog_names.append(dog_name)
# Check if there are box changes; if there are, use Brett's code
if market.market_catalogue.description.clarifications is not None:
# Brett's code to get Box changes:
my_string = market.market_catalogue.description.clarifications.replace("<br/> Dog","<br/>Dog")
pattern1 = r'(?<=<br/>Dog ).+?(?= starts)'
pattern2 = r"(?<=\bbox no. )(\w+)"
runners_df = pd.DataFrame(regexp_tokenize(my_string, pattern1), columns=['runner_name'])
runners_df['runner_name'] = runners_df['runner_name'].astype(str)
# Keep only the runner number (dropping the dog name and the trailing '.')
runners_df['runner_number'] = runners_df['runner_name'].apply(lambda x: x[:(x.find(" ") - 1)].upper())
# Keep only the dog name (dropping the leading number)
runners_df['runner_name'] = runners_df['runner_name'].apply(lambda x: x[(x.find(" ") + 1):].upper())
runners_df['Box'] = regexp_tokenize(my_string, pattern2)
# Replace any old Box info in our original dataframe with data available in runners_df
runners_df = runners_df.set_index('runner_name')
todays_data.loc[(runners_df.index[runners_df.index.isin(dog_names)],track,race_number),'Box'] = runners_df.loc[runners_df.index.isin(dog_names),'Box'].to_list()
# Merge box_win_percentage back on:
todays_data = todays_data.drop(columns = 'box_win_percentage')
todays_data = todays_data.reset_index().merge(box_win_percent, on = ['Track', 'Distance','Box'], how = 'left').set_index(['DogName_bf','Track','RaceNum'])
# Generate probabilities using Bruno's model
todays_data.loc[(dog_names,track,race_number),'prob_LogisticRegression'] = brunos_model.predict_proba(todays_data.loc[(dog_names,track,race_number)][feature_cols])[:,1]
# Renormalise probabilities
probabilities = todays_data.loc[dog_names,track,race_number]['prob_LogisticRegression']
todays_data.loc[(dog_names,track,race_number),'renormalised_prob'] = probabilities/probabilities.sum()
# Convert probabilities to ratings
todays_data.loc[(dog_names,track,race_number),'rating'] = 1/todays_data.loc[dog_names,track,race_number]['renormalised_prob']
# Use both the polling API (market.market_catalogue) and the streaming API at once:
for runner_cata, runner in zip(market.market_catalogue.runners, market_book.runners):
# Check the polling api and streaming api matches up (sometimes it doesn't)
if runner_cata.selection_id == runner.selection_id:
# Get the dog_name from polling api then reference our data for our model rating
dog_name = runner_cata.runner_name.split(' ',1)[1].upper()
# Rest is the same as How to Automate III
model_price = todays_data.loc[dog_name,track,race_number]['rating']
### If you run into an error such as:
# Unknown error The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
# then use: model_price = todays_data.loc[dog_name,track,race_number]['rating'].item()
# Log info before placing bets
logging.info(f'dog_name: {dog_name}')
logging.info(f'model_price: {model_price}')
logging.info(f'market_id: {market_book.market_id}')
logging.info(f'selection_id: {runner.selection_id}')
# If best available to back price is > rated price then flat $5 back
if runner.status == "ACTIVE" and runner.ex.available_to_back[0]['price'] > model_price:
trade = Trade(
market_id=market_book.market_id,
selection_id=runner.selection_id,
handicap=runner.handicap,
strategy=self,
)
order = trade.create_order(
side="BACK", order_type=LimitOrder(price=runner.ex.available_to_back[0]['price'], size=5.00)
)
market.place_order(order)
# If best available to lay price is < rated price then flat $5 lay
if runner.status == "ACTIVE" and runner.ex.available_to_lay[0]['price'] < model_price:
trade = Trade(
market_id=market_book.market_id,
selection_id=runner.selection_id,
handicap=runner.handicap,
strategy=self,
)
order = trade.create_order(
side="LAY", order_type=LimitOrder(price=runner.ex.available_to_lay[0]['price'], size=5.00)
)
market.place_order(order)
greyhounds_strategy = FlatBetting(
market_filter=streaming_market_filter(
event_type_ids=["4339"], # Greyhounds markets
country_codes=["AU"], # Australian markets
market_types=["WIN"], # Win markets
),
max_order_exposure=50, # Max exposure per order = 50
max_trade_count=1, # Max 1 trade per selection
max_live_trade_count=1, # Max 1 unmatched trade per selection
)
framework.add_strategy(greyhounds_strategy)
# logger = logging.getLogger(__name__)
"""
Worker can be used as follows:
framework.add_worker(
BackgroundWorker(
framework,
terminate,
func_kwargs={"today_only": True, "seconds_closed": 1200},
interval=60,
start_delay=60,
)
)
This will run every 60s and will terminate
the framework if all markets starting 'today'
have been closed for at least 1200s
"""
# Function that stops automation running at the end of the day
def terminate(
context: dict, flumine, today_only: bool = True, seconds_closed: int = 600
) -> None:
"""terminate framework if no markets
live today.
"""
markets = list(flumine.markets.markets.values())
markets_today = [
m
for m in markets
if m.market_start_datetime.date() == datetime.datetime.utcnow().date()
and (
m.elapsed_seconds_closed is None
or (m.elapsed_seconds_closed and m.elapsed_seconds_closed < seconds_closed)
)
]
if today_only:
market_count = len(markets_today)
else:
market_count = len(markets)
if market_count == 0:
# logger.info("No more markets available, terminating framework")
flumine.handler_queue.put(TerminationEvent(flumine))
# Add the terminate worker to our framework
framework.add_worker(
BackgroundWorker(
framework,
terminate,
func_kwargs={"today_only": True, "seconds_closed": 1200},
interval=60,
start_delay=60,
)
)
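The strategy above writes a few lines to the log before placing each bet, and the logging control below also logs cleared orders and markets. If you haven't already configured logging earlier in the notebook, a minimal sketch that sends those messages to a file (the file name here is just an example) looks like this:
# A minimal sketch: basic logging config so the logging.info/logger.info calls end up in a file
# 'how_to_automate_4.log' is an example file name - use whatever you prefer
import logging
logging.basicConfig(filename='how_to_automate_4.log', level=logging.INFO, format='%(asctime)s:%(levelname)s:%(message)s')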
logger = logging.getLogger(__name__)
FIELDNAMES = [
"bet_id",
"strategy_name",
"market_id",
"selection_id",
"trade_id",
"date_time_placed",
"price",
"price_matched",
"size",
"size_matched",
"profit",
"side",
"elapsed_seconds_executable",
"order_status",
"market_note",
"trade_notes",
"order_notes",
]
class LiveLoggingControl(LoggingControl):
NAME = "BACKTEST_LOGGING_CONTROL"
def __init__(self, *args, **kwargs):
super(LiveLoggingControl, self).__init__(*args, **kwargs)
self._setup()
# Changed the file path; checks if the file orders_hta_4.csv already exists and, if it doesn't, creates it
def _setup(self):
if os.path.exists("orders_hta_4.csv"):
logging.info("Results file exists")
else:
with open("orders_hta_4.csv", "w") as m:
csv_writer = csv.DictWriter(m, delimiter=",", fieldnames=FIELDNAMES)
csv_writer.writeheader()
def _process_cleared_orders_meta(self, event):
orders = event.event
with open("orders_hta_4.csv", "a") as m:
for order in orders:
if order.order_type.ORDER_TYPE == OrderTypes.LIMIT:
size = order.order_type.size
else:
size = order.order_type.liability
if order.order_type.ORDER_TYPE == OrderTypes.MARKET_ON_CLOSE:
price = None
else:
price = order.order_type.price
try:
order_data = {
"bet_id": order.bet_id,
"strategy_name": order.trade.strategy,
"market_id": order.market_id,
"selection_id": order.selection_id,
"trade_id": order.trade.id,
"date_time_placed": order.responses.date_time_placed,
"price": price,
"price_matched": order.average_price_matched,
"size": size,
"size_matched": order.size_matched,
"profit": 0 if not order.cleared_order else order.cleared_order.profit,
"side": order.side,
"elapsed_seconds_executable": order.elapsed_seconds_executable,
"order_status": order.status.value,
"market_note": order.trade.market_notes,
"trade_notes": order.trade.notes_str,
"order_notes": order.notes_str,
}
csv_writer = csv.DictWriter(m, delimiter=",", fieldnames=FIELDNAMES)
csv_writer.writerow(order_data)
except Exception as e:
logger.error(
"_process_cleared_orders_meta: %s" % e,
extra={"order": order, "error": e},
)
logger.info("Orders updated", extra={"order_count": len(orders)})
def _process_cleared_markets(self, event):
cleared_markets = event.event
for cleared_market in cleared_markets.orders:
logger.info(
"Cleared market",
extra={
"market_id": cleared_market.market_id,
"bet_count": cleared_market.bet_count,
"profit": cleared_market.profit,
"commission": cleared_market.commission,
},
)
framework.add_logging_control(
LiveLoggingControl()
)
framework.run()
Disclaimer
Note that whilst models and automated strategies are fun and rewarding to create, we can't promise that your model or betting strategy will be profitable, and we make no representations in relation to the code shared or information on this page. If you're using this code or implementing your own strategies, you do so entirely at your own risk and you are responsible for any winnings/losses incurred. Under no circumstances will Betfair be liable for any loss or damage you suffer.