
How to Automate IV: Automate your own Model


This is an archived version of How to Automate IV; the latest version is available here.

For this tutorial we will be automating the model that Bruno taught us how to make in the Greyhound Modelling Tutorial. This tutorial follows on logically from How to Automate III. If you haven't already, make sure you take a look at the rest of the series before continuing here, as it covers some key concepts!


Saving and loading our model

To generate our predictions, we have two options: we can generate the predictions in the same notebook we used to train our model and then read those predictions into this notebook, or we can save the trained model and load that model into this notebook.

For this tutorial we have chosen to save the model, as it is a bit less confusing and easier to manage, although there are some pieces of code we may have to write twice (copy and paste). So first we will need to run the code from the tutorial and then save the model. This is super simple, as we can just copy and paste the complete code provided at the end of that tutorial or download it from Github. Then we can run this extra line of code (which I have copied from the documentation page) at the end of the notebook to save the model:

from joblib import dump
dump(models['LogisticRegression'], 'logistic_regression.joblib')

Now that the file is saved, let's read it into this notebook:

from joblib import load

brunos_model = load('logistic_regression.joblib')
brunos_model
LogisticRegression(n_jobs=-1, solver='saga')

Generating predictions for today

Now that we have the model loaded in, we need data so we can generate our predictions for today's races!

# Import libraries required to download today's races
import os
import sys

# Allow imports from src folder
module_path = os.path.abspath(os.path.join('../src'))
if module_path not in sys.path:
    sys.path.append(module_path)

from datetime import datetime, timedelta
from dateutil.relativedelta import relativedelta
from dateutil import tz
from pandas.tseries.offsets import MonthEnd
from sklearn.preprocessing import MinMaxScaler
import itertools

import numpy as np
import pandas as pd
from nltk.tokenize import regexp_tokenize

# settings to display all columns
pd.set_option("display.max_columns", None)

import fasttrack as ft

from dotenv import load_dotenv
load_dotenv()
True
# Validate FastTrack API connection
api_key = os.getenv('FAST_TRACK_API_KEY')
client = ft.Fasttrack(api_key)
track_codes = client.listTracks()
Valid Security Key
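If you haven't used python-dotenv before, the load_dotenv() call above simply reads key-value pairs from a .env file in your working directory into environment variables. As a minimal sketch, the .env file only needs one line (the key name matches what os.getenv looks up above; the value is a placeholder for your own FastTrack key):

FAST_TRACK_API_KEY=your_fasttrack_api_key_here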

# Import race data excluding NZ races
au_tracks_filter = list(track_codes[track_codes['state'] != 'NZ']['track_code'])

# Time window to import data
# First day of the month 46 months back from now
date_from = (datetime.today() - relativedelta(months=46)).replace(day=1).strftime('%Y-%m-%d')
# First day of previous month
date_to = (datetime.today() - relativedelta(months=1)).replace(day=1).strftime('%Y-%m-%d')

# Dataframes to populate data with
race_details = pd.DataFrame()
dog_results = pd.DataFrame()

# For each month, use the local CSV file if we have already downloaded it, otherwise fetch the data from the API
for start in pd.date_range(date_from, date_to, freq='MS'):
    start_date = start.strftime("%Y-%m-%d")
    end_date = (start + MonthEnd(1)).strftime("%Y-%m-%d")
    try:
        filename_races = f'FT_AU_RACES_{start_date}.csv'
        filename_dogs = f'FT_AU_DOGS_{start_date}.csv'

        filepath_races = f'../data/{filename_races}'
        filepath_dogs = f'../data/{filename_dogs}'

        print(f'Loading data from {start_date} to {end_date}')
        if os.path.isfile(filepath_races):
            # Load local CSV file
            month_race_details = pd.read_csv(filepath_races) 
            month_dog_results = pd.read_csv(filepath_dogs) 
        else:
            # Fetch data from API
            month_race_details, month_dog_results = client.getRaceResults(start_date, end_date, au_tracks_filter)
            month_race_details.to_csv(filepath_races, index=False)
            month_dog_results.to_csv(filepath_dogs, index=False)

        # Combine monthly data
        race_details = race_details.append(month_race_details, ignore_index=True)
        dog_results = dog_results.append(month_dog_results, ignore_index=True)
    except Exception:
        print(f'Could not load data from {start_date} to {end_date}')
Loading data from 2018-09-01 to 2018-09-30
Loading data from 2018-10-01 to 2018-10-31
Loading data from 2018-11-01 to 2018-11-30
Loading data from 2018-12-01 to 2018-12-31
Loading data from 2019-01-01 to 2019-01-31
Loading data from 2019-02-01 to 2019-02-28
Loading data from 2019-03-01 to 2019-03-31
Loading data from 2019-04-01 to 2019-04-30
Loading data from 2019-05-01 to 2019-05-31
Loading data from 2019-06-01 to 2019-06-30
Loading data from 2019-07-01 to 2019-07-31
Loading data from 2019-08-01 to 2019-08-31
Loading data from 2019-09-01 to 2019-09-30
Loading data from 2019-10-01 to 2019-10-31
Loading data from 2019-11-01 to 2019-11-30
Loading data from 2019-12-01 to 2019-12-31
Loading data from 2020-01-01 to 2020-01-31
Loading data from 2020-02-01 to 2020-02-29
Loading data from 2020-03-01 to 2020-03-31
Loading data from 2020-04-01 to 2020-04-30
Loading data from 2020-05-01 to 2020-05-31
Loading data from 2020-06-01 to 2020-06-30
Loading data from 2020-07-01 to 2020-07-31
Loading data from 2020-08-01 to 2020-08-31
Loading data from 2020-09-01 to 2020-09-30
Loading data from 2020-10-01 to 2020-10-31
Loading data from 2020-11-01 to 2020-11-30
Loading data from 2020-12-01 to 2020-12-31
Loading data from 2021-01-01 to 2021-01-31
Loading data from 2021-02-01 to 2021-02-28
Loading data from 2021-03-01 to 2021-03-31
Loading data from 2021-04-01 to 2021-04-30
Loading data from 2021-05-01 to 2021-05-31
Loading data from 2021-06-01 to 2021-06-30
Loading data from 2021-07-01 to 2021-07-31
Loading data from 2021-08-01 to 2021-08-31
Loading data from 2021-09-01 to 2021-09-30
Loading data from 2021-10-01 to 2021-10-31
Loading data from 2021-11-01 to 2021-11-30
Loading data from 2021-12-01 to 2021-12-31
Loading data from 2022-01-01 to 2022-01-31

c:\Users\zhoui\greyhounds_bruno\greyhound-modelling\venv_greyhounds\lib\site-packages\IPython\core\interactiveshell.py:3441: DtypeWarning: Columns (10) have mixed types.Specify dtype option on import or set low_memory=False.
  exec(code_obj, self.user_global_ns, self.user_ns)

Loading data from 2022-02-01 to 2022-02-28
Loading data from 2022-03-01 to 2022-03-31
Loading data from 2022-04-01 to 2022-04-30
Loading data from 2022-05-01 to 2022-05-31
Loading data from 2022-06-01 to 2022-06-30

This piece of code we copied and pasted from the Greyhound Modelling Tutorial is fantastic! It has downloaded (or read in) a ton of historic data. There is an issue though: we don't have the data for today's races, or for any races that have occurred this month, because the code above only downloads data up until the end of last month.

For example, if we are in the middle of June, any races run in the first two weeks of June won't be downloaded by the chunk of code above. And if we download them now, then when tomorrow rolls around our saved file still won't include the extra races that finished today.

So, the simple but inefficient solution is to redownload, every single day, all the races that have already concluded this month. (Ideally you would have some sort of database set up, or you would store and download your data in a daily format instead of a monthly format; see the sketch below.)
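If you did want to go down the daily-format route, a minimal sketch is below. It assumes the same client and au_tracks_filter objects created above, that getRaceResults accepts a single-day date range, and it caches one pair of CSV files per day so only missing days hit the API:

# A minimal sketch of daily (rather than monthly) caching - assumes the
# `client` and `au_tracks_filter` objects created earlier, and that
# getRaceResults accepts a single-day date range
for day in pd.date_range(date_to, pd.Timestamp.now().normalize(), freq='D'):
    day_str = day.strftime('%Y-%m-%d')
    filepath_races = f'../data/FT_AU_RACES_{day_str}.csv'
    filepath_dogs = f'../data/FT_AU_DOGS_{day_str}.csv'
    if not os.path.isfile(filepath_races):
        day_races, day_dogs = client.getRaceResults(day_str, day_str, au_tracks_filter)
        day_races.to_csv(filepath_races, index=False)
        day_dogs.to_csv(filepath_dogs, index=False)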

race_details.tail()
             @id  RaceNum  RaceName                                  RaceTime  Distance  RaceGrade       Track                date
88510  792243395        6  SKY RACING (N/P) STAKE                    01:27PM   300m      Restricted Win  Murray Bridge (MBS)  07 Jun 22
88511  792243396        7  KURT DONSBERG PHOTOGRAPHY MIXED STAKE     01:44PM   300m      Mixed 4/5       Murray Bridge (MBS)  07 Jun 22
88512  792243397        8  GREYHOUNDS AS PETS                        02:04PM   300m      Grade 5 Final   Murray Bridge (MBS)  07 Jun 22
88513  792243398        9  @THEDOGSSA (N/P) STAKE                    02:19PM   300m      Restricted Win  Murray Bridge (MBS)  07 Jun 22
88514  792243399       10  FOLLOW THEDOGSSA ON TWITTER (N/P) STAKE   02:39PM   300m      Restricted Win  Murray Bridge (MBS)  07 Jun 22
current_month_start_date = pd.Timestamp.now().replace(day=1).strftime("%Y-%m-%d")
current_month_end_date = (pd.Timestamp.now().replace(day=1)+ MonthEnd(1))
current_month_end_date = (current_month_end_date - pd.Timedelta('1 day')).strftime("%Y-%m-%d")

print(f'Start date: {current_month_start_date}')
print(f'End Date: {current_month_end_date}')
Start date: 2022-07-01
End Date: 2022-07-30

# Download data for races that have concluded this month, up until today
# Start and end dates for the current month
current_month_start_date = pd.Timestamp.now().replace(day=1).strftime("%Y-%m-%d")
current_month_end_date = (pd.Timestamp.now().replace(day=1) + MonthEnd(1))
current_month_end_date = (current_month_end_date - pd.Timedelta('1 day')).strftime("%Y-%m-%d")

# File names
filename_races = f'FT_AU_RACES_{current_month_start_date}.csv'
filename_dogs = f'FT_AU_DOGS_{current_month_start_date}.csv'
# Where to store the files locally
filepath_races = f'../data/{filename_races}'
filepath_dogs = f'../data/{filename_dogs}'

# Fetch data from API
month_race_details, month_dog_results = client.getRaceResults(current_month_start_date, current_month_end_date, au_tracks_filter)

# Save the files locally and replace any out of date fields
month_race_details.to_csv(filepath_races, index=False)
month_dog_results.to_csv(filepath_dogs, index=False)
Getting meets for each date ..

100%|██████████| 30/30 [00:14<00:00,  2.01it/s]

Getting historic results details ..

100%|██████████| 162/162 [01:30<00:00,  1.80it/s]

dog_results
@id Place DogName Box Rug Weight StartPrice Handicap Margin1 Margin2 PIR Checks Comments SplitMargin RunTime Prizemoney RaceId TrainerId TrainerName
0 114215500 1 DR. MURPHY 7.0 10 29.7 $4.10 NaN 4.24 NaN Q/111 0 NaN 4.70 22.84 NaN 356387352 107925 W McMahon
1 131737955 2 MOLLY SPOLLY 8.0 8 27.3 $2.20F NaN 4.24 4.24 M/222 0 NaN 4.72 23.14 NaN 356387352 199516 K Leviston
2 204414097 3 ASTON NARITA 2.0 2 29.2 $4.50 NaN 4.94 0.70 M/343 2 NaN 4.88 23.19 NaN 356387352 101224 K Gorman
3 126744995 4 ONI 6.0 6 25.0 $15.50 NaN 5.70 0.76 S/674 0 NaN 4.95 23.24 NaN 356387352 107925 W McMahon
4 120958941 5 DARCON FLASH 1.0 1 29.3 $31.00 NaN 6.54 0.84 M/765 8 NaN 4.96 23.30 NaN 356387352 125087 R Conway
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
749514 415996416 1 WEBLEC WHIRL 2.0 2 27.6 $4.20 NaN 0.10 NaN 0 0 NaN 4.38 16.75 510.0 792243399 76598 N Loechel
749515 557002281 2 UP THERE BILLY 1.0 1 32.7 $1.80F NaN 0.10 0.14 NaN 0 NaN 4.43 16.76 175.0 792243399 327728 J Trengove
749516 529022935 3 WEBLEC FLAME 6.0 6 31.8 $5.50 NaN 4.25 4.00 NaN 0 NaN 4.48 17.04 140.0 792243399 76598 N Loechel
749517 383604709 4 STRAIGHT BLAZE 8.0 8 34.7 $23.00 NaN 5.00 0.86 NaN 0 NaN 4.61 17.10 115.0 792243399 123529 D Johnstone
749518 529022943 5 WEBLEC MIST 4.0 4 27.6 $7.00 NaN 15.00 10.14 0 0 VINJ(21) 4.67 17.81 0.0 792243399 76598 N Loechel

749519 rows × 19 columns

# This is super important: I spent literally hours before finding out this was causing errors
dog_results['@id'] = pd.to_numeric(dog_results['@id'])
# Append the extra data to our data frames 
race_details = race_details.append(month_race_details, ignore_index=True)
dog_results = dog_results.append(month_dog_results, ignore_index=True)
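To see why that pd.to_numeric line matters so much: a numeric id and its string twin are not equal in pandas, so once a column holds a mix of the two, merges and lookups quietly match nothing for the string rows. Here is a toy demonstration of the failure mode:

# Toy demonstration: a numeric id and its string form don't match,
# so a mixed-dtype id column quietly breaks merges and lookups
s = pd.concat([pd.Series([114215500]), pd.Series(['114215500'])], ignore_index=True)
print(s[s == 114215500])           # only the numeric row matches
print(pd.to_numeric(s).nunique())  # 1 - after conversion both rows agree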

What we are really interested in are the races scheduled for today, since those are the ones we want our model to rate. So, let's write some code we can run in the morning that will download the data for the day:

# Download the data for today's races
todays_date = pd.Timestamp.now().strftime("%Y-%m-%d")
todays_races, todays_dogs = client.getFullFormat(dt=todays_date, tracks=au_tracks_filter)

display(todays_races.head(1), todays_dogs.head(1))
Getting meets for each date ..

100%|██████████| 1/1 [00:00<00:00,  1.89it/s]

Getting dog lineups ..

100%|██████████| 12/12 [00:13<00:00,  1.14s/it]

@id RaceNum RaceName RaceTime RaceTimeDateUTC Distance RaceGrade PrizeMoney1 PrizeMoney2 PrizeMoney3 PrizeMoney4 PrizeMoney5 PrizeMoney6 PrizeMoney7 PrizeMoney8 GOBIS Hurdle Handicap TAB GradeCode VICGREYS RaceComment Track Date Quali TipsComments_Bet TipsComments_Tips
0 801896110 1 GPP LASER3300 06:44PM 04 Jul 22 08:44AM 385m Maiden $1600 $460 $230 $115 None None None None None None None TRI/QUIN R/D EXACTA PICK4 M None "KASUMI BERRY (5) is a well bred type and her ... Shepparton 04 Jul 22 None Box Quinella 1,2,4,7 ($10 for 166.67%) 4, 7, 1, 2
@id RaceBox DogName BestTime DogHandicap Odds Rating Speed DogComment StartsTOT StartsTTD Suburb Owner Colour Sex Whelped DogGrade DogGOBIS DogPRIZE AgedPrizeMoney Form DamId DamName SireId SireName TrainerId TrainerName RaceId
0 536196758 1 HAVE A SHIRAZ FSH None $4.60 0 None Dam produced the highly talented Flaming rush Starts 0-0-0-0 Trk/Dst 0-0-0-0 Heathcote Paul Ellis BK B 03 Dec 19 M N 0 0 [None, None, None, None, None] 1550070039 Pepper Shiraz -710494 Barcia Bale 117228 Jason Formosa 801896110
# It seems that the todays_races dataframe doesn't have the date column, so let's add that on
todays_races['date'] = pd.Timestamp.now().strftime('%d %b %y')
todays_races.head(1)
@id RaceNum RaceName RaceTime RaceTimeDateUTC Distance RaceGrade PrizeMoney1 PrizeMoney2 PrizeMoney3 PrizeMoney4 PrizeMoney5 PrizeMoney6 PrizeMoney7 PrizeMoney8 GOBIS Hurdle Handicap TAB GradeCode VICGREYS RaceComment Track Date Quali TipsComments_Bet TipsComments_Tips date
0 801896110 1 GPP LASER3300 06:44PM 04 Jul 22 08:44AM 385m Maiden $1600 $460 $230 $115 None None None None None None None TRI/QUIN R/D EXACTA PICK4 M None "KASUMI BERRY (5) is a well bred type and her ... Shepparton 04 Jul 22 None Box Quinella 1,2,4,7 ($10 for 166.67%) 4, 7, 1, 2 04 Jul 22
# It also seems that in the todays_dogs dataframe Box is labelled RaceBox instead, so let's rename it.
# We can also see that some dogs have "Res." as a suffix on their name, i.e. they are reserve dogs.
# We will deal with this later.
todays_dogs = todays_dogs.rename(columns={"RaceBox":"Box"})
todays_dogs.tail(3)
@id Box DogName BestTime DogHandicap Odds Rating Speed DogComment StartsTOT StartsTTD Suburb Owner Colour Sex Whelped DogGrade DogGOBIS DogPRIZE AgedPrizeMoney Form DamId DamName SireId SireName TrainerId TrainerName RaceId
1061 400500428 5 TEQUILA TALKING 22.63 None None 100 66.425 None Starts 58-11-7-12 Trk/Dst 28-6-5-5 Wolffdene Fives Alive Synd D Wolff,N Brauer BE D 19 Oct 18 4 N 28855 None [{'Place': '2nd', 'FormBox': '5', 'Weight': '3... 255840075 Sivamet -737547 Hostile 93322 Michael Brauer 801490825
1062 525622257 7 REFERRAL FSTD None None 81 61.343 None Starts 51-4-7-6 Trk/Dst 0-0-0-0 Laidley Heights Bad Decisions Synd P O'Reilly,A Pearce,D Henery BD D 22 Oct 19 5 N 13945 None [{'Place': '5th', 'FormBox': '7', 'Weight': '3... 257880044 Lovelace 792880037 Sh Avatar 313314 Andrew Pearce 801490825
1063 566347962 8 STARDUST DREAMS NBT None None 92 61.271 None Starts 22-3-4-2 Trk/Dst 2-0-1-0 Park Ridge Kerri-Lyn Harkness BK D 07 Mar 20 5 N 8240 None [{'Place': '2nd', 'FormBox': '8', 'Weight': '3... 118703516 Ellie Belles 141317074 My Redeemer 127311 Stephen Woods 801490825
# Appending today's data to this month's data
month_dog_results = pd.concat([month_dog_results,todays_dogs],join='outer')[month_dog_results.columns]
month_race_details = pd.concat([month_race_details,todays_races],join='outer')[month_race_details.columns]

# Appending this month's data to the rest of our historical data
race_details = race_details.append(month_race_details, ignore_index=True)
dog_results = dog_results.append(month_dog_results, ignore_index=True)

Cleaning our data and feature creation

Originally I thought that, now that we have all the data, we could simply copy and paste the code used in the Greyhound Modelling Tutorial to clean our data and create the features.

But after staring at weird predictions and spending hours trying to work out why some things weren't working, I realised that while most of the code can be copied and pasted, working with live data requires a few changes. I'll point them out as we get to them, but the main things that tripped me up are the data types the FastTrack API returns, and the need for a system to work around reserve dogs.

race_details
             @id  RaceNum  RaceName                        RaceTime  Distance  RaceGrade     Track        date
0      356387352        1  RUTTER'S BUTCHERY & POULTRY     05:29PM   395m      Mixed 6/7     Traralgon    01 Sep 18
1      356387359        2  TAB - WE LOVE A BET             05:47PM   395m      Grade 5       Traralgon    01 Sep 18
2      356387358        3  HALEY CONCRETING                06:05PM   395m      Grade 5       Traralgon    01 Sep 18
3      356387355        4  R.W & A.R INGLIS ELECTRICIANS   06:29PM   395m      Free For All  Traralgon    01 Sep 18
4      356387363        5  PRINTMAC                        06:45PM   525m      Grade 5       Traralgon    01 Sep 18
...          ...      ...  ...                             ...       ...       ...           ...          ...
89359  801490821        5  SENNACHIE @ STUD - STEVE WHITE  08:13PM   520m      Grade 5 Heat  Albion Park  04 Jul 22
89360  801490822        6  ORSON ALLEN @ METICULOUS LODGE  08:35PM   520m      Grade 5 Heat  Albion Park  04 Jul 22
89361  801490823        7  SKY RACING                      08:53PM   600m      Mixed 4/5     Albion Park  04 Jul 22
89362  801490824        8  BORGBET TIPPING SERVICE         09:15PM   520m      Mixed 3/4     Albion Park  04 Jul 22
89363  801490825        9  TIGGERLONG TONK @ STUD          09:37PM   395m      Mixed 4/5     Albion Park  04 Jul 22

89364 rows × 8 columns

The first thing that tripped me up was that FastTrack_DogId arrives in string format for live data, and because everything still appears to work, it took ages to find this error. So, let's make sure we deal with it here using:

dog_results['FastTrack_DogId'] = pd.to_numeric(dog_results['FastTrack_DogId'])
## Cleanse and normalise the data
# Clean up the race dataset
race_details = race_details.rename(columns = {'@id': 'FastTrack_RaceId'})
race_details['Distance'] = race_details['Distance'].apply(lambda x: int(x.replace("m", "")))
race_details['date_dt'] = pd.to_datetime(race_details['date'], format = '%d %b %y')
# Clean up the dogs results dataset
dog_results = dog_results.rename(columns = {'@id': 'FastTrack_DogId', 'RaceId': 'FastTrack_RaceId'})

# New line of code (the rest of this code chunk is copied from Bruno's code)
dog_results['FastTrack_DogId'] = pd.to_numeric(dog_results['FastTrack_DogId'])

# Combine dogs results with race attributes
dog_results = dog_results.merge(
    race_details, 
    how = 'left',
    on = 'FastTrack_RaceId'
)

# Convert StartPrice to probability
dog_results['StartPrice'] = dog_results['StartPrice'].apply(lambda x: None if x is None else float(x.replace('$', '').replace('F', '')) if isinstance(x, str) else x)
dog_results['StartPrice_probability'] = (1 / dog_results['StartPrice']).fillna(0)
dog_results['StartPrice_probability'] = dog_results.groupby('FastTrack_RaceId')['StartPrice_probability'].apply(lambda x: x / x.sum())

# Discard entries without results (scratched or did not finish)
dog_results = dog_results[~dog_results['Box'].isnull()]
dog_results['Box'] = dog_results['Box'].astype(int)

# Clean up other attributes
dog_results['RunTime'] = dog_results['RunTime'].astype(float)
dog_results['SplitMargin'] = dog_results['SplitMargin'].astype(float)
dog_results['Prizemoney'] = dog_results['Prizemoney'].astype(float).fillna(0)
dog_results['Place'] = pd.to_numeric(dog_results['Place'].apply(lambda x: x.replace("=", "") if isinstance(x, str) else 0), errors='coerce').fillna(0)
dog_results['win'] = dog_results['Place'].apply(lambda x: 1 if x == 1 else 0)

# Normalise some of the raw values
dog_results['Prizemoney_norm'] = np.log10(dog_results['Prizemoney'] + 1) / 12
dog_results['Place_inv'] = (1 / dog_results['Place']).fillna(0)
dog_results['Place_log'] = np.log10(dog_results['Place'] + 1).fillna(0)
dog_results['RunSpeed'] = (dog_results['RunTime'] / dog_results['Distance']).fillna(0)
## Generate features using raw data
# Calculate median winner time per track/distance
win_results = dog_results[dog_results['win'] == 1]
median_win_time = pd.DataFrame(data=win_results[win_results['RunTime'] > 0].groupby(['Track', 'Distance'])['RunTime'].median()).rename(columns={"RunTime": "RunTime_median"}).reset_index()
median_win_split_time = pd.DataFrame(data=win_results[win_results['SplitMargin'] > 0].groupby(['Track', 'Distance'])['SplitMargin'].median()).rename(columns={"SplitMargin": "SplitMargin_median"}).reset_index()
median_win_time.head()

# Calculate track speed index
median_win_time['speed_index'] = (median_win_time['RunTime_median'] / median_win_time['Distance'])
median_win_time['speed_index'] = MinMaxScaler().fit_transform(median_win_time[['speed_index']])
median_win_time.head()

# Compare dogs finish time with median winner time
dog_results = dog_results.merge(median_win_time, on=['Track', 'Distance'], how='left')
dog_results = dog_results.merge(median_win_split_time, on=['Track', 'Distance'], how='left')

# Normalise time comparison
dog_results['RunTime_norm'] = (dog_results['RunTime_median'] / dog_results['RunTime']).clip(0.9, 1.1)
dog_results['RunTime_norm'] = MinMaxScaler().fit_transform(dog_results[['RunTime_norm']])
dog_results['SplitMargin_norm'] = (dog_results['SplitMargin_median'] / dog_results['SplitMargin']).clip(0.9, 1.1)
dog_results['SplitMargin_norm'] = MinMaxScaler().fit_transform(dog_results[['SplitMargin_norm']])
dog_results.head()
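As a quick worked example of that clip-then-scale step (assuming both clip bounds occur somewhere in the data, so MinMaxScaler maps 0.9 to 0 and 1.1 to 1): a dog that exactly matches the median winner's time lands on 0.5, a much slower dog on 0, and a much faster dog on 1.

# Worked example of the clip + MinMaxScaler step with three hypothetical runs
# against a 22.0s median winner time: slow (24.5s), median (22.0s), fast (20.0s)
ratios = pd.Series([22.0 / 24.5, 22.0 / 22.0, 22.0 / 20.0]).clip(0.9, 1.1)
print(MinMaxScaler().fit_transform(ratios.to_frame()))  # [[0.], [0.5], [1.]]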

# Calculate box winning percentage for each track/distance
box_win_percent = pd.DataFrame(data=dog_results.groupby(['Track', 'Distance', 'Box'])['win'].mean()).rename(columns={"win": "box_win_percent"}).reset_index()
# Add to dog results dataframe
dog_results = dog_results.merge(box_win_percent, on=['Track', 'Distance', 'Box'], how='left')
# Display example of barrier winning probabilities
print(box_win_percent.head(8))
         Track  Distance  Box  box_win_percent
0  Albion Park       331    1         0.198089
1  Albion Park       331    2         0.152116
2  Albion Park       331    3         0.127354
3  Albion Park       331    4         0.126605
4  Albion Park       331    5         0.111058
5  Albion Park       331    6         0.109304
6  Albion Park       331    7         0.105310
7  Albion Park       331    8         0.115146

The second thing we need to add relates to reserve dogs. It took me ages to arrive at this solution, so if you have a better one, please submit a pull request.

Basically, a single greyhound can be a reserve dog for multiple races on the same day, and each appearance is a new row in our dataframe. For example, 'MACI REID' is a reserve dog for three different races on 2022-09-02:

[Image: reserve_dogs_example - MACI REID listed as a reserve for three different races on the same day]

When we lag our data using .shift(1), as in Bruno's original code, this produces wrong values for our features. In the above example only the first race, The Gardens Race 4 (the third row), will have correct data; all the rows under it will have incorrectly calculated features. We need each of the following rows to be the same as the third row. The solution I have come up with is a little bit complicated, but it gets the job done:

# Please submit a pull request if you have a better solution
temp = rolling_result.reset_index()
temp = temp[temp['date_dt'] == pd.Timestamp.now().normalize()]
rolling_result.loc[pd.IndexSlice[:, pd.Timestamp.now().normalize()], :] = temp.groupby(['FastTrack_DogId','date_dt']).first()

Basically, for each greyhound we just take the first row of today's data (which is calculated correctly) and set the rest of today's rows to the same values, as the toy example below illustrates.
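Here is a toy illustration of the pattern with hypothetical values, one greyhound with three same-day rows where only the first lagged value is correct:

# Toy illustration of the reserve-dog fix (hypothetical values):
# one dog, three rows on the same day, only the first row is lagged correctly
toy = pd.DataFrame({
    'FastTrack_DogId': [1, 1, 1],
    'date_dt': pd.to_datetime(['2022-07-04'] * 3),
    'Place_inv_mean_28D': [0.25, 0.99, 0.01],  # second and third are wrongly lagged
}).set_index(['FastTrack_DogId', 'date_dt'])
first_rows = toy.reset_index().groupby(['FastTrack_DogId', 'date_dt']).first()
toy.loc[pd.IndexSlice[:, pd.Timestamp('2022-07-04')], :] = first_rows
print(toy)  # all three rows now carry 0.25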

# Generate rolling window features
dataset = dog_results.copy()
dataset = dataset.set_index(['FastTrack_DogId', 'date_dt']).sort_index()

# Use rolling window of 28, 91 and 365 days
rolling_windows = ['28D', '91D', '365D']
# Features to use for rolling windows calculation
features = ['RunTime_norm', 'SplitMargin_norm', 'Place_inv', 'Place_log', 'Prizemoney_norm']
# Aggregation functions to apply
aggregates = ['min', 'max', 'mean', 'median', 'std']
# Keep track of generated feature names
feature_cols = ['speed_index', 'box_win_percent']

for rolling_window in rolling_windows:
        print(f'Processing rolling window {rolling_window}')

        rolling_result = (
            dataset
            .reset_index(level=0).sort_index()
            .groupby('FastTrack_DogId')[features]
            .rolling(rolling_window)
            .agg(aggregates)
            .groupby(level=0)  # Thanks to Brett for finding this!
            .shift(1)
        )

        # My own dodgy workaround for reserve dogs
        temp = rolling_result.reset_index()
        temp = temp[temp['date_dt'] == pd.Timestamp.now().normalize()]
        rolling_result.loc[pd.IndexSlice[:, pd.Timestamp.now().normalize()], :] = temp.groupby(['FastTrack_DogId','date_dt']).first()

        # Generate list of rolling window feature names (eg: RunTime_norm_min_365D)
        agg_features_cols = [f'{f}_{a}_{rolling_window}' for f, a in itertools.product(features, aggregates)]
        # Add features to dataset
        dataset[agg_features_cols] = rolling_result
        # Keep track of generated feature names
        feature_cols.extend(agg_features_cols)
Processing rolling window 28D

c:\Users\zhoui\greyhounds_bruno\greyhound-modelling\venv_greyhounds\lib\site-packages\pandas\core\generic.py:4150: PerformanceWarning: dropping on a non-lexsorted multi-index without a level parameter may impact performance.
  obj = obj._drop_axis(labels, axis, level=level, errors=errors)

Processing rolling window 91D

c:\Users\zhoui\greyhounds_bruno\greyhound-modelling\venv_greyhounds\lib\site-packages\pandas\core\generic.py:4150: PerformanceWarning: dropping on a non-lexsorted multi-index without a level parameter may impact performance.
  obj = obj._drop_axis(labels, axis, level=level, errors=errors)

Processing rolling window 365D

c:\Users\zhoui\greyhounds_bruno\greyhound-modelling\venv_greyhounds\lib\site-packages\pandas\core\generic.py:4150: PerformanceWarning: dropping on a non-lexsorted multi-index without a level parameter may impact performance.
  obj = obj._drop_axis(labels, axis, level=level, errors=errors)

# Replace missing values with 0
dataset.fillna(0, inplace=True)
display(dataset.head(8))

# Only keep data after 2018-12-01
model_df = dataset.reset_index()
feature_cols = np.unique(feature_cols).tolist()
model_df = model_df[model_df['date_dt'] >= '2018-12-01']

# This line was originally part of Bruno's tutorial, but we don't run it in this script
# model_df = model_df[['date_dt', 'FastTrack_RaceId', 'DogName', 'win', 'StartPrice_probability'] + feature_cols]

# Only train model off of races where each dog has a value for each feature
races_exclude = model_df[model_df.isnull().any(axis = 1)]['FastTrack_RaceId'].drop_duplicates()
model_df = model_df[~model_df['FastTrack_RaceId'].isin(races_exclude)]
Place DogName Box Rug Weight StartPrice Handicap Margin1 Margin2 PIR Checks Comments SplitMargin RunTime Prizemoney FastTrack_RaceId TrainerId TrainerName RaceNum RaceName RaceTime Distance RaceGrade Track date StartPrice_probability win Prizemoney_norm Place_inv Place_log RunSpeed RunTime_median speed_index SplitMargin_median RunTime_norm SplitMargin_norm box_win_percent RunTime_norm_min_28D RunTime_norm_max_28D RunTime_norm_mean_28D RunTime_norm_median_28D RunTime_norm_std_28D SplitMargin_norm_min_28D SplitMargin_norm_max_28D SplitMargin_norm_mean_28D SplitMargin_norm_median_28D SplitMargin_norm_std_28D Place_inv_min_28D Place_inv_max_28D Place_inv_mean_28D Place_inv_median_28D Place_inv_std_28D Place_log_min_28D Place_log_max_28D Place_log_mean_28D Place_log_median_28D Place_log_std_28D Prizemoney_norm_min_28D Prizemoney_norm_max_28D Prizemoney_norm_mean_28D Prizemoney_norm_median_28D Prizemoney_norm_std_28D RunTime_norm_min_91D RunTime_norm_max_91D RunTime_norm_mean_91D RunTime_norm_median_91D RunTime_norm_std_91D SplitMargin_norm_min_91D SplitMargin_norm_max_91D SplitMargin_norm_mean_91D SplitMargin_norm_median_91D SplitMargin_norm_std_91D Place_inv_min_91D Place_inv_max_91D Place_inv_mean_91D Place_inv_median_91D Place_inv_std_91D Place_log_min_91D Place_log_max_91D Place_log_mean_91D Place_log_median_91D Place_log_std_91D Prizemoney_norm_min_91D Prizemoney_norm_max_91D Prizemoney_norm_mean_91D Prizemoney_norm_median_91D Prizemoney_norm_std_91D RunTime_norm_min_365D RunTime_norm_max_365D RunTime_norm_mean_365D RunTime_norm_median_365D RunTime_norm_std_365D SplitMargin_norm_min_365D SplitMargin_norm_max_365D SplitMargin_norm_mean_365D SplitMargin_norm_median_365D SplitMargin_norm_std_365D Place_inv_min_365D Place_inv_max_365D Place_inv_mean_365D Place_inv_median_365D Place_inv_std_365D Place_log_min_365D Place_log_max_365D Place_log_mean_365D Place_log_median_365D Place_log_std_365D Prizemoney_norm_min_365D Prizemoney_norm_max_365D Prizemoney_norm_mean_365D Prizemoney_norm_median_365D Prizemoney_norm_std_365D
FastTrack_DogId date_dt
-2143477291 2018-09-02 7.0 YOU TELAM ANFY 1 1 31.2 5.2 0.0 7.0 0.57 8 0 8.0 7.48 19.85 0.0 354469749 8462 A Bunney 12 BRISGREYS.COM 09:01PM 331 GRADE 5 PATHWAY NON-PENALTY Albion Park 02 Sep 18 0.161184 0 0.0 0.142857 0.903090 0.059970 19.17 0.600092 7.18 0.328715 0.299465 0.198089 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0
2018-09-16 4.0 YOU TELAM ANFY 1 1 31.0 61.0 0.0 10.0 5.0 8 0 8.0 7.43 19.75 0.0 360928569 8462 A Bunney 11 SKY RACING 05:24PM 331 Grade 5 Albion Park 16 Sep 18 0.013847 0 0.0 0.250000 0.698970 0.059668 19.17 0.600092 7.18 0.353165 0.331763 0.198089 0.328715 0.328715 0.328715 0.328715 0.000000 0.299465 0.299465 0.299465 0.299465 0.000000 0.142857 0.142857 0.142857 0.142857 0.000000 0.903090 0.903090 0.903090 0.903090 0.000000 0.0 0.0 0.0 0.0 0.0 0.328715 0.328715 0.328715 0.328715 0.000000 0.299465 0.299465 0.299465 0.299465 0.000000 0.142857 0.142857 0.142857 0.142857 0.000000 0.903090 0.903090 0.903090 0.903090 0.000000 0.0 0.0 0.0 0.0 0.0 0.328715 0.328715 0.328715 0.328715 0.000000 0.299465 0.299465 0.299465 0.299465 0.000000 0.142857 0.142857 0.142857 0.142857 0.000000 0.90309 0.903090 0.903090 0.903090 0.000000 0.0 0.0 0.0 0.0 0.0
2018-10-07 7.0 YOU TELAM ANFY 1 1 30.5 71.0 0.0 8.25 4.0 7 0 7.0 7.42 19.71 0.0 367774713 8462 A Bunney 11 SKY RACING 08:27PM 331 Grade 5 Albion Park 07 Oct 18 0.011604 0 0.0 0.142857 0.903090 0.059547 19.17 0.600092 7.18 0.363014 0.338275 0.198089 0.328715 0.353165 0.340940 0.340940 0.017288 0.299465 0.331763 0.315614 0.315614 0.022838 0.142857 0.250000 0.196429 0.196429 0.075761 0.698970 0.903090 0.801030 0.801030 0.144335 0.0 0.0 0.0 0.0 0.0 0.328715 0.353165 0.340940 0.340940 0.017288 0.299465 0.331763 0.315614 0.315614 0.022838 0.142857 0.250000 0.196429 0.196429 0.075761 0.698970 0.903090 0.801030 0.801030 0.144335 0.0 0.0 0.0 0.0 0.0 0.328715 0.353165 0.340940 0.340940 0.017288 0.299465 0.331763 0.315614 0.315614 0.022838 0.142857 0.250000 0.196429 0.196429 0.075761 0.69897 0.903090 0.801030 0.801030 0.144335 0.0 0.0 0.0 0.0 0.0
2018-10-21 5.0 YOU TELAM ANFY 7 9 29.8 26.0 0.0 5.25 0.29 6 0 6.0 7.46 19.85 0.0 370420123 8462 A Bunney 12 ZILLMERE SPORTS 08:55PM 331 GRADE 5 PATHWAY NON-PENALTY Albion Park 21 Oct 18 0.032235 0 0.0 0.200000 0.778151 0.059970 19.17 0.600092 7.18 0.328715 0.312332 0.105310 0.353165 0.363014 0.358089 0.358089 0.006964 0.331763 0.338275 0.335019 0.335019 0.004605 0.142857 0.250000 0.196429 0.196429 0.075761 0.698970 0.903090 0.801030 0.801030 0.144335 0.0 0.0 0.0 0.0 0.0 0.328715 0.363014 0.348298 0.353165 0.017659 0.299465 0.338275 0.323168 0.331763 0.020784 0.142857 0.250000 0.178571 0.142857 0.061859 0.698970 0.903090 0.835050 0.903090 0.117849 0.0 0.0 0.0 0.0 0.0 0.328715 0.363014 0.348298 0.353165 0.017659 0.299465 0.338275 0.323168 0.331763 0.020784 0.142857 0.250000 0.178571 0.142857 0.061859 0.69897 0.903090 0.835050 0.903090 0.117849 0.0 0.0 0.0 0.0 0.0
2018-11-18 5.0 YOU TELAM ANFY 1 1 30.2 41.0 0.0 6.25 3.14 6 0 6.0 7.43 19.86 0.0 378695693 8462 A Bunney 10 ZILLMERE SPORTS 08:16PM 331 Grade 5 Albion Park 18 Nov 18 0.020215 0 0.0 0.200000 0.778151 0.060000 19.17 0.600092 7.18 0.326284 0.331763 0.198089 0.328715 0.363014 0.345865 0.345865 0.024253 0.312332 0.338275 0.325304 0.325304 0.018344 0.142857 0.200000 0.171429 0.171429 0.040406 0.778151 0.903090 0.840621 0.840621 0.088345 0.0 0.0 0.0 0.0 0.0 0.328715 0.363014 0.343402 0.340940 0.017429 0.299465 0.338275 0.320459 0.322048 0.017814 0.142857 0.250000 0.183929 0.171429 0.051632 0.698970 0.903090 0.820825 0.840621 0.100341 0.0 0.0 0.0 0.0 0.0 0.328715 0.363014 0.343402 0.340940 0.017429 0.299465 0.338275 0.320459 0.322048 0.017814 0.142857 0.250000 0.183929 0.171429 0.051632 0.69897 0.903090 0.820825 0.840621 0.100341 0.0 0.0 0.0 0.0 0.0
2019-06-23 5.0 YOU TELAM ANFY 1 1 29.6 51.0 0.0 5.75 0.86 8 0 8.0 7.44 19.66 0.0 445056131 8462 A Bunney 9 GREYHOUND ADOPTION PROGRAM 07:54PM 331 Grade 5 Albion Park 23 Jun 19 0.016271 0 0.0 0.200000 0.778151 0.059396 19.17 0.600092 7.18 0.375381 0.325269 0.198089 0.326284 0.326284 0.326284 0.326284 0.000000 0.331763 0.331763 0.331763 0.331763 0.000000 0.200000 0.200000 0.200000 0.200000 0.000000 0.778151 0.778151 0.778151 0.778151 0.000000 0.0 0.0 0.0 0.0 0.0 0.326284 0.363014 0.339979 0.328715 0.016924 0.299465 0.338275 0.322720 0.331763 0.016234 0.142857 0.250000 0.187143 0.200000 0.045288 0.698970 0.903090 0.812290 0.778151 0.088969 0.0 0.0 0.0 0.0 0.0 0.326284 0.363014 0.339979 0.328715 0.016924 0.299465 0.338275 0.322720 0.331763 0.016234 0.142857 0.250000 0.187143 0.200000 0.045288 0.69897 0.903090 0.812290 0.778151 0.088969 0.0 0.0 0.0 0.0 0.0
2019-06-30 8.0 YOU TELAM ANFY 4 10 29.5 51.0 0.0 17.25 2.0 7 0 7.0 11.38 24.16 0.0 448789428 8462 A Bunney 7 SKY RACING 07:45PM 395 Open Albion Park 30 Jun 19 0.015995 0 0.0 0.125000 0.954243 0.061165 22.85 0.588509 10.58 0.228891 0.148506 0.130206 0.375381 0.375381 0.375381 0.375381 0.000000 0.325269 0.325269 0.325269 0.325269 0.000000 0.200000 0.200000 0.200000 0.200000 0.000000 0.778151 0.778151 0.778151 0.778151 0.000000 0.0 0.0 0.0 0.0 0.0 0.375381 0.375381 0.375381 0.375381 0.000000 0.325269 0.325269 0.325269 0.325269 0.000000 0.200000 0.200000 0.200000 0.200000 0.000000 0.778151 0.778151 0.778151 0.778151 0.000000 0.0 0.0 0.0 0.0 0.0 0.326284 0.375381 0.345879 0.340940 0.020929 0.299465 0.338275 0.323145 0.328516 0.014558 0.142857 0.250000 0.189286 0.200000 0.040846 0.69897 0.903090 0.806601 0.778151 0.080787 0.0 0.0 0.0 0.0 0.0
2019-08-25 6.0 YOU TELAM ANFY 4 4 29.5 14.0 0.0 5.0 0.57 3 0 3.0 7.33 19.72 0.0 465432748 8462 A Bunney 9 FABREGAS @ METICULOUS LODGE 07:40PM 331 Masters Grade 5 Albion Park 25 Aug 19 0.058684 0 0.0 0.166667 0.845098 0.059577 19.17 0.600092 7.18 0.360548 0.397681 0.126605 0.228891 0.375381 0.302136 0.302136 0.103585 0.148506 0.325269 0.236887 0.236887 0.124990 0.125000 0.200000 0.162500 0.162500 0.053033 0.778151 0.954243 0.866197 0.866197 0.124515 0.0 0.0 0.0 0.0 0.0 0.228891 0.375381 0.302136 0.302136 0.103585 0.148506 0.325269 0.236887 0.236887 0.124990 0.125000 0.200000 0.162500 0.162500 0.053033 0.778151 0.954243 0.866197 0.866197 0.124515 0.0 0.0 0.0 0.0 0.0 0.228891 0.375381 0.329166 0.328715 0.048169 0.148506 0.338275 0.298196 0.325269 0.067332 0.125000 0.250000 0.180102 0.200000 0.044505 0.69897 0.954243 0.827692 0.778151 0.092481 0.0 0.0 0.0 0.0 0.0

Generate predictions

Now this is the part that gets a bit hairy, so I am going to split it up into two parts. The good thing is that the coding will remain relatively simple.

The two things I want to do are place live bets and save our predictions, so that we can use them in the simulator we will create in Part V.

Let's save our historical ratings for the simulator first, as it's quick and straightforward, and then move on to placing live bets:

Getting data ready for our simulator

Feeding our predictions through the simulator is entirely optional, but in my opinion it is where the real sauce is made. The idea is that while we test our model live, we can also use the simulator to test how different staking methodologies, market timings and bet placements would have performed, and optimise from there. In other words, you can have one model but trial several strategies to squeeze the most performance out of it. The catch: having had a play with the simulator already, I know it can't simulate market_catalogue unless you have recorded it yourself (recorded market_catalogue is what I'll be using to get market_id and selection_id when placing live bets). The simulator we will use later only takes your ratings, market_id and selection_id, so we need our data in a format similar to what we had in How to Automate III. Since we don't have market_catalogue in the simulator, we need another way to get the market_id and selection_id.

My hacky workaround is to generate the probabilities as normal (since the data is historical, we don't need to deal with reserve dogs and scratchings), then get the market_id and selection_id from the Betfair data science greyhound model by merging on DogName and date. We can take the code we wrote in How to Automate III that downloads the greyhound ratings and convert it into a function that downloads the ratings for a date range.

# Generate predictions like normal
# Range of dates that we want to simulate later '2022-03-01' to '2022-04-01'
todays_data = model_df[(model_df['date_dt'] >= pd.Timestamp('2022-03-01').strftime('%Y-%m-%d')) & (model_df['date_dt'] < pd.Timestamp('2022-04-01').strftime('%Y-%m-%d'))]
dog_win_probabilities = brunos_model.predict_proba(todays_data[feature_cols])[:,1]
todays_data['prob_LogisticRegression'] = dog_win_probabilities
todays_data['renormalise_prob'] = todays_data.groupby('FastTrack_RaceId')['prob_LogisticRegression'].apply(lambda x: x / x.sum())
todays_data['rating'] = 1/todays_data['renormalise_prob']
todays_data = todays_data.sort_values(by = 'date_dt')
todays_data
C:\Users\zhoui\AppData\Local\Temp/ipykernel_25584/3121846001.py:5: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  todays_data['prob_LogisticRegression'] = dog_win_probabilities
C:\Users\zhoui\AppData\Local\Temp/ipykernel_25584/3121846001.py:6: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  todays_data['renormalise_prob'] = todays_data.groupby('FastTrack_RaceId')['prob_LogisticRegression'].apply(lambda x: x / x.sum())
C:\Users\zhoui\AppData\Local\Temp/ipykernel_25584/3121846001.py:7: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  todays_data['rating'] = 1/todays_data['renormalise_prob']
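Those SettingWithCopyWarnings are harmless here (the columns do get written), but if you would rather the cell run clean, the usual fix suggested in the pandas docs is to take an explicit copy of the slice before adding columns to it:

# Taking an explicit copy of the date-filtered slice avoids the
# SettingWithCopyWarning when we later add the prediction columns
todays_data = model_df[
    (model_df['date_dt'] >= '2022-03-01') & (model_df['date_dt'] < '2022-04-01')
].copy()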

FastTrack_DogId date_dt Place DogName Box Rug Weight StartPrice Handicap Margin1 Margin2 PIR Checks Comments SplitMargin RunTime Prizemoney FastTrack_RaceId TrainerId TrainerName RaceNum RaceName RaceTime Distance RaceGrade Track date StartPrice_probability win Prizemoney_norm Place_inv Place_log RunSpeed RunTime_median speed_index SplitMargin_median RunTime_norm SplitMargin_norm box_win_percent RunTime_norm_min_28D RunTime_norm_max_28D RunTime_norm_mean_28D RunTime_norm_median_28D RunTime_norm_std_28D SplitMargin_norm_min_28D SplitMargin_norm_max_28D SplitMargin_norm_mean_28D SplitMargin_norm_median_28D SplitMargin_norm_std_28D Place_inv_min_28D Place_inv_max_28D Place_inv_mean_28D Place_inv_median_28D Place_inv_std_28D Place_log_min_28D Place_log_max_28D Place_log_mean_28D Place_log_median_28D Place_log_std_28D Prizemoney_norm_min_28D Prizemoney_norm_max_28D Prizemoney_norm_mean_28D Prizemoney_norm_median_28D Prizemoney_norm_std_28D RunTime_norm_min_91D RunTime_norm_max_91D RunTime_norm_mean_91D RunTime_norm_median_91D RunTime_norm_std_91D SplitMargin_norm_min_91D SplitMargin_norm_max_91D SplitMargin_norm_mean_91D SplitMargin_norm_median_91D SplitMargin_norm_std_91D Place_inv_min_91D Place_inv_max_91D Place_inv_mean_91D Place_inv_median_91D Place_inv_std_91D Place_log_min_91D Place_log_max_91D Place_log_mean_91D Place_log_median_91D Place_log_std_91D Prizemoney_norm_min_91D Prizemoney_norm_max_91D Prizemoney_norm_mean_91D Prizemoney_norm_median_91D Prizemoney_norm_std_91D RunTime_norm_min_365D RunTime_norm_max_365D RunTime_norm_mean_365D RunTime_norm_median_365D RunTime_norm_std_365D SplitMargin_norm_min_365D SplitMargin_norm_max_365D SplitMargin_norm_mean_365D SplitMargin_norm_median_365D SplitMargin_norm_std_365D Place_inv_min_365D Place_inv_max_365D Place_inv_mean_365D Place_inv_median_365D Place_inv_std_365D Place_log_min_365D Place_log_max_365D Place_log_mean_365D Place_log_median_365D Place_log_std_365D Prizemoney_norm_min_365D Prizemoney_norm_max_365D Prizemoney_norm_mean_365D Prizemoney_norm_median_365D Prizemoney_norm_std_365D prob_LogisticRegression renormalise_prob rating
526490 523685389 2022-03-01 1.0 JOSEPH RUMBLE 8 8 30.1 2.5 0.0 8.25 0 0 0 0 0.00 21.83 1365.0 764579619 91264 B Belford 1 TAB 06:55PM 380 Novice Non Penalty Townsville 01 Mar 22 0.318829 1 0.261288 1.000000 0.301030 0.057447 22.10 0.641825 7.56 0.561842 0.000000 0.123070 0.514985 0.514985 0.514985 0.514985 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 1.00 1.000000 1.000000 0.000000 0.301030 0.301030 0.301030 0.301030 0.000000 0.264578 0.264578 2.645776e-01 0.264578 0.000000 0.386805 0.514985 0.450895 0.450895 0.090637 0.482301 0.482301 0.482301 0.482301 0.000000 0.500000 1.00 0.750000 0.750000 0.353553 0.30103 0.477121 0.389076 0.389076 0.124515 0.238161 0.264578 0.251369 0.251369 0.018679 0.386805 0.514985 0.450895 0.450895 0.090637 0.482301 0.482301 0.482301 0.482301 0.000000 0.500000 1.00 0.750000 0.750000 0.353553 0.30103 0.477121 0.389076 0.389076 0.124515 0.238161 0.264578 0.251369 0.251369 0.018679 0.360558 0.321076 3.114527
494469 482776246 2022-03-01 1.0 BLAZING NENNA 4 4 25.7 2.3 0.0 3.31 0 Q/11 0 0 10.27 23.30 0.0 764625202 115912 M Delbridge 6 CHS GROUP HT3 05:37PM 410 Grade 5 Heat Horsham 01 Mar 22 0.365634 1 0.000000 1.000000 0.301030 0.056829 23.46 0.480327 10.39 0.534335 0.558423 0.131261 0.500000 0.513187 0.504396 0.500000 0.007613 0.464830 0.633333 0.570108 0.612161 0.091786 0.250000 1.00 0.500000 0.250000 0.433013 0.301030 0.698970 0.566323 0.698970 0.229751 0.000000 0.000000 1.628327e-15 0.000000 0.000000 0.307377 0.609439 0.508348 0.524719 0.074167 0.373358 0.651515 0.543480 0.584034 0.093516 0.142857 1.00 0.535714 0.500000 0.324194 0.30103 0.903090 0.528325 0.477121 0.198314 0.000000 0.000000 0.000000 0.000000 0.000000 0.233840 0.609439 0.493044 0.512184 0.074069 0.000000 0.651515 0.515072 0.539757 0.120687 0.125000 1.00 0.443328 0.333333 0.314640 0.30103 0.954243 0.607971 0.602060 0.219526 0.000000 0.000000 0.000000 0.000000 0.000000 0.204496 0.145077 6.892877
583640 578899991 2022-03-01 4.0 RIVER RAGING 2 2 30.3 13.8 0.0 3.71 1.13 M/3 0 0 6.89 20.36 0.0 764592641 283109 L Dalziel 1 FOLLOW @GRV_NEWS ON TWITTER 11:08AM 350 Maiden Healesville 01 Mar 22 0.061390 0 0.000000 0.250000 0.698970 0.058171 19.56 0.250778 6.64 0.303536 0.318578 0.138427 0.312992 0.312992 0.312992 0.312992 0.000000 0.346715 0.346715 0.346715 0.346715 0.000000 0.250000 0.25 0.250000 0.250000 0.000000 0.698970 0.698970 0.698970 0.698970 0.000000 0.000000 0.000000 0.000000e+00 0.000000 0.000000 0.241288 0.312992 0.285038 0.303536 0.033489 0.251704 0.346715 0.290635 0.290765 0.038193 0.142857 0.25 0.201905 0.200000 0.048369 0.69897 0.903090 0.784856 0.778151 0.090009 0.000000 0.000000 0.000000 0.000000 0.000000 0.241288 0.312992 0.285038 0.303536 0.033489 0.251704 0.346715 0.290635 0.290765 0.038193 0.142857 0.25 0.201905 0.200000 0.048369 0.69897 0.903090 0.784856 0.778151 0.090009 0.000000 0.000000 0.000000 0.000000 0.000000 0.065977 0.102443 9.761504
385698 419530408 2022-03-01 5.0 FREENEY 8 8 25.1 14.0 0.0 8.25 1.14 0 0 0 0.00 22.47 20.0 764579628 63422 R Lound 10 BURDEKIN VET CLINIC 09:55PM 380 Grade 5 Townsville 01 Mar 22 0.058903 0 0.110185 0.200000 0.778151 0.059132 22.10 0.641825 7.56 0.417668 0.000000 0.123070 0.389381 0.406750 0.398065 0.398065 0.012282 0.519920 0.519920 0.519920 0.519920 0.000000 0.125000 0.25 0.187500 0.187500 0.088388 0.698970 0.954243 0.826606 0.826606 0.180505 0.110185 0.161208 1.356966e-01 0.135697 0.036079 0.378587 0.520445 0.438208 0.435239 0.050906 0.519920 0.519920 0.519920 0.519920 0.000000 0.125000 1.00 0.398810 0.250000 0.304154 0.30103 0.954243 0.636079 0.698970 0.229354 0.086783 0.256326 0.171119 0.161208 0.059824 0.277345 0.547967 0.448106 0.459606 0.067232 0.486807 0.519920 0.509978 0.516591 0.015762 0.125000 1.00 0.437302 0.333333 0.287424 0.30103 0.954243 0.595411 0.602060 0.199176 0.000000 0.256326 0.161721 0.181581 0.080195 0.148800 0.136846 7.307467
453065 451768903 2022-03-01 5.0 ENCOURAGING 2 2 22.1 15.0 0.0 6.0 0.29 455 0 455 10.35 22.57 0.0 764579685 70111 B Young 12 GOSSIE TIGERS GOOD TIMES 10:39PM 388 Non Graded Gosford 01 Mar 22 0.058204 0 0.000000 0.200000 0.778151 0.058170 22.39 0.564085 10.19 0.460124 0.422705 0.166667 0.462217 0.582400 0.498498 0.474688 0.056299 0.413211 0.605769 0.539241 0.568992 0.086316 0.200000 0.50 0.300000 0.250000 0.135401 0.000000 0.778151 0.530643 0.698970 0.317165 0.000000 0.227091 4.541824e-02 0.000000 0.101558 0.435567 0.582400 0.510280 0.506062 0.043547 0.335526 0.605769 0.507137 0.513845 0.080824 0.200000 1.00 0.559375 0.500000 0.321914 0.00000 0.778151 0.476169 0.477121 0.204293 0.000000 0.269225 0.170899 0.227091 0.115477 0.435567 0.620208 0.519683 0.508949 0.050995 0.335526 0.676056 0.528287 0.534247 0.085502 0.200000 1.00 0.592857 0.500000 0.310625 0.00000 0.778151 0.460377 0.477121 0.185631 0.000000 0.269225 0.187400 0.227091 0.105954 0.310750 0.189764 5.269698
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
269364 328634961 2022-03-31 6.0 LUNARAY 7 7 28.8 18.8 0.0 6.96 0.24 M/766 0 0 6.99 25.92 0.0 771803534 117062 V Mileto 6 SHEPPARTON WORKWEAR & SAFETY 01:19PM 450 Grade 5 T3 Shepparton 31 Mar 22 0.044487 0 0.000000 0.166667 0.845098 0.057600 25.55 0.404304 6.70 0.428627 0.292561 0.118601 0.370377 0.461165 0.415771 0.415771 0.064197 0.320144 0.341040 0.330592 0.330592 0.014776 0.200000 0.25 0.225000 0.225000 0.035355 0.698970 0.778151 0.738561 0.738561 0.055990 0.000000 0.000000 2.720046e-15 0.000000 0.000000 0.282853 0.461165 0.399299 0.415352 0.060173 0.000000 0.397661 0.280607 0.320144 0.129212 0.125000 1.00 0.379762 0.250000 0.297826 0.30103 0.954243 0.644364 0.698970 0.211158 0.000000 0.159040 0.022720 0.000000 0.060112 0.280719 0.529528 0.410459 0.414407 0.058641 0.000000 0.441003 0.297743 0.312132 0.111641 0.125000 1.00 0.324033 0.250000 0.245639 0.30103 0.954243 0.687225 0.698970 0.181101 0.000000 0.177795 0.010526 0.000000 0.041488 0.075047 0.097817 10.223167
538082 534921234 2022-03-31 6.0 QUINNLEY BALE 6 6 26.7 18.0 0.0 7.5 1.14 56 0 56 5.47 17.83 0.0 771810563 98781 S Rhodes 12 WATCH LIVE ON SPORTSBET 10:46PM 297 Grade 5 Dapto 31 Mar 22 0.046607 0 0.000000 0.166667 0.845098 0.060034 17.19 0.593790 5.37 0.320527 0.408592 0.120301 0.288301 0.347716 0.318008 0.318008 0.042013 0.303220 0.390710 0.346965 0.346965 0.061865 0.142857 0.50 0.321429 0.321429 0.252538 0.477121 0.903090 0.690106 0.690106 0.301205 0.000000 0.221975 1.109875e-01 0.110988 0.156960 0.288301 0.471082 0.384562 0.389432 0.082219 0.303220 0.426606 0.368411 0.371909 0.052814 0.142857 1.00 0.473214 0.375000 0.381742 0.30103 0.903090 0.595053 0.588046 0.262071 0.000000 0.264698 0.121668 0.110988 0.141569 0.236247 0.500000 0.387665 0.390897 0.075799 0.000000 0.481447 0.342572 0.353107 0.117160 0.125000 1.00 0.374454 0.250000 0.278973 0.00000 0.954243 0.605624 0.650515 0.270858 0.000000 0.264698 0.053752 0.000000 0.100704 0.068536 0.065307 15.312345
363059 403640676 2022-03-31 3.0 SAINT CHARLOTTE 5 5 27.7 18.0 0.0 4.5 0.14 0 0 0 10.87 23.59 140.0 771539517 67189 W Wilson 8 EXCHANGE PRINTERS (N/P) STAKE 01:52PM 400 Restricted Win Mount Gambier 31 Mar 22 0.041828 0 0.179102 0.333333 0.602060 0.058975 23.39 0.696399 10.98 0.457609 0.550598 0.147799 0.401509 0.534438 0.467974 0.467974 0.093995 0.328496 0.527473 0.427984 0.427984 0.140698 0.125000 1.00 0.562500 0.562500 0.618718 0.301030 0.954243 0.627636 0.627636 0.461891 0.000000 0.225702 1.128509e-01 0.112851 0.159595 0.401509 0.534438 0.475213 0.486146 0.041384 0.328496 0.564576 0.467640 0.504558 0.078033 0.125000 1.00 0.440833 0.333333 0.320552 0.30103 0.954243 0.603688 0.602060 0.220083 0.000000 0.225702 0.135590 0.179102 0.095348 0.257933 0.534438 0.457113 0.477655 0.064917 0.212446 0.564576 0.443305 0.445950 0.089104 0.125000 1.00 0.358408 0.250000 0.274086 0.30103 0.954243 0.660966 0.698970 0.195529 0.000000 0.225702 0.106690 0.172038 0.098393 0.155050 0.100326 9.967472
362095 403093108 2022-03-31 3.0 SILVER SANDALS 5 5 27.0 10.0 0.0 2.5 2.29 42 0 42 5.58 31.67 800.0 771539501 87148 S Lawrance 2 SKY RACING 06:40PM 520 Masters Grade 5 Ipswich 31 Mar 22 0.083134 0 0.241969 0.333333 0.602060 0.060904 30.79 0.823159 5.42 0.361067 0.356631 0.095370 0.435748 0.439595 0.437671 0.437671 0.002720 0.354196 0.368046 0.361121 0.361121 0.009793 0.333333 1.00 0.666667 0.666667 0.471405 0.301030 0.602060 0.451545 0.451545 0.212860 0.207730 0.275374 2.415521e-01 0.241552 0.047832 0.330867 0.507902 0.426973 0.435748 0.052271 0.198046 0.460029 0.340578 0.340419 0.073412 0.166667 1.00 0.498485 0.333333 0.340462 0.30103 0.845098 0.560166 0.602060 0.203530 0.110185 0.275374 0.199906 0.207730 0.064087 0.297445 0.507902 0.398206 0.402791 0.056453 0.156690 0.460029 0.325128 0.317851 0.074465 0.125000 1.00 0.328665 0.225000 0.260181 0.30103 0.954243 0.694669 0.738561 0.197928 0.000000 0.275374 0.127103 0.138606 0.097759 0.081817 0.110114 9.081527
565804 556974861 2022-03-31 2.0 ORSON LAURIE 3 3 30.4 3.1 0.0 0.75 0.86 112 0 112 8.32 23.04 530.0 771810580 125472 E Harris 4 RIVERINA STOCKFEEDS 08:04PM 411 Free For All Casino 31 Mar 22 0.275736 0 0.227091 0.500000 0.477121 0.056058 23.37 0.418680 8.62 0.571615 0.680288 0.062500 0.351105 0.583040 0.453627 0.445632 0.092450 0.487805 0.651134 0.562563 0.574442 0.064323 0.142857 1.00 0.362698 0.183333 0.339607 0.000000 0.903090 0.592798 0.778151 0.343479 0.000000 0.267033 7.058912e-02 0.000000 0.121104 0.351105 0.588161 0.473661 0.483607 0.085113 0.465422 0.664141 0.554578 0.574442 0.067472 0.142857 1.00 0.543492 0.500000 0.401886 0.00000 0.903090 0.545322 0.477121 0.298264 0.000000 0.275104 0.142790 0.167586 0.124768 0.351105 0.588161 0.479065 0.494281 0.085020 0.465422 0.664141 0.557285 0.574442 0.065012 0.142857 1.00 0.572024 0.500000 0.404685 0.00000 0.903090 0.530952 0.477121 0.294808 0.000000 0.275104 0.149961 0.224986 0.124372 0.150460 0.123089 8.124181

26438 rows × 117 columns

def download_iggy_ratings(date):
    """Downloads the Betfair Iggy model ratings for a given date and formats it into a nice DataFrame.

    Args:
        date (datetime): the date we want to download the ratings for
    """
    iggy_url_1 = 'https://betfair-data-supplier-prod.herokuapp.com/api/widgets/iggy-joey/datasets?date='
    iggy_url_2 = date.strftime("%Y-%m-%d")
    iggy_url_3 = '&presenter=RatingsPresenter&csv=true'
    iggy_url = iggy_url_1 + iggy_url_2 + iggy_url_3

    # Download the greyhound ratings for the given date
    iggy_df = pd.read_csv(iggy_url)

    # Data cleaning
    iggy_df = iggy_df.rename(
        columns={
            "meetings.races.bfExchangeMarketId": "market_id",
            "meetings.races.runners.bfExchangeSelectionId": "selection_id",
            "meetings.races.runners.ratedPrice": "rating",
            "meetings.races.number": "RaceNum",
            "meetings.name": "Track",
            "meetings.races.runners.name": "DogName"
        }
    )
    # iggy_df = iggy_df[['market_id','selection_id','rating']]
    iggy_df['market_id'] = iggy_df['market_id'].astype(str)
    iggy_df['date_dt'] = date

    # Set market_id and selection_id as index for easy referencing
    # iggy_df = iggy_df.set_index(['market_id','selection_id'])
    return iggy_df

# Download historical ratings over a time period and convert into a big DataFrame.
back_test_period = pd.date_range(start='2022-03-01', end='2022-04-01')
frames = [download_iggy_ratings(day) for day in back_test_period]
iggy_df = pd.concat(frames)
iggy_df
               Track  meetings.bfExchangeEventId  meetings.races.name  RaceNum    market_id  meetings.races.comment  selection_id  meetings.races.runners.number           DogName  rating     date_dt
0          Devonport                           3          R1 452m Gr6        1  1.195395419                     NaN      43154446                              1       Magic Rogue   11.07  2022-03-01
1          Devonport                           3          R1 452m Gr6        1  1.195395419                     NaN      43154447                              2         Youre Off    6.10  2022-03-01
2          Devonport                           3          R1 452m Gr6        1  1.195395419                     NaN      42352031                              3       Castle Town    6.60  2022-03-01
3          Devonport                           3          R1 452m Gr6        1  1.195395419                     NaN      43154448                              4  Buckle Up Aumond    8.02  2022-03-01
4          Devonport                           3          R1 452m Gr6        1  1.195395419                     NaN      42413752                              5     Was That Then   17.14  2022-03-01
...              ...                         ...                  ...      ...          ...                     ...           ...                            ...               ...     ...         ...
906   Wentworth Park                          11         R9 520m Heat        9  1.196971034                     NaN      27692794                              3   Mercator Closer   24.49  2022-04-01
907   Wentworth Park                          11         R9 520m Heat        9  1.196971034                     NaN      36057267                              4     Kooringa Lucy    2.61  2022-04-01
908   Wentworth Park                          11         R9 520m Heat        9  1.196971034                     NaN      25540155                              6     Shanjo Prince  213.53  2022-04-01
909   Wentworth Park                          11         R9 520m Heat        9  1.196971034                     NaN      39788790                              7   Lots Of Chatter    2.10  2022-04-01
910   Wentworth Park                          11         R9 520m Heat        9  1.196971034                     NaN      30681833                              8     Zipping Brady   18.25  2022-04-01

27346 rows × 11 columns

# format DogNames to merge
todays_data['DogName'] = todays_data['DogName'].apply(lambda x: x.replace("'", "").replace(".", "").replace("Res", "").strip())
iggy_df['DogName'] = iggy_df['DogName'].str.upper()
# Merge
backtest = iggy_df[['market_id','selection_id','DogName','date_dt']].merge(todays_data[['rating','DogName','date_dt']], how = 'inner', on = ['DogName','date_dt'])
backtest
         market_id  selection_id           DogName     date_dt     rating
0      1.195395419      43154446       MAGIC ROGUE  2022-03-01   8.137725
1      1.195395419      43154447         YOURE OFF  2022-03-01   6.051752
2      1.195395419      42352031       CASTLE TOWN  2022-03-01   9.768546
3      1.195395419      43154448  BUCKLE UP AUMOND  2022-03-01  13.656089
4      1.195395419      42413752     WAS THAT THEN  2022-03-01  11.941057
...            ...           ...               ...         ...        ...
25540  1.196921144      39309141       RANAKO BALE  2022-03-31   3.975601
25541  1.196921144      39348645       NEVAEH BALE  2022-03-31   3.917458
25542  1.196921144      26870111          INGA MIA  2022-03-31   7.459386
25543  1.196921144      42472271    WINNIE COASTER  2022-03-31   7.046953
25544  1.196921144      40022831        ASTON HEBE  2022-03-31   4.603341

25545 rows × 5 columns

# Save predictions in case we want to backtest/simulate them later
backtest.to_csv('backtest.csv', index=False) # CSV format
# backtest.to_pickle('backtest.pkl') # Pickle format (faster, but can't be opened in Excel)

Perfect. With our hacky solution we have managed to merge around a month's worth of data relatively quickly and save it in CSV format. With all the merging it seems we have only lost around 1,000-2,000 of roughly 27,000 rows, which is a small price to pay.
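If you are curious exactly which rows dropped out, pandas' indicator flag gives a quick diagnostic (same frames as above; unmatched rows are usually name-formatting differences or scratchings):

# Quick diagnostic: an outer merge with indicator=True shows which
# rows failed to match on DogName and date
merge_check = iggy_df[['market_id', 'selection_id', 'DogName', 'date_dt']].merge(
    todays_data[['rating', 'DogName', 'date_dt']],
    how='outer', on=['DogName', 'date_dt'], indicator=True)
print(merge_check['_merge'].value_counts())
print(merge_check.loc[merge_check['_merge'] == 'left_only', ['DogName', 'date_dt']].head())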

Getting data ready for placing live bets

Placing live bets is pretty simple, but we have one issue: FastTrack data alone can't tell us how many greyhounds will actually run in a race. For example, this race later today (2022-07-04) has 8 runners + 2 reserves:

todays_data[todays_data['FastTrack_RaceId'] == '798906744']
FastTrack_DogId date_dt Place DogName Box Rug Weight StartPrice Handicap Margin1 Margin2 PIR Checks Comments SplitMargin RunTime Prizemoney FastTrack_RaceId TrainerId TrainerName RaceName RaceTime Distance RaceGrade date StartPrice_probability win Prizemoney_norm Place_inv Place_log RunSpeed RunTime_median speed_index SplitMargin_median RunTime_norm SplitMargin_norm box_win_percent RunTime_norm_min_28D RunTime_norm_max_28D RunTime_norm_mean_28D RunTime_norm_median_28D RunTime_norm_std_28D SplitMargin_norm_min_28D SplitMargin_norm_max_28D SplitMargin_norm_mean_28D SplitMargin_norm_median_28D SplitMargin_norm_std_28D Place_inv_min_28D Place_inv_max_28D Place_inv_mean_28D Place_inv_median_28D Place_inv_std_28D Place_log_min_28D Place_log_max_28D Place_log_mean_28D Place_log_median_28D Place_log_std_28D Prizemoney_norm_min_28D Prizemoney_norm_max_28D Prizemoney_norm_mean_28D Prizemoney_norm_median_28D Prizemoney_norm_std_28D RunTime_norm_min_91D RunTime_norm_max_91D RunTime_norm_mean_91D RunTime_norm_median_91D RunTime_norm_std_91D SplitMargin_norm_min_91D SplitMargin_norm_max_91D SplitMargin_norm_mean_91D SplitMargin_norm_median_91D SplitMargin_norm_std_91D Place_inv_min_91D Place_inv_max_91D Place_inv_mean_91D Place_inv_median_91D Place_inv_std_91D Place_log_min_91D Place_log_max_91D Place_log_mean_91D Place_log_median_91D Place_log_std_91D Prizemoney_norm_min_91D Prizemoney_norm_max_91D Prizemoney_norm_mean_91D Prizemoney_norm_median_91D Prizemoney_norm_std_91D RunTime_norm_min_365D RunTime_norm_max_365D RunTime_norm_mean_365D RunTime_norm_median_365D RunTime_norm_std_365D SplitMargin_norm_min_365D SplitMargin_norm_max_365D SplitMargin_norm_mean_365D SplitMargin_norm_median_365D SplitMargin_norm_std_365D Place_inv_min_365D Place_inv_max_365D Place_inv_mean_365D Place_inv_median_365D Place_inv_std_365D Place_log_min_365D Place_log_max_365D Place_log_mean_365D Place_log_median_365D Place_log_std_365D Prizemoney_norm_min_365D Prizemoney_norm_max_365D Prizemoney_norm_mean_365D Prizemoney_norm_median_365D Prizemoney_norm_std_365D prob_LogisticRegression
DogName_bf Track RaceNum
YOU SEE LINA Cannington 1 530411826 2022-07-04 0.0 YOU SEE LINA 1 0 0 0.0 0.0 0 0 0 0 0 0.0 0.0 0.0 798906744 10408 Michael McLennan FREE ENTRY TABTOUCH PARK 01:32PM 275 Maiden 04 Jul 22 0.0 0 0.0 inf 0.0 0.0 16.21 0.777366 5.58 0.0 0.0 0.167785 0.363135 0.419607 0.395672 0.404274 0.029202 0.395833 0.432534 0.410499 0.403130 0.019428 0.200000 0.500000 0.300000 0.200000 0.173205 0.477121 0.778151 0.677808 0.778151 0.173800 0.0 0.213623 7.120781e-02 0.000000 0.123336 0.184971 0.419607 0.324576 0.346645 0.092382 0.220812 0.432534 0.338647 0.353089 0.084555 0.166667 0.500000 0.316667 0.266667 0.153116 0.477121 0.845098 0.659617 0.690106 0.162743 0.0 0.213623 0.101436 0.092505 0.111554 0.179269 0.419607 0.303822 0.326906 0.075138 0.133803 0.432534 0.315318 0.310345 0.082022 0.125000 0.500000 0.247937 0.200000 0.121737 0.477121 0.954243 0.743299 0.778151 0.156196 0.0 0.213623 0.085118 0.000000 0.095461 0.115534
BELLA LINA Cannington 1 547605028 2022-07-04 0.0 BELLA LINA 5 0 0 0.0 0.0 0 0 0 0 0 0.0 0.0 0.0 798906744 83951 Rodney Noden FREE ENTRY TABTOUCH PARK 01:32PM 275 Maiden 04 Jul 22 0.0 0 0.0 inf 0.0 0.0 16.21 0.777366 5.58 0.0 0.0 0.111111 0.228705 0.307236 0.273491 0.284534 0.040413 0.220812 0.285592 0.245073 0.228814 0.035318 0.166667 0.200000 0.188889 0.200000 0.019245 0.778151 0.845098 0.800467 0.778151 0.038652 0.0 0.000000 3.700743e-16 0.000000 0.000000 0.158046 0.307236 0.246575 0.264844 0.050600 0.029221 0.285592 0.203061 0.228814 0.091884 0.142857 0.200000 0.187075 0.200000 0.023119 0.778151 0.903090 0.805563 0.778151 0.049718 0.0 0.000000 0.000000 0.000000 0.000000 0.084276 0.307236 0.225040 0.234235 0.061784 0.029221 0.366864 0.240440 0.236842 0.088850 0.125000 0.250000 0.174702 0.166667 0.034400 0.698970 0.954243 0.835377 0.845098 0.073324 0.0 0.142298 0.008894 0.000000 0.035574 0.031221
PENNY KEEPING Cannington 1 561780971 2022-07-04 0.0 PENNY KEEPING 8 0 0 0.0 0.0 0 0 0 0 0 0.0 0.0 0.0 798906744 68481 Bradley Cook FREE ENTRY TABTOUCH PARK 01:32PM 275 Maiden 04 Jul 22 0.0 0 0.0 inf 0.0 0.0 16.21 0.777366 5.58 0.0 0.0 0.100671 0.076736 0.356220 0.228759 0.253321 0.141352 0.424645 0.513538 0.474906 0.486535 0.045573 0.125000 0.500000 0.250000 0.125000 0.216506 0.477121 0.954243 0.795202 0.954243 0.275466 0.0 0.000000 0.000000e+00 0.000000 0.000000 0.076736 0.356220 0.236342 0.256207 0.116406 0.424645 0.513538 0.467291 0.465490 0.040207 0.125000 0.500000 0.223214 0.133929 0.184716 0.477121 0.954243 0.822174 0.928666 0.231296 0.0 0.000000 0.000000 0.000000 0.000000 0.076736 0.356220 0.236342 0.256207 0.116406 0.424645 0.513538 0.467291 0.465490 0.040207 0.125000 0.500000 0.223214 0.133929 0.184716 0.477121 0.954243 0.822174 0.928666 0.231296 0.0 0.000000 0.000000 0.000000 0.000000 0.049673
WHAT A PHOENIX Cannington 1 603189486 2022-07-04 0.0 WHAT A PHOENIX 7 0 0 0.0 0.0 0 0 0 0 0 0.0 0.0 0.0 798906744 10408 Michael McLennan FREE ENTRY TABTOUCH PARK 01:32PM 275 Maiden 04 Jul 22 0.0 0 0.0 inf 0.0 0.0 16.21 0.777366 5.58 0.0 0.0 0.108392 0.155537 0.318460 0.227087 0.207265 0.083251 0.000000 0.338235 0.150847 0.114306 0.172053 0.125000 0.250000 0.166667 0.125000 0.072169 0.698970 0.954243 0.869152 0.954243 0.147382 0.0 0.000000 0.000000e+00 0.000000 0.000000 0.155537 0.318460 0.227087 0.207265 0.083251 0.000000 0.338235 0.150847 0.114306 0.172053 0.125000 0.250000 0.166667 0.125000 0.072169 0.698970 0.954243 0.869152 0.954243 0.147382 0.0 0.000000 0.000000 0.000000 0.000000 0.155537 0.318460 0.227087 0.207265 0.083251 0.000000 0.338235 0.150847 0.114306 0.172053 0.125000 0.250000 0.166667 0.125000 0.072169 0.698970 0.954243 0.869152 0.954243 0.147382 0.0 0.000000 0.000000 0.000000 0.000000 0.040679
WHAT A QUIZ Cannington 1 603189487 2022-07-04 0.0 WHAT A QUIZ 2 0 0 0.0 0.0 0 0 0 0 0 0.0 0.0 0.0 798906744 72510 Barry McPherson FREE ENTRY TABTOUCH PARK 01:32PM 275 Maiden 04 Jul 22 0.0 0 0.0 inf 0.0 0.0 16.21 0.777366 5.58 0.0 0.0 0.141892 0.233563 0.233563 0.233563 0.233563 0.000000 0.302920 0.302920 0.302920 0.302920 0.000000 0.250000 0.250000 0.250000 0.250000 0.000000 0.698970 0.698970 0.698970 0.698970 0.000000 0.0 0.000000 0.000000e+00 0.000000 0.000000 0.233563 0.233563 0.233563 0.233563 0.000000 0.302920 0.302920 0.302920 0.302920 0.000000 0.250000 0.250000 0.250000 0.250000 0.000000 0.698970 0.698970 0.698970 0.698970 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.233563 0.233563 0.233563 0.233563 0.000000 0.302920 0.302920 0.302920 0.302920 0.000000 0.250000 0.250000 0.250000 0.250000 0.000000 0.698970 0.698970 0.698970 0.698970 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.043975
WHAT A SHAKER Cannington 1 603189986 2022-07-04 0.0 WHAT A SHAKER 4 0 0 0.0 0.0 0 0 0 0 0 0.0 0.0 0.0 798906744 72510 Barry McPherson FREE ENTRY TABTOUCH PARK 01:32PM 275 Maiden 04 Jul 22 0.0 0 0.0 inf 0.0 0.0 16.21 0.777366 5.58 0.0 0.0 0.125000 0.282892 0.282892 0.282892 0.282892 0.000000 0.268116 0.268116 0.268116 0.268116 0.000000 0.333333 0.333333 0.333333 0.333333 0.000000 0.602060 0.602060 0.602060 0.602060 0.000000 0.0 0.000000 0.000000e+00 0.000000 0.000000 0.282892 0.282892 0.282892 0.282892 0.000000 0.268116 0.268116 0.268116 0.268116 0.000000 0.333333 0.333333 0.333333 0.333333 0.000000 0.602060 0.602060 0.602060 0.602060 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.282892 0.282892 0.282892 0.282892 0.000000 0.268116 0.268116 0.268116 0.268116 0.000000 0.333333 0.333333 0.333333 0.333333 0.000000 0.602060 0.602060 0.602060 0.602060 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.059174
WIZARDS LEGEND Cannington 1 614056673 2022-07-04 0.0 WIZARD'S LEGEND Res. 10 0 0 0.0 0.0 0 0 0 0 0 0.0 0.0 0.0 798906744 127397 Colin Bainbridge FREE ENTRY TABTOUCH PARK 01:32PM 275 Maiden 04 Jul 22 0.0 0 0.0 inf 0.0 0.0 16.21 0.777366 5.58 0.0 0.0 0.000000 0.307944 0.341758 0.326417 0.327983 0.015946 0.206724 0.288937 0.259703 0.271576 0.036365 0.166667 0.250000 0.204167 0.200000 0.034359 0.698970 0.845098 0.775093 0.778151 0.059761 0.0 0.151629 3.790717e-02 0.000000 0.075814 0.089468 0.341758 0.269683 0.313202 0.093934 0.103960 0.377622 0.240904 0.244173 0.081242 0.125000 0.250000 0.194792 0.200000 0.042477 0.698970 0.954243 0.797104 0.778151 0.084209 0.0 0.151629 0.037164 0.000000 0.068832 0.089468 0.341758 0.275533 0.303754 0.077507 0.103960 0.377622 0.242702 0.244173 0.071625 0.125000 0.333333 0.204266 0.200000 0.058218 0.602060 0.954243 0.785504 0.778151 0.099650 0.0 0.151629 0.037412 0.000000 0.067696 0.019388
WIZARDS DRAMA Cannington 1 614057677 2022-07-04 0.0 WIZARD'S DRAMA Res. 9 0 0 0.0 0.0 0 0 0 0 0 0.0 0.0 0.0 798906744 127397 Colin Bainbridge FREE ENTRY TABTOUCH PARK 01:32PM 275 Maiden 04 Jul 22 0.0 0 0.0 inf 0.0 0.0 16.21 0.777366 5.58 0.0 0.0 0.000000 0.225948 0.344591 0.271951 0.245316 0.063648 0.150000 0.269231 0.221376 0.244898 0.063000 0.142857 0.500000 0.280952 0.200000 0.191840 0.477121 0.903090 0.719454 0.778151 0.218967 0.0 0.209986 6.999522e-02 0.000000 0.121235 0.086870 0.344591 0.234575 0.238391 0.084399 0.088816 0.269231 0.188587 0.189288 0.067980 0.142857 0.500000 0.229762 0.171429 0.139270 0.477121 0.903090 0.777252 0.840621 0.169536 0.0 0.209986 0.059278 0.000000 0.094057 0.036656 0.344591 0.200202 0.228706 0.100907 0.069444 0.269231 0.170954 0.162215 0.071007 0.125000 0.500000 0.229613 0.171429 0.130210 0.477121 0.954243 0.777477 0.840621 0.171435 0.0 0.209986 0.044459 0.000000 0.084096 0.014393
DASHING ONYX Cannington 1 626191408 2022-07-04 0.0 DASHING ONYX 6 0 0 0.0 0.0 0 0 0 0 0 0.0 0.0 0.0 798906744 67839 Graeme Hall FREE ENTRY TABTOUCH PARK 01:32PM 275 Maiden 04 Jul 22 0.0 0 0.0 inf 0.0 0.0 16.21 0.777366 5.58 0.0 0.0 0.144876 0.262045 0.359113 0.314231 0.321535 0.048944 0.244898 0.377622 0.319292 0.335355 0.067805 0.166667 0.333333 0.277778 0.333333 0.096225 0.602060 0.845098 0.683073 0.602060 0.140318 0.0 0.185009 1.233393e-01 0.185009 0.106815 0.229498 0.359113 0.293047 0.291790 0.058241 0.244898 0.377622 0.318715 0.326170 0.055374 0.142857 0.333333 0.244048 0.250000 0.103555 0.602060 0.903090 0.738077 0.723579 0.158833 0.0 0.185009 0.092505 0.092505 0.106815 0.229498 0.359113 0.293047 0.291790 0.058241 0.244898 0.377622 0.318715 0.326170 0.055374 0.142857 0.333333 0.244048 0.250000 0.103555 0.602060 0.903090 0.738077 0.723579 0.158833 0.0 0.185009 0.092505 0.092505 0.106815 0.096959
WINTER RAIN Cannington 1 637972981 2022-07-04 0.0 WINTER RAIN 3 0 0 0.0 0.0 0 0 0 0 0 0.0 0.0 0.0 798906744 27464 Jennifer Thompson FREE ENTRY TABTOUCH PARK 01:32PM 275 Maiden 04 Jul 22 0.0 0 0.0 inf 0.0 0.0 16.21 0.777366 5.58 0.0 0.0 0.101754 0.239673 0.371004 0.305338 0.305338 0.092865 0.133803 0.350640 0.242221 0.242221 0.153327 0.166667 0.500000 0.333333 0.333333 0.235702 0.477121 0.845098 0.661110 0.661110 0.260199 0.0 0.000000 0.000000e+00 0.000000 0.000000 0.239673 0.371004 0.305338 0.305338 0.092865 0.133803 0.350640 0.242221 0.242221 0.153327 0.166667 0.500000 0.333333 0.333333 0.235702 0.477121 0.845098 0.661110 0.661110 0.260199 0.0 0.000000 0.000000 0.000000 0.000000 0.239673 0.371004 0.305338 0.305338 0.092865 0.133803 0.350640 0.242221 0.242221 0.153327 0.166667 0.500000 0.333333 0.333333 0.235702 0.477121 0.845098 0.661110 0.661110 0.260199 0.0 0.000000 0.000000 0.000000 0.000000 0.101402

If we predict probabilities and renormalise now, while reserves are still listed in the field, we will calculate incorrect probabilities.

I've spent a really long time thinking about this and testing different methods that didn't work or weren't optimal. The best (and least complicated) solution I have come up with is to predict probabilities on the FastTrack data first. Then, a few minutes before the jump when all the line-ups have been confirmed, we use market_catalogue from the Betfair API and merge our predicted probabilities on, joining on DogName, Track and RaceNum. Merging on these three fields bypasses any issues with reserve dogs and scratchings. We can then renormalise the probabilities live within Flumine.
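
To see why the timing matters, here is a tiny illustrative example (the dog names and probabilities below are made up): renormalising while reserves are still listed dilutes the probabilities of the dogs that actually run.

import pandas as pd

# Hypothetical model outputs for one race: three confirmed runners plus two reserves
probs = pd.Series({'DOG A': 0.30, 'DOG B': 0.25, 'DOG C': 0.20,
                   'RESERVE ONE': 0.10, 'RESERVE TWO': 0.05})

# Renormalising now spreads probability across dogs that may never run
print(probs / probs.sum())

# Renormalising over the confirmed field just before the jump is what we actually want
confirmed = probs[['DOG A', 'DOG B', 'DOG C']]
print(confirmed / confirmed.sum())

With that in mind, let's first generate the raw predictions on today's FastTrack data: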

# Select today's data
todays_data = model_df[model_df['date_dt'] == pd.Timestamp.now().strftime('%Y-%m-%d')]

# Generate runner win predictions
dog_win_probabilities = brunos_model.predict_proba(todays_data[feature_cols])[:,1]
todays_data['prob_LogisticRegression'] = dog_win_probabilities

# We no longer renormalise probabilities in this chunk of code; we do it in Flumine instead
# todays_data['renormalise_prob'] = todays_data.groupby('FastTrack_RaceId')['prob_LogisticRegression'].apply(lambda x: x / x.sum())
# todays_data['rating'] = 1/todays_data['renormalise_prob']
# todays_data = todays_data.sort_values(by = 'date_dt')

todays_data
C:\Users\zhoui\AppData\Local\Temp/ipykernel_25584/2638603781.py:6: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  todays_data['prob_LogisticRegression'] = dog_win_probabilities

FastTrack_DogId date_dt Place DogName Box Rug Weight StartPrice Handicap Margin1 Margin2 PIR Checks Comments SplitMargin RunTime Prizemoney FastTrack_RaceId TrainerId TrainerName RaceNum RaceName RaceTime Distance RaceGrade Track date StartPrice_probability win Prizemoney_norm Place_inv Place_log RunSpeed RunTime_median speed_index SplitMargin_median RunTime_norm SplitMargin_norm box_win_percent RunTime_norm_min_28D RunTime_norm_max_28D RunTime_norm_mean_28D RunTime_norm_median_28D RunTime_norm_std_28D SplitMargin_norm_min_28D SplitMargin_norm_max_28D SplitMargin_norm_mean_28D SplitMargin_norm_median_28D SplitMargin_norm_std_28D Place_inv_min_28D Place_inv_max_28D Place_inv_mean_28D Place_inv_median_28D Place_inv_std_28D Place_log_min_28D Place_log_max_28D Place_log_mean_28D Place_log_median_28D Place_log_std_28D Prizemoney_norm_min_28D Prizemoney_norm_max_28D Prizemoney_norm_mean_28D Prizemoney_norm_median_28D Prizemoney_norm_std_28D RunTime_norm_min_91D RunTime_norm_max_91D RunTime_norm_mean_91D RunTime_norm_median_91D RunTime_norm_std_91D SplitMargin_norm_min_91D SplitMargin_norm_max_91D SplitMargin_norm_mean_91D SplitMargin_norm_median_91D SplitMargin_norm_std_91D Place_inv_min_91D Place_inv_max_91D Place_inv_mean_91D Place_inv_median_91D Place_inv_std_91D Place_log_min_91D Place_log_max_91D Place_log_mean_91D Place_log_median_91D Place_log_std_91D Prizemoney_norm_min_91D Prizemoney_norm_max_91D Prizemoney_norm_mean_91D Prizemoney_norm_median_91D Prizemoney_norm_std_91D RunTime_norm_min_365D RunTime_norm_max_365D RunTime_norm_mean_365D RunTime_norm_median_365D RunTime_norm_std_365D SplitMargin_norm_min_365D SplitMargin_norm_max_365D SplitMargin_norm_mean_365D SplitMargin_norm_median_365D SplitMargin_norm_std_365D Place_inv_min_365D Place_inv_max_365D Place_inv_mean_365D Place_inv_median_365D Place_inv_std_365D Place_log_min_365D Place_log_max_365D Place_log_mean_365D Place_log_median_365D Place_log_std_365D Prizemoney_norm_min_365D Prizemoney_norm_max_365D Prizemoney_norm_mean_365D Prizemoney_norm_median_365D Prizemoney_norm_std_365D prob_LogisticRegression
44514 148673258 2022-07-04 0.0 SPEEDY MARINA 5 0 0 0.0 0.0 0 0 0 0 0 0.0 0.0 0.0 801455740 65928 Dawn Lee 5 LADBROKES BLENDED BETS 1-3 WIN 04:37PM 307 Grade 5 Bathurst 04 Jul 22 0.0 0 0.0 inf 0.0 0.0 17.830 0.628105 7.880 0.0 0.0 0.136986 0.317232 0.317232 0.317232 0.317232 0.000000 0.173267 0.173267 0.173267 0.173267 0.000000 0.333333 0.333333 0.333333 0.333333 0.000000 0.602060 0.602060 6.020600e-01 0.602060 0.000000 0.212109 0.212109 2.121089e-01 0.212109 0.000000e+00 0.317232 0.332373 0.324803 0.324803 0.010706 0.173267 0.210579 0.191923 0.191923 0.026383 0.166667 0.333333 0.250000 0.250000 0.117851 0.602060 0.845098 7.235790e-01 0.723579 0.171854 0.000000 0.212109 0.106054 0.106054 0.149984 0.236982 0.371585 0.306603 0.317232 0.052095 0.173267 0.406600 0.242140 0.210579 0.096894 0.125000 0.333333 0.186905 0.166667 0.083715 0.602060 0.954243 8.299177e-01 0.845098 0.135269 0.000000 0.212109 0.042422 0.000000 0.094858 0.073607
54463 161977365 2022-07-04 0.0 FILTHY PHANTOM 7 0 0 0.0 0.0 0 0 0 0 0 0.0 0.0 0.0 801448232 110385 Tony Hinrichsen 5 GIDDY-UP (N/P) STAKE 07:27PM 342 Masters Angle Park 04 Jul 22 0.0 0 0.0 inf 0.0 0.0 19.675 0.533632 7.730 0.0 0.0 0.124590 0.386907 0.455919 0.427165 0.420839 0.028492 0.314004 0.500000 0.408769 0.394027 0.082256 0.166667 0.333333 0.220000 0.200000 0.064979 0.602060 0.845098 7.563224e-01 0.778151 0.090977 0.000000 0.182760 3.655208e-02 0.000000 8.173293e-02 0.386907 0.572783 0.449667 0.436026 0.053935 0.210921 0.551665 0.403088 0.394027 0.092070 0.142857 1.000000 0.291209 0.200000 0.235613 0.000000 0.903090 6.692431e-01 0.778151 0.257193 0.000000 0.249855 0.057150 0.000000 0.095131 0.277002 0.572783 0.459854 0.456897 0.059514 0.210921 0.616496 0.443040 0.443820 0.089192 0.142857 1.000000 0.464354 0.333333 0.320517 0.000000 0.903090 5.716763e-01 0.602060 0.227821 0.000000 0.249855 0.121936 0.180363 0.106520 0.128265
77950 196384049 2022-07-04 0.0 HOUND 'EM DOWN 7 0 0 0.0 0.0 0 0 0 0 0 0.0 0.0 0.0 801490787 313281 Steven Winstanley 3 ZIPPING GARTH @ STUD 0-2 WIN 07:36PM 565 Mixed Maiden and Grade Five Maitland 04 Jul 22 0.0 0 0.0 inf 0.0 0.0 31.875 0.342029 13.795 0.0 0.0 0.200000 0.233442 0.277637 0.261405 0.273136 0.024321 0.218310 0.316690 0.270554 0.276662 0.049474 0.142857 0.200000 0.161905 0.142857 0.032991 0.778151 0.903090 8.614437e-01 0.903090 0.072133 0.000000 0.000000 4.440892e-16 0.000000 0.000000e+00 0.194132 0.355912 0.279655 0.277637 0.046001 0.218310 0.316690 0.271078 0.275180 0.032574 0.125000 0.200000 0.158862 0.142857 0.026641 0.778151 0.954243 8.681223e-01 0.903090 0.060784 0.000000 0.000000 0.000000 0.000000 0.000000 0.194132 0.462627 0.328446 0.316396 0.054098 0.218310 0.346154 0.292413 0.296715 0.036166 0.125000 0.333333 0.187245 0.166667 0.054481 0.000000 0.954243 7.520173e-01 0.845098 0.239572 0.000000 0.212109 0.014373 0.000000 0.050118 0.070298
121171 230053393 2022-07-04 0.0 CAWBOURNE CROSS 3 0 0 0.0 0.0 0 0 0 0 0 0.0 0.0 0.0 801448232 104699 Lisa Rasmussen 5 GIDDY-UP (N/P) STAKE 07:27PM 342 Masters Angle Park 04 Jul 22 0.0 0 0.0 inf 0.0 0.0 19.675 0.533632 7.730 0.0 0.0 0.169811 0.336395 0.413836 0.386878 0.410402 0.043753 0.500000 0.500000 0.500000 0.500000 0.000000 0.142857 0.250000 0.186508 0.166667 0.056260 0.698970 0.903090 8.157193e-01 0.845098 0.105184 0.000000 0.000000 3.700743e-16 0.000000 2.533726e-17 0.336395 0.491121 0.420107 0.412119 0.059452 0.401747 0.506477 0.480510 0.500000 0.044239 0.142857 0.500000 0.273810 0.250000 0.129975 0.477121 0.903090 7.042182e-01 0.698970 0.155860 0.000000 0.188140 0.058566 0.000000 0.091070 0.280126 0.505631 0.403460 0.405048 0.064662 0.329857 0.610664 0.472084 0.500000 0.079724 0.142857 0.500000 0.269048 0.225000 0.125458 0.477121 0.903090 7.115827e-01 0.738561 0.144209 0.000000 0.205324 0.060368 0.000000 0.090871 0.179535
142478 243599770 2022-07-04 0.0 SKAIKRU 1 0 0 0.0 0.0 0 0 0 0 0 0.0 0.0 0.0 801455741 83214 Robert Sonter 6 BATHURST RSL CLUB 04:59PM 450 Grade 5 Bathurst 04 Jul 22 0.0 0 0.0 inf 0.0 0.0 26.000 0.576406 15.450 0.0 0.0 0.159574 0.392354 0.392354 0.392354 0.392354 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.250000 0.250000 0.250000 0.250000 0.000000 0.698970 0.698970 6.989700e-01 0.698970 0.000000 0.000000 0.000000 1.998401e-15 0.000000 0.000000e+00 0.293318 0.392354 0.329286 0.302186 0.054798 0.238956 0.238956 0.238956 0.238956 0.000000 0.142857 0.333333 0.242063 0.250000 0.095486 0.602060 0.903090 7.347067e-01 0.698970 0.153664 0.000000 0.167027 0.055676 0.000000 0.096433 0.285293 0.392354 0.317943 0.303341 0.036066 0.162722 0.258454 0.223927 0.237266 0.042031 0.125000 0.333333 0.202551 0.200000 0.070985 0.602060 0.954243 7.942519e-01 0.778151 0.120113 0.000000 0.167027 0.023861 0.000000 0.063130 0.092948
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
599603 673011364 2022-07-04 0.0 FERAL AGENT 8 0 0 0.0 0.0 0 0 0 0 0 0.0 0.0 0.0 801455737 96421 Derek Kerr 2 ZIPPING GARTH @ STUD MAIDEN 03:28PM 307 Maiden Bathurst 04 Jul 22 0.0 0 0.0 inf 0.0 0.0 17.830 0.628105 7.880 0.0 0.0 0.177215 0.409480 0.425999 0.418593 0.420299 0.008391 0.411700 0.431973 0.421836 0.421836 0.014335 0.250000 1.000000 0.527778 0.333333 0.411074 0.301030 0.698970 5.340200e-01 0.602060 0.207512 0.000000 0.231573 1.478939e-01 0.212109 1.284491e-01 0.409480 0.425999 0.418593 0.420299 0.008391 0.411700 0.431973 0.421836 0.421836 0.014335 0.250000 1.000000 0.527778 0.333333 0.411074 0.301030 0.698970 5.340200e-01 0.602060 0.207512 0.000000 0.231573 0.147894 0.212109 0.128449 0.409480 0.425999 0.418593 0.420299 0.008391 0.411700 0.431973 0.421836 0.421836 0.014335 0.250000 1.000000 0.527778 0.333333 0.411074 0.301030 0.698970 5.340200e-01 0.602060 0.207512 0.000000 0.231573 0.147894 0.212109 0.128449 0.194287
599956 694776805 2022-07-04 0.0 WENDY MAREE 3 0 0 0.0 0.0 0 0 0 0 0 0.0 0.0 0.0 801490818 307271 Brian Baker 2 GARRARD'S HORSE AND HOUND 07:11PM 520 Novice Non Penalty Albion Park 04 Jul 22 0.0 0 0.0 inf 0.0 0.0 30.220 0.634509 5.630 0.0 0.0 0.136417 0.325934 0.410627 0.368281 0.368281 0.059887 0.168325 0.336770 0.252547 0.252547 0.119108 0.200000 0.250000 0.225000 0.225000 0.035355 0.698970 0.778151 7.385606e-01 0.738561 0.055990 0.110185 0.193690 1.519376e-01 0.151938 5.904714e-02 0.325934 0.410627 0.366778 0.363772 0.042426 0.168325 0.344322 0.283139 0.336770 0.099504 0.200000 0.250000 0.216667 0.200000 0.028868 0.698970 0.778151 7.517575e-01 0.778151 0.045715 0.110185 0.193690 0.138020 0.110185 0.048212 0.305980 0.410627 0.355981 0.361146 0.036561 0.168325 0.344322 0.274549 0.279411 0.067105 0.142857 0.333333 0.221032 0.200000 0.064636 0.602060 0.903090 7.564290e-01 0.778151 0.100056 0.110185 0.218690 0.142187 0.110185 0.050203 0.123793
600100 707214702 2022-07-04 0.0 WARDEN JODIE Res. 10 0 0 0.0 0.0 0 0 0 0 0 0.0 0.0 0.0 801455737 75264 Sam Simonetta 2 ZIPPING GARTH @ STUD MAIDEN 03:28PM 307 Maiden Bathurst 04 Jul 22 0.0 0 0.0 inf 0.0 0.0 17.830 0.628105 7.880 0.0 0.0 0.000000 0.146202 0.146202 0.146202 0.146202 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.125000 0.125000 0.125000 0.125000 0.000000 0.000000 0.954243 1.908485e-01 0.000000 0.426750 0.000000 0.000000 0.000000e+00 0.000000 0.000000e+00 0.146202 0.146202 0.146202 0.146202 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.125000 0.125000 0.125000 0.125000 0.000000 0.000000 0.954243 1.908485e-01 0.000000 0.426750 0.000000 0.000000 0.000000 0.000000 0.000000 0.146202 0.146202 0.146202 0.146202 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.125000 0.125000 0.125000 0.125000 0.000000 0.000000 0.954243 1.908485e-01 0.000000 0.426750 0.000000 0.000000 0.000000 0.000000 0.000000 0.014319
600105 707215693 2022-07-04 0.0 MYSTERY ANNE 3 0 0 0.0 0.0 0 0 0 0 0 0.0 0.0 0.0 801455737 75264 Sam Simonetta 2 ZIPPING GARTH @ STUD MAIDEN 03:28PM 307 Maiden Bathurst 04 Jul 22 0.0 0 0.0 inf 0.0 0.0 17.830 0.628105 7.880 0.0 0.0 0.125874 0.331265 0.331265 0.331265 0.331265 0.000000 0.473776 0.473776 0.473776 0.473776 0.000000 0.250000 0.250000 0.250000 0.250000 0.000000 0.698970 0.698970 6.989700e-01 0.698970 0.000000 0.000000 0.000000 0.000000e+00 0.000000 0.000000e+00 0.331265 0.331265 0.331265 0.331265 0.000000 0.473776 0.473776 0.473776 0.473776 0.000000 0.250000 0.250000 0.250000 0.250000 0.000000 0.698970 0.698970 6.989700e-01 0.698970 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.331265 0.331265 0.331265 0.331265 0.000000 0.473776 0.473776 0.473776 0.473776 0.000000 0.250000 0.250000 0.250000 0.250000 0.000000 0.698970 0.698970 6.989700e-01 0.698970 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.090248
600106 707215696 2022-07-04 0.0 BROKEN PROMISES 2 0 0 0.0 0.0 0 0 0 0 0 0.0 0.0 0.0 801455736 75264 Sam Simonetta 1 WELCOME GBOTA MAIDEN 03:07PM 450 Maiden Bathurst 04 Jul 22 0.0 0 0.0 inf 0.0 0.0 26.000 0.576406 15.450 0.0 0.0 0.139785 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 2.708944e-14 0.000000 0.000000 0.000000 0.000000 0.000000e+00 0.000000 0.000000e+00 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 2.642331e-14 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 2.842171e-14 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.060767

1704 rows × 115 columns

Before we merge, let's make some minor formatting changes to the FastTrack names so we can match them onto the Betfair names. Betfair excludes all apostrophes and full stops in its naming convention, so we'll create a Betfair-equivalent dog name on the dataset by removing those characters. We also need to do this for the tracks, as FastTrack sometimes names tracks differently to Betfair, e.g. Sandown Park on Betfair is known as Sandown (SAP) in the FastTrack database.

# Prepare data for easy reference in Flumine
todays_data['DogName_bf'] = todays_data['DogName'].apply(lambda x: x.replace("'", "").replace(".", "").replace("Res", "").strip())
todays_data.replace({'Sandown (SAP)': 'Sandown Park'}, regex=True, inplace=True)
todays_data = todays_data.set_index(['DogName_bf','Track','RaceNum'])
todays_data.head()
C:\Users\zhoui\AppData\Local\Temp/ipykernel_25584/90992895.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  todays_data['DogName_bf'] = todays_data['DogName'].apply(lambda x: x.replace("'", "").replace(".", "").replace("Res", "").strip())

FastTrack_DogId date_dt Place DogName Box Rug Weight StartPrice Handicap Margin1 Margin2 PIR Checks Comments SplitMargin RunTime Prizemoney FastTrack_RaceId TrainerId TrainerName RaceName RaceTime Distance RaceGrade date StartPrice_probability win Prizemoney_norm Place_inv Place_log RunSpeed RunTime_median speed_index SplitMargin_median RunTime_norm SplitMargin_norm box_win_percent RunTime_norm_min_28D RunTime_norm_max_28D RunTime_norm_mean_28D RunTime_norm_median_28D RunTime_norm_std_28D SplitMargin_norm_min_28D SplitMargin_norm_max_28D SplitMargin_norm_mean_28D SplitMargin_norm_median_28D SplitMargin_norm_std_28D Place_inv_min_28D Place_inv_max_28D Place_inv_mean_28D Place_inv_median_28D Place_inv_std_28D Place_log_min_28D Place_log_max_28D Place_log_mean_28D Place_log_median_28D Place_log_std_28D Prizemoney_norm_min_28D Prizemoney_norm_max_28D Prizemoney_norm_mean_28D Prizemoney_norm_median_28D Prizemoney_norm_std_28D RunTime_norm_min_91D RunTime_norm_max_91D RunTime_norm_mean_91D RunTime_norm_median_91D RunTime_norm_std_91D SplitMargin_norm_min_91D SplitMargin_norm_max_91D SplitMargin_norm_mean_91D SplitMargin_norm_median_91D SplitMargin_norm_std_91D Place_inv_min_91D Place_inv_max_91D Place_inv_mean_91D Place_inv_median_91D Place_inv_std_91D Place_log_min_91D Place_log_max_91D Place_log_mean_91D Place_log_median_91D Place_log_std_91D Prizemoney_norm_min_91D Prizemoney_norm_max_91D Prizemoney_norm_mean_91D Prizemoney_norm_median_91D Prizemoney_norm_std_91D RunTime_norm_min_365D RunTime_norm_max_365D RunTime_norm_mean_365D RunTime_norm_median_365D RunTime_norm_std_365D SplitMargin_norm_min_365D SplitMargin_norm_max_365D SplitMargin_norm_mean_365D SplitMargin_norm_median_365D SplitMargin_norm_std_365D Place_inv_min_365D Place_inv_max_365D Place_inv_mean_365D Place_inv_median_365D Place_inv_std_365D Place_log_min_365D Place_log_max_365D Place_log_mean_365D Place_log_median_365D Place_log_std_365D Prizemoney_norm_min_365D Prizemoney_norm_max_365D Prizemoney_norm_mean_365D Prizemoney_norm_median_365D Prizemoney_norm_std_365D prob_LogisticRegression
DogName_bf Track RaceNum
SPEEDY MARINA Bathurst 5 148673258 2022-07-04 0.0 SPEEDY MARINA 5 0 0 0.0 0.0 0 0 0 0 0 0.0 0.0 0.0 801455740 65928 Dawn Lee LADBROKES BLENDED BETS 1-3 WIN 04:37PM 307 Grade 5 04 Jul 22 0.0 0 0.0 inf 0.0 0.0 17.830 0.628105 7.880 0.0 0.0 0.136986 0.317232 0.317232 0.317232 0.317232 0.000000 0.173267 0.173267 0.173267 0.173267 0.000000 0.333333 0.333333 0.333333 0.333333 0.000000 0.602060 0.602060 0.602060 0.602060 0.000000 0.212109 0.212109 2.121089e-01 0.212109 0.000000e+00 0.317232 0.332373 0.324803 0.324803 0.010706 0.173267 0.210579 0.191923 0.191923 0.026383 0.166667 0.333333 0.250000 0.250000 0.117851 0.602060 0.845098 0.723579 0.723579 0.171854 0.0 0.212109 0.106054 0.106054 0.149984 0.236982 0.371585 0.306603 0.317232 0.052095 0.173267 0.406600 0.242140 0.210579 0.096894 0.125000 0.333333 0.186905 0.166667 0.083715 0.602060 0.954243 0.829918 0.845098 0.135269 0.0 0.212109 0.042422 0.000000 0.094858 0.073607
FILTHY PHANTOM Angle Park 5 161977365 2022-07-04 0.0 FILTHY PHANTOM 7 0 0 0.0 0.0 0 0 0 0 0 0.0 0.0 0.0 801448232 110385 Tony Hinrichsen GIDDY-UP (N/P) STAKE 07:27PM 342 Masters 04 Jul 22 0.0 0 0.0 inf 0.0 0.0 19.675 0.533632 7.730 0.0 0.0 0.124590 0.386907 0.455919 0.427165 0.420839 0.028492 0.314004 0.500000 0.408769 0.394027 0.082256 0.166667 0.333333 0.220000 0.200000 0.064979 0.602060 0.845098 0.756322 0.778151 0.090977 0.000000 0.182760 3.655208e-02 0.000000 8.173293e-02 0.386907 0.572783 0.449667 0.436026 0.053935 0.210921 0.551665 0.403088 0.394027 0.092070 0.142857 1.000000 0.291209 0.200000 0.235613 0.000000 0.903090 0.669243 0.778151 0.257193 0.0 0.249855 0.057150 0.000000 0.095131 0.277002 0.572783 0.459854 0.456897 0.059514 0.210921 0.616496 0.443040 0.443820 0.089192 0.142857 1.000000 0.464354 0.333333 0.320517 0.000000 0.903090 0.571676 0.602060 0.227821 0.0 0.249855 0.121936 0.180363 0.106520 0.128265
HOUND EM DOWN Maitland 3 196384049 2022-07-04 0.0 HOUND 'EM DOWN 7 0 0 0.0 0.0 0 0 0 0 0 0.0 0.0 0.0 801490787 313281 Steven Winstanley ZIPPING GARTH @ STUD 0-2 WIN 07:36PM 565 Mixed Maiden and Grade Five 04 Jul 22 0.0 0 0.0 inf 0.0 0.0 31.875 0.342029 13.795 0.0 0.0 0.200000 0.233442 0.277637 0.261405 0.273136 0.024321 0.218310 0.316690 0.270554 0.276662 0.049474 0.142857 0.200000 0.161905 0.142857 0.032991 0.778151 0.903090 0.861444 0.903090 0.072133 0.000000 0.000000 4.440892e-16 0.000000 0.000000e+00 0.194132 0.355912 0.279655 0.277637 0.046001 0.218310 0.316690 0.271078 0.275180 0.032574 0.125000 0.200000 0.158862 0.142857 0.026641 0.778151 0.954243 0.868122 0.903090 0.060784 0.0 0.000000 0.000000 0.000000 0.000000 0.194132 0.462627 0.328446 0.316396 0.054098 0.218310 0.346154 0.292413 0.296715 0.036166 0.125000 0.333333 0.187245 0.166667 0.054481 0.000000 0.954243 0.752017 0.845098 0.239572 0.0 0.212109 0.014373 0.000000 0.050118 0.070298
CAWBOURNE CROSS Angle Park 5 230053393 2022-07-04 0.0 CAWBOURNE CROSS 3 0 0 0.0 0.0 0 0 0 0 0 0.0 0.0 0.0 801448232 104699 Lisa Rasmussen GIDDY-UP (N/P) STAKE 07:27PM 342 Masters 04 Jul 22 0.0 0 0.0 inf 0.0 0.0 19.675 0.533632 7.730 0.0 0.0 0.169811 0.336395 0.413836 0.386878 0.410402 0.043753 0.500000 0.500000 0.500000 0.500000 0.000000 0.142857 0.250000 0.186508 0.166667 0.056260 0.698970 0.903090 0.815719 0.845098 0.105184 0.000000 0.000000 3.700743e-16 0.000000 2.533726e-17 0.336395 0.491121 0.420107 0.412119 0.059452 0.401747 0.506477 0.480510 0.500000 0.044239 0.142857 0.500000 0.273810 0.250000 0.129975 0.477121 0.903090 0.704218 0.698970 0.155860 0.0 0.188140 0.058566 0.000000 0.091070 0.280126 0.505631 0.403460 0.405048 0.064662 0.329857 0.610664 0.472084 0.500000 0.079724 0.142857 0.500000 0.269048 0.225000 0.125458 0.477121 0.903090 0.711583 0.738561 0.144209 0.0 0.205324 0.060368 0.000000 0.090871 0.179535
SKAIKRU Bathurst 6 243599770 2022-07-04 0.0 SKAIKRU 1 0 0 0.0 0.0 0 0 0 0 0 0.0 0.0 0.0 801455741 83214 Robert Sonter BATHURST RSL CLUB 04:59PM 450 Grade 5 04 Jul 22 0.0 0 0.0 inf 0.0 0.0 26.000 0.576406 15.450 0.0 0.0 0.159574 0.392354 0.392354 0.392354 0.392354 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.250000 0.250000 0.250000 0.250000 0.000000 0.698970 0.698970 0.698970 0.698970 0.000000 0.000000 0.000000 1.998401e-15 0.000000 0.000000e+00 0.293318 0.392354 0.329286 0.302186 0.054798 0.238956 0.238956 0.238956 0.238956 0.000000 0.142857 0.333333 0.242063 0.250000 0.095486 0.602060 0.903090 0.734707 0.698970 0.153664 0.0 0.167027 0.055676 0.000000 0.096433 0.285293 0.392354 0.317943 0.303341 0.036066 0.162722 0.258454 0.223927 0.237266 0.042031 0.125000 0.333333 0.202551 0.200000 0.070985 0.602060 0.954243 0.794252 0.778151 0.120113 0.0 0.167027 0.023861 0.000000 0.063130 0.092948

If you look closely at the dataframe above, you might notice that reserve dogs have a Box number of 9 or 10. There are only ever a maximum of 8 greyhounds in a race, so we will need to adjust this somehow. I didn't notice the issue for quite a while, but the good news is that the website gives us the info we need to make the adjustment:

[image: reserve_dog]

We can see that Rhinestone Ash is a reserve dog and has the number 9. If you click on the rules, you can see which box it is starting from:

[image: reserve_dog_2]

The problem is that my web scraping is pretty poor, and it would take significant time for me to learn. But after going through the documentation again, I found that box changes are actually available through the API under the clarifications attribute of marketDescription. You can access this within Flumine as market.market_catalogue.description.clarifications, but it's a bit weird: it returns box changes as a string that looks like this:

[image: box_changes]

Originally I had planned to leave the article here, since I've never worked with anything like this before and it's already getting pretty long. However, a huge shoutout to the Betfair Quants community and especially to Brett, who provided his solution for working with box changes.

from nltk.tokenize import regexp_tokenize
# my_string is an example string; live, you would get it from the API via: market.market_catalogue.description.clarifications.replace("<br/> Dog","<br/>Dog")
my_string = "<br/>Box changes:<br/>Dog 9. Tralee Blaze starts from box no. 8<br/><br/>Dog 6. That Other One starts from box no. 2<br/><br/>"
print(f'HTML Comment: {my_string}')
pattern1 = r'(?<=<br/>Dog ).+?(?= starts)'
pattern2 = r"(?<=\bbox no. )(\w+)"
runners_df = pd.DataFrame(regexp_tokenize(my_string, pattern1), columns = ['runner_name'])
runners_df['runner_name'] = runners_df['runner_name'].astype(str)
# Extract the dog number (e.g. "9") into its own column
runners_df['runner_number'] = runners_df['runner_name'].apply(lambda x: x[:(x.find(" ") - 1)].upper())
# Drop the dog number, keeping just the name
runners_df['runner_name'] = runners_df['runner_name'].apply(lambda x: x[(x.find(" ") + 1):].upper())
runners_df['Box'] = regexp_tokenize(my_string, pattern2)
runners_df
HTML Comment: <br/>Box changes:<br/>Dog 9. Tralee Blaze starts from box no. 8<br/><br/>Dog 6. That Other One starts from box no. 2<br/><br/>

runner_name runner_number Box
0 TRALEE BLAZE 9 8
1 THAT OTHER ONE 6 2

Brett's solution is amazing, but there is one problem: currently our code is structured so that we generate our predictions in the morning, well before the race starts. To implement the above fix, we need to generate our predictions just before the race starts, so we can incorporate the box information.

This means we need to write a little bit more code to make it happen, but we are almost there.

So the plan is now to update the old data and generate probabilities just before the race. Just before the jump, my code structure will look like this:

  • pull any data on box changes from the Betfair API
  • convert the box change data into a dataframe named runners_df using Brett's code
  • in the original dataframe named todays_data, replace any Box data with the runners_df data, otherwise leave it untouched
  • then merge the box_win_percent dataframe back onto the todays_data dataframe
  • now we can predict probabilities again and then renormalise them

It may sound a little complicated, but as we already have Brett's code there are only a few extra lines we need to write. This is what we will add to our Flumine strategy along with Brett's code:

# Running Brett's code gives us a nice dataframe named runners_df that we can work with
# Replace any old Box info in our original dataframe with the data available in runners_df
runners_df = runners_df.set_index('runner_name')
todays_data.loc[(runners_df.index[runners_df.index.isin(dog_names)],track,race_number),'Box'] = runners_df.loc[runners_df.index.isin(dog_names),'Box'].to_list()
# Drop the stale box_win_percentage column and merge it back on using the updated Box values
todays_data = todays_data.drop(columns = 'box_win_percentage')
todays_data = todays_data.reset_index().merge(box_win_percent, on = ['Track', 'Distance','Box'], how = 'left').set_index(['DogName_bf','Track','RaceNum'])

# Generate probabilities using Bruno's model
todays_data.loc[(dog_names,track,race_number),'prob_LogisticRegression'] = brunos_model.predict_proba(todays_data.loc[(dog_names,track,race_number)][feature_cols])[:,1]
# Renormalise probabilities
probabilities = todays_data.loc[dog_names,track,race_number]['prob_LogisticRegression']
todays_data.loc[(dog_names,track,race_number),'renormalised_prob'] = probabilities/probabilities.sum()
# Convert probabilities to ratings
todays_data.loc[(dog_names,track,race_number),'rating'] = 1/todays_data.loc[dog_names,track,race_number]['renormalised_prob']

Now everything is in place, and we can finally move on to placing our bets.


Automating our predictions

Now that we have our data nicely set up, we can reference our probabilities by getting the DogName, Track and RaceNum from the Betfair polling API, then renormalise the probabilities to calculate ratings, all with only a few lines of code. The rest is the same as How to Automate III.
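
As a minimal sketch of that lookup (the key values here are hypothetical; live, they come from market.market_catalogue), referencing a runner is a single .loc call on the MultiIndex we set earlier:

# Hypothetical key values for illustration; live they come from the polling API.
# Note: depending on your data, RaceNum may be stored as an int or a str.
dog_name, track, race_number = 'SPEEDY MARINA', 'Bathurst', 5
model_prob = todays_data.loc[(dog_name, track, race_number), 'prob_LogisticRegression']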

# Import libraries for logging in
import betfairlightweight
from flumine import Flumine, clients

# Credentials to login and logging in 
trading = betfairlightweight.APIClient('username','password',app_key='appkey')
client = clients.BetfairClient(trading, interactive_login=True)

# Login
framework = Flumine(client=client)

# Code to login when using security certificates
# trading = betfairlightweight.APIClient('username','password',app_key='appkey', certs=r'C:\Users\zhoui\openssl_certs')
# client = clients.BetfairClient(trading)

# framework = Flumine(client=client)
# Import libraries and logging
from flumine import BaseStrategy 
from flumine.order.trade import Trade
from flumine.order.order import LimitOrder
from flumine.markets.market import Market
from betfairlightweight.filters import streaming_market_filter
from betfairlightweight.resources import MarketBook
import re
import pandas as pd
import numpy as np
import datetime
import logging
logging.basicConfig(filename = 'how_to_automate_4.log', level=logging.INFO, format='%(asctime)s:%(levelname)s:%(message)s')

Let's create a new class for our strategy called FlatBetting that finds the best available back and lay prices 60 seconds before the jump. If either of those prices represents value, we will place a flat $5 bet at that price. This code is almost the same as How to Automate III.

Since we are now editing our todays_data dataframe inside our Flumine strategy, we will also need to declare todays_data as a global variable, which is a simple one-liner:

global todays_data
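
For readers unfamiliar with the keyword, here is a tiny standalone sketch (names hypothetical) of why the declaration is needed: without it, reassigning todays_data inside a function would create a new local variable instead of updating the module-level dataframe.

import pandas as pd

todays_data = pd.DataFrame({'Box': [1, 2]})

def update_boxes():
    global todays_data  # without this line, the assignment below would only create a local variable
    todays_data = todays_data.assign(Box=[3, 4])

update_boxes()
print(todays_data)  # the module-level dataframe has been updated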

I also wanted to call out one gotcha that Brett found, which is almost impossible to spot unless you are keeping a close eye on your logs. Sometimes the polling API and the streaming API don't match up when there are scratchings, so we need to check that they do:

# Check the polling API and streaming API matches up (sometimes it doesn't)
if runner_cata.selection_id == runner.selection_id:
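
If you want those mismatches to be visible rather than silently skipped, a small addition (my own sketch, not part of the original strategy) is to log whenever the check fails:

# Sketch only: log the mismatch instead of skipping it silently
if runner_cata.selection_id == runner.selection_id:
    ...  # safe to use the catalogue and book objects together
else:
    logging.warning(f'Polling/streaming mismatch: {runner_cata.selection_id} != {runner.selection_id}')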

class FlatBetting(BaseStrategy):
    def start(self) -> None:
        print("starting strategy 'FlatBetting' using the model we created the Greyhound modelling in Python Tutorial")

    def check_market_book(self, market: Market, market_book: MarketBook) -> bool:
        if market_book.status != "CLOSED":
            return True

    def process_market_book(self, market: Market, market_book: MarketBook) -> None:
        # Convert dataframe to a global variable
        global todays_data

        # At the 60 second mark:
        if market.seconds_to_start < 60 and market_book.inplay == False:
            # get the list of dog_names, name of the track/venue and race_number/RaceNum from Betfair Polling API
            dog_names = []
            track = market.market_catalogue.event.venue
            race_number = market.market_catalogue.market_name.split(' ',1)[0]  # comes out as R1/R2/R3 .. etc
            race_number = re.sub("[^0-9]", "", race_number)  # only keep the numbers 
            for runner_cata in market.market_catalogue.runners:
                dog_name = runner_cata.runner_name.split(' ',1)[1].upper()
                dog_names.append(dog_name)

            # Check if there are box changes; if there are, then use Brett's code
            if market.market_catalogue.description.clarifications is not None:
                # Brett's code to get box changes:
                my_string = market.market_catalogue.description.clarifications.replace("<br/> Dog","<br/>Dog")
                pattern1 = r'(?<=<br/>Dog ).+?(?= starts)'
                pattern2 = r"(?<=\bbox no. )(\w+)"
                runners_df = pd.DataFrame(regexp_tokenize(my_string, pattern1), columns = ['runner_name'])
                runners_df['runner_name'] = runners_df['runner_name'].astype(str)
                # Extract the dog number from the start of the string
                runners_df['runner_number'] = runners_df['runner_name'].apply(lambda x: x[:(x.find(" ") - 1)].upper())
                # Drop the dog number, keeping just the name
                runners_df['runner_name'] = runners_df['runner_name'].apply(lambda x: x[(x.find(" ") + 1):].upper())
                runners_df['Box'] = regexp_tokenize(my_string, pattern2)

                # Replace any old Box info in our original dataframe with data available in runners_df
                runners_df = runners_df.set_index('runner_name')
                todays_data.loc[(runners_df.index[runners_df.index.isin(dog_names)],track,race_number),'Box'] = runners_df.loc[runners_df.index.isin(dog_names),'Box'].to_list()
                # Merge box_win_percentage back on:
                todays_data = todays_data.drop(columns = 'box_win_percentage', axis = 1)
                todays_data = todays_data.reset_index().merge(box_win_percent, on = ['Track', 'Distance','Box'], how = 'left').set_index(['DogName_bf','Track','RaceNum'])

            # Generate probabilities using Bruno's model
            todays_data.loc[(dog_names,track,race_number),'prob_LogisticRegression'] = brunos_model.predict_proba(todays_data.loc[(dog_names,track,race_number)][feature_cols])[:,1]
            # Renormalise probabilities
            probabilities = todays_data.loc[dog_names,track,race_number]['prob_LogisticRegression']
            todays_data.loc[(dog_names,track,race_number),'renormalised_prob'] = probabilities/probabilities.sum()
            # Convert probabilities to ratings
            todays_data.loc[(dog_names,track,race_number),'rating'] = 1/todays_data.loc[dog_names,track,race_number]['renormalised_prob']

            # Use both the polling api (market.catalogue) and the streaming api at once:
            for runner_cata, runner in zip(market.market_catalogue.runners, market_book.runners):
                # Check the polling api and streaming api matches up (sometimes it doesn't)
                if runner_cata.selection_id == runner.selection_id:
                    # Get the dog_name from polling api then reference our data for our model rating
                    dog_name = runner_cata.runner_name.split(' ',1)[1].upper()

                    # Rest is the same as How to Automate III
                    model_price = todays_data.loc[dog_name,track,race_number]['rating']
                    ### If you have an issue such as:
                        # Unknown error The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
                        # Then do model_price = todays_data.loc[dog_name,track,race_number]['rating'].item()

                    # Log info before placing bets
                    logging.info(f'dog_name: {dog_name}')
                    logging.info(f'model_price: {model_price}')
                    logging.info(f'market_id: {market_book.market_id}')
                    logging.info(f'selection_id: {runner.selection_id}')

                    # If best available to back price is > rated price then place a flat $5 back bet
                    if runner.status == "ACTIVE" and runner.ex.available_to_back[0]['price'] > model_price:
                        trade = Trade(
                        market_id=market_book.market_id,
                        selection_id=runner.selection_id,
                        handicap=runner.handicap,
                        strategy=self,
                        )
                        order = trade.create_order(
                            side="BACK", order_type=LimitOrder(price=runner.ex.available_to_back[0]['price'], size=5.00)
                        )
                        market.place_order(order)
                    # If best available to lay price is < rated price then place a flat $5 lay bet
                    if runner.status == "ACTIVE" and runner.ex.available_to_lay[0]['price'] < model_price:
                        trade = Trade(
                        market_id=market_book.market_id,
                        selection_id=runner.selection_id,
                        handicap=runner.handicap,
                        strategy=self,
                        )
                        order = trade.create_order(
                            side="LAY", order_type=LimitOrder(price=runner.ex.available_to_lay[0]['price'], size=5.00)
                        )
                        market.place_order(order)

As the model we have built is for Australian greyhound racing, let's point our strategy at Australian greyhound win markets:

greyhounds_strategy = FlatBetting(
    market_filter=streaming_market_filter(
        event_type_ids=["4339"], # Greyhounds markets
        country_codes=["AU"], # Australian markets
        market_types=["WIN"], # Win markets
    ),
    max_order_exposure= 50, # Max exposure per order = 50
    max_trade_count=1, # Max 1 trade per selection
    max_live_trade_count=1, # Max 1 unmatched trade per selection
)

framework.add_strategy(greyhounds_strategy)

And add our auto-terminate and bet logging from the previous tutorials:

# import logging
import datetime
from flumine.worker import BackgroundWorker
from flumine.events.events import TerminationEvent

# logger = logging.getLogger(__name__)

"""
Worker can be used as followed:
    framework.add_worker(
        BackgroundWorker(
            framework,
            terminate,
            func_kwargs={"today_only": True, "seconds_closed": 1200},
            interval=60,
            start_delay=60,
        )
    )
This will run every 60s and will terminate 
the framework if all markets starting 'today' 
have been closed for at least 1200s
"""


# Function that stops automation running at the end of the day
def terminate(
    context: dict, flumine, today_only: bool = True, seconds_closed: int = 600
) -> None:
    """terminate framework if no markets
    live today.
    """
    markets = list(flumine.markets.markets.values())
    markets_today = [
        m
        for m in markets
        if m.market_start_datetime.date() == datetime.datetime.utcnow().date()
        and (
            m.elapsed_seconds_closed is None
            or (m.elapsed_seconds_closed and m.elapsed_seconds_closed < seconds_closed)
        )
    ]
    if today_only:
        market_count = len(markets_today)
    else:
        market_count = len(markets)
    if market_count == 0:
        # logger.info("No more markets available, terminating framework")
        flumine.handler_queue.put(TerminationEvent(flumine))

# Add the worker that stops automation at the end of the day to our framework
framework.add_worker(
    BackgroundWorker(
        framework,
        terminate,
        func_kwargs={"today_only": True, "seconds_closed": 1200},
        interval=60,
        start_delay=60,
    )
)
import os
import csv
import logging
from flumine.controls.loggingcontrols import LoggingControl
from flumine.order.ordertype import OrderTypes

logger = logging.getLogger(__name__)

FIELDNAMES = [
    "bet_id",
    "strategy_name",
    "market_id",
    "selection_id",
    "trade_id",
    "date_time_placed",
    "price",
    "price_matched",
    "size",
    "size_matched",
    "profit",
    "side",
    "elapsed_seconds_executable",
    "order_status",
    "market_note",
    "trade_notes",
    "order_notes",
]


class LiveLoggingControl(LoggingControl):
    NAME = "BACKTEST_LOGGING_CONTROL"

    def __init__(self, *args, **kwargs):
        super(LiveLoggingControl, self).__init__(*args, **kwargs)
        self._setup()

    # Changed file path; checks if the file orders_hta_4.csv already exists, and creates it if it doesn't
    def _setup(self):
        if os.path.exists("orders_hta_4.csv"):
            logging.info("Results file exists")
        else:
            with open("orders_hta_4.csv", "w") as m:
                csv_writer = csv.DictWriter(m, delimiter=",", fieldnames=FIELDNAMES)
                csv_writer.writeheader()

    def _process_cleared_orders_meta(self, event):
        orders = event.event
        with open("orders_hta_4.csv", "a") as m:
            for order in orders:
                if order.order_type.ORDER_TYPE == OrderTypes.LIMIT:
                    size = order.order_type.size
                else:
                    size = order.order_type.liability
                if order.order_type.ORDER_TYPE == OrderTypes.MARKET_ON_CLOSE:
                    price = None
                else:
                    price = order.order_type.price
                try:
                    order_data = {
                        "bet_id": order.bet_id,
                        "strategy_name": order.trade.strategy,
                        "market_id": order.market_id,
                        "selection_id": order.selection_id,
                        "trade_id": order.trade.id,
                        "date_time_placed": order.responses.date_time_placed,
                        "price": price,
                        "price_matched": order.average_price_matched,
                        "size": size,
                        "size_matched": order.size_matched,
                        "profit": 0 if not order.cleared_order else order.cleared_order.profit,
                        "side": order.side,
                        "elapsed_seconds_executable": order.elapsed_seconds_executable,
                        "order_status": order.status.value,
                        "market_note": order.trade.market_notes,
                        "trade_notes": order.trade.notes_str,
                        "order_notes": order.notes_str,
                    }
                    csv_writer = csv.DictWriter(m, delimiter=",", fieldnames=FIELDNAMES)
                    csv_writer.writerow(order_data)
                except Exception as e:
                    logger.error(
                        "_process_cleared_orders_meta: %s" % e,
                        extra={"order": order, "error": e},
                    )

        logger.info("Orders updated", extra={"order_count": len(orders)})

    def _process_cleared_markets(self, event):
        cleared_markets = event.event
        for cleared_market in cleared_markets.orders:
            logger.info(
                "Cleared market",
                extra={
                    "market_id": cleared_market.market_id,
                    "bet_count": cleared_market.bet_count,
                    "profit": cleared_market.profit,
                    "commission": cleared_market.commission,
                },
            )

framework.add_logging_control(
    LiveLoggingControl()
)
framework.run()

Conclusion and next steps

Boom! We now have an automated script that downloads all the data we need in the morning, generates a set of predictions, places flat-stake bets, logs all bets and switches itself off at the end of the day. All we need to do is hit play in the morning!

We have now written automation code for three different strategies; however, we haven't actually backtested any of our strategies or models yet. So, for the final part of the How to Automate series, we will be writing code to simulate the Exchange so we can backtest and optimise our strategies. Make sure not to miss it, as this is where I believe the sauce is made (not that I have made significant sauce).

Complete code

Run the code from your IDE by using py <filename>.py, making sure you amend the path to point to your input data.

Download from Github

from joblib import load
import os
import sys

# Allow imports from src folder
module_path = os.path.abspath(os.path.join('../src'))
if module_path not in sys.path:
    sys.path.append(module_path)

from datetime import datetime, timedelta
from dateutil.relativedelta import relativedelta
from dateutil import tz
from pandas.tseries.offsets import MonthEnd
from sklearn.preprocessing import MinMaxScaler
import itertools

import numpy as np
import pandas as pd
from nltk.tokenize import regexp_tokenize

# settings to display all columns
pd.set_option("display.max_columns", None)

import fasttrack as ft

from dotenv import load_dotenv
load_dotenv()

# Import libraries for logging in
import betfairlightweight
from flumine import Flumine, clients
# Import libraries and logging
from flumine import BaseStrategy 
from flumine.order.trade import Trade
from flumine.order.order import LimitOrder
from flumine.markets.market import Market
from betfairlightweight.filters import streaming_market_filter
from betfairlightweight.resources import MarketBook
import re
import pandas as pd
import numpy as np
import datetime
import logging
logging.basicConfig(filename = 'how_to_automate_4.log', level=logging.INFO, format='%(asctime)s:%(levelname)s:%(message)s')

# import logging
from flumine.worker import BackgroundWorker
from flumine.events.events import TerminationEvent

import csv
from flumine.controls.loggingcontrols import LoggingControl
from flumine.order.ordertype import OrderTypes

logger = logging.getLogger(__name__)

brunos_model = load('logistic_regression.joblib')
brunos_model

# Validate FastTrack API connection
api_key = os.getenv('FAST_TRACK_API_KEY',)
client = ft.Fasttrack(api_key)
track_codes = client.listTracks()

# Import race data excluding NZ races
au_tracks_filter = list(track_codes[track_codes['state'] != 'NZ']['track_code'])

# Time window to import data
# First day of the month 46 months back from now
date_from = (datetime.today() - relativedelta(months=46)).replace(day=1).strftime('%Y-%m-%d')
# First day of previous month
date_to = (datetime.today() - relativedelta(months=1)).replace(day=1).strftime('%Y-%m-%d')

# Dataframes to populate data with
race_details = pd.DataFrame()
dog_results = pd.DataFrame()

# For each month, either fetch data from API or use local CSV file if we already have downloaded it
for start in pd.date_range(date_from, date_to, freq='MS'):
    start_date = start.strftime("%Y-%m-%d")
    end_date = (start + MonthEnd(1)).strftime("%Y-%m-%d")
    try:
        filename_races = f'FT_AU_RACES_{start_date}.csv'
        filename_dogs = f'FT_AU_DOGS_{start_date}.csv'

        filepath_races = f'../data/{filename_races}'
        filepath_dogs = f'../data/{filename_dogs}'

        print(f'Loading data from {start_date} to {end_date}')
        if os.path.isfile(filepath_races):
            # Load local CSV file
            month_race_details = pd.read_csv(filepath_races) 
            month_dog_results = pd.read_csv(filepath_dogs) 
        else:
            # Fetch data from API
            month_race_details, month_dog_results = client.getRaceResults(start_date, end_date, au_tracks_filter)
            month_race_details.to_csv(filepath_races, index=False)
            month_dog_results.to_csv(filepath_dogs, index=False)

        # Combine monthly data
        race_details = race_details.append(month_race_details, ignore_index=True)
        dog_results = dog_results.append(month_dog_results, ignore_index=True)
    except Exception:
        print(f'Could not load data from {start_date} to {end_date}')

race_details.tail()

current_month_start_date = pd.Timestamp.now().replace(day=1).strftime("%Y-%m-%d")
current_month_end_date = (pd.Timestamp.now().replace(day=1)+ MonthEnd(1))
current_month_end_date = (current_month_end_date - pd.Timedelta('1 day')).strftime("%Y-%m-%d")

print(f'Start date: {current_month_start_date}')
print(f'End Date: {current_month_end_date}')

# Download data for races that have concluded this current month, up until today
# Start and end dates for current month
current_month_start_date = pd.Timestamp.now().replace(day=1).strftime("%Y-%m-%d")
current_month_end_date = (pd.Timestamp.now().replace(day=1)+ MonthEnd(1))
current_month_end_date = (current_month_end_date - pd.Timedelta('1 day')).strftime("%Y-%m-%d")

# Files names 
filename_races = f'FT_AU_RACES_{current_month_start_date}.csv'
filename_dogs = f'FT_AU_DOGS_{current_month_start_date}.csv'
# Where to store files locally
filepath_races = f'../data/{filename_races}'
filepath_dogs = f'../data/{filename_dogs}'

# Fetch data from API
month_race_details, month_dog_results = client.getRaceResults(current_month_start_date, current_month_end_date, au_tracks_filter)

# Save the files locally and replace any out of date fields
month_race_details.to_csv(filepath_races, index=False)
month_dog_results.to_csv(filepath_dogs, index=False)

dog_results

# This is super important; I spent literally hours before I found out this was causing errors
dog_results['@id'] = pd.to_numeric(dog_results['@id'])

# Append the extra data to our data frames 
race_details = race_details.append(month_race_details, ignore_index=True)
dog_results = dog_results.append(month_dog_results, ignore_index=True)

# Download the data for today's races
todays_date = pd.Timestamp.now().strftime("%Y-%m-%d")
todays_races, todays_dogs = client.getFullFormat(dt=todays_date, tracks=au_tracks_filter)

# display is for ipython notebooks only
# display(todays_races.head(1), todays_dogs.head(1))

# It seems that the todays_races dataframe doesn't have a date column, so let's add one
todays_races['date'] = pd.Timestamp.now().strftime('%d %b %y')
todays_races.head(1)

# It also seems that in the todays_dogs dataframe Box is labelled RaceBox instead, so let's rename it
# We can also see that some dogs have "Res." as a suffix on their name, i.e. they are reserve dogs;
# we will deal with these later
todays_dogs = todays_dogs.rename(columns={"RaceBox": "Box"})
todays_dogs.tail(3)

# Append today's data to this month's data
month_dog_results = pd.concat([month_dog_results, todays_dogs], join='outer')[month_dog_results.columns]
month_race_details = pd.concat([month_race_details, todays_races], join='outer')[month_race_details.columns]

# Append this month's data to the rest of our historical data
race_details = pd.concat([race_details, month_race_details], ignore_index=True)
dog_results = pd.concat([dog_results, month_dog_results], ignore_index=True)

race_details

## Cleanse and normalise the data
# Clean up the race dataset
race_details = race_details.rename(columns = {'@id': 'FastTrack_RaceId'})
race_details['Distance'] = race_details['Distance'].apply(lambda x: int(x.replace("m", "")))
race_details['date_dt'] = pd.to_datetime(race_details['date'], format = '%d %b %y')
# Clean up the dogs results dataset
dog_results = dog_results.rename(columns = {'@id': 'FastTrack_DogId', 'RaceId': 'FastTrack_RaceId'})

# New line of code (the rest of this code chunk is copied from Bruno's code)
dog_results['FastTrack_DogId'] = pd.to_numeric(dog_results['FastTrack_DogId'])

# Combine dogs results with race attributes
dog_results = dog_results.merge(
    race_details, 
    how = 'left',
    on = 'FastTrack_RaceId'
)

# Convert StartPrice to probability
dog_results['StartPrice'] = dog_results['StartPrice'].apply(lambda x: None if x is None else float(x.replace('$', '').replace('F', '')) if isinstance(x, str) else x)
dog_results['StartPrice_probability'] = (1 / dog_results['StartPrice']).fillna(0)
dog_results['StartPrice_probability'] = dog_results.groupby('FastTrack_RaceId')['StartPrice_probability'].apply(lambda x: x / x.sum())
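
To see what that renormalisation does, here is a toy race with made-up prices, where the raw implied probabilities sum to more than 1 because of the bookmaker's overround:

toy = pd.DataFrame({'FastTrack_RaceId': ['R1'] * 3, 'StartPrice': [1.8, 3.5, 3.5]})
toy['p'] = 1 / toy['StartPrice']  # 0.556, 0.286, 0.286 -> sums to roughly 1.13
toy['p'] = toy.groupby('FastTrack_RaceId')['p'].transform(lambda x: x / x.sum())
print(toy['p'].sum())  # 1.0 after renormalising within the race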

# Discard entries without results (scratched or did not finish)
dog_results = dog_results[~dog_results['Box'].isnull()]
dog_results['Box'] = dog_results['Box'].astype(int)

# Clean up other attributes
dog_results['RunTime'] = dog_results['RunTime'].astype(float)
dog_results['SplitMargin'] = dog_results['SplitMargin'].astype(float)
dog_results['Prizemoney'] = dog_results['Prizemoney'].astype(float).fillna(0)
dog_results['Place'] = pd.to_numeric(dog_results['Place'].apply(lambda x: x.replace("=", "") if isinstance(x, str) else 0), errors='coerce').fillna(0)
dog_results['win'] = dog_results['Place'].apply(lambda x: 1 if x == 1 else 0)

# Normalise some of the raw values
dog_results['Prizemoney_norm'] = np.log10(dog_results['Prizemoney'] + 1) / 12
dog_results['Place_inv'] = (1 / dog_results['Place']).fillna(0)
dog_results['Place_log'] = np.log10(dog_results['Place'] + 1).fillna(0)
dog_results['RunSpeed'] = (dog_results['RunTime'] / dog_results['Distance']).fillna(0)

## Generate features using raw data
# Calculate median winner time per track/distance
win_results = dog_results[dog_results['win'] == 1]
median_win_time = pd.DataFrame(data=win_results[win_results['RunTime'] > 0].groupby(['Track', 'Distance'])['RunTime'].median()).rename(columns={"RunTime": "RunTime_median"}).reset_index()
median_win_split_time = pd.DataFrame(data=win_results[win_results['SplitMargin'] > 0].groupby(['Track', 'Distance'])['SplitMargin'].median()).rename(columns={"SplitMargin": "SplitMargin_median"}).reset_index()
median_win_time.head()

# Calculate track speed index
median_win_time['speed_index'] = (median_win_time['RunTime_median'] / median_win_time['Distance'])
median_win_time['speed_index'] = MinMaxScaler().fit_transform(median_win_time[['speed_index']])
median_win_time.head()

# Compare dogs finish time with median winner time
dog_results = dog_results.merge(median_win_time, on=['Track', 'Distance'], how='left')
dog_results = dog_results.merge(median_win_split_time, on=['Track', 'Distance'], how='left')

# Normalise time comparison
dog_results['RunTime_norm'] = (dog_results['RunTime_median'] / dog_results['RunTime']).clip(0.9, 1.1)
dog_results['RunTime_norm'] = MinMaxScaler().fit_transform(dog_results[['RunTime_norm']])
dog_results['SplitMargin_norm'] = (dog_results['SplitMargin_median'] / dog_results['SplitMargin']).clip(0.9, 1.1)
dog_results['SplitMargin_norm'] = MinMaxScaler().fit_transform(dog_results[['SplitMargin_norm']])
dog_results.head()
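
The clip-then-scale step squeezes each ratio into [0, 1]: assuming the column actually spans the full 0.9 to 1.1 clip range, 0.9 maps to 0, 1.0 to 0.5 and 1.1 to 1 (toy values below). Note MinMaxScaler rescales by the observed min and max, so a column that doesn't span the full range will map slightly differently.

MinMaxScaler().fit_transform(pd.DataFrame({'ratio': [0.9, 1.0, 1.1]}))  # [[0. ], [0.5], [1. ]]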

# Calculate box winning percentage for each track/distance
box_win_percent = pd.DataFrame(data=dog_results.groupby(['Track', 'Distance', 'Box'])['win'].mean()).rename(columns={"win": "box_win_percent"}).reset_index()
# Add to dog results dataframe
dog_results = dog_results.merge(box_win_percent, on=['Track', 'Distance', 'Box'], how='left')
# Display example of barrier winning probabilities
print(box_win_percent.head(8))

dog_results[dog_results['FastTrack_DogId'] == 592253143].tail()[['date_dt','Place','DogName','RaceNum','Track','Distance','win','Prizemoney_norm','Place_inv','Place_log']]

# Generate rolling window features
dataset = dog_results.copy()
dataset = dataset.set_index(['FastTrack_DogId', 'date_dt']).sort_index()

# Use rolling window of 28, 91 and 365 days
rolling_windows = ['28D', '91D', '365D']
# Features to use for rolling windows calculation
features = ['RunTime_norm', 'SplitMargin_norm', 'Place_inv', 'Place_log', 'Prizemoney_norm']
# Aggregation functions to apply
aggregates = ['min', 'max', 'mean', 'median', 'std']
# Keep track of generated feature names
feature_cols = ['speed_index', 'box_win_percent']

for rolling_window in rolling_windows:
        print(f'Processing rolling window {rolling_window}')

        rolling_result = (
            dataset
            .reset_index(level=0).sort_index()
            .groupby('FastTrack_DogId')[features]
            .rolling(rolling_window)
            .agg(aggregates)
            .groupby(level=0)  # Thanks to Brett for finding this!
            .shift(1)
        )

        # My own workaround for reserve dogs: they can appear against multiple races on
        # the same day, which duplicates today's index entries, so keep only the first
        # occurrence before assigning the rolling features back
        temp = rolling_result.reset_index()
        temp = temp[temp['date_dt'] == pd.Timestamp.now().normalize()]
        temp = temp.groupby(['FastTrack_DogId', 'date_dt']).first()
        rolling_result.loc[pd.IndexSlice[:, pd.Timestamp.now().normalize()], :] = temp

        # Generate list of rolling window feature names (eg: RunTime_norm_min_365D)
        agg_features_cols = [f'{f}_{a}_{rolling_window}' for f, a in itertools.product(features, aggregates)]
        # Add features to dataset
        dataset[agg_features_cols] = rolling_result
        # Keep track of generated feature names
        feature_cols.extend(agg_features_cols)
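
The .shift(1) inside the loop is what keeps these features leak-free: without it, a dog's rolling statistics for a race would include that race's own result. A toy illustration with made-up values:

s = pd.Series([1.0, 0.5, 0.25], index=pd.to_datetime(['2022-03-01', '2022-03-10', '2022-03-20']))
print(s.rolling('28D').mean())           # each row includes the current race's own value
print(s.rolling('28D').mean().shift(1))  # each row only uses races strictly before it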

# Replace missing values with 0
dataset.fillna(0, inplace=True)
# display(dataset.head(8))  # display is only for ipython notebooks

# Only keep data from 2018-12-01 onwards
model_df = dataset.reset_index()
feature_cols = np.unique(feature_cols).tolist()
model_df = model_df[model_df['date_dt'] >= '2018-12-01']

# This line was originally part of Bruno's tutorial, but we don't run it in this script
# model_df = model_df[['date_dt', 'FastTrack_RaceId', 'DogName', 'win', 'StartPrice_probability'] + feature_cols]

# Only keep races where every dog has a value for each feature
races_exclude = model_df[model_df.isnull().any(axis=1)]['FastTrack_RaceId'].drop_duplicates()
model_df = model_df[~model_df['FastTrack_RaceId'].isin(races_exclude)]

# Generate predictions like normal
# Range of dates that we want to simulate later: '2022-03-01' to '2022-04-01'
todays_data = model_df[(model_df['date_dt'] >= pd.Timestamp('2022-03-01').strftime('%Y-%m-%d')) & (model_df['date_dt'] < pd.Timestamp('2022-04-01').strftime('%Y-%m-%d'))].copy()
dog_win_probabilities = brunos_model.predict_proba(todays_data[feature_cols])[:,1]
todays_data['prob_LogisticRegression'] = dog_win_probabilities
todays_data['renormalise_prob'] = todays_data.groupby('FastTrack_RaceId')['prob_LogisticRegression'].apply(lambda x: x / x.sum())
todays_data['rating'] = 1/todays_data['renormalise_prob']
todays_data = todays_data.sort_values(by = 'date_dt')
todays_data

def download_iggy_ratings(date):
    """Downloads the Betfair Iggy model ratings for a given date and formats it into a nice DataFrame.

    Args:
        date (datetime): the date we want to download the ratings for
    """
    iggy_url_1 = 'https://betfair-data-supplier-prod.herokuapp.com/api/widgets/iggy-joey/datasets?date='
    iggy_url_2 = date.strftime("%Y-%m-%d")
    iggy_url_3 = '&presenter=RatingsPresenter&csv=true'
    iggy_url = iggy_url_1 + iggy_url_2 + iggy_url_3

    # Download todays greyhounds ratings
    iggy_df = pd.read_csv(iggy_url)

    # Data cleaning
    iggy_df = iggy_df.rename(
        columns={
            "meetings.races.bfExchangeMarketId": "market_id",
            "meetings.races.runners.bfExchangeSelectionId": "selection_id",
            "meetings.races.runners.ratedPrice": "rating",
            "meetings.races.number": "RaceNum",
            "meetings.name": "Track",
            "meetings.races.runners.name": "DogName"
        }
    )
    # iggy_df = iggy_df[['market_id','selection_id','rating']]
    iggy_df['market_id'] = iggy_df['market_id'].astype(str)
    iggy_df['date_dt'] = date

    # Set market_id and selection_id as index for easy referencing
    # iggy_df = iggy_df.set_index(['market_id','selection_id'])
    return iggy_df

# Download historical ratings over a time period and convert into a big DataFrame.
back_test_period = pd.date_range(start='2022-03-01', end='2022-04-01')
frames = [download_iggy_ratings(day) for day in back_test_period]
iggy_df = pd.concat(frames)
iggy_df

# Format DogNames so the two sources merge cleanly
todays_data['DogName'] = todays_data['DogName'].apply(lambda x: x.replace("'", "").replace(".", "").replace("Res", "").strip())
iggy_df['DogName'] = iggy_df['DogName'].str.upper()
# Merge
backtest = iggy_df[['market_id','selection_id','DogName','date_dt']].merge(todays_data[['rating','DogName','date_dt']], how = 'inner', on = ['DogName','date_dt'])
backtest

# Save predictions in case we want to backtest/simulate them later
backtest.to_csv('backtest.csv', index=False) # CSV format
# backtest.to_pickle('backtest.pkl') # pickle format (faster, but can't open in excel)

todays_data[todays_data['FastTrack_RaceId'] == '798906744']

# Select today's data
todays_data = model_df[model_df['date_dt'] == pd.Timestamp.now().strftime('%Y-%m-%d')].copy()

# Generate runner win predictions
dog_win_probabilities = brunos_model.predict_proba(todays_data[feature_cols])[:,1]
todays_data['prob_LogisticRegression'] = dog_win_probabilities

# We no longer renormalise the probabilities in this chunk of code; we do it in Flumine instead
# todays_data['renormalise_prob'] = todays_data.groupby('FastTrack_RaceId')['prob_LogisticRegression'].apply(lambda x: x / x.sum())
# todays_data['rating'] = 1/todays_data['renormalise_prob']
# todays_data = todays_data.sort_values(by = 'date_dt')

todays_data

# Prepare data for easy reference in flumine
todays_data['DogName_bf'] = todays_data['DogName'].apply(lambda x: x.replace("'", "").replace(".", "").replace("Res", "").strip())
todays_data.replace({'Sandown (SAP)': 'Sandown Park'}, regex=True, inplace=True)
todays_data = todays_data.set_index(['DogName_bf','Track','RaceNum'])
todays_data.head()
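
The automation code below assumes the same imports as How to Automate III. If you are running this part on its own, an import block along these lines is needed (exact paths may vary slightly between flumine versions):

import re
import csv
import logging
import betfairlightweight
from flumine import Flumine, BaseStrategy, clients
from flumine.markets.market import Market
from flumine.order.trade import Trade
from flumine.order.ordertype import LimitOrder, OrderTypes
from flumine.worker import BackgroundWorker
from flumine.events.events import TerminationEvent
from flumine.controls.loggingcontrols import LoggingControl
from betfairlightweight.filters import streaming_market_filter
from betfairlightweight.resources import MarketBook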

# Log in to Betfair with our credentials
trading = betfairlightweight.APIClient('username', 'password', app_key='appkey')
client = clients.BetfairClient(trading, interactive_login=True)

# Create the framework
framework = Flumine(client=client)

# Code to login when using security certificates
# trading = betfairlightweight.APIClient('username','password',app_key='appkey', certs=r'C:\Users\zhoui\openssl_certs')
# client = clients.BetfairClient(trading)

# framework = Flumine(client=client)

class FlatBetting(BaseStrategy):
    def start(self) -> None:
        print("Starting strategy 'FlatBetting' using the model we created in the Greyhound Modelling in Python tutorial")

    def check_market_book(self, market: Market, market_book: MarketBook) -> bool:
        if market_book.status != "CLOSED":
            return True

    def process_market_book(self, market: Market, market_book: MarketBook) -> None:
        # Convert dataframe to a global variable
        global todays_data

        # At the 60 second mark:
        if market.seconds_to_start < 60 and market_book.inplay == False:
            # get the list of dog_names, name of the track/venue and race_number/RaceNum from Betfair Polling API
            dog_names = []
            track = market.market_catalogue.event.venue
            race_number = market.market_catalogue.market_name.split(' ',1)[0]  # comes out as R1/R2/R3 .. etc
            race_number = re.sub("[^0-9]", "", race_number)  # only keep the numbers 
            for runner_cata in market.market_catalogue.runners:
                dog_name = runner_cata.runner_name.split(' ',1)[1].upper()
                dog_names.append(dog_name)

            # Check if there are box changes, if there are then use Brett's code
            if market.market_catalogue.description.clarifications is not None:
                # Brett's code to get Box changes:
                my_string = market.market_catalogue.description.clarifications.replace("<br/> Dog","<br/>Dog")
                pattern1 = r'(?<=<br/>Dog ).+?(?= starts)'
                pattern2 = r"(?<=\bbox no. )(\w+)"
                runners_df = pd.DataFrame(regexp_tokenize(my_string, pattern1), columns=['runner_name'])
                runners_df['runner_name'] = runners_df['runner_name'].astype(str)
                # Extract the leading dog number (dropping the trailing '.')
                runners_df['runner_number'] = runners_df['runner_name'].apply(lambda x: x[:(x.find(" ") - 1)].upper())
                # Drop the leading number from the dog name
                runners_df['runner_name'] = runners_df['runner_name'].apply(lambda x: x[(x.find(" ") + 1):].upper())
                runners_df['Box'] = regexp_tokenize(my_string, pattern2)
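                # To make the two patterns concrete: a clarification string shaped like the
                # made-up example "<br/>Dog 6. FAST DOG starts from box no. 8<br/>Dog 2. SLOW DOG starts from box no. 1"
                # tokenises to ['6. FAST DOG', '2. SLOW DOG'] under pattern1 and ['8', '1'] under pattern2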

                # Replace any old Box info in our original dataframe with data available in runners_df
                runners_df = runners_df.set_index('runner_name')
                todays_data.loc[(runners_df.index[runners_df.index.isin(dog_names)],track,race_number),'Box'] = runners_df.loc[runners_df.index.isin(dog_names),'Box'].to_list()
                # Merge box_win_percent back on:
                todays_data = todays_data.drop(columns='box_win_percent')
                todays_data = todays_data.reset_index().merge(box_win_percent, on=['Track', 'Distance', 'Box'], how='left').set_index(['DogName_bf', 'Track', 'RaceNum'])

            # Generate probabilities using Bruno's model
            todays_data.loc[(dog_names,track,race_number),'prob_LogisticRegression'] = brunos_model.predict_proba(todays_data.loc[(dog_names,track,race_number)][feature_cols])[:,1]
            # Renormalise probabilities
            probabilities = todays_data.loc[(dog_names, track, race_number)]['prob_LogisticRegression']
            todays_data.loc[(dog_names, track, race_number), 'renormalised_prob'] = probabilities / probabilities.sum()
            # Convert probabilities to ratings
            todays_data.loc[(dog_names, track, race_number), 'rating'] = 1 / todays_data.loc[(dog_names, track, race_number)]['renormalised_prob']

            # Use both the polling api (market.catalogue) and the streaming api at once:
            for runner_cata, runner in zip(market.market_catalogue.runners, market_book.runners):
                # Check the polling api and streaming api matches up (sometimes it doesn't)
                if runner_cata.selection_id == runner.selection_id:
                    # Get the dog_name from polling api then reference our data for our model rating
                    dog_name = runner_cata.runner_name.split(' ',1)[1].upper()

                    # Rest is the same as How to Automate III
                    model_price = todays_data.loc[dog_name,track,race_number]['rating']
                    ### If you have an issue such as:
                        # Unknown error The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
                        # Then do model_price = todays_data.loc[dog_name,track,race_number]['rating'].item()

                    # Log info before placing bets
                    logging.info(f'dog_name: {dog_name}')
                    logging.info(f'model_price: {model_price}')
                    logging.info(f'market_id: {market_book.market_id}')
                    logging.info(f'selection_id: {runner.selection_id}')
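
                    # runner.ex.available_to_back / available_to_lay are lists of price levels,
                    # best price first, e.g. [{'price': 2.5, 'size': 120.0}, ...] (illustrative
                    # values), so index [0] below is the best currently available price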

                    # If best available to back price is > rated price then flat $5 back
                    if runner.status == "ACTIVE" and runner.ex.available_to_back[0]['price'] > model_price:
                        trade = Trade(
                            market_id=market_book.market_id,
                            selection_id=runner.selection_id,
                            handicap=runner.handicap,
                            strategy=self,
                        )
                        order = trade.create_order(
                            side="BACK", order_type=LimitOrder(price=runner.ex.available_to_back[0]['price'], size=5.00)
                        )
                        market.place_order(order)
                    # If best available to lay price is < rated price then flat $5 lay
                    if runner.status == "ACTIVE" and runner.ex.available_to_lay[0]['price'] < model_price:
                        trade = Trade(
                            market_id=market_book.market_id,
                            selection_id=runner.selection_id,
                            handicap=runner.handicap,
                            strategy=self,
                        )
                        order = trade.create_order(
                            side="LAY", order_type=LimitOrder(price=runner.ex.available_to_lay[0]['price'], size=5.00)
                        )
                        market.place_order(order)

greyhounds_strategy = FlatBetting(
    market_filter=streaming_market_filter(
        event_type_ids=["4339"], # Greyhounds markets
        country_codes=["AU"], # Australian markets
        market_types=["WIN"], # Win markets
    ),
    max_order_exposure=50, # Max exposure per order = 50
    max_trade_count=1, # Max 1 trade per selection
    max_live_trade_count=1, # Max 1 unmatched trade per selection
)

framework.add_strategy(greyhounds_strategy)

# logger = logging.getLogger(__name__)

"""
Worker can be used as follows:
    framework.add_worker(
        BackgroundWorker(
            framework,
            terminate,
            func_kwargs={"today_only": True, "seconds_closed": 1200},
            interval=60,
            start_delay=60,
        )
    )
This will run every 60s and will terminate 
the framework if all markets starting 'today' 
have been closed for at least 1200s
"""


# Function that stops automation running at the end of the day
def terminate(
    context: dict, flumine, today_only: bool = True, seconds_closed: int = 600
) -> None:
    """terminate framework if no markets
    live today.
    """
    markets = list(flumine.markets.markets.values())
    markets_today = [
        m
        for m in markets
        if m.market_start_datetime.date() == datetime.utcnow().date()
        and (
            m.elapsed_seconds_closed is None
            or (m.elapsed_seconds_closed and m.elapsed_seconds_closed < seconds_closed)
        )
    ]
    if today_only:
        market_count = len(markets_today)
    else:
        market_count = len(markets)
    if market_count == 0:
        # logger.info("No more markets available, terminating framework")
        flumine.handler_queue.put(TerminationEvent(flumine))

# Add the stopper worker to our framework
framework.add_worker(
    BackgroundWorker(
        framework,
        terminate,
        func_kwargs={"today_only": True, "seconds_closed": 1200},
        interval=60,
        start_delay=60,
    )
)

logger = logging.getLogger(__name__)

FIELDNAMES = [
    "bet_id",
    "strategy_name",
    "market_id",
    "selection_id",
    "trade_id",
    "date_time_placed",
    "price",
    "price_matched",
    "size",
    "size_matched",
    "profit",
    "side",
    "elapsed_seconds_executable",
    "order_status",
    "market_note",
    "trade_notes",
    "order_notes",
]


class LiveLoggingControl(LoggingControl):
    NAME = "BACKTEST_LOGGING_CONTROL"

    def __init__(self, *args, **kwargs):
        super(LiveLoggingControl, self).__init__(*args, **kwargs)
        self._setup()

    # Changed file path, and checks whether the file orders_hta_4.csv already exists; if it doesn't, create it
    def _setup(self):
        if os.path.exists("orders_hta_4.csv"):
            logging.info("Results file exists")
        else:
            with open("orders_hta_4.csv", "w") as m:
                csv_writer = csv.DictWriter(m, delimiter=",", fieldnames=FIELDNAMES)
                csv_writer.writeheader()

    def _process_cleared_orders_meta(self, event):
        orders = event.event
        with open("orders_hta_4.csv", "a") as m:
            for order in orders:
                if order.order_type.ORDER_TYPE == OrderTypes.LIMIT:
                    size = order.order_type.size
                else:
                    size = order.order_type.liability
                if order.order_type.ORDER_TYPE == OrderTypes.MARKET_ON_CLOSE:
                    price = None
                else:
                    price = order.order_type.price
                try:
                    order_data = {
                        "bet_id": order.bet_id,
                        "strategy_name": order.trade.strategy,
                        "market_id": order.market_id,
                        "selection_id": order.selection_id,
                        "trade_id": order.trade.id,
                        "date_time_placed": order.responses.date_time_placed,
                        "price": price,
                        "price_matched": order.average_price_matched,
                        "size": size,
                        "size_matched": order.size_matched,
                        "profit": 0 if not order.cleared_order else order.cleared_order.profit,
                        "side": order.side,
                        "elapsed_seconds_executable": order.elapsed_seconds_executable,
                        "order_status": order.status.value,
                        "market_note": order.trade.market_notes,
                        "trade_notes": order.trade.notes_str,
                        "order_notes": order.notes_str,
                    }
                    csv_writer = csv.DictWriter(m, delimiter=",", fieldnames=FIELDNAMES)
                    csv_writer.writerow(order_data)
                except Exception as e:
                    logger.error(
                        "_process_cleared_orders_meta: %s" % e,
                        extra={"order": order, "error": e},
                    )

        logger.info("Orders updated", extra={"order_count": len(orders)})

    def _process_cleared_markets(self, event):
        cleared_markets = event.event
        for cleared_market in cleared_markets.orders:
            logger.info(
                "Cleared market",
                extra={
                    "market_id": cleared_market.market_id,
                    "bet_count": cleared_market.bet_count,
                    "profit": cleared_market.profit,
                    "commission": cleared_market.commission,
                },
            )

framework.add_logging_control(
    LiveLoggingControl()
)

framework.run()

Disclaimer

Note that whilst models and automated strategies are fun and rewarding to create, we can't promise that your model or betting strategy will be profitable, and we make no representations in relation to the code shared or information on this page. If you're using this code or implementing your own strategies, you do so entirely at your own risk and you are responsible for any winnings/losses incurred. Under no circumstances will Betfair be liable for any loss or damage you suffer.