How To Build A Betfair Soccer Bot Part 3
This is a continuation of the tutorial - How To Build A Betfair Soccer Bot Part 2
"I've done my simulations, now how I use it to bet with?"
The previous tutorial describes how to run simulations on historical stream files using what I call a "shotgun" approach; where we bet on everything (both backing and laying). This part will walkthrough how to analyse the simulation results to determine a profitable angle.
Is there still profit to be made betting English Premier League?
It's no secret that English Premier League is likely the most modelled sports league in existence but that doesn't mean the markets are 100% efficient on Betfair or elsewhere. There are many theories on why this might be, one being that some punters take positions preplay with the intent to trade inplay. Australian customers aren't allowed to bet inplay without calling up, and even leaving bets inplay may not work in a trading sense due to the voiding of matched bets in the case of a VAR review leaving positions exposed.
However, these trading punters may prioritise flexibility for inplay trading rather than seeking the most efficient preplay prices, creating minor inefficiencies in the preplay market. As a result, preplay money may not fully reflect true probabilities, leaving gaps for value betting. Even a simple model that doesn’t use player-specific information, like the one we've created, can still find an edge by capitalising on these inefficiencies.
Grouping selections
To do this, we'll need to group selections together so as not to overfit. For market like Over/Under 1.5 Goals, the selection names are always the same and so this is straightforward but for any market where the team name is in the selection id, we'd want to group the selections together using a different characteristic related to each selection. In racing, this might be by grouping together runners by barrier groups, whereas in sports this is usually by utilising the home/away status of each team. In the Match Odds markets, there may not be enough information to accurately determine how good we are at predicting Arsenal but by grouping together we can look at our Home Team bets as a group rather than just a single team.
So the next code block will process our simulation results and initiate a selection grouping column where we can apply changes depending on the home/away status of the team and the market type. Once we've grouped together our markets we'll create some graphs that chart profitability against implied value of our rated price against the price at which we placed our bets. Ideally, we'd like to be able to make profit at any price, but overall that isn't realistic, so we might find that we'll have to introduce some "edge limits" to specify where we might not bet at all, depending on the price available in the market.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
'''
Here we will initialise a new column called 'selection_group' as a copy of 'selection' and then
apply changes to this column based on our groups.
For Correct Score & Half Time Score markets we will group scores based on whether the score indicates
a home win, away win or a draw, and additionally treat 0-0 scorelines separately as these
selections behave differently again.
For markets like Match Odds and Handicap (e.g. 'TEAM_A_1'), we will identify the home and away
teams from the event name and compare to whether the team name is contained with the selection name
Finally we will generate a series of graphs using pyplot to visualise these profitability curves
and identify where we might place bets to our gain
'''
models = ['ensemble']
# Define common conditions for simplification
home_team_win = ['1&0', '2&1', '2&0', '3&0', '3&1', '3&2', 'Any Other Home Win']
away_team_win = ['0&1', '1&2', '0&2', '0&3', '1&3', '2&3', 'Any Other Away Win']
draw_results = ['1&1', '2&2', '3&3', 'Any Other Draw', 'The Draw']
for model in models:
print(f'Processing {model}')
# Load the CSV file with specified data types
df = pd.read_csv(destination+f'soccer_simulation_{model}_processed.csv')
# Filter data based on implied values
df = df[(df[f'{model}_implied_value'] > -0.3) & (df[f'{model}_implied_value'] < 0.3)]
# Extract home and away teams
df[['home_team', 'away_team']] = df['fixture'].str.split(' v ', expand=True)
# Initialize the 'selection_group' column with default values
df['selection_group'] = df['selection'].str.replace(r'/','-')
# Home win related conditions
df.loc[
(df['market_type'].isin(['CORRECT_SCORE', 'HALF_TIME_SCORE', 'HALF_TIME', 'DRAW_NO_BET', 'MATCH_ODDS']) &
(df['selection'].isin(home_team_win) | (df['home_team'] == df['selection']))) |
(df['market_type'].str.contains('TEAM_A') & df['selection'].str.contains(r'\+')) |
(df['market_type'].str.contains('TEAM_B') & df['selection'].str.contains('-')),
'selection_group'
] = 'Home'
# Away win related conditions
df.loc[
(df['market_type'].isin(['CORRECT_SCORE', 'HALF_TIME_SCORE', 'HALF_TIME', 'DRAW_NO_BET', 'MATCH_ODDS']) &
(df['selection'].isin(away_team_win) | (df['away_team'] == df['selection']))) |
(df['market_type'].str.contains('TEAM_B') & df['selection'].str.contains(r'\+')) |
(df['market_type'].str.contains('TEAM_A') & df['selection'].str.contains('-')),
'selection_group'
] = 'Away'
# Draw related conditions
df.loc[
df['market_type'].isin(['CORRECT_SCORE', 'HALF_TIME_SCORE', 'HALF_TIME', 'MATCH_ODDS']) &
df['selection'].isin(draw_results),
'selection_group'
] = 'Draw'
# Half-Time/Full-Time related conditions
df.loc[
df['market_type'] == 'HALF_TIME_FULL_TIME',
'selection_group'
] = df.apply(lambda row:
'Home-Draw' if row['selection'].startswith(row['home_team']) and 'Draw' in row['selection'] else
'Home-Home' if row['selection'].startswith(row['home_team']) and row['selection'].endswith(row['home_team']) else
'Home-Away' if row['selection'].startswith(row['home_team']) else
'Away-Draw' if row['selection'].startswith(row['away_team']) and 'Draw' in row['selection'] else
'Away-Away' if row['selection'].startswith(row['away_team']) and row['selection'].endswith(row['away_team']) else
'Away-Home' if row['selection'].startswith(row['away_team']) else
'Draw-Home' if 'Draw' in row['selection'] and row['selection'].endswith(row['home_team']) else
'Draw-Away' if 'Draw' in row['selection'] and row['selection'].endswith(row['away_team']) else
row['selection_group'],
axis=1
)
# Match Odds and BTTS conditions
df.loc[
df['market_type'] == 'MATCH_ODDS_AND_BTTS',
'selection_group'
] = df.apply(lambda row:
'Away-Yes' if row['selection'].startswith(row['away_team']) and 'Yes' in row['selection'] else
'Away-No' if row['selection'].startswith(row['away_team']) and 'No' in row['selection'] else
'Home-Yes' if row['selection'].startswith(row['home_team']) and 'Yes' in row['selection'] else
'Home-No' if row['selection'].startswith(row['home_team']) and 'No' in row['selection'] else
row['selection_group'],
axis=1
)
# Match Odds and OU conditions
df.loc[
df['market_type'].str.contains('MATCH_ODDS_AND_OU'),
'selection_group'
] = df.apply(lambda row:
'Home-Over' if row['home_team'] in row['selection'] and 'Over' in row['selection'] else
'Home-Under' if row['home_team'] in row['selection'] and 'Under' in row['selection'] else
'Away-Over' if row['away_team'] in row['selection'] and 'Over' in row['selection'] else
'Away-Under' if row['away_team'] in row['selection'] and 'Under' in row['selection'] else
'Draw-Over' if 'Draw' in row['selection'] and 'Over' in row['selection'] else
'Draw-Under' if 'Draw' in row['selection'] and 'Under' in row['selection'] else
row['selection_group'],
axis=1
)
df.to_csv(destination+f'soccer_simulation_{model}_grouped.csv')
# Iterate over each unique combination of 'market_type' and 'side' and 'selection_group'
for (market_type, side, selection_group), group_df in df.groupby(['market_type', 'side','selection_group']):
# Drop rows where 'col' or 'profit' is NaN
group_df = group_df.dropna(subset=[f'{model}_implied_value', 'profit'])
# Sort by the column in ascending order
group_df_sorted = group_df.sort_values(by=f'{model}_implied_value', ascending=True)
# Calculate cumulative sum of 'profit'
group_df_sorted['cumulative_profit'] = group_df_sorted['profit'].cumsum()
# Find the index of the maximum cumulative profit
max_index = group_df_sorted['cumulative_profit'].idxmax()
max_value = group_df_sorted.loc[max_index, 'cumulative_profit']
max_x_value = group_df_sorted.loc[max_index, f'{model}_implied_value']
# Plot the cumulative sum with increased figure size
fig, ax = plt.subplots(figsize=(12, 8)) # Increase the figure size (width x height in inches)
ax.plot(group_df_sorted[f'{model}_implied_value'], group_df_sorted['cumulative_profit'], label=f'Cumulative Profit ({market_type}, {side}, {selection_group})')
ax.set_xlabel(f'{model}_implied_value')
ax.set_ylabel('Cumulative Profit')
ax.set_title(f'{f'{model}_implied_value'}: Cumulative Profit for {market_type} and {side} and {selection_group}')
ax.legend()
# Add a horizontal line at y=0
ax.axhline(y=0, color='red', linestyle='--', linewidth=1) # Red dashed line at y=0
# Set x-ticks to show a maximum of 10 ticks
num_ticks = min(len(group_df_sorted[f'{model}_implied_value']), 10)
ticks = np.linspace(group_df_sorted[f'{model}_implied_value'].min(), group_df_sorted[f'{model}_implied_value'].max(), num_ticks)
ax.set_xticks(ticks)
# Add text annotations for maximum and filtered minimum x-values
text_annotation_max = f"Max Profit: {max_value:.2f}\nAt {f'{model}_implied_value'}: {max_x_value:.4f}"
ax.text(
0.95, 0.05, # Position in relative coordinates (x, y) where (0, 0) is bottom-left and (1, 1) is top-right
text_annotation_max,
transform=ax.transAxes, # Transform coordinates to relative plot area
fontsize=12,
verticalalignment='bottom',
horizontalalignment='right',
bbox=dict(facecolor='white', alpha=0.8, edgecolor='none')
)
# Save the plot as a JPEG file
filename = destination+f"{market_type}_{side}_{selection_group}_{model}_cumulative_profit.jpeg"
print(f'Saving {filename}')
plt.savefig(filename, format='jpeg')
plt.close()
This image here is an example of a market where we could theoretically bet at any price and make a profit
This image here is a market type and grouping where we would need to input an edge limit
Once we've identified our edge from our graphs, then in order to bet these edges, we'll need to be able to generate new rated prices for upcoming matches, and then feed our rated prices into our live (not simulation) flumine class for the upcoming matches. We will start by using the EPL competition ID to find upcoming matches, update our dataset and perform the rolling window calculations
'''
Here's our boilerplate code again to ensure all the required packages are imported and variables defined
'''
# Import libraries
import pandas as pd
from datetime import datetime,timedelta
import numpy as np
import pickle
import betfairlightweight
import json
import warnings
warnings.filterwarnings('ignore')
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Specify destination for files
destination = 'C:/Users/motykam/OneDrive - Betfair Pty Ltd/Documents/Scripts/Soccer-Goals-Model/'
# Set a list of important columns to keep
column_list = ['date','name','match_id','home_or_away']
# Specify column lists to use later
MATCH_INFO_COLUMNS = ['date_home','match_id','name_home','name_away','goalsScored_home','goalsScored_away','halfTimeGoalsScored_home','halfTimeGoalsScored_away']
DROP_COLUMNS = ['home_or_away_home','date_away','home_or_away_away','goalsConceded_home','goalsConceded_away','halfTimeGoalsConceded_home','halfTimeGoalsConceded_away']
# Define model weights (these should sum to 1 for proper ensembling)
model_weights = {'GradientBoostingClassifier': 0.25,
'RandomForestClassifier': 0.25,
'LGBMClassifier': 0.25,
'KNeighborsClassifier': 0.15,
'LogisticsRegression': 0.1}
model_names = ["KNeighborsClassifier", "LogisticsRegression", "RandomForestClassifier", "GradientBoostingClassifier","LGBMClassifier"]
prefixes = ['home','away']
suffixes = ['','_ht']
new_column_names = []
'''
This function here will login to the Betfair API to find upcoming EPL matches, and then download required markets to a dataframe.
The streaming API doesn't support filtering on competition id so by gathering market_ids here we can just feed them into our streaming_market_filter later
We also generate additional rows to add to our dataset for upcoming matches to calculate the rolling figure calculations
It's important to ensure that your data is up-to-date
'''
def pull_upcoming_fixtures(days_ahead):
with open("credentials.json") as f:
cred = json.load(f)
my_username = cred["username"]
my_password = cred["password"]
my_app_key = cred["app_key"]
trading = betfairlightweight.APIClient(username=my_username,
password=my_password,
app_key=my_app_key
)
trading.login_interactive()
# Define a market filter
epl_event_filter = betfairlightweight.filters.market_filter(
competition_ids=["10932509"],
market_start_time={
"to": (datetime.datetime.now() + datetime.timedelta(days=days_ahead)).strftime("%Y-%m-%dT%TZ")},
market_type_codes=['BOTH_TEAMS_TO_SCORE','CORRECT_SCORE','DOUBLE_CHANCE','DRAW_NO_BET','FIRST_HALF_GOALS_05','FIRST_HALF_GOALS_15','FIRST_HALF_GOALS_25','HALF_TIME','HALF_TIME_FULL_TIME','HALF_TIME_SCORE','MATCH_ODDS','MATCH_ODDS_AND_BTTS','MATCH_ODDS_AND_OU_25','MATCH_ODDS_AND_OU_35','OVER_UNDER_05','OVER_UNDER_15','OVER_UNDER_25','OVER_UNDER_35','OVER_UNDER_45','TEAM_A_1','TEAM_A_OVER_UNDER_05','TEAM_A_OVER_UNDER_15','TEAM_A_OVER_UNDER_25','TEAM_A_WIN_TO_NIL','TEAM_B_1','TEAM_B_OVER_UNDER_05','TEAM_B_OVER_UNDER_15','TEAM_B_OVER_UNDER_25','TEAM_B_WIN_TO_NIL']
)
# Get a list of all thoroughbred events as objects
epl_events = trading.betting.list_market_catalogue(
filter=epl_event_filter,
market_projection=['EVENT','MARKET_DESCRIPTION'],
max_results=200
)
# Create a DataFrame with all the events by iterating over each event object
epl_events_upcoming = pd.DataFrame({
"event_name": [market_cat_object.event.name for market_cat_object in epl_events],
"event_id": [market_cat_object.event.id for market_cat_object in epl_events],
"open_date": [market_cat_object.event.open_date for market_cat_object in epl_events],
"market_name": [market_cat_object.market_name for market_cat_object in epl_events],
"market_id": [market_cat_object.market_id for market_cat_object in epl_events],
"market_type" : [market_cat_object.description.market_type for market_cat_object in epl_events]
})
# Remove daily goals and matches too far ahead
epl_events_upcoming = epl_events_upcoming[epl_events_upcoming["event_name"].str.contains(" v ", case=False, na=False)]
# Find the maximum match_id from our existing dataset
existing_df = pd.read_csv("englishPremierLeague.csv")
max_match_id = existing_df["match_id"].max()
upcoming_fixtures = epl_events_upcoming[["open_date","event_name"]]
upcoming_fixtures.drop_duplicates(inplace=True)
# Create the new DataFrame
upcoming_matches = pd.DataFrame({
"date": upcoming_fixtures["open_date"].dt.date,
"match_id": range(max_match_id + 1, max_match_id + 1 + len(upcoming_fixtures)),
"home_team_name": upcoming_fixtures["event_name"].str.split(" v ").str[0],
"away_team_name": upcoming_fixtures["event_name"].str.split(" v ").str[1],
"home_team_goal_count": 0,
"away_team_goal_count": 0,
"home_team_half_time_goal_count": 0,
"away_team_half_time_goal_count": 0,
"home_team_shots": 0,
"away_team_shots": 0,
"home_team_shots_on_target": 0,
"away_team_shots_on_target": 0,
"home_team_fouls": 0,
"away_team_fouls": 0,
"home_team_corner_count": 0,
"away_team_corner_count": 0,
"home_team_yellow": 0,
"away_team_yellow": 0,
"home_team_red": 0,
"away_team_red": 0,
})
upcoming_matches['date'] = pd.to_datetime(upcoming_matches['date'])
return epl_events_upcoming, upcoming_matches
'''
This function pulls our previous functions for calculating the rolling windows and applies them again to the upcoming matches
'''
def generate_rolling_windows_upcoming(upcoming_matches):
raw_match_stats = load_data()
raw_match_stats = pd.concat([raw_match_stats,upcoming_matches])
raw_match_stats = championshipPoints(raw_match_stats)
team_match_stats = separate_home_and_away(raw_match_stats)
team_match_stats = calculate_differential(team_match_stats)
team_data_all_sorted, team_rolling_columns = calculate_rolling_median(team_match_stats)
team_stats_rolling_dataframe = rejoin_home_away_data(team_data_all_sorted,column_list,team_rolling_columns)
team_stats_rolling_dataframe,feature_columns = reshape_dataframe(MATCH_INFO_COLUMNS,DROP_COLUMNS,team_stats_rolling_dataframe)
upcoming_matches_rolling_windows = team_stats_rolling_dataframe[team_stats_rolling_dataframe['date_home'] >= datetime.now() - timedelta(days=1)]
return upcoming_matches_rolling_windows,feature_columns
'''
This function loads our pickle files and applies them to generate the class probabilities for the new matches
'''
def apply_models_to_new_data_with_ensemble(model_names, new_data, model_weights):
# Separate match information and feature columns
final_predictions = new_data[MATCH_INFO_COLUMNS]
feature_data = new_data.drop(columns=MATCH_INFO_COLUMNS)
for prefix in prefixes:
for suffix in suffixes:
# Initialize a list to hold the weighted predictions
weighted_predictions = None
# Loop over each model
for model_name in model_names:
# Load the model
with open(f"{model_name}_{prefix}{suffix}.pickle", "rb") as file:
model = pickle.load(file)
model_predictions = model.predict_proba(feature_data)
# Weight the predictions by the model's weight
if weighted_predictions is None:
weighted_predictions = model_predictions * model_weights.get(model_name, 0)
else:
weighted_predictions += model_predictions * model_weights.get(model_name, 0)
# Normalize the weighted predictions (to make sure they sum to 1 for each instance)
weighted_predictions /= np.sum(list(model_weights.values()))
# Add the ensemble predictions as new columns to the final DataFrame
for i in range(weighted_predictions.shape[1]):
final_predictions[f"{prefix}_{i}{suffix}"] = weighted_predictions[:, i]
return final_predictions
'''
This function takes our class probabilities and then calculates the market rated prices
'''
def generate_rated_prices(goal_predictions):
# List of indices representing the number of goals for which we want to calculate the probability
full_time_indices = range(8) # 0,1,2,3,4,5,6,7
half_time_indices = range(6) # 0,1,2,3,4,5
# Generate new columns efficiently using vectorized operations
for i in full_time_indices:
for j in full_time_indices:
home_col = f'home_{i}'
away_col = f'away_{j}'
new_col_name = f'{home_col}_x_{away_col}'
goal_predictions[new_col_name] = goal_predictions[home_col].fillna(0) * goal_predictions[away_col].fillna(0)
new_column_names.append(new_col_name)
for i in half_time_indices:
for j in half_time_indices:
home_ht_col = f'home_{i}_ht'
away_ht_col = f'away_{j}_ht'
new_col_name = f'{home_ht_col}_x_{away_ht_col}'
goal_predictions[new_col_name] = goal_predictions[home_ht_col].fillna(0) * goal_predictions[away_ht_col].fillna(0)
new_column_names.append(new_col_name)
# Final DataFrame with only the relevant columns (Match Information plus the probability of each scoreline)
goal_predictions = goal_predictions[MATCH_INFO_COLUMNS + [col for col in goal_predictions.columns if '_x_' in col]]
goal_predictions.to_csv('upcoming_scoreline_predictions.csv', index=False)
rated_prices = process_model_predictions(goal_predictions)
rated_prices.to_csv('upcoming_rated_prices.csv', index=False)
return rated_prices
Now that we've generated our rated prices, lets go back to our simulation results to determine which markets we want to bet into, which selections and the minimum and maximum acceptable prices for which we'll bet.
The below is an example of some market types, selections and value ranges that could be used to place bets. Where we've determined that we don't need to set a value boundary, then we'll set the min/max prices to 1.01/1000. If we do need to define a value boundary, we'll do the calculations individually for each market_type/selection_group
Market Type | Selection Type | Side | Min EV | Max EV |
---|---|---|---|---|
BOTH_TEAMS_TO_SCORE | No | LAY | - | - |
CORRECT_SCORE | 0-0 | LAY | - | -0.0325 |
CORRECT_SCORE | Away | LAY | - | 0.08 |
CORRECT_SCORE | Home | BACK | - | 0.08 |
CORRECT_SCORE | Draw | BACK | - | 0 |
DOUBLE_CHANCE | Draw-Away | LAY | - | - |
DOUBLE_CHANCE | Home-Draw | BACK | -0.08 | 0.11 |
DRAW_NO_BET | Away | LAY | - | 0.02 |
FIRST_HALF_GOALS_05 | Under 0.5 Goals | LAY | - | - |
FIRST_HALF_GOALS_15 | Under 1.5 Goals | LAY | - | - |
FIRST_HALF_GOALS_25 | Under 2.5 Goals | LAY | - | - |
HALF_TIME_SCORE | 0-0 | LAY | -0.05 | - |
HALF_TIME_SCORE | Draw | BACK | -0.01 | - |
HALF_TIME_SCORE | Away | LAY | -0.07 | 0.11 |
HALF_TIME_FULL_TIME | Draw-Home | BACK | -0.06 | - |
HALF_TIME_FULL_TIME | Away-Away | LAY | - | - |
MATCH_ODDS | Away | LAY | - | - |
MATCH_ODDS | Home | BACK | - | - |
MATCH_ODDS_AND_OU_25 | Draw/Under | LAY | - | - |
MATCH_ODDS_AND_OU_25 | Home/Over | BACK | -0.05 | - |
OVER_UNDER_05 | Under 0.5 Goals | LAY | -0.03 | - |
OVER_UNDER_15 | Over 1.5 Goals | BACK | -0.05 | 0.1 |
OVER_UNDER_25 | Over 2.5 Goals | BACK | -0.05 | 0.1 |
OVER_UNDER_35 | Over 3.5 Goals | BACK | -0.05 | 0.1 |
OVER_UNDER_45 | Over 4.5 Goals | BACK | -0.05 | 0.1 |
OVER_UNDER_55 | Over 5.5 Goals | BACK | -0.05 | 0.1 |
TEAM_A_1 | Home +1 | BACK | - | - |
TEAM_A_1 | Draw | LAY | - | - |
TEAM_A_OVER_UNDER_05 | Over 0.5 Goals | BACK | - | 0.07 |
TEAM_A_OVER_UNDER_15 | Over 1.5 Goals | BACK | -0.05 | - |
TEAM_A_OVER_UNDER_25 | Over 2.5 Goals | BACK | -0.05 | - |
TEAM_B_1 | Draw | BACK | - | - |
TEAM_B_1 | Away | LAY | - | - |
'''
This function will calculate our upper and lower price limits
'''
def set_min_max_prices(rated_prices,epl_events_upcoming):
# Initialise certain columns as required with placeholder values
rated_prices['home_team'] = rated_prices['fixture'].str.split(' v ').str[0]
rated_prices['away_team'] = rated_prices['fixture'].str.split(' v ').str[1]
rated_prices['min_price'] = 1.01
rated_prices['max_price'] = 1000
rated_prices['side'] = None
df = pd.merge(rated_prices, epl_events_upcoming, how="left",left_on=["fixture","market_name"],right_on=["event_name","market_name"])
df = df[['event_date',
'home_team',
'away_team',
'fixture',
'market_type',
'market_name',
'market_id',
'runner_name',
'rated_price',
'min_price',
'max_price',
'side'
]]
df = df.dropna(subset=['market_type'])
# Set our side
df.loc[
(
(df['market_type'].isin(['BOTH_TEAMS_TO_SCORE']) & df['runner_name'].isin(['No'])) |
(df['market_type'].isin(['CORRECT_SCORE']) & df['runner_name'].isin(['0 - 0','0 - 1','0 - 2','0 - 3','1 - 2','1 - 3','2 - 3','Any Other Away Win'])) |
(df['market_type'].isin(['DOUBLE CHANCE']) & df['runner_name'].isin(['Draw or Away'])) |
(df['market_type'].isin(['DRAW_NO_BET']) & (df['runner_name'] == df['away_team'])) |
(df['market_type'].isin(['FIRST_HALF_GOALS_05']) & df['runner_name'].isin(['Under 0.5 Goals'])) |
(df['market_type'].isin(['FIRST_HALF_GOALS_15']) & df['runner_name'].isin(['Under 1.5 Goals'])) |
(df['market_type'].isin(['FIRST_HALF_GOALS_25']) & df['runner_name'].isin(['Under 2.5 Goals'])) |
(df['market_type'].isin(['HALF_TIME_SCORE']) & df['runner_name'].isin(['0 - 0','0 - 1','0 - 2','1 - 2'])) |
(df['market_type'].isin(['MATCH_ODDS']) & (df['runner_name'] == df['away_team'])) |
(df['market_type'].isin(['MATCH_ODDS_AND_OU_25']) & df['runner_name'].isin(['Draw/Under 2.5 Goals'])) |
(df['market_type'].isin(['OVER_UNDER_05']) & df['runner_name'].isin(['Under 0.5 Goals'])) |
(df['market_type'].isin(['TEAM_A_1']) & df['runner_name'].isin(['Draw']))
),
'side'
] = 'LAY'
df['side'] = df.apply(lambda row: 'LAY' if row['market_type'] == 'TEAM_B_1' and '+' in row['runner_name'] else row['side'], axis=1)
df['side'] = df.apply(lambda row: 'LAY' if row['market_type'] == 'HALF_TIME_FULL_TIME' and row['runner_name'].count(row['away_team']) == 2 else row['side'], axis=1)
df.loc[
(
(df['market_type'].isin(['CORRECT_SCORE']) & df['runner_name'].isin(['1 - 0','2 - 0','3 - 0','2 - 1','2 - 0','3 - 1','3 - 2','Any Other Home Win','1 - 1','2 - 2','3 - 3', 'Any Other Draw'])) |
(df['market_type'].isin(['DOUBLE CHANCE']) & df['runner_name'].isin(['Home or Draw'])) |
(df['market_type'].isin(['HALF_TIME_SCORE']) & df['runner_name'].isin(['1 - 1','2 - 2'])) |
(df['market_type'].isin(['MATCH_ODDS']) & (df['runner_name'] == df['home_team'])) |
(df['market_type'].isin(['OVER_UNDER_15']) & df['runner_name'].isin(['Over 1.5 Goals'])) |
(df['market_type'].isin(['OVER_UNDER_25']) & df['runner_name'].isin(['Over 2.5 Goals'])) |
(df['market_type'].isin(['OVER_UNDER_35']) & df['runner_name'].isin(['Over 3.5 Goals'])) |
(df['market_type'].isin(['OVER_UNDER_45']) & df['runner_name'].isin(['Over 4.5 Goals'])) |
(df['market_type'].isin(['OVER_UNDER_55']) & df['runner_name'].isin(['Over 5.5 Goals'])) |
(df['market_type'].isin(['TEAM_A_OVER_05']) & df['runner_name'].isin(['Over 0.5 Goals'])) |
(df['market_type'].isin(['TEAM_A_OVER_15']) & df['runner_name'].isin(['Over 1.5 Goals'])) |
(df['market_type'].isin(['TEAM_A_OVER_25']) & df['runner_name'].isin(['Over 2.5 Goals'])) |
(df['market_type'].isin(['TEAM_B_1']) & df['runner_name'].isin(['Draw']))
),
'side'
] = 'BACK'
df['side'] = df.apply(lambda row: 'BACK' if row['market_type'] == 'TEAM_A_1' and '+' in row['runner_name'] else row['side'], axis=1)
df['side'] = df.apply(lambda row: 'BACK' if row['market_type'] == 'HALF_TIME_FULL_TIME' and 'Draw/' in row['runner_name'] and row['runner_name'].count(row['home_team']) == 1 else row['side'], axis=1)
df['side'] = df.apply(lambda row: 'BACK' if row['market_type'] == 'MATCH_ODDS_AND_OU_25' and 'OVer' in row['runner_name'] and row['runner_name'].count(row['home_team']) == 1 else row['side'], axis=1)
# Discard any markets/selections that we're not interested in
df.dropna(subset=['side'],inplace=True)
# Set any prices that we have limits for
df['max_price'] = df.apply(lambda row: (100 * row['rated_price'])/(100 + 3.25 * row['rated_price'])
if row['market_type'] == 'CORRECT_SCORE' and row['runner_name'] == '0 - 0' else row['max_price'],axis=1)
df['max_price'] = df.apply(lambda row: (100 * row['rated_price'])/(100 - 8 * row['rated_price'])
if row['market_type'] == 'CORRECT_SCORE' and row['runner_name'] in ['0 - 1','0 - 2','0 - 3','1 - 2','1 - 3','2 - 3','Any Other Away Win'] else row['max_price'],axis=1)
df['max_price'] = df.apply(lambda row: (100 * row['rated_price'])/(100 - 8 * row['rated_price'])
if row['market_type'] == 'CORRECT_SCORE' and row['runner_name'] in ['1 - 0','2 - 0','3 - 0','2 - 1','3 - 1','3 - 2','Any Other Home Win'] else row['max_price'],axis=1)
df['max_price'] = df.apply(lambda row: row['rated_price']
if row['market_type'] == 'CORRECT_SCORE' and row['runner_name'] in ['1 - 1','2 - 2','Any Other Draw'] else row['max_price'],axis=1)
df['max_price'] = df.apply(lambda row: (100 * row['rated_price'])/(100 - 11 * row['rated_price'])
if row['market_type'] == 'DOUBLE_CHANCE' and row['runner_name'] == 'Home or Draw' else row['max_price'],axis=1)
df['min_price'] = df.apply(lambda row: (100 * row['rated_price'])/(100 + 8 * row['rated_price'])
if row['market_type'] == 'DOUBLE_CHANCE' and row['runner_name'] == 'Home or Draw' else row['min_price'],axis=1)
df['max_price'] = df.apply(lambda row: (100 * row['rated_price'])/(100 - 2 * row['rated_price'])
if row['market_type'] == 'DRAW_NO_BET' else row['max_price'],axis=1)
df['min_price'] = df.apply(lambda row: (100 * row['rated_price'])/(100 + 5 * row['rated_price'])
if row['market_type'] == 'HALF_TIME_SCORE' and row['runner_name'] == '0 - 0' else row['min_price'],axis=1)
df['min_price'] = df.apply(lambda row: (100 * row['rated_price'])/(100 + 1 * row['rated_price'])
if row['market_type'] == 'CORRECT_SCORE' and row['runner_name'] in ['1 - 1','2 - 2'] else row['min_price'],axis=1)
df['max_price'] = df.apply(lambda row: (100 * row['rated_price'])/(100 - 11 * row['rated_price'])
if row['market_type'] == 'CORRECT_SCORE' and row['runner_name'] in ['0 - 1','0 - 2','1 - 2'] else row['max_price'],axis=1)
df['min_price'] = df.apply(lambda row: (100 * row['rated_price'])/(100 + 7 * row['rated_price'])
if row['market_type'] == 'CORRECT_SCORE' and row['runner_name'] in ['0 - 1','0 - 2','1 - 2'] else row['min_price'],axis=1)
df['min_price'] = df.apply(lambda row: (100 * row['rated_price'])/(100 + 6 * row['rated_price'])
if row['market_type'] == 'HALF_TIME_FULL_TIME' and 'Draw/' in row['runner_name'] and row['runner_name'].count(row['home_team']) == 1 else row['min_price'], axis=1)
df['min_price'] = df.apply(lambda row: (100 * row['rated_price'])/(100 + 5 * row['rated_price'])
if row['market_type'] == 'MATCH_ODDS_AND_OU_25' and 'OVer' in row['runner_name'] and row['runner_name'].count(row['home_team']) == 1 else row['min_price'], axis=1)
df['min_price'] = df.apply(lambda row: (100 * row['rated_price'])/(100 + 3 * row['rated_price'])
if row['market_type'] == 'OVER_UNDER_05' else row['min_price'],axis=1)
df['min_price'] = df.apply(lambda row: (100 * row['rated_price'])/(100 + 5 * row['rated_price'])
if row['market_type'] in ['OVER_UNDER_15','OVER_UNDER_25','OVER_UNDER_35','OVER_UNDER_45','OVER_UNDER_55'] else row['min_price'],axis=1)
df['max_price'] = df.apply(lambda row: (100 * row['rated_price'])/(100 - 10 * row['rated_price'])
if row['market_type'] in ['OVER_UNDER_15','OVER_UNDER_25','OVER_UNDER_35','OVER_UNDER_45','OVER_UNDER_55'] else row['max_price'],axis=1)
df['min_price'] = df.apply(lambda row: (100 * row['rated_price'])/(100 + 5 * row['rated_price'])
if row['market_type'] in ['OVER_UNDER_15','OVER_UNDER_25','OVER_UNDER_35','OVER_UNDER_45','OVER_UNDER_55'] else row['min_price'],axis=1)
df['max_price'] = df.apply(lambda row: (100 * row['rated_price'])/(100 - 10 * row['rated_price'])
if row['market_type'] in ['OVER_UNDER_15','OVER_UNDER_25','OVER_UNDER_35','OVER_UNDER_45','OVER_UNDER_55'] else row['max_price'],axis=1)
'''
These calculations can result in negative values
e.g. where we've set an upper boundary of 7% but the implied probability of the rated price is only 4%
So we'll set any negative values to 1000
'''
df['max_price'] = df.apply(lambda row: 1000
if row['max_price'] < 0 else row['max_price'], axis=1)
df.sort_values(by=['event_date','fixture','market_type','runner_name'],inplace=True)
return df
# Call our functions to generate the dataframe for flumine
def generate_flumine_dataframe():
epl_events_upcoming, upcoming_matches = pull_upcoming_fixtures(2)
upcoming_matches_rolling_windows,feature_columns = generate_rolling_windows_upcoming(upcoming_matches)
upcoming_matches_rolling_windows[feature_columns] = scaler.fit_transform(upcoming_matches_rolling_windows[feature_columns])
goal_predictions = apply_models_to_new_data_with_ensemble(model_names, upcoming_matches_rolling_windows, model_weights)
rated_prices = generate_rated_prices(goal_predictions)
final_flumine_dataframe = set_min_max_prices(rated_prices,epl_events_upcoming)
print(final_flumine_dataframe)
return final_flumine_dataframe
final_flumine_dataframe = generate_flumine_dataframe()
final_flumine_dataframe.to_csv('flumine_dataframe_upcoming_matches.csv',index=False)
# Import libraries for logging in
import betfairlightweight
from flumine import Flumine, clients
import json
import pandas as pd
import os
import csv
import time
# Import libraries and logging
from flumine import BaseStrategy
from flumine.order.trade import Trade
from flumine.order.order import MarketOnCloseOrder
from flumine.markets.market import Market
from betfairlightweight.filters import streaming_market_filter
from betfairlightweight.resources import MarketBook
from flumine.events.events import TerminationEvent
import logging
from datetime import datetime
from flumine.controls.loggingcontrols import LoggingControl
from flumine.order.ordertype import OrderTypes
from flumine.worker import BackgroundWorker
logging.basicConfig(filename = 'soccer_bot.log', level=logging.CRITICAL, format='%(asctime)s:%(levelname)s:%(message)s')
logger = logging.getLogger(__name__)
FIELDNAMES = [
"bet_id",
"strategy_name",
"market_id",
"selection_id",
"trade_id",
"date_time_placed",
"price",
"price_matched",
"size",
"size_matched",
"profit",
"side",
"elapsed_seconds_executable",
"order_status",
"market_note",
"trade_notes",
"order_notes",
]
def bflw_trading():
with open('credentials.json') as f:
cred = json.load(f)
username = cred['username']
password = cred['password']
app_key = cred['app_key']
# Define the betfairlightweight client
trading = betfairlightweight.APIClient(username, password, app_key=app_key)
return trading
def process_runner_books(runner_books):
# Grab the prices for each runner from the win market
'''
This function processes the runner books and returns a DataFrame with the best back/lay prices + vol for each runner
:param runner_books:
:return:
'''
best_back_prices = [runner_book.ex.available_to_back[0]['price']
if runner_book.ex.available_to_back
else 1.01
for runner_book
in runner_books]
best_lay_prices = [runner_book.ex.available_to_lay[0]['price']
if runner_book.ex.available_to_lay
else 1000
for runner_book
in runner_books]
selection_ids = [runner_book.selection_id for runner_book in runner_books]
df = pd.DataFrame({
'selection_id': selection_ids,
'best_back_price': best_back_prices,
'best_lay_price':best_lay_prices
})
return df
# Function to process the runner catalogue to gather all selection names
def process_runner_catalogue(market_book: MarketBook):
runners_df = process_runner_books(market_book.runners)
for runner in market_book.runners:
runner_name = next((rd.name for rd in market_book.market_definition.runners if rd.selection_id == runner.selection_id), None)
# rstrip() removes any trailing white spaces
runners_df.loc[runner.selection_id, 'runner_name'] = runner_name.rstrip().title()
return runners_df
def load_model_ratings():
soccer_ratings = pd.read_csv('flumine_dataframe_upcoming_matches.csv')
soccer_markets = soccer_ratings['market_id'].unique().tolist()
return soccer_ratings, soccer_markets
class SoccerModel(BaseStrategy):
def __init__(self, context, *args, **kwargs):
super().__init__(*args, **kwargs)
self.context = context
def check_market_book(self, market: Market, market_book: MarketBook) -> bool:
# process_market_book only executed if this returns True
if market_book.status not in ["INPLAY","CLOSED"]:
return True
def process_market_book(self, market: Market, market_book: MarketBook) -> None:
try:
runners_df = process_runner_catalogue(market_book)
runners_df['market_id'] = market.market_id
market_df = pd.merge(runners_df,self.context['model_prices'],how="left",on=['market_id','runner_name'])
market_df['selection_id'] = market_df['selection_id'].astype(int)
market_df.set_index('selection_id',inplace=True)
if market.seconds_to_start < 300 and not market_book.inplay:
for runner in market_book.runners:
if market_df.loc[runner.selection_id, 'min_price'] == 'BACK':
if market_df.loc[runner.selection_id, 'min_price'] <= market_df.loc[runner.selection_id, 'best_back_price'] and market_df.loc[runner.selection_id, 'max_price'] >= market_df.loc[runner.selection_id, 'best_back_price']:
if runner.selection_id in market_df.index:
trade = Trade(
market_id=market_book.market_id,
selection_id=runner.selection_id,
handicap=runner.handicap,
strategy=self,
)
order = trade.create_order(
side="BACK",
order_type=LimitOrder(
price = market_df.loc[runner.selection_id, 'best_back_price'],
size = round(100/market_df.loc[runner.selection_id, 'best_back_price'],2),
persistence_type='LAPSE'
)
)
market.place_order(order)
if market_df.loc[runner.selection_id, 'min_price'] == 'LAY':
if market_df.loc[runner.selection_id, 'min_price'] <= market_df.loc[runner.selection_id, 'best_back_price'] and market_df.loc[runner.selection_id, 'max_price'] >= market_df.loc[runner.selection_id, 'best_back_price']:
if runner.selection_id in market_df.index:
trade = Trade(
market_id=market_book.market_id,
selection_id=runner.selection_id,
handicap=runner.handicap,
strategy=self,
)
order = trade.create_order(
side="LAY",
order_type=LimitOrder(
price = market_df.loc[runner.selection_id, 'best_back_price'],
size = round(100/market_df.loc[runner.selection_id, 'best_back_price'],2),
persistence_type='LAPSE'
)
)
market.place_order(order)
except AttributeError:
pass
class LiveLoggingControl(LoggingControl):
NAME = "BACKTEST_LOGGING_CONTROL"
def __init__(self, *args, **kwargs):
super(LiveLoggingControl, self).__init__(*args, **kwargs)
self._setup()
def _setup(self):
if os.path.exists("soccer_model.csv"):
logging.info("Results file exists")
else:
with open("soccer_model.csv", "w") as m:
csv_writer = csv.DictWriter(m, delimiter=",", fieldnames=FIELDNAMES)
csv_writer.writeheader()
def _process_cleared_orders_meta(self, event):
orders = event.event
with open("soccer_model.csv", "a") as m:
for order in orders:
if order.order_type.ORDER_TYPE == OrderTypes.LIMIT:
size = order.order_type.size
else:
size = order.order_type.liability
if order.order_type.ORDER_TYPE == OrderTypes.MARKET_ON_CLOSE:
price = None
else:
price = order.order_type.price
try:
order_data = {
"bet_id": order.bet_id,
"strategy_name": order.trade.strategy,
"market_id": order.market_id,
"selection_id": order.selection_id,
"trade_id": order.trade.id,
"date_time_placed": order.responses.date_time_placed,
"price": price,
"price_matched": order.average_price_matched,
"size": size,
"size_matched": order.size_matched,
"profit": 0 if not order.cleared_order else order.cleared_order.profit,
"side": order.side,
"elapsed_seconds_executable": order.elapsed_seconds_executable,
"order_status": order.status.value,
"market_note": order.trade.market_notes,
"trade_notes": order.trade.notes_str,
"order_notes": order.notes_str,
}
csv_writer = csv.DictWriter(m, delimiter=",", fieldnames=FIELDNAMES)
csv_writer.writerow(order_data)
except Exception as e:
logger.error(
"_process_cleared_orders_meta: %s" % e,
extra={"order": order, "error": e},
)
logger.info("Orders updated", extra={"order_count": len(orders)})
# Function that stops automation running at the end of the day
def terminate(
context: dict, flumine, today_only: bool = True, seconds_closed: int = 600
) -> None:
"""terminate framework if no markets
live today.
"""
markets = list(flumine.markets.markets.values())
markets_today = [
m
for m in markets
if m.market_start_datetime.date() == datetime.datetime.now().date()
and (
m.elapsed_seconds_closed is None
or (m.elapsed_seconds_closed and m.elapsed_seconds_closed < seconds_closed)
)
]
if today_only:
market_count = len(markets_today)
else:
market_count = len(markets)
if market_count == 0:
# logger.info("No more markets available, terminating framework")
flumine.handler_queue.put(TerminationEvent(flumine))
if __name__ == "__main__":
# Credentials to login and logging in
trading = bflw_trading()
client = clients.BetfairClient(trading, interactive_login=True, min_bet_validation = False)
soccer_ratings, soccer_markets = load_model_ratings()
framework = Flumine(client=client)
# Define the betting strategy
soccer_strategy = SoccerModel(
market_filter=streaming_market_filter(
market_ids=soccer_markets
),
max_order_exposure=100, # Max bet sizes of $100
max_trade_count=1, # Max of trade/bet attempt per selection
max_live_trade_count=1, # Max of 1 unmatched Bet per selection
max_selection_exposure=100,
context = {'model_prices':soccer_ratings}
)
framework.add_strategy(soccer_strategy)
framework.add_logging_control(LiveLoggingControl())
# Add the stopped to our framework
framework.add_worker(
BackgroundWorker(
framework,
terminate,
func_kwargs={"today_only": True, "seconds_closed": 1200},
interval=60,
start_delay=60,
)
)
Final Step
Running the below line of code will start the flumine instance and place real bets, so run this at your own risk
Conclusion
Hopefully this 3-part tutorial has been informative and insightful and provided lots of ideas for different strategies! Soccer/Football is the most popular sport across the Betfair Exchange and so there are numerous possibilities across different leagues to create a new strategy! If you have any questions, including about accessing historic pricing data, Australian and NZ customers can email us at automation@betfair.com.au!
Disclaimer
Note that whilst models and automated strategies are fun and rewarding to create, we can't promise that your model or betting strategy will be profitable, and we make no representations in relation to the code shared or information on this page. If you're using this code or implementing your own strategies, you do so entirely at your own risk and you are responsible for any winnings/losses incurred. Under no circumstances will Betfair be liable for any loss or damage you suffer.