
How To Build A Betfair Soccer Bot Part 2

This is a continuation of the tutorial - How To Build A Betfair Soccer Bot Part 1

"I've trained a model, but how can I use it to bet with?"

This is an example of what we predicted in the previous tutorial:

{'date_home':'2022-08-13',
'match_id':'221827',
'name_home':'Arsenal',
'name_away':'Leicester',
'goalsScored_home':'4',
'goalsScored_away':'2',
'halfTimeGoalsScored_home':'2',
'halfTimeGoalsScored_away':'0',
'home_0_x_away_0':'0.0615',
'home_0_x_away_1':'0.0573',
'home_0_x_away_2':'0.0256',
'home_0_x_away_3':'0.0092',
'home_0_x_away_4':'0.0016',
'home_0_x_away_5':'0.0004',
'home_0_x_away_6':'0.0003',
'home_0_x_away_7':'0',
'home_1_x_away_0':'0.1152',
'home_1_x_away_1':'0.1074',
'home_1_x_away_2':'0.0479',
'home_1_x_away_3':'0.0172',
'home_1_x_away_4':'0.003',
'home_1_x_away_5':'0.0008',
'home_1_x_away_6':'0.0006',
'home_1_x_away_7':'0',
'home_2_x_away_0':'0.1132',
'home_2_x_away_1':'0.1054',
'home_2_x_away_2':'0.0471',
'home_2_x_away_3':'0.0169',
'home_2_x_away_4':'0.0029',
'home_2_x_away_5':'0.0008',
'home_2_x_away_6':'0.0006',
'home_2_x_away_7':'0',
'home_3_x_away_0':'0.0545',
'home_3_x_away_1':'0.0508',
'home_3_x_away_2':'0.0227',
'home_3_x_away_3':'0.0081',
'home_3_x_away_4':'0.0014',
'home_3_x_away_5':'0.0004',
'home_3_x_away_6':'0.0003',
'home_3_x_away_7':'0',
'home_4_x_away_0':'0.0388',
'home_4_x_away_1':'0.0362',
'home_4_x_away_2':'0.0161',
'home_4_x_away_3':'0.0058',
'home_4_x_away_4':'0.001',
'home_4_x_away_5':'0.0003',
'home_4_x_away_6':'0.0002',
'home_4_x_away_7':'0',
'home_5_x_away_0':'0.0083',
'home_5_x_away_1':'0.0077',
'home_5_x_away_2':'0.0034',
'home_5_x_away_3':'0.0012',
'home_5_x_away_4':'0.0002',
'home_5_x_away_5':'0.0001',
'home_5_x_away_6':'0',
'home_5_x_away_7':'0',
'home_6_x_away_0':'0.0024',
'home_6_x_away_1':'0.0022',
'home_6_x_away_2':'0.001',
'home_6_x_away_3':'0.0004',
'home_6_x_away_4':'0.0001',
'home_6_x_away_5':'0',
'home_6_x_away_6':'0',
'home_6_x_away_7':'0',
'home_7_x_away_0':'0.0004',
'home_7_x_away_1':'0.0004',
'home_7_x_away_2':'0.0002',
'home_7_x_away_3':'0.0001',
'home_7_x_away_4':'0',
'home_7_x_away_5':'0',
'home_7_x_away_6':'0',
'home_7_x_away_7':'0',
'home_0_ht_x_away_0_ht':'0.311',
'home_0_ht_x_away_1_ht':'0.115',
'home_0_ht_x_away_2_ht':'0.0213',
'home_0_ht_x_away_3_ht':'0.0115',
'home_0_ht_x_away_4_ht':'0.0002',
'home_0_ht_x_away_5_ht':'0',
'home_1_ht_x_away_0_ht':'0.2309',
'home_1_ht_x_away_1_ht':'0.0854',
'home_1_ht_x_away_2_ht':'0.0158',
'home_1_ht_x_away_3_ht':'0.0085',
'home_1_ht_x_away_4_ht':'0.0002',
'home_1_ht_x_away_5_ht':'0',
'home_2_ht_x_away_0_ht':'0.1158',
'home_2_ht_x_away_1_ht':'0.0428',
'home_2_ht_x_away_2_ht':'0.0079',
'home_2_ht_x_away_3_ht':'0.0043',
'home_2_ht_x_away_4_ht':'0.0001',
'home_2_ht_x_away_5_ht':'0',
'home_3_ht_x_away_0_ht':'0.0174',
'home_3_ht_x_away_1_ht':'0.0064',
'home_3_ht_x_away_2_ht':'0.0012',
'home_3_ht_x_away_3_ht':'0.0006',
'home_3_ht_x_away_4_ht':'0',
'home_3_ht_x_away_5_ht':'0',
'home_4_ht_x_away_0_ht':'0.0023',
'home_4_ht_x_away_1_ht':'0.0009',
'home_4_ht_x_away_2_ht':'0.0002',
'home_4_ht_x_away_3_ht':'0.0001',
'home_4_ht_x_away_4_ht':'0',
'home_4_ht_x_away_5_ht':'0',
'home_5_ht_x_away_0_ht':'0.0001',
'home_5_ht_x_away_1_ht':'0',
'home_5_ht_x_away_2_ht':'0',
'home_5_ht_x_away_3_ht':'0',
'home_5_ht_x_away_4_ht':'0',
'home_5_ht_x_away_5_ht':'0',
}

While we can use these values directly for Correct Score markets, they aren't much use for other markets as they stand. We could work those out manually for upcoming matches, but to run a back test on two seasons' worth of data we certainly don't want to be doing this by hand for 748 matches. So let's run through how we turn this into something more useful!

Creating Rated Prices

Now that we've trained our model and calculated the probability for each unique scoreline possible for both full-time and half-time outcomes, we'll need to do some additional processing to be able to use these probabilities to bet into the Betfair markets. The next code block will take our model outputs and create rated prices for a selection of markets from the exchange.

This isn't an exhaustive list but it gives an idea of the type of calculation required to transition between modelling and betting.
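
Before the full code, a minimal sketch of the underlying idea may help: every market probability is just the sum of the scoreline probabilities that satisfy that market's condition, and the rated price is the reciprocal of that probability. The grid below is a made-up, heavily truncated example, and the names score_probs, market_probability and rated_price are hypothetical helpers for illustration only; they are not part of the tutorial's code.

```python
# Hypothetical, heavily truncated score grid (the real model output covers 0-7 goals per side).
# Keys are (home_goals, away_goals); values are probabilities that sum to 1.
score_probs = {
    (0, 0): 0.10, (0, 1): 0.10, (0, 2): 0.05,
    (1, 0): 0.20, (1, 1): 0.15, (1, 2): 0.05,
    (2, 0): 0.15, (2, 1): 0.15, (2, 2): 0.05,
}


def market_probability(grid, condition):
    """Sum the probability of every scoreline for which condition(home, away) is true."""
    return sum(p for (home, away), p in grid.items() if condition(home, away))


def rated_price(probability, min_price=1.01, max_price=1000):
    """Convert a probability to decimal odds, kept inside the exchange price range."""
    if probability <= 0:
        return max_price  # a zero probability gets the longest price available
    return min(max(round(1 / probability, 2), min_price), max_price)


# Column names follow the 'marketName_selectionName' convention used in this tutorial
probabilities = {
    'Match Odds_Home': market_probability(score_probs, lambda h, a: h > a),
    'Match Odds_The Draw': market_probability(score_probs, lambda h, a: h == a),
    'Match Odds_Away': market_probability(score_probs, lambda h, a: h < a),
    'Both Teams To Score?_Yes': market_probability(score_probs, lambda h, a: h > 0 and a > 0),
}

prices = {name: rated_price(p) for name, p in probabilities.items()}
for name, price in prices.items():
    print(name, price)
```

The full function below applies the same pattern with pandas, summing columns across the whole dataframe rather than one match at a time.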


The next block of code only needs pandas, imported as pd below.

Define our markets
import pandas as pd

def process_model_predictions(model):

    df=pd.read_csv(f'{model}_model_results.csv')

    # List of indices representing the number of goals for which we want to calculate the probability
    full_time_indices = range(8) # 0,1,2,3,4,5,6,7
    half_time_indices = range(6) # 0,1,2,3,4,5

    '''
    The below code will calculate probabilities for each individual market and selection for most markets related to goals scored/conceded
    The new column name takes the format that matches the exchange 'marketName_selectionName'
    '''

    # Both Teams To Score
    df['Both Teams To Score?_Yes'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if i > 0 and j > 0]].sum(axis=1)
    df['Both Teams To Score?_No'] = 1 - df['Both Teams To Score?_Yes']

    # Match Odds
    df['Match Odds_Home'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if i > j]].sum(axis=1)
    df['Match Odds_The Draw'] = df[[f'home_{i}_x_away_{i}' for i in full_time_indices]].sum(axis=1)
    df['Match Odds_Away'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if i < j]].sum(axis=1)

    # Draw No Bet - This is equivalent to match odds normalised after removal of the Draw probability (In the event of a draw, the market is voided)
    df['Draw No Bet_Home'] = df['Match Odds_Home'] / (1 - df['Match Odds_The Draw'])
    df['Draw No Bet_Away'] = df['Match Odds_Away'] / (1 - df['Match Odds_The Draw'])

    # Double Chance - This is a 2 winner market and has a market percentage of 200%
    df['Double Chance_Home Or Away'] = df['Match Odds_Home'] + df['Match Odds_Away']
    df['Double Chance_Home Or Draw'] = df['Match Odds_Home'] + df['Match Odds_The Draw']
    df['Double Chance_Draw Or Away'] = df['Match Odds_The Draw'] + df['Match Odds_Away']

    # Match Odds & Both Teams to Score
    df['Match Odds And Both Teams To Score_Home/Yes'] = df['Match Odds_Home'] - df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if i > j and j < 0.5]].sum(axis=1)
    df['Match Odds And Both Teams To Score_Home/No'] = df['Match Odds_Home'] - df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if i > j and j > 0.5]].sum(axis=1)
    df['Match Odds And Both Teams To Score_Draw/Yes'] = df['Match Odds_The Draw'] - df['home_0_x_away_0']
    df['Match Odds And Both Teams To Score_Draw/No'] = df['home_0_x_away_0']
    df['Match Odds And Both Teams To Score_Away/Yes'] = df['Match Odds_Away'] - df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if i < j and i < 0.5]].sum(axis=1)
    df['Match Odds And Both Teams To Score_Away/No'] = df['Match Odds_Away'] - df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if i < j and i > 0.5]].sum(axis=1)

    # Correct Score
    df['Correct Score_0 - 0'] = df['home_0_x_away_0']
    df['Correct Score_0 - 1'] = df['home_0_x_away_1']
    df['Correct Score_0 - 2'] = df['home_0_x_away_2']
    df['Correct Score_0 - 3'] = df['home_0_x_away_3']
    df['Correct Score_1 - 0'] = df['home_1_x_away_0']
    df['Correct Score_1 - 1'] = df['home_1_x_away_1']
    df['Correct Score_1 - 2'] = df['home_1_x_away_2']
    df['Correct Score_1 - 3'] = df['home_1_x_away_3']
    df['Correct Score_2 - 0'] = df['home_2_x_away_0']
    df['Correct Score_2 - 1'] = df['home_2_x_away_1']
    df['Correct Score_2 - 2'] = df['home_2_x_away_2']
    df['Correct Score_2 - 3'] = df['home_2_x_away_3']
    df['Correct Score_3 - 0'] = df['home_3_x_away_0']
    df['Correct Score_3 - 1'] = df['home_3_x_away_1'] 
    df['Correct Score_3 - 2'] = df['home_3_x_away_2']
    df['Correct Score_3 - 3'] = df['home_3_x_away_3']
    df['Correct Score_Any Other Home Win'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if i > 3 and i > j]].sum(axis=1)
    df['Correct Score_Any Other Draw'] = df[[f'home_{i}_x_away_{i}' for i in full_time_indices if i > 3]].sum(axis=1)
    df['Correct Score_Any Other Away Win'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if j > 3 and j > i]].sum(axis=1)

    # Over/Under 5.5 Goals
    df['Over/Under 5.5 Goals_Over 5.5 Goals'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if (i + j) > 5.5]].sum(axis=1)
    df['Over/Under 5.5 Goals_Under 5.5 Goals'] = 1 - df['Over/Under 5.5 Goals_Over 5.5 Goals']

    # Over/Under 4.5 Goals
    df['Over/Under 4.5 Goals_Over 4.5 Goals'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if (i + j) > 4.5]].sum(axis=1)
    df['Over/Under 4.5 Goals_Under 4.5 Goals'] = 1 - df['Over/Under 4.5 Goals_Over 4.5 Goals']

    # Over/Under 3.5 Goals
    df['Over/Under 3.5 Goals_Over 3.5 Goals'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if (i + j) > 3.5]].sum(axis=1)
    df['Over/Under 3.5 Goals_Under 3.5 Goals'] = 1 - df['Over/Under 3.5 Goals_Over 3.5 Goals']

    # Over/Under 2.5 Goals
    df['Over/Under 2.5 Goals_Over 2.5 Goals'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if (i + j) > 2.5]].sum(axis=1)
    df['Over/Under 2.5 Goals_Under 2.5 Goals'] = 1 - df['Over/Under 2.5 Goals_Over 2.5 Goals']

    # Over/Under 1.5 Goals
    df['Over/Under 1.5 Goals_Over 1.5 Goals'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if (i + j) > 1.5]].sum(axis=1)
    df['Over/Under 1.5 Goals_Under 1.5 Goals'] = 1 - df['Over/Under 1.5 Goals_Over 1.5 Goals']

    # Over/Under 0.5 Goals
    df['Over/Under 0.5 Goals_Under 0.5 Goals'] = df['home_0_x_away_0']
    df['Over/Under 0.5 Goals_Over 0.5 Goals'] = 1 - df['home_0_x_away_0']

    # Home Over/Under 0.5 Goals
    df['Home Over/Under 0.5 Goals_Over 0.5 Goals'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if i > 0.5]].sum(axis=1)
    df['Home Over/Under 0.5 Goals_Under 0.5 Goals'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if i < 0.5]].sum(axis=1)

    # Home Over/Under 1.5 Goals
    df['Home Over/Under 1.5 Goals_Over 1.5 Goals'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if i > 1.5]].sum(axis=1)
    df['Home Over/Under 1.5 Goals_Under 1.5 Goals'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if i < 1.5]].sum(axis=1)

    # Home Over/Under 2.5 Goals
    df['Home Over/Under 2.5 Goals_Over 2.5 Goals'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if i > 2.5]].sum(axis=1)
    df['Home Over/Under 2.5 Goals_Under 2.5 Goals'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if i < 2.5]].sum(axis=1)

    # Away Over/Under 0.5 Goals
    df['Away Over/Under 0.5 Goals_Over 0.5 Goals'] = df[[f'home_{i}_x_away_{j}' for j in full_time_indices for i in full_time_indices if j > 0.5]].sum(axis=1)
    df['Away Over/Under 0.5 Goals_Under 0.5 Goals'] = df[[f'home_{i}_x_away_{j}' for j in full_time_indices for i in full_time_indices if j < 0.5]].sum(axis=1)

    # Away Over/Under 1.5 Goals
    df['Away Over/Under 1.5 Goals_Over 1.5 Goals'] = df[[f'home_{i}_x_away_{j}' for j in full_time_indices for i in full_time_indices if j > 1.5]].sum(axis=1)
    df['Away Over/Under 1.5 Goals_Under 1.5 Goals'] = df[[f'home_{i}_x_away_{j}' for j in full_time_indices for i in full_time_indices if j < 1.5]].sum(axis=1)

    # Away Over/Under 2.5 Goals
    df['Away Over/Under 2.5 Goals_Over 2.5 Goals'] = df[[f'home_{i}_x_away_{j}' for j in full_time_indices for i in full_time_indices if j > 2.5]].sum(axis=1)
    df['Away Over/Under 2.5 Goals_Under 2.5 Goals'] = df[[f'home_{i}_x_away_{j}' for j in full_time_indices for i in full_time_indices if j < 2.5]].sum(axis=1)

    # Match Odds & O/U 2.5 Goals
    df['Match Odds And Over/Under 2.5 Goals_Home/Over 2.5 Goals'] = df['Match Odds_Home'] - df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if (i + j) < 2.5 and i > j]].sum(axis=1)
    df['Match Odds And Over/Under 2.5 Goals_Home/Under 2.5 Goals'] = df['Match Odds_Home'] - df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if (i + j) > 2.5 and i > j]].sum(axis=1)
    df['Match Odds And Over/Under 2.5 Goals_Draw/Over 2.5 Goals'] = df['Match Odds_The Draw']  - df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if (i + j) < 2.5 and i == j]].sum(axis=1)
    df['Match Odds And Over/Under 2.5 Goals_Draw/Under 2.5 Goals'] = df['Match Odds_The Draw']  - df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if (i + j) > 2.5 and i == j]].sum(axis=1)
    df['Match Odds And Over/Under 2.5 Goals_Away/Over 2.5 Goals'] = df['Match Odds_Away'] - df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if (i + j) < 2.5 and i < j]].sum(axis=1)
    df['Match Odds And Over/Under 2.5 Goals_Away/Under 2.5 Goals'] = df['Match Odds_Away'] - df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if (i + j) > 2.5 and i < j]].sum(axis=1)

    # Match Odds & O/U 3.5 Goals
    df['Match Odds And Over/Under 3.5 Goals_Home/Over 3.5 Goals'] = df['Match Odds_Home'] - df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if (i + j) < 3.5 and i > j]].sum(axis=1)
    df['Match Odds And Over/Under 3.5 Goals_Home/Under 3.5 Goals'] = df['Match Odds_Home'] - df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if (i + j) > 3.5 and i > j]].sum(axis=1)
    df['Match Odds And Over/Under 3.5 Goals_Draw/Over 3.5 Goals'] = df['Match Odds_The Draw']  - df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if (i + j) < 3.5 and i == j]].sum(axis=1)
    df['Match Odds And Over/Under 3.5 Goals_Draw/Under 3.5 Goals'] = df['Match Odds_The Draw']  - df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if (i + j) > 3.5 and i == j]].sum(axis=1)
    df['Match Odds And Over/Under 3.5 Goals_Away/Over 3.5 Goals'] = df['Match Odds_Away'] - df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if (i + j) < 3.5 and i < j]].sum(axis=1)
    df['Match Odds And Over/Under 3.5 Goals_Away/Under 3.5 Goals'] = df['Match Odds_Away'] - df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if (i + j) > 3.5 and i < j]].sum(axis=1)

    # Home Win To Nil
    df['Home Win To Nil_Yes'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if i > j and j == 0]].sum(axis=1)
    df['Home Win To Nil_No'] = 1 - df['Home Win To Nil_Yes']

    # Away Win To Nil
    df['Away Win To Nil_Yes'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if j > i and i == 0]].sum(axis=1)
    df['Away Win To Nil_No'] = 1 - df['Away Win To Nil_Yes']

    # Home +1
    df['Home +1_Home +1'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if i > j - 1]].sum(axis=1)
    df['Home +1_Draw'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if i == j - 1]].sum(axis=1)
    df['Home +1_Away -1'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if i < j - 1]].sum(axis=1)

    # Away +1
    df['Away +1_Away +1'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if j > i - 1]].sum(axis=1)
    df['Away +1_Draw'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if j == i - 1]].sum(axis=1)
    df['Away +1_Home -1'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if j < i - 1]].sum(axis=1)

    # Home +2
    df['Home +2_Home +2'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if i > j - 2]].sum(axis=1)
    df['Home +2_Draw'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if i == j - 2]].sum(axis=1)
    df['Home +2_Away -2'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if i < j - 2]].sum(axis=1)

    # Away +2
    df['Away +2_Away +2'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if j > i - 2]].sum(axis=1)
    df['Away +2_Draw'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if j == i - 2]].sum(axis=1)
    df['Away +2_Home -2'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if j < i - 2]].sum(axis=1)

    # Home +3
    df['Home +3_Home +3'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if i > j - 3]].sum(axis=1)
    df['Home +3_Draw'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if i == j - 3]].sum(axis=1)
    df['Home +3_Away -3'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if i < j - 3]].sum(axis=1)

    # Away +3
    df['Away +3_Away +3'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if j > i - 3]].sum(axis=1)
    df['Away +3_Draw'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if j == i - 3]].sum(axis=1)
    df['Away +3_Home -3'] = df[[f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices if j < i - 3]].sum(axis=1)

    # Half Time Result
    df['Half Time_Home'] = df[[f'home_{i}_ht_x_away_{j}_ht' for i in half_time_indices for j in half_time_indices if i > j]].sum(axis=1)
    df['Half Time_The Draw'] = df[[f'home_{i}_ht_x_away_{i}_ht' for i in half_time_indices]].sum(axis=1)
    df['Half Time_Away'] = df[[f'home_{i}_ht_x_away_{j}_ht' for i in half_time_indices for j in half_time_indices if i < j]].sum(axis=1)

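    # Half Time / Full Time - approximated as the product of the half-time and full-time result probabilities (this treats the two results as independent)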
    df['Half Time/Full Time_Home/Home'] = df['Match Odds_Home'] * df['Half Time_Home']
    df['Half Time/Full Time_Home/Draw'] = df['Match Odds_The Draw'] * df['Half Time_Home']
    df['Half Time/Full Time_Home/Away'] = df['Match Odds_Away'] * df['Half Time_Home']
    df['Half Time/Full Time_Draw/Home'] = df['Match Odds_Home'] * df['Half Time_The Draw']
    df['Half Time/Full Time_Draw/Draw'] = df['Match Odds_The Draw'] * df['Half Time_The Draw']
    df['Half Time/Full Time_Draw/Away'] = df['Match Odds_Away'] * df['Half Time_The Draw']
    df['Half Time/Full Time_Away/Home'] = df['Match Odds_Home'] * df['Half Time_Away']
    df['Half Time/Full Time_Away/Draw'] = df['Match Odds_The Draw'] * df['Half Time_Away']
    df['Half Time/Full Time_Away/Away'] = df['Match Odds_Away'] * df['Half Time_Away']

    # First Half Over/Under 2.5 Goals
    df['First Half Goals 2.5_Over 2.5 Goals'] = df[[f'home_{i}_ht_x_away_{j}_ht' for i in half_time_indices for j in half_time_indices if (i + j) > 2.5]].sum(axis=1)
    df['First Half Goals 2.5_Under 2.5 Goals'] = 1 - df['First Half Goals 2.5_Over 2.5 Goals']

    # First Half Over/Under 1.5 Goals
    df['First Half Goals 1.5_Over 1.5 Goals'] = df[[f'home_{i}_ht_x_away_{j}_ht' for i in half_time_indices for j in half_time_indices if (i + j) > 1.5]].sum(axis=1)
    df['First Half Goals 1.5_Under 1.5 Goals'] = 1 - df['First Half Goals 1.5_Over 1.5 Goals']

    # First Half Over/Under 0.5 Goals
    df['First Half Goals 0.5_Under 0.5 Goals'] = df['home_0_ht_x_away_0_ht']
    df['First Half Goals 0.5_Over 0.5 Goals'] = 1 - df['home_0_ht_x_away_0_ht']

    # Half Time Score
    df['Half Time Score_0 - 0'] = df['home_0_ht_x_away_0_ht']
    df['Half Time Score_0 - 1'] = df['home_0_ht_x_away_1_ht']
    df['Half Time Score_0 - 2'] = df['home_0_ht_x_away_2_ht']
    df['Half Time Score_1 - 0'] = df['home_1_ht_x_away_0_ht']
    df['Half Time Score_1 - 1'] = df['home_1_ht_x_away_1_ht']
    df['Half Time Score_1 - 2'] = df['home_1_ht_x_away_2_ht']
    df['Half Time Score_2 - 0'] = df['home_2_ht_x_away_0_ht']
    df['Half Time Score_2 - 1'] = df['home_2_ht_x_away_1_ht']
    df['Half Time Score_2 - 2'] = df['home_2_ht_x_away_2_ht']
    df['Half Time Score_Any Unquoted'] = df[[f'home_{i}_ht_x_away_{j}_ht' for i in half_time_indices for j in half_time_indices if i > 2 or j > 2]].sum(axis=1)

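    # Reshape the dataframe to keep only required columns for our simulations
    # Raw scoreline probability columns from Part 1 (assumed to be what new_column_names refers to); no longer needed once the markets are derived
    new_column_names = [f'home_{i}_x_away_{j}' for i in full_time_indices for j in full_time_indices] + [f'home_{i}_ht_x_away_{j}_ht' for i in half_time_indices for j in half_time_indices]
    df = df.drop(columns=new_column_names)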
    df = df.drop(columns=['match_id','goalsScored_home','goalsScored_away','halfTimeGoalsScored_home','halfTimeGoalsScored_away'])
    df = df.rename(columns={'date_home':'event_date'})
    df.insert(3,'fixture',df['name_home'].fillna('').astype(str) + ' v ' + df['name_away'].fillna('').astype(str))

    # Create a rated price for each column, ensuring that each rating is in the range valid for the exchange (1.01-1000)
    for col in df.columns:
        if col not in ['event_date','name_home','name_away','fixture']:
            # Avoid division by zero by replacing zeroes with 0.001 before taking the reciprocal
            df[col] = 1 / df[col].replace(0, 0.001)
            df[col] = df[col].round(2) # Round to 2 decimal places
            df[col] = df[col].clip(lower=1.01, upper=1000) # Restrict rated prices to exchange min/max prices

    '''
    The below code block is to transform the dataframe from one row per match to having one row per market/selection

    Pre-Transformation Shape: 144 columns x 727 rows
    Post-Transformation Shape: 5 columns x 92,202 rows
    '''
    # Initialize an empty list to store the sub-DataFrames
    sub_dfs = []

    # Get the list of columns excluding the first four
    columns = df.columns[4:]

    # Loop through each column
    for col in columns:
        # Create a sub-DataFrame with the first four columns and the current column
        sub_df = df.iloc[:, :4].copy()
        sub_df['rated_price'] = df[col]
        # Extract market_name and runner_name from the column name
        market_name, runner_name = col.split('_', 1)
        sub_df['market_name'] = market_name
        sub_df['runner_name'] = runner_name

        # Replace 'Home' and 'Away' with the team names, except in 'Any Other ...' (Correct Score) and '... Or ...' (Double Chance) names, which keep the generic wording
        def replace_names(value, home_name, away_name):
            if 'Any Other' not in value and ' Or ' not in value:
                value = value.replace('Home', home_name).replace('Away', away_name)
            return value

        sub_df['market_name'] = sub_df.apply(lambda row: replace_names(row['market_name'], row['name_home'], row['name_away']), axis=1)
        sub_df['runner_name'] = sub_df.apply(lambda row: replace_names(row['runner_name'], row['name_home'], row['name_away']), axis=1)

        # Add the sub-DataFrame to the list
        sub_dfs.append(sub_df)

    # Concatenate all sub-DataFrames
    final_df = pd.concat(sub_dfs, ignore_index=True)
    final_df = final_df[['event_date','fixture','market_name','runner_name','rated_price']]

    # Write our predictions to a csv file for our simulations
    final_df.to_csv(f'{model}_model_results_processed.csv',index=False)

models = ['ensemble','LGBMClassifier','KNeighborsClassifier','RandomForestClassifier','LogisticsRegression','GradientBoostingClassifier']

for model in models:

    process_model_predictions(model)
With this, we've applied the models to our test set (on which we'll run our simulations) and generated rated prices for a (non-exhaustive) list of popular markets offered on English Premier League matches.

Player markets are not modelled here due to the nature of the dataset, but this method could be applied to Cards, Corners and Shots as these fields are all present in our training data.

There is a subset of markets that are only loaded in-play if required (e.g. Over/Under 10.5 Goals), and these have been excluded. Markets like Over/Under 8.5 Goals exist but usually contain very little liquidity unless the goal count approaches that value (i.e. scores are unusually high and most volume will come in-play).

Now our model outputs for Arsenal v Leicester look like:

event_date,fixture,market_name,runner_name,rated_price
13/08/2022,Arsenal v Leicester,Both Teams To Score?,Yes,1.96
13/08/2022,Arsenal v Leicester,Both Teams To Score?,No,2.05
13/08/2022,Arsenal v Leicester,Match Odds,Arsenal,1.7
13/08/2022,Arsenal v Leicester,Match Odds,The Draw,4.44
13/08/2022,Arsenal v Leicester,Match Odds,Leicester,5.32
13/08/2022,Arsenal v Leicester,Draw No Bet,Arsenal,1.32
13/08/2022,Arsenal v Leicester,Draw No Bet,Leicester,4.12
13/08/2022,Arsenal v Leicester,Double Chance,Home Or Away,1.29
13/08/2022,Arsenal v Leicester,Double Chance,Home Or Draw,1.23
13/08/2022,Arsenal v Leicester,Double Chance,Draw Or Away,2.42
13/08/2022,Arsenal v Leicester,Match Odds And Both Teams To Score,Arsenal/Yes,3.33
13/08/2022,Arsenal v Leicester,Match Odds And Both Teams To Score,Arsenal/No,3.49
13/08/2022,Arsenal v Leicester,Match Odds And Both Teams To Score,Draw/Yes,8.69
13/08/2022,Arsenal v Leicester,Match Odds And Both Teams To Score,Draw/No,9.08
13/08/2022,Arsenal v Leicester,Match Odds And Both Teams To Score,Leicester/Yes,10.41
13/08/2022,Arsenal v Leicester,Match Odds And Both Teams To Score,Leicester/No,10.88
13/08/2022,Arsenal v Leicester,Correct Score,0 - 0,16.25
13/08/2022,Arsenal v Leicester,Correct Score,0 - 1,17.44
13/08/2022,Arsenal v Leicester,Correct Score,0 - 2,39.08
13/08/2022,Arsenal v Leicester,Correct Score,0 - 3,108.73
13/08/2022,Arsenal v Leicester,Correct Score,1 - 0,8.68
13/08/2022,Arsenal v Leicester,Correct Score,1 - 1,9.31
13/08/2022,Arsenal v Leicester,Correct Score,1 - 2,20.87
13/08/2022,Arsenal v Leicester,Correct Score,1 - 3,58.06
13/08/2022,Arsenal v Leicester,Correct Score,2 - 0,8.84
13/08/2022,Arsenal v Leicester,Correct Score,2 - 1,9.48
13/08/2022,Arsenal v Leicester,Correct Score,2 - 2,21.25
13/08/2022,Arsenal v Leicester,Correct Score,2 - 3,59.13
13/08/2022,Arsenal v Leicester,Correct Score,3 - 0,18.34
13/08/2022,Arsenal v Leicester,Correct Score,3 - 1,19.69
13/08/2022,Arsenal v Leicester,Correct Score,3 - 2,44.11
13/08/2022,Arsenal v Leicester,Correct Score,3 - 3,122.74
13/08/2022,Arsenal v Leicester,Correct Score,Any Other Home Win,8
13/08/2022,Arsenal v Leicester,Correct Score,Any Other Draw,928.52
13/08/2022,Arsenal v Leicester,Correct Score,Any Other Away Win,72.27
13/08/2022,Arsenal v Leicester,Over/Under 5.5 Goals,Over 5.5 Goals,16.73
13/08/2022,Arsenal v Leicester,Over/Under 5.5 Goals,Under 5.5 Goals,1.06
13/08/2022,Arsenal v Leicester,Over/Under 4.5 Goals,Over 4.5 Goals,6.79
13/08/2022,Arsenal v Leicester,Over/Under 4.5 Goals,Under 4.5 Goals,1.17
13/08/2022,Arsenal v Leicester,Over/Under 3.5 Goals,Over 3.5 Goals,3.3
13/08/2022,Arsenal v Leicester,Over/Under 3.5 Goals,Under 3.5 Goals,1.43
13/08/2022,Arsenal v Leicester,Over/Under 2.5 Goals,Over 2.5 Goals,1.92
13/08/2022,Arsenal v Leicester,Over/Under 2.5 Goals,Under 2.5 Goals,2.08
13/08/2022,Arsenal v Leicester,Over/Under 1.5 Goals,Over 1.5 Goals,1.31
13/08/2022,Arsenal v Leicester,Over/Under 1.5 Goals,Under 1.5 Goals,4.27
13/08/2022,Arsenal v Leicester,Over/Under 0.5 Goals,Under 0.5 Goals,16.25
13/08/2022,Arsenal v Leicester,Over/Under 0.5 Goals,Over 0.5 Goals,1.07
13/08/2022,Arsenal v Leicester,Arsenal Over/Under 0.5 Goals,Over 0.5 Goals,1.18
13/08/2022,Arsenal v Leicester,Arsenal Over/Under 0.5 Goals,Under 0.5 Goals,6.41
13/08/2022,Arsenal v Leicester,Arsenal Over/Under 1.5 Goals,Over 1.5 Goals,1.81
13/08/2022,Arsenal v Leicester,Arsenal Over/Under 1.5 Goals,Under 1.5 Goals,2.23
13/08/2022,Arsenal v Leicester,Arsenal Over/Under 2.5 Goals,Over 2.5 Goals,3.78
13/08/2022,Arsenal v Leicester,Arsenal Over/Under 2.5 Goals,Under 2.5 Goals,1.36
13/08/2022,Arsenal v Leicester,Leicester Over/Under 0.5 Goals,Over 0.5 Goals,1.65
13/08/2022,Arsenal v Leicester,Leicester Over/Under 0.5 Goals,Under 0.5 Goals,2.54
13/08/2022,Arsenal v Leicester,Leicester Over/Under 1.5 Goals,Over 1.5 Goals,4.2
13/08/2022,Arsenal v Leicester,Leicester Over/Under 1.5 Goals,Under 1.5 Goals,1.31
13/08/2022,Arsenal v Leicester,Leicester Over/Under 2.5 Goals,Over 2.5 Goals,13.48
13/08/2022,Arsenal v Leicester,Leicester Over/Under 2.5 Goals,Under 2.5 Goals,1.08
13/08/2022,Arsenal v Leicester,Match Odds And Over/Under 2.5 Goals,Arsenal/Over 2.5 Goals,3.28
13/08/2022,Arsenal v Leicester,Match Odds And Over/Under 2.5 Goals,Arsenal/Under 2.5 Goals,3.55
13/08/2022,Arsenal v Leicester,Match Odds And Over/Under 2.5 Goals,Draw/Over 2.5 Goals,8.54
13/08/2022,Arsenal v Leicester,Match Odds And Over/Under 2.5 Goals,Draw/Under 2.5 Goals,9.25
13/08/2022,Arsenal v Leicester,Match Odds And Over/Under 2.5 Goals,Leicester/Over 2.5 Goals,10.23
13/08/2022,Arsenal v Leicester,Match Odds And Over/Under 2.5 Goals,Leicester/Under 2.5 Goals,11.07
13/08/2022,Arsenal v Leicester,Match Odds And Over/Under 3.5 Goals,Arsenal/Over 3.5 Goals,5.63
13/08/2022,Arsenal v Leicester,Match Odds And Over/Under 3.5 Goals,Arsenal/Under 3.5 Goals,2.44
13/08/2022,Arsenal v Leicester,Match Odds And Over/Under 3.5 Goals,Draw/Over 3.5 Goals,14.67
13/08/2022,Arsenal v Leicester,Match Odds And Over/Under 3.5 Goals,Draw/Under 3.5 Goals,6.37
13/08/2022,Arsenal v Leicester,Match Odds And Over/Under 3.5 Goals,Leicester/Over 3.5 Goals,17.57
13/08/2022,Arsenal v Leicester,Match Odds And Over/Under 3.5 Goals,Leicester/Under 3.5 Goals,7.63
13/08/2022,Arsenal v Leicester,Arsenal Win To Nil,Yes,3
13/08/2022,Arsenal v Leicester,Arsenal Win To Nil,No,1.5
13/08/2022,Arsenal v Leicester,Leicester Win To Nil,Yes,10.58
13/08/2022,Arsenal v Leicester,Leicester Win To Nil,No,1.1
13/08/2022,Arsenal v Leicester,Arsenal +1,Arsenal +1,1.23
13/08/2022,Arsenal v Leicester,Arsenal +1,Draw,8.07
13/08/2022,Arsenal v Leicester,Arsenal +1,Leicester -1,15.6
13/08/2022,Arsenal v Leicester,Leicester +1,Leicester +1,2.42
13/08/2022,Arsenal v Leicester,Leicester +1,Draw,4.01
13/08/2022,Arsenal v Leicester,Leicester +1,Arsenal -1,2.96
13/08/2022,Arsenal v Leicester,Arsenal +2,Arsenal +2,1.07
13/08/2022,Arsenal v Leicester,Arsenal +2,Draw,21.58
13/08/2022,Arsenal v Leicester,Arsenal +2,Leicester -2,56.26
13/08/2022,Arsenal v Leicester,Leicester +2,Leicester +2,1.51
13/08/2022,Arsenal v Leicester,Leicester +2,Draw,5.51
13/08/2022,Arsenal v Leicester,Leicester +2,Arsenal -2,6.41
13/08/2022,Arsenal v Leicester,Arsenal +3,Arsenal +3,1.02
13/08/2022,Arsenal v Leicester,Arsenal +3,Draw,75.29
13/08/2022,Arsenal v Leicester,Arsenal +3,Leicester -3,222.5
13/08/2022,Arsenal v Leicester,Leicester +3,Leicester +3,1.18
13/08/2022,Arsenal v Leicester,Leicester +3,Draw,10.58
13/08/2022,Arsenal v Leicester,Leicester +3,Arsenal -3,16.25
13/08/2022,Arsenal v Leicester,Half Time,Arsenal,2.39
13/08/2022,Arsenal v Leicester,Half Time,The Draw,2.47
13/08/2022,Arsenal v Leicester,Half Time,Leicester,5.65
13/08/2022,Arsenal v Leicester,Half Time/Full Time,Arsenal/Arsenal,4.08
13/08/2022,Arsenal v Leicester,Half Time/Full Time,Arsenal/Draw,10.62
13/08/2022,Arsenal v Leicester,Half Time/Full Time,Arsenal/Leicester,12.72
13/08/2022,Arsenal v Leicester,Half Time/Full Time,Draw/Arsenal,4.21
13/08/2022,Arsenal v Leicester,Half Time/Full Time,Draw/Draw,10.97
13/08/2022,Arsenal v Leicester,Half Time/Full Time,Draw/Leicester,13.13
13/08/2022,Arsenal v Leicester,Half Time/Full Time,Leicester/Arsenal,9.63
13/08/2022,Arsenal v Leicester,Half Time/Full Time,Leicester/Draw,25.1
13/08/2022,Arsenal v Leicester,Half Time/Full Time,Leicester/Leicester,30.07
13/08/2022,Arsenal v Leicester,First Half Goals 2.5,Over 2.5 Goals,8.29
13/08/2022,Arsenal v Leicester,First Half Goals 2.5,Under 2.5 Goals,1.14
13/08/2022,Arsenal v Leicester,First Half Goals 1.5,Over 1.5 Goals,2.91
13/08/2022,Arsenal v Leicester,First Half Goals 1.5,Under 1.5 Goals,1.52
13/08/2022,Arsenal v Leicester,First Half Goals 0.5,Under 0.5 Goals,3.22
13/08/2022,Arsenal v Leicester,First Half Goals 0.5,Over 0.5 Goals,1.45
13/08/2022,Arsenal v Leicester,Half Time Score,0 - 0,3.22
13/08/2022,Arsenal v Leicester,Half Time Score,0 - 1,8.7
13/08/2022,Arsenal v Leicester,Half Time Score,0 - 2,46.97
13/08/2022,Arsenal v Leicester,Half Time Score,1 - 0,4.33
13/08/2022,Arsenal v Leicester,Half Time Score,1 - 1,11.72
13/08/2022,Arsenal v Leicester,Half Time Score,1 - 2,63.26
13/08/2022,Arsenal v Leicester,Half Time Score,2 - 0,8.63
13/08/2022,Arsenal v Leicester,Half Time Score,2 - 1,23.36
13/08/2022,Arsenal v Leicester,Half Time Score,2 - 2,126.13
13/08/2022,Arsenal v Leicester,Half Time Score,Any Unquoted,18.5

Flumine Simulation

Our next step is to take our rated prices and run simulations against the Betfair Historical Stream Files. These files have a cost associated with them (Australia and New Zealand customers should reach out to us at automation@betfair.com.au to discuss options). We'll use Flumine's simulation mode for these tests because its simulated results tend to be accurate, and because it lets us test our string matching against the exchange markets, which is key to being able to place live bets.

It's important to note that the historic files contain neither cross-matching volume (also called virtual bets) nor information from the market catalogue, so live behaviour may not match the simulations exactly. Runner information is contained within market_book.market_definition.runners, so runner names are still available in the files.
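
As a rough illustration of reading runner names from the market definition, here is a minimal sketch. It uses `SimpleNamespace` objects as hypothetical stand-ins for `market_book.market_definition` so that it runs without any stream files. The selection ids are made up, and only the `selection_id` and `name` attributes (the ones used later in this tutorial) are assumed.

```python
from types import SimpleNamespace


def runner_names_by_id(market_definition):
    """Map each selection_id to a cleaned runner name."""
    return {
        # rstrip() drops trailing spaces, title() capitalises each word
        rd.selection_id: (rd.name or "").rstrip().title()
        for rd in market_definition.runners
    }


# Hypothetical stand-in for market_book.market_definition (ids are made up)
market_definition = SimpleNamespace(
    runners=[
        SimpleNamespace(selection_id=101, name="Arsenal "),
        SimpleNamespace(selection_id=102, name="Leicester"),
        SimpleNamespace(selection_id=103, name="The Draw"),
    ]
)

names = runner_names_by_id(market_definition)
print(names[101])  # Arsenal
```

Building the lookup once per market avoids searching the runner list again for every runner.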

Unzipping The Files

Flumine requires the tar files to be unzipped to run the simulations so we'll iterate over the tar files to extract only the markets we need before commencing the simulation.

We'll quickly process our tar files using our super fast Rust-driven tutorial: JSON to CSV Revisited

By doing so we've created a csv file where we can easily pick out the markets we want to use. (You can skip this step and use Flumine's check_market_book method to filter out unwanted markets instead.)

Market Ids

Market ids in the stream files take the form "1.XXXXXXXXX"; in the provided csv file we have removed the "1.".

In Excel we do this using the formula: (marketId - 1) * 1000000000

We do this because Excel drops trailing zeros on decimal numbers, which would otherwise shorten any market id ending in zero. The equivalent Python line is provided below.

Remove '1.' from market_id in Python
df['MARKET_ID'] = df['MARKET_ID'].astype(str).str.replace(r'^1\.', '', regex=True).str.ljust(9, '0').str[:9]
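
To make the conversion concrete, here is a small standalone sketch of the same idea without pandas. The function name `to_csv_market_id` and the sample ids are hypothetical. It strips a leading "1." if present and pads the remainder back out to nine digits, so ids that lost a trailing zero are restored.

```python
def to_csv_market_id(market_id) -> str:
    """Convert '1.XXXXXXXXX' (or a value that lost trailing zeros) to the 9-digit form."""
    text = str(market_id)
    if text.startswith("1."):
        text = text[2:]
    # Pad with zeros on the right to restore any trailing zeros that were dropped
    return text.ljust(9, "0")[:9]


print(to_csv_market_id("1.201234567"))  # 201234567
print(to_csv_market_id(1.20123456))     # 201234560
print(to_csv_market_id("20123456"))     # 201234560
```
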
Unzip Historic Stream Files
import os
import glob
import shutil
import tarfile
import bz2
import pandas as pd

'''
Here we will specify the folder where we are storing our stream files and where we want to extract the files to
The code will check if the output folder exists, and will create one if it does not.
It will then check the folder and delete any files in the folder that are not tar or csv files
This deletion is to clear any previously extracted stream files and will be much faster than manually deleting them
'''

# Specify the directory where your stream files are stored and where you want to extract the files
source_folder = 'DIRECTORY OF DOWNLOADED FILES'
output_folder = 'DIRECTORY TO OUTPUT DECOMPRESSED STREAM FILES'

# Ensure the output folder exists
os.makedirs(output_folder, exist_ok=True)

# Get a list of all files in the directory
files = glob.glob(os.path.join(output_folder, '*'))

# Loop through the files and delete if they are not tar/csv files
for file in files:
    if not file.endswith('.tar') and not file.endswith('.csv'):
        try:
            os.remove(file)
            print(f'Deleted: {file}')
        except Exception as e:
            print(f'Error deleting {file}: {e}')

'''
The following section will iterate over all files in your folder, so the options to optimize are:

    - Move the files you don't want to process to another folder
    - Copy the files you do want to process to a new folder
    - Use string matching to remove unwanted files from the tar_files list
    - Manually specify each file by typing out a list
'''

# Iterate over all .tar files in the source folder
tar_files = glob.glob(os.path.join(source_folder, '*.tar'))

def retrieve_betfair_markets():
    markets = pd.read_csv('EPL_Markets.csv',dtype={'MARKET_ID' : str})

    markets = markets['MARKET_ID'].tolist()

    return markets

markets = retrieve_betfair_markets()

'''
The next code block here will iterate over each stream file and check if the market_id is in our list of market ids
Only the markets in that list will be extracted to the folder.
'''

for tar_path in tar_files:

    with tarfile.open(tar_path, 'r') as tar:
        # Iterate over each file in the tar archive
        for member in tar.getmembers():
            if member.name.endswith('.bz2'):
                # Extract the .bz2 file to a temporary location
                extracted_bz2_path = os.path.join(output_folder, os.path.basename(member.name))

                # Determine the final output path by removing the .bz2 extension
                final_output_path = extracted_bz2_path[:-4]
                market_id = extracted_bz2_path[-13:-4]

                if market_id in markets:

                    try:
                        with tar.extractfile(member) as extracted_file, open(extracted_bz2_path, 'wb') as temp_bz2_file:
                            shutil.copyfileobj(extracted_file, temp_bz2_file)

                        # Extract the .bz2 file to the final destination
                        with bz2.BZ2File(extracted_bz2_path, 'rb') as bz2_file, open(final_output_path, 'wb') as output_file:
                            shutil.copyfileobj(bz2_file, output_file)

                        # Remove the temporary .bz2 file
                        os.remove(extracted_bz2_path)

                        print(f'Extracted {member.name} to {final_output_path}')
                    except OSError:
                        pass
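
As an alternative sketch (not the approach used above), the intermediate .bz2 file can be skipped by decompressing each tar member straight to its final location. The function name `extract_markets` is hypothetical, and the demo builds a tiny made-up tar archive in a temporary folder so that the snippet runs on its own. It assumes member names end in "1.XXXXXXXXX.bz2", as in the code above.

```python
import bz2
import io
import os
import shutil
import tarfile
import tempfile


def extract_markets(tar_path, output_folder, wanted_ids):
    """Decompress the wanted .bz2 members of a tar archive straight into output_folder."""
    extracted = []
    with tarfile.open(tar_path, "r") as tar:
        for member in tar.getmembers():
            name = os.path.basename(member.name)
            if not member.isfile() or not name.endswith(".bz2"):
                continue
            stem = name[:-4]                    # e.g. '1.201234567'
            market_id = stem.split(".", 1)[-1]  # e.g. '201234567'
            if market_id not in wanted_ids:
                continue
            out_path = os.path.join(output_folder, stem)
            # bz2.open accepts the file object returned by tar.extractfile
            with bz2.open(tar.extractfile(member), "rb") as src, open(out_path, "wb") as dst:
                shutil.copyfileobj(src, dst)
            extracted.append(out_path)
    return extracted


# Demo on a made-up archive containing two tiny "stream files"
with tempfile.TemporaryDirectory() as tmp:
    tar_path = os.path.join(tmp, "demo.tar")
    payload = bz2.compress(b'{"op":"mcm"}\n')
    with tarfile.open(tar_path, "w") as tar:
        for member_name in ("BASIC/1.201234567.bz2", "BASIC/1.209999999.bz2"):
            info = tarfile.TarInfo(member_name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))

    out_dir = os.path.join(tmp, "out")
    os.makedirs(out_dir)
    result = extract_markets(tar_path, out_dir, {"201234567"})
    result_names = [os.path.basename(p) for p in result]
    with open(result[0], "rb") as f:
        content = f.read()

print(result_names)  # ['1.201234567']
```

Passing the wanted ids as a set keeps the membership check fast when the list of markets is long.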

Running the Simulation

Now that we've unzipped the files, we can run the simulations on them.

Just a word of warning: This is not a fast process to run over two years of EPL data. It took my machine with 4 CPUs about 5 hours to pass over this data. For ~27,000 markets this equated to about 1.6 seconds per market, which is quite reasonable.

The advice here is to use the check_market_book function to skip markets that you're not interested in (e.g. OVER/UNDER_8.5_GOALS, SHOTS_ON_TARGET).
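
A minimal sketch of that kind of filter is shown below, written as a plain function so it can be tried without Flumine installed. The function name `should_process` and the entries in `UNWANTED_MARKET_TYPES` are illustrative assumptions; check the actual market_type strings in your own stream files before relying on them.

```python
# Illustrative guesses at market types to skip; verify against your own files
UNWANTED_MARKET_TYPES = {"OVER_UNDER_85", "SHOTS_ON_TARGET"}


def should_process(market_type, status, inplay, regulators,
                   unwanted=UNWANTED_MARKET_TYPES):
    """Return True only for open, pre-play markets that we want to bet on."""
    # Skip the ring-fenced exchanges
    if regulators != ["MR_INT"]:
        return False
    # Skip market types we have no rated prices for
    if market_type in unwanted:
        return False
    return status != "CLOSED" and not inplay


# Inside the strategy's check_market_book this could be called as:
#   return should_process(
#       market_book.market_definition.market_type,
#       market_book.status,
#       market_book.inplay,
#       market_book.market_definition.regulators,
#   )

print(should_process("MATCH_ODDS", "OPEN", False, ["MR_INT"]))       # True
print(should_process("SHOTS_ON_TARGET", "OPEN", False, ["MR_INT"]))  # False
```
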

There's a code snippet included after the simulation code that enables you to delete markets that you've already processed so the entire process doesn't need to be restarted if it crashes or needs to be paused.

Import modules and set-up logging
# Import libraries
import os
import time
import logging
import csv
import pandas as pd
from pythonjsonlogger import jsonlogger
from flumine import FlumineSimulation, BaseStrategy, clients
from flumine.order.trade import Trade
from flumine.order.order import LimitOrder
from flumine.order.ordertype import OrderTypes
from flumine.markets.market import Market
from flumine.controls.loggingcontrols import LoggingControl
from betfairlightweight.resources import MarketBook
from concurrent import futures
from flumine.utils import price_ticks_away
from collections import OrderedDict
from tqdm import tqdm
from multiprocessing import Lock

# Create a global lock
file_write_lock = Lock()

# Create custom logger
logger = logging.getLogger()

# Remove existing handlers to avoid duplicates
logger.handlers = []

# Set up JSON formatter for the log file
log_handler = logging.FileHandler(destination+'process_logs.log')
custom_format = "%(asctime)s %(levelname)s %(message)s"
formatter = jsonlogger.JsonFormatter(custom_format)
formatter.converter = time.gmtime  # Optional: Use UTC/GMT time for logs
log_handler.setFormatter(formatter)
logger.addHandler(log_handler)

# Ensure logs are only written to the file
logger.propagate = False

# Specify the folder where the unzipped stream files are stored
source_folder = output_folder

'''
This code file is designed to iterate over the folder with the previously unzipped stream files
and place simulated bets on selections at 10 minutes before kick-off

We will speed this process up using multiprocessing
'''

# Function to split our list into chunks
def split_list(lst, chunk_size):
    for i in range(0, len(lst), chunk_size):
        yield lst[i:i + chunk_size]

# Function to process the runner_book to gather each selection_id
def process_runner_books(runner_books):
    selection_ids = [runner_book.selection_id for runner_book in runner_books]

    df = pd.DataFrame({
        'selection_id': selection_ids,
    })
    return df.set_index('selection_id')

# Function to process the runner catalogue to gather all selection names
def process_runner_catalogue(market_book: MarketBook):

    runners_df = process_runner_books(market_book.runners)

    for runner in market_book.runners:
        runner_name = next((rd.name for rd in market_book.market_definition.runners if rd.selection_id == runner.selection_id), None)
        # Fall back to an empty string if the runner is missing from the market definition
        # rstrip() removes any trailing white spaces
        runners_df.loc[runner.selection_id, 'runner_name'] = (runner_name or '').rstrip().title()

    return runners_df

# Defining our flumine class
class SoccerSimulation(BaseStrategy):

    '''
    The __init__ function runs once, when the strategy is created
    We load our rated prices into a dataframe
    It is essential that we tie the dataframe to the class using self so other methods can access it
    We also define an empty list to track the selections we have already bet on
    '''
    def __init__(self, model, *args, **kwargs):
        super().__init__(*args, **kwargs)

        self.processed_selection_ids = []
        self._is_main_df_pruned = False
        self._pruned_soccer_df = None
        self.model = model

        # Read the model results CSV for the current model
        self.soccer_df = pd.read_csv(
            destination + f'{self.model}_model_results_processed.csv',
            parse_dates=['event_date'],
            dayfirst=False
        )

    def check_market_book(self, market: Market, market_book: MarketBook) -> bool:
        '''
        process_market_book is only executed if this returns True.
        If True is not returned then the framework will skip this market update
        '''
        # This check skips markets for the ring-fenced Italian and Spanish exchanges
        if market_book.market_definition.regulators != ["MR_INT"]:
            return False
        # Only process markets that are not closed and not yet in-play
        if market_book.status != "CLOSED" and market_book.inplay == False:
            return True
        return False

    def prune_soccer_df(self, market_name, event_date, fixture):
        """Filter the ratings dataframe down to the rows for this fixture, date and market."""
        event_date = pd.to_datetime(event_date)

        self._pruned_soccer_df = self.soccer_df[
            (self.soccer_df['fixture'] == fixture)
            & (self.soccer_df['event_date'] == event_date)
            & (self.soccer_df['market_name'] == market_name)
        ]

        return self._pruned_soccer_df

    def process_market_book(self, market: Market, market_book: MarketBook) -> None:

        # Create a dataframe with all the selection_ids and runner_names from the market
        runners_df = process_runner_catalogue(market_book)
        # Define the event name as the fixture name e.g. Aston Villa v Chelsea
        runners_df['fixture'] = market_book.market_definition.event_name
        # Define the market name and ensure each word is capitalised
        runners_df['market_name'] = market_book.market_definition.name.title()
        # Extract market_type
        runners_df['market_type'] = market_book.market_definition.market_type
        # Extract the date from the market_time
        runners_df['event_date'] = market_book.market_definition.market_time.date()
        # Ensure the event_date is datetime format
        runners_df['event_date'] = pd.to_datetime(runners_df['event_date'])
        # Preserve the index as a column before merging
        runners_df['selection_id'] = runners_df.index

        '''
        Flumine keeps track of the market_time as it moves through the market. When doing this merge,
        it will use the market_time at the point of the merge. There is some strange behaviour in the
        historic stream files where the market_time will change to a different date and then back
        to the original date. Until we have a pruned ratings dataframe that isn't empty, we retry the
        pruning (via the counter below and again on each later market update) so that we always
        have a rated price to bet with
        '''
        # Merge the market info with our ratings dataframe
        if not self._is_main_df_pruned:
            retry_count = 0
            max_retries = 10
            while retry_count < max_retries:
                # Attempt to prune the soccer dataframe
                self._pruned_soccer_df = self.prune_soccer_df(
                    market_name=market_book.market_definition.name.title(),
                    event_date=market_book.market_definition.market_time.date(),
                    fixture=market_book.market_definition.event_name,
                )

                # Check if the dataframe is not empty
                if not self._pruned_soccer_df.empty:
                    # If successful, mark the flag and break out of the loop
                    self._is_main_df_pruned = True
                    retry_count = max_retries
                else:
                    # Increment the retry counter and try again
                    retry_count += 1

        market_df = pd.merge(
            runners_df,
            self._pruned_soccer_df,
            how="left",
            on=["event_date", "fixture", "market_name", "runner_name"],
        )
        # Set the index as the selection_id for order placement
        market_df.set_index('selection_id',inplace=True)

        if round(market.seconds_to_start, 0) <= 600:

            # Loop over each runner in the market
            for runner in market_book.runners:
                # Check runner isn't scratched and that first layer of back/lay prices exists
                if runner.status == "ACTIVE" and len(runner.ex.available_to_back) > 0 and len(runner.ex.available_to_lay) > 0:
                    runner_name = market_df.loc[runner.selection_id, 'runner_name']
                    event_name = market_df.loc[runner.selection_id, 'fixture']
                    market_type = market_df.loc[runner.selection_id, 'market_type']
                    market_name = market_df.loc[runner.selection_id, 'market_name']
                    rated_price = float(market_df.loc[runner.selection_id, 'rated_price'] or 0)

                    if runner.selection_id not in self.processed_selection_ids and rated_price > 1:
                        # Create our ordered dictionary to store our order notes
                        notes = OrderedDict()
                        # Write our order notes - adding these notes is time-consuming but is helpful for troubleshooting string matching issues
                        notes["fixture"] = "fixture:" + str(event_name)
                        notes["market_type"] = "market_type:" + str(market_type)
                        notes["market"] = "market:" + str(market_name)
                        notes["runner_name"] = "selection:" + str(runner_name).replace(' - ','&')
                        notes["rated_price"] = "rated_price:" + str(rated_price)

                        trade = Trade(
                            market_id=market_book.market_id,
                            selection_id=runner.selection_id,
                            handicap=runner.handicap,
                            notes=notes,
                            strategy=self,
                        )
                        order = trade.create_order(
                            side="LAY",
                            order_type=LimitOrder(
                                price=runner.ex.available_to_lay[0]['price'],
                                size=round(100 / rated_price, 2),
                                persistence_type="LAPSE",
                            )
                        )
                        market.place_order(order)

                        order = trade.create_order(
                            side="BACK",
                            order_type=LimitOrder(
                                price=runner.ex.available_to_back[0]['price'],
                                size=round(100 / rated_price, 2),
                                persistence_type="LAPSE",
                            )
                        )
                        market.place_order(order)
                        # Add the selection to the list to ensure that we don't bet on it again.
                        self.processed_selection_ids.append(runner.selection_id)

# Fields we want to log in our simulations
FIELDNAMES = [
    "bet_id",
    "strategy_name",
    "market_id",
    "selection_id",
    "trade_id",
    "date_time_placed",
    "price",
    "price_matched",
    "size",
    "size_matched",
    "profit",
    "side",
    "elapsed_seconds_executable",
    "order_status",
    "market_note",
    "trade_notes",
    "order_notes",
]

# Class to define logging class and output results to csv files
class BacktestLoggingControl(LoggingControl):
    NAME = "BACKTEST_LOGGING_CONTROL"

    def __init__(self, model, *args, **kwargs):
        self.model = model
        super().__init__(*args, **kwargs)
        self._setup()

    def _setup(self):
        if os.path.exists(destination + f"soccer_simulation_{self.model}.csv"):
            logging.info("Results file exists")
        else:
            with open(destination + f"soccer_simulation_{self.model}.csv", "w") as m:
                csv_writer = csv.DictWriter(m, delimiter=",", fieldnames=FIELDNAMES)
                csv_writer.writeheader()

    def _process_cleared_orders_meta(self, event):
        orders = event.event
        file_path = destination + f"soccer_simulation_{self.model}.csv"

        # Locking around the file write operation
        with file_write_lock:
            with open(file_path, "a") as m:
                csv_writer = csv.DictWriter(m, delimiter=",", fieldnames=FIELDNAMES)
                for order in orders:
                    if order.order_type.ORDER_TYPE == OrderTypes.LIMIT:
                        size = order.order_type.size
                    else:
                        size = order.order_type.liability
                    price = None if order.order_type.ORDER_TYPE == OrderTypes.MARKET_ON_CLOSE else order.order_type.price

                    try:
                        order_data = {
                            "bet_id": order.bet_id,
                            "strategy_name": order.trade.strategy,
                            "market_id": order.market_id,
                            "selection_id": order.selection_id,
                            "trade_id": order.trade.id,
                            "date_time_placed": order.responses.date_time_placed,
                            "price": price,
                            "price_matched": order.average_price_matched,
                            "size": size,
                            "size_matched": order.size_matched,
                            "profit": order.simulated.profit,
                            "side": order.side,
                            "elapsed_seconds_executable": order.elapsed_seconds_executable,
                            "order_status": order.status.value,
                            "market_note": order.trade.market_notes,
                            "trade_notes": order.trade.notes_str,
                            "order_notes": order.notes_str,
                        }
                        csv_writer.writerow(order_data)
                    except Exception as e:
                        logger.error("_process_cleared_orders_meta: %s" % e, extra={"order": order, "error": e})

            logger.info("Orders updated", extra={"order_count": len(orders)})

def run_process(chunk,model):  
    try:
        # Set Flumine to simulation mode
        client = clients.SimulatedClient(min_bet_validation=False)
        framework = FlumineSimulation(client=client)

        # Set parameters for our strategy
        strategy = SoccerSimulation(
            market_filter={
                "markets": chunk,  
                "listener_kwargs": {"inplay":False,"seconds_to_start": 660},  
            },
            model = model,
            max_order_exposure=1000,
            max_selection_exposure=1000,
            max_live_trade_count=2,
            max_trade_count=2,
        )
        # Run our strategy on the simulated market
        framework.add_strategy(strategy)
        framework.add_logging_control(BacktestLoggingControl(model))
        framework.run()
    except Exception as e:
        logger.error(f"Error in run_process: {e}")

def process_csv_file(model):

    df = pd.read_csv(destination+f'soccer_simulation_{model}.csv')

    # Process market_id column
    df['market_id'] = df['market_id'].astype(str).str.replace('1.', '', regex=False).str.ljust(9, '0')

    # Split trade_notes column into five new columns
    df[['fixture','market_type','market','selection',f'{model}_rated_price']] = df['trade_notes'].str.split(',', expand=True)

    # Remove column names from the strings in each row
    df['fixture'] = df['fixture'].str.replace('fixture:', '', regex=False)
    df['market_type'] = df['market_type'].str.replace('market_type:', '', regex=False)
    df['market'] = df['market'].str.replace('market:', '', regex=False)
    df['selection'] = df['selection'].str.replace('selection:', '', regex=False)
    df[f'{model}_rated_price'] = df[f'{model}_rated_price'].str.replace('rated_price:', '', regex=False).astype(float)
    df[f'{model}_implied_value'] = 1/df['price_matched'] - 1/df[f'{model}_rated_price']

    # Modify the date parsing line to handle errors
    df['date_time_placed'] = pd.to_datetime(df['date_time_placed'], errors='coerce')
    df = df.dropna(subset=['date_time_placed'])
    df['date_time_placed'] = df['date_time_placed'].dt.tz_localize('UTC', ambiguous=False).dt.strftime('%d-%m-%Y %H:%M:%S')

    # Select relevant columns
    columns_to_keep = ['date_time_placed', 'fixture', 'market_id','market_type','market', 'selection_id', 'selection', 'side', f'{model}_rated_price', f'{model}_implied_value', 'price_matched', 'size', 'size_matched', 'profit']
    df = df[columns_to_keep]

    df.to_csv(destination+f'soccer_simulation_{model}_processed.csv', index=False)

def process_model(model):
    logging.info(f"Processing model: {model}")

    data_folder = source_folder
    data_files = [os.path.join(data_folder, path) for path in os.listdir(data_folder)]

    chunks = list(split_list(data_files, 1000))  # Split data into chunks of 1000 files

    # Iterate over each chunk
    for chunk_index, chunk in enumerate(tqdm(chunks, desc="Chunks Progress")):
        processes = 4  # Number of processes
        markets_per_process = 1  # Number of markets each process should handle

        # Create a list to store futures for tracking process completion
        _process_jobs = []

        with futures.ProcessPoolExecutor(max_workers=processes) as executor:
            for market_subset in split_list(chunk, markets_per_process):
                # Submit jobs to executor
                _process_jobs.append(executor.submit(run_process, market_subset, model))

            # Ensure each job completes and handle any potential errors
            for job in futures.as_completed(_process_jobs):
                try:
                    job.result()  # Trigger any exceptions raised
                except Exception as e:
                    logging.error(f"Error processing chunk {chunk_index+1} for model {model}: {e}")

        logging.info(f"Completed processing chunk {chunk_index+1} for model {model}")

    # Once done, process the final CSV
    process_csv_file(model)
    logging.info(f"Completed processing for model {model}")

if __name__ == '__main__':
    process_model('ensemble')
    # You can substitute any model here that you want to run e.g. 'LogisticsRegression'
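
To show what process_csv_file does to a single row, here is a small sketch without pandas. It assumes trade_notes is the comma-joined note values written by the strategy above (which is what the split on ',' implies). The function name `parse_trade_notes` is hypothetical, and the rated price and matched price are made-up numbers.

```python
def parse_trade_notes(notes_str):
    """Split 'key:value,key:value' trade notes into a dictionary."""
    parsed = {}
    for item in notes_str.split(","):
        # partition splits on the first ':' only, so values may contain colons
        key, _, value = item.partition(":")
        parsed[key] = value
    return parsed


# Made-up example of a trade_notes string for one order
notes = (
    "fixture:Arsenal v Leicester,market_type:MATCH_ODDS,"
    "market:Match Odds,selection:Arsenal,rated_price:1.62"
)
row = parse_trade_notes(notes)

rated_price = float(row["rated_price"])
price_matched = 1.70  # made-up matched price
# Same calculation as the implied_value column: matched probability minus rated probability
implied_value = 1 / price_matched - 1 / rated_price

print(row["selection"], round(implied_value, 4))  # Arsenal -0.029
```

Note that a fixture or selection name containing a comma would break this split, which is why the strategy replaces ' - ' in runner names and keeps the notes simple.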
Delete Processed Markets
'''
If the market processing is stopped for any reason, use the below code to delete any markets that have
already been processed so you don't have to re-process them
'''

# Load the CSV file and extract unique values from the 'market_id' column
csv_file = destination+'soccer_simulation_ensemble.csv' 
df = pd.read_csv(csv_file,dtype={'market_id':'str'})
market_ids = df['market_id'].unique().tolist()
# Define the folder to iterate over
folder_path = 'FOLDER WHERE THE DECOMPRESSED STREAM FILES ARE STORED'

# Iterate over files in the folder
for filename in os.listdir(folder_path):
    # Build the full path to the file
    file_path = os.path.join(folder_path, filename)
    # If the filename is not in the market_ids list, it still needs processing, so keep it
    if filename not in market_ids:
        print(f"{filename} has not been processed.")
    else:
        # Otherwise it has already been processed, so delete the file
        print(f"{filename} is in the list, deleting...")
        os.remove(file_path)

Conclusion

In this tutorial we've taken our previously trained model, processed the model outputs into rated prices for the exchange markets, unzipped some historical files and run simulations using our rated prices! In Part III we'll take our simulation output to decide what we can use to bet into future markets!

Disclaimer

Note that whilst models and automated strategies are fun and rewarding to create, we can't promise that your model or betting strategy will be profitable, and we make no representations in relation to the code shared or information on this page. If you're using this code or implementing your own strategies, you do so entirely at your own risk and you are responsible for any winnings/losses incurred. Under no circumstances will Betfair be liable for any loss or damage you suffer.