Skip to content

AFL Modelling Walkthrough

04. Weekly Predictions

Now that we have explored different algorithms for modelling, we can implement our chosen model and predict this week's AFL games! All you need to do is run the afl_modelling script each Thursday or Friday to predict the following week's games.

# Import Modules
from afl_feature_creation_v2 import prepare_afl_features
import afl_data_cleaning_v2
import afl_feature_creation_v2
import afl_modelling_v2
import datetime
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', None)
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
import warnings
warnings.filterwarnings('ignore')

Creating The Features For This Weekend's Games

To actually predict this weekend's games, we need to create the same features that we have created in the previous tutorials for the games that will be played this weekend. This includes all the rolling averages, efficiency features, elo features etc. So the majority of this tutorial will be using previously defined functions to create features for the following weekend's games.

Create Next Week's DataFrame

Let's first get our cleaned afl_data dataset, as well as the odds for next weekend and the 2018 fixture.

# Grab the cleaned AFL dataset and the column order
afl_data = afl_data_cleaning_v2.prepare_afl_data()
ordered_cols = afl_data.columns

# Define a function which grabs the odds for each game for the following weekend
def get_next_week_odds(path):
    # Get next week's odds
    next_week_odds = pd.read_csv(path)
    next_week_odds = next_week_odds.rename(columns={"team_1": "home_team", 
                                                "team_2": "away_team", 
                                                "team_1_odds": "odds", 
                                                "team_2_odds": "odds_away"
                                               })
    return next_week_odds

# Import the fixture
# Define a function which gets the fixture and cleans it up
def get_fixture(path):
    # Get the afl fixture
    fixture = pd.read_csv(path)

    # Replace team names and reformat
    fixture = fixture.replace({'Brisbane Lions': 'Brisbane', 'Footscray': 'Western Bulldogs'})
    fixture['Date'] = pd.to_datetime(fixture['Date']).dt.date.astype(str)
    fixture = fixture.rename(columns={"Home.Team": "home_team", "Away.Team": "away_team"})
    return fixture

next_week_odds = get_next_week_odds("data/weekly_odds.csv")
fixture = get_fixture("data/afl_fixture_2018.csv")
fixture.tail()
Date Season Season.Game Round home_team away_team Venue
202 2018-09-14 2018 1 26 Hawthorn Melbourne MCG
203 2018-09-15 2018 1 26 Collingwood GWS MCG
204 2018-09-21 2018 1 27 Richmond Collingwood MCG
205 2018-09-22 2018 1 27 West Coast Melbourne Optus Stadium
206 2018-09-29 2018 1 28 West Coast Collingwood MCG
next_week_odds
home_team away_team odds odds_away
0 West Coast Collingwood 2.34 1.75

Now that we have these DataFrames, we will define a function which combines the fixture and next week's odds to create a single DataFrame for the games over the next 7 days. To use this function we will need Game IDs for next week. So we will create another function which creates Game IDs by using the Game ID from the last game played and adding 1 to it.

# Define a function which creates game IDs for this week's footy games
def create_next_weeks_game_ids(afl_data):
    odds = get_next_week_odds("data/weekly_odds.csv")

    # Get last week's Game ID
    last_afl_data_game = afl_data['game'].iloc[-1]

    # Create Game IDs for next week
    game_ids = [(i+1) + last_afl_data_game for i in range(odds.shape[0])]
    return game_ids

# Define a function which creates this week's footy game DataFrame
def get_next_week_df(afl_data):
    # Get the fixture and the odds for next week's footy games
    fixture = get_fixture("data/afl_fixture_2018.csv")
    next_week_odds = get_next_week_odds("data/weekly_odds.csv")
    next_week_odds['game'] = create_next_weeks_game_ids(afl_data)

    # Get today's date and next week's date and create a DataFrame for next week's games
#     todays_date = datetime.datetime.today().strftime('%Y-%m-%d')

#     date_in_7_days = (datetime.datetime.today() + datetime.timedelta(days=7)).strftime('%Y-%m-%d')
    todays_date = '2018-09-27'
    date_in_7_days = '2018-10-04'
    fixture = fixture[(fixture['Date'] >= todays_date) & (fixture['Date'] < date_in_7_days)].drop(columns=['Season.Game'])
    next_week_df = pd.merge(fixture, next_week_odds, on=['home_team', 'away_team'])

    # Split the DataFrame onto two rows for each game
    h_df = (next_week_df[['Date', 'game', 'home_team', 'away_team', 'odds', 'Season', 'Round', 'Venue']]
               .rename(columns={'home_team': 'team', 'away_team': 'opponent'})
               .assign(home_game=1))

    a_df = (next_week_df[['Date', 'game', 'home_team', 'away_team', 'odds_away', 'Season', 'Round', 'Venue']]
                .rename(columns={'odds_away': 'odds', 'home_team': 'opponent', 'away_team': 'team'})
                .assign(home_game=0))

    next_week = a_df.append(h_df).sort_values(by='game').rename(columns={
        'Date': 'date',
        'Season': 'season',
        'Round': 'round',
        'Venue': 'venue'
    })
    next_week['date'] = pd.to_datetime(next_week.date)
    next_week['round'] = afl_data['round'].iloc[-1] + 1
    return next_week
next_week_df = get_next_week_df(afl_data)
game_ids_next_round = create_next_weeks_game_ids(afl_data)
next_week_df
date round season venue game home_game odds opponent team
0 2018-09-29 27 2018 MCG 15407 0 1.75 West Coast Collingwood
0 2018-09-29 27 2018 MCG 15407 1 2.34 Collingwood West Coast
fixture.tail()
Date Season Season.Game Round home_team away_team Venue
202 2018-09-14 2018 1 26 Hawthorn Melbourne MCG
203 2018-09-15 2018 1 26 Collingwood GWS MCG
204 2018-09-21 2018 1 27 Richmond Collingwood MCG
205 2018-09-22 2018 1 27 West Coast Melbourne Optus Stadium
206 2018-09-29 2018 1 28 West Coast Collingwood MCG

Create Each Feature

Now let's append next week's DataFrame to our afl_data, match_results and odds DataFrames and then create all the features we used in the AFL Feature Creation Tutorial. We need to append the games and then feed them into our function so that we can create features for upcoming games.

# Append next week's games to our afl_data DataFrame
afl_data = afl_data.append(next_week_df).reset_index(drop=True)

# Append next week's games to match results (we need to do this for our feature creation to run)
match_results = afl_data_cleaning_v2.get_cleaned_match_results().append(next_week_df)

# Append next week's games to odds
odds = (afl_data_cleaning_v2.get_cleaned_odds().pipe(lambda df: df.append(next_week_df[df.columns]))
       .reset_index(drop=True))
features_df = afl_feature_creation_v2.prepare_afl_features(afl_data=afl_data, match_results=match_results, odds=odds)
features_df.tail()
game home_team away_team date round venue season f_odds f_form_margin_btwn_teams f_form_past_5_btwn_teams f_odds_away f_elo_home f_elo_away f_I50_efficiency_home f_R50_efficiency_home f_I50_efficiency_away f_R50_efficiency_away f_AF_diff f_B_diff f_BO_diff f_CCL_diff f_CG_diff f_CL_diff f_CM_diff f_CP_diff f_D_diff f_ED_diff f_FA_diff f_FF_diff f_G_diff f_GA_diff f_GA1_diff f_HB_diff f_HO_diff f_I50_diff f_ITC_diff f_K_diff f_M_diff f_MG_diff f_MI5_diff f_One.Percenters_diff f_R50_diff f_SC_diff f_SCL_diff f_SI_diff f_T_diff f_T5_diff f_TO_diff f_UP_diff f_Unnamed: 0_diff f_behinds_diff f_goals_diff f_margin_diff f_opponent_behinds_diff f_opponent_goals_diff f_opponent_points_diff f_points_diff f_current_odds_prob f_current_odds_prob_away
1065 15397 Melbourne GWS 2018-08-26 23 M.C.G. 2018 1.966936 -23.2 2.0 1.813998 1523.456734 1609.444874 0.653525 0.680168 0.704767 0.749812 140.535514 0.605144 -9.771981 5.892176 7.172376 6.614609 -1.365211 30.766262 21.998618 0.067228 -1.404730 -3.166732 6.933998 6.675576 0.000000 38.708158 24.587333 12.008987 10.482382 -16.709540 -15.415060 289.188486 6.350287 -2.263536 -20.966818 50.388632 0.723637 15.537783 22.912269 2.065039 10.215523 -6.689429 3259.163465 -0.136383 3.553795 16.563721 -2.353514 1.162696 4.622664 21.186385 0.661551 0.340379
1066 15398 St Kilda North Melbourne 2018-08-26 23 Docklands 2018 5.089084 -3.2 2.0 2.577161 1397.237139 1499.366007 0.725980 0.655749 0.723949 0.677174 51.799992 3.399035 6.067393 -2.189489 -10.475859 1.154766 -8.883840 -21.810962 33.058382 40.618410 2.286314 -0.345734 -3.778445 -2.182673 0.000000 19.816372 -21.562916 2.678384 -14.777698 13.242010 12.065594 -82.381996 -2.176564 2.335825 -4.952336 45.719406 3.344217 -2.095613 -3.929084 -3.182381 -12.832197 57.226776 -20221.371526 1.968709 -1.897958 -15.177001 1.067099 0.781811 5.757963 -9.419038 0.284269 0.717566
1067 15404 Collingwood GWS 2018-09-15 25 M.C.G. 2018 1.882301 12.6 3.0 2.018344 1546.000498 1590.806454 0.693185 0.706222 0.718446 0.727961 205.916671 -1.642954 -2.980828 -0.266023 8.547225 -3.751909 -0.664977 10.563513 48.175985 43.531908 -5.836979 5.388668 4.395675 2.555152 0.000000 51.588962 11.558254 4.276481 11.284445 -3.412977 -2.206815 -234.577304 2.637758 -10.537765 -11.127876 125.607377 -3.485896 3.532031 15.102292 -2.500685 8.187543 38.053445 12500.525732 -1.006173 2.520135 18.634835 -2.159882 -0.393386 -4.520198 14.114637 0.608495 0.393856
1068 15406 West Coast Melbourne 2018-09-22 26 Perth Stadium 2018 2.013572 21.2 3.0 1.884148 1577.888606 1542.095154 0.688877 0.708941 0.649180 0.698319 -118.135184 -3.005709 2.453190 -5.103869 -14.368949 -12.245458 2.771411 -45.364271 -60.210182 -24.049523 -2.791277 6.115918 -5.041030 -5.335746 0.000000 -78.816902 -18.784547 -13.957754 -5.527613 18.606721 25.366778 -910.988860 -5.515812 -9.483590 8.914093 -131.380758 -7.142529 -49.484957 -13.718798 -4.862994 -9.834616 -23.673638 -3178.282073 -1.785349 -2.569957 -20.008787 0.476202 0.387915 2.803694 -17.205093 0.543774 0.457875
1069 15407 West Coast Collingwood 2018-09-29 27 MCG 2018 1.981832 17.2 3.0 1.838864 1591.348723 1562.924273 0.679011 0.724125 0.711352 0.709346 159.522670 0.893421 -0.475725 3.391070 -5.088751 5.875388 5.352234 7.729063 -7.358202 -4.719968 6.113565 4.822252 2.871241 2.690270 3.636364 -64.238180 -0.631102 2.078832 6.005613 56.879978 34.373271 1016.491933 1.199751 2.454685 12.197047 219.666562 2.484363 0.379162 2.566991 0.639666 2.258377 -23.841529 -368920.360240 -0.646160 0.892051 3.040850 1.589568 0.012622 1.665299 4.706148 0.427350 0.571429

Create Predictions For the Upcoming Round

Now that we have our features, we can use our model that we created in part 3 to predict the next round. First we need to filter our features_df into a training df and a df with next round's features/matches. Then we can use the model created in the last tutorial to create predictions. For simplicity, I have hardcoded the parameters we used in the last tutorial.

# Get the train df by only taking the games IDs which aren't in the next week df
train_df = features_df[~features_df.game.isin(next_week_df.game)]

# Get the result and merge to the feature_df
match_results = (pd.read_csv("data/afl_match_results.csv")
                    .rename(columns={'Game': 'game'})
                    .assign(result=lambda df: df.apply(lambda row: 1 if row['Home.Points'] > row['Away.Points'] else 0, axis=1)))

train_df = pd.merge(train_df,  match_results[['game', 'result']], on='game')

train_x = train_df.drop(columns=['result'])
train_y = train_df.result

next_round_x = features_df[features_df.game.isin(next_week_df.game)]
# Fit out logistic regression model - note that our predictions come out in the order of [away_team_prob, home_team_prob]

lr_best_params = {'C': 0.01,
 'class_weight': None,
 'dual': False,
 'fit_intercept': True,
 'intercept_scaling': 1,
 'max_iter': 100,
 'multi_class': 'ovr',
 'n_jobs': 1,
 'penalty': 'l2',
 'random_state': None,
 'solver': 'newton-cg',
 'tol': 0.0001,
 'verbose': 0,
 'warm_start': False}

feature_cols = [col for col in train_df if col.startswith('f_')]

# Scale features
scaler = StandardScaler()
train_x[feature_cols] = scaler.fit_transform(train_x[feature_cols])
next_round_x[feature_cols] = scaler.transform(next_round_x[feature_cols])

lr = LogisticRegression(**lr_best_params)
lr.fit(train_x[feature_cols], train_y)
prediction_probs = lr.predict_proba(next_round_x[feature_cols])

modelled_home_odds = [1/i[1] for i in prediction_probs]
modelled_away_odds = [1/i[0] for i in prediction_probs]
# Create a predictions df
preds_df = (next_round_x[['date', 'home_team', 'away_team', 'venue', 'game']].copy()
               .assign(modelled_home_odds=modelled_home_odds,
                      modelled_away_odds=modelled_away_odds)
               .pipe(pd.merge, next_week_odds, on=['home_team', 'away_team'])
               .pipe(pd.merge, features_df[['game', 'f_elo_home', 'f_elo_away']], on='game')
               .drop(columns='game')
           )
preds_df
date home_team away_team venue modelled_home_odds modelled_away_odds odds odds_away f_elo_home f_elo_away
0 2018-09-29 West Coast Collingwood MCG 2.326826 1.753679 2.34 1.75 1591.348723 1562.924273

Alternatively, if you want to generate predictions using a script which uses all the above code, just run the following:

print(afl_modelling_v2.create_predictions())
        date   home_team    away_team venue  modelled_home_odds  \
0 2018-09-29  West Coast  Collingwood   MCG            2.326826

   modelled_away_odds  odds  odds_away   f_elo_home   f_elo_away  
0            1.753679  2.34       1.75  1591.348723  1562.924273

Conclusion

Congratulations! You have created AFL predictions for this week. If you are beginner to this, don't be overwhelmed. The process gets easier each time you do it. And it is super rewarding. In future iterations we will update this tutorial to predict actual odds, and then integrate this with Betfair's API so that you can create an automated betting strategy using Machine Learning to create your predictions!