  • Publication
    Predicting Match Outcomes in Football by an Ordered Forest Estimator
    Predicting the outcome of football (i.e. soccer) games based on past information is a non-standard predictive task because of the nature of the game outcome, as well as because of the importance of uncertainty (luck and unobservables). The game outcome consists of the scores of the two teams, which are usually either collapsed into a goal difference or further aggregated to reflect whether the game ended as a win for the home or away team, or as a draw. From a statistical perspective, such outcomes have bounded support and, thus, standard linear modelling can be expected to perform poorly. The large amount of uncertainty in game outcomes, due to sheer luck or to game- or team-specific unobservables (e.g. hidden injuries of players), makes it imperative to use prediction methods that fully exploit the potential of the available information, as well as to reveal the uncertainty of a match outcome. The latter is also relevant when interest lies not only in single games but also in the league table at the end of the season. Such league tables should capture the uncertainty of the single games accumulated over a season to be useful guides on what to expect. Recently, machine learning methods have shown their power in all sorts of prediction problems, in particular in situations where the relation between the predictors and the prediction target, here the outcome of the game, is non-linear. However, so far there has been little development in gearing these methods explicitly towards the estimation of the probabilities of ordered outcomes, such as score differences and points, or just wins, draws, and losses. Lechner and Okasa (2019) propose adapting classical random forest estimation, which is known to have excellent predictive performance (e.g. Biau and Scornet (2016), Fernández-Delgado et al. (2014)), to the problem of predicting probabilities of ordered categorical outcomes, such as the win-draw-loss problem of a football game. In this chapter, we use their approach to predict game outcomes of the German Bundesliga 1 (BL1) based on more than ten years of data on game outcomes as well as extensive information about teams, their players, and their environment. These predictions are then used to obtain the final season rankings in a way that reflects and shows the magnitude of the inherent uncertainty of football games.
  • Publication
    Essays in Predictive and Causal Machine Learning
    (Universität St. Gallen, 2022-02-21)
    This dissertation consists of three chapters devoted to topics in predictive and causal machine learning. Common to all chapters is the synthesis of classical econometric methods and novel machine learning algorithms. Hence, this doctoral thesis provides new insights into applications of machine learning for predictive tasks and for causal inference. The first chapter investigates the estimation of heterogeneous causal effects using machine learning. We focus on the meta-learning framework, in which the estimation of the causal parameter is decomposed into separate prediction tasks. Using synthetic and empirical simulations, we study the finite sample performance of meta-learners based on the Random Forest algorithm under different implementations using sample-splitting and cross-fitting procedures. The results imply that sample-splitting is beneficial in large samples for bias reduction but leads to an increase in variance, whereas cross-fitting keeps the bias low and successfully restores the full sample size efficiency. In contrast, full-sample estimation is preferable in small samples when using machine learning. Additionally, we provide guidelines for applications of meta-learners in empirical studies depending on particular data characteristics such as treatment shares and sample size. The second chapter considers the estimation of ordered choice models using machine learning. As in the first chapter, we focus on the Random Forest algorithm and develop a new machine learning estimator for models with an ordered categorical outcome variable. The proposed Ordered Forest flexibly estimates the conditional ordered choice probabilities while taking the ordering information explicitly into account. 
In contrast to common machine learning estimators, it is not only suited for prediction tasks but also enables the estimation of marginal effects and statistical inference, which provides additional interpretability as in classical econometric estimators. We conduct an extensive simulation study and find good predictive performance, particularly in settings with nonlinearities and multicollinearity. Furthermore, we demonstrate the estimation of marginal effects and their standard errors in an empirical application. The third chapter presents an empirical application based on the estimation of causal effects using machine learning. As in the previous two chapters, we rely on the Random Forest method and consider its causal variant, the Modified Causal Forest. Following the rise of online dating, we study the effect of sport activity on partner choice by exploiting a unique dataset from an online dating platform. In particular, we estimate the causal effect of sport frequency on contact chances, controlling for a large set of observable user characteristics. We find that for male users, doing sport on a weekly basis increases the probability of receiving a first message by more than 50% compared with no sport activity. In contrast, we do not find such evidence for female users. Moreover, the results indicate heterogeneity: for male users, the effect increases with higher income.