Analysing Individuals’ Fertility Behaviour Using Machine Learning Techniques: An Application of Random Survival Forest to French Data

Isaure Delaporte, University of St Andrews
Hill Kulu, University of St Andrews

Multivariate survival analysis is the most widely used approach for analysing the multiple factors that shape people’s life outcomes. Yet these techniques have a number of limitations. Non-parametric methods such as survival trees and tree ensembles are a useful alternative to classical survival analysis. This paper illustrates the advantages of random survival forest (RSF) for studying the fertility dynamics of immigrants and their descendants. More specifically, we examine the probability of having a first, second and third birth among immigrants and their descendants in France, using a rich French survey, Trajectories and Origins. We first assess the performance of the algorithm in predicting the event. We then demonstrate random forest variable selection techniques based on Variable Importance and Minimal Depth, which allows us to determine which variables matter most for explaining survival. Next, we examine how and to what extent the important variables affect survival, and we explore potential interaction terms. Our findings support the interpretability and competitive performance of the random survival forest algorithm for studying the family dynamics of immigrants and their descendants.
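The abstract does not say which metric is used to assess predictive performance. As a minimal sketch, assuming Harrell's concordance index (a common choice for random survival forests), the code below computes the share of comparable pairs in which the individual with the earlier observed birth also has the higher predicted risk. The function name, the toy durations, and the risk scores are hypothetical and are not taken from the Trajectories and Origins data.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk.

    times       -- observed durations (e.g. years until birth or censoring)
    events      -- 1 if the birth was observed, 0 if the record is censored
    risk_scores -- predicted risk; higher means the event is expected sooner
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs tied on time are skipped
        # a is the individual with the shorter observed duration
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter duration is censored, so the order is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: five individuals, two of them censored.
toy_times = [2.0, 3.5, 5.0, 6.0, 8.0]
toy_events = [1, 1, 0, 1, 0]
toy_risk = [0.9, 0.4, 0.5, 0.6, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risk)
print(f"C-index on toy data: {toy_c:.2f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Random survival forest software often reports the out-of-bag error as one minus this quantity.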

Keywords: Fertility and childbirth, Big data / Social media, Migrant populations

See extended abstract.

Presented in Session 39. Fertility and Sexual and Reproductive Health: New Methods