Causal Mediation Analysis and Machine Learning based estimators

Project Details


Causal mechanism analyses uncover the role played by alternative intermediate variables on the causal pathway between the treatment and the ‘final’ outcome of interest. As a result, this methodology enables researchers to make more effective policy recommendations regarding the channels through which a given policy intervention affects a given outcome of interest.
In this project, we aim to combine causal mediation analysis with machine learning based techniques, contributing to the development of cutting-edge methodologies in the field of causal inference and high-dimensional data.
In particular, in the first part, using the ‘Understanding Society’ dataset, we will optimally select genetic instruments and focus on the interaction of these genetic (and environmental) factors, studying its effect on individual income conditions as mediated by individual health status. Given the high-dimensional data setting, machine learning based techniques will also be applied to allow for an optimal selection of important control variables.
Thanks to the recent inclusion of multiple COVID-19 survey waves in the original dataset, we also intend to perform an additional analysis evaluating the effect of genetic and environmental factors on individuals’ psychological status as mediated by COVID-19 infection.
In the second part of the project, we will develop a doubly robust version of a Heckman-type sample selection correction for nonparametric treatment evaluation, suitable for machine learning based estimation of the propensity score. We will then apply this methodology to the US ‘Job Corps’ dataset to evaluate the effect of different training programs on the hourly wages of unemployed individuals in the US.
Effective start/end date: 30/05/22 to 30/09/22


  • Luxembourg National Research Fund (FNR)


  • Causal mediation
  • Machine learning
  • High-dimensional data
  • Health