This post is dedicated to the statistical field known as **causal inference**, one of whose main goals is to estimate the causal effect of an intervention (often called a treatment) on a quantity of interest (the outcome).
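In the potential-outcomes notation widely used in this field (introduced here for illustration; the post itself does not fix a notation), each unit $i$ has an outcome $Y_i(1)$ if treated and $Y_i(0)$ if not treated. A common target of estimation is the average treatment effect:

$$\tau_{\text{ATE}} = \mathbb{E}\left[\,Y(1) - Y(0)\,\right]$$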

The “gold standard” method for inferring the effect of an intervention (for example, prescribing a medicine or setting a price) is the randomized experiment, in which treatment is allocated to units at random. Unfortunately, such an experiment may be infeasible (too expensive, too time-consuming, unethical, etc.).
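To see why randomization matters, here is a minimal simulation sketch on synthetic data. The covariate, the effect size of 2.0 and the linear outcome model are all assumptions made for illustration. Because treatment is assigned by a coin flip, independently of everything else, the plain difference in mean outcomes between treated and untreated units recovers the effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_effect = 2.0  # hypothetical effect size used to generate the data

# Pre-treatment covariate that also influences the outcome.
x = rng.normal(size=n)

# Randomized allocation: a fair coin flip, independent of x.
t = rng.integers(0, 2, size=n)

# Synthetic outcome: depends on x, on the treatment, and on noise.
y = 1.5 * x + true_effect * t + rng.normal(size=n)

# With random allocation, the difference in means is an unbiased estimate.
diff_in_means = y[t == 1].mean() - y[t == 0].mean()
print(f"difference in means: {diff_in_means:.3f} (true effect: {true_effect})")
```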

Nevertheless, one often has access to “observational data”, that is, records of past cases in which different levels of treatment were applied and the resulting outcomes were documented.

The theory of causal inference, to which Judea Pearl has contributed vastly, specifies conditions under which observational data can, in principle, be used to estimate causal effects without bias. For example, Pearl's back-door criterion provides a graphical test of whether a set of variables is appropriate for adjustment (*Causality* by J. Pearl).
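For reference, the adjustment formula licensed by the back-door criterion can be written as follows (standard notation from Pearl's work, added here for illustration). If a set of variables $Z$ satisfies the back-door criterion relative to a treatment $X$ and an outcome $Y$, that is, no variable in $Z$ is a descendant of $X$ and $Z$ blocks every path between $X$ and $Y$ that contains an arrow into $X$, then

$$P(y \mid do(x)) = \sum_{z} P(y \mid x, z)\, P(z)$$

Every term on the right-hand side is an ordinary (non-interventional) probability, which is what makes estimation from observational data possible.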

But even when such favorable conditions hold (a widely studied case is known in the literature as **strong ignorability**), causal inference from observational data is challenging. This is because we only observe what actually happened (the factual data) and not what would have happened under a different treatment (the counterfactual data). Since the distribution of treated units often differs from that of untreated units, a naive model trained to fit the factual data may generalize poorly to the counterfactuals.
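The following sketch illustrates the problem on synthetic data; the confounder, the propensity function and the linear outcome model are hypothetical. The confounder `x` drives both treatment and outcome, so the naive difference in means is badly biased. Assuming strong ignorability given `x`, meaning $(Y(0), Y(1)) \perp T \mid X$ and $0 < P(T=1 \mid X=x) < 1$, where $Y(1)$ and $Y(0)$ are the outcomes with and without treatment, adjusting for `x` recovers the effect. Here the adjustment is a linear regression, which is only valid because the simulated outcome model really is linear.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
true_effect = 2.0  # hypothetical effect used to generate the data

# Confounder x influences both who gets treated and the outcome.
x = rng.normal(size=n)
propensity = 1.0 / (1.0 + np.exp(-2.0 * x))  # P(T=1 | x), strictly in (0, 1)
t = (rng.uniform(size=n) < propensity).astype(float)
y = 3.0 * x + true_effect * t + rng.normal(size=n)

# Naive estimate: treats treated and untreated units as directly comparable.
naive = y[t == 1].mean() - y[t == 0].mean()

# Regression adjustment: fit y ~ 1 + t + x and read off the coefficient on t.
design = np.column_stack([np.ones(n), t, x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
adjusted = coef[1]

print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}, true: {true_effect}")
```

With a flexible outcome model fitted to high-dimensional data, the fitted model must extrapolate to regions where treated or untreated units are rare, which is where the generalization problem described above appears.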

The recent success of neural networks in many challenging tasks has encouraged attempts to apply them to causal inference, and the last two to three years have seen ingenious ideas addressing the problem above. For example, a recent paper introduces a framework called CFR (counterfactual regression), which learns a representation of the data that minimizes the distance between the control and treated distributions induced by the representation, while outcome models built on top of that representation fit the factual outcomes.
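A CFR-style training objective can be sketched as follows. This is a simplified illustration, not the paper's implementation: the one-layer representation, the linear outcome heads, the parameter names and the weight `alpha` are all hypothetical. The distance used here is the squared maximum mean discrepancy (MMD) with a linear kernel, i.e. the squared distance between the mean representations of treated and control units; other distributional distances (for example, the Wasserstein distance) can be used instead. The function only evaluates the objective; in practice its parameters would be trained by gradient descent in an autodiff framework.

```python
import numpy as np

def representation(x, w):
    """Hypothetical representation Phi(x): one linear layer followed by ReLU."""
    return np.maximum(x @ w, 0.0)

def linear_mmd2(phi_treated, phi_control):
    """Squared MMD with a linear kernel: squared distance between group means."""
    diff = phi_treated.mean(axis=0) - phi_control.mean(axis=0)
    return float(diff @ diff)

def cfr_style_objective(params, x, t, y, alpha):
    """Factual prediction loss plus alpha times a distance between the
    treated and control distributions in representation space."""
    phi = representation(x, params["w_phi"])
    # Separate (hypothetical, linear) outcome heads for control and treated units.
    pred0 = phi @ params["h0"]
    pred1 = phi @ params["h1"]
    # Only the head matching each unit's observed treatment is scored.
    pred = np.where(t == 1, pred1, pred0)
    factual_loss = float(np.mean((pred - y) ** 2))
    imbalance = linear_mmd2(phi[t == 1], phi[t == 0])
    return factual_loss + alpha * imbalance, factual_loss, imbalance

# Usage on synthetic data where treatment depends on the first feature.
rng = np.random.default_rng(0)
n, d, k = 1_000, 5, 8
x = rng.normal(size=(n, d))
t = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-x[:, 0]))).astype(int)
y = x[:, 0] + 2.0 * t + rng.normal(size=n)
params = {
    "w_phi": rng.normal(size=(d, k)),
    "h0": rng.normal(size=k),
    "h1": rng.normal(size=k),
}
total, factual, imbalance = cfr_style_objective(params, x, t, y, alpha=1.0)
print(f"total={total:.3f} factual={factual:.3f} imbalance={imbalance:.3f}")
```

Increasing `alpha` pushes training toward representations in which treated and control units look alike, possibly at some cost in factual accuracy.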

To conclude, I wish to mention a major challenge in causal inference that is less prominent in other applications of machine learning. Since we lack the true values of the counterfactuals against which to measure the quality of our “intervention predictions”, it is non-trivial to choose among multiple candidate models.
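One common workaround, sketched below under stated assumptions, is to score candidate effect models against a proxy computed from factual data only. Here we use the inverse-propensity-weighted “transformed outcome” $Y^* = Y\,(T - e(X)) / \big(e(X)(1 - e(X))\big)$, where $e(X) = P(T=1 \mid X)$. Under strong ignorability its conditional mean equals the true conditional effect $\tau(X)$, so its mean squared error against a candidate $\hat\tau(X)$ ranks candidates correctly in expectation. The propensity is assumed known here (with observational data it must itself be estimated), and the data and both candidate models are synthetic and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
x = rng.uniform(-1.0, 1.0, size=n)
e = 1.0 / (1.0 + np.exp(-x))   # propensity score, assumed known here
t = (rng.uniform(size=n) < e).astype(float)
tau = 1.0 + 2.0 * x            # true effect (synthetic; unknown in practice)
y = x + tau * t + rng.normal(size=n)

# IPW transformed outcome: E[y_star | x] = tau(x) under strong ignorability.
y_star = y * (t - e) / (e * (1.0 - e))

# Two hypothetical candidate effect models to choose between.
candidates = {
    "constant": lambda v: np.full_like(v, 1.0),
    "linear": lambda v: 1.0 + 2.0 * v,
}

# Proxy risk: mean squared error against y_star, computable without counterfactuals.
proxy_risk = {name: float(np.mean((y_star - f(x)) ** 2)) for name, f in candidates.items()}
best = min(proxy_risk, key=proxy_risk.get)
print(proxy_risk, "->", best)
```

The proxy is noisy, since the weights grow large when $e(X)$ is close to 0 or 1, which is one reason model selection in causal inference remains hard.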

Working at Pecan, I see daily how relevant causal inference is in practice to our customers across a variety of industries and verticals, and we do our best to provide them with the state-of-the-art technology currently available for this daunting task.