I’ve long been intimidated by causal inference. But really, it’s just some basic philosophy with some math behind it. Remember when all scientists were called philosophers? Well, all of science is really just a branch of philosophy. So let’s start with some philosophy. Then we’ll get into some data science concepts you can use today.
Causation by Philosophical Reasoning
The math always follows the logic. Let’s dive into the logic first.
Counterfactuals and Modality: Causal reasoning often employs counterfactual thinking. "What would have happened if event A did not occur?" This allows us to compare the actual outcome with a hypothetical scenario to infer causality. Philosophers like David Lewis have developed detailed accounts of counterfactuals using possible world semantics.
Regularities: One traditional view of causation, often associated with the philosopher David Hume, is the regularity view. This posits that causation is just a regular succession of events: when we see A regularly followed by B, we infer that A causes B. But this regularity is not sufficient for causation, as there can be coincidences or confounders.
Manipulability Theory: Some philosophers, such as James Woodward, have framed causation in terms of interventions. If by manipulating A we can bring about changes in B, then A has a causal effect on B. This perspective has been particularly influential in experimental science and statistics.
Probabilistic Causation: Some events do not have deterministic causes but probabilistic ones. In this perspective, causation can be seen as increasing the probability of an effect. This is common in fields like epidemiology where certain factors increase the likelihood of an outcome but don't guarantee it.
Causal Mechanisms: A more recent trend in philosophy is to think about causation in terms of mechanisms. Here, A causes B if there's a mechanism that links them. This view can be seen in opposition to the regularity view, as it emphasizes the "how" of causation, not just the "what."
Causal Relata: What are the things between which causal relationships obtain? Are they events, processes, variables, or something else? Different philosophical theories of causation might have different answers to this question.
Causation vs. Explanation: Causal relationships and explanations are deeply intertwined but distinct. While causation refers to the objective relationship between events or states of affairs, explanation concerns our understanding or description of this relationship. Not all explanations are causal; some might be based on laws, regularities, or other principles.
Challenges to Causation: There are well-known philosophical challenges to causation. For instance, the "problem of induction" posits that just because event A has always been followed by event B in the past doesn't guarantee it will in the future. There's also the post-Humean challenge of identifying causes without reference to unobservable entities or metaphysical principles.
Three Foundational Frameworks for Causal Inference
Now, most of the time people just use linear regression for causal inference. We’ll get there, but the following three frameworks provide common ground for how people apply the philosophical reasoning above.
Potential Outcomes:
Background: Introduced by Donald Rubin, the potential outcomes framework (often referred to as the Rubin Causal Model) provides a formal way to define causal effects.
Key Concept: For each unit (e.g., a person in a medical study), imagine two potential outcomes: one if the unit is treated (denoted Y1) and one if the unit is not treated (denoted Y0).
Causal Effect: The causal effect for a unit is the difference between these two potential outcomes (Y1−Y0). However, for any given unit, we can only observe one of these outcomes, leading to the "fundamental problem of causal inference": we can't directly observe the causal effect for any individual unit.
Average Treatment Effect (ATE): A solution to the fundamental problem is to consider populations or groups. ATE is the expected difference in outcomes between treated and untreated units, E[Y1−Y0].
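To make the fundamental problem concrete, here is a minimal sketch with simulated data (all numbers are made up for illustration): we generate both potential outcomes for every unit, something we can never do with real data, then observe only one of them and see that a randomized difference in means recovers the ATE.
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
# Both potential outcomes exist for every unit, but only in a simulation can we see both
Y0 = rng.normal(loc=10, scale=2, size=n)      # outcome if untreated
Y1 = Y0 + 3                                   # outcome if treated; true effect is 3 for everyone
true_ate = np.mean(Y1 - Y0)

# In real data we observe only one outcome per unit, depending on treatment
D = rng.binomial(1, 0.5, size=n)              # randomized assignment
Y_observed = np.where(D == 1, Y1, Y0)

# Under randomization, the difference in group means recovers the ATE
estimated_ate = Y_observed[D == 1].mean() - Y_observed[D == 0].mean()
print(true_ate, estimated_ate)                # both close to 3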
Directed Acyclic Graphs (DAGs):
Background: DAGs are graphical representations of the relationships between variables. They're especially useful for illustrating confounding variables and other structural relationships that can bias causal estimates.
Key Concept: In a DAG, nodes represent variables, and directed edges (arrows) represent direct causal relationships. The graph being acyclic means that it doesn't have any loops.
Confounding: If there's a common cause (a "confounder") for both the treatment and the outcome, it can bias estimates. DAGs help visualize such structures. For instance, if A affects both B and C, then A is a confounder when assessing the causal relationship between B and C.
Adjustment: Using DAGs, we can identify which variables to adjust for in order to obtain unbiased estimates of causal effects.
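Here is a hedged sketch of what adjustment buys you, using simulated data with the same A, B, C naming as above: regressing C on B alone absorbs the back-door path through the confounder A, while including A in the regression recovers the true effect.
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
A = rng.normal(size=n)                        # confounder: causes both B and C
B = 2.0 * A + rng.normal(size=n)              # "treatment" variable
C = 1.0 * B + 3.0 * A + rng.normal(size=n)    # true effect of B on C is 1.0

# Naive regression of C on B picks up the back-door path through A
X_naive = np.column_stack([np.ones(n), B])
print(np.linalg.lstsq(X_naive, C, rcond=None)[0][1])   # well above 1.0

# Adjusting for the confounder A recovers the true effect
X_adj = np.column_stack([np.ones(n), B, A])
print(np.linalg.lstsq(X_adj, C, rcond=None)[0][1])     # close to 1.0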
Structural Causal Models (SCMs):
Background: SCMs (or sometimes just "structural models") provide a mathematical way to represent systems of equations and their causal relationships.
Key Concept: An SCM is composed of two parts:
Structural Equations: Each equation represents how a variable is generated as a function of its parents (direct causes) and some noise term. For example, Y=f(X,U) denotes that Y is a function of X and some unobserved factors U.
Noise Terms: These are variables that capture unobserved factors affecting each variable. They're assumed to be independent of each other.
Intervention: SCMs allow us to mathematically represent interventions. For instance, "do" operations (like do(X=x)) let us set the value of X and see how it propagates through the system.
Connection to DAGs: The structural equations in an SCM can be visualized as a DAG, where each equation corresponds to a node and its parents.
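A small sketch of the idea, with made-up structural equations rather than any particular library’s do-operator: each variable is a function of its parents plus noise, and an intervention simply replaces the equation for X with a fixed value.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

def simulate(do_x=None):
    # Structural equations: Z -> X -> Y, each with its own independent noise term
    Z = rng.normal(size=n)
    if do_x is None:
        X = 0.8 * Z + rng.normal(size=n)      # observational assignment of X
    else:
        X = np.full(n, float(do_x))           # do(X = do_x): override the equation for X
    Y = 2.0 * X + 1.5 * Z + rng.normal(size=n)
    return X, Y

X_obs, Y_obs = simulate()                     # the world as observed
X_int, Y_int = simulate(do_x=1.0)             # the world under the intervention do(X=1)
print(Y_obs.mean(), Y_int.mean())             # E[Y] shifts from about 0 to about 2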
Doing a Basic Causal Inference Analysis
Now, the things above aren’t so fancy. They just come down to computing some averages and using some math to account for some things. If you’re using Python, you may consider the CausalInference package.
So here we have a simple model for how to do causal inference analysis, trying to measure the causal effect of the treatment on Y.
import numpy as np
from causalinference import CausalModel
# Toy synthetic data standing in for your own observational arrays
covariates = np.random.normal(size=(500, 2))
treatment = (covariates[:, 0] + np.random.normal(size=500) > 0).astype(int)
outcome = 2.0 * treatment + covariates @ [1.0, -0.5] + np.random.normal(size=500)
# Create a causal model
cm = CausalModel(Y=outcome, D=treatment, X=covariates)
# Estimate causal effects
cm.est_via_ols(adj=1)  # Using Ordinary Least Squares with covariate adjustment
print(cm.estimates)
But this package is old and no longer maintained. What’s it doing under the hood? Well, it’s doing some linear regression with some other adjustments (if desired). We’ll get to that. First, let’s figure out how we might screw up our causal inference.
Confounders of Causality
Informally, a confounder is something that prevents you from cleanly declaring causality; strictly speaking, it’s a variable that influences both the treatment and the outcome, and the challenges below go beyond confounding alone. Here are some concepts:
Causal inference in observational studies is affected by a myriad of factors and challenges. The goal is to estimate the causal effect of a treatment/intervention as if it were randomized, but in practice, this is hard due to various confounding factors. Here's a list of some key elements and challenges affecting causal inference:
Confounding: This is when an external factor influences both the treatment assignment and the outcome. Properly controlling for confounding is essential to derive unbiased causal estimates. Techniques like regression adjustment, stratification, and matching (like propensity score matching) aim to address confounding.
Selection Bias: This occurs when there's a non-random selection of participants or observations, leading to a sample that's not representative of the population. For example, if only healthier patients opt for a certain treatment, their health outcomes might be better regardless of the treatment, leading to bias.
Measurement Error: Errors in measuring either the treatment, outcome, or covariates can lead to biased causal estimates. For instance, if there's non-differential misclassification of treatment status, it can attenuate the estimated treatment effect.
Missing Data: If data is missing not at random (MNAR), it can introduce bias. Methods like multiple imputation or weighting (like inverse probability weighting) can be used to address missing data.
Post-Treatment Bias: This occurs when a covariate affected by the treatment is controlled for in the analysis, potentially leading to a biased estimate of the treatment’s effect (see the short simulation after this list).
Mediation & Mechanisms: Understanding not just if, but how a treatment has an effect is vital. Mediation analysis helps identify intermediate variables through which a treatment exerts its effect.
Unobserved Confounding: Even after controlling for observed variables, there might be hidden variables affecting both treatment and outcome. Instrumental variable methods, among others, aim to address this challenge.
Generalizability (External Validity): Even if a causal effect is valid for a specific sample or context, it might not generalize to other settings, populations, or times.
Temporal Order: Establishing that the treatment precedes the outcome in time is essential for causation.
Stable Unit Treatment Value Assumption (SUTVA): This assumes that the treatment level of one unit does not affect the outcome of another unit, and that there are no multiple versions of treatment which can affect the outcome differently.
Design Considerations: The study design, whether it's a cross-sectional study, cohort study, or case-control study, can influence the ease or difficulty of inferring causation.
Model Specification: Choosing the right model for causal inference, including the functional form, interactions, and polynomial terms, can significantly affect results.
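To illustrate post-treatment bias from the list above, here is a short simulation (made-up numbers): controlling for a mediator M that sits on the path from treatment D to outcome Y shrinks the estimated effect, even though the treatment’s total effect is much larger.
import numpy as np

rng = np.random.default_rng(7)
n = 50_000
D = rng.binomial(1, 0.5, size=n)                  # randomized treatment
M = 1.5 * D + rng.normal(size=n)                  # mediator caused by the treatment
Y = 2.0 * M + 0.5 * D + rng.normal(size=n)        # total effect of D on Y is 0.5 + 2.0*1.5 = 3.5

# Regressing Y on D alone gives the total effect, roughly 3.5
X_total = np.column_stack([np.ones(n), D])
print(np.linalg.lstsq(X_total, Y, rcond=None)[0][1])

# Post-treatment bias: also controlling for the mediator M shrinks the estimate to roughly 0.5
X_post = np.column_stack([np.ones(n), D, M])
print(np.linalg.lstsq(X_post, Y, rcond=None)[0][1])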
I don’t want to focus on CausalModel itself, but instead on what the package aims to do.
CausalModel vs. Ordinary Least Squares Regression
OLS is a tool used by CausalModel to do inference, but CausalModel does a few other things as well. Namely, because of the twelve challenges listed above, CausalModel tries to enhance the linear regression framework to get better estimates of the treatment effect.
Objective:
CausalModel: The primary aim is to estimate the causal effect of a treatment (or intervention) on an outcome. It seeks to answer questions like "What is the effect of treatment A on outcome Y?"
Linear Regression: Its main goal is to fit a line (or hyperplane) that best predicts the outcome variable based on one or more predictor variables. It's primarily concerned with prediction and correlation.
Assumptions:
CausalModel:
Unconfoundedness: Treatment assignment is independent of potential outcomes, given observed covariates. In simpler terms, once we control for certain variables, the treatment doesn't depend on any hidden variables that also affect the outcome.
Overlap (or common support): Each individual has a positive probability of receiving either treatment.
Stable Unit Treatment Value Assumption (SUTVA): The potential outcome for any individual is unaffected by the treatment assignments of others.
Linear Regression:
Linearity: The relationship between predictors and the outcome is linear.
Independence: Observations are independent of each other.
Homoscedasticity: The variance of the residuals is constant across observations.
No Perfect Multicollinearity: Predictor variables are not perfectly correlated with each other.
Methodology:
CausalModel: Uses various techniques such as propensity score matching, stratification, weighting, or instrumental variable methods to try and tease out causal relationships and account for confounding.
Linear Regression: Uses least squares estimation (or other optimization methods) to minimize the sum of the squared residuals and determine the line of best fit.
Interpretation:
CausalModel: The estimates represent the average causal effect of the treatment on the outcome. They answer the question: "On average, how does the outcome change when the treatment is applied, compared to when it isn't?"
Linear Regression: Coefficients represent the average change in the outcome for a one-unit change in the predictor, holding all else constant. They don't necessarily imply causation.
Other Principles
Okay, so CausalModel is a little more advanced. Let’s review some of the core differences in the methodology. Note - the details here aren’t as important as thinking high-level about your data. The model won’t be perfect, but it will try to account for some of the concepts above. Remember - these are all just high-level techniques to try to get better estimates of causality.
CausalModel:
Concept: At its core, a CausalModel is a formalized way to represent and analyze causal relationships in data. The goal is to understand the effect of an intervention or treatment on an outcome while accounting for various biases and confounders that can cloud true causal relationships.
Example: Consider trying to evaluate the effect of a new drug on patient recovery times. Instead of a randomized controlled trial, you might have observational data where those who received the drug might differ in various ways from those who didn't. A CausalModel aims to emulate what you would see in a randomized setting using this observational data.
Propensity Score Matching:
Concept: As discussed before, the propensity score is the probability of receiving the treatment given observed covariates. By matching treated and untreated units with similar propensity scores, one can create a pseudo-randomized experiment.
Example: In the drug scenario, some patients might have been more likely to receive the drug because of their age, severity of symptoms, etc. Matching ensures that for each treated patient, there's a similar untreated patient, balancing these observed characteristics.
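Continuing the earlier CausalModel example (this assumes the cm object built above; the method names come from the causalinference package, whose matching estimator matches on covariates rather than on the score itself, so treat this as a sketch and check the docs since the package is unmaintained):
cm.est_propensity_s()               # fit a logistic propensity model from the covariates
cm.est_via_matching(bias_adj=True)  # nearest-neighbor matching on covariates, with bias adjustment
print(cm.estimates)                 # compare the matching estimate with the earlier OLS one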
Stratification:
Concept: It involves dividing data into separate strata (or bins) based on propensity scores and then estimating treatment effects within each stratum. The overall treatment effect is then a weighted average of effects across strata.
Example: Using the drug scenario, instead of direct matching, you could group patients by propensity score intervals (e.g., 0-0.2, 0.2-0.4, etc.) and compare outcomes between treated and untreated within each group.
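Here is a hedged, by-hand sketch of stratification (simulated data; the propensity scores are taken as given rather than estimated): bin units by score, take the treated-minus-control difference in each bin, then average the bins weighted by their sample share.
import numpy as np
import pandas as pd

# Illustrative data; in practice the scores come from your own propensity model
rng = np.random.default_rng(3)
propensity = rng.uniform(0.05, 0.95, size=5_000)
treatment = rng.binomial(1, propensity)
outcome = 2.0 * treatment + 5.0 * propensity + rng.normal(size=5_000)

df = pd.DataFrame({"ps": propensity, "d": treatment, "y": outcome})
df["stratum"] = pd.cut(df["ps"], bins=[0, 0.2, 0.4, 0.6, 0.8, 1.0])

# Treated-minus-control mean outcome within each propensity stratum
means = df.groupby(["stratum", "d"], observed=True)["y"].mean().unstack()
per_stratum = means[1] - means[0]

# Overall effect: weight each stratum by its share of the sample
weights = df["stratum"].value_counts(normalize=True).reindex(per_stratum.index)
print((per_stratum * weights).sum())  # roughly the true effect of 2.0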
Weighting:
Concept: Involves assigning weights to observations in order to reduce the bias due to confounding. One common method is the inverse probability of treatment weighting (IPTW) where units are weighted by the inverse of their probability of receiving the treatment they actually received.
Example: Consider patients who were less likely to get the drug but did. Since they are underrepresented in the treated group given their characteristics, they might be given a higher weight in analyses.
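A hedged sketch of IPTW on simulated data (scikit-learn's LogisticRegression for the propensity model; the estimator shown is the simple weighted difference in means):
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n = 20_000
X = rng.normal(size=(n, 2))                               # observed confounders
p_true = 1 / (1 + np.exp(-(X[:, 0] - 0.5 * X[:, 1])))     # true treatment propensity
D = rng.binomial(1, p_true)
Y = 2.0 * D + X[:, 0] + X[:, 1] + rng.normal(size=n)      # true treatment effect is 2.0

# Estimate propensity scores and form inverse-probability weights
ps = LogisticRegression().fit(X, D).predict_proba(X)[:, 1]
w = np.where(D == 1, 1 / ps, 1 / (1 - ps))

# Weighted difference in means between treated and control groups
ate = np.average(Y[D == 1], weights=w[D == 1]) - np.average(Y[D == 0], weights=w[D == 0])
print(ate)  # close to 2.0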
Instrumental Variable Methods:
Concept: Instrumental variables (IV) are variables that affect the treatment but have no direct effect on the outcome (except through the treatment). IV methods use these instruments to isolate a source of variation in treatment that's not confounded with the outcome, providing a pathway to causal estimation in the presence of unobserved confounding.
Example: Suppose there's a policy that randomly assigns the new drug to certain clinics. The clinic's assignment can be an instrument. Patients going to those clinics are more likely to receive the drug, not because of their health but because of the clinic's assignment.
[Sidebar] Propensity score matching
This is a fascinating concept: even when you think treatment assignment is effectively random, there can be confounding in how units actually end up treated, such as older patients being more likely to receive the treatment. This can happen anywhere in your data.
Propensity Score Calculation:
Given covariates X, the propensity score is the probability of receiving the treatment D.
Typically, this score is estimated using logistic regression: e(X) = P(D=1 | X) = 1 / (1 + exp(-(β0 + β·X))), where D=1 indicates treatment and D=0 indicates control.
Matching on Propensity Score:
Once propensity scores are calculated, each treated unit is matched to one or more control units based on the similarity of their scores.
Different matching techniques include nearest-neighbor, caliper matching, and kernel matching, among others.
Balancing Property:
One of the key properties of the propensity score is its balancing property. If two units have the same propensity score, they should, on average, have the same distribution of observed covariates, irrespective of treatment status.
After matching, one checks for balance on the covariates between the treated and control groups to ensure that this property holds.
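Putting the three steps together, here is a hedged, from-scratch sketch on simulated data (scikit-learn for the logistic regression and the nearest-neighbor lookup): estimate the scores, match each treated unit to the control with the closest score, then check covariate balance and compute the matched effect.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(2)
n = 10_000
X = rng.normal(size=(n, 2))                          # observed covariates
D = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))      # units with higher X[:, 0] are more likely treated
Y = 1.0 * D + X[:, 0] + rng.normal(size=n)           # true treatment effect is 1.0

# 1. Propensity scores from a logistic regression
ps = LogisticRegression().fit(X, D).predict_proba(X)[:, 1]

# 2. Match each treated unit to the control with the nearest propensity score
treated, control = np.where(D == 1)[0], np.where(D == 0)[0]
nn = NearestNeighbors(n_neighbors=1).fit(ps[control].reshape(-1, 1))
match_idx = control[nn.kneighbors(ps[treated].reshape(-1, 1))[1].ravel()]

# 3. Balance check: covariate means should now look similar across the two groups
print(X[treated].mean(axis=0), X[match_idx].mean(axis=0))
# Matched estimate of the treatment effect on the treated
print((Y[treated] - Y[match_idx]).mean())            # roughly 1.0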
While you might not have a “treatment” in your data, thinking about propensity scores in general will improve your ability to think about the relationships in your data.
Summary
Causal inference relies on philosophical reasoning about counterfactuals, regularities, interventions, and mechanisms to establish causality. Frameworks like potential outcomes, DAGs, and structural causal models formalize this reasoning.
Confounding factors like selection bias, measurement error, and unobserved variables can distort causal estimates. Techniques like matching, weighting, and instrumental variables aim to address confounding.
CausalModel in Python tries to estimate treatment effects from observational data by using methods like propensity score matching to emulate a randomized controlled trial.
Propensity scores represent the probability of receiving treatment given covariates. Matching on these scores creates balance between treated and control groups.
While linear regression focuses on prediction, causal inference focuses on accurately estimating treatment effects in the presence of confounding through techniques like matching, stratification, and weighting.