Assignment_5

# Assignment 5

Deadline: Saturday, November 30th.

## Part 1: Heterogeneous treatment effects using causal trees and forests

For this part, we will be using experimental data for computing heterogeneous effects through causal trees and forests. For all exercises, the predictors $X$ are all variables that are not the outcome $Y$ or the treatment $D$.

1.1. **Load the data** (1 point). This is data from an experiment on the National Supported Work Demonstration (NSW) job-training program. You can find the data [here](https://github.com/d2cml-ai/CausalAI-Course/blob/main/Labs/Assignment/Assignment_5/data/experimental/experimental_control.csv), and read a description of the data [here](https://github.com/d2cml-ai/CausalAI-Course/blob/main/Labs/Assignment/Assignment_5/data/experimental/README.md). For further details of the experiment and the program, you can use [this link](https://mixtape.scunning.com/05-matching_and_subclassification#example-the-nsw-job-training-program).

1.2. **Find the ATE** (1.5 points). With `re78` as the outcome variable of interest, find the Average Treatment Effect of participation in the program. Specifically, you should find it by calculating the difference between the means of the treatment group and the control group (the Simple Difference of Means or SDM). What can you say about the program?
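As a minimal sketch of the SDM calculation, the snippet below computes the difference in group means together with an unequal-variance standard error. It runs on synthetic data, not the NSW file; the function name `simple_difference_of_means` and the simulated effect of 2.0 are illustrative assumptions. On the real data, `y` would be the `re78` column and `d` the treatment indicator.

```python
import numpy as np

def simple_difference_of_means(y, d):
    """Return (SDM, standard error) for outcome y and binary treatment d."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d).astype(bool)
    treated, control = y[d], y[~d]
    sdm = treated.mean() - control.mean()
    # Unequal-variance standard error of the difference in means
    se = np.sqrt(treated.var(ddof=1) / treated.size
                 + control.var(ddof=1) / control.size)
    return sdm, se

# Synthetic stand-in data (NOT the NSW data): the simulated effect is 2.0
rng = np.random.default_rng(0)
d = rng.integers(0, 2, size=1000)
y = 5.0 + 2.0 * d + rng.normal(size=1000)

sdm, se = simple_difference_of_means(y, d)
print(f"SDM = {sdm:.3f}, SE = {se:.3f}")
```

Because treatment is randomized in the experimental sample, this difference is an unbiased estimate of the ATE; the standard error helps judge whether the estimated effect is distinguishable from zero.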

1.3. **Heterogeneous effects with causal trees** (3 points). Use causal trees like we saw in class. For Python, you should use the `econml` package; for R, use the `grf` package; and for Julia, you will need to create the auxiliary variable $Y^*$ and fit a decision tree regressor. Report the splits the tree finds and interpret them.
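The auxiliary-variable route described for Julia can be sketched in Python as follows. This is not the `econml` or `grf` API. It assumes that $Y^*$ denotes the transformed outcome $Y^* = Y (D - p) / (p (1 - p))$, where $p$ is the share of treated units, so that $E[Y^* \mid X]$ equals the conditional treatment effect in a randomized experiment. The data, the predictor names `x0` and `x1`, and the tree settings are all hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic experiment (NOT the NSW data): the effect is 3 when x0 > 0, else 0
rng = np.random.default_rng(1)
n = 4000
x = rng.normal(size=(n, 2))              # two hypothetical predictors
d = rng.integers(0, 2, size=n)           # randomized binary treatment
tau = np.where(x[:, 0] > 0, 3.0, 0.0)    # true heterogeneous effect
y = x[:, 1] + tau * d + rng.normal(size=n)

# Transformed outcome (assumed definition of Y*)
p = d.mean()
y_star = y * (d - p) / (p * (1 - p))

# A shallow regression tree on Y* approximates the effect within each leaf
tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=200, random_state=0)
tree.fit(x, y_star)
print(export_text(tree, feature_names=["x0", "x1"]))
```

The printed rules are the splits to report: each leaf value is the estimated treatment effect for the subgroup defined by the path to that leaf. With $p$ set to the sample share of treated units, the mean of `y_star` equals the simple difference of means, which is a useful sanity check.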

1.4. **Heterogeneous effects with causal forests** (3 points). Use causal forests like we saw in class. For Python, you should use the `econml` package; for R, use the `grf` package; and for Julia, you will need to use the auxiliary variable $Y^*$ computed in the previous exercise and fit a random forest regressor. Report the importance of the predictor variables.
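A rough Python analogue of the random-forest route is sketched below. It is a plain random forest fitted to a transformed outcome, not a causal forest from `econml` or `grf`, and it again assumes $Y^* = Y (D - p) / (p (1 - p))$. The synthetic data, the names `x0` to `x2`, and the hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic experiment (NOT the NSW data): only x0 drives the effect
rng = np.random.default_rng(2)
n = 4000
x = rng.normal(size=(n, 3))
d = rng.integers(0, 2, size=n)
tau = 3.0 * (x[:, 0] > 0)
y = tau * d + rng.normal(size=n)

# Transformed outcome (assumed definition of Y*)
p = d.mean()
y_star = y * (d - p) / (p * (1 - p))

forest = RandomForestRegressor(
    n_estimators=200, max_depth=3, min_samples_leaf=100, random_state=0
)
forest.fit(x, y_star)

# Impurity-based importances: which predictors the forest splits on most
names = ["x0", "x1", "x2"]
for name, importance in zip(names, forest.feature_importances_):
    print(f"{name}: {importance:.3f}")

# Predicted treatment effect for each observation
cate_hat = forest.predict(x)
```

The importances sum to one, and a predictor with a large share is one along which the estimated effect varies most. The vector `cate_hat` holds the per-observation predicted effects.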

1.5. **Plot heterogeneous effects** (1.5 points). Plot how the predicted treatment effect changes depending on a variable of your choice. (See the last example in [PD11](https://github.com/d2cml-ai/CausalAI-Course/tree/main/Labs/PD/PD11) for clarification of what you should do in this exercise.)
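One possible way to prepare such a plot is to average the predicted effects within quantile bins of the chosen variable. The sketch below is not taken from PD11; the helper `binned_effects`, the variable `age`, and the simulated predictions `cate_hat` are hypothetical stand-ins for a real predictor and the output of a fitted tree or forest.

```python
import numpy as np

def binned_effects(x, cate_hat, n_bins=5):
    """Average predicted effect within quantile bins of x."""
    x = np.asarray(x, dtype=float)
    cate_hat = np.asarray(cate_hat, dtype=float)
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    # Assign each observation to a bin index in 0 .. n_bins - 1
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    centers = np.array([x[idx == b].mean() for b in range(n_bins)])
    means = np.array([cate_hat[idx == b].mean() for b in range(n_bins)])
    return centers, means

# Synthetic stand-ins (NOT model output): the effect grows with age
rng = np.random.default_rng(3)
age = rng.uniform(18, 55, size=1000)
cate_hat = 100.0 * (age - 18) + rng.normal(0, 200, size=1000)

centers, means = binned_effects(age, cate_hat)
for c, m in zip(centers, means):
    print(f"mean age {c:5.1f}: average predicted effect {m:8.1f}")
```

The pairs `(centers, means)` can then be drawn with any plotting library, for example `matplotlib.pyplot.plot(centers, means)`. For a discrete variable with many ties, grouping by the distinct values may work better than quantile bins.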

## Part 2: Double/Debiased machine learning in observational data

In this part, we will be using observational data for computing the average treatment effect of the same program as in Part 1. This data is constructed by taking the treatment group from the same dataset as in Part 1, but constructing the control group from a different dataset; that is, the entire control group consists of observations from the Current Population Survey. Therefore, the treatment and control groups may not be comparable. To tackle this issue, we can use Double/Debiased machine learning.

2.1. **Load the data** (1 point). You can find the data [here](https://github.com/d2cml-ai/CausalAI-Course/blob/main/Labs/Assignment/Assignment_5/data/observational/biased_control.csv), and read a description of the data [here](https://github.com/d2cml-ai/CausalAI-Course/blob/main/Labs/Assignment/Assignment_5/data/observational/README.md). For further details on how this data was created, you can use [this link](https://mixtape.scunning.com/05-matching_and_subclassification#example-the-nsw-job-training-program).

2.2. **Group comparisons** (1.5 points). For the treatment and control group separately, report summary statistics of three variables of your choice. Can you spot any big differences between the treatment and control groups?
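A group-wise summary of this kind can be sketched with `pandas` as below. The column names `treat`, `age`, and `re74` are assumptions about the file layout (check the README for the actual names), and the six rows are made-up values for illustration, not observations from the dataset.

```python
import pandas as pd

# Made-up illustrative rows (NOT from the dataset); column names are assumed
df = pd.DataFrame({
    "treat": [1, 1, 1, 0, 0, 0],
    "age": [22, 25, 28, 35, 40, 45],
    "re74": [0.0, 1000.0, 2000.0, 12000.0, 15000.0, 18000.0],
})

# Mean and standard deviation of each variable, separately by group
summary = df.groupby("treat")[["age", "re74"]].agg(["mean", "std"])
print(summary)
```

Large gaps between the two rows of `summary` suggest that the groups differ on observed characteristics, which is exactly the situation in which a raw comparison of outcomes can mislead.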

2.3. **Compute the SDM** (1.5 points). Find the simple difference of means, which we can use as a naive estimate of the ATE. How does the result in this case compare to the result in exercise 1.2?

2.4. **Using DML** (6 points). Use the DML procedure as we saw in the Lab to find a better estimate of the ATE. You may use the [`DoubleML`](https://docs.doubleml.org/stable/index.html) packages for Python and R, but this package does not exist for Julia, so you will have to build your own procedure like we saw in class. You will be awarded extra points for using more than one method for predictions. At the end, report the treatment effect you found, as well as the MSE for $D$ and $Y$ achieved by the method(s) you used.
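For reference, a hand-built version of the procedure might look like the sketch below. It does not use the `DoubleML` package. It assumes a partially linear model, $Y = \theta D + g(X) + \varepsilon$, estimated by partialling out with cross-fitting: predict $Y$ and $D$ from $X$ out of fold, then regress the $Y$ residuals on the $D$ residuals. The function name `dml_plr`, the two learners, and the synthetic confounded data are illustrative assumptions, not the Lab's exact code.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict

def dml_plr(x, d, y, learner_y, learner_d, n_folds=5, seed=0):
    """Partialling-out DML with cross-fitting for a partially linear model."""
    cv = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    # Out-of-fold predictions of the outcome and the treatment from X
    y_hat = cross_val_predict(learner_y, x, y, cv=cv)
    d_hat = cross_val_predict(learner_d, x, d, cv=cv)
    res_y, res_d = y - y_hat, d - d_hat
    # Residual-on-residual regression gives the effect estimate
    theta = np.sum(res_d * res_y) / np.sum(res_d ** 2)
    eps = res_y - theta * res_d
    # Heteroskedasticity-robust standard error
    se = np.sqrt(np.mean(res_d ** 2 * eps ** 2) / np.mean(res_d ** 2) ** 2 / len(y))
    return {"theta": theta, "se": se,
            "mse_y": np.mean(res_y ** 2), "mse_d": np.mean(res_d ** 2)}

# Synthetic confounded data (NOT the assignment data): the true effect is 2.0
rng = np.random.default_rng(4)
n = 2000
x = rng.normal(size=(n, 3))
d = rng.binomial(1, 1.0 / (1.0 + np.exp(-x[:, 0]))).astype(float)
y = 2.0 * d + 3.0 * x[:, 0] + rng.normal(size=n)

naive = y[d == 1].mean() - y[d == 0].mean()   # biased by confounding through x0
linear = dml_plr(x, d, y, LinearRegression(), LinearRegression())
forest = dml_plr(
    x, d, y,
    RandomForestRegressor(n_estimators=100, min_samples_leaf=20, random_state=0),
    RandomForestRegressor(n_estimators=100, min_samples_leaf=20, random_state=0),
)
print(f"naive SDM: {naive:.3f}")
for name, res in [("linear", linear), ("forest", forest)]:
    print(f"{name}: theta = {res['theta']:.3f} (SE {res['se']:.3f}), "
          f"MSE(Y) = {res['mse_y']:.3f}, MSE(D) = {res['mse_d']:.3f}")
```

Running the same function with different learners is one way to compare prediction methods: the out-of-fold MSEs for $Y$ and $D$ indicate which learner fits each nuisance function better. Under the partially linear model, $\theta$ coincides with the ATE only when the effect is constant across units, so this is a simplification worth stating when reporting results.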
