Merged

Commits (54)
- 424b1ec add first tuning simulation (SvenKlaassen, Nov 14, 2025)
- 445c30f add plr tuning example (SvenKlaassen, Nov 17, 2025)
- d228c7d Merge branch 'main' into sk-optuna (SvenKlaassen, Nov 24, 2025)
- df3d01f Fix links and standardize section titles in coverage documentation (SvenKlaassen, Nov 24, 2025)
- a1c474b Standardize section titles to "Coverage" in multiple documentation files (SvenKlaassen, Nov 24, 2025)
- c450065 Update documentation links to stable API references in coverage sections (SvenKlaassen, Nov 24, 2025)
- 93f68ad Standardize section titles to "Coverage" in multiple documentation files (SvenKlaassen, Nov 24, 2025)
- 27fa90e update tuning sim (SvenKlaassen, Nov 24, 2025)
- e21b357 first apos tuning sim (SvenKlaassen, Nov 26, 2025)
- 46522ac rerun lplr sim and add tuning with 100 reps and 200 trials (SvenKlaassen, Nov 26, 2025)
- 454361c formatting (SvenKlaassen, Nov 26, 2025)
- 375333c add tuning results to docs (SvenKlaassen, Nov 27, 2025)
- c31b082 add sim for did_pa_multi_tune (SvenKlaassen, Nov 27, 2025)
- 1c734f3 add did pa multi tune (SvenKlaassen, Nov 28, 2025)
- 7393505 rerun did sim (SvenKlaassen, Nov 28, 2025)
- 347e33d rerun did sim (SvenKlaassen, Dec 1, 2025)
- 3525aeb rerun irm sim with updated nuisance loss logging (SvenKlaassen, Dec 1, 2025)
- 2334050 add LightGBM parameter tuning functions for regression and classifica… (SvenKlaassen, Dec 1, 2025)
- 899ff88 update APO models documentation and enhance APOSTuningCoverageSimulat… (SvenKlaassen, Dec 1, 2025)
- ef28d95 refactor: update PLR ATE tuning coverage simulation with loss metrics… (SvenKlaassen, Dec 1, 2025)
- e3eb809 update lplr sim (SvenKlaassen, Dec 1, 2025)
- 10bb9d9 Update DID multi-tuning results and metadata (SvenKlaassen, Dec 1, 2025)
- 84ea55f add did (SvenKlaassen, Dec 1, 2025)
- 2cb8aae refactor: update DGP specifications and sample size in configuration … (SvenKlaassen, Dec 2, 2025)
- 4d4e84e update dml version in toml (SvenKlaassen, Dec 2, 2025)
- 7a5032e update did sim (SvenKlaassen, Dec 3, 2025)
- e64099d run did sim with updated treatment assignment (SvenKlaassen, Dec 3, 2025)
- 4f425d3 rerun plr tuning (SvenKlaassen, Dec 3, 2025)
- e925e0c Merge pull request #32 from DoubleML/sk-optuna (SvenKlaassen, Dec 3, 2025)
- be02667 Update results from script: scripts/irm/iivm_late.py (invalid-email-address, Dec 4, 2025)
- b878e61 Update results from script: scripts/irm/irm_atte_sensitivity.py (invalid-email-address, Dec 4, 2025)
- 2b6a375 Update results from script: scripts/irm/irm_ate_sensitivity.py (invalid-email-address, Dec 4, 2025)
- 0703b9a Update results from script: scripts/irm/apo.py (invalid-email-address, Dec 4, 2025)
- 883c316 Update results from script: scripts/irm/apos.py (invalid-email-address, Dec 4, 2025)
- 10de76d Update results from script: scripts/irm/irm_gate.py (invalid-email-address, Dec 4, 2025)
- 0814dd6 Update results from script: scripts/irm/irm_cate.py (invalid-email-address, Dec 4, 2025)
- 9281637 Update results from script: scripts/irm/cvar.py (invalid-email-address, Dec 4, 2025)
- df8ffeb Update results from script: scripts/irm/lpq.py (invalid-email-address, Dec 4, 2025)
- b68a32f Update results from script: scripts/did/did_pa_multi.py (invalid-email-address, Dec 4, 2025)
- 76660f3 Update results from script: scripts/irm/pq.py (invalid-email-address, Dec 4, 2025)
- 25628e7 Update results from script: scripts/irm/irm_atte.py (invalid-email-address, Dec 4, 2025)
- 1b918af Update results from script: scripts/irm/irm_ate.py (invalid-email-address, Dec 4, 2025)
- 49b0ffd Update results from script: scripts/plm/lplr_ate.py (invalid-email-address, Dec 4, 2025)
- 6a6a4e8 Update results from script: scripts/ssm/ssm_nonig_ate.py (invalid-email-address, Dec 4, 2025)
- e5c1a05 Update results from script: scripts/plm/plr_gate.py (invalid-email-address, Dec 4, 2025)
- cc66b6c Update results from script: scripts/plm/plr_cate.py (invalid-email-address, Dec 4, 2025)
- 55c51bb Update results from script: scripts/plm/plr_ate.py (invalid-email-address, Dec 4, 2025)
- 586efca Update results from script: scripts/did/did_cs_atte_coverage.py (invalid-email-address, Dec 4, 2025)
- 0578196 Update results from script: scripts/plm/plr_ate_sensitivity.py (invalid-email-address, Dec 4, 2025)
- c82ba44 Update results from script: scripts/did/did_pa_atte_coverage.py (invalid-email-address, Dec 4, 2025)
- a0a59f0 Update results from script: scripts/ssm/ssm_mar_ate.py (invalid-email-address, Dec 4, 2025)
- 3fcf8a2 Update results from script: scripts/did/did_cs_multi.py (invalid-email-address, Dec 4, 2025)
- 17d9360 Update results from script: scripts/plm/pliv_late.py (invalid-email-address, Dec 4, 2025)
- 7329852 Merge pull request #34 from DoubleML/sk-update-0-11 (SvenKlaassen, Dec 5, 2025)
.github/workflows/apo_sim.yml (2 changes: 1 addition & 1 deletion)
@@ -52,7 +52,7 @@ jobs:
uses: astral-sh/setup-uv@v5
with:
version: "0.7.8"

- name: Set up Python
uses: actions/setup-python@v5
with:
.github/workflows/pliv_sim.yml (2 changes: 1 addition & 1 deletion)
@@ -62,7 +62,7 @@ jobs:
cd monte-cover
uv venv
uv sync

- name: Install DoubleML from correct branch
run: |
source monte-cover/.venv/bin/activate
doc/did/did_cs.qmd (4 changes: 2 additions & 2 deletions)
@@ -22,9 +22,9 @@ from utils.style_tables import generate_and_show_styled_table
init_notebook_mode(all_interactive=True)
```

- ## ATTE Coverage
+ ## Coverage

- The simulations are based on the the [make_did_SZ2020](https://docs.doubleml.org/stable/api/generated/doubleml.datasets.make_did_SZ2020.html)-DGP with $1000$ observations. Learners are only set to boosting, due to time constraints (and the nonlinearity of some of the DGPs).
+ The simulations are based on the [make_did_SZ2020](https://docs.doubleml.org/stable/api/generated/doubleml.did.datasets.make_did_SZ2020.html)-DGP with $1000$ observations. Learners are only set to boosting, due to time constraints (and the nonlinearity of some of the DGPs).

::: {.callout-note title="Metadata" collapse="true"}

doc/did/did_cs_multi.qmd (14 changes: 7 additions & 7 deletions)
@@ -22,9 +22,9 @@ from utils.style_tables import generate_and_show_styled_table
init_notebook_mode(all_interactive=True)
```

- ## ATTE Coverage
+ ## Coverage

- The simulations are based on the [make_did_cs_CS2021](https://docs.doubleml.org/dev/api/generated/doubleml.did.datasets.make_did_cs_CS2021.html)-DGP with $2000$ observations. Learners are both set to either boosting or a linear (logistic) model. Due to time constraints we only consider the following DGPs:
+ The simulations are based on the [make_did_cs_CS2021](https://docs.doubleml.org/stable/api/generated/doubleml.did.datasets.make_did_cs_CS2021.html)-DGP with $1000$ observations. Learners are both set to either boosting or a linear (logistic) model. Due to time constraints we only consider the following DGPs:

- Type 1: Linear outcome model and treatment assignment
- Type 4: Nonlinear outcome model and treatment assignment
@@ -52,7 +52,7 @@ df = pd.read_csv("../../results/did/did_cs_multi_detailed.csv", index_col=None)
assert df["repetition"].nunique() == 1
n_rep = df["repetition"].unique()[0]

- display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage"]
+ display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_d0_t0", "Loss g_d0_t1", "Loss g_d1_t0", "Loss g_d1_t1", "Loss m"]
```

### Observational Score
@@ -112,7 +112,7 @@ generate_and_show_styled_table(

## Aggregated Effects

- These simulations test different types of aggregation, as described in [DiD User Guide](https://docs.doubleml.org/dev/guide/models.html#difference-in-differences-models-did).
+ These simulations test different types of aggregation, as described in [DiD User Guide](https://docs.doubleml.org/stable/guide/models.html#difference-in-differences-models-did).

The non-uniform results (coverage, ci length and bias) refer to averaged values over all $ATTs$ (point-wise confidence intervals).

@@ -127,7 +127,7 @@ df_group = pd.read_csv("../../results/did/did_cs_multi_group.csv", index_col=Non
assert df_group["repetition"].nunique() == 1
n_rep_group = df_group["repetition"].unique()[0]

- display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage"]
+ display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_d0_t0", "Loss g_d0_t1", "Loss g_d1_t0", "Loss g_d1_t1", "Loss m"]
```

#### Observational Score
@@ -195,7 +195,7 @@ df_time = pd.read_csv("../../results/did/did_cs_multi_time.csv", index_col=None)
assert df_time["repetition"].nunique() == 1
n_rep_time = df_time["repetition"].unique()[0]

- display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage"]
+ display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_d0_t0", "Loss g_d0_t1", "Loss g_d1_t0", "Loss g_d1_t1", "Loss m"]
```

#### Observational Score
@@ -263,7 +263,7 @@ df_es = pd.read_csv("../../results/did/did_cs_multi_eventstudy.csv", index_col=N
assert df_es["repetition"].nunique() == 1
n_rep_es = df_es["repetition"].unique()[0]

- display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage"]
+ display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_d0_t0", "Loss g_d0_t1", "Loss g_d1_t0", "Loss g_d1_t1", "Loss m"]
```

#### Observational Score
doc/did/did_pa.qmd (4 changes: 2 additions & 2 deletions)
@@ -22,9 +22,9 @@ from utils.style_tables import generate_and_show_styled_table
init_notebook_mode(all_interactive=True)
```

- ## ATTE Coverage
+ ## Coverage

- The simulations are based on the the [make_did_SZ2020](https://docs.doubleml.org/stable/api/generated/doubleml.datasets.make_did_SZ2020.html)-DGP with $1000$ observations. Learners are only set to boosting, due to time constraints (and the nonlinearity of some of the DGPs).
+ The simulations are based on the [make_did_SZ2020](https://docs.doubleml.org/stable/api/generated/doubleml.did.datasets.make_did_SZ2020.html)-DGP with $1000$ observations. Learners are only set to boosting, due to time constraints (and the nonlinearity of some of the DGPs).

::: {.callout-note title="Metadata" collapse="true"}

doc/did/did_pa_multi.qmd (205 changes: 198 additions & 7 deletions)
@@ -22,9 +22,9 @@ from utils.style_tables import generate_and_show_styled_table
init_notebook_mode(all_interactive=True)
```

- ## ATTE Coverage
+ ## Coverage

- The simulations are based on the the [make_did_CS2021](https://docs.doubleml.org/dev/api/generated/doubleml.did.datasets.make_did_CS2021.html)-DGP with $2000$ observations. Learners are both set to either boosting or a linear (logistic) model. Due to time constraints we only consider the following DGPs:
+ The simulations are based on the [make_did_CS2021](https://docs.doubleml.org/stable/api/generated/doubleml.did.datasets.make_did_CS2021.html)-DGP with $1000$ observations. Learners are both set to either boosting or a linear (logistic) model. Due to time constraints we only consider the following DGPs:

- Type 1: Linear outcome model and treatment assignment
- Type 4: Nonlinear outcome model and treatment assignment
@@ -52,7 +52,7 @@ df = pd.read_csv("../../results/did/did_pa_multi_detailed.csv", index_col=None)
assert df["repetition"].nunique() == 1
n_rep = df["repetition"].unique()[0]

- display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage"]
+ display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_control", "Loss g_treated", "Loss m"]
```

### Observational Score
@@ -112,7 +112,7 @@ generate_and_show_styled_table(

## Aggregated Effects

- These simulations test different types of aggregation, as described in [DiD User Guide](https://docs.doubleml.org/dev/guide/models.html#difference-in-differences-models-did).
+ These simulations test different types of aggregation, as described in [DiD User Guide](https://docs.doubleml.org/stable/guide/models.html#difference-in-differences-models-did).

The non-uniform results (coverage, ci length and bias) refer to averaged values over all $ATTs$ (point-wise confidence intervals).

@@ -127,7 +127,7 @@ df_group = pd.read_csv("../../results/did/did_pa_multi_group.csv", index_col=Non
assert df_group["repetition"].nunique() == 1
n_rep_group = df_group["repetition"].unique()[0]

- display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage"]
+ display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_control", "Loss g_treated", "Loss m"]
```

#### Observational Score
@@ -195,7 +195,7 @@ df_time = pd.read_csv("../../results/did/did_pa_multi_time.csv", index_col=None)
assert df_time["repetition"].nunique() == 1
n_rep_time = df_time["repetition"].unique()[0]

- display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage"]
+ display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_control", "Loss g_treated", "Loss m"]
```

#### Observational Score
@@ -263,7 +263,7 @@ df_es = pd.read_csv("../../results/did/did_pa_multi_eventstudy.csv", index_col=N
assert df_es["repetition"].nunique() == 1
n_rep_es = df_es["repetition"].unique()[0]

- display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage"]
+ display_columns = ["Learner g", "Learner m", "DGP", "In-sample-norm.", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_control", "Loss g_treated", "Loss m"]
```

#### Observational Score
@@ -320,3 +320,194 @@ generate_and_show_styled_table(
coverage_highlight_cols=["Coverage", "Uniform Coverage"]
)
```


## Tuning

The simulations are based on the [make_did_CS2021](https://docs.doubleml.org/stable/api/generated/doubleml.did.datasets.make_did_CS2021.html)-DGP with $1000$ observations. Due to time constraints, we only consider one learner, use in-sample normalization, and restrict attention to the following DGPs:

- Type 1: Linear outcome model and treatment assignment
- Type 4: Nonlinear outcome model and treatment assignment

The non-uniform results (coverage, ci length and bias) refer to averaged values over all $ATTs$ (point-wise confidence intervals). This comparison is only illustrative, as the untuned version relies on the default learner configuration.
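The tuning step selects each learner's hyperparameters by minimizing a cross-validated nuisance loss (the commit history mentions Optuna-based LightGBM tuning functions). As a schematic, library-free sketch of the underlying idea, with a hypothetical loss surface standing in for the real cross-validated learner loss:

```python
import random

# Hypothetical stand-in for a cross-validated nuisance loss; its minimum
# sits near learning_rate=0.1, n_estimators=200. Illustration only -- the
# actual simulations evaluate learner losses on held-out folds.
def cv_loss(learning_rate, n_estimators):
    return (learning_rate - 0.1) ** 2 + (n_estimators / 1000 - 0.2) ** 2

def random_search(n_trials, seed=0):
    """Sample hyperparameters uniformly and keep the lowest-loss trial."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {
            "learning_rate": rng.uniform(0.01, 0.3),
            "n_estimators": rng.randrange(50, 500),
        }
        loss = cv_loss(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

best_params, best_loss = random_search(n_trials=200)
print(best_params, best_loss)
```

Replacing `cv_loss` with an actual cross-validated learner loss (and the uniform sampler with a smarter one such as Optuna's) recovers the kind of tuning loop used for the simulations; `cv_loss`, `random_search`, and the parameter ranges above are made up for illustration.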

::: {.callout-note title="Metadata" collapse="true"}

```{python}
#| echo: false
metadata_file = '../../results/did/did_pa_multi_tune_metadata.csv'
metadata_df = pd.read_csv(metadata_file)
print(metadata_df.T.to_string(header=False))
```

:::

```{python}
#| echo: false

# set up data
df = pd.read_csv("../../results/did/did_pa_multi_tune_detailed.csv", index_col=None)

assert df["repetition"].nunique() == 1
n_rep = df["repetition"].unique()[0]

display_columns = ["Learner g", "Learner m", "DGP", "Tuned", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_control", "Loss g_treated", "Loss m"]
```

### Observational Score

```{python}
#| echo: false
generate_and_show_styled_table(
main_df=df,
filters={"level": 0.95, "Score": "observational"},
display_cols=display_columns,
n_rep=n_rep,
level_col="level",
coverage_highlight_cols=["Coverage", "Uniform Coverage"]
)
```

```{python}
#| echo: false
generate_and_show_styled_table(
main_df=df,
filters={"level": 0.9, "Score": "observational"},
display_cols=display_columns,
n_rep=n_rep,
level_col="level",
coverage_highlight_cols=["Coverage", "Uniform Coverage"]
)
```

## Tuning Aggregated Effects

These simulations test different types of aggregation, as described in [DiD User Guide](https://docs.doubleml.org/stable/guide/models.html#difference-in-differences-models-did).

As before, we only consider one learner, use in-sample normalization, and restrict attention to the following DGPs:

- Type 1: Linear outcome model and treatment assignment
- Type 4: Nonlinear outcome model and treatment assignment

The non-uniform results (coverage, ci length and bias) refer to averaged values over all $ATTs$ (point-wise confidence intervals). This comparison is only illustrative, as the untuned version relies on the default learner configuration.
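Point-wise coverage here is, in essence, the share of repetitions in which each ATT's confidence interval contains its true effect, averaged across the ATTs; the same averaging applies to CI length. A minimal sketch with made-up numbers (not simulation output):

```python
# Hypothetical true effects and per-repetition results for two ATTs, each
# given as (estimate, ci_lower, ci_upper). Illustrative numbers only.
true_effects = {"ATT(g=2)": 1.0, "ATT(g=3)": 1.5}

repetitions = [
    {"ATT(g=2)": (0.9, 0.5, 1.3), "ATT(g=3)": (1.6, 1.1, 2.1)},
    {"ATT(g=2)": (1.2, 0.8, 1.6), "ATT(g=3)": (1.3, 0.8, 1.8)},
    {"ATT(g=2)": (1.8, 1.4, 2.2), "ATT(g=3)": (1.4, 0.9, 1.9)},  # first CI misses
]

def pointwise_metrics(reps, truths):
    """Coverage and CI length averaged over repetitions, then over ATTs."""
    per_att = {}
    for name, truth in truths.items():
        hits = [lo <= truth <= hi for _, lo, hi in (r[name] for r in reps)]
        lengths = [hi - lo for _, lo, hi in (r[name] for r in reps)]
        per_att[name] = (sum(hits) / len(reps), sum(lengths) / len(reps))
    coverage = sum(c for c, _ in per_att.values()) / len(per_att)
    ci_length = sum(l for _, l in per_att.values()) / len(per_att)
    return coverage, ci_length

coverage, ci_length = pointwise_metrics(repetitions, true_effects)
print(coverage, ci_length)  # 2/3 coverage for the first ATT, 1.0 for the second
```

With these numbers the reported point-wise coverage would be $(2/3 + 1)/2 \approx 0.83$; the uniform metrics instead require all CIs of a simultaneous band to cover at once.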

### Group Effects

```{python}
#| echo: false

# set up data
df_group_tune = pd.read_csv("../../results/did/did_pa_multi_tune_group.csv", index_col=None)

assert df_group_tune["repetition"].nunique() == 1
n_rep_group_tune = df_group_tune["repetition"].unique()[0]

display_columns_tune = ["Learner g", "Learner m", "DGP", "Tuned", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_control", "Loss g_treated", "Loss m"]
```

#### Observational Score

```{python}
#| echo: false
generate_and_show_styled_table(
main_df=df_group_tune,
filters={"level": 0.95, "Score": "observational"},
display_cols=display_columns_tune,
n_rep=n_rep_group_tune,
level_col="level",
coverage_highlight_cols=["Coverage", "Uniform Coverage"]
)
```

```{python}
#| echo: false
generate_and_show_styled_table(
main_df=df_group_tune,
filters={"level": 0.9, "Score": "observational"},
display_cols=display_columns_tune,
n_rep=n_rep_group_tune,
level_col="level",
coverage_highlight_cols=["Coverage", "Uniform Coverage"]
)
```


### Time Effects

```{python}
#| echo: false

# set up data
df_time_tune = pd.read_csv("../../results/did/did_pa_multi_tune_time.csv", index_col=None)

assert df_time_tune["repetition"].nunique() == 1
n_rep_time_tune = df_time_tune["repetition"].unique()[0]

display_columns_tune = ["Learner g", "Learner m", "DGP", "Tuned", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_control", "Loss g_treated", "Loss m"]
```

#### Observational Score

```{python}
#| echo: false
generate_and_show_styled_table(
main_df=df_time_tune,
filters={"level": 0.95, "Score": "observational"},
display_cols=display_columns_tune,
n_rep=n_rep_time_tune,
level_col="level",
coverage_highlight_cols=["Coverage", "Uniform Coverage"]
)
```

```{python}
#| echo: false
generate_and_show_styled_table(
main_df=df_time_tune,
filters={"level": 0.9, "Score": "observational"},
display_cols=display_columns_tune,
n_rep=n_rep_time_tune,
level_col="level",
coverage_highlight_cols=["Coverage", "Uniform Coverage"]
)
```

### Event Study Aggregation

```{python}
#| echo: false

# set up data
df_es_tune = pd.read_csv("../../results/did/did_pa_multi_tune_eventstudy.csv", index_col=None)

assert df_es_tune["repetition"].nunique() == 1
n_rep_es_tune = df_es_tune["repetition"].unique()[0]

display_columns_tune = ["Learner g", "Learner m", "DGP", "Tuned", "Bias", "CI Length", "Coverage", "Uniform CI Length", "Uniform Coverage", "Loss g_control", "Loss g_treated", "Loss m"]
```

#### Observational Score

```{python}
#| echo: false
generate_and_show_styled_table(
main_df=df_es_tune,
filters={"level": 0.95, "Score": "observational"},
display_cols=display_columns_tune,
n_rep=n_rep_es_tune,
level_col="level",
coverage_highlight_cols=["Coverage", "Uniform Coverage"]
)
```

```{python}
#| echo: false
generate_and_show_styled_table(
main_df=df_es_tune,
filters={"level": 0.9, "Score": "observational"},
display_cols=display_columns_tune,
n_rep=n_rep_es_tune,
level_col="level",
coverage_highlight_cols=["Coverage", "Uniform Coverage"]
)
```