From dc101bdd47541aeaa69549f5c4d1131c3c3fa456 Mon Sep 17 00:00:00 2001 From: igerber Date: Sun, 22 Feb 2026 11:54:53 -0500 Subject: [PATCH 1/6] Add continuous DiD tutorial (14_continuous_did.ipynb) Tutorial covering the ContinuousDiD estimator (Callaway, Goodman-Bacon & Sant'Anna 2024) with a job training program example. Includes dose-response curves, event study diagnostics, B-spline configuration, control group comparison, bootstrap inference, and a binary DiD comparison showing information loss from binarizing continuous treatment. Co-Authored-By: Claude Opus 4.6 --- docs/tutorials/14_continuous_did.ipynb | 713 +++++++++++++++++++++++++ 1 file changed, 713 insertions(+) create mode 100644 docs/tutorials/14_continuous_did.ipynb diff --git a/docs/tutorials/14_continuous_did.ipynb b/docs/tutorials/14_continuous_did.ipynb new file mode 100644 index 0000000..1b1b2db --- /dev/null +++ b/docs/tutorials/14_continuous_did.ipynb @@ -0,0 +1,713 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Continuous Difference-in-Differences\n", + "\n", + "**Estimating dose-response effects with heterogeneous treatment intensity**\n", + "\n", + "*Based on Callaway, Goodman-Bacon & Sant'Anna (2024)*\n", + "\n", + "---\n", + "\n", + "Standard difference-in-differences compares treated and untreated groups. But what happens when treatment isn't binary? In a **job training program**, some workers receive 5 hours of training, others get 20 hours, and some get 50. Binary DiD lumps all trained workers together, losing the rich variation in treatment intensity. **Continuous DiD** estimates how earnings respond to each level of training hours \u2014 a full dose-response curve.\n", + "\n", + "### When to use `ContinuousDiD`\n", + "\n", + "- Treatment has **continuous intensity or dose** (hours, dollars, distance, etc.)\n", + "- You want a **dose-response curve** showing how effects vary with dose\n", + "- You need **marginal effects** (what does one more unit of treatment do?)\n", + "- You want to **avoid binarizing** a naturally continuous treatment\n", + "- An **untreated comparison group** exists (never-treated or not-yet-treated units)\n", + "\n", + "### Topics covered\n", + "\n", + "1. Why continuous treatment needs special methods\n", + "2. Data setup and the dose distribution\n", + "3. Basic estimation with `ContinuousDiD`\n", + "4. Dose-response curves: ATT(d) and ACRT(d)\n", + "5. Event study diagnostics\n", + "6. Advanced features: splines, control groups, bootstrap\n", + "7. Comparison to binary DiD (information loss from binarizing)\n", + "8. Results export and group-time effects\n", + "\n", + "**Prerequisites:** Familiarity with standard DiD ([Tutorial 02: Staggered DiD](02_staggered_did.ipynb)) and parallel trends ([Tutorial 04](04_parallel_trends.ipynb))." + ], + "id": "b1334a82" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "from diff_diff import ContinuousDiD, CallawaySantAnna, generate_continuous_did_data\n", + "\n", + "try:\n", + " import matplotlib.pyplot as plt\n", + " plt.style.use('seaborn-v0_8-whitegrid')\n", + " HAS_MATPLOTLIB = True\n", + "except ImportError:\n", + " HAS_MATPLOTLIB = False\n", + " print(\"matplotlib not installed - visualization examples will be skipped\")" + ], + "id": "37328298" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Why Continuous Treatment Needs Special Methods\n", + "\n", + "Consider a job training program. Workers receive different amounts of training, and we want to know: **does more training lead to higher earnings?** Three problems arise with naive approaches:\n", + "\n", + "**Problem 1: Binarizing loses information.** Collapsing 5 hours vs. 20 hours vs. 50 hours into \"trained / untrained\" tells you whether *any* training helps, but not whether *more* training helps more. You lose the dose-response relationship entirely.\n", + "\n", + "**Problem 2: TWFE with continuous treatment is biased.** Plugging a continuous treatment variable into a standard two-way fixed effects regression can produce biased estimates due to negative weights, contamination bias, and scale dependence \u2014 the same issues that plague binary TWFE in staggered settings, but compounded.\n", + "\n", + "**Problem 3: You need new estimands.** With continuous treatment, we need to distinguish:\n", + "\n", + "| Parameter | Meaning (training example) |\n", + "|-----------|---------------------------|\n", + "| ATT(d) | Total earnings gain from *d* hours of training vs. no training |\n", + "| ACRT(d) | Earnings gain from **one additional hour** at level *d* (marginal effect) |\n", + "| ATT_glob | Average effect of any training vs. none (across all dose levels) |\n", + "| ACRT_glob | Average marginal return to an extra hour of training |\n", + "\n", + "The `ContinuousDiD` estimator from Callaway, Goodman-Bacon & Sant'Anna (2024) handles all three problems. It estimates the full dose-response curve using B-splines and provides valid inference." + ], + "id": "c1df897c" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Identifying assumptions\n", + "\n", + "The continuous DiD framework relies on two levels of parallel trends:\n", + "\n", + "- **Standard parallel trends:** Untreated potential outcomes evolve identically regardless of treatment dose. This is the same assumption as binary DiD, extended to all dose levels. Under standard PT, the estimator identifies the **binarized** ATT (treated vs. untreated) and local effects.\n", + "\n", + "- **Strong parallel trends:** Additionally, there is no selection into dose based on potential gains. That is, workers who chose 50 hours of training would have had the same earnings trajectory as workers who chose 5 hours, had they both remained untreated. Under strong PT, the full **dose-response curve** ATT(d) is identified, enabling cross-dose comparisons.\n", + "\n", + "Strong parallel trends is needed for the dose-response curve; standard PT only identifies the overall (binarized) effect." + ], + "id": "b3c2a66d" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Data Setup\n", + "\n", + "We generate a balanced panel of workers. Some are assigned to a training cohort (treated in period 2), while 30% are never treated. Each trained worker receives a dose (hours) drawn from a log-normal distribution. The true treatment effect is **ATT(d) = 1 + 2d** \u2014 each hour of training adds \\$2 to earnings, plus a base effect of \\$1 for any training at all." + ], + "id": "b00d92bc" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data = generate_continuous_did_data(\n", + " n_units=500,\n", + " n_periods=4,\n", + " cohort_periods=[2],\n", + " never_treated_frac=0.3,\n", + " att_function=\"linear\",\n", + " att_intercept=1.0,\n", + " att_slope=2.0,\n", + " seed=42,\n", + ")\n", + "\n", + "print(f\"Shape: {data.shape}\")\n", + "print(f\"Units: {data['unit'].nunique()}, Periods: {data['period'].nunique()}\")\n", + "print(f\"Treatment cohorts: {sorted(data.loc[data['first_treat'] > 0, 'first_treat'].unique())}\")\n", + "print(f\"Never-treated units: {(data.groupby('unit')['first_treat'].first() == 0).sum()}\")\n", + "print()\n", + "data.head(10)" + ], + "id": "06bc694e" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Dose distribution for treated units\n", + "treated_doses = data.loc[data['first_treat'] > 0].groupby('unit')['dose'].first()\n", + "\n", + "print(\"Dose summary (treated units only):\")\n", + "print(f\" Min: {treated_doses.min():.2f}\")\n", + "print(f\" Max: {treated_doses.max():.2f}\")\n", + "print(f\" Mean: {treated_doses.mean():.2f}\")\n", + "print(f\" Median: {treated_doses.median():.2f}\")\n", + "\n", + "if HAS_MATPLOTLIB:\n", + " fig, axes = plt.subplots(1, 2, figsize=(12, 4))\n", + "\n", + " # Left: dose histogram\n", + " axes[0].hist(treated_doses, bins=30, edgecolor='white', alpha=0.8, color='steelblue')\n", + " axes[0].set_xlabel('Training hours (dose)')\n", + " axes[0].set_ylabel('Number of workers')\n", + " axes[0].set_title('Distribution of Training Hours')\n", + "\n", + " # Right: true ATT(d) curve\n", + " d_grid = np.linspace(treated_doses.min(), treated_doses.max(), 100)\n", + " axes[1].plot(d_grid, 1.0 + 2.0 * d_grid, 'k-', linewidth=2)\n", + " axes[1].set_xlabel('Training hours (dose)')\n", + " axes[1].set_ylabel('True ATT(d)')\n", + " axes[1].set_title('True Dose-Response: ATT(d) = 1 + 2d')\n", + "\n", + " plt.tight_layout()\n", + " plt.show()" + ], + "id": "94b82787" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Basic Estimation\n", + "\n", + "The `ContinuousDiD` estimator follows an sklearn-like API. Call `fit()` with column names for outcome, unit, time, first treatment period, and dose. Setting `aggregate=\"dose\"` computes the dose-response curve and global summary parameters." + ], + "id": "9c685ba3" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "est = ContinuousDiD(seed=42)\n", + "results = est.fit(\n", + " data,\n", + " outcome=\"outcome\",\n", + " unit=\"unit\",\n", + " time=\"period\",\n", + " first_treat=\"first_treat\",\n", + " dose=\"dose\",\n", + " aggregate=\"dose\",\n", + ")\n", + "\n", + "print(results.summary())" + ], + "id": "97b14ca8" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Compare estimated vs. true parameters\n", + "true_att_glob = 1.0 + 2.0 * treated_doses.mean()\n", + "true_acrt_glob = 2.0\n", + "\n", + "print(\"Parameter comparison:\")\n", + "print(f\" ATT_glob \u2014 estimated: {results.overall_att:.4f}, true: {true_att_glob:.4f}, \"\n", + " f\"bias: {results.overall_att - true_att_glob:.4f}\")\n", + "print(f\" ACRT_glob \u2014 estimated: {results.overall_acrt:.4f}, true: {true_acrt_glob:.4f}, \"\n", + " f\"bias: {results.overall_acrt - true_acrt_glob:.4f}\")" + ], + "id": "11d4fa22" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Interpreting the global parameters\n", + "\n", + "- **ATT_glob** is the average total effect of training (at the observed dose levels) compared to no training. It answers: \"On average, how much more do trained workers earn?\"\n", + "- **ACRT_glob** is the average marginal return to an additional hour of training. It answers: \"What is one more hour of training worth, on average?\"\n", + "\n", + "Analytical standard errors are the default (fast, no resampling). For bootstrap-based inference, set `n_bootstrap` (see Section 6)." + ], + "id": "1b80dae6" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Dose-Response Curves\n", + "\n", + "The key advantage of continuous DiD is the **dose-response curve** \u2014 how the treatment effect varies across dose levels. The estimator evaluates ATT(d) and ACRT(d) on a grid of dose values (by default, percentiles from P10 to P99 of the observed dose distribution).\n", + "\n", + "Access the curves via `results.dose_response_att` and `results.dose_response_acrt`, which are `DoseResponseCurve` objects with attributes `dose_grid`, `effects`, `se`, `conf_int_lower`, and `conf_int_upper`." + ], + "id": "8214ff79" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Tabular view of the ATT dose-response curve\n", + "att_df = results.dose_response_att.to_dataframe()\n", + "print(\"ATT(d) dose-response curve:\")\n", + "print(att_df.head(10).to_string(index=False))" + ], + "id": "5ef9b240" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if HAS_MATPLOTLIB:\n", + " fig, axes = plt.subplots(1, 2, figsize=(13, 5))\n", + "\n", + " # ATT(d)\n", + " att = results.dose_response_att\n", + " axes[0].plot(att.dose_grid, att.effects, 'b-', linewidth=2, label='Estimated ATT(d)')\n", + " axes[0].fill_between(att.dose_grid, att.conf_int_lower, att.conf_int_upper,\n", + " alpha=0.2, color='blue')\n", + " axes[0].plot(att.dose_grid, 1.0 + 2.0 * att.dose_grid, 'k--', linewidth=1.5,\n", + " label='True ATT(d)')\n", + " axes[0].set_xlabel('Training hours (dose)')\n", + " axes[0].set_ylabel('ATT(d)')\n", + " axes[0].set_title('Dose-Response: Total Effect')\n", + " axes[0].legend()\n", + "\n", + " # ACRT(d)\n", + " acrt = results.dose_response_acrt\n", + " axes[1].plot(acrt.dose_grid, acrt.effects, 'r-', linewidth=2, label='Estimated ACRT(d)')\n", + " axes[1].fill_between(acrt.dose_grid, acrt.conf_int_lower, acrt.conf_int_upper,\n", + " alpha=0.2, color='red')\n", + " axes[1].axhline(2.0, color='k', linestyle='--', linewidth=1.5, label='True ACRT = 2.0')\n", + " axes[1].set_xlabel('Training hours (dose)')\n", + " axes[1].set_ylabel('ACRT(d)')\n", + " axes[1].set_title('Dose-Response: Marginal Effect')\n", + " axes[1].legend()\n", + "\n", + " plt.tight_layout()\n", + " plt.show()" + ], + "id": "00c21a0a" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Interpreting the dose-response curves\n", + "\n", + "- **Left panel (ATT):** The total effect of training rises roughly linearly with dose, closely tracking the true curve (dashed). A worker with 3 hours of training gains about \\$7 in earnings; a worker with 5 hours gains about \\$11.\n", + "- **Right panel (ACRT):** The marginal return to one additional hour is approximately constant at \\$2, matching the true DGP. The confidence band is wider at extreme doses where fewer workers are observed.\n", + "\n", + "These curves require the **strong parallel trends** assumption. If you only trust standard PT, the global parameters (ATT_glob, ACRT_glob) are still valid \u2014 you just can't compare effects across dose levels." + ], + "id": "2000eb4a" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. Event Study Diagnostics\n", + "\n", + "An event study aggregation binarizes treatment (treated vs. untreated) and shows effects by relative time period. This is useful for **pre-trends diagnostics**: pre-treatment coefficients should be near zero if parallel trends holds.\n", + "\n", + "Event study requires data with multiple cohorts (to have variation in event-time). We generate a new dataset with two treatment cohorts." + ], + "id": "c88d540c" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Generate multi-cohort data for event study\n", + "data_es = generate_continuous_did_data(\n", + " n_units=500,\n", + " n_periods=6,\n", + " cohort_periods=[3, 5],\n", + " never_treated_frac=0.3,\n", + " att_function=\"linear\",\n", + " att_intercept=1.0,\n", + " att_slope=2.0,\n", + " seed=42,\n", + ")\n", + "\n", + "est_es = ContinuousDiD(n_bootstrap=199, seed=42)\n", + "results_es = est_es.fit(\n", + " data_es,\n", + " outcome=\"outcome\",\n", + " unit=\"unit\",\n", + " time=\"period\",\n", + " first_treat=\"first_treat\",\n", + " dose=\"dose\",\n", + " aggregate=\"eventstudy\",\n", + ")\n", + "\n", + "# Event study table\n", + "es_df = results_es.to_dataframe(level=\"event_study\")\n", + "es_df['pre_period'] = es_df['relative_period'] < 0\n", + "print(\"Event Study Effects (binarized ATT by relative period):\")\n", + "print(es_df.to_string(index=False))" + ], + "id": "1ad562d7" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if HAS_MATPLOTLIB:\n", + " fig, ax = plt.subplots(figsize=(8, 5))\n", + "\n", + " rel = es_df['relative_period'].values\n", + " att = es_df['att_glob'].values\n", + " ci_lo = es_df['conf_int_lower'].values\n", + " ci_hi = es_df['conf_int_upper'].values\n", + "\n", + " ax.errorbar(rel, att, yerr=[att - ci_lo, ci_hi - att],\n", + " fmt='o', capsize=4, color='steelblue', markersize=6, linewidth=1.5)\n", + " ax.axhline(0, color='gray', linestyle='-', linewidth=0.8)\n", + " ax.axvline(-0.5, color='gray', linestyle='--', linewidth=0.8, label='Treatment onset')\n", + " ax.set_xlabel('Periods relative to treatment')\n", + " ax.set_ylabel('ATT (binarized)')\n", + " ax.set_title('Event Study: Pre-Trends Diagnostic')\n", + " ax.legend()\n", + "\n", + " plt.tight_layout()\n", + " plt.show()" + ], + "id": "61c7784e" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6. Advanced Features\n", + "\n", + "### B-spline configuration\n", + "\n", + "The estimator approximates the dose-response function using B-splines. Two parameters control the flexibility:\n", + "\n", + "- **`degree`** controls smoothness: `degree=1` gives a piecewise-linear fit, `degree=3` (default) gives a smooth cubic curve.\n", + "- **`num_knots`** adds interior knots for additional flexibility. With 0 knots (default), the spline is a single polynomial segment. Adding knots lets the curve bend at more points, useful for non-linear dose-response relationships.\n", + "\n", + "Let's see how these settings affect estimation on data with a **log dose-response**: ATT(d) = 1 + 2 log(1 + d)." + ], + "id": "7b22b542" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Generate data with log dose-response\n", + "data_log = generate_continuous_did_data(\n", + " n_units=500,\n", + " n_periods=4,\n", + " cohort_periods=[2],\n", + " never_treated_frac=0.3,\n", + " att_function=\"log\",\n", + " att_intercept=1.0,\n", + " att_slope=2.0,\n", + " seed=42,\n", + ")\n", + "\n", + "# Compare three B-spline configurations\n", + "configs = [\n", + " {\"label\": \"Linear (degree=1, knots=0)\", \"degree\": 1, \"num_knots\": 0},\n", + " {\"label\": \"Cubic (degree=3, knots=0)\", \"degree\": 3, \"num_knots\": 0},\n", + " {\"label\": \"Cubic + knots (degree=3, knots=2)\", \"degree\": 3, \"num_knots\": 2},\n", + "]\n", + "\n", + "spline_results = {}\n", + "print(f\"{'Configuration':<35} {'ATT_glob':>10} {'ACRT_glob':>11}\")\n", + "print(\"-\" * 58)\n", + "for cfg in configs:\n", + " est_cfg = ContinuousDiD(degree=cfg[\"degree\"], num_knots=cfg[\"num_knots\"], seed=42)\n", + " res_cfg = est_cfg.fit(data_log, \"outcome\", \"unit\", \"period\", \"first_treat\", \"dose\",\n", + " aggregate=\"dose\")\n", + " spline_results[cfg[\"label\"]] = res_cfg\n", + " print(f\"{cfg['label']:<35} {res_cfg.overall_att:>10.4f} {res_cfg.overall_acrt:>11.4f}\")" + ], + "id": "ebd370a0" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if HAS_MATPLOTLIB:\n", + " fig, ax = plt.subplots(figsize=(8, 5))\n", + "\n", + " colors = ['#e74c3c', '#3498db', '#2ecc71']\n", + " for (label, res), color in zip(spline_results.items(), colors):\n", + " att = res.dose_response_att\n", + " ax.plot(att.dose_grid, att.effects, color=color, linewidth=2, label=label)\n", + "\n", + " # True curve\n", + " log_doses = data_log.loc[data_log['first_treat'] > 0].groupby('unit')['dose'].first()\n", + " d_grid = np.linspace(log_doses.min(), log_doses.max(), 100)\n", + " ax.plot(d_grid, 1.0 + 2.0 * np.log1p(d_grid), 'k--', linewidth=2, label='True: 1 + 2 log(1+d)')\n", + "\n", + " ax.set_xlabel('Training hours (dose)')\n", + " ax.set_ylabel('ATT(d)')\n", + " ax.set_title('B-Spline Configuration: Effect on Dose-Response Fit')\n", + " ax.legend(fontsize=9)\n", + "\n", + " plt.tight_layout()\n", + " plt.show()" + ], + "id": "141829f9" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Control group options\n", + "\n", + "The `control_group` parameter selects which untreated units serve as the comparison group:\n", + "\n", + "- **`\"never_treated\"`** (default): Only units that are never treated. Stronger assumption (parallel trends must hold for all periods) but avoids contamination from units that are treated later.\n", + "- **`\"not_yet_treated\"`**: Units not yet treated at time *t*. Weaker assumption and more data, but treated-later units may already be adjusting behavior in anticipation." + ], + "id": "b2eb2189" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Multi-cohort data for control group comparison\n", + "data_mc = generate_continuous_did_data(\n", + " n_units=500,\n", + " n_periods=6,\n", + " cohort_periods=[3, 5],\n", + " never_treated_frac=0.3,\n", + " att_function=\"linear\",\n", + " att_intercept=1.0,\n", + " att_slope=2.0,\n", + " seed=42,\n", + ")\n", + "\n", + "print(f\"{'Control group':<20} {'ATT_glob':>10} {'SE':>10} {'ACRT_glob':>11} {'SE':>10}\")\n", + "print(\"-\" * 63)\n", + "for cg in [\"never_treated\", \"not_yet_treated\"]:\n", + " est_cg = ContinuousDiD(control_group=cg, seed=42)\n", + " res_cg = est_cg.fit(data_mc, \"outcome\", \"unit\", \"period\", \"first_treat\", \"dose\",\n", + " aggregate=\"dose\")\n", + " print(f\"{cg:<20} {res_cg.overall_att:>10.4f} {res_cg.overall_att_se:>10.4f} \"\n", + " f\"{res_cg.overall_acrt:>11.4f} {res_cg.overall_acrt_se:>10.4f}\")" + ], + "id": "990c4782" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Bootstrap inference\n", + "\n", + "By default, `ContinuousDiD` computes **analytical standard errors** (fast, no resampling). For more robust inference, enable the **multiplier bootstrap** by setting `n_bootstrap`. The bootstrap perturbs unit-level influence functions using random weights.\n", + "\n", + "Weight options:\n", + "- `\"rademacher\"` (default): \u00b11 with equal probability. Good for most cases.\n", + "- `\"mammen\"`: Two-point distribution matching first 3 moments.\n", + "- `\"webb\"`: Six-point distribution, recommended for very few clusters (<10).\n", + "\n", + "We recommend `n_bootstrap >= 199` for reliable inference." + ], + "id": "ce4cc5e1" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Analytical SEs (default)\n", + "est_ana = ContinuousDiD(seed=42)\n", + "res_ana = est_ana.fit(data, \"outcome\", \"unit\", \"period\", \"first_treat\", \"dose\",\n", + " aggregate=\"dose\")\n", + "\n", + "# Bootstrap SEs\n", + "est_boot = ContinuousDiD(n_bootstrap=199, seed=42)\n", + "res_boot = est_boot.fit(data, \"outcome\", \"unit\", \"period\", \"first_treat\", \"dose\",\n", + " aggregate=\"dose\")\n", + "\n", + "print(f\"{'Method':<15} {'ATT_glob':>10} {'SE':>10} {'95% CI':>25}\")\n", + "print(\"-\" * 62)\n", + "for label, r in [(\"Analytical\", res_ana), (\"Bootstrap\", res_boot)]:\n", + " ci = r.overall_att_conf_int\n", + " print(f\"{label:<15} {r.overall_att:>10.4f} {r.overall_att_se:>10.4f} \"\n", + " f\"[{ci[0]:>8.4f}, {ci[1]:>8.4f}]\")\n", + "\n", + "print()\n", + "print(f\"{'Method':<15} {'ACRT_glob':>10} {'SE':>10} {'95% CI':>25}\")\n", + "print(\"-\" * 62)\n", + "for label, r in [(\"Analytical\", res_ana), (\"Bootstrap\", res_boot)]:\n", + " ci = r.overall_acrt_conf_int\n", + " print(f\"{label:<15} {r.overall_acrt:>10.4f} {r.overall_acrt_se:>10.4f} \"\n", + " f\"[{ci[0]:>8.4f}, {ci[1]:>8.4f}]\")" + ], + "id": "eb32265b" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 7. Comparison to Binary DiD\n", + "\n", + "What if we ignore dose entirely and just run a standard binary Callaway-Sant'Anna estimator? Both approaches should give a similar **overall ATT** (treated vs. untreated), but the binary approach discards all dose information \u2014 no dose-response curve, no marginal effects.\n", + "\n", + "Note: the ATT_glob values may differ slightly because `ContinuousDiD` uses B-spline estimation while `CallawaySantAnna` uses simple mean differences. The continuous approach provides the full dose-response curve on top of the overall effect." + ], + "id": "a5230e69" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Binary DiD: Callaway-Sant'Anna (ignores dose)\n", + "cs = CallawaySantAnna(control_group=\"never_treated\")\n", + "results_binary = cs.fit(\n", + " data,\n", + " outcome=\"outcome\",\n", + " unit=\"unit\",\n", + " time=\"period\",\n", + " first_treat=\"first_treat\",\n", + ")\n", + "\n", + "print(f\"{'Estimator':<25} {'Overall ATT':>12} {'SE':>10}\")\n", + "print(\"-\" * 49)\n", + "print(f\"{'ContinuousDiD':<25} {results.overall_att:>12.4f} {results.overall_att_se:>10.4f}\")\n", + "print(f\"{'CallawaySantAnna':<25} {results_binary.overall_att:>12.4f} {results_binary.overall_se:>10.4f}\")\n", + "print()\n", + "print(\"Binary DiD gives one number. Continuous DiD gives the full dose-response curve\")\n", + "print(\"plus marginal effects (ACRT) \u2014 without losing any information.\")" + ], + "id": "d3303127" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if HAS_MATPLOTLIB:\n", + " fig, ax = plt.subplots(figsize=(9, 5))\n", + "\n", + " # Dose-response curve with CI\n", + " att = results.dose_response_att\n", + " ax.plot(att.dose_grid, att.effects, 'b-', linewidth=2, label='Continuous DiD: ATT(d)')\n", + " ax.fill_between(att.dose_grid, att.conf_int_lower, att.conf_int_upper,\n", + " alpha=0.15, color='blue')\n", + "\n", + " # Binary ATT (flat line)\n", + " ax.axhline(results_binary.overall_att, color='red', linestyle='--', linewidth=2,\n", + " label=f'Binary DiD: ATT = {results_binary.overall_att:.2f}')\n", + "\n", + " # True curve\n", + " ax.plot(att.dose_grid, 1.0 + 2.0 * att.dose_grid, 'k:', linewidth=2,\n", + " label='True: ATT(d) = 1 + 2d')\n", + "\n", + " ax.set_xlabel('Training hours (dose)')\n", + " ax.set_ylabel('Effect on earnings')\n", + " ax.set_title('Continuous vs. Binary DiD: Information Loss from Binarizing')\n", + " ax.legend()\n", + "\n", + " plt.tight_layout()\n", + " plt.show()\n", + "\n", + " print(\"The red dashed line (binary DiD) collapses the entire dose-response curve\")\n", + " print(\"into a single number, losing the relationship between dose and effect.\")" + ], + "id": "40de2d0e" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8. Results Export and Group-Time Effects\n", + "\n", + "Results can be exported to DataFrames at different levels of aggregation using `to_dataframe()`:\n", + "\n", + "- **`level=\"dose_response\"`** (default): The dose-response curve with ATT(d) and ACRT(d)\n", + "- **`level=\"group_time\"`**: Underlying group-time cell estimates\n", + "- **`level=\"event_study\"`**: Event study effects (only available when fitted with `aggregate=\"eventstudy\"`)" + ], + "id": "b5af7533" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Dose-response level\n", + "print(\"Dose-response level:\")\n", + "dr_df = results.to_dataframe(level=\"dose_response\")\n", + "print(dr_df.head().to_string(index=False))\n", + "\n", + "print()\n", + "\n", + "# Group-time level\n", + "print(\"Group-time level:\")\n", + "gt_df = results.to_dataframe(level=\"group_time\")\n", + "print(gt_df.to_string(index=False))" + ], + "id": "91f0a878" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 9. Summary\n", + "\n", + "### Key takeaways\n", + "\n", + "- **Continuous DiD** estimates dose-response effects without binarizing treatment intensity\n", + "- **ATT(d)** gives the total effect at dose *d*; **ACRT(d)** gives the marginal effect\n", + "- **ATT_glob** and **ACRT_glob** summarize the overall and marginal effects\n", + "- **B-splines** approximate the dose-response flexibly; increase `degree` and `num_knots` for non-linear patterns\n", + "- **Event study** diagnostics work the same as in binary DiD (pre-period coefficients near zero)\n", + "- Binarizing continuous treatment into binary DiD **loses information** \u2014 continuous DiD recovers the full curve\n", + "\n", + "### Parameter reference\n", + "\n", + "| Parameter | Default | Description |\n", + "|-----------|---------|-------------|\n", + "| `degree` | 3 | B-spline polynomial degree (1 = linear, 3 = cubic) |\n", + "| `num_knots` | 0 | Interior knots for spline flexibility |\n", + "| `control_group` | `\"never_treated\"` | `\"never_treated\"` or `\"not_yet_treated\"` |\n", + "| `anticipation` | 0 | Periods of treatment anticipation |\n", + "| `base_period` | `\"varying\"` | `\"varying\"` or `\"universal\"` |\n", + "| `n_bootstrap` | 0 | Bootstrap iterations (0 = analytical SEs only) |\n", + "| `bootstrap_weights` | `\"rademacher\"` | `\"rademacher\"`, `\"mammen\"`, or `\"webb\"` |\n", + "| `seed` | `None` | Random seed for reproducibility |\n", + "| `aggregate` | `None` | `\"dose\"` for dose-response, `\"eventstudy\"` for event study |\n", + "\n", + "### Reference\n", + "\n", + "Callaway, B., Goodman-Bacon, A., & Sant'Anna, P. H. C. (2024). Difference-in-Differences with a Continuous Treatment. [arXiv:2107.02637](https://arxiv.org/abs/2107.02637)." + ], + "id": "315c7664" + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file From f24f1cbedd2bbff788bdbad5dff8616c0cbebbf2 Mon Sep 17 00:00:00 2001 From: igerber Date: Sun, 22 Feb 2026 12:19:56 -0500 Subject: [PATCH 2/6] Fix methodology inaccuracies in continuous DiD tutorial Address PR #178 review feedback: correct PT vs SPT identification claims, fix dose-response interpretation caveat, fix ATT_glob difference explanation, add assumption caveat to global parameters, and soften event study cohort requirement. Co-Authored-By: Claude Opus 4.6 --- docs/tutorials/14_continuous_did.ipynb | 164 +++++++++++++------------ 1 file changed, 83 insertions(+), 81 deletions(-) diff --git a/docs/tutorials/14_continuous_did.ipynb b/docs/tutorials/14_continuous_did.ipynb index 1b1b2db..1626970 100644 --- a/docs/tutorials/14_continuous_did.ipynb +++ b/docs/tutorials/14_continuous_did.ipynb @@ -2,6 +2,7 @@ "cells": [ { "cell_type": "markdown", + "id": "b1334a82", "metadata": {}, "source": [ "# Continuous Difference-in-Differences\n", @@ -12,7 +13,7 @@ "\n", "---\n", "\n", - "Standard difference-in-differences compares treated and untreated groups. But what happens when treatment isn't binary? In a **job training program**, some workers receive 5 hours of training, others get 20 hours, and some get 50. Binary DiD lumps all trained workers together, losing the rich variation in treatment intensity. **Continuous DiD** estimates how earnings respond to each level of training hours \u2014 a full dose-response curve.\n", + "Standard difference-in-differences compares treated and untreated groups. But what happens when treatment isn't binary? In a **job training program**, some workers receive 5 hours of training, others get 20 hours, and some get 50. Binary DiD lumps all trained workers together, losing the rich variation in treatment intensity. **Continuous DiD** estimates how earnings respond to each level of training hours — a full dose-response curve.\n", "\n", "### When to use `ContinuousDiD`\n", "\n", @@ -34,12 +35,12 @@ "8. Results export and group-time effects\n", "\n", "**Prerequisites:** Familiarity with standard DiD ([Tutorial 02: Staggered DiD](02_staggered_did.ipynb)) and parallel trends ([Tutorial 04](04_parallel_trends.ipynb))." - ], - "id": "b1334a82" + ] }, { "cell_type": "code", "execution_count": null, + "id": "37328298", "metadata": {}, "outputs": [], "source": [ @@ -54,11 +55,11 @@ "except ImportError:\n", " HAS_MATPLOTLIB = False\n", " print(\"matplotlib not installed - visualization examples will be skipped\")" - ], - "id": "37328298" + ] }, { "cell_type": "markdown", + "id": "c1df897c", "metadata": {}, "source": [ "## 1. Why Continuous Treatment Needs Special Methods\n", @@ -67,7 +68,7 @@ "\n", "**Problem 1: Binarizing loses information.** Collapsing 5 hours vs. 20 hours vs. 50 hours into \"trained / untrained\" tells you whether *any* training helps, but not whether *more* training helps more. You lose the dose-response relationship entirely.\n", "\n", - "**Problem 2: TWFE with continuous treatment is biased.** Plugging a continuous treatment variable into a standard two-way fixed effects regression can produce biased estimates due to negative weights, contamination bias, and scale dependence \u2014 the same issues that plague binary TWFE in staggered settings, but compounded.\n", + "**Problem 2: TWFE with continuous treatment is biased.** Plugging a continuous treatment variable into a standard two-way fixed effects regression can produce biased estimates due to negative weights, contamination bias, and scale dependence — the same issues that plague binary TWFE in staggered settings, but compounded.\n", "\n", "**Problem 3: You need new estimands.** With continuous treatment, we need to distinguish:\n", "\n", @@ -79,38 +80,38 @@ "| ACRT_glob | Average marginal return to an extra hour of training |\n", "\n", "The `ContinuousDiD` estimator from Callaway, Goodman-Bacon & Sant'Anna (2024) handles all three problems. It estimates the full dose-response curve using B-splines and provides valid inference." - ], - "id": "c1df897c" + ] }, { "cell_type": "markdown", + "id": "b3c2a66d", "metadata": {}, "source": [ "### Identifying assumptions\n", "\n", "The continuous DiD framework relies on two levels of parallel trends:\n", "\n", - "- **Standard parallel trends:** Untreated potential outcomes evolve identically regardless of treatment dose. This is the same assumption as binary DiD, extended to all dose levels. Under standard PT, the estimator identifies the **binarized** ATT (treated vs. untreated) and local effects.\n", + "- **Standard parallel trends:** Untreated potential outcomes evolve identically regardless of treatment dose. This is the same assumption as binary DiD, extended to all dose levels. Under standard PT, the estimator identifies the effect of dose *d* on units who actually received dose *d* — and the local average of that effect across dose groups. However, you cannot compare effects across different dose levels or give a causal interpretation to the dose-response curve.\n", "\n", - "- **Strong parallel trends:** Additionally, there is no selection into dose based on potential gains. That is, workers who chose 50 hours of training would have had the same earnings trajectory as workers who chose 5 hours, had they both remained untreated. Under strong PT, the full **dose-response curve** ATT(d) is identified, enabling cross-dose comparisons.\n", + "- **Strong parallel trends:** Additionally, there is no selection into dose based on potential gains. That is, workers who chose 50 hours of training would have had the same earnings trajectory as workers who chose 5 hours, had they both remained untreated. Under strong PT, the full **dose-response curve** ATT(d) is identified, cross-dose comparisons are valid, and all ACRT parameters (both dose-specific and global) have a causal interpretation.\n", "\n", - "Strong parallel trends is needed for the dose-response curve; standard PT only identifies the overall (binarized) effect." - ], - "id": "b3c2a66d" + "Strong parallel trends is needed for the dose-response curve and all marginal effect (ACRT) parameters; standard PT only identifies the binarized overall effect and its local average." + ] }, { "cell_type": "markdown", + "id": "b00d92bc", "metadata": {}, "source": [ "## 2. Data Setup\n", "\n", - "We generate a balanced panel of workers. Some are assigned to a training cohort (treated in period 2), while 30% are never treated. Each trained worker receives a dose (hours) drawn from a log-normal distribution. The true treatment effect is **ATT(d) = 1 + 2d** \u2014 each hour of training adds \\$2 to earnings, plus a base effect of \\$1 for any training at all." - ], - "id": "b00d92bc" + "We generate a balanced panel of workers. Some are assigned to a training cohort (treated in period 2), while 30% are never treated. Each trained worker receives a dose (hours) drawn from a log-normal distribution. The true treatment effect is **ATT(d) = 1 + 2d** — each hour of training adds \\$2 to earnings, plus a base effect of \\$1 for any training at all." + ] }, { "cell_type": "code", "execution_count": null, + "id": "06bc694e", "metadata": {}, "outputs": [], "source": [ @@ -131,12 +132,12 @@ "print(f\"Never-treated units: {(data.groupby('unit')['first_treat'].first() == 0).sum()}\")\n", "print()\n", "data.head(10)" - ], - "id": "06bc694e" + ] }, { "cell_type": "code", "execution_count": null, + "id": "94b82787", "metadata": {}, "outputs": [], "source": [ @@ -167,22 +168,22 @@ "\n", " plt.tight_layout()\n", " plt.show()" - ], - "id": "94b82787" + ] }, { "cell_type": "markdown", + "id": "9c685ba3", "metadata": {}, "source": [ "## 3. Basic Estimation\n", "\n", "The `ContinuousDiD` estimator follows an sklearn-like API. Call `fit()` with column names for outcome, unit, time, first treatment period, and dose. Setting `aggregate=\"dose\"` computes the dose-response curve and global summary parameters." - ], - "id": "9c685ba3" + ] }, { "cell_type": "code", "execution_count": null, + "id": "97b14ca8", "metadata": {}, "outputs": [], "source": [ @@ -198,12 +199,12 @@ ")\n", "\n", "print(results.summary())" - ], - "id": "97b14ca8" + ] }, { "cell_type": "code", "execution_count": null, + "id": "11d4fa22", "metadata": {}, "outputs": [], "source": [ @@ -212,15 +213,15 @@ "true_acrt_glob = 2.0\n", "\n", "print(\"Parameter comparison:\")\n", - "print(f\" ATT_glob \u2014 estimated: {results.overall_att:.4f}, true: {true_att_glob:.4f}, \"\n", + "print(f\" ATT_glob — estimated: {results.overall_att:.4f}, true: {true_att_glob:.4f}, \"\n", " f\"bias: {results.overall_att - true_att_glob:.4f}\")\n", - "print(f\" ACRT_glob \u2014 estimated: {results.overall_acrt:.4f}, true: {true_acrt_glob:.4f}, \"\n", + "print(f\" ACRT_glob — estimated: {results.overall_acrt:.4f}, true: {true_acrt_glob:.4f}, \"\n", " f\"bias: {results.overall_acrt - true_acrt_glob:.4f}\")" - ], - "id": "11d4fa22" + ] }, { "cell_type": "markdown", + "id": "1b80dae6", "metadata": {}, "source": [ "### Interpreting the global parameters\n", @@ -228,25 +229,27 @@ "- **ATT_glob** is the average total effect of training (at the observed dose levels) compared to no training. It answers: \"On average, how much more do trained workers earn?\"\n", "- **ACRT_glob** is the average marginal return to an additional hour of training. It answers: \"What is one more hour of training worth, on average?\"\n", "\n", + "Note: ACRT_glob and the dose-response interpretation of ATT_glob require strong parallel trends (see assumptions above). Under standard PT, the binarized overall effect is still identified.\n", + "\n", "Analytical standard errors are the default (fast, no resampling). For bootstrap-based inference, set `n_bootstrap` (see Section 6)." - ], - "id": "1b80dae6" + ] }, { "cell_type": "markdown", + "id": "8214ff79", "metadata": {}, "source": [ "## 4. Dose-Response Curves\n", "\n", - "The key advantage of continuous DiD is the **dose-response curve** \u2014 how the treatment effect varies across dose levels. The estimator evaluates ATT(d) and ACRT(d) on a grid of dose values (by default, percentiles from P10 to P99 of the observed dose distribution).\n", + "The key advantage of continuous DiD is the **dose-response curve** — how the treatment effect varies across dose levels. The estimator evaluates ATT(d) and ACRT(d) on a grid of dose values (by default, percentiles from P10 to P99 of the observed dose distribution).\n", "\n", "Access the curves via `results.dose_response_att` and `results.dose_response_acrt`, which are `DoseResponseCurve` objects with attributes `dose_grid`, `effects`, `se`, `conf_int_lower`, and `conf_int_upper`." - ], - "id": "8214ff79" + ] }, { "cell_type": "code", "execution_count": null, + "id": "5ef9b240", "metadata": {}, "outputs": [], "source": [ @@ -254,12 +257,12 @@ "att_df = results.dose_response_att.to_dataframe()\n", "print(\"ATT(d) dose-response curve:\")\n", "print(att_df.head(10).to_string(index=False))" - ], - "id": "5ef9b240" + ] }, { "cell_type": "code", "execution_count": null, + "id": "00c21a0a", "metadata": {}, "outputs": [], "source": [ @@ -291,37 +294,37 @@ "\n", " plt.tight_layout()\n", " plt.show()" - ], - "id": "00c21a0a" + ] }, { "cell_type": "markdown", + "id": "2000eb4a", "metadata": {}, "source": [ "### Interpreting the dose-response curves\n", "\n", - "- **Left panel (ATT):** The total effect of training rises roughly linearly with dose, closely tracking the true curve (dashed). A worker with 3 hours of training gains about \\$7 in earnings; a worker with 5 hours gains about \\$11.\n", - "- **Right panel (ACRT):** The marginal return to one additional hour is approximately constant at \\$2, matching the true DGP. The confidence band is wider at extreme doses where fewer workers are observed.\n", + "- **Left panel (ATT):** The total effect of training rises roughly linearly with dose, closely tracking the true curve (dashed). A worker with 3 hours of training gains about $7 in earnings; a worker with 5 hours gains about $11.\n", + "- **Right panel (ACRT):** The marginal return to one additional hour is approximately constant at $2, matching the true DGP. The confidence band is wider at extreme doses where fewer workers are observed.\n", "\n", - "These curves require the **strong parallel trends** assumption. If you only trust standard PT, the global parameters (ATT_glob, ACRT_glob) are still valid \u2014 you just can't compare effects across dose levels." - ], - "id": "2000eb4a" + "These curves require the **strong parallel trends** assumption. Under standard PT only, the overall binarized effect (the local average across dose groups) is identified, but the dose-response curve and ACRT_glob are not — they require strong PT because they involve cross-dose comparisons and counterfactual dose-response derivatives. The ATT_glob reported by the estimator is mechanically the binarized DiD: under standard PT it equals the local average effect, while under strong PT it additionally equals the global average across all dose levels." + ] }, { "cell_type": "markdown", + "id": "c88d540c", "metadata": {}, "source": [ "## 5. Event Study Diagnostics\n", "\n", "An event study aggregation binarizes treatment (treated vs. untreated) and shows effects by relative time period. This is useful for **pre-trends diagnostics**: pre-treatment coefficients should be near zero if parallel trends holds.\n", "\n", - "Event study requires data with multiple cohorts (to have variation in event-time). We generate a new dataset with two treatment cohorts." - ], - "id": "c88d540c" + "Event study works best with multiple cohorts (to have richer variation in event-time). A single cohort can still produce event-time estimates, but diagnostics are more informative with multiple cohorts. We generate a new dataset with two treatment cohorts." + ] }, { "cell_type": "code", "execution_count": null, + "id": "1ad562d7", "metadata": {}, "outputs": [], "source": [ @@ -353,12 +356,12 @@ "es_df['pre_period'] = es_df['relative_period'] < 0\n", "print(\"Event Study Effects (binarized ATT by relative period):\")\n", "print(es_df.to_string(index=False))" - ], - "id": "1ad562d7" + ] }, { "cell_type": "code", "execution_count": null, + "id": "61c7784e", "metadata": {}, "outputs": [], "source": [ @@ -381,11 +384,11 @@ "\n", " plt.tight_layout()\n", " plt.show()" - ], - "id": "61c7784e" + ] }, { "cell_type": "markdown", + "id": "7b22b542", "metadata": {}, "source": [ "## 6. Advanced Features\n", @@ -398,12 +401,12 @@ "- **`num_knots`** adds interior knots for additional flexibility. With 0 knots (default), the spline is a single polynomial segment. Adding knots lets the curve bend at more points, useful for non-linear dose-response relationships.\n", "\n", "Let's see how these settings affect estimation on data with a **log dose-response**: ATT(d) = 1 + 2 log(1 + d)." - ], - "id": "7b22b542" + ] }, { "cell_type": "code", "execution_count": null, + "id": "ebd370a0", "metadata": {}, "outputs": [], "source": [ @@ -435,12 +438,12 @@ " aggregate=\"dose\")\n", " spline_results[cfg[\"label\"]] = res_cfg\n", " print(f\"{cfg['label']:<35} {res_cfg.overall_att:>10.4f} {res_cfg.overall_acrt:>11.4f}\")" - ], - "id": "ebd370a0" + ] }, { "cell_type": "code", "execution_count": null, + "id": "141829f9", "metadata": {}, "outputs": [], "source": [ @@ -464,11 +467,11 @@ "\n", " plt.tight_layout()\n", " plt.show()" - ], - "id": "141829f9" + ] }, { "cell_type": "markdown", + "id": "b2eb2189", "metadata": {}, "source": [ "### Control group options\n", @@ -477,12 +480,12 @@ "\n", "- **`\"never_treated\"`** (default): Only units that are never treated. Stronger assumption (parallel trends must hold for all periods) but avoids contamination from units that are treated later.\n", "- **`\"not_yet_treated\"`**: Units not yet treated at time *t*. Weaker assumption and more data, but treated-later units may already be adjusting behavior in anticipation." - ], - "id": "b2eb2189" + ] }, { "cell_type": "code", "execution_count": null, + "id": "990c4782", "metadata": {}, "outputs": [], "source": [ @@ -506,11 +509,11 @@ " aggregate=\"dose\")\n", " print(f\"{cg:<20} {res_cg.overall_att:>10.4f} {res_cg.overall_att_se:>10.4f} \"\n", " f\"{res_cg.overall_acrt:>11.4f} {res_cg.overall_acrt_se:>10.4f}\")" - ], - "id": "990c4782" + ] }, { "cell_type": "markdown", + "id": "ce4cc5e1", "metadata": {}, "source": [ "### Bootstrap inference\n", @@ -518,17 +521,17 @@ "By default, `ContinuousDiD` computes **analytical standard errors** (fast, no resampling). For more robust inference, enable the **multiplier bootstrap** by setting `n_bootstrap`. The bootstrap perturbs unit-level influence functions using random weights.\n", "\n", "Weight options:\n", - "- `\"rademacher\"` (default): \u00b11 with equal probability. Good for most cases.\n", + "- `\"rademacher\"` (default): ±1 with equal probability. Good for most cases.\n", "- `\"mammen\"`: Two-point distribution matching first 3 moments.\n", "- `\"webb\"`: Six-point distribution, recommended for very few clusters (<10).\n", "\n", "We recommend `n_bootstrap >= 199` for reliable inference." - ], - "id": "ce4cc5e1" + ] }, { "cell_type": "code", "execution_count": null, + "id": "eb32265b", "metadata": {}, "outputs": [], "source": [ @@ -556,24 +559,24 @@ " ci = r.overall_acrt_conf_int\n", " print(f\"{label:<15} {r.overall_acrt:>10.4f} {r.overall_acrt_se:>10.4f} \"\n", " f\"[{ci[0]:>8.4f}, {ci[1]:>8.4f}]\")" - ], - "id": "eb32265b" + ] }, { "cell_type": "markdown", + "id": "a5230e69", "metadata": {}, "source": [ "## 7. Comparison to Binary DiD\n", "\n", - "What if we ignore dose entirely and just run a standard binary Callaway-Sant'Anna estimator? Both approaches should give a similar **overall ATT** (treated vs. untreated), but the binary approach discards all dose information \u2014 no dose-response curve, no marginal effects.\n", + "What if we ignore dose entirely and just run a standard binary Callaway-Sant'Anna estimator? Both approaches should give a similar **overall ATT** (treated vs. untreated), but the binary approach discards all dose information — no dose-response curve, no marginal effects.\n", "\n", - "Note: the ATT_glob values may differ slightly because `ContinuousDiD` uses B-spline estimation while `CallawaySantAnna` uses simple mean differences. The continuous approach provides the full dose-response curve on top of the overall effect." - ], - "id": "a5230e69" + "Note: both estimators compute ATT_glob as a binarized mean difference (treated vs. untreated), so the values should be very close. Any small differences arise from weighting or aggregation choices, control group or base period settings, or finite-sample variation — not from spline smoothing. The continuous approach provides the full dose-response curve on top of the overall effect." + ] }, { "cell_type": "code", "execution_count": null, + "id": "d3303127", "metadata": {}, "outputs": [], "source": [ @@ -593,13 +596,13 @@ "print(f\"{'CallawaySantAnna':<25} {results_binary.overall_att:>12.4f} {results_binary.overall_se:>10.4f}\")\n", "print()\n", "print(\"Binary DiD gives one number. Continuous DiD gives the full dose-response curve\")\n", - "print(\"plus marginal effects (ACRT) \u2014 without losing any information.\")" - ], - "id": "d3303127" + "print(\"plus marginal effects (ACRT) — without losing any information.\")" + ] }, { "cell_type": "code", "execution_count": null, + "id": "40de2d0e", "metadata": {}, "outputs": [], "source": [ @@ -630,11 +633,11 @@ "\n", " print(\"The red dashed line (binary DiD) collapses the entire dose-response curve\")\n", " print(\"into a single number, losing the relationship between dose and effect.\")" - ], - "id": "40de2d0e" + ] }, { "cell_type": "markdown", + "id": "b5af7533", "metadata": {}, "source": [ "## 8. Results Export and Group-Time Effects\n", @@ -644,12 +647,12 @@ "- **`level=\"dose_response\"`** (default): The dose-response curve with ATT(d) and ACRT(d)\n", "- **`level=\"group_time\"`**: Underlying group-time cell estimates\n", "- **`level=\"event_study\"`**: Event study effects (only available when fitted with `aggregate=\"eventstudy\"`)" - ], - "id": "b5af7533" + ] }, { "cell_type": "code", "execution_count": null, + "id": "91f0a878", "metadata": {}, "outputs": [], "source": [ @@ -664,11 +667,11 @@ "print(\"Group-time level:\")\n", "gt_df = results.to_dataframe(level=\"group_time\")\n", "print(gt_df.to_string(index=False))" - ], - "id": "91f0a878" + ] }, { "cell_type": "markdown", + "id": "315c7664", "metadata": {}, "source": [ "## 9. Summary\n", @@ -680,7 +683,7 @@ "- **ATT_glob** and **ACRT_glob** summarize the overall and marginal effects\n", "- **B-splines** approximate the dose-response flexibly; increase `degree` and `num_knots` for non-linear patterns\n", "- **Event study** diagnostics work the same as in binary DiD (pre-period coefficients near zero)\n", - "- Binarizing continuous treatment into binary DiD **loses information** \u2014 continuous DiD recovers the full curve\n", + "- Binarizing continuous treatment into binary DiD **loses information** — continuous DiD recovers the full curve\n", "\n", "### Parameter reference\n", "\n", @@ -699,8 +702,7 @@ "### Reference\n", "\n", "Callaway, B., Goodman-Bacon, A., & Sant'Anna, P. H. C. (2024). Difference-in-Differences with a Continuous Treatment. [arXiv:2107.02637](https://arxiv.org/abs/2107.02637)." - ], - "id": "315c7664" + ] } ], "metadata": { @@ -710,4 +712,4 @@ }, "nbformat": 4, "nbformat_minor": 5 -} \ No newline at end of file +} From 75a6282db9f34022caea9b1a7285fce88c7a2cd4 Mon Sep 17 00:00:00 2001 From: igerber Date: Sun, 22 Feb 2026 12:45:32 -0500 Subject: [PATCH 3/6] Align REGISTRY.md with paper's two-level PT/SPT hierarchy Update REGISTRY.md ContinuousDiD section to document both standard PT and strong PT assumptions (matching continuous-did.md and CGBS 2024 Assumptions 1-2). Annotate each target parameter with which assumption it requires. Add assumption requirement notes to tutorial estimand table. Co-Authored-By: Claude Opus 4.6 --- docs/methodology/REGISTRY.md | 24 ++++++++++++++++++------ docs/tutorials/14_continuous_did.ipynb | 8 ++++---- 2 files changed, 22 insertions(+), 10 deletions(-) diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index 7bb17d4..f59080b 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -403,18 +403,30 @@ The multiplier bootstrap uses random weights w_i with E[w]=0 and Var(w)=1: ### Identification -Under **Strong Parallel Trends** (SPT): for all doses d in D_+, +Two levels of parallel trends (following CGBS 2024, Assumptions 1-2): + +**Parallel Trends (PT):** for all doses d in D_+, `E[Y_t(0) - Y_{t-1}(0) | D = d] = E[Y_t(0) - Y_{t-1}(0) | D = 0]`. +Untreated potential outcome paths are the same across all dose groups and the +untreated group. Stronger than binary PT because it conditions on specific dose values. +Identifies: `ATT(d|d)`, `ATT^{loc}`. Does NOT identify `ATT(d)`, `ACRT`, or cross-dose comparisons. + +**Strong Parallel Trends (SPT):** additionally, for all d in D, +`E[Y_t(d) - Y_{t-1}(0) | D > 0] = E[Y_t(d) - Y_{t-1}(0) | D = d]`. +No selection into dose groups on the basis of treatment effects. +Implies `ATT(d|d) = ATT(d)` for all d. +Additionally identifies: `ATT(d)`, `ACRT(d)`, `ACRT^{glob}`, and cross-dose comparisons. -This is stronger than standard PT because it conditions on specific dose values. +See `docs/methodology/continuous-did.md` Section 4 for full details. ### Key Equations **Target parameters:** -- `ATT(d) = E[Y_t(d) - Y_t(0) | D > 0]` — dose-response curve -- `ACRT(d) = dATT(d)/dd` — average causal response (marginal effect) -- `ATT^{glob} = E[Delta Y | D > 0] - E[Delta Y | D = 0]` — binarized ATT -- `ACRT^{glob} = E[ACRT(D_i) | D > 0]` — plug-in average marginal effect +- `ATT(d|d) = E[Y_t(d) - Y_t(0) | D = d]` — effect of dose d on units who received dose d (PT) +- `ATT(d) = E[Y_t(d) - Y_t(0) | D > 0]` — dose-response curve (SPT required) +- `ACRT(d) = dATT(d)/dd` — average causal response / marginal effect (SPT required) +- `ATT^{glob} = E[Delta Y | D > 0] - E[Delta Y | D = 0]` — binarized ATT (PT) +- `ACRT^{glob} = E[ACRT(D_i) | D > 0]` — plug-in average marginal effect (SPT required) **Estimation via B-spline OLS:** 1. Compute `Delta_tilde_Y = (Y_t - Y_{t-1})_treated - mean((Y_t - Y_{t-1})_control)` diff --git a/docs/tutorials/14_continuous_did.ipynb b/docs/tutorials/14_continuous_did.ipynb index 1626970..163b847 100644 --- a/docs/tutorials/14_continuous_did.ipynb +++ b/docs/tutorials/14_continuous_did.ipynb @@ -74,10 +74,10 @@ "\n", "| Parameter | Meaning (training example) |\n", "|-----------|---------------------------|\n", - "| ATT(d) | Total earnings gain from *d* hours of training vs. no training |\n", - "| ACRT(d) | Earnings gain from **one additional hour** at level *d* (marginal effect) |\n", - "| ATT_glob | Average effect of any training vs. none (across all dose levels) |\n", - "| ACRT_glob | Average marginal return to an extra hour of training |\n", + "| ATT(d) | Total earnings gain from *d* hours of training vs. no training, across all treated workers (requires strong PT) |\n", + "| ACRT(d) | Earnings gain from **one additional hour** at level *d* (marginal effect; requires strong PT) |\n", + "| ATT_glob | Average effect of any training vs. none (binarized; valid under standard PT) |\n", + "| ACRT_glob | Average marginal return to an extra hour of training (requires strong PT) |\n", "\n", "The `ContinuousDiD` estimator from Callaway, Goodman-Bacon & Sant'Anna (2024) handles all three problems. It estimates the full dose-response curve using B-splines and provides valid inference." ] From 8bf8839e9ba175348daa413cbccdd3b0809189ad Mon Sep 17 00:00:00 2001 From: igerber Date: Sun, 22 Feb 2026 12:59:06 -0500 Subject: [PATCH 4/6] Fix ATT^{loc}/ATT^{glob} terminology across docs and code The binarized DiD identifies ATT^{loc} under PT, not ATT^{glob}. ATT^{glob} = ATT^{loc} only under SPT. Update REGISTRY.md to list both parameters with correct assumption labels, fix overall_att docstring in continuous_did_results.py, and use correct ATT_loc/ ATT_glob terminology in the tutorial's dose-response interpretation. Co-Authored-By: Claude Opus 4.6 --- diff_diff/continuous_did_results.py | 2 +- docs/methodology/REGISTRY.md | 3 ++- docs/tutorials/14_continuous_did.ipynb | 2 +- 3 files changed, 4 insertions(+), 3 deletions(-) diff --git a/diff_diff/continuous_did_results.py b/diff_diff/continuous_did_results.py index 8148d57..3121160 100644 --- a/diff_diff/continuous_did_results.py +++ b/diff_diff/continuous_did_results.py @@ -90,7 +90,7 @@ class ContinuousDiDResults: dose_response_acrt : DoseResponseCurve ACRT(d) dose-response curve. overall_att : float - Binarized overall ATT^{glob}. + Binarized overall ATT (ATT^{loc} under PT, equals ATT^{glob} under SPT). overall_acrt : float Plug-in overall ACRT^{glob}. group_time_effects : dict diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index f59080b..3c88f1d 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -425,7 +425,8 @@ See `docs/methodology/continuous-did.md` Section 4 for full details. - `ATT(d|d) = E[Y_t(d) - Y_t(0) | D = d]` — effect of dose d on units who received dose d (PT) - `ATT(d) = E[Y_t(d) - Y_t(0) | D > 0]` — dose-response curve (SPT required) - `ACRT(d) = dATT(d)/dd` — average causal response / marginal effect (SPT required) -- `ATT^{glob} = E[Delta Y | D > 0] - E[Delta Y | D = 0]` — binarized ATT (PT) +- `ATT^{loc} = E[ATT(D|D) | D > 0] = E[Delta Y | D > 0] - E[Delta Y | D = 0]` — binarized ATT (PT); equals `ATT^{glob}` under SPT +- `ATT^{glob} = E[ATT(D) | D > 0]` — global average dose-response level (SPT required) - `ACRT^{glob} = E[ACRT(D_i) | D > 0]` — plug-in average marginal effect (SPT required) **Estimation via B-spline OLS:** diff --git a/docs/tutorials/14_continuous_did.ipynb b/docs/tutorials/14_continuous_did.ipynb index 163b847..2ca12b5 100644 --- a/docs/tutorials/14_continuous_did.ipynb +++ b/docs/tutorials/14_continuous_did.ipynb @@ -306,7 +306,7 @@ "- **Left panel (ATT):** The total effect of training rises roughly linearly with dose, closely tracking the true curve (dashed). A worker with 3 hours of training gains about $7 in earnings; a worker with 5 hours gains about $11.\n", "- **Right panel (ACRT):** The marginal return to one additional hour is approximately constant at $2, matching the true DGP. The confidence band is wider at extreme doses where fewer workers are observed.\n", "\n", - "These curves require the **strong parallel trends** assumption. Under standard PT only, the overall binarized effect (the local average across dose groups) is identified, but the dose-response curve and ACRT_glob are not — they require strong PT because they involve cross-dose comparisons and counterfactual dose-response derivatives. The ATT_glob reported by the estimator is mechanically the binarized DiD: under standard PT it equals the local average effect, while under strong PT it additionally equals the global average across all dose levels." + "These curves require the **strong parallel trends** assumption. Under standard PT only, the overall binarized effect is identified — this is ATT_loc (the local average across dose groups). The dose-response curve and ACRT_glob are not identified under standard PT because they involve cross-dose comparisons and counterfactual dose-response derivatives. The ATT_glob reported by the estimator is mechanically the binarized DiD: under standard PT it equals ATT_loc, while under strong PT it additionally equals the global average ATT_glob." ] }, { From e1c061683d82dd13b95891f287db9a5875304f83 Mon Sep 17 00:00:00 2001 From: igerber Date: Sun, 22 Feb 2026 13:11:28 -0500 Subject: [PATCH 5/6] Bridge API naming (ATT_glob) with formal terminology (ATT_loc/ATT_glob) Update tutorial estimand table, global parameter interpretation, and binary comparison sections to clarify that the API's ATT_glob/overall_att identifies ATT_loc under standard PT and equals ATT_glob only under SPT. Co-Authored-By: Claude Opus 4.6 --- docs/tutorials/14_continuous_did.ipynb | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/docs/tutorials/14_continuous_did.ipynb b/docs/tutorials/14_continuous_did.ipynb index 2ca12b5..f3ac70a 100644 --- a/docs/tutorials/14_continuous_did.ipynb +++ b/docs/tutorials/14_continuous_did.ipynb @@ -76,7 +76,7 @@ "|-----------|---------------------------|\n", "| ATT(d) | Total earnings gain from *d* hours of training vs. no training, across all treated workers (requires strong PT) |\n", "| ACRT(d) | Earnings gain from **one additional hour** at level *d* (marginal effect; requires strong PT) |\n", - "| ATT_glob | Average effect of any training vs. none (binarized; valid under standard PT) |\n", + "| ATT_glob | Average effect of any training vs. none (binarized). Identifies ATT_loc under standard PT; equals ATT_glob under strong PT |\n", "| ACRT_glob | Average marginal return to an extra hour of training (requires strong PT) |\n", "\n", "The `ContinuousDiD` estimator from Callaway, Goodman-Bacon & Sant'Anna (2024) handles all three problems. It estimates the full dose-response curve using B-splines and provides valid inference." @@ -226,10 +226,8 @@ "source": [ "### Interpreting the global parameters\n", "\n", - "- **ATT_glob** is the average total effect of training (at the observed dose levels) compared to no training. It answers: \"On average, how much more do trained workers earn?\"\n", - "- **ACRT_glob** is the average marginal return to an additional hour of training. It answers: \"What is one more hour of training worth, on average?\"\n", - "\n", - "Note: ACRT_glob and the dose-response interpretation of ATT_glob require strong parallel trends (see assumptions above). Under standard PT, the binarized overall effect is still identified.\n", + "- **ATT_glob** is the average total effect of training (at the observed dose levels) compared to no training. It answers: \"On average, how much more do trained workers earn?\" The API reports this as `overall_att`. Under standard PT, this identifies the local average effect (ATT_loc); under strong PT, it additionally equals the global average (ATT_glob).\n", + "- **ACRT_glob** is the average marginal return to an additional hour of training. It answers: \"What is one more hour of training worth, on average?\" This requires strong PT for causal interpretation.\n", "\n", "Analytical standard errors are the default (fast, no resampling). For bootstrap-based inference, set `n_bootstrap` (see Section 6)." ] @@ -568,9 +566,9 @@ "source": [ "## 7. Comparison to Binary DiD\n", "\n", - "What if we ignore dose entirely and just run a standard binary Callaway-Sant'Anna estimator? Both approaches should give a similar **overall ATT** (treated vs. untreated), but the binary approach discards all dose information — no dose-response curve, no marginal effects.\n", + "What if we ignore dose entirely and just run a standard binary Callaway-Sant'Anna estimator? Both approaches should give a similar **binarized ATT** (treated vs. untreated), but the binary approach discards all dose information — no dose-response curve, no marginal effects.\n", "\n", - "Note: both estimators compute ATT_glob as a binarized mean difference (treated vs. untreated), so the values should be very close. Any small differences arise from weighting or aggregation choices, control group or base period settings, or finite-sample variation — not from spline smoothing. The continuous approach provides the full dose-response curve on top of the overall effect." + "Note: both estimators compute the binarized ATT as a simple mean difference (treated vs. untreated), so the values should be very close. Under standard PT this identifies ATT_loc (the local average); under strong PT it additionally equals ATT_glob. Any small differences arise from weighting or aggregation choices, control group or base period settings, or finite-sample variation — not from spline smoothing. The continuous approach provides the full dose-response curve on top of the binarized effect." ] }, { From adcbad41ad9bee4f0fe7a72d825a7c7fb4f7d5f4 Mon Sep 17 00:00:00 2001 From: igerber Date: Sun, 22 Feb 2026 13:29:08 -0500 Subject: [PATCH 6/6] Fix binarized ATT description to reference group-time aggregation Replace "simple mean difference" with "aggregating group-time effects" in the binary comparison section, accurately reflecting the weighting used by both ContinuousDiD and CallawaySantAnna. Co-Authored-By: Claude Opus 4.6 --- docs/tutorials/14_continuous_did.ipynb | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/docs/tutorials/14_continuous_did.ipynb b/docs/tutorials/14_continuous_did.ipynb index f3ac70a..95615bd 100644 --- a/docs/tutorials/14_continuous_did.ipynb +++ b/docs/tutorials/14_continuous_did.ipynb @@ -563,13 +563,7 @@ "cell_type": "markdown", "id": "a5230e69", "metadata": {}, - "source": [ - "## 7. Comparison to Binary DiD\n", - "\n", - "What if we ignore dose entirely and just run a standard binary Callaway-Sant'Anna estimator? Both approaches should give a similar **binarized ATT** (treated vs. untreated), but the binary approach discards all dose information — no dose-response curve, no marginal effects.\n", - "\n", - "Note: both estimators compute the binarized ATT as a simple mean difference (treated vs. untreated), so the values should be very close. Under standard PT this identifies ATT_loc (the local average); under strong PT it additionally equals ATT_glob. Any small differences arise from weighting or aggregation choices, control group or base period settings, or finite-sample variation — not from spline smoothing. The continuous approach provides the full dose-response curve on top of the binarized effect." - ] + "source": "## 7. Comparison to Binary DiD\n\nWhat if we ignore dose entirely and just run a standard binary Callaway-Sant'Anna estimator? Both approaches should give a similar **binarized ATT** (treated vs. untreated), but the binary approach discards all dose information — no dose-response curve, no marginal effects.\n\nNote: both estimators compute the binarized ATT by aggregating group-time effects, so the values should be very close. Under standard PT this identifies ATT_loc (the local average); under strong PT it additionally equals ATT_glob. Any small differences arise from weighting or aggregation choices, control group or base period settings, or finite-sample variation — not from spline smoothing. The continuous approach provides the full dose-response curve on top of the binarized effect." }, { "cell_type": "code", @@ -710,4 +704,4 @@ }, "nbformat": 4, "nbformat_minor": 5 -} +} \ No newline at end of file