class: center, middle, inverse, title-slide .title[ # Module 3: Designing Around Interference ] .subtitle[ ## Cluster Randomization, Switchbacks, and the Bias-Variance Tradeoff ] --- <style type="text/css"> .remark-code, .remark-inline-code { font-size: 80%; } .remark-slide-content { padding: 1em 2em; } .small { font-size: 80%; } .tiny { font-size: 65%; } .highlight-box { background: #fff3e0; border-left: 4px solid #e65100; padding: 0.5em 1em; margin: 0.5em 0; } .blue-box { background: #e3f2fd; border-left: 4px solid #1565c0; padding: 0.5em 1em; margin: 0.5em 0; } .nav-btn { position: absolute; bottom: 12px; left: 40px; font-size: 11px; background: #e8eaf6; padding: 2px 8px; border-radius: 3px; z-index: 100; text-decoration: none; color: #1a237e; } .nav-btn:hover { background: #c5cae9; } .inline-btn { font-size: 11px; background: #e8eaf6; padding: 2px 8px; border-radius: 3px; text-decoration: none; color: #1a237e; margin-right: 6px; vertical-align: middle; } .inline-btn:hover { background: #c5cae9; } .nav-btn-br { position: absolute; bottom: 12px; right: 70px; font-size: 11px; background: #e8eaf6; padding: 2px 8px; border-radius: 3px; z-index: 100; text-decoration: none; color: #1a237e; } .nav-btn-br:hover { background: #c5cae9; } </style> # Course Map <table> <tr><th>#</th><th>Module</th><th>Status</th></tr> <tr><td>1</td><td><a href="../module-01/slides.html">The Experimental Ideal</a></td><td>✓ done</td></tr> <tr><td>2</td><td><a href="../module-02/slides.html">SUTVA and When It Breaks</a></td><td>✓ done</td></tr> <tr><td><b>3</b></td><td><b>Designing Around Interference</b> <i>(you are here)</i></td><td>current</td></tr> <tr><td>4</td><td>Power and Sample Size</td><td>upcoming</td></tr> <tr><td>5</td><td><a href="../module-05/slides.html">Analyzing Experiments</a></td><td>✓ done</td></tr> <tr><td>6</td><td>Multiple Testing & Subgroups</td><td>upcoming</td></tr> <tr><td>7</td><td><a href="../module-07/slides.html">External Validity</a></td><td>✓ done</td></tr> 
<tr><td>8</td><td><a href="../module-08/slides.html">Beyond the A/B Test</a></td><td>✓ done</td></tr> </table> --- # Last Time: SUTVA Breaks Everywhere Module 2 showed that interference biases the naive ATE in two settings we'll keep using here: - **Marketplace (zone-notification experiment):** notified drivers compete for rides in the same zone, contaminating the comparison - **Networks (author-nudge experiment):** control researchers learn the new reporting practice from treated co-authors -- Today: **how to design experiments that work despite interference.** The core idea is simple: randomize at a level where interference is contained. -- .highlight-box[ Every design in this module trades **bias** (from interference) for **variance** (from fewer independent units). The art is finding the sweet spot. ] --- # The Setup: 40 Cities, 5,000 Drivers Each A ride-sharing platform operates in **40 cities** with **5,000 drivers each** (200,000 total). They want to test the **zone-notification** feature from Module 1. -- **Within-city interference (symmetric):** as more drivers are notified, the high-demand zone gets crowded — *every* driver in the city (notified or not) faces lower accept rates. Direct boost `\(\tau = 0.05\)`, interference `\(\lambda = 0.07\)` → policy effect = `\(-0.02\)` at full rollout. 
-- **Design options:** | Design | Unit of randomization | Independent units | Estimand | |--------|----------------------|-------------------|----------| | Individual | Driver | 200,000 | Direct effect (+0.05) | | City cluster | City | 40 | Policy effect (−0.02) | | Region cluster | 5-city region | 8 | Policy effect (−0.02) | --- name: indiv-cluster-main # Individual vs Cluster: Side by Side <img src="slides_files/figure-html/indiv-vs-cluster-1.png" style="display: block; margin: auto;" /> <a href="#dgp-indiv-cluster" class="nav-btn">DGP code</a> --- name: cluster-bias-main # Two Estimators, Two Estimands .pull-left[ .small[ **Individual rand → direct effect:** - 50/50 within every city, so frac_t ≈ 0.5 everywhere - Both arms face the *same* penalty `\(-\lambda \cdot 0.5\)`; it cancels - Naive recovers `\(\tau\)` = +0.05 - Answers: *"marginal benefit to one driver?"* **City-cluster rand → policy effect:** - Treated cities: frac_t = 1 (full penalty `\(-\lambda\)`) - Control cities: frac_t = 0 (no penalty) - Cluster contrast = `\(\tau - \lambda\)` = −0.02 - Answers: *"what happens at full rollout?"* Both are unbiased — for *different estimands*. Pick one based on the decision you're trying to make. ] ] .pull-right[ |Design |Treated mean |Control mean |Estimator gives | |:-------------------|:------------|:------------|:----------------------| |Individual (50/50) |b + τ − λ·½ |b − λ·½ |τ (direct, +0.05) | |Cluster (saturated) |b + τ − λ |b |τ − λ (policy, −0.02) | .highlight-box[ **Intuition:** the cluster contrast *includes* the interference term, so it captures the equilibrium effect under full rollout. Individual rand mechanically cancels the term out. ] ] .small[**Cost of clustering:** effective sample size collapses from **200,000** drivers (individual) to **40** cities (city cluster) to **8** regions (region cluster). The next slide quantifies this with the design effect.] 
<a href="#cluster-bias-proof" class="nav-btn">decomposition</a> --- # Design Effect and ICC The **design effect** is the variance inflation factor from clustering: `$$DEFF = 1 + (m - 1) \cdot \rho \qquad\Longrightarrow\qquad n_{eff} = \frac{n}{DEFF}$$` For a fixed effect size you want to detect, required `\(n\)` scales by `\(DEFF\)` — equivalently, your effective sample shrinks by the same factor. `\(m\)` is cluster size (drivers per city), `\(\rho\)` is intra-cluster correlation (ICC). -- .pull-left[ .small[ | m (drivers)| ICC| DEFF| Eff. N (of 200k)| Power (MDE = 15 pp)| |-----------:|----:|-------:|----------------:|-------------------:| | 100| 0.02| 3.0| 67,114| 100%| | 100| 0.05| 6.0| 33,613| 100%| | 100| 0.10| 10.9| 18,349| 100%| | 5,000| 0.02| 101.0| 1,981| 100%| | 5,000| 0.05| 251.0| 797| 99%| | 5,000| 0.10| 500.9| 399| 86%| | 25,000| 0.02| 501.0| 399| 86%| | 25,000| 0.05| 1,251.0| 160| 49%| | 25,000| 0.10| 2,500.9| 80| 28%| ] ] .pull-right[ .small[ **Plausible ranges for `\(\rho\)`:** | Setting | Typical `\(\rho\)` | |---------|----------------| | Online behavior (clicks, conversions) | 0.001 – 0.01 | | Marketplace metrics (accept rate, ride completion) | 0.02 – 0.10 | | Geo clusters (DMA, city) | 0.05 – 0.20 | | Strong common shocks (weather, surge) | 0.10 – 0.30 | **Rule of thumb:** anything you'd model with city or time-period fixed effects has `\(\rho\)` big enough to matter. Estimate `\(\rho\)` from a pilot or historical data — don't guess. ] ] <a href="#icc-derivation" class="nav-btn">ICC derivation</a> --- # The Bias-Variance Tradeoff: Simulated <img src="slides_files/figure-html/tradeoff-sim-1.png" style="display: block; margin: auto;" /> .small[ **Bias** vanishes once clusters reach city scale — interference is fully contained. **Variance** keeps climbing — fewer effective independent units. ] --- # Switchback Designs An alternative to spatial clustering: **randomize over time**. -- .pull-left[ **How it works:** 1. 
Divide time into periods (e.g., 2-hour blocks) 2. Each period, randomize each city to notification on/off 3. Compare accept rates in notified vs. non-notified periods 4. Every city serves as its own control (removes city fixed effects) ] -- .pull-right[ <img src="slides_files/figure-html/switchback-diagram-1.png" style="display: block; margin: auto;" /> ] --- name: switchback-strengths # Switchback: Strengths and Risks .small[ **Strengths:** - Every unit (city) appears in both arms over time — removes unit-level confounders - **More "independent observations" than pure clustering:** identification is *within-city* (a city's "on" periods vs. its own "off" periods), not between cities. The ICC penalty is differenced out, and effective `\(N\)` grows from `\(\approx n_{cities}\)` toward `\(n_{cities} \times n_{periods}\)` <a href="#switchback-effn" class="inline-btn">proof</a> - Natural for marketplace experiments where interference is spatial, not temporal ] -- **Risks:** <img src="slides_files/figure-html/carryover-1.png" style="display: block; margin: auto;" /> .small[ **Carryover:** notification in period `\(t\)` affects behavior in `\(t+1\)` even if `\(t+1\)` is "off" (drivers who learned the zone keep going). **Fix:** insert *washout windows* — short buffers right after each switch that you exclude from analysis, letting behavior reset before measurement resumes. **Cost:** fewer usable minutes per period. 
] --- # Switchback: Simulation <img src="slides_files/figure-html/switchback-sim-1.png" style="display: block; margin: auto;" /> <a href="#switchback-dgp" class="nav-btn">DGP code</a> --- # Two-Sided Marketplace Designs In a ride-sharing experiment, you can randomize **riders** instead of **drivers**: -- .pull-left[ **Randomize riders:** - 50% of riders see new pricing or matching - 50% see the old experience - Riders don't interact with each other (mostly) - No interference on the randomization side **Measure driver outcomes:** - Driver earnings, wait times, accept rate - These reflect the equilibrium effect ] -- .pull-right[ **Why it works:** - Each rider's experience is independent (they don't share rides) - But their collective demand changes driver supply allocation - You get the direct effect on riders AND the indirect effect on drivers **When it doesn't work:** - If treated riders change market-wide conditions (e.g., surge pricing triggers) - If the treatment fraction is large enough to move equilibrium prices ] --- # Geo Experiments .pull-left[ .small[ For online advertising and pricing experiments, **geographic units** are natural clusters. A **DMA** (Designated Market Area, defined by Nielsen) is a metro region used for TV and ad targeting — e.g., "New York DMA," "Los Angeles DMA." There are **210 DMAs in the US**, ranging from a few hundred thousand to ~20M people. Ad spend can be served at DMA granularity. **Treatment** in a geo experiment = changing ad spend in randomly chosen DMAs (typically *pausing* or *boosting* a campaign), holding control DMAs at the status quo. Outcome = revenue / conversions / signups per DMA. The contrast estimates **incremental lift** — what wouldn't have happened organically. ] ] .pull-right[ <img src="slides_files/figure-html/geo-diagram-1.png" style="display: block; margin: auto;" /> ] .blue-box[ **Google/Meta approach:** "GeoX" / "geo-based incrementality." Randomize ad spend across 50–200 DMAs. 
Power depends on between-DMA variance and number of DMAs, not number of users. ] --- # Ego-Cluster Randomization for Networks For **network interference** (the author-nudge experiment from Module 2), standard clustering is hard because network boundaries are unclear. -- .small[ **Ego-cluster approach** (node roles in the chart below): 1. Pick a focal study — the **ego** (center of each cluster, labeled "ego") 2. Add every study sharing co-authors with it — the **alters** (the four nodes around each ego) 3. Randomize the entire ego-cluster (ego + alters) to the same arm 4. Within-cluster spillover is absorbed; **bridges** (faded grey rings) are alters whose ties extend into the other arm — **drop them from the analysis sample** to avoid cross-arm contamination ] -- <img src="slides_files/figure-html/ego-cluster-1.png" style="display: block; margin: auto;" /> --- # Approach 4: Model the Spillover (Exposure Mapping) The first three approaches **contain** interference by design. When you can't — diffuse networks, observational settings, small samples — **model the spillover** instead. .pull-left[ .small[ **Idea:** for each unit `\(i\)`, build an exposure measure `\(S_i\)` summarizing how its network neighbors are treated. Include `\(S_i\)` as a covariate; recover the direct effect *and* the spillover jointly. For study `\(i\)` with author set `\(A_i\)` (M2's author-nudge experiment), `\(S_{ik} = \sum_{a \in A_i} \sum_{s \neq i} M_{as} T_{sk}\)` counts other studies in arm `\(k\)` that share an author with `\(i\)`. - `\(S_i = \sum_{k \geq 1} S_{ik}\)` — exposure to *any* treatment arm - `\(P_i = \sum_{k \geq 0} S_{ik}\)` — total degree (incl. control) ] ] .pull-right[ <img src="slides_files/figure-html/spillover-diagram-1.png" style="display: block; margin: auto;" /> .small[ Study 2 shares author *b* with study 1 and *e* with study 3 ⇒ `\(S_2 = 1\)`, `\(P_2 = 2\)`. Study 4's only tie is out-of-sample ⇒ `\(S_4 = P_4 = 0\)`. 
] ] <a href="#spillover-regression" class="nav-btn">regression & validity</a> --- # Approach 4: Simulation True parameters: `\(\tau_1=0.05\)`, `\(\tau_2=0.10\)`, `\(\tau_3=0.15\)` (direct), `\(\beta_0=0.05\)` (control spillover), `\(\theta=0.02\)` (extra on treated), `\(\psi=0.05\)` (direct effect of degree `\(P_i\)`). 500 studies, co-author network (mean degree ≈ 6), random T0/T1/T2/T3 assignment. <img src="slides_files/figure-html/spillover-sim-1.png" style="display: block; margin: auto;" /> <a href="#spillover-regression" class="nav-btn">regression & validity</a> --- # Application: City-Level Zone-Notification Test A ride-sharing company wants to test whether the **zone-notification** feature lifts driver accept rate at the *city* level. -- .pull-left[ **Why cluster at the city level?** - Within a city, notified drivers compete with non-notified for rides - A 50/50 split per city contaminates control's accept rate - City-level randomization: each city gets the feature on or off, no within-city mixing ] -- .pull-right[ **The power problem:** - Only 40 cities available - 20 treatment, 20 control - ICC of accept rate within cities ≈ 0.10 - DEFF `\(\approx 500 \;\Longrightarrow\;\)` Effective `\(N \approx 400\)` (out of 200k drivers) - For MDE = 15 pp, power `\(\approx 86\%\)` — need a **large** effect ] -- .highlight-box[ **The interview question:** "We have 40 cities. Can we run a cluster-randomized experiment?" Answer: probably, but only if the expected effect is large. Calculate the minimum detectable effect (MDE) and check if it's business-relevant. ] --- # Application: Switchback for Zone Notifications **Alternative:** use a switchback design within each city. -- - Divide each day into 2-hour blocks - Randomize which blocks have the zone-notification feature on vs. 
off - 12 blocks/day × 30 days × 40 cities = 14,400 city-period cells - Effective N is much larger than 40 clusters -- **But:** - **Carryover**: if drivers learn the high-demand zone during "on" periods, they keep going there during "off" periods - Need washout periods (waste time and data) - Confounding with time-of-day effects (notifications matter more at peak times) -- .blue-box[ **Practical design:** stratified switchback. Within each city, pair "similar" time blocks (e.g., Monday 5–7pm with Wednesday 5–7pm) and randomize within pairs. This controls for time-of-day and day-of-week effects. ] --- # Application: Geo Incrementality for Online Ads An e-commerce company wants to measure whether their TV ad campaign drives sales. -- **Design:** - 100 DMAs (Designated Market Areas) across the US - Randomly assign 50 DMAs to see ads, 50 to no ads - Measure sales lift at the DMA level - Control for baseline sales using **CUPED** — subtract a scaled pre-experiment outcome from each DMA's post outcome to soak up pre-existing variation (covered in M5) -- **Challenges:** - DMAs vary enormously in size (NYC vs. rural Montana) - Need to weight by population or stratify by DMA size - Spillovers: people in control DMAs can see ads on streaming/social media (imperfect compliance) - Attribution: sales in a DMA might come from visitors from neighboring DMAs -- .highlight-box[ **Key insight:** geo experiments work best when the treatment is *geographically contained* (TV, billboards, local promotions). They work poorly for digital treatments that cross geo boundaries (social media ads, viral content). ] --- # Choosing the Right Design | Interference type | Recommended design | Key tradeoff | |-------------------|-------------------|--------------| | Within-market (supply/demand) | City-level cluster | Bias vs. power | | Temporal (pricing, algorithms) | Switchback | Carryover vs. efficiency | | Network (social, co-authorship) | Ego-cluster | Cluster size vs. 
coverage | | Geographic (ads, promotions) | Geo experiment | Geo count vs. precision | | Two-sided marketplace | Randomize one side | Indirect effects only | -- **Decision framework:** 1. **Where does interference happen?** Within cities? Across time? Through networks? 2. **Can you cluster at that level?** How many independent clusters do you have? 3. **Is the effect large enough to detect with that many clusters?** 4. **Is there carryover/leakage between clusters?** If so, switchback may be risky. --- # Summary of the Bias-Variance Tradeoff <img src="slides_files/figure-html/final-tradeoff-1.png" style="display: block; margin: auto;" /> --- # Key Takeaways 1. **Cluster randomization** eliminates interference bias by assigning entire clusters to the same arm. Cost: fewer independent units = more variance. 2. **Switchback designs** randomize over time. Good when interference is spatial. Watch for carryover effects. 3. **Two-sided marketplace designs**: randomize one side, measure the other. Works when one side doesn't interact with itself. 4. **Geo experiments**: natural clusters for advertising. Need many geos (50+) and geographically contained treatments. 5. **Ego-cluster randomization**: for network interference (e.g., the author-nudge experiment). Cluster = focal unit + neighbors. 6. **The bias-variance tradeoff** is the central tension. Minimize **RMSE**, not just bias or variance alone. 7. The **design effect** formula `\(n_{eff} = n / [1 + (m-1)\rho]\)` tells you how much power you lose from clustering. --- # Exercise Preview In the exercise you will: 1. Compare individual vs. city-level randomization in a zone-notification simulation 2. Show that clustering eliminates bias but increases variance 3. Sweep over cluster sizes and find the RMSE-minimizing level 4. Simulate a switchback design with and without carryover 5. Compute design effects for different ICC values See `exercise.R` for the starter code. 
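As a warm-up for item 5, the design-effect arithmetic is two one-liners (a sketch; `deff` and `n_eff` are illustrative names, and the numbers match the DEFF table from earlier in the module):

```r
# Design effect and effective sample size from cluster size m and ICC rho
deff  <- function(m, rho) 1 + (m - 1) * rho
n_eff <- function(n, m, rho) n / deff(m, rho)

deff(5000, 0.10)               # 500.9 -- the city-cluster row of the DEFF table
round(n_eff(2e5, 5000, 0.10))  # 399 effective drivers out of 200,000
```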
--- # Course Map <table> <tr><th>#</th><th>Module</th><th>Status</th></tr> <tr><td>1</td><td><a href="../module-01/slides.html">The Experimental Ideal</a></td><td>✓ done</td></tr> <tr><td>2</td><td><a href="../module-02/slides.html">SUTVA and When It Breaks</a></td><td>✓ done</td></tr> <tr><td><b>3</b></td><td><b>Designing Around Interference</b> <i>(just finished)</i></td><td>✓ done</td></tr> <tr><td>4</td><td>Power and Sample Size</td><td>up next</td></tr> <tr><td>5</td><td><a href="../module-05/slides.html">Analyzing Experiments</a></td><td>✓ done</td></tr> <tr><td>6</td><td>Multiple Testing & Subgroups</td><td>upcoming</td></tr> <tr><td>7</td><td><a href="../module-07/slides.html">External Validity</a></td><td>✓ done</td></tr> <tr><td>8</td><td><a href="../module-08/slides.html">Beyond the A/B Test</a></td><td>✓ done</td></tr> </table> --- class: center, middle, inverse # Backup Slides --- name: cluster-bias-proof # Backup: Two Estimators, Two Estimands **Symmetric interference DGP** — every driver in a city competes against the same treated share `\(s\)`: `$$y_i^{(0)} = b_i - \lambda \cdot s_{c(i)}, \qquad y_i^{(1)} = b_i + \tau - \lambda \cdot s_{c(i)}$$` with `\(b_i = 0.4 + 0.2\,\text{exp}_i + \alpha_{c(i)}\)`, `\(\tau = 0.05\)` direct effect, `\(\lambda = 0.07\)` interference. 
-- The naive contrast under random assignment — `\(\bar b\)` balances across arms: `$$\hat{\tau}_{\text{naive}} = E[y^{(1)} \mid D=1] - E[y^{(0)} \mid D=0] = (\bar b + \tau - \lambda \bar s_{D=1}) - (\bar b - \lambda \bar s_{D=0}) = \tau - \lambda(\bar s_{D=1} - \bar s_{D=0})$$` -- **Individual rand** (50/50 in every city → `\(\bar s_{D=1} = \bar s_{D=0} = 0.5\)`): `$$\hat\tau_{\text{ind}} = \tau - \lambda(0.5 - 0.5) = \tau = 0.05 \quad \text{(direct effect)}$$` **City cluster** (treated cities `\(s = 1\)`, control cities `\(s = 0\)`): `$$\hat\tau_{\text{cluster}} = \tau - \lambda(1 - 0) = \tau - \lambda = -0.02 \quad \text{(policy effect)}$$` -- Both are unbiased — for *different* estimands. If your decision is "should we roll this out everywhere?", you want `\(\hat\tau_{\text{cluster}}\)`. <a href="#cluster-bias-main" class="nav-btn">← back</a> --- name: dgp-indiv-cluster # Backup: DGP for the Slide 5 Simulation .small[ ```r direct_effect <- 0.05 # per-driver direct boost from being notified interference <- 0.07 # symmetric crowd-out per share treated (hits BOTH arms) # Individual rand: 50/50 in every city → frac_t ≈ 0.5 everywhere sim_individual <- function() { d <- market |> mutate(notification = sample(rep(c(0, 1), each = n() / 2))) d |> group_by(city_id) |> mutate(frac_t = mean(notification), # ≈ 0.5 y0 = pmin(1, pmax(0, 0.4 + 0.2 * experience + city_effect # baseline - interference * frac_t)), # CONTROLS hit by competition y1 = pmin(1, pmax(0, 0.4 + 0.2 * experience + city_effect # baseline + direct_effect - interference * frac_t))) |> # TREATED also hit ungroup() |> mutate(y_obs = rbinom(n(), 1, prob = if_else(notification == 1, y1, y0))) |> summarise(ate = mean(y_obs[notification == 1]) - mean(y_obs[notification == 0])) } # City-cluster rand: every city is 100% same arm → frac_t is 0 or 1 sim_cluster <- function() { arms <- tibble(city_id = 1:n_cities, notification = sample(rep(c(0, 1), each = n_cities / 2))) market |> left_join(arms, by = "city_id") |> 
group_by(city_id) |>
    mutate(frac_t = mean(notification)) |>
    ungroup() |>
    mutate(y0 = pmin(1, pmax(0, 0.4 + 0.2 * experience + city_effect
                             - interference * frac_t)),      # frac_t = 0 in control cities
           y1 = pmin(1, pmax(0, 0.4 + 0.2 * experience + city_effect
                             + direct_effect - interference * frac_t)),  # full λ in treated cities
           y_obs = rbinom(n(), 1, prob = if_else(notification == 1, y1, y0))) |>
    summarise(ate = mean(y_obs[notification == 1]) - mean(y_obs[notification == 0]))
}
```
]

.small[
- **Individual:** penalty cancels (both arms get `\(-\lambda \cdot 0.5\)`) → naive = `\(\tau\)` = +0.05 (direct).
- **Cluster:** treated cities take full `\(-\lambda\)`, control cities take none → ATE = `\(\tau - \lambda\)` = −0.02 (policy).
]

<a href="#indiv-cluster-main" class="nav-btn">← back</a>

---
name: icc-derivation

# Backup: ICC and Design Effect Derivation

.small[
**ICC:** `\(\rho = \dfrac{\sigma^2_b}{\sigma^2_b + \sigma^2_w}\)`, with `\(\sigma^2_b\)` = between-cluster variance, `\(\sigma^2_w\)` = within-cluster variance.

For a cluster of size `\(m\)` and `\(K\)` clusters total ( `\(n = Km\)` ):

`$$\text{Var}(\bar{Y}_c) = \frac{\sigma^2_w}{m} + \sigma^2_b \qquad\Longrightarrow\qquad \text{Var}(\bar{Y}) = \frac{1}{K}\left(\frac{\sigma^2_w}{m} + \sigma^2_b\right) = \frac{\sigma^2}{Km}\bigl[1 + (m-1)\rho\bigr]$$`

where `\(\sigma^2 = \sigma^2_b + \sigma^2_w\)` is the total variance. Under simple random sampling, `\(\text{Var}_{SRS}(\bar{Y}) = \sigma^2 / n\)`. The ratio is the **design effect**:

`$$\text{DEFF} = \frac{\text{Var}(\bar{Y})}{\text{Var}_{SRS}(\bar{Y})} = 1 + (m-1)\rho \qquad\Longrightarrow\qquad n_{eff} = \frac{n}{\text{DEFF}}$$`
]

---
name: switchback-dgp

# Backup: DGP for the Switchback Simulation

.small[
```r
tau_sb <- 0.05   # true within-city treatment effect

sim_switchback <- function(carryover = 0) {
  n_periods <- 50; n_sw_cities <- 20
  d <- expand_grid(city_id = 1:n_sw_cities, period = 1:n_periods) |>
    mutate(notification = sample(c(0, 1), n(), replace = TRUE), # i.i.d.
switching
           city_effect = rep(rnorm(n_sw_cities, 0, 0.05),   # μ_c
                             times = n_periods),
           period_effect = rep(rnorm(n_periods, 0, 0.02),   # γ_t
                               each = n_sw_cities)) |>
    group_by(city_id) |>
    mutate(prev = lag(notification, default = 0),
           # Asymmetric carryover: residual treatment leaks into off periods ONLY.
           # On periods get the full direct effect regardless of prev.
           carry = carryover * (1 - notification) * prev) |>
    ungroup() |>
    mutate(y = 0.4 + city_effect + period_effect + tau_sb * notification +
             carry + rnorm(n(), 0, 0.02))
  mean(d$y[d$notification == 1]) - mean(d$y[d$notification == 0])
}
```
]

.small[
**Why the bias is downward.** With i.i.d. switching, `\(P(\text{prev}=1) = 0.5\)` in both arms — symmetric carryover (a `\(+\delta\)` that hits *any* off period equally) would cancel. But here the carryover is *asymmetric*: it shows up only on `\(T=0\)` periods that follow `\(T=1\)`, lifting the control mean by `\(\delta/2\)` while leaving the treated mean unchanged. So `\(E[\hat{\tau}] = \tau - \delta/2\)`. With `\(\tau = 0.05\)` and `\(\delta = 0.05\)`: estimate ≈ 0.025.
]

<a href="#switchback-formal" class="nav-btn">formal</a>

---
name: switchback-formal

# Backup: Switchback Estimator

For a switchback design with cities `\(c = 1, \ldots, C\)` and periods `\(t = 1, \ldots, T\)`:

`$$Y_{ct} = \mu_c + \gamma_t + \tau \cdot D_{ct} + \delta \cdot D_{c,t-1} + \varepsilon_{ct}$$`

--

- `\(\mu_c\)`: city fixed effect
- `\(\gamma_t\)`: period fixed effect
- `\(D_{ct}\)`: notification indicator for city `\(c\)` at time `\(t\)`
- `\(\delta\)`: carryover parameter (ideally `\(\delta = 0\)`)

--

The standard switchback estimator ignores carryover (the leading 2 is the inverse of the 50/50 assignment probability):

`$$\hat{\tau} = \frac{2}{CT}\sum_{c,t} (2D_{ct} - 1) Y_{ct}$$`

--

If `\(\delta \neq 0\)`, the estimator is biased:

`$$E[\hat{\tau}] = \tau + \delta \cdot \text{Corr}(D_{ct}, D_{c,t-1})$$`

Under i.i.d. randomization, `\(\text{Corr}(D_{ct}, D_{c,t-1}) \approx 0\)`, so the bias is small but not zero in finite samples.
Including `\(D_{c,t-1}\)` as a covariate or using burn-in periods mitigates this. --- name: switchback-effn # Backup: Why `\(n_{eff}\)` Grows from `\(C\)` to `\(C \cdot T\)` .small[ Same model: `\(Y_{ct} = \mu_c + \gamma_t + \tau D_{ct} + \varepsilon_{ct}\)`, with `\(\varepsilon_{ct} \sim (0, \sigma_\varepsilon^2)\)` i.i.d., `\(\mu_c \sim (0, \sigma_\mu^2)\)`, `\(\gamma_t \sim (0, \sigma_\gamma^2)\)`. Half-half assignment, `\(C\)` cities, `\(T\)` periods. **Cluster randomization** (one assignment per city; average `\(T\)` periods within each city): `$$\bar{Y}_c = \mu_c + \bar{\gamma} + \tau D_c + \bar{\varepsilon}_c \quad\Longrightarrow\quad \text{Var}(\bar{Y}_c) = \sigma_\mu^2 + \frac{\sigma_\gamma^2 + \sigma_\varepsilon^2}{T}$$` `$$\text{Var}(\hat{\tau}_{cl}) = \frac{4}{C}\!\left(\sigma_\mu^2 + \frac{\sigma_\gamma^2 + \sigma_\varepsilon^2}{T}\right) \;\;\xrightarrow{T \to \infty}\;\; \frac{4 \sigma_\mu^2}{C}$$` The city term `\(\sigma_\mu^2\)` does **not** average down with `\(T\)` — it sets a noise floor governed by the city count `\(C\)`. **Switchback** (i.i.d. re-randomization each period; FE estimator absorbs `\(\mu_c\)` and `\(\gamma_t\)`): `$$\hat{\tau}_{sb} = \frac{\sum_{ct} \tilde{D}_{ct}\, Y_{ct}}{\sum_{ct} \tilde{D}_{ct}^2},\qquad \tilde{D}_{ct} = D_{ct} - \bar{D}_c - \bar{D}_t + \bar{D}$$` `$$\text{Var}(\hat{\tau}_{sb}) = \frac{\sigma_\varepsilon^2}{\sum_{ct} \tilde{D}_{ct}^2} \approx \frac{\sigma_\varepsilon^2}{0.25 \cdot CT} = \frac{4 \sigma_\varepsilon^2}{C \cdot T}$$` The within-city demeaning kills `\(\mu_c\)`, so noise is driven by the idiosyncratic `\(\sigma_\varepsilon^2\)` over all `\(C \cdot T\)` city-periods. 
**Effective `\(n\)`** ( `\(n_{eff} = \sigma^2_{tot} / \text{Var}(\hat{\tau})\)`, scaled to SRS units): `$$n_{eff,\,cl} \;\approx\; \frac{\sigma^2_{tot}}{\sigma_\mu^2}\cdot\frac{C}{4} \;\sim\; O(C) \qquad\text{vs.}\qquad n_{eff,\,sb} \;\approx\; \frac{\sigma^2_{tot}}{\sigma_\varepsilon^2}\cdot\frac{CT}{4} \;\sim\; O(C \cdot T)$$` Switchback gains a factor of `\(T\)` when `\(\sigma_\mu^2\)` is non-trivial (high ICC). With carryover, `\(\sigma_\varepsilon^2\)` is replaced by an inflated effective error and the ratio shrinks toward 1. ] <a href="#switchback-strengths" class="nav-btn-br">← back</a> --- name: spillover-regression # Backup: The Spillover Regression Augment the DiD specification with exposure terms: .small[ `$$Y_{it} = \mu_i + \delta\, I_{t=1} + \sum_{j=1}^{J}\tau_j\,(T_{ij}\, I_{t=1}) + \beta_0\,(S_i\, I_{t=1}) + \theta\,(S_i\, I_{t=1}\, \mathbb{1}\{T_i \neq 0\}) + \sum_{j=0}^{J}\psi_j\,(T_{ij}\, I_{t=1}\, P_i) + \varepsilon_{it}$$` ] .pull-left[ .small[ **What each term identifies:** - `\(\tau_j\)` — direct effect of own assignment to arm `\(j\)` - `\(\beta_0\)` — spillover onto **control** studies from exposure to any treatment - `\(\theta = \beta_* - \beta_0\)` — *additional* spillover when own arm is treated; sign tells you **complementarity** ( `\(\theta>0\)` ) vs. **substitution** ( `\(\theta<0\)` ) - `\(\psi_j P_i\)` — controls for total network degree, so spillover ≠ "well-connected studies are different" <a href="#spillover-S-vs-P" class="inline-btn">S vs P</a> ] ] .pull-right[ .small[ **Restrictions used to cut 16 `\(\beta_{jk}\)` down to 2** (count: 16 → 12 → 4 → 2): 1. No spillover from control exposure: `\(\beta_{j0}=0\)` for all `\(j\)` ⇒ kills 4 ⇒ **12 left** 2. Spillover doesn't depend on *which* T-arm exposed (for **all** `\(j\)`): `\(\beta_{jk}=\beta_{j*}\)` for `\(k\in\{1,2,3\}\)` ⇒ collapses 3 entries per row ⇒ **4 left** ($\beta_{0*},\beta_{1*},\beta_{2*},\beta_{3*}$) 3. 
Spillover on T-arms is the same across T-arms: `\(\beta_{1*}=\beta_{2*}=\beta_{3*}\equiv\beta_*\)` ⇒ **2 left** ($\beta_0\equiv\beta_{0*}$ and `\(\beta_*\)`) Each restriction is testable: relax it, refit, compare `\(\widehat\beta_0\)` and `\(\widehat\theta\)`. ] ] <a href="#spillover-validity" class="nav-btn">validity →</a> --- name: spillover-validity # Backup: Validity of Exposure Mapping .small[ **What's identified for free (under random assignment):** - `\(T_{ij}\)` is randomized ⇒ `\(\tau_j\)` is identified unconditionally. - `\(S_i\)` is determined by *others'* (random) assignments + the **fixed** network structure ⇒ `\(S_i\)` is "as good as random" *conditional on the network*. So `\(\beta_0\)` is identified by within-arm variation in exposure. <a href="#spillover-id" class="inline-btn">proof</a> ] .small[ **What you need on top:** 1. **Independent random assignment** of `\(T_i\)` across units (or known stratified assignment with conditioning). 2. **Correct exposure mapping.** `\(S_i\)` captures all relevant interference channels. If second-degree (indirect) ties or out-of-sample co-authors matter, `\(\widehat\beta_0\)` absorbs them — possibly with the wrong sign. 3. **Linearity / additivity.** Spillover scales linearly in `\(S_i\)` and degree enters through `\(P_i\)` alone. Saturation, threshold, or non-monotone effects, or unmodeled `\(P_i \to Y\)` channels, break this. 4. **Network exogeneity.** The co-author network must not respond to (anticipated) treatment (no endogenous network formation between `\(t=0\)` and `\(t=1\)`). 5. **Stable composition of `\(P_i\)`.** `\(P_i\)` must be measured the same way for everyone, so that controlling for it actually absorbs the "high-degree studies are different" channel. <a href="#spillover-S-vs-P" class="inline-btn">S vs P</a> ] .highlight-box[ **Trade-off vs. 
ego-cluster.** Exposure mapping uses the *whole* sample (no nodes dropped) and recovers spillover magnitudes — but identification rests on a parametric model of how spillovers propagate. Ego-cluster guarantees clean identification on a smaller sample. The right choice depends on whether you trust the network model or the partition. ] <a href="#spillover-regression" class="nav-btn">← regression</a> --- name: spillover-id # Backup: Why `\(\beta_0\)` Is Identified .small[ **Setup.** Treatment `\(T_i \in \{0, 1, \dots, J\}\)` assigned independently across `\(i\)`. Network `\(\mathbf{M} = (M_{as})\)` is fixed (measured pre-treatment). Define exposure as `$$S_{ik} = \sum_{a \in A_i} \sum_{s \neq i} M_{as}\, \mathbb{1}\{T_s = k\}, \qquad S_i = \sum_{k \geq 1} S_{ik}, \qquad P_i = \sum_{k \geq 0} S_{ik}.$$` Maintained outcome model: `$$Y_i = \mu + \tau_{T_i} + \beta_0\, S_i + \theta\, S_i\, \mathbb{1}\{T_i \neq 0\} + \psi\, P_i + \varepsilon_i,\quad E[\varepsilon_i \mid T_i, S_i, P_i, \mathbf{M}] = 0.$$` ] -- .small[ **Lemma (random exposure).** `\(\;S_i \perp T_i \mid \mathbf{M}.\)` *Proof.* The inner sum runs over `\(s \neq i\)`, so `\(S_i\)` is a function of `\(\{T_s : s \neq i\}\)` and the fixed network `\(\mathbf{M}\)` alone. Independent assignment gives `\(T_i \perp \{T_s : s \neq i\}\)`, and `\(\mathbf{M}\)` is constant. Therefore `\(S_i \perp T_i \mid \mathbf{M}\)`. `\(\quad\square\)` (Same argument applies to `\(P_i\)`: `\(P_i \perp T_i \mid \mathbf{M}\)`.) 
] -- .small[ **Identification of `\(\beta_0\)`.** Restrict to controls ( `\(T_i = 0\)` ): `$$E[Y_i \mid T_i = 0, S_i, P_i, \mathbf{M}] \;=\; \mu + \tau_0 + \beta_0\, S_i + \psi\, P_i.$$` OLS of `\(Y\)` on `\(S_i\)` **and** `\(P_i\)` within the control sub-sample is consistent for `\(\beta_0\)` whenever the partial variance `\(\text{Var}(S_i \mid P_i, T_i = 0) > 0\)` — guaranteed by the lemma plus a non-degenerate network ( <a href="#spillover-S-vs-P" class="inline-btn">S vs P</a> shows this partial variance equals `\(\pi_T(1-\pi_T)\, P_i\)`). Within-treated OLS gives `\(\beta_0 + \theta\)`, so `\(\theta\)` is identified. `\(\quad\square\)` ] -- .small[ **What this *doesn't* prove.** (i) **Linearity in `\(S_i\)` and `\(P_i\)`** — nonlinear or threshold spillovers, indirect ties, or interactions break the result; OLS then recovers a best linear projection that is generally not the structural `\(\beta_0\)`. (ii) **Network exogeneity** — co-author networks must not respond to (anticipated) treatment. (iii) **Identifying variation may be small** — if `\(\rho(S, P) \approx 1\)` (dense networks, concentrated treatment fractions), `\(\widehat{\beta_0}\)` is consistent but high-variance. ] <a href="#spillover-validity" class="nav-btn-br">← validity</a> --- name: spillover-S-vs-P # Backup: `\(S_i\)` vs `\(P_i\)` — Why Both, Despite Collinearity? .small[ **Decomposition.** `\(P_i = S_{i0} + S_i\)`, where `\(S_{i0}\)` counts neighbors in the control arm. So under uniform 4-arm randomization with treatment fraction `\(\pi_T = 3/4\)`: `$$E[S_i \mid P_i] = \pi_T \cdot P_i = 0.75\, P_i$$` The expected `\(S_i\)` is a deterministic linear function of `\(P_i\)` — that's the source of the collinearity. The correlation `\(\rho(S, P)\)` is close to 1 in any sample with non-trivial `\(P_i\)`.
] -- .pull-left[ .small[ **Where identifying variation lives.** Decompose `\(S_i\)` into structural and randomized components: `$$S_i \;=\; \underbrace{\pi_T P_i}_{\text{structural (}\propto\text{degree)}} \;+\; \underbrace{(S_i - \pi_T P_i)}_{\text{randomized}}$$` Conditional on `\(P_i\)`, the second piece has variance `$$\text{Var}(S_i \mid P_i) = \pi_T(1-\pi_T)\, P_i \approx 0.19\, P_i$$` driven entirely by *which arm each neighbor landed in*. That's the variation `\(\widehat{\beta_0}\)` uses. ] ] .pull-right[ .small[ **Why include `\(P_i\)` anyway?** Drop `\(P_i\)` and `\(\widehat{\beta_0}\)` confounds two channels: 1. **Per-T-neighbor spillover** — the structural `\(\beta_0\)`. 2. **Degree effect** — well-connected studies may differ for unrelated reasons (more visibility, broader networks, prior relationships, productivity). Including `\(P_i\)` partials out the degree channel; the residual in `\(S_i \mid P_i\)` is *exactly* the randomized neighbor-arm draw, which is the only variation `\(\beta_0\)` should be identified from. **Cost:** collinearity inflates `\(\text{SE}(\widehat{\beta_0})\)`; power scales with `\(\sqrt{n\,\bar P\,\pi_T(1-\pi_T)}\)`. Identification requires both **many studies** *and* **non-trivial degree**. ] ] <a href="#spillover-validity" class="nav-btn-br">← validity</a> --- name: ego-cluster-formal # Backup: Ego-Cluster Randomization Define the **ego network** of unit `\(i\)` as `\(\mathcal{N}_i = \{i\} \cup \{j : j \text{ is connected to } i\}\)`. -- **Ego-cluster randomization** assigns the same treatment to all units in `\(\mathcal{N}_i\)`: `$$D_j = D_i \quad \forall j \in \mathcal{N}_i$$` -- This ensures that, for unit `\(i\)`, all first-degree neighbors have the same treatment status. If interference is limited to first-degree connections: `$$Y_i(D_i, \mathbf{D}_{\mathcal{N}_i}) = Y_i(D_i, D_i, \ldots, D_i) = Y_i(D_i)$$` and SUTVA is restored within the ego cluster. 
-- **Challenges:** - Ego clusters overlap: if `\(i\)` and `\(j\)` are neighbors, `\(\mathcal{N}_i \cap \mathcal{N}_j \neq \emptyset\)` - Must resolve overlaps (e.g., graph coloring, independent set sampling) - Power depends on the number of *non-overlapping* ego clusters - Second-degree interference is not addressed
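
One way to make the overlap resolution concrete: greedily sample egos whose closed neighborhoods are pairwise disjoint (a crude form of independent-set sampling), then flip one coin per kept cluster. A minimal sketch — the greedy rule and the random graph are illustrative, not a recommended production design:

```python
import numpy as np

def sample_disjoint_egos(adj, rng):
    """Greedily keep egos whose closed neighborhoods ({i} plus neighbors)
    do not touch any previously kept cluster."""
    used = np.zeros(len(adj), dtype=bool)
    egos = []
    for i in rng.permutation(len(adj)):
        block = np.append(np.flatnonzero(adj[i]), i)
        if not used[block].any():
            egos.append(int(i))
            used[block] = True
    return egos

rng = np.random.default_rng(2)
n = 400
adj = np.triu(rng.random((n, n)) < 0.01, 1)
adj = adj | adj.T

egos = sample_disjoint_egos(adj, rng)
D = np.full(n, -1)  # -1 = not in any kept cluster (dropped from analysis)
for i in egos:
    block = np.append(np.flatnonzero(adj[i]), i)
    D[block] = rng.integers(0, 2)  # one coin flip per ego cluster

# For every kept ego, all first-degree neighbors share its arm
ok = all((D[np.flatnonzero(adj[i])] == D[i]).all() for i in egos)
print(ok, len(egos))
```

Only the egos' outcomes enter the analysis; everyone left with `D == -1` is unused, which is exactly the power cost flagged above.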