class: center, middle, inverse, title-slide .title[ # Module 3: Designing Around Interference ] .subtitle[ ## Cluster Randomization, Switchbacks, and the Bias-Variance Tradeoff ] --- <style type="text/css"> .remark-code, .remark-inline-code { font-size: 80%; } .remark-slide-content { padding: 1em 2em; } .small { font-size: 80%; } .tiny { font-size: 65%; } .highlight-box { background: #fff3e0; border-left: 4px solid #e65100; padding: 0.5em 1em; margin: 0.5em 0; } .blue-box { background: #e3f2fd; border-left: 4px solid #1565c0; padding: 0.5em 1em; margin: 0.5em 0; } .nav-btn { position: absolute; bottom: 12px; left: 40px; font-size: 11px; background: #e8eaf6; padding: 2px 8px; border-radius: 3px; z-index: 100; text-decoration: none; color: #1a237e; } .nav-btn:hover { background: #c5cae9; } .inline-btn { font-size: 11px; background: #e8eaf6; padding: 2px 8px; border-radius: 3px; text-decoration: none; color: #1a237e; margin-right: 6px; vertical-align: middle; } .inline-btn:hover { background: #c5cae9; } .nav-btn-br { position: absolute; bottom: 12px; right: 70px; font-size: 11px; background: #e8eaf6; padding: 2px 8px; border-radius: 3px; z-index: 100; text-decoration: none; color: #1a237e; } .nav-btn-br:hover { background: #c5cae9; } </style> # Course Map <table> <tr><th>#</th><th>Module</th><th>Status</th></tr> <tr><td>1</td><td><a href="../module-01/slides.html">The Experimental Ideal</a></td><td>✓ done</td></tr> <tr><td>2</td><td><a href="../module-02/slides.html">SUTVA and When It Breaks</a></td><td>✓ done</td></tr> <tr><td><b>3</b></td><td><b>Designing Around Interference</b> <i>(you are here)</i></td><td>current</td></tr> <tr><td>4</td><td>Power and Sample Size</td><td>upcoming</td></tr> <tr><td>5</td><td><a href="../module-05/slides.html">Analyzing Experiments</a></td><td>✓ done</td></tr> <tr><td>6</td><td>Multiple Testing & Subgroups</td><td>upcoming</td></tr> <tr><td>7</td><td><a href="../module-07/slides.html">External Validity</a></td><td>✓ done</td></tr> 
<tr><td>8</td><td><a href="../module-08/slides.html">Beyond the A/B Test</a></td><td>✓ done</td></tr> </table> --- # Last Time: SUTVA Breaks Everywhere Module 2 showed that interference biases the naive ATE in two settings we'll keep using here: - **Marketplace (zone-notification experiment):** notified drivers compete for rides in the same zone, contaminating the comparison - **Networks (author-nudge experiment):** control researchers learn the new reporting practice from treated co-authors -- Today: **how to design experiments that work despite interference.** The core idea is simple: randomize at a level where interference is contained. -- .highlight-box[ Every design in this module trades **bias** (from interference) for **variance** (from fewer independent units). The art is finding the sweet spot. ] --- # The Setup: 40 Cities, 5,000 Drivers Each A ride-sharing platform operates in **40 cities** with **5,000 drivers each** (200,000 total). They want to test the **zone-notification** feature from Module 1. -- **Within-city interference (symmetric):** as more drivers are notified, the high-demand zone gets crowded — *every* driver in the city (notified or not) faces lower accept rates. Direct boost `\(\tau = 0.05\)`, interference `\(\lambda = 0.07\)` → policy effect = `\(-0.02\)` at full rollout. 
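
A quick arithmetic check of these two numbers, as a minimal sketch. The helper `accept_rate()` and the baseline `b = 0.4` are illustrative assumptions (the baseline mirrors the intercept in the backup DGP); `tau` and `lambda` are the values stated above.

```r
# Mean accept rate for a driver, given own treatment status and the
# treated share s in that driver's city (symmetric interference).
# b = 0.4 is an assumed baseline; tau and lambda are the slide's values.
accept_rate <- function(treated, s, b = 0.4, tau = 0.05, lambda = 0.07) {
  b + tau * treated - lambda * s
}

# Direct effect: treated vs. control driver facing the same 50% treated share
direct_effect <- accept_rate(1, s = 0.5) - accept_rate(0, s = 0.5)

# Policy effect: everyone notified (s = 1) vs. no one notified (s = 0)
policy_effect <- accept_rate(1, s = 1) - accept_rate(0, s = 0)

round(c(direct = direct_effect, policy = policy_effect), 3)
```

The interference term drops out of the first contrast because both drivers face the same `s`; it survives in the second because `s` moves from 0 to 1.
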
-- **Design options:** | Design | Unit of randomization | Independent units | Estimand | |--------|----------------------|-------------------|----------| | Individual | Driver | 200,000 | Direct effect (+0.05) | | City cluster | City | 40 | Policy effect (−0.02) | | Region cluster | 5-city region | 8 | Policy effect (−0.02) | --- name: indiv-cluster-main # Individual vs Cluster: Side by Side <img src="slides_files/figure-html/indiv-vs-cluster-1.png" style="display: block; margin: auto;" /> <a href="#dgp-indiv-cluster" class="nav-btn">DGP code</a> --- name: cluster-bias-main # Two Estimators, Two Estimands .pull-left[ .small[ **Individual rand → direct effect:** - 50/50 within every city, so frac_t ≈ 0.5 everywhere - Both arms face the *same* penalty `\(-\lambda \cdot 0.5\)`; it cancels - Naive recovers `\(\tau\)` = +0.05 - Answers: *"marginal benefit to one driver?"* **City-cluster rand → policy effect:** - Treated cities: frac_t = 1 (full penalty `\(-\lambda\)`) - Control cities: frac_t = 0 (no penalty) - Cluster contrast = `\(\tau - \lambda\)` = −0.02 - Answers: *"what happens at full rollout?"* Both are unbiased — for *different estimands*. Pick one based on the decision you're trying to make. ] ] .pull-right[ |Design |Treated mean |Control mean |Estimator gives | |:-------------------|:------------|:------------|:----------------------| |Individual (50/50) |b + τ − λ·½ |b − λ·½ |τ (direct, +0.05) | |Cluster (saturated) |b + τ − λ |b |τ − λ (policy, −0.02) | .highlight-box[ **Intuition:** the cluster contrast *includes* the interference term, so it captures the equilibrium effect under full rollout. Individual rand mechanically cancels the term out. ] ] .small[**Cost of clustering:** effective sample size collapses from **200,000** drivers (individual) to **40** cities (city cluster) to **8** regions (region cluster). The next slide quantifies this with the design effect.] 
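
As a rough preview of that calculation, the sketch below applies the standard design-effect formula `DEFF = 1 + (m - 1) * rho` to the three designs. The function names and the choice `rho = 0.10` are assumptions for illustration, not estimates from data.

```r
# Design effect and effective sample size under cluster randomization.
# rho = 0.10 is an assumed ICC for illustration only.
deff  <- function(m, rho) 1 + (m - 1) * rho
n_eff <- function(n, m, rho) n / deff(m, rho)

n_total <- 200000
designs <- data.frame(
  design = c("individual", "city cluster", "region cluster"),
  m      = c(1, 5000, 25000)   # drivers per randomization unit
)
designs$deff  <- deff(designs$m, rho = 0.10)
designs$n_eff <- round(n_eff(n_total, designs$m, rho = 0.10))
designs
```

With `m = 1` the design effect is exactly 1, so individual randomization keeps all 200,000 drivers as effective observations; larger clusters shrink `n_eff` sharply even at a modest ICC.
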
<a href="#cluster-bias-proof" class="nav-btn">decomposition</a> --- name: design-effect-icc # Design Effect and ICC The **design effect** is the variance inflation factor from clustering: `$$DEFF = 1 + (m - 1) \cdot \rho \qquad\Longrightarrow\qquad n_{eff} = \frac{n}{DEFF}$$` For a fixed effect size you want to detect, required `\(n\)` scales by `\(DEFF\)` — equivalently, your effective sample shrinks by the same factor. `\(m\)` is cluster size (drivers per city), `\(\rho\)` is intra-cluster correlation (ICC). -- .pull-left[ .small[ | m (drivers)| ICC| DEFF| Eff. N (of 200k)| Power (MDE = 15 pp)| |-----------:|----:|-------:|----------------:|-------------------:| | 100| 0.02| 3.0| 67,114| 100%| | 100| 0.05| 6.0| 33,613| 100%| | 100| 0.10| 10.9| 18,349| 100%| | 5,000| 0.02| 101.0| 1,981| 100%| | 5,000| 0.05| 251.0| 797| 99%| | 5,000| 0.10| 500.9| 399| 86%| | 25,000| 0.02| 501.0| 399| 86%| | 25,000| 0.05| 1,251.0| 160| 49%| | 25,000| 0.10| 2,500.9| 80| 28%| ] ] .pull-right[ .small[ **Plausible ranges for `\(\rho\)`:** | Setting | Typical `\(\rho\)` | |---------|----------------| | Online behavior (clicks, conversions) | 0.001 – 0.01 | | Marketplace metrics (accept rate, ride completion) | 0.02 – 0.10 | | Geo clusters (DMA, city) | 0.05 – 0.20 | | Strong common shocks (weather, surge) | 0.10 – 0.30 | **Rule of thumb:** anything you'd model with city or time-period fixed effects has `\(\rho\)` big enough to matter. Estimate `\(\rho\)` from a pilot or historical data — don't guess. ] ] <a href="#icc-derivation" class="nav-btn">ICC derivation</a> --- name: bias-variance-sim # The Bias-Variance Tradeoff: Simulated <img src="slides_files/figure-html/tradeoff-sim-1.png" style="display: block; margin: auto;" /> .small[ **Bias** vanishes once clusters reach city scale — interference is fully contained. **Variance** keeps climbing — fewer effective independent units. 
] --- name: switchback-intro # Switchback Designs An alternative to spatial clustering: **randomize over time**. -- .pull-left[ **How it works:** 1. Divide time into periods (e.g., 2-hour blocks) 2. Each period, randomize each city to notification on/off 3. Compare accept rates in notified vs. non-notified periods 4. Every city serves as its own control (removes city fixed effects) ] -- .pull-right[ <img src="slides_files/figure-html/switchback-diagram-1.png" style="display: block; margin: auto;" /> ] --- name: switchback-strengths # Switchback: Strengths and Risks .small[ **Strengths:** - Every unit (city) appears in both arms over time — removes unit-level confounders - **More "independent observations" than pure clustering:** identification is *within-city* (a city's "on" periods vs. its own "off" periods), not between cities. The ICC penalty is differenced out, and effective `\(N\)` grows from `\(\approx n_{cities}\)` toward `\(n_{cities} \times n_{periods}\)` <a href="#switchback-effn" class="inline-btn">proof</a> - Natural for marketplace experiments where interference is spatial, not temporal ] -- **Risks:** <img src="slides_files/figure-html/carryover-1.png" style="display: block; margin: auto;" /> .small[ **Carryover:** notification in period `\(t\)` affects behavior in `\(t+1\)` even if `\(t+1\)` is "off" (drivers who learned the zone keep going). **Fix:** insert *washout windows* — short buffers right after each switch that you exclude from analysis, letting behavior reset before measurement resumes. **Cost:** fewer usable minutes per period. ] --- name: switchback-sim # Switchback: Simulation <img src="slides_files/figure-html/switchback-sim-1.png" style="display: block; margin: auto;" /> <a href="#switchback-dgp" class="nav-btn">DGP code</a> --- name: switchback-period-length # Switchback: Choosing the Period Length The [carryover slide](#switchback-strengths) showed the bias for short periods. So why not very long blocks (one switch per week)? 
.small[
- **Short blocks** are bad because frequent switching lets leftover behavior from the previous block contaminate the next one; the shorter the block, the larger that contamination as a share of "clean" observations. Bias scales as `\(\delta/L\)`.
- **Long blocks** are bad because every period inside one block sees the *same* assignment: those periods are highly correlated, so each block contributes roughly *one* effective observation. Fewer effective comparisons → wider standard errors. Variance scales as `\(L/(CT)\)`.
- **Sweet spot:** squared bias falls fast (as `\(1/L^2\)`); variance creeps up (linear in `\(L\)`). MSE = bias² + variance is therefore U-shaped, with an interior minimum at `\(L^{*} \propto (\delta^{2} CT / \sigma^{2})^{1/3}\)` (Bojinov, Simchi-Levi & Zhao, 2023, *Mgmt Sci*). Use *block-clustered SE*; naive FE SE understates variance.
]

<img src="slides_files/figure-html/block-tradeoff-1.png" style="display: block; margin: auto;" />

<a href="#switchback-optimal" class="nav-btn">optimal-design sketch</a>

---

# Two-Sided Marketplace Designs

In a ride-sharing experiment, you can randomize **riders** instead of **drivers**:

.small[
**A note on the name.** "Two-sided" refers to the *marketplace* (riders + drivers), not the randomization. The basic version randomizes one side and *measures* on the other; the two-sidedness is what lets cross-side outcomes carry the causal signal. A stricter usage (Bajari et al. 2021, *Multiple Randomization Designs*) reserves "two-sided experiment" for designs that randomize **both** sides cross-classified, which can separately identify direct + spillover effects on each side.
] -- .pull-left[ **Randomize riders:** - 50% of riders see new pricing or matching - 50% see the old experience - Riders don't interact with each other (mostly) - No interference on the randomization side **Measure driver outcomes:** - Driver earnings, wait times, accept rate - These reflect the equilibrium effect ] -- .pull-right[ **Why it works:** - Each rider's experience is independent (they don't share rides) - But their collective demand changes driver supply allocation - You get the direct effect on riders AND the indirect effect on drivers **When it doesn't work:** - If treated riders change market-wide conditions (e.g., surge pricing triggers) - If the treatment fraction is large enough to move equilibrium prices ] --- name: geo-experiments # Geo Experiments .pull-left[ .small[ For online advertising and pricing experiments, **geographic units** are natural clusters. A **DMA** (Designated Market Area, defined by Nielsen) is a metro region used for TV and ad targeting — e.g., "New York DMA," "Los Angeles DMA." There are **210 DMAs in the US**, ranging from a few hundred thousand to ~20M people. Ad spend can be served at DMA granularity. **Treatment** in a geo experiment = changing ad spend in randomly chosen DMAs (typically *pausing* or *boosting* a campaign), holding control DMAs at the status quo. Outcome = revenue / conversions / signups per DMA. The contrast estimates **incremental lift** — what wouldn't have happened organically. ] ] .pull-right[ <img src="slides_files/figure-html/geo-diagram-1.png" style="display: block; margin: auto;" /> ] .blue-box[ **Google/Meta approach:** "GeoX" / "geo-based incrementality." Randomize ad spend across 50–200 DMAs. Power depends on between-DMA variance and number of DMAs, not number of users. ] --- # Ego-Cluster Randomization for Networks For **network interference** (the author-nudge experiment from Module 2), standard clustering is hard because network boundaries are unclear. 
-- .small[ **Ego-cluster approach** (node roles in the chart below): 1. Pick a focal study — the **ego** (center of each cluster, labeled "ego") 2. Add every study sharing co-authors with it — the **alters** (the four nodes around each ego) 3. Randomize the entire ego-cluster (ego + alters) to the same arm 4. Within-cluster spillover is absorbed; **bridges** (faded grey rings) are alters whose ties extend into the other arm — **drop them from the analysis sample** to avoid cross-arm contamination ] -- <img src="slides_files/figure-html/ego-cluster-1.png" style="display: block; margin: auto;" /> --- name: spillover-exposure-mapping # Approach 4: Model the Spillover (Exposure Mapping) The first three approaches **contain** interference by design. When you can't — diffuse networks, observational settings, small samples — **model the spillover** instead. .pull-left[ .small[ **Idea (Aronow & Samii, 2017, *Annals of Applied Stats*):** for each unit `\(i\)`, build an exposure measure `\(S_i\)` summarizing how its network neighbors are treated. Include `\(S_i\)` as a covariate; recover the direct effect *and* the spillover jointly. For study `\(i\)` with author set `\(A_i\)` (M2's author-nudge experiment), `\(S_{ik} = \sum_{a \in A_i} \sum_{s \neq i} M_{as} T_{sk}\)` counts other studies in arm `\(k\)` that share an author with `\(i\)`. - `\(S_i = \sum_{k \geq 1} S_{ik}\)` — exposure to *any* treatment arm - `\(P_i = \sum_{k \geq 0} S_{ik}\)` — total degree (incl. control) ] ] .pull-right[ <img src="slides_files/figure-html/spillover-diagram-1.png" style="display: block; margin: auto;" /> .small[ Study 2 shares author *b* with study 1 and *e* with study 3 ⇒ `\(S_2 = 1\)`, `\(P_2 = 2\)`. Study 4's only tie is out-of-sample ⇒ `\(S_4 = P_4 = 0\)`. 
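
The exposure counts can be reproduced with a few lines of matrix algebra. This is a minimal sketch: the incidence matrix `M`, the author label `g`, and the arm assignment `arm` are hypothetical stand-ins for the diagram (study 1 is assumed treated, the rest control).

```r
# Author-by-study incidence matrix M (rows = authors, cols = studies).
# Hypothetical example: author b links studies 1 and 2, author e links
# studies 2 and 3; study 4's only co-author tie is outside the sample.
M <- matrix(0, nrow = 3, ncol = 4,
            dimnames = list(c("b", "e", "g"), paste0("study", 1:4)))
M["b", c("study1", "study2")] <- 1
M["e", c("study2", "study3")] <- 1
M["g", "study4"] <- 1

# Assumed arm assignment (0 = control, 1 = treated); not taken from the slide.
arm <- c(study1 = 1, study2 = 0, study3 = 0, study4 = 0)

# links[i, s] = number of authors shared by studies i and s (s != i)
links <- t(M) %*% M
diag(links) <- 0

# S_ik as written above: shared-author links to other studies in arm k
exposure <- function(k) as.vector(links %*% as.numeric(arm == k))
S <- exposure(1)                # exposure to the treated arm
P <- exposure(0) + exposure(1)  # total in-sample degree
rbind(S = S, P = P)
```

Note that the formula counts shared-author *links*, so two studies sharing two authors would contribute 2, not 1.
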
] ] <a href="#spillover-regression" class="nav-btn">regression & validity</a> --- name: spillover-simulation # Approach 4: Simulation True parameters: `\(\tau_1=0.05\)`, `\(\tau_2=0.10\)`, `\(\tau_3=0.15\)` (direct), `\(\beta_0=0.05\)` (control spillover), `\(\theta=0.02\)` (extra on treated), `\(\psi=0.05\)` (direct effect of degree `\(P_i\)`). 500 studies, co-author network (mean degree ≈ 6), random T0/T1/T2/T3 assignment. <img src="slides_files/figure-html/spillover-sim-1.png" style="display: block; margin: auto;" /> <a href="#spillover-regression" class="nav-btn">regression & validity</a> --- # Application: City-Level Zone-Notification Test A ride-sharing company wants to test whether the **zone-notification** feature lifts driver accept rate at the *city* level. -- .pull-left[ **Why cluster at the city level?** - Within a city, notified drivers compete with non-notified for rides - A 50/50 split per city contaminates control's accept rate - City-level randomization: each city gets the feature on or off, no within-city mixing ] -- .pull-right[ **The power problem:** - Only 40 cities available - 20 treatment, 20 control - ICC of accept rate within cities ≈ 0.10 - DEFF `\(\approx 500 \;\Longrightarrow\;\)` Effective `\(N \approx 400\)` (out of 200k drivers) - For MDE = 15 pp, power `\(\approx 86\%\)` — need a **large** effect ] -- .highlight-box[ **The interview question:** "We have 40 cities. Can we run a cluster-randomized experiment?" Answer: probably, but only if the expected effect is large. Calculate the minimum detectable effect (MDE) and check if it's business-relevant. ] --- name: switchback-application # Application: Switchback for Zone Notifications **Alternative:** use a switchback design within each city. -- - Divide each day into 2-hour blocks - Randomize which blocks have the zone-notification feature on vs. 
off - 12 blocks/day × 30 days × 40 cities = 14,400 city-period cells - Effective N is much larger than 40 clusters -- **But:** - **Carryover**: if drivers learn the high-demand zone during "on" periods, they keep going there during "off" periods - Need washout periods (waste time and data) - Confounding with time-of-day effects (notifications matter more at peak times) -- .blue-box[ **Practical design:** stratified switchback. Within each city, pair "similar" time blocks (e.g., Monday 5–7pm with Wednesday 5–7pm) and randomize within pairs. This controls for time-of-day and day-of-week effects. ] --- name: geo-incrementality-app # Application: Geo Incrementality for Online Ads An e-commerce company wants to measure whether their TV ad campaign drives sales. -- **Design:** - 100 DMAs (Designated Market Areas) across the US - Randomly assign 50 DMAs to see ads, 50 to no ads - Measure sales lift at the DMA level - Control for baseline sales using **CUPED** — subtract a scaled pre-experiment outcome from each DMA's post outcome to soak up pre-existing variation (covered in M5) -- **Challenges:** - DMAs vary enormously in size (NYC vs. rural Montana) - Need to weight by population or stratify by DMA size - Spillovers: people in control DMAs can see ads on streaming/social media (imperfect compliance) - Attribution: sales in a DMA might come from visitors from neighboring DMAs -- .highlight-box[ **Key insight:** geo experiments work best when the treatment is *geographically contained* (TV, billboards, local promotions). They work poorly for digital treatments that cross geo boundaries (social media ads, viral content). ] --- name: choosing-design # Choosing the Right Design | Interference type | Recommended design | Key tradeoff | |-------------------|-------------------|--------------| | Within-market (supply/demand) | City-level cluster | Bias vs. power | | Temporal (pricing, algorithms) | Switchback | Carryover vs. 
efficiency | | Network (social, co-authorship) | Ego-cluster | Cluster size vs. coverage | | Geographic (ads, promotions) | Geo experiment | Geo count vs. precision | | Two-sided marketplace | Randomize one side | Indirect effects only | -- **Decision framework:** 1. **Where does interference happen?** Within cities? Across time? Through networks? 2. **Can you cluster at that level?** How many independent clusters do you have? 3. **Is the effect large enough to detect with that many clusters?** 4. **Is there carryover/leakage between clusters?** If so, switchback may be risky. --- name: bias-variance-summary # Summary of the Bias-Variance Tradeoff <img src="slides_files/figure-html/final-tradeoff-1.png" style="display: block; margin: auto;" /> --- name: design-walkthrough-prompt # Worked Example: Designing the Zone-Nudge Experiment .small[ **Interview prompt.** *"Uber's product team wants to ship a new in-app feature that nudges drivers toward high-demand zones with a tappable card. They want experimental evidence before launch. Design the experiment."* **First moves before designing.** Three questions to ask back: 1. **What's the deciding metric?** Driver accept rate, rider wait time, or driver earnings? Affects both the estimator and the minimum detectable effect. 2. **What's the scope of rollout if it works?** Single city, all US, both rider + driver sides? Determines what *estimand* we need (DE for partial test, TTE for ship/no-ship). 3. **Timeline & budget.** Two weeks vs two months changes the design (switchback vs phased geo). **Working assumptions for the rest of this chain:** primary metric is **driver accept rate**; ship/no-ship decision for the **whole city**; **2 weeks** of experiment time across **8 cities**. The design menu (this module): individual A/B → cluster A/B → switchback → geo → two-sided → ego-cluster. We'll walk through each decision. 
] <a href="#design-walkthrough-estimand" class="nav-btn-br">step 1 →</a> --- name: design-walkthrough-estimand # Step 1 — What's the Estimand? .small[ **The decision.** Ship-or-not is a **TTE question**: "what changes when *every* driver in the city sees the nudge vs *no* driver does?" **Why not the naive A/B effect.** Under marketplace crowd-out (M2), a 50/50 individual A/B at city level gives a biased blend of DE and SE, *not* TTE — typically overstating the policy effect because the treated group's crowding is partly offset by the control group's relief. ([M2 §29 worked example](../module-02/slides.html#estimands-formal-2): canonical DGP gives +5 pp from naive A/B but −2 pp from full rollout.) **Conclusion.** Estimand = TTE. We need a design that produces near `\(\mathbf{1}\)` and near `\(\mathbf{0}\)` exposure regimes side-by-side. ] <a href="#design-walkthrough-prompt" class="nav-btn">← prompt</a> <a href="#design-walkthrough-design" class="nav-btn-br">step 2 →</a> --- name: design-walkthrough-design # Step 2 — Pick the Unit of Randomization .small[ **Two designs target TTE directly under marketplace interference:** | Design | When it wins | When it loses | |---|---|---| | **Cluster A/B** (whole cities) | Outcome responds slowly (days/weeks) — e.g., retention, weekly active drivers. Many cities available. | Outcome responds fast (within hours) — e.g., accept rate. Few cities → ICC-driven power loss. | | **Switchback** (whole-city ON/OFF periods) | Outcome responds fast — driver positioning + accept rate move within minutes. | Long carryover (driver positioning persists across switches → bias). | *"Responds slowly/fast" = the outcome's natural adjustment timescale after a state change. Stock-like variables (retention) drift on a long timescale; flow-like variables (accept rate) equilibrate in minutes.* **For the zone nudge:** outcome (accept rate) reacts within minutes; we have only 8 cities (limited cluster A/B power). 
→ **Switchback wins.**

**Block length.** Per [M3 §12](../module-03/slides.html#11) (Bojinov et al. MSE-optimal `\(L^{*} \propto (\delta^2 CT/\sigma^2)^{1/3}\)`): start with **30-minute blocks**, **15-minute washout** after each switch. Driver positioning roughly resets in 5 min, but a longer washout buys margin against carryover under variable conditions (peak hours, weather shocks), at the cost of ~33% effective sample vs. a 5-min washout.
]

<a href="#design-walkthrough-estimand" class="nav-btn">← step 1</a> <a href="#design-walkthrough-power" class="nav-btn-br">step 3 →</a>

---
name: design-walkthrough-power

# Step 3 — Power and MDE

.small[
**Sample size, back of envelope:**

- 30-min block + 15-min washout = **45-min cycle** → 32 analyzed blocks/day per city.
- 8 cities × 14 days × 32 blocks/day = **3,584 city-blocks**, vs. 5,376 (= 8 × 14 × 48, i.e. 30-min cycles) with a 5-min washout; washout cost ~33%.
- Within-city block-cluster correlation in accept rate: assume `\(\text{ICC} \approx 0.05\)` (low: block-level variation dominates).
- `\(\sigma\)` at baseline accept rate `\(p = 0.4\)`: `\(\sigma = \sqrt{p(1-p)} = 0.49\)`.

**MDE for power = 0.8, `\(\alpha = 0.05\)` two-sided** (block-clustered SE):

`$$\text{MDE} \approx 2.8 \times \sigma \times \sqrt{\frac{\text{DE}_{\text{eff}}}{n_{\text{eff}}}} \approx 0.5 \text{ pp}$$`

where `\(\text{DE}_{\text{eff}}\)` is the design effect from block clustering. The **2.8** is `\(z_{1-\alpha/2} + z_{1-\beta} = 1.96 + 0.84\)`, the standard multiplier for "detect 80% of the time at two-sided `\(\alpha = 0.05\)`" <a href="#power-mde-z" class="inline-btn">why 1.96 + 0.84?</a>.

MDE rose from 0.4 pp (5-min washout) to **0.5 pp** because `\(n_{\text{eff}}\)` shrank by 33% and MDE `\(\propto 1/\sqrt{n}\)` (`\(\sqrt{1.5} \approx 1.22\)`). Still well below a plausible prior effect (~3 pp from a pilot).

**If MDE were too high:** add cities (more clusters), shorten blocks (more periods, accept more carryover bias), or run longer. M4 has the formal derivation.
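
The multiplier and the `\(\sqrt{1.5}\)` scaling can be checked with a short sketch. The function `mde()` is a hypothetical helper that implements the formula above; the slide does not pin down `n_eff` or the design effect, so they are left as inputs, and the block counts are used only to form a ratio.

```r
# MDE multiplier for two-sided alpha = 0.05 and power = 0.80
z_mult <- qnorm(1 - 0.05 / 2) + qnorm(0.80)   # about 1.96 + 0.84

# MDE per the formula above; n_eff and deff must be supplied by the user.
mde <- function(sigma, n_eff, deff = 1, mult = z_mult) {
  mult * sigma * sqrt(deff / n_eff)
}

sigma <- sqrt(0.4 * (1 - 0.4))   # Bernoulli sd at baseline accept rate 0.4

# Relative MDE penalty from analyzing 3,584 instead of 5,376 blocks.
# The ratio depends only on the two counts, not on sigma or deff.
ratio <- mde(sigma, n_eff = 3584) / mde(sigma, n_eff = 5376)
round(c(multiplier = z_mult, sigma = sigma, mde_ratio = ratio), 3)
```

Because sigma, the multiplier, and the design effect cancel in the ratio, the washout penalty on MDE is driven entirely by the loss of analyzed blocks.
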
]

<a href="#design-walkthrough-design" class="nav-btn">← step 2</a> <a href="#design-walkthrough-pap" class="nav-btn-br">step 4 →</a>

---
name: design-walkthrough-pap

# Step 4 — Pre-Analysis Plan

.small[
**Pre-spec before turning the experiment on:**

1. **Primary metric.** Driver accept rate (block × city level). One number, one test, no peeking.
2. **Guardrails.** Rider wait time (must not increase by ≥5%); cancellation rate (must not increase by ≥1 pp). Tested separately with Bonferroni-adjusted α.
3. **Stopping rule.** No interim looks. Analyze at end of pre-specified 14-day window.
4. **Subgroup analyses (pre-specified, exploratory).** City tier (top-3 vs other 5), driver tenure (≥1y vs <1y), time-of-day (peak vs off-peak). Reported with multiple-testing correction; not used for the ship/no-ship decision.
5. **Sensitivity analyses.** Drop the first day (driver onboarding to feature); drop the lowest-volume city (donor-pool sensitivity).

**Why this matters.** PAPs prevent garden-of-forking-paths inflation. Most "interesting" subgroup splits found post-hoc are noise.
]

<a href="#design-walkthrough-power" class="nav-btn">← step 3</a> <a href="#design-walkthrough-analysis" class="nav-btn-br">step 5 →</a>

---
name: design-walkthrough-analysis

# Step 5 — Analysis

.small[
**Estimator.** Fixed-effects regression at the block × city level:

`$$Y_{ct} = \mu_c + \gamma_t + \tau \cdot D_{ct} + \varepsilon_{ct}$$`

where `\(\mu_c\)` = city FE, `\(\gamma_t\)` = period (hour-of-day) FE. `\(\hat\tau\)` is the TTE estimate.

**Standard errors.** One obs = one 30-min block × city (`\(n = 3{,}584\)`; each city contributes `\(14 \times 32 = 448\)` blocks). **Cluster by city**, *not* by block, since each block IS one obs (block-clustered `\(\equiv\)` HC at this aggregation).
Why naive HC fails:

- **Within-city serial correlation in `\(\varepsilon_{ct}\)`.** Consecutive blocks for a city share transient shocks that `\(\mu_c\)` and `\(\gamma_t\)` don't absorb (carryover `\(D_{c,t-1}\)`, demand surges spanning multiple blocks, weather). So `\(\text{Cov}(\varepsilon_{ct}, \varepsilon_{ct'}) > 0\)`, whereas HC assumes 0.
- **Few-clusters caveat.** 8 cities is below the 30+ rule of thumb for asymptotic cluster-robust SE → use **wild cluster bootstrap** (Cameron-Gelbach-Miller) for inference, not a `\(z\)`-test.

**ITT vs LATE.** This rollout is a server-side flag flipped at the city × time level, so there is no driver-side compliance issue. `\(\text{ITT} = \text{LATE}\)`. *When would `\(\text{ITT} < \text{LATE}\)` here?* If "treated" were redefined as *notification received* (drivers offline, app killed, or notifications muted miss the push), then `\(\text{ITT} = \pi \cdot \text{LATE}\)` with `\(\pi\)` = delivery rate. Ship decision still cares about ITT: we deploy the flag flip, which inherits any delivery losses.

**Decision rule.** Pre-specified one-sided test at `\(\alpha = 0.05\)`, with the p-value taken from the wild cluster bootstrap rather than a normal approximation. Ship if `\(\hat\tau > 0\)` passes that test AND all guardrail tests pass.

**Robustness checks.** (a) drop first day; (b) split sample by city; (c) re-estimate with 60-min blocks; (d) report median in addition to ATE.
]

<a href="#design-walkthrough-pap" class="nav-btn">← step 4</a> <a href="#design-walkthrough-wrap" class="nav-btn-br">step 6 →</a>

---
name: design-walkthrough-wrap

# Step 6 — Caveats and What I'd Actually Run

.small[
**The plan in one paragraph.** *14-day switchback in 8 cities. 30-minute ON/OFF blocks with 15-minute washouts, randomized within-city. Primary metric = driver accept rate; guardrails on rider wait time and cancellations. City + period fixed effects, SEs clustered by city with wild-cluster-bootstrap inference. Pre-specified one-sided `\(\alpha = 0.05\)` ship/no-ship test.
Reported alongside subgroup heterogeneity for product-team learning.* **External-validity caveats.** - **City selection.** 8 cities are the test set; rollout might be 100+ with different demand densities. Use a saturation curve (M3 §16) if cross-city heterogeneity is plausible. - **Time-of-year.** A 2-week window may miss seasonal patterns (sports nights, weather shocks). Consider a longer arm or a multi-season repeat. - **Driver learning.** Switchback assumes drivers don't learn the schedule. If they do, ON-period effects shrink over time → bias toward zero TTE. **When NOT to use this design.** - **Slow-responding outcome** (e.g., weekly active drivers — adjusts over days/weeks): use cluster A/B at the city-week level instead. - **Very few cities** (<3): switchback's TTE estimate has too few independent units; switch to a structural model. - **Carryover suspected to dominate**: run a saturation experiment first to characterize the spillover function before switchback. ] <a href="#design-walkthrough-analysis" class="nav-btn">← step 5</a> --- # Key Takeaways 1. **Cluster randomization** eliminates interference bias by assigning entire clusters to the same arm. Cost: fewer independent units = more variance. 2. **Switchback designs** randomize over time. Good when interference is spatial. Watch for carryover effects. 3. **Two-sided marketplace designs**: randomize one side, measure the other. Works when one side doesn't interact with itself. 4. **Geo experiments**: natural clusters for advertising. Need many geos (50+) and geographically contained treatments. 5. **Ego-cluster randomization**: for network interference (e.g., the author-nudge experiment). Cluster = focal unit + neighbors. 6. **The bias-variance tradeoff** is the central tension. Minimize **RMSE**, not just bias or variance alone. 7. The **design effect** formula `\(n_{eff} = n / [1 + (m-1)\rho]\)` tells you how much power you lose from clustering. --- # Exercise Preview In the exercise you will: 1. 
Compare individual vs. city-level randomization in a zone-notification simulation 2. Show that clustering eliminates bias but increases variance 3. Sweep over cluster sizes and find the RMSE-minimizing level 4. Simulate a switchback design with and without carryover 5. Compute design effects for different ICC values See `exercise.R` for the starter code. --- # Course Map <table> <tr><th>#</th><th>Module</th><th>Status</th></tr> <tr><td>1</td><td><a href="../module-01/slides.html">The Experimental Ideal</a></td><td>✓ done</td></tr> <tr><td>2</td><td><a href="../module-02/slides.html">SUTVA and When It Breaks</a></td><td>✓ done</td></tr> <tr><td><b>3</b></td><td><b>Designing Around Interference</b> <i>(just finished)</i></td><td>✓ done</td></tr> <tr><td>4</td><td>Power and Sample Size</td><td>up next</td></tr> <tr><td>5</td><td><a href="../module-05/slides.html">Analyzing Experiments</a></td><td>✓ done</td></tr> <tr><td>6</td><td>Multiple Testing & Subgroups</td><td>upcoming</td></tr> <tr><td>7</td><td><a href="../module-07/slides.html">External Validity</a></td><td>✓ done</td></tr> <tr><td>8</td><td><a href="../module-08/slides.html">Beyond the A/B Test</a></td><td>✓ done</td></tr> </table> --- class: center, middle, inverse # Backup Slides --- name: cluster-bias-proof # Backup: Two Estimators, Two Estimands **Symmetric interference DGP** — every driver in a city competes against the same treated share `\(s\)`: `$$y_i^{(0)} = b_i - \lambda \cdot s_{c(i)}, \qquad y_i^{(1)} = b_i + \tau - \lambda \cdot s_{c(i)}$$` with `\(b_i = 0.4 + 0.2\,\text{exp}_i + \alpha_{c(i)}\)`, `\(\tau = 0.05\)` direct effect, `\(\lambda = 0.07\)` interference. 
-- The naive contrast under random assignment — `\(\bar b\)` balances across arms: `$$\hat{\tau}_{\text{naive}} = E[y^{(1)} \mid D=1] - E[y^{(0)} \mid D=0] = (\bar b + \tau - \lambda \bar s_{D=1}) - (\bar b - \lambda \bar s_{D=0}) = \tau - \lambda(\bar s_{D=1} - \bar s_{D=0})$$` -- **Individual rand** (50/50 in every city → `\(\bar s_{D=1} = \bar s_{D=0} = 0.5\)`): `$$\hat\tau_{\text{ind}} = \tau - \lambda(0.5 - 0.5) = \tau = 0.05 \quad \text{(direct effect)}$$` **City cluster** (treated cities `\(s = 1\)`, control cities `\(s = 0\)`): `$$\hat\tau_{\text{cluster}} = \tau - \lambda(1 - 0) = \tau - \lambda = -0.02 \quad \text{(policy effect)}$$` -- Both are unbiased — for *different* estimands. If your decision is "should we roll this out everywhere?", you want `\(\hat\tau_{\text{cluster}}\)`. <a href="#cluster-bias-main" class="nav-btn">← back</a> --- name: dgp-indiv-cluster # Backup: DGP for the Slide 5 Simulation .small[ ```r direct_effect <- 0.05 # per-driver direct boost from being notified interference <- 0.07 # symmetric crowd-out per share treated (hits BOTH arms) # Individual rand: 50/50 in every city → frac_t ≈ 0.5 everywhere sim_individual <- function() { d <- market |> mutate(notification = sample(rep(c(0, 1), each = n() / 2))) d |> group_by(city_id) |> mutate(frac_t = mean(notification), # ≈ 0.5 y0 = pmin(1, pmax(0, 0.4 + 0.2 * experience + city_effect # baseline - interference * frac_t)), # CONTROLS hit by competition y1 = pmin(1, pmax(0, 0.4 + 0.2 * experience + city_effect # baseline + direct_effect - interference * frac_t))) |> # TREATED also hit ungroup() |> mutate(y_obs = rbinom(n(), 1, prob = if_else(notification == 1, y1, y0))) |> summarise(ate = mean(y_obs[notification == 1]) - mean(y_obs[notification == 0])) } # City-cluster rand: every city is 100% same arm → frac_t is 0 or 1 sim_cluster <- function() { arms <- tibble(city_id = 1:n_cities, notification = sample(rep(c(0, 1), each = n_cities / 2))) market |> left_join(arms, by = "city_id") |> 
group_by(city_id) |> mutate(frac_t = mean(notification)) |> ungroup() |> mutate(y0 = pmin(1, pmax(0, 0.4 + 0.2 * experience + city_effect - interference * frac_t)), # 0 in control cities y1 = pmin(1, pmax(0, 0.4 + 0.2 * experience + city_effect + direct_effect - interference * frac_t)), # full λ in treated cities y_obs = rbinom(n(), 1, prob = if_else(notification == 1, y1, y0))) |> summarise(ate = mean(y_obs[notification == 1]) - mean(y_obs[notification == 0])) } ``` ] .small[ - **Individual:** penalty cancels (both arms get `\(-\lambda \cdot 0.5\)`) → naive = `\(\tau\)` = +0.05 (direct). - **Cluster:** treated cities take full `\(-\lambda\)`, control cities take none → ATE = `\(\tau - \lambda\)` = −0.02 (policy). ] <a href="#indiv-cluster-main" class="nav-btn">← back</a> --- name: icc-derivation # Backup: ICC and Design Effect Derivation .small[ **ICC:** `\(\rho = \dfrac{\sigma^2_b}{\sigma^2_b + \sigma^2_w}\)`, with `\(\sigma^2_b\)` = between-cluster variance, `\(\sigma^2_w\)` = within-cluster variance, and `\(\sigma^2 = \sigma^2_b + \sigma^2_w\)` = total variance. For `\(K\)` clusters of size `\(m\)` ( `\(n = Km\)` ): `$$\text{Var}(\bar{Y}_c) = \frac{\sigma^2_w}{m} + \sigma^2_b \qquad\Longrightarrow\qquad \text{Var}(\bar{Y}) = \frac{1}{K}\,\text{Var}(\bar{Y}_c) = \frac{\sigma^2}{Km}\bigl[1 + (m-1)\rho\bigr]$$` The last step substitutes `\(\sigma^2_w = (1-\rho)\,\sigma^2\)` and `\(\sigma^2_b = \rho\,\sigma^2\)`. Under simple random sampling, `\(\text{Var}_{SRS}(\bar{Y}) = \sigma^2 / n\)`. The ratio is the **design effect**: `$$\text{DEFF} = \frac{\text{Var}(\bar{Y})}{\text{Var}_{SRS}(\bar{Y})} = 1 + (m-1)\rho \qquad\Longrightarrow\qquad n_{eff} = \frac{n}{\text{DEFF}}$$` ] --- name: switchback-dgp # Backup: DGP for the Switchback Simulation .small[ ```r tau_sb <- 0.05 # true within-city treatment effect sim_switchback <- function(carryover = 0) { n_periods <- 50; n_sw_cities <- 20 d <- expand_grid(city_id = 1:n_sw_cities, period = 1:n_periods) |> mutate(notification = sample(c(0, 1), n(), replace = TRUE), # i.i.d.
switching city_effect = rep(rnorm(n_sw_cities, 0, 0.05), # μ_c (city_id varies slowest in expand_grid) each = n_periods), period_effect = rep(rnorm(n_periods, 0, 0.02), # γ_t (period cycles within each city) times = n_sw_cities)) |> group_by(city_id) |> mutate(prev = lag(notification, default = 0), # Asymmetric carryover: residual treatment leaks into off periods ONLY. # On periods get the full direct effect regardless of prev. carry = carryover * (1 - notification) * prev) |> ungroup() |> mutate(y = 0.4 + city_effect + period_effect + tau_sb * notification + carry + rnorm(n(), 0, 0.02)) mean(d$y[d$notification == 1]) - mean(d$y[d$notification == 0]) } ``` ] .small[ **Why the bias is downward.** With i.i.d. switching, `\(P(\text{prev}=1) = 0.5\)` in both arms, so symmetric carryover (a `\(+\delta\)` that hits *every* period following a treated one, on or off) would cancel. But here the carryover is *asymmetric*: it shows up only on `\(D=0\)` periods that follow `\(D=1\)`, lifting the control mean by `\(\delta/2\)` while leaving the treated mean unchanged. So `\(E[\hat{\tau}] = \tau - \delta/2\)`. With `\(\tau = 0.05\)` and `\(\delta = 0.05\)`: estimate ≈ 0.025. ] <a href="#switchback-formal" class="nav-btn">formal</a> --- name: switchback-formal # Backup: Switchback Estimator .small[ **Model.** For a switchback design with cities `\(c = 1, \ldots, C\)` and periods `\(t = 1, \ldots, T\)`: `$$Y_{ct} = \mu_c + \gamma_t + \tau \cdot D_{ct} + \delta \cdot D_{c,t-1} + \varepsilon_{ct}$$` where `\(\mu_c\)` = city FE, `\(\gamma_t\)` = period FE, `\(D_{ct}\)` = treatment indicator for city `\(c\)` at time `\(t\)`, and `\(\delta\)` = carryover parameter (ideally `\(\delta = 0\)`). ] -- .small[ **Naive estimator** (ignores carryover; with half of the city-periods treated, this is the difference in means): `$$\hat{\tau} = \frac{2}{CT}\sum_{c,t} (2D_{ct} - 1) Y_{ct}$$` ] -- .small[ **Bias under carryover.** If `\(\delta \neq 0\)`: `$$E[\hat{\tau}] = \tau + \delta \cdot \text{Corr}(D_{ct}, D_{c,t-1})$$` Under i.i.d. randomization, `\(\text{Corr}(D_{ct}, D_{c,t-1}) \approx 0\)`, so the bias is small but nonzero in finite samples.
Including `\(D_{c,t-1}\)` as a covariate or using burn-in periods mitigates this. ] --- name: switchback-effn # Backup: Why `\(n_{eff}\)` Grows from `\(C\)` to `\(C \cdot T\)` .small[ Same model: `\(Y_{ct} = \mu_c + \gamma_t + \tau D_{ct} + \varepsilon_{ct}\)`, with `\(\varepsilon_{ct} \sim (0, \sigma_\varepsilon^2)\)` i.i.d., `\(\mu_c \sim (0, \sigma_\mu^2)\)`, `\(\gamma_t \sim (0, \sigma_\gamma^2)\)`. Half-half assignment, `\(C\)` cities, `\(T\)` periods. **Cluster randomization** (one assignment per city; average `\(T\)` periods within each city): `$$\bar{Y}_c = \mu_c + \bar{\gamma} + \tau D_c + \bar{\varepsilon}_c \quad\Longrightarrow\quad \text{Var}(\bar{Y}_c) = \sigma_\mu^2 + \frac{\sigma_\gamma^2 + \sigma_\varepsilon^2}{T}$$` The period average `\(\bar{\gamma}\)` is shared by every city, so it cancels in the treated-minus-control contrast and only `\(\mu_c\)` and `\(\bar{\varepsilon}_c\)` remain: `$$\text{Var}(\hat{\tau}_{cl}) = \frac{4}{C}\!\left(\sigma_\mu^2 + \frac{\sigma_\varepsilon^2}{T}\right) \;\;\xrightarrow{T \to \infty}\;\; \frac{4 \sigma_\mu^2}{C}$$` The city term `\(\sigma_\mu^2\)` does **not** average down with `\(T\)`: it sets a noise floor governed by the city count `\(C\)`. **Switchback** (i.i.d. re-randomization each period; FE estimator absorbs `\(\mu_c\)` and `\(\gamma_t\)`): `$$\hat{\tau}_{sb} = \frac{\sum_{ct} \tilde{D}_{ct}\, Y_{ct}}{\sum_{ct} \tilde{D}_{ct}^2},\qquad \tilde{D}_{ct} = D_{ct} - \bar{D}_c - \bar{D}_t + \bar{D}$$` `$$\text{Var}(\hat{\tau}_{sb}) = \frac{\sigma_\varepsilon^2}{\sum_{ct} \tilde{D}_{ct}^2} \approx \frac{\sigma_\varepsilon^2}{0.25 \cdot CT} = \frac{4 \sigma_\varepsilon^2}{C \cdot T}$$` The within-city demeaning kills `\(\mu_c\)`, so noise is driven by the idiosyncratic `\(\sigma_\varepsilon^2\)` over all `\(C \cdot T\)` city-periods.
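
A small Monte Carlo can illustrate the gap between the two designs. This is a sketch, not the course simulation: it borrows `\(C = 20\)`, `\(T = 50\)`, and the standard deviations 0.05, 0.02, 0.02 from the switchback DGP slide, assumes no carryover, and the names `one_draw`, `T_per`, and `y_base` are made up for illustration.

```r
set.seed(42)
C <- 20; T_per <- 50; tau <- 0.05
sd_mu <- 0.05; sd_gamma <- 0.02; sd_eps <- 0.02

one_draw <- function() {
  mu     <- rnorm(C, 0, sd_mu)          # city effects
  gamma  <- rnorm(T_per, 0, sd_gamma)   # period effects
  y_base <- outer(mu, gamma, "+") +
    matrix(rnorm(C * T_per, 0, sd_eps), C, T_per)   # rows = cities, cols = periods

  # Cluster randomization: half the cities treated in every period
  D_cl   <- sample(rep(0:1, each = C / 2))
  Y_cl   <- y_base + tau * D_cl          # recycling applies D_cl[c] to row c
  city_m <- rowMeans(Y_cl)
  tau_cl <- mean(city_m[D_cl == 1]) - mean(city_m[D_cl == 0])

  # Switchback: independent coin flip per city-period, two-way demeaned estimator
  D_sb  <- matrix(rbinom(C * T_per, 1, 0.5), C, T_per)
  Y_sb  <- y_base + tau * D_sb
  D_til <- D_sb - rowMeans(D_sb) - rep(colMeans(D_sb), each = C) + mean(D_sb)
  tau_sb <- sum(D_til * Y_sb) / sum(D_til^2)

  c(cluster = tau_cl, switchback = tau_sb)
}

draws      <- replicate(500, one_draw())   # 2 x 500 matrix of estimates
design_var <- apply(draws, 1, var)         # sampling variance by design
design_var
```

Both estimators should center near `tau`; the cluster design's sampling variance should come out much larger because the city effects never difference out.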
**Effective `\(n\)`** ( `\(n_{eff} = \sigma^2_{tot} / \text{Var}(\hat{\tau})\)`, scaled to SRS units): `$$n_{eff,\,cl} \;\approx\; \frac{\sigma^2_{tot}}{\sigma_\mu^2}\cdot\frac{C}{4} \;\sim\; O(C) \qquad\text{vs.}\qquad n_{eff,\,sb} \;\approx\; \frac{\sigma^2_{tot}}{\sigma_\varepsilon^2}\cdot\frac{CT}{4} \;\sim\; O(C \cdot T)$$` Switchback gains a factor of `\(T\)` when `\(\sigma_\mu^2\)` is non-trivial (high ICC). With carryover, `\(\sigma_\varepsilon^2\)` is replaced by an inflated effective error and the ratio shrinks toward 1. ] <a href="#switchback-strengths" class="nav-btn-br">← back</a> --- name: switchback-optimal # Backup: Switchback MSE-Optimal Block Length .small[ **Setup.** `\(C\)` cities, `\(T\)` periods each, block length `\(L\)`, total assignments per city `\(= T/L\)`. Markov-1 carryover model: `$$Y_{ct} = \mu_c + \gamma_t + \tau D_{ct} + \delta D_{c,t-1} + \varepsilon_{ct}, \qquad \varepsilon_{ct} \overset{iid}{\sim} (0, \sigma^2)$$` **Bias.** Under block-randomized `\(D_{ct}\)`, the fraction of `\(D=0\)` periods that follow a `\(D=1\)` period is `\(\approx 1/(2L)\)` — only the first period of each `\(D=0\)` block can be a transition, half of which follow `\(D=1\)`. The naive estimator inherits a downward shift: `$$E[\hat\tau] \;\approx\; \tau - \frac{\delta}{2L} \quad\Longrightarrow\quad \text{Bias}^{2} \;\propto\; \delta^{2}/L^{2}$$` **Variance.** Within a block of length `\(L\)`, the `\(L\)` observations share the same assignment, so they contribute one independent treatment-decision rather than `\(L\)`. 
With block-clustered SE: `$$\text{Var}(\hat\tau) \;\approx\; \frac{4\sigma^{2} L}{CT} \quad\Longrightarrow\quad \text{Variance} \;\propto\; L / (CT)$$` **MSE-optimal block length.** Minimize `\(\text{MSE}(L) = a/L^{2} + bL\)` where `\(a = \delta^{2}/4\)` and `\(b = 4\sigma^{2}/(CT)\)`: `$$L^{*} \;=\; \left(\frac{2a}{b}\right)^{1/3} \;\propto\; \left(\frac{\delta^{2} \cdot CT}{\sigma^{2}}\right)^{1/3}$$` ] -- .small[ **Implications.** Larger carryover `\(\delta\)` → longer blocks. More periods `\(T\)` or cities `\(C\)` → longer blocks (variance scales away faster than bias). Higher noise `\(\sigma^{2}\)` → shorter blocks. **Bojinov-Simchi-Levi-Zhao (2023)** prove a related minimax-MSE result for the general switching-design problem and provide a carryover-corrected implementable scheme. In practice: pre-estimate `\(\delta\)` from a pre-period A/A test, plug into `\(L^{*}\)`, and cluster SE at the block. ] <a href="#switchback-strengths" class="nav-btn-br">← back</a> --- name: power-mde-z # Backup: Why MDE = (1.96 + 0.84) × SE? .small[ **The formula.** For a two-sided test at level `\(\alpha\)` with target power `\(1 - \beta\)`: `$$\boxed{\;\text{MDE} \;=\; \bigl(z_{1-\alpha/2} \;+\; z_{1-\beta}\bigr) \times \text{SE}\;}$$` At `\(\alpha = 0.05\)`, `\(1-\beta = 0.80\)`: `\(\text{MDE} = (1.96 + 0.84) \times \text{SE} = 2.80 \times \text{SE}\)`. **Derivation.** Test stat `\(T = \hat\delta / \text{SE}\)`. Under `\(H_0\)`: `\(T \sim N(0,1)\)`; under `\(H_1: \delta = \text{MDE} > 0\)`: `\(T \sim N(\text{MDE}/\text{SE},\,1)\)`. - **1.96 = `\(z_{0.975}\)`.** Two-sided `\(\alpha = 0.05\)`: reject when `\(|T| > c\)`. Want 2.5% mass in each tail of `\(N(0,1)\)`. - **0.84 = `\(z_{0.80}\)`.** Power `\(= P(T > 1.96 \mid H_1) = 0.80\)` (lower-tail negligible). Standardizing gives `\(1.96 - \text{MDE}/\text{SE} = -0.84\)`, so `\(\text{MDE}/\text{SE} = 1.96 + 0.84\)`. 
`\(\square\)` **Intuition.** `\(1.96 \times \text{SE}\)` is the minimum effect that *can* reach significance (50% power). The extra `\(0.84 \times \text{SE}\)` shifts the alternative distribution far enough right that 80% of its mass lies above the critical value. **Reference table.** Multipliers at `\(\alpha=0.05\)` two-sided, i.e. `\(1.96 + z_{1-\beta}\)`: | Power → | 50% | 80% | 90% | 95% | |---|---|---|---|---| | `\(z_{1-\beta}\)` | 0 | 0.84 | 1.28 | 1.64 | | **Multiplier** | 1.96 | **2.80** | 3.24 | 3.60 | ] <a href="#design-walkthrough-power" class="nav-btn-br">← back</a> --- name: spillover-regression # Backup: The Spillover Regression Augment the DiD specification with exposure terms: .small[ `$$Y_{it} = \mu_i + \delta\, I_{t=1} + \sum_{j=1}^{J}\tau_j\,(T_{ij}\, I_{t=1}) + \beta_0\,(S_i\, I_{t=1}) + \theta\,(S_i\, I_{t=1}\, \mathbb{1}\{T_i \neq 0\}) + \sum_{j=0}^{J}\psi_j\,(T_{ij}\, I_{t=1}\, P_i) + \varepsilon_{it}$$` ] .pull-left[ .small[ **What each term identifies:** - `\(\tau_j\)`: direct effect of own assignment to arm `\(j\)` - `\(\beta_0\)`: spillover onto **control** studies from exposure to any treatment - `\(\theta = \beta_* - \beta_0\)`: *additional* spillover when own arm is treated; sign tells you **complementarity** ( `\(\theta>0\)` ) vs. **substitution** ( `\(\theta<0\)` ) - `\(\psi_j P_i\)`: controls for total network degree, so spillover ≠ "well-connected studies are different" <a href="#spillover-S-vs-P" class="inline-btn">S vs P</a> ] ] .pull-right[ .small[ **Restrictions used to cut 16 `\(\beta_{jk}\)` down to 2** (count: 16 → 12 → 4 → 2): 1. No spillover from control exposure: `\(\beta_{j0}=0\)` for all `\(j\)` ⇒ kills 4 ⇒ **12 left** 2. Spillover doesn't depend on *which* T-arm exposed (for **all** `\(j\)`): `\(\beta_{jk}=\beta_{j*}\)` for `\(k\in\{1,2,3\}\)` ⇒ collapses 3 entries per row ⇒ **4 left** ( `\(\beta_{0*},\beta_{1*},\beta_{2*},\beta_{3*}\)` ) 3.
Spillover on T-arms is the same across T-arms: `\(\beta_{1*}=\beta_{2*}=\beta_{3*}\equiv\beta_*\)` ⇒ **2 left** ( `\(\beta_0\equiv\beta_{0*}\)` and `\(\beta_*\)` ) Each restriction is testable: relax it, refit, compare `\(\widehat\beta_0\)` and `\(\widehat\theta\)`. ] ] <a href="#spillover-validity" class="nav-btn">validity →</a> --- name: spillover-validity # Backup: Validity of Exposure Mapping .small[ **What's identified for free (under random assignment):** - `\(T_{ij}\)` is randomized ⇒ `\(\tau_j\)` is identified unconditionally. - `\(S_i\)` is determined by *others'* (random) assignments + the **fixed** network structure ⇒ `\(S_i\)` is "as good as random" *conditional on the network*. So `\(\beta_0\)` is identified by within-arm variation in exposure. <a href="#spillover-id" class="inline-btn">proof</a> ] .small[ **What you need on top:** 1. **Independent random assignment** of `\(T_i\)` across units (or known stratified assignment with conditioning). 2. **Correct exposure mapping.** `\(S_i\)` captures all relevant interference channels. If second-degree (indirect) ties or out-of-sample co-authors matter, `\(\widehat\beta_0\)` absorbs them, possibly with the wrong sign. 3. **Linearity / additivity.** Spillover scales linearly in `\(S_i\)` and degree enters through `\(P_i\)` alone. Saturation, threshold, or non-monotone effects, or unmodeled `\(P_i \to Y\)` channels, break this. 4. **Network exogeneity.** The co-author network must not respond to (anticipated) treatment (no endogenous network formation between `\(t=0\)` and `\(t=1\)`). 5. **Stable composition of `\(P_i\)`.** `\(P_i\)` must be measured the same way for everyone, so that controlling for it actually absorbs the "high-degree studies are different" channel. <a href="#spillover-S-vs-P" class="inline-btn">S vs P</a> ] .highlight-box[ **Trade-off vs.
ego-cluster.** Exposure mapping uses the *whole* sample (no nodes dropped) and recovers spillover magnitudes — but identification rests on a parametric model of how spillovers propagate. Ego-cluster guarantees clean identification on a smaller sample. The right choice depends on whether you trust the network model or the partition. ] <a href="#spillover-regression" class="nav-btn">← regression</a> --- name: spillover-id # Backup: Why `\(\beta_0\)` Is Identified .small[ **Setup.** Treatment `\(T_i \in \{0, 1, \dots, J\}\)` assigned independently across `\(i\)`. Network `\(\mathbf{M} = (M_{as})\)` is fixed (measured pre-treatment). Define exposure as `$$S_{ik} = \sum_{a \in A_i} \sum_{s \neq i} M_{as}\, \mathbb{1}\{T_s = k\}, \qquad S_i = \sum_{k \geq 1} S_{ik}, \qquad P_i = \sum_{k \geq 0} S_{ik}.$$` Maintained outcome model: `$$Y_i = \mu + \tau_{T_i} + \beta_0\, S_i + \theta\, S_i\, \mathbb{1}\{T_i \neq 0\} + \psi\, P_i + \varepsilon_i,\quad E[\varepsilon_i \mid T_i, S_i, P_i, \mathbf{M}] = 0.$$` ] -- .small[ **Lemma (random exposure).** `\(\;S_i \perp T_i \mid \mathbf{M}.\)` *Proof.* The inner sum runs over `\(s \neq i\)`, so `\(S_i\)` is a function of `\(\{T_s : s \neq i\}\)` and the fixed network `\(\mathbf{M}\)` alone. Independent assignment gives `\(T_i \perp \{T_s : s \neq i\}\)`, and `\(\mathbf{M}\)` is constant. Therefore `\(S_i \perp T_i \mid \mathbf{M}\)`. `\(\quad\square\)` (Same argument applies to `\(P_i\)`: `\(P_i \perp T_i \mid \mathbf{M}\)`.) 
] -- .small[ **Identification of `\(\beta_0\)`.** Restrict to controls ( `\(T_i = 0\)` ): `$$E[Y_i \mid T_i = 0, S_i, P_i, \mathbf{M}] \;=\; \mu + \tau_0 + \beta_0\, S_i + \psi\, P_i.$$` OLS of `\(Y\)` on `\(S_i\)` **and** `\(P_i\)` within the control sub-sample is consistent for `\(\beta_0\)` whenever the partial variance `\(\text{Var}(S_i \mid P_i, T_i = 0) > 0\)` — guaranteed by the lemma plus a non-degenerate network ( <a href="#spillover-S-vs-P" class="inline-btn">S vs P</a> shows this partial variance equals `\(\pi_T(1-\pi_T)\, P_i\)`). Within-treated OLS gives `\(\beta_0 + \theta\)`, so `\(\theta\)` is identified. `\(\quad\square\)` ] -- .small[ **What this *doesn't* prove.** (i) **Linearity of `\(S_i\)` and `\(P_i\)`** — nonlinear or threshold spillovers, indirect ties, or interactions break the result; OLS then recovers a best-linear-projection that is generally not the structural `\(\beta_0\)`. (ii) **Network exogeneity** — co-author networks must not respond to (anticipated) treatment. (iii) **Identifying variation may be small** — if `\(\rho(S, P) \approx 1\)` (dense networks, concentrated treatment fractions), `\(\widehat{\beta_0}\)` is consistent but high-variance. ] <a href="#spillover-validity" class="nav-btn-br">← validity</a> --- name: spillover-S-vs-P # Backup: `\(S_i\)` vs `\(P_i\)` — Why Both, Despite Collinearity? .small[ **Decomposition.** `\(P_i = S_{i0} + S_i\)`, where `\(S_{i0}\)` counts neighbors in the control arm. So under uniform 4-arm randomization with treatment fraction `\(\pi_T = 3/4\)`: `$$E[S_i \mid P_i] = \pi_T \cdot P_i = 0.75\, P_i$$` The expected `\(S_i\)` is a deterministic linear function of `\(P_i\)` — that's the source of the collinearity. The correlation `\(\rho(S, P)\)` is close to 1 in any sample with non-trivial `\(P_i\)`. 
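
As a hedged numerical check, assume every tie has weight 1 and neighbors are assigned independently, so that `\(S_i \mid P_i \sim \text{Binomial}(P_i, \pi_T)\)`. The helper `exposure_moments` is hypothetical and computes the conditional moments exactly from the binomial pmf.

```r
pi_T <- 3 / 4   # 3 of 4 arms are treatment arms

# Exact conditional mean and variance of S given degree P,
# assuming S | P ~ Binomial(P, pi_T)
exposure_moments <- function(P, pi_T) {
  k   <- 0:P
  pmf <- dbinom(k, size = P, prob = pi_T)
  m   <- sum(k * pmf)
  c(mean = m, var = sum((k - m)^2 * pmf))
}

moments_8 <- exposure_moments(8, pi_T)
moments_8
```

At `P = 8` this gives mean 6 and variance 1.5: the expected exposure tracks degree, and the spread around it comes only from which arm each neighbor drew.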
] -- .pull-left[ .small[ **Where identifying variation lives.** Decompose `\(S_i\)` into structural and randomized components: `$$S_i \;=\; \underbrace{\pi_T P_i}_{\text{structural (}\propto\text{degree)}} \;+\; \underbrace{(S_i - \pi_T P_i)}_{\text{randomized}}$$` Conditional on `\(P_i\)`, the second piece has variance `$$\text{Var}(S_i \mid P_i) = \pi_T(1-\pi_T)\, P_i \approx 0.19\, P_i$$` driven entirely by *which arm each neighbor landed in*. That's the variation `\(\widehat{\beta_0}\)` uses. ] ] .pull-right[ .small[ **Why include `\(P_i\)` anyway?** Drop `\(P_i\)` and `\(\widehat{\beta_0}\)` confounds two channels: 1. **Per-T-neighbor spillover** — the structural `\(\beta_0\)`. 2. **Degree effect** — well-connected studies may differ for unrelated reasons (more visibility, broader networks, prior relationships, productivity). Including `\(P_i\)` partials out the degree channel; the residual in `\(S_i \mid P_i\)` is *exactly* the randomized neighbor-arm draw, which is the only variation `\(\beta_0\)` should be identified from. **Cost:** collinearity inflates `\(\text{SE}(\widehat{\beta_0})\)`; power scales with `\(\sqrt{n\,\bar P\,\pi_T(1-\pi_T)}\)`. Identification requires both **many studies** *and* **non-trivial degree**. ] ] <a href="#spillover-validity" class="nav-btn-br">← validity</a> --- name: ego-cluster-formal # Backup: Ego-Cluster Randomization Define the **ego network** of unit `\(i\)` as `\(\mathcal{N}_i = \{i\} \cup \{j : j \text{ is connected to } i\}\)`. -- **Ego-cluster randomization** assigns the same treatment to all units in `\(\mathcal{N}_i\)`: `$$D_j = D_i \quad \forall j \in \mathcal{N}_i$$` -- This ensures that, for unit `\(i\)`, all first-degree neighbors have the same treatment status. If interference is limited to first-degree connections: `$$Y_i(D_i, \mathbf{D}_{\mathcal{N}_i}) = Y_i(D_i, D_i, \ldots, D_i) = Y_i(D_i)$$` and SUTVA is restored within the ego cluster. 
-- **Challenges:** - Ego clusters overlap: if `\(i\)` and `\(j\)` are neighbors, `\(\mathcal{N}_i \cap \mathcal{N}_j \neq \emptyset\)` - Must resolve overlaps (e.g., graph coloring, independent set sampling) - Power depends on the number of *non-overlapping* ego clusters - Second-degree interference is not addressed
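
One simple way to resolve overlaps is a greedy pass that keeps an ego network only if none of its members is already claimed. This is a minimal sketch, not a recommended production algorithm: `adj` is a hypothetical adjacency list, `select_ego_clusters` is an illustrative name, and the result depends on the visiting order.

```r
# Greedy selection of non-overlapping ego clusters.
# adj[[i]] holds the first-degree neighbors of unit i (hypothetical input format).
select_ego_clusters <- function(adj, visit_order = seq_along(adj)) {
  claimed  <- rep(FALSE, length(adj))
  clusters <- list()
  for (i in visit_order) {
    ego_net <- unique(c(i, adj[[i]]))     # N_i = {i} plus first-degree neighbors
    if (!any(claimed[ego_net])) {         # keep N_i only if it is fully unclaimed
      claimed[ego_net] <- TRUE
      clusters[[as.character(i)]] <- ego_net
    }
  }
  clusters
}

# Toy path graph 1-2-3-4-5-6
adj <- list(2L, c(1L, 3L), c(2L, 4L), c(3L, 5L), c(4L, 6L), 5L)
clusters <- select_ego_clusters(adj)

# One coin flip per ego cluster; every member inherits the ego's arm
set.seed(1)
arm <- rbinom(length(clusters), 1, 0.5)
assignment <- data.frame(
  unit = unlist(clusters, use.names = FALSE),
  ego  = rep(as.integer(names(clusters)), lengths(clusters)),
  D    = rep(arm, lengths(clusters))
)
assignment
```

On the path graph the pass keeps egos 1 and 4 and leaves unit 6 unassigned, which shows the power cost: only the egos of the retained clusters have all first-degree neighbors in their own arm.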