Module 3: Designing Around Interference

class: center, middle, inverse, title-slide

.title[
# Module 3: Designing Around Interference
]
.subtitle[
## Cluster Randomization, Switchbacks, and the Bias-Variance Tradeoff
]

---

# Course Map

<table>
<tr><th>#</th><th>Module</th><th>Status</th></tr>
<tr><td>1</td><td><a href="../module-01/slides.html">The Experimental Ideal</a></td><td>✓ done</td></tr>
<tr><td>2</td><td><a href="../module-02/slides.html">SUTVA and When It Breaks</a></td><td>✓ done</td></tr>
<tr><td><b>3</b></td><td><b>Designing Around Interference</b> <i>(you are here)</i></td><td>current</td></tr>
<tr><td>4</td><td>Power and Sample Size</td><td>upcoming</td></tr>
<tr><td>5</td><td><a href="../module-05/slides.html">Analyzing Experiments</a></td><td>✓ done</td></tr>
<tr><td>6</td><td>Multiple Testing & Subgroups</td><td>upcoming</td></tr>
<tr><td>7</td><td><a href="../module-07/slides.html">External Validity</a></td><td>✓ done</td></tr>
<tr><td>8</td><td><a href="../module-08/slides.html">Beyond the A/B Test</a></td><td>✓ done</td></tr>
</table>

---

# Last Time: SUTVA Breaks Everywhere

Module 2 showed that interference biases the naive ATE in two settings we'll keep using here:

- **Marketplace (zone-notification experiment):** notified drivers compete for rides in the same zone, contaminating the comparison
- **Networks (author-nudge experiment):** control researchers learn the new reporting practice from treated co-authors

Today: **how to design experiments that work despite interference.**

The core idea is simple: randomize at a level where interference is contained.

.highlight-box[
Every design in this module trades **bias** (from interference) for **variance** (from fewer independent units). The art is finding the sweet spot.
]

---

# The Setup: 40 Cities, 5,000 Drivers Each

A ride-sharing platform operates in **40 cities** with **5,000 drivers each** (200,000 total). They want to test the **zone-notification** feature from Module 1.

**Within-city interference (symmetric):** as more drivers are notified, the high-demand zone gets crowded — *every* driver in the city (notified or not) faces lower accept rates. Direct boost `$\tau = 0.05$`, interference `$\lambda = 0.07$` → policy effect = `$-0.02$` at full rollout.

**Design options:**

| Design | Unit of randomization | Independent units | Estimand |
|--------|----------------------|-------------------|----------|
| Individual | Driver | 200,000 | Direct effect (+0.05) |
| City cluster | City | 40 | Policy effect (−0.02) |
| Region cluster | 5-city region | 8 | Policy effect (−0.02) |

---
name: indiv-cluster-main

# Individual vs Cluster: Side by Side

---
name: cluster-bias-main

# Two Estimators, Two Estimands

.pull-left[
.small[
**Individual rand → direct effect:**
- 50/50 within every city, so frac_t ≈ 0.5 everywhere
- Both arms face the *same* penalty `$-\lambda \cdot 0.5$`; it cancels
- Naive recovers `$\tau$` = +0.05
- Answers: *"marginal benefit to one driver?"*

**City-cluster rand → policy effect:**
- Treated cities: frac_t = 1 (full penalty `$-\lambda$`)
- Control cities: frac_t = 0 (no penalty)
- Cluster contrast = `$\tau - \lambda$` = −0.02
- Answers: *"what happens at full rollout?"*

Both are unbiased — for *different estimands*. Pick one based on the decision you're trying to make.
]
]

.pull-right[

|Design              |Treated mean |Control mean |Estimator gives        |
|:-------------------|:------------|:------------|:----------------------|
|Individual (50/50)  |b + τ − λ·½  |b − λ·½      |τ  (direct, +0.05)     |
|Cluster (saturated) |b + τ − λ    |b            |τ − λ  (policy, −0.02) |

.highlight-box[
**Intuition:** the cluster contrast *includes* the interference term, so it captures the equilibrium effect under full rollout. Individual rand mechanically cancels the term out.
]
]

.small[**Cost of clustering:** effective sample size collapses from **200,000** drivers (individual) to **40** cities (city cluster) to **8** regions (region cluster). The next slide quantifies this with the design effect.]

<a href="#cluster-bias-proof" class="nav-btn">decomposition</a>

---

# Design Effect and ICC

The **design effect** is the variance inflation factor from clustering:

`$$DEFF = 1 + (m - 1) \cdot \rho \qquad\Longrightarrow\qquad n_{eff} = \frac{n}{DEFF}$$`

For a fixed effect size you want to detect, required `$n$` scales by `$DEFF$` — equivalently, your effective sample shrinks by the same factor. `$m$` is cluster size (drivers per city), `$\rho$` is intra-cluster correlation (ICC).

.pull-left[
.small[

| m (drivers)|  ICC|    DEFF| Eff. N (of 200k)| Power (effect = 15 pp)|
|-----------:|----:|-------:|----------------:|-------------------:|
|         100| 0.02|     3.0|           67,114|                100%|
|         100| 0.05|     6.0|           33,613|                100%|
|         100| 0.10|    10.9|           18,349|                100%|
|       5,000| 0.02|   101.0|            1,981|                100%|
|       5,000| 0.05|   251.0|              797|                 99%|
|       5,000| 0.10|   500.9|              399|                 86%|
|      25,000| 0.02|   501.0|              399|                 86%|
|      25,000| 0.05| 1,251.0|              160|                 49%|
|      25,000| 0.10| 2,500.9|               80|                 28%|
]
]

.pull-right[
.small[
**Plausible ranges for `$\rho$`:**

| Setting | Typical `$\rho$` |
|---------|----------------|
| Online behavior (clicks, conversions) | 0.001 – 0.01 |
| Marketplace metrics (accept rate, ride completion) | 0.02 – 0.10 |
| Geo clusters (DMA, city) | 0.05 – 0.20 |
| Strong common shocks (weather, surge) | 0.10 – 0.30 |

**Rule of thumb:** anything you'd model with city or time-period fixed effects has `$\rho$` big enough to matter. Estimate `$\rho$` from a pilot or historical data — don't guess.
]
]
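The DEFF and effective-N columns follow directly from the formula above. A minimal base-R sketch (the helper names `deff()` and `n_eff()` are illustrative, not taken from the course code) that recomputes them for the same grid of `$m$` and `$\rho$`:

```r
# Design effect for equal-sized clusters: DEFF = 1 + (m - 1) * rho
deff <- function(m, rho) 1 + (m - 1) * rho

# Effective sample size for n total units
n_eff <- function(n, m, rho) n / deff(m, rho)

# Same grid as the table: three cluster sizes x three ICC values, n = 200,000
grid <- expand.grid(rho = c(0.02, 0.05, 0.10), m = c(100, 5000, 25000))
grid$deff  <- round(deff(grid$m, grid$rho), 1)
grid$eff_n <- round(n_eff(200000, grid$m, grid$rho))
print(grid[, c("m", "rho", "deff", "eff_n")])
```

The power column is not recomputed here because it depends on an outcome variance that this slide does not state.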

<a href="#icc-derivation" class="nav-btn">ICC derivation</a>

---

# The Bias-Variance Tradeoff: Simulated

.small[
**Bias** vanishes once clusters reach city scale — interference is fully contained. **Variance** keeps climbing — fewer effective independent units.
]

---

# Switchback Designs

An alternative to spatial clustering: **randomize over time**.

.pull-left[
**How it works:**
1. Divide time into periods (e.g., 2-hour blocks)
2. Each period, randomize each city to notification on/off
3. Compare accept rates in notified vs. non-notified periods
4. Every city serves as its own control (removes city fixed effects)
]

.pull-right[
<img src="slides_files/figure-html/switchback-diagram-1.png" style="display: block; margin: auto;" />
]

---
name: switchback-strengths

# Switchback: Strengths and Risks

.small[
**Strengths:**
- Every unit (city) appears in both arms over time — removes unit-level confounders
- **More "independent observations" than pure clustering:** identification is *within-city* (a city's "on" periods vs. its own "off" periods), not between cities. The ICC penalty is differenced out, and effective `$N$` grows from `$\approx n_{cities}$` toward `$n_{cities} \times n_{periods}$` <a href="#switchback-effn" class="inline-btn">proof</a>
- Natural for marketplace experiments where interference is spatial, not temporal
]

**Risks:**

.small[
**Carryover:** notification in period `$t$` affects behavior in `$t+1$` even if `$t+1$` is "off" (drivers who learned the zone keep going). **Fix:** insert *washout windows* — short buffers right after each switch that you exclude from analysis, letting behavior reset before measurement resumes. **Cost:** fewer usable minutes per period.
]
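The washout idea can be checked in a small period-level simulation. This is a sketch under simplifying assumptions, not the course's simulation: carryover leaks only into "off" periods that directly follow an "on" period, and the washout drops the whole first period after each switch rather than a few minutes of it. The names `sim_washout` and `delta` are illustrative.

```r
set.seed(42)

sim_washout <- function(n_cities = 20, n_periods = 200, tau = 0.05, delta = 0.05) {
  # expand.grid() varies the first column fastest: rows run period 1..T within each city
  d <- expand.grid(period = seq_len(n_periods), city_id = seq_len(n_cities))
  d$notification <- rbinom(nrow(d), 1, 0.5)             # i.i.d. switching
  # Previous period's status within each city (0 before the first period)
  d$prev <- ave(d$notification, d$city_id,
                FUN = function(x) c(0, head(x, -1)))
  city_effect <- rnorm(n_cities, 0, 0.05)
  d$y <- 0.4 + city_effect[d$city_id] +
    tau * d$notification +
    delta * (1 - d$notification) * d$prev +             # carryover into off periods only
    rnorm(nrow(d), 0, 0.02)

  diff_means <- function(x) {
    mean(x$y[x$notification == 1]) - mean(x$y[x$notification == 0])
  }
  keep <- d$notification == d$prev                      # drop the first period after a switch
  c(naive = diff_means(d), washout = diff_means(d[keep, ]))
}

est <- sim_washout()
print(round(est, 3))
```

In this setup the naive contrast should land near `$\tau - \delta/2$` and the washout contrast near `$\tau$`, at the price of discarding roughly half of the city-periods.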

---

# Switchback: Simulation

---

# Two-Sided Marketplace Designs

In a ride-sharing experiment, you can randomize **riders** instead of **drivers**:

.pull-left[
**Randomize riders:**
- 50% of riders see new pricing or matching
- 50% see the old experience
- Riders don't interact with each other (mostly)
- No interference on the randomization side

**Measure driver outcomes:**
- Driver earnings, wait times, accept rate
- These reflect the equilibrium effect
]

.pull-right[
**Why it works:**
- Each rider's experience is independent (they don't share rides)
- But their collective demand changes driver supply allocation
- You get the direct effect on riders AND the indirect effect on drivers

**When it doesn't work:**
- If treated riders change market-wide conditions (e.g., surge pricing triggers)
- If the treatment fraction is large enough to move equilibrium prices
]

---

# Geo Experiments

.pull-left[
.small[
For online advertising and pricing experiments, **geographic units** are natural clusters.

A **DMA** (Designated Market Area, defined by Nielsen) is a metro region used for TV and ad targeting — e.g., "New York DMA," "Los Angeles DMA." There are **210 DMAs in the US**, ranging from a few hundred thousand to ~20M people. Ad spend can be served at DMA granularity.

**Treatment** in a geo experiment = changing ad spend in randomly chosen DMAs (typically *pausing* or *boosting* a campaign), holding control DMAs at the status quo. Outcome = revenue / conversions / signups per DMA. The contrast estimates **incremental lift** — what wouldn't have happened organically.
]
]

.pull-right[
<img src="slides_files/figure-html/geo-diagram-1.png" style="display: block; margin: auto;" />
]

.blue-box[
**Google/Meta approach:** "GeoX" / "geo-based incrementality." Randomize ad spend across 50–200 DMAs. Power depends on between-DMA variance and number of DMAs, not number of users.
]

---

# Ego-Cluster Randomization for Networks

For **network interference** (the author-nudge experiment from Module 2), standard clustering is hard because network boundaries are unclear.

.small[
**Ego-cluster approach** (node roles in the chart below):
1. Pick a focal study — the **ego** (center of each cluster, labeled "ego")
2. Add every study sharing co-authors with it — the **alters** (the four nodes around each ego)
3. Randomize the entire ego-cluster (ego + alters) to the same arm
4. Within-cluster spillover is absorbed; **bridges** (faded grey rings) are alters whose ties extend into the other arm — **drop them from the analysis sample** to avoid cross-arm contamination
]

---

# Approach 4: Model the Spillover (Exposure Mapping)

The first three approaches **contain** interference by design. When you can't — diffuse networks, observational settings, small samples — **model the spillover** instead.

.pull-left[
.small[
**Idea:** for each unit `$i$`, build an exposure measure `$S_i$` summarizing how its network neighbors are treated. Include `$S_i$` as a covariate; recover the direct effect *and* the spillover jointly.

For study `$i$` with author set `$A_i$` (M2's author-nudge experiment), `$S_{ik} = \sum_{a \in A_i} \sum_{s \neq i} M_{as} T_{sk}$` counts other studies in arm `$k$` that share an author with `$i$`.

- `$S_i = \sum_{k \geq 1} S_{ik}$` — exposure to *any* treatment arm
- `$P_i = \sum_{k \geq 0} S_{ik}$` — total degree (incl. control)
]
]

.pull-right[
<img src="slides_files/figure-html/spillover-diagram-1.png" style="display: block; margin: auto;" />
.small[
Study 2 shares author *b* with study 1 and *e* with study 3 ⇒ `$S_2 = 1$`, `$P_2 = 2$`. Study 4's only tie is out-of-sample ⇒ `$S_4 = P_4 = 0$`.
]
]
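The exposure counts can be computed with two matrix products. A minimal sketch for the four-study example: the author-by-study matrix `M` and the arm vector `arm` below are hypothetical values chosen to match the caption (study 1 is assumed treated, study 3 control). Note that `crossprod(M)` counts shared authors, so a pair of studies sharing two authors would count twice, exactly as in the double sum defining `$S_{ik}$`.

```r
# Hypothetical author-by-study incidence matrix M (rows = authors, cols = studies)
M <- matrix(0, nrow = 4, ncol = 4,
            dimnames = list(author = c("a", "b", "e", "f"),
                            study  = paste0("s", 1:4)))
M["a", "s1"] <- 1
M["b", c("s1", "s2")] <- 1      # b links studies 1 and 2
M["e", c("s2", "s3")] <- 1      # e links studies 2 and 3
M["f", "s4"] <- 1               # study 4 has no in-sample tie

# Assumed arm assignment (0 = control, 1..3 = treatment arms)
arm  <- c(s1 = 1, s2 = 0, s3 = 0, s4 = 2)
arms <- 0:3

shared <- crossprod(M)          # study x study: number of shared authors
diag(shared) <- 0               # exclude s = i

T_ind <- outer(arm, arms, "==") * 1          # studies x arms indicator T_sk
S_ik  <- shared %*% T_ind                    # exposure to each arm k
S <- rowSums(S_ik[, arms >= 1, drop = FALSE])  # exposure to any treatment arm
P <- rowSums(S_ik)                           # total degree (incl. control)
print(rbind(S = S, P = P))
```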

<a href="#spillover-regression" class="nav-btn">regression & validity</a>

---

# Approach 4: Simulation

True parameters: `$\tau_1=0.05$`, `$\tau_2=0.10$`, `$\tau_3=0.15$` (direct), `$\beta_0=0.05$` (control spillover), `$\theta=0.02$` (extra on treated), `$\psi=0.05$` (direct effect of degree `$P_i$`). 500 studies, co-author network (mean degree ≈ 6), random T0/T1/T2/T3 assignment.
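A compact cross-sectional version of this setup can be sketched in base R. It is not the deck's simulation: the network is a hypothetical Erdős–Rényi graph on studies (mean degree ≈ 6), the noise standard deviation of 0.05 is an assumption, and a single `$\psi$` is used for all arms.

```r
set.seed(123)
n <- 500

# Hypothetical study-by-study tie matrix: symmetric, no self-ties
A <- matrix(rbinom(n * n, 1, 6 / (n - 1)), n, n)
A[lower.tri(A, diag = TRUE)] <- 0
A <- A + t(A)

arm     <- sample(0:3, n, replace = TRUE)    # independent T0/T1/T2/T3 assignment
treated <- as.numeric(arm >= 1)
P <- rowSums(A)                              # total degree
S <- as.vector(A %*% treated)                # neighbors in any treatment arm

tau   <- c(0, 0.05, 0.10, 0.15)              # direct effects for arms 0..3
beta0 <- 0.05; theta <- 0.02; psi <- 0.05
y <- 0.4 + tau[arm + 1] + beta0 * S + theta * S * treated + psi * P +
  rnorm(n, 0, 0.05)                          # noise sd is an assumption

# Arm dummies absorb the direct effects; S, S:treated and P carry the rest
fit <- lm(y ~ factor(arm) + S + S:treated + P)
print(round(coef(fit)[c("S", "S:treated", "P")], 3))
```

The coefficient on `S` estimates `$\beta_0$`, the one on `S:treated` estimates `$\theta$`, and the one on `P` estimates `$\psi$`.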

<a href="#spillover-regression" class="nav-btn">regression & validity</a>

---

# Application: City-Level Zone-Notification Test

A ride-sharing company wants to test whether the **zone-notification** feature lifts driver accept rate at the *city* level.

.pull-left[
**Why cluster at the city level?**
- Within a city, notified drivers compete with non-notified for rides
- A 50/50 split per city contaminates control's accept rate
- City-level randomization: each city gets the feature on or off, no within-city mixing
]

.pull-right[
**The power problem:**
- Only 40 cities available
- 20 treatment, 20 control
- ICC of accept rate within cities ≈ 0.10
- DEFF `$\approx 500 \;\Longrightarrow\;$` Effective `$N \approx 400$` (out of 200k drivers)
- For a true effect of 15 pp, power `$\approx 86\%$`, so you need a **large** effect
]

.highlight-box[
**The interview question:** "We have 40 cities. Can we run a cluster-randomized experiment?" Answer: probably, but only if the expected effect is large. Calculate the minimum detectable effect (MDE) and check if it's business-relevant.
]
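The MDE calculation itself is short. A sketch using the normal approximation for a 50/50 two-arm comparison of means; the outcome standard deviation `sigma = 0.5` (a binary accept rate near 50%) and the 80% power target are assumptions, and `mde_cluster()` is an illustrative name:

```r
# MDE = (z_{1 - alpha/2} + z_{power}) * sqrt(4 * sigma^2 / n_eff),
# with n_eff = n / DEFF and DEFF = 1 + (m - 1) * rho
mde_cluster <- function(n, m, rho, sigma, alpha = 0.05, power = 0.80) {
  n_eff <- n / (1 + (m - 1) * rho)
  (qnorm(1 - alpha / 2) + qnorm(power)) * sqrt(4 * sigma^2 / n_eff)
}

# 40 cities x 5,000 drivers, ICC = 0.10
mde <- mde_cluster(n = 200000, m = 5000, rho = 0.10, sigma = 0.5)
print(round(mde, 3))
```

Under these assumptions the MDE comes out at roughly 14 pp, so the design is only worth running if an effect of that size is plausible and business-relevant.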

---

# Application: Switchback for Zone Notifications

**Alternative:** use a switchback design within each city.

- Divide each day into 2-hour blocks
- Randomize which blocks have the zone-notification feature on vs. off
- 12 blocks/day × 30 days × 40 cities = 14,400 city-period cells
- Effective N is much larger than 40 clusters

**But:**
- **Carryover**: if drivers learn the high-demand zone during "on" periods, they keep going there during "off" periods
- Need washout periods (waste time and data)
- Confounding with time-of-day effects (notifications matter more at peak times)

.blue-box[
**Practical design:** stratified switchback. Within each city, pair "similar" time blocks (e.g., Monday 5–7pm with Wednesday 5–7pm) and randomize within pairs. This controls for time-of-day and day-of-week effects.
]
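A paired assignment schedule can be generated in a few lines. This sketch simplifies the pairing rule: it pairs the same 2-hour block on consecutive days (days 1–2, 3–4, ...) rather than matching weekdays, and the dimensions are illustrative.

```r
set.seed(1)
n_cities <- 3; n_days <- 28; blocks_per_day <- 12

sched <- expand.grid(block   = seq_len(blocks_per_day),
                     day     = seq_len(n_days),
                     city_id = seq_len(n_cities))
sched$pair_id <- (sched$day + 1) %/% 2     # days (1,2) -> 1, (3,4) -> 2, ...

# Within each city x block x pair, exactly one of the two days is "on"
sched$notification <- ave(rep(0, nrow(sched)),
                          sched$city_id, sched$block, sched$pair_id,
                          FUN = function(x) sample(c(0, 1)))

# Every time-of-day block is on exactly half the time in every city
print(tapply(sched$notification, list(sched$city_id, sched$block), mean))
```

Because each pair contributes one "on" and one "off" cell for the same block, time-of-day effects are balanced by construction rather than by luck.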

---

# Application: Geo Incrementality for Online Ads

An e-commerce company wants to measure whether their TV ad campaign drives sales.

**Design:**
- 100 DMAs (Designated Market Areas) across the US
- Randomly assign 50 DMAs to see ads, 50 to no ads
- Measure sales lift at the DMA level
- Control for baseline sales using **CUPED** — subtract a scaled pre-experiment outcome from each DMA's post outcome to soak up pre-existing variation (covered in M5)

**Challenges:**
- DMAs vary enormously in size (NYC vs. rural Montana)
- Need to weight by population or stratify by DMA size
- Spillovers: people in control DMAs can see ads on streaming/social media (imperfect compliance)
- Attribution: sales in a DMA might come from visitors from neighboring DMAs

.highlight-box[
**Key insight:** geo experiments work best when the treatment is *geographically contained* (TV, billboards, local promotions). They work poorly for digital treatments that cross geo boundaries (social media ads, viral content).
]

---

# Choosing the Right Design

| Interference type | Recommended design | Key tradeoff |
|-------------------|-------------------|--------------|
| Within-market (supply/demand) | City-level cluster | Bias vs. power |
| Spatial, short-lived within a period (pricing, algorithms) | Switchback | Carryover vs. efficiency |
| Network (social, co-authorship) | Ego-cluster | Cluster size vs. coverage |
| Geographic (ads, promotions) | Geo experiment | Geo count vs. precision |
| Two-sided marketplace | Randomize one side | Indirect effects only |

**Decision framework:**

1. **Where does interference happen?** Within cities? Across time? Through networks?
2. **Can you cluster at that level?** How many independent clusters do you have?
3. **Is the effect large enough to detect with that many clusters?**
4. **Is there carryover/leakage between clusters?** If so, switchback may be risky.

---

# Summary of the Bias-Variance Tradeoff

---

# Key Takeaways

1. **Cluster randomization** eliminates interference bias by assigning entire clusters to the same arm. Cost: fewer independent units = more variance.

2. **Switchback designs** randomize over time. Good when interference is spatial. Watch for carryover effects.

3. **Two-sided marketplace designs**: randomize one side, measure the other. Works when one side doesn't interact with itself.

4. **Geo experiments**: natural clusters for advertising. Need many geos (50+) and geographically contained treatments.

5. **Ego-cluster randomization**: for network interference (e.g., the author-nudge experiment). Cluster = focal unit + neighbors.

6. **The bias-variance tradeoff** is the central tension. Minimize **RMSE**, not just bias or variance alone.

7. The **design effect** formula `$n_{eff} = n / [1 + (m-1)\rho]$` tells you how much power you lose from clustering.

---

# Exercise Preview

In the exercise you will:

1. Compare individual vs. city-level randomization in a zone-notification simulation
2. Show that clustering eliminates bias but increases variance
3. Sweep over cluster sizes and find the RMSE-minimizing level
4. Simulate a switchback design with and without carryover
5. Compute design effects for different ICC values

See `exercise.R` for the starter code.

---

# Course Map

<table>
<tr><th>#</th><th>Module</th><th>Status</th></tr>
<tr><td>1</td><td><a href="../module-01/slides.html">The Experimental Ideal</a></td><td>✓ done</td></tr>
<tr><td>2</td><td><a href="../module-02/slides.html">SUTVA and When It Breaks</a></td><td>✓ done</td></tr>
<tr><td><b>3</b></td><td><b>Designing Around Interference</b> <i>(just finished)</i></td><td>✓ done</td></tr>
<tr><td>4</td><td>Power and Sample Size</td><td>up next</td></tr>
<tr><td>5</td><td><a href="../module-05/slides.html">Analyzing Experiments</a></td><td>✓ done</td></tr>
<tr><td>6</td><td>Multiple Testing & Subgroups</td><td>upcoming</td></tr>
<tr><td>7</td><td><a href="../module-07/slides.html">External Validity</a></td><td>✓ done</td></tr>
<tr><td>8</td><td><a href="../module-08/slides.html">Beyond the A/B Test</a></td><td>✓ done</td></tr>
</table>

---
class: center, middle, inverse

# Backup Slides

---
name: cluster-bias-proof

# Backup: Two Estimators, Two Estimands

**Symmetric interference DGP** — every driver in a city competes against the same treated share `$s$`:

`$$y_i^{(0)} = b_i - \lambda \cdot s_{c(i)}, \qquad y_i^{(1)} = b_i + \tau - \lambda \cdot s_{c(i)}$$`

with `$b_i = 0.4 + 0.2\,\text{exp}_i + \alpha_{c(i)}$`, `$\tau = 0.05$` direct effect, `$\lambda = 0.07$` interference.

The naive contrast under random assignment — `$\bar b$` balances across arms:

`$$\hat{\tau}_{\text{naive}} = E[y^{(1)} \mid D=1] - E[y^{(0)} \mid D=0] = (\bar b + \tau - \lambda \bar s_{D=1}) - (\bar b - \lambda \bar s_{D=0}) = \tau - \lambda(\bar s_{D=1} - \bar s_{D=0})$$`

**Individual rand** (50/50 in every city → `$\bar s_{D=1} = \bar s_{D=0} = 0.5$`):

`$$\hat\tau_{\text{ind}} = \tau - \lambda(0.5 - 0.5) = \tau = 0.05 \quad \text{(direct effect)}$$`

**City cluster** (treated cities `$s = 1$`, control cities `$s = 0$`):

`$$\hat\tau_{\text{cluster}} = \tau - \lambda(1 - 0) = \tau - \lambda = -0.02 \quad \text{(policy effect)}$$`

Both are unbiased — for *different* estimands. If your decision is "should we roll this out everywhere?", you want `$\hat\tau_{\text{cluster}}$`.
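The contrast formula is easy to evaluate for any pair of treated shares. A minimal sketch (`naive_contrast()` is an illustrative helper; the 80/20 saturation case is a hypothetical design not discussed in the slides):

```r
# Naive contrast under the symmetric-interference DGP:
# tau - lambda * (share treated faced by treated - share treated faced by control)
naive_contrast <- function(tau, lambda, s_treated, s_control) {
  tau - lambda * (s_treated - s_control)
}

individual <- naive_contrast(0.05, 0.07, s_treated = 0.5, s_control = 0.5)
cluster    <- naive_contrast(0.05, 0.07, s_treated = 1.0, s_control = 0.0)
partial    <- naive_contrast(0.05, 0.07, s_treated = 0.8, s_control = 0.2)

print(c(individual = individual, cluster = cluster, partial = partial))
```

Intermediate saturation gaps give contrasts between the direct effect and the policy effect, which is why the estimand has to be fixed before the design.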

---
name: dgp-indiv-cluster

# Backup: DGP for the Slide 5 Simulation

.small[

```r
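library(dplyr)    # mutate(), group_by(), summarise(), if_else(), left_join()
library(tibble)   # tibble()
# `market` (one row per driver: city_id, experience, city_effect) and `n_cities`
# are assumed to come from the deck's setup chunk, which is not shown here.

direct_effect <- 0.05   # per-driver direct boost from being notified
interference  <- 0.07   # symmetric crowd-out per share treated (hits BOTH arms)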

# Individual rand: 50/50 in every city → frac_t ≈ 0.5 everywhere
sim_individual <- function() {
  d <- market |> mutate(notification = sample(rep(c(0, 1), each = n() / 2)))
  d |> group_by(city_id) |>
    mutate(frac_t = mean(notification),                                          # ≈ 0.5
           y0 = pmin(1, pmax(0, 0.4 + 0.2 * experience + city_effect             # baseline
                                  - interference * frac_t)),                     # CONTROLS hit by competition
           y1 = pmin(1, pmax(0, 0.4 + 0.2 * experience + city_effect             # baseline
                                  + direct_effect - interference * frac_t))) |>  # TREATED also hit
    ungroup() |>
    mutate(y_obs = rbinom(n(), 1, prob = if_else(notification == 1, y1, y0))) |>
    summarise(ate = mean(y_obs[notification == 1]) - mean(y_obs[notification == 0]))
}

# City-cluster rand: every city is 100% same arm → frac_t is 0 or 1
sim_cluster <- function() {
  arms <- tibble(city_id = 1:n_cities,
                 notification = sample(rep(c(0, 1), each = n_cities / 2)))
  market |> left_join(arms, by = "city_id") |>
    group_by(city_id) |> mutate(frac_t = mean(notification)) |> ungroup() |>
    mutate(y0 = pmin(1, pmax(0, 0.4 + 0.2 * experience + city_effect
                                  - interference * frac_t)),                     # 0 in control cities
           y1 = pmin(1, pmax(0, 0.4 + 0.2 * experience + city_effect
                                  + direct_effect - interference * frac_t)),     # full λ in treated cities
           y_obs = rbinom(n(), 1, prob = if_else(notification == 1, y1, y0))) |>
    summarise(ate = mean(y_obs[notification == 1]) - mean(y_obs[notification == 0]))
}
```
]

.small[
- **Individual:** penalty cancels (both arms get `$-\lambda \cdot 0.5$`) → naive = `$\tau$` = +0.05 (direct).
- **Cluster:** treated cities take full `$-\lambda$`, control cities take none → ATE = `$\tau - \lambda$` = −0.02 (policy).
]

---
name: icc-derivation

# Backup: ICC and Design Effect Derivation

.small[
**ICC:** `$\rho = \dfrac{\sigma^2_b}{\sigma^2_b + \sigma^2_w}$`, with `$\sigma^2_b$` = between-cluster variance, `$\sigma^2_w$` = within-cluster variance.

For a cluster of size `$m$` and `$K$` clusters total ( `$n = Km$` ), write `$\sigma^2 = \sigma^2_b + \sigma^2_w$` for the total variance:

`$$\text{Var}(\bar{Y}_c) = \frac{\sigma^2_w}{m} + \sigma^2_b
\qquad\Longrightarrow\qquad
\text{Var}(\bar{Y}) = \frac{1}{K}\left(\frac{\sigma^2_w}{m} + \sigma^2_b\right) = \frac{\sigma^2}{Km}\bigl[1 + (m-1)\rho\bigr]$$`

Under simple random sampling, `$\text{Var}_{SRS}(\bar{Y}) = \sigma^2 / n$`. The ratio is the **design effect**:

`$$\text{DEFF} = \frac{\text{Var}(\bar{Y})}{\text{Var}_{SRS}(\bar{Y})} = 1 + (m-1)\rho
\qquad\Longrightarrow\qquad
n_{eff} = \frac{n}{\text{DEFF}}$$`
]
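To estimate `$\rho$` from pilot or historical data, the one-way ANOVA estimator for balanced clusters is a common starting point. A sketch on simulated pilot data (the sizes `K = 200`, `m = 50` and the true `$\rho = 0.1$` are assumptions chosen for illustration):

```r
set.seed(2024)
K <- 200; m <- 50
sigma_b <- sqrt(0.1); sigma_w <- sqrt(0.9)      # true rho = 0.1 / (0.1 + 0.9)

cluster <- rep(seq_len(K), each = m)
y <- rnorm(K, 0, sigma_b)[cluster] + rnorm(K * m, 0, sigma_w)

cluster_means <- tapply(y, cluster, mean)
msb <- m * var(cluster_means)                   # between-cluster mean square
msw <- mean(tapply(y, cluster, var))            # within-cluster mean square (balanced)

# ANOVA estimator of the ICC, then the implied design effect
rho_hat  <- (msb - msw) / (msb + (m - 1) * msw)
deff_hat <- 1 + (m - 1) * rho_hat
print(round(c(rho_hat = rho_hat, deff_hat = deff_hat), 3))
```

With unbalanced clusters or binary outcomes a mixed model is usually preferable; this balanced-case formula is only meant to make the definition concrete.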

---
name: switchback-dgp

# Backup: DGP for the Switchback Simulation

.small[

```r
library(dplyr)   # mutate(), group_by(), lag()
library(tidyr)   # expand_grid()

tau_sb <- 0.05   # true within-city treatment effect

sim_switchback <- function(carryover = 0) {
  n_periods <- 50; n_sw_cities <- 20
  d <- expand_grid(city_id = 1:n_sw_cities, period = 1:n_periods) |>
    mutate(notification   = sample(c(0, 1), n(), replace = TRUE),       # i.i.d. switching
           # expand_grid() varies city_id slowest, so index each draw by its id
           city_effect    = rnorm(n_sw_cities, 0, 0.05)[city_id],        # μ_c
           period_effect  = rnorm(n_periods,  0, 0.02)[period]) |>       # γ_t
    group_by(city_id) |>
    mutate(prev  = lag(notification, default = 0),
           # Asymmetric carryover: residual treatment leaks into off periods ONLY.
           # On periods get the full direct effect regardless of prev.
           carry = carryover * (1 - notification) * prev) |>
    ungroup() |>
    mutate(y = 0.4 + city_effect + period_effect +
               tau_sb * notification + carry + rnorm(n(), 0, 0.02))
  mean(d$y[d$notification == 1]) - mean(d$y[d$notification == 0])
}
```
]

.small[
**Why the bias is downward.** With i.i.d. switching, `$P(\text{prev}=1) = 0.5$` in both arms, so symmetric carryover (a `$+\delta$` added to *any* period that follows an "on" period, whether that period is itself on or off) would cancel in expectation. Here the carryover is *asymmetric*: it shows up only in `$T=0$` periods that follow `$T=1$`, lifting the control mean by `$\delta/2$` while leaving the treated mean unchanged. So `$E[\hat{\tau}] = \tau - \delta/2$`. With `$\tau = 0.05$` and `$\delta = 0.05$`, the estimate is ≈ 0.025.
]

<a href="#switchback-formal" class="nav-btn">formal</a>

---
name: switchback-formal

# Backup: Switchback Estimator

For a switchback design with cities `$c = 1, \ldots, C$` and periods `$t = 1, \ldots, T$`:

`$$Y_{ct} = \mu_c + \gamma_t + \tau \cdot D_{ct} + \delta \cdot D_{c,t-1} + \varepsilon_{ct}$$`

- `$\mu_c$`: city fixed effect
- `$\gamma_t$`: period fixed effect
- `$D_{ct}$`: notification indicator for city `$c$` at time `$t$`
- `$\delta$`: carryover parameter (ideally `$\delta = 0$`)

The standard switchback estimator ignores carryover. With balanced arms (half of all city-periods "on"), it is the difference in means:

`$$\hat{\tau} = \frac{2}{CT}\sum_{c,t} (2D_{ct} - 1) Y_{ct}$$`

If `$\delta \neq 0$`, the estimator is biased:

`$$E[\hat{\tau}] = \tau + \delta \cdot \text{Corr}(D_{ct}, D_{c,t-1})$$`

Under i.i.d. randomization, `$\text{Corr}(D_{ct}, D_{c,t-1}) \approx 0$`, so the bias is small but not zero in finite samples. Including `$D_{c,t-1}$` as a covariate or using burn-in periods mitigates this.
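Adding the lagged indicator as a covariate can be sketched as a two-way fixed-effects regression on data simulated from the model above. The parameter values (`delta = 0.03`, noise sd 0.02, 20 cities, 50 periods) are assumptions for illustration:

```r
set.seed(11)
C <- 20; T_per <- 50; tau <- 0.05; delta <- 0.03

# expand.grid() varies the first column fastest: periods run 1..T within each city
d <- expand.grid(period = seq_len(T_per), city_id = seq_len(C))
d$D     <- rbinom(nrow(d), 1, 0.5)                          # i.i.d. switching
d$D_lag <- ave(d$D, d$city_id, FUN = function(x) c(0, head(x, -1)))

mu    <- rnorm(C, 0, 0.05)                                  # city effects
gamma <- rnorm(T_per, 0, 0.02)                              # period effects
d$y <- 0.4 + mu[d$city_id] + gamma[d$period] +
  tau * d$D + delta * d$D_lag + rnorm(nrow(d), 0, 0.02)

# City and period fixed effects plus the lagged treatment indicator
fit <- lm(y ~ D + D_lag + factor(city_id) + factor(period), data = d)
print(round(coef(fit)[c("D", "D_lag")], 3))
```

The coefficient on `D` estimates `$\tau$` and the coefficient on `D_lag` estimates `$\delta$`, so the regression also doubles as a diagnostic for carryover.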

---
name: switchback-effn

# Backup: Why `$n_{eff}$` Grows from `$C$` to `$C \cdot T$`

.small[
Same model: `$Y_{ct} = \mu_c + \gamma_t + \tau D_{ct} + \varepsilon_{ct}$`, with `$\varepsilon_{ct} \sim (0, \sigma_\varepsilon^2)$` i.i.d., `$\mu_c \sim (0, \sigma_\mu^2)$`, `$\gamma_t \sim (0, \sigma_\gamma^2)$`. Half-half assignment, `$C$` cities, `$T$` periods.

**Cluster randomization** (one assignment per city; average `$T$` periods within each city):

`$$\bar{Y}_c = \mu_c + \bar{\gamma} + \tau D_c + \bar{\varepsilon}_c
\quad\Longrightarrow\quad
\text{Var}(\bar{Y}_c \mid \bar{\gamma}) = \sigma_\mu^2 + \frac{\sigma_\varepsilon^2}{T}$$`

The period average `$\bar{\gamma}$` is shared by every city, so it cancels in the treated-minus-control contrast:

`$$\text{Var}(\hat{\tau}_{cl}) = \frac{4}{C}\!\left(\sigma_\mu^2 + \frac{\sigma_\varepsilon^2}{T}\right)
\;\;\xrightarrow{T \to \infty}\;\;
\frac{4 \sigma_\mu^2}{C}$$`

The city term `$\sigma_\mu^2$` does **not** average down with `$T$`: it sets a noise floor governed by the city count `$C$`.

**Switchback** (i.i.d. re-randomization each period; FE estimator absorbs `$\mu_c$` and `$\gamma_t$`):

`$$\hat{\tau}_{sb} = \frac{\sum_{ct} \tilde{D}_{ct}\, Y_{ct}}{\sum_{ct} \tilde{D}_{ct}^2},\qquad
\tilde{D}_{ct} = D_{ct} - \bar{D}_c - \bar{D}_t + \bar{D}$$`

`$$\text{Var}(\hat{\tau}_{sb}) = \frac{\sigma_\varepsilon^2}{\sum_{ct} \tilde{D}_{ct}^2}
\approx \frac{\sigma_\varepsilon^2}{0.25 \cdot CT}
= \frac{4 \sigma_\varepsilon^2}{C \cdot T}$$`

The within-city demeaning kills `$\mu_c$`, so noise is driven by the idiosyncratic `$\sigma_\varepsilon^2$` over all `$C \cdot T$` city-periods.

**Effective `$n$`** ( `$n_{eff} = \sigma^2_{tot} / \text{Var}(\hat{\tau})$`, scaled to SRS units):

`$$n_{eff,\,cl} \;\approx\; \frac{\sigma^2_{tot}}{\sigma_\mu^2}\cdot\frac{C}{4} \;\sim\; O(C)
\qquad\text{vs.}\qquad
n_{eff,\,sb} \;\approx\; \frac{\sigma^2_{tot}}{\sigma_\varepsilon^2}\cdot\frac{CT}{4} \;\sim\; O(C \cdot T)$$`

Switchback gains a factor of `$T$` when `$\sigma_\mu^2$` is non-trivial (high ICC). With carryover, `$\sigma_\varepsilon^2$` is replaced by an inflated effective error and the ratio shrinks toward 1.
]
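Plugging numbers into the two variance expressions shows the size of the gap. In this sketch the period shocks `$\gamma_t$` are left out because they are common to both arms within a period and drop out of both contrasts; the values `$\sigma_\mu = 0.05$`, `$\sigma_\varepsilon = 0.02$`, `$C = 20$`, `$T = 50$` mirror the switchback DGP slide, and the function names are illustrative.

```r
# Variance of the cluster-randomized contrast (one assignment per city)
var_cluster <- function(C, T_per, sigma_mu, sigma_eps) {
  (4 / C) * (sigma_mu^2 + sigma_eps^2 / T_per)
}

# Variance of the fixed-effects switchback estimator (i.i.d. switching, no carryover)
var_switchback <- function(C, T_per, sigma_eps) {
  4 * sigma_eps^2 / (C * T_per)
}

v_cl <- var_cluster(C = 20, T_per = 50, sigma_mu = 0.05, sigma_eps = 0.02)
v_sb <- var_switchback(C = 20, T_per = 50, sigma_eps = 0.02)
print(c(se_cluster = sqrt(v_cl), se_switchback = sqrt(v_sb), var_ratio = v_cl / v_sb))
```

The ratio depends on both `$T$` and `$\sigma_\mu^2 / \sigma_\varepsilon^2$`; with small city effects the two designs are much closer.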

---
name: spillover-regression

# Backup: The Spillover Regression

Augment the DiD specification with exposure terms:

.small[
`$$Y_{it} = \mu_i + \delta\, I_{t=1} + \sum_{j=1}^{J}\tau_j\,(T_{ij}\, I_{t=1}) + \beta_0\,(S_i\, I_{t=1}) + \theta\,(S_i\, I_{t=1}\, \mathbb{1}\{T_i \neq 0\}) + \sum_{j=0}^{J}\psi_j\,(T_{ij}\, I_{t=1}\, P_i) + \varepsilon_{it}$$`
]

.pull-left[
.small[
**What each term identifies:**
- `$\tau_j$` — direct effect of own assignment to arm `$j$`
- `$\beta_0$` — spillover onto **control** studies from exposure to any treatment
- `$\theta = \beta_* - \beta_0$` — *additional* spillover when own arm is treated; sign tells you **complementarity** ( `$\theta>0$` ) vs. **substitution** ( `$\theta<0$` )
- `$\psi_j P_i$` — controls for total network degree, so spillover ≠ "well-connected studies are different" <a href="#spillover-S-vs-P" class="inline-btn">S vs P</a>
]
]

.pull-right[
.small[
**Restrictions used to cut 16 `$\beta_{jk}$` down to 2** (count: 16 → 12 → 4 → 2):
1. No spillover from control exposure: `$\beta_{j0}=0$` for all `$j$` ⇒ kills 4 ⇒ **12 left**
2. Spillover doesn't depend on *which* T-arm exposed (for **all** `$j$`): `$\beta_{jk}=\beta_{j*}$` for `$k\in\{1,2,3\}$` ⇒ collapses 3 entries per row ⇒ **4 left** ( `$\beta_{0*},\beta_{1*},\beta_{2*},\beta_{3*}$` )
3. Spillover on T-arms is the same across T-arms: `$\beta_{1*}=\beta_{2*}=\beta_{3*}\equiv\beta_*$` ⇒ **2 left** ( `$\beta_0\equiv\beta_{0*}$` and `$\beta_*$` )

Each restriction is testable: relax it, refit, compare `$\widehat\beta_0$` and `$\widehat\theta$`.
]
]

<a href="#spillover-validity" class="nav-btn">validity →</a>

---
name: spillover-validity

# Backup: Validity of Exposure Mapping

.small[
**What's identified for free (under random assignment):**
- `$T_{ij}$` is randomized ⇒ `$\tau_j$` is identified unconditionally.
- `$S_i$` is determined by *others'* (random) assignments + the **fixed** network structure ⇒ `$S_i$` is "as good as random" *conditional on the network*. So `$\beta_0$` is identified by within-arm variation in exposure. <a href="#spillover-id" class="inline-btn">proof</a>
]

.small[
**What you need on top:**
1. **Independent random assignment** of `$T_i$` across units (or known stratified assignment with conditioning).
2. **Correct exposure mapping.** `$S_i$` captures all relevant interference channels. If second-degree (indirect) ties or out-of-sample co-authors matter, `$\widehat\beta_0$` absorbs them — possibly with the wrong sign.
3. **Linearity / additivity.** Spillover scales linearly in `$S_i$` and degree enters through `$P_i$` alone. Saturation, threshold, or non-monotone effects, or unmodeled `$P_i \to Y$` channels, break this.
4. **Network exogeneity.** The co-author network must not respond to (anticipated) treatment (no endogenous network formation between `$t=0$` and `$t=1$`).
5. **Stable composition of `$P_i$`.** `$P_i$` must be measured the same way for everyone, so that controlling for it actually absorbs the "high-degree studies are different" channel. <a href="#spillover-S-vs-P" class="inline-btn">S vs P</a>
]

.highlight-box[
**Trade-off vs. ego-cluster.** Exposure mapping uses the *whole* sample (no nodes dropped) and recovers spillover magnitudes — but identification rests on a parametric model of how spillovers propagate. Ego-cluster guarantees clean identification on a smaller sample. The right choice depends on whether you trust the network model or the partition.
]

<a href="#spillover-regression" class="nav-btn">← regression</a>

---
name: spillover-id

# Backup: Why `$\beta_0$` Is Identified

.small[
**Setup.** Treatment `$T_i \in \{0, 1, \dots, J\}$` assigned independently across `$i$`. Network `$\mathbf{M} = (M_{as})$` is fixed (measured pre-treatment). Define exposure as
`$$S_{ik} = \sum_{a \in A_i} \sum_{s \neq i} M_{as}\, \mathbb{1}\{T_s = k\}, \qquad S_i = \sum_{k \geq 1} S_{ik}, \qquad P_i = \sum_{k \geq 0} S_{ik}.$$`

Maintained outcome model:
`$$Y_i = \mu + \tau_{T_i} + \beta_0\, S_i + \theta\, S_i\, \mathbb{1}\{T_i \neq 0\} + \psi\, P_i + \varepsilon_i,\quad E[\varepsilon_i \mid T_i, S_i, P_i, \mathbf{M}] = 0.$$`
]

.small[
**Lemma (random exposure).** `$\;S_i \perp T_i \mid \mathbf{M}.$`

*Proof.* The inner sum runs over `$s \neq i$`, so `$S_i$` is a function of `$\{T_s : s \neq i\}$` and the fixed network `$\mathbf{M}$` alone. Independent assignment gives `$T_i \perp \{T_s : s \neq i\}$`, and `$\mathbf{M}$` is constant. Therefore `$S_i \perp T_i \mid \mathbf{M}$`. `$\quad\square$`

(Same argument applies to `$P_i$`: `$P_i \perp T_i \mid \mathbf{M}$`.)
]

.small[
**Identification of `$\beta_0$`.** Restrict to controls ( `$T_i = 0$` ):
`$$E[Y_i \mid T_i = 0, S_i, P_i, \mathbf{M}] \;=\; \mu + \tau_0 + \beta_0\, S_i + \psi\, P_i.$$`

OLS of `$Y$` on `$S_i$` **and** `$P_i$` within the control sub-sample is consistent for `$\beta_0$` whenever the partial variance `$\text{Var}(S_i \mid P_i, T_i = 0) > 0$` — guaranteed by the lemma plus a non-degenerate network ( <a href="#spillover-S-vs-P" class="inline-btn">S vs P</a> shows this partial variance equals `$\pi_T(1-\pi_T)\, P_i$`). Within-treated OLS gives `$\beta_0 + \theta$`, so `$\theta$` is identified. `$\quad\square$`
]

.small[
**What this *doesn't* prove.** (i) **Linearity of `$S_i$` and `$P_i$`** — nonlinear or threshold spillovers, indirect ties, or interactions break the result; OLS then recovers a best-linear-projection that is generally not the structural `$\beta_0$`. (ii) **Network exogeneity** — co-author networks must not respond to (anticipated) treatment. (iii) **Identifying variation may be small** — if `$\rho(S, P) \approx 1$` (dense networks, concentrated treatment fractions), `$\widehat{\beta_0}$` is consistent but high-variance.
]

<a href="#spillover-validity" class="nav-btn-br">← validity</a>

---
name: spillover-S-vs-P

# Backup: `$S_i$` vs `$P_i$` — Why Both, Despite Collinearity?

.small[
**Decomposition.** `$P_i = S_{i0} + S_i$`, where `$S_{i0}$` counts neighbors in the control arm. So under uniform 4-arm randomization with treatment fraction `$\pi_T = 3/4$`:
`$$E[S_i \mid P_i] = \pi_T \cdot P_i = 0.75\, P_i$$`
The expected `$S_i$` is a deterministic linear function of `$P_i$` — that's the source of the collinearity. The correlation `$\rho(S, P)$` is close to 1 in any sample with non-trivial `$P_i$`.
]

.pull-left[
.small[
**Where identifying variation lives.** Decompose `$S_i$` into structural and randomized components:
`$$S_i \;=\; \underbrace{\pi_T P_i}_{\text{structural (}\propto\text{degree)}} \;+\; \underbrace{(S_i - \pi_T P_i)}_{\text{randomized}}$$`
Conditional on `$P_i$`, the second piece has variance
`$$\text{Var}(S_i \mid P_i) = \pi_T(1-\pi_T)\, P_i \approx 0.19\, P_i$$`
driven entirely by *which arm each neighbor landed in*. That's the variation `$\widehat{\beta_0}$` uses.
]
]

.pull-right[
.small[
**Why include `$P_i$` anyway?** Drop `$P_i$` and `$\widehat{\beta_0}$` confounds two channels:
1. **Per-T-neighbor spillover** — the structural `$\beta_0$`.
2. **Degree effect** — well-connected studies may differ for unrelated reasons (more visibility, broader networks, prior relationships, productivity).

Including `$P_i$` partials out the degree channel; the residual in `$S_i \mid P_i$` is *exactly* the randomized neighbor-arm draw, which is the only variation `$\beta_0$` should be identified from.

**Cost:** collinearity inflates `$\text{SE}(\widehat{\beta_0})$`; power scales with `$\sqrt{n\,\bar P\,\pi_T(1-\pi_T)}$`. Identification requires both **many studies** *and* **non-trivial degree**.
]
]

<a href="#spillover-validity" class="nav-btn-br">← validity</a>

---
name: ego-cluster-formal

# Backup: Ego-Cluster Randomization

Define the **ego network** of unit `$i$` as `$\mathcal{N}_i = \{i\} \cup \{j : j \text{ is connected to } i\}$`.

**Ego-cluster randomization** assigns the same treatment to all units in `$\mathcal{N}_i$`:

`$$D_j = D_i \quad \forall j \in \mathcal{N}_i$$`

This ensures that, for unit `$i$`, all first-degree neighbors have the same treatment status. If interference is limited to first-degree connections:

`$$Y_i(D_i, \mathbf{D}_{\mathcal{N}_i}) = Y_i(D_i, D_i, \ldots, D_i) = Y_i(D_i)$$`

and SUTVA is restored within the ego cluster.

**Challenges:**
- Ego clusters overlap: if `$i$` and `$j$` are neighbors, `$\mathcal{N}_i \cap \mathcal{N}_j \neq \emptyset$`
- Must resolve overlaps (e.g., graph coloring, independent set sampling)
- Power depends on the number of *non-overlapping* ego clusters
- Second-degree interference is not addressed
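One simple way to resolve overlaps is a greedy pass that only accepts an ego when its whole neighborhood is still unclaimed. The sketch below is one possible heuristic, not the method used in the course; the network is a hypothetical random graph, and the bridge flag follows the "ties into the other arm" definition from the main slide.

```r
set.seed(7)
n <- 60

# Hypothetical undirected network (mean degree ~ 3), no self-ties
A <- matrix(rbinom(n * n, 1, 0.05), n, n)
A[lower.tri(A, diag = TRUE)] <- 0
A <- A + t(A)

# Greedy pass in random order: i becomes an ego only if i and all of its
# neighbors are still unclaimed, so ego clusters never overlap
cluster <- rep(NA_integer_, n)
for (i in sample(n)) {
  members <- c(i, which(A[i, ] == 1))
  if (all(is.na(cluster[members]))) cluster[members] <- i
}

# Randomize whole clusters; nodes outside every cluster stay unassigned (NA)
egos <- unique(cluster[!is.na(cluster)])
arm_of_ego <- setNames(sample(rep(c(0, 1), length.out = length(egos))), egos)
D <- unname(arm_of_ego[as.character(cluster)])

# Bridges: clustered nodes with at least one tie into the opposite arm
is_bridge <- vapply(seq_len(n), function(j) {
  nb <- which(A[j, ] == 1)
  !is.na(D[j]) && any(D[nb] != D[j], na.rm = TRUE)
}, logical(1))

print(c(n_clusters = length(egos), n_clustered = sum(!is.na(cluster)),
        n_bridges = sum(is_bridge)))
```

The number of accepted egos is what drives power, and it depends on the visiting order; trying several random orders and keeping the largest set is a cheap improvement.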