class: center, middle, inverse, title-slide

.title[
# Module 2: SUTVA and When It Breaks
]
.subtitle[
## Interference, Spillovers, and Why Your Experiment Might Be Lying
]

---

<style type="text/css">
.remark-code, .remark-inline-code { font-size: 80%; }
.remark-slide-content { padding: 1em 2em; }
.small { font-size: 80%; }
.tiny { font-size: 65%; }
.highlight-box { background: #fff3e0; border-left: 4px solid #e65100; padding: 0.5em 1em; margin: 0.5em 0; }
.blue-box { background: #e3f2fd; border-left: 4px solid #1565c0; padding: 0.5em 1em; margin: 0.5em 0; }
.nav-btn { position: absolute; bottom: 12px; left: 40px; font-size: 11px; background: #e8eaf6; padding: 2px 8px; border-radius: 3px; z-index: 100; text-decoration: none; color: #1a237e; }
.nav-btn:hover { background: #c5cae9; }
</style>

# Course Map

<table>
<tr><th>#</th><th>Module</th><th>Status</th></tr>
<tr><td>1</td><td><a href="../module-01/slides.html">The Experimental Ideal</a></td><td>✓ done</td></tr>
<tr><td><b>2</b></td><td><b>SUTVA and When It Breaks</b> <i>(you are here)</i></td><td>current</td></tr>
<tr><td>3</td><td>Designing Around Interference</td><td>upcoming</td></tr>
<tr><td>4</td><td>Power and Sample Size</td><td>upcoming</td></tr>
<tr><td>5</td><td><a href="../module-05/slides.html">Analyzing Experiments</a></td><td>✓ done</td></tr>
<tr><td>6</td><td>Multiple Testing & Subgroups</td><td>upcoming</td></tr>
<tr><td>7</td><td><a href="../module-07/slides.html">External Validity</a></td><td>✓ done</td></tr>
<tr><td>8</td><td><a href="../module-08/slides.html">Beyond the A/B Test</a></td><td>✓ done</td></tr>
</table>

---

# Last Time: The Experimental Ideal

Module 1 showed that randomization kills selection bias:

`$$E[Y \mid D=1] - E[Y \mid D=0] = \text{ATE}$$`

--

But this result relied on an assumption we barely mentioned:

.highlight-box[
**SUTVA (Stable Unit Treatment Value Assumption):** Each unit's outcome depends *only* on its own treatment assignment, not on anyone else's.
]

Today: what happens when SUTVA breaks, why it almost always breaks in marketplace experiments, and how to detect the problem.

---

# The Setup: Zone-Notification Experiment (Again)

Same experiment as Module 1: randomize 500 drivers to receive the zone notification, 500 to control. Measure accept rate.

But now let's think about what happens in the **marketplace**...

--

- Notified drivers all head toward the **same high-demand zone**
- The zone has limited ride requests — more drivers there means **more competition for each ride**
- Each notified driver takes fewer rides than the notification *alone* would give them
- Non-notified drivers stay in their original areas, largely unaffected

--

.highlight-box[
The **treated group** outcome is *contaminated* by the treatment itself (crowding). The naive ATE will be **biased downward** — it captures the direct notification effect MINUS the crowding among the treated.
]

---
name: sutva-main

# SUTVA: Stated Precisely

**Component 1 — No interference:**

`$$Y_i = Y_i(D_i) \quad \text{for all } i$$`

Unit `\(i\)`'s outcome depends only on `\(i\)`'s own treatment, not on `\(D_j\)` for any `\(j \neq i\)`.

--

**Component 2 — No hidden versions of treatment:**

If `\(D_i = D_i' = 1\)`, then `\(Y_i(D_i) = Y_i(D_i')\)`. There is one "treated" state and one "control" state.

--

When SUTVA holds: `\(2\)` potential outcomes per unit, `\((Y_i(0), Y_i(1))\)`.

When SUTVA fails: potential outcomes depend on the **entire** treatment vector `\(\mathbf{D} = (D_1, \ldots, D_N)\)`.

.highlight-box[
With `\(N\)` units, there are `\(2^N\)` possible treatment vectors. Each unit has `\(2^N\)` potential outcomes instead of `\(2\)`. The ATE is no longer well-defined without specifying `\(\mathbf{D}\)`.
]

<a href="#sutva-formal" class="nav-btn">Formal notation</a>
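---

# SUTVA Failure in Miniature

To make the `\(2^N\)` explosion concrete, here is a minimal base-R sketch (the outcome function and its coefficients are illustrative assumptions, not the deck's DGP): enumerate every treatment vector for `\(N = 3\)` and watch unit 1's outcome move with *other* units' assignments.

```r
# All 2^3 = 8 possible treatment vectors for N = 3 units
d_vectors <- expand.grid(d1 = 0:1, d2 = 0:1, d3 = 0:1)

# Hypothetical outcome for unit 1: +0.05 from its own treatment,
# minus 0.03 of crowding per *other* treated unit (assumed values)
y1 <- 0.4 + 0.05 * d_vectors$d1 - 0.03 * (d_vectors$d2 + d_vectors$d3)

cbind(d_vectors, y1)
# Under SUTVA, all rows with d1 = 1 would share a single value of y1.
# Here y1 varies with (d2, d3): unit 1 has 8 potential outcomes, not 2.
```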
---
name: sim-main

# Seeing the Bias: A Simulation

<img src="slides_files/figure-html/interference-sim-1.png" style="display: block; margin: auto;" />

<a href="#sim-dgp" class="nav-btn">DGP behind this plot</a>

---

# Where Does the Bias Come From?

.pull-left[
**Violated SUTVA component: *no interference*.**

SUTVA requires `\(Y_i = Y_i(D_i)\)` — the outcome depends only on `\(i\)`'s own treatment. Here it doesn't:

`$$Y_i(1, \mathbf{D}_{-i}) \;\neq\; Y_i(1, \mathbf{D}_{-i}')$$`

**Mechanism:** the more drivers are notified, the more crowded the zone — so each treated driver gets fewer rides. `\(Y_i(1)\)` depends on how many *others* are treated.

**What stays put:** control drivers don't go to the zone, so `\(Y_i(0)\)` is unaffected.

So `\(E[Y \mid D=1]\)` is pulled down while `\(E[Y \mid D=0]\)` is untouched → the naive ATE is biased *downward*.
]

.pull-right[
<img src="slides_files/figure-html/bias-direction-1.png" style="display: block; margin: auto;" />
]

---

# The Bias Grows with Interference Strength

<img src="slides_files/figure-html/bias-vs-strength-1.png" style="display: block; margin: auto;" />

In a real marketplace, interference strength depends on **market thickness** (how tight supply is), **substitutability**, and **network density**.

---

# Hidden Versions of Treatment

SUTVA also requires **one version** of treatment. This can fail when:

--

.pull-left[
**The treatment differs by context:**

- Zone notification in San Francisco vs. rural Iowa
- A new routing algorithm in a thick market vs. a thin market
- The "same" notification arrives during morning rush vs. midday
]

--

.pull-right[
**The treatment differs by implementation:**

- New driver app version with different bugs on iOS vs. Android
- The "same" notification delivered via in-app banner vs. push
- Treatment intensity varies (some drivers get re-notified, others don't)
]

--

.highlight-box[
**Interview framing:** "The treatment effect is 5%." Which treatment? For whom? Delivered how? If the answer is "it depends," you may have a hidden-versions problem.
]

---

# Why Interviewers Love This Topic

In tech, the three most common SUTVA violations are:

--

**1. Marketplace interference** (ride-sharing, food delivery, e-commerce)

- Supply and demand are linked: treating one side affects the other
- "We tested a new routing algorithm on 10% of riders" — but those riders affect driver allocation for everyone

--

**2. Network interference** (social media, communication tools)

- Treating user A changes what user B sees in their feed
- Viral features: a treated user shares content with control users

--

**3. Shared resources** (cloud infrastructure, ad auctions)

- Bidding experiments: treating some advertisers changes auction outcomes for all
- Capacity experiments: allocating more servers to treated users reduces capacity for control

---

# Application: Marketplace Interference in Detail

<img src="slides_files/figure-html/marketplace-diagram-1.png" style="display: block; margin: auto;" />
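---

# The Crowding Mechanism, In Code

A minimal sketch of why treated outcomes shrink with saturation. The setup is an assumption for illustration: a single zone with a fixed number of requests, split evenly among whoever shows up.

```r
rides_in_zone <- 100   # assumed fixed demand in the high-demand zone

rides_per_notified_driver <- function(n_notified) {
  # Assume every notified driver relocates and requests are split
  # evenly: more treated drivers -> fewer rides for each of them
  rides_in_zone / n_notified
}

sapply(c(10, 50, 100, 500), rides_per_notified_driver)
# 10.0  2.0  1.0  0.2 -- the notification's direct "pull" is constant,
# but each treated driver's realized payoff falls as more are treated
```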
---

# Application: Co-Author Network Spillovers

Consider an experiment on academic researchers aimed at **increasing the reporting of results from pre-registered studies**. 300 studies are randomized into treatment arms T0–T3 (T0 = control; T1–T3 are progressively stronger nudges). Each study has 1–3 authors drawn from a shared pool of 1,000.

<img src="slides_files/figure-html/coauthor-network-1.png" style="display: block; margin: auto;" />

---

# Co-Author Spillovers: The Problem

<img src="slides_files/figure-html/spillover-bias-1.png" style="display: block; margin: auto;" />

Control studies connected to treated co-authors show **higher** reporting. The naive treatment effect **underestimates** the true effect (spillovers help controls).

---

# Partial vs General Equilibrium

A crucial distinction for marketplace experiments:

--

.pull-left[
**Partial equilibrium** (small experiment)

- Notify 1% of drivers
- Zone barely moves (few drivers chasing the same rides)
- Estimate `\(\approx\)` direct effect of notification
- Interference is negligible
]

--

.pull-right[
**General equilibrium** (full rollout)

- Notify 100% of drivers
- Zone saturates: everyone goes there → each gets fewer rides → effect shrinks
- The effect you measured at 1% **does not predict** the rollout outcome
]

--

.highlight-box[
**The core problem:** experiments estimate partial-equilibrium effects. Business decisions require general-equilibrium predictions. The gap between them can be large in thick markets.
]

---

# Partial vs GE: Simulated

<img src="slides_files/figure-html/partial-ge-1.png" style="display: block; margin: auto;" />

---

# Detecting SUTVA Violations

How do you know if interference is a problem in your experiment?

--

**1. Domain knowledge:** ask "does my treatment change the environment for other units?"

- Marketplace: does treating drivers change the supply/demand they face?
- Network: do treated and control users interact?

--

**2. Check for dose-response in control outcomes:**

- If control units in markets with many treated units have different outcomes than control units in markets with few treated units, interference is present.

--

**3. Vary the treatment fraction** (see the sketch after the next slide):

- Run the experiment at different saturation levels (e.g., 10% vs. 50% treated)
- If the estimated ATE differs, SUTVA is violated

--

**4. Look at treated-group outcomes over time:**

- If treated outcomes *worsen* as more units get treated (saturation), crowding is contaminating the treated group.

---

# How Treatment Fraction Reveals Interference

<img src="slides_files/figure-html/saturation-test-1.png" style="display: block; margin: auto;" />

.highlight-box[
If treated outcomes vary with treatment saturation, SUTVA is violated. The treated-control gap shrinks even though the *direct* effect is constant.
]
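---

# Saturation Test: A Sketch

A minimal version of detection strategy 3. The DGP is an illustrative assumption, not the deck's simulation: a direct effect of +0.05 and crowding proportional to the treated share.

```r
naive_ate <- function(frac_treated, n = 1000, direct = 0.05, crowd = 0.15) {
  d  <- rbinom(n, 1, frac_treated)         # Bernoulli assignment
  p0 <- 0.40                               # control accept probability
  p1 <- p0 + direct - crowd * mean(d)      # treated probability, crowded
  y  <- rbinom(n, 1, ifelse(d == 1, p1, p0))
  mean(y[d == 1]) - mean(y[d == 0])        # naive treated-control gap
}

set.seed(1)
mean(replicate(2000, naive_ate(0.10)))  # ~ 0.035
mean(replicate(2000, naive_ate(0.50)))  # ~ -0.025
# The estimate moves with the treated fraction -> SUTVA is violated.
```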
---

# Summary: Types of SUTVA Violations

| Type | Mechanism | Bias direction | Example |
|------|-----------|----------------|---------|
| **Marketplace** | Supply/demand reallocation | Depends on which side gets crowded (treated → underestimate; control → overestimate) | Zone notification crowds treated drivers |
| **Network** | Information/behavior spreads | Underestimates ATE (control group "learns" from treated peers) | Co-author learns about reporting |
| **Shared resource** | Capacity competition | Depends on which group loses capacity | Server allocation experiment |
| **General equilibrium** | Prices/wages adjust | Depends on the sign of the equilibrium response (price/wage shift) | Full rollout changes market structure |

--

.highlight-box[
**The key question for any experiment:** "If I treat more units, does anyone else's outcome change?" If yes, SUTVA is violated.
]

---

# What Can We Do About It?

Preview of Module 3:

--

1. **Cluster randomization** — randomize at a level where interference is contained (cities, markets, time periods)

--

2. **Switchback designs** — alternate treatment on/off over time within the same unit

--

3. **Two-sided marketplace designs** — randomize on one side, measure effects on the other

--

4. **Ego-cluster randomization** — for network experiments, randomize the cluster around each unit

--

Each approach trades **bias** for **variance**. Bigger clusters reduce interference bias but give you fewer independent units.

---

# Key Takeaways

1. **SUTVA** = no interference + no hidden versions of treatment. It's an assumption, not a fact.
2. In **marketplaces**, treating one side (drivers) almost always changes the supply/demand balance. The naive ATE is biased.
3. In **networks**, treatment spills over through connections. Co-authors, friends, and neighbors can all be contaminated.
4. The **direction** of the bias depends on whether spillovers hurt or help the treated and control groups.
5. **Partial vs general equilibrium**: your small experiment estimates a partial-equilibrium effect. The full rollout will be different.
6. **Detection**: vary the treatment fraction, check outcomes against treatment saturation, use domain knowledge.

---

# Exercise Preview

In the exercise you will:

1. Simulate a marketplace with and without interference
2. Show that interference biases the naive ATE (direction depends on who crowds whom)
3. Sweep over interference strengths and see the bias grow
4. Build a co-author network and simulate spillover contamination
5. Compare partial vs general equilibrium predictions

See `exercise.R` for the starter code.

---
class: center, middle, inverse

# Backup Slides

---
name: sim-dgp

# Backup: The DGP Behind the Simulation Slide

One marketplace draw (the Monte Carlo on the simulation slide replicates this 1,000× per interference strength):

```r
library(dplyr)  # for if_else()

# Assumed setup (defined in an earlier, hidden chunk of the deck):
# n drivers with an experience covariate
n <- 1000
drivers <- tibble(experience = runif(n))

sim_marketplace <- function(interference_strength = 0.15, frac_treated = 0.5) {
  # 1. Random assignment with the chosen treated share
  n_t <- round(n * frac_treated)
  notification <- sample(c(rep(1, n_t), rep(0, n - n_t)))

  # 2. Potential outcomes (probabilities). Crowding shrinks y1 by the
  #    treated share; control drivers (y0) are unaffected.
  y0 <- pmin(1, pmax(0, 0.4 + 0.2 * drivers$experience))
  y1 <- pmin(1, pmax(0, 0.4 + 0.2 * drivers$experience + 0.05 -
                       interference_strength * frac_treated))

  # 3. Realize observed outcome based on assignment, compute naive ATE
  y_obs <- rbinom(n, 1, prob = if_else(notification == 1, y1, y0))
  mean(y_obs[notification == 1]) - mean(y_obs[notification == 0])
}
```

Simulation-slide plot: `replicate(1000, sim_marketplace(0.15, 0.5))` vs `replicate(1000, sim_marketplace(0.15, 0.7))`.

<a href="#sim-main" class="nav-btn">← back</a>
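---

# Backup: A Minimal Co-Author Spillover Sketch

A toy version of the contamination on the co-author slides. Everything here is an illustrative assumption (the effect sizes, the Poisson summary of treated ties); it is not the exercise starter code.

```r
set.seed(42)
n_studies <- 300
treated   <- rbinom(n_studies, 1, 0.5)      # any nudge arm vs. control
# Assumed network summary: treated co-author ties per study
n_treated_ties <- rpois(n_studies, lambda = 1)

direct <- 0.15                              # assumed direct effect
spill  <- 0.05                              # assumed spillover per tie
p_report <- 0.30 + direct * treated +
  spill * pmin(n_treated_ties, 2) * (1 - treated)  # controls only
reported <- rbinom(n_studies, 1, p_report)

mean(reported[treated == 1]) - mean(reported[treated == 0])
# Naive gap comes out below 0.15: spillovers raise the control mean,
# so the direct effect is underestimated, as on the spillover slide.
```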
---
name: sutva-formal

# Backup: SUTVA Formal Notation

Without SUTVA, each unit's outcome depends on the full treatment vector ( `\(2^N\)` potential outcomes):

`$$Y_i = Y_i(\mathbf{D}) = Y_i(D_1, D_2, \ldots, D_N)$$`

SUTVA simplifies this to `\(Y_i(\mathbf{D}) = Y_i(D_i)\)` — just 2 potential outcomes per unit.

--

With SUTVA, the observed outcome and the ATE are well-defined:

`$$Y_i^{obs} = D_i \cdot Y_i(1) + (1 - D_i) \cdot Y_i(0), \qquad \tau = E[Y_i(1) - Y_i(0)]$$`

--

Without SUTVA, the "average direct effect" depends on everyone else's assignment `\(\mathbf{d}_{-i}\)`:

`$$\tau(\mathbf{d}_{-i}) = E[Y_i(1, \mathbf{d}_{-i}) - Y_i(0, \mathbf{d}_{-i})]$$`

<a href="#sutva-main" class="nav-btn">← back</a>

---
name: interference-types

# Backup: Taxonomy of Interference

**Direct interference:** unit `\(j\)`'s treatment directly changes unit `\(i\)`'s outcome.

- Example: vaccinating `\(j\)` reduces `\(i\)`'s infection probability.

--

**Indirect interference (through markets/prices):** treatment changes an equilibrium variable that affects everyone.

- Example: subsidizing some sellers lowers the market price, affecting all sellers.

--

**Behavioral interference:** treatment changes `\(j\)`'s behavior, and `\(i\)` observes and responds.

- Example: co-author `\(j\)` learns about reporting standards and shares this knowledge.

--

**Mechanical interference:** the experimental design itself creates interference.

- Example: in a waitlist design, adding someone to treatment removes them from control, changing the control group's composition.

---
name: ge-formal

# Backup: Partial vs General Equilibrium — Formal

Let `\(p\)` denote a market-clearing price and `\(\alpha\)` the fraction treated.

**Partial equilibrium** (small `\(\alpha\)`): the price is approximately fixed at `\(p_0\)`.

`$$\hat{\tau}_{PE} = E[Y_i(1, p_0)] - E[Y_i(0, p_0)]$$`

--

**General equilibrium** (large `\(\alpha\)`): the price adjusts to `\(p(\alpha)\)`.

`$$\tau_{GE}(\alpha) = E[Y_i(1, p(\alpha))] - E[Y_i(0, p(\alpha))]$$`

--

The experiment estimates `\(\hat{\tau}_{PE}\)`, but the rollout delivers `\(\tau_{GE}(1)\)`. The gap depends on how much `\(p\)` moves and how sensitive outcomes are to `\(p\)`:

`$$\hat{\tau}_{PE} - \tau_{GE}(1) \approx \underbrace{(p_0 - p(1))}_{\text{price change}} \times \underbrace{\frac{\partial}{\partial p}[E[Y_i(1,p)] - E[Y_i(0,p)]]}_{\text{differential price sensitivity}}$$`

This is a first-order approximation. In practice, you often cannot sign this gap without structural modeling.
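---

# Backup: Partial vs GE, Numerically

A back-of-the-envelope version of the formula above, with crowding playing the role of the equilibrium adjustment `\(p(\alpha)\)` (the direct effect and crowding strength are illustrative assumptions, not estimates):

```r
direct <- 0.05   # assumed direct effect of the notification
crowd  <- 0.15   # assumed crowding per unit of treated share

# Treated-minus-control gap when a share alpha of drivers is treated:
# crowding scales with alpha, control outcomes stay at the baseline
tau_at <- function(alpha) (0.4 + direct - crowd * alpha) - 0.4

tau_at(0.01)   # ~0.0485: partial-equilibrium estimate from a 1% test
tau_at(1.00)   # -0.10:   general-equilibrium effect at full rollout
# Under these assumed numbers the sign flips: the experiment says
# "ship it," while the rollout delivers a decline.
```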