class: center, middle, inverse, title-slide

.title[
# Module 2: Audit & Correspondence Studies
]
.subtitle[
## Ge et al. (2016) and the Workhorse of the Modern Literature
]

---

<style type="text/css">
.remark-code, .remark-inline-code { font-size: 80%; }
.remark-slide-content { padding: 1em 2em; }
.small { font-size: 80%; }
</style>

# Course Map

<table>
<tr><th>#</th><th>Module</th><th>Status</th></tr>
<tr><td>1</td><td><a href="../module-01/slides.html">Theory Primer</a></td><td>✓ done</td></tr>
<tr><td><b>2</b></td><td><b>Audit & Correspondence Studies</b> <i>(you are here)</i></td><td>← current</td></tr>
<tr><td>3</td><td>Decomposition Methods</td><td>upcoming</td></tr>
<tr><td>4</td><td>Algorithmic Audits</td><td>upcoming</td></tr>
<tr><td>5</td><td>Modern Methods</td><td>upcoming</td></tr>
</table>

---

# Why Audit Studies Exist

Module 1 ended with a frustration: **regression with controls cannot distinguish**

> "no discrimination" from "discrimination perfectly mediated by a proxy"

--

If race influences neighborhood and neighborhood influences acceptance, then *controlling for neighborhood* removes the very channel through which race operates.

--

The fix: **manufacture variation in the protected attribute** that is uncorrelated with everything else.

--

That's exactly what audit studies do. They are the only design that gives a clean causal estimate without strong structural assumptions.

---

# The Matched-Pair Audit Design

1. Construct two profiles, `\(A\)` and `\(B\)`, **identical on everything observable** except the protected attribute (or its proxy)
2. Submit both to the same gatekeeper (employer, driver, landlord)
3. Record the decision (interview / acceptance / approval)
4. Compare rates

--

**Identification:** the only difference between profiles is the protected attribute, so any difference in outcomes is causally attributable to it.

**No controls needed:** the design handles confounding by construction.
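---

# Sketch: Why Pairing Removes Confounding

A minimal simulation of the matched-pair logic, not taken from any real study. The names `n_gatekeepers`, `baseline`, and `true_gap`, and all parameter values, are hypothetical. Each gatekeeper has their own baseline acceptance rate, both profiles go to the same gatekeeper, and the within-pair difference recovers the gap without ever measuring that baseline.

```r
# Hypothetical matched-pair simulation; all parameter values are illustrative assumptions.
set.seed(1)
n_gatekeepers <- 2000

# Each gatekeeper has an idiosyncratic baseline acceptance probability.
baseline <- runif(n_gatekeepers, min = 0.60, max = 0.95)
true_gap <- 0.10  # assumed penalty applied to profile B

# Both profiles go to the same gatekeeper, so `baseline` is shared within a pair.
accept_A <- rbinom(n_gatekeepers, size = 1, prob = baseline)
accept_B <- rbinom(n_gatekeepers, size = 1, prob = baseline - true_gap)

# Within-pair difference: the gatekeeper's baseline cancels in expectation.
pair_diff <- accept_A - accept_B
gap_hat <- mean(pair_diff)
gap_se <- sd(pair_diff) / sqrt(n_gatekeepers)

# McNemar's test uses only the discordant pairs (one profile accepted, the other not).
pair_table <- table(A = factor(accept_A, levels = 0:1),
                    B = factor(accept_B, levels = 0:1))
mcnemar_result <- mcnemar.test(pair_table)

round(c(gap = gap_hat, se = gap_se, p_value = mcnemar_result$p.value), 4)
```

Under these assumptions `gap_hat` should land near `true_gap` even though `baseline` varies widely across gatekeepers and never enters the estimator.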
---

# The Canonical Example: Bertrand-Mullainathan (2004)

**Setting:** ~5,000 résumés sent in response to job postings in Boston and Chicago.

--

**Randomization:**

- **Name** (white-sounding: Emily, Greg vs African-American–sounding: Lakisha, Jamal)
- **Quality** (high vs low credentials)

--

**Result:** white-named résumés got **~50% more callbacks** than identical Black-named résumés.

--

**Bonus finding:** the "credential premium" was much larger for white-named résumés. Black candidates couldn't compensate via résumé quality.

---

# Ge, Knittel, MacKenzie, Zoepf (2016)

The canonical ride-sharing audit study. The **centerpiece of this module**.

--

**Setting:** Boston and Seattle, on UberX and Lyft.

**Design:** RAs created rider profiles with stereotypically white and stereotypically African-American–sounding names. They followed pre-specified routes and submitted ride requests in matched pairs.

--

**Sample:** ~1,500 ride requests across the two cities.

**Outcomes measured:** cancellation rate, wait time, trip time, cost.

---

# The Headline Results

| Outcome | Setting | White | African-American | Ratio / note |
|---|---|---|---|---|
| Cancellation rate | Boston, UberX, male | 4.9% | 10.1% | **~2×** |
| Wait time | Seattle, all | baseline | +30% | longer |
| Travel time | Boston, female | baseline | longer | (indirect routes) |

--

**The discrimination is large, statistically significant, and consistent across cities and profiles.**

--

The differences look small in absolute terms (5pp on cancellation), but for the affected riders this is a meaningfully degraded service experience.

---

# The Clever Bit: Seattle

In Seattle, Uber **doesn't show rider names or photos** to drivers before the driver accepts the request.

--

The discrimination still happens. **How?**

--

The likely answer: drivers are using **pickup neighborhood** as a proxy for the rider's race, and the destination address (which they see after accepting) for further inference.
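--

A toy simulation of this proxy channel, using made-up probabilities rather than estimates from Ge et al. The variables `group` and `pickup_area` are hypothetical. The decision rule below looks only at `pickup_area`, never at `group`, yet cancellation rates still differ by group.

```r
# Toy proxy-discrimination simulation; every probability here is an illustrative assumption.
set.seed(2)
n_riders <- 20000

# Hypothetical group indicator (1 = group B), never shown to the driver.
group <- rbinom(n_riders, size = 1, prob = 0.5)

# Pickup area is correlated with group: B riders request from area 1 more often.
pickup_area <- rbinom(n_riders, size = 1, prob = ifelse(group == 1, 0.7, 0.2))

# The driver's cancellation probability depends on the pickup area only.
cancelled <- rbinom(n_riders, size = 1, prob = ifelse(pickup_area == 1, 0.10, 0.04))

# Raw gap by group: nonzero even though `group` never entered the decision.
rate_by_group <- tapply(cancelled, group, mean)
raw_gap <- unname(rate_by_group["1"] - rate_by_group["0"])

# Gap within each pickup area: close to zero, because area fully mediates the effect.
rate_by_cell <- tapply(cancelled, list(group = group, area = pickup_area), mean)
within_gap <- rate_by_cell["1", ] - rate_by_cell["0", ]

round(c(raw_gap = raw_gap, within_gap), 3)
```

Within a pickup area both groups face the same cancellation probability by construction, which is also why conditioning on neighborhood would hide the gap.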
--

This is **exactly the Phelps story** from Module 1, embedded in real data:

- Drivers don't see race directly
- They use a correlated feature (location) to infer it
- The result is racially disparate outcomes via a "race-blind" matching mechanism

--

This finding is what made the paper a foundational reference for the algorithmic-fairness literature that came after.

---

# What Audits Identify (and Don't)

| Question | Audit answers? |
|---|---|
| Is there a *causal* effect of perceived group membership? | **Yes** |
| Is the discrimination taste-based or statistical? | **No** |
| Does the discrimination persist in equilibrium? | **No** |
| Would policy X eliminate it? | **No** |
| Is the gap "fair" by some normative criterion? | **No** |
| What's the aggregate welfare loss? | **No** |

--

The right way to read an audit study: **clean documentation that the discrimination exists and is large**, *without* strong claims about mechanism or policy.

---

# Standard Threats to Validity

**1.** The manipulation must work (gatekeeper sees what we intended)

--

**2.** Profiles must be matched on everything else (subtle confounders like socioeconomic inferences from names)

--

**3.** The sample of gatekeepers must be representative (one city ≠ the world)

--

**4.** The decision being measured must be the one that matters (callback ≠ hire ≠ wage)

--

**5.** Ethical / legal: audit studies work by deceiving the gatekeeper. IRB approval is non-trivial; some jurisdictions push back.

---
class: inverse, center, middle

# Exercise

### A Stylized Ge et al. Replication

---

# Setup

```r
library(dplyr)  # bind_rows(), tibble(), group_by(), summarise(), n()

set.seed(2026)
n_rides_per_group <- 500
baseline_cancel_A <- 0.05
discrimination <- 0.05  # B sees +5pp cancellation rate

# One row per ride request; `cancelled` is 1 if the driver cancelled.
audit <- bind_rows(
  tibble(profile = "A",
         cancelled = rbinom(n_rides_per_group, 1, baseline_cancel_A)),
  tibble(profile = "B",
         cancelled = rbinom(n_rides_per_group, 1, baseline_cancel_A + discrimination))
)

audit |>
  group_by(profile) |>
  summarise(n = n(), cancellation = round(mean(cancelled), 3), .groups = "drop") |>
  knitr::kable()
```

|profile |   n| cancellation|
|:-------|---:|------------:|
|A       | 500|        0.040|
|B       | 500|        0.112|

---

# Estimating the Audit Effect

```r
counts <- audit |>
  group_by(profile) |>
  summarise(x = sum(cancelled), n = n(), .groups = "drop")

# Rows are ordered A, B, so prop.test() reports on prop 1 = A and prop 2 = B.
prop_test <- prop.test(x = counts$x, n = counts$n, correct = FALSE)

# diff() gives prop 2 - prop 1, i.e. B - A.
gap_estimate <- unname(diff(prop_test$estimate))

# conf.int is for A - B; negate and swap the bounds to get the CI for B - A.
ci <- c(-prop_test$conf.int[2], -prop_test$conf.int[1])
```

--

|Statistic             |Value            |
|:---------------------|:----------------|
|Estimated gap (B - A) |0.072            |
|95% CI                |[0.0395, 0.1045] |
|p-value               |1.74e-05         |

The point estimate (0.072) overshoots the true effect (0.05) because of sampling noise, but the 95% CI contains 0.05 and excludes zero.

---

# How Many Rides Do You Need? (Power Calculation)

```r
power_test <- power.prop.test(
  p1 = 0.05, p2 = 0.10,
  sig.level = 0.05, power = 0.8
)
ceiling(power_test$n)
```

```
## [1] 435
```

--

To detect a **5 pp gap** at 80% power, you need ~435 rides **per profile**, or ~870 total.

--

For a more realistic **2 pp gap** (5% vs 7%):

```
## 2213 per group → 4426 total
```

That's why most field audit studies are **expensive**, and why algorithmic / scraped audits (Module 4) became dominant.

---

# Power vs Sample Size (Monte Carlo)

<img src="slides_files/figure-html/power-curve-1.png" style="display: block; margin: auto;" />

---
class: inverse

# The Key Takeaways

<br>

### 1. Audits are the only design that gives clean causal identification of discrimination without strong structural assumptions.

--

<br>

### 2. Ge et al. (2016) is the canonical ride-sharing example. Cancellation rates were about twice as high for riders with African-American–sounding names (Boston, UberX, male profiles).

--

<br>

### 3. The Seattle finding showed that even "anonymized" matching produces racial disparities via correlated features. It is the empirical anchor for the algorithmic-fairness literature.

---

# Course Map

<table>
<tr><th>#</th><th>Module</th><th>Status</th></tr>
<tr><td>1</td><td><a href="../module-01/slides.html">Theory Primer</a></td><td>✓ done</td></tr>
<tr><td>2</td><td>Audit & Correspondence Studies <i>(just finished)</i></td><td>✓ done</td></tr>
<tr><td><b>3</b></td><td><b>Decomposition Methods (Cook et al. 2021)</b></td><td>next</td></tr>
<tr><td>4</td><td>Algorithmic Audits</td><td>upcoming</td></tr>
<tr><td>5</td><td>Modern Methods</td><td>upcoming</td></tr>
</table>

Say **"start module 3"** when ready.