class: center, middle, inverse, title-slide

.title[
# Module 4: Algorithmic Audits
]
.subtitle[
## Pandey-Caliskan, Chen-Mislove-Wilson, and How to Audit a Black-Box Pricing System
]

---

<style type="text/css">
.remark-code, .remark-inline-code { font-size: 80%; }
.remark-slide-content { padding: 1em 2em; }
.small { font-size: 80%; }
</style>

# Course Map

<table>
<tr><th>#</th><th>Module</th><th>Status</th></tr>
<tr><td>1</td><td><a href="../module-01/slides.html">Theory Primer</a></td><td>✓ done</td></tr>
<tr><td>2</td><td><a href="../module-02/slides.html">Audit & Correspondence Studies</a></td><td>✓ done</td></tr>
<tr><td>3</td><td><a href="../module-03/slides.html">Decomposition Methods</a></td><td>✓ done</td></tr>
<tr><td><b>4</b></td><td><b>Algorithmic Audits</b> <i>(you are here)</i></td><td>← current</td></tr>
<tr><td>5</td><td>Modern Methods</td><td>upcoming</td></tr>
</table>

---

# A New Kind of Gatekeeper

The audit studies in Module 2 targeted **people**. The decomposition in Module 3 looked at **observational gaps**. This module is about auditing **algorithms**.

--

The algorithm is now the gatekeeper: it sets prices, dispatches drivers, matches riders, and suggests pay rates. You can audit it from the outside (scraped or public data) or from the inside (full access plus the ability to ship fixes).

--

Two key papers for this module:

- **Pandey & Caliskan (2021)**: disparate impact in Chicago ride-hail pricing
- **Chen, Mislove & Wilson (2015)**: *Peeking Beneath the Hood of Uber*, the methodological template

---

# What's Different About Algorithmic Audits

**1. The "manipulation" is querying the system, not deceiving a person.** No fictitious applicants, no deception, far fewer IRB headaches.

--

**2. You can work at massive scale.** Millions of queries or records: Pandey-Caliskan analyze ~100M Chicago trips.

--

**3. The "matched pair" is a counterfactual query.** "What would the algorithm have done if everything were the same except this one feature?"

--

**The catch:** you don't usually *control* the inputs.
You observe what real users get, so you're back in regression-and-controls territory, with all the bad-controls problems that implies.

---

# Pandey & Caliskan (2021)

**Title:** *Disparate Impact of Artificial Intelligence Bias in Ridehailing Economy's Price Discrimination Algorithms*

**Setting:** Chicago. The city publishes anonymized ride-hail trip data: fare, distance, duration, and pickup/dropoff census tracts.

--

**Audit question:** *Are trips that start or end in census tracts with higher minority populations charged more per mile than equivalent trips elsewhere?*

--

**Method (shadow audit):**

1. Compute fare-per-mile for each trip
2. Merge trips with census-tract demographics
3. Regress fare-per-mile on demographic features, controlling for trip observables

--

**Result:** Trips with pickup or dropoff in higher-minority census tracts have **higher fare-per-mile**, even after controls.

---

# The Implicit Mechanism

Uber's surge-pricing algorithm uses real-time supply/demand signals. **The algorithm never reads "race."** But, plausibly:

--

- In neighborhoods with thinner driver supply, prices surge more often

--

- Driver supply is correlated with neighborhood demographics (historical patterns + current urban form)

--

- So the surge signal is correlated with race even though the algorithm is race-blind

--

This is **statistical discrimination by an algorithm**, mediated by a proxy. **Phelps's model running in real time.**

---

# What Pandey-Caliskan Identifies (and Doesn't)

| What | Identified? |
|---|---|
| Correlation between demographics and price-per-mile | **Yes** |
| Documented disparate impact | **Yes** |
| Causal mechanism | **No** |
| Policy fix | **No** |

--

The paper produces a quantitative claim that regulators can debate. That's its job. It is not the final word on whether the algorithm is "discriminatory"; that's a normative call.

---

# Chen, Mislove & Wilson (2015)

**Title:** *Peeking Beneath the Hood of Uber* (IMC '15)

The original surge-pricing audit.
Not about discrimination per se, but its **methodology** is the template for much of what followed.

--

**The setup:**

- Emulated 43 copies of the Uber rider app, spread across midtown Manhattan and downtown San Francisco
- Polled surge multipliers and nearby-car data every 5 seconds, for about four weeks
- Recorded the spatial structure of surge

--

**Findings:**

- Surge is set per geographic *surge area*
- Surge changes rapidly (5–10 min durations)
- Surge increases supply only marginally; most rebalancing comes from reduced rider demand

--

**Methodological contribution:** *you can audit a black-box pricing system using only the public-facing API*. Many subsequent algorithmic audits build on this template.

---

# How to Audit a Pricing Algorithm: 4 Steps

**Step 1.** Define the disparate-impact question precisely. *Whose price? Per what? Conditional on what? Compared to what?*

--

**Step 2.** Pick the data source: internal (trip log + dispatch log + linked Census data) or external (scraped surge / public city data).

--

**Step 3.** Pick the comparison:

- Within-trip counterfactual ("if this trip were in a different neighborhood")
- Within-rider fixed effects
- Cross-sectional regression (Pandey-Caliskan style)

--

**Step 4.** Report carefully: point estimate + SE + robustness + power + interpretation. Honest reports always include all five.

---

# External vs Internal Audits

| | External | Internal |
|---|---|---|
| Sample | Partial, possibly biased | Universe of decisions |
| Variables | Public observables | Everything the system uses |
| Counterfactuals | Hard (need a model) | Easy (run an A/B test) |
| Reportability | Free to publish | Constrained |
| Can ship fixes? | No | Yes |

--

The interesting move for an **internal economist**: bring the external auditor's questions in-house, run them on the full data, and ship the fix.
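---

# Sketch: The Shadow Audit in Base R

A minimal sketch of the three-step shadow audit from the Pandey & Caliskan slide. Everything here is simulated and hypothetical: the `trips` and `tracts` tables, their columns, the pickup-tract-only merge, and the 0.8 per-mile gap built into the fares are assumptions for illustration, not the Chicago data schema or the paper's specification.

```r
set.seed(1)

# --- Hypothetical inputs (simulated; not the Chicago data schema) ---
tracts <- data.frame(tract        = sprintf("T%03d", 1:50),
                     pct_minority = runif(50, 0.05, 0.90))

n <- 5000
trips <- data.frame(pickup_tract = sample(tracts$tract, n, replace = TRUE),
                    miles        = runif(n, 1, 15))
trips$minutes <- trips$miles * runif(n, 2, 4)

# Simulated fares with a built-in gap: +0.8 per mile per unit of minority share
share <- tracts$pct_minority[match(trips$pickup_tract, tracts$tract)]
trips$fare <- trips$miles * (1.5 + 0.8 * share) + 0.1 * trips$minutes +
  rnorm(n, 0, 0.5)

# Step 1: price per mile
trips$fare_per_mile <- trips$fare / trips$miles

# Step 2: merge trips with tract demographics (pickup tract only, for simplicity)
audit_df <- merge(trips, tracts, by.x = "pickup_tract", by.y = "tract")

# Step 3: regress on demographics, controlling for trip observables
shadow_fit <- lm(fare_per_mile ~ pct_minority + miles + minutes, data = audit_df)
summary(shadow_fit)$coefficients["pct_minority", ]
```

The `pct_minority` estimate should land near the 0.8 built into the simulation. The controls are trip observables only; adding a supply-side variable would reopen the bad-controls problem.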
---
class: inverse, center, middle

# Exercise

### Stylized Surge-Pricing Audit

---

# Setup: 30 Neighborhoods, Different Demographics

```r
library(tibble)  # tibble()
library(dplyr)   # mutate()

set.seed(2026)
n_nbhd <- 30

city <- tibble(
  neighborhood  = paste0("N", sprintf("%02d", 1:n_nbhd)),
  pct_minority  = runif(n_nbhd, 0.05, 0.90),
  # driver supply falls with minority share (by construction), floored at 0.5
  driver_supply = pmax(8 - 6 * pct_minority + rnorm(n_nbhd, 0, 0.5), 0.5),
  demand        = rnorm(n_nbhd, 10, 1.5)
) |>
  mutate(
    # race-blind pricing rule: surge = demand / supply, never below 1
    surge_multiplier = pmax(1, demand / driver_supply),
    fare_per_mile    = 1.50 * surge_multiplier
  )
```

The pricing rule is `pmax(1, demand / driver_supply)`. **Race is not in this formula.** But we built driver supply to fall with `pct_minority`, so the surge inherits the correlation.

---

# The Audit Regression

```r
audit_fit <- lm(fare_per_mile ~ pct_minority, data = city)
```

|term | estimate| std.error| statistic| p.value|
|:------------|--------:|---------:|---------:|-------:|
|(Intercept) | 1.4639| 0.1538| 9.5192| <0.0001|
|pct_minority | 3.2702| 0.3818| 8.5646| <0.0001|

--

A significant positive coefficient on `pct_minority`, even though the algorithm never sees race. By construction, the disparate impact runs entirely through the supply channel.

---

# The Bad-Controls Trap (Again)

```r
audit_ctrl <- lm(fare_per_mile ~ pct_minority + driver_supply, data = city)
```

|term | estimate| std.error| statistic| p.value|
|:-------------|--------:|---------:|---------:|-------:|
|(Intercept) | 6.7962| 1.0641| 6.3871| <0.0001|
|pct_minority | -0.6735| 0.8309| -0.8107| 0.4246|
|driver_supply | -0.6583| 0.1306| -5.0395| <0.0001|

--

Adding `driver_supply` as a control kills the demographic coefficient, *because driver_supply is the mechanism*. The controlled regression hides the disparate impact instead of explaining it.

---

# Two Policy Fixes

|Policy | DI ratio (high/low)| Total revenue|
|:-----------------|-------------------:|-------------:|
|Surge as-is | 1.465| 769.9|
|Cap surge at 1.5× | 1.138| 613.5|
|No surge at all | 1.000| 438.0|

--

Closing the disparate-impact gap costs revenue. The math draws the curve; **humans pick the point**.
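---

# Sketch: Computing a Policy Table

A minimal base-R sketch of how a table like the Two Policy Fixes table could be built. The slide does not define its metrics, so two definitions here are **assumptions**: the DI ratio is mean fare-per-mile in above-median vs. at-or-below-median `pct_minority` neighborhoods, and revenue is fare-per-mile × `demand` (treating demand as miles, with no demand response to price). `evaluate_policy()` is a hypothetical helper, and under these assumptions the numbers need not match the table.

```r
# Rebuild the exercise's city in base R (same seed and draw order as the tibble version)
set.seed(2026)
n_nbhd        <- 30
pct_minority  <- runif(n_nbhd, 0.05, 0.90)
driver_supply <- pmax(8 - 6 * pct_minority + rnorm(n_nbhd, 0, 0.5), 0.5)
demand        <- rnorm(n_nbhd, 10, 1.5)
city <- data.frame(pct_minority, driver_supply, demand)

# Hypothetical helper; the DI-ratio and revenue definitions are assumptions
evaluate_policy <- function(surge, city, base_fare = 1.50) {
  fare_per_mile <- base_fare * surge
  high <- city$pct_minority > median(city$pct_minority)
  c(di_ratio      = mean(fare_per_mile[high]) / mean(fare_per_mile[!high]),
    total_revenue = sum(fare_per_mile * city$demand))
}

raw_surge <- pmax(1, city$demand / city$driver_supply)
policies <- rbind(
  "Surge as-is"       = evaluate_policy(raw_surge, city),
  "Cap surge at 1.5x" = evaluate_policy(pmin(raw_surge, 1.5), city),
  "No surge at all"   = evaluate_policy(rep(1, nrow(city)), city)
)
round(policies, 3)
```

Trying other caps (say 1.25 or 2.0) traces out more points on the revenue-vs-DI curve.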
---
class: inverse

# The Key Takeaways

<br>

### 1. Algorithmic audits can be run at scale, on real data, with low ethical cost; they are the dominant modern methodology.

--

<br>

### 2. Pandey-Caliskan documents disparate impact in ride-hail pricing without claiming a causal mechanism.

--

<br>

### 3. The bad-controls problem follows you into the algorithmic world: regressions that control for a mediator hide the very disparate impact you're trying to measure.

---

# Course Map

<table>
<tr><th>#</th><th>Module</th><th>Status</th></tr>
<tr><td>1</td><td><a href="../module-01/slides.html">Theory Primer</a></td><td>✓ done</td></tr>
<tr><td>2</td><td><a href="../module-02/slides.html">Audit & Correspondence Studies</a></td><td>✓ done</td></tr>
<tr><td>3</td><td><a href="../module-03/slides.html">Decomposition Methods</a></td><td>✓ done</td></tr>
<tr><td>4</td><td>Algorithmic Audits <i>(just finished)</i></td><td>✓ done</td></tr>
<tr><td><b>5</b></td><td><b>Modern Methods & Practitioner</b></td><td>next</td></tr>
</table>

Say **"start module 5"** when ready.