class: center, middle, inverse, title-slide .title[ # Module 7: Fairness Frameworks & Metrics ] .subtitle[ ## When You Can’t Be Fair in More Than One Way ] --- <style type="text/css"> .remark-code, .remark-inline-code { font-size: 80%; } .remark-slide-content { padding: 1em 2em; } .small { font-size: 80%; } .two-col { display: flex; align-items: flex-start; gap: 1em; } .col-narrow { flex: 1; } .col-wide { flex: 2; } .metrics-ref { position: absolute; bottom: 12px; right: 90px; background: #fff8dc; border: 1px solid #d4b400; border-radius: 4px; padding: 1px 6px; font-size: 11px; text-decoration: none; color: #6b5a00; font-weight: bold; z-index: 100; } .metrics-ref:hover { background: #ffeeaa; text-decoration: none; } .cm-corner { position: absolute; top: 90px; right: 40px; border-collapse: collapse; font-size: 90%; } .cm-corner th, .cm-corner td { border: 1px solid #888; padding: 8px 14px; text-align: center; font-weight: bold; } .cm-tp { background: #c8e6c9; color: #1b5e20; } .cm-fn { background: #ffe0b2; color: #e65100; } .cm-fp { background: #ffcdd2; color: #b71c1c; } .cm-tn { background: #bbdefb; color: #0d47a1; } </style> # Course Map <table> <tr><th>#</th><th>Module</th><th>Status</th></tr> <tr><td>1</td><td><a href="../module-01/slides.html">The Learning Problem</a></td><td>✓ done</td></tr> <tr><td>2</td><td><a href="../module-02/slides.html">Linear Models</a></td><td>✓ done</td></tr> <tr><td>3</td><td><a href="../module-03/slides.html">Model Evaluation & Selection</a></td><td>✓ done</td></tr> <tr><td>4</td><td>Tree-Based Methods</td><td>upcoming</td></tr> <tr><td>5</td><td>Unsupervised Learning</td><td>upcoming</td></tr> <tr><td>6</td><td>Neural Networks Foundations</td><td>upcoming</td></tr> <tr><td><b>7</b></td><td><b>Fairness Frameworks & Metrics</b> <i>(you are here)</i></td><td>← current</td></tr> <tr><td>8</td><td>Auditing & Interpretability</td><td>upcoming</td></tr> </table> --- # From Module 3: We Saw the Gaps <a class="metrics-ref" href="#metrics-ref">CM</a> Module 3's audit showed that the same model could have: - Equal AUC across groups - Different precision, recall, FPR per group -- That tells us **the model isn't behaving the same for everyone**. But it doesn't tell us: -- - Which gap matters? - What does "fair" actually mean? - Can we close all the gaps at once? -- This module gives you the formal answers, and one **mathematical impossibility result** that says some of those answers exclude each other. --- # The Setup For a binary classifier: - `\(Y \in \{0, 1\}\)` — actual outcome (e.g. driver accepted) - `\(\hat{Y} \in \{0, 1\}\)` — model's prediction - `\(A\)` — protected attribute (e.g. minority status), used **only** for auditing -- A "fairness criterion" is some equality between groups defined on `\(Y\)`, `\(\hat{Y}\)`, and `\(A\)`. -- There are several. The famous ones turn out to be **mutually incompatible**. --- # Three Fairness Criteria | Criterion | Equality required | |---|---| | **Demographic parity** | `\(P(\hat Y = 1 \mid A = 0) = P(\hat Y = 1 \mid A = 1)\)` | | **Equalized odds** | `\(P(\hat Y = 1 \mid Y = y, A = 0) = P(\hat Y = 1 \mid Y = y, A = 1)\)` for `\(y \in \{0, 1\}\)` | | **Predictive parity** | `\(P(Y = 1 \mid \hat Y = 1, A = 0) = P(Y = 1 \mid \hat Y = 1, A = 1)\)` | -- Each one says something different about what "fair" means. Let's read each. --- # 1. 
Demographic Parity `$$P(\hat Y = 1 \mid A = 0) = P(\hat Y = 1 \mid A = 1)$$` **The model says "accept" at the same rate for both groups.** -- - Cares only about the model's outputs, **ignores the actual labels** - Makes sense when the positive decision is itself the resource (a job offer, a loan, an Uber dispatch) -- **Failure mode:** if base rates genuinely differ, hitting equal positive rates forces the model to *miss real positives* in one group or *flag real negatives* in the other. --- # 2. Equalized Odds <a class="metrics-ref" href="#metrics-ref">CM</a> `$$P(\hat Y = 1 \mid Y = 1, A = 0) = P(\hat Y = 1 \mid Y = 1, A = 1) \quad \text{(equal TPR)}$$` `$$P(\hat Y = 1 \mid Y = 0, A = 0) = P(\hat Y = 1 \mid Y = 0, A = 1) \quad \text{(equal FPR)}$$` **Among the actually-positive, catch the same fraction in each group. Among the actually-negative, falsely flag at the same rate.** -- - Allows different positive rates *only if the truth says so* - A weaker version, **equal opportunity**, requires only equal TPR -- **Makes sense when:** there's a real outcome you care about and missing it has direct human cost (medical, fraud, dispatch). -- **Failure mode:** assumes the labels `\(Y\)` are an unbiased measure of the truth. If `\(Y\)` itself is a biased proxy (e.g., past *arrests* used as a stand-in for past *crimes*), enforcing equal TPR/FPR just reproduces the labelling bias — and the usual remedy, post-processing with group-specific thresholds, uses the protected attribute at decision time. --- # 3. Predictive Parity <a class="metrics-ref" href="#metrics-ref">CM</a> `$$P(Y = 1 \mid \hat Y = 1, A = 0) = P(Y = 1 \mid \hat Y = 1, A = 1)$$` **A "positive" prediction means the same thing in each group** — equal precision. -- **Worked example.** A driver-acceptance model flags 200 ride requests as "will accept" in each group. After observing actual outcomes: | | Predicted accept | Actually accepted | Precision | |---|---|---|---| | Non-minority | 200 | 170 | 170/200 = **0.85** | | Minority | 200 | 168 | 168/200 = **0.84** | These precisions are essentially equal → **predictive parity holds**. When the model says "accept", it's right ~85% of the time *regardless of group*. -- - **Stronger version:** *calibration within groups*. At every probability score `\(s\)`, the actual rate equals `\(s\)` in **both** groups (not just at the threshold). - This is exactly the per-group calibration check from Module 3 — re-read as a fairness criterion. -- **Failure mode:** says nothing about the *innocent* (the negatives). A model can have equal precision in both groups yet flag innocent minority riders at twice the non-minority rate — predictive parity is satisfied while equalized odds is not. --- # The Impossibility Theorem > **Chouldechova (2017) / Kleinberg, Mullainathan & Raghavan (2017):** if base rates differ between groups, `\(P(Y = 1 \mid A = 0) \neq P(Y = 1 \mid A = 1)\)`, and the classifier is not perfect, then **at most one of {demographic parity, equalized odds, predictive parity}** can hold. -- This is **not a quirk of any model**. It is a mathematical fact about the joint distribution of `\(Y\)`, `\(\hat Y\)`, `\(A\)`. -- Enforce any one of them, and the other two **must** break. -- **This is why fairness is a choice, not a calculation.** --- # The COMPAS Argument, Revisited <a class="metrics-ref" href="#metrics-ref">CM</a> Module 3 showed how COMPAS triggered a famous fairness debate: -- - **ProPublica** said: COMPAS is unfair — it has a **higher FPR for Black defendants** (innocent people flagged at twice the rate).
Equalized odds is broken. -- - **Northpointe** said: COMPAS is fair — a score of 7 means the same recidivism risk for Black and white defendants. **Predictive parity holds**. -- - **Both were right.** Base rates differ; the impossibility theorem says you cannot satisfy both at once. Choosing which to break is a value judgment, not a math error. --- # The Same Argument, with Driver Acceptance <a class="metrics-ref" href="#metrics-ref">CM</a> 1,000 ride requests per group (positive = "actually accepted"). **Different base rates** — that's what makes the theorem bite. <div class="two-col"> <div class="col-narrow"> <p><b>Non-minority</b> (60% accept)</p> <table> <tr><th></th><th>Pred = 1</th><th>Pred = 0</th></tr> <tr><th>Actual = 1</th><td>540</td><td>60</td></tr> <tr><th>Actual = 0</th><td>95</td><td>305</td></tr> </table> </div> <div class="col-narrow"> <p><b>Minority</b> (40% accept)</p> <table> <tr><th></th><th>Pred = 1</th><th>Pred = 0</th></tr> <tr><th>Actual = 1</th><td>240</td><td>160</td></tr> <tr><th>Actual = 0</th><td>40</td><td>560</td></tr> </table> </div> </div> **Precision** = TP/(TP+FP): non-min `\(\tfrac{540}{635} \approx 0.85\)`, min `\(\tfrac{240}{280} \approx 0.86\)` → **predictive parity ✓** (Northpointe) **TPR** = TP/(TP+FN): non-min `\(\tfrac{540}{600} = 0.90\)`, min `\(\tfrac{240}{400} = 0.60\)` → **equalized odds ✗** (ProPublica) **TPR gap of 0.30 on the same model — both sides right.** --- # When Predictive Parity Wins: A Cancer Screen <a class="metrics-ref" href="#metrics-ref">CM</a> Cancer screening test, two populations: A (5% prevalence), B (10%). Same model. -- **Equalized odds.** Equal TPR and FPR across groups. Then a "positive" test result means **different actual probabilities** in each group — perhaps `\(P(\text{cancer} \mid +) = 0.30\)` in A, `\(0.50\)` in B. -- The doctor now has to **mentally re-weight every score by the patient's group** — exactly the explicit demographic-aware decision fairness was supposed to prevent. -- **Predictive parity.** Force `\(P(\text{cancer} \mid +)\)` equal in both groups. The number on the page means what it says, regardless of who the patient is. Cost: more healthy patients in B get false alarms (more sick patients in B → any threshold catches more of both). --- # When Predictive Parity Wins: The Rule of Thumb **Which criterion to prioritize depends on how the score is consumed.** -- - **Fixed cutoff with concentrated harm on false positives → equalized odds** - COMPAS, hiring screens, fraud flags, no-fly lists - "Flagged" is a near-binary action with a sharp downside -- - **Score read as a calibrated probability → predictive parity** - Medical risk, credit risk, weather forecasts, insurance pricing - The number itself drives the decision; if it doesn't mean the same thing across groups, the decision-maker must use the protected attribute explicitly -- **The COMPAS reframe.** ProPublica's intuition fit COMPAS because judges *used* the score as a near-binary "high vs low" cutoff — the FPR gap translated directly into more Black defendants being detained. -- Had COMPAS been used as a calibrated probability fed into a more nuanced decision rule, **Northpointe's defense would have been the stronger one**. ProPublica was right *for COMPAS specifically*, not as a general principle. 
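---

# Checking the Criteria in Code

The gaps on the driver-acceptance slide can be reproduced directly from the confusion-matrix counts. A minimal sketch in base R (the `criteria()` helper is our own name, not from any fairness package):

```r
# Compute the group-level rates behind the three criteria from raw
# confusion-matrix counts (numbers from the driver-acceptance slide).
criteria <- function(tp, fn, fp, tn) {
  c(
    positive_rate = (tp + fp) / (tp + fn + fp + tn),  # demographic parity compares this
    tpr           = tp / (tp + fn),                   # equalized odds: this ...
    fpr           = fp / (fp + tn),                   # ... and this
    precision     = tp / (tp + fp)                    # predictive parity compares this
  )
}

round(rbind(
  non_minority = criteria(tp = 540, fn = 60,  fp = 95, tn = 305),
  minority     = criteria(tp = 240, fn = 160, fp = 40, tn = 560)
), 2)
#>              positive_rate  tpr  fpr precision
#> non_minority          0.64 0.90 0.24      0.85
#> minority              0.28 0.60 0.07      0.86
```

Precision matches across groups while TPR and FPR do not: the same gap pattern the next slide explores across thresholds.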
--- # Empirical Demonstration: Same Data, Sweep the Threshold <a class="metrics-ref" href="#metrics-ref">CM</a> <img src="slides_files/figure-html/impossibility-plot-1.png" style="display: block; margin: auto;" /> --- # How Do You Enforce a Criterion? Three injection points: -- **Pre-processing** — massage the training data (reweight, resample, transform features) before training. - Pros: model-agnostic. Cons: throws away signal; the model can relearn proxies. -- **In-processing** — modify the training objective (fairness penalty, adversarial debiasing). - Pros: principled. Cons: needs custom training code. -- **Post-processing** — adjust the **decision rule** per group on the trained model's scores. - Pros: simplest, model-agnostic. Cons: explicitly uses the protected attribute at decision time, which is legally restricted in many domains. --- # Post-Processing: Group-Specific Thresholds <a class="metrics-ref" href="#metrics-ref">CM</a> The simplest fix: pick a different threshold for each group. -- |Metric | Before| After| |:----------------------|------:|------:| |Demographic parity gap | -0.421| 0.000| |TPR gap (eq. opp.) | -0.324| 0.030| |Precision gap | -0.028| -0.162| -- We **closed the demographic parity gap** (and most of the TPR gap) by using thresholds 0.66 (majority) and 0.35 (minority). But the **precision gap got worse** — exactly what the impossibility theorem predicts. --- # Accuracy vs Fairness Frontier <img src="slides_files/figure-html/frontier-plot-1.png" style="display: block; margin: auto;" /> Every fairness intervention sits **somewhere on this curve**. Math draws the curve; humans pick the point. --- # The Frontier in Dollars <img src="slides_files/figure-html/frontier-dollars-1.png" style="display: block; margin: auto;" /> .small[ **Assumptions** (deliberately rough): Uber processes ~25M daily ride requests; the platform earns ~$5 of revenue per accepted ride; each percentage point of lost accuracy maps 1:1 to lost matches. Total matching revenue ≈ 25M × $5 × 365 ≈ **$45.6B/year**, so 1 pp of accuracy ≈ **$456M/year**. Closing the entire DP gap costs roughly **$3.5B/year** in this back-of-envelope calculation. Real platforms negotiate this number against legal exposure, brand risk, and the political cost of sustained disparities. ] --- class: inverse # The Key Questions <br> ### 1. Which fairness criterion are you protecting? Why that one? -- <br> ### 2. What does enforcing it cost — in accuracy, *and* in the other criteria? -- <br> ### 3. Who gets to make that choice? (Spoiler: it's not the data scientist alone.) --- # Course Map <table> <tr><th>#</th><th>Module</th><th>Status</th></tr> <tr><td>1</td><td><a href="../module-01/slides.html">The Learning Problem</a></td><td>✓ done</td></tr> <tr><td>2</td><td><a href="../module-02/slides.html">Linear Models</a></td><td>✓ done</td></tr> <tr><td>3</td><td><a href="../module-03/slides.html">Model Evaluation & Selection</a></td><td>✓ done</td></tr> <tr><td>4</td><td>Tree-Based Methods</td><td>upcoming</td></tr> <tr><td>5</td><td>Unsupervised Learning</td><td>upcoming</td></tr> <tr><td>6</td><td>Neural Networks Foundations</td><td>upcoming</td></tr> <tr><td>7</td><td>Fairness Frameworks & Metrics <i>(just finished)</i></td><td>✓ done</td></tr> <tr><td><b>8</b></td><td><b>Auditing & Interpretability</b></td><td>next</td></tr> </table> Say **"start module 8"** when ready.
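---

# Reference: Group-Specific Thresholds in Code

Not the code behind the earlier Before/After table, just an illustrative base R sketch on synthetic scores (all data and helper names below are made up) showing how per-group thresholds move the gaps:

```r
# Synthetic audit data with different base rates per group.
set.seed(1)
n     <- 4000
group <- sample(c("majority", "minority"), n, replace = TRUE)
y     <- rbinom(n, 1, ifelse(group == "majority", 0.60, 0.40))
score <- plogis(2 * (y - 0.5) + rnorm(n))   # noisy but informative score

# Minority-minus-majority gap in positive rate, TPR, and precision,
# for a given named vector of per-group thresholds.
gaps <- function(thresholds) {
  yhat <- as.integer(score >= thresholds[group])
  gap  <- function(f) f("minority") - f("majority")
  c(dp_gap  = gap(function(g) mean(yhat[group == g])),
    tpr_gap = gap(function(g) mean(yhat[group == g & y == 1])),
    ppv_gap = gap(function(g) mean(y[group == g & yhat == 1])))
}

round(gaps(c(majority = 0.5,  minority = 0.5)),  3)  # one shared threshold
round(gaps(c(majority = 0.66, minority = 0.35)), 3)  # per-group thresholds
```

On data like this, lowering the minority threshold closes the demographic parity gap while pushing the precision gap the other way, which is the trade-off the earlier table showed on real scores.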
--- name: metrics-ref # Reference: Metrics from the Confusion Matrix <table class="cm-corner"> <tr><th></th><th>Pred = 1</th><th>Pred = 0</th></tr> <tr><th>Actual = 1</th><td class="cm-tp">TP</td><td class="cm-fn">FN</td></tr> <tr><th>Actual = 0</th><td class="cm-fp">FP</td><td class="cm-tn">TN</td></tr> </table> - **Accuracy** = `\(\dfrac{\color{#1b5e20}{TP} + \color{#0d47a1}{TN}}{\color{#1b5e20}{TP} + \color{#e65100}{FN} + \color{#b71c1c}{FP} + \color{#0d47a1}{TN}}\)` — overall fraction correct - **Precision** = `\(\dfrac{\color{#1b5e20}{TP}}{\color{#1b5e20}{TP} + \color{#b71c1c}{FP}}\)` — of what I predicted positive, how many were real? - **True positive rate (recall)** = `\(\dfrac{\color{#1b5e20}{TP}}{\color{#1b5e20}{TP} + \color{#e65100}{FN}}\)` — of the real positives, how many did I predict positive? - **False positive rate** = `\(\dfrac{\color{#b71c1c}{FP}}{\color{#b71c1c}{FP} + \color{#0d47a1}{TN}}\)` — of the real negatives, how many did I predict positive? - **F1** = `\(2 \cdot \dfrac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}\)` — harmonic mean (penalizes lopsided models) **Worked example.** 1000 ride requests, positive = "accepted": TP = 480, FN = 220, FP = 130, TN = 170. `$$\text{accuracy} = \tfrac{480 + 170}{1000} = 0.65 \quad \text{precision} = \tfrac{480}{480 + 130} \approx 0.79 \quad \text{recall} = \tfrac{480}{480 + 220} \approx 0.69 \quad \text{FPR} = \tfrac{130}{300} \approx 0.43$$`
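---

# Reference: The Same Metrics in Code

A base R sanity check of the worked example above (not part of any earlier audit code):

```r
# Confusion-matrix counts from the worked example: 1000 ride requests.
tp <- 480; fn <- 220; fp <- 130; tn <- 170

precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)   # true positive rate

round(c(
  accuracy  = (tp + tn) / (tp + fn + fp + tn),
  precision = precision,
  recall    = recall,
  fpr       = fp / (fp + tn),
  f1        = 2 * precision * recall / (precision + recall)
), 3)
#>  accuracy precision    recall       fpr        f1
#>     0.650     0.787     0.686     0.433     0.733
```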