Module 1: Python for R Users

class: center, middle, inverse, title-slide

.title[
# Module 1: Python for R Users
]
.subtitle[
## The cheat sheet
]

---

# Course Map

<table>
<tr><th>#</th><th>Module</th><th>Status</th></tr>
<tr><td><b>1</b></td><td><b>Python for R Users</b> <i>(you are here)</i></td><td>← current</td></tr>
<tr><td>2</td><td>pandas basics: filter / mutate / arrange / summarise</td><td>upcoming</td></tr>
<tr><td>3</td><td>Joins, merges, group-by recipes</td><td>upcoming</td></tr>
<tr><td>4</td><td>Regression and A/B tests with statsmodels</td><td>upcoming</td></tr>
<tr><td>5</td><td>End-to-end interview scenario</td><td>upcoming</td></tr>
</table>

---

# Before We Start: Setup

**Step 1.** Build the CSV data files (once, in your terminal):

```bash
cd ~/Desktop/sandbox/python-for-r-users
Rscript data/build_csvs.R
```

**Step 2.** Install **Positron** — a free IDE from the RStudio team that handles R *and* Python natively. Same keyboard shortcuts you already know.

Download from [positron.posit.co](https://positron.posit.co/), drag to Applications.

**Step 3.** Open the course folder: `File → Open Folder → python-for-r-users`

Open `module-01/exercise.py`. Highlight code → `Cmd+Return` to send to the Python console — just like RStudio.

**Test it:** highlight and run this:

```python
print(1 + 1)
```

If you see `2` in the console, you're ready.

---

# The Big Differences in 90 Seconds

| | R | Python |
|---|---|---|
| Indexing starts at | 1 | **0** |
| Inclusive ranges? | Yes (`1:5` = 1..5) | **No** (`range(1,5)` = 1..4, excludes the right end) |
| Assignment | `<-` (or `=`) | `=` |
| Indentation | cosmetic | **load-bearing** |
| Vectorized? | Everything is | Lists no, NumPy/pandas yes |
| Missing values | `NA` (typed) | `None` (generic null) / `np.nan` (a float) |
| Booleans | `TRUE`/`FALSE` | `True`/`False` |

The biggest mental shift: **indentation is part of the syntax**. There are no curly braces. The indented lines ARE the block (4 spaces by convention).

---

# Two Things From That Table Worth Explaining

**Why does `range(1,5)` include 1 but not 5?** Python uses "half-open" intervals: include the start, exclude the end. `range(n)` gives exactly `n` items, and consecutive ranges like `range(0,5)` + `range(5,10)` don't overlap. Same rule for list slicing: `x[1:3]` = items at index 1 and 2, not 3.

**What does "vectorized" mean?** In R, `c(1,2,3) + 10` gives `c(11,12,13)` — the operation applies to every element automatically. In Python, `[1,2,3] + 10` is an **error** — plain lists don't do element-wise math. You need NumPy or pandas for that (Module 2).
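Both points can be checked in the console. This is a minimal sketch using only built-ins; the variable names are illustrative, and the list comprehension at the end is just a plain-Python stopgap until NumPy/pandas in Module 2.

```python
# Half-open ranges: start included, end excluded.
first = list(range(0, 5))     # [0, 1, 2, 3, 4]
second = list(range(5, 10))   # [5, 6, 7, 8, 9]
combined = first + second     # + on two lists concatenates them
print(len(first), combined == list(range(10)))   # → 5 True

# Lists are not vectorized: adding a number raises TypeError.
try:
    [1, 2, 3] + 10
except TypeError as err:
    print("TypeError:", err)

# Plain-Python workaround: transform each element explicitly.
shifted = [v + 10 for v in [1, 2, 3]]
print(shifted)   # → [11, 12, 13]
```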

---

# Variables and Types

```python
x = 42                  # int
y = 3.14                # float
name = "Allison"        # str
greeting = f"Hello, {name}!"  # → "Hello, Allison!"

is_ready = True         # bool (capitalized!)
maybe = None            # Python's NULL
```

**f-strings:** the `f` before the quote says "replace `{...}` with the variable's value." Like R's `glue::glue("Hello, {name}!")` or `paste0("Hello, ", name, "!")`. Without the `f`, the braces are literal text.

R-isms that don't work: `x <- 5` parses as `x < -5` ("is x less than minus 5"), which evaluates to `False` here and assigns nothing. Use `=`.
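To see the `<-` trap in action, here is a short sketch; `result` is an illustrative name, not part of the exercise file.

```python
x = 42
result = x <- 5      # parsed as x < (-5); the comparison is stored in result
print(result)        # → False
print(x)             # → 42  (x was never reassigned)
```

No error is raised, which is what makes this slip easy to miss.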

---

# The Four Core Data Structures

```python
nums  = [1, 2, 3, 4, 5]              # list — ordered, mutable
point = (3.0, 4.0)                   # tuple — ordered, immutable
ride  = {"id": 42, "fare": 15.5}     # dict — key-value (named list)
seen  = {1, 2, 3}                    # set — unordered, unique
```

Indexing on lists:

```python
nums[0]    # → 1   (zero-indexed!)
nums[-1]   # → 5   (negative = from end)
nums[1:3]  # → [2, 3]   (right-exclusive)
```

**Lists are NOT vectorized.** `[1, 2, 3] + 1` is a `TypeError`. For vectorized math you need NumPy or pandas (Module 2).

**Mutable vs immutable:** "mutable" means you can change it after creating it. `nums[0] = 99` works on a list (mutable) but raises a `TypeError` on a tuple (immutable). **Watch out:** R has copy-on-modify semantics, so changing a vector inside a function leaves the caller's vector untouched. In Python, modifying a list in place changes the **original**: if you pass it to a function, the function can alter your data.
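A minimal sketch of that difference. `add_tip` is a hypothetical helper written for this example, and passing `.copy()` is one common way (not the only one) to protect the caller's list.

```python
def add_tip(fares):
    """Hypothetical helper: appends to the list it receives."""
    fares.append(0)       # mutates the caller's list in place
    return fares

original = [10, 20]
add_tip(original)
print(original)           # → [10, 20, 0]

safe = [10, 20]
add_tip(safe.copy())      # pass a shallow copy to protect the original
print(safe)               # → [10, 20]

point = (3.0, 4.0)
try:
    point[0] = 99         # tuples are immutable
except TypeError as err:
    print("TypeError:", err)
```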

---

# Comprehensions: The Python Idiom

```python
# List comprehension
squares = [x ** 2 for x in range(10)]
# → [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# With a filter
evens = [x for x in range(20) if x % 2 == 0]
# → [0, 2, 4, ..., 18]

# Dict comprehension
square_map = {x: x ** 2 for x in range(5)}
# → {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
```

**Reading that last line:** `{x: x ** 2 for x in range(5)}` means "for each `x` from 0 to 4, create a key-value pair where the key is `x` and the value is `x` squared." The `{}` braces make it a dict (not a list). Like `setNames((0:4)^2, 0:4)` in R, but more readable.

R equivalent: `sapply()` / `purrr::map()`. Pythonic style prefers comprehensions to loops for transformations + filters.
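For comparison, here is the same transform-plus-filter written both ways. It is a sketch with illustrative names; the two versions build identical lists.

```python
# Loop version
squares_loop = []
for x in range(10):
    if x % 2 == 0:
        squares_loop.append(x ** 2)

# Comprehension version: same result in one expression
squares_comp = [x ** 2 for x in range(10) if x % 2 == 0]

print(squares_comp)                   # → [0, 4, 16, 36, 64]
print(squares_loop == squares_comp)   # → True
```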

---

# Control Flow: if / for

```python
x = 3
if x > 0:          # the : starts the block
    print("positive")   # indentation = the block body
elif x == 0:
    print("zero")
else:
    print("negative")
# → positive
```

```python
nums = [10, 20, 30]
for item in nums:   # like R's: for (item in nums) { ... }
    print(item)
# → 10, 20, 30
```

The `:` after the condition is required. Indentation defines the block (no `{}`). Parentheses around the condition are optional and usually omitted.
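Because indentation defines the block, dedenting a line moves it out of the loop. A small sketch with illustrative variable names:

```python
total = 0
for n in [1, 2, 3]:
    total += n            # indented: runs once per item
    last_seen = n * 10    # indented: also runs once per item
doubled = total * 2       # not indented: runs once, after the loop ends

print(total, last_seen, doubled)   # → 6 30 12
```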

---

# Control Flow: while

```python
x = 3
while x > 0:
    print(x)
    x -= 1       # shorthand for x = x - 1
# → 3, 2, 1
```

**R → Python cheat sheet:**

| R | Python |
|---|---|
| `if (x > 0) { ... }` | `if x > 0:` + indent |
| `for (i in 1:5) { ... }` | `for i in range(1, 6):` + indent |
| `while (x > 0) { ... }` | `while x > 0:` + indent |

---

# Functions

```python
def mph(distance, minutes):
    """Miles per hour."""   # ← docstring (triple quotes)
    return distance / (minutes / 60)

print(mph(5, 15))   # → 20.0
```

The `"""..."""` is a **docstring** — Python's convention for documenting what a function does, what it takes, and what it returns. Like writing `#' @description` / `#' @param` in R's roxygen. Optional, but good practice. Once written, `help(mph)` prints it.

```python
# Default arguments — same idea as R's f <- function(x, y = 2)
def greet(name, greeting="Hello"):
    return f"{greeting}, {name}!"

print(greet("Maya"))               # → Hello, Maya!
print(greet("Maya", "Welcome"))    # → Welcome, Maya!
```

```python
# Lambda = anonymous function, like R's \(x) x^2
square = lambda x: x ** 2
print(square(4))                   # → 16
```

---

# Imports

```python
import pandas as pd                       # whole module, alias as pd
import numpy as np
from statsmodels.formula.api import ols   # specific name from a module
```

In R, `library(dplyr)` dumps all functions into your namespace: you just call `filter()` without saying where it came from (and get conflicts when two packages share a name). In Python, after `import pandas as pd` you **prefix every call**: `pd.read_csv()`, `np.mean()`. The `from ... import ols` form brings in one name that you call without a prefix. Verbose, but you can always trace a function back to its module, and name clashes are rare.
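The three import styles can be tried without installing anything. This sketch uses standard-library modules (`math`, `statistics`) as stand-ins for pandas and statsmodels; the alias `stats` is an arbitrary choice for the example.

```python
import math                       # whole module: call as math.sqrt(...)
import statistics as stats        # whole module under an alias
from statistics import median     # one name, usable without a prefix

print(math.sqrt(16))              # → 4.0
print(stats.mean([12, 9]))        # → 10.5
print(median([1, 3, 5]))          # → 3
```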

---

# 10 Things That Trip Up R Users

.pull-left[
1. **Zero-indexing.** `x[0]` is the first element.

2. **Indentation is syntax.** A misplaced space breaks your script.

3. **No vectorized math on lists.** Use NumPy/pandas.

4. **`==` for comparison, `=` for assignment.** No `<-`.

5. **`True`/`False` are capitalized.**
]

.pull-right[
<ol start="6">
<li><b><code>None</code> instead of <code>NULL</code>/<code>NA</code>.</b> NumPy uses <code>np.nan</code> for missing floats.</li>
<li><b>Methods vs functions.</b> <code>nums.append(6)</code> changes <code>nums</code> itself (nothing returned). <code>sorted(nums)</code> returns a new sorted list and leaves <code>nums</code> alone. In R, both would return a new object. In Python, <code>.method()</code> often mutates; <code>function()</code> often copies.</li>
<li><b>Mutability matters.</b> Lists/dicts mutable; tuples/strings immutable.</li>
<li><b><code>for</code> loops are fine here.</b> Python culture is OK with them.</li>
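<li><b>No <code>%>%</code> pipe.</b> Python uses method chaining instead: <code>df.query(...).groupby(...).mean()</code>, read left to right like a pipe. (Careful: pandas <code>df.filter()</code> selects by label, columns by default; it is not <code>dplyr::filter()</code>.)</li>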
</ol>
]
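Item 7 is the one that most often bites in practice, so here is a short sketch with a throwaway list. The last line shows a common slip: reassigning the result of an in-place method.

```python
nums = [3, 1, 2]

result = nums.append(6)     # method: mutates nums, returns None
print(result, nums)         # → None [3, 1, 2, 6]

ordered = sorted(nums)      # function: returns a new list
print(ordered, nums)        # → [1, 2, 3, 6] [3, 1, 2, 6]

nums = nums.sort()          # common slip: .sort() also returns None
print(nums)                 # → None
```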

---
class: inverse, center, middle

# Interview Questions

---

# Q1. Write a function that computes miles per hour given distance (mi) and time (min).

*Hint: `def`, division, `return`.*

```python
def mph(distance, minutes):
    return distance / (minutes / 60)

print(mph(5, 15))   # → 20.0
```

---

# Q2. Given a list of ride dicts, compute the average fare for SF rides only.

```python
rides = [{"city": "SF", "fare": 12}, {"city": "NY", "fare": 18},
         {"city": "SF", "fare": 9}]
```

*Hint: list comprehension with a filter, then `sum() / len()`.*

```python
sf_fares = [r["fare"] for r in rides if r["city"] == "SF"]
print(sum(sf_fares) / len(sf_fares))   # → 10.5
# Note: mean() is not a built-in. Use sum()/len(),
# statistics.mean() from the standard library, numpy.mean(),
# or, once in pandas, series.mean()
```

---

# Q3. Read `data/rides.csv` with pandas and show the first 5 rows.

*Hint: `pd.read_csv()` and `.head()`.*

```python
import pandas as pd
rides = pd.read_csv("data/rides.csv")
rides.head()
```

This gives you a `DataFrame` — the Python equivalent of R's `data.frame` / `tibble`. Module 2 covers everything you can do with one.

---

# Q4. Invert a dict (swap keys and values).

```python
original = {"a": 1, "b": 2, "c": 3}
# → should become {1: "a", 2: "b", 3: "c"}
```

*Hint: dict comprehension + `.items()`.*

```python
inverted = {v: k for k, v in original.items()}
print(inverted)   # → {1: 'a', 2: 'b', 3: 'c'}
```

**Reading it:** `{v: k ...}` = "in the new dict, `v` is the key, `k` is the value" (the `:` separates key from value, same as `{"a": 1}`). `.items()` gives `(key, value)` pairs from the original; the comprehension flips them.
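A likely follow-up question is what happens when values repeat. The sketch below uses a made-up dict, `dupes`, to show that later keys overwrite earlier ones, and then one possible way to keep them all by grouping keys into lists.

```python
dupes = {"a": 1, "b": 1, "c": 2}
inverted = {v: k for k, v in dupes.items()}
print(inverted)   # → {1: 'b', 2: 'c'}   ("a" was overwritten by "b")

# One way to keep every key: collect them in lists.
grouped = {}
for k, v in dupes.items():
    grouped.setdefault(v, []).append(k)
print(grouped)    # → {1: ['a', 'b'], 2: ['c']}
```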

---

# Q5. Given a list of numbers, return the unique even numbers in sorted order.

```python
nums = [4, 7, 2, 8, 4, 1, 6, 3, 8, 2]
```

*Hint: set comprehension for uniqueness + filter, then `sorted()`.*

```python
print(sorted({n for n in nums if n % 2 == 0}))
# → [2, 4, 6, 8]
```

`{...}` with no `key: value` is a **set comprehension** — gives unique values. `sorted()` returns a list.
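One wrinkle worth knowing: empty braces create a dict, not a set. A quick sketch to confirm the types, with `unique_evens` as an illustrative name:

```python
nums = [4, 7, 2, 8, 4, 1, 6, 3, 8, 2]
unique_evens = {n for n in nums if n % 2 == 0}

print(type(unique_evens).__name__)   # → set
print(type({}).__name__)             # → dict  (empty braces are a dict)
print(type(set()).__name__)          # → set   (use set() for an empty set)
print(sorted(unique_evens))          # → [2, 4, 6, 8]
```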

---
class: inverse

# The Key Takeaways

<br>

### 1. Python is "R syntax with zero-indexing, explicit imports, and load-bearing whitespace." Once you internalize those three, you can read any Python data script.

<br>

### 2. List/dict/set comprehensions replace most uses of `for` loops and `sapply`/`map`. Memorize the pattern.

<br>

### 3. Lists are not vectorized — you need NumPy or pandas for that. Module 2 starts there.

---

# Course Map

<table>
<tr><th>#</th><th>Module</th><th>Status</th></tr>
<tr><td>1</td><td>Python for R Users <i>(just finished)</i></td><td>✓ done</td></tr>
<tr><td><b>2</b></td><td><b>pandas basics</b></td><td>next</td></tr>
<tr><td>3</td><td>Joins, merges, group-by recipes</td><td>upcoming</td></tr>
<tr><td>4</td><td>Regression and A/B tests with statsmodels</td><td>upcoming</td></tr>
<tr><td>5</td><td>End-to-end interview scenario</td><td>upcoming</td></tr>
</table>