Logistic Regression: Linear Models for Classification
Classification seems different from regression, but the core ideas transfer directly. Logistic regression shows how.
Understand logistic regression as linear regression with a different loss function and interpretation.
Core Teachings
Key concepts with source texts
Setup:
- Input: features x (email text, tumor size, image pixels, ...)
- Output: class y ∈ {0, 1} (spam/not spam, malignant/benign, cat/dog, ...)
Why Not Linear Regression? If we use linear regression to predict class (0 or 1), the predictions can be any real number. A prediction of 1.5 or -0.3 doesn't make sense as a class.
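To see the problem concretely, here is a minimal sketch (NumPy, with made-up tumor-size data): fitting an ordinary least-squares line to 0/1 labels produces "class" predictions well outside [0, 1].

```python
import numpy as np

# Made-up data: tumor sizes (cm) with binary labels (1 = malignant).
sizes = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5])
labels = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Ordinary least-squares line fit to the 0/1 labels.
slope, intercept = np.polyfit(sizes, labels, 1)

# Predictions at extreme inputs escape [0, 1].
for size in (0.2, 5.0):
    print(f"size={size}: predicted 'class' = {slope * size + intercept:.2f}")
# size=0.2: predicted 'class' = -0.55
# size=5.0: predicted 'class' = 1.91
```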
The Logistic Function (Sigmoid): We transform the linear output into a probability:
$$P(y=1|x) = \sigma(\mathbf{w}^T \mathbf{x}) = \frac{1}{1 + e^{-\mathbf{w}^T \mathbf{x}}}$$
Properties of σ(z):
- Always between 0 and 1 (interpretable as a probability)
- σ(0) = 0.5 (decision boundary at linear score = 0)
- Monotonically increasing (higher linear score → higher probability)
Interpretation: The linear part (w^T x) computes a 'score.' The sigmoid squashes this to a probability. We predict class 1 if P(y=1|x) > 0.5 (i.e., score > 0).
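A minimal sketch of this pipeline in Python (the weights and feature values here are hypothetical, assuming NumPy):

```python
import numpy as np

def sigmoid(z):
    """Squash a real-valued score into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, x):
    """P(y=1|x): sigmoid of the linear score w^T x."""
    return sigmoid(np.dot(w, x))

def predict(w, x):
    """Predict class 1 if P(y=1|x) > 0.5, i.e. if the score > 0."""
    return int(predict_proba(w, x) > 0.5)

# Hypothetical two-feature example: score = 2.0*1.0 + (-1.0)*0.5 = 1.5.
w = np.array([2.0, -1.0])
x = np.array([1.0, 0.5])
print(predict_proba(w, x))  # ≈ 0.82
print(predict(w, x))        # 1
```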
For a spam classifier with weights w_contains_viagra = 4, w_from_friend = -3: calculate the probability an email is spam if it contains 'viagra' (1) but is from a friend (1). Then calculate if it's not from a friend (0).
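After working it out by hand, you can check your arithmetic with a short script (standard library only; treating each feature as a 0/1 indicator and omitting a bias term, as the prompt implies):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w_contains_viagra, w_from_friend = 4.0, -3.0

# Contains 'viagra' (1) and from a friend (1): score = 4 - 3 = 1.
print(sigmoid(w_contains_viagra * 1 + w_from_friend * 1))  # ≈ 0.73

# Contains 'viagra' (1), not from a friend (0): score = 4.
print(sigmoid(w_contains_viagra * 1 + w_from_friend * 0))  # ≈ 0.98
```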
Logistic regression is the workhorse of industry ML for classification. It's interpretable (weights show feature importance), fast to train, and often surprisingly effective. Understanding it deeply prepares you for neural networks, which can be viewed as layers of logistic-regression-like units.
Study Materials
Primary sources with guided reading
Logistic Regression, Clearly Explained - StatQuest
To understand how logistic regression transforms a linear model's output into a probability, and why cross-entropy is the right loss function.
1. Why can't we use regular linear regression for classification?
2. What does the sigmoid function do? Why is it useful?
3. What is maximum likelihood, and how does it lead to cross-entropy loss?
You should understand logistic regression as 'linear regression passed through sigmoid, trained with cross-entropy loss.'
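To make that one-line summary concrete, here is a minimal training sketch (NumPy, made-up data, plain batch gradient descent; real libraries such as scikit-learn use regularized, more robust solvers):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(y, p):
    """Average negative log-likelihood of labels y under probabilities p."""
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Made-up 1-feature dataset; second column is a constant 1 for the bias.
X = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

w = np.zeros(2)
learning_rate = 0.5
for _ in range(2000):
    p = sigmoid(X @ w)             # P(y=1|x) for every example
    grad = X.T @ (p - y) / len(y)  # gradient of cross-entropy w.r.t. w
    w -= learning_rate * grad      # gradient descent step

print(w, cross_entropy(y, sigmoid(X @ w)))
```

One caveat: on perfectly separable data like this toy set, the loss can always be lowered by growing the weights, so they never fully converge; this is one reason production implementations add regularization.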
Write your thoughts before revealing answers
Consider these points:
- What does a negative weight mean for the relationship?
- If income increases by $10,000, how does the linear score change?
- How does that change in score affect probability through the sigmoid?
- Is the effect on probability constant, or does it depend on the starting point? (See the sketch after this list.)
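To explore that last question numerically, a minimal sketch (the +0.5 change in score per $10,000 is a made-up weight):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical: each $10,000 of income adds 0.5 to the linear score.
delta = 0.5
for base_score in (-4.0, 0.0, 4.0):
    before = sigmoid(base_score)
    after = sigmoid(base_score + delta)
    print(f"score {base_score:+.1f} -> {base_score + delta:+.1f}: "
          f"P {before:.3f} -> {after:.3f} (change {after - before:+.3f})")
```

The same +0.5 in score moves the probability by about 0.12 near the decision boundary but only about 0.01 in the tails, so the effect on probability is not constant; it depends on the starting point.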