Linear Regression: The Foundation
Linear regression isn't just a simple algorithm—it's the conceptual foundation for understanding all of supervised learning. Master it deeply.
Derive, implement, and interpret linear regression, understanding it as the prototype of all supervised learning.
Core Teachings
Key concepts with source texts
The Setup: We have data: n examples of (x, y) pairs, where x is the input features and y is the target.
- Example: x = [house size, bedrooms, age], y = house price
The Assumption (Hypothesis): y is approximately a linear function of x, plus noise:
$$y = w_0 + w_1 x_1 + w_2 x_2 + ... + w_d x_d + \epsilon$$
Or in vector notation: $$y = \mathbf{w}^T \mathbf{x} + \epsilon$$
Where:
- w = weights (parameters we learn)
- x = input features (data we're given)
- ε = noise (stuff we can't predict)
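As a concrete illustration of this setup, here is a minimal NumPy sketch that generates data from such a linear model and makes a prediction. The feature ranges, weights, and noise level are invented for illustration, not taken from any real dataset:

```python
import numpy as np

# Minimal sketch of the model y = w^T x + epsilon.
# Feature ranges, weights, and noise scale are invented for illustration.
rng = np.random.default_rng(0)

n, d = 100, 3                                  # n examples, d features
X = rng.uniform(0.0, 1.0, size=(n, d))         # inputs: [size, bedrooms, age] (scaled)
w_true = np.array([200.0, 30.0, -5.0])         # weights we would like to learn
w0_true = 50.0                                 # intercept w_0
epsilon = rng.normal(0.0, 10.0, size=n)        # noise: the part we cannot predict

y = X @ w_true + w0_true + epsilon             # targets generated by the linear model

# A prediction for a new input is the same linear combination, without the noise:
x_new = np.array([0.5, 0.2, 0.7])
y_hat = x_new @ w_true + w0_true
print(y_hat)
```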
Why Linear?
1. Simplicity: Linear models are interpretable; each weight tells us the effect of one feature.
2. Often a good approximation: Many relationships are approximately linear in the right range.
3. Foundation: Understanding linear regression deeply makes complex models easier to understand.
The Geometry: In 2D (one input), we're fitting a line. In 3D (two inputs), we're fitting a plane. In higher dimensions, we're fitting a hyperplane. The weights determine the slope in each direction; the intercept (w₀) is the predicted value when all features are zero (where the line crosses the y-axis in 2D).
For a house price prediction problem, interpret what it means if w₁ (weight for 'square feet') = 200. Answer: Each additional square foot is associated with a $200 increase in price, holding other features constant.
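To see why, compare the predictions for two inputs that are identical except for one extra square foot; every other term in the model cancels:

$$\hat{y}(x_1 + 1, x_2, \dots, x_d) - \hat{y}(x_1, x_2, \dots, x_d) = w_1 (x_1 + 1) - w_1 x_1 = w_1 = 200$$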
Linear regression contains, in miniature, all the concepts of supervised learning: model specification, loss functions, optimization, overfitting, and interpretation. Master it, and you've mastered the template for understanding any supervised learning algorithm.
Study Materials
Primary sources with guided reading
Linear Regression, Clearly Explained - StatQuest
To build geometric intuition for what linear regression does and how it finds the best line.
1. What is the 'residual' and why do we square it?
2. How do we find the line that minimizes total squared residuals?
3. What is R² and how do we interpret it?
You should be able to explain, in plain English, what linear regression does and how it finds the best-fit line.
Key Takeaways
- Linear regression minimizes sum of squared residuals (vertical distances from points to line)
- The best-fit line passes through the point (mean of x, mean of y)
- R² tells us what fraction of variance in y is explained by x (both points are worked through in the sketch below)
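Here is a minimal sketch of these takeaways in NumPy, using np.linalg.lstsq to find the line that minimizes the sum of squared residuals on synthetic one-feature data (all numbers are invented for illustration):

```python
import numpy as np

# Fit the line that minimizes the sum of squared residuals, then compute R^2.
rng = np.random.default_rng(1)
x = rng.uniform(500, 3000, size=50)                       # square feet
y = 200 * x + 50_000 + rng.normal(0, 20_000, size=50)     # price with noise

A = np.column_stack([np.ones_like(x), x])                 # column of ones gives the intercept w0
w, *_ = np.linalg.lstsq(A, y, rcond=None)                 # [w0, w1] minimizing ||A w - y||^2

y_hat = A @ w
residuals = y - y_hat                                     # vertical distances from points to the line

ss_res = np.sum(residuals ** 2)                           # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)                      # total variation in y
r_squared = 1 - ss_res / ss_tot                           # fraction of variance explained

# The fitted line passes through (mean of x, mean of y):
assert np.isclose(w[0] + w[1] * x.mean(), y.mean())
print(w, r_squared)
```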
Gradient Descent, Step by Step - ML Glossary
To understand gradient descent as an algorithm—the mechanical steps, not just the intuition.
1. What are the inputs to gradient descent? What are the outputs?
2. How does the learning rate affect convergence?
3. What is the gradient, geometrically? How do we compute it?
You should be able to implement gradient descent from scratch for a simple loss function.
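One possible from-scratch sketch, assuming a one-feature linear model and mean squared error as the "simple loss function"; the data, learning rate, and iteration count are arbitrary choices:

```python
import numpy as np

# Gradient descent on the mean squared error of y_hat = w * x + b.
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, size=100)
y = 3.0 * x + 1.0 + rng.normal(0.0, 0.1, size=100)

w, b = 0.0, 0.0          # inputs: initial parameters, data, learning rate, number of steps
eta = 0.1                # learning rate: size of each step along the negative gradient

for _ in range(2000):
    y_hat = w * x + b
    error = y_hat - y
    # Gradient of MSE = mean((w*x + b - y)^2) with respect to w and b.
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    # Move downhill: step opposite the gradient.
    w -= eta * grad_w
    b -= eta * grad_b

print(w, b)              # outputs: parameters near the true values 3.0 and 1.0
```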
Key Takeaways
- Gradient descent iteratively moves toward the minimum by following the negative gradient
- Learning rate η controls step size—too small is slow, too large diverges
- For linear regression, gradient descent and the normal equation give the same answer but scale differently; the sketch below compares the two on the same data
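To check that last point empirically, here is a sketch (on invented synthetic data) that solves the same problem both ways and compares the resulting weights:

```python
import numpy as np

# Gradient descent vs. the normal equation w = (X^T X)^{-1} X^T y on the same data.
rng = np.random.default_rng(3)
X = np.column_stack([np.ones(200), rng.uniform(0.0, 1.0, size=(200, 2))])
y = X @ np.array([1.0, 2.0, -3.0]) + rng.normal(0.0, 0.05, size=200)

# Normal equation: one linear solve, but the cost grows quickly with the feature count.
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent on mean squared error: many cheap steps, scales to large feature counts.
w_gd = np.zeros(3)
eta = 0.5
for _ in range(5000):
    grad = (2.0 / len(y)) * X.T @ (X @ w_gd - y)
    w_gd -= eta * grad

print(np.max(np.abs(w_normal - w_gd)))  # tiny: both minimize the same loss
```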
Write your thoughts before revealing answers
Consider these points:
- Does the model capture causation or correlation?
- Can you actually add a bedroom while holding 'square feet' constant?
- What if bedrooms are correlated with other features not in the model?
- What if the relationship isn't actually linear?
In linear regression, what do the weights (w) represent?
What happens if we set the learning rate too high in gradient descent?