Linear Regression: The Foundation
Linear regression isn't just a simple algorithm—it's the conceptual foundation for understanding all of supervised learning. Master it deeply.
Derive, implement, and interpret linear regression, understanding it as the prototype of all supervised learning.
Core Teachings
Key concepts with source texts
The Setup: We have data: n examples of (x, y) pairs, where x is the input features and y is the target.
- Example: x = [house size, bedrooms, age], y = house price
The Assumption (Hypothesis): y is approximately a linear function of x, plus noise:
$$y = w_0 + w_1 x_1 + w_2 x_2 + ... + w_d x_d + \epsilon$$
Or in vector notation: $$y = \mathbf{w}^T \mathbf{x} + \epsilon$$
Where:
- w = weights (parameters we learn)
- x = input features (data we're given)
- ε = noise (stuff we can't predict)
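As a concrete illustration of this setup, here is a minimal NumPy sketch that generates data from such a linear model and makes a prediction. The feature ranges, weights, and noise level are invented for illustration, not taken from any real dataset:

```python
import numpy as np

# Minimal sketch of the model y = w^T x + epsilon.
# Feature ranges, weights, and noise scale are invented for illustration.
rng = np.random.default_rng(0)

n, d = 100, 3                                  # n examples, d features
X = rng.uniform(0.0, 1.0, size=(n, d))         # inputs: [size, bedrooms, age] (scaled)
w_true = np.array([200.0, 30.0, -5.0])         # weights we would like to learn
w0_true = 50.0                                 # intercept w_0
epsilon = rng.normal(0.0, 10.0, size=n)        # noise: the part we cannot predict

y = X @ w_true + w0_true + epsilon             # targets generated by the linear model

# A prediction for a new input is the same linear combination, without the noise:
x_new = np.array([0.5, 0.2, 0.7])
y_hat = x_new @ w_true + w0_true
print(y_hat)
```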
Why Linear?
1. Simplicity: Linear models are interpretable; each weight tells us the effect of one feature.
2. Often a good approximation: Many relationships are approximately linear in the right range.
3. Foundation: Understanding linear regression deeply makes complex models easier to understand.
The Geometry: In 2D (one input), we're fitting a line. In 3D (two inputs), we're fitting a plane. In higher dimensions, we're fitting a hyperplane. The weights determine the slope in each direction; the intercept (w₀) is the predicted value when all features are zero (where the line crosses the y-axis in 2D).
For a house price prediction problem, interpret what it means if w₁ (weight for 'square feet') = 200. Answer: Each additional square foot is associated with a $200 increase in price, holding other features constant.
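To see why, compare the predictions for two inputs that are identical except for one extra square foot; every other term in the model cancels:

$$\hat{y}(x_1 + 1, x_2, \dots, x_d) - \hat{y}(x_1, x_2, \dots, x_d) = w_1 (x_1 + 1) - w_1 x_1 = w_1 = 200$$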
Linear regression contains, in miniature, all the concepts of supervised learning: model specification, loss functions, optimization, overfitting, and interpretation. Master it, and you've mastered the template for understanding any supervised learning algorithm.
Study Materials
Primary sources with guided reading
Linear Regression, Clearly Explained - StatQuest
To build geometric intuition for what linear regression does and how it finds the best line.
1. What is the 'residual' and why do we square it?
2. How do we find the line that minimizes total squared residuals?
3. What is R² and how do we interpret it?
You should be able to explain, in plain English, what linear regression does and how it finds the best-fit line.
Key Takeaways
- Linear regression minimizes sum of squared residuals (vertical distances from points to line)
- The best-fit line passes through the point (mean of x, mean of y)
- R² tells us what fraction of variance in y is explained by x (both points are worked through in the sketch below)
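Here is a minimal sketch of these takeaways in NumPy, using np.linalg.lstsq to find the line that minimizes the sum of squared residuals on synthetic one-feature data (all numbers are invented for illustration):

```python
import numpy as np

# Fit the line that minimizes the sum of squared residuals, then compute R^2.
rng = np.random.default_rng(1)
x = rng.uniform(500, 3000, size=50)                       # square feet
y = 200 * x + 50_000 + rng.normal(0, 20_000, size=50)     # price with noise

A = np.column_stack([np.ones_like(x), x])                 # column of ones gives the intercept w0
w, *_ = np.linalg.lstsq(A, y, rcond=None)                 # [w0, w1] minimizing ||A w - y||^2

y_hat = A @ w
residuals = y - y_hat                                     # vertical distances from points to the line

ss_res = np.sum(residuals ** 2)                           # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)                      # total variation in y
r_squared = 1 - ss_res / ss_tot                           # fraction of variance explained

# The fitted line passes through (mean of x, mean of y):
assert np.isclose(w[0] + w[1] * x.mean(), y.mean())
print(w, r_squared)
```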
Gradient Descent, Step by Step - ML Glossary
To understand gradient descent as an algorithm—the mechanical steps, not just the intuition.
1. What are the inputs to gradient descent? What are the outputs?
2. How does the learning rate affect convergence?
3. What is the gradient, geometrically? How do we compute it?
You should be able to implement gradient descent from scratch for a simple loss function.
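One possible from-scratch sketch, assuming a one-feature linear model and mean squared error as the "simple loss function"; the data, learning rate, and iteration count are arbitrary choices:

```python
import numpy as np

# Gradient descent on the mean squared error of y_hat = w * x + b.
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, size=100)
y = 3.0 * x + 1.0 + rng.normal(0.0, 0.1, size=100)

w, b = 0.0, 0.0          # inputs: initial parameters, data, learning rate, number of steps
eta = 0.1                # learning rate: size of each step along the negative gradient

for _ in range(2000):
    y_hat = w * x + b
    error = y_hat - y
    # Gradient of MSE = mean((w*x + b - y)^2) with respect to w and b.
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    # Move downhill: step opposite the gradient.
    w -= eta * grad_w
    b -= eta * grad_b

print(w, b)              # outputs: parameters near the true values 3.0 and 1.0
```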
Key Takeaways
- Gradient descent iteratively moves toward the minimum by following the negative gradient
- Learning rate η controls step size—too small is slow, too large diverges
- For linear regression, gradient descent and the normal equation give the same answer but scale differently; the sketch below compares the two on the same data
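To check that last point empirically, here is a sketch (on invented synthetic data) that solves the same problem both ways and compares the resulting weights:

```python
import numpy as np

# Gradient descent vs. the normal equation w = (X^T X)^{-1} X^T y on the same data.
rng = np.random.default_rng(3)
X = np.column_stack([np.ones(200), rng.uniform(0.0, 1.0, size=(200, 2))])
y = X @ np.array([1.0, 2.0, -3.0]) + rng.normal(0.0, 0.05, size=200)

# Normal equation: one linear solve, but the cost grows quickly with the feature count.
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent on mean squared error: many cheap steps, scales to large feature counts.
w_gd = np.zeros(3)
eta = 0.5
for _ in range(5000):
    grad = (2.0 / len(y)) * X.T @ (X @ w_gd - y)
    w_gd -= eta * grad

print(np.max(np.abs(w_normal - w_gd)))  # tiny: both minimize the same loss
```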
Write your thoughts before revealing answers
Consider these points:
- Does the model capture causation or correlation?
- Can you actually add a bedroom while holding 'square feet' constant?
- What if bedrooms are correlated with other features not in the model?
- What if the relationship isn't actually linear?
In linear regression, what do the weights (w) represent?
What happens if we set the learning rate too high in gradient descent?