ML crash course - Linear regression

Notes on the Linear regression chapter of the Machine Learning Crash Course.


Learning objectives:


Definition of linear regression:

Linear regression is a statistical technique used to find the relationship between variables. In an ML context, linear regression finds the relationship between features and a label.

Linear regression equation

In algebraic terms, the model would be defined as $y = mx + b$, where

  • $y$ is the value we want to predict.
  • $m$ is the slope of the line.
  • $x$ is our input value.
  • $b$ is the y-intercept.

In ML, we write the equation for a linear regression model as $y' = b + w_1 x_1$, where

  • $y'$ is the predicted label - the output
  • $b$ is the bias of the model. Bias is the same concept as the y-intercept in the algebraic equation for a line. In ML, bias is sometimes referred to as $w_0$. Bias is a parameter of the model and is calculated during training.
  • $w_1$ is the weight of the feature. Weight is the same concept as the slope $m$ in the algebraic equation for a line. Weight is a parameter of the model and is calculated during training.
  • $x_1$ is a feature - the input

Models with multiple features have a separate weight for each feature, e.g.: $y' = b + w_1 x_1 + w_2 x_2 + w_3 x_3$
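To make the multi-feature equation concrete, here is a minimal Python sketch of a single prediction. The function name predict and all numeric values are hypothetical, chosen only for illustration; in a real model the weights and bias would come from training.

```python
def predict(features, weights, bias):
    """Return y' = b + w_1*x_1 + ... + w_n*x_n for one example."""
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature")
    return bias + sum(w * x for w, x in zip(weights, features))


# Hypothetical parameters and inputs (not taken from the course).
y_pred = predict(features=[2.0, 0.0, 4.0], weights=[1.5, -3.0, 0.5], bias=1.0)
print(y_pred)  # 1.0 + 1.5*2.0 + (-3.0)*0.0 + 0.5*4.0 = 6.0
```

With a single feature the same function reduces to the one-weight form b + w_1 x_1.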

Key terms



Loss is a numerical metric that describes how wrong a model’s predictions are. Loss measures the distance between the model’s predictions and the actual labels. The goal of training a model is to minimize the loss, reducing it to its lowest possible value.

Distance of loss:

Loss focuses on the distance between the values, not the direction. … Thus, all methods for calculating loss remove the sign.

Types of loss:

  • L1 loss: The sum of the absolute values of the difference between the predicted values and the actual values.
  • Mean absolute error: The average of L1 losses across a set of examples.
  • L2 loss: The sum of the squared difference between the predicted values and the actual values.
  • Mean squared error: The average of L2 losses across a set of examples.
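The four definitions above can be sketched in a few lines of plain Python. The function names and the sample predictions and labels are assumptions made for this example, not part of the course material.

```python
def l1_loss(predictions, labels):
    """Sum of absolute differences between predictions and labels."""
    return sum(abs(p - y) for p, y in zip(predictions, labels))


def l2_loss(predictions, labels):
    """Sum of squared differences between predictions and labels."""
    return sum((p - y) ** 2 for p, y in zip(predictions, labels))


def mean_absolute_error(predictions, labels):
    """L1 loss averaged over the number of examples."""
    return l1_loss(predictions, labels) / len(labels)


def mean_squared_error(predictions, labels):
    """L2 loss averaged over the number of examples."""
    return l2_loss(predictions, labels) / len(labels)


# Hypothetical data: the per-example errors are 1, 0 and -3.
predictions = [3.0, 5.0, 8.0]
labels = [2.0, 5.0, 11.0]

print(l1_loss(predictions, labels))              # 1 + 0 + 3 = 4.0
print(l2_loss(predictions, labels))              # 1 + 0 + 9 = 10.0
print(mean_absolute_error(predictions, labels))  # 4 / 3
print(mean_squared_error(predictions, labels))   # 10 / 3
```

Note that the error of -3 contributes 3 to the L1 loss and 9 to the L2 loss: both remove the sign, but squaring also magnifies larger errors.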

When processing multiple examples at once, we recommend averaging the losses across all the examples, whether using MAE or MSE.

Choosing a loss

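When choosing the best loss function, consider how you want the model to treat outliers. MSE squares each error, so a large error on an outlier dominates the loss and training moves the model more toward the outliers. MAE penalizes errors in proportion to their size, so outliers pull the model less.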
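A small numeric sketch of the outlier effect, using a hypothetical list of prediction errors in which the last one is an outlier. It compares how much of the total loss the outlier accounts for under each metric.

```python
# Hypothetical per-example errors; the last value is the outlier.
errors = [1.0, 1.0, 1.0, 10.0]

abs_terms = [abs(e) for e in errors]  # contributions to MAE
sq_terms = [e ** 2 for e in errors]   # contributions to MSE

mae = sum(abs_terms) / len(errors)    # 13 / 4 = 3.25
mse = sum(sq_terms) / len(errors)     # 103 / 4 = 25.75

# Fraction of the total loss that comes from the outlier alone.
outlier_share_mae = abs_terms[-1] / sum(abs_terms)  # 10 / 13, about 0.77
outlier_share_mse = sq_terms[-1] / sum(sq_terms)    # 100 / 103, about 0.97

print(mae, mse)
print(outlier_share_mae, outlier_share_mse)
```

Under MSE almost all of the loss comes from the single outlier, so reducing that one error is what training rewards most.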

Key terms

Parameters exercise

Gradient descent

Gradient descent is a mathematical technique that iteratively finds the weights and bias that produce the model with the lowest loss.
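As a rough sketch of the idea, the loop below runs gradient descent for a one-feature model with MSE as the loss. The function name, the learning rate, the step count, the zero initialization, and the toy data are all assumptions made for this example. The gradients are the partial derivatives of MSE with respect to the weight and the bias.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y' = w*x + b by repeatedly stepping against the MSE gradient."""
    w, b = 0.0, 0.0  # assumed starting point
    n = len(xs)
    for _ in range(steps):
        # Signed error (prediction minus label) for every example.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error^2).
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Move each parameter a small step in the direction that lowers the loss.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1, so the ideal result is w = 2, b = 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)
print(w, b)  # close to 2 and 1
```

The learning rate here was picked to suit this toy data; a value that is too large makes the updates overshoot and the loss grow instead of shrink.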

Model convergence and loss curves

When training a model, you’ll often look at a loss curve to determine if the model has converged. The loss curve shows how the loss changes as the model trains.
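One way to picture a loss curve is to record the loss at every training step. The sketch below does that for a one-feature model trained with gradient descent on MSE and stops once the loss barely changes. Treating "the loss changed by less than tolerance" as convergence is an assumption for illustration, as are the function name, the hyperparameter values, and the toy data.

```python
def train_with_history(xs, ys, learning_rate=0.05, max_steps=5000, tolerance=1e-9):
    """Train y' = w*x + b and return the parameters plus the loss per step."""
    w, b = 0.0, 0.0
    n = len(xs)
    history = []  # the loss curve: one MSE value per step
    for _ in range(max_steps):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / n
        # Assumed stopping rule: the curve has flattened out.
        converged = bool(history) and abs(history[-1] - loss) < tolerance
        history.append(loss)
        if converged:
            break
        w -= learning_rate * (2 / n) * sum(e * x for e, x in zip(errors, xs))
        b -= learning_rate * (2 / n) * sum(errors)
    return w, b, history


# Toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b, history = train_with_history(xs, ys)

print(len(history))             # number of steps before the curve flattened
print(history[0], history[-1])  # loss at the start and at the end
```

Plotting history against the step index gives the loss curve: a steep drop at first, then a flattening tail as the model approaches convergence.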

Convergence and convex functions

The loss functions for linear models always produce a convex surface. As a result of this property, when a linear regression model converges, we know the model has found the weights and bias that produce the lowest loss.

Key terms



Hyperparameters are variables that control different aspects of training.

Common hyperparameters:

Key terms

Gradient descent exercise

Programming exercise

What’s next

2025 © ak