# ML crash course - Linear regression

> The [Linear regression](https://wiki.g15e.com/pages/Linear%20regression.txt) chapter of the [Machine learning crash course](https://wiki.g15e.com/pages/Machine%20learning%20crash%20course.txt).

https://developers.google.com/machine-learning/crash-course/linear-regression

## Introduction

Learning objectives:

- Explain a [loss function](https://wiki.g15e.com/pages/Loss%20function.txt) and how it works.
- Define and describe how [gradient descent](https://wiki.g15e.com/pages/Gradient%20descent.txt) finds the optimal model [parameters](https://wiki.g15e.com/pages/Parameter%20(machine%20learning.txt)).
- Describe how to tune [hyperparameters](https://wiki.g15e.com/pages/Hyperparameter.txt) to efficiently [train](https://wiki.g15e.com/pages/Training%20(machine%20learning.txt)) a [linear model](https://wiki.g15e.com/pages/Linear%20model.txt).

Prerequisites:

- [Introduction to Machine Learning](https://wiki.g15e.com/pages/Introduction%20to%20Machine%20Learning.txt)

Definition of [linear regression](https://wiki.g15e.com/pages/Linear%20regression.txt):

> Linear regression is a statistical technique used to find the relationship between variables. In an ML context, linear regression finds the relationship between [features](https://wiki.g15e.com/pages/Feature%20(machine%20learning.txt)) and a [label](https://wiki.g15e.com/pages/Label%20(machine%20learning.txt)).

### Linear regression equation

In algebraic terms, the model would be defined as $y = mx + b$, where

- $y$ is the value we want to predict.
- $m$ is the slope of the line
- $x$ is our input value
- $b$ is the y-intercept

In ML, we write the equation for a linear regression model as $y' = b + w_1 x_1$, where

- $y'$ is the [predicted label](https://wiki.g15e.com/pages/Prediction%20(machine%20learning.txt)) - the output
- $b$ is the [bias](https://wiki.g15e.com/pages/Bias%20(machine%20learning.txt)) of the [model](https://wiki.g15e.com/pages/Model%20(machine%20learning.txt)). Bias is the same concept as the y-intercept in the algebraic equation for a line. In ML, bias is sometimes referred to as $w_0$. Bias is a [parameter](https://wiki.g15e.com/pages/Parameter%20(machine%20learning.txt)) of the model and is calculated during [training](https://wiki.g15e.com/pages/Training%20(machine%20learning.txt)).
- $w_1$ is the [weight](https://wiki.g15e.com/pages/Weight%20(machine%20learning.txt)) of the feature. Weight is the same concept as the slope $m$ in the algebraic equation for a line. Weight is a [parameter](https://wiki.g15e.com/pages/Parameter%20(machine%20learning.txt)) of the model and is calculated during [training](https://wiki.g15e.com/pages/Training%20(machine%20learning.txt)).
- $x_1$ is a [feature](https://wiki.g15e.com/pages/Feature%20(machine%20learning.txt)) - the input

Models with multiple features have two or more weights, e.g.:

$y' = b + w_1 x_1 + w_2 x_2 + w_3 x_3$

### Key terms

- [Bias](https://wiki.g15e.com/pages/Bias%20(machine%20learning.txt))
- [Feature](https://wiki.g15e.com/pages/Feature%20(machine%20learning.txt))
- [Label](https://wiki.g15e.com/pages/Label%20(machine%20learning.txt))
- [Linear regression](https://wiki.g15e.com/pages/Linear%20regression.txt)
- [Parameter](https://wiki.g15e.com/pages/Parameter%20(machine%20learning.txt))
- [Weight](https://wiki.g15e.com/pages/Weight%20(machine%20learning.txt))

## Loss

Definition:

> [Loss](https://wiki.g15e.com/pages/Loss%20(machine%20learning.txt)) is a numerical metric that describes how wrong a [model](https://wiki.g15e.com/pages/Model%20(machine%20learning.txt))'s [predictions](https://wiki.g15e.com/pages/Prediction%20(machine%20learning.txt)) are. Loss measures the distance between the model's predictions and the actual [labels](https://wiki.g15e.com/pages/Label%20(machine%20learning.txt)). The goal of [training](https://wiki.g15e.com/pages/Training%20(machine%20learning.txt)) a model is to minimize the loss, reducing it to its lowest possible value.

Distance of loss:

> Loss focuses on the distance between the values, not the direction. … Thus, all methods for calculating loss remove the sign.

Types of loss:

- [L1 loss](https://wiki.g15e.com/pages/L1%20loss.txt): The sum of the absolute values of the difference between the predicted values and the actual values.
- [Mean absolute error](https://wiki.g15e.com/pages/Mean%20absolute%20error.txt): The average of L1 losses across a set of examples.
- [L2 loss](https://wiki.g15e.com/pages/L2%20loss.txt): The sum of the squared difference between the predicted values and the actual values.
- [Mean squared error](https://wiki.g15e.com/pages/Mean%20squared%20error.txt): The average of L2 losses across a set of examples.
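The prediction equation and the four loss formulas above can be sketched in plain Python. This is a minimal illustration, not code from the course; all function names and sample values are mine:

```python
def predict(bias, weights, features):
    """y' = b + w1*x1 + w2*x2 + ... for a single example."""
    return bias + sum(w * x for w, x in zip(weights, features))

def l1_loss(predictions, labels):
    """Sum of absolute differences; taking |.| removes the sign."""
    return sum(abs(p - y) for p, y in zip(predictions, labels))

def l2_loss(predictions, labels):
    """Sum of squared differences; squaring also removes the sign."""
    return sum((p - y) ** 2 for p, y in zip(predictions, labels))

def mae(predictions, labels):
    """Mean absolute error: L1 loss averaged over the examples."""
    return l1_loss(predictions, labels) / len(labels)

def mse(predictions, labels):
    """Mean squared error: L2 loss averaged over the examples."""
    return l2_loss(predictions, labels) / len(labels)

# A tiny model with bias b = 1.0 and weights w1 = 2.0, w2 = 0.5,
# applied to two examples with two features each.
preds = [predict(1.0, [2.0, 0.5], x) for x in [[1.0, 2.0], [0.0, 4.0]]]
print(preds)                 # -> [4.0, 3.0]
print(mse(preds, [4.0, 2.0]))  # -> 0.5
```

Because squaring amplifies large differences, an outlier example contributes far more to MSE than to MAE, which is the basis of the "choosing a loss" guidance below.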
When processing multiple examples at once, we recommend averaging the losses across all the examples, whether using [MAE](https://wiki.g15e.com/pages/Mean%20absolute%20error.txt) or [MSE](https://wiki.g15e.com/pages/Mean%20squared%20error.txt).

### Choosing a loss

When choosing the best loss function, consider how you want the model to treat [outliers](https://wiki.g15e.com/pages/Outliers.txt). For instance, [MSE](https://wiki.g15e.com/pages/Mean%20squared%20error.txt) moves the model more toward the outliers, while [MAE](https://wiki.g15e.com/pages/Mean%20absolute%20error.txt) doesn't.

### Key terms

- [Mean absolute error](https://wiki.g15e.com/pages/Mean%20absolute%20error.txt)
- [Mean squared error](https://wiki.g15e.com/pages/Mean%20squared%20error.txt)
- [L1 loss](https://wiki.g15e.com/pages/L1%20loss.txt)
- [L2 loss](https://wiki.g15e.com/pages/L2%20loss.txt)
- [Loss](https://wiki.g15e.com/pages/Loss%20(machine%20learning.txt))
- [Outliers](https://wiki.g15e.com/pages/Outliers.txt)
- [Prediction](https://wiki.g15e.com/pages/Prediction%20(machine%20learning.txt))

## Parameters exercise

https://developers.google.com/machine-learning/crash-course/linear-regression/parameters-exercise

## Gradient descent

[Gradient descent](https://wiki.g15e.com/pages/Gradient%20descent.txt) is a mathematical technique that iteratively finds the [weights](https://wiki.g15e.com/pages/Weight%20(machine%20learning.txt)) and [bias](https://wiki.g15e.com/pages/Bias%20(machine%20learning.txt)) that produce the model with the lowest [loss](https://wiki.g15e.com/pages/Loss%20(machine%20learning.txt)).

### Model convergence and loss curves

When [training](https://wiki.g15e.com/pages/Training%20(machine%20learning.txt)) a model, you'll often look at a [loss curve](https://wiki.g15e.com/pages/Loss%20curve.txt) to determine if the model has [converged](https://wiki.g15e.com/pages/Convergence%20(machine%20learning.txt)). The loss curve shows how the loss changes as the model trains.
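The iterative search for the best weight and bias can be sketched for a one-feature model trained with full-batch gradient descent on MSE. This is a hedged sketch under my own toy data and learning rate, not the course's code:

```python
# Gradient descent for y' = b + w * x, minimizing MSE on a toy dataset.
# The gradients of MSE with respect to w and b are:
#   dMSE/dw = (2/N) * sum((y' - y) * x)
#   dMSE/db = (2/N) * sum(y' - y)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]      # generated by y = 2x + 1, so w = 2, b = 1 is optimal

w, b = 0.0, 0.0                 # initial parameters
learning_rate = 0.05            # illustrative value

for step in range(2000):
    preds = [b + w * x for x in xs]
    errors = [p - y for p, y in zip(preds, ys)]
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = 2 * sum(errors) / len(xs)
    w -= learning_rate * grad_w  # step in the direction that lowers the loss
    b -= learning_rate * grad_b

print(round(w, 3), round(b, 3))  # -> 2.0 1.0
```

Recording the MSE at each step of this loop and plotting it against the step number would produce exactly the loss curve described above: steep early drops, then a flattening tail as the model converges.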
### Convergence and convex functions

The [loss functions](https://wiki.g15e.com/pages/Loss%20function.txt) for [linear models](https://wiki.g15e.com/pages/Linear%20model.txt) always produce a [convex](https://wiki.g15e.com/pages/Convex%20function.txt) surface. As a result of this property, when a [linear regression](https://wiki.g15e.com/pages/Linear%20regression.txt) model converges, we know the model has found the weights and bias that produce the lowest loss.

### Key terms

- [Convergence](https://wiki.g15e.com/pages/Convergence%20(machine%20learning.txt))
- [Convex function](https://wiki.g15e.com/pages/Convex%20function.txt)
- [Gradient descent](https://wiki.g15e.com/pages/Gradient%20descent.txt)
- [Iteration](https://wiki.g15e.com/pages/Iteration%20(machine%20learning.txt))
- [Loss curve](https://wiki.g15e.com/pages/Loss%20curve.txt)

## Hyperparameters

Definition:

> [Hyperparameters](https://wiki.g15e.com/pages/Hyperparameter.txt) are variables that control different aspects of [training](https://wiki.g15e.com/pages/Training%20(machine%20learning.txt)).
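In a hand-rolled training loop, hyperparameters show up as plain arguments that are fixed before training starts, in contrast to the parameters $w$ and $b$, which the loop itself learns. The sketch below uses mini-batch stochastic gradient descent; the function name, default values, and data are illustrative, not from the course:

```python
import random

def train(xs, ys, learning_rate=0.05, batch_size=2, epochs=500):
    """Mini-batch SGD for y' = b + w * x on MSE. learning_rate,
    batch_size, and epochs are hyperparameters: chosen up front,
    never updated by training itself."""
    w, b = 0.0, 0.0                       # parameters: learned during training
    indices = list(range(len(xs)))
    rng = random.Random(0)                # fixed seed for reproducibility
    for _ in range(epochs):               # one epoch = one full pass over the data
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            errors = [(b + w * xs[i]) - ys[i] for i in batch]
            grad_w = 2 * sum(e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            grad_b = 2 * sum(errors) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b

# Data generated by y = 2x + 1, so training should recover w ≈ 2, b ≈ 1.
w, b = train([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```

Changing any of the three keyword arguments changes how training proceeds: a larger learning rate takes bigger steps (and can overshoot), a batch size of 1 gives classic stochastic gradient descent, and more epochs means more passes over the data.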
Common hyperparameters:

- [Learning rate](https://wiki.g15e.com/pages/Learning%20rate.txt)
- [Batch size](https://wiki.g15e.com/pages/Batch%20size%20(machine%20learning.txt))
- [Epochs](https://wiki.g15e.com/pages/Epoch%20(machine%20learning.txt))

### Key terms

- [Batch size](https://wiki.g15e.com/pages/Batch%20size%20(machine%20learning.txt))
- [Epoch](https://wiki.g15e.com/pages/Epoch%20(machine%20learning.txt))
- [Generalization](https://wiki.g15e.com/pages/Generalization%20(machine%20learning.txt))
- [Hyperparameter](https://wiki.g15e.com/pages/Hyperparameter.txt)
- [Iteration](https://wiki.g15e.com/pages/Iteration%20(machine%20learning.txt))
- [Learning rate](https://wiki.g15e.com/pages/Learning%20rate.txt)
- [Parameter](https://wiki.g15e.com/pages/Parameter%20(machine%20learning.txt))
- [Stochastic gradient descent](https://wiki.g15e.com/pages/Stochastic%20gradient%20descent.txt)

## Gradient descent exercise

https://developers.google.com/machine-learning/crash-course/linear-regression/gradient-descent-exercise

## Programming exercise

https://developers.google.com/machine-learning/crash-course/linear-regression/programming-exercise

## What's next

- [ML crash course - Logistic regression](https://wiki.g15e.com/pages/ML%20crash%20course%20-%20Logistic%20regression.txt)