# ML crash course - Logistic regression

> The Logistic regression chapter of the Machine learning crash course.

The [Logistic regression](https://wiki.g15e.com/pages/Logistic%20regression.txt) chapter of the [Machine learning crash course](https://wiki.g15e.com/pages/Machine%20learning%20crash%20course.txt).

https://developers.google.com/machine-learning/crash-course/logistic-regression

## Introduction

Learning Objectives:

- Identify use cases for performing [logistic regression](https://wiki.g15e.com/pages/Logistic%20regression.txt).
- Explain how logistic regression models use the [sigmoid function](https://wiki.g15e.com/pages/Sigmoid%20function.txt) to calculate probability.
- Compare [linear regression](https://wiki.g15e.com/pages/Linear%20regression.txt) and logistic regression.
- Explain why logistic regression uses [log loss](https://wiki.g15e.com/pages/Log%20loss.txt) instead of [squared loss](https://wiki.g15e.com/pages/L2%20loss.txt).
- Explain the importance of [regularization](https://wiki.g15e.com/pages/Regularization%20(machine%20learning.txt)) when training logistic regression models.

Prerequisites:

- [Introduction to Machine Learning](https://wiki.g15e.com/pages/Introduction%20to%20Machine%20Learning.txt)
- [ML crash course - Linear regression](https://wiki.g15e.com/pages/ML%20crash%20course%20-%20Linear%20regression.txt)

## Calculating a probability with the sigmoid function

This module focuses on using logistic regression model output as-is. In the [Classification module](https://wiki.g15e.com/pages/ML%20crash%20course%20-%20Classification.txt), you'll learn how to convert this output into a [binary category](https://wiki.g15e.com/pages/Binary%20classification.txt).

### Sigmoid function

The standard logistic function, also known as the [sigmoid function](https://wiki.g15e.com/pages/Sigmoid%20function.txt) (sigmoid means "s-shaped"), has the formula:

$$
f(x) = \frac{1}{1 + e^{-x}}
$$

### Linear regression from/to logistic regression

You can pass the linear regression prediction into the sigmoid function to obtain the logistic regression prediction. The output of [linear regression](https://wiki.g15e.com/pages/Linear%20regression.txt) is referred to as the [log odds](https://wiki.g15e.com/pages/Log%20odds.txt) because, if you solve the sigmoid function for $x$, then $x$ is defined as the log of the ratio of the probabilities of the two possible outcomes, $y$ and $1 - y$:

$$
x = \log\left(\frac{y}{1 - y}\right)
$$
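A minimal numpy sketch of this round trip (the values in `z` stand in for hypothetical linear-regression predictions; they are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    """Standard logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical linear-regression outputs, i.e. log odds z = b + w1*x1 + w2*x2 + ...
z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])

# Passing the linear prediction through the sigmoid yields a probability.
p = sigmoid(z)
print(p.round(3))  # [0.018 0.269 0.5   0.731 0.982]

# Solving the sigmoid for z recovers the log odds: z = log(p / (1 - p)).
print(np.allclose(np.log(p / (1 - p)), z))  # True
```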
### Key terms

- [Binary classification](https://wiki.g15e.com/pages/Binary%20classification.txt)
- [Log odds](https://wiki.g15e.com/pages/Log%20odds.txt)
- [Logistic regression](https://wiki.g15e.com/pages/Logistic%20regression.txt)
- [Sigmoid function](https://wiki.g15e.com/pages/Sigmoid%20function.txt)

## Loss and regularization

Logistic regression models are [trained](https://wiki.g15e.com/pages/Training%20(machine%20learning.txt)) using the same process as [linear regression](https://wiki.g15e.com/pages/Linear%20regression.txt) models, with two key distinctions:

- Logistic regression models use [log loss](https://wiki.g15e.com/pages/Log%20loss.txt) as the [loss function](https://wiki.g15e.com/pages/Loss%20function.txt) instead of [squared loss](https://wiki.g15e.com/pages/L2%20loss.txt).
- Applying [regularization](https://wiki.g15e.com/pages/Regularization%20(machine%20learning.txt)) is critical to prevent [overfitting](https://wiki.g15e.com/pages/Overfitting.txt).

### Log loss

[Squared loss](https://wiki.g15e.com/pages/L2%20loss.txt) works well for [linear regression](https://wiki.g15e.com/pages/Linear%20regression.txt), where the rate of change of the output values is constant. However, the rate of change of a logistic regression model is not constant. If you used squared loss to calculate errors for the sigmoid function, as the output got closer and closer to 0 and 1, you would need more memory to preserve the precision needed to track these values. Instead, the loss function for logistic regression is [log loss](https://wiki.g15e.com/pages/Log%20loss.txt). The log loss equation returns the logarithm of the magnitude of the change, rather than just the distance from data to prediction. For a dataset $D$ of labeled examples, where $y$ is the label and $y'$ is the model's predicted probability:

$$
\text{Log Loss} = \sum_{(x,y) \in D} -y \log(y') - (1 - y) \log(1 - y')
$$

A runnable sketch of this computation appears at the end of this page.

### Regularization in logistic regression

[Regularization](https://wiki.g15e.com/pages/Regularization%20(machine%20learning.txt)), a mechanism for penalizing model complexity during [training](https://wiki.g15e.com/pages/Training%20(machine%20learning.txt)), is extremely important in logistic regression modeling. Without regularization, the asymptotic nature of logistic regression would keep driving loss towards 0 in cases where the model has a large number of features. Consequently, most logistic regression models use one of the following two strategies to decrease model complexity (the sketch at the end of this page shows the first in code):

- [L2 regularization](https://wiki.g15e.com/pages/L2%20regularization.txt)
- [Early stopping](https://wiki.g15e.com/pages/Early%20stopping.txt)

See also [ML crash course - Datasets, generalization, and overfitting](https://wiki.g15e.com/pages/ML%20crash%20course%20-%20Datasets,%20generalization,%20and%20overfitting.txt).

### Key terms

- [Gradient descent](https://wiki.g15e.com/pages/Gradient%20descent.txt)
- [Linear regression](https://wiki.g15e.com/pages/Linear%20regression.txt)
- [Log loss](https://wiki.g15e.com/pages/Log%20loss.txt)
- [Logistic regression](https://wiki.g15e.com/pages/Logistic%20regression.txt)
- [Loss function](https://wiki.g15e.com/pages/Loss%20function.txt)
- [Overfitting](https://wiki.g15e.com/pages/Overfitting.txt)
- [Regularization](https://wiki.g15e.com/pages/Regularization%20(machine%20learning.txt))
- [Squared loss](https://wiki.g15e.com/pages/L2%20loss.txt)

## What's next

- [ML crash course - Classification](https://wiki.g15e.com/pages/ML%20crash%20course%20-%20Classification.txt)
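## Example: log loss and L2 regularization in code

A minimal scikit-learn sketch tying the loss and regularization sections together; the toy dataset and hyperparameters are made up for illustration, not taken from the course. It fits an L2-regularized logistic regression model and computes log loss both through the library and directly from the formula above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# Made-up one-feature dataset: the positive label becomes likely as x grows.
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# scikit-learn applies L2 regularization by default; C is the *inverse*
# regularization strength, so a smaller C penalizes large weights harder.
model = LogisticRegression(penalty="l2", C=1.0)
model.fit(X, y)

# Predicted probability of the positive class for each example.
p = model.predict_proba(X)[:, 1]

# Log loss from the formula above, averaged over the dataset
# (scikit-learn's log_loss reports the mean rather than the sum).
manual = np.mean(-y * np.log(p) - (1 - y) * np.log(1 - p))
print(manual, log_loss(y, p))  # the two values agree
```

Note that this toy dataset is perfectly separable, so without the L2 penalty the loss could be driven arbitrarily close to 0 by growing the weights, which is exactly the failure mode the regularization section describes.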