# ML crash course - Classification

> The Classification model chapter of the Machine learning crash course.

The [Classification model](https://wiki.g15e.com/pages/Classification%20model.txt) chapter of the [Machine learning crash course](https://wiki.g15e.com/pages/Machine%20learning%20crash%20course.txt).

https://developers.google.com/machine-learning/crash-course/classification

## Introduction

Learning objectives:

- Determine an appropriate [threshold](https://wiki.g15e.com/pages/Classification%20threshold.txt) for a [binary classification](https://wiki.g15e.com/pages/Binary%20classification.txt) model.
- Calculate and choose appropriate metrics to evaluate a binary classification model.
- Interpret [ROC](https://wiki.g15e.com/pages/ROC%20curve.txt) and [AUC](https://wiki.g15e.com/pages/Area%20under%20the%20ROC%20curve.txt).

Prerequisites:

- [Introduction to Machine Learning](https://wiki.g15e.com/pages/Introduction%20to%20Machine%20Learning.txt)
- [ML crash course - Linear regression](https://wiki.g15e.com/pages/ML%20crash%20course%20-%20Linear%20regression.txt)
- [ML crash course - Logistic regression](https://wiki.g15e.com/pages/ML%20crash%20course%20-%20Logistic%20regression.txt)

[Classification](https://wiki.g15e.com/pages/Classification%20model.txt) is the task of [predicting](https://wiki.g15e.com/pages/Prediction%20(machine%20learning.txt)) which of a set of [classes](https://wiki.g15e.com/pages/Class%20(machine%20learning.txt)) (categories) an [example](https://wiki.g15e.com/pages/Example%20(machine%20learning.txt)) belongs to. You can convert a [logistic regression](https://wiki.g15e.com/pages/Logistic%20regression.txt) model that predicts a probability into a [binary classification](https://wiki.g15e.com/pages/Binary%20classification.txt) model that predicts one of two classes.
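
Below is a minimal sketch of that conversion. It assumes the model already outputs a raw score (log-odds) `z`. The sigmoid function maps `z` to a probability, and a threshold turns that probability into one of two classes. The function names and the 0.5 default are illustrative. Sending a probability exactly equal to the threshold to the positive class (`>=`) is a convention chosen here.

```python
import math

def sigmoid(z: float) -> float:
    """Map a raw model score (log-odds) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(z: float, threshold: float = 0.5) -> int:
    """Return 1 (positive class) if the predicted probability is >= threshold, else 0."""
    return 1 if sigmoid(z) >= threshold else 0

scores = [-2.0, -0.1, 0.0, 0.3, 3.0]      # hypothetical raw scores
probs = [sigmoid(z) for z in scores]       # probabilities from the logistic model
labels = [classify(z) for z in scores]     # binary classes after thresholding
print([round(p, 3) for p in probs])
print(labels)  # [0, 0, 1, 1, 1]
```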

### Key terms

- [Binary classification](https://wiki.g15e.com/pages/Binary%20classification.txt)
- [Class](https://wiki.g15e.com/pages/Class%20(machine%20learning.txt))
- [Classification](https://wiki.g15e.com/pages/Classification%20model.txt)
- Multi-class classification
- [Sigmoid function](https://wiki.g15e.com/pages/Sigmoid%20function.txt)

## Thresholds and the confusion matrix

[Classification threshold](https://wiki.g15e.com/pages/Classification%20threshold.txt):

- While 0.5 might seem like an intuitive threshold, it is a poor choice when one type of misclassification costs more than the other, or when the classes are imbalanced.
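
One way to handle unequal costs is sketched below. It sweeps candidate thresholds over held-out labels and predicted probabilities and keeps the threshold with the lowest total misclassification cost. The per-error costs `cost_fp` and `cost_fn`, the helper names, and the data are all hypothetical.

```python
def total_cost(labels, probs, threshold, cost_fp, cost_fn):
    """Cost of classifying with `threshold`: each FP costs cost_fp, each FN costs cost_fn."""
    cost = 0.0
    for y, p in zip(labels, probs):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 0:
            cost += cost_fp  # false positive
        elif pred == 0 and y == 1:
            cost += cost_fn  # false negative
    return cost

def best_threshold(labels, probs, cost_fp, cost_fn):
    """Try each observed probability (plus 1.01, which predicts all-negative) as a threshold."""
    candidates = sorted(set(probs)) + [1.01]
    return min(candidates, key=lambda t: total_cost(labels, probs, t, cost_fp, cost_fn))

# Made-up held-out labels (1 = positive) and predicted probabilities.
labels = [0, 0, 0, 0, 1, 0, 1, 1]
probs = [0.05, 0.2, 0.35, 0.45, 0.55, 0.6, 0.7, 0.9]

t_fn_costly = best_threshold(labels, probs, cost_fp=1, cost_fn=5)  # missed positives are expensive
t_fp_costly = best_threshold(labels, probs, cost_fp=5, cost_fn=1)  # false alarms are expensive
print(t_fn_costly, t_fp_costly)  # 0.55 0.7
```

With this data, making false negatives expensive pulls the chosen threshold down to 0.55. Making false positives expensive pushes it up to 0.7.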

### Confusion matrix

The probability score is not reality, or [ground truth](https://wiki.g15e.com/pages/Ground%20truth%20(machine%20learning.txt)). When compared against the ground truth, each output of a binary classifier falls into one of four possible outcomes.

- [True positive](https://wiki.g15e.com/pages/True%20positive.txt) (TP): A spam email correctly classified as a spam email. These are the spam messages automatically sent to the spam folder.
- [False positive](https://wiki.g15e.com/pages/False%20positive.txt) (FP): A not-spam email misclassified as spam. These are the legitimate emails that wind up in the spam folder.
- [False negative](https://wiki.g15e.com/pages/False%20negative.txt) (FN): A spam email misclassified as not-spam. These are spam emails that aren't caught by the spam filter and make their way into the inbox.
- [True negative](https://wiki.g15e.com/pages/True%20negative.txt) (TN): A not-spam email correctly classified as not-spam. These are the legitimate emails that are sent directly to the inbox.
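
Below is a minimal sketch of tallying the four outcomes. It assumes ground-truth labels and predictions are encoded as 1 (spam, positive) and 0 (not spam, negative). The `confusion_counts` helper and the data are illustrative.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for 0/1 labels where 1 = positive (spam) and 0 = negative."""
    names = {(1, 1): "TP", (0, 1): "FP", (1, 0): "FN", (0, 0): "TN"}
    counts = Counter(names[(t, p)] for t, p in zip(y_true, y_pred))
    return {k: counts.get(k, 0) for k in ("TP", "FP", "FN", "TN")}

y_true = [1, 1, 0, 0, 1, 0, 0, 0]  # actual: spam or not
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]  # classifier output after thresholding
result = confusion_counts(y_true, y_pred)
print(result)  # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 4}
```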

When the total number of actual positives is far from the total number of actual negatives, the dataset is [imbalanced](https://wiki.g15e.com/pages/Class-imbalanced%20dataset.txt).

### Effect of threshold on true and false positives and negatives

When the [classification threshold](https://wiki.g15e.com/pages/Classification%20threshold.txt) increases:

- both true and false positives decrease (or stay the same), and
- both true and false negatives increase (or stay the same).
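
To see this effect, the sketch below sweeps a few thresholds over a small made-up set of labels and predicted probabilities. It recomputes the four counts at each threshold. The helper name is hypothetical.

```python
def counts_at(y_true, probs, threshold):
    """Return (TP, FP, FN, TN) when predicting positive for probabilities >= threshold."""
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, probs):
        pred = p >= threshold
        if pred and y == 1:
            tp += 1
        elif pred and y == 0:
            fp += 1
        elif not pred and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

y_true = [0, 0, 1, 0, 1, 1, 0, 1]
probs = [0.1, 0.3, 0.35, 0.5, 0.6, 0.8, 0.85, 0.95]
rows = [(t, *counts_at(y_true, probs, t)) for t in (0.2, 0.4, 0.6, 0.9)]
for t, tp, fp, fn, tn in rows:
    print(f"threshold={t:.1f}  TP={tp} FP={fp} FN={fn} TN={tn}")
```

TP and FP never go up as the threshold rises, and FN and TN never go down. Between 0.4 and 0.6, TP and FN stay the same because no positive example falls in that range.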

## Accuracy, recall, precision, and related metrics

Which evaluation metrics are most meaningful depends on the specific model and the specific task, the cost of different misclassifications, and whether the dataset is balanced or imbalanced.

- [Accuracy](https://wiki.g15e.com/pages/Accuracy%20(machine%20learning.txt))
- [Recall](https://wiki.g15e.com/pages/Recall%20(machine%20learning.txt))
- [False positive rate](https://wiki.g15e.com/pages/False%20positive%20rate.txt)
- [Precision](https://wiki.g15e.com/pages/Precision%20(machine%20learning.txt))
- [F1 score](https://wiki.g15e.com/pages/F1%20score.txt)
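
The sketch below shows the formulas behind these metrics, computed from the four confusion-matrix counts. Returning 0.0 when a denominator is zero is a convention chosen here, not a universal rule. The counts are made up to mimic an imbalanced dataset.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary metrics from confusion-matrix counts."""
    def ratio(num, den):
        return num / den if den else 0.0  # assumed convention for undefined cases

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called the true positive rate
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "recall": recall,
        "false_positive_rate": ratio(fp, fp + tn),
        "precision": precision,
        "f1": ratio(2 * precision * recall, precision + recall),  # harmonic mean of P and R
    }

# Hypothetical imbalanced example: 10 actual positives, 990 actual negatives.
m = binary_metrics(tp=6, fp=20, fn=4, tn=970)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

On this imbalanced example, accuracy is 0.976 while precision is only about 0.231 and F1 about 0.333. This is why accuracy alone can be misleading when the classes are imbalanced.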

## ROC and AUC

If you want to evaluate a model's quality across all possible thresholds, you need the [ROC curve](https://wiki.g15e.com/pages/ROC%20curve.txt) and [AUC](https://wiki.g15e.com/pages/Area%20under%20the%20ROC%20curve.txt).
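
Below is a minimal sketch of both ideas. It assumes binary labels and model scores (e.g. predicted probabilities), with both classes present. The ROC curve plots the true positive rate (recall) against the false positive rate at every threshold. AUC is the area under that curve, computed here with the trapezoidal rule. As a cross-check, AUC also equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as half. Function names and data are illustrative.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs, from 'predict nothing positive' down to 'predict everything positive'.
    Assumes at least one positive and one negative label."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):  # each distinct score used as a threshold
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc_trapezoid(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

def auc_pairwise(y_true, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher; ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.5, 0.6, 0.8, 0.85, 0.95]
pts = roc_points(y_true, scores)
auc = auc_trapezoid(pts)
print(pts)
print(auc, auc_pairwise(y_true, scores))  # 0.75 0.75
```

Both AUC computations give 0.75 for this data. A perfect ranking would give 1.0, and random scores would give about 0.5.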

### Receiver-operating characteristic curve

### Area under the curve

### AUC and ROC for choosing model and threshold

## Prediction bias

## Multi-class classification

## Programming exercise

https://developers.google.com/machine-learning/crash-course/classification/programming-exercise

## What's next

- [ML crash course - Numerical data](https://wiki.g15e.com/pages/ML%20crash%20course%20-%20Numerical%20data.txt)