ML crash course - Linear regression
The Linear Regression chapter of Machine Learning Crash Course.
developers.google.com/machine-learning/crash-course/linear-regression
Introduction
Learning objectives:
- Explain a loss function and how it works.
- Define and describe how gradient descent finds the optimal model parameters.
- Describe how to tune hyperparameters to efficiently train a linear model.
Definition of linear regression:
Linear regression is a statistical technique used to find the relationship between variables. In an ML context, linear regression finds the relationship between features and a label.
Linear regression equation
In algebraic terms, the model would be defined as $y = mx + b$, where
- $y$ is the value we want to predict.
- $m$ is the slope of the line.
- $x$ is our input value.
- $b$ is the y-intercept.
In ML, we write the equation for a linear regression model as $y' = b + w_1x_1$, where
- $y'$ is the predicted label - the output.
- $b$ is the bias of the model. Bias is the same concept as the y-intercept in the algebraic equation for a line. In ML, bias is sometimes referred to as $w_0$. Bias is a parameter of the model and is calculated during training.
- $w_1$ is the weight of the feature. Weight is the same concept as the slope $m$ in the algebraic equation for a line. Weight is a parameter of the model and is calculated during training.
- $x_1$ is a feature - the input.
Models with multiple features have two or more weights, e.g. with three features: $y' = b + w_1x_1 + w_2x_2 + w_3x_3$
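As a concrete sketch of the three-feature equation above (all weights, bias, and feature values are made-up numbers, not from the course):

```python
# y' = b + w1*x1 + w2*x2 + w3*x3, with made-up learned parameters.
weights = [0.3, -1.2, 0.5]   # w1, w2, w3 (found during training)
bias = 2.0                   # b (found during training)
features = [4.0, 1.5, 3.0]   # x1, x2, x3 for one example

prediction = bias + sum(w * x for w, x in zip(weights, features))
print(prediction)  # 2.0 + 1.2 - 1.8 + 1.5 = 2.9
```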
Loss
Definition:
Loss is a numerical metric that describes how wrong a model’s predictions are. Loss measures the distance between the model’s predictions and the actual labels. The goal of training a model is to minimize the loss, reducing it to its lowest possible value.
Distance of loss:
Loss focuses on the distance between the values, not the direction. … Thus, all methods for calculating loss remove the sign.
Types of loss:
- L1 loss: The sum of the absolute values of the differences between the predicted values and the actual values.
- Mean absolute error: The average of L1 losses across a set of examples.
- L2 loss: The sum of the squared differences between the predicted values and the actual values.
- Mean squared error: The average of L2 losses across a set of examples.
When processing multiple examples at once, we recommend averaging the losses across all the examples, whether using MAE or MSE.
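A minimal sketch of both averaged losses, assuming NumPy; the predictions and labels are made-up example values:

```python
import numpy as np

# Made-up predictions and actual labels for a batch of four examples.
predictions = np.array([2.5, 0.0, 2.1, 7.8])
labels      = np.array([3.0, -0.5, 2.0, 8.0])

errors = predictions - labels

mae = np.mean(np.abs(errors))   # mean absolute error: averaged L1 loss
mse = np.mean(errors ** 2)      # mean squared error: averaged L2 loss

print(mae)  # 0.325
print(mse)  # ~0.1375
```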
Choosing a loss
When choosing the best loss function, consider how you want the model to treat outliers. For instance, MSE moves the model more toward the outliers, while MAE doesn’t.
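One way to see this difference (a sketch with made-up numbers): a single outlier changes MSE far more than MAE, so minimizing MSE pulls the model toward that point.

```python
import numpy as np

labels  = np.array([2.0, 2.0, 2.0, 2.0])
close   = np.array([2.1, 1.9, 2.1, 1.9])   # all predictions near the labels
outlier = np.array([2.1, 1.9, 2.1, 10.0])  # one prediction far off

for name, preds in [("close", close), ("outlier", outlier)]:
    err = preds - labels
    print(name, "MAE:", np.mean(np.abs(err)), "MSE:", np.mean(err ** 2))

# The single outlier raises MAE from 0.1 to ~2.08 but MSE from 0.01 to ~16.01;
# squaring amplifies large errors, so minimizing MSE chases the outlier harder.
```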
Parameters exercise
https://developers.google.com/machine-learning/crash-course/linear-regression/parameters-exercise
Gradient descent
Gradient descent is a mathematical technique that iteratively finds the weights and bias that produce the model with the lowest loss.
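A minimal sketch of the idea for single-feature linear regression with MSE loss; the data, learning rate, and step count below are made-up, not from the course:

```python
import numpy as np

# Made-up training data that roughly follows y = 2x + 1 with a little noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

w, b = 0.0, 0.0          # start from arbitrary parameters
learning_rate = 0.05     # made-up, hand-picked step size

for step in range(1000):
    error = (w * x + b) - y          # prediction minus actual label
    grad_w = 2 * np.mean(error * x)  # dMSE/dw
    grad_b = 2 * np.mean(error)      # dMSE/db
    w -= learning_rate * grad_w      # step against the gradient
    b -= learning_rate * grad_b
    if step % 200 == 0:
        print(step, np.mean(error ** 2))  # loss shrinking over time

print(w, b)  # approaches roughly w = 2, b = 1
```

The losses printed during training, plotted against the step number, are exactly the loss curve described next.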
Model convergence and loss curves
When training a model, you’ll often look at a loss curve to determine if the model has converged. The loss curve shows how the loss changes as the model trains.
Convergence and convex functions
The loss functions for linear models always produce a convex surface. As a result of this property, when a linear regression model converges, we know the model has found the weights and bias that produce the lowest loss.
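One way to see why, sketched for a single-feature model (not the course's own derivation): MSE is a quadratic function of $w$ and $b$, and its Hessian is positive semidefinite, so the loss surface is convex with a single minimum.

```latex
\mathrm{MSE}(w, b) = \frac{1}{N}\sum_{i=1}^{N}\left(w x_i + b - y_i\right)^2,
\qquad
\nabla^2 \mathrm{MSE} =
\frac{2}{N}
\begin{pmatrix}
\sum_i x_i^2 & \sum_i x_i \\
\sum_i x_i & N
\end{pmatrix},
\qquad
\det \nabla^2 \mathrm{MSE} = 4\,\mathrm{Var}(x) \ge 0 .
```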
Hyperparameters
Definition:
Hyperparameters are variables that control different aspects of training.
Common hyperparameters:
- Learning rate
- Batch size
- Epochs
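A small sketch of how batch size and epochs determine the number of iterations (weight updates) in mini-batch training; all numbers are made-up illustrative values:

```python
# Made-up illustrative values, not from the course.
num_examples = 1000
batch_size = 100
epochs = 20

iterations_per_epoch = num_examples // batch_size   # 10 weight updates per epoch
total_iterations = iterations_per_epoch * epochs    # 200 updates overall
print(iterations_per_epoch, total_iterations)
```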
Key terms
- Batch size
- Epoch
- Generalization
- Hyperparameter
- Iteration
- Learning rate
- Mini-batch
- Parameter
- Stochastic gradient descent
Gradient descent exercise
Programming exercise
https://developers.google.com/machine-learning/crash-course/linear-regression/programming-exercise