
Introduction to Machine Learning
Overview of supervised and unsupervised learning, linear regression, and gradient descent
1. Introduction to machine learning
Supervised learning
input x → output y (learns from being given the "right answers")
- Examples
- Spam email filtering
- Audio → text transcripts (speech recognition)
- Language translations
- Online advertisement
- Self-driving car
- Visual inspection
Regression
Regression predicts a number from infinitely many possible outputs
Classification
Classification predicts a category from a small number of possible outputs
A model can also take two or more inputs (features)
Unsupervised Learning
Find something interesting in unlabeled data
- Examples
- Google News
- DNA microarray
Unsupervised learning
Data comes only with inputs x, not output labels y; the algorithm has to find structure in the data
- clustering - group similar data points together
- Anomaly detection - find unusual data points
- Dimensionality reduction - compress data using fewer numbers
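The clustering idea above can be sketched with a tiny k-means loop on made-up 2-D data (the dataset, initial centroids, and cluster count here are all illustrative assumptions, not from the lecture):

```python
import numpy as np

# Minimal k-means sketch: group similar points together without any labels y.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=[0, 0], scale=0.1, size=(20, 2)),  # cluster near (0, 0)
    rng.normal(loc=[5, 5], scale=0.1, size=(20, 2)),  # cluster near (5, 5)
])

centroids = np.array([[1.0, 1.0], [4.0, 4.0]])  # initial guesses (assumed)
for _ in range(10):
    # assign each point to its nearest centroid
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # move each centroid to the mean of its assigned points
    centroids = np.array([points[labels == k].mean(axis=0) for k in range(2)])

print(centroids)  # centroids end up near (0, 0) and (5, 5)
```

No labels were used anywhere; the structure (two groups) was found from the inputs alone.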
Linear Regression Model
Terminology
- Training set - data used to train the model
- x - input variable, or "feature"
- y - output variable, target variable
- m - number of training examples
- (x,y) - single training example
- (x^(i), y^(i)) - ith training example
Cost Function
What do w, b do? In the model f_{w,b}(x) = wx + b, w sets the slope and b the y-intercept
Find w, b such that the prediction ŷ^(i) = f_{w,b}(x^(i)) is close to the target y^(i) for all training examples (x^(i), y^(i))
Cost function
The cost function measures how well the model is doing on the training data
- error: ŷ^(i) − y^(i) (prediction minus actual value)
- different people use different cost functions
- the squared error cost function is the most commonly used one for regression:
- J(w, b) = (1 / 2m) * Σ_{i=1..m} (ŷ^(i) − y^(i))², where ŷ^(i) = f_{w,b}(x^(i))
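The squared error cost can be computed directly from its definition; a minimal sketch with made-up data (the helper name `compute_cost` and the toy dataset are assumptions):

```python
def compute_cost(x, y, w, b):
    """Squared error cost J(w, b) = (1 / 2m) * sum((w*x_i + b - y_i)**2)."""
    m = len(x)
    total = 0.0
    for xi, yi in zip(x, y):
        prediction = w * xi + b          # f_wb(x_i) = w*x_i + b
        total += (prediction - yi) ** 2  # squared error for example i
    return total / (2 * m)

# Toy data generated by y = 2x exactly, so w=2, b=0 gives zero cost.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
print(compute_cost(x, y, 2.0, 0.0))  # 0.0
print(compute_cost(x, y, 1.0, 0.0))  # > 0: worse fit, higher cost
```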
Cost function intuition
- model: f_{w,b}(x) = wx + b
- parameters: w, b
- cost function: J(w, b) = (1 / 2m) * Σ_{i=1..m} (f_{w,b}(x^(i)) − y^(i))²
- goal: minimize J(w, b) over w, b
Simplified cost function
- model: f_w(x) = wx (set b = 0)
- parameter: w
- cost function: J(w) = (1 / 2m) * Σ_{i=1..m} (f_w(x^(i)) − y^(i))²
- goal: minimize J(w) over w
- examples: for each choice of w, plot f_w(x) against the data and the resulting point (w, J(w))
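The simplified cost can be evaluated at a few values of w to see where it bottoms out; a sketch with an assumed toy dataset generated by y = x, so the minimum should sit at w = 1:

```python
def cost_J(x, y, w):
    """Simplified cost J(w) = (1 / 2m) * sum((w*x_i - y_i)**2), with b fixed at 0."""
    m = len(x)
    return sum((w * xi - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)

x = [1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0]  # generated by y = 1*x, so the minimum is at w = 1

# sweep a few candidate values of w and see which gives the lowest cost
costs = {w: cost_J(x, y, w) for w in [0.0, 0.5, 1.0, 1.5, 2.0]}
best_w = min(costs, key=costs.get)
print(best_w)  # 1.0, where J(w) = 0
```

Plotting `costs` would give the familiar bowl-shaped curve of J(w) against w.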
Visualizing the cost function
- examples: 3D surface plots and contour plots of J(w, b) over the (w, b) plane
Gradient Descent
Have some function J(w, b) (for linear regression or any function)
Want: min over w, b of J(w, b)
Outline:
- Start with some w, b (set w=0,b=0)
- keep changing w,b to reduce J(w,b)
- until we settle at or near a minimum
- J may have more than one minimum (for non-convex cost functions)
Implementing gradient descent
Gradient descent algorithm
Repeat until convergence:
- tmp_w = w − α * ∂J(w, b)/∂w
- tmp_b = b − α * ∂J(w, b)/∂b
- w = tmp_w, b = tmp_b (simultaneously update w and b)
- α: learning rate
- ∂J(w, b)/∂w, ∂J(w, b)/∂b: derivative (partial derivative) terms
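The simultaneous update can be sketched on a toy convex function J(w, b) = w² + b² (an assumption for illustration, not the linear regression cost; chosen because its partial derivatives 2w and 2b are easy to check by hand):

```python
# Gradient descent on J(w, b) = w**2 + b**2, whose minimum is at (0, 0).
# The key detail: compute BOTH temporary values before assigning either
# parameter, so the b-update does not see the already-updated w.
alpha = 0.1        # learning rate (assumed)
w, b = 3.0, -2.0   # arbitrary starting point
for _ in range(200):
    dJ_dw = 2 * w  # partial derivative of J with respect to w
    dJ_db = 2 * b  # partial derivative of J with respect to b
    tmp_w = w - alpha * dJ_dw
    tmp_b = b - alpha * dJ_db
    w, b = tmp_w, tmp_b  # simultaneous update
print(round(w, 4), round(b, 4))  # both approach 0
```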
Gradient descent intuition
Learning rate
if α is too small, gradient descent may work but be very slow
if α is too large, gradient descent may:
- overshoot and never reach the minimum
- fail to converge, or even diverge
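The effect of the learning rate can be seen on the one-parameter cost J(w) = w² (a toy choice, not the lecture's cost), where the update is w = w − α·2w:

```python
def run(alpha, steps=20, w=1.0):
    """Run gradient descent on J(w) = w**2 (derivative 2w, minimum at w = 0)."""
    for _ in range(steps):
        w = w - alpha * 2 * w
    return w

small = run(0.01)  # too small: still far from 0 after 20 steps
good = run(0.3)    # reasonable: converges quickly toward 0
large = run(1.1)   # too large: |w| grows every step (diverges)
print(small, good, large)
```

With α = 1.1 each step multiplies w by (1 − 2.2) = −1.2, so the iterate overshoots past the minimum and gets farther away each time.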
Gradient descent for linear regression
Linear regression model: f_{w,b}(x) = wx + b
Cost function: J(w, b) = (1 / 2m) * Σ_{i=1..m} (f_{w,b}(x^(i)) − y^(i))²
Gradient descent algorithm
repeat until convergence:
- w = w − α * (1/m) * Σ_{i=1..m} (f_{w,b}(x^(i)) − y^(i)) * x^(i)
- b = b − α * (1/m) * Σ_{i=1..m} (f_{w,b}(x^(i)) − y^(i))
- the squared error cost for linear regression is convex (bowl-shaped), so it has no local minima besides the single global minimum
- gradient descent on a convex function always converges to the global minimum (given an appropriately chosen learning rate)
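Putting the pieces together, gradient descent for linear regression can be sketched in plain Python (the function name `gradient_descent`, the learning rate, and the toy dataset are assumptions for illustration):

```python
def gradient_descent(x, y, alpha=0.1, iters=1000):
    """Fit f(x) = w*x + b by gradient descent on the squared error cost."""
    m = len(x)
    w, b = 0.0, 0.0  # start with w = 0, b = 0
    for _ in range(iters):
        # errors f(x_i) - y_i for every training example
        err = [(w * xi + b) - yi for xi, yi in zip(x, y)]
        # dJ/dw = (1/m) * sum(err_i * x_i), dJ/db = (1/m) * sum(err_i)
        dj_dw = sum(e * xi for e, xi in zip(err, x)) / m
        dj_db = sum(err) / m
        w, b = w - alpha * dj_dw, b - alpha * dj_db  # simultaneous update
    return w, b

# Toy data generated by y = 2x + 1; gradient descent should recover w ≈ 2, b ≈ 1.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
w, b = gradient_descent(x, y)
print(round(w, 3), round(b, 3))  # ≈ 2.0 and 1.0
```

Because this cost is convex, the starting point w = 0, b = 0 is not special; any start converges to the same global minimum.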
Mathematics