10 min read

Introduction to Machine Learning

Overview of supervised and unsupervised learning, linear regression, and gradient descent


1. Introduction to machine learning

Supervised learning

input x → output y (the algorithm learns from being given the "right answers")

Regression

Regression predicts a number from infinitely many possible outputs.


Classification

Classification predicts categories from a small number of possible outputs.


Two or more inputs: a model can also use multiple input features together to predict the output.


Unsupervised Learning

Find something interesting in unlabeled data


Data comes only with inputs x, not with output labels y; the algorithm has to find structure in the data on its own.

Linear Regression Model

The model is a straight line: $f_{w,b}(x) = wx + b$, where $w$ and $b$ are the parameters to be learned from the training data.

Terminology

Common notation used below:

- $x$: the input variable or feature
- $y$: the output or "target" variable
- $m$: the number of training examples
- $(x^{(i)}, y^{(i)})$: the $i$-th training example
- $f$: the model; given an input $x$, it outputs a prediction $\hat{y} = f(x)$
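As a small illustration of this notation, a minimal Python sketch of the linear model $f_{w,b}(x) = wx + b$ (the function name and sample numbers are made up for the example):

```python
def predict(x, w, b):
    """Linear regression model f_{w,b}(x) = w * x + b for a single input feature x."""
    return w * x + b

# Illustrative parameters: with w = 200 and b = 100, an input of 1.2 predicts 340.0
print(predict(1.2, w=200, b=100))
```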

Cost Function


What do w, b do?

$w$ and $b$ are the model's parameters: $w$ sets the slope of the line and $b$ its intercept, so different choices of $w$ and $b$ give different lines.

Find $w, b$ such that $\hat{y}^{(i)}$ is close to $y^{(i)}$ for all $(x^{(i)}, y^{(i)})$.


Cost function

The cost function measures how well the model is doing, i.e., how far the predictions $\hat{y}^{(i)}$ are from the targets $y^{(i)}$:

$J(w,b) = \frac{1}{2m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)^2$

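A sketch of this cost function in Python/NumPy (the function name and the toy data are illustrative, not from the notes):

```python
import numpy as np

def compute_cost(x, y, w, b):
    """Squared-error cost J(w,b) = (1/(2m)) * sum over i of (f_{w,b}(x^(i)) - y^(i))^2."""
    m = x.shape[0]
    errors = (w * x + b) - y           # f_{w,b}(x^(i)) - y^(i) for every training example
    return np.sum(errors ** 2) / (2 * m)

# Toy training set: with w = 200 and b = 100 the fit is perfect, so the cost is 0.0
x_train = np.array([1.0, 2.0, 3.0])
y_train = np.array([300.0, 500.0, 700.0])
print(compute_cost(x_train, y_train, w=200, b=100))
```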

Cost function intuition

Simplified cost function: fix $b = 0$ so the cost depends only on $w$, written $J(w)$, which is easier to plot and reason about.

Visualizing the cost function

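A sketch of the visualization idea: with $b$ fixed at 0, compute the simplified cost $J(w)$ over a range of $w$ values and plot the resulting curve (matplotlib and the toy data here are assumptions made purely for the illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

x_train = np.array([1.0, 2.0, 3.0])
y_train = np.array([1.0, 2.0, 3.0])      # y = x, so J(w) should bottom out at w = 1

def cost_w(w, x, y):
    """Simplified cost J(w): the squared-error cost with b fixed at 0."""
    m = x.shape[0]
    return np.sum((w * x - y) ** 2) / (2 * m)

w_values = np.linspace(-1.0, 3.0, 100)
j_values = [cost_w(w, x_train, y_train) for w in w_values]

plt.plot(w_values, j_values)
plt.xlabel("w")
plt.ylabel("J(w)")
plt.title("Simplified cost function with b = 0")
plt.show()
```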

Gradient Descent

Have some function $J(w,b)$ (the cost for linear regression, or any function).

Want $\min_{w,b} J(w,b)$

Outline: start with some initial $w, b$ (e.g., $w = 0$, $b = 0$); keep changing $w, b$ to reduce $J(w,b)$; repeat until settling at or near a minimum.

Implementing gradient descent

Gradient descent algorithm

simultaneously update w and b (both new values are computed from the current w and b before either is assigned; see the sketch below)

$w = w - \alpha\frac{d}{dw}J(w,b)$

$b = b - \alpha\frac{d}{db}J(w,b)$

$\alpha$: learning rate

$\frac{d}{dw}J(w,b)$: derivative
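A minimal sketch of one simultaneous update step (the function name and temporary variables are illustrative; `dj_dw` and `dj_db` stand for the derivative terms above):

```python
def gradient_descent_step(w, b, dj_dw, dj_db, alpha):
    """One simultaneous update of w and b, given the derivative terms dJ/dw and dJ/db."""
    tmp_w = w - alpha * dj_dw   # both new values are computed from the current w and b...
    tmp_b = b - alpha * dj_db
    return tmp_w, tmp_b         # ...and only then assigned
```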

Gradient descent intuition


Learning rate

If $\alpha$ is too small, gradient descent may be very slow.

If $\alpha$ is too large, gradient descent may overshoot the minimum, fail to converge, or even diverge.

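A tiny numerical illustration (not from the notes): gradient descent on the simple function $J(w) = w^2$, whose derivative is $2w$, with a small versus an overly large learning rate:

```python
def step(w, alpha):
    """One gradient descent step on J(w) = w**2, whose derivative is 2 * w."""
    return w - alpha * 2 * w

for alpha in (0.1, 1.5):              # a small rate vs. a rate that is too large
    w = 1.0
    history = []
    for _ in range(5):
        w = step(w, alpha)
        history.append(round(w, 3))
    print(f"alpha={alpha}: {history}")

# alpha=0.1 moves w steadily toward the minimum at 0;
# alpha=1.5 overshoots on every step, so w oscillates and grows (diverges).
```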

Gradient descent for linear regression

Linear regression model

$f_{w,b}(x) = wx + b$

Cost function

$J(w,b) = \frac{1}{2m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)^2$

Gradient descent algorithm

repeat until convergence

$w = w - \alpha\frac{d}{dw}J(w,b)$

$b = b - \alpha\frac{d}{db}J(w,b)$

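Substituting the derivatives worked out in the Mathematics section below, the updates repeated until convergence become:

$w = w - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)x^{(i)}$

$b = b - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)$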

Mathematics

Derivative with respect to $w$:

1. $\frac{d}{dw}J(w,b)$
2. $= \frac{d}{dw}\frac{1}{2m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)^2$
3. $= \frac{d}{dw}\frac{1}{2m}\sum_{i=1}^{m}\left(wx^{(i)} + b - y^{(i)}\right)^2$
4. $= \frac{1}{2m}\sum_{i=1}^{m} 2\left(wx^{(i)} + b - y^{(i)}\right)x^{(i)} = \frac{1}{m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)x^{(i)}$

The factor of 2 from differentiating the square cancels the 2 in $\frac{1}{2m}$, which is why the cost is defined with $\frac{1}{2m}$ rather than $\frac{1}{m}$.

Derivative with respect to $b$:

1. $\frac{d}{db}J(w,b)$
2. $= \frac{d}{db}\frac{1}{2m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)^2$
3. $= \frac{d}{db}\frac{1}{2m}\sum_{i=1}^{m}\left(wx^{(i)} + b - y^{(i)}\right)^2$
4. $= \frac{1}{2m}\sum_{i=1}^{m} 2\left(wx^{(i)} + b - y^{(i)}\right) = \frac{1}{m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)$
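A sketch of these two gradient formulas in Python/NumPy (the function name is illustrative):

```python
import numpy as np

def compute_gradient(x, y, w, b):
    """Derivatives of the squared-error cost: dJ/dw and dJ/db, per the formulas above."""
    m = x.shape[0]
    errors = (w * x + b) - y                 # f_{w,b}(x^(i)) - y^(i)
    dj_dw = np.sum(errors * x) / m           # (1/m) * sum of error * x^(i)
    dj_db = np.sum(errors) / m               # (1/m) * sum of error
    return dj_dw, dj_db
```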

Running gradient descent

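Putting the pieces together, a self-contained sketch of running gradient descent on a toy data set (the learning rate, iteration count, and data are illustrative choices, not values from the notes):

```python
import numpy as np

def run_gradient_descent(x, y, alpha, num_iters):
    """Fit w and b by repeatedly applying the simultaneous gradient descent update."""
    w, b = 0.0, 0.0
    m = x.shape[0]
    for _ in range(num_iters):
        errors = (w * x + b) - y
        dj_dw = np.sum(errors * x) / m
        dj_db = np.sum(errors) / m
        w, b = w - alpha * dj_dw, b - alpha * dj_db   # simultaneous update
    return w, b

# Toy data generated from y = 2x + 1, so gradient descent should recover w ≈ 2, b ≈ 1
x_train = np.array([1.0, 2.0, 3.0, 4.0])
y_train = np.array([3.0, 5.0, 7.0, 9.0])
w, b = run_gradient_descent(x_train, y_train, alpha=0.05, num_iters=5000)
print(w, b)
```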


Source

Machine Learning