· 10 min read

Regression with Multiple Input Variables

Deep dive into multiple linear regression, vectorization, gradient descent, feature scaling, and polynomial regression.

2. Regression with multiple input variables

Multiple Features


Model

$$f_{w,b}(x) = w_1x_1 + w_2x_2 + \dots + w_nx_n + b$$

$$\overrightarrow{w} = [w_1 \;\; w_2 \;\; w_3 \;\cdots\; w_n]$$

$b$ is a number

$$\overrightarrow{x} = [x_1 \;\; x_2 \;\; x_3 \;\cdots\; x_n]$$

Simplified - multiple linear regression

$$f_{\overrightarrow{w},b}(\overrightarrow{x}) = w_1x_1 + w_2x_2 + \dots + w_nx_n + b = \overrightarrow{w}\cdot\overrightarrow{x} + b$$

Vectorization

Parameters and features

$$\overrightarrow{w} = [w_1 \;\; w_2 \;\; w_3] \quad (n = 3)$$

$b$ is a number

$$\overrightarrow{x} = [x_1 \;\; x_2 \;\; x_3]$$

```python
import numpy as np

# note: NumPy arrays are zero-indexed, so w[0] stores w1
w = np.array([1, 2, 3])     # parameters w1, w2, w3
b = 4                       # bias
x = np.array([10, 20, 30])  # features x1, x2, x3
```

Without vectorization

$$f_{\overrightarrow{w},b}(\overrightarrow{x}) = w_1x_1 + w_2x_2 + \dots + w_nx_n + b$$

```python
f = w[0]*x[0] + w[1]*x[1] + w[2]*x[2] + b
```

$$f_{\overrightarrow{w},b}(\overrightarrow{x}) = \sum_{j=1}^{n} w_jx_j + b$$

```python
# sum the terms one at a time in a loop
f = 0
n = w.shape[0]
for j in range(n):
    f = f + w[j] * x[j]
f = f + b
```

With vectorization

$$f_{\overrightarrow{w},b}(\overrightarrow{x}) = \overrightarrow{w}\cdot\overrightarrow{x} + b$$

```python
f = np.dot(w, x) + b
```

Vectorization computes all of the products in parallel using optimized hardware routines, instead of one term at a time, which is why `np.dot` is much faster than the explicit loop.
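
A quick way to see the difference (a minimal sketch; the vector length and the timing code are illustrative, not from the original notes):

```python
import time
import numpy as np

n = 1_000_000
rng = np.random.default_rng(0)
w = rng.random(n)
x = rng.random(n)
b = 4.0

# explicit loop: one multiply-add at a time
start = time.time()
f_loop = 0.0
for j in range(n):
    f_loop += w[j] * x[j]
f_loop += b
t_loop = time.time() - start

# vectorized dot product: all terms computed with optimized parallel routines
start = time.time()
f_vec = np.dot(w, x) + b
t_vec = time.time() - start

print(f"loop: {t_loop:.3f}s, np.dot: {t_vec:.5f}s (same result: {np.isclose(f_loop, f_vec)})")
```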

Gradient descent

$$\overrightarrow{w} = (w_1 \;\; w_2 \;\cdots\; w_{16})$$

$$\overrightarrow{d} = (d_1 \;\; d_2 \;\cdots\; d_{16})$$

```python
w = np.array([0.5, 1.3, ... , 3.4])  # 16 parameters
d = np.array([0.3, 0.2, ... , 0.4])  # 16 derivative terms
```

Compute $w_j = w_j - 0.1\,d_j$ for $j = 1 \dots 16$

Without vectorization

$$w_1 = w_1 - 0.1\,d_1$$

$$\vdots$$

$$w_{16} = w_{16} - 0.1\,d_{16}$$

```python
# update each of the 16 parameters one at a time
for j in range(16):
    w[j] = w[j] - 0.1 * d[j]
```

With vectorization

$$\overrightarrow{w} = \overrightarrow{w} - 0.1\,\overrightarrow{d}$$

```python
w = w - 0.1 * d
```

Gradient descent for multiple regression

Previous notation

Parameters

$w_1, \dots, w_n$

$b$

Model

$$f_{\overrightarrow{w},b}(\overrightarrow{x}) = w_1x_1 + \dots + w_nx_n + b$$

Cost function

$J(w_1, \dots, w_n, b)$

Gradient descent

repeat {

$$w_j = w_j - \alpha \frac{\partial}{\partial w_j} J(w_1, \dots, w_n, b)$$

$$b = b - \alpha \frac{\partial}{\partial b} J(w_1, \dots, w_n, b)$$

}

Vector notation

Parameters

$$\overrightarrow{w} = [w_1 \;\cdots\; w_n]$$

$b$

Model

$$f_{\overrightarrow{w},b}(\overrightarrow{x}) = \overrightarrow{w}\cdot\overrightarrow{x} + b$$

Cost function

$J(\overrightarrow{w}, b)$

Gradient descent

repeat {

$$w_j = w_j - \alpha \frac{\partial}{\partial w_j} J(\overrightarrow{w}, b)$$

$$b = b - \alpha \frac{\partial}{\partial b} J(\overrightarrow{w}, b)$$

}

Gradient Descent

One feature

repeat {

$$w = w - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)x^{(i)}$$

$$b = b - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)$$

simultaneously update w, b

}

n features (n ≥ 2)

repeat {

$$j = 1: \quad w_1 = w_1 - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(f_{\overrightarrow{w},b}(\overrightarrow{x}^{(i)}) - y^{(i)}\right)x_1^{(i)}$$

$$\vdots$$

$$j = n: \quad w_n = w_n - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(f_{\overrightarrow{w},b}(\overrightarrow{x}^{(i)}) - y^{(i)}\right)x_n^{(i)}$$

$$b = b - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(f_{\overrightarrow{w},b}(\overrightarrow{x}^{(i)}) - y^{(i)}\right)$$

simultaneously update $w_j$ (for $j = 1, \dots, n$) and $b$

}
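
Putting the update rules above into code, here is a minimal sketch (the function names, the tiny dataset, and the hyperparameters are my own illustration, not from the original notes):

```python
import numpy as np

def compute_gradient(X, y, w, b):
    """Gradients of the squared-error cost for multiple linear regression."""
    m = X.shape[0]
    err = X @ w + b - y        # f_wb(x^(i)) - y^(i) for every example i
    dj_dw = X.T @ err / m      # one partial derivative per feature w_j
    dj_db = np.sum(err) / m    # partial derivative for b
    return dj_dw, dj_db

def gradient_descent(X, y, w, b, alpha, num_iters):
    """Simultaneously update w and b for a fixed number of iterations."""
    for _ in range(num_iters):
        dj_dw, dj_db = compute_gradient(X, y, w, b)
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
    return w, b

# tiny made-up dataset: 3 houses, 4 features (size, bedrooms, floors, age)
X_train = np.array([[2104, 5, 1, 45],
                    [1416, 3, 2, 40],
                    [ 852, 2, 1, 35]], dtype=float)
y_train = np.array([460, 232, 178], dtype=float)

# alpha is tiny here because the features are not yet scaled (see feature scaling below)
w_out, b_out = gradient_descent(X_train, y_train, np.zeros(4), 0.0, 5.0e-7, 1000)
```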

An alternative to gradient descent

Normal equation: solves for $\overrightarrow{w}$ and $b$ analytically in one step, without iterations. It only applies to linear regression, does not generalize to other learning algorithms, and becomes slow when the number of features is large, but some machine learning libraries use it behind the scenes to solve for the linear regression parameters.
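
As a sketch of the idea (my own illustration, with the bias folded in as an extra column of ones):

```python
import numpy as np

def normal_equation(X, y):
    """Solve for b and w directly with least squares -- no learning rate, no iterations."""
    m = X.shape[0]
    Xb = np.hstack([np.ones((m, 1)), X])          # extra column of ones absorbs the bias b
    theta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return theta[0], theta[1:]                    # b, w
```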

Feature scaling

Feature scaling makes gradient descent run much faster by rescaling each feature so that all of the features take on a comparable range of values.

Mean normalization

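A minimal sketch of the standard formula (assuming `X` is an (m, n) NumPy feature matrix): each feature is shifted by its mean $\mu_j$ and divided by its range, $x_j := \frac{x_j - \mu_j}{\max_j - \min_j}$, so the rescaled values fall roughly in $[-1, 1]$:

```python
import numpy as np

def mean_normalize(X):
    """Rescale each column (feature) to roughly [-1, 1]."""
    mu = X.mean(axis=0)                            # per-feature mean
    feature_range = X.max(axis=0) - X.min(axis=0)  # per-feature max - min
    return (X - mu) / feature_range
```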

Z-score normalization

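Similarly, a sketch of z-score normalization (again assuming an (m, n) feature matrix `X`): subtract the per-feature mean $\mu_j$ and divide by the standard deviation $\sigma_j$:

```python
import numpy as np

def zscore_normalize(X):
    """Rescale each column to zero mean and unit standard deviation."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    X_norm = (X - mu) / sigma
    return X_norm, mu, sigma   # keep mu and sigma to normalize future inputs the same way
```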

Checking gradient descent for convergence

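With a well-chosen $\alpha$, the cost $J(\overrightarrow{w}, b)$ should decrease after every iteration. A common check (a sketch; `cost_history` is assumed to be a list of $J$ values recorded once per iteration) is to plot the learning curve and stop once the decrease per iteration drops below a small threshold $\varepsilon$:

```python
import matplotlib.pyplot as plt

def has_converged(cost_history, epsilon=1e-3):
    """True once the cost decreased by less than epsilon in the last iteration."""
    return len(cost_history) >= 2 and (cost_history[-2] - cost_history[-1]) < epsilon

def plot_learning_curve(cost_history):
    """Learning curve: J should go down steadily; if it goes up, alpha is too large."""
    plt.plot(cost_history)
    plt.xlabel("iteration")
    plt.ylabel("cost J")
    plt.show()
```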

Choosing the learning rate

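If $\alpha$ is too small, gradient descent converges very slowly; if it is too large, $J$ can oscillate or increase. A common recipe (a sketch that reuses the `gradient_descent` function sketched earlier and assumes scaled features `X_norm` and targets `y`): try values roughly 3x apart and keep the largest $\alpha$ for which $J$ still decreases steadily:

```python
import numpy as np

# candidate learning rates, each roughly 3x larger than the previous one
candidate_alphas = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]

for alpha in candidate_alphas:
    # run a short number of iterations and inspect the learning curve for each alpha;
    # if J ever increases, this alpha is too large
    w, b = gradient_descent(X_norm, y, np.zeros(X_norm.shape[1]), 0.0, alpha, 200)
```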

Feature engineering

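Feature engineering means using intuition or domain knowledge to design new features, usually by transforming or combining the original ones. As a sketch (a hypothetical housing example with made-up numbers): rather than feeding lot frontage and depth into the model separately, their product, the lot area, is often a more predictive feature:

```python
import numpy as np

frontage = np.array([30.0, 40.0, 25.0])   # hypothetical lot widths
depth    = np.array([60.0, 80.0, 50.0])   # hypothetical lot depths

area = frontage * depth                   # engineered feature x3 = x1 * x2

# design matrix with the original features plus the engineered one
X = np.column_stack([frontage, depth, area])
```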

Polynomial regression

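Polynomial regression fits curves by feeding powers of a feature ($x$, $x^2$, $x^3$, ...) into the same linear model, e.g. $f_{\overrightarrow{w},b}(x) = w_1x + w_2x^2 + w_3x^3 + b$. A minimal sketch (my own example data; the powers span very different ranges, so feature scaling matters even more here):

```python
import numpy as np

x = np.arange(0, 20, 1.0)               # single raw feature
y = 1 + x ** 2                          # made-up target with a quadratic shape

# engineered polynomial features: x, x^2, x^3
X = np.column_stack([x, x ** 2, x ** 3])

# z-score scale each column -- x^3 has a far larger range than x
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

# from here, fit exactly as in multiple linear regression, e.g. with the
# gradient_descent sketch above: gradient_descent(X_norm, y, np.zeros(3), 0.0, 0.1, 10000)
```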