Multiple Linear Regression - Ordinary Least Squares Method Explained

In the realm of data analysis, understanding the relationships between variables is crucial. One of the most powerful tools for uncovering these relationships is regression analysis. While I previously explored the fundamentals of regression and linear regression, this blog post delves deeper into multiple linear regression. This technique allows us to analyze how multiple input features contribute to a single output variable, helping us make predictions and informed decisions based on complex data sets. I’ll explore the mathematical foundations and the least squares estimation method, ultimately providing you with a comprehensive understanding of multiple linear regression.

Link for the previous blog post.

Regression Analysis
When dealing with real-world data, you may encounter variables (features) that have relationships between them. These relationships may be linear or nonlinear. Regression analysis is a technique used to find and model such relationships between variables. For example, how do students' marks on a statistics module relate to other factors, such as the hours they spend studying?

Here we are going to discuss multiple linear regression in detail. In this setting we have a linear equation with multiple input features, while the model produces a single output.

y = ŷ + ε

Model:

ŷ = w0 + w1x1 + ... + wdxd

where:

    • w0 is the bias or intercept
    • w1 to wd are the weights or coefficients
    • ε is the error
    • ŷ is the predicted value

Suppose we have n data points, each with d features and an observed output y:
x1    x2    x3    ...   xd    y
x11   x12   x13   ...   x1d   y1
x21   x22   x23   ...   x2d   y2
x31   x32   x33   ...   x3d   y3
...   ...   ...   ...   ...   ...
xn1   xn2   xn3   ...   xnd   yn

Here xnd represents the value of the dth feature for the nth data point.

Now we can represent all the above instances in matrix form:

y = [y1 y2 ... yn]T

X =
[ 1  x11  x12  ...  x1d ]
[ 1  x21  x22  ...  x2d ]
[ ...                   ]
[ 1  xn1  xn2  ...  xnd ]

w = [w0 w1 ... wd]T

The output vector y is an n × 1 vector, X is an n × (d+1) matrix (its first column is all 1s, for the intercept), and w is a (d+1) × 1 vector.

Now the model can be represented as below.

ŷ = Xw

❇️
Note: Vectors are often represented in bold lowercase (e.g., y). Matrices are represented in bold uppercase (e.g., X).
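
As a quick illustration of this matrix form, here is a minimal NumPy sketch. The feature values and weights are random placeholders, not real data; it simply builds an n × (d+1) matrix X with a leading column of 1s and computes ŷ = Xw:

```python
import numpy as np

n, d = 5, 3                                   # 5 data points, 3 features (arbitrary sizes)
rng = np.random.default_rng(0)

features = rng.random((n, d))                 # raw feature matrix, shape (n, d)
X = np.column_stack([np.ones(n), features])   # prepend a column of 1s -> shape (n, d+1)
w = rng.random(d + 1)                         # weights [w0, w1, ..., wd], shape (d+1,)

y_hat = X @ w                                 # predicted values, shape (n,)
print(X.shape, w.shape, y_hat.shape)          # (5, 4) (4,) (5,)
```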

Sometimes, you may be confused about the column of 1s in the matrix X.

So let's take an example and clarify it first.

Suppose we have a dataset with 2 features (x1 and x2) and 3 instances.

ŷ = w0 + w1x1 + w2x2

Instance   x1   x2   y
1          2    3    10
2          4    5    20
3          6    7    30

Without Column of 1s

If we exclude the intercept term, the design matrix X and weight vector w would look like this:

X =
[ 2  3 ]
[ 4  5 ]
[ 6  7 ]

w = [w1 w2]T

So, the output vector ŷ = Xw would be:

ŷ =
[ 2w1 + 3w2 ]
[ 4w1 + 5w2 ]
[ 6w1 + 7w2 ]

🔴
Issue: There's no term to account for the intercept w0. The model is forced to pass through the origin (i.e., when x1 = 0 and x2 = 0, the prediction is 0), which may not be desirable.

With Column of 1s

To include the intercept w0, we modify X and w:

X =
[ 1  2  3 ]
[ 1  4  5 ]
[ 1  6  7 ]

w = [w0 w1 w2]T

Now, the output vector would be:

ŷ =
[ w0 + 2w1 + 3w2 ]
[ w0 + 4w1 + 5w2 ]
[ w0 + 6w1 + 7w2 ]

Now you know the reason to add a column of 1s in X.
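
To make the difference concrete, here is a small sketch using the three instances from the table above; the weight values w0 = 1.0, w1 = 2.0, w2 = 0.5 are arbitrary, chosen only for illustration:

```python
import numpy as np

features = np.array([[2.0, 3.0],
                     [4.0, 5.0],
                     [6.0, 7.0]])             # x1 and x2 for the 3 instances

w0, w1, w2 = 1.0, 2.0, 0.5                    # arbitrary example weights

# Without the column of 1s: there is no way to include the intercept w0
y_hat_no_intercept = features @ np.array([w1, w2])

# With the column of 1s: the first weight multiplies 1, acting as the intercept
X = np.column_stack([np.ones(3), features])
y_hat = X @ np.array([w0, w1, w2])

print(y_hat_no_intercept)   # [ 5.5 10.5 15.5]
print(y_hat)                # [ 6.5 11.5 16.5]  (every prediction shifted up by w0 = 1.0)
```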

Our objective in this problem is to minimize the error.

Least Squares Estimation Method

In the Least Squares Estimation method, we minimize the average of the sum of squared errors.
ε = y - ŷ

ε = y - Xw

Both y and ŷ are n × 1 vectors.

Since y and the inputs X are known, the objective function depends only on w (which includes both the bias and the weights).

So we can write the objective function as a function of w (bias w0 and weights w1 to wd) as:

J(w) = Σε² / 2n
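
As a quick sketch of this objective in code (the design matrix, outputs, and weights below are just placeholder values, and X is assumed to already include the column of 1s):

```python
import numpy as np

def cost(X, y, w):
    """Least squares objective J(w) = (sum of squared errors) / 2n."""
    eps = y - X @ w                  # residuals: ε = y - ŷ
    n = len(y)
    return np.sum(eps ** 2) / (2 * n)

# Placeholder data: 3 instances, 2 features plus the column of 1s
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 4.0, 5.0],
              [1.0, 6.0, 7.0]])
y = np.array([10.0, 20.0, 30.0])
w = np.array([0.0, 1.0, 1.0])        # arbitrary starting weights

print(cost(X, y, w))                 # 72.5
```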

As we know, for any vector y = [y1 y2 ... yn]T, the sum of squares Σy² can be written as:

Σy² = y1² + y2² + ... + yn²

Expressing y as a column vector, this sum of squares can be rewritten using matrix multiplication, where the transpose of y, yT, is multiplied by y:

yTy = y1² + y2² + ... + yn²

So we can write Σε² as εTε.

ε = y - Xw

Then the objective function can be written as:

J(w) = (y-Xw)T (y-Xw) / 2n

💡
(AB)T = BT AT

J(w) = (yT-wTXT) (y-Xw) / 2n

J(w) = ( yTy - yTXw - wTXTy + wTXT Xw ) / 2n

Since yTXw and wTXTy are the same,

J(w) = ( yTy - 2wTXTy + wTXT Xw ) / 2n

To find the minimum, we take the derivative of this function with respect to w and set it to zero.

∂J(w) / ∂w = 0

∂ [ (yTy - 2wTXTy + wTXTXw) / 2n ] / ∂w = 0

( -2XTy + 2XTXw ) / 2n = 0

XTXw = XTy

w = (XTX)-1 XTy

Now we have a vector of values for w. This is a closed-form solution, meaning we obtain the exact values directly. In real-world scenarios there are cases where closed-form methods can't be applied (for example, when XTX is not invertible or the dataset is very large); in those cases we have to use numerical optimization methods. For this multiple linear regression model, you can also follow the same steps I mentioned in the previous blog post.
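
Here is a minimal sketch of the closed-form solution w = (XTX)-1 XTy in NumPy; the data is randomly generated for illustration, and the weights are obtained by solving the normal equations rather than explicitly inverting XTX:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 3                                     # placeholder sizes: 20 instances, 3 features

features = rng.random((n, d))
X = np.column_stack([np.ones(n), features])      # n × (d+1) design matrix with a column of 1s
true_w = np.array([1.0, 2.0, -1.0, 0.5])         # made-up "true" weights
y = X @ true_w + 0.01 * rng.standard_normal(n)   # outputs with a little noise added

# Normal equations: (X^T X) w = X^T y, equivalent to w = (X^T X)^(-1) X^T y.
w = np.linalg.solve(X.T @ X, X.T @ y)
print(w)                                         # should be close to true_w

# np.linalg.lstsq is a more robust alternative when X^T X is ill-conditioned.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Solving the linear system with np.linalg.solve (or using np.linalg.lstsq) avoids computing the matrix inverse directly, which is both faster and more numerically stable.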

In summary, multiple linear regression is a valuable technique for modeling relationships between several independent variables and a dependent variable. The least squares estimation method provides an efficient way to minimize errors, leading us to optimal weights for our model. While closed-form solutions are beneficial, it's essential to remember that in real-world scenarios, numerical optimization methods may also be necessary. Understanding these concepts will empower you to leverage multiple linear regression in various applications, enabling better predictions and insights from your data.

If you found something useful here, consider subscribing.