Multiple Linear Regression - Ordinary Least Squares Method Explained

In the realm of data analysis, understanding the relationships between variables is crucial. One of the most powerful tools for uncovering these relationships is regression analysis. While I previously explored the fundamentals of regression and linear regression, this blog post delves deeper into multiple linear regression. This technique allows us to analyze how multiple input features contribute to a single output variable, helping us make predictions and informed decisions based on complex data sets. I’ll explore the mathematical foundations and the least squares estimation method, ultimately providing you with a comprehensive understanding of multiple linear regression.

Link for the previous blog post.

Regression Analysis
When dealing with real-world data, you may encounter variables (features) that have relationships between them. These relationships may be linear or nonlinear. Regression analysis is a technique used to find and model such relationships between variables. For example, how do students' marks on a statistics module relate to other factors, such as the hours they spend studying?

Here we are going to discuss multiple linear regression in detail. In this setting we have a linear equation with multiple input features, while the model produces a single output.

y = ŷ + ε

Model:

ŷ = w0 + w1x1 + ... + wdxd

where:

    • w0 is the bias or intercept
    • w1 to wd are the weights or coefficients
    • ε is the error
    • ŷ is the predicted value

Suppose we have n data points, each with d features and an observed output y:
x1    x2    x3    ...   xd    y
x11   x12   x13   ...   x1d   y1
x21   x22   x23   ...   x2d   y2
x31   x32   x33   ...   x3d   y3
...   ...   ...   ...   ...   ...
xn1   xn2   xn3   ...   xnd   yn

Here xnd represents the value of the dth feature for the nth data point.

Now we can represent all the above instances in matrix form:

y = [y1 y2 ... yn]T

X =
[ 1  x11  x12  ...  x1d ]
[ 1  x21  x22  ...  x2d ]
[ ...                   ]
[ 1  xn1  xn2  ...  xnd ]

w = [w0 w1 ... wd]T

The output vector y is an n × 1 vector, X is an n × (d+1) matrix (its first column is all 1s, for the intercept), and w is a (d+1) × 1 vector.

Now the model can be represented as below.

ŷ = Xw

❇️
Note: Vectors are often represented in bold lowercase (e.g., y). Matrices are represented in bold uppercase (e.g., X).
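
As a quick illustration of this matrix form, here is a minimal NumPy sketch. The feature values and weights are random placeholders, not real data; it simply builds an n × (d+1) matrix X with a leading column of 1s and computes ŷ = Xw:

```python
import numpy as np

n, d = 5, 3                                   # 5 data points, 3 features (arbitrary sizes)
rng = np.random.default_rng(0)

features = rng.random((n, d))                 # raw feature matrix, shape (n, d)
X = np.column_stack([np.ones(n), features])   # prepend a column of 1s -> shape (n, d+1)
w = rng.random(d + 1)                         # weights [w0, w1, ..., wd], shape (d+1,)

y_hat = X @ w                                 # predicted values, shape (n,)
print(X.shape, w.shape, y_hat.shape)          # (5, 4) (4,) (5,)
```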

Sometimes, you may be confused about the column of 1s in the matrix X.

So let's take an example and clarify it first.

Suppose we have a dataset with 2 features (x1 and x2) and 3 instances.

ŷ = w0 + w1x1 + w2x2

Instance   x1   x2   y
1          2    3    10
2          4    5    20
3          6    7    30

Without Column of 1s

If we exclude the intercept term, the design matrix X and weight vector w would look like this:

X =
[ 2  3 ]
[ 4  5 ]
[ 6  7 ]

w = [w1 w2]T

So, the output vector ŷ = Xw would be:

ŷ =
[ 2w1 + 3w2 ]
[ 4w1 + 5w2 ]
[ 6w1 + 7w2 ]

🔴
Issue: There's no term to account for the intercept w0. The model is forced to pass through the origin (i.e., when x1 = 0 and x2 = 0, the prediction is 0), which may not be desirable.

With Column of 1s

To include the intercept w0, we modify X and w:

X =
[ 1  2  3 ]
[ 1  4  5 ]
[ 1  6  7 ]

w = [w0 w1 w2]T

Now, the output vector would be:

ŷ =
[ w0 + 2w1 + 3w2 ]
[ w0 + 4w1 + 5w2 ]
[ w0 + 6w1 + 7w2 ]

Now you know the reason to add a column of 1s in X.
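
To make the difference concrete, here is a small sketch using the three instances from the table above; the weight values w0 = 1.0, w1 = 2.0, w2 = 0.5 are arbitrary, chosen only for illustration:

```python
import numpy as np

features = np.array([[2.0, 3.0],
                     [4.0, 5.0],
                     [6.0, 7.0]])             # x1 and x2 for the 3 instances

w0, w1, w2 = 1.0, 2.0, 0.5                    # arbitrary example weights

# Without the column of 1s: there is no way to include the intercept w0
y_hat_no_intercept = features @ np.array([w1, w2])

# With the column of 1s: the first weight multiplies 1, acting as the intercept
X = np.column_stack([np.ones(3), features])
y_hat = X @ np.array([w0, w1, w2])

print(y_hat_no_intercept)   # [ 5.5 10.5 15.5]
print(y_hat)                # [ 6.5 11.5 16.5]  (every prediction shifted up by w0 = 1.0)
```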

Our objective in this problem is to minimize the error.

Least Squares Estimation Method

In the Least Squares Estimation method, we minimize the average of the sum of squared errors.
ε = y - ŷ

ε = y - Xw

Both y and ŷ are n × 1 vectors.

Since y and the inputs X are known, the objective function depends only on w (which includes both the bias and the weights).

So we can write the objective function as a function of w (bias w0 and weights w1 to wd) as:

J(w) = Σε² / 2n
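
As a quick sketch of this objective in code (the design matrix, outputs, and weights below are just placeholder values, and X is assumed to already include the column of 1s):

```python
import numpy as np

def cost(X, y, w):
    """Least squares objective J(w) = (sum of squared errors) / 2n."""
    eps = y - X @ w                  # residuals: ε = y - ŷ
    n = len(y)
    return np.sum(eps ** 2) / (2 * n)

# Placeholder data: 3 instances, 2 features plus the column of 1s
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 4.0, 5.0],
              [1.0, 6.0, 7.0]])
y = np.array([10.0, 20.0, 30.0])
w = np.array([0.0, 1.0, 1.0])        # arbitrary starting weights

print(cost(X, y, w))                 # 72.5
```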

As we know, for any vector y = [y1 y2 ... yn]T, the sum of squares Σy² can be written as:

Σy² = y1² + y2² + ... + yn²

Expressing y as a column vector, this sum of squares can be rewritten using matrix multiplication, where the transpose of y, yT, is multiplied by y:

yTy = y1² + y2² + ... + yn²

So we can write Σε² as εTε.

ε = y - Xw

Then the objective function can be written as:

J(w) = (y-Xw)T (y-Xw) / 2n

💡
(AB)T = BT AT

J(w) = (yT-wTXT) (y-Xw) / 2n

J(w) = ( yTy - yTXw - wTXTy + wTXT Xw ) / 2n

Since yTXw and wTXTy are the same,

J(w) = ( yTy - 2wTXTy + wTXT Xw ) / 2n

To find the minimum, we take the derivative of this function with respect to w and set it to zero.

∂J(w) / ∂w = 0

∂ [ (yTy - 2wTXTy + wTXTXw) / 2n ] / ∂w = 0

( -2XTy + 2XTXw ) / 2n = 0

XTXw = XTy

w = (XTX)-1 XTy

Now we have a vector of values for w. This is a closed-form solution, meaning we obtain the exact values directly. In real-world scenarios there are cases where closed-form methods can't be applied (for example, when XTX is not invertible or the dataset is very large); in those cases we have to use numerical optimization methods. For this multiple linear regression model, you can also follow the same steps I mentioned in the previous blog post.
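
Here is a minimal sketch of the closed-form solution w = (XTX)-1 XTy in NumPy; the data is randomly generated for illustration, and the weights are obtained by solving the normal equations rather than explicitly inverting XTX:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 3                                     # placeholder sizes: 20 instances, 3 features

features = rng.random((n, d))
X = np.column_stack([np.ones(n), features])      # n × (d+1) design matrix with a column of 1s
true_w = np.array([1.0, 2.0, -1.0, 0.5])         # made-up "true" weights
y = X @ true_w + 0.01 * rng.standard_normal(n)   # outputs with a little noise added

# Normal equations: (X^T X) w = X^T y, equivalent to w = (X^T X)^(-1) X^T y.
w = np.linalg.solve(X.T @ X, X.T @ y)
print(w)                                         # should be close to true_w

# np.linalg.lstsq is a more robust alternative when X^T X is ill-conditioned.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Solving the linear system with np.linalg.solve (or using np.linalg.lstsq) avoids computing the matrix inverse directly, which is both faster and more numerically stable.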

In summary, multiple linear regression is a valuable technique for modeling relationships between several independent variables and a dependent variable. The least squares estimation method provides an efficient way to minimize errors, leading us to optimal weights for our model. While closed-form solutions are beneficial, it's essential to remember that in real-world scenarios, numerical optimization methods may also be necessary. Understanding these concepts will empower you to leverage multiple linear regression in various applications, enabling better predictions and insights from your data.

If you found something useful here, consider subscribing.