# Multiple Linear Regression - Ordinary Least Squares Method Explained

In the realm of data analysis, understanding the relationships between variables is crucial. One of the most powerful tools for uncovering these relationships is regression analysis. While I previously explored the fundamentals of regression and linear regression, this blog post delves deeper into **multiple linear regression**. This technique allows us to analyze how multiple input features contribute to a single output variable, helping us make predictions and informed decisions based on complex data sets. I’ll explore the mathematical foundations and the least squares estimation method, ultimately providing you with a comprehensive understanding of multiple linear regression.

Link for the previous blog post.

Here we are going to discuss multiple linear regression in detail. We have a linear equation with multiple input features, but we assume that the model gives a single output.

y = **ŷ** + **ε**

Model:

**ŷ** = w_{0} + w_{1}x_{1} + ... + w_{d}x_{d}

where:

- **w**_{0} is the bias or intercept
- **w**_{1} to **w**_{d} are the weights or coefficients
- **ε** is the error
- **ŷ** is the predicted value

| x_{1} | x_{2} | x_{3} | ... | x_{d} | y |
|---|---|---|---|---|---|
| x_{11} | x_{12} | x_{13} | ... | x_{1d} | y_{1} |
| x_{21} | x_{22} | x_{23} | ... | x_{2d} | y_{2} |
| x_{31} | x_{32} | x_{33} | ... | x_{3d} | y_{3} |
| ... | ... | ... | ... | ... | ... |
| x_{n1} | x_{n2} | x_{n3} | ... | x_{nd} | y_{n} |

Here x_{nd} represents the value of the d^{th} feature for the n^{th} data point.

Now we can represent all the above instances in matrix form: the output **y** is an `n × 1` vector, **X** is an `n × (d+1)` matrix, and **w** is a `(d+1) × 1` vector.

Now the model can be represented as below.

**ŷ** = **Xw**
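Written out in full, with the column of 1s in **X** carrying the intercept, the stacked model for all n instances is:

```latex
\underbrace{\begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_n \end{bmatrix}}_{n \times 1}
=
\underbrace{\begin{bmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1d} \\
1 & x_{21} & x_{22} & \cdots & x_{2d} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{nd}
\end{bmatrix}}_{n \times (d+1)}
\underbrace{\begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_d \end{bmatrix}}_{(d+1) \times 1}
```

Multiplying row i of **X** by **w** reproduces exactly ŷ_{i} = w_{0} + w_{1}x_{i1} + ... + w_{d}x_{id}.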

**Vectors** are often represented in **bold lowercase** (e.g., **y**). **Matrices** are represented in **bold uppercase** (e.g., **X**).

Sometimes, you may be confused about the column of 1s in matrix **X**.

So let's take an example and clarify it first.

Suppose we have a dataset with **2 features** (x_{1} and x_{2}) and **3 instances**.

ŷ = w_{0} + w_{1}x_{1} + w_{2}x_{2}

| Instance | x_{1} | x_{2} | y |
|---|---|---|---|
| 1 | 2 | 3 | 10 |
| 2 | 4 | 5 | 20 |
| 3 | 6 | 7 | 30 |

**Without Column of 1s**

If we exclude the intercept term, the model matrix **X** contains only the feature columns and **w** contains only w_{1} and w_{2}. The output vector **ŷ** = **Xw** then has entries ŷ_{i} = w_{1}x_{i1} + w_{2}x_{i2}.
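Concretely, for the three instances in the table above:

```latex
\mathbf{X} = \begin{bmatrix} 2 & 3 \\ 4 & 5 \\ 6 & 7 \end{bmatrix},
\qquad
\mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix},
\qquad
\hat{\mathbf{y}} = \mathbf{Xw} = \begin{bmatrix} 2w_1 + 3w_2 \\ 4w_1 + 5w_2 \\ 6w_1 + 7w_2 \end{bmatrix}
```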

**Issue**: There's no term to account for the intercept w_{0}. The model is forced to pass through the origin (i.e., when x_{1} = 0 and x_{2} = 0, then ŷ = 0), which may not be desirable.

**With Column of 1s**

To include the intercept w_{0}, we modify **X** and **w**.

Now, the output vector **ŷ** = **Xw** includes the intercept in every prediction: ŷ_{i} = w_{0} + w_{1}x_{i1} + w_{2}x_{i2}.
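Using the same three instances, now with the column of 1s:

```latex
\mathbf{X} = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 4 & 5 \\ 1 & 6 & 7 \end{bmatrix},
\qquad
\mathbf{w} = \begin{bmatrix} w_0 \\ w_1 \\ w_2 \end{bmatrix},
\qquad
\hat{\mathbf{y}} = \mathbf{Xw} = \begin{bmatrix} w_0 + 2w_1 + 3w_2 \\ w_0 + 4w_1 + 5w_2 \\ w_0 + 6w_1 + 7w_2 \end{bmatrix}
```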

Now you know the reason to add a column of 1s in **X.**

Our objective in this problem is to minimize the error.

**Least Squares Estimation Method**

In the **least squares estimation method**, we minimize the average of the squared errors (scaled by 1/2 for convenience when differentiating).

ε = **y** - **ŷ**

ε = **y** - **Xw**

Both **y** and **ŷ** are `n × 1` vectors.

Since **y** and the inputs **X** are known, the objective function is a function of **w** alone (which includes both the bias w_{0} and the weights w_{1} to w_{d}):

J(w) = Σε^{2} / 2n
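As a quick sketch of this objective in code (a minimal NumPy version; the function name `mse_objective` and the toy inputs are my own):

```python
import numpy as np

def mse_objective(w, X, y):
    """J(w) = sum of squared errors / 2n, where X already includes the column of 1s."""
    residual = y - X @ w           # epsilon = y - Xw, an n x 1 vector
    return (residual @ residual) / (2 * len(y))

# Toy check: with w = 0 every prediction is 0, so J(0) = sum(y^2) / 2n
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 4.0, 5.0],
              [1.0, 6.0, 7.0]])
y = np.array([10.0, 20.0, 30.0])
print(mse_objective(np.zeros(3), X, y))  # 1400 / 6
```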

For any vector, the sum of the squares of its elements can be written as a matrix product: for example, Σy^{2} = **y**^{T}**y**. So we can write Σε^{2} as:

**ε**^{T}**ε**, where **ε** = **y** - **Xw**

Then the objective function can be written as:

J(w) = (**y**-**Xw**)^{T} (**y**-**Xw**) / 2n

Since (**AB**)^{T} = **B**^{T}**A**^{T},

J(w) = (**y**^{T} - **w**^{T}**X**^{T}) (**y** - **Xw**) / 2n

J(w) = (**y**^{T}**y** - **y**^{T}**Xw** - **w**^{T}**X**^{T}**y** + **w**^{T}**X**^{T}**Xw**) / 2n

Since **y**^{T}**Xw** and **w**^{T}**X**^{T}**y** are the same scalar (each is the transpose of the other),

J(w) = (**y**^{T}**y** - 2**w**^{T}**X**^{T}**y** + **w**^{T}**X**^{T}**Xw**) / 2n

To find the minimum, we take the derivative of this function with respect to **w** and set it to zero.

∂J(w) / ∂w = 0

∂ [(**y**^{T}**y** - 2**w**^{T}**X**^{T}**y** + **w**^{T}**X**^{T}**Xw**) / 2n] / ∂**w** = 0

(0 - 2**X**^{T}**y** + 2**X**^{T}**Xw**) / 2n = 0

**X**^{T}**Xw** = **X**^{T}**y**

**w** = (**X**^{T}**X**)^{-1}**X**^{T}**y**

Now we have a vector of values for **w**. This is a closed-form solution, where we know the exact values. Sometimes we have real-world scenarios where we can't apply closed-form methods; in that case, we have to use numerical optimization methods. For this multiple linear regression model, you can also follow the same steps as I mentioned in the previous blog post.
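The whole pipeline can be sketched in a few lines of NumPy. Note that in the 3-instance example above, x_{2} = x_{1} + 1 exactly, so its **X**^{T}**X** is singular and the inverse does not exist; the sketch below therefore uses a small synthetic full-rank dataset of my own, generated without noise from assumed true weights w_{0} = 1, w_{1} = 2, w_{2} = 3:

```python
import numpy as np

# Synthetic data: 2 features, 4 instances, generated without noise
# from the assumed true weights w = (1, 2, 3)
X_raw = np.array([[1.0, 2.0],
                  [2.0, 0.0],
                  [3.0, 5.0],
                  [4.0, 1.0]])
y = 1.0 + 2.0 * X_raw[:, 0] + 3.0 * X_raw[:, 1]

# Prepend the column of 1s so that w[0] plays the role of the intercept w_0
X = np.column_stack([np.ones(len(X_raw)), X_raw])

# Closed-form solution: w = (X^T X)^{-1} X^T y.
# Solving the linear system X^T X w = X^T y is numerically preferable
# to forming the inverse explicitly.
w = np.linalg.solve(X.T @ X, X.T @ y)
print(w)  # approximately [1., 2., 3.]

y_hat = X @ w  # predictions match y, since the data is noiseless
```

Since the data was generated from known weights, recovering them is a quick sanity check that the closed-form formula works.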

In summary, multiple linear regression is a valuable technique for modeling relationships between several independent variables and a dependent variable. The least squares estimation method provides an efficient way to minimize errors, leading us to optimal weights for our model. While closed-form solutions are beneficial, it's essential to remember that in real-world scenarios, numerical optimization methods may also be necessary. Understanding these concepts will empower you to leverage multiple linear regression in various applications, enabling better predictions and insights from your data.

If you found something useful here, subscribe to me.