Conceptualizing Multivariate Linear Regression: How?

danielwu779
Aug 4, 2024
2 min read

If there are only three visible dimensions to the human eye, how can we visualize - or even obtain - a regression between four or more variables?

This is a question I encountered in my most recent paper. Although this question can be easily brushed off by any second year stats/econ/math/cs major, (or any other majors - I'm sorry for missing you), but for my high school eye, this question is utterly perplexing.

First, to understand multivariate regression, we must give a brief introduction to Two-Variable Linear Regression (OLS).

Ordinary Least Squares (OLS) regression is a foundational technique in statistics used to model the relationship between two variables. In its simplest form, OLS regression involves one dependent variable (Y) and one independent variable (X). The goal is to find the line that best fits the data points, which is achieved by minimizing the sum of the squared differences between the observed values and the values predicted by the line. The equation can be represented as:

𝑌=𝛽0+𝛽1𝑋+𝜖Y=β0+β1X+ϵ

Sounds simple. Now, lets extend that to three variables. When extending linear regression to include two independent variables, the model becomes a multiple linear regression. The equation now includes an additional term:

In this case, 𝑋1X1 and 𝑋2X2 are the independent variables, and 𝛽2β2 is the coefficient for 𝑋2X2. This model can be visualized in three-dimensional space, where the relationship between the variables forms a plane rather than a line. The plane represents the predicted values of Y given different combinations of 𝑋1X1 and 𝑋2X2. The relationship between two and three variable analysis can be visualized below. Credit to this guy: https://www.youtube.com/watch?v=zITIFTsivN8&ab_channel=StatQuestwithJoshStarmer

As the number of independent variables increases, the regression model becomes:

𝑌=𝛽0+𝛽1𝑋1+𝛽2𝑋2+⋯+𝛽𝑘𝑋𝑘+𝜖Y=β0+β1X1+β2X2+⋯+βkXk+ϵ

Here, 𝑘k represents the number of independent variables. In this multivariate linear regression, each additional variable adds another dimension to the model. While we can easily visualize up to three dimensions, higher dimensions are abstract and cannot be directly visualized.Despite this, the mathematical principles remain the same. The goal is still to find the coefficients 𝛽0,𝛽1,…,𝛽𝑘β0,β1,…,βk that minimize the sum of the squared residuals. These coefficients represent the effect of each independent variable on the dependent variable, holding all other variables constant.

However, this seems insane to me. But I've learned not to question how Python or R or Stata can find a 100000 variable regression coefficient. Multivariate linear regression is extremely useful and used to understand the relationship between a dependent variable and multiple independent variables. This technique is particularly useful in fields like economics, social sciences, and finance, where outcomes are influenced by several factors simultaneously. For example, my mood, what I eat today, and what I wear will impact how I present myself to someone else.

Conceptualizing Multivariate Linear Regression: How?

Recent Posts

Comments