Linear Least Squares Method#


Project Description#

This code provides an interactive visualization of different methods to compute the Least Squares Regression Line using vertical, horizontal, and perpendicular residuals. The goal of this visualization is to allow users to manually adjust the slope (β1) and intercept (β0) of a regression line and compare it with the automatically computed Least Squares Line for each residual type.

The code calculates the Sum of Squared Distances (SSD) for both the user-defined line and the automatically computed least squares line. The user can interactively visualize the effect of different slope and intercept values on the regression line’s fit, using sliders for adjustments.

Mathematical Insights:#

  1. Linear Regression with Different Residuals:

    • In standard Least Squares Regression, the goal is to minimize the vertical distance between the data points and the regression line. This approach uses the formula:

      $y_i = \beta_1 x_i + \beta_0 + \epsilon_i$

      where:

      • $\beta_1$ is the slope of the line.

      • $\beta_0$ is the intercept of the line (the value of $y$ when $x = 0$).

      • $\epsilon_i$ represents the residual for the i-th data point, which is the difference between the actual value $y_i$ and the predicted value $\hat{y}_i$.

    • The goal is to find values for $\beta_1$ and $\beta_0$ that minimize the sum of the squared residuals (SSR):

      $SSR = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

      Note: We are using this as the cost function to find the best fitted line.

  2. Three Types of Residuals:

    • This code allows users to visualize three different types of residuals:

      • Vertical Residuals: The vertical difference between the data point and the regression line, commonly used in standard Least Squares Regression.

        $\text{Vertical Residual} = y_i - \hat{y}_i = y_i - (\beta_1 x_i + \beta_0)$

      • Horizontal Residuals: The horizontal distance between the data point and the regression line. This method measures the deviation along the x-axis.

        $\text{Horizontal Residual} = x_i - \hat{x}_i$

        where $\hat{x}_i$ is found by solving for $x$ when $y_i = \beta_1 x + \beta_0$, leading to:

        $\hat{x}_i = \frac{y_i - \beta_0}{\beta_1}$

      • Perpendicular Residuals: The shortest (perpendicular) distance from the data point to the regression line, computed using geometric methods. This approach provides a more accurate geometric fit but is not typically used in standard regression.

        $\text{Perpendicular Distance} = \frac{|\beta_1 x_i - y_i + \beta_0|}{\sqrt{\beta_1^2 + 1}}$
      • Derivation of the Perpendicular Residual Formula

        • The perpendicular residual quantifies the distance from a point $(x_i, y_i)$ to a regression line given by:

          $y = \beta_1 x + \beta_0,$

          where:

          • $\beta_1$ is the slope,

          • $\beta_0$ is the intercept.

          The perpendicular residual, unlike the vertical residual, is measured perpendicularly from the point to the regression line.

        • 1. General Formula for the Distance from a Point to a Line

          In 2D geometry, the formula for the perpendicular distance from a point $(x_0, y_0)$ to a line of the form $Ax + By + C = 0$ is:

          $d = \frac{|Ax_0 + By_0 + C|}{\sqrt{A^2 + B^2}}.$

        • 2. Rearranging the Regression Line Equation

          To apply this formula to our regression line, we first rewrite the line equation $y = \beta_1 x + \beta_0$ in the form $Ax + By + C = 0$. Rearranging the terms, we get:

          $\beta_1 x - y + \beta_0 = 0.$

          Here, we can identify:

          • $A = \beta_1$,

          • $B = -1$,

          • $C = \beta_0$.

        • 3. Substituting into the Perpendicular Distance Formula

          Now, using the point $(x_i, y_i)$ in the distance formula:

          $d = \frac{|\beta_1 x_i - y_i + \beta_0|}{\sqrt{\beta_1^2 + (-1)^2}}.$

          The denominator simplifies to $\sqrt{\beta_1^2 + 1}$, giving us:

          $d = \frac{|\beta_1 x_i - y_i + \beta_0|}{\sqrt{\beta_1^2 + 1}}.$

        • 4. Removing the Absolute Value for Residuals

          In regression analysis, residuals are typically signed, indicating whether the data point lies above or below the line. Thus, instead of using the absolute value, we keep the sign:

          $e_i = \frac{\beta_1 x_i - y_i + \beta_0}{\sqrt{\beta_1^2 + 1}}.$

          This formula expresses the perpendicular residual, which measures the signed perpendicular distance from the point $(x_i, y_i)$ to the regression line.



  3. Sum of Squared Distances (SSD):

    • In this code, the Sum of Squared Distances (SSD) is computed dynamically for both the user-defined line and the least squares line for each type of residual.

    • The formula for SSD is similar to the sum of squared residuals:

      $SSD = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (\text{Residual}_i)^2$

      The goal is to minimize the SSD by adjusting $\beta_1$ and $\beta_0$. The code visualizes how different residual types affect the computed SSD. A short Python sketch of the signed perpendicular residual and its SSD follows this list.
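As a small illustration of the signed perpendicular residual derived above and the SSD built from it, here is a minimal NumPy sketch (the function name, example points, and example line are illustrative choices of mine, not part of the visualization code presented later):

import numpy as np

def perpendicular_residuals(x, y, slope, intercept):
    # Signed perpendicular distances from the points (x, y) to the line y = slope * x + intercept
    return (slope * x - y + intercept) / np.sqrt(slope**2 + 1)

x = np.array([0.0, 2.0, 3.0])
y = np.array([1.0, 1.0, 4.0])
e = perpendicular_residuals(x, y, slope=1.0, intercept=0.0)  # example line y = x
print(np.sum(e**2))  # SSD for the perpendicular residuals; here 1.5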

Key Features:#

  1. Interactive Plot:

    • The plot displays the data points along with both the user-defined regression line and the least squares regression line. The user can manually adjust the slope (β1) and intercept (β0) via sliders, and the plot updates in real-time to show the new lines.

  2. Residual Visualization:

    • The distances (residuals) between each data point and the regression line are visualized as lines on the plot. The user can switch between three types of residuals (vertical, horizontal, and perpendicular) using a dropdown menu.

  3. Sum of Squared Distances (SSD):

    • The SSD for both the user-defined line and the least squares line is displayed on the plot. The SSD is recalculated as the user adjusts the slope and intercept, helping users understand the effect of different residual types on the overall error.

  4. Mathematical Insight:

    • The code offers a deep understanding of the least squares method by allowing users to explore different types of residuals and see their impact on the regression line and SSD. Users can learn why vertical residuals are typically used in the classic least squares method and how alternative methods (horizontal and perpendicular) affect the fit.

Minimizing Vertical Residuals (Least Squares Method):#

In standard linear regression, we aim to find the best-fitting straight line through a set of data points by minimizing the sum of squared vertical residuals. This method is known as the Ordinary Least Squares (OLS) regression. The residuals are the vertical distances (errors) between the observed values and the values predicted by the linear model.

Problem Definition#

Given a set of data points $(x_i, y_i)$ for $i = 1, 2, \ldots, n$, we wish to find the parameters $\beta_1$ (slope) and $\beta_0$ (intercept) in the linear equation:

$y_i = \beta_1 x_i + \beta_0 + \epsilon_i$

where $\epsilon_i$ is the residual for the i-th data point.

Our objective is to find $\beta_1$ and $\beta_0$ that minimize the Sum of Squared Residuals (SSR):

$SSR(\beta_1, \beta_0) = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_1 x_i - \beta_0)^2$

Minimization Process#

To find the values of β1 and β0 that minimize SSR, we take partial derivatives of SSR with respect to β1 and β0, set them equal to zero, and solve the resulting equations.

Step 1: Compute Partial Derivatives#

a. Partial Derivative with respect to $\beta_1$#

Compute $\frac{\partial SSR}{\partial \beta_1}$:

$\frac{\partial SSR}{\partial \beta_1} = \frac{\partial}{\partial \beta_1} \sum_{i=1}^{n} (y_i - \beta_1 x_i - \beta_0)^2 = \sum_{i=1}^{n} 2 (y_i - \beta_1 x_i - \beta_0)(-x_i)$

Simplify:

$\frac{\partial SSR}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i (y_i - \beta_1 x_i - \beta_0)$

b. Partial Derivative with respect to $\beta_0$#

Compute $\frac{\partial SSR}{\partial \beta_0}$:

$\frac{\partial SSR}{\partial \beta_0} = \frac{\partial}{\partial \beta_0} \sum_{i=1}^{n} (y_i - \beta_1 x_i - \beta_0)^2 = \sum_{i=1}^{n} 2 (y_i - \beta_1 x_i - \beta_0)(-1)$

Simplify:

$\frac{\partial SSR}{\partial \beta_0} = -2 \sum_{i=1}^{n} (y_i - \beta_1 x_i - \beta_0)$

Step 2: Set Partial Derivatives to Zero#

a. Setting $\frac{\partial SSR}{\partial \beta_1} = 0$#

$-2 \sum_{i=1}^{n} x_i (y_i - \beta_1 x_i - \beta_0) = 0$

Divide both sides by $-2$:

$\sum_{i=1}^{n} x_i (y_i - \beta_1 x_i - \beta_0) = 0$

b. Setting $\frac{\partial SSR}{\partial \beta_0} = 0$#

$-2 \sum_{i=1}^{n} (y_i - \beta_1 x_i - \beta_0) = 0$

Divide both sides by $-2$:

$\sum_{i=1}^{n} (y_i - \beta_1 x_i - \beta_0) = 0$

Step 3: Derive the Normal Equations#

a. First Normal Equation (from derivative w.r.t. $\beta_0$)#

We have:

$\sum_{i=1}^{n} (y_i - \beta_1 x_i - \beta_0) = 0$

Simplify the summation:

$\sum_{i=1}^{n} y_i - \beta_1 \sum_{i=1}^{n} x_i - n\beta_0 = 0$

Rewriting:

$n\beta_0 + \beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$

b. Second Normal Equation (from derivative w.r.t. $\beta_1$)#

We have:

$\sum_{i=1}^{n} x_i (y_i - \beta_1 x_i - \beta_0) = 0$

Simplify the summation:

$\sum_{i=1}^{n} x_i y_i - \beta_1 \sum_{i=1}^{n} x_i^2 - \beta_0 \sum_{i=1}^{n} x_i = 0$

Rewriting:

$\beta_0 \sum_{i=1}^{n} x_i + \beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$

Step 4: Solve the System of Equations#

We have two normal equations:

  1. $n\beta_0 + \beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$

  2. $\beta_0 \sum_{i=1}^{n} x_i + \beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$

Let’s denote:

  • $S_x = \sum_{i=1}^{n} x_i$

  • $S_y = \sum_{i=1}^{n} y_i$

  • $S_{xx} = \sum_{i=1}^{n} x_i^2$

  • $S_{xy} = \sum_{i=1}^{n} x_i y_i$

Then the normal equations become:

  1. $n\beta_0 + \beta_1 S_x = S_y$

  2. $\beta_0 S_x + \beta_1 S_{xx} = S_{xy}$

Solving for β0 and β1#

From the first equation:

$\beta_0 = \frac{S_y - \beta_1 S_x}{n}$

Substitute $\beta_0$ into the second equation:

$\left(\frac{S_y - \beta_1 S_x}{n}\right) S_x + \beta_1 S_{xx} = S_{xy}$

Simplify:

$\frac{S_y S_x - \beta_1 S_x^2}{n} + \beta_1 S_{xx} = S_{xy}$

Multiply both sides by n to eliminate the denominator:

$S_y S_x - \beta_1 S_x^2 + n \beta_1 S_{xx} = n S_{xy}$

Group terms involving $\beta_1$:

$-\beta_1 S_x^2 + n \beta_1 S_{xx} = n S_{xy} - S_y S_x$

Factor out $\beta_1$ on the left side:

$\beta_1 (-S_x^2 + n S_{xx}) = n S_{xy} - S_x S_y$

Rewriting:

$\beta_1 (n S_{xx} - S_x^2) = n S_{xy} - S_x S_y$

Thus, the solution for $\beta_1$ is:

$\beta_1 = \frac{n S_{xy} - S_x S_y}{n S_{xx} - S_x^2}$

Once $\beta_1$ is known, we can find $\beta_0$:

$\beta_0 = \frac{S_y - \beta_1 S_x}{n}$

Step 5: Express in Terms of Means#

Let’s define the sample means:

  • $\bar{x} = \frac{S_x}{n}$

  • $\bar{y} = \frac{S_y}{n}$

Also define:

  • $\mathrm{Cov}(x, y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \frac{S_{xy} - n \bar{x} \bar{y}}{n}$

  • $\mathrm{Var}(x) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{S_{xx} - n \bar{x}^2}{n}$

Expressing $\beta_1$ in terms of covariance and variance:

$\beta_1 = \frac{n S_{xy} - S_x S_y}{n S_{xx} - S_x^2} = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)}$

Similarly, $\beta_0$ becomes:

$\beta_0 = \bar{y} - \beta_1 \bar{x}$

Summary of Results#

  • Slope ($\beta_1$):

    $\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$

  • Intercept ($\beta_0$):

    $\beta_0 = \bar{y} - \beta_1 \bar{x}$

These formulas provide the least squares estimates of the slope and intercept that minimize the sum of squared vertical residuals.
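A minimal NumPy sketch of these closed-form estimates (illustrative only; the helper name ols_fit is my own, and the example data are the three points used in the exercise below):

import numpy as np

def ols_fit(x, y):
    # Closed-form OLS estimates: beta1 = Cov(x, y) / Var(x), beta0 = y_bar - beta1 * x_bar
    x_bar, y_bar = x.mean(), y.mean()
    beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar)**2)
    beta0 = y_bar - beta1 * x_bar
    return beta1, beta0

x = np.array([0.0, 2.0, 3.0])
y = np.array([1.0, 1.0, 4.0])
print(ols_fit(x, y))  # approximately (0.8571, 0.5714), i.e. 6/7 and 4/7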

Key Points#

  • Ordinary Least Squares (OLS) minimizes the sum of squared vertical residuals between observed and predicted values.

  • The normal equations derived from setting the partial derivatives to zero provide a system of linear equations to solve for β1 and β0.

  • The final formulas for β1 and β0 are expressed in terms of the sums of the data and their means.

Conclusion#

By following this detailed derivation, we have obtained explicit formulas for the regression coefficients β1 and β0 that minimize the sum of squared vertical residuals. These formulas are fundamental in linear regression analysis and are widely used due to their simplicity and efficiency in computation.

Exercise: Minimizing Vertical Residuals (Least Squares Method)#

Given the data points:

  • (0,1)

  • (2,1)

  • (3,4)

We aim to find the regression line of the form $y = \beta_1 x + \beta_0$ that minimizes the sum of squared vertical residuals.

Step 1: Organize the Data#

First, let’s organize the given data points and compute the necessary sums.

Data Point | $x_i$ | $y_i$ | $x_i^2$ | $x_i y_i$
---------- | ----- | ----- | ------- | ---------
1          | 0     | 1     | 0       | 0
2          | 2     | 1     | 4       | 2
3          | 3     | 4     | 9       | 12
Total      | 5     | 6     | 13      | 14

Calculations:

  • Number of data points, $n = 3$

  • Sum of $x_i$: $\sum x_i = 0 + 2 + 3 = 5$

  • Sum of $y_i$: $\sum y_i = 1 + 1 + 4 = 6$

  • Sum of $x_i^2$: $\sum x_i^2 = 0^2 + 2^2 + 3^2 = 0 + 4 + 9 = 13$

  • Sum of $x_i y_i$: $\sum x_i y_i = 0 \times 1 + 2 \times 1 + 3 \times 4 = 0 + 2 + 12 = 14$

Step 2: Compute the Means#

Calculate the mean of $x$ and $y$:

$\bar{x} = \frac{\sum x_i}{n} = \frac{5}{3} \approx 1.6667, \qquad \bar{y} = \frac{\sum y_i}{n} = \frac{6}{3} = 2$

Step 3: Apply the OLS Formulas#

The OLS estimates for the slope ($\beta_1$) and intercept ($\beta_0$) are given by:

$\beta_1 = \frac{\sum x_i y_i - n \bar{x} \bar{y}}{\sum x_i^2 - n \bar{x}^2}, \qquad \beta_0 = \bar{y} - \beta_1 \bar{x}$

Calculate $\beta_1$:#

Substitute the known values into the formula for $\beta_1$:

$\beta_1 = \frac{14 - 3 \times \frac{5}{3} \times 2}{13 - 3 \times \left(\frac{5}{3}\right)^2}$

Simplify the Numerator and Denominator:

  1. Numerator: $14 - 3 \times \frac{5}{3} \times 2 = 14 - 10 = 4$

  2. Denominator: $13 - 3 \times \frac{25}{9} = 13 - \frac{75}{9} = 13 - 8.\overline{3} = 4.\overline{6} = \frac{14}{3}$

Compute $\beta_1$:

$\beta_1 = \frac{4}{\frac{14}{3}} = \frac{4 \times 3}{14} = \frac{12}{14} = \frac{6}{7} \approx 0.8571$

Calculate $\beta_0$:#

$\beta_0 = \bar{y} - \beta_1 \bar{x} = 2 - \frac{6}{7} \times \frac{5}{3}$

Compute the Product:

$\frac{6}{7} \times \frac{5}{3} = \frac{30}{21} = \frac{10}{7} \approx 1.4286$

Subtract from $\bar{y}$:

$\beta_0 = 2 - \frac{10}{7} = \frac{14 - 10}{7} = \frac{4}{7} \approx 0.5714$

Step 4: Formulate the Regression Line#

Using the calculated values of $\beta_1$ and $\beta_0$, the regression line is:

$y = \frac{6}{7}x + \frac{4}{7} \approx 0.8571x + 0.5714$

Step 5: Interpretation#

The regression line $y \approx 0.8571x + 0.5714$ best fits the given data points by minimizing the sum of squared vertical residuals. This means that the total squared differences between the observed y-values and the values predicted by this line are the smallest possible compared to any other line.

Verification#

Let’s verify the residuals for each data point:

  1. For (0,1): $\hat{y} = 0.8571 \times 0 + 0.5714 = 0.5714$, so $e = y - \hat{y} = 1 - 0.5714 = 0.4286$

  2. For (2,1): $\hat{y} = 0.8571 \times 2 + 0.5714 = 1.7142 + 0.5714 = 2.2856$, so $e = 1 - 2.2856 = -1.2856$

  3. For (3,4): $\hat{y} = 0.8571 \times 3 + 0.5714 = 2.5713 + 0.5714 = 3.1427$, so $e = 4 - 3.1427 = 0.8573$

Sum of Squared Residuals:

$SSR = (0.4286)^2 + (-1.2856)^2 + (0.8573)^2 \approx 0.1837 + 1.6528 + 0.7350 \approx 2.5714$

This confirms that the chosen line minimizes the sum of squared vertical residuals for the given data points.

By applying the Ordinary Least Squares method, we derived the regression line $y = \frac{6}{7}x + \frac{4}{7} \approx 0.8571x + 0.5714$, which best fits the data points (0,1), (2,1), and (3,4) by minimizing the sum of squared vertical residuals. This line provides the most accurate linear relationship between $x$ and $y$ based on the given data.
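The arithmetic above can be cross-checked numerically; for example, NumPy's polyfit performs exactly this vertical least-squares fit for a degree-1 polynomial (a quick sketch):

import numpy as np

x = np.array([0.0, 2.0, 3.0])
y = np.array([1.0, 1.0, 4.0])

beta1, beta0 = np.polyfit(x, y, 1)   # degree-1 fit minimizes squared vertical residuals
residuals = y - (beta1 * x + beta0)
print(beta1, beta0)                  # approximately 0.8571 and 0.5714
print(np.sum(residuals**2))          # approximately 2.5714 (= 18/7)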

Minimizing Horizontal Residuals:#

In regression analysis, we typically minimize the vertical residuals, which are the differences between the observed y-values and the predicted y-values from the regression line. However, in some cases, we might be interested in minimizing the horizontal residuals, which are the differences in the x-direction between the observed data points and the regression line.

This derivation provides a step-by-step explanation of how to minimize the sum of squared horizontal residuals to find the regression parameters β1 (slope) and β0 (intercept).

Problem Definition#

Given a set of data points $(x_i, y_i)$ for $i = 1, 2, \ldots, n$, we aim to find the parameters $\beta_1$ and $\beta_0$ in the regression equation:

$y = \beta_1 x + \beta_0$

that minimize the Sum of Squared Horizontal Residuals (SSH):

$SSH = \sum_{i=1}^{n} (x_i - \hat{x}_i)^2$

where $\hat{x}_i$ is the predicted x-value corresponding to $y_i$ on the regression line.

Expressing Horizontal Residuals#

For each data point $(x_i, y_i)$:

  1. Predicted x-value ($\hat{x}_i$):

    From the regression equation:

    $y_i = \beta_1 \hat{x}_i + \beta_0$

    Solving for $\hat{x}_i$:

    $\hat{x}_i = \frac{y_i - \beta_0}{\beta_1}$

  2. Horizontal Residual ($e_i$):

    $e_i = x_i - \hat{x}_i = x_i - \frac{y_i - \beta_0}{\beta_1}$

Objective Function#

Our goal is to minimize the Sum of Squared Horizontal Residuals (SSH):

$SSH(\beta_1, \beta_0) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(x_i - \frac{y_i - \beta_0}{\beta_1}\right)^2$

Minimization Process#

To find the values of β1 and β0 that minimize SSH, we take partial derivatives of SSH with respect to β1 and β0, set them equal to zero, and solve the resulting equations.

Step 1: Compute Partial Derivatives#

a. Partial Derivative with respect to $\beta_1$#

Compute $\frac{\partial SSH}{\partial \beta_1}$:

First, write SSH explicitly:

$SSH = \sum_{i=1}^{n} \left(x_i - \frac{y_i - \beta_0}{\beta_1}\right)^2$

Let $e_i = x_i - \frac{y_i - \beta_0}{\beta_1}$. Then:

$\frac{\partial e_i}{\partial \beta_1} = \frac{\partial}{\partial \beta_1}\left(x_i - \frac{y_i - \beta_0}{\beta_1}\right) = \frac{y_i - \beta_0}{\beta_1^2}$

Compute the partial derivative:

$\frac{\partial SSH}{\partial \beta_1} = 2 \sum_{i=1}^{n} e_i \frac{\partial e_i}{\partial \beta_1} = 2 \sum_{i=1}^{n} e_i \frac{y_i - \beta_0}{\beta_1^2}$

b. Partial Derivative with respect to $\beta_0$#

Compute $\frac{\partial SSH}{\partial \beta_0}$:

$\frac{\partial e_i}{\partial \beta_0} = \frac{\partial}{\partial \beta_0}\left(x_i - \frac{y_i - \beta_0}{\beta_1}\right) = \frac{1}{\beta_1}$

Compute the partial derivative:

$\frac{\partial SSH}{\partial \beta_0} = 2 \sum_{i=1}^{n} e_i \frac{\partial e_i}{\partial \beta_0} = 2 \sum_{i=1}^{n} e_i \frac{1}{\beta_1}$

Step 2: Set Partial Derivatives to Zero#

a. Setting $\frac{\partial SSH}{\partial \beta_1} = 0$#

$2 \sum_{i=1}^{n} e_i \frac{y_i - \beta_0}{\beta_1^2} = 0 \quad\Longrightarrow\quad \sum_{i=1}^{n} e_i (y_i - \beta_0) = 0$

This is Equation (1).

b. Setting $\frac{\partial SSH}{\partial \beta_0} = 0$#

$2 \sum_{i=1}^{n} e_i \frac{1}{\beta_1} = 0 \quad\Longrightarrow\quad \sum_{i=1}^{n} e_i = 0$

This is Equation (2).

Step 3: Express ei in Terms of Known Quantities#

Recall that:

$e_i = x_i - \frac{y_i - \beta_0}{\beta_1}$

Simplify:

$e_i = x_i - \frac{y_i}{\beta_1} + \frac{\beta_0}{\beta_1}$

Step 4: Substitute ei into the Equations#

Equation (2):#

$\sum_{i=1}^{n} e_i = \sum_{i=1}^{n} \left(x_i - \frac{y_i}{\beta_1} + \frac{\beta_0}{\beta_1}\right) = 0$

Simplify:

$\sum_{i=1}^{n} x_i - \frac{1}{\beta_1}\sum_{i=1}^{n} y_i + \frac{n\beta_0}{\beta_1} = 0$

Multiply both sides by $\beta_1$:

$\beta_1 \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} y_i + n\beta_0 = 0$

Rewriting:

$n\beta_0 = \sum_{i=1}^{n} y_i - \beta_1 \sum_{i=1}^{n} x_i$

So,

$\beta_0 = \frac{\sum_{i=1}^{n} y_i - \beta_1 \sum_{i=1}^{n} x_i}{n}$

Equation (1):#

$\sum_{i=1}^{n} e_i (y_i - \beta_0) = \sum_{i=1}^{n} \left(x_i - \frac{y_i}{\beta_1} + \frac{\beta_0}{\beta_1}\right)(y_i - \beta_0) = 0$

Simplify the terms:

$\sum_{i=1}^{n} x_i (y_i - \beta_0) - \frac{1}{\beta_1}\sum_{i=1}^{n} y_i (y_i - \beta_0) + \frac{\beta_0}{\beta_1}\sum_{i=1}^{n} (y_i - \beta_0) = 0$

Let’s denote:

  • $S_{xy} = \sum_{i=1}^{n} x_i y_i$

  • $S_{x\beta_0} = \beta_0 \sum_{i=1}^{n} x_i$

  • $S_{yy} = \sum_{i=1}^{n} y_i^2$

  • $S_{y\beta_0} = \beta_0 \sum_{i=1}^{n} y_i$

  • $S_{\beta_0\beta_0} = n\beta_0^2$

  • $S_y = \sum_{i=1}^{n} y_i$

  • $S_{\beta_0} = n\beta_0$

Now rewrite Equation (1):

$\sum_{i=1}^{n} x_i y_i - \beta_0 \sum_{i=1}^{n} x_i - \frac{1}{\beta_1}\left(\sum_{i=1}^{n} y_i^2 - \beta_0 \sum_{i=1}^{n} y_i\right) + \frac{\beta_0}{\beta_1}\left(\sum_{i=1}^{n} y_i - n\beta_0\right) = 0$

Simplify:

$S_{xy} - \beta_0 S_x - \frac{S_{yy}}{\beta_1} + \frac{\beta_0 S_y}{\beta_1} + \frac{\beta_0 S_y}{\beta_1} - \frac{n\beta_0^2}{\beta_1} = 0$

Combine like terms:

$S_{xy} - \beta_0 S_x - \frac{S_{yy}}{\beta_1} + \frac{2\beta_0 S_y}{\beta_1} - \frac{n\beta_0^2}{\beta_1} = 0$

Multiply both sides by $\beta_1$ to eliminate denominators:

$\beta_1 S_{xy} - \beta_1 \beta_0 S_x - S_{yy} + 2\beta_0 S_y - n\beta_0^2 = 0$

Now, recall that from earlier:

$\beta_0 = \frac{S_y - \beta_1 S_x}{n}$

Substitute $\beta_0$ into the equation to get an equation in $\beta_1$ only.

This process becomes algebraically intensive and, in this parameterization, leads to a nonlinear equation in $\beta_1$ that is awkward to solve analytically.

Step 5: Conclusion#

In this parameterization, minimizing the sum of squared horizontal residuals leads to a nonlinear system in $\beta_1$ and $\beta_0$ with no simple closed-form solution.

Therefore, to find $\beta_1$ and $\beta_0$ that minimize SSH, we typically turn to numerical methods.

Numerical Solution Approach#

Given the complexity of the equations, the typical steps to find β1 and β0 numerically are:

  1. Initialize β1 and β0:

    Start with initial guesses for β1 and β0, possibly using the OLS estimates.

  2. Iterative Optimization:

    Use an optimization algorithm to adjust β1 and β0 to minimize SSH.

    • Gradient Descent:

      Update parameters using the gradients computed from the partial derivatives.

      $\beta_1^{(k+1)} = \beta_1^{(k)} - \alpha \left(\frac{\partial SSH}{\partial \beta_1}\right)\bigg|_{\beta_1^{(k)}, \beta_0^{(k)}}$
      $\beta_0^{(k+1)} = \beta_0^{(k)} - \alpha \left(\frac{\partial SSH}{\partial \beta_0}\right)\bigg|_{\beta_1^{(k)}, \beta_0^{(k)}}$

      where $\alpha$ is the learning rate.

    • Newton-Raphson Method:

      Update parameters using second-order derivatives (Hessian matrix).

    • Optimization Libraries:

      Use built-in optimization functions from statistical software or programming libraries.

  3. Convergence Check:

    Iterate until the changes in β1 and β0 are below a predefined threshold, or until SSH stops decreasing significantly.

  4. Solution:

    The values of β1 and β0 at convergence are the estimates that minimize the sum of squared horizontal residuals.
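The iterative recipe above can be sketched in a few lines using a general-purpose optimizer; the use of scipy.optimize.minimize with Nelder-Mead here is an assumed tool choice, not part of the original code, and the OLS estimates serve as the starting point as suggested in step 1:

import numpy as np
from scipy.optimize import minimize

def ssh(params, x, y):
    # Sum of squared horizontal residuals for the line y = beta1 * x + beta0
    beta1, beta0 = params
    x_hat = (y - beta0) / beta1
    return np.sum((x - x_hat)**2)

x = np.array([0.0, 2.0, 3.0])
y = np.array([1.0, 1.0, 4.0])

beta1_ols, beta0_ols = np.polyfit(x, y, 1)    # OLS estimates as the starting point
result = minimize(ssh, x0=[beta1_ols, beta0_ols], args=(x, y), method="Nelder-Mead")
print(result.x)   # converges to approximately (1.5, -0.5) for these points (cf. the exercise below)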

Key Points#

  • Nonlinear Optimization:

    Minimizing SSH results in nonlinear equations without closed-form solutions.

  • Numerical Methods:

    Practical implementation requires numerical optimization techniques.

  • Comparison with Vertical Residuals:

    Unlike vertical residual minimization, which yields analytical solutions, horizontal residual minimization is more computationally intensive.

Exercise : Minimizing Horizontal Residuals#

Given the data points:

  • (0,1)

  • (2,1)

  • (3,4)

We aim to find the regression line of the form $y = \beta_1 x + \beta_0$ that minimizes the sum of squared horizontal residuals.

Step 1: Organize the Data#

First, let’s organize the given data points and compute the necessary sums.

Data Point | $x_i$ | $y_i$ | $x_i^2$ | $x_i y_i$
---------- | ----- | ----- | ------- | ---------
1          | 0     | 1     | 0       | 0
2          | 2     | 1     | 4       | 2
3          | 3     | 4     | 9       | 12
Total      | 5     | 6     | 13      | 14

Calculations:

  • Number of data points, $n = 3$

  • Sum of $x_i$: $\sum x_i = 0 + 2 + 3 = 5$

  • Sum of $y_i$: $\sum y_i = 1 + 1 + 4 = 6$

  • Sum of $x_i^2$: $\sum x_i^2 = 0^2 + 2^2 + 3^2 = 0 + 4 + 9 = 13$

  • Sum of $x_i y_i$: $\sum x_i y_i = 0 \times 1 + 2 \times 1 + 3 \times 4 = 0 + 2 + 12 = 14$

Step 2: Compute the Means#

Calculate the mean of $x$ and $y$:

$\bar{x} = \frac{\sum x_i}{n} = \frac{5}{3} \approx 1.6667, \qquad \bar{y} = \frac{\sum y_i}{n} = \frac{6}{3} = 2$

Step 3: Formulate the Objective Function#

When minimizing horizontal residuals, we aim to minimize the sum of squared differences between the observed x-values and the predicted x-values on the regression line.

For each data point $(x_i, y_i)$, the predicted x-value ($\hat{x}_i$) corresponding to $y_i$ is derived from the regression equation:

$y_i = \beta_1 \hat{x}_i + \beta_0 \quad\Longrightarrow\quad \hat{x}_i = \frac{y_i - \beta_0}{\beta_1}$

The horizontal residual ($e_i$) is:

$e_i = x_i - \hat{x}_i = x_i - \frac{y_i - \beta_0}{\beta_1}$

The Sum of Squared Horizontal Residuals (SSH) is:

$SSH = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(x_i - \frac{y_i - \beta_0}{\beta_1}\right)^2$

Step 4: Minimize the Sum of Squared Horizontal Residuals#

To find the values of β1 and β0 that minimize SSH, we take partial derivatives of SSH with respect to β1 and β0, set them equal to zero, and solve the resulting equations.

Partial Derivatives#

a. Partial Derivative with Respect to $\beta_1$#

$\frac{\partial SSH}{\partial \beta_1} = 2 \sum_{i=1}^{n} e_i \frac{\partial e_i}{\partial \beta_1} = 2 \sum_{i=1}^{n} \left(x_i - \frac{y_i - \beta_0}{\beta_1}\right)\frac{y_i - \beta_0}{\beta_1^2}$

b. Partial Derivative with Respect to $\beta_0$#

$\frac{\partial SSH}{\partial \beta_0} = 2 \sum_{i=1}^{n} e_i \frac{\partial e_i}{\partial \beta_0} = 2 \sum_{i=1}^{n} \left(x_i - \frac{y_i - \beta_0}{\beta_1}\right)\frac{1}{\beta_1}$

Setting Partial Derivatives to Zero#

a. Setting $\frac{\partial SSH}{\partial \beta_1} = 0$#

$2 \sum_{i=1}^{n} \left(x_i - \frac{y_i - \beta_0}{\beta_1}\right)\frac{y_i - \beta_0}{\beta_1^2} = 0 \quad\Longrightarrow\quad \sum_{i=1}^{n} \left(x_i - \frac{y_i - \beta_0}{\beta_1}\right)(y_i - \beta_0) = 0$

$\sum_{i=1}^{n} e_i (y_i - \beta_0) = 0 \qquad \text{(Equation 1)}$

b. Setting $\frac{\partial SSH}{\partial \beta_0} = 0$#

$2 \sum_{i=1}^{n} \left(x_i - \frac{y_i - \beta_0}{\beta_1}\right)\frac{1}{\beta_1} = 0 \quad\Longrightarrow\quad \sum_{i=1}^{n} \left(x_i - \frac{y_i - \beta_0}{\beta_1}\right) = 0$

$\sum_{i=1}^{n} e_i = 0 \qquad \text{(Equation 2)}$

Solving the Equations#

Equation 2:#

$\sum_{i=1}^{n} e_i = 0 \quad\Longrightarrow\quad e_1 + e_2 + e_3 = 0$

Substituting the residuals:

$\left(0 - \frac{1 - \beta_0}{\beta_1}\right) + \left(2 - \frac{1 - \beta_0}{\beta_1}\right) + \left(3 - \frac{4 - \beta_0}{\beta_1}\right) = 0$

Group the fraction terms:

$-\left(\frac{1 - \beta_0}{\beta_1} + \frac{1 - \beta_0}{\beta_1} + \frac{4 - \beta_0}{\beta_1}\right) + 2 + 3 = 0$

$-\frac{6 - 3\beta_0}{\beta_1} + 5 = 0 \quad\Longrightarrow\quad \frac{6 - 3\beta_0}{\beta_1} = 5 \quad\Longrightarrow\quad 6 - 3\beta_0 = 5\beta_1 \quad\Longrightarrow\quad 3\beta_0 + 5\beta_1 = 6 \qquad \text{(Equation A)}$

Equation 1:#

$\sum_{i=1}^{n} e_i (y_i - \beta_0) = 0$

Substituting the residuals:

$\left(-\frac{1 - \beta_0}{\beta_1}\right)(1 - \beta_0) + \left(2 - \frac{1 - \beta_0}{\beta_1}\right)(1 - \beta_0) + \left(3 - \frac{4 - \beta_0}{\beta_1}\right)(4 - \beta_0) = 0$

Expand each product:

$-\frac{(1 - \beta_0)^2}{\beta_1} + \left(2(1 - \beta_0) - \frac{(1 - \beta_0)^2}{\beta_1}\right) + \left(3(4 - \beta_0) - \frac{(4 - \beta_0)^2}{\beta_1}\right) = 0$

$-\frac{2(1 - \beta_0)^2}{\beta_1} + 2(1 - \beta_0) + 12 - 3\beta_0 - \frac{(4 - \beta_0)^2}{\beta_1} = 0$

$-\frac{2(1 - \beta_0)^2 + (4 - \beta_0)^2}{\beta_1} + 2(1 - \beta_0) + 12 - 3\beta_0 = 0$

Multiply both sides by $\beta_1$ to eliminate the denominator:

$-2(1 - \beta_0)^2 - (4 - \beta_0)^2 + \beta_1\left[2(1 - \beta_0) + 12 - 3\beta_0\right] = 0$

Expand and simplify:

$-2(1 - 2\beta_0 + \beta_0^2) - (16 - 8\beta_0 + \beta_0^2) + \beta_1(14 - 5\beta_0) = 0$

$-2 + 4\beta_0 - 2\beta_0^2 - 16 + 8\beta_0 - \beta_0^2 + 14\beta_1 - 5\beta_0\beta_1 = 0$

$-18 + 12\beta_0 - 3\beta_0^2 + 14\beta_1 - 5\beta_0\beta_1 = 0$

Now, substitute $\beta_0$ from Equation A:

$3\beta_0 + 5\beta_1 = 6 \quad\Longrightarrow\quad \beta_0 = \frac{6 - 5\beta_1}{3}$

Substitute $\beta_0$ into the equation:

$-18 + 12\left(\frac{6 - 5\beta_1}{3}\right) - 3\left(\frac{6 - 5\beta_1}{3}\right)^2 + 14\beta_1 - 5\left(\frac{6 - 5\beta_1}{3}\right)\beta_1 = 0$

Simplify each term:

  1. First Term: $-18$

  2. Second Term: $12 \times \frac{6 - 5\beta_1}{3} = 4(6 - 5\beta_1) = 24 - 20\beta_1$

  3. Third Term: $-3 \times \left(\frac{6 - 5\beta_1}{3}\right)^2 = -\frac{(6 - 5\beta_1)^2}{3}$. Expanding $(6 - 5\beta_1)^2 = 36 - 60\beta_1 + 25\beta_1^2$ gives $-\frac{36 - 60\beta_1 + 25\beta_1^2}{3} = -12 + 20\beta_1 - \frac{25}{3}\beta_1^2$

  4. Fourth Term: $+14\beta_1$

  5. Fifth Term: $-5 \times \frac{6 - 5\beta_1}{3} \times \beta_1 = -\frac{30\beta_1 - 25\beta_1^2}{3} = -10\beta_1 + \frac{25}{3}\beta_1^2$

Combine all terms:

$-18 + (24 - 20\beta_1) + \left(-12 + 20\beta_1 - \frac{25}{3}\beta_1^2\right) + 14\beta_1 + \left(-10\beta_1 + \frac{25}{3}\beta_1^2\right) = 0$

Combine like terms:

  • Constants: $-18 + 24 - 12 = -6$

  • $\beta_1$ terms: $-20\beta_1 + 20\beta_1 + 14\beta_1 - 10\beta_1 = 4\beta_1$

  • $\beta_1^2$ terms: $-\frac{25}{3}\beta_1^2 + \frac{25}{3}\beta_1^2 = 0$

Thus, the equation simplifies to:

$-6 + 4\beta_1 = 0 \quad\Longrightarrow\quad 4\beta_1 = 6 \quad\Longrightarrow\quad \beta_1 = \frac{6}{4} = 1.5$

Calculate $\beta_0$:#

Using Equation A:

$3\beta_0 + 5\beta_1 = 6 \quad\Longrightarrow\quad 3\beta_0 + 5(1.5) = 6 \quad\Longrightarrow\quad 3\beta_0 + 7.5 = 6 \quad\Longrightarrow\quad 3\beta_0 = -1.5 \quad\Longrightarrow\quad \beta_0 = -0.5$

Step 5: Formulate the Regression Line#

Using the calculated values of $\beta_1$ and $\beta_0$, the regression line is:

$y = 1.5x - 0.5$

Step 6: Interpretation#

The regression line $y = 1.5x - 0.5$ best fits the given data points by minimizing the sum of squared horizontal residuals. This means that the total squared differences between the observed x-values and the x-values predicted by this line are the smallest possible compared to any other line.

Verification#

Let’s verify the residuals for each data point:

  1. For (0,1): $y = 1.5(0) - 0.5 = -0.5$; $\hat{x} = \frac{y_i - \beta_0}{\beta_1} = \frac{1 - (-0.5)}{1.5} = \frac{1.5}{1.5} = 1$, so $e = x_i - \hat{x} = 0 - 1 = -1$ and $e^2 = (-1)^2 = 1$

  2. For (2,1): $y = 1.5(2) - 0.5 = 3 - 0.5 = 2.5$; $\hat{x} = \frac{1 - (-0.5)}{1.5} = \frac{1.5}{1.5} = 1$, so $e = 2 - 1 = 1$ and $e^2 = (1)^2 = 1$

  3. For (3,4): $y = 1.5(3) - 0.5 = 4.5 - 0.5 = 4$; $\hat{x} = \frac{4 - (-0.5)}{1.5} = \frac{4.5}{1.5} = 3$, so $e = 3 - 3 = 0$ and $e^2 = (0)^2 = 0$

Sum of Squared Horizontal Residuals:

$SSH = 1 + 1 + 0 = 2$

This confirms that the chosen line minimizes the sum of squared horizontal residuals for the given data points.
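The same verification can be reproduced in a few lines of NumPy (a sketch using the values computed above):

import numpy as np

x = np.array([0.0, 2.0, 3.0])
y = np.array([1.0, 1.0, 4.0])
beta1, beta0 = 1.5, -0.5

x_hat = (y - beta0) / beta1   # predicted x-values on the line
e = x - x_hat                 # horizontal residuals
print(x_hat)                  # [1. 1. 3.]
print(e)                      # [-1.  1.  0.]
print(np.sum(e**2))           # 2.0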

Conclusion#

By applying the method of minimizing horizontal residuals, we derived the regression line $y = 1.5x - 0.5$, which best fits the data points (0,1), (2,1), and (3,4) by minimizing the sum of squared horizontal residuals. This line provides the most accurate linear relationship between $x$ and $y$ in the horizontal direction, based on the given data.

Minimizing Perpendicular Residuals#

Introduction#

In linear regression analysis, the Ordinary Least Squares (OLS) method is widely used to determine the best-fitting line by minimizing the vertical residuals, which are the differences between the observed y-values and the predicted y-values from the regression line. However, in certain scenarios, especially when there is measurement error in both x and y variables, it is more appropriate to minimize the perpendicular residuals—the shortest (orthogonal) distances from each data point to the regression line. This approach is known as Total Least Squares (TLS) or Orthogonal Regression.

This derivation provides a comprehensive, step-by-step explanation of how to minimize the sum of squared perpendicular residuals to find the regression parameters β1 (slope) and β0 (intercept).

Problem Definition#

Given a set of data points $(x_i, y_i)$ for $i = 1, 2, \ldots, n$, we aim to find the parameters $\beta_1$ and $\beta_0$ in the regression equation:

$y = \beta_1 x + \beta_0$

that minimize the Sum of Squared Perpendicular Residuals (SSPR):

$SSPR = \sum_{i=1}^{n} e_i^2$

where $e_i$ is the perpendicular (orthogonal) residual for the i-th data point.

Expressing Perpendicular Residuals#

For each data point $(x_i, y_i)$, the perpendicular residual $e_i$ is the shortest distance from the point to the regression line. The formula for the perpendicular distance from a point $(x_i, y_i)$ to the line $y = \beta_1 x + \beta_0$ is derived from geometry.

1. General Formula for Distance from a Point to a Line#

In 2D geometry, the distance $d$ from a point $(x_0, y_0)$ to a line defined by $Ax + By + C = 0$ is given by:

$d = \frac{|Ax_0 + By_0 + C|}{\sqrt{A^2 + B^2}}$

2. Rearranging the Regression Line Equation#

The regression line equation $y = \beta_1 x + \beta_0$ can be rewritten in the standard form $Ax + By + C = 0$:

$\beta_1 x - y + \beta_0 = 0$

Here, the coefficients are:

  • $A = \beta_1$

  • $B = -1$

  • $C = \beta_0$

3. Substituting into the Distance Formula#

Using the point $(x_i, y_i)$ and the line coefficients, the perpendicular residual $e_i$ is:

$e_i = \frac{|\beta_1 x_i - y_i + \beta_0|}{\sqrt{\beta_1^2 + 1}}$

Since residuals in regression can be positive or negative (indicating direction), we often omit the absolute value to preserve the sign:

$e_i = \frac{\beta_1 x_i - y_i + \beta_0}{\sqrt{\beta_1^2 + 1}}$

Objective Function#

Our objective is to minimize the Sum of Squared Perpendicular Residuals (SSPR):

$SSPR(\beta_1, \beta_0) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(\frac{\beta_1 x_i - y_i + \beta_0}{\sqrt{\beta_1^2 + 1}}\right)^2$

Simplifying:

$SSPR = \frac{1}{\beta_1^2 + 1} \sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0)^2$

Minimization Process#

To find the values of β1 and β0 that minimize SSPR, we perform the following steps:

  1. Compute Partial Derivatives of SSPR with Respect to β1 and β0

  2. Set the Partial Derivatives to Zero to Obtain Normal Equations

  3. Solve the System of Equations to Find β1 and β0

Step 1: Compute Partial Derivatives#

a. Partial Derivative with Respect to $\beta_1$#

Compute $\frac{\partial SSPR}{\partial \beta_1}$:

$\frac{\partial SSPR}{\partial \beta_1} = \frac{\partial}{\partial \beta_1}\left(\frac{1}{\beta_1^2 + 1} \sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0)^2\right)$

Apply the quotient rule and chain rule:

$\frac{\partial SSPR}{\partial \beta_1} = -\frac{2\beta_1}{(\beta_1^2 + 1)^2} \sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0)^2 + \frac{2}{\beta_1^2 + 1} \sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0) x_i$

Simplify by factoring out common terms:

$\frac{\partial SSPR}{\partial \beta_1} = \frac{2}{\beta_1^2 + 1}\left(\sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0) x_i - \frac{\beta_1}{\beta_1^2 + 1}\sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0)^2\right)$

b. Partial Derivative with Respect to $\beta_0$#

Compute $\frac{\partial SSPR}{\partial \beta_0}$:

$\frac{\partial SSPR}{\partial \beta_0} = \frac{\partial}{\partial \beta_0}\left(\frac{1}{\beta_1^2 + 1} \sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0)^2\right)$

Apply the chain rule:

$\frac{\partial SSPR}{\partial \beta_0} = \frac{2}{\beta_1^2 + 1} \sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0)$

Step 2: Set Partial Derivatives to Zero#

To find the minima, set the partial derivatives equal to zero:

a. Setting $\frac{\partial SSPR}{\partial \beta_1} = 0$#

$\frac{2}{\beta_1^2 + 1}\left(\sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0) x_i - \frac{\beta_1}{\beta_1^2 + 1}\sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0)^2\right) = 0$

Since $\frac{2}{\beta_1^2 + 1}$ is always positive, the equation simplifies to:

$\sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0) x_i - \frac{\beta_1}{\beta_1^2 + 1}\sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0)^2 = 0$

b. Setting $\frac{\partial SSPR}{\partial \beta_0} = 0$#

$\frac{2}{\beta_1^2 + 1} \sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0) = 0 \quad\Longrightarrow\quad \sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0) = 0$

Step 3: Derive the Normal Equations#

We now have a system of two equations:

  1. Equation (1):

    $\sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0) x_i - \frac{\beta_1}{\beta_1^2 + 1}\sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0)^2 = 0$

  2. Equation (2):

    $\sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0) = 0$

Simplifying Equation (2):#

$\sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0) = 0$

Expand the summation:

$\beta_1 \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} y_i + n\beta_0 = 0$

Solve for $\beta_0$:

$\beta_0 = \frac{\sum_{i=1}^{n} y_i - \beta_1 \sum_{i=1}^{n} x_i}{n}$

Substituting $\beta_0$ into Equation (1):#

First, substitute $\beta_0$ into Equation (1):

$\sum_{i=1}^{n} \left(\beta_1 x_i - y_i + \frac{\sum_{j=1}^{n} y_j - \beta_1 \sum_{j=1}^{n} x_j}{n}\right) x_i - \frac{\beta_1}{\beta_1^2 + 1}\sum_{i=1}^{n} \left(\beta_1 x_i - y_i + \frac{\sum_{j=1}^{n} y_j - \beta_1 \sum_{j=1}^{n} x_j}{n}\right)^2 = 0$

This substitution leads to a complex, nonlinear equation in β1, which typically cannot be solved analytically.

Step 4: Solving the System of Equations#

Due to the complexity of the equations derived, especially Equation (1), an analytical solution for β1 and β0 is not feasible. Instead, we employ numerical methods to approximate the solutions.

a. Total Least Squares (TLS) Approach#

Total Least Squares minimizes the sum of squared perpendicular residuals by considering errors in both x and y directions. The TLS solution can be efficiently obtained using Singular Value Decomposition (SVD).

Steps to Compute TLS:#
  1. Center the Data:

    Subtract the mean of x and y from each data point to center the data around the origin.

    $\tilde{x}_i = x_i - \bar{x}, \qquad \tilde{y}_i = y_i - \bar{y}$

    where:

    $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$

  2. Form the Data Matrix:

    Create a matrix $D$ where each row represents a centered data point:

    $D = \begin{bmatrix} \tilde{x}_1 & \tilde{y}_1 \\ \tilde{x}_2 & \tilde{y}_2 \\ \vdots & \vdots \\ \tilde{x}_n & \tilde{y}_n \end{bmatrix}$

  3. Perform Singular Value Decomposition (SVD):

    Decompose matrix $D$ using SVD:

    $D = U \Sigma V^\top$

    • $U$ is an $n \times n$ orthogonal matrix.

    • $\Sigma$ is an $n \times 2$ diagonal matrix with the singular values.

    • $V$ is a $2 \times 2$ orthogonal matrix whose columns are the right singular vectors.

  4. Determine the Best-Fit Line:

    The best-fit line is determined by the right singular vector corresponding to the smallest singular value in $\Sigma$. Let this vector be $\begin{bmatrix} a \\ b \end{bmatrix}$.

    The slope $\beta_1$ is:

    $\beta_1 = -\frac{a}{b}$

    The intercept $\beta_0$ is then:

    $\beta_0 = \bar{y} - \beta_1 \bar{x}$
Rationale:#

The right singular vector corresponding to the smallest singular value indicates the direction of least variance, which aligns with minimizing the perpendicular distances from the data points to the regression line.
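A compact NumPy sketch of this SVD recipe (illustrative; the function name tls_fit is my own, and the example reuses the three points from the exercise below):

import numpy as np

def tls_fit(x, y):
    # Total least squares (orthogonal regression) via SVD of the centered data matrix
    x_bar, y_bar = x.mean(), y.mean()
    D = np.column_stack((x - x_bar, y - y_bar))
    _, _, Vt = np.linalg.svd(D)
    a, b = Vt[-1]                   # right singular vector for the smallest singular value
    beta1 = -a / b                  # (a, b) is normal to the best-fit line
    beta0 = y_bar - beta1 * x_bar
    return beta1, beta0

x = np.array([0.0, 2.0, 3.0])
y = np.array([1.0, 1.0, 4.0])
print(tls_fit(x, y))  # approximately (1.1805, 0.0326)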

b. Numerical Optimization Approach#

Alternatively, numerical optimization techniques can be employed to minimize SSPR directly.

Steps to Perform Numerical Optimization:#
  1. Define the Objective Function:

    The objective function to minimize is SSPR:

    $SSPR(\beta_1, \beta_0) = \sum_{i=1}^{n} \left(\frac{\beta_1 x_i - y_i + \beta_0}{\sqrt{\beta_1^2 + 1}}\right)^2$

    Simplify:

    $SSPR = \frac{1}{\beta_1^2 + 1} \sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0)^2$
  2. Choose Initial Estimates:

    Start with initial guesses for β1 and β0. These can be the OLS estimates or any reasonable approximation.

  3. Select an Optimization Algorithm:

    Utilize algorithms such as:

    • Gradient Descent

    • Newton-Raphson Method

    • Quasi-Newton Methods (e.g., BFGS)

    • Conjugate Gradient Method

  4. Implement the Optimization:

    Use optimization techniques to iteratively adjust β1 and β0 to minimize SSPR.

  5. Iterate Until Convergence:

    Continue updating β1 and β0 until the changes in SSPR or the parameters themselves are below a predefined threshold.

  6. Obtain the Optimal Parameters:

    The values of β1 and β0 at convergence are the estimates that minimize the sum of squared perpendicular residuals.

Step 5: Practical Implementation Example#

While numerical methods and SVD provide robust solutions for minimizing perpendicular residuals, the focus here is on understanding the mathematical derivation rather than implementation. However, it’s essential to recognize that these methods require computational tools to handle the complexity of the equations involved.
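For readers who do want to experiment, a minimal sketch of the direct numerical route is shown below; the use of scipy.optimize.minimize with Nelder-Mead is an assumed tool choice, and the SVD recipe above gives the same answer for this kind of data:

import numpy as np
from scipy.optimize import minimize

def sspr(params, x, y):
    # Sum of squared perpendicular residuals for the line y = beta1 * x + beta0
    beta1, beta0 = params
    return np.sum((beta1 * x - y + beta0)**2) / (beta1**2 + 1)

x = np.array([0.0, 2.0, 3.0])
y = np.array([1.0, 1.0, 4.0])

beta1_ols, beta0_ols = np.polyfit(x, y, 1)   # OLS estimates as the starting point
result = minimize(sspr, x0=[beta1_ols, beta0_ols], args=(x, y), method="Nelder-Mead")
print(result.x)    # approximately [1.1805, 0.0326]
print(result.fun)  # approximately 1.278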

Conclusion#

Minimizing the sum of squared perpendicular residuals provides a more geometrically accurate fit, especially in scenarios where both x and y measurements contain errors. Unlike the OLS method, which offers a closed-form solution by minimizing vertical residuals, the Total Least Squares (TLS) method typically requires computational techniques such as Singular Value Decomposition (SVD) or numerical optimization algorithms to determine the optimal regression parameters β1 and β0.

Key Differences Between OLS and TLS:#

  • Objective:

    • OLS: Minimizes the sum of squared vertical residuals.

    • TLS: Minimizes the sum of squared perpendicular residuals.

  • Assumptions:

    • OLS: Assumes errors are only in the y-direction.

    • TLS: Accounts for errors in both x and y-directions.

  • Solution:

    • OLS: Provides analytical solutions for β1 and β0.

    • TLS: Requires numerical methods or SVD for solutions.

Understanding the distinction between these methods is crucial for selecting the appropriate regression technique based on the nature of the data and the underlying assumptions about measurement errors.

Exercise : Minimizing Perpendicular Residuals#

Given the data points:

  • (0,1)

  • (2,1)

  • (3,4)

We aim to find the regression line of the form $y = \beta_1 x + \beta_0$ that minimizes the sum of squared perpendicular residuals.

Step 1: Define the Perpendicular Distance#

The perpendicular distance ($d_i$) from a point $(x_i, y_i)$ to the line $y = \beta_1 x + \beta_0$ is given by the formula:

$d_i = \frac{|\beta_1 x_i - y_i + \beta_0|}{\sqrt{\beta_1^2 + 1}}$

The Sum of Squared Perpendicular Residuals (SSPR) is then:

$SSPR = \sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} \left(\frac{\beta_1 x_i - y_i + \beta_0}{\sqrt{\beta_1^2 + 1}}\right)^2$

Simplifying:

$SSPR = \frac{1}{\beta_1^2 + 1} \sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0)^2$

Step 2: Expand the SSPR Expression#

For our data points (0,1), (2,1), and (3,4), the SSPR becomes:

$SSPR = \frac{1}{\beta_1^2 + 1}\left[(\beta_1 \cdot 0 - 1 + \beta_0)^2 + (\beta_1 \cdot 2 - 1 + \beta_0)^2 + (\beta_1 \cdot 3 - 4 + \beta_0)^2\right]$

$SSPR = \frac{1}{\beta_1^2 + 1}\left[(\beta_0 - 1)^2 + (2\beta_1 + \beta_0 - 1)^2 + (3\beta_1 + \beta_0 - 4)^2\right]$

Step 3: Set Up the Minimization Problem#

To minimize SSPR, we take partial derivatives with respect to β1 and β0, set them equal to zero, and solve the resulting equations.

a. Partial Derivative with Respect to β1#

$\frac{\partial SSPR}{\partial \beta_1} = -\frac{2\beta_1}{(\beta_1^2 + 1)^2}\left[(\beta_0 - 1)^2 + (2\beta_1 + \beta_0 - 1)^2 + (3\beta_1 + \beta_0 - 4)^2\right] + \frac{2}{\beta_1^2 + 1}\left[2(2\beta_1 + \beta_0 - 1) + 3(3\beta_1 + \beta_0 - 4)\right] = 0$

b. Partial Derivative with Respect to β0#

$\frac{\partial SSPR}{\partial \beta_0} = \frac{2}{\beta_1^2 + 1}\left[(\beta_0 - 1) + (2\beta_1 + \beta_0 - 1) + (3\beta_1 + \beta_0 - 4)\right] = 0$

Simplifying:

$\frac{2}{\beta_1^2 + 1}\left[3\beta_0 + 5\beta_1 - 6\right] = 0$

Since $\frac{2}{\beta_1^2 + 1}$ is always positive, we have:

$3\beta_0 + 5\beta_1 - 6 = 0 \quad\Longrightarrow\quad 3\beta_0 + 5\beta_1 = 6 \qquad \text{(Equation 1)}$

Step 4: Solve the System of Equations#

Given the complexity of the partial derivatives, especially with $\beta_1$, an analytical solution can be intricate. However, with only three data points, we can proceed by making reasonable substitutions.

a. From Equation 1:#

$3\beta_0 + 5\beta_1 = 6 \quad\Longrightarrow\quad \beta_0 = \frac{6 - 5\beta_1}{3}$

b. Substitute $\beta_0$ into the Partial Derivative with Respect to $\beta_1$#

Substituting $\beta_0 = \frac{6 - 5\beta_1}{3}$ into the partial derivative equation is algebraically intensive and may not yield a straightforward analytical solution. Therefore, it’s practical to employ numerical methods or optimization techniques to solve for $\beta_1$ and subsequently $\beta_0$.

Step 5: Numerical Solution Approach#

Given the complexity of the equations, we’ll use the following numerical approach to approximate the values of $\beta_1$ and $\beta_0$.

a. Choose an Initial Estimate#

Start with an initial guess for β1. A reasonable starting point is the slope obtained from the Ordinary Least Squares (OLS) method.

From OLS, the slope $\beta_1^{OLS}$ is calculated as:

$\beta_1^{OLS} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - (\sum x_i)^2} = \frac{3 \times 14 - 5 \times 6}{3 \times 13 - 5^2} = \frac{42 - 30}{39 - 25} = \frac{12}{14} = \frac{6}{7} \approx 0.8571$

Using this, $\beta_0$ is:

$\beta_0^{OLS} = \frac{\sum y_i - \beta_1^{OLS} \sum x_i}{n} = \frac{6 - 0.8571 \times 5}{3} = \frac{6 - 4.2855}{3} = \frac{1.7145}{3} \approx 0.5715$

b. Iterative Optimization#

Using the OLS estimates as starting points:

$\beta_1^{(0)} = 0.8571, \qquad \beta_0^{(0)} = 0.5715$

Objective: Minimize SSPR(β1,β0).

Procedure:

  1. Calculate SSPR for the current estimates.

  2. Compute the partial derivatives SSPRβ1 and SSPRβ0.

  3. Update the estimates using a suitable optimization algorithm (e.g., Gradient Descent).

  4. Repeat until convergence is achieved (i.e., changes in β1 and β0 are below a predefined threshold).

Given the small size of the dataset, convergence can be achieved quickly.

c. Example Iteration#

For illustrative purposes, let’s perform one iteration using the Newton-Raphson method.

Newton-Raphson Update Rules:

$\beta_1^{(new)} = \beta_1^{(old)} - \frac{\partial SSPR / \partial \beta_1}{\partial^2 SSPR / \partial \beta_1^2}, \qquad \beta_0^{(new)} = \beta_0^{(old)} - \frac{\partial SSPR / \partial \beta_0}{\partial^2 SSPR / \partial \beta_0^2}$

Note: Calculating second-order derivatives is beyond the scope of this step-by-step guide. In practice, software tools or numerical libraries handle these computations.

d. Convergence#

Repeat the iterative updates until $\beta_1$ and $\beta_0$ stabilize within a small tolerance (e.g., $10^{-6}$).

Step 6: Final Regression Line#

After performing the iterative optimization (or, equivalently, applying the SVD-based Total Least Squares recipe from the previous section), we obtain the estimates:

$\beta_1 \approx 1.1805, \qquad \beta_0 \approx 0.0326$

Thus, the regression line is:

$y \approx 1.18x + 0.03$

Note that these values satisfy Equation 1: $3\beta_0 + 5\beta_1 \approx 6$.

Step 7: Verification#

To verify the accuracy of the regression line, calculate the perpendicular residuals for each data point, using $\sqrt{\beta_1^2 + 1} = \sqrt{1.1805^2 + 1} \approx 1.547$.

a. For (0,1):#

$d = \frac{|1.1805 \times 0 - 1 + 0.0326|}{\sqrt{1.1805^2 + 1}} = \frac{0.9674}{1.547} \approx 0.625$

$d^2 \approx 0.391$

b. For (2,1):#

$d = \frac{|1.1805 \times 2 - 1 + 0.0326|}{\sqrt{1.1805^2 + 1}} = \frac{1.3936}{1.547} \approx 0.901$

$d^2 \approx 0.811$

c. For (3,4):#

$d = \frac{|1.1805 \times 3 - 4 + 0.0326|}{\sqrt{1.1805^2 + 1}} = \frac{0.4259}{1.547} \approx 0.275$

$d^2 \approx 0.076$

Sum of Squared Perpendicular Residuals:

$SSPR \approx 0.391 + 0.811 + 0.076 = 1.278$

This confirms that the chosen line minimizes the sum of squared perpendicular residuals for the given data points.
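The verification can be reproduced in a few lines of NumPy (a sketch using the rounded estimates above):

import numpy as np

x = np.array([0.0, 2.0, 3.0])
y = np.array([1.0, 1.0, 4.0])
beta1, beta0 = 1.1805, 0.0326

d = np.abs(beta1 * x - y + beta0) / np.sqrt(beta1**2 + 1)
print(d**2)           # approximately [0.391, 0.811, 0.076]
print(np.sum(d**2))   # approximately 1.278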

Conclusion#

By minimizing the sum of squared perpendicular residuals, we derived the regression line $y \approx 1.18x + 0.03$, which best fits the data points (0,1), (2,1), and (3,4) in terms of minimizing the perpendicular distances. This approach accounts for errors in both the $x$ and $y$ directions, providing a more balanced fit compared to methods that consider only vertical residuals.

Note: For precise calculations and multiple iterations required for convergence, it’s recommended to use numerical optimization tools or statistical software.

Implement 3 types of residuals and a regression line in Python#

Note : This version of the code is implemented using Plotly and may not work in a static Jupyter Book. Please download this Jupyter Notebook and run it on your local system.

1. Import necessary libraries#

Explanation of Libraries Used in this code#

  • NumPy: Provides support for numerical computations and data manipulation. Used for generating data points and performing mathematical operations.

  • Plotly: A graphing library that creates interactive visualizations, used here for plotting the scatter plot, regression line, and residuals.

  • ipywidgets: Allows creation of interactive sliders and dropdowns for real-time updates to the plot as the user adjusts slope, intercept, and distance type.

  • IPython Display: Embeds interactive elements like widgets and plots within the Jupyter Notebook.

  • time: Measures the execution time of the program.

import numpy as np
import plotly.graph_objs as go
from ipywidgets import FloatSlider, Dropdown, Layout, HBox, VBox, interactive_output, HTML
from IPython.display import display
import time
start_time = time.time()

2. Generate random linear data#

This block generates random linear data for x and y.

  • x: A sequence of 50 evenly spaced values between -5 and 5.

  • y: A linear function of x with added random noise to simulate real-world variations.

np.random.seed(20)
x = np.linspace(-5, 5, 50)
y = 0.5 * x + np.random.normal(size=x.size)

3. Define the function for perpendicular projection#

This function calculates the perpendicular projection of a point (x0, y0) onto a line defined by its slope and intercept. The function returns the projected point on the line (x_proj, y_proj).

def perpendicular_projection(x0, y0, slope, intercept):
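    # Foot of the perpendicular from (x0, y0) onto the line y = slope * x + intercept,
    # obtained by minimizing (x - x0)**2 + (slope * x + intercept - y0)**2 over x.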
    x_proj = (x0 + slope * (y0 - intercept)) / (slope**2 + 1)
    y_proj = slope * x_proj + intercept
    return x_proj, y_proj

4. Define the function to plot regression and residuals#

This function creates an interactive plot showing the data points, a regression line, and the residual distances between the data points and the line. The residuals can be calculated using:

  • Vertical Distance: The vertical distance between the data point and the line.

  • Horizontal Distance: The horizontal distance between the data point and the line.

  • Perpendicular Distance: The shortest distance between the data point and the line.

The plot also displays the Sum of Squared Distances (SSD), a measure of the model’s total error, which is updated dynamically as the slope and intercept change.

def plot_regression_plotly(slope=1.0, intercept=0.0, distance_type="vertical"):
    # Compute the fitted regression line
    y_pred = slope * x + intercept

    # Initialize traces for the plot
    data = []
    
    # Trace for the data points
    data.append(go.Scatter(x=x, y=y, mode='markers', name='Data points', marker=dict(color='black')))
    
    # Trace for the fitted regression line
    line_x = np.linspace(-6, 6, 100)
    line_y = slope * line_x + intercept
    data.append(go.Scatter(x=line_x, y=line_y, mode='lines', name=f'Fitted line: y = {slope:.2f}x + {intercept:.2f}', line=dict(color='red')))
    
    # Add residual lines and calculate SSD
    ssd = 0
    for i in range(len(x)):
        if distance_type == "vertical":
            # Vertical distance (difference in y)
            data.append(go.Scatter(x=[x[i], x[i]], y=[y[i], y_pred[i]], mode='lines', line=dict(color='pink', dash='dash')))
            ssd += (y[i] - y_pred[i]) ** 2
        elif distance_type == "horizontal":
            # Horizontal distance (difference in x)
            x_proj = (y[i] - intercept) / slope
            data.append(go.Scatter(x=[x[i], x_proj], y=[y[i], y[i]], mode='lines', line=dict(color='green', dash='dash')))
            ssd += (x[i] - x_proj) ** 2
        elif distance_type == "perpendicular":
            # Perpendicular distance
            x_proj, y_proj = perpendicular_projection(x[i], y[i], slope, intercept)
            data.append(go.Scatter(x=[x[i], x_proj], y=[y[i], y_proj], mode='lines', line=dict(color='blue', dash='dash')))
            perp_dist = np.sqrt((x[i] - x_proj)**2 + (y[i] - y_proj)**2)
            ssd += perp_dist ** 2
    
    # Create the layout for the plot with larger size
    layout = go.Layout(
        title=f'Sum of squared distances ({distance_type}): {ssd:.2f}',
        xaxis=dict(title='x', range=[-6, 6]),
        yaxis=dict(title='y', range=[-6, 6]),
        showlegend=True,
        width=900,  
        height=600,  
        margin=dict(l=40, r=40, t=40, b=40)  
    )
    
    # Create the figure and display it
    fig = go.Figure(data=data, layout=layout)
    fig.show()

5. Create interactive widgets#

This block creates interactive widgets using ipywidgets:

  • Slope Slider: Allows the user to adjust the slope of the regression line.

  • Intercept Slider: Allows the user to adjust the intercept of the regression line.

  • Distance Type Dropdown: Lets the user choose how the distances (residuals) are calculated—either vertically, horizontally, or perpendicularly.

slope_slider = FloatSlider(value=1.0, min=-3.0, max=3.0, step=0.1, layout=Layout(width='300px'))
intercept_slider = FloatSlider(value=0.0, min=-5.0, max=5.0, step=0.1, layout=Layout(width='300px'))
distance_type_dropdown = Dropdown(options=["vertical", "horizontal", "perpendicular"], layout=Layout(width='300px'))
slope_label = HTML(value=f"<b>Slope:</b> {slope_slider.value}")
intercept_label = HTML(value=f"<b>Intercept:</b> {intercept_slider.value}")
distance_type_label = HTML(value=f"<b>Distance type:</b> {distance_type_dropdown.value}")

6. Update labels dynamically#

This function updates the text labels for slope, intercept, and distance type dynamically as the user interacts with the sliders and dropdown menu. It ensures the displayed labels always reflect the current settings.

# Function to update the labels dynamically
def update_labels(change):
    slope_label.value = f"<b>Slope:</b> {slope_slider.value:.2f}"
    intercept_label.value = f"<b>Intercept:</b> {intercept_slider.value:.2f}"
    distance_type_label.value = f"<b>Distance type:</b> {distance_type_dropdown.value}"

7. Attach the update function to widgets#

In this block, the update_labels function is attached to the slope and intercept sliders and the distance type dropdown. This ensures that every time the user modifies a value, the corresponding labels update.

slope_slider.observe(update_labels, names='value')
intercept_slider.observe(update_labels, names='value')
distance_type_dropdown.observe(update_labels, names='value')

8. Arrange widgets in a horizontal layout#

This block arranges the sliders and dropdown widgets in a horizontal box (HBox) for a clean and organized layout within the notebook. Each control (slope, intercept, distance type) is placed side by side.

controls = HBox([VBox([slope_label, slope_slider]), VBox([intercept_label, intercept_slider]), VBox([distance_type_label, distance_type_dropdown])])

9. Define the function to update the plot#

This function updates the plot based on the current values of the slope, intercept, and selected distance type. Every time the user interacts with the widgets, this function recalculates the residuals and updates the plot accordingly.

def update_plot(slope, intercept, distance_type):
    plot_regression_plotly(slope, intercept, distance_type)

10. Display the interactive plot and controls#

This block combines the interactive controls (sliders and dropdown) with the plot output. It uses interactive_output to link the plot to the widgets, so the plot updates dynamically when the user changes any value.

output = interactive_output(update_plot, {'slope': slope_slider, 'intercept': intercept_slider, 'distance_type': distance_type_dropdown})

# Display the controls and the plot
display(controls, output)
end_time = time.time()

Visit the online and local app using Streamlit.#

  • Online app on Streamlit:
    I programmed another version of this app using Streamlit and uploaded it to Streamlit Cloud. If you want to visit it: Click here

    Note : If you get a 403 error when clicking on this link, you will need to use a VPN.

  • Run the app locally on your computer: If you cannot reach the app online, you can run it locally on your computer.

    1. Download the streamlit_app.py from this repository.

    2. Install Streamlit via the command line: pip install streamlit

    3. Run the file using the following command: streamlit run "path_to_the_file"

Useful Tool for a better understanding#

For a better understanding of the Least Squares Method, please visit this link : chasereynolds

References#