Linear Least Squares Method#


Project Description#

This code provides an interactive visualization of different methods to compute the Least Squares Regression Line using vertical, horizontal, and perpendicular residuals. The goal of this visualization is to allow users to manually adjust the slope (β1) and intercept (β0) of a regression line and compare it with the automatically computed Least Squares Line for each residual type.

The code calculates the Sum of Squared Distances (SSD) for both the user-defined line and the automatically computed least squares line. The user can interactively visualize the effect of different slope and intercept values on the regression line’s fit, using sliders for adjustments.

Mathematical Insights:#

  1. Linear Regression with Different Residuals:

    • In standard Least Squares Regression, the goal is to minimize the vertical distance between the data points and the regression line. This approach uses the formula:

      $y_i = \beta_1 x_i + \beta_0 + \epsilon_i$

      where:

      • $\beta_1$ is the slope of the line.

      • $\beta_0$ is the intercept of the line (the value of $y$ when $x = 0$).

      • $\epsilon_i$ represents the residual for the i-th data point, which is the difference between the actual value $y_i$ and the predicted value $\hat{y}_i$.

    • The goal is to find values for $\beta_1$ and $\beta_0$ that minimize the sum of the squared residuals (SSR):

      $SSR = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

      Note: We are using this as the cost function to find the best fitted line.

  2. Three Types of Residuals:

    • This code allows users to visualize three different types of residuals:

      • Vertical Residuals: The vertical difference between the data point and the regression line, commonly used in standard Least Squares Regression.

        $\text{Vertical Residual} = y_i - \hat{y}_i = y_i - (\beta_1 x_i + \beta_0)$

      • Horizontal Residuals: The horizontal distance between the data point and the regression line. This method measures the deviation along the x-axis.

        $\text{Horizontal Residual} = x_i - \hat{x}_i$

        where $\hat{x}_i$ is found by solving for $x$ when $y_i = \beta_1 x + \beta_0$, leading to:

        $\hat{x}_i = \frac{y_i - \beta_0}{\beta_1}$

      • Perpendicular Residuals: The shortest (perpendicular) distance from the data point to the regression line, computed using geometric methods. This approach provides a more accurate geometric fit but is not typically used in standard regression.

        $\text{Perpendicular Distance} = \frac{|\beta_1 x_i - y_i + \beta_0|}{\sqrt{\beta_1^2 + 1}}$
      • Derivation of the Perpendicular Residual Formula

        • The perpendicular residual quantifies the distance from a point $(x_i, y_i)$ to a regression line given by:

          $y = \beta_1 x + \beta_0,$

          where:

          • $\beta_1$ is the slope,

          • $\beta_0$ is the intercept.

          The perpendicular residual, unlike the vertical residual, is measured perpendicularly from the point to the regression line.

        • 1. General Formula for the Distance from a Point to a Line

          In 2D geometry, the formula for the perpendicular distance from a point $(x_0, y_0)$ to a line of the form $Ax + By + C = 0$ is:

          $d = \frac{|Ax_0 + By_0 + C|}{\sqrt{A^2 + B^2}}.$

        • 2. Rearranging the Regression Line Equation

          To apply this formula to our regression line, we first rewrite the line equation $y = \beta_1 x + \beta_0$ in the form $Ax + By + C = 0$. Rearranging the terms, we get:

          $\beta_1 x - y + \beta_0 = 0.$

          Here, we can identify:

          • $A = \beta_1$,

          • $B = -1$,

          • $C = \beta_0$.

        • 3. Substituting into the Perpendicular Distance Formula

          Now, using the point $(x_i, y_i)$ in the distance formula:

          $d = \frac{|\beta_1 x_i - y_i + \beta_0|}{\sqrt{\beta_1^2 + (-1)^2}}.$

          The denominator simplifies to $\sqrt{\beta_1^2 + 1}$, giving us:

          $d = \frac{|\beta_1 x_i - y_i + \beta_0|}{\sqrt{\beta_1^2 + 1}}.$

        • 4. Removing the Absolute Value for Residuals

          In regression analysis, residuals are typically signed, indicating whether the data point lies above or below the line. Thus, instead of using the absolute value, we keep the sign:

          $e_i = \frac{\beta_1 x_i - y_i + \beta_0}{\sqrt{\beta_1^2 + 1}}.$

          This formula expresses the perpendicular residual, which measures the signed perpendicular distance from the point $(x_i, y_i)$ to the regression line.



  3. Sum of Squared Distances (SSD):

    • In this code, the Sum of Squared Distances (SSD) is computed dynamically for both the user-defined line and the least squares line for each type of residual.

    • The formula for SSD is similar to the sum of squared residuals:

      $SSD = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (\text{Residual}_i)^2$

      The goal is to minimize the SSD by adjusting $\beta_1$ and $\beta_0$. The code visualizes how different residual types affect the computed SSD. A short Python sketch of the signed perpendicular residual and its SSD follows this list.
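As a small illustration of the signed perpendicular residual derived above and the SSD built from it, here is a minimal NumPy sketch (the function name, example points, and example line are illustrative choices of mine, not part of the visualization code presented later):

import numpy as np

def perpendicular_residuals(x, y, slope, intercept):
    # Signed perpendicular distances from the points (x, y) to the line y = slope * x + intercept
    return (slope * x - y + intercept) / np.sqrt(slope**2 + 1)

x = np.array([0.0, 2.0, 3.0])
y = np.array([1.0, 1.0, 4.0])
e = perpendicular_residuals(x, y, slope=1.0, intercept=0.0)  # example line y = x
print(np.sum(e**2))  # SSD for the perpendicular residuals; here 1.5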

Key Features:#

  1. Interactive Plot:

    • The plot displays the data points along with both the user-defined regression line and the least squares regression line. The user can manually adjust the slope (β1) and intercept (β0) via sliders, and the plot updates in real-time to show the new lines.

  2. Residual Visualization:

    • The distances (residuals) between each data point and the regression line are visualized as lines on the plot. The user can switch between three types of residuals (vertical, horizontal, and perpendicular) using a dropdown menu.

  3. Sum of Squared Distances (SSD):

    • The SSD for both the user-defined line and the least squares line is displayed on the plot. The SSD is recalculated as the user adjusts the slope and intercept, helping users understand the effect of different residual types on the overall error.

  4. Mathematical Insight:

    • The code offers a deep understanding of the least squares method by allowing users to explore different types of residuals and see their impact on the regression line and SSD. Users can learn why vertical residuals are typically used in the classic least squares method and how alternative methods (horizontal and perpendicular) affect the fit.

Minimizing Vertical Residuals (Least Squares Method):#

In standard linear regression, we aim to find the best-fitting straight line through a set of data points by minimizing the sum of squared vertical residuals. This method is known as the Ordinary Least Squares (OLS) regression. The residuals are the vertical distances (errors) between the observed values and the values predicted by the linear model.

Problem Definition#

Given a set of data points $(x_i, y_i)$ for $i = 1, 2, \ldots, n$, we wish to find the parameters $\beta_1$ (slope) and $\beta_0$ (intercept) in the linear equation:

$y_i = \beta_1 x_i + \beta_0 + \epsilon_i$

where $\epsilon_i$ is the residual for the i-th data point.

Our objective is to find $\beta_1$ and $\beta_0$ that minimize the Sum of Squared Residuals (SSR):

$SSR(\beta_1, \beta_0) = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_1 x_i - \beta_0)^2$

Minimization Process#

To find the values of β1 and β0 that minimize SSR, we take partial derivatives of SSR with respect to β1 and β0, set them equal to zero, and solve the resulting equations.

Step 1: Compute Partial Derivatives#

a. Partial Derivative with respect to $\beta_1$#

Compute $\frac{\partial SSR}{\partial \beta_1}$:

$\frac{\partial SSR}{\partial \beta_1} = \frac{\partial}{\partial \beta_1} \sum_{i=1}^{n} (y_i - \beta_1 x_i - \beta_0)^2 = \sum_{i=1}^{n} 2 (y_i - \beta_1 x_i - \beta_0)(-x_i)$

Simplify:

$\frac{\partial SSR}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i (y_i - \beta_1 x_i - \beta_0)$

b. Partial Derivative with respect to $\beta_0$#

Compute $\frac{\partial SSR}{\partial \beta_0}$:

$\frac{\partial SSR}{\partial \beta_0} = \frac{\partial}{\partial \beta_0} \sum_{i=1}^{n} (y_i - \beta_1 x_i - \beta_0)^2 = \sum_{i=1}^{n} 2 (y_i - \beta_1 x_i - \beta_0)(-1)$

Simplify:

$\frac{\partial SSR}{\partial \beta_0} = -2 \sum_{i=1}^{n} (y_i - \beta_1 x_i - \beta_0)$

Step 2: Set Partial Derivatives to Zero#

a. Setting $\frac{\partial SSR}{\partial \beta_1} = 0$#

$-2 \sum_{i=1}^{n} x_i (y_i - \beta_1 x_i - \beta_0) = 0$

Divide both sides by $-2$:

$\sum_{i=1}^{n} x_i (y_i - \beta_1 x_i - \beta_0) = 0$

b. Setting $\frac{\partial SSR}{\partial \beta_0} = 0$#

$-2 \sum_{i=1}^{n} (y_i - \beta_1 x_i - \beta_0) = 0$

Divide both sides by $-2$:

$\sum_{i=1}^{n} (y_i - \beta_1 x_i - \beta_0) = 0$

Step 3: Derive the Normal Equations#

a. First Normal Equation (from derivative w.r.t. $\beta_0$)#

We have:

$\sum_{i=1}^{n} (y_i - \beta_1 x_i - \beta_0) = 0$

Simplify the summation:

$\sum_{i=1}^{n} y_i - \beta_1 \sum_{i=1}^{n} x_i - n\beta_0 = 0$

Rewriting:

$n\beta_0 + \beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$

b. Second Normal Equation (from derivative w.r.t. $\beta_1$)#

We have:

$\sum_{i=1}^{n} x_i (y_i - \beta_1 x_i - \beta_0) = 0$

Simplify the summation:

$\sum_{i=1}^{n} x_i y_i - \beta_1 \sum_{i=1}^{n} x_i^2 - \beta_0 \sum_{i=1}^{n} x_i = 0$

Rewriting:

$\beta_0 \sum_{i=1}^{n} x_i + \beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$

Step 4: Solve the System of Equations#

We have two normal equations:

  1. $n\beta_0 + \beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$

  2. $\beta_0 \sum_{i=1}^{n} x_i + \beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$

Let’s denote:

  • $S_x = \sum_{i=1}^{n} x_i$

  • $S_y = \sum_{i=1}^{n} y_i$

  • $S_{xx} = \sum_{i=1}^{n} x_i^2$

  • $S_{xy} = \sum_{i=1}^{n} x_i y_i$

Then the normal equations become:

  1. $n\beta_0 + \beta_1 S_x = S_y$

  2. $\beta_0 S_x + \beta_1 S_{xx} = S_{xy}$

Solving for β0 and β1#

From the first equation:

$\beta_0 = \frac{S_y - \beta_1 S_x}{n}$

Substitute $\beta_0$ into the second equation:

$\left(\frac{S_y - \beta_1 S_x}{n}\right) S_x + \beta_1 S_{xx} = S_{xy}$

Simplify:

$\frac{S_y S_x - \beta_1 S_x^2}{n} + \beta_1 S_{xx} = S_{xy}$

Multiply both sides by n to eliminate the denominator:

$S_y S_x - \beta_1 S_x^2 + n \beta_1 S_{xx} = n S_{xy}$

Group terms involving $\beta_1$:

$-\beta_1 S_x^2 + n \beta_1 S_{xx} = n S_{xy} - S_y S_x$

Factor out $\beta_1$ on the left side:

$\beta_1 (-S_x^2 + n S_{xx}) = n S_{xy} - S_x S_y$

Rewriting:

$\beta_1 (n S_{xx} - S_x^2) = n S_{xy} - S_x S_y$

Thus, the solution for $\beta_1$ is:

$\beta_1 = \frac{n S_{xy} - S_x S_y}{n S_{xx} - S_x^2}$

Once $\beta_1$ is known, we can find $\beta_0$:

$\beta_0 = \frac{S_y - \beta_1 S_x}{n}$

Step 5: Express in Terms of Means#

Let’s define the sample means:

  • $\bar{x} = \frac{S_x}{n}$

  • $\bar{y} = \frac{S_y}{n}$

Also define:

  • $\mathrm{Cov}(x, y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \frac{S_{xy} - n \bar{x} \bar{y}}{n}$

  • $\mathrm{Var}(x) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{S_{xx} - n \bar{x}^2}{n}$

Expressing $\beta_1$ in terms of covariance and variance:

$\beta_1 = \frac{n S_{xy} - S_x S_y}{n S_{xx} - S_x^2} = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)}$

Similarly, $\beta_0$ becomes:

$\beta_0 = \bar{y} - \beta_1 \bar{x}$

Summary of Results#

  • Slope ($\beta_1$):

    $\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$

  • Intercept ($\beta_0$):

    $\beta_0 = \bar{y} - \beta_1 \bar{x}$

These formulas provide the least squares estimates of the slope and intercept that minimize the sum of squared vertical residuals.
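A minimal NumPy sketch of these closed-form estimates (illustrative only; the helper name ols_fit is my own, and the example data are the three points used in the exercise below):

import numpy as np

def ols_fit(x, y):
    # Closed-form OLS estimates: beta1 = Cov(x, y) / Var(x), beta0 = y_bar - beta1 * x_bar
    x_bar, y_bar = x.mean(), y.mean()
    beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar)**2)
    beta0 = y_bar - beta1 * x_bar
    return beta1, beta0

x = np.array([0.0, 2.0, 3.0])
y = np.array([1.0, 1.0, 4.0])
print(ols_fit(x, y))  # approximately (0.8571, 0.5714), i.e. 6/7 and 4/7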

Key Points#

  • Ordinary Least Squares (OLS) minimizes the sum of squared vertical residuals between observed and predicted values.

  • The normal equations derived from setting the partial derivatives to zero provide a system of linear equations to solve for β1 and β0.

  • The final formulas for β1 and β0 are expressed in terms of the sums of the data and their means.

Conclusion#

By following this detailed derivation, we have obtained explicit formulas for the regression coefficients β1 and β0 that minimize the sum of squared vertical residuals. These formulas are fundamental in linear regression analysis and are widely used due to their simplicity and efficiency in computation.

Exercise: Minimizing Vertical Residuals (Least Squares Method)#

Given the data points:

  • (0,1)

  • (2,1)

  • (3,4)

We aim to find the regression line of the form $y = \beta_1 x + \beta_0$ that minimizes the sum of squared vertical residuals.

Step 1: Organize the Data#

First, let’s organize the given data points and compute the necessary sums.

Data Point | $x_i$ | $y_i$ | $x_i^2$ | $x_i y_i$
---------- | ----- | ----- | ------- | ---------
1          | 0     | 1     | 0       | 0
2          | 2     | 1     | 4       | 2
3          | 3     | 4     | 9       | 12
Total      | 5     | 6     | 13      | 14

Calculations:

  • Number of data points, $n = 3$

  • Sum of $x_i$: $\sum x_i = 0 + 2 + 3 = 5$

  • Sum of $y_i$: $\sum y_i = 1 + 1 + 4 = 6$

  • Sum of $x_i^2$: $\sum x_i^2 = 0^2 + 2^2 + 3^2 = 0 + 4 + 9 = 13$

  • Sum of $x_i y_i$: $\sum x_i y_i = 0 \times 1 + 2 \times 1 + 3 \times 4 = 0 + 2 + 12 = 14$

Step 2: Compute the Means#

Calculate the mean of $x$ and $y$:

$\bar{x} = \frac{\sum x_i}{n} = \frac{5}{3} \approx 1.6667, \qquad \bar{y} = \frac{\sum y_i}{n} = \frac{6}{3} = 2$

Step 3: Apply the OLS Formulas#

The OLS estimates for the slope ($\beta_1$) and intercept ($\beta_0$) are given by:

$\beta_1 = \frac{\sum x_i y_i - n \bar{x} \bar{y}}{\sum x_i^2 - n \bar{x}^2}, \qquad \beta_0 = \bar{y} - \beta_1 \bar{x}$

Calculate $\beta_1$:#

Substitute the known values into the formula for $\beta_1$:

$\beta_1 = \frac{14 - 3 \times \frac{5}{3} \times 2}{13 - 3 \times \left(\frac{5}{3}\right)^2}$

Simplify the Numerator and Denominator:

  1. Numerator: $14 - 3 \times \frac{5}{3} \times 2 = 14 - 10 = 4$

  2. Denominator: $13 - 3 \times \frac{25}{9} = 13 - \frac{75}{9} = 13 - 8.\overline{3} = 4.\overline{6} = \frac{14}{3}$

Compute $\beta_1$:

$\beta_1 = \frac{4}{\frac{14}{3}} = \frac{4 \times 3}{14} = \frac{12}{14} = \frac{6}{7} \approx 0.8571$

Calculate $\beta_0$:#

$\beta_0 = \bar{y} - \beta_1 \bar{x} = 2 - \frac{6}{7} \times \frac{5}{3}$

Compute the Product:

$\frac{6}{7} \times \frac{5}{3} = \frac{30}{21} = \frac{10}{7} \approx 1.4286$

Subtract from $\bar{y}$:

$\beta_0 = 2 - \frac{10}{7} = \frac{14 - 10}{7} = \frac{4}{7} \approx 0.5714$

Step 4: Formulate the Regression Line#

Using the calculated values of $\beta_1$ and $\beta_0$, the regression line is:

$y = \frac{6}{7}x + \frac{4}{7} \approx 0.8571x + 0.5714$

Step 5: Interpretation#

The regression line $y \approx 0.8571x + 0.5714$ best fits the given data points by minimizing the sum of squared vertical residuals. This means that the total squared differences between the observed y-values and the values predicted by this line are the smallest possible compared to any other line.

Verification#

Let’s verify the residuals for each data point:

  1. For (0,1): $\hat{y} = 0.8571 \times 0 + 0.5714 = 0.5714$, so $e = y - \hat{y} = 1 - 0.5714 = 0.4286$

  2. For (2,1): $\hat{y} = 0.8571 \times 2 + 0.5714 = 1.7142 + 0.5714 = 2.2856$, so $e = 1 - 2.2856 = -1.2856$

  3. For (3,4): $\hat{y} = 0.8571 \times 3 + 0.5714 = 2.5713 + 0.5714 = 3.1427$, so $e = 4 - 3.1427 = 0.8573$

Sum of Squared Residuals:

$SSR = (0.4286)^2 + (-1.2856)^2 + (0.8573)^2 \approx 0.1837 + 1.6528 + 0.7350 \approx 2.5714$

This confirms that the chosen line minimizes the sum of squared vertical residuals for the given data points.

By applying the Ordinary Least Squares method, we derived the regression line $y = \frac{6}{7}x + \frac{4}{7} \approx 0.8571x + 0.5714$, which best fits the data points (0,1), (2,1), and (3,4) by minimizing the sum of squared vertical residuals. This line provides the most accurate linear relationship between $x$ and $y$ based on the given data.
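The arithmetic above can be cross-checked numerically; for example, NumPy's polyfit performs exactly this vertical least-squares fit for a degree-1 polynomial (a quick sketch):

import numpy as np

x = np.array([0.0, 2.0, 3.0])
y = np.array([1.0, 1.0, 4.0])

beta1, beta0 = np.polyfit(x, y, 1)   # degree-1 fit minimizes squared vertical residuals
residuals = y - (beta1 * x + beta0)
print(beta1, beta0)                  # approximately 0.8571 and 0.5714
print(np.sum(residuals**2))          # approximately 2.5714 (= 18/7)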

Minimizing Horizontal Residuals:#

In regression analysis, we typically minimize the vertical residuals, which are the differences between the observed y-values and the predicted y-values from the regression line. However, in some cases, we might be interested in minimizing the horizontal residuals, which are the differences in the x-direction between the observed data points and the regression line.

This derivation provides a step-by-step explanation of how to minimize the sum of squared horizontal residuals to find the regression parameters β1 (slope) and β0 (intercept).

Problem Definition#

Given a set of data points $(x_i, y_i)$ for $i = 1, 2, \ldots, n$, we aim to find the parameters $\beta_1$ and $\beta_0$ in the regression equation:

$y = \beta_1 x + \beta_0$

that minimize the Sum of Squared Horizontal Residuals (SSH):

$SSH = \sum_{i=1}^{n} (x_i - \hat{x}_i)^2$

where $\hat{x}_i$ is the predicted x-value corresponding to $y_i$ on the regression line.

Expressing Horizontal Residuals#

For each data point $(x_i, y_i)$:

  1. Predicted x-value ($\hat{x}_i$):

    From the regression equation:

    $y_i = \beta_1 \hat{x}_i + \beta_0$

    Solving for $\hat{x}_i$:

    $\hat{x}_i = \frac{y_i - \beta_0}{\beta_1}$

  2. Horizontal Residual ($e_i$):

    $e_i = x_i - \hat{x}_i = x_i - \frac{y_i - \beta_0}{\beta_1}$

Objective Function#

Our goal is to minimize the Sum of Squared Horizontal Residuals (SSH):

$SSH(\beta_1, \beta_0) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(x_i - \frac{y_i - \beta_0}{\beta_1}\right)^2$

Minimization Process#

To find the values of β1 and β0 that minimize SSH, we take partial derivatives of SSH with respect to β1 and β0, set them equal to zero, and solve the resulting equations.

Step 1: Compute Partial Derivatives#

a. Partial Derivative with respect to $\beta_1$#

Compute $\frac{\partial SSH}{\partial \beta_1}$:

First, write SSH explicitly:

$SSH = \sum_{i=1}^{n} \left(x_i - \frac{y_i - \beta_0}{\beta_1}\right)^2$

Let $e_i = x_i - \frac{y_i - \beta_0}{\beta_1}$. Then:

$\frac{\partial e_i}{\partial \beta_1} = \frac{\partial}{\partial \beta_1}\left(x_i - \frac{y_i - \beta_0}{\beta_1}\right) = \frac{y_i - \beta_0}{\beta_1^2}$

Compute the partial derivative:

$\frac{\partial SSH}{\partial \beta_1} = 2 \sum_{i=1}^{n} e_i \frac{\partial e_i}{\partial \beta_1} = 2 \sum_{i=1}^{n} e_i \frac{y_i - \beta_0}{\beta_1^2}$

b. Partial Derivative with respect to $\beta_0$#

Compute $\frac{\partial SSH}{\partial \beta_0}$:

$\frac{\partial e_i}{\partial \beta_0} = \frac{\partial}{\partial \beta_0}\left(x_i - \frac{y_i - \beta_0}{\beta_1}\right) = \frac{1}{\beta_1}$

Compute the partial derivative:

$\frac{\partial SSH}{\partial \beta_0} = 2 \sum_{i=1}^{n} e_i \frac{\partial e_i}{\partial \beta_0} = 2 \sum_{i=1}^{n} e_i \frac{1}{\beta_1}$

Step 2: Set Partial Derivatives to Zero#

a. Setting $\frac{\partial SSH}{\partial \beta_1} = 0$#

$2 \sum_{i=1}^{n} e_i \frac{y_i - \beta_0}{\beta_1^2} = 0 \quad\Longrightarrow\quad \sum_{i=1}^{n} e_i (y_i - \beta_0) = 0$

This is Equation (1).

b. Setting $\frac{\partial SSH}{\partial \beta_0} = 0$#

$2 \sum_{i=1}^{n} e_i \frac{1}{\beta_1} = 0 \quad\Longrightarrow\quad \sum_{i=1}^{n} e_i = 0$

This is Equation (2).

Step 3: Express ei in Terms of Known Quantities#

Recall that:

$e_i = x_i - \frac{y_i - \beta_0}{\beta_1}$

Simplify:

$e_i = x_i - \frac{y_i}{\beta_1} + \frac{\beta_0}{\beta_1}$

Step 4: Substitute ei into the Equations#

Equation (2):#

$\sum_{i=1}^{n} e_i = \sum_{i=1}^{n} \left(x_i - \frac{y_i}{\beta_1} + \frac{\beta_0}{\beta_1}\right) = 0$

Simplify:

$\sum_{i=1}^{n} x_i - \frac{1}{\beta_1}\sum_{i=1}^{n} y_i + \frac{n\beta_0}{\beta_1} = 0$

Multiply both sides by $\beta_1$:

$\beta_1 \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} y_i + n\beta_0 = 0$

Rewriting:

$n\beta_0 = \sum_{i=1}^{n} y_i - \beta_1 \sum_{i=1}^{n} x_i$

So,

$\beta_0 = \frac{\sum_{i=1}^{n} y_i - \beta_1 \sum_{i=1}^{n} x_i}{n}$

Equation (1):#

$\sum_{i=1}^{n} e_i (y_i - \beta_0) = \sum_{i=1}^{n} \left(x_i - \frac{y_i}{\beta_1} + \frac{\beta_0}{\beta_1}\right)(y_i - \beta_0) = 0$

Simplify the terms:

$\sum_{i=1}^{n} x_i (y_i - \beta_0) - \frac{1}{\beta_1}\sum_{i=1}^{n} y_i (y_i - \beta_0) + \frac{\beta_0}{\beta_1}\sum_{i=1}^{n} (y_i - \beta_0) = 0$

Let’s denote:

  • $S_{xy} = \sum_{i=1}^{n} x_i y_i$

  • $S_{x\beta_0} = \beta_0 \sum_{i=1}^{n} x_i$

  • $S_{yy} = \sum_{i=1}^{n} y_i^2$

  • $S_{y\beta_0} = \beta_0 \sum_{i=1}^{n} y_i$

  • $S_{\beta_0\beta_0} = n\beta_0^2$

  • $S_y = \sum_{i=1}^{n} y_i$

  • $S_{\beta_0} = n\beta_0$

Now rewrite Equation (1):

$\sum_{i=1}^{n} x_i y_i - \beta_0 \sum_{i=1}^{n} x_i - \frac{1}{\beta_1}\left(\sum_{i=1}^{n} y_i^2 - \beta_0 \sum_{i=1}^{n} y_i\right) + \frac{\beta_0}{\beta_1}\left(\sum_{i=1}^{n} y_i - n\beta_0\right) = 0$

Simplify:

$S_{xy} - \beta_0 S_x - \frac{S_{yy}}{\beta_1} + \frac{\beta_0 S_y}{\beta_1} + \frac{\beta_0 S_y}{\beta_1} - \frac{n\beta_0^2}{\beta_1} = 0$

Combine like terms:

$S_{xy} - \beta_0 S_x - \frac{S_{yy}}{\beta_1} + \frac{2\beta_0 S_y}{\beta_1} - \frac{n\beta_0^2}{\beta_1} = 0$

Multiply both sides by $\beta_1$ to eliminate denominators:

$\beta_1 S_{xy} - \beta_1 \beta_0 S_x - S_{yy} + 2\beta_0 S_y - n\beta_0^2 = 0$

Now, recall that from earlier:

$\beta_0 = \frac{S_y - \beta_1 S_x}{n}$

Substitute $\beta_0$ into the equation to get an equation in $\beta_1$ only.

This process becomes algebraically intensive and, in this parameterization, leads to a nonlinear equation in $\beta_1$ that is awkward to solve analytically.

Step 5: Conclusion#

In this parameterization, minimizing the sum of squared horizontal residuals leads to a nonlinear system in $\beta_1$ and $\beta_0$ with no simple closed-form solution.

Therefore, to find $\beta_1$ and $\beta_0$ that minimize SSH, we typically turn to numerical methods.

Numerical Solution Approach#

Given the complexity of the equations, the typical steps to find β1 and β0 numerically are:

  1. Initialize β1 and β0:

    Start with initial guesses for β1 and β0, possibly using the OLS estimates.

  2. Iterative Optimization:

    Use an optimization algorithm to adjust β1 and β0 to minimize SSH.

    • Gradient Descent:

      Update parameters using the gradients computed from the partial derivatives.

      $\beta_1^{(k+1)} = \beta_1^{(k)} - \alpha \left(\frac{\partial SSH}{\partial \beta_1}\right)\bigg|_{\beta_1^{(k)}, \beta_0^{(k)}}$
      $\beta_0^{(k+1)} = \beta_0^{(k)} - \alpha \left(\frac{\partial SSH}{\partial \beta_0}\right)\bigg|_{\beta_1^{(k)}, \beta_0^{(k)}}$

      where $\alpha$ is the learning rate.

    • Newton-Raphson Method:

      Update parameters using second-order derivatives (Hessian matrix).

    • Optimization Libraries:

      Use built-in optimization functions from statistical software or programming libraries.

  3. Convergence Check:

    Iterate until the changes in β1 and β0 are below a predefined threshold, or until SSH stops decreasing significantly.

  4. Solution:

    The values of β1 and β0 at convergence are the estimates that minimize the sum of squared horizontal residuals.
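The iterative recipe above can be sketched in a few lines using a general-purpose optimizer; the use of scipy.optimize.minimize with Nelder-Mead here is an assumed tool choice, not part of the original code, and the OLS estimates serve as the starting point as suggested in step 1:

import numpy as np
from scipy.optimize import minimize

def ssh(params, x, y):
    # Sum of squared horizontal residuals for the line y = beta1 * x + beta0
    beta1, beta0 = params
    x_hat = (y - beta0) / beta1
    return np.sum((x - x_hat)**2)

x = np.array([0.0, 2.0, 3.0])
y = np.array([1.0, 1.0, 4.0])

beta1_ols, beta0_ols = np.polyfit(x, y, 1)    # OLS estimates as the starting point
result = minimize(ssh, x0=[beta1_ols, beta0_ols], args=(x, y), method="Nelder-Mead")
print(result.x)   # converges to approximately (1.5, -0.5) for these points (cf. the exercise below)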

Key Points#

  • Nonlinear Optimization:

    Minimizing SSH results in nonlinear equations without closed-form solutions.

  • Numerical Methods:

    Practical implementation requires numerical optimization techniques.

  • Comparison with Vertical Residuals:

    Unlike vertical residual minimization, which yields analytical solutions, horizontal residual minimization is more computationally intensive.

Exercise : Minimizing Horizontal Residuals#

Given the data points:

  • (0,1)

  • (2,1)

  • (3,4)

We aim to find the regression line of the form $y = \beta_1 x + \beta_0$ that minimizes the sum of squared horizontal residuals.

Step 1: Organize the Data#

First, let’s organize the given data points and compute the necessary sums.

Data Point | $x_i$ | $y_i$ | $x_i^2$ | $x_i y_i$
---------- | ----- | ----- | ------- | ---------
1          | 0     | 1     | 0       | 0
2          | 2     | 1     | 4       | 2
3          | 3     | 4     | 9       | 12
Total      | 5     | 6     | 13      | 14

Calculations:

  • Number of data points, $n = 3$

  • Sum of $x_i$: $\sum x_i = 0 + 2 + 3 = 5$

  • Sum of $y_i$: $\sum y_i = 1 + 1 + 4 = 6$

  • Sum of $x_i^2$: $\sum x_i^2 = 0^2 + 2^2 + 3^2 = 0 + 4 + 9 = 13$

  • Sum of $x_i y_i$: $\sum x_i y_i = 0 \times 1 + 2 \times 1 + 3 \times 4 = 0 + 2 + 12 = 14$

Step 2: Compute the Means#

Calculate the mean of $x$ and $y$:

$\bar{x} = \frac{\sum x_i}{n} = \frac{5}{3} \approx 1.6667, \qquad \bar{y} = \frac{\sum y_i}{n} = \frac{6}{3} = 2$

Step 3: Formulate the Objective Function#

When minimizing horizontal residuals, we aim to minimize the sum of squared differences between the observed x-values and the predicted x-values on the regression line.

For each data point $(x_i, y_i)$, the predicted x-value ($\hat{x}_i$) corresponding to $y_i$ is derived from the regression equation:

$y_i = \beta_1 \hat{x}_i + \beta_0 \quad\Longrightarrow\quad \hat{x}_i = \frac{y_i - \beta_0}{\beta_1}$

The horizontal residual ($e_i$) is:

$e_i = x_i - \hat{x}_i = x_i - \frac{y_i - \beta_0}{\beta_1}$

The Sum of Squared Horizontal Residuals (SSH) is:

$SSH = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(x_i - \frac{y_i - \beta_0}{\beta_1}\right)^2$

Step 4: Minimize the Sum of Squared Horizontal Residuals#

To find the values of β1 and β0 that minimize SSH, we take partial derivatives of SSH with respect to β1 and β0, set them equal to zero, and solve the resulting equations.

Partial Derivatives#

a. Partial Derivative with Respect to $\beta_1$#

$\frac{\partial SSH}{\partial \beta_1} = 2 \sum_{i=1}^{n} e_i \frac{\partial e_i}{\partial \beta_1} = 2 \sum_{i=1}^{n} \left(x_i - \frac{y_i - \beta_0}{\beta_1}\right)\frac{y_i - \beta_0}{\beta_1^2}$

b. Partial Derivative with Respect to $\beta_0$#

$\frac{\partial SSH}{\partial \beta_0} = 2 \sum_{i=1}^{n} e_i \frac{\partial e_i}{\partial \beta_0} = 2 \sum_{i=1}^{n} \left(x_i - \frac{y_i - \beta_0}{\beta_1}\right)\frac{1}{\beta_1}$

Setting Partial Derivatives to Zero#

a. Setting $\frac{\partial SSH}{\partial \beta_1} = 0$#

$2 \sum_{i=1}^{n} \left(x_i - \frac{y_i - \beta_0}{\beta_1}\right)\frac{y_i - \beta_0}{\beta_1^2} = 0 \quad\Longrightarrow\quad \sum_{i=1}^{n} \left(x_i - \frac{y_i - \beta_0}{\beta_1}\right)(y_i - \beta_0) = 0$

$\sum_{i=1}^{n} e_i (y_i - \beta_0) = 0 \qquad \text{(Equation 1)}$

b. Setting $\frac{\partial SSH}{\partial \beta_0} = 0$#

$2 \sum_{i=1}^{n} \left(x_i - \frac{y_i - \beta_0}{\beta_1}\right)\frac{1}{\beta_1} = 0 \quad\Longrightarrow\quad \sum_{i=1}^{n} \left(x_i - \frac{y_i - \beta_0}{\beta_1}\right) = 0$

$\sum_{i=1}^{n} e_i = 0 \qquad \text{(Equation 2)}$

Solving the Equations#

Equation 2:#

$\sum_{i=1}^{n} e_i = 0 \quad\Longrightarrow\quad e_1 + e_2 + e_3 = 0$

Substituting the residuals:

$\left(0 - \frac{1 - \beta_0}{\beta_1}\right) + \left(2 - \frac{1 - \beta_0}{\beta_1}\right) + \left(3 - \frac{4 - \beta_0}{\beta_1}\right) = 0$

Group the fraction terms:

$-\left(\frac{1 - \beta_0}{\beta_1} + \frac{1 - \beta_0}{\beta_1} + \frac{4 - \beta_0}{\beta_1}\right) + 2 + 3 = 0$

$-\frac{6 - 3\beta_0}{\beta_1} + 5 = 0 \quad\Longrightarrow\quad \frac{6 - 3\beta_0}{\beta_1} = 5 \quad\Longrightarrow\quad 6 - 3\beta_0 = 5\beta_1 \quad\Longrightarrow\quad 3\beta_0 + 5\beta_1 = 6 \qquad \text{(Equation A)}$

Equation 1:#

$\sum_{i=1}^{n} e_i (y_i - \beta_0) = 0$

Substituting the residuals:

$\left(-\frac{1 - \beta_0}{\beta_1}\right)(1 - \beta_0) + \left(2 - \frac{1 - \beta_0}{\beta_1}\right)(1 - \beta_0) + \left(3 - \frac{4 - \beta_0}{\beta_1}\right)(4 - \beta_0) = 0$

Expand each product:

$-\frac{(1 - \beta_0)^2}{\beta_1} + \left(2(1 - \beta_0) - \frac{(1 - \beta_0)^2}{\beta_1}\right) + \left(3(4 - \beta_0) - \frac{(4 - \beta_0)^2}{\beta_1}\right) = 0$

$-\frac{2(1 - \beta_0)^2}{\beta_1} + 2(1 - \beta_0) + 12 - 3\beta_0 - \frac{(4 - \beta_0)^2}{\beta_1} = 0$

$-\frac{2(1 - \beta_0)^2 + (4 - \beta_0)^2}{\beta_1} + 2(1 - \beta_0) + 12 - 3\beta_0 = 0$

Multiply both sides by $\beta_1$ to eliminate the denominator:

$-2(1 - \beta_0)^2 - (4 - \beta_0)^2 + \beta_1\left[2(1 - \beta_0) + 12 - 3\beta_0\right] = 0$

Expand and simplify:

$-2(1 - 2\beta_0 + \beta_0^2) - (16 - 8\beta_0 + \beta_0^2) + \beta_1(14 - 5\beta_0) = 0$

$-2 + 4\beta_0 - 2\beta_0^2 - 16 + 8\beta_0 - \beta_0^2 + 14\beta_1 - 5\beta_0\beta_1 = 0$

$-18 + 12\beta_0 - 3\beta_0^2 + 14\beta_1 - 5\beta_0\beta_1 = 0$

Now, substitute $\beta_0$ from Equation A:

$3\beta_0 + 5\beta_1 = 6 \quad\Longrightarrow\quad \beta_0 = \frac{6 - 5\beta_1}{3}$

Substitute $\beta_0$ into the equation:

$-18 + 12\left(\frac{6 - 5\beta_1}{3}\right) - 3\left(\frac{6 - 5\beta_1}{3}\right)^2 + 14\beta_1 - 5\left(\frac{6 - 5\beta_1}{3}\right)\beta_1 = 0$

Simplify each term:

  1. First Term: $-18$

  2. Second Term: $12 \times \frac{6 - 5\beta_1}{3} = 4(6 - 5\beta_1) = 24 - 20\beta_1$

  3. Third Term: $-3 \times \left(\frac{6 - 5\beta_1}{3}\right)^2 = -\frac{(6 - 5\beta_1)^2}{3}$. Expanding $(6 - 5\beta_1)^2 = 36 - 60\beta_1 + 25\beta_1^2$ gives $-\frac{36 - 60\beta_1 + 25\beta_1^2}{3} = -12 + 20\beta_1 - \frac{25}{3}\beta_1^2$

  4. Fourth Term: $+14\beta_1$

  5. Fifth Term: $-5 \times \frac{6 - 5\beta_1}{3} \times \beta_1 = -\frac{30\beta_1 - 25\beta_1^2}{3} = -10\beta_1 + \frac{25}{3}\beta_1^2$

Combine all terms:

$-18 + (24 - 20\beta_1) + \left(-12 + 20\beta_1 - \frac{25}{3}\beta_1^2\right) + 14\beta_1 + \left(-10\beta_1 + \frac{25}{3}\beta_1^2\right) = 0$

Combine like terms:

  • Constants: $-18 + 24 - 12 = -6$

  • $\beta_1$ terms: $-20\beta_1 + 20\beta_1 + 14\beta_1 - 10\beta_1 = 4\beta_1$

  • $\beta_1^2$ terms: $-\frac{25}{3}\beta_1^2 + \frac{25}{3}\beta_1^2 = 0$

Thus, the equation simplifies to:

$-6 + 4\beta_1 = 0 \quad\Longrightarrow\quad 4\beta_1 = 6 \quad\Longrightarrow\quad \beta_1 = \frac{6}{4} = 1.5$

Calculate $\beta_0$:#

Using Equation A:

$3\beta_0 + 5\beta_1 = 6 \quad\Longrightarrow\quad 3\beta_0 + 5(1.5) = 6 \quad\Longrightarrow\quad 3\beta_0 + 7.5 = 6 \quad\Longrightarrow\quad 3\beta_0 = -1.5 \quad\Longrightarrow\quad \beta_0 = -0.5$

Step 5: Formulate the Regression Line#

Using the calculated values of $\beta_1$ and $\beta_0$, the regression line is:

$y = 1.5x - 0.5$

Step 6: Interpretation#

The regression line $y = 1.5x - 0.5$ best fits the given data points by minimizing the sum of squared horizontal residuals. This means that the total squared differences between the observed x-values and the x-values predicted by this line are the smallest possible compared to any other line.

Verification#

Let’s verify the residuals for each data point:

  1. For (0,1): $y = 1.5(0) - 0.5 = -0.5$; $\hat{x} = \frac{y_i - \beta_0}{\beta_1} = \frac{1 - (-0.5)}{1.5} = \frac{1.5}{1.5} = 1$, so $e = x_i - \hat{x} = 0 - 1 = -1$ and $e^2 = (-1)^2 = 1$

  2. For (2,1): $y = 1.5(2) - 0.5 = 3 - 0.5 = 2.5$; $\hat{x} = \frac{1 - (-0.5)}{1.5} = \frac{1.5}{1.5} = 1$, so $e = 2 - 1 = 1$ and $e^2 = (1)^2 = 1$

  3. For (3,4): $y = 1.5(3) - 0.5 = 4.5 - 0.5 = 4$; $\hat{x} = \frac{4 - (-0.5)}{1.5} = \frac{4.5}{1.5} = 3$, so $e = 3 - 3 = 0$ and $e^2 = (0)^2 = 0$

Sum of Squared Horizontal Residuals:

$SSH = 1 + 1 + 0 = 2$

This confirms that the chosen line minimizes the sum of squared horizontal residuals for the given data points.
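The same verification can be reproduced in a few lines of NumPy (a sketch using the values computed above):

import numpy as np

x = np.array([0.0, 2.0, 3.0])
y = np.array([1.0, 1.0, 4.0])
beta1, beta0 = 1.5, -0.5

x_hat = (y - beta0) / beta1   # predicted x-values on the line
e = x - x_hat                 # horizontal residuals
print(x_hat)                  # [1. 1. 3.]
print(e)                      # [-1.  1.  0.]
print(np.sum(e**2))           # 2.0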

Conclusion#

By applying the method of minimizing horizontal residuals, we derived the regression line $y = 1.5x - 0.5$, which best fits the data points (0,1), (2,1), and (3,4) by minimizing the sum of squared horizontal residuals. This line provides the most accurate linear relationship between $x$ and $y$ in the horizontal direction, based on the given data.

Minimizing Perpendicular Residuals#

Introduction#

In linear regression analysis, the Ordinary Least Squares (OLS) method is widely used to determine the best-fitting line by minimizing the vertical residuals, which are the differences between the observed y-values and the predicted y-values from the regression line. However, in certain scenarios, especially when there is measurement error in both x and y variables, it is more appropriate to minimize the perpendicular residuals—the shortest (orthogonal) distances from each data point to the regression line. This approach is known as Total Least Squares (TLS) or Orthogonal Regression.

This derivation provides a comprehensive, step-by-step explanation of how to minimize the sum of squared perpendicular residuals to find the regression parameters β1 (slope) and β0 (intercept).

Problem Definition#

Given a set of data points $(x_i, y_i)$ for $i = 1, 2, \ldots, n$, we aim to find the parameters $\beta_1$ and $\beta_0$ in the regression equation:

$y = \beta_1 x + \beta_0$

that minimize the Sum of Squared Perpendicular Residuals (SSPR):

$SSPR = \sum_{i=1}^{n} e_i^2$

where $e_i$ is the perpendicular (orthogonal) residual for the i-th data point.

Expressing Perpendicular Residuals#

For each data point $(x_i, y_i)$, the perpendicular residual $e_i$ is the shortest distance from the point to the regression line. The formula for the perpendicular distance from a point $(x_i, y_i)$ to the line $y = \beta_1 x + \beta_0$ is derived from geometry.

1. General Formula for Distance from a Point to a Line#

In 2D geometry, the distance $d$ from a point $(x_0, y_0)$ to a line defined by $Ax + By + C = 0$ is given by:

$d = \frac{|Ax_0 + By_0 + C|}{\sqrt{A^2 + B^2}}$

2. Rearranging the Regression Line Equation#

The regression line equation $y = \beta_1 x + \beta_0$ can be rewritten in the standard form $Ax + By + C = 0$:

$\beta_1 x - y + \beta_0 = 0$

Here, the coefficients are:

  • $A = \beta_1$

  • $B = -1$

  • $C = \beta_0$

3. Substituting into the Distance Formula#

Using the point $(x_i, y_i)$ and the line coefficients, the perpendicular residual $e_i$ is:

$e_i = \frac{|\beta_1 x_i - y_i + \beta_0|}{\sqrt{\beta_1^2 + 1}}$

Since residuals in regression can be positive or negative (indicating direction), we often omit the absolute value to preserve the sign:

$e_i = \frac{\beta_1 x_i - y_i + \beta_0}{\sqrt{\beta_1^2 + 1}}$

Objective Function#

Our objective is to minimize the Sum of Squared Perpendicular Residuals (SSPR):

$SSPR(\beta_1, \beta_0) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(\frac{\beta_1 x_i - y_i + \beta_0}{\sqrt{\beta_1^2 + 1}}\right)^2$

Simplifying:

$SSPR = \frac{1}{\beta_1^2 + 1} \sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0)^2$

Minimization Process#

To find the values of β1 and β0 that minimize SSPR, we perform the following steps:

  1. Compute Partial Derivatives of SSPR with Respect to β1 and β0

  2. Set the Partial Derivatives to Zero to Obtain Normal Equations

  3. Solve the System of Equations to Find β1 and β0

Step 1: Compute Partial Derivatives#

a. Partial Derivative with Respect to $\beta_1$#

Compute $\frac{\partial SSPR}{\partial \beta_1}$:

$\frac{\partial SSPR}{\partial \beta_1} = \frac{\partial}{\partial \beta_1}\left(\frac{1}{\beta_1^2 + 1} \sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0)^2\right)$

Apply the quotient rule and chain rule:

$\frac{\partial SSPR}{\partial \beta_1} = -\frac{2\beta_1}{(\beta_1^2 + 1)^2} \sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0)^2 + \frac{2}{\beta_1^2 + 1} \sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0) x_i$

Simplify by factoring out common terms:

$\frac{\partial SSPR}{\partial \beta_1} = \frac{2}{\beta_1^2 + 1}\left(\sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0) x_i - \frac{\beta_1}{\beta_1^2 + 1}\sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0)^2\right)$

b. Partial Derivative with Respect to $\beta_0$#

Compute $\frac{\partial SSPR}{\partial \beta_0}$:

$\frac{\partial SSPR}{\partial \beta_0} = \frac{\partial}{\partial \beta_0}\left(\frac{1}{\beta_1^2 + 1} \sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0)^2\right)$

Apply the chain rule:

$\frac{\partial SSPR}{\partial \beta_0} = \frac{2}{\beta_1^2 + 1} \sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0)$

Step 2: Set Partial Derivatives to Zero#

To find the minima, set the partial derivatives equal to zero:

a. Setting $\frac{\partial SSPR}{\partial \beta_1} = 0$#

$\frac{2}{\beta_1^2 + 1}\left(\sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0) x_i - \frac{\beta_1}{\beta_1^2 + 1}\sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0)^2\right) = 0$

Since $\frac{2}{\beta_1^2 + 1}$ is always positive, the equation simplifies to:

$\sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0) x_i - \frac{\beta_1}{\beta_1^2 + 1}\sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0)^2 = 0$

b. Setting $\frac{\partial SSPR}{\partial \beta_0} = 0$#

$\frac{2}{\beta_1^2 + 1} \sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0) = 0 \quad\Longrightarrow\quad \sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0) = 0$

Step 3: Derive the Normal Equations#

We now have a system of two equations:

  1. Equation (1):

    $\sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0) x_i - \frac{\beta_1}{\beta_1^2 + 1}\sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0)^2 = 0$

  2. Equation (2):

    $\sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0) = 0$

Simplifying Equation (2):#

$\sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0) = 0$

Expand the summation:

$\beta_1 \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} y_i + n\beta_0 = 0$

Solve for $\beta_0$:

$\beta_0 = \frac{\sum_{i=1}^{n} y_i - \beta_1 \sum_{i=1}^{n} x_i}{n}$

Substituting $\beta_0$ into Equation (1):#

First, substitute $\beta_0$ into Equation (1):

$\sum_{i=1}^{n} \left(\beta_1 x_i - y_i + \frac{\sum_{j=1}^{n} y_j - \beta_1 \sum_{j=1}^{n} x_j}{n}\right) x_i - \frac{\beta_1}{\beta_1^2 + 1}\sum_{i=1}^{n} \left(\beta_1 x_i - y_i + \frac{\sum_{j=1}^{n} y_j - \beta_1 \sum_{j=1}^{n} x_j}{n}\right)^2 = 0$

This substitution leads to a complex, nonlinear equation in β1, which typically cannot be solved analytically.

Step 4: Solving the System of Equations#

Due to the complexity of the equations derived, especially Equation (1), an analytical solution for β1 and β0 is not feasible. Instead, we employ numerical methods to approximate the solutions.

a. Total Least Squares (TLS) Approach#

Total Least Squares minimizes the sum of squared perpendicular residuals by considering errors in both x and y directions. The TLS solution can be efficiently obtained using Singular Value Decomposition (SVD).

Steps to Compute TLS:#
  1. Center the Data:

    Subtract the mean of x and y from each data point to center the data around the origin.

    $\tilde{x}_i = x_i - \bar{x}, \qquad \tilde{y}_i = y_i - \bar{y}$

    where:

    $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$

  2. Form the Data Matrix:

    Create a matrix $D$ where each row represents a centered data point:

    $D = \begin{bmatrix} \tilde{x}_1 & \tilde{y}_1 \\ \tilde{x}_2 & \tilde{y}_2 \\ \vdots & \vdots \\ \tilde{x}_n & \tilde{y}_n \end{bmatrix}$

  3. Perform Singular Value Decomposition (SVD):

    Decompose matrix $D$ using SVD:

    $D = U \Sigma V^\top$

    • $U$ is an $n \times n$ orthogonal matrix.

    • $\Sigma$ is an $n \times 2$ diagonal matrix with the singular values.

    • $V$ is a $2 \times 2$ orthogonal matrix whose columns are the right singular vectors.

  4. Determine the Best-Fit Line:

    The best-fit line is determined by the right singular vector corresponding to the smallest singular value in $\Sigma$. Let this vector be $\begin{bmatrix} a \\ b \end{bmatrix}$.

    The slope $\beta_1$ is:

    $\beta_1 = -\frac{a}{b}$

    The intercept $\beta_0$ is then:

    $\beta_0 = \bar{y} - \beta_1 \bar{x}$
Rationale:#

The right singular vector corresponding to the smallest singular value indicates the direction of least variance, which aligns with minimizing the perpendicular distances from the data points to the regression line.
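A compact NumPy sketch of this SVD recipe (illustrative; the function name tls_fit is my own, and the example reuses the three points from the exercise below):

import numpy as np

def tls_fit(x, y):
    # Total least squares (orthogonal regression) via SVD of the centered data matrix
    x_bar, y_bar = x.mean(), y.mean()
    D = np.column_stack((x - x_bar, y - y_bar))
    _, _, Vt = np.linalg.svd(D)
    a, b = Vt[-1]                   # right singular vector for the smallest singular value
    beta1 = -a / b                  # (a, b) is normal to the best-fit line
    beta0 = y_bar - beta1 * x_bar
    return beta1, beta0

x = np.array([0.0, 2.0, 3.0])
y = np.array([1.0, 1.0, 4.0])
print(tls_fit(x, y))  # approximately (1.1805, 0.0326)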

b. Numerical Optimization Approach#

Alternatively, numerical optimization techniques can be employed to minimize SSPR directly.

Steps to Perform Numerical Optimization:#
  1. Define the Objective Function:

    The objective function to minimize is SSPR:

    $SSPR(\beta_1, \beta_0) = \sum_{i=1}^{n} \left(\frac{\beta_1 x_i - y_i + \beta_0}{\sqrt{\beta_1^2 + 1}}\right)^2$

    Simplify:

    $SSPR = \frac{1}{\beta_1^2 + 1} \sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0)^2$
  2. Choose Initial Estimates:

    Start with initial guesses for β1 and β0. These can be the OLS estimates or any reasonable approximation.

  3. Select an Optimization Algorithm:

    Utilize algorithms such as:

    • Gradient Descent

    • Newton-Raphson Method

    • Quasi-Newton Methods (e.g., BFGS)

    • Conjugate Gradient Method

  4. Implement the Optimization:

    Use optimization techniques to iteratively adjust β1 and β0 to minimize SSPR.

  5. Iterate Until Convergence:

    Continue updating β1 and β0 until the changes in SSPR or the parameters themselves are below a predefined threshold.

  6. Obtain the Optimal Parameters:

    The values of β1 and β0 at convergence are the estimates that minimize the sum of squared perpendicular residuals.

Step 5: Practical Implementation Example#

While numerical methods and SVD provide robust solutions for minimizing perpendicular residuals, the focus here is on understanding the mathematical derivation rather than implementation. However, it’s essential to recognize that these methods require computational tools to handle the complexity of the equations involved.
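For readers who do want to experiment, a minimal sketch of the direct numerical route is shown below; the use of scipy.optimize.minimize with Nelder-Mead is an assumed tool choice, and the SVD recipe above gives the same answer for this kind of data:

import numpy as np
from scipy.optimize import minimize

def sspr(params, x, y):
    # Sum of squared perpendicular residuals for the line y = beta1 * x + beta0
    beta1, beta0 = params
    return np.sum((beta1 * x - y + beta0)**2) / (beta1**2 + 1)

x = np.array([0.0, 2.0, 3.0])
y = np.array([1.0, 1.0, 4.0])

beta1_ols, beta0_ols = np.polyfit(x, y, 1)   # OLS estimates as the starting point
result = minimize(sspr, x0=[beta1_ols, beta0_ols], args=(x, y), method="Nelder-Mead")
print(result.x)    # approximately [1.1805, 0.0326]
print(result.fun)  # approximately 1.278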

Conclusion#

Minimizing the sum of squared perpendicular residuals provides a more geometrically accurate fit, especially in scenarios where both x and y measurements contain errors. Unlike the OLS method, which offers a closed-form solution by minimizing vertical residuals, the Total Least Squares (TLS) method typically requires computational techniques such as Singular Value Decomposition (SVD) or numerical optimization algorithms to determine the optimal regression parameters β1 and β0.

Key Differences Between OLS and TLS:#

  • Objective:

    • OLS: Minimizes the sum of squared vertical residuals.

    • TLS: Minimizes the sum of squared perpendicular residuals.

  • Assumptions:

    • OLS: Assumes errors are only in the y-direction.

    • TLS: Accounts for errors in both x and y-directions.

  • Solution:

    • OLS: Provides analytical solutions for β1 and β0.

    • TLS: Requires numerical methods or SVD for solutions.

Understanding the distinction between these methods is crucial for selecting the appropriate regression technique based on the nature of the data and the underlying assumptions about measurement errors.

Exercise : Minimizing Perpendicular Residuals#

Given the data points:

  • (0,1)

  • (2,1)

  • (3,4)

We aim to find the regression line of the form $y = \beta_1 x + \beta_0$ that minimizes the sum of squared perpendicular residuals.

Step 1: Define the Perpendicular Distance#

The perpendicular distance ($d_i$) from a point $(x_i, y_i)$ to the line $y = \beta_1 x + \beta_0$ is given by the formula:

$d_i = \frac{|\beta_1 x_i - y_i + \beta_0|}{\sqrt{\beta_1^2 + 1}}$

The Sum of Squared Perpendicular Residuals (SSPR) is then:

$SSPR = \sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} \left(\frac{\beta_1 x_i - y_i + \beta_0}{\sqrt{\beta_1^2 + 1}}\right)^2$

Simplifying:

$SSPR = \frac{1}{\beta_1^2 + 1} \sum_{i=1}^{n} (\beta_1 x_i - y_i + \beta_0)^2$

Step 2: Expand the SSPR Expression#

For our data points (0,1), (2,1), and (3,4), the SSPR becomes:

$SSPR = \frac{1}{\beta_1^2 + 1}\left[(\beta_1 \cdot 0 - 1 + \beta_0)^2 + (\beta_1 \cdot 2 - 1 + \beta_0)^2 + (\beta_1 \cdot 3 - 4 + \beta_0)^2\right]$

$SSPR = \frac{1}{\beta_1^2 + 1}\left[(\beta_0 - 1)^2 + (2\beta_1 + \beta_0 - 1)^2 + (3\beta_1 + \beta_0 - 4)^2\right]$

Step 3: Set Up the Minimization Problem#

To minimize SSPR, we take partial derivatives with respect to β1 and β0, set them equal to zero, and solve the resulting equations.

a. Partial Derivative with Respect to β1#

$\frac{\partial SSPR}{\partial \beta_1} = -\frac{2\beta_1}{(\beta_1^2 + 1)^2}\left[(\beta_0 - 1)^2 + (2\beta_1 + \beta_0 - 1)^2 + (3\beta_1 + \beta_0 - 4)^2\right] + \frac{2}{\beta_1^2 + 1}\left[2(2\beta_1 + \beta_0 - 1) + 3(3\beta_1 + \beta_0 - 4)\right] = 0$

b. Partial Derivative with Respect to β0#

$\frac{\partial SSPR}{\partial \beta_0} = \frac{2}{\beta_1^2 + 1}\left[(\beta_0 - 1) + (2\beta_1 + \beta_0 - 1) + (3\beta_1 + \beta_0 - 4)\right] = 0$

Simplifying:

$\frac{2}{\beta_1^2 + 1}\left[3\beta_0 + 5\beta_1 - 6\right] = 0$

Since $\frac{2}{\beta_1^2 + 1}$ is always positive, we have:

$3\beta_0 + 5\beta_1 - 6 = 0 \quad\Longrightarrow\quad 3\beta_0 + 5\beta_1 = 6 \qquad \text{(Equation 1)}$

Step 4: Solve the System of Equations#

Given the complexity of the partial derivatives, especially with $\beta_1$, an analytical solution can be intricate. However, with only three data points, we can proceed by making reasonable substitutions.

a. From Equation 1:#

$3\beta_0 + 5\beta_1 = 6 \quad\Longrightarrow\quad \beta_0 = \frac{6 - 5\beta_1}{3}$

b. Substitute $\beta_0$ into the Partial Derivative with Respect to $\beta_1$#

Substituting $\beta_0 = \frac{6 - 5\beta_1}{3}$ into the partial derivative equation is algebraically intensive and may not yield a straightforward analytical solution. Therefore, it’s practical to employ numerical methods or optimization techniques to solve for $\beta_1$ and subsequently $\beta_0$.

Step 5: Numerical Solution Approach#

Given the complexity of the equations, we’ll use the following numerical approach to approximate the values of $\beta_1$ and $\beta_0$.

a. Choose an Initial Estimate#

Start with an initial guess for β1. A reasonable starting point is the slope obtained from the Ordinary Least Squares (OLS) method.

From OLS, the slope $\beta_1^{OLS}$ is calculated as:

$\beta_1^{OLS} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - (\sum x_i)^2} = \frac{3 \times 14 - 5 \times 6}{3 \times 13 - 5^2} = \frac{42 - 30}{39 - 25} = \frac{12}{14} = \frac{6}{7} \approx 0.8571$

Using this, $\beta_0$ is:

$\beta_0^{OLS} = \frac{\sum y_i - \beta_1^{OLS} \sum x_i}{n} = \frac{6 - 0.8571 \times 5}{3} = \frac{6 - 4.2855}{3} = \frac{1.7145}{3} \approx 0.5715$

b. Iterative Optimization#

Using the OLS estimates as starting points:

$\beta_1^{(0)} = 0.8571, \qquad \beta_0^{(0)} = 0.5715$

Objective: Minimize SSPR(β1,β0).

Procedure:

  1. Calculate SSPR for the current estimates.

  2. Compute the partial derivatives SSPRβ1 and SSPRβ0.

  3. Update the estimates using a suitable optimization algorithm (e.g., Gradient Descent).

  4. Repeat until convergence is achieved (i.e., changes in β1 and β0 are below a predefined threshold).

Given the small size of the dataset, convergence can be achieved quickly.

c. Example Iteration#

For illustrative purposes, let’s perform one iteration using the Newton-Raphson method.

Newton-Raphson Update Rules:

$\beta_1^{(new)} = \beta_1^{(old)} - \frac{\partial SSPR / \partial \beta_1}{\partial^2 SSPR / \partial \beta_1^2}, \qquad \beta_0^{(new)} = \beta_0^{(old)} - \frac{\partial SSPR / \partial \beta_0}{\partial^2 SSPR / \partial \beta_0^2}$

Note: Calculating second-order derivatives is beyond the scope of this step-by-step guide. In practice, software tools or numerical libraries handle these computations.

d. Convergence#

Repeat the iterative updates until $\beta_1$ and $\beta_0$ stabilize within a small tolerance (e.g., $10^{-6}$).

Step 6: Final Regression Line#

After performing the iterative optimization (or, equivalently, applying the SVD-based Total Least Squares recipe from the previous section), we obtain the estimates:

$\beta_1 \approx 1.1805, \qquad \beta_0 \approx 0.0326$

Thus, the regression line is:

$y \approx 1.18x + 0.03$

Note that these values satisfy Equation 1: $3\beta_0 + 5\beta_1 \approx 6$.

Step 7: Verification#

To verify the accuracy of the regression line, calculate the perpendicular residuals for each data point, using $\sqrt{\beta_1^2 + 1} = \sqrt{1.1805^2 + 1} \approx 1.547$.

a. For (0,1):#

$d = \frac{|1.1805 \times 0 - 1 + 0.0326|}{\sqrt{1.1805^2 + 1}} = \frac{0.9674}{1.547} \approx 0.625$

$d^2 \approx 0.391$

b. For (2,1):#

$d = \frac{|1.1805 \times 2 - 1 + 0.0326|}{\sqrt{1.1805^2 + 1}} = \frac{1.3936}{1.547} \approx 0.901$

$d^2 \approx 0.811$

c. For (3,4):#

$d = \frac{|1.1805 \times 3 - 4 + 0.0326|}{\sqrt{1.1805^2 + 1}} = \frac{0.4259}{1.547} \approx 0.275$

$d^2 \approx 0.076$

Sum of Squared Perpendicular Residuals:

$SSPR \approx 0.391 + 0.811 + 0.076 = 1.278$

This confirms that the chosen line minimizes the sum of squared perpendicular residuals for the given data points.
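The verification can be reproduced in a few lines of NumPy (a sketch using the rounded estimates above):

import numpy as np

x = np.array([0.0, 2.0, 3.0])
y = np.array([1.0, 1.0, 4.0])
beta1, beta0 = 1.1805, 0.0326

d = np.abs(beta1 * x - y + beta0) / np.sqrt(beta1**2 + 1)
print(d**2)           # approximately [0.391, 0.811, 0.076]
print(np.sum(d**2))   # approximately 1.278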

Conclusion#

By minimizing the sum of squared perpendicular residuals, we derived the regression line $y \approx 1.18x + 0.03$, which best fits the data points (0,1), (2,1), and (3,4) in terms of minimizing the perpendicular distances. This approach accounts for errors in both the $x$ and $y$ directions, providing a more balanced fit compared to methods that consider only vertical residuals.

Note: For precise calculations and multiple iterations required for convergence, it’s recommended to use numerical optimization tools or statistical software.

Implement 3 types of residuals and a regression line in Python#

Note : This version of the code is implemented using Plotly and may not work in a static Jupyter Book. Please download this Jupyter Notebook and run it on your local system.

1. Import necessary libraries#

Explanation of Libraries Used in this code#

  • NumPy: Provides support for numerical computations and data manipulation. Used for generating data points and performing mathematical operations.

  • Plotly: A graphing library that creates interactive visualizations, used here for plotting the scatter plot, regression line, and residuals.

  • ipywidgets: Allows creation of interactive sliders and dropdowns for real-time updates to the plot as the user adjusts slope, intercept, and distance type.

  • IPython Display: Embeds interactive elements like widgets and plots within the Jupyter Notebook.

  • time: Measures the execution time of the program.

import numpy as np
import plotly.graph_objs as go
from ipywidgets import FloatSlider, Dropdown, Layout, HBox, VBox, interactive_output, HTML
from IPython.display import display
import time
start_time = time.time()

2. Generate random linear data#

This block generates random linear data for x and y.

  • x: A sequence of 50 evenly spaced values between -5 and 5.

  • y: A linear function of x with added random noise to simulate real-world variations.

np.random.seed(20)
x = np.linspace(-5, 5, 50)
y = 0.5 * x + np.random.normal(size=x.size)

3. Define the function for perpendicular projection#

This function calculates the perpendicular projection of a point (x0, y0) onto a line defined by its slope and intercept. The function returns the projected point on the line (x_proj, y_proj).

def perpendicular_projection(x0, y0, slope, intercept):
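    # Foot of the perpendicular from (x0, y0) onto the line y = slope * x + intercept,
    # obtained by minimizing (x - x0)**2 + (slope * x + intercept - y0)**2 over x.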
    x_proj = (x0 + slope * (y0 - intercept)) / (slope**2 + 1)
    y_proj = slope * x_proj + intercept
    return x_proj, y_proj

4. Define the function to plot regression and residuals#

This function creates an interactive plot showing the data points, a regression line, and the residual distances between the data points and the line. The residuals can be calculated using:

  • Vertical Distance: The vertical distance between the data point and the line.

  • Horizontal Distance: The horizontal distance between the data point and the line.

  • Perpendicular Distance: The shortest distance between the data point and the line.

The plot also displays the Sum of Squared Distances (SSD), a measure of the model’s total error, which is updated dynamically as the slope and intercept change.

def plot_regression_plotly(slope=1.0, intercept=0.0, distance_type="vertical"):
    # Compute the fitted regression line
    y_pred = slope * x + intercept

    # Initialize traces for the plot
    data = []
    
    # Trace for the data points
    data.append(go.Scatter(x=x, y=y, mode='markers', name='Data points', marker=dict(color='black')))
    
    # Trace for the fitted regression line
    line_x = np.linspace(-6, 6, 100)
    line_y = slope * line_x + intercept
    data.append(go.Scatter(x=line_x, y=line_y, mode='lines', name=f'Fitted line: y = {slope:.2f}x + {intercept:.2f}', line=dict(color='red')))
    
    # Add residual lines and calculate SSD
    ssd = 0
    for i in range(len(x)):
        if distance_type == "vertical":
            # Vertical distance (difference in y)
            data.append(go.Scatter(x=[x[i], x[i]], y=[y[i], y_pred[i]], mode='lines', line=dict(color='pink', dash='dash')))
            ssd += (y[i] - y_pred[i]) ** 2
        elif distance_type == "horizontal":
            # Horizontal distance (difference in x)
            x_proj = (y[i] - intercept) / slope
            data.append(go.Scatter(x=[x[i], x_proj], y=[y[i], y[i]], mode='lines', line=dict(color='green', dash='dash')))
            ssd += (x[i] - x_proj) ** 2
        elif distance_type == "perpendicular":
            # Perpendicular distance
            x_proj, y_proj = perpendicular_projection(x[i], y[i], slope, intercept)
            data.append(go.Scatter(x=[x[i], x_proj], y=[y[i], y_proj], mode='lines', line=dict(color='blue', dash='dash')))
            perp_dist = np.sqrt((x[i] - x_proj)**2 + (y[i] - y_proj)**2)
            ssd += perp_dist ** 2
    
    # Create the layout for the plot with larger size
    layout = go.Layout(
        title=f'Sum of squared distances ({distance_type}): {ssd:.2f}',
        xaxis=dict(title='x', range=[-6, 6]),
        yaxis=dict(title='y', range=[-6, 6]),
        showlegend=True,
        width=900,  
        height=600,  
        margin=dict(l=40, r=40, t=40, b=40)  
    )
    
    # Create the figure and display it
    fig = go.Figure(data=data, layout=layout)
    fig.show()

5. Create interactive widgets#

This block creates interactive widgets using ipywidgets:

  • Slope Slider: Allows the user to adjust the slope of the regression line.

  • Intercept Slider: Allows the user to adjust the intercept of the regression line.

  • Distance Type Dropdown: Lets the user choose how the distances (residuals) are calculated—either vertically, horizontally, or perpendicularly.

slope_slider = FloatSlider(value=1.0, min=-3.0, max=3.0, step=0.1, layout=Layout(width='300px'))
intercept_slider = FloatSlider(value=0.0, min=-5.0, max=5.0, step=0.1, layout=Layout(width='300px'))
distance_type_dropdown = Dropdown(options=["vertical", "horizontal", "perpendicular"], layout=Layout(width='300px'))
slope_label = HTML(value=f"<b>Slope:</b> {slope_slider.value}")
intercept_label = HTML(value=f"<b>Intercept:</b> {intercept_slider.value}")
distance_type_label = HTML(value=f"<b>Distance type:</b> {distance_type_dropdown.value}")

6. Update labels dynamically#

This function updates the text labels for slope, intercept, and distance type dynamically as the user interacts with the sliders and dropdown menu. It ensures the displayed labels always reflect the current settings.

# Function to update the labels dynamically
def update_labels(change):
    slope_label.value = f"<b>Slope:</b> {slope_slider.value:.2f}"
    intercept_label.value = f"<b>Intercept:</b> {intercept_slider.value:.2f}"
    distance_type_label.value = f"<b>Distance type:</b> {distance_type_dropdown.value}"

7. Attach the update function to widgets#

In this block, the update_labels function is attached to the slope and intercept sliders and the distance type dropdown. This ensures that every time the user modifies a value, the corresponding labels update.

slope_slider.observe(update_labels, names='value')
intercept_slider.observe(update_labels, names='value')
distance_type_dropdown.observe(update_labels, names='value')

8. Arrange widgets in a horizontal layout#

This block arranges the sliders and dropdown widgets in a horizontal box (HBox) for a clean and organized layout within the notebook. Each control (slope, intercept, distance type) is placed side by side.

controls = HBox([VBox([slope_label, slope_slider]), VBox([intercept_label, intercept_slider]), VBox([distance_type_label, distance_type_dropdown])])

9. Define the function to update the plot#

This function updates the plot based on the current values of the slope, intercept, and selected distance type. Every time the user interacts with the widgets, this function recalculates the residuals and updates the plot accordingly.

def update_plot(slope, intercept, distance_type):
    plot_regression_plotly(slope, intercept, distance_type)

10. Display the interactive plot and controls#

This block combines the interactive controls (sliders and dropdown) with the plot output. It uses interactive_output to link the plot to the widgets, so the plot updates dynamically when the user changes any value.

output = interactive_output(update_plot, {'slope': slope_slider, 'intercept': intercept_slider, 'distance_type': distance_type_dropdown})

# Display the controls and the plot
display(controls, output)
end_time = time.time()

Visit the online and local app using Streamlit.#

  • Online app on Streamlit:
    I programmed another version of this app using Streamlit and uploaded it to Streamlit Cloud. If you want to visit it: Click here

    Note : If you get a 403 error when clicking on this link, you will need to use a VPN.

  • Run the app locally on your computer: If you cannot reach the app online, you can run it locally on your computer.

    1. Download the streamlit_app.py from this repository.

    2. Install Streamlit via the command line: pip install streamlit

    3. Run the file using the following command: streamlit run "path_to_the_file"

Useful Tool for a better understanding#

For a better understanding of the Least Squares Method, please visit this link : chasereynolds

References#