Math Club
v1 · canonical standard

Lesson 105 — Simple linear regression

OLS model, least squares estimators, R², residuals, inference on the slope. Foundation of supervised learning and econometrics.

Used in: German Stochastik LK (Klasse 12) · Singapore H2 Mathematics (§14) · Japanese Math B

$$\hat{Y} = \hat\beta_0 + \hat\beta_1 X, \qquad \hat\beta_1 = \frac{S_{xy}}{S_{xx}}$$
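
The headline formula uses the centered sums $S_{xx}$ and $S_{xy}$ without spelling them out. Assuming the usual textbook convention (an assumption here, since the defining passage is not shown on this page), they and the matching intercept estimator can be written as:

```latex
\begin{aligned}
S_{xx} &= \sum_{i=1}^{n} (X_i - \bar X)^2, \\
S_{xy} &= \sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y), \\
\hat\beta_0 &= \bar Y - \hat\beta_1 \bar X .
\end{aligned}
```

The last line says that the fitted line always passes through the point of means $(\bar X, \bar Y)$.
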

Rigorous notation, full derivation, hypotheses

Rigorous definition

Simple linear regression model

"The regression equation is written as $\hat{y} = a + bx$, where $b$ is the slope and $a$ is the $y$-intercept." (OpenStax Statistics, §12.3)
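
As a minimal sketch of how $a$ and $b$ can be computed from raw pairs, the snippet below applies $b = S_{xy}/S_{xx}$ and $a = \bar y - b\bar x$ to the data of Ex. 105.2. The function name `fit_line` is hypothetical and only the Python standard library is used.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar, y_bar = fmean(xs), fmean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Pairs (X, Y) from Ex. 105.2
xs = [2, 4, 6, 8, 10]
ys = [5, 9, 11, 15, 20]

intercept, slope = fit_line(xs, ys)
print(f"y_hat = {intercept:.1f} + {slope:.1f} x")  # y_hat = 1.2 + 1.8 x
```

The result agrees with the line $\hat Y = 1.2 + 1.8X$ quoted in Ex. 105.3.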

Variance decomposition and R²

"The coefficient of determination $r^2$ is the square of the correlation coefficient $r$. It tells you the fraction of total variability in the response that is explained by the least-squares line." (OpenIntro Statistics, §7.2, p. 331)
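
The decomposition behind this statement can be checked numerically. The sketch below, using the five pairs of Ex. 105.20 and only the standard library, computes SST, SSR and SSE, then compares $R^2 = SSR/SST$ with the squared Pearson correlation. Variable names are illustrative choices, not notation fixed by the lesson.

```python
from math import sqrt
from statistics import fmean

# Pairs from Ex. 105.20: orders X, monthly cost Y (thousand BRL)
xs = [10, 20, 30, 40, 50]
ys = [100, 180, 270, 340, 400]

x_bar, y_bar = fmean(xs), fmean(ys)
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * x for x in xs]

sst = s_yy                                            # total variability of Y
sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # left unexplained by the line
ssr = sum((f - y_bar) ** 2 for f in fitted)           # explained by the line

r_squared = ssr / sst
r = s_xy / sqrt(s_xx * s_yy)  # Pearson correlation coefficient

print(f"SST = {sst:.1f}, SSR = {ssr:.1f}, SSE = {sse:.1f}")
print(f"R^2 = {r_squared:.4f}, r^2 = {r * r:.4f}")
```

Up to floating-point rounding, SST equals SSR + SSE and the two printed values on the last line coincide, which is the identity Ex. 105.25 asks to prove.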

Inference on the slope
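
A hedged sketch of the slope inference used in Ex. 105.10 and Ex. 105.11, assuming the standard results $SE(\hat\beta_1) = \hat\sigma/\sqrt{S_{xx}}$ and $T = \hat\beta_1 / SE(\hat\beta_1)$ with $n-2$ degrees of freedom. The function names are hypothetical, and the critical value is passed in by hand (read from a $t$ table) rather than computed.

```python
from math import sqrt


def slope_standard_error(sigma_hat, s_xx):
    """SE(beta1_hat) = sigma_hat / sqrt(S_xx)."""
    return sigma_hat / sqrt(s_xx)


def slope_t_statistic(beta1_hat, se, beta1_null=0.0):
    """T statistic for H0: beta1 = beta1_null."""
    return (beta1_hat - beta1_null) / se


def slope_confidence_interval(beta1_hat, se, t_crit):
    """Two-sided CI; t_crit is the t quantile with n - 2 degrees of freedom."""
    margin = t_crit * se
    return beta1_hat - margin, beta1_hat + margin


# Numbers from Ex. 105.10
se = slope_standard_error(sigma_hat=2.1, s_xx=144)
t_stat = slope_t_statistic(beta1_hat=3.6, se=se)
print(f"SE = {se:.3f}, T = {t_stat:.2f}")  # SE = 0.175, T = 20.57

# Numbers from Ex. 105.11; 2.048 is the tabulated t quantile for 28 df (assumed from a t table)
low, high = slope_confidence_interval(beta1_hat=1.4, se=0.38, t_crit=2.048)
print(f"95% CI: ({low:.2f}, {high:.2f})")  # 95% CI: (0.62, 2.18)
```

Because the interval in the second example excludes zero, the corresponding two-sided test would reject $H_0: \beta_1 = 0$ at the 5% level.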

[Figure: scatter plot of Y against X showing the data, the fitted line, and a residual eᵢ]

Least squares line (gold) minimizing the sum of squared residuals (orange). Each residual e is the vertical distance from the point to the line.
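
To make the residuals concrete, the following sketch computes them for the pairs of Ex. 105.2 and checks two consequences of the least squares fit: the residuals sum to zero and are orthogonal to $X$. It is an illustration with the standard library only, and the variable names are arbitrary.

```python
from statistics import fmean

# Pairs (X, Y) from Ex. 105.2
xs = [2, 4, 6, 8, 10]
ys = [5, 9, 11, 15, 20]

x_bar, y_bar = fmean(xs), fmean(ys)
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Residual = observed value minus fitted value (vertical distance to the line)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
sse = sum(e ** 2 for e in residuals)

sum_e = sum(residuals)                              # should vanish up to rounding
sum_xe = sum(x * e for x, e in zip(xs, residuals))  # should vanish up to rounding

print([round(e, 2) for e in residuals])
print(f"SSE = {sse:.2f}")
```

Any other line through the same data would give a larger sum of squared residuals than the value printed here.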

Solved examples

Exercise list

30 exercises · 7 with worked solutions (23%)

Application 15 · Understanding 4 · Modeling 5 · Challenge 4 · Proof 2
  1. Ex. 105.1 · Application

    Data: $n=6$, $\bar X = 4$, $\bar Y = 10$, $S_{xx} = 20$, $S_{xy} = 30$. Calculate $\hat\beta_0$ and $\hat\beta_1$.

  2. Ex. 105.2 · Application

    Pairs $(X,Y)$: $(2,5)$, $(4,9)$, $(6,11)$, $(8,15)$, $(10,20)$. Calculate the least squares line.

  3. Ex. 105.3 · Application

    Using $\hat Y = 1.2 + 1.8X$ (previous exercise), predict $Y$ for $X=7$ and $X=12$. Identify which prediction is extrapolation.

  4. Ex. 105.4 · Application

    For the data in Exercise 105.1: $\bar X=4$, $\bar Y=10$, $S_{xx}=20$, $S_{xy}=30$, $S_{yy}=52$. Calculate $R^2$ and interpret.

  5. Ex. 105.5 · Application · Answer key

    The Pearson correlation coefficient between two variables is $r = 0.87$. What is the $R^2$ of the simple regression of $Y$ on $X$?

  6. Ex. 105.6 · Application · Answer key

    Regression of annual salary (in thousand BRL) on years of experience produced $\hat Y = 32.4 + 2.5X$. Interpret $\hat\beta_0$ and $\hat\beta_1$.

  7. Ex. 105.7 · Application

    Using $\hat Y = 32.4 + 2.5X$, an employee with 14 years of experience earns 72,000 BRL/year. Calculate the residual.

  8. Ex. 105.8 · Application · Answer key

    Five observed values of $Y$: $(8, 10, 12, 9, 11)$ with $\bar Y = 10$. The SSE of the regression is 3.2. Calculate SST, SSR and $R^2$.

  9. Ex. 105.9 · Application

    A regression with $n=20$ produced $SSE = 48.6$. Calculate $MSE$ and $\hat\sigma$ and interpret.

  10. Ex. 105.10 · Application

    $\hat\beta_1 = 3.6$, $\hat\sigma = 2.1$, $S_{xx} = 144$. Calculate $SE(\hat\beta_1)$ and the $T$ statistic.

  11. Ex. 105.11 · Application

    $n=30$, $\hat\beta_1 = 1.4$, $SE(\hat\beta_1) = 0.38$. Construct a 95% CI for $\beta_1$ and interpret.

  12. Ex. 105.12 · Application

    $r = -0.73$, $s_X = 4$, $s_Y = 6$. What is the sign of $\hat\beta_1$? Calculate $\hat\beta_1$ using the relation $\hat\beta_1 = r(s_Y/s_X)$.

  13. Ex. 105.13 · Understanding · Answer key

    Which of the statements about the least squares line is CORRECT?

  14. Ex. 105.14 · Understanding

    What is the correct interpretation of $R^2 = 0$ in simple linear regression?

  15. Ex. 105.15 · Understanding

    A regression produced $R^2 = 0.85$ and $\hat\beta_1 = 2.3 > 0$. What can be concluded?

  16. Ex. 105.16 · Modeling

    A real estate agent in Curitiba collected data from 10 apartments: area ($X$, in m²) and rent cost ($Y$, in BRL/month). $\bar X=80$, $\bar Y=1600$, $S_{xx}=3200$, $S_{xy}=64000$. Fit the line and predict the rent for a 95 m² apartment.

  17. Ex. 105.17 · Modeling

    People aged 10 to 25: $\bar X = 22$ years, $\bar Y = 74$ kg, $s_X = 2.3$, $s_Y = 8.5$, $r = 0.82$. Fit the line using $\hat\beta_1 = r(s_Y/s_X)$ and predict the weight of a 30-year-old.

  18. Ex. 105.18 · Modeling · Answer key

    Regression with $n=25$, $SST=1200$, $R^2=0.72$. Build the ANOVA table (SSR, SSE, MSR, MSE, F) and test $H_0: \beta_1 = 0$ at the 5% level.

  19. Ex. 105.19 · Modeling

    A regression of water consumption (liters/day) on temperature (°C) produced $\hat Y = 50 + 8X$ with $R^2=0.91$ for $n=30$ points. The point $(15;\,430)$ appears very far from the others. What procedure should be used to evaluate its influence?

  20. Ex. 105.20 · Modeling

    A carrier recorded the number of orders $X$ and monthly logistics cost $Y$ (in thousand BRL) for 5 branches: $(10,100)$, $(20,180)$, $(30,270)$, $(40,340)$, $(50,400)$. Fit the line.

  21. Ex. 105.21 · Application

    Using $\hat Y = 30 + 7.6X$, calculate the prediction and the residual for a branch with $X=35$ orders and an observed cost of 310,000 BRL.

  22. Ex. 105.22 · Application

    For the regression in Exercise 105.20, calculate the 5 residuals, the SSE, and the residual standard deviation $\hat\sigma$.

  23. Ex. 105.23 · Understanding

    The residuals vs. $\hat Y$ plot has a funnel shape (increasing variance). What does this indicate?

  24. Ex. 105.24 · Application

    For the regression in Exercise 105.20 ($\hat Y = 30 + 7.6X$, $n=5$, $\bar X=30$, $S_{xx}=1000$, $\hat\sigma \approx 10.33$), construct a 95% CI for the average cost of a branch with $X^*=40$ orders. Use $t_{3;\,0.025} = 3.182$.

  25. Ex. 105.25 · Challenge · Answer key

    Prove algebraically that, for simple linear regression, $R^2 = r^2$ (square of the Pearson correlation coefficient).

  26. Ex. 105.26 · Challenge · Answer key

    Derive the formulas for $\hat\beta_0$ and $\hat\beta_1$ by minimizing $SSE = \sum (Y_i - \beta_0 - \beta_1 X_i)^2$ via differential calculus (normal equations).

  27. Ex. 105.27 · Proof

    Prove that, for any least squares line, the sum of residuals is zero: $\sum_{i=1}^n e_i = 0$.

  28. Ex. 105.28 · Challenge

    Summary data: $n=15$, $\bar X=12$, $\bar Y=45$, $S_{xx}=420$, $S_{xy}=1260$, $S_{yy}=4800$. Calculate: fitted line, $R^2$, test $H_0:\beta_1=0$ at the 5% level.

  29. Ex. 105.29 · Challenge

    Why does reducing the variability of $X$ (narrowing the sampled range) hurt the estimation of $\beta_1$? Relate to the formula for $SE(\hat\beta_1)$.

  30. Ex. 105.30 · Proof

    Prove that the OLS estimators $\hat\beta_0$ and $\hat\beta_1$ are unbiased, i.e., $E[\hat\beta_j] = \beta_j$.
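
For checking numeric answers to the ANOVA-style exercises in the list above (Ex. 105.18 in particular), a small helper such as the one below may be convenient. It is a sketch under the usual simple-regression conventions (1 degree of freedom for the regression, $n-2$ for the error); the function name `anova_from_summary` is hypothetical and the critical value is taken from an $F$ table rather than computed.

```python
def anova_from_summary(n, sst, r_squared):
    """Build the simple-regression ANOVA quantities from n, SST and R^2."""
    if n <= 2:
        raise ValueError("need n > 2 so that the error has n - 2 > 0 df")
    ssr = r_squared * sst   # explained sum of squares
    sse = sst - ssr         # residual sum of squares
    msr = ssr / 1           # regression mean square, 1 df
    mse = sse / (n - 2)     # error mean square, n - 2 df
    return {"SSR": ssr, "SSE": sse, "MSR": msr, "MSE": mse, "F": msr / mse}


# Summary numbers from Ex. 105.18
table = anova_from_summary(n=25, sst=1200, r_squared=0.72)
for name, value in table.items():
    print(f"{name} = {value:.2f}")

# Tabulated 5% critical value of F with (1, 23) df; an assumption read from an F table
F_CRIT = 4.28
reject_h0 = table["F"] > F_CRIT
print("reject H0: beta1 = 0" if reject_h0 else "do not reject H0")
```

In simple regression the $F$ statistic equals the square of the slope's $T$ statistic, so this test and the two-sided $t$ test on $\beta_1$ lead to the same decision.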

Sources

  • Statistics (OpenStax), Illowsky, Dean · CC-BY · Chapter 12 (Linear Regression and Correlation). Primary source for examples, equations, and exercises in this lesson.
  • OpenIntro Statistics (4th ed.), Diez, Çetinkaya-Rundel, Barr · CC-BY-SA · Chapter 7 (Introduction to linear regression). Primary source for residual diagnostics, inference, and exercises with real data.
  • Probabilidade e Estatística (Wikilivros), collaborative · CC-BY-SA · Linear regression section. Reference in PT-BR with notation compatible with the national curriculum.

Updated on 2025-05-14 · Author(s): Clube da Matemática
