Lesson 105 — Simple linear regression
OLS model, least squares estimators, R², residuals, inference on the slope. Foundation of supervised learning and econometrics.
Used in: Stochastik LK, Germany (Klasse 12) · H2 Mathematics, Singapore (§14) · Math B, Japan
Rigorous notation, full derivation, hypotheses
Rigorous definition
Simple linear regression model
"The regression equation is written as $\hat{y} = a + bx$, where $b$ is the slope and $a$ is the $y$-intercept." — OpenStax Statistics, §12.3
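As an illustration of how the slope and intercept are obtained, here is a minimal Python sketch. It assumes the line is written as y_hat = a + b*x and uses the standard closed-form least squares formulas b = S_xy / S_xx and a = y_bar - b * x_bar. The function name `fit_line` and the data are hypothetical and do not come from the lesson.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (a, b) for the least squares line y_hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations and sum of squared deviations of x
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b

# Hypothetical data, chosen only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
print(f"y_hat = {a:.2f} + {b:.2f} x")   # y_hat = 2.20 + 0.60 x
```

For these five points, S_xy = 6 and S_xx = 10, so the slope is 0.6, and the intercept is 4 - 0.6 · 3 = 2.2.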
Variance decomposition and R²
"The coefficient of determination $R^2$ is the square of the correlation coefficient $r$. It tells you the fraction of total variability in the response that is explained by the least-squares line." — OpenIntro Statistics, §7.2, p. 331
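The decomposition SST = SSR + SSE and the identity R² = r² can be checked numerically. The sketch below uses hypothetical data and variable names that are not part of the lesson, and relies only on the Python standard library.

```python
from math import isclose, sqrt
from statistics import mean

# Hypothetical data, chosen only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Least squares fit and fitted values
b = s_xy / s_xx
a = y_bar - b * x_bar
fitted = [a + b * x for x in xs]

sst = s_yy                                            # total variability
ssr = sum((f - y_bar) ** 2 for f in fitted)           # explained by the line
sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # left in the residuals

r = s_xy / sqrt(s_xx * s_yy)   # Pearson correlation coefficient
r_squared = ssr / sst          # coefficient of determination

print(f"SST = {sst:.1f}, SSR = {ssr:.1f}, SSE = {sse:.1f}")
print(f"R^2 = {r_squared:.3f}, r^2 = {r ** 2:.3f}")
```

With these data SST = 6.0, SSR = 3.6 and SSE = 2.4, so R² = 0.6, which matches the square of r ≈ 0.775.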
Inference on the slope
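A minimal sketch of slope inference, assuming the usual model conditions (independent errors with constant variance, approximately normal). It computes the residual standard deviation s = sqrt(SSE / (n - 2)), the standard error of the slope s / sqrt(S_xx), the t statistic for a zero slope, and a confidence interval. The function name `slope_inference` and the data are hypothetical. The critical value is passed in by the caller; 3.182 is the tabulated two-sided 95% value of Student's t with 3 degrees of freedom.

```python
from math import sqrt
from statistics import mean

def slope_inference(xs, ys, t_crit):
    """Return (se_b, t_stat, ci) for the slope b of y_hat = a + b*x.

    t_crit: two-sided critical value of Student's t with n - 2 degrees
    of freedom, supplied by the caller (for example, from a t table).
    """
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = sqrt(sse / (n - 2))     # residual standard deviation
    se_b = s / sqrt(s_xx)       # standard error of the slope
    t_stat = b / se_b           # test statistic for a zero slope
    ci = (b - t_crit * se_b, b + t_crit * se_b)
    return se_b, t_stat, ci

# Hypothetical data; n = 5, so df = 3 and t_{0.025, 3} = 3.182
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
se_b, t_stat, ci = slope_inference(xs, ys, t_crit=3.182)
print(f"SE = {se_b:.3f}, t = {t_stat:.3f}, CI = ({ci[0]:.3f}, {ci[1]:.3f})")
# SE = 0.283, t = 2.121, CI = (-0.300, 1.500)
```

Here |t| = 2.121 is below 3.182 and the interval contains 0, so with only five points the slope is not significantly different from zero at the 5% level.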
Least squares line (gold) minimizing the sum of squared residuals (orange). Each residual e is the vertical distance from the point to the line.
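The idea in the caption can be checked numerically: compute each residual as the vertical distance from the point to the line, then compare the sum of squared residuals of the least squares line with that of slightly different lines. The helper names and the data below are hypothetical.

```python
from statistics import mean

def residuals(xs, ys, a, b):
    """Vertical distances e_i = y_i - (a + b*x_i) from each point to the line."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def sse(xs, ys, a, b):
    """Sum of squared residuals for the line y_hat = a + b*x."""
    return sum(e ** 2 for e in residuals(xs, ys, a, b))

# Hypothetical data, chosen only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least squares coefficients from the closed-form formulas
x_bar, y_bar = mean(xs), mean(ys)
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
a = y_bar - b * x_bar

e = residuals(xs, ys, a, b)
print("sum of residuals:", round(sum(e), 10))

# Any other line has a larger sum of squared residuals
best = sse(xs, ys, a, b)
print(best < sse(xs, ys, a, b + 0.1))   # True: steeper line fits worse
print(best < sse(xs, ys, a - 0.2, b))   # True: shifted line fits worse
```

The residuals of the fitted line sum to zero up to floating-point rounding, and both perturbed lines give a larger sum of squares (2.95 and 2.6 against 2.4).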
Solved examples
Exercise list
30 exercises · 7 with worked solution (23%)
- Ex. 105.1 · Application
Data: , , , , . Calculate and .
- Ex. 105.2 · Application
Pairs : , , , , . Calculate the least squares line.
- Ex. 105.3 · Application
Using (previous exercise), predict for and . Identify which prediction is extrapolation.
- Ex. 105.4 · Application
For the data in Exercise 105.1: , , , , . Calculate and interpret.
- Ex. 105.5 · Application · Answer key
The Pearson correlation coefficient between two variables is . What is the of the simple regression of on ?
- Ex. 105.6 · Application · Answer key
Regression of annual salary (in thousand BRL) on years of experience produced . Interpret and .
- Ex. 105.7 · Application
Using , an employee with 14 years of experience earns 72,000 BRL/year. Calculate the residual.
- Ex. 105.8 · Application · Answer key
Five observed values of : with . The SSE of the regression is 3.2. Calculate SST, SSR and .
- Ex. 105.9 · Application
A regression with produced . Calculate and and interpret.
- Ex. 105.10 · Application
, , . Calculate and the statistic.
- Ex. 105.11 · Application
, , . Construct a 95% CI for and interpret.
- Ex. 105.12 · Application
, , . What is the sign of ? Calculate using the relation .
- Ex. 105.13 · Understanding · Answer key
Which of the statements about the least squares line is CORRECT?
- Ex. 105.14 · Understanding
What is the correct interpretation of in simple linear regression?
- Ex. 105.15 · Understanding
A regression produced and . What can be concluded?
- Ex. 105.16 · Modeling
A real estate agent in Curitiba collected data from 10 apartments: area (, in m²) and rent cost (, in BRL/month). , , , . Fit the line and predict the rent for a 95 m² apartment.
- Ex. 105.17 · Modeling
Children aged 10 to 25: years, kg, , , . Fit the line using and predict the weight of a 30-year-old child.
- Ex. 105.18 · Modeling · Answer key
Regression with , , . Build the ANOVA table (SSR, SSE, MSR, MSE, F) and test at the 5% level.
- Ex. 105.19 · Modeling
A regression of water consumption (liters/day) on temperature (°C) produced with for points. The point appears very far from the others. What procedure should be used to evaluate its influence?
- Ex. 105.20 · Modeling
A carrier recorded the number of orders and monthly logistics cost (in thousand BRL) for 5 branches: , , , , . Fit the line.
- Ex. 105.21 · Application
Using , calculate the prediction and the residual for a branch with orders and an observed cost of 310,000 BRL.
- Ex. 105.22 · Application
For the regression in Exercise 105.20, calculate the 5 residuals, the SSE, and the residual standard deviation .
- Ex. 105.23 · Understanding
The residuals vs. plot has a funnel shape (increasing variance). What does this indicate?
- Ex. 105.24 · Application
For the regression in Exercise 105.20 (, , , , ), construct a 95% CI for the average cost of a branch with orders. Use .
- Ex. 105.25 · Challenge · Answer key
Prove algebraically that, for simple linear regression, $R^2 = r^2$ (square of the Pearson correlation coefficient).
- Ex. 105.26 · Challenge · Answer key
Derive the formulas for $\hat\beta_0$ and $\hat\beta_1$ by minimizing $\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$ via differential calculus (normal equations).
- Ex. 105.27 · Proof
Prove that, for any least squares line, the sum of residuals is zero: $\sum_{i=1}^{n} e_i = 0$.
- Ex. 105.28 · Challenge
Summary data: , , , , , . Calculate: fitted line, , test at the 5% level.
- Ex. 105.29 · Challenge
Why does reducing the variability of $x$ (narrowing the sampled range) hurt the estimation of $\beta_1$? Relate to the formula for $SE(\hat\beta_1)$.
- Ex. 105.30 · Proof
Prove that the OLS estimators $\hat\beta_0$ and $\hat\beta_1$ are unbiased, i.e., $E[\hat\beta_0] = \beta_0$ and $E[\hat\beta_1] = \beta_1$.
Sources
- Statistics — OpenStax — Illowsky, Dean · CC-BY · Chapter 12 (Linear Regression and Correlation). Primary source for examples, equations, and exercises in this lesson.
- OpenIntro Statistics (4th ed.) — Diez, Çetinkaya-Rundel, Barr · CC-BY-SA · Chapter 7 (Introduction to linear regression). Primary source for residual diagnostics, inference, and exercises with real data.
- Probabilidade e Estatística — Wikilivros — collaborative · CC-BY-SA · Linear regression section. Reference in PT-BR with notation compatible with the national curriculum.