Tutorial 10
Learning objectives
After this tutorial the student should be able to:
recognize a situation and research question for which a simple linear regression is the appropriate analysis;
mention and explain in their own words the five model assumptions of the simple linear regression model;
give the model equation;
mention and use the associated terms (i.e., \(y\), \(x\), \(\beta_0\), \(\beta_1\), \(\sigma\) and \(\varepsilon\)) appropriately;
give the least squares estimators for the three model parameters (\(\beta_0\), \(\beta_1\), \(\sigma\));
interpret a scatter plot and regression line;
interpret the regression coefficients;
give (based on R/R Commander output) the estimated regression model;
give (based on R/R Commander output) the estimated \(\sigma\);
apply (based on R/R Commander output) the omnibus \(F\)-test for the model.
Pre-class activity
Watch:
The clip is linked on Brightspace.
Simple Linear Regression
Read:
-
- paragraph 11.1 pp. 555-559 up to smoothers,
- paragraph 11.2 pp. 564-568 up to and including Example 11.2, or
-
- paragraph 11.1 pp. 572-576 up to smoothers,
- paragraph 11.2 pp. 581-585 up to and including Example 11.2.
There is no need to calculate the estimates \(\hat{\beta}_0\), \(\hat{\beta}_1\) and \(\hat{\sigma}^2_{\varepsilon}\) by hand. For this the R/R Commander output will be used.
The estimated regression line always passes through the center of all data points \((\bar{x}, \bar{y})\).
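A short derivation of this property, assuming the usual least squares expression for the intercept, \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\) (this expression is not stated above, but it is the standard result). Substituting \(x = \bar{x}\) into the estimated line gives:

\[
\hat{\beta}_0 + \hat{\beta}_1 \bar{x} = (\bar{y} - \hat{\beta}_1 \bar{x}) + \hat{\beta}_1 \bar{x} = \bar{y}
\]

So the fitted value at \(x = \bar{x}\) is \(\bar{y}\), whatever the value of \(\hat{\beta}_1\).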
The book (O&L) uses the term standard error of estimate for \(\hat{\sigma}_{\varepsilon}\), whereas R/R Commander uses the term Residual standard error. Both terms can be confusing, because “standard error” is generally reserved for the precision of an unbiased estimator, whereas \(\hat{\sigma}_{\varepsilon}\) estimates the standard deviation of the error term \(\varepsilon\).
- The \(F\)-test above is only used for \(\mbox{H}_{\mbox{a}}:\ \beta_1 \neq 0\) (and not for \(\mbox{H}_{\mbox{a}}:\ \beta_1 > 0\) or \(\mbox{H}_{\mbox{a}}:\ \beta_1 < 0\), or for any test value other than \(0\)). Within the framework of simple linear regression (straight line model) it is equivalent to the \(t\)-test for \(\mbox{H}_{\mbox{a}}:\ \beta_1 \neq 0\) (as will be explained in Tutorial 11).
- The \(F\)-test above is only used for a two-tailed alternative hypothesis \(\mbox{H}_{\mbox{a}}: \beta_1 \neq 0\). However, the rejection region is (due to the characteristics of the \(F\)-distribution) always right-tailed. Therefore, the p-value equals \(P(F \geq \mbox{outcome test statistic})\).
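The right-tailed p-value described in the notes above can be sketched in Python, as an illustration only; in the course itself the p-value is read from the R/R Commander output. The sketch assumes the omnibus test statistic has 1 and \(n - 2\) degrees of freedom, and it uses the equivalence with the \(t\)-test: an \(F\) variable with 1 and \(\nu\) degrees of freedom is the square of a \(t\) variable with \(\nu\) degrees of freedom. The function names `t_pdf` and `f_test_p_value` are hypothetical, and the integral is approximated numerically.

```python
import math


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def f_test_p_value(f_obs, df_error, steps=2000):
    """Right-tail p-value P(F >= f_obs) for an F(1, df_error) distribution.

    Because F(1, df) is the square of a t(df) variable,
    P(F >= f_obs) = P(|T| >= sqrt(f_obs)) = 1 - 2 * integral of the t density
    from 0 to sqrt(f_obs). The integral is approximated with Simpson's rule
    (steps must be even).
    """
    upper = math.sqrt(f_obs)
    h = upper / steps
    total = t_pdf(0.0, df_error) + t_pdf(upper, df_error)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df_error)
    return 1 - 2 * (h / 3) * total


# Illustrative numbers only (not taken from any exercise output):
# an observed F of 5.32 with n = 10 observations, so df_error = 8.
print(f_test_p_value(5.32, 8))
```

The larger the observed \(F\), the smaller the returned probability, which matches the right-tailed rejection region.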
Exercises to be done during the tutorial
Exercise 10.1 and Exercise 10.2 are in the presentation handouts of Tutorial 10. Check Brightspace for answers/feedback.
Exercise 10.1
Based on the research of Ruben Dijkhof (MSc Thesis Landscape Architecture and Spatial Planning, see clip linked on Brightspace).
Research Question: What is the effect of the distance to the national ecological network \((x)\), measured in kilometers, on the price of agricultural land \((y)\) in the province of Limburg?
It is assumed that \(\mu_y\) and \(x\) are linearly related.
Write down the (mathematical) model and describe all used symbols.
Exercise 10.2
Part of the linear model summary for the Rhizotron potato example:
Using the partial R/R Commander output above:
a. Provide the equation for the estimated model of the root depth explained by thermal time.
b. Provide an estimate for the (population) mean root depth, when the thermal time equals 0.
c. Provide an estimate for the (population) mean root depth, when the thermal time equals 1.
d. What will be the estimated effect on the (population) mean root depth, when the thermal time increases by 4 degree days?
e. Is there any evidence that the model for root depth explained by thermal time has predictive value?
Post-class activity
Watch:
The clip is linked on Brightspace.
Exercises to be done after the tutorial
For answers/feedback check Brightspace.
Exercise 10.3
Based on:
Read the introduction of this example (not the questions below it). Use the R/R Commander output below to answer the following questions:
- Scatter plots
- Summary Simple Linear Regression model (straight line model)
a. What is the research question in Example 11.2?
b. Have a look at the provided scatter plots. What can you read from these plots with respect to the strength of the linear relationship?
c. Give, based on the description of the example in the book, the Simple Linear Regression model (straight line model) and describe all symbols used in terms of the actual problem.
d. Test (\(\alpha = 0.05\)) whether the model has any predictive value. Mention all 8 steps.
e. Is the test performed in d. suitable for the research question (answer a.)? Give arguments.
f. Give the estimated simple linear regression model (straight line model) as an equation.
Exercise 10.4
In a study conducted to examine the quality of fish after 7 days of storage on ice, ten raw fish of the same kind and approximately the same size were caught and prepared for storage on ice. Two of the fish were placed in storage immediately after being caught, two were placed in storage 3 hours after being caught, and two each were placed in storage at 6, 9 and 12 hours after being caught.
Let \(y\) denote a measurement of fish quality (on a 10-point scale) after 7 days of storage on ice, and let \(x\) denote the time after being caught that the fish were placed in storage on ice. The sample data are (see Table 1):
| y | x |
|---|---|
| 8.5 | 0 |
| 8.4 | 0 |
| 7.9 | 3 |
| 8.1 | 3 |
| 7.8 | 6 |
| 7.6 | 6 |
| 7.3 | 9 |
| 7.0 | 9 |
| 6.8 | 12 |
| 6.7 | 12 |
The following model is assumed: \(y_i = \beta_0 + \beta_1 \times x_i + \varepsilon_i\)
Furthermore, assume that the error terms \(\varepsilon_i\) are independent and normally distributed with mean 0 and standard deviation \(\sigma_{\varepsilon}\). Use, where appropriate, the provided R/R Commander output to answer the questions:
- Summary Simple Linear Regression model (straight line model):
a. Plot the sample data (by hand). Does there seem to be a linear relation between \(y\) and \(x\)?
b. Formulate the research question.
c. Give the least squares estimate for \(\beta_0\) and its estimated standard error.
d. Give the least squares estimate for \(\beta_1\) and its estimated standard error.
e. Interpret the value of \(\hat{\beta}_1\) in the context of this problem.
f. Give an estimate for \(\sigma_{\varepsilon}\).
g. Calculate the estimated population mean fish quality after 7 days storage for a fish placed on ice 7 hours after having been caught.
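The exercise is meant to be answered from the R/R Commander output. For checking answers afterwards, the least squares quantities can also be computed directly from Table 1. The Python sketch below is one possible way to do so, assuming the standard textbook formulas for the estimates and their standard errors; the function name `fit_line` and the dictionary keys are hypothetical.

```python
import math

# Sample data from Table 1: x = hours between catch and storage on ice,
# y = quality score after 7 days of storage.
x = [0, 0, 3, 3, 6, 6, 9, 9, 12, 12]
y = [8.5, 8.4, 7.9, 8.1, 7.8, 7.6, 7.3, 7.0, 6.8, 6.7]


def fit_line(x, y):
    """Least squares fit of y = b0 + b1 * x, returned as a dict."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx                 # estimated slope
    b0 = y_bar - b1 * x_bar          # estimated intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(sse / (n - 2))   # residual standard error
    se_b1 = sigma_hat / math.sqrt(s_xx)
    se_b0 = sigma_hat * math.sqrt(1 / n + x_bar ** 2 / s_xx)
    return {"b0": b0, "b1": b1, "sigma_hat": sigma_hat,
            "se_b0": se_b0, "se_b1": se_b1}


fit = fit_line(x, y)
for name, value in fit.items():
    print(f"{name}: {value:.4f}")

# Estimated population mean quality for a fish stored 7 hours after the catch.
print(f"mean quality at x = 7: {fit['b0'] + fit['b1'] * 7:.4f}")
```

The printed values should agree, up to rounding, with the corresponding entries in the R/R Commander summary.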