| | Estimate | 2.5 % | 97.5 % |
|---|---|---|---|
| (Intercept) | 4.6979 | -9.0275 | 18.4232 |
| Purchased_Directly | 1.9705 | 1.6141 | 2.3269 |
Tutorial 11
Learning objectives
After this tutorial, within the framework of a single regression analysis, the student should be able to:
apply a \(t\)-test for the slope;
construct a confidence interval for the regression coefficients;
determine for any \(x\)-value the predicted expected value \(\hat{\mu}\), given the estimated regression equation;
determine for any \(x\)-value the predicted \(\hat{y}\) for a random unit, given the estimated regression equation;
explain the difference in interpretation between \(\hat{\mu}\) and \(\hat{y}\) ;
determine the confidence interval for any predicted mean outcome given the standard error of the predicted value;
read the fitted value, the standard error of the fitted value and the confidence interval for \(\mu_y\) from the R/R Commander output;
read the fitted value and the prediction interval for \(y\) for a new value of \(x\) from the R/R Commander output;
distinguish the confidence interval for \(\mu_y\) from the prediction interval for \(y\);
explain why the confidence interval for \(\mu_y\) is narrower than the prediction interval for \(y\).
Hypothesis testing and confidence interval for a regression coefficient
Read:
- paragraph 11.3, pp. 574-577, or
- paragraph 11.3, pp. 590-594.
Predicting new \(y\) values, confidence and prediction intervals
Read:
- paragraph 11.4, pp. 577-581, or
- paragraph 11.4, pp. 594-598.
This paragraph discusses the estimation of the expected value \(\mu_y = \mbox{E}(y)\) for a given value of \(x\), the prediction of an individual outcome \(y\), and the associated confidence and prediction intervals. In connection with prediction, the book also discusses extrapolation.
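As a compact reminder, the standard textbook forms of the two intervals are sketched here. The notation is an assumption and may differ from the book: \(s_\varepsilon\) denotes the estimate of \(\sigma_\varepsilon\), \(\mbox{SE}(\hat{\mu})\) the standard error of the fitted value (reported as se.fit in the R output), and \(x_{n+1}\) the new \(x\)-value.

\[
\hat{\mu} = \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 \times x_{n+1}
\]
\[
\mbox{CI for } \mu_y: \quad \hat{\mu} \pm t_{\alpha/2,\, n-2} \times \mbox{SE}(\hat{\mu})
\]
\[
\mbox{PI for } y: \quad \hat{y} \pm t_{\alpha/2,\, n-2} \times \sqrt{s_\varepsilon^2 + \mbox{SE}(\hat{\mu})^2}
\]

Both intervals are centred at the same point. The prediction interval adds \(s_\varepsilon^2\) under the square root, to account for the variation of an individual outcome around its mean.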
Exercises to be done during the tutorial
Exercise 11.1 is in the presentation handouts of Tutorial 11. Check Brightspace for answers/feedback.
Exercise 11.1
Based on the data from:
A director of a company wants to know whether there is a positive relationship between the productivity of employees (\(y\)) and the score on an aptitude test (\(x\)). He has data on 12 employees and assumes the following linear relationship: \(y = \beta_0 + \beta_1 \times x + \varepsilon\)
R/R Commander output for the simple linear regression model:
a. Apply the appropriate test to answer the Research Question. Mention all steps (use \(\alpha =0.05\))
b. Calculate the 95% CI for \(\beta_1\).
Post-class activity
Watch:
All of the clips are linked on Brightspace.
Exercises to be done after the tutorial
For answers/feedback check Brightspace.
Exercise 11.2
In preparation for Tutorial 12, please answer the Multiple Choice Questions of the example exam (available on Brightspace > Tutorial 12). The answers will be given and discussed in Tutorial 12.
Exercise 11.3
Based on:
This example was used for Exercise 10.3 as well. (Re-)Read the introduction of this example (not the questions below it). Use the R/R Commander output below to answer the following questions:
Scatter plots \(\rightarrow\) see Tutorial 10, Exercise 10.3, Figure 1
Summary Simple Linear Regression model (straight line model) \(\rightarrow\) see Tutorial 10, Exercise 10.3
\(95\%\) Confidence Intervals for \(\beta_0\) and \(\beta_1\) (see Table 1)
a. What is the research question in example 11.2? (You have already answered this question in Exercise 10.3a., but it is good to start with it again.)
b. In Exercise 10.3e. you applied the omnibus \(F\)-test to test \(\mbox{H}_0:\ \beta_1 = 0\) against \(\mbox{H}_{\mbox{a}}:\ \beta_1 \neq 0\). In Tutorial 11 you have seen that you can also apply a \(t\)-test for a regression coefficient. Find the outcome of the test statistic \(t\) for this test in the R output.
c. Check that the squared outcome of the test statistic \(t\) is equal to the outcome of the test statistic \(F\) of the omnibus \(F\)-test.
d. Suppose it was hypothesized in advance that there is a positive relationship between the percentage of prescription ingredients purchased directly from the supplier and the prescription sales volume. Perform a test (\(\alpha = 0.05\)) for this hypothesis. Mention all steps. Use the R output for steps 6 and 7 only!
e. Suppose that from previous research it is known that \(\beta_1 = 2\). Test (\(\alpha = 0.05\)) whether this has changed. Mention all steps. Use the appropriate parts of the R output to answer this question.
f. Read the 95% Confidence Interval for \(\beta_1\) from the R output.
g. Calculate the 90% confidence interval for \(\beta_1\).
h. Give the interpretation for the confidence interval constructed in question g.
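The mechanics behind the \(t\)-test and the confidence interval for a slope can be sketched in a few lines of Python. This is a minimal sketch: the estimate, standard error and degrees of freedom below are hypothetical and are not taken from the exercise output. The critical values are read from a \(t\)-table for 10 degrees of freedom, and the function names are illustrative only.

```python
# Hypothetical numbers for illustration only (NOT from the exercise output).
b1 = 1.50       # estimated slope
se_b1 = 0.25    # standard error of the estimated slope
df = 10         # residual degrees of freedom, n - 2

# Critical values read from a t-table for df = 10.
T_025 = 2.228   # t_{0.025, 10}: 95% two-sided interval
T_05 = 1.812    # t_{0.05, 10}: 90% interval, or one-sided test at alpha = 0.05


def t_statistic(estimate, se, null_value=0.0):
    """Test statistic t = (estimate - hypothesized value) / standard error."""
    return (estimate - null_value) / se


def confidence_interval(estimate, se, t_crit):
    """Interval estimate +/- t_crit * se, returned as (lower, upper)."""
    half_width = t_crit * se
    return estimate - half_width, estimate + half_width


t_zero = t_statistic(b1, se_b1)         # H0: beta_1 = 0
t_one = t_statistic(b1, se_b1, 1.0)     # H0: beta_1 = 1 (a non-zero null value)
reject_one_sided = t_zero > T_05        # Ha: beta_1 > 0, alpha = 0.05
ci_95 = confidence_interval(b1, se_b1, T_025)
ci_90 = confidence_interval(b1, se_b1, T_05)

print("t for H0: beta_1 = 0:", t_zero)
print("t for H0: beta_1 = 1:", t_one)
print("reject H0 in favour of beta_1 > 0:", reject_one_sided)
print("95% CI:", ci_95)
print("90% CI:", ci_90)
```

The 90% interval uses the smaller critical value and is therefore narrower than the 95% interval. A non-zero hypothesized value only changes the numerator of the test statistic, not the standard error.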
Exercise 11.4
This is the same case as in Exercise 10.4, but this time with different questions (except for some useful repetition).
In a study conducted to examine the quality of fish after 7 days of storage on ice, ten raw fish of the same kind and approximately the same size were caught and prepared for storage on ice. Two of the fish were placed in storage immediately after being caught, two were placed in storage 3 hours after being caught, and two each were placed in storage at 6, 9 and 12 hours after being caught.
Let \(y\) denote a measurement of fish quality (on a 10-point scale) after 7 days of storage on ice, and let \(x\) denote the time after being caught that the fish were placed in storage on ice. The sample data are given in Table 1.
The following model is assumed: \(y_i = \beta_0 + \beta_1 \times x_i + \varepsilon_i\)
Furthermore, assume that the error terms \(\varepsilon_i\) are independent and normally distributed with standard deviation \(\sigma_{\varepsilon}\). Use, where appropriate, the provided R/R Commander output to answer the questions:
- Summary Simple Linear Regression model (straight line model) \(\rightarrow\) See Tutorial 10 Exercise 10.4.
- Prediction of new samples:

95% confidence intervals for \(\mu_y\):

| \(x_{n + 1}\) | fit | lwr | upr | se.fit | df |
|---|---|---|---|---|---|
| 7 | 7.468333 | 7.377923 | 7.558744 | 0.03920654 | 8 |
| 10 | 7.043333 | 6.922390 | 7.164276 | 0.05244706 | 8 |

95% prediction intervals for \(y\):

| \(x_{n + 1}\) | fit | lwr | upr | se.fit | df |
|---|---|---|---|---|---|
| 7 | 7.468333 | 7.175737 | 7.760929 | 0.03920654 | 8 |
| 10 | 7.043333 | 6.739910 | 7.346756 | 0.05244706 | 8 |
a. Formulate the research question.
b. Why is the omnibus F-test not appropriate to answer the research question?
c. Apply the appropriate test (\(\alpha = 0.05\)) to answer the research question.
d. Give the estimate for \(\sigma_\varepsilon\).
e. Calculate the estimated population mean fish quality score after 7 days of storage on ice, when the fish was stored on ice 10 hours after being caught.
f. Read from the R output the predicted fish quality score after 7 days of storage on ice, when the fish was stored on ice 10 hours after being caught.
g. Explain why the answers to e. and f. are the same.
h. Determine the 95% Confidence Interval for the population mean fish quality score after 7 days of storage on ice, when the fish was stored on ice 10 hours after being caught.
i. Read from the R output the 95% Prediction Interval for the fish quality score after 7 days of storage on ice, when the fish was stored on ice 10 hours after being caught.
j. Explain why the 95% Prediction Interval for \(y\) at \(x_{n + 1}\) is always wider than the Confidence Interval for \(\mu_y = \beta_0 + \beta_1 \times x_{n + 1}\).
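To see how the columns of the prediction output fit together, the following Python sketch rebuilds the narrower of the two intervals for the row \(x_{n + 1} = 7\) from fit and se.fit alone. The critical value \(t_{0.025,\,8} = 2.306\) is read from a \(t\)-table. The residual standard deviation used for the prediction interval is a hypothetical placeholder, not the estimate from the R output, and the function names are illustrative only.

```python
import math

T_CRIT = 2.306  # t_{0.025, 8} from a t-table (df = 8, as in the output)


def interval(center, se, t_crit=T_CRIT):
    """Interval center +/- t_crit * se, returned as (lower, upper)."""
    half_width = t_crit * se
    return center - half_width, center + half_width


def prediction_se(s_resid, se_fit):
    """Standard error for predicting one new y: sqrt(s^2 + se.fit^2)."""
    return math.sqrt(s_resid ** 2 + se_fit ** 2)


# Row x = 7 of the prediction output above.
fit_7 = 7.468333
se_fit_7 = 0.03920654

# Confidence interval for the mean outcome at x = 7.
ci_7 = interval(fit_7, se_fit_7)

# Prediction interval for a single new outcome at x = 7.
# S_HYPOTHETICAL is a placeholder, NOT the estimate of sigma_epsilon
# from the regression summary.
S_HYPOTHETICAL = 0.10
pi_7 = interval(fit_7, prediction_se(S_HYPOTHETICAL, se_fit_7))

print("CI for the mean at x = 7:", ci_7)
print("PI with placeholder s:   ", pi_7)
```

The reconstructed confidence limits agree with the lwr and upr columns of the narrower interval, up to rounding of the critical value. Because \(s_\varepsilon^2 > 0\) is added under the square root, the prediction interval comes out wider for any positive residual standard deviation.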