| | experience | salary |
|---|---|---|
| experience | 1.0000000 | 0.6946505 |
| salary | 0.6946505 | 1.0000000 |
Tutorial 9
Learning objectives
After this tutorial the student should be able to:
recognize a situation for which the binomial test is the most appropriate test;
give the estimator for \(\pi\) and the associated standard error;
determine the estimate for \(\pi\) and the associated standard error;
determine the expected number of successes and the associated standard deviation;
apply by hand the exact binomial test for \(\pi\) and interpret the result;
explain correlation;
mention five properties of Pearson’s correlation coefficient;
mention the estimator for the population correlation \(\rho\) ;
interpret the direction and strength of the correlation based on graph and estimated correlation coefficient.
Pre-class activity
Watch:
The clip is linked on Brightspace.
Hypothesis testing for \(\pi\) (one sample; binomial distribution)
(Re-)Read the paragraph about the Binomial distribution from Tutorial 3:
-
- paragraph 4.8 pp.167-175 up to Poisson distribution, or
-
- paragraph 4.8 pp.158-166 up to Poisson distribution.
Read:
-
- paragraph 10.2 pp.483-485 the first six lines, or
-
- paragraph 10.2 pp.500-502 the first six lines.
In a binomial situation, the population proportion of “successes” is denoted by \(\pi\). When \(\pi\) is unknown, an estimator is used to obtain an estimate from a sample.
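For reference, the usual estimator and its estimated standard error can be written as below, together with the expected number of successes and its standard deviation. This is a sketch in assumed notation: \(y\) is the number of successes in a sample of size \(n\), and the symbols \(\hat{\pi}\) and \(\widehat{SE}\) are assumptions here, so the Lecture Notes may use different ones.

```latex
\[
\hat{\pi} = \frac{y}{n}, \qquad
\widehat{SE}(\hat{\pi}) = \sqrt{\frac{\hat{\pi}\,(1 - \hat{\pi})}{n}}
\]
\[
E(y) = n\pi, \qquad \sigma_y = \sqrt{n\pi(1 - \pi)}
\]
```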
This estimator can be used in a test (the so-called approximate \(z\)-test). In this course, however, we apply the exact binomial test to answer research questions about \(\pi\). The exact binomial test simply uses the number of “successes” in the sample (denoted by \(y\)) as the test statistic. The null distribution of this test statistic \(y\) is a binomial distribution with parameters \(n\) and \(\pi_0\).
For sample sizes \(n \leq 20\), cumulative probabilities for binomial distributions can be found in the tables in the appendix at the end of the Lecture Notes. Probabilities (for all sample sizes \(n\)) can also be calculated using R/R Commander or a graphing calculator. During Computer Practical 5 the exact binomial test will be applied using R/R Commander.
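To make the tail probabilities concrete, here is a minimal Python sketch of how a p-value for the exact binomial test can be obtained from the null distribution Binomial(\(n\), \(\pi_0\)). It is not the R/R Commander workflow used in the course. The function names `binom_pmf`, `lower_tail` and `upper_tail` are hypothetical helpers, and the numbers are a made-up example, not one of the exercises.

```python
from math import comb

def binom_pmf(k, n, pi):
    """P(Y = k) for Y ~ Binomial(n, pi)."""
    return comb(n, k) * pi**k * (1 - pi)**(n - k)

def lower_tail(k, n, pi):
    """P(Y <= k): the cumulative probability listed in binomial tables."""
    return sum(binom_pmf(i, n, pi) for i in range(0, k + 1))

def upper_tail(k, n, pi):
    """P(Y > k): the complement of the lower tail."""
    return 1.0 - lower_tail(k, n, pi)

# Hypothetical example: n = 10 trials, pi_0 = 0.5, y = 8 observed successes.
n, pi0, y = 10, 0.5, 8

# Right-sided test (Ha: pi > pi_0): p-value = P(Y >= y) = P(Y > y - 1).
p_right = upper_tail(y - 1, n, pi0)

# Left-sided test (Ha: pi < pi_0): p-value = P(Y <= y).
p_left = lower_tail(y, n, pi0)

print(round(p_right, 4))  # 0.0547
print(round(p_left, 4))   # 0.9893
```

Note the shift by one in the right-sided case: because the upper tail is defined as \(P(Y > k)\), the probability \(P(Y \geq y)\) is obtained with \(k = y - 1\).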
Correlation
Read:
-
- paragraph 3.7 pp.111-115 from the last two lines above Table 3.16 to side-by-side boxplots,
- paragraph 11.6 pp.591-598 from the sixth line under Example 11.13, or
-
- paragraph 3.7 pp.104-108 from the last two lines above Table 3.15 to side-by-side boxplots,
- paragraph 11.7 pp.613-616 from the fourth line.
We have \(n\) paired observations \((x_1, y_1), (x_2, y_2),\ldots, (x_n, y_n)\).
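A standard way to write Pearson's sample correlation coefficient for these pairs is shown below. The notation (\(\bar{x}\) and \(\bar{y}\) for the sample means) is an assumption, and the Lecture Notes may present the formula in a different but equivalent form. The value \(r_{xy}\) computed from the sample serves as the estimate of the population correlation \(\rho\).

```latex
\[
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
\]
```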
Remark:
\(r_{xy}\) is also denoted by \(r(x,y)\).
The property \(r(a \times x + b, c \times y + d) = r(x,y)\) when \(a \times c > 0\), and in particular \(r(a \times x, c \times y) = r(x,y)\) when \(a \times c > 0\), means that the unit of measurement has no influence on the value of the correlation coefficient. E.g., when you are interested in the correlation coefficient between the height and the weight of persons, it does not matter whether you measure the weight in grams or in kilograms (or the height in centimeters or in meters).
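The invariance under a change of units can be checked numerically. Below is a small Python sketch with made-up height and weight values; `pearson_r` is a hypothetical helper written for this illustration, not a function from R Commander. As an extra illustration not stated above, the last line shows that the sign of the correlation flips when \(a \times c < 0\).

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's sample correlation coefficient for paired observations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up data for illustration only.
height_cm = [160.0, 172.0, 181.0, 168.0, 190.0]
weight_kg = [55.0, 70.0, 82.0, 64.0, 95.0]

# Change of units: centimeters -> meters (a = 1/100), kilograms -> grams (c = 1000).
height_m = [h / 100 for h in height_cm]
weight_g = [w * 1000 for w in weight_kg]

r_original = pearson_r(height_cm, weight_kg)
r_rescaled = pearson_r(height_m, weight_g)

# a * c < 0: multiplying one variable by -1 reverses the sign of r.
r_flipped = pearson_r([-h for h in height_cm], weight_kg)

print(abs(r_original - r_rescaled) < 1e-9)  # True
print(abs(r_original + r_flipped) < 1e-9)   # True
```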
Exercises to be done during the tutorial
Exercise 9.1 and Exercise 9.2 are in the presentation handouts of Tutorial 9. Check Brightspace for answers/feedback.
Exercise 9.1
Obstructive sleep apnea is a sleep disorder that causes a person to stop breathing momentarily and then awaken briefly. Sleep researchers theorize that 25% of the general population suffers from this disorder.
Researchers from a university found that \(7\) out of \(20\) commercial truck drivers suffered from obstructive sleep apnea.
a. Investigate, using the p-value approach, whether the proportion of truck drivers suffering from sleep apnea is larger than 25%. Mention all the steps.
b. Instead of the p-value approach, the rejection region approach can be used. Using the rejection region approach, provide steps 5, 6, 7 and 8 to investigate whether the proportion of truck drivers suffering from obstructive sleep apnea is larger than 25%.
Exercise 9.2
Based on:
Random sample of employees: \(n = 52\).
Figure 1 displays a scatter plot of first-year salary after graduation versus years of work experience prior to obtaining the MBA, and Table 1 shows the correlation matrix for all variables.
a. Judge the strength of the correlation based on the scatter plot shown in Figure 1.
b. Give the estimated correlation coefficient.
Post-class activity
Watch:
The clip is linked on Brightspace.
Exercises to be done after the tutorial
For answers/feedback check Brightspace.
Exercise 9.3
In 2006 it was shown that 6% of Dutch children between 7 and 9 years old suffer from dyslexia. In 2018 these children were among the students of Dutch universities. Any student suffering from dyslexia is entitled to have more time to finish her/his exam. Based on the number of requests for extra time a teacher thinks that less than 6% of the students at Wageningen University & Research have dyslexia. She takes a random sample of 100 students out of the student population at Wageningen University & Research and finds that 3 of them suffer from dyslexia.
You may use R Commander to find the p-value for this exercise: Distributions > Discrete distributions > Binomial distribution > Binomial tail probabilities\(\ldots\). Enter the correct values and choose the correct tail: the lower tail gives \(P(y \leq k)\) and the upper tail gives \(P(y > k)\).
a. Formulate the research question.
b. Apply the appropriate test, using \(\alpha = 0.10\), to answer the research question.
c. Are you surprised by the outcome of the test?
d. Use R Commander to find how many students out of 100 should have dyslexia to reject the null hypothesis: Distributions > Discrete distributions > Binomial distribution > Binomial quantiles\(\ldots\). Enter the correct values and choose the correct tail.
Exercise 9.4
This exercise is based on Exercise 4.45 O&L 6th Edition pp.210-211, which is not available in O&L 7th Edition. Therefore, the exercise is provided below. Use the correct binomial distribution table to answer the questions.
It was claimed that in an inspection of automobiles in Los Angeles, 60% of all automobiles did not meet the EPA regulations. A garage owner thinks that this percentage must be smaller. He takes a sample of 20 automobiles, of which 9 did not meet the EPA regulations.
a. Formulate the research question.
b. Apply the appropriate test, mentioning all steps, to answer the research question. Use \(\alpha = 10\%\).
c. How many cars out of a sample of 20 should meet the EPA regulations, when the garage owner is right?
Exercise 9.5
Based on:
Read the example in the book and have a look at the table with the data. Answer the following questions with the help of the R/R Commander output:
- Scatter plot of the number of eggs produced versus the body weight (Figure 2).
- Correlation matrix for the number of eggs produced and the body weight of female grasshoppers (Table 2)
| | eggs | weight |
|---|---|---|
| eggs | 1.0000 | 0.6059 |
| weight | 0.6059 | 1.0000 |
a. Based on the scatterplot (Figure 2), would you say that there is a relation between the weight of the female grasshopper and the number of eggs (give arguments)?
b. If a. is answered with ‘yes’, would you say this is a straight line relationship? (give arguments)
c. Note: Irrespective of the answers given in a. and b., proceed as if all assumptions for a straight line relationship are met. Give the estimator and the estimate for the population correlation \(\rho\).
d. Suppose the observations for which \(\mbox{weight} \geq 4.0\) would be deleted from the data set and the correlation would be calculated again. Will the correlation be unchanged, larger or smaller? Give arguments.
Exercise 9.6
Based on:
Read the example in the book and have a look at the table with the data. Answer the following questions with the help of the R/R Commander output:
- Scatter plot of the productivity index versus the aptitude test score (Figure 3).
- Correlation matrix for the productivity index and the aptitude test score of employees (Table 3).
| | productivity | aptitude |
|---|---|---|
| productivity | 1.0000 | 0.6456 |
| aptitude | 0.6456 | 1.0000 |
a. Does the scatter plot suggest a straight line relation between the productivity index and the aptitude test score of employees?
b. If yes, is it a positive or negative relationship? Give arguments.