Computer Practicum 4

This computer practicum contains the following three parts:

Tasting coffee.
Weight of bags with coffee beans.
Distinguishing between organic and regular coffee.

So far during the tutorials three different \(t\)-tests have been introduced. The difficult part is not applying the tests, but choosing the appropriate test.

In this computer practicum three different situations are presented. For each situation, the appropriate test or tests must be chosen to answer the research question. To aid you, your computer practicum teacher will supply an answer form with a specific structure.

The example in the next section shows how the answer form should be filled in. Review the example and its answers carefully, and make sure you understand them before you start with Part 1.

Example

In \(1996\) the American Department For International Development (DFID) donated \(283{,}300\) cubic tons of soy beans enriched with vitamin C (CSB). The enrichment was such that the expected amount of vitamin C equaled \(40\) mg / \(100\) g.

In \(8\) random samples containing \(100\) g of CSB each, the measured amounts of vitamin C in mg per \(100\) g CSB were found to be: \(26,\ 31,\ 23,\ 22,\ 11,\ 22,\ 14,\ 31\)

Table 1 shows the answer form filled in for the example.

Table 1: Answer form filled in for the example

	To be filled before collecting data
Research question	Does the expected amount of vitamin C per \(100\) g CSB deviate from \(40\) mg?
Name of the test	one-sample \(t\)-test
variable(s) (symbol(s) & definition(s))	\(y =\) amount of vitamin C (mg / \(100\) g CSB)
parameter(s) of interest (symbol(s) & definition(s))	\(\mu_y =\) (population) mean amount of vitamin C (mg / \(100\) g CSB)
null hypothesis (\(\mbox{H}_0\))	\(\mbox{H}_0:\ \mu_y = 40\)
alternative hypothesis (\(\mbox{H}_{\mbox{a}}\))	\(\mbox{H}_{\mbox{a}}:\ \mu_y \neq 40\)
Test Statistic (T.S.) (symbol & formula)	\(t = \frac{\bar{y} - \mu_0}{SE(\bar{y})} = \frac{\bar{y} - 40}{(s_y /\sqrt{8})}\)
Distribution T.S. under \(\mbox{H}_0\)	Student’s \(t\)-distribution with \(\nu = 7\) degrees of freedom.
Behavior T.S. under \(\mbox{H}_{\mbox{a}}\)	~~higher~~ / ~~lower~~ / higher or lower
sidedness \(p\)-value	~~right-tailed~~ / ~~left-tailed~~ / two-tailed

	Use collected data in R Commander and fill in the results
Levene’s test	Applicable: ~~yes~~ / no; \(H_0:\) ; \(H_a:\) ; Outcome T.S.: \(F \approx \ldots\); \(p\)-value: \(\ldots\); Assume equal variances: yes / no
(point) estimate parameter	\(\bar{y} = 22.5\)
SE of the estimate	\(\mbox{SE}(\bar{y}) \approx 2.54\)
outcome T.S.	\(t \approx -6.88\)
\(p\)-value	\(2 \times P(t \leq -6.88) \approx 0.0002\)
Statistical conclusion	\((p\mbox{-value} \approx 0.0002) < (\alpha = 0.05)\) Reject \(H_0\), and \(H_a\) has been shown.
Conclusion	It is shown (when \(\alpha = 0.05\)) that the expected vitamin content in 100 grams CSB deviates from \(40\) mg, in such a way that it is actually less than \(40\) mg (the estimate is lower than the hypothesized value).
Confidence Interval	\(95\%\) CI for \(\mu_y = (16.49,\ 28.51)\)
Possible error Type I or II	It is possible, that a Type I error has been made.
Rejection Region (R.R.)	\(|t| \geq 2.365\) (from O&L Table \(2\) with \(\alpha / 2 = 0.025\) and \(\nu = 7\) degrees of freedom)
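
The numbers in Table 1 can be cross-checked outside R Commander. The following is a minimal Python sketch using only the standard library; it is not part of the practicum workflow. The critical value \(2.365\) is copied from the rejection-region row of Table 1. The \(p\)-value is approximated by numerically integrating the \(t\) density with Simpson's rule, which is an illustrative choice and not necessarily how R Commander computes it. The helper names `t_pdf` and `simpson` are made up for this sketch.

```python
import math
import statistics

# Vitamin C measurements (mg per 100 g CSB) from the example.
vitamin_c = [26, 31, 23, 22, 11, 22, 14, 31]
mu_0 = 40        # hypothesized mean under H0
t_crit = 2.365   # tabulated critical value (alpha/2 = 0.025, nu = 7), taken from Table 1

n = len(vitamin_c)
df = n - 1
y_bar = statistics.mean(vitamin_c)
s_y = statistics.stdev(vitamin_c)   # sample standard deviation (divides by n - 1)
se = s_y / math.sqrt(n)             # standard error of the mean
t_stat = (y_bar - mu_0) / se        # one-sample t statistic


def t_pdf(x, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)


def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


# Area between 0 and |t|; the two tails together hold the rest of the probability.
central = simpson(lambda x: t_pdf(x, df), 0.0, abs(t_stat))
p_two_sided = 2 * (0.5 - central)

# 95% confidence interval for the population mean.
ci = (y_bar - t_crit * se, y_bar + t_crit * se)

print("mean:", round(y_bar, 2))
print("SE:", round(se, 2))
print("t:", round(t_stat, 2))
print("two-sided p-value:", round(p_two_sided, 4))
print("95% CI:", tuple(round(v, 2) for v in ci))
```

The printed values should agree with the rows of Table 1 up to rounding.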

Learning objectives

After this computer practicum the student should be able to perform the following tests in R Commander, explain why the chosen test is the appropriate one given the research question and data, and interpret the R Commander output:

\(t\)-test for a (population) mean \(\mu\), a.k.a. one-sample \(t\)-test;
Levene’s test;
\(t\)-test for the difference in (population) means \(\mu_1 - \mu_2\), a.k.a. independent samples \(t\)-test;
paired samples \(t\)-test.

Part 1 - Tasting Coffee

An experiment investigated whether consumers were more positive about the taste of a certain brand of coffee when tasting it at home than when tasting it in a laboratory environment. Ten randomly selected consumers were asked to score their judgement of the coffee, both at home and in a laboratory environment, by moving an arrow on a vertical axis with numbers between 0 and 100.

a) Fill in the part “To be filled before collecting data” of the answer form for Part 1.

The data collected from the experiment is given in Table 2.

Table 2: Judgement scores by consumers for tasting at home and in a laboratory

Consumer	1	2	3	4	5	6	7	8	9	10
Judgement score at home	50	76	81	60	30	70	74	64	76	70
Judgement score in the lab	55	74	79	49	34	65	68	66	73	64

b) Create a new data set in R Commander from the data in Table 2 by going to: Data > New data set\(\ldots\) Give it a sensible name, e.g., “tasting_coffee”. Set the data set up in the appropriate way: think about the number of rows and columns required to properly reflect the given case.

Part 1c): An additional variable, or additional variables, may be required to check the normality assumption.

c) Check with the appropriate graph(s) whether the normality assumption holds for the variable(s). Mention the graph(s), the variable(s) you used to make the graph(s), and your conclusion(s) with respect to normality. Write your answers on the answer form.
d) Check the mean(s), standard deviation(s), and standard error(s), i.e., make summaries of the variable(s) in R Commander and write down the values on the answer form.
e) Perform Levene’s test if appropriate and fill in the row for Levene’s test on the answer form. For Levene’s test go to: Statistics > Variances > Levene’s test\(\ldots\), select Center: “mean” by changing the radio button. Click the OK button to perform the test.
f) Continue with the test procedure to answer the research question about tasting coffee: perform the appropriate analysis in R Commander. Use \(\alpha = 0.05\) and request a \(95\%\) Confidence Interval.
g) On the answer form, fill in the part of the large table in the first column for Part 1 up to ‘Confidence Interval’.
h) Decide whether the \(95\%\) Confidence Interval can be read from the output generated at Part 1f) or whether new output is required. In the latter case, generate that output. Write down the correct \(95\%\) Confidence Interval for the parameter(s) of interest on the answer form.
i) Could an error of Type I or II have been made here? Explain your answer.
j) Instead of the \(p\)-value, the rejection region (R.R.) could also have been used. Use R Commander to find the correct rejection region. Go to: Distributions, choose quantiles for the correct distribution.
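
As background on what a quantile lookup does, the sketch below recovers the critical value from the worked example (\(\alpha / 2 = 0.025\), \(\nu = 7\), tabulated as \(2.365\)) in plain Python. It is only an independent cross-check under stated assumptions: the function names are made up for this sketch, the CDF is obtained by numerically integrating the \(t\) density, and the quantile is found by bisection. R Commander's Distributions menu remains the tool to use for the practicum itself.

```python
import math


def t_pdf(x, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)


def t_cdf(x, nu, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and the symmetry around 0."""
    a, b = 0.0, abs(x)
    h = (b - a) / steps
    total = t_pdf(a, nu) + t_pdf(b, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, nu)
    area = total * h / 3
    return 0.5 + math.copysign(area, x)


def t_quantile(p, nu, lo=-50.0, hi=50.0):
    """Value q with P(T <= q) = p, found by bisection on the CDF."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


alpha = 0.05
nu = 7
critical = t_quantile(1 - alpha / 2, nu)

# Two-tailed rejection region: |t| >= critical
print("critical value:", round(critical, 3))
```

For a one-tailed test the whole of \(\alpha\) goes into a single tail, so the quantile would be taken at \(1 - \alpha\) (right-tailed) or \(\alpha\) (left-tailed) instead of \(1 - \alpha / 2\).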

Part 2 - Weight of bags with coffee beans

A machine at Lavazza, a famous brand of coffee, packages coffee beans in bags weighing half a kilo (500 grams). A randomly selected half-kilo bag with coffee beans has a net weight \(y\), which is assumed to be normally distributed with expectation \(\mu_y\) and standard deviation \(\sigma_y\). Both \(\mu_y\) and \(\sigma_y\) are unknown and are expressed in grams [g].

From the packaging machine at the Lavazza factory \(10\) half-kilo bags with coffee beans are randomly selected to check whether the weight is in line with the machine settings. These \(10\) bags are, therefore, a simple random sample.

a) Fill in the part “To be filled before collecting data” of the answer form for Part 2.
b) Load the data in the file “BSP4_Weight_Coffee_Bags.RData” into R Commander.
c) Check with the appropriate graph(s) whether the normality assumption holds for the variable(s). Mention the graph(s), the variable(s) you used to make the graph(s), and your conclusion(s) with respect to normality. Write your answers on the answer form.
d) Check the mean(s), standard deviation(s), and standard error(s), i.e., make summaries of the variable(s) in R Commander and write down the values on the answer form.
e) Perform Levene’s test if appropriate and fill in the row for Levene’s test on the answer form. For Levene’s test go to: Statistics > Variances > Levene’s test\(\ldots\), select Center: “mean” by changing the radio button. Click the OK button to perform the test.
f) Continue with the test procedure to answer the research question about weight of coffee bean bags: perform the appropriate analysis in R Commander. Use \(\alpha = 0.10\) and request a \(90\%\) Confidence Interval.
g) On the answer form, fill in the part of the large table in the first column for Part 2 up to ‘Confidence Interval’.
h) Decide whether the \(90\%\) Confidence Interval can be read from the output generated at Part 2f) or whether new output is required. In the latter case, generate that output. Write down the correct \(90\%\) Confidence Interval for the parameter(s) of interest on the answer form.
i) Could an error of Type I or II have been made here? Explain your answer.
j) Instead of the \(p\)-value, the rejection region (R.R.) could also have been used. Use R Commander to find the correct rejection region. Go to: Distributions, choose quantiles for the correct distribution.

Part 3 - Distinguishing between organic and regular coffee

The following is based on an article from Resource at Wageningen University & Research.

A laboratory test can help distinguish between organic coffee and regular (i.e., non-organic) coffee and, as a result, can help expose fraud concerning food authenticity. Researchers at Wageningen Food Safety Research have shown this by letting a machine ‘smell’ coffee aroma. Various types of coffee could be distinguished by looking at the mixture of substances responsible for the aroma. The researchers extracted air from a bottle containing half a gram of coffee. A so-called mass spectrometer determined exactly which aromatic substances were present in the extracted air. Because coffee can release about 900 different aromatic substances, each type of coffee has its own aromatic profile, which can be considered a fingerprint of the coffee. These aromatic profiles appeared to be very different for organic coffee and regular coffee.

In this part of the computer practicum, the focus will be on one specific aromatic substance: ethyl-dimethyl-pyrazine, also referred to as ion 137 (variable: “ion137”). This substance has a real coffee aroma. A significant difference in measured intensity of ion 137 between organic coffee and regular coffee, when running a two-tailed hypothesis test, would indicate that this aromatic substance can help to distinguish between organic coffee and regular coffee.

The data set consists of \(65\) observations for regular coffee and \(43\) observations for organic coffee. The observations can be assumed to be independent and normally distributed within both groups, as well as independent between both groups.

a) Fill in the part “To be filled before collecting data” of the answer form for Part 3.
b) Load the data in the file “BSP4_Regular_Organic.RData” into R Commander.
c) Check with the appropriate graph(s) whether the normality assumption holds for the variable(s). Mention the graph(s), the variable(s) you used to make the graph(s), and your conclusion(s) with respect to normality. Write your answers on the answer form.
d) Check the mean(s), standard deviation(s), and standard error(s), i.e., make summaries of the variable(s) in R Commander and write down the values on the answer form.
e) Perform Levene’s test if appropriate and fill in the row for Levene’s test on the answer form. For Levene’s test go to: Statistics > Variances > Levene’s test\(\ldots\), select Center: “mean” by changing the radio button. Click the OK button to perform the test.
f) Continue with the test procedure to answer the research question about distinguishing between organic and regular coffee: perform the appropriate analysis in R Commander. Use \(\alpha = 0.05\) and request a \(95\%\) Confidence Interval.
g) On the answer form, fill in the part of the large table in the first column for Part 3 up to ‘Confidence Interval’.
h) Decide whether the \(95\%\) Confidence Interval can be read from the output generated at Part 3f) or whether new output is required. In the latter case, generate that output. Write down the correct \(95\%\) Confidence Interval for the parameter(s) of interest on the answer form.
i) Could an error of Type I or II have been made here? Explain your answer.
j) Instead of the \(p\)-value, the rejection region (R.R.) could also have been used. Use R Commander to find the correct rejection region. Go to: Distributions, choose quantiles for the correct distribution.
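
Levene’s test with Center: “mean” appears in the step list of every part. As background on what its \(F\) value measures, here is a minimal Python sketch of the statistic: each observation is replaced by its absolute deviation from its own group mean, and a one-way ANOVA \(F\) ratio is then computed on those deviations. The function name `levene_statistic` and the two small groups are made up for illustration and are not practicum data. The \(p\)-value, which comes from an \(F\) distribution with \(k - 1\) and \(N - k\) degrees of freedom, is left to R Commander.

```python
import statistics


def levene_statistic(groups):
    """Levene's F statistic with the group mean as center."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    # Absolute deviations of each observation from its own group mean.
    z = [[abs(y - statistics.mean(g)) for y in g] for g in groups]
    z_means = [statistics.mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total

    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)

    return (n_total - k) / (k - 1) * between / within


# Hypothetical data: group_b is more spread out than group_a.
group_a = [1, 2, 3]
group_b = [0, 2, 4]
f_value = levene_statistic([group_a, group_b])
print("Levene F:", round(f_value, 3))
```

A large \(F\) value, and hence a small \(p\)-value, points to unequal variances between the groups; a value near zero means the typical deviation from the group mean is about the same in each group.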