Data from the NSR education test
The Norwegian Centre for Science Recruitment (NSR) has an online “education test” where youths may answer a questionnaire to check their so-called cognitive types, their science interest, their preferred learning methods and their interest in various science subjects. The test suggests different areas within STEM (Science, Technology, Engineering and Mathematics) in which the youth may find suitable work.
We have an excerpt of these data, which can be downloaded from Canvas as the file nsr.rdata. The data.frame NSRdata contains two variables, Science and Age:
load("nsr.rdata")  # assumes nsr.rdata is in the working directory
head(NSRdata, 5)
    Science Age
101     4.2  16
102     4.2  16
103     4.2  16
104     3.0  19
105     2.8  16
Science is an average liking score (scale 1-6) for various STEM subjects, and Age is a factor indicating the age group:
- 1: 1-12 yrs
- 13: 13-15 yrs
- 16: 16-19 yrs
- 19: 19-29 yrs
- 30: 30+ yrs
- Perform an analysis of the NSR data to check whether Age influences the liking of STEM subjects. State the model, fit the model, check model assumptions, test hypotheses, and give model critique. Write a short summary of the results.
Answer will come later.
- Follow up the previous exercise by performing a Tukey test for all pairwise comparisons with an overall (family-wise) error rate of 5%. Give a summary of the results.
library(mixlm)  # assumed source of Anova(), simple.glht() and cld()
NSRmod <- lm(Science ~ Age, data = NSRdata)
Anova(NSRmod, type = "III")
Anova Table (Type III tests)
Response: Science
            Sum Sq   Df F value Pr(>F)
(Intercept)   1601    1  1557.6 <2e-16 ***
Age            225    4    54.7 <2e-16 ***
Residuals    10270 9995
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
pt <- simple.glht(NSRmod, "Age", corr = "Tukey", level = 0.95)
print(pt)
Simultaneous Confidence Intervals and Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: lm(formula = Science ~ Age, data = NSRdata)
Quantile = 2.73
Minimum significant difference = 0.157
95% confidence level
Linear Hypotheses:
        Lower  Center   Upper Std.Err t value      P(>t)
1-13   0.0455  0.2021  0.3588  0.0574    3.52     0.0040 **
1-16  -0.0183  0.1384  0.2951  0.0574    2.41     0.1126
1-19   0.1079  0.2646  0.4212  0.0574    4.61 0.00004063 ***
1-30  -0.3466 -0.1899 -0.0333  0.0574   -3.31     0.0084 **
13-16 -0.2204 -0.0637  0.0929  0.0574   -1.11     0.8015
13-19 -0.0942  0.0624  0.2191  0.0574    1.09     0.8132
13-30 -0.5487 -0.3921 -0.2354  0.0574   -6.83    < 2e-16 ***
16-19 -0.0305  0.1262  0.2828  0.0574    2.20     0.1808
16-30 -0.4850 -0.3283 -0.1717  0.0574   -5.72 0.00000011 ***
19-30 -0.6112 -0.4545 -0.2978  0.0574   -7.91    < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
WARNING: Unbalanced data may lead to poor estimates
The ANOVA table shows a clearly significant effect of age group on the level of interest in science subjects. Hence, at least two age groups have significantly different expected means. We can use the Tukey test to identify which pairs differ.
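As a quick sanity check of the ANOVA table, the F value for Age can be recomputed from the printed sums of squares and degrees of freedom. This is only a sketch: the inputs are the rounded values copied from the table, so the result agrees with the printed 54.7 only approximately, and the variable names are chosen here for illustration.

```r
# Rounded values copied from the Type III ANOVA table
ss_age <- 225     # Sum Sq for Age
df_age <- 4       # Df for Age (5 groups - 1)
ss_res <- 10270   # residual Sum Sq
df_res <- 9995    # residual Df

# F = mean square for Age / residual mean square
f_value <- (ss_age / df_age) / (ss_res / df_res)

# Upper-tail probability of the F distribution under H0 (equal group means)
p_value <- pf(f_value, df_age, df_res, lower.tail = FALSE)

round(f_value, 1)
p_value < 2e-16
```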
The Tukey output gives the difference in means (column “Center”) for every pair of groups. The minimum significant difference, reported at the top of the output, is 0.157: any two age groups whose averages differ by more than 0.157 are significantly different according to the Tukey test. From the adjusted p-values we see that 6 of the 10 pairs are significantly different, and the largest difference (0.45) is found between the age groups 30+ and 19-29.
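The minimum significant difference can be reproduced, approximately, as the Tukey critical value times the common standard error of a difference. The sketch below uses only numbers copied from the output above and base R's qtukey(); it assumes that the reported quantile is the studentized range quantile divided by sqrt(2), which matches the printed 2.73. Variable names are illustrative.

```r
# Values copied from the output above
n_groups <- 5        # number of age groups
df_resid <- 9995     # residual degrees of freedom
se_diff  <- 0.0574   # Std.Err column (same for all pairs in this output)

# Tukey critical value on the t scale (assumed: studentized range / sqrt(2))
q_crit <- qtukey(0.95, nmeans = n_groups, df = df_resid) / sqrt(2)
msd <- q_crit * se_diff

# Estimated differences in means ("Center" column)
diffs <- c("1-13" = 0.2021, "1-16" = 0.1384, "1-19" = 0.2646,
           "1-30" = -0.1899, "13-16" = -0.0637, "13-19" = 0.0624,
           "13-30" = -0.3921, "16-19" = 0.1262, "16-30" = -0.3283,
           "19-30" = -0.4545)

# A pair is significant when its absolute difference exceeds the MSD
significant <- abs(diffs) > msd
n_significant <- sum(significant)
largest_pair <- names(diffs)[which.max(abs(diffs))]

round(c(quantile = q_crit, msd = msd), 3)
n_significant
largest_pair
```

Comparing abs(diffs) with msd flags the same pairs as the adjusted p-values below 0.05 in the table.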
cld(pt)
Tukey's HSD
Alpha: 0.05
   Mean G1 G2 G3
30 3.33 A
1  3.14    B
16 3.00    B  C
13 2.94       C
19 2.88       C
The compact letter display groups the levels that are not significantly different from each other. There are three such groups: 30+ differs from all other levels, 1-12 and 16-19 are similar, and 16-19, 13-15 and 19-29 are similar.
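The letter grouping can be checked against the Tukey p-values: two levels should share a letter exactly when their pairwise comparison is not significant at the 5% level. The following is a minimal sketch of that check in base R, with p-values and groups typed in from the output above; entries printed as "< 2e-16" are entered as 2e-16, and the helper letters_of() is a hypothetical name introduced here.

```r
# Adjusted p-values copied from the Tukey output ("< 2e-16" entered as 2e-16)
p <- c("1-13" = 0.0040, "1-16" = 0.1126, "1-19" = 0.00004063,
       "1-30" = 0.0084, "13-16" = 0.8015, "13-19" = 0.8132,
       "13-30" = 2e-16, "16-19" = 0.1808, "16-30" = 0.00000011,
       "19-30" = 2e-16)

# Letter groups read off the compact letter display
groups <- list(A = "30", B = c("1", "16"), C = c("16", "13", "19"))

# Letters assigned to a given level (hypothetical helper)
letters_of <- function(lv) {
  names(groups)[sapply(groups, function(g) lv %in% g)]
}

# For each pair: do the two levels share at least one letter?
share_letter <- sapply(names(p), function(key) {
  lv <- strsplit(key, "-")[[1]]
  length(intersect(letters_of(lv[1]), letters_of(lv[2]))) > 0
})

# Consistent display: shared letter <=> non-significant pair
consistent <- all((p > 0.05) == share_letter)
consistent
names(p)[share_letter]
```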