Species Data
Consider the plant species count data used in lecture of Generalized linear model.
- Redo the analysis using the
Biomass
andpH
variables as predictors for species counts (as in slide 9). Find separate expressions for the estimated linear predictors \(\hat{\eta}_j\) for eachpH
-level \(j\) = “low
”,“medium
”,“high
”. Also find the estimated expected count forBiomass = 2
andpH = high
.
- Redo the analysis using the
First we fit the model:
mod2.glm <- glm(Species ~ Biomass + pH, family = poisson, data = species)
summod2 <- summary(mod2.glm)
summod2$coefficients
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.849 0.0528 72.89 0.00e+00
Biomass -0.128 0.0101 -12.58 2.76e-36
pH(low) -1.136 0.0672 -16.91 3.77e-64
pH(mid) -0.445 0.0549 -8.11 4.88e-16
Then the estimated linear predictors are:
\[\eta_\text{low} = 3.85 - 0.128\cdot \text{Biomass} - 1.14\]
\[\eta_\text{medium} = 3.85 - 0.128\cdot \text{Biomass} - 0.45\]
\[\eta_\text{high} = 3.85 - 0.128\cdot \text{Biomass}\]
For Biomass = 2
and pH = high
the estimated expected count is:
\[\hat{\lambda} = \exp(3.85 - 0.128\cdot 2) = 36.4\]
Alternatively we could compute this in R by:
predict(mod2.glm, newdata = data.frame(Biomass = 2, pH = "high"), type = "response")
1
36.4
- Fit the same model as in a. using the quasipoisson approach. Is there any evidence of overdisperion?
mod2q.glm <- glm(Species ~ Biomass + pH, family = quasipoisson, data = species)
summary(mod2q.glm)
Call:
glm(formula = Species ~ Biomass + pH, family = quasipoisson,
data = species)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.596 -0.699 -0.074 0.665 3.560
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.8489 0.0556 69.28 < 2e-16 ***
Biomass -0.1276 0.0107 -11.96 < 2e-16 ***
pH(low) -1.1364 0.0707 -16.07 < 2e-16 ***
pH(mid) -0.4452 0.0577 -7.71 0.00000000002 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasipoisson family taken to be 1.11)
Null deviance: 452.346 on 89 degrees of freedom
Residual deviance: 99.242 on 86 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 4
The dispersion parameter is only marginally larger than 1. We conclude that over-disperion is not a problem here.
- Extend the model from a. by including an interaction between Biomass and pH. Test the significance of the interaction. Is there a significant difference between pH=“high” and pH=“medium” with regard to the effect of Biomass to the species count?
mod3.glm <- glm(Species ~ Biomass * pH, family = poisson, data = species)
summary(mod3.glm)
Call:
glm(formula = Species ~ Biomass * pH, family = poisson, data = species)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.498 -0.748 -0.040 0.557 3.230
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.7681 0.0615 61.24 < 2e-16 ***
Biomass -0.1071 0.0125 -8.58 < 2e-16 ***
pH(low) -0.8156 0.1028 -7.93 2.2e-15 ***
pH(mid) -0.3315 0.0922 -3.60 0.00032 ***
Biomass:pH(low) -0.1550 0.0400 -3.87 0.00011 ***
Biomass:pH(mid) -0.0319 0.0231 -1.38 0.16695
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 452.346 on 89 degrees of freedom
Residual deviance: 83.201 on 84 degrees of freedom
AIC: 514.4
Number of Fisher Scoring iterations: 4
anova(mod2.glm, mod3.glm, test = "Chi")
Analysis of Deviance Table
Model 1: Species ~ Biomass + pH
Model 2: Species ~ Biomass * pH
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 86 99.2
2 84 83.2 2 16 0.00033 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The interaction term is highly significant with a p-value of 0.00033. The Wald tests (the approximate normal test) given in the summary output gave a p-value of 0.167 for the term Biomass:pHmid
which really is the contrast between interactions Biomass:pHhigh
and Biomass:pHmedium
since pHhigh
is the reference level in this model. The relatively large p-value says that there is no evidence that Biomass has different effect on counts for these two pH-levels.
- Optional R-programming exercise for those interested: Reproduce the graph on slide 3 of lesson 10 and add curves visualizing the estimated expected counts as a function of Biomass and with separate curves for the pH-levels. The curves should be based on the fitted model with interaction between Biomass and pH-level.