Species Data

Consider the plant species count data used in lecture of Generalized linear model.

    1. Redo the analysis using the Biomass and pH variables as predictors for species counts (as in slide 9). Find separate expressions for the estimated linear predictors \(\hat{\eta}_j\) for each pH-level \(j\) = “low”,“medium”,“high”. Also find the estimated expected count for Biomass = 2 and pH = high.

First we fit the model:

mod2.glm <- glm(Species ~ Biomass + pH, family = poisson, data = species)
summod2 <- summary(mod2.glm)
summod2$coefficients
            Estimate Std. Error z value Pr(>|z|)
(Intercept)    3.849     0.0528   72.89 0.00e+00
Biomass       -0.128     0.0101  -12.58 2.76e-36
pH(low)       -1.136     0.0672  -16.91 3.77e-64
pH(mid)       -0.445     0.0549   -8.11 4.88e-16

Then the estimated linear predictors are:

\[\eta_\text{low} = 3.85 - 0.128\cdot \text{Biomass} - 1.14\]

\[\eta_\text{medium} = 3.85 - 0.128\cdot \text{Biomass} - 0.45\]

\[\eta_\text{high} = 3.85 - 0.128\cdot \text{Biomass}\]

For Biomass = 2 and pH = high the estimated expected count is:

\[\hat{\lambda} = \exp(3.85 - 0.128\cdot 2) = 36.4\]

Alternatively we could compute this in R by:

predict(mod2.glm, newdata = data.frame(Biomass = 2, pH = "high"), type = "response")
   1 
36.4 
    1. Fit the same model as in a. using the quasipoisson approach. Is there any evidence of overdisperion?
mod2q.glm <- glm(Species ~ Biomass + pH, family = quasipoisson, data = species)
summary(mod2q.glm)

Call:
glm(formula = Species ~ Biomass + pH, family = quasipoisson, 
    data = species)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-2.596  -0.699  -0.074   0.665   3.560  

Coefficients:
            Estimate Std. Error t value      Pr(>|t|)    
(Intercept)   3.8489     0.0556   69.28       < 2e-16 ***
Biomass      -0.1276     0.0107  -11.96       < 2e-16 ***
pH(low)      -1.1364     0.0707  -16.07       < 2e-16 ***
pH(mid)      -0.4452     0.0577   -7.71 0.00000000002 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasipoisson family taken to be 1.11)

    Null deviance: 452.346  on 89  degrees of freedom
Residual deviance:  99.242  on 86  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 4

The dispersion parameter is only marginally larger than 1. We conclude that over-disperion is not a problem here.

    1. Extend the model from a. by including an interaction between Biomass and pH. Test the significance of the interaction. Is there a significant difference between pH=“high” and pH=“medium” with regard to the effect of Biomass to the species count?
mod3.glm <- glm(Species ~ Biomass * pH, family = poisson, data = species)
summary(mod3.glm)

Call:
glm(formula = Species ~ Biomass * pH, family = poisson, data = species)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-2.498  -0.748  -0.040   0.557   3.230  

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)       3.7681     0.0615   61.24  < 2e-16 ***
Biomass          -0.1071     0.0125   -8.58  < 2e-16 ***
pH(low)          -0.8156     0.1028   -7.93  2.2e-15 ***
pH(mid)          -0.3315     0.0922   -3.60  0.00032 ***
Biomass:pH(low)  -0.1550     0.0400   -3.87  0.00011 ***
Biomass:pH(mid)  -0.0319     0.0231   -1.38  0.16695    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 452.346  on 89  degrees of freedom
Residual deviance:  83.201  on 84  degrees of freedom
AIC: 514.4

Number of Fisher Scoring iterations: 4
anova(mod2.glm, mod3.glm, test = "Chi")
Analysis of Deviance Table

Model 1: Species ~ Biomass + pH
Model 2: Species ~ Biomass * pH
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)    
1        86       99.2                         
2        84       83.2  2       16  0.00033 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The interaction term is highly significant with a p-value of 0.00033. The Wald tests (the approximate normal test) given in the summary output gave a p-value of 0.167 for the term Biomass:pHmid which really is the contrast between interactions Biomass:pHhighand Biomass:pHmedium since pHhigh is the reference level in this model. The relatively large p-value says that there is no evidence that Biomass has different effect on counts for these two pH-levels.

    1. Optional R-programming exercise for those interested: Reproduce the graph on slide 3 of lesson 10 and add curves visualizing the estimated expected counts as a function of Biomass and with separate curves for the pH-levels. The curves should be based on the fitted model with interaction between Biomass and pH-level.

Answers from students