Strength of chemical pastes
pastes.Rd
A chemical paste product is delivered in casks, and there are thought to be variations in the mean strength of the paste between delivery batches. When a batch arrives, the paste is tested by sampling casks from the batch and then analysing material from the casks. Further sources of error arise in the sampling and in the analysis. The data come from ten randomly chosen delivery batches; from each, three casks are selected at random, and two analyses are carried out on the contents of each selected cask. The response is the percentage paste strength in the analysed sample.
Format
A data frame with 60 rows and 4 columns:
- batch
Factor. Batch number (1 to 10)
- cask
Factor. Cask number within each cask(1 to 3)
- sample
Factor. Sample number within each cask (1 or 2)
- strength
Percentage paste strength in the analysed sample
Source
Davies, O.L. and Goldsmith, P.L. (eds.) (1972) Statistical Methods in Research and Production, 4th Edition, Edinburgh: Oliver and Boyd, 136.
Details
Davies and Goldsmith use the following data to demonstrate the use of analysis of variance in a hierarchical situation where the aim is to estimate variance components.
This data is also useful to demonstrate the difference between crossed and
nested random effect structures in linear mixed models in R using the
lme4
package.
Examples
library(lme4)
# The cask numbers are not uniquely defined across batches
xtabs(~ batch + cask, pastes)
#> cask
#> batch 1 2 3
#> 1 2 2 2
#> 2 2 2 2
#> 3 2 2 2
#> 4 2 2 2
#> 5 2 2 2
#> 6 2 2 2
#> 7 2 2 2
#> 8 2 2 2
#> 9 2 2 2
#> 10 2 2 2
# This can lead to a "crossed" random effect design if the model terms are
# not correctly specified
m1 <- lmer(strength ~ (1 | batch) + (1 | cask), data = pastes)
summary(m1)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: strength ~ (1 | batch) + (1 | cask)
#> Data: pastes
#>
#> REML criterion at convergence: 301.5
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -1.49025 -0.90096 -0.01247 0.62911 1.82246
#>
#> Random effects:
#> Groups Name Variance Std.Dev.
#> batch (Intercept) 3.3639 1.8341
#> cask (Intercept) 0.1487 0.3856
#> Residual 7.3060 2.7030
#> Number of obs: 60, groups: batch, 10; cask, 3
#>
#> Fixed effects:
#> Estimate Std. Error t value
#> (Intercept) 60.0533 0.7125 84.28
# The model identifies 3 different casks used repeatedly over the 10 batches (WRONG)
summary(m1)$ngrps
#> batch cask
#> 10 3
# Specifying the random effect structure avoids this issue, and correctly leads
# to a "nested" random effect structure
m2 <- lmer(strength ~ (1 | batch / cask), data = pastes)
summary(m2)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: strength ~ (1 | batch/cask)
#> Data: pastes
#>
#> REML criterion at convergence: 247
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -1.4798 -0.5156 0.0095 0.4720 1.3897
#>
#> Random effects:
#> Groups Name Variance Std.Dev.
#> cask:batch (Intercept) 8.434 2.9041
#> batch (Intercept) 1.657 1.2874
#> Residual 0.678 0.8234
#> Number of obs: 60, groups: cask:batch, 30; batch, 10
#>
#> Fixed effects:
#> Estimate Std. Error t value
#> (Intercept) 60.0533 0.6769 88.72
# There are indeed 30 combinations of casks and batches (3 *different* casks used
# in each batch)
summary(m2)$ngrps
#> cask:batch batch
#> 30 10