Skip to contents

A chemical paste product is delivered in casks, and there are thought to be variations in the mean strength of the paste between delivery batches. When a batch arrives, the paste is tested by sampling casks from the batch and then analysing material from the casks. Further sources of error arise in the sampling and in the analysis. The data come from ten randomly chosen delivery batches; from each, three casks are selected at random, and two analyses are carried out on the contents of each selected cask. The response is the percentage paste strength in the analysed sample.

Usage

pastes

Format

A data frame with 60 rows and 4 columns:

batch

Factor. Batch number (1 to 10)

cask

Factor. Cask number within each cask(1 to 3)

sample

Factor. Sample number within each cask (1 or 2)

strength

Percentage paste strength in the analysed sample

Source

Davies, O.L. and Goldsmith, P.L. (eds.) (1972) Statistical Methods in Research and Production, 4th Edition, Edinburgh: Oliver and Boyd, 136.

Details

Davies and Goldsmith use the following data to demonstrate the use of analysis of variance in a hierarchical situation where the aim is to estimate variance components.

This data is also useful to demonstrate the difference between crossed and nested random effect structures in linear mixed models in R using the lme4 package.

Examples

library(lme4)

# The cask numbers are not uniquely defined across batches
xtabs(~ batch + cask, pastes)
#>      cask
#> batch 1 2 3
#>    1  2 2 2
#>    2  2 2 2
#>    3  2 2 2
#>    4  2 2 2
#>    5  2 2 2
#>    6  2 2 2
#>    7  2 2 2
#>    8  2 2 2
#>    9  2 2 2
#>    10 2 2 2

# This can lead to a "crossed" random effect design if the model terms are
# not correctly specified
m1 <- lmer(strength ~ (1 | batch) + (1 | cask), data = pastes)
summary(m1)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: strength ~ (1 | batch) + (1 | cask)
#>    Data: pastes
#> 
#> REML criterion at convergence: 301.5
#> 
#> Scaled residuals: 
#>      Min       1Q   Median       3Q      Max 
#> -1.49025 -0.90096 -0.01247  0.62911  1.82246 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  batch    (Intercept) 3.3639   1.8341  
#>  cask     (Intercept) 0.1487   0.3856  
#>  Residual             7.3060   2.7030  
#> Number of obs: 60, groups:  batch, 10; cask, 3
#> 
#> Fixed effects:
#>             Estimate Std. Error t value
#> (Intercept)  60.0533     0.7125   84.28
# The model identifies 3 different casks used repeatedly over the 10 batches (WRONG)
summary(m1)$ngrps
#> batch  cask 
#>    10     3 

# Specifying the random effect structure avoids this issue, and correctly leads
# to a "nested" random effect structure
m2 <- lmer(strength ~ (1 | batch / cask), data = pastes)
summary(m2)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: strength ~ (1 | batch/cask)
#>    Data: pastes
#> 
#> REML criterion at convergence: 247
#> 
#> Scaled residuals: 
#>     Min      1Q  Median      3Q     Max 
#> -1.4798 -0.5156  0.0095  0.4720  1.3897 
#> 
#> Random effects:
#>  Groups     Name        Variance Std.Dev.
#>  cask:batch (Intercept) 8.434    2.9041  
#>  batch      (Intercept) 1.657    1.2874  
#>  Residual               0.678    0.8234  
#> Number of obs: 60, groups:  cask:batch, 30; batch, 10
#> 
#> Fixed effects:
#>             Estimate Std. Error t value
#> (Intercept)  60.0533     0.6769   88.72
# There are indeed 30 combinations of casks and batches (3 *different* casks used
# in each batch)
summary(m2)$ngrps
#> cask:batch      batch 
#>         30         10