NOTE ON PACKAGES: I use the following packages in this chapter. If you do not have these packages installed, you may use the script below to install them and load their libraries (just remove the “#” before the package you need to install).

#install.packages("lme4")
#install.packages("ggplot2")
#install.packages("HLMdiag")
#install.packages("DHARMa")
#install.packages("car")

library("lme4")
library("ggplot2")
library("HLMdiag")
library("DHARMa")
library("car") #for the Levene test which we will not discuss here
library("Matrix")

# 1 Preface

It is important to work within the same file environment so that data and code can be easily linked. To do so, let’s set the working directory:

#setwd("set locations")

# 2 CAVEAT

The following information is a best approximation of how to test assumptions of mixed and multilevel models as of November 2016. Much of the procedure described here has been pieced together through the concatenation of multiple sources (both refereed and not). There is by no means any consensus within the field of statistics, or the R community as a whole, as to how to approach this topic systematically. I’ve attempted to pull in as many different ideas as feasible in my little chapter. As such, we will be using lmer (from the lme4 package) as opposed to lme (from the nlme package). lme4 is better suited for mixed designs and thus is more amenable to many different kinds of data. Unfortunately, unlike nlme, lme4 DOES NOT include a way to easily run model diagnostics. Some would suggest that if your model is a standard multilevel model (i.e. not a mixed design), you should just use nlme to streamline the model building process.

# 3 Overview

As with any statistical manipulation, there are a specific set of assumptions under which we operate when conducting multilevel models (MLM). These assumptions are identical to those of ordinary multiple regression analyses, but the way in which we test them is quite different. In this chapter, we will examine the three most important (and most frequently violated) assumptions of MLM. To do so, let’s take a look at some sample data, fit a model, and run through how to test each assumption.

# 4 Build a Dataframe

Let’s create a sample dataframe with which we will run our multilevel model and then test our assumptions. Let’s say there are 10 subjects with 4 temporal-based observations (one every year) in this hypothetical scenario. Each person will have data for age, sex, average number of cigarettes smoked each week, level of nicotine dependence from the Fagerstrom Test of Nicotine Dependence (FTND), ratings of depressive symptoms from the Beck Depression Inventory (BDI), and a count variable for the total number of lifetime major depressive episodes suffered up to that point of data collection.

#Create vectors for each of the variables
Subjects <- c(rep.int(1,4),rep.int(2,4),rep.int(3,4),rep.int(4,4),rep.int(5,4),rep.int(6,4),rep.int(7,4),rep.int(8,4),rep.int(9,4),rep.int(10,4))
Time <- c(rep(1:4,10)) #this is for illustrative purposes only - these models are much more helpful with many different data points per subject
Age <- c(18:21, 19:22, 18:21, 20:23, 19:22, 22:25, 21:24, 19:22, 20:23, 20:23)
Sex <- c(rep("M",8),rep("F",4),rep("M",4),rep("F",12),rep("M",4),rep("F",8))
Cigarettes <- c(10,12,14,0,20,20,20,20,10,20,25,40,12,9,8,8,4,4,8,20,10,10,10,10,10,20,30,60,0,0,0,0,0,1,2,3,0,0,0,30)
Fagerstrom <- c(6,8,8,0,10,10,10,10,8,9,9,10,6,8,10,12,7,7,6,7,2,2,3,8,9,9,9,9,10,10,12,13,0,0,0,0,0,0,0,12)
BDI <- c(7,7,8,9,4,5,6,8,2,2,2,2,3,4,7,8,12,14,14,17,22,22,10,9,7,5,6,8,12,15,6,18,12,14,8,4,3,2,2,1)
Depression <- c(1,1,2,3,2,3,3,3,0,1,1,1,2,2,2,2,4,5,6,6,3,3,4,5,2,2,2,2,1,1,1,1,3,3,4,5,1,1,2,2)

#Combine the vectors into a single data frame
Cigarette.Study<-data.frame(
subjectID = Subjects,
Time = Time,
Sex = Sex,
Age = Age,
Cigs = Cigarettes,
FTND = Fagerstrom,
BDI = BDI,
Depression = Depression)

Cigarette.Study$Sex.I<-ifelse(Cigarette.Study$Sex=="M",0,1) #creates a numerical value "Sex" that we can run analyses on

Cigarette.Study
##    subjectID Time Sex Age Cigs FTND BDI Depression Sex.I
## 1          1    1   M  18   10    6   7          1     0
## 2          1    2   M  19   12    8   7          1     0
## 3          1    3   M  20   14    8   8          2     0
## 4          1    4   M  21    0    0   9          3     0
## 5          2    1   M  19   20   10   4          2     0
## 6          2    2   M  20   20   10   5          3     0
## 7          2    3   M  21   20   10   6          3     0
## 8          2    4   M  22   20   10   8          3     0
## 9          3    1   F  18   10    8   2          0     1
## 10         3    2   F  19   20    9   2          1     1
## 11         3    3   F  20   25    9   2          1     1
## 12         3    4   F  21   40   10   2          1     1
## 13         4    1   M  20   12    6   3          2     0
## 14         4    2   M  21    9    8   4          2     0
## 15         4    3   M  22    8   10   7          2     0
## 16         4    4   M  23    8   12   8          2     0
## 17         5    1   F  19    4    7  12          4     1
## 18         5    2   F  20    4    7  14          5     1
## 19         5    3   F  21    8    6  14          6     1
## 20         5    4   F  22   20    7  17          6     1
## 21         6    1   F  22   10    2  22          3     1
## 22         6    2   F  23   10    2  22          3     1
## 23         6    3   F  24   10    3  10          4     1
## 24         6    4   F  25   10    8   9          5     1
## 25         7    1   F  21   10    9   7          2     1
## 26         7    2   F  22   20    9   5          2     1
## 27         7    3   F  23   30    9   6          2     1
## 28         7    4   F  24   60    9   8          2     1
## 29         8    1   M  19    0   10  12          1     0
## 30         8    2   M  20    0   10  15          1     0
## 31         8    3   M  21    0   12   6          1     0
## 32         8    4   M  22    0   13  18          1     0
## 33         9    1   F  20    0    0  12          3     1
## 34         9    2   F  21    1    0  14          3     1
## 35         9    3   F  22    2    0   8          4     1
## 36         9    4   F  23    3    0   4          5     1
## 37        10    1   F  20    0    0   3          1     1
## 38        10    2   F  21    0    0   2          1     1
## 39        10    3   F  22    0    0   2          2     1
## 40        10    4   F  23   30   12   1          2     1
#Check the structure of that data frame
str(Cigarette.Study)
## 'data.frame':    40 obs. of  9 variables:
##  $ subjectID : num  1 1 1 1 2 2 2 2 3 3 ...
##  $ Time      : int  1 2 3 4 1 2 3 4 1 2 ...
##  $ Sex       : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 1 1 ...
##  $ Age       : int  18 19 20 21 19 20 21 22 18 19 ...
##  $ Cigs      : num  10 12 14 0 20 20 20 20 10 20 ...
##  $ FTND      : num  6 8 8 0 10 10 10 10 8 9 ...
##  $ BDI       : num  7 7 8 9 4 5 6 8 2 2 ...
##  $ Depression: num  1 1 2 3 2 3 3 3 0 1 ...
##  $ Sex.I     : num  0 0 0 0 0 0 0 0 1 1 ...

# 5 Run a Multilevel Model with our Data

Skipping all of the requisite model building steps, say we find that a model predicting cigarette use from self-reported symptoms of depression (BDI), Time, a BDI by Time interaction (BDI*Time), and FTND score, with a random intercept for each person and an independently varying FTND slope for each person, fits the data best and explains the most variance.

Model.F<-lmer(Cigs ~ Time*BDI+FTND
+(1|subjectID) + (0+FTND|subjectID), #random effects for subject ID and (+) FTND by (|) subject, which are allowed to vary independently from each other (the 1 and 0 notation)
data=Cigarette.Study, REML=FALSE)

## 5.1 Examine results, what do they mean?

summary(Model.F) #summarizes the results of the model - somehow in the "knitting" process the significance stars can be lost - the output suggests a significant effect of Time and of FTND score
## Linear mixed model fit by maximum likelihood t-tests use Satterthwaite
##   approximations to degrees of freedom [lmerMod]
## Formula:
## Cigs ~ Time * BDI + FTND + (1 | subjectID) + (0 + FTND | subjectID)
##    Data: Cigarette.Study
##
##      AIC      BIC   logLik deviance df.resid
##    301.0    314.5   -142.5    285.0       32
##
## Scaled residuals:
##     Min      1Q  Median      3Q     Max
## -1.8308 -0.5259 -0.1275  0.3722  3.8984
##
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  subjectID   (Intercept)  0.0000   0.0000
##  subjectID.1 FTND         0.8168   0.9038
##  Residual                50.7145   7.1214
## Number of obs: 40, groups:  subjectID, 10
##
## Fixed effects:
##             Estimate Std. Error       df t value Pr(>|t|)
## (Intercept)  -9.5672     5.5833  33.5800  -1.714  0.09583 .
## Time          4.6197     2.0078  35.8700   2.301  0.02732 *
## BDI           0.6027     0.4960  32.9900   1.215  0.23288
## FTND          1.4208     0.4367  20.3300   3.253  0.00392 **
## Time:BDI     -0.2105     0.2019  35.5900  -1.043  0.30416
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
##          (Intr) Time   BDI    FTND
## Time     -0.813
## BDI      -0.822  0.728
## FTND     -0.251 -0.026  0.090
## Time:BDI  0.704 -0.850 -0.851 -0.068

ranef(Model.F) #examines the random effects (i.e. things that are allowed to vary across units, in this case the subject-level effect of FTND for each of our 10 participants)
## $subjectID
##    (Intercept)        FTND
## 1            0  0.03600961
## 2            0  0.32130460
## 3            0  0.85639494
## 4            0 -0.67526169
## 5            0 -0.35541807
## 6            0 -0.32064855
## 7            0  1.38356895
## 8            0 -1.49504433
## 9            0  0.00000000
## 10           0  0.24909453

The first thing to note is that p values are not reported by default. This is because lme4 (like nlme before it) was written in large part by Doug Bates of the University of Wisconsin-Madison, and Dr. Bates is not a fan of p values, so he chose not to display them in his packages. Our results suggest that both Time and Fagerstrom scores are predictive of cigarette use. But, given our small sample size (see Maas & Hox 2005 for an exhaustive discussion of minimal sample sizes), can we be assured that the assumptions under which we operate to conduct such analyses are being met?
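If you do want a p value for a particular fixed effect, one common workaround (not part of the original analysis above, so consider it a sketch) is a likelihood-ratio test comparing nested models. Since Model.F was fit with REML=FALSE (i.e. full maximum likelihood), the comparison is valid:

```r
#Likelihood-ratio test for the FTND fixed effect: refit the model without
#FTND, then compare the two fits. anova() reports a chi-square statistic
#on 1 degree of freedom for the dropped term.
Model.noFTND <- update(Model.F, . ~ . - FTND)
anova(Model.noFTND, Model.F)
```

The smaller the resulting p value, the stronger the evidence that FTND improves the model fit.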

# 6 Assumptions

## 6.1 Assumption 1 - Linearity

A regression analysis fits the straight line that best explains the data given your set of parameters. Therefore, the base models rely on the assumption that your data follow a straight line (though the models can be expanded to handle curvilinear data).

### 6.1.1 How do you test this assumption?

Graphically, plotting the model residuals (the difference between the observed value and the model-estimated value) vs the predictor is one simple way to test. If a pattern emerges (anything that looks non-random), a higher order term may need to be included or you may need to mathematically transform a predictor/response. In R, we’ll use the simple plot function to compare the model-predicted values to the observed ones.

Plot.Model.F.Linearity<-plot(resid(Model.F),Cigarettes) #resid() calls for the residuals of the model; Cigarettes was our initial outcome variable - we're plotting the residuals vs the observed values

Looks pretty random, so we probably haven’t violated this assumption (see Fox 2008 for a discussion of the utility of significance vs visual testing; essentially, he argues that significance tests don’t offer you additional information that your naked eye can’t already detect). Let’s look at assumption #2.

## 6.2 Assumption 2 Homogeneity of Variance

Regression models assume that variance of the residuals is equal across groups. In this case, the groups we’re referring to are at the individual (i.e. subject) level.

### 6.2.1 How do you test this assumption?

R is an excellent program for extracting and storing your model residuals. Once we have them in place, we can do a simple ANOVA to determine whether they differ for each person. This procedure is a variation of Levene’s test. Essentially, we’ll extract the residuals from the model, place them in our original table, take their absolute value, and then square them (for a more robust analysis with respect to issues of normality, see Glaser 2006). Finally, we’ll take a look at an ANOVA of the between-subjects residuals. Let’s give it a shot below:

#for this portion of the analysis, we need to revisit statistical significance - since the assumption is that the variance does not differ, we would hope to see NO STATISTICAL DIFFERENCES in the following procedure (i.e. p>0.05) to confirm it -

Cigarette.Study$Model.F.Res<- residuals(Model.F) #extracts the residuals and places them in a new column in our original data table
Cigarette.Study$Abs.Model.F.Res <-abs(Cigarette.Study$Model.F.Res) #creates a new column with the absolute value of the residuals
Cigarette.Study$Model.F.Res2 <- Cigarette.Study$Abs.Model.F.Res^2 #squares the absolute values of the residuals to provide the more robust estimate
Levene.Model.F <- lm(Model.F.Res2 ~ subjectID, data=Cigarette.Study) #ANOVA of the squared residuals
anova(Levene.Model.F) #displays the results
## Analysis of Variance Table
##
## Response: Model.F.Res2
##           Df Sum Sq Mean Sq F value Pr(>F)
## subjectID  1   1624  1623.5  0.1044 0.7484
## Residuals 38 590881 15549.5

### 6.2.2 What do the results mean?

Since the p value is greater than 0.05, we can say that the variance of the residuals is equal and therefore the assumption of homoscedasticity is met.

Note: R does have built-in or package-made Levene (and the less flexible Bartlett) tests, but I couldn’t figure out how to implement them with respect to lmer. Feel free to explore these options on your own.

### 6.2.3 How about a visual solution?

Plot.Model.F <- plot(Model.F) #creates a fitted vs residual plot
Plot.Model.F

There seems to be even spread around the centered line, but I prefer the Levene’s test as I’m not entirely sure what this graph is supposed to look like.

## 6.3 Assumption 3: The residuals of the model are normally distributed

Regression models don’t require that outcome variables be normally distributed (see: logistic or Poisson regression models); however, MLM assumes that the residuals of the analysis ARE normally distributed.

### 6.3.1 How do you test this assumption?

QQ plots (which are easily obtained in standard regression modeling in R) can provide an estimation of where the standardized residuals lie with respect to normal quantiles. Strong deviation from the provided line indicates that the residuals themselves are not normally distributed.

#qq plot (similar to a diagnostic plot provided by the lm function) for an estimation of the linearity of the residuals
require("lattice")
## Loading required package: lattice
##
## Attaching package: 'lattice'
## The following object is masked from 'package:boot':
##
##     melanoma
qqmath(Model.F, id=0.05) #id: identifies values that may be exerting undue influence on the model (i.e. outliers)

There is some deviation from the expected normal line towards the tails, but overall the line looks straight, and therefore pretty normal, suggesting that the assumption is not violated. See http://data.library.virginia.edu/diagnostic-plots/ and https://www.r-bloggers.com/model-validation-interpreting-residual-plots/ for more information regarding these visual tests.

# 7 What if the assumption of normality is violated?

## 7.1 Transform!

Transformations can be done to improve the relative normality of the data. The most commonly used transformations are log/ln. Let’s create log(10) transformed outcome variables to see if it improves the normality of the residuals.

Cigarette.Study$Cig.Plus.1 <- Cigarettes+1 #add 1 first, since log(0) is undefined
Cigarette.Study$Cig.Log <- log10(Cigarette.Study$Cig.Plus.1)

str(Cigarette.Study)
## 'data.frame':    40 obs. of  14 variables:
##  $ subjectID      : num  1 1 1 1 2 2 2 2 3 3 ...
##  $ Time           : int  1 2 3 4 1 2 3 4 1 2 ...
##  $ Sex            : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 1 1 ...
##  $ Age            : int  18 19 20 21 19 20 21 22 18 19 ...
##  $ Cigs           : num  10 12 14 0 20 20 20 20 10 20 ...
##  $ FTND           : num  6 8 8 0 10 10 10 10 8 9 ...
##  $ BDI            : num  7 7 8 9 4 5 6 8 2 2 ...
##  $ Depression     : num  1 1 2 3 2 3 3 3 0 1 ...
##  $ Sex.I          : num  0 0 0 0 0 0 0 0 1 1 ...
##  $ Model.F.Res    : num  3.461 -0.599 -1.717 -6.759 5.957 ...
##  $ Abs.Model.F.Res: num  3.461 0.599 1.717 6.759 5.957 ...
##  $ Model.F.Res2   : num  11.977 0.359 2.948 45.686 35.491 ...
##  $ Cig.Plus.1     : num  11 13 15 1 21 21 21 21 11 21 ...
##  $ Cig.Log        : num  1.04 1.11 1.18 0 1.32 ...
Model.F.2<-lmer(Cig.Log ~ Time*BDI+FTND
+(1|subjectID) + (0+FTND|subjectID),
data=Cigarette.Study, REML=FALSE)

summary(Model.F.2) #summarizes the results of the model
## Linear mixed model fit by maximum likelihood t-tests use Satterthwaite
##   approximations to degrees of freedom [lmerMod]
## Formula:
## Cig.Log ~ Time * BDI + FTND + (1 | subjectID) + (0 + FTND | subjectID)
##    Data: Cigarette.Study
##
##      AIC      BIC   logLik deviance df.resid
##     26.4     39.9     -5.2     10.4       32
##
## Scaled residuals:
##      Min       1Q   Median       3Q      Max
## -1.67228 -0.56209 -0.02675  0.38253  1.75817
##
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  subjectID   (Intercept) 0.198703 0.44576
##  subjectID.1 FTND        0.004817 0.06941
##  Residual                0.020319 0.14254
## Number of obs: 40, groups:  subjectID, 10
##
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)
## (Intercept)  0.406378   0.224780 16.878000   1.808   0.0885 .
## Time         0.066599   0.046013 24.283000   1.447   0.1606
## BDI         -0.006330   0.013984 29.952000  -0.453   0.6541
## FTND         0.050544   0.028589  8.707000   1.768   0.1120
## Time:BDI     0.002459   0.004594 24.535000   0.535   0.5972
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
##          (Intr) Time   BDI    FTND
## Time     -0.415
## BDI      -0.550  0.678
## FTND     -0.294 -0.102  0.036
## Time:BDI  0.433 -0.853 -0.795  0.001
ranef(Model.F.2) #examines the random effects
## \$subjectID
##    (Intercept)         FTND
## 1  -0.53631688  0.096595673
## 2   0.06984188  0.016932654
## 3  -0.02110366  0.038008571
## 4   0.61464581 -0.073006166
## 5   0.02039356 -0.002722592
## 6   0.54319687 -0.069215794
## 7   0.12787324  0.027901740
## 8  -0.09585722 -0.092767359
## 9  -0.21146375  0.000000000
## 10 -0.51120985  0.058273274

Notice we have lost the effects of FTND and Time due to the transformation. Not looking good!

Let’s take a look at the QQ plot for the new model:

qqmath(Model.F.2, id=0.05) #id: identifies values that may be exerting undue influence on the model (i.e. outliers)

Made it much worse… You can try other transformations (natural log, square root, etc.), but there is ongoing debate as to whether these values should be transformed at all, since transformation makes interpretation of the results very difficult. In this case, since we lost the effect of FTND, it may be OK to leave the data as is.
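For reference, here is how the two most common alternatives behave on count-like data that include zeros (plain base R; this snippet is illustrative and not part of the analysis above):

```r
#Common variance-stabilizing transforms for counts that include zeros
x <- c(0, 1, 4, 9, 100)
log10(x + 1)  #the +1 offset avoids log10(0), which is -Inf
sqrt(x)       #the square root handles zeros without needing an offset
```

Whichever transform you choose, remember that the model coefficients are then on the transformed scale, which is exactly the interpretive difficulty noted above.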

# 8 Useful Diagnostic Packages

## 8.1 “HLMdiag”

https://cran.rstudio.com/web/packages/HLMdiag/index.html

HLMdiag grew out of the dissertation work of Adam Loy and is useful for assessing not only leverage and traditional deletion diagnostics (Cook’s distance, covratio, covtrace, and MDFFITS) but also provides convenience functions and graphics for residual analysis. At this time, only 2-level models are supported, and the package has not been updated since November of 2015, so if the packages it depends on change, you may have difficulty using it going forward. I’ll briefly discuss select functions below:

#case_delete() #iteratively deletes groups corresponding to the levels of a hierarchical linear model, using lmer to fit the models for each deleted case

#covratio() #calculates measures of the change in the covariance matrices for the fixed effects based on the deletion of an observation, or group of observations

#diagnostics() #computes deletion diagnostics for a hierarchical linear model based on the building blocks returned by case_delete()

#HLMresid() #extracts residuals from a hierarchical linear model fit using lmer; can provide a variety of different types of residuals based upon the specifications when you call the function

#leverage() #calculates the leverage of a hierarchical linear model fit

#mdffits() #calculates measures of the change in the fixed effects estimates based on the deletion of an observation, or group of observations
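As a sketch of how two of these might be applied to our model (the argument names follow the HLMdiag documentation as of this writing and may have changed since):

```r
#Hypothetical usage with the model fit above; requires HLMdiag to be loaded
library(HLMdiag)
res1 <- HLMresid(Model.F, level = 1, type = "LS", standardize = TRUE) #level-1 least-squares residuals
lev1 <- leverage(Model.F, level = 1)                                  #observation-level leverage values
```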

## 8.2 “DHARMa”

https://cran.r-project.org/web/packages/DHARMa/vignettes/DHARMa.html

DHARMa was created by Florian Hartig in 2016. It produces readily interpretable residuals for generalized linear (mixed) models that are standardized to values between 0 and 1 and can be interpreted as intuitively as residuals from a linear model. This is achieved by a simulation-based approach, similar to the Bayesian p-value or the parametric bootstrap, that transforms the residuals to a standardized scale. I’ll provide some basic functions and their purposes below:

#simulateResiduals() #creates scaled (quantile) residuals through a default 250 simulations (which can be modified)

#plotSimulatedResiduals() #provides a qqplot and residuals vs predicted plots to determine deviations from normality

#Goodness of fit tests:
#testUniformity()
#testOverdispersion()
#testZeroinflation()
#testTemporalAutocorrelation()
#testSpatialAutocorrelation()
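Put together, a minimal DHARMa workflow for our model might look like the following (a sketch based on the 2016 vignette; function and argument names may differ in later versions):

```r
#Simulate scaled residuals for Model.F and run basic diagnostics;
#requires the DHARMa package loaded at the top of the chapter
sim.res <- simulateResiduals(fittedModel = Model.F, n = 250)
plotSimulatedResiduals(simulationOutput = sim.res) #qqplot and residual vs predicted plots
testUniformity(simulationOutput = sim.res)         #Kolmogorov-Smirnov test for uniformity
```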

# 9 References

These have been excellent resources for illuminating the theory and then implementing the procedures:

1. Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390-412. doi:10.1016/j.jml.2007.12.005
2. Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255-278.
3. Bartlett, M. S. (1937). Properties of sufficiency and statistical tests. Proceedings of the Royal Society of London Series A 160, 268–282.
4. Bates, D. M. (2006). [R] lmer, p-values and all that. https://stat.ethz.ch/pipermail/r-help/2006-May/094765.html
5. Bates, D. M. (2008). [R-sig-ME] mcmcsamp can’t handle random effects syntax. https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q3/001409.html
6. Dunn, K. P., and Smyth, G. K. (1996). Randomized quantile residuals. Journal of Computational and Graphical Statistics 5, 1-10.
7. Fox, J. (2008). Applied Regression Analysis and Generalized Linear Models 2nd Edition. Thousand Oaks, CA: SAGE.
8. Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. New York: Cambridge University Press.
9. Glaser, R. E. (2006). Levene’s Robust Test of Homogeneity of Variances. Encyclopedia of Statistical Sciences. 6.
10. Gurka, M. J., Edwards, L. J., Muller, K. E. and Kupper, L. L. (2006), Extending the Box–Cox transformation to the linear mixed model. Journal of the Royal Statistical Society: Series A (Statistics in Society), 169: 273–288. doi:10.1111/j.1467-985X.2005.00391.x
11. Maas, C. J. M., & Hox, J. J. (2005). Sufficient sample sizes for multilevel modeling. Methodology, 1(3), 86-92
12. Pinheiro, J. C., Bates, D. M., (2000). Mixed-effects models in S and S-plus. New York, NY: Springer.
13. Snijders, T. A. B., & Bosker, R. J. (1999). Multilevel analysis: An introduction to basic and advanced multilevel modeling. London: Sage.
14. van der Laan, M., & Rose, S. (2010). Statistics ready for a revolution. Amstat News. Retrieved October 7, 2011, from http://magazine.amstat.org/blog/2010/09/01/statrevolution/
15. Verbeke, G., & Lesaffre, E. (1997). The effect of misspecifying the random-effects distribution in linear mixed models for longitudinal data. Computational Statistics & Data Analysis, 23, 541-556.