library(lme4)
library(effects)
library(corrplot)
library(stargazer)
library(car)
library(MASS)
library(ggplot2)
library(texreg)
Multilevel data occur when observations are nested within groups, for example, when students are nested within schools in a district. This chapter uses a special case of such data in which there is only one measurement per student (i.e., there is no within-subject variable); because students are nested in schools, we must control for the effects of schools.
Our simple story:
We looked at 6 schools (3 rich and 3 poor) with 40 students in each rich school and 160 students in each poor school, and we measured them on Happiness, number of Friends, and GPA. We wondered if happiness could be predicted by number of friends and/or GPA.
Our Level 1 will be the lowest unit of observation - those nested in groups. In this case, Level 1 is the students.
Our Level 2 refers to the groups in which the level 1 units are nested; in this case Level 2 is the schools.
We create a distribution for each school, making the variance of rich schools small and the variance of poor schools large. We’ve also made the equations different so that our X (number of friends) coefficient is negative in the rich schools but positive in the poor schools. We keep our Z (GPA) coefficient the same for all schools.
set.seed(42)
nrich=40
npoor=160
# Parameters
S.F.Rich=-2 # setting parameters for the Friends variable in the Rich schools
S.F.Poor=6 # setting parameters for the Friends variable in the Poor schools
S.G.Rich=.7 # setting parameters for the GPA variable in the Rich schools
S.G.Poor=.7 # setting parameters for the GPA variable in the Poor schools
# Rich Schools
# School 1
X1 <- rnorm(nrich, 10, 2) # number of friends
Z1 <- runif(nrich, 1.0, 4.0) # GPA
Y1 <- S.F.Rich*X1 + S.G.Rich*Z1 + 80 + rnorm(nrich, sd= 5) # Our equation to create Y
# School 2
X2 <- rnorm(nrich, 10, 2) # number of friends
Z2 <- runif(nrich, 1.0, 4.0) # GPA
Y2 <- S.F.Rich*X2 + S.G.Rich*Z2 + 75 + rnorm(nrich, sd= 5)
# School 3
X3 <- rnorm(nrich, 10, 2) # number of friends
Z3 <- runif(nrich, 1.0, 4.0) # GPA
Y3 <- S.F.Rich*X3 + S.G.Rich*Z3 +90 + rnorm(nrich, sd= 5)
# Poor Schools
# School 4
X4 <- rnorm(npoor, 5, 2) #number of friends
Z4 <- runif(npoor, 1.0, 4.0) #GPA
Y4 <- S.F.Poor*X4 + S.G.Poor*Z4 + 35 + rnorm(npoor, sd = 10)
# School 5
X5 <- rnorm(npoor, 5, 2) #number of friends
Z5 <- runif(npoor, 1.0, 4.0) #GPA
Y5 <- S.F.Poor*X5 + S.G.Poor*Z5 + 40 + rnorm(npoor, sd = 10)
# School 6
X6 <- rnorm(npoor, 5, 2) #number of friends
Z6 <- runif(npoor, 1.0, 4.0) #GPA
Y6 <- S.F.Poor*X6 + S.G.Poor*Z6 + 50 + rnorm(npoor, sd = 10)
Now that we have our equations, we’re going to create the data frame for each school.
# The 3 Rich Schools:
Student.Data.School.1<-data.frame(Happiness=Y1, Friends=X1, GPA=Z1)
Student.Data.School.2<-data.frame(Happiness=Y2, Friends=X2, GPA=Z2)
Student.Data.School.3<-data.frame(Happiness=Y3, Friends=X3, GPA=Z3)
# The 3 Poor Schools:
Student.Data.School.4<-data.frame(Happiness=Y4, Friends=X4, GPA=Z4)
Student.Data.School.5<-data.frame(Happiness=Y5, Friends=X5, GPA=Z5)
Student.Data.School.6<-data.frame(Happiness=Y6, Friends=X6, GPA=Z6)
Let’s test if these formulas work with an example from each set of schools (one rich and one poor).
# Rich - School 1
corr.student = cor(Student.Data.School.1)
corrplot(corr.student, method = "number",diag = FALSE,type = "lower")
Great, our correlation between Friends and Happiness is -.76, and our correlation between GPA and Happiness is -.05.
# Poor - School 4
corr.student = cor(Student.Data.School.4)
corrplot(corr.student, method = "number",diag = FALSE,type = "lower")
Great, our correlation between Friends and Happiness is .77, which is larger AND in the opposite direction from the rich schools. There was no correlation between GPA and Happiness.
Pay attention to the fact that the schools have different correlation values for Friends and Happiness depending on whether they’re rich or poor. Rich schools have r = -.76 and poor schools have r = .77. For GPA and Happiness, only the rich school showed a correlation. This will be important.
NOTE: Although I am telling you that some schools are rich and some are poor, you may not know this information when you come in and try to do the analysis. If we knew this information, we would put SES as another Level 2 predictor.
Now that we know they work, let’s put all schools together into one dataset.
All.Schools.Data <- rbind(Student.Data.School.1, Student.Data.School.2, Student.Data.School.3, Student.Data.School.4, Student.Data.School.5, Student.Data.School.6)
head(All.Schools.Data)
## Happiness Friends GPA
## 1 54.60136 12.741917 2.744812
## 2 64.21655 8.870604 1.473716
## 3 62.91056 10.726257 2.077085
## 4 66.52306 11.265725 2.936896
## 5 57.07570 10.808537 3.327470
## 6 68.82087 9.787751 2.690941
Now let’s add in Student ID’s and the Schools as variables.
# Adding the subject variable (Student ID)
All.Schools.Data$StudentID<-seq_len(nrow(All.Schools.Data))
# Did it work?
head(All.Schools.Data)
## Happiness Friends GPA StudentID
## 1 54.60136 12.741917 2.744812 1
## 2 64.21655 8.870604 1.473716 2
## 3 62.91056 10.726257 2.077085 3
## 4 66.52306 11.265725 2.936896 4
## 5 57.07570 10.808537 3.327470 5
## 6 68.82087 9.787751 2.690941 6
# Yes!
# Adding the School variable.
All.Schools.Data$School<-c(rep(1, nrich), rep(2,nrich), rep(3, nrich), rep(4, npoor),
rep(5, npoor), rep(6, npoor))
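As a quick sanity check, the School labels should line up with the intended group sizes. A self-contained sketch (it rebuilds just the School vector so it can run on its own; in the full data frame the equivalent check is `table(All.Schools.Data$School)`):

```r
# Rebuild just the School labels to verify the group sizes
nrich <- 40
npoor <- 160
School <- c(rep(1, nrich), rep(2, nrich), rep(3, nrich),
            rep(4, npoor), rep(5, npoor), rep(6, npoor))
table(School)   # schools 1-3 have 40 students each; schools 4-6 have 160 each
length(School)  # 600 students total
```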
Let’s see what the scatter plots look like for each school by plotting the raw data.
First, let’s plot with Friends as our IV.
theme_set(theme_bw(base_size = 12, base_family = ""))
# Friends
Model.Plot.Friends <-ggplot(data = All.Schools.Data, aes(x = Friends, y=Happiness,group=School))+
facet_grid( ~ School)+
geom_point(aes(colour = School))+
geom_smooth(method = "lm", se = TRUE, aes(colour = School))+
xlab("Friends")+ylab("Happiness")+
theme(legend.position = "none")
Model.Plot.Friends
Now, let’s plot with GPA as our IV.
Model.Plot.GPA <-ggplot(data = All.Schools.Data, aes(x =GPA, y=Happiness,group=School))+
facet_grid( ~ School)+
geom_point(aes(colour = School))+
geom_smooth(method = "lm", se = TRUE, aes(colour = School))+
xlab("GPA")+ylab("Happiness")+
theme(legend.position = "none")
Model.Plot.GPA
Let’s see what our correlations look like when lumping all the rich and poor schools together.
corr.student = cor(All.Schools.Data[,1:3])
corrplot(corr.student, method = "number",diag = FALSE,type = "lower")
Notice how the correlation between Friends and Happiness has changed when you collapse across all rich and poor schools. The correlation is now .24 (which falls between the rich-school and poor-school values). It is positive, even though only the poor schools (those with more students) had a positive correlation. Also notice that the correlation between GPA and Happiness is .05, similar in magnitude to the rich schools’ value - the only school type that showed a correlation.
Let’s try running a normal regression on all the data at once.
First, let’s center the variables.
All.Schools.Data$Friends.C<-scale(All.Schools.Data$Friends, scale = FALSE)[,]
All.Schools.Data$GPA.C<-scale(All.Schools.Data$GPA, scale = FALSE)[,]
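To see what `scale(x, scale = FALSE)[,]` is doing, here is a minimal self-contained example with made-up numbers: `scale(x, scale = FALSE)` subtracts the mean from each value (it returns a one-column matrix), and the trailing `[,]` drops the matrix structure so we get back a plain vector.

```r
x <- c(2, 4, 9)                   # made-up values; their mean is 5
x.C <- scale(x, scale = FALSE)[,] # subtract the mean, drop matrix attributes
x.C                               # -3 -1  4
mean(x.C)                         # 0: a centered variable always has mean 0
```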
Now we can run the regression.
Reg.Model<-lm(Happiness ~ Friends.C + GPA.C, data = All.Schools.Data)
summary(Reg.Model)
##
## Call:
## lm(formula = Happiness ~ Friends.C + GPA.C, data = All.Schools.Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -35.136 -10.946 -0.396 10.747 46.556
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 70.6843 0.6396 110.519 < 2e-16 ***
## Friends.C 1.4021 0.2317 6.051 2.54e-09 ***
## GPA.C 0.9560 0.7142 1.339 0.181
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.67 on 597 degrees of freedom
## Multiple R-squared: 0.05975, Adjusted R-squared: 0.0566
## F-statistic: 18.97 on 2 and 597 DF, p-value: 1.03e-08
#plot(Reg.Model)
Putting all schools together in a regular regression, we have a positive, significant effect of Friends and no effect of GPA. But what happens when we separate the schools by rich or poor status?
Now let’s see what happens if you run a normal regression on rich and poor schools separately.
# Rich Schools
School.Rich.Reg.Model<-lm(Happiness ~ Friends.C + GPA.C, data = subset(All.Schools.Data, School<4))
summary(School.Rich.Reg.Model)
##
## Call:
## lm(formula = Happiness ~ Friends.C + GPA.C, data = subset(All.Schools.Data,
## School < 4))
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.956 -6.003 -1.160 5.468 20.517
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 72.4094 1.5372 47.106 < 2e-16 ***
## Friends.C -2.1276 0.3464 -6.142 1.16e-08 ***
## GPA.C 0.4517 0.8171 0.553 0.581
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.924 on 117 degrees of freedom
## Multiple R-squared: 0.2457, Adjusted R-squared: 0.2328
## F-statistic: 19.05 on 2 and 117 DF, p-value: 6.868e-08
Notice, we have a negative, significant effect for Friends but no effect for GPA.
# Poor Schools
School.Poor.Reg.Model<-lm(Happiness ~ Friends.C + GPA.C, data = subset(All.Schools.Data, School>3))
summary(School.Poor.Reg.Model)
##
## Call:
## lm(formula = Happiness ~ Friends.C + GPA.C, data = subset(All.Schools.Data,
## School > 3))
##
## Residuals:
## Min 1Q Median 3Q Max
## -36.265 -8.085 0.389 8.044 34.997
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 78.6701 0.6012 130.854 <2e-16 ***
## Friends.C 6.4806 0.2798 23.162 <2e-16 ***
## GPA.C 0.6250 0.5962 1.048 0.295
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.72 on 477 degrees of freedom
## Multiple R-squared: 0.53, Adjusted R-squared: 0.528
## F-statistic: 268.9 on 2 and 477 DF, p-value: < 2.2e-16
Uh-oh. Now we have a positive, significant effect of Friends and no effect of GPA.
Remember! Although I am telling you that some schools are rich and some are poor, you may not know this information when you come in and try to do the analysis. If we knew this information, we could put SES as another Level 2 predictor. Also, I am only using six schools to demonstrate, but in a more “normal” case of nested design there may be way too many schools to treat SES as a fixed factor in the design.
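If SES were known, here is a sketch of how it could enter as a Level 2 predictor. This is hypothetical and not runnable as-is: our simulated data have no SES column, and the `SES.Model` name is made up. The cross-level interaction `Friends.C:SES` would test whether the Level 1 Friends slope depends on the Level 2 SES of the school.

```r
# Hypothetical sketch - SES is not a column in our simulated data
SES.Model <- lmer(Happiness ~ Friends.C * SES + GPA.C
                  + (1 + Friends.C | School),
                  data = All.Schools.Data, REML = FALSE)
```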
The regular regression did not reflect what was happening in each school type. It gave us a positive effect (Friends) when only poor schools had a positive effect. Collapsing across school types in this case was not ideal because different things were happening within each school type, compromising the generalizability of the findings.
Another way to put this: the regular (lm) regression indicates that the more friends a student has, the happier they are, but looking closer, this is not the case in all schools (and is, in fact, the opposite in some). If you were trying to generalize your findings or use them to argue for an intervention, your results would be misleading and could cause problems.
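The pattern above - a pooled slope that contradicts the within-group slopes - is a form of Simpson’s paradox, and it is easy to reproduce with a self-contained toy example (made-up data, not our school data):

```r
# Two groups: within each group the true slope of y on x is -1,
# but the group with larger x values also has a much larger intercept,
# so pooling the two groups produces a positive slope.
set.seed(1)
g <- rep(1:2, each = 50)
x <- rnorm(100, mean = c(0, 5)[g])       # group 2 centered at higher x
y <- -1 * x + 8 * g + rnorm(100, sd = 0.5)
coef(lm(y ~ x))["x"]                     # pooled slope: positive
coef(lm(y ~ x, subset = g == 1))["x"]    # within-group slope: negative
```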
So, let’s try running this as a multilevel, random-effects model.
Using a multi-level model allows us to separate the within-group effects from the between-group effects, whereas regular regression blends them together into a single coefficient.
Null<-lmer(Happiness ~ 1 # This simply means Happiness predicted by the intercept
+(1|School), # each school gets its own intercept
data=All.Schools.Data, REML = FALSE)
summary(Null)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: Happiness ~ 1 + (1 | School)
## Data: All.Schools.Data
##
## AIC BIC logLik deviance df.resid
## 4933.3 4946.5 -2463.7 4927.3 597
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.0540 -0.6053 0.0226 0.6014 3.9123
##
## Random effects:
## Groups Name Variance Std.Dev.
## School (Intercept) 54.56 7.387
## Residual 209.21 14.464
## Number of obs: 600, groups: School, 6
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 68.336 3.103 22.02
We examine the intra-class correlation (ICC) to determine if multilevel modeling is the correct choice for our analysis. The ICC measures the degree of clustering in our data: it is the proportion of the total variance attributable to Level 2, ICC = τ00 / (τ00 + σ²), where τ00 is the Level 2 intercept variance and σ² is the residual variance. It answers the question, “How much of the total variance in my study does my Level 2 account for?” If your ICC is meaningfully greater than 0, you have a multilevel problem.
ICC.Model<-function(Model.Name) {
  tau.Null<-as.numeric(lapply(summary(Model.Name)$varcor, diag)) # Level 2 intercept variance (tau00)
  sigma.Null <- as.numeric(attr(summary(Model.Name)$varcor, "sc")^2) # residual variance (sigma^2)
  ICC.Null <- tau.Null/(tau.Null+sigma.Null) # proportion of total variance at Level 2
  return(ICC.Null)
}
ICC.Model(Null)
## [1] 0.2068566
Our ICC is greater than 0, meaning we were correct to think of this as a multi-level problem.
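As a hand-check, we can plug the variance components printed in the Null model’s Random effects table into the ICC formula (self-contained arithmetic; the two numbers are copied from the output above):

```r
tau00  <- 54.56   # School intercept variance from the Null model
sigma2 <- 209.21  # residual variance from the Null model
tau00 / (tau00 + sigma2)  # ~0.207, matching ICC.Model(Null)
```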
The.Model.1<-lmer(Happiness ~ GPA.C
+(1|School),
data=All.Schools.Data, REML = FALSE)
summary(The.Model.1)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: Happiness ~ GPA.C + (1 | School)
## Data: All.Schools.Data
##
## AIC BIC logLik deviance df.resid
## 4934.7 4952.3 -2463.3 4926.7 596
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.0109 -0.5984 0.0242 0.5938 3.8676
##
## Random effects:
## Groups Name Variance Std.Dev.
## School (Intercept) 54.31 7.37
## Residual 209.01 14.46
## Number of obs: 600, groups: School, 6
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 68.3517 3.0960 22.077
## GPA.C 0.5192 0.6622 0.784
##
## Correlation of Fixed Effects:
## (Intr)
## GPA.C 0.006
The results indicate that a student’s GPA does not have an effect on their Happiness when controlling for Level 2 fluctuations in Happiness.
This model allows the variable Friends to vary between schools.
The.Model.2<-lmer(Happiness ~ Friends.C + GPA.C
+(1+Friends.C|School), # each school gets its own intercept, and Friends can vary as a function of the school.
data=All.Schools.Data, REML = FALSE)
summary(The.Model.2)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: Happiness ~ Friends.C + GPA.C + (1 + Friends.C | School)
## Data: All.Schools.Data
##
## AIC BIC logLik deviance df.resid
## 4438.4 4469.2 -2212.2 4424.4 593
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.6347 -0.6060 0.0229 0.6675 3.7765
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## School (Intercept) 42.14 6.492
## Friends.C 17.94 4.235 0.51
## Residual 86.37 9.294
## Number of obs: 600, groups: School, 6
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 75.3044 2.7888 27.003
## Friends.C 2.1566 1.7431 1.237
## GPA.C 0.4235 0.4271 0.991
##
## Correlation of Fixed Effects:
## (Intr) Frnd.C
## Friends.C 0.454
## GPA.C 0.002 0.001
The results indicate that the number of Friends a student has does not have an effect on Happiness when controlling for the random effects of Level 2 influences.
In our regular (lm) regression, Friends had a significant effect, b = 1.40 (p < .001). However, in our mixed (lmer) regression, Friends had a larger (2.16) but non-significant effect.
Why is this important? The goal of multi-level modeling is to draw a conclusion about the general sample that you have while controlling for differences you are not trying to explain (in this example, rich vs. poor). Not properly controlling for these differences, which you may often not know are there, will increase your chance of Type I error. Because the effect of Friends was different in different schools, it makes sense that the multi-level model (MLM) did not show a significant effect. In the present example, our MLM gave us a more accurate interpretation - that no main effect of Friends existed for all schools generally.
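This is also why every model above was fit with REML = FALSE: maximum-likelihood fits of nested models can be compared with a likelihood-ratio test. A sketch (it assumes the `Null`, `The.Model.1`, and `The.Model.2` objects fit earlier in this chapter):

```r
# Likelihood-ratio tests comparing the nested models fit above:
# does adding the fixed effects and the random Friends slope improve fit?
anova(Null, The.Model.1, The.Model.2)
```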
For a comprehensive look at how to run diagnostics, please see Chapter 12 and Chapter 18.
P-values are hotly disputed in mixed models.
One solution is to estimate degrees of freedom in order to get p-values. SAS and SPSS use Satterthwaite approximations; there are also Kenward-Roger approximations (see Westfall, Kenny, & Judd, 2014). Either is acceptable. Under a number of assumptions, these outputs will mirror the results of a traditional ANOVA fairly closely.
install.packages("lmerTest")
This is a mixed-model add-on package that calculates p-values based on a certain set of assumptions. Each ddf is a different method of attaining p-values, so you can choose which to run. I give you three examples below. You will need to refit the models with the package lmerTest installed and loaded.
library(lmerTest)
The.Model.2<-lmer(Happiness ~ Friends.C + GPA.C
+(1+Friends.C|School),
data=All.Schools.Data, REML = FALSE)
summary(The.Model.2,ddf = "lme4")
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: Happiness ~ Friends.C + GPA.C + (1 + Friends.C | School)
## Data: All.Schools.Data
##
## AIC BIC logLik deviance df.resid
## 4438.4 4469.2 -2212.2 4424.4 593
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.6347 -0.6060 0.0229 0.6675 3.7765
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## School (Intercept) 42.14 6.492
## Friends.C 17.94 4.235 0.51
## Residual 86.37 9.294
## Number of obs: 600, groups: School, 6
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 75.3044 2.7888 27.003
## Friends.C 2.1566 1.7431 1.237
## GPA.C 0.4235 0.4271 0.991
##
## Correlation of Fixed Effects:
## (Intr) Frnd.C
## Friends.C 0.454
## GPA.C 0.002 0.001
summary(The.Model.2, ddf = "Satterthwaite") # SAS method
## Linear mixed model fit by maximum likelihood t-tests use Satterthwaite
## approximations to degrees of freedom [lmerMod]
## Formula: Happiness ~ Friends.C + GPA.C + (1 + Friends.C | School)
## Data: All.Schools.Data
##
## AIC BIC logLik deviance df.resid
## 4438.4 4469.2 -2212.2 4424.4 593
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.6347 -0.6060 0.0229 0.6675 3.7765
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## School (Intercept) 42.14 6.492
## Friends.C 17.94 4.235 0.51
## Residual 86.37 9.294
## Number of obs: 600, groups: School, 6
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 75.3044 2.7888 6.0000 27.003 1.55e-07 ***
## Friends.C 2.1566 1.7431 6.0000 1.237 0.262
## GPA.C 0.4235 0.4271 588.8000 0.991 0.322
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) Frnd.C
## Friends.C 0.454
## GPA.C 0.002 0.001
If you want to report main effects and interactions instead of slopes, you might convert your mixed model into an ANOVA.
anova(The.Model.2,ddf = "Kenward-Roger") # method suggested by Westfall, Kenny, & Judd (2014)
## Analysis of Variance Table of type III with Kenward-Roger
## approximation for degrees of freedom
## Sum Sq Mean Sq NumDF DenDF F.value Pr(>F)
## Friends.C 108.798 108.798 1 5.00 1.2596 0.3127
## GPA.C 83.824 83.824 1 587.57 0.9705 0.3250
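P-values aside, profile confidence intervals offer another way to judge the fixed effects. A sketch (it assumes `The.Model.2` fit above; an interval for Friends.C that spans 0 agrees with the non-significant t and F tests):

```r
# Profile-likelihood confidence intervals for the variance components
# and fixed effects of The.Model.2
confint(The.Model.2, method = "profile")
```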
Huta, V. (2014). When to use hierarchical linear modeling. The Quantitative Methods for Psychology, 10(1): 13- 28.
Westfall, J., Kenny, D. A., & Judd, C. M. (2014). Statistical power and optimal design in experiments in which samples of participants respond to samples of stimuli. Journal of Experimental Psychology: General, 143(5), 2020-2045.
Woltman, H., Feldstain, A., MacKay, J. C., & Rocchi, M. (2012). An introduction to hierarchical linear modeling. Tutorials in Quantitative Methods for Psychology, 8(1): 52-69.