The following chapter will include:
Interaction: When the effect of one independent variable differs based on the level or magnitude of another independent variable
For more information about interactions in regression:
Click here for Jaccard & Turrisi 2003 Interaction Effects in Multiple Regression
A simple slope is a regression line at one level of a predictor variable
Think of simple slopes as the visualization of an interaction
How do we plot these things in R?…
When running a regression in R, it is likely that you will be interested in interactions. The following packages and functions are good places to start, but the following chapter is going to teach you how to make custom interaction plots.
Why can’t we just use these packages?…
We go from “quick & dirty” simple slope plots to “pretty & customizable” graphs
If you don’t already have these packages installed, use the following functions to do so:
#install.packages("car") #An extremely useful/in-depth regression package
#install.packages("stargazer") #Produces easy to read regression results (similar to what you get in SPSS)
#install.packages("effects") #We will use this to create our interactions
#install.packages("ggplot2") #Our incredibly powerful and versatile graphing package
**One last thing before we get started…*
If the words “interaction” or “linear model” are sounding a little foreign, check out Chapter 12 for an awesome regression refresher!!
IQ and Work Ethic as Predictors of GPA
For all the examples in this chapter, we are actually going to simulate our own data. This eliminates the need for downloading a data set / calling in data.
library(car) #Even though we already installed "car", we have to tell R we want it to load this package for us to use
#You can choose whatever # you want for the seed; this is for randomization of your data set
set.seed(150)
#Let's make our data set will have 250 participants (n), perhaps college students!
n <- 250
#Uniform distribution of work ethic (X) from 1-5 (1 = poor work ethic, 5 = great work ethic)
X <- rnorm(n, 2.75, .75)
#We want a normal distribution of IQ (Z)
#I fixed the mean of IQ to 15 so that the regression equation works realistically, SD = 15
Z <- rnorm(n, 15, 15)
#We then create Y using a regression equation (adding a bit of random noise)
Y <- .7*X + .3*Z + 2.5*X*Z + rnorm(n, sd = 5)
#This code is here so that Y (GPA) is capped at 4.0 (the logical max for GPA)
Y = (Y - min(Y)) / (max(Y) - min(Y))*4
#Finally, we put our data together with the data.frame() function
GPA.Data <- data.frame(GPA=Y, Work.Ethic=X, IQ=Z)
Why do we center our variables?
GPA.Data$IQ.C <- scale(GPA.Data$IQ, center = TRUE, scale = FALSE)[,]
GPA.Data$Work.Ethic.C <- scale(GPA.Data$Work.Ethic, center = TRUE, scale = FALSE)[,]
Use lm() function to run model with and without interaction
Use stargazer() to get a pretty, user-friendly chart of your results
GPA.Model.1 <- lm(GPA~IQ.C+Work.Ethic.C, GPA.Data)
GPA.Model.2 <- lm (GPA~IQ.C*Work.Ethic.C, GPA.Data)
library(stargazer)
stargazer(GPA.Model.1, GPA.Model.2,type="html",
column.labels = c("Main Effects", "Interaction"),
intercept.bottom = FALSE,
single.row=FALSE,
notes.append = FALSE,
header=FALSE)
Dependent variable: | ||
GPA | ||
Main Effects | Interaction | |
(1) | (2) | |
Constant | 2.054*** | 2.054*** |
(0.008) | (0.002) | |
IQ.C | 0.041*** | 0.040*** |
(0.001) | (0.0001) | |
Work.Ethic.C | 0.199*** | 0.202*** |
(0.012) | (0.002) | |
IQ.C:Work.Ethic.C | 0.014*** | |
(0.0002) | ||
Observations | 250 | 250 |
R2 | 0.959 | 0.998 |
Adjusted R2 | 0.959 | 0.998 |
Residual Std. Error | 0.134 (df = 247) | 0.026 (df = 246) |
F Statistic | 2,888.028*** (df = 2; 247) | 51,713.400*** (df = 3; 246) |
Note: | p<0.1; p<0.05; p<0.01 |
When we are plotting the simple slopes of a continuous IV X continuous IV, we have to specify what levels of each we want to examine. There are 3 methods for choosing levels: hand picking, quantiles, standard deviation
For the next 3 methods, we are going to specify the centered Work Ethic IV to range from -2.5 to 2.5, increasing by .5, but for the centered IQ IV, we will show 3 different theoretical ways to choose our levels.
library(effects)
#Run the interaction
Inter.HandPick <- effect('IQ.C*Work.Ethic.C', GPA.Model.2,
xlevels=list(IQ.C = c(-15, 0, 15),
Work.Ethic.C = c(-1.1, 0, 1.1)),
se=TRUE, confidence.level=.95, typical=mean)
#Put data in data frame
Inter.HandPick <- as.data.frame(Inter.HandPick)
#Check out what the "head" (first 6 rows) of your data looks like
head(Inter.HandPick)
## IQ.C Work.Ethic.C fit se lower upper
## 1 -15 -1.1 1.464610 0.004670705 1.455410 1.473809
## 2 0 -1.1 1.831723 0.003040883 1.825734 1.837713
## 3 15 -1.1 2.198836 0.004717661 2.189544 2.208129
## 4 -15 0.0 1.460340 0.002350381 1.455711 1.464970
## 5 0 0.0 2.054450 0.001663308 2.051174 2.057726
## 6 15 0.0 2.648560 0.002348376 2.643934 2.653185
#Create a factor of the IQ variable used in the interaction
Inter.HandPick$IQ <- factor(Inter.HandPick$IQ.C,
levels=c(-15, 0, 15),
labels=c("1 SD Below Population Mean", "Population Mean", "1 SD Above Population Mean"))
#Create a factor of the Work Ethic variable used in the interaction
Inter.HandPick$Work.Ethic <- factor(Inter.HandPick$Work.Ethic.C,
levels=c(-1.1, 0, 1.1),
labels=c("Poor Worker", "Average Worker", "Hard Worker"))
library(ggplot2)
Plot.HandPick<-ggplot(data=Inter.HandPick, aes(x=Work.Ethic, y=fit, group=IQ))+
geom_line(size=2, aes(color=IQ))+
ylim(0,4)+
ylab("GPA")+
xlab("Work Ethic")+
ggtitle("Hand Picked Plot")
Plot.HandPick
#In R, you have to "call for" your graphs after you make them in order to see them
#Code to save plot to your computer
#ggsave("Plot.1.png", Plot.1,width = 5, height = 5, units = "in")
This plot here is an example of pretty much the simplest you can get with ggplot. These are the default settings with respect to all aesthetic elements. As we go through this chapter, I will give you bits of code that will help you make your graph prettier, more colorful, or better suited for publishing.
For even more ggplot fun, refer to Chapter 10 or this awesome ggplot Cheat Sheet
In terms of what this graph is telling us, we can visualize the fact that for smart people (1 SD above the population mean (not determined by our data set), as their work ethic increases, so does their GPA. A similar pattern is seen for people with average IQs, though the effect is not nearly as strong. For people 1 SD below the population mean on IQ, as their work ethic increases, it appears as though their GPA actually decreases. Interesting! Maybe they get more confused with the material? Who knows!
#Make your new IQ variable that asks for quantiles
IQ.Quantile <- quantile(GPA.Data$IQ.C, probs=c(0,.25,.50,.75,1))
IQ.Quantile <- round(IQ.Quantile, 2)
IQ.Quantile
## 0% 25% 50% 75% 100%
## -46.38 -9.94 -1.10 10.60 36.48
library(effects)
#Run your interaction
Inter.Quantile <- effect('IQ.C*Work.Ethic.C', GPA.Model.2,
xlevels=list(IQ.C = c(-35.44, -9.78, -0.04, 9.89, 41.90),
Work.Ethic.C = c(-1.1, 0, 1.1)),
se=TRUE, confidence.level=.95, typical=mean)
#Put data into data frame
Inter.Quantile <- as.data.frame(Inter.Quantile)
#Create factors of the different variables in your interaction:
Inter.Quantile$IQ<-factor(Inter.Quantile$IQ.C,
levels=c(-35.44, -9.78, -0.04, 9.89, 41.90),
labels=c("0%", "25%", "50%", "75%", "100%"))
Inter.Quantile$Work.Ethic<-factor(Inter.Quantile$Work.Ethic.C,
levels=c(-1.1, 0, 1.1),
labels=c("Poor Worker", "Average Worker", "Hard Worker"))
FUN WITH FONTS
install.packages(extrafont)
I did not include this package up front, as it is totally optional! If you want to play around with different font options, install this package and load it
After installation/loading, you will want to run the following code: font_import()
This code can take a few minutes to run, which is why I have not included it in the coded section of this chapter.
The last function you will need is: fonts() You will get a list of all the fonts accessible to you in R
library(extrafont)
#font_import() # run this line of coded here to install fonts
library(ggplot2)
Plot.Quantile<-ggplot(data=Inter.Quantile, aes(x=Work.Ethic, y=fit, group=IQ))+
geom_line(size=2, aes(color=IQ))+
ylab("GPA")+
xlab("Work Ethic")+
scale_color_manual(values=c("#42c5f4","#54f284","#f45dcc",
"#ff9d35","#d7afff"))+ #custom color coding
theme_bw()+ #deleting the gray background
theme(text = element_text(family="Impact", size=14, color="black"))+ #changing font!
ggtitle("Quantile Plot") #adding a title!
Plot.Quantile
So to recap the codes we learned in this plot, we now know how to change fonts, get rid of the gray background, add a title, and choose custom colors!
You may be wondering where I got these funky letter/number combinations that translate into colors. If you google “html color picker”, you can copy the color code of any color your heart desires! Pretty neat!!
Now, in terms of what we’re learning from this graph - we can see the interaction effects a lot more clearly. It seems as though all people in the 25th percentile or higher are experiencing some degreee of a positive relationship between work ethic and GPA. As work ethic and IQ increase, so does GPA! Unfortunately, for this group below the 25th percentile, there is a pretty clear negative relationship indicating that is their work ethic increases, their GPA actually decreases. Not good!
#Create our new variable for IQ based on the actual mean/standard deviation in our data set
IQ.SD <- c(mean(GPA.Data$IQ.C)-sd(GPA.Data$IQ.C),
mean(GPA.Data$IQ.C),
mean(GPA.Data$IQ.C)+sd(GPA.Data$IQ.C))
IQ.SD <- round(IQ.SD, 2)
IQ.SD
## [1] -15.29 0.00 15.29
# Note: the mean is 0 because we mean centered our data, meaning we said, make
# the mean of our data = 0! Also, we see that our standard deviations are pretty
# darn close to the expected population standard deviations. Keep in mind that
# this is simulated data, and most data in the real world will not produce such
# "typical" data
Inter.SD <- effect(c("IQ.C*Work.Ethic.C"), GPA.Model.2,
xlevels=list(IQ.C=c(-14.75, 0, 14.75),
Work.Ethic.C=c(-1.1, 0, 1.1)))
# put data in data frame
Inter.SD <- as.data.frame(Inter.SD)
# Create factors of the different variables in your interaction
Inter.SD$IQ<-factor(Inter.SD$IQ.C,
levels=c(-14.75, 0, 14.75),
labels=c("1 SD Below Mean", "Mean", "1 SD Above Mean"))
Inter.SD$Work.Ethic<-factor(Inter.SD$Work.Ethic.C,
levels=c(-1.1, 0, 1.1),
labels=c("Poor Worker", "Average Worker", "Hard Worker"))
# Plot this bad boy!
Plot.SD<-ggplot(data=Inter.SD, aes(x=Work.Ethic, y=fit, group=IQ))+
geom_line(size=1, aes(color=IQ))+ #Can adjust the thickness of your lines
geom_point(aes(colour = IQ), size=2)+ #Can adjust the size of your points
geom_ribbon(aes(ymin=fit-se, ymax=fit+se),fill="gray",alpha=.6)+ #Can adjust your error bars
ylim(0,4)+ #Puts a limit on the y-axis
ylab("GPA")+ #Adds a label to the y-axis
xlab("Work Ethic")+ #Adds a label to the x-axis
ggtitle("Standard Deviation Plot")+ #Title
theme_bw()+ #Removes the gray background
theme(panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),
legend.key = element_blank())+ #Removes the lines
scale_fill_grey()
Plot.SD
** Note - the error bars in this graph are very hard to see, because we have very little error in our simulated data set. For a full APA style graph, error bars would be expected.
IQ x Gender (Male/Female) as predictors of GPA
Now that we have gone through one full example of regression interactions, the next two sections should be a bit easier. This upcoming section is going to look at how you would run/plot a regression with 1 continuous predictor variable and 1 categorical predictor variable.
Going off of our last example, let’s say we now want to investigate how work ethic interacts with gender (as a categorical variable). Things get slightly trickier… Let’s check it out!
#Once again, we are going to begin by simulating our data
#Remember, your seed can be set to anything!
set.seed(140)
#Staying with 250 participants for consistency's sake
N <- 250
#Uniform distribution of work ethic (X) from 1-5 (1 = poor work ethic, 5 = great work ethic)
X <- rnorm(n, 2.75, .75)
#Our newest variable, G, is a binary variable (0,1) for gender
#We are asking the computer to create a dataset of 0s and 1s and call it variable G
G <- sample(rep(c(0,1),N),N,replace = FALSE)
#This is our equation to create Y
Y <- .7*X + .3*G + 2*X*G + rnorm(n, sd = 5)
#Gotta cap our Y variable at 4 (because it is GPA)
Y = (Y - min(Y)) / (max(Y) - min(Y))*4
#Finally, let's put all our variables into a data frame
#This is basically telling the computer "put all these variables I just made into one data set"
GPA.Data.2<-data.frame(GPA=Y, Work.Ethic=X, Gender=G)
#Don't forget to center our continuous variable!
GPA.Data.2$Work.Ethic.C <- scale(GPA.Data$Work.Ethic, center = TRUE, scale = FALSE)[,]
Here is where things get a little different..
What is Dummy Coding?
In this particular case, since there are only two levels of the variable Gender (male and female), it is quite a simple dummy code of 0, 1. All males in the data set are assigned a 0 and all females are assigned a 1.
Previously, I wrote that R dummy codes automatically. While we get the 0s and 1s automatically, it is far more intuitive to rename our factor to something that makes more sense.
GPA.Data.2$Gender.F <- factor(GPA.Data.2$Gender,
level=c(0,1),
labels=c("Male","Female"))
Use lm() function to run model with and without interaction
Use stargazer() to visualize your results
GPA.2.Model.1 <- lm(GPA~Work.Ethic.C+Gender.F, GPA.Data.2)
GPA.2.Model.2 <- lm(GPA~Work.Ethic.C*Gender.F, GPA.Data.2)
library(stargazer)
stargazer(GPA.2.Model.1, GPA.2.Model.2,type="html",
column.labels = c("Main Effects", "Interaction"),
intercept.bottom = FALSE,
single.row=FALSE,
notes.append = FALSE,
header=FALSE)
Dependent variable: | ||
GPA | ||
Main Effects | Interaction | |
(1) | (2) | |
Constant | 1.540*** | 1.539*** |
(0.063) | (0.063) | |
Work.Ethic.C | 0.136** | 0.175** |
(0.060) | (0.081) | |
Gender.FFemale | 0.570*** | 0.570*** |
(0.087) | (0.087) | |
Work.Ethic.C:Gender.FFemale | -0.087 | |
(0.122) | ||
Observations | 250 | 250 |
R2 | 0.161 | 0.163 |
Adjusted R2 | 0.154 | 0.153 |
Residual Std. Error | 0.685 (df = 247) | 0.686 (df = 246) |
F Statistic | 23.740*** (df = 2; 247) | 15.965*** (df = 3; 246) |
Note: | p<0.1; p<0.05; p<0.01 |
Let’s go right into creating our interaction!
Keep in mind, we already turned Gender into a Factor with labeled levels, so we can refer to the actual names of the levels (instead of numbers)
library(effects)
#Our interaction
Inter.GPA.2 <- effect('Work.Ethic.C*Gender.F', GPA.2.Model.2,
xlevels=list(Work.Ethic.C = c(-1.1, 0, 1.1)),
se=TRUE, confidence.level=.95, typical=mean)
#Put data in data frame
Inter.GPA.2<-as.data.frame(Inter.GPA.2)
#Create factors of the interaction variables
Inter.GPA.2$Work.Ethic<-factor(Inter.GPA.2$Work.Ethic.C,
levels=c(-1.1, 0, 1.1),
labels=c("Poor Worker", "Average Worker", "Hard Worker"))
Inter.GPA.2$Gender<-factor(Inter.GPA.2$Gender.F,
levels=c("Male", "Female"))
#Note: when I create this Gender factor, I will no longer use ".F" so I don't have to rename my legend in my plot
library(ggplot2)
#Plot it up!
Plot.GPA.2<-ggplot(data=Inter.GPA.2, aes(x=Work.Ethic, y=fit, group=Gender))+
coord_cartesian(ylim = c(0,4))+
#For ylim, specify the range of your DV (in our case, 0-4)
geom_line(size=2, aes(color=Gender))+
ylab("GPA")+
xlab("Work Ethic")+
ggtitle("Work Ethic and Gender as Predictors of GPA")+
theme_bw()+
theme(panel.grid.major=element_blank(),
panel.grid.minor=element_blank())+
scale_fill_grey()
Plot.GPA.2
#### Interpretation of Continuous x Categorial Interaction Plot As you can see, there is not much of an interaction, which we would expect after seeing that our interaction effect was insignificant.
Tutors and Gender as Predictors of GPA
For the final example of the chapter, we are going to look at plotting interactions with 2 categorical predictors. We know that students differ in their access to/use of tutoring and it would be interesting to see how Gender interacts with tutoring services.
Students in this study either have:
#Set up simulation
set.seed(244)
N <- 250
Q <- sample(rep(c(-1,0,1),N),N,replace = FALSE) #Q = Tutor Status
G <- sample(rep(c(0,1),N*3/2),N,replace = FALSE) #G = Gender
#Our equation to create Y
Y <- .5*Q + .25*G + 2.5*Q*G+ 1 + rnorm(N, sd=2)
#Put a cap on our Y
Y = (Y - min(Y)) / (max(Y) - min(Y))*4
#Build our data frame
GPA.Data.3<-data.frame(GPA=Y,Tutor=Q,Gender=G)
Similar to the last example, we are going to now create factors with dummy codes. This time, however,we need to do this for BOTH predictor variables (gender & tutor) because we have 2 categorical variables.
GPA.Data.3$Tutor.F <- factor(GPA.Data.3$Tutor,
level=c(-1,0,1),
labels=c("No Tutor", "Group Tutor", "Private Tutor"))
GPA.Data.3$Gender.F <- factor(GPA.Data.3$Gender,
level=c(0,1),
labels=c("Male", "Female"))
Once again, we look at both our main effects model & interaction model and use stargazer to compare the two models.
GPA.3.Model.1<-lm(GPA ~ Tutor.F+Gender.F, data = GPA.Data.3)
GPA.3.Model.2<-lm(GPA ~ Tutor.F*Gender.F, data = GPA.Data.3)
stargazer(GPA.3.Model.1, GPA.3.Model.2,type="html",
column.labels = c("Main Effects", "Interaction"),
intercept.bottom = FALSE,
single.row=TRUE,
notes.append = FALSE,
omit.stat=c("ser"),
star.cutoffs = c(0.05, 0.01, 0.001),
header=FALSE)
Dependent variable: | ||
GPA | ||
Main Effects | Interaction | |
(1) | (2) | |
Constant | 1.500*** (0.076) | 1.739*** (0.081) |
Tutor.FGroup Tutor | 0.421*** (0.099) | 0.285* (0.127) |
Tutor.FPrivate Tutor | 0.970*** (0.095) | 0.407*** (0.116) |
Gender.FFemale | 0.120 (0.079) | -0.467*** (0.127) |
Tutor.FGroup Tutor:Gender.FFemale | 0.410* (0.180) | |
Tutor.FPrivate Tutor:Gender.FFemale | 1.250*** (0.173) | |
Observations | 250 | 250 |
R2 | 0.309 | 0.436 |
Adjusted R2 | 0.301 | 0.424 |
F Statistic | 36.721*** (df = 3; 246) | 37.693*** (df = 5; 244) |
Note: | p<0.05; p<0.01; p<0.001 |
#The Interaction
Inter.GPA.3 <- effect('Tutor.F*Gender.F', GPA.3.Model.2,
se=TRUE)
#Data Frame
Inter.GPA.3.DF<-as.data.frame(Inter.GPA.3)
# Relable them to put them back in order
Inter.GPA.3.DF$Tutor.F <- factor(Inter.GPA.3.DF$Tutor,
level=c("No Tutor", "Group Tutor", "Private Tutor"),
labels=c("No Tutor", "Group Tutor", "Private Tutor"))
Inter.GPA.3.DF$Gender.F <- factor(Inter.GPA.3.DF$Gender,
level=c("Male", "Female"),
labels=c("Male", "Female"))
#Create plot
Plot.GPA.3<-ggplot(data=Inter.GPA.3.DF, aes(x=Tutor.F, y=fit, group=Gender.F))+
geom_line(size=2, aes(color=Gender.F))+
geom_ribbon(aes(ymin=fit-se, ymax=fit+se,fill=Gender.F),alpha=.2)+
ylab("GPA")+
xlab("Tutor")+
ggtitle("Tutors and Gender as GPA Predictors")+
theme_bw()+
theme(text = element_text(size=12),
legend.text = element_text(size=12),
legend.direction = "horizontal",
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position="top")
Plot.GPA.3
A final little note… There are definitely easier ways to make plots in R, but I want to show you with this final example the difference between using effects/ggplot and simpler code. I will say, it is helpful to use these simple codes as you are working through your analysis to visualize your data, but in terms of publishing your data, ggplot will give you the quality you need!!
plot(Inter.GPA.3, multiline = TRUE)