# 1 Basics

The following chapter will include:

• Basic information about interactions and simple slopes
• Background to plotting interactions in R
• A real-life example
• Code to simulate data set
• Continuous X Continuous Regression: code and interpretation
• Nominal X Continuous Regression: code and interpretation
• Nominal X Nominal Regression: code and translation

## 1.1 What is an interaction?

Interaction: When the effect of one independent variable differs based on the level or magnitude of another independent variable

• y = A + B + A*B
• y = dependent variable
• A = independent variable
• B = independent variabile
• A*B = interaction between A and B

Click here for Jaccard & Turrisi 2003 Interaction Effects in Multiple Regression

## 1.2 What is a simple slope?

• A simple slope is a regression line at one level of a predictor variable

• Think of simple slopes as the visualization of an interaction

How do we plot these things in R?…

## 1.3 Interaction Plotting Packages

When running a regression in R, it is likely that you will be interested in interactions. The following packages and functions are good places to start, but the following chapter is going to teach you how to make custom interaction plots.

• lm() function: your basic regression function that will give you interaction terms
• stargazer package, stargazer() function: pretty summary of regression results
• rockchalk package, plotSlopes() function: quick and basic graph of simple slopes

Why can’t we just use these packages?…

## 1.4 Benefits of Custom Interaction Plots

#### 1.4.0.1 Using effects and ggplot2

• Full control over what you’re plotting (i.e. x level of x variable)
• More accurate calculations of mean/error/etc.
• Can enter in own values for confidence intervals, standard error bars, etc.
• Can customize every aspect of the graphs (color, size of text, data points)

We go from “quick & dirty” simple slope plots to “pretty & customizable” graphs

## 1.5 The Packages You Need

If you don’t already have these packages installed, use the following functions to do so:

#install.packages("car") #An extremely useful/in-depth regression package
#install.packages("stargazer") #Produces easy to read regression results (similar to what you get in SPSS)
#install.packages("effects") #We will use this to create our interactions
#install.packages("ggplot2") #Our incredibly powerful and versatile graphing package 

**One last thing before we get started…*

If the words “interaction” or “linear model” are sounding a little foreign, check out Chapter 12 for an awesome regression refresher!!

# 2 Continuous x Continuous Regression

IQ and Work Ethic as Predictors of GPA

For all the examples in this chapter, we are actually going to simulate our own data. This eliminates the need for downloading a data set / calling in data.

library(car) #Even though we already installed "car", we have to tell R we want it to load this package for us to use
#You can choose whatever # you want for the seed; this is for randomization of your data set
set.seed(150)
#Let's make our data set will have 250 participants (n), perhaps college students!
n <- 250
#Uniform distribution of work ethic (X) from 1-5 (1 = poor work ethic, 5 = great work ethic)
X <- rnorm(n, 2.75, .75)
#We want a normal distribution of IQ (Z)
#I fixed the mean of IQ to 15 so that the regression equation works realistically, SD = 15
Z <- rnorm(n, 15, 15)
#We then create Y using a regression equation (adding a bit of random noise)
Y <- .7*X + .3*Z + 2.5*X*Z + rnorm(n, sd = 5)
#This code is here so that Y (GPA) is capped at 4.0 (the logical max for GPA)
Y = (Y - min(Y)) / (max(Y) - min(Y))*4
#Finally, we put our data together with the data.frame() function
GPA.Data <- data.frame(GPA=Y, Work.Ethic=X, IQ=Z)   

## 2.2 Center your independent variables

Why do we center our variables?

• To avoid problems of multicollinearity! When a model has multicollinearity, it doesn’t know which term to give the variance to (“You gave me 3 lines that are the same!” - angry model)
• When we center our IVs, the center of each IV represents the mean
• When you interact X * Z, you are adding a new predictor (XZ) that strongly correlates with X and Z
• If you center your variables, you will now have a U-shaped interaction that is orthogonal to X and Z
• Exceptions: Don’t center physical data or when there is a true, meaningful 0
GPA.Data$IQ.C <- scale(GPA.Data$IQ, center = TRUE, scale = FALSE)[,]
GPA.Data$Work.Ethic.C <- scale(GPA.Data$Work.Ethic, center = TRUE, scale = FALSE)[,]

## 2.3 Run your regression models

Use lm() function to run model with and without interaction

• Multiplicative (interaction) effects = *

Use stargazer() to get a pretty, user-friendly chart of your results

GPA.Model.1 <- lm(GPA~IQ.C+Work.Ethic.C, GPA.Data)
GPA.Model.2 <- lm (GPA~IQ.C*Work.Ethic.C, GPA.Data)

library(stargazer)
stargazer(GPA.Model.1, GPA.Model.2,type="html",
column.labels = c("Main Effects", "Interaction"),
intercept.bottom = FALSE,
single.row=FALSE,
notes.append = FALSE,
header=FALSE) 
 Dependent variable: GPA Main Effects Interaction (1) (2) Constant 2.054*** 2.054*** (0.008) (0.002) IQ.C 0.041*** 0.040*** (0.001) (0.0001) Work.Ethic.C 0.199*** 0.202*** (0.012) (0.002) IQ.C:Work.Ethic.C 0.014*** (0.0002) Observations 250 250 R2 0.959 0.998 Adjusted R2 0.959 0.998 Residual Std. Error 0.134 (df = 247) 0.026 (df = 246) F Statistic 2,888.028*** (df = 2; 247) 51,713.400*** (df = 3; 246) Note: p<0.1; p<0.05; p<0.01

When we are plotting the simple slopes of a continuous IV X continuous IV, we have to specify what levels of each we want to examine. There are 3 methods for choosing levels: hand picking, quantiles, standard deviation

For the next 3 methods, we are going to specify the centered Work Ethic IV to range from -2.5 to 2.5, increasing by .5, but for the centered IQ IV, we will show 3 different theoretical ways to choose our levels.

### 2.4.1 Plotting simple slopes: Hand Picking

• Hand picking is useful if you have specific predictions in your data set
• If you are working with IQ, a drug, or age - numbers are relevant and are useful to pick!
• For our example, let’s go with -15, 0, 15 for our centered IQ (1 SD above and below mean)
• c() will give you the exact values and seq() will give you a range from a to b, increasing by c
library(effects)
#Run the interaction
Inter.HandPick <- effect('IQ.C*Work.Ethic.C', GPA.Model.2,
xlevels=list(IQ.C = c(-15, 0, 15),
Work.Ethic.C = c(-1.1, 0, 1.1)),
se=TRUE, confidence.level=.95, typical=mean)

#Put data in data frame
Inter.HandPick <- as.data.frame(Inter.HandPick)

#Check out what the "head" (first 6 rows) of your data looks like
head(Inter.HandPick)
##   IQ.C Work.Ethic.C      fit          se    lower    upper
## 1  -15         -1.1 1.464610 0.004670705 1.455410 1.473809
## 2    0         -1.1 1.831723 0.003040883 1.825734 1.837713
## 3   15         -1.1 2.198836 0.004717661 2.189544 2.208129
## 4  -15          0.0 1.460340 0.002350381 1.455711 1.464970
## 5    0          0.0 2.054450 0.001663308 2.051174 2.057726
## 6   15          0.0 2.648560 0.002348376 2.643934 2.653185
#Create a factor of the IQ variable used in the interaction
Inter.HandPick$IQ <- factor(Inter.HandPick$IQ.C,
levels=c(-15, 0, 15),
labels=c("1 SD Below Population Mean", "Population Mean", "1 SD Above Population Mean"))

#Create a factor of the Work Ethic variable used in the interaction
Inter.HandPick$Work.Ethic <- factor(Inter.HandPick$Work.Ethic.C,
levels=c(-1.1, 0, 1.1),
labels=c("Poor Worker", "Average Worker", "Hard Worker"))

library(ggplot2)
Plot.HandPick<-ggplot(data=Inter.HandPick, aes(x=Work.Ethic, y=fit, group=IQ))+
geom_line(size=2, aes(color=IQ))+
ylim(0,4)+
ylab("GPA")+
xlab("Work Ethic")+
ggtitle("Hand Picked Plot")

Plot.HandPick 

#In R, you have to "call for" your graphs after you make them in order to see them

#Code to save plot to your computer
#ggsave("Plot.1.png", Plot.1,width = 5, height = 5, units = "in")

#### 2.4.1.1 Interpretation of Hand Picked Plot

This plot here is an example of pretty much the simplest you can get with ggplot. These are the default settings with respect to all aesthetic elements. As we go through this chapter, I will give you bits of code that will help you make your graph prettier, more colorful, or better suited for publishing.

For even more ggplot fun, refer to Chapter 10 or this awesome ggplot Cheat Sheet

In terms of what this graph is telling us, we can visualize the fact that for smart people (1 SD above the population mean (not determined by our data set), as their work ethic increases, so does their GPA. A similar pattern is seen for people with average IQs, though the effect is not nearly as strong. For people 1 SD below the population mean on IQ, as their work ethic increases, it appears as though their GPA actually decreases. Interesting! Maybe they get more confused with the material? Who knows!

### 2.4.2 Plotting simple slopes: Quantile

• Let’s use levels that are based on quantiles (bins based on probability)
• You can ask for as many or as few quantiles as you want
• Non-parametric; based on probability and does not assume normality of IV
• For this example, let’s ask for 5 quantiles and have them rounded to 2 decimal points
#Make your new IQ variable that asks for quantiles
IQ.Quantile <- quantile(GPA.Data$IQ.C, probs=c(0,.25,.50,.75,1)) IQ.Quantile <- round(IQ.Quantile, 2) IQ.Quantile  ## 0% 25% 50% 75% 100% ## -46.38 -9.94 -1.10 10.60 36.48 library(effects) #Run your interaction Inter.Quantile <- effect('IQ.C*Work.Ethic.C', GPA.Model.2, xlevels=list(IQ.C = c(-35.44, -9.78, -0.04, 9.89, 41.90), Work.Ethic.C = c(-1.1, 0, 1.1)), se=TRUE, confidence.level=.95, typical=mean) #Put data into data frame Inter.Quantile <- as.data.frame(Inter.Quantile) #Create factors of the different variables in your interaction: Inter.Quantile$IQ<-factor(Inter.Quantile$IQ.C, levels=c(-35.44, -9.78, -0.04, 9.89, 41.90), labels=c("0%", "25%", "50%", "75%", "100%")) Inter.Quantile$Work.Ethic<-factor(Inter.Quantile$Work.Ethic.C, levels=c(-1.1, 0, 1.1), labels=c("Poor Worker", "Average Worker", "Hard Worker")) FUN WITH FONTS install.packages(extrafont) I did not include this package up front, as it is totally optional! If you want to play around with different font options, install this package and load it After installation/loading, you will want to run the following code: font_import() This code can take a few minutes to run, which is why I have not included it in the coded section of this chapter. The last function you will need is: fonts() You will get a list of all the fonts accessible to you in R library(extrafont) #font_import() # run this line of coded here to install fonts library(ggplot2) Plot.Quantile<-ggplot(data=Inter.Quantile, aes(x=Work.Ethic, y=fit, group=IQ))+ geom_line(size=2, aes(color=IQ))+ ylab("GPA")+ xlab("Work Ethic")+ scale_color_manual(values=c("#42c5f4","#54f284","#f45dcc", "#ff9d35","#d7afff"))+ #custom color coding theme_bw()+ #deleting the gray background theme(text = element_text(family="Impact", size=14, color="black"))+ #changing font! ggtitle("Quantile Plot") #adding a title! Plot.Quantile #### 2.4.2.1 Interpretation of Quantile Plot So to recap the codes we learned in this plot, we now know how to change fonts, get rid of the gray background, add a title, and choose custom colors! You may be wondering where I got these funky letter/number combinations that translate into colors. If you google “html color picker”, you can copy the color code of any color your heart desires! Pretty neat!! Now, in terms of what we’re learning from this graph - we can see the interaction effects a lot more clearly. It seems as though all people in the 25th percentile or higher are experiencing some degreee of a positive relationship between work ethic and GPA. As work ethic and IQ increase, so does GPA! Unfortunately, for this group below the 25th percentile, there is a pretty clear negative relationship indicating that is their work ethic increases, their GPA actually decreases. Not good! ### 2.4.3 Plotting simple slopes: Standard Deviation • Lastly, let’s choose our levels based on the standard deviation of the data • We can select values based on the mean and SD of our data • For this example, we will do 3 values: M - 1SD, M, M + 1SD, where M = mean • Once again, we are going to round off these values at 2 decimal points with **round() • Note: because we have centered our data, M = 0 & remember, centering doesn’t change our SD • Another note: since we “hand picked” what we know to be the traditional mean and SD for IQ, these levels should look very similar to our first simple slopes graph! #Create our new variable for IQ based on the actual mean/standard deviation in our data set IQ.SD <- c(mean(GPA.Data$IQ.C)-sd(GPA.Data$IQ.C), mean(GPA.Data$IQ.C),
mean(GPA.Data$IQ.C)+sd(GPA.Data$IQ.C))

IQ.SD <- round(IQ.SD, 2)
IQ.SD
## [1] -15.29   0.00  15.29
# Note: the mean is 0 because we mean centered our data, meaning we said, make
# the mean of our data = 0! Also, we see that our standard deviations are pretty
# darn close to the expected population standard deviations. Keep in mind that
# this is simulated data, and most data in the real world will not produce such
# "typical" data
Inter.SD <- effect(c("IQ.C*Work.Ethic.C"), GPA.Model.2,
xlevels=list(IQ.C=c(-14.75, 0, 14.75),
Work.Ethic.C=c(-1.1, 0, 1.1)))
# put data in data frame
Inter.SD <- as.data.frame(Inter.SD)

# Create factors of the different variables in your interaction
Inter.SD$IQ<-factor(Inter.SD$IQ.C,
levels=c(-14.75, 0, 14.75),
labels=c("1 SD Below Mean", "Mean", "1 SD Above Mean"))

Inter.SD$Work.Ethic<-factor(Inter.SD$Work.Ethic.C,
levels=c(-1.1, 0, 1.1),
labels=c("Poor Worker", "Average Worker", "Hard Worker"))

Plot.SD<-ggplot(data=Inter.SD, aes(x=Work.Ethic, y=fit, group=IQ))+
ylim(0,4)+ #Puts a limit on the y-axis
ylab("GPA")+ #Adds a label to the y-axis
xlab("Work Ethic")+ #Adds a label to the x-axis
ggtitle("Standard Deviation Plot")+ #Title
theme_bw()+ #Removes the gray background
theme(panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),
legend.key = element_blank())+ #Removes the lines
scale_fill_grey()
Plot.SD

#### 2.4.3.1 Interpretation of SD plot

** Note - the error bars in this graph are very hard to see, because we have very little error in our simulated data set. For a full APA style graph, error bars would be expected.

# 3 Continuous x Categorical Regression

IQ x Gender (Male/Female) as predictors of GPA

Now that we have gone through one full example of regression interactions, the next two sections should be a bit easier. This upcoming section is going to look at how you would run/plot a regression with 1 continuous predictor variable and 1 categorical predictor variable.

Going off of our last example, let’s say we now want to investigate how work ethic interacts with gender (as a categorical variable). Things get slightly trickier… Let’s check it out!

#Once again, we are going to begin by simulating our data
#Remember, your seed can be set to anything!
set.seed(140)
#Staying with 250 participants for consistency's sake
N <- 250
#Uniform distribution of work ethic (X) from 1-5 (1 = poor work ethic, 5 = great work ethic)
X <- rnorm(n, 2.75, .75)
#Our newest variable, G, is a binary variable (0,1) for gender
#We are asking the computer to create a dataset of 0s and 1s and call it variable G
G <- sample(rep(c(0,1),N),N,replace = FALSE)
#This is our equation to create Y
Y <- .7*X + .3*G + 2*X*G + rnorm(n, sd = 5)
#Gotta cap our Y variable at 4 (because it is GPA)
Y = (Y - min(Y)) / (max(Y) - min(Y))*4
#Finally, let's put all our variables into a data frame
#This is basically telling the computer "put all these variables I just made into one data set"
GPA.Data.2<-data.frame(GPA=Y, Work.Ethic=X, Gender=G)
#Don't forget to center our continuous variable!
GPA.Data.2$Work.Ethic.C <- scale(GPA.Data$Work.Ethic, center = TRUE, scale = FALSE)[,]

## 3.1 Dummy Coding

Here is where things get a little different..

What is Dummy Coding?

• It is the most common and basic way to analyze categorical variables in regression
• Every variable has a baseline/reference group that other “levels” get compared to
• R dummy codes automatically when it detects factor variables
• The question we are asking is: “how much does each group deviate from the reference?”

In this particular case, since there are only two levels of the variable Gender (male and female), it is quite a simple dummy code of 0, 1. All males in the data set are assigned a 0 and all females are assigned a 1.

Previously, I wrote that R dummy codes automatically. While we get the 0s and 1s automatically, it is far more intuitive to rename our factor to something that makes more sense.

• We are creating a new variable, called Gender.F, where F stands for factor
• This variable now has levels with words, instead of just 0s and 1s
• Note: it is very important that your labels are spelled right (or that you consistently spell your labels incorrectly) because you will be entering in these exact labels again when you create your interactions
GPA.Data.2$Gender.F <- factor(GPA.Data.2$Gender,
level=c(0,1),
labels=c("Male","Female"))   

## 3.2 Run your regression models

Use lm() function to run model with and without interaction

• Multiplicative (interaction) effects = *

Use stargazer() to visualize your results

GPA.2.Model.1 <- lm(GPA~Work.Ethic.C+Gender.F, GPA.Data.2)
GPA.2.Model.2 <- lm(GPA~Work.Ethic.C*Gender.F, GPA.Data.2)

library(stargazer)
stargazer(GPA.2.Model.1, GPA.2.Model.2,type="html",
column.labels = c("Main Effects", "Interaction"),
intercept.bottom = FALSE,
single.row=FALSE,
notes.append = FALSE,
header=FALSE) 
 Dependent variable: GPA Main Effects Interaction (1) (2) Constant 1.540*** 1.539*** (0.063) (0.063) Work.Ethic.C 0.136** 0.175** (0.060) (0.081) Gender.FFemale 0.570*** 0.570*** (0.087) (0.087) Work.Ethic.C:Gender.FFemale -0.087 (0.122) Observations 250 250 R2 0.161 0.163 Adjusted R2 0.154 0.153 Residual Std. Error 0.685 (df = 247) 0.686 (df = 246) F Statistic 23.740*** (df = 2; 247) 15.965*** (df = 3; 246) Note: p<0.1; p<0.05; p<0.01

Let’s go right into creating our interaction!

Keep in mind, we already turned Gender into a Factor with labeled levels, so we can refer to the actual names of the levels (instead of numbers)

library(effects)
#Our interaction
Inter.GPA.2 <- effect('Work.Ethic.C*Gender.F', GPA.2.Model.2,
xlevels=list(Work.Ethic.C = c(-1.1, 0, 1.1)),
se=TRUE, confidence.level=.95, typical=mean)

#Put data in data frame
Inter.GPA.2<-as.data.frame(Inter.GPA.2)

#Create factors of the interaction variables
Inter.GPA.2$Work.Ethic<-factor(Inter.GPA.2$Work.Ethic.C,
levels=c(-1.1, 0, 1.1),
labels=c("Poor Worker", "Average Worker", "Hard Worker"))
Inter.GPA.2$Gender<-factor(Inter.GPA.2$Gender.F,
levels=c("Male", "Female"))
#Note: when I create this Gender factor, I will no longer use ".F" so I don't have to rename my legend in my plot

library(ggplot2)
#Plot it up!
Plot.GPA.2<-ggplot(data=Inter.GPA.2, aes(x=Work.Ethic, y=fit, group=Gender))+
coord_cartesian(ylim = c(0,4))+
#For ylim, specify the range of your DV (in our case, 0-4)
geom_line(size=2, aes(color=Gender))+
ylab("GPA")+
xlab("Work Ethic")+
ggtitle("Work Ethic and Gender as Predictors of GPA")+
theme_bw()+
theme(panel.grid.major=element_blank(),
panel.grid.minor=element_blank())+
scale_fill_grey()
Plot.GPA.2

#### Interpretation of Continuous x Categorial Interaction Plot As you can see, there is not much of an interaction, which we would expect after seeing that our interaction effect was insignificant.

# 4 Categorical x Categorical Regression

Tutors and Gender as Predictors of GPA

For the final example of the chapter, we are going to look at plotting interactions with 2 categorical predictors. We know that students differ in their access to/use of tutoring and it would be interesting to see how Gender interacts with tutoring services.

Students in this study either have:

• No Tutor
• Group Tutor
• Private Tutor

## 4.1 Data simulation

#Set up simulation
set.seed(244)
N <- 250
Q <- sample(rep(c(-1,0,1),N),N,replace = FALSE) #Q = Tutor Status
G <- sample(rep(c(0,1),N*3/2),N,replace = FALSE) #G = Gender

#Our equation to create Y
Y <- .5*Q + .25*G + 2.5*Q*G+ 1 + rnorm(N, sd=2)

#Put a cap on our Y
Y = (Y - min(Y)) / (max(Y) - min(Y))*4

#Build our data frame
GPA.Data.3<-data.frame(GPA=Y,Tutor=Q,Gender=G)  

## 4.2 Dummy coding

Similar to the last example, we are going to now create factors with dummy codes. This time, however,we need to do this for BOTH predictor variables (gender & tutor) because we have 2 categorical variables.

GPA.Data.3$Tutor.F <- factor(GPA.Data.3$Tutor,
level=c(-1,0,1),
labels=c("No Tutor", "Group Tutor", "Private Tutor"))
GPA.Data.3$Gender.F <- factor(GPA.Data.3$Gender,
level=c(0,1),
labels=c("Male", "Female"))  

Once again, we look at both our main effects model & interaction model and use stargazer to compare the two models.

GPA.3.Model.1<-lm(GPA ~ Tutor.F+Gender.F, data = GPA.Data.3)
GPA.3.Model.2<-lm(GPA ~ Tutor.F*Gender.F, data = GPA.Data.3)

stargazer(GPA.3.Model.1, GPA.3.Model.2,type="html",
column.labels = c("Main Effects", "Interaction"),
intercept.bottom = FALSE,
single.row=TRUE,
notes.append = FALSE,
omit.stat=c("ser"),
star.cutoffs = c(0.05, 0.01, 0.001),
header=FALSE) 
 Dependent variable: GPA Main Effects Interaction (1) (2) Constant 1.500*** (0.076) 1.739*** (0.081) Tutor.FGroup Tutor 0.421*** (0.099) 0.285* (0.127) Tutor.FPrivate Tutor 0.970*** (0.095) 0.407*** (0.116) Gender.FFemale 0.120 (0.079) -0.467*** (0.127) Tutor.FGroup Tutor:Gender.FFemale 0.410* (0.180) Tutor.FPrivate Tutor:Gender.FFemale 1.250*** (0.173) Observations 250 250 R2 0.309 0.436 Adjusted R2 0.301 0.424 F Statistic 36.721*** (df = 3; 246) 37.693*** (df = 5; 244) Note: p<0.05; p<0.01; p<0.001

## 4.4 Now for the interaction plot!

#The Interaction
Inter.GPA.3 <- effect('Tutor.F*Gender.F', GPA.3.Model.2,
se=TRUE)
#Data Frame
Inter.GPA.3.DF<-as.data.frame(Inter.GPA.3)

# Relable them to put them back in order
Inter.GPA.3.DF$Tutor.F <- factor(Inter.GPA.3.DF$Tutor,
level=c("No Tutor", "Group Tutor", "Private Tutor"),
labels=c("No Tutor", "Group Tutor", "Private Tutor"))
Inter.GPA.3.DF$Gender.F <- factor(Inter.GPA.3.DF$Gender,
level=c("Male", "Female"),
labels=c("Male", "Female"))

#Create plot
Plot.GPA.3<-ggplot(data=Inter.GPA.3.DF, aes(x=Tutor.F, y=fit, group=Gender.F))+
geom_line(size=2, aes(color=Gender.F))+
geom_ribbon(aes(ymin=fit-se, ymax=fit+se,fill=Gender.F),alpha=.2)+
ylab("GPA")+
xlab("Tutor")+
ggtitle("Tutors and Gender as GPA Predictors")+
theme_bw()+
theme(text = element_text(size=12),
legend.text = element_text(size=12),
legend.direction = "horizontal",
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position="top")
Plot.GPA.3

A final little note… There are definitely easier ways to make plots in R, but I want to show you with this final example the difference between using effects/ggplot and simpler code. I will say, it is helpful to use these simple codes as you are working through your analysis to visualize your data, but in terms of publishing your data, ggplot will give you the quality you need!!

plot(Inter.GPA.3, multiline = TRUE)