1 What is R?

R is a programming language and environment for statistical computing and graphics. The software is free and open-source.

R provides a wide variety of statistical and graphical techniques, and is highly extensible (i.e., designed so that users or developers can expand or add to its capabilities). There are many user-written packages.

The following is pulled from the “About R” section of the R Project’s website (https://www.r-project.org/about.html):

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes an effective data handling and storage facility, a suite of operators for calculations on arrays, in particular matrices, a large, coherent, integrated collection of intermediate tools for data analysis, graphical facilities for data analysis and display either on-screen or on hardcopy, and a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.

1.1 R can be used in many ways

Download and apply GUIs (graphical user interfaces, which provide a more user-friendly environment) to the R language.
–> Makes it look like SPSS
–> No programming needed
–> This way of doing things is useful as a first exposure

Learn and use the syntax of specific packages (a common way non-programmers use R).
–> Packages are groups of functions and help files. Someone else has already “programmed” the solution, and you are using it because it fits your needs well enough.
–> Each package has its own syntax, constraints, and bugs
–> Packages are all open source

Learn R programming.
–> Write your own functions
–> Hardest to learn, but gives maximum flexibility

2 What is R Studio?

R Studio is an integrated development environment (IDE) for R. It includes a console, syntax-highlighting editor that supports direct code execution, and tools for plotting, history, debugging and workspace management. It makes R easier to use, as it is a relatively user-friendly environment. And free!

2.1 Where to download R and R Studio?

R https://cran.r-project.org/bin/windows/base/

See https://cran.r-project.org/doc/FAQ/R-FAQ.html for information about the current version of R, and how it can be obtained and installed.

R Studio https://www.rstudio.com/products/rstudio/download3/ or https://www.rstudio.com/products/rstudio/download/preview/

See https://www.rstudio.com/products/RStudio/ for more information about R Studio

2.2 Basics of Using R Studio

*For a more in-depth introduction to R Studio, see http://dss.princeton.edu/training/RStudio101.pdf

2.2.1 R Studio Elements

Drop-down Menus: Drop-down menus at the top of the window provide menu options that allow you to maneuver in R Studio a bit without writing out lines of code. As might be expected, the options are limited in comparison to what you can code yourself, but they may feel more user-friendly for those not used to coding, and can provide useful shortcuts.

Text Editor: This is where you write in the script you want to run. You can type multiple lines of code into the text editor without having each line evaluated by R.

Pressing “enter” will move the cursor to the next line, but will not command the code to run.

To run a line or block of code:

PC users: Ctrl+Enter

Mac users: Command+Enter

Putting “#” in front of a line means the code on that line won’t run; instead, it allows you to write a comment.

Common pitfall: using a capital letter in some lines of code and not in others (e.g., “Time1” in some places and “time1” in others). Capitalization must be consistent. Spaces between commands, however, don’t matter.

*The directions that follow all involve commands you will enter into the text editor.
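As a small sketch of the points above about comments, capitalization, and spacing (the object name Time1 is just a made-up example):

```r
# A line that starts with # is a comment; R does not run it
Time1 <- 10          # create an object called Time1

# R is case sensitive: Time1 exists, but time1 does not
exists("Time1")      # TRUE
exists("time1")      # FALSE

# Spaces around operators do not change the result
a<-1+2
b <- 1 + 2
identical(a, b)      # TRUE
```

Typing time1 on its own here would give an "object not found" error, which is usually the first sign of a capitalization slip.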

Console: You can type commands in the console as you do in the text editor, as well as see output.

Environment: The Environment tab stores and shows all of the active objects, values, and functions (and anything else you create) during your R session.

2.3 To set the working directory

The “working directory” is the specific location - i.e. the filepath - where you want R to read in data or files from and write out results to.

If you don’t specify the working directory at the beginning of your session, you run the risk of pulling from the wrong dataset, or missing the dataset, and saving out to the wrong location and therefore losing your results.

On PCs, you can find the filepath by left-clicking on the address bar at the top of the window for the folder you want to designate as your working directory.

On Macs, you can find the filepath by selecting the folder you want, hitting Command+I or right-clicking on the folder and selecting “Get Info,” then looking at “Where” under “General.”

# set the working directory with the following function: setwd(filepath)

#setwd('C:/Users/PsychologyDept/Desktop')
# This example shows how to set the working directory in R to the folder "Desktop" within the folder "PsychologyDept," which in turn is in the folder "Users" on the C drive.

#Note that you must use the forward slash / or double backslash \\ in R. The Windows format of single backslash will not work.

#You can check what your working directory is by using the following function:

#getwd()

More information on working directories and workspaces can be found at: https://support.rstudio.com/hc/en-us/articles/200711843-Working-Directories-and-Workspaces
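The following is a minimal sketch of the set-and-check pattern described above. So that it runs on any computer, it uses R's temporary folder (from tempdir()) as a stand-in for your own folder; in practice you would put your own filepath inside setwd().

```r
old.dir <- getwd()        # remember the current working directory
setwd(tempdir())          # switch to a temporary folder (stand-in for your own folder)
getwd()                   # prints the new working directory
list.files()              # lists the files R can see in that folder
setwd(old.dir)            # switch back to where we started
```

Running getwd() right after setwd() is a quick way to confirm that R is reading from and writing to the folder you intended.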

Data in R can be stored in a few different forms.

Numeric is the default computational type of data. Both integers and decimal values will initially be read as numeric, though you can subsequently specify that you want to treat your data as integer data (see Chapter 2 for more details).

Logicals, as described below, hold the TRUE/FALSE results of comparisons between values.

Character data, or string data, means the data is treated as a sequence of symbols to which we don’t necessarily attach other meaning. So, for example, words would be stored in R as character or string data.

See Chapter 2 for more information on data types and how to modify them when needed.
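As a quick illustration of these data types, the built-in class() function reports how R is storing a value. The values used here are arbitrary examples.

```r
class(4.5)            # "numeric"
class(4)              # "numeric": whole numbers are numeric by default
class(4L)             # "integer": the L suffix asks for an integer
class(as.integer(4))  # "integer": converted after the fact
class(3 < 2)          # "logical"
class("chocolate")    # "character"
```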

R’s data structures include vectors, matrices, arrays, lists, and data frames. Vectors, matrices, and data frames will be introduced below, after a few other introductory bits and pieces.

3 R the calculator

In its most basic form, R can be used as a simple calculator.

# Addition
5 + 4 
## [1] 9
# Subtraction
5 - 4 
## [1] 1
# Multiplication
3 * 4
## [1] 12
# Division
(5 + 5) / 2 
## [1] 5
# Exponentiation
5^2
## [1] 25
#Note: the Console in R Studio shows the results of the work you're doing, i.e. it "prints" answers.

#If you create a new object, you can see it in the Console by typing its name after you create it.
#See Sundae, e.g., in Section 7
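The division example above uses parentheses for a reason: R follows the usual order of operations. A short sketch with arbitrary numbers, which also shows the integer-division and remainder operators:

```r
5 + 5 / 2      # 7.5: division happens before addition
(5 + 5) / 2    # 5: parentheses force the addition to happen first
-2^2           # -4: exponentiation happens before the minus sign is applied
(-2)^2         # 4
7 %/% 2        # 3: integer division
7 %% 2         # 1: remainder
```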

4 Creating Variables

#To create a variable, assign it a value using <-
  #E.g. to create a variable named "Chocolate" that is just a single number, do:
Chocolate <- 4

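#See how things are stored in the Console, and under Values in the Environment tab, as you create variables
#You type:
  Sugar = 5    # = also assigns a value, but
  Sugar <- 5   # <- is the conventional assignment operator in R
  Flour <- 2*Sugar
  Cookie <- Sugar*Flour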

#What does your Console show?
#What does the Values list in your Environment tab (where it is keeping track of objects) show?

#If you then type Flour and Cookie, R will print them, i.e. show their values in the Console
#Type:
  Flour
## [1] 10
  Cookie
## [1] 50
#What does your Console show?

WARNING: DON’T give new variables the names of preexisting functions. For example, “mean” is a preexisting function. To check whether a name already belongs to a function, type the name in parentheses and either hover the cursor over what you just typed to see whether a function description appears, or run the line: if the name belongs to a function, R prints the function’s definition.

#hover the cursor over, or run, the two lines below
(mean)
## function (x, ...) 
## UseMethod("mean")
## <bytecode: 0x00000000177ef5d0>
## <environment: namespace:base>
(median)
## function (x, na.rm = FALSE, ...) 
## UseMethod("median")
## <bytecode: 0x0000000013f31978>
## <environment: namespace:stats>
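Another way to check a name before using it, sketched here with base R's exists() and is.function(). The name Cookie.Dough is a made-up example of a name that is assumed not to be in use.

```r
exists("mean")           # TRUE: the name is already taken
is.function(mean)        # TRUE: and it belongs to a function
exists("Cookie.Dough")   # FALSE: this name is free to use
```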

5 Logicals

Logicals are TRUE/FALSE values, such as the results of comparing values or variables. See Chapter 2 for more information.

3 == 3
## [1] TRUE
#== means equals
3 != 4 
## [1] TRUE
#!= means not equals
3 < 2
## [1] FALSE
!(3 < 2)
## [1] TRUE
#What does your Console show?
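Comparisons work on variables as well as on plain numbers, and the TRUE/FALSE result can itself be stored. A brief sketch, re-creating Sugar and Flour with the values used earlier; More.Flour is a new, made-up name.

```r
Sugar <- 5
Flour <- 10

Sugar < Flour                  # TRUE
More.Flour <- Flour > Sugar    # store the result of a comparison
More.Flour                     # TRUE
class(More.Flour)              # "logical"

# Combine comparisons with & (and) and | (or)
(Sugar == 5) & (Flour == 5)    # FALSE: both sides must be TRUE
(Sugar == 5) | (Flour == 5)    # TRUE: at least one side is TRUE
```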

6 Vectors

A vector is a sequence of data elements of the same basic type. See http://www.r-tutor.com/r-introduction/vector for a longer introduction to vectors in R, and see Chapter 2 for a more in-depth discussion of vector types.

#Make a vector

Ice.Cream <- c(4,2,6,3,7,4,1,9,1,6) #Correct. This gives us a vector of 10 data elements that are all numerical values.

# Brussel.Sprouts <- 4,2,6,3,7,4,1,9,1,6      #gives an error
# Kale <- 4 2 6 3 7 4 1 9 1 6       #gives an error
#(the error-giving commands are commented out so script can run, but you can remove # at beginning of line to see what happens when you try to run them)


#So what does "c()" do? It combines values into a vector or list.
#To find out what something does in R, you can look it up in R like this:
?c

#Another way of creating and printing a vector in one step:
(Brownie <- c(0,2,0,4,0)) #the () around the assignment causes the result to be printed without the need to call (i.e. type) Brownie separately
## [1] 0 2 0 4 0
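Because every element of a vector must be of the same basic type, R quietly converts mixed inputs to a common type. A short sketch of that behavior; the vector names are made up for illustration.

```r
# Mixing a number, a word, and a logical: everything becomes character
Mixed <- c(4, "two", TRUE)
Mixed            # "4" "two" "TRUE"
class(Mixed)     # "character"

# Mixing numbers and logicals: TRUE becomes 1 and FALSE becomes 0
Counts <- c(4, TRUE, FALSE)
Counts           # 4 1 0
class(Counts)    # "numeric"
```

This is worth knowing because a single stray word in a column of numbers turns the whole vector into character data.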

7 Simple vector manipulation

#When multiplying a vector by a vector, the simplest case is two vectors of the same length, such as this:

Sundae <- Ice.Cream * Ice.Cream
Sundae
##  [1] 16  4 36  9 49 16  1 81  1 36
#What does the Console show after you input "Sundae"?

#One vector's length can also be a multiple of the other's
length(Ice.Cream)/ length(Brownie)
## [1] 2
Ice.Cream * Brownie
##  [1]  0  4  0 12  0  0  2  0  4  0
#Note: multiplies first value by first value, second value by second value, third value by third value. 
#When one vector runs out of values, starts again with first value
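What happens when the longer length is not a multiple of the shorter one? The sketch below assumes a made-up vector, Cake, of length 3. R still recycles the shorter vector, but it also prints a warning that the longer object length is not a multiple of the shorter object length.

```r
Ice.Cream <- c(4,2,6,3,7,4,1,9,1,6)   # length 10, as above
Cake <- c(1, 2, 3)                     # length 3: 10 is not a multiple of 3

Too.Short <- Ice.Cream * Cake          # runs, but R prints a warning
Too.Short                              # 4 4 18 3 14 12 1 18 3 6
length(Too.Short)                      # 10: the result has the longer length
```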

7.1 Matrix Math

Snack <- matrix(1:12, ncol = 3, nrow = 4)
Snack
##      [,1] [,2] [,3]
## [1,]    1    5    9
## [2,]    2    6   10
## [3,]    3    7   11
## [4,]    4    8   12
#see how this created a matrix with values from 1-12, with 3 columns and 4 rows

t(Snack) #transpose a matrix
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8
## [3,]    9   10   11   12
#see how now there are 4 columns and 3 rows

Snack*Snack #multiply the matrix by itself, element by element (this is not matrix multiplication)
##      [,1] [,2] [,3]
## [1,]    1   25   81
## [2,]    4   36  100
## [3,]    9   49  121
## [4,]   16   64  144
#see how each cell was multiplied by itself
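For comparison, matrix multiplication in the linear-algebra sense uses the %*% operator rather than *. A minimal sketch, re-creating the Snack matrix from above; Product is a made-up name.

```r
Snack <- matrix(1:12, ncol = 3, nrow = 4)

# A 4 x 3 matrix times its 3 x 4 transpose gives a 4 x 4 matrix
Product <- Snack %*% t(Snack)
dim(Product)        # 4 4
Product[1, 1]       # 107, i.e. 1*1 + 5*5 + 9*9

# Snack %*% Snack   # gives an error: a 4 x 3 matrix cannot be matrix-multiplied by a 4 x 3 matrix
```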

7.1.1 Vectors x matrices

#vector of length 1
Bite <- 2
Big.Snack <- Snack * Bite
#i.e. every value in the 'Snack' matrix is multiplied by 2

#vector of length 2
Bites <- c(2,4)
Bigger.Snack <- Snack * Bites
#i.e. going down each column in turn, the 'Snack' matrix values are multiplied by 2, then 4, then 2, then 4, etc.


#Look at the new matrices, either in a separate window or by printing them
Big.Snack
##      [,1] [,2] [,3]
## [1,]    2   10   18
## [2,]    4   12   20
## [3,]    6   14   22
## [4,]    8   16   24
Bigger.Snack
##      [,1] [,2] [,3]
## [1,]    2   10   18
## [2,]    8   24   40
## [3,]    6   14   22
## [4,]   16   32   48
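To see why the 2s and 4s land where they do, it can help to spell the recycling out by hand. The sketch below unrolls the matrix column by column, repeats the short vector to the same length, and rebuilds the matrix; By.Hand is a made-up name.

```r
Snack <- matrix(1:12, ncol = 3, nrow = 4)
Bites <- c(2, 4)

as.vector(Snack)               # 1 2 3 ... 12: R stores a matrix column by column
rep(Bites, length.out = 12)    # 2 4 2 4 ...: the short vector, recycled

By.Hand <- matrix(as.vector(Snack) * rep(Bites, length.out = 12), nrow = 4)
identical(By.Hand, Snack * Bites)   # TRUE
```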

8 Working with built-in functions

#Reminder of Ice.Cream (a vector we built earlier)
Ice.Cream
##  [1] 4 2 6 3 7 4 1 9 1 6
#check out these built-in functions!
mean(Ice.Cream)
## [1] 4.3
median(Ice.Cream)
## [1] 4
var(Ice.Cream)
## [1] 7.122222
sd(Ice.Cream)
## [1] 2.668749
summary(Ice.Cream)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    2.25    4.00    4.30    6.00    9.00
#a list of useful built-in functions
# http://www.sr.bham.ac.uk/~ajrs/R/r-function_list.html

#builtins() # List all built-in functions

#?NA        # Help page on handling of missing data values
#abs(x)     # The absolute value of "x"
#append()   # Add elements to a vector
#c(x)       # A generic function which combines its arguments 
#cbind()    # Combine vectors by column; rbind() combines by row (cf. "paste" in Unix)
#diff(x)    # Returns suitably lagged and iterated differences
#identical()  # Test if 2 objects are *exactly* equal
#jitter()     # Add a small amount of noise to a numeric vector
#length(x)    # Return no. of elements in vector x
#paste(x)     # Concatenate vectors after converting to character
#range(x)     # Returns the minimum and maximum of x
#rep(1,5)     # Repeat the number 1 five times
#rev(x)       # List the elements of "x" in reverse order
#seq(1,10,0.4)  # Generate a sequence (1 -> 10, spaced by 0.4)
#sign(x)        # Returns the signs of the elements of x
#sort(x)        # Sort the vector x
#order(x)       # list sorted element numbers of x
#unique(x)      # Remove duplicate entries from vector


#quick and dirty plots
#hist(x)    #histogram
#plot(x,y)  #scatter
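A few of the functions from the list above, applied to the Ice.Cream vector (re-created here so the example runs on its own):

```r
Ice.Cream <- c(4,2,6,3,7,4,1,9,1,6)

sort(Ice.Cream)      # 1 1 2 3 4 4 6 6 7 9
rev(Ice.Cream)       # 6 1 9 1 4 7 3 6 2 4
unique(Ice.Cream)    # 4 2 6 3 7 1 9
range(Ice.Cream)     # 1 9
length(Ice.Cream)    # 10
rep(1, 5)            # 1 1 1 1 1
seq(1, 3, 0.5)       # 1.0 1.5 2.0 2.5 3.0
```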

8.1 Passing arguments

“Passing an argument” to a function means that you are providing inputs that specify how that function is computed. Arguments can represent data, or parameters that specify the action of the function.

Take note! To use a built-in function, consult the “help” page for each one you want to use in order to figure out what the default arguments are. Then you can decide whether you want to adjust the defaults.

?mean
#See the "Arguments" section of the Help page that appears for what the arguments below do when set differently. Below is an example of how you could pass the "trim" and "na.rm" arguments to the mean function.

mean(Ice.Cream, trim = 0, na.rm = FALSE) #default settings
## [1] 4.3
# trim and na.rm are arguments that are passed to the mean function, meaning that they each have instructions on how to compute the mean of the Ice.Cream data.

?trim
## No documentation for 'trim' in specified packages and libraries:
## you could try '??trim'
?na.rm
## No documentation for 'na.rm' in specified packages and libraries:
## you could try '??na.rm'
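# trim and na.rm are arguments rather than functions, so they have no help pages of their own; they are described under ?mean
# you can change the values of these arguments; trim = 0.5 trims half of the observations from each end, which leaves the median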
mean(Ice.Cream, trim = 0.5)
## [1] 4
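The na.rm argument matters as soon as the data contain a missing value. A brief sketch with a made-up vector, Cupcakes, which also shows that arguments can be passed by name or by position:

```r
Cupcakes <- c(2, NA, 4, 6)       # made-up data with one missing value

mean(Cupcakes)                   # NA: by default, a missing value makes the mean missing
mean(Cupcakes, na.rm = TRUE)     # 4: missing values are removed before averaging

# Arguments can also be matched by position (x, then trim, then na.rm),
# but naming them is easier to read
mean(Cupcakes, 0, TRUE)          # 4
```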

You can also write your own function (a “custom” function), and pass multiple data vectors to it. You can name your function anything you like. It’s helpful to create a “return statement.” This specifies what you want returned to you when you run the function.

In the case below, a function called “Yummy.test” is created; the code inside the {} defines what it does. It calculates the mean of x, multiplies it by the mean of y, and divides the result by the sum of y.

Yummy.test <- function(x,y) {
  result<- mean(x) * mean(y) /  sum(y)
  return(result)
}

Yummy.test(Ice.Cream,Brownie)
## [1] 0.86
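Custom functions can have default argument values too, just like the built-in mean function. The sketch below is a made-up example (the names Scaled.Mean and multiplier are not from any package) in which multiplier defaults to 1 unless you pass something else.

```r
Scaled.Mean <- function(x, multiplier = 1) {
  result <- mean(x) * multiplier   # mean of x, scaled by the multiplier
  return(result)
}

Scaled.Mean(c(2, 4, 6))                    # 4: the default multiplier of 1 is used
Scaled.Mean(c(2, 4, 6), multiplier = 10)   # 40: the default is overridden
```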

9 Data frames

Data frames are like tables in a relational database, and are a primary way data are stored in R. A data frame is a table, or two-dimensional array-like structure, in which each column contains measurements on one variable, and each row contains one case. Technically, in R a data frame is a list of column vectors.

Let’s build a data frame!

#build some vectors of equal length

Ice.Cream <- c(4,2,6,3,7,4,1,9,1,6)
Gummy.Bears <- c(7,3,5,3,9,2,0,0,4,8)
Potato.Chips <- c(5,0,8,1,2,3,9,4,9,2)

snacky.dataframe <- data.frame(Ice.Cream, Gummy.Bears, Potato.Chips)
# snacky.dataframe is a data frame with one column per vector

?data.frame
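Once a data frame exists, a few base R functions are enough to look inside it. This sketch rebuilds the same data frame so that it runs on its own:

```r
Ice.Cream <- c(4,2,6,3,7,4,1,9,1,6)
Gummy.Bears <- c(7,3,5,3,9,2,0,0,4,8)
Potato.Chips <- c(5,0,8,1,2,3,9,4,9,2)
snacky.dataframe <- data.frame(Ice.Cream, Gummy.Bears, Potato.Chips)

nrow(snacky.dataframe)          # 10 rows (cases)
ncol(snacky.dataframe)          # 3 columns (variables)
head(snacky.dataframe, 3)       # print the first 3 rows
snacky.dataframe$Gummy.Bears    # pull out one column as a vector, using $
colMeans(snacky.dataframe)      # mean of each column: 4.3, 4.1, 4.3
```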

9.1 Attributes

Any object that you create in R can be given attributes. An attribute is a characteristic of your data, and you can look up that information and change it. For example, the names and the dimensions of matrices and arrays are stored in R as attributes of the object. These attributes can be seen as labeled values you can attach to any object.

To see all the attributes of an object, you can use the attributes() function.

?attributes

attributes(Bigger.Snack)
## $dim
## [1] 4 3
?dim


attributes(snacky.dataframe)
## $names
## [1] "Ice.Cream"    "Gummy.Bears"  "Potato.Chips"
## 
## $row.names
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## $class
## [1] "data.frame"
?names
?row.names
?class


str(Bigger.Snack)
##  num [1:4, 1:3] 2 8 6 16 10 24 14 32 18 40 ...
str(snacky.dataframe)
## 'data.frame':    10 obs. of  3 variables:
##  $ Ice.Cream   : num  4 2 6 3 7 4 1 9 1 6
##  $ Gummy.Bears : num  7 3 5 3 9 2 0 0 4 8
##  $ Potato.Chips: num  5 0 8 1 2 3 9 4 9 2
?str

#Note that the structure of the matrix, and then of the data frame, shows that all values are stored as numeric ("num"). See Chapter 2 for more details on how to modify this.
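Attributes can be changed as well as inspected. A short sketch, in which the column labels and the Scores vector are made up for illustration:

```r
Snack <- matrix(1:12, ncol = 3, nrow = 4)
attributes(Snack)                   # only $dim so far

# Adding column names creates a new attribute, $dimnames
colnames(Snack) <- c("Morning", "Afternoon", "Evening")
attributes(Snack)                   # now $dim and $dimnames

# A plain vector can carry a names attribute
Scores <- c(4, 2, 6)
names(Scores) <- c("Ice.Cream", "Gummy.Bears", "Potato.Chips")
attributes(Scores)                  # $names
Scores["Gummy.Bears"]               # 2: elements can now be looked up by name
```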
