.pull-right[
```{r, echo=FALSE}
cowsay::say("Isn't R cool?!",
by = "cow")
```
]

---
## Installing Packages: A Warning
R packages are typically high quality and trustworthy. However, even the best packages contain bugs!
.footnote[[1] It is good practice to cite package names and versions in your manuscripts. See `?citation` and `?packageVersion` for help]
--
.pull-left[
Also, because *anybody* can write an R package, you might find yourself using a package that is not well built or, even worse, contains malicious code
]
.pull-right[
.pull-left[
```{r}
1 + 2 * 3 / 4^5
```
]
.pull-right[
```{r}
`+`(1, `/`(`*`(2, 3), `^`(4, 5)))
```
]
]

---
## Another Nested Function Example
Suppose you want to summarize the `mtcars` dataframe by the number of cylinders in each car (4, 6, or 8), but only for cars with more than 100 horsepower, and you also want to add a column that converts miles per gallon (mpg) to kilometers per liter (kpl):
```{r}
transform(aggregate(formula = . ~ cyl,
data = subset(mtcars, hp > 100),
FUN = function(x) round(mean(x), 2)),
"kpl" = mpg*0.425144)
```
Although this code works, it takes a lot of effort to read it and understand what's going on
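One way to make the nested call easier to follow is to break it into steps and store each intermediate result. Below is a sketch of the same computation; the object names `powerful`, `byCyl`, and `result` are illustrative choices, not part of the original example.

```{r}
# Step 1: keep only the cars with more than 100 horsepower
powerful <- subset(mtcars, hp > 100)
# Step 2: mean of every column within each cylinder group, rounded to 2 decimals
byCyl <- aggregate(. ~ cyl, data = powerful, FUN = function(x) round(mean(x), 2))
# Step 3: add kilometers per liter (1 mpg is about 0.425144 km/L)
result <- transform(byCyl, kpl = mpg * 0.425144)
result
```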
---
.pull-left[
```{r}
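# %<>% (magrittr's assignment pipe) pipes swiss$Catholic into sqrt()
# and assigns the result back: swiss$Catholic <- sqrt(swiss$Catholic)
swiss$Catholic %<>% sqrt()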
```
]
---
# Base R Pipe: `|>`
Recently (May 18, 2021) [R version 4.1.0 was released](https://stat.ethz.ch/pipermail/r-devel/2021-May/080724.html) with a base R pipe operator: `|>`
`|>` works in much the same way as magrittr's `%>%`, except that `.` cannot be used to reference the `lhs` object:
```{r, error=TRUE}
10 %>% sample(1:5, ., TRUE)
10 |> sample(1:5, ., TRUE)
```
Instead, you have to create an **anonymous function**
```{r}
10 |> {function(x) sample(1:5, x, TRUE)}()
```
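Wrapping the anonymous function in parentheses instead of braces also works and is a common way to write it. In this sketch, the piped value (`10`) is passed to the function as its `x` argument:

```{r}
# Parentheses instead of braces around the anonymous function;
# the piped value (10) becomes the function's x argument
draws <- 10 |> (function(x) sample(1:5, x, replace = TRUE))()
draws
```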
---
# The Anonymous Function
.smallish[
An **anonymous function** (also known as a **lambda** function) differs from other functions in that it __does not have a name__. Anonymous functions are usually passed as arguments to a higher-order (parent) function.
Until May 18, 2021, anonymous functions were created the same way as all other functions: with `function()`. However, many people found this too verbose, so R 4.1.0, following the lead of other programming languages, added a shorthand for `function()`: `\()`
*Technically* `\()` can be used to write all your functions. For example:
```{r, eval=FALSE}
# These are equivalent
addOne <- function(x) return(x + 1)
addOne <- \(x) return(x + 1)
```
In practice, however, `\()` is usually reserved for anonymous functions, and you should follow that convention (for code readability)
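As a quick check, the long and short forms create the same anonymous function. In this sketch, each one is passed to `sapply()`:

```{r}
# The same anonymous function written both ways and passed to sapply()
longForm  <- sapply(1:3, function(x) x * 10)
shortForm <- sapply(1:3, \(x) x * 10)
identical(longForm, shortForm)
```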
.footnote[*Note.* If your code needs to run on R versions older than 4.1.0 (e.g., because you are collaborating with a large group), avoid using `|>` and `\()` for a little while]
]
---
# Debugging Functions
.smallish[
```{r, error=TRUE}
errorFunction <- function(){
a <- "a"
b <- "b"
stop("The error occurs here.")
c <- "c"
}
errorFunction()
```
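Often it won't be clear what is causing a bug (unlike in the example above). Sometimes it's a typo in your code; sometimes it's an invalid argument.
To debug a function, use `debug()`. The next time the function is called, R opens the browser and lets you step through its body one line at a time (e.g., press `n` to run the next line and `Q` to quit).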
```{r, eval=FALSE}
debug(errorFunction)
```
When you are finished debugging, use `undebug()` so you won't go into debug mode every time the function is called
```{r, eval=FALSE}
undebug(errorFunction)
```
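If you lose track of which functions are flagged, base R's `isdebugged()` reports whether `debug()` is currently set on a function. A minimal sketch using a throwaway function, `addTwo()` (a hypothetical name, used only for illustration):

```{r}
addTwo <- function(x) x + 2
debug(addTwo)
isdebugged(addTwo)    # flagged: calling addTwo() now would open the browser
undebug(addTwo)
isdebugged(addTwo)    # flag removed: addTwo() runs normally again
```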
]
---
# Recursive functions
.smallish[
A **recursive function** is a function that calls itself. Recursive functions are useful in situations where problems can be broken down into smaller, repetitive problems, or when you need to iterate over arbitrarily nested objects.
```{r}
myFactorial <- function(number){
if(number == 0){
return(1)
} else{
return(number * myFactorial(number-1))
}
}
myFactorial(5)
```
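This is evaluated as:
1. `myFactorial(5)` = `5 * myFactorial(4)`
2. `myFactorial(4)` = `4 * myFactorial(3)`
3. `myFactorial(3)` = `3 * myFactorial(2)`
4. `myFactorial(2)` = `2 * myFactorial(1)`
5. `myFactorial(1)` = `1 * myFactorial(0)`
6. `myFactorial(0)` = `1` (the base case), so the result is `5 * 4 * 3 * 2 * 1 * 1` = `120`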
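Recursion also covers the nested-objects case mentioned above. As a sketch, the hypothetical `countLeaves()` below counts the non-list elements of a list, however deeply they are nested; its base case is any object that is not a list:

```{r}
countLeaves <- function(x){
  if(!is.list(x)){
    return(1)  # base case: a non-list element counts as one leaf
  }
  total <- 0
  for(element in x){
    total <- total + countLeaves(element)  # recurse into each element
  }
  return(total)
}
nested <- list(1, list(2, 3), list(list(4), 5))
countLeaves(nested)
```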
]
---
# Making an Operator
Now that we know how to make our own functions, we can make our own operators. These operators are known as **infix operators** because they are placed *between* their arguments. `+`, `-`, `*`, `/`, `%*%`, `%in%`, etc. are all infix operators. User-defined infix operators must begin and end with `%`, like `%*%` and `%in%`.
******
### An example:
.smallish[
Many programming languages have shorthand operators for incrementing and decrementing variables:
- `+=` (adds the rhs to the lhs: `lhs <- lhs + rhs`)
- `-=` (subtracts the rhs from the lhs: `lhs <- lhs - rhs`)
- `++` (adds one to a variable: `lhs <- lhs + 1`)
- `--` (subtracts one from a variable: `lhs <- lhs - 1`)
These are very useful in loops:
```{r, eval=F}
count <- 0
while(count < 10){
count++ # not valid R; other languages use this instead of count <- count + 1
print(count)
}
```
]
---
.smallish[
Unfortunately, R doesn't come with these operators, but we can easily make our own!
```{r}
`%+=%` <- function(lhs, rhs){
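# substitute() captures the expressions passed as lhs and rhs
# (e.g., `value` and `5`) and builds the call `value <- value + 5`;
# eval.parent() then runs that call in the caller's environment,
# so the caller's variable is updated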
eval.parent(substitute(lhs <- lhs + rhs))
}
`%-=%` <- function(lhs, rhs){
eval.parent(substitute(lhs <- lhs - rhs))
}
```
.pull-left[
```{r}
value <- 0
value %+=% 5
print(value)
value %+=% 5
print(value)
```
]
.pull-right[
```{r}
value <- 20
value %-=% 5
print(value)
value %-=% 5
print(value)
```
]
]
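The same `eval.parent(substitute(...))` technique can stand in for `++`. R has no postfix operators, so this sketch uses an ordinary function; `inc()` is a hypothetical name, not something that ships with R:

```{r}
inc <- function(x){
  # builds the call `x <- x + 1` using the caller's variable name,
  # then evaluates it in the caller's environment
  eval.parent(substitute(x <- x + 1))
}
count <- 0
while(count < 3){
  inc(count)  # stands in for count++
  print(count)
}
```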
---
# What about not `%in%`?
.smallish[
Recall that the `%in%` operator returns a logical vector indicating, for each element of the lhs vector, whether it is in the rhs vector:
```{r}
1:5 %in% 1:3
```
We can negate the result with `!` to get the opposite, but this is a bit hard to read:
```{r}
!1:5 %in% 1:3
```
We can *invert* or **negate**^{1} `%in%` to get a "not in" operator:
```{r}
`%!in%` <- Negate(`%in%`)
1:5 %!in% 1:3
```
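A common use of `%!in%` is dropping unwanted values from a vector. A small sketch (the vector `x` is made up for illustration):

```{r}
`%!in%` <- Negate(`%in%`)
x <- c("a", "b", "c", "d")
x[x %!in% c("b", "d")]  # keep only the elements that are not "b" or "d"
```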
.footnote[[1] `Negate()` produces logical negations of *functions*, inverting their output. For example: `is.not.numeric <- Negate(is.numeric)`]
]
---
class: inverse
# Classes and Methods
---
# Classes
Objects in R are **instances** of one or more **classes**. A class defines the behavior of an object.
.smallish[
To get the class of an object, use the `class()` function:
```{r}
class(1:10)
class(letters)
class(mean)
class(mtcars)
```
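To see an object with more than one class, look at a model fitted with `glm()`: its class vector lists "glm" first and "lm" second, and `inherits()` returns `TRUE` if any of an object's classes match:

```{r}
fit <- glm(mpg ~ wt, data = mtcars)  # gaussian family by default
class(fit)
inherits(fit, "lm")
```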
]
---
# Methods
A **method** is a function associated with a specific class. There are many **generic functions** in R that change their behavior depending on the class of the object they are passed.
Methods are denoted by `.classname` after the generic function name. For example, let's take a look at the `summary` generic function, which has `r length(methods(summary))` methods:
```{r}
head(methods(summary))
```
This means that when `summary()` is passed an object of class `aov` (`summary.aov`), it works differently than when it is passed a `data.frame` (`summary.data.frame`)
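You can see the dispatch at work by checking what `summary()` returns for each kind of object. This sketch fits a one-way ANOVA of `mpg` by number of cylinders, purely for illustration:

```{r}
fit <- aov(mpg ~ factor(cyl), data = mtcars)
class(summary(fit))             # result built by summary.aov()
class(summary(mtcars[, 1:2]))   # result built by summary.data.frame()
```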
---
# Making Our Own Method
Making your own method is just like making your own function, except you need to name the function accordingly: `generic.class()`
To assign an object a class, use `class()` with the assignment arrow: `class(x) <- "classname"`
```{r}
string <- "Please print me!"
class(string)
print(string)
class(string) <- "refuseprint"
class(string)
```
---
```{r, eval=FALSE}
string <- "Please print me!"
```
```{r}
# print() method for objects of class `refuseprint`
print.refuseprint <- function(x, ...){
  print("I refuse to print!!!")
  invisible(x) # print methods conventionally return x invisibly
}
# Notice that I don't need to call `print.refuseprint()`
# R knows what to do!
print(string)
```
******
**Side Note:** This is why it is generally frowned upon to name objects using dot notation (e.g., `day.one`, `participant.ID`). In S3, the `.` separates a generic's name from a class name (as in `print.refuseprint`), so it's best to reserve it for that purpose!
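You can also write a generic of your own with `UseMethod()`, which looks for a method matching the object's class and falls back to a `.default` method otherwise. A sketch with a hypothetical generic, `describe()`:

```{r}
describe <- function(x, ...) UseMethod("describe")  # the generic
describe.default <- function(x, ...){
  paste("An object of class", class(x)[1])
}
describe.data.frame <- function(x, ...){
  paste("A data frame with", nrow(x), "rows")
}
describe(1:10)
describe(mtcars)
```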
---
class: inverse
# Revisiting Loops
### `apply()` et al.
- `apply()`
- `lapply()`
- `sapply()`
- `mapply()`
- `tapply()`
- `replicate()`
- `sweep()`
---
# Disclaimer
The `apply` family of functions (`*apply`) offers a different way to loop in R
--
Some people argue that these functions are faster (to write and also to execute) than `for` loops.
--
1. `*apply` is [not faster to execute](https://faculty.washington.edu/tlumley/b514/R-fundamentals.pdf) than a `for` loop, generally speaking
2. `*apply` may be faster to write (but may also not be)
******
### Advantages of `*apply`
--
- You do not need to pre-allocate an output object
--
- In some cases they *may* be faster than for loops (and in other cases they may not be)
--
- In some cases they're easier to read (and in others they are not)
--
**Bottom line**: Use the tool that (a) makes most sense for your problem, (b) works for you and your collaborators, and (c) you feel confident with
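To make the pre-allocation point concrete, here is a small sketch that computes the same squares both ways:

```{r}
# for loop: create (pre-allocate) the output vector, then fill it in
squares <- numeric(5)
for(i in 1:5){
  squares[i] <- i^2
}
# sapply(): the output vector is created for you
squares2 <- sapply(1:5, function(i) i^2)
identical(squares, squares2)
```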
---
## `lapply()`
.smallish[
```{r, eval=FALSE}
lapply(X, FUN, ...)
```
`lapply()` iterates over `X` (a vector, list, or columns of a data frame), applies the `FUN` function to each element, and __returns a list__. The `...` argument allows you to pass additional arguments into `FUN`
.pull-left[
```{r}
# Vector 👇 anonymous function #<<
lapply(1:3, function(x) x^2)
```
You can also use a named function 👇
```{r, eval=FALSE}
square <- function(x) x^2
lapply(1:3, square)
```
]
.pull-right[
```{r}
# List #<<
myList <- list("One" = 1:10,
"Two" = lm(qsec ~ hp, mtcars),
"Three" = mtcars)
lapply(myList, function(x) class(x))
```
]
]
---
class: code-overflow
## `...` in `lapply`
.smallish[
The `...` in `lapply` allows you to supply additional arguments to `FUN`.
For example, let's take the mean of each column of `mtcars`:^{1}
```{r, eval=T, echo=F}
mtcars[1, ] <- NA
```
.pull-left[
```{r}
lapply(mtcars, mean)
```
]
.pull-right[
```{r}
lapply(mtcars, mean, na.rm = TRUE)
```
]
```{r, eval=F}
# Equivalent to:
lapply(mtcars, function(x) mean(x, na.rm = TRUE))
```
```{r, eval=T, echo=F}
data(mtcars)
```
.footnote[[1] I introduced some `NA`s into `mtcars` for this example]
]
---
## `sapply()`: Simple `lapply()`
.smallish[
A downside of `lapply()` is that lists can be hard to work with and are also less common than other data types (vectors, dataframes, matrices). `sapply()` **s**implifies the output by returning a vector or a matrix when possible
`sapply()` is a **wrapper** for `lapply()`, which means that it calls `lapply()` itself, then does some extra work for you
```{r, eval=FALSE}
sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
```
- `X`, `FUN`, and `...` are the same as in `lapply()`
- `simplify`: if `TRUE`, returns a vector or matrix (whichever is most appropriate) when possible; if `FALSE`, returns a list
- `USE.NAMES`: if `TRUE` and `X` is character, use `X` as names for result
]
******
.small[
.pull-left[
```{r}
lapply(1:3, function(x) x^2)
```
]
.pull-right[
```{r}
sapply(1:3, function(x) x^2)
```
]
]
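Two quick sketches of how `sapply()` shapes its output: with character input it uses `X` as names (set `USE.NAMES = FALSE` to drop them), and when `FUN` returns several values per element, the result is simplified to a matrix with one column per element:

```{r}
sapply(c("a", "bb", "ccc"), nchar)
sapply(c("a", "bb", "ccc"), nchar, USE.NAMES = FALSE)
sapply(1:3, function(x) c(x, x^2))  # a 2 x 3 matrix
```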
---
## `apply()`
`apply()` iterates over the margins of an array^{1} (or matrix or dataframe)
```{r, eval=FALSE}
apply(X, MARGIN, FUN, ..., simplify = TRUE)
```
- `X`: an array (or matrix or dataframe; a dataframe is first converted to a matrix with `as.matrix()`)
- `MARGIN`: a vector specifying the subscripts that the function will be applied over
- `1` = rows
- `2` = columns
- `c(1, 2)` = rows and columns
- `...`: additional arguments to `FUN`
- `simplify`: if `TRUE`, results are simplified to a vector or matrix (whichever is appropriate) when possible; if `FALSE`, a list is returned
.footnote[[1] an `array` is an object that can store data in more than 2 dimensions. We aren't talking about them in this class, but see `?array` for more info]
---
## `apply()`: Examples
```{r}
# Row means: the mean of each row, across all columns
apply(mtcars, 1, mean)
```
---
## `apply()`: Examples
```{r}
# Row means, using only the specified columns
apply(mtcars[, c("cyl", "drat", "wt")], 1, mean)
```
---
## `apply()`: Examples
```{r}
# Column sums: the sum of each column
apply(mtcars, 2, sum)
```
```{r}
# Add 3 to every value (MARGIN = 1:2 applies FUN to each cell)
mtcars_p3 <- apply(mtcars, 1:2, function(x) x + 3)
head(mtcars_p3)
```
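For the common row and column summaries, base R also has vectorized shortcuts (`rowMeans()`, `colSums()`, and friends) that match `apply(mtcars, 1, mean)` and `apply(mtcars, 2, sum)` and are usually faster:

```{r}
all.equal(apply(mtcars, 1, mean), rowMeans(mtcars))
all.equal(apply(mtcars, 2, sum), colSums(mtcars))
```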
---
## `mapply()`
.smallish[
`mapply()` is a multivariate version of `sapply()`
```{r, eval=FALSE}
mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)
```
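- `...`: the vectors (or lists) to iterate over in parallel; `FUN` is applied to the first elements of each, then to the second elements, and so on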
- `MoreArgs` is a `list` of other arguments to pass to `FUN`
```{r}
mapply(rep, 1:3, 7:5)
```
You can supply as many vectors to `...` as you want!
```{r}
mapply(sum, 1:3, 4:6, 7:9)
```
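`MoreArgs` holds arguments that stay the same on every call. In this sketch, `x` and `y` are iterated over in parallel while `z` is fixed at 100:

```{r}
mapply(function(x, y, z) x + y + z, 1:3, 4:6, MoreArgs = list(z = 100))
```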
]
---
## `replicate()`
`replicate()` is a wrapper for a special case of `sapply()` in which a single expression is **replicated**, that is, evaluated repeatedly
```{r, eval=FALSE}
replicate(n, expr, simplify = "array")
```
- `n`: an integer, the number of replications
- `expr`: the expression (i.e., R code) to evaluate *n* times
- `simplify`: how to simplify the result, as in `sapply()`; the default, `"array"`, returns a vector, matrix, or array when possible
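The key difference from `rep()` is that `replicate()` re-evaluates `expr` each time, whereas `rep()` evaluates its argument once and copies the value:

```{r}
set.seed(123)
repeated <- rep(runif(1), 3)          # runif() runs once; its value is copied 3 times
replicated <- replicate(3, runif(1))  # runif() runs 3 times
repeated
replicated
```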
---
## `replicate()`: Example
.smallish[
`replicate()` is well suited to simulations because, essentially, all you are doing is repeatedly sampling from the *same* distribution
Let's simulate 10,000 samples of *n* = 300 from a uniform distribution (every value from 0 to 10 is equally likely) and plot the means. **Why are these values (approximately) normally distributed?**
```{r, eval=F}
replicate(10000, mean(runif(300, 0, 10))) %>% #<<
hist(main = "", xlab = "", breaks = 25)
```
```{r, fig.height=2.5, fig.width=7, fig.align='center', dev='svg', echo=F}
par(mar = c(2, 4, 1, 2) + 0.1)
replicate(10000, mean(runif(300, 0, 10))) %>% #<<
hist(main = "", xlab = "", breaks = 25)
```
]
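As a rough check of the simulation against theory: a uniform distribution on 0 to 10 has mean 5 and standard deviation `10 / sqrt(12)`, so means of 300 draws should have a standard deviation near `10 / sqrt(12 * 300)`, which is about 0.167. A sketch (the seed is arbitrary):

```{r}
set.seed(1)
sampleMeans <- replicate(10000, mean(runif(300, 0, 10)))
mean(sampleMeans)  # theory: 5
sd(sampleMeans)    # theory: 10 / sqrt(12 * 300), about 0.167
```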
---
## `sweep()`
`sweep()` **sweeps out** a summary statistic from an input array (typically a matrix or dataframe)
```{r, eval=FALSE}
sweep(x, MARGIN, STATS, FUN = "-", check.margin = TRUE, ...)
```
- `x`: an array (or matrix or dataframe)
--
- `MARGIN`: the margin(s) of `x` that correspond to `STATS`: typically columns (`2`), but it can be rows (`1`) or both (`c(1, 2)`)
--
- `STATS`: the summary statistic to be swept out (typically a vector)
--
- `FUN`: the function used to carry out the sweep (the default is to subtract)
--
- `check.margin`: if `TRUE`, warn if the length or dimensions of `STATS` don't match the dimensions of `x` selected by `MARGIN`
---
## `sweep()`: Example
.small[
It is often desired to *center* a variable prior to analysis. `sweep()` can be used to quickly center a bunch of columns in one call:
```{r}
mtcars_c <- sweep(x = mtcars,
MARGIN = 2,
STATS = colMeans(mtcars))
head(mtcars)
head(mtcars_c) |> round(1)
```
]
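`FUN` can be any binary function, not just subtraction. As a sketch, dividing the centered columns by their standard deviations standardizes them (similar to what `scale()` does):

```{r}
centered <- sweep(mtcars, MARGIN = 2, STATS = colMeans(mtcars))
scaled <- sweep(centered, MARGIN = 2, STATS = apply(mtcars, 2, sd), FUN = "/")
round(colMeans(scaled), 10)      # every column mean is now 0
round(apply(scaled, 2, sd), 10)  # every column SD is now 1
```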
---
## `tapply()`
.smallish[
`tapply()` is used to apply a function over discrete subsets (groups) of an array, usually a vector
```{r, eval=FALSE}
tapply(X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE)
```
- `X`: an object that allows for subsetting (almost always a vector!)
- `INDEX`: a factor, or a list of one or more factors (each the same length as `X`), that defines the groups to subset by; other vectors are converted to factors
- `FUN`: the function to apply to each subset
- `...`: additional arguments to pass to `FUN`
******
`tapply()` is *very* useful for looking at descriptive statistics by group
.pull-left[
Mean miles per gallon by automatic (0) or manual (1) transmission
```{r}
tapply(mtcars$mpg, mtcars$am, mean)
```
]
.pull-right[
Mean miles per gallon by automatic (0) or manual (1) transmission *and* number of cylinders (4, 6, or 8)
```{r}
tapply(mtcars$mpg, list(mtcars$am, mtcars$cyl), mean)
```
]
]
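Conceptually, `tapply()` splits `X` into the groups defined by `INDEX` and applies `FUN` to each group. For a single grouping variable, a rough base R equivalent (shown only to illustrate the idea) uses `split()` and `sapply()`:

```{r}
viaTapply <- tapply(mtcars$mpg, mtcars$am, mean)
viaSplit <- sapply(split(mtcars$mpg, mtcars$am), mean)
viaTapply
viaSplit
```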