Assignment 4

Please submit both the .Rmd and a .html file on Canvas.

You do not need any extra packages to do this homework (i.e., everything can be done using base R functions!)

Problem 1

Write a function called minAndMax that takes a vector of any length (x) and returns a named vector of length 2 (names = “min” and “max”) with the minimum and maximum value of that vector. Before taking the minimum and maximum values of the vector, use what you know about control structures and checking the atomic type of a variable to ensure that the vector is numeric. If the vector is not numeric, you will want to throw an error (i.e., tell the user you can’t find the maximum value of a non-numeric vector). Code to throw an error is below. Bonus points if you get the error message to inform the user which type of vector they passed to your function.

stop("`x` must be a numeric vector")

Call your function to make sure it works! First, let’s pass it a numeric vector:

minAndMax(-500:300)

##  min  max 
## -500  300

Then pass it a non-numeric vector to make sure it throws an error:

minAndMax(c("UW", "WSU", "WWU", "CWU", "EWU", "SU", "SPU"))

## Error in minAndMax(c("UW", "WSU", "WWU", "CWU", "EWU", "SU", "SPU")): `x` must be a numeric vector but you passed it a character vector.

minAndMax(c(T, T, F, F, T, T, T, T, F))

## Error in minAndMax(c(T, T, F, F, T, T, T, T, F)): `x` must be a numeric vector but you passed it a logical vector.
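The check described above could be written as follows. This is a minimal sketch, not the official solution. It assumes that `class(x)[1]` is an acceptable way to report the vector's type. Unlike `typeof()`, it reports a factor as "factor" rather than "integer".

```r
minAndMax <- function(x) {
  # Refuse anything that is not numeric (integer or double), and name the
  # class the user actually passed in the error message
  if (!is.numeric(x)) {
    stop("`x` must be a numeric vector but you passed it a ",
         class(x)[1], " vector.")
  }
  # Named vector of length 2
  c(min = min(x), max = max(x))
}

minAndMax(-500:300)
```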

Problem 2

Let’s create a proper function out of the number guessing game you wrote in Assignment 2 Problem 5. Create a function called guessingGame that takes the following arguments:

min: a numeric value indicating the minimum number that can be chosen. Defaults to 1.
max: a numeric value indicating the maximum number that can be chosen. Defaults to 10.
play_again: a logical value. When TRUE, the function asks the user whether they want to play again after each correct guess. When FALSE, it plays only one round each time the function is called. Defaults to TRUE.

Modify your code from Assignment 2 Problem 5 (or use mine, if you prefer) so that it picks a number between min and max. It should offer another round only when play_again is TRUE, and start a new round only if the user says they want to play again.
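The Assignment 2 code is not reproduced here. The following is a hypothetical sketch of how the three arguments could drive the game. The prompts, the "y"/"yes" convention and the `interactive()` guard are assumptions. The guard is there because `readline()` returns `""` without waiting when R is not interactive (for example, while knitting), and that would otherwise loop forever.

```r
guessingGame <- function(min = 1, max = 10, play_again = TRUE) {
  # Assumed guard: readline() cannot collect input in non-interactive
  # sessions, so stop early instead of looping forever
  if (!interactive()) stop("guessingGame() must be run in an interactive R session")

  repeat {
    # Index into min:max so that min == max does not trigger
    # sample()'s special case for a single number
    choices <- min:max
    target <- choices[sample.int(length(choices), size = 1)]
    cat("I'm thinking of a number between", min, "and", max, "\n")

    # Keep asking until the user guesses correctly
    repeat {
      guess <- suppressWarnings(as.numeric(readline("Your guess: ")))
      if (is.na(guess)) {
        cat("Please enter a number.\n")
      } else if (guess < target) {
        cat("Too low!\n")
      } else if (guess > target) {
        cat("Too high!\n")
      } else {
        cat("Correct! The number was", target, "\n")
        break
      }
    }

    # With play_again = FALSE, play exactly one round
    if (!play_again) break
    answer <- readline("Play again? (y/n): ")
    if (!tolower(answer) %in% c("y", "yes")) break
  }
  invisible(NULL)
}
```

Run it in the console, for example `guessingGame(min = 1, max = 100, play_again = FALSE)`. In an .Rmd, use a chunk with `eval = FALSE`.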

Problem 3

Trimming is a statistical procedure in which the most extreme cases on both ends of a univariate distribution are removed. Winsorizing is similar, but instead of being removed, the extreme observations are replaced with the values of the chosen quantiles. When trimming or winsorizing, you have to decide which quantiles to trim or winsorize by. For example, consider the following vector:

observations <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

A 10% trim (keeping everything between the 5th and 95th percentiles) would remove the first and last observations:

## [1] 2 3 4 5 6 7 8 9

And a 10% winsorize would result in the following:

# Observe quantiles
quantile(observations, probs = c(0.05, 0.95))

##   5%  95% 
## 1.45 9.55

##  [1] 1.45 2.00 3.00 4.00 5.00 6.00 7.00 8.00 9.00 9.55
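Both results above can be reproduced with base R. The sketch below first computes the two cutoffs with `quantile()`. Trimming then keeps only values inside the cutoffs. Winsorizing clamps values to the cutoffs with `pmax()` (raise anything below the lower cutoff) and `pmin()` (lower anything above the upper cutoff).

```r
observations <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# Lower and upper cutoffs (1.45 and 9.55 here); unname() drops the "5%"/"95%" labels
cutoffs <- unname(quantile(observations, probs = c(0.05, 0.95)))

# Trim: keep only observations inside the cutoffs
trimmed <- observations[observations >= cutoffs[1] & observations <= cutoffs[2]]

# Winsorize: replace values outside the cutoffs with the cutoffs themselves
winsorized <- pmin(pmax(observations, cutoffs[1]), cutoffs[2])

trimmed
winsorized
```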

Create a function called trim that takes a numeric vector of any length as input and returns that same vector trimmed or winsorized. For simplicity, assume the user is giving you a large enough vector (i.e., they won’t ask you to trim a vector of length 2). Your function should have the following arguments:

x: a numeric vector of any length
probs: numeric vector of probabilities with values in [0, 1] to be passed to quantile. Defaults to c(0.05, 0.95).
winsorize: a logical value indicating whether to return a trimmed (FALSE) or winsorized (TRUE) vector. Defaults to FALSE.

Bonus points if you make sure that all of the arguments are valid before performing the trim.

Test to make sure trim() does what you want! Here are a few test cases. Can you think of others?

trim(1:13)

##  [1]  2  3  4  5  6  7  8  9 10 11 12

trim(1:13, probs = c(.1, .9))

## [1]  3  4  5  6  7  8  9 10 11

trim(1:13, probs = c(.1, .9), winsorize = TRUE)

##  [1]  2.2  2.2  3.0  4.0  5.0  6.0  7.0  8.0  9.0 10.0 11.0 11.8 11.8

trim(c(5, 13, 11, 8, 9, 2, 12, 10, 1, 3, 6, 4, 7))

##  [1]  5 11  8  9  2 12 10  3  6  4  7

Make sure to test that trim() throws an error when you would expect it to!

trim()

## Error in trim(): argument "x" is missing, with no default

trim(1:13, probs = c(.05, .10, .15))

## Error in trim(1:13, probs = c(0.05, 0.1, 0.15)): `probs` must be numeric vector of length 2 between 0 and 1. 'c(0.05, 0.1, 0.15)' is not a valid value.

trim(1:13, probs = c(5, 95))

## Error in trim(1:13, probs = c(5, 95)): `probs` must be numeric vector of length 2 between 0 and 1. 'c(5, 95)' is not a valid value.

trim(1:13, winsorize = "Yes please")

## Error in trim(1:13, winsorize = "Yes please"): `winsorize` must be TRUE or FALSE. 'Yes please' is not a valid value.
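The following is one possible `trim()` that passes the test cases above. It is a sketch, not the reference solution. The validation rules are assumptions based on the error messages shown: `probs` must be a numeric vector of length 2 in [0, 1], and `winsorize` must be a single TRUE or FALSE. `deparse(substitute(probs))` echoes the expression the user typed, such as `c(5, 95)`.

```r
trim <- function(x, probs = c(0.05, 0.95), winsorize = FALSE) {
  # Argument checks (the "bonus" part); a missing x errors here on its own
  if (!is.numeric(x)) stop("`x` must be a numeric vector")
  if (!is.numeric(probs) || length(probs) != 2 || anyNA(probs) ||
      any(probs < 0 | probs > 1)) {
    stop("`probs` must be a numeric vector of length 2 between 0 and 1. '",
         deparse(substitute(probs)), "' is not a valid value.")
  }
  if (!is.logical(winsorize) || length(winsorize) != 1 || is.na(winsorize)) {
    stop("`winsorize` must be TRUE or FALSE. '", winsorize,
         "' is not a valid value.")
  }

  # Lower and upper cutoffs from quantile()
  q <- unname(quantile(x, probs = probs))
  lower <- min(q)
  upper <- max(q)

  if (winsorize) {
    pmin(pmax(x, lower), upper)     # clamp to the cutoffs, keeping length
  } else {
    x[x >= lower & x <= upper]      # drop values outside the cutoffs, keeping order
  }
}

trim(1:13, probs = c(.1, .9), winsorize = TRUE)
```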

Problem 4

Part A

Using replicate(), modify the nickel coin flip function from Lecture 5 slide 12 to perform nflips flips. Return the results in a vector of length nflips (hint: you’ll have one argument called nflips). Create a few test cases to make sure it works as intended.

flipNickel <- function(){
  sideup <- sample(x = c("heads", "tails", "edge"),
                   size = 1,
                   prob = c(.5-1/6000, .5-1/6000, 1/6000))
  return(sideup)
}

flipNickel(5)

## [1] "tails" "tails" "tails" "tails" "tails"

flipNickel(100)

##   [1] "tails" "heads" "heads" "heads" "tails" "tails" "heads" "heads" "tails"
##  [10] "tails" "heads" "heads" "heads" "heads" "heads" "tails" "heads" "tails"
##  [19] "tails" "tails" "heads" "tails" "tails" "heads" "tails" "tails" "heads"
##  [28] "heads" "heads" "tails" "heads" "tails" "tails" "heads" "tails" "heads"
##  [37] "tails" "tails" "tails" "heads" "heads" "heads" "heads" "tails" "tails"
##  [46] "tails" "tails" "heads" "tails" "tails" "tails" "heads" "heads" "tails"
##  [55] "tails" "tails" "tails" "heads" "tails" "tails" "heads" "tails" "heads"
##  [64] "heads" "tails" "tails" "heads" "heads" "heads" "tails" "tails" "tails"
##  [73] "tails" "tails" "tails" "heads" "heads" "tails" "tails" "tails" "tails"
##  [82] "heads" "heads" "tails" "heads" "heads" "tails" "tails" "heads" "tails"
##  [91] "tails" "heads" "heads" "heads" "heads" "heads" "tails" "heads" "tails"
## [100] "tails"

flipNickel(-5)

## Error in integer(n): invalid 'length' argument
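The following is a minimal sketch of the `replicate()` version. `replicate(n, expr)` re-evaluates the single-flip `sample()` call `nflips` times and simplifies the results into a character vector. It is built on `integer(n)`, so a negative `nflips` produces the `integer(n)` error shown above. Note that `sample()` treats `prob` as weights that need not sum to one.

```r
flipNickel <- function(nflips) {
  # Evaluate one flip nflips times and collect the results in a vector
  replicate(nflips,
            sample(x = c("heads", "tails", "edge"),
                   size = 1,
                   prob = c(.5 - 1/6000, .5 - 1/6000, 1/6000)))
}

flipNickel(5)
```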

Part B

Modify the function from Lecture 4 slide 12 to flip the nickel nflips times. This time, however, you should not use any sort of looping function (no for loops, no apply functions, no replicate(), etc.). Hint: the answer is probably simpler than you think! Make sure you understand how the original function works.

flipNickel <- function(){
  sideup <- sample(x = c("heads", "tails", "edge"),
                   size = 1,
                   prob = c(.5-1/6000, .5-1/6000, 1/6000))
  return(sideup)
}
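As the hint suggests, `sample()` is already vectorized: its `size` argument sets how many draws to make. The sketch below passes `nflips` to `size`. It also sets `replace = TRUE`, because each flip can land on any side again. Without it, `sample()` refuses to draw more than 3 values from a 3-element population.

```r
flipNickel <- function(nflips) {
  # One sample() call draws all nflips outcomes at once (with replacement)
  sample(x = c("heads", "tails", "edge"),
         size = nflips,
         replace = TRUE,
         prob = c(.5 - 1/6000, .5 - 1/6000, 1/6000))
}

flipNickel(5)
```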

Problem 5

Below is a dataset from 1,867 UW undergrads who filled out the PHQ-9 (a short measure of depression symptoms). The PHQ-9 has 9 items (take a look at the data however you like to get a sense of its structure).

d_phq <- read.csv("https://adamkucz.github.io/psych548/data/PHQ_Data.csv")

Using a for loop, create a variable inside d_phq called phq_total_forloop that holds the sum of all PHQ-9 items for each row.

Using apply(), create a variable inside d_phq called phq_total_apply that holds the sum of all PHQ-9 items for each row.

Using mapply(), create a variable inside d_phq called phq_total_mapply that holds the sum of all PHQ-9 items for each row.

Using the rowSums() function, create a variable inside d_phq called phq_total_rowsums that holds the sum of all PHQ-9 items for each row.
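The four approaches above differ mainly in how they walk over rows. To keep the sketch runnable without downloading the dataset, it uses a small made-up data frame, `d_toy`, in place of `d_phq`. Its values are NOT real PHQ data. It assumes the item columns are named `phq_1` through `phq_9`. For `mapply()`, each item column is passed as a separate argument, so `sum()` receives the nine item values of one row at a time.

```r
# Made-up toy data standing in for d_phq (NOT the real PHQ data):
# 3 respondents x 9 items, columns named phq_1 ... phq_9
d_toy <- as.data.frame(matrix(rep(0:3, length.out = 27), nrow = 3, byrow = TRUE,
                              dimnames = list(NULL, paste0("phq_", 1:9))))
phq_cols <- paste0("phq_", 1:9)

# 1. for loop: fill a preallocated column one row at a time
d_toy$phq_total_forloop <- NA_real_
for (i in seq_len(nrow(d_toy))) {
  d_toy$phq_total_forloop[i] <- sum(unlist(d_toy[i, phq_cols]))
}

# 2. apply() over rows (MARGIN = 1)
d_toy$phq_total_apply <- apply(d_toy[phq_cols], MARGIN = 1, FUN = sum)

# 3. mapply(): the 9 item columns become 9 parallel arguments to sum()
d_toy$phq_total_mapply <- do.call(mapply, c(list(FUN = sum), as.list(d_toy[phq_cols])))

# 4. rowSums()
d_toy$phq_total_rowsums <- rowSums(d_toy[phq_cols])

d_toy[, c("phq_total_forloop", "phq_total_apply",
          "phq_total_mapply", "phq_total_rowsums")]
```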

Check that these four columns are all equal to each other:

all(d_phq$phq_total_forloop == d_phq$phq_total_apply)

## [1] TRUE

all(d_phq$phq_total_apply == d_phq$phq_total_rowsums)

## [1] TRUE

all(d_phq$phq_total_rowsums == d_phq$phq_total_mapply)

## [1] TRUE

The PHQ total score has cut points that correspond with the following:

0-4 = “minimal depression”
5-9 = “mild depression”
10-14 = “moderate depression”
15-19 = “moderately severe depression”
20-27 = “severe depression”

Create a new variable in d_phq called depression_severity that takes on the values above based on the PHQ-9 total score in each row. (Hint: you do not need any sort of loop to do this, and you’ve written similar code before!)
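One loop-free option is `cut()`, which bins a numeric vector using a set of break points. This is a sketch, and `cut()` may not be the approach the hint has in mind. The example totals are illustrative values chosen to hit each cut point, not data. With the default `right = TRUE`, the bins are (-Inf, 4], (4, 9], (9, 14], (14, 19] and (19, Inf], which matches the integer ranges 0-4, 5-9, 10-14, 15-19 and 20-27. `as.character()` turns the resulting factor into plain text labels.

```r
phq_total <- c(0, 4, 5, 9, 10, 14, 15, 19, 20, 27)  # illustrative totals only

depression_severity <- as.character(cut(
  phq_total,
  breaks = c(-Inf, 4, 9, 14, 19, Inf),
  labels = c("minimal depression", "mild depression", "moderate depression",
             "moderately severe depression", "severe depression")
))

depression_severity
```

Applied to the real data, the same call would take one of the total-score columns, for example `d_phq$phq_total_rowsums`.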

Now use either lapply() together with unlist(), or sapply() on its own, to create a named vector (with the same names as the PHQ columns in d_phq) containing the mean of each PHQ item column (phq_1 through phq_9). Call your vector phq_item_means.
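A data frame is a list of columns, so both `lapply()` and `sapply()` iterate over columns and keep the column names. `lapply()` returns a named list, which `unlist()` flattens. `sapply()` does that simplification itself. The sketch uses the same made-up `d_toy` stand-in (NOT real PHQ data) so it runs on its own.

```r
# Made-up toy data standing in for d_phq (NOT the real PHQ data)
d_toy <- as.data.frame(matrix(rep(0:3, length.out = 27), nrow = 3, byrow = TRUE,
                              dimnames = list(NULL, paste0("phq_", 1:9))))
phq_cols <- paste0("phq_", 1:9)

# Option 1: lapply() returns a named list; unlist() flattens it to a named vector
phq_item_means_lapply <- unlist(lapply(d_toy[phq_cols], mean))

# Option 2: sapply() simplifies to a named vector directly
phq_item_means <- sapply(d_toy[phq_cols], mean)

phq_item_means
```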

Problem 6

Using tapply(), get the mean PHQ-9 total score for individuals in each category of depression_severity:

##              mild depression           minimal depression 
##                     6.897351                     1.669533 
##          moderate depression moderately severe depression 
##                    11.847826                    16.631148 
##            severe depression 
##                    21.960784

Do this again to get the mean PHQ-9 total score for individuals in a relationship (relationship == 1) and those not in a relationship (relationship == 0):

##        0        1 
## 6.403084 6.873614
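Both `tapply()` calls in this problem follow the pattern `tapply(values, groups, FUN)`: split the totals by a grouping variable, then apply `mean()` to each group. The sketch uses a tiny made-up data frame (NOT the real data) with assumed column names `phq_total`, `depression_severity` and `relationship`. When the grouping variable is a character vector, `tapply()` converts it to a factor with sorted levels. That is why the severity groups in the output above appear in alphabetical order.

```r
# Made-up toy data (NOT the real PHQ data)
d_toy <- data.frame(
  phq_total = c(2, 4, 6, 8, 11, 21),
  depression_severity = c("minimal depression", "minimal depression",
                          "mild depression", "mild depression",
                          "moderate depression", "severe depression"),
  relationship = c(0, 1, 0, 1, 0, 1)
)

# Mean total score within each severity category
means_by_severity <- tapply(d_toy$phq_total, d_toy$depression_severity, mean)

# Mean total score by relationship status (0 = no, 1 = yes)
means_by_relationship <- tapply(d_toy$phq_total, d_toy$relationship, mean)

means_by_severity
means_by_relationship
```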