Please submit both the .Rmd and .html files on Canvas.
You do not need any extra packages for this homework (i.e., everything can be done using base R functions!)
Write a function called minAndMax that takes a vector of any length (x) and returns a named vector of length 2 (names = “min” and “max”) containing the minimum and maximum values of that vector. Before taking the minimum and maximum, use what you know about control structures and checking the atomic type of a variable to ensure that the vector is numeric. If the vector is not numeric, throw an error (i.e., tell the user you can’t find the maximum value of a non-numeric vector). Code to throw an error is below. Bonus points if you get the error message to tell the user which type of vector they passed to your function.
stop("`x` must be a numeric vector")
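One possible way to meet this spec is sketched below; it is not the official solution. It checks is.numeric() before doing any work and, for the bonus, names the offending type with class(x)[1], which reads as “character” or “logical” for the two non-numeric test calls (and as “factor” for a factor, where typeof() would misleadingly report “integer”).

```r
# Sketch of minAndMax(): returns c(min = ..., max = ...) for numeric input.
minAndMax <- function(x) {
  # Control structure: bail out early on anything that is not integer or double.
  if (!is.numeric(x)) {
    # class(x)[1] tells the user what kind of vector they actually passed.
    stop("`x` must be a numeric vector but you passed it a ",
         class(x)[1], " vector.")
  }
  # Naming the arguments of c() produces the named vector of length 2.
  c(min = min(x), max = max(x))
}
```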
Call your function to make sure it works! First, let’s pass it a numeric vector:
minAndMax(-500:300)
## min max
## -500 300
Then pass it a non-numeric vector to make sure it throws an error:
minAndMax(c("UW", "WSU", "WWU", "CWU", "EWU", "SU", "SPU"))
## Error in minAndMax(c("UW", "WSU", "WWU", "CWU", "EWU", "SU", "SPU")): `x` must be a numeric vector but you passed it a character vector.
minAndMax(c(T, T, F, F, T, T, T, T, F))
## Error in minAndMax(c(T, T, F, F, T, T, T, T, F)): `x` must be a numeric vector but you passed it a logical vector.
Let’s create a proper function out of the number guessing game you wrote in Assignment 2 Problem 5. Create a function called guessingGame that takes the following arguments:
min: a numeric value indicating the minimum number that can be chosen. Defaults to 1.
max: a numeric value indicating the maximum number that can be chosen. Defaults to 10.
play_again: a logical value; when TRUE, the user is prompted to play again after they guess correctly, and when FALSE, the game is played only once each time the function is called. Defaults to TRUE.
Modify your code from Assignment 2 Problem 5 (or use mine, if you prefer) to guess a number between min and max, and repeat only if the user says they want to be asked to play again.
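The Assignment 2 code is not reproduced here, so the sketch below is a stand-in built on assumptions: readline() collects guesses, the user gets “too low/too high” feedback, min and max are whole numbers, and typing “y” means play again. It shows how the three arguments fit around two nested repeat loops. Note that readline() only waits for input in an interactive session; run non-interactively it returns "" at once, and this loop would never end.

```r
# Sketch only: assumes readline() input, whole-number bounds, and "y" = play again.
guessingGame <- function(min = 1, max = 10, play_again = TRUE) {
  repeat {
    # Index into min:max so a single-value range (min == max) still works;
    # sample(5, 1) would draw from 1:5 instead of returning 5.
    candidates <- min:max
    target <- candidates[sample.int(length(candidates), size = 1)]
    repeat {
      prompt <- paste0("Guess a number between ", min, " and ", max, ": ")
      guess <- suppressWarnings(as.numeric(readline(prompt)))
      if (is.na(guess)) {
        message("Please enter a number.")
      } else if (guess < target) {
        message("Too low!")
      } else if (guess > target) {
        message("Too high!")
      } else {
        message("You got it! The number was ", target, ".")
        break
      }
    }
    # play_again = FALSE means exactly one round per call.
    if (!play_again) break
    answer <- readline("Play again? (y/n): ")
    if (tolower(trimws(answer)) != "y") break
  }
  invisible(NULL)
}
```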
Trimming is a statistical procedure wherein the most extreme cases on both ends of a univariate distribution are removed. Winsorizing is similar, but instead of being removed, the extreme observations are replaced with the values of the chosen quantiles. When trimming or winsorizing, you have to decide at which quantiles to cut. For example, consider the following vector:
observations <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
A 10% trim (keeping everything between the 5th and 95th percentiles) would remove the first and last observations:
## [1] 2 3 4 5 6 7 8 9
And a 10% winsorize would result in the following:
# Observe quantiles
quantile(observations, probs = c(0.05, 0.95))
## 5% 95%
## 1.45 9.55
## [1] 1.45 2.00 3.00 4.00 5.00 6.00 7.00 8.00 9.00 9.55
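Both results can be reproduced directly from quantile(): trimming keeps the values inside the two cutoffs, and winsorizing clamps values to them. Using pmax()/pmin() for the clamping is one option among several, and the names cutoffs, trimmed, and winsorized are only for illustration.

```r
# Reproduce the 10% trim and winsorize of `observations` shown above.
observations <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
cutoffs <- quantile(observations, probs = c(0.05, 0.95))  # 1.45 and 9.55

# Trim: keep only values inside the two cutoffs.
trimmed <- observations[observations >= cutoffs[1] & observations <= cutoffs[2]]

# Winsorize: raise values below the lower cutoff and lower values above the upper one.
winsorized <- pmin(pmax(observations, cutoffs[1]), cutoffs[2])

trimmed
winsorized
```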
Create a function called trim that takes a numeric vector of any length as input and returns that same vector trimmed or winsorized. For simplicity, assume the user is giving you a large enough vector (i.e., they won’t ask you to trim a vector of length 2). Your function should have the following arguments:
x: a numeric vector of any length
probs: a numeric vector of probabilities with values in [0, 1] to be passed to quantile. Defaults to c(0.05, 0.95).
winsorize: a logical value indicating whether to return a trimmed (FALSE) or winsorized (TRUE) vector. Defaults to FALSE.
Bonus points if you make sure that all of the arguments are valid before performing the trim.
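A sketch of one way to write trim(), validating every argument first. The error messages are worded to match the expected output in the test cases, and the cutoffs come from R’s default quantile() method (type 7), the same one that produced 1.45 and 9.55 above. Using >= and <= keeps a value that lands exactly on a cutoff; dropping it instead is an equally defensible choice.

```r
# Sketch of trim(): validate the arguments, then trim or winsorize at quantile cutoffs.
trim <- function(x, probs = c(0.05, 0.95), winsorize = FALSE) {
  # Referencing x here also triggers R's "argument "x" is missing" error for trim().
  if (!is.numeric(x)) {
    stop("`x` must be a numeric vector.")
  }
  if (!is.numeric(probs) || length(probs) != 2 || anyNA(probs) ||
      any(probs < 0 | probs > 1)) {
    stop("`probs` must be numeric vector of length 2 between 0 and 1. '",
         paste(deparse(probs), collapse = ""), "' is not a valid value.")
  }
  if (!is.logical(winsorize) || length(winsorize) != 1 || is.na(winsorize)) {
    stop("`winsorize` must be TRUE or FALSE. '",
         paste(winsorize, collapse = ", "), "' is not a valid value.")
  }

  # Lower and upper cutoffs from R's default (type 7) quantile method.
  cutoffs <- quantile(x, probs = probs, names = FALSE)
  lower <- min(cutoffs)
  upper <- max(cutoffs)

  if (winsorize) {
    # Clamp values outside the cutoffs to the cutoffs themselves.
    pmin(pmax(x, lower), upper)
  } else {
    # Keep only values inside the cutoffs, preserving their original order.
    x[x >= lower & x <= upper]
  }
}
```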
Test to make sure trim() does what you want! Here are a few test cases. Can you think of others?
trim(1:13)
## [1] 2 3 4 5 6 7 8 9 10 11 12
trim(1:13, probs = c(.1, .9))
## [1] 3 4 5 6 7 8 9 10 11
trim(1:13, probs = c(.1, .9), winsorize = TRUE)
## [1] 2.2 2.2 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 11.8 11.8
trim(c(5, 13, 11, 8, 9, 2, 12, 10, 1, 3, 6, 4, 7))
## [1] 5 11 8 9 2 12 10 3 6 4 7
Make sure to test that trim() throws an error when you would expect it to!
trim()
## Error in trim(): argument "x" is missing, with no default
trim(1:13, probs = c(.05, .10, .15))
## Error in trim(1:13, probs = c(0.05, 0.1, 0.15)): `probs` must be numeric vector of length 2 between 0 and 1. 'c(0.05, 0.1, 0.15)' is not a valid value.
trim(1:13, probs = c(5, 95))
## Error in trim(1:13, probs = c(5, 95)): `probs` must be numeric vector of length 2 between 0 and 1. 'c(5, 95)' is not a valid value.
trim(1:13, winsorize = "Yes please")
## Error in trim(1:13, winsorize = "Yes please"): `winsorize` must be TRUE or FALSE. 'Yes please' is not a valid value.
Using replicate(), modify the nickel coin flip function from Lecture 5 slide 12 to perform nflips flips. Return the results in a vector of length nflips (hint: you’ll have one argument called nflips). Create a few test cases to make sure it works as intended.
flipNickel <- function(){
sideup <- sample(x = c("heads", "tails", "edge"),
size = 1,
prob = c(.5-1/6000, .5-1/6000, 1/6000))
return(sideup)
}
flipNickel(5)
## [1] "tails" "tails" "tails" "tails" "tails"
flipNickel(100)
## [1] "tails" "heads" "heads" "heads" "tails" "tails" "heads" "heads" "tails"
## [10] "tails" "heads" "heads" "heads" "heads" "heads" "tails" "heads" "tails"
## [19] "tails" "tails" "heads" "tails" "tails" "heads" "tails" "tails" "heads"
## [28] "heads" "heads" "tails" "heads" "tails" "tails" "heads" "tails" "heads"
## [37] "tails" "tails" "tails" "heads" "heads" "heads" "heads" "tails" "tails"
## [46] "tails" "tails" "heads" "tails" "tails" "tails" "heads" "heads" "tails"
## [55] "tails" "tails" "tails" "heads" "tails" "tails" "heads" "tails" "heads"
## [64] "heads" "tails" "tails" "heads" "heads" "heads" "tails" "tails" "tails"
## [73] "tails" "tails" "tails" "heads" "heads" "tails" "tails" "tails" "tails"
## [82] "heads" "heads" "tails" "heads" "heads" "tails" "tails" "heads" "tails"
## [91] "tails" "heads" "heads" "heads" "heads" "heads" "tails" "heads" "tails"
## [100] "tails"
flipNickel(-5)
## Error in integer(n): invalid 'length' argument
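A sketch using replicate(): it reruns the single-flip sample() call from the original function nflips times and simplifies the results into a character vector. Because replicate() builds its index with integer(nflips), a negative nflips fails with the same “invalid 'length' argument” error shown above.

```r
# Sketch: flip the nickel nflips times by re-running the one-flip sample() call.
flipNickel <- function(nflips) {
  replicate(nflips,
            sample(x = c("heads", "tails", "edge"),
                   size = 1,
                   prob = c(.5 - 1/6000, .5 - 1/6000, 1/6000)))
}

flipNickel(5)  # a character vector of length 5
```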
Modify the function from Lecture 4 slide 12 to flip the nickel nflips times. This time, however, you should not use any sort of looping function (no for loops, no apply functions, no replicate, etc.). Hint: the answer is probably simpler than you think! Make sure you understand what is inside the original function.
flipNickel <- function(){
sideup <- sample(x = c("heads", "tails", "edge"),
size = 1,
prob = c(.5-1/6000, .5-1/6000, 1/6000))
return(sideup)
}
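One loop-free approach, sketched below: sample() can draw many values in a single call, so size can go from 1 to nflips. Setting replace = TRUE is required here, because without it sample() refuses to draw more than three values from a three-element x.

```r
# Sketch: one sample() call draws all nflips results at once (no looping function).
flipNickel <- function(nflips) {
  sample(x = c("heads", "tails", "edge"),
         size = nflips,      # draw nflips values instead of 1
         replace = TRUE,     # each flip is independent, so values may repeat
         prob = c(.5 - 1/6000, .5 - 1/6000, 1/6000))
}

table(flipNickel(1000))  # counts of each side
```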
Below is a dataset from 1,867 undergrads at UW who filled out the PHQ-9 (a short measure of depression symptoms). The PHQ-9 has 9 items (take a look at the data however you would like to get a sense of its structure).
d_phq <- read.csv("https://adamkucz.github.io/psych548/data/PHQ_Data.csv")
Using a for loop, create a variable inside d_phq called phq_total_forloop that holds the sum of all PHQ-9 items for each row.
Using apply(), create a variable inside d_phq called phq_total_apply that holds the sum of all PHQ-9 items for each row.
Using mapply(), create a variable inside d_phq called phq_total_mapply that holds the sum of all PHQ-9 items for each row.
Using the rowSums() function, create a variable inside d_phq called phq_total_rowsums that holds the sum of all PHQ-9 items for each row.
Check that these four columns are all equal to each other:
all(d_phq$phq_total_forloop == d_phq$phq_total_apply)
## [1] TRUE
all(d_phq$phq_total_apply == d_phq$phq_total_rowsums)
## [1] TRUE
all(d_phq$phq_total_rowsums == d_phq$phq_total_mapply)
## [1] TRUE
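The four approaches are sketched below on a small made-up stand-in for d_phq (toy_phq: five respondents, items phq_1 through phq_9 scored 0 to 3) so the code runs without downloading the data. The do.call() route to mapply() is one option among several: it passes each item column as a separate argument, so each call to sum() sees one respondent’s nine answers.

```r
# Hypothetical stand-in for d_phq: 5 respondents, items phq_1..phq_9 scored 0-3.
set.seed(548)
phq_cols <- paste0("phq_", 1:9)
toy_phq <- as.data.frame(matrix(sample(0:3, 5 * 9, replace = TRUE), nrow = 5,
                                dimnames = list(NULL, phq_cols)))

# 1. for loop: visit each row and add up its nine item scores.
toy_phq$phq_total_forloop <- NA_real_
for (i in seq_len(nrow(toy_phq))) {
  toy_phq$phq_total_forloop[i] <- sum(unlist(toy_phq[i, phq_cols]))
}

# 2. apply(): MARGIN = 1 applies sum() to each row of the item columns.
toy_phq$phq_total_apply <- apply(toy_phq[phq_cols], 1, sum)

# 3. mapply(): each item column becomes its own argument, so the i-th call
#    to sum() receives the i-th element of every column.
toy_phq$phq_total_mapply <- do.call(mapply, c(list(FUN = sum), toy_phq[phq_cols]))

# 4. rowSums(): vectorized row totals.
toy_phq$phq_total_rowsums <- rowSums(toy_phq[phq_cols])

all(toy_phq$phq_total_forloop == toy_phq$phq_total_rowsums)
```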
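The PHQ-9 total score has cut points that correspond with the following:
0-4: minimal depression
5-9: mild depression
10-14: moderate depression
15-19: moderately severe depression
20-27: severe depression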
Create a new variable in d_phq called depression_severity that takes on the values above based on the PHQ-9 total score in each row (e.g., phq_total_rowsums). (Hint: you do not need any sort of loop to do this, and you’ve written similar code before!)
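One loop-free approach is cut(), which bins a numeric vector into labeled intervals. The sketch assumes the standard PHQ-9 cut points (0-4 minimal, 5-9 mild, 10-14 moderate, 15-19 moderately severe, 20-27 severe) and uses a made-up vector of totals that hits every boundary.

```r
# Made-up PHQ-9 totals spanning every band, including each boundary value.
totals <- c(0, 4, 5, 9, 10, 14, 15, 19, 20, 27)

# With cut()'s default right = TRUE the intervals are (-Inf, 4], (4, 9], (9, 14],
# (14, 19] and (19, Inf], i.e. 0-4, 5-9, 10-14, 15-19 and 20+ for whole-number totals.
severity <- cut(totals,
                breaks = c(-Inf, 4, 9, 14, 19, Inf),
                labels = c("minimal depression", "mild depression",
                           "moderate depression", "moderately severe depression",
                           "severe depression"))

data.frame(totals, severity)

# Applied to the real data, the same call would take d_phq$phq_total_rowsums
# in place of totals.
```

cut() returns a factor whose levels run from least to most severe; wrap the call in as.character() if you would rather store plain text.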
Now use lapply() with unlist(), or sapply() alone, to create a named vector (with the same names as the PHQ columns in d_phq) containing the mean of each PHQ item column (phq_1 through phq_9). Call your vector phq_item_means.
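Both routes are sketched below on a hypothetical three-item data frame (toy_phq) that also carries a total column, to show why the item columns need to be selected. Picking them with the grep() pattern "^phq_[0-9]$" is an assumption that works for phq_1 through phq_9 and skips any phq_total_* columns.

```r
# Hypothetical three-item stand-in for d_phq (the real data has phq_1 to phq_9).
toy_phq <- data.frame(phq_1 = c(0, 1, 3),
                      phq_2 = c(2, 2, 2),
                      phq_3 = c(1, 0, 2),
                      phq_total_rowsums = c(3, 3, 7))

# Select only the item columns: "phq_" followed by a single digit.
phq_cols <- grep("^phq_[0-9]$", names(toy_phq), value = TRUE)

# lapply() returns a list; unlist() flattens it into a named numeric vector.
means_lapply <- unlist(lapply(toy_phq[phq_cols], mean))

# sapply() simplifies to the same named vector in one step.
phq_item_means <- sapply(toy_phq[phq_cols], mean)

phq_item_means
```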
Using tapply(), get the mean PHQ-9 total score for individuals in each category of depression_severity:
## mild depression minimal depression
## 6.897351 1.669533
## moderate depression moderately severe depression
## 11.847826 16.631148
## severe depression
## 21.960784
Do this again to get the mean PHQ-9 total score for individuals in a relationship (relationship == 1) and those not in a relationship (relationship == 0):
## 0 1
## 6.403084 6.873614
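tapply(X, INDEX, FUN) splits X by the groups in INDEX and applies FUN to each piece. The sketch below uses a made-up six-row data frame (toy) in place of d_phq. For a character or numeric grouping variable, the groups come back in sorted order; a factor keeps its own level order instead.

```r
# Made-up stand-in for d_phq: a total score, a severity label, and relationship status.
toy <- data.frame(
  total        = c(2, 7, 12, 3, 21, 8),
  severity     = c("minimal depression", "mild depression", "moderate depression",
                   "minimal depression", "severe depression", "mild depression"),
  relationship = c(0, 1, 0, 1, 1, 0)
)

# Mean total score within each severity category.
by_severity <- tapply(toy$total, toy$severity, mean)

# Mean total score for those not in (0) and in (1) a relationship.
by_relationship <- tapply(toy$total, toy$relationship, mean)

by_severity
by_relationship
```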