WPA01 of Basic data and decision analysis in R, taught at the University of Konstanz in Winter 2017/2018.
Instructions
To complete and submit these exercises, please remember and do the following:
Your WPAs can be written and submitted either as scripts of commented code (as .R or .Rmd files) or as reproducible documents that combine text with code (in .html or .pdf formats). A simple .Rmd template is provided here. Alternatively, open a plain R script and save it as LastnameFirstname_WPA##_yymmdd.R.
Also enter the current assignment (e.g., WPA01), your name, and the current date at the top of your document. When working on a task, always use appropriate comments to indicate which task you are answering.
Here is an example of how your file JillsomeJack_WPA01_161031.Rmd could look:
# Assignment: WPA 01
# Name: Jillsome, Jack
# Date: 2017 October 30
# ~~~~~~~~~~~~~~~~~~~~~~~~~~
# Exercise 1:
# Adding numbers:
s <- 1 + 2
# ~~~~~~~~~~~~~~~~~~~~~~~~~~
# Exercise 2:
# Draw 100 samples and then conduct a t-test:
x <- rnorm(100) # draw 100 values from a standard normal distribution, N(0, 1)
t.test(x) # t-test on the sample
# ~~~~~~~~~~~~~~~~~~~~~~~~~~
# Exercise 3:
# etc. ...
Complete as many exercises as you can by Wednesday (23:59).
Submit your script or output file (including all code) to the appropriate folder on Ilias before midnight.
A. In Class
Basic concepts
1. You learned that object names in R are case-sensitive. If that's the case, why does stuff == STUFF evaluate to TRUE in the following code?
plunder <- pi
stuff <- 1 # l1
Stuff <- stuff * plunder # l2
STUFF <- Stuff / plunder # l3
stuff == STUFF
2. Your captain claims that the number 1 and the result of \(999^{9}\) have the same length. Show that your captain is correct (of course) and explain why this is so in this particular case.
3. To calculate the arithmetic mean of -3, -2, -1, 0, 1, 2, 3 in R, you use the function mean(). What does the following code return and why?
mean(-3, -2, -1, 0, 1, 2, 3)
Correct the code to return the desired value.
Sequence generation
4. Predict the sequence generated by seq(from = 0, to = 10, length.out = 10). Then change one argument to yield the same result as 0:10.
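To see how length.out differs from by, here is a small sketch on a different range (0 to 1, chosen only for illustration; the names a and b are arbitrary): length.out fixes how many values are returned, endpoints included, and R derives the step size from it.

```r
# length.out fixes how many values are returned (both endpoints included);
# R derives the step size as (to - from) / (length.out - 1).
a <- seq(from = 0, to = 1, length.out = 5)
a  # 0.00 0.25 0.50 0.75 1.00

# by fixes the step size instead; R derives the number of values.
b <- seq(from = 0, to = 1, by = 0.25)

identical(length(a), length(b))  # both contain 5 values
all.equal(a, b)                  # and the same numbers
```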
5. Create 3 vectors of the integers from 1 to 10 by using 3 different commands.
6. Predict the sequences generated by pi:10 and 10:pi. Then check your predictions and explain what happened.
Sampling from sets of values
7. You want to simulate 3 flips of a fair coin in R. Explain what is wrong with the following code and correct it.
coin <- c(heads, tails)
sample(x = coin, size = 3)
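As background on sample(): by default it draws without replacement, so it cannot return more elements than x contains, and character values must be quoted (otherwise R looks for objects with those names). The sketch below illustrates both points with a hypothetical set of three colors; the seed value is arbitrary.

```r
set.seed(42)  # make the random draws reproducible (seed chosen arbitrarily)

colors <- c("red", "green", "blue")  # quoted strings, not object names

# Without replacement (default): at most length(colors) draws, all distinct.
draws.unique <- sample(x = colors, size = 3)

# With replacement: each draw is independent, so any size works.
draws.repeated <- sample(x = colors, size = 10, replace = TRUE)

# Asking for more draws than elements without replacement is an error:
res <- tryCatch(sample(x = colors, size = 4), error = function(e) "error")
res
```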
8a. Create a fair die (with possible outcomes from 1 to 6) and determine the arithmetic mean and standard deviation of throwing it 10,000 times.
8b. Now imagine the die was manipulated so that throwing a 6 is twice as likely as each of the other numbers. How does this change the arithmetic mean and standard deviation of throwing it 10,000 times?
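The prob argument of sample() lets some outcomes be more likely than others. As an illustration (not the die from the exercise), the sketch below uses a hypothetical three-sided spinner on which outcome 3 is twice as likely as 1 or 2, so the weights are 1:1:2 and the probabilities are 1/4, 1/4, and 1/2.

```r
set.seed(1)  # arbitrary seed for reproducibility

outcomes <- 1:3
weights <- c(1, 1, 2)            # outcome 3 counts double
probs <- weights / sum(weights)  # 0.25, 0.25, 0.50 (sample() would also rescale raw weights)

spins <- sample(x = outcomes, size = 10000, replace = TRUE, prob = probs)

# Relative frequencies should be close to probs:
table(spins) / length(spins)

# Theoretical mean: sum of outcomes weighted by their probabilities
sum(outcomes * probs)  # 2.25
mean(spins)            # should be close to 2.25
```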
9. The most popular German lottery is known as 6 aus 49, in which a total of 7 numbers are randomly drawn: First, 6 unique numbers are randomly drawn from the numbers 1 to 49. Second, a single-digit "Superzahl" is drawn from the numbers 0 to 9. Simulate this lottery and run it once.
Drawing random values
Drawing random values from (normal or uniform) distributions:
You can create random samples of values from a Normal distribution using the rnorm(n, mean, sd) function. For example, the following code will create a vector of 50 values from a Normal distribution with a mean of 100 and a standard deviation of 10:
# Random sample of 50 values from N(mean = 100, sd = 10)
x <- rnorm(n = 50, mean = 100, sd = 10)
Similarly, you can create random samples of values from a Uniform distribution using the runif(n, min, max) function. For example, the following code will create a vector of 50 values from a Uniform distribution with a minimum value of 60 and a maximum value of 140:
# Random sample of 50 uniform values between 60 and 140:
y <- runif(n = 50, min = 60, max = 140)
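Because rnorm() and runif() return different values on every call, it can help to fix the random number generator with set.seed() while checking your work. The sketch below (the seed value 123 is arbitrary) redraws the two samples above and summarizes them.

```r
set.seed(123)  # any integer works; the same seed reproduces the same draws

x <- rnorm(n = 50, mean = 100, sd = 10)  # Normal: mean 100, sd 10
y <- runif(n = 50, min = 60, max = 140)  # Uniform: between 60 and 140

# Sample statistics only approximate the population values for n = 50:
c(mean = mean(x), sd = sd(x))
range(y)  # all values fall within [60, 140]

# Re-running with the same seed gives identical values:
set.seed(123)
identical(x, rnorm(n = 50, mean = 100, sd = 10))
```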
10. The following graph shows four distributions:
Draw 10 random samples from each distribution and round them to the nearest integer (using round(x, 0)).
11. Show that the mean of random values from a uniform distribution from min to max approximates the midpoint between min and max for large samples. (Hint: Choose arbitrary values for min and max and compare their midpoint to the mean for increasingly large samples from the corresponding uniform distribution.)
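One way to organize such a comparison is sketched below. The bounds 0 and 10 (midpoint 5), the sample sizes, and the seed are arbitrary choices, and vapply() simply repeats the same calculation once per sample size.

```r
set.seed(2017)  # arbitrary seed

min.val <- 0
max.val <- 10
midpoint <- (min.val + max.val) / 2  # 5

sizes <- c(10, 100, 1000, 10000, 100000)

# Mean of a fresh uniform sample for each sample size
sample.means <- vapply(sizes,
                       function(n) mean(runif(n, min = min.val, max = max.val)),
                       numeric(1))

# The distance from the midpoint tends to shrink as n grows
data.frame(n = sizes, mean = sample.means, distance = abs(sample.means - midpoint))
```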
12. The following tasks compare two different random samples from a Normal distribution with a mean of 100 and a standard deviation of 10.
12a. Create a vector called samp.10 that contains 10 samples from a Normal distribution with a mean of 100 and a standard deviation of 10.
12b. Create a vector called samp.100000 that contains 100,000 samples from the same Normal distribution as above (that is, also with a mean of 100 and a standard deviation of 10).
12c. Before making any calculations, what would you guess the means and standard deviations of samp.10 and samp.100000 to be? Which prediction are you more confident in?
12d. Now calculate the mean and standard deviation of samp.10 and samp.100000 separately. Were your predictions correct?
Computing numeric vectors
13. Create a vector that contains all numbers between 1,000 and 2,000 that are multiples of 17. (Hint: Find the first element in this range by using the %% operator to check whether a number is divisible by 17.)
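As a reminder of how %% works, the sketch below uses a different divisor and range (multiples of 7 between 50 and 100, chosen only for illustration): %% returns the remainder of a division, so a remainder of 0 means "divisible".

```r
# %% returns the remainder of integer division
17 %% 5   # 2
35 %% 7   # 0, so 35 is divisible by 7

candidates <- 50:100
multiples.of.7 <- candidates[candidates %% 7 == 0]
multiples.of.7  # 56 63 70 77 84 91 98

# Equivalent: find the first multiple, then step by 7 with seq()
first <- multiples.of.7[1]
seq(from = first, to = 100, by = 7)
```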
14. Suppose you can save EUR 10 this month, twice as much next month, twice as much again the month after that, and so on. If this holds up, how much will you have saved after one year?
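Amounts that change by a constant factor each period form a geometric sequence, which R can compute with vector arithmetic. The sketch below deliberately uses made-up numbers (EUR 5 in the first month, tripling each month for 4 months) rather than the exercise's values; start, growth, and months are illustrative names.

```r
start <- 5    # amount saved in the first month (hypothetical)
growth <- 3   # each month saves this many times the previous month
months <- 4

monthly <- start * growth^(0:(months - 1))  # 5 15 45 135
cumsum(monthly)                             # running total: 5 20 65 200
sum(monthly)                                # total after 4 months: 200

# Closed form of a geometric series, as a check:
start * (growth^months - 1) / (growth - 1)  # 200
```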
Checkpoint 1
At this point you have completed all basic exercises. This is good, but keep going…
B. At Home
Shuffling an ordered sequence
15a. Create a vector shuffled.deck that shuffles (or contains a random permutation of) a deck of 32 cards. (Hint: Create a vector deck containing the sequence of numbers from 1 to 32 and then use sample() to draw all items from it.)
15b. Use unique() to check that shuffled.deck contains 32 unique elements.
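The same idea on a hypothetical 8-card deck: when sample() is given a vector and no size, it returns a full permutation of that vector, and unique() confirms that no element was drawn twice. (One caveat: if x is a single positive number n, sample(x) instead permutes 1:n.)

```r
set.seed(7)  # arbitrary seed

small.deck <- 1:8                # hypothetical 8-card deck
shuffled <- sample(small.deck)   # size defaults to length(x): a full permutation

length(shuffled)                   # 8
length(unique(shuffled))           # 8: no card was drawn twice
all(sort(shuffled) == small.deck)  # TRUE: same cards, new order
```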
Assigning participants to conditions
16. Your new experiment has 3 treatment conditions: A, B, and placebo. The following exercises deal with different ways of assigning participants to experimental treatments.
16a. Assign 15 participants to the 3 treatments in the (non-random) order in which they arrive at the laboratory. (Hint: Create a vector cond that contains the 3 treatment conditions and repeat it to create a vector of assignments.)
16b. Randomly assign another 15 participants to the same 3 treatment conditions using the sample() function.
16c. A frequent nuisance with a truly random assignment is that different numbers of participants end up receiving the different treatments. To avoid this, create a random sequence of 15 assignments that is guaranteed to have an equal number of participants (i.e., exactly 5) in each treatment condition.
16d. For another set of 15 participants, make sure that every consecutive triple of participants covers all 3 conditions (one participant per condition), but choose the conditions of the 1st and 2nd participant of every triple at random.
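The first three assignment strategies can be contrasted on a hypothetical two-group design with 8 participants (groups, n, and the seed are illustrative choices, not the exercise's setup): rep() produces a fixed rotation, sample() with replacement gives a fully random assignment, and permuting a vector built with rep(..., each = ...) gives a random but balanced assignment.

```r
set.seed(99)  # arbitrary seed

groups <- c("control", "treatment")  # hypothetical 2-condition design
n <- 8                               # hypothetical number of participants

# Fixed rotation in order of arrival: control, treatment, control, ...
rotation <- rep(groups, times = n / length(groups))

# Fully random: each participant drawn independently; group sizes may differ
random <- sample(groups, size = n, replace = TRUE)

# Balanced random: permute a vector with exactly n / 2 entries per group
balanced <- sample(rep(groups, each = n / length(groups)))
table(balanced)  # exactly 4 per group
```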
Drinking non-alcoholic beer
17. Does drinking non-alcoholic beer affect cognitive performance?
A psychologist has a theory that some of the negative cognitive effects of alcohol are the result of psychological rather than physiological processes. To test this, she has 12 participants perform a cognitive test before and after drinking non-alcoholic beer that was labelled as containing 5% alcohol. Results from the study, including some demographic data, are presented in the following table. Note that higher scores on the test indicate better performance.
participant | before | after | age | sex | eye.color |
---|---|---|---|---|---|
1 | 45 | 43 | 20 | male | blue |
2 | 49 | 50 | 19 | female | blue |
3 | 40 | 61 | 22 | male | brown |
4 | 48 | 44 | 20 | female | brown |
5 | 44 | 45 | 27 | male | blue |
6 | 70 | 20 | 22 | female | blue |
7 | 90 | 85 | 22 | male | brown |
8 | 75 | 65 | 20 | female | brown |
9 | 80 | 72 | 25 | male | blue |
10 | 65 | 65 | 22 | female | blue |
11 | 80 | 70 | 24 | male | brown |
12 | 52 | 75 | 22 | female | brown |
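As a reminder of how rep() builds patterns, the sketch below uses neutral placeholder values ("x" and "y") so that spotting the pattern in the table is still up to you: times repeats the whole vector, each repeats every element, and combining them applies each first.

```r
alternating <- rep(c("x", "y"), times = 3)        # "x" "y" "x" "y" "x" "y"
blocks <- rep(c("x", "y"), each = 3)              # "x" "x" "x" "y" "y" "y"
paired <- rep(c("x", "y"), each = 2, times = 2)   # "x" "x" "y" "y" "x" "x" "y" "y"
```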
Creating vectors from scratch
17a. Create a vector of the participant data called participant using the c() function.
17b. Now create the participant vector again, but this time use the colon operator (a:b).
17c. Now create the participant vector again, but this time use the seq() function.
17d. Create a vector of the before-drink data called before using c().
17e. Create a vector of the after-drink data called after using c().
17f. Create a vector of the age data called age using c().
17g. Create a vector of the sex data called sex, but don't use c(). Instead, use the rep() function by looking for an existing pattern in the data (above).
17h. Create a vector of the eye color data called eye.color, but don't use c(). Instead, use the rep() function by looking for the pattern in the data (above).
Combining and changing vectors
18a. Create a new vector called age.months that shows the participants' age in months instead of years. (Hint: Use basic vector arithmetic.)
18b. Create a new vector called change that shows the change in participants' scores from before to after. (Hint: Use basic vector arithmetic.)
18c. Create a new vector called average that shows the participants' average score across both tests. That is, the first element of average should be the average of the first participant's two scores, and the second element should be the average of the second participant's two scores. (Hint: Don't use mean(); use basic vector arithmetic.)
18d. Oops! It turns out that the watch used to measure time was off. All the before times are 1 second too fast, and all the after times are 1 second too slow. Correct them!
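Basic vector arithmetic in R works element by element, and a single number (a scalar) is recycled to every element. The sketch below shows this on two hypothetical score vectors for three people; first, second, and the result names are illustrative only.

```r
# Two hypothetical score vectors for three people
first <- c(10, 20, 30)
second <- c(12, 18, 33)

score.change <- second - first       # element-wise change: 2 -2 3
score.avg <- (first + second) / 2    # element-wise average: 11 19 31.5
scaled <- first * 12                 # scalar recycled to every element: 120 240 360
shifted <- first + 1                 # shift every value by the same amount: 11 21 31
```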
Checkpoint 2
If you got this far you are doing very well! Try to keep it up for a little longer…
Applying functions to vectors
19a. How many elements are in each of the original data vectors? (Hint: Use length().) If the number of elements is not the same in each vector, you made an error somewhere.
19b. What was the standard deviation of ages? Assign the result to a scalar object called age.sd.
19c. What is the median age? Assign the result to a scalar object called age.median.
19d. How many people were there of each sex? (Hint: Use table().)
19e. What percentage of people were of each sex? (Hint: Use table() and divide by its sum() to get proportions, then multiply by 100 to get percentages.)
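Dividing a table() by its sum() yields proportions, which become percentages when multiplied by 100; base R's prop.table() is a shortcut for the first step. The sketch below uses a hypothetical fruit vector rather than the study data.

```r
fruit <- c("apple", "pear", "apple", "plum", "apple", "pear")  # hypothetical data

counts <- table(fruit)            # apple 3, pear 2, plum 1 (sorted alphabetically)
props <- counts / sum(counts)     # proportions (sum to 1)
pcts <- 100 * counts / sum(counts)  # percentages (sum to 100)
prop.table(counts)                # built-in shortcut for the proportions
```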
19f. Calculate the mean of the sex vector. What happens and why?
19g. What was the mean of the before times? Assign the result to a scalar object called before.mean.
19h. What was the mean of the after times? Assign the result to a scalar object called after.mean.
19i. What was the difference between the mean before time and the mean after time? Calculate this in two ways: once using the change vector, and once using the before.mean and after.mean objects. (Verify that you obtain the same answer both ways.)
Standardizing variables (via z-scores)
Standardizing variables makes them comparable. For instance, the z-scores of a vector \(v\) of values \(x_{i}\) are computed by subtracting the mean \(m(v)\) from every value \(x_{i}\) and then dividing the result by the vector's standard deviation \(sd(v)\), i.e., \(z_{i} = \frac{x_{i} - m(v)}{sd(v)}\). Standardizing multiple variables puts them on a common scale (i.e., all transformed variables have the same mean and the same standard deviation).
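The computation described above takes one line of vector arithmetic in R. The sketch below applies it to a hypothetical vector v and cross-checks the result against base R's scale(), which performs the same centering and scaling but returns a one-column matrix.

```r
v <- c(2, 4, 4, 4, 5, 5, 7, 9)  # hypothetical values

z <- (v - mean(v)) / sd(v)      # subtract the mean, divide by the SD

# scale() performs the same standardization (it returns a matrix):
z.scale <- as.vector(scale(v))
all.equal(z, z.scale)  # TRUE
```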
20a. Create a vector called before.z as a standardized version of before.
20b. Create a vector called after.z as a standardized version of after.
20c. What was the largest before score? What was its corresponding z-score?
20d. What was the smallest after score? What was its corresponding z-score?
20e. What should the mean and standard deviation of before.z and after.z be? Test your predictions by carrying out the appropriate calculations.
Checkpoint 3
If you made it to this point you’re doing a great job. Consider the following a tasty bonus task…
C. Bonus:
The Room with 100 Boxes
21. Imagine the following: You enter a room with 100 closed boxes. 99 of the 100 boxes each contain EUR 10,000, which you can keep if you open the box. However, 1 of the 100 boxes contains a bomb, which kills you if you open it.
Here’s your question: If you walked into the room with 100 closed boxes, how many would you want to open?
You can easily play the Room with 100 Boxes game using the sample() function in R.
First, assign the number of boxes you want to open to a new scalar object called i.will.open:
# How many do you want to open?
i.will.open <- 0 # To play, change the value from 0 to a number from 1 to 100.
Now run the following code to see what you get:
# Play the Room with 100 Boxes Game:
n <- 100 # number of boxes
# Define the set of possible boxes (n-1 contain 10000 each, but 1 contains -Inf):
boxes <- c(rep(10000, (n-1)), -Inf)
boxes
# Draw a random sample of size i.will.open from the boxes:
boxes.result <- sample(x = boxes, size = i.will.open)
# Alternative solution:
shuffled.boxes <- sample(boxes, n) # permute boxes
if (i.will.open > 0) {
  boxes.result <- shuffled.boxes[1:i.will.open] # open the first i.will.open boxes
}
# Print your results:
boxes.result # show what's in each box (in EUR)
sum(boxes.result) # your total winnings (in EUR)
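Rather than playing once, you can estimate how risky a given choice is by repeating the game many times with replicate(). In the sketch below, opening k = 10 boxes, 10,000 repetitions, the seed, and the name survived are all arbitrary illustrative choices. With one bomb among 100 boxes, the exact probability of surviving k opened boxes is (100 - k) / 100, which the simulated rate should approach.

```r
set.seed(2018)  # arbitrary seed

n <- 100
boxes <- c(rep(10000, (n - 1)), -Inf)  # same setup as above
k <- 10                                 # hypothetical number of boxes to open

# TRUE if a single game with k opened boxes avoids the bomb
survived <- replicate(10000, !(-Inf %in% sample(x = boxes, size = k)))

mean(survived)  # simulated survival rate
(n - k) / n     # exact survival probability: 0.9
```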
You can also represent the boxes game with a custom function in R. Run the following chunk to create the new function play.boxes.game():
# Run this chunk to create the function:
play.boxes.game <- function(i.will.open) {
  # Prevent exponent printing:
  options("scipen" = 100, "digits" = 4)
  if (i.will.open == 0) { # Case 0: No play
    print("You haven't opened any boxes. You earned nothing but are still alive.")
  }
  if (i.will.open > 0) {
    boxes <- c(rep(10000, 99), -Inf) # fill boxes
    boxes.result <- sample(x = boxes, size = i.will.open) # draw sample
    # Evaluate boxes.result:
    if (-Inf %in% boxes.result) { # Case 1: Bad luck
      print(paste("You're dead! You opened ", i.will.open,
                  " boxes and got the bomb!", sep = ""))
    } else { # Case 2: Lucky draw
      print(paste("Congratulations! You opened ", i.will.open,
                  " boxes and earned EUR ", sum(boxes.result), ". ",
                  "Do you want to play again? :)", sep = ""))
    }
  }
}
Now play the game a few times by running the play.boxes.game() function with the number of boxes you want to open as its only argument! For instance:
play.boxes.game(0) # Play by opening 0 boxes...
play.boxes.game(3) # Play by opening 3 boxes...
That’s it – now it’s time to submit your assignment!
Save and submit your (commented) script or output file (including all code) to the appropriate folder on Ilias by Wednesday, 23:59.
[WPA01.Rmd updated on 2017-10-30 09:05:55 by hn.]