Answers for WPA01 of Basic data and decision analysis in R, taught at the University of Konstanz in Winter 2017/2018.
Instructions
To complete and submit these exercises, please remember and do the following:
Your WPAs can be written and submitted either as scripts of commented code (as
.R
or.Rmd
files) or as reproducible documents that combine text with code (in.html
or.pdf
formats).A simple
.Rmd
template is provided here.Alternatively, open a plain R script and save it as
LastnameFirstname_WPA##_yymmdd.R
.
Also enter the current assignment (e.g., WPA01), your name, and the current date at the top of your document. When working on a task, always indicate which task you are answering with appopriate comments.
Here is an example how your file JillsomeJack_WPA01_161031.Rmd
could look:
# Assignment: WPA 01
# Name: Jillsome, Jack
# Date: 2017 October 30
# ~~~~~~~~~~~~~~~~~~~~~~~~~~
# Exercise 1:
# Adding numbers:
s <- 1 + 2
# ~~~~~~~~~~~~~~~~~~~~~~~~~~
# Exercise 2:
# Draw 100 samples and then conduct a t-test:
x <- rnorm(100) # create vector of samples from a random distribution
t.test(x) # t-test on the sample
# ~~~~~~~~~~~~~~~~~~~~~~~~~~
# Exercise 3:
# etc. ...
Complete as many exercises as you can [by the deadline].
Submit your script or output file (including all code) to the appropriate folder on Ilias before midnight.
A. In Class
Basic concepts
1. You learned that object names in R are case-sensitive. If that’s the case, why does stuff == STUFF
evaluate to TRUE
in the following code?
plunder <- pi
stuff <- 1 # l1
Stuff <- stuff * plunder # l2
STUFF <- Stuff / plunder # l3
stuff == STUFF
#> [1] TRUE
# Answer: stuff, Stuff and STUFF are different objects,
# but both were assigned to the same number (here: 1),
# as l2 and l3 combine to STUFF <- stuff * plunder / plunder
# CAVEAT: Note what happens if l1 is slightly modified:
stuff <- 1 + 10 # l1
Stuff <- stuff * plunder # l2
STUFF <- Stuff / plunder # l3
stuff == STUFF
#> [1] FALSE
# Thus, when testing for equality of numerical results, use:
all.equal(stuff, STUFF)
#> [1] TRUE
2. Your captain claims that the number 1 and the result of 999\(^{9}\) have the same length. Show that your captain is correct (of course) and explain why in this particular case.
# In R, numbers are scalars and represented as vectors of length 1,
# irrespective of their numeric value:
length(1)
#> [1] 1
length(999^9)
#> [1] 1
3. To calculate the arithmetic mean of -3, -2, -1, 0, 1, 2, 3 in R, you use the function mean()
. What does the following code return and why?
mean(-3, -2, -1, 0, 1, 2, 3)
#> [1] -3
Correct the code to return the desired value.
mean(-3, -2, -1, 0, 1, 2, 3) # Problem: 1st argument is a scalar (-3), rather than a vector: => -3
#> [1] -3
mean(c(-3, -2, -1, 0, 1, 2, 3)) # is the intended version: => 0
#> [1] 0
mean(-3:3) # is a shorter version of the intended version: => 0
#> [1] 0
Sequence generation
4. Predict the sequence generated by seq(from = 0, to = 10, length.out = 10)
. Then change one argument to yield the same result as 0:10
.
# Evaluation (to test prediction):
x <- seq(from = 0, to = 10, length.out = 10)
x
#> [1] 0.000000 1.111111 2.222222 3.333333 4.444444 5.555556 6.666667
#> [8] 7.777778 8.888889 10.000000
# Correction:
y <- seq(from = 0, to = 10, length.out = 11)
y
#> [1] 0 1 2 3 4 5 6 7 8 9 10
# Alternative correction:
z <- seq(from = 0, to = 10, by = 1)
z
#> [1] 0 1 2 3 4 5 6 7 8 9 10
all.equal(0:10, y, z) # testing the equality of vectors y and z
#> [1] TRUE
5. Create 3 vectors of the integers from 1 to 10 by using 3 different commands.
v1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
v2 <- 1:10
v3 <- seq(from = 1, to = 10)
all.equal(v1, v2, v3) # testing the equality of vectors
#> [1] TRUE
6. Predict the sequences generated by pi:10
and 10:pi
. Then check your prediction and explain what happened.
pi:10 # starts at (real number) pi and increments by 1
#> [1] 3.141593 4.141593 5.141593 6.141593 7.141593 8.141593 9.141593
10:pi # starts at (integer) 10 and decrements by 1
#> [1] 10 9 8 7 6 5 4
Sampling from sets of values
7. You want to simulate 3 flips of a fair coin in R. Explain what is wrong with the following code and correct it.
coin <- c(heads, tails)
sample(x = coin, size = 3)
# There are two problems:
# 1. The coin cannot be assigned unless the objects heads and tails exist
# or are denoted as characters.
# 2. As sample() draws without replacement by default (replace = FALSE),
# the size of a sample cannot exceed the length of x.
# Corrections:
coin <- c("heads", "tails") # character vector
sample(x = coin, size = 3, replace = TRUE) # drawing with replacement
8a. Create a fair dice (with possible outcomes from 1 to 6) and determine the arithmetic mean and standard deviation of throwing it 10,000 times.
# Fair dice:
dice <- 1:6
N <- 10000
out <- sample(dice, size = N, replace = TRUE)
mean(out)
#> [1] 3.4882
sd(out)
#> [1] 1.699809
8b. Now imagine the dice was manipulated so that throwing a 6 is twice as likely as each of the other numbers. How does this change the arithmetic mean and standard deviation of throwing it 10,000 times?
# Unfair dice:
probs <- c(rep(1/7, 5), 2/7)
out2 <- sample(dice, prob = probs, size = N, replace = TRUE)
mean(out2)
#> [1] 3.8663
sd(out2)
#> [1] 1.803483
mean.dif <- mean(out2) - mean(out) # difference
sd.dif <- sd(out2) - sd(out)
Manipulating the dice increased its mean by 0.378 and its standard deviation by 0.104.
9. The most popular German lottery is known as 6 aus 49, in which a total of 7 numbers are randomly drawn: First, 6 unique numbers are randomly drawn out of the numbers from 1 to 49. Second, a single-digit “Superzahl” between 0 and 9. Simulate this lottery and run it once.
set.seed(100) # for reproducible results
six.n <- sort(sample(1:49, size = 6)) # 1. Draw 6 out of 1:49 (without replacement)
super.n <- sample(0:9, size = 1) # 2. Draw 1 out of 0:9
lottery <- c(six.n, "Super = ", super.n) # Combine both draws.
lottery
#> [1] "3" "13" "16" "22" "26" "45"
#> [7] "Super = " "8"
Drawing random values
Drawing random values from (normal or uniform) distributions:
You can create random samples of values from a Normal distribution using the rnorm(n, mean, sd)
function. For example, the following code will create a vector of 50 values from a Normal distribution with mean of 100 and standard deviation of 10:
# Random sample of 50 values from N(mean = 20, sd = 10)
x <- rnorm(n = 50, mean = 100, sd = 10)
Similarly, you can create random samples of values from a Uniform distribution using the runif(n, min, max)
function. For example, the following code will create a vector of 50 values from a Uniform distribution with minimum value of 60 and a maximum value of 140:
# Random sample of 50 uniform values between 10 and 100:
y <- runif(n = 50, min = 60, max = 140)
10. The following graph shows four distributions:
Draw 10 random samples from each distribution and round them to the nearest integer (using round(x, 0)
).
n <- 10
n1 <- rnorm(n, mean = -4, sd = 2)
n2 <- rnorm(n, mean = 5, sd = 1)
u1 <- runif(n, min = -3, max = 1)
u2 <- runif(n, min = -2, max = 7)
round(n1, 0)
#> [1] -3 -5 -3 -3 -3 -2 -5 -1 0 -2
round(n2, 0)
#> [1] 5 5 4 5 5 5 6 4 5 6
round(u1, 0)
#> [1] -1 -2 -2 0 -1 -2 -2 1 -2 -2
round(u2, 0)
#> [1] 3 1 1 -1 7 -1 3 6 7 2
11. Show that the mean of random values from a uniform distribution from min
to max
approximates the midpoint between min
and max
for large samples. (Hint: Choose arbitrary values for min
and max
and compare their midpoint to the mean for increasingly large samples from the corresponding uniform distribution.)
min <- sample(-100:-1, 1) # a random integer from -100 to -1.
max <- sample(+1:+100, 1) # a random integer from +1 to +100.
mid <- mean(c(min, max)) # the midpoint between min and max.
# The absolute difference between mean and mid
# approximates 0 for increasingly large samples:
abs(mean(runif(n = 10^1, min, max)) - mid)
#> [1] 0.6858565
abs(mean(runif(n = 10^3, min, max)) - mid)
#> [1] 0.1191256
abs(mean(runif(n = 10^6, min, max)) - mid)
#> [1] 0.003440042
12. The following tasks compare two different random samples from a Normal distribution with a mean of 100 and a standard deviation of 10.
12a. Create a vector called samp.10
that contains 10 samples from a Normal distribution with a mean of 100 and a standard deviation of 10.
samp.10 <- rnorm(n = 10, mean = 100, sd = 10)
12b. Create a vector called samp.100000
that contains 100,000 samples from the same Normal distribution as above (that is, also with a mean of 100 and standard deviation of 10).
samp.100000 <- rnorm(n = 100000, mean = 100, sd = 10)
12c. Before making any calculations, what would you guess the mean and standard deviations of samp.10
and samp.100000
are? Which prediction are you more confident in?
# My best guess for both means is 100 and for both SDs is 10.
# However, I'm more confident in samp.100000
# because of its larger sample size.
12d. Now calculate the mean and standard deviations of samp.10
and samp.100000
separately. Was your prediction correct?
mean(samp.10)
#> [1] 101.9955
sd(samp.10)
#> [1] 8.251404
mean(samp.100000)
#> [1] 99.93852
sd(samp.100000)
#> [1] 10.01994
# Conclusion:
# Yes, my prediction was correct (at least for this sample...).
Computing numeric vectors
13. Create a vector that contains all numbers between 1,000 and 2,000 that are multiples of 17. (Hint: Find the 1st element in this range by using the %%
operator to check whether a number is divisible by 17.)
1000:1017 %% 17 # signals the 1st multiple of 17 in this range (by returning 0)
#> [1] 14 15 16 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
1003 %% 17 # verifies that 1003 is the 1st number
#> [1] 0
s <- seq(from = 1003, to = 2000, by = 17)
s # shows the solution
#> [1] 1003 1020 1037 1054 1071 1088 1105 1122 1139 1156 1173 1190 1207 1224
#> [15] 1241 1258 1275 1292 1309 1326 1343 1360 1377 1394 1411 1428 1445 1462
#> [29] 1479 1496 1513 1530 1547 1564 1581 1598 1615 1632 1649 1666 1683 1700
#> [43] 1717 1734 1751 1768 1785 1802 1819 1836 1853 1870 1887 1904 1921 1938
#> [57] 1955 1972 1989
s %% 17 # verifies that all elements in s are divisible by 17
#> [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [36] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14. Suppose you can save EUR 10 this month, twice as much next month, and again twice as much the next month, etc. If this holds up, how much will you have saved after one year?
months <- 1:12 # vector of months
initial <- 10 # initial value
savings <- initial * 2^(months - 1)
savings # vector each month's savings.
#> [1] 10 20 40 80 160 320 640 1280 2560 5120 10240
#> [12] 20480
sum(savings) # yields the total amount saved:
#> [1] 40950
Checkpoint 1
At this point you completed all basic exercises. This is good, but keep carrying on…
B. At Home
Shuffling an ordered sequence
15a. Create a vector shuffled.deck
that shuffles (or contains a random permutation of) a deck of 32 cards. (Hint: Create a vector deck
containing the sequence of numbers from 1 to 32 and then use sample()
to draw all items from it.)
n <- 32
deck <- 1:n
shuffled.deck <- sample(deck, n)
shuffled.deck
#> [1] 17 21 10 5 31 4 15 30 3 28 25 19 16 1 32 20 24 2 9 26 13 18 7
#> [24] 12 23 11 8 27 29 22 6 14
15b. Check that there are 32 unique()
elements in shuffled.deck
!
length(shuffled.deck) # => length is equal to n
#> [1] 32
all.equal(unique(shuffled.deck), shuffled.deck) # all elements are unique
#> [1] TRUE
Assigning participants to conditions
16. Your new experiment has 3 treatment conditions A
, B
, and placebo
. The following exercises deal with different ways of assigning participants to experimental treatments.
16a. Assign 15 participants to the 3 treatments in the (non-random) order in which they arrive at the laboratory. (Hint: Create a vector cond
that contains the 3 treatment conditions and repeat it to create a vector of assignments.)
cond <- c("A", "B", "placebo")
assign <- rep(cond, length.out = 15)
assign <- rep(cond, times = 5) # yields same (as length(cond) => 5)
assign
#> [1] "A" "B" "placebo" "A" "B" "placebo" "A"
#> [8] "B" "placebo" "A" "B" "placebo" "A" "B"
#> [15] "placebo"
16b. Randomly assign another 15 participants to the same 3 treatment conditions using the sample
function.
set.seed(11) # for reproducible results
rand.assign <- sample(cond, 15, replace = TRUE) # sample 15 items from cond (with replacement)
rand.assign
#> [1] "A" "A" "B" "A" "A" "placebo" "A"
#> [8] "A" "placebo" "A" "A" "B" "placebo" "placebo"
#> [15] "placebo"
16c. A frequent nuisance with a truly random assignment is that different numbers of participants end up receiving the different treatments. To avoid this, create a random sequence of 15 assignments that is guaranteed to have an equal number of participants (i.e., exactly 5) in each treatment condition.
cond.x.groups <- rep(cond, 5) # create a new set by repeating cond 5 times
equal.assign <- sample(cond.x.groups, 15, replace = FALSE) # sample all 15 items from this set (without replacement)
equal.assign
#> [1] "placebo" "A" "B" "B" "placebo" "placebo" "B"
#> [8] "A" "B" "A" "placebo" "placebo" "B" "A"
#> [15] "A"
16d. For another set of 15 participants, make sure that you assign every triple of participants to all 3 conditions, but do this randomly for the 1st and 2nd participant of every triple.
set.seed(22) # for reproducible results
pseudo.rand.assign <- rep(sample(cond, 3), 5)
pseudo.rand.assign # only repeats the SAME sequence 5 times.
#> [1] "A" "placebo" "B" "A" "placebo" "B" "A"
#> [8] "placebo" "B" "A" "placebo" "B" "A" "placebo"
#> [15] "B"
set.seed(33) # for reproducible results
more.rand.assign <- c(sample(cond, 3), sample(cond, 3), sample(cond, 3), sample(cond, 3), sample(cond, 3))
more.rand.assign
#> [1] "B" "A" "placebo" "placebo" "B" "A" "B"
#> [8] "A" "placebo" "A" "B" "placebo" "A" "placebo"
#> [15] "B"
Drinking non-alcoholic beer
17. Does drinking non-alcoholic beer affect cognitive performance?
A psychologist has a theory that some of the negative cognitive effects of alcohol are the result of psychological rather than physiological processes. To test this, she has 12 participants perform a cognitive test before and after drinking non-alcoholic beer which was labelled to contain 5% of alcohol. Results from the study, including some demographic data, are presented in the following table. Note that higher scores on the test indicate better performance.
participant | before | after | age | sex | eye.color |
---|---|---|---|---|---|
1 | 45 | 43 | 20 | male | blue |
2 | 49 | 50 | 19 | female | blue |
3 | 40 | 61 | 22 | male | brown |
4 | 48 | 44 | 20 | female | brown |
5 | 44 | 45 | 27 | male | blue |
6 | 70 | 20 | 22 | female | blue |
7 | 90 | 85 | 22 | male | brown |
8 | 75 | 65 | 20 | female | brown |
9 | 80 | 72 | 25 | male | blue |
10 | 65 | 65 | 22 | female | blue |
11 | 80 | 70 | 24 | male | brown |
12 | 52 | 75 | 22 | female | brown |
Creating vectors from scratch
17a. Create a vector of the participant data called participant
using the c()
function.
participant <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
17b. Now, create the participant
vector again, but this time use the a:b
function.
participant <- 1:12
17c. Now create the participant
vector again, but this time using the seq()
function.
participant <- seq(from = 1, to = 12, by = 1)
17d. Create a vector of the before drink data called before
using c()
.
before <- c(45, 49, 40, 48, 44, 70, 90, 75, 80, 65, 80, 52)
17e. Create a vector of the after drink data called after
using c()
.
after <- c(43, 50, 61, 44, 45, 20, 85, 65, 72, 65, 70, 75)
17f. Create a vector of the age data called age
using c()
.
age <- c(20, 19, 22, 20, 27, 22, 22, 20, 25, 22, 24, 22)
17g. Create a vector of the sex data called sex
but don’t use c()
. Instead, use the rep()
function by looking for an existing pattern in the data (above).
sex <- rep(c("male", "female"), length.out = 12)
# OR
sex <- rep(c("male", "female"), times = 6)
17h. Create a vector of the eye color data called eye.color
but don’t use c()
. Instead, use the rep()
function by looking for the pattern in the data (above).
# Here are 3 different solutions:
eye.color <- rep(c("blue", "brown"), each = 2, times = 3)
eye.color <- rep(c("blue", "brown"), each = 2, length.out = 12)
eye.color <- rep(c("blue", "blue", "brown", "brown"), times = 3)
Combining and changing vectors
18a. Create a new vector called age.months
that shows the participants’ age in months instead of years. (Hint: Use basic vector arithmetic.)
age.months <- age * 12
18b. Create a new vector called change
that shows the change in participants’ scores from before to after (Hint: Use basic vector arithmetic.)
change <- after - before
change
#> [1] -2 1 21 -4 1 -50 -5 -10 -8 0 -10 23
18c. Create a new vector called average
that shows the participants’ average score across both tests. That is, the first element of average
should be the average of the first participant’s two scores, and the second element should be the average of the second participant’s two scores. (Hint: Don’t use mean()
– use basic vector arithmetic.)
average <- (after + before) / 2
18d. Oops! It turns out that the watch used to measure time was off. All the before
times are 1 second too fast, and all the after
times are 1 second too slow. Correct them!
before <- before + 1
after <- after - 1
Checkpoint 2
If you got this far you are doing very well! Try to keep it up for a little longer…
Applying functions to vectors
19a. How many elements are in each of the original data vectors? (Hint: use length()
). If the number of elements in each is not the same, you made an error somewhere.
# Comparing vector lengths:
length(participant)
#> [1] 12
length(before)
#> [1] 12
length(after)
#> [1] 12
length(age)
#> [1] 12
length(sex)
#> [1] 12
length(eye.color)
#> [1] 12
19b. What was the standard deviation of ages? Assign the result to a scalar object called age.sd
.
age.sd <- sd(age)
19c. What is the median age? Assign the result to a scalar object called age.median
.
age.median <- median(age)
19d. How many people were there of each sex? (Hint: Use table()
.)
table(sex)
#> sex
#> female male
#> 6 6
19e. What percent of people were of each sex? (Hint: Use table()
and divide by its sum()
to get a percentage.)
table(sex) / sum(table(sex))
#> sex
#> female male
#> 0.5 0.5
# OR:
table(sex) / length(sex)
#> sex
#> female male
#> 0.5 0.5
19f. Calculate the mean of the sex
column. What happens and why?
# We get an NA value because sex is not a numeric vector!
mean(sex)
#> [1] NA
19g. What was the mean before
time? Assign the result to a scalar object called before.mean
.
before.mean <- mean(before)
19h. What was the mean after
time? Assign the result to a scalar object called after.mean
.
after.mean <- mean(after)
19i. What was the difference in the mean before
times and the mean after
times? Calculate this in two ways: once using the change
vector, and once using the before.mean
and after.mean
objects. (Verify that you obtain the same answer for both.)
after.mean - before.mean
#> [1] -5.583333
# We have to re-create the change vector because we updated after and before!
change <- after - before
mean(change)
#> [1] -5.583333
Standardizing variables (via z-scores)
Standardizing variables makes them comparable. For instance, the z-scores of a vector \(v\) of values \(x_{i}\) are computed by subtracting the mean \(m(v)\) from every value \(x_{i}\), and then dividing the result by the vector’s standard deviation \(sd(v)\). Standardizing multiple variables puts them on a common scale (i.e., transformed variables have the same mean and standard deviations).
20a. Create a vector called before.z
as a standardized version of before
.
before.z <- (before - mean(before)) / sd(before)
20b. Create a vector called after.z
showing a standardized version of after
.
after.z <- (after - mean(after)) / sd(after)
20c. What was the largest before
score? What was its corresponding z-score?
max(before)
#> [1] 91
max(before.z)
#> [1] 1.662411
20d. What was the smallest after
score? What was its corresponding z-score?
min(after)
#> [1] 19
min(after.z)
#> [1] -2.106802
20e. What should the mean and standard deviation of before.z
and after.z
be? Test your predictions by carrying out the appropriate calculations.
# For z-scores, the mean should be 0 the SD should be 1:
mean(before.z)
#> [1] 1.619979e-17
sd(before.z)
#> [1] 1
mean(after.z)
#> [1] 1.526557e-16
sd(after.z)
#> [1] 1
Checkpoint 3
If you made it to this point you’re doing a great job. Consider the following a tasty bonus task…
C. Bonus:
The Room with 100 Boxes
21. Imagine the following: You enter a room with 100 closed boxes. 99 of the 100 boxes each contain EUR 10,000 which you can keep if you open the box. However, 1 of the 100 boxes contains a bomb which kills you if you open it.
Here’s your question: If you walked into the room with 100 closed boxes, how many would you want to open?
You can easily play the Room with 100 boxes game using the sample()
function in R.
First, assign the number of boxes you want to open to a new scalar object called i.will.open
:
# How many do you want to open?
i.will.open <- 0 # To play, change the value from 0 to a number from 1 to 100.
Now run the following code to see what you get:
# Play the Room with 100 Boxes Game:
n <- 100 # number of boxes
# Define the set of possible boxes (n-1 contain 10000 each, but 1 contains -Inf):
boxes <- c(rep(10000, (n-1)), -Inf)
boxes
# Draw a random sample of size i.will.open from the boxes:
boxes.result <- sample(x = boxes, size = i.will.open)
# Alternative solution:
shuffled.boxes <- sample(boxes, n) # permute boxes
if (i.will.open > 0) {
boxes.result <- shuffled.boxes[1:i.will.open]} # open the first i.will.open boxes
# Print your results:
boxes.result # show what's in each box (in EUR)
sum(boxes.result) # your total winnings (in EUR)
You can also represent the boxes game with a custom function in R. Run the following chunk to create the new function play.boxes.game()
:
# Run this chunk to create the function:
play.boxes.game <- function(i.will.open) {
# Prevent exponent printing:
options("scipen" = 100, "digits" = 4)
if(i.will.open == 0) { # Case 0: No play
print("You haven't opened any boxes. You earned nothing but are still alive.")
}
if(i.will.open > 0) {
boxes <- c(rep(10000, 99), -Inf) # fill boxes
boxes.result <- sample(x = boxes, size = i.will.open) # draw sample
# Evaluate boxes.result:
if(-Inf %in% boxes.result) { # Case 1: Bad luck
print(paste("You're dead! You opened ", i.will.open,
" boxes and got the bomb!", sep = ""))}
if((-Inf %in% boxes.result) == FALSE) { # Case 2: Lucky draw
print(paste("Congratulations! You opened ", i.will.open,
" boxes and earned EUR ", sum(boxes.result), ". ",
" Do you want to play again? :)", sep = ""))}
}
}
Now play the game a few times just by running the play.boxes.game()
function with the number of boxes you want to open as its only argument! For instance:
play.boxes.game(0) # Play by opening 0 boxes...
#> [1] "You haven't opened any boxes. You earned nothing but are still alive."
play.boxes.game(3) # Play by opening 3 boxes...
#> [1] "Congratulations! You opened 3 boxes and earned EUR 30000. Do you want to play again? :)"
That’s it – hope you enjoyed working on this assignment!
[WPA01_answers.Rmd
updated on 2017-11-02 11:45:22 by hn.]