
WPA01 of Basic data and decision analysis in R, taught at the University of Konstanz in Winter 2017/2018.


Instructions

To complete and submit these exercises, please remember and do the following:

  1. Your WPAs can be written and submitted either as scripts of commented code (as .R or .Rmd files) or as reproducible documents that combine text with code (in .html or .pdf formats).

    • A simple .Rmd template is provided here.

    • Alternatively, open a plain R script and save it as LastnameFirstname_WPA##_yymmdd.R.

  2. Also enter the current assignment (e.g., WPA01), your name, and the current date at the top of your document. When working on a task, always indicate which task you are answering with appropriate comments.

Here is an example of how your file JillsomeJack_WPA01_171030.Rmd could look:

# Assignment: WPA 01
# Name: Jillsome, Jack
# Date: 2017 October 30
# ~~~~~~~~~~~~~~~~~~~~~~~~~~
# Exercise 1: 

# Adding numbers: 
s <- 1 + 2 

# ~~~~~~~~~~~~~~~~~~~~~~~~~~
# Exercise 2: 

# Draw 100 samples and then conduct a t-test: 

x <- rnorm(100) # create vector of samples from a random distribution
t.test(x)       # t-test on the sample

# ~~~~~~~~~~~~~~~~~~~~~~~~~~
# Exercise 3: 
# etc. ...
  3. Complete as many exercises as you can by Wednesday (23:59).

  4. Submit your script or output file (including all code) to the appropriate folder on Ilias before midnight.


A. In Class

Basic concepts

1. You learned that object names in R are case-sensitive. If that’s the case, why does stuff == STUFF evaluate to TRUE in the following code?

plunder <- pi
stuff <- 1               # l1
Stuff <- stuff * plunder # l2
STUFF <- Stuff / plunder # l3
stuff == STUFF

2. Your captain claims that the number 1 and the result of 999\(^{9}\) have the same length. Show that your captain is correct (of course) and explain why this is so in this particular case.

3. To calculate the arithmetic mean of -3, -2, -1, 0, 1, 2, 3 in R, you use the function mean(). What does the following code return and why?

mean(-3, -2, -1, 0, 1, 2, 3)

Correct the code to return the desired value.

Sequence generation

4. Predict the sequence generated by seq(from = 0, to = 10, length.out = 10). Then change one argument to yield the same result as 0:10.

5. Create 3 vectors of the integers from 1 to 10 by using 3 different commands.

6. Predict the sequences generated by pi:10 and 10:pi. Then check your prediction and explain what happened.

Sampling from sets of values

7. You want to simulate 3 flips of a fair coin in R. Explain what is wrong with the following code and correct it.

coin <- c(heads, tails)
sample(x = coin, size = 3)

8a. Create a fair die (with possible outcomes from 1 to 6) and determine the arithmetic mean and standard deviation of the outcomes of throwing it 10,000 times.

8b. Now imagine the die was manipulated so that throwing a 6 is twice as likely as each of the other numbers. How does this change the arithmetic mean and standard deviation of the outcomes of throwing it 10,000 times?

9. The most popular German lottery is known as 6 aus 49, in which a total of 7 numbers are randomly drawn: First, 6 unique numbers are randomly drawn out of the numbers from 1 to 49. Second, a single-digit “Superzahl” between 0 and 9 is drawn. Simulate this lottery and run it once.

Drawing random values

Drawing random values from (normal or uniform) distributions:

You can create random samples of values from a Normal distribution using the rnorm(n, mean, sd) function. For example, the following code will create a vector of 50 values from a Normal distribution with mean of 100 and standard deviation of 10:

# Random sample of 50 values from N(mean = 100, sd = 10)
x <- rnorm(n = 50, mean = 100, sd = 10)

Similarly, you can create random samples of values from a Uniform distribution using the runif(n, min, max) function. For example, the following code will create a vector of 50 values from a Uniform distribution with minimum value of 60 and a maximum value of 140:

# Random sample of 50 uniform values between 60 and 140:
y <- runif(n = 50, min = 60, max = 140)
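
Because both functions draw random values, every run yields different numbers. The following minimal sketch (not part of the original assignment) fixes the random seed with set.seed() to make a draw reproducible and then inspects the results. The seed value 2017 is an arbitrary assumption.

```r
# Assumption: the seed value 2017 is arbitrary; any integer works.
set.seed(2017)

x <- rnorm(n = 50, mean = 100, sd = 10)  # 50 values from N(mean = 100, sd = 10)
y <- runif(n = 50, min = 60, max = 140)  # 50 uniform values between 60 and 140

length(x)          # number of values drawn
range(y)           # smallest and largest value; both lie between 60 and 140
head(round(x, 1))  # first few values, rounded to 1 decimal place
```

Re-running the whole chunk (including the set.seed() call) should reproduce the same values.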

10. The following graph shows four distributions:

Draw 10 random samples from each distribution and round them to the nearest integer (using round(x, 0)).

11. Show that the mean of random values from a uniform distribution from min to max approximates the midpoint between min and max for large samples. (Hint: Choose arbitrary values for min and max and compare their midpoint to the mean for increasingly large samples from the corresponding uniform distribution.)

12. The following tasks compare two different random samples from a Normal distribution with a mean of 100 and a standard deviation of 10.

12a. Create a vector called samp.10 that contains 10 samples from a Normal distribution with a mean of 100 and a standard deviation of 10.

12b. Create a vector called samp.100000 that contains 100,000 samples from the same Normal distribution as above (that is, also with a mean of 100 and standard deviation of 10).

12c. Before making any calculations, what would you guess the mean and standard deviations of samp.10 and samp.100000 are? Which prediction are you more confident in?

12d. Now calculate the mean and standard deviations of samp.10 and samp.100000 separately. Was your prediction correct?

Computing numeric vectors

13. Create a vector that contains all numbers between 1,000 and 2,000 that are multiples of 17. (Hint: Find the 1st element in this range by using the %% operator to check whether a number is divisible by 17.)

14. Suppose you can save EUR 10 this month, twice as much next month, and again twice as much the next month, etc. If this holds up, how much will you have saved after one year?

Checkpoint 1

At this point, you have completed all basic exercises. Good work, but keep going…

B. At Home

Shuffling an ordered sequence

15a. Create a vector shuffled.deck that shuffles (or contains a random permutation of) a deck of 32 cards. (Hint: Create a vector deck containing the sequence of numbers from 1 to 32 and then use sample() to draw all items from it.)

15b. Check that there are 32 unique() elements in shuffled.deck!

Assigning participants to conditions

16. Your new experiment has 3 treatment conditions A, B, and placebo. The following exercises deal with different ways of assigning participants to experimental treatments.

16a. Assign 15 participants to the 3 treatments in the (non-random) order in which they arrive at the laboratory. (Hint: Create a vector cond that contains the 3 treatment conditions and repeat it to create a vector of assignments.)

16b. Randomly assign another 15 participants to the same 3 treatment conditions using the sample function.

16c. A frequent nuisance with a truly random assignment is that different numbers of participants end up receiving the different treatments. To avoid this, create a random sequence of 15 assignments that is guaranteed to have an equal number of participants (i.e., exactly 5) in each treatment condition.

16d. For another set of 15 participants, make sure that you assign every triple of participants to all 3 conditions, but do this randomly for the 1st and 2nd participant of every triple.

Drinking non-alcoholic beer

17. Does drinking non-alcoholic beer affect cognitive performance?

A psychologist has a theory that some of the negative cognitive effects of alcohol are the result of psychological rather than physiological processes. To test this, she has 12 participants perform a cognitive test before and after drinking non-alcoholic beer that was labelled as containing 5% alcohol. Results from the study, including some demographic data, are presented in the following table. Note that higher scores on the test indicate better performance.

participant  before  after  age  sex     eye.color
1            45      43     20   male    blue
2            49      50     19   female  blue
3            40      61     22   male    brown
4            48      44     20   female  brown
5            44      45     27   male    blue
6            70      20     22   female  blue
7            90      85     22   male    brown
8            75      65     20   female  brown
9            80      72     25   male    blue
10           65      65     22   female  blue
11           80      70     24   male    brown
12           52      75     22   female  brown

Creating vectors from scratch

17a. Create a vector of the participant data called participant using the c() function.

17b. Now create the participant vector again, but this time use the colon operator a:b.

17c. Now create the participant vector again, but this time using the seq() function.

17d. Create a vector of the before drink data called before using c().

17e. Create a vector of the after drink data called after using c().

17f. Create a vector of the age data called age using c().

17g. Create a vector of the sex data called sex but don’t use c(). Instead, use the rep() function by looking for an existing pattern in the data (above).

17h. Create a vector of the eye color data called eye.color but don’t use c(). Instead, use the rep() function by looking for the pattern in the data (above).

Combining and changing vectors

18a. Create a new vector called age.months that shows the participants’ age in months instead of years. (Hint: Use basic vector arithmetic.)

18b. Create a new vector called change that shows the change in participants’ scores from before to after (Hint: Use basic vector arithmetic.)

18c. Create a new vector called average that shows the participants’ average score across both tests. That is, the first element of average should be the average of the first participant’s two scores, and the second element should be the average of the second participant’s two scores. (Hint: Don’t use mean() – use basic vector arithmetic.)

18d. Oops! It turns out that the watch used to measure time was off. All the before times are 1 second too fast, and all the after times are 1 second too slow. Correct them!

Checkpoint 2

If you got this far you are doing very well! Try to keep it up for a little longer…

Applying functions to vectors

19a. How many elements are in each of the original data vectors? (Hint: use length()). If the number of elements in each is not the same, you made an error somewhere.

19b. What was the standard deviation of ages? Assign the result to a scalar object called age.sd.

19c. What is the median age? Assign the result to a scalar object called age.median.

19d. How many people were there of each sex? (Hint: Use table().)

19e. What percent of people were of each sex? (Hint: Use table() and divide by its sum() to get proportions, then multiply by 100 to get percentages.)

19f. Calculate the mean of the sex column. What happens and why?

19g. What was the mean before score? Assign the result to a scalar object called before.mean.

19h. What was the mean after score? Assign the result to a scalar object called after.mean.

19i. What was the difference between the mean before score and the mean after score? Calculate this in two ways: once using the change vector, and once using the before.mean and after.mean objects. (Verify that you obtain the same answer for both.)

Standardizing variables (via z-scores)

Standardizing variables makes them comparable. For instance, the z-scores of a vector \(v\) of values \(x_{i}\) are computed by subtracting the mean \(m(v)\) from every value \(x_{i}\), and then dividing the result by the vector’s standard deviation \(sd(v)\). Standardizing multiple variables puts them on a common scale (i.e., the transformed variables have the same mean and standard deviation).
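
As a small worked illustration of this computation (a sketch using a hypothetical vector v that is not part of the study data), the values 2, 4, and 6 have a mean of 4 and a standard deviation of 2:

```r
v <- c(2, 4, 6)             # hypothetical example values (not from the study)
z <- (v - mean(v)) / sd(v)  # subtract the mean, then divide by the standard deviation
z                           # -1 0 1
```

The value that equals the mean receives a z-score of 0, and values one standard deviation below or above the mean receive -1 and 1, respectively.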

20a. Create a vector called before.z as a standardized version of before.

20b. Create a vector called after.z as a standardized version of after.

20c. What was the largest before score? What was its corresponding z-score?

20d. What was the smallest after score? What was its corresponding z-score?

20e. What should the mean and standard deviation of before.z and after.z be? Test your predictions by carrying out the appropriate calculations.

Checkpoint 3

If you made it to this point you’re doing a great job. Consider the following a tasty bonus task…

C. Bonus

The Room with 100 Boxes

21. Imagine the following: You enter a room with 100 closed boxes. 99 of the 100 boxes each contain EUR 10,000 which you can keep if you open the box. However, 1 of the 100 boxes contains a bomb which kills you if you open it.

Here’s your question: If you walked into the room with 100 closed boxes, how many would you want to open?

You can easily play the Room with 100 Boxes game using the sample() function in R.

First, assign the number of boxes you want to open to a new scalar object called i.will.open:

# How many do you want to open? 
i.will.open <- 0  # To play, change the value from 0 to a number from 1 to 100.

Now run the following code to see what you get:

# Play the Room with 100 Boxes Game:
n <- 100 # number of boxes

# Define the set of possible boxes (n-1 contain 10000 each, but 1 contains -Inf): 
boxes <- c(rep(10000, (n-1)), -Inf)
boxes

# Draw a random sample of size i.will.open from the boxes:
boxes.result <- sample(x = boxes, size = i.will.open)

# Alternative solution:
shuffled.boxes <- sample(boxes, n) # permute boxes
if (i.will.open > 0) {
  boxes.result <- shuffled.boxes[1:i.will.open] # open the first i.will.open boxes
}


# Print your results:
boxes.result      # show what's in each box (in EUR)
sum(boxes.result) # your total winnings (in EUR)

You can also represent the boxes game with a custom function in R. Run the following chunk to create the new function play.boxes.game():

# Run this chunk to create the function:
play.boxes.game <- function(i.will.open) {

  # Prevent exponent printing (and restore the previous options on exit):
  old.options <- options(scipen = 100, digits = 4)
  on.exit(options(old.options))

  if (i.will.open == 0) { # Case 0: No play
    print("You haven't opened any boxes. You earned nothing but are still alive.")
  }

  if (i.will.open > 0) {

    boxes <- c(rep(10000, 99), -Inf) # fill boxes
    boxes.result <- sample(x = boxes, size = i.will.open) # draw sample

    # Evaluate boxes.result:
    if (-Inf %in% boxes.result) { # Case 1: Bad luck

      print(paste("You're dead! You opened ", i.will.open,
                  " boxes and got the bomb!", sep = ""))

    } else { # Case 2: Lucky draw

      print(paste("Congratulations! You opened ", i.will.open,
                  " boxes and earned EUR ", sum(boxes.result), ". ",
                  "Do you want to play again? :)", sep = ""))
    }
  }
}

Now play the game a few times just by running the play.boxes.game() function with the number of boxes you want to open as its only argument! For instance:

play.boxes.game(0)  # Play by opening 0 boxes...
play.boxes.game(3)  # Play by opening 3 boxes...
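
A single game says little about the risk of a given choice. As a rough sketch (not part of the original assignment), the same sampling logic can be repeated many times to estimate how often a player survives. The names n.games, k, and survived, the seed, and the choice of k = 10 boxes are all illustrative assumptions.

```r
set.seed(1)       # assumption: arbitrary seed, used only for reproducibility
n.games <- 10000  # hypothetical number of simulated games
k <- 10           # hypothetical number of boxes opened in every game

# For each simulated game, record whether the bomb was avoided:
survived <- replicate(n.games, {
  boxes <- c(rep(10000, 99), -Inf)          # 99 prizes and 1 bomb
  !(-Inf %in% sample(x = boxes, size = k))  # TRUE if no bomb was drawn
})

mean(survived)  # estimated proportion of games survived
```

Changing k shows how the estimated proportion of survived games varies with the number of boxes opened.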

That’s it – now it’s time to submit your assignment!

Save and submit your (commented) script or output file (including all code) to the appropriate folder on Ilias by Wednesday, 23:59.


[WPA01.Rmd updated on 2017-10-30 09:05:55 by hn.]