WPA07 of Basic data and decision analysis in R, taught at the University of Konstanz in Winter 2017/2018.

To complete and submit these exercises, please remember and do the following:

Your WPAs should be written as scripts of commented code (as .Rmd files) and submitted as reproducible documents that combine text with code (in .html or .pdf formats).
- A simple .Rmd template is provided here.
- (Alternatively, open a plain R script and save it as LastnameFirstname_WPA##_yymmdd.R.)
Also enter the current assignment (e.g., WPA07), your name, and the current date at the top of your document. When working on a task, always indicate which task you are answering with appropriate comments.
Complete as many exercises as you can by Wednesday (23:59).
Submit your script or output file (including all code) to the appropriate folder on Ilias.

General guidelines

Read the following guidelines carefully to save yourself time in conducting analyses and reporting results in this WPA:

For each question, conduct the appropriate ANOVA and then formulate your conclusion in APA style. To summarize an effect in an ANOVA, use the format \(F(XXX, YYY) = FFF\), \(p = PPP\), where \(XXX\) is the degrees of freedom of the variable you are testing, \(YYY\) is the degrees of freedom of the residuals, \(FFF\) is the \(F\)-value for the variable you are testing, and \(PPP\) is the \(p\)-value.
p-values: If the \(p\)-value is less than .01, just write \(p < .01\). If the \(p\)-value of the ANOVA is less than .05, conduct post-hoc tests.
post-hoc tests: If you are only testing one independent variable, write the proper APA conclusions for the post-hoc test. If you are testing more than one independent variable in your ANOVA, you do not need to write the APA style conclusions for post-hoc tests – just print the result.

Here is an example:

Question: Was there an effect of diets on chicken weights (in the ChickWeight data set)?

# ANOVA on Chicken Weights: IV = Diet, DV = weight

# Conduct ANOVA:
p0.aov <- aov(formula = weight ~ Diet,
            data = ChickWeight)
summary(p0.aov)
#>              Df  Sum Sq Mean Sq F value   Pr(>F)    
#> Diet          3  155863   51954   10.81 6.43e-07 ***
#> Residuals   574 2758693    4806                     
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# ANOVA was significant (p < .01), so do post-hoc tests: 
TukeyHSD(p0.aov) # post-hoc tests
#>   Tukey multiple comparisons of means
#>     95% family-wise confidence level
#> 
#> Fit: aov(formula = weight ~ Diet, data = ChickWeight)
#> 
#> $Diet
#>          diff         lwr      upr     p adj
#> 2-1 19.971212  -0.2998092 40.24223 0.0552271
#> 3-1 40.304545  20.0335241 60.57557 0.0000025
#> 4-1 32.617257  12.2353820 52.99913 0.0002501
#> 3-2 20.333333  -2.7268370 43.39350 0.1058474
#> 4-2 12.646045 -10.5116315 35.80372 0.4954239
#> 4-3 -7.687288 -30.8449649 15.47039 0.8277810

Answer: There was a significant main effect of diets on chicken weights (\(F(3, 574) = 10.81\), \(p < .01\)). Pairwise Tukey HSD tests showed significant differences between diets 1 and 3 (diff = 40.30, \(p < .01\)) and diets 1 and 4 (diff = 32.62, \(p < .01\)). All other pairwise differences were not significant at the \(\alpha = .05\) significance level.
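
The values needed for such an APA statement can also be read off the ANOVA table in code. The following is only a minimal sketch for the example above (not a required part of this WPA); the object names tab, df.iv, df.res, f.val, p.val, p.txt and apa are chosen here for illustration:

```r
# Refit the example ANOVA from above:
p0.aov <- aov(formula = weight ~ Diet, data = ChickWeight)

# summary() returns a list whose first element is the ANOVA table:
tab <- summary(p0.aov)[[1]]

df.iv  <- tab[1, "Df"]       # df of the tested variable (Diet, row 1)
df.res <- tab[2, "Df"]       # df of the residuals (row 2)
f.val  <- tab[1, "F value"]  # F-value of Diet
p.val  <- tab[1, "Pr(>F)"]   # p-value of Diet

# Report "p < .01" for small p-values, otherwise the rounded value:
p.txt <- if (p.val < .01) "p < .01" else sprintf("p = %.2f", p.val)

apa <- sprintf("F(%d, %d) = %.2f, %s",
               as.integer(df.iv), as.integer(df.res), f.val, p.txt)
apa
#> [1] "F(3, 574) = 10.81, p < .01"
```

Rows are indexed by position here because the row names of the summary table are padded with spaces.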

A. In Class

Here are some warm-up exercises that review important points from previous chapters and practice the basic concepts of the current topic:

Preparations

0. The following steps prepare the current session by opening an R project, creating a new .Rmd file, and compiling it into an .html output file:

0a. Open your R project from last week (called RCourse or something similar), which contains some files and at least two subfolders (data and R).

0b. Create a new R Markdown (.Rmd) script and save it as LastFirst_WPA07_yymmdd.Rmd (with an appropriate header) in your project directory.

0c. Insert a code chunk and load the rmarkdown, knitr and yarrr packages. (Hint: It’s always a good idea to name code chunks and load all required packages with library() at the beginning of your document. Using the chunk option include = FALSE evaluates the chunk, but does not show it or its outputs in the html output file.)

library(rmarkdown)
library(knitr)
library(yarrr)

# Store original par() settings (only those that can be reset):
opar <- par(no.readonly = TRUE)
# par(opar) # restores original (default) par settings later

0d. Make sure that you can create an .html output-file by “knitting” your current document.

Data exploration

1. The ToothGrowth data set included in R contains the length len of odontoblasts (i.e., cells responsible for tooth growth) in 60 guinea pigs as a function of two delivery methods (supp) and three levels of vitamin C (dose).

1a. Copy the data set into an object tg and familiarize yourself with its structure and contents. (Hint: Check ?ToothGrowth and use the head(), str() and summary() functions to explore the data.)

#>    len supp dose
#> 1  4.2   VC  0.5
#> 2 11.5   VC  0.5
#> 3  7.3   VC  0.5
#> 4  5.8   VC  0.5
#> 5  6.4   VC  0.5
#> 6 10.0   VC  0.5

1b. How many cases are there of each combination of supp and dose? (Hint: Use the table() function to find out.)

1c. What are the mean lengths for each combination of supp and dose? (Hint: Use multiple calls to mean() or the aggregate() function to find out.)
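
To illustrate the two hints in 1b and 1c without giving away the solution, here is a hedged sketch on a different built-in data set (warpbreaks, with the factors wool and tension and the DV breaks). The object names counts and means are chosen for illustration only:

```r
# Number of cases for each combination of two factors:
counts <- table(warpbreaks$wool, warpbreaks$tension)
counts

# Mean of the DV for each combination of two factors
# (one row per cell, with the mean in the column named after the DV):
means <- aggregate(breaks ~ wool + tension,
                   data = warpbreaks,
                   FUN = mean)
means
```

The same pattern should carry over to tg by swapping in its variable names.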

1d. Visualize the potential effects of supp and dose on the len variable. (Hint: Use the yarrr::pirateplot(), barplot(), or boxplot() functions to create a plot.)

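# Note: unikn.col, seeblau and seeblau.col (used below) are color objects that 
# are assumed to be defined elsewhere; they are not part of base R or yarrr.
# (a) Pirateplot: 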
yarrr::pirateplot(formula = len ~ supp + dose, 
                  data = tg,
                  main = "Effects of delivery method (supp) and dosage (dose) on length",
                  xlab = "IVs",
                  ylab = "DV: length",
                  gl.col = "gray",
                  pal = unikn.col[c(4, 6)])

# (b) Barplot with confidence intervals: 
# install.packages("sciplot") # installs package (if not already installed)
library(sciplot) # load package
sciplot::bargraph.CI(tg$supp, tg$len, tg$dose, 
                     x.leg = "topleft", 
                     col = seeblau,
                     angle = 45, density = c(0, 40, 100), legend = TRUE, 
                     ylim = c(0, 30), las = 1, space = c(0.1,1), 
                     ylab = "Tooth length", xlab = "Supplement type",
                     main = "Effects of delivery method (supp) and dosage (dose) on length")

# (c) Boxplot:
boxplot(len ~ supp * dose, 
        data = tg, 
        col = c(seeblau.col[4], seeblau.col[1]), 
        las = 1,
        names = c("OJ|0.5", "VC|0.5", "OJ|1.0", 
                  "VC|1.0", "OJ|2.0", "VC|2.0"),
        horizontal = TRUE, 
        xlab = "Tooth length")

Answer: (…)

ANOVAs with 1 IV

In the following exercises, you will test statistical hypotheses about mean differences between two or more samples (on the tg data set from above).

2a. Does the delivery method of the supplement supp have a systematic effect on len? (Hint: Use an ANOVA with one IV to find out.)

Answer: (…)

2b. As the test in 2a. only involved two groups, you could have used a simpler test to check the effects of supp on len. Conduct the corresponding t-test to check the results of your ANOVA.

Answer: (…)
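
As a sketch of why both tests agree, the following uses the built-in mtcars data (not tg) with the two-level variable am. For two groups, the F-value of a one-way ANOVA equals the squared t-value of a t-test that assumes equal variances. Note that t.test() uses the Welch correction by default, so var.equal = TRUE is needed for an exact match. All object names are illustrative:

```r
# One-way ANOVA with a two-level IV (am: transmission type):
am.aov <- aov(mpg ~ factor(am), data = mtcars)
f.val  <- summary(am.aov)[[1]][1, "F value"]
p.aov  <- summary(am.aov)[[1]][1, "Pr(>F)"]

# Corresponding t-test, assuming equal variances in both groups:
am.tt <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)
t.val <- unname(am.tt$statistic)

# F equals t squared, and both p-values agree:
all.equal(f.val, t.val^2)
#> [1] TRUE
all.equal(p.aov, am.tt$p.value)
#> [1] TRUE
```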

3a. Does the level of the vitamin C dose have a systematic effect on len? (Hint: Use an ANOVA with one IV to find out and note that as.factor() can be used to convert numeric variables into factors.)

Answer: (…)
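
The reason for the as.factor() hint can be sketched on the built-in mtcars data (not tg): when an IV is stored as a number, aov() treats it as one continuous predictor with 1 degree of freedom, whereas a factor with k levels uses k - 1 degrees of freedom. The object names are illustrative:

```r
# cyl is numeric (4, 6, or 8 cylinders):
cyl.num <- aov(mpg ~ cyl, data = mtcars)             # treated as a number
cyl.fac <- aov(mpg ~ as.factor(cyl), data = mtcars)  # treated as 3 groups

df.num <- summary(cyl.num)[[1]][1, "Df"]
df.fac <- summary(cyl.fac)[[1]][1, "Df"]

df.num  # 1 (a single linear trend)
df.fac  # 2 (three groups, hence k - 1 = 2)
```

Post-hoc comparisons between groups only make sense for the second version.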

3b. More specifically, which levels of the vitamin C dose have a systematic effect on len? (Hint: Use post-hoc tests on the ANOVA of 3a. to find out.)

Answer: (…)

3c. Check the result of the post-hoc test for the difference between a dose of 1 and a dose of 2 with a corresponding t-test.

Answer: (…)
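
A hedged sketch of this kind of check, again on the built-in warpbreaks data rather than on tg: one row of the TukeyHSD() result is compared with a t-test on the two corresponding groups. The object names are illustrative:

```r
# One-way ANOVA and post-hoc tests (tension has the levels L, M, H):
wb.aov <- aov(breaks ~ tension, data = warpbreaks)
wb.tuk <- TukeyHSD(wb.aov)
wb.tuk$tension["H-M", ]  # diff, lwr, upr, and p adj for H vs. M

# t-test on the same two groups only:
wb.MH <- droplevels(subset(warpbreaks, tension %in% c("M", "H")))
wb.tt <- t.test(breaks ~ tension, data = wb.MH, var.equal = TRUE)
wb.tt$p.value

# The mean difference is the same in both analyses:
all.equal(unname(wb.tuk$tension["H-M", "diff"]),
          unname(diff(wb.tt$estimate)))
```

The p-values are not expected to match exactly: the Tukey test adjusts for multiple comparisons and uses the error term of the full ANOVA, whereas the t-test only sees two groups.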

ANOVAs with 2 IVs

4a. Test the main effects of delivery method supp and dosage dose on len in one analysis. (Hint: Use an aov() with 2 independent variables – combined by the + operator – and appropriate post-hoc tests in case of significant main effects.)

Answer: (…)

4b. Test the main effects and a potential interaction of the delivery method supp and dosage dose on len in one analysis. (Hint: Use an aov() with 2 independent variables – combined by the * operator – and appropriate post-hoc tests in case of significant main effects or interactions.)

Answer: (…)
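
The difference between the + and * operators in 4a and 4b can be sketched on the built-in warpbreaks data (not tg). With +, the model contains only the two main effects; with *, it additionally contains their interaction. The object names are illustrative:

```r
# Main effects only:
wb.add <- aov(breaks ~ wool + tension, data = warpbreaks)

# Main effects plus interaction (same as wool + tension + wool:tension):
wb.int <- aov(breaks ~ wool * tension, data = warpbreaks)

terms.add <- trimws(rownames(summary(wb.add)[[1]]))
terms.int <- trimws(rownames(summary(wb.int)[[1]]))

terms.add  # "wool" "tension" "Residuals"
terms.int  # "wool" "tension" "wool:tension" "Residuals"

# Post-hoc tests can be restricted to one term with the which argument:
TukeyHSD(wb.int, which = "tension")
```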

Checkpoint 1

At this point, you have completed all basic exercises. This is good, but additional practice will deepen your understanding, so please keep going…

B. At Home

Facebook attraction

In this WPA, you will analyze data from a fictitious study on attraction. In the study, 1,000 heterosexual university students viewed the Facebook profile of another student (the “target” person) of the opposite sex. Based on the target person’s profile, each participant made three judgments about the target: their perceived intelligence, attractiveness, and dateability. The primary judgment of interest was the dateability rating, which indicates how dateable the target person was perceived to be (ranging from a minimum value of 0 to a maximum value of 100).

Data description

The data file contains 1,000 rows and 10 columns. Here are the columns:

session: The experiment session in which the study was run. There were 50 sessions in total.
sex: The sex of the target person (“m” vs. “f”).
age: The age of the target person (in years).
haircolor: The hair color of the target person.
university: The university that the target person attended.
education: The highest level of education obtained by the target person.
shirtless: Did the target person have a shirtless profile picture? (1.No vs. 2.Yes).
intelligence: How intelligent do you rate this target? (1.low, 2.medium, 3.high).
attractiveness: How physically attractive do you rate this target? (1.low, 2.medium, 3.high).
dateability: How dateable do you rate this target person? (Scale from 0 to 100).

Data loading and exploration

5a. The data are located in a tab-delimited text file at http://Rpository.com/down/data/WPA07_facebook.txt. Use read.table() to load these data into R as a new object called facebook.

facebook <- read.table(file = "http://Rpository.com/down/data/WPA07_facebook.txt", # from url
                       # file = "data/WPA07_facebook.txt", # local
                       sep = "\t", header = TRUE)

Here is how the first few rows of the data should look:

head(facebook)
#>   session sex age haircolor   university    education shirtless
#> 1       1   m  23     brown 3.Goettingen    3.Masters     2.Yes
#> 2       1   m  19    blonde   2.Freiburg 1.HighSchool      1.No
#> 3       1   f  22     brown   2.Freiburg  2.Bachelors     2.Yes
#> 4       1   f  22       red   2.Freiburg  2.Bachelors      1.No
#> 5       1   m  23     brown 3.Goettingen  2.Bachelors      1.No
#> 6       1   m  26    blonde   2.Freiburg    3.Masters     2.Yes
#>   intelligence attractiveness dateability
#> 1        1.low         3.high          15
#> 2     2.medium       2.medium          44
#> 3        1.low       2.medium         100
#> 4     2.medium         3.high         100
#> 5     2.medium       2.medium          63
#> 6       3.high         3.high          76

5b. Inspect the first few rows of the dataframe with the head() function to make sure it loaded correctly. Using the str() function, look at the structure of the dataframe to make sure everything looks ok.
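
One detail worth checking with str() is whether the text columns were read as factors. Since R 4.0.0, read.table() keeps strings as character vectors by default, whereas post-hoc functions such as TukeyHSD() are designed for factor IVs. The following sketch uses a tiny made-up table (not the facebook data) to show two ways of obtaining factors; all names and values in it are hypothetical:

```r
# Tiny tab-delimited example (made-up values, NOT the facebook data):
txt <- "id\tgroup\tscore\n1\ta\t10\n2\tb\t20\n3\ta\t30"

# Default import (strings stay character in R >= 4.0.0):
d.chr <- read.table(text = txt, sep = "\t", header = TRUE)
class(d.chr$group)

# Option 1: Convert all string columns to factors on import:
d.fac <- read.table(text = txt, sep = "\t", header = TRUE,
                    stringsAsFactors = TRUE)
class(d.fac$group)

# Option 2: Convert a single column after import:
d.chr$group <- as.factor(d.chr$group)
class(d.chr$group)
```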

One-way ANOVAs

6a. Was there a main effect of the university on dateability? Conduct a one-way ANOVA to find out. If the result is significant (i.e., \(p < .05\)), conduct post-hoc tests.

Answer: (…)

6b. Was there a main effect of haircolor on dateability? Conduct a one-way ANOVA to find out. If the result is significant (\(p < .05\)), conduct post-hoc tests.

Answer: (…)

6c. Was there a main effect of intelligence on dateability? Conduct a one-way ANOVA to find out. If the result is significant (\(p < .05\)), conduct post-hoc tests.

Answer: (…)

Multi-independent ANOVAs

7a. Conduct a three-way ANOVA on dateability with intelligence, university, and haircolor as independent variables (IVs). Do your results for each variable change compared to your previous one-way ANOVAs on these variables? (You do not need to provide the full APA results or conduct post-hoc tests; just answer the question verbally.)

Answer: (…)

7b. Conduct a multi-way ANOVA predicting dateability with sex, haircolor, university, education, shirtless, intelligence, and attractiveness as independent variables (IVs). Which of these variables are significantly related to dateability? (Do write APA results for each variable, but do not conduct post-hoc tests.)

Answer: (…)
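
When comparing ANOVAs with several IVs, it may help to know that aov() computes sequential (Type I) sums of squares: each term is tested after the terms listed before it. In unbalanced data, the result for a variable can therefore depend on its position in the formula. A minimal sketch on the built-in mtcars data (not the facebook data), with illustrative object names:

```r
# Same two IVs, entered in different orders:
fit.1 <- aov(mpg ~ factor(cyl) + factor(am), data = mtcars)
fit.2 <- aov(mpg ~ factor(am) + factor(cyl), data = mtcars)

tab.1 <- summary(fit.1)[[1]]
tab.2 <- summary(fit.2)[[1]]

# Remove the padding from the row names to index rows by name:
rownames(tab.1) <- trimws(rownames(tab.1))
rownames(tab.2) <- trimws(rownames(tab.2))

# Sum of squares attributed to cyl (entered first vs. second):
ss.cyl.1 <- tab.1["factor(cyl)", "Sum Sq"]
ss.cyl.2 <- tab.2["factor(cyl)", "Sum Sq"]
c(first = ss.cyl.1, second = ss.cyl.2)

# The residual sum of squares is the same in both models:
all.equal(tab.1["Residuals", "Sum Sq"], tab.2["Residuals", "Sum Sq"])
```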

Checkpoint 2

If you got this far you’re doing great, but don’t give up just yet…

ANOVAs on subsets of data

8. It turns out that the (male) experimenter who ran experimental Sessions 1 through 30 was trying to score a date and slipped his own profile picture into the study. Thus, you wonder whether you can trust the data from these sessions. Repeat your multi-way ANOVA from question 7b ONLY for Sessions 31 through 50. Do your conclusions change compared to the analysis of all sessions?

Answer: (…)
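
Running an ANOVA on a subset of cases can be sketched with the ChickWeight example from the general guidelines: first select the rows that meet a condition on a numeric variable, then pass the reduced data to aov(). The cutoff and object names below are illustrative assumptions, not part of the task:

```r
# Keep only the measurements from day 10 onwards:
cw.late <- subset(ChickWeight, Time >= 10)

# Compare the number of cases before and after subsetting:
nrow(ChickWeight)
nrow(cw.late)

# Same ANOVA as in the example, but on the subset only:
late.aov <- aov(formula = weight ~ Diet, data = cw.late)
summary(late.aov)
```

Checking the number of remaining rows is a cheap way to confirm that the condition selected the intended cases.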

Interactions

9a. Create a plot (e.g., a pirateplot(), barplot(), or boxplot()) showing the distribution of dateability based on two independent variables: sex and shirtless. Based on what you see in the plot, do you expect there to be an interaction between these two independent variables? Why or why not?

Answer: (…)
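
As a complement to the plots suggested above, cell means and the base R function interaction.plot() can make a potential interaction easier to judge: roughly parallel lines suggest no interaction, whereas clearly diverging or crossing lines suggest one. The sketch below uses the built-in warpbreaks data (not the facebook data), and the object name cell.means is illustrative:

```r
# Mean of the DV in each cell of the 3 x 2 design:
cell.means <- with(warpbreaks,
                   tapply(breaks, list(tension, wool), mean))
cell.means

# One line per level of wool, plotted across the levels of tension:
with(warpbreaks,
     interaction.plot(x.factor = tension,
                      trace.factor = wool,
                      response = breaks,
                      ylab = "Mean number of breaks"))
```

A visual impression of this kind is only a hypothesis; the corresponding ANOVA with the * operator is still needed to test it.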

9b. Test your prediction with the appropriate ANOVA.

Answer: (…)

Checkpoint 3

If you got this far you’re doing an amazing job — well done! Enjoy the following challenge…

C. Challenges

More interactions

10a. Create a plot (e.g., a pirateplot(), barplot(), or boxplot()) showing the distribution of dateability based on two independent variables: sex and intelligence. Based on what you see in this plot, do you expect there to be an interaction between the two IVs? Why or why not?

Answer: (…)

10b. Test your prediction with the appropriate ANOVA.

Answer: (…)

11a. Create a plot (e.g., a pirateplot(), barplot(), or boxplot()) showing the distribution of dateability based on two independent variables: haircolor and university. Based on what you see in the plot, do you expect there to be an interaction between haircolor and university? Why or why not?

Answer: (…)

11b. Test your prediction with the appropriate ANOVA.

Answer: (…)

Submission

That’s it – now it’s time to submit your assignment!

Save and submit your script or output file (including all code) to the appropriate folder on Ilias before Wednesday (23:59).

[WPA07.Rmd updated on 2017-12-11 13:25:05 by hn.]

WPA07: Statistics: ANOVAs

Hansjörg Neth, SPDS, uni.kn

2017 Dec 11
