Answers for WPA07 of Basic data and decision analysis in R, taught at the University of Konstanz in Winter 2017/2018.

To complete and submit these exercises, please remember and do the following:

Your WPAs should be written as scripts of commented code (as .Rmd files) and submitted as reproducible documents that combine text with code (in .html or .pdf formats).
- A simple .Rmd template is provided here.
- (Alternatively, open a plain R script and save it as LastnameFirstname_WPA##_yymmdd.R.)
Also enter the current assignment (e.g., WPA07), your name, and the current date at the top of your document. When working on a task, always indicate which task you are answering with appopriate comments.
Complete as many exercises as you can by Wednesday (23:59).
Submit your script or output file (including all code) to the appropriate folder on Ilias.

General guidelines

Read the following guidelines carefully to save yourself time in conducting analyses and reporting results in this WPA:

For each question, conduct the appropriate ANOVA and then formulate your conclusion in APA style. To summarize an effect in an ANOVA, use the format \(F(XXX, YYY) = FFF\), \(p = PPP\), where \(XXX\) is the degrees of freedom of the variable you are testing, \(YYY\) is the degrees of freedom of the residuals, \(FFF\) is the \(F\)-value for the variable you are testing, and \(PPP\) is the \(p\)-value.
p-values: If the \(p\)-value is less than .01, just write \(p < .01\). If the \(p\)-value of the ANOVA is less than .05, conduct post-hoc tests.
post-hoc tests: If you are only testing one independent variable, write the proper APA conclusions for the post-hoc test. If you are testing more than one independent variable in your ANOVA, you do not need to write the APA style conclusions for post-hoc tests – just print the result.

Here is an example:

Question: Was there an effect of diets on chicken weights (in the ChickWeight data set)?

# ANOVA on Chicken Weights: IV = Diet, DV = weight

# Conduct ANOVA:
p0.aov <- aov(formula = weight ~ Diet,
            data = ChickWeight)
summary(p0.aov)
#>              Df  Sum Sq Mean Sq F value   Pr(>F)    
#> Diet          3  155863   51954   10.81 6.43e-07 ***
#> Residuals   574 2758693    4806                     
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# ANOVA was significant (p < .01), so do post-hoc tests: 
TukeyHSD(p0.aov) # post-hoc tests
#>   Tukey multiple comparisons of means
#>     95% family-wise confidence level
#> 
#> Fit: aov(formula = weight ~ Diet, data = ChickWeight)
#> 
#> $Diet
#>          diff         lwr      upr     p adj
#> 2-1 19.971212  -0.2998092 40.24223 0.0552271
#> 3-1 40.304545  20.0335241 60.57557 0.0000025
#> 4-1 32.617257  12.2353820 52.99913 0.0002501
#> 3-2 20.333333  -2.7268370 43.39350 0.1058474
#> 4-2 12.646045 -10.5116315 35.80372 0.4954239
#> 4-3 -7.687288 -30.8449649 15.47039 0.8277810

Answer: There was a significant main effect of diets on chicken weights (\(F(3, 574) = 10.81\), \(p < .01\)). Pairwise Tukey HSD tests showed significant differences between diets 1 and 3 (diff = 40.30, \(p < .01\)) and diets 1 and 4 (diff = 32.62, \(p < .01\)). All other pairwise differences were not significant at the \(\alpha = .05\) significance level.

A. In Class

Here are some warm-up exercises that review important points from previous chapters and practice the basic concepts of the current topic:

Preparations

0. The following steps prepare the current session by opening an R project, creating a new .Rmd file, and compiling it into an .html output file:

0a. Open your R project from last week (called RCourse or something similar), which contains some files and at least two subfolders (data and R).

0b. Create a new R Markdown (.Rmd) script and save it as LastFirst_WPA07_yymmdd.Rmd (with an appropriate header) in your project directory.

0c. Insert a code chunk and load the rmarkdown, knitr and yarrr packages. (Hint: It’s always a good idea to name code chunks and load all required packages with library() at the beginning of your document. Using the chunk option include = FALSE evaluates the chunk, but does not show it or its outputs in the html output file.)

library(rmarkdown)
library(knitr)
library(yarrr)

# Store original par() settings:
opar <- par()
# par(opar) # restores original (default) par settings later

0d. Make sure that you can create an .html output-file by “knitting” your current document.

Data exploration

1. The ToothGrowth data set included in R contains the length len of odontoblasts (i.e., cells responsible for tooth growth) in 60 guinea pigs as a function of two delivery methods (supp) and three levels of vitamin C (dose).

1a. Copy the data set into an object tg and familiarize yourself with its structure and contents. (Hint: Check ?ToothGrowth and use the head(), str() and summary() functions to explore the data.)

#>    len supp dose
#> 1  4.2   VC  0.5
#> 2 11.5   VC  0.5
#> 3  7.3   VC  0.5
#> 4  5.8   VC  0.5
#> 5  6.4   VC  0.5
#> 6 10.0   VC  0.5

1b. How many cases are there of each combination of supp and dose? (Hint: Use the table() function to find out.)

table(tg$supp, tg$dose) # count cases
#>     
#>      0.5  1  2
#>   OJ  10 10 10
#>   VC  10 10 10

1c. What are the mean lengths for each combination of supp and dose? (Hint: Use multiple mean() or the aggregate() functions to find out.)

ag <- aggregate(len ~ ., data = tg, mean) # means
xtabs(len ~ ., data = ag)
#>     dose
#> supp   0.5     1     2
#>   OJ 13.23 22.70 26.06
#>   VC  7.98 16.77 26.14

1d. Visualize the potential effects of supp and dose on the len variable. (Hint: Use the yarrr::pirateplot(), barplot(), or boxplot() functions to create a plot.)

# (a) Pirateplot: 
yarrr::pirateplot(formula = len ~ supp + dose, 
                  data = tg,
                  main = "Effects of delivery method (supp) and dosage (dose) on length",
                  xlab = "IVs",
                  ylab = "DV: length",
                  gl.col = "gray",
                  pal = unikn.col[c(4, 6)])


## (b) Barplot with confidence intervals: 
# install.packages("sciplot")
library(sciplot)
sciplot::bargraph.CI(tg$supp, tg$len, tg$dose, 
                     x.leg = "topleft", 
                     col = seeblau,
                     angle = 45, density = c(0, 40, 100), legend = TRUE, 
                     ylim = c(0, 30), las = 1, space = c(0.1,1), 
                     ylab = "Tooth length", xlab = "Supplement type",
                     main = "Effects of delivery method (supp) and dosage (dose) on length")


# (c) Boxplot:
boxplot(len ~ supp * dose, 
        data = tg, 
        col = c(seeblau.col[4], seeblau.col[1]), 
        las = 1,
        names = c("OJ|0.5", "VC|0.5", "OJ|1.0", 
                  "VC|1.0", "OJ|2.0", "VC|2.0"),
        horizontal = TRUE, 
        xlab = "Tooth length")

Answer: Looks like an interaction.

ANOVAs with 1 IV

In the following exercises, you will test statistical hypotheses about mean differences between two or more samples.

2a. Does the delivery method of the supplement supp have a systematic effect on len? (Hint: Use an ANOVA with one IV to find out.)

# ANOVA with 1IV of 2 groups:
tg1.aov <- aov(formula = len ~ supp, data = tg)
summary(tg1.aov)
#>             Df Sum Sq Mean Sq F value Pr(>F)  
#> supp         1    205  205.35   3.668 0.0604 .
#> Residuals   58   3247   55.98                 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Check with linear model:
tg1.lm <- lm(formula = len ~ supp, data = tg)
summary(tg1.lm)
#> 
#> Call:
#> lm(formula = len ~ supp, data = tg)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -12.7633  -5.7633   0.4367   5.5867  16.9367 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   20.663      1.366  15.127   <2e-16 ***
#> suppVC        -3.700      1.932  -1.915   0.0604 .  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 7.482 on 58 degrees of freedom
#> Multiple R-squared:  0.05948,    Adjusted R-squared:  0.04327 
#> F-statistic: 3.668 on 1 and 58 DF,  p-value: 0.06039

Answer: There is no systematic effect of delivery method supp on length: \(F(1, 58) = 3.67\), \(p = 0.06\).

2b. As the test in 2a. only involved two groups, you could have used a simpler test to check the effects of supp on len. Conduct the corresponding t-test to check the results of your ANOVA.

t.test(x = tg$len[tg$supp == "VC"],   
       y = tg$len[tg$supp == "OJ"])
#> 
#>  Welch Two Sample t-test
#> 
#> data:  tg$len[tg$supp == "VC"] and tg$len[tg$supp == "OJ"]
#> t = -1.9153, df = 55.309, p-value = 0.06063
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -7.5710156  0.1710156
#> sample estimates:
#> mean of x mean of y 
#>  16.96333  20.66333

Answer: Essentially the same result: mean difference = 3.7, t(55.31) = -1.92, p = 0.06 (2-tailed).

3a. Does the level of the vitamin C dose have a systematic effect on len? (Hint: Use an ANOVA with one IV to find out and note that the as.factor() can be used to convert numeric variables into factors.)

is.factor(tg$dose) # is a numeric variable 
#> [1] FALSE
tg$dose2 <- as.factor(tg$dose) # convert to factor variable
is.factor(tg$dose2) # check
#> [1] TRUE

# ANOVA with 1IV of 3 groups:
tg2.aov <- aov(formula = len ~ dose2, data = tg)
summary(tg2.aov)
#>             Df Sum Sq Mean Sq F value   Pr(>F)    
#> dose2        2   2426    1213   67.42 9.53e-16 ***
#> Residuals   57   1026      18                     
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Check with linear model:
tg2.lm <- aov(formula = len ~ dose2, data = tg)
summary(tg2.lm)
#>             Df Sum Sq Mean Sq F value   Pr(>F)    
#> dose2        2   2426    1213   67.42 9.53e-16 ***
#> Residuals   57   1026      18                     
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Answer: Yes, there is a significant effect of dose on len: \(F(2, 57) = 67.4\), \(p < .01\).

3b. More specifically, which levels of the vitamin C dose have a systematic effect on len? (Hint: Use post-hoc tests on the ANOVA of 3a. to find out.)

# Post-hoc tests:
TukeyHSD(tg2.aov)
#>   Tukey multiple comparisons of means
#>     95% family-wise confidence level
#> 
#> Fit: aov(formula = len ~ dose2, data = tg)
#> 
#> $dose2
#>         diff       lwr       upr    p adj
#> 1-0.5  9.130  5.901805 12.358195 0.00e+00
#> 2-0.5 15.495 12.266805 18.723195 0.00e+00
#> 2-1    6.365  3.136805  9.593195 4.25e-05

Answer: All pairwise comparisons between three levels of dose are different from each other (at the \(p < .01\) level).

3c. Check the result of the post-hoc test for the difference between a dose of 1 and a dose of 2 with a corresponding t-test.

# t-test of len by dose == 1 vs. dose == 2:
t.test(x = tg$len[tg$dose == 1],   
       y = tg$len[tg$dose == 2])
#> 
#>  Welch Two Sample t-test
#> 
#> data:  tg$len[tg$dose == 1] and tg$len[tg$dose == 2]
#> t = -4.9005, df = 37.101, p-value = 1.906e-05
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -8.996481 -3.733519
#> sample estimates:
#> mean of x mean of y 
#>    19.735    26.100

Answer: Yes, they are different indeed: mean difference = 6.37, t(37.1) = -4.9, p < 0.01 (2-tailed).

ANOVAs with 2 IVs

4a. Test the main effects of delivery method supp and dosage dose on len in one analysis. (Hint: Use an aov() with 2 independent variables – combined by the + operator – and appropriate post-hoc tests in case of significant main effects.)

# ANOVA with 2IVs of 2x3 groups: Only main effects
tg3.aov <- aov(formula = len ~ supp + dose2, data = tg)
summary(tg3.aov)
#>             Df Sum Sq Mean Sq F value   Pr(>F)    
#> supp         1  205.4   205.4   14.02 0.000429 ***
#> dose2        2 2426.4  1213.2   82.81  < 2e-16 ***
#> Residuals   56  820.4    14.7                     
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(tg3.aov)
#>   Tukey multiple comparisons of means
#>     95% family-wise confidence level
#> 
#> Fit: aov(formula = len ~ supp + dose2, data = tg)
#> 
#> $supp
#>       diff       lwr       upr     p adj
#> VC-OJ -3.7 -5.679762 -1.720238 0.0004293
#> 
#> $dose2
#>         diff       lwr       upr p adj
#> 1-0.5  9.130  6.215909 12.044091 0e+00
#> 2-0.5 15.495 12.580909 18.409091 0e+00
#> 2-1    6.365  3.450909  9.279091 7e-06

# Check with linear model:
tg3.lm <- lm(formula = len ~ supp + dose2, data = tg)
summary(tg3.lm)
#> 
#> Call:
#> lm(formula = len ~ supp + dose2, data = tg)
#> 
#> Residuals:
#>    Min     1Q Median     3Q    Max 
#> -7.085 -2.751 -0.800  2.446  9.650 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  12.4550     0.9883  12.603  < 2e-16 ***
#> suppVC       -3.7000     0.9883  -3.744 0.000429 ***
#> dose21        9.1300     1.2104   7.543 4.38e-10 ***
#> dose22       15.4950     1.2104  12.802  < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 3.828 on 56 degrees of freedom
#> Multiple R-squared:  0.7623, Adjusted R-squared:  0.7496 
#> F-statistic: 59.88 on 3 and 56 DF,  p-value: < 2.2e-16

Answer: Both main effects are significant:

Effect of supp on len: \(F(1, 56) = 14.0\), \(p < .01\). (Note that this main effect was not found above!)
Effect of dose on len: \(F(2, 56) = 82.8\), \(p < .01\) (as above).

4b. Test the main effects and a potential interaction of the delivery method supp and dosage dose on len in one analysis. (Hint: Use an aov() with 2 independent variables – combined by the * operator – and appropriate post-hoc tests in case of significant main effects or interactions.)

# ANOVA with 2IVs of 2x3 groups: with interaction:
tg4.aov <- aov(formula = len ~ supp * dose2, data = tg)
summary(tg4.aov)
#>             Df Sum Sq Mean Sq F value   Pr(>F)    
#> supp         1  205.4   205.4  15.572 0.000231 ***
#> dose2        2 2426.4  1213.2  92.000  < 2e-16 ***
#> supp:dose2   2  108.3    54.2   4.107 0.021860 *  
#> Residuals   54  712.1    13.2                     
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(tg4.aov)
#>   Tukey multiple comparisons of means
#>     95% family-wise confidence level
#> 
#> Fit: aov(formula = len ~ supp * dose2, data = tg)
#> 
#> $supp
#>       diff       lwr       upr     p adj
#> VC-OJ -3.7 -5.579828 -1.820172 0.0002312
#> 
#> $dose2
#>         diff       lwr       upr   p adj
#> 1-0.5  9.130  6.362488 11.897512 0.0e+00
#> 2-0.5 15.495 12.727488 18.262512 0.0e+00
#> 2-1    6.365  3.597488  9.132512 2.7e-06
#> 
#> $`supp:dose2`
#>                diff        lwr        upr     p adj
#> VC:0.5-OJ:0.5 -5.25 -10.048124 -0.4518762 0.0242521
#> OJ:1-OJ:0.5    9.47   4.671876 14.2681238 0.0000046
#> VC:1-OJ:0.5    3.54  -1.258124  8.3381238 0.2640208
#> OJ:2-OJ:0.5   12.83   8.031876 17.6281238 0.0000000
#> VC:2-OJ:0.5   12.91   8.111876 17.7081238 0.0000000
#> OJ:1-VC:0.5   14.72   9.921876 19.5181238 0.0000000
#> VC:1-VC:0.5    8.79   3.991876 13.5881238 0.0000210
#> OJ:2-VC:0.5   18.08  13.281876 22.8781238 0.0000000
#> VC:2-VC:0.5   18.16  13.361876 22.9581238 0.0000000
#> VC:1-OJ:1     -5.93 -10.728124 -1.1318762 0.0073930
#> OJ:2-OJ:1      3.36  -1.438124  8.1581238 0.3187361
#> VC:2-OJ:1      3.44  -1.358124  8.2381238 0.2936430
#> OJ:2-VC:1      9.29   4.491876 14.0881238 0.0000069
#> VC:2-VC:1      9.37   4.571876 14.1681238 0.0000058
#> VC:2-OJ:2      0.08  -4.718124  4.8781238 1.0000000

# Check with linear model:
tg4.lm <- lm(formula = len ~ supp * dose2, data = tg)
summary(tg4.lm)
#> 
#> Call:
#> lm(formula = len ~ supp * dose2, data = tg)
#> 
#> Residuals:
#>    Min     1Q Median     3Q    Max 
#>  -8.20  -2.72  -0.27   2.65   8.27 
#> 
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)     13.230      1.148  11.521 3.60e-16 ***
#> suppVC          -5.250      1.624  -3.233  0.00209 ** 
#> dose21           9.470      1.624   5.831 3.18e-07 ***
#> dose22          12.830      1.624   7.900 1.43e-10 ***
#> suppVC:dose21   -0.680      2.297  -0.296  0.76831    
#> suppVC:dose22    5.330      2.297   2.321  0.02411 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 3.631 on 54 degrees of freedom
#> Multiple R-squared:  0.7937, Adjusted R-squared:  0.7746 
#> F-statistic: 41.56 on 5 and 54 DF,  p-value: < 2.2e-16

Answer: Both main effects (of supp and of dose) and their interaction are significant.

Checkpoint 1

At this point you completed all basic exercises. This is good, but additional practice will deepen your understanding, so please keep carrying on…

B. At Home

Facebook attraction

In this WPA, you will analyze data from a ficticious study on attraction. In the study, 1,000 heterosexual University students viewed the Facebook profile of another student (the “target” person) of the opposite sex. Based on a target person’s profile, each participant made three judgments about the target: their perceived intelligence, attractiveness, and dateability. The primary judgment of interest was the dateability rating indicating as how dateable the target person was perceived (ranging from a minimum value of 0 to a maximum value of 100).

Data description

The data file contains 1,000 rows and 10 columns. Here are the columns:

session: The experiment session in which the study was run. There were 50 sessions in total.
sex: The sex of the target person (“m” vs. “f”).
age: The age of the target person (in years).
haircolor: The hair color of the target person.
university: The university that the target person attended.
education: The highest level of education obtained by the target person.
shirtless: Did the target person have a shirtless profile picture? (1.No vs. 2.Yes).
intelligence: As how intelligent do you rate this target? (1.Low, 2.Medium, 3.High).
attractiveness: As how physically attractive do you rate this target? (1.Low, 2.Medium, 3.High).
dateability: As how dateable do you rate this target person? (Scale from 0 to 100).

Data loading and exploration

5a. The data are located in a tab-delimited text file at http://Rpository.com/down/data/WPA07_facebook.txt. Using read.table() load this data into R as a new object called facebook.

Here is how the first few rows of the data should look:

head(facebook)
#>   session sex age haircolor   university    education shirtless
#> 1       1   m  23     brown 3.Goettingen    3.Masters     2.Yes
#> 2       1   m  19    blonde   2.Freiburg 1.HighSchool      1.No
#> 3       1   f  22     brown   2.Freiburg  2.Bachelors     2.Yes
#> 4       1   f  22       red   2.Freiburg  2.Bachelors      1.No
#> 5       1   m  23     brown 3.Goettingen  2.Bachelors      1.No
#> 6       1   m  26    blonde   2.Freiburg    3.Masters     2.Yes
#>   intelligence attractiveness dateability
#> 1        1.low         3.high          15
#> 2     2.medium       2.medium          44
#> 3        1.low       2.medium         100
#> 4     2.medium         3.high         100
#> 5     2.medium       2.medium          63
#> 6       3.high         3.high          76

5b. Inspect the first few rows of the dataframe with the head() function to make sure it loaded correctly. Using the str() function, look at the structure of the dataframe to make sure everything looks ok.

#>   session sex age haircolor   university    education shirtless
#> 1       1   m  23     brown 3.Goettingen    3.Masters     2.Yes
#> 2       1   m  19    blonde   2.Freiburg 1.HighSchool      1.No
#> 3       1   f  22     brown   2.Freiburg  2.Bachelors     2.Yes
#> 4       1   f  22       red   2.Freiburg  2.Bachelors      1.No
#> 5       1   m  23     brown 3.Goettingen  2.Bachelors      1.No
#> 6       1   m  26    blonde   2.Freiburg    3.Masters     2.Yes
#>   intelligence attractiveness dateability
#> 1        1.low         3.high          15
#> 2     2.medium       2.medium          44
#> 3        1.low       2.medium         100
#> 4     2.medium         3.high         100
#> 5     2.medium       2.medium          63
#> 6       3.high         3.high          76
#> 'data.frame':    1000 obs. of  10 variables:
#>  $ session       : int  1 1 1 1 1 1 1 1 1 1 ...
#>  $ sex           : Factor w/ 2 levels "f","m": 2 2 1 1 2 2 1 2 1 1 ...
#>  $ age           : int  23 19 22 22 23 26 19 25 22 19 ...
#>  $ haircolor     : Factor w/ 3 levels "blonde","brown",..: 2 1 2 3 2 1 2 3 2 1 ...
#>  $ university    : Factor w/ 3 levels "1.Konstanz","2.Freiburg",..: 3 2 2 2 3 2 3 2 2 3 ...
#>  $ education     : Factor w/ 4 levels "1.HighSchool",..: 3 1 2 2 2 3 1 3 2 1 ...
#>  $ shirtless     : Factor w/ 2 levels "1.No","2.Yes": 2 1 2 1 1 2 1 2 2 1 ...
#>  $ intelligence  : Factor w/ 3 levels "1.low","2.medium",..: 1 2 1 2 2 3 1 3 2 2 ...
#>  $ attractiveness: Factor w/ 3 levels "1.low","2.medium",..: 3 2 2 3 2 3 2 1 2 2 ...
#>  $ dateability   : int  15 44 100 100 63 76 61 26 76 40 ...

One-way ANOVAs

6a. Was there a main effect of the university on dateability? Conduct a one-way ANOVA to find out. If the result is significant (i.e., \(p < .05\)), conduct post-hoc tests.

unidate.aov <- aov(formula = dateability ~ university, data = facebook)

summary(unidate.aov)
#>              Df Sum Sq Mean Sq F value   Pr(>F)    
#> university    2  19769    9884   13.99 1.01e-06 ***
#> Residuals   997 704173     706                     
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

TukeyHSD(unidate.aov)
#>   Tukey multiple comparisons of means
#>     95% family-wise confidence level
#> 
#> Fit: aov(formula = dateability ~ university, data = facebook)
#> 
#> $university
#>                               diff        lwr       upr     p adj
#> 2.Freiburg-1.Konstanz   -10.275500 -15.099617 -5.451383 0.0000020
#> 3.Goettingen-1.Konstanz  -8.641689 -13.586611 -3.696767 0.0001311
#> 3.Goettingen-2.Freiburg   1.633811  -3.122189  6.389811 0.6991862

Answer: There was a significant main effect of university on dateability (\(F(2, 997) = 13.99\), \(p < .01\)). Pairwise Tukey HSD tests showed significant differences between Freiburg and Konstanz (\(diff = -10.28\), \(p < .01\)) and between Goettingen and Konstanz (\(diff = -8.64\), \(p < .01\)). The pairwise difference between Goettingen and Freiburg was not significant at the 0.05 significance threshold.

6b. Was there a main effect of haircolor on dateability? Conduct a one-way ANOVA to find out. If the result is significant (p < .05), conduct post-hoc tests.

hairdate.aov <- aov(formula = dateability ~ haircolor, data = facebook)

summary(hairdate.aov)
#>              Df Sum Sq Mean Sq F value Pr(>F)
#> haircolor     2   1779   889.7   1.228  0.293
#> Residuals   997 722162   724.3

Answer: There was no significant main effect of haircolor on dateability (\(F(2, 997) = 1.23\), \(p = 0.29\)).

6c. Was there a main effect of intelligence on dateability? Conduct a one-way ANOVA to find out. If the result is significant (\(p < .05\)), conduct post-hoc tests.

intellidate.aov <- aov(formula = dateability ~ intelligence, data = facebook)

summary(intellidate.aov)
#>               Df Sum Sq Mean Sq F value Pr(>F)   
#> intelligence   2   7070    3535   4.917 0.0075 **
#> Residuals    997 716871     719                  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

TukeyHSD(intellidate.aov)
#>   Tukey multiple comparisons of means
#>     95% family-wise confidence level
#> 
#> Fit: aov(formula = dateability ~ intelligence, data = facebook)
#> 
#> $intelligence
#>                      diff       lwr       upr     p adj
#> 2.medium-1.low   6.535264  1.615557 11.454970 0.0053173
#> 3.high-1.low     2.865833 -2.070710  7.802375 0.3610523
#> 3.high-2.medium -3.669431 -8.458242  1.119380 0.1705679

Answer: There was a significant main effect of intelligence on dateability (\(F(2, 997) = 4.97\), \(p < .01\)). Pairwise Tukey HSD tests showed significant differences between Medium and Low intelligence (\(diff = 6.53\), \(p < .01\)). All other pairwise differences were not significant at the 0.05 significance threshold.

Multi-independent ANOVAs

7a. Conduct a three-way ANOVA on dateability with both intelligence, university and haircolor as independent variables (IVs). Do your results for each variable change compared to your previous one-way ANOVAs on these variables? (You do not need to provide the full APA results or conduct post-hoc tests, just verbally answer the question.)

iuhdate.aov <- aov(formula = dateability ~ intelligence + university + haircolor, 
                   data = facebook)

summary(iuhdate.aov)
#>               Df Sum Sq Mean Sq F value   Pr(>F)    
#> intelligence   2   7070    3535   5.043  0.00662 ** 
#> university     2  18838    9419  13.436 1.75e-06 ***
#> haircolor      2   1921     960   1.370  0.25462    
#> Residuals    993 696112     701                     
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Answer: The results are the same (and should be).

7b. Conduct a multi-way ANOVA predicting dateability by sex, haircolor, university, education, shirtless, intelligence and attractiveness as independent variables (IVs). Which of these variables are significantly related to dateability? (Do write APA results for each variable, but do not conduct post-hoc tests.)

multidate.aov <- aov(formula = dateability ~ sex + haircolor + university + education 
                     + shirtless + intelligence + attractiveness,
                     data = facebook)

summary(multidate.aov)
#>                 Df Sum Sq Mean Sq F value   Pr(>F)    
#> sex              1  36514   36514  66.103 1.28e-15 ***
#> haircolor        2   1869     935   1.692  0.18466    
#> university       2  19172    9586  17.354 3.92e-08 ***
#> education        3   3186    1062   1.922  0.12424    
#> shirtless        1    106     106   0.192  0.66106    
#> intelligence     2   5904    2952   5.345  0.00491 ** 
#> attractiveness   2 112544   56272 101.872  < 2e-16 ***
#> Residuals      986 544646     552                     
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Answer: There were significant effects of sex (\(F(1, 986) = 66.10\), \(p < .01\)), university (\(F(2, 986) = 17.35\), \(p < .01\)), intelligence (\(F(2, 986) = 5.34\), \(p < .01\)) and attractiveness (\(F(2, 986) = 101.87\), \(p < .01\)) on the target person’s perceived dateability.

Checkpoint 2

If you got this far you’re doing great, but don’t give up just yet…

ANOVAs on subsets of data

8. It turns out that the (male) experimenter who ran the experimental Sessions 1 through 30 was trying to score a date and slipped in his own profile picture into the study. Thus, you wonder whether you can trust the data of these sessions. Repeat your multi ANOVA from question 7b. ONLY for Sessions 31 through 50. Do your conclusions change when compared to the data from all sessions?

multidate2.aov <- aov(formula = dateability ~ sex + haircolor + university + education 
                      + shirtless + intelligence + attractiveness,
                      data = subset(facebook, session > 30))

summary(multidate2.aov)
#>                 Df Sum Sq Mean Sq F value   Pr(>F)    
#> sex              1   6933    6933  12.425 0.000474 ***
#> haircolor        2   2459    1230   2.203 0.111806    
#> university       2  12099    6050  10.842 2.62e-05 ***
#> education        3   2059     686   1.230 0.298479    
#> shirtless        1    184     184   0.330 0.565884    
#> intelligence     2    543     271   0.487 0.615135    
#> attractiveness   2  36084   18042  32.334 1.04e-13 ***
#> Residuals      386 215386     558                     
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Answer: The results look about the same.

Interactions

9a. Create a plot (e.g., a pirateplot(), barplot(), or boxplot()) showing the distribution of dateability based on two independent variables of sex and shirtless. Based on what you see in the plot, do you expect there to be an interaction between these two independent variables? Why or why not?

yarrr::pirateplot(dateability ~ sex + shirtless,
                  data = facebook,
                  main = "Effects of sex and shirtlessness on dateability",
                  xlab = "IVs",
                  ylab = "dateability",
                  gl.col = "gray",
                  pal = "google")

Answer: This looks like an interaction (i.e., the effect due to one IV varies as a function of another IV): Female dateability appears to be selectively increased by shirtlessness imagery.

9b. Test your prediction with the appropriate ANOVA.

sexshirtdate.aov <- aov(dateability ~ sex * shirtless, 
                        data = facebook)
            
summary(sexshirtdate.aov)
#>                Df Sum Sq Mean Sq F value   Pr(>F)    
#> sex             1  36514   36514  62.689 6.43e-15 ***
#> shirtless       1    247     247   0.425    0.515    
#> sex:shirtless   1 107048  107048 183.786  < 2e-16 ***
#> Residuals     996 580132     582                     
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Answer: Yep, interaction indeed.

Checkpoint 3

If you got this far you’re doing an amazing job — well done! Enjoy the following challenge…

C. Challenges

More interactions

10a. Create a plot (e.g., a pirateplot(), barplot(), or boxplot()) showing the distribution of dateability based on two independent variables: sex and intelligence. Based on what you see in this plot, do you expect there to be an interaction between the two IVs? Why or why not?

yarrr::pirateplot(dateability ~ intelligence + sex, 
                  data = facebook,
                  main = "Effects of intelligence on dateability",
                  xlab = "IVs",
                  ylab = "dateability",
                  gl.col = "gray",
                  pal = "google")

Answer: This also looks like an interaction (i.e., the effect due to one IV varies as a function of another IV): The dateability of males seems to increase with intelligence, whereas the dateability of female doesn’t. (Beware that this is fake data, which often tends to correspond to overly simplistic cliches…)

10b. Test your prediction with the appropriate ANOVA.

intellisexdate.aov <- aov(dateability ~ intelligence + sex, data = facebook)
summary(intellisexdate.aov)
#>               Df Sum Sq Mean Sq F value   Pr(>F)    
#> intelligence   2   7070    3535   5.172  0.00583 ** 
#> sex            1  36022   36022  52.696 7.83e-13 ***
#> Residuals    996 680849     684                     
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

intellisexdate.aov <- aov(dateability ~ intelligence + sex, data = facebook)
summary(intellisexdate.aov)
#>               Df Sum Sq Mean Sq F value   Pr(>F)    
#> intelligence   2   7070    3535   5.172  0.00583 ** 
#> sex            1  36022   36022  52.696 7.83e-13 ***
#> Residuals    996 680849     684                     
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Answer: Yep, interaction indeed.

11a. Create a plot (e.g., a pirateplot(), barplot(), or boxplot()) showing the distribution of dateability based on two independent variables: haircolor and university. Based on what you see in the plot, do you expect there to be an interaction between haircolor and university? Why or why not?

yarrr::pirateplot(dateability ~ haircolor + university, 
                  data = facebook,
                  main = "Effects of haircolor and university on dateability",
                  xlab = "IVs",
                  ylab = "dateability",
                  gl.col = "gray",
                  pal = "google")

Answer: This suggests a potential main effect (of Konstanz possibly scoring higher than the other two universities), but does not look like an interaction (i.e., the effect due to one IV does not seem to vary as a function of another IV).

11b. Test your prediction with the appropriate ANOVA.

hairunidate.aov <- aov(dateability ~ haircolor * university, data = facebook)
summary(hairunidate.aov)
#>                       Df Sum Sq Mean Sq F value   Pr(>F)    
#> haircolor              2   1779     890   1.257    0.285    
#> university             2  19636    9818  13.873 1.14e-06 ***
#> haircolor:university   4   1217     304   0.430    0.787    
#> Residuals            991 701309     708                     
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(hairunidate.aov)
#>   Tukey multiple comparisons of means
#>     95% family-wise confidence level
#> 
#> Fit: aov(formula = dateability ~ haircolor * university, data = facebook)
#> 
#> $haircolor
#>                diff       lwr      upr     p adj
#> brown-blonde -0.224 -5.060707 4.612707 0.9935063
#> red-blonde    2.924 -2.660948 8.508948 0.4363527
#> red-brown     3.148 -1.688707 7.984707 0.2783324
#> 
#> $university
#>                               diff        lwr       upr     p adj
#> 2.Freiburg-1.Konstanz   -10.253202 -15.082095 -5.424310 0.0000022
#> 3.Goettingen-1.Konstanz  -8.585490 -13.535307 -3.635673 0.0001492
#> 3.Goettingen-2.Freiburg   1.667712  -3.092995  6.428420 0.6892951
#> 
#> $`haircolor:university`
#>                                                diff        lwr         upr
#> brown:1.Konstanz-blonde:1.Konstanz      -1.10629558 -12.589148 10.37655727
#> red:1.Konstanz-blonde:1.Konstanz         3.93401899  -9.182782 17.05081981
#> blonde:2.Freiburg-blonde:1.Konstanz     -8.73592782 -21.358082  3.88622634
#> brown:2.Freiburg-blonde:1.Konstanz     -10.91331247 -22.093032  0.26640725
#> red:2.Freiburg-blonde:1.Konstanz        -8.71804145 -21.434850  3.99876671
#> blonde:3.Goettingen-blonde:1.Konstanz  -11.08367582 -24.326826  2.15947399
#> brown:3.Goettingen-blonde:1.Konstanz    -7.90421201 -19.154051  3.34562692
#> red:3.Goettingen-blonde:1.Konstanz      -5.88607595 -19.044060  7.27190802
#> red:1.Konstanz-brown:1.Konstanz          5.04031457  -6.395324 16.47595322
#> blonde:2.Freiburg-brown:1.Konstanz      -7.62963224 -18.494352  3.23508787
#> brown:2.Freiburg-brown:1.Konstanz       -9.80701689 -18.956307 -0.65772728
#> red:2.Freiburg-brown:1.Konstanz         -7.61174587 -18.586288  3.36279636
#> blonde:3.Goettingen-brown:1.Konstanz    -9.97738024 -21.557725  1.60296450
#> brown:3.Goettingen-brown:1.Konstanz     -6.79791642 -16.032755  2.43692214
#> red:3.Goettingen-brown:1.Konstanz       -4.77978037 -16.262633  6.70307249
#> blonde:2.Freiburg-red:1.Konstanz       -12.66994681 -25.249164 -0.09072993
#> brown:2.Freiburg-red:1.Konstanz        -14.84733146 -25.978551 -3.71611164
#> red:2.Freiburg-red:1.Konstanz          -12.65206044 -25.326252  0.02213111
#>                                            p adj
#> brown:1.Konstanz-blonde:1.Konstanz     0.9999981
#> red:1.Konstanz-blonde:1.Konstanz       0.9911019
#> blonde:2.Freiburg-blonde:1.Konstanz    0.4386690
#> brown:2.Freiburg-blonde:1.Konstanz     0.0619905
#> red:2.Freiburg-blonde:1.Konstanz       0.4524943
#> blonde:3.Goettingen-blonde:1.Konstanz  0.1870409
#> brown:3.Goettingen-blonde:1.Konstanz   0.4168367
#> red:3.Goettingen-blonde:1.Konstanz     0.9014619
#> red:1.Konstanz-brown:1.Konstanz        0.9088545
#> blonde:2.Freiburg-brown:1.Konstanz     0.4175945
#> brown:2.Freiburg-brown:1.Konstanz      0.0250357
#> red:2.Freiburg-brown:1.Konstanz        0.4355857
#> blonde:3.Goettingen-brown:1.Konstanz   0.1568700
#> brown:3.Goettingen-brown:1.Konstanz    0.3501373
#> red:3.Goettingen-brown:1.Konstanz      0.9331675
#> blonde:2.Freiburg-red:1.Konstanz       0.0467834
#> brown:2.Freiburg-red:1.Konstanz        0.0012144
#> red:2.Freiburg-red:1.Konstanz          0.0508063
#>  [ reached getOption("max.print") -- omitted 18 rows ]

Answer: Right again: A main effect of university (with Konstanz winning, of course), but no interaction here.

That’s it – hope you enjoyed working on this assignment!

[WPA07_answers.Rmd updated on 2017-12-14 15:07:16 by hn.]

WPA07: Statistics: ANOVAs (Answers)

Hansjörg Neth, SPDS, uni.kn

2017 Dec 14