Answers for WPA07 of Basic data and decision analysis in R, taught at the University of Konstanz in Winter 2017/2018.
To complete and submit these exercises, please remember and do the following:
Your WPAs should be written as scripts of commented code (as
.Rmd
files) and submitted as reproducible documents that combine text with code (in.html
or.pdf
formats).A simple
.Rmd
template is provided here.(Alternatively, open a plain R script and save it as
LastnameFirstname_WPA##_yymmdd.R
.)
Also enter the current assignment (e.g., WPA07), your name, and the current date at the top of your document. When working on a task, always indicate which task you are answering with appopriate comments.
Complete as many exercises as you can by Wednesday (23:59).
Submit your script or output file (including all code) to the appropriate folder on Ilias.
General guidelines
Read the following guidelines carefully to save yourself time in conducting analyses and reporting results in this WPA:
For each question, conduct the appropriate ANOVA and then formulate your conclusion in APA style. To summarize an effect in an ANOVA, use the format \(F(XXX, YYY) = FFF\), \(p = PPP\), where \(XXX\) is the degrees of freedom of the variable you are testing, \(YYY\) is the degrees of freedom of the residuals, \(FFF\) is the \(F\)-value for the variable you are testing, and \(PPP\) is the \(p\)-value.
p-values: If the \(p\)-value is less than .01, just write \(p < .01\). If the \(p\)-value of the ANOVA is less than .05, conduct post-hoc tests.
post-hoc tests: If you are only testing one independent variable, write the proper APA conclusions for the post-hoc test. If you are testing more than one independent variable in your ANOVA, you do not need to write the APA style conclusions for post-hoc tests – just print the result.
Here is an example:
Question: Was there an effect of diets on chicken weights (in the ChickWeight
data set)?
# ANOVA on Chicken Weights: IV = Diet, DV = weight
# Conduct ANOVA:
p0.aov <- aov(formula = weight ~ Diet,
data = ChickWeight)
summary(p0.aov)
#> Df Sum Sq Mean Sq F value Pr(>F)
#> Diet 3 155863 51954 10.81 6.43e-07 ***
#> Residuals 574 2758693 4806
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# ANOVA was significant (p < .01), so do post-hoc tests:
TukeyHSD(p0.aov) # post-hoc tests
#> Tukey multiple comparisons of means
#> 95% family-wise confidence level
#>
#> Fit: aov(formula = weight ~ Diet, data = ChickWeight)
#>
#> $Diet
#> diff lwr upr p adj
#> 2-1 19.971212 -0.2998092 40.24223 0.0552271
#> 3-1 40.304545 20.0335241 60.57557 0.0000025
#> 4-1 32.617257 12.2353820 52.99913 0.0002501
#> 3-2 20.333333 -2.7268370 43.39350 0.1058474
#> 4-2 12.646045 -10.5116315 35.80372 0.4954239
#> 4-3 -7.687288 -30.8449649 15.47039 0.8277810
Answer: There was a significant main effect of diets on chicken weights (\(F(3, 574) = 10.81\), \(p < .01\)). Pairwise Tukey HSD tests showed significant differences between diets 1 and 3 (diff = 40.30, \(p < .01\)) and diets 1 and 4 (diff = 32.62, \(p < .01\)). All other pairwise differences were not significant at the \(\alpha = .05\) significance level.
A. In Class
Here are some warm-up exercises that review important points from previous chapters and practice the basic concepts of the current topic:
Preparations
0. The following steps prepare the current session by opening an R project, creating a new .Rmd
file, and compiling it into an .html
output file:
0a. Open your R project from last week (called RCourse
or something similar), which contains some files and at least two subfolders (data
and R
).
0b. Create a new R Markdown (.Rmd
) script and save it as LastFirst_WPA07_yymmdd.Rmd
(with an appropriate header) in your project directory.
0c. Insert a code chunk and load the rmarkdown
, knitr
and yarrr
packages. (Hint: It’s always a good idea to name code chunks and load all required packages with library()
at the beginning of your document. Using the chunk option include = FALSE
evaluates the chunk, but does not show it or its outputs in the html output file.)
library(rmarkdown)
library(knitr)
library(yarrr)
# Store original par() settings:
opar <- par()
# par(opar) # restores original (default) par settings later
0d. Make sure that you can create an .html
output-file by “knitting” your current document.
Data exploration
1. The ToothGrowth
data set included in R contains the length len
of odontoblasts (i.e., cells responsible for tooth growth) in 60 guinea pigs as a function of two delivery methods (supp
) and three levels of vitamin C (dose
).
1a. Copy the data set into an object tg
and familiarize yourself with its structure and contents. (Hint: Check ?ToothGrowth
and use the head()
, str()
and summary()
functions to explore the data.)
#> len supp dose
#> 1 4.2 VC 0.5
#> 2 11.5 VC 0.5
#> 3 7.3 VC 0.5
#> 4 5.8 VC 0.5
#> 5 6.4 VC 0.5
#> 6 10.0 VC 0.5
1b. How many cases are there of each combination of supp
and dose
? (Hint: Use the table()
function to find out.)
table(tg$supp, tg$dose) # count cases
#>
#> 0.5 1 2
#> OJ 10 10 10
#> VC 10 10 10
1c. What are the mean lengths for each combination of supp
and dose
? (Hint: Use multiple mean()
or the aggregate()
functions to find out.)
ag <- aggregate(len ~ ., data = tg, mean) # means
xtabs(len ~ ., data = ag)
#> dose
#> supp 0.5 1 2
#> OJ 13.23 22.70 26.06
#> VC 7.98 16.77 26.14
1d. Visualize the potential effects of supp
and dose
on the len
variable. (Hint: Use the yarrr::pirateplot()
, barplot()
, or boxplot()
functions to create a plot.)
# (a) Pirateplot:
yarrr::pirateplot(formula = len ~ supp + dose,
data = tg,
main = "Effects of delivery method (supp) and dosage (dose) on length",
xlab = "IVs",
ylab = "DV: length",
gl.col = "gray",
pal = unikn.col[c(4, 6)])
## (b) Barplot with confidence intervals:
# install.packages("sciplot")
library(sciplot)
sciplot::bargraph.CI(tg$supp, tg$len, tg$dose,
x.leg = "topleft",
col = seeblau,
angle = 45, density = c(0, 40, 100), legend = TRUE,
ylim = c(0, 30), las = 1, space = c(0.1,1),
ylab = "Tooth length", xlab = "Supplement type",
main = "Effects of delivery method (supp) and dosage (dose) on length")
# (c) Boxplot:
boxplot(len ~ supp * dose,
data = tg,
col = c(seeblau.col[4], seeblau.col[1]),
las = 1,
names = c("OJ|0.5", "VC|0.5", "OJ|1.0",
"VC|1.0", "OJ|2.0", "VC|2.0"),
horizontal = TRUE,
xlab = "Tooth length")
Answer: Looks like an interaction.
ANOVAs with 1 IV
In the following exercises, you will test statistical hypotheses about mean differences between two or more samples.
2a. Does the delivery method of the supplement supp
have a systematic effect on len
? (Hint: Use an ANOVA with one IV to find out.)
# ANOVA with 1IV of 2 groups:
tg1.aov <- aov(formula = len ~ supp, data = tg)
summary(tg1.aov)
#> Df Sum Sq Mean Sq F value Pr(>F)
#> supp 1 205 205.35 3.668 0.0604 .
#> Residuals 58 3247 55.98
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Check with linear model:
tg1.lm <- lm(formula = len ~ supp, data = tg)
summary(tg1.lm)
#>
#> Call:
#> lm(formula = len ~ supp, data = tg)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -12.7633 -5.7633 0.4367 5.5867 16.9367
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 20.663 1.366 15.127 <2e-16 ***
#> suppVC -3.700 1.932 -1.915 0.0604 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 7.482 on 58 degrees of freedom
#> Multiple R-squared: 0.05948, Adjusted R-squared: 0.04327
#> F-statistic: 3.668 on 1 and 58 DF, p-value: 0.06039
Answer: There is no systematic effect of delivery method supp
on length: \(F(1, 58) = 3.67\), \(p = 0.06\).
2b. As the test in 2a. only involved two groups, you could have used a simpler test to check the effects of supp
on len
. Conduct the corresponding t-test
to check the results of your ANOVA.
t.test(x = tg$len[tg$supp == "VC"],
y = tg$len[tg$supp == "OJ"])
#>
#> Welch Two Sample t-test
#>
#> data: tg$len[tg$supp == "VC"] and tg$len[tg$supp == "OJ"]
#> t = -1.9153, df = 55.309, p-value = 0.06063
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -7.5710156 0.1710156
#> sample estimates:
#> mean of x mean of y
#> 16.96333 20.66333
Answer: Essentially the same result: mean difference = 3.7, t(55.31) = -1.92, p = 0.06 (2-tailed).
3a. Does the level of the vitamin C dose
have a systematic effect on len
? (Hint: Use an ANOVA with one IV to find out and note that the as.factor()
can be used to convert numeric variables into factors.)
is.factor(tg$dose) # is a numeric variable
#> [1] FALSE
tg$dose2 <- as.factor(tg$dose) # convert to factor variable
is.factor(tg$dose2) # check
#> [1] TRUE
# ANOVA with 1IV of 3 groups:
tg2.aov <- aov(formula = len ~ dose2, data = tg)
summary(tg2.aov)
#> Df Sum Sq Mean Sq F value Pr(>F)
#> dose2 2 2426 1213 67.42 9.53e-16 ***
#> Residuals 57 1026 18
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Check with linear model:
tg2.lm <- aov(formula = len ~ dose2, data = tg)
summary(tg2.lm)
#> Df Sum Sq Mean Sq F value Pr(>F)
#> dose2 2 2426 1213 67.42 9.53e-16 ***
#> Residuals 57 1026 18
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Answer: Yes, there is a significant effect of dose
on len
: \(F(2, 57) = 67.4\), \(p < .01\).
3b. More specifically, which levels of the vitamin C dose
have a systematic effect on len
? (Hint: Use post-hoc tests on the ANOVA of 3a. to find out.)
# Post-hoc tests:
TukeyHSD(tg2.aov)
#> Tukey multiple comparisons of means
#> 95% family-wise confidence level
#>
#> Fit: aov(formula = len ~ dose2, data = tg)
#>
#> $dose2
#> diff lwr upr p adj
#> 1-0.5 9.130 5.901805 12.358195 0.00e+00
#> 2-0.5 15.495 12.266805 18.723195 0.00e+00
#> 2-1 6.365 3.136805 9.593195 4.25e-05
Answer: All pairwise comparisons between three levels of dose
are different from each other (at the \(p < .01\) level).
3c. Check the result of the post-hoc test for the difference between a dose of 1 and a dose of 2 with a corresponding t-test.
# t-test of len by dose == 1 vs. dose == 2:
t.test(x = tg$len[tg$dose == 1],
y = tg$len[tg$dose == 2])
#>
#> Welch Two Sample t-test
#>
#> data: tg$len[tg$dose == 1] and tg$len[tg$dose == 2]
#> t = -4.9005, df = 37.101, p-value = 1.906e-05
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -8.996481 -3.733519
#> sample estimates:
#> mean of x mean of y
#> 19.735 26.100
Answer: Yes, they are different indeed: mean difference = 6.37, t(37.1) = -4.9, p < 0.01 (2-tailed).
ANOVAs with 2 IVs
4a. Test the main effects of delivery method supp
and dosage dose
on len
in one analysis. (Hint: Use an aov()
with 2 independent variables – combined by the +
operator – and appropriate post-hoc tests in case of significant main effects.)
# ANOVA with 2IVs of 2x3 groups: Only main effects
tg3.aov <- aov(formula = len ~ supp + dose2, data = tg)
summary(tg3.aov)
#> Df Sum Sq Mean Sq F value Pr(>F)
#> supp 1 205.4 205.4 14.02 0.000429 ***
#> dose2 2 2426.4 1213.2 82.81 < 2e-16 ***
#> Residuals 56 820.4 14.7
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(tg3.aov)
#> Tukey multiple comparisons of means
#> 95% family-wise confidence level
#>
#> Fit: aov(formula = len ~ supp + dose2, data = tg)
#>
#> $supp
#> diff lwr upr p adj
#> VC-OJ -3.7 -5.679762 -1.720238 0.0004293
#>
#> $dose2
#> diff lwr upr p adj
#> 1-0.5 9.130 6.215909 12.044091 0e+00
#> 2-0.5 15.495 12.580909 18.409091 0e+00
#> 2-1 6.365 3.450909 9.279091 7e-06
# Check with linear model:
tg3.lm <- lm(formula = len ~ supp + dose2, data = tg)
summary(tg3.lm)
#>
#> Call:
#> lm(formula = len ~ supp + dose2, data = tg)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -7.085 -2.751 -0.800 2.446 9.650
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 12.4550 0.9883 12.603 < 2e-16 ***
#> suppVC -3.7000 0.9883 -3.744 0.000429 ***
#> dose21 9.1300 1.2104 7.543 4.38e-10 ***
#> dose22 15.4950 1.2104 12.802 < 2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 3.828 on 56 degrees of freedom
#> Multiple R-squared: 0.7623, Adjusted R-squared: 0.7496
#> F-statistic: 59.88 on 3 and 56 DF, p-value: < 2.2e-16
Answer: Both main effects are significant:
- Effect of
supp
onlen
: \(F(1, 56) = 14.0\), \(p < .01\). (Note that this main effect was not found above!) - Effect of
dose
onlen
: \(F(2, 56) = 82.8\), \(p < .01\) (as above).
4b. Test the main effects and a potential interaction of the delivery method supp
and dosage dose
on len
in one analysis. (Hint: Use an aov()
with 2 independent variables – combined by the *
operator – and appropriate post-hoc tests in case of significant main effects or interactions.)
# ANOVA with 2IVs of 2x3 groups: with interaction:
tg4.aov <- aov(formula = len ~ supp * dose2, data = tg)
summary(tg4.aov)
#> Df Sum Sq Mean Sq F value Pr(>F)
#> supp 1 205.4 205.4 15.572 0.000231 ***
#> dose2 2 2426.4 1213.2 92.000 < 2e-16 ***
#> supp:dose2 2 108.3 54.2 4.107 0.021860 *
#> Residuals 54 712.1 13.2
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(tg4.aov)
#> Tukey multiple comparisons of means
#> 95% family-wise confidence level
#>
#> Fit: aov(formula = len ~ supp * dose2, data = tg)
#>
#> $supp
#> diff lwr upr p adj
#> VC-OJ -3.7 -5.579828 -1.820172 0.0002312
#>
#> $dose2
#> diff lwr upr p adj
#> 1-0.5 9.130 6.362488 11.897512 0.0e+00
#> 2-0.5 15.495 12.727488 18.262512 0.0e+00
#> 2-1 6.365 3.597488 9.132512 2.7e-06
#>
#> $`supp:dose2`
#> diff lwr upr p adj
#> VC:0.5-OJ:0.5 -5.25 -10.048124 -0.4518762 0.0242521
#> OJ:1-OJ:0.5 9.47 4.671876 14.2681238 0.0000046
#> VC:1-OJ:0.5 3.54 -1.258124 8.3381238 0.2640208
#> OJ:2-OJ:0.5 12.83 8.031876 17.6281238 0.0000000
#> VC:2-OJ:0.5 12.91 8.111876 17.7081238 0.0000000
#> OJ:1-VC:0.5 14.72 9.921876 19.5181238 0.0000000
#> VC:1-VC:0.5 8.79 3.991876 13.5881238 0.0000210
#> OJ:2-VC:0.5 18.08 13.281876 22.8781238 0.0000000
#> VC:2-VC:0.5 18.16 13.361876 22.9581238 0.0000000
#> VC:1-OJ:1 -5.93 -10.728124 -1.1318762 0.0073930
#> OJ:2-OJ:1 3.36 -1.438124 8.1581238 0.3187361
#> VC:2-OJ:1 3.44 -1.358124 8.2381238 0.2936430
#> OJ:2-VC:1 9.29 4.491876 14.0881238 0.0000069
#> VC:2-VC:1 9.37 4.571876 14.1681238 0.0000058
#> VC:2-OJ:2 0.08 -4.718124 4.8781238 1.0000000
# Check with linear model:
tg4.lm <- lm(formula = len ~ supp * dose2, data = tg)
summary(tg4.lm)
#>
#> Call:
#> lm(formula = len ~ supp * dose2, data = tg)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -8.20 -2.72 -0.27 2.65 8.27
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 13.230 1.148 11.521 3.60e-16 ***
#> suppVC -5.250 1.624 -3.233 0.00209 **
#> dose21 9.470 1.624 5.831 3.18e-07 ***
#> dose22 12.830 1.624 7.900 1.43e-10 ***
#> suppVC:dose21 -0.680 2.297 -0.296 0.76831
#> suppVC:dose22 5.330 2.297 2.321 0.02411 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 3.631 on 54 degrees of freedom
#> Multiple R-squared: 0.7937, Adjusted R-squared: 0.7746
#> F-statistic: 41.56 on 5 and 54 DF, p-value: < 2.2e-16
Answer: Both main effects (of supp
and of dose
) and their interaction are significant.
Checkpoint 1
At this point you completed all basic exercises. This is good, but additional practice will deepen your understanding, so please keep carrying on…
B. At Home
Facebook attraction
In this WPA, you will analyze data from a ficticious study on attraction. In the study, 1,000 heterosexual University students viewed the Facebook profile of another student (the “target” person) of the opposite sex. Based on a target person’s profile, each participant made three judgments about the target: their perceived intelligence, attractiveness, and dateability. The primary judgment of interest was the dateability rating indicating as how dateable the target person was perceived (ranging from a minimum value of 0 to a maximum value of 100).
Data description
The data file contains 1,000 rows and 10 columns. Here are the columns:
session
: The experiment session in which the study was run. There were 50 sessions in total.sex
: The sex of the target person (“m” vs. “f”).age
: The age of the target person (in years).haircolor
: The hair color of the target person.university
: The university that the target person attended.education
: The highest level of education obtained by the target person.shirtless
: Did the target person have a shirtless profile picture? (1.No vs. 2.Yes).intelligence
: As how intelligent do you rate this target? (1.Low, 2.Medium, 3.High).attractiveness
: As how physically attractive do you rate this target? (1.Low, 2.Medium, 3.High).dateability
: As how dateable do you rate this target person? (Scale from 0 to 100).
Data loading and exploration
5a. The data are located in a tab-delimited text file at http://Rpository.com/down/data/WPA07_facebook.txt. Using read.table()
load this data into R as a new object called facebook
.
Here is how the first few rows of the data should look:
head(facebook)
#> session sex age haircolor university education shirtless
#> 1 1 m 23 brown 3.Goettingen 3.Masters 2.Yes
#> 2 1 m 19 blonde 2.Freiburg 1.HighSchool 1.No
#> 3 1 f 22 brown 2.Freiburg 2.Bachelors 2.Yes
#> 4 1 f 22 red 2.Freiburg 2.Bachelors 1.No
#> 5 1 m 23 brown 3.Goettingen 2.Bachelors 1.No
#> 6 1 m 26 blonde 2.Freiburg 3.Masters 2.Yes
#> intelligence attractiveness dateability
#> 1 1.low 3.high 15
#> 2 2.medium 2.medium 44
#> 3 1.low 2.medium 100
#> 4 2.medium 3.high 100
#> 5 2.medium 2.medium 63
#> 6 3.high 3.high 76
5b. Inspect the first few rows of the dataframe with the head()
function to make sure it loaded correctly. Using the str()
function, look at the structure of the dataframe to make sure everything looks ok.
#> session sex age haircolor university education shirtless
#> 1 1 m 23 brown 3.Goettingen 3.Masters 2.Yes
#> 2 1 m 19 blonde 2.Freiburg 1.HighSchool 1.No
#> 3 1 f 22 brown 2.Freiburg 2.Bachelors 2.Yes
#> 4 1 f 22 red 2.Freiburg 2.Bachelors 1.No
#> 5 1 m 23 brown 3.Goettingen 2.Bachelors 1.No
#> 6 1 m 26 blonde 2.Freiburg 3.Masters 2.Yes
#> intelligence attractiveness dateability
#> 1 1.low 3.high 15
#> 2 2.medium 2.medium 44
#> 3 1.low 2.medium 100
#> 4 2.medium 3.high 100
#> 5 2.medium 2.medium 63
#> 6 3.high 3.high 76
#> 'data.frame': 1000 obs. of 10 variables:
#> $ session : int 1 1 1 1 1 1 1 1 1 1 ...
#> $ sex : Factor w/ 2 levels "f","m": 2 2 1 1 2 2 1 2 1 1 ...
#> $ age : int 23 19 22 22 23 26 19 25 22 19 ...
#> $ haircolor : Factor w/ 3 levels "blonde","brown",..: 2 1 2 3 2 1 2 3 2 1 ...
#> $ university : Factor w/ 3 levels "1.Konstanz","2.Freiburg",..: 3 2 2 2 3 2 3 2 2 3 ...
#> $ education : Factor w/ 4 levels "1.HighSchool",..: 3 1 2 2 2 3 1 3 2 1 ...
#> $ shirtless : Factor w/ 2 levels "1.No","2.Yes": 2 1 2 1 1 2 1 2 2 1 ...
#> $ intelligence : Factor w/ 3 levels "1.low","2.medium",..: 1 2 1 2 2 3 1 3 2 2 ...
#> $ attractiveness: Factor w/ 3 levels "1.low","2.medium",..: 3 2 2 3 2 3 2 1 2 2 ...
#> $ dateability : int 15 44 100 100 63 76 61 26 76 40 ...
One-way ANOVAs
6a. Was there a main effect of the university
on dateability
? Conduct a one-way ANOVA to find out. If the result is significant (i.e., \(p < .05\)), conduct post-hoc tests.
unidate.aov <- aov(formula = dateability ~ university, data = facebook)
summary(unidate.aov)
#> Df Sum Sq Mean Sq F value Pr(>F)
#> university 2 19769 9884 13.99 1.01e-06 ***
#> Residuals 997 704173 706
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(unidate.aov)
#> Tukey multiple comparisons of means
#> 95% family-wise confidence level
#>
#> Fit: aov(formula = dateability ~ university, data = facebook)
#>
#> $university
#> diff lwr upr p adj
#> 2.Freiburg-1.Konstanz -10.275500 -15.099617 -5.451383 0.0000020
#> 3.Goettingen-1.Konstanz -8.641689 -13.586611 -3.696767 0.0001311
#> 3.Goettingen-2.Freiburg 1.633811 -3.122189 6.389811 0.6991862
Answer: There was a significant main effect of university on dateability (\(F(2, 997) = 13.99\), \(p < .01\)). Pairwise Tukey HSD tests showed significant differences between Freiburg and Konstanz (\(diff = -10.28\), \(p < .01\)) and between Goettingen and Konstanz (\(diff = -8.64\), \(p < .01\)). The pairwise difference between Goettingen and Freiburg was not significant at the 0.05 significance threshold.
6b. Was there a main effect of haircolor
on dateability
? Conduct a one-way ANOVA to find out. If the result is significant (p < .05), conduct post-hoc tests.
hairdate.aov <- aov(formula = dateability ~ haircolor, data = facebook)
summary(hairdate.aov)
#> Df Sum Sq Mean Sq F value Pr(>F)
#> haircolor 2 1779 889.7 1.228 0.293
#> Residuals 997 722162 724.3
Answer: There was no significant main effect of haircolor on dateability (\(F(2, 997) = 1.23\), \(p = 0.29\)).
6c. Was there a main effect of intelligence
on dateability
? Conduct a one-way ANOVA to find out. If the result is significant (\(p < .05\)), conduct post-hoc tests.
intellidate.aov <- aov(formula = dateability ~ intelligence, data = facebook)
summary(intellidate.aov)
#> Df Sum Sq Mean Sq F value Pr(>F)
#> intelligence 2 7070 3535 4.917 0.0075 **
#> Residuals 997 716871 719
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(intellidate.aov)
#> Tukey multiple comparisons of means
#> 95% family-wise confidence level
#>
#> Fit: aov(formula = dateability ~ intelligence, data = facebook)
#>
#> $intelligence
#> diff lwr upr p adj
#> 2.medium-1.low 6.535264 1.615557 11.454970 0.0053173
#> 3.high-1.low 2.865833 -2.070710 7.802375 0.3610523
#> 3.high-2.medium -3.669431 -8.458242 1.119380 0.1705679
Answer: There was a significant main effect of intelligence on dateability (\(F(2, 997) = 4.97\), \(p < .01\)). Pairwise Tukey HSD tests showed significant differences between Medium and Low intelligence (\(diff = 6.53\), \(p < .01\)). All other pairwise differences were not significant at the 0.05 significance threshold.
Multi-independent ANOVAs
7a. Conduct a three-way ANOVA on dateability
with both intelligence
, university
and haircolor
as independent variables (IVs). Do your results for each variable change compared to your previous one-way ANOVAs on these variables? (You do not need to provide the full APA results or conduct post-hoc tests, just verbally answer the question.)
iuhdate.aov <- aov(formula = dateability ~ intelligence + university + haircolor,
data = facebook)
summary(iuhdate.aov)
#> Df Sum Sq Mean Sq F value Pr(>F)
#> intelligence 2 7070 3535 5.043 0.00662 **
#> university 2 18838 9419 13.436 1.75e-06 ***
#> haircolor 2 1921 960 1.370 0.25462
#> Residuals 993 696112 701
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Answer: The results are the same (and should be).
7b. Conduct a multi-way ANOVA predicting dateability
by sex
, haircolor
, university
, education
, shirtless
, intelligence
and attractiveness
as independent variables (IVs). Which of these variables are significantly related to dateability? (Do write APA results for each variable, but do not conduct post-hoc tests.)
multidate.aov <- aov(formula = dateability ~ sex + haircolor + university + education
+ shirtless + intelligence + attractiveness,
data = facebook)
summary(multidate.aov)
#> Df Sum Sq Mean Sq F value Pr(>F)
#> sex 1 36514 36514 66.103 1.28e-15 ***
#> haircolor 2 1869 935 1.692 0.18466
#> university 2 19172 9586 17.354 3.92e-08 ***
#> education 3 3186 1062 1.922 0.12424
#> shirtless 1 106 106 0.192 0.66106
#> intelligence 2 5904 2952 5.345 0.00491 **
#> attractiveness 2 112544 56272 101.872 < 2e-16 ***
#> Residuals 986 544646 552
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Answer: There were significant effects of sex (\(F(1, 986) = 66.10\), \(p < .01\)), university (\(F(2, 986) = 17.35\), \(p < .01\)), intelligence (\(F(2, 986) = 5.34\), \(p < .01\)) and attractiveness (\(F(2, 986) = 101.87\), \(p < .01\)) on the target person’s perceived dateability.
Checkpoint 2
If you got this far you’re doing great, but don’t give up just yet…
ANOVAs on subsets of data
8. It turns out that the (male) experimenter who ran the experimental Sessions 1 through 30 was trying to score a date and slipped in his own profile picture into the study. Thus, you wonder whether you can trust the data of these sessions. Repeat your multi ANOVA from question 7b. ONLY for Sessions 31 through 50. Do your conclusions change when compared to the data from all sessions?
multidate2.aov <- aov(formula = dateability ~ sex + haircolor + university + education
+ shirtless + intelligence + attractiveness,
data = subset(facebook, session > 30))
summary(multidate2.aov)
#> Df Sum Sq Mean Sq F value Pr(>F)
#> sex 1 6933 6933 12.425 0.000474 ***
#> haircolor 2 2459 1230 2.203 0.111806
#> university 2 12099 6050 10.842 2.62e-05 ***
#> education 3 2059 686 1.230 0.298479
#> shirtless 1 184 184 0.330 0.565884
#> intelligence 2 543 271 0.487 0.615135
#> attractiveness 2 36084 18042 32.334 1.04e-13 ***
#> Residuals 386 215386 558
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Answer: The results look about the same.
Interactions
9a. Create a plot (e.g., a pirateplot()
, barplot()
, or boxplot()
) showing the distribution of dateability
based on two independent variables of sex
and shirtless
. Based on what you see in the plot, do you expect there to be an interaction between these two independent variables? Why or why not?
yarrr::pirateplot(dateability ~ sex + shirtless,
data = facebook,
main = "Effects of sex and shirtlessness on dateability",
xlab = "IVs",
ylab = "dateability",
gl.col = "gray",
pal = "google")
Answer: This looks like an interaction (i.e., the effect due to one IV varies as a function of another IV): Female dateability appears to be selectively increased by shirtlessness imagery.
9b. Test your prediction with the appropriate ANOVA.
sexshirtdate.aov <- aov(dateability ~ sex * shirtless,
data = facebook)
summary(sexshirtdate.aov)
#> Df Sum Sq Mean Sq F value Pr(>F)
#> sex 1 36514 36514 62.689 6.43e-15 ***
#> shirtless 1 247 247 0.425 0.515
#> sex:shirtless 1 107048 107048 183.786 < 2e-16 ***
#> Residuals 996 580132 582
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Answer: Yep, interaction indeed.
Checkpoint 3
If you got this far you’re doing an amazing job — well done! Enjoy the following challenge…
C. Challenges
More interactions
10a. Create a plot (e.g., a pirateplot()
, barplot()
, or boxplot()
) showing the distribution of dateability
based on two independent variables: sex
and intelligence
. Based on what you see in this plot, do you expect there to be an interaction between the two IVs? Why or why not?
yarrr::pirateplot(dateability ~ intelligence + sex,
data = facebook,
main = "Effects of intelligence on dateability",
xlab = "IVs",
ylab = "dateability",
gl.col = "gray",
pal = "google")
Answer: This also looks like an interaction (i.e., the effect due to one IV varies as a function of another IV): The dateability of males seems to increase with intelligence, whereas the dateability of female doesn’t. (Beware that this is fake data, which often tends to correspond to overly simplistic cliches…)
10b. Test your prediction with the appropriate ANOVA.
intellisexdate.aov <- aov(dateability ~ intelligence + sex, data = facebook)
summary(intellisexdate.aov)
#> Df Sum Sq Mean Sq F value Pr(>F)
#> intelligence 2 7070 3535 5.172 0.00583 **
#> sex 1 36022 36022 52.696 7.83e-13 ***
#> Residuals 996 680849 684
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
intellisexdate.aov <- aov(dateability ~ intelligence + sex, data = facebook)
summary(intellisexdate.aov)
#> Df Sum Sq Mean Sq F value Pr(>F)
#> intelligence 2 7070 3535 5.172 0.00583 **
#> sex 1 36022 36022 52.696 7.83e-13 ***
#> Residuals 996 680849 684
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Answer: Yep, interaction indeed.
11a. Create a plot (e.g., a pirateplot()
, barplot()
, or boxplot()
) showing the distribution of dateability
based on two independent variables: haircolor
and university
. Based on what you see in the plot, do you expect there to be an interaction between haircolor
and university
? Why or why not?
yarrr::pirateplot(dateability ~ haircolor + university,
data = facebook,
main = "Effects of haircolor and university on dateability",
xlab = "IVs",
ylab = "dateability",
gl.col = "gray",
pal = "google")
Answer: This suggests a potential main effect (of Konstanz possibly scoring higher than the other two universities), but does not look like an interaction (i.e., the effect due to one IV does not seem to vary as a function of another IV).
11b. Test your prediction with the appropriate ANOVA.
hairunidate.aov <- aov(dateability ~ haircolor * university, data = facebook)
summary(hairunidate.aov)
#> Df Sum Sq Mean Sq F value Pr(>F)
#> haircolor 2 1779 890 1.257 0.285
#> university 2 19636 9818 13.873 1.14e-06 ***
#> haircolor:university 4 1217 304 0.430 0.787
#> Residuals 991 701309 708
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(hairunidate.aov)
#> Tukey multiple comparisons of means
#> 95% family-wise confidence level
#>
#> Fit: aov(formula = dateability ~ haircolor * university, data = facebook)
#>
#> $haircolor
#> diff lwr upr p adj
#> brown-blonde -0.224 -5.060707 4.612707 0.9935063
#> red-blonde 2.924 -2.660948 8.508948 0.4363527
#> red-brown 3.148 -1.688707 7.984707 0.2783324
#>
#> $university
#> diff lwr upr p adj
#> 2.Freiburg-1.Konstanz -10.253202 -15.082095 -5.424310 0.0000022
#> 3.Goettingen-1.Konstanz -8.585490 -13.535307 -3.635673 0.0001492
#> 3.Goettingen-2.Freiburg 1.667712 -3.092995 6.428420 0.6892951
#>
#> $`haircolor:university`
#> diff lwr upr
#> brown:1.Konstanz-blonde:1.Konstanz -1.10629558 -12.589148 10.37655727
#> red:1.Konstanz-blonde:1.Konstanz 3.93401899 -9.182782 17.05081981
#> blonde:2.Freiburg-blonde:1.Konstanz -8.73592782 -21.358082 3.88622634
#> brown:2.Freiburg-blonde:1.Konstanz -10.91331247 -22.093032 0.26640725
#> red:2.Freiburg-blonde:1.Konstanz -8.71804145 -21.434850 3.99876671
#> blonde:3.Goettingen-blonde:1.Konstanz -11.08367582 -24.326826 2.15947399
#> brown:3.Goettingen-blonde:1.Konstanz -7.90421201 -19.154051 3.34562692
#> red:3.Goettingen-blonde:1.Konstanz -5.88607595 -19.044060 7.27190802
#> red:1.Konstanz-brown:1.Konstanz 5.04031457 -6.395324 16.47595322
#> blonde:2.Freiburg-brown:1.Konstanz -7.62963224 -18.494352 3.23508787
#> brown:2.Freiburg-brown:1.Konstanz -9.80701689 -18.956307 -0.65772728
#> red:2.Freiburg-brown:1.Konstanz -7.61174587 -18.586288 3.36279636
#> blonde:3.Goettingen-brown:1.Konstanz -9.97738024 -21.557725 1.60296450
#> brown:3.Goettingen-brown:1.Konstanz -6.79791642 -16.032755 2.43692214
#> red:3.Goettingen-brown:1.Konstanz -4.77978037 -16.262633 6.70307249
#> blonde:2.Freiburg-red:1.Konstanz -12.66994681 -25.249164 -0.09072993
#> brown:2.Freiburg-red:1.Konstanz -14.84733146 -25.978551 -3.71611164
#> red:2.Freiburg-red:1.Konstanz -12.65206044 -25.326252 0.02213111
#> p adj
#> brown:1.Konstanz-blonde:1.Konstanz 0.9999981
#> red:1.Konstanz-blonde:1.Konstanz 0.9911019
#> blonde:2.Freiburg-blonde:1.Konstanz 0.4386690
#> brown:2.Freiburg-blonde:1.Konstanz 0.0619905
#> red:2.Freiburg-blonde:1.Konstanz 0.4524943
#> blonde:3.Goettingen-blonde:1.Konstanz 0.1870409
#> brown:3.Goettingen-blonde:1.Konstanz 0.4168367
#> red:3.Goettingen-blonde:1.Konstanz 0.9014619
#> red:1.Konstanz-brown:1.Konstanz 0.9088545
#> blonde:2.Freiburg-brown:1.Konstanz 0.4175945
#> brown:2.Freiburg-brown:1.Konstanz 0.0250357
#> red:2.Freiburg-brown:1.Konstanz 0.4355857
#> blonde:3.Goettingen-brown:1.Konstanz 0.1568700
#> brown:3.Goettingen-brown:1.Konstanz 0.3501373
#> red:3.Goettingen-brown:1.Konstanz 0.9331675
#> blonde:2.Freiburg-red:1.Konstanz 0.0467834
#> brown:2.Freiburg-red:1.Konstanz 0.0012144
#> red:2.Freiburg-red:1.Konstanz 0.0508063
#> [ reached getOption("max.print") -- omitted 18 rows ]
Answer: Right again: A main effect of university (with Konstanz winning, of course), but no interaction here.
That’s it – hope you enjoyed working on this assignment!
[WPA07_answers.Rmd
updated on 2017-12-14 15:07:16 by hn.]