Introduction
Content
This file contains descriptions of datasets used in this course and their sources.
Course coordinates
- PSY-15150, at the University of Konstanz by Hansjörg Neth (h.neth@uni.kn, SPDS, office D507).
- Summer 2019: Mondays, 15:15–16:45, D435.
- Links to current course syllabus | ZeUS | Ilias
Datasets
1. Positive psychology
Introduction
In a highly-cited publication, Seligman, Steen, Park, and Peterson (2005) suggest that positive psychology interventions (PPIs) contain specific, powerful, therapeutic ingredients that cause larger increases in happiness and larger reductions in depression than a placebo control. The study by Woodworth et al. (2017) re-examines this claim by comparing the three most effective PPIs (identical to the interventions used by Seligman et al., 2005) to a placebo control in a web‐based, randomized assignment design.
Data sources
Articles reporting original research:
Seligman, M. E., Steen, T. A., Park, N., & Peterson, C. (2005). Positive psychology progress: Empirical validation of interventions. American Psychologist, 60(5), 410–421. doi: https://doi.org/10.1037/0003-066X.60.5.410
Woodworth, R. J., O’Brien‐Malone, A., Diamond, M. R., & Schüz, B. (2017). Web‐based positive psychology interventions: A reexamination of effectiveness. Journal of Clinical Psychology, 73(3), 218–232. doi: https://doi.org/10.1002/jclp.22328
Article on data used here:
Woodworth, R. J., O’Brien-Malone, A., Diamond, M. R., & Schüz, B. (2018). Data from, ‘Web-based Positive Psychology Interventions: A Reexamination of Effectiveness’. Journal of Open Psychology Data, 6: 1. doi: https://doi.org/10.5334/jopd.35
See https://openpsychologydata.metajnl.com/articles/10.5334/jopd.35/ for details.
The dataset is available from figshare at https://doi.org/10.6084/m9.figshare.1577563.v1.
Codebook
Description of the variables and values contained in the 2 original data files:
1. File posPsy_participants.csv
The file posPsy_participants.csv contains 6 variables with demographic information on 295 participants:
id
: participant ID
intervention
: 3 positive psychology interventions (PPIs), plus 1 control condition:
- 1 = “Using signature strengths”,
- 2 = “Three good things”,
- 3 = “Gratitude visit”,
- 4 = “Recording early memories” (control condition).
sex
: participant’s sex:
- 1 = female,
- 2 = male.
age
: participant’s age (in years).
educ
: level of education:
- 1 = Less than Year 12,
- 2 = Year 12,
- 3 = Vocational training,
- 4 = Bachelor’s degree,
- 5 = Postgraduate degree.
income
: level of income:
- 1 = below average,
- 2 = average,
- 3 = above average.
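The numeric codes above become easier to read once they carry their labels. A minimal sketch in base R, assuming a data frame whose columns intervention, sex, educ, and income are coded as in this codebook; the three rows below are hypothetical toy values, not taken from the dataset:

```r
# Hypothetical toy rows coded as in the codebook (not real participant data):
p_info <- data.frame(
  id           = 1:3,
  intervention = c(1, 4, 2),
  sex          = c(1, 2, 1),
  educ         = c(4, 2, 5),
  income       = c(2, 3, 1)
)

# Nominal variables: map each code to its label.
p_info$intervention <- factor(p_info$intervention, levels = 1:4,
                              labels = c("Using signature strengths", "Three good things",
                                         "Gratitude visit", "Recording early memories"))
p_info$sex <- factor(p_info$sex, levels = 1:2, labels = c("female", "male"))

# Ranked variables: ordered factors keep the order of the codes.
p_info$educ <- factor(p_info$educ, levels = 1:5,
                      labels = c("Less than Year 12", "Year 12", "Vocational training",
                                 "Bachelor's degree", "Postgraduate degree"),
                      ordered = TRUE)
p_info$income <- factor(p_info$income, levels = 1:3,
                        labels = c("below average", "average", "above average"),
                        ordered = TRUE)

str(p_info)
```

Treating educ and income as ordered factors is a design choice that reflects their ranked codes, while intervention and sex are nominal. The same calls should work on posPsy_p_info once it has been loaded (see Loading data).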
2. File posPsy_AHI_CESD.csv
The file posPsy_AHI_CESD.csv contains answers to the 24 items of the Authentic Happiness Inventory (AHI) and answers to the 20 items of the Center for Epidemiological Studies Depression (CES-D) scale (see Radloff, 1977) for multiple (1 to 6) measurement occasions:
id
: Participant ID
occasion
: Measurement occasion:
- 0 = Pretest (i.e., at enrolment),
- 1 = Posttest (i.e., 7 days after pretest),
- 2 = 1-week follow-up (i.e., 14 days after pretest, 7 days after posttest),
- 3 = 1-month follow-up (i.e., 38 days after pretest, 31 days after posttest),
- 4 = 3-month follow-up (i.e., 98 days after pretest, 91 days after posttest),
- 5 = 6-month follow-up (i.e., 189 days after pretest, 182 days after posttest).
elapsed.days
: Time since enrolment (in fractional days)
intervention
: Intervention group (1 to 4)
ahi01–ahi24
: Responses on the 24 AHI items
cesd01–cesd20
: Responses on the 20 CES-D items
ahiTotal
: Total AHI score
cesdTotal
: Total CES-D score
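The schedule above encodes a simple arithmetic relation: because the posttest takes place on day 7, each follow-up lies 7 days later relative to the pretest than relative to the posttest. A small lookup table makes this nominal schedule explicit and can be attached to long-format data. The column names label, days_after_pretest, and days_after_posttest and the toy rows are hypothetical additions; the actual time since enrolment is recorded in elapsed.days.

```r
# Nominal schedule of the 6 measurement occasions (values from the codebook):
occasions <- data.frame(
  occasion = 0:5,
  label    = c("Pretest", "Posttest", "1-week follow-up",
               "1-month follow-up", "3-month follow-up", "6-month follow-up"),
  days_after_pretest = c(0, 7, 14, 38, 98, 189)
)

# The posttest is on day 7, so days after posttest = days after pretest - 7:
occasions$days_after_posttest <- occasions$days_after_pretest - 7

# Hypothetical long-format rows (id, occasion) to be annotated:
toy_long <- data.frame(id = c(1, 1, 2), occasion = c(0, 3, 5))
annotated <- merge(toy_long, occasions, by = "occasion")
annotated
```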
Getting the data
Files available
The following files were generated from the original data files (and saved in .csv format):
posPsy_participants.csv
: Original participant data (295 rows x 6 variables):
http://rpository.com/ds4psy/data/posPsy_participants.csv
posPsy_AHI_CESD.csv
: Original data of dependent measures in long format (992 rows x 50 variables):
http://rpository.com/ds4psy/data/posPsy_AHI_CESD.csv
posPsy_AHI_CESD_corrected.csv
: Corrected version of dependent measures in long format (990 rows x 50 variables):
http://rpository.com/ds4psy/data/posPsy_AHI_CESD_corrected.csv
posPsy_data_wide.csv
: Corrected version of all data joined in wide format (295 rows x 294 variables):
http://rpository.com/ds4psy/data/posPsy_data_wide.csv
In the wide format, variables from different measurement occasions are suffixed by .0, .1, …, .5.
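Because some variable names already contain a dot (such as elapsed.days from the long file), the occasion suffix is best matched at the end of each name only; splitting names at every dot would not separate them as intended. A sketch with tidyr::pivot_longer() (available from tidyr 1.0.0 onwards), using a hypothetical two-participant toy table with made-up values:

```r
library(tidyr)

# Hypothetical toy wide data for 2 participants and 2 occasions (values made up).
# Note that "elapsed.days" already contains a dot before the occasion suffix.
wide <- data.frame(
  id             = 1:2,
  elapsed.days.0 = c(0, 0),
  elapsed.days.1 = c(7.2, 8.1),
  ahiTotal.0     = c(63, 70),
  ahiTotal.1     = c(65, 72)
)

# Split each name into (variable, occasion) at the LAST dot only:
# "(.*)" greedily captures the variable name, "([0-5])" the occasion suffix.
long <- pivot_longer(wide, cols = -id,
                     names_to = c(".value", "occasion"),
                     names_pattern = "^(.*)\\.([0-5])$")
long$occasion <- as.integer(long$occasion)
long
```

The special name ".value" tells pivot_longer() to turn the first captured group into separate columns, so the result has one row per participant and occasion.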
Loading data
We can load data stored in csv-format into R by using the read_csv command (from the readr package, which is part of the tidyverse). Here, we obtain the data files from online sources (at http://rpository.com/ds4psy/):
# Packages:
library(tidyverse)
# Load csv-data files from online links:
# 1. Participant data:
posPsy_p_info <- read_csv(file = "http://rpository.com/ds4psy/data/posPsy_participants.csv")
dim(posPsy_p_info) # 295 x 6
#> [1] 295 6
# 2. Original DVs in long format:
AHI_CESD <- read_csv(file = "http://rpository.com/ds4psy/data/posPsy_AHI_CESD.csv")
dim(AHI_CESD) # 992 x 50
#> [1] 992 50
# 3. Corrected DVs in long format:
posPsy_long <- read_csv(file = "http://rpository.com/ds4psy/data/posPsy_AHI_CESD_corrected.csv")
dim(posPsy_long) # 990 x 50
#> [1] 990 50
# 4. Transformed and corrected version of all data (in wide format):
posPsy_wide <- read_csv(file = "http://rpository.com/ds4psy/data/posPsy_data_wide.csv")
dim(posPsy_wide) # 295 x 294
#> [1] 295 294
# Check number of missing values:
sum(is.na(posPsy_p_info)) # 0 missing values
#> [1] 0
sum(is.na(posPsy_long)) # 0 missing values
#> [1] 0
sum(is.na(posPsy_wide)) # 37440 missing values!
#> [1] 37440
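The large number of missing values in the wide format need not indicate a problem. The long format has one row per participant and completed measurement occasion, so its 990 rows fall short of the 295 × 6 = 1770 rows of a complete design. Spreading the data to one row per participant leaves the occasion-specific columns empty for every missed occasion. Assuming the 6 participant variables appear once and the remaining 288 columns split evenly into 48 per occasion, this gives 780 × 48 = 37440 empty cells, consistent with the count above. The toy example (hypothetical values) shows the mechanism with tidyr::pivot_wider() (tidyr 1.0.0 or later):

```r
library(tidyr)

# Hypothetical toy long data: participant 2 has no data for occasion 1.
long <- data.frame(
  id       = c(1, 1, 2),
  occasion = c(0, 1, 0),
  ahiTotal = c(63, 65, 70)
)

# One row per participant, one column per occasion:
wide <- pivot_wider(long, id_cols = id, names_from = occasion,
                    values_from = ahiTotal, names_prefix = "ahiTotal.")
wide
sum(is.na(wide))  # the missed occasion becomes 1 NA cell

# The same bookkeeping for the real files (assumes 48 columns per occasion):
missed_occasions  <- 295 * 6 - 990    # participant-occasions without data
vars_per_occasion <- (294 - 6) / 6    # occasion-specific columns in the wide file
missed_occasions * vars_per_occasion  # compare with sum(is.na(posPsy_wide))
```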
References
Radloff, L. S. (1977). The CES-D scale: A self report depression scale for research in the general population. Applied Psychological Measurement, 1, 385–401. doi: https://doi.org/10.1177/014662167700100306
Seligman, M. E., Steen, T. A., Park, N., & Peterson, C. (2005). Positive psychology progress: Empirical validation of interventions. American Psychologist, 60(5), 410–421. doi: https://doi.org/10.1037/0003-066X.60.5.410
Woodworth, R. J., O’Brien‐Malone, A., Diamond, M. R., & Schüz, B. (2017). Web‐based positive psychology interventions: A reexamination of effectiveness. Journal of Clinical Psychology, 73(3), 218–232. doi: https://doi.org/10.1002/jclp.22328
Woodworth, R. J., O’Brien-Malone, A., Diamond, M. R., & Schüz, B. (2018). Data from, ‘Web-based positive psychology interventions: A reexamination of effectiveness’. Journal of Open Psychology Data, 6: 1. doi: https://doi.org/10.5334/jopd.35
2. False positive psychology
Introduction
To highlight problematic research practices within psychology, Simmons, Nelson, and Simonsohn (2011) published a controversial article with a necessarily false finding. By conducting simulations and two simple behavioral experiments, the authors show that flexibility in data collection, analysis, and reporting dramatically increases the rate of false-positive findings.
Data sources
Articles reporting original research:
- Simmons, J.P., Nelson, L.D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366. doi: https://doi.org/10.1177/0956797611417632
Article on data used here:
- Simmons, J.P., Nelson, L.D., & Simonsohn, U. (2014). Data from paper “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant”. Journal of Open Psychology Data, 2(1), e1. doi: http://doi.org/10.5334/jopd.aa
See https://openpsychologydata.metajnl.com/articles/10.5334/jopd.aa/ for data. (A zip archive with .txt files is available at http://dx.doi.org/10.5281/zenodo.7664.)
Codebook
The study data are stored in 2 separate files: study1.xlsx and study2.xlsx. Both data files contain the same information about each participant in 17 variables:
age
: Days since participant was born (based on their self-reported birthday)
dad
: Father’s age in years
mom
: Mother’s age in years
female
: Is the participant a woman?
- 1: yes
- 2: no
root
: Did they get the square root of 100 right?
- 1: yes
- 2: no
bird
: Imagine a restaurant you really like offered a 30% discount for dining between 4 pm and 6 pm. How likely would you be to take advantage of that offer?
- 1: very unlikely to 7: very likely
political
: In the political spectrum, where would you place yourself?
- 1: very liberal
- 2: liberal
- 3: centrist
- 4: conservative
- 5: very conservative
quarterback
: If you had to guess who was chosen the quarterback of the year in Canada last year, which of the following four options would you choose?
- 1: Dalton Bell
- 2: Daryll Clark
- 3: Jarious Jackson
- 4: Frank Wilczynski
olddays
: How often have you referred to some past part of your life as “the good old days”?
- 11: Never
- 12: almost never
- 13: sometimes
- 14: often
- 15: very often
potato
: Did the participant hear the song ‘Hot Potato’ by the Australian band The Wiggles?
- 1: yes
- 2: no
when64
: Did the participant hear the song ‘When I am 64’ by the Beatles?
- 1: yes
- 2: no
kalimba
: Did the participant hear the song ‘Kalimba’ by Mr. Scruff?
- 1: yes
- 2: no
feelold
: How old do you feel?
- 1: very young
- 2: young
- 3: neither young nor old
- 4: old
- 5: very old
computer
: Computers are complicated machines
- 1: strongly disagree to 5: strongly agree
diner
: Imagine you were going to a diner for dinner tonight, how much do you think you would like the food?
- 1: dislike extremely to 9: like extremely
cond
: In which condition was the participant?
- control: Subject heard the song ‘Kalimba’ by Mr. Scruff
- potato: Subject heard the song ‘Hot Potato’ by the Australian band The Wiggles
- 64: Subject heard the song ‘When I am 64’ by the Beatles
aged365
: age in years
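For analyses, it can help to store cond as a factor whose first level is the control condition: R's default treatment contrasts (e.g., in lm()) then compare each song condition against control. The same approach gives ordered labels for political. A minimal sketch; the toy vectors below are hypothetical:

```r
# Hypothetical toy values, coded as in the codebook:
cond      <- c("potato", "control", "64", "control")
political <- c(3, 1, 5)

# Put the control condition (Kalimba) first, so it serves as the reference level:
cond_f <- factor(cond, levels = c("control", "potato", "64"))
levels(cond_f)
table(cond_f)

# Political orientation as an ordered factor from "very liberal" to "very conservative":
political_f <- factor(political, levels = 1:5,
                      labels = c("very liberal", "liberal", "centrist",
                                 "conservative", "very conservative"),
                      ordered = TRUE)
political_f
```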
Getting the data
Files available
The following file was generated from the original data files (and saved in .csv format):
falsePosPsy_all.csv
: Combines the 2 original datasets in one file:
http://rpository.com/ds4psy/data/falsePosPsy_all.csv
Two variables that denote the original study (1 vs. 2) and a unique participant ID (ranging from 1 to 78) have been added, so that the data file now contains 78 cases and 19 variables.
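Combining the two study files amounts to stacking their rows and recording where each row came from. A minimal sketch with dplyr::bind_rows(), whose .id argument adds a column identifying each input; the toy tables and the column names study and ID are hypothetical, and the names used in falsePosPsy_all.csv may differ. (The original .xlsx files themselves could be read with readxl::read_excel().)

```r
library(dplyr)

# Hypothetical toy versions of the two study files (same variables, made-up values):
study1 <- data.frame(age = c(7300, 8100), cond = c("control", "64"),
                     stringsAsFactors = FALSE)
study2 <- data.frame(age = 7700, cond = "potato", stringsAsFactors = FALSE)

# Stack the rows; .id = "study" records which input each row came from ("1" or "2"):
combined <- bind_rows(study1, study2, .id = "study")
combined$study <- as.integer(combined$study)

# Add a consecutive participant number across both studies (hypothetical name "ID"):
combined$ID <- seq_len(nrow(combined))
combined
```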
Loading data
# Load csv-data files from online links:
falsePosPsy_all <- readr::read_csv(file = "http://rpository.com/ds4psy/data/falsePosPsy_all.csv")
# Check:
dim(falsePosPsy_all) # 78 x 19
#> [1] 78 19
# Check number of missing values:
sum(is.na(falsePosPsy_all)) # 0 missing values
#> [1] 0
References
Simmons, J.P., Nelson, L.D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366. doi: https://doi.org/10.1177/0956797611417632
Simmons, J.P., Nelson, L.D., & Simonsohn, U. (2014). Data from paper “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant”. Journal of Open Psychology Data, 2(1), e1. doi: http://doi.org/10.5334/jopd.aa
Data at https://openpsychologydata.metajnl.com/articles/10.5334/jopd.aa/ or https://zenodo.org/record/7664.
Other sources of data
Data in base R
Every version of R comes with a collection of datasets:
## Get info on included datasets:
library(help = "datasets")
## Check some dimensions: -----
# dim(ChickWeight)
# dim(iris)
# Nile # Time series. See plot(Nile)
# dim(sleep) # Student's Sleep Data
# dim(Titanic) # also see dim(FFTrees::titanic)
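The commented dim() calls above can be run directly, since the datasets package is attached by default. A short sketch that also lists the names of all included datasets; the dimensions in the comments are those of the datasets shipped with R:

```r
# Names of all datasets in the base "datasets" package:
ds <- data(package = "datasets")$results[, "Item"]
length(ds)
head(ds)

# The datasets from the commented lines above:
dim(ChickWeight) # 578 rows, 4 columns
dim(iris)        # 150 rows, 5 columns
length(Nile)     # time series of 100 annual values
dim(sleep)       # 20 rows, 3 columns
dim(Titanic)     # 4 x 2 x 2 x 2 contingency table
```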
Data in R packages
Packages of the tidyverse:
- dplyr: starwars
- ggplot2: diamonds, mpg, msleep, etc.
- tidyr: table1, etc.
Other packages with large data sets include:
- babynames
- dslabs
- eurostat
- FFTrees: breastcancer, car, heartdisease, mushrooms, titanic, wine
- ISLR
- MASS
- nycflights13
- yarrr: pirates, movies, auction, etc.
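Datasets shipped with a package can be accessed with the pkg::name syntax without attaching the package. A quick check, assuming dplyr, ggplot2, and tidyr are installed; the number of columns in starwars has changed across dplyr versions, so only its rows are noted:

```r
# Access package datasets via pkg::name (packages need to be installed, not attached):
dim(ggplot2::diamonds)  # 53940 rows, 10 columns
dim(ggplot2::mpg)       # 234 rows, 11 columns
dim(ggplot2::msleep)    # 83 rows, 11 columns
dim(tidyr::table1)      # 6 rows, 4 columns
nrow(dplyr::starwars)   # 87 characters
```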
Online sources
The web is full of data, of course, but most of it requires sound data science skills and a healthy dose of scepticism to be of any use. Here are some good starting points for finding free data:
Collections:
Specific datasets:
- PanTHERIA: A species-level database of life history, ecology, and geography of extant and recently extinct mammals
Conclusion
All ds4psy essentials so far:
Nr. | Topic |
---|---|
0. | Syllabus |
1. | Basic R concepts and commands |
+. | Datasets |
[Last update on 2019-04-15 13:12:54 by hn.]