Introduction

Content

This file contains descriptions of datasets used in this course and their sources.

Course coordinates

PSY-15150, at the University of Konstanz by Hansjörg Neth (h.neth@uni.kn, SPDS, office D507).
Summer 2019: Mondays, 15:15–16:45, D435.
Links to current course syllabus | ZeUS | Ilias

Datasets

1. Positive psychology

Introduction

In a highly cited publication, Seligman, Steen, Park, and Peterson (2005) suggest that positive psychology interventions (PPIs) contain specific, powerful, therapeutic ingredients that cause greater increases in happiness and greater reductions in depression than a placebo control. The study by Woodworth et al. (2017) re-examines this claim by comparing the three most effective PPIs (identical to the interventions used by Seligman et al., 2005) to a placebo control in a web-based design with random assignment.

Data sources

Articles reporting original research:

Seligman, M. E., Steen, T. A., Park, N., & Peterson, C. (2005). Positive psychology progress: Empirical validation of interventions. American Psychologist, 60(5), 410–421. doi: https://doi.org/10.1037/0003-066X.60.5.410
Woodworth, R. J., O’Brien‐Malone, A., Diamond, M. R., & Schüz, B. (2017). Web‐based positive psychology interventions: A reexamination of effectiveness. Journal of Clinical Psychology, 73(3), 218–232. doi: https://doi.org/10.1002/jclp.22328

Article on data used here:

Woodworth, R. J., O’Brien-Malone, A., Diamond, M. R. and Schüz, B. (2018). Data from, ‘Web-based Positive Psychology Interventions: A Reexamination of Effectiveness’. Journal of Open Psychology Data, 6: 1. doi: https://doi.org/10.5334/jopd.35
See https://openpsychologydata.metajnl.com/articles/10.5334/jopd.35/ for details.
The dataset is available from figshare at https://doi.org/10.6084/m9.figshare.1577563.v1.

Codebook

Description of the variables and values contained in the 2 original data files:

1. File `posPsy_participants.csv`

The file posPsy_participants.csv contains 6 variables with demographic information on 295 participants:

id: participant ID
intervention: 3 positive psychology interventions (PPIs), plus 1 control condition:
- 1 = “Using signature strengths”,
- 2 = “Three good things”,
- 3 = “Gratitude visit”,
- 4 = “Recording early memories” (control condition).
sex:
- 1 = female,
- 2 = male.
age: participant’s age (in years).
educ: level of education:
- 1 = Less than Year 12,
- 2 = Year 12,
- 3 = Vocational training,
- 4 = Bachelor’s degree,
- 5 = Postgraduate degree.
income:
- 1 = below average,
- 2 = average,
- 3 = above average.
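
Because all categorical variables are stored as numeric codes, it can help to attach the codebook labels before analyzing the data. The following is a minimal sketch in base R, using a small made-up data frame (`p_info`, a hypothetical name with invented values) rather than the real file:

```r
# Toy participant data (made-up values; NOT taken from the real file)
p_info <- data.frame(
  id           = 1:4,
  intervention = c(4, 1, 3, 2),
  sex          = c(2, 1, 1, 2),
  age          = c(35, 59, 51, 50),
  educ         = c(5, 1, 4, 5),
  income       = c(3, 1, 3, 2)
)

# Attach the codebook labels to the numeric codes:
p_info$intervention <- factor(p_info$intervention, levels = 1:4,
  labels = c("Using signature strengths", "Three good things",
             "Gratitude visit", "Recording early memories (control)"))
p_info$sex <- factor(p_info$sex, levels = 1:2,
  labels = c("female", "male"))

# Education and income have a natural order, so we use ordered factors:
p_info$educ <- factor(p_info$educ, levels = 1:5, ordered = TRUE,
  labels = c("Less than Year 12", "Year 12", "Vocational training",
             "Bachelor's degree", "Postgraduate degree"))
p_info$income <- factor(p_info$income, levels = 1:3, ordered = TRUE,
  labels = c("below average", "average", "above average"))

# Inspect the result:
str(p_info)
table(p_info$intervention)
```

Treating educ and income as ordered factors is a design choice that allows comparisons such as `p_info$educ >= "Year 12"`; whether this is appropriate depends on the analysis.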

2. File `posPsy_AHI_CESD.csv`

The file posPsy_AHI_CESD.csv contains answers to the 24 items of the Authentic Happiness Inventory (AHI) and answers to the 20 items of the Center for Epidemiologic Studies Depression (CES-D) scale (see Radloff, 1977) for multiple (1 to 6) measurement occasions per participant:

id: Participant ID
occasion: Measurement occasion:
- 0 = Pretest (i.e., at enrolment),
- 1 = Posttest (i.e., 7 days after pretest),
- 2 = 1-week follow-up (i.e., 14 days after pretest, 7 days after posttest),
- 3 = 1-month follow-up (i.e., 38 days after pretest, 31 days after posttest),
- 4 = 3-month follow-up (i.e., 98 days after pretest, 91 days after posttest),
- 5 = 6-month follow-up (i.e., 189 days after pretest, 182 days after posttest).
elapsed.days: Time since enrolment measured in fractional days
intervention: Intervention group (1 to 4)
ahi01–ahi24: Responses on 24 AHI items
cesd01–cesd20: Responses on 20 CES-D items
ahiTotal: Total AHI score
cesdTotal: Total CES-D score
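
The codebook lists a nominal number of days after pretest for each occasion, whereas elapsed.days records the actual time. A minimal sketch of how the two could be compared is shown here, using made-up rows (`dv`, `nominal_days`, `nominal.days`, and `delay.days` are hypothetical names that do not occur in the data file):

```r
# Nominal days after pretest for occasions 0 to 5 (from the codebook):
nominal_days <- c("0" = 0, "1" = 7, "2" = 14, "3" = 38, "4" = 98, "5" = 189)

# Toy rows (made-up values; the real file has 50 columns):
dv <- data.frame(
  id           = c(1, 1, 2),
  occasion     = c(0, 1, 3),
  elapsed.days = c(0, 11.77, 40.21)
)

# Look up the nominal day of each occasion by name:
dv$nominal.days <- unname(nominal_days[as.character(dv$occasion)])

# Deviation of the actual from the scheduled measurement time (in days):
dv$delay.days <- dv$elapsed.days - dv$nominal.days
dv
```

Indexing the named vector by `as.character(dv$occasion)` avoids the off-by-one error that numeric indexing would cause, since occasions start at 0.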

Getting the data

Files available

The following files were generated from the original data files (and saved in .csv format):

posPsy_participants.csv: Original participant data (295 x 6 variables):
http://rpository.com/ds4psy/data/posPsy_participants.csv.
posPsy_AHI_CESD.csv: Original data of dependent measures in long format (992 x 50 variables):
http://rpository.com/ds4psy/data/posPsy_AHI_CESD.csv.
posPsy_AHI_CESD_corrected.csv: Corrected version of dependent measures in long format (990 x 50 variables):
http://rpository.com/ds4psy/data/posPsy_AHI_CESD_corrected.csv.
posPsy_data_wide.csv: Corrected version of all data joined in wide format (295 x 294 variables):
http://rpository.com/ds4psy/data/posPsy_data_wide.csv.
Different measurement occasions are suffixed by .0, .1, …, .5.
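
To illustrate what the wide format and its suffixes look like, here is a minimal sketch that reshapes a small made-up data frame with base R's `reshape()` function. This is only one way of obtaining such a layout and not necessarily how posPsy_data_wide.csv was created; `long` and `wide` are hypothetical names with invented values:

```r
# Toy data in long format (made-up values): 1 row per participant and occasion
long <- data.frame(
  id        = c(1, 1, 1, 2, 2),
  occasion  = c(0, 1, 2, 0, 2),   # participant 2 skipped occasion 1
  ahiTotal  = c(63, 73, 77, 58, 60),
  cesdTotal = c(14, 6, 3, 20, 18)
)

# Reshape into wide format: 1 row per participant,
# with 1 column per variable and occasion (suffixed by .0, .1, .2):
wide <- reshape(long, idvar = "id", timevar = "occasion", direction = "wide")

names(wide)
wide

# Skipped occasions show up as missing values:
sum(is.na(wide))
```

Note that an occasion that a participant skipped does not exist as a row in the long format, but yields NA values in the wide format.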

Loading data

We can load data stored in csv format into R by using the read_csv() function (from the readr package, which is part of the tidyverse). Here, we obtain the data files from online sources (at http://rpository.com/ds4psy/):

# Packages:
library(tidyverse)

# Load csv-data files from online links:

# 1. Participant data: 
posPsy_p_info <- read_csv(file = "http://rpository.com/ds4psy/data/posPsy_participants.csv")
dim(posPsy_p_info)  # 295 x 6 
#> [1] 295   6

# 2. Original DVs in long format:
AHI_CESD <- read_csv(file = "http://rpository.com/ds4psy/data/posPsy_AHI_CESD.csv")
dim(AHI_CESD)  # 992 x 50
#> [1] 992  50

# 3. Corrected DVs in long format:
posPsy_long <- read_csv(file = "http://rpository.com/ds4psy/data/posPsy_AHI_CESD_corrected.csv")
dim(posPsy_long)  # 990 x 50
#> [1] 990  50

# 4. Transformed and corrected version of all data (in wide format): 
posPsy_wide <- read_csv(file = "http://rpository.com/ds4psy/data/posPsy_data_wide.csv")
dim(posPsy_wide)  # 295 x 294 
#> [1] 295 294

# Check number of missing values: 
sum(is.na(posPsy_p_info))  #     0 missing values 
#> [1] 0
sum(is.na(posPsy_long))    #     0 missing values 
#> [1] 0
sum(is.na(posPsy_wide))    # 37440 missing values!  
#> [1] 37440
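
The large number of missing values in the wide format is plausibly due to participants who did not provide data on all 6 occasions. The following arithmetic sketch checks this account against the dimensions reported above. It assumes (without having inspected the file) that the 6 participant variables occur once in the wide data and that the remaining columns are spread evenly over the 6 occasions; all variable names are hypothetical:

```r
# Dimensions reported above:
n_participants <- 295   # rows of the wide data
n_occasions    <- 6     # occasions 0 to 5
n_rows_long    <- 990   # rows of the corrected long data
n_cols_wide    <- 294   # columns of the wide data
n_cols_p_info  <- 6     # participant variables (assumed to occur only once)

# Columns per measurement occasion (under the assumption stated above):
vars_per_occasion <- (n_cols_wide - n_cols_p_info) / n_occasions
vars_per_occasion

# Participant-occasion combinations without any data:
missing_occasions <- n_participants * n_occasions - n_rows_long
missing_occasions

# Expected number of missing values in the wide data:
missing_occasions * vars_per_occasion
```

Under these assumptions, the result matches the 37440 missing values counted in posPsy_wide.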

References

Radloff, L. S. (1977). The CES-D scale: A self report depression scale for research in the general population. Applied Psychological Measurement, 1, 385–401. doi: https://doi.org/10.1177/014662167700100306
Seligman, M. E., Steen, T. A., Park, N., & Peterson, C. (2005). Positive psychology progress: Empirical validation of interventions. American Psychologist, 60(5), 410–421.
Woodworth, R. J., O’Brien‐Malone, A., Diamond, M. R., & Schüz, B. (2017). Web‐based positive psychology interventions: A reexamination of effectiveness. Journal of Clinical Psychology, 73(3), 218–232.
Woodworth, R. J., O’Brien-Malone, A., Diamond, M. R. and Schüz, B. (2018). Data from, ‘Web-based positive psychology interventions: A reexamination of effectiveness’. Journal of Open Psychology Data, 6: 1. doi: https://doi.org/10.5334/jopd.35
Data at https://doi.org/10.6084/m9.figshare.1577563.v1.

2. False positive psychology

Introduction

To highlight problematic research practices within psychology, Simmons, Nelson and Simonsohn (2011) published a controversial article with a necessarily false finding. By conducting simulations and two simple behavioral experiments, the authors show that flexibility in data collection, analysis, and reporting dramatically increases the rate of false-positive findings.

Data sources

Articles reporting original research:

Simmons, J.P., Nelson, L.D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366. doi: https://doi.org/10.1177/0956797611417632

Article on data used here:

Simmons, J.P., Nelson, L.D., & Simonsohn, U. (2014). Data from paper “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant”. Journal of Open Psychology Data, 2(1), e1. doi: http://doi.org/10.5334/jopd.aa

See https://openpsychologydata.metajnl.com/articles/10.5334/jopd.aa/ for data. (A zip archive with txt files is available at http://dx.doi.org/10.5281/zenodo.7664.)

Codebook

The study data is stored in 2 separate files: study1.xlsx and study2.xlsx. Both data files contain the same information about each participant in 17 variables:

age: Days since participant was born (based on their self-reported birthday)
dad: Father’s age in years
mom: Mother’s age in years
female: Is the participant a woman?
- 1: yes
- 2: no
root: Did they correctly get the square root of 100?
- 1: yes
- 2: no
bird: Imagine a restaurant you really like offered a 30% discount for dining between 4 pm and 6 pm. How likely would you be to take advantage of that offer?
- 1: very unlikely to 7: very likely
political: In the political spectrum, where would you place yourself?
- 1: very liberal
- 2: liberal
- 3: centrist
- 4: conservative
- 5: very conservative
quarterback: If you had to guess who was chosen the quarterback of the year in Canada last year, which of the following four options would you choose?
- 1: Dalton Bell
- 2: Daryll Clark
- 3: Jarious Jackson
- 4: Frank Wilczynski
olddays: How often have you referred to some past part of your life as “the good old days”?
- 11: Never
- 12: almost never
- 13: sometimes
- 14: often
- 15: very often
potato: Did the participant hear the song ‘Hot Potato’ by the Australian band The Wiggles?
- 1: yes
- 2: no
when64: Did the participant hear the song ‘When I am 64’ by the Beatles?
- 1: yes
- 2: no
kalimba: Did the participant hear the song ‘Kalimba’ by Mr. Scrub?
- 1: yes
- 2: no
feelold: How old do you feel?
- 1: very young
- 2: young
- 3: neither young nor old
- 4: old
- 5: very old
computer: Computers are complicated machines
- 1: strongly disagree to 5: strongly agree
diner: Imagine you were going to a diner for dinner tonight, how much do you think you would like the food?
- 1: dislike extremely to 9: like extremely
cond: In which condition was the participant?
- control: Subject heard the song ‘Kalimba’ by Mr. Scrub
- potato: Subject heard the song ‘Hot Potato’ by the Australian band The Wiggles
- 64: Subject heard the song ‘When I am 64’ by the Beatles
aged365: age in years
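
The name aged365 suggests that this variable is the age in days divided by 365. This is an assumption based on the variable name, not something the codebook states. A minimal sketch of this conversion with made-up values (`fp` is a hypothetical name):

```r
# Toy data (made-up ages in days; NOT taken from the real files):
fp <- data.frame(age = c(7300, 7665, 8030))

# Convert age in days into age in years (assumed definition of aged365):
fp$aged365 <- fp$age / 365
fp
```

If this assumption holds, comparing `age / 365` with `aged365` in the real data (e.g., by `all.equal()`) would be a simple consistency check.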

Getting the data

Files available

The following file was generated from the original data files (and saved in .csv format):

falsePosPsy_all.csv: Combines the 2 original datasets in one file:
http://rpository.com/ds4psy/data/falsePosPsy_all.csv.
2 variables that denote the original study (1 vs. 2) and a unique participant ID (ranging from 1 to 78) have been added, so that the data file now contains 78 cases and 19 variables.
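
Combining two files with identical variables and adding a study indicator and a participant ID can be sketched as follows. This uses two small made-up data frames rather than the original files, and the column names `study` and `ID` are hypothetical (the names used in falsePosPsy_all.csv may differ):

```r
# Toy stand-ins for the 2 original files (made-up values, only 2 variables):
study1 <- data.frame(age  = c(7300, 7665),
                     cond = c("control", "potato"))
study2 <- data.frame(age  = c(8030, 8395, 8760),
                     cond = c("64", "control", "64"))

# Add a variable that denotes the original study:
study1$study <- 1
study2$study <- 2

# Stack both data frames (this requires identical variable names):
all_data <- rbind(study1, study2)

# Add a unique participant ID across both studies:
all_data$ID <- seq_len(nrow(all_data))

# Move the 2 new variables to the front:
all_data <- all_data[, c("study", "ID", "age", "cond")]
dim(all_data)
all_data
```

As 2 variables are added, the combined data contains 2 more columns than each original file (here: 4 rather than 2), which corresponds to the step from 17 to 19 variables described above.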

Loading data

# Load csv-data files from online links:
falsePosPsy_all <- readr::read_csv(file = "http://rpository.com/ds4psy/data/falsePosPsy_all.csv")

# Check: 
dim(falsePosPsy_all)  # 78 x 19 
#> [1] 78 19

# Check number of missing values: 
sum(is.na(falsePosPsy_all))  # 0 missing values  
#> [1] 0

References

Simmons, J.P., Nelson, L.D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366. doi: https://doi.org/10.1177/0956797611417632
Simmons, J.P., Nelson, L.D., & Simonsohn, U. (2014). Data from paper “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant”. Journal of Open Psychology Data, 2(1), e1. doi: http://doi.org/10.5334/jopd.aa
Data at https://openpsychologydata.metajnl.com/articles/10.5334/jopd.aa/ or https://zenodo.org/record/7664.

Other sources of data

Data in base R

Every version of R comes with a collection of datasets:

## Get info on included datasets: 
library(help = "datasets") 

## Check some dimensions: ----- 
# dim(ChickWeight)
# dim(iris)
# Nile           # Time series. See plot(Nile)
# dim(sleep)     # Student's Sleep Data
# dim(Titanic)   # also see dim(FFTrees::titanic)

Data in R packages

Packages of the tidyverse:

dplyr: starwars
ggplot2: diamonds, mpg, msleep, etc.
tidyr: table1, etc.

Other packages with large data sets include:

babynames
dslabs
eurostat
FFTrees: breastcancer, car, heartdisease, mushrooms, titanic, wine
ISLR
MASS
nycflights13
yarrr: pirates, movies, auction, etc.
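
To find out which datasets an installed package provides, we can use the `data()` function with its `package` argument. The sketch below uses the datasets package (which comes with R), as the other packages listed here may not be installed on every system; `ds_info` is a hypothetical name:

```r
# Get information on the datasets included in a package:
ds_info <- data(package = "datasets")$results

# The result is a character matrix with 1 row per dataset:
colnames(ds_info)
head(ds_info[, c("Item", "Title")])

# Number of entries:
nrow(ds_info)
```

Replacing "datasets" by the name of another installed package (e.g., "ggplot2") lists the datasets of that package.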

Online sources

The web is full of data, of course, but most of it needs sound data science and a sound dose of scepticism to be of any use. Here are some good starting points for finding free data:

Collections:

Google dataset search
Kaggle: A place for data science projects (with many large datasets)
Wikidata: Wikipedia data

Specific datasets:

PanTHERIA: A species-level database of life history, ecology, and geography of extant and recently extinct mammals

Conclusion

All ds4psy essentials so far:

Nr.	Topic
0.	Syllabus
1.	Basic R concepts and commands
+.	Datasets

[Last update on 2019-04-15 13:12:54 by hn.]

Datasets (ds4psy)

Hansjörg Neth, SPDS, uni.kn

2019 04 15
