Introduction

Content

This file contains descriptions of datasets used in this course and their sources.

Course coordinates

spds.uni.kn

Datasets

1. Positive psychology

Introduction

In a highly cited publication, Seligman, Steen, Park, and Peterson (2005) suggest that positive psychology interventions (PPIs) contain specific, powerful, therapeutic ingredients that produce larger increases in happiness and larger reductions in depression than a placebo control. Woodworth et al. (2017) re-examine this claim by comparing the three most effective PPIs (identical to the interventions used by Seligman et al., 2005) to a placebo control in a web‐based design with random assignment.

Data sources

Articles reporting original research:

  • Seligman, M. E., Steen, T. A., Park, N., & Peterson, C. (2005). Positive psychology progress: Empirical validation of interventions. American Psychologist, 60(5), 410–421. doi: https://doi.org/10.1037/0003-066X.60.5.410

  • Woodworth, R. J., O’Brien‐Malone, A., Diamond, M. R., & Schüz, B. (2017). Web‐based positive psychology interventions: A reexamination of effectiveness. Journal of Clinical Psychology, 73(3), 218–232. doi: https://doi.org/10.1002/jclp.22328

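Article on data used here:

  • Woodworth, R. J., O’Brien-Malone, A., Diamond, M. R., & Schüz, B. (2018). Data from, ‘Web-based positive psychology interventions: A reexamination of effectiveness’. Journal of Open Psychology Data, 6(1), 1. doi: https://doi.org/10.5334/jopd.35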

Codebook

Description of the variables and values contained in the 2 original data files:

1. File posPsy_participants.csv

The file posPsy_participants.csv contains 6 variables with demographic information on 295 participants:

  1. id: participant ID

  2. intervention: 3 positive psychology interventions (PPIs), plus 1 control condition:

    • 1 = “Using signature strengths”,
    • 2 = “Three good things”,
    • 3 = “Gratitude visit”,
    • 4 = “Recording early memories” (control condition).
  3. sex:

    • 1 = female,
    • 2 = male.
  4. age: participant’s age (in years).

  5. educ: level of education:

    • 1 = Less than Year 12,
    • 2 = Year 12,
    • 3 = Vocational training,
    • 4 = Bachelor’s degree,
    • 5 = Postgraduate degree.
  6. income:

    • 1 = below average,
    • 2 = average,
    • 3 = above average.
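
The numeric codes listed above can be turned into labeled factors before analysis. The following is a minimal sketch in base R, not part of the original course code: the data frame p_info_toy and its values are hypothetical stand-ins for posPsy_participants.csv, and only the variable names and code labels are taken from the codebook.

```r
# Toy participant data (hypothetical values, coded as in the codebook):
p_info_toy <- data.frame(
  id           = 1:4,
  intervention = c(4, 1, 3, 2),
  sex          = c(2, 1, 1, 2),
  income       = c(3, 2, 1, 2)
)

# Replace numeric codes by labeled factor levels:
p_info_toy$intervention <- factor(
  p_info_toy$intervention,
  levels = 1:4,
  labels = c("Using signature strengths", "Three good things",
             "Gratitude visit", "Recording early memories")
)
p_info_toy$sex <- factor(p_info_toy$sex, levels = 1:2,
                         labels = c("female", "male"))

# Income has a natural order, so an ordered factor is one option:
p_info_toy$income <- factor(p_info_toy$income, levels = 1:3,
                            labels = c("below average", "average", "above average"),
                            ordered = TRUE)

# Frequency tables are now labeled:
table(p_info_toy$sex)
table(p_info_toy$intervention)
```

Labeled factors make tables and plots easier to read; the same pattern could be applied to educ.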

2. File posPsy_AHI_CESD.csv

The file posPsy_AHI_CESD.csv contains answers to the 24 items of the Authentic Happiness Inventory (AHI) and to the 20 items of the Center for Epidemiologic Studies Depression (CES-D) scale (see Radloff, 1977) for 1 to 6 measurement occasions per participant:

  1. id: Participant ID

  2. occasion: Measurement occasion:

    • 0 = Pretest (i.e., at enrolment),
    • 1 = Posttest (i.e., 7 days after pretest),
    • 2 = 1-week follow-up (i.e., 14 days after pretest, 7 days after posttest),
    • 3 = 1-month follow-up (i.e., 38 days after pretest, 31 days after posttest),
    • 4 = 3-month follow-up (i.e., 98 days after pretest, 91 days after posttest),
    • 5 = 6-month follow-up (i.e., 189 days after pretest, 182 days after posttest).
  3. elapsed.days: Time since enrolment measured in fractional days

  4. intervention: Intervention group (1 to 4)

  5. ahi01 to ahi24: Responses on 24 AHI items

  6. cesd01 to cesd20: Responses on 20 CES-D items

  7. ahiTotal: Total AHI score

  8. cesdTotal: Total CES-D score
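
The occasion codes imply a nominal number of days since pretest (0, 7, 14, 38, 98, and 189), which can be compared with the recorded elapsed.days. The following base R sketch illustrates this with a lookup vector; the data frame dv_toy and its values are hypothetical, and only the variable names and nominal days come from the codebook.

```r
# Nominal days since pretest for occasions 0 to 5 (from the codebook):
nominal_days <- c("0" = 0, "1" = 7, "2" = 14, "3" = 38, "4" = 98, "5" = 189)

# Toy measurements (hypothetical values):
dv_toy <- data.frame(
  id           = c(1, 1, 2, 2),
  occasion     = c(0, 1, 0, 3),
  elapsed.days = c(0.00, 7.45, 0.00, 40.12)
)

# Look up the nominal day of each occasion by its code:
dv_toy$nominal.days <- unname(nominal_days[as.character(dv_toy$occasion)])

# Deviation of the actual from the scheduled measurement time (in days):
dv_toy$delay <- dv_toy$elapsed.days - dv_toy$nominal.days
dv_toy
```

Indexing by as.character(occasion) matters here: indexing by the number 0 would select nothing, because R vectors are 1-based.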

Getting the data

Files available

The following files were generated from the original data files (and saved in .csv format):

  1. posPsy_participants.csv: Original participant data (295 x 6 variables):
    http://rpository.com/ds4psy/data/posPsy_participants.csv.

  2. posPsy_AHI_CESD.csv: Original data of dependent measures in long format (992 x 50 variables):
    http://rpository.com/ds4psy/data/posPsy_AHI_CESD.csv.

  3. posPsy_AHI_CESD_corrected.csv: Corrected version of dependent measures in long format (990 x 50 variables):
    http://rpository.com/ds4psy/data/posPsy_AHI_CESD_corrected.csv.

  4. posPsy_data_wide.csv: Corrected version of all data joined in wide format (295 x 294 variables):
    http://rpository.com/ds4psy/data/posPsy_data_wide.csv.
    Different measurement occasions are suffixed by .0, .1, …, .5.
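
To illustrate how a long format relates to a wide format with occasion suffixes, here is a minimal sketch using reshape() from base R. It is not the code that generated posPsy_data_wide.csv (the source does not show that code); the data frame long_toy and its values are hypothetical.

```r
# Toy data in long format: one row per participant and occasion
# (hypothetical values; participant 2 has no measurement at occasion 1):
long_toy <- data.frame(
  id        = c(1, 1, 1, 2, 2),
  occasion  = c(0, 1, 2, 0, 2),
  ahiTotal  = c(63, 73, 77, 60, 65),
  cesdTotal = c(14, 6, 3, 20, 18)
)

# Reshape to wide format: one row per participant,
# with each measure suffixed by its occasion (e.g., ahiTotal.0):
wide_toy <- reshape(long_toy, idvar = "id", timevar = "occasion",
                    direction = "wide", sep = ".")

names(wide_toy)
dim(wide_toy)          # 2 participants x 7 variables
sum(is.na(wide_toy))   # 2 missing values (participant 2, occasion 1)
```

Occasions that a participant did not complete become NA in the wide format, which is one reason why a wide table can contain many missing values even when the long table contains none.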

Loading data

We can load data stored in csv format into R with the read_csv() function (from the readr package, which is part of the tidyverse). Here, we obtain the data files from online sources (at http://rpository.com/ds4psy/):

# Packages:
library(tidyverse)

# Load csv-data files from online links:

# 1. Participant data: 
posPsy_p_info <- read_csv(file = "http://rpository.com/ds4psy/data/posPsy_participants.csv")
dim(posPsy_p_info)  # 295 x 6 
#> [1] 295   6

# 2. Original DVs in long format:
AHI_CESD <- read_csv(file = "http://rpository.com/ds4psy/data/posPsy_AHI_CESD.csv")
dim(AHI_CESD)  # 992 x 50
#> [1] 992  50

# 3. Corrected DVs in long format:
posPsy_long <- read_csv(file = "http://rpository.com/ds4psy/data/posPsy_AHI_CESD_corrected.csv")
dim(posPsy_long)  # 990 x 50
#> [1] 990  50

# 4. Transformed and corrected version of all data (in wide format): 
posPsy_wide <- read_csv(file = "http://rpository.com/ds4psy/data/posPsy_data_wide.csv")
dim(posPsy_wide)  # 295 x 294 
#> [1] 295 294

# Check number of missing values: 
sum(is.na(posPsy_p_info))  #     0 missing values 
#> [1] 0
sum(is.na(posPsy_long))    #     0 missing values 
#> [1] 0
sum(is.na(posPsy_wide))    # 37440 missing values!  
#> [1] 37440
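
Because the participant file and the long-format file share the variables id and intervention, they can be combined into one table. The following sketch uses merge() from base R on small hypothetical data frames (p_toy and dv_toy2, with made-up values); it illustrates the idea of a join, not the procedure actually used for the course files.

```r
# Toy participant data (hypothetical values):
p_toy <- data.frame(
  id           = 1:3,
  intervention = c(4, 1, 2),
  age          = c(35, 59, 51)
)

# Toy dependent measures in long format (hypothetical values):
dv_toy2 <- data.frame(
  id           = c(1, 1, 2, 3, 3, 3),
  occasion     = c(0, 1, 0, 0, 1, 2),
  intervention = c(4, 4, 1, 2, 2, 2),
  ahiTotal     = c(63, 73, 60, 71, 75, 78)
)

# Join both tables by their common key variables:
joined <- merge(dv_toy2, p_toy, by = c("id", "intervention"))
dim(joined)  # 6 measurements x 5 variables

# Number of measurement occasions per participant:
n_occasions <- table(dv_toy2$id)
n_occasions
```

Counting occasions per participant shows how unevenly participants contributed data, which helps to interpret the missing values in a wide format.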

References

  • Radloff, L. S. (1977). The CES-D scale: A self report depression scale for research in the general population. Applied Psychological Measurement, 1, 385–401. doi: https://doi.org/10.1177/014662167700100306

  • Seligman, M. E., Steen, T. A., Park, N., & Peterson, C. (2005). Positive psychology progress: Empirical validation of interventions. American Psychologist, 60(5), 410–421. doi: https://doi.org/10.1037/0003-066X.60.5.410

  • Woodworth, R. J., O’Brien‐Malone, A., Diamond, M. R., & Schüz, B. (2017). Web‐based positive psychology interventions: A reexamination of effectiveness. Journal of Clinical Psychology, 73(3), 218–232. doi: https://doi.org/10.1002/jclp.22328

  • Woodworth, R. J., O’Brien-Malone, A., Diamond, M. R., & Schüz, B. (2018). Data from, ‘Web-based positive psychology interventions: A reexamination of effectiveness’. Journal of Open Psychology Data, 6(1), 1. doi: https://doi.org/10.5334/jopd.35

  • Data at https://doi.org/10.6084/m9.figshare.1577563.v1.

2. False positive psychology

Introduction

To highlight problematic research practices within psychology, Simmons, Nelson, and Simonsohn (2011) published a controversial article reporting a finding that is necessarily false. Using simulations and two simple behavioral experiments, the authors show that flexibility in data collection, analysis, and reporting dramatically increases the rate of false-positive findings.

Data sources

Articles reporting original research:

  • Simmons, J.P., Nelson, L.D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366. doi: https://doi.org/10.1177/0956797611417632

Article on data used here:

  • Simmons, J.P., Nelson, L.D., & Simonsohn, U. (2014). Data from paper “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant”. Journal of Open Psychology Data, 2(1), e1. doi: http://doi.org/10.5334/jopd.aa

See https://openpsychologydata.metajnl.com/articles/10.5334/jopd.aa/ for data. (A zip archive with txt files is available at http://dx.doi.org/10.5281/zenodo.7664.)

Codebook

The study data is stored in 2 separate files: study1.xlsx and study2.xlsx. Both data files contain the same information about each participant in 17 variables:

  1. age: Days since participant was born (based on their self-reported birthday)
  2. dad: Father’s age in years

  3. mom: Mother’s age in years

  4. female: Is the participant a woman?
    • 1: yes
    • 2: no
  5. root: Did they correctly get the square root of 100?
    • 1: yes
    • 2: no
  6. bird: Imagine a restaurant you really like offered a 30% discount for dining between 4 pm and 6 pm. How likely would you be to take advantage of that offer?
    • 1: very unlikely to 7: very likely
  7. political: In the political spectrum, where would you place yourself?
    • 1: very liberal
    • 2: liberal
    • 3: centrist
    • 4: conservative
    • 5: very conservative
  8. quarterback: If you had to guess who was chosen the quarterback of the year in Canada last year, which of the following four options would you choose?
    • 1: Dalton Bell
    • 2: Daryll Clark
    • 3: Jarious Jackson
    • 4: Frank Wilczynski
  9. olddays: How often have you referred to some past part of your life as “the good old days”?
    • 11: never
    • 12: almost never
    • 13: sometimes
    • 14: often
    • 15: very often
  10. potato: Did the participant hear the song ‘Hot Potato’ by the Australian band The Wiggles?
    • 1: yes
    • 2: no
  11. when64: Did the participant hear the song ‘When I’m Sixty-Four’ by the Beatles?
    • 1: yes
    • 2: no
  12. kalimba: Did the participant hear the song ‘Kalimba’ by Mr. Scruff?
    • 1: yes
    • 2: no
  13. feelold: How old do you feel?
    • 1: very young
    • 2: young
    • 3: neither young nor old
    • 4: old
    • 5: very old
  14. computer: Computers are complicated machines
    • 1: strongly disagree to 5: strongly agree
  15. diner: Imagine you were going to a diner for dinner tonight, how much do you think you would like the food?
    • 1: dislike extremely to 9: like extremely
  16. cond: In which condition was the participant?
    • control: Subject heard the song ‘Kalimba’ by Mr. Scruff
    • potato: Subject heard the song ‘Hot Potato’ by the Australian band The Wiggles
    • 64: Subject heard the song ‘When I’m Sixty-Four’ by the Beatles
  17. aged365: age in years
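
The codebook gives age in days and aged365 in years. The following base R sketch assumes that aged365 is simply age divided by 365, as the variable name suggests (the source does not state the formula). The data frame fpp_toy and its values are hypothetical.

```r
# Toy data (hypothetical values; age is measured in days):
fpp_toy <- data.frame(
  age  = c(7300, 8030, 9125),
  cond = c("control", "potato", "64")
)

# Assumption: age in years = age in days / 365
fpp_toy$aged365 <- fpp_toy$age / 365

# Treat the condition as a factor with an explicit level order:
fpp_toy$cond <- factor(fpp_toy$cond, levels = c("control", "potato", "64"))

# Mean age (in years) per condition:
aggregate(aged365 ~ cond, data = fpp_toy, FUN = mean)
```

Note that the condition label 64 must be stored as a character string or factor level, not as a number.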

Getting the data

Files available

The following file was generated from the original data files (and saved in .csv format):

  1. falsePosPsy_all.csv: Combines the 2 original datasets in one file:
    http://rpository.com/ds4psy/data/falsePosPsy_all.csv.
    2 variables that denote the original study (1 vs. 2) and a unique participant ID (ranging from 1 to 78) have been added, so that the data file now contains 78 cases and 19 variables.
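
Combining two data files with identical variables and adding a study indicator and a running participant ID can be sketched in base R as follows. This is an illustration on hypothetical toy data (s1_toy and s2_toy), not the code that generated falsePosPsy_all.csv; the names study and ID follow the description above.

```r
# Two toy data sets with the same variables (hypothetical values):
s1_toy <- data.frame(aged365 = c(20, 22),
                     cond    = c("control", "potato"))
s2_toy <- data.frame(aged365 = c(25, 21, 23),
                     cond    = c("64", "control", "64"))

# Add a variable that denotes the original study:
s1_toy$study <- 1
s2_toy$study <- 2

# Stack both data sets (requires identical variable names):
all_toy <- rbind(s1_toy, s2_toy)

# Add a unique participant ID and move both new variables to the front:
all_toy$ID <- seq_len(nrow(all_toy))
all_toy <- all_toy[, c("study", "ID", "aged365", "cond")]

dim(all_toy)          # 5 cases x 4 variables
table(all_toy$study)  # cases per study
```

Adding 2 variables to the 17 original ones in this way is consistent with the 19 variables of the combined file.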

Loading data

# Load csv-data files from online links:
falsePosPsy_all <- readr::read_csv(file = "http://rpository.com/ds4psy/data/falsePosPsy_all.csv")

# Check: 
dim(falsePosPsy_all)  # 78 x 19 
#> [1] 78 19

# Check number of missing values: 
sum(is.na(falsePosPsy_all))  # 0 missing values  
#> [1] 0

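References

  • Simmons, J.P., Nelson, L.D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366. doi: https://doi.org/10.1177/0956797611417632

  • Simmons, J.P., Nelson, L.D., & Simonsohn, U. (2014). Data from paper “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant”. Journal of Open Psychology Data, 2(1), e1. doi: http://doi.org/10.5334/jopd.aa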

Other sources of data

Data in base R

Every installation of R includes the datasets package, a collection of ready-to-use datasets:

## Get info on included datasets: 
library(help = "datasets") 

## Check some dimensions: ----- 
# dim(ChickWeight)
# dim(iris)
# Nile           # Time series. See plot(Nile)
# dim(sleep)     # Student's Sleep Data
# dim(Titanic)   # also see dim(FFTrees::titanic)
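
As a complement to the help page, the contents of the datasets package can also be inspected programmatically. The following sketch uses only base R; the object name ds is an arbitrary choice.

```r
# List all datasets of the datasets package as a character matrix:
ds <- data(package = "datasets")$results
colnames(ds)                 # includes "Item" and "Title"
head(ds[, c("Item", "Title")])

# Inspect some of the included datasets:
dim(iris)
#> [1] 150   5
dim(sleep)
#> [1] 20  3
class(Nile)                  # a time series
#> [1] "ts"
dim(Titanic)                 # a 4-dimensional table
#> [1] 4 2 2 2
```

The same call with another package name, e.g., data(package = "ggplot2"), lists the datasets of any installed package.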

Data in R packages

Packages of the tidyverse:

  • dplyr: starwars
  • ggplot2: diamonds, mpg, msleep, etc.
  • tidyr: table1, etc.

Other packages with large data sets include:

  • babynames
  • dslabs
  • eurostat
  • FFTrees: breastcancer, car, heartdisease, mushrooms, titanic, wine
  • ISLR
  • MASS
  • nycflights13
  • yarrr: pirates, movies, auction, etc.

Online sources

The web is full of data, of course, but most of it needs sound data science and a healthy dose of scepticism to be of any use. Here are some good starting points for finding free data:

Collections:

Specific datasets:

  • PanTHERIA: A species-level database of life history, ecology, and geography of extant and recently extinct mammals

Conclusion

All ds4psy essentials so far:

Nr. Topic
0. Syllabus
1. Basic R concepts and commands
+. Datasets

[Last update on 2019-04-15 13:12:54 by hn.]