Codebook example with SPSS dataset

Ruben Arslan

2019-05-21

knit_by_pkgdown <- !is.null(knitr::opts_chunk$get("fig.retina"))
pander::panderOptions("table.split.table", Inf)
ggplot2::theme_set(ggplot2::theme_bw())
knitr::opts_chunk$set(warning = TRUE, message = TRUE, error = TRUE, echo = TRUE)
library(dplyr)
library(codebook)

In this vignette, you can see how to use the metadata that is often already stored in SPSS and Stata files. It’s easy. All we need is the rio::import function. For files with the right file extension, we can automatically pick the right way to import the data. Here, we’re downloading straight from the Open Science Framework, so we have to specify the file extension.

We select a subset of variables, just to keep it short. The data were shared by Emanuel Jauk in a project called How alluring are dark personalities? The Dark Triad and attractiveness in speed dating.

Often, files imported from SPSS or Stata to R will not have their missings coded properly. Here, that is not the case, but if you find yourself with such a dataset, the detect_missing function makes it easy to recognise common ways to specify missing data (e.g. negative values, labelled values, 99/999).

darktriad <- rio::import("https://osf.io/j4fcb/download", format = "sav")
if (!knit_by_pkgdown) {
  darktriad <- darktriad %>%
  select(DG, sex, relStat, education, NPI_avg)
}
metadata(darktriad)$name <- "How alluring are dark personalities? The Dark Triad and attractiveness in speed dating"
metadata(darktriad)$description <- paste0("The data to this speed dating study comes in two different formats: Personwise (one record for each individual) and dyadic (pairwise; one record for each date). The respective SPSS files are named \"DarkTriadDate_person.sav\" and \"DarkTriadDate_dyad.sav\".

### Download link
[Open Science Framework](https://osf.io/j4fcb/download)

### Personwise datafile 
The personwise datafile contains individual differences variables and perceiver and target effects according to the social relations model. These are centered marginal means that were calculated according to the formulae provided by Kenny, Kashy, and Cook (2006). These effects are not (!) based on multilevel analyses.

### Preprocessing
All rating variables (i.e., actual choice, friendship, short-term relationship etc.) were corrected for prior acquaintance, which means that dates wih prior acquaintance were excluded (set to missing) on a dyadic basis.

Variables are labeled in SPSS. 

### A list of important abbreviations, prefixes and suffixes:

* _acq = acquaintance (i.e., variables with this suffix are controlled for prior * acquaintance)
* avg = average
* _rat = rating variable
* _z = z-standardized score
* BC = booty call
* DG = dating group (three groups in this study)
* FIPI = five item personality inventory
* FS = friendship
* FWB = friends-with-benefits
* Int = Intelligence
* Like = Likeability
* LTR = long-term relationship
* MACHIV = mach-iv machiavellianism questionnaire
* N, E, O, A, C = Big5
* NPI = narcissistic personality inventory
* ONS = one night stand
* P = perceiver
* PA = physical attractiveness
* PercEff = perceiver effect
* SD = speed dating
* SRM = social relations model
* SRP = self-report psychopathy scale
* STR = short-term relationship
* T = target
* TargEff = target effect


")
metadata(darktriad)$identifier <- "https://osf.io/jvk3u/"
metadata(darktriad)$datePublished <- "2015-10-07"
metadata(darktriad)$creator <- list(
      "@type" = "Person",
      givenName = "Emanuel", familyName = "Jauk",
      email = "emanuel.jauk@uni‐graz.at", 
      affiliation = list("@type" = "Organization",
        name = "Karl‐Franzens‐Universität Graz, Austria"))
metadata(darktriad)$citation <- "Jauk, E., Neubauer, A. C., Mairunteregger, T., Pemp, S., Sieber, K. P., & Rauthmann, J. F. (2016). How alluring are dark personalities? The Dark Triad and attractiveness in speed dating. European Journal of Personality, 30(2), 125-138."
metadata(darktriad)$url <- "https://osf.io/j4fcb/"
metadata(darktriad)$temporalCoverage <- "2015" 
metadata(darktriad)$spatialCoverage <- "Graz, Austria" 
metadata(darktriad)$distribution = list(
  list("@type" = "DataDownload",
       "requiresSubscription" = "http://schema.org/True",
       "encodingFormat" = "https://www.loc.gov/preservation/digital/formats/fdd/fdd000469.shtml",
       contentUrl = "https://osf.io/j4fcb/download")
)
# We don't want to look at the code in the codebook.
knitr::opts_chunk$set(warning = TRUE, message = TRUE, echo = FALSE)

Now, we can immediately generate a codebook.

Metadata

Description

Dataset name: How alluring are dark personalities? The Dark Triad and attractiveness in speed dating

The data to this speed dating study comes in two different formats: Personwise (one record for each individual) and dyadic (pairwise; one record for each date). The respective SPSS files are named “DarkTriadDate_person.sav” and “DarkTriadDate_dyad.sav”.

Personwise datafile

The personwise datafile contains individual differences variables and perceiver and target effects according to the social relations model. These are centered marginal means that were calculated according to the formulae provided by Kenny, Kashy, and Cook (2006). These effects are not (!) based on multilevel analyses.

Preprocessing

All rating variables (i.e., actual choice, friendship, short-term relationship etc.) were corrected for prior acquaintance, which means that dates wih prior acquaintance were excluded (set to missing) on a dyadic basis.

Variables are labeled in SPSS.

A list of important abbreviations, prefixes and suffixes:

Metadata for search engines

Variables

DG

dating group

Distribution

0 missing values.

Summary statistics

name label data_type missing complete n mean sd p0 p25 p50 p75 p100 hist format.spss
DG dating group numeric 0 90 90 2.03 0.77 1 1 2 3 3 ▆▁▁▇▁▁▁▆ F8.0

sex

sex

Distribution

0 missing values.

Summary statistics

name label data_type value_labels missing complete n mean sd p0 p25 p50 p75 p100 hist format.spss display_width
sex sex numeric 1. female,
2. male
0 90 90 1.49 0.5 1 1 1 2 2 ▇▁▁▁▁▁▁▇ F1.0 5

Value labels

  • female: 1
  • male: 2

relStat

relationship status

Distribution

1 missing values.

Summary statistics

name label data_type value_labels missing complete n mean sd p0 p25 p50 p75 p100 hist format.spss display_width
relStat relationship status numeric 1. single,
2. in a relationship,
3. living separately / divorced
1 89 90 1.09 0.32 1 1 1 1 3 ▇▁▁▁▁▁▁▁ F8.0 10

Value labels

  • single: 1
  • in a relationship: 2
  • living separately / divorced: 3

education

highest educational attainment

Distribution

1 missing values.

Summary statistics

name label data_type value_labels missing complete n mean sd p0 p25 p50 p75 p100 hist format.spss display_width
education highest educational attainment numeric 1. nine years schooling only,
2. professional training,
3. vocational school,
4. university-entrance diploma,
5. academic degree
1 89 90 4.17 0.38 4 4 4 4 5 ▇▁▁▁▁▁▁▂ F1.0 5

Value labels

  • nine years schooling only: 1
  • professional training: 2
  • vocational school: 3
  • university-entrance diploma: 4
  • academic degree: 5

NPI_avg

narcissistic personality inventory - average

Distribution

0 missing values.

Summary statistics

name label data_type missing complete n mean sd p0 p25 p50 p75 p100 hist format.spss display_width
NPI_avg narcissistic personality inventory - average numeric 0 90 90 2.61 0.35 1.7 2.42 2.6 2.82 3.65 ▁▂▃▇▆▂▁▁ F8.2 10

Missingness report

Codebook table

name label data_type value_labels missing complete n mean sd p0 p25 p50 p75 p100 hist format.spss display_width
DG dating group numeric NA 0 90 90 2.03 0.77 1 1 2 3 3 ▆▁▁▇▁▁▁▆ F8.0 NA
sex sex numeric 1. female,
2. male
0 90 90 1.49 0.5 1 1 1 2 2 ▇▁▁▁▁▁▁▇ F1.0 5
relStat relationship status numeric 1. single,
2. in a relationship,
3. living separately / divorced
1 89 90 1.09 0.32 1 1 1 1 3 ▇▁▁▁▁▁▁▁ F8.0 10
education highest educational attainment numeric 1. nine years schooling only,
2. professional training,
3. vocational school,
4. university-entrance diploma,
5. academic degree
1 89 90 4.17 0.38 4 4 4 4 5 ▇▁▁▁▁▁▁▂ F1.0 5
NPI_avg narcissistic personality inventory - average numeric NA 0 90 90 2.61 0.35 1.7 2.42 2.6 2.82 3.65 ▁▂▃▇▆▂▁▁ F8.2 10