This vignette demonstrates one of the newer features in the `SimDesign`

package pertaining to multiple analysis function definitions that can be selected for each `Design`

condition or whenever they are applicable. The purpose of providing multiple analysis functions is to

- Remove the less readable if-then-else combinations that can appear when writing simulation code, where specific analysis functions are not intended to be used on a given generated dataset,
- Provide better automatic naming of the analysis results across independent subroutines,
- Construct more readable code when the
`Analyse()`

function contains too much code to easily track, and to - Create a more modular approach to isolating the analysis functions for the purpose of redistribution or reusing in related projects

Functionality speaking, this type of organization does not change how `SimDesign`

generally operates. For that reason, the coding style presented in this vignette can be considered optional. However, if any of the above points resonate well with you then following the details of this coding organization style may prove useful.

The usual work-flow with `SimDesign`

requires first calling `SimFunctions()`

to generate a working template, such as the following.

`SimDesign::SimFunctions()`

```
## #-------------------------------------------------------------------
##
## library(SimDesign)
##
## Design <- createDesign(factor1 = NA,
## factor2 = NA)
##
## #-------------------------------------------------------------------
##
## Generate <- function(condition, fixed_objects = NULL) {
## dat <- data.frame()
## dat
## }
##
## Analyse <- function(condition, dat, fixed_objects = NULL) {
## ret <- nc(stat1 = NaN, stat2 = NaN)
## ret
## }
##
## Summarise <- function(condition, results, fixed_objects = NULL) {
## ret <- c(bias = NaN, RMSE = NaN)
## ret
## }
##
## #-------------------------------------------------------------------
##
## res <- runSimulation(design=Design, replications=1000, generate=Generate,
## analyse=Analyse, summarise=Summarise)
## res
```

which uses the default `nAnalyses=1`

to generate only a single `Analyse()`

function. In the context of multiple analysis functions, however, users may be more interested in passing the number of analysis functions they believe they will need in their simulation (e.g., if analyzing a \(t\)-test setup to compare the Welch versus independent samples t-test, then two analysis functions should be used). Passing `nAnalyses=2`

to `SimFunctions()`

creates the following template:

`SimDesign::SimFunctions(nAnalyses = 2)`

```
## #-------------------------------------------------------------------
##
## library(SimDesign)
##
## Design <- createDesign(factor1 = NA,
## factor2 = NA)
##
## #-------------------------------------------------------------------
##
## Generate <- function(condition, fixed_objects = NULL) {
## dat <- data.frame()
## dat
## }
##
## Analyse.A1 <- function(condition, dat, fixed_objects = NULL) {
## ret <- nc(stat1 = NaN, stat2 = NaN)
## ret
## }
##
## Analyse.A2 <- function(condition, dat, fixed_objects = NULL) {
## ret <- nc(stat1 = NaN, stat2 = NaN)
## ret
## }
##
## #-------------------------------------------------------------------
##
## Summarise <- function(condition, results, fixed_objects = NULL) {
## ret <- c(bias = NaN, RMSE = NaN)
## ret
## }
##
## #-------------------------------------------------------------------
##
## res <- runSimulation(design=Design, replications=1000,generate=Generate,
## analyse=list(A1=Analyse.A1, A2=Analyse.A2),
## summarise=Summarise)
## res
```

Notice in this case that there are two `Analyse.#()`

definitions constructed, and when passed to `runSimulation()`

these are organized as a named `list`

. The names of the list will ultimately be attached to the names of the analysis objects so that there is no ambiguity in the outputted information. However, the inputs to the `Analyse()`

functions will always be the same, as the `dat`

object formed by the `Generate()`

call will be passed to each of these Analyse definitions (hence, the generate data is held constant across the respective analyses).

The above template should of course be modified to replace the less useful names of the `Analyse.#()`

components. By default users will want to change these to something like `Analyse.some_statistic`

, `Analyse.some_other_statistic`

, …, `Analyse.some_other_other_statistic`

, and so on, where the number of `Analyse.#`

function definitions will ultimately end up in the `runSimulation(..., Analyse=list())`

input. Supplying better names to the named `list`

component is also recommended as these will be used to name the associated results in the `Summarise()`

step.

Finally, note that all the rules about objects and object naming from the typical single Analyse function still apply and are properly checked internally for suitable names and consistency. The independently defined Analyse functions are also *interchangable* and *removable*/*replaceable*, which makes the structure of the Generate-Analyse-Summarise setup more modular with respect to the analysis components.

The following code is adopted from the Wiki, and so details about the simulation should be obtained from that source.

```
library(SimDesign)
# SimFunctions(nAnalyses = 2)
sample_sizes <- c(250, 500, 1000)
nitems <- c(10, 20)
Design <- createDesign(sample_size = sample_sizes,
nitems = nitems)
# create list of additional parameters which are fixed across conditions
set.seed(1)
pars_10 <- rbind(a = round(rlnorm(10, .3, .5)/1.702, 2),
d = round(rnorm(10, 0, .5)/1.702, 2))
pars_20 <- rbind(a = round(rlnorm(20, .3, .5)/1.702, 2),
d = round(rnorm(20, 0, .5)/1.702, 2))
pars <- list(ten=pars_10, twenty=pars_20)
P_logit <- function(a, d, Theta) exp(a * Theta + d) / (1 + exp(a * Theta + d))
P_ogive <- function(a, d, Theta) pnorm(a * Theta + d)
```

```
Generate <- function(condition, fixed_objects = NULL) {
N <- condition$sample_size
nitems <- condition$nitems
nitems_name <- ifelse(nitems == 10, 'ten', 'twenty')
#extract objects from fixed_objects
a <- fixed_objects[[nitems_name]]['a', ]
d <- fixed_objects[[nitems_name]]['d', ]
dat <- matrix(NA, N, nitems)
colnames(dat) <- paste0('item_', 1:nitems)
Theta <- rnorm(N)
for(j in 1:nitems){
p <- P_ogive(a[j], d[j], Theta)
for(i in 1:N)
dat[i,j] <- sample(c(1,0), 1, prob = c(p[i], 1 - p[i]))
}
as.data.frame(dat) #data.frame works nicer with lavaan
}
Analyse.FIML <- function(condition, dat, fixed_objects = NULL) {
mod <- mirt(dat, 1L, verbose=FALSE)
if(!extract.mirt(mod, 'converged')) stop('mirt did not converge')
cfs <- mirt::coef(mod, simplify = TRUE, digits = Inf)
FIML_as <- cfs$items[,1L] / 1.702
ret <- c(as=unname(FIML_as))
ret
}
Analyse.DWLS <- function(condition, dat, fixed_objects = NULL) {
nitems <- condition$nitems
lavmod <- paste0('F =~ ', paste0('NA*', colnames(dat)[1L], ' + '),
paste0(colnames(dat)[-1L], collapse = ' + '),
'\nF ~~ 1*F')
lmod <- sem(lavmod, dat, ordered = colnames(dat))
if(!lavInspect(lmod, 'converged')) stop('lavaan did not converge')
cfs2 <- lavaan::coef(lmod)
DWLS_alpha <- cfs2[1L:nitems]
const <- sqrt(1 - DWLS_alpha^2)
DWLS_as <- DWLS_alpha / const
ret <- c(as=unname(DWLS_as))
ret
}
Summarise <- function(condition, results, fixed_objects = NULL) {
nitems <- condition$nitems
nitems_name <- ifelse(nitems == 10, 'ten', 'twenty')
#extract objects from fixed_objects
a <- fixed_objects[[nitems_name]]['a', ]
pop <- c(a, a)
obt_bias <- bias(results, pop)
obt_RMSE <- RMSE(results, pop)
ret <- c(bias=obt_bias, RMSE=obt_RMSE)
ret
}
```

```
res <- runSimulation(Design, replications=100, verbose=FALSE, parallel=TRUE,
generate=Generate,
analyse=list(FIML=Analyse.FIML, DWLS=Analyse.DWLS),
summarise=Summarise, filename = 'mirt_lavaan',
packages=c('mirt', 'lavaan'), fixed_objects=pars)
```

`res`

```
## # A tibble: 6 × 86
## sample_size nitems bias.FIML.as1 bias.FIML.…¹ bias.FIM…² bias.FIM…³ bias.FIM…⁴
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 250 10 0.0099887 0.013457 -0.020161 0.18616 9.2653e-3
## 2 500 10 -0.021136 0.018720 -0.018173 0.10424 2.9298e-2
## 3 1000 10 -0.0039473 -0.0014778 -0.013393 0.077890 3.5690e-3
## 4 250 20 0.060647 0.011787 0.064688 -0.026314 5.8222e-4
## 5 500 20 0.019202 0.045376 0.021837 -0.0063587 2.0297e-2
## 6 1000 20 0.019976 0.018608 0.0079817 -0.011498 -1.2842e-2
## # … with 79 more variables: bias.FIML.as6 <dbl>, bias.FIML.as7 <dbl>,
## # bias.FIML.as8 <dbl>, bias.FIML.as9 <dbl>, bias.FIML.as10 <dbl>,
## # bias.DWLS.as1 <dbl>, bias.DWLS.as2 <dbl>, bias.DWLS.as3 <dbl>,
## # bias.DWLS.as4 <dbl>, bias.DWLS.as5 <dbl>, bias.DWLS.as6 <dbl>,
## # bias.DWLS.as7 <dbl>, bias.DWLS.as8 <dbl>, bias.DWLS.as9 <dbl>,
## # bias.DWLS.as10 <dbl>, RMSE.FIML.as1 <dbl>, RMSE.FIML.as2 <dbl>,
## # RMSE.FIML.as3 <dbl>, RMSE.FIML.as4 <dbl>, RMSE.FIML.as5 <dbl>, …
## # ℹ Use `colnames()` to see all variable names
```

In this particular formulation the `mirt`

and `lavaan`

package analyses have been completely isolated into their own respective functions, and in principle could therefore be analyzed independently in future simulation studies. This adds a nicer layer of potential modularity to the Analyse portion of the `SimDesign`

framework, where re-using or modifying previous `SimDesign`

code should be less painful.

`AnalyseIf()`

In situations where analysis functions defined in the `analyse`

list should only be applied in certain design conditions, users can include an `AnalyseIf()`

definition at the beginning of their respective functions to ensure the analyses are only executed when the provided logical is `TRUE`

. This logical ensures the data generation conditions are suitable for the analysis function to be investigated; otherwise, it is skipped over in the generate-analyse-summarise work-flow.

As a continuation from above, say that an investigator was also interested in recovering the slope parameters of a factor analysis model where the observed indicator variable were continuous as well as discrete. The `Design`

definition may therefore look like the following.

```
Design <- createDesign(sample_size = sample_sizes,
nitems = nitems,
indicators = c('discrete', 'continuous'))
Design
```

```
## # A tibble: 12 × 3
## sample_size nitems indicators
## <dbl> <dbl> <chr>
## 1 250 10 discrete
## 2 500 10 discrete
## 3 1000 10 discrete
## 4 250 20 discrete
## 5 500 20 discrete
## 6 1000 20 discrete
## 7 250 10 continuous
## 8 500 10 continuous
## 9 1000 10 continuous
## 10 250 20 continuous
## 11 500 20 continuous
## 12 1000 20 continuous
```

Provided that the `Generate()`

step utilized this `indicators`

character vector, this would imply that the `dat`

object returned from `Generate()`

could consist of discrete or continuous data. In the case of continuous indicator variables, `lavaan`

could be used as it supports such indicator types; however, `mirt`

cannot. So, to ensure that only the analysis function pertaining to `lavaan`

is used one could include the following replacement definition that used `mirt`

, but now includes an `AnalyseIf()`

logical given the `indicators`

variable’s state.

```
Analyse.FIML <- function(condition, dat, fixed_objects = NULL) {
AnalyseIf(condition$indicators == 'discrete')
# equivalently:
# AnalyseIf(indicators == 'discrete', condition)
# with(condition, AnalyseIf(indicators == 'discrete'))
mod <- mirt(dat, 1L, verbose=FALSE)
if(!extract.mirt(mod, 'converged')) stop('mirt did not converge')
cfs <- coef(mod, simplify = TRUE, digits = Inf)
FIML_as <- cfs$items[,1L] / 1.702
ret <- c(as=unname(FIML_as))
ret
}
```

Using this definition the final object returned by `runSimulation()`

will provide suitable `NA`

placeholders (where appropriate). For continuous indicators the results will be presented as though `mirt`

was never used for the continuous indicator conditions controlled by the `Design`

object.

Note that a similar logic can be made for multiple `Generate()`

functions, templated via `SimDesign::SimFunctions(nGenerate = 2)`

, and requiring specific `GenerateIf()`

tests to indicate which generation function should be used. For those interested in this type of structure, the following could be used to separate the discrete/continuous data generation per `Design`

row:

```
Generate.G1 <- function(condition, fixed_objects = NULL) {
GenerateIf(condition$indicators == 'discrete')
...
dat
}
Generate.G2 <- function(condition, fixed_objects = NULL) {
GenerateIf(condition$indicators == 'continuous')
...
dat
}
res <- runSimulation(design=Design, replications=1000,
generate=list(G1=Generate.G1, G2=Generate.G2),
analyse=list(DWLS=Analyse.DWLS, FIML=Analyse.FIML),
summarise=Summarise)
```

Interestingly, `AnalyseIf()`

could also be used to select only one analysis function at a time given the components in the `Design`

object. For instance, if the `Design`

definition were constructed using

```
Design <- createDesign(sample_size = sample_sizes,
nitems = nitems,
method = c('FIML', 'DWLS'))
Design
```

```
## # A tibble: 12 × 3
## sample_size nitems method
## <dbl> <dbl> <chr>
## 1 250 10 FIML
## 2 500 10 FIML
## 3 1000 10 FIML
## 4 250 20 FIML
## 5 500 20 FIML
## 6 1000 20 FIML
## 7 250 10 DWLS
## 8 500 10 DWLS
## 9 1000 10 DWLS
## 10 250 20 DWLS
## 11 500 20 DWLS
## 12 1000 20 DWLS
```

and the analysis functions above were supplied defined as

```
Analyse.FIML <- function(condition, dat, fixed_objects = NULL) {
AnalyseIf(method == 'FIML', condition)
#...
}
Analyse.DWLS <- function(condition, dat, fixed_objects = NULL) {
AnalyseIf(method == 'DWLS', condition)
# ...
}
# ...
res <- runSimulation(Design, replications=100, verbose=FALSE, parallel=TRUE,
generate=Generate,
analyse=list(Analyse.FIML, Analyse.DWLS),
summarise=Summarise, filename = 'mirt_lavaan',
packages=c('mirt', 'lavaan'), fixed_objects=pars)
```

then only one analysis function will be applied at a time in the simulation experiment. Note that in this case there is no need to append ‘MML’ or ‘DWLS’ to the `results`

objects as this becomes redundant with the `method`

column in the `Design`

object, and so the `analyse`

list input is specified as an unnamed list (cf. earlier when the input was named, which appended `MML.`

and `DWLS.`

to the `results`

output in `Summarise()`

).

Users may find this a more natural setup than having to merge all analysis information into a single `Analyse()`

definition. The downside, however, is that the analysis function will be applied to different generated datasets, which while theoretically unbiased could have ramifications should the analysis functions throw errors at different rates (even when explicitly supplying a `seed`

vector input to `runSimulation()`

).