Acquiring valid norm scores fundamentally relies on the
representativeness of the norm sample. Traditionally, random sampling is
employed to meet this end, but even this approach may result in a sample
that diverges from the population structure. The *cNORM* R
package provides a potent solution by incorporating sampling weights
into the norming process, thereby diminishing the impact of
non-representative norm samples on the norm score quality. A key
component of this process is raking, or iterative proportional fitting,
which allows for the post-stratification of the norm sample based on
stratification variables (SVs), considering their population
marginals.

When a norm sample inadequately represents the target population, especially regarding pertinent stratification variables, it can compromise the quality of the resulting norm scores (Kruskal & Mosteller, 1979). An illustration of this issue is the neglect of parental educational background when determining norm scores for a children’s intelligence test. Such oversight can lead to skewed norm scores, thus distorting the child’s actual intelligence level (Hernandez et al., 2017). As norm scores commonly inform significant decisions, like school placement or the diagnosis of learning disabilities (Gary & Lenhard, 2021; Lenhard, Lenhard & Gary, 2019; Lenhard & Lenhard, 2021), any bias can potentially disadvantage those being evaluated. Therefore, techniques such as sample weighting methods become crucial to mitigate non-representativeness in norm samples.

Raking, also called iterative proportional fitting, is a post-stratification approach targeted to enhance sample representativeness with respect to two or more stratification variables . For this purpose, sample weights are computed for every case in the norm sample based on the ratio between the proportion of the corresponding strata in the target population and the proportion in the actual norm sample (Lumley, 2011). The procedure can be described as an iterative post-stratification with respect to one variable in each step. For example, let’s assume a target population containing 49% female as well as 51% male persons, while the resulting norm sample contains 45% female and 55% male subjects. To enhance the representativeness of the norm sample with respect to the SV sex (female/male), every single female case would be weighted with \(w_{female}=\frac{49\%}{45\%}=1.09\) and every single male case with \(w_{male}=\frac{55\%}{51\%}=0.93\). For stratifying a norm sample with respect to two or more variables, for example sex(female/male) and education(low/medium/high), the before described adaptation is applied several times regarding the marginals of one variable by time iteratively. For example, if the weights are adapted with respect to the variable sex first, the weights would be adapted regarding education in the second step. Since the weights no longer represent the population with respect to variable sex after the second step, the weights are computed to SV sex in the third step respectively to education in the fourth step and so on until the corresponding raking weights are converged. Finally, the resulting raking weights respectively the weighted norm sample represents the target population with respect to the marginal proportions of the used SVs. Each case is assigned with an according weight in a way that the proportions of the strata in the norm sample aligns with the composition of the representative population.

The integration of raking weights in *cNORM* is accomplished
in three steps.

- Computation and standardization of raking weights
- Initial ranking of test raw scores using standardized raking weights with weighted percentile estimation
- Regression-based norming with standardized regression weights

Raking weights are computed regarding the proportions of the SVs in the target population and the actual norm sample. Afterwards, the resulting raking weights are standardized by dividing every weight by the smallest resulting raking weight, i.e., the smallest weight is set to 1.0, while the ratio between one weight and each other remains the same. Consequently, underrepresented cases in the sample are weighted with a factor larger 1.0. To compute the weights, please provide a data frame with three columns to specify the population marginals. The first column specifies the stratification variable, the second the factor level of the stratification variable and the third the proportion for the representative population. The function ‘computeWeights()’ is used to retrieve the weights. The original data and the marginals have to be passed as function parameters.

Secondly, the norm sample is ranked with respect to the raking weights using weighted percentile. This step is the actual start of the further regression-based norming approach and it is automatically applied in the ‘cnorm()’ function, as soon as weights are specified.

Finally, the standardized raking weights are used in the weighted best-subset regression to obtain an adequate norm model. While the former steps can be seen as kind of data preparation, the computation of the regression-based norm model represents the actual norming process, since the resulting regression model is used for the actual mapping between achieved raw score and assigned norm score. By using the standardized raking weights in weighted regression, an overfit of the regression model with respect to overrepresented data points should be reduced. This third step is as well applied automatically when using the ‘cnorm()’ function.

In the following, the usage of raking weights in regression-based
norming with *cNORM* is illustrated in detail based the on a not
representative norm sample for the German version of the *Peabody
Picture Vocabulary Test* (PPVT-IV)

```
library(cNORM)
# Assign data to object norm.data
<- ppvt
norm.data head(norm.data)
#> age sex migration region raw group
#> 1 2.5971 1 0 west 120 3.160655
#> 2 2.5993 1 0 west 67 3.160655
#> 3 2.6241 1 0 west 23 3.160655
#> 4 2.8622 1 0 south 50 3.160655
#> 5 2.8764 1 0 south 44 3.160655
#> 6 2.9308 1 0 west 55 3.160655
```

For the post-stratification, we need population marginals for the relevant stratification variables as a data frame, with each level of each stratification variable in a row. The data frame must contain the names of the SVs (column 1), the single levels (column 2) and the corresponding proportion in the target population (column 3).

```
# Generate population marginals
<- data.frame(var = c("sex", "sex", "migration", "migration"),
marginals level = c(1,2,0,1),
prop = c(0.51, 0.49, 0.65, 0.35))
head(marginals)
#> var level prop
#> 1 sex 1 0.51
#> 2 sex 2 0.49
#> 3 migration 0 0.65
#> 4 migration 1 0.35
```

To caclulate raking weights, the cNORM’s ‘computeWeights()’ function is used, with the norm sample data and the population marginals as function parameters.

```
<- computeWeights(data = norm.data, population.margins = marginals)
weights #> Raking converged normally after 3 iterations.
```

Using the ‘cnorm()’ function passing the raking weights by function parameter ‘weights’, the intial weighted ranking and the actual norming process is started.

```
<- cnorm(raw = norm.data$raw, group = norm.data$group,
norm.model weights = weights)
```

The resulting model contains four predictors with a RMSE of 3.54212.

```
summary(norm.model)
#> Final solution: 6 terms
#> R-Square Adj. = 0.990042
#> Final regression model: raw ~ L2 + L1A1 + L1A2 + L2A1 + L2A3 + L4A1
#> Regression function: raw ~ -92.16983481 + (0.0195782225*L2) + (1.327109958*L1A1) + (-0.03851643094*L1A2) + (-0.01236535515*L2A1) + (1.320696396e-05*L2A3) + (3.158314092e-07*L4A1)
#> Raw Score RMSE = 3.60335
#> Post stratification was applied. The weights range from 1 to 1.415 (m = 1.116, sd = 0.182).
```

Moreover, the percentile plot reveals no hints on model violation, like intersecting percentile curves. It reaches a high multiple R2 with only few terms.

```
plot(norm.model, "subset")
plot(norm.model, "norm")
```

We extensively simulated biased distributions and assessed, if our approach can mitigate the effects of unrepresentative samples. cNORM itself already corrects for several types of sampling eror, namely if deviations occur in specific age groups or if joint probabilities of stratification variables are unbalanced (while preserving the marginals). Weighted Continuous Norming as well works very well in most, but not all use cases. Please note the following:

- Non-representativeness in most cases leads to (moderately) increased error of the normed scores. It is - of course - always better to ensure the highest feasible degree of representativeness in the data collection.
- The data collection should be as random as possible.
- In most but not in all cases, Weighted Continuous Norming reduces
negative effects of non-representative norm samples. If the mean of the
standardized weights exceeds a value of \(m_{weights}=2\), this is an indication to
rather not use weighting.

- With cNORM, representativeness need not necessarily be established in every single age group. If the marginals are more or less correct, weighting is unnecessary.
- Only use stratification for variables with considerable influence on the dependent variable.
- If available, the probabilities of cross-classifications of the stratification variables can be used. You can recode several variables into one and directly specify the according population marginals (especially in combination with the next point).
- Avoid too many stratification variables with many fine-grained levels. This leads to high weights in specific subgroups. Rather combine different levels of stratification variables, if the according subgroups do not differ in the outcome variable.

- Gary, S., & Lenhard, W., (2021). In norming we trust - Verfahren
zur statistischen Modellierung kontinuierlicher Testnormen auf dem
Prüfstand.
*Diagnostica*, 67(2), 75 - 86. - Hernandez, A., Aguilar, C., Paradell, È., Munoz, M., Vannier, L.C.,
& Vallar, F. (2017). The effect of demographic variables on the
assesment of cognitive ability.
*Psicothema*.*29*(4), 469 - 474. - Kruskal, W., & Mosteller, F. (1979). Representative sampling,
III: The current statistical literature.
*International Statistical Review/Revue Internationale de Statistique*, 245 - 265 - Lenhard, A., Lenhard, W., Segerer, R., & Suggate, S. (2015).
*Peabody Picture Vocbulary Test (PPVT-4)*. Frankfurt: Pearson Clinical Assesment. - Lenhard, A., Lenhard, W., & Gary, S. (2019). Continuous norming
of psychometric tests: A simulation study of parametric and
semi-parametric approaches.
*PloS one*,*14*(9), e0222279. - Lenhard, W., & Lenhard, A. (2021). Improvement of norm score
quality via regression-based continuous norming.
*Educational and Psychological Measurement*,*81*(2), 229 - 261. - Lumley, T. (2011).
*Complex surveys: A guide to analysis using R*(Vol. 565); John Wiley & Sons. - Mercer, A., Lau, A., & Kennedy, C. (2018). For weighting online
opt-in samples, what matters most.
*Pew Research Center*.