Abstract
This vignette lists the available metrics in cvms
, along
with their formulas.
Contact the author at rpkgs@ludvigolsen.dk
cvms
has a large set of metrics for model evaluation. In
this document, we list the metrics and their formulas.
Some of the metrics in the package are computed with external packages. These are listed at the bottom.
Some of the metrics are disabled by default to avoid cluttering the
output tibble. These can be enabled in the metrics
argument. This argument takes a list of named booleans, like
list("Accuracy" = FALSE, "Weighted F1" = TRUE)
. This can be
generated with the helper functions gaussian_metrics()
,
binomial_metrics()
, and multinomial_metrics()
.
If, for instance, we only wish to calculate the RMSE
metric
for our regression model, we can use either
list("all" = FALSE, "RMSE" = TRUE)
or
gaussian_metrics(all = FALSE, rmse = TRUE)
.
The metrics used to evaluate regression tasks (like linear regression):
Symbol  Denotes  Formula 

\(y\)  Targets  
\(\hat{y}\)  Predictions  
\(\bar{y}\)  Average target  
\(n\)  Number of observations  
\({\scriptstyle \operatorname{IQR}(x)}\)  Interquartile Range  \({\scriptstyle \operatorname{quantile}(x, 3/4)  \operatorname{quantile}(x, 1/4)}\) 
\(\lvert x \rvert\)  Absolute value of \(x\) 
Metric name  Abbreviation  Formula 

Root Mean Square Error  RMSE  \(\sqrt{\frac{\sum_{i=1}^{n}(\hat{y}_{i}y_{i})^2}{n}}\) 
Mean Absolute Error  MAE  \(\frac{\sum_{i=1}^{n}\lvert\hat{y}_{i}y_{i}\rvert}{n}\) 
Root Mean Square Log Error  RMSLE  \(\sqrt{\frac{\sum_{i=1}^{n}(\ln{(\hat{y}_{i}+1)}\ln{(y_{i}+1))^2}}{n}}\) 
Mean Absolute Log Error  MALE  \(\frac{\sum_{i=1}^{n}\lvert\ln{(\hat{y}_{i}+1)}\ln{(y_{i}+1)\rvert}}{n}\) 
Relative Absolute Error  RAE  \(\frac{\sum_{i=1}^{n}\lvert\hat{y}_{i}y_{i}\rvert}{\sum_{i=1}^{n}\lvert y_{i}\bar{y}\rvert}\) 
Relative Squared Error  RSE  \(\frac{\sum_{i=1}^{n}(\hat{y}_{i}y_{i})^2}{\sum_{i=1}^{n}(y_{i}  \bar{y})^2}\) 
Root Relative Squared Error  RRSE  \({\scriptstyle \sqrt{RSE} }\) 
Mean Absolute Percentage Error  MAPE  \(\frac{1}{n}\sum_{i=1}^{n} \lvert \frac{\hat{y}_{i}y_{i}}{y_{i}} \rvert\) 
Normalized RMSE (by target range) 
NRMSE(RNG)  \(\frac{RMSE}{\max{y}\min{y}}\) 
Normalized RMSE (by target IQR) 
NRMSE(IQR)  \(\frac{RMSE}{\operatorname{IQR}(y)}\) 
Normalized RMSE (by target STD) 
NRMSE(STD)  \(\frac{RMSE}{\sqrt{\frac{1}{n1}\sum_{i=1}^{n}(y_i\bar{y})^2}}\) 
Normalized RMSE (by target mean) 
NRMSE(AVG)  \(\frac{RMSE}{\bar{y}}\) 
Mean Square Error  MSE  \(\frac{\sum_{i=1}^{n}(\hat{y}_{i}y_{i})^2}{n}\) 
Total Absolute Error  TAE  \({\scriptstyle \sum_{i=1}^{n}\lvert\hat{y}_{i}y_{i}\rvert}\) 
Total Squared Error  TSE  \({\scriptstyle \sum_{i=1}^{n}(\hat{y}_{i}y_{i})^2}\) 
The metrics used to evaluate binary classification tasks:
Based on a confusion matrix, we first count the True Positives (TP),
True Negatives (TN), False Positives (FP), and False Negatives (FN).
Below, 1
is the positive class.
#> Target
#> Prediction 0 1
#> 0 TN FN
#> 1 FP TP
With these counts, we can calculate the following metrics. Note, that
the Kappa
metric normalizes the counts to percentages.
Metric name(s)  Abbreviation  Formula 

Accuracy  \(\frac{TP + TN}{TP + TN + FP + FN}\)  
Balanced Accuracy  \(\frac{Sensitivity + Specificity}{2}\)  
Sensitivity, Recall, True Positive Rate 
\(\frac{TP}{TP + FN}\)  
Specificity, True Negative Rate 
\(\frac{TN}{TN + FP}\)  
Positive Predictive Value, Precision 
Pos Pred Value  \(\frac{TP}{TP + FP}\) 
Negative Predictive Value  Neg Pred Value  \(\frac{TN}{TN + FN}\) 
F1 score  \(2 \cdot \frac{Pos Pred Value \cdot Sensitivity}{Pos Pred Value + Sensitivity}\)  
Matthews Correlation Coefficient  MCC  \(\frac{TP \cdot TN  FP
\cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP) (TN +
FN)}}\) Note: When the denominator is 0, we set it to 1 to avoid NaN . 
Detection Rate  \(\frac{TP}{TP + FN + TN + FP}\)  
Detection Prevalence  \(\frac{TP + FP}{TP + FN + TN + FP}\)  
Prevalence  \(\frac{TP + FN}{TP + FN + TN + FP}\)  
Threat Score  \(\frac{TP}{TP + FN + FP}\)  
False Negative Rate  \({\scriptstyle 1  Sensitivity}\)  
False Positive Rate  \({\scriptstyle 1  Specificity}\)  
False Discovery Rate  \({\scriptstyle 1  Pos Pred Value}\)  
False Omission Rate  \({\scriptstyle 1  Neg Pred Value}\)  
Kappa  For Kappa, the counts
(TP , TN , FP , FN ) are
normalized to percentages (summing to 1). Then:\({\scriptstyle p_{observed} = TP + TN}\) \({\scriptstyle p_{expected} = (TN + FP)(TN + FN) + (FN + TP)(FP + TP)}\) \(Kappa = \frac{p_{observed}  p_{expected}}{1  p_{expected}}\) 
We have four types of metrics for the multiclass classification evaluation:
Overall metrics simply look at whether a prediction
is correct or not. Currently, cvms
only has the
Overall Accuracy
.
The Macro/Average metrics are based
on onevsall evaluations of each class. In a
onevsall evaluation, we set all predictions and targets for
the current class to 1
and all others to 0
(
\({\scriptstyle y_{o,c} = 1 \text{ if } y_{o}
= c \text{ else } 0}\) and \({\scriptstyle \hat{y}_{o,c} = 1 \text{ if }
\hat{y} _{o} = c \text{ else } 0}\) ) and perform a binomial
evaluation. Once done for all classes, we average the results. Note that
this is sometimes referred to as onevsrest, as it is the current class
against the rest of the classes.
With a few exceptions (AUC
and MCC
), the
metrics in the multinomial outputs that share their name with the
binomial metrics are macro metrics. AUC
and
MCC
instead have specific multiclass
variants.
The Weighted metrics are averages, similar to the
macro metrics, but weighted by the Support
for each
class.
Metric name  Abbreviation  Formula 

Overall Accuracy  \(\frac{Correct}{Correct + Incorrect}\)  
Macro Metric  \({\scriptstyle \frac{1}{\lvert C \rvert}\sum_{c}^{C} metric_{c}}\)  
Support  \({\scriptstyle support_c =
\lvert \{ o \in O : o=c \} \rvert \quad \forall c \in
C}\) I.e., a count of the class in the target column. \(C\): the set of classes; \(O\): the observations. \(\lvert x \rvert\) denotes length of \(x\). 

Weighted metric  \(\frac{\sum_{c}^{C} metric_{c} \cdot support_{c}}{\sum_{c}^{C} support_{c}}\)  
Multiclass MCC  MCC  \({\scriptstyle \frac{N
\operatorname{Tr}(C)\sum_{k l} \tilde{\mathcal{C}}_{k}
\hat{\mathcal{C}}_{l}}{\sqrt{N^{2}\sum_{k l}
\tilde{\mathcal{C}}_{k}\left(\hat{\mathcal{C}}^{\mathrm{T}}\right)_{l}}
\sqrt{N^{2}\sum_{k l}\left(\tilde{C}^{\mathrm{T}}\right)_{k}
\hat{C}_{l}}} }\) \(N\): number of samples \(C\): a \(K \times K\) confusion matrix \(Tr(C)\): Number of correct predictions \(\tilde{\mathcal{C}}_{k}\): \(k\)th row of \(C\) ; \(\hat{C}_{l}\): \(l\)th column of \(C\) \(C^{T}\): \(C\) transposed Note: When the computation is NaN , we return 0 .Code was ported from scikitlearn. Gorodkin, J. (2004). Comparing two Kcategory assignments by a Kcategory correlation coefficient. Computational biology and chemistry, 28(56), 367374. 
These metrics are calculated by other packages:
Metric name  Abbreviation  Package::Function  Used in 

Aikake Information Criterion  AIC  stats::AIC  Shared 
Corrected Aikake Information Criterion  AICc  MuMIn::AICc  Shared 
Bayesian Information Criterion  BIC  stats::BIC  Shared 
Aikake Information Criterion  AIC  stats::AIC  Shared 
Marginal Rsquared  r2m  MuMIn::r.squaredGLMM  Gaussian 
Conditional Rsquared  r2c  MuMIn::r.squaredGLMM  Gaussian 
ROC curve  ROC  pROC::roc  Binomial 
Area Under the Curve  AUC  pROC::roc  Binomial 
Multiclass ROC curve  ROC  pROC::multiclass.roc  Multinomial 
Multiclass Area Under the Curve  AUC  pROC::multiclass.roc  Multinomial 