datanugget: Create and Refine Data Nuggets
Creating and refining data nuggets.
Data nuggets reduce a large dataset to a small collection of nuggets of
data, each with a center (location), weight (importance), and scale
(variability) parameter. Data nugget centers are created by choosing
observations in the dataset that are as evenly spaced as possible.
Data nugget weights are created by counting the number of observations
closest to a given data nugget’s center. The data nugget is then said to
'contain' these observations, and the data nugget center is recalculated
as the mean of these observations. Data nugget scales are created by
calculating the trace of the covariance matrix of the observations
contained within a data nugget, divided by the dimension of the dataset.
Data nuggets are refined by 'splitting' data nuggets whose scale or
shape (defined as the ratio of the two largest eigenvalues of the
covariance matrix of the observations contained within the data nugget)
is deemed too large.
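
The quantities described above can be illustrated with a minimal sketch. This is not the package's R implementation: it is written in plain Python, and the function names `assign_to_centers` and `summarize_nugget` are hypothetical. It assumes squared Euclidean distance for "closest", the sample covariance (n - 1 denominator), two-dimensional data so that the eigenvalues have a closed form, and a shape taken as the largest eigenvalue divided by the second largest.

```python
import math


def assign_to_centers(points, centers):
    """For each point, return the index of its closest center
    (squared Euclidean distance; an assumption of this sketch)."""
    labels = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        labels.append(dists.index(min(dists)))
    return labels


def summarize_nugget(members):
    """Center, weight, scale, and shape for one nugget of 2-D observations."""
    n = len(members)
    dim = len(members[0])
    if dim != 2:
        raise ValueError("this sketch only handles 2-D observations")

    # Center: mean of the observations the nugget contains.
    center = [sum(p[j] for p in members) / n for j in range(dim)]

    # Sample covariance matrix (n - 1 denominator is an assumption;
    # a single-observation nugget gets a zero matrix).
    denom = max(n - 1, 1)
    cov = [
        [
            sum((p[i] - center[i]) * (p[j] - center[j]) for p in members) / denom
            for j in range(dim)
        ]
        for i in range(dim)
    ]

    # Scale: trace of the covariance matrix divided by the dimension.
    scale = sum(cov[i][i] for i in range(dim)) / dim

    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    a, b, d = cov[0][0], cov[0][1], cov[1][1]
    half_trace = (a + d) / 2
    root = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    largest, second = half_trace + root, half_trace - root

    # Shape: ratio of the two largest eigenvalues (direction assumed).
    shape = largest / second if second > 0 else math.inf

    return {"center": center, "weight": n, "scale": scale, "shape": shape}


# Made-up toy data: two well-separated groups and two initial centers.
points = [(0, 0), (2, 0), (0, 2), (2, 2), (10, 10), (12, 10)]
labels = assign_to_centers(points, [(0, 0), (10, 10)])
for k in sorted(set(labels)):
    members = [p for p, lab in zip(points, labels) if lab == k]
    print(k, summarize_nugget(members))
```

In this sketch, a nugget whose observations lie along a line has a second eigenvalue of zero, so its shape is reported as infinite; under the refinement rule described above, such a nugget would be a candidate for splitting.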
Depends: R (≥ 4.0), doSNOW (≥ 1.0.16), foreach (≥ 1.5.1), parallel (≥ 4.0.5)
Suggests: testthat (≥ 3.0.0)
Author: Yajie Duan [cre, ctb], Traymon Beavers [aut], Javier Cabrera [aut], Mariusz Lubomirski [aut]
Maintainer: Yajie Duan <yajieritaduan at gmail.com>