Title: | Cross-Validated Covariance Matrix Estimation |
---|---|
Description: | An efficient cross-validated approach for covariance matrix estimation, particularly useful in high-dimensional settings. This method relies upon the theory of high-dimensional loss-based covariance matrix estimator selection developed by Boileau et al. (2022) <doi:10.1080/10618600.2022.2110883> to identify the optimal estimator from among a prespecified set of candidates. |
Authors: | Philippe Boileau [aut, cre, cph] , Nima Hejazi [aut] , Brian Collica [aut] , Jamarcus Liu [ctb], Mark van der Laan [ctb, ths] , Sandrine Dudoit [ctb, ths] |
Maintainer: | Philippe Boileau <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.2.2 |
Built: | 2024-11-13 05:00:31 UTC |
Source: | https://github.com/philboileau/cvcovest |
adaptiveLassoEst()
applies the adaptive LASSO to the
entries of the sample covariance matrix. The thresholding function is
inspired by the penalized regression introduced by
Zou (2006). It assigns a weight to each entry of the sample covariance
matrix based on that entry's initial value. This weight then determines
the relative size of the penalty, so larger entries are penalized less,
reducing bias
(Rothman et al. 2009).
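As a rough illustration, the adaptive lasso thresholding rule of Rothman et al. (2009) can be written entrywise as sign(s) * max(|s| - lambda^(n + 1) * |s|^(-n), 0). The sketch below assumes this parameterization, which may differ from the one used internally; the helper name adaptive_lasso_threshold is hypothetical.

```r
# Hypothetical sketch of entrywise adaptive lasso thresholding of a
# sample covariance matrix S, with penalty lambda and weight exponent n.
# Assumes the rule sign(s) * max(|s| - lambda^(n + 1) * |s|^(-n), 0).
adaptive_lasso_threshold <- function(S, lambda, n) {
  sign(S) * pmax(abs(S) - lambda^(n + 1) * abs(S)^(-n), 0)
}

# Larger entries receive a smaller effective penalty and shrink less.
S <- cov(mtcars)
S_alasso <- adaptive_lasso_threshold(S, lambda = 0.9, n = 0.9)
```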
adaptiveLassoEst(dat, lambda, n)
dat |
A numeric |
lambda |
A non-negative |
n |
A non-negative |
A matrix
corresponding to the estimate of the covariance
matrix.
Rothman AJ, Levina E, Zhu J (2009).
“Generalized Thresholding of Large Covariance Matrices.”
Journal of the American Statistical Association, 104(485), 177-186.
doi:10.1198/jasa.2009.0101, https://doi.org/10.1198/jasa.2009.0101.
Zou H (2006).
“The Adaptive Lasso and Its Oracle Properties.”
Journal of the American Statistical Association, 101(476), 1418-1429.
doi:10.1198/016214506000000735, https://doi.org/10.1198/016214506000000735.
adaptiveLassoEst(dat = mtcars, lambda = 0.9, n = 0.9)
bandingEst()
estimates the covariance matrix of data with
ordered variables by forcing off-diagonal entries to be zero for indices
that are far removed from one another. The {i, j} entry of the estimated
covariance matrix will be zero if the absolute value of {i - j} is greater
than some non-negative constant k
. This estimator was proposed by
Bickel and Levina (2008).
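The banding rule itself is simple to state: entry {i, j} of the sample covariance matrix is retained when the absolute value of {i - j} is at most k and set to zero otherwise. A minimal sketch (the helper name band_covariance is illustrative, not part of the package API):

```r
# Zero out all entries whose row and column indices differ by more than k.
band_covariance <- function(S, k) {
  S[abs(row(S) - col(S)) > k] <- 0
  S
}

S_banded <- band_covariance(cov(mtcars), k = 2)
```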
bandingEst(dat, k)
dat |
A numeric |
k |
A non-negative, |
A matrix
corresponding to the estimate of the covariance
matrix.
Bickel PJ, Levina E (2008). “Regularized estimation of large covariance matrices.” Annals of Statistics, 36(1), 199–227. doi:10.1214/009053607000000758.
bandingEst(dat = mtcars, k = 2L)
cvCovEst()
identifies the optimal covariance matrix
estimator from among a set of candidate estimators.
cvCovEst(
  dat,
  estimators = c(linearShrinkEst, thresholdingEst, sampleCovEst),
  estimator_params = list(
    linearShrinkEst = list(alpha = 0),
    thresholdingEst = list(gamma = 0)
  ),
  cv_loss = cvMatrixFrobeniusLoss,
  cv_scheme = "v_fold",
  mc_split = 0.5,
  v_folds = 10L,
  parallel = FALSE,
  ...
)
dat |
A numeric |
estimators |
A |
estimator_params |
A named |
cv_loss |
A |
cv_scheme |
A |
mc_split |
A |
v_folds |
An |
parallel |
A |
... |
Not currently used. Permits backward compatibility. |
A list
of results containing the following elements:
estimate
- A matrix
corresponding to the estimate of
the optimal covariance matrix estimator.
estimator
- A character
indicating the optimal
estimator and corresponding hyperparameters, if any.
risk_df
- A tibble
providing the
cross-validated risk estimates of each estimator.
cv_df
- A tibble
providing each estimator's loss over the folds of the cross-validation
procedure.
args
- A named list
containing arguments passed to
cvCovEst
.
cvCovEst(
  dat = mtcars,
  estimators = c(
    linearShrinkLWEst, thresholdingEst, sampleCovEst
  ),
  estimator_params = list(
    thresholdingEst = list(gamma = seq(0.1, 0.3, 0.1))
  )
)
cvFrobeniusLoss()
evaluates the aggregated Frobenius loss
over a fold
object (from 'origami'
(Coyle and Hejazi 2018)).
cvFrobeniusLoss(fold, dat, estimator_funs, estimator_params = NULL)
fold |
A |
dat |
A |
estimator_funs |
An |
estimator_params |
A named |
A tibble
providing information on estimators,
their hyperparameters (if any), and their scaled Frobenius loss evaluated
on a given fold
.
Coyle J, Hejazi N (2018). “origami: A Generalized Framework for Cross-Validation in R.” Journal of Open Source Software, 3(21), 512. doi:10.21105/joss.00512.
library(MASS)
library(origami)
library(rlang)

# generate 10x10 covariance matrix with unit variances and off-diagonal
# elements equal to 0.5
Sigma <- matrix(0.5, nrow = 10, ncol = 10) + diag(0.5, nrow = 10)

# sample 50 observations from multivariate normal with mean = 0, var = Sigma
dat <- mvrnorm(n = 50, mu = rep(0, 10), Sigma = Sigma)

# generate a single fold using MC-cv
resub <- make_folds(dat,
  fold_fun = folds_vfold,
  V = 2
)[[1]]

cvFrobeniusLoss(
  fold = resub,
  dat = dat,
  estimator_funs = rlang::quo(c(
    linearShrinkEst, thresholdingEst, sampleCovEst
  )),
  estimator_params = list(
    linearShrinkEst = list(alpha = c(0, 1)),
    thresholdingEst = list(gamma = c(0, 1))
  )
)
cvMatrixFrobeniusLoss()
evaluates the matrix Frobenius
loss over a fold
object (from 'origami'
(Coyle and Hejazi 2018)). This loss function is equivalent to that
presented in cvFrobeniusLoss()
in terms of estimator selection, but is more computationally efficient.
cvMatrixFrobeniusLoss(fold, dat, estimator_funs, estimator_params = NULL)
fold |
A |
dat |
A |
estimator_funs |
An |
estimator_params |
A named |
A tibble
providing information on estimators,
their hyperparameters (if any), and their matrix Frobenius loss evaluated
on a given fold
.
Coyle J, Hejazi N (2018). “origami: A Generalized Framework for Cross-Validation in R.” Journal of Open Source Software, 3(21), 512. doi:10.21105/joss.00512.
library(MASS)
library(origami)
library(rlang)

# generate 10x10 covariance matrix with unit variances and off-diagonal
# elements equal to 0.5
Sigma <- matrix(0.5, nrow = 10, ncol = 10) + diag(0.5, nrow = 10)

# sample 50 observations from multivariate normal with mean = 0, var = Sigma
dat <- mvrnorm(n = 50, mu = rep(0, 10), Sigma = Sigma)

# generate a single fold using MC-cv
resub <- make_folds(dat,
  fold_fun = folds_vfold,
  V = 2
)[[1]]

cvMatrixFrobeniusLoss(
  fold = resub,
  dat = dat,
  estimator_funs = rlang::quo(c(
    linearShrinkEst, thresholdingEst, sampleCovEst
  )),
  estimator_params = list(
    linearShrinkEst = list(alpha = c(0, 1)),
    thresholdingEst = list(gamma = c(0, 1))
  )
)
cvScaledMatrixFrobeniusLoss()
evaluates the scaled matrix
Frobenius loss over a fold
object (from 'origami'
(Coyle and Hejazi 2018)). The squared error loss computed for each
entry of the estimated covariance matrix is scaled by the training set's
sample variances of the variables associated with that entry's row and
column. This loss should be used instead of
cvMatrixFrobeniusLoss()
when a dataset's variables' values
are of different magnitudes.
cvScaledMatrixFrobeniusLoss(fold, dat, estimator_funs, estimator_params = NULL)
fold |
A |
dat |
A |
estimator_funs |
An |
estimator_params |
A named |
A tibble
providing information on estimators,
their hyperparameters (if any), and their scaled matrix Frobenius loss
evaluated on a given fold
.
Coyle J, Hejazi N (2018). “origami: A Generalized Framework for Cross-Validation in R.” Journal of Open Source Software, 3(21), 512. doi:10.21105/joss.00512.
library(MASS)
library(origami)
library(rlang)

# generate 10x10 covariance matrix with unit variances and off-diagonal
# elements equal to 0.5
Sigma <- matrix(0.5, nrow = 10, ncol = 10) + diag(0.5, nrow = 10)

# sample 50 observations from multivariate normal with mean = 0, var = Sigma
dat <- mvrnorm(n = 50, mu = rep(0, 10), Sigma = Sigma)

# generate a single fold using MC-cv
resub <- make_folds(dat,
  fold_fun = folds_vfold,
  V = 2
)[[1]]

cvScaledMatrixFrobeniusLoss(
  fold = resub,
  dat = dat,
  estimator_funs = rlang::quo(c(
    linearShrinkEst, thresholdingEst, sampleCovEst
  )),
  estimator_params = list(
    linearShrinkEst = list(alpha = c(0, 1)),
    thresholdingEst = list(gamma = c(0, 1))
  )
)
denseLinearShrinkEst()
computes the asymptotically
optimal convex combination of the sample covariance matrix and a dense
target matrix. This target matrix's diagonal elements are equal to the
average of the sample covariance matrix estimate's diagonal elements, and
its off-diagonal elements are equal to the average of the sample covariance
matrix estimate's off-diagonal elements. For information on this
estimator's derivation, see Ledoit and Wolf (2020) and
Schäfer and Strimmer (2005).
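The dense target described above can be sketched directly (illustrative only; the data-driven shrinkage intensity that combines the target with the sample covariance matrix is not shown, and dense_target is a hypothetical helper name):

```r
# Dense shrinkage target: diagonal entries equal the average sample
# variance; off-diagonal entries equal the average sample covariance.
dense_target <- function(S) {
  p <- nrow(S)
  avg_var <- mean(diag(S))
  avg_cov <- mean(S[row(S) != col(S)])
  matrix(avg_cov, p, p) + diag(avg_var - avg_cov, p)
}

target <- dense_target(cov(mtcars))
```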
denseLinearShrinkEst(dat)
dat |
A numeric |
A matrix
corresponding to the estimate of the covariance
matrix.
Ledoit O, Wolf M (2020).
“The Power of (Non-)Linear Shrinking: A Review and Guide to Covariance Matrix Estimation.”
Journal of Financial Econometrics.
ISSN 1479-8409, doi:10.1093/jjfinec/nbaa007, nbaa007, https://academic.oup.com/jfec/advance-article-pdf/doi/10.1093/jjfinec/nbaa007/33416890/nbaa007.pdf.
Schäfer J, Strimmer K (2005).
“A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics.”
Statistical Applications in Genetics and Molecular Biology, 4(1).
doi:10.2202/1544-6115.1175, https://www.degruyter.com/view/journals/sagmb/4/1/article-sagmb.2005.4.1.1175.xml.xml.
denseLinearShrinkEst(dat = mtcars)
is.cvCovEst()
provides a generic method for checking if
input is of class cvCovEst
.
is.cvCovEst(x)
x |
The specific object to test. |
A logical
indicating TRUE
if x
inherits from
class cvCovEst
.
cv_dat <- cvCovEst(
  dat = mtcars,
  estimators = c(
    thresholdingEst, sampleCovEst
  ),
  estimator_params = list(
    thresholdingEst = list(gamma = seq(0.1, 0.3, 0.1))
  ),
  center = TRUE,
  scale = TRUE
)
is.cvCovEst(cv_dat)
linearShrinkEst()
computes the linear shrinkage estimate
of the covariance matrix for a given value of alpha
. The linear
shrinkage estimator is defined as the convex combination of the sample
covariance matrix and the identity matrix. The choice of alpha
determines the bias-variance tradeoff of the estimators in this class:
values near 1 are more likely to exhibit high variance but low bias, and
values near 0 are more likely to be very biased but have low variance.
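The estimator described above amounts to the convex combination below (a minimal sketch; linearShrinkEst() may center or scale the data before computing the sample covariance matrix, and linear_shrink is an illustrative name):

```r
# Linear shrinkage: weight alpha on the sample covariance matrix and
# weight 1 - alpha on the identity target.
linear_shrink <- function(S, alpha) {
  alpha * S + (1 - alpha) * diag(nrow(S))
}

S_shrunk <- linear_shrink(cov(mtcars), alpha = 0.1)
```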
linearShrinkEst(dat, alpha)
dat |
A numeric |
alpha |
A |
A matrix
corresponding to the estimate of the covariance
matrix.
linearShrinkEst(dat = mtcars, alpha = 0.1)
linearShrinkLWEst()
computes an asymptotically optimal
convex combination of the sample covariance matrix and the identity matrix.
This convex combination effectively shrinks the eigenvalues of the sample
covariance matrix towards the identity. This estimator is more accurate
than the sample covariance matrix in high-dimensional settings under fairly
loose assumptions. For more information, consider reviewing the manuscript
by Ledoit and Wolf (2004).
linearShrinkLWEst(dat)
dat |
A numeric |
A matrix
corresponding to the Ledoit-Wolf linear shrinkage
estimate of the covariance matrix.
Ledoit O, Wolf M (2004). “A well-conditioned estimator for large-dimensional covariance matrices.” Journal of Multivariate Analysis, 88(2), 365 - 411. ISSN 0047-259X, doi:10.1016/S0047-259X(03)00096-4, https://www.sciencedirect.com/science/article/pii/S0047259X03000964.
linearShrinkLWEst(dat = mtcars)
nlShrinkLWEst()
invokes the analytical estimator
presented by Ledoit and Wolf (2018) for applying a
nonlinear shrinkage function to the sample eigenvalues of the covariance
matrix. The shrinkage function relies on an application of the Hilbert
Transform to an estimate of the sample eigenvalues' limiting spectral
density. This estimated density is computed with the Epanechnikov kernel
using a global bandwidth parameter of n^(-1/3)
. The resulting
shrinkage function pulls eigenvalues towards the nearest mode of their
empirical distribution, thus creating a localized shrinkage effect rather
than a global one.
We do not recommend that this estimator be employed when the estimand is the correlation matrix. The diagonal entries of the resulting estimate are not guaranteed to be equal to one.
nlShrinkLWEst(dat)
dat |
A numeric |
A matrix
corresponding to the estimate of the covariance
matrix.
Ledoit O, Wolf M (2018). “Analytical nonlinear shrinkage of large-dimensional covariance matrices.” Technical Report 264, Department of Economics - University of Zurich. https://EconPapers.repec.org/RePEc:zur:econwp:264.
nlShrinkLWEst(dat = mtcars)
The plot
method is a generic method for plotting objects
of class "cvCovEst"
. The method is designed as a tool for diagnostic
and exploratory analysis purposes when selecting a covariance matrix
estimator using cvCovEst
.
## S3 method for class 'cvCovEst'
plot(
  x,
  dat_orig,
  estimator = NULL,
  plot_type = c("summary"),
  stat = c("min"),
  k = NULL,
  leading = TRUE,
  abs_v = TRUE,
  switch_vars = FALSE,
  min_max = FALSE,
  ...
)
x |
An object of class, |
dat_orig |
The |
estimator |
A |
plot_type |
A |
stat |
A |
k |
A |
leading |
A |
abs_v |
A |
switch_vars |
A |
min_max |
A |
... |
Additional arguments passed to the plot method. These are not explicitly used and should be ignored by the user. |
This plot method is designed to aid users in understanding the
estimation procedure carried out in cvCovEst()
. There are
currently four different values for plot_type
that can be called:
"eigen"
- Plots the eigenvalues associated with the
specified estimator
and stat
arguments in decreasing
order.
"risk"
- Plots the cross-validated risk of the specified
estimator
as a function of the hyperparameter values passed to
cvCovEst()
. This type of plot is only compatible with
estimators which take hyperparameters as arguments.
"heatmap"
- Plots a covariance heat map associated with the
specified estimator
and stat
arguments. Multiple
estimators and performance stats may be specified to produce grids of
heat maps.
"summary"
- Specifying this plot type will run all of the
above plots for the best performing estimator selected by
cvCovEst()
. These plots are then combined into a single
panel along with a table containing the best performing estimator
within each class. If the optimal estimator selected by
cvCovEst()
does not have hyperparameters, then the risk
plot is replaced with a table displaying the minimum, first quartile,
median, third quartile, and maximum of the cross-validated risk
associated with each class of estimator.
The stat
argument accepts five values. They each correspond to a
summary statistic of the cross-validated risk distribution within a class
of estimator. Possible values are:
"min"
- minimum
"Q1"
- first quartile
"median"
- median
"Q3"
- third quartile
"max"
- maximum
A plot object
cv_dat <- cvCovEst(
  dat = mtcars,
  estimators = c(
    thresholdingEst, sampleCovEst
  ),
  estimator_params = list(
    thresholdingEst = list(gamma = seq(0.1, 0.9, 0.1))
  )
)
plot(x = cv_dat, dat_orig = mtcars)
poetEst()
implements the Principal Orthogonal complEment
Thresholding (POET) estimator, a nonparametric, unobserved-factor-based
estimator of the covariance matrix (Fan et al. 2013). The
estimator is defined as the sum of the sample covariance matrix's
rank-k
approximation and its post-thresholding principal orthogonal
complement. The hard thresholding function is used here, though others
could be used instead.
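A sketch of the decomposition, assuming hard thresholding of the off-diagonal entries of the principal orthogonal complement (the implementation's treatment of the diagonal and its thresholding constant may differ; poet_sketch is an illustrative name):

```r
# POET sketch: rank-k approximation from the leading eigenvalues and
# eigenvectors of the sample covariance matrix, plus a hard-thresholded
# principal orthogonal complement.
poet_sketch <- function(S, k, lambda) {
  eig <- eigen(S, symmetric = TRUE)
  V <- eig$vectors[, seq_len(k), drop = FALSE]
  low_rank <- V %*% diag(eig$values[seq_len(k)], k) %*% t(V)
  resid <- S - low_rank
  # hard-threshold the off-diagonal entries of the complement
  resid[abs(resid) < lambda & row(resid) != col(resid)] <- 0
  low_rank + resid
}

S_poet <- poet_sketch(cov(mtcars), k = 2, lambda = 0.1)
```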
poetEst(dat, k, lambda)
dat |
A numeric |
k |
An |
lambda |
A non-negative |
A matrix
corresponding to the estimate of the covariance
matrix.
Fan J, Liao Y, Mincheva M (2013). “Large covariance estimation by thresholding principal orthogonal complements.” Journal of the Royal Statistical Society. Series B (Statistical Methodology), 75(4), 603–680. ISSN 13697412, 14679868, https://www.jstor.org/stable/24772450.
poetEst(dat = mtcars, k = 2L, lambda = 0.1)
robustPoetEst()
implements the robust version of
Principal Orthogonal complEment Thresholding (POET) estimator, a
nonparametric, unobserved-factor-based estimator of the covariance matrix
when the underlying distribution is elliptical
(Fan et al. 2018). The estimator is defined as the sum of the
sample covariance matrix's rank-k
approximation and its
post-thresholding principal orthogonal complement. The rank-k
approximation is constructed from the sample covariance matrix, its leading
eigenvalues, and its leading eigenvectors. The sample covariance matrix and
leading eigenvalues are initially estimated via an M-estimation procedure
and the marginal Kendall's tau estimator. The leading eigenvectors are
estimated using spatial Kendall's tau estimator. The hard thresholding
function is used to regularize the idiosyncratic errors' estimated
covariance matrix, though other regularization schemes could be used.
We do not recommend that this estimator be employed when the estimand is the correlation matrix. The diagonal entries of the resulting estimate are not guaranteed to be equal to one.
robustPoetEst(dat, k, lambda, var_est = c("sample", "mad", "huber"))
dat |
A numeric |
k |
An |
lambda |
A non-negative |
var_est |
A |
A matrix
corresponding to the estimate of the covariance
matrix.
Fan J, Liao Y, Mincheva M (2013).
“Large covariance estimation by thresholding principal orthogonal complements.”
Journal of the Royal Statistical Society. Series B (Statistical Methodology), 75(4), 603–680.
ISSN 13697412, 14679868, https://www.jstor.org/stable/24772450.
Fan J, Liu H, Wang W (2018).
“Large covariance estimation through elliptical factor models.”
Ann. Statist., 46(4), 1383–1414.
doi:10.1214/17-AOS1588.
robustPoetEst(dat = mtcars, k = 2L, lambda = 0.1, var_est = "sample")
sampleCovEst()
computes the sample covariance matrix.
This function is a simple wrapper around covar()
.
sampleCovEst(dat)
dat |
A numeric |
A matrix
corresponding to the estimate of the covariance
matrix.
sampleCovEst(dat = mtcars)
scadEst()
applies the SCAD thresholding function of
Fan and Li (2001) to each entry of the sample
covariance matrix. This penalized estimator constitutes a compromise
between hard and soft thresholding of the sample covariance matrix: it is
a linear interpolation between soft thresholding up to 2 * lambda
and hard thresholding after 3.7 * lambda
(Rothman et al. 2009).
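With the conventional SCAD constant a = 3.7, the thresholding operator of Fan and Li (2001) takes the piecewise form below (a hedged sketch of the rule the text describes; scad_threshold is an illustrative name, not part of the package API):

```r
# SCAD thresholding of entries z with penalty lambda and constant a = 3.7:
# soft thresholding for |z| <= 2 * lambda, a linear interpolation on
# (2 * lambda, a * lambda], and the identity beyond a * lambda.
scad_threshold <- function(z, lambda, a = 3.7) {
  ifelse(
    abs(z) <= 2 * lambda,
    sign(z) * pmax(abs(z) - lambda, 0),
    ifelse(
      abs(z) <= a * lambda,
      ((a - 1) * z - sign(z) * a * lambda) / (a - 2),
      z
    )
  )
}

S_scad <- scad_threshold(cov(mtcars), lambda = 0.2)
```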
scadEst(dat, lambda)
dat |
A numeric |
lambda |
A non-negative |
A matrix
corresponding to the estimate of the covariance
matrix.
Fan J, Li R (2001).
“Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties.”
Journal of the American Statistical Association, 96(456), 1348-1360.
doi:10.1198/016214501753382273, https://doi.org/10.1198/016214501753382273.
Rothman AJ, Levina E, Zhu J (2009).
“Generalized Thresholding of Large Covariance Matrices.”
Journal of the American Statistical Association, 104(485), 177-186.
doi:10.1198/jasa.2009.0101, https://doi.org/10.1198/jasa.2009.0101.
scadEst(dat = mtcars, lambda = 0.2)
spikedFrobeniusShrinkEst()
implements the asymptotically
optimal shrinkage estimator with respect to the Frobenius loss in a spiked
covariance matrix model. Informally, this model admits Gaussian
data-generating processes whose covariance matrix is a scalar multiple of
the identity, save for a small number of large "spikes". A thorough review of
this estimator, or more generally spiked covariance matrix estimation, is
provided in Donoho et al. (2018).
spikedFrobeniusShrinkEst(dat, p_n_ratio, num_spikes = NULL, noise = NULL)
dat |
A numeric |
p_n_ratio |
A |
num_spikes |
A |
noise |
A |
A matrix
corresponding to the covariance matrix estimate.
Donoho D, Gavish M, Johnstone I (2018). “Optimal shrinkage of eigenvalues in the spiked covariance model.” The Annals of Statistics, 46(4), 1742 – 1778.
spikedFrobeniusShrinkEst(dat = mtcars, p_n_ratio = 0.1, num_spikes = 2L)
spikedOperatorShrinkEst()
implements the asymptotically
optimal shrinkage estimator with respect to the operator loss in a spiked
covariance matrix model. Informally, this model admits Gaussian
data-generating processes whose covariance matrix is a scalar multiple of
the identity, save for a small number of large "spikes". A thorough review of
this estimator, or more generally spiked covariance matrix estimation, is
provided in Donoho et al. (2018).
spikedOperatorShrinkEst(dat, p_n_ratio, num_spikes = NULL, noise = NULL)
dat |
A numeric |
p_n_ratio |
A |
num_spikes |
A |
noise |
A |
A matrix
corresponding to the covariance matrix estimate.
Donoho D, Gavish M, Johnstone I (2018). “Optimal shrinkage of eigenvalues in the spiked covariance model.” The Annals of Statistics, 46(4), 1742 – 1778.
spikedOperatorShrinkEst(dat = mtcars, p_n_ratio = 0.1, num_spikes = 2L)
spikedSteinShrinkEst()
implements the asymptotically
optimal shrinkage estimator with respect to the Stein loss in a spiked
covariance matrix model. Informally, this model admits Gaussian
data-generating processes whose covariance matrix is a scalar multiple of
the identity, save for a small number of large "spikes". A thorough review of
this estimator, or more generally spiked covariance matrix estimation, is
provided in Donoho et al. (2018).
spikedSteinShrinkEst(dat, p_n_ratio, num_spikes = NULL, noise = NULL)
dat |
A numeric |
p_n_ratio |
A |
num_spikes |
A |
noise |
A |
A matrix
corresponding to the covariance matrix estimate.
Donoho D, Gavish M, Johnstone I (2018). “Optimal shrinkage of eigenvalues in the spiked covariance model.” The Annals of Statistics, 46(4), 1742 – 1778.
spikedSteinShrinkEst(dat = mtcars, p_n_ratio = 0.1, num_spikes = 2L)
summary()
provides summary statistics regarding
the performance of cvCovEst()
and can be used for diagnostic
plotting.
## S3 method for class 'cvCovEst'
summary(
  object,
  dat_orig,
  summ_fun = c("cvRiskByClass", "bestInClass", "worstInClass", "hyperRisk"),
  ...
)
object |
A named |
dat_orig |
The |
summ_fun |
A |
... |
Additional arguments passed to |
summary()
accepts four different choices for the
summ_fun
argument. The choices are:
"cvRiskByClass"
- Returns the minimum, first quartile,
median, third quartile, and maximum of the cross-validated risk
associated with each class of estimator passed to
cvCovEst()
.
"bestInClass"
- Returns the specific hyperparameters, if
applicable, of the best performing estimator within each class along
with other metrics.
"worstInClass"
- Returns the specific hyperparameters, if
applicable, of the worst performing estimator within each class along
with other metrics.
"hyperRisk"
- For estimators that take hyperparameters as
arguments, this function returns the hyperparameters associated with
the minimum, first quartile, median, third quartile, and maximum of the
cross-validated risk within each class of estimator. Each class has
its own tibble
, which are returned as a
list
.
A named list
where each element corresponds to the output of
the requested summaries.
cv_dat <- cvCovEst(
  dat = mtcars,
  estimators = c(
    linearShrinkEst, thresholdingEst, sampleCovEst
  ),
  estimator_params = list(
    linearShrinkEst = list(alpha = seq(0.1, 0.9, 0.1)),
    thresholdingEst = list(gamma = seq(0.1, 0.9, 0.1))
  ),
  center = TRUE,
  scale = TRUE
)
summary(cv_dat, mtcars)
taperingEst()
estimates the covariance matrix of a
data.frame
-like object with ordered variables by gradually shrinking
the bands of the sample covariance matrix towards zero. The estimator is
defined as the Hadamard product of the sample covariance matrix and a
weight matrix. The amount of shrinkage is dictated by the weight matrix
and is specified by a hyperparameter k
. This estimator is attributed
to Cai et al. (2010).
The weight matrix is a Toeplitz matrix with entries defined as follows. Let
i and j index the rows and columns of the weight matrix, respectively. If
abs(i - j) <= k / 2
, then entry {i, j} in the weight matrix is
equal to 1. If k / 2 < abs(i - j) < k
, then entry {i, j} is equal
to 2 - 2 * abs(i - j) / k
. Otherwise, entry {i, j} is equal to 0.
taperingEst(dat, k)
dat |
A numeric |
k |
A non-negative, even |
A matrix
corresponding to the estimate of the covariance
matrix.
Cai TT, Zhang C, Zhou HH (2010). “Optimal rates of convergence for covariance matrix estimation.” Ann. Statist., 38(4), 2118–2144. doi:10.1214/09-AOS752.
taperingEst(dat = mtcars, k = 0.1)
thresholdingEst()
computes the hard thresholding estimate
of the covariance matrix for a given value of gamma
. The threshold
estimator of the covariance matrix applies a hard thresholding operator to
each element of the sample covariance matrix. For more information on this
estimator, review Bickel and Levina (2008).
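The hard thresholding operator keeps an entry only when its absolute value exceeds gamma; a one-line sketch (hard_threshold is an illustrative name):

```r
# Hard thresholding: zero out entries with absolute value at most gamma.
hard_threshold <- function(S, gamma) {
  S * (abs(S) > gamma)
}

S_hard <- hard_threshold(cov(mtcars), gamma = 0.2)
```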
thresholdingEst(dat, gamma)
dat |
A numeric |
gamma |
A non-negative |
A matrix
corresponding to the estimate of the covariance
matrix.
Bickel PJ, Levina E (2008). “Covariance regularization by thresholding.” Annals of Statistics, 36(6), 2577–2604. doi:10.1214/08-AOS600.
thresholdingEst(dat = mtcars, gamma = 0.2)