High-Dimensional

RaSE: Random Subspace Ensemble Classification

We propose a new model-free ensemble classification framework, Random Subspace Ensemble (RaSE), for sparse classification. In the RaSE algorithm, we aggregate many weak learners, where each weak learner is a base classifier trained in a subspace …
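The subspace-aggregation idea described above can be sketched as follows. This is a minimal illustration only: each weak learner here is a nearest-centroid classifier on a randomly drawn feature subspace, aggregated by majority vote, whereas the actual RaSE algorithm additionally selects the best subspace among random candidates via an information criterion.

```python
import numpy as np

def rase_predict(X_train, y_train, X_test, B=50, d=2, rng=None):
    """Minimal random-subspace-ensemble sketch (majority vote).

    Each weak learner is a nearest-centroid classifier trained on a
    random d-dimensional feature subspace.  (RaSE proper also screens
    candidate subspaces with a criterion; that step is omitted here.)
    """
    rng = np.random.default_rng(rng)
    p = X_train.shape[1]
    classes = np.unique(y_train)
    votes = np.zeros((X_test.shape[0], classes.size))
    for _ in range(B):
        S = rng.choice(p, size=d, replace=False)       # random subspace
        centroids = np.array([X_train[y_train == c][:, S].mean(axis=0)
                              for c in classes])
        # squared distance of each test point to each class centroid
        dist = ((X_test[:, S][:, None, :] - centroids[None]) ** 2).sum(-1)
        votes[np.arange(X_test.shape[0]), dist.argmin(axis=1)] += 1
    return classes[votes.argmax(axis=1)]
```

Weak learners whose subspace misses all signal features vote roughly at random, so the aggregate is driven by the informative subspaces.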

A Projection Based Conditional Dependence Measure with Applications to High-dimensional Undirected Graphical Models

Measuring conditional dependence is an important topic in econometrics, with broad applications including graphical models. Under a factor model setting, we propose a new conditional dependence measure based on projection. The corresponding …
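A generic projection-based recipe for conditional association (a crude sketch, not the paper's measure): project both variables onto the conditioning set, then measure dependence between the residuals. Here the residual dependence is measured by plain correlation for simplicity; the paper's statistic is a more refined projection-based quantity.

```python
import numpy as np

def residual_association(x, y, Z):
    """Regress x and y on Z (e.g., estimated factors) via least
    squares and return the correlation of the residuals -- a simple
    proxy for conditional association given Z."""
    Z1 = np.column_stack([np.ones(len(x)), Z])   # add intercept column
    rx = x - Z1 @ np.linalg.lstsq(Z1, x, rcond=None)[0]
    ry = y - Z1 @ np.linalg.lstsq(Z1, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]
```

If x and y are conditionally independent given Z, the residual correlation is near zero; shared variation beyond Z shows up as residual dependence.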

Neyman-Pearson classification: parametrics and sample size requirement

In contrast to the classical binary classification paradigm that minimizes the overall classification error, the Neyman-Pearson (NP) paradigm seeks classifiers with a minimal type II error while having a constrained type I error under a …
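The type I error constraint can be illustrated with a simple order-statistic cutoff on held-out class-0 scores (a sketch of the idea only; the NP umbrella algorithm chooses a more conservative order statistic so the constraint holds with high probability rather than merely on average):

```python
import numpy as np

def np_threshold(scores_class0, alpha=0.05):
    """Pick a cutoff t so that the empirical type I error on held-out
    class-0 scores is at most alpha, classifying "1" when score > t.
    Naive empirical-quantile version; the NP umbrella algorithm uses
    a higher order statistic for a high-probability guarantee."""
    s = np.sort(scores_class0)
    k = int(np.ceil((1 - alpha) * len(s))) - 1   # (1 - alpha) empirical quantile
    return s[min(k, len(s) - 1)]
```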

Likelihood adaptively modified penalties

A new family of penalty functions, i.e., penalties adaptive to the likelihood, is introduced for model selection in general regression models. It arises naturally by assuming certain types of prior distributions on the regression parameters. To study the …
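The prior-to-penalty correspondence alluded to above is the standard MAP connection (a generic illustration, not the paper's specific penalty family):

```latex
\hat\beta \;=\; \arg\min_{\beta}\Big\{-\ell_n(\beta) \;+\; \sum_{j=1}^{p} p_\lambda(|\beta_j|)\Big\},
\qquad
p_\lambda(|\beta_j|) \;=\; -\log \pi_\lambda(\beta_j),
```

so that, for example, a Laplace prior $\pi_\lambda(\beta_j)\propto\exp(-\lambda|\beta_j|)$ recovers the lasso penalty $\lambda|\beta_j|$.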

On the sparsity of Mallows model averaging estimator

We show that the Mallows model averaging estimator proposed by Hansen (2007) can be written as a least squares estimator with a weighted $L_1$ penalty and additional constraints. By exploiting this representation, we demonstrate that the weight vector …
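For reference, Hansen's (2007) Mallows criterion has a penalty term that is linear in the weights, which on the weight simplex reads as a weighted $L_1$ penalty (notation is a standard rendering, up to minor differences from the paper's):

```latex
C_n(w) \;=\; \Big\|\,y-\sum_{m=1}^{M} w_m \hat\mu_m\Big\|^2 \;+\; 2\sigma^2 \sum_{m=1}^{M} w_m k_m,
\qquad
w \in \Delta = \Big\{w : w_m \ge 0,\; \sum_{m=1}^{M} w_m = 1\Big\},
```

where $\hat\mu_m$ is the fitted value from the $m$-th candidate model and $k_m$ its number of effective parameters.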

Regularization after retention in ultrahigh dimensional linear regression models

In the ultrahigh-dimensional setting, independence screening has proved, both theoretically and empirically, to be a useful variable selection framework with low computational cost. In this work, we propose a two-step framework that uses marginal information …
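The generic retain-then-regularize skeleton can be sketched as below: step 1 retains the features with the largest absolute marginal correlation with the response; step 2 runs a plain lasso (coordinate descent) on the retained set. This is only the skeleton, not the paper's exact procedure, which treats the retained set in a more refined way in the second-step penalty.

```python
import numpy as np

def screen_then_lasso(X, y, d=None, lam=0.1, n_iter=200):
    """Two-step sketch: marginal-correlation screening, then lasso by
    cyclic coordinate descent on the retained features."""
    n, p = X.shape
    d = d or max(1, int(n / np.log(n)))          # common screening size
    Xc = X - X.mean(0)
    yc = y - y.mean()
    cors = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0)
                                * np.linalg.norm(yc))
    keep = np.argsort(cors)[-d:]                 # step 1: retention
    Xs = Xc[:, keep]
    beta = np.zeros(d)
    for _ in range(n_iter):                      # step 2: lasso CD
        for j in range(d):
            r = yc - Xs @ beta + Xs[:, j] * beta[j]   # partial residual
            z = Xs[:, j] @ r / n
            beta[j] = (np.sign(z) * max(abs(z) - lam, 0.0)
                       / (Xs[:, j] @ Xs[:, j] / n))   # soft threshold
    out = np.zeros(p)
    out[keep] = beta
    return out
```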

The restricted consistency property of leave-$n_v$-out cross-validation for high-dimensional variable selection

Cross-validation (CV) methods are popular for selecting the tuning parameter in high-dimensional variable selection problems. We show that the misalignment of CV is one possible reason for its over-selection behavior. To fix this issue, we …
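The leave-$n_v$-out splitting scheme can be sketched as follows. For simplicity the fitted procedure here is ridge regression rather than the selection procedure studied in the paper; the point of the sketch is the repeated random splits with a validation set of size $n_v$, which the theory requires to be a nonvanishing fraction of $n$ for consistent selection.

```python
import numpy as np

def leave_nv_out_cv(X, y, lambdas, n_v, n_splits=20, rng=None):
    """Leave-n_v-out CV sketch: repeatedly hold out n_v observations,
    fit on the rest (ridge, for simplicity), and return the tuning
    value with the smallest average validation error."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    err = np.zeros(len(lambdas))
    for _ in range(n_splits):
        idx = rng.permutation(n)
        val, tr = idx[:n_v], idx[n_v:]
        for i, lam in enumerate(lambdas):
            beta = np.linalg.solve(X[tr].T @ X[tr] + lam * np.eye(p),
                                   X[tr].T @ y[tr])
            err[i] += np.mean((y[val] - X[val] @ beta) ** 2)
    return lambdas[int(np.argmin(err))]
```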

SIS R package

Sure Independence Screening

Large-Scale Model Selection with Misspecification

Model selection is crucial to high-dimensional learning and inference for contemporary big data applications, as it pinpoints the best set of covariates among a sequence of candidate interpretable models. Most existing work implicitly assumes that the …
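Scoring a sequence of candidate models with a classical information criterion, the baseline such work starts from, looks like the sketch below (plain BIC over a nested sequence; under misspecification the paper argues such criteria need modification):

```python
import numpy as np

def bic_select(X, y, max_size=None):
    """Sketch: score a nested sequence of candidate models (features
    added in a fixed order) by BIC and return the best index set."""
    n, p = X.shape
    max_size = max_size or p
    best, best_bic = [], np.inf
    for k in range(1, max_size + 1):
        S = list(range(k))               # candidate model: first k covariates
        beta, *_ = np.linalg.lstsq(X[:, S], y, rcond=None)
        rss = np.sum((y - X[:, S] @ beta) ** 2)
        bic = n * np.log(rss / n) + k * np.log(n)
        if bic < best_bic:
            best, best_bic = S, bic
    return best
```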

Model Selection for High Dimensional Quadratic Regression via Regularization

Quadratic regression (QR) models naturally extend linear models by considering interaction effects between the covariates. To conduct model selection in QR, it is important to maintain the hierarchical model structure between main effects and …
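The hierarchical (strong-heredity) structure mentioned above can be illustrated with a hard filter on the expanded design: quadratic and interaction terms $x_j x_k$ are admitted only when both parent main effects are in the selected main-effect support. (The paper enforces hierarchy through regularization rather than this hard filter; the sketch only shows the constraint itself.)

```python
import numpy as np
from itertools import combinations_with_replacement

def quadratic_design(X, main_support):
    """Build a quadratic design obeying strong heredity: only products
    x_j * x_k with both j and k in main_support are included.
    Returns the expanded design matrix and the term labels."""
    cols = [X[:, j] for j in main_support]
    labels = [(j,) for j in main_support]
    for j, k in combinations_with_replacement(main_support, 2):
        cols.append(X[:, j] * X[:, k])
        labels.append((j, k))
    return np.column_stack(cols), labels
```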