The restricted consistency property of leave-$n_v$-out cross-validation for high-dimensional variable selection

Abstract

Cross-validation (CV) methods are popular for selecting the tuning parameter in high-dimensional variable selection problems. We show that misalignment in CV is one possible reason for its over-selection behavior. To fix this issue, we propose a version of leave-$n_v$-out cross-validation, CV($n_v$), for selecting the optimal model from a restricted candidate model set in high-dimensional generalized linear models. By using the same candidate model sequence in every CV split and a construction sample size $n_c$ of a proper order, CV($n_v$) avoids potential hurdles in establishing theoretical properties. CV($n_v$) is shown to enjoy the restricted model selection consistency property under mild conditions. Extensive simulations and real data analysis support the theoretical results and demonstrate the performance of CV($n_v$) in terms of both model selection and prediction.
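
As a rough illustration of the leave-$n_v$-out idea, the following sketch scores one fixed candidate model sequence on repeated random splits into $n_c$ construction points and $n_v = n - n_c$ validation points. It is a minimal sketch under stated assumptions, not the procedure of the paper: it uses a Gaussian linear model fitted by least squares as a special case of a generalized linear model, Monte Carlo splits rather than all possible splits, a hand-specified nested candidate sequence, and $n_c = \lfloor n^{3/4} \rfloor$ chosen purely for illustration. The function name `cv_nv` and all of its parameters are hypothetical.

```python
import numpy as np

def cv_nv(X, y, candidate_models, n_c, n_splits=50, seed=0):
    """Monte Carlo leave-n_v-out CV over a fixed candidate model sequence.

    candidate_models: list of non-empty column-index lists; the SAME list
    is scored in every split (it is not re-selected within a split).
    n_c: construction (fitting) sample size; n_v = n - n_c is held out.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    losses = np.zeros(len(candidate_models))
    for _ in range(n_splits):
        perm = rng.permutation(n)
        construct, validate = perm[:n_c], perm[n_c:]
        for k, cols in enumerate(candidate_models):
            # Fit by least squares on the construction sample only.
            beta, *_ = np.linalg.lstsq(
                X[np.ix_(construct, cols)], y[construct], rcond=None
            )
            # Score by squared prediction error on the validation sample.
            pred = X[np.ix_(validate, cols)] @ beta
            losses[k] += np.mean((y[validate] - pred) ** 2)
    losses /= n_splits
    return int(np.argmin(losses)), losses

# Synthetic example (assumed data): only the first 3 of 50 predictors matter.
rng = np.random.default_rng(1)
n, p = 200, 50
X = rng.standard_normal((n, p))
y = X[:, :3] @ np.array([2.0, 2.0, 2.0]) + rng.standard_normal(n)

# Assumed nested candidate sequence {0}, {0,1}, ..., {0,...,7}.
candidate_models = [list(range(k)) for k in range(1, 9)]
n_c = int(n ** 0.75)  # illustrative choice only
best, losses = cv_nv(X, y, candidate_models, n_c)
print("selected model:", candidate_models[best])
```

Keeping `candidate_models` identical across splits means every split compares the same set of models, so the averaged validation losses are directly comparable from one model to the next.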

Publication
Statistica Sinica