Neyman-Pearson classification: parametrics and sample size requirement


In contrast to the classical binary classification paradigm, which minimizes the overall classification error, the Neyman-Pearson (NP) paradigm seeks classifiers that minimize the type II error while keeping the type I error below a user-specified level, thereby addressing asymmetric type I/II error priorities. In this work, we present NP-sLDA, a new binary NP classifier that explicitly accounts for feature dependency in high-dimensional NP settings. The method adapts the popular sparse linear discriminant analysis (sLDA; Mai et al., 2012) to the NP paradigm, and borrows its threshold determination method from the umbrella algorithm of Tong et al. (2017). On the theoretical front, we formulate a new conditional margin assumption and a new conditional detection condition to accommodate unbounded feature support, and we show that NP-sLDA satisfies the NP oracle inequalities, which are the natural NP paradigm counterparts of the oracle inequalities in classical classification. Numerical results show that NP-sLDA is a valuable addition to existing NP classifiers. We also suggest a general data-adaptive sample splitting scheme that, in many scenarios, improves classification performance over the default half-half class 0 split used in Tong et al. (2017); this new splitting scheme has been incorporated into a new version of the R package nproc.
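To make the threshold determination step concrete, the following is a minimal Python sketch of an order-statistic rule of the kind used in the umbrella algorithm, as we understand it. It assumes a scoring function has already been fitted, that higher scores favor class 1, and that the scores passed in come from a left-out class 0 sample. The function names (`violation_bound`, `np_threshold_rank`, `np_threshold`, `min_class0_size`) and the parameter `delta` (the tolerated probability that the type I error exceeds `alpha`) are illustrative assumptions, not part of nproc.

```python
import math


def violation_bound(n, k, alpha):
    """Bound on P(type I error > alpha) when the threshold is the k-th
    smallest (1-indexed) of n left-out class 0 scores."""
    return sum(
        math.comb(n, j) * (1 - alpha) ** j * alpha ** (n - j)
        for j in range(k, n + 1)
    )


def np_threshold_rank(n, alpha, delta):
    """Smallest rank k whose violation bound is at most delta,
    or None if no rank qualifies (class 0 sample too small)."""
    for k in range(1, n + 1):
        if violation_bound(n, k, alpha) <= delta:
            return k
    return None


def np_threshold(class0_scores, alpha, delta):
    """Score threshold: predict class 1 when a score exceeds this value."""
    k = np_threshold_rank(len(class0_scores), alpha, delta)
    if k is None:
        raise ValueError("not enough left-out class 0 observations")
    return sorted(class0_scores)[k - 1]


def min_class0_size(alpha, delta):
    """Smallest n for which even the largest order statistic qualifies,
    i.e. the smallest n with (1 - alpha) ** n <= delta."""
    n = 1
    while (1 - alpha) ** n > delta:
        n += 1
    return n
```

Under these assumptions, `min_class0_size(0.05, 0.05)` evaluates to 59, which illustrates the kind of sample size requirement named in the title: with fewer left-out class 0 observations, no order statistic can control the type I error at that level with the requested probability.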

Journal of Machine Learning Research