In contemporary data analysis, it is increasingly common to work with non-stationary complex data sets. These data sets typically extend beyond the classical low-dimensional Euclidean space, making it challenging to detect shifts in their distribution without relying on strong structural assumptions. This paper proposes a novel offline change-point detection method that leverages classifiers developed in the statistics and machine learning community. With suitable data splitting, the test statistic is constructed through sequential computation of the Area Under the Curve (AUC) of a classifier, which is trained on data segments on both ends of the sequence. It is shown that the resulting AUC process attains its maxima at the true change-point location, which facilitates the change-point estimation. The proposed method is characterized by its complete nonparametric nature, high versatility, considerable flexibility, and absence of stringent assumptions on the underlying data or any distributional shifts. Theoretically, we derive the limiting pivotal distribution of the proposed test statistic under null, as well as the asymptotic behaviors under both local and fixed alternatives. The localization rate of the change-point estimator is also provided. Extensive simulation studies and the analysis of two real-world data sets illustrate the superior performance of our approach compared to existing model-free change-point detection methods.
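
The AUC scan described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the nearest-centroid score standing in for the classifier, the function names, and the `trim` and `min_side` parameters are all assumptions made for illustration, and the paper's exact data-splitting scheme and test statistic are not reproduced. The sketch trains on the two ends of a scalar sequence, scores the middle observations, and reports the split at which the AUC between "before" and "after" scores is largest.

```python
from typing import List, Sequence, Tuple


def auc(neg: Sequence[float], pos: Sequence[float]) -> float:
    """Mann-Whitney estimate of P(pos score > neg score); ties count one half."""
    wins = 0.0
    for a in neg:
        for b in pos:
            if b > a:
                wins += 1.0
            elif b == a:
                wins += 0.5
    return wins / (len(neg) * len(pos))


def auc_change_point(x: Sequence[float], trim: float = 0.2,
                     min_side: int = 10) -> Tuple[int, float]:
    """Return (k, AUC) where x[:k] and x[k:] are the estimated regimes.

    Assumptions (illustrative only): scalar observations, a single change
    located in the middle part of the sequence, and a toy classifier.
    """
    x = list(x)
    n = len(x)
    m = int(trim * n)                      # size of each training end
    middle = x[m:n - m]
    if m < 1 or len(middle) < 2 * min_side:
        raise ValueError("sequence too short for the chosen trim/min_side")

    # Toy classifier: head segment is class 0, tail segment is class 1.
    mu0 = sum(x[:m]) / m
    mu1 = sum(x[n - m:]) / m
    # Higher score means "looks more like the tail (post-change) segment".
    scores: List[float] = [(v - mu0) ** 2 - (v - mu1) ** 2 for v in middle]

    best_k, best_auc = -1, -1.0
    # Scan candidate splits, keeping at least min_side points on each side.
    for j in range(min_side, len(scores) - min_side + 1):
        a = auc(scores[:j], scores[j:])
        if a > best_auc:
            best_k, best_auc = m + j, a
    return best_k, best_auc
```

In practice the centroid score would be replaced by any classifier that outputs a score or probability, which is where the flexibility emphasized in the abstract comes from. An AUC close to 0.5 at every split suggests no detectable change, although this sketch provides no calibrated test for that.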

3910 3104
KK 1336
- Ph.D. in Statistics (The Pennsylvania State University), 2021;
- Bachelor in Statistics (Renmin University of China), 2016.
Zhanrui Cai is an assistant professor in the area of Innovation and Information Management at the HKU Business School. Previously, he was an assistant professor in the Department of Statistics, Iowa State University.
- Statistical Inference
- Machine Learning
- Artificial Intelligence
- Privacy Protection
- Xia, X., Zhang, L., Cai, Z. (2025) “Differentially private sliced inverse regression: minimax optimality and algorithm”, Journal of the American Statistical Association, forthcoming.
- Kanrar, R., Jiang, F., Cai, Z. (2025) “Model-free change-point detection using AUC of a classifier”, Journal of Machine Learning Research, 26(190), 1-50.
- Cai, Z., Zhang, Y., Guo, X., Zhu, L., Li, R. (2025) “A nonparametric independence test via penalized mutual information”, Science China Mathematics, forthcoming.
- Gao, Y., Zhang, Z., Cai, Z., Zhu, X., Zou, T., Wang, H. (2025) “Penalized sparse covariance regression with high dimensional covariates”, Journal of Business & Economic Statistics, 43(3), 615-626.
- Awan, J., Cai, Z. (2025) “One step to efficient synthetic data”, Statistica Sinica, 35(I), 539-561.
- Cai, Z., Lei, J., Roeder, K. (2024) “Asymptotic distribution-free independence test for high-dimension data”, Journal of the American Statistical Association, 119(547), 1794-1804.
- Cai, Z., Li, C., Wen, J., Yang, S. (2024) “Asset splitting algorithm for ultrahigh dimensional portfolio selection and its theoretical property”, Journal of Econometrics, 239(2), 105291.
- Xia, X., Cai, Z. (2023) “Adaptive false discovery rate control with privacy guarantee”, Journal of Machine Learning Research, 24(252), 1-35.
- Tong, Z., Cai, Z., Yang, S., Li, R. (2023) “Model-free conditional feature screening with FDR control”, Journal of the American Statistical Association, 118(544), 2575-2587.
- Du, J., Cai, Z., Roeder, K. (2022) “Robust probabilistic modeling for single-cell multimodal mosaic integration and imputation via scVAEIT”, Proceedings of the National Academy of Sciences, 119(49), e2214414119.
- Cai, Z., Lei, J., Roeder, K. (2022) “Model-free prediction test with application to genomics data”, Proceedings of the National Academy of Sciences, 119(34), e2205518119.
- Cai, Z., Li, R., Zhang, Y. (2022) “A distribution free conditional independence test with applications to causal discovery”, Journal of Machine Learning Research, 23(85), 1-41.
- Cai, Z., Xi, D., Zhu, X., Li, R. (2022) “Causal discoveries for high dimensional mixed data”, Statistics in Medicine, 41(24), 4924-4940.
- Zhu, X., Cai, Z., Ma, Y. (2022) “Network functional varying coefficient model”, Journal of the American Statistical Association, 117(540), 2074-2085.
- Cai, Z., Li, R., Zhu, L. (2020) “Online sufficient dimension reduction through sliced inverse regression”, Journal of Machine Learning Research, 21(10), 1-25.
Test of independence is of fundamental importance in modern data analysis, with broad applications in variable selection, graphical models, and causal inference. When the data is high dimensional and the potential dependence signal is sparse, independence testing becomes very challenging without distributional or structural assumptions. In this paper, we propose a general framework for independence testing by first fitting a classifier that distinguishes the joint and product distributions, and then testing the significance of the fitted classifier. This framework allows us to borrow the strength of the most advanced classification algorithms developed from the modern machine learning community, making it applicable to high dimensional, complex data. By combining a sample split and a fixed permutation, our test statistic has a universal, fixed Gaussian null distribution that is independent of the underlying data distribution. Extensive simulations demonstrate the advantages of the newly proposed test compared with existing methods. We further apply the new test to a single cell data set to test the independence between two types of single cell sequencing measurements, whose high dimensionality and sparsity make existing methods hard to apply.
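
The classify-then-test idea in this abstract can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the paper's procedure: the scalar feature map, the small gradient-descent logistic regression, and all function names are hypothetical, and the p-value is calibrated by Monte Carlo permutation rather than by the fixed Gaussian null distribution derived in the paper. The sketch keeps the two ingredients named above: a sample split, and a classifier trained to tell observed pairs from pairs whose second coordinate has been shifted by a fixed permutation.

```python
import math
import random
from typing import List, Sequence


def features(x: float, y: float) -> List[float]:
    # Hypothetical feature map; the product term lets a linear model see correlation.
    return [1.0, x, y, x * y]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows: List[List[float]], labels: List[float],
                 steps: int = 300, lr: float = 0.5) -> List[float]:
    """Plain full-batch gradient descent on the logistic loss."""
    w = [0.0] * len(rows[0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for r, t in zip(rows, labels):
            err = sigmoid(sum(wi * ri for wi, ri in zip(w, r))) - t
            for j, rj in enumerate(r):
                grad[j] += err * rj
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w


def classifier_independence_test(x: Sequence[float], y: Sequence[float],
                                 n_perm: int = 199, seed: int = 0) -> float:
    """Return a permutation p-value for H0: x and y are independent."""
    x, y = list(x), list(y)
    half = len(x) // 2
    x_tr, y_tr, x_te, y_te = x[:half], y[:half], x[half:], y[half:]

    # Training half: observed pairs (label 1) vs. cyclically shifted pairs (label 0).
    y_shift = y_tr[1:] + y_tr[:1]
    rows = [features(a, b) for a, b in zip(x_tr, y_tr)]
    rows += [features(a, b) for a, b in zip(x_tr, y_shift)]
    w = fit_logistic(rows, [1.0] * half + [0.0] * half)

    def mean_score(ys: Sequence[float]) -> float:
        total = 0.0
        for a, b in zip(x_te, ys):
            total += sigmoid(sum(wi * fi for wi, fi in zip(w, features(a, b))))
        return total / len(x_te)

    # Test half: compare the score of the observed pairing with random re-pairings.
    observed = mean_score(y_te)
    rng = random.Random(seed)
    shuffled = list(y_te)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_score(shuffled) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)
```

Because the classifier is fitted only on the first half, re-pairing the held-out half yields a valid p-value under independence for i.i.d. data, at the cost of `n_perm` extra evaluations. The abstract's point is that a suitably constructed statistic avoids this resampling step altogether.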
Differentially private multiple testing procedures can protect the information of individuals used in hypothesis tests while guaranteeing a small fraction of false discoveries. In this paper, we propose a differentially private adaptive FDR control method that can control the classic FDR metric exactly at a user-specified level α with a privacy guarantee, which is a non-trivial improvement compared to the differentially private Benjamini-Hochberg method proposed in Dwork et al. (2021). Our analysis is based on two key insights: 1) a novel p-value transformation that preserves both privacy and the mirror conservative property, and 2) a mirror peeling algorithm that allows the construction of the filtration and application of the optimal stopping technique. Numerical studies demonstrate that the proposed DP-AdaPT performs better compared to the existing differentially private FDR control methods. Compared to the non-private AdaPT, it incurs a small accuracy loss but significantly reduces the computation cost.
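
For orientation, the classic non-private Benjamini-Hochberg step-up rule, the baseline that the differentially private variants mentioned above build on, can be written as a short Python sketch. This is only the textbook procedure: it includes no privacy mechanism and is not DP-AdaPT, and the function name is chosen here for illustration.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Return the indices of hypotheses rejected by the BH step-up rule."""
    m = len(p_values)
    # Indices ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        # Track the largest rank whose p-value is below its BH threshold.
        if p_values[idx] <= alpha * rank / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    return sorted(order[:k])
```

A private version has to add noise somewhere in this pipeline, which is why controlling the FDR exactly at level α under a privacy guarantee, as the abstract claims for DP-AdaPT, is not automatic.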




