Zhanrui CAI
Prof. Zhanrui CAI
Innovation and Information Management
Assistant Professor

3910 3104

KK 1336

Academic & Professional Qualification
  • Ph.D. in Statistics (The Pennsylvania State University), 2021;
  • Bachelor in Statistics (Renmin University of China), 2016.
Biography

Zhanrui Cai is an assistant professor in the area of Innovation and Information Management at the HKU Business School. Previously, he was an assistant professor in the Department of Statistics, Iowa State University.

Research Interest
  • Statistical Inference
  • Machine Learning
  • Artificial Intelligence
  • Privacy Protection
Selected Publications
  • Gao, Y., Zhang, Z., Cai, Z., Zhu, X., Zou, T. and Wang, H. (2024) “Penalized Sparse Covariance Regression with High Dimensional Covariates”, Journal of Business & Economic Statistics, forthcoming.
  • Awan, J., and Cai, Z. (2024) “One Step to Efficient Synthetic Data”, Statistica Sinica, forthcoming.
  • Xia, X., and Cai, Z. (2023) “Adaptive False Discovery Rate Control with Privacy Guarantee”, Journal of Machine Learning Research, 24(252): 1-35.
  • Cai, Z., Lei, J., Roeder, K. (2023) “Asymptotic distribution-free independence test for high dimension data”, Journal of the American Statistical Association.
  • Cai, Z., Lei, J., Roeder, K. (2022) “Model-free prediction test with application to genomics data”, Proceedings of the National Academy of Sciences.
  • Du, J., Cai, Z., Roeder, K. (2022) “Robust probabilistic modeling for single-cell multimodal mosaic integration and imputation via scVAEIT”, Proceedings of the National Academy of Sciences.
  • Cai, Z., Li, C., Wen, J., Yang, S. (2022) “Asset splitting algorithm for ultrahigh dimensional portfolio selection and its theoretical property”, Journal of Econometrics.
  • Cai, Z., Zhang, Y., Li, R. (2022) “A distribution-free conditional independence test with application to causal discovery”, Journal of Machine Learning Research.
  • Cai, Z., Xi, D., Zhu, X., Li, R. (2022) “Causal discoveries for high dimensional mixed data”, Statistics in Medicine.
  • Tong, Z., Cai, Z., Yang, S., Li, R. (2022) “Model-free conditional feature screening with FDR control”, Journal of the American Statistical Association.
  • Zhu, X., Cai, Z., Ma, Y. (2021) “Network functional autoregression model”, Journal of the American Statistical Association.
  • Cai, Z., Li, R., Zhu, L. (2020) “Online sufficient dimension reduction through sliced inverse regression”, Journal of Machine Learning Research.
Recent Publications
Model-free Change-Point Detection Using AUC of a Classifier

In contemporary data analysis, it is increasingly common to work with non-stationary complex data sets. These data sets typically extend beyond the classical low-dimensional Euclidean space, making it challenging to detect shifts in their distribution without relying on strong structural assumptions. This paper proposes a novel offline change-point detection method that leverages classifiers developed in the statistics and machine learning community. With suitable data splitting, the test statistic is constructed through sequential computation of the Area Under the Curve (AUC) of a classifier, which is trained on data segments on both ends of the sequence. It is shown that the resulting AUC process attains its maxima at the true change-point location, which facilitates the change-point estimation. The proposed method is characterized by its completely nonparametric nature, high versatility, considerable flexibility, and absence of stringent assumptions on the underlying data or any distributional shifts. Theoretically, we derive the limiting pivotal distribution of the proposed test statistic under the null, as well as the asymptotic behaviors under both local and fixed alternatives. The localization rate of the change-point estimator is also provided. Extensive simulation studies and the analysis of two real-world data sets illustrate the superior performance of our approach compared to existing model-free change-point detection methods.
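
The estimation idea in this abstract can be illustrated with a minimal Python sketch. It is not the paper's procedure: the difference-of-means scorer, the function names, and the parameters `m` (size of each training end) and `trim` (minimum points on each side of a candidate split) are all assumptions made for illustration, and the sketch covers only change-point estimation, not the test statistic or its limiting distribution. A scorer is trained on the two ends of the sequence, the held-out middle points are scored, and the split that maximizes the AUC between "before" and "after" scores is reported.

```python
import random

def auc(neg_scores, pos_scores):
    """Mann-Whitney estimate of P(pos > neg); ties count as one half."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def mean_vector(rows):
    dim = len(rows[0])
    return [sum(r[j] for r in rows) / len(rows) for j in range(dim)]

def estimate_change_point(data, m, trim):
    """Return (estimated index of the first post-change point, peak AUC)."""
    # Train on the two ends only; the middle segment is held out.
    mu_left = mean_vector(data[:m])
    mu_right = mean_vector(data[-m:])
    # Hypothetical stand-in classifier: project onto the difference of means.
    w = [b - a for a, b in zip(mu_left, mu_right)]
    middle = data[m:len(data) - m]
    scores = [sum(wj * xj for wj, xj in zip(w, x)) for x in middle]
    best_k, best_auc = trim, -1.0
    # Scan candidate splits of the middle segment and keep the AUC maximizer.
    for k in range(trim, len(scores) - trim + 1):
        a = auc(scores[:k], scores[k:])
        if a > best_auc:
            best_k, best_auc = k, a
    return m + best_k, best_auc

# Simulated example (assumed setup): a mean shift in two dimensions at index 120.
random.seed(0)
n, tau = 200, 120
data = [[random.gauss(0.0 if i < tau else 4.0, 1.0) for _ in range(2)]
        for i in range(n)]
estimate, peak_auc = estimate_change_point(data, m=40, trim=10)
print(estimate, round(peak_auc, 3))
```

Any classifier that outputs a score could replace the difference-of-means projection; the scan over splits only needs the scores of the held-out points.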

Asymptotic Distribution-Free Independence Test for High Dimension Data

Test of independence is of fundamental importance in modern data analysis, with broad applications in variable selection, graphical models, and causal inference. When the data is high dimensional and the potential dependence signal is sparse, independence testing becomes very challenging without distributional or structural assumptions. In this paper, we propose a general framework for independence testing by first fitting a classifier that distinguishes the joint and product distributions, and then testing the significance of the fitted classifier. This framework allows us to borrow the strength of the most advanced classification algorithms developed from the modern machine learning community, making it applicable to high dimensional, complex data. By combining a sample split and a fixed permutation, our test statistic has a universal, fixed Gaussian null distribution that is independent of the underlying data distribution. Extensive simulations demonstrate the advantages of the newly proposed test compared with existing methods. We further apply the new test to a single cell data set to test the independence between two types of single cell sequencing measurements, whose high dimensionality and sparsity make existing methods hard to apply.
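
The classifier-based idea in this abstract can be sketched in a few lines of Python. This is a simplified variant under stated assumptions, not the statistic from the paper: the feature map `features`, the difference-of-means scorer, and the use of disjoint blocks of two observations (in place of the paper's fixed permutation) are all hypothetical choices made so that a plain central limit theorem argument applies. A scorer is trained on one half of the sample to separate observed pairs from shifted pairs; on the other half, each block contributes the score of an observed pair minus the score of a mismatched pair, and the standardized mean of these differences is compared with a standard normal.

```python
import random
from statistics import NormalDist, mean, stdev

def features(x, y):
    # Hypothetical feature map; the x*y term captures linear dependence.
    return (x, y, x * y)

def fit_scorer(xs, ys):
    """Difference-of-means scorer separating joint pairs from shifted pairs."""
    n = len(xs)
    joint = [features(xs[i], ys[i]) for i in range(n)]
    product = [features(xs[i], ys[(i + 1) % n]) for i in range(n)]  # cyclic shift
    w = [mean(cj) - mean(cp) for cj, cp in zip(zip(*joint), zip(*product))]
    return lambda x, y: sum(wj * fj for wj, fj in zip(w, features(x, y)))

def independence_z(xs, ys):
    """Return (z statistic, one-sided p-value) from a sample-split test."""
    half = len(xs) // 2
    score = fit_scorer(xs[:half], ys[:half])  # training half
    diffs = []
    # Disjoint blocks (a, a+1) on the test half keep the differences
    # independent of each other given the fitted scorer.
    for a in range(half, len(xs) - 1, 2):
        b = a + 1
        diffs.append(score(xs[a], ys[a]) - score(xs[a], ys[b]))
    z = (len(diffs) ** 0.5) * mean(diffs) / stdev(diffs)
    return z, 1.0 - NormalDist().cdf(z)

# Simulated example (assumed setup): y depends linearly on x.
random.seed(1)
xs = [random.gauss(0.0, 1.0) for _ in range(1000)]
ys = [x + 0.5 * random.gauss(0.0, 1.0) for x in xs]
z_dep, p_dep = independence_z(xs, ys)
print(round(z_dep, 2), p_dep)
```

Under independence, an observed pair and a mismatched pair have the same distribution, so each difference has mean zero and the statistic is approximately standard normal; using disjoint blocks discards data that a more refined construction would retain.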