Adaptive and Robust Representation Multi-Task Learning
Prof. Yang FENG
Professor and Ph.D. Program Director of Biostatistics
School of Global Public Health
New York University
We study multi-task linear regression for a collection of tasks that share a latent, low-dimensional structure. Each task’s regression vector belongs to a subspace whose dimension, called the intrinsic dimension, is much smaller than the ambient dimension. Unlike classical analyses that assume an identical subspace for every task, we allow each task’s subspace to deviate from a single reference subspace by up to a similarity radius, and we permit an unknown fraction of tasks to be outliers that violate the shared-structure assumption altogether.
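To make the setting concrete, the following is a minimal simulation sketch of such a task collection, not the speaker's exact model. The function name simulate_tasks, the Gaussian designs, the knob h (a hypothetical stand-in for the similarity radius that scales a random perturbation of the reference basis), and the way outlier coefficients are drawn are all assumptions made for illustration.

```python
import numpy as np

def simulate_tasks(T=20, n=50, p=30, r=3, h=0.1, outlier_frac=0.1,
                   noise=0.5, seed=0):
    """Simulate T linear-regression tasks with approximately shared structure.

    Assumed setup (illustrative only): p is the ambient dimension, r the
    intrinsic dimension, h scales how far each task's subspace drifts from
    the reference subspace, and outlier tasks ignore the subspace entirely.
    """
    rng = np.random.default_rng(seed)
    # Reference subspace: orthonormal p x r basis obtained via QR.
    A, _ = np.linalg.qr(rng.standard_normal((p, r)))

    # Mark a fraction of the tasks as outliers.
    n_out = int(round(outlier_frac * T))
    outliers = np.zeros(T, dtype=bool)
    outliers[rng.choice(T, size=n_out, replace=False)] = True

    X, y, beta = [], [], np.empty((T, p))
    for t in range(T):
        if outliers[t]:
            # Outlier: coefficient vector with no low-dimensional structure.
            beta[t] = rng.standard_normal(p)
        else:
            # Perturb the reference basis, then re-orthonormalize.
            drift = h * rng.standard_normal((p, r)) / np.sqrt(p)
            A_t, _ = np.linalg.qr(A + drift)
            beta[t] = A_t @ rng.standard_normal(r)
        X_t = rng.standard_normal((n, p))
        X.append(X_t)
        y.append(X_t @ beta[t] + noise * rng.standard_normal(n))
    return X, y, beta, outliers, A
```

Setting h=0 and outlier_frac=0 recovers the classical case in which every task's regression vector lies exactly in the reference subspace.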
Our contributions are threefold. First, adaptivity: we design a penalized empirical-risk procedure and a spectral method. Both algorithms automatically adjust to the unknown similarity radius and to the proportion of outliers. Second, minimaxity: we prove information-theoretic lower bounds on the best achievable prediction risk over this problem class and show that both algorithms attain these bounds up to constant factors; when no outliers are present, the spectral method is exactly minimax-optimal. Third, robustness: for every choice of similarity radius and outlier proportion, the proposed estimators never incur larger expected prediction error than independent single-task ridge regression, while delivering strict improvements whenever tasks are even moderately similar and outliers are sparse.
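For intuition about what a spectral approach can look like, here is a generic two-step sketch: fit each task separately, extract a common subspace from the stacked estimates by SVD, then refit each task within that subspace. This is a textbook-style baseline written under stated assumptions, not the adaptive or outlier-robust procedure described in the talk. The names ridge and spectral_mtl, the small ridge penalty lam, and the assumption that the intrinsic dimension r is known are all hypothetical choices.

```python
import numpy as np

def ridge(X, y, lam):
    """Single-task ridge regression estimate."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def spectral_mtl(Xs, ys, r, lam=1e-3):
    """Generic spectral multi-task sketch (assumes r is known).

    Step 1: per-task ridge estimates, stacked as columns of a p x T matrix.
    Step 2: top-r left singular vectors give an estimated shared subspace.
    Step 3: refit every task inside that r-dimensional subspace.
    """
    B = np.column_stack([ridge(X, y, lam) for X, y in zip(Xs, ys)])
    U, _, _ = np.linalg.svd(B, full_matrices=False)
    A_hat = U[:, :r]  # orthonormal p x r basis

    betas = []
    for X, y in zip(Xs, ys):
        Z = X @ A_hat  # n x r reduced design
        theta = np.linalg.solve(Z.T @ Z + lam * np.eye(r), Z.T @ y)
        betas.append(A_hat @ theta)
    return A_hat, np.column_stack(betas)
```

Because this sketch always projects onto the estimated subspace, it can do worse than single-task fitting when tasks are dissimilar or outliers are common; guarding against exactly that failure is what the adaptivity and robustness guarantees above are about.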
Extensive simulations spanning hundreds of predictors, tens to hundreds of tasks, and a wide range of similarity and contamination levels confirm a sharp phase transition predicted by our theory: as soon as cross-task similarity exceeds a data-driven threshold and outliers remain few, our adaptive estimators decisively outperform single-task learning without ever performing worse.
Yang Feng is a Professor and Ph.D. Program Director of Biostatistics in the School of Global Public Health and an affiliate faculty member in the Center for Data Science at New York University. He obtained his Ph.D. in Operations Research from Princeton University in 2010. Feng’s research interests encompass the theoretical and methodological aspects of machine learning, high-dimensional statistics, network models, and nonparametric statistics, with practical applications that include Alzheimer’s disease, cancer classification, and electronic health records. He has published over 70 papers across leading journals in statistics, machine learning, econometrics, public health, and medicine. His research has been funded by multiple grants from the National Institutes of Health (NIH) and the National Science Foundation (NSF), notably the NSF CAREER Award. He is currently an Associate Editor for the Journal of the American Statistical Association (JASA), the Journal of Business & Economic Statistics (JBES), the Journal of Computational & Graphical Statistics (JCGS), and the Annals of Applied Statistics (AoAS). His professional recognitions include being named a fellow of the American Statistical Association (ASA) and the Institute of Mathematical Statistics (IMS), as well as an elected member of the International Statistical Institute (ISI).