Synthetic Nearest Neighbours: Extending Synthetic Controls for Matrix Completion with Missing Not at Random Data
Prof. Dennis Shen
Assistant Professor
Department of Data Sciences and Operations
USC Marshall School of Business
We develop a causal framework for matrix completion under missing not at random (MNAR) data. Drawing on the method of synthetic controls from the econometric panel data literature, our approach relaxes two core assumptions that underlie standard MNAR matrix completion models: positivity (every entry is observed with positive probability) and independence (observations are independent across entries). Unlike traditional panel data models that often rely on rigid block-sparsity patterns, our framework accommodates flexible and heterogeneous observation structures commonly encountered in matrix completion problems. To operationalize our framework, we propose synthetic nearest neighbors (SNN), a novel algorithm that blends elements of K-nearest neighbors with synthetic controls. Under suitable assumptions on the underlying matrix and observed sparsity pattern, we prove that SNN achieves entrywise mean-squared error convergence for estimating the mean matrix, attaining a near-parametric rate. We further extend our analysis to heteroskedastic variance estimation, establishing that SNN attains entrywise mean-squared error convergence under bounded noise and asymptotic unbiasedness under general sub-Gaussian noise. Simulation studies corroborate our theoretical findings and demonstrate the robustness of SNN across a range of MNAR scenarios.
Dennis is an assistant professor in the Data Sciences and Operations Department at the USC Marshall School of Business. He received his PhD in Electrical Engineering and Computer Science from MIT. Before joining USC, he was a FODSI postdoctoral fellow at the Simons Institute at UC Berkeley. He also served as a technical consultant for Uber Technologies and TauRx Therapeutics. Dennis’s research interests lie at the intersection of causal inference, high-dimensional statistics, and machine learning. He has received several recognitions for his work, including the INFORMS George B. Dantzig Dissertation Award (2nd place) and the MIT George Sprowls PhD Thesis Award in Artificial Intelligence & Decision-making.