Phase transitions for spectral estimators in generalized linear models via approximate message passing
In a generalized linear model (GLM), the goal is to estimate a d-dimensional signal x from an n-dimensional observation of the form f(Ax, w), where A is a design matrix and w is a noise vector. Well-known examples of GLMs include linear regression, phase retrieval, 1-bit compressed sensing, and logistic regression. We focus on the high-dimensional setting in which both the number of measurements n and the signal dimension d diverge, with their ratio tending to a fixed constant. Spectral methods provide a popular solution to obtain an initial estimate, and they are also commonly used as a ‘warm start’ for other algorithms. In particular, the spectral estimator is the principal eigenvector of a data-dependent matrix, whose spectrum exhibits a phase transition. Random matrix theory tools capture the phase transition and provide precise asymptotics on spectral methods when the design matrix A is i.i.d. Gaussian. However, in practice, data is heterogeneous (e.g., it comes from a mixed GLM with multiple signals to recover) and structured (e.g., it comes from a GLM with a correlated design matrix). To tackle these challenging settings, I will present a new strategy based on Approximate Message Passing (AMP). Our analysis gives an exact characterization of spectral methods and, therefore, it allows the optimization of the data pre-processing, which leads to a significant improvement in performance with respect to existing heuristic approaches. We highlight that our methodology based on AMP is broadly applicable, and it opens the way to the study of spiked matrices and of the corresponding spectral estimators in a variety of settings.
Area: IS3 - Mathematics of Machine Learning (Andrea Agazzi)
Keywords: High-dimensional estimation, generalized linear models, structured data, heterogeneous data, mixed regression, spectral estimator, approximate message passing