A random partition model with dependence on covariates

Del Sole Claudio, Bocconi University

Bayesian nonparametric models often assume some form of homogeneity among the observations, motivated by the exchangeability assumption in de Finetti's representation theorem. In the presence of multiple groups of observations, possibly sharing similar features, homogeneity within each group is usually modeled through partial exchangeability, which corresponds to a regression model with categorical covariates. Including continuous covariates within a fully nonparametric regression model is, however, more challenging. On the one hand, the dependent Dirichlet process framework (MacEachern, 2000), based on the stick-breaking construction, and its subsequent developments allow for great flexibility in modelling the latent partition structure, but mostly lack analytical tractability; on the other hand, the PPMx model (Müller et al., 2011) relies on a tractable product-form partition probability function, but requires the specification of a model for the covariates in order to be consistent for new observations. This talk introduces a novel class of covariate-dependent random probability measures, arising from the normalization of completely random measures (CRMs). Specifically, the jumps of a common CRM are rescaled via multiplication by a suitable similarity kernel, which accounts for the structure in the space of covariates; this kernel-based weighting approach is inspired by kernel stick-breaking processes (Dunson & Park, 2008) and is closely related to Dunson et al. (2007), Foti & Williamson (2012) and Antoniano-Villalobos et al. (2014). This construction induces a random partition model with dependence on covariates, which is highly flexible while retaining some analytical tractability, thanks to the introduction of suitable latent variables; moreover, it is inherently consistent for new observations.
Both the partition probability function and the posterior distribution of the common CRM are derived in closed form, conditionally on these additional latent variables, and further investigated in the special case of stable processes. A marginal Gibbs sampler, based on a generalized Pólya urn scheme, is also developed for posterior computation, together with a conditional slice sampling algorithm (Foti & Williamson, 2012). Our proposal can be effectively used as a clustering or species sampling model that incorporates information available through covariates: observations with similar covariates, according to some kernel choice, are more likely to belong to the same group or species. In addition, this construction may act as the building block for nonparametric regression models, characterized by an infinite mixture of parametric regression models, with the mixing distribution changing with covariates.
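
The kernel-rescaling idea can be illustrated with a rough numerical sketch. This is not the construction presented in the talk. It replaces the CRM with a finite set of independent gamma jumps, and it assumes a Gaussian kernel centered at a covariate location attached to each atom. The function names, the bandwidth parameter and the atom centers are all hypothetical choices made for illustration. Given the jumps, two observations share a cluster with probability equal to the sum over atoms of the product of their normalized weights, which is larger when their covariates are close.

```python
import numpy as np


def covariate_weights(x, jumps, centers, bandwidth):
    """Normalized kernel-rescaled jumps at each covariate value in x.

    Returns an array of shape (len(x), len(jumps)) whose rows sum to one.
    The Gaussian kernel and the per-atom centers are illustrative assumptions.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # Similarity between every covariate value and every atom center.
    kernel = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / bandwidth) ** 2)
    # Rescale the common jumps by the kernel, then normalize row by row.
    rescaled = kernel * jumps[None, :]
    return rescaled / rescaled.sum(axis=1, keepdims=True)


def coclustering_prob(x1, x2, jumps, centers, bandwidth):
    """Probability, given the jumps, that observations at x1 and x2 pick the same atom."""
    w = covariate_weights([x1, x2], jumps, centers, bandwidth)
    return float(np.sum(w[0] * w[1]))


def sample_labels(x, jumps, centers, bandwidth, rng):
    """Draw one atom index per observation from its covariate-dependent weights."""
    weights = covariate_weights(x, jumps, centers, bandwidth)
    return np.array([rng.choice(len(jumps), p=w) for w in weights])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_atoms = 50  # finite stand-in for the countably many atoms of a CRM
    jumps = rng.gamma(shape=0.1, scale=1.0, size=n_atoms)
    centers = rng.uniform(0.0, 1.0, size=n_atoms)

    x_obs = np.array([0.10, 0.12, 0.50, 0.90])
    print("labels:", sample_labels(x_obs, jumps, centers, 0.1, rng))
    print("near pair:", coclustering_prob(0.10, 0.12, jumps, centers, 0.1))
    print("far pair: ", coclustering_prob(0.10, 0.90, jumps, centers, 0.1))
```

Because the same jumps are shared across all covariate values, weights can be computed at any new covariate value without changing those already computed, which loosely mirrors the consistency for new observations mentioned in the abstract.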

Area: CS15 - Recent Advances in Bayesian Nonparametric Statistics (Marta Catalano and Beatrice Franzolini)

Keywords: Bayesian nonparametrics, random partition, completely random measure
