A random partition model with dependence on covariates

Del Sole Claudio, Bocconi University
Lijoi Antonio, Bocconi University
Pruenster Igor, Bocconi University

Bayesian nonparametric models often assume some kind of homogeneity among the observations, motivated by the exchangeability assumption in de Finetti’s representation theorem. In the presence of multiple groups of observations, homogeneity within each group is usually modeled through partial exchangeability. By contrast, including continuous covariates in a fully nonparametric regression model is a more challenging task, and existing models in the literature face a trade-off among flexibility in modeling the latent partition structure (MacEachern, 2000), analytical tractability (Müller et al., 2011), and consistency for new observations. This work introduces a novel class of covariate-dependent random probability measures, obtained by normalizing suitable random measures that depend on covariates through a kernel structure: specifically, the jumps of a common discrete random measure are rescaled by multiplication with a similarity kernel. This kernel-based weighting approach is inspired by kernel stick-breaking processes (Dunson & Park, 2008) and is closely related to Dunson et al. (2007), Foti & Williamson (2012) and Antoniano-Villalobos et al. (2014). A noteworthy example arises when the distribution of this random measure is a specific transformation of the distribution of a stable completely random measure. The construction induces a random partition model with dependence on covariates that is highly flexible while retaining some analytical tractability, thanks to the introduction of suitable latent variables; moreover, it is inherently consistent for new observations. Both the partition probability function and the posterior distribution of the common random measure are derived in closed form, conditionally on these latent variables.
A marginal Gibbs sampler, based on a generalized Pólya urn scheme, is also developed for posterior computation, together with a conditional slice sampling algorithm (Foti & Williamson, 2012). Our proposal can be effectively exploited as a clustering or species sampling model that incorporates information from both discrete and continuous covariates; in addition, it may serve as a building block for nonparametric regression models.
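The kernel-based construction described above can be illustrated with a small forward simulation. The following Python sketch is hypothetical and is not the authors' model: it replaces the stable-based random measure with a truncated gamma completely random measure (N i.i.d. jumps J_j ~ Gamma(alpha / N, 1)), uses a Gaussian similarity kernel K(x, z) = exp(-(x - z)^2 / (2 h^2)) with an assumed bandwidth h, and places the kernel locations z_j uniformly over the covariate range. For a covariate x, the weights are p_j(x) = K(x, z_j) J_j / sum_k K(x, z_k) J_k; given the weights, observations select atoms independently, and ties between selected atoms define the blocks of the partition. The function names kernel_weights and sample_partition are illustrative only.

```python
import numpy as np


def kernel_weights(x, z, jumps, bandwidth=1.0):
    """Return an (n, N) matrix whose row i is p_j(x_i), proportional to K(x_i, z_j) * J_j.

    Hypothetical choices (not from the abstract): Gaussian similarity kernel with
    the given bandwidth, and a finite vector of jumps standing in for the common
    discrete random measure.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    log_k = -((x[:, None] - z[None, :]) ** 2) / (2.0 * bandwidth**2)
    with np.errstate(divide="ignore"):  # a zero jump gets log weight -inf
        log_w = log_k + np.log(jumps)[None, :]  # kernel rescales the shared jumps
    log_w -= log_w.max(axis=1, keepdims=True)  # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)  # normalization step


def sample_partition(x, n_atoms=100, alpha=1.0, bandwidth=1.0, seed=0):
    """Draw one covariate-dependent partition of observations with covariates x.

    Hypothetical finite approximation: jumps J_j ~ Gamma(alpha / n_atoms, 1) i.i.d.
    (a truncated gamma CRM, not the stable-based measure of the abstract) and
    kernel locations z_j ~ Uniform(min x, max x).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    jumps = rng.gamma(alpha / n_atoms, 1.0, size=n_atoms)
    z = rng.uniform(x.min(), x.max(), size=n_atoms)
    p = kernel_weights(x, z, jumps, bandwidth)
    # Conditionally on the weights, each observation picks an atom independently.
    atoms = np.array([rng.choice(n_atoms, p=row) for row in p])
    # Ties define the blocks; relabel blocks in order of first appearance.
    _, first_idx, inverse = np.unique(atoms, return_index=True, return_inverse=True)
    rank = np.argsort(np.argsort(first_idx))
    return rank[inverse], p


if __name__ == "__main__":
    x = np.array([0.0, 0.1, 2.0, 5.0, 5.2, 9.8])
    labels, p = sample_partition(x, seed=1)
    print("block labels:", labels)
```

Because each row of weights depends only on that observation's covariate and the shared jumps, appending new covariates leaves the weights of existing observations unchanged, a simple finite-dimensional analogue of the consistency for new observations mentioned above.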

Area: CS15 - Recent Advances in Bayesian Nonparametric Statistics (Marta Catalano and Beatrice Franzolini)

Keywords: Bayesian nonparametrics, random partition, completely random measure
