An almost infinite sites model
A main challenge in molecular evolution is to provide computationally efficient mutation models with flexible assumptions that properly reflect genetic variation. The infinite sites model assumes that each mutation event occurs at a site that has never previously mutated, i.e. it does not allow recurrent mutations. This assumption is reasonable for low mutation rates and makes statistical inference much more tractable. However, recurrent mutations are common enough to be observable in genetic variation data, even in species with low per-site mutation rates such as humans. The finite sites model, on the other hand, allows for recurrent mutations but is computationally infeasible to work with in most cases. In this work, we bridge these two approaches by developing a novel molecular evolution model, the almost infinite sites model, that both admits recurrent mutations and is tractable. We provide a recursive characterisation of the likelihood of our proposed model and outline a parsimonious approximation scheme for computing it. We demonstrate the usefulness of our model on simulated and human mitochondrial data.
Area: CS8 - Combinatorial structures in probability and statistics (Elia Bisi)
Keywords: coalescent, molecular evolution, infinite sites, finite sites, population genetics