A couple of times in the past we have been guilty of publishing papers in which the mean log posteriors of two models were compared with AIC. This is not a theoretically sound method for model comparison in a Bayesian setting, because AIC is designed to compare the (penalized) maximum likelihoods of two models, not their posteriors.
The most theoretically sound approach to model comparison in a Bayesian setting is to calculate the Bayes factor (BF), which is the ratio of the marginal likelihoods of the two models (the likelihood of each model averaged over its prior). Generally speaking, calculating the BF involves a Bayesian MCMC that averages over both models (using something called ‘reversible jump’ MCMC), and this is not something that can be done in BEAST (yet). However, there are a couple of ways of approximating the marginal likelihood of each model (and therefore the Bayes factor between them) by processing the output of two BEAST analyses.
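Written out, the quantities described above look as follows. The notation is introduced here for illustration and is not taken from the BEAST documentation: D is the data, M_1 and M_2 are the two models, and theta_i are the parameters of model M_i.

```latex
p(D \mid M_i) = \int p(D \mid \theta_i, M_i)\, p(\theta_i \mid M_i)\, d\theta_i,
\qquad
\mathrm{BF}_{12} = \frac{p(D \mid M_1)}{p(D \mid M_2)},
\qquad
\log \mathrm{BF}_{12} = \log p(D \mid M_1) - \log p(D \mid M_2).
```

The integral is over the prior of each model separately, which is why a maximum likelihood criterion such as AIC does not measure the same thing.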
To estimate the marginal likelihood you can:
- Use the harmonic mean estimator in Tracer -- never do this! The estimator has infinite variance and has trouble reliably distinguishing between models (Baele et al. 2012).
- Use stepping stone/path sampling analysis with the model-selection package. This is described in the tutorial on distinguishing models with different taxon assignments, which lets you do species delimitation with the BFD* method (Leaché et al., 2014).
- Use nested sampling (Maturana et al., 2018) with the NS package. Nested sampling gives you not only an estimate of the marginal likelihood but also an estimate of the variance of that estimate, which lets you judge how much confidence you can have in the corresponding BFs.
By estimating the marginal likelihood of each model and then taking the difference (in log space) you get the log BF. You can look this number up in a table (Kass & Raftery, 1995) to decide whether the BF is big enough to strongly favour one model over the other: BF > 20, that is, a log BF above about 3 in natural-log units, is strong support for the favoured model.
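As a rough sketch of that last step, the Python snippet below takes two log marginal likelihood estimates, forms the log BF, and labels it using the BF bands from the Kass & Raftery table (1 to 3, 3 to 20, 20 to 150, above 150). The function names and all numbers are hypothetical and are not output from any BEAST run. The snippet assumes natural logs, and it assumes the two estimates come from independent runs, so that their standard deviations (such as those nested sampling can supply) can be combined by adding variances.

```python
import math


def log_bayes_factor(log_ml_1, log_ml_2, sd_1=0.0, sd_2=0.0):
    """Return (log BF of model 1 over model 2, SD of that log BF).

    Inputs are natural-log marginal likelihood estimates and, optionally,
    their standard deviations.
    """
    log_bf = log_ml_1 - log_ml_2
    # Assumption: the two estimates are independent, so their variances add.
    sd = math.sqrt(sd_1 ** 2 + sd_2 ** 2)
    return log_bf, sd


def kass_raftery_label(log_bf):
    """Label the strength of evidence using the BF bands of Kass & Raftery (1995).

    The sign of log_bf says which model is favoured; the label only
    describes how strong the support is.
    """
    magnitude = abs(log_bf)
    if magnitude < math.log(3):
        return "not worth more than a bare mention"
    if magnitude < math.log(20):
        return "positive"
    if magnitude < math.log(150):
        return "strong"
    return "very strong"


# Hypothetical estimates for two models (illustrative numbers only).
log_bf, sd = log_bayes_factor(-12426.3, -12438.1, sd_1=1.9, sd_2=2.1)
favoured = "model 1" if log_bf > 0 else "model 2"
label = kass_raftery_label(log_bf)
print(f"log BF = {log_bf:.1f} (SD {sd:.1f}): {label} support for {favoured}")
```

If the SD of the log BF is of the same order as the log BF itself, the label should not be taken at face value, which is the situation where the variance estimate from nested sampling earns its keep.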
Baele, G., Li, W.L.S., Drummond, A.J., Suchard, M.A. and Lemey, P., 2012. Accurate model selection of relaxed molecular clocks in Bayesian phylogenetics. Molecular Biology and Evolution, 30(2), pp.239-243.
Kass, R.E. and Raftery, A.E., 1995. Bayes factors. Journal of the American Statistical Association, 90(430), pp.773-795.
Leaché, A.D., Fujita, M.K., Minin, V.N. and Bouckaert, R.R., 2014. Species delimitation using genome-wide SNP data. Systematic Biology, 63(4), pp.534-542.
Maturana R, P., Brewer, B.J., Klaere, S. and Bouckaert, R., 2018. Model selection and parameter inference in phylogenetics using Nested Sampling. Systematic Biology, syy050. doi:10.1093/sysbio/syy050.