Adaptive Metropolis Coupled MCMC(MC3) revisited

1 September 2021 by Remco Bouckaert

A previous post on adaptive MC3 did not quite point out how to set the number of MCMC samples between resampling and how to set the target acceptance probability for swaps. Here are some rules of thumb to determine these values and hot to set the resampleEvery and target attributes for adaptive MC3 analyses.

Setting resampleEvery

The resampleEvery attribute determines the number of MCMC sampled between attempts to swap chains. When setting resampleEvery too large, the adaptive mechanism does not get enough information to base changes of the temperatures on, so the swap acceptance probability may stay far away from the target. When resampleEvery is very small, apart from the overhead in swapping, chains will not moved very much, so there is no point in swapping.

What seems to be reasonable is to allow chains to run long enough such that every parameter in the model has had a chance to be changed at least once. Counting the number of parameters can be roughly done as follows: for every N dimensional real, boolean or integer parameter add N and for any tree add the number of taxa times 3 (with k taxa, there are about 2k parent indicator parameters, and about k internal node height parameters). This does not take dependencies between parameters into account (nucleotide frequencies would count as 4 while there are only 3 degrees of freedom since they have to sum to 1), but gives a principled lower bound on the resampleEvery value.

Setting target

Now that we know the resampleEvery value, we can set the target. Theoretical results point to a target acceptance probability of 0.234. Lower targets will results in hotter chains that can explore more of the sample space, but lower targets also means that hotter chains contribute less often to the cold chain.

Depending on the resampleEvery value, this can result in many accepted swaps with the cold chain that do not show up in effective sample sizes of parameters: a chain of 10 million samples that attempts to swap every 1000 samples with 4 chains (1 cold, 3 hot) has 10 thousand swap attempts of which 2340 can be expected to succeed and a third of these (780) can be expected to involve the cold chain. Note that the third comes from 1 divided by (the number of chains minus 1), which is in this case 1/(4-1)=1/3.

If we aim for ESS of 200, it does not appear to be sensible to swap the cold chain more often than that. This gives us a way to formulat the target temperature: calculate the number of swaps = chain length / resampleEvery. Then the number of swaps with the cold chain is (number of swaps)/(number of chains - 1). And our target is determined by the target ESS: target = (target ESS)/(number of cold chain swaps). So, for a chain of 10 million, 4 chains and target ESS of 200, we get target = 200/3333 = 0.06, and for 20 million 0.03, 100 million 0.006, etc.


Thanks to Nico Neureiter and Nicola Müller for formulating these rules of thumb.