2 April 2025 by Remco Bouckaert
MAP with BEAST2
If you want to find the maximum aposteriori probability (MAP) state for your BEAST analysis, this can be found using simulated annealing (Van Laarhoven & Aarts, 1987), which is implemented in the BEASTLabs package.
Start with setting up an MCMC analysis in BEAUti, save the XML and in a text editor, replace the opening run
element with
<run spec="beastlabs.inference.SimulatedAnnealing" id="mcmc" chainLength="1000000" startTemp='0.001' endTemp='0.0001’>
Simulated annealing is a probabilistic optimisation technique inspired by the annealing process in metallurgy, where a material is heated and then slowly cooled to achieve a stable structure.
It resembles MCMC in that it occasionally allows worse solutions to be chosen, thus avoiding getting stuck in local optima.
Just like MCMC, a proposal is accepted when it improves the posterior, or with a probability depending on the temperature.
During the simulated annealing run, the temperature reduces from the startTemp
to the endTemp
where the end temperature is so low that hardly any reduction in posterior is accepted.
It takes chainLength
steps to go from the start to the end temperature.
The algorithm is guaranteed to find the optimal solution if run long enough, but in practice you want to run multiple runs to get a feeling for what make good parameters (start and end temperature and chain length). You can also resume, which then takes a good end-state of a simulated annealing run and resets the temperature to the high start temperature. Repeating the process can never make things worse.
There is an example XML file in the BEASTLabs package.
Maximum likelihood with BEAST2
By default, simulated annealing estimates the MAP state, but if you remove priors it should give you the maximum likelihood state.
This is easiest done by deleting the element with id="prior"
and all of its contents from the XML file.
Be aware that there are reference in the loggers to the prior (they look like <log idref="prior"/>
), which should be deleted as well.
Beware MAP/Maximum likelihood states
There are a few things to take in account when interpreting the MAP/ML state.
First, the “maximum” in MAP and ML is better called “high posterior”/”high likelihood” states, since almost surely it will be an approximation of the actual MAP or ML state in phylogenetics.
Second, be aware that this state can be very atypical for the posterior distribution. For example, if the likelihood consists of 100 coin tosses with p=0.8 being head, the ML state is 100 heads, but a more typical state is 80 heads and 20 tails: it is in fact 487 million times more likely encountering a 80 heads/20 tails state than a 100 heads state. For phylogenies, a similar situation can arrise when through the nearest neighbour configurations there are trees that are less likely than the MAP or ML tree.
Finally, MAP and ML trees do not contain any indication of uncertainty. The main reason for doing a full Bayesian analysis is being able to accurately estimate uncertainty. And without uncertainty estimates it is not science.
References
Van Laarhoven PJ, Aarts EH. Simulated annealing. Springer Netherlands; 1987.