1 June 2024 by Remco Bouckaert
There are quite a few cases where a simulation study is useful, e.g. for testing whether a theory can be falsified. BEAST has a number of ways to simulate (hyper-)parameters, trees and sequence data.
Simulate by MCMC
If you want to simulate trees and their parameters, you can set up an analysis with the appropriate priors in BEAUti, or craft an XML by hand, and use the sampleFromPrior option for BEAST to sample trees and tree prior hyperparameters.
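The idea behind sampling from the prior can be illustrated with a toy Metropolis-Hastings sampler. This is a minimal sketch, not BEAST code: the Exponential prior, the made-up data in log_likelihood and all function names are assumptions chosen for illustration. When the sample_from_prior flag is set, the likelihood is simply left out of the target density, so the chain explores the prior.

```python
import math
import random

def log_prior(rate, mean=1.0):
    """Log density of an Exponential prior with the given mean (hypothetical choice)."""
    if rate <= 0:
        return -math.inf
    return -math.log(mean) - rate / mean

def log_likelihood(rate, n_events=10, total_time=1.0):
    """Toy likelihood: n_events exponential waiting times summing to total_time (made-up data)."""
    return n_events * math.log(rate) - rate * total_time

def run_mcmc(chain_length, sample_from_prior=True, seed=1):
    """Random-walk Metropolis-Hastings on a single positive parameter."""
    rng = random.Random(seed)

    def log_target(value):
        lp = log_prior(value)
        if sample_from_prior or lp == -math.inf:
            return lp                      # likelihood ignored: target is the prior
        return lp + log_likelihood(value)  # otherwise: target is the posterior

    x = 1.0
    current = log_target(x)
    samples = []
    for _ in range(chain_length):
        proposal = x + rng.uniform(-1.0, 1.0)  # symmetric proposal
        proposed = log_target(proposal)
        if rng.random() < math.exp(min(0.0, proposed - current)):
            x, current = proposal, proposed
        samples.append(x)
    return samples

if __name__ == "__main__":
    prior_samples = run_mcmc(50000, sample_from_prior=True)
    print("mean under the prior:", sum(prior_samples) / len(prior_samples))
```

With the flag switched off, the same chain targets the posterior instead, which is why a sample-from-prior run is a convenient way to reuse an existing analysis set-up as a simulator.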
For a given tree, the beast.app.seqgen.SimulatedAlignment can then be used to simulate an alignment.
The SimulatedAlignment can be used as a replacement of Alignment.
This can be useful in well calibrated simulation studies (Mendes et al, 2024).
The SimulatedAlignment takes in a standard site model and allows for discretised gamma rate heterogeneity.
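To make concrete what simulating an alignment on a given tree involves, here is a minimal sketch in Python. It is not the SimulatedAlignment implementation: it assumes a Jukes-Cantor (JC69) substitution model without rate heterogeneity, and the tree, taxon names and function names are all hypothetical. A random sequence is drawn at the root and then evolved down every branch, and the sequences that arrive at the leaves form the alignment.

```python
import math
import random

BASES = "ACGT"

# Hypothetical fixed tree: (name, branch_length, children); leaves have no children.
# Branch lengths are in expected substitutions per site.
TREE = ("root", 0.0, [
    ("n1", 0.1, [("A", 0.2, []), ("B", 0.2, [])]),
    ("C", 0.3, []),
])

def evolve(seq, branch_length, rng):
    """Evolve a sequence along one branch under JC69."""
    # Probability that a site ends the branch in a different state than it started.
    p_change = 0.75 * (1.0 - math.exp(-4.0 * branch_length / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_change:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def simulate_alignment(tree, sequence_length, seed=0):
    """Return {leaf name: sequence} simulated down the given tree."""
    rng = random.Random(seed)
    root_seq = "".join(rng.choice(BASES) for _ in range(sequence_length))
    alignment = {}

    def recurse(node, parent_seq):
        name, length, children = node
        seq = evolve(parent_seq, length, rng)
        if not children:
            alignment[name] = seq
        for child in children:
            recurse(child, seq)

    recurse(tree, root_seq)
    return alignment

if __name__ == "__main__":
    for taxon, seq in sorted(simulate_alignment(TREE, 40, seed=1).items()):
        print(taxon, seq)
```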
For simulating codon sequences, the CodonSubstModel package has a SimulatedCodonAlignment class that can replace a CodonAlignment, and there is an example XML.
If you want to simulate under continuous gamma rate heterogeneity, the rbbeast.evolution.util.ContinuousGammaSimulatedAlignment in the RBS package can be used (example XML).
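The difference between discretised and continuous gamma rate heterogeneity can be sketched as follows. This is an illustration only, not code from BEAST or the RBS package, and the function names are made up. Under the continuous model every site gets its own rate drawn from a gamma distribution with mean 1. Under the discretised model every site picks one of a small number of equally probable category rates. The category rates are approximated here by Monte Carlo to keep the example free of dependencies, whereas real implementations typically derive them from gamma quantiles.

```python
import random

def continuous_gamma_rates(n_sites, alpha, rng):
    """One independent rate per site from Gamma(shape=alpha, scale=1/alpha), mean 1."""
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]

def discretised_gamma_rates(n_sites, alpha, n_categories, rng, n_draws=100_000):
    """Each site picks one of n_categories equally probable category rates.

    Category rates are approximated as the mean of each equal-sized slice of
    sorted gamma draws, rescaled so that the mean over categories is 1.
    """
    draws = sorted(rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_draws))
    size = n_draws // n_categories
    categories = [sum(draws[i * size:(i + 1) * size]) / size
                  for i in range(n_categories)]
    mean = sum(categories) / n_categories
    categories = [c / mean for c in categories]
    site_rates = [rng.choice(categories) for _ in range(n_sites)]
    return site_rates, categories

if __name__ == "__main__":
    rng = random.Random(42)
    _, categories = discretised_gamma_rates(10, 0.5, 4, rng)
    print("category rates:", [round(c, 3) for c in categories])
    print("continuous rates:", [round(r, 3) for r in continuous_gamma_rates(5, 0.5, rng)])
```

In a sequence simulator, the rate of a site multiplies every branch length when that site is evolved down the tree.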
DirectSimulator
The beast.base.inference.DirectSimulator provides a simulator that is more efficient than sampling by MCMC: it uses independent implementations for directly simulating parameter values from parametric distributions.
Example XML files are included in the BEAST distribution and are also available online.
Using the DirectSimulator requires hand-crafting the XML.
Any Distribution that implements the sample method can be used, including the YuleModel for sampling trees and Prior for sampling parameters, but this approach is quite limited.
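What direct simulation means can be sketched in a few lines of Python. This is not how the DirectSimulator or the YuleModel sample method is implemented. It is a minimal illustration under stated assumptions: the LogNormal prior, the Node class and the convention used for the final interval are all hypothetical choices. The point is that each value is drawn once from its distribution, with no Markov chain, burn-in or autocorrelation: first a birth rate from its prior, then a pure-birth (Yule) tree given that rate.

```python
import random

class Node:
    """A lineage that exists from birth_time to end_time."""
    def __init__(self, birth_time):
        self.birth_time = birth_time
        self.end_time = None
        self.children = []
        self.label = None

def sample_birth_rate(rng, mu=0.0, sigma=0.5):
    """Draw a birth rate directly from a LogNormal prior (hypothetical choice)."""
    return rng.lognormvariate(mu, sigma)

def sample_yule_tree(n_taxa, birth_rate, rng):
    """Grow a pure-birth tree forward in time until n_taxa lineages exist."""
    root = Node(0.0)
    active = [root]
    t = 0.0
    while len(active) < n_taxa:
        # With k lineages, the waiting time to the next split is Exp(k * birth_rate).
        t += rng.expovariate(len(active) * birth_rate)
        parent = active.pop(rng.randrange(len(active)))
        parent.end_time = t
        parent.children = [Node(t), Node(t)]
        active.extend(parent.children)
    # Assumed convention: stop one more waiting time after the last split.
    t += rng.expovariate(len(active) * birth_rate)
    for i, leaf in enumerate(active):
        leaf.end_time = t
        leaf.label = f"t{i + 1}"
    return root

def to_newick(node):
    """Serialise a subtree, including the branch leading to it."""
    length = node.end_time - node.birth_time
    if not node.children:
        return f"{node.label}:{length:.4f}"
    inner = ",".join(to_newick(child) for child in node.children)
    return f"({inner}):{length:.4f}"

if __name__ == "__main__":
    rng = random.Random(7)
    rate = sample_birth_rate(rng)
    print(to_newick(sample_yule_tree(5, rate, rng)) + ";")
```

Every call produces an independent draw, which is where the efficiency gain over MCMC comes from.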
The (re)MASTER packages
MASTER (Vaughan & Drummond 2013) is a BEAST 2 package aimed at generating simulators for stochastic models of structured and unstructured population dynamics, but also allows for simulating other trees and networks. It is more powerful than the DirectSimulator.
ReMASTER (Vaughan, 2024) is a faster, leaner, complete rewrite of MASTER with more capabilities than the original. There is ample user documentation.
LinguaPhylo
LinguaPhylo (Drummond et al, 2024) is a probabilistic model specification language for reproducible phylogenetic analyses that allows simulation under a wide range of models. It is more powerful than ReMASTER and comes with its own GUI, LPhyStudio.
LPhyBEAST is a program that converts an LPhy model specification and a data block into a BEAST 2 XML input file. The lphybeast and LPhyBeastExt packages for BEAST 2 are required for BEAST 2 to run these XML files.
References
Drummond AJ, Chen K, Mendes FK, Xie D. LinguaPhylo: a probabilistic model specification language for reproducible phylogenetic analyses. PLOS Computational Biology 19(7):e1011226. https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011226
Mendes FK, Bouckaert R, Carvalho LM, Drummond AJ. How to validate a Bayesian evolutionary model. bioRxiv. 2024:2024-02.
Vaughan TG, Drummond AJ. A stochastic simulator of birth-death master equations with application to phylodynamics. Mol Biol Evol 2013;30:1480–93.
Vaughan TG. ReMASTER: improved phylodynamic simulation for BEAST 2.7. Bioinformatics. 2024 Jan 1;40(1):btae015.