The majority of computing time is spent in calculating the treelikelihood, so we will concentrate on tips on how to speed up this calculation. There are no hard and fast rules, and you have to try a few combinations to see what works best for your data.
In general, when there are many patterns and/or a large number of states, BEAGLE GPU works best. Often BEAGLE SSE gives good performance on medium data sets, or when there are few patterns. For nucleotide data, ThreadedTreeLikelihood can be helpful.
BEAGLE is a high-performance library that efficiently calculates treelikelihoods on CPUs and GPUs. Note that a GPU is not necessary: BEAGLE can give considerable performance improvements on CPU as well. See http://beast.bio.ed.ac.uk/BEAGLE for more details. How it is installed and used with BEAST depends on the platform: https://github.com/beagle-dev/beagle-lib
By default, BEAST tries to use BEAGLE for treelikelihood calculations if it is installed. However, this does not always lead to performance improvements, especially with nucleotide data where there are few patterns. You can start BEAST with the -java command line flag to ensure BEAGLE is not considered. Some likelihoods have a ‘useJava’ flag (see below) to ensure only Java is considered.
It is worth trying the -beagle_SSE option, which uses a CPU version optimised for the SSE instruction set that most CPUs support.
ThreadedTreeLikelihood is a treelikelihood that splits up the patterns into equal parts and uses a thread for each of the parts. (In versions before v2.4.0, it was contained in the BEASTlabs package; see Manage packages on how to install.)
The number of parts is determined by the number of threads (and can be specified using the ‘threads’ attribute). To use the ThreadedTreeLikelihood, open the XML file in a text editor and change the spec attribute of the treelikelihood element from TreeLikelihood to ThreadedTreeLikelihood.
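As a rough sketch, the change could look like the fragment below. The id, data and tree values are hypothetical placeholders, not taken from any particular analysis. The short class name assumes your BEAST version can resolve it; a fully qualified class name may be needed otherwise.

```xml
<!-- Before: standard tree likelihood (id, data and tree are placeholder values) -->
<distribution id="treeLikelihood.dna" spec="TreeLikelihood" data="@dna" tree="@Tree.t:dna">
    <!-- siteModel and branchRateModel elements stay exactly as they are -->
</distribution>

<!-- After: only the spec attribute changes -->
<distribution id="treeLikelihood.dna" spec="ThreadedTreeLikelihood" data="@dna" tree="@Tree.t:dna">
    <!-- siteModel and branchRateModel elements stay exactly as they are -->
</distribution>
```

Everything nested inside the element is left untouched, so the edit can be reverted by restoring the original spec value.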
There is a flag ‘useJava’ to indicate the calculation should use the Java treelikelihood, and not consider BEAGLE. (When using BEAST versions before v2.4.0, to use BEAGLE, set useJava="false".)
If you want to limit the number of threads used for splitting the patterns, use the threads attribute. For example, to limit the number of partitions to 3, set threads="3" on the ThreadedTreeLikelihood element.
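A minimal sketch of how the two attributes described above might appear together on the likelihood element. The id, data and tree values are hypothetical placeholders, and useJava is shown only to illustrate where the flag goes.

```xml
<!-- threads="3" caps the number of pattern partitions (one thread each) at 3 -->
<!-- useJava="true" asks for the Java treelikelihood, so BEAGLE is not considered -->
<distribution id="treeLikelihood.dna" spec="ThreadedTreeLikelihood"
              threads="3" useJava="true"
              data="@dna" tree="@Tree.t:dna">
    <!-- siteModel and branchRateModel elements as in the original file -->
</distribution>
```

If the threads attribute is left out, the number of parts follows the number of threads BEAST was started with.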
To run multiple BEAGLE instances with ThreadedTreeLikelihood, the number of threads used to start BEAST must be at least the number of BEAGLE instances. To use 3 BEAGLE instances, you would therefore start BEAST with both -threads 3 and -instances 3 (for versions before v2.4.0, use the -beagle_instances flag instead of -instances). Using only the ‘instances’ flag but not the ‘threads’ flag results in just 1 thread being created.
The beast-classic add-on (see Manage packages on how to install) has a Treelikelihood used for discrete phylogeography and ancestral reconstruction called AncestralTreelikelihood. Since it typically only uses a single site, threading does not help.
There is a flag ‘useJava’ to indicate the calculation should not consider BEAGLE.
Multiple partitions: CompoundDistribution
If you have multiple partitions, you can consider the useThreads flag of CompoundDistribution, which is false by default. If set to true, all distributions inside the CompoundDistribution will be calculated in parallel using the number of threads used to run BEAST.
If the treelikelihoods share parameters, e.g. through a relaxed clock model, this may not always be safe.
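As an illustration, enabling the flag might look like the fragment below. The partition ids and the data and tree references are hypothetical, and the spec value follows what BEAUti-generated files commonly use, so it may differ in your BEAST version.

```xml
<!-- useThreads="true": the nested distributions are calculated in parallel -->
<distribution id="likelihood" spec="util.CompoundDistribution" useThreads="true">
    <!-- one tree likelihood per partition (placeholder ids and references) -->
    <distribution id="treeLikelihood.gene1" spec="TreeLikelihood" data="@gene1" tree="@Tree.t:tree">
        <!-- siteModel and branchRateModel for partition 1 -->
    </distribution>
    <distribution id="treeLikelihood.gene2" spec="TreeLikelihood" data="@gene2" tree="@Tree.t:tree">
        <!-- siteModel and branchRateModel for partition 2 -->
    </distribution>
</distribution>
```

Given the caveat about shared parameters, it is worth comparing results with useThreads set to true and to false on a short run before committing to a long analysis.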
For large analyses, getting through burn-in takes a considerable amount of time. When a good starting point can be found, burn-in can be reduced, and a particle filter approach allows for finding a good starting point relatively efficiently.
For the adventurous, there is the beast.inference.ParticleLauncherByFile method in the BEASTLabs add-on (see Manage packages on how to install). It runs a number of chains in parallel that communicate with each other through the file system at set intervals. When a chain gets too far behind, it samples a state from the other chains proportional to their posteriors (effectively taking the most likely state most of the time).
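To convert a BEAST XML file, replace the opening tag of the run entry with a run element for the particle filter, followed by an mcmc element that carries the attributes of the original run entry. Do not forget to close the mcmc element, just before the </run> closing tag.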
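The conversion could be sketched as follows. This is an illustration under assumptions, not the exact syntax: the spec value on the new run element is assumed (check the BEASTLabs package for the actual class name), the attribute values are placeholders, and only the rootdir and nrofparticles attributes are named in this document.

```xml
<!-- Before: a plain MCMC run (id and chainLength are placeholder values) -->
<run id="mcmc" spec="MCMC" chainLength="10000000">
    <!-- state, distribution, operators and loggers -->
</run>

<!-- After: the original MCMC becomes an mcmc element nested inside the new run element -->
<!-- spec below is an ASSUMED class name; verify it against your BEASTLabs version -->
<run spec="beast.inference.ParticleFilter" nrofparticles="100" rootdir="/tmp">
    <mcmc id="mcmc" spec="MCMC" chainLength="10000000">
        <!-- state, distribution, operators and loggers, unchanged -->
    </mcmc> <!-- close the mcmc element just before the run closing tag -->
</run>
```

The nested mcmc element keeps all attributes and children of the original run element, so only the outer wrapper is new.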
Inside rootdir, it creates subdirectories particleX where X is a number from 0 to nrofparticles.
The text content of the run element is interpreted as a script, and it is executed for every particle. There are a few variables that are replaced before launching the script
You have to adapt the script for your own cluster. The example below is for the NeSI cluster using LoadLeveler.