BEAST 2

BEAST command line options

31 March 2021 by Remco Bouckaert

A previous post showed how to start BEAST from the command line of your operating system, which can make BEAST run faster and gives more flexibility. BEAST has a number of command line options that are not available from the GUI version. Here we have a look at these options and give some context on when to use them.

`-validate` Parse the XML, but do not run

If the XML does not parse, an error will be reported. If the XML will not start, for example because of incompatible priors, that will be reported as well. So, this option is useful for debugging XML.

`-seed` Specify a random number generator seed

Though BEAST uses random numbers for MCMC, random starting trees, etc., in practice these are generated by a deterministic algorithm that relies on a starting value called the seed. Different runs of BEAST with the same seed should result in identical traces, though small differences in hardware and Java versions may cause tiny deviations that result in different traces.

If the seed is not specified, it will be based on the clock time in milliseconds. If you start multiple instances of BEAST through a script, make sure to delay starting them by at least a millisecond, otherwise some runs may use the same seed, and the BEAST runs may be identical.
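Instead of relying on delays, a script can hand every run its own explicit seed. Here is a minimal Python sketch that only builds the command lines and does not start BEAST. The base seed, the number of runs, the file name `beast.xml` and the `run1_` style prefixes are assumptions chosen for illustration; the prefix is there because replicate runs of the same XML would otherwise write to the same log files.

```python
def beast_commands(xml_file, base_seed, n_runs):
    """Build one BEAST command line per replicate, each with a distinct seed."""
    commands = []
    for i in range(n_runs):
        seed = base_seed + i          # distinct by construction
        prefix = f"run{i + 1}_"       # hypothetical log file prefix per replicate
        commands.append(
            ["beast", "-seed", str(seed), "-prefix", prefix, xml_file]
        )
    return commands


for command in beast_commands("beast.xml", 127, 3):
    print(" ".join(command))
```

Each list can be passed to `subprocess.run` as is, or printed into a shell script or cluster job file.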

`-DF` and `-DFout` for flexible alignment management

With the -D name=value option, attribute values of the XML containing $(name) will be replaced by value, which is quite handy to parameterise the XML. However, command lines can become a bit unwieldy when there are many parameters, or values are quite large. Also, it is not possible to replace large sections, like all sequences.

Now, BEAST has a -DF option that specifies a file in say JSON format that defines name/value pairs, which is what a JSON dictionary provides quite naturally, and allows for multiple lines for values. For example, like so:

{
"sequences":"
<sequence taxon='D4Brazi82'>            ATGCGATGCG        </sequence>
<sequence taxon='D4ElSal83'>            ATGCGATGCG        </sequence>
...
<sequence taxon='D4Thai84'>             GTGCGATGCG        </sequence>
";
"datetrait":"D4Brazi82  = 1982,
                D4ElSal83  = 1983,
...
                D4Thai84   = 1984"
}

where ... means many more of the same. This allows for using the same analysis with multiple data sets (or multiple analyses with the same data set), which can be handy for well calibrated simulation studies or situations where the data set rapidly evolves.
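When the data set changes often, the definitions file can be generated by a script. The following Python sketch builds the same two entries, `sequences` and `datetrait`, from dictionaries. Note one assumption: Python's `json` module writes line breaks inside values as the escape `\n` rather than as literal line breaks like in the example above, and I have not verified that BEAST treats both forms the same, so check the generated XML before relying on it. The function name `make_definitions` is made up for this example.

```python
import json


def make_definitions(sequences, dates):
    """Build the name/value pairs for a -DF file.

    sequences: dict mapping taxon name to sequence string
    dates:     dict mapping taxon name to sampling year
    """
    sequence_xml = "\n".join(
        f"<sequence taxon='{taxon}'>{seq}</sequence>"
        for taxon, seq in sequences.items()
    )
    date_trait = ",\n".join(
        f"{taxon} = {year}" for taxon, year in dates.items()
    )
    return {"sequences": sequence_xml, "datetrait": date_trait}


definitions = make_definitions(
    {"D4Brazi82": "ATGCGATGCG", "D4ElSal83": "ATGCGATGCG", "D4Thai84": "GTGCGATGCG"},
    {"D4Brazi82": 1982, "D4ElSal83": 1983, "D4Thai84": 1984},
)

# Serialise to JSON text; write this string to e.g. definitions.json.
text = json.dumps(definitions, indent=2)
print(text)
```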

The resulting XML, where user-defined parameters are replaced by the information from the JSON file, is by default written to a file with the same name as the input XML file, but with .out added before .xml (so input beast.xml becomes output beast.out.xml). The output file can be specified using the -DFout option, e.g.

beast -DF definitions.json -DFout result.xml beast.xml

If no output is desired, you can output to /dev/null using -DFout /dev/null on OS X and Linux, or -DFout NUL on Windows.
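If a script needs to know where the substituted XML ends up when -DFout is not given, the default naming rule can be sketched in a few lines of Python. This mirrors only the rule as described in this post (insert .out before the extension); how BEAST handles input files without an .xml extension is not covered here, and the function name is made up.

```python
from pathlib import Path


def default_df_output(xml_path):
    """Return the default -DF output name: '.out' inserted before the extension."""
    path = Path(xml_path)
    return str(path.with_name(path.stem + ".out" + path.suffix))


print(default_df_output("beast.xml"))
```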

`-D` option for default values for user defined values

BEAST has the -D option to pass values for user defined parameters. For example, specifying chainLength='$(chainLength)' in the XML allows you to run BEAST with

bin/beast -D chainLength=1000000 beast.xml

and in the XML chainLength='$(chainLength)' is interpreted as chainLength='1000000'.

By specifying chainLength='$(chainLength=1000000)' in beast.xml, the value 1000000 is used as the default, so no -D option is required. The -D option can still be specified if a different value is desired.
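To make the substitution rule concrete, here is a small Python sketch of how `$(name)` and `$(name=default)` could be resolved. It illustrates the behaviour described above and is not BEAST's actual implementation. In particular, raising an error when neither a value nor a default is available is a choice made for this sketch.

```python
import re

# Matches $(name) and $(name=default)
PLACEHOLDER = re.compile(r"\$\(([^=)]+)(?:=([^)]*))?\)")


def substitute(xml_text, definitions):
    """Replace placeholders by user-supplied values, falling back to defaults."""

    def replace(match):
        name, default = match.group(1), match.group(2)
        if name in definitions:       # value given, e.g. through -D name=value
            return definitions[name]
        if default is not None:       # fall back to the default in the XML
            return default
        raise KeyError(f"no value or default for $({name})")

    return PLACEHOLDER.sub(replace, xml_text)


print(substitute("chainLength='$(chainLength=1000000)'", {}))
print(substitute("chainLength='$(chainLength=1000000)'", {"chainLength": "500"}))
```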

`-working` Change working directory to input file’s directory

This causes trace and tree files to be written in the same directory as the input XML file.

`-prefix` Specify a prefix for all output log filenames

This can be handy to put log files in a specific directory other than the working directory.

`-overwrite` Allow overwriting of log files

By default, log files are only written if the file does not already exist. When BEAST finds an already existing file, it halts and asks whether the file should be overwritten, with options Y (yes) and A (yes, and do the same for all other files). Any other key stops BEAST.

By using the -overwrite flag, this check is ignored and any existing file will be overwritten.

`-resume` Allow appending of log files

BEAST writes a state file with the name of the XML file and a state extension, e.g. for beast.xml it writes beast.xml.state in the working directory. When BEAST is started with the -resume flag, it loads the state from the state file, and appends new MCMC samples to the trace and tree log. If for some reason (power outage, disk full, exceeded time on cluster) the MCMC chain was interrupted and the trace and tree log are not of the same length, BEAST removes log entries from the longest file so that all logs resume from the same sample number.
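The log trimming on resume can be pictured with a short Python sketch. It is only an illustration of the idea and not BEAST's code: logs are represented as lists of (sample number, line) pairs, all logs are assumed to be non-empty, and the function names are made up.

```python
def state_file_for(xml_file):
    """Default state file name as described above: XML name plus '.state'."""
    return xml_file + ".state"


def align_logs(logs):
    """Trim every log to the last sample number that all logs have reached.

    logs: dict mapping log name to a list of (sample_number, line) pairs,
          sorted by sample number.
    """
    last_common = min(entries[-1][0] for entries in logs.values())
    return {
        name: [entry for entry in entries if entry[0] <= last_common]
        for name, entries in logs.items()
    }


logs = {
    "trace": [(0, "a"), (1000, "b"), (2000, "c"), (3000, "d")],
    "trees": [(0, "t0"), (1000, "t1"), (2000, "t2")],
}
aligned = align_logs(logs)
print(state_file_for("beast.xml"), {name: e[-1][0] for name, e in aligned.items()})
```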

`-statefile` Specify the filename for storing/restoring the state

Use this if you want to have the state file in a specific directory.

`-sampleFromPrior` samples from prior for MCMC analysis (by adding `sampleFromPrior="true"` in the first run element)

If you have not already specified sampleFromPrior="true" in the run element, this will cause the MCMC sampler to ignore the likelihood and only sample from the prior. It is important to sample from the prior to detect whether the data actually changes the parameter estimates relative to their priors, or just reconfirms what you already knew before the analysis.

It also shows how different priors can interact with each other. In particular tree priors may show unexpected behaviour when there are multiple calibrations that interfere with each other as well as with parameters of the tree prior.

`-strictversions` Use only package versions as specified in the `required` attribute

You can use this option to replicate an analysis with exactly those package versions of the original analysis. The way it works is that BEAUti adds a required attribute on the beast-element in the XML containing packages and their versions used to set up the analysis. Of course, you can edit the XML by hand and change version numbers and add packages if you like.

When you start BEAST with the -strictversions option, it only loads packages and versions as specified in the required attribute. Of course, these versions of the packages must be installed for BEAST to be able to load them. Therefore, the package manager in BEAUti allows specifying specific versions of a package to install, and multiple package versions can be installed side by side. By default, the latest installed version of a package will be loaded, unless the -strictversions flag is set. If you prefer installing packages from the command line, the addonmanager utility has a -version flag for specifying the package version to install.

`-loglevel` error,warning,info,debug,trace

This determines the amount of messages shown on screen. The least verbose level is error, which only shows severe error messages, and the most verbose is trace, which usually shows too much information, but can be useful for debugging. By default, the log level is set at info.
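The levels are ordered from least to most verbose, and a message is shown when its level is at or below the chosen log level. A tiny Python sketch of that rule, assuming this ordering is all there is to it (the function name is made up):

```python
LEVELS = ["error", "warning", "info", "debug", "trace"]  # least to most verbose


def is_shown(message_level, log_level="info"):
    """True if a message of message_level appears at the given -loglevel."""
    return LEVELS.index(message_level) <= LEVELS.index(log_level)


for level in LEVELS:
    print(level, is_shown(level))
```

With the default level info, this prints True for error, warning and info, and False for debug and trace.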

`-errors` Specify maximum number of numerical errors before stopping

By default, a single error will stop BEAST. Setting -errors to less than 0 results in the same behaviour. If -errors is set to a value larger than 1, unexpected behaviour may follow, depending on the severity of the error.

`-noerr` Suppress all output to standard error

This reduces the amount of error messages, which may speed things up a bit. Note that messages generated for log levels warning and error will be suppressed. This option may be removed in the future.

`-window` Provide a console window

In general, BEAST runs a bit slower this way than in a terminal, due to the work required to communicate with the console window.

`-options` Display an options dialog

This provides a GUI interface to BEAST and displays the familiar BEAST dialog shown when double clicking the BEAST icon. Not all command line options are available in the GUI version.

`-help` Shows all available options

When a new version is released, you might run BEAST with this option to see whether new options were added.

`-version` Print version number and stop

Print the currently installed BEAST package version number and stop. Handy to check which version you are using. Note that when you have say version 2.6.0 downloaded and installed, but the BEAST package is upgraded to v2.6.3, it will print v2.6.3, since that is the code being executed. The v2.6.0 beast script will only be used to load the v2.6.3 package, nothing else.

Performance and load balancing options

There are a number of performance and load balancing options in the table below. They interact with each other, so are addressed together here.

| Option | Short description |
| --- | --- |
| `-threads` | The number of computational threads to use (default 1), -1 for number of cores |
| `-instances` | Divide site patterns amongst number of threads (use with -threads option) |
| `-java` | Use Java only, no native implementations |
| `-beagle` | Use beagle library if available |
| `-beagle_info` | BEAGLE: show information on available resources |
| `-beagle_order` | BEAGLE: set order of resource use |
| `-beagle_CPU` | BEAGLE: use CPU instance |
| `-beagle_GPU` | BEAGLE: use GPU instance if available |
| `-beagle_SSE` | BEAGLE: use SSE extensions if available |
| `-beagle_single` | BEAGLE: use single precision if available |
| `-beagle_double` | BEAGLE: use double precision if available |
| `-beagle_scaling` | BEAGLE: specify scaling scheme to use |

Using the BEAGLE library usually helps considerably in speeding up BEAST runs, and is used by default. If you do not want to use BEAGLE, use the -java option, and all other BEAGLE options will be ignored. Once you have BEAGLE installed, you can run beast -beagle_info to find out which resources are available. If you have a suitable GPU (most laptops don’t), you can use it through CUDA or OpenCL (see BEAGLE site for details).

This is the output I get for beast -beagle_info on my laptop, but here is a more elaborate example with GPUs.

--- BEAGLE RESOURCES ---

0 : CPU
    Flags: PRECISION_SINGLE PRECISION_DOUBLE COMPUTATION_SYNCH EIGEN_REAL EIGEN_COMPLEX SCALING_MANUAL SCALING_AUTO SCALING_ALWAYS SCALERS_RAW SCALERS_LOG VECTOR_SSE VECTOR_NONE THREADING_NONE PROCESSOR_CPU FRAMEWORK_CPU

To choose a resource, you can use the -beagle, -beagle_CPU, -beagle_SSE, -beagle_GPU flags, which select the default, CPU, SSE and GPU resource respectively (unless the -java flag is used, then this option is ignored). If you have multiple partitions, it may be useful to use different resources (e.g. different GPUs) for different partitions. This can be done with the -beagle_order option, which takes a comma separated list of resources (see example).

The -beagle_single flag makes BEAGLE use single precision, which can be faster especially on GPUs, but is also less accurate. The -beagle_double option on the other hand makes BEAGLE use double precision.

When BEAGLE encounters an underflow (common with large trees), it uses a scaling technique to deal with it. The -beagle_scaling flag sets the scaling scheme, which is dynamic by default. It accepts the following values: none (no scaling at all), dynamic (rescale when needed and reuse scaling factors), always (rescale every node, every site, every time; slow but safe), delayed (postpone until the first underflow, then switch to always), and auto (BEAGLE automatic scaling, currently playing it safe with always).

Another thing that may or may not help (depending on your data) is using threads. For most analyses, the tree likelihood dominates the calculation time (though computationally expensive priors as in the MASCOT and PIQMEE packages also take a lot of computation). By default, when using -threads, each tree likelihood splits its alignment into equal parts and the likelihoods are calculated in parallel. Depending on the number of patterns in the alignment you can get better performance. It is also possible to get worse performance if the overhead of managing threads outweighs the speedup of calculating in parallel, so you must experiment with your data to see what works best.

Using the -instances option allows you to specify how many splits the tree likelihood makes. This is useful if you have many partitions, and some of the threading should go towards calculating tree likelihoods of partitions in parallel, while each partition itself is also calculated in parallel. If partitions are rather unbalanced, with some very large and some very small partitions, you can control the individual number of threads/splits per tree likelihood by setting the threads attribute on the corresponding ThreadedTreeLikelihoods.
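To get a feeling for what splitting an alignment into equal parts means, here is a Python sketch that divides a number of site patterns into near-equal contiguous ranges, one per thread. It only illustrates the idea; how BEAST actually assigns patterns to threads may differ, and the function name is made up.

```python
def split_patterns(n_patterns, n_threads):
    """Split pattern indices 0..n_patterns-1 into n_threads near-equal ranges.

    Returns a list of (start, end) pairs, with end exclusive.
    """
    base, extra = divmod(n_patterns, n_threads)
    ranges = []
    start = 0
    for i in range(n_threads):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        ranges.append((start, start + size))
        start += size
    return ranges


print(split_patterns(10, 3))
```

With few patterns per range there is little work per thread, which is where the thread management overhead can start to dominate.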

At CIPRES they did a large study resulting in the following settings:

| Data partitions | Data patterns | -threads | -instances | GPUs | Other beagle parameters |
| --- | --- | --- | --- | --- | --- |
| Nucleotide data | | | | | |
| 1 to 3 | <750 | 1 | 1 | | -beagle_SSE |
| 1 to 3 | 750-2,999 | 3 | 3 | | -beagle_SSE |
| 1 to 3 | 3,000-9,999 | 6 | 6 | | -beagle_SSE |
| 1 to 3 | 10,000-39,999 | 1 | 1 | 1 | -beagle_GPU |
| 1 to 3 | >=40,000 | 4 | 4 | 4 | -beagle_GPU -beagle_order 1,2,3,4 |
| 4 to 19 | <1,200 | 1 | 1 | | -beagle_SSE |
| 4 to 19 | 1,200-4,999 | 3 | 3 | | -beagle_SSE |
| 4 to 19 | 5,000-19,999 | 6 | 6 | | -beagle_SSE |
| 4 to 19 | >=20,000 | 1 | 1 | 1 | -beagle_GPU |
| >=20 | any | 4 | 1 | | -beagle_SSE |
| Amino acid data | | | | | |
| 1 | <5,000 | 1 | 1 | 1 | -beagle_GPU |
| 1 | >=5,000 | 4 | 4 | 4 | -beagle_GPU -beagle_order 1,2,3,4 |
| 2 to 39 | any | 1 | 1 | 1 | -beagle_GPU |
| >=40 | any | 24 | 1 | | -beagle_SSE |

(Source: Mark Miller’s post on the user list.)
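If you set up many analyses, the table above can be turned into a small lookup. The Python sketch below encodes the rows as data and returns the matching options as a string. The data type labels `nucleotide` and `aminoacid` and the function name are made up for this example, and the settings reflect the CIPRES hardware, so treat the result as a starting point for your own experiments rather than a recommendation for every machine.

```python
import math

INF = math.inf
SSE = "-beagle_SSE"
GPU = "-beagle_GPU"
GPU4 = "-beagle_GPU -beagle_order 1,2,3,4"

# (data type, min partitions, max partitions, min patterns, max patterns,
#  threads, instances, other beagle parameters); all bounds are inclusive.
RULES = [
    ("nucleotide", 1, 3, 0, 749, 1, 1, SSE),
    ("nucleotide", 1, 3, 750, 2999, 3, 3, SSE),
    ("nucleotide", 1, 3, 3000, 9999, 6, 6, SSE),
    ("nucleotide", 1, 3, 10000, 39999, 1, 1, GPU),
    ("nucleotide", 1, 3, 40000, INF, 4, 4, GPU4),
    ("nucleotide", 4, 19, 0, 1199, 1, 1, SSE),
    ("nucleotide", 4, 19, 1200, 4999, 3, 3, SSE),
    ("nucleotide", 4, 19, 5000, 19999, 6, 6, SSE),
    ("nucleotide", 4, 19, 20000, INF, 1, 1, GPU),
    ("nucleotide", 20, INF, 0, INF, 4, 1, SSE),
    ("aminoacid", 1, 1, 0, 4999, 1, 1, GPU),
    ("aminoacid", 1, 1, 5000, INF, 4, 4, GPU4),
    ("aminoacid", 2, 39, 0, INF, 1, 1, GPU),
    ("aminoacid", 40, INF, 0, INF, 24, 1, SSE),
]


def recommend(data_type, partitions, patterns):
    """Return the options from the first table row matching the data set."""
    for kind, p_lo, p_hi, n_lo, n_hi, threads, instances, flags in RULES:
        if (kind == data_type
                and p_lo <= partitions <= p_hi
                and n_lo <= patterns <= n_hi):
            return f"-threads {threads} -instances {instances} {flags}"
    raise ValueError("no matching row in the table")


print(recommend("nucleotide", 2, 5000))
```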

Note: BEAGLE is not used for specialised tree likelihoods as in SNAPP or SNAPPER, and due to the structure of these likelihoods and the nature of BEAGLE probably never will be.
