31 March 2021 by Remco Bouckaert
A previous post showed how to start BEAST from the command line of your operating system, which can make BEAST run faster and gives more flexibility. BEAST has a number of command line options that are not available from the GUI version. Here we have a look at these options and give some context on when to use them.
-validate
Parse the XML, but do not run
If the XML does not parse, an error is reported. If the XML parses but the analysis cannot start, for example because of incompatible priors, that is reported as well. This makes the option useful for debugging XML.
-seed
Specify a random number generator seed
Though BEAST uses random numbers for MCMC, random starting trees, etc., in practice these are generated by a deterministic algorithm that relies on a starting value called the seed. Different runs of BEAST with the same seed should result in identical traces, though small differences in hardware and Java versions may cause tiny deviations that result in different traces.
If the seed is not specified, it will be based on the clock time in milliseconds. If you start multiple instances of BEAST through a script, make sure to delay starting them by at least a millisecond, otherwise some runs may use the same seed, and those BEAST runs may be identical.
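One way to sidestep clock-based seeds when scripting replicate runs is to pass an explicit, distinct seed to each run. The following is a minimal sketch, not a recommended setup: the function name launch_replicates, the base seed of 1000 and the run1-, run2- prefixes are arbitrary choices made for illustration, and the -prefix option (described further down) is only used here to keep the log files of the replicates apart. The script is a dry run by default and merely prints the commands.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set BEAST=beast (or the path to your beast script) to really launch them.
BEAST="${BEAST:-echo beast}"

# Start N runs of one XML file, each with its own explicit seed.
# usage: launch_replicates XML N [BASE_SEED]
launch_replicates() {
  xml="$1"; n="$2"; base_seed="${3:-1000}"
  i=1
  while [ "$i" -le "$n" ]; do
    # Runs are started one after the other; append '&' to start them concurrently.
    $BEAST -seed $((base_seed + i)) -prefix "run$i-" "$xml"
    i=$((i + 1))
  done
}

launch_replicates beast.xml 3
```

Because every seed is written out explicitly, each replicate can later be rerun with the same seed.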
-DF and -DFout for flexible alignment management
With the -D name=value option, attribute values of the XML containing $(name) will be replaced by value, which is quite handy for parameterising the XML. However, command lines can become a bit unwieldy when there are many parameters, or when values are quite large. Also, it is not possible to replace large sections, like all sequences.
BEAST also has a -DF option that specifies a file, in say JSON format, that defines name/value pairs. A JSON dictionary provides these quite naturally, and values are allowed to span multiple lines. For example:
{
"sequences":"
<sequence taxon='D4Brazi82'> ATGCGATGCG </sequence>
<sequence taxon='D4ElSal83'> ATGCGATGCG </sequence>
...
<sequence taxon='D4Thai84'> GTGCGATGCG </sequence>
",
"datetrait":"D4Brazi82 = 1982,
D4ElSal83 = 1983,
...
D4Thai84 = 1984"
}
where ... means many more of the same. This allows for using the same analysis with multiple data sets (or multiple analyses with the same data set), which can be handy for well-calibrated simulation studies or for situations where the data set changes rapidly.
The resulting XML, where user-defined parameters are replaced by the information from the JSON file, is by default written to a file with the same name as the input XML file, but with .out added before .xml (so input beast.xml becomes output beast.out.xml). The output file can be specified using the -DFout option, e.g.
beast -DF definitions.json -DFout result.xml beast.xml
If no output is desired, you can output to /dev/null using -DFout /dev/null on OS X and Linux, or -DFout NUL on Windows.
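To run the same analysis on several data sets, the -DF and -DFout options can be combined in a small loop. This is only a sketch under assumptions: the function name run_with_definitions and the file names dengue1.json and dengue2.json are hypothetical, and deriving the -DFout and -prefix values from the definitions file name is just one possible convention. As written it is a dry run that prints the commands.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set BEAST=beast (or the path to your beast script) to really launch them.
BEAST="${BEAST:-echo beast}"

# Run one XML template against several definition files.
# usage: run_with_definitions TEMPLATE.xml DEFS.json [DEFS.json ...]
run_with_definitions() {
  template="$1"; shift
  for defs in "$@"; do
    # Strip directory and .json so each data set gets its own output names.
    name=$(basename "$defs" .json)
    $BEAST -DF "$defs" -DFout "$name.xml" -prefix "$name-" "$template"
  done
}

run_with_definitions beast.xml dengue1.json dengue2.json
```

Each data set then gets its own expanded XML and its own log file prefix, so the runs do not write over each other's output.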
-D option: default values for user-defined parameters
BEAST has the -D option to pass values for user-defined parameters. For example, specifying chainLength='$(chainLength)' in the XML allows you to run BEAST with
bin/beast -D chainLength=1000000 beast.xml
and in the XML chainLength='$(chainLength)' is interpreted as chainLength='1000000'.
By specifying chainLength='$(chainLength=1000000)' in beast.xml, the value 1000000 is used as the default, so no -D option is required. However, a different value can still be specified with -D if desired.
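The substitution rules described above can be imitated with sed to see what an attribute ends up as. This is only an emulation written for illustration, not BEAST's own code: the helper names apply_value and apply_defaults are made up, and the sketch assumes that names and values contain no characters that are special to sed (such as /, & or \).

```shell
#!/bin/sh
# Emulation only: mimics the $(name) / $(name=default) substitution with sed.

# Replace $(NAME) and $(NAME=default) by an explicit value, like -D NAME=VALUE.
# usage: apply_value NAME VALUE   (reads stdin, writes stdout)
apply_value() {
  sed -E 's/\$\('"$1"'(=[^)]*)?\)/'"$2"'/g'
}

# Replace every remaining $(name=default) by its default.
apply_defaults() {
  sed -E 's/\$\([A-Za-z_][A-Za-z0-9_]*=([^)]*)\)/\1/g'
}

attr="chainLength='\$(chainLength=1000000)'"

# No -D given: the default is used, giving chainLength='1000000'.
printf '%s\n' "$attr" | apply_defaults

# Equivalent of -D chainLength=5000: the explicit value wins, giving chainLength='5000'.
printf '%s\n' "$attr" | apply_value chainLength 5000 | apply_defaults
```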
-working
Change working directory to input file’s directory
This causes trace and tree files to be written in the same directory as the input XML file.
-prefix
Specify a prefix for all output log filenames
This can be handy to put log files in a specific directory other than the working directory.
-overwrite
Allow overwriting of log files
By default, log files are only written if they do not already exist. When BEAST finds an existing file, it halts and asks whether the file should be overwritten, with the options Y (yes) and A (yes, and do the same for all other files). Any other key stops BEAST.
By using the -overwrite flag, this check is skipped and any existing file will be overwritten.
-resume
Allow appending of log files
BEAST writes a state file named after the XML file with the extension state, e.g. for beast.xml it writes beast.xml.state in the working directory. When BEAST is started with the -resume flag, it loads the state from the state file and appends new MCMC samples to the trace and tree logs. If for some reason (power outage, disk full, exceeded time on a cluster) the MCMC chain was interrupted and the trace and tree logs are not of the same length, BEAST removes the surplus log entries from the longest file so that all logs continue from the same sample number.
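On a cluster with a time limit it can be convenient to let the job script decide between a fresh start and a resume. The sketch below assumes the default state file name described above (the XML file name plus .state, in the current working directory) and that neither -working nor -statefile is used. The function name run_or_resume is hypothetical. It is a dry run by default and only prints the command.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set BEAST=beast (or the path to your beast script) to really launch them.
BEAST="${BEAST:-echo beast}"

# Start a fresh run, or resume when a state file from an earlier run exists.
# usage: run_or_resume XML
run_or_resume() {
  xml="$1"
  # Default state file: XML file name plus .state, in the working directory.
  state="$(basename "$xml").state"
  if [ -f "$state" ]; then
    $BEAST -resume "$xml"
  else
    $BEAST "$xml"
  fi
}

run_or_resume beast.xml
```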
-statefile
Specify the filename for storing/restoring the state
Use this if you want to have the state file in a specific directory.
-sampleFromPrior
Samples from the prior for MCMC analysis (by adding sampleFromPrior="true" in the first run element)
If you have not already specified sampleFromPrior="true" in the run element, this will cause the MCMC sampler to ignore the likelihood and only sample from the prior. It is important to sample from the prior to detect whether the data actually move the posterior away from the priors, or just reconfirm what you already knew before the analysis.
It also shows how different priors can interact with each other. In particular tree priors may show unexpected behaviour when there are multiple calibrations that interfere with each other as well as with parameters of the tree prior.
-strictversions
Use only package versions as specified in the required attribute
You can use this option to replicate an analysis with exactly the package versions of the original analysis. The way it works is that BEAUti adds a required attribute to the beast element in the XML, listing the packages and their versions used to set up the analysis. Of course, you can edit the XML by hand to change version numbers and add packages if you like.
When you start BEAST with the -strictversions option, it only loads packages and versions as specified in the required attribute. Of course, these versions of the packages must be installed for BEAST to be able to load them. Therefore, the package manager in BEAUti allows specifying which version of a package to install, and multiple versions of a package can be installed side by side. By default, the latest installed version of a package is loaded, unless the -strictversions flag is set. If you prefer installing packages from the command line, the addonmanager utility has a -version flag for specifying the package version to install.
-loglevel
error,warning,info,debug,trace
This determines the amount of messages shown on screen. The least verbose level is error, which only shows severe error messages, and the most verbose is trace, which usually shows too much information but can be useful for debugging. By default, the log level is set to info.
-errors
Specify maximum number of numerical errors before stopping
By default, a single error will stop BEAST. Setting -errors to less than 0 results in the same behaviour. If -errors is set to a value larger than 1, unexpected behaviour may follow, depending on the severity of the error.
-noerr
Suppress all output to standard error
This reduces the amount of output, which may speed things up a bit. Note that messages generated for log levels warning and error will be suppressed as well. This option may be removed in the future.
-window
Provide a console window
In general, BEAST will run a bit slower this way than in a terminal, due to the work required to communicate with the console window.
-options
Display an options dialog
This provides a GUI interface to BEAST and displays the familiar BEAST dialog shown when double clicking the BEAST icon. Not all command line options are available in the GUI version.
-help
Shows all available options
When a new version is released, it is worth checking this list to see whether new options were added.
-version
Print version number and stop
Print the currently installed BEAST package version number and stop. Handy to check which version you are using. Note that when you have say version 2.6.0 downloaded and installed, but the BEAST package is upgraded to v2.6.3, it will print v2.6.3, since that is the code being executed. The v2.6.0 beast script will only be used to load the v2.6.3 package, nothing else.
Performance and load balancing options
There are a number of performance and load balancing options in the table below. They interact with each other, so are addressed together here.
Option | Short description |
---|---|
-threads | The number of computational threads to use (default 1), -1 for number of cores |
-instances | divide site patterns amongst number of threads (use with -threads option) |
-java | Use Java only, no native implementations |
-beagle | Use beagle library if available |
-beagle_info | BEAGLE: show information on available resources |
-beagle_order | BEAGLE: set order of resource use |
-beagle_CPU | BEAGLE: use CPU instance |
-beagle_GPU | BEAGLE: use GPU instance if available |
-beagle_SSE | BEAGLE: use SSE extensions if available |
-beagle_single | BEAGLE: use single precision if available |
-beagle_double | BEAGLE: use double precision if available |
-beagle_scaling | BEAGLE: specify scaling scheme to use |
Using the BEAGLE library usually helps considerably in speeding up BEAST runs, and BEAGLE is used by default. If you do not want to use BEAGLE, use the -java option, and all other BEAGLE options will be ignored. Once you have BEAGLE installed, you can run beast -beagle_info to find out which resources are available. If you have a suitable GPU (most laptops don’t), you can use it through CUDA or OpenCL (see the BEAGLE site for details).
This is the output I get for beast -beagle_info on my laptop, but here is a more elaborate example with GPUs.
--- BEAGLE RESOURCES ---
0 : CPU
Flags: PRECISION_SINGLE PRECISION_DOUBLE COMPUTATION_SYNCH EIGEN_REAL EIGEN_COMPLEX SCALING_MANUAL SCALING_AUTO SCALING_ALWAYS SCALERS_RAW SCALERS_LOG VECTOR_SSE VECTOR_NONE THREADING_NONE PROCESSOR_CPU FRAMEWORK_CPU
To choose a resource, you can use the -beagle, -beagle_CPU, -beagle_SSE and -beagle_GPU flags, which select the default, CPU, SSE and GPU resource respectively (unless the -java flag is used, in which case these options are ignored). If you have multiple partitions, it may be useful to use different resources (e.g. different GPUs) for different partitions. This can be done with the -beagle_order option, which takes a comma-separated list of resources (see example).
The -beagle_single flag makes BEAGLE use single precision, which can be faster, especially on GPUs, but is also less accurate. The -beagle_double option, on the other hand, makes BEAGLE use double precision.
When BEAGLE encounters an underflow (common with large trees), it uses a scaling technique to deal with it. The -beagle_scaling flag sets the scaling scheme, which is dynamic by default. It takes one of the following values: none (no scaling at all), dynamic (rescale when needed and reuse scaling factors), always (rescale every node, every site, every time; slow but safe), delayed (postpone until the first underflow, then switch to always), and auto (BEAGLE automatic scaling, currently playing it safe with always).
Another thing that may or may not help (depending on your data) is using threads. For most analyses, the tree likelihood dominates the calculation time (though computationally expensive priors, as in the MASCOT and PIQMEE packages, also take a lot of computation).
By default, when using -threads, each tree likelihood splits its alignment into equal parts and the likelihoods of these parts are calculated in parallel. Depending on the number of patterns in the alignment, you can get better performance. It is also possible to get worse performance if the overhead of managing threads outweighs the speedup of calculating in parallel, so you must experiment with your data to see what works best.
Using the -instances option allows you to specify how many splits each tree likelihood makes. This is useful if you have many partitions, and some of the threads should go towards calculating the tree likelihoods of different partitions in parallel, while each partition itself is also calculated in parallel. If partitions are rather unbalanced, with some very large and some very small, you can control the number of threads/splits per tree likelihood individually by setting the threads attribute on the corresponding ThreadedTreeLikelihoods.
At CIPRES they did a large study resulting in the following settings:
Data partitions | Data patterns | -threads | -instances | GPUs | Other BEAGLE parameters |
---|---|---|---|---|---|
Nucleotide data | | | | | |
1 to 3 | <750 | 1 | 1 | | -beagle_SSE |
1 to 3 | 750-2,999 | 3 | 3 | | -beagle_SSE |
1 to 3 | 3,000-9,999 | 6 | 6 | | -beagle_SSE |
1 to 3 | 10,000-39,999 | 1 | 1 | 1 | -beagle_GPU |
1 to 3 | >=40,000 | 4 | 4 | 4 | -beagle_GPU -beagle_order 1,2,3,4 |
4 to 19 | <1,200 | 1 | 1 | | -beagle_SSE |
4 to 19 | 1,200-4,999 | 3 | 3 | | -beagle_SSE |
4 to 19 | 5,000-19,999 | 6 | 6 | | -beagle_SSE |
4 to 19 | >=20,000 | 1 | 1 | 1 | -beagle_GPU |
>=20 | any | 4 | 1 | | -beagle_SSE |
Amino acid data | | | | | |
1 | <5,000 | 1 | 1 | 1 | -beagle_GPU |
1 | >=5,000 | 4 | 4 | 4 | -beagle_GPU -beagle_order 1,2,3,4 |
2 to 39 | any | 1 | 1 | 1 | -beagle_GPU |
>=40 | any | 24 | 1 | | -beagle_SSE |
(Source: Mark Miller’s post on the user list.)
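If you set up many analyses, the nucleotide part of the table can be written down as a small lookup. The sketch below simply transcribes those rows. The function name suggest_flags is hypothetical, the GPU rows assume the machine really has the GPUs listed in the table, and the settings reflect the CIPRES hardware, so treat the result as a starting point for your own experiments.

```shell
#!/bin/sh
# Print the flags the table above lists for nucleotide data.
# usage: suggest_flags PARTITIONS PATTERNS
suggest_flags() {
  parts="$1"; pats="$2"
  if [ "$parts" -le 3 ]; then
    if   [ "$pats" -lt 750 ];   then echo "-threads 1 -instances 1 -beagle_SSE"
    elif [ "$pats" -lt 3000 ];  then echo "-threads 3 -instances 3 -beagle_SSE"
    elif [ "$pats" -lt 10000 ]; then echo "-threads 6 -instances 6 -beagle_SSE"
    elif [ "$pats" -lt 40000 ]; then echo "-threads 1 -instances 1 -beagle_GPU"
    else echo "-threads 4 -instances 4 -beagle_GPU -beagle_order 1,2,3,4"
    fi
  elif [ "$parts" -le 19 ]; then
    if   [ "$pats" -lt 1200 ];  then echo "-threads 1 -instances 1 -beagle_SSE"
    elif [ "$pats" -lt 5000 ];  then echo "-threads 3 -instances 3 -beagle_SSE"
    elif [ "$pats" -lt 20000 ]; then echo "-threads 6 -instances 6 -beagle_SSE"
    else echo "-threads 1 -instances 1 -beagle_GPU"
    fi
  else
    # 20 or more partitions: same setting for any number of patterns.
    echo "-threads 4 -instances 1 -beagle_SSE"
  fi
}

# Example: 2 partitions with 5,000 patterns.
echo "beast $(suggest_flags 2 5000) beast.xml"
```

For the example call this prints beast -threads 6 -instances 6 -beagle_SSE beast.xml.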
Note: BEAGLE is not used for specialised tree likelihoods as in SNAPP or SNAPPER, and due to the structure of these likelihoods and the nature of BEAGLE probably never will be.