- 1 Introduction
- 2 Installing and running BEAST
- 3 Evolutionary rates and time scales
- 4 Parameters
- 5 Optimizing operators
- 6 Starting tree and fixing trees
- 7 Setting up models
- 8 Effective Sample Size (ESS) of parameters
- 9 Interpreting the results
- 10 Error messages
- 10.1 What does the error message I am getting mean?
- 10.2 Why is BEAST telling me that “the initial model is invalid because the state has a zero likelihood”?
- 10.3 BEAST seems to run but the posterior is reported as ? (question mark) and none of the values change
- 10.4 I am getting the error “java.lang.OutOfMemoryError: Java heap space”. Is there a way I can increase the memory for the program?
- 10.5 I still get an OutOfMemoryError in TreeAnnotator even when I use all the memory in the machine
- 11 How to set up tips sampling
- 12 Questions that haven’t been done yet:
Introduction
What can BEAST do?
- A description of some of the analyses that can be performed using BEAST can be found here.
- See also the list of tutorials.
Installing and running BEAST
How do I install and run BEAST?
- That depends on the operating system you are using. Please look at the README file that was included in the package you downloaded.
How do I use BEAGLE with BEAST?
- How it is installed and used with BEAST depends on the platform: https://github.com/beagle-dev/beagle-lib/wiki
Evolutionary rates and time scales
I have X sequences sampled over a Y year time span; are they enough to estimate the substitution rate?
- It depends on the substitution rate. If the substitution rate is high enough to produce a substantial number of substitutions in the Y years then it may work. The easiest thing is to simply try it and see. It would be best to start with a simple model: HKY, constant population size and a strict molecular clock. If that seems to behave well then you might consider a more realistic model, depending on the question you are trying to answer. If the time span is insufficient to provide information about the substitution rate then BEAST will not converge: the age of the root of the tree will simply increase to a very large value, and the rate will drop towards zero.
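As a rough back-of-the-envelope check before running anything, the expected number of substitutions along one lineage is approximately the rate times the alignment length times the sampling time span. The sketch below is only that arithmetic; the function name and the example rates, alignment length and time span are hypothetical and not taken from any particular data set.

```python
def expected_substitutions(rate_per_site_per_year, alignment_length, time_span_years):
    """Expected substitutions along a single lineage over the sampling span."""
    return rate_per_site_per_year * alignment_length * time_span_years


# Hypothetical examples: 1000 sites sampled over 20 years.
fast_virus = expected_substitutions(1e-3, 1000, 20)  # on the order of tens of changes
slow_gene = expected_substitutions(1e-8, 1000, 20)   # far fewer than one change
```

If the result is well below one, the sampling dates probably carry little information about the rate, which matches the non-convergence behaviour described above.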
Parameters
What do all the parameters do/mean?
Optimizing operators
Do I need to worry about optimizing operators if my ESSs are okay?
- No. Tuning the operators only increases the efficiency of the sampling, giving better ESSs for the same chain length. If you are already getting suitable ESSs then that is fine. See this tutorial for more details about this subject.
Why does the operator analysis continue to suggest that I decrease my size attribute in order to improve my acceptance probability?
- The size value in the operator should be proportional to the height of your tree (say about 10% initially). If the tree is uncalibrated then the height of the tree is given in substitutions per site, which can be very small.
Starting tree and fixing trees
Can you designate a user-defined starting tree?
- Yes, you can insert a NEWICK format tree into the XML to act as a starting tree – see this how-to: Fix starting tree
How can you keep this topology constant while estimating other parameters, e.g., node height?
- Remove all the operators that act on the Tree. In BEAUti you can do this by setting the weight of the following operators to zero: narrow exchange, wide exchange, Wilson Balding and subtree slide. Alternatively, you can remove these operators from the XML. Without these operators, or when they have zero weight, the actual topology of the tree won’t change.
Setting up models
How do I tell BEAST to use an outgroup?
- The simple answer is that you may not want to: BEAST will sample the root position along with the rest of the nodes in the tree. If you then calculate the proportion of trees that have a particular root, you obtain a posterior probability for this root position. However, if you have a strong prior for an outgroup then you can constrain the ingroup to be monophyletic. See this tutorial for details of how to do this.
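The "proportion of trees that have a particular root" can be sketched as a simple count. This is a minimal illustration, not a BEAST or TreeAnnotator feature: it assumes you have already extracted, for each post-burn-in tree, the set of taxa on one side of the root (always the same side, for example the one that excludes a fixed reference taxon). The function name and the sample data are hypothetical.

```python
from collections import Counter


def root_posterior(root_splits):
    """Proportion of sampled trees having each root split."""
    counts = Counter(root_splits)
    total = len(root_splits)
    return {split: n / total for split, n in counts.items()}


# Hypothetical sample of 10 trees: each entry is the set of taxa on one side of the root.
sampled_roots = (
    [frozenset({"A"})] * 7
    + [frozenset({"A", "B"})] * 2
    + [frozenset({"C"})] * 1
)
posterior = root_posterior(sampled_roots)
```

In this made-up sample the root that separates taxon A from the rest appears in 7 of 10 trees, so its estimated posterior probability is 0.7.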
How do I run BEAST without data to sample the Prior?
- In BEAUti, on the MCMC tab, click the Sample From Prior checkbox, save the XML and run with BEAST. In the XML, set sampleFromPrior='true' on the mcmc element.
Does it matter what order the Priors & Likelihoods come in the XML?
How do tree prior distributions affect estimation of rates and dates?
Effective Sample Size (ESS) of parameters
Interpreting the results
How do I do model comparison?
How do I summarize the posterior distribution of trees?
Error messages
What does the error message I am getting mean?
- Look at the Error Messages page for details of the different error messages.
Why is BEAST telling me that “the initial model is invalid because the state has a zero likelihood”?
- There are essentially three reasons that you can get the “initial model is invalid” error.
- Firstly, your initial tree could violate one or more of the boolean priors (priors with hard bounds, including constraints on the monophyly of clades). This can be fixed by providing a starting tree in the XML file that conforms to these constraints. See this tutorial for details about how to use a starting tree. The starting tree should contain all constrained clades, and the root node and any clade MRCAs should fit within any uniform priors (or translated exponential or lognormal priors). This means the provided NEWICK format tree should be rooted, with branch lengths in the units of time being used in the priors. Technically it doesn’t matter whether the initial tree is ultrametric because BEAST will adjust it until it is. However, this process alters the ages of the nodes of the tree and thus could cause it to violate the hard bounds. If the constraint is on the age of the root of the tree, BEAST can be told to rescale the entire tree so that the root has a particular age. This is done by adding an initialiser to the mcmc element:
<mcmc>
  <init spec='beast.util.TreeParser' initial='@tree' newick='(...my newick tree here...);'/>
  ...
</mcmc>
- Secondly, the initial tree could be particularly far from the optimum, which may cause the calculation of the likelihood of the sequences to fail. More technically, the likelihood of a particular site is calculated by traversing the tree and taking the product of the probability for each branch. If the individual probabilities are small (because the data doesn’t fit the tree) then this product can rapidly approach, and will eventually be rounded to, zero. This is more likely to occur if there are many sequences, as there are more probabilities to multiply. Once the likelihood for a given site is calculated, the logarithm is taken and then summed across all sites, so long sequences do not cause this problem. The only solution to this problem is to start with a better tree, such as the UPGMA option in BEAUti or a user-supplied tree that is reasonably close to optimal.
- Thirdly, if you are using calibrations and estimating or specifying a rate of substitution, it is possible that the initial value for the rate is too small or too large, which can also cause underflow problems when the branches are scaled from units of time to units of substitutions using the rate. One potential reason the rate might be inappropriate is that the calibrations and tree are given in units of millions of years (e.g., a root of 10My) whereas the rate is given in units of substitutions per year (e.g., a rate of 1.0E-8 subst. per year). If you multiply the initial rate by the initial age of the tree, the product should be in the sort of range you would expect for the genetic divergence of the gene you are using (probably 0.01-1.0, since genes are usually selected to have diversity but not so much that they are saturated).
- We have also added a mechanism for rescaling the likelihood of the tree as it is calculated to avoid numerical underflows. Turning this on should help for large trees in cases where the initial likelihood was zero. To turn it on, add scaling="always" to the treeLikelihood element(s) in the BEAST XML file. Search for the line:
<distribution ... id="treeLikelihood">
- and change it to:
<distribution ... id="treeLikelihood" scaling="always">
- If you are using codon partitioning there will be more than one treeLikelihood element.
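The underflow described in the second reason, the idea behind rescaling, and the rate-times-age check from the third reason can all be illustrated numerically. The sketch below is not BEAST's implementation: the function names, the rescaling threshold and the example probabilities are assumptions chosen for illustration, and the 0.01-1.0 range is the rule of thumb quoted above.

```python
import math


def naive_product(probs):
    """Multiply probabilities directly; underflows to 0.0 when there are many small terms."""
    result = 1.0
    for p in probs:
        result *= p
    return result


def rescaled_log_product(probs, threshold=1e-100):
    """Log of the product, moving the running product into a log term before it underflows."""
    partial = 1.0
    log_scale = 0.0
    for p in probs:  # assumes every p is strictly positive
        partial *= p
        if partial < threshold:
            log_scale += math.log(partial)
            partial = 1.0
    return log_scale + math.log(partial)


def divergence_is_plausible(rate, root_age, low=0.01, high=1.0):
    """Check that rate * root age falls in the rule-of-thumb divergence range."""
    return low <= rate * root_age <= high


# Hypothetical site with 200 small per-branch probabilities.
branch_probs = [0.01] * 200
direct = naive_product(branch_probs)           # rounded to zero in double precision
rescaled = rescaled_log_product(branch_probs)  # finite log value

# Root of 10 (in millions of years) with a rate of 1.0E-8 given per year: units disagree.
mismatched = divergence_is_plausible(1.0e-8, 10)
# Same rate converted to substitutions per site per million years.
consistent = divergence_is_plausible(1.0e-8 * 1e6, 10)
```

Here the direct product is exactly zero, so its logarithm would be undefined, while the rescaled version returns a finite log value; the unit mismatch gives a divergence of about 1.0E-7, far below the expected range.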
BEAST seems to run but the posterior is reported as ? (question mark) and none of the values change
- This can happen when using a complex coalescent prior such as the Logistic. The problem is that the particular set of starting parameters may result in probabilities close to zero. Try altering the starting values of the coalescent model parameters.
I am getting the error “java.lang.OutOfMemoryError: Java heap space”. Is there a way I can increase the memory for the program?
- Look at the Increasing Memory Usage page for details of increasing the memory available to BEAST and the other programs.
I still get an OutOfMemoryError in TreeAnnotator even when I use all the memory in the machine
- This is usually due to very large tree files being loaded into TreeAnnotator. It is unlikely that more than 10,000 trees are necessary to give good estimates of well supported nodes so if you are trying to load hundreds of thousands of trees then you may well have problems (and reach the memory limitations of your computer). When running BEAST it is a good idea to adjust the sampling frequency of the log files to obtain about 10,000 samples when you change the chain length. For existing log files, you can use LogCombiner to ‘thin’ out the trees at a lower sampling frequency.
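The sampling frequency that yields about 10,000 samples is simply the chain length divided by 10,000, and an existing tree file can be thinned by an integer factor of its current frequency. The helpers below are a sketch of that arithmetic only; their names are hypothetical and they do not call BEAST or LogCombiner.

```python
def log_every_for(chain_length, target_samples=10_000):
    """Sampling interval that yields roughly target_samples samples."""
    return max(1, chain_length // target_samples)


def thinned_log_every(current_log_every, n_trees, target_samples=10_000):
    """New sampling interval, a multiple of the current one, leaving at most target_samples trees."""
    factor = max(1, -(-n_trees // target_samples))  # ceiling division
    return current_log_every * factor


# Hypothetical run: 10 million steps, aiming for about 10,000 samples.
interval = log_every_for(10_000_000)

# Hypothetical existing tree file: 200,000 trees logged every 1,000 steps.
new_interval = thinned_log_every(1_000, 200_000)
```

In these examples a chain of 10 million steps would be logged every 1,000 steps, and the existing file of 200,000 trees would be thinned to one tree every 20,000 steps.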
How to set up tips sampling
- Create a prior on the tip you want to sample. In BEAUti you can create it in the priors panel by clicking the little + button at the bottom. You can specify a taxon, say ‘Human’, give it a name like HumanTip that identifies the taxon. Creating the prior ensures the tip value gets logged in the tracelog.
- Make sure the prior is on the tip. For the taxon for which you want to specify a prior distribution, use the combobox next to the button with the name of the taxonset on it. Choose “uniform” if you want to specify a range, “lognormal” (recommended, since it does not extend below zero) for a log-normal distribution, or “normal” (easier to interpret, but with the danger of putting considerable probability mass in the future) for a normal distribution. Then click the little triangle next to the taxon button; it should show a panel where you can set the parameters of the distribution, along with some statistics to help you get an impression of the parameter choices.
- Also in the priors panel, when you click the little triangle next to the prior, it shows the ‘tipsonly’ flag. Make sure it is checked.
- Add an operator to sample the tip. BEAUti support is not optimal. The best way to do this is to save the file from BEAUti, open it in a text editor and add the following XML fragment:
<operator id="TipDatesRandomWalker" windowSize="1" spec="TipDatesRandomWalker" taxonset="@HumanTip" tree="@Tree.t:dna" weight="1.0"/>
Make sure the taxonset has the same name as the taxonset you specified, so we have taxonset="@HumanTip" when the tip is called HumanTip (note: leave the @ in there). If there are different date ranges for tips, you need to specify a TipDatesRandomWalker for each of them. Make sure the id attribute is unique, for example by numbering them:
<operator id="TipDatesRandomWalker1" windowSize="1" spec="TipDatesRandomWalker" taxonset="@HumanTip" tree="@Tree.t:dna" weight="1.0"/>
<operator id="TipDatesRandomWalker2" windowSize="1" spec="TipDatesRandomWalker" taxonset="@ChimpTip" tree="@Tree.t:dna" weight="1.0"/>
<operator id="TipDatesRandomWalker3" windowSize="1" spec="TipDatesRandomWalker" taxonset="@GibbonTip" tree="@Tree.t:dna" weight="1.0"/>
Also make sure that the tree attribute refers to the tree that you want to sample (here tree="@Tree.t:dna"). Typically, it is named "Tree.t:" + the partition name, or if you renamed the tree in the partitions panel, use that name.
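When there are many tips to sample, the operator fragments can be generated rather than typed by hand, which also keeps the id attributes unique. The following is a minimal sketch using Python's standard library; the helper name is hypothetical, and the attribute names and values are copied from the fragments above. The output still has to be pasted into the XML file saved from BEAUti.

```python
import xml.etree.ElementTree as ET


def tip_date_operators(taxon_sets, tree_id="Tree.t:dna", window_size="1", weight="1.0"):
    """Build one TipDatesRandomWalker operator per taxon set, numbering the ids."""
    fragments = []
    for index, name in enumerate(taxon_sets, start=1):
        operator = ET.Element("operator", {
            "id": f"TipDatesRandomWalker{index}",
            "windowSize": window_size,
            "spec": "TipDatesRandomWalker",
            "taxonset": f"@{name}",   # keep the @ prefix
            "tree": f"@{tree_id}",    # must match the tree you want to sample
            "weight": weight,
        })
        fragments.append(ET.tostring(operator, encoding="unicode"))
    return fragments


operators = tip_date_operators(["HumanTip", "ChimpTip", "GibbonTip"])
for fragment in operators:
    print(fragment)
```

Pass a different tree_id if the tree was renamed in the partitions panel.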
Questions that haven’t been done yet:
Creating XML Input
- What is a “Double”?
Complex models and hypothesis testing
- Is it possible to divide the data into partitions and assign different site models to each partition?
Yes, you can. Take a look at these tutorials.
The difference between the two tutorials is as follows. Tutorial 8: each gene (locus) has its own alignment. For example, gene1 is in “alignment1”, gene2 is in “alignment2” and gene3 is in “alignment3”.
I think that, following this tutorial, the alignments of the different genes (loci) can contain different taxa. For example, “alignment1” has taxa A, B, C and D; “alignment2” has taxa A, B, C and E; “alignment3” has taxa A, B, C, D and E.
Tutorial 6: all the genes (loci) are in one alignment. I think this tutorial will work well if all of your genes have the same taxa.
How do you test alternative demographic scenarios with BEAST?