BEAST 1 vs 2 performance benchmarking

March 2016 by Remco Bouckaert, Tim Vaughan, Walter Xie, and Alexei Drummond

Recently, a few users reported problems with BEAST 2 performance, concluding it was worse than BEAST 1. This puzzled us, because BEAST 1 and 2 share the same core algorithms, and both spend most of their time on phylogenetic likelihood calculations, which are optimised using BEAGLE, a library shared by both programs. In fact, we recently changed the way that BEAST 2 handles the proportion invariant category, saving some phylogenetic likelihood calculations, so in theory it should be faster when the model includes a proportion of invariant sites. So, we became curious whether there are real performance differences between BEAST 1 and 2 and decided to do a benchmark. We expected them to perform roughly the same on GTR and GTR+G analyses, and BEAST 2 to do better on GTR+I and GTR+G+I analyses.

The picture below summarises the speed of BEAST 2 relative to BEAST 1 using 1, 2 and 4 threads on the 3 different operating systems. As you can see, the performance is very similar for GTR and GTR+G, with BEAST 2 being perhaps slightly faster (although this could be due to debugging that BEAST 1 performs at the start of the chain):

 

What we did

Analyses

BEAST can do many kinds of analyses, but for the purpose of this benchmark, we want to see whether the TreeLikelihood calculations, which typically dominate the computational time of MCMC runs, are comparable. To see the impact of the way BEAST 2 handles the proportion invariant category, we want analyses with and without such a category. And since many analyses use gamma rate heterogeneity with and without proportion invariant, we end up with four variants:

  • GTR
  • GTR + 4 gamma categories
  • GTR + proportion invariant
  • GTR + 4 gamma categories + proportion invariant

To keep things otherwise simple, we use a Yule tree prior, a strict clock and start with a random tree. To be practical, we set up the analysis in BEAUti 1 and 2, just importing an alignment, choosing the site model, setting the tree prior in BEAST 1 (BEAST 2 uses Yule by default) and saving to file. As it turns out, the analyses produced that way are almost the same, but there are some small differences in the operator settings. Due to auto-optimisation, they will eventually become almost the same, but to make the two analyses as equal as possible we edited the XML so that they have the same operator weights and tuning values. Also, the population size used to generate the random starting tree differed, so these were made the same as well.

The MCMC runs were run for 1 million steps, long enough that the slightly different ways in which extra likelihood calculations are done at the start for debugging purposes have little effect on the outcome. Also, with longer runs the effect of JIT compiler differences is diminished. We took care to run the different programs under the same circumstances, on a computer not doing any other jobs at the time.

This whole process was automated to deal with the various data sets we wanted to test.

Threading

The way to set up threads in BEAST 2 is a bit cumbersome (v2.4.0 improves things a lot), so perhaps the reported performance differences are due to different threading configurations. Therefore, we want to see what the impact of threading is. That led us to 3 variants:

  • 1 thread BEAGLE SSE
  • 2 threads BEAGLE SSE
  • 4 threads BEAGLE SSE

For BEAST 1, we used the flags -overwrite -beagle_instances. For BEAST 2 we used -overwrite -threads for the SSE runs. For all cases, we verified that both programs use the same settings of BEAGLE as reported at the start of the run.
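As a rough sketch of how such runs could be scripted, the Python snippet below builds the command lines for the three threading variants. The flags are the ones named above; the executable names beast1 and beast2, the XML file name, and the assumption that each threading flag is followed by the number of instances or threads are placeholders for illustration, not taken from the benchmark repository.

```python
def benchmark_command(program, xml_file, threads):
    """Build the argument list for one benchmark run.

    `program` is "beast1" or "beast2" (hypothetical executable names).
    """
    if program == "beast1":
        thread_flag = "-beagle_instances"
    elif program == "beast2":
        thread_flag = "-threads"
    else:
        raise ValueError(f"unknown program: {program}")
    # Both programs get -overwrite so repeated runs replace old log files.
    return [program, "-overwrite", thread_flag, str(threads), xml_file]


if __name__ == "__main__":
    for program in ("beast1", "beast2"):
        for threads in (1, 2, 4):
            # "M1044_GTR.xml" is a made-up file name for illustration.
            print(" ".join(benchmark_command(program, "M1044_GTR.xml", threads)))
```

Each argument list could then be handed to subprocess.run and timed, for instance with time.perf_counter, to collect run times per data set and thread count.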

Data sets

To get an impression of the impact of different data, we randomly selected a number of data sets of various sizes from treebase.org. We also used the data sets from the BEAST 1 examples benchmark directory, giving a total of 15 data sets.

dataset        taxa  sites  patterns
M1044            50   1133       493
M1366            41   1137       769
M1510            36   1812      1020
M1748            67    955       336
M1749            74   2253      1673
M1809            59   1824      1037
M336             27   1949       934
M3475            50    378       256
M501             29   2520      1253
M520             67   1098       534
M755             64   1008       407
M767             71   1082       446
benchmark1     1441     98       593
benchmark2       62  10869      5565
old_benchmark    17   1485       138

Versions

To have a fair comparison, we used the latest versions currently available, v1.8.3 and v2.4.0.

Results


The images below show the run time for 1, 2 and 4 threads on Linux, where 1.8.3(t0) denotes BEAST 1.8.3 run single-threaded without a thread pool.

  • 1 thread:
  • 2 threads:
  • 4 threads:

With an increasing number of threads, the difference in run time in seconds decreases, but BEAST 2 is almost always slightly faster than BEAST 1 in these comparisons. However, it turned out that the data sets are too small for four threads to be of much use: the four-threaded runs tended to be slower than the two-threaded ones, and two threads were optimal for most of these data sets for both BEAST versions. This may also be a function of the hardware used.

Cursory checks of ESSs for BEAST 1 and 2 in Tracer did not show any substantial difference, which is not surprising since the same mixture of operators was used. Also, parameter estimates tended to agree between some randomly selected analyses.

To make sure that the differences are not OS dependent, we ran the analyses on Windows 7, OS X and Linux, but did not find any substantial differences between the operating systems.

Conclusions

To our surprise, we found that BEAST 2 is slightly faster than BEAST 1. This is not what we expected since both programs perform the same analysis using the same BEAGLE library. Although we did our best to compare apples with apples, it is possible we overlooked something, so let us know if you find anything that can explain the differences in performance.

If you want to replicate these runs, you can find them in the benchmark repository on https://github.com/CompEvol/benchmark, which includes the data, some instructions and scripts to run them.

What is new in v2.4.0 and its packages

BEAST improved performance

BEAST is up to 2x faster when using proportion invariant sites and BEAGLE. When using proportion invariant in combination with gamma rate heterogeneity, it still is faster than before.

BEAST always had a “beagle_instance” command line flag that was essentially ignored. This is now replaced by a flag that actually works, and is named “instances” since it works both with java and BEAGLE tree likelihoods.

By default, the treelikelihood is now threaded for analyses using the Standard template in BEAUti. The number of threads used per treelikelihood is determined by the “instances” flag, but can be overridden using the “threads” attribute of ThreadedTreeLikelihood (which was migrated from the BEASTLabs package).

Further, there are a few minor performance improvements, including faster MRCAPrior handling.

A bug in StarBeastStartState was fixed so that it works with calibrations under priors other than the CalibratedYule prior.

BEAUti

The parametric distributions in the priors panel now show the mean as well as the median of the distribution.

There is better taxon management, preventing numbers from being added to taxon names.

The layout of the tip dates panel was improved to deal with changing window sizes.

A bug in *BEAST clock cloning is fixed.

Branch lengths can now be set to be in substitutions on the tree logger, which was previously not possible.

Improved JSON export of BEAST analyses (just use json as extension to the file when saving), now using a library with a more sensible license.

Package manager

The package manager has been changed so it can read all package information (including that of older versions) from a single package file. A bigger change is that BEAST is now treated as a separate package: when you start any of the BEAST applications, it loads the beast.jar file from the user package directory, and if it is not already there, will put a copy in that place. This makes it much easier to upgrade BEAST: just select BEAST in the package list and click the install/update button.

The GUI of the package manager is improved, among other things, showing by colour whether a package can be installed.

For developers

The biggest change with this release is really for developers, as outlined in a separate post here.

Packages

Due to some API changes, all packages have been re-released. Some packages have not been updated yet, but will be soon. New packages expected soon that have not been available before include startbeast2 and correlated characters.

What will change in v2.4.0 for developers

3 February 2016 by Remco Bouckaert

Most significant upcoming changes are

  • Annotated constructor support, so instead of using Input and initAndValidate you can use constructors and put most of the info that now goes into an Input in a @Param annotation. See, for example, AnnotatedRunnableTestClass and JSONTest.
  • Better JSON support for BEAST specifications, using a non-evil JSON library.
  • Removal of Exceptions in favour of classes that derive from Exception. This means that many methods that previously were throwing Exceptions, are throwing more specialised Exceptions, or nothing at all (if only RuntimeExceptions are thrown).
  • Cleaned up code, better conforming to Java 8 constructions and naming conventions. Also, attempt to remove the term ‘Plugin’ and replace with BEAST object where appropriate, since the term plugin is not used any more.

Code changes

This is a (still evolving) list of changes for package developers, containing possible changes required to make packages compatible with BEAST v2.4.0. Mostly, there are minor method signature changes and some member variable name changes, with the exception of Exceptions.

Exceptions

However, the biggest change is that throws Exception on initAndValidate will be removed. initAndValidate is supposed to check validity of values of inputs, and initialise. If for some reason this fails, the most appropriate exception to throw is IllegalArgumentException or RuntimeException.

Note that an overriding method can always throw fewer exceptions than the method it overrides declares, so you can change your code to work with both v2.3 and v2.4 by just removing or specialising the exception that is thrown.

Signature changes

The signature of BeautiDoc.deepCopyPlugin changes: it requires an extra argument to tell which partition to copy from.

Access changes

A number of package private members and methods are now protected to allow access from different packages, for example BeautiAlignmentProvider.getAlignments().

Most inputs are now final, so cannot be re-assigned.

Name changes

SubtreeSlide.fSize is now SubtreeSlide.size
InputEditor.m_plugin is now InputEditor.beastObject
BeautiConfig.inlinePlugin, collapsedPlugins, suppressPlugins are now inlineBEASTObject, collapsedBEASTObjects, suppressBEASTObjects

Deprecated

BEASTObject.outputs is now private. Use BEASTObject.getOutputs() to access the set of outputs.

What is new in v2.3.2 and its packages

The main reason for this release is to correct the path so that the Standard and StarBeast templates are visible under the templates menu. In v2.3.1 they got lost due to a new way of handling search paths. But there are many other reasons to upgrade to this release, as pointed out below.

BEAUti

A fix of import of traits from file when partitions are split into say codon positions.

A fix for cloning of scripts with partition information.

Set up weights correctly of FixedMeanRate operator when ascertainment correction is applied. Previously, ascertainment correction columns were included in the weights.

Allows ParameterInputEditor to edit Parameter inputs.

Ensure when focus is on an editable field in Taxon set dialog the last value entered is captured when switching tabs.

BEAST

A “-validate” command line option was added for parsing XML files without running them. This can be useful for just testing whether an XML file is correct, without having to stop the MCMC run and delete log files that are being created.

The MRCAPrior is now much more efficient. This gives performance improvements when there is little data and many MRCAPriors.

The way of generating random trees has been made more robust.

Storing the state file on Windows is more robust.

LogCombiner

Ensured that burn-in editing finishes properly in the GUI version of LogCombiner. Previously, the burn-in was ignored if the burn-in field was edited and the focus was left on the edit field when pressing the run button.

LogAnalyser

LogAnalyser now has one line per file mode, so you can analyse multiple files and instead of having all info printed as blocks it can output all results for a single log file on a single line. This is handy when importing in R for further post-processing.

A CLI script was added in the bin directory for ease of launch.

Error messages

More sensible error messages in many classes, for instance TreeParser, RPNCalculator, NodeReheight.

DensiTree is updated to version 2.2.4.

Packages

New releases of the following packages were created since the release of v2.3.1:

  • BACTER
  • GEO_SPHERE
  • STACEY
  • bModelTest
  • SNAPP
  • BASTA
  • RBS
  • MultiTypeTree
  • MASTER

What is new in v2.3.1 and its packages

BEAUti fixes

Robustify (un)linking of partitions.

Improved Fasta import.

BEAST

Support for probability vectors for uncertain sequences (see examples/testJukesCantorShortUncertain.xml), which are alignments where characters are encoded as a distribution.

Improved error messages.

TreeAnnotator

TreeAnnotator has a flag to make it use less memory. The original implementation loaded all trees in memory, which could take up quite a bit of space, especially because the default memory limits were set to 1GB (now increased to 4GB). Setting the flag causes TreeAnnotator not to load the whole tree set in memory, just a single tree at a time.

TreeAnnotator now recognises -b as flag to specify burn-in.

AppStore, LogCombiner, LogAnalyser

The command line interfaces are improved for these applications. Use application -help to see details.

Misc

Tree parsing is now based on Antlr instead of a hand crafted parser.

DensiTree updated to v2.2.3.

BinaryCovarion model: option to run as a reversible model. The original implementation did not take the hidden frequencies into account when setting up the rate matrix, resulting in an irreversible model when the hidden frequencies deviated from (0.5, 0.5). Set the mode="REVERSIBLE" attribute in the XML to make the model reversible.

Set log level by environment variable. There are five log levels (error, warning, info, debug, trace), which from left to right log increasingly more information. BEAST has a flag to set the level, but the other applications do not. Now, you can set the variable beast.log.level to the desired value. Either set it as an environment variable, e.g. export beast.log.level=debug on Linux, or pass it as a directive to java through java -Dbeast.log.level=debug -cp beast.jar .....

BASTA

A package for approximate structured coalescent, now allows Bayesian stochastic variable selection.

BACTER

A package providing limited support for ancestral recombination graphs a la ClonalOrigin, but with all the substitution models and other support provided by BEAST.
Documentation here.

BEASTLabs

More efficient handling of many monophyletic constraints through the MultiMonophyletic prior.

A few operators were added to deal with multiple monophyletic constraints, including restricted subtree slide (RestrictedSubtreeSlide), restricted nearest neighbour interchange (NNI), and restricted subtree prune regraft (SPR).

Others

Many refinements in other packages were made as well, so upgrading to the latest version will be worth it.

Common problems with ancestral state reconstruction/discrete phylogeography in BEAST

7 July 2015 by Remco Bouckaert

These are a few issues that can pop up when doing ancestral reconstruction, aka discrete phylogeography, as outlined in the Ancestral reconstruction tutorial following the model of Lemey et al, 2009.

Too many states

When doing a phylogeographical analysis, it is tempting to split up the samples in as fine a geographic classification as the available data allows — for instance by splitting the samples by country of origin. This can lead to a large number of states/countries. The more states are defined, the more rates need to be estimated. For an analysis with N states, at least N-1 rates need to be specified in order to be able to get from any one state to any other (possibly through a set of other states), and this is a requirement for the likelihood to be larger than zero.
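The connectivity requirement can be made concrete with a small sketch. The Python function below (all names are illustrative, nothing here is part of BEAST) treats states as nodes and the rates that are switched on as undirected edges, as in the symmetric case, and tests whether every state can be reached from every other.

```python
def all_states_connected(n_states, active_rates):
    """Return True if the active rates link all states into one component.

    `active_rates` is an iterable of (i, j) state pairs, 0-based, treated as
    undirected edges (the symmetric case).
    """
    parent = list(range(n_states))

    def find(x):
        # Follow parent links to the representative of x's component.
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n_states
    for i, j in active_rates:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            components -= 1
    return components == 1


# A chain A-B-C-D uses exactly N-1 = 3 rates and connects all 4 states.
chain = [(0, 1), (1, 2), (2, 3)]
print(all_states_connected(4, chain))             # True
print(all_states_connected(4, [(0, 1), (2, 3)]))  # False: two separate groups
```

With fewer than N-1 rates switched on, at least one state is always cut off, which is why N-1 is the minimum.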

So, it depends on the number of samples what a reasonable number of states can be; more samples allow for more states. I have seen a number of cases where it was attempted to use a number of states more than half the number of samples. In such cases, it makes sense to merge states (combine different countries into larger regions).

Note that ancestral reconstruction has no notion of how far samples are apart from each other, so it can only estimate rates based on state transitions in the tree informed by locations at the tips. Instead of using ancestral state reconstruction, you could use a form of continuous phylogeography, which tends to have more power since it has a notion of distance built in. If you do not know the exact point locations of the tips, tip locations can be sampled, or approximated by the mean of the region where the sample originated.

Analysis does not start

A common result of defining too many states is that the analysis does not start. You will see an error containing something like this:

Start likelihood: -Infinity after 11 initialisation attempts
P(posterior) = -Infinity (was NaN)
	P(prior) = -Infinity (was NaN)
		P(CoalescentConstant.t:chr1) = -224.91126226515757 (was NaN)
		P(GammaShapePrior.s:chr1) = -1.0 (was NaN)
		P(KappaPrior.s:chr1) = -1.8653600339742873 (was NaN)
		P(nonZeroRatePrior.s:location) = -Infinity (was NaN)
		P(PopSizePrior.t:chr1) = 1.2039728043259361 (was NaN)
		P(relativeGeoRatesPrior.s:location) = -350.99999999999994 (was NaN)
		P(geoclockPrior.c:location) = -6.915086640662835 (was NaN)
	P(likelihood) = NaN (was NaN)
		P(treeLikelihood.chr1) = NaN (was NaN)
		P(traitedtreeLikelihood.location) = NaN (was NaN)

Note the -Infinity in the line for nonZeroRatePrior.s:location. This is the prior over the number of rates that are used. By default, this prior is a Poisson prior with mean of 0.693 and offset equal to the number of states minus 1. This is a rather tight prior. At the start, by default all rates are estimated. And though in theory the Poisson prior has support over all non-negative integers, the number of estimated rates can be so large that, due to numerical issues, its probability becomes zero.
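Here is a minimal sketch of the numerical issue, in Python. It assumes 27 states with a symmetric rate matrix (so 351 rates, all switched on at the start), and it assumes the prior is evaluated as a plain probability that is then log-transformed. Both are assumptions made for illustration, not a description of the BEAST code.

```python
import math


def poisson_log_pmf(k, lam):
    """Log of the Poisson probability mass at k, computed in log space."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


n_states = 27                               # assumed for illustration
n_rates = n_states * (n_states - 1) // 2    # 351 rates, all on at the start
offset = n_states - 1                       # offset of the prior
k = n_rates - offset                        # 325 rates beyond the offset

log_p = poisson_log_pmf(k, 0.693)
p = math.exp(log_p)                         # underflows in double precision

print(log_p)   # finite, but far below the roughly -745 a double can represent
print(p)       # 0.0, so taking log(p) would give -Infinity
```

Working in log space throughout keeps the value finite, but the practical remedies are the workarounds listed below.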

Workarounds for this are

  • Reduce the number of states.
  • Start with a wider prior on non-zero rates by increasing the value of lambda, or use a different prior altogether. Once the analysis runs for a little while you can stop it, set the prior back and resume.
  • Set up a start state that contains more zeros. This is a bit fiddly, since it involves editing the XML. Find the rateIndicator parameter (id="rateIndicator.s:location"). Its value is, say, true, and it has dimension N. For parameters that have fewer values than the dimension, the values are copied until all N values are available. So, if you have dimension=6 (i.e., we need 6 flags) and value="true false" it will be copied 3 times, giving "1 0 1 0 1 0". With value="true false true" we get "1 0 1 1 0 1".
    So, what you can do if you have N states is set up a set of values such that only the N-1 rates along the diagonal are true.
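The copying rule in the last workaround, and a start value with only N-1 rates switched on, can be sketched in Python as follows. The function names are made up for this example, and the ordering of the rates (row by row over the upper triangle of a symmetric matrix) is an assumption you should check against your own XML.

```python
from itertools import cycle, islice


def expand_values(value, dimension):
    """Repeat the space-separated `value` entries until `dimension` is reached."""
    entries = value.split()
    return list(islice(cycle(entries), dimension))


def as_flags(entries):
    """Render true/false entries as the 1/0 flags used in the text."""
    return " ".join("1" if e == "true" else "0" for e in entries)


def diagonal_indicator_value(n_states):
    """Indicator string with only the rates between neighbouring states on.

    Assumes a symmetric matrix whose N*(N-1)/2 rates are listed row by row
    over the upper triangle: (0,1), (0,2), ..., (1,2), ...
    """
    flags = []
    for i in range(n_states):
        for j in range(i + 1, n_states):
            flags.append("true" if j == i + 1 else "false")
    return " ".join(flags)


print(as_flags(expand_values("true false", 6)))       # 1 0 1 0 1 0
print(as_flags(expand_values("true false true", 6)))  # 1 0 1 1 0 1
print(diagonal_indicator_value(4))  # true false false true false true
```

The last string has exactly N-1 = 3 entries set to true, and under the assumed ordering could be pasted as the value of the rateIndicator parameter for 4 states.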

Analysis does not converge

There are many reasons an analysis does not converge (there are several sections on it in the book and tips on how to increase ESS). Probably, the first thing you want to do is make sure rates are set up correctly.

A specific reason for ancestral state reconstruction to fail is that there are too many states, hence there is not enough data for the rates to be estimated.

Numeric instability with asymmetric analysis

By default, the ancestral reconstruction uses a symmetric rate matrix, like the first of the two matrices below.

        A   B   C
    A   -   D   E
    B   D   -   F
    C   E   F   -

        A   B   C
    D   -   E   F
    G   H   -   I
    J   K   L   -

By setting the symmetric attribute to false on the element with spec="SVSGeneralSubstitutionModel", an asymmetric rate matrix is used, which means going from state 1 to 2 can get a different rate than the other way around. This means that potentially the number of rates is doubled. It also means that the rateIndicator has a dimension that is doubled.
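The doubling follows directly from counting off-diagonal entries. A short Python sketch (the function name is made up for illustration):

```python
def rate_counts(n_states):
    """Return (symmetric, asymmetric) numbers of rates for N states.

    Symmetric: one rate per unordered pair of states, N*(N-1)/2.
    Asymmetric: one rate per ordered pair of states, N*(N-1).
    """
    symmetric = n_states * (n_states - 1) // 2
    asymmetric = n_states * (n_states - 1)
    return symmetric, asymmetric


for n in (3, 10, 27):
    print(n, rate_counts(n))
```

For 3 states this gives 3 and 6 rates, and the gap grows quadratically with the number of states, which is one more reason to keep the number of states modest.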

This can lead to numeric instability for the eigensystem (which does an eigenvalue decomposition of the rate matrix), which means your analysis won’t start. This can be solved by changing the default eigen-decomposition method to a more robust variant by setting the eigenSystem attribute of the substitution model to beast.evolution.substitutionmodel.RobustEigenSystem, so the substitution model looks something like this:

<substModel spec="SVSGeneralSubstitutionModel" symmetric="false" eigenSystem="beast.evolution.substitutionmodel.RobustEigenSystem" ...>
    <frequencies ...>
        0.33333333333
    </frequencies>
</substModel>

How much data do I need for SNAPP?

30 June 2015 by Remco Bouckaert

The unwelcome answer to that question is: it depends.

Number of lineages per species

First of all, there should be more than one lineage (= one haploid sequence) for every species. If there is only a single lineage, there are no coalescent events possible in the branches ending in the tips, and the branch above it will have on average only a single coalescent event. This means that the population sizes for each of the branches will be informed by only a single coalescent event (on average) and there will be very little signal to inform population sizes. The result is that almost certainly, the population size will be sampled from the prior. And since population size and branch lengths are confounded (large population size means larger branch lengths) and the prior on population sizes is quite broad by default, it may take a lot of time to converge.

So, multiple lineages per species are recommended. Of course, this has to be balanced against the computational penalty that is incurred. So, you have to experiment a bit to find out what is computationally feasible, and how much signal can be obtained from the data.

Sequence length

In SNAPP, every site in a sequence has its own gene tree that is assumed to be independent of all other gene trees. So, adding sites also means adding gene trees.

When samples are very closely related, all coalescent events happen very close to the present time (the sampling time). If so and you look at a branch ending in a species, there is only a single lineage left at the top of the branch. This means we are running into the problem described above: there is no signal left in the data to determine population sizes, and convergence will be difficult. There is no point in adding more sites that have this property, since it would just slow down the calculation without adding more information.

When samples are very distantly related, all coalescent events happen in the branch stemming out of the root. This means, there is no topological information in such samples, and every species tree will fit equally well. On top of this, there is no information to inform population sizes, so SNAPP will not give a lot of information, and will have a terrible time to reach convergence.

In between these extremes, there is the goldilocks zone, where samples coalesce not too early and not too late, but at just the right time. In this goldilocks zone, there will be some lineage sorting, so branches above those ending in tips will contain some population size information. This is the kind of data you would like to add.

Of course, it is hard to tell beforehand what kind of data you have, so it is hard to tell beforehand what is the ideal sequence length.


Thanks to David Bryant for pointing out most of the above.

Help, BEAST acts weird! (or how to set up rates)

23 June 2015 by Remco Bouckaert

“What is going wrong?” is an often asked question. There can be many things going wrong, but there is one thing that goes wrong more often than other things and it is easy to fix.

The first thing you want to check is the settings of the rates in BEAUti. There are two places where rates are set:

  • The site model panel, where the substitution rate is set
  • The clock model panel where the clock rate is set

The final rate used is the product of these rates.

The way to think of the substitution rate is that it is a relative rate with respect to other partitions, while the clock rate is the overall rate for substitutions per site per year (or any other unit of time you choose to use). So, substitution rates will be numbers close to 1, while clock rates tend to be small numbers, such as 4e-9 substitutions per site per year.
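As a small worked example of how the two rates combine, the Python sketch below multiplies relative substitution rates by a clock rate. The partition names and the relative rates are invented for illustration; the clock rate is the 4e-9 mentioned above.

```python
def effective_rate(substitution_rate, clock_rate):
    """Rate actually applied to a partition: relative rate times clock rate."""
    return substitution_rate * clock_rate


clock_rate = 4e-9                       # substitutions per site per year
relative_rates = {"partition1": 0.5,    # hypothetical partition names and values
                  "partition2": 1.5}

for name, rate in relative_rates.items():
    # Relative rates stay close to 1; the clock rate carries the time scale.
    print(name, effective_rate(rate, clock_rate))
```

If both numbers were set to something like 4e-9 by mistake, the product would be around 1.6e-17, far too slow to be plausible, which is the kind of set-up error this post is about.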

Substitution rates

To set up the substitution rates, use this chart:

Standard analysis

For an analysis using the Standard template, you can go to the clock model tab and use this chart to set up the clock rate:

* Partitions can be ordered arbitrarily. With the first partition I mean the one for which there are either calibrations, tip dates or a rate from the literature, which usually is the first partition listed in the list of clocks, but may be a later one as well.

** Setting the clock rate to “1e-x”, where x is somewhere in the region you expect for your data, helps to get through burn-in faster. You could leave it at the default value of 1.0, but then it just takes longer to reach convergence. Assuming you are using years as units of time, workable values are 1e-9 for nuclear data, 1e-6 for mitochondrial, bacterial and DNA viral data and 1e-4 for RNA viral data, but if you have more specific information about your sequences it helps to use it to specify the starting value.

*BEAST analysis

*BEAST analyses are a bit different in that tip dates are not allowed (at the time of writing) and calibrations are on the species tree, not the gene tree. Usually, all clock rates but the first are estimated using a broad prior. To decide whether the first rate should be estimated or not, use the chart above.

If BEAST still acts weird after rates are set up correctly, just post a question on the BEAST user list.

Better BEAUti templates

16 June 2015 by Remco Bouckaert

When developing a BEAUti template, you have to keep in mind that a BEAST model is a directed acyclic graph of BEAST objects, such as for example shown here. A BEAUti template describes a sub-graph that can be slotted into the overall model. This means the template has to define two things:

  1. A set of BEAST objects
  2. A set of rules on how to put the sub-network into the full graph

Up to now, the rules on how to connect networks to the graph were specified through BeautiConnector rules, each giving the srcID of one of the BEAST objects in the sub-network, and a targetID and inputName specifying which object in the larger network to connect to. Furthermore, connections are not always necessary; if a parameter is kept fixed instead of estimated, there is no need to log it, so there is no need to connect that parameter to any logger. A BeautiConnector only connects conditional on whatever is specified in the if attribute.
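To make the idea of conditional connector rules concrete, here is a conceptual model in Python. It is a simplified stand-in and not how BEAUti implements connectors: the Connector class, the apply_connectors function, the facts dictionary and the input name log are all invented for this sketch, with only srcID, targetID, inputName and the condition mirroring the description above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Connector:
    """Conceptual stand-in for a connector rule (not BEAUti's actual class)."""
    src_id: str
    target_id: str
    input_name: str
    condition: Callable[[Dict[str, bool]], bool]


def apply_connectors(connectors, facts):
    """Return {(target_id, input_name): [src_id, ...]} for rules whose condition holds."""
    connections: Dict[Tuple[str, str], List[str]] = {}
    for c in connectors:
        if c.condition(facts):
            connections.setdefault((c.target_id, c.input_name), []).append(c.src_id)
    return connections


# Two rules: kappa goes into the state and into the trace log,
# but only when kappa is estimated.
rules = [
    Connector("kappa", "state", "stateNode", lambda f: f["kappa_estimated"]),
    Connector("kappa", "tracelog", "log", lambda f: f["kappa_estimated"]),
]

print(apply_connectors(rules, {"kappa_estimated": True}))
print(apply_connectors(rules, {"kappa_estimated": False}))  # {} : nothing is connected
```

The point of the sketch is only that each rule is a (source, target, input, condition) tuple, and that a fixed parameter simply produces no connections.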

Below is a simple template that specifies the HKY substitution model. The BEAST objects are specified in the CDATA section: the HKY substitution model and its kappa parameter and frequencies object, two operators and a prior on kappa. If kappa is estimated (the default) kappa should be connected to the state (see first connector rule). Likewise for frequencies (second rule).


    
    
    
	
    



    
	
	
    





]]>




Scale HKY transition-transversion parameter of partition s:$(n)

Exchange values of frequencies of partition s:$(n)

HKY transition-transversion parameter of partition s:$(n)


From BEAST v2.3.0, the connector rules can be integrated in the XML fragment that specifies the BEAST objects. At the top level, the target objects are specified through their IDs, and anything that needs to be connected can be inserted by nesting these objects. The conditions are encoded in the if attribute in the beauti namespace, so they look like beauti:if followed by the condition. Let’s walk through the example above.

First, we specify the subtemplate header, and the HKY object:


    
    
    
	
    

So far nothing different from above. However, next we will connect the kappa and frequency parameter to the state; just define a state element and specify an idref to the state object. The kappa and frequency parameters will be connected to the stateNode input, for which we specify two stateNode elements, and idrefs to the kappa and frequency parameters specified in the HKY block above.


	
	

This replaces the first two connector rules in the original template.

Next, we define the prior on kappa. Since it will be connected to the prior, we wrap it in a distribution element with idref to prior. The condition on which to connect (that kappa is in the likelihood and is estimated) is specified in the prior element. We added name="distribution" in order to ensure the prior is connected to the distribution input.


	
	    
		
		
	    
	

There is another way to specify conditions, which is especially handy when several items are to be connected under the same condition. The if element specified just inside a top-level element in the CDATA block is interpreted as a condition that applies to all of the elements inside. The condition itself is specified in the cond attribute. For example, the operators can be defined like so:

    

    
        
    
    
        
    

That leaves the connections to the loggers to be defined:


    
                

]]>

And that completes the HKY template.

The hope is that this new way for specifying sub-templates is a bit more intuitive once you are used to writing plain BEAST 2 XML. The idea is that going from an example XML to a sub-template just means

  • remove all XML elements outside the sub-graph that it is not connected to
  • replace all BEAST objects outside the sub-graph with idrefs
  • add conditions (in beauti:if attributes or if elements)

Then the only thing left is to rename IDs so they match partition information and wrap the BEAUti subtemplate bits around the CDATA section.

Sampling tip dates

9 June 2015 by Remco Bouckaert

To sample the height of leaf nodes, you need to do the following:

  1. Set up a calibration on the tip you want to sample.
  2. Add an operator for scaling the tip.
  3. Add an entry to the logger if you want to log the leaf height

Tip calibration in BEAUti

To set up a calibration, the easiest way to do this is by adding a calibration in BEAUti: in the priors panel, hit the little plus (‘+’) button at the bottom of the screen, then specify the leaf you want to sample and give it a unique name. After hitting the OK button, open the details of the prior by pressing the little triangle next to the taxonset (here Homo_sapiens.prior) and a screen shows up like this:

Make sure the Tipsonly box is checked. If you have multiple tips with the same calibration you can put all of these in the same taxonset. With the tipsonly flag set, the calibration will be applied to the leaves instead of the most recent common ancestor of the set of tips.

Tip calibration in XML

You can also add an MRCAPrior to the XML inside the distribution element with id="prior" like so:


  
    
  
  

Make sure taxon id’s are unique: it is possible a taxon with the id of the tip you want to sample is already specified elsewhere in the XML. If so, when starting BEAST, you will get an error saying something like

Error 104 parsing the xml input file

IDs should be unique. Duplicate id 'Homo_sapiens' found

identifying the id that was already specified.

Also, you want to point to the right tree, specified by tree="@Tree.t:tree" in the fragment above.

Tip sample operator in XML

Once the calibration is set up, for each tip you want to sample add an operator to the XML like so:


and edit it as follows:

  • Make sure the id is unique; just changing the number will do that.
  • The taxonset attribute should refer to the correct taxonset.
  • Check that the tree attribute points to the tree you want to sample from. It should be the same tree as in the MRCAPrior.

Add logger entry

To log the leaf height in the trace log, so you can see its mean height as well as check how well it mixes, add an entry referring to the MRCAPrior to the tracelog. Just place a log entry inside the logger with id="tracelog" like so:


That’s all.