XML tricks

27 June 2018 by Remco Bouckaert

In this post, we will look at a few BEAST XML editing tricks: plates, parameterising XML, and XML entities.

Plates

Plates are small macros that are expanded before the XML is interpreted. They are useful for fragments that are repeated with only a single value changing. For example, a set of taxa can be defined as:

<taxonset id="taxa" spec="TaxonSet">
	<taxon id="human" spec="Taxon"/>
	<taxon id="chimp" spec="Taxon"/>
	<taxon id="bonobo" spec="Taxon"/>
	<taxon id="gorilla" spec="Taxon"/>
	<taxon id="orangutan" spec="Taxon"/>
	<taxon id="siamang" spec="Taxon"/>
</taxonset>

Using a plate, the same taxon set can be defined as follows:

<taxonset id="taxa" spec="TaxonSet">
	<plate var="n" range="human,chimp,bonobo,gorilla,orangutan,siamang">
		<taxon id="$(n)" spec="Taxon"/>
	</plate>
</taxonset>

The plate format is <plate var="$var" range="$range">, where $var is the name of the variable to be replaced in the XML fragment inside the plate element, and $range is a comma-separated string of values that are used to replace that variable.

If var="n", every occurrence of $(n) in the XML inside the plate is replaced by a value from $range, with one copy of the fragment produced per value. So the XML above with the plate in it expands to the taxon set that we started with. Plates are also useful for sets of operators, parameters, priors, etc., especially when there are many partitions in the analysis.

Plates can be nested, but make sure each plate uses a different var attribute.
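As a sketch, here is what two nested plates might look like: an outer plate over partitions and an inner plate over codon positions. The mutationRate ids and the codon-position split are hypothetical, chosen only to show that the outer variable $(gene) and the inner variable $(pos) can both appear in the same attribute:

```xml
<!-- hypothetical example: outer plate over partitions, inner plate over codon positions -->
<plate var="gene" range="COI,ND2,CYTB">
	<plate var="pos" range="1,2,3">
		<!-- one parameter per partition/position pair -->
		<stateNode id="mutationRate.s:$(gene)_$(pos)" spec="RealParameter" value="1.0"/>
	</plate>
</plate>
```

This should expand to nine state nodes, from mutationRate.s:COI_1 to mutationRate.s:CYTB_3. If both plates used var="n", there would be only one placeholder to substitute, which is why distinct names are needed.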

Parameterising XML

Sometimes it can be useful to take values for the XML from outside BEAST. BEAST now has a -D option that takes as its argument a comma-separated string of name=value pairs, e.g.

/path/to/beast -D "chain_length=10000000,log_every=1000" myxml.xml

In a pre-processing step, every occurrence of $(name) in the XML is replaced by the corresponding value, so in the example above $(chain_length) is replaced by 10000000 and $(log_every) by 1000.
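A minimal sketch of where such placeholders might appear, assuming a BEAST 2 MCMC run element. The ids, the file name and the omitted children are illustrative only:

```xml
<!-- chain_length and log_every are filled in from the -D option before the XML is parsed -->
<run id="mcmc" spec="MCMC" chainLength="$(chain_length)">
	<!-- state, distribution and operators omitted -->
	<logger id="tracelog" fileName="myxml.log" logEvery="$(log_every)">
		<!-- log entries omitted -->
	</logger>
</run>
```

Run with the command line above, this would become chainLength="10000000" and logEvery="1000", so the same XML file can serve for both short test runs and long production runs.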

XML Entities

When there are many partitions and you want to be able to include and exclude them quickly, it can be useful to combine XML entities with plates. For example, suppose there are three partitions: COI, ND2 and CYTB. We could have an HKY model for each of them, with the associated parameters and operators defined by the following fragments:

<plate var="n" range="COI,ND2,CYTB">
	<stateNode id="kappa.s:$(n)" spec="RealParameter" value="1.0"/>
</plate>

<plate var="n" range="COI,ND2,CYTB">
	<operator id="kappaScaler.s:$(n)" spec="ScaleOperator" parameter="@kappa.s:$(n)" weight="1.0"/>
</plate>

If we want to exclude, say, COI, we just delete it from the range. But the list of partitions may occur in quite a few places if we also have a prior on kappa, a clock rate for each partition, a tree likelihood, loggers for each of these, and so on. A more elegant solution is to specify the range in one place. We can replace "COI,ND2,CYTB" with "&range;" in each plate, like so:

<plate var="n" range="&range;">
	<stateNode id="kappa.s:$(n)" spec="RealParameter" value="1.0"/>
</plate>

<plate var="n" range="&range;">
	<operator id="kappaScaler.s:$(n)" spec="ScaleOperator" parameter="@kappa.s:$(n)" weight="1.0"/>
</plate>
At the top of the XML file, just after the XML declaration and before the beast element, we then define the XML entity like so:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE beast [
<!ENTITY range "COI,ND2,CYTB">
]>
<beast...
Changing the value of the entity updates every plate that uses it, so the whole model changes with one simple edit. For example, COI can be removed from the analysis just by deleting it from the entity definition.
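The same entity can drive the other per-partition fragments mentioned earlier, such as loggers. As an illustration, assuming the range entity is declared at the top of the file as above, a trace logger that logs kappa for every partition might look like this. The logger id, file name and logEvery value are assumptions, and the kappa.s:$(n) ids are the ones the operators refer to:

```xml
<logger id="tracelog" fileName="partitions.log" logEvery="1000">
	<!-- one kappa entry per partition listed in the range entity -->
	<plate var="n" range="&range;">
		<log idref="kappa.s:$(n)"/>
	</plate>
</logger>
```

Removing COI from the entity would then also drop its kappa column from the log, without any further edits.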