Roll your own models with BEASTShell

9 June 2014 by Remco Bouckaert

BEASTShell contains a number of classes that can be customised using script fragments, similar to the beast.util.Script class in BEASTLabs that we talked about before. This means that you can create customised loggers, substitution models, clock models, and a few other classes just by changing the XML that specifies a BEAST analysis.

The following table shows the classes, all in the beast.shell package, and the functions that the script is required to specify.

Class Base class/interface Required functions
BSHClockModel Base getRateForBranch(node, meanrate)
BSHDistribution Distribution calculateLogP()
BSHLogger Loggable init(out), log(sample, out), close(out)
BSHOperator Operator double proposal()
BSHParametricDistribution ParametricDistribution density(x), cumulativeProbability(x), inverseCumulativeProbability(p)
BSHSiteModel SiteModelInterface.Base getCategoryCount(), getCategoryOfSite(site, node), getRateForCategory(category, node), getCategoryRates(node), getProportionForCategory(category, node), getCategoryProportions(node)
BSHSubstitutionModel SubstitutionModel.Base getTransitionProbabilities(node, fStartTime, fEndTime, fRate, matrix)
BSHTreeDistribution TreeDistribution calculateLogP(tree, intervals)

Each of these classes have an input called x, which is a list of Functions. By default, the ID of a function is used as global variable in the script, but you can assign a name to a Function using the beast.shell.NamedFunction class. The following

	<x spec='NamedFunction' term='param' function="@birthRate.t:alignment"/>

assigns the name “param” to the value of birthRate.t:alignment, and when using the variable “param” in the script it will take on the value of birthRate.t:alignment.

Each of the script classes also have a value input, which can be used to specify a script. All inline text in an element will be passed to the value-input, which means you can put your script inside a CDATA block in the XML inside the XML element representing the script-object. For example, a BSHDistribution that is proportional to 1/X can be defined as this:

<distribution spec='beast.core.BSHDistribution'>
    <x spec='NamedFunction' term='param' function="@birthRate.t:alignment"/>
    calculateLogP() {return Math.log(1.0/param.getValue());}
</distribution>

Note that to access “param”, which implements Function, we need to call Function methods, like getValue.

For BSHDistributions, we only need to specify calculateLogP, but for BSHLoggers, we need to specify three functions: init (typically used for printing the header in the trace file), log (to print a log value) and close (which for trace files means doing nothing, but for tree files an “End;” is expected). When any of these functions is not specified, an exception is thrown at the time the corresponding method is called. Likewise, an exception is thrown if there is an error in the program or any other unexpected condition is encountered.

To the following specifies a BSHLogger that reports the square of a value if its value is less than 1, otherwise it reports the value.

<log spec='BSHLogger' id='kappaSquared'>
    <x spec='NamedFunction' term='param' function="@kappa.s:alignment"/>
<![CDATA[
        init(out) {
                out.print("kappa^2\t");
        }
        log(sample, out) {
				if (param.getValue() < 1) {
	                out.print(param.getValue()*param.getValue() + "\t");
				} else {
	                out.print(param.getValue() + "\t");
				}
        }
        close(out) {
                // nothing to do
        }
]]>
     </log>

Here, we use a CDATA block so that we do not have to worry about escaping special characters like the < in if (param.getValue() < 1) .... Otherwise, you would have to escape those characters, giving if (param.getValue() &lt; 1) ... making things quite hard to read. All text within the CDATA block is taken as literal text and passed to the value-input.

BEASTShell recognises some XML docment friendly tokens that are mapped on operators, so you can write if (param.getValue() @lt 1) ... instead. The full set of tokens is listed below.

token interpreted as
@gt >
@lt <
@lteq <=
@gteq >=
@or ||
@and &&
@bitwise_and &
@bitwise_or |
@left_shift <<
@right_shift >>
@right_unsigned_shift >>>
@and_assign &=
@or_assign |=
@left_shift_assign <<=
@right_shift_assign >>=
@right_unsigned_shift_assign >>>=

I prefer the CDATA block, but you know about some alternatives now. You do not need to limit yourself to the functions required by the BSH-classes, but can define any other function to be used by the script as well.

The following demonstrates usage of the BSHOperator as a scale-operator on a RealParameter:

<operator id="KappaScaler" spec="BSHOperator" weight="0.1">
    <x spec='NamedFunction' term='param' function="@kappa.s:alignment"/>
<![CDATA[
proposal() {
        old = param.getArrayValue();
        newvalue = Math.exp( (beast.util.Randomizer.nextDouble() - 0.5)) * old;
        if (newvalue < param.getLower() || newvalue > param.getUpper()) {
                return Double.NEGATIVE_INFINITY;
        }
        param.setValue(new Double(newvalue));
        return 0.0;
}
]]>
</operator>

Note how it sets the value of the parameter, which it can do only because the input is a RealParameter but may fail when other objects are used as “param”.