Scripting languages for BEAST

19 May 2014 by Remco Bouckaert

In a previous blog, we looked at ways in which to program BEAST without having to program in Java, but use some scripting capability and specify formulas and functions in the XML specification. What we saw so far was pretty much limited to writing loggers (and a substitution model). Scripting capabilities are pretty limited, but it would be nice to be able to extend and use BEAST and its libraries without having to delve into Java, and the particular conventions used in BEASTObject classes.

Why scripting: scripting languages are dynamically typed thus simplifying code, and increasing powerful in that few lines of code can produce a lot of functionality, unlike the verbose Java language. When performance is an issue, Java implementations can be considered, noting that most of the time of MCMC calculations in BEAST tend to be in the treelikelihood, which is too complex to script. The BEAST framework can take care of most of tracking which part of the model requires recalculation after an MCMC step, so again this is not something that a script needs to pay too much attention to. Also, scripting provides a fast way to write tests. Furthermore, a script language can be used in an interactive console, which allows exploring various outputs of BEAST analysis.

Scripting is already possible through ScriptEngine implementations in Java, so you can create and access BEAST-objects that way. However, it is quite inconvenient to do so, since setting values of inputs is awkward. A successful scripting implementation should do the following:

  1. Provide a convenient way to create and initialise BEASTObjects, for example, creating a tree from a Newick string.
  2. Provide a convenient way to access Tree structures, e.g. calculate the total length of branches in a clade
  3. Provide a convenient way to implement standard BEASTObjects, such as Distributions, SubstitutionModels, TreeDistributions, and BranchRateModels that can be integrated with BEAST models in an MCMC analysis.
  4. Provide a convenient way to write junit tests.

Syntax

The first choice to make is which syntax to use — roll your own, or use an existing syntax. As much fun as it is to write parsers and design new languages, having Yet Another Scripting Language is not something the world is waiting for. Furthermore, there are plenty languages out there that already are well known and are expressive, well known and convenient enough to do the kind of things we want. Furthermore, there are scripting engines out in the wild that are already optimised for speed — though their default methods of integrating with Java might not suite the way BEAST was designed.

So, the next question becomes: which language should we use? We already disregarded compiled languages such as Java, C++, C, Objective C, C#, etc. There are heaps of interpreted languages out there — Python, Ruby, JavaScript — and statistical scripting languages — WinBUGS, R, stan, matlab, mathematica JAGS.

Popularity: Python, JavaScript, Ruby, Perl, VB and PHP appear currently the most popular according to a number of rankings, including tiobe, lang pop, and code eval blog. Readwrite has 5 ways of measuring language popularity, and again these 6 scripting languages score highest among most rankings. Among the statistical scripting languages, R has gained significant popularity in the last few years. Redmonk, has a interesting plot of languages from stackoverflow and github showing the popularity or R as well as the general purpose scripting languages.

Popularity of scripting languages is important in reducing the average learning curve; a popular syntax is probably familiar for users because they have seen them before.

Standard implementations

The Bean scripting framework and JSR-223 API are the two standard ways of integrating Java and scripting languages. The Bean framework appears dormant (latest release 2011), so let’s have a look at how some of the ScriptEngine implementations do Java-integration.

JavaScript: rhino is a Javascript engine, embedded in J2SE 6. Nashorn is Javascript engine with increased performance, but only available for Java 8. Rhino has a rich set of methods of accessing Java (see documentation). It can access methods of JavaBeans defined as getFoo(), setFoo(), isFoo() as JavaSCript properties. So, let f be a file, then f.name will call f.getName() and f.directory will call f.isDirectory().

Python: Jython a Python interpreter in Java with extensive Java scripting capabilities.

Ruby: JRuby is a Ruby interpreter in Java, also with scripting capabilities.

R: Renjin is an R interpreter in Java written with efficiency in mind, and can be used for scripting with Java. It is aware of JavaBeans, so you can initialise a class like so:

bob <- Customer$new(name='Bob', age=36)

and it will create an object of class Customer using the empty constructor, then call setName("Bob") followed by setAge(36). This comes very close to the way we use initByName method of BEASTObject, which in Java would be

Customer bob = new Customer();
bob.initByName("name","Bob", "age", 36)

So, if the Renjin engine could be coerced into calling the code above, we would have a convenient way to create BEASTObjects.

Fastr is another R interpreter in Java, but it does not implement a ScriptEngine, and also does not support importing of R packages, while renjin does.

Stan, matlab, and mathematica have no interpreters written in Java that I am aware of.

BEASTShell

After spending quite a bit of time looking at the existing languages, an unexpected alternative presented itself: BeanShell, a scripting language with Java syntax, running on the JVM and largely forgotten. However, any Java developer will not have any problems picking up the language — it uses largely the same syntax as Java! Furthermore, it runs on the JVM and has facilities to work with Java libraries. Also, the code is sufficiently clear that it is easy to coerce the language into creating BEASTObjects in a convenient way. The resulting language is called BEASTShell. In the next post, we will have a closer look at the details and see what we can do with BEASTShell.