Multi-line strings have always been a bit cumbersome in Java. Either you have to add strings in quotes separated by a + sign, like so:
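A minimal sketch of the concatenation style, with made-up text (the class and method names are chosen here purely for illustration):

```java
class ConcatExample {
    // Builds a three-line string by joining quoted literals with +.
    // Every line needs its own quotes, an explicit \n and a trailing +.
    static String build() {
        return "first line\n" +
               "second line\n" +
               "third line\n";
    }

    public static void main(String[] args) {
        System.out.print(build());
    }
}
```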
or use a StringBuilder and append every string like so:
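A comparable sketch with `StringBuilder`, again with made-up text and illustrative names:

```java
class BuilderExample {
    // Builds the same three-line string, one append call per line.
    static String build() {
        StringBuilder sb = new StringBuilder();
        sb.append("first line\n");
        sb.append("second line\n");
        sb.append("third line\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(build());
    }
}
```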
which is even more verbose. Perl supports multi-line strings and the above simplifies to
There is a proposal by Stephen Colebourne to incorporate multi-line strings into Java, but alas, so far it has not made it in (we are at Java 8 now). Fortunately, hidden in the parser and absent from the documentation, there is a little gem that tells us we can have multi-line strings in BEASTShell. The delimiter is three double quotes ("""), so the Perl fragment in BEASTShell would be
which is just one character more than the Perl script! If you want quotes inside the multi-line string, no escape sequence is necessary, just insert them in the string:
(which surely upsets the code colouring algorithm:-)) For triple quotes you still need to add two strings
but hey, who needs that?
The ugliness of regular expressions
Another eyesore in Java syntax is regular expression handling. Look at this fragment, taken almost verbatim from the Java documentation:
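A sketch along those lines, assuming the `a*b` pattern from the `java.util.regex.Matcher` documentation and a made-up input string. The loop is wrapped in a method so that it runs on its own (`RegexVerbose` and `matchesAsText` are names chosen here):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexVerbose {
    // Collects every match of the pattern, one per line.
    static String matchesAsText() {
        Pattern p = Pattern.compile("a*b");        // compile the expression
        Matcher m = p.matcher("aaaaab ab b");      // bind it to the (made-up) input
        StringBuilder out = new StringBuilder();
        while (m.find()) {                         // step through successive matches
            out.append(m.group()).append('\n');    // the matching text, not its position
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(matchesAsText());
    }
}
```

Three objects (`Pattern`, `Matcher` and the loop state) are needed before a single match can be printed.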
(The original code in the documentation prints the m.start() and m.end(), but why would one be interested in the location instead of the matching string?) The equivalent in BEASTShell would be a bit shorter:
But in Perl this can be made even shorter:
Gedit counts these fragments as follows
In summary, regular expressions in Java and — to a lesser extent — BEASTShell are very verbose and ugly.
Improved regular expressions
BEASTShell has a regexp command that returns a list of matches, which can be used like so:
And now we can extend the table of Gedit counts:
This is, of course, cheating a bit, because we can hide any complex function inside a command. But I think it is justified here, since regular expression matching is common enough and verbose enough in Java/BeanShell that a few extra commands are very helpful. It also shows how a well-chosen command can help streamline your scripts.
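To illustrate the point in plain Java: the sketch below hides the `Pattern`/`Matcher` boilerplate behind a single call. It is a hypothetical stand-in, not the BEASTShell implementation. The name `regexp` and the argument order (pattern first, then input) are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexpCommand {
    // Hypothetical helper: returns all matches of regex in input as a list.
    static List<String> regexp(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // The caller now needs one line instead of a compile/match/loop sequence.
        System.out.println(regexp("[0-9]+", "taxa 12, sites 345"));
    }
}
```

The verbosity has not gone away; it has just been written once and moved out of the script.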
Filter *BEAST analysis
Let’s put this to work to select a set of species taxa in an existing *BEAST analysis. First, define the original species and their sequences.
Then, we define a command to print out a new taxonset, using regexp to get info out of the string:
Assuming the original sequences are in a file called source.xml, we can grab the sequences from the file using: