BEAST 2

Path Sampling

[see also blog post on Path sampling with a GUI.]

Set up

To set up a path sampling analysis, you need the model-selection package (pre BEAST version 2.1.2, it was in the beastii package) and edit the XML for the analysis.

Essentially, you need to wrap the run-element of the original analysis into a run element for the path sampler and rename the run element to mcmc like so

  • rename the run element from the XML analysis into mcmc (do not forget to rename the closing element </run> into </mcmc>)
  • insert before the mcmc element the following fragment:

   
<run spec='beast.inference.PathSampler' chainLength="100000" alpha='0.3' rootdir='/tmp/step' burnInPercentage='80' preBurnin="50000" deleteOldLogs='true'>
	cd $(dir)
 	java -cp $(java.class.path) beast.app.beastapp.BeastMain $(resume/overwrite) -java -seed $(seed) beast.xml

  • insert </run> after the closing </mcmc> element.

The new run element has the following parameters:

nrOfSteps the number of steps to use, default 8
alpha alpha parameter of Beta(alpha,1) distribution used to space out steps, default 0.3If alpha <= 0, uniform intervals are used.
rootdir root directory for storing particle states and log files (default /tmp)
mcmc MCMC analysis used to specify model and operations in each of the particles
chainLength number of sample to run a chain for a single step Default: 100000
burnInPercentage burn-In Percentage used for analysing log files Default: 50
preBurnin number of samples that are discarded for the first step, but not the others Default: 100000
value script for launching a job. $(dir) is replaced by the directory associated with the particle $(java.class.path) is replaced by a java class path used to launch this application $(java.library.path) is replaced by a java library path used to launch this application $(seed) is replaced by a random number seed that differs with every launch $(host) is replaced by a host from the list of hosts
hosts comma separated list of hosts. If there are k hosts in the list, for particle i the term $(host) in the script will be replaced by the (i modulo k) host in the list. Note that whitespace is removed
deleteOldLogs delete existing log files from root dir
doNotRun Set up all files but do not run analysis if true. This can be useful for setting up an analysis on a cluster

You can BEAST with multiple threads to make it run several chains in parallel (e.g. using the -threads option from the command line).

At the end of the run, the marginal likelihood estimate is printed. Also, in the directory specified by ‘rootdir’ for each step a subdirectory is created with the steps in it.

To recalculate the marginal likelihood estimates from the logs, run

 
applauncher PathSampleAnalyser <nrofsteps> <alpha> <rootdir> <burninpercentage>
Note applauncher was called appstore before v2.5.0.

where nrofsteps, alpha, rootdir, burninpercentage should match the ones specified in the XML file.

You can use Tracer to inspect the log files for the various steps to make sure that th e burn-in used is sufficiently large.

Example file

In the $(BEAST-package)/model-selection/examples directory, there is a file testPathSampler.xml that should run, where $(BEAST-package) is the package directory on your operating system.

See Manage_package#Installation_directories for finding out what $(BEAST-package) is on your machine.

Setting up an analysis for a cluster

The path sampler will write shell scripts (for Mac, Linux) and batch files (for Windows) in the rootdir, one for every thread. These shell scripts can be executed separately. Once all scripts have been executed you need to run the PathSampleAnalyser to get the marginal likelihood estimate.

NB these are actually the log of the marginal likelihood estimates, but that is usually omitted.

The number of batch files is determined by the number of threads used for running the XML. These batch files can be submitted to a cluster.

Calculating Bayes factors

At the end of a pathsampling run, the marginal likelihood estimate is reported, e.g.

 
 Step        theta         likelihood   contribution ESS
 0            0            -19678.4774  -27.5334     15.1987      
 1            0.0015       -9950.7265   -113.8373    16.9272      
 2            0.0154       -6848.3262   -285.8846    39.1143      
 3            0.0593       -6153.3842   -580.8173    41.1282      
 4            0.1548       -6039.2476   -1028.7865   58.0664      
 5            0.3258       -6005.1028   -1633.722    24.3111      
 6            0.5982       -5990.5743   -2405.6227   14.0319      
 7            1            -5985.3503   0            8.9284       
 marginal L estimate =-6076.203746592951

You can use the value after “marginal L estimate” to compare models.

Say, the marginal likelihood estimate value for model 1 is X and the value for model 2 is Y, then the Bayes factor comparing model s1 and 2 is is X-Y. If this difference is positive, then the Bayes factor is in favour of model 1, if it is negative, it is in favour of model 2.

Resuming steps

If you suspect one or more of the steps did not run for long enough judging from a low ESS, you can resume the chain for that path for a little longer by resuming the chain.

The path sampler will write shell scripts (for Mac, Linux) and batch files (for Windows) in each of the step dirs called resume.sh or resume.bat. Execute them for the appropriate step and then run the PathSampleAnalyser.

SNAPP

Older versions of SNAPP (<1.1.10) use a non-standard MCMC loop. To use path sampling with a SNAPP analysis, you have to pre-process your XML file as follows:

1. replace snap.MCMC with beast.core.MCMC

2. copy the stateDistribution outside the run element, for example copy it to just before the run element.

3. See if the XML file runs. If not, and if there are any attributes that are not recognised on the run element, remove these.

Now you can follow the instructions above to do a path-sampling analysis.

Running as mulitple jobs

  • In the XML, add doNotRun=’true’ to the run element. This ensures that you can start the analysis by hand, giving you control over which computer runs the jobs.
  • Run BEAST on the XML file with -threads X where X is the number of separate jobs you want to run (e.g. the number of cores on the computer).
  • Check that in the root directory specified in the XML for creating step directories, there are now X files called run0.sh run1.sh …. runX.sh on Mac or Linux and run0.bat, run1.bat,…, runX.bat on Windows.
  • Start each of these scripts.
  • Run the PathSampleAnalyser to get an estimate of the marginal likelihood.

Trouble shooting

Make sure no newline missing in the XML. The line that says

   
cd $(dir) java -cp $(java.class.path) beast.app.beastapp.BeastMain $(resume/overwrite) -java -seed $(seed) beast.xml 

should be two lines

 
cd $(dir) 
java -cp $(java.class.path) beast.app.beastapp.BeastMain $(resume/overwrite) -java -seed $(seed) beast.xml 

If you still have problems running, check that the script file (assuming your rootdir is /tmp/step/

   
/tmp/step/step0/run.bat

contains two lines, one starting with “cd” and one starting with “java”. If not, the newlines inserted in the XML did not get through to the batch file.

On windows note is the slashes in the rootdir. Not

   
rootdir="/temp/step/"

but

   
rootdir="\temp\steps\"

Bayesian evolutionary analysis by sampling trees

Served through Jekyll, customised theme based on the twentyfourteen wordpress theme.