[see also blog post on Path sampling with a GUI.]
To set up a path sampling analysis, you need the model-selection package (pre BEAST version 2.1.2, it was in the beastii package) and edit the XML for the analysis.
Essentially, you need to wrap the run-element of the original analysis into a run element for the path sampler and rename the run element to mcmc like so
- rename the run element from the XML analysis into mcmc (do not forget to rename the closing element </run> into </mcmc>)
- insert before the mcmc element the following fragment:
- insert </run> after the closing </mcmc> element.
The new run element has the following parameters:
|nrOfSteps||the number of steps to use, default 8|
|alpha||alpha parameter of Beta(alpha,1) distribution used to space out steps, default 0.3If alpha <= 0, uniform intervals are used.|
|rootdir||root directory for storing particle states and log files (default /tmp)|
|mcmc||MCMC analysis used to specify model and operations in each of the particles|
|chainLength||number of sample to run a chain for a single step Default: 100000|
|burnInPercentage||burn-In Percentage used for analysing log files Default: 50|
|preBurnin||number of samples that are discarded for the first step, but not the others Default: 100000|
|value||script for launching a job. $(dir) is replaced by the directory associated with the particle $(java.class.path) is replaced by a java class path used to launch this application $(java.library.path) is replaced by a java library path used to launch this application $(seed) is replaced by a random number seed that differs with every launch $(host) is replaced by a host from the list of hosts|
|hosts||comma separated list of hosts. If there are k hosts in the list, for particle i the term $(host) in the script will be replaced by the (i modulo k) host in the list. Note that whitespace is removed|
|deleteOldLogs||delete existing log files from root dir|
|doNotRun||Set up all files but do not run analysis if true. This can be useful for setting up an analysis on a cluster|
You can BEAST with multiple threads to make it run several chains in parallel (e.g. using the -threads option from the command line).
At the end of the run, the marginal likelihood estimate is printed. Also, in the directory specified by ‘rootdir’ for each step a subdirectory is created with the steps in it.
To recalculate the marginal likelihood estimates from the logs, runNote
where nrofsteps, alpha, rootdir, burninpercentage should match the ones specified in the XML file.
You can use Tracer to inspect the log files for the various steps to make sure that th e burn-in used is sufficiently large.
In the $(BEAST-package)/model-selection/examples directory, there is a file testPathSampler.xml that should run, where $(BEAST-package) is the package directory on your operating system.
See Manage_package#Installation_directories for finding out what $(BEAST-package) is on your machine.
Setting up an analysis for a cluster
The path sampler will write shell scripts (for Mac, Linux) and batch files (for Windows) in the rootdir, one for every thread. These shell scripts can be executed separately. Once all scripts have been executed you need to run the PathSampleAnalyser to get the marginal likelihood estimate.
NB these are actually the log of the marginal likelihood estimates, but that is usually omitted.
The number of batch files is determined by the number of threads used for running the XML. These batch files can be submitted to a cluster.
Calculating Bayes factors
At the end of a pathsampling run, the marginal likelihood estimate is reported, e.g.
You can use the value after “marginal L estimate” to compare models.
Say, the marginal likelihood estimate value for model 1 is X and the value for model 2 is Y, then the Bayes factor comparing model s1 and 2 is is X-Y. If this difference is positive, then the Bayes factor is in favour of model 1, if it is negative, it is in favour of model 2.
If you suspect one or more of the steps did not run for long enough judging from a low ESS, you can resume the chain for that path for a little longer by resuming the chain.
The path sampler will write shell scripts (for Mac, Linux) and batch files (for Windows) in each of the step dirs called resume.sh or resume.bat. Execute them for the appropriate step and then run the PathSampleAnalyser.
Older versions of SNAPP (<1.1.10) use a non-standard MCMC loop. To use path sampling with a SNAPP analysis, you have to pre-process your XML file as follows:
1. replace snap.MCMC with beast.core.MCMC
2. copy the stateDistribution outside the run element, for example copy it to just before the run element.
3. See if the XML file runs. If not, and if there are any attributes that are not recognised on the run element, remove these.
Now you can follow the instructions above to do a path-sampling analysis.
Running as mulitple jobs
- In the XML, add doNotRun=’true’ to the run element. This ensures that you can start the analysis by hand, giving you control over which computer runs the jobs.
- Run BEAST on the XML file with -threads X where X is the number of separate jobs you want to run (e.g. the number of cores on the computer).
- Check that in the root directory specified in the XML for creating step directories, there are now X files called run0.sh run1.sh …. runX.sh on Mac or Linux and run0.bat, run1.bat,…, runX.bat on Windows.
- Start each of these scripts.
- Run the PathSampleAnalyser to get an estimate of the marginal likelihood.
Make sure no newline missing in the XML. The line that says
should be two lines
If you still have problems running, check that the script file (assuming your rootdir is /tmp/step/
contains two lines, one starting with “cd” and one starting with “java”. If not, the newlines inserted in the XML did not get through to the batch file.
On windows note is the slashes in the rootdir. Not