[see also blog post on Path sampling with a GUI.]
To set up a path sampling analysis, you need to install the model-selection package (before BEAST version 2.1.2, it was part of the beastii package) and edit the XML for the analysis.
Essentially, you wrap the run element of the original analysis in a new run element for the path sampler and rename the original run element to mcmc, like so:
- rename the run element of the original XML analysis to mcmc (do not forget to rename the closing element </run> to </mcmc>)
- insert before the mcmc element the following fragment:
- insert </run> after the closing </mcmc> element.
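The three steps above can also be scripted. The following is a minimal Python sketch, not part of BEAST: the function name `wrap_run_for_path_sampling` and the `sampler_attrs` argument are hypothetical, and the attributes of the path sampler's run element must be supplied by you from the fragment and parameter list in this documentation. Any text content the new run element needs is not added by this sketch.

```python
import xml.etree.ElementTree as ET


def wrap_run_for_path_sampling(beast_xml: str, sampler_attrs: dict) -> str:
    """Rename the original <run> to <mcmc> and wrap it in a new <run>.

    sampler_attrs holds the attributes of the new (path sampler) run
    element; this sketch does not assume any particular attribute names.
    """
    root = ET.fromstring(beast_xml)
    for parent in root.iter():
        for index, child in enumerate(list(parent)):
            if child.tag == "run":
                # Step 1: rename run to mcmc (opening and closing tag).
                child.tag = "mcmc"
                # Steps 2 and 3: open a new run element before mcmc and
                # close it after </mcmc>.
                wrapper = ET.Element("run", sampler_attrs)
                wrapper.tail = child.tail
                child.tail = None
                parent.remove(child)
                wrapper.append(child)
                parent.insert(index, wrapper)
                return ET.tostring(root, encoding="unicode")
    raise ValueError("no <run> element found")
```

Note that ElementTree's default parser drops XML comments, so editing by hand may be preferable if the comments in your XML matter.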
The new run element has the following parameters:
You can run BEAST with multiple threads to make it run several chains in parallel (e.g. using the -threads option from the command line).
At the end of the run, the marginal likelihood estimate is printed. In addition, a subdirectory is created for each step in the directory specified by 'rootdir', containing the log files for that step.
To recalculate the marginal likelihood estimates from the logs, run the PathSampleAnalyser with the arguments
nrofsteps, alpha, rootdir and burninpercentage, which should match the ones specified in the XML file.
You can use Tracer to inspect the log files for the various steps to make sure that the burn-in used is sufficiently large.
In the $(BEAST-package)/model-selection/examples directory, there is a file testPathSampler.xml that should run, where $(BEAST-package) is the package directory on your operating system.
See Manage_package#Installation_directories for finding out what $(BEAST-package) is on your machine.
Setting up an analysis for a cluster
The path sampler will write shell scripts (for Mac, Linux) and batch files (for Windows) in the rootdir, one for every thread. These shell scripts can be executed separately. Once all scripts have been executed you need to run the PathSampleAnalyser to get the marginal likelihood estimate.
The number of batch files is determined by the number of threads used for running the XML. These batch files can be submitted to a cluster.
Calculating Bayes factors
At the end of a path sampling run, the marginal likelihood estimate is reported, e.g.
You can use the value after “marginal L estimate” to compare models.
Say the marginal likelihood estimate for model 1 is X and the value for model 2 is Y. The estimates are reported on a log scale, so the (log) Bayes factor comparing models 1 and 2 is X-Y. If this difference is positive, the Bayes factor is in favour of model 1; if it is negative, it is in favour of model 2.
If a low ESS suggests that one or more of the steps did not run for long enough, you can resume the chain for that step and let it run a little longer.
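As a small illustration of the comparison described above, here is a Python sketch. The function names are hypothetical, and it assumes both inputs are the values reported after "marginal L estimate", taken to be on a log scale.

```python
def log_bayes_factor(marginal_l_model1: float, marginal_l_model2: float) -> float:
    """Difference X - Y of the two marginal likelihood estimates."""
    return marginal_l_model1 - marginal_l_model2


def favoured_model(marginal_l_model1: float, marginal_l_model2: float) -> str:
    """Report which model the difference favours."""
    diff = log_bayes_factor(marginal_l_model1, marginal_l_model2)
    if diff > 0:
        return "model 1"
    if diff < 0:
        return "model 2"
    return "neither"
```

For example, with made-up estimates of -1234.5 for model 1 and -1240.0 for model 2, `log_bayes_factor` returns 5.5 and `favoured_model` returns "model 1".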
The path sampler will write shell scripts (for Mac, Linux) and batch files (for Windows) in each of the step dirs called resume.sh or resume.bat. Execute them for the appropriate step and then run the PathSampleAnalyser.
Older versions of SNAPP (<1.1.10) use a non-standard MCMC loop. To use path sampling with a SNAPP analysis, you have to pre-process your XML file as follows:
1. replace snap.MCMC with beast.core.MCMC
2. copy the stateDistribution outside the run element, for example copy it to just before the run element.
3. See if the XML file runs. If not, and if there are any attributes that are not recognised on the run element, remove these.
Now you can follow the instructions above to do a path-sampling analysis.
Running as multiple jobs
- In the XML, add doNotRun='true' to the run element. This ensures that you can start the analysis by hand, giving you control over which computer runs the jobs.
- Run BEAST on the XML file with -threads X where X is the number of separate jobs you want to run (e.g. the number of cores on the computer).
- Check that the root directory specified in the XML for creating step directories now contains X files called run0.sh, run1.sh, …, runX.sh on Mac or Linux and run0.bat, run1.bat, …, runX.bat on Windows.
- Start each of these scripts.
- Run the PathSampleAnalyser to get an estimate of the marginal likelihood.
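If all jobs are to run on one machine, starting the scripts can be automated. The following Python sketch is only an illustration under that assumption: the helper names `find_job_scripts` and `start_jobs` are hypothetical, and the only thing taken from the steps above is the run0, run1, … naming of the scripts in the root directory. On a cluster you would submit each script to the scheduler instead.

```python
import subprocess
import sys
from pathlib import Path


def find_job_scripts(rootdir, suffix=None):
    """Return the run<N> scripts in rootdir, ordered by job number."""
    if suffix is None:
        # Batch files on Windows, shell scripts on Mac and Linux.
        suffix = ".bat" if sys.platform.startswith("win") else ".sh"
    scripts = [
        p for p in Path(rootdir).glob(f"run*{suffix}")
        if p.stem[3:].isdigit()  # keep run0, run1, ... only
    ]
    return sorted(scripts, key=lambda p: int(p.stem[3:]))


def start_jobs(rootdir):
    """Start every job script in parallel and wait for all to finish."""
    procs = []
    for script in find_job_scripts(rootdir):
        if script.suffix == ".bat":
            cmd = ["cmd", "/c", str(script)]
        else:
            cmd = ["sh", str(script)]
        procs.append(subprocess.Popen(cmd))
    # Exit codes, one per script; non-zero suggests a failed job.
    return [p.wait() for p in procs]
```

Once `start_jobs` has returned, the PathSampleAnalyser still has to be run separately to obtain the marginal likelihood estimate.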
Make sure no newline is missing in the XML. The line that says
should be two lines
If you still have problems running, check that the script file (assuming your rootdir is /tmp/step/
contains two lines, one starting with “cd” and one starting with “java”. If not, the newlines inserted in the XML did not get through to the batch file.
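The check described above can be sketched in Python as follows. The function name is hypothetical, and the script's contents are passed in as text so that no particular file name inside the rootdir is assumed.

```python
def has_cd_and_java_lines(script_text: str) -> bool:
    """True if one line starts with 'cd' and another starts with 'java'."""
    lines = [line.strip() for line in script_text.splitlines()]
    has_cd = any(line.startswith("cd") for line in lines)
    has_java = any(line.startswith("java") for line in lines)
    return has_cd and has_java
```

If the newline was lost, both commands end up on a single line starting with "cd", no line starts with "java", and the function returns False.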
On Windows, pay attention to the slashes in the rootdir. Not