4 July 2018 by Remco Bouckaert

This post is about what to do when the message "Fatal exception: Could not find a proper state to initialise. Perhaps try another seed" appears just after starting BEAST. Sometimes simply restarting with another seed solves the problem (as the error message suggests), but if that does not help, the hints below may help you fix the issue.
When this error is displayed, a decomposition of the posterior into all of its components is shown as well, for example
The message shows for each component of the posterior its value (actually the log of its value) and its previous value -- these should be the same if everything works fine, but since you see this message, this is probably not so.
When the posterior (shown in the first line) is -Infinity, the first thing to look for is which of the components is -Infinity. If it is a prior, we have to find out what the prior represents in order to fix it; if it is a likelihood, see the common issues with likelihoods further down.
Common issues with priors
Ancestral state reconstruction with too many states
In the error message listed above, the nonZeroRatePrior.s:location-prior is zero (so the log of it is -Infinity). This usually happens when there are many states for the discrete trait model. The prior on the number of non-zero rates is rather tight, and initially all rates are non-zero, so there are n(n-1)/2 non-zero rates (for n discrete states). The probability of that many non-zero rates can become zero due to numeric issues. To fix this, increase the mean of the prior so the analysis starts. If you really want to keep the prior as is, you can start with a more relaxed prior, run the chain for a little while until many rates become zero (which usually happens quite quickly), then stop the chain, change the prior back and resume.
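To get a feel for how a tight prior can end up at zero, here is a minimal Python sketch. It assumes the prior on the number of non-zero rates is a Poisson distribution whose probability is evaluated in double precision before the log is taken. Both the Poisson form and the two means used (0.693 as a "tight" value, 50 as a "relaxed" one) are illustrative assumptions, not values read from the error message.

```python
import math

def initial_nonzero_rates(n_states):
    """Number of rates that start out non-zero: n(n-1)/2 for n discrete states."""
    return n_states * (n_states - 1) // 2

def poisson_log_pmf(k, mean):
    """Log of the Poisson probability of k events, computed in log space."""
    return -mean + k * math.log(mean) - math.lgamma(k + 1)

def poisson_pmf(k, mean):
    """Poisson probability in real space; underflows to 0.0 when it is too small."""
    return math.exp(poisson_log_pmf(k, mean))

if __name__ == "__main__":
    for n_states in (5, 10, 20, 40):
        k = initial_nonzero_rates(n_states)
        # Hypothetical tight and relaxed prior means, for comparison only.
        for mean in (0.693, 50.0):
            print(n_states, k, mean, poisson_pmf(k, mean))
```

With 20 states there are 190 initial rates; under the tight mean the probability underflows to exactly zero in this sketch, while the larger mean keeps it representable, which is the effect that increasing the mean of the prior relies on.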
Priors on parameters with starting value out of range
If a prior on a parameter is causing problems, it is probably because the starting value has zero probability under that prior. If the starting value does lie within the support of the prior, check the bounds of the parameter -- possibly, the starting value is outside of these bounds. In BEAUti, the starting value of the parameter and its lower and upper bounds are displayed next to the prior.
Incompatible MRCA priors
When an MRCA prior is listed as -Infinity, the starting tree is not compatible with the MRCA prior, so you should change the starting tree. However, it is also possible that the MRCA priors are incompatible with each other, so that no valid starting tree exists. This happens when one MRCA prior requires a clade -- say A,B,C -- to be monophyletic, while another MRCA prior requires a partially overlapping set of taxa -- say B,C,D -- to be monophyletic as well. Obviously, (A,B,C) and (B,C,D) cannot both be monophyletic at the same time. This usually happens when a taxon is left out of one of the two clades by mistake. For example, taxon A could be added to the second clade, since (A,B,C) and (A,B,C,D) can be monophyletic at the same time.
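The compatibility rule can be checked mechanically: two monophyly constraints can hold in the same tree only if their taxon sets are disjoint or one contains the other. The following Python sketch applies that rule to every pair of clades; the function names and the dictionary layout are made up for illustration and are not part of BEAST.

```python
def clades_compatible(clade_a, clade_b):
    """Two monophyletic clades can coexist if they are disjoint or nested."""
    a, b = set(clade_a), set(clade_b)
    return a.isdisjoint(b) or a <= b or b <= a

def find_conflicts(clades):
    """Return all pairs of clade names whose taxon sets partially overlap."""
    names = sorted(clades)
    return [
        (first, second)
        for i, first in enumerate(names)
        for second in names[i + 1:]
        if not clades_compatible(clades[first], clades[second])
    ]

if __name__ == "__main__":
    # Hypothetical MRCA priors, keyed by name, with their taxon sets.
    priors = {"clade1": {"A", "B", "C"}, "clade2": {"B", "C", "D"}}
    print(find_conflicts(priors))      # the two clades conflict
    priors["clade2"].add("A")          # add the taxon that was left out
    print(find_conflicts(priors))      # no conflicts remain
```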
Single taxon MRCA prior
When an MRCA prior contains only a single taxon, any calibration distribution is defined on the most recent common ancestor of the set of taxa, which with a single taxon is the taxon itself. That means the distribution is defined on the leaf, which makes it impossible for BEAST to find a starting state. Setting the `useOriginate` flag to true, or adding more taxa to the MRCA prior, may fix this.
Tree prior has numerical issues
In this case, the posterior will be NaN instead of -Infinity, as in this example:
Some tree priors do not behave well when ages are in units that require large numbers (larger than about 500). This can easily be solved by changing the units: if you use years, use millennia instead; if you use millennia, use millions of years instead; if you use millions of years, use billions of years instead. Note that when you change the units of all time information, priors, in particular MRCA priors, need to be adjusted as well. Be aware that for normal distributions both mean and standard deviation must be scaled, but for log normal distributions only the mean (in real space) needs to be scaled.
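The rescaling rules for priors can be written down as a small Python sketch. The function names are hypothetical, and `factor` is the conversion from old to new units (for example 0.001 when going from years to millennia). The last function covers the case where the log normal mean is specified in log space rather than real space: there the mean shifts by the log of the factor, which follows from the definition of the log normal distribution rather than from anything BEAST-specific.

```python
import math

def rescale_normal(mean, sd, factor):
    """Normal prior: both the mean and the standard deviation are scaled."""
    return mean * factor, sd * factor

def rescale_lognormal_real_mean(mean_real, s, factor):
    """Log normal prior with mean in real space: scale the mean, keep S."""
    return mean_real * factor, s

def rescale_lognormal_log_mean(m_log, s, factor):
    """Log normal prior with mean in log space: shift M by log(factor), keep S."""
    return m_log + math.log(factor), s

if __name__ == "__main__":
    years_to_millennia = 0.001
    # Hypothetical calibrations, originally expressed in years.
    print(rescale_normal(1500.0, 100.0, years_to_millennia))
    print(rescale_lognormal_real_mean(1500.0, 0.5, years_to_millennia))
    print(rescale_lognormal_log_mean(math.log(1500.0), 0.5, years_to_millennia))
```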
Common issues with likelihoods
Substitution model (in)compatible with BEAGLE
Some substitution models (e.g. the pseudo Dollo model) only work with BEAGLE, while others (e.g. the pseudo Dollo covarion model) only work without BEAGLE. The first case is solved by forcing BEAST to use BEAGLE (this is the default behaviour if BEAGLE is installed), the second by forcing BEAST to use the Java tree likelihood instead of BEAGLE (using the `-java` option when you start BEAST from the command line).
Starting clock rate too low
If the starting tree is relatively young for the initial clock rate, so that the rate is too low to explain the differences between the sequences over such short branches, this may result in zero tree likelihoods. This is easily fixed by starting with a higher clock rate.
No scaling enforced
Using the `-beagle_scaling` flag, you can force the tree likelihood not to use scaling, which can speed up the likelihood calculation, but can also cause the starting likelihood to be zero if the starting tree is randomly chosen. One crude fix is to supply a better (higher posterior) starting tree instead of a random one. Alternatively, you can run BEAST for a little while with scaling enabled, then stop and resume without scaling. The second approach keeps the starting point of the MCMC random (assuming a random starting tree is used), so it still allows you to check that the MCMC converges from different starting locations, as it should.
Older versions importing fasta
With older versions of BEAST (v2.4 and before), fasta files with nucleotide sequences were not always imported correctly, resulting in sequences being interpreted as having 20 states (`totalcount="20"` in the XML) instead of 4 (`totalcount="4"`). This results in likelihoods being `-Infinity`, and can be solved by changing the `totalcount` attribute in the XML, or by upgrading to a later version.
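If there are many sequences, editing `totalcount` by hand is tedious. Below is a minimal Python sketch that rewrites the attribute on every `sequence` element. It assumes all sequences in the file are nucleotide data and that the attribute sits on elements named `sequence`; the sample XML is a stripped-down, hypothetical fragment rather than a complete BEAST file. Keep a backup of the original XML, since the standard library parser used here drops comments.

```python
import xml.etree.ElementTree as ET

def fix_totalcount(xml_text, expected="4"):
    """Set totalcount to `expected` on every sequence element that differs.

    Returns the rewritten XML as a string and the number of elements changed.
    Elements without a totalcount attribute are left untouched.
    """
    root = ET.fromstring(xml_text)
    changed = 0
    for seq in root.iter("sequence"):
        if seq.get("totalcount") not in (None, expected):
            seq.set("totalcount", expected)
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed

if __name__ == "__main__":
    # Hypothetical minimal fragment; a real BEAST XML contains much more.
    sample = (
        '<beast><data id="alignment">'
        '<sequence taxon="t1" totalcount="20" value="ACGT"/>'
        '<sequence taxon="t2" totalcount="20" value="ACGA"/>'
        '</data></beast>'
    )
    fixed, n_changed = fix_totalcount(sample)
    print(n_changed)
    print(fixed)
```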
Let me know if I missed anything and I'll add it to the list.