1 December 2024 by Remco Bouckaert
Transmission trees are phylogenies that represent infections spreading through a population. They contain transmission events, each representing one host infecting another. The BREATH package can simulate such trees under the transmission likelihood (Colijn et al., 2024), which is useful for testing models.
The transmission tree simulator is available as the TransmissionTreeSimulator app in the BREATH package for BEAST 2.
The parameters that determine the shape and size of the tree are endTime, popSize, and the sampling and transmission hazards.
The sampling hazard consists of a gamma distribution and a multiplier that can be interpreted as the probability of a host being sampled.
Like the sampling hazard, the transmission hazard consists of a gamma distribution. It also comes with a multiplier representing the average number of other hosts infected by a host, which sets the scale of the tree.
Note that not all hosts will be sampled: some hosts remain unsampled and do not end up in the output tree.
Simulated tree from time 0 to time t_e, indicated on the x-axis. Left: small simulated tree where blue boxes indicate hosts and red branches form the tree ending in samples; red and green branches together are the branches generated by the simulator as within-host coalescent trees, and red, green and black branches together form the underlying phylogeny. Right: tree output by the simulator. Hosts D to H are not sampled, so they are removed from the simulator output. Host E becomes an unsampled host infected by A and infecting C. Hosts G and H form a block of size 2 of unknown hosts, while hosts D and F leave no trace and remain unknown unknowns.
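Both hazards described above have the same form: a constant multiplier times a gamma density over time since infection. The following is a minimal sketch of that form, assuming the gamma is parameterised by shape and rate (as the option names such as transmissionShape and transmissionRate suggest); the function names and example values are hypothetical and not BREATH's own code. Because a gamma density integrates to 1, integrating the transmission intensity over time recovers the multiplier, which is why it can be read as the average number of onward infections (ignoring truncation at endTime).

```python
import math


def gamma_pdf(t: float, shape: float, rate: float) -> float:
    """Gamma density with the given shape and rate, evaluated at time t."""
    if t <= 0.0:
        return 0.0
    return rate**shape * t ** (shape - 1) * math.exp(-rate * t) / math.gamma(shape)


def intensity(t: float, shape: float, rate: float, constant: float) -> float:
    """Hypothetical hazard at time t since infection: multiplier times gamma density."""
    return constant * gamma_pdf(t, shape, rate)


def integrate(f, upper: float, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of f over [0, upper]."""
    h = upper / steps
    return h * sum(f((i + 0.5) * h) for i in range(steps))


# Hypothetical transmission settings: shape 2, rate 4 (mean 0.5), multiplier 2.
shape, rate, constant = 2.0, 4.0, 2.0
expected = integrate(lambda t: intensity(t, shape, rate, constant), upper=20.0)
print(f"expected onward infections per host: {expected:.3f}")
```

The same function describes the sampling hazard; there the multiplier integrates to the probability of a host being sampled.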
The simulator can be conditioned on producing trees with a fixed number of taxa, but by default trees with varying numbers of taxa are produced. Depending on the parameter settings, it is not uncommon for most trees to have only 1 taxon. Take care when setting parameter values: especially when conditioning on a large number of taxa, it may take a long time to generate such trees if the parameters are not compatible with trees of that size.
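Conditioning on a fixed number of taxa can be pictured as rejection sampling: simulate, and discard any tree with the wrong taxon count. The toy below is a hedged illustration of that idea and of why small taxon counts dominate; it is not BREATH's algorithm. It drops timing, gamma hazards and within-host trees, and replaces endTime with a hypothetical max_hosts cap. Each host infects a Poisson(r) number of others and is sampled with probability p_sample; all names here are hypothetical.

```python
import math
import random
from collections import Counter


def poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson variate with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def toy_outbreak(rng: random.Random, r: float, p_sample: float, max_hosts: int = 200) -> int:
    """Number of sampled hosts in one toy branching-process outbreak.

    max_hosts is a hypothetical stand-in for endTime that stops runaway outbreaks.
    """
    pending, hosts, sampled = 1, 0, 0
    while pending > 0 and hosts < max_hosts:
        pending -= 1
        hosts += 1
        if rng.random() < p_sample:
            sampled += 1
        pending += poisson(rng, r)  # onward infections from this host
    return sampled


def attempts_until(rng, r, p_sample, target, max_attempts=1000):
    """Rejection sampling: outbreaks needed to hit exactly `target` sampled hosts (None if never)."""
    for attempt in range(1, max_attempts + 1):
        if toy_outbreak(rng, r, p_sample) == target:
            return attempt
    return None


rng = random.Random(1)
counts = Counter(toy_outbreak(rng, r=1.2, p_sample=0.8) for _ in range(2000))
print("most common sampled counts:", counts.most_common(3))
print("attempts to get 5 sampled hosts:", attempts_until(rng, 1.2, 0.8, target=5))
```

Raising the target makes exact hits rarer, so the number of rejected runs grows, which mirrors the warning above about conditioning on large taxon counts.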
Installing BREATH
- Start BEAUti.
- Click the File => Manage packages menu item.
- Select BREATH in the list of packages and then click the Install button. If BREATH is not in the list of packages, you must add a package repository first: in the package manager, click the Package repositories button, then click Add URL in the window that pops up, and enter https://raw.githubusercontent.com/CompEvol/CBAN/master/packages-extra-2.7.xml in the text field. Then return to the package manager window, where the BREATH package should appear.
- Close BEAUti: it needs to restart to pick up the new packages.
Using the simulator
To use the command-line version of the simulator, run the applauncher application (which is part of the BEAST 2 distribution) from a terminal or command prompt. Any of the options listed below can be used.
Alternatively, start BEAUti (which is also part of the BEAST 2 distribution), select the File/Launch apps menu, and select TransmissionTreeSimulator from the list of applications. Click the launch button to start a GUI version of the simulator, which looks like so:
Simulator options
The simulator has the following options:
- endTime (real number): end time of the study, which determines how long the outbreak runs. Any hosts not sampled by the end time are pruned from the tree.
- popSize (real number): population size governing the coalescent process that determines coalescent times within a single host.
- sampleShape (real number): shape parameter of the sampling intensity function.
- sampleRate (real number): rate parameter of the sampling intensity function.
- sampleConstant (real number): constant multiplier of the sampling intensity function.
- transmissionShape (real number): shape parameter of the transmission intensity function.
- transmissionRate (real number): rate parameter of the transmission intensity function.
- transmissionConstant (real number): constant multiplier of the transmission intensity function.
- out (file name): output file for trees in Newick format; trees are printed to stdout if not specified (optional).
- trace (file name): trace output file with end time, tree heights and tree lengths; printed to stdout if not specified (optional).
- seed (long): random number seed used to initialise the random number generator (optional).
- maxAttempts (integer): maximum number of attempts to generate coalescent sub-trees (default: 1000).
- taxonCount (integer): generate trees with taxonCount taxa. Ignored if negative, so different trees can have different numbers of taxa (default: -1).
- maxTaxonCount (integer): reject any tree with more than this number of taxa. Ignored if negative (default: -1).
- treeCount (integer): number of trees to generate (default: 1).
- directOnly (true/false): consider direct infections only; if false, block counts are ignored (default: true).
- quiet (true/false): suppress some screen output, such as statistics on how many trees have a given taxon count (default: false).
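When scripting many runs, it can help to assemble the command line programmatically. The sketch below assumes that applauncher passes options to TransmissionTreeSimulator in the "-name value" form common to BEAST 2 apps; that syntax is an assumption here, so check the simulator's own help output before relying on it. The helper name simulator_command is hypothetical, and the parameter values are illustrative only, picked to satisfy the guidelines in the next section.

```python
import shlex


def simulator_command(**options) -> list[str]:
    """Build an applauncher argument list, ASSUMING '-name value' option syntax."""
    args = ["applauncher", "TransmissionTreeSimulator"]
    for name, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # booleans written as true/false
        args += [f"-{name}", str(value)]
    return args


# Illustrative values only, not recommended defaults.
cmd = simulator_command(
    endTime=6, popSize=0.05,
    sampleShape=4, sampleRate=2, sampleConstant=0.8,
    transmissionShape=2, transmissionRate=2, transmissionConstant=2,
    treeCount=10, seed=127, out="trees.newick", directOnly=True,
)
print(shlex.join(cmd))
```

With applauncher on the PATH, passing cmd to Python's subprocess.run(cmd, check=True) would launch the run; building the list instead of a single string avoids shell-quoting problems with file names.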
How to choose simulation parameters
Not all combinations of parameters lead to sensible trees. It is quite possible that only single-taxon trees are generated. Even with sensible parameter combinations, one of the modes of the taxon count distribution will be near 1.
- Choose transmissionConstant in [1, 4]. This sets the mean number of transmission events per host and determines the scale of the tree.
- Choose sampleConstant in (0.5, 1), to sample enough cases that person-to-person transmission inference is likely to be a reasonable task.
- Choose transmissionShape/transmissionRate to set the mean inter-infection time (ignoring sampling).
- Choose sampleShape/sampleRate > transmissionShape/transmissionRate so that, on average, sampling occurs after the mean generation time. Otherwise the transmission chains are likely to die out quickly.
- Choose endTime to reflect the approximate number of transmission generations. Keep in mind that if the mean time to sampling is considerably greater than the mean time to infection, and transmissionConstant is high, the number of infections could grow very large.
- Choose popSize so that the probability that lineages coalesce within the required time is high, for example popSize < -(transmissionRate/transmissionShape) log(0.95).
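Several of the guidelines above are simple inequalities, so they can be checked mechanically before launching a long run. The sketch below encodes the ranges for transmissionConstant and sampleConstant and the ordering of the two gamma means; the popSize rule is not encoded. The class and function names are hypothetical, and rough_generations rests on an added assumption that the mean of the transmission gamma approximates the generation time.

```python
from dataclasses import dataclass


@dataclass
class SimParams:
    """Hypothetical container for the hazard-related simulator options."""
    endTime: float
    sampleShape: float
    sampleRate: float
    sampleConstant: float
    transmissionShape: float
    transmissionRate: float
    transmissionConstant: float


def check(p: SimParams) -> list[str]:
    """Return warnings for guideline violations (empty list if none)."""
    warnings = []
    if not 1.0 <= p.transmissionConstant <= 4.0:
        warnings.append("transmissionConstant outside [1, 4]")
    if not 0.5 < p.sampleConstant < 1.0:
        warnings.append("sampleConstant outside (0.5, 1)")
    mean_transmission = p.transmissionShape / p.transmissionRate
    mean_sampling = p.sampleShape / p.sampleRate
    if mean_sampling <= mean_transmission:
        warnings.append("mean sampling time is not after mean transmission time")
    return warnings


def rough_generations(p: SimParams) -> float:
    """ASSUMPTION: generations ~ endTime / mean of the transmission gamma."""
    return p.endTime / (p.transmissionShape / p.transmissionRate)


good = SimParams(endTime=6, sampleShape=4, sampleRate=2, sampleConstant=0.8,
                 transmissionShape=2, transmissionRate=2, transmissionConstant=2)
bad = SimParams(endTime=6, sampleShape=1, sampleRate=2, sampleConstant=0.3,
                transmissionShape=2, transmissionRate=2, transmissionConstant=6)
print(check(good), rough_generations(good))
print(check(bad))
```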
After choosing the hazard function parameters, a quick sanity check is to plot the gamma densities of the sampling and transmission hazards in the same plot. This plot shows how likely it is for a transmission to happen at a given time and how likely it is for a host to be sampled. For an exponentially growing process, the mean of the sampling hazard should be larger than that of the transmission hazard.
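A dependency-free sketch of the data behind that plot: it evaluates both gamma densities (shape/rate parameterisation, an assumption based on the option names) on an evenly spaced grid. The function names and the (shape, rate) pairs are hypothetical examples.

```python
import math


def gamma_pdf(t: float, shape: float, rate: float) -> float:
    """Gamma density (shape/rate parameterisation) at time t."""
    if t <= 0.0:
        return 0.0
    return rate**shape * t ** (shape - 1) * math.exp(-rate * t) / math.gamma(shape)


def density_table(transmission, sampling, t_max, n=10):
    """Rows of (t, transmission density, sampling density) on an evenly spaced grid."""
    rows = []
    for i in range(1, n + 1):
        t = t_max * i / n
        rows.append((t, gamma_pdf(t, *transmission), gamma_pdf(t, *sampling)))
    return rows


transmission = (2.0, 2.0)  # hypothetical (shape, rate): mean 1.0
sampling = (4.0, 2.0)      # hypothetical (shape, rate): mean 2.0
rows = density_table(transmission, sampling, t_max=5.0)
for t, d_trans, d_samp in rows:
    print(f"t={t:3.1f}  transmission={d_trans:.3f}  sampling={d_samp:.3f}")
```

Passing the three columns to a plotting library such as matplotlib gives the overlay plot; with these values the sampling density sits to the right of the transmission density (means 2.0 versus 1.0).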
Reducing transmissionConstant makes a big (nonlinear) difference to the size of the process.
Changing sampleConstant will not make much difference to transmission or the size of the process (though it will change the number of sampled cases, in a linear way), because if sampling happens, it is most likely to happen after the peak in transmission anyway.
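A toy branching-process calculation shows the contrast. It assumes each host produces on average transmissionConstant new infections per generation and is sampled independently with probability sampleConstant; it ignores timing, endTime and the gamma hazards, so it is a sketch of the scaling only, and the function names are hypothetical.

```python
def expected_hosts(r: float, generations: int) -> float:
    """Expected total hosts over generations 0..g with mean offspring r per host."""
    return sum(r**g for g in range(generations + 1))


def expected_sampled(r: float, p_sample: float, generations: int) -> float:
    """Expected sampled hosts if each host is sampled independently with probability p_sample."""
    return p_sample * expected_hosts(r, generations)


for r in (3.0, 2.0, 1.5):
    print(f"transmissionConstant={r}: {expected_hosts(r, 5):.1f} hosts over 5 generations")
for p in (0.8, 0.4):
    print(f"sampleConstant={p}: {expected_sampled(2.0, p, 5):.1f} sampled hosts")
```

Halving the transmission multiplier from 3 to 1.5 shrinks the expected outbreak from 364 to about 21 hosts over five generations, while halving the sampling multiplier from 0.8 to 0.4 simply halves the expected number of sampled hosts (50.4 to 25.2).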
Troubleshooting
- Do the transmission chains fail to take off, e.g. there are no more cases to simulate, but the maximum sampling times are much less than endTime?
  Solution: increase sampleShape/sampleRate (delay sampling until longer after transmission), increase transmissionConstant, or, if the transmission process takes off but there are too few generations, increase endTime.
- Is the number of cases exploding exponentially, so that there are too many?
  Solution: see above, but do the opposite.
- Are within-host coalescent trees rejected because they don't coalesce in time?
  Solution: decrease popSize.
- Do within-host coalescent trees have very, very short branch lengths? (This could be OK.)
  Solution: increase popSize.
References
Colijn C, Hall MD, Bouckaert R. Taking a BREATH (Bayesian Reconstruction and Evolutionary Analysis of Transmission Histories) to simultaneously infer phylogenetic and transmission trees for partially sampled outbreaks. bioRxiv. 2024:2024-07. doi:10.1101/2024.07.11.603095