What hardware for BEAST 2

1 February 2024 by Remco Bouckaert

When it comes to running BEAST 2, the choice of hardware can significantly impact the speed and efficiency of the analysis. When you are learning BEAST 2, many of the tutorials can be run on low-powered laptops, so there is no need to purchase expensive hardware just to get into BEAST 2 analyses.

In general, performance of BEAST 2 analyses is compute bound, not memory or disk bound, so prioritise investing in CPUs and/or GPUs. However, running many BEAST analyses at the same time in parallel can change that equation. Running more than 16 BEAST jobs on the same CPU can result in CPUs having to wait for data to be read from RAM to the CPU caches, which then results in lower performance with current CPUs. If you need to run many more jobs, it is probably better to get more than one computer for optimal performance.

Processor (CPU) vs Graphics Processing Unit (GPU)

The specific hardware requirements can vary based on the size of the datasets you plan to analyse. Whether to invest more in CPUs or in GPUs can be seen from the following table based on a large study at CIPRES, where they found the following settings to be optimal:

Data Data       Other beagle  
partitions patterns -threads -instances GPUs parameters  
Nucleotide data            
1 to 3 <750 1 1   -beagle_SSE  
1 to 3 750-2,999 3 3   -beagle_SSE  
1 to 3 3,000-9,999 6 6   -beagle_SSE  
1 to 3 10,000-39,999 1 1 1 -beagle_GPU  
1 to 3 >=40,000 4 4 4 -beagle_GPU -beagle_order 1,2,3,4  
4 to 19 <1,200 1 1   -beagle_SSE  
4 to 19 1,200-4,999 3 3   -beagle_SSE  
4 to 19 5,000-19,999 6 6   -beagle_SSE  
4 to 19 >=20,000 1 1 1 -beagle_GPU  
>=20 any 4 1   -beagle_SSE  
Amino acid data            
1 <5,000 1 1 1 -beagle_GPU  
1 >=5,000 4 4 4 -beagle_GPU -beagle_order 1,2,3,4  
2 to 39 any 1 1 1 -beagle_GPU  
>=40 any 24 1   -beagle_SSE  

(Source: Mark Miller’s post on the user list.)

Other than that, buy the fastest CPU and/or GPU you can afford. AMD’s threadrippers seem to provide good value for money at this point in time.

Random Access Memory (RAM)

Large phylogenetic analyses can require a substantial amount of RAM to hold and manipulate the data effectively. The ability to load the entire dataset into memory can significantly speed up the analysis process. Therefore, a system with a generous amount of RAM is beneficial for handling large datasets efficiently.

Storage

Ample storage is crucial for managing the trace and tree files produced by BEAST 2. Especially when doing simulation studies or lots of large analyses, having fast storage available can help in reducing post processing the files. Solid-state drives (SSDs) can provide faster read/write speeds compared to traditional hard disk drives (HDDs), which can be advantageous for handling large datasets and temporary files.

Operating System

The choice of operating system can also impact the performance of phylogenetic analyses. While many phylogenetic analysis software packages are compatible with multiple operating systems, it is essential to ensure that the chosen hardware is compatible with the preferred operating system.

Cooling

Lots of computations come with lots of heat begin produced. Water cooling is often necessary for overclocking high-end CPUs. Custom water cooling loops can handle higher temperatures and provide better cooling performance, making them suitable for pushing high-end hardware to its limits. Furthermore, water cooling is quite nice to have when you share the office with the computer, since it results in much quieter operation.

Network Connectivity

If you are working with large datasets or collaborating with others, having a fast and reliable network connection can be beneficial for data transfer and remote access to computing resources.

Summary

The best hardware for phylogenetic analyses should prioritize a powerful multi-core processor, ample RAM, fast and sufficient storage, and, in some cases, GPU acceleration.