Background
Analyzing and interpreting the extensive output generated in Bayesian phylodynamic analyses can sometimes be a challenging task. This tutorial aims to provide insights into scientific figure design techniques and resources in R that can effectively visualize BEAST2 results.
By employing appropriate visualizations, we can enhance our understanding of the evolutionary processes and facilitate the effective communication of findings.
In most BEAST2 analyses, the results consist of two files: the posterior sample of phylogenetic trees (.trees file) and the parameters (.log file). These are long files containing MCMC samples and we will need to summarise them to be able to understand the results.
Programs used in this Exercise
BEAST2 - Bayesian Evolutionary Analysis Sampling Trees 2
BEAST2 (http://www.beast2.org) is a free software package for Bayesian evolutionary analysis of molecular sequences using MCMC and strictly oriented toward inference using rooted, time-measured phylogenetic trees. This tutorial assumes that the files came from analyses using the BEAST2 version v2.7.x (Bouckaert et al., 2014; Bouckaert et al., 2019).
R and RStudio
R (https://www.r-project.org) is a programming language and software environment designed for statistical computing, data analysis, and graphical visualization. It is widely used in bioinformatics, data science, and statistical modeling due to its extensive package ecosystem and flexibility.
RStudio (https://www.rstudio.com) is an integrated development environment (IDE) for R that provides a user-friendly interface to write, run, and debug R code. It simplifies data analysis workflows by offering features like script editors, project management, and built-in plotting tools. It is optional but will provide better user experience.
We will install R packages for data processing and visualisation along the way.
Motivation and Tips for BEAST2 results visualization
When generating scientific figures based on abundant Bayesian MCMC results, there are several important considerations to keep in mind.
- Display Uncertainty: Bayesian analysis provides posterior distributions that capture the uncertainty of parameter estimates, including tree topology, branch lengths, and evolutionary rates. Displaying this uncertainty is important for robust interpretation. This can be done by showing the 95% HPD interval around key parameters, like node heights with error bars, or visualizations such as boxplots or density plots for the posterior probability of the parameters. Avoid presenting point estimates without any indication of the associated uncertainty.

- Clarity of Tree Representations: Trees are complex objects depicting numerous relationships between samples. Extracting valuable insights from trees heavily relies on a clear representation. Avoid cluttering the tree with too many branches or overlapping labels. Use annotations to highlight key clades or important nodes in the tree. Don’t forget to include the scale or axis in your tree.
- Tell a story: With the abundance of information in BEAST2 results, it is important to have a clear objective for the main result you wish to communicate. Otherwise, the figure may become excessively complex and lose its meaning. Utilize color, shape, and size variations to emphasize significant patterns. Incorporate the temporal and geographical information present in the parameters, such as utilizing skyline plots or geographical maps instead of visualizing individual parameters. Prior to plotting, summarize or transform the results if needed.

- Experiment. There is not just one way of visualizing a set of results, so do not be afraid of experimenting with different type of plots. Interactive visualization tools can also be useful to interpret and communicate complex data. These tools enable exploration and manipulation of the results in real time, so they will work well for online and interactive presentations (but not for in a PDF document).
Remember, figures should be self-explanatory and provide sufficient context to be understood. Seek feedback from colleagues or mentors to improve the clarity and effectiveness of your figures.
Next, you will find a few code snippets with examples to visualize the most common type of plots from BEAST2 results. It is not an extensive tutorial (you can find a lot of good resources on that at the end of each section). We assume that you have a working knowledge of R and ggplot.
How to visualize BEAST2 summary tree in R with ggtree
To visualize trees there are multiple tools and packages available: FigTree, Densitree, IcyTree, R packages: ggtree, phytools, ape, python packages: Biopython Phylo, ETE toolkits and many more. IcyTree is great for quick visualization and exploration of trees. For publication-ready figures, you can use FigTree and ggtree or other R and python packages. Densitree is a program for qualitative analysis of set of trees. In this tutorial we will focus in R and the ggtree package.
1. Install and load the required packages
The R code below will install the metapackage treedataverse
(Yu, 2025) that includes several useful packages for processing and visualizing trees in R: treeio
(Wang et al., 2020), ggtree
(Yu et al., 2017), ggtreeExtra
(missing reference) and tidytree
(Yu, 2022). We will also use library ggsci
which has nice color palettes (Xiao, 2024).
# Install packages
install.packages("BiocManager")
install.packages("ggsci")
BiocManager::install("YuLab-SMU/treedataverse")
# Load packages
library(treedataverse)
library(ggsci)
2. Read the BEAST summary tree and metadata
In general, we will visualize a summary tree from the posterior samples of trees, e.g. we can create a Maximum Clade Credibility (MCC) tree with TreeAnnotator.
tree_file <- "https://github.com/Taming-the-BEAST/Visualizing-BEAST2-results/raw/refs/heads/main/precooked_runs/Primates.MCC.tree"
tree <- read.beast(tree_file)
3. Plot the tree with backward in time axis
p <- ggtree(tree, mrsd = NULL) +
theme_tree2() + # This ggtree theme will add a forward in time x-axis
labs(caption="million of years")
p <- revts(p) # This will reverse the time in the axis
p

Usually, with epidemiological datasets, you will want a forward in time x-axis in calendar time. In this case, use theme_tree2()
and the parameter mrsd
in the in ggtree()
function equal to the most recent sample date.
4. Customize tree labels
p1 <- p +
geom_tiplab(size = 3, hjust = -0.1) + # add the tip labels
geom_nodelab(aes(label = round(posterior, 3)),
hjust = 1.5, vjust = -0.5, size = 3) + # add posterior value to internal nodes
xlim_tree(20) # adjust x axis to show full tip names
p1

5. Add error bars
p1 <- p1 +
geom_range(range = 'height_0.95_HPD',
color = pal_jco("default")(1), alpha = 0.2, size = 2)
p1

6. Customize tips and color
p1 <- p1 +
geom_point(aes(color = isTip)) + # add tip points and color by tip/internal node
theme(legend.position = "none") + # hide the legend
scale_color_jco() # select color palette
p1

7. Annotate the tree with external data
If you have external data you can add it to your ggtree object and use it to highlight specific branches or clades.
# Data with cleaned label and classification
# Usually you will have this information in a file that you can directly load:
# metadata <- read_csv(metadata_file)
metadata <- as_tibble(tree) %>%
filter(!is.na(label)) %>%
mutate(new_label = gsub("M.", "Macaca", gsub("_", " ", tree@phylo$tip.label)),
group = c("Apes", "Apes", "Apes", "Lemurs",
"Old World monkey", "Old World monkey", "Old World monkey", "Old World monkey",
"Apes", "Apes", "New World monkey", "Tarsiers")) %>%
group_by(group) %>%
mutate(clade = MRCA(tree, node)) %>%
ungroup %>%
select(node, label, new_label, clade, group)
# Highlight clades
p2 <- p +
geom_hilight(data = metadata,
mapping = aes(node = clade, fill = group),
type = "gradient", gradient.direction = 'rt',alpha = 0.8) +
geom_tree() +
scale_fill_npg(name = "")
# Add previous layers on top of highlighted clades
p2 <- p2 %<+% metadata + # you can also left join external data to use it
geom_tiplab(aes(label = new_label),
size = 3, hjust = -0.1) + # add the tip labels
xlim_tree(30) +
geom_nodelab(aes(label = round(posterior, 3)),
hjust = 1.5, vjust = -0.5, size = 3) + # add posterior value to internal nodes
geom_range(range = 'height_0.95_HPD',
color = "grey30", alpha = 0.2, size = 2) +
theme(legend.position = "top") + # This ggtree theme will add a forward in time x-axis
labs(caption="million of years") # This will reverse the time in the axis
p2

Much more in ggtree
The book “Data Integration, Manipulation and Visualization of Phylogenetic Trees” by Guangchuang Yu (author of ggtree
) is a great guide to visualization of phylogenetic trees with ggtree. It has a lot of examples and a clear structure to navigate it. There are many more things you can do, like adding heatmaps as annotations for your tips, or including images. Go and check it out!
How to visualize parameter posterior samples in R with beastio and ggplot2
The posterior samples of the model parameters are in the .log
file. This file is a text file that we can read into R and manipulate as a dataframe. However, there are packages that facilitate the data manipulation and processing of log files. In the following examples we will use the beastio
(du Plessis, 2023) package to this purpose, and ggplot2
(Wickham, 2016) for data visualization. We will also make use from tidyr
(Wickham et al., 2023) and tibble
(M<U+00FC>ller & Wickham, 2023) packages for data manipulation.
1. Install and load the required packages
BiocManager::install("laduplessis/beastio")
library(tidyr)
library(tibble) # for data frame manipulation
library(beastio)
2. Read the .log file
file <- "https://github.com/Taming-the-BEAST/Visualizing-BEAST2-results/raw/refs/heads/main/precooked_runs/primate-mtDNA_long.log"
# Read log file with 10% burnin
trace_mcmc <- beastio::readLog(file, burnin = 0.1, as.mcmc = TRUE) # as mcmc object
trace <- beastio::readLog(file, burnin = 0.1, as.mcmc = FALSE) # as dataframe for ggplot
# Get 95% HPDs for all parameters
hpds <- beastio::getHPDMedian(trace_mcmc)
3. Tidy the data and plot the posterior distribution
Depending on the figure you would like to create, you may need to tidy the data to make it suitable for ggplot
(more on tidy data).
# Select mutation rates parameters and prepare the data for plotting
mut_rates <- beastio::getLogFileSubset(trace, pattern = "mutationRate") %>%
pivot_longer(everything(), names_to = "mutationRate", values_to = "value") %>%
mutate(mutationRate = gsub("mutationRate.", "", mutationRate))
hpds_mut_rates <- beastio::getHPDMedian(getLogFileSubset(trace_mcmc, pattern = "mutationRate")) %>%
as_tibble(rownames = "mutationRate") %>%
mutate(mutationRate = gsub("mutationRate.", "", mutationRate))
4. Choose the Plot Type
Select the appropriate type of plot for visualizing the posterior distribution. Histograms, density plots, boxplots, violins, and ridgeline plots are commonly used to visualize Bayesian posterior distributions.
- Density Plot: Use the
geom_density
function to create a density plot. This will estimate and display the probability density function of the parameter values.
# Density plot
ggplot() +
geom_density(data = mut_rates,
aes(x = value, color = mutationRate, fill = mutationRate),
alpha = 0.5) +
geom_vline(data = hpds_mut_rates, # Add median line
aes(xintercept = med, color = mutationRate), size = 0.4, linetype = 2) +
theme_minimal() +
theme(legend.position = c(0.92, 0.8)) +
scale_color_jco(name = "") +
scale_fill_jco(name = "") +
labs(title = "Primate mitochondrial genome estimated mutation rate") +
xlab("Mutation rate")

- Boxplot and Violin Plot: Use
geom_boxplot
andgeom_violi
n functions to create boxplots and violin plots.
p_mut_rates <- ggplot(data = mut_rates,
aes(x = mutationRate, y = value, color = mutationRate, fill = mutationRate)) +
geom_violin(alpha = 0.2) +
geom_boxplot(width = 0.1, outlier.shape = NA, alpha = 0.2) +
theme_minimal() +
theme(legend.position = "none") +
scale_color_jco(name = "") +
scale_fill_jco(name = "") +
xlab("Mutation rate")
p_mut_rates + labs(title = "Primate mitochondrial genome estimated mutation rate")

Skyline plots
In a skyline analysis, we are interested in visualizing the estimated rates over time. Follow the steps on the Skyline Plots Tutorial for a more detailed explanation on how to create a skyline plot in R with the bdskytools
(du Plessis, 2016) package. You can find the code from that tutorial in the next code snippet:
BiocManager::install("laduplessis/bdskytools")
library(bdskytools)
file_bdsky <- "https://github.com/Taming-the-BEAST/Visualizing-BEAST2-results/raw/refs/heads/main/precooked_runs/hcv_bdsky.log"
trace_bdsky <- readLogfile(file_bdsky, burnin = 0.1)
Re_sky <- getSkylineSubset(trace_bdsky, "reproductiveNumber")
Re_hpd <- getMatrixHPD(Re_sky)
delta_hpd <- getHPD(trace_bdsky$becomeUninfectiousRate)
timegrid <- seq(0, 400, length.out = 101)
Re_gridded <- gridSkyline(Re_sky, trace_bdsky$origin, timegrid)
Re_gridded_hpd <- getMatrixHPD(Re_gridded)
times <- 1993 - timegrid
par(mar=c(5,4,4,4)+0.1)
plotSkylinePretty(range(times), as.matrix(delta_hpd), type='step', axispadding=0.0,
col=pal.dark(cblue), fill=pal.dark(cblue, 0.5), col.axis=pal.dark(cblue),
ylab=expression(delta), side=4, yline=2, ylims=c(0,1), xaxis=FALSE)
plotSkylinePretty(times, Re_gridded_hpd, type='smooth', axispadding=0.0,
col=pal.dark(corange), fill=pal.dark(corange, 0.5), col.axis=pal.dark(corange),
xlab="Time", ylab=expression("R"[e]), side=2, yline=2.5, xline=2, xgrid=TRUE,
ygrid=TRUE, gridcol=pal.dark(cgray), ylims=c(0,3), new=TRUE, add=TRUE)
title("Hepatitis C in Egypt", adj = 0)

Arranging plots
In the previous example of skyline plot, we combined the visualizations of Re
and the becoming uninfectious rate
estimates. Grouping plots together can be useful to have a comprehensive understanding of the data or in identifying patterns and emphasizing key findings.
In R, there are various methods to arrange plots together. You can plot several parameters within a single figure, as shown in the previous example, or use the geom_facet()
function in ggplot
. Alternatively, you can use a package to arrange several independent plots such as gridExtra
, cowplot
, patchwork
or ggpubr
.
In the following example we will show how to use patchwork
(https://patchwork.data-imaginist.com/).
install.packages("patchwork")
You can arrange plots with the operator +
. With the operator |
the plots will be next to each other and with /
they will be placed on top of each other. You can adjust the number of columns, widths and more with plot_layout()
.
library(patchwork)
p_clock_rate <- ggplot(trace) +
geom_density(aes(x = clockRate)) +
geom_vline(xintercept = getHPD(trace$clockRate)[[2]], linetype = 2) +
xlab("Clock rate") +
theme_minimal()
p2 + ggtitle("Primates evolution") +
p_clock_rate / p_mut_rates +
plot_layout(widths = c(0.7, 0.3))

About color palettes
When choosing color palettes in R, there are several resources available that can help you select visually appealing and effective color schemes. Here are some popular resources:
- Color palettes in R:
RColorBrewer
,ggsci
,paleteer
,viridis
,wesanderson
. - You can also create your own color palettes. Online tools like Chroma.js and Viz Palette can help you to select better combinations and make your palette colorblind safe.
And here is a useful article about the topic: best color palettes for scientific figures and data visualizations.
General tips for scientific figures
-
Clear and Concise Communication: Figures should effectively convey the main findings or message of your research. Ensure that the figure’s content is clear, concise, and easily understandable to your target audience. Avoid clutter or unnecessary details that may confuse or distract from the main point.
-
Proper Data Representation: Select the most appropriate visualization type for your data (e.g., bar chart, line plot, scatter plot, heat map) to accurately represent the relationships, patterns, or trends you want to highlight.
-
Adequate Labels and Annotations: Provide clear and informative labels for axes, data points, and other elements within the figure. Use descriptive titles, axis labels, and legends to provide context and help viewers understand the information presented.
-
Consistency and Reproducibility: Maintain consistency in terms of scale, symbols, colors, and formatting across multiple figures within your study. This consistency ensures that comparisons can be made easily and accurately. Additionally, provide sufficient details, such as data sources or statistical methods used, to allow others to reproduce or validate your results.
-
Aesthetics and Readability: Pay attention to the overall visual appeal of your figures. Use appropriate colors, fonts, and sizes to enhance readability and ensure that the figure is visually appealing. Choose colors that are easily distinguishable and consider accessibility guidelines to ensure that your figures can be understood by individuals with color vision deficiencies. Use high-resolution images or scalable vector graphics (SVG) to ensure figures are clear and legible when printed or zoomed.
Resources data visualization in R
There are very good resources for data visualization in R. Here some of our favorites:
- Fundamentals of Data Visualization (Wilke, 2019) A guide to making visualizations that “accurately reflect the data, tell a story, and look professional.” Amazing resource to understand the best way to visualize each type of data and with useful advice on how to communicate a message with it.
- The R Graph Gallery: A comprehensive online collection of diverse and visually appealing data visualizations created using R. Great for inspiration!
- ggplot2 Package Documentation: The official documentation for the ggplot2 package provides detailed explanations and examples for each function and feature. It also includes a Cheatsheet!
- The R Graphics Cookbook (Chang, 2018) offers practical recipes and examples for creating a wide range of plots using ggplot2.
Relevant References
- Bouckaert, R., Heled, J., Kühnert, D., Vaughan, T., Wu, C.-H., Xie, D., Suchard, M. A., Rambaut, A., & Drummond, A. J. (2014). BEAST 2: a software platform for Bayesian evolutionary analysis. PLoS Computational Biology, 10(4), e1003537. https://doi.org/10.1371/journal.pcbi.1003537
- Bouckaert, R., Vaughan, T. G., Barido-Sottani, J., Duchêne, S., Fourment, M., Gavryushkina, A., Heled, J., Jones, G., Kühnert, D., Maio, N. D., Matschiner, M., Mendes, F. K., Müller, N. F., Ogilvie, H. A., Plessis, L. du, Popinga, A., Rambaut, A., Rasmussen, D., Siveroni, I., … Drummond, A. J. (2019). BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLOS Computational Biology, 15(4).
- Heath, T. A., Huelsenbeck, J. P., & Stadler, T. (2014). The fossilized birth-death process for coherent calibration of divergence-time estimates. Proceedings of the National Academy of Sciences, USA, 111, E2957–E2966.
- Yu, G. (2025). treedataverse: Easily Install and Load the ’treedataverse.’
- Wang, L.-G., Lam, T. T.-Y., Xu, S., Dai, Z., Zhou, L., Feng, T., Guo, P., Dunn, C. W., Jones, B. R., Bradley, T., Zhu, H., Guan, Y., Jiang, Y., & Yu, G. (2020). treeio: an R package for phylogenetic tree input and output with richly annotated and associated data. Molecular Biology and Evolution, 37(2), 599–603. https://doi.org/10.1093/molbev/msz240
- Yu, G., Smith, D., Zhu, H., Guan, Y., & Lam, T. T.-Y. (2017). ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution, 8(1), 28–36. https://doi.org/10.1111/2041-210X.12628
- Yu, G. (2022). Data Integration, Manipulation and Visualization of Phylogenetic Treess (1st edition). Chapman and Hall/CRC. https://doi.org/10.1201/9781003279242
- Xiao, N. (2024). ggsci: Scientific Journal and Sci-Fi Themed Color Palettes for ’ggplot2.’ https://CRAN.R-project.org/package=ggsci
- du Plessis, L. (2023). beastio: Functions for pre- and post-processing of BEAST and BEAST2 files.
- Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org
- Wickham, H., Vaughan, D., & Girlich, M. (2023). tidyr: Tidy Messy Data. https://CRAN.R-project.org/package=tidyr
- M<U+00FC>ller, K., & Wickham, H. (2023). tibble: Simple Data Frames. https://CRAN.R-project.org/package=tibble
- du Plessis, L. (2016). bdskytools: BDSKY Tools.
- Wilke, C. O. (2019). Fundamentals of data visualization: a primer on making informative and compelling figures. O’Reilly Media.
- Chang, W. (2018). R graphics cookbook: practical recipes for visualizing data. O’Reilly Media.
Citation
If you found Taming the BEAST helpful in designing your research, please cite the following paper:
Joëlle Barido-Sottani, Veronika Bošková, Louis du Plessis, Denise Kühnert, Carsten Magnus, Venelin Mitov, Nicola F. Müller, Jūlija Pečerska, David A. Rasmussen, Chi Zhang, Alexei J. Drummond, Tracy A. Heath, Oliver G. Pybus, Timothy G. Vaughan, Tanja Stadler (2018). Taming the BEAST – A community teaching material resource for BEAST 2. Systematic Biology, 67(1), 170–-174. doi: 10.1093/sysbio/syx060