ATAC-Seq: QC, peak calling, differential accessibility
Overview
Goal
- Identify regions in the genome of open chromatin (“ATAC-Seq peaks”)
- Find subsets which are specifically open or closed in one of the experimental conditions or cell types
- Look at ATAC-Seq specific biases and learn how to deal with them
ATAC-Seq – Assay for Transposase-Accessible Chromatin using Sequencing
The ATAC-Seq assay was developed around 2013 in the Greenleaf lab and is based on two main observations:
- a Tn5 transposase preloaded with sequencing adapters can be used to simultaneously fragment and tag genomic DNA for high-throughput sequencing library construction
- Tn5 efficiently inserts into nucleosome-free regions
The assays identifies regions of open chromatin (also called accessible regions). Accessibility is a hallmark of active gene regulatory elements such as enhancers, promoters and insulators.
As ATAC-Seq does not focus on specific epigenetic marks, such as H3K27ac in ChIP-Seq, it is a more unbiased assay to study epigenetic regulation.
In high resolution, ATAC-Seq data also enables the study of nucleosome positioning and transcription factor footprints.
- Due to the simplified library construction, bulk ATAC-Seq can be performed with low amounts of starting material (DNA from 500 - 75’000 cells). Its even possible to perform ATAC-Seq of single nuclei.
Example from the ImmGen ATAC-Seq data
The immunological genome project (ImmGen) is an international research collaborative for the study of genome activity and regulation in the immune system of mouse.
It offers several large scale data sets among which is chromatin accessibility on 86 primary immune cell types used to map out the atlas of cis-regulatory elements of the immune system (Yoshida et al. 2019).
Raw data is deposited in GEO under GSE100738.
We use a data subset on T cell differentiation from double-positive thymocytes to either CD4+ or CD8+ T cells.
Data Preprocessing Outside of R
Paired-end reads were mapped with bowtie2 to the mouse genome (UCSC version mm10).
The output was sorted and indexed with samtools.
Duplicated reads were marked with picard.
Data analysis
The data analysis workflow we discuss today consists of the following steps:
Counting reads in genomic windows
Perform quality control such as measuring enrichment and GC bias
Define accessible windows (“peak calling”)
Identify differentially accessibility windows while accounting for bias
Summarize and annotate differential regions
In R, both peak calling (based on a genome-wide windowing approach) and differential accessibility analysis are possible via the Bioconductor csaw package. For an extended documentation look at csawBook.
The required R packages are:
# data import and structures
library(SummarizedExperiment)
library(rtracklayer)
# differential accessibility
library(csaw)
library(edgeR)
# annotation
library(org.Mm.eg.db)
library(BSgenome.Mmusculus.UCSC.mm10)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
# vizualizing coverage tracks
library(Gviz)
# plotting
require(tidyverse)
library(patchwork)
library(ggplot2)
library(ggrepel)
library(ggrastr)
options(ggrastr.default.dpi=150)Counting reads in genomic windows
All of the quality control and analysis we discuss focus on counts within small windows covering the entire genome. We get these counts using the
windowCountsfunction from the csaw package.Its advisable to remove from the analysis some regions in the genome, typically centromere/telomere regions (not mappable), additionally dense repeat regions or other “read-attracting regions” which show a condition-independent over-representation of reads. There is a blacklist of such regions for the mouse genome (mm10 version), available from the ENCODE project at https://github.com/Boyle-Lab/Blacklist/tree/master/lists.
Since the
windowCountsfunction from the csaw package discards only reads “completely contained within the supplied black-listed ranges” we have to extend them by the read size on both ends. As some of the extended regions exceed chromosome boundaries we need to additionally trim them to fit with chromosome sizes (seqlengths).We combine the extended black-listed regions with the mitochondrial (
chrM) genome coordinates (taken from the BSgenome.Mmusculus.UCSC.mm10 package).
blacklist.gr <- rtracklayer::import("data/ATACSeq/mm10-blacklist.v2.bed.gz",
seqinfo=seqinfo(BSgenome.Mmusculus.UCSC.mm10))
# extension
readLength <- 38
blacklist.gr <- trim(
resize(blacklist.gr,width=width(blacklist.gr) + 2*readLength, fix="center")
)
# add mitochondrial genome to black list
chrM.gr <- GRanges("chrM",
IRanges(1,seqlengths(BSgenome.Mmusculus.UCSC.mm10)["chrM"]),
name="Mitochondrial genome")
blacklist.gr <- c(blacklist.gr, chrM.gr)- Next we load information about samples and corresponding BAM files as given in the
alignments.tsvfile. Note that BAM files are available on sciCORE NextCloud. Link and password are shared via ADAM. You only need to download the BAM files, if your want to re-do the window counting step. For all other steps two smallerSummarizedExperimentobjects are also shared via NextCloud that contain all necessary data.
# table with link to BAM files and sample annotation
alignments <- read.table("data/ATACSeq/alignments.tsv", header=TRUE, stringsAsFactors=FALSE)
# BAM files
bamFiles <- setNames(alignments$FileName, alignments$ExternalSampleName)Now we are in place for counting and just have to set a couple of parameters in the
readParamandwindowCountsfunction:We treat the paired-end data as single-end (
pe="none"). The reason is that the two ends of a single fragment are created by two “independent” Tn5 transposase cuts and we therefore treat them as two events.We will remove PCR duplicates as marked by Picard (
dedup=TRUE). Given that duplication levels are quite different, we want to level out any effect due to differences in library complexity.We ignore (=do not count any reads) in black-listed regions.
We additionally focus only on standard chromosomes (parameter
restrict). Unplaced or unlocalized scaffolds are not covered by the blacklist and might contain additional problematic regions.Within
windowCountwe define the window width as 150 bases, roughly the size of a nucleosome. There is a trade-off between spatial resolution (as controlled by window size and window shift) and library size/complexity (=the number of reads observed in individual windows) which determine the power to detect differential accessibility. To identify ATAC-Seq peaks (not footprints!) the size of the nucleosome seems a reasonable scale. However, small library sizes might call for larger windows e.g. 500 bases.Additionally we do not perform any extension of reads (
ext=0). The 5’ end of reads, which is counted, reflects the position of Tn5 transposase cut on the scale of our window size.Initially, we don’t filter windows by their count size (
filter=0). This is only done to get an idea of the un-covered fraction of the mappable genome.By setting the
bin=TRUEwe make sure that the windows do not overlap (step == width).The
BPPARAMargument is used to parallel computing using theBiocParallelpackage (6 processors here).
param <- readParam(pe="none", dedup=TRUE,
discard=blacklist.gr,
restrict=GenomeInfoDb::standardChromosomes(BSgenome.Mmusculus.UCSC.mm10))
windows <- windowCounts(bamFiles, ext=0, width=150,
bin=TRUE, filter=0, param=param,
BPPARAM=BiocParallel::MulticoreParam(6))
# Add sample annotation
colData(windows) <- cbind(colData(windows), alignments)- Finally, we also tabulate the frequency of guanine and cytosine nucleotides (“GC content”) of each window. For the purpose of excluding assembly gaps we also tabulate the frequency of the “N” nucleotide.
seqs <- getSeq(BSgenome.Mmusculus.UCSC.mm10, rowRanges(windows))
rowData(windows) <- Biostrings::letterFrequency(seqs, letters=c('GC', 'N'), as.prob=TRUE) # G or C, N frequency- We remove assembly gaps (windows where more than 2/3 of the bases are N’s) and also exclude windows partially overlapping blacklisted regions or chromosome ends i.e. that are not of length 150 bases.
is.agap <- rowData(windows)[,'N'] > 2/3 # i.e. more than 100 of 150 bases
table(is.agap)/nrow(windows)is.agap
FALSE TRUE
0.97139194 0.02860806
is.smaller <- width(rowRanges(windows)) != 150
table(is.smaller)is.smaller
FALSE TRUE
18170240 22
windows <- windows[!is.agap & !is.smaller,]saveRDS(windows, file="data/ATACSeq/csaw_window_counts_rmDup.rds")Export coverage tracks for IGV
The window counts can be displayed as coverage tracks in genome browsers such as the Integrative Genomics Viewer (IGV).
Here we normalize the counts by library size and use the rtracklayer package to export the coverage per window in bigWig format.
gr <- rowRanges(windows)
CPM <- csaw::calculateCPM(windows, log=FALSE)
for (cn in colnames(CPM)) {
gr$score <- CPM[,cn]
rtracklayer::export.bw(gr, sprintf('bw/%s.bigWig', cn),
format = "bigWig")
}Quality control
Quality control steps start already upstream of read alignment, e.g. using tools such as fastQC. Here we assume all necessary steps (e.g. removing adapter sequences or low quality samples) have been taken and we can focus on analysis specific quality control figures. Important quality controls include assessment of:
Library complexity
Read enrichment in open vs. closed chromatin
GC bias
Library complexity
- When ATAC-Seq is performed from low amounts of input material, the coverage of the genome is limited and when combined with PCR amplification can lead to a low complexity of the final library despite high sequencing depth. This can be assessed by the PCR duplication level or by scanning through the BAM coverage in IGV. In the latter case, low complexity libraries show a fragmented genome coverage and the appearance of piles of read pairs aligning to the same position. These artificial coverage peaks must be removed (through PCR de-duplication) prior to peak calling. In our case, samples don’t suffer from low complexity.
Read enrichment
A common source of variance in ATAC-Seq data is due to differences in the signal-to-noise ratio between samples. In other words, the fraction of mapped reads spend in open versus closed chromatin differs between samples.
The technical source of this variance is often due to differences in:
- the level of dechromatinized DNA of damaged/dead cells (=background signal)
- the ratio of Tn5 to nuclei
- permeabilization of the nuclei
- tagmentation efficiency
Enrichment bias is the ATAC-Seq analog of IP-efficiency differences among ChIP-Seq samples. Here we use the more general term “enrichment bias” because of its multiple levels of origin.
Genome-wide enrichment is assessed using our window counts by:
- ordering windows by increasing count number, build the cumulative sum and divide by the total.
- plot this fraction against the fraction of the covered genome.
For computational efficieny we will just use reads from chromosome 1 for QC.
chr1 <- seqnames(rowRanges(windows)) == 'chr1'
qcWindows <- windows[chr1,]fracLib <- matrix(0, nrow = nrow(qcWindows), ncol = ncol(qcWindows),
dimnames = dimnames(qcWindows))
for (cn in colnames(qcWindows)) {
counts <- as.numeric(assays(qcWindows)$counts[,cn])
counts <- sort(counts, decreasing=FALSE)
fracLib[,cn] <- cumsum(counts)/sum(counts)
}
fracGenome <- seq(0, 1, length.out=nrow(qcWindows)) # since all windows are of the same sizedata <- tibble::tibble(x=fracGenome) |>
dplyr::bind_cols(as_tibble(fracLib)) |>
tidyr::pivot_longer(-x) |>
dplyr::mutate(SampleGroup=str_replace(name, "_rep\\d+$", ""),
Replicate=str_replace(name, ".*_rep(\\d+)$", "R\\1"))
ggplot(data = data,
mapping = aes(x=x, y=value, colour=SampleGroup, linetype=Replicate)) +
rasterize( geom_line() ) +
geom_abline(slope=1, intercept=0, linetype="dotted", colour="black") +
scale_color_brewer(palette = "Set1") +
labs(x="Fraction Genome", y="Fraction Library") +
theme_bw(base_size=15)Some notes on the enrichment figure:
- the bisecting line corresponds to a uniform distribution of reads across windows i.e. when there is no enrichment.
- enrichment strength is measured by the deviation from the bisecting line. A strong enrichment of reads in a small number of windows shows up as a sudden steep rise of the line.
- the percentage of the uncovered (but mappable) genome corresponds to the point where the line starts to rise from 0 library fraction.
- low sequencing depth results in an intial discreteness of the line (many windows with 1, 2, 3, etc. reads)
Despite the overall low sequencing depth, the figure tells us that DP samples are less enriched compared to CD4 and CD8. This fact, together with a higher sequencing depth, leads to a higher coverage of the genome (about 65%), but a lower signal-to-noise ratio in DP.
Promoter accessibility
To better understand the effect of enrichment bias, lets focus on gene promoters. Promoters of expressed genes are generally accessible (=open) and thus a site of read enrichment. By comparing open and closed promoters we can visualize the difference in foreground vs. background signal as seen above.
Ideally we count the reads around transcription start sites (TSS) in use, which would require matching mRNA-Seq data. We will take a simpler approach and use windows overlapping with the most upstream TSS as defined in the “known gene” track from UCSC. Although we use the word “promoter” below, we mean TSS-overlapping windows in the following.
tss <- genes(TxDb.Mmusculus.UCSC.mm10.knownGene)
tss <- resize(tss, width=1, fix="start")
overlapTSS <- rowRanges(windows) %over% tss
table(overlapTSS)overlapTSS
FALSE TRUE
17626435 24010
rowData(windows)$overlapTSS <- overlapTSSlogCPM <- csaw::calculateCPM(windows, log=TRUE) |> # normalized by library size only
tibble::as_tibble() |>
dplyr::filter(overlapTSS) |>
tidyr::pivot_longer(everything()) |>
dplyr::mutate(SampleGroup=str_replace(name, "_rep\\d+$", ""),
Replicate=str_replace(name, ".*_rep(\\d+)$", "R\\1"))
ggplot(data=logCPM,
aes(x=value, colour=SampleGroup, linetype=Replicate)) +
geom_vline(xintercept=-1.2) + # threshold separating open and closed promoters
labs(x="logCPM", y="Density") +
scale_color_brewer(palette = "Set1") +
stat_density(geom="line", position="identity") +
theme_bw(base_size=15)The logCPM densities appear bimodal and can be split into average abundance for closed (left peaks) and open (right peaks) promoters using a threshold around logCPM = -1.2. The difference between average abundance of closed and open promoters can serve as a proxy for the signal-to-noise ratio of a sample. It is larger in CD4 and CD8 samples compared to DP samples.
We can compare this finding with read coverage tracks of the data, either using IGV or the Gviz package (see code below).
- Note the different y-scales on the figure! Each track is normalized to the number of mapped reads (excluding all regions mentioned above). DP samples show a smaller dynamic range in the coverage tracks due to lower enrichment. As we are looking at a housekeeping gene, we would expect similar accessibility levels across samples.
MA plot
- When comparing abundances across samples, the consequence of enrichment bias can be further highlighted using mean-average (MA) plots which you already encountered in the mRNA-Seq and ChIP-Seq lectures. Here we additionally highlight promoter windows in red.
logCPM <- csaw::calculateCPM(windows, log=TRUE)
## get list of samples per cell type
ct <- split(colnames(logCPM), colData(windows)$CellType)
## calculate average logCPM per cell type and window
df <- vapply(ct, function(x) {
rowMeans(logCPM[,x,drop=FALSE])
}, rep(0, nrow(windows))) |>
tibble::as_tibble() |>
dplyr::mutate(overlapTSS = rowData(windows)$overlapTSS,
GC = rowData(windows)[,'G|C'])
plotMA <- function(df, s1, s2) {
ggplot(df,
mapping = aes(x = .data[[s1]] + .data[[s2]],
y = .data[[s1]] - .data[[s2]])) +
geom_hex(bins = 100, aes(fill=after_stat(density)^(1/16)), show.legend = FALSE) +
rasterize( geom_point(data = subset(df, overlapTSS), col='red', pch='.') ) +
rasterize( geom_smooth(data = subset(df, overlapTSS), col='plum', se = FALSE) ) +
geom_hline(yintercept = 0, lty = 2) +
labs(x = "A", y="M", title=sprintf("%s vs. %s", s1, s2)) +
theme_bw(base_size = 15)
}
combi <- list(c('DP','CD4'), c('DP','CD8'), c('CD8','CD4'))
gl <- lapply(combi, function(x) plotMA(df, x[1], x[2]))
patchwork::wrap_plots(gl, ncol=2)- The comparisons involving DP show a clear downward trend. Thus, open promoters appear systematically “more accessible” in CD4 or CD8 versus DP. We observe an opposite trend in CD8 versus CD4. Thus enrichment bias can be characterized by a systematic trend depending on the average abundance.
GC bias
The GC content of a sequence influences the amplification rate during PCR. Subtle differences in GC composition between samples can get amplified during library preparation leading e.g. to an enrichment of GC-rich sequences and a suppression of AT-rich sequences.
Similar to the MA-plot above we can highlight any systematic dependence of the log-fold-change (M values) on GC content in the figures below.
plotGC <- function(df, s1, s2) {
ggplot(df,
mapping = aes(x = GC,
y = .data[[s1]] - .data[[s2]])) +
geom_hex(bins = 100, aes(fill=after_stat(density)^(1/16)), show.legend = FALSE) +
rasterize( geom_point(data = subset(df, overlapTSS), col='red', pch='.') ) +
rasterize( geom_smooth(data = subset(df, overlapTSS), col='plum', se = FALSE) ) +
geom_hline(yintercept = 0, lty = 2) +
labs(x = "GC", y="M", title=sprintf("%s vs. %s", s1, s2)) +
theme_bw(base_size = 15)
}
gl <- lapply(combi, function(x) plotGC(df, x[1], x[2]))
patchwork::wrap_plots(gl, ncol=2)- We observe similar, but less pronounced, trends as in the MA figures above.
ATAC-Seq libraries focus on promoters and other regulatory regions which in mammalian genomes are often associated with higher GC (and CpG) content. Therefore an ATAC-Seq enrichment bias will most likely induce a GC bias in the data. The question arises which bias should we consider when normalizing the data? Below we remove the enrichment bias trend by LOESS-normalization as it has a more pronounced systematic trend and the assumption behind removing it is less specific compared to the GC bias removal. But we will check if GC bias is also corrected by removing enrichment bias. Alternatively, more elaborate normalization techniques such as cqn or qsmooth might be applied.
Filter windows (“peak calling”)
We now aim at identifying windows with increased abundance compared with a global background. We will do so jointly across all samples as a means to remove uninteresting, lowly-covered windows. The main aim is to improve the detection of differentially accessible windows in the later analysis.
Here we use a cutoff based on the average global abundance across windows and samples. A global background signal is estimated by counting reads in 10 kb windows. Large bins ensure reliable estimates of read density even if the genome coverage is low. Note, we use the same
paramobject to synchronize both calls.
background <- windowCounts(bamFiles, bin=TRUE, width=10000L, param=param)
colData(background) <- cbind(colData(background), alignments)
saveRDS(background, file="data/ATACSeq/csaw_background_10k_rmDup.rds")- Next we apply
filterWindowsGlobalto compute enrichment values which measure the increase in the read count in each window over the expected count based on the background signal. The resulting filter statistic for each window is defined as the difference between the average window abundance (~ average log2CPM) and the median global background abundance. It is adjusted for the differences in widths between windows and bins.
filterStat <- filterWindowsGlobal(windows, background)
str(filterStat)List of 3
$ abundances : num [1:17650445] -3.25 -2.93 -2.93 -2.84 -3.13 ...
$ back.abundances: num [1:17650445] -2.79 -2.79 -2.79 -2.79 -2.79 ...
$ filter : num [1:17650445] -0.4581 -0.1439 -0.1395 -0.0541 -0.3466 ...
globalBG <- median( filterStat$back.abundances )
globalBG[1] -2.78763
head(filterStat$abundances - globalBG)[1] -0.45806515 -0.14385829 -0.13949719 -0.05407898 -0.34656929 -0.45806515
- To filter for high abundance windows we require at least 3-fold increase in enrichment over the background abundance. Note the filter statistics is on log2-scale. Our required 3-fold increase results in a similar threshold used above to split promoter abundances into open and closed.
keep <- filterStat$filter > log2(3)
sum(keep)[1] 125837
mean(keep)[1] 0.007129395
min(filterStat$abundances[keep])[1] -1.202668
globalBG + log2(3)[1] -1.202668
ggplot(data = tibble(x=filterStat$abundances),
mapping = aes(x)) +
geom_histogram(bins = 100) +
geom_vline(xintercept = globalBG, color='red') +
geom_vline(xintercept = globalBG + log2(3), color='blue') +
scale_y_log10() +
labs(x="Abundance", y="Frequency", title="Histogram of Window Abundances") +
theme_bw(base_size=15)Warning in scale_y_log10(): log-10 transformation introduced infinite
values.
- We subset the
SummarizedExperimentand export the filtered windows to BED format using the export functionality of rtracklayer:
rowData(windows)$enrichment <- filterStat$filter
filtered.windows <- windows[keep,]
rtracklayer::export(rowRanges(filtered.windows),
'data/ATACSeq/filtered_global.bed')We did not use the local filtering option of csaw (see csawBook: filtering) because ATAC-Seq peaks are broader and have more internal structure compared to e.g. ChIP-Seq of H3K4me3 or H3K27ac making the use of local backgrounds more error prone.
Internally, the filterWindowsGlobal function will compare the sample-average abundance of a window with the average background abundance across all samples. This average might be low if a peak appears only in a small subset of the samples. For large datasets with many sample groups, a group specific filtering with subsequent merging of the retained windows is required in order to capture group-specific peaks.
Differential accessibility analysis
- In analogy to differential expression analysis, differential accessibilty analysis will evaluate changes in accessibility of windows across sample groups.
Normalization
- Up to now we normalized the data by library size only. As highlighted in the MA plots above, comparison between any cell type shows a trend of the log-fold-change on the average abundance.
Normalization procedures based on scaling, such as TMM (edgeR) or median of ratio’s (DESeq2), calculate effective library sizes to compensate the effect of composition biases across samples. From the perspective of MA-plots, the effect of scaling the library sizes leads to a global shift of the M values (i.e. the whole cloud of points moves up or down), allowing to either set the M values of low abundance/closed or the high abundance/open windows to zero on average. However, it cannot treat low and high abundance windows separately as required for trended biases.
Offsets allow to normalize this type of trended biases across samples. They are window- and sample-specific normalization factors, similar to the (effective) library size which are just sample-specific normalization factors.
For each sample, a trend curve is fitted to the log-counts against the log-average counts across all samples. The scaled, fitted value for each window and sample is then used as an offset in the generalized linear model underlying the
edgeRfitting procedure to correct for the trend. For more details, have a look at csawBook: trended biases.
filtered.windows <- normOffsets(filtered.windows)logCPM <- csaw::calculateCPM(filtered.windows, log=TRUE, use.offsets = TRUE)
ct <- split(colnames(logCPM), colData(filtered.windows)$CellType)
df <- vapply(ct, function(x) {
rowMeans(logCPM[,x,drop=FALSE])
}, rep(0, nrow(filtered.windows))) |>
tibble::as_tibble() |>
dplyr::mutate(overlapTSS = rowData(filtered.windows)$overlapTSS,
GC = rowData(filtered.windows)[,'G|C'])
patchwork::wrap_plots(lapply(combi, function(x) plotMA(df, x[1], x[2])), ncol=2)patchwork::wrap_plots(lapply(combi, function(x) plotGC(df, x[1], x[2])), ncol=2)- As expected, any systematic trend between mean abundance and differential accessibility is now removed (MA figures). The correction also removed the GC bias trend we saw before highlighting the link between both.
Differential accessibility test
- We are now ready to identify regions of differential accessibility between cell types. We could use DESeq2 package as you already did for mRNA-Seq analysis or we stay in the framework of csaw which is using the edgeR package for differential accessibility testing. Since the
edgeRworkflow works with a list-type object calledDGEListwe first need to convert ourSummarizedExperimentto this object while preserving the normalization offsets:
dgel <- SE2DGEList(filtered.windows)
dgel$offset <- assay(filtered.windows, 'offset')The fitting and testing procedure consists of following steps:
- Define the linear model through a model matrix. Here, the matrix basically assigns samples to cell types.
moma <- model.matrix(~ 0 + CellType, dgel$samples)
moma CellTypeCD4 CellTypeCD8 CellTypeDP
T.DP.Th_rep1 0 0 1
T.DP.Th_rep2 0 0 1
T.4.Th_rep1 1 0 0
T.4.Th_rep2 1 0 0
T.8.Th_rep1 0 1 0
T.8.Th_rep2 0 1 0
attr(,"assign")
[1] 1 1 1
attr(,"contrasts")
attr(,"contrasts")$CellType
[1] "contr.treatment"
- Fit the model.
fit <- glmQLFit(dgel, moma)- Define the comparison of interest e.g. CD4 vs DP (
makeContrasts) and test for differential accessibility (glmQLFtest).
contr <- makeContrasts(CellTypeCD4 - CellTypeDP, levels = colnames(moma))
contr Contrasts
Levels CellTypeCD4 - CellTypeDP
CellTypeCD4 1
CellTypeCD8 0
CellTypeDP -1
res <- glmQLFTest(fit, contrast = contr)- Correct for multiple testing.
tt <- topTags(res, adjust.method = "fdr",
n=Inf, sort.by="none")
tt |> head()Coefficient: 1*CellTypeCD4 -1*CellTypeDP
seqnames start end width strand G.C N overlapTSS enrichment
1 chr1 3042901 3043050 150 * 0.4666667 0 FALSE 1.830482
2 chr1 3043051 3043200 150 * 0.4733333 0 FALSE 2.210324
3 chr1 3043201 3043350 150 * 0.3866667 0 FALSE 2.906892
4 chr1 4258201 4258350 150 * 0.6333333 0 FALSE 1.920591
5 chr1 4258351 4258500 150 * 0.6000000 0 FALSE 2.117133
6 chr1 4496251 4496400 150 * 0.5266667 0 FALSE 1.777089
logFC logCPM F PValue FDR
1 0.07045526 2.213536 0.01360269 0.90728872 0.9669320
2 -0.30027014 2.634225 0.33564409 0.56311130 0.8046877
3 -0.77911343 3.351421 3.59367101 0.05967437 0.2627815
4 -1.29544388 2.409932 5.39075655 0.02141415 0.1422614
5 -1.11641633 2.587516 4.59920384 0.03338870 0.1868594
6 0.80015768 1.998077 1.08381474 0.29930509 0.6086676
- After testing for differential accessibility we store the
tablecomponent of the topTags results in the row data of thefiltered.windowsobject:
rowData(filtered.windows) <- cbind(rowData(filtered.windows), tt$table)
saveRDS(filtered.windows, file='data/ATACSeq/csaw_filtered_window_counts_fitted.rds')Join neighboring windows and compute their significance
The differential analysis results are still on the window level and are not taking into account if differential windows form larger peak regions. The aggregation into larger regions is done in two steps:
by merging neighboring windows
summarizing the differential results for those merged windows
Neighboring windows are joined into larger regions by calling
mergeWindows. It will apply a single linkage algorithm to join neighboring windows with less thantoldistance apart. An optionalmax.widthparameter would keep regions at a maximal width in order to avoid very large regions.
merged <- mergeWindows(rowRanges(filtered.windows), tol=150L)
str(merged,1)List of 2
$ ids : int [1:125837] 1 1 1 2 2 3 3 3 3 4 ...
$ regions:Formal class 'GRanges' [package "GenomicRanges"] with 7 slots
merged$regionGRanges object with 48744 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 3042901-3043350 *
[2] chr1 4258201-4258500 *
[3] chr1 4496251-4496850 *
[4] chr1 4771051-4771200 *
[5] chr1 4780201-4780350 *
... ... ... ...
[48740] chrY 897451-897600 *
[48741] chrY 1010251-1010700 *
[48742] chrY 1245451-1245900 *
[48743] chrY 2224951-2225100 *
[48744] chrY 2225401-2225550 *
-------
seqinfo: 22 sequences from an unspecified genome
summary(width(merged$region)) Min. 1st Qu. Median Mean 3rd Qu. Max.
150.0 150.0 300.0 399.9 450.0 3900.0
- We annotate the peak regions using the
detailRangesfunction ofcsawto extract information about promoters and genes from annotation packages. Peak regions will be assign to features in the neighborhood of 10kb, promoters are defined as 1.5kb upstream and 500 bases downstream of a TSS.
anno <- detailRanges(merged$region,
txdb=TxDb.Mmusculus.UCSC.mm10.knownGene,
orgdb=org.Mm.eg.db, promoter=c(1500, 500), dist=10*1e3L)
merged$region$overlap <- anno$overlap
merged$region$left <- anno$left
merged$region$right <- anno$right
merged$regionGRanges object with 48744 ranges and 3 metadata columns:
seqnames ranges strand | overlap left
<Rle> <IRanges> <Rle> | <character> <character>
[1] chr1 3042901-3043350 * |
[2] chr1 4258201-4258500 * | Rp1:-:I
[3] chr1 4496251-4496850 * | Sox17:-:PE Sox17:-:309
[4] chr1 4771051-4771200 * |
[5] chr1 4780201-4780350 * | Mrpl15:-:I Mrpl15:-:2553
... ... ... ... . ... ...
[48740] chrY 897451-897600 * | Kdm5d:+:P
[48741] chrY 1010251-1010700 * | Eif2s3y:+:PE
[48742] chrY 1245451-1245900 * | Uty:-:PE Uty:-:232
[48743] chrY 2224951-2225100 * |
[48744] chrY 2225401-2225550 * |
right
<character>
[1]
[2] Rp1:-:3027
[3]
[4] Mrpl15:-:2006
[5] Mrpl15:-:871
... ...
[48740] Kdm5d:+:188
[48741] Eif2s3y:+:560
[48742]
[48743]
[48744]
-------
seqinfo: 22 sequences from an unspecified genome
- P-values are computed for the merged regions from the individual p-values of the underlying windows. Here its important to control the desired FDR at the region level instead of the window level. At the same time we would like to keep some resolution of the windowing approach i.e. indicate if merged neighboring windows show the same or opposite changes in accessibility. The
combineTestsfunction will summarize the test statistics on the region level and select a representative fold-change for the whole region. For details check csawBook: correction for multiple testing:
region.stats <- combineTests(merged$id, tt$table) |>
tibble::as_tibble() |>
dplyr::mutate(rep.logCPM = tt$table[rep.test,'logCPM'],
Contrast = colnames(contr))
region.stats |>
head()# A tibble: 6 × 10
num.tests num.up.logFC num.down.logFC PValue FDR direction rep.test
<int> <int> <int> <dbl> <dbl> <chr> <int>
1 3 0 0 1.79e-1 3.93e-1 down 3
2 2 0 2 3.34e-2 1.36e-1 down 5
3 4 1 0 6.09e-6 2.31e-4 up 7
4 1 0 0 2.85e-1 5.14e-1 down 10
5 1 0 0 3.99e-1 6.19e-1 up 11
6 4 0 0 6.02e-1 7.75e-1 up 13
# ℹ 3 more variables: rep.logFC <dbl>, rep.logCPM <dbl>, Contrast <chr>
Here every row is a merged peak region with the summary statistics based on the underlying window tests.
We merge the region statistics with the region annotation and export the final results:
peaks <- merged$region
mcols(peaks) <- c(mcols(peaks), region.stats)
write.table(as.data.frame(peaks), file=gzfile('data/ATACSeq/peaks_CD4_vs_DP.tsv.gz'),
row.names=FALSE, sep='\t')- We also export peaks in BED format where each peak is labelled by its cell type specificity:
peaks$name <- ifelse(peaks$rep.logFC > 1.5 & peaks$FDR < 0.05, 'CD4',
ifelse(peaks$rep.logFC < (-1.5) & peaks$FDR < 0.05, 'DP', 'Common'))
rtracklayer::export(peaks, 'data/ATACSeq/peaks_CD4_vs_DP.bed')Inspecting the results
- How many common and cell type specific peaks are there?
table(peaks$direction, peaks$FDR < 0.05)
FALSE TRUE
down 16931 2473
mixed 4703 6
none 1 0
up 19613 5017
Many mixed, significant peaks would indicate a too coarse resolution of the regions. Our resolution looks good.
Plot the merged test results as a Volcano plot:
data <- mcols(peaks) |>
tibble::as_tibble() |>
dplyr::mutate(logFC = rep.logFC,
Significant = FDR < 0.05,
Annotated = overlap != "")
ggplot(data = data,
mapping = aes(x = logFC,
y = -log10(PValue),
color = Significant,
shape = Annotated,
label = overlap)) +
rasterize( geom_point(alpha=0.5) ) +
geom_text_repel(data = data |> filter(Annotated & Significant & abs(logFC) > 1.5),
show.legend = FALSE) +
geom_vline(xintercept = 0, lty = 2) +
scale_color_manual(values=c(`TRUE`='red', `FALSE`='blue')) +
theme_bw()- Check regulation of positive controls: Cd4 (up) and Cd8a (down)
peaks[grep('Cd4:', peaks$overlap)]GRanges object with 2 ranges and 14 metadata columns:
seqnames ranges strand | overlap left
<Rle> <IRanges> <Rle> | <character> <character>
[1] chr6 124885201-124885500 * | Cd4:-:I Cd4:-:5508
[2] chr6 124885951-124886400 * | Cd4:-:I Cd4:-:6258
right num.tests num.up.logFC num.down.logFC PValue
<character> <integer> <integer> <integer> <numeric>
[1] Cd4:-:2580 2 2 0 3.44160e-06
[2] Cd4:-:1680 3 0 0 4.95218e-01
FDR direction rep.test rep.logFC rep.logCPM
<numeric> <character> <integer> <numeric> <numeric>
[1] 0.000147934 up 100902 2.605847 2.80868
[2] 0.698801771 mixed 100904 -0.550344 2.90117
Contrast name
<character> <character>
[1] CellTypeCD4 - CellTy.. CD4
[2] CellTypeCD4 - CellTy.. Common
-------
seqinfo: 22 sequences from an unspecified genome
peaks[grep('Cd8a:', peaks$overlap)]GRanges object with 2 ranges and 14 metadata columns:
seqnames ranges strand | overlap left
<Rle> <IRanges> <Rle> | <character> <character>
[1] chr6 71373301-71373600 * | Cd8a:+:PE
[2] chr6 71376751-71376900 * | Cd8a:+:E Cd8a:+:833
right num.tests num.up.logFC num.down.logFC PValue
<character> <integer> <integer> <integer> <numeric>
[1] Cd8a:+:903 2 0 1 0.00204564
[2] 1 0 0 0.60592470
FDR direction rep.test rep.logFC rep.logCPM
<numeric> <character> <integer> <numeric> <numeric>
[1] 0.0194789 down 97960 -2.015239 3.21142
[2] 0.7772624 up 97962 0.354082 2.13924
Contrast name
<character> <character>
[1] CellTypeCD4 - CellTy.. DP
[2] CellTypeCD4 - CellTy.. Common
-------
seqinfo: 22 sequences from an unspecified genome
- Check a few more regulated sites in IGV and have a look at positive controls such as Runx3 and Cd8 for CD8, Gata3 and Zbtb7b (Thpok) for CD4, Ccr7 (for exit of CD4/CD8 from thymus) etc.
Outlook
- The results of ATAC-Seq analysis usually don’t stand alone. Downstream of differential peak calling and annotation is often some type of overlap and/or filtering by additional data e.g. mRNA-Seq or ChIP-Seq. Further annotation of peaks by transcription factor binding sites followed by some type of enrichment test is often used to identify regulators driving changes in accessibility and gene expression. This will be a focus of the following lecture!
Appendix: plotting coverage tracks with Gviz
Using the Gviz package we can plot specific regions to check the ATAC-Seq profile.
Assuming you have access to the BAM files, the following code will produce the coverage tracks from the example above (housekeeping gene Hprt):
library(Gviz)
chrom <- "chrX"
afrom <- 52977084 - 15000L
ato <- 53002718 + 35000L
T.DP.Th_rep1 <- AlignmentsTrack(bamFiles["T.DP.Th_rep1"], type="coverage",
col="dodgerblue3", fill="dodgerblue3",
name="DP, rep1",
transformation=function(x) x/colData(windows)["T.DP.Th_rep1","totals"]*1e8)
T.4.Th_rep1 <- AlignmentsTrack(bamFiles["T.4.Th_rep1"], type="coverage",
col="dodgerblue3", fill="dodgerblue3",
name="CD4, rep1",
transformation=function(x) x/colData(windows)["T.4.Th_rep1","totals"]*1e8)
T.8.Th_rep1 <- AlignmentsTrack(bamFiles["T.8.Th_rep1"], type="coverage",
col="dodgerblue3", fill="dodgerblue3",
name="CD8, rep1",
transformation=function(x) x/colData(windows)["T.8.Th_rep1","totals"]*1e8)
txTrack <- UcscTrack(genome = "mm10", chromosome = chrom, track="NCBI RefSeq",
table = "ncbiRefSeqCurated", from = afrom, to =ato,
trackType = "GeneRegionTrack",
rstarts = "exonStarts", rends = "exonEnds",
gene = "name2", symbol = "name2",
transcript = "name", strand = "strand",
fontcolor = "black", name = "",
transcriptAnnotation = "symbol", showId=TRUE)
axisTrack <- GenomeAxisTrack(name=chrom, rotation.title=0, col="black", fontcolor="black")
plotTracks(list(T.DP.Th_rep1, T.8.Th_rep1, T.4.Th_rep1, axisTrack, txTrack), chromosome=chrom, from=afrom, to=ato,
fontcolor.title="black", background.title="transparent", showTitle=TRUE, col.axis="black",
sizes=c(1.5,1.5,1.5,0.5,1))- For more details on Gviz have a look at the user guide.
Exercise
- Use the promotor counts provided on the course website and repeat the differential accessibility analysis. How many differentially accessible promoters (FDR < 0.05 and |logFC| > 1.5) do you get for each of the three comparison CD4 vs DP, CD8 vs DP and CD4 vs CD8?
proms <- readRDS('data/ATACSeq/csaw_promoter_counts_rmDup.rds')
proms <- csaw::normOffsets(proms)
dgel <- SE2DGEList(proms)
dgel$offset <- assay(proms, 'offset')
moma <- model.matrix(~ 0 + CellType, dgel$samples)
fit <- glmQLFit(dgel, moma)
contr <- makeContrasts(CellTypeCD4 - CellTypeDP,
CellTypeCD8 - CellTypeDP,
CellTypeCD4 - CellTypeCD8,
levels = colnames(moma))
test <- sapply(colnames(contr),
function(cn) glmQLFTest(fit, contrast = contr[,cn]),
simplify = FALSE)
sapply(test, function(x) table(decideTests(x, p.value = 0.05, lfc = 1.5))) CellTypeCD4 - CellTypeDP CellTypeCD8 - CellTypeDP
-1 21 113
0 23603 23418
1 38 131
CellTypeCD4 - CellTypeCD8
-1 15
0 23638
1 9
- Repeat the differential promoter analysis, this time only correcting for library composition differences. I.e. instead of using gene- and sample-wise offsets, apply TMM-based scaling to account for library composition biases only (Check: edgeR User Guide, section 2.8 Normalization). How many differentially accessible promoters do you get now? Check the Volcano plots. Does it look suspicious to you?
dgel2 <- dgel
dgel2$offset <- NULL
dgel2 <- normLibSizes(dgel2, method="TMM")
fit2 <- glmQLFit(dgel2, moma)
test2 <- sapply(colnames(contr),
function(cn) glmQLFTest(fit2, contrast = contr[,cn]),
simplify = FALSE)
sapply(test2, function(x) table(decideTests(x, p.value = 0.05, lfc = 1.5))) CellTypeCD4 - CellTypeDP CellTypeCD8 - CellTypeDP
-1 107 2754
0 23507 20562
1 48 346
CellTypeCD4 - CellTypeCD8
-1 16
0 23640
1 6
gp <- lapply(colnames(contr), function(ns) {
data <- topTags(test2[[ns]], sort="none", n=Inf)$table |>
tibble::as_tibble() |>
dplyr::mutate(Significant = FDR < 0.05 & abs(logFC) > 1.5)
ggplot(data,
mapping = aes(x = logFC,
y = -log10(PValue),
label = symbol,
color = Significant)) +
rasterize( geom_point(alpha=0.5) ) +
geom_text_repel(data = data |> dplyr::filter(Significant & abs(logFC) > 1.5),
show.legend = FALSE) +
geom_vline(xintercept = 0, lty = 2) +
scale_color_manual(values=c(`TRUE`='red', `FALSE`='blue')) +
labs(title=ns) +
theme_bw()
})
patchwork::wrap_plots(gp, ncol=2)- Above, we highlighted the problem of an apparent GC-bias. How do the results change if we correct for GC bias alone? We can use the same offset-based normalization approach to correct for GC bias. The following code calculates GLM offsets for the promoter counts correcting for systematic GC trends only.
logCounts <- log(assay(proms, 'counts') + 1)
GC <- rowData(proms)[,'G|C']
offs <- NA * logCounts
for (cn in colnames(logCounts)) {
fit <- loessFit(logCounts[, cn], GC)
offs[, cn] <- fit$fitted
}
GC.offset <- edgeR::scaleOffset(colData(proms)$totals, offs) # convert offsets to effective library sizes
assay(proms, "offset") <- GC.offsetAgain, we can visualize the effect of GC-trend correction via MA and GC plots:
logCPM <- csaw::calculateCPM(proms, log=TRUE, use.offsets = TRUE)
ct <- split(colnames(logCPM), colData(proms)$CellType)
df <- vapply(ct, function(x) {
rowMeans(logCPM[,x,drop=FALSE])
}, rep(0, nrow(proms))) |>
as_tibble() |>
mutate(GC = rowData(proms)[,'G|C'],
overlapTSS = TRUE)
combi <- list(c('DP','CD4'), c('DP','CD8'), c('CD8','CD4'))
patchwork::wrap_plots(lapply(combi, function(x) plotGC(df, x[1], x[2])), ncol=2)`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
# Here we already see that correcting for GC bias alone doesnt't remove the enrichment bias
patchwork::wrap_plots(lapply(combi, function(x) plotMA(df, x[1], x[2])), ncol=2)`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
- Repeat the differential analysis for all three comparisons using the GC-trend corrected data. How many up- and down regulated differential peaks per comparison do you get now?
dgel3 <- dgel
dgel3$offset <- GC.offset
fit3 <- glmQLFit(dgel3, moma)
test3 <- sapply(colnames(contr),
function(cn) glmQLFTest(fit3, contrast = contr[,cn]),
simplify = FALSE)
sapply(test3, function(x) table(decideTests(x, p.value = 0.05, lfc = 1.5))) CellTypeCD4 - CellTypeDP CellTypeCD8 - CellTypeDP
-1 21 1651
0 23494 20721
1 147 1290
CellTypeCD4 - CellTypeCD8
-1 8
0 23652
1 2
gp <- lapply(colnames(contr), function(ns) {
data <- topTags(test3[[ns]], sort="none", n=Inf)$table |>
tibble::as_tibble() |>
dplyr::mutate(Significant = FDR < 0.05 & abs(logFC) > 1.5)
ggplot(data,
mapping = aes(x = logFC,
y = -log10(PValue),
label = symbol,
color = Significant)) +
rasterize( geom_point(alpha=0.5) ) +
geom_text_repel(data = data |> dplyr::filter(Significant & abs(logFC) > 1.5),
show.legend = FALSE) +
geom_vline(xintercept = 0, lty = 2) +
scale_color_manual(values=c(`TRUE`='red', `FALSE`='blue')) +
labs(title=ns) +
theme_bw()
})
patchwork::wrap_plots(gp, ncol=2)References
- Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ (2013) Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nature Methods 10, 1213-1218
- Corces MR et al. (2017) An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nature Methods 14, 959–962
- Grandi FC et al. (2022) Chromatin accessibility profiling by ATAC-seq. Nat Protoc 17, 1518–1552.
- Yoshida H et al. (2019) The cis-Regulatory Atlas of the Mouse Immune System. Cell. 176(4), 897-912
- Lun AT and Smyth GK (2016) csaw: a Bioconductor package for differential binding analysis of ChIP-seq data using sliding windows. NAR 44(5):e45
- Lun AT (2023) The csaw book. https://bioconductor.org/books/release/csawBook/
- Chen Y et al. (2025) edgeR v4: powerful differential analysis of sequencing data with expanded functionality and improved support for small counts and larger datasets. NAR 53(2):gkaf018
Session info
R version 4.6.0 (2026-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] ggrastr_1.0.2
[2] ggrepel_0.9.8
[3] patchwork_1.3.2
[4] Gviz_1.56.0
[5] TxDb.Mmusculus.UCSC.mm10.knownGene_3.10.0
[6] GenomicFeatures_1.64.0
[7] BSgenome.Mmusculus.UCSC.mm10_1.4.3
[8] BSgenome_1.80.0
[9] BiocIO_1.22.0
[10] Biostrings_2.80.0
[11] XVector_0.52.0
[12] org.Mm.eg.db_3.23.0
[13] AnnotationDbi_1.74.0
[14] edgeR_4.10.0
[15] limma_3.68.2
[16] csaw_1.46.0
[17] rtracklayer_1.72.0
[18] SummarizedExperiment_1.42.0
[19] Biobase_2.72.0
[20] GenomicRanges_1.64.0
[21] Seqinfo_1.2.0
[22] IRanges_2.46.0
[23] S4Vectors_0.50.0
[24] BiocGenerics_0.58.0
[25] generics_0.1.4
[26] MatrixGenerics_1.24.0
[27] matrixStats_1.5.0
[28] BiocStyle_2.40.0
[29] lubridate_1.9.5
[30] forcats_1.0.1
[31] stringr_1.6.0
[32] dplyr_1.2.1
[33] purrr_1.2.2
[34] readr_2.2.0
[35] tidyr_1.3.2
[36] tibble_3.3.1
[37] ggplot2_4.0.3
[38] tidyverse_2.0.0
[39] knitr_1.51
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 rstudioapi_0.18.0
[3] jsonlite_2.0.0 magrittr_2.0.5
[5] ggbeeswarm_0.7.3 farver_2.1.2
[7] rmarkdown_2.31 vctrs_0.7.3
[9] Cairo_1.7-0 memoise_2.0.1
[11] Rsamtools_2.28.0 RCurl_1.98-1.18
[13] base64enc_0.1-6 htmltools_0.5.9
[15] S4Arrays_1.12.0 progress_1.2.3
[17] curl_7.1.0 SparseArray_1.12.2
[19] Formula_1.2-5 htmlwidgets_1.6.4
[21] httr2_1.2.2 cachem_1.1.0
[23] GenomicAlignments_1.48.0 lifecycle_1.0.5
[25] pkgconfig_2.0.3 Matrix_1.7-5
[27] R6_2.6.1 fastmap_1.2.0
[29] digest_0.6.39 colorspace_2.1-2
[31] Hmisc_5.2-5 RSQLite_3.52.0
[33] labeling_0.4.3 filelock_1.0.3
[35] timechange_0.4.0 httr_1.4.8
[37] abind_1.4-8 mgcv_1.9-4
[39] compiler_4.6.0 bit64_4.8.0
[41] withr_3.0.2 backports_1.5.1
[43] htmlTable_2.5.0 S7_0.2.2
[45] BiocParallel_1.46.0 DBI_1.3.0
[47] hexbin_1.28.5 biomaRt_2.68.0
[49] rappdirs_0.3.4 DelayedArray_0.38.1
[51] rjson_0.2.23 tools_4.6.0
[53] foreign_0.8-91 vipor_0.4.7
[55] otel_0.2.0 beeswarm_0.4.0
[57] nnet_7.3-20 glue_1.8.1
[59] restfulr_0.0.16 nlme_3.1-169
[61] checkmate_2.3.4 cluster_2.1.8.2
[63] gtable_0.3.6 tzdb_0.5.0
[65] ensembldb_2.36.0 data.table_1.18.4
[67] hms_1.1.4 metapod_1.20.0
[69] pillar_1.11.1 splines_4.6.0
[71] BiocFileCache_3.2.0 lattice_0.22-9
[73] deldir_2.0-4 bit_4.6.0
[75] biovizBase_1.60.0 tidyselect_1.2.1
[77] locfit_1.5-9.12 gridExtra_2.3
[79] ProtGenerics_1.44.0 xfun_0.57
[81] statmod_1.5.1 stringi_1.8.7
[83] UCSC.utils_1.8.0 lazyeval_0.2.3
[85] yaml_2.3.12 evaluate_1.0.5
[87] codetools_0.2-20 cigarillo_1.2.0
[89] interp_1.1-6 BiocManager_1.30.27
[91] cli_3.6.6 rpart_4.1.27
[93] dichromat_2.0-0.1 Rcpp_1.1.1-1.1
[95] GenomeInfoDb_1.48.0 dbplyr_2.5.2
[97] png_0.1-9 XML_3.99-0.23
[99] parallel_4.6.0 blob_1.3.0
[101] prettyunits_1.2.0 jpeg_0.1-11
[103] latticeExtra_0.6-31 AnnotationFilter_1.36.0
[105] bitops_1.0-9 VariantAnnotation_1.58.0
[107] scales_1.4.0 crayon_1.5.3
[109] rlang_1.2.0 KEGGREST_1.52.0