Sequence motifs: from position weigth matrices to motif enrichments
Goal
- Understand what a sequence motifs is and how it is represented
- Learn how to search DNA for potential TF binding sites using sequence motifs
- Learn how to test if differentially accessible peaks (as identified in the previous lecture) are enriched for specific transcription factor binding sites
Required packages
- We will use the following set of packages during the lecture:
## parallel computation
library(BiocParallel)
## plotting
library(ggplot2)
library(ggdendro)
## sequence motif specifc packages
library(universalmotif)
library(MotifDb)
library(TFBSTools)
library(motifmatchr)
library(BSgenome.Mmusculus.UCSC.mm10)
library(rtracklayer)
library(JASPAR2022)
## binned enrichment
library(monaLisa)
library(SummarizedExperiment)
##
library(ComplexHeatmap)
library(circlize)
Sequence motifs
Representing sequence motifs
DNA sequence motifs where first defined from sets of experimentally identified transcription factor binding sites as e.g. studied in EMSA, protein binding arrays, ChIP-Seq or high-throughput SELEX
These experiments yield sets of sequences with high affinity for the TF under study. Lets take a look at such as set identified from human GATA2 binding using classical electrophoretic mobility shift assay (EMSA).
By aligning the sequences and treating positions independently we can derive a compact representation of the GATA2 binding site in form of a position count matrix (PCM), also called position frequency matrix (PFM): at each position we tabulate the occurrence of all possible letters (here: A,C,G,T).
There are quite a number of different Bioconductor packages and classes to represent sequence motifs. The base functionality for handling sequence motifs is defined in the Biostrings package, but it is enhanced by many other packages such MotifDb and TFBSTools. However, these packages define their own classes which hinders inter-operability. The universalmotif package aims at providing a uniform interface to all these different classes. Nevertheless to be able to use all functionality, e.g. fast motif searches, we still rely on all these different packages, even though some of the functionality is redundant.
Given the GATA2 example sequences we can create a position count matrix (PCM) using the
create_motif
function from universalmotif by:
<- readRDS('data/Motifs/hGATA2_binding_sites.rds')
seqs
<- create_motif(seqs, type="PCM", name = "GATA2")
gata2 gata2
Motif name: GATA2
Alphabet: DNA
Type: PCM
Strands: +-
Total IC: 9.19
Pseudocount: 0
Consensus: SGATAN
Target sites: 19
S G A T A N
A 0 0 19 0 19 2
C 9 0 0 0 0 6
G 10 19 0 0 0 8
T 0 0 0 19 0 3
- Another representation which abstracts from the total number of counts is the position probability matrix (PPM). It normalizes the count matrix by the number of observations, resulting in an estimate for the probability of a observing each letter at a given position. Let \(i \in \{1, 2, 3, 4 \}\) label the bases (vertical direction in the motif matrix) and \(j\) the positin in the motif (horizontal direction), then elements of PPM are simply given as
\[ {\rm PPM}_{ij} = \frac{\rm counts_{ij}}{\rm \sum_{i^{\prime}} counts_{i^{\prime}j}} . \]
convert_type(gata2, type="PPM")
Motif name: GATA2
Alphabet: DNA
Type: PPM
Strands: +-
Total IC: 9.19
Pseudocount: 0
Consensus: SGATAN
Target sites: 19
S G A T A N
A 0.00 0 1 0 1 0.11
C 0.47 0 0 0 0 0.32
G 0.53 1 0 0 0 0.42
T 0.00 0 0 1 0 0.16
view_motifs(gata2, use.type = "PPM")
- Closely related is the position weight matrix (PWM), which uses the PPM and a background model to assigns a weight to each letter at each position. Background refers here to the base composition at non-specific sites. Often it is assumed to be either uniform (i.e. \(1/4\) for all A,T,C,G) or the genome-wide nucleotide frequencies. PWM scores are the log-ratio of the PPM and the background frequencies:
\[ {\rm PWM}_{ij} = \log_2\left(\frac{\rm PPM_{ij}}{\rm bg_{i}}\right) \] where \({\rm bg}_{i}\) is the background frequency of base \(i\).
PWM scores are negative if the probability for letter at a given position is below the background probability.
In order to confine scores to finite values one has to make use of pseudo-counts which will make probabilities greater 0.
- The example below uses as a background frequency the occurrence of A,C,G,T on chromosome 1 of the mouse genome.
<- oligonucleotideFrequency(BSgenome.Mmusculus.UCSC.mm10$chr1, width=1)
freq freq
A C G T
56530182 39495313 39467408 56416289
/sum(freq) freq
A C G T
0.2945673 0.2058021 0.2056567 0.2939739
create_motif(seqs, type="PWM", name="GATA2", pseudocount = 1, bkg = freq/sum(freq))
Motif name: GATA2
Alphabet: DNA
Type: PWM
Strands: +-
Total IC: 7.84
Pseudocount: 1
Consensus: SGATAN
Target sites: 19
S G A T A N
A -4.56 -4.56 1.71 -4.56 1.71 -1.39
C 1.17 -4.04 -4.04 -4.04 -4.04 0.60
G 1.32 2.23 -4.04 -4.04 -4.04 1.00
T -4.56 -4.56 -4.56 1.71 -4.56 -0.86
- Yet the final representation of a sequence motif uses information content (IC) as a means to highlight conserved over non conserved sites. The information content matrix (ICM) is a scaled version of the PPM where each position gets scaled by its information content (IC) measured in bits. Most conserved sites have an IC of 2 bits. This type of visualization is sometimes called sequence logo.
view_motifs(gata2, use.type = "ICM")
convert_type(gata2, type="ICM")
Motif name: GATA2
Alphabet: DNA
Type: ICM
Strands: +-
Total IC: 9.19
Pseudocount: 0
Consensus: SGATAN
Target sites: 19
S G A T A N
A 0.00 0 2 0 2 0.02
C 0.47 0 0 0 0 0.06
G 0.53 2 0 0 0 0.08
T 0.00 0 0 2 0 0.03
A detailed derivation of ICM based on Shannon entropy can be found here. The idea is to quantify how much each site is informative about its composition compared to the random background. Long or polarized motifs have generally high IC and can thus be more uniquely matched to the genome.
Some limitations of sequence motifs
PWMs are the ones used to scan sequences and identify matches e.g. potential TF binding sites
In many cases they are a good proxy for the TFs binding affinity, but there are also limitations to the PWM model:
- they use single letter frequencies, i.e., they do not capture dependencies between different sites.
- Independent of affinity, TF concentration is a major determinant for binding site occupancy. While at low TF concentrations the PWM score for a given site reflects the probability of it being bound, at very high TF concentration unspecific binding will dominate.
- The binding preference is also determined by other factors such as chromatin state or co-factors.
Motif databases
There are several resources providing PWM matrices, some are free (i.e., Jaspar), some require a license (i.e., Transfac).
Many resources are directly available through R packages e.g. the MotifDb, TFBSTools and JASPAR2022 packages.
All these databases and packages collect the PWMs from different primary sources and there is a lot of redundancy in them. Even a single resource might have multiple PWMs for the same TF (acquired in different species, by different techniques).
For example lets check the GATA2 transcription factor in the MotifDb package:
library(MotifDb)
query(MotifDb, andStrings=(c("GATA2")))
MotifDb object of length 19
| Created from downloaded public sources, last update: 2022-Mar-04
| 19 position frequency matrices from 12 sources:
| HOCOMOCOv10: 2
| HOCOMOCOv11-core-A: 1
| HOCOMOCOv11-secondary-A: 1
| HOCOMOCOv13: 1
| HOMER: 1
| JASPAR_2014: 1
| JASPAR_CORE: 1
| jaspar2016: 2
| jaspar2018: 4
| jaspar2022: 2
| jaspar2024: 2
| SwissRegulon: 1
| 4 organism/s
| Hsapiens: 14
| Athaliana: 3
| Mmusculus: 1
| other: 1
Hsapiens-HOCOMOCOv13-GATA2.H13CORE.1.P.B
Hsapiens-HOCOMOCOv10-GATA2_HUMAN.H10MO.A
Mmusculus-HOCOMOCOv10-GATA2_MOUSE.H10MO.A
Hsapiens-HOCOMOCOv11-core-A-GATA2_HUMAN.H11MO.0.A
Hsapiens-HOCOMOCOv11-secondary-A-GATA2_HUMAN.H11MO.1.A
...
Hsapiens-jaspar2022-GATA2-MA0036.3
Athaliana-jaspar2022-GATA20-MA1324.1
Hsapiens-jaspar2024-GATA2-MA0036.4
Athaliana-jaspar2024-GATA20-MA1324.1
Hsapiens-SwissRegulon-GATA2.SwissRegulon
query(MotifDb, andStrings=(c("GATA2", "JASPAR2018")))
MotifDb object of length 4
| Created from downloaded public sources, last update: 2022-Mar-04
| 4 position frequency matrices from 1 source:
| jaspar2018: 4
| 2 organism/s
| Hsapiens: 3
| Athaliana: 1
Hsapiens-jaspar2018-GATA2-MA0036.1
Hsapiens-jaspar2018-GATA2-MA0036.2
Hsapiens-jaspar2018-GATA2-MA0036.3
Athaliana-jaspar2018-GATA20-MA1324.1
query(MotifDb, andStrings=(c("GATA2", "JASPAR2018", "Hsapiens")))
MotifDb object of length 3
| Created from downloaded public sources, last update: 2022-Mar-04
| 3 position frequency matrices from 1 source:
| jaspar2018: 3
| 1 organism/s
| Hsapiens: 3
Hsapiens-jaspar2018-GATA2-MA0036.1
Hsapiens-jaspar2018-GATA2-MA0036.2
Hsapiens-jaspar2018-GATA2-MA0036.3
- This set of final GATA2 motifs from the Jaspar2022 database is quite redundant as the following picture shows. Here, we add our matrix to the end of the list.
<- query(MotifDb, andStrings=(c("GATA2", "JASPAR2018", "Hsapiens")))
res <- convert_motifs(res)
res 4]] <- gata2
res[[view_motifs(res, dedup.names = TRUE)
This redundancy complicates the interpretation of motif analysis, however its not clear from the beginning which motifs are superior in a redundant motif set. Often the motif with he highest information content is only reflecting the consensus site or the best scoring site and will not match relevant sites with lower affinity.
For the following analysis we will select PWMs from the “core” of the Jaspar 2022 database using
TFBSTools
. As the number of mouse specific motifs is quite limited, we will select all vertebrate motifs.
<- getMatrixSet(JASPAR2022,
pwms opts = list(matrixtype = "PWM",
collection="CORE",
tax_group = "vertebrates"))
<- sapply(pwms, name) #collect names tfNames
Sequence matching and scoring
- The score of a sequence is calculated by summing the score of the matching letter for each position of the PWM
<- convert_type(gata2, type="PWM", pseudocount = 1)
gata2PWM 'motif'] gata2PWM[
S G A T A N
A -4.3219281 -4.321928 1.944858 -4.321928 1.944858 -1.1520031
C 0.8875253 -4.321928 -4.321928 -4.321928 -4.321928 0.3219281
G 1.0356239 1.944858 -4.321928 -4.321928 -4.321928 0.7224660
T -4.3219281 -4.321928 -4.321928 1.944858 -4.321928 -0.6214884
score_match(gata2PWM, "CAAGAT")
[1] -4.488
0.8875253 + (-4.321928) + 1.944858 + (-4.321928) + 1.944858 + (-0.6214884)
[1] -4.488103
- We can also assess the minimal and maximal score of the motif.
motif_range(gata2PWM)
min max
-22.757 9.533
#by hand
<- 1.0356239 + 1.944858 + 1.944858 + 1.944858 + 1.944858 + 0.7224660
mot.max <- - 4.3219281 - 4.321928 - 4.321928 - 4.321928 - 4.321928 - 1.1520031
mot.min
sprintf("Maximal score of the motif: %s", mot.max)
[1] "Maximal score of the motif: 9.5375219"
sprintf("Minimal score of the motif: %s", mot.min)
[1] "Minimal score of the motif: -22.7616432"
Potential target sites of TFs (e.g., genomic locations potentially bound by the TF) are showing scores greater than 0, and non-specific sites are showing scores below 0.
However a score threshold of 0 will in almost all cases (except very polarized motifs) pick far too many non-relevant sites when e.g. applied to a set of peak or promoter sequences.
Ideally, each PWM would be accompanied by a score threshold which is derived experimentally e.g. using ChIP-Seq data by contrasting true binding sites with non-functional sites. Such an absolute score would reflect directly the critical binding affinity for each TF. This approach was used in the HOMER database, but doesn’t guarantee an optimal score threshold when applied to other sets of sequences. This approach using an absolute score is also implemented in
monalisa
package as the default motif scoring.There are other two alternatives implemented in different Bioconductor packages:
Use a simple similarity cutoff, e.g., 80% of the highest possible PWM score (
matchPWM
approach form Biostrings). This approach tends to favor short motifs.Calculate p-values per motif and threshold them. Here, first a sequence background needs to be estimated for each motif. This process generates random sequences of the same length as the motif sequence and scores them. The threshold \(T\) is indirectly chosen such, that the probability, that the background distribution would generate the set of sequences all with score \(\ge T\) is less likely than the p-value. Usually a relatively stringent p-value needs to be chosen. Then short motifs or motifs with low information content get a threshold which is close or equal to their maximal score (i.e. only sites close to perfect matches are predicted). On the other hand motifs with high information content are treated more liberal, such that more sites that deviate from the optimal are retained.
Binned motif enrichment analysis
In the previous lecture we identified differentially open chromatin regions between the doubly-positive (DP) and CD4+/CD8+ T cell populations. Now we would like to infer underlying regulators. To this end we will investigate the relation between the occurrence of (potential) transcription factor binding sites and differential accessibility.
The basic idea would to identify overrepresented motifs in the differentially regulated regions and thus link them to activities of transcription factors. We could generate a contingency table per TF motif that will split the chromatin regions according to their accessibility (specific vs. common) and if the TF motif is identified (target vs. non-target). Then, Fisher’s test can be applied.
common | specific (up or down) | |
---|---|---|
target | common.target | specific.target |
non-target | common.non-target | specific.non-target |
We will elaborate on this basic idea and implement a binned version of enrichment analysis using
monalisa
package. The word “binned” refers to the approach where we consider a continuous score per analysed region (like \(\rm{log}_2 FC\)) and group the regions based on this score into several bins. Using Fisher’s test, we subsequently test overrepresentation of the motifs in each bin (foreground) with respect to all the remaining regions (background). Compared to the approach using the contingency table above, we will obtain a finer resolution of the associations between the motifs and the observed changes in accessibility.The package performs the motif matching as well as the Fisher’s test. It also provides useful diagnostic tools and has a built-in sequence composition correction for GC and k-mer differences between the foreground and background.
Region binning
First, we read the data from the last week’s lecture. We will will try to identify TFs that are driving the differentiation of T cells towards CD4+ helper cells.
<- read.table(gzfile("data/Motifs/peaks_CD4_vs_DP.tsv.gz"), header=TRUE, stringsAsFactors=FALSE, sep="\t")
peaks <- GRanges(peaks, seqinfo=seqinfo(BSgenome.Mmusculus.UCSC.mm10))
peaks.gr names(peaks.gr) <-as.character(peaks.gr)
peaks.gr
GRanges object with 48744 ranges and 14 metadata columns:
seqnames ranges strand | overlap
<Rle> <IRanges> <Rle> | <character>
chr1:3042901-3043350 chr1 3042901-3043350 * |
chr1:4258201-4258500 chr1 4258201-4258500 * | Rp1:-:I
chr1:4496251-4496850 chr1 4496251-4496850 * | Sox17:-:PE
chr1:4771051-4771200 chr1 4771051-4771200 * |
chr1:4780201-4780350 chr1 4780201-4780350 * | Mrpl15:-:I
... ... ... ... . ...
chrY:897451-897600 chrY 897451-897600 * | Kdm5d:+:P
chrY:1010251-1010700 chrY 1010251-1010700 * | Eif2s3y:+:PE
chrY:1245451-1245900 chrY 1245451-1245900 * | Uty:-:PE
chrY:2224951-2225100 chrY 2224951-2225100 * |
chrY:2225401-2225550 chrY 2225401-2225550 * |
left right num.tests num.up.logFC
<character> <character> <integer> <integer>
chr1:3042901-3043350 3 0
chr1:4258201-4258500 Rp1:-:3027 2 0
chr1:4496251-4496850 Sox17:-:309 4 1
chr1:4771051-4771200 Mrpl15:-:2006 1 0
chr1:4780201-4780350 Mrpl15:-:2553 Mrpl15:-:871 1 0
... ... ... ... ...
chrY:897451-897600 Kdm5d:+:188 1 0
chrY:1010251-1010700 Eif2s3y:+:560 3 0
chrY:1245451-1245900 Uty:-:232 3 1
chrY:2224951-2225100 1 0
chrY:2225401-2225550 1 0
num.down.logFC PValue FDR direction
<integer> <numeric> <numeric> <character>
chr1:3042901-3043350 0 1.96259e-01 0.414670778 down
chr1:4258201-4258500 2 3.55200e-02 0.141788961 down
chr1:4496251-4496850 0 5.06666e-06 0.000198688 up
chr1:4771051-4771200 0 2.84625e-01 0.513425190 down
chr1:4780201-4780350 0 3.98520e-01 0.618252846 up
... ... ... ... ...
chrY:897451-897600 0 0.1663729 0.375639 up
chrY:1010251-1010700 0 0.1824989 0.397326 up
chrY:1245451-1245900 0 0.0398650 0.152983 up
chrY:2224951-2225100 0 0.0880565 0.254419 down
chrY:2225401-2225550 0 0.0513747 0.180484 down
rep.test rep.logFC rep.logCPM Contrast
<integer> <numeric> <numeric> <character>
chr1:3042901-3043350 3 -0.776602 0.116788 CD4 - DP
chr1:4258201-4258500 5 -1.115939 -0.661670 CD4 - DP
chr1:4496251-4496850 7 2.978558 0.121192 CD4 - DP
chr1:4771051-4771200 10 -0.732725 -1.027829 CD4 - DP
chr1:4780201-4780350 11 0.504890 -0.618404 CD4 - DP
... ... ... ... ...
chrY:897451-897600 125829 0.867217 -0.817155 CD4 - DP
chrY:1010251-1010700 125832 0.876682 0.315328 CD4 - DP
chrY:1245451-1245900 125833 1.636117 -0.948253 CD4 - DP
chrY:2224951-2225100 125836 -1.034671 -0.894518 CD4 - DP
chrY:2225401-2225550 125837 -0.990185 -0.498133 CD4 - DP
name
<character>
chr1:3042901-3043350 Common
chr1:4258201-4258500 Common
chr1:4496251-4496850 CD4
chr1:4771051-4771200 Common
chr1:4780201-4780350 Common
... ...
chrY:897451-897600 Common
chrY:1010251-1010700 Common
chrY:1245451-1245900 Common
chrY:2224951-2225100 Common
chrY:2225401-2225550 Common
-------
seqinfo: 239 sequences (1 circular) from mm10 genome
Now, we will bin the regions according to the reported log-fold-change. First, let’s have a look how are these values distributed.
hist(peaks.gr$rep.logFC, 100, col = "gray", main = "",
xlab = "Change of accesibility (log2FC)", ylab = "frequency")
The distribution is fairly symmetric and centered around 0. We will use the bin
function from the package to split the regions into groups of an equal size of 2000 and by minAbsX = 1
<- monaLisa::bin(x = peaks.gr$rep.logFC, binmode = "equalN", nElement = 2000,
bins minAbsX = 1)
plotBinDensity(peaks.gr$rep.logFC, bins, legend = "topleft", xlab = "log2FC")
Warning: `legend` is deprecated and ignored.
ℹ You can use `legendPosition` to control legend drawing and position
Note that the bin breaks around the no-change bin are not exactly -1 to 1. They have been adjusted to have the required 2000 per bin below and above it.
The size of the identified regions from ATACseq can vary substantially. The very long regions have naturally higher chance to be randomly matched by a TF motif.
summary(width(peaks.gr))
Min. 1st Qu. Median Mean 3rd Qu. Max.
150.0 150.0 300.0 399.9 450.0 3900.0
hist(width(peaks.gr))
abline(v = median(width(peaks.gr)), col="red") #add median value
abline(v = quantile(width(peaks.gr),0.95), col="blue", lwd=3, lty=2)
At this point it is a question if we ignore this bias or trim the sequences, which on the other hand leads to a loss of regions that we identified as accessible. One solution could be to split the very long regions and analyse them separately. For simplicity, here we will trim only the very long sequences (above the 95th quantile of the sequence width distribution).
<- quantile(width(peaks.gr),0.95)
width.for.resize width(peaks.gr) > width.for.resize ] <- trim( resize(peaks.gr[width(peaks.gr) > width.for.resize ], width = width.for.resize, fix = "center"))
peaks.gr[summary(width( peaks.gr))
Min. 1st Qu. Median Mean 3rd Qu. Max.
150.0 150.0 300.0 384.1 450.0 1200.0
Sequence compositions
So far, our GRanges
object contains only the genomic coordinates of the regions. Let’s extract the respective sequences using Biostrings
package and turn the variable into a DNAStringSet
object.
<- getSeq(BSgenome.Mmusculus.UCSC.mm10, peaks.gr) peaks.gr
Now we can plot the CG content and dinucleotide frequencies across the bins.
plotBinDiagnostics(seqs = peaks.gr, bins = bins, aspect = "GCfrac")
::plotBinDiagnostics(seqs = peaks.gr, bins = bins, aspect = "dinucfreq") monaLisa
Apparently, the CG content is generally correlated with identified differential accessibility. This would mean that any GC-rich motif is likely to be associated with the bins with negative log-fold-change. As already mentioned, monalisa
takes the GC content into account. Alternatively, one could re-sample the regions with more consistent GC content.
Enrichment test
The enrichment test is performed using the function calcBinnedMotifEnrR
. The function performs several steps:
It matches the motifs to the sequences of the studied regions using the
findMotifHits
function. The default setting implementsmatchPWM
function from theBiostrings
package with a fixed empiric scoremin.score = 10
.It performs adjustments for different k-mer and GC compositions between the foreground and the background. The default maximal k-mer length is set
maxKmerSize = 3L
.Fisher’s test is performed for each bin vs. the rest. Benjamini-Hochberg correction for multiple testing is applied.
suppressWarnings({
<- calcBinnedMotifEnrR(seqs = peaks.gr ,
MotifEnrichment pwmL = pwms ,
bins = bins ,
BPPARAM = BiocParallel::SerialParam(RNGseed = 42),
verbose = TRUE)
})
ℹ Filtering sequences ...
ℹ in total filtering out 0 of 48744 sequences (0%)
✔ in total filtering out 0 of 48744 sequences (0%) [7ms]
ℹ Filtering sequences ...
✔ Filtering sequences ... [60ms]
ℹ Scanning sequences for motif hits...
✔ Scanning sequences for motif hits... [4m 41.1s]
ℹ Create motif hit matrix...
ℹ starting analysis of bin [-7.97,-1.88]
✔ starting analysis of bin [-7.97,-1.88] [4ms]
ℹ Create motif hit matrix...
ℹ Defining background sequence set (otherBins)...
✔ Defining background sequence set (otherBins)... [73ms]
ℹ Create motif hit matrix...
ℹ Correcting for GC differences to the background sequences...
ℹ 7 of 9 GC-bins used (have both fore- and background sequences) 12 of 487…
✔ 7 of 9 GC-bins used (have both fore- and background sequences) 12 of 487…
ℹ Correcting for GC differences to the background sequences...
✔ Correcting for GC differences to the background sequences... [66ms]
ℹ Create motif hit matrix...
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ starting iterative adjustment for k-mer composition (up to 160 iteration…
✔ starting iterative adjustment for k-mer composition (up to 160 iteration…
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 40 of 160 iterations done
✔ 80 of 160 iterations done [3s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 80 of 160 iterations done
✔ 120 of 160 iterations done [2.7s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 120 of 160 iterations done
✔ 160 of 160 iterations done [2.6s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 160 of 160 iterations done
ℹ iterations finished
ℹ 160 of 160 iterations done
✔ 160 of 160 iterations done [14ms]
ℹ Correcting for k-mer differences between fore- and background sequences.…
✔ Correcting for k-mer differences between fore- and background sequences.…
ℹ Create motif hit matrix...
ℹ Calculating motif enrichment...
ℹ using Fisher's exact test (one-sided) to calculate log(p-values) for enr…
✔ using Fisher's exact test (one-sided) to calculate log(p-values) for enr…
ℹ Calculating motif enrichment...
✔ Calculating motif enrichment... [1.2s]
ℹ Create motif hit matrix...
ℹ starting analysis of bin (-1.88,-1.45]
✔ starting analysis of bin (-1.88,-1.45] [4ms]
ℹ Create motif hit matrix...
ℹ Defining background sequence set (otherBins)...
✔ Defining background sequence set (otherBins)... [67ms]
ℹ Create motif hit matrix...
ℹ Correcting for GC differences to the background sequences...
ℹ 8 of 9 GC-bins used (have both fore- and background sequences) 0 of 4874…
✔ 8 of 9 GC-bins used (have both fore- and background sequences) 0 of 4874…
ℹ Correcting for GC differences to the background sequences...
✔ Correcting for GC differences to the background sequences... [42ms]
ℹ Create motif hit matrix...
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ starting iterative adjustment for k-mer composition (up to 160 iteration…
✔ starting iterative adjustment for k-mer composition (up to 160 iteration…
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 40 of 160 iterations done
✔ 80 of 160 iterations done [2.7s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 80 of 160 iterations done
✔ 120 of 160 iterations done [3s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 120 of 160 iterations done
✔ 160 of 160 iterations done [3.2s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 160 of 160 iterations done
ℹ iterations finished
ℹ 160 of 160 iterations done
✔ 160 of 160 iterations done [21ms]
ℹ Correcting for k-mer differences between fore- and background sequences.…
✔ Correcting for k-mer differences between fore- and background sequences.…
ℹ Create motif hit matrix...
ℹ Calculating motif enrichment...
ℹ using Fisher's exact test (one-sided) to calculate log(p-values) for enr…
✔ using Fisher's exact test (one-sided) to calculate log(p-values) for enr…
ℹ Calculating motif enrichment...
✔ Calculating motif enrichment... [1.1s]
ℹ Create motif hit matrix...
ℹ starting analysis of bin (-1.45,-1.18]
✔ starting analysis of bin (-1.45,-1.18] [4ms]
ℹ Create motif hit matrix...
ℹ Defining background sequence set (otherBins)...
✔ Defining background sequence set (otherBins)... [70ms]
ℹ Create motif hit matrix...
ℹ Correcting for GC differences to the background sequences...
ℹ 7 of 9 GC-bins used (have both fore- and background sequences) 12 of 487…
✔ 7 of 9 GC-bins used (have both fore- and background sequences) 12 of 487…
ℹ Correcting for GC differences to the background sequences...
✔ Correcting for GC differences to the background sequences... [39ms]
ℹ Create motif hit matrix...
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ starting iterative adjustment for k-mer composition (up to 160 iteration…
✔ starting iterative adjustment for k-mer composition (up to 160 iteration…
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 40 of 160 iterations done
✔ 80 of 160 iterations done [2.7s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 80 of 160 iterations done
✔ 120 of 160 iterations done [3s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 120 of 160 iterations done
✔ 160 of 160 iterations done [2.8s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 160 of 160 iterations done
ℹ iterations finished
ℹ 160 of 160 iterations done
✔ 160 of 160 iterations done [16ms]
ℹ Correcting for k-mer differences between fore- and background sequences.…
✔ Correcting for k-mer differences between fore- and background sequences.…
ℹ Create motif hit matrix...
ℹ Calculating motif enrichment...
ℹ using Fisher's exact test (one-sided) to calculate log(p-values) for enr…
✔ using Fisher's exact test (one-sided) to calculate log(p-values) for enr…
ℹ Calculating motif enrichment...
✔ Calculating motif enrichment... [1.2s]
ℹ Create motif hit matrix...
ℹ starting analysis of bin (-1.18,-0.989]
✔ starting analysis of bin (-1.18,-0.989] [5ms]
ℹ Create motif hit matrix...
ℹ Defining background sequence set (otherBins)...
✔ Defining background sequence set (otherBins)... [78ms]
ℹ Create motif hit matrix...
ℹ Correcting for GC differences to the background sequences...
ℹ 7 of 9 GC-bins used (have both fore- and background sequences) 12 of 487…
✔ 7 of 9 GC-bins used (have both fore- and background sequences) 12 of 487…
ℹ Correcting for GC differences to the background sequences...
✔ Correcting for GC differences to the background sequences... [56ms]
ℹ Create motif hit matrix...
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ starting iterative adjustment for k-mer composition (up to 160 iteration…
✔ starting iterative adjustment for k-mer composition (up to 160 iteration…
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 40 of 160 iterations done
✔ 80 of 160 iterations done [2.9s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 80 of 160 iterations done
✔ 120 of 160 iterations done [3.4s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 120 of 160 iterations done
✔ 160 of 160 iterations done [2.9s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 160 of 160 iterations done
ℹ iterations finished
ℹ 160 of 160 iterations done
✔ 160 of 160 iterations done [18ms]
ℹ Correcting for k-mer differences between fore- and background sequences.…
✔ Correcting for k-mer differences between fore- and background sequences.…
ℹ Create motif hit matrix...
ℹ Calculating motif enrichment...
ℹ using Fisher's exact test (one-sided) to calculate log(p-values) for enr…
✔ using Fisher's exact test (one-sided) to calculate log(p-values) for enr…
ℹ Calculating motif enrichment...
✔ Calculating motif enrichment... [1.2s]
ℹ Create motif hit matrix...
ℹ starting analysis of bin (-0.989,1]
✔ starting analysis of bin (-0.989,1] [4ms]
ℹ Create motif hit matrix...
ℹ Defining background sequence set (otherBins)...
✔ Defining background sequence set (otherBins)... [71ms]
ℹ Create motif hit matrix...
ℹ Correcting for GC differences to the background sequences...
ℹ 8 of 9 GC-bins used (have both fore- and background sequences) 0 of 4874…
✔ 8 of 9 GC-bins used (have both fore- and background sequences) 0 of 4874…
ℹ Correcting for GC differences to the background sequences...
✔ Correcting for GC differences to the background sequences... [41ms]
ℹ Create motif hit matrix...
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ starting iterative adjustment for k-mer composition (up to 160 iteration…
✔ starting iterative adjustment for k-mer composition (up to 160 iteration…
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 40 of 160 iterations done
✔ 80 of 160 iterations done [1.6s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 80 of 160 iterations done
✔ 120 of 160 iterations done [1.6s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 120 of 160 iterations done
✔ 160 of 160 iterations done [1.9s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 160 of 160 iterations done
ℹ iterations finished
ℹ 160 of 160 iterations done
✔ 160 of 160 iterations done [15ms]
ℹ Correcting for k-mer differences between fore- and background sequences.…
✔ Correcting for k-mer differences between fore- and background sequences.…
ℹ Create motif hit matrix...
ℹ Calculating motif enrichment...
ℹ using Fisher's exact test (one-sided) to calculate log(p-values) for enr…
✔ using Fisher's exact test (one-sided) to calculate log(p-values) for enr…
ℹ Calculating motif enrichment...
✔ Calculating motif enrichment... [2.3s]
ℹ Create motif hit matrix...
ℹ starting analysis of bin (1,1.16]
✔ starting analysis of bin (1,1.16] [4ms]
ℹ Create motif hit matrix...
ℹ Defining background sequence set (otherBins)...
✔ Defining background sequence set (otherBins)... [86ms]
ℹ Create motif hit matrix...
ℹ Correcting for GC differences to the background sequences...
ℹ 8 of 9 GC-bins used (have both fore- and background sequences) 0 of 4874…
✔ 8 of 9 GC-bins used (have both fore- and background sequences) 0 of 4874…
ℹ Correcting for GC differences to the background sequences...
✔ Correcting for GC differences to the background sequences... [55ms]
ℹ Create motif hit matrix...
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ starting iterative adjustment for k-mer composition (up to 160 iteration…
✔ starting iterative adjustment for k-mer composition (up to 160 iteration…
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 40 of 160 iterations done
✔ 80 of 160 iterations done [3.1s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 80 of 160 iterations done
✔ 120 of 160 iterations done [3.2s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 120 of 160 iterations done
✔ 160 of 160 iterations done [2.9s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 160 of 160 iterations done
ℹ iterations finished
ℹ 160 of 160 iterations done
✔ 160 of 160 iterations done [15ms]
ℹ Correcting for k-mer differences between fore- and background sequences.…
✔ Correcting for k-mer differences between fore- and background sequences.…
ℹ Create motif hit matrix...
ℹ Calculating motif enrichment...
ℹ using Fisher's exact test (one-sided) to calculate log(p-values) for enr…
✔ using Fisher's exact test (one-sided) to calculate log(p-values) for enr…
ℹ Calculating motif enrichment...
✔ Calculating motif enrichment... [1.1s]
ℹ Create motif hit matrix...
ℹ starting analysis of bin (1.16,1.36]
✔ starting analysis of bin (1.16,1.36] [4ms]
ℹ Create motif hit matrix...
ℹ Defining background sequence set (otherBins)...
✔ Defining background sequence set (otherBins)... [70ms]
ℹ Create motif hit matrix...
ℹ Correcting for GC differences to the background sequences...
ℹ 7 of 9 GC-bins used (have both fore- and background sequences) 12 of 487…
✔ 7 of 9 GC-bins used (have both fore- and background sequences) 12 of 487…
ℹ Correcting for GC differences to the background sequences...
✔ Correcting for GC differences to the background sequences... [41ms]
ℹ Create motif hit matrix...
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ starting iterative adjustment for k-mer composition (up to 160 iteration…
✔ starting iterative adjustment for k-mer composition (up to 160 iteration…
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 40 of 160 iterations done
✔ 80 of 160 iterations done [2.7s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 80 of 160 iterations done
✔ 120 of 160 iterations done [2.7s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 120 of 160 iterations done
✔ 160 of 160 iterations done [2.7s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 160 of 160 iterations done
ℹ iterations finished
ℹ 160 of 160 iterations done
✔ 160 of 160 iterations done [16ms]
ℹ Correcting for k-mer differences between fore- and background sequences.…
✔ Correcting for k-mer differences between fore- and background sequences.…
ℹ Create motif hit matrix...
ℹ Calculating motif enrichment...
ℹ using Fisher's exact test (one-sided) to calculate log(p-values) for enr…
✔ using Fisher's exact test (one-sided) to calculate log(p-values) for enr…
ℹ Calculating motif enrichment...
✔ Calculating motif enrichment... [1s]
ℹ Create motif hit matrix...
ℹ starting analysis of bin (1.36,1.6]
✔ starting analysis of bin (1.36,1.6] [4ms]
ℹ Create motif hit matrix...
ℹ Defining background sequence set (otherBins)...
✔ Defining background sequence set (otherBins)... [67ms]
ℹ Create motif hit matrix...
ℹ Correcting for GC differences to the background sequences...
ℹ 8 of 9 GC-bins used (have both fore- and background sequences) 0 of 4874…
✔ 8 of 9 GC-bins used (have both fore- and background sequences) 0 of 4874…
ℹ Correcting for GC differences to the background sequences...
✔ Correcting for GC differences to the background sequences... [39ms]
ℹ Create motif hit matrix...
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ starting iterative adjustment for k-mer composition (up to 160 iteration…
✔ starting iterative adjustment for k-mer composition (up to 160 iteration…
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 40 of 160 iterations done
✔ 80 of 160 iterations done [2.7s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 80 of 160 iterations done
✔ 120 of 160 iterations done [2.7s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 120 of 160 iterations done
✔ 160 of 160 iterations done [3s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 160 of 160 iterations done
ℹ iterations finished
ℹ 160 of 160 iterations done
✔ 160 of 160 iterations done [13ms]
ℹ Correcting for k-mer differences between fore- and background sequences.…
✔ Correcting for k-mer differences between fore- and background sequences.…
ℹ Create motif hit matrix...
ℹ Calculating motif enrichment...
ℹ using Fisher's exact test (one-sided) to calculate log(p-values) for enr…
✔ using Fisher's exact test (one-sided) to calculate log(p-values) for enr…
ℹ Calculating motif enrichment...
✔ Calculating motif enrichment... [1.1s]
ℹ Create motif hit matrix...
ℹ starting analysis of bin (1.6,1.93]
✔ starting analysis of bin (1.6,1.93] [5ms]
ℹ Create motif hit matrix...
ℹ Defining background sequence set (otherBins)...
✔ Defining background sequence set (otherBins)... [71ms]
ℹ Create motif hit matrix...
ℹ Correcting for GC differences to the background sequences...
ℹ 7 of 9 GC-bins used (have both fore- and background sequences) 12 of 487…
✔ 7 of 9 GC-bins used (have both fore- and background sequences) 12 of 487…
ℹ Correcting for GC differences to the background sequences...
✔ Correcting for GC differences to the background sequences... [40ms]
ℹ Create motif hit matrix...
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ starting iterative adjustment for k-mer composition (up to 160 iteration…
✔ starting iterative adjustment for k-mer composition (up to 160 iteration…
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 40 of 160 iterations done
✔ 80 of 160 iterations done [3.3s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 80 of 160 iterations done
✔ 120 of 160 iterations done [2.6s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 120 of 160 iterations done
✔ 160 of 160 iterations done [2.7s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 160 of 160 iterations done
ℹ iterations finished
ℹ 160 of 160 iterations done
✔ 160 of 160 iterations done [13ms]
ℹ Correcting for k-mer differences between fore- and background sequences.…
✔ Correcting for k-mer differences between fore- and background sequences.…
ℹ Create motif hit matrix...
ℹ Calculating motif enrichment...
ℹ using Fisher's exact test (one-sided) to calculate log(p-values) for enr…
✔ using Fisher's exact test (one-sided) to calculate log(p-values) for enr…
ℹ Calculating motif enrichment...
✔ Calculating motif enrichment... [1s]
ℹ Create motif hit matrix...
ℹ starting analysis of bin (1.93,2.48]
✔ starting analysis of bin (1.93,2.48] [4ms]
ℹ Create motif hit matrix...
ℹ Defining background sequence set (otherBins)...
✔ Defining background sequence set (otherBins)... [67ms]
ℹ Create motif hit matrix...
ℹ Correcting for GC differences to the background sequences...
ℹ 7 of 9 GC-bins used (have both fore- and background sequences) 12 of 487…
✔ 7 of 9 GC-bins used (have both fore- and background sequences) 12 of 487…
ℹ Correcting for GC differences to the background sequences...
✔ Correcting for GC differences to the background sequences... [40ms]
ℹ Create motif hit matrix...
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ starting iterative adjustment for k-mer composition (up to 160 iteration…
✔ starting iterative adjustment for k-mer composition (up to 160 iteration…
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 40 of 160 iterations done
✔ 80 of 160 iterations done [2.6s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 80 of 160 iterations done
✔ 120 of 160 iterations done [2.7s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 120 of 160 iterations done
✔ 160 of 160 iterations done [2.8s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 160 of 160 iterations done
ℹ iterations finished
ℹ 160 of 160 iterations done
✔ 160 of 160 iterations done [13ms]
ℹ Correcting for k-mer differences between fore- and background sequences.…
✔ Correcting for k-mer differences between fore- and background sequences.…
ℹ Create motif hit matrix...
ℹ Calculating motif enrichment...
ℹ using Fisher's exact test (one-sided) to calculate log(p-values) for enr…
✔ using Fisher's exact test (one-sided) to calculate log(p-values) for enr…
ℹ Calculating motif enrichment...
✔ Calculating motif enrichment... [1.1s]
ℹ Create motif hit matrix...
ℹ starting analysis of bin (2.48,8.36]
✔ starting analysis of bin (2.48,8.36] [4ms]
ℹ Create motif hit matrix...
ℹ Defining background sequence set (otherBins)...
✔ Defining background sequence set (otherBins)... [66ms]
ℹ Create motif hit matrix...
ℹ Correcting for GC differences to the background sequences...
ℹ 8 of 9 GC-bins used (have both fore- and background sequences) 0 of 4874…
✔ 8 of 9 GC-bins used (have both fore- and background sequences) 0 of 4874…
ℹ Correcting for GC differences to the background sequences...
✔ Correcting for GC differences to the background sequences... [39ms]
ℹ Create motif hit matrix...
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ starting iterative adjustment for k-mer composition (up to 160 iteration…
✔ starting iterative adjustment for k-mer composition (up to 160 iteration…
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 40 of 160 iterations done
✔ 80 of 160 iterations done [2.7s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 80 of 160 iterations done
✔ 120 of 160 iterations done [2.9s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 120 of 160 iterations done
✔ 160 of 160 iterations done [2.9s]
ℹ Correcting for k-mer differences between fore- and background sequences.…
ℹ 160 of 160 iterations done
ℹ iterations finished
ℹ 160 of 160 iterations done
✔ 160 of 160 iterations done [16ms]
ℹ Correcting for k-mer differences between fore- and background sequences.…
✔ Correcting for k-mer differences between fore- and background sequences.…
ℹ Create motif hit matrix...
ℹ Calculating motif enrichment...
ℹ using Fisher's exact test (one-sided) to calculate log(p-values) for enr…
✔ using Fisher's exact test (one-sided) to calculate log(p-values) for enr…
ℹ Calculating motif enrichment...
✔ Calculating motif enrichment... [985ms]
ℹ Create motif hit matrix...
✔ Create motif hit matrix... [2m 20.5s]
# saveRDS(MotifEnrichment, file = "data/Motifs/MotEnrichment.RDS")
We will filter those transcription factors, with \(-{\rm log} (p.adj) > 10\) in at least one bin.
#MotifEnrichment <- readRDS("data/Motifs/MotEnrichment.RDS")
<- apply(assay(MotifEnrichment, "negLog10Padj"), 1,
sel function(x) max(abs(x), 0, na.rm = TRUE)) > 10.0
sum(sel, na.rm = TRUE) #78
[1] 78
<- MotifEnrichment[sel, ] MotifEnrichmentSel
We can visually represent the results in a heatmap using plotMotifHeatmaps
function. Rows can be clustered according to motif similarity.
<- monaLisa::motifSimilarity(rowData(MotifEnrichmentSel)$motif.pfm)
SimMatSel range(SimMatSel)
[1] 0.04543382 1.00000000
<- hclust(as.dist(1 - SimMatSel), method = "average")
hcl ::plotMotifHeatmaps(x = MotifEnrichmentSel, which.plots = c("log2enr", "negLog10Padj"),
monaLisawidth = 2.1, cluster = hcl, maxEnr = 2, maxSig = 15,
show_dendrogram = TRUE, show_seqlogo = TRUE, show_motif_GC = TRUE,
width.seqlogo = 1.5)
In the heatmap we can identify several TFs whose motifs correlate with the observed differences in accessibility and thus are more likely to drive the differentiation process towards CD4+ T cells. The prioritization of the TF can be done on multiple criteria. The strength of enrichment, but mainly from the correlation of the significance along the binning.
Another challenge becomes apparent in the motif enrichment analysis: many transcription factors share highly similar motifs, making it difficult to confidently identify the true regulatory factor. This problem is often referred to as co-linearity problem because the matrix of regions vs. transcription binding sites (site-count matrix) has many nearly identical columns. At this stage it is difficult to clearly assess the right candidate. ATACseq data are often accompanied with RNAseq and one can prioritize the TFs based on their expression levels or additional functional knowledge.
We should be also careful about the motifs with high GC content. Despite the built-in correction, their significance in the analysis might be influenced by this lower-order nucleotide content.
One can also run an “unbiased” k-mer enrichment analysis
calcBinnedKmerEnr
which searches for enriched k-mers of a lengthkmerLen
in the bins.
# suppressWarnings({kmerEnrichment <- calcBinnedKmerEnr(seqs = peaks.gr ,
# bins = bins,
# kmerLen = 6,
# includeRevComp = TRUE,
# BPPARAM = BiocParallel::SerialParam(RNGseed = 42),
# verbose = FALSE)
# })
#saveRDS(kmerEnrichment, file = "data/Motifs/kmerEnrichment.RDS")
<- readRDS("data/Motifs/kmerEnrichment.RDS")
kmerEnrichment <- apply(assay(kmerEnrichment, "negLog10Padj"), 1,
selkm function(x) max(abs(x), 0, na.rm = TRUE)) > 10
sum(selkm)
[1] 52
#> [1] 52
<- kmerEnrichment[selkm, ] sekmSel
Here, we can plot a heatmap that links the identified overrepresented k-mers in the motifs from the previous analysis. It shows that scoring of several transcription factors can be linked with their similarity to the same enriched k-mer. On the other hand, we can also identify k-mers that are not associated with the annotated transcription factors.
<- rowData(MotifEnrichmentSel)$motif.pfm
pfmSel <- motifKmerSimilarity(x = pfmSel,
sims kmers = rownames(sekmSel),
includeRevComp = TRUE)
dim(sims)
[1] 78 52
<- max(sapply(TFBSTools::Matrix(pfmSel), ncol))
maxwidth <- lapply(pfmSel, seqLogoGrob, xmax = maxwidth)
seqlogoGrobs <- rowAnnotation(logo = annoSeqlogo(seqlogoGrobs, which = "row"),
hmSeqlogo annotation_width = unit(1.5, "inch"),
show_annotation_name = FALSE
)
Heatmap(sims,
show_row_names = TRUE, row_names_gp = gpar(fontsize = 8),
show_column_names = TRUE, column_names_gp = gpar(fontsize = 8),
name = "Similarity", column_title = "Selected TFs and enriched k-mers",
col = colorRamp2(c(0, 1), c("white", "red")),
right_annotation = hmSeqlogo)
Further references
Session info
sessionInfo()
R version 4.5.0 Patched (2025-05-01 r88190)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.5
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/Prague
tzcode source: internal
attached base packages:
[1] stats4 grid stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] circlize_0.4.16 ComplexHeatmap_2.24.0
[3] SummarizedExperiment_1.38.1 Biobase_2.68.0
[5] MatrixGenerics_1.20.0 matrixStats_1.5.0
[7] monaLisa_1.14.0 JASPAR2022_0.99.8
[9] BiocFileCache_2.16.0 dbplyr_2.5.0
[11] BSgenome.Mmusculus.UCSC.mm10_1.4.3 BSgenome_1.76.0
[13] rtracklayer_1.68.0 BiocIO_1.18.0
[15] motifmatchr_1.30.0 TFBSTools_1.46.0
[17] MotifDb_1.50.0 Biostrings_2.76.0
[19] XVector_0.48.0 GenomicRanges_1.60.0
[21] GenomeInfoDb_1.44.0 IRanges_2.42.0
[23] S4Vectors_0.46.0 BiocGenerics_0.54.0
[25] generics_0.1.4 universalmotif_1.26.2
[27] ggdendro_0.2.0 ggplot2_3.5.2
[29] BiocParallel_1.42.0 BiocPkgTools_1.26.0
[31] htmlwidgets_1.6.4 BiocStyle_2.36.0
[33] magick_2.8.6 png_0.1-8
[35] knitr_1.50
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 shape_1.4.6.1
[3] rstudioapi_0.17.1 jsonlite_2.0.0
[5] magrittr_2.0.3 farver_2.1.2
[7] rmarkdown_2.29 GlobalOptions_0.1.2
[9] vctrs_0.6.5 memoise_2.0.1
[11] Rsamtools_2.24.0 RCurl_1.98-1.17
[13] gh_1.4.1 htmltools_0.5.8.1
[15] S4Arrays_1.8.0 curl_6.2.2
[17] SparseArray_1.8.0 lubridate_1.9.4
[19] cachem_1.1.0 GenomicAlignments_1.44.0
[21] whisker_0.4.1 igraph_2.1.4
[23] lifecycle_1.0.4 iterators_1.0.14
[25] pkgconfig_2.0.3 Matrix_1.7-3
[27] R6_2.6.1 fastmap_1.2.0
[29] clue_0.3-66 GenomeInfoDbData_1.2.14
[31] digest_0.6.37 colorspace_2.1-1
[33] TFMPvalue_0.0.9 stabs_0.6-4
[35] RSQLite_2.3.11 labeling_0.4.3
[37] seqLogo_1.74.0 filelock_1.0.3
[39] timechange_0.3.0 httr_1.4.7
[41] abind_1.4-8 compiler_4.5.0
[43] bit64_4.6.0-1 withr_3.0.2
[45] doParallel_1.0.17 biocViews_1.76.0
[47] DBI_1.2.3 fauxpas_0.5.2
[49] MASS_7.3-65 DelayedArray_0.34.1
[51] rjson_0.2.23 gtools_3.9.5
[53] caTools_1.18.3 tools_4.5.0
[55] splitstackshape_1.4.8 glue_1.8.0
[57] rorcid_0.7.0 restfulr_0.0.15
[59] cluster_2.1.8.1 gtable_0.3.6
[61] tzdb_0.5.0 tidyr_1.3.1
[63] data.table_1.17.2 hms_1.1.3
[65] xml2_1.3.8 foreach_1.5.2
[67] pillar_1.10.2 stringr_1.5.1
[69] splines_4.5.0 dplyr_1.1.4
[71] lattice_0.22-7 survival_3.8-3
[73] bit_4.6.0 tidyselect_1.2.1
[75] DirichletMultinomial_1.50.0 RBGL_1.84.0
[77] crul_1.5.0 xfun_0.52
[79] DT_0.33 stringi_1.8.7
[81] UCSC.utils_1.4.0 yaml_2.3.10
[83] evaluate_1.0.3 codetools_0.2-20
[85] httpcode_0.3.0 tibble_3.2.1
[87] BiocManager_1.30.25 graph_1.86.0
[89] cli_3.6.5 Rcpp_1.0.14
[91] XML_3.99-0.18 RUnit_0.4.33
[93] parallel_4.5.0 readr_2.1.5
[95] blob_1.2.4 bitops_1.0-9
[97] glmnet_4.1-8 pwalign_1.4.0
[99] scales_1.4.0 purrr_1.0.4
[101] crayon_1.5.3 GetoptLong_1.0.5
[103] rlang_1.1.6 rvest_1.0.4