Introduction to RNA-seq, quantification and import into R

Author

Panagiotis Papasaikas

Published

March 13, 2024

Introduction to RNA sequencing


RNA sequencing (RNA-Seq) is revolutionizing the study of the transcriptome

Evolution of transcriptomics technologies:

  • Northern blot –> single genes
  • RT-PCR –> several genes
  • Microarrays –> Targeted genomes
  • RNAseq –> whole genomes

RNA-Seq takes over other transcriptomics technologies

(Adapted from Lachmann et al: https://doi.org/10.1038/s41467-018-03751-6 )

Advantages of RNAseq over microarrays:

  • Highly sensitive and accurate
  • Detects both known and novel features
  • Can be applied to any species

Caveats of RNAseq:

  • Results are highly sensitive to protocol and analysis parameters
  • Difficulty to analyse repetitive regions

RNAseq: high throughput sequencing of cDNA using NGS technologies

What is RNAseq?

High-throughput sequencing of cDNA to assess the presence/quantity of RNA in a biological sample at a given moment in time

How does RNAseq work?

–> Sequencing of every RNA molecule –> Counting the number of RNA molecules, or reads, per transcript / gene

Output of an RNAseq experiment:

The summarized RNAseq data is a count table. Typically features (usually genes or transcript isoforms) are arranged in the rows and assayed samples are arranged in the columns:


Applications of RNAseq

RNAseq comes as a replacement to microarrays…

… to study changes occurring in disease states, in response to therapeutics, under different environmental conditions, etc

What can be obtained/analyzed from RNAseq?

  • Gene expression (different cell fractions, RNA populations can be targeted or selected for)
  • Transcript isoforms (alternative splicing, alternative pA…)
  • Gene fusions
  • Single nucleotide variants
  • Allele-specific gene expression

RNAseq analysis


Outline of this course

Week 1:

  1. Basic concepts in RNAseq. From raw data to count data (alignment and gene quantification)

Week 2:

  1. Summarized Experiment,normalization, exploratory analyses and visualization

Week 3:

  1. Differential gene expression, design matrices.

The different RNA-seq protocols

Prior to analysis, a good knowledge of which RNA-seq protocol was used is needed. The analysis parameters depend on it.

Paired-end vs. single-end RNAseq protocols

  • Paired-end libraries contain sequence information of sequence linkage over longer distances
  • Paired-end sequencing is more expensive in terms of price/sequenced fragment (same coverage but half the fragments for the same price)
  • Paired-end gives more resolution, in particular in repeat regions and over transcript isoforms

Stranded vs. unstranded RNAseq protocols

  • Stranded protocols are useful with regards for example to overlapping genes

Poly-A selected vs. ribo-depleted RNAseq protocols

A large fraction on the endogenous RNA originates from ribosomal RNA sequences. Typical approaches to enrich for non rRNA sequences include:

  • Poly-A libraries for analysis of mRNA only. RNA products that are not polyadenylated (e.g. histone genes) will be absent/underrepresented.
  • ribo-depleted libaries give access to the coding and non-coding transcriptome + non-polyA transcripts. Higher fraction of intronic reads.


RNA sequencing workflow


Inside an illumina flowcell

Every chip (flowcell) contains multiple lanes and every lane contains two columns of 50-60 tiles. Each tile contains several million “clusters” that contain primers:

In every cycle sequences with addapters are bound to primers, bridged to nearby primers and extended. After dissociation the cycle is repeated resulting in clusters of identical sequences. Labelled nucleotides are then added and scanned one at a time during incorporation giving the sequencing information.

During a labelled nucleotide addition cycle a tile area would look smth like this:


The FASTQ format

Sequencing information is recorded in a FASTQ file. This file contains information on the coordinates of the cluster in the flowcell, the read sequence and a string encoding the quality of the read nucleotides:

Quality scores explained: https://en.wikipedia.org/wiki/FASTQ_format#Quality


Quality control

Every sample in a sequecing run comes with a QC report (typically from the FASTQC tool). A FASTQC report is a summary of several indicators that allow us to pinpoint quality issues (e.g sequence duplication levels, contamination):

A couple of examples:


Exercise 1:

For the following examples of fastqc reports identify the issues in the FASTQC quality report:


Recap - transcriptomics

RNA-seq quantification

Input data

  • Quantification starts from a set of reads and a set of reference sequences

  • The reference sequences are typically either chromosomes (=genomic sequences) or transcript isoforms (=transcriptomic sequences)
  • Both types of reference files can be downloaded from (e.g.):
    • GENCODE (only human and mouse): https://www.gencodegenes.org/
    • Ensembl (lots of species): http://www.ensembl.org/info/data/ftp/index.html
    • Always stick to the same source and version of reference files during a project!

Approach I: Genome alignment and overlap counting

Alignment

  • The genome sequence (FASTA file) is obtained from one of the reference databases

  • In addition, a GTF file with the genomic locations of exons is obtained (from the same source as the genome)

  • Reads are aligned to the genome (after indexing the latter for increased computational efficiency)
  • For RNA-seq, the aligner must be able to accommodate spliced reads
    • STAR - command line application
    • HISAT2 - command line application
    • QuasR - R/Bioconductor package, incorporates HISAT2 for alignment of RNA-seq reads

  • Example STAR command, genome indexing
STAR --runThreadN 24 \
     --runMode genomeGenerate \
     --genomeDir my_genome \
     --genomeFastaFiles my_genome.fa \
     --sjdbGTFfile my_genes.gtf \
     --sjdbOverhang 99
  • Example STAR command, read alignment
STAR --runThreadN 24 \
     --runMode alignReads \
     --genomeDir my_genome \
     --readFilesIn S1_read1.fq.gz S1_read2.fq.gz \
     --readFilesCommand zcat \
     --outFileNamePrefix output/S1/ \
     --outSAMtype BAM SortedByCoordinate
  • The output file from the alignment step is typically a SAM file (or a BAM file, which is just a binary version of the SAM file)
  • The SAM/BAM file contains a header, which provides information about the reference sequences and the call to the aligner, and the main body, with one line per alignment

  • Detailed information about the SAM file format is available from https://samtools.github.io/hts-specs/SAMv1.pdf
  • Interpretation of SAM flags: https://broadinstitute.github.io/picard/explain-flags.html
  • Illustration of the CIGAR string: https://genome.sph.umich.edu/wiki/SAM#What_is_a_CIGAR.3F

Read counting

  • The abundance of a gene is obtained by counting the number of reads that overlap the exonic regions of the gene
  • Reads that overlap multiple genes (or map to multiple genomic locations) are typically discarded
  • Several R packages are available for doing the counting:
  • They can also be generated by STAR as part of the alignment process

  • We can also use this approach to quantify other features, such as individual exons (here a read is typically counted for all the exons that it overlaps)
  • This is useful e.g. if we are interested in finding specific exons that are only expressed in certain biological conditions

A problem with overlap counting

  • Imagine a situation where we have two samples, each expressing a gene.
  • In sample 1, isoform T1 (of length L) is expressed (at a level 2E).
  • In sample 2, isoform T2 (of length 2L) is expressed (at a level E).

  • When the transcripts are fragmented for sequencing, each molecule of isoform T2 contributes twice as many reads as a molecule of isoform T1.

  • Thus, when we count the number of reads overlapping the gene, both samples get the same number, even though the gene was twice as highly expressed in sample 1.

  • There are ways to address this issue, but we need additional information.

Approach II: Transcriptome mapping and abundance estimation

Mapping

  • The transcriptome sequences (FASTA file) is obtained from one of the reference databases

  • In addition, a text file mapping transcript IDs to the corresponding gene IDs should be prepared
  • Reads are mapped to the transcripts (after indexing the latter for increased computational efficiency)
  • For each read, we record the set of transcripts that it could have been generated from
  • Each combination of transcripts form an equivalence class, and we construct a table of counts for these equivalence classes

Quantification

  • Next, we estimate the individual transcript abundances that are most likely, given the observed equivalence class counts
  • Note that we are not assigning a particular read to a specific transcript, but rather estimating underlying abundances!
  • These “expected” abundances do not have to be integers
  • Multimapping reads are naturally accounted for via the equivalence classes
  • Several tools are available for performing this type of “alignment-free” quantification

  • Example command, Salmon, transcriptome indexing
salmon index -i my_transcripts.idx \
     -t my_transcripts.fasta
  • Example command, Salmon, mapping/quantification
salmon quant -i my_transcripts.idx -l A \
     -1 sample1_1.fastq -2 sample1_2.fastq \
     -p 10 -o results/sample1 --validateMappings \
     --numBootstraps 30 --seqBias --gcBias
  • The estimated transcript-level abundances can be added up on the gene level

  • It’s important to note that if we just add up counts from different isoforms, the resulting gene count has the same problem as the gene count from the overlap counting (\(C\) counts from a shorter isoform does not correspond to the same abundance as \(C\) counts from a longer isoform)! However, with the transcript abundance estimation approach we have additional information that can help us rectify it (see this paper for more information).
  • The basic output from the transcript abundance estimation tools is a text file with estimated abundances. This one is from Salmon:

  • Name is the transcript identifier
  • Length is the annotated transcript length
  • EffectiveLength is the length after accounting for the fact that a read is not equally likely to start anywhere in the transcript
  • TPM is the estimated abundance in transcript-per-million
  • NumReads is the estimated abundance in read counts

TPM vs read counts - what’s the difference?

  • The read count depends on the length of the feature, in addition to its abundance.
  • The TPM (similar to the nowadays less used RPKM/FPKM) aims to provide an abundance unit that is proportional to the number of molecules of a given target gene/transcript in a sample.
  • Most of the widely used framework for differential gene expression directly model the counts, since these contain information related to the certainty of the abundance estimate, which is lost in the TPMs.

  • Another commonly used abundance unit is the CPM (counts per million), which is obtained by normalizing the counts for differences in library size (but not by feature length).

How do the two quantification approaches differ?

  • For most genes, counts are similar with the two approaches.
  • Alignment-free methods are faster than methods based on genome alignment, since the former don’t perform a base-by-base matching.
  • Salmon provides more highly resolved information than (e.g.) featureCounts (transcript-level rather than gene-level).
  • Alignment-free methods typically use a larger fraction of the reads, since they don’t discard multimapping reads.
  • Alignment-free methods don’t provide a precise alignment to the genome (which may be useful, e.g., for visualization in genome browsers).

Reading quantifications into R

  • Here, we will focus on the processing of Salmon output.
  • Recall that the main output of Salmon is a text file with transcript abundances.

  • To read this into R, we can use the tximeta function from the tximeta package.
library(tximeta)
?tximeta

  • The first argument required by the tximeta function is a coldata data.frame, which tells the function which samples we want to import, and where the corresponding quantification files are.
  • As an example, we will use data from the macrophage package. This package contains quant.sf files from Salmon, as well as a coldata file.
  • These files are stored in the macrophage package folder in your R package library. We can find this location and check its content like so:
library(macrophage)
dir <- system.file("extdata", package = "macrophage")
dir
[1] "/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/macrophage/extdata"
list.files(dir)
[1] "coldata.csv"                   "errs"                         
[3] "gencode.v29_salmon_0.12.0"     "gencode.v29.annotation.gtf.gz"
[5] "PRJEB18997.txt"                "quants"                       
[7] "supp_table_1.csv"              "supp_table_7.csv"             
  • We start by reading the coldata file. If you analyze your own data, you can read this information from a text file, or create the data.frame directly in R.
coldata <- read.csv(file.path(dir, "coldata.csv"))[, c(1, 2, 3, 5)]
dim(coldata)
[1] 24  4
head(coldata)
           names sample_id line_id condition_name
1 SAMEA103885102    diku_A  diku_1          naive
2 SAMEA103885347    diku_B  diku_1           IFNg
3 SAMEA103885043    diku_C  diku_1         SL1344
4 SAMEA103885392    diku_D  diku_1    IFNg_SL1344
5 SAMEA103885182    eiwy_A  eiwy_1          naive
6 SAMEA103885136    eiwy_B  eiwy_1           IFNg
  • In addition to the names column, tximeta requires that coldata has a column named files, pointing to the Salmon output (the quant.sf file) for the respective samples.
coldata$files <- file.path(dir, "quants", coldata$names, "quant.sf.gz")
head(coldata)
           names sample_id line_id condition_name
1 SAMEA103885102    diku_A  diku_1          naive
2 SAMEA103885347    diku_B  diku_1           IFNg
3 SAMEA103885043    diku_C  diku_1         SL1344
4 SAMEA103885392    diku_D  diku_1    IFNg_SL1344
5 SAMEA103885182    eiwy_A  eiwy_1          naive
6 SAMEA103885136    eiwy_B  eiwy_1           IFNg
                                                                                                                      files
1 /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/macrophage/extdata/quants/SAMEA103885102/quant.sf.gz
2 /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/macrophage/extdata/quants/SAMEA103885347/quant.sf.gz
3 /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/macrophage/extdata/quants/SAMEA103885043/quant.sf.gz
4 /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/macrophage/extdata/quants/SAMEA103885392/quant.sf.gz
5 /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/macrophage/extdata/quants/SAMEA103885182/quant.sf.gz
6 /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/macrophage/extdata/quants/SAMEA103885136/quant.sf.gz
all(file.exists(coldata$files))
[1] TRUE
  • Now we have everything we need, and can import the quantifications with tximeta.
  • In this process, we will see that tximeta automatically identifies the source and version of the transcriptome reference that was used for the quantification, and adds some metadata.
  • The imported data will be stored in a SummarizedExperiment container, which you heard about in a previous lecture.
suppressPackageStartupMessages({
  library(SummarizedExperiment)
})
se <- tximeta(coldata = coldata, type = "salmon", dropInfReps = TRUE)
importing quantifications
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
found matching transcriptome:
[ GENCODE - Homo sapiens - release 29 ]
loading existing TxDb created: 2023-03-22 12:02:57
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
loading existing transcript ranges created: 2023-03-22 12:02:57
se
class: RangedSummarizedExperiment 
dim: 205870 24 
metadata(6): tximetaInfo quantInfo ... txomeInfo txdbInfo
assays(3): counts abundance length
rownames(205870): ENST00000456328.2 ENST00000450305.2 ...
  ENST00000387460.2 ENST00000387461.2
rowData names(3): tx_id gene_id tx_name
colnames(24): SAMEA103885102 SAMEA103885347 ... SAMEA103885308
  SAMEA103884949
colData names(4): names sample_id line_id condition_name
  • We see that tximeta has identified the transcriptome used for the quantification as GENCODE - Homo sapiens - release 29. How did this happen?
  • In fact, the output directory from Salmon contains much more information than just the quant.sf file! (Side note: this means that it is not advisable to move files out of the folder, or to share only the quant.sf file, since the context is lost):
list.files(file.path(dir, "quants", coldata$names[1]), recursive = TRUE)
 [1] "aux_info/ambig_info.tsv.gz"       "aux_info/bootstrap/bootstraps.gz"
 [3] "aux_info/bootstrap/names.tsv.gz"  "aux_info/exp_gc.gz"              
 [5] "aux_info/expected_bias.gz"        "aux_info/fld.gz"                 
 [7] "aux_info/meta_info.json"          "aux_info/obs_gc.gz"              
 [9] "aux_info/observed_bias_3p.gz"     "aux_info/observed_bias.gz"       
[11] "cmd_info.json"                    "lib_format_counts.json"          
[13] "libParams/flenDist.txt"           "logs/salmon_quant.log.txt"       
[15] "quant.sf.gz"                     
  • In particular, the meta_info.json file contains a hash checksum, which is derived from the set of transcripts used as reference during the quantification and which lets tximeta identify the reference source (by comparing to a table of these hash checksums for commonly used references).
rjson::fromJSON(file = file.path(dir, "quants", coldata$names[1], "aux_info", "meta_info.json"))
$salmon_version
[1] "0.12.0"

$samp_type
[1] "gibbs"

$opt_type
[1] "vb"

$quant_errors
list()

$num_libraries
[1] 1

$library_types
[1] "ISR"

$frag_dist_length
[1] 1001

$seq_bias_correct
[1] FALSE

$gc_bias_correct
[1] TRUE

$num_bias_bins
[1] 4096

$mapping_type
[1] "mapping"

$num_targets
[1] 205870

$serialized_eq_classes
[1] FALSE

$eq_class_properties
list()

$length_classes
[1]    520    669   1065   2328 205012

$index_seq_hash
[1] "40849ed828ea7d6a94af54a5c40e5d87eb0ce0fc1e9513208a5cffe59d442292"

$index_name_hash
[1] "77aca5545a0626421efb4730dd7b95482c77da261f9bdef70d36e25ee68bb7ef"

$index_seq_hash512
[1] "f37ae2a7412efd8518d68c22fc3fcc2478b59833809382abd5d9055475505e516730c70914343af34c9232af92fe22832e70afde6b38c26381097068e1e82551"

$index_name_hash512
[1] "67f286e5ec2895c10aa6e3a8feed04cb3a80747de7b3afb620298078481b303d2f979407fc104d4f3e8af0e82be2acb2f294ee5f2f68c00087f2855120d9f4ed"

$num_bootstraps
[1] 20

$num_processed
[1] 45141218

$num_mapped
[1] 40075160

$percent_mapped
[1] 88.77731

$call
[1] "quant"

$start_time
[1] "Tue Jan 15 21:21:22 2019"

$end_time
[1] "Tue Jan 15 21:29:00 2019"

Appendix: Accessing and downloading RNA-sequencing data from GEO

Most journals require that all datasets (including RNAseq) used in a publication are deposited in public genomic data repositories. The NCBI repository is the Sequence Read Archive or SRA. Metadata on the experiments are deposited in GEO (Gene Expression Omnibus).

In order to query, download and obtain fastq files of published experiments on then typicall can:

  • Download .sra files from GEO (raw files)
  • Convert .sra files into .fastq files (input for alignment) using fastq-dump

The GEO identifier of the RNAseq dataset can be found in the original article.

After a quick search for ‘gse’ in the whole article, we can find that data has been deposited in GEO under GSE33252


Fetching useful information prior to analysis

Have a careful look at:

  • Material and Method section of the article
  • GEO database (use GEO id)
  • SRA database (use SRR id)

and get information about:

  • protocol
  • analysis

  • Single-end reads
  • 36b-long reads
  • First four samples: PolyA, unstranded
  • Last two samples: Total RNA, ribo-depleted, stranded
  • Genome assembly used: mm9

Downloading raw data: SRA

Raw data are stored in the GEO/SRA databases under the .sra form. The .sra files need to be downloaded and converted into .fastq files, which can then be aligned.

.sra files can be downloaded from the GEO database


Converting the .sra files to .fastq files

.sra to .fastq conversion is done using the fastq-dump tool from the SRA toolkit

To run fastq-dump from Rstudio, use a .sh script (in Rstudio: new text document saved as .sh)

# ... specify the working directory containing the .sra file
cd <directoryname>

# ... convert .sra to .fastq (single read)
fastq-dump <filename.sra> --gzip --stdout > <filename.fq.gz>

# ... convert .sra to .fastq (paired-end)
fastq-dump <./filename.sra> --gzip --split-files

Downloading data from within R using the SRAdb package

The SRAdb package provides access to the Sequence Read Archive database from within R. It can be used to access, query and download SRA datasets.

This is accomplished by parsing all the NCBI SRA metadata into a SQLite database that can be stored and queried locally.

library(SRAdb)
sqlfile <- file.path(system.file('extdata', package='SRAdb'),'SRAmetadb_demo.sqlite' ) #Note this is a downsized SQLdb only for illustration purposes
# Create a connection to the SQLdatabase:
sra_con <- dbConnect(SQLite(),sqlfile)

#download sra data files from the ftp site:
getSRAfile( c("SRR000648"), sra_con, fileType ='sra', destDir = "data/05/")
#Convert to fastq:
system ("fastq-dump data/05/SRR000648.sra") #This requires the sra command line tools to be installed in the system

#Or directly download fastq files from EBI using the ftp protocol:
getFASTQinfo( c("SRR000648"), sra_con, srcType ='ftp')
getSRAfile( c("SRR000648"), sra_con, fileType ='fastq',destDir = "data/05/")

For more information on the versatile SRAdb functionality see:

https://www.bioconductor.org/packages/release/bioc/vignettes/SRAdb/inst/doc/SRAdb.pdf

Session info

sessionInfo()
R version 4.3.1 Patched (2023-06-20 r84599)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.3.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Zurich
tzcode source: internal

attached base packages:
[1] stats4    grid      stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] GenomicFeatures_1.52.1      AnnotationDbi_1.62.1       
 [3] SummarizedExperiment_1.30.2 Biobase_2.60.0             
 [5] GenomicRanges_1.52.0        GenomeInfoDb_1.36.1        
 [7] IRanges_2.34.0              S4Vectors_0.38.1           
 [9] BiocGenerics_0.46.0         MatrixGenerics_1.12.2      
[11] matrixStats_1.0.0           macrophage_1.16.0          
[13] tximeta_1.18.0              BiocStyle_2.28.0           
[15] png_0.1-8                   knitr_1.44                 

loaded via a namespace (and not attached):
 [1] DBI_1.1.3                     bitops_1.0-7                 
 [3] biomaRt_2.56.1                rlang_1.1.1                  
 [5] magrittr_2.0.3                compiler_4.3.1               
 [7] RSQLite_2.3.1                 vctrs_0.6.4                  
 [9] stringr_1.5.0                 ProtGenerics_1.32.0          
[11] pkgconfig_2.0.3               crayon_1.5.2                 
[13] fastmap_1.1.1                 dbplyr_2.3.4                 
[15] XVector_0.40.0                ellipsis_0.3.2               
[17] utf8_1.2.3                    Rsamtools_2.16.0             
[19] promises_1.2.1                rmarkdown_2.25               
[21] tzdb_0.4.0                    purrr_1.0.2                  
[23] bit_4.0.5                     xfun_0.40                    
[25] zlibbioc_1.46.0               cachem_1.0.8                 
[27] jsonlite_1.8.7                progress_1.2.2               
[29] blob_1.2.4                    later_1.3.1                  
[31] DelayedArray_0.26.3           BiocParallel_1.34.2          
[33] interactiveDisplayBase_1.38.0 parallel_4.3.1               
[35] prettyunits_1.2.0             R6_2.5.1                     
[37] stringi_1.7.12                rtracklayer_1.60.0           
[39] Rcpp_1.0.11                   readr_2.1.4                  
[41] httpuv_1.6.11                 Matrix_1.6-1.1               
[43] tidyselect_1.2.0              rstudioapi_0.15.0            
[45] yaml_2.3.7                    codetools_0.2-19             
[47] curl_5.1.0                    lattice_0.21-9               
[49] tibble_3.2.1                  withr_2.5.1                  
[51] shiny_1.7.5                   KEGGREST_1.40.0              
[53] evaluate_0.22                 BiocFileCache_2.8.0          
[55] xml2_1.3.5                    Biostrings_2.68.1            
[57] pillar_1.9.0                  BiocManager_1.30.22          
[59] filelock_1.0.2                generics_0.1.3               
[61] vroom_1.6.4                   RCurl_1.98-1.12              
[63] BiocVersion_3.17.1            ensembldb_2.24.0             
[65] hms_1.1.3                     xtable_1.8-4                 
[67] glue_1.6.2                    lazyeval_0.2.2               
[69] tools_4.3.1                   AnnotationHub_3.8.0          
[71] BiocIO_1.10.0                 GenomicAlignments_1.36.0     
[73] XML_3.99-0.14                 GenomeInfoDbData_1.2.10      
[75] restfulr_0.0.15               cli_3.6.1                    
[77] rappdirs_0.3.3                fansi_1.0.5                  
[79] S4Arrays_1.0.4                dplyr_1.1.3                  
[81] AnnotationFilter_1.24.0       digest_0.6.33                
[83] tximport_1.28.0               rjson_0.2.21                 
[85] htmlwidgets_1.6.2             memoise_2.0.1                
[87] htmltools_0.5.6.1             lifecycle_1.0.3              
[89] httr_1.4.7                    mime_0.12                    
[91] bit64_4.0.5