suppressPackageStartupMessages({
library(SingleCellExperiment)
library(scater)
library(ggplot2)
library(grid)
library(scran)
library(batchelor)
library(BiocSingular)
library(BiocNeighbors)
library(tidyr)
library(dplyr)
library(igraph)
})
# If any package is missing, install it using the following commands and then load it.
# if (!require("BiocManager", quietly = TRUE))
# install.packages("BiocManager")
#
# BiocManager::install("BiocSingular")
# library(BiocSingular)
Single-cell RNA sequencing 2
Introduction
In the previous lecture we covered all the primary steps of a typical single cell RNA-Seq analysis:
- Loading gene expression data in the form of a Single Cell Experiment object,
- Quality control,
- Cell and gene filtering,
- Dimensionality reduction,
- Visualization and exploratory analysis.
In this lecture we will cover downstream analysis tasks typical for single cell RNA-Seq datasets. Specifically we will go through the following analysis steps:
- Batch effect diagnosis and correction, i.e. data integration,
- Identification of distinct cell sub-populations, i.e. clustering,
- Differential gene expression analyses,
- Identification of marker genes.
For more details about these topics refer to the excellent Bioconductor resource “Orchestrating single-cell analysis with Bioconductor”, or the resources on individual topics listed at the end of this document.
Load packages
We first load the required R packages.
Load the pre-processed dataset
The data set that we will use is a composite dataset of three independent studies performed in three different labs. It consists of 9288 mammary epithelial cells sequenced using 10x Genomics technology. It has already been pre-filtered and contains only good quality cells that are unambiguously assigned to one of three major cell types: luminal progenitors, luminal mature and basal.
For simplicity and velocity, only genes that are present and variable in all three data sets were kept (over-dispersed genes, see the previous lecture).
Next, we load the pre-processed data set and look at its composition:
# Download the data and set row names to gene symbols whenever possible
<- readRDS(gzcon(url("https://ivanek.github.io/analysisOfGenomicsDataWithR/data/09/SCE_MammaryEpithelial_x3.rds?raw=true")))
sce rownames(sce) <- scater::uniquifyFeatureNames(
ID = rownames(sce),
names = as.character(rowData(sce)$gene.symbols)
)# Randomly select 3000 cells for the further analysis to speed up processing:
set.seed(42)
=3000
n<- sce[, sample(1:ncol(sce), n ) ]
sce sce
class: SingleCellExperiment
dim: 1222 3000
metadata(2): pumbed_ids marker_genes
assays(2): counts logcounts
rownames(1222): Malat1 Actb ... Clic6 Bend5
rowData names(2): gene.symbols gene.counts
colnames(3000): vis2558 spk2385 ... wal2232 wal1284
colData names(4): study cell.class library_size detected_genes
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
# Dataset composition per cell type and study:
table(colData(sce)$study , colData(sce)$cell.class)
luminal_progenitor basal luminal_mature
spk 392 145 352
vis 228 408 350
wal 235 241 649
Diagnostic plots
We now look at some high level characteristics of the three data sets:
# Check the distribution of library sizes and the distribution of the number of detected genes across data sets:
# Distribution of library sizes
ggplot(data.frame(colData(sce)),
aes(x = study, y = library_size, color=study)) +
geom_violin() + theme_bw() +
ylab("Total UMI counts per cell") +
ggtitle("Library size distribution per study")
# Distribution of the number of detected genes
ggplot(data.frame(colData(sce)),
aes(x = study, y = detected_genes, color=study)) +
geom_violin() + theme_bw() +
ylab("# of detected genes") +
ggtitle("Number of detected genes per study")
These plots show right away that the three datasets differ in these high level characteristics.
Let us first normalize the gene expression to the library size:
# We first normalize all cells for the library size.
# Calculate size factors for the normalization
<- scran::computeSumFactors(sce, min.mean = 0.1)
sce # Compute log-transformed normalized expression values using calculated size factors
<- logNormCounts(sce) sce
Then model the mean-variance relationship and get top highly variable genes:
# Model the mean-variance relationship
<- modelGeneVarByPoisson(sce)
dec.pois plot(dec.pois$mean, dec.pois$total, xlab = "Mean of log-expression",
ylab = "Variance of log-expression")
curve(metadata(dec.pois)$trend(x), col = "dodgerblue", add = TRUE)
# Get top highly variable genes
<- dec.pois[order(dec.pois$bio, decreasing=TRUE),]
dec.pois.ord <- getTopHVGs(dec.pois.ord) hvg
Next, perform principal component analysis for the normalized expression of highly variable genes:
# Perform principal component analysis
<- runPCA(sce, exprs_values = "logcounts",subset_row=hvg)
sce
# Plot coordinates of PC1 vs coordinates of PC2
plotReducedDim(sce, "PCA", colour_by = "study")
plotReducedDim(sce, "PCA", colour_by = "cell.class")
Have a look at the impact of the fact that these three datasets were generated in three different labs, so called batch effect.
Also look at t-SNE projections calculated for the data set:
# Apply t-sne method
set.seed(42)
<- runTSNE(sce, dimred = "PCA")
sce
<- plotReducedDim(sce, "TSNE", colour_by = "study")
p1 <- plotReducedDim(sce, "TSNE", colour_by = "cell.class")
p2
grid.arrange(p1, p2, ncol = 2)
What do you observe? What do t-SNE projections reflect more: the lab, where data were generated (alse called batch in this case), or cell types?
Batch effect correction
The batch effect refers to technical sources of variation introduced to the data during handling/preparation/processing, when they (i.e. data) are prepared in different batches. This variation distorts measurements resulting in the artefactual separation of cells of the same type across batches.
In case of single cell sequencing, batch effects issues are especially severe because:
- They are much more exaggerated due to the small starting sample and the larger number of / more complicated steps involved in single cell RNA sequencing.
- They are often confounded (i.e mixed-up) with biological variance, since batches are often not identical in composition.
- The distortion within the same batch may simultaneously influence multiple aspects such as, e.g., the expression of certain genes as well as the entire gene expression in a certain cell type(s).
Correcting a batch effect(s) has the following goals:
- To erase the batch-originating variance. Practically this means that similar cell types should be intermixed across batches after correction.
- To preserve meaningful heterogeneity in the data. This means that distinct cell types are not mixed in the corrected data.
- Not to introduce any artefactual variance. In practice this means that similar cells within batches are not getting separated.
Batch effect correction using linear regression
The batch effect is a general phenomenon in biology and may occur in various data sets (not only in scRNA-Seq data sets). There exist numerous ways to correct for it.
One of the standard approaches for correcting batch effects in the case of bulk RNA-Seq data sets is based on the linear regression. A linear model is fitted gene-wise and the batch effect is reflected by a separate parameter in the model specification:
\(y_{ijk} = \alpha_i + \beta_{ij} + \epsilon_{ijk}\),
where \(y_{ijk}\) is the expression of a gene \(i\) in the cell \(k\) coming from the batch \(j\) , \(\alpha_i\) is the basal expression of a gene \(i\) across all cells and batches, \(\beta_{ij}\) is the batch-specific shift of the expression level of a gene \(i\) in all cells from the batch \(j\) and \(\epsilon_{ijk}\) represents the variance within the batch.
Once we fit the model, the batch-specific effects can be corrected by subtracting fitted values of the parameter \(\beta\) from the gene expression in all cells from a batch.
This approach has the following disadvantages:
- It does not account for the difference in the population composition across batches.
- It assumes that the batch effect is additive.
Because of these assumptions this method is not extremely effective in removing the batch-originating variance. In addition, it is prone to erasing meaningful variation (over-correction), when biological and technical variance are confounded.
This approach is the basis of the removeBatchEffect()
function from the limma
package (Ritchie et al. 2015) as well the comBat()
function from the sva
package (Leek et al 2007). The rescaleBatches()
function from the batchelor
package is based on the similar technique with some adjustments implemented for processing single-cell RNA-Seq data sets. Specifically, for each gene, the mean gene expression in each batch is scaled down until it is equal to the lowest mean gene expression across all batches.
Here, we apply the function rescaleBatches()
to remove the batch effect in our data set:
<- batchelor::rescaleBatches(sce, batch=sce$study)
sce.linCor colData(sce.linCor) <- colData( sce )
<- runPCA(sce.linCor, exprs_values = "corrected",subset_row=hvg)
sce.linCor
set.seed(42)
<- runTSNE(sce.linCor, dimred = "PCA")
sce.linCor
<- plotReducedDim(sce.linCor, "TSNE", colour_by = "study")
p1 <- plotReducedDim(sce.linCor, "TSNE", colour_by = "cell.class")
p2
grid.arrange(p1, p2, ncol = 2)
Batch effect correction using Fast-MNN approach
Now we consider one more technique for correcting batch effects in scRNA-Seq data sets called matching mutual nearest neighbours (MNN) technique first proposed by (Haghverdi et al. 2018). This method attempts to identify similar cells across batches after projecting them in a space of reduced dimensions.
Figure was taken from (Haghverdi et al. 2018).
Thus, this type of correction does not apply a uniform shift for the gene expression within the batch. Instead it adjusts a correction of the gene expression for individual cells erasing batch-originating variance more efficiently and less prone to over-correction.
It is worth to notice that this method has the following assumptions:
- There is at least one cell population (or cell type) that is common across all batches.
- The batch effect variation is almost orthogonal to the biological variation (no high level of confounding) or at least its effects are smaller than the biological variation effect between different cell types.
The function that implements this method in R
is called fastMNN()
from the batchelor
package.
# Remove the batch effect in our data set using the function `fastMNN()`
<- 32 # Number of dimensions to use for dimensionality reduction
d <- batchelor::fastMNN( assays(sce)[["logcounts"]], batch=sce$study , d=d )
FMNN.out
# Notice that the object returned by the function fastMNN() is a single cell experiment object.
# Apply t-sne method to the corrected low-dimensional coordinates of cells
set.seed(42)
<- runTSNE(FMNN.out, dimred = "corrected")
FMNN.out
# Save calculated coordinates in the original single cell experiment object
reducedDim (sce, "PCA.FMNN" ) <- reducedDim(FMNN.out, "corrected")
reducedDim (sce, "TSNE.FMNN" ) <- reducedDim(FMNN.out, "TSNE")
<- plotReducedDim(sce, "TSNE.FMNN", colour_by = "study")
p1 <- plotReducedDim(sce, "TSNE.FMNN", colour_by = "cell.class")
p2
grid.arrange(p1, p2, ncol = 2)
# We can also check the proportion of variance lost from each batch
# and in each merge step of MNN. In the following matrix rows
# correspond to merge steps and columns to the three batches (according to the input order):
metadata(FMNN.out)$merge.info$lost.var
spk vis wal
[1,] 0.008754112 0.009460673 0.00000000
[2,] 0.007179037 0.009438419 0.01714101
Compare visually the results of removing the batch effect using linear regression and MNN. Which one performs better?
Clustering
Clustering aims to identify cell populations with similar expression profiles. These populations, i.e. clusters, are assumed to reflect distinct biological cell types (or subtypes) present in the dataset. It is important to keep in mind that defining the desired resolution of the clustering procedure (i.e the ideal number of clusters) is often subjective and context-dependent. In downstream analysis one can extract more biological insights for the identified cell groups by identifying “marker genes” that characterize them.
Graph based clustering
Graph-based approaches model cell populations as nodes on a network and try to identify well connected node communities (cell groupings). Graph (network) construction is based on drawing edges (connections) between nodes (cells) according to some predefined measure of node to node similarity. A wide range of community-detection algorithms is then available to partition the resulting graph in communities of cells that are better connected to each other then to the rest of the cells.
Graph-based approaches have become extremely popular in the context of single cell RNA sequencing analysis, because they tend to be robust to unwanted variance that produces continuous distortions to the signal (such as sampling variance). They make no assumptions on the shape or the distribution of cells within clusters and they are scalable.
We will now build a Shared Nearest Neighbour Graph (SNN graph) using scran. In such a graph two cells are connected only if at least one of their k
nearest neighbours is common.
We will then identify clusters using a community detection method from the package igraph.
# Create a graph of shared nearest neighbours:
<- scran::buildSNNGraph(sce, k=100, use.dimred = 'PCA.FMNN') # Build SNN graph
g <- igraph::cluster_louvain(g)$membership # use the louvain method for community detection
clust table(clust)
clust
1 2 3
793 856 1351
$clust <- factor(clust)
sce
<- plotReducedDim(sce, "TSNE.FMNN", colour_by = "study")
p1 <- plotReducedDim(sce, "TSNE.FMNN", colour_by = "cell.class")
p2 <- plotReducedDim(sce, "TSNE.FMNN", colour_by = "clust")
p3
grid.arrange(p1, p2, p3, ncol = 3)
Exercise 1
- Repeat the clustering and view the projections for different numbers (between 5 and 100) of shared nearest neighbours
k
used for graph construction. How does the value of the parameterk
affect the number of clusters?
# k=25
<- scran::buildSNNGraph(sce, k=25, use.dimred = "PCA.FMNN")
g <- igraph::cluster_louvain(g)$membership
clust_25 table(clust_25)
clust_25
1 2 3 4
791 858 998 353
$clust_25 <- factor(clust_25)
sceplotReducedDim(sce, "TSNE.FMNN", colour_by = "clust_25")
# k=50
<- scran::buildSNNGraph(sce, k=50, use.dimred = "PCA.FMNN")
g <- igraph::cluster_louvain(g)$membership
clust_50 table(clust_50)
clust_50
1 2 3
793 856 1351
$clust_50 <- factor(clust_50)
sceplotReducedDim(sce, "TSNE.FMNN", colour_by = "clust_50")
# k=75
<- scran::buildSNNGraph(sce, k=75, use.dimred = "PCA.FMNN")
g <- igraph::cluster_louvain(g)$membership
clust_75 table(clust_75)
clust_75
1 2 3
793 856 1351
$clust_75 <- factor(clust_75)
sceplotReducedDim(sce, "TSNE.FMNN", colour_by = "clust_75")
- Apply hierarchical clustering. For this apply a function called
dist
to corrected PCA coordinates to calculate distances between cells. Then use the functionhclust
to create a hierarchical tree of cells based on distances between them (the methodward.D2
is recommended). Cut the hierarchical tree with the functioncutree
to get 3 clusters. Visualize the result of clustering. Compare obtained clustering results with the results obtained with the other method.
<- dist(reducedDim(sce, "PCA.FMNN"))
calc.dist <- hclust(calc.dist, method="ward.D2")
hier.tree <- 3 # Set the number of clusters
clus.num <- cutree(hier.tree, k=clus.num)
clusters $clust_hier <- factor(clusters)
sceplotReducedDim(sce, "TSNE.FMNN", colour_by = "clust_hier")
Differential expression and identification of marker genes.
As mentioned above, after clustering one can further characterize the derived cell sub-populations by identifying genes that are differentially expressed in each cluster. If those genes are over-expressed in a cluster, they are also referred to as marker genes. Although it is a standard type of differential expression analysis in scRNA-Seq datasets, it is worth mentioning that the character of the data allows us to compare gene expression in cell sub-populations from different points of view. For example, one can study in more detail, apart from the mean gene expression, additional characteristics of the gene expression distribution across distinct subpopulations, such as its variance, skewness or multi-modality:
Note that these are parameters that are not accessible for study in typical bulk RNA-Seq data sets.
A related issue is how we come up with the definition of distinct subpopulations in our experiment. We saw above how we can leverage information coming from the data themselves to define subpopulations using clustering approaches. This comes at the expense of being somewhat circular in our analyses. We define clusters using the data and then go back to the same data to perform DE analyses!
However, cell groups undergoing DE analysis do not necessary need to be results of clustering. They could also come from the assignment of cell types using prior information (e.g in cases, where cells have been sorted before sequencing).
Although single-cell specific methods exist for differential gene expression analysis, in practice methods developed for bulk RNA-Seq or simple statistical tests such as t-test or its non-parametric counterpart Wilcoxon test perform as well and often better (see Soneson et al. 2018).
Multi-sample data sets, in which similar cell populations are derived from somewhat different contexts (e.g different individuals, treatments, gene perturbations), often require additional analysis steps. For example, cell type composition can be compared between different conditions (“differential abundance analysis”), or differential expression analysis can be performed in the same cell type across different conditions (“differential state analysis”).
In our example data set one could ask, what are the differences in the proportion of luminal mature cells in the three samples, or what are the differences in the gene expression in the luminal mature cell subtype across the three samples. These questions, although they can be very interesting, are beyond the scope of this course and we refer the students to the relevant section of the OSCA bioconductor resource: “Multi-sample comparisons”.
Here, we will use the scran package in order to identify marker genes and visualize their expression. The scoreMarkers
function expects as input normalized log-expression values.
We need to specify the groups of cells, which we want to compare to each other. We have observed in this data set a strong batch effect. It is therefore meaningful to include the factor describing it into scoreMarkers
call as block
argument.
<- scran::scoreMarkers(
markers_all groups = sce$cell.class,
sce, block = sce$study,
assay.type="logcounts"
)
Let’s now visualize some of the identified markers:
<- markers_all[[1]]
cell.type1.markers <- order(cell.type1.markers$mean.AUC,decreasing=TRUE)
ord <- cell.type1.markers[ord,]
cell.type1.markers.ord head(cell.type1.markers.ord)
DataFrame with 6 rows and 19 columns
self.average other.average self.detected other.detected
<numeric> <numeric> <numeric> <numeric>
Mfge8 3.55765 0.592274 0.981403 0.3948362
Plet1 4.76122 1.083235 0.977119 0.4232071
Trf 4.69603 0.561918 0.971340 0.2925815
Csn3 3.96991 0.522368 0.936573 0.0988377
Mgst1 2.82962 0.659289 0.957568 0.3355853
Lcn2 4.17483 1.532456 0.991254 0.5449747
mean.logFC.cohen min.logFC.cohen median.logFC.cohen
<numeric> <numeric> <numeric>
Mfge8 3.27795 2.91121 3.27795
Plet1 3.34906 3.09135 3.34906
Trf 3.55109 3.33510 3.55109
Csn3 2.72625 2.59127 2.72625
Mgst1 2.45540 2.13508 2.45540
Lcn2 2.66144 1.25184 2.66144
max.logFC.cohen rank.logFC.cohen mean.AUC min.AUC median.AUC
<numeric> <integer> <numeric> <numeric> <numeric>
Mfge8 3.64470 3 0.976055 0.968671 0.976055
Plet1 3.60677 2 0.962786 0.961119 0.962786
Trf 3.76707 2 0.954534 0.942453 0.954534
Csn3 2.86124 7 0.942844 0.936817 0.942844
Mgst1 2.77572 6 0.936582 0.923573 0.936582
Lcn2 4.07105 1 0.901258 0.811653 0.901258
max.AUC rank.AUC mean.logFC.detected min.logFC.detected
<numeric> <integer> <numeric> <numeric>
Mfge8 0.983440 2 1.24704 0.7303312
Plet1 0.964453 3 1.27024 0.6166450
Trf 0.966615 3 1.94048 0.8025863
Csn3 0.948871 6 3.60310 3.3762929
Mgst1 0.949591 5 1.33072 0.6843491
Lcn2 0.990863 1 1.30667 0.0973684
median.logFC.detected max.logFC.detected rank.logFC.detected
<numeric> <numeric> <integer>
Mfge8 1.24704 1.76375 129
Plet1 1.27024 1.92384 73
Trf 1.94048 3.07838 49
Csn3 3.60310 3.82991 23
Mgst1 1.33072 1.97709 110
Lcn2 1.30667 2.51597 77
# Violin plots of expression for a set of features on defined groups
::plotExpression(sce, features = c("Mfge8","Plet1"),
scaterx = "cell.class")
#Colour t-SNE according to the expression of identified marker genes:
<- plotReducedDim(sce, "TSNE.FMNN", colour_by = "Mfge8")
p1 <- plotReducedDim(sce, "TSNE.FMNN", colour_by = "Plet1")
p2
grid.arrange(p1, p2, ncol = 2)
#Heatmap for the top 5 marker genes per cell group:
<- rownames(cell.type1.markers.ord)[1:5]
top.feat1
<- markers_all[[2]]
cell.type2.markers <- order(cell.type2.markers$mean.AUC,decreasing=TRUE)
ord <- cell.type2.markers[ord,]
cell.type2.markers.ord <- rownames(cell.type2.markers.ord)[1:5]
top.feat2
<- markers_all[[3]]
cell.type3.markers <- order(cell.type3.markers$mean.AUC,decreasing=TRUE)
ord <- cell.type3.markers[ord,]
cell.type3.markers.ord <- rownames(cell.type3.markers.ord)[1:5]
top.feat3
::plotHeatmap(sce[,order(sce$cell.class)], features = c(top.feat1,top.feat2,top.feat3),
scatercluster_cols = FALSE)
Exercise 2
- We scored markers using the measure
mean.AUC
. What is it? Check “Marker detection section” of OSCA book. Use also other internet resources to understand this important statistical term.
This paragraph from OSCA book answers the question: In the context of marker detection, the area under the curve (AUC) quantifies our ability to distinguish between two distributions in a pairwise comparison. The AUC represents the probability that a randomly chosen observation from our cluster of interest is greater than a randomly chosen observation from the other cluster. A value of 1 corresponds to up-regulation, where all values of our cluster of interest are greater than any value from the other cluster; a value of 0.5 means that there is no net difference in the location of the distributions; and a value of 0 corresponds to down-regulation. The AUC is closely related to the U statistic in the Wilcoxon ranked sum test (a.k.a., Mann-Whitney U-test).
- Imagine if you would not know the type of cells from clusters. How could you figure it out? This is an example, how it would be possible to investigate the cell type of cluster 3.
- Option 1: Read about top marker genes one by one and search in the literature.
- Option 2: Try to annotate top genes. Study top 10 marker genes of cluster 3. Submit them to String-db. Click “Search”, then “Multiple proteins”. Insert top 10 marker genes in the search field. As the organism select “Mus musculus” and click “Search”. Then “Continue”. Go to the tab called “Analysis”.
String-db
will report a publication in 2008 that already mentioned some of these genes before. What is this publication about? Why is could be useful for you?
# To get top 10 marker genes from cluster 3
cat(paste0(rownames(cell.type3.markers.ord)[1:10], "\n"))
Prlr
Ptn
Gpx3
Wfdc2
Krt7
Tmed3
Krt19
Ly6d
Glul
Areg
This is the publication reported by String-db
: https://pubmed.ncbi.nlm.nih.gov/19063729. It would help to annotate cells from cluster 3 (luminal epithelial cells), if this information would not be known before (see Figure 3 of the paper).
And this is the link to String-DB annotation of top genes: https://version-11-5.string-db.org/cgi/network?networkId=bY48DnJlKn5r.
References
Lun ATL, Bach K and Marioni JC (2016). Pooling across cells to normalize single-cell RNA sequencing data with many zero counts. Genome Biol. 17:75
Haghverdi L, Lun ATL, Morgan MD, Marioni JC (2018). Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol. 36(5):421
Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research, 43(7), e47.
Leek JT, Johnson WE, Parker HS, Fertig EJ, Jaffe AE, Storey JD, Zhang Y, Torres LC (2019). sva: Surrogate Variable Analysis. R package version 3.32.1
Soneson C, Robinson MD (2018). Bias, robustness and scalability in single-cell differential expression analysis. Nat Methods, 15(4):255-261
Additional resources
- An online book with many details on performing single-cell analysis with Bioconductor packages:
- A very comprehensive page with links to a large number of available software, tutorials, apps and papers:
- Experimental design, batch effects, comparison of protocols and analysis workflows:
- http://www.cell.com/cell/abstract/S0092-8674(15)01353-7
- https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0927-y
- http://www.nature.com/nmeth/journal/v14/n4/abs/nmeth.4220.html
- http://www.nature.com/nbt/journal/v34/n11/full/nbt.3721.html
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5206706/
- Unique molecular identifiers, UMI collapse and error correction:
- http://www.nature.com/nmeth/journal/v11/n2/full/nmeth.2772.html
- https://github.com/CGATOxford/UMI-tools
- https://hemberg-lab.github.io/scRNA.seq.course/unique-molecular-identifiers-umis.html
- http://www.nature.com/nmeth/journal/v11/n6/full/nmeth.2930.html
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5206706/
- Data structures for SC data (and how to deal with storage/memory needs efficiently):
- Inference of hierarchical structure/lineage from SC data:
- http://www.nature.com/nbt/journal/v35/n6/full/nbt.3854.html
- https://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.4402.html
- http://www.nature.com/nrg/journal/v18/n4/full/nrg.2016.159.html
- https://www.nature.com/ni/journal/v17/n3/full/ni.3344.html
- http://www.biorxiv.org/content/early/2017/04/19/128843
- https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1175-6
- http://www.nature.com/nmeth/journal/v13/n10/full/nmeth.3971.html?WT.feed_name=subjects_rna-sequencing
- SC medical advances / applications (really too long a list to cover, only some examples):
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4441954/
- http://www.sciencedirect.com/science/article/pii/S073497501730040X
- http://genome.cshlp.org/content/25/10/1499.full
- http://www.nature.com/nri/journal/vaop/ncurrent/abs/nri.2017.76.html
- http://www.nature.com/nrg/journal/v17/n8/abs/nrg.2016.67.html
- https://www.illumina.com/content/dam/illumina-marketing/documents/products/research_reviews/single-cell-sequencing-research-review.pdf
- Single-cell-based pooled screenings:
- Single-cell epigenomics:
Session info
sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.4.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/Zurich
tzcode source: internal
attached base packages:
[1] stats4 grid stats graphics grDevices utils
[7] datasets methods base
other attached packages:
[1] igraph_2.0.3 dplyr_1.1.4
[3] tidyr_1.3.1 BiocNeighbors_1.20.2
[5] BiocSingular_1.18.0 batchelor_1.18.1
[7] scran_1.30.2 scater_1.30.1
[9] ggplot2_3.5.0 scuttle_1.12.0
[11] SingleCellExperiment_1.24.0 SummarizedExperiment_1.32.0
[13] Biobase_2.62.0 GenomicRanges_1.54.1
[15] GenomeInfoDb_1.38.8 IRanges_2.36.0
[17] S4Vectors_0.40.2 BiocGenerics_0.48.1
[19] MatrixGenerics_1.14.0 matrixStats_1.3.0
[21] BiocStyle_2.30.0 gridExtra_2.3
[23] png_0.1-8 knitr_1.45
loaded via a namespace (and not attached):
[1] ResidualMatrix_1.12.0 tidyselect_1.2.1
[3] viridisLite_0.4.2 farver_2.1.1
[5] vipor_0.4.7 viridis_0.6.5
[7] bitops_1.0-7 fastmap_1.1.1
[9] RCurl_1.98-1.14 bluster_1.12.0
[11] digest_0.6.35 rsvd_1.0.5
[13] lifecycle_1.0.4 cluster_2.1.6
[15] statmod_1.5.0 magrittr_2.0.3
[17] compiler_4.3.3 rlang_1.1.3
[19] tools_4.3.3 utf8_1.2.4
[21] yaml_2.3.8 labeling_0.4.3
[23] S4Arrays_1.2.1 dqrng_0.3.2
[25] DelayedArray_0.28.0 RColorBrewer_1.1-3
[27] abind_1.4-5 BiocParallel_1.36.0
[29] Rtsne_0.17 purrr_1.0.2
[31] withr_3.0.0 fansi_1.0.6
[33] beachmat_2.18.1 colorspace_2.1-0
[35] edgeR_4.0.16 scales_1.3.0
[37] cli_3.6.2 rmarkdown_2.26
[39] crayon_1.5.2 generics_0.1.3
[41] metapod_1.10.1 DelayedMatrixStats_1.24.0
[43] ggbeeswarm_0.7.2 zlibbioc_1.48.2
[45] parallel_4.3.3 BiocManager_1.30.22
[47] XVector_0.42.0 vctrs_0.6.5
[49] Matrix_1.6-1 jsonlite_1.8.8
[51] ggrepel_0.9.5 irlba_2.3.5.1
[53] beeswarm_0.4.0 locfit_1.5-9.9
[55] limma_3.58.1 glue_1.7.0
[57] codetools_0.2-20 gtable_0.3.4
[59] ScaledMatrix_1.10.0 munsell_0.5.1
[61] tibble_3.2.1 pillar_1.9.0
[63] htmltools_0.5.8.1 GenomeInfoDbData_1.2.11
[65] R6_2.5.1 sparseMatrixStats_1.14.0
[67] evaluate_0.23 lattice_0.22-6
[69] pheatmap_1.0.12 Rcpp_1.0.12
[71] SparseArray_1.2.4 xfun_0.43
[73] pkgconfig_2.0.3