This vignette depends on the CIARA package.

required <- c("CIARA")
if (!all(unlist(lapply(required, function(pkg) requireNamespace(pkg, quietly = TRUE)))))
  knitr::opts_chunk$set(eval = FALSE)

This vignette shows the projection between the single-cell RNA-seq mouse data from Iturbe et al., 2021 and the in vivo mouse datasets from Deng et al., 2014 and Mohammed et al., 2017.

The single-cell RNA-seq dataset comprises 1285 mouse embryonic stem cells, including a small cluster of 2-cell-like cells (2CLC; cluster 2, 31 cells).

The in vivo mouse dataset from Deng et al., 2014 spans stages from the early 2-cell stage to the late blastocyst, while the in vivo mouse dataset from Mohammed et al., 2017 spans stages from E4.5 to E6.5.

Load mouse ESCs raw count matrix

We load the raw count matrix provided in the original paper, create normalized counts, and run the cluster analysis with the CIARA function cluster_analysis_integrate_rare.

Raw count matrix can be downloaded here

current_wd <- getwd()
url = ""
destfile <- paste0(current_wd,"/")
download.file(url, destfile, quiet = FALSE)
unzip(destfile, exdir=current_wd)
norm_es_vitro=as.matrix(GetAssayData(mayra_seurat_0, slot = "data",assay="RNA"))

Load in vivo mouse datasets

The Seurat object seurat_genes_published_mouse.Rda already includes the raw and normalized count matrices obtained by combining the two in vivo datasets (Deng et al., 2014 and Mohammed et al., 2017). Normalization was done with the Seurat function NormalizeData (default parameters). The Seurat object can be downloaded here


norm_vivo <- as.matrix(GetAssayData(seurat_genes_published_mouse, slot = "data",assay="RNA"))
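
As a rough illustration of the normalization mentioned above, the following base R sketch mimics what NormalizeData does with its default parameters: log-normalization with a scale factor of 10,000. The helper `log_normalize` and the toy matrix are hypothetical and are not part of Seurat or SCOPRO; in practice the Seurat function itself should be used.

```r
# Toy raw count matrix: genes in rows, cells in columns (made-up values).
raw_counts <- matrix(
  c(10,  0,  5,
     0, 20,  5,
    90, 80, 40),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("cell1", "cell2", "cell3"))
)

# Hypothetical helper (not a Seurat function): divide each cell by its total
# counts, multiply by the scale factor, then take log(1 + x).
log_normalize <- function(counts, scale_factor = 10000) {
  scaled <- sweep(counts, 2, colSums(counts), "/") * scale_factor
  log1p(scaled)
}

norm_counts <- log_normalize(raw_counts)
```

Genes with zero counts in a cell stay at zero after this transformation, which is why a fixed cutoff such as 0.1 on the normalized values can separate expressed from non-expressed genes.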

Compute markers for selected in vivo stages

DefaultAssay(seurat_genes_published_mouse) <- "RNA"
cluster_mouse_published <- as.vector(seurat_genes_published_mouse$stim)

relevant_stages <- c("Late_2_cell", "epiblast_4.5", "epiblast_5.5", "epiblast_6.5")

DefaultAssay(seurat_genes_published_mouse) <- "RNA"

# Restrict the Seurat object, the stage labels and the cell names to the relevant stages
is_relevant <- cluster_mouse_published %in% relevant_stages
markers_first_ESC_small <- CIARA::markers_cluster_seurat(
  seurat_genes_published_mouse[, is_relevant],
  cluster_mouse_published[is_relevant],
  names(seurat_genes_published_mouse$RNA_snn_res.0.2)[is_relevant],
  10
)

markers_mouse <- as.vector(markers_first_ESC_small[[3]])
stages_markers <- names(markers_first_ESC_small[[3]])

## Keeping only the genes in common between in vitro and in vivo datasets
stages_markers <- stages_markers[markers_mouse %in% row.names(norm_es_vitro)]

markers_small <- markers_mouse[markers_mouse %in% row.names(norm_es_vitro)]
names(markers_small) <- stages_markers

Select only black/white markers for in vivo stages

For each in vivo stage, we select only the markers whose median expression is above 0.1 in that stage and below 0.1 in all the other stages.

marker_result <- select_top_markers(relevant_stages, cluster_mouse_published, norm_vivo, markers_small, max_number = 100, threshold = 0.1)
marker_all <- marker_result[[1]]
marker_stages <- marker_result[[2]]
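
The selection rule described above can be sketched in base R on a toy matrix. The helper `black_white_markers` is hypothetical and only illustrates the median criterion; it is not the implementation of select_top_markers and does not reproduce its other arguments (for example max_number).

```r
# Toy normalized matrix: 4 genes x 6 cells (made-up values).
norm_toy <- matrix(
  c(2.0, 1.5, 0.0, 0.0, 0.0, 0.0,   # gene1: on in stage A only
    0.0, 0.0, 1.2, 0.9, 0.0, 0.0,   # gene2: on in stage B only
    1.0, 1.1, 0.8, 1.3, 0.0, 0.0,   # gene3: on in A and B, so not black/white
    0.0, 0.0, 0.0, 0.0, 0.7, 0.6),  # gene4: on in stage C only
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:6))
)
stage_toy <- c("A", "A", "B", "B", "C", "C")

# Hypothetical helper: keep genes whose median is above `threshold` in the
# target stage and below `threshold` in every other stage.
black_white_markers <- function(norm, stage, target, threshold = 0.1) {
  stages <- unique(stage)
  # Median expression of every gene within every stage (genes x stages).
  med <- sapply(stages, function(s) {
    apply(norm[, stage == s, drop = FALSE], 1, median)
  })
  others <- setdiff(stages, target)
  on_in_target <- med[, target] > threshold
  off_elsewhere <- apply(med[, others, drop = FALSE] < threshold, 1, all)
  rownames(norm)[on_in_target & off_elsewhere]
}

markers_A <- black_white_markers(norm_toy, stage_toy, "A")
markers_B <- black_white_markers(norm_toy, stage_toy, "B")
```

In this toy example gene3 is rejected for both stage A and stage B, because its median is above the threshold in more than one stage.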


We run SCOPRO between the clusters of the mouse ESC dataset and the in vivo stage “Late 2-cells”.

The function SCOPRO first computes the mean expression profile of the \emph{marker_stages_filter} genes for each cluster in the in vivo and in vitro datasets. For a given cluster, a connectivity matrix is computed whose number of rows and number of columns both equal the length of \emph{marker_stages_filter}. Each entry (i,j) of the matrix is 1 if the fold change between gene i and gene j is above \emph{fold_change}, and 0 otherwise. Finally, the connectivity matrix of the Late 2-cells stage is compared with those of all the clusters in the in vitro dataset. A gene i is considered conserved between the Late 2-cells stage and an in vitro cluster if the Jaccard index of the links of gene i is above \emph{threshold}.

There are 25 markers of the Late 2-cells stage that are also expressed in the mouse ESC dataset. More than 75% of these 25 markers are conserved in cluster 2. This result is expected, since cluster 2 is made up of 2CLC, a rare population of cells known to be transcriptionally similar to the late 2-cell stage of mouse embryonic development (typical markers of 2CLC are the Zscan4 genes, which are also highly expressed at the late 2-cell stage).

marker_stages_filter <- filter_in_vitro(norm_es_vitro,cluster_es_vitro ,marker_all, fraction = 0.10, threshold = 0)

analysis_2cell <- SCOPRO(norm_es_vitro,norm_vivo,cluster_es_vitro,cluster_mouse_published,"Late_2_cell",marker_stages_filter, threshold = 0.1, number_link = 1, fold_change = 3, threshold_fold_change = 0.1 ,marker_stages, relevant_stages)

plot_score(analysis_2cell, marker_stages, marker_stages_filter, relevant_stages, "Late_2_cell", "Final score", "Cluster", "Late_2_cell")
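
To make the comparison of connectivity matrices more concrete, here is a minimal base R sketch of the idea on made-up mean expression profiles. The helpers `connectivity` and `jaccard_per_gene`, the pseudocount and the exact definition of a "link" are assumptions made for illustration; the SCOPRO function may differ in its details (for example in how number_link and threshold_fold_change are used).

```r
# Made-up mean expression profiles of four marker genes in one in vivo stage
# and in one in vitro cluster.
mean_vivo  <- c(g1 = 8.0, g2 = 1.0, g3 = 0.5, g4 = 6.0)
mean_vitro <- c(g1 = 6.0, g2 = 1.0, g3 = 4.0, g4 = 0.2)

# Hypothetical helper: entry (i, j) is 1 when the ratio gene i / gene j is
# above `fold_change`. The pseudocount (an assumption) avoids division by zero.
connectivity <- function(profile, fold_change = 3, pseudocount = 0.01) {
  ratio <- outer(profile + pseudocount, profile + pseudocount, "/")
  conn <- (ratio > fold_change) * 1
  dimnames(conn) <- list(names(profile), names(profile))
  diag(conn) <- 0
  conn
}

# Hypothetical helper: Jaccard index between the links of each gene, taken
# here as the gene's row and column in the two connectivity matrices.
jaccard_per_gene <- function(conn_a, conn_b) {
  sapply(rownames(conn_a), function(g) {
    links_a <- c(conn_a[g, ], conn_a[, g]) == 1
    links_b <- c(conn_b[g, ], conn_b[, g]) == 1
    union_size <- sum(links_a | links_b)
    if (union_size == 0) return(0)
    sum(links_a & links_b) / union_size
  })
}

conn_vivo  <- connectivity(mean_vivo, fold_change = 3)
conn_vitro <- connectivity(mean_vitro, fold_change = 3)
jaccard <- jaccard_per_gene(conn_vivo, conn_vitro)

# A gene counts as conserved when its Jaccard index is above the threshold.
conserved <- names(jaccard)[jaccard > 0.1]
fraction_conserved <- length(conserved) / length(jaccard)
```

In this toy case g1 and g2 keep part of their links in both datasets and are called conserved, while g3 and g4 change their relationships to the other genes and are not. The fraction of conserved genes plays the role of the final score reported per cluster.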

Visualization of conserved/ not conserved genes between late 2 cells stage and in vitro clusters

We can visualize which markers of the late 2-cell stage are conserved or not conserved in cluster 2. As expected, the Zscan4 family genes are conserved.

common_genes <- select_common_genes(analysis_2cell, marker_stages, relevant_stages, "Late_2_cell", cluster_es_vitro, "2")
no_common_genes <- select_no_common_genes(analysis_2cell, marker_stages, relevant_stages, "Late_2_cell", cluster_es_vitro, "2")

all_genes <- c(no_common_genes[1:4], common_genes[1:10])
all_genes_label <- c(paste0(no_common_genes[1:4], "-no_conserved"), paste0(common_genes[1:10], "-conserved"))

rabbit_plot <- plot_score_genes(all_genes, "Mouse ESC", "Mouse vivo", norm_es_vitro, norm_vivo[, cluster_mouse_published == "Late_2_cell"], cluster_es_vitro, cluster_mouse_published[cluster_mouse_published == "Late_2_cell"], all_genes_label, 7, 10, "Late_2_cell")
#> R version 4.0.2 (2020-06-22)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Mojave 10.14.6
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> other attached packages:
#> [1] SCOPRO_0.1.0
#> loaded via a namespace (and not attached):
#>  [1] tidyselect_1.1.1   xfun_0.29          bslib_0.3.1        graphlayouts_0.8.0
#>  [5] purrr_0.3.4        colorspace_2.0-2   vctrs_0.3.8        generics_0.1.1    
#>  [9] viridisLite_0.4.0  htmltools_0.5.2    yaml_2.2.1         utf8_1.2.2        
#> [13] rlang_0.4.12       jquerylib_0.1.4    pillar_1.6.4       glue_1.6.0        
#> [17] DBI_1.1.2          tweenr_1.0.2       lifecycle_1.0.1    stringr_1.4.0     
#> [21] munsell_0.5.0      gtable_0.3.0       CIARA_0.1.0        evaluate_0.14     
#> [25] knitr_1.37         fastmap_1.1.0      parallel_4.0.2     fansi_0.5.0       
#> [29] tidygraph_1.2.0    Rcpp_1.0.7         scales_1.1.1       jsonlite_1.7.2    
#> [33] farver_2.1.0       gridExtra_2.3      ggforce_0.3.3      ggplot2_3.3.5     
#> [37] digest_0.6.29      stringi_1.7.6      ggrepel_0.9.1      dplyr_1.0.7       
#> [41] polyclip_1.10-0    grid_4.0.2         tools_4.0.2        magrittr_2.0.1    
#> [45] sass_0.4.0         tibble_3.1.6       ggraph_2.0.5       crayon_1.4.2      
#> [49] tidyr_1.1.4        pkgconfig_2.0.3    ellipsis_0.3.2     MASS_7.3-52       
#> [53] viridis_0.6.2      assertthat_0.2.1   rmarkdown_2.11     R6_2.5.1          
#> [57] igraph_1.2.11      compiler_4.0.2