Data analysis: Interacting with R packages#
In this vignette we cover the most basic steps to convert an anndata object to a SingleCellExperiment and perform analyses using R-based packages.
Please follow the installation instructions to setup a conda environment containing both Python and R. In order to setup the R-Python interface, we add R_HOME to the environment variables. Here, we use a generic expression pointing to the path of the currently used conda environment and its relative location where R is stored.
We next activate anndata2ri and load the jupyter extension from rpy2.
[1]:
import FACSPy as fp
[2]:
import os
from pathlib import Path
os.environ["R_HOME"] = os.path.join(Path(os.environ["CONDA_PREFIX"]), Path("lib/R/"))
import anndata2ri
anndata2ri.activate()
%load_ext rpy2.ipython
C:\Users\tarik\AppData\Local\Temp\ipykernel_32520\453632031.py:6: DeprecationWarning: The global conversion available with activate() is deprecated and will be removed in the next major release. Use a local converter.
anndata2ri.activate()
C:\Users\tarik\anaconda3\envs\facspy_r\lib\site-packages\rpy2\robjects\packages.py:367: UserWarning: The symbol 'quartz' is not in this R namespace/package.
warnings.warn(
Dataset loading and conversion#
We create the dataset using FACSPy and perform fp.r_setup(). This function is necessary due to type inflictions from anndata2ri that have to be solved beforehand. We also split the .uns slot and the .obsm["gating"] slot from the anndata, since .uns contains custom data types (such as fp.dt.Metadata, fp.dt.Panel etc.) and the gating information is not really needed outside of FACSPy. If you want to keep the gating information, set the respective parameter to False.
[3]:
adata = fp.mouse_lineages()
[4]:
adata, uns, gating = fp.r_setup(adata)
Next, we use the %%R magic command and the -i flag. This converts adata to one of its’ R-equivalents, a SingleCellExperiment.
Performing FlowSOM clustering using Spectre#
Next, we proceed in the jupyter interface and calculate the FlowSOM clustering information using Spectre.
[5]:
%%R -i adata
adata
class: SingleCellExperiment
dim: 20 9210
metadata(0):
assays(2): compensated transformed
rownames(20): FSC-A FSC-H ... CD45 Time
rowData names(7): pns png ... pnn cofactors
colnames(9210): 4225-25 61597-5 ... 3624-67 44545-41
colData names(8): sample_ID file_name ... age staining
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
At this point, you may save the SingleCellExperiment using saveRDS and continue in your native R enviroment. This is recommended if the setup of the environments is too cumbersome. Later, we show how to re-read a saved (and potentially modified) SingleCellExperiment and convert it back to python.
[6]:
%%R
library(Spectre)
library(SingleCellExperiment)
[7]:
%%R
dt_sce <- Spectre::create.dt(adata)
cell.dat <- dt_sce$data.table
matchfor <- c("transformed")
matchPat <- paste0(matchfor, "\\w?\\b")
idxs <- lapply(matchPat, grep, names(cell.dat))
cluster.cols <- names(cell.dat)[idxs[[1]]]
cell.dat <- run.flowsom(cell.dat, cluster.cols, meta.k = 8)
Exception ignored from cffi callback <function _consolewrite_ex at 0x000001C787ABE200>:
Traceback (most recent call last):
File "C:\Users\tarik\anaconda3\envs\facspy_r\lib\site-packages\rpy2\rinterface_lib\callbacks.py", line 133, in _consolewrite_ex
s = conversion._cchar_to_str_with_maxlen(buf, n, _CCHAR_ENCODING)
File "C:\Users\tarik\anaconda3\envs\facspy_r\lib\site-packages\rpy2\rinterface_lib\conversion.py", line 138, in _cchar_to_str_with_maxlen
s = ffi.string(c, maxlen).decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 6: invalid start byte
Exception ignored from cffi callback <function _consolewrite_ex at 0x000001C787ABE200>:
Traceback (most recent call last):
File "C:\Users\tarik\anaconda3\envs\facspy_r\lib\site-packages\rpy2\rinterface_lib\callbacks.py", line 133, in _consolewrite_ex
s = conversion._cchar_to_str_with_maxlen(buf, n, _CCHAR_ENCODING)
File "C:\Users\tarik\anaconda3\envs\facspy_r\lib\site-packages\rpy2\rinterface_lib\conversion.py", line 138, in _cchar_to_str_with_maxlen
s = ffi.string(c, maxlen).decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 6: invalid start byte
Exception ignored from cffi callback <function _consolewrite_ex at 0x000001C787ABE200>:
Traceback (most recent call last):
File "C:\Users\tarik\anaconda3\envs\facspy_r\lib\site-packages\rpy2\rinterface_lib\callbacks.py", line 133, in _consolewrite_ex
s = conversion._cchar_to_str_with_maxlen(buf, n, _CCHAR_ENCODING)
File "C:\Users\tarik\anaconda3\envs\facspy_r\lib\site-packages\rpy2\rinterface_lib\conversion.py", line 138, in _cchar_to_str_with_maxlen
s = ffi.string(c, maxlen).decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 6: invalid start byte
data.table 1.15.2 using 12 threads (see ?getDTthreads). Latest news: r-datatable.com
Attache Paket: 'data.table'
Das folgende Objekt ist maskiert 'package:SummarizedExperiment':
shift
Das folgende Objekt ist maskiert 'package:GenomicRanges':
shift
Das folgende Objekt ist maskiert 'package:IRanges':
shift
Die folgenden Objekte sind maskiert von 'package:S4Vectors':
first, second
SingleCellExperiment detected
-- Adding metadata
-- Adding assay data
-- Adding DimRed data
-- Finalising
Converted a SingleCellExperiment object into a data.table stored in a list
Attache Paket: 'igraph'
Das folgende Objekt ist maskiert 'package:GenomicRanges':
union
Das folgende Objekt ist maskiert 'package:IRanges':
union
Das folgende Objekt ist maskiert 'package:S4Vectors':
union
Die folgenden Objekte sind maskiert von 'package:BiocGenerics':
normalize, path, union
Die folgenden Objekte sind maskiert von 'package:stats':
decompose, spectrum
Das folgende Objekt ist maskiert 'package:base':
union
Thanks for using FlowSOM. From version 2.1.4 on, the scale
parameter in the FlowSOM function defaults to FALSE
Preparing data
Starting FlowSOM
Building SOM
Mapping data to SOM
Building MST
Binding metacluster labels to starting dataset
Binding cluster labels to starting dataset
We append the relevant cluster information to the SingleCell experiment.
Next, we export the SingleCellExperiment back to anndata using the -o command.
In order to restore the old dataset, we use fp.r_restore(). This adds the .uns slot and the .obsm["gating"] slot back.
We see that FlowSOM clusters and FlowSOM metaclusters have been added to the .obs column of the dataset.
[8]:
%%R -o adata
colData(adata)$FlowSOM_clusters <- cell.dat$FlowSOM_cluster
colData(adata)$FlowSOM_metacluster <- cell.dat$FlowSOM_metacluster
adata
class: SingleCellExperiment
dim: 20 9210
metadata(0):
assays(2): compensated transformed
rownames(20): FSC-A FSC-H ... CD45 Time
rowData names(7): pns png ... pnn cofactors
colnames(9210): 4225-25 61597-5 ... 3624-67 44545-41
colData names(10): sample_ID file_name ... FlowSOM_clusters
FlowSOM_metacluster
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
fp.r_restore(adata, uns = uns, gating_matrix = gating) adata
[9]:
adata.obs
[9]:
| sample_ID | file_name | organ | genotype | sex | experiment | age | staining | FlowSOM_clusters | FlowSOM_metacluster | |
|---|---|---|---|---|---|---|---|---|---|---|
| 4225-25 | 26 | 21112023_lineage_BM_M10_014.fcs | BM | neg | f | 2 | 95 | stained | 172.0 | 5 |
| 61597-5 | 6 | 20112023_lineage_BM_M4_041.fcs | BM | pos | m | 1 | 95 | stained | 116.0 | 5 |
| 17511-50 | 51 | 22112023_lineage_BM_M14_012.fcs | BM | pos | m | 3 | 96 | stained | 161.0 | 8 |
| 42884-49 | 50 | 22112023_lineage_BM_M13_011.fcs | BM | pos | m | 3 | 96 | stained | 101.0 | 5 |
| 54157-30 | 31 | 21112023_lineage_BM_M9_013.fcs | BM | neg | f | 2 | 95 | stained | 5.0 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5793-19 | 20 | 20112023_lineage_SPL_M3_048.fcs | SPL | pos | f | 1 | 95 | stained | 190.0 | 8 |
| 20707-70 | 71 | 22112023_lineage_SPL_M18_024.fcs | SPL | neg | m | 3 | 96 | stained | 190.0 | 8 |
| 7544-46 | 47 | 21112023_lineage_SPL_M9_021.fcs | SPL | neg | f | 2 | 95 | stained | 190.0 | 8 |
| 3624-67 | 68 | 22112023_lineage_SPL_M15_021.fcs | SPL | pos | m | 3 | 96 | stained | 189.0 | 8 |
| 44545-41 | 42 | 21112023_lineage_SPL_M10_022.fcs | SPL | neg | f | 2 | 95 | stained | 193.0 | 8 |
9210 rows × 10 columns
Next, we want to plot a dimensionality reduction using CATALYST.
For this, we need to add two columns in adata.var and adata.obs, respectively in order to matcht the requirements by CATALYST.
We convert the anndata object as described above and run the R functions.
[10]:
adata.var["marker_class"] = ["type" if type == "fluo" else "state" for type in adata.var["type"].tolist()]
adata.obs["sample_id"] = adata.obs["sample_ID"]
[11]:
%%R -i adata
library(CATALYST)
adata <- runDR(adata, dr = "UMAP", cells = 2000, features = "type", assay = "transformed")
plotDR(adata, dr = "UMAP", color_by = "Ly6G", assay = "transformed")
We see that a new slot has been created for the UMAP coordinates. This slot can be readily accessed using anndata.
[12]:
%%R -o adata
adata
class: SingleCellExperiment
dim: 20 9210
metadata(0):
assays(2): X transformed
rownames(20): FSC-A FSC-H ... CD45 Time
rowData names(8): pns png ... cofactors marker_class
colnames(9210): 4225-25 61597-5 ... 3624-67 44545-41
colData names(11): sample_ID file_name ... FlowSOM_metacluster
sample_id
reducedDimNames(1): UMAP
mainExpName: NULL
altExpNames(0):
[13]:
adata.obsm.keys()
[13]:
KeysView(AxisArrays with keys: X_umap)
Reread saved RDS files#
In order to convert SingleCellExperiments back that were modified in another R environment, read the RDS file and convert it to anndata via the -o command.
[14]:
%%R -o sce
saveRDS(adata, "my_data.rds")
sce <- readRDS("my_data.rds")
[15]:
sce
[15]:
AnnData object with n_obs × n_vars = 9210 × 20
obs: 'sample_ID', 'file_name', 'organ', 'genotype', 'sex', 'experiment', 'age', 'staining', 'FlowSOM_clusters', 'FlowSOM_metacluster', 'sample_id'
var: 'pns', 'png', 'pne', 'pnr', 'type', 'pnn', 'cofactors', 'marker_class'
obsm: 'X_umap'
layers: 'transformed'