Dataset Gating: Unsupervised Gating

In this vignette, we showcase the unsupervised gating approach offered by FACSPy.

Cell populations in cytometry are defined by their marker expression. Here, we use a clustering-based approach in which every cluster is compared to a user-defined gating strategy. If a cluster matches the gating strategy, it is assigned the corresponding cell type.

First, we create the dataset, which this time consists of mouse peripheral blood cells. As shown previously, we transform the data using the asinh transform.

[1]:
import warnings
warnings.filterwarnings(
    action='ignore',
    category=FutureWarning
)
[2]:
import FACSPy as fp
import os
[3]:
input_directory = "../../Tutorials/mouse_lineages/"
panel = fp.dt.Panel(os.path.join(input_directory, "panel.csv"))
metadata = fp.dt.Metadata(os.path.join(input_directory, "metadata_pb.csv"))
workspace = fp.dt.FlowJoWorkspace(os.path.join(input_directory, "lineages_full_gated_pb.wsp"))
[4]:
dataset = fp.dt.create_dataset(input_directory = input_directory,
                               panel = panel,
                               metadata = metadata,
                               workspace = workspace)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 221, FSC-H: 1, FSC-W: 71, SSC-A: 216, GFP-A: 54, BV421-A: 1, BUV496-A: 4, BB700-A: 6
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 439, FSC-H: 1, FSC-W: 17, SSC-A: 145, GFP-A: 7, BV421-A: 1, BV711-A: 2, BV786-A: 4, BUV395-A: 1, BUV496-A: 1, BUV737-A: 1, BB700-A: 1
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 882, FSC-H: 7, FSC-W: 124, SSC-A: 300, SSC-H: 3, SSC-W: 97, GFP-A: 3, APC-A: 17, APC-H7-A: 41, BV421-A: 619, BV510-A: 14698, BV605-A: 152, BV711-A: 20, BV786-A: 35, BUV395-A: 34, BUV496-A: 17001, BUV737-A: 32, BYG790-A: 40, BB700-A: 15
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 1065, FSC-H: 3, FSC-W: 72, SSC-A: 192, SSC-H: 4, SSC-W: 42, GFP-A: 1, APC-A: 8, APC-H7-A: 30, BV421-A: 1477, BV510-A: 14720, BV605-A: 120, BV711-A: 18, BV786-A: 93, BUV395-A: 17, BUV496-A: 16519, BUV737-A: 19, BYG790-A: 27, BB700-A: 15
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 1191, FSC-H: 2, FSC-W: 87, SSC-A: 310, SSC-H: 1, SSC-W: 26, GFP-A: 1, APC-A: 5, APC-H7-A: 6, BV421-A: 416, BV510-A: 10607, BV605-A: 51, BV711-A: 6, BV786-A: 40, BUV395-A: 11, BUV496-A: 11940, BUV737-A: 5, BYG790-A: 8, BB700-A: 3
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 376, SSC-A: 127, SSC-H: 405, SSC-W: 8, GFP-A: 86, APC-A: 7, APC-H7-A: 4, BV421-A: 41, BV510-A: 1328, BV605-A: 19833, BV711-A: 136, BV786-A: 22, BUV395-A: 14, BUV496-A: 30, BUV737-A: 22479, BYG790-A: 23, BB700-A: 60
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 1363, FSC-H: 6, FSC-W: 120, SSC-A: 338, SSC-H: 6, SSC-W: 77, GFP-A: 2, APC-A: 14, APC-H7-A: 52, BV421-A: 1614, BV510-A: 29531, BV605-A: 221, BV711-A: 32, BV786-A: 103, BUV395-A: 30, BUV496-A: 33042, BUV737-A: 47, BYG790-A: 36, BB700-A: 31
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 1538, FSC-H: 7, FSC-W: 68, SSC-A: 385, SSC-H: 9, SSC-W: 47, GFP-A: 4, APC-A: 2, APC-H7-A: 1, BV421-A: 9, BV510-A: 17, BV605-A: 7, BV711-A: 3, BUV496-A: 3, BUV737-A: 4, BYG790-A: 3, BB700-A: 1
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 856, FSC-H: 3, FSC-W: 36, SSC-A: 160, SSC-H: 3, SSC-W: 16, GFP-A: 2, BV421-A: 1, BV510-A: 5, BV605-A: 17, BUV395-A: 7
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 649, FSC-H: 3, FSC-W: 100, SSC-A: 284, SSC-H: 7, SSC-W: 65, GFP-A: 4, APC-A: 16, APC-H7-A: 55, BV421-A: 306, BV510-A: 16520, BV605-A: 36, BV711-A: 61, BV786-A: 50, BUV395-A: 24, BUV496-A: 19536, BUV737-A: 60, BYG790-A: 21, BB700-A: 54
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 1070, FSC-H: 3, FSC-W: 55, SSC-A: 380, SSC-H: 3, SSC-W: 20, GFP-A: 1, APC-A: 21, APC-H7-A: 37, BV421-A: 156, BV510-A: 9506, BV605-A: 24, BV711-A: 32, BV786-A: 35, BUV395-A: 13, BUV496-A: 10985, BUV737-A: 39, BYG790-A: 23, BB700-A: 31
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 1391, FSC-H: 3, FSC-W: 57, SSC-A: 420, SSC-H: 2, SSC-W: 20, GFP-A: 1, APC-A: 15, APC-H7-A: 26, BV421-A: 102, BV510-A: 5855, BV605-A: 8, BV711-A: 40, BV786-A: 37, BUV395-A: 2, BUV496-A: 6637, BUV737-A: 37, BYG790-A: 16, BB700-A: 34
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 2020, FSC-H: 24, FSC-W: 104, SSC-A: 388, SSC-H: 5, SSC-W: 63, GFP-A: 2, APC-A: 23, APC-H7-A: 77, BV421-A: 745, BV510-A: 13036, BV605-A: 90, BV711-A: 57, BV786-A: 173, BUV395-A: 49, BUV496-A: 14905, BUV737-A: 75, BYG790-A: 68, BB700-A: 52
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 1382, FSC-H: 10, FSC-W: 66, SSC-A: 225, SSC-H: 5, SSC-W: 40, GFP-A: 1, APC-A: 25, APC-H7-A: 36, BV421-A: 423, BV510-A: 12729, BV605-A: 71, BV711-A: 44, BV786-A: 83, BUV395-A: 8, BUV496-A: 14631, BUV737-A: 49, BYG790-A: 21, BB700-A: 39
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 1258, FSC-H: 8, FSC-W: 92, SSC-A: 270, SSC-H: 10, SSC-W: 58, GFP-A: 5, APC-A: 23, APC-H7-A: 52, BV421-A: 326, BV510-A: 14123, BV605-A: 38, BV711-A: 60, BV786-A: 72, BUV395-A: 18, BUV496-A: 17148, BUV737-A: 67, BYG790-A: 25, BB700-A: 54
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 481, FSC-H: 6, FSC-W: 112, SSC-A: 347, SSC-H: 7, SSC-W: 61, BUV395-A: 2, BUV496-A: 3
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 626, FSC-H: 4, FSC-W: 32, SSC-A: 98, SSC-H: 3, SSC-W: 21
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 1078, FSC-H: 8, FSC-W: 78, SSC-A: 197, SSC-H: 4, SSC-W: 38, GFP-A: 1, APC-A: 26, APC-H7-A: 41, BV421-A: 233, BV510-A: 9102, BV605-A: 57, BV711-A: 21, BV786-A: 88, BUV395-A: 20, BUV496-A: 10788, BUV737-A: 40, BYG790-A: 26, BB700-A: 30
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 520, SSC-A: 20, SSC-W: 70, APC-H7-A: 9, BV510-A: 11, BV605-A: 20, BV711-A: 184, BV786-A: 10865, BUV395-A: 13, BUV496-A: 17, BUV737-A: 20, BYG790-A: 3, BB700-A: 12414
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 845, FSC-H: 3, FSC-W: 38, SSC-A: 125, SSC-H: 4, SSC-W: 22, APC-H7-A: 5, BV421-A: 18, BV510-A: 195, BV605-A: 8748, BV711-A: 42, BV786-A: 10, BUV395-A: 40, BUV496-A: 6, BUV737-A: 10050, BYG790-A: 16, BB700-A: 3
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 1938, FSC-H: 11, FSC-W: 103, SSC-A: 378, SSC-H: 7, SSC-W: 60, APC-H7-A: 18, BV421-A: 50, BV510-A: 345, BV605-A: 17794, BV711-A: 42, BV786-A: 25, BUV395-A: 97, BUV496-A: 20, BUV737-A: 21091, BYG790-A: 43, BB700-A: 22
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 557, FSC-H: 2, FSC-W: 63, SSC-A: 151, SSC-H: 8, SSC-W: 37, GFP-A: 1, APC-A: 27, APC-H7-A: 49, BV421-A: 348, BV510-A: 21624, BV605-A: 30, BV711-A: 47, BV786-A: 56, BUV395-A: 15, BUV496-A: 25098, BUV737-A: 52, BYG790-A: 18, BB700-A: 44
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 666, FSC-H: 7, FSC-W: 146, SSC-A: 603, SSC-H: 10, SSC-W: 102, GFP-A: 1, APC-A: 24, APC-H7-A: 67, BV421-A: 352, BV510-A: 19192, BV605-A: 40, BV711-A: 43, BV786-A: 46, BUV395-A: 14, BUV496-A: 21720, BUV737-A: 51, BYG790-A: 31, BB700-A: 40
  warnings.warn(self.message, UserWarning)
... gating sample 20112023_lineage_PB_Cre_neg_unstained_030.fcs
... gating sample 20112023_lineage_PB_Cre_pos_unstained_029.fcs
... gating sample 20112023_lineage_PB_M2_031.fcs
... gating sample 20112023_lineage_PB_M3_032.fcs
... gating sample 20112023_lineage_PB_M4_033.fcs
... gating sample 20112023_lineage_PB_M5_034.fcs
... gating sample 20112023_lineage_PB_M6_035.fcs
... gating sample 21112023_lineage_PB_Cre_neg_unstained_002.fcs
... gating sample 21112023_lineage_PB_Cre_pos_unstained_001.fcs
... gating sample 21112023_lineage_PB_M10_006.fcs
... gating sample 21112023_lineage_PB_M11_007.fcs
... gating sample 21112023_lineage_PB_M12_008.fcs
... gating sample 21112023_lineage_PB_M7_003.fcs
... gating sample 21112023_lineage_PB_M8_004.fcs
... gating sample 21112023_lineage_PB_M9_005.fcs
... gating sample 22112023_lineage_PB_Cre_neg_unstained_002.fcs
... gating sample 22112023_lineage_PB_Cre_pos_unstained_001.fcs
... gating sample 22112023_lineage_PB_M13_003.fcs
... gating sample 22112023_lineage_PB_M14_004.fcs
... gating sample 22112023_lineage_PB_M15_005.fcs
... gating sample 22112023_lineage_PB_M16_006.fcs
... gating sample 22112023_lineage_PB_M17_007.fcs
... gating sample 22112023_lineage_PB_M18_008.fcs
... compensating sample 20112023_lineage_PB_Cre_neg_unstained_030.fcs
... compensating sample 20112023_lineage_PB_Cre_pos_unstained_029.fcs
... compensating sample 20112023_lineage_PB_M2_031.fcs
... compensating sample 20112023_lineage_PB_M3_032.fcs
... compensating sample 20112023_lineage_PB_M4_033.fcs
... compensating sample 20112023_lineage_PB_M5_034.fcs
... compensating sample 20112023_lineage_PB_M6_035.fcs
... compensating sample 21112023_lineage_PB_Cre_neg_unstained_002.fcs
... compensating sample 21112023_lineage_PB_Cre_pos_unstained_001.fcs
... compensating sample 21112023_lineage_PB_M10_006.fcs
... compensating sample 21112023_lineage_PB_M11_007.fcs
... compensating sample 21112023_lineage_PB_M12_008.fcs
... compensating sample 21112023_lineage_PB_M7_003.fcs
... compensating sample 21112023_lineage_PB_M8_004.fcs
... compensating sample 21112023_lineage_PB_M9_005.fcs
... compensating sample 22112023_lineage_PB_Cre_neg_unstained_002.fcs
... compensating sample 22112023_lineage_PB_Cre_pos_unstained_001.fcs
... compensating sample 22112023_lineage_PB_M13_003.fcs
... compensating sample 22112023_lineage_PB_M14_004.fcs
... compensating sample 22112023_lineage_PB_M15_005.fcs
... compensating sample 22112023_lineage_PB_M16_006.fcs
... compensating sample 22112023_lineage_PB_M17_007.fcs
... compensating sample 22112023_lineage_PB_M18_008.fcs
[5]:
cofactors = fp.dt.CofactorTable(os.path.join(input_directory, "cofactors_pb.csv"))
fp.dt.transform(dataset,
                transform = "asinh",
                cofactor_table = cofactors,
                key_added = "transformed")
[6]:
dataset
[6]:
AnnData object with n_obs × n_vars = 21263495 × 20
    obs: 'sample_ID', 'file_name', 'organ', 'genotype', 'sex', 'experiment', 'age', 'staining'
    var: 'pns', 'png', 'pne', 'pnr', 'type', 'pnn', 'cofactors'
    uns: 'metadata', 'panel', 'workspace', 'gating_cols', 'dataset_status_hash', 'cofactors'
    obsm: 'gating'
    layers: 'compensated', 'transformed'
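
For reference, the asinh transform used above can be sketched in a few lines of NumPy. This is only an illustration of the underlying formula, arcsinh(x / cofactor), and not FACSPy's implementation; the function name `asinh_transform`, the intensities and the cofactor value are made up for the example.

```python
import numpy as np


def asinh_transform(values, cofactor):
    """Return arcsinh(values / cofactor), evaluated element-wise."""
    values = np.asarray(values, dtype=float)
    return np.arcsinh(values / cofactor)


# Hypothetical compensated intensities for one channel and a made-up cofactor.
raw = np.array([-50.0, 0.0, 150.0, 1_500.0, 150_000.0])
cofactor = 150.0

transformed = asinh_transform(raw, cofactor)
print(np.round(transformed, 2))
```

The transform is approximately linear around zero and logarithmic for large values, so negative values produced by compensation remain defined. An intensity equal to the cofactor always maps to arcsinh(1), roughly 0.88.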

We subset the dataset to the ‘cells’ gate in order to remove erythrocytes. Furthermore, we remove the unstained samples and synchronize the dataset.

[7]:
fp.subset_gate(dataset, "cells")
dataset = dataset[dataset.obs["staining"] == "stained"].copy()
fp.sync.synchronize_dataset(dataset)

dataset
Found modified subsets: ['adata_obs_names', 'adata_sample_ids']
        ... synchronizing metadata object to contain sample_IDs of the dataset
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:12: UserWarning: It was detected that the dataset was modified.Please make sure that the performed analyses are still valid. Note that if you removed whole samples, mfi/fop calculations will not be affected.
  warnings.warn(message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\synchronization\_synchronize.py:106: DataModificationWarning: 'It was detected that the dataset was modified.Please make sure that the performed analyses are still valid. Note that if you removed whole samples, mfi/fop calculations will not be affected.'
  warnings.warn('', DataModificationWarning)
[7]:
AnnData object with n_obs × n_vars = 2032231 × 20
    obs: 'sample_ID', 'file_name', 'organ', 'genotype', 'sex', 'experiment', 'age', 'staining'
    var: 'pns', 'png', 'pne', 'pnr', 'type', 'pnn', 'cofactors'
    uns: 'metadata', 'panel', 'workspace', 'gating_cols', 'dataset_status_hash', 'cofactors'
    obsm: 'gating'
    layers: 'compensated', 'transformed'

We quickly visualize the data on a UMAP embedding. To do so, we subsample the dataset to 30,000 cells, compute the PCA and the neighbors graph, and then run the UMAP calculation.

As we are only interested in CD45+ cells, we set fp.settings.default_gate and fp.settings.default_layer accordingly.

[8]:
import scanpy as sc
sc.pp.subsample(dataset, n_obs = 30_000)
[9]:
fp.settings.default_gate = "CD45+"
fp.settings.default_layer = "transformed"

fp.tl.pca(dataset)
fp.tl.neighbors(dataset)
fp.tl.umap(dataset)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\pynndescent\pynndescent_.py:346: NumbaPendingDeprecationWarning: Code using Numba extension API maybe depending on 'old_style' error-capturing, which is deprecated and will be replaced by 'new_style' in a future release. See details at https://numba.readthedocs.io/en/latest/reference/deprecation.html#deprecation-of-old-style-numba-captured-errors
Exception origin:
  File "C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\numba\core\types\functions.py", line 486, in __getnewargs__
    raise ReferenceError("underlying object has vanished")

  init_rp_tree(data, dist, current_graph, leaf_array)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\pynndescent\pynndescent_.py:348: NumbaPendingDeprecationWarning: Code using Numba extension API maybe depending on 'old_style' error-capturing, which is deprecated and will be replaced by 'new_style' in a future release. See details at https://numba.readthedocs.io/en/latest/reference/deprecation.html#deprecation-of-old-style-numba-captured-errors
Exception origin:
  File "C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\numba\core\types\functions.py", line 486, in __getnewargs__
    raise ReferenceError("underlying object has vanished")

  init_random(n_neighbors, data, current_graph, dist, rng_state)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\pynndescent\pynndescent_.py:358: NumbaPendingDeprecationWarning: Code using Numba extension API maybe depending on 'old_style' error-capturing, which is deprecated and will be replaced by 'new_style' in a future release. See details at https://numba.readthedocs.io/en/latest/reference/deprecation.html#deprecation-of-old-style-numba-captured-errors
Exception origin:
  File "C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\numba\core\types\functions.py", line 486, in __getnewargs__
    raise ReferenceError("underlying object has vanished")

  nn_descent_internal_low_memory_parallel(

We use the fp.pl.umap() function to visualize the UMAP embedding. Already we can observe distinct groups of cells, likely corresponding to the different lineages.

We use the color parameter and color by the B-cell marker B220.

[10]:
fp.pl.umap(dataset, color = "B220", vmin = 0, vmax = 3)
[Figure: UMAP embedding colored by B220 expression]

Define a gating strategy

In FACSPy, the gating strategy is defined as a dictionary. The keys correspond to the population names that should be gated, in our case ‘T_cells’, ‘CD4_T_cells’ etc.

Each value is a list with two entries. The first entry names the parent population. For ‘T_cells’, the parent population is ‘CD45+’; for ‘CD4_T_cells’, the parent population is ‘T_cells’.

The second entry is a list of the marker proteins that define the population.

We use the following convention. Note that positivity and negativity are defined based on the cofactors.
1. positive markers are marked with ‘+’
2. negative markers are marked with ‘-’
3. markers without ‘+’ or ‘-’ are treated as positive

Marker expression can also be expressed in quantiles. The default quantiles are [0, 0.33, 0.66, 1].
1. markers with exceptionally high positivity can be marked as ‘hi’ (quantile 0.66-1)
2. markers with intermediate expression can be marked as ‘int’ (quantile 0.33-0.66)
3. markers with low expression can be marked as ‘lo’ (quantile 0-0.33)

The quantiles can be changed when setting up the classifier.
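
To make the convention concrete, the following is a minimal sketch of how such marker entries and quantile bins could be interpreted. It is not FACSPy's parser: the helpers `parse_marker` and `quantile_bin` are hypothetical, and the assumption that ‘hi’, ‘int’ and ‘lo’ are written as suffixes directly after the marker name (for example `Ly6Chi`) is ours.

```python
import numpy as np

# Assumed suffix notation; '+' and '-' follow the convention described above.
SUFFIXES = {"+": "pos", "-": "neg", "hi": "hi", "int": "int", "lo": "lo"}


def parse_marker(entry):
    """Split e.g. 'CD8-' into ('CD8', 'neg'); entries without a suffix are positive."""
    # Check longer suffixes first so that 'int' is matched as a whole.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if entry.endswith(suffix) and len(entry) > len(suffix):
            return entry[: -len(suffix)], SUFFIXES[suffix]
    return entry, "pos"


def quantile_bin(value, reference, quantiles=(0, 0.33, 0.66, 1)):
    """Label a value 'lo', 'int' or 'hi' relative to a reference distribution."""
    _, lower, upper, _ = np.quantile(reference, quantiles)
    if value < lower:
        return "lo"
    if value < upper:
        return "int"
    return "hi"


parsed = [parse_marker(m) for m in ["CD3+", "CD8-", "NK1.1-", "CD45", "Ly6Chi"]]
print(parsed)

# Hypothetical reference distribution: the integers 0..100.
reference = np.arange(101)
print([quantile_bin(v, reference) for v in (10, 50, 90)])
```

A purely suffix-based parser like this one would misread a marker whose name itself ends in ‘hi’, ‘int’ or ‘lo’, which is one reason to treat it as an illustration only.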

Note that we prefix the cell populations with fp_ in order to be able to compare them to the manual gating strategy later.

[11]:
fp.rename_channel(dataset, "Siglec-F", "Siglec_F")
[12]:
gating_strategy = {
    "fp_T_cells": ["CD45+", ["CD3+", "CD45+"]],
    "fp_CD4_T_cells": ["T_cells", ["CD3+", "CD4+", "CD8-", "CD45+"]],
    "fp_CD8_T_cells": ["T_cells", ["CD3+", "CD4-", "CD8+", "CD45+"]],
    "fp_Neutrophils": ["CD45+", ["CD45+", "Ly6G+", "Ly6C+", "CD11b+"]],
    "fp_Monocytes": ["CD45+", ["CD45+", "Ly6C+", "Ly6G-", "CD11b+", "NK1.1-"]],
    "fp_B_cells": ["CD45+", ["CD45+", "B220+"]],
    "fp_NK_cells": ["CD45+", ["CD45+", "NK1.1+"]],
    "fp_Eosinophils": ["CD45+", ["CD45+", "Siglec_F+", "Ly6G-"]]
}
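
Because every entry names its parent population, the dictionary can be regrouped by parent, which mirrors how the classifier output further down is organized ("Analyzing population: ..."). The sketch below uses a shortened copy of the strategy and a hypothetical helper, `group_by_parent`; it only illustrates the data structure and is not part of FACSPy.

```python
from collections import defaultdict

# Shortened copy of the gating strategy defined above.
gating_strategy = {
    "fp_T_cells": ["CD45+", ["CD3+", "CD45+"]],
    "fp_CD4_T_cells": ["T_cells", ["CD3+", "CD4+", "CD8-", "CD45+"]],
    "fp_CD8_T_cells": ["T_cells", ["CD3+", "CD4-", "CD8+", "CD45+"]],
    "fp_B_cells": ["CD45+", ["CD45+", "B220+"]],
}


def group_by_parent(strategy):
    """Map each parent population to the populations gated from it."""
    grouped = defaultdict(list)
    for population, definition in strategy.items():
        if len(definition) != 2:
            raise ValueError(f"{population}: expected [parent, [markers]]")
        parent, markers = definition
        if not isinstance(markers, list) or not markers:
            raise ValueError(f"{population}: marker list is missing or empty")
        grouped[parent].append(population)
    return dict(grouped)


by_parent = group_by_parent(gating_strategy)
for parent, populations in by_parent.items():
    print(parent, "->", populations)
```

Note that the parent ‘CD45+’ is the name of an existing gate, so the trailing ‘+’ belongs to the gate name and is not a marker suffix.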

Cell identification

To set up the classifier, we instantiate the fp.ml.unsupervisedGating() class. Here, we use Leiden clustering to define the clusters. Other available options are parc, phenograph and flowsom.

We start the classifier by calling .identify_populations(). We can pass additional keyword arguments to the clustering method. Here, we choose a higher resolution for a finer cluster definition.

[13]:
clf = fp.ml.unsupervisedGating(dataset,
                               gating_strategy = gating_strategy,
                               clustering_algorithm = "leiden",
                               layer = "transformed",
                               cluster_key = None)
[14]:
clf.identify_populations(cluster_kwargs = {"resolution": 5})
Analyzing population: T_cells
... sample 21112023_lineage_PB_M11_007.fcs
computing PCA for leiden
computing neighbors for leiden!
     ... gating population fp_CD4_T_cells
     ... gating population fp_CD8_T_cells
... sample 22112023_lineage_PB_M16_006.fcs
computing PCA for leiden
computing neighbors for leiden!
     ... gating population fp_CD4_T_cells
     ... gating population fp_CD8_T_cells
... sample 22112023_lineage_PB_M17_007.fcs
computing PCA for leiden
computing neighbors for leiden!
     ... gating population fp_CD4_T_cells
     ... gating population fp_CD8_T_cells
... sample 22112023_lineage_PB_M15_005.fcs
computing PCA for leiden
computing neighbors for leiden!
     ... gating population fp_CD4_T_cells
     ... gating population fp_CD8_T_cells
... sample 20112023_lineage_PB_M6_035.fcs
computing PCA for leiden
computing neighbors for leiden!
     ... gating population fp_CD4_T_cells
     ... gating population fp_CD8_T_cells
... sample 21112023_lineage_PB_M8_004.fcs
computing PCA for leiden
computing neighbors for leiden!
     ... gating population fp_CD4_T_cells
     ... gating population fp_CD8_T_cells
... sample 20112023_lineage_PB_M2_031.fcs
computing PCA for leiden
computing neighbors for leiden!
     ... gating population fp_CD4_T_cells
     ... gating population fp_CD8_T_cells
... sample 20112023_lineage_PB_M5_034.fcs
computing PCA for leiden
computing neighbors for leiden!
     ... gating population fp_CD4_T_cells
     ... gating population fp_CD8_T_cells
... sample 21112023_lineage_PB_M9_005.fcs
computing PCA for leiden
computing neighbors for leiden!
     ... gating population fp_CD4_T_cells
     ... gating population fp_CD8_T_cells
... sample 20112023_lineage_PB_M3_032.fcs
computing PCA for leiden
computing neighbors for leiden!
     ... gating population fp_CD4_T_cells
     ... gating population fp_CD8_T_cells
... sample 22112023_lineage_PB_M14_004.fcs
computing PCA for leiden
computing neighbors for leiden!
     ... gating population fp_CD4_T_cells
     ... gating population fp_CD8_T_cells
... sample 22112023_lineage_PB_M18_008.fcs
computing PCA for leiden
computing neighbors for leiden!
     ... gating population fp_CD4_T_cells
     ... gating population fp_CD8_T_cells
... sample 20112023_lineage_PB_M4_033.fcs
computing PCA for leiden
computing neighbors for leiden!
     ... gating population fp_CD4_T_cells
     ... gating population fp_CD8_T_cells
... sample 21112023_lineage_PB_M12_008.fcs
computing PCA for leiden
computing neighbors for leiden!
     ... gating population fp_CD4_T_cells
     ... gating population fp_CD8_T_cells
... sample 22112023_lineage_PB_M13_003.fcs
computing PCA for leiden
computing neighbors for leiden!
     ... gating population fp_CD4_T_cells
     ... gating population fp_CD8_T_cells
... sample 21112023_lineage_PB_M10_006.fcs
computing PCA for leiden
computing neighbors for leiden!
     ... gating population fp_CD4_T_cells
     ... gating population fp_CD8_T_cells
... sample 21112023_lineage_PB_M7_003.fcs
computing PCA for leiden
computing neighbors for leiden!
     ... gating population fp_CD4_T_cells
     ... gating population fp_CD8_T_cells
Analyzing population: CD45+
... sample 21112023_lineage_PB_M11_007.fcs
     ... gating population fp_T_cells
     ... gating population fp_Neutrophils
     ... gating population fp_Monocytes
     ... gating population fp_B_cells
     ... gating population fp_NK_cells
     ... gating population fp_Eosinophils
... sample 22112023_lineage_PB_M16_006.fcs
     ... gating population fp_T_cells
     ... gating population fp_Neutrophils
     ... gating population fp_Monocytes
     ... gating population fp_B_cells
     ... gating population fp_NK_cells
     ... gating population fp_Eosinophils
... sample 22112023_lineage_PB_M17_007.fcs
     ... gating population fp_T_cells
     ... gating population fp_Neutrophils
     ... gating population fp_Monocytes
     ... gating population fp_B_cells
     ... gating population fp_NK_cells
     ... gating population fp_Eosinophils
... sample 22112023_lineage_PB_M15_005.fcs
     ... gating population fp_T_cells
     ... gating population fp_Neutrophils
     ... gating population fp_Monocytes
     ... gating population fp_B_cells
     ... gating population fp_NK_cells
     ... gating population fp_Eosinophils
... sample 20112023_lineage_PB_M6_035.fcs
     ... gating population fp_T_cells
     ... gating population fp_Neutrophils
     ... gating population fp_Monocytes
     ... gating population fp_B_cells
     ... gating population fp_NK_cells
     ... gating population fp_Eosinophils
... sample 21112023_lineage_PB_M8_004.fcs
     ... gating population fp_T_cells
     ... gating population fp_Neutrophils
     ... gating population fp_Monocytes
     ... gating population fp_B_cells
     ... gating population fp_NK_cells
     ... gating population fp_Eosinophils
... sample 20112023_lineage_PB_M2_031.fcs
     ... gating population fp_T_cells
     ... gating population fp_Neutrophils
     ... gating population fp_Monocytes
     ... gating population fp_B_cells
     ... gating population fp_NK_cells
     ... gating population fp_Eosinophils
... sample 20112023_lineage_PB_M5_034.fcs
     ... gating population fp_T_cells
     ... gating population fp_Neutrophils
     ... gating population fp_Monocytes
     ... gating population fp_B_cells
     ... gating population fp_NK_cells
     ... gating population fp_Eosinophils
... sample 21112023_lineage_PB_M9_005.fcs
     ... gating population fp_T_cells
     ... gating population fp_Neutrophils
     ... gating population fp_Monocytes
     ... gating population fp_B_cells
     ... gating population fp_NK_cells
     ... gating population fp_Eosinophils
... sample 20112023_lineage_PB_M3_032.fcs
     ... gating population fp_T_cells
     ... gating population fp_Neutrophils
     ... gating population fp_Monocytes
     ... gating population fp_B_cells
     ... gating population fp_NK_cells
     ... gating population fp_Eosinophils
... sample 22112023_lineage_PB_M14_004.fcs
     ... gating population fp_T_cells
     ... gating population fp_Neutrophils
     ... gating population fp_Monocytes
     ... gating population fp_B_cells
     ... gating population fp_NK_cells
     ... gating population fp_Eosinophils
... sample 22112023_lineage_PB_M18_008.fcs
     ... gating population fp_T_cells
     ... gating population fp_Neutrophils
     ... gating population fp_Monocytes
     ... gating population fp_B_cells
     ... gating population fp_NK_cells
     ... gating population fp_Eosinophils
... sample 20112023_lineage_PB_M4_033.fcs
     ... gating population fp_T_cells
     ... gating population fp_Neutrophils
     ... gating population fp_Monocytes
     ... gating population fp_B_cells
     ... gating population fp_NK_cells
     ... gating population fp_Eosinophils
... sample 21112023_lineage_PB_M12_008.fcs
     ... gating population fp_T_cells
     ... gating population fp_Neutrophils
     ... gating population fp_Monocytes
     ... gating population fp_B_cells
     ... gating population fp_NK_cells
     ... gating population fp_Eosinophils
... sample 22112023_lineage_PB_M13_003.fcs
     ... gating population fp_T_cells
     ... gating population fp_Neutrophils
     ... gating population fp_Monocytes
     ... gating population fp_B_cells
     ... gating population fp_NK_cells
     ... gating population fp_Eosinophils
... sample 21112023_lineage_PB_M10_006.fcs
     ... gating population fp_T_cells
     ... gating population fp_Neutrophils
     ... gating population fp_Monocytes
     ... gating population fp_B_cells
     ... gating population fp_NK_cells
     ... gating population fp_Eosinophils
... sample 21112023_lineage_PB_M7_003.fcs
     ... gating population fp_T_cells
     ... gating population fp_Neutrophils
     ... gating population fp_Monocytes
     ... gating population fp_B_cells
     ... gating population fp_NK_cells
     ... gating population fp_Eosinophils

Data visualization

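To visualize the gated populations, we convert each gate into an .obs column using fp.convert_gate_to_obs(). We do this for both the manual gate and its unsupervised counterpart, which carries the fp_ prefix.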

[15]:
fp.convert_gate_to_obs(dataset, "Neutrophils")
fp.convert_gate_to_obs(dataset, "fp_Neutrophils")

Here, we use matplotlib to assemble the plots generated by FACSPy. Passing a matplotlib.Axes via ax together with show=False draws each plot into the given subplot and leaves the final layout and display to matplotlib.

Note that the manual and the unsupervised gating strategies match almost perfectly, and both label the Ly6G-positive cells.

[16]:
from matplotlib import pyplot as plt

fig, ax = plt.subplots(ncols = 3, nrows = 1, figsize = (8.5,2))
ax[0] = fp.pl.umap(
    dataset,
    color = "Neutrophils",
    show = False,
    ax = ax[0]
)
ax[0].set_title("manual gating")
ax[1] = fp.pl.umap(
    dataset,
    color = "fp_Neutrophils",
    show = False,
    ax = ax[1]
)
ax[1].set_title("unsupervised gating")
ax[2] = fp.pl.umap(
    dataset,
    color = "Ly6G",
    vmin = 0,
    vmax = 3,
    show = False,
    ax = ax[2]
)
ax[2].set_title("Ly6G expression")

plt.tight_layout()

plt.show()
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\scanpy\plotting\_tools\scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  cax = scatter(
../_images/vignettes_dataset_unsupervised_gating_23_1.png
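
The visual agreement between the manual and the unsupervised gate can also be expressed as a number. The following is a minimal sketch in plain Python, not part of FACSPy: the helper gate_agreement and the toy membership vectors are hypothetical and only illustrate the idea. It assumes each gate is available as one boolean per cell; how the .obs columns created above encode membership is not shown here, so that conversion is left out.

```python
def gate_agreement(manual, unsupervised):
    """Compare two boolean gate-membership sequences covering the same cells."""
    if len(manual) != len(unsupervised):
        raise ValueError("both label vectors must cover the same cells")

    pairs = list(zip(manual, unsupervised))
    both = sum(m and u for m, u in pairs)              # in both gates
    only_manual = sum(m and not u for m, u in pairs)   # missed by unsupervised gating
    only_unsup = sum(u and not m for m, u in pairs)    # extra cells from unsupervised gating

    union = both + only_manual + only_unsup
    n_unsup = both + only_unsup
    n_manual = both + only_manual
    return {
        # overlap relative to all cells that are in at least one gate
        "jaccard": both / union if union else 1.0,
        # fraction of unsupervised-gated cells that are also manually gated
        "precision": both / n_unsup if n_unsup else 0.0,
        # fraction of manually gated cells recovered by unsupervised gating
        "recall": both / n_manual if n_manual else 0.0,
    }


# Toy membership vectors (hypothetical, NOT taken from the dataset above)
manual_gate = [True, True, True, False, False, False, True, False]
unsup_gate = [True, True, False, False, False, True, True, False]

scores = gate_agreement(manual_gate, unsup_gate)
print(scores)
```

A Jaccard index close to 1 would correspond to the near-perfect overlap seen in the UMAP plots, while precision and recall indicate whether disagreements come from extra or from missing cells.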

We repeat the analysis for B cells.

[17]:
fp.convert_gate_to_obs(dataset, "B_cells")
fp.convert_gate_to_obs(dataset, "fp_B_cells")

fig, ax = plt.subplots(ncols = 3, nrows = 1, figsize = (8,2))
ax[0] = fp.pl.umap(
    dataset,
    color = "B_cells",
    show = False,
    ax = ax[0]
)
ax[0].set_title("manual gating")
ax[1] = fp.pl.umap(
    dataset,
    color = "fp_B_cells",
    show = False,
    ax = ax[1]
)
ax[1].set_title("unsupervised gating")
ax[2] = fp.pl.umap(
    dataset,
    color = "B220",
    vmin = 0,
    vmax = 3,
    show = False,
    ax = ax[2]
)
ax[2].set_title("B220 expression")

plt.tight_layout()

plt.show()
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\scanpy\plotting\_tools\scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  cax = scatter(
../_images/vignettes_dataset_unsupervised_gating_25_1.png

Lastly, we calculate the gate frequencies again and compare the results of manual and unsupervised gating.

[18]:
fp.tl.gate_frequencies(dataset)
[19]:
fp.pl.gate_frequency(dataset,
                     gate = "B_cells",
                     freq_of = "CD45+",
                     groupby = "staining",
                     stat_test = False)
No artists with labels found to put in legend.  Note that artists whose label start with an underscore are ignored when legend() is called with no argument.
../_images/vignettes_dataset_unsupervised_gating_28_1.png
[20]:
fp.pl.gate_frequency(dataset,
                     gate = "fp_B_cells",
                     freq_of = "CD45+",
                     groupby = "staining",
                     stat_test = False)
No artists with labels found to put in legend.  Note that artists whose label start with an underscore are ignored when legend() is called with no argument.
../_images/vignettes_dataset_unsupervised_gating_29_1.png
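
The comparison shown in the two plots boils down to simple arithmetic: the frequency of a gate is the number of cells in that gate divided by the number of cells in the reference population (here CD45+). The sketch below illustrates this in plain Python. The helper gate_frequency, the sample names, and all counts are hypothetical and are not read from the dataset or produced by fp.tl.gate_frequencies().

```python
def gate_frequency(n_gate, n_parent):
    """Fraction of reference-population cells that fall into the gate."""
    if n_parent == 0:
        return float("nan")  # frequency is undefined without reference cells
    return n_gate / n_parent


# Hypothetical counts per sample:
# (cells in CD45+, manually gated B cells, unsupervised-gated B cells)
counts = {
    "sample_A": (10_000, 4_200, 4_100),
    "sample_B": (8_000, 3_000, 3_120),
}

differences = {}
for sample, (n_parent, n_manual, n_unsup) in counts.items():
    freq_manual = gate_frequency(n_manual, n_parent)
    freq_unsup = gate_frequency(n_unsup, n_parent)
    # positive values mean the unsupervised gate contains more cells
    differences[sample] = freq_unsup - freq_manual
    print(
        f"{sample}: manual={freq_manual:.3f} "
        f"unsupervised={freq_unsup:.3f} "
        f"difference={differences[sample]:+.3f}"
    )
```

Small per-sample differences would indicate that both gating approaches yield comparable B cell frequencies.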
[21]:
fp.save_dataset(dataset,
                output_dir = "../../Tutorials/mouse_lineages/",
                file_name = "raw_dataset_gated",
                overwrite = True)
File saved successfully