Dataset Creation and Transformation#

In this vignette, we showcase a typical analysis workflow for cytometry data.

First, we assemble the necessary metadata, the panel information and the accompanying FlowJo workspace.

To transform the data, we use the automated calculation of the necessary cofactors. A cofactor is the value that separates the positive from the negative population in a specific channel. These values are used for the transformation itself as well as for the calculation of the frequency of positives (FOP).
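As a rough illustration of how a cofactor can double as a positivity threshold, the sketch below computes a frequency of positives for a single channel in plain Python. The function name `frequency_of_positives` and the intensity values are hypothetical and not part of FACSPy; the sketch assumes that an event counts as positive when its intensity exceeds the cofactor.

```python
def frequency_of_positives(intensities, cofactor):
    """Return the fraction of events whose intensity exceeds the cofactor."""
    if not intensities:
        raise ValueError("need at least one event")
    positives = sum(1 for value in intensities if value > cofactor)
    return positives / len(intensities)


# Hypothetical compensated intensities for one channel.
events = [-20.0, 15.0, 90.0, 450.0, 1200.0, 8000.0]

# Two of the six events lie above the assumed cofactor of 600.
fop = frequency_of_positives(events, cofactor=600.0)
print(f"FOP: {fop:.2f}")
```

The same threshold idea is why a badly placed cofactor matters: it shifts both the transformation and every positivity frequency derived from it.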

We start by importing the necessary libraries.

[1]:
import warnings
warnings.filterwarnings(
    action='ignore',
    category=FutureWarning
)
[2]:
import FACSPy as fp

Assemble the supplementary data#

We read the metadata table directly as a fp.dt.Metadata object. For further information on how to use this object, e.g. to change values or add columns, please refer to the respective vignette.

The panel information is similarly read into a fp.dt.Panel object.

Lastly, we import the FlowJo workspace as a fp.dt.FlowJoWorkspace object. It contains the compensation matrix as well as the gates we set manually.

[3]:
panel = fp.dt.Panel("../../Tutorials/mouse_lineages/panel.csv")
metadata = fp.dt.Metadata("../../Tutorials/mouse_lineages/metadata_bm.csv")
workspace = fp.dt.FlowJoWorkspace("../../Tutorials/mouse_lineages/lineages_full_gated_bm.wsp")

Finally, we create the dataset.

[4]:
dataset = fp.create_dataset(input_directory = "../../Tutorials/mouse_lineages",
                            panel = panel,
                            metadata = metadata,
                            workspace = workspace)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 4046, FSC-H: 39, FSC-W: 112, SSC-A: 699, SSC-H: 20, SSC-W: 39, BUV496-A: 4, BB700-A: 9
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 3756, FSC-H: 49, FSC-W: 115, SSC-A: 801, SSC-H: 25, SSC-W: 51, GFP-A: 2, APC-A: 1, APC-H7-A: 1, BV421-A: 5, BV510-A: 13, BV605-A: 6, BB700-A: 1
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 4804, FSC-H: 71, FSC-W: 145, SSC-A: 1005, SSC-H: 22, SSC-W: 49, APC-H7-A: 6, BV421-A: 11, BV510-A: 2541, BV605-A: 6760, BV711-A: 745, BV786-A: 43, BUV395-A: 21, BUV496-A: 129, BUV737-A: 7194, BYG790-A: 173, BB700-A: 9
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 6715, FSC-H: 96, FSC-W: 217, SSC-A: 1324, SSC-H: 33, SSC-W: 69, GFP-A: 5, APC-A: 8, APC-H7-A: 16, BV421-A: 3040, BV510-A: 7299, BV605-A: 857, BV711-A: 52, BV786-A: 43, BUV395-A: 140, BUV496-A: 7727, BUV737-A: 198, BYG790-A: 7, BB700-A: 23
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 5090, FSC-H: 85, FSC-W: 163, SSC-A: 1140, SSC-H: 25, SSC-W: 76, GFP-A: 5, APC-A: 9, APC-H7-A: 16, BV421-A: 2871, BV510-A: 6925, BV605-A: 948, BV711-A: 61, BV786-A: 33, BUV395-A: 185, BUV496-A: 7361, BUV737-A: 251, BYG790-A: 11, BB700-A: 22
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 3994, FSC-H: 57, FSC-W: 98, SSC-A: 760, SSC-H: 20, SSC-W: 39, GFP-A: 5, APC-A: 8, APC-H7-A: 17, BV421-A: 2139, BV510-A: 5646, BV605-A: 594, BV711-A: 36, BV786-A: 35, BUV395-A: 102, BUV496-A: 6027, BUV737-A: 139, BYG790-A: 8, BB700-A: 17
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 4634, FSC-H: 65, FSC-W: 181, SSC-A: 944, SSC-H: 29, SSC-W: 81, GFP-A: 2, APC-A: 12, APC-H7-A: 16, BV421-A: 2583, BV510-A: 6914, BV605-A: 797, BV711-A: 45, BV786-A: 28, BUV395-A: 152, BUV496-A: 7346, BUV737-A: 200, BYG790-A: 9, BB700-A: 15
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 5643, FSC-H: 62, FSC-W: 224, SSC-A: 970, SSC-H: 31, SSC-W: 83, GFP-A: 3, APC-A: 7, APC-H7-A: 11, BV421-A: 2252, BV510-A: 6162, BV605-A: 690, BV711-A: 40, BV786-A: 23, BUV395-A: 136, BUV496-A: 6566, BUV737-A: 183, BYG790-A: 8, BB700-A: 13
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 4643, FSC-H: 58, FSC-W: 121, SSC-A: 970, SSC-H: 18, SSC-W: 54, GFP-A: 1, BV605-A: 3, BUV395-A: 6, BUV496-A: 2
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 4288, FSC-H: 64, FSC-W: 168, SSC-A: 895, SSC-H: 13, SSC-W: 74, GFP-A: 1, BV605-A: 5, BV711-A: 15, BUV395-A: 3, BUV496-A: 1, BB700-A: 1
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 4153, FSC-H: 61, FSC-W: 170, SSC-A: 887, SSC-H: 16, SSC-W: 91, APC-H7-A: 10, BV421-A: 31, BV510-A: 2178, BV605-A: 7261, BV711-A: 652, BV786-A: 42, BUV395-A: 28, BUV496-A: 112, BUV737-A: 7836, BYG790-A: 187, BB700-A: 17
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 4591, FSC-H: 64, FSC-W: 121, SSC-A: 818, SSC-H: 12, SSC-W: 39, GFP-A: 1, APC-A: 12, APC-H7-A: 24, BV421-A: 1892, BV510-A: 5521, BV605-A: 541, BV711-A: 26, BV786-A: 26, BUV395-A: 85, BUV496-A: 5878, BUV737-A: 142, BYG790-A: 11, BB700-A: 22
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 4034, FSC-H: 79, FSC-W: 94, SSC-A: 909, SSC-H: 11, SSC-W: 44, APC-H7-A: 18, BV421-A: 40, BV510-A: 1980, BV605-A: 6409, BV711-A: 557, BV786-A: 54, BUV395-A: 41, BUV496-A: 111, BUV737-A: 6888, BYG790-A: 211, BB700-A: 19
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 5086, FSC-H: 82, FSC-W: 168, SSC-A: 1126, SSC-H: 21, SSC-W: 71, GFP-A: 2, APC-A: 28, APC-H7-A: 38, BV421-A: 2131, BV510-A: 6383, BV605-A: 666, BV711-A: 60, BV786-A: 52, BUV395-A: 133, BUV496-A: 6899, BUV737-A: 202, BYG790-A: 16, BB700-A: 32
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 4435, FSC-H: 73, FSC-W: 145, SSC-A: 1068, SSC-H: 12, SSC-W: 65, GFP-A: 3, APC-A: 15, APC-H7-A: 31, BV421-A: 2079, BV510-A: 6417, BV605-A: 676, BV711-A: 56, BV786-A: 33, BUV395-A: 147, BUV496-A: 6982, BUV737-A: 222, BYG790-A: 8, BB700-A: 32
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 4201, FSC-H: 59, FSC-W: 152, SSC-A: 920, SSC-H: 15, SSC-W: 67, APC-H7-A: 11, BV421-A: 28, BV510-A: 1880, BV605-A: 6092, BV711-A: 585, BV786-A: 48, BUV395-A: 31, BUV496-A: 106, BUV737-A: 6607, BYG790-A: 178, BB700-A: 10
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 5607, FSC-H: 64, FSC-W: 311, SSC-A: 1099, SSC-H: 30, SSC-W: 139, GFP-A: 4, BV605-A: 7, BUV395-A: 19, BUV496-A: 4
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 4807, FSC-H: 66, FSC-W: 220, SSC-A: 872, SSC-H: 15, SSC-W: 98, GFP-A: 1, BV605-A: 7, BUV395-A: 20, BUV496-A: 4
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 3448, FSC-H: 54, FSC-W: 127, SSC-A: 831, SSC-H: 14, SSC-W: 59, GFP-A: 1, APC-A: 24, APC-H7-A: 37, BV421-A: 1920, BV510-A: 6340, BV605-A: 557, BV711-A: 51, BV786-A: 42, BUV395-A: 131, BUV496-A: 6724, BUV737-A: 190, BYG790-A: 14, BB700-A: 28
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 4579, FSC-H: 78, FSC-W: 188, SSC-A: 860, SSC-H: 13, SSC-W: 68, GFP-A: 1, APC-A: 20, APC-H7-A: 26, BV421-A: 1953, BV510-A: 6281, BV605-A: 527, BV711-A: 23, BV786-A: 27, BUV395-A: 107, BUV496-A: 6656, BUV737-A: 162, BYG790-A: 12, BB700-A: 19
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 4401, FSC-H: 64, FSC-W: 169, SSC-A: 871, SSC-H: 16, SSC-W: 68, APC-H7-A: 20, BV421-A: 27, BV510-A: 2072, BV605-A: 6955, BV711-A: 493, BV786-A: 47, BUV395-A: 33, BUV496-A: 112, BUV737-A: 7429, BYG790-A: 161, BB700-A: 10
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 4958, FSC-H: 78, FSC-W: 184, SSC-A: 1258, SSC-H: 27, SSC-W: 82, APC-H7-A: 20, BV421-A: 29, BV510-A: 2327, BV605-A: 8027, BV711-A: 749, BV786-A: 50, BUV395-A: 41, BUV496-A: 123, BUV737-A: 8564, BYG790-A: 203, BB700-A: 12
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 5720, FSC-H: 82, FSC-W: 300, SSC-A: 1419, SSC-H: 30, SSC-W: 127, GFP-A: 1, APC-A: 10, APC-H7-A: 20, BV421-A: 2471, BV510-A: 7677, BV605-A: 837, BV711-A: 43, BV786-A: 30, BUV395-A: 183, BUV496-A: 8155, BUV737-A: 252, BYG790-A: 3, BB700-A: 14
  warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 5499, FSC-H: 68, FSC-W: 216, SSC-A: 1251, SSC-H: 22, SSC-W: 89, GFP-A: 1, APC-A: 11, APC-H7-A: 20, BV421-A: 1873, BV510-A: 6009, BV605-A: 608, BV711-A: 22, BV786-A: 35, BUV395-A: 123, BUV496-A: 6303, BUV737-A: 167, BYG790-A: 7, BB700-A: 14
  warnings.warn(self.message, UserWarning)
... gating sample 20112023_lineage_BM_Cre_neg_unstained_037.fcs
... gating sample 20112023_lineage_BM_Cre_pos_unstained_036.fcs
... gating sample 20112023_lineage_BM_M1_038.fcs
... gating sample 20112023_lineage_BM_M2_039.fcs
... gating sample 20112023_lineage_BM_M3_040.fcs
... gating sample 20112023_lineage_BM_M4_041.fcs
... gating sample 20112023_lineage_BM_M5_042.fcs
... gating sample 20112023_lineage_BM_M6_043.fcs
... gating sample 21112023_lineage_BM_Cre_neg_unstained_010.fcs
... gating sample 21112023_lineage_BM_Cre_pos_unstained_009.fcs
... gating sample 21112023_lineage_BM_M10_014.fcs
... gating sample 21112023_lineage_BM_M11_015.fcs
... gating sample 21112023_lineage_BM_M12_016.fcs
... gating sample 21112023_lineage_BM_M7_011.fcs
... gating sample 21112023_lineage_BM_M8_012.fcs
... gating sample 21112023_lineage_BM_M9_013.fcs
... gating sample 22112023_lineage_BM_Cre_neg_unstained_010.fcs
... gating sample 22112023_lineage_BM_Cre_pos_unstained_009.fcs
... gating sample 22112023_lineage_BM_M13_011.fcs
... gating sample 22112023_lineage_BM_M14_012.fcs
... gating sample 22112023_lineage_BM_M15_013.fcs
... gating sample 22112023_lineage_BM_M16_014.fcs
... gating sample 22112023_lineage_BM_M17_015.fcs
... gating sample 22112023_lineage_BM_M18_016.fcs
... compensating sample 20112023_lineage_BM_Cre_neg_unstained_037.fcs
... compensating sample 20112023_lineage_BM_Cre_pos_unstained_036.fcs
... compensating sample 20112023_lineage_BM_M1_038.fcs
... compensating sample 20112023_lineage_BM_M2_039.fcs
... compensating sample 20112023_lineage_BM_M3_040.fcs
... compensating sample 20112023_lineage_BM_M4_041.fcs
... compensating sample 20112023_lineage_BM_M5_042.fcs
... compensating sample 20112023_lineage_BM_M6_043.fcs
... compensating sample 21112023_lineage_BM_Cre_neg_unstained_010.fcs
... compensating sample 21112023_lineage_BM_Cre_pos_unstained_009.fcs
... compensating sample 21112023_lineage_BM_M10_014.fcs
... compensating sample 21112023_lineage_BM_M11_015.fcs
... compensating sample 21112023_lineage_BM_M12_016.fcs
... compensating sample 21112023_lineage_BM_M7_011.fcs
... compensating sample 21112023_lineage_BM_M8_012.fcs
... compensating sample 21112023_lineage_BM_M9_013.fcs
... compensating sample 22112023_lineage_BM_Cre_neg_unstained_010.fcs
... compensating sample 22112023_lineage_BM_Cre_pos_unstained_009.fcs
... compensating sample 22112023_lineage_BM_M13_011.fcs
... compensating sample 22112023_lineage_BM_M14_012.fcs
... compensating sample 22112023_lineage_BM_M15_013.fcs
... compensating sample 22112023_lineage_BM_M16_014.fcs
... compensating sample 22112023_lineage_BM_M17_015.fcs
... compensating sample 22112023_lineage_BM_M18_016.fcs
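The UserWarning output above reports, per channel, how many events exceeded the channel range (PnR) and were truncated. The sketch below shows what such a truncation amounts to, assuming it simply clips every value above the range to the range itself. The helper `truncate_to_range`, the FSC-A values and the range of 262144 are hypothetical and do not reproduce FACSPy's internals.

```python
def truncate_to_range(values, pnr):
    """Clip values at the channel range and count how many were above it."""
    n_exceeding = sum(1 for value in values if value > pnr)
    clipped = [min(value, pnr) for value in values]
    return clipped, n_exceeding


# Hypothetical FSC-A values with an assumed channel range of 262144.
fsc_a = [1200.0, 262143.0, 262144.0, 300000.0, 410000.5]

clipped, n_exceeding = truncate_to_range(fsc_a, pnr=262144.0)
print(f"{n_exceeding} events exceeded the range; maximum is now {max(clipped)}")
```

Truncated events pile up at the upper edge of the channel, which is worth keeping in mind for channels with thousands of affected events.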

We obtain a dataset consisting of 3,212,862 cells and 20 channels.

For the specifics on how an AnnData object is structured, please refer to the vignette ‘The FACSPy dataset’.

[5]:
dataset
[5]:
AnnData object with n_obs × n_vars = 3212862 × 20
    obs: 'sample_ID', 'file_name', 'organ', 'genotype', 'sex', 'experiment', 'age'
    var: 'pns', 'png', 'pne', 'pnr', 'type', 'pnn'
    uns: 'metadata', 'panel', 'workspace', 'gating_cols', 'dataset_status_hash'
    obsm: 'gating'
    layers: 'compensated'

Prepare cofactor calculation#

In order to perform the cofactor calculation, we need to specify which samples are stained. We do this by creating a column called ‘staining’ in the metadata.

If there are no control samples, mark every file as stained.

Although this column can already be specified in Excel or a similar program, we do it programmatically in Python.

[6]:
metadata_frame = dataset.uns["metadata"].to_df()

stained_files = [file for file in metadata_frame["file_name"] if "unstained" not in file]
unstained_files = [file for file in metadata_frame["file_name"] if "unstained" in file]
[7]:
metadata = dataset.uns["metadata"]

metadata.annotate(file_names = stained_files, column = "staining", value = "stained")
metadata.annotate(file_names = unstained_files, column = "staining", value = "unstained")

metadata_frame = dataset.uns["metadata"].to_df()
metadata_frame.head()
[7]:
   sample_ID  file_name                                      organ  genotype  sex  experiment  age  staining
0  1          20112023_lineage_BM_Cre_neg_unstained_037.fcs  BM     neg       m    1           95   unstained
1  2          20112023_lineage_BM_Cre_pos_unstained_036.fcs  BM     pos       m    1           95   unstained
2  3          20112023_lineage_BM_M1_038.fcs                 BM     pos       f    1           95   stained
3  4          20112023_lineage_BM_M2_039.fcs                 BM     neg       f    1           95   stained
4  5          20112023_lineage_BM_M3_040.fcs                 BM     pos       f    1           95   stained
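Because the staining label is derived purely from a substring in the file name, a misspelled or differently cased file name would silently count as stained. The following sketch runs the same kind of split on the five file names shown above and counts the labels as a quick sanity check. The helper `staining_label` is hypothetical, and the lower-casing is an addition that the cells above do not perform.

```python
from collections import Counter

# File names copied from the metadata table shown in this vignette.
file_names = [
    "20112023_lineage_BM_Cre_neg_unstained_037.fcs",
    "20112023_lineage_BM_Cre_pos_unstained_036.fcs",
    "20112023_lineage_BM_M1_038.fcs",
    "20112023_lineage_BM_M2_039.fcs",
    "20112023_lineage_BM_M3_040.fcs",
]


def staining_label(file_name, token="unstained"):
    """Label a file 'unstained' if the token occurs in its name (case-insensitive)."""
    return "unstained" if token in file_name.lower() else "stained"


labels = {name: staining_label(name) for name in file_names}

# Counting the labels makes an unexpected split easy to spot.
counts = Counter(labels.values())
print(counts)
```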

FACSPy implements a synchronization module in order to transfer metadata to the .obs slot and vice versa. Here, we use the synchronization to transfer the respective staining information to the .obs slot.

For further information about synchronization of supplementary files in FACSPy, please refer to the respective vignette.

[8]:
fp.sync.synchronize_dataset(dataset)
dataset.obs.head()
Found modified subsets: ['metadata_columns']
        ... synchronizing dataset to contain columns of the metadata object
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:12: UserWarning: It was detected that the dataset was modified.Please make sure that the performed analyses are still valid. Note that if you removed whole samples, mfi/fop calculations will not be affected.
  warnings.warn(message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\synchronization\_synchronize.py:106: DataModificationWarning: 'It was detected that the dataset was modified.Please make sure that the performed analyses are still valid. Note that if you removed whole samples, mfi/fop calculations will not be affected.'
  warnings.warn('', DataModificationWarning)
[8]:
OBS_INDEX  staining   sample_ID  file_name                                      organ  genotype  sex  experiment  age
0-0        unstained  1          20112023_lineage_BM_Cre_neg_unstained_037.fcs  BM     neg       m    1           95
1-0        unstained  1          20112023_lineage_BM_Cre_neg_unstained_037.fcs  BM     neg       m    1           95
2-0        unstained  1          20112023_lineage_BM_Cre_neg_unstained_037.fcs  BM     neg       m    1           95
3-0        unstained  1          20112023_lineage_BM_Cre_neg_unstained_037.fcs  BM     neg       m    1           95
4-0        unstained  1          20112023_lineage_BM_Cre_neg_unstained_037.fcs  BM     neg       m    1           95
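Conceptually, transferring a metadata column to the .obs slot means repeating one value per sample for every cell of that sample. The sketch below shows this broadcast with plain dictionaries and lists. It is only an illustration of the idea under that assumption, not FACSPy's synchronization code, and all names and values in it are hypothetical.

```python
# Sample-level metadata: one entry per file.
metadata_rows = [
    {"sample_ID": 1, "staining": "unstained"},
    {"sample_ID": 3, "staining": "stained"},
]

# Cell-level annotation: every cell carries the ID of its sample.
cell_sample_ids = [1, 1, 1, 3, 3]

# Build a lookup from sample to staining status ...
staining_by_sample = {row["sample_ID"]: row["staining"] for row in metadata_rows}

# ... and repeat the sample-level value for every cell of that sample.
cell_staining = [staining_by_sample[sample_id] for sample_id in cell_sample_ids]
print(cell_staining)
```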

Cofactor calculation#

Next, we calculate the cofactors.

[9]:
fp.dt.calculate_cofactors(dataset)
... calculating cofactors
    ... sample 20112023_lineage_BM_M1_038.fcs
    ... sample 20112023_lineage_BM_M2_039.fcs
    ... sample 20112023_lineage_BM_M3_040.fcs
    ... sample 20112023_lineage_BM_M4_041.fcs
    ... sample 20112023_lineage_BM_M5_042.fcs
    ... sample 20112023_lineage_BM_M6_043.fcs
    ... sample 21112023_lineage_BM_M10_014.fcs
    ... sample 21112023_lineage_BM_M11_015.fcs
    ... sample 21112023_lineage_BM_M12_016.fcs
    ... sample 21112023_lineage_BM_M7_011.fcs
    ... sample 21112023_lineage_BM_M8_012.fcs
    ... sample 21112023_lineage_BM_M9_013.fcs
    ... sample 22112023_lineage_BM_M13_011.fcs
    ... sample 22112023_lineage_BM_M14_012.fcs
    ... sample 22112023_lineage_BM_M15_013.fcs
    ... sample 22112023_lineage_BM_M16_014.fcs
    ... sample 22112023_lineage_BM_M17_015.fcs
    ... sample 22112023_lineage_BM_M18_016.fcs

Note that the .uns slot now contains two additional entries: ‘cofactors’, which holds the cofactors as a CofactorTable, and ‘raw_cofactors’, which holds the cofactors of each individual sample per channel.

[10]:
dataset
[10]:
AnnData object with n_obs × n_vars = 3212862 × 20
    obs: 'staining', 'sample_ID', 'file_name', 'organ', 'genotype', 'sex', 'experiment', 'age'
    var: 'pns', 'png', 'pne', 'pnr', 'type', 'pnn'
    uns: 'metadata', 'panel', 'workspace', 'gating_cols', 'dataset_status_hash', 'cofactors', 'raw_cofactors'
    obsm: 'gating'
    layers: 'compensated'
[11]:
dataset.uns["cofactors"].to_df()
[11]:
    fcs_colname     cofactors
0   GFP            607.857475
1   B220           643.436602
2   CD4            275.745685
3   Siglec-F      5713.021257
4   CD8          17884.975774
5   Ly6C           566.603132
6   NK1.1          197.031074
7   CD11b          484.784756
8   Ly6G           131.867339
9   DAPI          1871.729492
10  CD3           1830.410394
11  F4_80         1298.097604
12  CD45           785.022914
[12]:
dataset.uns["raw_cofactors"].head()
[12]:
                                GFP           B220          CD4           Siglec-F      CD8           Ly6C          NK1.1         CD11b         Ly6G          DAPI          CD3           F4_80         CD45
20112023_lineage_BM_M1_038.fcs  562.877813    697.245258    278.692780    5464.259489   18628.525714  591.509488    161.599662    521.184098    121.961490    1899.471546   1493.340853   30.286997     829.378594
20112023_lineage_BM_M2_039.fcs  548.994623    993.703762    327.034327    6091.917988   21313.777311  749.728180    189.301114    574.892001    108.630986    1773.316280   2852.789892   30.377886     771.596898
20112023_lineage_BM_M3_040.fcs  565.798075    979.276545    291.197758    6496.297498   19104.054043  674.887101    187.688078    574.650474    132.430131    1898.719574   2702.682747   30.329980     818.261753
20112023_lineage_BM_M4_041.fcs  825.932949    703.641388    312.354135    6456.971936   13622.016678  522.266000    133.188697    427.904056    156.770022    1803.676715   1453.757090   29.042495     780.603105
20112023_lineage_BM_M5_042.fcs  416.758882    695.864707    338.981557    5022.346330   12492.914543  500.842279    158.653363    460.474069    175.886829    1760.818980   1410.623555   10948.910519  740.388522
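The raw cofactors can differ considerably between samples, as the F4_80 column above shows. One way to spot such channels is to compare the mean and the median of the per-sample values and to flag samples far away from the median. The sketch below does this for the five F4_80 values displayed above. The tenfold cut-off is an arbitrary assumption, and the vignette does not state how FACSPy itself aggregates the raw cofactors.

```python
from statistics import mean, median

# F4_80 raw cofactors of the five samples displayed above.
f4_80 = {
    "20112023_lineage_BM_M1_038.fcs": 30.286997,
    "20112023_lineage_BM_M2_039.fcs": 30.377886,
    "20112023_lineage_BM_M3_040.fcs": 30.329980,
    "20112023_lineage_BM_M4_041.fcs": 29.042495,
    "20112023_lineage_BM_M5_042.fcs": 10948.910519,
}

center_mean = mean(f4_80.values())
center_median = median(f4_80.values())

# Flag samples that deviate more than tenfold from the median (arbitrary cut-off).
outliers = [
    name
    for name, value in f4_80.items()
    if value > 10 * center_median or value < center_median / 10
]

print(f"mean: {center_mean:.2f}, median: {center_median:.2f}")
print("divergent samples:", outliers)
```

A large gap between mean and median is a hint that the channel deserves a look in the cofactor distribution and transformation plots.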

We can use FACSPy plotting to visualize the results.

[13]:
fp.pl.cofactor_distribution(dataset,
                            marker = "Ly6G",
                            groupby = "staining",
                            stat_test = False)
No artists with labels found to put in legend.  Note that artists whose label start with an underscore are ignored when legend() is called with no argument.
../_images/vignettes_dataset_creation_and_transformation_21_1.png

Transformation#

Next, we transform the data using the arcsinh transformation and the newly calculated cofactors. The result is stored in a new layer called ‘transformed’.

[14]:
fp.dt.transform(dataset,
                transform = "asinh",
                cofactor_table = dataset.uns["cofactors"],
                key_added = "transformed",
                layer = "compensated")
dataset
[14]:
AnnData object with n_obs × n_vars = 3212862 × 20
    obs: 'staining', 'sample_ID', 'file_name', 'organ', 'genotype', 'sex', 'experiment', 'age'
    var: 'pns', 'png', 'pne', 'pnr', 'type', 'pnn', 'cofactors'
    uns: 'metadata', 'panel', 'workspace', 'gating_cols', 'dataset_status_hash', 'cofactors', 'raw_cofactors'
    obsm: 'gating'
    layers: 'compensated', 'transformed'
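To see what the arcsinh transformation does to individual values, the sketch below applies the common cytometry convention asinh(x / cofactor) to a few hypothetical intensities, using the Ly6G cofactor from the cofactor table. That FACSPy scales by the cofactor in exactly this way is an assumption here. The function `asinh_transform` is written for illustration only.

```python
import math


def asinh_transform(value, cofactor):
    """Scale the value by the cofactor, then apply the inverse hyperbolic sine."""
    return math.asinh(value / cofactor)


ly6g_cofactor = 131.867339  # Ly6G cofactor from the cofactor table

# Hypothetical raw intensities: negative, zero, at the cofactor, 10x and 100x above it.
for raw in (-50.0, 0.0, ly6g_cofactor, 10 * ly6g_cofactor, 100 * ly6g_cofactor):
    print(f"{raw:>12.2f} -> {asinh_transform(raw, ly6g_cofactor):.3f}")
```

Under this convention, an event sitting exactly at the cofactor maps to asinh(1), values well below the cofactor are scaled almost linearly, and values far above it are compressed roughly logarithmically.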

Transformation visualization#

We visualize the transformation result via the transformation plot. Control samples are depicted in blue while the sample of interest (in this case the sample with sample_ID ‘3’) is displayed in red.

The left plot shows the untransformed raw data on a bi-exponential scale.

The middle plot shows a dot plot of the transformed values. The green line represents the cofactor, which marks the boundary between the negative and the positive population.

The right plot shows the transformed values as a histogram.

Note that we have two positive populations of Ly6C, corresponding to Ly6C int Neutrophils and Ly6C hi Monocytes. The cofactor calculation looks very reasonable.

[15]:
fp.pl.transformation_plot(dataset,
                          gate = "CD45+",
                          sample_identifier = "3",
                          marker = "Ly6G",
                          figsize = (12,3))
../_images/vignettes_dataset_creation_and_transformation_25_0.png

Setting cofactors manually#

Note that the calculated cofactor for CD8 (about 17,885) is far higher than for any other channel.

Upon closer inspection, we notice that the cofactors of the individual samples diverge strongly.

[16]:
fp.pl.cofactor_distribution(dataset,
                            marker = "CD8",
                            groupby = "staining",
                            stat_test = False)
No artists with labels found to put in legend.  Note that artists whose label start with an underscore are ignored when legend() is called with no argument.
../_images/vignettes_dataset_creation_and_transformation_27_1.png

In the transformation plot, we notice that the cofactor was set too high. A value between 5000 and 7000 would be more appropriate.

[17]:
fp.pl.transformation_plot(dataset,
                          gate = "live",
                          sample_identifier = "3",
                          marker = "CD8",
                          figsize = (12,3))
../_images/vignettes_dataset_creation_and_transformation_29_0.png

To set the cofactor manually, we use the set_cofactor method of the cofactor table object and transform the dataset again.

[18]:
dataset.uns["cofactors"].set_cofactor("CD8", 6000)

fp.dt.transform(dataset,
                transform = "asinh",
                cofactor_table = dataset.uns["cofactors"],
                key_added = "transformed")
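The effect of lowering the CD8 cofactor can be estimated for a single event before looking at the plot. The sketch below compares the transformed value of one hypothetical CD8 intensity under the calculated cofactor and under the manual value of 6000, again assuming the convention asinh(x / cofactor). The raw intensity of 10000 is made up for illustration.

```python
import math

old_cofactor = 17884.975774  # automatically calculated CD8 cofactor (cofactor table)
new_cofactor = 6000.0        # manually chosen value

# Transformed value of an event that sits exactly at the cofactor.
threshold = math.asinh(1.0)

raw_event = 10000.0  # hypothetical CD8 intensity
old_value = math.asinh(raw_event / old_cofactor)
new_value = math.asinh(raw_event / new_cofactor)

print(f"old cofactor: {old_value:.3f} (positive: {old_value > threshold})")
print(f"new cofactor: {new_value:.3f} (positive: {new_value > threshold})")
```

An event like this one falls on the negative side of the old cofactor and on the positive side of the new one, which is the kind of shift the manual correction is meant to achieve.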

We visualize the transformation again and observe that the result is much better.

[19]:
fp.pl.transformation_plot(dataset,
                          gate = "live",
                          sample_identifier = "3",
                          marker = "CD8",
                          figsize = (12,3))
../_images/vignettes_dataset_creation_and_transformation_33_0.png

Other transformation methods#

FACSPy implements log, logicle, hyperlog and asinh transforms.

These can be accessed via the ‘transform’ parameter in fp.dt.transform(). The key_added parameter controls the name of the corresponding layer…

[20]:
fp.dt.transform(dataset,
                transform = "logicle",
                key_added = "logicle")
dataset
[20]:
AnnData object with n_obs × n_vars = 3212862 × 20
    obs: 'staining', 'sample_ID', 'file_name', 'organ', 'genotype', 'sex', 'experiment', 'age'
    var: 'pns', 'png', 'pne', 'pnr', 'type', 'pnn', 'cofactors'
    uns: 'metadata', 'panel', 'workspace', 'gating_cols', 'dataset_status_hash', 'cofactors', 'raw_cofactors'
    obsm: 'gating'
    layers: 'compensated', 'transformed', 'logicle'

… which can later be used by analysis and visualization functions.

[21]:
fp.settings.default_gate = "CD45+"

fp.tl.pca(dataset, layer = "transformed")
fp.pl.pca(dataset, layer = "transformed", color = "Ly6G", vmin = 0, vmax = 5)
../_images/vignettes_dataset_creation_and_transformation_37_0.png
[22]:
fp.settings.default_gate = "CD45+"

fp.tl.pca(dataset, layer = "logicle")
fp.pl.pca(dataset, layer = "logicle", color = "Ly6G", vmin = 0, vmax = 0.8)
../_images/vignettes_dataset_creation_and_transformation_38_0.png
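One practical difference between these transforms is how they treat zero and negative values, which are common in compensated data. A plain logarithm is undefined there, while asinh, like the logicle and hyperlog scales, stays defined. The sketch below shows the purely mathematical difference for a few hypothetical values. The helpers `safe_log10` and `asinh_scaled` and the cofactor of 150 are assumptions for illustration and say nothing about how FACSPy's log transform handles such values.

```python
import math


def safe_log10(value):
    """Return log10(value), or None where the logarithm is undefined."""
    return math.log10(value) if value > 0 else None


def asinh_scaled(value, cofactor=150.0):
    """Inverse hyperbolic sine after scaling by an assumed cofactor."""
    return math.asinh(value / cofactor)


# Hypothetical compensated intensities, including a negative value and zero.
values = [-300.0, 0.0, 50.0, 5000.0]

log_values = [safe_log10(value) for value in values]
asinh_values = [asinh_scaled(value) for value in values]

for raw, logged, arc in zip(values, log_values, asinh_values):
    print(f"{raw:>8.1f}  log10: {logged}  asinh: {arc:.3f}")
```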

Save the dataset#

Finally, we save the dataset to disk.

[23]:
fp.save_dataset(dataset,
                output_dir = "../../Tutorials/mouse_lineages",
                file_name = "raw_dataset",
                overwrite = True)
File saved successfully