Dataset Creation and Transformation#
In this vignette, we showcase a typical analysis workflow for cytometry data.
First, we assemble the necessary metadata, the panel information and the accompanying FlowJo workspace.
To transform the data, we use the automated calculation of the necessary cofactors. A cofactor is the value that separates the positive from the negative population in a given channel. These values are used for the transformation itself as well as for the calculation of the frequency of positives (FOP).
We start by importing the necessary libraries.
[1]:
import warnings
warnings.filterwarnings(
action='ignore',
category=FutureWarning
)
[2]:
import FACSPy as fp
Assemble the supplementary data#
We read the metadata table directly as a fp.dt.Metadata object. For further information on how to use this object, e.g. to change values or add columns, please refer to the respective vignette.
The panel information is similarly read into a fp.dt.Panel object.
Lastly, we import the FlowJo workspace as a fp.dt.FlowJoWorkspace object. It contains the compensation matrix as well as the gating we set manually.
[3]:
panel = fp.dt.Panel("../../Tutorials/mouse_lineages/panel.csv")
metadata = fp.dt.Metadata("../../Tutorials/mouse_lineages/metadata_bm.csv")
workspace = fp.dt.FlowJoWorkspace("../../Tutorials/mouse_lineages/lineages_full_gated_bm.wsp")
Finally, we create the dataset.
[4]:
dataset = fp.create_dataset(input_directory = "../../Tutorials/mouse_lineages",
panel = panel,
metadata = metadata,
workspace = workspace)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 4046, FSC-H: 39, FSC-W: 112, SSC-A: 699, SSC-H: 20, SSC-W: 39, BUV496-A: 4, BB700-A: 9
warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 3756, FSC-H: 49, FSC-W: 115, SSC-A: 801, SSC-H: 25, SSC-W: 51, GFP-A: 2, APC-A: 1, APC-H7-A: 1, BV421-A: 5, BV510-A: 13, BV605-A: 6, BB700-A: 1
warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 4804, FSC-H: 71, FSC-W: 145, SSC-A: 1005, SSC-H: 22, SSC-W: 49, APC-H7-A: 6, BV421-A: 11, BV510-A: 2541, BV605-A: 6760, BV711-A: 745, BV786-A: 43, BUV395-A: 21, BUV496-A: 129, BUV737-A: 7194, BYG790-A: 173, BB700-A: 9
warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 6715, FSC-H: 96, FSC-W: 217, SSC-A: 1324, SSC-H: 33, SSC-W: 69, GFP-A: 5, APC-A: 8, APC-H7-A: 16, BV421-A: 3040, BV510-A: 7299, BV605-A: 857, BV711-A: 52, BV786-A: 43, BUV395-A: 140, BUV496-A: 7727, BUV737-A: 198, BYG790-A: 7, BB700-A: 23
warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 5090, FSC-H: 85, FSC-W: 163, SSC-A: 1140, SSC-H: 25, SSC-W: 76, GFP-A: 5, APC-A: 9, APC-H7-A: 16, BV421-A: 2871, BV510-A: 6925, BV605-A: 948, BV711-A: 61, BV786-A: 33, BUV395-A: 185, BUV496-A: 7361, BUV737-A: 251, BYG790-A: 11, BB700-A: 22
warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 3994, FSC-H: 57, FSC-W: 98, SSC-A: 760, SSC-H: 20, SSC-W: 39, GFP-A: 5, APC-A: 8, APC-H7-A: 17, BV421-A: 2139, BV510-A: 5646, BV605-A: 594, BV711-A: 36, BV786-A: 35, BUV395-A: 102, BUV496-A: 6027, BUV737-A: 139, BYG790-A: 8, BB700-A: 17
warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 4634, FSC-H: 65, FSC-W: 181, SSC-A: 944, SSC-H: 29, SSC-W: 81, GFP-A: 2, APC-A: 12, APC-H7-A: 16, BV421-A: 2583, BV510-A: 6914, BV605-A: 797, BV711-A: 45, BV786-A: 28, BUV395-A: 152, BUV496-A: 7346, BUV737-A: 200, BYG790-A: 9, BB700-A: 15
warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 5643, FSC-H: 62, FSC-W: 224, SSC-A: 970, SSC-H: 31, SSC-W: 83, GFP-A: 3, APC-A: 7, APC-H7-A: 11, BV421-A: 2252, BV510-A: 6162, BV605-A: 690, BV711-A: 40, BV786-A: 23, BUV395-A: 136, BUV496-A: 6566, BUV737-A: 183, BYG790-A: 8, BB700-A: 13
warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 4643, FSC-H: 58, FSC-W: 121, SSC-A: 970, SSC-H: 18, SSC-W: 54, GFP-A: 1, BV605-A: 3, BUV395-A: 6, BUV496-A: 2
warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 4288, FSC-H: 64, FSC-W: 168, SSC-A: 895, SSC-H: 13, SSC-W: 74, GFP-A: 1, BV605-A: 5, BV711-A: 15, BUV395-A: 3, BUV496-A: 1, BB700-A: 1
warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 4153, FSC-H: 61, FSC-W: 170, SSC-A: 887, SSC-H: 16, SSC-W: 91, APC-H7-A: 10, BV421-A: 31, BV510-A: 2178, BV605-A: 7261, BV711-A: 652, BV786-A: 42, BUV395-A: 28, BUV496-A: 112, BUV737-A: 7836, BYG790-A: 187, BB700-A: 17
warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 4591, FSC-H: 64, FSC-W: 121, SSC-A: 818, SSC-H: 12, SSC-W: 39, GFP-A: 1, APC-A: 12, APC-H7-A: 24, BV421-A: 1892, BV510-A: 5521, BV605-A: 541, BV711-A: 26, BV786-A: 26, BUV395-A: 85, BUV496-A: 5878, BUV737-A: 142, BYG790-A: 11, BB700-A: 22
warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 4034, FSC-H: 79, FSC-W: 94, SSC-A: 909, SSC-H: 11, SSC-W: 44, APC-H7-A: 18, BV421-A: 40, BV510-A: 1980, BV605-A: 6409, BV711-A: 557, BV786-A: 54, BUV395-A: 41, BUV496-A: 111, BUV737-A: 6888, BYG790-A: 211, BB700-A: 19
warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 5086, FSC-H: 82, FSC-W: 168, SSC-A: 1126, SSC-H: 21, SSC-W: 71, GFP-A: 2, APC-A: 28, APC-H7-A: 38, BV421-A: 2131, BV510-A: 6383, BV605-A: 666, BV711-A: 60, BV786-A: 52, BUV395-A: 133, BUV496-A: 6899, BUV737-A: 202, BYG790-A: 16, BB700-A: 32
warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 4435, FSC-H: 73, FSC-W: 145, SSC-A: 1068, SSC-H: 12, SSC-W: 65, GFP-A: 3, APC-A: 15, APC-H7-A: 31, BV421-A: 2079, BV510-A: 6417, BV605-A: 676, BV711-A: 56, BV786-A: 33, BUV395-A: 147, BUV496-A: 6982, BUV737-A: 222, BYG790-A: 8, BB700-A: 32
warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 4201, FSC-H: 59, FSC-W: 152, SSC-A: 920, SSC-H: 15, SSC-W: 67, APC-H7-A: 11, BV421-A: 28, BV510-A: 1880, BV605-A: 6092, BV711-A: 585, BV786-A: 48, BUV395-A: 31, BUV496-A: 106, BUV737-A: 6607, BYG790-A: 178, BB700-A: 10
warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 5607, FSC-H: 64, FSC-W: 311, SSC-A: 1099, SSC-H: 30, SSC-W: 139, GFP-A: 4, BV605-A: 7, BUV395-A: 19, BUV496-A: 4
warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 4807, FSC-H: 66, FSC-W: 220, SSC-A: 872, SSC-H: 15, SSC-W: 98, GFP-A: 1, BV605-A: 7, BUV395-A: 20, BUV496-A: 4
warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 3448, FSC-H: 54, FSC-W: 127, SSC-A: 831, SSC-H: 14, SSC-W: 59, GFP-A: 1, APC-A: 24, APC-H7-A: 37, BV421-A: 1920, BV510-A: 6340, BV605-A: 557, BV711-A: 51, BV786-A: 42, BUV395-A: 131, BUV496-A: 6724, BUV737-A: 190, BYG790-A: 14, BB700-A: 28
warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 4579, FSC-H: 78, FSC-W: 188, SSC-A: 860, SSC-H: 13, SSC-W: 68, GFP-A: 1, APC-A: 20, APC-H7-A: 26, BV421-A: 1953, BV510-A: 6281, BV605-A: 527, BV711-A: 23, BV786-A: 27, BUV395-A: 107, BUV496-A: 6656, BUV737-A: 162, BYG790-A: 12, BB700-A: 19
warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 4401, FSC-H: 64, FSC-W: 169, SSC-A: 871, SSC-H: 16, SSC-W: 68, APC-H7-A: 20, BV421-A: 27, BV510-A: 2072, BV605-A: 6955, BV711-A: 493, BV786-A: 47, BUV395-A: 33, BUV496-A: 112, BUV737-A: 7429, BYG790-A: 161, BB700-A: 10
warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 4958, FSC-H: 78, FSC-W: 184, SSC-A: 1258, SSC-H: 27, SSC-W: 82, APC-H7-A: 20, BV421-A: 29, BV510-A: 2327, BV605-A: 8027, BV711-A: 749, BV786-A: 50, BUV395-A: 41, BUV496-A: 123, BUV737-A: 8564, BYG790-A: 203, BB700-A: 12
warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 5720, FSC-H: 82, FSC-W: 300, SSC-A: 1419, SSC-H: 30, SSC-W: 127, GFP-A: 1, APC-A: 10, APC-H7-A: 20, BV421-A: 2471, BV510-A: 7677, BV605-A: 837, BV711-A: 43, BV786-A: 30, BUV395-A: 183, BUV496-A: 8155, BUV737-A: 252, BYG790-A: 3, BB700-A: 14
warnings.warn(self.message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:40: UserWarning: Some data points exceed the PnR value. The data points are truncated. To avoid truncation, set the PnR value manually or pass `truncate_max_range = False`. The following counts were outside the channel range: FSC-A: 5499, FSC-H: 68, FSC-W: 216, SSC-A: 1251, SSC-H: 22, SSC-W: 89, GFP-A: 1, APC-A: 11, APC-H7-A: 20, BV421-A: 1873, BV510-A: 6009, BV605-A: 608, BV711-A: 22, BV786-A: 35, BUV395-A: 123, BUV496-A: 6303, BUV737-A: 167, BYG790-A: 7, BB700-A: 14
warnings.warn(self.message, UserWarning)
... gating sample 20112023_lineage_BM_Cre_neg_unstained_037.fcs
... gating sample 20112023_lineage_BM_Cre_pos_unstained_036.fcs
... gating sample 20112023_lineage_BM_M1_038.fcs
... gating sample 20112023_lineage_BM_M2_039.fcs
... gating sample 20112023_lineage_BM_M3_040.fcs
... gating sample 20112023_lineage_BM_M4_041.fcs
... gating sample 20112023_lineage_BM_M5_042.fcs
... gating sample 20112023_lineage_BM_M6_043.fcs
... gating sample 21112023_lineage_BM_Cre_neg_unstained_010.fcs
... gating sample 21112023_lineage_BM_Cre_pos_unstained_009.fcs
... gating sample 21112023_lineage_BM_M10_014.fcs
... gating sample 21112023_lineage_BM_M11_015.fcs
... gating sample 21112023_lineage_BM_M12_016.fcs
... gating sample 21112023_lineage_BM_M7_011.fcs
... gating sample 21112023_lineage_BM_M8_012.fcs
... gating sample 21112023_lineage_BM_M9_013.fcs
... gating sample 22112023_lineage_BM_Cre_neg_unstained_010.fcs
... gating sample 22112023_lineage_BM_Cre_pos_unstained_009.fcs
... gating sample 22112023_lineage_BM_M13_011.fcs
... gating sample 22112023_lineage_BM_M14_012.fcs
... gating sample 22112023_lineage_BM_M15_013.fcs
... gating sample 22112023_lineage_BM_M16_014.fcs
... gating sample 22112023_lineage_BM_M17_015.fcs
... gating sample 22112023_lineage_BM_M18_016.fcs
... compensating sample 20112023_lineage_BM_Cre_neg_unstained_037.fcs
... compensating sample 20112023_lineage_BM_Cre_pos_unstained_036.fcs
... compensating sample 20112023_lineage_BM_M1_038.fcs
... compensating sample 20112023_lineage_BM_M2_039.fcs
... compensating sample 20112023_lineage_BM_M3_040.fcs
... compensating sample 20112023_lineage_BM_M4_041.fcs
... compensating sample 20112023_lineage_BM_M5_042.fcs
... compensating sample 20112023_lineage_BM_M6_043.fcs
... compensating sample 21112023_lineage_BM_Cre_neg_unstained_010.fcs
... compensating sample 21112023_lineage_BM_Cre_pos_unstained_009.fcs
... compensating sample 21112023_lineage_BM_M10_014.fcs
... compensating sample 21112023_lineage_BM_M11_015.fcs
... compensating sample 21112023_lineage_BM_M12_016.fcs
... compensating sample 21112023_lineage_BM_M7_011.fcs
... compensating sample 21112023_lineage_BM_M8_012.fcs
... compensating sample 21112023_lineage_BM_M9_013.fcs
... compensating sample 22112023_lineage_BM_Cre_neg_unstained_010.fcs
... compensating sample 22112023_lineage_BM_Cre_pos_unstained_009.fcs
... compensating sample 22112023_lineage_BM_M13_011.fcs
... compensating sample 22112023_lineage_BM_M14_012.fcs
... compensating sample 22112023_lineage_BM_M15_013.fcs
... compensating sample 22112023_lineage_BM_M16_014.fcs
... compensating sample 22112023_lineage_BM_M17_015.fcs
... compensating sample 22112023_lineage_BM_M18_016.fcs
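The UserWarning above reports, per channel, how many events exceeded the channel's PnR range and were truncated. As a rough illustration of what such a truncation does, the following plain-Python sketch clips values to a range maximum and counts the affected events. It is not FACSPy's code: the range of 262144, the event values and the helper name `truncate_to_range` are made up, and it assumes that truncation means clipping to the range maximum.

```python
# Illustrative only: clip events to an assumed channel range and count how
# many were affected, mirroring the per-channel counts in the warning.
PNR = 262144  # assumed range maximum; real values come from the FCS $PnR keyword


def truncate_to_range(values, max_range):
    """Return (clipped values, number of values that exceeded max_range)."""
    clipped = [min(v, max_range) for v in values]
    n_exceeding = sum(v > max_range for v in values)
    return clipped, n_exceeding


events = [1200.0, 262143.0, 262144.0, 300000.5, 512000.0]  # made-up intensities
clipped, n_exceeding = truncate_to_range(events, PNR)
print(clipped)
print(n_exceeding)
```

Channels with thousands of truncated events, such as the scatter channels in the output above, are the ones where it is worth checking whether the truncation affects the populations of interest.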
We obtain a dataset consisting of 3,212,862 cells and 20 channels.
For the specifics on how an AnnData object is structured, please refer to the vignette ‘The FACSPy dataset’.
[5]:
dataset
[5]:
AnnData object with n_obs × n_vars = 3212862 × 20
obs: 'sample_ID', 'file_name', 'organ', 'genotype', 'sex', 'experiment', 'age'
var: 'pns', 'png', 'pne', 'pnr', 'type', 'pnn'
uns: 'metadata', 'panel', 'workspace', 'gating_cols', 'dataset_status_hash'
obsm: 'gating'
layers: 'compensated'
Prepare cofactor calculation#
In order to perform the cofactor calculation, we need to specify which samples are stained. We do this by creating a column called ‘staining’ in the metadata.
If there are no control samples, mark every file as stained.
Although this column can also be added beforehand in Excel or a similar program, we do it programmatically in Python.
[6]:
metadata_frame = dataset.uns["metadata"].to_df()
stained_files = [file for file in metadata_frame["file_name"] if "unstained" not in file]
unstained_files = [file for file in metadata_frame["file_name"] if "unstained" in file]
[7]:
metadata = dataset.uns["metadata"]
metadata.annotate(file_names = stained_files, column = "staining", value = "stained")
metadata.annotate(file_names = unstained_files, column = "staining", value = "unstained")
metadata_frame = dataset.uns["metadata"].to_df()
metadata_frame.head()
[7]:
| | sample_ID | file_name | organ | genotype | sex | experiment | age | staining |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 20112023_lineage_BM_Cre_neg_unstained_037.fcs | BM | neg | m | 1 | 95 | unstained |
| 1 | 2 | 20112023_lineage_BM_Cre_pos_unstained_036.fcs | BM | pos | m | 1 | 95 | unstained |
| 2 | 3 | 20112023_lineage_BM_M1_038.fcs | BM | pos | f | 1 | 95 | stained |
| 3 | 4 | 20112023_lineage_BM_M2_039.fcs | BM | neg | f | 1 | 95 | stained |
| 4 | 5 | 20112023_lineage_BM_M3_040.fcs | BM | pos | f | 1 | 95 | stained |
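The labelling rule used above, including the fallback for datasets without control samples, can be sketched in plain Python. The helper name `label_staining` and its `control_token` parameter are hypothetical and not part of FACSPy; the file names are taken from the metadata table.

```python
def label_staining(file_names, control_token="unstained"):
    """Map each file name to 'stained' or 'unstained'.

    If no file name contains the control token, every file ends up
    labelled 'stained', which matches the rule for datasets without
    control samples.
    """
    return {
        name: "unstained" if control_token in name else "stained"
        for name in file_names
    }


files = [
    "20112023_lineage_BM_Cre_neg_unstained_037.fcs",
    "20112023_lineage_BM_M1_038.fcs",
]
labels = label_staining(files)

# A dataset without controls: nothing matches, so everything is 'stained'.
labels_without_controls = label_staining(["20112023_lineage_BM_M1_038.fcs"])
print(labels)
print(labels_without_controls)
```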
FACSPy implements a synchronization module in order to transfer metadata to the .obs slot and vice versa. Here, we use the synchronization to transfer the respective staining information to the .obs slot.
For further information about synchronization of supplementary files in FACSPy, please refer to the respective vignette.
[8]:
fp.sync.synchronize_dataset(dataset)
dataset.obs.head()
Found modified subsets: ['metadata_columns']
... synchronizing dataset to contain columns of the metadata object
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\exceptions\_exceptions.py:12: UserWarning: It was detected that the dataset was modified.Please make sure that the performed analyses are still valid. Note that if you removed whole samples, mfi/fop calculations will not be affected.
warnings.warn(message, UserWarning)
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\synchronization\_synchronize.py:106: DataModificationWarning: 'It was detected that the dataset was modified.Please make sure that the performed analyses are still valid. Note that if you removed whole samples, mfi/fop calculations will not be affected.'
warnings.warn('', DataModificationWarning)
[8]:
| OBS_INDEX | staining | sample_ID | file_name | organ | genotype | sex | experiment | age |
|---|---|---|---|---|---|---|---|---|
| 0-0 | unstained | 1 | 20112023_lineage_BM_Cre_neg_unstained_037.fcs | BM | neg | m | 1 | 95 |
| 1-0 | unstained | 1 | 20112023_lineage_BM_Cre_neg_unstained_037.fcs | BM | neg | m | 1 | 95 |
| 2-0 | unstained | 1 | 20112023_lineage_BM_Cre_neg_unstained_037.fcs | BM | neg | m | 1 | 95 |
| 3-0 | unstained | 1 | 20112023_lineage_BM_Cre_neg_unstained_037.fcs | BM | neg | m | 1 | 95 |
| 4-0 | unstained | 1 | 20112023_lineage_BM_Cre_neg_unstained_037.fcs | BM | neg | m | 1 | 95 |
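Conceptually, this synchronization step is a per-sample lookup that is broadcast to every cell of that sample, which is why each row of the .obs table above repeats the values of its source file. A minimal sketch in plain Python follows. It is not FACSPy's implementation, and the helper `broadcast_metadata` is hypothetical.

```python
# Per-sample metadata: one entry per file (values taken from the tables above).
sample_metadata = {
    "20112023_lineage_BM_Cre_neg_unstained_037.fcs": {"staining": "unstained"},
    "20112023_lineage_BM_M1_038.fcs": {"staining": "stained"},
}

# Per-cell annotations: one row per event, each knowing its source file.
cell_rows = [
    {"file_name": "20112023_lineage_BM_Cre_neg_unstained_037.fcs"},
    {"file_name": "20112023_lineage_BM_Cre_neg_unstained_037.fcs"},
    {"file_name": "20112023_lineage_BM_M1_038.fcs"},
]


def broadcast_metadata(rows, metadata, key="file_name"):
    """Return new rows with each sample's metadata copied onto its cells."""
    return [{**row, **metadata[row[key]]} for row in rows]


synced_rows = broadcast_metadata(cell_rows, sample_metadata)
for row in synced_rows:
    print(row)
```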
Cofactor calculation#
Next, we calculate the cofactors.
[9]:
fp.dt.calculate_cofactors(dataset)
... calculating cofactors
... sample 20112023_lineage_BM_M1_038.fcs
... sample 20112023_lineage_BM_M2_039.fcs
... sample 20112023_lineage_BM_M3_040.fcs
... sample 20112023_lineage_BM_M4_041.fcs
... sample 20112023_lineage_BM_M5_042.fcs
... sample 20112023_lineage_BM_M6_043.fcs
... sample 21112023_lineage_BM_M10_014.fcs
... sample 21112023_lineage_BM_M11_015.fcs
... sample 21112023_lineage_BM_M12_016.fcs
... sample 21112023_lineage_BM_M7_011.fcs
... sample 21112023_lineage_BM_M8_012.fcs
... sample 21112023_lineage_BM_M9_013.fcs
... sample 22112023_lineage_BM_M13_011.fcs
... sample 22112023_lineage_BM_M14_012.fcs
... sample 22112023_lineage_BM_M15_013.fcs
... sample 22112023_lineage_BM_M16_014.fcs
... sample 22112023_lineage_BM_M17_015.fcs
... sample 22112023_lineage_BM_M18_016.fcs
Note that we have additional entries in the .uns slot, containing the cofactors as a CofactorTable and the raw cofactors per channel.
[10]:
dataset
[10]:
AnnData object with n_obs × n_vars = 3212862 × 20
obs: 'staining', 'sample_ID', 'file_name', 'organ', 'genotype', 'sex', 'experiment', 'age'
var: 'pns', 'png', 'pne', 'pnr', 'type', 'pnn'
uns: 'metadata', 'panel', 'workspace', 'gating_cols', 'dataset_status_hash', 'cofactors', 'raw_cofactors'
obsm: 'gating'
layers: 'compensated'
[11]:
dataset.uns["cofactors"].to_df()
[11]:
| | fcs_colname | cofactors |
|---|---|---|
| 0 | GFP | 607.857475 |
| 1 | B220 | 643.436602 |
| 2 | CD4 | 275.745685 |
| 3 | Siglec-F | 5713.021257 |
| 4 | CD8 | 17884.975774 |
| 5 | Ly6C | 566.603132 |
| 6 | NK1.1 | 197.031074 |
| 7 | CD11b | 484.784756 |
| 8 | Ly6G | 131.867339 |
| 9 | DAPI | 1871.729492 |
| 10 | CD3 | 1830.410394 |
| 11 | F4_80 | 1298.097604 |
| 12 | CD45 | 785.022914 |
[12]:
dataset.uns["raw_cofactors"].head()
[12]:
| | GFP | B220 | CD4 | Siglec-F | CD8 | Ly6C | NK1.1 | CD11b | Ly6G | DAPI | CD3 | F4_80 | CD45 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20112023_lineage_BM_M1_038.fcs | 562.877813 | 697.245258 | 278.692780 | 5464.259489 | 18628.525714 | 591.509488 | 161.599662 | 521.184098 | 121.961490 | 1899.471546 | 1493.340853 | 30.286997 | 829.378594 |
| 20112023_lineage_BM_M2_039.fcs | 548.994623 | 993.703762 | 327.034327 | 6091.917988 | 21313.777311 | 749.728180 | 189.301114 | 574.892001 | 108.630986 | 1773.316280 | 2852.789892 | 30.377886 | 771.596898 |
| 20112023_lineage_BM_M3_040.fcs | 565.798075 | 979.276545 | 291.197758 | 6496.297498 | 19104.054043 | 674.887101 | 187.688078 | 574.650474 | 132.430131 | 1898.719574 | 2702.682747 | 30.329980 | 818.261753 |
| 20112023_lineage_BM_M4_041.fcs | 825.932949 | 703.641388 | 312.354135 | 6456.971936 | 13622.016678 | 522.266000 | 133.188697 | 427.904056 | 156.770022 | 1803.676715 | 1453.757090 | 29.042495 | 780.603105 |
| 20112023_lineage_BM_M5_042.fcs | 416.758882 | 695.864707 | 338.981557 | 5022.346330 | 12492.914543 | 500.842279 | 158.653363 | 460.474069 | 175.886829 | 1760.818980 | 1410.623555 | 10948.910519 | 740.388522 |
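The raw table is useful for spotting individual samples that pull a channel's cofactor away from the rest. In the five rows shown, F4_80 sits near 30 for four samples and near 11,000 for one. How FACSPy combines the per-sample values into the final cofactor is not stated in this vignette, so the sketch below simply compares two common summaries, mean and median, on those five values to show how strongly a single divergent sample can matter.

```python
from statistics import mean, median

# F4_80 raw cofactors for samples M1 to M5, copied from the table above.
f4_80_raw = [30.286997, 30.377886, 30.329980, 29.042495, 10948.910519]

f4_80_mean = mean(f4_80_raw)      # dominated by the single high sample
f4_80_median = median(f4_80_raw)  # stays with the majority of samples

print(round(f4_80_mean, 2), round(f4_80_median, 2))
```

A large gap between the two summaries is a hint that the channel deserves a closer look in the plots that follow.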
We can use FACSPy plotting to visualize the results.
[13]:
fp.pl.cofactor_distribution(dataset,
marker = "Ly6G",
groupby = "staining",
stat_test = False)
No artists with labels found to put in legend. Note that artists whose label start with an underscore are ignored when legend() is called with no argument.
Transformation#
Next, we transform the data using the arcsinh transformation and the newly calculated cofactors. The result is stored in a new layer called ‘transformed’.
[14]:
fp.dt.transform(dataset,
transform = "asinh",
cofactor_table = dataset.uns["cofactors"],
key_added = "transformed",
layer = "compensated")
dataset
[14]:
AnnData object with n_obs × n_vars = 3212862 × 20
obs: 'staining', 'sample_ID', 'file_name', 'organ', 'genotype', 'sex', 'experiment', 'age'
var: 'pns', 'png', 'pne', 'pnr', 'type', 'pnn', 'cofactors'
uns: 'metadata', 'panel', 'workspace', 'gating_cols', 'dataset_status_hash', 'cofactors', 'raw_cofactors'
obsm: 'gating'
layers: 'compensated', 'transformed'
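To make the role of the cofactor concrete, here is a small sketch of the arcsinh transformation using only the standard library. It assumes the common cytometry convention asinh(x / cofactor); FACSPy's exact implementation is not shown in this vignette, and the helper `asinh_transform` and the example intensities are illustrative.

```python
import math


def asinh_transform(value, cofactor):
    """Arcsinh-transform a single intensity, scaled by the channel cofactor."""
    return math.asinh(value / cofactor)


ly6g_cofactor = 131.867339  # Ly6G cofactor from the cofactor table above

at_cofactor = asinh_transform(ly6g_cofactor, ly6g_cofactor)
below = asinh_transform(10.0, ly6g_cofactor)
above = asinh_transform(5000.0, ly6g_cofactor)
negative = asinh_transform(-50.0, ly6g_cofactor)

print(below, at_cofactor, above, negative)
```

Under this convention an event whose intensity equals the cofactor always lands at asinh(1), roughly 0.88, regardless of the channel. Events below the cofactor are compressed into the near-linear region around zero, while events above it are spread out on a log-like scale.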
Transformation visualization#
We visualize the transformation result via the transformation plot. Control samples are depicted in blue while the sample of interest (in this case the sample with sample_ID ‘3’) is displayed in red.
The left plot shows the untransformed raw data on a bi-exponential scale.
The middle plot shows a dot plot of the transformed values. The green line represents the cofactor, marking the boundary between the negative and the positive population.
The right plot shows the transformed values as a histogram.
Note that we have two positive populations of Ly6C, corresponding to Ly6C int Neutrophils and Ly6C hi Monocytes. The cofactor calculation looks very reasonable.
[15]:
fp.pl.transformation_plot(dataset,
gate = "CD45+",
sample_identifier = "3",
marker = "Ly6G",
figsize = (12,3))
Setting cofactors manually#
Note that the cofactor for CD8 was very high.
Upon closer inspection, we notice a high divergence of the individual samples.
[16]:
fp.pl.cofactor_distribution(dataset,
marker = "CD8",
groupby = "staining",
stat_test = False)
No artists with labels found to put in legend. Note that artists whose label start with an underscore are ignored when legend() is called with no argument.
The transformation plot confirms that the cofactor was set too high. A value between 5000 and 7000 would be more appropriate.
[17]:
fp.pl.transformation_plot(dataset,
gate = "live",
sample_identifier = "3",
marker = "CD8",
figsize = (12,3))
In order to reset the cofactor manually, we use the cofactor table object and retransform the dataset.
[18]:
dataset.uns["cofactors"].set_cofactor("CD8", 6000)
fp.dt.transform(dataset,
transform = "asinh",
cofactor_table = dataset.uns["cofactors"],
key_added = "transformed")
We visualize the transformation again and observe that the result is much better.
[19]:
fp.pl.transformation_plot(dataset,
gate = "live",
sample_identifier = "3",
marker = "CD8",
figsize = (12,3))
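Because the cofactor also serves as the positive/negative boundary, changing it affects the frequency of positives as well. The sketch below treats FOP as the fraction of events above the cofactor, which follows the description at the start of this vignette but may differ in detail from FACSPy's own calculation. The intensities are made up for illustration, and `frequency_of_positives` is a hypothetical helper.

```python
def frequency_of_positives(intensities, cofactor):
    """Fraction of events whose intensity exceeds the cofactor."""
    positives = sum(value > cofactor for value in intensities)
    return positives / len(intensities)


# Made-up CD8 intensities, not taken from the dataset.
cd8_events = [150.0, 900.0, 2500.0, 8000.0, 9500.0, 12000.0, 25000.0, 40000.0]

fop_automatic = frequency_of_positives(cd8_events, 17884.975774)  # calculated cofactor
fop_manual = frequency_of_positives(cd8_events, 6000)             # manually set cofactor

print(fop_automatic, fop_manual)
```

Events that fall between the two thresholds are counted as negative with the automatic cofactor and as positive with the manual one, so any FOP values computed before the change should be recalculated.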
Other transformation methods#
FACSPy implements log, logicle, hyperlog and asinh transforms.
These can be accessed via the ‘transform’ parameter in fp.dt.transform(). The key_added parameter controls the name of the corresponding layer…
[20]:
fp.dt.transform(dataset,
transform = "logicle",
key_added = "logicle")
dataset
[20]:
AnnData object with n_obs × n_vars = 3212862 × 20
obs: 'staining', 'sample_ID', 'file_name', 'organ', 'genotype', 'sex', 'experiment', 'age'
var: 'pns', 'png', 'pne', 'pnr', 'type', 'pnn', 'cofactors'
uns: 'metadata', 'panel', 'workspace', 'gating_cols', 'dataset_status_hash', 'cofactors', 'raw_cofactors'
obsm: 'gating'
layers: 'compensated', 'transformed', 'logicle'
… that can be later used for analysis and visualization functions.
[21]:
fp.settings.default_gate = "CD45+"
fp.tl.pca(dataset, layer = "transformed")
fp.pl.pca(dataset, layer = "transformed", color = "Ly6G", vmin = 0, vmax = 5)
[22]:
fp.settings.default_gate = "CD45+"
fp.tl.pca(dataset, layer = "logicle")
fp.pl.pca(dataset, layer = "logicle", color = "Ly6G", vmin = 0, vmax = 0.8)
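One practical difference between these transforms is how they treat the zero and negative intensities that compensation can produce. The sketch below is plain mathematics rather than FACSPy code: the cofactor of 150 is arbitrary, `safe_log10` is a hypothetical helper, and how FACSPy's log transform handles non-positive values is not covered here.

```python
import math

cofactor = 150.0  # arbitrary value for illustration
intensities = [-50.0, 0.0, 50.0, 5000.0]

# asinh is defined for every real number and is symmetric around zero.
asinh_values = [math.asinh(x / cofactor) for x in intensities]


def safe_log10(x):
    """Return log10(x), or None where the logarithm is undefined (x <= 0)."""
    return math.log10(x) if x > 0 else None


log_values = [safe_log10(x) for x in intensities]

print(asinh_values)
print(log_values)
```

The logarithm has no value for the first two events, whereas asinh keeps them on the scale. The same motivation, displaying values at and below zero, underlies the logicle and hyperlog transforms.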
Save the dataset#
Finally, we save the dataset to the hard drive.
[23]:
fp.save_dataset(dataset,
output_dir = "../../Tutorials/mouse_lineages",
file_name = "raw_dataset",
overwrite = True)
File saved successfully