Data analysis: Fluorescence intensity and Frequency of Parents#

In this vignette we will cover how to calculate median fluorescence intensities (MFIs) and frequencies of parents (FOPs), and how to visualize them.

We will first import the necessary libraries and load the dataset we created in earlier vignettes.

[1]:

import warnings
warnings.filterwarnings(
    action='ignore',
    category=FutureWarning
)

[2]:

import FACSPy as fp

[3]:

dataset = fp.read_dataset(input_dir = "../../Tutorials/mouse_lineages/",
                          file_name = "raw_dataset_stained")
dataset

[3]:

AnnData object with n_obs × n_vars = 2450306 × 20
    obs: 'staining', 'sample_ID', 'file_name', 'organ', 'genotype', 'sex', 'experiment', 'age'
    var: 'pns', 'png', 'pne', 'pnr', 'type', 'pnn', 'cofactors'
    uns: 'metadata', 'panel', 'workspace', 'gating_cols', 'dataset_status_hash', 'cofactors', 'raw_cofactors', 'settings', 'pca_CD45+_transformed', 'pca_CD45+_logicle', 'gate_frequencies'
    obsm: 'X_pca_CD45+_logicle', 'X_pca_CD45+_transformed', 'gating'
    varm: 'pca_CD45+_logicle', 'pca_CD45+_transformed'
    layers: 'compensated', 'logicle', 'transformed'

MFI calculation#

In order to calculate the MFI, we use the fp.tl.mfi() function. Note that by default, MFIs are calculated by sample_ID. We will later showcase how to calculate MFI on different variables.

Here, we calculate the median fluorescence intensity on the compensated data. To calculate the mean instead, pass method='mean'. To run the calculation on another data layer, pass the layer argument with the name of one of the entries stored in .layers.

[4]:

fp.tl.mfi(dataset)
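To make the difference between the median and the mean concrete, here is a minimal sketch in plain Python. It is not FACSPy's implementation: the events, the sample names and the helper summarize() are hypothetical and only illustrate grouping intensities by sample and summarizing them.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical events: (sample_ID, intensity of one channel).
events = [
    ("A", 10.0), ("A", 12.0), ("A", 500.0),  # one bright outlier in sample A
    ("B", 20.0), ("B", 22.0), ("B", 24.0),
]


def summarize(events, method="median"):
    """Group intensities by sample and summarize them with median or mean."""
    if method not in ("median", "mean"):
        raise ValueError(f"unknown method: {method!r}")
    grouped = defaultdict(list)
    for sample_id, value in events:
        grouped[sample_id].append(value)
    func = median if method == "median" else mean
    return {sample_id: func(values) for sample_id, values in grouped.items()}


median_fi = summarize(events, method="median")
mean_fi = summarize(events, method="mean")
print(median_fi)
print(mean_fi)
```

In this toy data the single bright event pulls the mean of sample A far away from its median, while sample B is unaffected. This is one reason the median is a common default for fluorescence data.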

[5]:

dataset

[5]:

AnnData object with n_obs × n_vars = 2450306 × 20
    obs: 'staining', 'sample_ID', 'file_name', 'organ', 'genotype', 'sex', 'experiment', 'age'
    var: 'pns', 'png', 'pne', 'pnr', 'type', 'pnn', 'cofactors'
    uns: 'metadata', 'panel', 'workspace', 'gating_cols', 'dataset_status_hash', 'cofactors', 'raw_cofactors', 'settings', 'pca_CD45+_transformed', 'pca_CD45+_logicle', 'gate_frequencies', 'mfi_sample_ID_compensated'
    obsm: 'X_pca_CD45+_logicle', 'X_pca_CD45+_transformed', 'gating'
    varm: 'pca_CD45+_logicle', 'pca_CD45+_transformed'
    layers: 'compensated', 'logicle', 'transformed'

We have a new entry in the .uns slot called mfi_sample_ID_compensated. This entry contains a dataframe, where median fluorescence values for each channel and gate are stored.

[6]:

dataset.uns["mfi_sample_ID_compensated"].head()

[6]:

		FSC-A	FSC-H	FSC-W	SSC-A	SSC-H	SSC-W	GFP	B220	CD4	Siglec-F	CD8	Ly6C	NK1.1	CD11b	Ly6G	DAPI	CD3	F4_80	CD45	Time
sample_ID	gate
11	root/cells	133443.234375	119058.652344	122137.394531	38624.003906	36014.041016	76540.160156	127.794876	109.917522	-18.103258	74.135475	408.144058	249.207123	22.688130	139.327660	-57.340120	1260.633911	70.686916	46.031918	1707.404358	19.199318
12	root/cells	140772.328125	124422.710938	124205.296875	47709.519531	45127.750000	79265.210938	233.019485	112.786469	-32.957512	74.586716	385.162079	2023.613403	47.331848	1269.332886	-35.532528	1502.565552	81.487892	56.261284	2006.830933	18.206221
13	root/cells	134818.671875	120832.539062	122160.945312	38974.097656	36839.359375	76395.578125	191.137650	135.872894	-10.295650	79.315895	401.158356	412.952423	55.898453	166.824615	-61.612671	1276.863892	78.963295	61.515224	1939.693726	24.721586
14	root/cells	136966.250000	121361.726562	123618.257812	43038.148438	40041.062500	78246.851562	118.131294	133.452423	-23.271145	73.270729	355.407837	1130.800659	35.031693	271.262695	-48.361839	1392.066284	73.971474	48.489834	2045.860962	20.997778
15	root/cells	132058.265625	118656.210938	121971.609375	37151.539062	34822.558594	76196.960938	174.226944	148.596176	-21.143461	83.881248	420.505432	272.381622	43.345253	143.740891	-72.722580	1183.359985	77.483955	55.742638	1947.556885	21.019920
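Because the dataframe is indexed by sample_ID and gate, standard pandas indexing can be used to pull out individual values. The sketch below builds a small stand-in frame from three rows of the table above (values rounded) instead of loading the real dataset. Treating sample_ID as a string is an assumption; on the real data you would start from dataset.uns["mfi_sample_ID_compensated"] and check the index dtype first.

```python
import pandas as pd

# Stand-in for dataset.uns["mfi_sample_ID_compensated"]: three samples,
# one gate, two channels, with values rounded from the table above.
index = pd.MultiIndex.from_tuples(
    [("11", "root/cells"), ("12", "root/cells"), ("13", "root/cells")],
    names=["sample_ID", "gate"],
)
mfi = pd.DataFrame(
    {
        "Ly6G": [-57.34, -35.53, -61.61],
        "CD45": [1707.40, 2006.83, 1939.69],
    },
    index=index,
)

# Select all samples for one gate; the "gate" level is dropped from the index.
root_cells = mfi.xs("root/cells", level="gate")

# CD45 MFI of a single sample within that gate.
cd45 = root_cells["CD45"]
print(cd45.loc["12"])
```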

MFI visualization#

In order to visualize the data, we use the fp.pl.mfi() function. Similar to the previous fp.pl.cell_counts() and fp.pl.gate_frequency(), we use a categorical boxplot.

The gate parameter specifies the population. The layer defaults to fp.settings.default_layer, which is compensated, so we access the MFIs previously calculated on the compensated events.

[7]:

fp.pl.mfi(dataset,
          gate = "Neutrophils",
          groupby = "experiment",
          marker = "Ly6G")

No artists with labels found to put in legend.  Note that artists whose label start with an underscore are ignored when legend() is called with no argument.

../_images/vignettes_dataset_frequency_analysis_10_1.png

[8]:

fp.pl.mfi(dataset,
          gate = "Neutrophils",
          groupby = "experiment",
          splitby = "sex",
          stat_test = "Kruskal",
          marker = "Ly6G",
          figsize = (4,4))

../_images/vignettes_dataset_frequency_analysis_11_0.png

We can also plot the data as an expression heatmap. Here, we use the MFI values. Each row corresponds to a marker, and every column is a sample.

We use the metadata_annotation parameter in order to visualize the metadata on the same heatmap.

[9]:

fp.pl.expression_heatmap(dataset, gate = "CD45+", metadata_annotation = ["experiment", "sex"])

../_images/vignettes_dataset_frequency_analysis_13_0.png

Heatmaps can often be misleading due to internal scaling. We can plot the raw MFI values for a specific marker on top by using the marker_annotation parameter.

[10]:

fp.pl.expression_heatmap(dataset,
                         gate = "CD45+",
                         metadata_annotation = ["experiment", "sex"],
                         marker_annotation = "CD45",
                         figsize = (4,7))

../_images/vignettes_dataset_frequency_analysis_15_0.png
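The following sketch illustrates why scaled heatmaps can hide absolute differences. It assumes that each marker is min-max scaled independently, which is only one possible scaling and not necessarily what fp.pl.expression_heatmap() does. The marker names, values and the helper min_max_scale() are hypothetical.

```python
def min_max_scale(values):
    """Scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant row carries no contrast; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical MFI values of two markers across three samples.
mfi = {
    "marker_low": [100.0, 110.0, 120.0],        # barely changes
    "marker_high": [1000.0, 6000.0, 11000.0],   # changes more than tenfold
}

scaled = {marker: min_max_scale(values) for marker, values in mfi.items()}
for marker, values in scaled.items():
    print(marker, values)
```

After scaling, both rows are identical, although one marker varies by 20 units and the other by 10000. Showing the raw values of a marker next to the heatmap restores that information.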

FOP calculation#

In order to calculate the frequency of positives, a cutoff needs to be defined above which cells are counted as marker-positive.

These cutoffs are the cofactors. In this example, they have already been calculated and are stored in the cofactors column of the .var slot.

[11]:

dataset.var

[11]:

	pns	png	pne	pnr	type	pnn	cofactors
FSC-A	FSC-A	1.0	(0.0, 0.0)	262144	scatter	FSC-A	1.0
FSC-H	FSC-H	1.0	(0.0, 0.0)	262144	scatter	FSC-H	1.0
FSC-W	FSC-W	1.0	(0.0, 0.0)	262144	scatter	FSC-W	1.0
SSC-A	SSC-A	1.0	(0.0, 0.0)	262144	scatter	SSC-A	1.0
SSC-H	SSC-H	1.0	(0.0, 0.0)	262144	scatter	SSC-H	1.0
SSC-W	SSC-W	1.0	(0.0, 0.0)	262144	scatter	SSC-W	1.0
GFP	GFP	1.0	(0.0, 0.0)	262144	fluo	GFP-A	604.5583
B220	B220	1.0	(0.0, 0.0)	262144	fluo	APC-A	621.9474
CD4	CD4	1.0	(0.0, 0.0)	262144	fluo	APC-H7-A	263.38608
Siglec-F	Siglec-F	1.0	(0.0, 0.0)	262144	fluo	BV421-A	5008.094
CD8	CD8	1.0	(0.0, 0.0)	262144	fluo	BV510-A	6000.0
Ly6C	Ly6C	1.0	(0.0, 0.0)	262144	fluo	BV605-A	566.7654
NK1.1	NK1.1	1.0	(0.0, 0.0)	262144	fluo	BV711-A	140.00124
CD11b	CD11b	1.0	(0.0, 0.0)	262144	fluo	BV786-A	495.22092
Ly6G	Ly6G	1.0	(0.0, 0.0)	262144	fluo	BUV395-A	127.525566
DAPI	DAPI	1.0	(0.0, 0.0)	262144	fluo	BUV496-A	1871.7664
CD3	CD3	1.0	(0.0, 0.0)	262144	fluo	BUV737-A	1767.76
F4_80	F4_80	1.0	(0.0, 0.0)	262144	fluo	BYG790-A	1105.7057
CD45	CD45	1.0	(0.0, 0.0)	262144	fluo	BB700-A	782.04565
Time	Time	1.0	(0.0, 0.0)	262144	time	Time	1.0

We can now calculate the frequency of parents using the fp.tl.fop() function. As with the MFI, FOPs are calculated per sample_ID and stored in the .uns slot, here under the key fop_sample_ID_compensated.

[12]:

fp.tl.fop(dataset)
dataset

[12]:

AnnData object with n_obs × n_vars = 2450306 × 20
    obs: 'staining', 'sample_ID', 'file_name', 'organ', 'genotype', 'sex', 'experiment', 'age'
    var: 'pns', 'png', 'pne', 'pnr', 'type', 'pnn', 'cofactors'
    uns: 'metadata', 'panel', 'workspace', 'gating_cols', 'dataset_status_hash', 'cofactors', 'raw_cofactors', 'settings', 'pca_CD45+_transformed', 'pca_CD45+_logicle', 'gate_frequencies', 'mfi_sample_ID_compensated', 'fop_sample_ID_compensated'
    obsm: 'X_pca_CD45+_logicle', 'X_pca_CD45+_transformed', 'gating'
    varm: 'pca_CD45+_logicle', 'pca_CD45+_transformed'
    layers: 'compensated', 'logicle', 'transformed'
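Conceptually, the value for one marker in one gate is the fraction of events whose intensity lies above the cutoff. The sketch below shows that idea in plain Python. The helper frequency_of_positives() and the event intensities are hypothetical, and the strict "greater than" comparison is an assumption; fp.tl.fop() may differ in its details. The cutoff is the Ly6G cofactor from the .var table above.

```python
def frequency_of_positives(intensities, cutoff):
    """Return the fraction of events with an intensity above the cutoff."""
    if not intensities:
        # An empty gate has no positive events.
        return 0.0
    n_positive = sum(1 for value in intensities if value > cutoff)
    return n_positive / len(intensities)


# Hypothetical Ly6G intensities of five events inside one gate.
ly6g_in_gate = [50.0, 90.0, 300.0, 1500.0, 2200.0]

# Ly6G cofactor taken from the .var table.
cutoff = 127.525566

fop = frequency_of_positives(ly6g_in_gate, cutoff)
print(fop)
```

Three of the five events exceed the cutoff, so the sketch reports a frequency of 0.6.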

FOP visualization#

We use similar plotting capabilities for the display of FOPs.

[13]:

fp.pl.fop(dataset,
          marker = "Ly6G",
          gate = "CD45+",
          groupby = "experiment")

No artists with labels found to put in legend.  Note that artists whose label start with an underscore are ignored when legend() is called with no argument.

../_images/vignettes_dataset_frequency_analysis_21_1.png

As Ly6G is a marker for Neutrophils, we expect the FOP to be very similar to the gate frequency of Neutrophils:

[14]:

fp.pl.gate_frequency(dataset,
                     gate = "Neutrophils",
                     freq_of = "CD45+",
                     groupby = "experiment")

No artists with labels found to put in legend.  Note that artists whose label start with an underscore are ignored when legend() is called with no argument.

../_images/vignettes_dataset_frequency_analysis_23_1.png

We can use the same expression heatmap as above, passing data_metric="fop". This will display the frequency of parents per sample as described above. Similarly, we display the frequency of Ly6G-positive cells on top of the plot.

[15]:

fp.pl.expression_heatmap(dataset,
                         gate = "CD45+",
                         data_metric = "fop",
                         metadata_annotation = ["experiment", "sex"],
                         marker_annotation = "Ly6G",
                         figsize = (4,7))

../_images/vignettes_dataset_frequency_analysis_25_0.png

Save the dataset#

Now that we have performed the MFI and FOP analyses, we save the dataset.

[16]:

fp.save_dataset(dataset,
                file_name = "../../Tutorials/mouse_lineages/raw_dataset_stained_mfi",
                overwrite = True)

File saved successfully
