Data analysis: Fluorescence intensity and Frequency of Parents

In this vignette we will cover how to calculate MFIs and FOPs and visualize them.

We first import the necessary libraries and read the dataset we created in the earlier vignettes.

[1]:
import warnings
warnings.filterwarnings(
    action='ignore',
    category=FutureWarning
)
[2]:
import FACSPy as fp
[3]:
dataset = fp.read_dataset(input_dir = "../../Tutorials/mouse_lineages/",
                          file_name = "raw_dataset_stained")
dataset
[3]:
AnnData object with n_obs × n_vars = 2450306 × 20
    obs: 'staining', 'sample_ID', 'file_name', 'organ', 'genotype', 'sex', 'experiment', 'age'
    var: 'pns', 'png', 'pne', 'pnr', 'type', 'pnn', 'cofactors'
    uns: 'metadata', 'panel', 'workspace', 'gating_cols', 'dataset_status_hash', 'cofactors', 'raw_cofactors', 'settings', 'pca_CD45+_transformed', 'pca_CD45+_logicle', 'gate_frequencies'
    obsm: 'X_pca_CD45+_logicle', 'X_pca_CD45+_transformed', 'gating'
    varm: 'pca_CD45+_logicle', 'pca_CD45+_transformed'
    layers: 'compensated', 'logicle', 'transformed'

MFI calculation

In order to calculate the MFI, we use the fp.tl.mfi() function. Note that by default, MFIs are calculated by sample_ID. We will later showcase how to calculate MFI on different variables.

By default, the function calculates the median fluorescence intensity on the compensated data. To calculate the mean instead, pass method='mean'. To run the calculation on another data layer, pass the layer argument with the name of a layer stored in .layers.

[4]:
fp.tl.mfi(dataset)
[5]:
dataset
[5]:
AnnData object with n_obs × n_vars = 2450306 × 20
    obs: 'staining', 'sample_ID', 'file_name', 'organ', 'genotype', 'sex', 'experiment', 'age'
    var: 'pns', 'png', 'pne', 'pnr', 'type', 'pnn', 'cofactors'
    uns: 'metadata', 'panel', 'workspace', 'gating_cols', 'dataset_status_hash', 'cofactors', 'raw_cofactors', 'settings', 'pca_CD45+_transformed', 'pca_CD45+_logicle', 'gate_frequencies', 'mfi_sample_ID_compensated'
    obsm: 'X_pca_CD45+_logicle', 'X_pca_CD45+_transformed', 'gating'
    varm: 'pca_CD45+_logicle', 'pca_CD45+_transformed'
    layers: 'compensated', 'logicle', 'transformed'

We have a new entry in the .uns slot called mfi_sample_ID_compensated. This entry contains a dataframe, where median fluorescence values for each channel and gate are stored.

[6]:
dataset.uns["mfi_sample_ID_compensated"].head()
[6]:
FSC-A FSC-H FSC-W SSC-A SSC-H SSC-W GFP B220 CD4 Siglec-F CD8 Ly6C NK1.1 CD11b Ly6G DAPI CD3 F4_80 CD45 Time
sample_ID gate
11 root/cells 133443.234375 119058.652344 122137.394531 38624.003906 36014.041016 76540.160156 127.794876 109.917522 -18.103258 74.135475 408.144058 249.207123 22.688130 139.327660 -57.340120 1260.633911 70.686916 46.031918 1707.404358 19.199318
12 root/cells 140772.328125 124422.710938 124205.296875 47709.519531 45127.750000 79265.210938 233.019485 112.786469 -32.957512 74.586716 385.162079 2023.613403 47.331848 1269.332886 -35.532528 1502.565552 81.487892 56.261284 2006.830933 18.206221
13 root/cells 134818.671875 120832.539062 122160.945312 38974.097656 36839.359375 76395.578125 191.137650 135.872894 -10.295650 79.315895 401.158356 412.952423 55.898453 166.824615 -61.612671 1276.863892 78.963295 61.515224 1939.693726 24.721586
14 root/cells 136966.250000 121361.726562 123618.257812 43038.148438 40041.062500 78246.851562 118.131294 133.452423 -23.271145 73.270729 355.407837 1130.800659 35.031693 271.262695 -48.361839 1392.066284 73.971474 48.489834 2045.860962 20.997778
15 root/cells 132058.265625 118656.210938 121971.609375 37151.539062 34822.558594 76196.960938 174.226944 148.596176 -21.143461 83.881248 420.505432 272.381622 43.345253 143.740891 -72.722580 1183.359985 77.483955 55.742638 1947.556885 21.019920
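
To make the structure of this table concrete, the following is a minimal sketch with plain pandas on made-up toy events. It is not FACSPy's implementation; the column names and values are assumptions chosen for illustration. It groups single-cell values by sample_ID and gate and compares the median with the mean.

```python
import pandas as pd

# Toy single-cell table: one row per event (all values are made up).
events = pd.DataFrame({
    "sample_ID": ["11", "11", "11", "12", "12", "12"],
    "gate": ["root/cells"] * 6,
    "CD45": [1500.0, 1700.0, 9000.0, 1900.0, 2000.0, 2100.0],
    "Ly6G": [-60.0, -50.0, 400.0, -40.0, -30.0, 500.0],
})

# Median per sample and gate; the result is indexed by (sample_ID, gate),
# with one column per channel.
mfi_median = events.groupby(["sample_ID", "gate"]).median()

# Mean per sample and gate, for comparison with the median.
mfi_mean = events.groupby(["sample_ID", "gate"]).mean()

# Select all samples of a single gate from the two-level index.
cells_only = mfi_median.xs("root/cells", level="gate")

print(mfi_median)
print(mfi_mean)
print(cells_only)
```

In this toy data, the single bright event in sample 11 pulls the mean of CD45 far above the median, which illustrates why the median is often the more robust summary for fluorescence data.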

MFI visualization

In order to visualize the data, we use the fp.pl.mfi() function. Similar to the previous fp.pl.cell_counts() and fp.pl.gate_frequency(), we use a categorical boxplot.

The gate parameter specifies the population. The plotting layer is taken from fp.settings.default_layer, which defaults to compensated, so the plot uses the MFIs we calculated above on the compensated events.

[7]:
fp.pl.mfi(dataset,
          gate = "Neutrophils",
          groupby = "experiment",
          marker = "Ly6G")
../_images/vignettes_dataset_frequency_analysis_10_1.png
[8]:
fp.pl.mfi(dataset,
          gate = "Neutrophils",
          groupby = "experiment",
          splitby = "sex",
          stat_test = "Kruskal",
          marker = "Ly6G",
          figsize = (4,4))
../_images/vignettes_dataset_frequency_analysis_11_0.png

We can also plot the data as an expression heatmap. Here, we use the MFI values. Each row corresponds to a marker, and every column is a sample.

We use the metadata_annotation parameter in order to visualize the metadata on the same heatmap.

[9]:
fp.pl.expression_heatmap(dataset, gate = "CD45+", metadata_annotation = ["experiment", "sex"])
../_images/vignettes_dataset_frequency_analysis_13_0.png

Heatmaps can be misleading because the values are scaled internally. To put them in context, we can plot the raw MFI values for a specific marker on top by using the marker_annotation parameter.

[10]:
fp.pl.expression_heatmap(dataset,
                         gate = "CD45+",
                         metadata_annotation = ["experiment", "sex"],
                         marker_annotation = "CD45",
                         figsize = (4,7))
../_images/vignettes_dataset_frequency_analysis_15_0.png
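
The effect of scaling can be illustrated with a small pandas sketch. It assumes a per-marker min-max scaling purely for illustration; the scaling that the heatmap function applies internally may differ, and the marker names and values are made up.

```python
import pandas as pd

# Hypothetical MFI table: rows are markers, columns are samples.
mfi = pd.DataFrame(
    {"s1": [100.0, 10000.0], "s2": [110.0, 20000.0], "s3": [120.0, 30000.0]},
    index=["marker_A", "marker_B"],
)

# Min-max scale each marker (row) to the range [0, 1].
# This scaling is an assumption made for this example only.
row_min = mfi.min(axis=1)
row_range = mfi.max(axis=1) - row_min
scaled = mfi.sub(row_min, axis=0).div(row_range, axis=0)

# Fold change between the brightest and dimmest sample per marker.
fold_change = mfi.max(axis=1) / mfi.min(axis=1)

print(scaled)
print(fold_change)
```

After scaling, both rows are identical, although marker_A changes by only 20% across samples while marker_B changes threefold. Showing the raw values next to the heatmap makes this difference visible again.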

FOP calculation

In order to calculate the frequency of positives, a cutoff needs to be defined above which cells are counted as marker-positive.

In this example, the cofactors serve as these cutoffs. We already calculated them in an earlier step, and they are stored in the cofactors column of the .var slot.

[11]:
dataset.var
[11]:
pns png pne pnr type pnn cofactors
FSC-A FSC-A 1.0 (0.0, 0.0) 262144 scatter FSC-A 1.0
FSC-H FSC-H 1.0 (0.0, 0.0) 262144 scatter FSC-H 1.0
FSC-W FSC-W 1.0 (0.0, 0.0) 262144 scatter FSC-W 1.0
SSC-A SSC-A 1.0 (0.0, 0.0) 262144 scatter SSC-A 1.0
SSC-H SSC-H 1.0 (0.0, 0.0) 262144 scatter SSC-H 1.0
SSC-W SSC-W 1.0 (0.0, 0.0) 262144 scatter SSC-W 1.0
GFP GFP 1.0 (0.0, 0.0) 262144 fluo GFP-A 604.5583
B220 B220 1.0 (0.0, 0.0) 262144 fluo APC-A 621.9474
CD4 CD4 1.0 (0.0, 0.0) 262144 fluo APC-H7-A 263.38608
Siglec-F Siglec-F 1.0 (0.0, 0.0) 262144 fluo BV421-A 5008.094
CD8 CD8 1.0 (0.0, 0.0) 262144 fluo BV510-A 6000.0
Ly6C Ly6C 1.0 (0.0, 0.0) 262144 fluo BV605-A 566.7654
NK1.1 NK1.1 1.0 (0.0, 0.0) 262144 fluo BV711-A 140.00124
CD11b CD11b 1.0 (0.0, 0.0) 262144 fluo BV786-A 495.22092
Ly6G Ly6G 1.0 (0.0, 0.0) 262144 fluo BUV395-A 127.525566
DAPI DAPI 1.0 (0.0, 0.0) 262144 fluo BUV496-A 1871.7664
CD3 CD3 1.0 (0.0, 0.0) 262144 fluo BUV737-A 1767.76
F4_80 F4_80 1.0 (0.0, 0.0) 262144 fluo BYG790-A 1105.7057
CD45 CD45 1.0 (0.0, 0.0) 262144 fluo BB700-A 782.04565
Time Time 1.0 (0.0, 0.0) 262144 time Time 1.0
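
The role of the cutoff can be sketched with plain pandas on made-up events. This is an illustration of the concept under stated assumptions, not FACSPy's implementation: the cutoff values are rounded from the cofactors column above, and whether the comparison is strict (>) or inclusive (>=) is an assumption.

```python
import pandas as pd

# Toy single-cell table with two markers (all values are made up).
events = pd.DataFrame({
    "sample_ID": ["11"] * 4 + ["12"] * 4,
    "Ly6G": [50.0, 300.0, 90.0, 1000.0, 10.0, 20.0, 500.0, 30.0],
    "CD45": [900.0, 1200.0, 700.0, 2000.0, 100.0, 800.0, 950.0, 3000.0],
})

# One cutoff per marker (illustrative, rounded values).
cutoffs = pd.Series({"Ly6G": 127.5, "CD45": 782.0})

# A cell counts as marker-positive if its value exceeds the marker's cutoff.
# The Series index is aligned with the DataFrame columns.
positive = events[cutoffs.index].gt(cutoffs, axis=1)

# Fraction of positive cells per sample: the mean of a boolean column.
fop = positive.groupby(events["sample_ID"]).mean()

print(fop)
```

Each entry of the result is the fraction of cells in a sample that lie above the cutoff for that marker.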

We can now calculate the frequency of parents using the fp.tl.fop() function. Similar to the MFI, FOPs are calculated per sample_ID and stored in the .uns slot.

[12]:
fp.tl.fop(dataset)
dataset
[12]:
AnnData object with n_obs × n_vars = 2450306 × 20
    obs: 'staining', 'sample_ID', 'file_name', 'organ', 'genotype', 'sex', 'experiment', 'age'
    var: 'pns', 'png', 'pne', 'pnr', 'type', 'pnn', 'cofactors'
    uns: 'metadata', 'panel', 'workspace', 'gating_cols', 'dataset_status_hash', 'cofactors', 'raw_cofactors', 'settings', 'pca_CD45+_transformed', 'pca_CD45+_logicle', 'gate_frequencies', 'mfi_sample_ID_compensated', 'fop_sample_ID_compensated'
    obsm: 'X_pca_CD45+_logicle', 'X_pca_CD45+_transformed', 'gating'
    varm: 'pca_CD45+_logicle', 'pca_CD45+_transformed'
    layers: 'compensated', 'logicle', 'transformed'

FOP visualization

We use similar plotting capabilities for the display of FOPs.

[13]:
fp.pl.fop(dataset,
          marker = "Ly6G",
          gate = "CD45+",
          groupby = "experiment")
../_images/vignettes_dataset_frequency_analysis_21_1.png

As Ly6G is a marker for Neutrophils, we expect the FOP to be very similar to the gate frequency of Neutrophils:

[14]:
fp.pl.gate_frequency(dataset,
                     gate = "Neutrophils",
                     freq_of = "CD45+",
                     groupby = "experiment")
../_images/vignettes_dataset_frequency_analysis_23_1.png
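
Why the two numbers are similar but not necessarily identical can be sketched on toy data. The values, the cutoff, and the column in_neutrophil_gate (a stand-in for manual gate membership) are hypothetical and only serve the illustration.

```python
import pandas as pd

# Toy CD45+ events of one sample (all values are made up).
cells = pd.DataFrame({
    "Ly6G": [900.0, 1200.0, 40.0, -10.0, 150.0, 20.0, 2000.0, 60.0, 5.0, 30.0],
    # Hypothetical gate membership; a manual gate may also depend on other markers.
    "in_neutrophil_gate": [True, True, False, False, False,
                           False, True, False, False, False],
})

ly6g_cutoff = 127.5  # hypothetical positivity cutoff

# Fraction of CD45+ cells above the Ly6G cutoff.
fop_ly6g = (cells["Ly6G"] > ly6g_cutoff).mean()

# Fraction of CD45+ cells that fall into the Neutrophils gate.
gate_frequency = cells["in_neutrophil_gate"].mean()

print(f"FOP Ly6G: {fop_ly6g:.2f}, gate frequency: {gate_frequency:.2f}")
```

In this sketch, one dimly Ly6G-positive cell lies above the cutoff but outside the gate, so the FOP is slightly higher than the gate frequency.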

We can use the same expression heatmap as above, passing data_metric="fop". This will display the frequency of parents per sample as described above. As before, we use marker_annotation to display the frequency of Ly6G-positive cells on top of the plot.

[15]:
fp.pl.expression_heatmap(dataset,
                         gate = "CD45+",
                         data_metric = "fop",
                         metadata_annotation = ["experiment", "sex"],
                         marker_annotation = "Ly6G",
                         figsize = (4,7))
../_images/vignettes_dataset_frequency_analysis_25_0.png

Save the dataset

Now that we have performed the MFI and FOP analyses, we save the dataset so that the results are available in later vignettes.

[16]:
fp.save_dataset(dataset,
                file_name = "../../Tutorials/mouse_lineages/raw_dataset_stained_mfi",
                overwrite = True)
File saved successfully