The Metadata object

Metadata must be provided as a table.

There are two required columns: ‘sample_ID’ and ‘file_name’. Adding a ‘staining’ column is highly recommended as well, because the automated cofactor calculation relies on this information.

‘sample_ID’ can contain any values, as long as the entries are unique; here, we use ascending integers. ‘file_name’ must contain the .fcs file names, including the .fcs extension. Only the files listed here will be read. ‘staining’ takes either ‘unstained’ or ‘stained’.
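The requirements above can be checked before a Metadata object is created. The following is a minimal sketch in plain pandas. check_metadata is a hypothetical helper written for illustration and is not part of FACSPy. The example rows are made up, including the file name unstained_ctrl.fcs.

```python
from typing import List

import pandas as pd

# Illustrative metadata: the two required columns plus the recommended 'staining' column.
metadata_df = pd.DataFrame(
    {
        "sample_ID": [1, 2, 3],
        "file_name": ["3742.fcs", "4337.fcs", "unstained_ctrl.fcs"],
        "staining": ["stained", "stained", "unstained"],
    }
)


def check_metadata(df: pd.DataFrame) -> List[str]:
    """Hypothetical helper (not part of FACSPy): list problems with a metadata table."""
    issues = []
    for col in ("sample_ID", "file_name"):
        if col not in df.columns:
            issues.append(f"missing required column '{col}'")
    if "sample_ID" in df.columns and not df["sample_ID"].is_unique:
        issues.append("'sample_ID' entries are not unique")
    if "file_name" in df.columns and not df["file_name"].str.endswith(".fcs").all():
        issues.append("'file_name' entries must include the .fcs extension")
    # 'staining' is recommended rather than required, but is checked if present.
    if "staining" in df.columns and not df["staining"].isin(["stained", "unstained"]).all():
        issues.append("'staining' entries must be 'stained' or 'unstained'")
    return issues


print(check_metadata(metadata_df))  # → []
```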

Here, we read example metadata into a regular pandas dataframe:

[1]:
import pandas as pd

user_metadata = pd.read_csv("../Tutorials/spectral_dataset/metadata.csv", sep = ";")
user_metadata.head()
[1]:
   sample_ID  file_name  group_fd  internal_id  organ  staining  diag_main  diag_fine  donor_id  material  batch
0          1   3742.fcs   healthy         3742     PB   stained    healthy    healthy      3742      PBMC      1
1          2   4337.fcs   healthy         4337     PB   stained    healthy    healthy      4337      PBMC      1
2          3   4449.fcs   healthy         4449     PB   stained    healthy    healthy      4449      PBMC      2
3          4   5143.fcs   healthy         5143     PB   stained    healthy    healthy      5143      PBMC      2
4          5   6042.fcs   healthy         6042     PB   stained    healthy    healthy      6042      PBMC      1
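The example file is semicolon-separated, which is why sep = ";" is passed to pd.read_csv(). The sketch below uses a small made-up excerpt held in memory (io.StringIO) to show what happens with and without the separator.

```python
import io

import pandas as pd

# Made-up excerpt of a semicolon-separated metadata file.
csv_text = "sample_ID;file_name;staining\n1;3742.fcs;stained\n2;4337.fcs;stained\n"

# With the default sep="," every line ends up in a single column.
wrong = pd.read_csv(io.StringIO(csv_text))
# With sep=";" each field gets its own column.
right = pd.read_csv(io.StringIO(csv_text), sep=";")

print(wrong.shape)  # → (2, 1)
print(right.shape)  # → (2, 3)
```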

Create metadata from a pandas dataframe

To create a FACSPy-readable Metadata object, we use the fp.dt.Metadata class, where ‘fp’ is the alias for FACSPy and ‘dt’ stands for dataset.

Here, we use the metadata table read with pandas above and pass it via the metadata parameter.

The resulting Metadata object contains 36 entries, one per sample. ‘factors’ are the user-defined columns that describe the samples. The fixed columns ‘sample_ID’, ‘file_name’ and ‘staining’ are not listed among them.
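Judging from the output below, the factors are all columns except ‘sample_ID’, ‘file_name’ and ‘staining’. The sketch reproduces that list in plain pandas. The RESERVED_COLUMNS rule is an assumption inferred from the output. It is not taken from FACSPy's source.

```python
import pandas as pd

# Assumption inferred from the printed factors: these columns have a fixed role
# and are therefore not reported as factors.
RESERVED_COLUMNS = ["sample_ID", "file_name", "staining"]

# Empty dataframe with the same columns as the example metadata.
df = pd.DataFrame(
    columns=["sample_ID", "file_name", "group_fd", "internal_id", "organ",
             "staining", "diag_main", "diag_fine", "donor_id", "material", "batch"]
)

factors = [col for col in df.columns if col not in RESERVED_COLUMNS]
print(factors)
# → ['group_fd', 'internal_id', 'organ', 'diag_main', 'diag_fine', 'donor_id', 'material', 'batch']
```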

[2]:
import FACSPy as fp
[3]:
metadata = fp.dt.Metadata(metadata = user_metadata)
metadata
[3]:
Metadata(36 entries with factors ['group_fd', 'internal_id', 'organ', 'diag_main', 'diag_fine', 'donor_id', 'material', 'batch'])

Create metadata from a .csv file

We can also read the metadata table directly from disk by passing its path to fp.dt.Metadata via the file parameter. Any file format that pd.read_csv() can read can be used.

[4]:
metadata = fp.dt.Metadata(file = "../Tutorials/spectral_dataset/metadata.csv")
metadata
[4]:
Metadata(36 entries with factors ['group_fd', 'internal_id', 'organ', 'diag_main', 'diag_fine', 'donor_id', 'material', 'batch'])
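If the separator of a metadata file is not known in advance, pandas itself can detect it. With sep=None and engine="python", pd.read_csv() detects the delimiter using Python's csv.Sniffer. This is a plain-pandas sketch that uses a made-up temporary file. It does not describe how fp.dt.Metadata itself parses the file.

```python
import os
import tempfile

import pandas as pd

# Made-up semicolon-separated table, written to a temporary file on disk.
csv_text = "sample_ID;file_name;staining\n1;3742.fcs;stained\n2;4337.fcs;stained\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "metadata.csv")
    with open(path, "w") as fh:
        fh.write(csv_text)

    # sep=None asks pandas to detect the delimiter; this requires engine="python".
    sniffed = pd.read_csv(path, sep=None, engine="python")

print(list(sniffed.columns))  # → ['sample_ID', 'file_name', 'staining']
```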

Access the metadata table

The underlying table is stored in the .dataframe attribute and can be accessed and modified directly. Alternatively, the .to_df() method returns the underlying table. Both options are shown here.

[5]:
df = metadata.dataframe
df.head()
[5]:
   sample_ID  file_name  group_fd  internal_id  organ  staining  diag_main  diag_fine  donor_id  material  batch
0          1   3742.fcs   healthy         3742     PB   stained    healthy    healthy      3742      PBMC      1
1          2   4337.fcs   healthy         4337     PB   stained    healthy    healthy      4337      PBMC      1
2          3   4449.fcs   healthy         4449     PB   stained    healthy    healthy      4449      PBMC      2
3          4   5143.fcs   healthy         5143     PB   stained    healthy    healthy      5143      PBMC      2
4          5   6042.fcs   healthy         6042     PB   stained    healthy    healthy      6042      PBMC      1
[6]:
df = metadata.to_df()
df.head()
[6]:
   sample_ID  file_name  group_fd  internal_id  organ  staining  diag_main  diag_fine  donor_id  material  batch
0          1   3742.fcs   healthy         3742     PB   stained    healthy    healthy      3742      PBMC      1
1          2   4337.fcs   healthy         4337     PB   stained    healthy    healthy      4337      PBMC      1
2          3   4449.fcs   healthy         4449     PB   stained    healthy    healthy      4449      PBMC      2
3          4   5143.fcs   healthy         5143     PB   stained    healthy    healthy      5143      PBMC      2
4          5   6042.fcs   healthy         6042     PB   stained    healthy    healthy      6042      PBMC      1

Access metadata factors

To access the factors, i.e. the parameters that the user has specified in the metadata, use the .get_factors() method.

[7]:
metadata.get_factors()
[7]:
['group_fd',
 'internal_id',
 'organ',
 'diag_main',
 'diag_fine',
 'donor_id',
 'material',
 'batch']

Rename columns

Metadata table columns can be renamed with the .rename_column() method. It expects two arguments: the current column name (current_name) and the new column name (new_name). Note that the change happens in place.

[8]:
metadata.rename_column(current_name = "batch", new_name = "newly_named_batch")
metadata.dataframe.columns
[8]:
Index(['sample_ID', 'file_name', 'group_fd', 'internal_id', 'organ',
       'staining', 'diag_main', 'diag_fine', 'donor_id', 'material',
       'newly_named_batch'],
      dtype='object')
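For comparison, the same renaming in plain pandas uses DataFrame.rename() with a columns mapping. Unlike .rename_column(), it returns a new dataframe and leaves the original unchanged. The data below is a made-up two-row excerpt.

```python
import pandas as pd

df = pd.DataFrame({"sample_ID": [1, 2], "file_name": ["3742.fcs", "4337.fcs"], "batch": [1, 1]})

# Returns a renamed copy; df keeps its old column name unless the result is assigned back.
renamed = df.rename(columns={"batch": "newly_named_batch"})

print(list(df.columns))       # → ['sample_ID', 'file_name', 'batch']
print(list(renamed.columns))  # → ['sample_ID', 'file_name', 'newly_named_batch']
```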

Subset metadata

To subset the metadata, use the .subset() method. It expects a column name (column) and a list of entries in that column (values). Only the rows whose entry is in the list are kept. As the output below shows, the change happens in place.

[9]:
metadata.subset(column = "file_name", values = ["3742.fcs", "4337.fcs"])
metadata.to_df()
[9]:
   sample_ID  file_name  group_fd  internal_id  organ  staining  diag_main  diag_fine  donor_id  material  newly_named_batch
0          1   3742.fcs   healthy         3742     PB   stained    healthy    healthy      3742      PBMC                  1
1          2   4337.fcs   healthy         4337     PB   stained    healthy    healthy      4337      PBMC                  1
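The selection above corresponds to boolean indexing with Series.isin() in plain pandas. This sketch shows the equivalent operation on a small made-up table. .subset() may not be implemented this way internally.

```python
import pandas as pd

df = pd.DataFrame(
    {"sample_ID": [1, 2, 3], "file_name": ["3742.fcs", "4337.fcs", "4449.fcs"]}
)

# Keep only the rows whose 'file_name' appears in the list.
keep = ["3742.fcs", "4337.fcs"]
subset = df[df["file_name"].isin(keep)]

print(subset["sample_ID"].tolist())  # → [1, 2]
```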

Add annotations to metadata

To add a new annotation to the metadata, use the .annotate() method. The samples to annotate are selected by their file names or sample_IDs, passed as a list or a single value (here via file_names). The column argument specifies the name of the column that is created, and value specifies the value that is added.

[10]:
metadata.annotate(
    file_names = ["3742.fcs", "4337.fcs"],
    column = "new_col",
    value = "new_val"
)
metadata.dataframe.head()
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\dataset\_supplements.py:353: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value 'new_val' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  self.dataframe.loc[self.dataframe["file_name"].isin(file_names), column] = value
[10]:
   sample_ID  file_name  group_fd  internal_id  organ  staining  diag_main  diag_fine  donor_id  material  newly_named_batch  new_col
0          1   3742.fcs   healthy         3742     PB   stained    healthy    healthy      3742      PBMC                  1  new_val
1          2   4337.fcs   healthy         4337     PB   stained    healthy    healthy      4337      PBMC                  1  new_val
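The FutureWarning above is raised by pandas. The line it points to assigns a string into a column that pandas holds as float64. In plain pandas, one way to avoid the warning is to create the column with object dtype before assigning. This is a minimal sketch on made-up data, where file_names mirrors the argument used above. It does not change how .annotate() behaves.

```python
import pandas as pd

df = pd.DataFrame(
    {"sample_ID": [1, 2, 3], "file_name": ["3742.fcs", "4337.fcs", "4449.fcs"]}
)
file_names = ["3742.fcs", "4337.fcs"]

# Assigning the scalar None creates an object-dtype column, so storing strings
# in it afterwards does not hit the float64 incompatibility.
df["new_col"] = None
df.loc[df["file_name"].isin(file_names), "new_col"] = "new_val"

print(df["new_col"].tolist())  # → ['new_val', 'new_val', None]
```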

Rename values

Entries can be modified using standard pandas notation or via the convenience method .rename_values(). If a single value is passed, as in the next example, every entry in the column is renamed to it (here, ‘renamed_val’).

[11]:
metadata.rename_values("new_col", "renamed_val")
metadata.dataframe.head()
[11]:
   sample_ID  file_name  group_fd  internal_id  organ  staining  diag_main  diag_fine  donor_id  material  newly_named_batch      new_col
0          1   3742.fcs   healthy         3742     PB   stained    healthy    healthy      3742      PBMC                  1  renamed_val
1          2   4337.fcs   healthy         4337     PB   stained    healthy    healthy      4337      PBMC                  1  renamed_val

To rename the entries one by one, pass a list:

[12]:
metadata.rename_values("new_col", ["renamed_val1", "renamed_val2"])
metadata.dataframe.head()
[12]:
   sample_ID  file_name  group_fd  internal_id  organ  staining  diag_main  diag_fine  donor_id  material  newly_named_batch       new_col
0          1   3742.fcs   healthy         3742     PB   stained    healthy    healthy      3742      PBMC                  1  renamed_val1
1          2   4337.fcs   healthy         4337     PB   stained    healthy    healthy      4337      PBMC                  1  renamed_val2

Finally, we can also pass a dictionary whose keys are the old values and whose values are the new ones.

[13]:
metadata.rename_values("new_col", {"renamed_val1": "final_var1",
                                   "renamed_val2": "final_var2"})
metadata.dataframe.head()
[13]:
   sample_ID  file_name  group_fd  internal_id  organ  staining  diag_main  diag_fine  donor_id  material  newly_named_batch     new_col
0          1   3742.fcs   healthy         3742     PB   stained    healthy    healthy      3742      PBMC                  1  final_var1
1          2   4337.fcs   healthy         4337     PB   stained    healthy    healthy      4337      PBMC                  1  final_var2
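The three ways of calling .rename_values() roughly correspond to the following plain-pandas operations. This sketch uses a made-up two-row table for comparison. FACSPy's method may handle edge cases differently, for example a list of the wrong length.

```python
import pandas as pd

df = pd.DataFrame({"file_name": ["3742.fcs", "4337.fcs"], "new_col": ["new_val", "new_val"]})

# 1) A single value is broadcast to every entry of the column.
df["new_col"] = "renamed_val"
print(df["new_col"].tolist())  # → ['renamed_val', 'renamed_val']

# 2) A list assigns one value per row, in row order (its length must match the number of rows).
df["new_col"] = ["renamed_val1", "renamed_val2"]
print(df["new_col"].tolist())  # → ['renamed_val1', 'renamed_val2']

# 3) A dictionary maps old values (keys) to new values (values).
df["new_col"] = df["new_col"].replace({"renamed_val1": "final_var1", "renamed_val2": "final_var2"})
print(df["new_col"].tolist())  # → ['final_var1', 'final_var2']
```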

Write metadata to the hard drive

To write the metadata table to the hard drive, use the .write() method and pass the file path, including the file name.

[14]:
metadata.write("../Tutorials/spectral_dataset/vignette_metadata.csv")
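The format and separator used by .write() are not shown here. The following plain-pandas sketch assumes a semicolon-separated file like the one read at the beginning. It shows that DataFrame.to_csv() with index=False writes a table that pd.read_csv() reads back unchanged. A temporary directory keeps the example self-contained.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"sample_ID": [1, 2], "file_name": ["3742.fcs", "4337.fcs"]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "vignette_metadata.csv")
    # index=False keeps pandas' row index out of the file, so no extra column appears on reload.
    df.to_csv(path, sep=";", index=False)
    round_trip = pd.read_csv(path, sep=";")

print(round_trip.equals(df))  # → True
```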