The Metadata object#

Metadata must be provided as a table.

There are two obligatory columns: ‘sample_ID’ and ‘file_name’. It is highly recommended to add a column ‘staining’ as well, since this information is required for the automated cofactor calculation.

‘sample_ID’ can contain any values, as long as the entries are unique; ascending integers are used here. ‘file_name’ must contain the .fcs file names, including the file extension .fcs. Only the files listed here will be read. ‘staining’ can be either ‘unstained’ or ‘stained’.
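The requirements above can be checked before the table is handed to FACSPy. The following is a minimal sketch using only the Python standard library; the helper `check_metadata` and the small inline table are hypothetical and not part of FACSPy.

```python
import csv
import io

REQUIRED_COLUMNS = ("sample_ID", "file_name")
ALLOWED_STAININGS = {"stained", "unstained"}


def check_metadata(rows):
    """Return a list of problems found in a list of row dictionaries.

    Hypothetical helper for illustration; FACSPy may perform its own checks.
    """
    if not rows:
        return ["table is empty"]
    problems = [
        f"missing column: {column}"
        for column in REQUIRED_COLUMNS
        if column not in rows[0]
    ]
    if problems:
        return problems
    ids = [row["sample_ID"] for row in rows]
    if len(ids) != len(set(ids)):
        problems.append("sample_ID entries are not unique")
    for row in rows:
        if not row["file_name"].endswith(".fcs"):
            problems.append(f"file_name without .fcs extension: {row['file_name']}")
        staining = row.get("staining")  # optional but recommended column
        if staining is not None and staining not in ALLOWED_STAININGS:
            problems.append(f"unexpected staining value: {staining}")
    return problems


# Hypothetical semicolon-separated table in the layout described above.
example = io.StringIO(
    "sample_ID;file_name;staining\n"
    "1;3742.fcs;stained\n"
    "2;4337.fcs;stained\n"
)
rows = list(csv.DictReader(example, delimiter=";"))
problems = check_metadata(rows)
print(problems or "metadata table looks fine")
```

Running such a check first makes duplicate IDs or a missing file extension visible before any .fcs file is read.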

Here, we read in example metadata as a normal dataframe via the pandas library:

[1]:

import pandas as pd

user_metadata = pd.read_csv("../Tutorials/spectral_dataset/metadata.csv", sep = ";")
user_metadata.head()

[1]:

	sample_ID	file_name	group_fd	internal_id	organ	staining	diag_main	diag_fine	donor_id	material	batch
0	1	3742.fcs	healthy	3742	PB	stained	healthy	healthy	3742	PBMC	1
1	2	4337.fcs	healthy	4337	PB	stained	healthy	healthy	4337	PBMC	1
2	3	4449.fcs	healthy	4449	PB	stained	healthy	healthy	4449	PBMC	2
3	4	5143.fcs	healthy	5143	PB	stained	healthy	healthy	5143	PBMC	2
4	5	6042.fcs	healthy	6042	PB	stained	healthy	healthy	6042	PBMC	1

Create metadata from a pandas dataframe#

In order to create a FACSPy-readable Metadata object, we use the fp.dt.Metadata class where ‘fp’ is the alias for FACSPy and ‘dt’ stands for dataset.

Here, we use the metadata table that we read with pandas above and pass it via the metadata parameter.

A Metadata object is created with 36 entries. ‘factors’ refers to the column names that describe the samples; as the output shows, ‘sample_ID’, ‘file_name’ and ‘staining’ are not listed as factors.

[2]:

import FACSPy as fp

[3]:

metadata = fp.dt.Metadata(metadata = user_metadata)
metadata

[3]:

Metadata(36 entries with factors ['group_fd', 'internal_id', 'organ', 'diag_main', 'diag_fine', 'donor_id', 'material', 'batch'])

Create metadata from a .csv file#

We can also read the metadata table directly from the hard drive. To do that, we pass the path to the fp.dt.Metadata class via the file parameter. Any file format that pd.read_csv() can read can be used.

[4]:

metadata = fp.dt.Metadata(file = "../Tutorials/spectral_dataset/metadata.csv")
metadata

[4]:

Metadata(36 entries with factors ['group_fd', 'internal_id', 'organ', 'diag_main', 'diag_fine', 'donor_id', 'material', 'batch'])

Access the metadata table#

The underlying table is stored in the .dataframe attribute and can be accessed and modified.

Access the table directly via .dataframe, or use the .to_df() method to return it. Both are shown here.

[5]:

df = metadata.dataframe
df.head()

[5]:

	sample_ID	file_name	group_fd	internal_id	organ	staining	diag_main	diag_fine	donor_id	material	batch
0	1	3742.fcs	healthy	3742	PB	stained	healthy	healthy	3742	PBMC	1
1	2	4337.fcs	healthy	4337	PB	stained	healthy	healthy	4337	PBMC	1
2	3	4449.fcs	healthy	4449	PB	stained	healthy	healthy	4449	PBMC	2
3	4	5143.fcs	healthy	5143	PB	stained	healthy	healthy	5143	PBMC	2
4	5	6042.fcs	healthy	6042	PB	stained	healthy	healthy	6042	PBMC	1

[6]:

df = metadata.to_df()
df.head()

[6]:

	sample_ID	file_name	group_fd	internal_id	organ	staining	diag_main	diag_fine	donor_id	material	batch
0	1	3742.fcs	healthy	3742	PB	stained	healthy	healthy	3742	PBMC	1
1	2	4337.fcs	healthy	4337	PB	stained	healthy	healthy	4337	PBMC	1
2	3	4449.fcs	healthy	4449	PB	stained	healthy	healthy	4449	PBMC	2
3	4	5143.fcs	healthy	5143	PB	stained	healthy	healthy	5143	PBMC	2
4	5	6042.fcs	healthy	6042	PB	stained	healthy	healthy	6042	PBMC	1

Access metadata factors#

In order to access the parameters that the user has specified in the metadata, use the .get_factors() method.

[7]:

metadata.get_factors()

[7]:

['group_fd',
 'internal_id',
 'organ',
 'diag_main',
 'diag_fine',
 'donor_id',
 'material',
 'batch']
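Comparing this list with the table columns suggests that the factors are simply all columns except ‘sample_ID’, ‘file_name’ and ‘staining’. The sketch below reproduces the list under that assumption; the name `RESERVED_COLUMNS` is hypothetical, and the rule is inferred from the output above rather than taken from the FACSPy source.

```python
# Assumption: these three columns are treated as reserved and are therefore
# not reported as factors (inferred from the output shown above).
RESERVED_COLUMNS = ("sample_ID", "file_name", "staining")

# Column names of the example metadata table.
columns = [
    "sample_ID", "file_name", "group_fd", "internal_id", "organ", "staining",
    "diag_main", "diag_fine", "donor_id", "material", "batch",
]

# Keep the original column order and drop the reserved columns.
factors = [column for column in columns if column not in RESERVED_COLUMNS]
print(factors)
```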

Rename columns#

Metadata table columns can be renamed with the .rename_column() method. It expects two arguments: the current column name and the new column name. Note that the change happens in place.

[8]:

metadata.rename_column(current_name = "batch", new_name = "newly_named_batch")
metadata.dataframe.columns

[8]:

Index(['sample_ID', 'file_name', 'group_fd', 'internal_id', 'organ',
       'staining', 'diag_main', 'diag_fine', 'donor_id', 'material',
       'newly_named_batch'],
      dtype='object')

Subset metadata#

To subset the metadata, use the .subset() method. It expects a column name and a list of entries in that column; only the rows with matching entries are kept. As the output shows, the Metadata object is modified in place.

[9]:

metadata.subset(column = "file_name", values = ["3742.fcs", "4337.fcs"])
metadata.to_df()

[9]:

	sample_ID	file_name	group_fd	internal_id	organ	staining	diag_main	diag_fine	donor_id	material	newly_named_batch
0	1	3742.fcs	healthy	3742	PB	stained	healthy	healthy	3742	PBMC	1
1	2	4337.fcs	healthy	4337	PB	stained	healthy	healthy	4337	PBMC	1
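For comparison, the same row selection can be written in plain pandas. This is a sketch on a small hypothetical table, not the FACSPy implementation; unlike the call above, it returns a new dataframe and leaves the original untouched.

```python
import pandas as pd

# Hypothetical three-row table in the layout used throughout this page.
df = pd.DataFrame({
    "sample_ID": [1, 2, 3],
    "file_name": ["3742.fcs", "4337.fcs", "4449.fcs"],
    "staining": ["stained", "stained", "stained"],
})

# Keep only the rows whose file_name is in the given list.
keep = ["3742.fcs", "4337.fcs"]
subset = df[df["file_name"].isin(keep)].reset_index(drop=True)
print(subset)
```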

Add annotations to metadata#

To add a new annotation to the metadata, we can use the .annotate() method. Currently, the file names or sample_IDs can be passed as a list or as a single value. The second argument, column, names the column that is created, and the third argument, value, specifies the value that is added.

[10]:

metadata.annotate(
    file_names = ["3742.fcs", "4337.fcs"],
    column = "new_col",
    value = "new_val"
)
metadata.dataframe.head()

C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\dataset\_supplements.py:353: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value 'new_val' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  self.dataframe.loc[self.dataframe["file_name"].isin(file_names), column] = value

[10]:

	sample_ID	file_name	group_fd	internal_id	organ	staining	diag_main	diag_fine	donor_id	material	newly_named_batch	new_col
0	1	3742.fcs	healthy	3742	PB	stained	healthy	healthy	3742	PBMC	1	new_val
1	2	4337.fcs	healthy	4337	PB	stained	healthy	healthy	4337	PBMC	1	new_val
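The FutureWarning above reports that a string was written into a float64 column. In plain pandas, one way to avoid this class of warning is to create the new column with object dtype before assigning into it. The sketch below shows that pattern on a hypothetical table; it does not change how .annotate() itself behaves.

```python
import pandas as pd

# Hypothetical table with three files, two of which get annotated.
df = pd.DataFrame({"file_name": ["3742.fcs", "4337.fcs", "4449.fcs"]})

# Create the column with object dtype so that strings can be stored in it.
df["new_col"] = pd.Series([None] * len(df), index=df.index, dtype="object")

# Assign the annotation only to the selected rows.
mask = df["file_name"].isin(["3742.fcs", "4337.fcs"])
df.loc[mask, "new_col"] = "new_val"
print(df)
```

Rows that are not selected keep a missing value in the new column.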

Rename values#

Entries can be modified using the pandas notation, or via the convenience method .rename_values(). In the next example, every entry in ‘new_col’ is renamed to ‘renamed_val’.

[11]:

metadata.rename_values("new_col", "renamed_val")
metadata.dataframe.head()

[11]:

	sample_ID	file_name	group_fd	internal_id	organ	staining	diag_main	diag_fine	donor_id	material	newly_named_batch	new_col
0	1	3742.fcs	healthy	3742	PB	stained	healthy	healthy	3742	PBMC	1	renamed_val
1	2	4337.fcs	healthy	4337	PB	stained	healthy	healthy	4337	PBMC	1	renamed_val

To rename the entries one by one, we pass a list with one new value per entry:

[12]:

metadata.rename_values("new_col", ["renamed_val1", "renamed_val2"])
metadata.dataframe.head()

[12]:

	sample_ID	file_name	group_fd	internal_id	organ	staining	diag_main	diag_fine	donor_id	material	newly_named_batch	new_col
0	1	3742.fcs	healthy	3742	PB	stained	healthy	healthy	3742	PBMC	1	renamed_val1
1	2	4337.fcs	healthy	4337	PB	stained	healthy	healthy	4337	PBMC	1	renamed_val2

Lastly, we can also pass a dictionary that maps each old value (the key) to its new value (the value).

[13]:

metadata.rename_values("new_col", {"renamed_val1": "final_var1",
                                   "renamed_val2": "final_var2"})
metadata.dataframe.head()

[13]:

	sample_ID	file_name	group_fd	internal_id	organ	staining	diag_main	diag_fine	donor_id	material	newly_named_batch	new_col
0	1	3742.fcs	healthy	3742	PB	stained	healthy	healthy	3742	PBMC	1	final_var1
1	2	4337.fcs	healthy	4337	PB	stained	healthy	healthy	4337	PBMC	1	final_var2
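The three input forms above (single value, list, dictionary) correspond to common idioms in the pandas notation. The sketch below applies them to a hypothetical two-row table; it illustrates the pandas route only and is not how .rename_values() is implemented.

```python
import pandas as pd

# Hypothetical table mirroring the annotated metadata above.
df = pd.DataFrame({
    "file_name": ["3742.fcs", "4337.fcs"],
    "new_col": ["new_val", "new_val"],
})

# 1) Single value: every entry in the column is overwritten.
df["new_col"] = "renamed_val"

# 2) List: one new value per row, in row order.
df["new_col"] = ["renamed_val1", "renamed_val2"]

# 3) Dictionary: old values (keys) are replaced by new values (values).
df["new_col"] = df["new_col"].replace({
    "renamed_val1": "final_var1",
    "renamed_val2": "final_var2",
})
print(df["new_col"].tolist())
```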

Write metadata to the hard drive#

To write the metadata table to the hard drive, use the .write() method and pass a file path that includes the file name.

[14]:

metadata.write("../Tutorials/spectral_dataset/vignette_metadata.csv")
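After writing, it can be useful to confirm that the file reads back as expected. The sketch below shows such a round trip with plain pandas on a hypothetical table and a temporary directory. It does not call .write(), and the semicolon separator is an assumption taken from the read_csv call at the top of this page.

```python
import os
import tempfile

import pandas as pd

# Hypothetical table standing in for the table returned by metadata.to_df().
df = pd.DataFrame({
    "sample_ID": [1, 2],
    "file_name": ["3742.fcs", "4337.fcs"],
    "staining": ["stained", "stained"],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "vignette_metadata.csv")
    # Write without the row index so the column layout stays unchanged.
    df.to_csv(path, sep=";", index=False)
    # Read the file back with the same separator.
    restored = pd.read_csv(path, sep=";")

print(restored)
```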
