The Metadata object
Metadata must be provided as a table.
There are two mandatory columns: ‘sample_ID’ and ‘file_name’. We strongly recommend adding a ‘staining’ column as well, since this information is required for the automated cofactor calculation.
‘sample_ID’ can contain any values, as long as they are unique; here, we use ascending integers. ‘file_name’ must contain the .fcs file names, including the .fcs extension. Only the files listed here will be read. ‘staining’ takes either ‘stained’ or ‘unstained’.
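Before handing a table to FACSPy, it can help to check these requirements yourself. The sketch below does so with pandas. `check_metadata` is a hypothetical helper written for illustration, not part of FACSPy, and it only encodes the rules stated above.

```python
import pandas as pd


def check_metadata(df: pd.DataFrame) -> list[str]:
    """Return a list of problems found in a metadata table.

    Hypothetical helper: the rules mirror the requirements described in the
    text and are not taken from FACSPy itself.
    """
    problems = []
    # The two mandatory columns must be present.
    for col in ("sample_ID", "file_name"):
        if col not in df.columns:
            problems.append(f"missing mandatory column '{col}'")
    # sample_ID may hold anything, but every entry must be unique.
    if "sample_ID" in df.columns and not df["sample_ID"].is_unique:
        problems.append("'sample_ID' entries are not unique")
    # file_name entries must include the .fcs extension.
    if "file_name" in df.columns:
        has_ext = df["file_name"].astype(str).str.endswith(".fcs")
        bad = df.loc[~has_ext, "file_name"]
        if len(bad):
            problems.append(f"file names without .fcs extension: {list(bad)}")
    # staining is optional but recommended; only two values are allowed.
    if "staining" in df.columns:
        invalid = set(df["staining"]) - {"stained", "unstained"}
        if invalid:
            problems.append(f"unexpected 'staining' values: {sorted(invalid)}")
    else:
        problems.append("no 'staining' column (needed for automated cofactor calculation)")
    return problems


metadata_table = pd.DataFrame({
    "sample_ID": [1, 2, 3],
    "file_name": ["3742.fcs", "4337.fcs", "4449.fcs"],
    "staining": ["stained", "stained", "unstained"],
})
print(check_metadata(metadata_table))  # prints []
```

Returning a list of messages instead of raising on the first problem lets you see every issue in one pass.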
Here, we read the example metadata into a regular pandas DataFrame. The example file is semicolon-separated, hence sep=";":
```python
import pandas as pd

user_metadata = pd.read_csv("../Tutorials/spectral_dataset/metadata.csv", sep=";")
user_metadata.head()
```
| | sample_ID | file_name | group_fd | internal_id | organ | staining | diag_main | diag_fine | donor_id | material | batch |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3742.fcs | healthy | 3742 | PB | stained | healthy | healthy | 3742 | PBMC | 1 |
| 1 | 2 | 4337.fcs | healthy | 4337 | PB | stained | healthy | healthy | 4337 | PBMC | 1 |
| 2 | 3 | 4449.fcs | healthy | 4449 | PB | stained | healthy | healthy | 4449 | PBMC | 2 |
| 3 | 4 | 5143.fcs | healthy | 5143 | PB | stained | healthy | healthy | 5143 | PBMC | 2 |
| 4 | 5 | 6042.fcs | healthy | 6042 | PB | stained | healthy | healthy | 6042 | PBMC | 1 |
Create metadata from a pandas DataFrame
To create a Metadata object that FACSPy can work with, we use the fp.dt.Metadata class, where ‘fp’ is the alias for FACSPy and ‘dt’ stands for dataset.
In this scenario, we pass the table read with pandas above via the metadata parameter.
The result is a Metadata object with 36 entries, one per row of the table (.head() above only displays the first five). ‘factors’ lists the names of the remaining annotation columns, i.e. every column except ‘sample_ID’, ‘file_name’ and ‘staining’ in this example.
```python
import FACSPy as fp

metadata = fp.dt.Metadata(metadata=user_metadata)
metadata
```
```
Metadata(36 entries with factors ['group_fd', 'internal_id', 'organ', 'diag_main', 'diag_fine', 'donor_id', 'material', 'batch'])
```
Create metadata from a .csv file
We can also read the metadata table directly from the hard drive by passing the file path to the fp.dt.Metadata class via the file parameter. Any file format that pd.read_csv() can read is supported.
```python
metadata = fp.dt.Metadata(file="../Tutorials/spectral_dataset/metadata.csv")
metadata
```
```
Metadata(36 entries with factors ['group_fd', 'internal_id', 'organ', 'diag_main', 'diag_fine', 'donor_id', 'material', 'batch'])
```
Access the metadata table
The underlying table is stored in the .dataframe attribute, where it can be accessed and modified. Alternatively, the .to_df() method returns the table. Both options are shown here.
```python
df = metadata.dataframe
df.head()
```
| | sample_ID | file_name | group_fd | internal_id | organ | staining | diag_main | diag_fine | donor_id | material | batch |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3742.fcs | healthy | 3742 | PB | stained | healthy | healthy | 3742 | PBMC | 1 |
| 1 | 2 | 4337.fcs | healthy | 4337 | PB | stained | healthy | healthy | 4337 | PBMC | 1 |
| 2 | 3 | 4449.fcs | healthy | 4449 | PB | stained | healthy | healthy | 4449 | PBMC | 2 |
| 3 | 4 | 5143.fcs | healthy | 5143 | PB | stained | healthy | healthy | 5143 | PBMC | 2 |
| 4 | 5 | 6042.fcs | healthy | 6042 | PB | stained | healthy | healthy | 6042 | PBMC | 1 |
```python
df = metadata.to_df()
df.head()
```
| | sample_ID | file_name | group_fd | internal_id | organ | staining | diag_main | diag_fine | donor_id | material | batch |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3742.fcs | healthy | 3742 | PB | stained | healthy | healthy | 3742 | PBMC | 1 |
| 1 | 2 | 4337.fcs | healthy | 4337 | PB | stained | healthy | healthy | 4337 | PBMC | 1 |
| 2 | 3 | 4449.fcs | healthy | 4449 | PB | stained | healthy | healthy | 4449 | PBMC | 2 |
| 3 | 4 | 5143.fcs | healthy | 5143 | PB | stained | healthy | healthy | 5143 | PBMC | 2 |
| 4 | 5 | 6042.fcs | healthy | 6042 | PB | stained | healthy | healthy | 6042 | PBMC | 1 |
Access metadata factors
To list the factors, i.e. the annotation columns specified by the user, use the .get_factors() method.
```python
metadata.get_factors()
```
```
['group_fd',
 'internal_id',
 'organ',
 'diag_main',
 'diag_fine',
 'donor_id',
 'material',
 'batch']
```
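Comparing this list with the table columns suggests that the factors are all columns except ‘sample_ID’, ‘file_name’ and ‘staining’. The plain-Python snippet below reproduces that observation. The rule is inferred from this example alone, not taken from FACSPy's source, so treat it as an assumption.

```python
# Column names of the example table, copied from the output above.
columns = ["sample_ID", "file_name", "group_fd", "internal_id", "organ",
           "staining", "diag_main", "diag_fine", "donor_id", "material", "batch"]

# Assumption: factors are every column except these reserved ones.
RESERVED = {"sample_ID", "file_name", "staining"}
factors = [c for c in columns if c not in RESERVED]
print(factors)
# prints ['group_fd', 'internal_id', 'organ', 'diag_main', 'diag_fine', 'donor_id', 'material', 'batch']
```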
Rename columns
Columns of the metadata table can be renamed with the .rename_column() method. It expects two arguments: the current column name (current_name) and the new column name (new_name). Note that the change happens in place.
```python
metadata.rename_column(current_name="batch", new_name="newly_named_batch")
metadata.dataframe.columns
```
```
Index(['sample_ID', 'file_name', 'group_fd', 'internal_id', 'organ',
       'staining', 'diag_main', 'diag_fine', 'donor_id', 'material',
       'newly_named_batch'],
      dtype='object')
```
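For comparison, the same rename in plain pandas uses DataFrame.rename(), which by default returns a new DataFrame instead of modifying the original. The small table here is a made-up excerpt of the example metadata, not output from FACSPy.

```python
import pandas as pd

# Made-up excerpt of the example metadata.
df = pd.DataFrame({
    "sample_ID": [1, 2],
    "file_name": ["3742.fcs", "4337.fcs"],
    "batch": [1, 1],
})

# rename() returns a renamed copy; df itself keeps the old column name.
renamed = df.rename(columns={"batch": "newly_named_batch"})
print(list(renamed.columns))  # prints ['sample_ID', 'file_name', 'newly_named_batch']
```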
Subset metadata
To subset the metadata, use the .subset() method. It expects a column name (column) and a list of the values in that column to keep (values). As the output shows, the subsetting happens in place.
```python
metadata.subset(column="file_name", values=["3742.fcs", "4337.fcs"])
metadata.to_df()
```
| | sample_ID | file_name | group_fd | internal_id | organ | staining | diag_main | diag_fine | donor_id | material | newly_named_batch |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3742.fcs | healthy | 3742 | PB | stained | healthy | healthy | 3742 | PBMC | 1 |
| 1 | 2 | 4337.fcs | healthy | 4337 | PB | stained | healthy | healthy | 4337 | PBMC | 1 |
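In plain pandas, the equivalent row selection is a boolean mask built with Series.isin(). This is a sketch on a made-up excerpt of the example table, not FACSPy's implementation; as in the output above, the original row index is kept.

```python
import pandas as pd

# Made-up excerpt of the example metadata.
df = pd.DataFrame({
    "sample_ID": [1, 2, 3],
    "file_name": ["3742.fcs", "4337.fcs", "4449.fcs"],
})

# True for every row whose file_name appears in the list.
keep = df["file_name"].isin(["3742.fcs", "4337.fcs"])
subset = df[keep]
print(subset["sample_ID"].tolist())  # prints [1, 2]
```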
Add annotations to metadata
To add a new annotation to the metadata, use the .annotate() method. The samples to annotate are currently selected by their file names or sample IDs, passed either as a list or as a single value; here, we use file_names. The column argument specifies the column to be created, and value the entry written for the selected samples.
```python
metadata.annotate(
    file_names=["3742.fcs", "4337.fcs"],
    column="new_col",
    value="new_val",
)
metadata.dataframe.head()
```
```
C:\Users\tarik\anaconda3\envs\FACSPypeline\lib\site-packages\FACSPy\dataset\_supplements.py:353: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value 'new_val' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  self.dataframe.loc[self.dataframe["file_name"].isin(file_names), column] = value
```
| | sample_ID | file_name | group_fd | internal_id | organ | staining | diag_main | diag_fine | donor_id | material | newly_named_batch | new_col |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3742.fcs | healthy | 3742 | PB | stained | healthy | healthy | 3742 | PBMC | 1 | new_val |
| 1 | 2 | 4337.fcs | healthy | 4337 | PB | stained | healthy | healthy | 4337 | PBMC | 1 | new_val |
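The FutureWarning above comes from pandas: according to its message, a string is being written into a column that pandas holds as float64 (presumably because the column starts out filled with missing values). It does not change the result here, but the message states that a future pandas version will raise an error instead. When annotating a DataFrame yourself, one way to avoid the warning is to create the column with object dtype before assigning strings. This is a sketch on a made-up excerpt, not FACSPy's code.

```python
import pandas as pd

# Made-up excerpt of the example metadata.
df = pd.DataFrame({
    "sample_ID": [1, 2, 3],
    "file_name": ["3742.fcs", "4337.fcs", "4449.fcs"],
})
mask = df["file_name"].isin(["3742.fcs", "4337.fcs"])

# Create the new column as object dtype (filled with NaN) up front, so that
# writing strings into it is not an incompatible-dtype assignment.
df["new_col"] = pd.Series(index=df.index, dtype="object")
df.loc[mask, "new_col"] = "new_val"
```

Rows outside the selection keep a missing value in the new column.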
Rename values
Entries can be modified using standard pandas notation or with the convenience method .rename_values(). In the following example, every entry of ‘new_col’ is renamed to ‘renamed_val’.
```python
metadata.rename_values("new_col", "renamed_val")
metadata.dataframe.head()
```
| | sample_ID | file_name | group_fd | internal_id | organ | staining | diag_main | diag_fine | donor_id | material | newly_named_batch | new_col |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3742.fcs | healthy | 3742 | PB | stained | healthy | healthy | 3742 | PBMC | 1 | renamed_val |
| 1 | 2 | 4337.fcs | healthy | 4337 | PB | stained | healthy | healthy | 4337 | PBMC | 1 | renamed_val |
To rename the entries individually, pass a list with one new value per entry:
```python
metadata.rename_values("new_col", ["renamed_val1", "renamed_val2"])
metadata.dataframe.head()
```
| | sample_ID | file_name | group_fd | internal_id | organ | staining | diag_main | diag_fine | donor_id | material | newly_named_batch | new_col |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3742.fcs | healthy | 3742 | PB | stained | healthy | healthy | 3742 | PBMC | 1 | renamed_val1 |
| 1 | 2 | 4337.fcs | healthy | 4337 | PB | stained | healthy | healthy | 4337 | PBMC | 1 | renamed_val2 |
Lastly, we can pass a dictionary that maps the old values (keys) to the new values (values):
```python
metadata.rename_values("new_col", {"renamed_val1": "final_var1",
                                   "renamed_val2": "final_var2"})
metadata.dataframe.head()
```
| | sample_ID | file_name | group_fd | internal_id | organ | staining | diag_main | diag_fine | donor_id | material | newly_named_batch | new_col |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3742.fcs | healthy | 3742 | PB | stained | healthy | healthy | 3742 | PBMC | 1 | final_var1 |
| 1 | 2 | 4337.fcs | healthy | 4337 | PB | stained | healthy | healthy | 4337 | PBMC | 1 | final_var2 |
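The same three renaming modes can be written in standard pandas notation. This is a sketch on a two-row stand-in for ‘new_col’; whether .rename_values() uses exactly these operations internally is not documented here.

```python
import pandas as pd

# Two-row stand-in for the annotated column.
df = pd.DataFrame({"new_col": ["new_val", "new_val"]})

# 1) Scalar: every entry gets the same value.
df["new_col"] = "renamed_val"

# 2) List: one value per row, in row order (length must match the row count).
df["new_col"] = ["renamed_val1", "renamed_val2"]

# 3) Dict: map old values to new ones; values missing from the dict stay unchanged.
df["new_col"] = df["new_col"].replace({"renamed_val1": "final_var1",
                                       "renamed_val2": "final_var2"})
print(df["new_col"].tolist())  # prints ['final_var1', 'final_var2']
```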
Write metadata to the hard drive
To write the metadata table to the hard drive, use the .write() method and pass the file path, including the file name:
```python
metadata.write("../Tutorials/spectral_dataset/vignette_metadata.csv")
```
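If you would rather write a table with pandas directly, for example one returned by metadata.to_df(), DataFrame.to_csv() does the job. The sketch below writes a made-up excerpt to a temporary directory and reads it back. sep=";" is chosen only to match the example metadata.csv and is an assumption, not a statement about the delimiter that .write() uses.

```python
import os
import tempfile

import pandas as pd

# Made-up excerpt of the example metadata.
df = pd.DataFrame({
    "sample_ID": [1, 2],
    "file_name": ["3742.fcs", "4337.fcs"],
    "staining": ["stained", "stained"],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vignette_metadata.csv")
    # index=False keeps the row index out of the file.
    df.to_csv(path, sep=";", index=False)
    # Read the file back with the same delimiter to check the round trip.
    roundtrip = pd.read_csv(path, sep=";")
```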