Samples are normally annotated with the use of
descriptor_schema. However in some cases the fields defined in
DescriptorSchema do not suffice and it comes handy to upload sample
annotations in a table where each row holds information about some
sample in collection. In general, there can be multiple rows referring
to the same sample in the collection (for example one sample received
two or more distinct treatments). In such cases one can upload this
tables with the process Metadata table. However, quite often there is
exactly one-on-one mapping between rows in such table and samples in
collection. In such case, please use the “unique” flavour of the above
process, Metadata table (one-to-one).
Metadata in ReSDK is just a special kind of
Data resource that
simplifies retrieval of the above mentioned tables. In addition to all
of the functionality of
Data, it also has two additional attributes:
# The "df" attribute is pandas.DataFrame of the output named "table" # The index of df are sample ID's m.df # Attribute "unique" is signalling if this is metadata is unique or not m.unique
Behind the scenes,
df is not an attribute but rather a property.
So it has getter and setter methods (
This comes handy if the default parsing logic does not suffice. In
such cases you can provide your own parser and keyword arguments for
import pandas m.get_df(parser=pandas.read_csv, sep="\t", skiprows=[1, 2, 3])
In the most common case, Metadata objects exist somewhere on Resolwe server and user just fetches them:
# Get one metadata by slug m = res.metadata.get("my-slug") # Filter metadata by some conditions, e.g. get all metadata # from a given collection: ms = res.metadata.filter(collection=<my-collection>):
Sometimes, these objects need to be updated, and one can easily do that.
unique are upload protected - they can be set
during object creation but cannot be set afterwards:
m.unique = False # Will fail on already existing object m.df = <new-df> # Will fail on already existing object
Sometimes one wishes to create a new Metadata. This can be achieved in the same manner as for other ReSDK resources:
m = res.metadata.create(df=<my-df>, collection=<my-collection>) # Creating metadata without specifying df / collection will fail m = res.metdata.create() # Fail m = res.metdata.create(collection=<my-collection>) # Fail m = res.metdata.create(df=<my-df>) # Fail
Alternatively, one can also build this object gradually from scratch and
m = Metadata(resolwe=<resolwe>) m.collection = <my-collection> my_df = m.set_index(<my-df>) m.df = my_df m.save()
m.set_index(<my-df>) is a helper function that finds
column or index name, maps it to
Sample ID and sets it as index.
This function is recommended to use because the validation step is trying to
m.df index with
m.collection sample ID’s.
Deleting Metadata works the same as for any other resource. Be careful, this cannot be undone and you need to have sufficient permissions: