SDK Reference¶
Resolwe¶
- class resdk.Resolwe(username=None, password=None, url=None)[source]¶
Connect to a Resolwe server.
- Parameters
- data_usage(**query_params)[source]¶
Get per-user data usage information.
Display number of samples, data objects and sum of data object sizes for currently logged-in user. For admin users, display data for all users.
- login(username=None, password=None)[source]¶
Interactive login.
Ask the user to enter credentials in command prompt. If username / email and password are given, login without prompt.
- run(slug=None, input={}, descriptor=None, descriptor_schema=None, collection=None, data_name='', process_resources=None)[source]¶
Run process and return the corresponding Data object.
Upload files referenced in inputs
Create a Data object with the given inputs
Run the command that processes inputs into outputs
Return the Data object
The processing runs asynchronously, so the returned Data object does not have an OK status or outputs when returned. Use data.update() to refresh the Data resource object.
- Parameters
slug (str) – Process slug (human readable unique identifier)
input (dict) – Input values
descriptor (dict) – Descriptor values
descriptor_schema (str) – A valid descriptor schema slug
collection (int/resource) – Collection resource or its ID into which the data object should be included
data_name (str) – Default name of data object
process_resources (dict) – Process resources
- Returns
data object that was just created
- Return type
Data object
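Because processing is asynchronous, a caller usually has to refresh the returned object until it reaches a final status. The following is a minimal sketch of such a polling loop, not part of resdk: `wait_until_done` is a hypothetical helper, and it assumes only that the object exposes `update()` and `status` as described in this reference, with `OK` and `ER` treated as final statuses.

```python
import time


def wait_until_done(data, poll_seconds=5.0, timeout=600.0, sleep=time.sleep):
    """Refresh ``data`` until its status is OK or ER, then return the status.

    ``data`` is assumed to expose ``update()`` and ``status``;
    ``wait_until_done`` itself is a hypothetical helper, not a resdk function.
    """
    waited = 0.0
    while True:
        data.update()  # re-read the Data object's fields from the server
        if data.status in ("OK", "ER"):
            return data.status
        if waited >= timeout:
            raise TimeoutError(f"Data object still has status {data.status!r}")
        sleep(poll_seconds)
        waited += poll_seconds
```

With a connected Resolwe instance, the helper would be called on the object returned by run(). The `sleep` parameter exists only so the loop can be exercised without real delays.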
Resolwe Query¶
- class resdk.ResolweQuery(resolwe, resource, slug_field='slug')[source]¶
Query resource endpoints.
A Resolwe instance (for example “res”) has several endpoints:
res.data
res.collection
res.sample
res.process
…
Each such endpoint is an instance of the ResolweQuery class. ResolweQuery supports queries on corresponding objects, for example:
res.data.get(42) # return Data object with ID 42.
res.sample.filter(contributor=1) # return all samples made by contributor 1
This object is lazily loaded, which means that the actual request is made only when needed. This enables composing multiple filters, for example:
res.data.filter(contributor=1).filter(name='My object')
is the same as:
res.data.filter(contributor=1, name='My object')
This is especially useful because all endpoints of a Resolwe instance are such queries and can be filtered further before transferring any data.
To get a list of all supported query parameters, use one that does not exist and you will get a helpful error message with a list of allowed ones.
res.data.filter(foo="bar")
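The lazy behaviour described above can be illustrated with a toy class. This is a sketch of the general pattern only, not resdk's implementation: `LazyQuery`, `fake_fetch` and the sample rows are all hypothetical, and the "request" is a plain function call that records when it happens.

```python
class LazyQuery:
    """Toy stand-in for a lazily evaluated query (not resdk's implementation)."""

    def __init__(self, fetch, filters=None):
        self._fetch = fetch                  # callable that performs the "request"
        self._filters = dict(filters or {})
        self._cache = None                   # filled on first evaluation

    def filter(self, **params):
        # Return a new query with merged filters; no request is made here.
        return LazyQuery(self._fetch, {**self._filters, **params})

    def __iter__(self):
        if self._cache is None:              # the request happens only now
            self._cache = list(self._fetch(self._filters))
        return iter(self._cache)


calls = []


def fake_fetch(filters):
    """Record each 'request' and return the rows matching all filters."""
    calls.append(dict(filters))
    rows = [
        {"id": 1, "contributor": 1, "name": "My object"},
        {"id": 2, "contributor": 1, "name": "Other"},
        {"id": 3, "contributor": 2, "name": "My object"},
    ]
    return [r for r in rows if all(r[k] == v for k, v in filters.items())]


chained = LazyQuery(fake_fetch).filter(contributor=1).filter(name="My object")
combined = LazyQuery(fake_fetch).filter(contributor=1, name="My object")
requests_before = len(calls)                 # nothing has been evaluated yet
chained_ids = [row["id"] for row in chained]
combined_ids = [row["id"] for row in combined]
```

Building either query issues no request; the data is fetched only when the query is iterated, and the chained and combined forms select the same rows.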
- all()[source]¶
Return copy of the current queryset.
This is a handy function to get a newly created query without any filters.
- delete(force=False)[source]¶
Delete objects in current query.
- Parameters
force (bool) – Do not trigger confirmation prompt. WARNING: Be sure that you really know what you are doing as deleted objects are not recoverable.
- get(*args, **kwargs)[source]¶
Get object that matches given parameters.
If only one non-keyworded argument is given, it is treated as an ID if it is a number and as a slug otherwise.
- Parameters
uid (int for ID or string for slug) – unique identifier - ID or slug
- Return type
object of type self.resource
- Raises
ValueError – if non-keyworded and keyworded arguments are combined or if more than one non-keyworded argument is given
LookupError – if none or more than one objects are returned
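The argument rules above can be summarized in a few lines. This is a hedged sketch rather than resdk's code: `interpret_get_args` is a hypothetical helper, and treating "number" as a Python `int` is an assumption.

```python
def interpret_get_args(*args, **kwargs):
    """Turn get()-style arguments into a dict of query parameters.

    Hypothetical helper mirroring the documented rules; not a resdk function.
    """
    if args and kwargs:
        raise ValueError("Positional and keyword arguments cannot be combined.")
    if len(args) > 1:
        raise ValueError("Only one positional argument is allowed.")
    if args:
        uid = args[0]
        # Assumption: an int is treated as an ID, anything else as a slug.
        key = "id" if isinstance(uid, int) else "slug"
        return {key: uid}
    return dict(kwargs)
```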
- iterate(chunk_size=100)[source]¶
Iterate through query.
This can come in handy when one wishes to iterate through hundreds or thousands of objects and would otherwise get “504 Gateway-timeout”.
The method cannot be used together with the following filters: limit, offset and ordering, and will raise a ValueError.
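One plausible way to iterate in chunks is limit/offset paging, which would also explain why those filters cannot be supplied by the caller at the same time. The sketch below shows that general technique under this assumption; `iterate_in_chunks` and `fetch_page` are hypothetical names, and resdk's actual mechanism may differ.

```python
def iterate_in_chunks(fetch_page, chunk_size=100):
    """Yield objects one by one, requesting ``chunk_size`` of them at a time.

    ``fetch_page(limit, offset)`` is a hypothetical callable that returns a
    list with at most ``limit`` objects, starting at position ``offset``.
    """
    offset = 0
    while True:
        page = fetch_page(limit=chunk_size, offset=offset)
        yield from page
        if len(page) < chunk_size:   # a short page means the end was reached
            return
        offset += chunk_size
```

Each request stays small, so no single response has to carry thousands of objects.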
Resources¶
Resource classes¶
- class resdk.resources.base.BaseResource(resolwe, **model_data)[source]¶
Abstract resource.
One and only one of the identifiers (slug, id or model_data) should be given.
- Parameters
resolwe (Resolwe object) – Resolwe instance
model_data – Resource model data
- delete(force=False)[source]¶
Delete the resource object from the server.
- Parameters
force (bool) – Do not trigger confirmation prompt. WARNING: Be sure that you really know what you are doing as deleted objects are not recoverable.
- classmethod fetch_object(resolwe, id=None, slug=None)[source]¶
Return resource instance that is uniquely defined by identifier.
- id¶
unique identifier of an object
- class resdk.resources.base.BaseResolweResource(resolwe, **model_data)[source]¶
Base class for Resolwe resources.
One and only one of the identifiers (slug, id or model_data) should be given.
- Parameters
resolwe (Resolwe object) – Resolwe instance
model_data – Resource model data
- property contributor¶
Contributor.
- property created¶
Creation time.
- current_user_permissions¶
current user permissions
- property modified¶
Modification time.
- name¶
name of resource
- property permissions¶
Permissions.
- slug¶
human-readable unique identifier
- version¶
resource version
- class resdk.resources.Data(resolwe, **model_data)[source]¶
Resolwe Data resource.
- Parameters
resolwe (Resolwe object) – Resolwe instance
model_data – Resource model data
- checksum¶
checksum field calculated on inputs
- property children¶
Get children of this Data object.
- property collection¶
Get collection.
- descriptor¶
annotation data, with the form defined in descriptor_schema
- descriptor_dirty¶
indicate whether descriptor doesn’t match descriptor_schema (is dirty)
- property descriptor_schema¶
Get descriptor schema.
- download(file_name=None, field_name=None, download_dir=None)[source]¶
Download Data object’s files and directories.
Download files and directories from the Resolwe server to the download directory (defaults to the current working directory).
- Parameters
file_name (string) – name of file or directory
field_name (string) – file or directory field name
download_dir (string) – download path
- Return type
None
Data objects can contain multiple files and directories. All are downloaded by default, but may be filtered by name or output field:
res.data.get(42).download(file_name='alignment7.bam')
res.data.get(42).download(field_name='bam')
- duplicate(inherit_collection=False)[source]¶
Duplicate (make copy of) data object.
- Parameters
inherit_collection – If True then duplicated data will be added to collection of the original data.
- Returns
Duplicated data object
- duplicated¶
duplicated
- files(file_name=None, field_name=None)[source]¶
Get list of downloadable file fields.
Filter files by file name or output field.
- Parameters
file_name (string) – name of file
field_name (string) – output field name
- Return type
List of tuples (data_id, file_name, field_name, process_type)
- property finished¶
Get finish time.
- input¶
actual input values
- output¶
actual output values
- property parents¶
Get parents of this Data object.
- property process¶
Get process.
- process_cores¶
process cores
- process_error¶
error log message (list of strings)
- process_info¶
info log message (list of strings)
- process_memory¶
process memory
- process_progress¶
process progress in percentage
- process_rc¶
Process algorithm return code
- process_resources¶
process_resources
- process_warning¶
warning log message (list of strings)
- property sample¶
Get sample.
- scheduled¶
scheduled
- size¶
size
- property started¶
Get start time.
- status¶
process status. Possible values:
UP (Uploading - for upload processes)
RE (Resolving - computing input data objects)
WT (Waiting - waiting for process since the queue is full)
PP (Preparing - preparing the environment for processing)
PR (Processing)
OK (Done)
ER (Error)
DR (Dirty - Data is dirty)
- stdout()[source]¶
Return process standard output (stdout.txt file content).
Fetch stdout.txt file from the corresponding Data object and return the file content as string. The string can be long and ugly.
- Return type
string
- tags¶
data object’s tags
- class resdk.resources.collection.BaseCollection(resolwe, **model_data)[source]¶
Abstract collection resource.
One and only one of the identifiers (slug, id or model_data) should be given.
- Parameters
resolwe (Resolwe object) – Resolwe instance
model_data – Resource model data
- property data¶
Return list of attached Data objects.
- description¶
description
- descriptor¶
descriptor
- descriptor_dirty¶
descriptor_dirty
- property descriptor_schema¶
Descriptor schema.
- download(file_name=None, field_name=None, download_dir=None)[source]¶
Download output files of associated Data objects.
Download files from the Resolwe server to the download directory (defaults to the current working directory).
- Parameters
file_name (string) – name of file
field_name (string) – field name
download_dir (string) – download path
- Return type
None
Collections can contain multiple Data objects and Data objects can contain multiple files. All files are downloaded by default, but may be filtered by file name or field name:
res.collection.get(42).download(file_name='alignment7.bam')
res.collection.get(42).download(field_name='bam')
- duplicated¶
duplicated
- settings¶
settings
- tags¶
tags
- class resdk.resources.Collection(resolwe, **model_data)[source]¶
Resolwe Collection resource.
- Parameters
resolwe (Resolwe object) – Resolwe instance
model_data – Resource model data
- property data¶
Return list of data objects on collection.
- property relations¶
Return list of relations on collection.
- property samples¶
Return list of samples on collection.
- class resdk.resources.Sample(resolwe, **model_data)[source]¶
Resolwe Sample resource.
- Parameters
resolwe (Resolwe object) – Resolwe instance
model_data – Resource model data
- property background¶
Get background sample of the current one.
- property collection¶
Get collection.
- property data¶
Get data.
- duplicate(inherit_collection=False)[source]¶
Duplicate (make copy of) sample object.
- Parameters
inherit_collection – If True then duplicated samples (and their data) will be added to collection of the original sample.
- Returns
Duplicated sample
- property is_background¶
Return True if given sample is background to any other and False otherwise.
- property relations¶
Get Relation objects for this sample.
- class resdk.resources.Relation(resolwe, **model_data)[source]¶
Resolwe Relation resource.
- Parameters
resolwe (Resolwe object) – Resolwe instance
model_data – Resource model data
- category¶
category of the relation
- property collection¶
Return collection object to which relation belongs.
- partitions¶
list of RelationPartition objects in the Relation
- property samples¶
Return list of sample objects in the relation.
- type¶
type of the relation
- unit¶
unit (where applicable, e.g. for series)
- class resdk.resources.Process(resolwe, **model_data)[source]¶
Resolwe Process resource.
- Parameters
resolwe (Resolwe object) – Resolwe instance
model_data – Resource model data
- category¶
used to group processes in a GUI. Examples: upload:, analyses:variants:, …
- data_name¶
the default name of a data object using this process. When a data object is created you can assign a name to it. If you don’t, the name of the data object is determined from this field. The field is an expression which can take values of other fields.
- description¶
process description
- entity_always_create¶
entity_always_create
- entity_descriptor_schema¶
entity_descriptor_schema
- entity_input¶
entity_input
- entity_type¶
entity_type
- input_schema¶
specifications of inputs
- is_active¶
Boolean stating whether process is active
- output_schema¶
specification of outputs
- persistence¶
Measure of how important it is to keep the process outputs when optimizing disk usage. Options: RAW/CACHED/TEMP. For processes used on the frontend use TEMP - the results of these processes can be quickly re-calculated at any time. For upload processes use RAW - this data should never be deleted, since it cannot be re-calculated. For analysis use CACHED - the results can still be calculated from imported data but it can take time.
- priority¶
process priority - not used yet
- requirements¶
required Docker image, amount of memory / CPU …
- run¶
the heart of process - here the algorithm is defined.
- scheduling_class¶
Scheduling class
- type¶
the type of process
"type:sub_type:sub_sub_type:..."
- class resdk.resources.DescriptorSchema(resolwe, **model_data)[source]¶
Resolwe DescriptorSchema resource.
- Parameters
resolwe (Resolwe object) – Resolwe instance
model_data – Resource model data
- description¶
description
- schema¶
schema
- class resdk.resources.User(resolwe=None, **model_data)[source]¶
Resolwe User resource.
One and only one of the identifiers (slug, id or model_data) should be given.
- Parameters
resolwe (Resolwe object) – Resolwe instance
model_data – Resource model data
- first_name¶
user’s first name
- class resdk.resources.Group(resolwe=None, **model_data)[source]¶
Resolwe Group resource.
One and only one of the identifiers (slug, id or model_data) should be given.
- Parameters
resolwe (Resolwe object) – Resolwe instance
model_data – Resource model data
- name¶
group’s name
- property users¶
Return list of users in group.
- class resdk.resources.Geneset(resolwe, genes=None, source=None, species=None, **model_data)[source]¶
Resolwe Geneset resource.
- Parameters
resolwe (Resolwe object) – Resolwe instance
model_data – Resource model data
- property genes¶
Get genes.
- save()[source]¶
Save Geneset to the server.
If Geneset is already on the server update with save() from base class. Otherwise, create a new Geneset by running process with slug “create-geneset”.
- set_operator(operator, other)[source]¶
Perform set operations on Geneset object by creating a new Geneset.
- Parameters
operator – string -> set operation function name
other – Geneset object
- Returns
new Geneset object
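To illustrate what "set operation function name" can mean, here is a small sketch that works on plain gene lists instead of Geneset objects. It assumes the operator string names a method of Python's built-in `set`; `apply_set_operator` and the list of allowed names are hypothetical and do not describe which strings resdk itself accepts.

```python
# Assumption: these four method names of the built-in set type are the
# operations of interest; resdk may accept different strings.
ALLOWED_OPERATORS = ("union", "intersection", "difference", "symmetric_difference")


def apply_set_operator(operator, genes, other_genes):
    """Apply the set method named ``operator`` and return a new sorted list."""
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"Unknown set operation: {operator!r}")
    method = getattr(set, operator)          # e.g. set.union
    return sorted(method(set(genes), set(other_genes)))
```

Neither input is modified; the result is always a new collection, in the same spirit as returning a new Geneset object.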
- property source¶
Get source.
- property species¶
Get species.
- class resdk.resources.Metadata(resolwe, **model_data)[source]¶
Metadata resource.
- Parameters
resolwe (Resolwe object) – Resolwe instance
model_data – Resource model data
- property df¶
Get table as pd.DataFrame.
- property df_bytes¶
Get file contents of table output in bytes form.
- save()[source]¶
Save Metadata to the server.
If Metadata is already uploaded: update. Otherwise, create new one.
- set_index(df)[source]¶
Set index of df to Sample ID.
If there is a column with Sample ID just set that as index. If there is a Sample name or Sample slug column, map sample name / slug to sample IDs and set the IDs as an index. If there is no suitable column, raise an error. Works also if any of the above options is already an index with appropriate name.
- property unique¶
Get unique attribute.
This attribute tells if Metadata has one-to-one or one-to-many relation to collection samples.
- validate_df(df)[source]¶
Validate df property.
Validates that df:
is an instance of pandas.DataFrame
index contains sample IDs that match some samples:
If there are no matches, raise warning
If there are samples in df but not in collection, raise warning
If there are samples in collection but not in df, raise warning
- class resdk.resources.kb.Feature(resolwe, **model_data)[source]¶
Knowledge base Feature resource.
- aliases¶
Aliases
- description¶
Description
- feature_id¶
Feature ID
- full_name¶
Full name
- name¶
Name
- source¶
Source
- species¶
Species
- sub_type¶
Feature subtype (tRNA, protein coding, rRNA, …)
- type¶
Feature type (gene, transcript, exon, …)
Permissions¶
Resources like resdk.resources.Data, resdk.resources.Collection, resdk.resources.Sample, and resdk.resources.Process include a permissions attribute to manage permissions. The permissions attribute is an instance of resdk.resources.permissions.PermissionsManager.
- class resdk.resources.permissions.PermissionsManager(all_permissions, api_root, resolwe)[source]¶
Helper class to manage permissions of the BaseResource.
- property editors¶
Get users with edit permission.
- property owners¶
Get users with owner permission.
- set_group(group, perm)[source]¶
Set perm permission to group.
When assigning permissions, only the highest permission needs to be given. Permission hierarchy is:
none (no permissions)
view
edit
share
owner
Some examples:
collection = res.collection.get(...)
# Add share, edit and view permission to BioLab:
collection.permissions.set_group('biolab', 'share')
# Remove share and edit permission from BioLab:
collection.permissions.set_group('biolab', 'view')
# Remove all permissions from BioLab:
collection.permissions.set_group('biolab', 'none')
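The hierarchy means that setting one permission implies every permission below it. A minimal sketch of that rule, using a hypothetical `implied_permissions` helper that is not part of resdk:

```python
# Permission names in increasing order, as listed in the hierarchy above.
PERMISSION_ORDER = ["view", "edit", "share", "owner"]


def implied_permissions(perm):
    """Return every permission granted when ``perm`` is the highest one set."""
    if perm == "none":
        return []
    if perm not in PERMISSION_ORDER:
        raise ValueError(f"Unknown permission: {perm!r}")
    return PERMISSION_ORDER[: PERMISSION_ORDER.index(perm) + 1]
```

This is why setting 'view' on a group that previously had 'share' removes its share and edit permissions.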
- set_public(perm)[source]¶
Set perm permission for public.
Public can only get two sorts of permissions:
none (no permissions)
view
Some examples:
collection = res.collection.get(...)
# Add view permission to public:
collection.permissions.set_public('view')
# Remove view permission from public:
collection.permissions.set_public('none')
- set_user(user, perm)[source]¶
Set perm permission to user.
When assigning permissions, only the highest permission needs to be given. Permission hierarchy is:
none (no permissions)
view
edit
share
owner
Some examples:
collection = res.collection.get(...)
# Add share, edit and view permission to John:
collection.permissions.set_user('john', 'share')
# Remove share and edit permission from John:
collection.permissions.set_user('john', 'view')
# Remove all permissions from John:
collection.permissions.set_user('john', 'none')
- property viewers¶
Get users with view permission.
Utility functions¶
Resource utility functions.
- resdk.resources.utils.fill_spaces(word, desired_length)[source]¶
Fill spaces at the end until word reaches desired length.
- resdk.resources.utils.flatten_field(field, schema, path)[source]¶
Reduce dicts of dicts to dot separated keys.
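The idea of reducing nested dicts to dot-separated keys can be shown with a simplified sketch. The `flatten` function below is hypothetical: unlike flatten_field it takes no schema argument and only demonstrates the key-joining step.

```python
def flatten(nested, path=""):
    """Return a flat dict whose keys are dot-separated paths into ``nested``."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{path}.{key}" if path else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key))   # descend into sub-dicts
        else:
            flat[full_key] = value
    return flat
```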
- resdk.resources.utils.get_collection_id(collection)[source]¶
Return id attribute of the object if it is collection, otherwise return given value.
- resdk.resources.utils.get_data_id(data)[source]¶
Return id attribute of the object if it is data, otherwise return given value.
- resdk.resources.utils.get_descriptor_schema_id(dschema)[source]¶
Get descriptor schema id.
Return id attribute of the object if it is descriptor schema, otherwise return given value.
- resdk.resources.utils.get_process_id(process)[source]¶
Return id attribute of the object if it is process, otherwise return given value.
- resdk.resources.utils.get_relation_id(relation)[source]¶
Return id attribute of the object if it is relation, otherwise return given value.
- resdk.resources.utils.get_sample_id(sample)[source]¶
Return id attribute of the object if it is sample, otherwise return given value.
- resdk.resources.utils.get_user_id(user)[source]¶
Return id attribute of the object if it is user, otherwise return given value.
- resdk.resources.utils.is_collection(collection)[source]¶
Return True if passed object is Collection and False otherwise.
- resdk.resources.utils.is_data(data)[source]¶
Return True if passed object is Data and False otherwise.
- resdk.resources.utils.is_descriptor_schema(data)[source]¶
Return True if passed object is DescriptorSchema and False otherwise.
- resdk.resources.utils.is_group(group)[source]¶
Return True if passed object is Group and False otherwise.
- resdk.resources.utils.is_process(process)[source]¶
Return True if passed object is Process and False otherwise.
- resdk.resources.utils.is_relation(relation)[source]¶
Return True if passed object is Relation and False otherwise.
- resdk.resources.utils.is_sample(sample)[source]¶
Return True if passed object is Sample and False otherwise.
- resdk.resources.utils.is_user(user)[source]¶
Return True if passed object is User and False otherwise.
- resdk.resources.utils.iterate_fields(fields, schema)[source]¶
Recursively iterate over all DictField sub-fields.
ReSDK Tables¶
Helper classes for aggregating collection data in tabular format.
Table classes¶
- class resdk.tables.rna.RNATables(collection: resdk.resources.collection.Collection, cache_dir: Optional[str] = None, progress_callable: Optional[Callable] = None, expression_source: Optional[str] = None, expression_process_slug: Optional[str] = None)[source]¶
A helper class to fetch collection’s expression and meta data.
This class enables fetching given collection’s data and returning it as tables which have samples in rows and expressions/metadata in columns.
When calling RNATables.exp, RNATables.rc and RNATables.meta for the first time the corresponding data gets downloaded from the server. This data then gets cached in memory and on disc and is used in subsequent calls. If the data on the server changes the updated version gets re-downloaded.
A simple example:
# Get Collection object
collection = res.collection.get("collection-slug")
# Fetch collection expressions and metadata
tables = RNATables(collection)
exp = tables.exp
rc = tables.rc
meta = tables.meta
- property exp: pandas.core.frame.DataFrame¶
Return expressions table as a pandas DataFrame object.
Which type of expressions (TPM, CPM, FPKM, …) get returned depends on how the data was processed. The expression type can be checked in the returned table attribute attrs['exp_type']:
exp = tables.exp
print(exp.attrs['exp_type'])
- Returns
table of expressions
- property rc: pandas.core.frame.DataFrame¶
Return expression counts table as a pandas DataFrame object.
- Returns
table of counts
- property readable_columns: Dict[str, str]¶
Map of source gene ids to symbols.
This also gets fetched only once and then cached in memory and on disc. RNATables.exp or RNATables.rc must be called before this as the mapping is specific to just this data. Its intended use is to rename table column labels from gene ids to symbols.
Example of use:
exp = exp.rename(columns=tables.readable_columns)
- Returns
dict with gene ids as keys and gene symbols as values
- class resdk.tables.methylation.MethylationTables(collection: resdk.resources.collection.Collection, cache_dir: Optional[str] = None, progress_callable: Optional[Callable] = None)[source]¶
A helper class to fetch collection’s methylation and meta data.
This class enables fetching given collection’s data and returning it as tables which have samples in rows and methylation/metadata in columns.
A simple example:
# Get Collection object
collection = res.collection.get("collection-slug")
# Fetch collection methylation and metadata
tables = MethylationTables(collection)
meta = tables.meta
beta = tables.beta
m_values = tables.mval
- property beta: pandas.core.frame.DataFrame¶
Return beta values table as a pandas DataFrame object.
- property mval: pandas.core.frame.DataFrame¶
Return m-values as a pandas DataFrame object.
Exceptions¶
Custom ReSDK exceptions.
Logging¶
Module contents:
Parent logger for all modules in resdk library
Handler STDOUT_HANDLER is “turned off” by default
Handler configuration functions
Override sys.excepthook to log all uncaught exceptions
Parent logger¶
Loggers in resdk are named by their module name. This is achieved by:
logger = logging.getLogger(__name__)
This makes it easy to locate the source of a log message.
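The effect of naming loggers after their modules can be seen with the standard logging module alone. In this sketch the package name `mypackage` and the `ListHandler` class are hypothetical; in a real package each module would obtain its logger with logging.getLogger(__name__).

```python
import logging

# Hypothetical names standing in for a package and one of its modules.
parent = logging.getLogger("mypackage")
child = logging.getLogger("mypackage.resources.data")

records = []


class ListHandler(logging.Handler):
    """Collect (logger name, message) pairs instead of printing them."""

    def emit(self, record):
        records.append((record.name, record.getMessage()))


# A handler on the parent logger also receives records from child loggers.
parent.addHandler(ListHandler())
parent.setLevel(logging.INFO)
child.info("download started")
```

The captured record carries the name of the child logger, which is how a log line can be traced back to the module that emitted it.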
Logging handlers¶
The handler STDOUT_HANDLER is created but not automatically added to ROOT_LOGGER, which means it does not do anything. The handler is activated when users call logger configuration functions like start_logging().
Handler configuration functions¶
As a good logging practice, the library does not register handlers by default. The reason is that if the library is included in some application, developers of that application will probably want to register loggers themselves. Therefore, if a user wishes to register the pre-defined handlers, she can run:
import resdk
resdk.start_logging()
- resdk_logger.start_logging(logging_level=logging.INFO)¶
Start logging resdk with the default configuration.
- Parameters
logging_level (int) – logging threshold level - integer in [0-50]
- Return type
None
Logging levels:
logging.DEBUG(10)
logging.INFO(20)
logging.WARNING(30)
logging.ERROR(40)
logging.CRITICAL(50)
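The "create the handler, but attach it only on request" pattern described in this section can be sketched with the standard library. This is an illustration under assumptions, not resdk's code: the logger name `mylib`, the format string and the `enable_stdout_logging` function are hypothetical.

```python
import logging
import sys

LIB_LOGGER = logging.getLogger("mylib")          # hypothetical library logger
LIB_LOGGER.addHandler(logging.NullHandler())     # stay silent by default

# The handler is built up front but not attached to any logger yet.
STDOUT_HANDLER = logging.StreamHandler(sys.stdout)
STDOUT_HANDLER.setFormatter(
    logging.Formatter("%(name)s - %(levelname)s - %(message)s")
)


def enable_stdout_logging(logging_level=logging.INFO):
    """Attach the pre-built stdout handler and set the threshold level."""
    LIB_LOGGER.setLevel(logging_level)
    if STDOUT_HANDLER not in LIB_LOGGER.handlers:   # avoid duplicate output
        LIB_LOGGER.addHandler(STDOUT_HANDLER)
```

An application that prefers its own handlers simply never calls the function, and the library produces no output of its own.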
Log uncaught exceptions¶
All uncaught Python exceptions are handled by the function stored in sys.excepthook. By replacing the default implementation, we can modify it for our purposes: to log all uncaught exceptions.
Note#1: Modified behaviour (logging of all uncaught exceptions) applies only when running in non-interactive mode.
Note#2: Any exception can be caught/uncaught and it can happen in interactive/non-interactive mode. This makes 4 different scenarios. The sys.excepthook modification takes care of uncaught exceptions in non-interactive mode. In interactive mode, user is notified directly if exception is raised. If exception is caught and not reraised, it should be logged somehow, since it can provide valuable information for developer when debugging. Therefore, we should use the following convention for logging in resdk: “Exceptions are explicitly logged only when they are caught and not re-raised.”
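A hook of this kind can be sketched with the standard library as follows. The names `log_uncaught`, `install_hook` and the logger name `mylib` are hypothetical, and checking for `sys.ps1` is one common way to detect interactive mode; resdk's own implementation may differ.

```python
import logging
import sys

logger = logging.getLogger("mylib")  # hypothetical logger name


def log_uncaught(exc_type, exc_value, exc_traceback):
    """Log an uncaught exception, then defer to Python's default hook."""
    logger.error(
        "Uncaught exception",
        exc_info=(exc_type, exc_value, exc_traceback),
    )
    # Keep the usual traceback output on stderr.
    sys.__excepthook__(exc_type, exc_value, exc_traceback)


def install_hook():
    """Replace sys.excepthook, but only in non-interactive mode."""
    if not hasattr(sys, "ps1"):      # sys.ps1 exists only in interactive mode
        sys.excepthook = log_uncaught
```

Calling install_hook() once at start-up is enough; exceptions that are caught and not re-raised never reach the hook, which is why the convention above asks for them to be logged explicitly.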