SDK Reference

Resolwe

class resdk.Resolwe(username=None, password=None, url=None)[source]

Connect to a Resolwe server.

Parameters

username (str) – user’s email
password (str) – user’s password
url (str) – Resolwe server instance

data_usage(**query_params)[source]

Get per-user data usage information.

Display the number of samples, the number of data objects, and the total size of data objects for the currently logged-in user. For admin users, display data for all users.

get_or_run(slug=None, input={})[source]

Return existing object if found, otherwise create new one.

Parameters

slug (str) – Process slug (human readable unique identifier)
input (dict) – Input values

get_query_by_resource(resource)[source]: Get ResolweQuery for a given resource.

login(username=None, password=None)[source]

Interactive login.

If only username is given, prompt the user for the password via the shell. If username is not given, prompt for interactive login.

run(slug=None, input={}, descriptor=None, descriptor_schema=None, collection=None, data_name='', process_resources=None)[source]

Run process and return the corresponding Data object.

1. Upload files referenced in inputs.
2. Create a Data object with the given inputs.
3. The command that processes inputs into outputs is run.
4. Return the Data object.

The processing runs asynchronously, so the returned Data object does not have an OK status or outputs when returned. Use data.update() to refresh the Data resource object.

Parameters

slug (str) – Process slug (human readable unique identifier)
input (dict) – Input values
descriptor (dict) – Descriptor values
descriptor_schema (str) – A valid descriptor schema slug
collection (int/resource) – Collection resource or its ID, into which the data object should be included
data_name (str) – Default name of data object
process_resources (dict) – Process resources

Returns

data object that was just created

Return type

Data object
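Because run() returns before processing has finished, callers typically poll the returned object. The following is a minimal sketch that relies only on the documented update() method and status field. The helper name wait_for_data, the poll interval, the timeout, and the choice of OK and ER as the terminal statuses are assumptions made for illustration; none of them are part of resdk.

```python
import time


def wait_for_data(data, poll_seconds=5.0, timeout=600.0, sleep=time.sleep):
    """Poll a Data-like object until its status is OK or ER.

    `data` only needs a `status` attribute and an `update()` method.
    Hypothetical helper, not part of resdk.
    """
    terminal = {"OK", "ER"}  # assumed terminal statuses (Done, Error)
    waited = 0.0
    while data.status not in terminal:
        if waited >= timeout:
            raise TimeoutError(f"still {data.status} after {waited:.0f} s")
        sleep(poll_seconds)
        waited += poll_seconds
        data.update()  # refresh the fields from the server
    return data.status
```

The sleep argument is injectable so that the loop can be exercised in tests without actually waiting.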

version_check()[source]: Check that the server is compatible with the client.

Resolwe Query

class resdk.ResolweQuery(resolwe, resource, slug_field='slug')[source]

Query resource endpoints.

A Resolwe instance (for example “res”) has several endpoints:

res.data

res.collection

res.sample

res.process

…

Each such endpoint is an instance of the ResolweQuery class. ResolweQuery supports queries on corresponding objects, for example:

res.data.get(42)  # return Data object with ID 42.
res.sample.filter(contributor=1)  # return all samples made by contributor 1

This object is lazily loaded, which means that the actual request is made only when needed. This enables composing multiple filters, for example:

res.data.filter(contributor=1).filter(name='My object')

is the same as:

res.data.filter(contributor=1, name='My object')

This is especially useful because all endpoints on a Resolwe instance are such queries and can be filtered further before any data is transferred.

To get a list of all supported query parameters, use one that does not exist and you will get a helpful error message with a list of allowed ones:

res.data.filter(foo="bar")
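The lazy composition described above can be pictured with a toy model. This sketch is not resdk's implementation: the class name LazyQuery and the fetch callable are hypothetical, and it only shows how chained filter() calls can accumulate parameters while the request is deferred until the results are iterated.

```python
class LazyQuery:
    """Toy model of a lazily evaluated, chainable query (not resdk code)."""

    def __init__(self, fetch, filters=None):
        self._fetch = fetch  # hypothetical callable that performs the request
        self._filters = dict(filters or {})
        self._cache = None

    def filter(self, **filters):
        # Return a clone with the merged filters; no request is made here.
        return LazyQuery(self._fetch, {**self._filters, **filters})

    def __iter__(self):
        if self._cache is None:  # the request happens only on first use
            self._cache = list(self._fetch(self._filters))
        return iter(self._cache)
```

In this model, two chained filter() calls and a single call with both keyword arguments produce the same filter dictionary, so they lead to the same request.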

all()[source]

Return copy of the current queryset.

This is a handy function for getting a newly created query without any filters.

clear_cache()[source]: Clear cache.

count()[source]: Return number of objects in current query.

create(**model_data)[source]: Return new instance of current resource.

delete(force=False)[source]

Delete objects in current query.

Parameters: force (bool) – Do not trigger confirmation prompt. WARNING: Be sure that you really know what you are doing as deleted objects are not recoverable.

filter(**filters)[source]: Return clone of current query with added given filters.

get(*args, **kwargs)[source]

Get object that matches given parameters.

If only one non-keyworded argument is given, it is treated as an ID if it is a number and as a slug otherwise.

Parameters

uid (int for ID or string for slug) – unique identifier - ID or slug

Return type

object of type self.resource

Raises

ValueError – if non-keyworded and keyworded arguments are combined or if more than one non-keyworded argument is given
LookupError – if no object or more than one object is returned
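The argument rules for get() can be sketched as a small function. This is an illustration of the rules stated above and not resdk's code; the function name resolve_get_arguments is hypothetical, and treating "number" as a Python int is an assumption.

```python
def resolve_get_arguments(*args, **kwargs):
    """Turn get()-style arguments into a dict of lookup filters.

    Illustrative only; mirrors the documented rules, not resdk's code.
    """
    if args and kwargs:
        raise ValueError("Do not combine positional and keyword arguments.")
    if len(args) > 1:
        raise ValueError("At most one positional argument is allowed.")
    if args:
        uid = args[0]
        # Assumption: "number" means a Python int; anything else is a slug.
        key = "id" if isinstance(uid, int) else "slug"
        return {key: uid}
    return dict(kwargs)
```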

iterate(chunk_size=100, show_progress=False)[source]

Iterate through query.

This can come in handy when iterating through hundreds or thousands of objects, where a single request would otherwise result in “504 Gateway-timeout”.

The method cannot be used together with the limit, offset, or ordering filters; combining them raises a ValueError.
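Chunked iteration of this kind can be sketched as a generator that requests one page at a time. This is a generic illustration under stated assumptions: fetch_page is a hypothetical callable taking (filters, limit, offset), and the sketch is not how resdk implements iterate().

```python
def iterate_in_chunks(fetch_page, filters=None, chunk_size=100):
    """Yield objects page by page instead of in one large request.

    `fetch_page(filters, limit, offset)` is a hypothetical callable that
    returns a list of at most `limit` objects starting at `offset`.
    """
    filters = dict(filters or {})
    forbidden = {"limit", "offset", "ordering"} & set(filters)
    if forbidden:
        raise ValueError(f"Not allowed with iteration: {sorted(forbidden)}")
    offset = 0
    while True:
        page = fetch_page(filters, chunk_size, offset)
        yield from page
        if len(page) < chunk_size:  # a short page means nothing is left
            return
        offset += chunk_size
```

Because this is a generator, the ValueError surfaces when iteration starts rather than when the function is called.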

search(text)[source]: Full text search.

Resources

Resource classes

class resdk.resources.base.BaseResource(resolwe, **model_data)[source]

Abstract resource.

One and only one of the identifiers (slug, id or model_data) should be given.

Parameters

resolwe (Resolwe object) – Resolwe instance
model_data – Resource model data

delete(force=False)[source]

Delete the resource object from the server.

Parameters: force (bool) – Do not trigger confirmation prompt. WARNING: Be sure that you really know what you are doing as deleted objects are not recoverable.

classmethod fetch_object(resolwe, id=None, slug=None)[source]: Return resource instance that is uniquely defined by identifier.

fields()[source]: Resource fields.

id: unique identifier of an object

save()[source]: Save resource to the server.

update()[source]: Update resource fields from the server.

class resdk.resources.base.BaseResolweResource(resolwe, **model_data)[source]

Base class for Resolwe resources.

One and only one of the identifiers (slug, id or model_data) should be given.

Parameters

resolwe (Resolwe object) – Resolwe instance
model_data – Resource model data

property contributor: Contributor.

property created: Creation time.

current_user_permissions: current user permissions

delete(force=False)

Delete the resource object from the server.

Parameters: force (bool) – Do not trigger confirmation prompt. WARNING: Be sure that you really know what you are doing as deleted objects are not recoverable.

classmethod fetch_object(resolwe, id=None, slug=None): Return resource instance that is uniquely defined by identifier.

fields(): Resource fields.

id: unique identifier of an object

property modified: Modification time.

name: name of resource

property permissions: Permissions.

save(): Save resource to the server.

slug: human-readable unique identifier

update()[source]: Clear permissions cache and update the object.

version: resource version

class resdk.resources.Data(resolwe, **model_data)[source]

Resolwe Data resource.

Parameters

resolwe (Resolwe object) – Resolwe instance
model_data – Resource model data

checksum: checksum field calculated on inputs

property children: Get children of this Data object.

property collection: Get collection.

property contributor: Contributor.

property created: Creation time.

current_user_permissions: current user permissions

delete(force=False)

Delete the resource object from the server.

Parameters: force (bool) – Do not trigger confirmation prompt. WARNING: Be sure that you really know what you are doing as deleted objects are not recoverable.

descriptor: annotation data, with the form defined in descriptor_schema

descriptor_dirty: indicate whether descriptor doesn’t match descriptor_schema (is dirty)

property descriptor_schema: Get descriptor schema.

download(file_name=None, field_name=None, download_dir=None)[source]

Download Data object’s files and directories.

Download files and directories from the Resolwe server to the download directory (defaults to the current working directory).

Parameters

file_name (string) – name of file or directory
field_name (string) – file or directory field name
download_dir (string) – download path

Return type

None

Data objects can contain multiple files and directories. All are downloaded by default, but may be filtered by name or output field:

res.data.get(42).download(file_name='alignment7.bam')
res.data.get(42).download(field_name='bam')
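To apply the same field filter to several Data objects, a small loop is enough. This sketch relies only on the documented download(file_name, field_name, download_dir) signature and the id field; the helper name download_field is hypothetical.

```python
def download_field(data_objects, field_name, download_dir):
    """Download one output field from each Data-like object.

    Relies only on the documented download() signature; the helper
    itself is hypothetical and not part of resdk.
    """
    downloaded_ids = []
    for data in data_objects:
        data.download(field_name=field_name, download_dir=download_dir)
        downloaded_ids.append(data.id)
    return downloaded_ids
```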

duplicate()[source]

Duplicate (make copy of) data object.

Returns: Duplicated data object

duplicated: duplicated

classmethod fetch_object(resolwe, id=None, slug=None): Return resource instance that is uniquely defined by identifier.

fields(): Resource fields.

files(file_name=None, field_name=None)[source]

Get list of downloadable file fields.

Filter files by file name or output field.

Parameters

file_name (string) – name of file
field_name (string) – output field name

Return type

List of tuples (data_id, file_name, field_name, process_type)

property finished: Get finish time.

id: unique identifier of an object

input: actual input values

property modified: Modification time.

name: name of resource

output: actual output values

property parents: Get parents of this Data object.

property permissions: Permissions.

property process: Get process.

process_cores: process cores

process_error: error log message (list of strings)

process_info: info log message (list of strings)

process_memory: process memory

process_progress: process progress in percentage

process_rc: Process algorithm return code

process_resources: process_resources

process_warning: warning log message (list of strings)

property sample: Get sample.

save(): Save resource to the server.

scheduled: scheduled

size: size

slug: human-readable unique identifier

property started: Get start time.

status: process status - Possible values: UP (Uploading - for upload processes), RE (Resolving - computing input data objects) WT (Waiting - waiting for process since the queue is full) PP (Preparing - preparing the environment for processing) PR (Processing) OK (Done) ER (Error) DR (Dirty - Data is dirty)

stdout()[source]

Return process standard output (stdout.txt file content).

Fetch stdout.txt file from the corresponding Data object and return the file content as string. The string can be long and ugly.

Return type: string

tags: data object’s tags

update()[source]: Clear cache and update resource fields from the server.

version: resource version

class resdk.resources.collection.BaseCollection(resolwe, **model_data)[source]

Abstract collection resource.

One and only one of the identifiers (slug, id or model_data) should be given.

Parameters

resolwe (Resolwe object) – Resolwe instance
model_data – Resource model data

property contributor: Contributor.

property created: Creation time.

current_user_permissions: current user permissions

property data: Return list of attached Data objects.

data_types()[source]

Return a list of data types (process_type).

Return type: List

delete(force=False)

Delete the resource object from the server.

Parameters: force (bool) – Do not trigger confirmation prompt. WARNING: Be sure that you really know what you are doing as deleted objects are not recoverable.

description: description

descriptor: descriptor

descriptor_dirty: descriptor_dirty

property descriptor_schema: Descriptor schema.

download(file_name=None, field_name=None, download_dir=None)[source]

Download output files of associated Data objects.

Download files from the Resolwe server to the download directory (defaults to the current working directory).

Parameters

file_name (string) – name of file
field_name (string) – field name
download_dir (string) – download path

Return type

None

Collections can contain multiple Data objects and Data objects can contain multiple files. All files are downloaded by default, but may be filtered by file name or field name:

res.collection.get(42).download(file_name='alignment7.bam')
res.collection.get(42).download(field_name='bam')

duplicated: duplicatied

classmethod fetch_object(resolwe, id=None, slug=None): Return resource instance that is uniquely defined by identifier.

fields(): Resource fields.

files(file_name=None, field_name=None)[source]: Return list of files in resource.

id: unique identifier of an object

property modified: Modification time.

name: name of resource

property permissions: Permissions.

save(): Save resource to the server.

settings: settings

slug: human-readable unique identifier

tags: tags

update()[source]: Clear cache and update resource fields from the server.

version: resource version

class resdk.resources.Collection(resolwe, **model_data)[source]

Resolwe Collection resource.

Parameters

resolwe (Resolwe object) – Resolwe instance
model_data – Resource model data

assign_to_billing_account(billing_account_name)[source]: Assign given collection to a billing account.

property contributor: Contributor.

create_background_relation(category, background, cases)

Create background relation.

Parameters

category (str) – Category of relation
background (Sample) – Background sample
cases (Sample) – Case samples (signals)

create_compare_relation(category, samples, labels=[])

Create compare relation.

Parameters

category (str) – Category of relation (e.g. case-control, …)
samples (list) – List of samples to include in relation.
labels (list) – List of labels assigned to corresponding samples. If given it should be of same length as samples.

create_group_relation(category, samples, labels=[])

Create group relation.

Parameters

category (str) – Category of relation (e.g. replicates, clones, …)
samples (list) – List of samples to include in relation.
labels (list) – List of labels assigned to corresponding samples. If given it should be of same length as samples.

create_series_relation(category, samples, positions=[], labels=[])

Create series relation.

Parameters

category (str) – Category of relation (e.g. case-control, …)
samples (list) – List of samples to include in relation.
positions (list) – List of positions assigned to corresponding samples (e.g. 10, 20, 30). If given, it should be of the same length as samples. Note that these elements should be machine-sortable by default.
labels (list) – List of labels assigned to corresponding samples. If given it should be of same length as samples.
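The constraints on positions and labels can be checked locally before a relation is created. This is a hypothetical pre-check written from the parameter descriptions above; resdk or the server may validate differently, and the function name check_series_arguments is an assumption.

```python
def check_series_arguments(samples, positions=None, labels=None):
    """Validate series-relation arguments before sending them anywhere.

    Hypothetical pre-check; resdk may validate differently.
    """
    positions = list(positions or [])
    labels = list(labels or [])
    if positions and len(positions) != len(samples):
        raise ValueError("positions must have the same length as samples")
    if labels and len(labels) != len(samples):
        raise ValueError("labels must have the same length as samples")
    if positions:
        sorted(positions)  # raises TypeError if positions are not sortable
    return positions, labels
```

Using None as the default and converting it to a fresh list avoids sharing one mutable default list between calls.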

property created: Creation time.

current_user_permissions: current user permissions

property data: Return list of data objects on collection.

data_types()

Return a list of data types (process_type).

Return type: List

delete(force=False)

Delete the resource object from the server.

Parameters: force (bool) – Do not trigger confirmation prompt. WARNING: Be sure that you really know what you are doing as deleted objects are not recoverable.

description: description

descriptor: descriptor

descriptor_dirty: descriptor_dirty

property descriptor_schema: Descriptor schema.

download(file_name=None, field_name=None, download_dir=None)

Download output files of associated Data objects.

Download files from the Resolwe server to the download directory (defaults to the current working directory).

Parameters

file_name (string) – name of file
field_name (string) – field name
download_dir (string) – download path

Return type

None

Collections can contain multiple Data objects and Data objects can contain multiple files. All files are downloaded by default, but may be filtered by file name or field name:

res.collection.get(42).download(file_name='alignment7.bam')
res.collection.get(42).download(field_name='bam')

duplicate()[source]

Duplicate (make copy of) collection object.

Returns: Duplicated collection

duplicated: duplicatied

classmethod fetch_object(resolwe, id=None, slug=None): Return resource instance that is uniquely defined by identifier.

fields(): Resource fields.

files(file_name=None, field_name=None): Return list of files in resource.

id: unique identifier of an object

property modified: Modification time.

name: name of resource

property permissions: Permissions.

property relations: Return list of relations on collection.

property samples: Return list of samples on collection.

save(): Save resource to the server.

settings: settings

slug: human-readable unique identifier

tags: tags

update()[source]: Clear cache and update resource fields from the server.

version: resource version

class resdk.resources.Sample(resolwe, **model_data)[source]

Resolwe Sample resource.

Parameters

resolwe (Resolwe object) – Resolwe instance
model_data – Resource model data

property annotations: Get the annotations for the given sample.

property background: Get background sample of the current one.

property collection: Get collection.

property contributor: Contributor.

property created: Creation time.

current_user_permissions: current user permissions

property data: Get data.

data_types()

Return a list of data types (process_type).

Return type: List

delete(force=False)

Delete the resource object from the server.

Parameters: force (bool) – Do not trigger confirmation prompt. WARNING: Be sure that you really know what you are doing as deleted objects are not recoverable.

description: description

descriptor: descriptor

descriptor_dirty: descriptor_dirty

property descriptor_schema: Descriptor schema.

download(file_name=None, field_name=None, download_dir=None)

Download output files of associated Data objects.

Download files from the Resolwe server to the download directory (defaults to the current working directory).

Parameters

file_name (string) – name of file
field_name (string) – field name
download_dir (string) – download path

Return type

None

Collections can contain multiple Data objects and Data objects can contain multiple files. All files are downloaded by default, but may be filtered by file name or field name:

res.collection.get(42).download(file_name='alignment7.bam')
res.collection.get(42).download(field_name='bam')

duplicate()[source]

Duplicate (make copy of) sample object.

Returns: Duplicated sample

duplicated: duplicatied

classmethod fetch_object(resolwe, id=None, slug=None): Return resource instance that is uniquely defined by identifier.

fields(): Resource fields.

files(file_name=None, field_name=None): Return list of files in resource.

get_annotation(full_path: str) → AnnotationValue[source]

Get the AnnotationValue from full path.

Raises: LookupError – when field at the specified path does not exist.
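Since get_annotation() raises LookupError for a missing field, code that tolerates missing annotations can wrap the call. A minimal sketch using only the documented call and exception; the helper name annotation_or_default is hypothetical.

```python
def annotation_or_default(sample, full_path, default=None):
    """Return the annotation at `full_path`, or `default` if it is missing.

    Uses only the documented get_annotation() call and its LookupError;
    the helper name is hypothetical.
    """
    try:
        return sample.get_annotation(full_path)
    except LookupError:
        return default
```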

get_annotations() → Dict[str, Any][source]: Get all annotations for the given sample in a dictionary.

get_bam(): Return bam object on the sample.

get_cuffquant(): Get cuffquant.

get_expression(): Get expression.

get_macs(): Return list of bed objects on the sample.

get_primary_bam(fallback_to_bam=False)

Return primary bam object on the sample.

If the primary bam object is not present and fallback_to_bam is set to True, a bam object will be returned.

get_reads(**filters)

Return the latest fastq object in sample.

If there are multiple fastq objects in sample (trimmed, filtered, subsampled…), return the latest one. If a different fastq object is required, provide additional filter arguments that limit the search to one result.

id: unique identifier of an object

property is_background: Return True if given sample is background to any other and False otherwise.

property modified: Modification time.

name: name of resource

property permissions: Permissions.

property relations: Get Relation objects for this sample.

save(): Save resource to the server.

set_annotation(full_path: str, value, force=False) → Optional[AnnotationValue][source]

Create/update annotation value.

If value is None, the annotation is deleted and None is returned. If force is set to True, no explicit confirmation is required to delete the annotation.
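The delete-on-None behaviour makes it possible to apply a mix of updates and deletions in one pass. This is a sketch built only on the documented set_annotation(full_path, value, force) signature; the helper name sync_annotations and the decision to pass force=True only for deletions are assumptions.

```python
def sync_annotations(sample, desired):
    """Apply a {full_path: value} mapping one annotation at a time.

    A value of None deletes that annotation; force=True skips the
    confirmation. Hypothetical helper built on set_annotation().
    """
    results = {}
    for full_path, value in desired.items():
        force = value is None  # only deletions ask for confirmation
        results[full_path] = sample.set_annotation(full_path, value, force=force)
    return results
```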

set_annotations(annotations: Dict[str, Any])[source]: Bulk set annotations on the sample.

settings: settings

slug: human-readable unique identifier

tags: tags

update()[source]: Clear cache and update resource fields from the server.

version: resource version

class resdk.resources.Relation(resolwe, **model_data)[source]

Resolwe Relation resource.

Parameters

resolwe (Resolwe object) – Resolwe instance
model_data – Resource model data

add_sample(sample, label=None, position=None)[source]: Add sample object to relation.

category: category of the relation

property collection: Return collection object to which relation belongs.

property contributor: Contributor.

property created: Creation time.

current_user_permissions: current user permissions

delete(force=False)

Delete the resource object from the server.

Parameters: force (bool) – Do not trigger confirmation prompt. WARNING: Be sure that you really know what you are doing as deleted objects are not recoverable.

descriptor: annotation data, with the form defined in descriptor_schema

descriptor_dirty: indicate whether descriptor doesn’t match descriptor_schema (is dirty)

property descriptor_schema: Get descriptor schema.

classmethod fetch_object(resolwe, id=None, slug=None): Return resource instance that is uniquely defined by identifier.

fields(): Resource fields.

id: unique identifier of an object

property modified: Modification time.

name: name of resource

partitions: list of RelationPartition objects in the Relation

property permissions: Permissions.

remove_samples(*samples)[source]: Remove sample objects from relation.

property samples: Return list of sample objects in the relation.

save()[source]: Check that collection is saved and save instance.

slug: human-readable unique identifier

type: type of the relation

unit: unit (where applicable, e.g. for series)

update()[source]: Clear cache and update resource fields from the server.

version: resource version

class resdk.resources.Process(resolwe, **model_data)[source]

Resolwe Process resource.

Parameters

resolwe (Resolwe object) – Resolwe instance
model_data – Resource model data

category: used to group processes in a GUI. Examples: upload:, analyses:variants:, …

property contributor: Contributor.

property created: Creation time.

current_user_permissions: current user permissions

data_name: the default name of data object using this process. When data object is created you can assign a name to it. But if you don’t, the name of data object is determined from this field. The field is a expression which can take values of other fields.

delete(force=False)

Delete the resource object from the server.

Parameters: force (bool) – Do not trigger confirmation prompt. WARNING: Be sure that you really know what you are doing as deleted objects are not recoverable.

description: process description

entity_always_create: entity_always_create

entity_descriptor_schema: entity_descriptor_schema

entity_input: entity_input

entity_type: entity_type

classmethod fetch_object(resolwe, id=None, slug=None): Return resource instance that is uniquely defined by identifier.

fields(): Resource fields.

id: unique identifier of an object

input_schema: specifications of inputs

is_active: Boolean stating wether process is active

property modified: Modification time.

name: name of resource

output_schema: specification of outputs

property permissions: Permissions.

persistence: Measure of how important is to keep the process outputs when optimizing disk usage. Options: RAW/CACHED/TEMP. For processes, used on frontend use TEMP - the results of this processes can be quickly re-calculated any time. For upload processes use RAW - this data should never be deleted, since it cannot be re-calculated. For analysis use CACHED - the results can stil be calculated from imported data but it can take time.

print_inputs()[source]: Pretty print input_schema.

priority: process priority - not used yet

requirements: required Docker image, amount of memory / CPU …

run: the heart of process - here the algorithm is defined.

save(): Save resource to the server.

scheduling_class: Scheduling class

slug: human-readable unique identifier

type: the type of process "type:sub_type:sub_sub_type:..."

update(): Clear permissions cache and update the object.

version: resource version

class resdk.resources.DescriptorSchema(resolwe, **model_data)[source]

Resolwe DescriptorSchema resource.

Parameters

resolwe (Resolwe object) – Resolwe instance
model_data – Resource model data

property contributor: Contributor.

property created: Creation time.

current_user_permissions: current user permissions

delete(force=False)

Delete the resource object from the server.

Parameters: force (bool) – Do not trigger confirmation prompt. WARNING: Be sure that you really know what you are doing as deleted objects are not recoverable.

description: description

classmethod fetch_object(resolwe, id=None, slug=None): Return resource instance that is uniquely defined by identifier.

fields(): Resource fields.

id: unique identifier of an object

property modified: Modification time.

name: name of resource

property permissions: Permissions.

save(): Save resource to the server.

schema: schema

slug: human-readable unique identifier

update(): Clear permissions cache and update the object.

version: resource version

class resdk.resources.AnnotationValue(resolwe: Resolwe, **model_data)[source]

Resolwe AnnotationValue resource.

delete(force=False)

Delete the resource object from the server.

Parameters: force (bool) – Do not trigger confirmation prompt. WARNING: Be sure that you really know what you are doing as deleted objects are not recoverable.

classmethod fetch_object(resolwe, id=None, slug=None): Return resource instance that is uniquely defined by identifier.

property field: AnnotationField: Get annotation field.

fields(): Resource fields.

id: unique identifier of an object

property sample: Get sample.

sample_id: Optional[int]: sample

save(): Save resource to the server.

update(): Update resource fields from the server.

class resdk.resources.AnnotationGroup(resolwe: Resolwe, **model_data)[source]

Resolwe AnnotationGroup resource.

delete(force=False)

Delete the resource object from the server.

Parameters: force (bool) – Do not trigger confirmation prompt. WARNING: Be sure that you really know what you are doing as deleted objects are not recoverable.

classmethod fetch_object(resolwe, id=None, slug=None): Return resource instance that is uniquely defined by identifier.

fields(): Resource fields.

id: unique identifier of an object

save(): Save resource to the server.

update(): Update resource fields from the server.

class resdk.resources.AnnotationField(resolwe: Resolwe, **model_data)[source]

Resolwe AnnotationField resource.

delete(force=False)

Delete the resource object from the server.

Parameters: force (bool) – Do not trigger confirmation prompt. WARNING: Be sure that you really know what you are doing as deleted objects are not recoverable.

classmethod fetch_object(resolwe, id=None, slug=None): Return resource instance that is uniquely defined by identifier.

fields(): Resource fields.

property group: AnnotationGroup: Get annotation group.

id: unique identifier of an object

save(): Save resource to the server.

update(): Update resource fields from the server.

class resdk.resources.User(resolwe=None, **model_data)[source]

Resolwe User resource.

One and only one of the identifiers (slug, id or model_data) should be given.

Parameters

resolwe (Resolwe object) – Resolwe instance
model_data – Resource model data

delete(force=False)

Delete the resource object from the server.

Parameters: force (bool) – Do not trigger confirmation prompt. WARNING: Be sure that you really know what you are doing as deleted objects are not recoverable.

classmethod fetch_object(resolwe, id=None, slug=None): Return resource instance that is uniquely defined by identifier.

fields(): Resource fields.

first_name: user’s first name

get_name()[source]: Return user’s name.

id: unique identifier of an object

save(): Save resource to the server.

update(): Update resource fields from the server.

class resdk.resources.Group(resolwe=None, **model_data)[source]

Resolwe Group resource.

One and only one of the identifiers (slug, id or model_data) should be given.

Parameters

resolwe (Resolwe object) – Resolwe instance
model_data – Resource model data

add_users(*users)[source]: Add users to group.

delete(force=False)

Delete the resource object from the server.

Parameters: force (bool) – Do not trigger confirmation prompt. WARNING: Be sure that you really know what you are doing as deleted objects are not recoverable.

classmethod fetch_object(resolwe, id=None, slug=None): Return resource instance that is uniquely defined by identifier.

fields(): Resource fields.

id: unique identifier of an object

name: group’s name

remove_users(*users)[source]: Remove users from group.

save(): Save resource to the server.

update()[source]: Clear cache and update resource fields from the server.

property users: Return list of users in group.

class resdk.resources.Geneset(resolwe, genes=None, source=None, species=None, **model_data)[source]

Resolwe Geneset resource.

Parameters

resolwe (Resolwe object) – Resolwe instance
model_data – Resource model data

checksum: checksum field calculated on inputs

property children: Get children of this Data object.

property collection: Get collection.

property contributor: Contributor.

property created: Creation time.

current_user_permissions: current user permissions

delete(force=False)

Delete the resource object from the server.

Parameters: force (bool) – Do not trigger confirmation prompt. WARNING: Be sure that you really know what you are doing as deleted objects are not recoverable.

descriptor: annotation data, with the form defined in descriptor_schema

descriptor_dirty: indicate whether descriptor doesn’t match descriptor_schema (is dirty)

property descriptor_schema: Get descriptor schema.

download(file_name=None, field_name=None, download_dir=None)

Download Data object’s files and directories.

Download files and directories from the Resolwe server to the download directory (defaults to the current working directory).

Parameters

file_name (string) – name of file or directory
field_name (string) – file or directory field name
download_dir (string) – download path

Return type

None

Data objects can contain multiple files and directories. All are downloaded by default, but may be filtered by name or output field:

res.data.get(42).download(file_name='alignment7.bam')
res.data.get(42).download(field_name='bam')

duplicate()

Duplicate (make copy of) data object.

Returns: Duplicated data object

duplicated: duplicated

classmethod fetch_object(resolwe, id=None, slug=None): Return resource instance that is uniquely defined by identifier.

fields(): Resource fields.

files(file_name=None, field_name=None)

Get list of downloadable file fields.

Filter files by file name or output field.

Parameters

file_name (string) – name of file
field_name (string) – output field name

Return type

List of tuples (data_id, file_name, field_name, process_type)

property finished: Get finish time.

property genes: Get genes.

id: unique identifier of an object

input: actual input values

property modified: Modification time.

name: name of resource

output: actual output values

property parents: Get parents of this Data object.

property permissions: Permissions.

property process: Get process.

process_cores: process cores

process_error: error log message (list of strings)

process_info: info log message (list of strings)

process_memory: process memory

process_progress: process progress in percentage

process_rc: Process algorithm return code

process_resources: process_resources

process_warning: warning log message (list of strings)

property sample: Get sample.

save()[source]

Save Geneset to the server.

If the Geneset is already on the server, update it with save() from the base class. Otherwise, create a new Geneset by running the process with slug “create-geneset”.

scheduled: scheduled

set_operator(operator, other)[source]

Perform set operations on Geneset object by creating a new Geneset.

Parameters

operator (string) – set operation function name
other (Geneset) – Geneset object

Returns

new Geneset object
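The idea of selecting a set operation by its function name can be sketched with plain Python sets. This is only an illustration: the function apply_set_operator and the list of allowed names are assumptions, and the source does not state which operator names set_operator accepts.

```python
def apply_set_operator(operator, left, right):
    """Look up a set method by name and apply it to two gene collections."""
    # Assumption: only these built-in set method names are allowed here.
    allowed = {"union", "intersection", "difference", "symmetric_difference"}
    if operator not in allowed:
        raise ValueError(f"Unsupported set operation: {operator!r}")
    return getattr(set(left), operator)(set(right))


# Hypothetical gene symbols.
genes_a = ["BRCA1", "TP53", "EGFR"]
genes_b = ["TP53", "KRAS"]

shared = apply_set_operator("intersection", genes_a, genes_b)
combined = apply_set_operator("union", genes_a, genes_b)
print(sorted(shared), sorted(combined))
```

The result is always a new set, which matches the documented behaviour of returning a new object rather than modifying either operand.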

size: size

slug: human-readable unique identifier

property source: Get source.

property species: Get species.

property started: Get start time.

status: process status - Possible values: UP (Uploading - for upload processes), RE (Resolving - computing input data objects) WT (Waiting - waiting for process since the queue is full) PP (Preparing - preparing the environment for processing) PR (Processing) OK (Done) ER (Error) DR (Dirty - Data is dirty)

stdout()

Return process standard output (stdout.txt file content).

Fetch stdout.txt file from the corresponding Data object and return the file content as string. The string can be long and ugly.

Return type: string

tags: data object’s tags

update(): Clear cache and update resource fields from the server.

version: resource version

class resdk.resources.Metadata(resolwe, **model_data)[source]

Metadata resource.

Parameters

resolwe (Resolwe object) – Resolwe instance
model_data – Resource model data

checksum: checksum field calculated on inputs

property children: Get children of this Data object.

property collection: Get collection.

property contributor: Contributor.

property created: Creation time.

current_user_permissions: current user permissions

delete(force=False)

Delete the resource object from the server.

Parameters: force (bool) – Do not trigger confirmation prompt. WARNING: Be sure that you really know what you are doing as deleted objects are not recoverable.

descriptor: annotation data, with the form defined in descriptor_schema

descriptor_dirty: indicate whether descriptor doesn’t match descriptor_schema (is dirty)

property descriptor_schema: Get descriptor schema.

property df: Get table as pd.DataFrame.

property df_bytes: Get file contents of table output in bytes form.

download(file_name=None, field_name=None, download_dir=None)

Download Data object’s files and directories.

Download files and directories from the Resolwe server to the download directory (defaults to the current working directory).

Parameters

file_name (string) – name of file or directory
field_name (string) – file or directory field name
download_dir (string) – download path

Return type

None

Data objects can contain multiple files and directories. All are downloaded by default, but may be filtered by name or output field:

re.data.get(42).download(file_name='alignment7.bam')
re.data.get(42).download(field_name='bam')

duplicate()

Duplicate (make copy of) data object.

Returns: Duplicated data object

duplicated: duplicated

classmethod fetch_object(resolwe, id=None, slug=None): Return resource instance that is uniquely defined by identifier.

fields(): Resource fields.

files(file_name=None, field_name=None)

Get list of downloadable file fields.

Filter files by file name or output field.

Parameters

file_name (string) – name of file
field_name (string) – output field name

Return type

List of tuples (data_id, file_name, field_name, process_type)

property finished: Get finish time.

get_df(parser=None, **kwargs)[source]: Get table as pd.DataFrame.

id: unique identifier of an object

input: actual input values

property modified: Modification time.

name: name of resource

output: actual output values

property parents: Get parents of this Data object.

property permissions: Permissions.

property process: Get process.

process_cores: process cores

process_error: error log message (list of strings)

process_info: info log message (list of strings)

process_memory: process memory

process_progress: process progress in percentage

process_rc: Process algorithm return code

process_resources: process_resources

process_warning: warning log message (list of strings)

property sample: Get sample.

save()[source]

Save Metadata to the server.

If Metadata is already uploaded: update. Otherwise, create new one.

scheduled: scheduled

set_df(value)[source]: Set df.

set_index(df)[source]

Set index of df to Sample ID.

If there is a column with Sample ID, set that as the index. If there is a Sample name or Sample slug column, map sample names / slugs to sample IDs and set the IDs as the index. If there is no suitable column, raise an error. This also works if any of the above is already an index with the appropriate name.

size: size

slug: human-readable unique identifier

property started: Get start time.

status: process status - Possible values: UP (Uploading - for upload processes), RE (Resolving - computing input data objects) WT (Waiting - waiting for process since the queue is full) PP (Preparing - preparing the environment for processing) PR (Processing) OK (Done) ER (Error) DR (Dirty - Data is dirty)

stdout()

Return process standard output (stdout.txt file content).

Fetch stdout.txt file from the corresponding Data object and return the file content as string. The string can be long and ugly.

Return type: string

tags: data object’s tags

property unique

Get unique attribute.

This attribute tells if Metadata has one-to-one or one-to-many relation to collection samples.

update(): Clear cache and update resource fields from the server.

validate_df(df)[source]

Validate df property.

Validates that df:

is an instance of pandas.DataFrame
index contains sample IDs that match some samples:
- If there are no matches, raise warning
- If there are samples in df but not in collection, raise warning
- If there are samples in collection but not in df, raise warning
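The sample ID checks listed above amount to set comparisons. The sketch below shows one way to express them with plain Python sets and the warnings module. It is not the resdk implementation: the function name, the warning texts, and returning early when nothing matches are assumptions.

```python
import warnings


def check_sample_ids(table_ids, collection_ids):
    """Warn about mismatches between table index and collection samples."""
    table_ids, collection_ids = set(table_ids), set(collection_ids)
    if not table_ids & collection_ids:
        warnings.warn("No sample IDs in the table match the collection.")
        return
    only_in_table = table_ids - collection_ids
    if only_in_table:
        warnings.warn(
            f"Samples in table but not in collection: {sorted(only_in_table)}"
        )
    only_in_collection = collection_ids - table_ids
    if only_in_collection:
        warnings.warn(
            f"Samples in collection but not in table: {sorted(only_in_collection)}"
        )


# Hypothetical IDs: sample 5 is only in the table, sample 3 only in the collection.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_sample_ids([1, 2, 5], [1, 2, 3])

messages = [str(w.message) for w in caught]
print(messages)
```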

version: resource version

class resdk.resources.kb.Feature(resolwe, **model_data)[source]

Knowledge base Feature resource.

aliases: Aliases

delete(force=False)

Delete the resource object from the server.

Parameters: force (bool) – Do not trigger confirmation prompt. WARNING: Be sure that you really know what you are doing as deleted objects are not recoverable.

description: Description

feature_id: Feature ID

classmethod fetch_object(resolwe, id=None, slug=None): Return resource instance that is uniquely defined by identifier.

fields(): Resource fields.

full_name: Full name

id: unique identifier of an object

name: Name

save(): Save resource to the server.

source: Source

species: Species

sub_type: Feature subtype (tRNA, protein coding, rRNA, …)

type: Feature type (gene, transcript, exon, …)

update(): Update resource fields from the server.

class resdk.resources.kb.Mapping(resolwe, **model_data)[source]

Knowledge base Mapping resource.

delete(force=False)

Delete the resource object from the server.

Parameters: force (bool) – Do not trigger confirmation prompt. WARNING: Be sure that you really know what you are doing as deleted objects are not recoverable.

classmethod fetch_object(resolwe, id=None, slug=None): Return resource instance that is uniquely defined by identifier.

fields(): Resource fields.

id: unique identifier of an object

save(): Save resource to the server.

source_db: Source database

source_id: Source feature ID

source_species: Source feature species

target_db: Target database

target_id: Target feature ID

target_species: Target feature species

update(): Update resource fields from the server.

Permissions

Resources like resdk.resources.Data, resdk.resources.Collection, resdk.resources.Sample, and resdk.resources.Process include a permissions attribute to manage permissions. The permissions attribute is an instance of resdk.resources.permissions.PermissionsManager.

class resdk.resources.permissions.PermissionsManager(all_permissions, api_root, resolwe)[source]

Helper class to manage permissions of the BaseResource.

clear_cache()[source]: Clear cache.

copy_from(source)[source]: Copy permissions from some other object to self.

property editors: Get users with edit permission.

fetch()[source]: Fetch permissions from server.

property owners: Get users with owner permission.

set_group(group, perm)[source]

Set perm permission to group.

When assigning permissions, only the highest permission needs to be given. Permission hierarchy is:

none (no permissions)

view

edit

share

owner

Some examples:

collection = res.collection.get(...)
# Add share, edit and view permission to BioLab:
collection.permissions.set_group('biolab', 'share')
# Remove share and edit permission from BioLab:
collection.permissions.set_group('biolab', 'view')
# Remove all permissions from BioLab:
collection.permissions.set_group('biolab', 'none')
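The rule that only the highest permission needs to be given can be pictured as a lookup in an ordered list. The sketch below is an illustration of the documented hierarchy, not code from resdk; the function implied_permissions is a hypothetical helper.

```python
# Hierarchy copied from the list above, ordered from lowest to highest.
PERMISSION_HIERARCHY = ["none", "view", "edit", "share", "owner"]


def implied_permissions(perm):
    """Return every permission that is granted when `perm` is assigned."""
    if perm not in PERMISSION_HIERARCHY:
        raise ValueError(f"Unknown permission: {perm!r}")
    index = PERMISSION_HIERARCHY.index(perm)
    # "none" grants nothing; otherwise include "view" up to and including perm.
    return PERMISSION_HIERARCHY[1 : index + 1]


print(implied_permissions("share"))
print(implied_permissions("none"))
```

This mirrors the examples above: assigning share also covers edit and view, while assigning none removes everything.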

set_public(perm)[source]

Set perm permission for public.

Public can only get two sorts of permissions:

none (no permissions)

view

Some examples:

collection = res.collection.get(...)
# Add view permission to public:
collection.permissions.set_public('view')
# Remove view permission from public:
collection.permissions.set_public('none')

set_user(user, perm)[source]

Set perm permission to user.

When assigning permissions, only the highest permission needs to be given. Permission hierarchy is:

none (no permissions)

view

edit

share

owner

Some examples:

collection = res.collection.get(...)
# Add share, edit and view permission to John:
collection.permissions.set_user('john', 'share')
# Remove share and edit permission from John:
collection.permissions.set_user('john', 'view')
# Remove all permissions from John:
collection.permissions.set_user('john', 'none')

property viewers: Get users with view permission.

Utility functions

Resource utility functions.

resdk.resources.utils.fill_spaces(word, desired_length)[source]: Fill spaces at the end until word reaches desired length.

resdk.resources.utils.flatten_field(field, schema, path)[source]

Reduce dicts of dicts to dot separated keys.

Parameters

field (dict) – Field instance (e.g. input)
schema (dict) – Schema instance (e.g. input_schema)
path (string) – Field path

Returns

flattened instance

Return type

dictionary
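To make "dot separated keys" concrete, here is a simplified sketch of the flattening idea. It deliberately ignores the schema argument and treats every nested dict as a group, which is an assumption; flatten_nested is a hypothetical name and not the resdk function.

```python
def flatten_nested(nested, path=""):
    """Reduce a dict of dicts to a flat dict with dot-separated keys."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{path}.{key}" if path else key
        if isinstance(value, dict):
            # Recurse into the sub-dict, extending the key path.
            flat.update(flatten_nested(value, full_key))
        else:
            flat[full_key] = value
    return flat


# Hypothetical input values.
inputs = {"reads": 12, "options": {"threads": 4, "trim": {"quality": 20}}}
flattened = flatten_nested(inputs)
print(flattened)
```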

resdk.resources.utils.get_collection_id(collection)[source]: Return id attribute of the object if it is collection, otherwise return given value.

resdk.resources.utils.get_data_id(data)[source]: Return id attribute of the object if it is data, otherwise return given value.

resdk.resources.utils.get_descriptor_schema_id(dschema)[source]

Get descriptor schema id.

Return id attribute of the object if it is descriptor schema, otherwise return given value.

resdk.resources.utils.get_process_id(process)[source]: Return id attribute of the object if it is process, otherwise return given value.

resdk.resources.utils.get_relation_id(relation)[source]: Return id attribute of the object if it is relation, otherwise return given value.

resdk.resources.utils.get_sample_id(sample)[source]: Return id attribute of the object if it is sample, otherwise return given value.

resdk.resources.utils.get_user_id(user)[source]: Return id attribute of the object if it is user, otherwise return given value.
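Helpers of this kind let a caller pass either a resource object or a plain id. The sketch below shows the pattern in isolation; the Collection class is a hypothetical stand-in and id_or_value is an illustrative name, not part of resdk.

```python
class Collection:
    """Hypothetical stand-in for resdk.resources.Collection."""

    def __init__(self, id):
        self.id = id


def id_or_value(obj, resource_class):
    """Return obj.id for instances of resource_class, else obj unchanged."""
    return obj.id if isinstance(obj, resource_class) else obj


from_object = id_or_value(Collection(id=7), Collection)
from_plain_id = id_or_value(7, Collection)
print(from_object, from_plain_id)
```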

resdk.resources.utils.is_collection(collection)[source]: Return True if passed object is Collection and False otherwise.

resdk.resources.utils.is_data(data)[source]: Return True if passed object is Data and False otherwise.

resdk.resources.utils.is_descriptor_schema(data)[source]: Return True if passed object is DescriptorSchema and False otherwise.

resdk.resources.utils.is_group(group)[source]: Return True if passed object is Group and False otherwise.

resdk.resources.utils.is_process(process)[source]: Return True if passed object is Process and False otherwise.

resdk.resources.utils.is_relation(relation)[source]: Return True if passed object is Relation and False otherwise.

resdk.resources.utils.is_sample(sample)[source]: Return True if passed object is Sample and False otherwise.

resdk.resources.utils.is_user(user)[source]: Return True if passed object is User and False otherwise.

resdk.resources.utils.iterate_fields(fields, schema)[source]

Recursively iterate over all DictField sub-fields.

Parameters

fields (dict) – Field instance (e.g. input)
schema (dict) – Schema instance (e.g. input_schema)

resdk.resources.utils.iterate_schema(fields, schema, path=None)[source]

Recursively iterate over all schema sub-fields.

Parameters

fields (dict) – Field instance (e.g. input)
schema (dict) – Schema instance (e.g. input_schema)
path (string) – Field path

resdk.resources.utils.parse_resolwe_datetime(dtime)[source]: Convert string representation of time to local datetime.datetime object.
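A conversion of this kind can be sketched with the standard library alone. The example assumes the input is an ISO 8601 string with an explicit UTC offset; the exact format the server sends is not stated here, and to_local_datetime is a hypothetical name.

```python
from datetime import datetime, timezone


def to_local_datetime(dtime):
    """Parse an ISO 8601 string and convert it to the local time zone."""
    # Assumption: the string carries an explicit offset such as "+00:00".
    parsed = datetime.fromisoformat(dtime)
    # astimezone() without arguments converts to the system's local zone.
    return parsed.astimezone()


local = to_local_datetime("2024-01-15T12:30:00+00:00")
print(local.isoformat())
```

The returned object is timezone-aware, so it still compares equal to the original UTC instant regardless of the machine's local zone.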

ReSDK Tables

Helper classes for aggregating collection data in tabular format.

Table classes

class resdk.tables.microarray.MATables(collection: Collection, cache_dir: Optional[str] = None, progress_callable: Optional[Callable] = None)[source]

A helper class to fetch collection’s microarray, qc and meta data.

This class enables fetching given collection’s data and returning it as tables which have samples in rows and microarray / qc / metadata in columns.

A simple example:

# Get Collection object
collection = res.collection.get("collection-slug")

# Fetch collection microarray and metadata
tables = MATables(collection)
meta = tables.meta
exp = tables.exp

__init__(collection: Collection, cache_dir: Optional[str] = None, progress_callable: Optional[Callable] = None)[source]

Initialize class.

Parameters

collection – collection to use
cache_dir – cache directory location, if not specified system specific cache directory is used
progress_callable – custom callable that can be used to report progress. By default, progress is written to stderr with tqdm

static clear_cache() → None: Remove ReSDK cache files from the default cache directory.

property exp: DataFrame: Return expressions values table as a pandas DataFrame object.

property meta: DataFrame

Return samples metadata table as a pandas DataFrame object.

Returns: table of metadata

property qc: DataFrame

Return samples QC table as a pandas DataFrame object.

Returns: table of QC values

property readable_index: Dict[int, str]: Get mapping from index values to readable names.

class resdk.tables.ml_ready.MLTables(collection, name)[source]

Machine-learning ready tables.

__init__(collection, name)[source]

Initialize class.

Parameters: collection – Collection to use

property exp

Get ML ready expressions as pandas.DataFrame.

These expressions are normalized and batch effect corrected - thus ready to be taken into ML procedures.

class resdk.tables.rna.RNATables(collection: Collection, cache_dir: Optional[str] = None, progress_callable: Optional[Callable] = None, expression_source: Optional[str] = None, expression_process_slug: Optional[str] = None)[source]

A helper class to fetch collection’s expression and meta data.

This class enables fetching given collection’s data and returning it as tables which have samples in rows and expressions/metadata in columns.

When RNATables.exp, RNATables.rc and RNATables.meta are called for the first time, the corresponding data is downloaded from the server. This data is then cached in memory and on disk and is used in subsequent calls. If the data on the server changes, the updated version is re-downloaded.

A simple example:

# Get Collection object
collection = res.collection.get("collection-slug")

# Fetch collection expressions and metadata
tables = RNATables(collection)
exp = tables.exp
rc = tables.rc
meta = tables.meta

__init__(collection: Collection, cache_dir: Optional[str] = None, progress_callable: Optional[Callable] = None, expression_source: Optional[str] = None, expression_process_slug: Optional[str] = None)[source]

Initialize class.

Parameters

collection – collection to use
cache_dir – cache directory location, if not specified system specific cache directory is used
progress_callable – custom callable that can be used to report progress. By default, progress is written to stderr with tqdm
expression_source – Only consider samples in the collection with specified source
expression_process_slug – Only consider samples in the collection with specified process slug

property build: str: Get build.

check_heterogeneous_collections()[source]: Ensure consistency among expressions.

static clear_cache() → None: Remove ReSDK cache files from the default cache directory.

property exp: DataFrame

Return expressions table as a pandas DataFrame object.

Which type of expressions (TPM, CPM, FPKM, …) get returned depends on how the data was processed. The expression type can be checked in the returned table attribute attrs['exp_type']:

exp = tables.exp
print(exp.attrs['exp_type'])

Returns: table of expressions

property meta: DataFrame

Return samples metadata table as a pandas DataFrame object.

Returns: table of metadata

property qc: DataFrame

Return samples QC table as a pandas DataFrame object.

Returns: table of QC values

property rc: DataFrame

Return expression counts table as a pandas DataFrame object.

Returns: table of counts

property readable_columns: Dict[str, str]

Map of source gene ids to symbols.

This also gets fetched only once and then cached in memory and on disk. RNATables.exp or RNATables.rc must be called before this as the mapping is specific to just this data. Its intended use is to rename table column labels from gene ids to symbols.

Example of use:

exp = exp.rename(columns=tables.readable_columns)

Returns: dict with gene ids as keys and gene symbols as values

property readable_index: Dict[int, str]: Get mapping from index values to readable names.

class resdk.tables.methylation.MethylationTables(collection: Collection, cache_dir: Optional[str] = None, progress_callable: Optional[Callable] = None)[source]

A helper class to fetch collection’s methylation and meta data.

This class enables fetching given collection’s data and returning it as tables which have samples in rows and methylation/metadata in columns.

A simple example:

# Get Collection object
collection = res.collection.get("collection-slug")

# Fetch collection methylation and metadata
tables = MethylationTables(collection)
meta = tables.meta
beta = tables.beta
m_values = tables.mval

__init__(collection: Collection, cache_dir: Optional[str] = None, progress_callable: Optional[Callable] = None)[source]

Initialize class.

Parameters

collection – collection to use
cache_dir – cache directory location, if not specified system specific cache directory is used
progress_callable – custom callable that can be used to report progress. By default, progress is written to stderr with tqdm

property beta: DataFrame: Return beta values table as a pandas DataFrame object.

static clear_cache() → None: Remove ReSDK cache files from the default cache directory.

property meta: DataFrame

Return samples metadata table as a pandas DataFrame object.

Returns: table of metadata

property mval: DataFrame: Return m-values as a pandas DataFrame object.

property qc: DataFrame

Return samples QC table as a pandas DataFrame object.

Returns: table of QC values

property readable_index: Dict[int, str]: Get mapping from index values to readable names.

class resdk.tables.variant.VariantTables(collection: Collection, geneset: Optional[List[str]] = None, filtering: bool = True, cache_dir: Optional[str] = None, progress_callable: Optional[Callable] = None)[source]

A helper class to fetch collection’s variant and meta data.

This class enables fetching given collection’s data and returning it as tables which have samples in rows and variants in columns.

A simple example:

# Get Collection object
collection = res.collection.get("collection-slug")

tables = VariantTables(collection)
# Get variant data
tables.variants
# Get depth per variant or coverage for specific base
tables.depth
tables.depth_a
tables.depth_c
tables.depth_g
tables.depth_t

__init__(collection: Collection, geneset: Optional[List[str]] = None, filtering: bool = True, cache_dir: Optional[str] = None, progress_callable: Optional[Callable] = None)[source]

Initialize class.

Parameters

collection – Collection to use.
geneset – Only consider mutations from this gene-set. Can be a list of gene symbols or a valid geneset Data object id / slug.
filtering – Only show variants that pass QC filters.
cache_dir – Cache directory location, if not specified system specific cache directory is used.
progress_callable – Custom callable that can be used to report progress. By default, progress is written to stderr with tqdm.

static clear_cache() → None: Remove ReSDK cache files from the default cache directory.

property depth: DataFrame: Get depth table.

property depth_a: DataFrame: Get depth table for adenine.

property depth_c: DataFrame: Get depth table for cytosine.

property depth_g: DataFrame: Get depth table for guanine.

property depth_t: DataFrame: Get depth table for thymine.

property filter: DataFrame

Get filter table.

Values can be:

PASS - variant has passed all filters

Otherwise, the value names the filter that failed:

DP - insufficient read depth (< 10.0)

QD - insufficient quality normalized by depth (< 2.0)

FS - insufficient phred-scaled p-value using Fisher’s exact test to detect strand bias (> 30.0)

SnpCluster - variant is part of a cluster

For example, if a variant has read depth 8, GATK will mark it as DP.
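The thresholds above can be written out as a small function to show how a filter label follows from the values. This is an illustration of the listed cut-offs only, not GATK's or resdk's implementation; the function and parameter names are hypothetical, and returning a list when several filters fail is an assumption.

```python
def failed_filters(depth, quality_by_depth, fisher_strand, in_cluster=False):
    """Return the filter codes a variant fails, or ["PASS"] if none."""
    failed = []
    if depth < 10.0:
        failed.append("DP")
    if quality_by_depth < 2.0:
        failed.append("QD")
    if fisher_strand > 30.0:
        failed.append("FS")
    if in_cluster:
        failed.append("SnpCluster")
    return failed or ["PASS"]


# A variant with read depth 8, as in the example above.
low_depth = failed_filters(depth=8, quality_by_depth=12.5, fisher_strand=3.1)
print(low_depth)
```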

property geneset: Get geneset.

property meta: DataFrame

Return samples metadata table as a pandas DataFrame object.

Returns: table of metadata

property qc: DataFrame

Return samples QC table as a pandas DataFrame object.

Returns: table of QC values

property readable_index: Dict[int, str]: Get mapping from index values to readable names.

property variants: DataFrame

Get variants table.

There are 4 possible values:

0 - wild-type, no variant

1 - heterozygous mutation

2 - homozygous mutation

NaN - QC filters are failing - mutation status is unreliable
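When summarising such a table, NaN needs separate handling because it does not compare equal to itself. The sketch below translates the four values listed above into labels using plain Python; the column of values is invented, and describe_variant is a hypothetical helper.

```python
import math
from collections import Counter

# Meanings copied from the list above.
GENOTYPE_LABELS = {0: "wild-type", 1: "heterozygous", 2: "homozygous"}


def describe_variant(value):
    """Translate one cell of the variants table into a readable label."""
    if isinstance(value, float) and math.isnan(value):
        return "unreliable"
    return GENOTYPE_LABELS[int(value)]


# Hypothetical values of one variant across five samples.
column = [0.0, 1.0, 2.0, float("nan"), 1.0]
summary = Counter(describe_variant(value) for value in column)
print(summary)
```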

Exceptions

Custom ReSDK exceptions.

class resdk.exceptions.ValidationError[source]: An error while validating data.

Logging

Module contents:

Parent logger for all modules in resdk library
Handler STDOUT_HANDLER is “turned off” by default
Handler configuration functions
Override sys.excepthook to log all uncaught exceptions

Parent logger

Loggers in resdk are named by their module name. This is achieved by:

logger = logging.getLogger(__name__)

This makes it easy to locate the source of a log message.

Logging handlers

The handler STDOUT_HANDLER is created but not automatically added to ROOT_LOGGER, which means it does nothing by default. The handler is activated when users call logger configuration functions like start_logging().

Handler configuration functions

As a good logging practice, the library does not register handlers by default. The reason is that if the library is included in some application, developers of that application will probably want to configure logging themselves. Therefore, a user who wishes to register the pre-defined handlers can run:

import resdk
resdk.start_logging()
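The pattern of creating a handler up front but attaching it only on request can be sketched with the standard logging module. The names demo_logger, demo_handler and start_demo_logging are hypothetical; they mirror the roles of ROOT_LOGGER, STDOUT_HANDLER and start_logging() but are not resdk's code.

```python
import logging
import sys

# Hypothetical names for this sketch, standing in for the library's
# parent logger and its pre-built stdout handler.
demo_logger = logging.getLogger("demo_library")
demo_handler = logging.StreamHandler(sys.stdout)


def start_demo_logging(logging_level=logging.INFO):
    """Attach the pre-built handler and set the threshold level."""
    demo_logger.setLevel(logging_level)
    demo_handler.setLevel(logging_level)
    # Guard against attaching the same handler twice on repeated calls.
    if demo_handler not in demo_logger.handlers:
        demo_logger.addHandler(demo_handler)


attached_before = demo_handler in demo_logger.handlers
start_demo_logging(logging.DEBUG)
attached_after = demo_handler in demo_logger.handlers
demo_logger.debug("handler is now active")
```

Until start_demo_logging() is called the handler exists but receives nothing, so an application embedding the library keeps full control over its own logging setup.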

resdk_logger.start_logging(logging_level=logging.INFO)

Start logging resdk with the default configuration.

Parameters: logging_level (int) – logging threshold level - integer in [0-50]
Return type: None

Logging levels:

logging.DEBUG(10)
logging.INFO(20)
logging.WARNING(30)
logging.ERROR(40)
logging.CRITICAL(50)

resdk_logger.log_to_stdout(level=None)

Configure logging to stdout.

Parameters

is_on (bool) – If True, log to standard output
level (int) – logging threshold level - integer in [0-50]

Return type

None

Log uncaught exceptions

All uncaught Python exceptions are handled by the function stored in sys.excepthook. By overriding the default implementation, we can modify it for our purposes: to log all uncaught exceptions.

Note#1: Modified behaviour (logging of all uncaught exceptions) applies only when running in non-interactive mode.

Note#2: An exception can be caught or uncaught, and it can occur in interactive or non-interactive mode, which gives four scenarios. The sys.excepthook modification takes care of uncaught exceptions in non-interactive mode. In interactive mode, the user is notified directly when an exception is raised. If an exception is caught and not re-raised, it should be logged, since it can provide valuable information to a developer when debugging. Therefore, the convention for logging in resdk is: “Exceptions are explicitly logged only when they are caught and not re-raised.”
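A minimal sketch of an excepthook replacement that logs uncaught exceptions is shown below. It is not resdk's implementation: the logger name, the function names, and detecting an interactive session through sys.ps1 are assumptions. The hook is invoked directly here so that the example can run without actually crashing.

```python
import io
import logging
import sys

logger = logging.getLogger("excepthook_demo")  # hypothetical logger name


def log_uncaught(exc_type, exc_value, exc_traceback):
    """Log an uncaught exception together with its traceback."""
    logger.error(
        "Uncaught exception", exc_info=(exc_type, exc_value, exc_traceback)
    )


def install_hook():
    """Replace sys.excepthook, but only in non-interactive mode."""
    # Assumption: sys.ps1 is defined only in interactive sessions.
    if not hasattr(sys, "ps1"):
        sys.excepthook = log_uncaught


# Capture log output in memory and call the hook directly for demonstration.
stream = io.StringIO()
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False
try:
    raise ValueError("boom")
except ValueError:
    log_uncaught(*sys.exc_info())

captured = stream.getvalue()
print(captured)
```

Calling install_hook() once at import time would route every later uncaught exception through log_uncaught when the program runs as a script.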