Query, inspect and download data

Login

By now, you should have an account on the Genialis Server. If not, you can request a Demo. Let’s connect to the server by creating a Resolwe object:

import resdk

# Create a Resolwe object to interact with the server and log in
res = resdk.Resolwe(url='https://app.genialis.com')
res.login()

# Enable verbose logging to standard output
resdk.start_logging()

If you omit login(), you will be connected as an anonymous user. Note that this strongly limits what you can do.

Note

To avoid copy-pasting of the commands, you can download all the code used in this section.

Query resources

As you have read in the Genialis Server basics section, there are various resources: Data, Sample, Collection, Process… Each of them has a corresponding entry point on the Resolwe object (in our case, this is the res variable). For example, to count all Data or Sample objects:

res.data.count()
res.sample.count()

Note

id is the autogenerated unique identifier of an object. IDs are integers.

slug is the unique name of an object. It is automatically created from the name and can also be edited by the user, although we do not recommend that. Only lowercase letters, numbers and dashes are allowed (no white space or uppercase letters).

name is an arbitrary, non-unique name of an object.
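
The slug rule above can be checked locally before sending a query. The following is a minimal sketch: looks_like_slug is a hypothetical helper, not part of resdk, and it only encodes the character rule stated in this note. The server may apply further constraints (for example on length) that are not covered here.

```python
import re

# Assumption: only the rule from the note above is encoded here
# (lowercase letters, digits and dashes); this is not taken from resdk.
SLUG_PATTERN = re.compile(r"[a-z0-9-]+")


def looks_like_slug(value):
    """Return True if value has only lowercase letters, digits and dashes."""
    return bool(SLUG_PATTERN.fullmatch(value))


print(looks_like_slug("resdk-example"))  # True
print(looks_like_slug("ResDK Example"))  # False (uppercase and white space)
```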

In practice, one typically wants to narrow down the number of results. This can be done with the filter(**fields) method. It returns a list of objects that match the conditions defined by **fields. For example:

# Get all Collection objects with "RNA-Seq" in their name
res.collection.filter(name__contains='RNA-Seq')

# Get all Processes with category "Align"
res.process.filter(category='Align')

Note

For a complete list of processes, their categories and definitions, please visit the resolwe-bio docs.

But the real power of the filter() method is in combining multiple parameters:

# Filter by using several fields:
from datetime import datetime

res.data.filter(
    status='OK',
    created__gt=datetime(2018, 10, 1),
    created__lt=datetime(2025, 11, 1),
    ordering='-modified',
    limit=3,
)

This returns Data objects with OK status that were created after 1 October 2018 and before 1 November 2025, orders them by descending modification date and returns the first 3 objects. Quite powerful, isn’t it?
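
To make the lookup suffixes concrete, here is a minimal sketch that mimics the same call on a plain Python list. The records and the local_filter helper are hypothetical and exist only for illustration. The sketch does not show how resdk evaluates filters, and it handles only exact matches and the __gt and __lt lookups.

```python
from datetime import datetime

# Hypothetical in-memory records standing in for Data objects.
records = [
    {"id": 1, "status": "OK", "created": datetime(2018, 9, 20), "modified": datetime(2018, 9, 21)},
    {"id": 2, "status": "OK", "created": datetime(2018, 10, 5), "modified": datetime(2019, 1, 1)},
    {"id": 3, "status": "ER", "created": datetime(2018, 10, 9), "modified": datetime(2018, 10, 9)},
    {"id": 4, "status": "OK", "created": datetime(2020, 3, 2), "modified": datetime(2021, 6, 1)},
]


def local_filter(items, ordering=None, limit=None, **fields):
    """Apply exact, __gt and __lt lookups, then ordering and limit."""

    def matches(item):
        for key, wanted in fields.items():
            # "created__gt" -> field "created", lookup "gt"; "status" -> lookup ""
            name, _, lookup = key.partition("__")
            value = item[name]
            if lookup == "":
                ok = value == wanted
            elif lookup == "gt":
                ok = value > wanted
            elif lookup == "lt":
                ok = value < wanted
            else:
                raise ValueError(f"Unsupported lookup: {lookup}")
            if not ok:
                return False
        return True

    selected = [item for item in items if matches(item)]
    if ordering:
        # A leading "-" means descending order.
        reverse = ordering.startswith("-")
        selected.sort(key=lambda item: item[ordering.lstrip("-")], reverse=reverse)
    return selected[:limit]


result = local_filter(
    records,
    status="OK",
    created__gt=datetime(2018, 10, 1),
    created__lt=datetime(2025, 11, 1),
    ordering="-modified",
    limit=3,
)
print([item["id"] for item in result])  # [4, 2]
```

Record 1 is dropped because it was created too early and record 3 because of its status; the remaining two are returned with the most recently modified first.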

Note

For a complete list of filtering options use a “wrong” filtering argument and you will receive an informative message with all options listed. For example:

res.data.filter(foo="bar")

The get(**fields) method searches by the same parameters as filter() and returns a single object (filter() returns a list). If only one parameter is given, it is interpreted as a unique identifier: an id if it is a number, or a slug if it is a string:

# Get object by slug
res.sample.get('resdk-example')
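
The id-or-slug rule can be pictured with a small sketch. identifier_to_query is a hypothetical function written for this illustration and is not how resdk is implemented; it only mirrors the rule described above (a number is treated as an id, a string as a slug).

```python
def identifier_to_query(value):
    """Map a single identifier to the field it would be matched against."""
    # Assumption: integers mean id and strings mean slug, as described above.
    if isinstance(value, int):
        return {"id": value}
    if isinstance(value, str):
        return {"slug": value}
    raise TypeError("Identifier must be an integer id or a string slug.")


print(identifier_to_query(42))               # {'id': 42}
print(identifier_to_query("resdk-example"))  # {'slug': 'resdk-example'}
```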

Inspect resources

We have learned how to query the resources with get and filter. Now we will look at how to access the information in these resources. All of the resources share some common attributes like name, id, slug, created, modified, contributor and permissions. You can access them like any other Python class attributes:

# Get a data object:
data = res.data.get('resdk-example-reads')

# Object creator:
data.contributor
# Date and time of object creation:
data.created
# Name
data.name
# List of permissions
data.permissions

Aside from these attributes, each resource class has some specific attributes and methods. For example, some of the most used ones for the Data resource:

data = res.data.get('resdk-example-reads')
data.status
data.process
data.started
data.finished
data.size

You can check the list of methods defined for each of the resources in the reference section. Note that some attributes and methods are defined in the BaseResource and BaseCollection classes. BaseResource is the parent of all resource classes in resdk. BaseCollection is the parent of all collection-like classes: Sample and Collection.

Quite commonly, one wants to inspect the list of Data objects in a Collection or to know the Sample of a given Data object. For such purposes, there are some handy shortcuts:

data.sample

data.collection

sample.data

sample.collection

collection.data

collection.samples
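
These shortcuts can be chained to walk from a Collection down to its Data objects. The sketch below assumes that collection.samples and sample.data can be iterated and that every object has a name attribute, as described above. data_names_by_sample is a hypothetical helper, and SimpleNamespace stand-ins are used so that the snippet runs without a server connection.

```python
from types import SimpleNamespace


def data_names_by_sample(collection):
    """Return {sample name: [data names]} for every sample in a collection."""
    return {
        sample.name: [data.name for data in sample.data]
        for sample in collection.samples
    }


# Stand-ins for real resources; with a live connection you would pass
# an object returned by res.collection.get(...) instead.
reads = SimpleNamespace(name="reads")
alignment = SimpleNamespace(name="alignment")
sample = SimpleNamespace(name="sample-1", data=[reads, alignment])
collection = SimpleNamespace(name="demo", samples=[sample])

print(data_names_by_sample(collection))  # {'sample-1': ['reads', 'alignment']}
```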

Download data

Resource classes Data, Sample and Collection have the methods files() and download().

The files() method returns a list of all files on the resource but does not download anything.

# Get data by slug
data = res.data.get('resdk-example-reads')

# Print a list of files
data.files()

# Filter the list of files by file name
data.files(file_name='reads.fastq.gz')

# Filter the list of files by field name
data.files(field_name='output.fastq')

The method download() downloads the resource files. The optional parameters file_name and field_name have the same effect as in the files method. There is an additional parameter, download_dir, that allows you to specify the download directory:

# Get sample by slug
sample = res.sample.get('resdk-example')

# Download the FASTQ reads file into the current directory
sample.download(
    file_name='reads.fastq.gz',
    download_dir='./',
)
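
When downloading into a directory other than the current one, it can be convenient to create that directory first. The following is a minimal sketch: download_to is a hypothetical wrapper, not part of resdk, and it relies only on the download() parameters described above. Whether download() creates a missing directory by itself is not covered in this section, so the wrapper creates it up front.

```python
import os


def download_to(resource, download_dir, **selectors):
    """Create download_dir if needed, download into it and list its files."""
    # exist_ok=True makes repeated calls safe when the directory already exists.
    os.makedirs(download_dir, exist_ok=True)
    # selectors are passed through unchanged, e.g. file_name or field_name.
    resource.download(download_dir=download_dir, **selectors)
    return sorted(os.listdir(download_dir))


# Usage with a live connection (requires the sample object from above):
# download_to(sample, 'downloads/reads', file_name='reads.fastq.gz')
```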