Query, inspect and download data
Login
By now, you should have an account on the Genialis Server. If not, you can
request a Demo. Let’s connect to the server by creating a Resolwe object:
import resdk
# Create a Resolwe object to interact with the server and login
res = resdk.Resolwe(url='https://app.genialis.com')
res.login()
# Enable verbose logging to standard output
resdk.start_logging()
If you omit the login() call, you will be logged in as an anonymous user. Note
that this strongly limits what you can do.
Note
To avoid copy-pasting the commands, you can download all the code used in this section.
Query resources
As you have read in the Genialis Server basics section, there are various
resources: Data, Sample, Collection, Process, and so on. Each of them has a
corresponding entry point on the Resolwe object (in our case, this is the res
variable). For example, to count all Data or Sample objects:
res.data.count()
res.sample.count()
Note
id is the autogenerated unique identifier of an object. IDs are integers.
slug is the unique name of an object. The slug is automatically created from
the name but can also be edited by the user, although we do not recommend that.
Only lowercase letters, numbers and dashes are allowed (white space and
uppercase letters are not accepted).
name is an arbitrary, non-unique name of an object.
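The slug rule in the note above can be checked locally before sending a query. The following is a minimal sketch, assuming the rule is exactly "one or more lowercase ASCII letters, digits or dashes"; the helper name looks_like_slug is hypothetical and not part of resdk, and the server may enforce additional constraints.

```python
import re

# Assumption: a slug consists only of lowercase ASCII letters, digits and
# dashes, as described in the note. The server may apply further rules.
SLUG_PATTERN = re.compile(r"[a-z0-9-]+")


def looks_like_slug(value: str) -> bool:
    """Return True if value uses only lowercase letters, digits and dashes."""
    return SLUG_PATTERN.fullmatch(value) is not None
```

With this sketch, 'resdk-example' passes the check, while a name such as 'RNA-Seq' does not, because it contains uppercase letters.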
In practice, one typically wants to narrow down the number of results. This can
be done with the filter(**fields) method, which returns a list of objects that
match the conditions defined by **fields. For example:
# Get all Collection objects with "RNA-Seq" in their name
res.collection.filter(name__contains='RNA-Seq')
# Get all Processes with category "Align"
res.process.filter(category='Align')
Note
For a complete list of processes, their categories and definitions, please visit the resolwe-bio docs.
But the real power of the filter() method is in combining multiple parameters:
# Filter by using several fields:
from datetime import datetime
res.data.filter(
    status='OK',
    created__gt=datetime(2018, 10, 1),
    created__lt=datetime(2025, 11, 1),
    ordering='-modified',
    limit=3,
)
This selects Data objects with OK status that were created after 1 October 2018 and before 1 November 2025, orders them by descending modification date, and returns the first 3 objects. Quite powerful, isn’t it?
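To restrict a query to a single calendar month, the two created bounds can be computed rather than typed by hand. The following is a minimal sketch; month_window is a hypothetical helper, not part of resdk, and it only builds the keyword arguments used in the example above.

```python
from datetime import datetime


def month_window(year: int, month: int) -> dict:
    """Build created__gt / created__lt bounds covering one calendar month."""
    start = datetime(year, month, 1)
    # First day of the following month; December rolls over to January.
    end = datetime(year + 1, 1, 1) if month == 12 else datetime(year, month + 1, 1)
    return {"created__gt": start, "created__lt": end}


# Hypothetical usage with the `res` object created earlier:
# res.data.filter(status='OK', ordering='-modified', limit=3, **month_window(2018, 10))
```

Because the lower bound uses created__gt, an object created at exactly midnight on the first day of the month would presumably be excluded.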
Note
For a complete list of filtering options, pass a “wrong” filtering argument and you will receive an informative message with all options listed. For example:
res.data.filter(foo="bar")
The get(**fields) method searches by the same parameters as filter and returns
a single object (filter returns a list). If only one parameter is given, it is
interpreted as a unique identifier, either id or slug, depending on whether it
is a number or a string:
# Get object by slug
res.sample.get('resdk-example')
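The id-or-slug rule can be made explicit with a small helper. This is only an illustration of the rule as described above, not resdk's actual implementation; unique_lookup is a hypothetical name.

```python
def unique_lookup(identifier):
    """Map a single identifier to an explicit keyword filter (id or slug)."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(identifier, int) and not isinstance(identifier, bool):
        return {"id": identifier}
    if isinstance(identifier, str):
        return {"slug": identifier}
    raise TypeError("identifier must be an integer id or a string slug")


# Hypothetical usage, assuming get() accepts the same keyword fields as filter():
# res.sample.get(**unique_lookup('resdk-example'))
```

Spelling out the keyword this way can make scripts easier to read when identifiers come from user input.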
Inspect resources
We have learned how to query the resources with get and filter. Now we will
look at how to access the information in these resources. All of the resources
share some common attributes like name, id, slug, created, modified,
contributor and permissions. You can access them like any other Python class
attributes:
# Get a data object:
data = res.data.get('resdk-example-reads')
# Object creator:
data.contributor
# Date and time of object creation:
data.created
# Name
data.name
# List of permissions
data.permissions
Aside from these attributes, each resource class has some specific attributes
and methods. For example, some of the most used ones for the Data resource:
data = res.data.get('resdk-example-reads')
data.status
data.process
data.started
data.finished
data.size
You can check the list of methods defined for each of the resources in the
reference section. Note that some attributes and methods are defined in the
BaseResource and BaseCollection classes. BaseResource is the parent of all
resource classes in resdk. BaseCollection is the parent of all collection-like
classes: Sample and Collection.
Quite commonly, one wants to inspect the list of Data objects in a Collection
or to know the Sample of a given Data object. For such purposes, there are
some handy shortcuts:
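The snippet for these shortcuts is not present in this text, so the following is a sketch under stated assumptions: Collection objects are assumed to expose data and samples attributes, and Data objects are assumed to expose sample and collection attributes. The two helper functions are hypothetical and only wrap those attribute accesses; check the reference section for the exact names.

```python
def contents(collection):
    """Return the Data and Sample objects contained in a Collection."""
    # Assumed attributes: `data` and `samples` on Collection objects.
    return list(collection.data), list(collection.samples)


def related_objects(data):
    """Return the Sample and Collection that a Data object belongs to."""
    # Assumed attributes: `sample` and `collection` on Data objects.
    return data.sample, data.collection


# Hypothetical usage with the `res` object created earlier:
# collection = res.collection.get('resdk-example')
# data_objects, samples = contents(collection)
```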
Download data
Resource classes Data, Sample and Collection have the methods files() and
download(). The files() method returns a list of all files on the resource but
does not download anything.
# Get data by slug
data = res.data.get('resdk-example-reads')
# Print a list of files
data.files()
# Filter the list of files by file name
data.files(file_name='reads.fastq.gz')
# Filter the list of files by field name
data.files(field_name='output.fastq')
The download() method downloads the resource files. The optional parameters
file_name and field_name have the same effect as in the files() method. There
is an additional parameter, download_dir, that allows you to specify the
download directory:
# Get sample by slug
sample = res.sample.get('resdk-example')
# Download the FASTQ reads file into current directory
sample.download(
    file_name='reads.fastq.gz',
    download_dir='./',
)
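When downloading into a directory other than the current one, it can help to create that directory first. The following is a minimal sketch, assuming download() expects download_dir to exist already; download_reads is a hypothetical helper that uses only the file_name and download_dir parameters described above.

```python
from pathlib import Path


def download_reads(sample, target_dir, file_name="reads.fastq.gz"):
    """Create target_dir if needed, then download one file from the sample."""
    directory = Path(target_dir)
    # Assumption: download() does not create the directory itself.
    directory.mkdir(parents=True, exist_ok=True)
    sample.download(file_name=file_name, download_dir=str(directory))
    # Expected location of the file, assuming it keeps its name on disk.
    return directory / file_name


# Hypothetical usage with the `res` object created earlier:
# sample = res.sample.get('resdk-example')
# download_reads(sample, 'downloads/resdk-example')
```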