Knowledge base
Genialis Knowledge base (KB) is a collection of “features” (genes, transcripts, …) and “mappings” between these features. It comes very handy when performing various tasks with genomic features e.g.:
find all aliases of gene
BRCA2
finding all genes of type
protein_coding
find all transcripts of gene
FHIT
converting
gene_id
togene_symbol
…
Feature
Feature
object represents a genomic feature: a gene, a transcript, etc.
You can query Feature
objects by feature
endpoint, similarly like
Data
, Sample
or any other ReSDK resource:
feature = res.feature.get(feature_id="BRCA2")
To examine all attributes of a Feature
, see the SDK Reference.
Here we will list a few most commonly used ones:
# Get the feature:
feature = res.feature.get(feature_id="BRCA2")
# Database where this feature is defined, e.g. ENSEMBL, UCSC, NCBI, ...
feature.source
# Unique identifier of a feature
feature.feature_id
# Feature species
feature.species
# Feature type, e.g. gene, transcript, exon, ...
feature.type
# Feature name
feature.name
# List of feature aliases
feature.aliases
The real power is in the filter capabilities. Here are some examples:
# Count number of Human "protein-conding" transcripts in ENSEMBL database
res.feature.filter(
species="Homo sapiens",
type="transcript",
subtype="protein-coding",
source="ENSEMBL",
).count()
# Convert all gene IDs in a list ``gene_ids`` to gene symbols::
gene_ids = ["ENSG00000139618", "ENSG00000189283"]
genes = res.feature.filter(
feature_id__in=gene_ids,
type="gene",
species="Homo sapiens",
)
mapping = {g.feature_id: g.name for g in genes}
gene_symbols = [mapping[gene_id] for gene_id in gene_ids]
Warning
It might look tempting to simplify the last example with:
gene_symbols = [g.name for g in genes]
Don’t do this. The order of entries in the genes
can be arbitrary
and therefore cause that the resulting list gene_symbols
is not
ordered in the same way as gene_ids
.
Mapping
Mapping is a connection between two features. Features can be related
in various ways. The type of mapping is indicated by relation_type
attribute. It is one of the following options:
crossdb
: Two features from different sources (databases) that describe same feature. Example: connection for gene BRCA2 between database “UniProtKB” and “UCSC”.
ortholog
: Two features from different species that describe orthologous gene.
transcript
: Connection between gene and it’s transcripts.
exon
: Connection between gene / transcript and it’s exons.
Again, we will only list an example and then let your imagination fly:
# Find UniProtKB ID for gene with given ENSEMBL ID:
mapping = res.mapping.filter(
source_id="ENSG00000189283",
source_db="ENSEMBL",
target_db="UniProtKB",
source_species="Homo sapiens",
target_species="Homo sapiens",
)
uniprot_id = mapping[0].target_id