Knowledge base

Genialis Knowledge base (KB) is a collection of “features” (genes, transcripts, …) and “mappings” between these features. It comes very handy when performing various tasks with genomic features e.g.:

  • find all aliases of gene BRCA2

  • finding all genes of type protein_coding

  • find all transcripts of gene FHIT

  • converting gene_id to gene_symbol

Feature

Feature object represents a genomic feature: a gene, a transcript, etc. You can query Feature objects by feature endpoint, similarly like Data, Sample or any other ReSDK resource:

feature = res.feature.get(feature_id="BRCA2")

To examine all attributes of a Feature, see the SDK Reference. Here we will list a few most commonly used ones:

# Get the feature:
feature = res.feature.get(feature_id="BRCA2")

# Database where this feature is defined, e.g. ENSEMBL, UCSC, NCBI, ...
feature.source

# Unique identifier of a feature
feature.feature_id

# Feature species
feature.species

# Feature type, e.g. gene, transcript, exon, ...
feature.type

# Feature name
feature.name

# List of feature aliases
feature.aliases

The real power is in the filter capabilities. Here are some examples:

# Count number of Human "protein-conding" transcripts in ENSEMBL database
res.feature.filter(
    species="Homo sapiens",
    type="transcript",
    subtype="protein-coding",
    source="ENSEMBL",
).count()

# Convert all gene IDs in a list ``gene_ids`` to gene symbols::
gene_ids = ["ENSG00000139618", "ENSG00000189283"]
genes = res.feature.filter(
    feature_id__in=gene_ids,
    type="gene",
    species="Homo sapiens",
)
mapping = {g.feature_id: g.name for g in genes}
gene_symbols = [mapping[gene_id] for gene_id in gene_ids]

Warning

It might look tempting to simplify the last example with:

gene_symbols = [g.name for g in genes]

Don’t do this. The order of entries in the genes can be arbitrary and therefore cause that the resulting list gene_symbols is not ordered in the same way as gene_ids.

Mapping

Mapping is a connection between two features. Features can be related in various ways. The type of mapping is indicated by relation_type attribute. It is one of the following options:

  • crossdb: Two features from different sources (databases) that describe same feature. Example: connection for gene BRCA2 between database “UniProtKB” and “UCSC”.

  • ortholog: Two features from different species that describe orthologous gene.

  • transcript: Connection between gene and it’s transcripts.

  • exon: Connection between gene / transcript and it’s exons.

Again, we will only list an example and then let your imagination fly:

# Find UniProtKB ID for gene with given ENSEMBL ID:
mapping = res.mapping.filter(
    source_id="ENSG00000189283",
    source_db="ENSEMBL",
    target_db="UniProtKB",
    source_species="Homo sapiens",
    target_species="Homo sapiens",
)
uniprot_id = mapping[0].target_id