Using the GeneBe API with Python

Purpose

One of the problems I wanted to solve was having a convenient tool for working with many genomic variants stored in a dataframe. I had variants stored in CSV files, some in VCF, some in JSON. I could read them all easily using pandas, but annotating them, lifting them over to hg38, or parsing HGVS strings was a nightmare -- downloading databases, installing dependencies, reformatting the input. pygenebe (https://github.com/pstawinski/pygenebe) addresses all these issues, letting bioinformaticians focus on what is important and get annotation or conversion simply done.

Installing

pygenebe can be installed using pip. Simply call:

pip install genebe

Examples

Not all examples below use pandas; some use plain Python. For more examples, check the "More information" section.

There is also a showcase notebook demonstrating the main features: https://github.com/pstawinski/pygenebe/blob/main/examples/showcase.ipynb

Annotating a simple list of variants

The example uses a single-element list to keep the output readable, but you can put as many variants as you wish into the list.

import genebe as gnb

input_variants = ['7-69599651-A-G']
# Named "annotations" rather than "list" to avoid shadowing the Python builtin.
annotations = gnb.annotate(input_variants, flatten_consequences=True)

With result:

[{'chr': '7', 'pos': 69599651, 'ref': 'A', 'alt': 'G', 'effect': '5_prime_UTR_variant', 'transcript': 'NM_015570.4', 'consequences': '5_prime_UTR_variant', 'gene_symbol': 'AUTS2', 'dbsnp': '3735260', 'frequency_reference_population': 0.08464142, 'hom_count_reference_population': 5804, 'allele_count_reference_population': 109796, 'gnomad_exomes_af': 0.07914809882640839, 'gnomad_genomes_af': 0.12610900402069092, 'gnomad_exomes_ac': 90660, 'gnomad_genomes_ac': 19136, 'gnomad_exomes_homalt': 4036, 'gnomad_genomes_homalt': 1768, 'computational_prediction_selected': 'Benign', 'splice_prediction_selected': 'Benign', 'bayesdelnoaf_score': -0.5899999737739563, 'bayesdelnoaf_prediction': 'Benign', 'phylop100way_score': 1.0399999618530273, 'spliceai_max_score': 0.0, 'spliceai_max_prediction': 'Benign', 'acmg_score': -14, 'acmg_classification': 'Benign', 'acmg_criteria': 'BP4_Strong,BP6_Moderate,BA1', 'clinvar_disease': 'not provided', 'clinvar_classification': 'Benign', 'gene_hgnc_id': 14262, 'hgvs_c': 'c.-3A>G'}]
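
Each element of the result is a plain dictionary, so the fields of interest can be picked out with ordinary Python. The sketch below works on a trimmed copy of the record shown above and needs no network access; `summarize` is a hypothetical helper, not part of pygenebe, and the use of `.get()` rests on the assumption that some fields may be absent for some variants.

```python
# Trimmed copy of the record shown above; in practice this would be one
# element of the list returned by gnb.annotate(...).
record = {
    'chr': '7', 'pos': 69599651, 'ref': 'A', 'alt': 'G',
    'transcript': 'NM_015570.4',
    'gene_symbol': 'AUTS2',
    'gnomad_exomes_af': 0.07914809882640839,
    'acmg_classification': 'Benign',
    'acmg_criteria': 'BP4_Strong,BP6_Moderate,BA1',
    'hgvs_c': 'c.-3A>G',
}


def summarize(rec):
    """Build a one-line summary from an annotation record (hypothetical helper)."""
    variant = f"{rec['chr']}-{rec['pos']}-{rec['ref']}-{rec['alt']}"
    # acmg_criteria is a comma-separated string; split it into a list.
    criteria = [c for c in rec.get('acmg_criteria', '').split(',') if c]
    # .get() guards against fields missing from a record (assumption:
    # not every variant carries every field).
    af = rec.get('gnomad_exomes_af')
    af_text = 'n/a' if af is None else f"{af:.4f}"
    return (f"{variant} {rec.get('gene_symbol', '?')} "
            f"{rec.get('transcript', '?')}:{rec.get('hgvs_c', '?')} "
            f"ACMG={rec.get('acmg_classification', '?')} "
            f"criteria={len(criteria)} exomes_af={af_text}")


summary = summarize(record)
print(summary)
```

For a longer list of annotations, the same helper can be applied to every element, for example with a list comprehension.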

Annotating a pandas dataframe with variants

import pandas as pd
import genebe as gnb

data = {
    'chr': ['chr6'],
    'pos': [161006172],
    'ref': ['T'],
    'alt': ['G']
}

df = pd.DataFrame(data)
result = gnb.annotate(df, use_ensembl=False, use_refseq=True, genome='hg38', flatten_consequences=True)
result.to_csv('output_file.csv', sep='\t', header=True, index=False)

The result is a tab-separated table (note that the file is written with sep='\t' despite its .csv name):

chr pos ref alt effect transcript consequences gene_symbol aggregated_hom_count aggregated_ac_count aggregated_computational_prediction aggregated_splice_prediction bayesdelnoaf_score bayesdelnoaf_prediction phylop100way_score spliceai_max_score spliceai_max_prediction acmg_score acmg_classification acmg_criteria gene_hgnc_id hgvs_c
chr6 161006172 T G intron_variant NM_005922.4 intron_variant MAP3K4 0 0 Benign Benign -0.6499999761581421 Benign 0.578000009059906 0.0 Benign -2 Likely_benign PM2,BP4_Strong 6856 c.152+14089C>G
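
Because the file is tab-separated, it has to be read back with a tab delimiter. The following is a minimal sketch using only the standard library; the in-memory text stands in for output_file.csv and contains just a few of the columns shown above, and the benign filter is an illustrative choice rather than anything pygenebe prescribes.

```python
import csv
import io

# Stand-in for open('output_file.csv'): a few of the columns shown above,
# separated by tabs in the same way to_csv(sep='\t') writes them.
tsv_text = (
    "chr\tpos\tref\talt\tgene_symbol\tacmg_score\tacmg_classification\n"
    "chr6\t161006172\tT\tG\tMAP3K4\t-2\tLikely_benign\n"
)

reader = csv.DictReader(io.StringIO(tsv_text), delimiter='\t')
rows = list(reader)

# The csv module yields strings, so numeric columns need explicit conversion.
for row in rows:
    row['pos'] = int(row['pos'])
    row['acmg_score'] = int(row['acmg_score'])

# Keep only variants that are not classified as benign or likely benign.
benign_labels = {'Benign', 'Likely_benign'}
non_benign = [r for r in rows if r['acmg_classification'] not in benign_labels]

print(rows[0]['gene_symbol'], rows[0]['acmg_score'])
print(len(non_benign))
```

With pandas, pd.read_csv('output_file.csv', sep='\t') reads the same file into a dataframe.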

Making a liftover

import genebe as gnb
gnb.lift_over_variants(['chr6-161006172-T-G'], from_genome='hg19', dest_genome='hg38')

will return

['chr6-160585140-T-G']
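
After a liftover it can be useful to sanity-check the result against the input. The sketch below pairs the input with the output shown above and reports the coordinate shift; it assumes the returned list keeps the order of the input list, and `split_variant` is a hypothetical helper, not a pygenebe function.

```python
def split_variant(variant):
    """Split a 'chr-pos-ref-alt' string into typed fields (hypothetical helper)."""
    chrom, pos, ref, alt = variant.split('-')
    return chrom, int(pos), ref, alt


hg19_variants = ['chr6-161006172-T-G']
hg38_variants = ['chr6-160585140-T-G']  # the value returned above

report = []
# Assumption: the output list is in the same order as the input list.
for before, after in zip(hg19_variants, hg38_variants):
    chrom_a, pos_a, ref_a, alt_a = split_variant(before)
    chrom_b, pos_b, ref_b, alt_b = split_variant(after)
    report.append({
        'hg19': before,
        'hg38': after,
        'same_chromosome': chrom_a == chrom_b,
        'same_alleles': (ref_a, alt_a) == (ref_b, alt_b),
        'shift': pos_b - pos_a,  # position difference between the builds
    })

print(report[0])
```

A changed chromosome or allele pair is not necessarily an error, but such rows are worth a closer look.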

Converting HGVS, dbSNP, SPDI and other variant representations

Mind the different ways to represent a variant below:

  • ENST00000679957.1:c.803C>T - HGVS
  • rs11 - dbSNP identifier
  • AGT M259T - amino acid change
  • chrX:153803771:1:A - SPDI variant representation
  • and some more; check the main https://genebe.net page for the supported variant representations

parse_variants understands these representations and converts them to a standard chr-pos-ref-alt form. BEWARE: when several variants are possible, for example when an rs identifier describes more than one variant, only one variant is returned.

import genebe as gnb

parsed = gnb.parse_variants(['ENST00000679957.1:c.803C>T',
                         'ENST00000404276.6:c.1100del',
                         'NC_000003.12:g.39394574A>T',
                         'NC_012920.1:m.1243T>C',
                         'rs11',
                         'AGT M259T',
                         'chrX:153803771:1:A'])

and parsed will contain

['1-230710021-G-A', '22-28695868-AG-A', '3-39394574-A-T', 'M-1243-T-C', '7-11324574-C-T', '1-230710048-A-G']
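
The normalized strings are easy to turn into the column layout used in the dataframe example earlier. This is a minimal sketch in plain Python that uses the list shown above as input; it assumes every parsed entry is a well-formed chr-pos-ref-alt string, which may not hold if an input could not be parsed.

```python
# The parsed variants shown above.
parsed = ['1-230710021-G-A', '22-28695868-AG-A', '3-39394574-A-T',
          'M-1243-T-C', '7-11324574-C-T', '1-230710048-A-G']

# Collect each field into its own column, mirroring the dict of lists
# used to build the dataframe in the annotation example.
columns = {'chr': [], 'pos': [], 'ref': [], 'alt': []}
for variant in parsed:
    # Assumption: each entry has exactly four '-'-separated fields.
    chrom, pos, ref, alt = variant.split('-')
    columns['chr'].append(chrom)
    columns['pos'].append(int(pos))
    columns['ref'].append(ref)
    columns['alt'].append(alt)

print(columns['chr'])
print(columns['pos'])
```

Passing `columns` to pd.DataFrame gives a dataframe with chr, pos, ref and alt columns, the same shape as the input in the dataframe annotation example.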

More information

You can find more information in the documentation at https://pygenebe.readthedocs.io/en/latest/ or by exploring the examples in the repository at https://github.com/pstawinski/pygenebe/tree/main/examples