Using the GeneBe API with Python
Purpose
One problem I wanted to solve was the lack of a convenient tool for working with many genomic variants stored in a dataframe. I had variants stored in CSV files, some in VCF, some in JSON. I could read them all easily using pandas, but annotating them, lifting them over to hg38, or parsing HGVS strings was a nightmare: downloading databases, installing dependencies, reformatting the input. The pygenebe package (https://github.com/pstawinski/pygenebe) addresses all these issues, letting bioinformaticians focus on what is important and have annotation or conversion just done.
Installing
pygenebe can be installed using pip; simply call:
pip install genebe
Examples
Not all of the examples below use pandas; some use plain Python. For more examples, see the "More information" section.
There is also a showcase notebook demonstrating the main features: https://github.com/pstawinski/pygenebe/blob/main/examples/showcase.ipynb
Annotating a simple list of variants
A singleton list is used here to keep the output readable, but you can put as many variants into the list as you wish.
import genebe as gnb
input_variants = ['7-69599651-A-G']
annotations = gnb.annotate(input_variants, flatten_consequences=True)
With result:
[{'chr': '7', 'pos': 69599651, 'ref': 'A', 'alt': 'G', 'effect': '5_prime_UTR_variant', 'transcript': 'NM_015570.4', 'consequences': '5_prime_UTR_variant', 'gene_symbol': 'AUTS2', 'dbsnp': '3735260', 'frequency_reference_population': 0.08464142, 'hom_count_reference_population': 5804, 'allele_count_reference_population': 109796, 'gnomad_exomes_af': 0.07914809882640839, 'gnomad_genomes_af': 0.12610900402069092, 'gnomad_exomes_ac': 90660, 'gnomad_genomes_ac': 19136, 'gnomad_exomes_homalt': 4036, 'gnomad_genomes_homalt': 1768, 'computational_prediction_selected': 'Benign', 'splice_prediction_selected': 'Benign', 'bayesdelnoaf_score': -0.5899999737739563, 'bayesdelnoaf_prediction': 'Benign', 'phylop100way_score': 1.0399999618530273, 'spliceai_max_score': 0.0, 'spliceai_max_prediction': 'Benign', 'acmg_score': -14, 'acmg_classification': 'Benign', 'acmg_criteria': 'BP4_Strong,BP6_Moderate,BA1', 'clinvar_disease': 'not provided', 'clinvar_classification': 'Benign', 'gene_hgnc_id': 14262, 'hgvs_c': 'c.-3A>G'}]
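Since the result is a plain list of dictionaries, it can be post-processed without any extra tooling. The following is a minimal sketch that condenses a record into a few fields of interest. The `summarize` helper is hypothetical (it is not part of pygenebe), and the record is an excerpt of the output above with the remaining keys omitted.

```python
# Excerpt of the annotation record shown above; other keys are omitted.
records = [
    {
        "chr": "7",
        "pos": 69599651,
        "ref": "A",
        "alt": "G",
        "gene_symbol": "AUTS2",
        "acmg_classification": "Benign",
        "acmg_criteria": "BP4_Strong,BP6_Moderate,BA1",
    }
]


def summarize(record):
    """Return a compact summary of one annotation record (hypothetical helper)."""
    variant = f"{record['chr']}-{record['pos']}-{record['ref']}-{record['alt']}"
    # acmg_criteria is a comma-separated string in the output above.
    criteria = record.get("acmg_criteria", "").split(",")
    return {
        "variant": variant,
        "gene": record.get("gene_symbol"),
        "classification": record.get("acmg_classification"),
        "criteria": [c for c in criteria if c],
    }


summaries = [summarize(r) for r in records]
print(summaries)
```

Using `.get` for the optional fields keeps the helper from failing if a key is absent from a given record.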
Annotating a pandas dataframe with variants
import pandas as pd
import genebe as gnb
data = {
'chr': ['chr6'],
'pos': [161006172],
'ref': ['T'],
'alt': ['G']
}
df = pd.DataFrame(data)
result = gnb.annotate(df,use_ensembl=False,use_refseq=True, genome='hg38', flatten_consequences=True)
result.to_csv('output_file.csv', sep='\t', header=True, index=False)
The result is the following table:
chr | pos | ref | alt | effect | transcript | consequences | gene_symbol | aggregated_hom_count | aggregated_ac_count | aggregated_computational_prediction | aggregated_splice_prediction | bayesdelnoaf_score | bayesdelnoaf_prediction | phylop100way_score | spliceai_max_score | spliceai_max_prediction | acmg_score | acmg_classification | acmg_criteria | gene_hgnc_id | hgvs_c |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
chr6 | 161006172 | T | G | intron_variant | NM_005922.4 | intron_variant | MAP3K4 | 0 | 0 | Benign | Benign | -0.6499999761581421 | Benign | 0.578000009059906 | 0.0 | Benign | -2 | Likely_benign | PM2,BP4_Strong | 6856 | c.152+14089C>G |
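Note that the example above writes a tab-separated file even though the name ends in `.csv`, so reading it back requires the tab delimiter. Below is a minimal sketch using only the standard library. The in-memory text stands in for `output_file.csv` and contains only a few of the columns from the table above.

```python
import csv
import io

# Stand-in for the contents of output_file.csv (only a few columns shown).
tsv_text = (
    "chr\tpos\tref\talt\tgene_symbol\tacmg_classification\n"
    "chr6\t161006172\tT\tG\tMAP3K4\tLikely_benign\n"
)

# For a real file, use open('output_file.csv', newline='') instead of io.StringIO.
reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
rows = list(reader)

# The csv module returns every field as a string, so convert positions explicitly.
for row in rows:
    row["pos"] = int(row["pos"])

print(rows)
```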
Making a liftover
import genebe as gnb
gnb.lift_over_variants(['chr6-161006172-T-G'], from_genome='hg19', dest_genome='hg38')
will return
['chr6-160585140-T-G']
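To compare coordinates before and after a liftover, it can help to split the `chr-pos-ref-alt` strings into fields. This is a small sketch under the assumption that the strings always have exactly four dash-separated parts, as in the examples here. `split_variant` is a hypothetical helper, not a pygenebe function.

```python
def split_variant(variant):
    """Split a 'chr-pos-ref-alt' string into (chrom, pos, ref, alt)."""
    parts = variant.split("-")
    if len(parts) != 4:
        raise ValueError(f"expected chr-pos-ref-alt, got {variant!r}")
    chrom, pos, ref, alt = parts
    return chrom, int(pos), ref, alt


before = split_variant("chr6-161006172-T-G")  # hg19 input from the example above
after = split_variant("chr6-160585140-T-G")   # hg38 output from the example above

# Difference between the hg38 and hg19 positions.
shift = after[1] - before[1]
# Whether ref and alt are unchanged by the liftover.
same_alleles = before[2:] == after[2:]
print(shift, same_alleles)
```

In this example the alleles are unchanged, but that should not be assumed in general, since a liftover may map a region to the opposite strand.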
Converting HGVS, dbSNP, SPDI and other variant representations
A variant can be represented in several different ways:
- ENST00000679957.1:c.803C>T - HGVS
- rs11 - dbSNP identifier
- AGT M259T - amino acid change
- chrX:153803771:1:A - SPDI variant representation
- and some more; check the main https://genebe.net page for the supported variant representations
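To make the differences between these representations concrete, here is a rough sketch that guesses the representation type from the shape of the string. The patterns are illustrative assumptions only. They are not GeneBe's parsing rules, and `guess_representation` is a hypothetical helper. Anything the patterns do not recognise, such as a gene name with an amino acid change, falls through to "other".

```python
import re

# Rough, illustrative patterns; assumptions, not GeneBe's actual parsing rules.
PATTERNS = [
    ("dbSNP", re.compile(r"^rs\d+$")),
    # Reference sequence, colon, coordinate type (c., g., m., ...), description.
    ("HGVS", re.compile(r"^[A-Za-z0-9_.]+:[cgmnpr]\..+$")),
    # sequence:position:deleted:inserted
    ("SPDI", re.compile(r"^[^:\s]+:\d+:[^:]*:[^:]*$")),
    # chr-pos-ref-alt, as used for the annotate() input above.
    ("dash-separated", re.compile(r"^[^-\s]+-\d+-[ACGTN]+-[ACGTN]+$")),
]


def guess_representation(text):
    """Return the name of the first pattern matching `text`, else 'other'."""
    for name, pattern in PATTERNS:
        if pattern.match(text):
            return name
    return "other"


for example in ["ENST00000679957.1:c.803C>T", "rs11", "AGT M259T",
                "chrX:153803771:1:A", "7-69599651-A-G"]:
    print(example, "->", guess_representation(example))
```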
parse_variants understands these representations and converts them to a standard form. BEWARE: when an input corresponds to several possible variants (for example, an rs identifier that describes more than one variant), only one variant is returned.
import genebe as gnb
parsed = gnb.parse_variants(['ENST00000679957.1:c.803C>T',
'ENST00000404276.6:c.1100del',
'NC_000003.12:g.39394574A>T',
'NC_012920.1:m.1243T>C',
'rs11',
'AGT M259T',
'chrX:153803771:1:A'] )
['1-230710021-G-A', '22-28695868-AG-A', '3-39394574-A-T', 'M-1243-T-C', '7-11324574-C-T', '1-230710048-A-G']
More information
You can find more information at https://pygenebe.readthedocs.io/en/latest/ or by exploring the examples in the repository: https://github.com/pstawinski/pygenebe/tree/main/examples