Variant annotation API

This page describes the variant annotation API in detail. If you are interested in annotating variants in Python, remember to visit the "pandas" chapter, where the Python library is introduced. If you want to use the API from any other well-known language, consider generating a client from the OpenAPI definition published at https://api.genebe.net/cloud/gb-api-doc/swagger-ui/index.html . However, the API is also convenient to use without any wrapper.

The examples on this page use curl and, for GET queries, plain browser links.

Example of GET endpoint

The GET endpoint is meant for testing only. If you want to annotate multiple variants, please batch them using the POST endpoint described below. For now, here is the GET endpoint:

curl -X 'GET' \
  'https://api.genebe.net/cloud/api-public/v1/variant?chr=7&pos=140753336&ref=A&alt=T&allGenes=False&genome=hg38&useEnsembl=False' \
  -H 'accept: */*'
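
The same request can be sketched in Python with only the standard library. This is a minimal illustration, not the official client: the helper names `build_get_url` and `annotate_one` are hypothetical, and the sketch assumes that booleans may be sent as `True`/`False`, exactly as in the curl example above.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example above.
BASE_URL = "https://api.genebe.net/cloud/api-public/v1/variant"


def build_get_url(chrom, pos, ref, alt, genome="hg38", **params):
    """Build the GET URL for one variant (hypothetical helper)."""
    query = {"chr": chrom, "pos": pos, "ref": ref, "alt": alt, "genome": genome}
    query.update(params)  # e.g. allGenes=False, useEnsembl=False
    # urlencode converts ints and booleans with str(), so False becomes "False".
    return BASE_URL + "?" + urllib.parse.urlencode(query)


def annotate_one(chrom, pos, ref, alt, **params):
    """Send the request and decode the JSON response (needs network access)."""
    url = build_get_url(chrom, pos, ref, alt, **params)
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


url = build_get_url("7", 140753336, "A", "T", allGenes=False, useEnsembl=False)
print(url)
```

Calling `annotate_one("7", 140753336, "A", "T", useEnsembl=False)` would then return the response as a Python dictionary.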

See the results in a browser by opening:

https://api.genebe.net/cloud/api-public/v1/variant?chr=7&pos=140753336&ref=A&alt=T&allGenes=False&genome=hg38&useEnsembl=False

{
  "variants": [
    {
      "chr": "7",
      "pos": 140753336,
      "ref": "A",
      "alt": "T",
      "effect": "missense_variant",
      "transcript": "NM_001374258.1",
      "consequences": [
        {
          "aa_ref": "V",
          "aa_alt": "E",
          "canonical": false,
          "protein_coding": true,
          "consequences": [
            "missense_variant"
          ],
          "exon_rank": 16,
          "exon_count": 20,
          "gene_symbol": "BRAF",
          "gene_hgnc_id": 1097,
          "hgvs_c": "c.1919T>A",
          "hgvs_p": "p.Val640Glu",
          "transcript": "NM_001374258.1",
          "protein_id": "NP_001361187.1",
          "aa_start": 640,
          "aa_length": 807,
          "cds_start": 1919,
          "cds_length": 2424,
          "cdna_start": 2145,
          "cdna_length": 9807,
          "mane_plus": "ENST00000644969.2"
        },
        [...]
      ],
      "gene_symbol": "BRAF",
      "gene_hgnc_id": 1097,
      "dbsnp": "rs113488022",
      "frequency_reference_population": 0.0000013692834,
      "hom_count_reference_population": 0,
      "allele_count_reference_population": 2,
      "gnomad_exomes_af": 0.000001369279971186188,
      "gnomad_genomes_af": null,
      "gnomad_exomes_ac": 2,
      "gnomad_genomes_ac": null,
      "gnomad_exomes_homalt": 0,
      "gnomad_genomes_homalt": null,
      "gnomad_mito_homoplasmic": null,
      "gnomad_mito_heteroplasmic": null,
      "computational_score_selected": 29.799999237060547,
      "computational_prediction_selected": "Pathogenic",
      "computational_source_selected": "Cadd",
      "splice_score_selected": 0.0,
      "splice_prediction_selected": "Benign",
      "splice_source_selected": "max_spliceai",
      "revel_score": 0.9309999942779541,
      "revel_prediction": "Pathogenic",
      "alphamissense_score": 0.9926999807357788,
      "alphamissense_prediction": "Pathogenic",
      "bayesdelnoaf_score": 0.3400000035762787,
      "bayesdelnoaf_prediction": "Pathogenic",
      "phylop100way_score": 9.236000061035156,
      "phylop100way_prediction": "Pathogenic",
      "spliceai_max_score": 0.0,
      "spliceai_max_prediction": "Benign",
      "dbscsnv_ada_score": null,
      "dbscsnv_ada_prediction": null,
      "apogee2_score": null,
      "apogee2_prediction": null,
      "mitotip_score": null,
      "mitotip_prediction": null,
      "acmg_score": 21,
      "acmg_classification": "Pathogenic",
      "acmg_criteria": "PS1,PM1,PM2,PM5,PP2,PP3_Moderate,PP5_Very_Strong",
      "acmg_by_gene": [

      ],
      "clinvar_disease": "Carcinoma of colon,Papillary thyroid carcinoma,Astrocytoma, low-grade, somatic,Nongerminomatous germ cell tumor,Non-small cell lung carcinoma,not provided,Melanoma,Cardio-facio-cutaneous syndrome,Malignant melanoma of skin,Glioblastoma,Squamous cell carcinoma of the head and neck,Colonic neoplasm,Ovarian neoplasm,Brainstem glioma,Lung adenocarcinoma,Multiple myeloma,Neoplasm of the large intestine,Lung carcinoma,Neoplasm of brain,Papillary renal cell carcinoma, sporadic,Gastrointestinal stromal tumor,Neoplasm,Cystic epithelial invagination containing papillae lined by columnar epithelium,Cerebral arteriovenous malformation,Nephroblastoma,Colorectal cancer,Malignant neoplastic disease,Lymphangioma,Vascular malformation,Cardiovascular phenotype",
      "clinvar_classification": "Pathogenic/Likely pathogenic",
      "phenotype_combined": null,
      "pathogenicity_classification_combined": null,
      "custom_annotations": null
    }
  ],
  "message": null
}

Important notices:

To make the output more readable, some consequences were removed from the listing.
You may see consequences_ensembl and consequences_refseq in your response. These are deprecated fields and will be removed soon. Please use the consequences field.
In the request, I explicitly asked NOT to add Ensembl consequences (useEnsembl=False).
Null values indicate no data.
acmg_by_gene is populated only if you set allGenes to true in the query.
custom_annotations is populated only if customAnnotations is given. customAnnotations is a comma-delimited list of custom annotations. If used, new columns are added to the output, straight from our internal database. More documentation on available fields will be added soon.

Input

Variant description

Name	Description	Required
chr	Chromosome	Required
pos	Position of the change, as in VCF file	Required
ref	Reference bases, only [ACGT]+ allowed	Required
alt	Alternate bases, only [ACGT]+ allowed	Required
transcript	Specify the transcript to use for ACMG score, if not specified usually MANE is selected	Optional
gene_symbol	Specify the gene to use for ACMG score, if not specified usually the most affected gene is selected	Optional
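
The rules in the table above can be checked on the client side before a request is sent. The sketch below is only an illustration: `validate_variant` is a hypothetical helper, not part of the API, and the integer check on `pos` is an assumption based on the VCF convention mentioned in the table.

```python
import re

# Only upper-case A, C, G and T are accepted for ref and alt.
_BASES = re.compile(r"[ACGT]+")


def validate_variant(variant):
    """Return a list of problems; an empty list means the variant looks well formed."""
    problems = []
    for key in ("chr", "pos", "ref", "alt"):
        if key not in variant:
            problems.append(f"missing required field: {key}")
    pos = variant.get("pos")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if pos is not None and (isinstance(pos, bool) or not isinstance(pos, int)):
        problems.append("pos must be an integer")
    for key in ("ref", "alt"):
        value = variant.get(key)
        if value is not None and not (isinstance(value, str) and _BASES.fullmatch(value)):
            problems.append(f"{key} must match [ACGT]+")
    return problems


print(validate_variant({"chr": "7", "pos": 140753336, "ref": "A", "alt": "T"}))
print(validate_variant({"chr": "7", "pos": 140753336, "ref": "N", "alt": "T"}))
```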

Parameters

Name	Default	Description	Required
genome	hg38	You can use hg38 or hg19 here. If hg19 is used, your queries will be lifted to hg38 before annotation.	Required
useRefseq	true	Use transcripts from Refseq for consequences field.	Optional
useEnsembl	true	Use transcripts from Ensembl for consequences field.	Optional
omitAcmg	false	Don't add ACMG scores in the output. Set to true if you don't need them.	Optional
omitCsq	false	Don't add consequences in the output.	Optional
omitBasic	false	Don't add basic annotations (GnomAD frequencies etc) in the output.	Optional
omitAdvanced	false	Don't add advanced annotations (ClinVar frequencies etc) in the output.	Optional
omitNormalization	false	Don't normalize variants. Use only if you are sure they are normalized already.	Optional
allGenes	false	Compute ACMG score for all genes in this region.	Optional
customAnnotations	empty	Comma-delimited list of custom annotations to be applied. Consult the documentation for recognized values.	Optional
annotator	snpeff	Which annotator to use. Please leave empty for now.	Optional
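
One way to keep the defaults from the table in a single place is a small dictionary plus a function that rejects unknown names. This is a sketch under assumptions: `PARAM_DEFAULTS` and `build_params` are hypothetical names, `annotator` is left out because the table asks to leave it empty for now, and omitting a parameter is assumed to be equivalent to sending its default.

```python
# Defaults copied from the parameters table above (annotator intentionally omitted).
PARAM_DEFAULTS = {
    "genome": "hg38",
    "useRefseq": True,
    "useEnsembl": True,
    "omitAcmg": False,
    "omitCsq": False,
    "omitBasic": False,
    "omitAdvanced": False,
    "omitNormalization": False,
    "allGenes": False,
    "customAnnotations": "",
}


def build_params(**overrides):
    """Return query parameters: genome always, the rest only when changed."""
    unknown = sorted(set(overrides) - set(PARAM_DEFAULTS))
    if unknown:
        raise ValueError(f"unknown parameter(s): {', '.join(unknown)}")
    params = {**PARAM_DEFAULTS, **overrides}
    if params["genome"] not in ("hg38", "hg19"):
        raise ValueError("genome must be hg38 or hg19")
    return {
        name: value
        for name, value in params.items()
        if name == "genome" or value != PARAM_DEFAULTS[name]
    }


print(build_params(useEnsembl=False, allGenes=False))
```

The resulting dictionary can be passed to `urllib.parse.urlencode` to produce the query string.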

Output

Field	Description
`chr`	Chromosome where the variant is located. If lifting was required, this represents the new location.
`pos`	Position of the variant on the chromosome. If lifting was required, this represents the new location.
`ref`	Reference allele, i.e., the base found in the reference genome. This may differ from your query if lifting was required.
`alt`	Alternate allele, i.e., the base differing from the reference genome. This may differ from your query if lifting was required.
`effect`	Selected effect of the variant (e.g., missense_variant), typically computed for the most relevant transcript, usually the MANE transcript.
`transcript`	Selected transcript ID (e.g., RefSeq or Ensembl). Typically, this is the MANE transcript of the most affected gene.
`consequences`	An array of computed possible consequences.
`consequences.aa_ref`	Reference amino acid before the mutation.
`consequences.aa_alt`	Alternate amino acid after the mutation.
`consequences.canonical`	Indicates whether the transcript is the canonical (main) transcript for the gene (`true` or `false`). Not always populated.
`consequences.protein_coding`	Indicates if the transcript is protein-coding (`true` or `false`).
`consequences.consequences`	List of predicted biological consequences of the variant on the protein (e.g., `missense_variant`). Uses Sequence Ontology terms.
`consequences.exon_rank`	The exon number where the variant is located.
`consequences.exon_count`	Total number of exons in the transcript.
`consequences.gene_symbol`	The symbol of the gene where the variant is located (e.g., BRAF).
`consequences.gene_hgnc_id`	HGNC ID for the gene.
`consequences.hgvs_c`	HGVS notation describing the variant at the cDNA level.
`consequences.hgvs_p`	HGVS notation describing the variant at the protein level.
`consequences.transcript`	Transcript ID for this consequence.
`consequences.protein_id`	Protein ID linked to the transcript.
`consequences.aa_start`	Start position of the affected amino acid in the protein sequence.
`consequences.aa_length`	Total length of the protein sequence.
`consequences.cds_start`	Start position of the coding sequence (CDS) affected by the variant.
`consequences.cds_length`	Total length of the coding sequence.
`consequences.cdna_start`	Start position of the variant in the cDNA sequence.
`consequences.cdna_length`	Total length of the cDNA sequence.
`consequences.mane_plus`	MANE Plus Clinical transcript ID (a reference transcript for clinical reporting).
`gene_symbol`	Selected gene symbol where the variant occurs.
`gene_hgnc_id`	Selected HGNC ID for the gene.
`dbsnp`	dbSNP ID for the variant (if present).
`frequency_reference_population`	Aggregated frequency of the variant in various population databases (currently GnomAD Genomes and Exomes). May be `null` if no reliable data is available (e.g., due to low coverage or filtering).
`hom_count_reference_population`	Total number of homozygous individuals for this variant in population databases (currently GnomAD Genomes and Exomes).
`allele_count_reference_population`	Total allele count for the variant across all individuals in population databases (currently GnomAD Genomes and Exomes).
`gnomad_exomes_af`	Allele frequency in gnomAD exome data.
`gnomad_genomes_af`	Allele frequency in gnomAD genome data (may be `null` if unavailable).
`gnomad_exomes_ac`	Allele count in gnomAD exome data.
`gnomad_genomes_ac`	Allele count in gnomAD genome data (may be `null` if unavailable).
`gnomad_exomes_homalt`	Homozygous alternate count in gnomAD exome data.
`gnomad_genomes_homalt`	Homozygous alternate count in gnomAD genome data (may be `null` if unavailable).
`gnomad_mito_homoplasmic`	Homoplasmic variant count in mitochondrial data from gnomAD (if applicable).
`gnomad_mito_heteroplasmic`	Heteroplasmic variant count in mitochondrial data from gnomAD (if applicable).
`computational_score_selected`	Computational prediction score from the most reliable tool for variant pathogenicity (e.g., CADD, REVEL).
`computational_prediction_selected`	Prediction label based on the computational score (e.g., "Pathogenic", "Benign").
`computational_source_selected`	Source of the computational prediction (e.g., CADD, REVEL).
`splice_score_selected`	Maximum splice effect prediction score for the variant, predicted by the most reliable tool.
`splice_prediction_selected`	Prediction of whether the variant affects splicing (e.g., "Benign", "Pathogenic").
`splice_source_selected`	Source of the splicing prediction (e.g., SpliceAI).
`revel_score`	REVEL score for variant pathogenicity prediction.
`revel_prediction`	REVEL prediction label (e.g., "Pathogenic").
`alphamissense_score`	AlphaMissense score for missense variant pathogenicity.
`alphamissense_prediction`	AlphaMissense prediction label (e.g., "Pathogenic").
`bayesdelnoaf_score`	BayesDelNoAF score for variant pathogenicity prediction.
`bayesdelnoaf_prediction`	BayesDelNoAF prediction label (e.g., "Pathogenic").
`phylop100way_score`	PhyloP score for evolutionary conservation at the variant position (higher scores suggest greater conservation).
`phylop100way_prediction`	PhyloP prediction label (e.g., "Pathogenic").
`spliceai_max_score`	Maximum SpliceAI score for splicing impact prediction. This is the highest value from AL, DL, AG, and DG scores.
`spliceai_max_prediction`	SpliceAI prediction label (e.g., "Benign").
`dbscsnv_ada_score`	ADA score from dbscSNV for splicing impact prediction (if available).
`dbscsnv_ada_prediction`	ADA prediction label (if available).
`acmg_score`	ACMG (American College of Medical Genetics) score for the variant, automatically evaluated based on GeneBe implementation.
`acmg_classification`	ACMG classification (e.g., "Pathogenic", "Likely Pathogenic").
`acmg_criteria`	Specific ACMG criteria met by the variant (e.g., PS1, PM1), comma-separated.
`clinvar_disease`	List of diseases associated with the variant in ClinVar.
`clinvar_classification`	ClinVar classification for the variant (e.g., "Pathogenic", "Likely Pathogenic").

In addition, the top level of the response contains a message field that may carry an important message. It is usually null.
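
As an illustration of reading these fields, the sketch below pulls a few of them out of a decoded response and returns the top-level message alongside. The `summarize` helper is hypothetical, and `sample` is an abbreviated copy of the example response shown earlier on this page.

```python
def summarize(response):
    """Return (message, rows) with one compact row per annotated variant."""
    rows = []
    for variant in response.get("variants") or []:
        rows.append({
            "variant": f'{variant["chr"]}-{variant["pos"]}-{variant["ref"]}-{variant["alt"]}',
            "gene": variant.get("gene_symbol"),
            "effect": variant.get("effect"),
            "acmg": variant.get("acmg_classification"),
            # consequences may be missing or null, e.g. when omitCsq is set.
            "hgvs_p": [c.get("hgvs_p") for c in variant.get("consequences") or []],
        })
    return response.get("message"), rows


# Abbreviated from the example response on this page.
sample = {
    "variants": [{
        "chr": "7", "pos": 140753336, "ref": "A", "alt": "T",
        "effect": "missense_variant",
        "consequences": [{"transcript": "NM_001374258.1", "hgvs_p": "p.Val640Glu"}],
        "gene_symbol": "BRAF",
        "acmg_classification": "Pathogenic",
    }],
    "message": None,
}

message, rows = summarize(sample)
print(message, rows)
```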

Example of POST endpoint

It is very similar to the GET endpoint, but it lets you annotate multiple entries at once. You can send up to 1,000 variants in one request, but it is usually better to send them in smaller chunks to avoid timeouts on more computationally intensive requests. Try, for example, batches of 500 variants. For the parameters and the description of the output, please read the GET documentation above.
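
Splitting a long list into batches can be sketched as follows. The `chunked` helper is hypothetical; the default size of 500 and the upper bound of 1,000 come from the paragraph above.

```python
def chunked(variants, size=500):
    """Yield consecutive slices of at most `size` variants."""
    if not 1 <= size <= 1000:
        raise ValueError("size must be between 1 and 1000")
    for start in range(0, len(variants), size):
        yield variants[start:start + size]


# 1,200 placeholder variants, only to show the resulting batch sizes.
variants = [{"chr": "7", "pos": 140753336 + i, "ref": "A", "alt": "T"} for i in range(1200)]
batch_sizes = [len(batch) for batch in chunked(variants)]
print(batch_sizes)
```

Each batch can then be sent in its own POST request, and the `variants` lists of the responses concatenated.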

The body of the POST request is a JSON list of variants:

[
  {
    "chr": "string",
    "pos": 0,
    "ref": "string",
    "alt": "string",
    "transcript": "string",
    "gene_symbol": "string"
  }
]

where transcript and gene_symbol are optional (and rarely used). Take a look at the table in the GET documentation for more information.

To continue the example of BRAF V600E from the GET documentation above, let's create a body and curl it to the API:


curl -X 'POST' \
  'https://api.genebe.net/cloud/api-public/v1/variants?useRefseq=True&useEnsembl=True&omitAcmg=False&omitCsq=False&omitBasic=False&omitAdvanced=False&omitNormalization=False&allGenes=False&genome=hg38' \
  -H 'accept: */*' \
  -H 'Content-Type: application/json' \
  -d '[
  {
    "chr": "7",
    "pos": 140753336,
    "ref": "A",
    "alt": "T"
  }
]'
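
A rough Python equivalent of this curl call, using only the standard library, might look like the sketch below. The names `build_post_request` and `annotate_many` are hypothetical, and only `genome` is sent by default, on the assumption that omitted parameters fall back to the defaults listed in the parameters table.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example above.
POST_URL = "https://api.genebe.net/cloud/api-public/v1/variants"


def build_post_request(variants, genome="hg38", **params):
    """Prepare, but do not send, a POST request for a list of variants."""
    query = urllib.parse.urlencode({"genome": genome, **params})
    body = json.dumps(variants).encode("utf-8")
    return urllib.request.Request(
        f"{POST_URL}?{query}",
        data=body,
        headers={"Accept": "*/*", "Content-Type": "application/json"},
        method="POST",
    )


def annotate_many(variants, **params):
    """Send the request and decode the JSON response (needs network access)."""
    request = build_post_request(variants, **params)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)


request = build_post_request([{"chr": "7", "pos": 140753336, "ref": "A", "alt": "T"}])
print(request.get_method(), request.full_url)
```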

And again we get:


{
  "variants": [
    {
      "chr": "7",
      "pos": 140753336,
      "ref": "A",
      "alt": "T",
      "effect": "missense_variant",
      "transcript": "NM_001374258.1",
      "consequences": [
        {
          "aa_ref": "V",
          "aa_alt": "E",
          "canonical": false,
          "protein_coding": true,
          "consequences": [
            "missense_variant"
          ],
          "exon_rank": 16,
          "exon_count": 20,
          "gene_symbol": "BRAF",
          "gene_hgnc_id": 1097,
          "hgvs_c": "c.1919T>A",
          "hgvs_p": "p.Val640Glu",
          "transcript": "NM_001374258.1",
          "protein_id": "NP_001361187.1",
          "aa_start": 640,
          "aa_length": 807,
          "cds_start": 1919,
          "cds_length": 2424,
          "cdna_start": 2145,
          "cdna_length": 9807,
          "mane_plus": "ENST00000644969.2"
        },
       ...
      ],
      "gene_symbol": "BRAF",
      "gene_hgnc_id": null,
      "dbsnp": "113488022",
      "frequency_reference_population": 0.0000013692834,
      "hom_count_reference_population": 0,
      "allele_count_reference_population": 2,
      "gnomad_exomes_af": 0.000001369279971186188,
      "gnomad_genomes_af": null,
      "gnomad_exomes_ac": 2,
      "gnomad_genomes_ac": null,
      "gnomad_exomes_homalt": 0,
      "gnomad_genomes_homalt": null,
      "gnomad_mito_homoplasmic": null,
      "gnomad_mito_heteroplasmic": null,
      "computational_prediction_selected": "Pathogenic",
      "splice_prediction_selected": "Benign",
      "revel_score": 0.9309999942779541,
      "revel_prediction": "Pathogenic",
      "alphamissense_score": 0.9926999807357788,
      "alphamissense_prediction": "Pathogenic",
      "bayesdelnoaf_score": 0.3400000035762787,
      "bayesdelnoaf_prediction": "Pathogenic",
      "phylop100way_score": 9.236000061035156,
      "phylop100way_prediction": "Pathogenic",
      "spliceai_max_score": 0,
      "spliceai_max_prediction": "Benign",
      "dbscsnv_ada_score": null,
      "dbscsnv_ada_prediction": null,
      "apogee2_score": null,
      "apogee2_prediction": null,
      "mitotip_score": null,
      "mitotip_prediction": null,
      "acmg_score": 21,
      "acmg_classification": "Pathogenic",
      "acmg_criteria": "PS1,PM1,PM2,PM5,PP2,PP3_Moderate,PP5_Very_Strong",
      "acmg_by_gene": [],
      "clinvar_disease": "Carcinoma of colon,Papillary thyroid carcinoma,Astrocytoma, low-grade, somatic,Nongerminomatous germ cell tumor,Non-small cell lung carcinoma,not provided,Melanoma,Cardio-facio-cutaneous syndrome,Malignant melanoma of skin,Glioblastoma,Squamous cell carcinoma of the head and neck,Colonic neoplasm,Ovarian neoplasm,Brainstem glioma,Lung adenocarcinoma,Multiple myeloma,Neoplasm of the large intestine,Lung carcinoma,Neoplasm of brain,Papillary renal cell carcinoma, sporadic,Gastrointestinal stromal tumor,Neoplasm,Cystic epithelial invagination containing papillae lined by columnar epithelium,Cerebral arteriovenous malformation,Nephroblastoma,Colorectal cancer,Malignant neoplastic disease,Lymphangioma,Vascular malformation,Cardiovascular phenotype",
      "clinvar_classification": "Pathogenic/Likely pathogenic",
      "phenotype_combined": null,
      "pathogenicity_classification_combined": null,
      "custom_annotations": null
    }
  ],
  "message": null
}