VCF File Annotation Using GeneBe API

The GeneBe API can annotate VCF files, and there are currently two clients for doing so. The recommended one is GeneBeClient from the GeneBe CLI project, which requires Java. The alternative is the Python client pygenebe, from the pygenebe repository; however, annotating VCF files with pygenebe is deprecated.

Whichever client you choose, supply your API key to avoid exceeding the request limit.

GeneBeClient — genebe-cli

This is the recommended method for annotating VCF files.

Requirements and Installation

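You need Java 21 or higher. To install GeneBeClient, download the most recent .jar file from GeneBe CLI releases. The commands on this page refer to the file either as GeneBeClient.jar or by a versioned name such as GeneBeClient-0.0.1-a.1.jar; substitute the name of the file you downloaded. You can run the program like any other jar file from the command line: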

java -jar GeneBeClient.jar

To get help, run:

java -jar GeneBeClient.jar help

For help on a specific command, run:

java -jar GeneBeClient-0.0.1-a.1.jar help vcf annotate
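
Because GeneBeClient needs Java 21 or higher, it can be worth checking the installed version before running the jar. The following is a minimal sketch for a POSIX shell, not part of GeneBeClient: the function name java_major_version is made up for this example, and the parsing assumes the usual version "X.Y.Z" line that java -version prints.

```shell
# Extract the major version from `java -version` style output.
# Hypothetical helper for this sketch; not part of GeneBeClient.
java_major_version() {
    # Pull the quoted version string, e.g. 21.0.2 or 1.8.0_392
    jv_raw=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    case "$jv_raw" in
        1.*)
            # Legacy numbering: 1.8.0_392 means Java 8
            jv_raw=${jv_raw#1.}
            printf '%s\n' "${jv_raw%%.*}"
            ;;
        *)
            # Modern numbering: keep everything before the first '.' or '-'
            printf '%s\n' "${jv_raw%%[.-]*}"
            ;;
    esac
}

if command -v java >/dev/null 2>&1; then
    # java -version writes to stderr, so redirect it to stdout
    major=$(java_major_version "$(java -version 2>&1)")
    if [ "${major:-0}" -ge 21 ]; then
        echo "Java $major found; recent enough for GeneBeClient"
    else
        echo "Java 21 or higher is required (found: ${major:-unknown})" >&2
    fi
else
    echo "java was not found on PATH" >&2
fi
```

If the check reports an older version, install a newer JDK or JRE before running the jar.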

Running GeneBeClient

GeneBeClient can be used in two modes: interactively (in its own shell) or by invoking commands from the command line with arguments. The interactive shell mode is useful for exploring the program, since it supports conveniences such as tab completion.

Example of running GeneBeClient from the command line

The following command annotates myfile.vcf.gz, declared as an hg38 VCF, and writes the result to /tmp/outputx.vcf:

java -jar GeneBeClient-0.0.1-a.1.jar \
    vcf annotate \
    --input-vcf myfile.vcf.gz \
    --output-vcf /tmp/outputx.vcf \
    --genome hg38 \
    --api-key ak-YOUR_API_KEY \
    --username YOUR-EMAIL@YOUR-INSTITUTION
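
To annotate several files, the same command can be wrapped in a loop. The sketch below uses only the options shown in the example above. The environment variable names (GENEBE_JAR, GENEBE_API_KEY, GENEBE_USERNAME) and the two function names are assumptions made up for this illustration; GeneBeClient itself does not read these variables.

```shell
# Hypothetical settings for this sketch; GeneBeClient does not read these itself.
GENEBE_JAR="${GENEBE_JAR:-GeneBeClient-0.0.1-a.1.jar}"
GENEBE_API_KEY="${GENEBE_API_KEY:-ak-YOUR_API_KEY}"
GENEBE_USERNAME="${GENEBE_USERNAME:-YOUR-EMAIL@YOUR-INSTITUTION}"

# Map an input path to an output path:
#   /data/sample1.vcf.gz + /tmp/out -> /tmp/out/sample1.annotated.vcf
annotated_name() {
    an_base=$(basename "$1")
    an_base=${an_base%.gz}
    an_base=${an_base%.vcf}
    printf '%s/%s.annotated.vcf\n' "$2" "$an_base"
}

# Annotate every *.vcf.gz file in a directory, one GeneBeClient run per file.
annotate_all() {
    in_dir="$1"
    out_dir="$2"
    for vcf in "$in_dir"/*.vcf.gz; do
        [ -e "$vcf" ] || continue   # the glob matched nothing
        java -jar "$GENEBE_JAR" \
            vcf annotate \
            --input-vcf "$vcf" \
            --output-vcf "$(annotated_name "$vcf" "$out_dir")" \
            --genome hg38 \
            --api-key "$GENEBE_API_KEY" \
            --username "$GENEBE_USERNAME"
    done
}

# Example call (commented out so that nothing runs by accident):
# annotate_all ./vcfs /tmp/annotated
```

Keeping the API key in a variable rather than typing it into each command also makes it easier to avoid committing it to scripts under version control.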

Features

  • Genome recognition: Providing the genome version that was used to create the VCF is recommended. If you omit it, the client tries to detect whether the reference genome used in the VCF file is hg19 or hg38.
  • Splitting multiallelic sites: If your VCF has multiple ALT entries in a single row, GeneBeClient splits and normalizes them into biallelic sites before annotation. Even so, it is recommended to convert multiallelic VCF files to biallelic ones yourself before using GeneBeClient, for example with bcftools norm -m -any.
  • Automatic liftover: The GeneBe API annotates variants using hg38 databases. If you provide a VCF with hg19 coordinates, each variant is automatically lifted over to hg38 before annotation.
  • .netrc file: GeneBeClient supports the .netrc file for automatic login, which is useful if you don’t want to provide your API key each time you run a command.
  • Multiple output formats: Annotations can be written in several formats, including VCF, .XLSX, .MDB (MS Access), .TSV (tab-separated values), and .parquet. Check the help output for more information.
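
The recommended pre-processing step for multiallelic files can be sketched as follows. The bcftools norm -m -any call is the one named in the list above; the file names and the helper biallelic_name are assumptions for this example only, and the output is written as a bgzipped VCF with -Oz.

```shell
# Derive an output name for the split file (hypothetical helper for this sketch):
#   myfile.vcf.gz -> myfile.biallelic.vcf.gz
biallelic_name() {
    case "$1" in
        *.vcf.gz) printf '%s\n' "${1%.vcf.gz}.biallelic.vcf.gz" ;;
        *.vcf)    printf '%s\n' "${1%.vcf}.biallelic.vcf.gz" ;;
        *)        printf '%s\n' "$1.biallelic.vcf.gz" ;;
    esac
}

input_vcf="myfile.vcf.gz"                    # assumed input file name
split_vcf=$(biallelic_name "$input_vcf")

if command -v bcftools >/dev/null 2>&1 && [ -f "$input_vcf" ]; then
    # -m -any splits multiallelic records of any variant type into biallelic rows;
    # -Oz writes bgzip-compressed VCF; -o names the output file
    bcftools norm -m -any -Oz -o "$split_vcf" "$input_vcf"
    echo "Wrote $split_vcf"
else
    echo "bcftools or $input_vcf not found; skipping the split" >&2
fi
```

The resulting file can then be passed to GeneBeClient through --input-vcf in place of the original.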