VCF File Annotation Using GeneBe API
The GeneBe API can be used to annotate VCF files. There are currently two ways to do so. The recommended way is GeneBeClient from the GeneBe CLI project, which requires Java. The alternative is the Python client pygenebe from the pygenebe repository; however, annotating VCF files with pygenebe is deprecated.
Regardless of which client you choose, remember to use your API key to avoid exceeding the request limit.
GeneBeClient — genebe-cli
This is the recommended method for annotating VCF files.
Requirements and Installation
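You need Java version 21 or higher. To install GeneBeClient, download the most recent .jar file from GeneBe CLI releases. Release files may carry a version in their name, such as the GeneBeClient-0.0.1-a.1.jar used in later examples, so substitute the name of the file you downloaded. You can run the program like any other jar file by calling it from the command line: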
java -jar GeneBeClient.jar
To get help, run:
java -jar GeneBeClient.jar help
For help on a specific command, run:
java -jar GeneBeClient-0.0.1-a.1.jar help vcf annotate
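Because GeneBeClient needs Java 21 or higher, it can help to check the installed runtime before starting a long annotation run. The following is a minimal Python sketch and not part of GeneBeClient: the helper names parse_java_major and installed_java_major are hypothetical, and it assumes the usual java -version banner format, which Java normally prints to stderr.

```python
import re
import subprocess

MIN_JAVA_MAJOR = 21  # minimum version stated in the requirements above


def parse_java_major(version_output: str) -> int:
    """Extract the major Java version from `java -version` output.

    Handles both the modern scheme ("21.0.2" -> 21) and the legacy
    "1.x" scheme ("1.8.0_292" -> 8).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("could not find a version string in the output")
    major = int(match.group(1))
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major() -> int:
    """Run `java -version` and return the major version it reports."""
    result = subprocess.run(
        ["java", "-version"], capture_output=True, text=True, check=True
    )
    # The banner normally goes to stderr; stdout is included as a fallback.
    return parse_java_major(result.stderr + result.stdout)
```

Comparing installed_java_major() against MIN_JAVA_MAJOR before launching the client gives an early, readable failure instead of a Java class-version error.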
Running GeneBeClient
GeneBeClient can be used in two modes: interactively (in a shell) or by invoking commands from the command line with arguments. The interactive shell is useful for exploring the program’s features, as it supports conveniences such as tab completion.
Here is an example of running GeneBeClient:
java -jar GeneBeClient-0.0.1-a.1.jar \
vcf annotate \
--input-vcf myfile.vcf.gz \
--output-vcf /tmp/outputx.vcf \
--genome hg38 \
--api-key ak-YOUR_API_KEY \
--username YOUR-EMAIL@YOUR-INSTITUTION
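For scripted or repeated runs, the same invocation can be assembled programmatically so that the API key stays out of shell history. This is a minimal Python sketch that uses only the flags shown in the example above; the function names and the GENEBE_API_KEY environment variable are illustrative choices, not something GeneBeClient defines.

```python
import os
import subprocess


def build_annotate_command(jar_path, input_vcf, output_vcf, genome, api_key, username):
    """Return the argument list for `vcf annotate`, mirroring the example above."""
    return [
        "java", "-jar", jar_path,
        "vcf", "annotate",
        "--input-vcf", input_vcf,
        "--output-vcf", output_vcf,
        "--genome", genome,
        "--api-key", api_key,
        "--username", username,
    ]


def annotate(jar_path, input_vcf, output_vcf, genome, username):
    """Run the annotation, reading the key from GENEBE_API_KEY (hypothetical name)."""
    api_key = os.environ["GENEBE_API_KEY"]
    cmd = build_annotate_command(
        jar_path, input_vcf, output_vcf, genome, api_key, username
    )
    # check=True raises CalledProcessError if the client exits with a non-zero status.
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.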
Features
- Genome recognition: It is recommended that you provide the genome version used to create the VCF. If you don’t, the client will try to identify whether the reference genome used in the VCF file is hg19 or hg38.
- Splitting multiallelic sites: If your VCF has multiple ALT entries in a single row, GeneBeClient will split and normalize them into biallelic sites before annotation. However, it is recommended to convert multiallelic VCF files to biallelic ones before using GeneBeClient. You can convert your VCF using bcftools norm -m -any.
- Automatic liftover: The GeneBe API annotates variants using hg38 databases. If you provide a VCF with hg19 coordinates, each variant will automatically be lifted over to hg38 before annotation.
- .netrc file: GeneBeClient supports the .netrc file for automatic login. This is useful if you don’t want to provide your API key each time you run a command.
- Multiple output formats: You can receive annotations in several formats, including VCF, .XLSX, .MDB (MS Access), .TSV (tab-separated values), and .parquet. Check the help section for more information.
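To illustrate what splitting a multiallelic site means, here is a simplified Python sketch that turns one VCF data line with several ALT alleles into one line per allele. It is not GeneBeClient's implementation, and the function name is hypothetical. It copies the INFO and sample columns unchanged and performs no normalization (trimming or left-alignment), so for real data the bcftools norm -m -any command mentioned above remains the appropriate tool.

```python
def split_multiallelic(vcf_line: str) -> list[str]:
    """Split one tab-separated VCF data line into one line per ALT allele.

    Simplification: per-allele INFO/FORMAT values and genotypes are copied
    unchanged, whereas a real tool would rewrite them for each allele.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 5:
        raise ValueError("a VCF data line needs at least CHROM, POS, ID, REF, ALT")
    alts = fields[4].split(",")  # column 5 (index 4) holds the ALT alleles
    return ["\t".join(fields[:4] + [alt] + fields[5:]) for alt in alts]


# Illustrative record with two ALT alleles (C and G) at the same position.
line = "chr1\t12345\t.\tA\tC,G\t50\tPASS\t."
for record in split_multiallelic(line):
    print(record)
```

Each output record has a single ALT allele, which is the biallelic form recommended as input to GeneBeClient.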