Quick Start guide with GeneBe Hub

This document is a quick-start introduction to the main functionality of GeneBe Hub: annotating a VCF file using selected annotation databases. If you want to read more about annotating a VCF with ACMG scores, see VCF file annotation with ACMG scores.

Annotating a VCF File

The primary use case for GeneBe Hub is using its annotations to annotate a VCF file. Let’s assume you already have a VCF file containing a single Whole Exome Sequencing sample, based on the GRCh38 (hg38) reference genome, with variants normalized and represented as single alleles.
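
The "single alleles" assumption above can be checked before annotating. The following is a minimal sketch, not part of GeneBeClient: the helper name count_multiallelic is hypothetical, and it only inspects the ALT column of a plain-text VCF.

```shell
# Count VCF records whose ALT column (5th field) lists more than one allele.
# Expects a plain-text VCF; for a .vcf.gz file, decompress it first.
count_multiallelic() {
    # Skip header lines (starting with '#'); a comma in ALT means several alleles.
    awk -F'\t' '!/^#/ && $5 ~ /,/ { n++ } END { print n + 0 }' "$1"
}

# Example: count_multiallelic sample.vcf
```

A result of 0 suggests every record carries a single alternate allele. If the count is higher, a normalization tool such as bcftools norm is commonly used to split such records. This check does not verify left-alignment or the reference genome build.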

Prerequisites

Annotation

Start by logging into GeneBe. Run:

java -jar GeneBeClient.jar login

Next, choose the databases you want to use. You can browse the available databases in the GeneBe Hub browser. Be careful to choose only annotations that use the same species and genome build as your VCF file!

For the purpose of this example, we will use the GnomAD4 Exomes database to retrieve variant frequencies in the healthy population. It is a variant-based annotation:

# This will download the @genebe/gnomad-exomes:0.0.1-4.1.0 database to your local computer
java -jar GeneBeClient.jar annotation pull --id @genebe/gnomad-exomes:0.0.1-4.1.0

If using GnomAD, it’s good practice to check the depth of coverage to assess the reliability of the frequencies. Use the GnomAD exomes depth database, which is a position-based annotation:

java -jar GeneBeClient.jar annotation pull --id @genebe/gnomad-exomes-depth:0.0.1-4.1.0

Finally, let’s pull the current version of ClinVar. Note that no version is specified here, so the newest version will be downloaded automatically:

java -jar GeneBeClient.jar annotation pull --id @genebe/clinvar
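
The pull commands above can also be run in a single loop. This is a minimal sketch that only wraps the annotation pull command shown in this guide; the GENEBE_CMD variable and the pull_all function are hypothetical helpers, not GeneBe features.

```shell
# GENEBE_CMD is a hypothetical override for how the client is launched.
GENEBE_CMD="${GENEBE_CMD:-java -jar GeneBeClient.jar}"

# Pull every database ID given as an argument, stopping at the first failure.
pull_all() {
    for id in "$@"; do
        # $GENEBE_CMD is left unquoted on purpose so it splits into command and arguments.
        $GENEBE_CMD annotation pull --id "$id" || return 1
    done
}

# Example:
# pull_all @genebe/gnomad-exomes:0.0.1-4.1.0 @genebe/gnomad-exomes-depth:0.0.1-4.1.0 @genebe/clinvar
```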

Now let’s annotate your VCF file. We will also use CCRS annotations, just to include a region-based annotation in this showcase.

Assuming your input file is named sample.vcf and the output should be written to /tmp/output.vcf.gz and /tmp/output.tsv, the command will look like this:

java -jar GeneBeClient.jar vcf annotate \
    --input-vcf sample.vcf \
    --annotations @genebe/ccrs_hg38:0.0.1 @genebe/clinvar @genebe/gnomad-exomes-depth:0.0.1-4.1.0 @genebe/gnomad-exomes:0.0.1-4.1.0 \
    --output-vcf /tmp/output.vcf.gz \
    --output-tsv /tmp/output.tsv 

Voila, take a look at the output files now.
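
A quick way to see which annotation columns ended up in the TSV output is to list its header. This is a minimal sketch, assuming the first line of the TSV is a header row (not confirmed by this guide); list_tsv_columns is a hypothetical helper name.

```shell
# Print the column names of a tab-separated file, one per line, with a 1-based index.
# Assumes the first line of the file is a header row.
list_tsv_columns() {
    awk -F'\t' 'NR == 1 { for (i = 1; i <= NF; i++) print i ": " $i; exit }' "$1"
}

# Example: list_tsv_columns /tmp/output.tsv
```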

IMPORTANT NOTE: When you run vcf annotate this way, GeneBeClient sends variants to the api.genebe.net server. This happens because a remote annotation named @genebe/base, which includes ACMG, consequences, GnomAD, ClinVar and other "basic" annotations, is turned on by default. You can turn this behaviour off with --omit-base-annotation true. The default exists for backward compatibility with the time when GeneBe was used mainly for remote ACMG criteria annotation. This behavior is described on the VCF annotation with ACMG criteria page.
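
For reference, here is how the annotate command from this guide might look with the base annotation switched off. It is a sketch that reuses only the options shown above; the GENEBE_CMD variable and the annotate_local wrapper are hypothetical, and placing --omit-base-annotation last is an assumption about option ordering.

```shell
# GENEBE_CMD is a hypothetical override for how the client is launched.
GENEBE_CMD="${GENEBE_CMD:-java -jar GeneBeClient.jar}"

# Usage: annotate_local INPUT_VCF OUTPUT_VCF OUTPUT_TSV ANNOTATION...
annotate_local() {
    input=$1; out_vcf=$2; out_tsv=$3
    shift 3   # the remaining arguments are annotation database IDs
    $GENEBE_CMD vcf annotate \
        --input-vcf "$input" \
        --annotations "$@" \
        --output-vcf "$out_vcf" \
        --output-tsv "$out_tsv" \
        --omit-base-annotation true
}

# Example:
# annotate_local sample.vcf /tmp/output.vcf.gz /tmp/output.tsv \
#     @genebe/ccrs_hg38:0.0.1 @genebe/clinvar
```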