gpn-msa

version: 0.0.1 (latest)

GPN-MSA - genomic pretrained network with multiple-sequence alignment

Description

GPN-MSA - DNA Language Model for Variant Effect Prediction

GPN-MSA, developed by the Song Lab at UC Berkeley, is a DNA language model that uses whole-genome sequence alignments across 100 vertebrate species to predict variant effects genome-wide, in both coding and non-coding regions of the human genome (hg38). It is described in Benegas et al. (2023) on bioRxiv (https://www.biorxiv.org/content/10.1101/2023.10.10.561776v1). According to the preprint, the model trains in hours and outperforms existing models on benchmarks such as ClinVar and gnomAD, which makes it a lightweight alternative to protein-focused predictors. Its alignment-based approach improves deleteriousness prediction.

Build instructions

Script for converting to GeneBe Hub format is here: https://github.com/genebe-net/annotation-builder-scripts/tree/main/gpn-msa

Meta Information

Access:

PUBLIC

Author:

@genebe

Pull Command:

java -jar genebe.jar annotation pull --id @genebe/gpn-msa:0.0.1

Created:

01 Mar 2025, 17:26:51 UTC

Type:

VARIANT

Genome:

GRCh38

Status:

ACTIVE

License:

NOT_SPECIFIED

Version: