Evo 2: Revolutionizing DNA Research with AI-Powered Genome Analysis and Design
March 4, 2026
Evo 2 is a DNA foundation model from Arc Institute and NVIDIA, published in Nature, trained on DNA from over 100,000 species to identify disease-causing mutations and even design genomes up to the length of simple bacterial genomes.
The project prioritizes openness: data, training and inference code, and model weights are publicly available on GitHub and integrated with NVIDIA BioNeMo. The model is intended to generalize across all domains of life for applications such as targeted gene regulation and genome design.
Open science is central: Evo 2, its training and inference code, and the OpenGenome2 dataset are fully open to accelerate exploration and design of biological complexity.
Regulatory DNA design showed modest success, with a 17% difference in activity across two cell types, while the design of fully functional, novel proteins remains unproven and is not yet on par with design-grade protein engineering.
Evo 2 embeddings can train supervised classifiers for tasks like BRCA1 variant interpretation, achieving high AUROC/AUPRC when leveraging layer-specific features.
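The embedding-plus-classifier workflow can be illustrated with a minimal sketch. It does not call Evo 2: the "embeddings" are synthetic vectors, and the names `toy_embeddings`, `train_logreg`, and `auroc` are hypothetical helpers written for this example. The label convention (1 for a damaging variant, 0 for a benign one) is also an assumption. The sketch fits a small logistic-regression classifier on fixed feature vectors and scores it by AUROC on held-out examples.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def toy_embeddings(n, rng, dim=8):
    """Synthetic stand-in for per-variant embeddings (NOT real model output).

    Assumed labels: 1 = damaging, 0 = benign. Only the first two
    dimensions carry signal; the rest are noise.
    """
    X, y = [], []
    for i in range(n):
        label = i % 2
        shift = 1.0 if label else -1.0
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        vec[0] += shift
        vec[1] += shift
        X.append(vec)
        y.append(label)
    return X, y


def train_logreg(X, y, lr=0.1, epochs=200):
    """Fit logistic regression with full-batch gradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(dim):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


rng = random.Random(0)
X_train, y_train = toy_embeddings(200, rng)
X_test, y_test = toy_embeddings(200, rng)

w, b = train_logreg(X_train, y_train)
test_scores = [sum(wj * xj for wj, xj in zip(w, x)) + b for x in X_test]
held_out_auroc = auroc(y_test, test_scores)
print(f"held-out AUROC on synthetic data: {held_out_auroc:.3f}")
```

In a real setting the feature vectors would come from a chosen model layer rather than from `toy_embeddings`, and the choice of layer would itself be something to validate on held-out data.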
On regulatory tasks, Evo 2 embeddings outperform other unsupervised DNA models, but sequence-to-function models trained with task-specific data still excel for certain regulatory predictions.
The StripedHyena 2 architecture blends input-dependent convolution with attention to enable efficient long-range sequence modeling and higher throughput than Transformer baselines at large context lengths.
Evo 2 can predict some regulatory features and certain structural protein aspects, but designing fully functional proteins or highly active, organism-specific regulatory DNA in eukaryotes remains limited.
Open questions remain about whether Evo 2 can discover novel genome features and yield biological insights beyond known ones such as intron/exon boundaries and regulatory motifs.
In zero-shot variant effect prediction, Evo 2 is competitive on coding single-nucleotide variants (SNVs) and superior on non-SNV coding variants and noncoding variants, and it performs strongly against specialized models in BRCA1/BRCA2 analyses.
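One common way to score variants zero-shot with a sequence language model is to compare the model's log-likelihood of the variant sequence with that of the reference. The summary does not spell out Evo 2's exact procedure, so the following is only a sketch of that general idea. A tiny k-mer Markov model stands in for the language model, and `train_markov`, `log_likelihood`, `apply_snv`, and `delta_score` are hypothetical names introduced for illustration.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(sequences, k=2):
    """Count next-base frequencies after each k-mer context (toy model)."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i in range(k, len(seq)):
            counts[seq[i - k:i]][seq[i]] += 1
    return counts


def log_likelihood(counts, seq, k=2):
    """Sum of log P(base | previous k bases), with add-one smoothing."""
    total = 0.0
    for i in range(k, len(seq)):
        ctx = counts.get(seq[i - k:i], {})
        seen = sum(ctx.values())
        total += math.log((ctx.get(seq[i], 0) + 1) / (seen + len(ALPHABET)))
    return total


def apply_snv(ref, pos, alt):
    """Return the reference with a single base substituted at `pos`."""
    if ref[pos] == alt:
        raise ValueError("alternate base equals the reference base")
    return ref[:pos] + alt + ref[pos + 1:]


def delta_score(counts, ref, pos, alt, k=2):
    """Variant minus reference log-likelihood.

    More negative means the toy model finds the variant less plausible.
    """
    var = apply_snv(ref, pos, alt)
    return log_likelihood(counts, var, k) - log_likelihood(counts, ref, k)


# Toy "training data": a repeated ATG motif, purely illustrative.
model = train_markov(["ATGATGATGATGATGATG"] * 3)
reference = "ATGATGATGATG"
score = delta_score(model, reference, pos=5, alt="C")  # G -> C at index 5
print(f"delta log-likelihood: {score:.3f}")
```

No labeled variants are used at any point, which is what makes the score zero-shot: the only signal is how surprising the model finds the altered sequence compared with the original.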
Overall, Evo 2 marks a significant step in AI-assisted genome analysis and annotation, with potential for research and biotechnological design, though it remains limited in designing novel, functional sequences.
Mechanistic interpretability analyses using sparse autoencoders and contrastive searches show that Evo 2 learns latent representations of biological features such as exon–intron structure and mobile elements (prophages, CRISPR spacers).
Summary based on 4 sources
Sources

Nature • Mar 4, 2026
Genome modelling and design across all domains of life with Evo 2
Ars Technica • Mar 4, 2026
Large genome model: Open source AI trained on trillions of bases
EurekAlert! • Mar 4, 2026
With Evo 2, AI can model and design the genetic code for all domains of life
Mirage News • Mar 4, 2026
Evo 2 AI Models Genetic Code for All Life Domains