GenomeOcean: Revolutionizing Genomic Data with AI to Accelerate Precision Medicine and Environmental Research

February 26, 2026
  • Doudna features include high-density, rack-scale infrastructure and ultra-low-latency networking with NVIDIA Quantum-X InfiniBand, plus offload of non-compute tasks to NVIDIA BlueField DPUs.

  • GenomeOcean, a JGI initiative, uses large language models to read, understand, and generate genomic data at scale, aiming to shorten the path from hypothesis to insight in precision medicine, drug development, and environmental genomics.

  • Training times are estimated to drop dramatically with Doudna, running 30–350 times faster than the current multi-thousand GPU-hour runs, with availability expected in the first half of 2027.

  • The project emphasizes end-to-end AI enablement to reduce scientists’ operational burden, including collaborations, structured learning, and potential use of the Dell AI Factory with NVIDIA to streamline the path from training to inference.

  • A pilot 4-billion-parameter genome foundation model was released, trained on 220 TB of metagenomic data using NERSC resources, and will benefit from the upcoming Doudna (NERSC-10) supercomputer for faster training and inference.

  • Open science principles guide GenomeOcean, making data publicly available with containerized deployment guidance, and safeguarding data integrity with a detector that can distinguish synthetic from real DNA at over 99% accuracy.

  • The model is designed for efficiency and trust, targeting 50–100x faster performance than comparable models while prioritizing reliability to reduce hallucinations in genomic outputs.

  • Future goals include training larger models faster, expanding accessibility for researchers, and maintaining open, reproducible workflows that scale GenomeOcean’s impact across DOE priorities and broader genomics research.

  • The launch will leverage the Dell PowerEdge-Integrated Rack 7000, NVIDIA Vera Rubin platform, NVIDIA Quantum-X InfiniBand, NVIDIA BlueField DPUs, and Dell Omnia tools to scale AI and HPC workloads.

  • Bottlenecks include vast, unstructured, and partially labeled genomic data—over 30 TB of assembled metagenomes at JGI—with functional annotation remaining a key challenge.

  • GenomeOcean addresses three core JGI questions: how to read genomes via sequencing, how to understand function via annotation and interpretation, and how to write genomes through synthetic biology aligned with DOE priorities.

Summary based on 2 sources

