Accelerating Biology with GPUs

January 7, 2025

Advances in machine learning for biology are transforming basic and clinical research, enabling researchers to make sense of massive multi-omics datasets and guide critical decisions in drug development, patient segmentation, and much more. While the explosion of interest in AI is accelerating progress in complex research areas like cancer, it has also made biology far more computationally intensive. Servers powered by central processing units (CPUs) alone are no longer enough for the modern scientist. Labs looking to implement advanced analytical tools like foundation models also need graphics processing units (GPUs) to make their projects feasible.

What is the difference between a CPU and a GPU?

All servers are powered by CPUs, which contain a handful of computational cores that serially handle operations like running software and responding to user input. The CPU is essentially the “brain” of a server. GPUs, by contrast, are computing components that supercharge what CPUs can handle. A GPU can have thousands of cores and excels at parallel processing, or running many operations at the same time. GPUs are necessary for high-performance computing (HPC) – tasks that require an exceptional amount of processing power, such as implementing a foundation model.

Architecture of a CPU vs. GPU, where the control unit directs the processor’s operations, the ALUs (arithmetic logic units) are circuits that perform calculations, and the cache temporarily stores frequently accessed data.
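
To make the contrast concrete, here is a minimal sketch that runs the same matrix multiplication on the CPU with NumPy and on the GPU with CuPy. It assumes a CUDA-capable NVIDIA GPU and the cupy package; absolute timings will vary with hardware.

```python
# Compare one matrix multiplication on CPU (NumPy) vs. GPU (CuPy).
import time

import numpy as np
import cupy as cp

n = 4096

# CPU: NumPy executes on the CPU's handful of cores.
a_cpu = np.random.rand(n, n).astype(np.float32)
start = time.perf_counter()
np.matmul(a_cpu, a_cpu)
print(f"CPU (NumPy): {time.perf_counter() - start:.3f} s")

# GPU: CuPy dispatches the same operation across thousands of GPU cores.
a_gpu = cp.random.rand(n, n, dtype=cp.float32)
cp.matmul(a_gpu, a_gpu)              # warm-up run to trigger kernel compilation
cp.cuda.Stream.null.synchronize()
start = time.perf_counter()
cp.matmul(a_gpu, a_gpu)
cp.cuda.Stream.null.synchronize()    # wait for the asynchronous GPU kernel to finish
print(f"GPU (CuPy):  {time.perf_counter() - start:.3f} s")
```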

Are there different kinds of GPUs?

GPUs are generally classified by their use case: low-power, mid-range, and high-performance. Low-power GPUs like the NVIDIA Tesla T4 are designed for resource-intensive tasks that exceed the capabilities of CPUs but don't require the highest level of computational power, such as accelerating certain bioinformatics analyses or running machine learning inference. On the other hand, high-performance GPUs like the NVIDIA A100 and L40S are built for demanding workloads such as training large foundation models or running large-scale deep learning applications. These GPUs offer significantly more power, but they are also more expensive and generally deployed in specialized servers rather than local workstations.

What are some key applications of GPUs?

Genomics

To generate DNA sequence data, raw output from purified DNA samples must be converted into nucleotide sequences. One way to obtain this data is with a Nanopore sequencer, which measures changes in ionic current as a DNA strand passes through a tiny hole – or nanopore – and then uses a basecalling algorithm to convert this signal into recognizable nucleotide bases and perform error correction. Oxford Nanopore basecallers use recurrent neural network (RNN)-based algorithms trained on a variety of DNA sequence data to predict nucleotides from raw electrical signals. Users can run Oxford Nanopore's GPU-enabled basecaller, Dorado, which significantly accelerates basecalling for computationally intensive applications and is optimized for systems running high-performance GPUs like the NVIDIA A100 and H100.

Basecalling using a bi-directional recurrent neural network (RNN). Adapted from “Data analysis documentation: Basecalling overview,” Oxford Nanopore.
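
As a hedged illustration, here is how a Dorado run might be launched from Python. The paths are placeholders, and the “hac” (high-accuracy) model shorthand and flags can differ across Dorado versions, so check the documentation for your install.

```python
# Launch Dorado basecalling on a GPU from Python. Paths are placeholders.
import subprocess

cmd = [
    "dorado", "basecaller",
    "hac",                  # model preset: "fast", "hac", or "sup" in recent versions
    "reads/",               # directory of raw POD5 files (assumed path)
    "--device", "cuda:0",   # run on the first NVIDIA GPU
]

# Dorado writes basecalled reads (BAM) to stdout, so redirect it to a file.
with open("calls.bam", "wb") as out:
    subprocess.run(cmd, stdout=out, check=True)
```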

Another important analytical process in genomics is variant calling, or identifying genetic variants. Accurately calling variants from genomic data is crucial for downstream applications, like learning which variants may be associated with certain diseases. Google researchers have developed an algorithm called DeepVariant (Poplin et al. 2018), which uses a deep convolutional neural network to learn how to accurately call variants from genomic data generated using a variety of sequencing technologies. DeepVariant can be accelerated with NVIDIA’s GPU-enabled Parabricks software, which dramatically reduces runtime for high-throughput applications – up to 60X faster compared to using CPUs alone.
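
For a sense of what this looks like in practice, the sketch below invokes Parabricks’ pbrun deepvariant from Python. The file paths are placeholders, and Parabricks is typically run inside NVIDIA’s container images, so adapt it to your environment.

```python
# Run GPU-accelerated DeepVariant via NVIDIA Parabricks. Paths are placeholders.
import subprocess

subprocess.run(
    [
        "pbrun", "deepvariant",
        "--ref", "reference.fa",         # reference genome FASTA
        "--in-bam", "sample.bam",        # aligned, sorted reads
        "--out-variants", "sample.vcf",  # output VCF of called variants
    ],
    check=True,
)
```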

It’s not only variation in the coding regions of DNA that can lead to altered or impaired gene function; noncoding DNA plays a variety of regulatory roles that can impact gene expression. Even genetic elements tens of thousands of base pairs away from a gene can impact its expression, making it difficult to systematically assess the effects of these interactions without powerful computational tools. Foundation models like DeepSEA (Zhou & Troyanskaya 2015), Enformer (Avsec et al. 2021), and DNABERT (Ji et al. 2021) use deep learning algorithms trained on massive DNA datasets to help predict the regulatory effects of noncoding DNA. Foundation models require pretraining in order to learn the patterns and contexts necessary to make accurate predictions from input data – both pretraining and running the model on the input data require GPU acceleration.

Architecture of Enformer, which predicts gene expression from sequence while accounting for long-range interactions (up to 100 kb away). From Avsec et al., “Effective gene expression prediction from sequence by integrating long-range interactions,” Nature Methods (2021).
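
The architectures differ, but the GPU usage pattern is shared. The sketch below uses a small, hypothetical convolutional stand-in (not the actual DeepSEA, Enformer, or DNABERT architecture) to show the PyTorch device-placement pattern that inference with any of these models relies on.

```python
# Device-placement pattern for sequence-model inference in PyTorch.
# The model here is a hypothetical stand-in, not a published architecture.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Any nn.Module trained on DNA sequence would slot in here.
model = torch.nn.Sequential(
    torch.nn.Conv1d(4, 64, kernel_size=15, padding=7),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool1d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(64, 1),   # e.g., a regulatory-activity score
).to(device)

# One-hot encode a batch of random sequences (batch, 4 bases, length)
# and move it to the same device as the model before inference.
batch = torch.nn.functional.one_hot(
    torch.randint(0, 4, (8, 10_000)), num_classes=4
).permute(0, 2, 1).float().to(device)

with torch.no_grad():
    scores = model(batch)
print(scores.shape)   # torch.Size([8, 1])
```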

Transcriptomics

There are a variety of analyses that can be performed on single-cell RNA (scRNA) data to learn about gene expression patterns, cell identity, cell-cell communication, and much more. Scverse offers a suite of tools for such analyses, including scanpy (Single-Cell Analysis in Python) and squidpy (for spatial transcriptomics data). In order to bring GPU acceleration to single-cell analysis, NVIDIA introduced RAPIDS single-cell: a tool providing GPU-based workflows that are near drop-in replacements for most scanpy functions, as well as some for squidpy. RAPIDS single-cell retains scanpy’s accessibility, using code that is easily navigable by those familiar with Python.
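
As a rough sketch, a scanpy-style preprocessing and clustering pass with rapids_singlecell might look like the following. The function names mirror scanpy’s; exact coverage depends on the package version, and the input file is a placeholder.

```python
# GPU-accelerated scanpy-style workflow with rapids_singlecell.
import scanpy as sc
import rapids_singlecell as rsc

adata = sc.read_h5ad("pbmc.h5ad")   # placeholder dataset
rsc.get.anndata_to_GPU(adata)       # move the expression matrix into GPU memory

# The same calls you would make with sc.pp.* / sc.tl.*, now on the GPU:
rsc.pp.normalize_total(adata, target_sum=1e4)
rsc.pp.log1p(adata)
rsc.pp.highly_variable_genes(adata, n_top_genes=2000)
rsc.pp.pca(adata, n_comps=50)
rsc.pp.neighbors(adata)
rsc.tl.umap(adata)
rsc.tl.leiden(adata)
```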

GPU-accelerated tools are particularly useful for cell type annotation, especially as datasets grow larger and more references become available. CellTypist, which can be used with the RAPIDS single-cell pipeline, and scANVI, a deep generative model, both leverage GPUs to annotate cellular identities in scRNA-seq data (see the sketch after the figure below). GPU-powered tools let researchers process data at large scale and high speed, often with much faster turnaround than CPU-based approaches.

scRNA analysis pipeline using RAPIDS single-cell. Adapted from Avantika Lal, “Technical Blog: Accelerating Single Cell Genomic Analysis using RAPIDS,” NVIDIA Developer.
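
For illustration, here is a hedged sketch of annotating a preprocessed dataset with CellTypist; “Immune_All_Low.pkl” is one of CellTypist’s published reference models, and the input path is a placeholder.

```python
# Annotate cell types with CellTypist on log-normalized scRNA-seq data.
import scanpy as sc
import celltypist
from celltypist import models

adata = sc.read_h5ad("pbmc_processed.h5ad")   # placeholder, log-normalized data

models.download_models(model="Immune_All_Low.pkl")
predictions = celltypist.annotate(
    adata,
    model="Immune_All_Low.pkl",
    majority_voting=True,   # refine labels using neighborhood structure
)
adata = predictions.to_adata()   # predicted labels land in adata.obs
print(adata.obs["majority_voting"].value_counts())
```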

Many foundation models have also been developed to make inferences from transcriptomic data. For example, Geneformer (Theodoris et al. 2023) and scGPT (Cui et al. 2024) are trained on tens of millions of single-cell transcriptomes in order to learn how to make accurate predictions and perform analytical tasks. Like other foundation models, these algorithms effectively require GPU acceleration to run.

Imaging

A major challenge of analyzing microscopy images is segmentation, or distinguishing individual cells and cellular components (like membranes and nuclei) from imaging data. Segmentation algorithms can be complex and resource-intensive to run, but they are necessary for accurately extracting meaningful features. In 2020, Stringer et al. introduced an image segmentation tool called Cellpose, which uses a deep learning algorithm trained on highly variable images of cells to effectively segment objects in many image types. In addition to the original paper, the authors have also published a Nature Methods guide (Pachitariu & Stringer 2022) for training your own Cellpose model, and a preprint (Stringer & Pachitariu 2024) on restoring images for cell segmentation. Because image data is particularly large, GPU acceleration is necessary to run Cellpose in a feasible timeframe, especially on large datasets.
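
A minimal Cellpose run might look like the sketch below, which follows the Cellpose 2.x Python API (models.Cellpose with gpu=True). Model names and signatures have shifted across releases, so check the documentation for your installed version; the image path is a placeholder.

```python
# GPU-accelerated cell segmentation with Cellpose (2.x-style API).
from cellpose import models
from skimage import io

img = io.imread("cells.tif")   # placeholder image path

model = models.Cellpose(gpu=True, model_type="cyto")
masks, flows, styles, diams = model.eval(
    img,
    diameter=None,     # let Cellpose estimate the cell diameter
    channels=[0, 0],   # grayscale: segment channel 0, no separate nuclear channel
)
io.imsave("masks.tif", masks.astype("uint16"))   # one integer label per cell
```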

In addition to segmentation, image registration is often an important step in analyzing microscopy and medical imaging data. Registration involves aligning acquired images to reference images, for example so that tissue structures from an atlas can be mapped onto the acquired image. Elastix, a popular image registration toolkit, provides a GPU-accelerated framework for aligning images through parametric transformations. More recently, deep learning-based tools such as VoxelMorph have been developed, offering faster and more flexible registration through neural networks. Both segmentation and registration apply to all kinds of imaging modalities, including immunohistochemistry (IHC), magnetic resonance imaging (MRI), computed tomography (CT), and electron microscopy (EM). Because imaging data are generally large, these methods are computationally intensive and often benefit from GPU acceleration.
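
As a hedged example, the itk-elastix Python bindings expose registration as a single call; the file paths below are placeholders, and this runs elastix’s default parametric pipeline.

```python
# Intensity-based image registration with the itk-elastix Python bindings.
import itk

fixed = itk.imread("reference.nii.gz", itk.F)    # reference image (placeholder)
moving = itk.imread("acquired.nii.gz", itk.F)    # image to align (placeholder)

# Align the moving image to the fixed image using default elastix parameters.
registered, transform_params = itk.elastix_registration_method(fixed, moving)
itk.imwrite(registered, "registered.nii.gz")
```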

Segmentation of the same image (from 2 different datasets, a & c) by models initialized on Cellpose 1.0, with incrementally more training ROIs. From Pachitariu & Stringer, “Cellpose 2.0: how to train your own model,” Nature Methods (2022).

In some applications, textual information is important for understanding visual data. Specifically, histopathology slides are often accompanied by captions that provide key patient and sample data. CONCH (Lu et al. 2024) is a new tool applying both visual and language-based deep learning algorithms to histopathological image analysis. Trained on image-caption pairs, CONCH can perform visual analyses like segmentation and image classification as well as accurate captioning. Running models like CONCH on GPUs can substantially speed up digital pathology, accelerating research as well as clinical practice.

How do I implement GPUs in my research?

Once effectively integrated into your workflows, GPU acceleration can transform your experimental and analytical pipelines. However, there are several questions to consider before diving in:

1. Which analyses do you want to run on GPUs? Start by identifying how you want to analyze your data and which tools you want to use. Let’s use Dorado basecalling as an example throughout these steps. The Dorado documentation covers everything you need to know: how to download the package, system requirements, how to run the program, and more. Working through this documentation on your own can be tedious, which is why Watershed makes it easy to install and integrate these packages into your data analysis pipeline.

2. What kind of GPU do your analyses require? Not all applications require high-performance GPUs, but you should err on the side of allocating adequate resources for the most computationally intensive analysis you want to run. Usually, the documentation for the analysis tool you want to use will list the minimum resources needed to run it. The documentation for Dorado, for example, specifies that the algorithm is optimized to run on high-performance NVIDIA A100 and H100 GPUs, but should also work on other NVIDIA GPUs with ≥8 GB VRAM and Pascal architecture or later (see the sketch after this list for a quick way to check your hardware).

3. How should you obtain GPUs? Once you know what GPUs you need, you can either purchase and install them on your own, or access them through a third party like Watershed. Integrating GPUs into your compute ecosystem can be technically challenging, especially if you need to switch between multiple GPUs for different applications. Watershed offers built-in, easy access to multiple types of GPUs, allowing you to seamlessly switch between them as well as run them in parallel to further boost training and analysis speed.
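
As promised above, here is a small PyTorch sketch for checking whether an installed GPU meets requirements like Dorado’s (≥8 GB VRAM and Pascal architecture or later, i.e., CUDA compute capability 6.0+).

```python
# Check the installed GPU against Dorado-style minimum requirements.
import torch

if not torch.cuda.is_available():
    print("No CUDA-capable GPU detected.")
else:
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"GPU: {props.name}")
    print(f"VRAM: {vram_gb:.1f} GB (need >= 8 GB)")
    print(f"Compute capability: {props.major}.{props.minor} (need >= 6.0, Pascal)")
    if vram_gb >= 8 and (props.major, props.minor) >= (6, 0):
        print("Meets Dorado's stated minimum.")
    else:
        print("Below Dorado's stated minimum.")
```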

To learn more about accelerating your research with GPU-optimized tools, reach us at contact@watershed.bio or schedule a demo.