PSC, XSEDE Support Gene Assembly of Key Aquaculture Species

Commercial abalone “aquaculture”—farming the shellfish in enclosures—has exploded over the past decade, becoming a $100-million global industry. Understanding the DNA of the abalone is key to improving and expanding its aquaculture for California producers. That’s why scientists at Iowa State University and the National Oceanic and Atmospheric Administration worked with PSC experts to “assemble” the DNA sequences of several species of abalone on the Bridges supercomputer.

Why It’s Important

Red abalone, a prized delicacy in much of the world, has also become more popular among American seafood eaters. Along with the “crop’s” high market value, this increased popularity is driving growth in abalone “farming” in the U.S. Abalone culture has become a major business in California, Korea, China and other areas around the world. Partly because wild abalone populations have been over-exploited and hit by disease, abalone is one of the few species for which farming dominates the global market.

Despite the value and volume of abalone culture, though, we still have a lot to learn about the shellfish. Understanding the DNA of red abalone and of the related green, pink, white and black abalone species is key to improving and expanding abalone aquaculture for California producers. By breeding individuals with useful traits, such as growth rate, disease resistance and tolerance of temperature changes, producers hope to raise abalone more cost-efficiently. Research on abalone DNA is also vital to plans for restoring the endangered white and black abalone, and will likely play an important role in saving these species from extinction. That’s why Andrew Severin and Arun Seetharam of Iowa State University, along with colleagues John Hyde and Catherine Purcell at the National Oceanic and Atmospheric Administration (NOAA) Fisheries Southwest Fisheries Science Center in La Jolla, Calif., decided to “assemble” the DNA sequences of several species of abalone.

“Researchers at Iowa State University have been working on similar projects to improve food production for the last 10 years in agriculture [such as with] soybean and maize, and livestock [like pigs and cattle]. Aquaculture is now catching up [and] we already had a plan on the best approaches to create genomic resources for a particular organism when you’re trying to effectively breed it to maximize production.”—Andrew Severin, Iowa State University

How PSC and XSEDE Helped

The scientists’ main objective was to create a “genome assembly”—the full DNA sequence of each abalone species. The genome of red abalone, the main aquaculture species, consists of 1.8 billion DNA “bases”—letters in the genetic alphabet. Current sequencing technology “reads” DNA in fragments as small as 100 bases, though, so the investigators needed a powerful computer to mix and match billions of these overlapping reads and discover their proper order in the genome. Making the task harder, the farmed abalone population is genetically “wild,” with a lot of variation between individuals. That’s good for a wild population, but it complicates assembly because the same stretch of the genome can carry different letters in different individuals.
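To make the idea of putting overlapping reads in order more concrete, here is a toy greedy overlap assembler in Python. It is a minimal sketch for illustration only, not the software the Iowa State and NOAA team ran on Bridges: the reads are short made-up strings, the function names `overlap` and `greedy_assemble` are hypothetical, the 3-base minimum overlap is an arbitrary choice, and real assemblers use far more sophisticated (and memory-hungry) methods to cope with billions of reads, sequencing errors and variation between individuals.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix of `b`,
    counting only overlaps of at least `min_len` bases (0 if none)."""
    start = 0
    while True:
        # Look for the first min_len bases of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # If the rest of a from this point is a prefix of b, the reads overlap.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembler (hypothetical helper): repeatedly merge the pair of
    reads with the longest overlap until one sequence remains or none overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_a, best_b = 0, "", ""
        # Compare every ordered pair of reads (work grows quadratically with read count).
        for a, b in permutations(reads, 2):
            olen = overlap(a, b, min_len)
            if olen > best_len:
                best_len, best_a, best_b = olen, a, b
        if best_len == 0:
            break  # remaining pieces don't overlap: stop with several fragments
        reads.remove(best_a)
        reads.remove(best_b)
        # Glue b onto a, skipping the bases they share.
        reads.append(best_a + best_b[best_len:])
    return reads


# Four overlapping 7-base reads taken from a made-up 16-base sequence.
reads = ["ATTAGAC", "TAGACCT", "ACCTGCC", "GCCGGAA"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAA']
```

Because the toy always merges the pair with the longest overlap, it recovers this short sequence exactly. A repeated stretch longer than the reads, however, would offer several equally good overlaps, and a greedy merge can then join fragments in the wrong order or collapse the copies into one, which is part of why repetitive genomes are hard to assemble.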

Worse, abalone DNA is highly repetitive, meaning the same fragment could match multiple places in the genome. All of this can quickly overwhelm the memory of a computer trying to reconstruct the full sequence. Severin, who manages Iowa State’s Genome Informatics Facility, has been an XSEDE Campus Champion since 2014. He knew that assembling the genomes of the five abalone species would take much more memory than is available on most supercomputers. He also knew where to find that kind of memory. Working with XSEDE Extended Collaborative Support Service experts Phil Blood and David O’Neal at the Pittsburgh Supercomputing Center (PSC), the scientists ran their assemblies on the 3-terabyte “large memory nodes” of Bridges, an XSEDE-allocated system at PSC.

Today they’ve completed rough assemblies of all five species and a more complete assembly for the commercially important red abalone. Their red abalone assembly now covers about 97 percent of the complete genome. Once the red abalone genome is complete, the scientists plan to compare genes across the abalone species from different regions to identify which genes offer survival and growth benefits in different environments and growing conditions. This information will be important for commercial abalone farms seeking to make their operations more efficient, and for helping restore the endangered white and black abalone.

“We did the preliminary assemblies on Bridges for all five species … Our local supercomputing infrastructure was over-utilized, and XSEDE was the only place that would run all five genome assemblies at the same time. Bridges is bigger than anything we have access to and can complete more jobs at the same time.”—Arun Seetharam, Iowa State University