Studies with Anton, a special-purpose supercomputer designed by D. E. Shaw Research and made available at PSC, have yielded new insights into the motion and function of proteins

PHOTO: Anton © D.E. Shaw

Research with Anton at PSC and at DESRES has demonstrated its ability as an unparalleled tool to advance understanding of biomolecules.

In the 1670s in the small city of Delft in the Netherlands, a fabric merchant named Anton van Leeuwenhoek, who used magnifying lenses to count the threads in cloth, found ways to craft lenses of unprecedented power. His lens-making ability, along with his natural curiosity, led him to observe living things (bacteria, spermatozoa, blood flow in capillaries) that no one had seen before. More than three centuries later, a supercomputer called “Anton,” named in honor of van Leeuwenhoek, is demonstrating remarkable ability as a “computational microscope,” enabling researchers to observe submicroscopic realms where proteins carry out their life-sustaining functions.

Although Anton the supercomputer can’t literally see anything, it makes otherwise unseen things observable by dramatically increasing the speed of a computational application called “molecular dynamics” (MD) — a widely used method of simulating the structure and movement of biomolecules, including proteins and DNA. While most supercomputers are general purpose, built to handle a wide range of computational problems, from fluid dynamics to quantum physics, Anton’s specialized hardware has a single purpose — to run MD simulations, and it does this about 100 times faster than other supercomputers.

PHOTO: Markus Dittrich

Markus Dittrich, PSC
“Anton’s ability to extend the timescale of molecular dynamics simulations,” says Dittrich, who coordinates the Anton project at PSC, “has opened a new window on many important biological processes.”

Anton and the novel algorithms it employs were designed by a team of researchers led by David E. Shaw, chief scientist of D. E. Shaw Research (DESRES) in New York City. The objective of running MD faster, more than speed per se, is to simulate biomolecules for longer periods of biological time. Before Anton, most MD simulations could track a protein’s movement for only hundreds of nanoseconds (1 nanosecond = 10^-9 seconds), with a few simulations reaching into the microsecond range (10^-6 seconds). Anton makes it possible to routinely simulate biomolecules for tens to hundreds of microseconds, and in some cases into the millisecond range (10^-3 seconds). The most biologically important aspects of protein activity typically occur at these longer timescales, which before Anton weren’t accessible with MD simulation.
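To give a rough sense of why these timescales are so demanding, the short sketch below counts how many integration steps an MD run would need to cover a given span of biological time. The 2-femtosecond timestep is an assumption (a value commonly used in MD work), not a figure reported for Anton, and the function name is purely illustrative.

```python
# Illustrative arithmetic only. The 2-femtosecond timestep is an assumed,
# commonly used MD value; it is not a figure taken from this article.
TIMESTEP_FS = 2.0        # assumed integration timestep, in femtoseconds
FS_PER_SECOND = 1e15     # femtoseconds in one second


def steps_needed(simulated_seconds, timestep_fs=TIMESTEP_FS):
    """Return the number of integration steps needed to cover the given time."""
    return round(simulated_seconds * FS_PER_SECOND / timestep_fs)


if __name__ == "__main__":
    spans = [
        ("100 nanoseconds", 100e-9),
        ("1 microsecond", 1e-6),
        ("1 millisecond", 1e-3),
    ]
    for label, seconds in spans:
        print(f"{label}: {steps_needed(seconds):,} steps")
```

Under that assumed timestep, a single microsecond of biological time already takes 500 million steps, and a millisecond takes a thousand times more, which is why raw simulation speed translates so directly into reach.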

Since late 2008, DESRES has used Anton machines in its own internal research program, and in 2009, in collaboration with the National Resource for Biomedical Supercomputing (NRBSC) at PSC, DESRES made one of its Anton systems available without cost for non-commercial research by scientists at universities and other not-for-profit institutions. PSC hosts this machine, supported by a two-year, $2.7 million grant to NRBSC from the National Institute of General Medical Sciences (part of NIH).

In the summer of 2010, a panel convened by the National Research Council of the National Academies of Science reviewed proposals submitted by research groups from around the country and allocated time on Anton at NRBSC for 47 of these proposals. Many of these projects — as described in the rest of this article — have already produced new scientific insights about protein structure and function. Several papers have been accepted for publication, and other researchers have harvested unprecedented amounts of biomolecular simulation data that they’re still analyzing. As a result of these successes, NRBSC and DESRES renewed the program, enabling a new round of projects that will begin in October 2011.

How Proteins Get in Shape

PHOTO: KlausGroup

Martin Gruebele (left), Yanxin Liu (graduate student) and Klaus Schulten, with the lambda-repressor fragment that they simulated in the background.

“Anton is an amazing machine,” says Martin Gruebele, of the University of Illinois, Urbana-Champaign (UIUC). Using Anton at PSC, graduate student Yanxin Liu, collaborating with Gruebele and with UIUC biophysicist Klaus Schulten, successfully simulated “protein folding” of a protein (lambda6-85) that, at 80 amino acids, is more than twice as large as the largest proteins whose folding had previously been successfully simulated and published.

The results are a major advance toward solving a challenge that scientists have called “the protein-folding problem.” When a cell produces a brand-new protein, it’s a droopy, unstructured chain of amino acids. Before it can begin to carry out its biological function, this chain must fold into the proper three-dimensional configuration. The result is a structured complex of folds, ribbons and helices, with clefts and notches that allow the protein to attach and release other molecules.

“The ability of a protein to fold into its characteristic three-dimensional structure,” says Gruebele, “is crucial for living cells. Misfolded proteins not only lose their functions, but can also cause diseases, including Alzheimer’s and Huntington’s disease.”

The Anton simulations of lambda6-85 by Liu, Gruebele and Schulten produced a folded form of the protein that compared well with experimental findings. “Anton enables simulations at full atomistic detail,” says Gruebele, “all protein atoms and water molecules included, for a long enough time for a protein to be folded from ‘first principles’ on the computer.”

For a protein of this size, this represents a big step toward the goal of being able to calculate the accurate folded structure of a protein from knowing only its amino-acid sequence. This is a major objective of molecular biology, since current methods of deriving a protein’s folded structure depend on experimental processes (X-ray crystallography and NMR) that take months or years to find the structure of a single protein.

The “magic number,” says Gruebele, is about 200 amino acids. “The largest single domain in most proteins is about that size, with much evidence that these domains fold relatively independently from each other. If we can do reliable simulations on that scale, we’ll be able by running on the computer to do what otherwise takes experiments and years of analysis. Our work with Anton on lambda6-85 is exciting because it shows that this goal is within reach.”

PHOTO: KlausGroup

How A Protein Folds
Snapshots of the folding of lambda6-85 during a simulation performed on Anton. (ns = nanoseconds.) The native state of lambda6-85 is shown on the right for comparison.

Finding the Leak in Membrane Transporters

Emad Tajkhorshid

Another Anton project, led by Emad Tajkhorshid, also at UIUC, focuses on a family of proteins known as “membrane transporters.” Like finely engineered doorway systems, these proteins reside in biological membranes and create highly regulated passageways for biomolecules, such as neurotransmitters, to cross from outside the cell to inside and vice versa.

With Anton, Tajkhorshid and collaborators simulated the structural changes in these transporters over a much longer period of time than had previously been possible. “Before Anton,” he says, “we could simulate maybe 100 nanoseconds of protein motion. With Anton we were able to run several microseconds of simulation, more than 100 times longer in biological time.”

Because of how they work — one side must close as the other side opens — membrane transporters undergo large changes in structure as part of their function. This has made them difficult to study with MD. “These proteins,” says Tajkhorshid, “are molecular machines that have to open on one side and close on the other in a highly coordinated manner as they go through their transport cycle. Large conformational change is key to understanding their mechanism, and with the MD simulations we could do before, we could observe initial motions only, not enough to be significant.”


This image from the simulations shows a membrane transporter protein (gold) within the cell membrane with water molecules (gray surface), sodium (pink) and chloride (green) ions inside the cell (top) and in the extracellular environment (bottom). A substrate molecule (galactose: red & blue) is midway in the transporter. Overall this simulation included 84,000 atoms.

(Inset) This closeup view from the simulation shows water molecules (blue surface) passing through the transporter along with the galactose substrate molecule.

By extending transporter simulations into the microsecond range, Tajkhorshid has begun to characterize a phenomenon in which water “leaks” through these proteins. As the transporters switch from being open on one side to being open on the other, they take in water molecules that can then pass through to the other side along with the “substrate” molecule. This had been noticed experimentally, says Tajkhorshid, but not understood in detail.

Tajkhorshid’s Anton simulations show water leakage in four different cases. “We have a collection of simulations,” he says, “in which we’ve observed this phenomenon for all the transporter sub-families that we investigated.” Their findings, which they have reported in a submitted paper, point toward further experiments and suggest that current understanding of membrane transporters may need to be revised.

Inner-Ear Proteins

(l to r) Rachelle Gaudet, David Corey, Marcos Sotomayor and Wilhelm Weihofen
Sotomayor credits PSC staff for the group’s success with Anton: “It’s a new machine and very powerful but you have to learn how to use it. PSC put on a workshop that was very helpful in getting these simulations started.”

A team of researchers at Harvard used Anton to arrive at a better understanding of the proteins that make it possible to hear. Residing at the tips of hair cells in the inner ear, these proteins form a thin filament called the “hair-cell tip link,” which is directly involved in transforming mechanical vibrations into the sensation of sound.

Hair bundles on the surface of the hair cells are composed of “stereocilia” (flexible, finger-like structures arranged in rows). When sound vibration stimulates the hair-cell membrane, it stirs the stereocilia into movement similar to that of a field of grain stirred by wind. The tip links connect each stereocilium to its neighbor and, as they stretch, convey the force of this wave-like movement to ion channels at the stereocilium tip, which then open to trigger electrochemical signals to the brain.


Electron microscope image of hair bundles on the surface of an inner-ear hair cell. The hair-like stereocilia respond to the force and pitch of vibration with wave-like motions that trigger corresponding electrochemical signals to the brain. Stereocilia are approximately one to three microns (millionths of a meter) long. (Credit: Fred E. Hossler, inner-ear hair cell of a guinea pig)

Professors David Corey and Rachelle Gaudet and post-doctoral fellows Marcos Sotomayor and Wilhelm A. Weihofen collaborated on the project. Laboratory work to find the 3D crystallographic structure of the tip link — a combination of two proteins — set the stage for simulations using Anton. By extending MD into the microsecond range, this work gives a much more complete picture of how one of these tip-link proteins (cadherin-23) changes when it binds with calcium ions.

Research has shown that genetic defects in cadherin-23 cause deafness, and these mutations affect the part of the protein that binds with calcium. “We know that mutations target the calcium-binding site,” says Sotomayor, “and now with the simulation for the first time we can see the whole process of calcium binding, to clarify why some amino acids are important and might be related to deafness.”

One of their Anton simulations started with the protein and calcium not bound to each other and evolved to a structure, with calcium bound, that matches well with the previously obtained 3D crystallographic structure (which includes calcium). The researchers are now looking more closely at the overall calcium-binding dynamics, in particular which amino acids are involved, which could lead to better understanding of deafness. “We expect that those amino acids,” says Sotomayor, “are involved in hereditary deafness.”


Snapshots from Anton simulations of calcium ions (numbered green spheres) binding to cadherin-23 (blue, binding site in colored stick), a protein essential to hearing. The first frame (left) shows the protein alone; the second (200 nanoseconds of biological time) shows calcium beginning to bind; the third (800 ns) shows a calcium-bound structure that agrees well with the crystallographic structure.

Attacking Chagas Disease

Chagas disease threatens millions of people from the southern United States to Argentina. It’s caused by a protozoan parasite, Trypanosoma cruzi, that enters the blood, usually by way of bites from a common insect, the triatomine bug. Anti-parasitic medications, which can have fairly severe side effects, are sometimes helpful, but 20 to 40 percent of people chronically infected with Chagas develop life-threatening heart and digestive system disorders.

[from top to bottom] César de Oliveira, Andrew McCammon, Barry Grant, Riccardo Baron

With the aim of finding more effective drug therapies, Andrew McCammon of the University of California, San Diego and post-doctoral fellows César de Oliveira, Barry Grant and Riccardo Baron used Anton to simulate an enzyme from the Chagas parasite: T. cruzi proline racemase (TcPR). Research has shown that TcPR triggers the parasite’s ability to, in effect, trick the immune system and sustain itself as an invader in the bloodstream. Their simulations, which extended to three microseconds for two different forms of the enzyme, reveal new information about how TcPR changes its structure, and offer new insight for designing a drug to defeat the disease.

Triatomine bug (Rhodnius prolixus, subfamily Triatominae), also known as the assassin bug, cone-nosed bug or kissing bug, depending on the region. It is one of several related species of nocturnal blood-sucking insects that transmit Chagas disease.

“No one has had access experimentally to the open state,” says de Oliveira. “We’ve been doing MD to characterize this unexplored state.” The open state, he explains, is responsible for stimulating a non-specific immune response — called a mitogenic B-cell response — that allows the parasite to establish infection and avoid a specific immune response that otherwise might protect the host organism.

Their simulations with Anton show, for the first time, the beginning of the opening of TcPR, with several residues around the active sites becoming exposed to solvent. “You can track what protein segments are involved in the opening motion,” says de Oliveira. With this information, the researchers are now collaborating with an experimental group that will test whether the segments identified in the simulation are involved in eliciting an immune response. In further work, McCammon, de Oliveira, Grant and Baron anticipate using the simulation data from Anton to computationally screen thousands of potential “inhibitor” molecules. The aim is to find compounds that can block TcPR’s active site from triggering a mitogenic response, which could potentially lead to a drug that defeats the parasite’s ability to threaten life.

The closed (left) and open state of the TcPR enzyme, with arrows indicating changes during simulation of the open state.