Three U.S.-U.K. Projects Link Transcontinental Grids and Multiple Sites for Interactive Simulations in Real Time

Transatlantic Federated Grid collaborative demonstrates benefit of “intergrid” computing and gains experience with co-scheduling at multiple sites.

SEATTLE, November 20, 2005 — Three research groups, with joint support from the U.S. National Science Foundation and the U.K. Engineering and Physical Sciences Research Council, linked the U.S. TeraGrid and the U.K. National Grid Service via transatlantic fiber and used supercomputing systems at multiple sites simultaneously during SC05 to carry out interactive simulations.

This visualization of a NEKTAR 3D arterial-tree simulation shows several arterial branching sites with velocity vectors (arrows) and isosurfaces indicating pressure within the artery. (From a visualization by Joe Insley, UC/ANL)

The three groups, which worked on a shared Transatlantic Federated Grid, reported their successes and lessons learned on Nov. 17 at the TeraGrid booth in the Seattle Convention Center. All three projects - NEKTAR (led by George Karniadakis, Brown University), VORTONICS (led by Bruce Boghosian, Tufts University) and SPICE (led by Peter Coveney, University College London) - grappled with challenging large-scale research problems that can be solved only with grid computing.

“We believe we have shown that linking grids is a benefit for these projects,” said Coveney, who coordinated the effort from the U.K. side. “We made progress more effectively by pooling resources and expertise, and we brought some of the difficulties involved in grid computing more sharply into focus.” SPICE (Simulated Pore Interactive Computing Environment) won the HPC Analytics Challenge award, presented for the first time at SC05 for innovative techniques in rigorous data analysis, advanced networks and high-end visualization applied to a complex, real-world problem.

Using novel algorithms (based on a mathematical relation known as Jarzynski’s identity), SPICE uses steered molecular dynamics to pull a strand of DNA through the nanometer-sized pore of a channel protein embedded in a bilayer membrane. With a total size exceeding 250,000 atoms, the problem would require 25 years, says Coveney, with “vanilla molecular dynamics.” It becomes tractable only with grid-enabled computational resources, which make it possible first to explore the parameter space of the DNA-protein system through many interactive simulations and then, having reduced the search space, to efficiently farm out around 100 large-scale non-equilibrium simulations across the federated grid. The work identified a constriction in the pore structure, says Coveney, that may have physical consequences.
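For readers unfamiliar with the method, Jarzynski’s identity relates the equilibrium free-energy difference between two states to an average over the irreversible work performed in many repeated pulling runs, which is why SPICE farms out an ensemble of non-equilibrium simulations rather than relying on a single long trajectory. A standard statement of the identity (not taken from the SPICE papers) in LaTeX form:

    % Jarzynski's identity: an exponential average of the non-equilibrium
    % work W over many pulling realizations gives the equilibrium
    % free-energy difference \Delta F, with \beta = 1/(k_B T).
    \begin{equation}
      e^{-\beta \, \Delta F} \;=\; \bigl\langle e^{-\beta \, W} \bigr\rangle
    \end{equation}

Each steered run contributes one sample of the work W to the average, so the roughly 100 large-scale simulations can proceed independently across the federated grid.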

SPICE linked systems at three TeraGrid sites (NCSA, SDSC and PSC) and UK sites at Daresbury and CSAR (Computer Services for Academic Research, University of Manchester). The steering infrastructure was managed by RealityGrid middleware, which can fire up simulations and visualizations remotely from any of the linked sites. “The federated grid provides unparalleled computational power,” said Coveney, “in a coordinated and coherent fashion. This enables the heterogeneous and geographically distributed resources to be marshaled in service of a single scientific problem.”

The NEKTAR project (computational fluid dynamics) linked supercomputing systems at four TeraGrid partner sites (NCSA, SDSC, TACC and PSC) and a UK site (CSAR), with visualization at the University of Chicago/Argonne National Laboratory (UC/ANL) streamed in real time to the SC05 show floor in Seattle. With cross-site runs on grid-linked resources, the team simulated the flow dynamics of the human arterial “tree” - the branched structure of arteries in the human body. In earlier work, the NEKTAR team performed the first cross-site simulations on the TeraGrid and extended this work during the past year to include a transatlantic component.

Atherosclerosis due to plaque formation is a major health problem strongly related to blood-flow patterns. It occurs preferentially at arterial branching sites, where blood flow can circle back on itself, like eddies on the slow-flowing outer bank of a stream when it bends. With detailed 3D simulations of these flow patterns, researchers hope to facilitate better decisions about diagnosis and surgical intervention. The human arterial tree model contains the 55 largest arteries in the human body, with 27 arterial bifurcations, at a resolution fine enough to capture the flow. This requires three to seven terabytes of total memory for the finite-element model, beyond the current capacity of any single supercomputing site.

“The challenge,” said Suchuan (Steve) Dong of Brown, who ran the demonstration from Seattle, “was how to adapt the application and devise algorithms to exploit ensembles of supercomputers to achieve high performance.” The nature of the simulation made it viable to divide the “tree” among many processors at many sites.

“This was a success,” said Dong. “We have gained significant experience in the sometimes arduous process of cross-site debugging and in the co-scheduling of a large Globus job with several subjobs on different machines. This is important learning about the practical challenges of grid computing.”

The VORTONICS project, led by Boghosian of Tufts, linked the TeraGrid sites (UC/ANL, NCSA, SDSC, TACC and PSC) with CSAR during SC05, and during their largest, most successful run (on Nov. 17) they ran cross-site at UC/ANL, NCSA, SDSC, PSC and CSAR. For this run, they linked 512 processors to carry out a lattice-Boltzmann simulation on a 1,250^3 grid (nearly two billion lattice points). VORTONICS performs direct numerical simulation of 3D Navier-Stokes flows to address problems in vortex interaction, important problems in which the time and space scales of vortex stretching and reconnection are not understood. “Our demand is geographically distributed domain distribution,” said Boghosian. “We have an enormous 3D lattice grid, and we want to be able to chop it up into pieces that reside on different SC sites.”
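A minimal sketch of the kind of distributed domain decomposition Boghosian describes appears below. It assumes a 3D Cartesian MPI communicator and placeholder dimensions; it is illustrative only and is not the VORTONICS code. With a grid-enabled MPI implementation such as MPICH-G2 (described below), the same MPI calls work whether the ranks sit in one machine room or at sites on opposite sides of the Atlantic.

    /* Illustrative sketch only: split a large 3D lattice across MPI ranks
     * with a Cartesian communicator. Not the VORTONICS code; the lattice
     * size and the update step are placeholders. */
    #include <mpi.h>
    #include <stdio.h>

    #define N 1250   /* global lattice is N x N x N, as in the SC05 run */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int nprocs, rank;
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Let MPI factor the ranks into a 3D process grid, then build a
         * periodic Cartesian communicator over it. */
        int dims[3] = {0, 0, 0}, periods[3] = {1, 1, 1}, coords[3];
        MPI_Dims_create(nprocs, 3, dims);

        MPI_Comm cart;
        MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 0, &cart);
        MPI_Cart_coords(cart, rank, 3, coords);

        /* Each rank owns roughly N/dims[i] lattice sites per dimension. */
        int local[3];
        for (int i = 0; i < 3; i++)
            local[i] = N / dims[i] + (coords[i] < N % dims[i] ? 1 : 0);

        if (rank == 0)
            printf("%d ranks in a %d x %d x %d grid; rank 0 owns %d x %d x %d sites\n",
                   nprocs, dims[0], dims[1], dims[2],
                   local[0], local[1], local[2]);

        /* A real lattice-Boltzmann step would exchange halo planes with the
         * six face neighbours (MPI_Cart_shift + MPI_Sendrecv) and then apply
         * the local collide-and-stream update. */

        MPI_Comm_free(&cart);
        MPI_Finalize();
        return 0;
    }

Because the decomposition is expressed entirely through standard MPI, the geographic placement of the ranks is decided by the job launcher and the middleware rather than by the application source.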

In their largest run from Seattle, the VORTONICS simulation injected more than nine gigabytes of data into the network.

“This joint U.S.-U.K. effort illustrates a key benefit of integrating major resources in a service-oriented architecture using grid computing capabilities,” said Charlie Catlett, director of the TeraGrid, the NSF-sponsored cyberinfrastructure program. “It shows that these tools make it possible to solve problems that otherwise can’t be solved.”

Coveney, Dong and Boghosian emphasized that the success revealed the need for more sophisticated grid capabilities, such as “co-scheduling” for simultaneous runs at multiple sites. Lack of advanced scheduling capabilities led to an inordinate amount of human intervention - to such a degree, notes Coveney, that cross-site simulations would be a “show stopper” for most computational scientists. Scheduling tools to alleviate this, he also noted, are not a daunting technical challenge.

The fundamental difficulty with routine cross-site scheduling, said Sergiu Sanielevici of PSC, who leads user services for the TeraGrid and helped coordinate scheduling for all three projects, is the scarcity of resources. “We need more nodes, so that we have a reasonable excess capacity, and we need dedicated funding for grid-scheduling algorithm development and implementation. Of course, we will be analyzing the lessons of these experiments and should be able to make some quick improvements.”

The project leaders also emphasized that this work is ongoing and that the potential of grid computing to solve important scientific problems has only begun to be tapped. “There is a synergy of collaboration,” said Boghosian, “that is part of the energy of grid computing.”

For both NEKTAR and VORTONICS, cross-site message passing (via MPI) was coordinated with MPICH-G2 middleware, developed by Nicholas Karonis of Northern Illinois University and UC/ANL, who collaborated with both the NEKTAR and VORTONICS projects. “MPICH-G2 was essential to making this work,” said Boghosian. “If you’ve written your code with MPI, you don’t have to change your internal code to migrate to the grid.”

Other collaborators in the NEKTAR group along with Karniadakis, Dong and Karonis are Leopold Grinberg of Brown, Michael E. Papka and Joseph A. Insley of UC/ANL, Alex Yakhot of Ben Gurion University and Spencer Sherwin of Imperial College.

Boghosian’s collaborators along with Karonis are Lucas Finn and Christopher Kottke. Coveney collaborates with Shantenu Jha of University College London and colleagues at the University of Manchester, UK.

The TeraGrid, sponsored by the National Science Foundation, is a partnership of people and resources that provides a comprehensive cyberinfrastructure to enable discovery in U.S. science and engineering research. Through high-performance network connections, the TeraGrid integrates a distributed set of very-high capability computational, data management and visualization resources to make U.S. research more productive. With Science Gateway collaborations and education and mentoring programs, the TeraGrid also connects and broadens scientific communities.

The National Grid Service aims to fulfil a similar role in the United Kingdom.

For more information, see http://www.teragrid.org

See also:
Ketchup on the Grid with Joysticks, Projects in Scientific Computing, 2004