News Center


Mind the Gap

Pittsburgh Supercomputing Center's MARC Program Helps Minority-Serving Institutions Prepare Students for 21st Century Biology Careers

August 12, 2013

In 1996, Ricardo González Mendez decided to revamp his skills and expertise in bioinformatics—using advanced computing techniques on biological problems. In the process, he learned something alarming: American biology education was in danger of becoming a two-class system.

A gap is developing, the University of Puerto Rico (UPR) School of Medicine professor realized. The top-tier institutions understand that bioinformatics will soon be a job requirement in much of biology. And they have expended their considerable resources to create bioinformatics classes, degree programs and research centers. Among the other institutions, many realized as well that a bioinformatics crisis was coming. Their students were in danger of being left behind, whether they wanted to pursue academic research, industrial positions or even teach. But the schools had neither the expertise nor the resources to respond.

“There is a total disconnect between the top-tier research schools and the minority or not-so-rich schools,” Gonzalez warns.

Today’s biologists are generating incredible amounts of data. The European Bioinformatics Institute alone, for example, now stores 20 petabytes of life sciences data — enough to fill nearly a quarter of a million top-line iPads. Understanding that much data can only be accomplished with computers.

“People are starting to come to the realization that either they modernize their skills, or they’re not going to get more funding,” Gonzalez says.

Gonzalez enlisted the then-director of PSC’s biomedical initiative, David Deerfield, who with PSC’s Hugh Nicholas and Alex Ropelewski wrote a 2001 National Institutes of Health Minority Access to Research Careers (MARC) grant to help students at minority-serving institutions study bioinformatics. Gonzalez was the PSC MARC program’s first faculty liaison. After Deerfield’s tragic death in 2006, Gonzalez took on an expanded role as co-principal investigator with Nicholas.

Initially, the program focused on helping institutions establish a single bioinformatics course on campus. “The focus of the current grant is on working with five minority-serving [partner] institutions to get concentrations in bioinformatics on their campuses,” says Ropelewski. “It’s basically a multi-course work series that’s established officially, so that someone can minor in bioinformatics, for example.” The program fills gaps in students’ knowledge base — teaching biologists about computers, and computer scientists about biology.

PSC’s MARC program has enjoyed a number of successes. The students are “publishing in really good journals, getting funded from various agencies, and … going on to graduate programs, post-doctoral fellowships and positions in industry and government that are very good,” Gonzalez says. “We are not as big as some [of the top] programs, but we produce the same kind of quality.”

One of PSC MARC’s most exciting facets is a 10-week summer program, in which students from participating institutions come to PSC to carry out bioinformatics research projects. Here we present a sampling of this year’s students and their efforts.

Grabbing the Brass Ring

People had plenty of advice for Tevin Reed, an incoming senior at North Carolina Agricultural and Technical State University, about what not to major in. His band director warned about the grim employment prospects for a brass musician. Reed’s sister, an information technology major, always seemed to need another expensive tool for her projects. But in computer science, she told him, “As long as you have your laptop you can always do your homework.”

Reed found he loved computer science. “I’m not going to say I was the best at it, but I could actually understand what the teachers were saying on the first day,” he says.

Reed’s MARC project was to use the wxWidgets library to make the open source GeneDoc Windows program work on any operating system. PSC’s Hugh Nicholas, the late David Deerfield and Alex Ropelewski developed program specifications and Nicholas’ son Karl coded GeneDoc, which visualizes, highlights and rigorously compares DNA, RNA and protein sequence alignments. Reed hopes to help biologists get more consistent computational results no matter what kind of computer they use.

“It was very helpful” having the program’s authors available in working out the inevitable glitches, Reed says. “Once I finish it, I’ll talk with Alex and Hugh and see how they want to distribute it.”

Getting a Better Vantage Point

From atop Ingrid Montes-Rodriguez’s grandparents’ house in Ciales, in the mountains of Puerto Rico, you can see both the Atlantic Ocean to the north and the Caribbean Sea to the south.

“It’s really beautiful,” says Montes-Rodriguez, a PhD student in Juan López Garriga’s chemistry lab at the University of Puerto Rico, Mayagüez, who is currently doing her research work at the UPR Medical Sciences Campus. When the pressure of grad school gets to her, she heads for Ciales, and her family.

Her dad is a master of “tough love” advice: “He just says, ‘Well, you have to do it! What are you going to do, cry?’”

Thanks in part to that advice—and a PSC MARC summer project with Graham Hatfull’s lab at the University of Pittsburgh—Montes-Rodriguez has begun decoding the genetic material of the clam L. pectinata, a major focus of López Garriga’s team.

L. pectinata, which grows in mudflats throughout the Caribbean, survives levels of hydrogen sulfide that would kill most animals. It does this in part by producing a unique kind of hemoglobin, which attaches to that “rotten egg” chemical instead of oxygen.

Montes-Rodriguez hopes that comparing the genome of the clam with the DNA of other species will help explain how the unique hemoglobin evolved, and what other protective mechanisms the species has developed.

Putting the “Tech” in Biotech

Michael Thompson loves the gadgets.

“To be honest, I really love technology,” says the incoming Jackson State University senior. “I’m just fascinated by it.”

As a freshman, Thompson wasn’t sure what he wanted to study. But he’d scored high in biology on the Mississippi state high school tests, and when Raphael Isokpehi, director of Jackson State’s Center for Bioinformatics and Computational Biology, invited him to visit Isokpehi’s lab, he decided to check it out.

“There were a lot of computers but I didn’t notice any microscopes or anything like that,” Thompson says. “I thought, ‘That’s weird.’” That’s where, for the first time, Thompson found out that he could do biology and engage in his love of technology.

At the MARC summer workshop, Thompson continued his project from Isokpehi’s lab, studying the universal stress proteins (USPs) in the Clostridia bacteria. USPs are an important part of an organism’s defense against stresses such as antibiotics. Because of that, they’re an important target for treating food poisoning, colitis, tetanus and other diseases caused by Clostridia.

“Being able to have Alex, Hugh and Pallavi [Ishwad, PSC’s Education Program director] guide me in the right direction has been amazing,” Thompson says. “I feel like I can take a lot back home and teach others.”

Summer Job

Not all the success stories in this summer’s MARC program involve MARC students.

Jonathan Strickland, a recent high school graduate of the Pittsburgh Science and Technology Academy, knew that he wanted to work with computers. It’s probably fair to say, though, that after a senior project at PSC he’s aiming considerably higher in the industry—and it started with this year’s summer employment.

“I kind of pictured having a summer job at Arby’s,” he says. But the now-University of Arizona Honors College freshman and full scholarship recipient found himself on a different track. Working with PSC’s Alex Ropelewski, he did a project analyzing what factors affect the RNA sequence assembly program Trinity’s performance on supercomputers (See p. X). “I guess I did a good job, because Mr. Ropelewski asked me if I wanted to work at PSC this summer” helping with the MARC program.

So Strickland helped keep the MARC workshop running smoothly. He did some purely gofer tasks. But he also set up user accounts for the MARC students. He even helped out with the Python programming language class, which he himself learned during his senior project experience.

“Working at PSC is great,” he says. “I’m actually doing stuff that’s meaningful.”

Last Updated on Friday, 10 January 2014 09:10
 

CFL Software, PSC Collaborate on Next Generation of Information Searching

July 18, 2013

New software being developed by CFL Software may transform our ability to search for information in text documents as profoundly as search engines improved upon paper library card catalogs. The software, CFL Discover, will search electronic text documents far more completely and accurately than possible with today’s search technologies.

Pittsburgh Supercomputing Center (PSC) is collaborating with CFL as a strategic partner in developing CFL Discover, making the software available to researchers on Sherlock, a modified version of YarcData’s Urika™, a real-time data discovery appliance at the center.

“This is a new venture both in terms of scale and speed in searching for information,” says David Woolls, CEO of CFL Software, which specializes in linguistic document forensics. “In essence, we take over where search engines stop.”

While many users may not be aware of it, search engines don’t completely search all the text in the entire Web — that would take far too long. Instead, they search indexes, keywords, categories and other “metadata” that have been added to those documents. In the case of keywords and categories, that addition has to be made by humans, and so is time-intensive and incomplete. Today’s engines obviously revolutionized our ability to find information, but they are inexact. Many irrelevant sites pop up, and many sites that may be more suitable aren’t captured. In a sense, we all stop when we reach a site that is “good enough” rather than one that’s best for our needs.

“Search engines start with a few words and return a list of documents which contain them,” Woolls adds. “CFL Discover starts with one or more of those documents and reads them for you, shows you the terminology that is shared and gives immediate access to the passages of particular interest to you.”

The program uses the industry-standard SPARQL query language and RDF (Resource Description Framework), as supported on YarcData’s platform, to search entire texts for meaningful connections between the words in a search query and related language in other texts. This kind of “graph search” enables someone searching for information to find relevant connections that they may not have thought of. The program is written in Java, so it is platform independent and can work on anything from a standard PC to a Java-capable supercomputer. (While most supercomputers can’t run Java, two at PSC, Sherlock and Blacklight, do, providing valuable support for research communities that primarily use Java for data analytics.) The choice of platform and computer is solely dependent on the volume and speed of response required.

“It’s less like searching for a needle in a haystack than searching for a needle in a needlestack,” says Arvind Parthasarathi, President, YarcData. The advantage of CFL Discover is that it allows related groups of documents to be rapidly identified, not on the basis of pre-determined keywords and categories, but purely on the similarity of the content. This in turn allows the rapid creation of new combined databases from a collection of existing databases. For example, when searching Wikipedia, entering the title of an article causes CFL Discover to read the database, returning a comprehensive list of potentially interesting articles related to the whole content. And because the framework is RDF, searches of other RDF collections can be readily performed. The principles on which the program works allow it to be used in many different languages, including Arabic, Chinese, Thai and Finnish, which appear to be very disparate to the human eye.
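The idea of grouping documents by the similarity of their content, rather than by assigned keywords, can be illustrated with a minimal Python sketch. This is not CFL Discover’s method, which the article does not describe in detail. It assumes a much simpler stand-in, Jaccard similarity over word sets, and the function names, threshold and toy documents are all hypothetical.

```python
import re
from itertools import combinations


def word_set(text):
    """Lower-case a text and return its set of distinct words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity: shared words divided by all distinct words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_pairs(docs, threshold=0.3):
    """Return (title, title, score) for each pair scoring at or above threshold."""
    words = {title: word_set(body) for title, body in docs.items()}
    pairs = []
    for t1, t2 in combinations(sorted(words), 2):
        score = jaccard(words[t1], words[t2])
        if score >= threshold:
            pairs.append((t1, t2, score))
    # Most similar pairs first.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical toy documents, invented for illustration only.
docs = {
    "clam": "the clam produces hemoglobin that binds hydrogen sulfide",
    "worm": "the worm produces hemoglobin that binds hydrogen sulfide too",
    "stocks": "odd lot trades were missing from the official market count",
}
pairs = related_pairs(docs)
```

Here the two biology texts are paired because most of their words overlap, while the finance text shares almost nothing with either and falls below the assumed threshold. No keywords or categories were assigned to any document.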

“The structures and sequences inherent to individual documents are all that are needed to encode them,” Parthasarathi says. “New material is easily added to existing stores and is immediately available for use by the search queries.”

CFL Software has carried out proof-of-concept studies using CFL Discover to search the description sections of U.S. Patent Office records and legal documents, as well as Wikipedia. The collaboration with PSC will employ the program on PSC’s Sherlock, which is optimized to search extremely large and complex bodies of information with open-ended queries. The new work will explore a substantial portion of the U.S. Patent database, in addition to the full data of Wikipedia in more depth.

“PSC’s role in the partnership is to couple the unique analytic capability of Sherlock running CFL Discover with hosting massive datasets on PSC’s Data Supercell to expand text analytics to unprecedented, interdisciplinary use cases,” says Nick Nystrom, PSC’s director of strategic applications. “Response time is critical for exploring big data, and Sherlock with CFL Discover will provide rapid analyses of unstructured text data larger than can be done on any platform currently available to U.S. researchers.”

“We see high value for a wide range of research and societal applications,” Nystrom adds. Examples include analyzing recent events from news and social media sources, extracting deeper insights from sets of publications, and enabling computational history and culturomics — the quantitative study of cultural phenomena by analyzing large volumes of written records. “Application of high-performance analytics is new to these and similar fields, and will catalyze new ways of leveraging unstructured text data.”

Last Updated on Thursday, 18 July 2013 10:23
 

Blacklight Research Spurs Change in Stock Exchange Rules

July 15, 2013

Findings on the effects of “odd lot” trades on the financial markets, using computations on PSC’s Blacklight, have led the New York Stock Exchange, the Nasdaq Stock Market and the Financial Industry Regulatory Authority Inc. to redefine how the industry tracks small stock trades. The new rules will be enacted in October.

Previously, odd lots (trades of fewer than 100 shares) did not have to be reported to regulators. The rationale was that these trades involved small investors who were unlikely to affect the larger market significantly. But recent volatility in the markets, driven by automated small trades that occur far faster than any human can think, called that assumption into question.

In an upcoming paper in The Journal of Finance, Mao Ye and Chen Yao of the University of Illinois at Urbana-Champaign (UIUC) and Maureen O’Hara of Cornell University report that odd lots are playing an increasingly important role in the wider behavior of the markets. The researchers used Blacklight and the San Diego Supercomputer Center’s Gordon to analyze market data for the effects of odd-lot trading.

“For every 100 trades of Google, 52 to 53 of them” are in the form of odd lots, Ye observes. “There are more missing trades than trades you can see. In terms of volume, more than 20 percent of the trading volume [among all stocks] is missing” in the official count.
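The two measures Ye cites, the share of trades and the share of volume that go unreported, are different quantities, and a short Python sketch shows how each would be computed. It assumes the conventional definition of an odd lot as a trade of fewer than 100 shares. The trade sizes are hypothetical and are not taken from the researchers’ data.

```python
ROUND_LOT = 100  # shares; assumed cutoff, trades smaller than this are odd lots


def odd_lot_shares(trade_sizes):
    """Return (fraction of trades, fraction of share volume) made up of odd lots."""
    odd = [size for size in trade_sizes if size < ROUND_LOT]
    trade_fraction = len(odd) / len(trade_sizes)
    volume_fraction = sum(odd) / sum(trade_sizes)
    return trade_fraction, volume_fraction


# Hypothetical trade sizes in shares, invented for illustration only.
sizes = [10, 25, 50, 100, 200, 15, 300, 40, 60, 200]
trade_fraction, volume_fraction = odd_lot_shares(sizes)
print(f"odd lots: {trade_fraction:.0%} of trades, {volume_fraction:.0%} of volume")
```

Because odd lots are small by definition, they can make up a majority of trades while accounting for a much smaller share of volume, which is the pattern the quoted figures describe.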

The widely held suspicion is that the largest and most sophisticated traders are using automated trading in odd lots to hide their activities from other traders. In any case, the researchers showed that including the odd lots significantly alters our understanding of the markets. Partly in response to this research, in June 2013 the market authorities agreed to a plan to require all trades, of as few as one share, to be reported.

“In the U.S., they care a lot about the transparency of the market,” Ye explains. The new rule change will remove “a kind of darkness we cannot see and that we never realized was there.”

PSC covered the group’s work in detail in a recent article.

Last Updated on Monday, 15 July 2013 08:09
 

2014 Pennsylvania State Budget Includes $500,000 for Pittsburgh Supercomputing Center

July 2, 2013

The Commonwealth of Pennsylvania budget signed by Gov. Tom Corbett on June 30 includes a $500,000 line item for PSC.

“This is very good news for PSC and for the Commonwealth,” says Ralph Roskies, scientific director for PSC, adding that the state’s return on its past investments in PSC has been excellent. “Since our inception we’ve brought over $500 million in outside funds into Pennsylvania, representing a 14:1 return on state funding for PSC.”

“We’re grateful to the members of the General Assembly, and especially the Allegheny County delegation,” Roskies adds. “The bipartisan support of Senators Randy Vulakovich and Jay Costa and Representatives Mark Mustio and Joe Markosek made this possible.”

The funding, says PSC’s leadership, will benefit the state’s technological and workforce infrastructures as well.

“PSC is responsible for generating 1,600 jobs and over $200 million in annual economic activity,” says Cheryl Begandy, PSC’s director of education, outreach, and training. “In addition, our place on the leading edge of computing technologies at the largest scale enables us to respond quickly to technological developments, giving the state, its researchers and its small and mid-sized companies a leg up in capitalizing on these advances.”

The state line item will also prove valuable to PSC’s ongoing competition for federal research funding. Local funding is often seen by granting agencies as concrete evidence of grassroots support for a research center.

“In our fight for federal awards, we’re competing with some of the best high performance computing centers in the world, many of which enjoy significant state funding,” Roskies says. “The state line item will help us retain a competitive edge over and above the excellence of our proposals themselves.”

The details of the line item have yet to be worked out with the state, Roskies says. Potential projects include:

  • supporting the Commonwealth’s STEM Education initiative through PSC programs in Computational Reasoning and Bioinformatics
  • collaborating with the Pennsylvania State System of Higher Education to support research and education at its 14 state universities
  • supporting small and mid-sized manufacturers in Pennsylvania through the introduction of Digital Modeling tools, resources and training
  • encouraging workforce development through internships for undergraduate or graduate students
  • continuing PSC core management and outreach efforts expected by federal and other granting agencies

Last Updated on Tuesday, 02 July 2013 09:37
 

Pittsburgh Supercomputing Center, Numascale AS to Collaborate on Improved Memory Systems for Research

June 28, 2013

Pittsburgh Supercomputing Center (PSC), Pennsylvania’s only National Science Foundation high performance computing facility, and Numascale AS, whose products support the construction of low-cost, scalable-server computer systems, have launched a collaborative project investigating the applicability of Numascale systems to the many research projects requiring more directly addressable memory than is readily available on single, commodity, multi-socket, large memory servers.

“Rapid advancement in many scientific fields of data-dependent research will be facilitated by the availability of larger memory systems at near commodity prices,” says Michael J. Levine, scientific director, PSC. “Having large amounts of data in directly-addressable memory avoids very time-consuming disk input/output and allows a much more productive programming paradigm.”

The field of supercomputing is well known for engineering extreme processing speeds, but increasingly researchers’ calculations are limited not by the speed of processing but by access to and efficient use of vast amounts of data. Application areas that require very large memories include natural language processing, multi-organism genomics and quantum chemistry.

“We see the collaboration with Pittsburgh Supercomputing Center as an important milestone for utilizing NumaConnect™ for a number of applications that have previously been limited by inferior memory capacity in standard servers,” says Einar Rustad, CTO and co-founder of Numascale. “The huge and scalable memory capacity in systems with NumaConnect allows users to operate in the familiar programming and runtime environment they are used to with workstations.”

This, Rustad explains, eliminates the need for explicit message passing and significantly reduces the overall time from idea to solution for a number of important applications in many scientific fields. “PSC's unique expertise will strengthen our focus on applications that are key to advances in major scientific fields and help us to widen the market for Numascale.”

The collaboration between PSC and Numascale seeks to leverage PSC’s unique and extensive experience with very large memory computing systems and Numascale’s NumaConnect memory technology to produce systems capable of handling such large data volumes without memory-retrieval lags. NumaConnect uses commodity servers as building blocks to provide memory capacities and retrieval speeds currently only available through high-end and enterprise-class systems. PSC’s application specialists will work with Numascale engineers and application programmers to find ways the two organizations’ experience and expertise can be combined synergistically.

Last Updated on Friday, 28 June 2013 10:38
 



Subscriptions: You can receive PSC news releases via e-mail. Send a blank e-mail to psc-wire-join@psc.edu.

Media Contacts

Ken Chiacchia
Pittsburgh Supercomputing Center
chiacchi@psc.edu
412.268.4960
 
Shandra Williams
Pittsburgh Supercomputing Center
shandraw@psc.edu
412.268.4960
