News & Publications

PSC Developing HOV Lane for Big Data Transfers

PSC Developing Networking Tool to Speed Big Data Transfers

DANCES will create virtual "HOV lane" for larger scientific users in Internet2

Tuesday, April 8, 2014

A new, $1 million National Science Foundation grant will enable engineers at Pittsburgh Supercomputing Center (PSC), the National Institute for Computational Sciences, the Pennsylvania State University, the Georgia Institute of Technology, the Texas Advanced Computing Center and the National Center for Supercomputing Applications to create a new tool for high-volume scientific users to achieve faster data transfers over Internet2.

The Developing Applications with Networking Capabilities via End-to-End SDN (DANCES) project will add network bandwidth scheduling capability to the network infrastructure and the supercomputing applications used by the collaborating sites. The DANCES team will develop and integrate file system, scheduling and networking software along with advanced networking hardware. Their aim is to prevent “Big Data” users who are transferring vast amounts of data from being slowed or even halted by periodic surges in competing network traffic.

“There currently is no tool that schedules network resources automatically within our existing scheduling systems,” says Kathy Benninger, PSC Manager of Networking Research and principal investigator in DANCES. “You figure out when you think you need to start your data transfer and then you do it manually.”

But the egalitarian structure of the Internet, and of the protocol underlying the majority of network traffic, causes problems for Big Data users. Such researchers and engineers must compete on an equal footing with many other users of all sizes. For example, a researcher transferring a 100-terabyte data set over a 10 Gbps Internet2 research connection could complete the transfer in just over 22 hours. A home user with a typical 15 Mbps Internet connection would need almost 1.7 years to complete the same download. But even on the research-only Internet2, such a large user could be bumped and sometimes halted by surges of traffic from other users. An automatic tool that protects designated flows from local congestion, essentially creating a “high occupancy vehicle lane” for large-scale data by prioritizing its traffic, would provide dramatically faster network speeds for Big Data users.
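The transfer times quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes decimal units (1 terabyte = 10^12 bytes), a link running at its full rated speed for the whole transfer, and no protocol overhead. The function and variable names are made up for this example.

```python
# Idealized transfer time: data size in bits divided by link rate in bits per second.
# Assumptions: decimal units (1 TB = 10**12 bytes), full line rate, no overhead.

def transfer_seconds(size_bytes: float, rate_bits_per_s: float) -> float:
    """Seconds needed to move size_bytes over a link of the given rate."""
    return size_bytes * 8 / rate_bits_per_s

DATASET_BYTES = 100 * 10**12   # 100 terabytes
RESEARCH_LINK = 10 * 10**9     # 10 Gbps research connection
HOME_LINK = 15 * 10**6         # 15 Mbps home connection

research_hours = transfer_seconds(DATASET_BYTES, RESEARCH_LINK) / 3600
home_years = transfer_seconds(DATASET_BYTES, HOME_LINK) / (3600 * 24 * 365)

print(f"10 Gbps: {research_hours:.1f} hours")   # 22.2 hours
print(f"15 Mbps: {home_years:.2f} years")       # 1.69 years
```

Real transfers would take longer than these figures, since competing traffic and protocol behavior reduce the usable rate.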

“The idea behind the DANCES tool is that you have an idea of how much data you need to transfer, how long you want to take, and how long your computations will take,” says Joe Lappa, Operations Networking Manager for XSEDE. “So the tool will work backwards and grab the data you need from a site and a network path that isn’t crowded.”
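The "work backwards" idea Lappa describes can be sketched in a few lines. This is not the DANCES software, whose design is not described here. It is a minimal illustration that assumes each candidate site advertises a single available bandwidth figure. The `Source` class, the `plan_transfer` function and the site names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Source:
    name: str              # hypothetical site label
    available_gbps: float  # bandwidth assumed free on the path to this site

def plan_transfer(size_tb: float, hours_until_compute: float,
                  sources: List[Source]) -> Optional[Tuple[str, float, float]]:
    """Pick the least-crowded source, then work backwards from the compute start.

    Returns (site name, transfer hours, latest start in hours from now),
    or None if even the best path cannot deliver the data in time.
    """
    best = max(sources, key=lambda s: s.available_gbps)
    transfer_hours = size_tb * 8e12 / (best.available_gbps * 1e9) / 3600
    latest_start = hours_until_compute - transfer_hours
    if latest_start < 0:
        return None
    return best.name, transfer_hours, latest_start

sites = [Source("site-a", 4.0), Source("site-b", 8.0)]
plan = plan_transfer(size_tb=50, hours_until_compute=24, sources=sites)
if plan is not None:
    name, hours, start = plan
    print(f"Fetch from {name}: {hours:.1f} h transfer, start within {start:.1f} h")
```

A production scheduler would also have to reserve the bandwidth it plans around. This sketch only shows the backward calculation from the deadline.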

“Instead of having a bunch of equal competing jobs at one time, you’ll be able to push priority data through at a guaranteed, predictable data rate,” Benninger adds.

In addition to developing new software, the DANCES team will use hardware upgrades at the participating institutions that will in essence provide high-speed on-ramps for Big Data users. While most of the chokepoints that the system is intended to bypass are expected to be at the level of the campuses, Internet2 is also participating by monitoring its network capacity, adding more bandwidth if necessary.

Ultimately, the system will provide larger benefits as well. With the Big Data transfers DANCES is designed to serve, the energy wasted and heat generated by slow network speeds are significant.

“It’s greener,” says Lappa. “Your machine’s not waiting. Everything is queued, everything is where it needs to be for a faster data transfer.”

The DANCES web site is available at http://www.dances-sdn.org.

PSC Projects Make Top Supercomputing Discovery List

Two public health projects at Pittsburgh Supercomputing Center have made HPCwire’s list of “The Top Supercomputing-Led Discoveries of 2013.” The HERMES project is analyzing vaccine supply chains in lower-income countries to identify and repair under-appreciated choke points. The VecNet Cyberinfrastructure project has created a prototype computational system to support a global malaria eradication effort.

PSC Activates Region's First 100-GE Network

Region’s First 100-Gigabit-per-Second Network Opens in Pittsburgh

Pittsburgh Supercomputing Center Brings New Technology to Regional Users

Monday, Dec. 2, 2013

Pittsburgh Supercomputing Center (PSC) has upgraded the Three Rivers Optical Exchange (3ROX) Internet2 connection to 100 gigabits per second (100-GE, or 100-gigabit Ethernet). The new connection puts 3ROX at the leading edge of academically based networks, offering users speeds 10 times those of the highest-bandwidth academic and industrial connections in the region. The new connection is about 5,000 times faster than typical home broadband Internet.

PSC Receives Four HPCwire Awards

Pittsburgh Supercomputing Center Receives Four HPCwire Awards

Awards Recognize Outstanding Achievements in Science and Technology

Monday, Nov. 18, 2013

Pittsburgh Supercomputing Center (PSC) has received top national honors in four categories of the 2013 HPCwire Readers’ and Editors’ Choice Awards. HPCwire, the trade publication for the high performance computing (HPC) community, announced the winners at the start of the opening reception at the 2013 International Conference for High Performance Computing, Networking, Storage and Analysis (SC13), in Denver, Colorado.

$178,000 Grant for Tool to Spot Network Glitches

Web10G Project Will Create User-Friendly “Dashboard” for Identifying, Fixing Data Slowdowns

Tuesday, Oct. 29, 2013

The Web10G Project has received a one-year, $178,000 Software Development for Cyberinfrastructure (SDCI) supplemental award from the National Science Foundation to develop a “dashboard” that will allow users of computer networks to identify when and where a networking problem is slowing or blocking their access. Web10G, funded by an earlier, three-year SDCI grant, is a collaboration between Pittsburgh Supercomputing Center (PSC) and the National Center for Supercomputing Applications.

“We’ve found that a lot of network users either have unrealistically high expectations or unrealistically low expectations for network performance,” says Chris Rapier, PSC network applications engineer. “Web10G has produced 127 different instruments that report on what’s going on with the network connection, ways in which it might be failing and ways in which it might be improved. With the supplemental grant, we’re going to automate that process to let users know what’s reasonable and then help them work with their network operations teams to actually get the performance they need.”

PSC Lands $7.6-Million Data Exacell Grant

Pittsburgh Supercomputing Center Lands $7.6-Million NSF Grant

Four-Year Project to Prototype the Data Exacell, a Next-Generation System for Integrated Data Storage and Analytics

Monday, Oct. 21, 2013

The National Science Foundation (NSF) has approved a grant to Pittsburgh Supercomputing Center (PSC) to develop a prototype Data Exacell (DXC), a next-generation system for storing, handling and analyzing vast amounts of data. The $7.6-million, four-year grant will allow PSC to architect, build, test and refine DXC in collaboration with selected scientific research projects that face unique challenges in working with and analyzing “Big Data.”

MARC Program Helps Prepare Students for 21st-Century Biology Careers

Mind the Gap

MARC Program Helps Minority-Serving Institutions Prepare Students for 21st-Century Biology Careers

August 13, 2013

American biology education risks becoming a two-class system. The top-tier institutions understand that bioinformatics, the use of advanced computing techniques on biological problems, will soon be a job requirement in much of biology, and have expended considerable resources to create bioinformatics classes, degree programs and research centers. Students at institutions without such resources or expertise, on the other hand, are in danger of being left behind.

CFL Software, PSC Collaborate on Next Generation Search Technology

CFL Software, PSC Collaborate on Next Generation of Information Searching

July 18, 2013

New software being developed by CFL Software may transform our ability to search for information in text documents as profoundly as search engines improved upon paper library card catalogs. The software, CFL Discover, will search electronic text documents far more completely and accurately than possible with today’s search technologies.

Pittsburgh Supercomputing Center (PSC) is collaborating with CFL as a strategic partner in developing CFL Discover, making the software available to researchers on Sherlock, a modified version of YarcData’s Urika™, a real-time data discovery appliance at the center.

“This is a new venture both in terms of scale and speed in searching for information,” says David Woolls, CEO of CFL Software, which specializes in linguistic document forensics. “In essence, we take over where search engines stop.”

While many users may not be aware of it, search engines don’t completely search all the text in the entire Web — that would take far too long. Instead, they search indexes, keywords, categories and other “metadata” that have been added to those documents. In the case of keywords and categories, that addition has to be made by humans, and so is time-intensive and incomplete. Today’s engines obviously revolutionized our ability to find information, but they are inexact. Many irrelevant sites pop up, and many sites that may be more suitable aren’t captured. In a sense, we all stop when we reach a site that is “good enough” rather than one that’s best for our needs.

“Search engines start with a few words and return a list of documents which contain them,” Woolls adds. “CFL Discover starts with one or more of those documents and reads them for you, shows you the terminology that is shared and gives immediate access to the passages of particular interest to you.”

The program uses the industry-standard SPARQL query language and RDF (Resource Description Framework), both supported by YarcData’s appliance, to search entire texts for meaningful connections between the words in a search query and related language in other texts. This kind of “graph search” enables someone searching for information to find relevant connections that they may not have thought of. The program is written in Java, so it is platform independent and can run on anything from a standard PC to a Java-capable supercomputer. (While most supercomputers can’t run Java, two at PSC, Sherlock and Blacklight, can, providing valuable support for research communities that primarily use Java for data analytics.) The choice of platform and computer depends solely on the volume of data and the speed of response required.
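To give a feel for what a graph search over RDF-style data looks like, here is a toy sketch in plain Python. It is not CFL Discover's algorithm, and it uses neither a real RDF store nor SPARQL itself. The triples, the `mentions` predicate and both function names are invented for illustration. The pattern matcher treats `None` as a wildcard, playing roughly the role a variable plays in a SPARQL query.

```python
from collections import Counter

# Each fact is an RDF-style (subject, predicate, object) triple.
# The documents and terms below are hypothetical toy data.
TRIPLES = [
    ("doc:A", "mentions", "malaria"),
    ("doc:A", "mentions", "vaccine"),
    ("doc:B", "mentions", "vaccine"),
    ("doc:B", "mentions", "supply chain"),
    ("doc:C", "mentions", "malaria"),
    ("doc:C", "mentions", "vaccine"),
    ("doc:D", "mentions", "odd lots"),
]

def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

def related_documents(triples, start):
    """Rank other documents by how many terms they share with `start`."""
    shared = Counter()
    for _, _, term in match(triples, s=start, p="mentions"):
        for other, _, _ in match(triples, p="mentions", o=term):
            if other != start:
                shared[other] += 1
    return shared.most_common()

print(related_documents(TRIPLES, "doc:A"))  # [('doc:C', 2), ('doc:B', 1)]
```

Starting from a document rather than from keywords, the search follows shared terms outward to other documents, which is the general shape of the approach Woolls describes.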

“It’s less like searching for a needle in a haystack than searching for a needle in a needlestack,” says Arvind Parthasarathi, President, YarcData. The advantage of CFL Discover is that it allows related groups of documents to be rapidly identified, not on the basis of pre-determined keywords and categories, but purely on the similarity of the content. This in turn allows the rapid creation of new combined databases from a collection of existing databases. For example, when searching Wikipedia, entering the title of an article causes CFL Discover to read the database, returning a comprehensive list of potentially interesting articles related to the whole content. And because the framework is RDF, searches of other RDF collections can be readily performed. The principles on which the program works allow it to be used in many different languages, including Arabic, Chinese, Thai and Finnish, which appear to be very disparate to the human eye.

“The structures and sequences inherent to individual documents are all that are needed to encode them,” Parthasarathi says. “New material is easily added to existing stores and is immediately available for use by the search queries.”

CFL Software has carried out proof-of-concept studies using CFL Discover to search the description sections of U.S. Patent Office records and legal documents, as well as Wikipedia. The collaboration with PSC will employ the program on PSC’s Sherlock, which is optimized to search extremely large and complex bodies of information with open-ended queries. The new work will explore a substantial portion of the U.S. Patent database, in addition to the full data of Wikipedia in more depth.

“PSC’s role in the partnership is to couple the unique analytic capability of Sherlock running CFL Discover with hosting massive datasets on PSC’s Data Supercell to expand text analytics to unprecedented, interdisciplinary use cases,” says Nick Nystrom, PSC’s director of strategic applications. “Response time is critical for exploring big data, and Sherlock with CFL Discover will provide rapid analyses of unstructured text data larger than can be done on any platform currently available to U.S. researchers.”

“We see high value for a wide range of research and societal applications,” Nystrom adds. Examples include analyzing recent events from news and social media sources, extracting deeper insights from sets of publications, and enabling computational history and culturomics — the quantitative study of cultural phenomena by analyzing large volumes of written records. “Application of high-performance analytics is new to these and similar fields, and will catalyze new ways of leveraging unstructured text data.”

Blacklight Research Spurs Change in Stock Exchange Rules

July 15, 2013

Findings on the effects of “odd lot” trades on the financial markets, using computations on PSC’s Blacklight, have led the New York Stock Exchange, the Nasdaq Stock Market and the Financial Industry Regulatory Authority Inc. to redefine how the industry tracks small stock trades. The new rules will be enacted in October.

Previously, odd lots (trades of fewer than 100 shares) did not have to be reported to regulators. The rationale was that these trades involved small investors who were unlikely to affect the larger market significantly. But recent volatility in the markets, driven by automated small trades that occur far faster than any human can think, called that assumption into question.

In an upcoming paper in The Journal of Finance, Mao Ye and Chen Yao of the University of Illinois at Urbana-Champaign (UIUC) and Maureen O’Hara of Cornell University report that odd lots are playing an increasingly important role in the wider behavior of the markets. The researchers used Blacklight and the San Diego Supercomputer Center’s Gordon to analyze market data for the effects of odd-lot trading.

“For every 100 trades of Google, 52 to 53 of them” are in the form of odd lots, Ye observes. “There are more missing trades than trades you can see. In terms of volume, more than 20 percent of the trading volume [among all stocks] is missing” in the official count.
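The two measures Ye cites, the share of trades and the share of volume that fall in odd lots, are simple to compute once every trade is visible. The sketch below is an illustration only. It assumes the conventional definition of an odd lot as a trade of fewer than 100 shares, and the sample trade sizes are made up, not market data.

```python
ROUND_LOT = 100  # assumed conventional round-lot size, in shares

def odd_lot_shares(trade_sizes):
    """Return (fraction of trades, fraction of volume) made up of odd lots."""
    odd = [n for n in trade_sizes if n < ROUND_LOT]
    return len(odd) / len(trade_sizes), sum(odd) / sum(trade_sizes)

# Hypothetical trade sizes in shares, for illustration only.
sample = [10, 50, 100, 300, 40]
trade_fraction, volume_fraction = odd_lot_shares(sample)
print(f"{trade_fraction:.0%} of trades, {volume_fraction:.0%} of volume")
```

In this made-up sample, odd lots are most of the trades but a much smaller share of the volume, the same qualitative pattern as in the figures Ye quotes.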

The widely held suspicion is that the largest and most sophisticated traders are using automated trading in odd lots to hide their activities from other traders. In any case, the researchers showed that including the odd lots significantly alters our understanding of the markets. Partly in response to this research, in June 2013 the market authorities agreed to a plan to require all trades, of as few as one share, to be reported.

“In the U.S., they care a lot about the transparency of the market,” Ye explains. The new rule change will remove “a kind of darkness we cannot see and that we never realized was there.”

PSC covered the group’s work in detail in a recent article.

PSC Media Contacts

Media / Press Contact(s):

Kenneth Chiacchia
Pittsburgh Supercomputing Center
chiacchi@psc.edu
412-268-5869

Vivian Benton
Pittsburgh Supercomputing Center
benton@psc.edu
412.268.4960

Website Contact

Shandra Williams
Pittsburgh Supercomputing Center
shandraw@psc.edu
412.268.4960
