Pittsburgh Supercomputing Center 

Advancing the state of the art in high-performance computing, communications and data analytics.

PSC's Data Exacell

The Data Exacell (DXC) is a pilot project to create, deploy, and test software and hardware building blocks designed to support data-analytic capabilities for data-intensive scientific research.

The DXC focuses on data storage mechanisms and their coupling to specialized, powerful engines for data analytics. The data storage mechanisms are an extension of PSC’s successful Data Supercell (DSC) technology, which replaced a conventional tape-based archive with a disk-based system to economically provide the much lower latency and higher bandwidth necessary for data-intensive research projects.

The data analytics engines include PSC’s existing Blacklight and Sherlock systems and planned upgrades. Blacklight is an SGI UV 1000 with very large (2×16 TB) hardware-supported, cache-coherent shared memory for memory-resident computation on large datasets. Sherlock is a YarcData Urika™ graph-analytic appliance with extreme hardware multithreading for efficiently executing graph algorithms. It is particularly suited to datasets expressed in RDF (Resource Description Framework), for which it can accommodate graphs of approximately five billion edges.

Research groups with uniquely demanding data analytic challenges are collaborating to test and harden the DXC.

DXC extends PSC’s SLASH2 and Data Supercell technologies to enable:

• Collaborative data analytics across researchers’ sites and datasets
    – Cross-domain analytics
    – Distributed, web-based workflows
• Tightly coupled computational resources for data analytics
    – Uniquely large shared memory
    – Purpose-built graph capabilities
• Enhanced ease of use

DXC is supported by a $7.6M, four-year NSF DIBBs (Data Infrastructure Building Blocks) award.


Pilot Applications

The initial pilot applications, selected because they require diverse kinds of innovation in storage and coupled analytics, are:

• Identifying changes in gene pathways that cause tumors (University of Pittsburgh, Department of Biomedical Informatics)
• Semantic understanding of large, multimedia datasets (Carnegie Mellon University, School of Computer Science)
• Exploring and understanding the universe (National Radio Astronomy Observatory)
• Enabling bioinformatic workflows (Penn State University)
• Data integration and fusion for world history (University of Pittsburgh, School of Information Sciences and World History Data Center)
• Neural connectomics (PSC and Harvard University)

Using the Data Exacell