PSC's Data Exacell
The Data Exacell (DXC) is a pilot project to create, deploy, and test software and hardware building blocks designed to support data-analytic capabilities for data-intensive scientific research.
The DXC focuses on data storage mechanisms and their coupling to specialized, powerful engines for data analytics. The data storage mechanisms are an extension of PSC’s successful Data Supercell (DSC) technology, which replaced a conventional tape-based archive with a disk-based system to economically provide the much lower latency and higher bandwidth necessary for data-intensive research projects.
The data analytics engines include PSC’s existing Blacklight and Sherlock systems and planned upgrades. Blacklight is an SGI UV1000 with very large (2×16TB) hardware-supported cache-coherent memory for memory-resident computation on large datasets. Sherlock is a YarcData Urika™ graph-analytic appliance with extreme hardware multithreading capabilities for efficiently executing graph algorithms, particularly on datasets expressed in RDF. For RDF data, Sherlock can accommodate graphs of approximately five billion edges.
Research groups with uniquely demanding data analytic challenges are collaborating to test and harden the DXC.
DXC extends PSC’s SLASH2 and Data Supercell technologies to enable:
♦ Collaborative data analytics across researchers’ sites and datasets
— Cross-domain analytics
— Distributed, web-based workflows
♦ Tightly-coupled computational resources for data analytics
— Uniquely large shared memory
— Purpose-built graph capabilities
♦ Enhanced ease of use
DXC is supported by a $7.6M, 4-year NSF Data Infrastructure Building Blocks (DIBBs) award.
The initial pilot applications, selected because they require diverse kinds of innovation in storage and coupled analytics, are as follows:
♦ Identifying changes in gene pathways that cause tumors (University of Pittsburgh Department of Biomedical Informatics)
♦ Semantic understanding of large, multimedia datasets (Carnegie Mellon University School of Computer Science)
♦ Exploring and understanding the universe (National Radio Astronomy Observatory)
♦ Enabling bioinformatic workflows (Penn State University)
♦ Data integration and fusion for world history (University of Pittsburgh School of Information Sciences and World History Data Center)
♦ Neural connectomics (PSC and Harvard University)