The Little Yellow Elephant at PSC

This post was going to be an introduction to Hadoop, but I found there are many, many, many pages doing just that. (And an hour-long O’Reilly Webcast, which I highly recommend.) Instead, I thought I’d give you a little tour of what Hadoop does for us at PSC.

Hadoop helps us make better use of our newly upgraded disk storage system. The tricky part is that we are now tracking approximately 4,000 disk drives, 7 servers, and Ethernet ports. Each of these items generates elephantine log files that would otherwise be impractical to analyze. This is where the little yellow elephant, Hadoop, comes in.

Hadoop allows us to easily analyze the vast amounts of information coming in from all of the disks, servers, and ports. We can follow trends and pinpoint issues as they arise. (Not that we have failed disks, but you know, just in case!)
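To give a flavor of what that kind of analysis looks like, here is a minimal sketch of the map-and-reduce pattern Hadoop is built around, written in plain Python so it runs on a laptop without a cluster. It is not our production job: the log format, the device names, and the function names (`map_line`, `reduce_counts`, `count_errors`) are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical log format (an assumption, not PSC's real one):
#   "<timestamp> <device-id> <severity> <message>"
SAMPLE_LOG = [
    "2012-02-01T00:00:01 disk-0001 INFO spin-up complete",
    "2012-02-01T00:00:02 disk-0002 ERROR read retry",
    "2012-02-01T00:00:03 disk-0002 ERROR read retry",
    "2012-02-01T00:00:04 port-17 WARN link flap",
    "malformed line",
]


def map_line(line):
    """Map step: emit (device, 1) for each ERROR line; skip everything else."""
    parts = line.split(maxsplit=3)
    if len(parts) >= 3 and parts[2] == "ERROR":
        yield parts[1], 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each device key."""
    totals = defaultdict(int)
    for device, count in pairs:
        totals[device] += count
    return dict(totals)


def count_errors(lines):
    """Run the map step over every line, then reduce the emitted pairs."""
    pairs = (pair for line in lines for pair in map_line(line))
    return reduce_counts(pairs)


if __name__ == "__main__":
    print(count_errors(SAMPLE_LOG))
```

On a real cluster, Hadoop would run the map step on many machines at once, each working on its own slice of the logs, and then group the emitted pairs by device before handing them to the reduce step. The sketch does both steps in one process, which is enough to show the idea.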

One way to analyze issues and trends is via Mahout. (We did the Wikipedia search for you: a mahout is a person who drives an elephant. Cool, right?) Apache Mahout provides machine-learning libraries for collaborative filtering, clustering, frequent pattern mining, and genetic algorithms. It is one of a number of add-ons for Hadoop, which is the beauty of open source.

If you’d like to learn more about Hadoop, Mahout, or anything else relating to managing, storing, and analyzing big data, you can come visit PSC for our monthly Pittsburgh Hadoop Users Group Meetup.

Our next meetup will be Wednesday, February 15, and we’ll be joined by Shannon Quinn, who will talk to us about Mahout; Bryon Gill, who will talk about how he used Hadoop to revitalize old hardware and build a new cluster; and JRay Scott, who will give a brief status update on building a Hadoop cluster on his Lenovo X200 Windows laptop.

The Hadoop Users Group Pittsburgh consists of members from local universities and companies. We meet 6 – 8 p.m. at the Pittsburgh Supercomputing Center, 300 S. Craig St., The Stiles Lecture Hall, Room 103.

Editor’s note: Beth Albert is an occasional blog contributor. She works in our Facilities group as a technical writer.

About Ken Chiacchia

Ken Chiacchia's bio reads like a random sampling of events from different people's lives. The senior science writer at PSC, Ken has been a biochemist, a public relations writer, a freelance newspaper reporter, a science fiction author, an emergency responder, and a hobby farmer. A volunteer dog handler and wilderness EMT with Allegheny Mountain Rescue Group of Pittsburgh, he lives with his wife, dog trainer and writer Heather Houlahan, and an assorted cloud of canine partners and fosters, barn cats, chickens, turkeys, ducks, and goats, on a 26-acre farm in Harmony, Pa.
