Improved Network Diagnosis and Repair Goal of NSF Grant
Nov. 12, 2015
A proposed automated system will allow users to provide network administrators with information about faulty connections, greatly speeding network diagnosis and repair. This system, to be called TestRig 2.0, is the goal of a new $300,000 NSF grant to researchers at the Pittsburgh Supercomputing Center (PSC).
“When researchers encounter network problems they naturally reach out to network engineers,” says Chris Rapier, PSC Senior Research Programmer and principal investigator on the grant. “However, the engineers have to rely on the user to provide them with enough information to properly diagnose the problem. This means multiple rounds of email, phone calls, tests, and oftentimes results in significant delays.”
It's not unusual for this process to take days, if not weeks, before the engineers receive enough information to start their diagnosis and implement a solution. TestRig will get around this back-and-forth cycle by giving the user a dynamically generated Linux disk image that reboots the system into a known-good environment and automatically performs a variety of tests. TestRig will then send the test results to the appropriate network engineer without user intervention.
“It's automated from start to finish,” Rapier adds. “So where it once took days to collect the relevant information, it can now be completed in less than 15 minutes.”
Importantly, TestRig will be made available to network operations centers (NOCs) and engineers without requiring the installation of any local infrastructure. “The goal is to make this as easy as possible,” Rapier says. “So instead of having the NOCs install servers, distribute the bootable disk images, and maintain databases, we’ll do all that for them. All they will need to do is sign up for our service and create a local account that can receive the files.”
TestRig will perform tests using existing perfSONAR network measurement servers. The project builds on PSC's ongoing Web10G effort to open TCP/IP networking protocols so that network administrators and users can identify and repair networking problems.