NEWS FLASH: Web10Gig has launched!
See web10g.org for all future updates!
No further Web100 updates are expected. The only recent changes have been the IPv6 userland bugfix (version 1.8) and the annotated tcp-kis backport below.
While the national high-performance network infrastructure has grown tremendously in both bandwidth and accessibility, it is still common for applications, hosts, researchers and other users to be unable to take full advantage of it. Without expert attention from network engineers, users are unlikely to achieve even 10 Mbps single-stream TCP transfers, even though the underlying network infrastructure can support data rates of 100 Mbps or more. On unloaded networks, this poor performance can be attributed primarily to two factors: host system software (principally TCP) that is optimized for low-bandwidth environments, and the lack of effective instrumentation and tools to diagnose performance issues.
The Web100 project was created to address these problems. The first is addressed with automatic TCP buffer tuning. The Web100 work in this area has been merged into the mainline Linux kernel and is contained in recent releases. To address the second problem, we have created a set of TCP instruments, defined in RFC 4898. Prototypes of these instruments were implemented in Linux with the Web100 kernel patch.
Note: Web100 is now several years beyond the end-of-funding. We are still updating the kernel patches to track new Linux versions. A good portion of the user base and developers still monitor the discussion list. The only remaining support for the Web100 project is via the discussion list.
If you want production quality code, please ask your OS vendor to support RFC 4898 in their products.
The Web100 software implements instruments in the Linux TCP/IP stack. It is distributed in two pieces: a kernel patch adding the instruments, and a suite of "userland" libraries and tools for accessing the kernel instrumentation.
Current download quick links
Support for Web100 is no longer provided.
Web100 was developed and managed by The Pittsburgh Supercomputing Center through the partnerships of The National Center for Atmospheric Research, The National Center for Supercomputing Applications, Cisco Systems, and The National Science Foundation.
Install instructions for the NPAD pathdiag server
- A fairly fast, preferably Gigabit-attached *Linux* server.
- A working development environment including a C compiler and a recent version of Python.
- If you want plots of the test data (strongly recommended), you need to install gnuplot and the python-gnuplot library.
(e.g.: apt-get install gnuplot python-gnuplot )
- Web100 must be installed and properly tuned.
- The Web100 kernel patch version must be 2.5.5 or later (Linux 2.6.13 or later).
- The Web100 userland must be 1.5 or later.
- BIC and other experimental congestion control algorithms must be off
(echo reno > /proc/sys/net/ipv4/tcp_congestion_control).
- Sufficient TCP buffer space for the largest pipe you want to test
If things go wrong, see "PROBLEMS" below.
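As a rough guide to the buffer space requirement above, the TCP buffer needed to fill a path is about its bandwidth-delay product (rate times round-trip time). The following is a minimal sketch of that arithmetic; the function name and the example path figures are illustrative assumptions and are not part of NPAD or Web100.

```python
def bdp_bytes(rate_mbps, rtt_ms):
    """Return the bandwidth-delay product in bytes.

    rate_mbps * 1e6 bits/s * rtt_ms / 1e3 s / 8 bits per byte
    simplifies to rate_mbps * rtt_ms * 125.
    """
    return round(rate_mbps * rtt_ms * 125)


if __name__ == "__main__":
    # Hypothetical path: 1000 Mbps with a 70 ms round-trip time.
    print(bdp_bytes(1000, 70), "bytes of buffer needed to fill the pipe")
```

The configured maximum TCP buffer sizes should be at least this large for the largest rate and RTT combination you intend to test.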
- To tell if Web100 has been properly set up, run this script:
python -c "import Web100; Web100.Web100Agent(); print('Success.')"
If this does not print "Success." then the NPAD tools WILL NOT WORK. You must get Web100 set up correctly first.
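The congestion control requirement listed above can be checked in a similar way. The sketch below reads the same /proc file that the echo command writes to; the function names are assumptions made for illustration, and the script only reports the setting without changing it.

```python
# Path taken from the requirement list above.
CC_PATH = "/proc/sys/net/ipv4/tcp_congestion_control"


def congestion_control(path=CC_PATH):
    """Return the name of the active TCP congestion control algorithm."""
    with open(path) as f:
        return f.read().strip()


def is_reno(name):
    """True if the given algorithm name is the one the tester expects."""
    return name == "reno"


if __name__ == "__main__":
    try:
        algo = congestion_control()
    except OSError as err:
        print("Could not read the congestion control setting:", err)
    else:
        if is_reno(algo):
            print("Congestion control is reno.")
        else:
            print("Congestion control is %s; set it to reno before testing." % algo)
```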
Many of these instructions can be cut and pasted into a bash shell.
The current source setup is as follows:
npad-1.5.6/ - Configuration and common resources
npad-1.5.6/pathdiag/ - The pathdiag tool
npad-1.5.6/diag_server/ - The diagnostic client/server framework
- Pick a location to manage the sources. This will be the parent of the build dir, such that future releases will just "drop in".
SRC=/usr/src
cd $SRC
- Unpack the source:
tar xzvf npad-1.5.6.tar.gz
cd npad-1.5.6
- Copy over your old config
NB: When upgrading from an older version, check RELEASENOTES.txt for specific information about any compatibility issues.
- If you have a previously set up installation (or if you are upgrading from a previous version), you may want to copy your custom configuration. Copy config.xml from the previously installed location (by default, this will be in $SRC/npad-dist//), or from the previous source tree, to your current build directory.
- If you have a site-customized server form (e.g. a modified template_diag_form.html), copy this file to your current build directory. If you are upgrading, be sure to check our most recent template_diag_form.html for changes you may want to merge.
If you do not have a customized form, you can create one by editing template_diag_form.html in place. Certain critical sections are included with %%keyword%% directives processed by the config script. You should keep all of these in place somewhere within the file, probably in their default order. Most customizations should be to the "look and feel" of the page.
The form should work as-is for most installations.
- Run the config script
You will be prompted with on-line help for each config option. If you have already run config.py and would like to change your configuration, run ./config.py -p.
You can confirm that the libraries are correctly installed and searched by displaying the help:
If this does not display pathdiag help the NPAD tools WILL NOT WORK. See PROBLEMS #5 below.
Depending on the permissions for your installation directories, you may need to become root before running make install.
Note that the server will not run in the source area; you must do an install to properly assemble all of the components. You can run the server as yourself (it does not require root) if appropriately configured.
- Start the server by hand. This can be done by running DiagServer.py in your chosen installation directory. By default:
- Run tests.
Run a web browser on a machine that you want to test, and browse to the NPAD welcome page. Scroll down to the "The Test Form" section. It should have two fields labeled "Target RTT" and "Target rate". Fill in the fields and click "Start test". A log window will appear showing some log messages while the test runs. When it completes, another browser window will appear showing the test results. The results are saved in an HTML file on the server.
Hint: if something goes wrong with the tester, it is easier to use the command line client to capture output for debugging.
- Set up the server to start at boot time.
There is a Sys V init script provided ('npad' in your build directory). On Sys V systems (most Linux distributions), you should be able to copy this to /etc/init.d/npad, and start the service with:
and set it up to automatically start/stop with your distribution's init script management tools.
If this script does not work with your Linux distribution, you will need to determine the best way to start and stop the service, perhaps by creating your own init script.
- Problems on step 0:
- Problems on step 5:
- If pathdiag -help does not generate a usage message, it is failing to find all of the necessary libraries. This probably means that the system dynamic loader is not searching the default locations for user-added libraries, such as web100. The easiest fix is to add the following to your login rc file (e.g. to .bashrc):
- Problems on step 8:
- If the web applet showed no useful output (but -help worked), then try running the "c" command line version.
In either case if output includes messages something like:
Web100 setup error a 'web100_group *' is expected, 'PySwigObject(xxxxxxx)' is received C/C++ variable 'gread'
Then the precompiled swig output included with the pathdiag sources is incompatible with the python run time on your system. There are 2 possible solutions:
- Edit npad-1.5.6/pathdiag/Makefile to select a different SAVESWIG version, and start over with step 5.
- Manually install swig on your system, then "make clean; make makeswig" in npad-1.5.6/pathdiag/, and start over with step 5.
This is a sample of sanitized log output from an SSHD server with extended logging enabled. Since privilege separation causes multiple SSHD instances to be spawned for each connection, the remote IP and port information is displayed in each log line. This should allow for machine-parsable tracking of single connections over multiple SSHD instantiations. The typical format of the extended logs is as follows:
SSH: Server;Ltype: Log data type; Remote: RemoteIP-RemotePort;Log data name: Log data value
There are four log data types.
- Version: Contains the protocol level and client version information
- Kex: Key Exchange result information including the encryption (Enc:) used, MAC (MAC:) used, and compression (Comp:) used
- Authname: The remote user name
- Throughput: Contains the amount of data seen on the STDOUT and STDIN of the server, duration of the connection, and average throughput in both directions in bytes per second.
Nov 15 14:55:18 delta sshd: Server listening on 0.0.0.0 port 22221.
Nov 15 14:55:33 delta sshd: SSH: Server;Ltype: Version;Remote: 184.108.40.206-49913;Protocol: 2.0;Client: OpenSSH_4.7p1-hpn12v19
Nov 15 14:55:34 delta sshd: SSH: Server;Ltype: Kex;Remote: 220.127.116.11-49913;Enc: aes128-cbc;MAC: hmac-md5;Comp: none
Nov 15 19:55:34 delta sshd: SSH: Server;Ltype: Authname;Remote: 18.104.22.168-49913;Name: rapier
Nov 15 14:55:35 delta sshd: Accepted publickey for rapier from 22.214.171.124 port 49913 ssh2
Nov 15 14:55:53 delta sshd: SSH: Server;LType: Throughput;Remote: 126.96.36.199-49913;IN: 84608;OUT: 205357344;Duration: 17.7;tPut_in: 4790.9;tPut_out: 11628400.6
Nov 15 15:11:12 delta sshd: SSH: Server;Ltype: Version;Remote: 188.8.131.52-38262;Protocol: 2.0;Client: OpenSSH_4.7p1-hpn12v19
Nov 15 15:11:12 delta sshd: SSH: Server;Ltype: Kex;Remote: 184.108.40.206-38262;Enc: aes128-cbc;MAC: hmac-md5;Comp: none
Nov 15 20:11:13 delta sshd: SSH: Server;Ltype: Authname;Remote: 220.127.116.11-38262;Name: rapier
Nov 15 15:11:14 delta sshd: Accepted publickey for rapier from 18.104.22.168 port 38262 ssh2
Nov 15 15:11:31 delta sshd: SSH: Server;LType: Throughput;Remote: 22.214.171.124-38262;IN: 84704;OUT: 205362048;Duration: 17.1;tPut_in: 4954.2;tPut_out: 12011361.8
Nov 15 15:12:29 delta sshd: SSH: Server;Ltype: Version;Remote: 126.96.36.199-38265;Protocol: 2.0;Client: OpenSSH_4.7p1-hpn12v19
Nov 15 15:12:29 delta sshd: SSH: Server;Ltype: Kex;Remote: 188.8.131.52-38265;Enc: aes128-cbc;MAC: hmac-md5;Comp: none
Nov 15 20:12:30 delta sshd: SSH: Server;Ltype: Authname;Remote: 184.108.40.206-38265;Name: rapier
Nov 15 15:12:30 delta sshd: Accepted publickey for rapier from 220.127.116.11 port 38265 ssh2
Nov 15 15:12:42 delta sshd: SSH: Server;LType: Throughput;Remote: 18.104.22.168-38265;IN: 4752;OUT: 1824;Duration: 12.0;tPut_in: 396.8;tPut_out: 152.3
Nov 15 15:13:00 delta sshd: SSH: Server;Ltype: Version;Remote: 22.214.171.124-38266;Protocol: 2.0;Client: OpenSSH_4.7p1-hpn12v19
Nov 15 15:13:00 delta sshd: SSH: Server;Ltype: Kex;Remote: 126.96.36.199-38266;Enc: arcfour;MAC: hmac-md5;Comp: none
Nov 15 20:13:00 delta sshd: SSH: Server;Ltype: Authname;Remote: 188.8.131.52-38266;Name: rapier
Nov 15 15:13:01 delta sshd: Accepted publickey for rapier from 184.108.40.206 port 38266 ssh2
Nov 15 15:13:05 delta sshd: SSH: Server;LType: Throughput;Remote: 220.127.116.11-38266;IN: 3440;OUT: 776;Duration: 4.5;tPut_in: 768.4;tPut_out: 173.3
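Because every extended record carries the remote IP and port, the records for one connection can be grouped mechanically. The following is a minimal parsing sketch based only on the format described above; the function name and the dictionary keys it produces are assumptions made for illustration, not part of the SSHD patch.

```python
def parse_extended_log(line):
    """Parse one extended sshd log line into a dict.

    Returns None for ordinary sshd messages that carry no "SSH: " record.
    """
    marker = "SSH: "
    start = line.find(marker)
    if start < 0:
        return None
    fields = line[start + len(marker):].strip().split(";")
    record = {"role": fields[0].strip()}  # e.g. "Server"
    for field in fields[1:]:
        key, _, value = field.partition(":")
        # Keys are lowercased because the log uses both "Ltype" and "LType".
        record[key.strip().lower()] = value.strip()
    if "remote" in record:
        # "Remote" is RemoteIP-RemotePort; split on the last dash.
        ip, _, port = record["remote"].rpartition("-")
        record["remote_ip"] = ip
        record["remote_port"] = int(port)
    return record


if __name__ == "__main__":
    sample = ("Nov 15 15:13:00 delta sshd: SSH: Server;Ltype: Kex;"
              "Remote: 126.96.36.199-38266;Enc: arcfour;MAC: hmac-md5;Comp: none")
    print(parse_extended_log(sample))
```

Grouping the parsed records by remote port (or by IP and port together) then collects the Version, Kex, Authname and Throughput records of a single connection.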
IP Utils Summary:
IP-Utils is a collection of C++ Classes that enable data communication via a simple API for various IP-based networking protocols. Designed as a layered architecture (mimicking the ISO layers), it uses file descriptor reference counting to provide safe copying of networking objects. An application requests either a half-tuple (e.g., TCPConn tcp_connection;) or an entire flow (e.g., TCPSession tcp_session(FRAMING_TYPE);) and communicates data over those objects using a specific message framing (e.g., HTTPFraming). Different message framing is achieved within flows by encapsulating the different framing headers within a single Class (i.e., MsgHdr), which the flow Classes (e.g., TCPSession) interface with. The IP-Utils Classes can be used individually, or the collection can be built into a library archive.
Additional utility Classes for error handling, logging, URL handling and file management are also included and used by the networking Classes. Since C++ exception handling is *not* looked on favorably, error handling is done through the ErrorHandling Class, which provides a global structure that Classes can use to initiate and append to *events* for processing by the application. Likewise, logging is done through the Logger Class, which similarly uses a mechanism based on a global object. The Logger object allows Classes to report events by both priority (syslog(3)) and mechanism (e.g., stderr, syslog, file, script). However, logging will *not* be enabled unless the code that links with IP-Utils is compiled using the flag "-DUSE_LOGGER". A minimal Class to parse and generate URLs, referred to as URL, is used by the HTTPFraming Class. Finally, the File Class handles all disk I/O used by the networking Classes, both low-level I/O and streaming FILE* I/O. All four non-networking utility Classes can be used outside of IP-Utils, if so desired. Features of the IP-Utils Classes include:
- Doxygen comments included in header files.
- Handles IPv4 or IPv6 communications.
- Provides TCP or UDP objects.
- Currently supports *struct-based* and HTTP framing (including MIME-type handling).
- Code written to Google C++ Style Guide <https://code.google.com/p/google-styleguide/>
The layered design of IP-Utils allows easy expansion. For example, with little work a UDP flow Class could be generated (technically, UDP is connection-less, but flows can exist nonetheless). Moreover, additional message framings could be supported by IP-Utils simply by adding the appropriate Class (e.g., SNMP) and adding the hooks within the MsgHdr Class.
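To make the idea of struct-based message framing concrete, here is a small sketch of a fixed binary header in front of each payload, which lets a receiver find message boundaries in a byte stream. It is written in Python for brevity and is not the IP-Utils API: IP-Utils itself is C++, and the header layout, field sizes and function names below are hypothetical, since the actual MsgHdr layout is not described here.

```python
import struct

# Hypothetical header: 16-bit message type, 32-bit payload length,
# both in network byte order (6 bytes in total).
HEADER = struct.Struct("!HI")


def frame(msg_type, payload):
    """Prefix a payload (bytes) with the fixed-size header."""
    return HEADER.pack(msg_type, len(payload)) + payload


def unframe(buffer):
    """Extract one message from the front of a receive buffer.

    Returns (msg_type, payload, remaining_bytes), or None if the
    buffer does not yet hold a complete message.
    """
    if len(buffer) < HEADER.size:
        return None
    msg_type, length = HEADER.unpack_from(buffer)
    end = HEADER.size + length
    if len(buffer) < end:
        return None
    return msg_type, buffer[HEADER.size:end], buffer[end:]


if __name__ == "__main__":
    wire = frame(1, b"hello") + frame(2, b"world")
    print(unframe(wire))
```

A text-based framing such as HTTP differs only in how the boundary is found (header lines and a content length rather than a fixed struct), which is why a single header abstraction can sit between the flow and the framing.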
Test Rig 2.0
An automated optimized network testing platform
When diagnosing off-site network performance issues, it is often necessary to run a variety of tests on a machine at the far end of the network path. If the investigator has root access to a device at this location, this process is usually quick and relatively painless. However, the far end of the path is often located within another organization's network, where security and accounting considerations make this difficult if not impossible. In these cases the investigator must work with a user who may or may not have the appropriate authority and/or skill set to run a full set of diagnostic tests. Likewise, variations in the installed operating system of the remote host may, at times, obscure issues that exist entirely within the network. The end result is that it may take days, if not weeks, for network performance problems to be accurately identified, not because of technical problems but, almost always, because of unavoidable logistical impediments to the diagnostic process.
In an attempt to obviate some of these obstacles, PSC has developed Test Rig 2.0. Based on ideas first developed by Jeff Semke during his time at PSC, Test Rig is a full suite of automated tests, performed within a known optimal environment, that can easily be run by even quite naive users. The foundation of the system is the distribution of a 'live' Linux ISO (Ubuntu 10.04) that the user burns onto a bootable CD. This is used to non-destructively boot a test machine into a known optimal configuration from which a standardized set of diagnostic tests is run. The results of these tests are then packaged and transferred to PSC, whereupon they are reviewed by a network engineer. Upon completion, the user simply reboots the machine to return it to its original configuration.
There are a number of advantages to Test Rig:
- Optimized OS environment is standardized across all tests.
- Tests are run without any user intervention.
- No need for root access.
- Tests are completely self contained - no additional software to install.
- Users are given the option of running the tests with multiple congestion control algorithms.
- Tests are often completed and the results returned within a few minutes.
- No need to modify the host operating system.
- No specialized skill set required on the part of the user.
- Burn the ISO to a CD. (Directions)
- Boot the host in question from the CD.
- Type 'live' at the boot prompt.
- Type 'net_test.pl' at the command prompt.
- Answer three questions and let the test complete.
- A network engineer will get in touch with you shortly.