Debugging networked applications
Application failures caused by network issues are among the most difficult to diagnose and debug. The failure may stem from in-network state or from state maintained by a remote end-host, both of which are invisible to the application host. For instance, data may be dropped due to MTU issues, NAT devices and firewalls can block connections, default IPv6 options can cause IPv4 applications to fail, and default buffer size settings can cause UDP datagrams to be dropped. Such failures are challenging for developers and administrators to understand and fix. Numerous fault diagnosis tools have been developed, but few of them are applicable to large applications whose source code is not available. Without source code, administrators often resort to probing tools such as ping and traceroute, which can help diagnose reachability but cannot diagnose application-level issues.
The NetCheck tool
NetCheck is a tool that determines the cause of a failure in a networked application. In contrast with most prior approaches, NetCheck requires no application- or network-specific knowledge to perform its diagnoses, and no modification to the application or the infrastructure. NetCheck treats an application as a black box and requires only a set of system call (syscall) invocation traces from the relevant end-hosts. These traces can be collected at runtime with standard black-box tracing tools such as strace. To perform its diagnosis, NetCheck derives a global ordering of the input syscalls by simulating them against a network model. The same model is used to identify syscalls that deviate from expected network semantics. These deviations are then mapped to a diagnosis using a set of heuristics.
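The ordering idea described above can be illustrated with a toy sketch. This is not NetCheck's implementation: the function name order_traces, the (syscall, peer, data) tuple format, and the single-FIFO-per-direction "network model" are all assumptions made for illustration, and the code is written for Python 3. It merges per-host traces into one global order by accepting a call only when the model allows it, and reports calls that can never be accepted as deviations.

```python
from collections import deque


def order_traces(traces):
    """Merge per-host traces into one global order (toy model).

    traces: dict mapping host name -> list of (syscall, peer, data) tuples,
    in the order each host issued them. Only "send" and "recv" are modeled.
    Returns (ordered, rejected): the accepted global order, and the first
    call on each host that the model could never accept.
    """
    queues = {host: deque(calls) for host, calls in traces.items()}
    in_flight = {}  # (sender, receiver) -> payloads sent but not yet received
    ordered = []

    progress = True
    while progress:
        progress = False
        for host, pending in queues.items():
            if not pending:
                continue
            syscall, peer, data = pending[0]
            if syscall == "send":
                # A send is always acceptable: the data enters the channel.
                in_flight.setdefault((host, peer), deque()).append(data)
            elif syscall == "recv":
                # A recv is acceptable only if the matching data was sent.
                channel = in_flight.get((peer, host))
                if not channel or channel[0] != data:
                    continue  # not acceptable yet; let other hosts advance
                channel.popleft()
            else:
                continue  # calls outside this toy model are never accepted
            ordered.append((host, syscall, data))
            pending.popleft()
            progress = True

    rejected = [(host, *pending[0]) for host, pending in queues.items() if pending]
    return ordered, rejected


# Hypothetical traces from two hosts; B's final recv has no matching send.
traces = {
    "A": [("send", "B", "hello"), ("recv", "B", "world")],
    "B": [("recv", "A", "hello"), ("send", "A", "world"), ("recv", "A", "lost")],
}
ordered, rejected = order_traces(traces)
print(ordered)
print(rejected)
```

In this sketch the rejected call is the kind of deviation that a diagnosis heuristic would then have to explain, for example as data lost in the network.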
- NSDI 2014 paper on NetCheck
- Program traces that we used in the paper
- A technical report detailing NetCheck diagnoses for different kinds of input traces
- A detailed description of NetCheck's design
- A listing of error transitions in NetCheck's network model
- A README with code-level details of NetCheck components
Get the source code
To check out the repository anonymously (read-only):
$ svn co https://netcheck.poly.edu/svn/project/
To check out the repository as a specific user (e.g., "USER"), run:
$ svn co --username USER https://netcheck.poly.edu/svn/project/
To run NetCheck, pass either a configuration file or, with the -u option, a list of trace files:
$ python netcheck.py CONFIG_FILE
$ python netcheck.py -u trace_file1 trace_file2 ... trace_fileN
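The trace files passed to netcheck.py are syscall traces such as those produced by strace. As a minimal sketch of collecting one, the Python 3 helper below builds an strace command line. The helper name build_strace_command and the application name ./my_app are hypothetical, and the exact strace options NetCheck expects are an assumption here; the README with code-level details is the place to confirm them. Only two standard strace options are used: -f to follow child processes and -o to write the trace to a file.

```python
import shlex


def build_strace_command(trace_file, program_args):
    """Return an strace argument list that traces program_args into trace_file."""
    # -f follows forked children and threads; -o writes the trace to a file.
    return ["strace", "-f", "-o", trace_file] + list(program_args)


# Hypothetical application and arguments, traced on one end-host.
cmd = build_strace_command("trace_file1", ["./my_app", "--port", "8080"])
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed command would be run once on each relevant end-host, giving one trace file per host.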