Last modified on 03/01/14 17:52:43


Debugging networked applications

Application failures caused by network issues are among the most difficult to diagnose and debug. A failure may stem from in-network state or from state maintained by a remote end-host, both of which are invisible to the application host. For instance, data may be dropped due to MTU issues, NAT devices and firewalls can block connections, default IPv6 options can cause IPv4 applications to fail, and default buffer size settings can cause UDP datagrams to be dropped. Such failures are challenging for developers and administrators to understand and fix. Numerous fault diagnosis tools have been developed, but few of them are applicable to large applications whose source code is not available. Without source code, administrators often resort to probing tools such as ping and traceroute, which can help diagnose reachability problems but cannot diagnose application-level issues.

The NetCheck tool

NetCheck is a tool that determines the cause of a failure in a networked application. In contrast with most prior approaches, NetCheck does not require application- or network-specific knowledge to perform its diagnoses, and it requires no modification to the application or the infrastructure. NetCheck treats an application as a black box and needs only a set of system call (syscall) invocation traces from the relevant end-hosts. These traces can be collected at runtime with standard black-box tracing tools, such as strace. To perform its diagnosis, NetCheck derives a global ordering of the input syscalls by simulating them against a network model. The same model is used to identify syscalls that deviate from expected network semantics. These deviations are then mapped to a diagnosis by a set of heuristics.
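
To make the trace-based idea concrete, here is a minimal Python sketch of the general approach: parse strace-style lines from two hosts and flag calls that look inconsistent. It is not NetCheck's actual model or heuristics. The names `Syscall`, `parse_line`, `parse_trace`, and `diagnose`, the sample traces, and the single byte-count check are all assumptions made for illustration.

```python
import re
from typing import List, NamedTuple, Optional


class Syscall(NamedTuple):
    name: str    # syscall name, e.g. "sendto"
    args: str    # raw argument text, left unparsed in this sketch
    result: int  # return value reported by the tracer


# Matches strace-style lines such as: sendto(3, "aaaa", 4, 0, NULL, 0) = 4
LINE_RE = re.compile(r'^(?P<name>\w+)\((?P<args>.*)\)\s*=\s*(?P<result>-?\d+)')

SEND_CALLS = {"send", "sendto"}
RECV_CALLS = {"recv", "recvfrom"}


def parse_line(line: str) -> Optional[Syscall]:
    """Parse one trace line; return None for lines that are not syscalls."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Syscall(m.group("name"), m.group("args"), int(m.group("result")))


def parse_trace(text: str) -> List[Syscall]:
    """Parse a whole trace, skipping lines that do not look like syscalls."""
    parsed = (parse_line(line) for line in text.splitlines())
    return [call for call in parsed if call is not None]


def diagnose(sender: List[Syscall], receiver: List[Syscall]) -> List[str]:
    """Toy check (hypothetical): report failed calls and missing bytes."""
    findings = []
    for call in sender + receiver:
        if call.result < 0:
            findings.append(f"{call.name} failed")
    sent = sum(c.result for c in sender if c.name in SEND_CALLS and c.result > 0)
    read = sum(c.result for c in receiver if c.name in RECV_CALLS and c.result > 0)
    if read < sent:
        findings.append(f"receiver read {read} of {sent} bytes sent")
    return findings


# Hand-written sample traces (not captured from a real run).
SENDER = '''
socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 3
sendto(3, "aaaa", 4, 0, {sa_family=AF_INET, sin_port=htons(5000)}, 16) = 4
sendto(3, "bbbb", 4, 0, {sa_family=AF_INET, sin_port=htons(5000)}, 16) = 4
'''

RECEIVER = '''
socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 3
bind(3, {sa_family=AF_INET, sin_port=htons(5000)}, 16) = 0
recvfrom(3, "aaaa", 1024, 0, NULL, NULL) = 4
'''

for finding in diagnose(parse_trace(SENDER), parse_trace(RECEIVER)):
    print(finding)
```

Here the second datagram never shows up in the receiver's trace, so the check reports a byte-count mismatch. A real diagnosis has to order calls across hosts and model socket state, which this sketch deliberately leaves out.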

Further reading

Get the source code

To check out the repository anonymously (read-only):

$ svn co https://netcheck.poly.edu/svn/project/

To perform the checkout as a specific user (e.g., "USER"), run:

$ svn co --username USER https://netcheck.poly.edu/svn/project/

Run NetCheck

To run NetCheck:

$ python netcheck.py CONFIG_FILE

or:

$ python netcheck.py -u trace_file1 trace_file2 ... trace_fileN

You may also want to look at the CONFIG_FILE samples and the CONFIG_FILE format description.