Sometimes Using Less Abstraction is Better

Last week I was working on getting the Apache Qpid unit tests running on Windows. The unit tests are arranged to take advantage of the fact that the bulk of the Qpid client and broker is built as shared/dynamic libraries. The unit tests invoke capabilities directly in the shared libraries, making it easier to test. Most of the work needed to get these tests built on Windows was taken care of by the effort to build DLLs on Windows. However, there was a small but important piece remaining that posed a challenge.

Being a networked system, Qpid tests need to be sure it correctly handles situations where the network or the network peer fails or acts in some unexpected way. The Qpid unit tests have a useful little class named SocketProxy which sits between the client and broker. SocketProxy relays network traffic in each direction but can also be told to drop pieces of traffic in one or both directions, and can be instructed to drop the socket in one or both directions. Getting this SocketProxy class to run on Windows was a challenge. SocketProxy uses the Qpid common Poller class to know when network data is available in one or both directions, then directly performs the socket recv() and send() as needed. This use of Poller, ironically, was what caused me problems. Although the Windows port includes an implementation of Poller, it doesn’t work in the same fashion as the Linux implementation.

In Qpid proper, the Poller class is designed to work in concert with the AsynchIO class; Poller detects and multiplexes events and AsynchIO performs I/O. The upper level frame handling in Qpid interacts primarily with the AsynchIO class. Below that interface there’s a bit of difference from Linux to Windows. On Linux, Poller indicates when a socket is ready, then AsynchIO performs the I/O and hands the data up to the next layer. However, the Windows port uses overlapped I/O and an I/O completion port; AsynchIO initiates I/O, Poller indicates completions (rather than I/O ready-to-start), and AsynchIO gets control to hand the resulting data to the next layer. So, the interface between the frame handling and I/O layers in Qpid is the same for all platforms, but the way that Poller and AsynchIO interact can vary between platforms as needed.

My initial plan for SocketProxy was to take it up a level, abstraction-wise. After all, abstracting away behavior is often a good way to make better use of existing, known-to-work code, and avoid complexities. So my first approach was to replace SocketProxy’s direct event-handling code and socket send/recv operations with use of the AsynchIO and Poller combination that is used in Qpid proper.

The AsynchIO-Poller arrangement’s design and place in Qpid involves some dynamic allocation and release of memory related to sockets, and a nice mechanism to do orderly cleanup of sockets regardless of which end initiates the socket close. Ironically, it is this nice cleanup arrangement which tanked its use in the SocketProxy case. Recall that SocketProxy’s usefulness is its ability to interrupt sockets in messy ways, but not be messy itself in terms of leaking handles and memory. My efforts to get AsynchIO and Poller going in SocketProxy resulted in memory leaks, sockets not getting interrupted as abruptly as needed for the test, and connections not getting closed properly. It was a mess.

The solution? Rather than go up a level of abstraction, go down. Use the least common denominator for what’s needed in a very limited use case. I used select() and fd_set. This is just what I advise customers not to do. Did I lose my mind? Sell out to time pressure? No. In this case, using less abstraction was the correct approach – I just didn’t recognize it immediately.

So what made this situation different from “normal”? Why was it a proper place to use less abstraction?

The use case is odd. Poller and AsynchIO are very well designed for running the I/O activities in Qpid, correctly handling all socket activity quickly and efficiently. They’re not designed to force failures, and that’s what was needed. It makes no sense to redesign foundational classes in order to make a test harness more elegant.
The use is out of the way. It’s a test harness, not the code that has to be maintained and relied on for efficient, correct performance in deployed environments.
It’s needs are limited and isolated. SocketProxy handles only two sockets at a time. Performance is not an issue.

Sometimes less is more – it works in abstractions too. The key is to know when it really is best.

Tags: qpid, Sockets, tcp

This entry was posted on March 24, 2009 at 8:19 am and is filed under qpid, Sockets, tcp/ip, windows. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

2 Responses to “Sometimes Using Less Abstraction is Better”

Joannah Says:
March 26, 2009 at 10:06 am | Reply
I recently came across your blog and have been reading along. I thought I would leave my first comment. I don’t know what to say except that I have enjoyed reading. Nice blog. I will keep visiting this blog very often.

Joannah

http://linuxmemory.net
- stevehuston Says:
  April 6, 2009 at 4:53 pm | Reply
  Thanks, Joannah! Glad to have you, and I hope to post more things that are useful for you.

Steve Huston's Networked Programming Blog