Archive for December, 2008

There’s No Substitute for Experience with TCP/IP Sockets

December 31, 2008

The number of software development tools and aids available to us as we begin 2009 is staggering. IDEs, code generators, component and class libraries, design and modeling tools, high-level protocols, etc. were just speculation and dreams when I began working with TCP/IP in 1985. TCP and IP were not yet even approved MIL-STDs and the company I worked for had to get US Department of Defense permission to connect to the fledgling Internet. The “Web” was still 10 years away. If you wanted to use TCP/IP for much more than FTP, Telnet, or email you had to write the protocol and the code to run it yourself. The Sockets API was the highest level access we had at the time. That is a whole area of difficulty and complexity in and of itself, which C++ Network Programming addresses. But the API is more of a usage and programming efficiency issue – what I’m talking about today is the necessity of experience and understanding what’s going on between the API and the wire when working with TCP/IP, regardless of the toolkit or language or API you use.

A lot of software running on what many people consider “the net” piggy-backs on HTTP in one form or another. There are also many helpful libraries, such as .NET and ACE, to assist in writing networked applications at a number of levels. More specific problem areas also have very useful targeted solutions, such as message queuing systems and Apache Qpid. And, like most programming tasks, when everything’s ideal, it’s not too hard to get some code running pretty well. It’s when things don’t work as well as you planned that the way forward becomes murky. That’s when experience is needed. These are some examples of issues I’ve assisted people with lately:

  1. Streaming data transfer would periodically stop for 200 msec, then resume
  2. Character strings transferred would intermittently be bunched together or split apart
  3. Asynchronous I/O-based code stopped working when ported to Linux

The tendency when a problem such as this comes up is to find out who, or what, is to blame. In my experience, the first attempt at blame is usually laid on the most recent addition to the programming toolset – the piece trusted the least and that’s usually closest to the application being written. For ACE programs, this is usually why I get involved so early.

I’ve spent many years debugging applications and network protocol code. I spent way too much time trying to blame the layer below me, or the OS, or the hardware. The biggest lesson I learned is that when something goes wrong with code I wrote, it’s usually my problem and it’s usually a matter of some concept or facility I don’t understand enough to see the problem clearly or find the way to a solution. That’s why it’s so important to understand the features and functionality you are making use of – there’s no substitute for experience.

Helping my clients solve the three problems I mentioned above involved experience. Knowing where to target further diagnosis and gathering the right information made the difference between solving the problem that day and staring at code for days wondering what’s going on. Curious about what the problems were?

  1. Slow-start peculiarity on the receiver; disable Nagle’s on the receiving side.
  2. That’s the streaming nature of TCP. Need to mark the string boundaries and check for them on receive.
  3. Linux silently converts asynchronous socket I/O operations to synchronous and executes them in order; need to restructure order of operations in a very restricted way, or switch paradigm on Linux.

Although each client initially targeted blame at the same place, the real solution was in a different layer for each case. And none involved the software initially thought to be at fault.

When you are ready to begin developing a new networked application, or you’re spending yet another night staring at code and network traces, remember: there’s a good chance you need a little more clarity on something. Take a step back, assume the tools you’re using are probably correct, and begin to challenge your assumptions about how you think it’s all supposed to work. A little more understanding and experience will make it clear.

Apache Qpid graduates incubator; now a top-level project

December 11, 2008

The Apache Qpid project has been in incubation at the Apache Software Foundation for quite a while now, having delivered at least 3 releases of Apache Qpid. Recently the Apache Software Foundation board of directors voted to graduate the project from the incubator as a new top-level project (TLP) at Apache. This is a major milestone for Qpid and is based on:

  • Proven ability to manage and coordinate development and release a product
  • Cultivate a community of developers with sufficient diversity

I joined the Apache Qpid project this past summer, primarily to lead the port to Windows. I’ve been impressed with the development team’s professionalism, experience, and commitment to quality.

Congratulations to the Apache Qpid team on this great accomplishment!