There’s No Substitute for Experience with TCP/IP Sockets

The number of software development tools and aids available to us as we begin 2009 is staggering. IDEs, code generators, component and class libraries, design and modeling tools, high-level protocols, etc. were just speculation and dreams when I began working with TCP/IP in 1985. TCP and IP were not yet even approved MIL-STDs and the company I worked for had to get US Department of Defense permission to connect to the fledgling Internet. The “Web” was still 10 years away. If you wanted to use TCP/IP for much more than FTP, Telnet, or email you had to write the protocol and the code to run it yourself. The Sockets API was the highest level access we had at the time. That is a whole area of difficulty and complexity in and of itself, which C++ Network Programming addresses. But the API is more of a usage and programming efficiency issue – what I’m talking about today is the necessity of experience and understanding what’s going on between the API and the wire when working with TCP/IP, regardless of the toolkit or language or API you use.

A lot of software running on what many people consider “the net” piggy-backs on HTTP in one form or another. There are also many helpful libraries, such as .NET and ACE, to assist in writing networked applications at a number of levels. More specific problem areas also have very useful targeted solutions, such as message queuing systems and Apache Qpid. And, like most programming tasks, when everything’s ideal, it’s not too hard to get some code running pretty well. It’s when things don’t work as well as you planned that the way forward becomes murky. That’s when experience is needed. These are some examples of issues I’ve assisted people with lately:

  1. Streaming data transfer would periodically stop for 200 msec, then resume
  2. Character strings transferred would intermittently be bunched together or split apart
  3. Asynchronous I/O-based code stopped working when ported to Linux

The tendency when a problem such as this comes up is to find out who, or what, is to blame. In my experience, the first attempt at blame is usually laid on the most recent addition to the programming toolset – the piece trusted the least and that’s usually closest to the application being written. For ACE programs, this is usually why I get involved so early.

I’ve spent many years debugging applications and network protocol code. I spent way too much time trying to blame the layer below me, or the OS, or the hardware. The biggest lesson I learned is that when something goes wrong with code I wrote, it’s usually my problem and it’s usually a matter of some concept or facility I don’t understand enough to see the problem clearly or find the way to a solution. That’s why it’s so important to understand the features and functionality you are making use of – there’s no substitute for experience.

Helping my clients solve the three problems I mentioned above involved experience. Knowing where to target further diagnosis and gathering the right information made the difference between solving the problem that day and staring at code for days wondering what’s going on. Curious about what the problems were?

  1. Slow-start peculiarity on the receiver; disable Nagle’s on the receiving side.
  2. That’s the streaming nature of TCP. Need to mark the string boundaries and check for them on receive.
  3. Linux silently converts asynchronous socket I/O operations to synchronous and executes them in order; need to restructure order of operations in a very restricted way, or switch paradigm on Linux.

Although each client initially targeted blame at the same place, the real solution was in a different layer for each case. And none involved the software initially thought to be at fault.

When you are ready to begin developing a new networked application, or you’re spending yet another night staring at code and network traces, remember: there’s a good chance you need a little more clarity on something. Take a step back, assume the tools you’re using are probably correct, and begin to challenge your assumptions about how you think it’s all supposed to work. A little more understanding and experience will make it clear.

Advertisements

Tags: , ,

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: