How We Converted the Apache Qpid C++ Build to CMake

June 1, 2009

A previous post covered why the Apache Qpid C++ build switched to CMake; this post describes how it was done.

The project was generously funded by Microsoft. We started the conversion in February 2009, and it took about 3 months to get the builds running on both Linux and Windows. The builds have been running well for a while now; the test executions are not quite done, and we’re working on the testing aspects now. We have not really addressed the installation steps yet. Only two aspects of the Qpid build conversion weren’t completely straightforward:

  1. The build processes XML versions of the AMQP specification and the Qpid Management Framework specification to generate a lot of the code. The names of the generated files are not known a priori. The generator scripts produce a list of the generated files in addition to the files themselves. This list of files obviously needs to be plugged into the appropriate places when generating the makefiles.
  2. There are a number of optional features to build into Qpid. In addition to explicitly enabling or disabling the features, the autoconf scheme checked for the requisite capabilities and enabled the features when the user didn’t specify. It built as much as it could if the user didn’t specify what to build (or not to build).

To start, one person on the team (Cliff Jansen of Interop Systems) ran the existing automake through the KDE conversion steps to get a base set of CMakeLists.txt files and did some initial prototyping for the code generation step. The original autoconf build ran the code generator at make time if the source XML specifications were available at configure time (in a release kit, the generated sources are already there, and the specs are not in the kit). The Makefile.am file then included the generated lists of sources to generate the Makefile from which the product was built. Where to place the code generating step in the CMake scheme was a big question. We considered two options:

  • Do the code generation in the generated Makefile (or Visual Studio project). This had the advantage of being able to leverage the build system’s dependency evaluation and regenerate the code as needed. However, once the code was generated, the Makefile (or Visual Studio project) would need to be recreated by CMake, because the code generation produces a list of source files that must be in the Makefile. We couldn’t get this to be as seamless as desired.
  • Do the code generation in the CMake configuration step. This puts the dependency evaluation in the CMakeLists.txt file, where it had to be coded by hand since we wouldn’t have the build system’s dependency evaluation available. However, once the code was generated, the list of generated source files was readily available for inclusion in the Makefile (and Visual Studio project) file generation and the build could proceed smoothly.

We elected the second approach for ease of use. The CMakeLists code for generating the AMQP specification-based code looks like this (note this code is covered by the Apache license):

# rubygen subdir is excluded from stable distributions
# If the main AMQP spec is present, then check if ruby and python are
# present, and if any sources have changed, forcing a re-gen of source code.
set(AMQP_SPEC_DIR ${qpidc_SOURCE_DIR}/../specs)
set(AMQP_SPEC ${AMQP_SPEC_DIR}/amqp.0-10-qpid-errata.xml)
if (EXISTS ${AMQP_SPEC})
  include(FindRuby)
  include(FindPythonInterp)
  if (NOT RUBY_EXECUTABLE)
    message(FATAL_ERROR "Can't locate ruby, needed to generate source files.")
  endif (NOT RUBY_EXECUTABLE)
  if (NOT PYTHON_EXECUTABLE)
    message(FATAL_ERROR "Can't locate python, needed to generate source files.")
  endif (NOT PYTHON_EXECUTABLE)

  set(specs ${AMQP_SPEC} ${qpidc_SOURCE_DIR}/xml/cluster.xml)
  set(regen_amqp OFF)
  set(rgen_dir ${qpidc_SOURCE_DIR}/rubygen)
  file(GLOB_RECURSE rgen_progs ${rgen_dir}/*.rb)
  # If any of the specs, or any of the sources used to generate code, change
  # then regenerate the sources.
  foreach (spec_file ${specs} ${rgen_progs})
    if (${spec_file} IS_NEWER_THAN ${CMAKE_CURRENT_SOURCE_DIR}/rubygen.cmake)
      set(regen_amqp ON)
    endif (${spec_file} IS_NEWER_THAN ${CMAKE_CURRENT_SOURCE_DIR}/rubygen.cmake)
  endforeach (spec_file ${specs})
  if (regen_amqp)
    message(STATUS "Regenerating AMQP protocol sources")
    execute_process(COMMAND ${RUBY_EXECUTABLE} -I ${rgen_dir} ${rgen_dir}/generate gen
                           ${specs} all ${CMAKE_CURRENT_SOURCE_DIR}/rubygen.cmake
                           WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR})
  else (regen_amqp)
    message(STATUS "No need to generate AMQP protocol sources")
  endif (regen_amqp)
else (EXISTS ${AMQP_SPEC})
  message(STATUS "No AMQP spec... won't generate sources")
endif (EXISTS ${AMQP_SPEC})

# Pull in the names of the generated files, i.e. ${rgen_framing_srcs}
include (rubygen.cmake)
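
For readers who have not seen a generated list file, the sketch below suggests roughly what rubygen.cmake might contain and how its variables can feed a target. Only the variable name rgen_framing_srcs comes from the build above; the file paths and the example_framing target are made up for illustration, and the real generated file presumably lists many more sources.

```cmake
# Hypothetical excerpt of a generated rubygen.cmake. The Ruby generator
# writes this file; the paths below are invented for illustration.
set(rgen_framing_srcs
    gen/qpid/framing/ExampleBody.h
    gen/qpid/framing/ExampleBody.cpp)

# Back in CMakeLists.txt, after include(rubygen.cmake), the generated list is
# used like any hand-written source list. "example_framing" is a made-up
# target name, not one from the Qpid build.
add_library(example_framing STATIC ${rgen_framing_srcs})
```

Because the list is read at configure time, the generated Makefile or Visual Studio project names exactly the files the generator produced.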

With the code generation issue resolved, I was able to get the rest of the project building on both Linux and Windows without much trouble. The cmake@cmake.org email list was very helpful when questions came up.

The remaining not-real-clear-for-a-newbie area was how to best handle building optional features. Where the original autoconf script tried to build as much as possible without the user specifying, I put in simpler CMake language to allow the user to select options, try the configure, and adjust settings if a feature (such as SSL libraries) was not available. This took away a convenient feature for building as much as possible without user intervention, though with CMake’s ability to very easily adjust the settings and re-run the configure, I didn’t think this was much of a loss.

Shortly after I got the first set of CMakeLists.txt files checked into the Qpid subversion repository, other team members started iterating on the initial CMake-based build. Andrew Stitcher from Red Hat quickly zeroed in on the removed capability to build as much as possible without user intervention. He developed a creative approach to setting the CMake defaults in the cache based on some initial system checks. For example, this is the code that sets up the SSL-enabling default based on whether or not the required capability is available on the build system (note this code is covered by the Apache license):

# Optional SSL/TLS support. Requires Netscape Portable Runtime on Linux.

include(FindPkgConfig)

# According to some cmake docs this is not a reliable way to detect
# pkg-configed libraries, but it's no worse than what we did under
# autotools
pkg_check_modules(NSS nss)

set (ssl_default ${ssl_force})
if (CMAKE_SYSTEM_NAME STREQUAL Windows)
else (CMAKE_SYSTEM_NAME STREQUAL Windows)
  if (NSS_FOUND)
    set (ssl_default ON)
  endif (NSS_FOUND)
endif (CMAKE_SYSTEM_NAME STREQUAL Windows)

option(BUILD_SSL "Build with support for SSL" ${ssl_default})
if (BUILD_SSL)

  if (NOT NSS_FOUND)
    message(FATAL_ERROR "nss/nspr not found, required for ssl support")
  endif (NOT NSS_FOUND)

  foreach(f ${NSS_CFLAGS})
    set (NSS_COMPILE_FLAGS "${NSS_COMPILE_FLAGS} ${f}")
  endforeach(f)

  foreach(f ${NSS_LDFLAGS})
    set (NSS_LINK_FLAGS "${NSS_LINK_FLAGS} ${f}")
  endforeach(f)

  # ... continue to set up the sources and targets to build.
endif (BUILD_SSL)

With that, the Apache Qpid build is going strong with CMake.

During the process I developed a pattern for naming CMake variables that play a part in user configuration and, later, in the code. There are two basic prefixes for cache variables:

  • BUILD_* variables control optional features that the user can build. For example, the SSL section shown above uses BUILD_SSL. Using a common prefix, especially one that collates near the front of the alphabet, puts options that users change most often right at the top of the list, and together.
  • QPID_HAS_* variables note variances about the build system that affect code but not users. For example, whether a particular header file or system call is present.
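
As a rough sketch of how the two prefixes divide the work, the fragment below pairs a user-facing option with two platform checks. The CheckIncludeFiles and CheckFunctionExists modules ship with CMake; BUILD_EXAMPLE and both QPID_HAS_* names here are hypothetical and not taken from the Qpid build.

```cmake
include(CheckIncludeFiles)
include(CheckFunctionExists)

# User-facing switch, grouped with the other BUILD_* entries in the cache.
# BUILD_EXAMPLE is a hypothetical option.
option(BUILD_EXAMPLE "Build the example feature" OFF)

# Facts about the build system that the code needs but users never set.
# Both result variable names are hypothetical.
check_include_files(sys/epoll.h QPID_HAS_SYS_EPOLL_H)
check_function_exists(clock_gettime QPID_HAS_CLOCK_GETTIME)

# Hand the results to the compiler so sources can test them with #ifdef.
if (QPID_HAS_SYS_EPOLL_H)
  add_definitions(-DQPID_HAS_SYS_EPOLL_H)
endif (QPID_HAS_SYS_EPOLL_H)
if (QPID_HAS_CLOCK_GETTIME)
  add_definitions(-DQPID_HAS_CLOCK_GETTIME)
endif (QPID_HAS_CLOCK_GETTIME)
```

The check_* commands store their results as internal cache entries, so they stay out of the list that users edit, which fits the "affects code but not users" role.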

Future efforts in this area will complete the transition of the test suite to CMake/CTest, which will have the side effect of making it much easier to script the regression test on Windows. The last area to be addressed will be how downstream packagers make use of the new CMake/CPack system for building RPMs, Windows installers, etc. Stay tuned…
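
For a sense of what registering a test with CTest can look like, here is a minimal sketch. It assumes a hypothetical example_unit_test.cpp; none of these names come from the Qpid test suite, and the add_test(NAME ... COMMAND ...) form shown may require a newer CMake release than the one used for this conversion.

```cmake
# Turn on CTest support for this directory and below.
enable_testing()

# Hypothetical test program; not part of the Qpid test suite.
add_executable(example_unit_test example_unit_test.cpp)

# Register it. When COMMAND names an executable target, CMake substitutes the
# path of the built binary, so the same line works on Linux and Windows.
add_test(NAME example_unit_test COMMAND example_unit_test)
```

Running ctest from the build tree (with -C and a configuration name for Visual Studio builds) then runs every registered test the same way on both platforms.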

USB pass-thru in RHEL 5 Xen VM doesn’t work; why do I buy support?

February 23, 2009

As part of my efforts to maintain ACE+TAO on LabVIEW RT (with the Pharlap ETS kernel) I have a setup to run the test suite on a National Instruments chassis, driven by the build system on Windows. This arrangement is easily handled by the ACE+TAO build environment, including a mechanism to reboot the NI box when things go wrong. The reboot is triggered by a USB-connected NI USB-6009 device that trips the reset signal on the NI box. It’s very slick and keeps me from having to cycle power. The hitch is that it requires a USB 2.0 connection from the Windows machine.

In the past I’d used a VMware virtual machine hosted on Linux (RHEL 4) running a Windows guest OS to host this test environment. The VMware software passed the USB device through to the Windows VM without a hitch. However, over the winter I got a new machine set up with a great deal more capacity and decided to move the LabVIEW RT test environment to the new machine which runs RHEL 5 and Xen.

And that’s when the trouble started…

First I had to search quite a bit to find out how to configure the Xen VM to pass the USB device to the guest OS. After a bit of googling and reading, I found the magic configuration lines to add. I also found another blog entry (http://www.olivetalks.com/2008/02/03/usb-forwarding-on-xen-it-just-does-not-work/) saying it wouldn’t work right. But I forged on, confident that even if it didn’t work “out of the box” I had purchased support from Red Hat and could get any help I needed.

Well, long story short, the USB device didn’t pass through correctly from Xen. On December 9, 2008 I opened a support case with Red Hat to have them do whatever is needed to make it work. After twelve (12) exchanges over 22 days I requested escalation to someone who could do more to help than quote manual sections that were not applicable to what I needed.

After 11 more exchanges with 3 more support engineers over another 49 days I got the long-awaited answer: “It doesn’t work.”

Well, I wasn’t totally surprised since I had no success and had already seen a blog posting saying it won’t work. But I was still clinging to hope that my support contract would come through and Red Hat would make it work. Nope. Sorry. It don’t work. End of story.

So why do I buy support? Sure, I get all the updates, but I paid extra for someone to actually work on problems for me and all I get is “It doesn’t work”? When my customers raise issues about ACE not working, they get fixes. Solutions. You know, like they paid for.

Apparently, solutions are optional for other providers.

So what happened in the end? I went back to running the Windows VM in a VMware environment, where it’s happily chugging along.

My Experiences Porting Apache Qpid C++ to Windows

January 9, 2009

I recently finished (modulo some capabilities that should be added) porting Apache Qpid’s C++ implementation to Microsoft Windows. Apache Qpid also sports a Java broker and client as well as Python, Ruby, C# and .NET clients. For my customer’s project I needed the C++ implementation, which had, to date, been developed and used primarily on Linux. What I thought would be a slam dunk 20-40 hour piece of work took about 4 months and hundreds of hours. Fortunately, my customer’s projects waiting for this completion were also delayed and my customer was very accommodating. Still, since I goofed the estimate so wildly I only billed the customer a fraction of the hours I worked. Fortunately, I rarely goof estimates that badly. This post-project review takes a look at what went wrong and why it ended up a very good thing.

When I first looked through the Qpid code base, I got some initial impressions:

  • It’s nicely layered, which will make life easy
  • It’s hard to find one’s way around it
  • The I/O layer (at the bottom of the stack) is easily modified for what I need

The first two impressions held; the third did not. Most of the troubles and false starts had to do with the I/O layer at the bottom of the stack. Most of the rest of the code ported over with relative ease. The original authors did a very nice job isolating code that was likely to need varying implementations. Those areas generally use the Bridge pattern to offer a uniform API that’s implemented differently as needed.

The general areas I had to work on for the port are described below.

Synchronization

Qpid uses multiple threads – no big surprise for a high-performance networked system. So there’s of course a need for synchronization objects (mutexes, condition variables, etc.). The existing C++ code had nice wrapper classes and a Pthreads implementation. The options for completing the Windows implementation were:

  • Use native Windows system calls
  • ACE (relatively obvious for me)
  • Apache Portable Runtime (it’s an Apache project after all)
  • Boost (Qpid already made use of Boost in many other areas)

Windows system calls were ruled out fairly quickly: they don’t offer everything that was needed on XP (condition variables in particular), and the interaction between the date-time parts of the existing threading/synch objects and Windows system time was very clunky.

I was hesitant to add ACE as an outside requirement just for the Windows port. I was also sensitive to the fact that as a newbie on this particular project I could be seen as simply trying to wedge ACE in for my own sinister purposes (which is definitely not the case!). So scratch that one.

After a brief but unsuccessful attempt at APR (and being told that some previous APR use was abandoned) I settled on Boost. This was my first project using Boost and it took some getting used to, but overall was pretty smooth.

Thread Management

The code that actually spawned and managed threads was easily implemented using native Windows system calls. Straightforward and easy.

I/O

This is where all the action is. The existing code comments (there weren’t many, but what was there was descriptive) talked about “Asynch I/O.” This was welcome since I planned to use overlapped I/O to get high throughput; Windows’ implementation of asynchronous (they call it overlapped) I/O is very good, scales well and performs very well. The interface to the I/O layer from the upper level in Qpid looked good for asynchronous I/O and I got a little overconfident. In retrospect, the name of the event dispatcher class (Poller) should have tipped me off that I had some difficulty ahead.

The Linux code’s Poller implementation uses Linux epoll to get high performance and remain very scalable. The code is solid, well designed, and well implemented. However, it is event-driven, synchronous I/O, and that tends to show a bit more than was maybe intended. For example, handles need to be registered with the Poller, something that’s not done with overlapped I/O.

My first attempt at the Windows version of a Poller implementation was large and disruptive. Fortunately, once I offered it up for review I received a huge amount of help from the I/O layer expert on the project. He and I sat down for a morning to review the design, the code I came up with, and best ways to go forward. The people I’ve worked with on Apache Qpid are consummate professionals and I’m very thankful for their input and guidance.

My second design for the I/O layer went much better. It doesn’t interfere with the Linux code, and slides in nicely with very little code duplication. I think that after another port or two are done where more of these implementations need to be designed, it may be possible to refactor some of the I/O layer to make things a bit cleaner, but that’s very minor at this point – the code works very well and integrates without disruption.

Lessons Learned

So what did I learn from this?

  1. It pays to spend a little extra time reading the existing code before designing extensions. Even when it looks pretty straightforward. Even if you have to write some design documentation and run it by the original author(s).
  2. Forge good relationships with the other people on the team. This is an easy one when you all work in the same group, even in the same building. It’s more often assumed to be difficult at best when the group is spread around the world and across (sometimes competing) companies. It’s worth the effort.

So although the project took far longer than I originally estimated, the result is a good implementation that fits with the rest of the system and performs well. I could have wedged in my original bad design in far less time, but someone would have had to pick up the pieces later. The design constraints and rules that were not written before are somewhat written now (at least in the Windows code). If I do another port, it’ll be much smoother next time.

Where to Go From Here?

There are a few difficulties remaining for the Windows port and a few capabilities that should be added:

  • Keep project files synched with generated code. The Qpid project’s build process generates a lot of code from XML protocol specifications. This is very nice, but runs into trouble keeping the Visual Studio project files up to date as the set of generated files changes. I’ve been using the MPC tool to generate Visual Studio projects and MPC can pick up names by wildcard, but that still leaves an extra step: generate C++ code, then regenerate project files. This need caused a couple of hiccups during the Qpid M4 release process where I had to regenerate the project files. It would be nice if Visual Studio’s C++ build could deal with wildcards, or if the C++ project file format allowed inclusion of an outside file listing sources (which could be generated along with the C++ code).
  • Add SSL support. The Linux code uses OpenSSL. I’d rather not pull in another prerequisite when Windows knows how to do SSL already. At least I assume it does, and in a way that doesn’t require an HTTP connection to use. I’ve yet to figure this out…
  • Persistent storage for messages. AMQP (and Qpid) allows for persistent message store, guaranteeing message delivery in the face of failures. There’s not yet a store for Qpid on Windows, and it should be added.
  • Add the needed “declspec”s to build the Qpid libraries as DLLs; currently they’re built as static libraries.
  • Minor tweaks making Qpid integrate better with Windows, such as a message definition for storing diagnostics in the event log and being able to set the broker up as a Windows Service.

Apache Qpid graduates incubator; now a top-level project

December 11, 2008

The Apache Qpid project has been in incubation at the Apache Software Foundation for quite a while now, having delivered at least 3 releases of Apache Qpid. Recently the Apache Software Foundation board of directors voted to graduate the project from the incubator as a new top-level project (TLP) at Apache. This is a major milestone for Qpid and is based on:

  • Proven ability to manage and coordinate development and release a product
  • Cultivation of a community of developers with sufficient diversity

I joined the Apache Qpid project this past summer, primarily to lead the port to Windows. I’ve been impressed with the development team’s professionalism, experience, and commitment to quality.

Congratulations to the Apache Qpid team on this great accomplishment!

What’s “Networked Programming” all about?

November 15, 2008

When I’m asked what type of work I do, I often seem to grasp for just the right terms to describe it. But it’s a blind spot for me, I guess. I have been writing network protocol software and networked applications for over 25 years, am considered a network programming expert, and have co-authored three books on the subject, but am not real big on buzzwords. When I mention I write software to make networks more useful, people assume it’s a web type of thing.

Actually, I do networked applications and systems involving pretty much anything except the web. When I started doing this, I actually used serial lines and modems. I worked with DECnet and ring-net, and I helped implement the TCP/IP stack (twice) back when you needed US DoD permission to connect to the Internet. Although TCP/IP (and its assorted related protocols) drive the Internet today, TCP/IP is used in many applications that don’t touch “the Net”. Medical devices, automobiles, cell phones, industrial processes… practically anything involving more than one computer that needs to talk is what I put in the category of “networked application.”

Some people think it odd that I can specialize in such an area. After all, once you get some piece of software running in one computer, it’s pretty straightforward to talk to another, right? Aren’t there standards for that sort of thing? Well, yes there are. And the nice thing about them is that there are so many to choose from. And that’s just in the “plumbing” – once you put a network between two pieces of your system, the number of issues to be aware of and be able to work with explodes. Timing, byte orders, rogue data attacks, accidental complexities… the list goes on and on. And that’s where I come in – my job is to keep these issues from derailing projects, their schedules, and the jobs that depend on them. I love this stuff…

So the major purpose of this blog is to discuss issues related to networked programming and how to do it better. I hope you’ll join in and share your experiences too.

And if you are a buzzword-literate person and have a moment, do you have a better term for this than “networked applications”?