Steve Huston's Networked Programming Blog

AMQP 1.0 becomes ISO 19464

May 1, 2014

A guest post by John O’Hara, Originator of AMQP

I’d like to thank Steve for inviting me to write a few words on his blog.

I remember using Steve’s ACE framework on a banking project back in the 90’s before I got to know him through his contributions to AMQP (Advanced Message Queuing Protocol).

I’ve taken to posting a rare blog entry because today AMQP becomes ISO 19464.

ISO standardization is a high bar requiring international consensus to achieve and as a result it confers an associated sense of stability and longevity. I think this milestone will establish AMQP as the backbone of business computing; the unseen, essential networks that hold the world together.

What does this mean?

It means that for the first time, we have standardized connectivity that is portable between businesses and which combines security, event subscription and the enactment of transacted reliable messages. With AMQP / ISO 19464, we can reliably enact ecommerce without depending on third-party exchanges or requiring proprietary software, which removes many of the technical barriers to dealing with new business partners.

This portability makes AMQP an important technology for switching workloads between Cloud providers, where it is already widely adopted.

AMQP also enhances the potential of other ISO standards, including ISO 20022 (Unifi), the standard for financial messaging, by combining the means to perform financial transactions with a secure, reliable transport.

How might this affect me?

If you are in business or government computing, you should take a look at how AMQP / ISO 19464 can connect your internal systems and your business partners in an open, secure, agile manner. Take a look too if you’re writing mobile apps, multiplayer games, computer animations, social networks, departmental applications or if you are building out the Internet of Things. An AMQP implementation will get you to market faster.

How did this happen?

I’m the father of AMQP, the guy with a vision who invited those first bold firms to join the AMQP Working Group. There, a dozen or so middleware experts and users from leading firms worked together to realize a shared vision to make messaging better. We transitioned to OASIS as part of a deliberate journey towards openness and today’s ISO recognition. We wanted to get to ISO to ensure AMQP could be relied upon for the long term by the largest consumers of middleware; government, healthcare, finance, etc. We wanted it to be dependable for decades, an industrial level of stability that is necessary when connecting between independent entities.

What next?

As we developed AMQP we gained profound insights into the nature of messaging. When you work long enough with it you discover that the patterns in messaging are axiomatic truths of nature to be discovered rather than invented. Early iterations of AMQP embodied useful innovations but also contained issues that made them difficult to use at global scale or to admit existing products. More recent editions corrected these and also reduced the scope slightly to broaden its applicability. What we have today in AMQP / ISO 19464 delivers the essential core of messaging and opens the door to future innovations in intelligent networking. I’m excited about what people will do with it.

What is this in layman’s terms?

Take social networks as an example. These are based on the idea of creating queues of events (they call them timelines) and notifying subscribers (they call them friends) of interesting developments. AMQP brings the basic machinery for doing that to every software developer. But it also adds security and performance optimizations, and it adds the ability to perform transactions. AMQP sets a high bar for the lowest common denominator of capability and then makes that pervasive.

AMQP has been a long time in the making since it was conceived in 2003. Achieving consensus in something as complex as messaging and working it through the rigorous processes that lead to standardization takes time.

I want to express again my thanks and sincere appreciation for the commitment, the intellect, the passion and the good humor of all the people who have helped make AMQP. These are people I am proud to count among my friends and mentors.

John O’Hara
Originator of AMQP

For more views and news on today’s ISO announcement concerning AMQP, please see blog posts from Ram Jeyaraman and Brian Benz of Microsoft, and David Ingham of Red Hat.

Posted in amqp

Book Review: “Boost.ASIO C++ Network Programming” by John Torjo

April 20, 2013

I make a living doing network programming, so I was very interested to review a new book, Boost.ASIO C++ Network Programming by John Torjo, published by Packt Publishing. On a scale of 1-10 I give it a solid 6. The good news is that I haven’t seen anything written about Boost.ASIO that’s better, so if you are interested in learning more about Boost.ASIO, I recommend you buy this book. I’ll explain further.

My first impression when starting to read this book is that it’s poorly edited. It feels like it was written in a rush and produced in an even bigger rush. There are spelling errors, grammar errors, and it just feels second rate. As I made my way through the book it didn’t get any better. Don’t let this dissuade you from learning from the real content, though.

Technically speaking Torjo attempts to do a few big things:

Explain use of Boost.ASIO
Explain network programming
Explain synchronous vs. asynchronous programming paradigms

If you are not already fluent in the general network programming concepts, I recommend you purchase other books (Stevens, Schmidt, Schmidt [disclaimer: I co-authored this]) as a prerequisite to Boost.ASIO. Torjo succeeds fairly well at the first point, though, which is the primary purpose of the book. You should also have a passing familiarity with other Boost classes. While Boost.ASIO is not template-heavy like some other Boost areas, shared pointers and function binding are used well in the examples.

The book explains the use of the io_service, resolver, endpoint, and socket classes fairly well. Torjo hand-waves a bit about the relationship of io_service and socket, but once you’ve become accustomed to programming with Boost.ASIO, that becomes second nature. The book includes many well-explained examples, which I really appreciate. If you follow the examples you will end up with working code.

The book at one point claims to be useful as a reference to return to over and over for details. While the book is certainly not a Boost.ASIO reference manual (you should bookmark the Boost docs for that), it is very useful for explaining how to make good use of this very flexible class library for network programming.

Posted in Sockets, tcp/ip, Uncategorized

Apache Qpid 0.10 Released

May 4, 2011

The following announcement was issued by the Apache Qpid project:

The Apache Qpid community is pleased to announce the immediate availability of Apache Qpid 0.10.

Apache Qpid (http://qpid.apache.org) is a cross-platform enterprise messaging solution which implements the Advanced Message Queuing Protocol (AMQP, http://www.amqp.org). It provides brokers written in C++ and Java, along with clients for C++, Java JMS, .Net, Python, and Ruby.

Qpid 0.10 is available from our website:

http://qpid.apache.org/download.cgi

The 0.10 release has undergone a lot of bug fixing and feature enhancement. We recommend that all users upgrade. A few changes of note:

The C++ broker now supports priority-ordered message queuing
The C++ broker and client now implement producer flow control
The Java JMS client is now available via Maven

A comprehensive list of features, improvements, and bugs resolved in the 0.10 release is available here:

http://qpid.apache.org/release_notes_0.10.html

Posted in amqp, qpid

Diagnosing Stack/Heap Collision on AIX

April 29, 2011

I was recently confronted with a program that mysteriously aborted (Trace/BPT trap) at run time on AIX 7.1 (but not on AIX 6.1). Usually. But not on all systems or on all build settings.

This program is the ACE Message_Queue_Test; in particular, the stress test I added to it to ensure that blocks are counted properly when enqueues and dequeues are happening in different combinations from different threads. It ended up not being particular to ACE, but I did add a change to the test’s build settings to account for this issue. But I’m getting ahead of myself…

The symptoms were that after the queue writer threads had been running a while and the reader threads started to exit, a writer thread would hit a Trace/BPT trap. The ACE_Task in this thread had its members all zeroed out, including the message queue pointer, leading to the trap. I tried setting debug watches on the task content but still no real clues.

Yes, the all-zeroes contents of the wiped stack should have tipped me off, but hindsight is always 20-20.

The other confusion was that the same program built on AIX 6.1 would run fine. But copy it over to AIX 7.1, and crash! So, I opened a support case with IBM AIX support to report the brokenness of the binary compatibility from AIX 6.1 to 7.1. “There. That’s off to IBM’s hands,” I thought. “I hope it isn’t a total pain to get a fix from them. Let’s see what Big Blue can do.”

If you’ve been reading this blog for a while you may recall another support experience I related here, from a different support-providing company that wears hats of a different color than Big Blue. As you may recall, I was less than impressed.

Within hours I got a response that IBM had reproduced the problem, although they could crash my program on both AIX 7.1 and 6.1. They wanted a test case, preprocessed source, to get more info. I responded that they could download the whole 12 MB ACE source kit – the source is in there. Meanwhile I set off to narrow down the code into a small test case, imagining the whole AIX support team laughing hysterically about this joker who wanted them to download a 12 MB tarball to help diagnose a case.

I came back from lunch yesterday gearing up to get my test case together. There was email from IBM support. “Is this where they remind me that they want a small test case?” I wondered.

Nope. The email contained the dbx steps they used to diagnose the problem (which was mine), the 3 choices of ways to resolve the problem, and pointers to the AIX docs that explained all the background.

Wow.

AIX support rocks. I mean, I very often help customers diagnose problems under ACE support that end up being problems in the customer’s code. But I’ve never experienced that from any other company. Really. Outstanding.

So what was the problem in the end? The segment 2 memory area, which holds both the heap and the process stacks, was overflowing. The program was allocating enough memory to cause the heap to run over the stacks. (Remember the zeroed-out stack content? The newly allocated memory was being cleared.)

This is how the diagnosis went:

(dbx) run

Trace/BPT trap in
ACE_Task<ACE_MT_SYNCH>::putq(ACE_Message_Block*,ACE_Time_Value*) at line  39 in file "Task_T.inl" ($t12)
39     return this->msg_queue_->enqueue_tail (mb, tv);
(dbx) list 36,42
36   ACE_Task<ACE_SYNCH_USE>::putq (ACE_Message_Block *mb,
ACE_Time_Value *tv)
37   {
38     ACE_TRACE ("ACE_Task<ACE_SYNCH_USE>::putq");
39     return this->msg_queue_->enqueue_tail (mb, tv);
40   }
41
42   template <ACE_SYNCH_DECL> ACE_INLINE int

(dbx) 0x10000f20/12 i
0x10000f20
(ACE_Task<ACE_MT_SYNCH>::putq(ACE_Message_Block*,ACE_Time_Value*))
7c0802a6        mflr   r0
0x10000f24
(ACE_Task<ACE_MT_SYNCH>::putq(ACE_Message_Block*,ACE_Time_Value*)+0x4)
9421ffc0        stwu   r1,-64(r1)
0x10000f28
(ACE_Task<ACE_MT_SYNCH>::putq(ACE_Message_Block*,ACE_Time_Value*)+0x8)
90010048         stw   r0,0x48(r1)
0x10000f2c
(ACE_Task<ACE_MT_SYNCH>::putq(ACE_Message_Block*,ACE_Time_Value*)+0xc)
90610058         stw   r3,0x58(r1)
0x10000f30
(ACE_Task<ACE_MT_SYNCH>::putq(ACE_Message_Block*,ACE_Time_Value*)+0x10)
9081005c         stw   r4,0x5c(r1)
0x10000f34
(ACE_Task<ACE_MT_SYNCH>::putq(ACE_Message_Block*,ACE_Time_Value*)+0x14)
90a10060         stw   r5,0x60(r1)
0x10000f38
(ACE_Task<ACE_MT_SYNCH>::putq(ACE_Message_Block*,ACE_Time_Value*)+0x18)
80610058         lwz   r3,0x58(r1)
0x10000f3c
(ACE_Task<ACE_MT_SYNCH>::putq(ACE_Message_Block*,ACE_Time_Value*)+0x1c)
0c430200      twllti   r3,0x200
0x10000f40
(ACE_Task<ACE_MT_SYNCH>::putq(ACE_Message_Block*,ACE_Time_Value*)+0x20)
80610058         lwz   r3,0x58(r1)
0x10000f44
(ACE_Task<ACE_MT_SYNCH>::putq(ACE_Message_Block*,ACE_Time_Value*)+0x24)
806300a4         lwz   r3,0xa4(r3)
0x10000f48
(ACE_Task<ACE_MT_SYNCH>::putq(ACE_Message_Block*,ACE_Time_Value*)+0x28)
0c430200      twllti   r3,0x200
0x10000f4c
(ACE_Task<ACE_MT_SYNCH>::putq(ACE_Message_Block*,ACE_Time_Value*)+0x2c)
80630000         lwz   r3,0x0(r3)

(dbx) 0x2FF2289C/4 x
0x2ff2289c:  0000 0000 0000 0000

(dbx) malloc
The following options are enabled:

Implementation Algorithm........ Default Allocator (Yorktown)

Statistical Report on the Malloc Subsystem:
Heap 0
heap lock held by................ pthread ID 0x200248e8
bytes acquired from sbrk().......    267402864 <***!!!
bytes in the freespace tree......        15488
bytes held by the user...........    267387376
allocations currently active.....      4535796
allocations since process start..      9085824

The Process Heap
Initial process brk value........ 0x2001e460
current process brk value........ 0x2ff222d0 <***!!!
sbrk()s called by malloc.........       4071

*** Heap has reached the upper limit of segment 0x2 and
collided with the initial thread's stack.
Changing the executable to a 'large address model' 32bit
exe should resolve the problem (in other words give
it more heap space).

# ldedit -b maxdata:0x20000000 MessageQueueTest
ldedit:  File MessageQueueTest updated.
# dump -ov MessageQueueTest

MessageQueueTest:

***Object Module Header***
# Sections      Symbol Ptr      # Symbols       Opt Hdr Len     Flags
6      0x004cde82         142781                72     0x1002
Flags=( EXEC DYNLOAD DEP_SYSTEM )
Timestamp = "Apr 23 14:51:24 2011"
Magic = 0x1df  (32-bit XCOFF)

***Optional Header***
Tsize        Dsize       Bsize       Tstart      Dstart
0x001b7244  0x0001d8ec  0x000007b8  0x10000178  0x200003bc

SNloader     SNentry     SNtext      SNtoc       SNdata
0x0004      0x0002      0x0001      0x0002      0x0002

TXTalign     DATAalign   TOC         vstamp      entry
0x0007      0x0003      0x2001cc40  0x0001      0x20017f7c

maxSTACK     maxDATA     SNbss       magic       modtype
0x00000000  0x20000000  0x0003      0x010b        1L
# ./MessageQueueTest
#                     <-- NO CRASH!

Summary: Increasing the default heap space from approximately 256 MB to 512 MB resolved the problem. IBM gave me three ways to resolve this:

Edit the executable as above with ldedit
Relink the executable with -bmaxdata:0x20000000
Set environment variable LDR_CNTRL=MAXDATA=0x20000000
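The figures in the dbx malloc report above can be checked with a little arithmetic. What follows is only a minimal Python sketch, not something IBM supplied. The two brk values and the maxdata value are copied from the output above; the 256 MB segment size and the 0x30000000 upper bound of segment 2 reflect the default 32-bit AIX address model as I understand it and should be read as assumptions.

```python
# Values copied from the dbx "malloc" report above.
INITIAL_BRK = 0x2001e460
CURRENT_BRK = 0x2ff222d0

# Assumption: in the default 32-bit AIX model, segment 2 covers
# 0x20000000-0x2fffffff (256 MB) and is shared by the heap and the stack.
SEGMENT_SIZE = 0x10000000
SEGMENT2_END = 0x30000000

MAXDATA = 0x20000000  # value given to ldedit -b maxdata / -bmaxdata


def mib(nbytes):
    """Convert a byte count to mebibytes."""
    return nbytes / (1024 * 1024)


heap_bytes = CURRENT_BRK - INITIAL_BRK   # how far the heap has grown
room_left = SEGMENT2_END - CURRENT_BRK   # what remains for the stack

print(f"heap grown via sbrk(): {heap_bytes} bytes ({mib(heap_bytes):.1f} MiB)")
print(f"left below the top of segment 2: {room_left} bytes")
print(f"maxdata limit: {mib(MAXDATA):.0f} MiB "
      f"({MAXDATA // SEGMENT_SIZE} segments)")
```

The computed heap growth equals the 267402864 reported on the "bytes acquired from sbrk()" line, which is consistent with the heap having consumed nearly all of a 256 MB segment.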

I ended up changing the Message_Queue_Test’s MPC description to add -bmaxdata to the build. That was the easiest way to always get it correct and make it easier for the regression test suite to execute the program.

Lastly, here’s the link IBM gave me for the ‘large address model’:

http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=/com.ibm.aix.genprogc/doc/genprogc/lrg_prg_support.htm

Bottom line – the test is running, the project is done, I have a sunny afternoon to write this blog entry and enjoy the nice New England spring day – instead of narrowing down a test case. Thanks, IBM!

Posted in ACE, Helpful sites, threads, Uncategorized

ace/OS.h is Back!

March 9, 2011

A lot of previously deprecated classes, methods, and header files were removed from ACE for the release of ACE 6.0 in December 2010. Since then, there’s been a steady stream of questions, from both Riverace support customers and user group members, along the lines of “Where the $%^@* is OS.h?!”

Oops…

Some explanation is in order.

OS.h is one of the oldest files in ACE and it had grown quite unwieldy and large over the years. The past few years have seen an effort to reduce build times for ACE and OS.h was a prime target of that work. Most of its content had been reorganized and moved to a number of new files with the prefix OS_NS (e.g., OS_NS_string.h) leaving OS.h as primarily a collection of #include “ace/OS_NS_…h” lines. A mere shell of its former ugly self.

OS.h had been marked “deprecated” for years and was finally removed during the housecleaning for ACE 6. However, the number of ACE apps in the field that still include OS.h was not sufficiently taken into account. Hence, the steady stream of complaints regarding its absence.

We heard you loud and clear. I resurrected OS.h today and it will be back in ACE for the 6.0.2 micro release as well as the 6.0b Fix Kit for Riverace support customers. It’s also available as an attachment to the “Missing header files building against ACE 6.0” solution in the Riverace Knowledge Base.

Tags: ACE
Posted in ACE

Python Distutils and RPMs Targeting /opt

February 21, 2011

I think I’ve mentioned before that I’ve been writing more Python code lately. Python is very powerful and the modules available for it make lots of things very easy, including networked application programming.

Today, though, I want to share a little tidbit about creating Linux RPMs to distribute Python scripts/modules that install to a root other than /usr. In particular, a customer had a request to install under /opt. I’ll use /opt/riverace for this example.

Like many Python tasks, there’s already an easy way to generate an RPM. The Python distutils are very cool and it’s very easy to take a distutils-using setup and use python to generate the RPM. It’s as easy as:

python setup.py bdist_rpm

There are lots of blogs, doc pages, etc. that explain how to create your setup.py. The thing I couldn’t find, though, is how to get the RPM to install into /opt.

It turned out to be simple. Just add this to your setup.py:

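import sys
from distutils.core import setup

# Install under /opt/riverace instead of the default prefix (usually /usr)
sys.prefix = '/opt/riverace'
setup(name='MyStuff',
...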

Voila!

Tags: RPM python distutils
Posted in Uncategorized

Read and Follow All Directions Carefully. And, Firefox SSL Settings for Accessing IBM pSeries ASMI via HTTPS

January 13, 2011

This is a public service announcement for those with IBM pSeries servers who muck up the ASMI setup. And for those that don’t but it doesn’t work anyway.

Sometimes when I’m in a hurry I don’t always follow the directions to the letter, especially if I’m confident I know what’s going on.

Never do that. Especially setting up new hardware. The people who write the directions spell those steps out for a reason.

A while back I installed a new IBM pSeries server. I’m no sysadmin guru, but I thought hey, I’ve hooked up more than a few new computers in my time. How hard could it be?

I don’t have an HMC, so I needed to cable up my ethernet LAN to the HMC port to access the ASMI via a web browser. (In hindsight, I should have known I was wandering into shark-infested waters with all those new acronyms.) The installation manual has a rather lengthy description of how to do this: configure a PC or laptop ethernet interface in a particular way, wire that directly to the HMC port, type a specific URL into a web browser, log in, then reconfigure the IP address etc. for the local LAN, move the cable to the LAN and off you go. Easy.

Well I thought I could take a few shortcuts. I am, after all, a network programming guy. I’ve implemented IP. Multiple times. And I’m in a hurry.

Well, it may have been the install manual (it is a bit confusing and seems to contradict itself) but probably not. In any event, I somehow wedged the HMC ethernet port into an unusable state. Somehow I did manage to get the server up and things hummed along nicely.

Until it happened. The server crashed and hung on reboot. What a lovely paperweight. Without access to the ASMI I was stuck. As far as I could tell, I was going to have to reset the service processor to factory defaults and start over, following the directions carefully this time. Now how to reset it?

After a frantic call to IBM, I got a very helpful person on the phone. After I explained my bungling of the HMC ethernet setup and why I needed to reset the SP, he asked “Why don’t you just use the serial port and reset the network parameters to what you need?”

Oh.

That went pretty quickly. Network parameters now set to the correct values, port connected to LAN, here we go… get Firefox up, give it the magic URL, and…

“Cannot communicate securely with peer: no common encryption algorithm(s).

(Error code: ssl_error_no_cypher_overlap)”

My friendly IBM fellow had no advice for this one.

So I got Wireshark going and watched the exchange between Firefox and the server that produced this error. Short and sweet – one SSL exchange and connection reset.

I wondered if maybe the server needed to speak SSL2, so I enabled that. Wireshark reported that the server really didn’t like that either – SSL2 start, SSL3 reset. So, it wants SSL3, but what else?

I poked around in the Firefox about:config page for SSL-related items and found a bunch that are disabled by default – less-secure options that are normally not used. Except for talking secure HTTP to pSeries ASMI, that is.

Long story short, if you need to use Firefox to access one of these IBM ASMIs via the web, the option that worked for me was to enable:

security.ssl3.rsa_rc4_40_md5

I’m guessing that this is because it’s a low-strength cipher that can be easily exported. In any event, that was the last piece of the puzzle I needed to get management access to this box. Maybe it will save someone a few days’ work.

Posted in Uncategorized

ACE V6 timestamp format changing to ISO-8601

December 10, 2010

Who says persistence doesn’t pay off?

Thomas Lockhart is a long-time ACE user with some particularly hard-won expertise in date-time formatting. He suggested back in January 2008 that timestamps produced by ACE’s %D logging spec, as well as from the ACE::timestamp() method, switch from the UNIX-y “Day Mon dd yyyy hh:mm:ss.uuuuuu” format to the ISO-8601 format “yyyy-mm-dd hh:mm:ss.uuuuuu”. He even included a patch to implement the change. A number of other Bugzilla requests were attached to Thomas’s as dependents. It was clearly a worthwhile improvement but it languished for nearly three years. Thomas even updated the patch from the original ACE 5.6.3 base to the current 5.8.3 base. Still nothing. What else could we want? Well, just time, I guess. There were always more urgent matters pressing…

One day last week I had some free time and decided to jump in and apply Thomas’s patch. Why? I don’t know… it seemed like a well-reasoned idea that solved a number of problems, and we were preparing to begin the release process for ACE 6. I figured if it didn’t happen now, it probably would languish for years more.

As soon as I began, Thomas renewed the patch, examined the dependent TAO code which may be affected, and resolved those issues as well. All in all, a couple of days later, it was done. If you upgrade to ACE 6 you’ll see the new timestamps most notably if you have code that uses the %D specifier for logging.
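To make the difference between the two layouts concrete, here is a small Python sketch that prints one instant in both styles. It only imitates the two format strings quoted above using strftime; it is not ACE code, and the sample date and time are hypothetical values chosen for illustration.

```python
from datetime import datetime

# Hypothetical sample instant, chosen only for illustration.
sample = datetime(2010, 12, 10, 14, 30, 5, 123456)

# Old UNIX-style layout: "Day Mon dd yyyy hh:mm:ss.uuuuuu"
# (day and month names depend on the current locale)
old_style = sample.strftime("%a %b %d %Y %H:%M:%S.%f")

# ISO-8601 style layout: "yyyy-mm-dd hh:mm:ss.uuuuuu"
new_style = sample.strftime("%Y-%m-%d %H:%M:%S.%f")

print(old_style)
print(new_style)

# A later instant compares greater as plain text in the ISO-style layout,
# so log lines can be ordered with an ordinary string sort.
later = datetime(2011, 1, 2, 3, 4, 5, 0).strftime("%Y-%m-%d %H:%M:%S.%f")
print(sorted([later, new_style]))
```

One practical consequence of the numeric, most-significant-first layout is that plain string comparison agrees with chronological order, which the name-based layout cannot offer.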

Thanks very much to Thomas Lockhart for the well-reasoned request, for the patches, and most of all, for the persistence.

See, persistence does pay off.

If you don’t have three years, though, I have ways to speed up the process. Talk to me…

Tags: ACE, ISO-8601, logging, timestamp
Posted in ACE

Trouble with ACE and IPv6? Make Sure Your Config is Consistent

July 2, 2010

I just spent about 5 hours over the week debugging a stack corruption problem. The MSVC debugger was politely telling me the stack was corrupted in the area of an ACE_INET_Addr object instantiated on the stack. But all I did was create it then return from the method. So the problem had to be localized pretty well. Also, I was seeing the problem but none of the students in the class I was working this example for saw the problem. So it was somehow specific to my ACE build.

I stepped through the ACE_INET_Addr constructor and watched it clear the contents of the internal sockaddr to zeroes. Fine. I noted it was clearing 28 bytes and setting address family 23. “IPv6. Ok,” I thought. But I knew the stack was being scribbled on outside the bounds of that ACE_INET_Addr object. I checked to see if ACE somehow had a bad definition of sockaddr_in6. After rummaging around ACE and Windows SDK headers I was pretty sure that wasn’t it. But there was definitely some confusion on the size of what needed to be cleared.

If you haven’t looked at the ACE_INET_Addr internals (and, really, why would you?), when ACE is built with IPv6 support (the ACE_HAS_IPV6 setting) the internal sockaddr is a union of sockaddr_in and sockaddr_in6 so both IPv4 and IPv6 can be supported. The debugger inside the ACE_INET_Addr constructor was showing me both v4 and v6 members of the union. But as I stepped out of the ACE_INET_Addr constructor back to the example application, the debugger changed to only showing the IPv4 part. Hmmm… why is that? The object back in the example is suddenly looking smaller (since the sockaddr_in6 structure is larger than the sockaddr_in structure, the union gets smaller when you leave out the sockaddr_in6). Ok, so now I know why the stack is getting smashed… I’m passing an IPv4-only ACE_INET_Addr object to a method that thinks it’s getting an IPv4-or-IPv6 object which is larger. But why?

I checked my $ACE_ROOT/ace/config.h since that’s where ACE config settings usually are. No ACE_HAS_IPV6 setting there. Did the ACE-supplied Windows configs add it in somewhere sneakily? Nope. I checked the ACE.vcproj file ACE was built with. Ah-ha… in the compile preprocessor settings there it is – ACE_HAS_IPV6.

AAAAARRRRRGGGGGGG!!!!! Now I remember where it came from. IPv6 support is turned on/off in the MPC-generated Visual Studio projects using an MPC feature setting, ipv6=1 (this is because some parts of ACE and tests aren’t included without the ipv6 feature). When I generated the ACE projects that setting was used, but when I generated the example program’s projects it wasn’t. So the uses of ACE_INET_Addr in the example had only the IPv4 support, but were passed to an ACE build that was expecting both IPv4 and IPv6 support – a larger object.

Solution? Regenerate the example’s projects with the same MPC feature file ACE’s projects were generated with. That made all the settings consistent between ACE and my example programs. No more stack scribbling.
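The size mismatch can be illustrated outside of ACE with a short Python ctypes sketch. The structure layouts below follow the usual sockaddr_in and sockaddr_in6 field layout; the class names are hypothetical, and the two unions model only the address union inside ACE_INET_Addr, not the real class.

```python
import ctypes


class SockaddrIn(ctypes.Structure):
    """IPv4 socket address, laid out like struct sockaddr_in."""
    _fields_ = [
        ("sin_family", ctypes.c_uint16),
        ("sin_port", ctypes.c_uint16),
        ("sin_addr", ctypes.c_uint8 * 4),
        ("sin_zero", ctypes.c_uint8 * 8),
    ]


class SockaddrIn6(ctypes.Structure):
    """IPv6 socket address, laid out like struct sockaddr_in6."""
    _fields_ = [
        ("sin6_family", ctypes.c_uint16),
        ("sin6_port", ctypes.c_uint16),
        ("sin6_flowinfo", ctypes.c_uint32),
        ("sin6_addr", ctypes.c_uint8 * 16),
        ("sin6_scope_id", ctypes.c_uint32),
    ]


class AddrV4Only(ctypes.Union):
    """Hypothetical view from code compiled without ACE_HAS_IPV6."""
    _fields_ = [("in4", SockaddrIn)]


class AddrV4OrV6(ctypes.Union):
    """Hypothetical view from code compiled with ACE_HAS_IPV6."""
    _fields_ = [("in4", SockaddrIn), ("in6", SockaddrIn6)]


caller_size = ctypes.sizeof(AddrV4Only)    # what the example reserved
library_size = ctypes.sizeof(AddrV4OrV6)   # what the library cleared

print("caller reserves:", caller_size, "bytes")
print("library clears: ", library_size, "bytes")
print("bytes written past the object:", library_size - caller_size)
```

The larger union comes out at the same 28 bytes the debugger showed being cleared, while the IPv4-only view reserves less, so anything stored next to the object on the stack gets overwritten.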

Posted in ACE, tcp/ip, windows

Resolving the CPU-bound ACE_Dev_Poll_Reactor Problem, and more

February 5, 2010

I previously wrote about improvements to ACE_Dev_Poll_Reactor I made for ACE 5.7. The improvements were important for large-scale uses of ACE_Dev_Poll_Reactor, but introduced a problem where some applications went CPU bound, particularly on CentOS. I have made further improvements in ACE_Dev_Poll_Reactor to resolve the CPU-bound issue as well as to further improve performance. These changes will be in the ACE 5.7.7 micro release; the customer that funded the improvements is running load and performance tests on them now.

Here’s what was changed to improve the performance:

Change the notify handler so it’s not suspended/resumed around callbacks like normal event handlers are.
Delay resuming an auto-suspended handle until the next call to epoll_wait().

I’ll describe more about each point separately.

Don’t Suspend/Resume the Notify Handler

All of the Reactor implementations in ACE have an event handler that responds to reactor notifications. Most of the implementations (such as ACE_Select_Reactor and ACE_TP_Reactor) pay special attention to the notify handler when dispatching events because notifications are always dispatched before I/O events. However, the ACE_Dev_Poll_Reactor does not make the same effort to dispatch notifications before I/O; they’re intermixed as the epoll facility dequeues events in response to epoll_wait() calls. Thus, there was little special-cased code for the notify handler when event dispatching happened. When event handler dispatching was changed to automatically suspend and resume handlers around upcalls, the notify handler was also suspended and resumed. This is actually where the CPU-bound issues came in – when the dispatched callback returned to the reactor, the dispatching thread needed to reacquire the reactor token so it could change internal reactor state required to verify the handler and resume it. Acquiring the reactor token can involve a reactor notification if another thread is currently executing the event dispatching loop. (Can you see it coming?) It was possible for the notify handler to be resumed, which caused a notify, which dispatched the notify handler, which required another resume, which caused a notify, which… ad infinitum.

The way I resolved this was to simply not suspend/resume the notify handler. This removed the source of the infinite notifications and CPU times came back down quickly.

Delay Resuming an Auto-Suspended Handle

Before beginning the performance improvement work, I wrote a new test, Reactor_Fairness_Test. This test uses a number of threads to run the reactor event loop and drives traffic at a set of TCP sockets as fast as possible for a fixed period of time. At the end of the time period, the number of data chunks received at each socket is compared; the counts should all be pretty close. I ran this test with ACE_Select_Reactor (one dispatching thread), ACE_TP_Reactor, and ACE_Dev_Poll_Reactor initially. This was important because the initial customer issue I was working on was related to fairness in dispatching events. ACE_Dev_Poll_Reactor’s fairness is very good but the performance needed to go up.
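The comparison at the end of such a test can be pictured with a tiny Python sketch. This is not the code of Reactor_Fairness_Test; the spread metric, the tolerance, and the per-socket counts are all hypothetical and exist only to show what "the counts should all be pretty close" might look like as a check.

```python
def fairness_spread(counts):
    """Return (max - min) / mean for a list of per-socket chunk counts."""
    mean = sum(counts) / len(counts)
    return (max(counts) - min(counts)) / mean


# Hypothetical per-socket counts, invented purely for illustration.
even = [1000, 990, 1010, 1005]
skewed = [1900, 50, 1000, 1050]

# Assumed threshold; the real test's pass criterion is not stated here.
TOLERANCE = 0.10

for label, counts in (("even", even), ("skewed", skewed)):
    spread = fairness_spread(counts)
    verdict = "fair" if spread <= TOLERANCE else "unfair"
    print(f"{label}: spread={spread:.3f} -> {verdict}")
```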

With the notify changes from above, the ACE_Dev_Poll_Reactor performance went up, to slightly better than ACE_TP_Reactor (and the test uses a relatively small number of sockets). However, while examining strace output for the test run I noticed that there were still many notifies causing a lot of event dispatching that was slowing the test down.

As I described above, when the reactor needs to resume a handler after its callback completes, it must acquire the reactor token (the token is released during the event callback to the handler). This often requires a notify, but even when it doesn’t, the dispatching thread needs to wait for the token just to change some state, then release the token, then go around the event processing loop again which requires it to wait for the token again – a lot of token thrashing that would be great to remove.

The plan I settled on was to keep a list of handlers that needed to be resumed; instead of immediately resuming the handler upon return from the upcall, add the handler to the to-be-resumed list. This only requires a mutex instead of the reactor token, so there’s no possibility of triggering another notify, and there’s little contention for the mutex in other parts of the code. The dispatching thread could quickly add the entry to the list and get back in line for dispatching more events.

The second part of the to-be-resumed list is that a thread that is about to call epoll_wait() to get the next event will first (while holding the reactor token it already had in order to get to epoll_wait()) walk the to-be-resumed list and resume any handlers in the list that are still valid (they may have been canceled or explicitly resumed by the application in the meantime).
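The two halves of the to-be-resumed list can be modeled in a few lines of Python. This is a toy sketch of the idea only, not the ACE_Dev_Poll_Reactor implementation: the class and method names are hypothetical, handles are plain integers, and the reactor token and epoll_wait() are represented only by the comment marking where before_wait() would be called.

```python
import threading


class DeferredResumeReactor:
    """Toy model of the to-be-resumed list; not the ACE implementation."""

    def __init__(self):
        self._resume_lock = threading.Lock()  # plain mutex, not the reactor token
        self._to_be_resumed = []              # handles waiting to be resumed
        self.suspended = set()                # handles currently suspended
        self.registered = set()               # handles that are still valid

    def register(self, handle):
        self.registered.add(handle)

    def remove(self, handle):
        """Cancel a handler; a queued resume for it must later be skipped."""
        self.registered.discard(handle)
        self.suspended.discard(handle)

    def dispatch(self, handle, callback):
        """Suspend the handle, run the upcall, then queue it for resume."""
        self.suspended.add(handle)
        callback(handle)
        with self._resume_lock:               # short critical section only
            self._to_be_resumed.append(handle)

    def before_wait(self):
        """Run by the thread that already holds the reactor token, just
        before it would call epoll_wait(): resume whatever is still valid."""
        with self._resume_lock:
            pending, self._to_be_resumed = self._to_be_resumed, []
        resumed = []
        for handle in pending:
            if handle in self.registered and handle in self.suspended:
                self.suspended.discard(handle)
                resumed.append(handle)
        return resumed


reactor = DeferredResumeReactor()
for h in (3, 4):
    reactor.register(h)
reactor.dispatch(3, lambda h: None)
reactor.dispatch(4, lambda h: None)
reactor.remove(4)                  # canceled before the next wait
result = reactor.before_wait()
print(result)                      # only handle 3 is resumed
```

The point of the design is visible in dispatch(): the dispatching thread touches only a short-lived mutex, so it never has to wait for, or notify, whichever thread currently owns the event loop.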

After this improvement was made, my reactor fairness test still showed excellent fairness on the ACE_Dev_Poll_Reactor, but with about twice the throughput and about half the CPU usage. These results were gathered with less-than-scientific measurements and with a specific usage pattern – your mileage may vary. But if you’ve been scared away from ACE_Dev_Poll_Reactor by the discussions of CPU-bound applications getting poor performance, it’s time to take another look at ACE_Dev_Poll_Reactor.

Posted in ACE, Reactor, tcp/ip, threads