Archive for the ‘ACE’ Category

Diagnosing Stack/Heap Collision on AIX

April 29, 2011

I was recently confronted with a program that mysteriously aborted (Trace/BPT trap) at run time on AIX 7.1 (but not on AIX 6.1). Usually, anyway – it didn’t fail on all systems or with all build settings.

This program is the ACE Message_Queue_Test; in particular, the stress test I added to it to ensure that blocks are counted properly when enqueues and dequeues happen in different combinations from different threads. The problem ended up not being particular to ACE, but I did change the test’s build settings to account for this issue. But I’m getting ahead of myself…

The symptoms were that after the queue writer threads had been running a while and the reader threads started to exit, a writer thread would hit a Trace/BPT trap. The ACE_Task in this thread had its members all zeroed out, including the message queue pointer, leading to the trap. I tried setting debug watches on the task content but still no real clues.

Yes, the all-zeroes contents of the wiped stack should have tipped me off, but hindsight is always 20/20.

The other confusion was that the same program built on AIX 6.1 would run fine. But copy it over to AIX 7.1, and crash! So, I opened a case with IBM AIX support to report the apparent break in binary compatibility from AIX 6.1 to 7.1. “There. That’s in IBM’s hands now,” I thought. “I hope it isn’t a total pain to get a fix from them. Let’s see what Big Blue can do.”

If you’ve been reading this blog for a while you may recall another support experience I related here, from a different support-providing company that wears hats of a different color than Big Blue. As you may recall, I was less than impressed.

Within hours I got a response that IBM had reproduced the problem – although they could crash my program on both AIX 7.1 and 6.1. They wanted a test case, preprocessed source, to get more info. I responded that they could download the whole 12 MB ACE source kit – the source is in there. Meanwhile, I set off to narrow the code down to a small test case, imagining the whole AIX support team laughing hysterically about this joker who wanted them to download a 12 MB tarball to help diagnose a case.

I came back from lunch yesterday gearing up to get my test case together. There was email from IBM support. “Is this where they remind me that they want a small test case?” I wondered.

Nope. The email contained the dbx steps they used to diagnose the problem (which was mine), three ways to resolve it, and pointers to the AIX docs that explained all the background.

Wow.

AIX support rocks. I mean, I very often help customers diagnose problems under ACE support that end up being problems in the customer’s code. But I’ve never experienced that from any other company. Really. Outstanding.

So what was the problem in the end? The segment 2 memory area, which holds both the heap and the process stacks, was overflowing. The program was allocating enough memory to cause the heap to run over the stacks. (Remember the zeroed-out stack content? The newly allocated memory was being cleared.)

This is how the diagnosis went:

(dbx) run

Trace/BPT trap in
ACE_Task<ACE_MT_SYNCH>::putq(ACE_Message_Block*,ACE_Time_Value*) at line  39 in file "Task_T.inl" ($t12)
39     return this->msg_queue_->enqueue_tail (mb, tv);
(dbx) list 36,42
36   ACE_Task<ACE_SYNCH_USE>::putq (ACE_Message_Block *mb, ACE_Time_Value *tv)
37   {
38     ACE_TRACE ("ACE_Task<ACE_SYNCH_USE>::putq");
39     return this->msg_queue_->enqueue_tail (mb, tv);
40   }
41
42   template <ACE_SYNCH_DECL> ACE_INLINE int

(dbx) 0x10000f20/12 i
0x10000f20 (ACE_Task<ACE_MT_SYNCH>::putq(ACE_Message_Block*,ACE_Time_Value*))      7c0802a6    mflr   r0
0x10000f24 (ACE_Task<ACE_MT_SYNCH>::putq(ACE_Message_Block*,ACE_Time_Value*)+0x4)  9421ffc0    stwu   r1,-64(r1)
0x10000f28 (ACE_Task<ACE_MT_SYNCH>::putq(ACE_Message_Block*,ACE_Time_Value*)+0x8)  90010048    stw    r0,0x48(r1)
0x10000f2c (ACE_Task<ACE_MT_SYNCH>::putq(ACE_Message_Block*,ACE_Time_Value*)+0xc)  90610058    stw    r3,0x58(r1)
0x10000f30 (ACE_Task<ACE_MT_SYNCH>::putq(ACE_Message_Block*,ACE_Time_Value*)+0x10) 9081005c    stw    r4,0x5c(r1)
0x10000f34 (ACE_Task<ACE_MT_SYNCH>::putq(ACE_Message_Block*,ACE_Time_Value*)+0x14) 90a10060    stw    r5,0x60(r1)
0x10000f38 (ACE_Task<ACE_MT_SYNCH>::putq(ACE_Message_Block*,ACE_Time_Value*)+0x18) 80610058    lwz    r3,0x58(r1)
0x10000f3c (ACE_Task<ACE_MT_SYNCH>::putq(ACE_Message_Block*,ACE_Time_Value*)+0x1c) 0c430200    twllti r3,0x200
0x10000f40 (ACE_Task<ACE_MT_SYNCH>::putq(ACE_Message_Block*,ACE_Time_Value*)+0x20) 80610058    lwz    r3,0x58(r1)
0x10000f44 (ACE_Task<ACE_MT_SYNCH>::putq(ACE_Message_Block*,ACE_Time_Value*)+0x24) 806300a4    lwz    r3,0xa4(r3)
0x10000f48 (ACE_Task<ACE_MT_SYNCH>::putq(ACE_Message_Block*,ACE_Time_Value*)+0x28) 0c430200    twllti r3,0x200
0x10000f4c (ACE_Task<ACE_MT_SYNCH>::putq(ACE_Message_Block*,ACE_Time_Value*)+0x2c) 80630000    lwz    r3,0x0(r3)

(dbx) 0x2FF2289C/4 x
0x2ff2289c:  0000 0000 0000 0000

(dbx) malloc
The following options are enabled:

Implementation Algorithm........ Default Allocator (Yorktown)

Statistical Report on the Malloc Subsystem:
Heap 0
heap lock held by................ pthread ID 0x200248e8
bytes acquired from sbrk().......    267402864 <***!!!
bytes in the freespace tree......        15488
bytes held by the user...........    267387376
allocations currently active.....      4535796
allocations since process start..      9085824

The Process Heap
Initial process brk value........ 0x2001e460
current process brk value........ 0x2ff222d0 <***!!!
sbrk()s called by malloc.........       4071

*** Heap has reached the upper limit of segment 0x2 and
collided with the initial thread's stack.
Changing the executable to a 'large address model' 32bit
exe should resolve the problem (in other words give
it more heap space).

# ldedit -b maxdata:0x20000000 MessageQueueTest
ldedit:  File MessageQueueTest updated.
# dump -ov MessageQueueTest

MessageQueueTest:

***Object Module Header***
# Sections      Symbol Ptr      # Symbols       Opt Hdr Len     Flags
6      0x004cde82         142781                72     0x1002
Flags=( EXEC DYNLOAD DEP_SYSTEM )
Timestamp = "Apr 23 14:51:24 2011"
Magic = 0x1df  (32-bit XCOFF)

***Optional Header***
Tsize        Dsize       Bsize       Tstart      Dstart
0x001b7244  0x0001d8ec  0x000007b8  0x10000178  0x200003bc

SNloader     SNentry     SNtext      SNtoc       SNdata
0x0004      0x0002      0x0001      0x0002      0x0002

TXTalign     DATAalign   TOC         vstamp      entry
0x0007      0x0003      0x2001cc40  0x0001      0x20017f7c

maxSTACK     maxDATA     SNbss       magic       modtype
0x00000000  0x20000000  0x0003      0x010b        1L
# ./MessageQueueTest
#                     <-- NO CRASH!

Summary: increasing the default heap space from approximately 256 MB to 512 MB resolved the problem. IBM gave me three ways to do this:

  1. Edit the executable as above with ldedit
  2. Relink the executable with -bmaxdata:0x20000000
  3. Set environment variable LDR_CNTRL=MAXDATA=0x20000000

I ended up changing the Message_Queue_Test’s MPC description to add -bmaxdata to the build. That was the easiest way to always get it correct and make it easier for the regression test suite to execute the program.
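
For the curious, the MPC change looks something like this (a sketch, not the exact project file – the base project, file names, and the linkflags variable reflect my recollection of MPC conventions; the specific() block limits the flag to GNU-make-generated builds, and in practice you’d apply it only on AIX):

project(Message_Queue_Test) : aceexe {
  exename = Message_Queue_Test
  Source_Files {
    Message_Queue_Test.cpp
  }
  specific(gnuace) {
    // AIX 32-bit large address model: give the heap its own 512 MB
    // outside segment 2 so it can no longer run over the stacks.
    linkflags += -bmaxdata:0x20000000
  }
}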

Lastly, here’s the link IBM gave me for the ‘large address model’:

http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=/com.ibm.aix.genprogc/doc/genprogc/lrg_prg_support.htm

Bottom line – the test is running, the project is done, I have a sunny afternoon to write this blog entry and enjoy the nice New England spring day – instead of narrowing down a test case. Thanks, IBM!

ace/OS.h is Back!

March 9, 2011

A lot of previously deprecated classes, methods, and header files were removed from ACE for the release of ACE 6.0 in December 2010. Since then, there’s been a steady stream of questions from both Riverace support customers and user group members, along the lines of “Where the $%^@* is OS.h?!”

Oops…

Some explanation is in order.

OS.h is one of the oldest files in ACE, and it had grown quite large and unwieldy over the years. The past few years have seen an effort to reduce build times for ACE, and OS.h was a prime target of that work. Most of its content had been reorganized and moved to a number of new files with the prefix OS_NS (e.g., OS_NS_string.h), leaving OS.h as primarily a collection of #include “ace/OS_NS_…h” lines – a mere shell of its former ugly self.

OS.h had been marked “deprecated” for years and was finally removed during the housecleaning for ACE 6. However, the number of ACE apps in the field that still include OS.h wasn’t taken sufficiently into account – hence the steady stream of complaints regarding its absence.

We heard you loud and clear. I resurrected OS.h today and it will be back in ACE for the 6.0.2 micro release as well as the 6.0b Fix Kit for Riverace support customers. It’s also available as an attachment to the “Missing header files building against ACE 6.0” solution in the Riverace Knowledge Base.
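
That said, if you’d rather not depend on OS.h at all, the migration is mechanical – replace the monolithic include with the specific OS_NS headers matching the ACE_OS calls you make. A sketch (the three headers shown are just common examples):

// Old code that broke under ACE 6.0:
#include "ace/OS.h"

// Replacement: include only the OS_NS headers you actually need, e.g.:
#include "ace/OS_NS_string.h"   // ACE_OS::strcpy(), ACE_OS::memset(), ...
#include "ace/OS_NS_unistd.h"   // ACE_OS::sleep(), ACE_OS::getpid(), ...
#include "ace/OS_NS_time.h"     // ACE_OS::time(), ACE_OS::localtime(), ...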

ACE V6 timestamp format changing to ISO-8601

December 10, 2010

Who says persistence doesn’t pay off?

Thomas Lockhart is a long-time ACE user with some particularly hard-won expertise in date-time formatting. He suggested back in January 2008 that timestamps produced by ACE’s %D logging spec, as well as from the ACE::timestamp() method, switch from the UNIX-y “Day Mon dd yyyy hh:mm:ss.uuuuuu” format to the ISO-8601 format “yyyy-mm-dd hh:mm:ss.uuuuuu”. He even included a patch to implement the change. A number of other Bugzilla requests were attached to Thomas’s as dependents. It was clearly a worthwhile improvement but it languished for nearly three years. Thomas even updated the patch from the original ACE 5.6.3 base to the current 5.8.3 base.  Still nothing. What else could we want? Well, just time, I guess.  There were always more urgent matters pressing…

One day last week I had some free time and decided to jump in and apply Thomas’s patch. Why? I don’t know… it seemed like a well-reasoned idea that solved a number of problems, and we were preparing to begin the release process for ACE 6. I figured if it didn’t happen now, it probably would languish for years more.

As soon as I began, Thomas renewed the patch, examined the dependent TAO code that might be affected, and resolved those issues as well. All in all, a couple of days later it was done. If you upgrade to ACE 6 you’ll see the new timestamps, most notably if you have code that uses the %D specifier for logging.
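
A quick way to see the difference (a minimal sketch; the printed timestamps are illustrative):

#include "ace/OS_main.h"
#include "ace/Log_Msg.h"

int ACE_TMAIN (int, ACE_TCHAR *[])
{
  // %D expands to the current timestamp in logging format strings.
  ACE_DEBUG ((LM_INFO, ACE_TEXT ("%D starting up\n")));
  // ACE 5.x prints: Fri Dec 10 2010 09:15:02.123456 starting up
  // ACE 6.x prints: 2010-12-10 09:15:02.123456 starting up
  return 0;
}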

Thanks very much to Thomas Lockhart for the well-reasoned request, for the patches, and most of all, for the persistence.

See, persistence does pay off.

If you don’t have three years, though, I have ways to speed up the process. Talk to me…

Trouble with ACE and IPv6? Make Sure Your Config is Consistent

July 2, 2010

I just spent about five hours over the past week debugging a stack corruption problem. The MSVC debugger was politely telling me the stack was corrupted in the area of an ACE_INET_Addr object instantiated on the stack. But all I did was create it and then return from the method, so the problem had to be localized pretty well. Also, I was seeing the problem but none of the students in the class I was preparing this example for saw it – so it was somehow specific to my ACE build.

I stepped through the ACE_INET_Addr constructor and watched it clear the contents of the internal sockaddr to zeroes. Fine. I noted it was clearing 28 bytes and setting address family 23. “IPv6. Ok,” I thought. But I knew the stack was being scribbled on outside the bounds of that ACE_INET_Addr object. I checked to see if ACE somehow had a bad definition of sockaddr_in6. After rummaging around the ACE and Windows SDK headers I was pretty sure that wasn’t it. But there was definitely some confusion about the size of what needed to be cleared.

If you haven’t looked at the ACE_INET_Addr internals (and, really, why would you?), when ACE is built with IPv6 support (the ACE_HAS_IPV6 setting) the internal sockaddr is a union of sockaddr_in and sockaddr_in6 so both IPv4 and IPv6 can be supported. The debugger inside the ACE_INET_Addr constructor was showing me both the v4 and v6 members of the union. But as I stepped out of the ACE_INET_Addr constructor back into the example application, the debugger changed to showing only the IPv4 part. Hmmm… why is that? The object back in the example suddenly looked smaller (since the sockaddr_in6 structure is larger than the sockaddr_in structure, the union gets smaller when you leave out the sockaddr_in6). Ok, so now I knew why the stack was getting smashed: I was passing an IPv4-only ACE_INET_Addr object to a method that thought it was getting an IPv4-or-IPv6 object, which is larger. But why?
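
Conceptually, the layout problem looks like this (an illustrative sketch, not the actual ACE_INET_Addr source):

#include <winsock2.h>
#include <ws2tcpip.h>   // sockaddr_in6 on Windows

// ACE_INET_Addr's internal storage is (conceptually) a union whose size
// depends on a preprocessor setting:
union inet_addr_sketch
{
  sockaddr_in  in4_;    // 16 bytes
#if defined (ACE_HAS_IPV6)
  sockaddr_in6 in6_;    // 28 bytes
#endif
};

// A translation unit compiled WITHOUT ACE_HAS_IPV6 reserves 16 bytes on
// the stack; the ACE library built WITH it zeroes all 28. The extra 12
// bytes land on whatever the caller had next on the stack.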

I checked my $ACE_ROOT/ace/config.h since that’s where ACE config settings usually are. No ACE_HAS_IPV6 setting there. Did the ACE-supplied Windows configs add it in somewhere sneakily? Nope. I checked the ACE.vcproj file ACE was built with. Ah-ha… in the compiler’s preprocessor settings, there it is – ACE_HAS_IPV6.

AAAAARRRRRGGGGGGG!!!!! Now I remember where it came from. IPv6 support is turned on/off in the MPC-generated Visual Studio projects using an MPC feature setting, ipv6=1 (this is because some parts of ACE and its tests aren’t included without the ipv6 feature). When I generated the ACE projects that setting was used, but when I generated the example program’s projects it wasn’t. So the uses of ACE_INET_Addr in the example had only IPv4 support, but were passed to an ACE build that expected both IPv4 and IPv6 support – a larger object.

Solution? Regenerate the example’s projects with the same MPC feature file ACE’s projects were generated with. That made all the settings consistent between ACE and my example programs. No more stack scribbling.
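
Concretely, that means passing the same feature to both generation runs, something like this (a sketch – use whatever -type matches your Visual Studio version and your own workspace names):

perl %ACE_ROOT%\bin\mwc.pl -type vc8 -features ipv6=1 ACE.mwc
perl %ACE_ROOT%\bin\mwc.pl -type vc8 -features ipv6=1 my_example.mwc

Putting ipv6=1 in the default.features file that MPC consults accomplishes the same thing for every workspace you generate.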

Resolving the CPU-bound ACE_Dev_Poll_Reactor Problem, and more

February 5, 2010

I previously wrote about improvements to ACE_Dev_Poll_Reactor I made for ACE 5.7. The improvements were important for large-scale uses of ACE_Dev_Poll_Reactor, but introduced a problem where some applications went CPU bound, particularly on CentOS. I have made further improvements in ACE_Dev_Poll_Reactor to resolve the CPU-bound issue as well as to further improve performance. These changes will be in the ACE 5.7.7 micro release; the customer that funded the improvements is running load and performance tests on them now.

Here’s what was changed to improve the performance:

  • Change the notify handler so it’s not suspended/resumed around callbacks like normal event handlers are.
  • Delay resuming an auto-suspended handle until the next call to epoll_wait().

I’ll describe more about each point separately.

Don’t Suspend/Resume the Notify Handler

All of the Reactor implementations in ACE have an event handler that responds to reactor notifications. Most of the implementations (such as ACE_Select_Reactor and ACE_TP_Reactor) pay special attention to the notify handler when dispatching events because notifications are always dispatched before I/O events. However, ACE_Dev_Poll_Reactor does not make the same effort to dispatch notifications before I/O; they’re intermixed as the epoll facility dequeues events in response to epoll_wait() calls. Thus, there was little special-cased code for the notify handler in the event dispatching path. When event handler dispatching was changed to automatically suspend and resume handlers around upcalls, the notify handler was also suspended and resumed. This is actually where the CPU-bound issues came in: when a dispatched callback returned to the reactor, the dispatching thread needed to reacquire the reactor token so it could change the internal reactor state required to verify the handler and resume it. Acquiring the reactor token can involve a reactor notification if another thread is currently executing the event dispatching loop. (Can you see it coming?) It was possible for the notify handler to be resumed, which caused a notify, which dispatched the notify handler, which required another resume, which caused a notify, which… ad infinitum.

The way I resolved this was to simply not suspend/resume the notify handler. This removed the source of the infinite notifications and CPU times came back down quickly.
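
The change amounts to a guard like this in the dispatch path (illustrative pseudocode with invented member names, not the actual ACE source):

// After the upcall to 'handler' for 'handle' returns:
if (handler != this->notify_handler_)
  // Normal handlers resume here, which requires reacquiring the reactor
  // token and possibly issuing a notify.
  this->resume_handler_i (handle);
// else: the notify handler was never suspended, so there's no resume, no
// token reacquisition, and no notify -> dispatch -> resume loop.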

Delay Resuming an Auto-Suspended Handle

Before beginning the performance improvement work, I wrote a new test, Reactor_Fairness_Test. This test uses a number of threads to run the reactor event loop and drives traffic at a set of TCP sockets as fast as possible for a fixed period of time. At the end of the time period, the number of data chunks received at each socket is compared; the counts should all be pretty close. I ran this test with ACE_Select_Reactor (one dispatching thread), ACE_TP_Reactor, and ACE_Dev_Poll_Reactor initially. This was important because the initial customer issue I was working on was related to fairness in dispatching events. ACE_Dev_Poll_Reactor’s fairness is very good but the performance needed to go up.

With the notify changes from above, the ACE_Dev_Poll_Reactor performance went up, to slightly better than ACE_TP_Reactor (and the test uses a relatively small number of sockets). However, while examining strace output for the test run I noticed that there were still many notifies causing a lot of event dispatching that was slowing the test down.

As I described above, when the reactor needs to resume a handler after its callback completes, it must acquire the reactor token (the token is released during the event callback to the handler). This often requires a notify, but even when it doesn’t, the dispatching thread needs to wait for the token just to change some state, then release the token, then go around the event processing loop again which requires it to wait for the token again – a lot of token thrashing that would be great to remove.

The plan I settled on was to keep a list of handlers that need to be resumed; instead of immediately resuming a handler upon return from the upcall, the dispatching thread adds the handler to the to-be-resumed list. This requires only a mutex instead of the reactor token, so there’s no possibility of triggering another notify, and there’s little contention for the mutex from other parts of the code. The dispatching thread can quickly add the entry to the list and get back in line for dispatching more events.

The second part of the scheme is that a thread about to call epoll_wait() to get the next event will first (while holding the reactor token it already acquired in order to get to epoll_wait()) walk the to-be-resumed list and resume any handlers on it that are still valid (they may have been canceled or explicitly resumed by the application in the meantime).
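
Here’s a minimal sketch of the mechanism (the class, member, and helper comments are invented for illustration; the real ACE_Dev_Poll_Reactor code differs):

#include <set>
#include "ace/Event_Handler.h"   // ACE_HANDLE
#include "ace/Synch_Traits.h"
#include "ace/Guard_T.h"

class Deferred_Resume_Sketch
{
public:
  // Called by a dispatching thread as an upcall returns. Needs only the
  // mutex -- never the reactor token -- so it cannot trigger a notify.
  void defer_resume (ACE_HANDLE h)
  {
    ACE_GUARD (ACE_SYNCH_MUTEX, g, this->lock_);
    this->to_be_resumed_.insert (h);
  }

  // Called by the thread that already holds the reactor token, just
  // before it blocks in epoll_wait().
  void resume_deferred (void)
  {
    ACE_GUARD (ACE_SYNCH_MUTEX, g, this->lock_);
    for (std::set<ACE_HANDLE>::iterator i = this->to_be_resumed_.begin ();
         i != this->to_be_resumed_.end ();
         ++i)
      {
        // Skip handles that were canceled or already resumed by the
        // application; otherwise re-arm the handle's epoll event mask.
      }
    this->to_be_resumed_.clear ();
  }

private:
  ACE_SYNCH_MUTEX lock_;
  std::set<ACE_HANDLE> to_be_resumed_;
};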

After this improvement was made, the fairness test still showed excellent fairness for ACE_Dev_Poll_Reactor, but with about twice the throughput – and about half the CPU usage. These results came from less-than-scientific measurements of a specific usage pattern, so your mileage may vary. But if you’ve been scared away from ACE_Dev_Poll_Reactor by the discussions of CPU-bound applications getting poor performance, it’s time to take another look.

ACE 5.7 Changes You Should Know About, But (Probably) Don’t

June 29, 2009

ACE 5.7 was released last week (see my newsletter article for info including new platforms supported at ACE 5.7). It was a happy day for ACE users, as ACE 5.7 contains many important fixes, especially in the Service Configurator framework and the ACE_Dev_Poll_Reactor class. Indeed, a recent survey of Riverace’s ACE support customers revealed that nearly 53% are planning to upgrade to ACE 5.7 immediately.

As the rollouts started happening, a few unexpected changes were noticed that users should know about. Riverace posted notes on these in the ACE Knowledge Base last week, but I’m also describing them here proactively because I suspect more users will trip over these issues as ACE 5.7 adoption picks up.

The LIB Makefile Variable

Many ACE users reuse the ACE GNU Makefile scheme to take advantage of ACE’s well-tuned system for building ACE applications. This mechanism is popular because it automatically picks up the ACE configuration settings and compiler options – very important for ensuring that the application’s ACE-related settings match those ACE itself was built with.

When reusing the ACE make scheme to build libraries, the LIB makefile variable specifies the name of the library to build. This worked for many years. However, during ACE 5.7 development, support was added to the GNU make scheme to enable its use on Windows with Visual C++. Visual C++ uses the LIB environment variable as a search path for libraries at link time, which clashed with the existing use of the LIB variable. Therefore, the old LIB variable was renamed to LIB_CHECKED. This broke existing builds.

Since the Windows case requires that LIB be left available for Visual C++, the name change wasn’t reverted; however, I added a patch that restores the old LIB behavior on non-Windows build systems. The patch will be available in the ACE 5.7.1 bug-fix-only beta as well as in ACE 5.7a for Riverace support customers. If you’re stuck on this problem now and you’re a support customer, open a case to get the patch immediately.

Note that if you generate your application’s project files with MPC instead of hand-coding to the ACE make scheme, you can avoid the LIB name problem by regenerating your projects after ACE 5.7 is installed.
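
For context, a minimal library Makefile in the ACE scheme looks something like this (a sketch – the include list is from memory and varies by ACE version):

# Build libMy_Lib with the ACE GNU make scheme.
LIB   = libMy_Lib.a    # this is the variable that became LIB_CHECKED in 5.7
FILES = My_Class

include $(ACE_ROOT)/include/makeinclude/wrapper_macros.GNU
include $(ACE_ROOT)/include/makeinclude/macros.GNU
include $(ACE_ROOT)/include/makeinclude/rules.common.GNU
include $(ACE_ROOT)/include/makeinclude/rules.nonested.GNU
include $(ACE_ROOT)/include/makeinclude/rules.lib.GNU
include $(ACE_ROOT)/include/makeinclude/rules.local.GNU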

Symlink Default for Built Libraries Changed from Absolute to Relative

When ACE builds libraries it can “install” them by creating a symbolic link. In ACE 5.6 and earlier, the link used an absolute path to the original file. In ACE 5.7 the default behavior changed to create a link relative to the $ACE_ROOT/ace directory. This helps enable relocating an entire tree without breaking the links, but in some cases it can create invalid links – for example, if you build your own libraries that don’t get relocated with ACE or that won’t keep the same directory hierarchy.

To change to the pre-5.7 behavior of creating links with absolute pathnames, build with the make variable symlinks=absolute. You can either specify symlinks=absolute on the make command line or add it to your $ACE_ROOT/include/makeinclude/platform_macros.GNU file prior to including wrapper_macros.GNU.
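
Either form works; for example (the platform file shown is just an example):

# One-off, on the make command line:
make symlinks=absolute

# Or permanently, in $ACE_ROOT/include/makeinclude/platform_macros.GNU,
# before the platform include (which pulls in wrapper_macros.GNU):
symlinks = absolute
include $(ACE_ROOT)/include/makeinclude/platform_linux.GNU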

Revised ACE_Dev_Poll_Reactor Fixes Multithread Issues (and more!) on Linux

June 15, 2009

When ACE 5.7 is released this week it will contain an important fix (a number of them, actually) for use cases that rely on multiple threads running the Reactor event loop concurrently on Linux. The major fix areas involved for ACE_Dev_Poll_Reactor in ACE 5.7 are:

  • Dispatching events from multiple threads concurrently
  • Properly handling changes in handle registration during callbacks
  • Change in suspend/resume behavior to be more ACE_TP_Reactor-like

At the base of these fixes was a foundational change in the way ACE_Dev_Poll_Reactor manages events returned from Linux epoll. Prior to this change, ACE would obtain all ready events from epoll and then each event loop-executing thread in turn would pick the next event from that set and dispatch it. This design was, I suppose, more or less borrowed from the ACE_Select_Reactor event demultiplexing strategy. In that case it made sense since select() is relatively expensive and avoiding repeated scans of all the watched handles is a good thing. Also, the ACE_Select_Reactor (and ACE_TP_Reactor, which inherits from it) have a mechanism to note that something in the handle registrations changed, signifying that select() must be called again. This mechanism was lacking in ACE_Dev_Poll_Reactor.

However, unlike with select(), it’s completely unnecessary to try to avoid calls to epoll_wait(). Epoll is much more scalable than select(), and letting epoll manage the event queue, passing back only one event at a time, is much simpler than the previous design and much easier to get correct. So that was the first change: obtain one event per call to epoll_wait(), letting Linux manage the event queue and weed out events for handles that are closed, etc. The second change was to add the EPOLLONESHOT option bit to the event registration for each handle. The effect of this is that once an event for a particular handle is delivered from epoll_wait(), that handle is effectively suspended; no more events for the handle will be delivered until the handle’s event mask is re-enabled via epoll_ctl(). These two changes were used to fix and extend ACE_Dev_Poll_Reactor as follows.
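
In plain epoll terms, the new pattern works like this (a sketch of the mechanism itself, not the ACE source):

#include <sys/epoll.h>

void register_oneshot (int epfd, int fd)
{
  // EPOLLONESHOT: after one event is delivered for fd, the kernel stops
  // reporting events for it until the mask is re-armed.
  struct epoll_event ev;
  ev.events = EPOLLIN | EPOLLONESHOT;
  ev.data.fd = fd;
  epoll_ctl (epfd, EPOLL_CTL_ADD, fd, &ev);
}

void event_loop_iteration (int epfd)
{
  struct epoll_event ready;
  // Ask for exactly ONE event; epoll keeps the queue, not the reactor.
  if (epoll_wait (epfd, &ready, 1, -1) == 1)
    {
      int fd = ready.data.fd;
      // ... dispatch the handler for fd; no other thread can receive an
      // event for fd while EPOLLONESHOT keeps it "suspended" ...
      struct epoll_event ev;
      ev.events = EPOLLIN | EPOLLONESHOT;
      ev.data.fd = fd;
      epoll_ctl (epfd, EPOLL_CTL_MOD, fd, &ev);   // "resume" the handle
    }
}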

Dispatching Events from Multiple Threads Concurrently

The main defect in the previous scheme was the possibility that events obtained from epoll_wait() could be delivered to an ACE_Event_Handler object that no longer existed. This was the primary driver for fixing ACE_Dev_Poll_Reactor. However, another less likely, but still possible, situation was that callbacks for a handler could be called out of order, triggering time-sensitive ordering problems that are very difficult to track down. Both of these situations are resolved by obtaining only one I/O event per ACE_Reactor::handle_events() iteration. A side effect of this change is that the concurrency behavior of ACE_Dev_Poll_Reactor changes from being similar to ACE_WFMO_Reactor (simultaneous callbacks to the same handler are possible) to being similar to ACE_TP_Reactor (only one I/O callback for a particular handle at a time). Since epoll’s notion of when a handle becomes available for more events differs from that of Windows’s WaitForMultipleObjects, the old multiple-concurrent-calls-per-handle behavior couldn’t be done correctly anyway, so the new ACE_Dev_Poll_Reactor behavior leads to easier coding and programs that are much more likely to remain correct when changing reactor use between platforms.

Properly handling changes in handle registration during callbacks

A problem that was sometimes difficult to track down arose in the previous design when a callback handler changed handle registrations. In such a case, if the reactor made a subsequent callback to the original handler (for example, if the callback returned -1 and the handler needed to be removed), the callback could be made to the wrong handler – the newly registered handler instead of the originally called one. This was fixed by changes and additions to the dispatching data structures and code, and it is no longer an issue.

Change in suspend/resume behavior to be more ACE_TP_Reactor-like

An important aspect of ACE_TP_Reactor’s ability to support complicated use cases arising in systems such as TAO is that a dispatched I/O handler is suspended around the upcall. This prevents multiple events for the same handler from being dispatched simultaneously. As previously mentioned, the changes to ACE_Dev_Poll_Reactor also effectively suspend a handler around an upcall. However, a feature once available only with ACE_TP_Reactor is that an application can specify that the application, not the ACE reactor, will resume the suspended handler. This capability is important for properly supporting the nested upcall capability in TAO, for example. The revised ACE_Dev_Poll_Reactor now also has this capability. Once the epoll changes were made to effectively suspend a handler around an upcall, taking advantage of the existing suspend/resume setting in ACE_Event_Handler was pretty straightforward.
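
Using the application-resume feature looks roughly like this (resume_handler() and its enum return value are existing ACE_Event_Handler API; the handler logic is a sketch):

#include "ace/Event_Handler.h"
#include "ace/Reactor.h"

class My_Handler : public ACE_Event_Handler
{
public:
  // Tell the reactor that the application resumes this handler itself.
  virtual int resume_handler (void)
  {
    return ACE_Event_Handler::ACE_APPLICATION_RESUMES_HANDLER;
  }

  virtual int handle_input (ACE_HANDLE h)
  {
    // Hand the work (perhaps a nested upcall) off to finish elsewhere.
    // Until that code calls this->reactor ()->resume_handler (h), the
    // reactor dispatches no further events for this handle.
    return 0;
  }
};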

So, if you’ve been holding off on using ACE_Dev_Poll_Reactor on Linux because it was unstable with multiple threads, or you didn’t like the concurrency behavior and the instability it could bring, I encourage you to re-evaluate this area when ACE 5.7 is released this week. And if you’ve ever wondered what good professional support services are worth, I did this work for a support customer who is very happy they didn’t have to pay hourly for it. And many more people will be happy that, since I wasn’t billing for time, I could freely fix tangential issues not in the original report, such as the application-resume feature. Everyone wins: the customer’s problem is resolved and ACE’s overall product quality and functionality are improved. Enjoy!

Analysis of ACE_Proactor Shortcomings on UNIX

January 22, 2009

I’ve been looking into two related issues in the ACE development stream:

  1. SSL_Asynch_Stream_Test times out on HP-UX (I recently made a bunch of fixes to the test itself, so it runs as well as it can on Linux, but it still times out on HP-UX)
  2. Proactor_Test shows a stray, intermittent diagnostic on HP-UX: EINVAL returned from aio_suspend()

Although I’ve previously discussed the use of ACE_Proactor on Linux (https://stevehuston.wordpress.com/2008/11/25/when-is-it-ok-to-use-ace-proactor-on-linux/), the issues on HP-UX are of a different sort. If the previously discussed aio issues inside Linux are ever resolved, the same problem I’m seeing on HP-UX may arise there as well, but today it doesn’t get that far. I also suspect that the issues arising when these tests run on Solaris are of the same nature, though the symptoms are a bit different.

The symptoms are that the proactor event loop either fails to detect completions, or it gets random errors that smell like the aiocb list is damaged. I believe I’ve got a decent idea of what’s going on, and it’s basically two issues:

  1. If all of the completion dispatch threads are blocked waiting for completions when new I/O is initiated, the new operation(s) are not taken into account by the threads waiting for completions (sketched below). This is basically the case in the SSL_Asynch_Stream_Test timeout on HP-UX – all the completion-detecting threads are already running before any I/O is initiated, so no completions are ever detected.
  2. The completion and initiation activities modify the aiocb list used to detect completions directly, without interlocks, and without consideration of what effect the modifications may have (or not) on the threads waiting for completions.
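
To make issue #1 concrete, here’s the shape of the problem in plain POSIX aio (illustrative, not the ACE internals):

#include <aio.h>
#include <stddef.h>

// A completion-dispatch thread blocks on a snapshot of the aiocb list.
void wait_for_completions (struct aiocb *snapshot[], int count)
{
  // Sleeps on exactly these 'count' operations. An aio_read() or
  // aio_write() started by another thread AFTER this call is not in
  // 'snapshot', so this thread can sleep forever while that newer
  // operation completes unnoticed.
  aio_suspend ((const struct aiocb *const *) snapshot, count, NULL);
}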

The ACE_Reactor framework uses internal notifications to unblock waiting demultiplexing threads so they can re-examine the handle set as needed; something similar is needed in ACE_Proactor to remedy issue #1 above. There is a notification pipe facility in the Proactor code, but I need to see if it can be used in this case. I hope so…

The other problem – concurrent access to the aiocb list by threads both waiting for completions and modifying the list – is much larger. It requires a more fundamental change to the innards of the POSIX Proactor implementation.

Note that there are a number of POSIX Proactor flavors inside ACE (section 8.5 in C++NPv2 describes most of them). The particular shortcomings I’ve noted here only affect the ACE_POSIX_AIOCB_Proactor and ACE_POSIX_SIG_Proactor, which is largely based on the ACE_POSIX_AIOCB_Proactor. The newest one, ACE_POSIX_CB_Proactor, is much less affected, but is not as widely available.

So, the Proactor situation on UNIX platforms is generally not too good for demanding applications. Again, Proactor on Windows is very good, and recommended for high-performance, highly scalable networked applications. On Linux, stick to ACE_Reactor using the ACE_Dev_Poll_Reactor implementation; on other systems, stick with ACE_Reactor and ACE_Select_Reactor or ACE_TP_Reactor depending on your need for multithreaded dispatching.

When is it ok to use ACE Proactor on Linux?

November 25, 2008

The ACE Proactor framework (see C++NPv2 chapter 8, APG chapter 8) allows multiple I/O operations to be initiated and completed by a single thread. The idea is a good one, allowing a small number of threads to execute more I/O operations than could be done synchronously, since the OS handles the actual transfers in the background. Many Windows programmers use this paradigm with overlapped I/O very effectively.

The overlapped I/O facility is used by ACE Proactor on Windows, and when the time comes for many to port their ACE-based application to Linux (or Solaris, or HP-UX, or AIX, or…) they naturally gravitate toward carrying the Proactor model to Linux. Seems safe, since Linux offers the aio facility, so off they go.

And then it happens: I/O locks up and all progress stops. Why? Because the aio facility upon which ACE Proactor builds is very restricted for socket I/O on Linux (at least through the Linux versions I’ve worked on). The issue is that I/O operations initiated from the application using aio are silently converted to synchronous operations and executed in order, per handle. To see why this is a problem, consider the following common asynch I/O idiom:

  1. Open socket
  2. Initiate a read (whether expecting data or desiring to sense a broken connection)
  3. Initiate a write that should immediately complete

When the aio operations are converted to synchronous operations, the read executes first, in a blocking manner. The write (which really has data to send) will not execute until the read completes. Now consider the situation where the peer is waiting for an initial protocol exchange before sending any data: the peer is waiting for the local end to send, but the local end’s data won’t actually be sent until the peer sends first. That will never happen, and we have deadlock.
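
In ACE Proactor terms, the idiom looks roughly like this (an abbreviated sketch; the class and the read_blk/msg_blk message blocks are assumptions for illustration, and error handling is omitted):

#include "ace/Asynch_IO.h"
#include "ace/Message_Block.h"

class Peer_Handler : public ACE_Handler
{
public:
  void start (ACE_HANDLE handle,
              ACE_Message_Block *read_blk,
              ACE_Message_Block *msg_blk)
  {
    this->reader_.open (*this, handle);   // step 1: socket is open
    this->writer_.open (*this, handle);

    this->reader_.read (*read_blk, read_blk->space ());   // step 2
    this->writer_.write (*msg_blk, msg_blk->length ());   // step 3

    // On Linux aio, both operations silently become synchronous and run
    // in order per handle: the write queues behind the blocking read, so
    // nothing is sent until the peer sends first. If the peer is also
    // waiting for us to speak first: deadlock.
  }

private:
  ACE_Asynch_Read_Stream reader_;
  ACE_Asynch_Write_Stream writer_;
};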

The only way to make a Proactor-based application work on Linux is to follow a strict lock-step protocol – a ping-pong model, if you will. Each side may have only one operation outstanding on the socket at a time. This is fairly fragile and doesn’t suit many applications well, but if you do have such an application model, you can safely use ACE Proactor on Linux.

Note that the aio facility – and, thus, ACE Proactor – on HP-UX, AIX, etc. does not suffer from this “silently converts ops to synchronous” problem, so these restrictions don’t apply there.

For an example of how to program this lock-step protocol arrangement, see the ACE_wrappers/tests/Proactor_Test.cpp program – it has an option to run half-duplex.