I previously wrote about improvements to ACE_Dev_Poll_Reactor I made for ACE 5.7. The improvements were important for large-scale uses of ACE_Dev_Poll_Reactor, but introduced a problem where some applications went CPU bound, particularly on CentOS. I have made further improvements in ACE_Dev_Poll_Reactor to resolve the CPU-bound issue as well as to further improve performance. These changes will be in the ACE 5.7.7 micro release; the customer that funded the improvements is running load and performance tests on them now.
Here’s what was changed to improve the performance:
- Change the notify handler so it’s not suspended/resumed around callbacks like normal event handlers are.
- Delay resuming an auto-suspended handle until the next call to epoll_wait().
I’ll describe each point in more detail below.
Don’t Suspend/Resume the Notify Handler
All of the Reactor implementations in ACE have an event handler that responds to reactor notifications. Most of the implementations (such as ACE_Select_Reactor and ACE_TP_Reactor) pay special attention to the notify handler when dispatching events because notifications are always dispatched before I/O events. ACE_Dev_Poll_Reactor, however, makes no such effort to dispatch notifications before I/O; they’re intermixed as the epoll facility dequeues events in response to epoll_wait() calls. Thus, there was very little special-case code for the notify handler in the event dispatching path, and when event handler dispatching was changed to automatically suspend and resume handlers around upcalls, the notify handler was suspended and resumed along with everything else. This is actually where the CPU-bound issues came in: when a dispatched callback returns to the reactor, the dispatching thread must reacquire the reactor token so it can update the internal reactor state needed to verify the handler and resume it. Acquiring the reactor token can involve a reactor notification if another thread is currently executing the event dispatching loop. (Can you see it coming?) It was possible for the notify handler to be resumed, which caused a notify, which dispatched the notify handler, which required another resume, which caused a notify, which… ad infinitum.
The way I resolved this was simply to not suspend/resume the notify handler. This removed the source of the infinite notifications, and CPU times came back down quickly.
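To make the mechanics concrete, here’s a highly simplified model of the dispatch path and the fix. This is not the actual ACE_Dev_Poll_Reactor source; the class and member names below (Dev_Poll_Reactor_Model, reacquire_token, and so on) are illustrative assumptions only.

```cpp
// Highly simplified model of the dispatch path - illustrative only.
#include <iostream>

class Event_Handler {
public:
  virtual ~Event_Handler () {}
  virtual int handle_input (int fd) = 0;
};

class Dev_Poll_Reactor_Model {
public:
  explicit Dev_Poll_Reactor_Model (Event_Handler *notify_handler)
    : notify_handler_ (notify_handler) {}

  void dispatch (Event_Handler *eh, int fd)
  {
    // The fix: the notify handler is exempt from the automatic
    // suspend/resume protocol. Resuming it requires reacquiring the
    // reactor token, which can trigger a notify, which dispatches the
    // notify handler again... ad infinitum.
    bool const auto_suspend = (eh != this->notify_handler_);

    if (auto_suspend)
      this->suspend_handler (eh);      // remove fd from the epoll set

    // The reactor token is released around the upcall so other
    // threads can continue dispatching events.
    eh->handle_input (fd);

    if (auto_suspend)
      {
        this->reacquire_token ();      // may notify the event loop owner
        this->resume_handler (eh);     // put fd back in the epoll set
      }
  }

private:
  void suspend_handler (Event_Handler *) {}
  void resume_handler (Event_Handler *) {}
  void reacquire_token () {}

  Event_Handler *notify_handler_;
};

struct Trivial_Handler : Event_Handler {
  int handle_input (int fd) { std::cout << "fd " << fd << '\n'; return 0; }
};

int main ()
{
  Trivial_Handler notify, io;
  Dev_Poll_Reactor_Model reactor (&notify);
  reactor.dispatch (&io, 42);     // suspended/resumed around the upcall
  reactor.dispatch (&notify, 7);  // exempt: no resume, no runaway notifies
  return 0;
}
```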
Delay Resuming an Auto-Suspended Handle
Before beginning the performance improvement work, I wrote a new test, Reactor_Fairness_Test. This test uses a number of threads to run the reactor event loop and drives traffic at a set of TCP sockets as fast as possible for a fixed period of time. At the end of the time period, the number of data chunks received at each socket is compared; the counts should all be fairly close. I initially ran this test with ACE_Select_Reactor (one dispatching thread), ACE_TP_Reactor, and ACE_Dev_Poll_Reactor. This was important because the initial customer issue I was working on related to fairness in dispatching events. ACE_Dev_Poll_Reactor’s fairness was very good, but the performance needed to go up.
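For reference, here is roughly the shape of such a harness. This is a minimal sketch, not the actual Reactor_Fairness_Test source, and it assumes a Linux build of ACE where ACE_Dev_Poll_Reactor is available (ACE_HAS_EVENT_POLL):

```cpp
// Minimal sketch of the test's shape - not the actual
// Reactor_Fairness_Test source.
#include "ace/Reactor.h"
#include "ace/Dev_Poll_Reactor.h"
#include "ace/Thread_Manager.h"
#include "ace/OS_NS_unistd.h"
#include "ace/OS_main.h"

static ACE_THR_FUNC_RETURN
event_loop (void *arg)
{
  ACE_Reactor *reactor = static_cast<ACE_Reactor *> (arg);
  reactor->run_reactor_event_loop ();
  return 0;
}

int
ACE_TMAIN (int, ACE_TCHAR *[])
{
  // Wrap an ACE_Dev_Poll_Reactor implementation; "true" tells
  // ACE_Reactor to delete the implementation when it's destroyed.
  ACE_Reactor reactor (new ACE_Dev_Poll_Reactor, true);

  // ... register one event handler per TCP socket here; each handler
  // counts the data chunks it receives ...

  // A number of threads run the reactor event loop concurrently.
  ACE_Thread_Manager::instance ()->spawn_n (4, event_loop, &reactor);

  ACE_OS::sleep (10);                  // drive traffic for a fixed period
  reactor.end_reactor_event_loop ();   // stop; then compare per-socket counts
  ACE_Thread_Manager::instance ()->wait ();
  return 0;
}
```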
With the notify changes described above, ACE_Dev_Poll_Reactor performance went up to slightly better than ACE_TP_Reactor’s (and the test uses a relatively small number of sockets). However, while examining strace output from the test run, I noticed there were still many notifies, causing a lot of event dispatching that slowed the test down.
As I described above, when the reactor needs to resume a handler after its callback completes, it must acquire the reactor token (the token is released during the event callback to the handler). This often requires a notify, but even when it doesn’t, the dispatching thread must wait for the token just to change some state, release the token, then go around the event processing loop again, which requires it to wait for the token yet again – a lot of token thrashing that would be great to remove.
The plan I settled on was to keep a list of handlers that need to be resumed; instead of immediately resuming a handler upon return from its upcall, the dispatching thread adds the handler to the to-be-resumed list. This requires only a mutex instead of the reactor token, so there’s no possibility of triggering another notify, and there’s little contention for the mutex in other parts of the code. The dispatching thread can quickly add the entry to the list and get back in line for dispatching more events.
The second part of the scheme is that a thread about to call epoll_wait() to get the next event first walks the to-be-resumed list (while holding the reactor token it already acquired in order to get to epoll_wait()) and resumes any handlers on the list that are still valid – they may have been canceled or explicitly resumed by the application in the meantime.
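Here’s a sketch of the idea in isolation. Again, this is an illustrative model rather than the actual ACE code; the names and the locking details are assumptions:

```cpp
// Illustrative model of the deferred-resume scheme - not the actual
// ACE_Dev_Poll_Reactor code.
#include <mutex>
#include <set>

class Event_Handler;  // opaque for this sketch

class Deferred_Resume_Model {
public:
  // Called by a dispatching thread after an upcall returns. Only the
  // list mutex is needed - not the reactor token - so this can never
  // trigger a reactor notify.
  void defer_resume (Event_Handler *eh)
  {
    std::lock_guard<std::mutex> guard (this->list_lock_);
    this->to_be_resumed_.insert (eh);
  }

  // Called with the reactor token already held, just before the thread
  // calls epoll_wait() for the next batch of events.
  void resume_deferred_handlers ()
  {
    std::lock_guard<std::mutex> guard (this->list_lock_);
    for (Event_Handler *eh : this->to_be_resumed_)
      {
        // Skip entries the application has since canceled or
        // explicitly resumed itself.
        if (this->still_suspended_by_reactor (eh))
          this->resume_in_epoll_set (eh);  // e.g., epoll_ctl(EPOLL_CTL_MOD)
      }
    this->to_be_resumed_.clear ();
  }

private:
  bool still_suspended_by_reactor (Event_Handler *) { return true; }
  void resume_in_epoll_set (Event_Handler *) {}

  std::mutex list_lock_;
  std::set<Event_Handler *> to_be_resumed_;
};
```

The important property is that the thread finishing an upcall touches only the mutex, while the actual resumes happen under the reactor token the epoll_wait() caller already holds – no additional token acquisitions, and therefore no extra notifies, are introduced.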
After this improvement was made, the fairness test still showed excellent fairness for ACE_Dev_Poll_Reactor, but with about twice the throughput and roughly half the CPU usage. These results came from less-than-scientific measurements with a specific usage pattern – your mileage may vary. But if you’ve been scared away from ACE_Dev_Poll_Reactor by the discussions of CPU-bound applications getting poor performance, it’s time to take another look at ACE_Dev_Poll_Reactor.