Resolving the CPU-bound ACE_Dev_Poll_Reactor Problem, and more

I previously wrote about improvements to ACE_Dev_Poll_Reactor I made for ACE 5.7. The improvements were important for large-scale uses of ACE_Dev_Poll_Reactor, but introduced a problem where some applications went CPU bound, particularly on CentOS. I have made further improvements in ACE_Dev_Poll_Reactor to resolve the CPU-bound issue as well as to further improve performance. These changes will be in the ACE 5.7.7 micro release; the customer that funded the improvements is running load and performance tests on them now.

Here’s what was changed to improve the performance:

  • Change the notify handler so it’s not suspended/resumed around callbacks like normal event handlers are.
  • Delay resuming an auto-suspended handle until the next call to epoll_wait().

I’ll describe each change in more detail below.

Don’t Suspend/Resume the Notify Handler

All of the Reactor implementations in ACE have an event handler that responds to reactor notifications. Most of the implementations (such as ACE_Select_Reactor and ACE_TP_Reactor) pay special attention to the notify handler when dispatching events because notifications are always dispatched before I/O events. ACE_Dev_Poll_Reactor, however, does not make the same effort to dispatch notifications before I/O; they’re intermixed as the epoll facility dequeues events in response to epoll_wait() calls. Thus, there was little special-cased code for the notify handler in the event dispatching path. When event handler dispatching was changed to automatically suspend and resume handlers around upcalls, the notify handler was suspended and resumed along with everything else. This is where the CPU-bound issue came in: when a dispatched callback returns to the reactor, the dispatching thread needs to reacquire the reactor token so it can change the internal reactor state required to verify the handler and resume it. Acquiring the reactor token can involve a reactor notification if another thread is currently executing the event dispatching loop. (Can you see it coming?) It was possible for the notify handler to be resumed, which caused a notify, which dispatched the notify handler, which required another resume, which caused a notify, which… ad infinitum.

The way I resolved this was to simply not suspend/resume the notify handler. This removed the source of the infinite notifications and CPU times came back down quickly.
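As a rough illustration, here’s a minimal sketch of the fix (my sketch, not the actual ACE source; the names notify_handler_, suspend_i, and resume_i are illustrative stand-ins):

    #include "ace/Event_Handler.h"

    // Sketch only: skip the automatic suspend/resume bracketing when the
    // handler being dispatched is the reactor's internal notify handler,
    // since resuming it can itself trigger another notification.
    struct Dispatch_Sketch
    {
      ACE_Event_Handler *notify_handler_;  // the reactor's notify handler

      int dispatch_io_event (ACE_Event_Handler *eh, ACE_HANDLE handle)
      {
        bool const is_notify = (eh == this->notify_handler_);
        if (!is_notify)
          this->suspend_i (handle);                    // auto-suspend
        int const status = eh->handle_input (handle);  // upcall; token released
        if (!is_notify && status >= 0)
          this->resume_i (handle);  // must reacquire the token; may notify
        return status;
      }

      void suspend_i (ACE_HANDLE) {}  // stand-ins for the real epoll_ctl work
      void resume_i (ACE_HANDLE) {}
    };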

Delay Resuming an Auto-Suspended Handle

Before beginning the performance improvement work, I wrote a new test, Reactor_Fairness_Test. This test uses a number of threads to run the reactor event loop and drives traffic at a set of TCP sockets as fast as possible for a fixed period of time. At the end of the time period, the number of data chunks received at each socket is compared; the counts should all be pretty close. I initially ran this test with ACE_Select_Reactor (one dispatching thread), ACE_TP_Reactor, and ACE_Dev_Poll_Reactor. This was important because the initial customer issue I was working on was related to fairness in dispatching events. ACE_Dev_Poll_Reactor’s fairness was already very good, but the performance needed to go up.

With the notify changes from above, the ACE_Dev_Poll_Reactor performance went up, to slightly better than ACE_TP_Reactor (and the test uses a relatively small number of sockets). However, while examining strace output for the test run I noticed that there were still many notifies causing a lot of event dispatching that was slowing the test down.

As I described above, when the reactor needs to resume a handler after its callback completes, it must reacquire the reactor token (the token is released during the event callback to the handler). This often requires a notify, but even when it doesn’t, the dispatching thread must wait for the token just to change some state, then release the token, then go around the event processing loop again, which requires it to wait for the token yet again: a lot of token thrashing that would be great to remove.

The plan I settled on was to keep a list of handlers that need to be resumed: instead of immediately resuming a handler upon return from the upcall, the dispatching thread adds the handler to the to-be-resumed list. This only requires a mutex instead of the reactor token, so there’s no possibility of triggering another notify, and there’s little contention for the mutex elsewhere in the code. The dispatching thread can quickly add the entry to the list and get back in line to dispatch more events.

The second part of the scheme is that a thread about to call epoll_wait() for the next event first walks the to-be-resumed list (while holding the reactor token it already acquired on the way to epoll_wait()) and resumes any handlers in the list that are still valid; a handler may have been canceled or explicitly resumed by the application in the meantime.
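Here’s a minimal sketch of the scheme (illustrative only, not the actual ACE source; resume_if_valid stands in for the real validity check and epoll re-arm):

    #include <list>
    #include "ace/Event_Handler.h"
    #include "ace/Thread_Mutex.h"
    #include "ace/Guard_T.h"

    // Sketch of the deferred-resume idea.
    struct Deferred_Resume_Sketch
    {
      std::list<ACE_HANDLE> to_be_resumed_;
      ACE_Thread_Mutex lock_;  // a plain mutex, not the reactor token

      // Called by a dispatching thread right after a handler's upcall
      // returns; no reactor token (and therefore no notify) is needed.
      void defer_resume (ACE_HANDLE h)
      {
        ACE_Guard<ACE_Thread_Mutex> guard (this->lock_);
        this->to_be_resumed_.push_back (h);
      }

      // Called by the thread about to call epoll_wait(), while it already
      // holds the reactor token.
      void drain (void (*resume_if_valid) (ACE_HANDLE))
      {
        ACE_Guard<ACE_Thread_Mutex> guard (this->lock_);
        while (!this->to_be_resumed_.empty ())
          {
            resume_if_valid (this->to_be_resumed_.front ());
            this->to_be_resumed_.pop_front ();
          }
      }
    };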

After this improvement was made, my reactor fairness test still showed excellent fairness for ACE_Dev_Poll_Reactor, but with about twice the throughput and roughly half the CPU usage. These results were gathered with less-than-scientific measurements and a specific usage pattern, so your mileage may vary. But if you’ve been scared away from ACE_Dev_Poll_Reactor by the discussions of CPU-bound applications getting poor performance, it’s time to take another look at ACE_Dev_Poll_Reactor.


9 Responses to “Resolving the CPU-bound ACE_Dev_Poll_Reactor Problem, and more”

  1. Koh Says:

    Hi Steve,

    I am really excited to use your Dev_Poll_Reactor because it seems really efficient.
    I’m now evaluating ACE_Dev_Poll_Reactor with $ACE_ROOT/tests/Reactor_Fairness_Test.cpp.

    Some questions arose while I was reading the source code of Dev_Poll_Reactor.cpp,
    TP_Reactor.cpp, and Select_Reactor_T.cpp.

    1) When I use Dev_Poll_Reactor or TP_Reactor,
    do I need to follow Design Rule 16 in

    http://ftp.icm.edu.pl/packages/ace/ACE/PDF/reactor-rules.pdf

    which says “Do not make blocking calls to other
    threads in handle_* methods if these threads will directly or indirectly
    call back into the same ACE Reactor.”?

    I think the answer is “No”, because both Dev_Poll_Reactor and TP_Reactor guard
    their dispatching methods with an ACE_Token.

    2) When my handle_input is called back from Dev_Poll_Reactor, can I call
    resume_handler as soon as the handler has received/read data from the HANDLE,
    i.e., just after calling recv/read? If I can do this, I think the reactor’s
    latency will improve when data arrives frequently on the same socket.

    Is my assumption correct, and is such use of resume_handler the expected usage?
    I am not sure because I encountered the problem below.

    3) When I evaluate Dev_Poll_Reactor with the modified Reactor_Fairness_Test.cpp attached
    (I will send this cpp code via email after you reply to me),
    I found that ACE::send in ACE_Dev_Poll_Reactor_Notify::notify blocks at line 164
    and appears to deadlock with resume_handler.
    This leads to two questions.

    One, ACE_HAS_REACTOR_NOTIFICATION_QUEUE is not defined in config.h;
    is that what you expect in the current version, 5.8.2?

    Two, the comment around line 152 says “the pipe is already in nonblocking mode
    and all we want is one attempt”, but I could not find where the pipe is set
    to nonblocking mode. Which code puts the pipe into nonblocking mode?

    Here is the PRF.

    ACE VERSION: 5.8.2

    HOST MACHINE and OPERATING SYSTEM: Intel Core 2 Duo, Ubuntu 10.04.1-desktop-amd64

    TARGET MACHINE and OPERATING SYSTEM, if different from HOST: same
    COMPILER NAME AND VERSION (AND PATCHLEVEL): gcc 4.4.3

    THE $ACE_ROOT/ace/config.h FILE [if you use a link to a platform-
    specific file, simply state which one]: config-linux.h

    THE $ACE_ROOT/include/makeinclude/platform_macros.GNU FILE [if you
    use a link to a platform-specific file, simply state which one
    (unless this isn’t used in this case, e.g., with Microsoft Visual
    C++)]: platform-linux.GNU

    CONTENTS OF $ACE_ROOT/bin/MakeProjectCreator/config/default.features
    (used by MPC when you generate your own makefiles):

    AREA/CLASS/EXAMPLE AFFECTED:

    Dev_Poll_Reactor.cpp / ACE_Dev_Poll_Reactor_Notify::notify / Reactor_Fairness_Test.cpp
    TP_Reactor.cpp

    SYNOPSIS:
    Same as the three questions above.

    DESCRIPTION:
    The application in short
    ————————-
    The test program is the modified Reactor_Fairness_Test.cpp (attached).

    Problem cause
    ————–
    A deadlock may occur between ACE_Dev_Poll_Reactor_Notify::notify and the
    resume_handler call made by the user application.

    I would appreciate any advice on using Dev_Poll_Reactor effectively.

    Thanks,

    Koh

    • stevehuston Says:

      Hi Koh,
      Thanks for your interest and feedback. For your questions:

      1. The referenced PDF is from an old paper. I recommend you use current recommendations in C++NPv2 and APG (both books described further at http://www.riverace.com/acebooks/ ). That rule is not important now, because the TP and Dev_Poll reactors don’t hold the token around the upcall as the paper describes.
      2. It is possible to resume your handler before returning from the upcall, but you should tell the reactor that’s what you will be doing; otherwise the reactor will also try to resume the handler upon return from the callback. To do this, override the ACE_Event_Handler::resume_handler(void) method and return the value ACE_Event_Handler::ACE_APPLICATION_RESUMES_HANDLER. See the documentation in Event_Handler.h for further instructions; a sketch of this pattern follows below.
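      For illustration, a minimal handler along those lines might look like this (my sketch, not code from ACE or the fairness test; the peer/buffer handling is hypothetical):

        #include "ace/Event_Handler.h"
        #include "ace/Reactor.h"
        #include "ace/SOCK_Stream.h"

        class Early_Resume_Handler : public ACE_Event_Handler
        {
        public:
          // Tell the reactor the application resumes this handler itself.
          virtual int resume_handler (void)
          {
            return ACE_Event_Handler::ACE_APPLICATION_RESUMES_HANDLER;
          }

          virtual int handle_input (ACE_HANDLE)
          {
            char buf[4096];
            ssize_t const n = this->peer_.recv (buf, sizeof buf);
            if (n <= 0)
              return -1;  // error/EOF; the reactor removes the handler

            // Resume right after the read so further data on this socket
            // can be dispatched (possibly by another thread) while this
            // thread processes buf.
            this->reactor ()->resume_handler (this);

            // ... process buf[0 .. n) ...
            return 0;
          }

        private:
          ACE_SOCK_Stream peer_;
        };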

      If you still have problems after correcting your code, please raise the issue on the ace-users@list.isis.vanderbilt.edu mailing list.

  2. Koh Says:

    Thank you very much, Steve.

    I understand now.
    I can avoid the deadlock I encountered in my test code:
    if the ACE_Event_Handler::resume_handler(void) method returns ACE_Event_Handler::ACE_APPLICATION_RESUMES_HANDLER, then Dev_Poll_Reactor does not insert the event handler into to_be_resumed_ and does not call this->notify().

    In my evaluation, Dev_Poll_Reactor shows the fastest throughput of the three reactors (Select_Reactor, TP_Reactor, and Dev_Poll_Reactor) when it runs with multiple threads on a multi-core processor and the event handler controls its own resumption via resume_handler.

    Thanks!

  3. Koh Says:

    Hi Steve,

    Dev_Poll_Reactor gives us the fastest throughput among the three reactors.
    However, the epoll version of Dev_Poll_Reactor can perform even better.
    I think my modifications below work well, but I would really appreciate
    your feedback on them.

    (0) Token degrades the performance

    I found that the ACE_Dev_Poll_Reactor_Token degrades a process’s performance
    when the reactor is waited on by leader/followers threads in a multi-core environment.

    When some thread calls schedule_timer, cancel_timer, or resume_handler on a reactor
    that is being waited on by multiple threads (leader/followers):

    1) the token sends a notification to the leader thread, and the caller’s thread waits to acquire the token,
    2) the leader wakes up from epoll_wait and releases the token,
    3) the caller then acquires the token,
    4) the caller’s thread does its work (the implementation of schedule_timer, cancel_timer, etc.),
    5) the caller’s thread releases the token,
    6) one of the followers acquires the token and becomes the leader.

    There are two performance problems.
    One is that the caller’s thread must wait until the leader wakes.
    The other is that the leader and followers always consume CPU time when the token is released.

    In my software these two problems reduce performance,
    and the solutions below improve it by 15% – 20%.

    (1) Improve schedule_timer

    If the earliest_time has been updated, the epoll_wait timeout must be updated to the
    new earliest_time, so ACE_Dev_Poll_Reactor::schedule_timer must call
    ACE_MT (ACE_GUARD_RETURN (ACE_Dev_Poll_Reactor_Token, mon, this->token_, -1));
    to notify the leader of the change. (Really, you just need to notify() the reactor.)
    But if the earliest_time has not changed, just put the timer node into the timer_queue;
    that operation is guarded by the queue’s own lock, and schedule_timer needs no token operation.

    In most cases the timer node will be put at the tail of the queue,
    so this method reduces the chance of waiting on another thread and improves efficiency,
    as in the sketch below.
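    A sketch of this idea (illustrative only, not a patch against the real implementation):

      #include "ace/Timer_Queue.h"
      #include "ace/Reactor.h"

      // Wake the leader thread only when the new timer becomes the earliest
      // deadline; otherwise the timer queue's own lock is sufficient and no
      // token acquisition is needed.
      long schedule_timer_sketch (ACE_Timer_Queue &queue,
                                  ACE_Reactor &reactor,
                                  ACE_Event_Handler *eh,
                                  const void *arg,
                                  const ACE_Time_Value &delay,
                                  const ACE_Time_Value &interval)
      {
        bool const was_empty = queue.is_empty ();
        ACE_Time_Value const old_earliest =
          was_empty ? ACE_Time_Value::max_time : queue.earliest_time ();

        // The queue's own lock guards this insertion.
        long const id = queue.schedule (eh, arg,
                                        queue.gettimeofday () + delay,
                                        interval);

        // Notify the thread blocked in epoll_wait() only if the earliest
        // deadline moved closer.
        if (id != -1
            && (was_empty || queue.earliest_time () < old_earliest))
          reactor.notify ();
        return id;
      }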

    (2) Improve cancel_timer

    In any case, all that needs to be done is to erase the timer node from the timer_queue.
    epoll_wait may still fire at the originally scheduled time, but that is harmless,
    because handling the stale timeout costs about as much as the notification handling
    that the token-based approach triggers anyway.

    (3) Improve resume_handler

    Call
    ACE_MT (ACE_GUARD_RETURN (ACE_SYNCH_MUTEX, mon, to_be_resumed_lock_, -1));
    instead of
    ACE_MT (ACE_GUARD_RETURN (ACE_Dev_Poll_Reactor_Token, mon, this->token_, -1));
    in resume_handler, and remove

    if (!info->suspended)
      return 0;

    from ACE_Dev_Poll_Reactor::resume_handler_i.

    Since epoll_ctl is thread-safe with respect to epoll_wait,
    I think you do not need to wake the leader thread.

    I suppose the EPOLLONESHOT mechanism of epoll_wait and “info->suspended = true”
    need to be atomic, and that you wanted to reduce the number of epoll_ctl calls,
    so you defined the critical section of resume_handler in terms of the token.
    But if calling epoll_ctl is less expensive than waiting for the leader thread,
    and epoll_ctl allows a duplicate EPOLL_CTL_MOD on the same fd,
    then I think the solution I proposed above works, as sketched below.
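    For example, the re-arm could call epoll_ctl directly while holding only the small mutex (a sketch of the idea, not a tested patch; error handling omitted):

      #include <stdint.h>
      #include <sys/epoll.h>

      // Re-arm a one-shot fd directly. EPOLL_CTL_MOD may be called
      // concurrently with epoll_wait(), and a duplicate MOD on the same fd
      // is accepted, so the reactor token is not needed here.
      int rearm_handle (int epoll_fd, int fd, uint32_t wait_mask)
      {
        struct epoll_event ev;
        ev.events = wait_mask | EPOLLONESHOT;
        ev.data.fd = fd;
        return ::epoll_ctl (epoll_fd, EPOLL_CTL_MOD, fd, &ev);
      }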

    What do you think about the performance improvements I proposed?

    Regards,
    Koh

    • stevehuston Says:

      Those are good insights, Koh. Thank you for taking the time to respond with your ideas. Could you please formulate a patch incorporating your ideas and submit it as an improvement request to ACE at http://bugzilla.dre.vanderbilt.edu/ – that will ensure it gets looked at, evaluated, and tracked properly.


    • stevehuston Says:

      Koh, I added some improvements in the way ACE_Dev_Poll_Reactor runs with multiple threads, taking advantage of the concurrency characteristics of the epoll facility. I also used your idea for canceling timers – thank you for that. Performance on the Reactor_Fairness_Test improved about 35%. These changes will be in ACE 6.2.

  4. Abhinav Says:

    Hi Steve,

    I have migrated my current TP reactor framework for a process A to ACE 6.0, where I want to use Dev/Poll. Process A listens on socket 12345, processes each message, and sends an IPC to process B using a client socket. Process B then replies via IPC on another server socket of process A, 12346. Once process A receives B’s response, it forwards the response on socket 12345.

    All of this currently uses a single TP reactor for process A and no worker threads. After migrating to the Dev/Poll reactor I want to activate 2 threads using activate() so that there is some concurrency in handling data from port 12345, since process A can currently handle only 7000 transactions per second.

    Please let me know how I can increase this using Dev/Poll, or whether the current single-reactor design is the bottleneck.

    Regards,
    Abhinav

    • stevehuston Says:

      Both ACE_TP_Reactor and ACE_Dev_Poll_Reactor can be used with multiple threads. You would spawn your threads and have each of them call ACE_Reactor::run_reactor_event_loop(). There is an example of this in Section 4.3 of C++NPv2; a brief sketch follows below.
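      A minimal sketch of that structure (assuming a Linux build where ACE_Dev_Poll_Reactor is available; handler registration is elided):

        #include "ace/Dev_Poll_Reactor.h"
        #include "ace/Reactor.h"
        #include "ace/Task.h"
        #include "ace/OS_main.h"

        // Each activated thread runs the reactor's event loop, giving
        // ACE_Dev_Poll_Reactor a pool of dispatching threads.
        class Event_Loop_Task : public ACE_Task_Base
        {
        public:
          Event_Loop_Task (ACE_Reactor *r) : reactor_ (r) {}

          virtual int svc (void)
          {
            return this->reactor_->run_reactor_event_loop ();
          }

        private:
          ACE_Reactor *reactor_;
        };

        int ACE_TMAIN (int, ACE_TCHAR *[])
        {
          // Wrap an ACE_Dev_Poll_Reactor implementation in the ACE_Reactor facade.
          ACE_Reactor reactor (new ACE_Dev_Poll_Reactor, 1 /* delete impl */);

          Event_Loop_Task task (&reactor);
          task.activate (THR_NEW_LWP | THR_JOINABLE, 2);  // two event loop threads

          // ... register event handlers with the reactor here ...

          task.wait ();  // returns after reactor.end_reactor_event_loop () is called
          return 0;
        }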

      • Abhinav Says:

        Hi Steve
        Thanks for your reply!
        I have one problem with spawning multiple threads: they are stateful. If a thread accepts a request on port 12345, it forwards the request to process B, and I should receive the response on the same thread that forwarded the request. In such a scenario I need to make sure that when process B accepts a connection from a client of process A and sends the response, the IPC server socket of process A (12346) hands the response to the thread that has the connection established to process B.

        Also, I have multiple process B instances with one reactor and one server socket. Process A can be made multi-threaded, dedicating each thread to one instance of process B, but all threads of process A share a single socket.

        Is this design possible with the ACE TP reactor, or is having stateful threads a wrong idea?
