• D
    ipc/mqueue: add rbtree node caching support · ce2d52cc
    Doug Ledford 提交于
    When I wrote the first patch that added the rbtree support for message
    queue insertion, it sped up the case where the queue was very full
    drastically from the original code.  It, however, slowed down the case
    where the queue was empty (not drastically though).
    
    This patch caches the last freed rbtree node struct so we can quickly
    reuse it when we get a new message.  This is the common path for any queue
    that very frequently goes from 0 to 1 then back to 0 messages in queue.
    
    Andrew Morton didn't like that we were doing a GFP_ATOMIC allocation in
    msg_insert, so this patch attempts to speculatively allocate a new node
    struct outside of the spin lock when we know we need it, but will still
    fall back to a GFP_ATOMIC allocation if it has to.
    
    Once I added the caching, the necessary various ret = ; spin_unlock
    gyrations in mq_timedsend were getting pretty ugly, so this also slightly
    refactors that function to streamline the flow of the code and the
    function exit.
    
    Finally, while working on getting performance back I made sure that all of
    the node structs were always fully initialized when they were first used,
    rendering the use of kzalloc unnecessary and a waste of CPU cycles.
    
    The net result of all of this is:
    
    1) We will avoid a GFP_ATOMIC allocation when possible, but fall back
       on it when necessary.
    
    2) We will speculatively allocate a node struct using GFP_KERNEL if our
       cache is empty (and save the struct to our cache if it's still empty
       after we have obtained the spin lock).
    
    3) The performance of the common queue empty case has significantly
       improved and is now much more in line with the older performance for
       this case.
    
    The performance changes are:
    
                Old mqueue      new mqueue      new mqueue + caching
    queue empty
    send/recv   305/288ns       349/318ns       310/322ns
    
    I don't think we'll ever be able to get the recv performance back, but
    that's because the old recv performance was a direct result and
    consequence of the old methods abysmal send performance.  The recv path
    simply must do more so that the send path does not incur such a penalty
    under higher queue depths.
    
    As it turns out, the new caching code also sped up the various queue full
    cases relative to my last patch.  That could be because of the difference
    between the syscall path in 3.3.4-rc5 and 3.3.4-rc6, or because of the
    change in code flow in the mq_timedsend routine.  Regardless, I'll take
    it.  It wasn't huge, and I *would* say it was within the margin for error,
    but after many repeated runs what I'm seeing is that the old numbers trend
    slightly higher (about 10 to 20ns depending on which test is the one
    running).
    
    [akpm@linux-foundation.org: checkpatch fixes]
    Signed-off-by: NDoug Ledford <dledford@redhat.com>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Manfred Spraul <manfred@colorfullife.com>
    Cc: Stephen Rothwell <sfr@canb.auug.org.au>
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
    ce2d52cc
mqueue.c 35.6 KB