1. 20 Aug 2011 (6 commits)
    • slub: per cpu cache for partial pages · 49e22585
      Committed by Christoph Lameter
      Allow filling out the rest of the kmem_cache_cpu cacheline with pointers
      to partial pages. The partial page list is used in slab_free() to avoid
      taking the per-node lock.
      
      In __slab_alloc() we can then take multiple partial pages off the per
      node partial list in one go, reducing node lock pressure.
      
      We can also use the per cpu partial list in slab_alloc() to avoid scanning
      partial lists for pages with free objects.
      
      The main effect of a per cpu partial list is that the per node list_lock
      is taken for batches of partial pages instead of individual ones.
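      
      For reference, a minimal sketch of the structure this enables (abridged
      from the mm/slub.c of this era; the partial field is the addition):
      
      	struct kmem_cache_cpu {
      		void **freelist;	/* Pointer to next available object */
      		unsigned long tid;	/* Globally unique transaction id */
      		struct page *page;	/* The slab we are allocating from */
      		struct page *partial;	/* New: per cpu list of partial pages */
      	};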
      
      Potential future enhancements:
      
      1. The pickup from the partial list could perhaps be done without disabling
         interrupts with some work. The free path already puts the page into the
         per cpu partial list without disabling interrupts.
      
      2. __slab_free() may have some code paths that could use optimization.
      
      Performance:
      
      				Before		After
      ./hackbench 100 process 200000
      				Time: 1953.047	1564.614
      ./hackbench 100 process 20000
      				Time: 207.176   156.940
      ./hackbench 100 process 20000
      				Time: 204.468	156.940
      ./hackbench 100 process 20000
      				Time: 204.879	158.772
      ./hackbench 10 process 20000
      				Time: 20.153	15.853
      ./hackbench 10 process 20000
      				Time: 20.153	15.986
      ./hackbench 10 process 20000
      				Time: 19.363	16.111
      ./hackbench 1 process 20000
      				Time: 2.518	2.307
      ./hackbench 1 process 20000
      				Time: 2.258	2.339
      ./hackbench 1 process 20000
      				Time: 2.864	2.163
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
    • slub: return object pointer from get_partial() / new_slab(). · 497b66f2
      Committed by Christoph Lameter
      There is no longer any need to return the pointer to a slab page from
      get_partial(), since the page reference can be stored in the "page" field
      of the kmem_cache_cpu structure.
      
      Return an object pointer instead.
      
      That in turn allows a simplification of the spaghetti code in __slab_alloc().
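      
      The interface change, sketched (signatures abridged and illustrative,
      not a verbatim diff):
      
      	/* Before: return the slab page, caller derives the object */
      	static struct page *get_partial(struct kmem_cache *s, gfp_t flags, int node);
      
      	/* After: the page reference lands in the "page" field of
      	   kmem_cache_cpu; the first free object is returned directly */
      	static void *get_partial(struct kmem_cache *s, gfp_t flags, int node);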
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
    • slub: pass kmem_cache_cpu pointer to get_partial() · acd19fd1
      Committed by Christoph Lameter
      Pass the kmem_cache_cpu pointer to get_partial(). That way
      we can avoid the this_cpu_write() statements.
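      
      The effect, sketched (illustrative, not a verbatim diff):
      
      	/* Before: callees reached the per cpu slot indirectly */
      	this_cpu_write(s->cpu_slab->page, page);
      
      	/* After: with the kmem_cache_cpu pointer c passed in,
      	   a plain store suffices */
      	c->page = page;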
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
    • slub: Prepare inuse field in new_slab() · e6e82ea1
      Committed by Christoph Lameter
      inuse will always be set to page->objects. There is no point in
      initializing the field to zero in new_slab() and then overwriting
      the value in __slab_alloc().
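      
      The idea, sketched (abridged; new_slab() now primes the field for the
      common case):
      
      	static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node)
      	{
      		...
      		page->freelist = start;
      		page->inuse = page->objects;	/* was: page->inuse = 0 */
      		return page;
      	}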
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
    • slub: Remove useless statements in __slab_alloc · 7db0d705
      Committed by Christoph Lameter
      Two statements in __slab_alloc() do not have any effect.
      
      1. c->page is already set to NULL by deactivate_slab(), which is called
         right before.
      
      2. gfpflags are already masked in new_slab() before being passed to the
         page allocator. There is no need to mask gfpflags in __slab_alloc(),
         particularly since the most frequent paths through __slab_alloc() do
         not require a gfp mask at all.
      
      Cc: torvalds@linux-foundation.org
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
    • slub: free slabs without holding locks · 69cb8e6b
      Committed by Christoph Lameter
      There are two situations in which slub holds a lock while releasing
      pages:
      
      	A. During kmem_cache_shrink()
      	B. During kmem_cache_close()
      
      For A, build a list while holding the lock and then release the pages
      later. In case B we are the last remaining user of the slab, so there
      is no need to take the list_lock.
      
      After this patch all calls to the page allocator to free pages are
      done without holding any spinlocks. kmem_cache_destroy() will still
      hold the slub_lock semaphore.
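      
      The pattern for case A, sketched (based on the description; the empty
      slabs are collected under the lock and handed to the page allocator
      only after it is dropped):
      
      	LIST_HEAD(discard);
      	struct page *page, *t;
      
      	spin_lock_irqsave(&n->list_lock, flags);
      	list_for_each_entry_safe(page, t, &n->partial, lru)
      		if (!page->inuse)
      			list_move(&page->lru, &discard);  /* defer the free */
      	spin_unlock_irqrestore(&n->list_lock, flags);
      
      	list_for_each_entry_safe(page, t, &discard, lru)
      		discard_slab(s, page);	/* page allocator called without locks */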
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
  2. 10 Aug 2011 (1 commit)
    • slub: Fix partial count comparison confusion · 81107188
      Committed by Christoph Lameter
      deactivate_slab() gets the comparison wrong when checking whether more
      than the minimum number of partial pages are on the partial list. One
      effect of this is that empty pages may not be freed from
      deactivate_slab(). The result could be an OOM due to growth of the
      partial slabs per node. Frees mostly occur from __slab_free(), which is
      unaffected, so this only matters for use cases where a lot of switching
      around of per cpu slabs occurs.
      
      Switching per cpu slabs occurs with high frequency if debugging options are
      enabled.
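      
      The intent of the corrected check, sketched (reconstructed from the
      description; an empty page is handed back only when the node already
      caches more than the minimum number of partial pages):
      
      	if (!new.inuse && n->nr_partial > s->min_partial)
      		m = M_FREE;	/* the comparison previously pointed the wrong way */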
      Reported-and-tested-by: Xiaotian Feng <xtfeng@gmail.com>
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
  3. 09 Aug 2011 (2 commits)
    • slub: fix check_bytes() for slub debugging · ef62fb32
      Committed by Akinobu Mita
      The check_bytes() function is used by slub debugging.  It returns a pointer
      to the first byte in the given memory area that does not match the given
      character.
      
      If the character to match is greater than 0x80, check_bytes() does not
      work, because the 64-bit pattern is generated as below.
      
      	value64 = value | value << 8 | value << 16 | value << 24;
      	value64 = value64 | value64 << 32;
      
      Integer promotions are performed and the result is sign-extended, because
      the type of value is u8.  The upper 32 bits of value64 are 0xffffffff
      after the first line, and the second line has no effect.
      
      This fixes the 64-bit pattern generation.
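      
      One correct construction, sketched: widen to 64 bits first, so that
      every shift operates on an unsigned 64-bit value and no sign extension
      can occur.
      
      	u64 value64 = value;		/* u8 widened to u64 before shifting */
      	value64 |= value64 << 8;
      	value64 |= value64 << 16;
      	value64 |= value64 << 32;	/* all eight bytes now hold value */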
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Reviewed-by: Marcin Slusarz <marcin.slusarz@gmail.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
    • slub: Fix full list corruption if debugging is on · 6fbabb20
      Committed by Christoph Lameter
      When a slab is freed by __slab_free() and the slab can only ever contain
      a single object, then it was full before we reached slab_empty (and
      therefore on the full list rather than the partial lists in the debug
      case).
      
      This caused the following full list corruption when SLUB debugging was enabled:
      
        [ 5913.233035] ------------[ cut here ]------------
        [ 5913.233097] WARNING: at lib/list_debug.c:53 __list_del_entry+0x8d/0x98()
        [ 5913.233101] Hardware name: Adamo 13
        [ 5913.233105] list_del corruption. prev->next should be ffffea000434fd20, but was ffffea0004199520
        [ 5913.233108] Modules linked in: nfs fscache fuse ebtable_nat ebtables ppdev parport_pc lp parport ipt_MASQUERADE iptable_nat nf_nat nfsd lockd nfs_acl auth_rpcgss xt_CHECKSUM sunrpc iptable_mangle bridge stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables rfcomm bnep arc4 iwlagn snd_hda_codec_hdmi snd_hda_codec_idt snd_hda_intel btusb mac80211 snd_hda_codec bluetooth snd_hwdep snd_seq snd_seq_device snd_pcm usb_debug dell_wmi sparse_keymap cdc_ether usbnet cdc_acm uvcvideo cdc_wdm mii cfg80211 snd_timer dell_laptop videodev dcdbas snd microcode v4l2_compat_ioctl32 soundcore joydev tg3 pcspkr snd_page_alloc iTCO_wdt i2c_i801 rfkill iTCO_vendor_support wmi virtio_net kvm_intel kvm ipv6 xts gf128mul dm_crypt i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
        [ 5913.233213] Pid: 0, comm: swapper Not tainted 3.0.0+ #127
        [ 5913.233213] Call Trace:
        [ 5913.233213]  <IRQ>  [<ffffffff8105df18>] warn_slowpath_common+0x83/0x9b
        [ 5913.233213]  [<ffffffff8105dfd3>] warn_slowpath_fmt+0x46/0x48
        [ 5913.233213]  [<ffffffff8127e7c1>] __list_del_entry+0x8d/0x98
        [ 5913.233213]  [<ffffffff8127e7da>] list_del+0xe/0x2d
        [ 5913.233213]  [<ffffffff814e0430>] __slab_free+0x1db/0x235
        [ 5913.233213]  [<ffffffff811706ab>] ? bvec_free_bs+0x35/0x37
        [ 5913.233213]  [<ffffffff811706ab>] ? bvec_free_bs+0x35/0x37
        [ 5913.233213]  [<ffffffff811706ab>] ? bvec_free_bs+0x35/0x37
        [ 5913.233213]  [<ffffffff81133085>] kmem_cache_free+0x88/0x102
        [ 5913.233213]  [<ffffffff811706ab>] bvec_free_bs+0x35/0x37
        [ 5913.233213]  [<ffffffff811706e1>] bio_free+0x34/0x64
        [ 5913.233213]  [<ffffffff813dc390>] dm_bio_destructor+0x12/0x14
        [ 5913.233213]  [<ffffffff8116fef6>] bio_put+0x2b/0x2d
        [ 5913.233213]  [<ffffffff813dccab>] clone_endio+0x9e/0xb4
        [ 5913.233213]  [<ffffffff8116f7dd>] bio_endio+0x2d/0x2f
        [ 5913.233213]  [<ffffffffa00148da>] crypt_dec_pending+0x5c/0x8b [dm_crypt]
        [ 5913.233213]  [<ffffffffa00150a9>] crypt_endio+0x78/0x81 [dm_crypt]
      
      [ Full discussion here: https://lkml.org/lkml/2011/8/4/375 ]
      
      Make sure that we remove such a slab also from the full lists.
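      
      The shape of the fix, sketched (mirroring the slab_empty path described
      above; not a verbatim diff):
      
      	slab_empty:
      		if (prior) {
      			/* Slab was on the partial list */
      			remove_partial(n, page);
      			stat(s, FREE_REMOVE_PARTIAL);
      		} else
      			/* Slab was full: take it off the full list as well */
      			remove_full(s, page);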
      Reported-and-tested-by: Dave Jones <davej@redhat.com>
      Reported-and-tested-by: Xiaotian Feng <xtfeng@gmail.com>
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
  4. 26 Jul 2011 (1 commit)
  5. 21 Jul 2011 (1 commit)
    • treewide: fix potentially dangerous trailing ';' in #defined values/expressions · 497888cf
      Committed by Phil Carmody
      All these are instances of
        #define NAME value;
      or
        #define NAME(params_opt) value;
      
      These of course fail to build when used in contexts like
        if(foo $OP NAME)
        while(bar $OP NAME)
      and may silently generate the wrong code in contexts such as
        foo = NAME + 1;    /* foo = value; + 1; */
        bar = NAME - 1;    /* bar = value; - 1; */
        baz = NAME & quux; /* baz = value; & quux; */
      
      Reported on comp.lang.c,
      Message-ID: <ab0d55fe-25e5-482b-811e-c475aa6065c3@c29g2000yqd.googlegroups.com>
      Initial analysis of the dangers provided by Keith Thompson in that thread.
      
      There are many more instances of more complicated macros having unnecessary
      trailing semicolons, but this pile seems to be all of the cases of simple
      values suffering from the problem. (Thus things that are likely to be found
      in one of the contexts above, more complicated ones aren't.)
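      
      A self-contained illustration of both failure modes (macro and variable
      names are hypothetical):
      
      	#define MAX_RETRIES 5;			/* buggy: trailing ';' */
      
      	int limit = MAX_RETRIES + 1;	/* int limit = 5; + 1;  -- silently 5 */
      
      	if (tries > MAX_RETRIES)	/* if (tries > 5;)  -- fails to build */
      		give_up();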
      Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
  6. 18 Jul 2011 (1 commit)
  7. 08 Jul 2011 (4 commits)
    • SLUB: Fix missing <linux/stacktrace.h> include · bfa71457
      Committed by Pekka Enberg
      This fixes the following build breakage introduced by commit d6543e39
      ("slub: Enable backtrace for create/delete points"):
      
        CC      mm/slub.o
      mm/slub.c: In function ‘set_track’:
      mm/slub.c:428: error: storage size of ‘trace’ isn’t known
      mm/slub.c:435: error: implicit declaration of function ‘save_stack_trace’
      mm/slub.c:428: warning: unused variable ‘trace’
      make[1]: *** [mm/slub.o] Error 1
      make: *** [mm/slub.o] Error 2
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
    • slub: reduce overhead of slub_debug · c4089f98
      Committed by Marcin Slusarz
      slub checks for poison one byte at a time, which is highly inefficient
      and frequently shows up as one of the highest cpu-eaters in perf top.
      
      Joining reads gives nice speedup:
      
      (Compiling some project with different options)
                                       make -j12    make clean
      slub_debug disabled:             1m 27s       1.2 s
      slub_debug enabled:              1m 46s       7.6 s
      slub_debug enabled + this patch: 1m 33s       3.2 s
      
      check_bytes still shows up high, but not always at the top.
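      
      The approach, sketched (a simplified check_bytes() along the lines the
      patch describes: compare word-sized chunks against the replicated
      pattern, then scan byte-wise to pinpoint the mismatch; alignment
      handling is omitted and this is not the verbatim kernel code):
      
      	static u8 *check_bytes(u8 *start, u8 value, unsigned int bytes)
      	{
      		u64 pattern = value;		/* replicate value into all 8 bytes */
      
      		pattern |= pattern << 8;
      		pattern |= pattern << 16;
      		pattern |= pattern << 32;
      
      		while (bytes >= sizeof(u64)) {
      			if (*(u64 *)start != pattern)
      				break;		/* mismatch somewhere in this word */
      			start += sizeof(u64);
      			bytes -= sizeof(u64);
      		}
      
      		while (bytes) {			/* tail, and pinpointing a mismatch */
      			if (*start != value)
      				return start;
      			start++;
      			bytes--;
      		}
      		return NULL;
      	}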
      Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: linux-mm@kvack.org
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
    • slub: Add method to verify memory is not freed · d18a90dd
      Committed by Ben Greear
      This is for tracking down suspect memory usage.
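      
      Usage is a one-off check dropped into a suspect path, along these lines
      (a sketch; per the patch, the helper reports the object state when the
      memory appears to be on a free list):
      
      	if (!verify_mem_not_deleted(req))	/* req: suspect kmalloc'd object */
      		pr_err("object was freed while still in use\n");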
      Acked-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Ben Greear <greearb@candelatech.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
    • slub: Enable backtrace for create/delete points · d6543e39
      Committed by Ben Greear
      This patch attempts to grab a backtrace for the creation
      and deletion points of the slub object.  When a fault is
      detected, we can then get a better idea of where the item
      was deleted.
      
      Example output from debugging some funky nfs/rpc behaviour:
      
      =============================================================================
      BUG kmalloc-64: Object is on free-list
      -----------------------------------------------------------------------------
      
      INFO: Allocated in rpcb_getport_async+0x39c/0x5a5 [sunrpc] age=381 cpu=3 pid=3750
             __slab_alloc+0x348/0x3ba
             kmem_cache_alloc_trace+0x67/0xe7
             rpcb_getport_async+0x39c/0x5a5 [sunrpc]
             call_bind+0x70/0x75 [sunrpc]
             __rpc_execute+0x78/0x24b [sunrpc]
             rpc_execute+0x3d/0x42 [sunrpc]
             rpc_run_task+0x79/0x81 [sunrpc]
             rpc_call_sync+0x3f/0x60 [sunrpc]
             rpc_ping+0x42/0x58 [sunrpc]
             rpc_create+0x4aa/0x527 [sunrpc]
             nfs_create_rpc_client+0xb1/0xf6 [nfs]
             nfs_init_client+0x3b/0x7d [nfs]
             nfs_get_client+0x453/0x5ab [nfs]
             nfs_create_server+0x10b/0x437 [nfs]
             nfs_fs_mount+0x4ca/0x708 [nfs]
             mount_fs+0x6b/0x152
      INFO: Freed in rpcb_map_release+0x3f/0x44 [sunrpc] age=30 cpu=2 pid=29049
             __slab_free+0x57/0x150
             kfree+0x107/0x13a
             rpcb_map_release+0x3f/0x44 [sunrpc]
             rpc_release_calldata+0x12/0x14 [sunrpc]
             rpc_free_task+0x59/0x61 [sunrpc]
             rpc_final_put_task+0x82/0x8a [sunrpc]
             __rpc_execute+0x23c/0x24b [sunrpc]
             rpc_async_schedule+0x10/0x12 [sunrpc]
             process_one_work+0x230/0x41d
             worker_thread+0x133/0x217
             kthread+0x7d/0x85
             kernel_thread_helper+0x4/0x10
      INFO: Slab 0xffffea00029aa470 objects=20 used=9 fp=0xffff8800be7830d8 flags=0x20000000004081
      INFO: Object 0xffff8800be7830d8 @offset=4312 fp=0xffff8800be7827a8
      
       Bytes b4 0xffff8800be7830c8:  87 a8 96 00 01 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
       Object 0xffff8800be7830d8:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
       Object 0xffff8800be7830e8:  6b 6b 6b 6b 01 08 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkk..kkkkkkkkkk
       Object 0xffff8800be7830f8:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
       Object 0xffff8800be783108:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
       Redzone 0xffff8800be783118:  bb bb bb bb bb bb bb bb                         ........
       Padding 0xffff8800be783258:  5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ
      Pid: 29049, comm: kworker/2:2 Not tainted 3.0.0-rc4+ #8
      Call Trace:
       [<ffffffff811055c3>] print_trailer+0x131/0x13a
       [<ffffffff81105601>] object_err+0x35/0x3e
       [<ffffffff8110746f>] verify_mem_not_deleted+0x7a/0xb7
       [<ffffffffa02851b5>] rpcb_getport_done+0x23/0x126 [sunrpc]
       [<ffffffffa027d0ba>] rpc_exit_task+0x3f/0x6d [sunrpc]
       [<ffffffffa027d4ab>] __rpc_execute+0x78/0x24b [sunrpc]
       [<ffffffffa027d6c0>] ? rpc_execute+0x42/0x42 [sunrpc]
       [<ffffffffa027d6d0>] rpc_async_schedule+0x10/0x12 [sunrpc]
       [<ffffffff810611b7>] process_one_work+0x230/0x41d
       [<ffffffff81061102>] ? process_one_work+0x17b/0x41d
       [<ffffffff81063613>] worker_thread+0x133/0x217
       [<ffffffff810634e0>] ? manage_workers+0x191/0x191
       [<ffffffff81066e10>] kthread+0x7d/0x85
       [<ffffffff81485924>] kernel_thread_helper+0x4/0x10
       [<ffffffff8147eb18>] ? retint_restore_args+0x13/0x13
       [<ffffffff81066d93>] ? __init_kthread_worker+0x56/0x56
       [<ffffffff81485920>] ? gs_change+0x13/0x13
      Acked-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Ben Greear <greearb@candelatech.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
  8. 02 Jul 2011 (14 commits)
  9. 04 Jun 2011 (1 commit)
  10. 26 May 2011 (1 commit)
  11. 25 May 2011 (1 commit)
  12. 21 May 2011 (1 commit)
  13. 18 May 2011 (3 commits)
  14. 08 May 2011 (1 commit)
    • slub: Remove CONFIG_CMPXCHG_LOCAL ifdeffery · 1759415e
      Committed by Christoph Lameter
      Remove the #ifdefs. This means that irqsafe_cpu_cmpxchg_double() is used
      everywhere.
      
      There may be performance implications since:
      
      A. We now have to manage a transaction ID for all arches
      
      B. The interrupt holdoff for arches not supporting CONFIG_CMPXCHG_LOCAL is reduced
      to a very short irqoff section.
      
      This change does not result in multiple irqoff/irqon sequences; even in
      the fallback case we only do one disable and enable, as before.
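      
      For context, a rough sketch of the allocation fast path this affects
      (abridged from the slub of this era; details elided):
      
      	redo:
      		tid = c->tid;		/* A: transaction id, now on all arches */
      		object = c->freelist;
      		...
      		if (!irqsafe_cpu_cmpxchg_double(
      				s->cpu_slab->freelist, s->cpu_slab->tid,
      				object, tid,
      				get_freepointer(s, object), next_tid(tid)))
      			goto redo;	/* raced with an interrupt or migration */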
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
  15. 05 May 2011 (1 commit)
    • slub: Fix the lockless code on 32-bit platforms with no 64-bit cmpxchg · 30106b8c
      Committed by Thomas Gleixner
      The SLUB allocator use of the cmpxchg_double logic was wrong: it
      actually needs the irq-safe one.
      
      That happens automatically when we use the native unlocked 'cmpxchg8b'
      instruction, but when compiling the kernel for older x86 CPUs that do
      not support that instruction, we fall back to the generic emulation
      code.
      
      And if you don't specify that you want the irq-safe version, the generic
      code ends up just open-coding the cmpxchg8b equivalent without any
      protection against interrupts or preemption.  Which definitely doesn't
      work for SLUB.
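      
      Conceptually, the irq-safe emulation must wrap the whole double-word
      compare-and-exchange in an interrupt-off window (an illustrative
      simplification, not the generic percpu code itself):
      
      	local_irq_save(flags);
      	if (*p1 == old1 && *p2 == old2) {	/* compare both words... */
      		*p1 = new1;			/* ...and swap them atomically
      						   with respect to interrupts */
      		*p2 = new2;
      		ret = true;
      	} else
      		ret = false;
      	local_irq_restore(flags);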
      
      This was reported by Werner Landgraf <w.landgraf@ru.ru>, who saw
      instability with his distro-kernel that was compiled to support pretty
      much everything under the sun.  Most big Linux distributions tend to
      compile for PPro and later, and would never have noticed this problem.
      
      This also fixes the prototypes for the irqsafe cmpxchg_double functions
      to use 'bool' like they should.
      
      [ Btw, that whole "generic code defaults to no protection" design just
        sounds stupid - if the code needs no protection, there is no reason to
        use "cmpxchg_double" to begin with.  So we should probably just remove
        the unprotected version entirely as pointless.   - Linus ]
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reported-and-tested-by: werner <w.landgraf@ru.ru>
      Acked-and-tested-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Tejun Heo <tj@kernel.org>
      Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1105041539050.3005@ionos
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 17 Apr 2011 (1 commit)