1. 04 Mar, 2008 1 commit
  2. 20 Feb, 2008 1 commit
    • Revert "SLUB: Alternate fast paths using cmpxchg_local" · 00e962c5
      Committed by Linus Torvalds
      This reverts commit 1f84260c, which is
      suspected to be the reason for some very occasional and hard-to-trigger
      crashes that usually look related to memory allocation (mostly reported
      in networking, but since that's generally the most common source of
      shortlived allocations - and allocations in interrupt contexts - that in
      itself is not a big clue).
      
      See for example
      	http://bugzilla.kernel.org/show_bug.cgi?id=9973
      	http://lkml.org/lkml/2008/2/19/278
      etc.
      
      One promising suspicion for what the root cause of the bug is (which also
      explains why it's so hard to trigger in practice) came from Eric
      Dumazet:
      
         "I wonder how SLUB_FASTPATH is supposed to work, since it is affected
          by a classical ABA problem of lockless algo.
      
          cmpxchg_local(&c->freelist, object, object[c->offset]) can succeed,
          while an interrupt came (on this cpu), and several allocations were
          done, and one free was performed at the end of this interruption, so
          'object' was recycled.
      
          c->freelist can then contain the previous value (object), but
          object[c->offset] was changed by IRQ.
      
          We then put back in freelist an already allocated object."
      
      but another reason for the revert is simply that everybody agrees that
      this code was the main suspect just by virtue of the pattern of oopses.
      
      Cc: Torsten Kaiser <just.for.lkml@googlemail.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 15 Feb, 2008 5 commits
  4. 08 Feb, 2008 6 commits
    • SLUB: fix checkpatch warnings · 3adbefee
      Committed by Ingo Molnar
      fix checkpatch --file mm/slub.c errors and warnings.
      
       $ q-code-quality-compare
                                            errors   lines of code   errors/KLOC
       mm/slub.c      [before]                  22            4204           5.2
       mm/slub.c      [after]                    0            4210             0
      
      no code changed:
      
          text    data     bss     dec     hex filename
         22195    8634     136   30965    78f5 slub.o.before
         22195    8634     136   30965    78f5 slub.o.after
      
         md5:
           93cdfbec2d6450622163c590e1064358  slub.o.before.asm
           93cdfbec2d6450622163c590e1064358  slub.o.after.asm
      
      [clameter: rediffed against Pekka's cleanup patch, omitted
      moves of the name of a function to the start of line]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
    • Use non atomic unlock · a76d3546
      Committed by Nick Piggin
      Slub can use the non-atomic version to unlock because other flags will not
      get modified with the lock held.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Acked-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • SLUB: Support for performance statistics · 8ff12cfc
      Committed by Christoph Lameter
      The statistics provided here allow the monitoring of allocator behavior but
      at the cost of some (minimal) loss of performance. Counters are placed in
      SLUB's per cpu data structure. The per cpu structure may be extended by the
      statistics to grow larger than one cacheline which will increase the cache
      footprint of SLUB.
      
      There is a compile option to enable/disable the inclusion of the runtime
      statistics, and it is off by default.
      
      The slabinfo tool is enhanced to support these statistics via two options:
      
      -D 	Switches the line of information displayed for a slab from size
      	mode to activity mode.
      
      -A	Sorts the slabs displayed by activity. This allows the display of
      	the slabs most important to the performance of a certain load.
      
      -r	Report option will report detailed statistics on
      
      Example (tbench load):
      
      slabinfo -AD		->Shows the most active slabs
      
      Name                   Objects    Alloc     Free   %Fast
      skbuff_fclone_cache         33 111953835 111953835  99  99
      :0000192                  2666  5283688  5281047  99  99
      :0001024                   849  5247230  5246389  83  83
      vm_area_struct            1349   119642   118355  91  22
      :0004096                    15    66753    66751  98  98
      :0000064                  2067    25297    23383  98  78
      dentry                   10259    28635    18464  91  45
      :0000080                 11004    18950     8089  98  98
      :0000096                  1703    12358    10784  99  98
      :0000128                   762    10582     9875  94  18
      :0000512                   184     9807     9647  95  81
      :0002048                   479     9669     9195  83  65
      anon_vma                   777     9461     9002  99  71
      kmalloc-8                 6492     9981     5624  99  97
      :0000768                   258     7174     6931  58  15
      
      So the skbuff_fclone_cache is of highest importance for the tbench load.
      Pretty high load on the 192 sized slab. Look for the aliases
      
      slabinfo -a | grep 000192
      :0000192     <- xfs_btree_cur filp kmalloc-192 uid_cache tw_sock_TCP
      	request_sock_TCPv6 tw_sock_TCPv6 skbuff_head_cache xfs_ili
      
      Likely skbuff_head_cache.
      
      
      Looking into the statistics of the skbuff_fclone_cache is possible through
      
      slabinfo skbuff_fclone_cache	->-r option implied if cache name is mentioned
      
      
      .... Usual output ...
      
      Slab Perf Counter       Alloc     Free %Al %Fr
      --------------------------------------------------
      Fastpath             111953360 111946981  99  99
      Slowpath                 1044     7423   0   0
      Page Alloc                272      264   0   0
      Add partial                25      325   0   0
      Remove partial             86      264   0   0
      RemoteObj/SlabFrozen      350     4832   0   0
      Total                111954404 111954404
      
      Flushes       49 Refill        0
      Deactivate Full=325(92%) Empty=0(0%) ToHead=24(6%) ToTail=1(0%)
      
      Looks good because the fastpath is overwhelmingly taken.
      
      
      skbuff_head_cache:
      
      Slab Perf Counter       Alloc     Free %Al %Fr
      --------------------------------------------------
      Fastpath              5297262  5259882  99  99
      Slowpath                 4477    39586   0   0
      Page Alloc                937      824   0   0
      Add partial                 0     2515   0   0
      Remove partial           1691      824   0   0
      RemoteObj/SlabFrozen     2621     9684   0   0
      Total                 5301739  5299468
      
      Deactivate Full=2620(100%) Empty=0(0%) ToHead=0(0%) ToTail=0(0%)
      
      
      Descriptions of the output:
      
      Total:		The total number of allocations and frees that occurred for a
      		slab
      
      Fastpath:	The number of allocations/frees that used the fastpath.
      
      Slowpath:	Other allocations
      
      Page Alloc:	Number of calls to the page allocator as a result of slowpath
      		processing
      
      Add Partial:	Number of slabs added to the partial list through free or
      		alloc (occurs during cpuslab flushes)
      
      Remove Partial:	Number of slabs removed from the partial list as a result of
      		allocations retrieving a partial slab or by a free freeing
      		the last object of a slab.
      
      RemoteObj/Froz:	How many times were remotely freed objects encountered when a
      		slab was about to be deactivated. Frozen: How many times was
      		free able to skip list processing because the slab was in use
      		as the cpuslab of another processor.
      
      Flushes:	Number of times the cpuslab was flushed on request
      		(kmem_cache_shrink, may result from races in __slab_alloc)
      
      Refill:		Number of times we were able to refill the cpuslab from
      		remotely freed objects for the same slab.
      
      Deactivate:	Statistics on how slabs were deactivated. Shows how they were
      		put onto the partial list.
      
      In general fastpath is very good. Slowpath without partial list processing is
      also desirable. Any touching of partial list uses node specific locks which
      may potentially cause list lock contention.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
    • SLUB: Alternate fast paths using cmpxchg_local · 1f84260c
      Committed by Christoph Lameter
      Provide an alternate implementation of the SLUB fast paths for alloc
      and free using cmpxchg_local. The cmpxchg_local fast path is selected
      for arches that have CONFIG_FAST_CMPXCHG_LOCAL set. An arch should only
      set CONFIG_FAST_CMPXCHG_LOCAL if the cmpxchg_local is faster than an
      interrupt enable/disable sequence. This is known to be true for both
      x86 platforms so set FAST_CMPXCHG_LOCAL for both arches.
      
      Currently another requirement for the fastpath is that the kernel is
      compiled without preemption. The restriction will go away with the
      introduction of a new per cpu allocator and new per cpu operations.
      
      The advantages of a cmpxchg_local based fast path are:
      
      1. Potentially lower cycle count (30%-60% faster)
      
      2. There is no need to disable and enable interrupts on the fast path.
         Currently interrupts have to be disabled and enabled on every
         slab operation. This is likely avoiding a significant percentage
         of interrupt off / on sequences in the kernel.
      
      3. The disposal of freed slabs can occur with interrupts enabled.
      
      The alternate path is realized using #ifdef's. Several attempts to do the
      same with macros and inline functions resulted in a mess (in particular due
      to the strange way that local_irq_save() handles its argument and due
      to the need to define macros/functions that sometimes disable interrupts
      and sometimes do something else).
      
      [clameter: Stripped preempt bits and disabled fastpath if preempt is enabled]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • SLUB: Use unique end pointer for each slab page. · 683d0baa
      Committed by Christoph Lameter
      We use a NULL pointer on freelists to signal that there are no more objects.
      However the NULL pointers of all slabs match in contrast to the pointers to
      the real objects which are in different ranges for different slab pages.
      
      Change the end pointer to be a pointer to the first object and set bit 0.
      Every slab will then have a different end pointer. This is necessary to ensure
      that end markers can be matched to the source slab during cmpxchg_local.
      
      Bring back the use of the mapping field by SLUB since we would otherwise have
      to call a relatively expensive function page_address() in __slab_alloc().  Use
      of the mapping field allows avoiding a call to page_address() in various other
      functions as well.
      
      There is no need to change the page_mapping() function since bit 0 is set on
      the mapping, just as it is for anonymous pages.  page_mapping(slab_page) will
      therefore still return NULL although the mapping field is overloaded.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • SLUB: Deal with annoying gcc warning on kfree() · 5bb983b0
      Committed by Christoph Lameter
      gcc 4.2 spits out an annoying warning if one casts a const void *
      pointer to a void * pointer. No warning is generated if the
      conversion is done through an assignment.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
  5. 05 Feb, 2008 7 commits
  6. 25 Jan, 2008 5 commits
  7. 03 Jan, 2008 1 commit
  8. 02 Jan, 2008 1 commit
  9. 22 Dec, 2007 1 commit
  10. 18 Dec, 2007 1 commit
  11. 10 Dec, 2007 1 commit
  12. 06 Dec, 2007 1 commit
  13. 13 Nov, 2007 1 commit
  14. 06 Nov, 2007 1 commit
  15. 29 Oct, 2007 1 commit
  16. 22 Oct, 2007 1 commit
  17. 17 Oct, 2007 5 commits
    • Slab API: remove useless ctor parameter and reorder parameters · 4ba9b9d0
      Committed by Christoph Lameter
      Slab constructors currently have a flags parameter that is never used.  And
      the order of the arguments is opposite to other slab functions.  The object
      pointer is placed before the kmem_cache pointer.
      
      Convert
      
              ctor(void *object, struct kmem_cache *s, unsigned long flags)
      
      to
      
              ctor(struct kmem_cache *s, void *object)
      
      throughout the kernel
      
      [akpm@linux-foundation.org: coupla fixes]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • SLUB: simplify IRQ off handling · b811c202
      Committed by Christoph Lameter
      Move irq handling out of new slab into __slab_alloc.  That is useful for
      Mathieu's cmpxchg_local patchset and also allows us to remove the crude
      local_irq_off in early_kmem_cache_alloc().
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • slub: list_locations() can use GFP_TEMPORARY · ea3061d2
      Committed by Andrew Morton
      It's a short-lived allocation.
      
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • SLUB: Optimize cacheline use for zeroing · 42a9fdbb
      Committed by Christoph Lameter
      We touch a cacheline in the kmem_cache structure for zeroing to get the
      size. However, the hot paths in slab_alloc and slab_free do not reference
      any other fields in kmem_cache, so we may have to just bring in the
      cacheline for this one access.
      
      Add a new field to kmem_cache_cpu that contains the object size. That
      cacheline must already be used in the hotpaths. So we save one cacheline
      on every slab_alloc if we zero.
      
      We need to update the kmem_cache_cpu object size if an aliasing operation
      changes the objsize of a non-debug slab.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • SLUB: Place kmem_cache_cpu structures in a NUMA aware way · 4c93c355
      Committed by Christoph Lameter
      The kmem_cache_cpu structures introduced are currently an array placed in the
      kmem_cache struct, which means the kmem_cache_cpu structures are overwhelmingly
      on the wrong node on systems with a larger number of nodes. These are
      performance critical structures since the per node information has
      to be touched for every alloc and free in a slab.
      
      In order to place the kmem_cache_cpu structure optimally we put an array
      of pointers to kmem_cache_cpu structs in kmem_cache (similar to SLAB).
      
      However, the kmem_cache_cpu structures can now be allocated in a more
      intelligent way.
      
      We would like to put per cpu structures for the same cpu but different
      slab caches in cachelines together to save space and decrease the cache
      footprint. However, the slab allocators themselves control only allocations
      per node. We set up a simple per cpu array for every processor with
      100 per cpu structures which is usually enough to get them all set up right.
      If we run out then we fall back to kmalloc_node. This also solves the
      bootstrap problem since we do not have to use slab allocator functions
      early in boot to get memory for the small per cpu structures.
      
      Pro:
      	- NUMA aware placement improves memory performance
      	- All global structures in struct kmem_cache become readonly
      	- Dense packing of per cpu structures reduces cacheline
      	  footprint in SMP and NUMA.
      	- Potential avoidance of exclusive cacheline fetches
      	  on the free and alloc hotpath since multiple kmem_cache_cpu
      	  structures are in one cacheline. This is particularly important
      	  for the kmalloc array.
      
      Cons:
      	- Additional reference to one read only cacheline (per cpu
      	  array of pointers to kmem_cache_cpu) in both slab_alloc()
      	  and slab_free().
      
      [akinobu.mita@gmail.com: fix cpu hotplug offline/online path]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: "Pekka Enberg" <penberg@cs.helsinki.fi>
      Cc: Akinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>