1. 08 5月, 2007 3 次提交
    • C
      SLUB core · 81819f0f
      Christoph Lameter 提交于
      This is a new slab allocator which was motivated by the complexity of the
      existing code in mm/slab.c. It attempts to address a variety of concerns
      with the existing implementation.
      
      A. Management of object queues
      
         A particular concern was the complex management of the numerous object
         queues in SLAB. SLUB has no such queues. Instead we dedicate a slab for
         each allocating CPU and use objects from a slab directly instead of
         queueing them up.
      
      B. Storage overhead of object queues
      
         SLAB Object queues exist per node, per CPU. The alien cache queue even
         has a queue array that contain a queue for each processor on each
         node. For very large systems the number of queues and the number of
         objects that may be caught in those queues grows exponentially. On our
         systems with 1k nodes / processors we have several gigabytes just tied up
         for storing references to objects for those queues  This does not include
         the objects that could be on those queues. One fears that the whole
         memory of the machine could one day be consumed by those queues.
      
      C. SLAB meta data overhead
      
         SLAB has overhead at the beginning of each slab. This means that data
         cannot be naturally aligned at the beginning of a slab block. SLUB keeps
         all meta data in the corresponding page_struct. Objects can be naturally
         aligned in the slab. F.e. a 128 byte object will be aligned at 128 byte
         boundaries and can fit tightly into a 4k page with no bytes left over.
         SLAB cannot do this.
      
      D. SLAB has a complex cache reaper
      
         SLUB does not need a cache reaper for UP systems. On SMP systems
         the per CPU slab may be pushed back into partial list but that
         operation is simple and does not require an iteration over a list
         of objects. SLAB expires per CPU, shared and alien object queues
         during cache reaping which may cause strange hold offs.
      
      E. SLAB has complex NUMA policy layer support
      
         SLUB pushes NUMA policy handling into the page allocator. This means that
         allocation is coarser (SLUB does interleave on a page level) but that
         situation was also present before 2.6.13. SLABs application of
         policies to individual slab objects allocated in SLAB is
         certainly a performance concern due to the frequent references to
         memory policies which may lead a sequence of objects to come from
         one node after another. SLUB will get a slab full of objects
         from one node and then will switch to the next.
      
      F. Reduction of the size of partial slab lists
      
         SLAB has per node partial lists. This means that over time a large
         number of partial slabs may accumulate on those lists. These can
         only be reused if allocator occur on specific nodes. SLUB has a global
         pool of partial slabs and will consume slabs from that pool to
         decrease fragmentation.
      
      G. Tunables
      
         SLAB has sophisticated tuning abilities for each slab cache. One can
         manipulate the queue sizes in detail. However, filling the queues still
         requires the uses of the spin lock to check out slabs. SLUB has a global
         parameter (min_slab_order) for tuning. Increasing the minimum slab
         order can decrease the locking overhead. The bigger the slab order the
         less motions of pages between per CPU and partial lists occur and the
         better SLUB will be scaling.
      
      G. Slab merging
      
         We often have slab caches with similar parameters. SLUB detects those
         on boot up and merges them into the corresponding general caches. This
         leads to more effective memory use. About 50% of all caches can
         be eliminated through slab merging. This will also decrease
         slab fragmentation because partial allocated slabs can be filled
         up again. Slab merging can be switched off by specifying
         slub_nomerge on boot up.
      
         Note that merging can expose heretofore unknown bugs in the kernel
         because corrupted objects may now be placed differently and corrupt
         differing neighboring objects. Enable sanity checks to find those.
      
      H. Diagnostics
      
         The current slab diagnostics are difficult to use and require a
         recompilation of the kernel. SLUB contains debugging code that
         is always available (but is kept out of the hot code paths).
         SLUB diagnostics can be enabled via the "slab_debug" option.
         Parameters can be specified to select a single or a group of
         slab caches for diagnostics. This means that the system is running
         with the usual performance and it is much more likely that
         race conditions can be reproduced.
      
      I. Resiliency
      
         If basic sanity checks are on then SLUB is capable of detecting
         common error conditions and recover as best as possible to allow the
         system to continue.
      
      J. Tracing
      
         Tracing can be enabled via the slab_debug=T,<slabcache> option
         during boot. SLUB will then protocol all actions on that slabcache
         and dump the object contents on free.
      
      K. On demand DMA cache creation.
      
         Generally DMA caches are not needed. If a kmalloc is used with
         __GFP_DMA then just create this single slabcache that is needed.
         For systems that have no ZONE_DMA requirement the support is
         completely eliminated.
      
      L. Performance increase
      
         Some benchmarks have shown speed improvements on kernbench in the
         range of 5-10%. The locking overhead of slub is based on the
         underlying base allocation size. If we can reliably allocate
         larger order pages then it is possible to increase slub
         performance much further. The anti-fragmentation patches may
         enable further performance increases.
      
      Tested on:
      i386 UP + SMP, x86_64 UP + SMP + NUMA emulation, IA64 NUMA + Simulator
      
      SLUB Boot options
      
      slub_nomerge		Disable merging of slabs
      slub_min_order=x	Require a minimum order for slab caches. This
      			increases the managed chunk size and therefore
      			reduces meta data and locking overhead.
      slub_min_objects=x	Mininum objects per slab. Default is 8.
      slub_max_order=x	Avoid generating slabs larger than order specified.
      slub_debug		Enable all diagnostics for all caches
      slub_debug=<options>	Enable selective options for all caches
      slub_debug=<o>,<cache>	Enable selective options for a certain set of
      			caches
      
      Available Debug options
      F		Double Free checking, sanity and resiliency
      R		Red zoning
      P		Object / padding poisoning
      U		Track last free / alloc
      T		Trace all allocs / frees (only use for individual slabs).
      
      To use SLUB: Apply this patch and then select SLUB as the default slab
      allocator.
      
      [hugh@veritas.com: fix an oops-causing locking error]
      [akpm@linux-foundation.org: various stupid cleanups and small fixes]
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      81819f0f
    • A
      mm/slab.c: proper prototypes · ac267728
      Adrian Bunk 提交于
      Add proper prototypes in include/linux/slab.h.
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ac267728
    • P
      slab: introduce krealloc · fd76bab2
      Pekka Enberg 提交于
      This introduce krealloc() that reallocates memory while keeping the contents
      unchanged.  The allocator avoids reallocation if the new size fits the
      currently used cache.  I also added a simple non-optimized version for
      mm/slob.c for compatibility.
      
      [akpm@linux-foundation.org: fix warnings]
      Acked-by: NJosef Sipek <jsipek@fsl.cs.sunysb.edu>
      Acked-by: NMatt Mackall <mpm@selenic.com>
      Acked-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fd76bab2
  2. 14 12月, 2006 2 次提交
    • C
      [PATCH] More slab.h cleanups · 55935a34
      Christoph Lameter 提交于
      More cleanups for slab.h
      
      1. Remove tabs from weird locations as suggested by Pekka
      
      2. Drop the check for NUMA and SLAB_DEBUG from the fallback section
         as suggested by Pekka.
      
      3. Uses static inline for the fallback defs as also suggested by Pekka.
      
      4. Make kmem_ptr_valid take a const * argument.
      
      5. Separate the NUMA fallback definitions from the kmalloc_track fallback
         definitions.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      55935a34
    • C
      [PATCH] Cleanup slab headers / API to allow easy addition of new slab allocators · 2e892f43
      Christoph Lameter 提交于
      This is a response to an earlier discussion on linux-mm about splitting
      slab.h components per allocator.  Patch is against 2.6.19-git11.  See
      http://marc.theaimsgroup.com/?l=linux-mm&m=116469577431008&w=2
      
      This patch cleans up the slab header definitions.  We define the common
      functions of slob and slab in slab.h and put the extra definitions needed
      for slab's kmalloc implementations in <linux/slab_def.h>.  In order to get
      a greater set of common functions we add several empty functions to slob.c
      and also rename slob's kmalloc to __kmalloc.
      
      Slob does not need any special definitions since we introduce a fallback
      case.  If there is no need for a slab implementation to provide its own
      kmalloc mess^H^H^Hacros then we simply fall back to __kmalloc functions.
      That is sufficient for SLOB.
      
      Sort the function in slab.h according to their functionality.  First the
      functions operating on struct kmem_cache * then the kmalloc related
      functions followed by special debug and fallback definitions.
      
      Also redo a lot of comments.
      
      Signed-off-by: Christoph Lameter <clameter@sgi.com>?
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      2e892f43
  3. 08 12月, 2006 18 次提交
  4. 04 10月, 2006 2 次提交
  5. 27 9月, 2006 1 次提交
  6. 26 9月, 2006 3 次提交
  7. 23 6月, 2006 1 次提交
  8. 16 5月, 2006 1 次提交
  9. 26 4月, 2006 1 次提交
  10. 29 3月, 2006 1 次提交
  11. 26 3月, 2006 3 次提交
    • P
      [PATCH] slab: optimize constant-size kzalloc calls · 40c07ae8
      Pekka Enberg 提交于
      As suggested by Eric Dumazet, optimize kzalloc() calls that pass a
      compile-time constant size.  Please note that the patch increases kernel
      text slightly (~200 bytes for defconfig on x86).
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      40c07ae8
    • P
      [PATCH] slab: introduce kmem_cache_zalloc allocator · a8c0f9a4
      Pekka Enberg 提交于
      Introduce a memory-zeroing variant of kmem_cache_alloc.  The allocator
      already exits in XFS and there are potential users for it so this patch
      makes the allocator available for the general public.
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      a8c0f9a4
    • A
      [PATCH] slab: implement /proc/slab_allocators · 871751e2
      Al Viro 提交于
      Implement /proc/slab_allocators.   It produces output like:
      
      idr_layer_cache: 80 idr_pre_get+0x33/0x4e
      buffer_head: 2555 alloc_buffer_head+0x20/0x75
      mm_struct: 9 mm_alloc+0x1e/0x42
      mm_struct: 20 dup_mm+0x36/0x370
      vm_area_struct: 384 dup_mm+0x18f/0x370
      vm_area_struct: 151 do_mmap_pgoff+0x2e0/0x7c3
      vm_area_struct: 1 split_vma+0x5a/0x10e
      vm_area_struct: 11 do_brk+0x206/0x2e2
      vm_area_struct: 2 copy_vma+0xda/0x142
      vm_area_struct: 9 setup_arg_pages+0x99/0x214
      fs_cache: 8 copy_fs_struct+0x21/0x133
      fs_cache: 29 copy_process+0xf38/0x10e3
      files_cache: 30 alloc_files+0x1b/0xcf
      signal_cache: 81 copy_process+0xbaa/0x10e3
      sighand_cache: 77 copy_process+0xe65/0x10e3
      sighand_cache: 1 de_thread+0x4d/0x5f8
      anon_vma: 241 anon_vma_prepare+0xd9/0xf3
      size-2048: 1 add_sect_attrs+0x5f/0x145
      size-2048: 2 journal_init_revoke+0x99/0x302
      size-2048: 2 journal_init_revoke+0x137/0x302
      size-2048: 2 journal_init_inode+0xf9/0x1c4
      
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Alexander Nyberg <alexn@telia.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Cc: Ravikiran Thirumalai <kiran@scalex86.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      DESC
      slab-leaks3-locking-fix
      EDESC
      From: Andrew Morton <akpm@osdl.org>
      
      Update for slab-remove-cachep-spinlock.patch
      
      Cc: Al Viro <viro@ftp.linux.org.uk>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Alexander Nyberg <alexn@telia.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Cc: Ravikiran Thirumalai <kiran@scalex86.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      871751e2
  12. 24 3月, 2006 1 次提交
    • P
      [PATCH] cpuset memory spread slab cache implementation · 101a5001
      Paul Jackson 提交于
      Provide the slab cache infrastructure to support cpuset memory spreading.
      
      See the previous patches, cpuset_mem_spread, for an explanation of cpuset
      memory spreading.
      
      This patch provides a slab cache SLAB_MEM_SPREAD flag.  If set in the
      kmem_cache_create() call defining a slab cache, then any task marked with the
      process state flag PF_MEMSPREAD will spread memory page allocations for that
      cache over all the allowed nodes, instead of preferring the local (faulting)
      node.
      
      On systems not configured with CONFIG_NUMA, this results in no change to the
      page allocation code path for slab caches.
      
      On systems with cpusets configured in the kernel, but the "memory_spread"
      cpuset option not enabled for the current tasks cpuset, this adds a call to a
      cpuset routine and failed bit test of the processor state flag PF_SPREAD_SLAB.
      
      For tasks so marked, a second inline test is done for the slab cache flag
      SLAB_MEM_SPREAD, and if that is set and if the allocation is not
      in_interrupt(), this adds a call to to a cpuset routine that computes which of
      the tasks mems_allowed nodes should be preferred for this allocation.
      
      ==> This patch adds another hook into the performance critical
          code path to allocating objects from the slab cache, in the
          ____cache_alloc() chunk, below.  The next patch optimizes this
          hook, reducing the impact of the combined mempolicy plus memory
          spreading hooks on this critical code path to a single check
          against the tasks task_struct flags word.
      
      This patch provides the generic slab flags and logic needed to apply memory
      spreading to a particular slab.
      
      A subsequent patch will mark a few specific slab caches for this placement
      policy.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      101a5001
  13. 22 3月, 2006 2 次提交
    • C
      [PATCH] slab: Remove SLAB_NO_REAP option · ac2b898c
      Christoph Lameter 提交于
      SLAB_NO_REAP is documented as an option that will cause this slab not to be
      reaped under memory pressure.  However, that is not what happens.  The only
      thing that SLAB_NO_REAP controls at the moment is the reclaim of the unused
      slab elements that were allocated in batch in cache_reap().  Cache_reap()
      is run every few seconds independently of memory pressure.
      
      Could we remove the whole thing?  Its only used by three slabs anyways and
      I cannot find a reason for having this option.
      
      There is an additional problem with SLAB_NO_REAP.  If set then the recovery
      of objects from alien caches is switched off.  Objects not freed on the
      same node where they were initially allocated will only be reused if a
      certain amount of objects accumulates from one alien node (not very likely)
      or if the cache is explicitly shrunk.  (Strangely __cache_shrink does not
      check for SLAB_NO_REAP)
      
      Getting rid of SLAB_NO_REAP fixes the problems with alien cache freeing.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ac2b898c
    • A
      [PATCH] kcalloc(): INT_MAX -> ULONG_MAX · b50ec7d8
      Adrian Bunk 提交于
      Since size_t has the same size as a long on all architectures, it's enough
      for overflow checks to check against ULONG_MAX.
      
      This change could allow a compiler better optimization (especially in the
      n=1 case).
      
      The practical effect seems to be positive, but quite small:
      
          text           data     bss      dec            hex filename
      21762380        5859870 1848928 29471178        1c1b1ca vmlinux-old
      21762211        5859870 1848928 29471009        1c1b121 vmlinux-patched
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      b50ec7d8
  14. 02 2月, 2006 1 次提交