1. 23 6月, 2006 5 次提交
  2. 03 6月, 2006 1 次提交
    • I
      [PATCH] slab.c: fix offslab_limit bug · b1ab41c4
      Ingo Molnar 提交于
      mm/slab.c's offlab_limit logic is totally broken.
      
      Firstly, "offslab_limit" is a global variable while it should either be
      calculated in situ or should be passed in as a parameter.
      
      Secondly, the more serious problem with it is that the condition for
      calculating it:
      
                     if (!(OFF_SLAB(sizes->cs_cachep))) {
                             offslab_limit = sizes->cs_size - sizeof(struct slab);
                             offslab_limit /= sizeof(kmem_bufctl_t);
      
      is in total disconnect with the condition that makes use of it:
      
                     /* More than offslab_limit objects will cause problems */
                     if ((flags & CFLGS_OFF_SLAB) && num > offslab_limit)
                             break;
      
      but due to offslab_limit being a global variable this breakage was
      hidden.
      
      Up until lockdep came along and perturbed the slab sizes sufficiently so
      that the first off-slab cache would still see a (non-calculated) zero
      value for offslab_limit and would panic with:
      
        kmem_cache_create: couldn't create cache size-512.
      
        Call Trace:
         [<ffffffff8020a5b9>] show_trace+0x96/0x1c8
         [<ffffffff8020a8f0>] dump_stack+0x13/0x15
         [<ffffffff8022994f>] panic+0x39/0x21a
         [<ffffffff80270814>] kmem_cache_create+0x5a0/0x5d0
         [<ffffffff80aced62>] kmem_cache_init+0x193/0x379
         [<ffffffff80abf779>] start_kernel+0x17f/0x218
         [<ffffffff80abf263>] _sinittext+0x263/0x26a
      
        Kernel panic - not syncing: kmem_cache_create(): failed to create slab `size-512'
      
      Paolo Ornati's config on x86_64 managed to trigger it.
      
      The fix is to move the calculation to the place that makes use of it.
      This also makes slab.o 54 bytes smaller.
      
      Btw., the check itself is quite silly. Its intention is to test whether
      the number of objects per slab would be higher than the number of slab
      control pointers possible. In theory it could be triggered: if someone
      tried to allocate 4-byte objects cache and explicitly requested with
      CFLGS_OFF_SLAB. So i kept the check.
      
      Out of historic interest i checked how old this bug was and it's
      ancient, 10 years old! It is the oldest hidden and then truly triggering
      bugs i ever saw being fixed in the kernel!
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      b1ab41c4
  3. 16 5月, 2006 2 次提交
    • R
      [PATCH] slab: Fix kmem_cache_destroy() on NUMA · a4523a8b
      Roland Dreier 提交于
      With CONFIG_NUMA set, kmem_cache_destroy() may fail and say "Can't
      free all objects."  The problem is caused by sequences such as the
      following (suppose we are on a NUMA machine with two nodes, 0 and 1):
      
       * Allocate an object from cache on node 0.
       * Free the object on node 1.  The object is put into node 1's alien
         array_cache for node 0.
       * Call kmem_cache_destroy(), which ultimately ends up in __cache_shrink().
       * __cache_shrink() does drain_cpu_caches(), which loops through all nodes.
         For each node it drains the shared array_cache and then handles the
         alien array_cache for the other node.
      
      However this means that node 0's shared array_cache will be drained,
      and then node 1 will move the contents of its alien[0] array_cache
      into that same shared array_cache.  node 0's shared array_cache is
      never looked at again, so the objects left there will appear to be in
      use when __cache_shrink() calls __node_shrink() for node 0.  So
      __node_shrink() will return 1 and kmem_cache_destroy() will fail.
      
      This patch fixes this by having drain_cpu_caches() do
      drain_alien_cache() on every node before it does drain_array() on the
      nodes' shared array_caches.
      
      The problem was originally reported by Or Gerlitz <ogerlitz@voltaire.com>.
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      Acked-by: NChristoph Lameter <clameter@sgi.com>
      Acked-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      a4523a8b
    • M
      [PATCH] add slab_is_available() routine for boot code · 39d24e64
      Mike Kravetz 提交于
      slab_is_available() indicates slab based allocators are available for use.
      SPARSEMEM code needs to know this as it can be called at various times
      during the boot process.
      Signed-off-by: NMike Kravetz <kravetz@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      39d24e64
  4. 29 4月, 2006 1 次提交
  5. 26 4月, 2006 1 次提交
  6. 11 4月, 2006 3 次提交
  7. 02 4月, 2006 1 次提交
  8. 29 3月, 2006 1 次提交
  9. 26 3月, 2006 7 次提交
  10. 24 3月, 2006 3 次提交
    • P
      [PATCH] cpuset: memory_spread_slab drop useless PF_SPREAD_PAGE check · b2455396
      Paul Jackson 提交于
      The hook in the slab cache allocation path to handle cpuset memory
      spreading for tasks in cpusets with 'memory_spread_slab' enabled has a
      modest performance bug.  The hook calls into the memory spreading handler
      alternate_node_alloc() if either of 'memory_spread_slab' or
      'memory_spread_page' is enabled, even though the handler does nothing
      (albeit harmlessly) for the page case
      
      Fix - drop PF_SPREAD_PAGE from the set of flag bits that are used to
      trigger a call to alternate_node_alloc().
      
      The page case is handled by separate hooks -- see the calls conditioned on
      cpuset_do_page_mem_spread() in mm/filemap.c
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      b2455396
    • P
      [PATCH] cpuset memory spread slab cache optimizations · c61afb18
      Paul Jackson 提交于
      The hooks in the slab cache allocator code path for support of NUMA
      mempolicies and cpuset memory spreading are in an important code path.  Many
      systems will use neither feature.
      
      This patch optimizes those hooks down to a single check of some bits in the
      current tasks task_struct flags.  For non NUMA systems, this hook and related
      code is already ifdef'd out.
      
      The optimization is done by using another task flag, set if the task is using
      a non-default NUMA mempolicy.  Taking this flag bit along with the
      PF_SPREAD_PAGE and PF_SPREAD_SLAB flag bits added earlier in this 'cpuset
      memory spreading' patch set, one can check for the combination of any of these
      special case memory placement mechanisms with a single test of the current
      tasks task_struct flags.
      
      This patch also tightens up the code, to save a few bytes of kernel text
      space, and moves some of it out of line.  Due to the nested inlines called
      from multiple places, we were ending up with three copies of this code, which
      once we get off the main code path (for local node allocation) seems a bit
      wasteful of instruction memory.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      c61afb18
    • P
      [PATCH] cpuset memory spread slab cache implementation · 101a5001
      Paul Jackson 提交于
      Provide the slab cache infrastructure to support cpuset memory spreading.
      
      See the previous patches, cpuset_mem_spread, for an explanation of cpuset
      memory spreading.
      
      This patch provides a slab cache SLAB_MEM_SPREAD flag.  If set in the
      kmem_cache_create() call defining a slab cache, then any task marked with the
      process state flag PF_MEMSPREAD will spread memory page allocations for that
      cache over all the allowed nodes, instead of preferring the local (faulting)
      node.
      
      On systems not configured with CONFIG_NUMA, this results in no change to the
      page allocation code path for slab caches.
      
      On systems with cpusets configured in the kernel, but the "memory_spread"
      cpuset option not enabled for the current tasks cpuset, this adds a call to a
      cpuset routine and failed bit test of the processor state flag PF_SPREAD_SLAB.
      
      For tasks so marked, a second inline test is done for the slab cache flag
      SLAB_MEM_SPREAD, and if that is set and if the allocation is not
      in_interrupt(), this adds a call to to a cpuset routine that computes which of
      the tasks mems_allowed nodes should be preferred for this allocation.
      
      ==> This patch adds another hook into the performance critical
          code path to allocating objects from the slab cache, in the
          ____cache_alloc() chunk, below.  The next patch optimizes this
          hook, reducing the impact of the combined mempolicy plus memory
          spreading hooks on this critical code path to a single check
          against the tasks task_struct flags word.
      
      This patch provides the generic slab flags and logic needed to apply memory
      spreading to a particular slab.
      
      A subsequent patch will mark a few specific slab caches for this placement
      policy.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      101a5001
  11. 22 3月, 2006 14 次提交
  12. 10 3月, 2006 1 次提交
    • C
      [PATCH] slab: Node rotor for freeing alien caches and remote per cpu pages. · 8fce4d8e
      Christoph Lameter 提交于
      The cache reaper currently tries to free all alien caches and all remote
      per cpu pages in each pass of cache_reap.  For a machines with large number
      of nodes (such as Altix) this may lead to sporadic delays of around ~10ms.
      Interrupts are disabled while reclaiming creating unacceptable delays.
      
      This patch changes that behavior by adding a per cpu reap_node variable.
      Instead of attempting to free all caches, we free only one alien cache and
      the per cpu pages from one remote node.  That reduces the time spend in
      cache_reap.  However, doing so will lengthen the time it takes to
      completely drain all remote per cpu pagesets and all alien caches.  The
      time needed will grow with the number of nodes in the system.  All caches
      are drained when they overflow their respective capacity.  So the drawback
      here is only that a bit of memory may be wasted for awhile longer.
      
      Details:
      
      1. Rename drain_remote_pages to drain_node_pages to allow the specification
         of the node to drain of pcp pages.
      
      2. Add additional functions init_reap_node, next_reap_node for NUMA
         that manage a per cpu reap_node counter.
      
      3. Add a reap_alien function that reaps only from the current reap_node.
      
      For us this seems to be a critical issue.  Holdoffs of an average of ~7ms
      cause some HPC benchmarks to slow down significantly.  F.e.  NAS parallel
      slows down dramatically.  NAS parallel has a 12-16 seconds runtime w/o rotor
      compared to 5.8 secs with the rotor patches.  It gets down to 5.05 secs with
      the additional interrupt holdoff reductions.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8fce4d8e