1. 02 Feb, 2006 1 commit
  2. 19 Jan, 2006 3 commits
    • [PATCH] mm: optimize numa policy handling in slab allocator · 86c562a9
      Committed by Christoph Lameter
      Move the interrupt check from slab_node into ___cache_alloc and add an
      "unlikely()" to avoid pipeline stalls on some architectures.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
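The branch hint mentioned in this commit can be illustrated with a minimal sketch. It assumes a GCC or Clang compiler (for `__builtin_expect`); `pick_node` and its parameters are hypothetical names chosen for illustration and are not taken from mm/slab.c.

```c
#include <stddef.h>

/* Branch hint in the style of the kernel's unlikely(); it relies on the
 * GCC/Clang builtin __builtin_expect and tells the compiler that the
 * condition is expected to be false most of the time. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical stand-in for the allocation fast path.
 * policy_node: pointer to the node chosen by a memory policy, or NULL
 *              when the task has no policy (the common case).
 * in_irq:      non-zero when running in interrupt context, where a
 *              per-task policy must not be applied.
 * local_node:  the node of the executing processor. */
static int pick_node(const int *policy_node, int in_irq, int local_node)
{
    /* Both checks sit behind one hinted branch, so the common
     * policy-free path falls straight through. */
    if (unlikely(policy_node != NULL && !in_irq))
        return *policy_node;
    return local_node;
}
```

The hint changes only code layout and branch prediction defaults; the result of `pick_node` is the same with or without it.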
    • [PATCH] NUMA policies in the slab allocator V2 · dc85da15
      Committed by Christoph Lameter
      This patch fixes a regression in 2.6.14 against 2.6.13 that causes an
      imbalance in memory allocation during bootup.
      
      The slab allocator in 2.6.13 is not numa aware and simply calls
      alloc_pages().  This means that memory policies may control the behavior of
      alloc_pages().  During bootup the memory policy is set to MPOL_INTERLEAVE
      resulting in the spreading out of allocations during bootup over all
      available nodes.  The slab allocator in 2.6.13 has only a single list of
      slab pages.  As a result the per cpu slab cache and the spinlock controlled
      page lists may contain slab entries from off node memory.  The slab
      allocator in 2.6.13 makes no effort to discern the locality of an entry on
      its lists.
      
      The NUMA aware slab allocator in 2.6.14 controls locality of the slab pages
      explicitly by calling alloc_pages_node().  The NUMA slab allocator manages
      slab entries by having lists of available slab pages for each node.  The
      per cpu slab cache can only contain slab entries associated with the node
      local to the processor.  This guarantees that the default allocation mode
      of the slab allocator always assigns local memory if available.
      
      Setting MPOL_INTERLEAVE as a default policy during bootup has no effect
      anymore.  In 2.6.14 all node unspecific slab allocations are performed on
      the boot processor.  This means that most key data structures are
      allocated on one node.  Most processors will have to refer to these
      structures making the boot node a potential bottleneck.  This may reduce
      performance and cause unnecessary memory pressure on the boot node.
      
      This patch implements NUMA policies in the slab layer.  The slab
      allocator itself must apply NUMA memory policies explicitly, since the
      NUMA slab allocator no longer lets the page allocator control
      locality.
      
      The check for policies is made directly at the beginning of __cache_alloc
      using current->mempolicy.  The memory policy is already frequently checked
      by the page allocator (alloc_page_vma() and alloc_page_current()).  So it
      is highly likely that the cacheline is present.  For MPOL_INTERLEAVE
      kmalloc() will spread out each request to one node after another so that an
      equal distribution of allocations can be obtained during bootup.
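The round-robin spreading described for MPOL_INTERLEAVE can be sketched as follows. This is a simplified userspace model, not the kernel's implementation: `interleave_state`, `interleave_next`, and the 32-node bitmask are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical per-task interleave state. */
struct interleave_state {
    uint32_t allowed;   /* bit n set => node n may be used */
    int      last;      /* node handed out most recently, -1 at start */
};

/* Return the next allowed node after st->last, wrapping around after
 * node 31.  Returns -1 if the mask allows no node at all. */
static int interleave_next(struct interleave_state *st)
{
    for (int step = 1; step <= 32; step++) {
        int node = (st->last + step) % 32;
        if (st->allowed & (UINT32_C(1) << node)) {
            st->last = node;    /* remember where the rotation stands */
            return node;
        }
    }
    return -1;
}
```

Calling `interleave_next` once per allocation request visits the allowed nodes in turn, which is what produces an even distribution over time.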
      
      It is not possible to push the policy check to lower layers of the NUMA
      slab allocator since the per cpu caches now contain only slab
      entries from the current node.  If the policy says that the local node is
      not preferred or is forbidden then there is no point in checking the
      slab cache or local list of slab pages.  The allocation is better directed
      immediately to the lists containing slab entries for the allowed set of
      nodes.
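The ordering argument above can be illustrated with a small model: the policy decision comes first, and the per cpu cache is consulted only when the wanted node is the local one. All names here (`mini_cache`, `mini_alloc`, the counters standing in for slab lists) are hypothetical, and locking is left out.

```c
#define MAX_NODES 4

/* Hypothetical miniature of the allocator state.  The per cpu cache
 * holds only objects from local_node; node_avail[] stands in for the
 * spinlock-protected per-node slab lists. */
struct mini_cache {
    int local_node;
    int percpu_avail;            /* objects in the per cpu cache */
    int node_avail[MAX_NODES];   /* objects on each node's lists */
};

enum source { FROM_PERCPU, FROM_NODE_LIST, FAILED };

/* wanted_node is the node selected by the memory policy (or the local
 * node when the task has no policy). */
static enum source mini_alloc(struct mini_cache *c, int wanted_node)
{
    if (wanted_node < 0 || wanted_node >= MAX_NODES)
        return FAILED;

    /* Off-node request: the per cpu cache cannot help, so go directly
     * to the wanted node's lists. */
    if (wanted_node != c->local_node) {
        if (c->node_avail[wanted_node] > 0) {
            c->node_avail[wanted_node]--;
            return FROM_NODE_LIST;
        }
        return FAILED;
    }

    /* Local request: fast path through the per cpu cache first. */
    if (c->percpu_avail > 0) {
        c->percpu_avail--;
        return FROM_PERCPU;
    }
    if (c->node_avail[c->local_node] > 0) {
        c->node_avail[c->local_node]--;
        return FROM_NODE_LIST;
    }
    return FAILED;
}
```

In this model an off-node request never touches `percpu_avail`, mirroring the point that checking the local cache would be wasted work when the policy excludes the local node.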
      
      This way of applying policy also fixes another strange behavior in 2.6.13.
      alloc_pages() is controlled by the memory allocation policy of the current
      process.  It could therefore be that one process is running with
      MPOL_INTERLEAVE and would, for example, obtain a new page following that
      policy since no slab entries are in the lists anymore.  A page can typically
      be used for multiple slab entries but let's say that the current process is
      only using one.  The other entries are then added to the slab lists.  These
      are now non local entries in the slab lists despite the possible
      availability of local pages that would provide faster access and increase
      the performance of the application.
      
      Another process without MPOL_INTERLEAVE may now run and expect a local slab
      entry from kmalloc().  However, there are still these free slab entries
      from the off node page obtained from the other process via MPOL_INTERLEAVE
      in the cache.  The process will then get an off node slab entry although
      other slab entries may be available that are local to that process.  This
      means that the policy of one process may contaminate the locality of the
      slab caches for other processes.
      
      This patch in effect ensures that a per process policy is followed for the
      allocation of slab entries and that there cannot be a memory policy
      influence from one process to another.  A process with default policy will
      always get a local slab entry if one is available, and a process using
      memory policies will get its memory arranged as requested.  Off-node slab
      allocation requires the use of spinlocks and rules out the use of per
      cpu caches.  A process using memory policies to redirect
      allocations off-node will have to cope with additional lock overhead in
      addition to the latency added by the need to access a remote slab entry.
      
      Changes V1->V2
      - Remove #ifdef CONFIG_NUMA by moving forward declaration into
        prior #ifdef CONFIG_NUMA section.
      
      - Give the function determining the node number to use a saner
        name.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sem2mutex: mm/slab.c · fc0abb14
      Committed by Ingo Molnar
      Convert mm/swapfile.c's swapon_sem to swapon_mutex.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  3. 12 Jan, 2006 1 commit
  4. 10 Jan, 2006 1 commit
  5. 09 Jan, 2006 6 commits
  6. 14 Nov, 2005 2 commits
  7. 08 Nov, 2005 1 commit
  8. 07 Nov, 2005 4 commits
  9. 30 Oct, 2005 1 commit
  10. 28 Oct, 2005 1 commit
  11. 09 Oct, 2005 1 commit
  12. 28 Sep, 2005 1 commit
  13. 24 Sep, 2005 1 commit
  14. 23 Sep, 2005 3 commits
  15. 15 Sep, 2005 1 commit
    • [PATCH] Fix slab BUG_ON() triggered by change in array cache size · c7e43c78
      Committed by Alok Kataria
      With the new changes that we made in the initialization of the slab
      allocator, we first set up the cache from which array caches are allocated,
      and then the cache from which kmem_list3's are allocated.
      
      Now if the array cache comes from a cache in which objsize > 32, (in this
      instance size-64) then, first size-64 cache will be allocated and then the
      size-128 (if this is the cache from which kmem_list3's are going to be
      allocated).
      
      So with these new changes, we are not guaranteed that we will be
      initializing the malloc_sizes array in a serialized order. Thus there is
      a bug in __find_general_cachep, as we are checking whether the first
      cache_sizes ptr is NULL.
      
      This is replaced by checking whether the array-cache cache is initialized.
      Attached is a patch which does that.  Boots fine on an x86-64, with
      DEBUG_SPIN, DEBUG_SLAB, and preempt.

      Thanks & Regards, Alok
      Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
      Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Christoph Lameter <christoph@lameter.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  16. 11 Sep, 2005 1 commit
  17. 10 Sep, 2005 2 commits
    • [PATCH] update kfree, vfree, and vunmap kerneldoc · 80e93eff
      Committed by Pekka Enberg
      This patch clarifies NULL handling of kfree() and vfree().  In addition,
      the wording of the calling context restriction for vfree() and vunmap() is
      changed from "may not" to "must not."
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Acked-by: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Numa-aware slab allocator V5 · e498be7d
      Committed by Christoph Lameter
      The NUMA API change that introduced kmalloc_node was accepted for
      2.6.12-rc3.  Now it is possible to do slab allocations on a node to
      localize memory structures.  This API was used by the pageset localization
      patch and the block layer localization patch now in mm.  The existing
      kmalloc_node is slow since it simply searches through all pages of the slab
      to find a page that is on the node requested.  The two patches do a one
      time allocation of slab structures at initialization and therefore the
      speed of kmalloc_node does not matter.
      
      This patch allows kmalloc_node to be as fast as kmalloc by introducing node
      specific page lists for partial, free and full slabs.  Slab allocation
      improves in a NUMA system so that we are seeing a performance gain in AIM7
      of about 5% with this patch alone.
      
      More NUMA localizations are possible if kmalloc_node operates in a fast
      way like kmalloc.
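The idea of node specific page lists can be sketched with a simplified data structure. This is an illustration under stated assumptions, not the layout used in mm/slab.c: `struct slab`, `struct node_lists`, `struct cache`, and `slab_for_node` are hypothetical names, and singly linked lists replace the kernel's list handling.

```c
#include <stddef.h>

#define MAX_NODES 4

/* Hypothetical slab descriptor; only the list linkage matters here. */
struct slab {
    struct slab *next;
};

/* One set of lists per node, instead of a single global set. */
struct node_lists {
    struct slab *partial;   /* some objects in use, some free */
    struct slab *full;      /* no free objects left */
    struct slab *free;      /* no objects in use */
};

struct cache {
    struct node_lists node[MAX_NODES];
};

/* Find a slab with free objects on the requested node.  Only that
 * node's lists are examined, so the cost does not depend on how many
 * slabs the other nodes hold.  Partial slabs are preferred so that
 * completely free slabs stay available for release. */
static struct slab *slab_for_node(struct cache *c, int node)
{
    if (node < 0 || node >= MAX_NODES)
        return NULL;
    if (c->node[node].partial != NULL)
        return c->node[node].partial;
    return c->node[node].free;
}
```

Compared with scanning every page of the slab for one on the right node, the lookup here is a constant number of pointer checks per request.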
      
      Test run on a 32p system with 32G RAM.
      
      w/o patch
      Tasks    jobs/min  jti  jobs/min/task      real       cpu
          1      485.36  100       485.3640     11.99      1.91   Sat Apr 30 14:01:51 2005
        100    26582.63   88       265.8263     21.89    144.96   Sat Apr 30 14:02:14 2005
        200    29866.83   81       149.3342     38.97    286.08   Sat Apr 30 14:02:53 2005
        300    33127.16   78       110.4239     52.71    426.54   Sat Apr 30 14:03:46 2005
        400    34889.47   80        87.2237     66.72    568.90   Sat Apr 30 14:04:53 2005
        500    35654.34   76        71.3087     81.62    714.55   Sat Apr 30 14:06:15 2005
        600    36460.83   75        60.7681     95.77    853.42   Sat Apr 30 14:07:51 2005
        700    35957.00   75        51.3671    113.30    990.67   Sat Apr 30 14:09:45 2005
        800    33380.65   73        41.7258    139.48   1140.86   Sat Apr 30 14:12:05 2005
        900    35095.01   76        38.9945    149.25   1281.30   Sat Apr 30 14:14:35 2005
       1000    36094.37   74        36.0944    161.24   1419.66   Sat Apr 30 14:17:17 2005
      
      w/patch
      Tasks    jobs/min  jti  jobs/min/task      real       cpu
          1      484.27  100       484.2736     12.02      1.93   Sat Apr 30 15:59:45 2005
        100    28262.03   90       282.6203     20.59    143.57   Sat Apr 30 16:00:06 2005
        200    32246.45   82       161.2322     36.10    282.89   Sat Apr 30 16:00:42 2005
        300    37945.80   83       126.4860     46.01    418.75   Sat Apr 30 16:01:28 2005
        400    40000.69   81       100.0017     58.20    561.48   Sat Apr 30 16:02:27 2005
        500    40976.10   78        81.9522     71.02    696.95   Sat Apr 30 16:03:38 2005
        600    41121.54   78        68.5359     84.92    834.86   Sat Apr 30 16:05:04 2005
        700    44052.77   78        62.9325     92.48    971.53   Sat Apr 30 16:06:37 2005
        800    41066.89   79        51.3336    113.38   1111.15   Sat Apr 30 16:08:31 2005
        900    38918.77   79        43.2431    134.59   1252.57   Sat Apr 30 16:10:46 2005
       1000    41842.21   76        41.8422    139.09   1392.33   Sat Apr 30 16:13:05 2005
      
      These are measurements taken directly after boot and show a greater
      improvement than 5%.  However, the performance improvements diminish
      over time if the AIM7 runs are repeated and settle down at around 5%.
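The relative gain implied by the two tables can be worked out with a short helper. The jobs/min values below are copied from the tables above; the function and type names are hypothetical, and the plain arithmetic mean is just one possible way to summarize the rows.

```c
#include <stddef.h>

/* jobs/min for one task count, without and with the patch. */
struct aim7_row {
    int    tasks;
    double base;      /* w/o patch */
    double patched;   /* w/patch */
};

static const struct aim7_row aim7_rows[] = {
    {    1,   485.36,   484.27 },
    {  100, 26582.63, 28262.03 },
    {  200, 29866.83, 32246.45 },
    {  300, 33127.16, 37945.80 },
    {  400, 34889.47, 40000.69 },
    {  500, 35654.34, 40976.10 },
    {  600, 36460.83, 41121.54 },
    {  700, 35957.00, 44052.77 },
    {  800, 33380.65, 41066.89 },
    {  900, 35095.01, 38918.77 },
    { 1000, 36094.37, 41842.21 },
};

/* Percentage change of patched throughput relative to the baseline. */
static double gain_percent(double base, double patched)
{
    return (patched / base - 1.0) * 100.0;
}

/* Unweighted mean of the per-row gains. */
static double mean_gain_percent(void)
{
    size_t n = sizeof(aim7_rows) / sizeof(aim7_rows[0]);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += gain_percent(aim7_rows[i].base, aim7_rows[i].patched);
    return sum / (double)n;
}
```

For the 1000-task row this gives a gain of about 16%, and the single-task row is marginally negative, which is consistent with the statement that the freshly booted runs show more than the 5% seen in steady state.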
      
      Links to earlier discussions:
      http://marc.theaimsgroup.com/?t=111094594500003&r=1&w=2
      http://marc.theaimsgroup.com/?t=111603406600002&r=1&w=2
      
      Changelog V4-V5:
      - alloc_arraycache and alloc_aliencache take node parameter instead of cpu
      - fix initialization so that nodes without cpus are properly handled.
      - simplify code in kmem_cache_init
      - patch against Andrew's temp mm3 release
      - Add Shai to credits
      - fallback to __cache_alloc from __cache_alloc_node if the node's cache
        is not available yet.
      
      Changelog V3-V4:
      - Patch against 2.6.12-rc5-mm1
      - Cleanup patch integrated
      - More and better use of for_each_node and for_each_cpu
      - GCC 2.95 fix (do not use [] use [0])
      - Correct determination of INDEX_AC
      - Remove hack to cause an error on platforms that have no CONFIG_NUMA but nodes.
      - Remove list3_data and list3_data_ptr macros for better readability
      
      Changelog V2-V3:
      - Made to patch against 2.6.12-rc4-mm1
      - Revised bootstrap mechanism so that larger size kmem_list3 structs can be
        supported. Do a generic solution so that the right slab can be found
        for the internal structs.
      - use for_each_online_node
      
      Changelog V1-V2:
      - Batching for freeing of wrong-node objects (alien caches)
      - Locking changes and NUMA #ifdefs as requested by Manfred
      Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
      Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com>
      Signed-off-by: Shai Fultheim <Shai@Scalex86.org>
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  18. 08 Sep, 2005 1 commit
  19. 05 Sep, 2005 4 commits
  20. 08 Jul, 2005 1 commit
  21. 07 Jul, 2005 1 commit
  22. 24 Jun, 2005 1 commit
  23. 22 Jun, 2005 1 commit
    • [PATCH] Periodically drain non local pagesets · 4ae7c039
      Committed by Christoph Lameter
      The pageset array can potentially acquire a huge amount of memory on large
      NUMA systems.  For example, on a system with 512 processors and 256 nodes
      there will be 256*512 pagesets.  If each pageset only holds 5 pages then we
      are talking about 655360 pages.  With a 16K page size on IA64 this results in
      potentially 10 Gigabytes of memory being trapped in pagesets.  The typical
      cases are much less for smaller systems but there is still the potential of
      memory being trapped in off node pagesets.  Off node memory may be rarely
      used if local memory is available and so we may potentially have memory in
      seldom used pagesets without this patch.
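The figures quoted above follow from a direct multiplication, which the sketch below reproduces. The function names are hypothetical; the inputs (512 processors, 256 nodes, 5 pages per pageset, 16K pages) are the ones given in the commit message.

```c
#include <stdint.h>

/* Total pages held when every processor keeps one pageset per node
 * and each pageset holds pages_per_set pages. */
static uint64_t pageset_pages(uint64_t cpus, uint64_t nodes,
                              uint64_t pages_per_set)
{
    return cpus * nodes * pages_per_set;
}

/* The same quantity expressed in bytes for a given page size. */
static uint64_t pageset_bytes(uint64_t cpus, uint64_t nodes,
                              uint64_t pages_per_set, uint64_t page_size)
{
    return pageset_pages(cpus, nodes, pages_per_set) * page_size;
}
```

With the values from the text, `pageset_pages(512, 256, 5)` is 655360 and `pageset_bytes(512, 256, 5, 16384)` is exactly 10 * 2^30 bytes, matching the 10 Gigabytes stated.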
      
      The slab allocator flushes its per cpu caches every 2 seconds.  The
      following patch flushes the off node pageset caches in the same way by
      tying into the slab flush.
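A periodic drain of off node pagesets might look roughly like the following model. It is a simplification for illustration only: `cpu_pagesets` and `drain_remote` are hypothetical names, the pagesets are reduced to page counters, and the whole remote count is released in one step, which may differ from how the actual patch batches its work.

```c
#define MAX_NODES 4

/* Hypothetical per-processor view: one page counter per node. */
struct pageset {
    int count;   /* pages currently cached for this node */
};

struct cpu_pagesets {
    int            local_node;
    struct pageset set[MAX_NODES];
};

/* Called from a periodic timer (in the text, the slab flush that runs
 * every 2 seconds).  Empties every pageset that belongs to a node
 * other than the processor's own and returns the number of pages
 * released.  The local pageset is left alone because it is in active
 * use. */
static int drain_remote(struct cpu_pagesets *p)
{
    int freed = 0;
    for (int n = 0; n < MAX_NODES; n++) {
        if (n == p->local_node)
            continue;
        freed += p->set[n].count;
        p->set[n].count = 0;
    }
    return freed;
}
```

Running the drain from an existing periodic job avoids adding a second timer, which is the design choice the commit message describes.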
      
      The patch also changes /proc/zoneinfo to include the number of pages
      currently in each pageset.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>