1. 17 May 2007, 6 commits
  2. 11 May 2007, 2 commits
    • SLUB: remove nr_cpu_ids hack · bcf889f9
      Christoph Lameter authored
      This was in SLUB to head off trouble while the nr_cpu_ids functionality
      was not yet merged.  It is merged now, so there is no need to keep this.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bcf889f9
    • slub: support concurrent local and remote frees and allocs on a slab · 894b8788
      Christoph Lameter authored
      Avoid atomic overhead in slab_alloc and slab_free
      
      SLUB needs to use the slab_lock for the per cpu slabs to synchronize with
      potential kfree operations.  This patch avoids that need by moving all free
      objects onto a lockless_freelist.  The regular freelist continues to exist
      and will be used to free objects.  So while we consume the
      lockless_freelist the regular freelist may build up objects.
      
      If we are out of objects on the lockless_freelist then we may check the
      regular freelist.  If it has objects then we move those over to the
      lockless_freelist and do this again.  There is a significant saving in
      terms of atomic operations that have to be performed.  (An illustrative
      sketch of this two-freelist fast path follows this entry.)
      
      We can even free directly to the lockless_freelist if we know that we are
      running on the same processor.  This speeds up short-lived objects: they
      may be allocated and freed without taking the slab_lock.  This is
      particularly good for netperf.
      
      In order to maximize the effect of the new faster hotpath we extract the
      hottest performance pieces into inlined functions.  These are then inlined
      into kmem_cache_alloc and kmem_cache_free.  So hotpath allocation and
      freeing no longer requires a subroutine call within SLUB.
      
      [I am not sure that it is worth doing this because it changes the
      easy-to-read structure of SLUB just to reduce atomic ops.  However, there
      is someone out there with a benchmark on 4-way and 8-way processor systems
      that seems to show a 5% regression vs.  SLAB.  It seems that the regression
      is due to SLUB's increased use of atomic operations vs.  SLAB.  I wonder if
      this is applicable or discernible at all in a real workload?]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      894b8788
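
      The following is a minimal illustrative sketch of the two-freelist idea
      described in the commit above.  The structure and function names
      (sketch_cpu_slab, sketch_alloc, sketch_free) are invented and simplify
      away the real per-cpu and per-page bookkeeping; only the locking pattern
      mirrors the description.

      #include <linux/spinlock.h>

      struct sketch_cpu_slab {
          void **lockless_freelist;   /* consumed only by the CPU that owns the slab */
          void **freelist;            /* remote frees land here, under slab_lock */
          spinlock_t slab_lock;
      };

      /* Hot path: pop an object without taking slab_lock or using atomics. */
      static void *sketch_alloc(struct sketch_cpu_slab *s)
      {
          void **object = s->lockless_freelist;

          if (!object) {
              /* Slow path: take over whatever built up on the regular
               * freelist while the lockless one was being consumed. */
              spin_lock(&s->slab_lock);
              object = s->freelist;
              s->freelist = NULL;
              spin_unlock(&s->slab_lock);
              if (!object)
                  return NULL;        /* the real code would activate a new slab */
          }
          s->lockless_freelist = object[0];   /* free pointer is stored in the object */
          return object;
      }

      static void sketch_free(struct sketch_cpu_slab *s, void *x, int same_cpu)
      {
          void **object = x;

          if (same_cpu) {             /* short-lived objects never take the lock */
              object[0] = s->lockless_freelist;
              s->lockless_freelist = object;
              return;
          }
          spin_lock(&s->slab_lock);   /* remote frees go to the regular freelist */
          object[0] = s->freelist;
          s->freelist = object;
          spin_unlock(&s->slab_lock);
      }

      The point of the design is that the common, same-CPU path touches only
      CPU-local data and therefore needs neither the lock nor atomic operations.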
  3. 10 May 2007, 18 commits
  4. 08 May 2007, 14 commits
    • Fix up SLUB compile · 0f9008ef
      Linus Torvalds authored
      The newly merged SLUB allocator patches had been generated before the
      removal of "struct subsystem"; they applied fine, but as a result would
      not build against the current tree.
      
      Fix up that merge error - not that SLUB is likely really ready for
      showtime yet, but at least I can fix the trivial stuff.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0f9008ef
    • Slab allocators: remove useless __GFP_NO_GROW flag · cfce6604
      Christoph Lameter authored
      There are no users remaining, and I have never seen any use of that flag.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cfce6604
    • slab allocators: Remove SLAB_CTOR_ATOMIC · 4f104934
      Christoph Lameter authored
      SLAB_CTOR_ATOMIC is never used, which is no surprise since I cannot imagine
      that one would want to do something serious in a constructor or destructor,
      in particular given that the slab allocators run with interrupts disabled.
      Actions in constructors and destructors are by their nature very limited
      and usually do not go beyond initializing variables and list operations.
      (An illustrative sketch of such a constructor follows this entry.)
      
      (The i386 pgd ctor and dtor do take a spinlock in the constructor and
      destructor.  I think that is the furthest we go at this point.)
      
      There is no flag passed to the destructor so removing SLAB_CTOR_ATOMIC also
      establishes a certain symmetry.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4f104934
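
      For context, a minimal sketch of what a slab constructor of this era
      typically looks like.  The three-argument prototype matches the ctor
      signature of the time as far as I can tell; the object type and field
      names are made up.

      #include <linux/list.h>
      #include <linux/slab.h>
      #include <linux/spinlock.h>

      /* Hypothetical object; the point is that a constructor only initializes
       * fields and list heads and does nothing that could sleep or allocate. */
      struct example_obj {
          struct list_head list;
          spinlock_t lock;
          int state;
      };

      static void example_ctor(void *obj, struct kmem_cache *cachep,
                               unsigned long flags)
      {
          struct example_obj *e = obj;

          INIT_LIST_HEAD(&e->list);
          spin_lock_init(&e->lock);
          e->state = 0;
      }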
    • slab allocators: Remove SLAB_DEBUG_INITIAL flag · 50953fe9
      Christoph Lameter authored
      I have never seen a use of SLAB_DEBUG_INITIAL.  It is only supported by
      SLAB.
      
      I think its purpose was to provide a callback that verifies, before each
      free of an object, that the object has been returned to its constructor
      state.
      
      I would think that it is much easier to check the object state manually
      before the free.  That also places the check near the code that
      manipulates the object.  (An illustrative sketch of such a check follows
      this entry.)
      
      Also, the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
      compiled with SLAB debugging on.  If there were code in a constructor
      handling SLAB_DEBUG_INITIAL then it would have to be conditional on
      SLAB_DEBUG, otherwise it would just be dead code.  But there is no such code
      in the kernel.  I think SLAB_DEBUG_INITIAL is too problematic to make real
      use of, difficult to understand, and there are easier ways to accomplish the
      same effect (i.e.  add debug code before kfree).
      
      There is a related flag, SLAB_CTOR_VERIFY, that is frequently checked to be
      clear in fs inode caches.  Remove the pointless checks (they would be
      pointless even without the removal of SLAB_DEBUG_INITIAL) from the fs
      constructors.
      
      This is the last slab flag that SLUB did not support.  Remove the check for
      unimplemented flags from SLUB.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      50953fe9
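
      A hedged sketch of the suggested alternative: check the object state by
      hand, right before the free.  The example_obj type is hypothetical; the
      point is only that the check lives next to the code that manipulates the
      object.

      #include <linux/list.h>
      #include <linux/slab.h>

      struct example_obj {
          struct list_head list;      /* hypothetical object, as in the sketch above */
          int state;
      };

      static void example_release(struct example_obj *e)
      {
          /* The check sits right next to the code that manipulates the
           * object instead of hiding behind a constructor flag. */
          WARN_ON(!list_empty(&e->list));
          WARN_ON(e->state != 0);
          kfree(e);
      }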
    • slab allocators: Remove obsolete SLAB_MUST_HWCACHE_ALIGN · 5af60839
      Christoph Lameter authored
      This patch was recently posted to lkml and acked by Pekka.
      
      The flag SLAB_MUST_HWCACHE_ALIGN is
      
      1. Never checked by SLAB at all.
      
      2. A duplicate of SLAB_HWCACHE_ALIGN for SLUB.
      
      3. Fulfills the role of SLAB_HWCACHE_ALIGN for SLOB.
      
      The only remaining uses are in sparc64 and ppc64, and their use there
      reflects some earlier role that the slab flag may once have had.  If it is
      specified then SLAB_HWCACHE_ALIGN is also specified.
      
      The flag is confusing, inconsistent and has no purpose.
      
      Remove it.
      Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5af60839
    • slub: remove object activities out of checking functions · 70d71228
      Christoph Lameter authored
      Make sure that the check functions really only check things and do not
      perform activities.  Extract the tracing and object seeding out of the two
      check functions and place them into slab_alloc and slab_free.  (An
      illustrative sketch of this separation follows this entry.)
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      70d71228
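
      An illustrative sketch of the separation this refactoring aims for: the
      check routine only inspects and reports, while tracing and object seeding
      happen in the allocation path that calls it.  All names here are invented
      stand-ins, not the actual SLUB functions.

      #include <linux/stddef.h>

      static int debug_enabled = 1;

      /* Hypothetical side-effect helpers; they live outside the check function. */
      static void sketch_trace_alloc(void *object, unsigned long addr) { }
      static void sketch_init_object(void *object) { }

      /* Pure check: inspects the object and reports, but changes nothing. */
      static int sketch_check_object(const void *object)
      {
          /* ... verify redzones, free pointer and padding here ... */
          return object != NULL;
      }

      static void *sketch_slab_alloc(void *object, unsigned long addr)
      {
          if (debug_enabled && !sketch_check_object(object))
              return NULL;

          sketch_trace_alloc(object, addr);   /* tracing happens in the caller ... */
          sketch_init_object(object);         /* ... and so does object seeding */
          return object;
      }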
    • SLUB: Free slabs and sort partial slab lists in kmem_cache_shrink · 2086d26a
      Christoph Lameter authored
      At kmem_cache_shrink, check if we have any empty slabs on the partial
      lists and, if so, remove them.
      
      Also, as an anti-fragmentation measure, sort the partial slabs so that
      the most fully allocated ones come first and the least allocated ones
      come last.
      
      The next allocations may fill up the nearly full slabs. Having the
      least allocated slabs last gives them the maximum chance that their
      remaining objects may be freed. Thus we can hopefully minimize the
      partial slabs.
      
      I think this is the best one can do in terms of anti-fragmentation
      measures.  Real defragmentation (meaning moving objects out of the least
      used slabs into those that are almost full) could be implemented by
      scanning the list produced here in reverse, but that would mean that we
      need to provide a callback at slab cache creation that allows the deletion
      or moving of an object.  That will involve slab API changes, so defer it
      for now.  (An illustrative sketch of the shrink pass follows this entry.)
      
      Cc: Mel Gorman <mel@skynet.ie>
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2086d26a
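
      An illustrative sketch of such a shrink pass, assuming a bucket sort keyed
      on the number of free objects per slab (the actual implementation may
      differ).  sketch_slab and the helpers are invented, and SKETCH_MAX_OBJECTS
      is an arbitrary bound for the example.

      #include <linux/list.h>

      #define SKETCH_MAX_OBJECTS 64       /* arbitrary bound for the example */

      struct sketch_slab {
          struct list_head lru;
          unsigned int inuse;             /* objects currently allocated */
          unsigned int objects;           /* objects the slab can hold */
      };

      static void sketch_discard_slab(struct sketch_slab *slab)
      {
          /* stand-in for handing the underlying page back to the page allocator */
      }

      /* Rebuild a partial list: drop empty slabs, then splice the buckets so
       * that slabs with few free objects (nearly full) come first and slabs
       * with many free objects come last. */
      static void sketch_shrink_partial(struct list_head *partial)
      {
          struct list_head buckets[SKETCH_MAX_OBJECTS + 1];
          struct sketch_slab *slab, *tmp;
          unsigned int i;

          for (i = 0; i <= SKETCH_MAX_OBJECTS; i++)
              INIT_LIST_HEAD(&buckets[i]);

          list_for_each_entry_safe(slab, tmp, partial, lru) {
              unsigned int nr_free = slab->objects - slab->inuse;

              if (nr_free > SKETCH_MAX_OBJECTS)
                  nr_free = SKETCH_MAX_OBJECTS;
              list_del(&slab->lru);
              if (slab->inuse == 0)
                  sketch_discard_slab(slab);          /* empty: free it */
              else
                  list_add(&slab->lru, &buckets[nr_free]);
          }

          /* Fewest free objects (fullest slabs) spliced back first. */
          for (i = 0; i <= SKETCH_MAX_OBJECTS; i++)
              list_splice_tail(&buckets[i], partial);
      }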
    • slub: add ability to list alloc / free callers per slab · 88a420e4
      Christoph Lameter authored
      This patch enables listing the callers who allocated or freed objects in a
      cache.
      
      For example, to list the allocators for kmalloc-128, do:
      
      cat /sys/slab/kmalloc-128/alloc_calls
            7 sn_io_slot_fixup+0x40/0x700
            7 sn_io_slot_fixup+0x80/0x700
            9 sn_bus_fixup+0xe0/0x380
            6 param_sysfs_setup+0xf0/0x280
          276 percpu_populate+0xf0/0x1a0
           19 __register_chrdev_region+0x30/0x360
            8 expand_files+0x2e0/0x6e0
            1 sys_epoll_create+0x60/0x200
            1 __mounts_open+0x140/0x2c0
           65 kmem_alloc+0x110/0x280
            3 alloc_disk_node+0xe0/0x200
           33 as_get_io_context+0x90/0x280
           74 kobject_kset_add_dir+0x40/0x140
           12 pci_create_bus+0x2a0/0x5c0
            1 acpi_ev_create_gpe_block+0x120/0x9e0
           41 con_insert_unipair+0x100/0x1c0
            1 uart_open+0x1c0/0xba0
            1 dma_pool_create+0xe0/0x340
            2 neigh_table_init_no_netlink+0x260/0x4c0
            6 neigh_parms_alloc+0x30/0x200
            1 netlink_kernel_create+0x130/0x320
            5 fz_hash_alloc+0x50/0xe0
            2 sn_common_hubdev_init+0xd0/0x6e0
           28 kernel_param_sysfs_setup+0x30/0x180
           72 process_zones+0x70/0x2e0
      
      cat /sys/slab/kmalloc-128/free_calls
          558 <not-available>
            3 sn_io_slot_fixup+0x600/0x700
           84 free_fdtable_rcu+0x120/0x260
            2 seq_release+0x40/0x60
            6 kmem_free+0x70/0xc0
           24 free_as_io_context+0x20/0x200
            1 acpi_get_object_info+0x3a0/0x3e0
            1 acpi_add_single_object+0xcf0/0x1e40
            2 con_release_unimap+0x80/0x140
            1 free+0x20/0x40
      
      SLAB_STORE_USER must be enabled for a slab cache, either by booting with
      "slub_debug" or by enabling user tracking specifically for the slab of
      interest.  (An illustrative sketch of how such a listing can be aggregated
      follows this entry.)
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      88a420e4
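
      A hedged sketch of how a listing like the one above can be aggregated:
      each tracked object records the address of its allocator, and reading the
      sysfs file counts how many records share each address.  The types and the
      helper below are invented for illustration.

      /* One record is kept per tracked object (invented shape). */
      struct sketch_track {
          unsigned long addr;             /* caller recorded at allocation time */
      };

      /* One output line per unique caller. */
      struct sketch_location {
          unsigned long addr;
          unsigned long count;
      };

      /* Fold one tracking record into the per-caller counts; returns the
       * (possibly grown) number of unique locations. */
      static int sketch_add_location(struct sketch_location *locs, int nlocs,
                                     const struct sketch_track *t)
      {
          int i;

          for (i = 0; i < nlocs; i++) {
              if (locs[i].addr == t->addr) {
                  locs[i].count++;
                  return nlocs;
              }
          }
          locs[nlocs].addr = t->addr;
          locs[nlocs].count = 1;
          return nlocs + 1;
      }

      Reading alloc_calls then amounts to walking every tracked object in the
      cache, folding its record into such a table, and printing one
      "count symbol+offset" line per entry.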
    • SLUB: Add MIN_PARTIAL · e95eed57
      Christoph Lameter authored
      We leave a minimum number of partial slabs on a node when we search for
      partial slabs on other nodes.  Define a constant for that value.
      
      Then modify slub to keep MIN_PARTIAL slabs around.
      
      This avoids bad situations where a function frees the last object
      in a slab (which results in the page being returned to the page
      allocator) only to then allocate one again (which requires getting
      a page back from the page allocator if the partial list was empty).
      Keeping a couple of slabs on the partial list reduces overhead.
      
      Empty slabs are added to the end of the partial list to ensure that
      partially allocated slabs are consumed first (defragmentation).  (An
      illustrative sketch of this policy follows this entry.)
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e95eed57
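
      An illustrative sketch of the policy described above, applied when a free
      empties a slab.  The MIN_PARTIAL value of 2 is an assumption for the
      example, and the types and helpers are invented stand-ins.

      #include <linux/list.h>

      #define SKETCH_MIN_PARTIAL 2        /* assumed value, for illustration only */

      struct sketch_page {
          struct list_head lru;
      };

      struct sketch_node {
          struct list_head partial;
          unsigned long nr_partial;
      };

      static void sketch_discard_page(struct sketch_page *page)
      {
          /* stand-in for returning the page to the page allocator */
      }

      /* Called when a free removes the last object from a slab. */
      static void sketch_slab_became_empty(struct sketch_node *n,
                                           struct sketch_page *page)
      {
          if (n->nr_partial < SKETCH_MIN_PARTIAL) {
              /* Keep it; empty slabs go to the tail so that partially
               * allocated slabs are consumed first. */
              list_add_tail(&page->lru, &n->partial);
              n->nr_partial++;
          } else {
              sketch_discard_page(page);
          }
      }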
    • slub: validation of slabs (metadata and guard zones) · 53e15af0
      Christoph Lameter authored
      This enables validation of slabs.  Validation means that all objects are
      checked to see if there are redzone violations, if padding has been
      overwritten or if any pointers have been corrupted.  The consistency of the
      slab counters is also checked.
      
      Validation enables the detection of metadata corruption without the kernel
      having to execute code that actually uses (allocs/frees) an object.  It
      allows one to make sure that the slab meta-information and the guard values
      around an object have not been compromised.  (An illustrative sketch of
      such a validation pass follows this entry.)
      
      A single slabcache can be checked by writing a 1 to the "validate" file.
      
      i.e.
      
      echo 1 >/sys/slab/kmalloc-128/validate
      
      or use the slabinfo tool to check all slabs
      
      slabinfo -v
      
      Error messages will show up in the syslog.
      
      Note that validation can only reach slabs that are on a list.  This means that
      we are usually restricted to partial slabs and active slabs, unless
      SLAB_STORE_USER is active, which builds a full-slab list and allows
      validation of slabs that are fully in use.  Booting with "slub_debug" will
      enable SLAB_STORE_USER, and then full diagnostics are available.
      
      Note that we attempt to push cpu slabs back to the lists when we start the
      check.  If a cpu slab is reactivated before we get to it (another processor
      grabs it first) then it cannot be checked.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      53e15af0
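
      A hedged sketch of what validating a single slab involves, per the
      description above: check guard values and padding around every object,
      then walk the freelist and reject pointers that fall outside the slab.
      All helpers here are invented stand-ins for the real SLUB routines.

      /* Invented helpers: in the real code these inspect the redzone bytes,
       * the padding behind the object and the stored free pointer. */
      static int sketch_check_redzone(void *object) { return 1; }
      static int sketch_check_padding(void *object) { return 1; }
      static void *sketch_get_freepointer(void *object) { return *(void **)object; }

      static int sketch_validate_slab(char *base, unsigned int size,
                                      unsigned int objects, void *freelist)
      {
          unsigned int i;
          char *p;

          /* Guard values and padding around every object. */
          for (i = 0; i < objects; i++)
              if (!sketch_check_redzone(base + i * size) ||
                  !sketch_check_padding(base + i * size))
                  return 0;           /* the real code reports via syslog */

          /* Walk the freelist; a pointer outside the slab means corruption. */
          for (p = freelist; p; p = sketch_get_freepointer(p))
              if (p < base || p >= base + objects * size)
                  return 0;

          return 1;
      }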
    • slub: enable tracking of full slabs · 643b1138
      Christoph Lameter authored
      If slab tracking is on, then build a list of full slabs so that we can verify
      the integrity of all slabs and are also able to build lists of alloc/free
      callers.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      643b1138
    • slub: fix object tracking · 77c5e2d0
      Christoph Lameter authored
      Object tracking did not work correctly for several call chains.  Fix this up
      by adding a new parameter to slub_alloc and slub_free that specifies the
      caller address explicitly.  (An illustrative sketch follows this entry.)
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      77c5e2d0
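
      An illustrative sketch of the idea: the exported entry point captures its
      caller with __builtin_return_address(0) and hands it down, so tracking
      records the real call site instead of an internal helper.  The function
      names and the internal helper below are invented.

      #include <linux/slab.h>

      /* Invented internal helper standing in for the one that records the
       * caller address alongside the allocation. */
      static void *sketch_slab_alloc_track(struct kmem_cache *s, gfp_t gfpflags,
                                           unsigned long caller)
      {
          (void)caller;                   /* the real code stores this in the track */
          return kmem_cache_alloc(s, gfpflags);
      }

      void *sketch_kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
      {
          /* Record the external call site, not some internal wrapper. */
          return sketch_slab_alloc_track(s, gfpflags,
                                         (unsigned long)__builtin_return_address(0));
      }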
    • Make page->private usable in compound pages · d85f3385
      Christoph Lameter authored
      If we add a new flag so that we can distinguish between the first page and
      the tail pages, then we can avoid using page->private in the first page.
      page->private == page for the first page, so there is no real information in
      there.
      
      Freeing up page->private makes the use of compound pages more transparent.
      They become more usable, like real pages.  Right now we have to be careful,
      e.g., if we go beyond PAGE_SIZE allocations in the slab on i386, because we
      can then no longer use the private field.  This is one of the issues that
      keeps us from supporting debugging for page-size slabs in SLAB.
      
      Having page->private available for SLUB would allow more meta information in
      the page struct.  I can probably avoid the 16 bit ints that I have in there
      right now.
      
      Also if page->private is available then a compound page may be equipped with
      buffer heads.  This may free up the way for filesystems to support larger
      blocks than page size.
      
      We add PageTail as an alias of PageReclaim.  Compound pages cannot currently
      be reclaimed.  Because of the alias, one needs to check PageCompound first.
      (An illustrative sketch of the resulting head-page lookup follows this
      entry.)
      
      The RFC for this approach was discussed at
      http://marc.info/?t=117574302800001&r=1&w=2
      
      [nacc@us.ibm.com: fix hugetlbfs]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d85f3385
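
      An illustrative sketch of the resulting head-page lookup, assuming (as the
      message implies) that tail pages keep the head-page pointer in
      page->private; the helper name is made up.

      #include <linux/mm.h>

      static inline struct page *sketch_compound_head(struct page *page)
      {
          /* PageTail aliases PG_reclaim, so PageCompound must be checked
           * first, as the message above notes. */
          if (PageCompound(page) && PageTail(page))
              return (struct page *)page_private(page);
          return page;
      }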