1. 10 May 2007 (10 commits)
  2. 09 May 2007 (10 commits)
  3. 08 May 2007 (20 commits)
    • Fix up SLUB compile · 0f9008ef
      Linus Torvalds committed
      The newly merged SLUB allocator patches had been generated before the
      removal of "struct subsystem", and ended up applying fine, but wouldn't
      build based on the current tree as a result.
      
      Fix up that merge error - not that SLUB is likely really ready for
      showtime yet, but at least I can fix the trivial stuff.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • freezer: fix racy usage of try_to_freeze in kswapd · b1296cc4
      Rafael J. Wysocki committed
      Currently we can miss freeze_process()->signal_wake_up() in kswapd() if it
      happens between try_to_freeze() and prepare_to_wait().  To prevent this
      from happening we should check freezing(current) before calling schedule().
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
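
      A minimal sketch of the fixed wait pattern (illustrative only; the
      function name and wait queue are assumptions, not the exact kswapd code):

      static void kswapd_wait_sketch(wait_queue_head_t *wait_q)
      {
      	DEFINE_WAIT(wait);

      	/* Re-check freezing(current) after prepare_to_wait() so a freeze
      	 * request arriving between try_to_freeze() and prepare_to_wait()
      	 * cannot be lost inside schedule(). */
      	prepare_to_wait(wait_q, &wait, TASK_INTERRUPTIBLE);
      	if (!freezing(current))
      		schedule();
      	finish_wait(wait_q, &wait);

      	try_to_freeze();	/* enter the refrigerator if requested */
      }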
    • swsusp: use inline functions for changing page flags · 7be98234
      Rafael J. Wysocki committed
      Replace direct invocations of SetPageNosave(), SetPageNosaveFree() etc.  with
      calls to inline functions that can be changed in subsequent patches without
      modifying the code calling them.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
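
      A minimal sketch of the wrapper idea (the helper names here are
      hypothetical, not necessarily the ones the patch introduces):

      /* Callers use the helpers, so the underlying flag representation can
       * change in later patches without touching any call sites. */
      static inline void swsusp_set_page_forbidden(struct page *page)
      {
      	SetPageNosave(page);
      }

      static inline int swsusp_page_is_forbidden(struct page *page)
      {
      	return PageNosave(page);
      }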
    • slob: fix page order calculation on not 4KB page · 4ab688c5
      Akinobu Mita committed
      SLOB doesn't calculate the correct page order when the page size is not 4KB.
      This patch fixes it by using get_order() instead of find_order(), which is
      SLOB's version of get_order().
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Acked-by: Matt Mackall <mpm@selenic.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
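
      For illustration, get_order() computes the order from the real PAGE_SIZE,
      so the result stays correct on non-4KB configurations (hypothetical helper):

      /* get_order(size) returns the smallest order such that
       * (PAGE_SIZE << order) >= size, whatever PAGE_SIZE is. */
      static int slob_order_sketch(size_t size)
      {
      	return get_order(size);
      }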
    • Slab allocators: remove useless __GFP_NO_GROW flag · cfce6604
      Christoph Lameter committed
      There is no user remaining and I have never seen any use of that flag.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • slab allocators: Remove SLAB_CTOR_ATOMIC · 4f104934
      Christoph Lameter committed
      SLAB_CTOR_ATOMIC is never used, which is no surprise since I cannot imagine
      that one would want to do something serious in a constructor or destructor,
      in particular given that the slab allocators run with interrupts disabled.
      Actions in constructors and destructors are by their nature very limited
      and usually do not go beyond initializing variables and list operations.
      
      (The i386 pgd ctor and dtors do take a spinlock in the constructor and
      destructor...  I think that is the furthest we go at this point.)
      
      There is no flag passed to the destructor so removing SLAB_CTOR_ATOMIC also
      establishes a certain symmetry.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
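
      For context, a typical slab constructor of this era stays within those
      limits, e.g. (hypothetical cache object and fields):

      /* 2.6.21-era ctor signature: (object, cache, flags).  Nothing here
       * may sleep; work is limited to field and list initialization. */
      static void my_obj_ctor(void *obj, struct kmem_cache *cachep,
      			unsigned long flags)
      {
      	struct my_obj *p = obj;

      	memset(p, 0, sizeof(*p));
      	INIT_LIST_HEAD(&p->list);
      	spin_lock_init(&p->lock);
      }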
    • slab allocators: Remove SLAB_DEBUG_INITIAL flag · 50953fe9
      Christoph Lameter committed
      I have never seen a use of SLAB_DEBUG_INITIAL.  It is only supported by
      SLAB.
      
      I think its purpose was to have a callback after an object has been freed
      to verify that the state is the constructor state again?  The callback is
      performed before each freeing of an object.
      
      I would think that it is much easier to check the object state manually
      before the free.  That also places the check near the code that
      manipulates the object.
      
      Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
      compiled with SLAB debugging on.  If there were code in a constructor
      handling SLAB_DEBUG_INITIAL then it would have to be conditional on
      SLAB_DEBUG; otherwise it would just be dead code.  But there is no such code
      in the kernel.  I think SLAB_DEBUG_INITIAL is too problematic to make real
      use of, difficult to understand, and there are easier ways to accomplish the
      same effect (i.e. add debug code before kfree).
      
      There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
      clear in fs inode caches.  Remove the pointless checks (they would even be
      pointless without the removal of SLAB_DEBUG_INITIAL) from the fs constructors.
      
      This is the last slab flag that SLUB did not support.  Remove the check for
      unimplemented flags from SLUB.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
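
      The suggested alternative, checking the object manually right before the
      free, might look like this (hypothetical object and invariant):

      /* Verify the object is back in its constructed state, then free it. */
      static void my_obj_free(struct kmem_cache *cachep, struct my_obj *p)
      {
      	BUG_ON(!list_empty(&p->list));	/* invariant established by the ctor */
      	kmem_cache_free(cachep, p);
      }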
    • get_unmapped_area doesn't need hugetlbfs hacks anymore · 4b1d8929
      Benjamin Herrenschmidt committed
      Remove the hugetlbfs specific hacks in toplevel get_unmapped_area() now that
      all archs and hugetlbfs itself do the right thing for both cases.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: William Irwin <bill.irwin@oracle.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Grant Grundler <grundler@parisc-linux.org>
      Cc: Matthew Wilcox <willy@debian.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • get_unmapped_area handles MAP_FIXED in generic code · 06abdfb4
      Benjamin Herrenschmidt committed
      generic arch_get_unmapped_area() now handles MAP_FIXED.  Now that all
      implementations have been fixed, change the toplevel get_unmapped_area() to
      call into arch or drivers for the MAP_FIXED case.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Grant Grundler <grundler@parisc-linux.org>
      Cc: Matthew Wilcox <willy@debian.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: William Irwin <bill.irwin@oracle.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
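
      A condensed sketch of the resulting toplevel dispatch (simplified, not the
      verbatim kernel code): MAP_FIXED requests now flow through the same arch and
      driver hooks as ordinary requests:

      unsigned long get_unmapped_area(struct file *file, unsigned long addr,
      				unsigned long len, unsigned long pgoff,
      				unsigned long flags)
      {
      	unsigned long (*get_area)(struct file *, unsigned long,
      				  unsigned long, unsigned long,
      				  unsigned long);

      	/* No special-casing of MAP_FIXED or hugetlbfs here anymore. */
      	get_area = current->mm->get_unmapped_area;
      	if (file && file->f_op && file->f_op->get_unmapped_area)
      		get_area = file->f_op->get_unmapped_area;
      	return get_area(file, addr, len, pgoff, flags);
      }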
    • oom: fix constraint deadlock · 2b45ab33
      David Rientjes committed
      Fixes a deadlock in the OOM killer for allocations that are not
      __GFP_HARDWALL.
      
      Before the OOM killer checks for the allocation constraint, it takes
      callback_mutex.
      
      constrained_alloc() iterates through each zone in the allocation zonelist
      and calls cpuset_zone_allowed_softwall() to determine whether an allocation
      for gfp_mask is possible.  If a zone's node is not in the OOM-triggering
      task's mems_allowed, it is not exiting, and we did not fail on a
      __GFP_HARDWALL allocation, cpuset_zone_allowed_softwall() attempts to take
      callback_mutex to check the nearest exclusive ancestor of current's cpuset.
      This results in deadlock.
      
      We now take callback_mutex after iterating through the zonelist since we
      don't need it yet.
      
      Cc: Andi Kleen <ak@suse.de>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Cc: Martin J. Bligh <mbligh@mbligh.org>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
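
      A simplified sketch of the reordering (bodies condensed; assumes
      cpuset_lock() is the wrapper that takes callback_mutex):

      void out_of_memory_sketch(struct zonelist *zonelist, gfp_t gfp_mask)
      {
      	enum oom_constraint constraint;

      	/* Classify the allocation first, without callback_mutex held,
      	 * since constrained_alloc() may itself need that mutex. */
      	constraint = constrained_alloc(zonelist, gfp_mask);

      	cpuset_lock();		/* safe now: no nested acquisition */
      	/* ... select and kill a task according to constraint ... */
      	cpuset_unlock();
      }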
    • mm: fix handling of panic_on_oom when cpusets are in use · 2b744c01
      Yasunori Goto committed
      The current panic_on_oom may not work if there is a process using
      cpusets/mempolicy, because memory on other nodes may remain usable.  But some
      people want failover by panicking ASAP even when cpusets/mempolicy are in
      use.  This patch adds a new setting for that request.
      
      This is tested on my ia64 box which has 3 nodes.
      Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Ethan Solomita <solo@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
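
      Assuming the new setting is an elevated sysctl value (the value 2 below is
      an assumption for illustration), the check might reduce to:

      /* Panic unconditionally, even when the OOM is confined by
       * cpusets/mempolicy to a subset of nodes. */
      if (sysctl_panic_on_oom == 2)
      	panic("out of memory: compulsory panic_on_oom\n");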
    • fault injection: fix failslab with CONFIG_NUMA · 824ebef1
      Akinobu Mita committed
      Currently failslab injects failures into ____cache_alloc().  But with
      CONFIG_NUMA enabled, that is not enough to make the actual slab allocator
      entry points (kmalloc, kmem_cache_alloc, ...) return NULL.
      
      This patch moves the fault injection hook inside __cache_alloc() and
      __cache_alloc_node().  These cover both call paths into ____cache_alloc()
      and make it possible to inject failures into the slab allocators with
      CONFIG_NUMA.
      Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
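
      A condensed sketch of the hook placement (simplified; should_failslab()
      stands in for the existing failslab predicate):

      static __always_inline void *__cache_alloc_sketch(struct kmem_cache *cachep,
      				gfp_t flags)
      {
      	if (should_failslab(cachep, flags))
      		return NULL;	/* injected failure reaches kmalloc() etc. */

      	return ____cache_alloc(cachep, flags);	/* the real allocation */
      }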
    • slab allocators: Remove obsolete SLAB_MUST_HWCACHE_ALIGN · 5af60839
      Christoph Lameter committed
      This patch was recently posted to lkml and acked by Pekka.
      
      The flag SLAB_MUST_HWCACHE_ALIGN is
      
      1. Never checked by SLAB at all.
      
      2. A duplicate of SLAB_HWCACHE_ALIGN for SLUB.
      
      3. Fulfills the role of SLAB_HWCACHE_ALIGN for SLOB.
      
      The only remaining use is in sparc64 and ppc64, and their use there
      reflects some earlier role that the slab flag once may have had.  If
      it is specified then SLAB_HWCACHE_ALIGN is also specified.
      
      The flag is confusing, inconsistent and has no purpose.
      
      Remove it.
      Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: madvise avoid exclusive mmap_sem · 0a27a14a
      Nick Piggin committed
      Avoid down_write of the mmap_sem in madvise when we can help it.
      Acked-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
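
      The underlying idea, sketched (the set of behaviors taking the read-lock
      path is illustrative):

      /* Advice that does not change the address-space layout can run under
       * down_read(&mm->mmap_sem); layout-changing advice still needs write. */
      static int madvise_need_mmap_write_sketch(int behavior)
      {
      	switch (behavior) {
      	case MADV_REMOVE:
      	case MADV_WILLNEED:
      	case MADV_DONTNEED:
      		return 0;	/* read lock suffices */
      	default:
      		return 1;	/* e.g. MADV_SEQUENTIAL rewrites vma flags */
      	}
      }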
    • slob: handle SLAB_PANIC flag · bc0055ae
      Akinobu Mita committed
      kmem_cache_create() for slob doesn't handle SLAB_PANIC.
      Signed-off-by: Matt Mackall <mpm@selenic.com>
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
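
      A minimal sketch of the fix in slob's kmem_cache_create() (condensed from
      the description, not verbatim):

      /* Honor SLAB_PANIC instead of silently returning NULL when the
       * cache descriptor cannot be allocated. */
      if (!c && (flags & SLAB_PANIC))
      	panic("Cannot create slab cache %s\n", name);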
    • Quicklists for page table pages · 6225e937
      Christoph Lameter committed
      On x86_64 this cuts the allocation overhead for page table pages down to a
      fraction (kernel compile / editing load; TSC-based measurement of time spent
      in each function):
      
      no quicklist
      
      pte_alloc               1569048 4.3s(401ns/2.7us/179.7us)
      pmd_alloc                780988 2.1s(337ns/2.7us/86.1us)
      pud_alloc                780072 2.2s(424ns/2.8us/300.6us)
      pgd_alloc                260022 1s(920ns/4us/263.1us)
      
      quicklist:
      
      pte_alloc                452436 573.4ms(8ns/1.3us/121.1us)
      pmd_alloc                196204 174.5ms(7ns/889ns/46.1us)
      pud_alloc                195688 172.4ms(7ns/881ns/151.3us)
      pgd_alloc                 65228 9.8ms(8ns/150ns/6.1us)
      
      pgd allocations are the most complex and there we see the most dramatic
      improvement (maybe we can cut down the number of pgds cached somewhat?).  But
      even the pte allocations still see a doubling of performance.
      
      1. Proven code from the IA64 arch.
      
      	The method used here has been fine tuned for years and
      	is NUMA aware. It is based on the knowledge that accesses
      	to page table pages are sparse in nature. Taking a page
      	off the freelists instead of allocating zeroed pages
      	allows a reduction in the number of cachelines touched
      	in addition to getting rid of the slab overhead. So
      	performance improves. This is particularly useful if pgds
      	contain standard mappings. We can save on the teardown
      	and setup of such a page if we have some on the quicklists.
      	This includes avoiding lists operations that are otherwise
      	necessary on alloc and free to track pgds.
      
      2. Lightweight alternative to using slab to manage page-size pages
      
      	Slab overhead is significant and even page allocator use
      	is pretty heavyweight. The use of a per-cpu quicklist
      	means that we touch only two cachelines for an allocation.
      	There is no need to access the page_struct (unless arch code
      	needs to fiddle around with it). So the fast path just
      	means bringing in one cacheline at the beginning of the
      	page. That same cacheline may then be used to store the
      	page table entry. Or a second cacheline may be used
      	if the page table entry is not in the first cacheline of
      	the page. The current code will zero the page which means
      	touching 32 cachelines (assuming 128-byte cachelines). We get
      	down from 32 to 2 cachelines in the fast path.
      
      3. x86_64 gets lightweight page table page management.
      
      	This will allow x86_64 arch code to repopulate pgds
      	and other page table entries faster. The list operations for pgds
      	are reduced in the same way as for i386 to the point where
      	a pgd is allocated from the page allocator and when it is
      	freed back to the page allocator. A pgd can pass through
      	the quicklists without having to be reinitialized.
      
      4. Consolidation of code from multiple arches
      
      	So far arches have their own implementation of quicklist
      	management. This patch moves that feature into the core allowing
      	an easier maintenance and consistent management of quicklists.
      
      Page table pages have the characteristic that they are typically zero or in a
      known state when they are freed.  This is usually exactly the same state as
      needed after allocation.  So it makes sense to build a list of freed page
      table pages and then consume those previously used pages first.  Those pages
      have already been initialized correctly (thus no need to zero them) and are
      likely already cached in such a way that the MMU can use them most
      effectively.  Page table pages are used in a sparse way so zeroing them on
      allocation is not too useful.
      
      Such an implementation already exists for ia64.  However, that implementation
      did not support constructors and destructors as needed by i386 / x86_64.  It
      also only supported a single quicklist.  The implementation here has
      constructor and destructor support as well as the ability for an arch to
      specify how many quicklists are needed.
      
      Quicklists are defined by an arch defining CONFIG_QUICKLIST.  If more than one
      quicklist is necessary then we can define NR_QUICK for additional lists.
      F.e. i386 needs two and thus has
      
      config NR_QUICK
      	int
      	default 2
      
      If an arch has requested quicklist support then pages can be allocated
      from the quicklist (or from the page allocator if the quicklist is
      empty) via:
      
      quicklist_alloc(<quicklist-nr>, <gfpflags>, <constructor>)
      
      Page table pages can be freed using:
      
      quicklist_free(<quicklist-nr>, <destructor>, <page>)
      
      Pages must have a definite state after allocation and before
      they are freed. If no constructor is specified then pages
      will be zeroed on allocation and must be zeroed before they are
      freed.
      
      If a constructor is used then the constructor will establish
      a definite page state. F.e. the i386 and x86_64 pgd constructors
      establish certain mappings.
      
      Constructors and destructors can also be used to track the pages.
      i386 and x86_64 use a list of pgds in order to be able to dynamically
      update standard mappings.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Andi Kleen <ak@suse.de>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
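
      A hypothetical arch-side usage sketch of the API described above
      (quicklist 0, no constructor, so pages are kept zeroed):

      #include <linux/quicklist.h>

      static inline pte_t *pte_alloc_one_kernel_sketch(struct mm_struct *mm,
      				unsigned long address)
      {
      	return (pte_t *)quicklist_alloc(0, GFP_KERNEL, NULL);
      }

      static inline void pte_free_kernel_sketch(pte_t *pte)
      {
      	/* no destructor: the caller guarantees the page is zeroed again */
      	quicklist_free(0, NULL, pte);
      }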
    • slub: remove object activities out of checking functions · 70d71228
      Christoph Lameter committed
      Make sure that the check functions really only check things and do not
      perform other activities.  Extract the tracing and object seeding out of the
      two check functions and place them into slab_alloc and slab_free.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • SLUB: Free slabs and sort partial slab lists in kmem_cache_shrink · 2086d26a
      Christoph Lameter committed
      At kmem_cache_shrink, check if we have any empty slabs on the partial
      lists; if so, remove them.
      
      Also--as an anti-fragmentation measure--sort the partial slabs so that
      the most fully allocated ones come first and the least allocated last.
      
      The next allocations may fill up the nearly full slabs. Having the
      least allocated slabs last gives them the maximum chance that their
      remaining objects may be freed. Thus we can hopefully minimize the
      partial slabs.
      
      I think this is the best one can do in terms of anti-fragmentation
      measures.  Real defragmentation (meaning moving objects out of slabs with
      the least free objects to those that are almost full) can be implemented
      by reverse scanning through the list produced here, but that would mean
      that we need to provide a callback at slab cache creation that allows
      the deletion or moving of an object.  This will involve slab API
      changes, so defer for now.
      
      Cc: Mel Gorman <mel@skynet.ie>
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
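
      A condensed sketch of the shrink pass (simplified; locking and the
      slabs_by_inuse bucket setup are omitted):

      /* Bucket partial slabs by objects in use, discarding empty ones, then
       * rebuild the partial list with the fullest slabs first. */
      struct page *page, *t;
      int i;

      list_for_each_entry_safe(page, t, &n->partial, lru) {
      	if (!page->inuse) {
      		list_del(&page->lru);
      		discard_slab(s, page);	/* empty: give the slab back */
      	} else {
      		list_move(&page->lru, slabs_by_inuse + page->inuse);
      	}
      }
      for (i = s->objects - 1; i > 0; i--)	/* fullest first */
      	list_splice(slabs_by_inuse + i, n->partial.prev);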
    • slub: add ability to list alloc / free callers per slab · 88a420e4
      Christoph Lameter committed
      This patch enables listing the callers who allocated or freed objects in a
      cache.
      
      For example, to list the allocators for kmalloc-128, do:
      
      cat /sys/slab/kmalloc-128/alloc_calls
            7 sn_io_slot_fixup+0x40/0x700
            7 sn_io_slot_fixup+0x80/0x700
            9 sn_bus_fixup+0xe0/0x380
            6 param_sysfs_setup+0xf0/0x280
          276 percpu_populate+0xf0/0x1a0
           19 __register_chrdev_region+0x30/0x360
            8 expand_files+0x2e0/0x6e0
            1 sys_epoll_create+0x60/0x200
            1 __mounts_open+0x140/0x2c0
           65 kmem_alloc+0x110/0x280
            3 alloc_disk_node+0xe0/0x200
           33 as_get_io_context+0x90/0x280
           74 kobject_kset_add_dir+0x40/0x140
           12 pci_create_bus+0x2a0/0x5c0
            1 acpi_ev_create_gpe_block+0x120/0x9e0
           41 con_insert_unipair+0x100/0x1c0
            1 uart_open+0x1c0/0xba0
            1 dma_pool_create+0xe0/0x340
            2 neigh_table_init_no_netlink+0x260/0x4c0
            6 neigh_parms_alloc+0x30/0x200
            1 netlink_kernel_create+0x130/0x320
            5 fz_hash_alloc+0x50/0xe0
            2 sn_common_hubdev_init+0xd0/0x6e0
           28 kernel_param_sysfs_setup+0x30/0x180
           72 process_zones+0x70/0x2e0
      
      cat /sys/slab/kmalloc-128/free_calls
          558 <not-available>
            3 sn_io_slot_fixup+0x600/0x700
           84 free_fdtable_rcu+0x120/0x260
            2 seq_release+0x40/0x60
            6 kmem_free+0x70/0xc0
           24 free_as_io_context+0x20/0x200
            1 acpi_get_object_info+0x3a0/0x3e0
            1 acpi_add_single_object+0xcf0/0x1e40
            2 con_release_unimap+0x80/0x140
            1 free+0x20/0x40
      
      SLAB_STORE_USER must be enabled for a slab cache by either booting with
      "slub_debug" or enabling user tracking specifically for the slab of interest.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>