1. 17 October 2007, 4 commits
    • Group short-lived and reclaimable kernel allocations · e12ba74d
      Committed by Mel Gorman
      This patch marks a number of allocations that are either short-lived, such as
      network buffers, or reclaimable, such as inode allocations.  When something
      like updatedb is called, long-lived and unmovable kernel allocations tend to
      be spread throughout the address space, which increases fragmentation.
      
      This patch groups these allocations together as much as possible by adding a
      new MIGRATE_TYPE.  The MIGRATE_RECLAIMABLE type is for allocations that can
      be reclaimed on demand but not moved; i.e., they can be "migrated" by deleting
      them and re-reading the information from elsewhere.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
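      A rough sketch of the grouping mechanism described above. The enum layout and
      the helper below are illustrative assumptions, not the literal patch; the
      point is that allocations tagged reclaimable land on their own free lists
      instead of being interleaved with unmovable pages.

          /* Sketch only: the page allocator keeps free lists per migrate type. */
          enum migratetype {
                  MIGRATE_UNMOVABLE,      /* long-lived, unmovable kernel allocations */
                  MIGRATE_RECLAIMABLE,    /* reclaimable on demand (inodes, buffers) */
                  MIGRATE_MOVABLE,        /* pages that can be physically migrated */
                  MIGRATE_TYPES
          };

          /* Pick a free list based on the GFP flags of the request. */
          static inline int allocflags_to_migratetype(gfp_t gfp_flags)
          {
                  if (gfp_flags & __GFP_RECLAIMABLE)
                          return MIGRATE_RECLAIMABLE;
                  if (gfp_flags & __GFP_MOVABLE)
                          return MIGRATE_MOVABLE;
                  return MIGRATE_UNMOVABLE;
          }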
    • Categorize GFP flags · 6cb06229
      Committed by Christoph Lameter
      The function of GFP_LEVEL_MASK seems to be unclear.  In order to clear up
      the mystery, we get rid of it and replace GFP_LEVEL_MASK with three sets of
      GFP flags:
      
      GFP_RECLAIM_MASK	Flags used to control page allocator reclaim behavior.
      
      GFP_CONSTRAINT_MASK	Flags used to limit where allocations can occur.
      
      GFP_SLAB_BUG_MASK	Flags that the slab allocator BUG()s on.
      
      These replace the uses of GFP_LEVEL_MASK in the slab allocators and in
      vmalloc.c.
      
      The use of the flags not included in these sets may occur as a result of a
      slab allocation standing in for a page allocation when constructing scatter
      gather lists.  Extraneous flags are cleared and not passed through to the
      page allocator.  __GFP_MOVABLE/RECLAIMABLE, __GFP_COLD and __GFP_COMP will
      now be ignored if passed to a slab allocator.
      
      Change the allocation of allocator metadata in SLAB and vmalloc to not
      pass through flags listed in GFP_CONSTRAINT_MASK.  SLAB already removes the
      __GFP_THISNODE flag for such allocations.  Generalize that to also cover
      vmalloc.  The use of GFP_CONSTRAINT_MASK also includes __GFP_HARDWALL.
      
      The impact of allocator metadata placement on access latency to the
      cachelines of the object itself is minimal since metadata is only
      referenced on alloc and free.  The attempt is still made to place the
      metadata optimally, but we consistently allow fallback in both SLAB and
      vmalloc (SLUB does not need to allocate metadata like that).
      
      Allocator metadata may serve multiple in-kernel users and thus should not
      be subject to the limitations arising from a single allocation context.
      
      [akpm@linux-foundation.org: fix fallback_alloc()]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
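      For orientation, a sketch of what the three masks and the slab-side filtering
      might look like; the exact flag membership shown here is an assumption, not a
      quote of the patch.

          #define GFP_RECLAIM_MASK    (__GFP_WAIT|__GFP_HIGH|__GFP_IO|__GFP_FS| \
                                       __GFP_NOWARN|__GFP_REPEAT|__GFP_NOFAIL| \
                                       __GFP_NORETRY)
          #define GFP_CONSTRAINT_MASK (__GFP_HARDWALL|__GFP_THISNODE)
          #define GFP_SLAB_BUG_MASK   (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK)

          /* In a slab allocation path: refuse impossible flags, drop the rest. */
          if (unlikely(flags & GFP_SLAB_BUG_MASK))
                  BUG();
          flags &= GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK;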
    • Memoryless nodes: Slab support · 04231b30
      Committed by Christoph Lameter
      Slab should not allocate control structures for nodes without memory.  This
      may seem to work right now, but it's unreliable since not all allocations
      can fall back due to the use of GFP_THISNODE.
      
      Switching a few for_each_online_node loops to N_NORMAL_MEMORY allows us to
      allocate only for nodes that have regular memory.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Acked-by: Nishanth Aravamudan <nacc@us.ibm.com>
      Acked-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Acked-by: Bob Picco <bob.picco@hp.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@skynet.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
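      The core of the change is just swapping the iterator, roughly as below;
      init_node_lists() is a hypothetical stand-in for the per-node setup the
      commit refers to.

          /* Before: control structures for every online node, even memoryless ones. */
          for_each_online_node(node)
                  init_node_lists(cachep, node);

          /* After: only nodes that actually have regular memory. */
          for_each_node_state(node, N_NORMAL_MEMORY)
                  init_node_lists(cachep, node);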
    • Slab allocators: fail if ksize is called with a NULL parameter · ef8b4520
      Committed by Christoph Lameter
      A NULL pointer means that the object was not allocated.  One cannot
      determine the size of an object that has not been allocated.  Currently we
      return 0 but we really should BUG() on attempts to determine the size of
      something nonexistent.
      
      krealloc() interprets NULL to mean a zero-sized object.  Handle that
      separately in krealloc().
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Matt Mackall <mpm@selenic.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
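      A condensed sketch of the resulting behaviour (allocator internals simplified
      and partly assumed): ksize() now refuses NULL outright, while krealloc() keeps
      treating NULL as "grow from nothing".

          size_t ksize(const void *objp)
          {
                  BUG_ON(!objp);  /* NULL was never allocated; it has no size */
                  return obj_size(virt_to_cache(objp));
          }

          void *krealloc(const void *p, size_t new_size, gfp_t flags)
          {
                  size_t ks = p ? ksize(p) : 0;   /* NULL means a zero-sized object */
                  void *ret;

                  if (ks >= new_size)
                          return (void *)p;

                  ret = kmalloc(new_size, flags);
                  if (ret && p) {
                          memcpy(ret, p, ks);
                          kfree(p);
                  }
                  return ret;
          }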
  2. 23 August 2007, 1 commit
  3. 25 July 2007, 1 commit
  4. 20 July 2007, 3 commits
    • mm: Remove slab destructors from kmem_cache_create(). · 20c2df83
      Committed by Paul Mundt
      Slab destructors were no longer supported after Christoph's
      c59def9f change. They've been
      BUGs for both slab and slub, and slob never supported them
      either.
      
      This rips out support for the dtor pointer from kmem_cache_create()
      completely and fixes up every single callsite in the kernel (there were
      about 224, not including the slab allocator definitions themselves,
      or the documentation references).
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
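      For reference, a sketch of the interface change (prototypes as they roughly
      looked in that era; the example cache at the end is hypothetical):

          /* Old: a destructor argument that nothing was allowed to use any more. */
          struct kmem_cache *kmem_cache_create(const char *name, size_t size,
                          size_t align, unsigned long flags,
                          void (*ctor)(void *, struct kmem_cache *, unsigned long),
                          void (*dtor)(void *, struct kmem_cache *, unsigned long));

          /* New: the dtor parameter is gone. */
          struct kmem_cache *kmem_cache_create(const char *name, size_t size,
                          size_t align, unsigned long flags,
                          void (*ctor)(void *, struct kmem_cache *, unsigned long));

          /* Typical callsite fix-up: simply drop the trailing NULL. */
          cachep = kmem_cache_create("example_cache", sizeof(struct example), 0,
                                     SLAB_HWCACHE_ALIGN, example_ctor);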
    • Fix up non-NUMA SLAB configuration for zero-sized allocations · a5c96d8a
      Committed by Linus Torvalds
      I suspect Christoph tested his code only in the NUMA configuration; for
      the combination of SLAB + non-NUMA, the zero-sized kmalloc()s would not work.
      
      Of course, this would only trigger in configurations where those zero-
      sized allocations happen (not very common), so that may explain why it
      wasn't more widely noticed.
      
      Seen by Andi Kleen under qemu, and there seems to be a report by
      Michael Tsirkin on it too.
      
      Cc: Andi Kleen <ak@suse.de>
      Cc: Roland Dreier <rdreier@cisco.com>
      Cc: Michael S. Tsirkin <mst@dev.mellanox.co.il>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
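      The failing path is the non-NUMA kmalloc() one, where a zero-sized request
      resolves to no real cache; a sketch of the kind of guard needed there (helper
      names as they existed in SLAB, exact shape assumed):

          static __always_inline void *__do_kmalloc(size_t size, gfp_t flags,
                                                    void *caller)
          {
                  struct kmem_cache *cachep = __find_general_cachep(size, flags);

                  /* Zero-sized request: there is no backing cache, so return the
                   * sentinel instead of dereferencing a non-cache pointer. */
                  if (unlikely(ZERO_OR_NULL_PTR(cachep)))
                          return cachep;

                  return __cache_alloc(cachep, flags, caller);
          }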
    • FRV: work around a possible compiler bug · ea02e3dd
      Committed by David Howells
      Work around a possible bug in the FRV compiler.
      
      What appears to be happening is that gcc resolves the
      __builtin_constant_p() in kmalloc() to true, but then fails to reduce the
      (therefore constant) conditions in the if-statements it guards to constant
      results.
      
      When compiling with -O2 or -Os, a single spurious error crops up in
      cpuup_callback() in mm/slab.c.  This can be avoided by making the memsize
      variable const.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
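      The workaround itself is essentially a one-liner in cpuup_callback(), roughly:

          /* Making the value const lets gcc fold the __builtin_constant_p()-
           * guarded branches in kmalloc() down to constants again. */
          const int memsize = sizeof(struct kmem_list3);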
  5. 18 July 2007, 5 commits
  6. 17 July 2007, 2 commits
  7. 06 July 2007, 1 commit
    • Fix slab redzone alignment · 87a927c7
      Committed by David Woodhouse
      Commit b46b8f19 fixed a couple of bugs
      by switching the redzone to 64 bits. Unfortunately, it neglected to
      ensure that the _second_ redzone, after the slab object, is aligned
      correctly. This caused illegal instruction faults on sparc32, which for
      some reason not entirely clear to me are not trapped and fixed up.
      
      Two things need to be done to fix this:
        - increase the object size, rounding up to alignof(long long) so
          that the second redzone can be aligned correctly.
        - If SLAB_STORE_USER is set but alignof(long long)==8, allow a
          full 64 bits of space for the user word at the end of the buffer,
          even though we may not _use_ the whole 64 bits.
      
      This patch should be a no-op on any 64-bit architecture or any 32-bit
      architecture where alignof(long long) == 4. Of the others, it's tested
      on ppc32 by myself and a very similar patch was tested on sparc32 by
      Mark Fortescue, who reported the new problem.
      
      Also, fix the conditions for FORCED_DEBUG, which hadn't been adjusted to
      the new sizes. Again noticed by Mark.
      Signed-off-by: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
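      A sketch of the size accounting the two bullets describe (variable names
      assumed; the real logic lives in kmem_cache_create() in mm/slab.c):

          /* 1) Round the object size up so the second, trailing redzone word is
           *    aligned for a 64-bit access even on 32-bit machines. */
          if (flags & SLAB_RED_ZONE)
                  size = ALIGN(size, __alignof__(unsigned long long));

          /* 2) When storing the caller, reserve a full 64 bits for the user word
           *    if long long demands 8-byte alignment, even though the whole
           *    64 bits may not be used. */
          if (flags & SLAB_STORE_USER)
                  size += (__alignof__(unsigned long long) == 8) ?
                          sizeof(unsigned long long) : sizeof(void *);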
  8. 02 July 2007, 1 commit
  9. 09 June 2007, 1 commit
  10. 19 May 2007, 1 commit
  11. 17 May 2007, 4 commits
  12. 10 May 2007, 6 commits
    • Move remote node draining out of slab allocators · 4037d452
      Committed by Christoph Lameter
      Currently the slab allocators contain callbacks into the page allocator to
      perform the draining of pagesets on remote nodes.  This requires SLUB to have
      a whole subsystem in order to be compatible with SLAB.  Moving node draining
      out of the slab allocators avoids a section of code in SLUB.
      
      Move the node draining so that it is done when the vm statistics are updated.
      At that point we are already touching all the cachelines with the pagesets of
      a processor.
      
      Add an expire counter there.  If we have to update per-zone or global vm
      statistics, then assume that the pageset will require subsequent draining.
      
      The expire counter will be decremented on each vm stats update pass until it
      reaches zero.  Then we will drain one batch from the pageset.  The draining
      will cause vm counter updates which will then cause another expiration until
      the pcp is empty.  So we will drain a batch every 3 seconds.
      
      Note that remote node draining is a somewhat esoteric feature that is required
      on large NUMA systems because otherwise significant portions of system memory
      can become trapped in pcp queues.  The number of pcps is determined by the
      number of processors and nodes in a system.  A system with 4 processors and 2
      nodes has 8 pcps, which is okay.  But a system with 1024 processors and 512
      nodes has 512k pcps with a high potential for a large amount of memory being
      caught in them.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
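      A compressed sketch of the expire logic described above; fold_pageset_diffs()
      is a hypothetical helper standing in for the per-cpu counter folding, and the
      field names are assumptions.

          static void refresh_cpu_vm_stats(int cpu)
          {
                  struct zone *zone;

                  for_each_zone(zone) {
                          struct per_cpu_pageset *p = zone_pcp(zone, cpu);

                          if (fold_pageset_diffs(zone, p)) {
                                  /* Counters changed: the pageset is in use,
                                   * so re-arm the expiration. */
                                  p->expire = 3;
                          } else if (p->expire && !--p->expire && p->pcp[0].count) {
                                  /* Idle for three passes (~3 seconds): return one
                                   * batch of pages to the remote node's free lists. */
                                  drain_zone_pages(zone, &p->pcp[0]);
                          }
                  }
          }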
    • vmstat: use our own timer events · d1187ed2
      Committed by Christoph Lameter
      vmstat is currently using the cache reaper to periodically bring the
      statistics up to date.  The cache reaper only exists in SLUB as a way to
      provide compatibility with SLAB.  This patch removes the vmstat calls from the
      slab allocators and provides its own handling.
      
      The advantage is also that we can use a different frequency for the updates.
      Refreshing vm stats is a pretty fast job, so we can run this every second and
      stagger it by only one tick.  This will lead to some overlap in large
      systems.  E.g., a system running at 250 HZ with 1024 processors will have 4
      vm updates occurring at once.
      
      However, the vm stats update only accesses per node information.  It is only
      necessary to stagger the vm statistics updates per processor in each node.  Vm
      counter updates occurring on distant nodes will not cause cacheline
      contention.
      
      We could implement an alternate approach that runs the first processor on each
      node at the second and then each of the other processors on a node on a
      subsequent tick.  That may be useful to keep a large part of the second free
      of timer activity.  Maybe the timer folks will have some feedback on this one?
      
      [jirislaby@gmail.com: add missing break]
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
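      Roughly what the self-contained timer looks like (close to, but not a quote
      of, the patch): each CPU re-arms its own delayed work once a second, and the
      initial firing is offset by the CPU number so the timers start one tick apart.

          static DEFINE_PER_CPU(struct delayed_work, vmstat_work);

          static void vmstat_update(struct work_struct *w)
          {
                  refresh_cpu_vm_stats(smp_processor_id());
                  schedule_delayed_work(&__get_cpu_var(vmstat_work), HZ);
          }

          static void __cpuinit start_cpu_timer(int cpu)
          {
                  struct delayed_work *work = &per_cpu(vmstat_work, cpu);

                  INIT_DELAYED_WORK(work, vmstat_update);
                  schedule_delayed_work_on(cpu, work, HZ + cpu);  /* staggered start */
          }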
    • Add suspend-related notifications for CPU hotplug · 8bb78442
      Committed by Rafael J. Wysocki
      Since nonboot CPUs are now disabled after tasks and devices have been
      frozen and the CPU hotplug infrastructure is used for this purpose, we need
      special CPU hotplug notifications that will help the CPU-hotplug-aware
      subsystems distinguish normal CPU hotplug events from CPU hotplug events
      related to a system-wide suspend or resume operation in progress.  This
      patch introduces such notifications and causes them to be used during
      suspend and resume transitions.  It also changes all of the
      CPU-hotplug-aware subsystems to take these notifications into consideration
      (for now they are handled in the same way as the corresponding "normal"
      ones).
      
      [oleg@tv-sign.ru: cleanups]
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
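      In practice, CPU-hotplug-aware code now also sees *_FROZEN variants of the
      usual events during suspend/resume and, for now, handles them identically.
      A minimal sketch with a hypothetical notifier:

          static int __cpuinit example_cpu_notify(struct notifier_block *nb,
                                                  unsigned long action, void *hcpu)
          {
                  switch (action) {
                  case CPU_UP_PREPARE:
                  case CPU_UP_PREPARE_FROZEN:     /* same event, during suspend/resume */
                          /* allocate per-cpu state here */
                          break;
                  case CPU_DEAD:
                  case CPU_DEAD_FROZEN:
                          /* release per-cpu state here */
                          break;
                  }
                  return NOTIFY_OK;
          }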
    • slab: shut down cache_reaper when cpu goes down · 5830c590
      Committed by Christoph Lameter
      Shut down the cache_reaper if the cpu is brought down and set
      cache_reap.func to NULL.  Otherwise hotplug shuts down the reaper for good.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
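      The gist, as a sketch of the CPU_DOWN_PREPARE branch (surrounding notifier
      code assumed):

          case CPU_DOWN_PREPARE:
                  /* Stop the per-cpu reaper; clearing func marks the work as
                   * "not scheduled", so CPU_ONLINE can start it again later. */
                  cancel_rearming_delayed_work(&per_cpu(reap_work, cpu));
                  per_cpu(reap_work, cpu).work.func = NULL;
                  break;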
    • slab: use CPU_LOCK_[ACQUIRE|RELEASE] · 38c3bd96
      Committed by Heiko Carstens
      Looks like this was forgotten when CPU_LOCK_[ACQUIRE|RELEASE] was
      introduced.
      
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Gautham Shenoy <ego@in.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
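      Presumably the missed conversion looks like the other users of these events:
      take and release slab's cache_chain_mutex from the two lock events instead of
      open-coding it per hotplug event (sketch, not the literal patch):

          case CPU_LOCK_ACQUIRE:
                  mutex_lock(&cache_chain_mutex);
                  break;
          case CPU_LOCK_RELEASE:
                  mutex_unlock(&cache_chain_mutex);
                  break;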
    • krealloc: fix kerneldoc comments · 7ae439ce
      Committed by Pekka J Enberg
      No "blank" (or "*") line is allowed between the function name and the lines
      for its parameter(s).
      
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
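      Concretely, the kerneldoc rule being fixed (the header below is illustrative):

          /**
           * krealloc - reallocate memory; the contents stay unchanged
           * @p: object to reallocate memory for
           * @new_size: how many bytes of memory are required
           * @flags: the type of memory to allocate
           *
           * A longer description may follow after this blank "*" line, but no such
           * line is allowed between the summary line and the @parameter lines.
           */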
  13. 09 May 2007, 2 commits
  14. 08 May 2007, 8 commits