08 December 2006: 40 commits
    • [PATCH] remove EXPORT_UNUSED_SYMBOL'ed symbols · 045f147f
      Adrian Bunk authored
      In time for 2.6.20, we can get rid of this junk.
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • 4668edc3
    • 1f370a23
    • [PATCH] hotplug CPU: clean up hotcpu_notifier() use · 02316067
      Ingo Molnar authored
      There was lots of #ifdef noise in the kernel due to hotcpu_notifier(fn,
      prio) not correctly marking 'fn' as used in the !HOTPLUG_CPU case, thus
      generating compiler warnings about unused symbols and forcing people to
      add #ifdefs.
      
      The compiler can skip truly unused functions just fine, so the image is
      unchanged (see the sketch after this entry):
      
          text    data     bss     dec     hex filename
       1624412  728710 3674856 6027978  5bfaca vmlinux.before
       1624412  728710 3674856 6027978  5bfaca vmlinux.after
      
      [akpm@osdl.org: topology.c fix]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
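      A minimal sketch of the resulting pattern (the !HOTPLUG_CPU branch below
      follows the idea described above; the HOTPLUG_CPU branch is illustrative):

          #ifdef CONFIG_HOTPLUG_CPU
          /* real registration, as before */
          #define hotcpu_notifier(fn, pri) {                              \
                  static struct notifier_block fn##_nb =                  \
                          { .notifier_call = fn, .priority = pri };       \
                  register_cpu_notifier(&fn##_nb);                        \
          }
          #else
          /* reference 'fn' so gcc considers it used (no warning), then
           * let dead code elimination drop the function entirely */
          #define hotcpu_notifier(fn, pri) do { (void)(fn); } while (0)
          #endif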
    • [PATCH] remove HASH_HIGHMEM · 04903664
      Andrew Morton authored
      It has no users and it's doubtful that we'll need it again.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] read_cache_pages() cleanup · 38da288b
      OGAWA Hirofumi authored
      Use put_pages_list() instead of open-coding it (see the sketch after this
      entry).
      Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
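      For reference, a sketch of what put_pages_list() does, i.e. the loop that
      no longer needs to be open-coded (simplified, kernel-style):

          /* release a list of pages and leave the list empty */
          void put_pages_list(struct list_head *pages)
          {
                  while (!list_empty(pages)) {
                          struct page *victim;

                          victim = list_entry(pages->prev, struct page, lru);
                          list_del(&victim->lru);
                          page_cache_release(victim);
                  }
          }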
    • [PATCH] slab: use probe_kernel_address() · 138ae663
      Andrew Morton authored
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Add include/linux/freezer.h and move definitions from sched.h · 7dfb7103
      Nigel Cunningham authored
      Move process freezing functions from include/linux/sched.h to freezer.h, so
      that modifications to the freezer or the kernel configuration don't require
      recompiling just about everything.
      
      [akpm@osdl.org: fix ueagle driver]
      Signed-off-by: Nigel Cunningham <nigel@suspend2.net>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swsusp: Improve handling of highmem · 8357376d
      Rafael J. Wysocki authored
      Currently swsusp saves the contents of highmem pages by copying them to the
      normal zone which is quite inefficient (eg.  it requires two normal pages
      to be used for saving one highmem page).  This may be improved by using
      highmem for saving the contents of saveable highmem pages.
      
      Namely, during the suspend phase of the suspend-resume cycle we try to
      allocate as many free highmem pages as there are saveable highmem pages.
      If there are not enough highmem image pages to store the contents of all of
      the saveable highmem pages, some of them will be stored in the "normal"
      memory.  Next, we allocate as many free "normal" pages as needed to store
      the (remaining) image data.  We use a memory bitmap to mark the allocated
      free pages (ie.  highmem as well as "normal" image pages).
      
      Now, we use another memory bitmap to mark all of the saveable pages
      (highmem as well as "normal") and the contents of the saveable pages are
      copied into the image pages.  Then, the second bitmap is used to save the
      pfns corresponding to the saveable pages and the first one is used to save
      their data.
      
      During the resume phase the pfns of the pages that were saveable during the
      suspend are loaded from the image and used to mark the "unsafe" page
      frames.  Next, we try to allocate as many free highmem page frames as are
      needed to load all of the image data that had been in highmem before the
      suspend, and we allocate enough free "normal" page frames that the total
      number of allocated free pages (highmem and "normal") equals the size of the
      image.  While doing this we have to make sure that there will be some extra
      free "normal" and "safe" page frames for two lists of PBEs constructed
      later.
      
      Now, the image data are loaded, if possible, into their "original" page
      frames.  The image data that cannot be written into their "original" page
      frames are loaded into "safe" page frames and their "original" kernel
      virtual addresses, as well as the addresses of the "safe" pages containing
      their copies, are stored in one of two lists of PBEs.
      
      One list of PBEs is for the copies of "normal" suspend pages (ie.  "normal"
      pages that were saveable during the suspend) and it is used in the same way
      as previously (ie.  by the architecture-dependent parts of swsusp).  The
      other list of PBEs is for the copies of highmem suspend pages.  The pages
      in this list are restored (in a reversible way) right before the
      arch-dependent code is called.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swsusp: use block device offsets to identify swap locations · 3aef83e0
      Rafael J. Wysocki authored
      Make swsusp use block device offsets instead of swap offsets to identify swap
      locations and make it use the same code paths for writing as well as for
      reading data.
      
      This allows us to use the same code for handling swap files and swap
      partitions and to simplify the code, eg.  by dropping rw_swap_page_sync().
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swsusp: use partition device and offset to identify swap areas · 915bae9e
      Rafael J. Wysocki authored
      The Linux kernel handles swap files almost in the same way as it handles swap
      partitions and there are only two differences between these two types of swap
      areas:
      
      (1) swap files need not be contiguous,
      
      (2) the header of a swap file is not in the first block of the partition
          that holds it.  From the swsusp's point of view (1) is not a problem,
          because it is already taken care of by the swap-handling code, but (2) has
          to be taken into consideration.
      
      In principle the location of a swap file's header may be determined with the
      help of the appropriate filesystem driver.  Unfortunately, however, it requires
      the filesystem holding the swap file to be mounted, and if this filesystem is
      journaled, it cannot be mounted during a resume from disk.  For this reason we
      need some other means by which swap areas can be identified.
      
      For example, to identify a swap area we can use the partition that holds the
      area and the offset from the beginning of this partition at which the swap
      header is located.
      
      The following patch allows swsusp to identify swap areas this way.  It changes
      swap_type_of() so that it takes an additional argument representing an offset
      of the swap header within the partition represented by its first argument
      (see the prototype sketch after this entry).
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
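      A sketch of the changed interface as described above (the exact return
      convention is an assumption):

          /* Return the swap type (index into the swap_info array) of the swap
           * area on 'device' whose header lives at sector 'offset' within the
           * partition; offset 0 keeps the old swap-partition behaviour. */
          int swap_type_of(dev_t device, sector_t offset);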
    • [PATCH] radix-tree: RCU lockless readside · 7cf9c2c7
      Nick Piggin authored
      Make radix tree lookups safe to be performed without locks.  Readers are
      protected against nodes being deleted by using RCU based freeing.  Readers
      are protected against new node insertion by using memory barriers to ensure
      the node itself will be properly written before it is visible in the radix
      tree.
      
      Each radix tree node keeps a record of its height (above leaf nodes).
      This height does not change after insertion -- when the radix tree is
      extended, higher nodes are only inserted at the top.  So a lookup can take
      the pointer to what is *now* the root node, and traverse down it even if
      the tree is concurrently extended and this node becomes a subtree of a new
      root.
      
      "Direct" pointers (tree height of 0, where root->rnode points directly to
      the data item) are handled by using the low bit of the pointer to signal
      whether rnode is a direct pointer or a pointer to a radix tree node.
      
      When a reader wants to traverse the next branch, they will take a copy of
      the pointer.  This pointer will be either NULL (and the branch is empty) or
      non-NULL (and will point to a valid node).
      
      [akpm@osdl.org: cleanups]
      [Lee.Schermerhorn@hp.com: bugfixes, comments, simplifications]
      [clameter@sgi.com: build fix]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
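      A minimal sketch of the low-bit tagging described above (helper names and
      the bit assignment are assumptions for illustration):

          #define RADIX_TREE_DIRECT_PTR 1UL   /* low bit marks a data pointer */

          static inline void *radix_tree_ptr_to_direct(void *ptr)
          {
                  return (void *)((unsigned long)ptr | RADIX_TREE_DIRECT_PTR);
          }

          static inline void *radix_tree_direct_to_ptr(void *ptr)
          {
                  return (void *)((unsigned long)ptr & ~RADIX_TREE_DIRECT_PTR);
          }

          static inline int radix_tree_is_direct_ptr(void *ptr)
          {
                  return (unsigned long)ptr & RADIX_TREE_DIRECT_PTR;
          }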
    • [PATCH] mm: make compound page destructor handling explicit · 33f2ef89
      Andy Whitcroft authored
      Currently we use the lru head link of the second page of a compound page
      to hold its destructor.  This was ok when it was purely an internal
      implementation detail.  However, hugetlbfs overrides this destructor,
      violating the layering.  Abstract this out as explicit calls, and also
      introduce a type for the callback function, allowing it to be type
      checked.  For each callback we pre-declare the function, causing a type
      error on definition rather than on use elsewhere (see the sketch after
      this entry).
      
      [akpm@osdl.org: cleanups]
      Signed-off-by: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
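      A sketch of the typed-callback idea (the typedef follows the description;
      the storage helper is illustrative):

          typedef void compound_page_dtor(struct page *);

          /* pre-declaration: a definition with the wrong signature now fails
           * at compile time instead of misbehaving at the call site */
          compound_page_dtor free_huge_page;

          static inline void set_compound_page_dtor(struct page *page,
                                                    compound_page_dtor *dtor)
          {
                  page[1].lru.next = (void *)dtor;  /* lru link of 2nd page */
          }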
    • [PATCH] slab: better fallback allocation behavior · 3c517a61
      Christoph Lameter authored
      Currently we simply attempt to allocate from all allowed nodes using
      GFP_THISNODE.  However, GFP_THISNODE does not do reclaim (it won't do any
      at all if the recent GFP_THISNODE patch is accepted).  If we truly run out
      of memory in the whole system then fallback_alloc may return NULL although
      memory may still be available if we performed more thorough reclaim.
      
      This patch changes fallback_alloc() so that we first only inspect all the
      per node queues for available slabs.  If we find any then we allocate from
      those.  This avoids slab fragmentation by first getting rid of all partial
      allocated slabs on every node before allocating new memory.
      
      If we cannot satisfy the allocation from any per node queue then we extend
      a slab.  We now call into the page allocator without specifying
      GFP_THISNODE.  The page allocator will then implement its own fallback (in
      the given cpuset context), perform necessary reclaim (again considering not
      a single node but the whole set of allowed nodes) and then return pages for
      a new slab (see the skeleton after this entry).
      
      We identify from which node the pages were allocated and then insert the
      pages into the corresponding per node structure.  In order to do so we need
      to modify cache_grow() to take a parameter that specifies the new slab.
      kmem_getpages() can no longer set the GFP_THISNODE flag since we need to be
      able to use kmem_getpage to allocate from an arbitrary node.  GFP_THISNODE
      needs to be specified when calling cache_grow().
      
      One key advantage is that the decision from which node to allocate new
      memory is removed from slab fallback processing.  The patch allows us to
      go back to using the page allocator's fallback/reclaim logic.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
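      A simplified skeleton of the two-pass logic described above (kernel-style
      sketch; grow_and_alloc() stands in for the cache_grow() path and is a
      hypothetical name):

          static void *fallback_alloc(struct kmem_cache *cachep, gfp_t flags)
          {
                  void *obj;
                  int nid;

                  /* pass 1: consume partial slabs already present on any
                   * allowed node before asking for fresh pages */
                  for_each_online_node(nid) {
                          obj = ____cache_alloc_node(cachep,
                                          flags | GFP_THISNODE, nid);
                          if (obj)
                                  return obj;
                  }

                  /* pass 2: drop GFP_THISNODE so the page allocator picks a
                   * node (and may reclaim), then grow a slab on whichever
                   * node the pages actually came from */
                  return grow_and_alloc(cachep, flags & ~GFP_THISNODE);
          }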
    • [PATCH] GFP_THISNODE must not trigger global reclaim · 952f3b51
      Christoph Lameter authored
      The intent of GFP_THISNODE is to make sure that an allocation occurs on a
      particular node.  If this is not possible then NULL needs to be returned so
      that the caller can choose what to do next on its own (the slab allocator
      depends on that).
      
      However, GFP_THISNODE currently triggers reclaim before returning a failure
      (GFP_THISNODE means GFP_NORETRY is set).  If we have over-allocated a node
      then we will currently do some reclaim before returning NULL.  The caller
      may want memory from other nodes before reclaim should be triggered.  (If
      the caller wants reclaim then it can directly use __GFP_THISNODE instead.)
      
      There is no flag to avoid reclaim in the page allocator and adding yet
      another GFP_xx flag would be difficult given that we are out of available
      flags.
      
      So just compare and see if all bits for GFP_THISNODE (__GFP_THISNODE,
      __GFP_NORETRY and __GFP_NOWARN) are set.  If so then we return NULL before
      waking up kswapd (see the sketch after this entry).
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
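      A sketch of the test described above (its placement inside __alloc_pages()
      is simplified):

          /* all three bits set means "this node or fail" */
          #define GFP_THISNODE (__GFP_THISNODE | __GFP_NORETRY | __GFP_NOWARN)

          if ((gfp_mask & GFP_THISNODE) == GFP_THISNODE)
                  goto nopage;    /* fail fast, before waking kswapd */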
    • [PATCH] slab: fix two issues in kmalloc_node / __cache_alloc_node · 5bcd234d
      Christoph Lameter authored
      This addresses two issues:
      
      1. kmalloc_node() may intermittently return NULL if we are allocating
         from the current node and are unable to obtain memory for the current
         node from the page allocator.  This is because we call ____cache_alloc()
         if nodeid == numa_node_id() and ____cache_alloc() is not able to fall
         back to other nodes.
      
         This was introduced in the 2.6.19 development cycle.  <= 2.6.18 in
         that case does not do a restricted allocation and blindly trusts the
         page allocator to have given us memory from the indicated node.  It
         inserts the page regardless of the node it came from into the queues for
         the current node.
      
      2. If kmalloc_node() is used on a node that has not been bootstrapped
         yet then we may try to pass an invalid node number to
         ____cache_alloc_node() triggering a BUG().
      
         Change the function to call fallback_alloc() instead.  Only call
         fallback_alloc() if we are allowed to fall back at all.  The need to
         handle a node not bootstrapped yet also first surfaced in the 2.6.19
         cycle.
      
      Update the comments since they were still describing the old kmalloc_node
      from 2.6.12.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] slab: remove SLAB_DMA · 441e143e
      Christoph Lameter authored
      SLAB_DMA is an alias of GFP_DMA.  This is the last such alias, so we
      remove the leftover comment too.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] slab: remove SLAB_KERNEL · e94b1766
      Christoph Lameter authored
      SLAB_KERNEL is an alias of GFP_KERNEL.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] slab: remove SLAB_LEVEL_MASK · a06d72c1
      Christoph Lameter authored
      SLAB_LEVEL_MASK is only used internally in the slab allocator and is
      an alias of GFP_LEVEL_MASK.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] slab: remove SLAB_NO_GROW · 6e0eaa4b
      Christoph Lameter authored
      SLAB_NO_GROW is only used internally in the slab allocator.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] kill install_file_pte's pte_val · 2d4d862f
      Hugh Dickins authored
      David Binderman and his Intel C compiler rightly observe that
      install_file_pte no longer has any use for its pte_val.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: d binderman <dcb314@hotmail.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: cleanup indentation on switch for CPU operations · ce421c79
      Andy Whitcroft authored
      These patches introduced new switch statements which are indented contrary
      to the consensus in mm/*.c.  Fix them up to match that consensus (see the
      sketch after this entry).
      
          [PATCH] node local per-cpu-pages
          [PATCH] ZVC: Scale thresholds depending on the size of the system
          commit e7c8d5c9
          commit df9ecaba
      Signed-off-by: Andy Whitcroft <apw@shadowen.org>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
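      The mm/*.c convention in question, for reference (illustrative fragment;
      the case bodies are made up):

          /* 'case' labels align with the 'switch' keyword itself */
          switch (action) {
          case CPU_UP_PREPARE:
                  setup_per_cpu_state(cpu);
                  break;
          case CPU_DEAD:
                  teardown_per_cpu_state(cpu);
                  break;
          }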
    • [PATCH] reject corrupt swapfiles earlier · 5d1854e1
      Eric Sandeen authored
      The fsfuzzer found this; with a corrupt small swapfile that claims to have
      many pages:
      
        [root]# file swap.741.img
        swap.741.img: Linux/i386 swap file (new style) 1 (4K pages) size 1040191487 pages
        [root]# ls -l swap.741.img
        -rw-r--r-- 1 root root 16777216 Nov 22 05:18 swap.741.img
      
      sys_swapon() will try to vmalloc all those pages, and -then- check to see if
      the file is actually that large:
      
                      if (!(p->swap_map = vmalloc(maxpages * sizeof(short)))) {
        <snip>
              if (swapfilesize && maxpages > swapfilesize) {
                      printk(KERN_WARNING
                             "Swap area shorter than signature indicates\n");
      
      It seems to me that it would make more sense to move this test up before
      the vmalloc, with the other checks, to avoid the OOM-killer in this
      situation (see the sketch after this entry)...
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
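      A sketch of the suggested reordering (simplified; the real sys_swapon()
      has more context and error handling around these lines):

          /* validate the size claimed by the swap signature first ... */
          if (swapfilesize && maxpages > swapfilesize) {
                  printk(KERN_WARNING
                         "Swap area shorter than signature indicates\n");
                  goto bad_swap;
          }

          /* ... and only then commit to the potentially huge allocation */
          p->swap_map = vmalloc(maxpages * sizeof(short));
          if (!p->swap_map)
                  goto bad_swap;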
    • [PATCH] numa node ids are int, page_to_nid and zone_to_nid should return int · 25ba77c1
      Andy Whitcroft authored
      NUMA node ids are passed as either int or unsigned int almost exclusively,
      yet page_to_nid and zone_to_nid both return unsigned long.  This is a
      throwback to when page_to_nid was a #define and was thus exposing the real
      type of the page flags field.
      
      In addition to fixing up the definitions of page_to_nid and zone_to_nid I
      audited the users of these functions identifying the following incorrect
      uses:
      
      1) mm/page_alloc.c show_node() -- printk dumping the node id,
      2) include/asm-ia64/pgalloc.h pgtable_quicklist_free() -- comparison
         against numa_node_id() which returns an int from cpu_to_node(), and
      3) mm/mempolicy.c check_pte_range -- used as an index in node_isset which
         uses test_bit which in generic code takes an int.
      Signed-off-by: Andy Whitcroft <apw@shadowen.org>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] drain_node_pages(): Drain pages in batch units · bc4ba393
      Christoph Lameter authored
      drain_node_pages() currently drains the complete pageset of all pages.  If
      there are a large number of pages in the queues then we may hold off
      interrupts for too long.
      
      Duplicate the method used in free_hot_cold_page: only drain pcp->batch
      pages at a time (see the sketch after this entry).
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
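      A sketch of the batching pattern described above (simplified; the
      free_pages_bulk() call and its arguments are assumptions):

          /* drain at most pcp->batch pages per interrupts-off window, as
           * free_hot_cold_page() does, so irqs are never held off for long */
          while (pcp->count) {
                  int to_drain = min(pcp->count, pcp->batch);

                  local_irq_save(flags);
                  free_pages_bulk(zone, to_drain, &pcp->list, 0);
                  pcp->count -= to_drain;
                  local_irq_restore(flags);
          }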
    • [PATCH] make mm/thrash.c:global_faults static · e3050055
      Adrian Bunk authored
      This patch makes the needlessly global "global_faults" static.
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] enable booting a NUMA system where some nodes have no memory · 7c309a64
      Christian Krafft authored
      When booting a NUMA system with nodes that have no memory (eg by limiting
      memory), bootmem_alloc_core tried to find pages in an uninitialized
      bootmem_map.  This caused a null pointer access.  This fix adds a check so
      that NULL is returned instead (see the sketch after this entry).  That
      will enable the caller (bootmem_alloc_nopanic) to alloc memory on other
      nodes without a panic.
      Signed-off-by: Christian Krafft <krafft@de.ibm.com>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Martin Bligh <mbligh@google.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
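      A sketch of the added guard (the field name follows the bootmem allocator
      of that era and should be treated as illustrative):

          /* a memoryless node never got a bootmem map, so bail out rather
           * than dereference NULL; the _nopanic caller tries other nodes */
          if (!bdata->node_bootmem_map)
                  return NULL;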
    • [PATCH] Allow NULL pointers in percpu_free · a1205868
      Alan Stern authored
      The patch (as824b) makes percpu_free() ignore NULL arguments, as one would
      expect for a deallocation routine.  (Note that free_percpu is #defined as
      percpu_free in include/linux/percpu.h.) A few callers are updated to remove
      now-unneeded tests for NULL.  A few other callers already seem to assume
      that passing a NULL pointer to percpu_free() is okay!
      
      The patch also removes an unnecessary NULL check in percpu_depopulate()
      (see the sketch after this entry).
      Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
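      The idiom being adopted, as a sketch (the function body is illustrative):

          void percpu_free(void *ptr)
          {
                  if (!ptr)
                          return;   /* like kfree(): NULL is a no-op */
                  /* ... actual per-cpu teardown ... */
          }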
    • [PATCH] leak tracking for kmalloc_node · 8b98c169
      Christoph Hellwig authored
      We have variants of kmalloc and kmem_cache_alloc that leave leak tracking to
      the caller.  This is used for subsystem-specific allocators like skb_alloc.
      
      To make skb_alloc node-aware we need similar routines for the node-aware slab
      allocator, which this patch adds.
      
      Note that the code is rather ugly, but it mirrors the non-node-aware code
      1:1.
      
      [akpm@osdl.org: add module export]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Always print out the header line in /proc/swaps · 881e4aab
      Suleiman Souhlal authored
      It would be possible for /proc/swaps to not always print out the header:

        swapon /dev/hdc2
        swapon /dev/hde2
        swapoff /dev/hdc2

      At this point /proc/swaps would not have a header (see the sketch after
      this entry).
      Signed-off-by: Suleiman Souhlal <suleiman@google.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
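      A sketch of the seq_file-style fix (illustrative; /proc/swaps is generated
      through the seq_file interface):

          static int swap_show(struct seq_file *swap, void *v)
          {
                  /* emit the header for the start token, unconditionally,
                   * rather than together with the first swap entry */
                  if (v == SEQ_START_TOKEN) {
                          seq_puts(swap, "Filename\t\t\tType\t\tSize\t"
                                         "Used\tPriority\n");
                          return 0;
                  }
                  /* ... per-entry output ... */
                  return 0;
          }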
    • [PATCH] OOM can panic due to processes stuck in __alloc_pages() · b43a57bb
      Kirill Korotaev authored
      OOM can panic due to the processes stuck in __alloc_pages() doing an
      infinite rebalance loop while no memory can be reclaimed.  The OOM killer
      tries to kill some processes, but unfortunately the rebalance label was
      moved by someone below the TIF_MEMDIE check, so the buddy allocator
      doesn't see that the process is OOM-killed and can simply fail the
      allocation :/ (See the sketch after this entry.)
      
      Observed in reality on a RHEL4 (2.6.9) + OpenVZ kernel when a user doing
      some memory allocation tricks triggered an OOM panic.
      Signed-off-by: Denis Lunev <den@sw.ru>
      Signed-off-by: Kirill Korotaev <dev@openvz.org>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
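      A sketch of the intended control flow (simplified; names follow the
      __alloc_pages() of that era and the surrounding code is elided):

          rebalance:
                  /* an OOM-killed task (TIF_MEMDIE) must hit this test on
                   * every pass of the retry loop, so it can dip into the
                   * reserves and exit instead of looping forever */
                  if (unlikely(test_thread_flag(TIF_MEMDIE)) && !in_interrupt()) {
                          page = get_page_from_freelist(gfp_mask, order,
                                          zonelist, ALLOC_NO_WATERMARKS);
                          if (page)
                                  goto got_pg;
                          goto nopage;
                  }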
    • [PATCH] mlock cleanup · a3eea484
      Rik Bobbaers authored
      mm is defined as vma->vm_mm, so use that.
      Acked-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: add noaliencache boot option to disable numa alien caches · 3395ee05
      Paul Menage authored
      When using numa=fake on non-NUMA hardware there is no benefit to having the
      alien caches, and they consume much memory.
      
      Add a kernel boot option to disable them (see the sketch after this
      entry).
      
      Christoph sayeth "This is good to have even on large NUMA.  The problem is
      that the alien caches grow by the square of the size of the system in terms of
      nodes."
      
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
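      A sketch of the usual __setup() pattern for such an option (variable and
      handler names are assumptions):

          static int use_alien_caches __read_mostly = 1;

          /* "noaliencache" on the kernel command line turns the caches off */
          static int __init noaliencache_setup(char *s)
          {
                  use_alien_caches = 0;
                  return 1;
          }
          __setup("noaliencache", noaliencache_setup);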
    • [PATCH] mm: slab: eliminate lock_cpu_hotplug from slab · 8f5be20b
      Ravikiran G Thirumalai authored
      Here's an attempt towards doing away with lock_cpu_hotplug in the slab
      subsystem.  This approach also fixes a bug which shows up when cpus are
      being offlined/onlined and slab caches are being tuned simultaneously.
      
      http://marc.theaimsgroup.com/?l=linux-kernel&m=116098888100481&w=2
      
      The patch has been stress tested overnight on a 2 socket 4 core AMD box
      with repeated cpu online and offline, while dbench and kernbench processes
      were running and slab caches were being tuned at the same time.  There
      were no lockdep warnings either.  (This test was on 2.6.18, as 2.6.19-rc
      crashes at __drain_pages:
      http://marc.theaimsgroup.com/?l=linux-kernel&m=116172164217678&w=2 )
      
      The approach here is to hold cache_chain_mutex from CPU_UP_PREPARE until
      CPU_ONLINE (similar in approach to workqueue_mutex).  Slab code sensitive
      to cpu_online_map (kmem_cache_create, kmem_cache_destroy, slabinfo_write,
      __cache_shrink) is already serialized with cache_chain_mutex.  (This patch
      lengthens the cache_chain_mutex hold time at kmem_cache_destroy to cover
      this.)  This patch also takes cache_chain_mutex at kmem_cache_shrink to
      protect the sanity of cpu_online_map at __cache_shrink, as viewed by slab
      (kmem_cache_shrink->__cache_shrink->drain_cpu_caches).  But, really,
      kmem_cache_shrink is used at just one place in the acpi subsystem!  Do we
      really need to keep kmem_cache_shrink at all?
      
      Another note.  Looks like a cpu hotplug event can send CPU_UP_CANCELED to
      a registered subsystem even if the subsystem did not receive CPU_UP_PREPARE.
      This could be due to a subsystem registered for notification earlier than
      the current subsystem crapping out with NOTIFY_BAD.  Badness can occur
      within the CPU_UP_CANCELED code path at slab if this happens (the same
      would apply to workqueue.c as well).  To overcome this, we might have to
      use either
      a) a per subsystem flag and avoid handling of CPU_UP_CANCELED, or
      b) Use a special notifier events like LOCK_ACQUIRE/RELEASE as Gautham was
         using in his experiments, or
      c) Do not send CPU_UP_CANCELED to a subsystem which did not receive
         CPU_UP_PREPARE.
      
      I would prefer c).
      Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
      Signed-off-by: Shai Fultheim <shai@scalex86.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] slab debug and ARCH_SLAB_MINALIGN don't get along · a44b56d3
      Kevin Hilman authored
      When CONFIG_SLAB_DEBUG is used in combination with ARCH_SLAB_MINALIGN, some
      debug flags should be disabled which depend on BYTES_PER_WORD alignment.
      
      The disabling of these debug flags is not properly handled when
      BYTES_PER_WORD < ARCH_SLAB_MINALIGN < cache_line_size().
      
      This patch fixes that and also adds an alignment check to
      cache_alloc_debugcheck_after() when ARCH_SLAB_MINALIGN is used.
      Signed-off-by: Kevin Hilman <khilman@mvista.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] htlb forget rss with pt sharing · cace673d
      Chen, Kenneth W authored
      Imprecise RSS accounting is an irritating ill effect of pt sharing.  After
      consulting with several VM experts, I tried various methods to solve the
      problem: (1) iterate through all mm_structs that share the PT and increment
      the count; (2) keep an RSS count in the page table structure and then sum
      them up at reporting time.  None of the above methods yielded a
      satisfactory implementation.
      
      Since process RSS accounting is pure information only, I propose we don't
      count hugetlb pages at all.  rlimit has such a field, though there is
      absolutely no enforcement on limiting that resource.  One other method
      would be to account all RSS at hugetlb mmap time regardless of whether the
      pages are faulted or not.  I opt for the simplicity of no accounting at all.
      
      Hugetlb pages are special: they are reserved up front in a global
      reservation pool and are not reclaimable.  From a physical memory resource
      point of view, they are already consumed regardless of whether anyone is
      using them.
      
      If the concern is that RSS can be used to control resource allocation, we
      can already specify a hugetlbfs size limit and the sysadmin can enforce
      that at mount time.  Combined with the two points mentioned above, I fail
      to see anything that would be adversely affected by this patch.
      Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
      Acked-by: Hugh Dickins <hugh@veritas.com>
      Cc: Dave McCracken <dmccr@us.ibm.com>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] shared page table for hugetlb page · 39dde65c
      Chen, Kenneth W authored
      Following up on the shared page table work done by Dave McCracken, this
      set of patches targets shared page tables for hugetlb memory only.
      
      The shared page table is particularly useful in the situation of a large
      number of independent processes sharing large shared memory segments.  In
      the normal page case, the amount of memory saved from the processes' page
      tables is quite significant.  For hugetlb, the saving on page table memory
      is not the primary objective (as hugetlb itself already cuts down page
      table overhead significantly); instead, the purpose of using shared page
      tables on hugetlb is to allow faster TLB refill and smaller cache
      pollution upon TLB miss.
      
      With PT sharing, pte entries are shared among hundreds of processes, so
      the cache consumption used by all the page tables is smaller and, in
      return, the application gets a much higher cache hit ratio.  One other
      effect is that the cache hit ratio of a hardware page walker hitting on a
      pte in cache will be higher, and this helps to reduce tlb miss latency.
      These two effects contribute to higher application performance.
      Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
      Acked-by: Hugh Dickins <hugh@veritas.com>
      Cc: Dave McCracken <dmccr@us.ibm.com>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] balance_pgdat() cleanup · e1dbeda6
      Andrew Morton authored
      Despaghettify balance_pgdat() a bit.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: add arch_alloc_page · cc102509
      Nick Piggin authored
      Add an arch_alloc_page to match arch_free_page.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] new scheme to preempt swap token · 7602bdf2
      Ashwin Chaugule authored
      The new swap token patches replace the current token traversal algo.  The
      old algo had a crude timeout parameter that was used to hand the token
      over from one task to another.  This algo transfers the token to the tasks
      that are in need of it.  The urgency for the token is based on the number
      of times a task is required to swap in pages.  Accordingly, the priority
      of a task is incremented if it has been badly affected due to swap-outs.
      To ensure that the token doesn't bounce around rapidly, the token holders
      are given a priority boost.  The priority of tasks is also decremented if
      their rate of swap-ins keeps reducing.  This way, the condition to check
      whether to preempt the swap token is a matter of comparing two tasks'
      priority fields (see the sketch after this entry).
      
      [akpm@osdl.org: cleanups]
      Signed-off-by: Ashwin Chaugule <ashwin.chaugule@celunite.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
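      A generic model of the preemption test described above (field and variable
      names are assumptions; the real patch tracks the priority per mm_struct):

          /* take the token when the faulting task has shown more swap-in
           * urgency than the current holder; holders get a priority boost
           * so the token does not bounce on every fault */
          void grab_swap_token(struct mm_struct *mm)
          {
                  mm->token_priority++;
                  if (mm->token_priority > swap_token_mm->token_priority)
                          swap_token_mm = mm;
          }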