1. 27 October 2010 (5 commits)
    • writeback: remove nonblocking/encountered_congestion references · 1b430bee
      Authored by Wu Fengguang
      This removes more dead code that was somehow missed by commit 0d99519e
      (writeback: remove unused nonblocking and congestion checks).  There is
      no behavior change except for the removal of two entries from one of the
      ext4 tracing interfaces.
      
      The nonblocking checks in ->writepages are no longer used because the
      flusher now prefers to block on get_request_wait() rather than skip
      inodes on IO congestion.  The latter would lead to more seeky IO.
      
      The nonblocking checks in ->writepage are no longer used because they
      are redundant with the WB_SYNC_NONE check.
      
      We no longer set ->nonblocking in VM page-out and page migration, because
      a) it is effectively redundant with WB_SYNC_NONE in the current code
      b) its old semantic of "don't get stuck on request queues" was a
         misbehavior: it would skip some dirty inodes on congestion and page
         out others, which is unfair in terms of LRU age.
      
      Inspired by Christoph Hellwig. Thanks!
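
      For illustration, a sketch of the pattern being removed and the check
      that makes it redundant; this is the general shape, not the exact diff,
      and the surrounding ->writepage context is assumed:

          /*
           * Sketch only: the now-dead nonblocking/congestion back-off that
           * filesystems carried in their ->writepage implementations.
           */
          if (wbc->nonblocking && bdi_write_congested(bdi)) {
                  wbc->encountered_congestion = 1;  /* removed field */
                  redirty_page_for_writepage(wbc, page);
                  unlock_page(page);
                  return 0;
          }

          /*
           * The surviving test: WB_SYNC_NONE already marks best-effort
           * writeback that is allowed to skip work, making the above
           * redundant.
           */
          if (wbc->sync_mode == WB_SYNC_NONE)
                  goto skip_if_busy;                /* hypothetical label */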
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Sage Weil <sage@newdream.net>
      Cc: Steve French <sfrench@samba.org>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • oom: kill all threads sharing oom killed task's mm · 1e99bad0
      Authored by David Rientjes
      It's necessary to kill all threads that share an oom killed task's mm
      if the kill is to lead to any future memory freeing.
      
      This patch reintroduces the code removed in 8c5cd6f3 (oom: oom_kill
      doesn't kill vfork parent (or child)), since that change is now
      obsolete.
      
      It's now guaranteed that any task passed to oom_kill_task() does not share
      an mm with any thread that is unkillable.  Thus, we're safe to issue a
      SIGKILL to any thread sharing the same mm.
      
      This is especially necessary to solve an mm->mmap_sem livelock issue
      in which an oom killed thread must acquire the lock in the exit path
      while another thread is holding it in the page allocator while trying
      to allocate memory itself (and will preempt the oom killer since a task
      was already killed).  Since tasks with pending fatal signals are now
      granted access to memory reserves, the thread holding the lock may
      quickly allocate and release the lock so that the oom killed task may
      exit.
      
      This is mainly for threads that are cloned with CLONE_VM but not
      CLONE_THREAD, so they are in a different thread group.  Non-NPTL threads
      exist in the wild and this change is necessary to prevent the livelock
      in such cases.  We care more about preventing the livelock than about
      the additional tasklist scan in the oom killer when a task has been
      killed.  Systems that are sufficiently large to not want the tasklist
      scan in the oom killer in the first place already have the option of
      enabling /proc/sys/vm/oom_kill_allocating_task, which was designed
      specifically for that purpose.
      
      This code had existed in the oom killer for over eight years dating back
      to the 2.4 kernel.
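
      The reintroduced kill loop has roughly this shape (a sketch, not the
      exact hunk; p is the chosen victim, and the helpers are the usual
      tasklist primitives of that era):

          struct task_struct *q;

          /*
           * Kill every process sharing the victim's mm, so CLONE_VM-without-
           * CLONE_THREAD tasks cannot keep the address space pinned after
           * the victim exits.
           */
          for_each_process(q) {
                  if (q->mm == p->mm && !same_thread_group(q, p)) {
                          printk(KERN_ERR "Kill process %d (%s) sharing same memory\n",
                                 task_pid_nr(q), q->comm);
                          force_sig(SIGKILL, q);
                  }
          }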
      
      [akpm@linux-foundation.org: add nice comment]
      Signed-off-by: David Rientjes <rientjes@google.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • oom: avoid killing a task if a thread sharing its mm cannot be killed · e18641e1
      Authored by David Rientjes
      The oom killer's goal is to kill a memory-hogging task so that it may
      exit, free its memory, and allow the current context to allocate the
      memory that triggered it in the first place.  Thus, killing a task is
      pointless if other threads sharing its mm cannot be killed because of
      their /proc/pid/oom_adj or /proc/pid/oom_score_adj values.
      
      This patch checks whether any other thread sharing p->mm has an
      oom_score_adj of OOM_SCORE_ADJ_MIN.  If so, the thread cannot be killed
      and oom_badness(p) returns 0, meaning it's unkillable.
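
      A sketch of that check (assumed form, not the exact diff):

          /*
           * If any other process sharing p->mm is immune to the oom killer,
           * report that p is unkillable: killing p could not free its
           * memory anyway.
           */
          static bool mm_shared_with_unkillable(struct task_struct *p)
          {
                  struct task_struct *q;

                  for_each_process(q) {
                          if (q->mm == p->mm && !same_thread_group(q, p) &&
                              q->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
                                  return true;
                  }
                  return false;
          }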
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, page-allocator: do not check the state of a non-existent buddy during free · b7f50cfa
      Authored by Mel Gorman
      There is a bug in commit 6dda9d55 ("page allocator: reduce fragmentation
      in buddy allocator by adding buddies that are merging to the tail of the
      free lists") that means a buddy at order MAX_ORDER is checked for
      merging.  A page of this order never exists, so at times an effectively
      random piece of memory is being checked.
      
      Alan Curry has reported that this is causing memory corruption in
      userspace data on a PPC32 platform (http://lkml.org/lkml/2010/10/9/32).
      It is not clear why this is happening.  It could be a cache coherency
      problem where pages mapped in both user and kernel space are getting
      different cache lines due to the bad read from kernel space
      (http://lkml.org/lkml/2010/10/13/179).  It could also be that there are
      some special registers being io-remapped at the end of the memmap array
      and that a read has special meaning on them.  Compiler bugs have been
      ruled out because the assembly before and after the patch looks relatively
      harmless.
      
      This patch fixes the problem by ensuring we are not reading a possibly
      invalid location of memory.  It's not clear why the read causes corruption
      but one way or the other it is a buggy read.
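
      The shape of the fix is a tighter bound on the order before peeking at
      the higher-order buddy.  In the sketch below, find_higher_page() and
      find_buddy() are hypothetical stand-ins for the index arithmetic; only
      the bound itself is taken from the description above:

          /*
           * With the old bound (order < MAX_ORDER - 1), a buddy at order
           * MAX_ORDER could be examined; a page of that order never exists,
           * so tighten the bound by one before reading its struct page.
           */
          if (order < MAX_ORDER - 2) {
                  struct page *higher_page = find_higher_page(page, order);
                  struct page *higher_buddy = find_buddy(higher_page, order + 1);

                  if (page_is_buddy(higher_page, higher_buddy, order + 1)) {
                          list_add_tail(&page->lru,
                                &zone->free_area[order].free_list[migratetype]);
                          goto out;
                  }
          }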
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Corrado Zoccolo <czoccolo@gmail.com>
      Reported-by: Alan Curry <pacman@kosh.dhis.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix return value of scan_lru_pages in memory unplug · f8f72ad5
      Authored by KAMEZAWA Hiroyuki
      scan_lru_pages returns a pfn, so its type should be "unsigned long",
      not "int".

      Note: I guess this has worked until now because the memory hotplug
            testers' machines did not have very big memory:
            physical address < 32bit << PAGE_SHIFT.
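
      A minimal userspace demonstration of the truncation (assuming 64-bit
      unsigned long and 4K pages; not kernel code):

          #include <stdio.h>

          int main(void)
          {
                  unsigned long pfn = 1UL << 32;  /* page at 16TB physical */
                  int as_int = (int)pfn;          /* what the old return type did */

                  /* prints: pfn = 4294967296, through int = 0 */
                  printf("pfn = %lu, through int = %d\n", pfn, as_int);
                  return 0;
          }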
      Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 25 October 2010 (1 commit)
  3. 24 October 2010 (1 commit)
  4. 19 October 2010 (1 commit)
  5. 12 October 2010 (2 commits)
  6. 11 October 2010 (1 commit)
    • Fix migration.c compilation on s390 · 3ef8fd7f
      Authored by Andi Kleen
      31-bit s390 doesn't have huge pages, so the build failed with:
      
      > mm/migrate.c: In function 'remove_migration_pte':
      > mm/migrate.c:143:3: error: implicit declaration of function 'pte_mkhuge'
      > mm/migrate.c:143:7: error: incompatible types when assigning to type 'pte_t' from type 'int'
      
      Put that code inside an #ifdef.
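
      The fix has this shape (a sketch of the guarded hunk in
      remove_migration_pte(), based on the error lines above):

          #ifdef CONFIG_HUGETLB_PAGE
                  if (PageHuge(new))
                          pte = pte_mkhuge(pte);
          #endif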
      
      Reported-by: Heiko Carstens
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
  7. 08 October 2010 (19 commits)
  8. 07 October 2010 (3 commits)
  9. 06 October 2010 (4 commits)
    • slub: Move functions to reduce #ifdefs · a5a84755
      Authored by Christoph Lameter
      There are a lot of #ifdef/#endif blocks that could be avoided if the
      functions were located in different places.  Move them around and reduce
      the number of #ifdefs.
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
    • slub: Enable sysfs support for !CONFIG_SLUB_DEBUG · ab4d5ed5
      Authored by Christoph Lameter
      Currently, disabling CONFIG_SLUB_DEBUG also disables sysfs support,
      meaning that the slabs cannot be tuned without DEBUG.

      Make sysfs support independent of CONFIG_SLUB_DEBUG.
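
      A sketch of the guard change in mm/slub.c (assumed form, not the exact
      hunk):

          /* was gated on CONFIG_SLUB_DEBUG as well; now sysfs alone decides */
          #ifdef CONFIG_SYSFS
          /* ... sysfs_slab_add(), slab attributes, /sys/kernel/slab ... */
          #endif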
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
    • SLUB: Optimize slab_free() debug check · 15b7c514
      Authored by Pekka Enberg
      This patch optimizes the slab_free() debug check to use "c->node !=
      NUMA_NO_NODE" instead of "c->node >= 0", because the former generates
      smaller code on x86-64:
      
        Before:
      
          4736:       48 39 70 08             cmp    %rsi,0x8(%rax)
          473a:       75 26                   jne    4762 <kfree+0xa2>
          473c:       44 8b 48 10             mov    0x10(%rax),%r9d
          4740:       45 85 c9                test   %r9d,%r9d
          4743:       78 1d                   js     4762 <kfree+0xa2>
      
        After:
      
          4736:       48 39 70 08             cmp    %rsi,0x8(%rax)
          473a:       75 23                   jne    475f <kfree+0x9f>
          473c:       83 78 10 ff             cmpl   $0xffffffffffffffff,0x10(%rax)
          4740:       74 1d                   je     475f <kfree+0x9f>
      
      This patch also cleans up __slab_alloc() to use NUMA_NO_NODE instead of "-1"
      for enabling debugging for a per-CPU cache.
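
      At the C level the change looks like this (a sketch; check_node() is a
      hypothetical stand-in for the debug check):

          /*
           * NUMA_NO_NODE is -1, so the equality test compiles to a single
           * cmpl against an immediate instead of a load plus a sign test,
           * as the disassembly above shows.
           */
          if (c->node != NUMA_NO_NODE)    /* was: if (c->node >= 0) */
                  check_node(c);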
      Acked-by: Christoph Lameter <cl@linux.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
    • memblock: Fix wraparound in find_region() · f1af98c7
      Authored by Yinghai Lu
      When trying to find a huge range for crashkernel, we get:
      
      [    0.000000] ------------[ cut here ]------------
      [    0.000000] WARNING: at arch/x86/mm/memblock.c:248 memblock_x86_reserve_range+0x40/0x7a()
      [    0.000000] Hardware name: Sun Fire x4800
      [    0.000000] memblock_x86_reserve_range: wrong range [0xffffffff37000000, 0x137000000)
      [    0.000000] Modules linked in:
      [    0.000000] Pid: 0, comm: swapper Not tainted 2.6.36-rc5-tip-yh-01876-g1cac214-dirty #59
      [    0.000000] Call Trace:
      [    0.000000]  [<ffffffff82816f7e>] ? memblock_x86_reserve_range+0x40/0x7a
      [    0.000000]  [<ffffffff81078c2d>] warn_slowpath_common+0x85/0x9e
      [    0.000000]  [<ffffffff81078d38>] warn_slowpath_fmt+0x6e/0x70
      [    0.000000]  [<ffffffff8281e77c>] ? memblock_find_region+0x40/0x78
      [    0.000000]  [<ffffffff8281eb1f>] ? memblock_find_base+0x9a/0xb9
      [    0.000000]  [<ffffffff82816f7e>] memblock_x86_reserve_range+0x40/0x7a
      [    0.000000]  [<ffffffff8280452c>] setup_arch+0x99d/0xb2a
      [    0.000000]  [<ffffffff810a3e02>] ? trace_hardirqs_off+0xd/0xf
      [    0.000000]  [<ffffffff81cec7d8>] ? _raw_spin_unlock_irqrestore+0x3d/0x4c
      [    0.000000]  [<ffffffff827ffcec>] start_kernel+0xde/0x3f1
      [    0.000000]  [<ffffffff827ff2d4>] x86_64_start_reservations+0xa0/0xa4
      [    0.000000]  [<ffffffff827ff3de>] x86_64_start_kernel+0x106/0x10d
      [    0.000000] ---[ end trace a7919e7f17c0a725 ]---
      [    0.000000] Reserving 8192MB of memory at 17592186041200MB for crashkernel (System RAM: 526336MB)
      
      This is caused by a wraparound in the test due to size > end;
      explicitly check for this condition and fail.
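
      A sketch of the guard (assumed form; end, size, align and the helpers
      are taken to be the find_region() locals of that era):

          /*
           * end - size is computed in unsigned arithmetic, so when
           * size > end it wraps to a huge address, which is how the bogus
           * base 0xffffffff37000000 above was produced.  Fail up front.
           */
          if (end < size)
                  return MEMBLOCK_ERROR;
          base = memblock_align_down(end - size, align);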
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4CAA4DD3.1080401@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  10. 05 October 2010 (2 commits)
    • ksm: fix bad user data when swapping · 4e31635c
      Authored by Hugh Dickins
      Building under memory pressure, with KSM on 2.6.36-rc5, collapsed with
      an internal compiler error: typically indicating an error in swapping.
      
      Perhaps there's a timing issue which makes it now more likely, perhaps
      it's just a long time since I tried for so long: this bug goes back to
      KSM swapping in 2.6.33.
      
      Notice how reuse_swap_page() allows an exclusive page to be reused, but
      only does SetPageDirty if it can delete it from swap cache right then -
      if it's currently under Writeback, it has to be left in cache and we
      don't SetPageDirty, but the page can be reused.  Fine, the dirty bit
      will get set in the pte; but notice how zap_pte_range() does not bother
      to transfer pte_dirty to page_dirty when unmapping a PageAnon.
      
      If KSM chooses to share such a page, it will look like a clean copy of
      swapcache and not be written out to swap when its memory is needed;
      stale data is then read back from swap when the page is needed again.
      
      We could fix this in reuse_swap_page() (or even refuse to reuse a
      page under writeback), but it's more honest to fix my oversight in
      KSM's write_protect_page().  Several days of testing on three machines
      confirms that this fixes the issue they showed.
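
      A sketch of the fix in write_protect_page() (assumed surrounding code,
      not the exact hunk):

          entry = ptep_clear_flush(vma, addr, ptep);
          /*
           * Transfer the pte's dirty bit to the page before write-
           * protecting it, so a later swap-out cannot mistake the page
           * for a clean copy of swapcache.
           */
          if (pte_dirty(entry))
                  set_page_dirty(page);
          entry = pte_mkclean(pte_wrprotect(entry));
          set_pte_at(mm, addr, ptep, entry);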
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ksm: fix page_address_in_vma anon_vma oops · 4829b906
      Authored by Hugh Dickins
      2.6.36-rc1 commit 21d0d443 "rmap:
      resurrect page_address_in_vma anon_vma check" was right to resurrect
      that check; but now that it's comparing anon_vma->roots instead of
      just anon_vmas, there's a danger of oopsing on a NULL anon_vma.
      
      In most cases no NULL anon_vma ever gets here; but it turns out that
      occasionally KSM, when enabled on a forked or forking process, will
      itself call page_address_in_vma() on a "half-KSM" page left over from
      an earlier failed attempt to merge - whose page_anon_vma() is NULL.
      
      It's my bug that those should be getting here at all: I thought they
      were already dealt with, this oops proves me wrong, I'll fix it in
      the next release - such pages are effectively pinned until their
      process exits, since rmap cannot find their ptes (though swapoff can).
      
      For now just work around it by making page_address_in_vma() safe (and
      add a comment on why that check is wanted anyway).  A similar check
      in __page_check_anon_rmap() is safe because do_page_add_anon_rmap()
      already excluded KSM pages.
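
      The workaround amounts to a NULL check before comparing roots (a
      sketch, assumed form):

          struct anon_vma *page__anon_vma = page_anon_vma(page);

          /*
           * page_anon_vma() is NULL for a "half-KSM" page left over from a
           * failed merge; bail out instead of dereferencing it.
           */
          if (!vma->anon_vma || !page__anon_vma ||
              vma->anon_vma->root != page__anon_vma->root)
                  return -EFAULT;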
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 02 October 2010 (1 commit)