1. 18 May 2009, 1 commit
    • mm, x86: remove MEMORY_HOTPLUG_RESERVE related code · 888a589f
      Authored by Yinghai Lu
      after:
      
       | commit b263295d
       | Author: Christoph Lameter <clameter@sgi.com>
       | Date:   Wed Jan 30 13:30:47 2008 +0100
       |
       |    x86: 64-bit, make sparsemem vmemmap the only memory model
      
      we don't have MEMORY_HOTPLUG_RESERVE anymore.
      
      Historically, x86-64 had an architecture-specific method for memory hotplug
      whereby it scanned the SRAT for physical memory ranges that could be
      potentially used for memory hot-add later. By reserving those ranges
      without physical memory, the memmap would be allocated and left dormant
      until needed. This depended on the DISCONTIG memory model, which has
      been removed, so the code implementing HOTPLUG_RESERVE is now dead.
      
      This patch removes the dead code used by MEMORY_HOTPLUG_RESERVE.
      
      (Changelog authored by Mel.)
      
      v2: updated the changelog and removed hotadd= from the documentation
      
      [ Impact: remove dead code ]
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
      Reviewed-by: Mel Gorman <mel@csn.ul.ie>
      Workflow-found-OK-by: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <4A0C4910.7090508@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 15 May 2009, 1 commit
    • Revert "mm: add /proc controls for pdflush threads" · cd17cbfd
      Authored by Jens Axboe
      This reverts commit fafd688e.
      
      Work is progressing to switch away from pdflush as the process backing
      for flushing out dirty data. So it seems pointless to add more knobs
      to control pdflush threads. The original author of the patch did not
      have any specific use cases for adding the knobs, so we can easily
      revert this before 2.6.30 to avoid having to maintain this API
      forever.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  3. 13 May 2009, 1 commit
  4. 08 May 2009, 1 commit
  5. 07 May 2009, 5 commits
  6. 06 May 2009, 1 commit
    • Ignore madvise(MADV_WILLNEED) for hugetlbfs-backed regions · a425a638
      Authored by Mel Gorman
      madvise(MADV_WILLNEED) forces page cache readahead on a range of memory
      backed by a file.  The assumption is made that the page required is
      order-0 and "normal" page cache.
      
      On hugetlbfs, this assumption is not true and order-0 pages are
      allocated and inserted into the hugetlbfs page cache.  This leaks
      hugetlbfs page reservations and can cause BUGs to trigger related to
      corrupted page tables.
      
      This patch causes MADV_WILLNEED to be ignored for hugetlbfs-backed
      regions.
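
      For illustration, the guard this implies is tiny (a sketch, assuming
      it sits near the top of madvise_willneed() in mm/madvise.c):

        /*
         * Readahead assumes order-0 page cache pages, which is not true
         * for hugetlbfs, so silently ignore the advice rather than
         * allocating order-0 pages into the hugetlbfs page cache.
         */
        if (vma->vm_flags & VM_HUGETLB)
                return 0;
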
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 03 May 2009, 6 commits
    • vmscan: avoid multiplication overflow in shrink_zone() · 8713e012
      Authored by Andrew Morton
      Local variable `scan' can overflow on zones which are larger than
      
      	(2G * 4k) / 100 = 80GB.
      
      Making it 64-bit on 64-bit platforms fixes that up.
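
      The pattern of the fix, roughly (variable names are illustrative,
      not necessarily the patch's):

        #include <linux/math64.h>

        /*
         * Before: unsigned long scan = nr_pages * percent / 100;
         * The product wraps once it exceeds a 32-bit range (around
         * 80GB worth of 4k pages, per the math above), so widen to
         * 64 bits before multiplying.
         */
        u64 scan = (u64)nr_pages * percent;
        scan = div_u64(scan, 100);
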
      
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix Committed_AS underflow on large NR_CPUS environment · 00a62ce9
      Authored by KOSAKI Motohiro
      The Committed_AS field can underflow in certain situations:
      
      >         # while true; do cat /proc/meminfo  | grep _AS; sleep 1; done | uniq -c
      >               1 Committed_AS: 18446744073709323392 kB
      >              11 Committed_AS: 18446744073709455488 kB
      >               6 Committed_AS:    35136 kB
      >               5 Committed_AS: 18446744073709454400 kB
      >               7 Committed_AS:    35904 kB
      >               3 Committed_AS: 18446744073709453248 kB
      >               2 Committed_AS:    34752 kB
      >               9 Committed_AS: 18446744073709453248 kB
      >               8 Committed_AS:    34752 kB
      >               3 Committed_AS: 18446744073709320960 kB
      >               7 Committed_AS: 18446744073709454080 kB
      >               3 Committed_AS: 18446744073709320960 kB
      >               5 Committed_AS: 18446744073709454080 kB
      >               6 Committed_AS: 18446744073709320960 kB
      
      This happens because NR_CPUS can be greater than 1000 and
      meminfo_proc_show() does not check for underflow.
      
      But a calculation proportional to NR_CPUS is not a good one: the
      likelihood of lock contention is proportional to the number of
      online CPUs, not the theoretical maximum (NR_CPUS).
      
      The current kernel has generic percpu_counter infrastructure, and
      using it is the right way: it simplifies the code, and
      percpu_counter_read_positive() cannot return an underflowed value.
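
      A sketch of the percpu_counter pattern this moves to (the counter
      name follows the description above; the exact shape is illustrative):

        #include <linux/percpu_counter.h>

        static struct percpu_counter vm_committed_as;

        /* Hot path: per-CPU batching, no global lock. */
        static void sketch_acct_memory(long pages)
        {
                percpu_counter_add(&vm_committed_as, pages);
        }

        /*
         * meminfo side: a transiently negative sum is clamped to 0,
         * so the wrapped 18446744073709... values cannot appear.
         */
        static unsigned long sketch_committed_as_kb(void)
        {
                return percpu_counter_read_positive(&vm_committed_as)
                        << (PAGE_SHIFT - 10);
        }
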
      Reported-by: Dave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Eric B Munson <ebmunson@us.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: <stable@kernel.org>		[All kernel versions]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: fix mem_cgroup_shrink_usage() · ae3abae6
      Authored by Daisuke Nishimura
      Current mem_cgroup_shrink_usage() has two problems.
      
      1. It doesn't call mem_cgroup_out_of_memory and doesn't update
         last_oom_jiffies, so pagefault_out_of_memory invokes global OOM.
      
      2. Considering hierarchy, shrinking has to be done from the
         mem_over_limit, not from the memcg which the page would be charged to.
      
      mem_cgroup_try_charge_swapin() does all of these things properly, so we
      use it and call cancel_charge_swapin() when it succeeds.
      
      The name "shrink_usage" is not appropriate for this behavior, so we
      rename it as well.
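
      The resulting call pattern, roughly (a sketch based on the
      description above, not the verbatim patch):

        struct mem_cgroup *memcg;

        /*
         * Charges against mem_over_limit in the hierarchy and keeps
         * last_oom_jiffies up to date, unlike the old shrink path.
         */
        if (mem_cgroup_try_charge_swapin(mm, page, GFP_KERNEL, &memcg))
                return;         /* charge failed; OOM handled inside */

        /* We only wanted the reclaim side effects, not the charge. */
        mem_cgroup_cancel_charge_swapin(memcg);
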
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.cn>
      Cc: Paul Menage <menage@google.com>
      Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: close page_mkwrite races · b827e496
      Authored by Nick Piggin
      Change page_mkwrite to allow implementations to return with the page
      locked, and also change its callers (in the page fault paths) to hold
      the lock until the page is marked dirty.  This allows the filesystem to have
      full control of page dirtying events coming from the VM.
      
      Rather than simply holding the page locked over the page_mkwrite call, we
      call page_mkwrite with the page unlocked and allow callers to return with
      it locked, so filesystems can avoid lock order reversal (LOR) conditions
      with the page lock.
      
      The problem with the current scheme is this: a filesystem that wants to
      associate some metadata with a page as long as the page is dirty, will
      perform this manipulation in its ->page_mkwrite.  It currently then must
      return with the page unlocked and may not hold any other locks (according
      to existing page_mkwrite convention).
      
      In this window, the VM could write out the page, clearing page-dirty.  The
      filesystem has no good way to detect that a dirty pte is about to be
      attached, so it will happily write out the page, at which point, the
      filesystem may manipulate the metadata to reflect that the page is no
      longer dirty.
      
      It is not always possible to perform the required metadata manipulation in
      ->set_page_dirty, because that function cannot block or fail.  The
      filesystem may need to allocate some data structure, for example.
      
      And the VM cannot mark the pte dirty before page_mkwrite, because
      page_mkwrite is allowed to fail, so we must not allow any window where the
      page could be written to if page_mkwrite does fail.
      
      This solution of holding the page locked over the 3 critical operations
      (page_mkwrite, setting the pte dirty, and finally setting the page dirty)
      closes out races nicely, preventing page cleaning for writeout from being
      initiated in that window.  This provides the filesystem with a strong
      synchronisation against the VM here.
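
      For illustration, a ->page_mkwrite implementation under the new
      convention might look like this (a sketch, assuming the vm_fault-based
      prototype and the VM_FAULT_LOCKED return code):

        static int example_page_mkwrite(struct vm_area_struct *vma,
                                        struct vm_fault *vmf)
        {
                struct page *page = vmf->page;

                lock_page(page);

                /* The page may have been truncated while we blocked. */
                if (page->mapping != vma->vm_file->f_mapping) {
                        unlock_page(page);
                        return VM_FAULT_NOPAGE;
                }

                /*
                 * Allocate and attach dirty-tracking metadata here;
                 * blocking and failing are both still allowed.
                 */

                /*
                 * Return with the page locked; the VM now holds the lock
                 * until the pte and the page have been marked dirty.
                 */
                return VM_FAULT_LOCKED;
        }
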
      
      - Sage needs this race closed for ceph filesystem.
      - Trond for NFS (http://bugzilla.kernel.org/show_bug.cgi?id=12913).
      - I need it for fsblock.
      - I suspect other filesystems may need it too (eg. btrfs).
      - I have converted buffer.c to the new locking. Even simple block allocation
        under dirty pages might be susceptible to i_size changing under partial page
        at the end of file (we also have a buffer.c-side problem here, but it cannot
        be fixed properly without this patch).
      - Other filesystems (eg. NFS, maybe btrfs) will need to change their
        page_mkwrite functions themselves.
      
      [ This also moves page_mkwrite another step closer to fault, which should
        eventually allow page_mkwrite to be moved into ->fault, and thus avoiding a
        filesystem calldown and page lock/unlock cycle in __do_fault. ]
      
      [akpm@linux-foundation.org: fix derefs of NULL ->mapping]
      Cc: Sage Weil <sage@newdream.net>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: fix try_get_mem_cgroup_from_swapcache() · c0bd3f63
      Authored by Daisuke Nishimura
      This is a bugfix for commit 3c776e64
      ("memcg: charge swapcache to proper memcg").
      
      The Used bit of a swapcache page is stable under the page lock, but,
      considering move_account, pc->mem_cgroup is not.
      
      We need lock_page_cgroup() anyway.
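
      Roughly, the lookup has to take the page_cgroup lock around reading
      pc->mem_cgroup (a sketch of the shape; details are illustrative):

        struct page_cgroup *pc = lookup_page_cgroup(page);
        struct mem_cgroup *mem = NULL;

        /*
         * The page lock pins the Used bit but not pc->mem_cgroup,
         * which move_account can change under us.
         */
        lock_page_cgroup(pc);
        if (PageCgroupUsed(pc)) {
                mem = pc->mem_cgroup;
                if (mem && !css_tryget(&mem->css))
                        mem = NULL;
        }
        unlock_page_cgroup(pc);
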
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix pageref leak in do_swap_page() · bc43f75c
      Authored by Johannes Weiner
      By the time the memory cgroup code is notified about a swapin we
      already hold a reference on the fault page.
      
      If the cgroup callback fails, make sure to unlock AND release the page
      reference taken by lookup_swap_cache(), or we leak the reference.
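
      The corrected error path, in sketch form (label and variable names
      are illustrative):

        page = lookup_swap_cache(entry);        /* takes a page reference */

        if (mem_cgroup_try_charge_swapin(mm, page, GFP_KERNEL, &ptr)) {
                ret = VM_FAULT_OOM;
                goto out_page;
        }

        /* normal completion elided */
        return 0;

        out_page:
                unlock_page(page);              /* we hold the page lock here */
                page_cache_release(page);       /* and the lookup's reference */
                return ret;
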
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 22 Apr 2009, 1 commit
  9. 19 Apr 2009, 1 commit
  10. 17 Apr 2009, 1 commit
  11. 16 Apr 2009, 1 commit
  12. 14 Apr 2009, 6 commits
  13. 07 Apr 2009, 3 commits
    • mm: add /proc controls for pdflush threads · fafd688e
      Authored by Peter W Morreale
      Add /proc entries to give the admin the ability to control the minimum and
      maximum number of pdflush threads.  This allows finer control of pdflush
      on both large and small machines.
      
      The rationale is simply one size does not fit all.  Admins on large and/or
      small systems may want to tune the min/max pdflush thread count to best
      suit their needs.  Right now the min/max is hardcoded to 2/8.  While
      probably a fair estimate for smaller machines, large machines with large
      numbers of CPUs and large numbers of filesystems/block devices may benefit
      from larger numbers of threads working on different block devices.
      
      Even if the background flushing algorithm is radically changed, it is
      still likely that multiple threads will be involved and admins would still
      desire finer control on the min/max other than to have to recompile the
      kernel.
      
      The patch adds '/proc/sys/vm/nr_pdflush_threads_min' and
      '/proc/sys/vm/nr_pdflush_threads_max' with r/w permissions.
      
      The minimum value for nr_pdflush_threads_min is 1 and the maximum value is
      the current value of nr_pdflush_threads_max.  This minimum is required
      since additional thread creation is performed in a pdflush thread itself.
      
      The minimum value for nr_pdflush_threads_max is the current value of
      nr_pdflush_threads_min and the maximum value can be 1000.
      
      Documentation/sysctl/vm.txt is also updated.
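
      The knobs are plain sysctl integers; wired up, they would look
      something like this (a sketch of kernel/sysctl.c entries circa
      2.6.30; exact field values are assumptions):

        static int one = 1;
        static int one_thousand = 1000;

        {
                .ctl_name       = CTL_UNNUMBERED,
                .procname       = "nr_pdflush_threads_min",
                .data           = &nr_pdflush_threads_min,
                .maxlen         = sizeof(nr_pdflush_threads_min),
                .mode           = 0644,
                .proc_handler   = &proc_dointvec_minmax,
                .extra1         = &one,                         /* floor: 1 */
                .extra2         = &nr_pdflush_threads_max,      /* ceiling: current max */
        },
        {
                .ctl_name       = CTL_UNNUMBERED,
                .procname       = "nr_pdflush_threads_max",
                .data           = &nr_pdflush_threads_max,
                .maxlen         = sizeof(nr_pdflush_threads_max),
                .mode           = 0644,
                .proc_handler   = &proc_dointvec_minmax,
                .extra1         = &nr_pdflush_threads_min,      /* floor: current min */
                .extra2         = &one_thousand,                /* ceiling: 1000 */
        },
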
      
      [akpm@linux-foundation.org: fix comment, fix whitespace, use __read_mostly]
      Signed-off-by: Peter W Morreale <pmorreale@novell.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix pdflush thread creation upper bound · a56ed663
      Authored by Peter W Morreale
      Fix a race on creating pdflush threads.  Without the patch, it is possible
      to create more than MAX_PDFLUSH_THREADS threads, and this has been
      observed in practice on IO loaded SMP machines.
      
      The fix involves moving the lock around to protect the check against the
      thread count and correctly dealing with thread creation failure.
      
      This fix also _mostly_ repairs a race condition on how quickly the threads
      are created.  The original intent was to create a pdflush thread (up to
      the max allowed) every second.  Without this patch it is possible to
      create NCPUS pdflush threads concurrently.  The 'mostly' caveat is because
      an assumption is made that thread creation will be successful.  If we fail
      to create the thread, the miss is not considered fatal.  (we will try
      again in 1 second)
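
      The shape of the fix, as described (a sketch; the lock and counter
      names follow mm/pdflush.c conventions but are illustrative here):

        spin_lock_irqsave(&pdflush_lock, flags);
        if (nr_pdflush_threads >= MAX_PDFLUSH_THREADS) {
                spin_unlock_irqrestore(&pdflush_lock, flags);
                return;
        }
        /*
         * Reserve the slot before dropping the lock, so concurrent
         * callers cannot all pass the check and overshoot the max.
         */
        nr_pdflush_threads++;
        spin_unlock_irqrestore(&pdflush_lock, flags);

        if (IS_ERR(kthread_run(pdflush, NULL, "pdflush"))) {
                /*
                 * Creation failed: give the slot back.  Not fatal;
                 * we simply try again in one second.
                 */
                spin_lock_irqsave(&pdflush_lock, flags);
                nr_pdflush_threads--;
                spin_unlock_irqrestore(&pdflush_lock, flags);
        }
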
      Signed-off-by: Peter W Morreale <pmorreale@novell.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • percpu: __percpu_depopulate_mask can take a const mask · 5d6700ea
      Authored by Stephen Rothwell
      This eliminates a compiler warning:
      
        mm/allocpercpu.c: In function 'free_percpu':
        mm/allocpercpu.c:146: warning: passing argument 2 of '__percpu_depopulate_mask' discards qualifiers from pointer target type
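
      The fix is simply const-correctness in the prototype (a sketch of
      the shape):

        /*
         * Argument 2 is only read, so it can be const-qualified;
         * callers may then pass const cpumasks without the compiler
         * warning about discarded qualifiers.
         */
        void __percpu_depopulate_mask(void *__pdata, const cpumask_t *mask);
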
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 06 Apr 2009, 1 commit
  15. 04 Apr 2009, 1 commit
  16. 03 Apr 2009, 9 commits