1. 01 8月, 2012 10 次提交
  2. 06 7月, 2012 1 次提交
    • R
      mm: cma: don't replace lowmem pages with highmem · 6a6dccba
      Rabin Vincent 提交于
      The filesystem layer expects pages in the block device's mapping to not
      be in highmem (the mapping's gfp mask is set in bdget()), but CMA can
      currently replace lowmem pages with highmem pages, leading to crashes in
      filesystem code such as the one below:
      
        Unable to handle kernel NULL pointer dereference at virtual address 00000400
        pgd = c0c98000
        [00000400] *pgd=00c91831, *pte=00000000, *ppte=00000000
        Internal error: Oops: 817 [#1] PREEMPT SMP ARM
        CPU: 0    Not tainted  (3.5.0-rc5+ #80)
        PC is at __memzero+0x24/0x80
        ...
        Process fsstress (pid: 323, stack limit = 0xc0cbc2f0)
        Backtrace:
        [<c010e3f0>] (ext4_getblk+0x0/0x180) from [<c010e58c>] (ext4_bread+0x1c/0x98)
        [<c010e570>] (ext4_bread+0x0/0x98) from [<c0117944>] (ext4_mkdir+0x160/0x3bc)
         r4:c15337f0
        [<c01177e4>] (ext4_mkdir+0x0/0x3bc) from [<c00c29e0>] (vfs_mkdir+0x8c/0x98)
        [<c00c2954>] (vfs_mkdir+0x0/0x98) from [<c00c2a60>] (sys_mkdirat+0x74/0xac)
         r6:00000000 r5:c152eb40 r4:000001ff r3:c14b43f0
        [<c00c29ec>] (sys_mkdirat+0x0/0xac) from [<c00c2ab8>] (sys_mkdir+0x20/0x24)
         r6:beccdcf0 r5:00074000 r4:beccdbbc
        [<c00c2a98>] (sys_mkdir+0x0/0x24) from [<c000e3c0>] (ret_fast_syscall+0x0/0x30)
      
      Fix this by replacing only highmem pages with highmem.
      Reported-by: NLaura Abbott <lauraa@codeaurora.org>
      Signed-off-by: NRabin Vincent <rabin@rab.in>
      Acked-by: NMichal Nazarewicz <mina86@mina86.com>
      Signed-off-by: NMarek Szyprowski <m.szyprowski@samsung.com>
      6a6dccba
  3. 04 6月, 2012 1 次提交
  4. 30 5月, 2012 8 次提交
  5. 24 5月, 2012 1 次提交
  6. 21 5月, 2012 9 次提交
  7. 12 5月, 2012 1 次提交
    • H
      mm: raise MemFree by reverting percpu_pagelist_fraction to 0 · 1b76b02f
      Hugh Dickins 提交于
      Why is there less MemFree than there used to be?  It perturbed a test,
      so I've just been bisecting linux-next, and now find the offender went
      upstream yesterday.
      
      Commit 93278814 "mm: fix division by 0 in percpu_pagelist_fraction()"
      mistakenly initialized percpu_pagelist_fraction to the sysctl's minimum 8,
      which leaves 1/8th of memory on percpu lists (on each cpu??); but most of
      us expect it to be left unset at 0 (and it's not then used as a divisor).
      
        MemTotal: 8061476kB  8061476kB  8061476kB  8061476kB  8061476kB  8061476kB
        Repetitive test with percpu_pagelist_fraction 8:
        MemFree:  6948420kB  6237172kB  6949696kB  6840692kB  6949048kB  6862984kB
        Same test with percpu_pagelist_fraction back to 0:
        MemFree:  7945000kB  7944908kB  7948568kB  7949060kB  7948796kB  7948812kB
      Signed-off-by: NHugh Dickins <hughd@google.com>
      [ We really should fix the crazy sysctl interface too, but that's a
        separate thing - Linus ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1b76b02f
  8. 11 5月, 2012 1 次提交
  9. 08 5月, 2012 1 次提交
  10. 29 3月, 2012 2 次提交
    • G
      mm: only IPI CPUs to drain local pages if they exist · 74046494
      Gilad Ben-Yossef 提交于
      Calculate a cpumask of CPUs with per-cpu pages in any zone and only send
      an IPI requesting CPUs to drain these pages to the buddy allocator if they
      actually have pages when asked to flush.
      
      This patch saves 85%+ of IPIs asking to drain per-cpu pages in case of
      severe memory pressure that leads to OOM since in these cases multiple,
      possibly concurrent, allocation requests end up in the direct reclaim code
      path so when the per-cpu pages end up reclaimed on first allocation
      failure for most of the proceeding allocation attempts until the memory
      pressure is off (possibly via the OOM killer) there are no per-cpu pages
      on most CPUs (and there can easily be hundreds of them).
      
      This also has the side effect of shortening the average latency of direct
      reclaim by 1 or more order of magnitude since waiting for all the CPUs to
      ACK the IPI takes a long time.
      
      Tested by running "hackbench 400" on a 8 CPU x86 VM and observing the
      difference between the number of direct reclaim attempts that end up in
      drain_all_pages() and those were more then 1/2 of the online CPU had any
      per-cpu page in them, using the vmstat counters introduced in the next
      patch in the series and using proc/interrupts.
      
      In the test sceanrio, this was seen to save around 3600 global
      IPIs after trigerring an OOM on a concurrent workload:
      
      $ cat /proc/vmstat | tail -n 2
      pcp_global_drain 0
      pcp_global_ipi_saved 0
      
      $ cat /proc/interrupts | grep CAL
      CAL:          1          2          1          2
                2          2          2          2   Function call interrupts
      
      $ hackbench 400
      [OOM messages snipped]
      
      $ cat /proc/vmstat | tail -n 2
      pcp_global_drain 3647
      pcp_global_ipi_saved 3642
      
      $ cat /proc/interrupts | grep CAL
      CAL:          6         13          6          3
                3          3         1 2          7   Function call interrupts
      
      Please note that if the global drain is removed from the direct reclaim
      path as a patch from Mel Gorman currently suggests this should be replaced
      with an on_each_cpu_cond invocation.
      Signed-off-by: NGilad Ben-Yossef <gilad@benyossef.com>
      Acked-by: NMel Gorman <mel@csn.ul.ie>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: NChristoph Lameter <cl@linux.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Acked-by: NMichal Nazarewicz <mina86@mina86.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      74046494
    • D
      mm, coredump: fail allocations when coredumping instead of oom killing · 29fd66d2
      David Rientjes 提交于
      The size of coredump files is limited by RLIMIT_CORE, however, allocating
      large amounts of memory results in three negative consequences:
      
       - the coredumping process may be chosen for oom kill and quickly deplete
         all memory reserves in oom conditions preventing further progress from
         being made or tasks from exiting,
      
       - the coredumping process may cause other processes to be oom killed
         without fault of their own as the result of a SIGSEGV, for example, in
         the coredumping process, or
      
       - the coredumping process may result in a livelock while writing to the
         dump file if it needs memory to allocate while other threads are in
         the exit path waiting on the coredumper to complete.
      
      This is fixed by implying __GFP_NORETRY in the page allocator for
      coredumping processes when reclaim has failed so the allocations fail and
      the process continues to exit.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      29fd66d2
  11. 22 3月, 2012 5 次提交
    • K
      page_alloc: remove unused find_zone_movable_pfns_for_nodes() argument · b224ef85
      Kautuk Consul 提交于
      find_zone_movable_pfns_for_nodes() does not use its argument.
      Signed-off-by: NKautuk Consul <consul.kautuk@gmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b224ef85
    • K
      page_alloc.c: remove add_from_early_node_map() · 8d13bddd
      Kautuk Consul 提交于
      add_from_early_node_map() is unused.
      Signed-off-by: NKautuk Consul <consul.kautuk@gmail.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8d13bddd
    • M
      cpuset: mm: reduce large amounts of memory barrier related damage v3 · cc9a6c87
      Mel Gorman 提交于
      Commit c0ff7453 ("cpuset,mm: fix no node to alloc memory when
      changing cpuset's mems") wins a super prize for the largest number of
      memory barriers entered into fast paths for one commit.
      
      [get|put]_mems_allowed is incredibly heavy with pairs of full memory
      barriers inserted into a number of hot paths.  This was detected while
      investigating at large page allocator slowdown introduced some time
      after 2.6.32.  The largest portion of this overhead was shown by
      oprofile to be at an mfence introduced by this commit into the page
      allocator hot path.
      
      For extra style points, the commit introduced the use of yield() in an
      implementation of what looks like a spinning mutex.
      
      This patch replaces the full memory barriers on both read and write
      sides with a sequence counter with just read barriers on the fast path
      side.  This is much cheaper on some architectures, including x86.  The
      main bulk of the patch is the retry logic if the nodemask changes in a
      manner that can cause a false failure.
      
      While updating the nodemask, a check is made to see if a false failure
      is a risk.  If it is, the sequence number gets bumped and parallel
      allocators will briefly stall while the nodemask update takes place.
      
      In a page fault test microbenchmark, oprofile samples from
      __alloc_pages_nodemask went from 4.53% of all samples to 1.15%.  The
      actual results were
      
                                   3.3.0-rc3          3.3.0-rc3
                                   rc3-vanilla        nobarrier-v2r1
          Clients   1 UserTime       0.07 (  0.00%)   0.08 (-14.19%)
          Clients   2 UserTime       0.07 (  0.00%)   0.07 (  2.72%)
          Clients   4 UserTime       0.08 (  0.00%)   0.07 (  3.29%)
          Clients   1 SysTime        0.70 (  0.00%)   0.65 (  6.65%)
          Clients   2 SysTime        0.85 (  0.00%)   0.82 (  3.65%)
          Clients   4 SysTime        1.41 (  0.00%)   1.41 (  0.32%)
          Clients   1 WallTime       0.77 (  0.00%)   0.74 (  4.19%)
          Clients   2 WallTime       0.47 (  0.00%)   0.45 (  3.73%)
          Clients   4 WallTime       0.38 (  0.00%)   0.37 (  1.58%)
          Clients   1 Flt/sec/cpu  497620.28 (  0.00%) 520294.53 (  4.56%)
          Clients   2 Flt/sec/cpu  414639.05 (  0.00%) 429882.01 (  3.68%)
          Clients   4 Flt/sec/cpu  257959.16 (  0.00%) 258761.48 (  0.31%)
          Clients   1 Flt/sec      495161.39 (  0.00%) 517292.87 (  4.47%)
          Clients   2 Flt/sec      820325.95 (  0.00%) 850289.77 (  3.65%)
          Clients   4 Flt/sec      1020068.93 (  0.00%) 1022674.06 (  0.26%)
          MMTests Statistics: duration
          Sys Time Running Test (seconds)             135.68    132.17
          User+Sys Time Running Test (seconds)         164.2    160.13
          Total Elapsed Time (seconds)                123.46    120.87
      
      The overall improvement is small but the System CPU time is much
      improved and roughly in correlation to what oprofile reported (these
      performance figures are without profiling so skew is expected).  The
      actual number of page faults is noticeably improved.
      
      For benchmarks like kernel builds, the overall benefit is marginal but
      the system CPU time is slightly reduced.
      
      To test the actual bug the commit fixed I opened two terminals.  The
      first ran within a cpuset and continually ran a small program that
      faulted 100M of anonymous data.  In a second window, the nodemask of the
      cpuset was continually randomised in a loop.
      
      Without the commit, the program would fail every so often (usually
      within 10 seconds) and obviously with the commit everything worked fine.
      With this patch applied, it also worked fine so the fix should be
      functionally equivalent.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Cc: Miao Xie <miaox@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cc9a6c87
    • K
      mm: drain percpu lru add/rotate page-vectors on cpu hot-unplug · f0cb3c76
      Konstantin Khlebnikov 提交于
      This cpu hotplug hook was accidentally removed in commit 00a62ce9
      ("mm: fix Committed_AS underflow on large NR_CPUS environment")
      
      The visible effect of this accident: some pages are borrowed in per-cpu
      page-vectors.  Truncate can deal with it, but these pages cannot be
      reused while this cpu is offline.  So this is like a temporary memory
      leak.
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Eric B Munson <ebmunson@us.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f0cb3c76
    • D
      mm, oom: force oom kill on sysrq+f · 08ab9b10
      David Rientjes 提交于
      The oom killer chooses not to kill a thread if:
      
       - an eligible thread has already been oom killed and has yet to exit,
         and
      
       - an eligible thread is exiting but has yet to free all its memory and
         is not the thread attempting to currently allocate memory.
      
      SysRq+F manually invokes the global oom killer to kill a memory-hogging
      task.  This is normally done as a last resort to free memory when no
      progress is being made or to test the oom killer itself.
      
      For both uses, we always want to kill a thread and never defer.  This
      patch causes SysRq+F to always kill an eligible thread and can be used to
      force a kill even if another oom killed thread has failed to exit.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: NPekka Enberg <penberg@kernel.org>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      08ab9b10