1. 27 10月, 2010 13 次提交
    • M
      writeback: add nr_dirtied and nr_written to /proc/vmstat · ea941f0e
      Michael Rubin 提交于
      To help developers and applications gain visibility into writeback
      behaviour adding two entries to vm_stat_items and /proc/vmstat.  This will
      allow us to track the "written" and "dirtied" counts.
      
         # grep nr_dirtied /proc/vmstat
         nr_dirtied 3747
         # grep nr_written /proc/vmstat
         nr_written 3618
      Signed-off-by: NMichael Rubin <mrubin@google.com>
      Reviewed-by: NWu Fengguang <fengguang.wu@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ea941f0e
    • M
      mm: add account_page_writeback() · f629d1c9
      Michael Rubin 提交于
      To help developers and applications gain visibility into writeback
      behaviour this patch adds two counters to /proc/vmstat.
      
        # grep nr_dirtied /proc/vmstat
        nr_dirtied 3747
        # grep nr_written /proc/vmstat
        nr_written 3618
      
      These entries allow user apps to understand writeback behaviour over time
      and learn how it is impacting their performance.  Currently there is no
      way to inspect dirty and writeback speed over time.  It's not possible for
      nr_dirty/nr_writeback.
      
      These entries are necessary to give visibility into writeback behaviour.
      We have /proc/diskstats which lets us understand the io in the block
      layer.  We have blktrace for more in depth understanding.  We have
      e2fsprogs and debugsfs to give insight into the file systems behaviour,
      but we don't offer our users the ability understand what writeback is
      doing.  There is no way to know how active it is over the whole system, if
      it's falling behind or to quantify it's efforts.  With these values
      exported users can easily see how much data applications are sending
      through writeback and also at what rates writeback is processing this
      data.  Comparing the rates of change between the two allow developers to
      see when writeback is not able to keep up with incoming traffic and the
      rate of dirty memory being sent to the IO back end.  This allows folks to
      understand their io workloads and track kernel issues.  Non kernel
      engineers at Google often use these counters to solve puzzling performance
      problems.
      
      Patch #4 adds a pernode vmstat file with nr_dirtied and nr_written
      
      Patch #5 add writeback thresholds to /proc/vmstat
      
      Currently these values are in debugfs. But they should be promoted to
      /proc since they are useful for developers who are writing databases
      and file servers and are not debugging the kernel.
      
      The output is as below:
      
       # grep threshold /proc/vmstat
       nr_pages_dirty_threshold 409111
       nr_pages_dirty_background_threshold 818223
      
      This patch:
      
      This allows code outside of the mm core to safely manipulate page
      writeback state and not worry about the other accounting.  Not using these
      routines means that some code will lose track of the accounting and we get
      bugs.
      
      Modify nilfs2 to use interface.
      Signed-off-by: NMichael Rubin <mrubin@google.com>
      Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: NWu Fengguang <fengguang.wu@intel.com>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Jiro SEKIBA <jir@unicus.jp>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f629d1c9
    • V
      mm/mempolicy.c: check return code of check_range · 0def08e3
      Vasiliy Kulikov 提交于
      Function check_range may return ERR_PTR(...). Check for it.
      Signed-off-by: NVasiliy Kulikov <segooon@gmail.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Reviewed-by: NChristoph Lameter <cl@linux.com>
      Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0def08e3
    • M
      vmscan: prevent background aging of anon page in no swap system · 74e3f3c3
      Minchan Kim 提交于
      Ying Han reported that backing aging of anon pages in no swap system
      causes unnecessary TLB flush.
      
      When I sent a patch(69c85481), I wanted this patch but Rik pointed out
      and allowed aging of anon pages to give a chance to promote from inactive
      to active LRU.
      
      It has a two problem.
      
      1) non-swap system
      
      Never make sense to age anon pages.
      
      2) swap configured but still doesn't swapon
      
      It doesn't make sense to age anon pages until swap-on time.  But it's
      arguable.  If we have aged anon pages by swapon, VM have moved anon pages
      from active to inactive.  And in the time swapon by admin, the VM can't
      reclaim hot pages so we can protect hot pages swapout.
      
      But let's think about it.  When does swap-on happen?  It depends on admin.
       we can't expect it.  Nonetheless, we have done aging of anon pages to
      protect hot pages swapout.  It means we lost run time overhead when below
      high watermark but gain hot page swap-[in/out] overhead when VM decide
      swapout.  Is it true?  Let's think more detail.  We don't promote anon
      pages in case of non-swap system.  So even though VM does aging of anon
      pages, the pages would be in inactive LRU for a long time.  It means many
      of pages in there would mark access bit again.  So access bit hot/code
      separation would be pointless.
      
      This patch prevents unnecessary anon pages demotion in not-yet-swapon and
      non-configured swap system.  Even, in non-configuared swap system
      inactive_anon_is_low can be compiled out.
      
      It could make side effect that hot anon pages could swap out when admin
      does swap on.  But I think sooner or later it would be steady state.  So
      it's not a big problem.
      
      We could lose someting but gain more thing(TLB flush and unnecessary
      function call to demote anon pages).
      Signed-off-by: NYing Han <yinghan@google.com>
      Signed-off-by: NMinchan Kim <minchan.kim@gmail.com>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      74e3f3c3
    • K
      memory hotplug: unify is_removable and offline detection code · 49ac8255
      KAMEZAWA Hiroyuki 提交于
      Now, sysfs interface of memory hotplug shows whether the section is
      removable or not.  But it checks only migrateype of pages and doesn't
      check details of cluster of pages.
      
      Next, memory hotplug's set_migratetype_isolate() has the same kind of
      check, too.
      
      This patch adds the function __count_unmovable_pages() and makes above 2
      checks to use the same logic.  Then, is_removable and hotremove code uses
      the same logic.  No changes in the hotremove logic itself.
      
      TODO: need to find a way to check RECLAMABLE. But, considering bit,
            calling shrink_slab() against a range before starting memory hotremove
            sounds better. If so, this patch's logic doesn't need to be changed.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reported-by: NMichal Hocko <mhocko@suse.cz>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      49ac8255
    • K
      memory hotplug: fix notifier's return value check · 4b20477f
      KAMEZAWA Hiroyuki 提交于
      Even if notifier cannot find any pages, it doesn't mean no pages are
      available...And, if there are no notifiers registered, this condition will
      be always true and memory hotplug will show -EBUSY.
      
      This is a bug but not critical.
      
      In most case, a pageblock which will be offlined is MIGRATE_MOVABLE This
      "notifier" is called only when the pageblock is _not_ MIGRATE_MOVABLE.
      But if not MIGRATE_MOVABLE, it's common case that memory hotplug will
      fail.  So, no one notice this bug.
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4b20477f
    • M
      mm: compaction: fix COMPACTPAGEFAILED counting · cf608ac1
      Minchan Kim 提交于
      Presently update_nr_listpages() doesn't have a role.  That's because lists
      passed is always empty just after calling migrate_pages.  The
      migrate_pages cleans up page list which have failed to migrate before
      returning by aaa994b3.
      
       [PATCH] page migration: handle freeing of pages in migrate_pages()
      
       Do not leave pages on the lists passed to migrate_pages().  Seems that we will
       not need any postprocessing of pages.  This will simplify the handling of
       pages by the callers of migrate_pages().
      
      At that time, we thought we don't need any postprocessing of pages.  But
      the situation is changed.  The compaction need to know the number of
      failed to migrate for COMPACTPAGEFAILED stat
      
      This patch makes new rule for caller of migrate_pages to call
      putback_lru_pages.  So caller need to clean up the lists so it has a
      chance to postprocess the pages.  [suggested by Christoph Lameter]
      Signed-off-by: NMinchan Kim <minchan.kim@gmail.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Reviewed-by: NMel Gorman <mel@csn.ul.ie>
      Reviewed-by: NWu Fengguang <fengguang.wu@intel.com>
      Acked-by: NChristoph Lameter <cl@linux.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cf608ac1
    • T
      mm: only build per-node scan_unevictable functions when NUMA is enabled · e4455abb
      Thadeu Lima de Souza Cascardo 提交于
      Non-NUMA systems do never create these files anyway, since they are only
      created by driver subsystem when NUMA is configured.
      
      [akpm@linux-foundation.org: cleanup]
      Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
      Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e4455abb
    • W
      writeback: remove nonblocking/encountered_congestion references · 1b430bee
      Wu Fengguang 提交于
      This removes more dead code that was somehow missed by commit 0d99519e
      (writeback: remove unused nonblocking and congestion checks).  There are
      no behavior change except for the removal of two entries from one of the
      ext4 tracing interface.
      
      The nonblocking checks in ->writepages are no longer used because the
      flusher now prefer to block on get_request_wait() than to skip inodes on
      IO congestion.  The latter will lead to more seeky IO.
      
      The nonblocking checks in ->writepage are no longer used because it's
      redundant with the WB_SYNC_NONE check.
      
      We no long set ->nonblocking in VM page out and page migration, because
      a) it's effectively redundant with WB_SYNC_NONE in current code
      b) it's old semantic of "Don't get stuck on request queues" is mis-behavior:
         that would skip some dirty inodes on congestion and page out others, which
         is unfair in terms of LRU age.
      
      Inspired by Christoph Hellwig. Thanks!
      Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Sage Weil <sage@newdream.net>
      Cc: Steve French <sfrench@samba.org>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1b430bee
    • D
      oom: kill all threads sharing oom killed task's mm · 1e99bad0
      David Rientjes 提交于
      It's necessary to kill all threads that share an oom killed task's mm if
      the goal is to lead to future memory freeing.
      
      This patch reintroduces the code removed in 8c5cd6f3 (oom: oom_kill
      doesn't kill vfork parent (or child)) since it is obsoleted.
      
      It's now guaranteed that any task passed to oom_kill_task() does not share
      an mm with any thread that is unkillable.  Thus, we're safe to issue a
      SIGKILL to any thread sharing the same mm.
      
      This is especially necessary to solve an mm->mmap_sem livelock issue
      whereas an oom killed thread must acquire the lock in the exit path while
      another thread is holding it in the page allocator while trying to
      allocate memory itself (and will preempt the oom killer since a task was
      already killed).  Since tasks with pending fatal signals are now granted
      access to memory reserves, the thread holding the lock may quickly
      allocate and release the lock so that the oom killed task may exit.
      
      This mainly is for threads that are cloned with CLONE_VM but not
      CLONE_THREAD, so they are in a different thread group.  Non-NPTL threads
      exist in the wild and this change is necessary to prevent the livelock in
      such cases.  We care more about preventing the livelock than incurring the
      additional tasklist in the oom killer when a task has been killed.
      Systems that are sufficiently large to not want the tasklist scan in the
      oom killer in the first place already have the option of enabling
      /proc/sys/vm/oom_kill_allocating_task, which was designed specifically for
      that purpose.
      
      This code had existed in the oom killer for over eight years dating back
      to the 2.4 kernel.
      
      [akpm@linux-foundation.org: add nice comment]
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1e99bad0
    • D
      oom: avoid killing a task if a thread sharing its mm cannot be killed · e18641e1
      David Rientjes 提交于
      The oom killer's goal is to kill a memory-hogging task so that it may
      exit, free its memory, and allow the current context to allocate the
      memory that triggered it in the first place.  Thus, killing a task is
      pointless if other threads sharing its mm cannot be killed because of its
      /proc/pid/oom_adj or /proc/pid/oom_score_adj value.
      
      This patch checks whether any other thread sharing p->mm has an
      oom_score_adj of OOM_SCORE_ADJ_MIN.  If so, the thread cannot be killed
      and oom_badness(p) returns 0, meaning it's unkillable.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e18641e1
    • M
      mm, page-allocator: do not check the state of a non-existant buddy during free · b7f50cfa
      Mel Gorman 提交于
      There is a bug in commit 6dda9d55 ("page allocator: reduce fragmentation
      in buddy allocator by adding buddies that are merging to the tail of the
      free lists") that means a buddy at order MAX_ORDER is checked for merging.
       A page of this order never exists so at times, an effectively random
      piece of memory is being checked.
      
      Alan Curry has reported that this is causing memory corruption in
      userspace data on a PPC32 platform (http://lkml.org/lkml/2010/10/9/32).
      It is not clear why this is happening.  It could be a cache coherency
      problem where pages mapped in both user and kernel space are getting
      different cache lines due to the bad read from kernel space
      (http://lkml.org/lkml/2010/10/13/179).  It could also be that there are
      some special registers being io-remapped at the end of the memmap array
      and that a read has special meaning on them.  Compiler bugs have been
      ruled out because the assembly before and after the patch looks relatively
      harmless.
      
      This patch fixes the problem by ensuring we are not reading a possibly
      invalid location of memory.  It's not clear why the read causes corruption
      but one way or the other it is a buggy read.
      Signed-off-by: NMel Gorman <mel@csn.ul.ie>
      Cc: Corrado Zoccolo <czoccolo@gmail.com>
      Reported-by: NAlan Curry <pacman@kosh.dhis.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b7f50cfa
    • K
      mm: fix return value of scan_lru_pages in memory unplug · f8f72ad5
      KAMEZAWA Hiroyuki 提交于
      scan_lru_pages returns pfn. So, it's type should be "unsigned long"
      not "int".
      
      Note: I guess this has been work until now because memory hotplug tester's
            machine has not very big memory....
            physical address < 32bit << PAGE_SHIFT.
      Reported-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f8f72ad5
  2. 25 10月, 2010 1 次提交
  3. 24 10月, 2010 1 次提交
  4. 19 10月, 2010 1 次提交
  5. 12 10月, 2010 2 次提交
  6. 11 10月, 2010 1 次提交
    • A
      Fix migration.c compilation on s390 · 3ef8fd7f
      Andi Kleen 提交于
      31bit s390 doesn't have huge pages and failed with:
      
      > mm/migrate.c: In function 'remove_migration_pte':
      > mm/migrate.c:143:3: error: implicit declaration of function 'pte_mkhuge'
      > mm/migrate.c:143:7: error: incompatible types when assigning to type 'pte_t' from type 'int'
      
      Put that code into a ifdef.
      
      Reported by Heiko Carstens
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      3ef8fd7f
  7. 08 10月, 2010 19 次提交
  8. 07 10月, 2010 2 次提交