1. Jul 29, 2016 (6 commits)
    • mm: rename NR_ANON_PAGES to NR_ANON_MAPPED · 4b9d0fab
      Committed by Mel Gorman
      NR_FILE_PAGES  is the number of        file pages.
      NR_FILE_MAPPED is the number of mapped file pages.
      NR_ANON_PAGES  is the number of mapped anon pages.
      
      This is unhelpful naming as it's easy to confuse NR_FILE_MAPPED and
      NR_ANON_PAGES for mapped pages.  This patch renames NR_ANON_PAGES so we
      have
      
      NR_FILE_PAGES  is the number of        file pages.
      NR_FILE_MAPPED is the number of mapped file pages.
      NR_ANON_MAPPED is the number of mapped anon pages.
      
      Link: http://lkml.kernel.org/r/1467970510-21195-19-git-send-email-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@surriel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4b9d0fab
    • mm: move page mapped accounting to the node · 50658e2e
      Committed by Mel Gorman
      Reclaim makes decisions based on the number of pages that are mapped but
      it's mixing node and zone information.  Account NR_FILE_MAPPED and
      NR_ANON_PAGES pages on the node.
      
      Link: http://lkml.kernel.org/r/1467970510-21195-18-git-send-email-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@surriel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      50658e2e
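
      A hedged sketch of the call-site shape these two vmstat changes describe (reconstructed for illustration, not the patch itself; the function name is invented, while __inc_zone_page_state()/__inc_node_page_state() are the vmstat helpers of that kernel generation):

      	#include <linux/mm.h>
      	#include <linux/vmstat.h>

      	/* illustrative only: account one newly mapped file page */
      	static void account_file_mapped_sketch(struct page *page)
      	{
      		/* before: the counter lived on the zone backing the page */
      		/* __inc_zone_page_state(page, NR_FILE_MAPPED); */

      		/* after: the counter lives on the node, so node-based  */
      		/* reclaim compares mapped pages against node totals    */
      		__inc_node_page_state(page, NR_FILE_MAPPED);
      	}

      NR_ANON_PAGES (renamed to NR_ANON_MAPPED by the commit above) gets the same treatment at the anon rmap sites.
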
    • mm, oom_adj: make sure processes sharing mm have same view of oom_score_adj · 44a70ade
      Committed by Michal Hocko
      oom_score_adj is shared across the thread group (via struct signal) but this
      is not sufficient to cover processes sharing an mm (CLONE_VM without
      CLONE_SIGHAND), so we can easily end up in a situation where some
      processes update their oom_score_adj and confuse the oom killer.  In the
      worst case some of those processes might hide from the oom killer
      altogether via OOM_SCORE_ADJ_MIN while others remain eligible.  The OOM
      killer would then pick an eligible process but would not be allowed to kill
      the others sharing the same mm, so the victim would never release the mm
      and therefore the memory.
      
      It would be ideal to have the oom_score_adj per mm_struct because that is
      the natural entity the OOM killer considers.  But this will not work because
      some programs are doing
      
      	vfork()
      	set_oom_adj()
      	exec()
      
      We can achieve the same effect, though: the oom_score_adj write handler can
      set the oom_score_adj for all processes sharing the same mm, provided the
      task is not in the middle of a vfork.  As a result all of those processes
      will share the same oom_score_adj.  The current implementation is rather
      pessimistic and checks all existing processes by default whenever there is
      more than one holder of the mm, because we do not yet have any reliable way
      to check for external users.
      
      Link: http://lkml.kernel.org/r/1466426628-15074-5-git-send-email-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      44a70ade
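
      A hedged sketch of the propagation idea (heavily simplified: the real handler also skips tasks in the middle of vfork, takes the necessary references, and handles the legacy knob; process_shares_mm() is assumed to exist as a helper with the obvious meaning):

      	#include <linux/sched.h>
      	#include <linux/rcupdate.h>

      	/* illustrative only: give every mm sharer the same oom_score_adj */
      	static void propagate_oom_score_adj_sketch(struct task_struct *task,
      						   struct mm_struct *mm, short adj)
      	{
      		struct task_struct *p;

      		task->signal->oom_score_adj = adj;

      		rcu_read_lock();
      		for_each_process(p) {
      			if (same_thread_group(task, p))
      				continue;	/* already covered via struct signal */
      			if (!process_shares_mm(p, mm))	/* assumed helper */
      				continue;
      			p->signal->oom_score_adj = adj;
      		}
      		rcu_read_unlock();
      	}
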
    • proc, oom_adj: extract oom_score_adj setting into a helper · 1d5f0acb
      Committed by Michal Hocko
      Currently we have two proc interfaces to set oom_score_adj: the legacy
      /proc/<pid>/oom_adj and /proc/<pid>/oom_score_adj, each with its own
      handler.  A big part of the logic is duplicated, so extract the common
      code into a __set_oom_adj helper.  The legacy knob still has some
      slightly different semantics, so make sure those are preserved - e.g.
      the legacy mode ignores oom_score_adj_min and warns about its usage.
      
      This patch shouldn't introduce any functional changes.
      
      Link: http://lkml.kernel.org/r/1466426628-15074-4-git-send-email-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1d5f0acb
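
      A hedged sketch of the extraction's shape (names echo the description above; the task lookup, permission checks and the legacy -17..15 scaling are deliberately elided):

      	#include <linux/fs.h>
      	#include <linux/kernel.h>
      	#include <linux/uaccess.h>

      	/* common body shared by both knobs; 'legacy' selects the oom_adj quirks */
      	static int __set_oom_adj_sketch(struct file *file, int oom_adj, bool legacy)
      	{
      		/* locate the task from 'file', check permissions, store the value; */
      		/* in legacy mode ignore oom_score_adj_min and warn about the knob  */
      		return 0;
      	}

      	static ssize_t oom_score_adj_write_sketch(struct file *file,
      						  const char __user *buf,
      						  size_t count, loff_t *ppos)
      	{
      		int adj;
      		int err = kstrtoint_from_user(buf, count, 10, &adj);

      		if (err)
      			return err;
      		err = __set_oom_adj_sketch(file, adj, false);
      		return err ? err : count;
      	}
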
    • proc, oom: drop bogus sighand lock · f913da59
      Committed by Michal Hocko
      Oleg has pointed out that we can simplify both oom_adj_{read,write} and
      oom_score_adj_{read,write} even further and drop the sighand lock.  The
      main purpose of the lock was to protect p->signal from going away, but
      that cannot happen since ea6d290c ("signals: make task_struct->signal
      immutable/refcountable").
      
      The other role of the lock was to synchronize different writers,
      especially those with CAP_SYS_RESOURCE.  Introduce a mutex for this
      purpose.  Later patches will need this lock anyway.
      Suggested-by: Oleg Nesterov <oleg@redhat.com>
      Link: http://lkml.kernel.org/r/1466426628-15074-3-git-send-email-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f913da59
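
      A hedged sketch of the locking change (the mutex name and the exact min-adjust policy are assumptions, not the patch verbatim):

      	#include <linux/mutex.h>
      	#include <linux/sched.h>
      	#include <linux/capability.h>

      	/* one mutex serializes all oom_adj / oom_score_adj writers */
      	static DEFINE_MUTEX(oom_adj_mutex);

      	static int set_oom_score_adj_sketch(struct task_struct *task, short adj)
      	{
      		int err = 0;

      		mutex_lock(&oom_adj_mutex);
      		if (adj < task->signal->oom_score_adj_min &&
      		    !capable(CAP_SYS_RESOURCE)) {
      			err = -EACCES;	/* unprivileged writers cannot go lower */
      		} else {
      			task->signal->oom_score_adj = adj;
      			if (capable(CAP_SYS_RESOURCE))
      				task->signal->oom_score_adj_min = adj;
      		}
      		mutex_unlock(&oom_adj_mutex);
      		return err;
      	}
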
    • proc, oom: drop bogus task_lock and mm check · d49fbf76
      Committed by Michal Hocko
      Series "Handle oom bypass more gracefully", V5
      
      The following 10 patches should bring some order to the very rare cases of
      an mm shared between processes and make the paths which bypass the oom
      killer oom-reapable, and therefore finally much more reliable.  Even though
      an mm shared outside of a thread group is rare (either vforked tasks for a
      short period, use_mm by kernel threads, or the exotic thread model of
      clone(CLONE_VM) without CLONE_SIGHAND), it is better to cover it.  Not
      only does it make the current oom killer logic quite hard to follow and
      reason about, it can lead to weird corner cases.  E.g.  it is possible to
      select an oom victim which shares the mm with an unkillable process, or to
      bypass the oom killer even when other processes sharing the mm are still
      alive, among other weird cases.
      
      Patch 1 drops bogus task_lock and mm check from oom_{score_}adj_write.
      This can be considered a bug fix with a low impact as nobody has noticed
      for years.
      
      Patch 2 drops the sighand lock because it is not needed anymore, as
      pointed out by Oleg.
      
      Patch 3 is a clean up of oom_score_adj handling and a preparatory work
      for later patches.
      
      Patch 4 enforces oom_score_adj to be consistent between processes
      sharing the mm to behave consistently with the regular thread groups.
      This can be considered a user visible behavior change because one thread
      group updating oom_score_adj will affect others which share the same mm
      via clone(CLONE_VM).  I argue that this should be acceptable because we
      already have the same behavior for threads in the same thread group and
      sharing the mm without signal struct is just a different model of
      threading.  This is probably the most controversial part of the series,
      I would like to find some consensus here.  There were some suggestions
      to hook some counter/oom_score_adj into the mm_struct but I feel that
      this is not necessary right now and we can rely on proc handler +
      oom_kill_process to DTRT.  I can be convinced otherwise but I strongly
      think that whatever we do the userspace has to have a way to see the
      current oom priority as consistently as possible.
      
      Patch 5 makes sure that no vforked task is selected if it is sharing the
      mm with oom unkillable task.
      
      Patch 6 ensures that all user tasks sharing the mm are killed which in
      turn makes sure that all oom victims are oom reapable.
      
      Patch 7 guarantees that task_will_free_mem will always imply reapable
      bypass of the oom killer.
      
      Patch 8 is new in this version and it addresses an issue pointed out by
      0-day OOM report where an oom victim was reaped several times.
      
      Patch 9 puts an upper bound on how many times oom_reaper tries to reap a
      task and hides it from the oom killer to move on when no progress can be
      made.  This will give an upper bound to how long an oom_reapable task
      can block the oom killer from selecting another victim if the oom_reaper
      is not able to reap the victim.
      
      Patch 10 tries to plug the (hopefully) last hole when we can still lock
      up when the oom victim is shared with oom unkillable tasks (kthreads and
      global init).  We just try to be best effort in that case and rather
      fall back to killing something else than risk a lockup.
      
      This patch (of 10):
      
      Both oom_adj_write and oom_score_adj_write take task_lock, check
      task->mm and fail if it is NULL.  This is not needed because
      oom_score_adj is per signal struct, so we do not need the mm at all.  The code
      has been introduced by 3d5992d2 ("oom: add per-mm oom disable count")
      but we do not do per-mm oom disable since c9f01245 ("oom: remove
      oom_disable_count").
      
      The task->mm check is not even correct, because the current thread might
      have exited while the thread group is still alive - e.g.  if the thread
      group leader has exited, echo $VAL > /proc/pid/oom_score_adj would always
      fail with EINVAL while /proc/pid/task/$other_tid/oom_score_adj would
      succeed.  This is unexpected at best.
      
      Remove the lock along with the check to fix the unexpected behavior, and
      also because there is no real need for the lock in the first place.
      
      Link: http://lkml.kernel.org/r/1466426628-15074-2-git-send-email-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d49fbf76
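
      A hedged reconstruction of the pattern patch 1 removes (illustrative only, not the literal diff):

      	#include <linux/errno.h>
      	#include <linux/sched.h>

      	/* the old handlers did roughly this before storing the value */
      	static int oom_score_adj_store_old_sketch(struct task_struct *task, short adj)
      	{
      		int err = 0;

      		task_lock(task);
      		if (!task->mm)			/* bogus: fails for an exited group   */
      			err = -EINVAL;		/* leader while other threads live on */
      		else
      			task->signal->oom_score_adj = adj;
      		task_unlock(task);
      		return err;	/* after the patch: no task_lock, no mm check */
      	}
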
  2. Jul 27, 2016 (14 commits)
  3. Jul 26, 2016 (1 commit)
  4. Jul 23, 2016 (1 commit)
    • f2fs: get victim segment again after new cp · fe94793e
      Committed by Yunlei He
      The previously selected segment may become free after write_checkpoint;
      if we do garbage collection on this segment, and new_curseg then happens
      to reuse it, it may trigger the f2fs_bug_on below.
      
      	panic+0x154/0x29c
      	do_garbage_collect+0x15c/0xaf4
      	f2fs_gc+0x2dc/0x444
      	f2fs_balance_fs.part.22+0xcc/0x14c
      	f2fs_balance_fs+0x28/0x34
      	f2fs_map_blocks+0x5ec/0x790
      	f2fs_preallocate_blocks+0xe0/0x100
      	f2fs_file_write_iter+0x64/0x11c
      	new_sync_write+0xac/0x11c
      	vfs_write+0x144/0x1e4
      	SyS_write+0x60/0xc0
      
      Here we may trip the sit and ssa type checks during reset_curseg.  So
      check whether the segment is stale or not, and select a new victim to
      avoid this.
      Signed-off-by: Yunlei He <heyunlei@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      fe94793e
  5. Jul 22, 2016 (8 commits)
    • ovl: verify upper dentry in ovl_remove_and_whiteout() · cfc9fde0
      Committed by Maxim Patlasov
      The upper dentry may become stale before we call ovl_lock_rename_workdir.
      For example, someone could (mistakenly or maliciously) manually unlink(2)
      it directly from upperdir.
      
      To ensure it is not stale, let's look it up again after
      ovl_lock_rename_workdir and check that it matches the upper dentry.
      
      Essentially, it is the same problem and similar solution as in
      commit 11f37104 ("ovl: verify upper dentry before unlink and rename").
      Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Cc: <stable@vger.kernel.org>
      cfc9fde0
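
      A hedged sketch of the added verification (it assumes the rename lock taken by ovl_lock_rename_workdir is already held so the lookup is stable, and that ovl_dentry_upper() is the overlayfs helper returning the stashed upper dentry; the real patch's error handling differs in detail):

      	#include <linux/dcache.h>
      	#include <linux/err.h>
      	#include <linux/errno.h>
      	#include <linux/namei.h>

      	/* re-lookup the name under the locked upperdir and compare dentries */
      	static int ovl_check_upper_sketch(struct dentry *dentry,
      					  struct dentry *upperdir)
      	{
      		struct dentry *upper;
      		int err = 0;

      		upper = lookup_one_len(dentry->d_name.name, upperdir,
      				       dentry->d_name.len);
      		if (IS_ERR(upper))
      			return PTR_ERR(upper);

      		if (upper != ovl_dentry_upper(dentry))
      			err = -ESTALE;	/* unlinked or replaced behind our back */

      		dput(upper);
      		return err;
      	}
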
    • xfs: remove EXPERIMENTAL tag from sparse inode feature · 72ccbbe1
      Committed by Dave Chinner
      Been around for long enough now, hasn't caused any regression test
      failures in the past 3 months, so it's time to make it a fully
      supported feature.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      72ccbbe1
    • xfs: bufferhead chains are invalid after end_page_writeback · 28b783e4
      Committed by Dave Chinner
      In xfs_finish_page_writeback(), we have a loop that looks like this:
      
              do {
                      if (off < bvec->bv_offset)
                              goto next_bh;
                      if (off > end)
                              break;
                      bh->b_end_io(bh, !error);
      next_bh:
                      off += bh->b_size;
              } while ((bh = bh->b_this_page) != head);
      
      The b_end_io function is end_buffer_async_write(), which will call
      end_page_writeback() once all the buffers have been marked as no longer
      under IO.  The issue here is that the only thing currently
      protecting both the bufferhead chain and the page from being
      reclaimed is the PageWriteback state held on the page.
      
      While we attempt to limit the loop to just the buffers covered by
      the IO, we still read the buffer size and follow the next
      pointer in the bufferhead chain. There is no guarantee that either
      of these is valid after the PageWriteback flag has been cleared.
      Hence, loops like this are completely unsafe, and result in
      use-after-free issues. One such problem was caught by Calvin Owens
      with KASAN:
      
      .....
       INFO: Freed in 0x103fc80ec age=18446651500051355200 cpu=2165122683 pid=-1
        free_buffer_head+0x41/0x90
        __slab_free+0x1ed/0x340
        kmem_cache_free+0x270/0x300
        free_buffer_head+0x41/0x90
        try_to_free_buffers+0x171/0x240
        xfs_vm_releasepage+0xcb/0x3b0
        try_to_release_page+0x106/0x190
        shrink_page_list+0x118e/0x1a10
        shrink_inactive_list+0x42c/0xdf0
        shrink_zone_memcg+0xa09/0xfa0
        shrink_zone+0x2c3/0xbc0
      .....
       Call Trace:
        <IRQ>  [<ffffffff81e8b8e4>] dump_stack+0x68/0x94
        [<ffffffff8153a995>] print_trailer+0x115/0x1a0
        [<ffffffff81541174>] object_err+0x34/0x40
        [<ffffffff815436e7>] kasan_report_error+0x217/0x530
        [<ffffffff81543b33>] __asan_report_load8_noabort+0x43/0x50
        [<ffffffff819d651f>] xfs_destroy_ioend+0x3bf/0x4c0
        [<ffffffff819d69d4>] xfs_end_bio+0x154/0x220
        [<ffffffff81de0c58>] bio_endio+0x158/0x1b0
        [<ffffffff81dff61b>] blk_update_request+0x18b/0xb80
        [<ffffffff821baf57>] scsi_end_request+0x97/0x5a0
        [<ffffffff821c5558>] scsi_io_completion+0x438/0x1690
        [<ffffffff821a8d95>] scsi_finish_command+0x375/0x4e0
        [<ffffffff821c3940>] scsi_softirq_done+0x280/0x340
      
      
      Here the access occurs during IO completion, after the buffer has
      been freed by direct memory reclaim.
      
      Prevent use-after-free accidents in this end_io processing loop by
      pre-calculating the loop conditionals before calling bh->b_end_io().
      The loop is already limited to just the bufferheads covered by the
      IO in progress, so the offset checks are sufficient to prevent
      accessing buffers in the chain after end_page_writeback() has been
      called by the bh->b_end_io() callout.
      
      Yet another example of why Bufferheads Must Die.
      
      cc: <stable@vger.kernel.org> # 4.7
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reported-and-Tested-by: Calvin Owens <calvinowens@fb.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      
      28b783e4
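
      A hedged reconstruction of the fixed loop (based on the description above, not the literal diff): the chain pointer and the buffer size are captured before b_end_io() runs, because that call may drop the last writeback reference and free the whole bufferhead chain.

              struct buffer_head      *next;

              do {
                      next = bh->b_this_page;         /* read before b_end_io() */
                      if (off < bvec->bv_offset)
                              goto next_bh;
                      if (off > end)
                              break;
                      bh->b_end_io(bh, !error);
      next_bh:
                      off += bsize;   /* bsize captured from bh before the loop */
              } while ((bh = next) != head);
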
    • xfs: allocate log vector buffers outside CIL context lock · b1c5ebb2
      Committed by Dave Chinner
      One of the problems we currently have with delayed logging is that
      under serious memory pressure we can deadlock memory reclaim. This
      occurs when memory reclaim (such as that run by kswapd) is reclaiming XFS
      inodes and issues a log force to unpin inodes that are dirty in the
      CIL.
      
      The CIL is pushed, but this will only occur once it gets the CIL
      context lock to ensure that all committing transactions are complete
      and no new transactions start being committed to the CIL while the
      push switches to a new context.
      
      The deadlock occurs when the CIL context lock is held by a
      committing process that is doing memory allocation for log vector
      buffers, and that allocation is then blocked on memory reclaim
      making progress. Memory reclaim, however, is blocked waiting for
      a log force to make progress, and so we effectively deadlock at this
      point.
      
      To solve this problem, we have to move the CIL log vector buffer
      allocation outside of the context lock so that memory reclaim can
      always make progress when it needs to force the log. The problem
      with doing this is that a CIL push can take place while we are
      determining if we need to allocate a new log vector buffer for
      an item and hence the current log vector may go away without
      warning. That means we cannot rely on the existing log vector being
      present when we finally grab the context lock and so we must have a
      replacement buffer ready to go at all times.
      
      To ensure this, introduce a "shadow log vector" buffer that is
      always guaranteed to be present when we gain the CIL context lock
      and format the item. This shadow buffer may or may not be used
      during the formatting, but if the log item does not have an existing
      log vector buffer or that buffer is too small for the new
      modifications, we swap it for the new shadow buffer and format
      the modifications into that new log vector buffer.
      
      The result of this is that for any object we modify more than once
      in a given CIL checkpoint, we double the memory required
      to track dirty regions in the log. For a single modification we
      consume the shadow log vector we allocate on commit, and that gets
      consumed by the checkpoint. However, if we make multiple
      modifications, then the second transaction commit will allocate a
      shadow log vector and hence we will end up with double the memory
      usage, as only one of the log vectors is consumed by the CIL
      checkpoint. The remaining shadow vector will be freed when the log
      item is freed.
      
      This can probably be optimised in future - access to the shadow log
      vector is serialised by the object lock (as opposed to the active
      log vector, which is controlled by the CIL context lock) and so we
      can probably free the shadow log vector for some objects when the log
      item is marked clean on removal from the AIL.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      b1c5ebb2
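
      A hedged, conceptual sketch of the new ordering (the struct and helper names here are invented; the real code lives in fs/xfs/xfs_log_cil.c and is considerably more involved):

      	#include <linux/kernel.h>
      	#include <linux/rwsem.h>
      	#include <linux/slab.h>

      	struct log_item_sketch {
      		void	*lv;		/* currently formatted log vector      */
      		void	*lv_shadow;	/* spare buffer, ready before the lock */
      		size_t	 lv_size;
      	};

      	static int commit_item_sketch(struct log_item_sketch *item, size_t need,
      				      struct rw_semaphore *cil_ctx_lock)
      	{
      		/* allocate while no CIL lock is held: reclaim can force the log */
      		void *shadow = kmalloc(need, GFP_NOFS);

      		if (!shadow)
      			return -ENOMEM;
      		kfree(item->lv_shadow);
      		item->lv_shadow = shadow;

      		down_read(cil_ctx_lock);	/* excludes a concurrent CIL push */
      		if (!item->lv || item->lv_size < need) {
      			swap(item->lv, item->lv_shadow);	/* use the spare */
      			item->lv_size = need;
      		}
      		/* ... format the dirty regions into item->lv ... */
      		up_read(cil_ctx_lock);
      		return 0;
      	}

      The point is purely the ordering: the potentially blocking allocation happens before down_read(), so a log force issued from reclaim never waits on it.
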
    • libxfs: directory node splitting does not have an extra block · 160ae76f
      Committed by Dave Chinner
      xfsprogs source commit 4280e59dcbc4cd8e01585efe788a68eb378048e8
      
      xfs_da3_split() has to handle all three versions of the
      directory/attribute btree structure. The attr tree is v1, the dir
      tree is v2 or v3. The main difference between the v1 and v2/3 trees
      is the way tree nodes are split - in the v1 tree we can require a
      double split to occur because the object to be inserted may be
      larger than the space made by splitting a leaf. In this case we need
      to do a double split - one to split the full leaf, then another to
      allocate an empty leaf block in the correct location for the new
      entry.  This does not happen with dir (v2/v3) formats as the objects
      being inserted are always guaranteed to fit into the new space in
      the split blocks.
      
      Indeed, for directories there *may* be an extra block on this buffer
      pointer. However, it's guaranteed not to be a leaf block (i.e. a
      directory data block) - the directory code only ever places hash
      index or free space blocks in this pointer (as a cursor of
      sorts), and so to use it as a directory data block will immediately
      corrupt the directory.
      
      The problem is that the code assumes that there may be extra blocks
      that we need to link into the tree once we've split the root, but
      this is not true for either dir or attr trees, because the extra
      attr block is always consumed by the last node split before we split
      the root. Hence linking in an extra block is always wrong at the
      root split level, and this manifests itself in repair as a directory
      corruption in a repaired directory, leaving the directory rebuild
      incomplete.
      
      This is a dir v2 zero-day bug - it was in the initial dir v2 commit
      that was made back in February 1998.
      
      Fix this by ensuring the linking of the blocks after the root split
      never tries to make use of the extra blocks that may be held in the
      cursor. They are held there for other purposes and should never be
      touched by the root splitting code.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      
      160ae76f
    • xfs: remove dax code from object file when disabled · f021bd07
      Committed by Arnd Bergmann
      We check IS_DAX(inode) before calling either xfs_file_dax_read or
      xfs_file_dax_write, and this leads to the call being optimized out at
      compile time when CONFIG_FS_DAX is disabled.
      
      However, the two functions are marked STATIC, so they become global
      symbols when CONFIG_XFS_DEBUG is set, leaving us with two unused global
      functions that call into an undefined function and a broken "allmodconfig"
      build:
      
      fs/built-in.o: In function `xfs_file_dax_read':
      fs/xfs/xfs_file.c:348: undefined reference to `dax_do_io'
      fs/built-in.o: In function `xfs_file_dax_write':
      fs/xfs/xfs_file.c:758: undefined reference to `dax_do_io'
      
      Marking the two functions 'static noinline' instead of 'STATIC' will let
      the compiler drop the symbols when there are no callers but avoid the
      implicit inlining.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Fixes: 16d4d435 ("xfs: split direct I/O and DAX path")
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      f021bd07
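
      A hedged sketch of the change (the body is a placeholder; only the storage class matters - XFS's STATIC expands to a global, never-inlined symbol on CONFIG_XFS_DEBUG builds, while 'static noinline' stays out of line yet lets the compiler discard the function when IS_DAX() is compile-time false):

      	#include <linux/fs.h>
      	#include <linux/uio.h>

      	/* before: STATIC ssize_t xfs_file_dax_read(...) */
      	static noinline ssize_t
      	xfs_file_dax_read_sketch(
      		struct kiocb		*iocb,
      		struct iov_iter		*to)
      	{
      		/* real body unchanged; elided here */
      		return 0;
      	}
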
    • xfs: skip dirty pages in ->releasepage() · 99579cce
      Committed by Brian Foster
      XFS has had scattered reports of delalloc blocks present at
      ->releasepage() time. This results in a warning with a stack trace
      similar to the following:
      
       ...
       Call Trace:
        [<ffffffffa23c5b8f>] dump_stack+0x63/0x84
        [<ffffffffa20837a7>] warn_slowpath_common+0x97/0xe0
        [<ffffffffa208380a>] warn_slowpath_null+0x1a/0x20
        [<ffffffffa2326caf>] xfs_vm_releasepage+0x10f/0x140
        [<ffffffffa218c680>] ? page_mkclean_one+0xd0/0xd0
        [<ffffffffa218d3a0>] ? anon_vma_prepare+0x150/0x150
        [<ffffffffa21521c2>] try_to_release_page+0x32/0x50
        [<ffffffffa2166b2e>] shrink_active_list+0x3ce/0x3e0
        [<ffffffffa21671c7>] shrink_lruvec+0x687/0x7d0
        [<ffffffffa21673ec>] shrink_zone+0xdc/0x2c0
        [<ffffffffa2168539>] kswapd+0x4f9/0x970
        [<ffffffffa2168040>] ? mem_cgroup_shrink_node_zone+0x1a0/0x1a0
        [<ffffffffa20a0d99>] kthread+0xc9/0xe0
        [<ffffffffa20a0cd0>] ? kthread_stop+0x100/0x100
        [<ffffffffa26b404f>] ret_from_fork+0x3f/0x70
        [<ffffffffa20a0cd0>] ? kthread_stop+0x100/0x100
      
      This occurs because it is possible for shrink_active_list() to send
      pages marked dirty to ->releasepage() when certain buffer_head threshold
      conditions are met. shrink_active_list() doesn't check the page dirty
      state, apparently to handle an old ext3 corner case where clean pages
      would sometimes not have the dirty bit cleared; it is thus up to the
      filesystem to determine how to handle the page.
      
      XFS currently handles the delalloc case properly, but this behavior
      makes the warning spurious. Update the XFS ->releasepage() handler to
      explicitly skip dirty pages. Retain the existing delalloc/unwritten
      checks so we continue to warn if such buffers exist on clean pages when
      they shouldn't.
      Diagnosed-by: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      99579cce
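
      A hedged sketch of the updated handler shape (paraphrased from the description above; the retained delalloc/unwritten warnings are represented only by a comment):

      	#include <linux/buffer_head.h>
      	#include <linux/mm.h>
      	#include <linux/page-flags.h>

      	static int xfs_vm_releasepage_sketch(struct page *page, gfp_t gfp_mask)
      	{
      		if (PageDirty(page))
      			return 0;	/* shrink_active_list() can hand us these */

      		/* retained: WARN_ON_ONCE() if delalloc or unwritten buffers */
      		/* are found on a clean page, then proceed as before         */
      		return try_to_free_buffers(page);
      	}
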
    • GFS2: Fix gfs2_replay_incr_blk for multiple journal sizes · e1cb6be9
      Committed by Bob Peterson
      Before this patch, if you used gfs2_jadd to add new journals of a
      size smaller than the existing journals, replaying those new journals
      would cause the file system to withdraw. That's because function
      gfs2_replay_incr_blk was
      using the number of journal blocks (jd_block) from the superblock's
      journal pointer. In other words, "My journal's max size" rather than
      "the journal we're replaying's size." This patch changes the function
      to use the size of the pertinent journal rather than always the size
      of the journal we happen to be using.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      e1cb6be9
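
      A hedged sketch of the fix (the signature and the jd_blocks field are recalled from the GFS2 sources of that era and may differ in detail): the wrap point comes from the journal descriptor actually being replayed, not from the mounted node's own journal.

      	/* struct gfs2_jdesc comes from fs/gfs2/incore.h */
      	static void gfs2_replay_incr_blk_sketch(struct gfs2_jdesc *jd,
      						unsigned int *blk)
      	{
      		if (++*blk == jd->jd_blocks)
      			*blk = 0;
      	}
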
  6. Jul 21, 2016 (6 commits)
  7. Jul 20, 2016 (4 commits)