1. 20 11月, 2014 12 次提交
  2. 14 11月, 2014 2 次提交
  3. 13 11月, 2014 11 次提交
  4. 07 11月, 2014 7 次提交
    • D
      xfs: track bulkstat progress by agino · 00275899
      Dave Chinner 提交于
      The bulkstat main loop progress is tracked by the "lastino"
      variable, which is a full 64 bit inode. However, the loop actually
      works on agno/agino pairs, and so there's a significant disconnect
      between the rest of the loop and the main cursor. Convert this to
      use the agino, and pass the agino into the chunk formatting function
      and convert it too.
      
      This gets rid of the inconsistency in the loop processing, and
      finally makes it simple for us to skip inodes at any point in the
      loop simply by incrementing the agino cursor.
      
      cc: <stable@vger.kernel.org> # 3.17
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      00275899
    • D
      xfs: bulkstat error handling is broken · febe3cbe
      Dave Chinner 提交于
      The error propagation is a horror - xfs_bulkstat() returns
      a rval variable which is only set if there are formatter errors. Any
      sort of btree walk error or corruption will cause the bulkstat walk
      to terminate but will not pass an error back to userspace. Worse
      is the fact that formatter errors will also be ignored if any inodes
      were correctly formatted into the user buffer.
      
      Hence bulkstat can fail badly yet still report success to userspace.
      This causes significant issues with xfsdump not dumping everything
      in the filesystem yet reporting success. It's not until a restore
      fails that there is any indication that the dump was bad and tha
      bulkstat failed. This patch now triggers xfsdump to fail with
      bulkstat errors rather than silently missing files in the dump.
      
      This now causes bulkstat to fail when the lastino cookie does not
      fall inside an existing inode chunk. The pre-3.17 code tolerated
      that error by allowing the code to move to the next inode chunk
      as the agino target is guaranteed to fall into the next btree
      record.
      
      With the fixes up to this point in the series, xfsdump now passes on
      the troublesome filesystem image that exposes all these bugs.
      
      cc: <stable@vger.kernel.org>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      febe3cbe
    • D
      xfs: bulkstat main loop logic is a mess · 6e57c542
      Dave Chinner 提交于
      There are a bunch of variables tha tare more wildy scoped than they
      need to be, obfuscated user buffer checks and tortured "next inode"
      tracking. This all needs cleaning up to expose the real issues that
      need fixing.
      
      cc: <stable@vger.kernel.org> # 3.17
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      6e57c542
    • D
      xfs: bulkstat chunk-formatter has issues · 2b831ac6
      Dave Chinner 提交于
      The loop construct has issues:
      	- clustidx is completely unused, so remove it.
      	- the loop tries to be smart by terminating when the
      	  "freecount" tells it that all inodes are free. Just drop
      	  it as in most cases we have to scan all inodes in the
      	  chunk anyway.
      	- move the "user buffer left" condition check to the only
      	  point where we consume space int eh user buffer.
      	- move the initialisation of agino out of the loop, leaving
      	  just a simple loop control logic using the clusteridx.
      
      Also, double handling of the user buffer variables leads to problems
      tracking the current state - use the cursor variables directly
      rather than keeping local copies and then having to update the
      cursor before returning.
      
      cc: <stable@vger.kernel.org> # 3.17
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      2b831ac6
    • D
      xfs: bulkstat chunk formatting cursor is broken · bf4a5af2
      Dave Chinner 提交于
      The xfs_bulkstat_agichunk formatting cursor takes buffer values from
      the main loop and passes them via the structure to the chunk
      formatter, and the writes the changed values back into the main loop
      local variables. Unfortunately, this complex dance is full of corner
      cases that aren't handled correctly.
      
      The biggest problem is that it is double handling the information in
      both the main loop and the chunk formatting function, leading to
      inconsistent updates and endless loops where progress is not made.
      
      To fix this, push the struct xfs_bulkstat_agichunk outwards to be
      the primary holder of user buffer information. this removes the
      double handling in the main loop.
      
      Also, pass the last inode processed by the chunk formatter as a
      separate parameter as it purely an output variable and is not
      related to the user buffer consumption cursor.
      
      Finally, the chunk formatting code is not shared by anyone, so make
      it local to xfs_itable.c.
      
      cc: <stable@vger.kernel.org> # 3.17
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      bf4a5af2
    • D
      xfs: bulkstat btree walk doesn't terminate · afa947cb
      Dave Chinner 提交于
      The bulkstat code has several different ways of detecting the end of
      an AG when doing a walk. They are not consistently detected, and the
      code that checks for the end of AG conditions is not consistently
      coded. Hence the are conditions where the walk code can get stuck in
      an endless loop making no progress and not triggering any
      termination conditions.
      
      Convert all the "tmp/i" status return codes from btree operations
      to a common name (stat) and apply end-of-ag detection to these
      operations consistently.
      
      cc: <stable@vger.kernel.org> # 3.17
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      afa947cb
    • G
      aio: fix uncorrent dirty pages accouting when truncating AIO ring buffer · 835f252c
      Gu Zheng 提交于
      https://bugzilla.kernel.org/show_bug.cgi?id=86831
      
      Markus reported that when shutting down mysqld (with AIO support,
      on a ext3 formatted Harddrive) leads to a negative number of dirty pages
      (underrun to the counter). The negative number results in a drastic reduction
      of the write performance because the page cache is not used, because the kernel
      thinks it is still 2 ^ 32 dirty pages open.
      
      Add a warn trace in __dec_zone_state will catch this easily:
      
      static inline void __dec_zone_state(struct zone *zone, enum
      	zone_stat_item item)
      {
           atomic_long_dec(&zone->vm_stat[item]);
      +    WARN_ON_ONCE(item == NR_FILE_DIRTY &&
      	atomic_long_read(&zone->vm_stat[item]) < 0);
           atomic_long_dec(&vm_stat[item]);
      }
      
      [   21.341632] ------------[ cut here ]------------
      [   21.346294] WARNING: CPU: 0 PID: 309 at include/linux/vmstat.h:242
      cancel_dirty_page+0x164/0x224()
      [   21.355296] Modules linked in: wutbox_cp sata_mv
      [   21.359968] CPU: 0 PID: 309 Comm: kworker/0:1 Not tainted 3.14.21-WuT #80
      [   21.366793] Workqueue: events free_ioctx
      [   21.370760] [<c0016a64>] (unwind_backtrace) from [<c0012f88>]
      (show_stack+0x20/0x24)
      [   21.378562] [<c0012f88>] (show_stack) from [<c03f8ccc>]
      (dump_stack+0x24/0x28)
      [   21.385840] [<c03f8ccc>] (dump_stack) from [<c0023ae4>]
      (warn_slowpath_common+0x84/0x9c)
      [   21.393976] [<c0023ae4>] (warn_slowpath_common) from [<c0023bb8>]
      (warn_slowpath_null+0x2c/0x34)
      [   21.402800] [<c0023bb8>] (warn_slowpath_null) from [<c00c0688>]
      (cancel_dirty_page+0x164/0x224)
      [   21.411524] [<c00c0688>] (cancel_dirty_page) from [<c00c080c>]
      (truncate_inode_page+0x8c/0x158)
      [   21.420272] [<c00c080c>] (truncate_inode_page) from [<c00c0a94>]
      (truncate_inode_pages_range+0x11c/0x53c)
      [   21.429890] [<c00c0a94>] (truncate_inode_pages_range) from
      [<c00c0f6c>] (truncate_pagecache+0x88/0xac)
      [   21.439252] [<c00c0f6c>] (truncate_pagecache) from [<c00c0fec>]
      (truncate_setsize+0x5c/0x74)
      [   21.447731] [<c00c0fec>] (truncate_setsize) from [<c013b3a8>]
      (put_aio_ring_file.isra.14+0x34/0x90)
      [   21.456826] [<c013b3a8>] (put_aio_ring_file.isra.14) from
      [<c013b424>] (aio_free_ring+0x20/0xcc)
      [   21.465660] [<c013b424>] (aio_free_ring) from [<c013b4f4>]
      (free_ioctx+0x24/0x44)
      [   21.473190] [<c013b4f4>] (free_ioctx) from [<c003d8d8>]
      (process_one_work+0x134/0x47c)
      [   21.481132] [<c003d8d8>] (process_one_work) from [<c003e988>]
      (worker_thread+0x130/0x414)
      [   21.489350] [<c003e988>] (worker_thread) from [<c00448ac>]
      (kthread+0xd4/0xec)
      [   21.496621] [<c00448ac>] (kthread) from [<c000ec18>]
      (ret_from_fork+0x14/0x20)
      [   21.503884] ---[ end trace 79c4bf42c038c9a1 ]---
      
      The cause is that we set the aio ring file pages as *DIRTY* via SetPageDirty
      (bypasses the VFS dirty pages increment) when init, and aio fs uses
      *default_backing_dev_info* as the backing dev, which does not disable
      the dirty pages accounting capability.
      So truncating aio ring file will contribute to accounting dirty pages (VFS
      dirty pages decrement), then error occurs.
      
      The original goal is keeping these pages in memory (can not be reclaimed
      or swapped) in life-time via marking it dirty. But thinking more, we have
      already pinned pages via elevating the page's refcount, which can already
      achieve the goal, so the SetPageDirty seems unnecessary.
      
      In order to fix the issue, using the __set_page_dirty_no_writeback instead
      of the nop .set_page_dirty, and dropped the SetPageDirty (don't manually
      set the dirty flags, don't disable set_page_dirty(), rely on default behaviour).
      
      With the above change, the dirty pages accounting can work well. But as we
      known, aio fs is an anonymous one, which should never cause any real write-back,
      we can ignore the dirty pages (write back) accounting by disabling the dirty
      pages (write back) accounting capability. So we introduce an aio private
      backing dev info (disabled the ACCT_DIRTY/WRITEBACK/ACCT_WB capabilities) to
      replace the default one.
      Reported-by: NMarkus Königshaus <m.koenigshaus@wut.de>
      Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com>
      Cc: stable <stable@vger.kernel.org>
      Acked-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>
      835f252c
  5. 06 11月, 2014 1 次提交
  6. 05 11月, 2014 4 次提交
  7. 04 11月, 2014 1 次提交
  8. 01 11月, 2014 1 次提交
  9. 31 10月, 2014 1 次提交
    • D
      Return short read or 0 at end of a raw device, not EIO · b2de525f
      David Jeffery 提交于
      Author: David Jeffery <djeffery@redhat.com>
      Changes to the basic direct I/O code have broken the raw driver when reading
      to the end of a raw device.  Instead of returning a short read for a read that
      extends partially beyond the device's end or 0 when at the end of the device,
      these reads now return EIO.
      
      The raw driver needs the same end of device handling as was added for normal
      block devices.  Using blkdev_read_iter, which has the needed size checks,
      prevents the EIO conditions at the end of the device.
      Signed-off-by: NDavid Jeffery <djeffery@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      b2de525f