1. 28 Jan 2015, 1 commit
    • quota: Switch ->get_dqblk() and ->set_dqblk() to use bytes as space units · 14bf61ff
      Jan Kara committed
      Currently ->get_dqblk() and ->set_dqblk() use struct fs_disk_quota which
      tracks space limits and usage in 512-byte blocks. However VFS quotas
      track usage in bytes (as some filesystems require that) and we need to
      somehow pass this information. Up to now it wasn't a problem because we
      didn't do any unit conversion (thus VFS quota routines happily stuck the
      number of bytes into the d_bcount field of struct fs_disk_quota). Only if
      you tried to use Q_XGETQUOTA or Q_XSETQLIM for VFS quotas (or Q_GETQUOTA
      / Q_SETQUOTA for XFS quotas) did you get bogus results. Hardly anyone
      tried this, but reportedly some Samba users hit the problem in practice.
      So if we want the interfaces to be compatible, we need to fix this.
      
      We bite the bullet and define another quota structure used for passing
      information from/to ->get_dqblk()/->set_dqblk(). It's somewhat sad that we
      have to have more conversion routines in fs/quota/quota.c, and the extra
      copy of the quota structure slows down getting quota information by about
      2%, but it seems cleaner than overloading e.g. the units of d_bcount to bytes.
      
      CC: stable@vger.kernel.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
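A minimal C sketch of the unit problem described in this commit, assuming the 512-byte basic-block convention of struct fs_disk_quota (a shift of 9). The helper names `bb_to_bytes()` and `bytes_to_bb()` are hypothetical, not the kernel's; they only show why a byte count stored unconverted in a block-denominated field such as d_bcount reads back 512 times too large.

```c
#include <stdint.h>

/* Assumption: fs_disk_quota counts space in 512-byte basic blocks. */
#define BB_SHIFT 9

/* Basic blocks -> bytes (exact). Hypothetical helper name. */
static inline uint64_t bb_to_bytes(uint64_t blocks)
{
    return blocks << BB_SHIFT;
}

/* Bytes -> basic blocks, rounding up so a partly used block still counts.
 * Hypothetical helper name. */
static inline uint64_t bytes_to_bb(uint64_t bytes)
{
    return (bytes + (1ULL << BB_SHIFT) - 1) >> BB_SHIFT;
}
```

For example, if a VFS quota stored a usage of 4096 bytes directly in d_bcount, a caller that expects blocks would compute bb_to_bytes(4096), i.e. 2097152 bytes. Rounding up in bytes_to_bb() is one reasonable design choice for reporting usage in whole blocks.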
  2. 04 Dec 2014, 10 commits
  3. 01 Dec 2014, 4 commits
  4. 28 Nov 2014, 10 commits
  5. 10 Nov 2014, 1 commit
  6. 07 Nov 2014, 6 commits
    • xfs: track bulkstat progress by agino · 00275899
      Dave Chinner committed
      The bulkstat main loop progress is tracked by the "lastino"
      variable, which is a full 64-bit inode number. However, the loop actually
      works on agno/agino pairs, and so there's a significant disconnect
      between the rest of the loop and the main cursor. Convert this to
      use the agino, and pass the agino into the chunk formatting function
      and convert it too.
      
      This gets rid of the inconsistency in the loop processing, and
      finally makes it simple for us to skip inodes at any point in the
      loop simply by incrementing the agino cursor.
      
      cc: <stable@vger.kernel.org> # 3.17
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
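The agno/agino pairs mentioned above come from how XFS packs inode numbers: the AG number sits above an AG-relative inode number (agino) whose width is fixed per filesystem. A hypothetical sketch of that packing follows; the `agino_log` parameter stands in for the real superblock-derived width, and the helper names are illustrative. It shows why tracking the loop by agino is natural: skipping to the next inode is just agino + 1 within the same AG.

```c
#include <stdint.h>

/* Hypothetical helpers: pack/unpack an XFS-style 64-bit inode number.
 * agino_log is the number of bits used for the AG-relative part. */
static uint64_t agino_to_ino(uint32_t agno, uint32_t agino, unsigned agino_log)
{
    return ((uint64_t)agno << agino_log) | agino;
}

static uint32_t ino_to_agno(uint64_t ino, unsigned agino_log)
{
    return (uint32_t)(ino >> agino_log);
}

static uint32_t ino_to_agino(uint64_t ino, unsigned agino_log)
{
    return (uint32_t)(ino & ((1ULL << agino_log) - 1));
}
```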
    • xfs: bulkstat error handling is broken · febe3cbe
      Dave Chinner committed
      The error propagation is a horror - xfs_bulkstat() returns
      a rval variable which is only set if there are formatter errors. Any
      sort of btree walk error or corruption will cause the bulkstat walk
      to terminate but will not pass an error back to userspace. Worse
      is the fact that formatter errors will also be ignored if any inodes
      were correctly formatted into the user buffer.
      
      Hence bulkstat can fail badly yet still report success to userspace.
      This causes significant issues with xfsdump not dumping everything
      in the filesystem yet reporting success. It's not until a restore
      fails that there is any indication that the dump was bad and that
      bulkstat failed. This patch now triggers xfsdump to fail with
      bulkstat errors rather than silently missing files in the dump.
      
      This now causes bulkstat to fail when the lastino cookie does not
      fall inside an existing inode chunk. The pre-3.17 code tolerated
      that error by allowing the code to move to the next inode chunk
      as the agino target is guaranteed to fall into the next btree
      record.
      
      With the fixes up to this point in the series, xfsdump now passes on
      the troublesome filesystem image that exposes all these bugs.
      
      cc: <stable@vger.kernel.org>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
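A sketch of the error-propagation rule this fix adopts, under assumed names: `walk_items()` and the `fail_on_negative()` formatter are hypothetical, not xfs_bulkstat() itself. Any error ends the walk and is returned to the caller even if earlier items were already copied out. Partial progress is reported separately through `*done`, so a caller such as xfsdump can tell "some data, then failure" apart from "success".

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical formatter callback: returns 0 or a negative errno. */
typedef int (*formatter_fn)(int item, void *ctx);

/* Example formatter: treats negative items as unreadable. */
static int fail_on_negative(int item, void *ctx)
{
    (void)ctx;
    return item < 0 ? -EIO : 0;
}

/* Format items in order. The first error stops the walk and is returned
 * as-is, never downgraded to success because earlier items worked. */
static int walk_items(const int *items, size_t n, formatter_fn fmt,
                      void *ctx, size_t *done)
{
    *done = 0;
    for (size_t i = 0; i < n; i++) {
        int error = fmt(items[i], ctx);
        if (error)
            return error;
        (*done)++;
    }
    return 0;
}
```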
    • xfs: bulkstat main loop logic is a mess · 6e57c542
      Dave Chinner committed
      There are a bunch of variables that are more widely scoped than they
      need to be, obfuscated user buffer checks and tortured "next inode"
      tracking. This all needs cleaning up to expose the real issues that
      need fixing.
      
      cc: <stable@vger.kernel.org> # 3.17
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: bulkstat chunk-formatter has issues · 2b831ac6
      Dave Chinner committed
      The loop construct has issues:
      	- clustidx is completely unused, so remove it.
      	- the loop tries to be smart by terminating when the
      	  "freecount" tells it that all inodes are free. Just drop
      	  it as in most cases we have to scan all inodes in the
      	  chunk anyway.
      	- move the "user buffer left" condition check to the only
      	  point where we consume space in the user buffer.
      	- move the initialisation of agino out of the loop, leaving
      	  just a simple loop control logic using the clusteridx.
      
      Also, double handling of the user buffer variables leads to problems
      tracking the current state - use the cursor variables directly
      rather than keeping local copies and then having to update the
      cursor before returning.
      
      cc: <stable@vger.kernel.org> # 3.17
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
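The reworked chunk loop can be modelled roughly as follows. This is a simplified sketch, not the xfs_itable.c code: `struct buf_cursor`, `format_chunk()` and the free-inode bitmask (one bit per inode in a 64-inode chunk, set when the inode is free) are assumptions. It shows the points from the commit message: no early exit based on the free count, the buffer-space check placed right where a slot is consumed, and the cursor updated in place instead of through local copies. It also hands back the last inode processed through a separate output parameter, as bf4a5af2 later describes.

```c
#include <stddef.h>
#include <stdint.h>

#define INODES_PER_CHUNK 64   /* an XFS inode chunk holds 64 inodes */

/* Hypothetical cursor: the caller's user-buffer state, updated in place. */
struct buf_cursor {
    uint64_t *ubuffer;   /* next free slot in the user buffer */
    size_t    ubleft;    /* slots remaining */
    size_t    ubused;    /* slots consumed so far */
};

/* Format every allocated inode in one chunk. Returns how many were
 * formatted; stops early only when the user buffer is full. */
static int format_chunk(uint32_t start_agino, uint64_t freemask,
                        struct buf_cursor *bc, uint32_t *last_agino)
{
    uint32_t agino = start_agino;   /* initialised once, outside the loop */
    int formatted = 0;

    for (int idx = 0; idx < INODES_PER_CHUNK; idx++, agino++) {
        if (freemask & (1ULL << idx))
            continue;               /* free inode: nothing to report */
        if (bc->ubleft == 0)        /* space check where space is consumed */
            break;
        *bc->ubuffer++ = agino;
        bc->ubleft--;
        bc->ubused++;
        *last_agino = agino;        /* output only, separate from the cursor */
        formatted++;
    }
    return formatted;
}
```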
    • xfs: bulkstat chunk formatting cursor is broken · bf4a5af2
      Dave Chinner committed
      The xfs_bulkstat_agichunk formatting cursor takes buffer values from
      the main loop and passes them via the structure to the chunk
      formatter, and then writes the changed values back into the main loop
      local variables. Unfortunately, this complex dance is full of corner
      cases that aren't handled correctly.
      
      The biggest problem is that it is double handling the information in
      both the main loop and the chunk formatting function, leading to
      inconsistent updates and endless loops where progress is not made.
      
      To fix this, push the struct xfs_bulkstat_agichunk outwards to be
      the primary holder of user buffer information. This removes the
      double handling in the main loop.
      
      Also, pass the last inode processed by the chunk formatter as a
      separate parameter as it is purely an output variable and is not
      related to the user buffer consumption cursor.
      
      Finally, the chunk formatting code is not shared by anyone, so make
      it local to xfs_itable.c.
      
      cc: <stable@vger.kernel.org> # 3.17
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: bulkstat btree walk doesn't terminate · afa947cb
      Dave Chinner committed
      The bulkstat code has several different ways of detecting the end of
      an AG when doing a walk. They are not consistently detected, and the
      code that checks for the end of AG conditions is not consistently
      coded. Hence there are conditions where the walk code can get stuck in
      an endless loop making no progress and not triggering any
      termination conditions.
      
      Convert all the "tmp/i" status return codes from btree operations
      to a common name (stat) and apply end-of-ag detection to these
      operations consistently.
      
      cc: <stable@vger.kernel.org> # 3.17
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
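A hypothetical sketch of consistent end-of-AG handling: every btree step reports through a single `stat` flag (0 once the walk falls off the end of the AG), and the loop condition tests only that flag. `struct mock_cursor` and `mock_increment()` stand in for the real inode btree cursor and are assumptions; they only mimic the `int error` return plus `int *stat` output shape of XFS btree calls such as xfs_btree_increment().

```c
#include <stddef.h>

/* Hypothetical stand-in for an inode btree cursor over an array. */
struct mock_cursor {
    const unsigned *recs;
    size_t nrecs;
    size_t pos;
};

/* Advance one record; *stat = 0 means "no more records in this AG". */
static int mock_increment(struct mock_cursor *cur, int *stat)
{
    if (cur->pos + 1 >= cur->nrecs) {
        *stat = 0;
        return 0;
    }
    cur->pos++;
    *stat = 1;
    return 0;
}

/* Visit every record. One status variable, checked the same way after
 * every btree operation, guarantees termination at the end of the AG. */
static int walk_ag(struct mock_cursor *cur, size_t *visited)
{
    int error, stat = cur->nrecs > 0;

    *visited = 0;
    while (stat) {
        (*visited)++;
        error = mock_increment(cur, &stat);
        if (error)
            return error;
    }
    return 0;
}
```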
  7. 30 Oct 2014, 2 commits
    • xfs: rework zero range to prevent invalid i_size updates · 5d11fb4b
      Brian Foster committed
      The zero range operation is analogous to fallocate with the exception of
      converting the range to zeroes. E.g., it attempts to allocate zeroed
      blocks over the range specified by the caller. The XFS implementation
      kills all delalloc blocks currently over the aligned range, converts the
      range to allocated zero blocks (unwritten extents) and handles the
      partial pages at the ends of the range by sending writes through the
      pagecache.
      
      The current implementation suffers from several problems associated with
      inode size. If the aligned range covers an extending I/O, said I/O is
      discarded and an inode size update from a previous write never makes it
      to disk. Further, if an unaligned zero range extends beyond eof, the
      page write induced for the partial end page can itself increase the
      inode size, even if the zero range request is not supposed to update
      i_size (via KEEP_SIZE, similar to an fallocate beyond EOF).
      
      The latter behavior not only incorrectly increases the inode size, but
      can lead to stray delalloc blocks on the inode. Typically, post-eof
      preallocation blocks are either truncated on release or inode eviction
      or explicitly written to by xfs_zero_eof() on natural file size
      extension. If the inode size increases due to zero range, however,
      associated blocks leak into the address space having never been
      converted or mapped to pagecache pages. A direct I/O to such an
      uncovered range cannot convert the extent via writeback and will BUG().
      For example:
      
      $ xfs_io -fc "pwrite 0 128k" -c "fzero -k 1m 54321" <file>
      ...
      $ xfs_io -d -c "pread 128k 128k" <file>
      <BUG>
      
      If the entire delalloc extent happens to not have page coverage
      whatsoever (e.g., delalloc conversion couldn't find a large enough free
      space extent), even a full file writeback won't convert what's left of
      the extent and we'll assert on inode eviction.
      
      Rework xfs_zero_file_space() to avoid buffered I/O for partial pages.
      Use the existing hole punch and prealloc mechanisms as primitives for
      zero range. This implementation is neither efficient nor ideal, as we
      write back dirty data over the range and remove existing extents rather
      than convert them to unwritten. The former writeback, however, is currently
      the only mechanism available to ensure consistency between pagecache and
      extent state. Even a pagecache truncate/delalloc punch prior to hole
      punch has led to inconsistencies due to racing with writeback.
      
      This provides a consistent, correct implementation of zero range that
      survives fsstress/fsx testing without assert failures. The
      implementation can be optimized from this point forward once the
      fundamental issue of pagecache and delalloc extent state consistency is
      addressed.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      
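From userspace, the `fzero -k` command in the reproducer above corresponds to fallocate(2) with FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE. A minimal sketch, assuming a Linux system whose headers define FALLOC_FL_ZERO_RANGE; the wrapper name `zero_range_keep_size()` is illustrative. With KEEP_SIZE, zeroing past EOF must leave i_size unchanged, which is the property the reworked kernel code is meant to preserve.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Zero [off, off + len) without changing the file size.
 * Returns 0 on success or a negative errno on failure. */
static int zero_range_keep_size(int fd, off_t off, off_t len)
{
    if (fallocate(fd, FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE, off, len) == 0)
        return 0;
    return -errno;
}
```

On filesystems without zero-range support the call fails with EOPNOTSUPP, which the wrapper passes back as a negative errno; either way the file size should not change.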
    • xfs: Check error during inode btree iteration in xfs_bulkstat() · 7a19dee1
      Jan Kara committed
      xfs_bulkstat() doesn't check error return from xfs_btree_increment(). In
      case of specific fs corruption that could result in xfs_bulkstat()
      entering an infinite loop because we would be looping over the same
      chunk over and over again. Fix the problem by checking the return value
      and terminating the loop properly.
      
      Coverity-id: 1231338
      cc: <stable@vger.kernel.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Jie Liu <jeff.u.liu@gmail.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
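A hypothetical illustration of why the missing error check could loop forever: if an increment fails, it leaves the cursor where it was, so a loop that ignores the error keeps revisiting the same record. In this sketch `inc_or_fail()` simulates corruption at a chosen record index and returns -EIO; the names are illustrative. Checking the return value is what ends the walk.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical increment: fails without moving the cursor when the next
 * record is "corrupt"; otherwise sets *stat = 0 at the end of the records. */
static int inc_or_fail(size_t *pos, size_t nrecs, size_t corrupt_at, int *stat)
{
    if (*pos + 1 == corrupt_at)
        return -EIO;
    if (*pos + 1 >= nrecs) {
        *stat = 0;
        return 0;
    }
    (*pos)++;
    *stat = 1;
    return 0;
}

static int visit_all(size_t nrecs, size_t corrupt_at, size_t *visited)
{
    size_t pos = 0;
    int stat = nrecs > 0;

    *visited = 0;
    while (stat) {
        (*visited)++;
        int error = inc_or_fail(&pos, nrecs, corrupt_at, &stat);
        if (error)
            return error;   /* ignoring this would revisit pos forever */
    }
    return 0;
}
```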
  8. 29 Oct 2014, 1 commit
  9. 13 Oct 2014, 1 commit
    • xfs: fix agno increment in xfs_inumbers() loop · a8b1ee8b
      Eric Sandeen committed
      caused a regression in xfs_inumbers, which in turn broke
      xfsdump, causing incomplete dumps.
      
      The loop in xfs_inumbers() needs to fill the user-supplied
      buffers, and iterates via xfs_btree_increment, reading new
      ags as needed.
      
      But the first time through the loop, if xfs_btree_increment()
      succeeds, we continue, which triggers the ++agno at the bottom
      of the loop, and we skip too soon to the next ag - without
      the proper setup under next_ag to read the next ag.
      
      Fix this by removing the agno increment from the loop conditional,
      and only increment agno if we have actually hit the code under
      the next_ag: target.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
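The corrected loop shape can be sketched as a simplified model: `walk_all_ags()` and the per-AG record counts are assumptions, not the real xfs_inumbers() code. The AG number advances only on the path that has finished with the current AG (the equivalent of the next_ag: target), so a successful step within an AG can never skip ahead without the next-AG setup.

```c
#include <stddef.h>

/* Hypothetical model: recs_per_ag[a] records live in AG a. Each output
 * entry encodes agno * 1000 + record index. Fills out[] until it is full
 * or every AG has been walked; returns the number of entries written. */
static size_t walk_all_ags(const size_t *recs_per_ag, size_t agcount,
                           unsigned *out, size_t outlen)
{
    size_t agno = 0, rec = 0, n = 0;

    while (agno < agcount && n < outlen) {
        if (rec < recs_per_ag[agno]) {
            out[n++] = (unsigned)(agno * 1000 + rec);
            rec++;
            continue;       /* stay in this AG: no agno increment here */
        }
        /* next_ag: this AG is exhausted, so set up the next one */
        agno++;
        rec = 0;
    }
    return n;
}
```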
  10. 03 Oct 2014, 1 commit
    • xfs: xfs_iflush_done checks the wrong log item callback · 52177937
      Mark Tinguely committed
      Commit 30136832 ("xfs: remove all the inodes on a buffer from the AIL
      in bulk") made the xfs inode flush callback more efficient by
      combining all the inode writes on the buffer and the deletions of
      the inode log item from AIL.
      
      The initial loop in this patch should be looping through all
      the log items on the buffer to see which items have
      xfs_iflush_done as their callback function. But currently,
      only the log item passed to the function has its callback
      compared to xfs_iflush_done. If the log item pointer passed to
      the function does have the xfs_iflush_done callback function,
      then all the log items on the buffer are removed from the
      li_bio_list on the buffer b_fspriv and could be removed from
      the AIL even though they may have not been written yet.
      
      This problem is masked by the fact that currently all inodes on a
      buffer will have the same callback function - either xfs_iflush_done
      or xfs_istale_done - and hence the bug cannot manifest in any way.
      Still, we need to remove the landmine so that if we add new
      callbacks in future this doesn't cause us problems.
      Signed-off-by: Mark Tinguely <tinguely@sgi.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
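A simplified model of the corrected check, with hypothetical types: `struct log_item`, `collect_flushed()` and the two callback stubs only mimic the shape of a buffer's log item list and are not the kernel definitions. The point is that each iteration compares the callback of the item being visited, not that of the item passed into the function.

```c
#include <stddef.h>

typedef void (*li_cb_t)(void *);

/* Hypothetical log item on a buffer's item list. */
struct log_item {
    li_cb_t li_cb;
    struct log_item *li_next;
    int removed;
};

static void iflush_done(void *p) { (void)p; }
static void istale_done(void *p) { (void)p; }

/* Mark only items whose own callback is iflush_done. */
static int collect_flushed(struct log_item *head)
{
    int count = 0;

    for (struct log_item *blip = head; blip; blip = blip->li_next) {
        if (blip->li_cb != iflush_done)   /* test blip, not the caller's item */
            continue;
        blip->removed = 1;
        count++;
    }
    return count;
}
```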
  11. 02 Oct 2014, 3 commits
    • xfs: flush the range before zero range conversion · da5f1096
      Brian Foster committed
      XFS currently discards delalloc blocks within the target range of a
      zero range request. Unaligned start and end offsets are zeroed
      through the page cache and the internal, aligned blocks are
      converted to unwritten extents.
      
      If EOF is page aligned and covered by a delayed allocation extent,
      the inode size is not updated until I/O completion. If a zero range
      request discards a delalloc range that covers a page-aligned EOF in
      this way, the inode size update never occurs. For example:
      
      $ rm -f /mnt/file
      $ xfs_io -fc "pwrite 0 64k" -c "zero 60k 4k" /mnt/file
      $ stat -c "%s" /mnt/file
      65536
      $ umount /mnt
      $ mount <dev> /mnt
      $ stat -c "%s" /mnt/file
      61440
      
      Update xfs_zero_file_space() to flush the range rather than discard
      delalloc blocks to ensure that inode size updates occur
      appropriately.
      
      [dchinner: Note that this is really a workaround to avoid the
      underlying problems. More work is needed (and ongoing) to fix those
      issues so this fix is being added as a temporary stop-gap measure. ]
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: restore buffer_head unwritten bit on ioend cancel · 07d08681
      Brian Foster committed
      xfs_vm_writepage() walks each buffer_head on the page, maps to the block
      on disk and attaches to a running ioend structure that represents the
      I/O submission. A new ioend is created when the type of I/O (unwritten,
      delayed allocation or overwrite) required for a particular buffer_head
      differs from the previous. If a buffer_head is a delalloc or unwritten
      buffer, the associated bits are cleared by xfs_map_at_offset() once the
      buffer_head is added to the ioend.
      
      The process of mapping each buffer_head occurs in xfs_map_blocks() and
      acquires the ilock in blocking or non-blocking mode, depending on the
      type of writeback in progress. If the lock cannot be acquired for
      non-blocking writeback, we cancel the ioend, redirty the page and
      return. Writeback will revisit the page at some later point.
      
      Note that we acquire the ilock for each buffer on the page. Therefore
      during non-blocking writeback, it is possible to add an unwritten buffer
      to the ioend, clear the unwritten state, fail to acquire the ilock when
      mapping a subsequent buffer and cancel the ioend. If this occurs, the
      unwritten status of the buffer sitting in the ioend has been lost. The
      page will eventually hit writeback again, but xfs_vm_writepage() submits
      overwrite I/O instead of unwritten I/O and does not perform unwritten
      extent conversion at I/O completion. This leads to data corruption
      because unwritten extents are treated as holes on reads and zeroes are
      returned instead of reading from disk.
      
      Modify xfs_cancel_ioend() to restore the buffer unwritten bit for ioends
      of type XFS_IO_UNWRITTEN. This ensures that unwritten extent conversion
      occurs once the page is eventually written back.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
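A hypothetical model of the fix: adding a buffer to an ioend clears its unwritten state, and cancelling an ioend of the unwritten type puts the bit back so a later writeback still performs unwritten extent conversion. The types `struct bh`, `struct ioend` and the helpers `ioend_add()` / `ioend_cancel()` are stand-ins, not the XFS definitions.

```c
#include <stdbool.h>
#include <stddef.h>

enum io_type { IO_DELALLOC, IO_UNWRITTEN, IO_OVERWRITE };

/* Hypothetical buffer_head stand-in. */
struct bh {
    bool unwritten;
    struct bh *next_in_ioend;
};

struct ioend {
    enum io_type type;
    struct bh *head;
};

/* Attaching a buffer to the ioend clears its unwritten state. */
static void ioend_add(struct ioend *io, struct bh *bh)
{
    bh->unwritten = false;
    bh->next_in_ioend = io->head;
    io->head = bh;
}

/* Cancel: detach every buffer and, for unwritten ioends, restore the bit. */
static void ioend_cancel(struct ioend *io)
{
    struct bh *bh = io->head;

    while (bh) {
        struct bh *next = bh->next_in_ioend;
        if (io->type == IO_UNWRITTEN)
            bh->unwritten = true;
        bh->next_in_ioend = NULL;
        bh = next;
    }
    io->head = NULL;
}
```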
    • xfs: check for null dquot in xfs_quota_calc_throttle() · 5cca3f61
      Eric Sandeen committed
      Coverity spotted this.
      
      Granted, we *just* checked xfs_inode_dquot() in the caller (by
      calling xfs_quota_need_throttle()). However, this is the only place we
      don't check the return value, and the check is cheap and future-proof,
      so add it.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>