1. 02 9月, 2017 2 次提交
  2. 21 6月, 2017 1 次提交
  3. 20 6月, 2017 1 次提交
    • D
      xfs: remove double-underscore integer types · c8ce540d
      Darrick J. Wong 提交于
      This is a purely mechanical patch that removes the private
      __{u,}int{8,16,32,64}_t typedefs in favor of using the system
      {u,}int{8,16,32,64}_t typedefs.  This is the sed script used to perform
      the transformation and fix the resulting whitespace and indentation
      errors:
      
      s/typedef\t__uint8_t/typedef __uint8_t\t/g
      s/typedef\t__uint/typedef __uint/g
      s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g
      s/__uint8_t\t/__uint8_t\t\t/g
      s/__uint/uint/g
      s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g
      s/__int/int/g
      /^typedef.*int[0-9]*_t;$/d
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      c8ce540d
  4. 03 8月, 2016 3 次提交
  5. 11 1月, 2016 1 次提交
    • E
      xfs: eliminate committed arg from xfs_bmap_finish · f6106efa
      Eric Sandeen 提交于
      Calls to xfs_bmap_finish() and xfs_trans_ijoin(), and the
      associated comments were replicated several times across
      the attribute code, all dealing with what to do if the
      transaction was or wasn't committed.
      
      And in that replicated code, an ASSERT() test of an
      uninitialized variable occurs in several locations:
      
      	error = xfs_attr_thing(&args);
      	if (!error) {
      		error = xfs_bmap_finish(&args.trans, args.flist,
      					&committed);
      	}
      	if (error) {
      		ASSERT(committed);
      
      If the first xfs_attr_thing() failed, we'd skip the xfs_bmap_finish,
      never set "committed", and then test it in the ASSERT.
      
      Fix this up by moving the committed state internal to xfs_bmap_finish,
      and add a new inode argument.  If an inode is passed in, it is passed
      through to __xfs_trans_roll() and joined to the transaction there if
      the transaction was committed.
      
      xfs_qm_dqalloc() was a little unique in that it called bjoin rather
      than ijoin, but as Dave points out we can detect the committed state
      but checking whether (*tpp != tp).
      
      Addresses-Coverity-Id: 102360
      Addresses-Coverity-Id: 102361
      Addresses-Coverity-Id: 102363
      Addresses-Coverity-Id: 102364
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      f6106efa
  6. 04 1月, 2016 1 次提交
  7. 12 10月, 2015 1 次提交
  8. 29 7月, 2015 4 次提交
    • E
      xfs: create new metadata UUID field and incompat flag · ce748eaa
      Eric Sandeen 提交于
      This adds a new superblock field, sb_meta_uuid.  If set, along with
      a new incompat flag, the code will use that field on a V5 filesystem
      to compare to metadata UUIDs, which allows us to change the user-
      visible UUID at will.  Userspace handles the setting and clearing
      of the incompat flag as appropriate, as the UUID gets changed; i.e.
      setting the user-visible UUID back to the original UUID (as stored in
      the new field) will remove the incompatible feature flag.
      
      If the incompat flag is not set, this copies the user-visible UUID into
      into the meta_uuid slot in memory when the superblock is read from disk;
      the meta_uuid field is not written back to disk in this case.
      
      The remainder of this patch simply switches verifiers, initializers,
      etc to use the new sb_meta_uuid field.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      ce748eaa
    • D
      xfs: xfs_bunmapi() does not need XFS_BMAPI_METADATA flag · ab7bb610
      Dave Chinner 提交于
      xfs_bunmapi() doesn't care what type of extent is being freed and
      does not look at the XFS_BMAPI_METADATA flag at all. As such we can
      remove the XFS_BMAPI_METADATA from all callers that use it.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      ab7bb610
    • D
      xfs: remote attributes need to be considered data · df150ed1
      Dave Chinner 提交于
      We don't log remote attribute contents, and instead write them
      synchronously before we commit the block allocation and attribute
      tree update transaction. As a result we are writing to the allocated
      space before the allcoation has been made permanent.
      
      As a result, we cannot consider this allocation to be a metadata
      allocation. Metadata allocation can take blocks from the free list
      and so reuse them before the transaction that freed the block is
      committed to disk. This behaviour is perfectly fine for journalled
      metadata changes as log recovery will ensure the free operation is
      replayed before the overwrite, but for remote attribute writes this
      is not the case.
      
      Hence we have to consider the remote attribute blocks to contain
      data and allocate accordingly. We do this by dropping the
      XFS_BMAPI_METADATA flag from the block allocation. This means the
      allocation will not use blocks that are on the busy list without
      first ensuring that the freeing transaction has been committed to
      disk and the blocks removed from the busy list. This ensures we will
      never overwrite a freed block without first ensuring that it is
      really free.
      
      cc: <stable@vger.kernel.org>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      df150ed1
    • D
      xfs: remote attribute headers contain an invalid LSN · e3c32ee9
      Dave Chinner 提交于
      In recent testing, a system that crashed failed log recovery on
      restart with a bad symlink buffer magic number:
      
      XFS (vda): Starting recovery (logdev: internal)
      XFS (vda): Bad symlink block magic!
      XFS: Assertion failed: 0, file: fs/xfs/xfs_log_recover.c, line: 2060
      
      On examination of the log via xfs_logprint, none of the symlink
      buffers in the log had a bad magic number, nor were any other types
      of buffer log format headers mis-identified as symlink buffers.
      Tracing was used to find the buffer the kernel was tripping over,
      and xfs_db identified it's contents as:
      
      000: 5841524d 00000000 00000346 64d82b48 8983e692 d71e4680 a5f49e2c b317576e
      020: 00000000 00602038 00000000 006034ce d0020000 00000000 4d4d4d4d 4d4d4d4d
      040: 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d
      060: 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d
      .....
      
      This is a remote attribute buffer, which are notable in that they
      are not logged but are instead written synchronously by the remote
      attribute code so that they exist on disk before the attribute
      transactions are committed to the journal.
      
      The above remote attribute block has an invalid LSN in it - cycle
      0xd002000, block 0 - which means when log recovery comes along to
      determine if the transaction that writes to the underlying block
      should be replayed, it sees a block that has a future LSN and so
      does not replay the buffer data in the transaction. Instead, it
      validates the buffer magic number and attaches the buffer verifier
      to it.  It is this buffer magic number check that is failing in the
      above assert, indicating that we skipped replay due to the LSN of
      the underlying buffer.
      
      The problem here is that the remote attribute buffers cannot have a
      valid LSN placed into them, because the transaction that contains 
      the attribute tree pointer changes and the block allocation that the
      attribute data is being written to hasn't yet been committed. Hence
      the LSN field in the attribute block is completely unwritten,
      thereby leaving the underlying contents of the block in the LSN
      field. It could have any value, and hence a future overwrite of the
      block by log recovery may or may not work correctly.
      
      Fix this by always writing an invalid LSN to the remote attribute
      block, as any buffer in log recovery that needs to write over the
      remote attribute should occur. We are protected from having old data
      written over the attribute by the fact that freeing the block before
      the remote attribute is written will result in the buffer being
      marked stale in the log and so all changes prior to the buffer stale
      transaction will be cancelled by log recovery.
      
      Hence it is safe to ignore the LSN in the case or synchronously
      written, unlogged metadata such as remote attribute blocks, and to
      ensure we do that correctly, we need to write an invalid LSN to all
      remote attribute blocks to trigger immediate recovery of metadata
      that is written over the top.
      
      As a further protection for filesystems that may already have remote
      attribute blocks with bad LSNs on disk, change the log recovery code
      to always trigger immediate recovery of metadata over remote
      attribute blocks.
      
      cc: <stable@vger.kernel.org>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      e3c32ee9
  9. 28 11月, 2014 2 次提交
  10. 25 6月, 2014 2 次提交
  11. 22 6月, 2014 1 次提交
  12. 06 6月, 2014 1 次提交
  13. 06 5月, 2014 1 次提交
    • D
      xfs: remote attribute overwrite causes transaction overrun · 8275cdd0
      Dave Chinner 提交于
      Commit e461fcb1 ("xfs: remote attribute lookups require the value
      length") passes the remote attribute length in the xfs_da_args
      structure on lookup so that CRC calculations and validity checking
      can be performed correctly by related code. This, unfortunately has
      the side effect of changing the args->valuelen parameter in cases
      where it shouldn't.
      
      That is, when we replace a remote attribute, the incoming
      replacement stores the value and length in args->value and
      args->valuelen, but then the lookup which finds the existing remote
      attribute overwrites args->valuelen with the length of the remote
      attribute being replaced. Hence when we go to create the new
      attribute, we create it of the size of the existing remote
      attribute, not the size it is supposed to be. When the new attribute
      is much smaller than the old attribute, this results in a
      transaction overrun and an ASSERT() failure on a debug kernel:
      
      XFS: Assertion failed: tp->t_blk_res_used <= tp->t_blk_res, file: fs/xfs/xfs_trans.c, line: 331
      
      Fix this by keeping the remote attribute value length separate to
      the attribute value length in the xfs_da_args structure. The enables
      us to pass the length of the remote attribute to be removed without
      overwriting the new attribute's length.
      
      Also, ensure that when we save remote block contexts for a later
      rename we zero the original state variables so that we don't confuse
      the state of the attribute to be removes with the state of the new
      attribute that we just added. [Spotted by Brain Foster.]
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      8275cdd0
  14. 14 4月, 2014 1 次提交
  15. 27 2月, 2014 1 次提交
  16. 11 1月, 2014 1 次提交
    • J
      xfs: fix off-by-one error in xfs_attr3_rmt_verify · bba719b5
      Jie Liu 提交于
      With CRC check is enabled, if trying to set an attributes value just
      equal to the maximum size of XATTR_SIZE_MAX would cause the v3 remote
      attr write verification procedure failure, which would yield the back
      trace like below:
      
      <snip>
      XFS (sda7): Internal error xfs_attr3_rmt_write_verify at line 191 of file fs/xfs/xfs_attr_remote.c
      <snip>
      Call Trace:
      [<ffffffff816f0042>] dump_stack+0x45/0x56
      [<ffffffffa0d99c8b>] xfs_error_report+0x3b/0x40 [xfs]
      [<ffffffffa0d96edd>] ? _xfs_buf_ioapply+0x6d/0x390 [xfs]
      [<ffffffffa0d99ce5>] xfs_corruption_error+0x55/0x80 [xfs]
      [<ffffffffa0dbef6b>] xfs_attr3_rmt_write_verify+0x14b/0x1a0 [xfs]
      [<ffffffffa0d96edd>] ? _xfs_buf_ioapply+0x6d/0x390 [xfs]
      [<ffffffffa0d97315>] ? xfs_bdstrat_cb+0x55/0xb0 [xfs]
      [<ffffffffa0d96edd>] _xfs_buf_ioapply+0x6d/0x390 [xfs]
      [<ffffffff81184cda>] ? vm_map_ram+0x31a/0x460
      [<ffffffff81097230>] ? wake_up_state+0x20/0x20
      [<ffffffffa0d97315>] ? xfs_bdstrat_cb+0x55/0xb0 [xfs]
      [<ffffffffa0d9726b>] xfs_buf_iorequest+0x6b/0xc0 [xfs]
      [<ffffffffa0d97315>] xfs_bdstrat_cb+0x55/0xb0 [xfs]
      [<ffffffffa0d97906>] xfs_bwrite+0x46/0x80 [xfs]
      [<ffffffffa0dbfa94>] xfs_attr_rmtval_set+0x334/0x490 [xfs]
      [<ffffffffa0db84aa>] xfs_attr_leaf_addname+0x24a/0x410 [xfs]
      [<ffffffffa0db8893>] xfs_attr_set_int+0x223/0x470 [xfs]
      [<ffffffffa0db8b76>] xfs_attr_set+0x96/0xb0 [xfs]
      [<ffffffffa0db13b2>] xfs_xattr_set+0x42/0x70 [xfs]
      [<ffffffff811df9b2>] generic_setxattr+0x62/0x80
      [<ffffffff811e0213>] __vfs_setxattr_noperm+0x63/0x1b0
      [<ffffffff81307afe>] ? evm_inode_setxattr+0xe/0x10
      [<ffffffff811e0415>] vfs_setxattr+0xb5/0xc0
      [<ffffffff811e054e>] setxattr+0x12e/0x1c0
      [<ffffffff811c6e82>] ? final_putname+0x22/0x50
      [<ffffffff811c708b>] ? putname+0x2b/0x40
      [<ffffffff811cc4bf>] ? user_path_at_empty+0x5f/0x90
      [<ffffffff811bdfd9>] ? __sb_start_write+0x49/0xe0
      [<ffffffff81168589>] ? vm_mmap_pgoff+0x99/0xc0
      [<ffffffff811e07df>] SyS_setxattr+0x8f/0xe0
      [<ffffffff81700c2d>] system_call_fastpath+0x1a/0x1f
      
      Tests:
          setfattr -n user.longxattr -v `perl -e 'print "A"x65536'` testfile
      
      This patch fix it to check the remote EA size is greater than the
      XATTR_SIZE_MAX rather than more than or equal to it, because it's
      valid if the specified EA value size is equal to the limitation as
      per VFS setxattr interface.
      Signed-off-by: NJie Liu <jeff.liu@oracle.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit 85dd0707)
      bba719b5
  17. 07 1月, 2014 1 次提交
    • J
      xfs: fix off-by-one error in xfs_attr3_rmt_verify · 85dd0707
      Jie Liu 提交于
      With CRC check is enabled, if trying to set an attributes value just
      equal to the maximum size of XATTR_SIZE_MAX would cause the v3 remote
      attr write verification procedure failure, which would yield the back
      trace like below:
      
      <snip>
      XFS (sda7): Internal error xfs_attr3_rmt_write_verify at line 191 of file fs/xfs/xfs_attr_remote.c
      <snip>
      Call Trace:
      [<ffffffff816f0042>] dump_stack+0x45/0x56
      [<ffffffffa0d99c8b>] xfs_error_report+0x3b/0x40 [xfs]
      [<ffffffffa0d96edd>] ? _xfs_buf_ioapply+0x6d/0x390 [xfs]
      [<ffffffffa0d99ce5>] xfs_corruption_error+0x55/0x80 [xfs]
      [<ffffffffa0dbef6b>] xfs_attr3_rmt_write_verify+0x14b/0x1a0 [xfs]
      [<ffffffffa0d96edd>] ? _xfs_buf_ioapply+0x6d/0x390 [xfs]
      [<ffffffffa0d97315>] ? xfs_bdstrat_cb+0x55/0xb0 [xfs]
      [<ffffffffa0d96edd>] _xfs_buf_ioapply+0x6d/0x390 [xfs]
      [<ffffffff81184cda>] ? vm_map_ram+0x31a/0x460
      [<ffffffff81097230>] ? wake_up_state+0x20/0x20
      [<ffffffffa0d97315>] ? xfs_bdstrat_cb+0x55/0xb0 [xfs]
      [<ffffffffa0d9726b>] xfs_buf_iorequest+0x6b/0xc0 [xfs]
      [<ffffffffa0d97315>] xfs_bdstrat_cb+0x55/0xb0 [xfs]
      [<ffffffffa0d97906>] xfs_bwrite+0x46/0x80 [xfs]
      [<ffffffffa0dbfa94>] xfs_attr_rmtval_set+0x334/0x490 [xfs]
      [<ffffffffa0db84aa>] xfs_attr_leaf_addname+0x24a/0x410 [xfs]
      [<ffffffffa0db8893>] xfs_attr_set_int+0x223/0x470 [xfs]
      [<ffffffffa0db8b76>] xfs_attr_set+0x96/0xb0 [xfs]
      [<ffffffffa0db13b2>] xfs_xattr_set+0x42/0x70 [xfs]
      [<ffffffff811df9b2>] generic_setxattr+0x62/0x80
      [<ffffffff811e0213>] __vfs_setxattr_noperm+0x63/0x1b0
      [<ffffffff81307afe>] ? evm_inode_setxattr+0xe/0x10
      [<ffffffff811e0415>] vfs_setxattr+0xb5/0xc0
      [<ffffffff811e054e>] setxattr+0x12e/0x1c0
      [<ffffffff811c6e82>] ? final_putname+0x22/0x50
      [<ffffffff811c708b>] ? putname+0x2b/0x40
      [<ffffffff811cc4bf>] ? user_path_at_empty+0x5f/0x90
      [<ffffffff811bdfd9>] ? __sb_start_write+0x49/0xe0
      [<ffffffff81168589>] ? vm_mmap_pgoff+0x99/0xc0
      [<ffffffff811e07df>] SyS_setxattr+0x8f/0xe0
      [<ffffffff81700c2d>] system_call_fastpath+0x1a/0x1f
      
      Tests:
          setfattr -n user.longxattr -v `perl -e 'print "A"x65536'` testfile
      
      This patch fix it to check the remote EA size is greater than the
      XATTR_SIZE_MAX rather than more than or equal to it, because it's
      valid if the specified EA value size is equal to the limitation as
      per VFS setxattr interface.
      Signed-off-by: NJie Liu <jeff.liu@oracle.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      85dd0707
  18. 31 10月, 2013 1 次提交
  19. 24 10月, 2013 3 次提交
    • D
      xfs: decouple inode and bmap btree header files · a4fbe6ab
      Dave Chinner 提交于
      Currently the xfs_inode.h header has a dependency on the definition
      of the BMAP btree records as the inode fork includes an array of
      xfs_bmbt_rec_host_t objects in it's definition.
      
      Move all the btree format definitions from xfs_btree.h,
      xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to
      xfs_format.h to continue the process of centralising the on-disk
      format definitions. With this done, the xfs inode definitions are no
      longer dependent on btree header files.
      
      The enables a massive culling of unnecessary includes, with close to
      200 #include directives removed from the XFS kernel code base.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      a4fbe6ab
    • D
      xfs: decouple log and transaction headers · 239880ef
      Dave Chinner 提交于
      xfs_trans.h has a dependency on xfs_log.h for a couple of
      structures. Most code that does transactions doesn't need to know
      anything about the log, but this dependency means that they have to
      include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
      files and clean up the includes to be in dependency order.
      
      In doing this, remove the direct include of xfs_trans_reserve.h from
      xfs_trans.h so that we remove the dependency between xfs_trans.h and
      xfs_mount.h. Hence the xfs_trans.h include can be moved to the
      indicate the actual dependencies other header files have on it.
      
      Note that these are kernel only header files, so this does not
      translate to any userspace changes at all.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      239880ef
    • D
      xfs: unify directory/attribute format definitions · 57062787
      Dave Chinner 提交于
      The on-disk format definitions for the directory and attribute
      structures are spread across 3 header files right now, only one of
      which is dedicated to defining on-disk structures and their
      manipulation (xfs_dir2_format.h). Pull all the format definitions
      into a single header file - xfs_da_format.h - and switch all the
      code over to point at that.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      57062787
  20. 13 8月, 2013 4 次提交
  21. 31 5月, 2013 5 次提交
    • D
      xfs: rework remote attr CRCs · 7bc0dc27
      Dave Chinner 提交于
      Note: this changes the on-disk remote attribute format. I assert
      that this is OK to do as CRCs are marked experimental and the first
      kernel it is included in has not yet reached release yet. Further,
      the userspace utilities are still evolving and so anyone using this
      stuff right now is a developer or tester using volatile filesystems
      for testing this feature. Hence changing the format right now to
      save longer term pain is the right thing to do.
      
      The fundamental change is to move from a header per extent in the
      attribute to a header per filesytem block in the attribute. This
      means there are more header blocks and the parsing of the attribute
      data is slightly more complex, but it has the advantage that we
      always know the size of the attribute on disk based on the length of
      the data it contains.
      
      This is where the header-per-extent method has problems. We don't
      know the size of the attribute on disk without first knowing how
      many extents are used to hold it. And we can't tell from a
      mapping lookup, either, because remote attributes can be allocated
      contiguously with other attribute blocks and so there is no obvious
      way of determining the actual size of the atribute on disk short of
      walking and mapping buffers.
      
      The problem with this approach is that if we map a buffer
      incorrectly (e.g. we make the last buffer for the attribute data too
      long), we then get buffer cache lookup failure when we map it
      correctly. i.e. we get a size mismatch on lookup. This is not
      necessarily fatal, but it's a cache coherency problem that can lead
      to returning the wrong data to userspace or writing the wrong data
      to disk. And debug kernels will assert fail if this occurs.
      
      I found lots of niggly little problems trying to fix this issue on a
      4k block size filesystem, finally getting it to pass with lots of
      fixes. The thing is, 1024 byte filesystems still failed, and it was
      getting really complex handling all the corner cases that were
      showing up. And there were clearly more that I hadn't found yet.
      
      It is complex, fragile code, and if we don't fix it now, it will be
      complex, fragile code forever more.
      
      Hence the simple fix is to add a header to each filesystem block.
      This gives us the same relationship between the attribute data
      length and the number of blocks on disk as we have without CRCs -
      it's a linear mapping and doesn't require us to guess anything. It
      is simple to implement, too - the remote block count calculated at
      lookup time can be used by the remote attribute set/get/remove code
      without modification for both CRC and non-CRC filesystems. The world
      becomes sane again.
      
      Because the copy-in and copy-out now need to iterate over each
      filesystem block, I moved them into helper functions so we separate
      the block mapping and buffer manupulations from the attribute data
      and CRC header manipulations. The code becomes much clearer as a
      result, and it is a lot easier to understand and debug. It also
      appears to be much more robust - once it worked on 4k block size
      filesystems, it has worked without failure on 1k block size
      filesystems, too.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit ad1858d7)
      7bc0dc27
    • D
      xfs: correctly map remote attr buffers during removal · 58a72281
      Dave Chinner 提交于
      If we don't map the buffers correctly (same as for get/set
      operations) then the incore buffer lookup will fail. If a block
      number matches but a length is wrong, then debug kernels will ASSERT
      fail in _xfs_buf_find() due to the length mismatch. Ensure that we
      map the buffers correctly by basing the length of the buffer on the
      attribute data length rather than the remote block count.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit 6863ef84)
      58a72281
    • D
      xfs: remote attribute tail zeroing does too much · 26f71445
      Dave Chinner 提交于
      When an attribute data does not fill then entire remote block, we
      zero the remaining part of the buffer. This, however, needs to take
      into account that the buffer has a header, and so the offset where
      zeroing starts and the length of zeroing need to take this into
      account. Otherwise we end up with zeros over the end of the
      attribute value when CRCs are enabled.
      
      While there, make sure we only ask to map an extent that covers the
      remaining range of the attribute, rather than asking every time for
      the full length of remote data. If the remote attribute blocks are
      contiguous with other parts of the attribute tree, it will map those
      blocks as well and we can potentially zero them incorrectly. We can
      also get buffer size mistmatches when trying to read or remove the
      remote attribute, and this can lead to not finding the correct
      buffer when looking it up in cache.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit 4af3644c)
      26f71445
    • D
      xfs: remote attribute read too short · 551b382f
      Dave Chinner 提交于
      Reading a maximally size remote attribute fails when CRCs are
      enabled with this verification error:
      
      XFS (vdb): remote attribute header does not match required off/len/owner)
      
      There are two reasons for this, the first being that the
      length of the buffer being read is determined from the
      args->rmtblkcnt which doesn't take into account CRC headers. Hence
      the mapped length ends up being too short and so we need to
      calculate it directly from the value length.
      
      The second is that the byte count of valid data within a buffer is
      capped by the length of the data and so doesn't take into account
      that the buffer might be longer due to headers. Hence we need to
      calculate the data space in the buffer first before calculating the
      actual byte count of data.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit 913e96bc)
      551b382f
    • D
      xfs: remote attribute allocation may be contiguous · 9531e2de
      Dave Chinner 提交于
      When CRCs are enabled, there may be multiple allocations made if the
      headers cause a length overflow. This, however, does not mean that
      the number of headers required increases, as the second and
      subsequent extents may be contiguous with the previous extent. Hence
      when we map the extents to write the attribute data, we may end up
      with less extents than allocations made. Hence the assertion that we
      consume the number of headers we calculated in the allocation loop
      is incorrect and needs to be removed.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit 90253cf1)
      9531e2de
  22. 24 5月, 2013 2 次提交
    • D
      xfs: rework remote attr CRCs · ad1858d7
      Dave Chinner 提交于
      Note: this changes the on-disk remote attribute format. I assert
      that this is OK to do as CRCs are marked experimental and the first
      kernel it is included in has not yet reached release yet. Further,
      the userspace utilities are still evolving and so anyone using this
      stuff right now is a developer or tester using volatile filesystems
      for testing this feature. Hence changing the format right now to
      save longer term pain is the right thing to do.
      
      The fundamental change is to move from a header per extent in the
      attribute to a header per filesytem block in the attribute. This
      means there are more header blocks and the parsing of the attribute
      data is slightly more complex, but it has the advantage that we
      always know the size of the attribute on disk based on the length of
      the data it contains.
      
      This is where the header-per-extent method has problems. We don't
      know the size of the attribute on disk without first knowing how
      many extents are used to hold it. And we can't tell from a
      mapping lookup, either, because remote attributes can be allocated
      contiguously with other attribute blocks and so there is no obvious
      way of determining the actual size of the atribute on disk short of
      walking and mapping buffers.
      
      The problem with this approach is that if we map a buffer
      incorrectly (e.g. we make the last buffer for the attribute data too
      long), we then get buffer cache lookup failure when we map it
      correctly. i.e. we get a size mismatch on lookup. This is not
      necessarily fatal, but it's a cache coherency problem that can lead
      to returning the wrong data to userspace or writing the wrong data
      to disk. And debug kernels will assert fail if this occurs.
      
      I found lots of niggly little problems trying to fix this issue on a
      4k block size filesystem, finally getting it to pass with lots of
      fixes. The thing is, 1024 byte filesystems still failed, and it was
      getting really complex handling all the corner cases that were
      showing up. And there were clearly more that I hadn't found yet.
      
      It is complex, fragile code, and if we don't fix it now, it will be
      complex, fragile code forever more.
      
      Hence the simple fix is to add a header to each filesystem block.
      This gives us the same relationship between the attribute data
      length and the number of blocks on disk as we have without CRCs -
      it's a linear mapping and doesn't require us to guess anything. It
      is simple to implement, too - the remote block count calculated at
      lookup time can be used by the remote attribute set/get/remove code
      without modification for both CRC and non-CRC filesystems. The world
      becomes sane again.
      
      Because the copy-in and copy-out now need to iterate over each
      filesystem block, I moved them into helper functions so we separate
      the block mapping and buffer manupulations from the attribute data
      and CRC header manipulations. The code becomes much clearer as a
      result, and it is a lot easier to understand and debug. It also
      appears to be much more robust - once it worked on 4k block size
      filesystems, it has worked without failure on 1k block size
      filesystems, too.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      ad1858d7
    • D
      xfs: correctly map remote attr buffers during removal · 6863ef84
      Dave Chinner 提交于
      If we don't map the buffers correctly (same as for get/set
      operations) then the incore buffer lookup will fail. If a block
      number matches but a length is wrong, then debug kernels will ASSERT
      fail in _xfs_buf_find() due to the length mismatch. Ensure that we
      map the buffers correctly by basing the length of the buffer on the
      attribute data length rather than the remote block count.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      6863ef84