1. 04 1月, 2016 1 次提交
  2. 12 10月, 2015 1 次提交
    • B
      xfs: validate metadata LSNs against log on v5 superblocks · a45086e2
      Brian Foster 提交于
      Since the onset of v5 superblocks, the LSN of the last modification has
      been included in a variety of on-disk data structures. This LSN is used
      to provide log recovery ordering guarantees (e.g., to ensure an older
      log recovery item is not replayed over a newer target data structure).
      
      While this works correctly from the point a filesystem is formatted and
      mounted, userspace tools have some problematic behaviors that defeat
      this mechanism. For example, xfs_repair historically zeroes out the log
      unconditionally (regardless of whether corruption is detected). If this
      occurs, the LSN of the filesystem is reset and the log is now in a
      problematic state with respect to on-disk metadata structures that might
      have a larger LSN. Until either the log catches up to the highest
      previously used metadata LSN or each affected data structure is modified
      and written out without incident (which resets the metadata LSN), log
      recovery is susceptible to filesystem corruption.
      
      This problem is ultimately addressed and repaired in the associated
      userspace tools. The kernel is still responsible to detect the problem
      and notify the user that something is wrong. Check the superblock LSN at
      mount time and fail the mount if it is invalid. From that point on,
      trigger verifier failure on any metadata I/O where an invalid LSN is
      detected. This results in a filesystem shutdown and guarantees that we
      do not log metadata changes with invalid LSNs on disk. Since this is a
      known issue with a known recovery path, present a warning to instruct
      the user how to recover.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      a45086e2
  3. 29 7月, 2015 1 次提交
    • E
      xfs: create new metadata UUID field and incompat flag · ce748eaa
      Eric Sandeen 提交于
      This adds a new superblock field, sb_meta_uuid.  If set, along with
      a new incompat flag, the code will use that field on a V5 filesystem
      to compare to metadata UUIDs, which allows us to change the user-
      visible UUID at will.  Userspace handles the setting and clearing
      of the incompat flag as appropriate, as the UUID gets changed; i.e.
      setting the user-visible UUID back to the original UUID (as stored in
      the new field) will remove the incompatible feature flag.
      
      If the incompat flag is not set, this copies the user-visible UUID into
      into the meta_uuid slot in memory when the superblock is read from disk;
      the meta_uuid field is not written back to disk in this case.
      
      The remainder of this patch simply switches verifiers, initializers,
      etc to use the new sb_meta_uuid field.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      ce748eaa
  4. 29 5月, 2015 1 次提交
    • D
      xfs: xfs_attr_inactive leaves inconsistent attr fork state behind · 6dfe5a04
      Dave Chinner 提交于
      xfs_attr_inactive() is supposed to clean up the attribute fork when
      the inode is being freed. While it removes attribute fork extents,
      it completely ignores attributes in local format, which means that
      there can still be active attributes on the inode after
      xfs_attr_inactive() has run.
      
      This leads to problems with concurrent inode writeback - the in-core
      inode attribute fork is removed without locking on the assumption
      that nothing will be attempting to access the attribute fork after a
      call to xfs_attr_inactive() because it isn't supposed to exist on
      disk any more.
      
      To fix this, make xfs_attr_inactive() completely remove all traces
      of the attribute fork from the inode, regardless of it's state.
      Further, also remove the in-core attribute fork structure safely so
      that there is nothing further that needs to be done by callers to
      clean up the attribute fork. This means we can remove the in-core
      and on-disk attribute forks atomically.
      
      Also, on error simply remove the in-memory attribute fork. There's
      nothing that can be done with it once we have failed to remove the
      on-disk attribute fork, so we may as well just blow it away here
      anyway.
      
      cc: <stable@vger.kernel.org> # 3.12 to 4.0
      Reported-by: NWaiman Long <waiman.long@hp.com>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      6dfe5a04
  5. 13 4月, 2015 3 次提交
    • B
      xfs: kill unnecessary firstused overflow check on attr3 leaf removal · 66db8104
      Brian Foster 提交于
      xfs_attr3_leaf_remove() removes an attribute from an attr leaf block. If
      the attribute nameval data happens to be at the start of the nameval
      region, a new start offset (firstused) for the region is calculated
      (since the region grows from the tail of the block to the start). Once
      the new firstused is calculated, it is checked for zero in an apparent
      overflow check.
      
      Now that the in-core firstused is 32-bit, overflow is not possible and
      this check can be removed. Since the purpose for this check is not
      documented and appears to exist since the port to Linux, be conservative
      and replace it with an assert.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      66db8104
    • B
      xfs: use larger in-core attr firstused field and detect overflow · e87021a2
      Brian Foster 提交于
      The on-disk xfs_attr3_leaf_hdr structure firstused field is 16-bit and
      subject to overflow when fs block size is 64k. The field is typically
      initialized to block size when an attr leaf block is initialized. This
      problem is demonstrated by assert failures when running xfstests
      generic/117 on an fs with 64k blocks.
      
      To support the existing attr leaf block algorithms for insertion,
      rebalance and entry movement, increase the size of the in-core firstused
      field to 32-bit and handle the potential overflow on conversion to/from
      the on-disk structure. If the overflow condition occurs, set a special
      value in the firstused field that is translated back on header read. The
      special value is only required in the case of an empty 64k attr block. A
      value of zero is used because firstused is initialized to the block size
      and grows backwards from there. Furthermore, the attribute block header
      occupies the first bytes of the block. Thus, a value of zero has no
      other legitimate meaning for this structure. Two new conversion helpers
      are created to manage the conversion of firstused to and from disk.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      e87021a2
    • B
      xfs: pass attr geometry to attr leaf header conversion functions · 2f661241
      Brian Foster 提交于
      The firstused field of the xfs_attr3_leaf_hdr structure is subject to an
      overflow when fs blocksize is 64k. In preparation to handle this
      overflow in the header conversion functions, pass the attribute geometry
      to the functions that convert the in-core structure to and from the
      on-disk structure.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      2f661241
  6. 22 1月, 2015 2 次提交
    • D
      xfs: consolidate superblock logging functions · 61e63ecb
      Dave Chinner 提交于
      We now have several superblock loggin functions that are identical
      except for the transaction reservation and whether it shoul dbe a
      synchronous transaction or not. Consolidate these all into a single
      function, a single reserveration and a sync flag and call it
      xfs_sync_sb().
      
      Also, xfs_mod_sb() is not really a modification function - it's the
      operation of logging the superblock buffer. hence change the name of
      it to reflect this.
      
      Note that we have to change the mp->m_update_flags that are passed
      around at mount time to a boolean simply to indicate a superblock
      update is needed.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      61e63ecb
    • D
      xfs: remove bitfield based superblock updates · 4d11a402
      Dave Chinner 提交于
      When we log changes to the superblock, we first have to write them
      to the on-disk buffer, and then log that. Right now we have a
      complex bitfield based arrangement to only write the modified field
      to the buffer before we log it.
      
      This used to be necessary as a performance optimisation because we
      logged the superblock buffer in every extent or inode allocation or
      freeing, and so performance was extremely important. We haven't done
      this for years, however, ever since the lazy superblock counters
      pulled the superblock logging out of the transaction commit
      fast path.
      
      Hence we have a bunch of complexity that is not necessary that makes
      writing the in-core superblock to disk much more complex than it
      needs to be. We only need to log the superblock now during
      management operations (e.g. during mount, unmount or quota control
      operations) so it is not a performance critical path anymore.
      
      As such, remove the complex field based logging mechanism and
      replace it with a simple conversion function similar to what we use
      for all other on-disk structures.
      
      This means we always log the entirity of the superblock, but again
      because we rarely modify the superblock this is not an issue for log
      bandwidth or CPU time. Indeed, if we do log the superblock
      frequently, delayed logging will minimise the impact of this
      overhead.
      
      [Fixed gquota/pquota inode sharing regression noticed by bfoster.]
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      4d11a402
  7. 28 11月, 2014 2 次提交
  8. 25 6月, 2014 2 次提交
  9. 22 6月, 2014 2 次提交
  10. 06 6月, 2014 6 次提交
  11. 06 5月, 2014 1 次提交
    • D
      xfs: remote attribute overwrite causes transaction overrun · 8275cdd0
      Dave Chinner 提交于
      Commit e461fcb1 ("xfs: remote attribute lookups require the value
      length") passes the remote attribute length in the xfs_da_args
      structure on lookup so that CRC calculations and validity checking
      can be performed correctly by related code. This, unfortunately has
      the side effect of changing the args->valuelen parameter in cases
      where it shouldn't.
      
      That is, when we replace a remote attribute, the incoming
      replacement stores the value and length in args->value and
      args->valuelen, but then the lookup which finds the existing remote
      attribute overwrites args->valuelen with the length of the remote
      attribute being replaced. Hence when we go to create the new
      attribute, we create it of the size of the existing remote
      attribute, not the size it is supposed to be. When the new attribute
      is much smaller than the old attribute, this results in a
      transaction overrun and an ASSERT() failure on a debug kernel:
      
      XFS: Assertion failed: tp->t_blk_res_used <= tp->t_blk_res, file: fs/xfs/xfs_trans.c, line: 331
      
      Fix this by keeping the remote attribute value length separate to
      the attribute value length in the xfs_da_args structure. The enables
      us to pass the length of the remote attribute to be removed without
      overwriting the new attribute's length.
      
      Also, ensure that when we save remote block contexts for a later
      rename we zero the original state variables so that we don't confuse
      the state of the attribute to be removes with the state of the new
      attribute that we just added. [Spotted by Brain Foster.]
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      8275cdd0
  12. 27 2月, 2014 3 次提交
  13. 31 10月, 2013 3 次提交
    • D
      xfs: fix static and extern sparse warnings · 632b89e8
      Dave Chinner 提交于
      The kbuild test robot indicated that there were some new sparse
      warnings in fs/xfs/xfs_dquot_buf.c. Actually, there were a lot more
      that is wasn't warning about, so fix them all up.
      
      Reported-by: kbuild test robot
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      632b89e8
    • D
      xfs: vectorise encoding/decoding directory headers · 01ba43b8
      Dave Chinner 提交于
      Conversion from on-disk structures to in-core header structures
      currently relies on magic number checks. If the magic number is
      wrong, but one of the supported values, we do the wrong thing with
      the encode/decode operation. Split these functions so that there are
      discrete operations for the specific directory format we are
      handling.
      
      In doing this, move all the header encode/decode functions to
      xfs_da_format.c as they are directly manipulating the on-disk
      format. It should be noted that all the growth in binary size is
      from xfs_da_format.c - the rest of the code actaully shrinks.
      
         text    data     bss     dec     hex filename
       794490   96802    1096  892388   d9de4 fs/xfs/xfs.o.orig
       792986   96802    1096  890884   d9804 fs/xfs/xfs.o.p1
       792350   96802    1096  890248   d9588 fs/xfs/xfs.o.p2
       789293   96802    1096  887191   d8997 fs/xfs/xfs.o.p3
       789005   96802    1096  886903   d8997 fs/xfs/xfs.o.p4
       789061   96802    1096  886959   d88af fs/xfs/xfs.o.p5
       789733   96802    1096  887631   d8b4f fs/xfs/xfs.o.p6
       791421   96802    1096  889319   d91e7 fs/xfs/xfs.o.p7
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      01ba43b8
    • D
      xfs: vectorise DA btree operations · 4bceb18f
      Dave Chinner 提交于
      The remaining non-vectorised code for the directory structure is the
      node format blocks. This is shared with the attribute tree, and so
      is slightly more complex to vectorise.
      
      Introduce a "non-directory" directory ops structure that is attached
      to all non-directory inodes so that attribute operations can be
      vectorised for all inodes.
      
      Once we do this, we can vectorise all the da btree operations.
      Because this patch adds more infrastructure than it removes the
      binary size does not decrease:
      
         text    data     bss     dec     hex filename
       794490   96802    1096  892388   d9de4 fs/xfs/xfs.o.orig
       792986   96802    1096  890884   d9804 fs/xfs/xfs.o.p1
       792350   96802    1096  890248   d9588 fs/xfs/xfs.o.p2
       789293   96802    1096  887191   d8997 fs/xfs/xfs.o.p3
       789005   96802    1096  886903   d8997 fs/xfs/xfs.o.p4
       789061   96802    1096  886959   d88af fs/xfs/xfs.o.p5
       789733   96802    1096  887631   d8b4f fs/xfs/xfs.o.p6
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      4bceb18f
  14. 24 10月, 2013 3 次提交
    • D
      xfs: decouple inode and bmap btree header files · a4fbe6ab
      Dave Chinner 提交于
      Currently the xfs_inode.h header has a dependency on the definition
      of the BMAP btree records as the inode fork includes an array of
      xfs_bmbt_rec_host_t objects in it's definition.
      
      Move all the btree format definitions from xfs_btree.h,
      xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to
      xfs_format.h to continue the process of centralising the on-disk
      format definitions. With this done, the xfs inode definitions are no
      longer dependent on btree header files.
      
      The enables a massive culling of unnecessary includes, with close to
      200 #include directives removed from the XFS kernel code base.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      a4fbe6ab
    • D
      xfs: decouple log and transaction headers · 239880ef
      Dave Chinner 提交于
      xfs_trans.h has a dependency on xfs_log.h for a couple of
      structures. Most code that does transactions doesn't need to know
      anything about the log, but this dependency means that they have to
      include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
      files and clean up the includes to be in dependency order.
      
      In doing this, remove the direct include of xfs_trans_reserve.h from
      xfs_trans.h so that we remove the dependency between xfs_trans.h and
      xfs_mount.h. Hence the xfs_trans.h include can be moved to the
      indicate the actual dependencies other header files have on it.
      
      Note that these are kernel only header files, so this does not
      translate to any userspace changes at all.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      239880ef
    • D
      xfs: unify directory/attribute format definitions · 57062787
      Dave Chinner 提交于
      The on-disk format definitions for the directory and attribute
      structures are spread across 3 header files right now, only one of
      which is dedicated to defining on-disk structures and their
      manipulation (xfs_dir2_format.h). Pull all the format definitions
      into a single header file - xfs_da_format.h - and switch all the
      code over to point at that.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      57062787
  15. 31 8月, 2013 1 次提交
  16. 13 8月, 2013 4 次提交
  17. 10 7月, 2013 1 次提交
    • D
      xfs: remove local fork format handling from xfs_bmapi_write() · f3508bcd
      Dave Chinner 提交于
      The conversion from local format to extent format requires
      interpretation of the data in the fork being converted, so it cannot
      be done in a generic way. It is up to the caller to convert the fork
      format to extent format before calling into xfs_bmapi_write() so
      format conversion can be done correctly.
      
      The code in xfs_bmapi_write() to convert the format is used
      implicitly by the attribute and directory code, but they
      specifically zero the fork size so that the conversion does not do
      any allocation or manipulation. Move this conversion into the
      shortform to leaf functions for the dir/attr code so the conversions
      are explicitly controlled by all callers.
      
      Now we can remove the conversion code in xfs_bmapi_write.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      f3508bcd
  18. 06 6月, 2013 1 次提交
  19. 05 6月, 2013 1 次提交
  20. 31 5月, 2013 1 次提交
    • D
      xfs: rework remote attr CRCs · 7bc0dc27
      Dave Chinner 提交于
      Note: this changes the on-disk remote attribute format. I assert
      that this is OK to do as CRCs are marked experimental and the first
      kernel it is included in has not yet reached release yet. Further,
      the userspace utilities are still evolving and so anyone using this
      stuff right now is a developer or tester using volatile filesystems
      for testing this feature. Hence changing the format right now to
      save longer term pain is the right thing to do.
      
      The fundamental change is to move from a header per extent in the
      attribute to a header per filesytem block in the attribute. This
      means there are more header blocks and the parsing of the attribute
      data is slightly more complex, but it has the advantage that we
      always know the size of the attribute on disk based on the length of
      the data it contains.
      
      This is where the header-per-extent method has problems. We don't
      know the size of the attribute on disk without first knowing how
      many extents are used to hold it. And we can't tell from a
      mapping lookup, either, because remote attributes can be allocated
      contiguously with other attribute blocks and so there is no obvious
      way of determining the actual size of the atribute on disk short of
      walking and mapping buffers.
      
      The problem with this approach is that if we map a buffer
      incorrectly (e.g. we make the last buffer for the attribute data too
      long), we then get buffer cache lookup failure when we map it
      correctly. i.e. we get a size mismatch on lookup. This is not
      necessarily fatal, but it's a cache coherency problem that can lead
      to returning the wrong data to userspace or writing the wrong data
      to disk. And debug kernels will assert fail if this occurs.
      
      I found lots of niggly little problems trying to fix this issue on a
      4k block size filesystem, finally getting it to pass with lots of
      fixes. The thing is, 1024 byte filesystems still failed, and it was
      getting really complex handling all the corner cases that were
      showing up. And there were clearly more that I hadn't found yet.
      
      It is complex, fragile code, and if we don't fix it now, it will be
      complex, fragile code forever more.
      
      Hence the simple fix is to add a header to each filesystem block.
      This gives us the same relationship between the attribute data
      length and the number of blocks on disk as we have without CRCs -
      it's a linear mapping and doesn't require us to guess anything. It
      is simple to implement, too - the remote block count calculated at
      lookup time can be used by the remote attribute set/get/remove code
      without modification for both CRC and non-CRC filesystems. The world
      becomes sane again.
      
      Because the copy-in and copy-out now need to iterate over each
      filesystem block, I moved them into helper functions so we separate
      the block mapping and buffer manupulations from the attribute data
      and CRC header manipulations. The code becomes much clearer as a
      result, and it is a lot easier to understand and debug. It also
      appears to be much more robust - once it worked on 4k block size
      filesystems, it has worked without failure on 1k block size
      filesystems, too.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit ad1858d7)
      7bc0dc27