1. 15 5月, 2014 9 次提交
  2. 06 5月, 2014 2 次提交
    • D
      xfs: remote attribute overwrite causes transaction overrun · 8275cdd0
      Dave Chinner 提交于
      Commit e461fcb1 ("xfs: remote attribute lookups require the value
      length") passes the remote attribute length in the xfs_da_args
      structure on lookup so that CRC calculations and validity checking
      can be performed correctly by related code. This, unfortunately has
      the side effect of changing the args->valuelen parameter in cases
      where it shouldn't.
      
      That is, when we replace a remote attribute, the incoming
      replacement stores the value and length in args->value and
      args->valuelen, but then the lookup which finds the existing remote
      attribute overwrites args->valuelen with the length of the remote
      attribute being replaced. Hence when we go to create the new
      attribute, we create it of the size of the existing remote
      attribute, not the size it is supposed to be. When the new attribute
      is much smaller than the old attribute, this results in a
      transaction overrun and an ASSERT() failure on a debug kernel:
      
      XFS: Assertion failed: tp->t_blk_res_used <= tp->t_blk_res, file: fs/xfs/xfs_trans.c, line: 331
      
      Fix this by keeping the remote attribute value length separate to
      the attribute value length in the xfs_da_args structure. The enables
      us to pass the length of the remote attribute to be removed without
      overwriting the new attribute's length.
      
      Also, ensure that when we save remote block contexts for a later
      rename we zero the original state variables so that we don't confuse
      the state of the attribute to be removes with the state of the new
      attribute that we just added. [Spotted by Brain Foster.]
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      8275cdd0
    • B
      xfs: initialize default acls for ->tmpfile() · d540e43b
      Brian Foster 提交于
      The current tmpfile handler does not initialize default ACLs. Doing so
      within xfs_vn_tmpfile() makes it roughly equivalent to xfs_vn_mknod(),
      which is already used as a common create handler.
      
      xfs_vn_mknod() does not currently have a mechanism to determine whether
      to link the file into the namespace. Therefore, further abstract
      xfs_vn_mknod() into a new xfs_generic_create() handler with a tmpfile
      parameter. This new handler calls xfs_create_tmpfile() and d_tmpfile()
      on the dentry when called via ->tmpfile().
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      d540e43b
  3. 05 5月, 2014 1 次提交
    • D
      xfs: fully support v5 format filesystems · c99d609a
      Dave Chinner 提交于
      We have had this code in the kernel for over a year now and have
      shaken all the known issues out of the code over the past few
      releases. It's now time to remove the experimental warnings during
      mount and fully support the new filesystem format in production
      systems.
      
      Remove the experimental warning, and add a version number to the
      initial "mounting filesystem" message to tell use what type of
      filesystem is being mounted. Also, remove the temporary inode
      cluster size output at mount time now we know that this code works
      fine.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      c99d609a
  4. 24 4月, 2014 11 次提交
  5. 23 4月, 2014 8 次提交
  6. 17 4月, 2014 6 次提交
    • B
      xfs: fix tmpfile/selinux deadlock and initialize security · 330033d6
      Brian Foster 提交于
      xfstests generic/004 reproduces an ilock deadlock using the tmpfile
      interface when selinux is enabled. This occurs because
      xfs_create_tmpfile() takes the ilock and then calls d_tmpfile(). The
      latter eventually calls into xfs_xattr_get() which attempts to get the
      lock again. E.g.:
      
      xfs_io          D ffffffff81c134c0  4096  3561   3560 0x00000080
      ffff8801176a1a68 0000000000000046 ffff8800b401b540 ffff8801176a1fd8
      00000000001d5800 00000000001d5800 ffff8800b401b540 ffff8800b401b540
      ffff8800b73a6bd0 fffffffeffffffff ffff8800b73a6bd8 ffff8800b5ddb480
      Call Trace:
      [<ffffffff8177f969>] schedule+0x29/0x70
      [<ffffffff81783a65>] rwsem_down_read_failed+0xc5/0x120
      [<ffffffffa05aa97f>] ? xfs_ilock_attr_map_shared+0x1f/0x50 [xfs]
      [<ffffffff813b3434>] call_rwsem_down_read_failed+0x14/0x30
      [<ffffffff810ed179>] ? down_read_nested+0x89/0xa0
      [<ffffffffa05aa7f2>] ? xfs_ilock+0x122/0x250 [xfs]
      [<ffffffffa05aa7f2>] xfs_ilock+0x122/0x250 [xfs]
      [<ffffffffa05aa97f>] xfs_ilock_attr_map_shared+0x1f/0x50 [xfs]
      [<ffffffffa05701d0>] xfs_attr_get+0x90/0xe0 [xfs]
      [<ffffffffa0565e07>] xfs_xattr_get+0x37/0x50 [xfs]
      [<ffffffff8124842f>] generic_getxattr+0x4f/0x70
      [<ffffffff8133fd9e>] inode_doinit_with_dentry+0x1ae/0x650
      [<ffffffff81340e0c>] selinux_d_instantiate+0x1c/0x20
      [<ffffffff813351bb>] security_d_instantiate+0x1b/0x30
      [<ffffffff81237db0>] d_instantiate+0x50/0x70
      [<ffffffff81237e85>] d_tmpfile+0xb5/0xc0
      [<ffffffffa05add02>] xfs_create_tmpfile+0x362/0x410 [xfs]
      [<ffffffffa0559ac8>] xfs_vn_tmpfile+0x18/0x20 [xfs]
      [<ffffffff81230388>] path_openat+0x228/0x6a0
      [<ffffffff810230f9>] ? sched_clock+0x9/0x10
      [<ffffffff8105a427>] ? kvm_clock_read+0x27/0x40
      [<ffffffff8124054f>] ? __alloc_fd+0xaf/0x1f0
      [<ffffffff8123101a>] do_filp_open+0x3a/0x90
      [<ffffffff817845e7>] ? _raw_spin_unlock+0x27/0x40
      [<ffffffff8124054f>] ? __alloc_fd+0xaf/0x1f0
      [<ffffffff8121e3ce>] do_sys_open+0x12e/0x210
      [<ffffffff8121e4ce>] SyS_open+0x1e/0x20
      [<ffffffff8178eda9>] system_call_fastpath+0x16/0x1b
      
      xfs_vn_tmpfile() also fails to initialize security on the newly created
      inode.
      
      Pull the d_tmpfile() call up into xfs_vn_tmpfile() after the transaction
      has been committed and the inode unlocked. Also, initialize security on
      the inode based on the parent directory provided via the tmpfile call.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      330033d6
    • E
      xfs: fix buffer use after free on IO error · 8d6c1210
      Eric Sandeen 提交于
      When testing exhaustion of dm snapshots, the following appeared
      with CONFIG_DEBUG_OBJECTS_FREE enabled:
      
      ODEBUG: free active (active state 0) object type: work_struct hint: xfs_buf_iodone_work+0x0/0x1d0 [xfs]
      
      indicating that we'd freed a buffer which still had a pending reference,
      down this path:
      
      [  190.867975]  [<ffffffff8133e6fb>] debug_check_no_obj_freed+0x22b/0x270
      [  190.880820]  [<ffffffff811da1d0>] kmem_cache_free+0xd0/0x370
      [  190.892615]  [<ffffffffa02c5924>] xfs_buf_free+0xe4/0x210 [xfs]
      [  190.905629]  [<ffffffffa02c6167>] xfs_buf_rele+0xe7/0x270 [xfs]
      [  190.911770]  [<ffffffffa034c826>] xfs_trans_read_buf_map+0x7b6/0xac0 [xfs]
      
      At issue is the fact that if IO fails in xfs_buf_iorequest,
      we'll queue completion unconditionally, and then call
      xfs_buf_rele; but if IO failed, there are no IOs remaining,
      and xfs_buf_rele will free the bp while work is still queued.
      
      Fix this by not scheduling completion if the buffer has
      an error on it; run it immediately.  The rest is only comment
      changes.
      
      Thanks to dchinner for spotting the root cause.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      8d6c1210
    • D
      xfs: wrong error sign conversion during failed DIO writes · 07d5035a
      Dave Chinner 提交于
      We negate the error value being returned from a generic function
      incorrectly. The code path that it is running in returned negative
      errors, so there is no need to negate it to get the correct error
      signs here.
      
      This was uncovered by generic/019.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      07d5035a
    • D
      xfs: unmount does not wait for shutdown during unmount · 9c23eccc
      Dave Chinner 提交于
      And interesting situation can occur if a log IO error occurs during
      the unmount of a filesystem. The cases reported have the same
      signature - the update of the superblock counters fails due to a log
      write IO error:
      
      XFS (dm-16): xfs_do_force_shutdown(0x2) called from line 1170 of file fs/xfs/xfs_log.c.  Return address = 0xffffffffa08a44a1
      XFS (dm-16): Log I/O Error Detected.  Shutting down filesystem
      XFS (dm-16): Unable to update superblock counters. Freespace may not be correct on next mount.
      XFS (dm-16): xfs_log_force: error 5 returned.
      XFS (¿-¿¿¿): Please umount the filesystem and rectify the problem(s)
      
      It can be seen that the last line of output contains a corrupt
      device name - this is because the log and xfs_mount structures have
      already been freed by the time this message is printed. A kernel
      oops closely follows.
      
      The issue is that the shutdown is occurring in a separate IO
      completion thread to the unmount. Once the shutdown processing has
      started and all the iclogs are marked with XLOG_STATE_IOERROR, the
      log shutdown code wakes anyone waiting on a log force so they can
      process the shutdown error. This wakes up the unmount code that
      is doing a synchronous transaction to update the superblock
      counters.
      
      The unmount path now sees all the iclogs are marked with
      XLOG_STATE_IOERROR and so never waits on them again, knowing that if
      it does, there will not be a wakeup trigger for it and we will hang
      the unmount if we do. Hence the unmount runs through all the
      remaining code and frees all the filesystem structures while the
      xlog_iodone() is still processing the shutdown. When the log
      shutdown processing completes, xfs_do_force_shutdown() emits the
      "Please umount the filesystem and rectify the problem(s)" message,
      and xlog_iodone() then aborts all the objects attached to the iclog.
      An iclog that has already been freed....
      
      The real issue here is that there is no serialisation point between
      the log IO and the unmount. We have serialisations points for log
      writes, log forces, reservations, etc, but we don't actually have
      any code that wakes for log IO to fully complete. We do that for all
      other types of object, so why not iclogbufs?
      
      Well, it turns out that we can easily do this. We've got xfs_buf
      handles, and that's what everyone else uses for IO serialisation.
      i.e. bp->b_sema. So, lets hold iclogbufs locked over IO, and only
      release the lock in xlog_iodone() when we are finished with the
      buffer. That way before we tear down the iclog, we can lock and
      unlock the buffer to ensure IO completion has finished completely
      before we tear it down.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Tested-by: NMike Snitzer <snitzer@redhat.com>
      Tested-by: NBob Mastors <bob.mastors@solidfire.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      9c23eccc
    • D
      xfs: collapse range is delalloc challenged · d39a2ced
      Dave Chinner 提交于
      FSX has been detecting data corruption after to collapse range
      calls. The key observation is that the offset of the last extent in
      the file was not being shifted, and hence when the file size was
      adjusted it was truncating away data because the extents handled
      been correctly shifted.
      
      Tracing indicated that before the collapse, the extent list looked
      like:
      
      ....
      ino 0x5788 state  idx 6 offset 26 block 195904 count 10 flag 0
      ino 0x5788 state  idx 7 offset 39 block 195917 count 35 flag 0
      ino 0x5788 state  idx 8 offset 86 block 195964 count 32 flag 0
      
      and after the shift of 2 blocks:
      
      ino 0x5788 state  idx 6 offset 24 block 195904 count 10 flag 0
      ino 0x5788 state  idx 7 offset 37 block 195917 count 35 flag 0
      ino 0x5788 state  idx 8 offset 86 block 195964 count 32 flag 0
      
      Note that the last extent did not change offset. After the changing
      of the file size:
      
      ino 0x5788 state  idx 6 offset 24 block 195904 count 10 flag 0
      ino 0x5788 state  idx 7 offset 37 block 195917 count 35 flag 0
      ino 0x5788 state  idx 8 offset 86 block 195964 count 30 flag 0
      
      You can see that the last extent had it's length truncated,
      indicating that we've lost data.
      
      The reason for this is that the xfs_bmap_shift_extents() loop uses
      XFS_IFORK_NEXTENTS() to determine how many extents are in the inode.
      This, unfortunately, doesn't take into account delayed allocation
      extents - it's a count of physically allocated extents - and hence
      when the file being collapsed has a delalloc extent like this one
      does prior to the range being collapsed:
      
      ....
      ino 0x5788 state  idx 4 offset 11 block 4503599627239429 count 1 flag 0
      ....
      
      it gets the count wrong and terminates the shift loop early.
      
      Fix it by using the in-memory extent array size that includes
      delayed allocation extents to determine the number of extents on the
      inode.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Tested-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      d39a2ced
    • D
      xfs: don't map ranges that span EOF for direct IO · 0e1f789d
      Dave Chinner 提交于
      Al Viro tracked down the problem that has caused generic/263 to fail
      on XFS since the test was introduced. If is caused by
      xfs_get_blocks() mapping a single extent that spans EOF without
      marking it as buffer-new() so that the direct IO code does not zero
      the tail of the block at the new EOF. This is a long standing bug
      that has been around for many, many years.
      
      Because xfs_get_blocks() starts the map before EOF, it can't set
      buffer_new(), because that causes he direct IO code to also zero
      unaligned sectors at the head of the IO. This would overwrite valid
      data with zeros, and hence we cannot validly return a single extent
      that spans EOF to direct IO.
      
      Fix this by detecting a mapping that spans EOF and truncate it down
      to EOF. This results in the the direct IO code doing the right thing
      for unaligned data blocks before EOF, and then returning to get
      another mapping for the region beyond EOF which XFS treats correctly
      by setting buffer_new() on it. This makes direct Io behave correctly
      w.r.t. tail block zeroing beyond EOF, and fsx is happy about that.
      
      Again, thanks to Al Viro for finding what I couldn't.
      
      [ dchinner: Fix for __divdi3 build error:
      Reported-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Tested-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NMark Tinguely <tinguely@sgi.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      ]
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Tested-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      0e1f789d
  7. 14 4月, 2014 3 次提交