1. 15 6月, 2013 6 次提交
    • A
      use can_lookup() instead of direct checks of ->i_op->lookup · 05252901
      Al Viro 提交于
      a couple of places got missed back when Linus has introduced that one...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      05252901
    • O
      fput: task_work_add() can fail if the caller has passed exit_task_work() · e7b2c406
      Oleg Nesterov 提交于
      fput() assumes that it can't be called after exit_task_work() but
      this is not true, for example free_ipc_ns()->shm_destroy() can do
      this. In this case fput() silently leaks the file.
      
      Change it to fallback to delayed_fput_work if task_work_add() fails.
      The patch looks complicated but it is not, it changes the code from
      
      	if (PF_KTHREAD) {
      		schedule_work(...);
      		return;
      	}
      	task_work_add(...)
      
      to
      	if (!PF_KTHREAD) {
      		if (!task_work_add(...))
      			return;
      		/* fallback */
      	}
      	schedule_work(...);
      
      As for shm_destroy() in particular, we could make another fix but I
      think this change makes sense anyway. There could be another similar
      user, it is not safe to assume that task_work_add() can't fail.
      Reported-by: NAndrey Vagin <avagin@openvz.org>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      e7b2c406
    • D
      xfs: don't shutdown log recovery on validation errors · d302cf1d
      Dave Chinner 提交于
      Unfortunately, we cannot guarantee that items logged multiple times
      and replayed by log recovery do not take objects back in time. When
      they are taken back in time, the go into an intermediate state which
      is corrupt, and hence verification that occurs on this intermediate
      state causes log recovery to abort with a corruption shutdown.
      
      Instead of causing a shutdown and unmountable filesystem, don't
      verify post-recovery items before they are written to disk. This is
      less than optimal, but there is no way to detect this issue for
      non-CRC filesystems If log recovery successfully completes, this
      will be undone and the object will be consistent by subsequent
      transactions that are replayed, so in most cases we don't need to
      take drastic action.
      
      For CRC enabled filesystems, leave the verifiers in place - we need
      to call them to recalculate the CRCs on the objects anyway. This
      recovery problem can be solved for such filesystems - we have a LSN
      stamped in all metadata at writeback time that we can to determine
      whether the item should be replayed or not. This is a separate piece
      of work, so is not addressed by this patch.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit 9222a9cf)
      d302cf1d
    • D
      xfs: ensure btree root split sets blkno correctly · 088c9f67
      Dave Chinner 提交于
      For CRC enabled filesystems, the BMBT is rooted in an inode, so it
      passes through a different code path on root splits than the
      freespace and inode btrees. This is much less traversed by xfstests
      than the other trees. When testing on a 1k block size filesystem,
      I've been seeing ASSERT failures in generic/234 like:
      
      XFS: Assertion failed: cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_private.b.allocated == 0, file: fs/xfs/xfs_btree.c, line: 317
      
      which are generally preceded by a lblock check failure. I noticed
      this in the bmbt stats:
      
      $ pminfo -f xfs.btree.block_map
      
      xfs.btree.block_map.lookup
          value 39135
      
      xfs.btree.block_map.compare
          value 268432
      
      xfs.btree.block_map.insrec
          value 15786
      
      xfs.btree.block_map.delrec
          value 13884
      
      xfs.btree.block_map.newroot
          value 2
      
      xfs.btree.block_map.killroot
          value 0
      .....
      
      Very little coverage of root splits and merges. Indeed, on a 4k
      filesystem, block_map.newroot and block_map.killroot are both zero.
      i.e. the code is not exercised at all, and it's the only generic
      btree infrastructure operation that is not exercised by a default run
      of xfstests.
      
      Turns out that on a 1k filesystem, generic/234 accounts for one of
      those two root splits, and that is somewhat of a smoking gun. In
      fact, it's the same problem we saw in the directory/attr code where
      headers are memcpy()d from one block to another without updating the
      self describing metadata.
      
      Simple fix - when copying the header out of the root block, make
      sure the block number is updated correctly.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit ade1335a)
      088c9f67
    • D
      xfs: fix implicit padding in directory and attr CRC formats · 5170711d
      Dave Chinner 提交于
      Michael L. Semon has been testing CRC patches on a 32 bit system and
      been seeing assert failures in the directory code from xfs/080.
      Thanks to Michael's heroic efforts with printk debugging, we found
      that the problem was that the last free space being left in the
      directory structure was too small to fit a unused tag structure and
      it was being corrupted and attempting to log a region out of bounds.
      Hence the assert failure looked something like:
      
      .....
      #5 calling xfs_dir2_data_log_unused() 36 32
      #1 4092 4095 4096
      #2 8182 8183 4096
      XFS: Assertion failed: first <= last && last < BBTOB(bp->b_length), file: fs/xfs/xfs_trans_buf.c, line: 568
      
      Where #1 showed the first region of the dup being logged (i.e. the
      last 4 bytes of a directory buffer) and #2 shows the corrupt values
      being calculated from the length of the dup entry which overflowed
      the size of the buffer.
      
      It turns out that the problem was not in the logging code, nor in
      the freespace handling code. It is an initial condition bug that
      only shows up on 32 bit systems. When a new buffer is initialised,
      where's the freespace that is set up:
      
      [  172.316249] calling xfs_dir2_leaf_addname() from xfs_dir_createname()
      [  172.316346] #9 calling xfs_dir2_data_log_unused()
      [  172.316351] #1 calling xfs_trans_log_buf() 60 63 4096
      [  172.316353] #2 calling xfs_trans_log_buf() 4094 4095 4096
      
      Note the offset of the first region being logged? It's 60 bytes into
      the buffer. Once I saw that, I pretty much knew that the bug was
      going to be caused by this.
      
      Essentially, all direct entries are rounded to 8 bytes in length,
      and all entries start with an 8 byte alignment. This means that we
      can decode inplace as variables are naturally aligned. With the
      directory data supposedly starting on a 8 byte boundary, and all
      entries padded to 8 bytes, the minimum freespace in a directory
      block is supposed to be 8 bytes, which is large enough to fit a
      unused data entry structure (6 bytes in size). The fact we only have
      4 bytes of free space indicates a directory data block alignment
      problem.
      
      And what do you know - there's an implicit hole in the directory
      data block header for the CRC format, which means the header is 60
      byte on 32 bit intel systems and 64 bytes on 64 bit systems. Needs
      padding. And while looking at the structures, I found the same
      problem in the attr leaf header. Fix them both.
      
      Note that this only affects 32 bit systems with CRCs enabled.
      Everything else is just fine. Note that CRC enabled filesystems created
      before this fix on such systems will not be readable with this fix
      applied.
      Reported-by: NMichael L. Semon <mlsemon35@gmail.com>
      Debugged-by: NMichael L. Semon <mlsemon35@gmail.com>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit 8a1fd295)
      5170711d
    • D
      xfs: don't emit v5 superblock warnings on write · 47ad2fcb
      Dave Chinner 提交于
      We write the superblock every 30s or so which results in the
      verifier being called. Right now that results in this output
      every 30s:
      
      XFS (vda): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
      Use of these features in this kernel is at your own risk!
      
      And spamming the logs.
      
      We don't need to check for whether we support v5 superblocks or
      whether there are feature bits we don't support set as these are
      only relevant when we first mount the filesytem. i.e. on superblock
      read. Hence for the write verification we can just skip all the
      checks (and hence verbose output) altogether.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit 34510185)
      47ad2fcb
  2. 13 6月, 2013 5 次提交
  3. 09 6月, 2013 6 次提交
  4. 08 6月, 2013 1 次提交
  5. 06 6月, 2013 6 次提交
    • D
      xfs: increase number of ACL entries for V5 superblocks · 0a8aa193
      Dave Chinner 提交于
      The limit of 25 ACL entries is arbitrary, but baked into the on-disk
      format.  For version 5 superblocks, increase it to the maximum nuber
      of ACLs that can fit into a single xattr.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NMark Tinguely <tinuguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit 5c87d4bc)
      0a8aa193
    • D
      xfs: disable noattr2/attr2 mount options for CRC enabled filesystems · f763fd44
      Dave Chinner 提交于
      attr2 format is always enabled for v5 superblock filesystems, so the
      mount options to enable or disable it need to be cause mount errors.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit d3eaace8)
      f763fd44
    • D
      xfs: inode unlinked list needs to recalculate the inode CRC · ad868afd
      Dave Chinner 提交于
      The inode unlinked list manipulations operate directly on the inode
      buffer, and so bypass the inode CRC calculation mechanisms. Hence an
      inode on the unlinked list has an invalid CRC. Fix this by
      recalculating the CRC whenever we modify an unlinked list pointer in
      an inode, ncluding during log recovery. This is trivial to do and
      results in  unlinked list operations always leaving a consistent
      inode in the buffer.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit 0a32c26e)
      ad868afd
    • D
      xfs: fix log recovery transaction item reordering · 75406170
      Dave Chinner 提交于
      There are several constraints that inode allocation and unlink
      logging impose on log recovery. These all stem from the fact that
      inode alloc/unlink are logged in buffers, but all other inode
      changes are logged in inode items. Hence there are ordering
      constraints that recovery must follow to ensure the correct result
      occurs.
      
      As it turns out, this ordering has been working mostly by chance
      than good management. The existing code moves all buffers except
      cancelled buffers to the head of the list, and everything else to
      the tail of the list. The problem with this is that is interleaves
      inode items with the buffer cancellation items, and hence whether
      the inode item in an cancelled buffer gets replayed is essentially
      left to chance.
      
      Further, this ordering causes problems for log recovery when inode
      CRCs are enabled. It typically replays the inode unlink buffer long before
      it replays the inode core changes, and so the CRC recorded in an
      unlink buffer is going to be invalid and hence any attempt to
      validate the inode in the buffer is going to fail. Hence we really
      need to enforce the ordering that the inode alloc/unlink code has
      expected log recovery to have since inode chunk de-allocation was
      introduced back in 2003...
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit a775ad77)
      75406170
    • D
      xfs: fix remote attribute invalidation for a leaf · ea929536
      Dave Chinner 提交于
      When invalidating an attribute leaf block block, there might be
      remote attributes that it points to. With the recent rework of the
      remote attribute format, we have to make sure we calculate the
      length of the attribute correctly. We aren't doing that in
      xfs_attr3_leaf_inactive(), so fix it.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NMark Tinguely <tinuguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit 59913f14)
      ea929536
    • D
      xfs: rework dquot CRCs · bb9b8e86
      Dave Chinner 提交于
      Calculating dquot CRCs when the backing buffer is written back just
      doesn't work reliably. There are several places which manipulate
      dquots directly in the buffers, and they don't calculate CRCs
      appropriately, nor do they always set the buffer up to calculate
      CRCs appropriately.
      
      Firstly, if we log a dquot buffer (e.g. during allocation) it gets
      logged without valid CRC, and so on recovery we end up with a dquot
      that is not valid.
      
      Secondly, if we recover/repair a dquot, we don't have a verifier
      attached to the buffer and hence CRCs are not calculated on the way
      down to disk.
      
      Thirdly, calculating the CRC after we've changed the contents means
      that if we re-read the dquot from the buffer, we cannot verify the
      contents of the dquot are valid, as the CRC is invalid.
      
      So, to avoid all the dquot CRC errors that are being detected by the
      read verifier, change to using the same model as for inodes. That
      is, dquot CRCs are calculated and written to the backing buffer at
      the time the dquot is flushed to the backing buffer. If we modify
      the dquot directly in the backing buffer, calculate the CRC
      immediately after the modification is complete. Hence the dquot in
      the on-disk buffer should always have a valid CRC.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit 6fcdc59d)
      bb9b8e86
  6. 05 6月, 2013 1 次提交
  7. 03 6月, 2013 7 次提交
  8. 01 6月, 2013 8 次提交
    • J
      cifs: fix off-by-one bug in build_unc_path_to_root · 1fc29bac
      Jeff Layton 提交于
      commit 839db3d1 (cifs: fix up handling of prefixpath= option) changed
      the code such that the vol->prepath no longer contained a leading
      delimiter and then fixed up the places that accessed that field to
      account for that change.
      
      One spot in build_unc_path_to_root was missed however. When doing the
      pointer addition on pos, that patch failed to account for the fact that
      we had already incremented "pos" by one when adding the length of the
      prepath. This caused a buffer overrun by one byte.
      
      This patch fixes the problem by correcting the handling of "pos".
      
      Cc: <stable@vger.kernel.org> # v3.8+
      Reported-by: NMarcus Moeller <marcus.moeller@gmx.ch>
      Reported-by: NKen Fallon <ken.fallon@gmail.com>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      1fc29bac
    • J
      reiserfs: fix deadlock with nfs racing on create/lookup · a1457c0c
      Jeff Mahoney 提交于
      Reiserfs is currently able to be deadlocked by having two NFS clients
      where one has removed and recreated a file and another is accessing the
      file with an open file handle.
      
      If one client deletes and recreates a file with timing such that the
      recreated file obtains the same [dirid, objectid] pair as the original
      file while another client accesses the file via file handle, the create
      and lookup can race and deadlock if the lookup manages to create the
      in-memory inode first.
      
      The create thread, in insert_inode_locked4, will hold the write lock
      while waiting on the other inode to be unlocked. The lookup thread,
      anywhere in the iget path, will release and reacquire the write lock while
      it schedules. If it needs to reacquire the lock while the create thread
      has it, it will never be able to make forward progress because it needs
      to reacquire the lock before ultimately unlocking the inode.
      
      This patch drops the write lock across the insert_inode_locked4 call so
      that the ordering of inode_wait -> write lock is retained. Since this
      would have been the case before the BKL push-down, this is safe.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      a1457c0c
    • J
      reiserfs: fix problems with chowning setuid file w/ xattrs · 4a857011
      Jeff Mahoney 提交于
      reiserfs_chown_xattrs() takes the iattr struct passed into ->setattr
      and uses it to iterate over all the attrs associated with a file to change
      ownership of xattrs (and transfer quota associated with the xattr files).
      
      When the setuid bit is cleared during chown, ATTR_MODE and iattr->ia_mode
      are passed to all the xattrs as well. This means that the xattr directory
      will have S_IFREG added to its mode bits.
      
      This has been prevented in practice by a missing IS_PRIVATE check
      in reiserfs_acl_chmod, which caused a double-lock to occur while holding
      the write lock. Since the file system was completely locked up, the
      writeout of the corrupted mode never happened.
      
      This patch temporarily clears everything but ATTR_UID|ATTR_GID for the
      calls to reiserfs_setattr and adds the missing IS_PRIVATE check.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      4a857011
    • J
      reiserfs: fix spurious multiple-fill in reiserfs_readdir_dentry · 0bdc7acb
      Jeff Mahoney 提交于
      After sleeping for filldir(), we check to see if the file system has
      changed and research. The next_pos pointer is updated but its value
      isn't pushed into the key used for the search itself. As a result,
      the search returns the same item that the last cycle of the loop did
      and filldir() is called multiple times with the same data.
      
      The end result is that the buffer can contain the same name multiple
      times. This can be returned to userspace or used internally in the
      xattr code where it can manifest with the following warning:
      
      jdm-20004 reiserfs_delete_xattrs: Couldn't delete all xattrs (-2)
      
      reiserfs_for_each_xattr uses reiserfs_readdir_dentry to iterate over
      the xattr names and ends up trying to unlink the same name twice. The
      second attempt fails with -ENOENT and the error is returned. At some
      point I'll need to add support into reiserfsck to remove the orphaned
      directories left behind when this occurs.
      
      The fix is to push the value into the key before researching.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      0bdc7acb
    • A
      448293aa
    • A
      hpfs: deadlock and race in directory lseek() · 31abdab9
      Al Viro 提交于
      For one thing, there's an ABBA deadlock on hpfs fs-wide lock and i_mutex
      in hpfs_dir_lseek() - there's a lot of methods that grab the former with
      the caller already holding the latter, so it must take i_mutex first.
      
      For another, locking the damn thing, carefully validating the offset,
      then dropping locks and assigning the offset is obviously racy.
      
      Moreover, we _must_ do hpfs_add_pos(), or the machinery in dnode.c
      won't modify the sucker on B-tree surgeries.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      31abdab9
    • A
      qnx6: qnx6_readdir() has a braino in pos calculation · 1d7095c7
      Al Viro 提交于
      We want to mask lower 5 bits out, not leave only those and clear the
      rest...  As it is, we end up always starting to read from the beginning
      of directory, no matter what the current position had been.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      1d7095c7
    • T
      vfs: Fix invalid ida_remove() call · 5d477b60
      Takashi Iwai 提交于
      When the group id of a shared mount is not allocated, the umount still
      tries to call mnt_release_group_id(), which eventually hits a kernel
      warning at ida_remove() spewing a message like:
        ida_remove called for id=0 which is not allocated.
      
      This patch fixes the bug simply checking the group id in the caller.
      Reported-by: NCristian Rodríguez <crrodriguez@opensuse.org>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5d477b60