1. 17 4月, 2014 6 次提交
    • B
      xfs: fix tmpfile/selinux deadlock and initialize security · 330033d6
      Brian Foster 提交于
      xfstests generic/004 reproduces an ilock deadlock using the tmpfile
      interface when selinux is enabled. This occurs because
      xfs_create_tmpfile() takes the ilock and then calls d_tmpfile(). The
      latter eventually calls into xfs_xattr_get() which attempts to get the
      lock again. E.g.:
      
      xfs_io          D ffffffff81c134c0  4096  3561   3560 0x00000080
      ffff8801176a1a68 0000000000000046 ffff8800b401b540 ffff8801176a1fd8
      00000000001d5800 00000000001d5800 ffff8800b401b540 ffff8800b401b540
      ffff8800b73a6bd0 fffffffeffffffff ffff8800b73a6bd8 ffff8800b5ddb480
      Call Trace:
      [<ffffffff8177f969>] schedule+0x29/0x70
      [<ffffffff81783a65>] rwsem_down_read_failed+0xc5/0x120
      [<ffffffffa05aa97f>] ? xfs_ilock_attr_map_shared+0x1f/0x50 [xfs]
      [<ffffffff813b3434>] call_rwsem_down_read_failed+0x14/0x30
      [<ffffffff810ed179>] ? down_read_nested+0x89/0xa0
      [<ffffffffa05aa7f2>] ? xfs_ilock+0x122/0x250 [xfs]
      [<ffffffffa05aa7f2>] xfs_ilock+0x122/0x250 [xfs]
      [<ffffffffa05aa97f>] xfs_ilock_attr_map_shared+0x1f/0x50 [xfs]
      [<ffffffffa05701d0>] xfs_attr_get+0x90/0xe0 [xfs]
      [<ffffffffa0565e07>] xfs_xattr_get+0x37/0x50 [xfs]
      [<ffffffff8124842f>] generic_getxattr+0x4f/0x70
      [<ffffffff8133fd9e>] inode_doinit_with_dentry+0x1ae/0x650
      [<ffffffff81340e0c>] selinux_d_instantiate+0x1c/0x20
      [<ffffffff813351bb>] security_d_instantiate+0x1b/0x30
      [<ffffffff81237db0>] d_instantiate+0x50/0x70
      [<ffffffff81237e85>] d_tmpfile+0xb5/0xc0
      [<ffffffffa05add02>] xfs_create_tmpfile+0x362/0x410 [xfs]
      [<ffffffffa0559ac8>] xfs_vn_tmpfile+0x18/0x20 [xfs]
      [<ffffffff81230388>] path_openat+0x228/0x6a0
      [<ffffffff810230f9>] ? sched_clock+0x9/0x10
      [<ffffffff8105a427>] ? kvm_clock_read+0x27/0x40
      [<ffffffff8124054f>] ? __alloc_fd+0xaf/0x1f0
      [<ffffffff8123101a>] do_filp_open+0x3a/0x90
      [<ffffffff817845e7>] ? _raw_spin_unlock+0x27/0x40
      [<ffffffff8124054f>] ? __alloc_fd+0xaf/0x1f0
      [<ffffffff8121e3ce>] do_sys_open+0x12e/0x210
      [<ffffffff8121e4ce>] SyS_open+0x1e/0x20
      [<ffffffff8178eda9>] system_call_fastpath+0x16/0x1b
      
      xfs_vn_tmpfile() also fails to initialize security on the newly created
      inode.
      
      Pull the d_tmpfile() call up into xfs_vn_tmpfile() after the transaction
      has been committed and the inode unlocked. Also, initialize security on
      the inode based on the parent directory provided via the tmpfile call.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      330033d6
    • E
      xfs: fix buffer use after free on IO error · 8d6c1210
      Eric Sandeen 提交于
      When testing exhaustion of dm snapshots, the following appeared
      with CONFIG_DEBUG_OBJECTS_FREE enabled:
      
      ODEBUG: free active (active state 0) object type: work_struct hint: xfs_buf_iodone_work+0x0/0x1d0 [xfs]
      
      indicating that we'd freed a buffer which still had a pending reference,
      down this path:
      
      [  190.867975]  [<ffffffff8133e6fb>] debug_check_no_obj_freed+0x22b/0x270
      [  190.880820]  [<ffffffff811da1d0>] kmem_cache_free+0xd0/0x370
      [  190.892615]  [<ffffffffa02c5924>] xfs_buf_free+0xe4/0x210 [xfs]
      [  190.905629]  [<ffffffffa02c6167>] xfs_buf_rele+0xe7/0x270 [xfs]
      [  190.911770]  [<ffffffffa034c826>] xfs_trans_read_buf_map+0x7b6/0xac0 [xfs]
      
      At issue is the fact that if IO fails in xfs_buf_iorequest,
      we'll queue completion unconditionally, and then call
      xfs_buf_rele; but if IO failed, there are no IOs remaining,
      and xfs_buf_rele will free the bp while work is still queued.
      
      Fix this by not scheduling completion if the buffer has
      an error on it; run it immediately.  The rest is only comment
      changes.
      
      Thanks to dchinner for spotting the root cause.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      8d6c1210
    • D
      xfs: wrong error sign conversion during failed DIO writes · 07d5035a
      Dave Chinner 提交于
      We negate the error value being returned from a generic function
      incorrectly. The code path that it is running in returned negative
      errors, so there is no need to negate it to get the correct error
      signs here.
      
      This was uncovered by generic/019.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      07d5035a
    • D
      xfs: unmount does not wait for shutdown during unmount · 9c23eccc
      Dave Chinner 提交于
      And interesting situation can occur if a log IO error occurs during
      the unmount of a filesystem. The cases reported have the same
      signature - the update of the superblock counters fails due to a log
      write IO error:
      
      XFS (dm-16): xfs_do_force_shutdown(0x2) called from line 1170 of file fs/xfs/xfs_log.c.  Return address = 0xffffffffa08a44a1
      XFS (dm-16): Log I/O Error Detected.  Shutting down filesystem
      XFS (dm-16): Unable to update superblock counters. Freespace may not be correct on next mount.
      XFS (dm-16): xfs_log_force: error 5 returned.
      XFS (¿-¿¿¿): Please umount the filesystem and rectify the problem(s)
      
      It can be seen that the last line of output contains a corrupt
      device name - this is because the log and xfs_mount structures have
      already been freed by the time this message is printed. A kernel
      oops closely follows.
      
      The issue is that the shutdown is occurring in a separate IO
      completion thread to the unmount. Once the shutdown processing has
      started and all the iclogs are marked with XLOG_STATE_IOERROR, the
      log shutdown code wakes anyone waiting on a log force so they can
      process the shutdown error. This wakes up the unmount code that
      is doing a synchronous transaction to update the superblock
      counters.
      
      The unmount path now sees all the iclogs are marked with
      XLOG_STATE_IOERROR and so never waits on them again, knowing that if
      it does, there will not be a wakeup trigger for it and we will hang
      the unmount if we do. Hence the unmount runs through all the
      remaining code and frees all the filesystem structures while the
      xlog_iodone() is still processing the shutdown. When the log
      shutdown processing completes, xfs_do_force_shutdown() emits the
      "Please umount the filesystem and rectify the problem(s)" message,
      and xlog_iodone() then aborts all the objects attached to the iclog.
      An iclog that has already been freed....
      
      The real issue here is that there is no serialisation point between
      the log IO and the unmount. We have serialisations points for log
      writes, log forces, reservations, etc, but we don't actually have
      any code that wakes for log IO to fully complete. We do that for all
      other types of object, so why not iclogbufs?
      
      Well, it turns out that we can easily do this. We've got xfs_buf
      handles, and that's what everyone else uses for IO serialisation.
      i.e. bp->b_sema. So, lets hold iclogbufs locked over IO, and only
      release the lock in xlog_iodone() when we are finished with the
      buffer. That way before we tear down the iclog, we can lock and
      unlock the buffer to ensure IO completion has finished completely
      before we tear it down.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Tested-by: NMike Snitzer <snitzer@redhat.com>
      Tested-by: NBob Mastors <bob.mastors@solidfire.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      9c23eccc
    • D
      xfs: collapse range is delalloc challenged · d39a2ced
      Dave Chinner 提交于
      FSX has been detecting data corruption after to collapse range
      calls. The key observation is that the offset of the last extent in
      the file was not being shifted, and hence when the file size was
      adjusted it was truncating away data because the extents handled
      been correctly shifted.
      
      Tracing indicated that before the collapse, the extent list looked
      like:
      
      ....
      ino 0x5788 state  idx 6 offset 26 block 195904 count 10 flag 0
      ino 0x5788 state  idx 7 offset 39 block 195917 count 35 flag 0
      ino 0x5788 state  idx 8 offset 86 block 195964 count 32 flag 0
      
      and after the shift of 2 blocks:
      
      ino 0x5788 state  idx 6 offset 24 block 195904 count 10 flag 0
      ino 0x5788 state  idx 7 offset 37 block 195917 count 35 flag 0
      ino 0x5788 state  idx 8 offset 86 block 195964 count 32 flag 0
      
      Note that the last extent did not change offset. After the changing
      of the file size:
      
      ino 0x5788 state  idx 6 offset 24 block 195904 count 10 flag 0
      ino 0x5788 state  idx 7 offset 37 block 195917 count 35 flag 0
      ino 0x5788 state  idx 8 offset 86 block 195964 count 30 flag 0
      
      You can see that the last extent had it's length truncated,
      indicating that we've lost data.
      
      The reason for this is that the xfs_bmap_shift_extents() loop uses
      XFS_IFORK_NEXTENTS() to determine how many extents are in the inode.
      This, unfortunately, doesn't take into account delayed allocation
      extents - it's a count of physically allocated extents - and hence
      when the file being collapsed has a delalloc extent like this one
      does prior to the range being collapsed:
      
      ....
      ino 0x5788 state  idx 4 offset 11 block 4503599627239429 count 1 flag 0
      ....
      
      it gets the count wrong and terminates the shift loop early.
      
      Fix it by using the in-memory extent array size that includes
      delayed allocation extents to determine the number of extents on the
      inode.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Tested-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      d39a2ced
    • D
      xfs: don't map ranges that span EOF for direct IO · 0e1f789d
      Dave Chinner 提交于
      Al Viro tracked down the problem that has caused generic/263 to fail
      on XFS since the test was introduced. If is caused by
      xfs_get_blocks() mapping a single extent that spans EOF without
      marking it as buffer-new() so that the direct IO code does not zero
      the tail of the block at the new EOF. This is a long standing bug
      that has been around for many, many years.
      
      Because xfs_get_blocks() starts the map before EOF, it can't set
      buffer_new(), because that causes he direct IO code to also zero
      unaligned sectors at the head of the IO. This would overwrite valid
      data with zeros, and hence we cannot validly return a single extent
      that spans EOF to direct IO.
      
      Fix this by detecting a mapping that spans EOF and truncate it down
      to EOF. This results in the the direct IO code doing the right thing
      for unaligned data blocks before EOF, and then returning to get
      another mapping for the region beyond EOF which XFS treats correctly
      by setting buffer_new() on it. This makes direct Io behave correctly
      w.r.t. tail block zeroing beyond EOF, and fsx is happy about that.
      
      Again, thanks to Al Viro for finding what I couldn't.
      
      [ dchinner: Fix for __divdi3 build error:
      Reported-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Tested-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NMark Tinguely <tinguely@sgi.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      ]
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Tested-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      0e1f789d
  2. 14 4月, 2014 5 次提交
  3. 13 4月, 2014 1 次提交
    • L
      ceph: fix pr_fmt() redefinition · 96c57ade
      Linus Torvalds 提交于
      The vfs merge caused a latent bug to show up:
      
         In file included from fs/ceph/super.h:4:0,
                          from fs/ceph/ioctl.c:3:
         include/linux/ceph/ceph_debug.h:4:0: warning: "pr_fmt" redefined [enabled by default]
          #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
          ^
         In file included from include/linux/kernel.h:13:0,
                          from include/linux/uio.h:12,
                          from include/linux/socket.h:7,
                          from include/uapi/linux/in.h:22,
                          from include/linux/in.h:23,
                          from fs/ceph/ioctl.c:1:
         include/linux/printk.h:214:0: note: this is the location of the previous definition
          #define pr_fmt(fmt) fmt
          ^
      
      where the reason is that <linux/ceph_debug.h> is included much too late
      for the "pr_fmt()" define.
      
      The include of <linux/ceph_debug.h> needs to be the first include in the
      file, but fs/ceph/ioctl.c had for some reason missed that, and it wasn't
      noticeable until some unrelated header file changes brought in an
      indirect earlier include of <linux/kernel.h>.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      96c57ade
  4. 12 4月, 2014 3 次提交
    • A
      cifs: fix the race in cifs_writev() · 19dfc1f5
      Al Viro 提交于
      O_APPEND handling there hadn't been completely fixed by Pavel's
      patch; it checks the right value, but it's racy - we can't really
      do that until i_mutex has been taken.
      
      Fix by switching to __generic_file_aio_write() (open-coding
      generic_file_aio_write(), actually) and pulling mutex_lock() above
      inode_size_read().
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      19dfc1f5
    • A
      ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure · eab87235
      Al Viro 提交于
      ceph_osdc_put_request(ERR_PTR(-error)) oopses.  What we want there
      is break, not goto out.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      eab87235
    • D
      net: Fix use after free by removing length arg from sk_data_ready callbacks. · 676d2369
      David S. Miller 提交于
      Several spots in the kernel perform a sequence like:
      
      	skb_queue_tail(&sk->s_receive_queue, skb);
      	sk->sk_data_ready(sk, skb->len);
      
      But at the moment we place the SKB onto the socket receive queue it
      can be consumed and freed up.  So this skb->len access is potentially
      to freed up memory.
      
      Furthermore, the skb->len can be modified by the consumer so it is
      possible that the value isn't accurate.
      
      And finally, no actual implementation of this callback actually uses
      the length argument.  And since nobody actually cared about it's
      value, lots of call sites pass arbitrary values in such as '0' and
      even '1'.
      
      So just remove the length argument from the callback, that way there
      is no confusion whatsoever and all of these use-after-free cases get
      fixed as a side effect.
      
      Based upon a patch by Eric Dumazet and his suggestion to audit this
      issue tree-wide.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      676d2369
  5. 11 4月, 2014 2 次提交
    • W
      Btrfs: fix compile warnings on on avr32 platform · e4fbaee2
      Wang Shilong 提交于
      fs/btrfs/scrub.c: In function 'get_raid56_logic_offset':
      fs/btrfs/scrub.c:2269: warning: comparison of distinct pointer types lacks a cast
      fs/btrfs/scrub.c:2269: warning: right shift count >= width of type
      fs/btrfs/scrub.c:2269: warning: passing argument 1 of '__div64_32' from incompatible pointer type
      
      Since @rot is an int type, we should not use do_div(), fix it.
      Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      e4fbaee2
    • H
      btrfs: allow mounting btrfs subvolumes with different ro/rw options · 0723a047
      Harald Hoyer 提交于
      Given the following /etc/fstab entries:
      
      /dev/sda3 /mnt/foo btrfs subvol=foo,ro 0 0
      /dev/sda3 /mnt/bar btrfs subvol=bar,rw 0 0
      
      you can't issue:
      
      $ mount /mnt/foo
      $ mount /mnt/bar
      
      You would have to do:
      
      $ mount /mnt/foo
      $ mount -o remount,rw /mnt/foo
      $ mount --bind -o remount,ro /mnt/foo
      $ mount /mnt/bar
      
      or
      
      $ mount /mnt/bar
      $ mount --rw /mnt/foo
      $ mount --bind -o remount,ro /mnt/foo
      
      With this patch you can do
      
      $ mount /mnt/foo
      $ mount /mnt/bar
      
      $ cat /proc/self/mountinfo
      49 33 0:41 /foo /mnt/foo ro,relatime shared:36 - btrfs /dev/sda3 rw,ssd,space_cache
      87 33 0:41 /bar /mnt/bar rw,relatime shared:74 - btrfs /dev/sda3 rw,ssd,space_cache
      Signed-off-by: NChris Mason <clm@fb.com>
      0723a047
  6. 09 4月, 2014 9 次提交
  7. 08 4月, 2014 14 次提交
新手
引导
客服 返回
顶部