1. 26 4月, 2014 3 次提交
  2. 25 4月, 2014 8 次提交
    • F
      Btrfs: correctly set profile flags on seqlock retry · f8213bdc
      Filipe Manana 提交于
      If we had to retry on the profiles seqlock (due to a concurrent write), we
      would set bits on the input flags that corresponded both to the current
      profile and to previous values of the profile.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      f8213bdc
    • F
      Btrfs: use correct key when repeating search for extent item · 9ce49a0b
      Filipe Manana 提交于
      If skinny metadata is enabled and our first tree search fails to find a
      skinny extent item, we may repeat a tree search for a "fat" extent item
      (if the previous item in the leaf is not the "fat" extent we're looking
      for). However we were not setting the new key's objectid to the right
      value, as we previously used the same key variable to peek at the previous
      item in the leaf, which has a different objectid. So just set the right
      objectid to avoid modifying/deleting a wrong item if we repeat the tree
      search.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      9ce49a0b
    • M
      Btrfs: fix inode caching vs tree log · 1c70d8fb
      Miao Xie 提交于
      Currently, with inode cache enabled, we will reuse its inode id immediately
      after unlinking file, we may hit something like following:
      
      |->iput inode
      |->return inode id into inode cache
      |->create dir,fsync
      |->power off
      
      An easy way to reproduce this problem is:
      
      mkfs.btrfs -f /dev/sdb
      mount /dev/sdb /mnt -o inode_cache,commit=100
      dd if=/dev/zero of=/mnt/data bs=1M count=10 oflag=sync
      inode_id=`ls -i /mnt/data | awk '{print $1}'`
      rm -f /mnt/data
      
      i=1
      while [ 1 ]
      do
              mkdir /mnt/dir_$i
              test1=`stat /mnt/dir_$i | grep Inode: | awk '{print $4}'`
              if [ $test1 -eq $inode_id ]
              then
      		dd if=/dev/zero of=/mnt/dir_$i/data bs=1M count=1 oflag=sync
      		echo b > /proc/sysrq-trigger
      	fi
      	sleep 1
              i=$(($i+1))
      done
      
      mount /dev/sdb /mnt
      umount /dev/sdb
      btrfs check /dev/sdb
      
      We fix this problem by adding unlinked inode's id into pinned tree,
      and we can not reuse them until committing transaction.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      1c70d8fb
    • W
      Btrfs: fix possible memory leaks in open_ctree() · 28c16cbb
      Wang Shilong 提交于
      Fix possible memory leaks in the following error handling paths:
      
      read_tree_block()
      btrfs_recover_log_trees
      btrfs_commit_super()
      btrfs_find_orphan_roots()
      btrfs_cleanup_fs_roots()
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      28c16cbb
    • W
      Btrfs: avoid triggering bug_on() when we fail to start inode caching task · e60efa84
      Wang Shilong 提交于
      When running stress test(including snapshots,balance,fstress), we trigger
      the following BUG_ON() which is because we fail to start inode caching task.
      
      [  181.131945] kernel BUG at fs/btrfs/inode-map.c:179!
      [  181.137963] invalid opcode: 0000 [#1] SMP
      [  181.217096] CPU: 11 PID: 2532 Comm: btrfs Not tainted 3.14.0 #1
      [  181.240521] task: ffff88013b621b30 ti: ffff8800b6ada000 task.ti: ffff8800b6ada000
      [  181.367506] Call Trace:
      [  181.371107]  [<ffffffffa036c1be>] btrfs_return_ino+0x9e/0x110 [btrfs]
      [  181.379191]  [<ffffffffa038082b>] btrfs_evict_inode+0x46b/0x4c0 [btrfs]
      [  181.387464]  [<ffffffff810b5a70>] ? autoremove_wake_function+0x40/0x40
      [  181.395642]  [<ffffffff811dc5fe>] evict+0x9e/0x190
      [  181.401882]  [<ffffffff811dcde3>] iput+0xf3/0x180
      [  181.408025]  [<ffffffffa03812de>] btrfs_orphan_cleanup+0x1ee/0x430 [btrfs]
      [  181.416614]  [<ffffffffa03a6abd>] btrfs_mksubvol.isra.29+0x3bd/0x450 [btrfs]
      [  181.425399]  [<ffffffffa03a6cd6>] btrfs_ioctl_snap_create_transid+0x186/0x190 [btrfs]
      [  181.435059]  [<ffffffffa03a6e3b>] btrfs_ioctl_snap_create_v2+0xeb/0x130 [btrfs]
      [  181.444148]  [<ffffffffa03a9656>] btrfs_ioctl+0xf76/0x2b90 [btrfs]
      [  181.451971]  [<ffffffff8117e565>] ? handle_mm_fault+0x475/0xe80
      [  181.459509]  [<ffffffff8167ba0c>] ? __do_page_fault+0x1ec/0x520
      [  181.467046]  [<ffffffff81185b35>] ? do_mmap_pgoff+0x2f5/0x3c0
      [  181.474393]  [<ffffffff811d4da8>] do_vfs_ioctl+0x2d8/0x4b0
      [  181.481450]  [<ffffffff811d5001>] SyS_ioctl+0x81/0xa0
      [  181.488021]  [<ffffffff81680b69>] system_call_fastpath+0x16/0x1b
      
      We should avoid triggering BUG_ON() here, instead, we output warning messages
      and clear inode_cache option.
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      e60efa84
    • W
      9d89ce65
    • D
      btrfs: replace error code from btrfs_drop_extents · 3f9e3df8
      David Sterba 提交于
      There's a case which clone does not handle and used to BUG_ON instead,
      (testcase xfstests/btrfs/035), now returns EINVAL. This error code is
      confusing to the ioctl caller, as it normally signifies errorneous
      arguments.
      
      Change it to ENOPNOTSUPP which allows a fall back to copy instead of
      clone. This does not affect the common reflink operation.
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      3f9e3df8
    • Q
      btrfs: Change the hole range to a more accurate value. · c5f7d0bb
      Qu Wenruo 提交于
      Commit 3ac0d7b9 fixed the btrfs expanding
      write problem but the hole punched is sometimes too large for some
      iovec, which has unmapped data ranges.
      This patch will change to hole range to a more accurate value using the
      counts checked by the write check routines.
      Reported-by: NAl Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      c5f7d0bb
  3. 24 4月, 2014 1 次提交
  4. 22 4月, 2014 1 次提交
    • J
      locks: rename file-private locks to "open file description locks" · 0d3f7a2d
      Jeff Layton 提交于
      File-private locks have been merged into Linux for v3.15, and *now*
      people are commenting that the name and macro definitions for the new
      file-private locks suck.
      
      ...and I can't even disagree. The names and command macros do suck.
      
      We're going to have to live with these for a long time, so it's
      important that we be happy with the names before we're stuck with them.
      The consensus on the lists so far is that they should be rechristened as
      "open file description locks".
      
      The name isn't a big deal for the kernel, but the command macros are not
      visually distinct enough from the traditional POSIX lock macros. The
      glibc and documentation folks are recommending that we change them to
      look like F_OFD_{GETLK|SETLK|SETLKW}. That lessens the chance that a
      programmer will typo one of the commands wrong, and also makes it easier
      to spot this difference when reading code.
      
      This patch makes the following changes that I think are necessary before
      v3.15 ships:
      
      1) rename the command macros to their new names. These end up in the uapi
         headers and so are part of the external-facing API. It turns out that
         glibc doesn't actually use the fcntl.h uapi header, but it's hard to
         be sure that something else won't. Changing it now is safest.
      
      2) make the the /proc/locks output display these as type "OFDLCK"
      
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Carlos O'Donell <carlos@redhat.com>
      Cc: Stefan Metzmacher <metze@samba.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Frank Filz <ffilzlnx@mindspring.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      0d3f7a2d
  5. 20 4月, 2014 3 次提交
    • N
      ext4: disable COLLAPSE_RANGE for bigalloc · 0a04b248
      Namjae Jeon 提交于
      Once COLLAPSE RANGE is be disable for ext4 with bigalloc feature till finding
      root-cause of problem. It will be enable with fixing that regression of
      xfstest(generic 075 and 091) again.
      Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: NAshish Sangwan <a.sangwan@samsung.com>
      Reviewed-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      0a04b248
    • N
      ext4: fix COLLAPSE_RANGE failure with 1KB block size · a8680e0d
      Namjae Jeon 提交于
      When formatting with 1KB or 2KB(not aligned with PAGE SIZE) block
      size, xfstests generic/075 and 091 are failing. The offset supplied to
      function truncate_pagecache_range is block size aligned. In this
      function start offset is re-aligned to PAGE_SIZE by rounding_up to the
      next page boundary.  Due to this rounding up, old data remains in the
      page cache when blocksize is less than page size and start offset is
      not aligned with page size.  In case of collapse range, we need to
      align start offset to page size boundary by doing a round down
      operation instead of round up.
      Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: NAshish Sangwan <a.sangwan@samsung.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      a8680e0d
    • E
      coredump: fix va_list corruption · 404ca80e
      Eric Dumazet 提交于
      A va_list needs to be copied in case it needs to be used twice.
      
      Thanks to Hugh for debugging this issue, leading to various panics.
      
      Tested:
      
        lpq84:~# echo "|/foobar12345 %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h" >/proc/sys/kernel/core_pattern
      
      'produce_core' is simply : main() { *(int *)0 = 1;}
      
        lpq84:~# ./produce_core
        Segmentation fault (core dumped)
        lpq84:~# dmesg | tail -1
        [  614.352947] Core dump to |/foobar12345 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 (null) pipe failed
      
      Notice the last argument was replaced by a NULL (we were lucky enough to
      not crash, but do not try this on your production machine !)
      
      After fix :
      
        lpq83:~# echo "|/foobar12345 %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h" >/proc/sys/kernel/core_pattern
        lpq83:~# ./produce_core
        Segmentation fault
        lpq83:~# dmesg | tail -1
        [  740.800441] Core dump to |/foobar12345 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 pipe failed
      
      Fixes: 5fe9d8ca ("coredump: cn_vprintf() has no reason to call vsnprintf() twice")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Diagnosed-by: NHugh Dickins <hughd@google.com>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: stable@vger.kernel.org # 3.11+
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      404ca80e
  6. 18 4月, 2014 11 次提交
  7. 17 4月, 2014 13 次提交
    • M
      cif: fix dead code · 1f80c0cc
      Michael Opdenacker 提交于
      This issue was found by Coverity (CID 1202536)
      
      This proposes a fix for a statement that creates dead code.
      The "rc < 0" statement is within code that is run
      with "rc > 0".
      
      It seems like "err < 0" was meant to be used here.
      This way, the error code is returned by the function.
      Signed-off-by: NMichael Opdenacker <michael.opdenacker@free-electrons.com>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NSteve French <smfrench@gmail.com>
      1f80c0cc
    • J
      cifs: fix error handling cifs_user_readv · bae9f746
      Jeff Layton 提交于
      Coverity says:
      
      *** CID 1202537:  Dereference after null check  (FORWARD_NULL)
      /fs/cifs/file.c: 2873 in cifs_user_readv()
      2867     		cur_len = min_t(const size_t, len - total_read, cifs_sb->rsize);
      2868     		npages = DIV_ROUND_UP(cur_len, PAGE_SIZE);
      2869
      2870     		/* allocate a readdata struct */
      2871     		rdata = cifs_readdata_alloc(npages,
      2872     					    cifs_uncached_readv_complete);
      >>>     CID 1202537:  Dereference after null check  (FORWARD_NULL)
      >>>     Comparing "rdata" to null implies that "rdata" might be null.
      2873     		if (!rdata) {
      2874     			rc = -ENOMEM;
      2875     			goto error;
      2876     		}
      2877
      2878     		rc = cifs_read_allocate_pages(rdata, npages);
      
      ...when we "goto error", rc will be non-zero, and then we end up trying
      to do a kref_put on the rdata (which is NULL). Fix this by replacing
      the "goto error" with a "break".
      
      Reported-by: <scan-admin@coverity.com>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSteve French <smfrench@gmail.com>
      bae9f746
    • B
      xfs: fix tmpfile/selinux deadlock and initialize security · 330033d6
      Brian Foster 提交于
      xfstests generic/004 reproduces an ilock deadlock using the tmpfile
      interface when selinux is enabled. This occurs because
      xfs_create_tmpfile() takes the ilock and then calls d_tmpfile(). The
      latter eventually calls into xfs_xattr_get() which attempts to get the
      lock again. E.g.:
      
      xfs_io          D ffffffff81c134c0  4096  3561   3560 0x00000080
      ffff8801176a1a68 0000000000000046 ffff8800b401b540 ffff8801176a1fd8
      00000000001d5800 00000000001d5800 ffff8800b401b540 ffff8800b401b540
      ffff8800b73a6bd0 fffffffeffffffff ffff8800b73a6bd8 ffff8800b5ddb480
      Call Trace:
      [<ffffffff8177f969>] schedule+0x29/0x70
      [<ffffffff81783a65>] rwsem_down_read_failed+0xc5/0x120
      [<ffffffffa05aa97f>] ? xfs_ilock_attr_map_shared+0x1f/0x50 [xfs]
      [<ffffffff813b3434>] call_rwsem_down_read_failed+0x14/0x30
      [<ffffffff810ed179>] ? down_read_nested+0x89/0xa0
      [<ffffffffa05aa7f2>] ? xfs_ilock+0x122/0x250 [xfs]
      [<ffffffffa05aa7f2>] xfs_ilock+0x122/0x250 [xfs]
      [<ffffffffa05aa97f>] xfs_ilock_attr_map_shared+0x1f/0x50 [xfs]
      [<ffffffffa05701d0>] xfs_attr_get+0x90/0xe0 [xfs]
      [<ffffffffa0565e07>] xfs_xattr_get+0x37/0x50 [xfs]
      [<ffffffff8124842f>] generic_getxattr+0x4f/0x70
      [<ffffffff8133fd9e>] inode_doinit_with_dentry+0x1ae/0x650
      [<ffffffff81340e0c>] selinux_d_instantiate+0x1c/0x20
      [<ffffffff813351bb>] security_d_instantiate+0x1b/0x30
      [<ffffffff81237db0>] d_instantiate+0x50/0x70
      [<ffffffff81237e85>] d_tmpfile+0xb5/0xc0
      [<ffffffffa05add02>] xfs_create_tmpfile+0x362/0x410 [xfs]
      [<ffffffffa0559ac8>] xfs_vn_tmpfile+0x18/0x20 [xfs]
      [<ffffffff81230388>] path_openat+0x228/0x6a0
      [<ffffffff810230f9>] ? sched_clock+0x9/0x10
      [<ffffffff8105a427>] ? kvm_clock_read+0x27/0x40
      [<ffffffff8124054f>] ? __alloc_fd+0xaf/0x1f0
      [<ffffffff8123101a>] do_filp_open+0x3a/0x90
      [<ffffffff817845e7>] ? _raw_spin_unlock+0x27/0x40
      [<ffffffff8124054f>] ? __alloc_fd+0xaf/0x1f0
      [<ffffffff8121e3ce>] do_sys_open+0x12e/0x210
      [<ffffffff8121e4ce>] SyS_open+0x1e/0x20
      [<ffffffff8178eda9>] system_call_fastpath+0x16/0x1b
      
      xfs_vn_tmpfile() also fails to initialize security on the newly created
      inode.
      
      Pull the d_tmpfile() call up into xfs_vn_tmpfile() after the transaction
      has been committed and the inode unlocked. Also, initialize security on
      the inode based on the parent directory provided via the tmpfile call.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      330033d6
    • E
      xfs: fix buffer use after free on IO error · 8d6c1210
      Eric Sandeen 提交于
      When testing exhaustion of dm snapshots, the following appeared
      with CONFIG_DEBUG_OBJECTS_FREE enabled:
      
      ODEBUG: free active (active state 0) object type: work_struct hint: xfs_buf_iodone_work+0x0/0x1d0 [xfs]
      
      indicating that we'd freed a buffer which still had a pending reference,
      down this path:
      
      [  190.867975]  [<ffffffff8133e6fb>] debug_check_no_obj_freed+0x22b/0x270
      [  190.880820]  [<ffffffff811da1d0>] kmem_cache_free+0xd0/0x370
      [  190.892615]  [<ffffffffa02c5924>] xfs_buf_free+0xe4/0x210 [xfs]
      [  190.905629]  [<ffffffffa02c6167>] xfs_buf_rele+0xe7/0x270 [xfs]
      [  190.911770]  [<ffffffffa034c826>] xfs_trans_read_buf_map+0x7b6/0xac0 [xfs]
      
      At issue is the fact that if IO fails in xfs_buf_iorequest,
      we'll queue completion unconditionally, and then call
      xfs_buf_rele; but if IO failed, there are no IOs remaining,
      and xfs_buf_rele will free the bp while work is still queued.
      
      Fix this by not scheduling completion if the buffer has
      an error on it; run it immediately.  The rest is only comment
      changes.
      
      Thanks to dchinner for spotting the root cause.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      8d6c1210
    • D
      xfs: wrong error sign conversion during failed DIO writes · 07d5035a
      Dave Chinner 提交于
      We negate the error value being returned from a generic function
      incorrectly. The code path that it is running in returned negative
      errors, so there is no need to negate it to get the correct error
      signs here.
      
      This was uncovered by generic/019.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      07d5035a
    • D
      xfs: unmount does not wait for shutdown during unmount · 9c23eccc
      Dave Chinner 提交于
      And interesting situation can occur if a log IO error occurs during
      the unmount of a filesystem. The cases reported have the same
      signature - the update of the superblock counters fails due to a log
      write IO error:
      
      XFS (dm-16): xfs_do_force_shutdown(0x2) called from line 1170 of file fs/xfs/xfs_log.c.  Return address = 0xffffffffa08a44a1
      XFS (dm-16): Log I/O Error Detected.  Shutting down filesystem
      XFS (dm-16): Unable to update superblock counters. Freespace may not be correct on next mount.
      XFS (dm-16): xfs_log_force: error 5 returned.
      XFS (¿-¿¿¿): Please umount the filesystem and rectify the problem(s)
      
      It can be seen that the last line of output contains a corrupt
      device name - this is because the log and xfs_mount structures have
      already been freed by the time this message is printed. A kernel
      oops closely follows.
      
      The issue is that the shutdown is occurring in a separate IO
      completion thread to the unmount. Once the shutdown processing has
      started and all the iclogs are marked with XLOG_STATE_IOERROR, the
      log shutdown code wakes anyone waiting on a log force so they can
      process the shutdown error. This wakes up the unmount code that
      is doing a synchronous transaction to update the superblock
      counters.
      
      The unmount path now sees all the iclogs are marked with
      XLOG_STATE_IOERROR and so never waits on them again, knowing that if
      it does, there will not be a wakeup trigger for it and we will hang
      the unmount if we do. Hence the unmount runs through all the
      remaining code and frees all the filesystem structures while the
      xlog_iodone() is still processing the shutdown. When the log
      shutdown processing completes, xfs_do_force_shutdown() emits the
      "Please umount the filesystem and rectify the problem(s)" message,
      and xlog_iodone() then aborts all the objects attached to the iclog.
      An iclog that has already been freed....
      
      The real issue here is that there is no serialisation point between
      the log IO and the unmount. We have serialisations points for log
      writes, log forces, reservations, etc, but we don't actually have
      any code that wakes for log IO to fully complete. We do that for all
      other types of object, so why not iclogbufs?
      
      Well, it turns out that we can easily do this. We've got xfs_buf
      handles, and that's what everyone else uses for IO serialisation.
      i.e. bp->b_sema. So, lets hold iclogbufs locked over IO, and only
      release the lock in xlog_iodone() when we are finished with the
      buffer. That way before we tear down the iclog, we can lock and
      unlock the buffer to ensure IO completion has finished completely
      before we tear it down.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Tested-by: NMike Snitzer <snitzer@redhat.com>
      Tested-by: NBob Mastors <bob.mastors@solidfire.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      9c23eccc
    • D
      xfs: collapse range is delalloc challenged · d39a2ced
      Dave Chinner 提交于
      FSX has been detecting data corruption after to collapse range
      calls. The key observation is that the offset of the last extent in
      the file was not being shifted, and hence when the file size was
      adjusted it was truncating away data because the extents handled
      been correctly shifted.
      
      Tracing indicated that before the collapse, the extent list looked
      like:
      
      ....
      ino 0x5788 state  idx 6 offset 26 block 195904 count 10 flag 0
      ino 0x5788 state  idx 7 offset 39 block 195917 count 35 flag 0
      ino 0x5788 state  idx 8 offset 86 block 195964 count 32 flag 0
      
      and after the shift of 2 blocks:
      
      ino 0x5788 state  idx 6 offset 24 block 195904 count 10 flag 0
      ino 0x5788 state  idx 7 offset 37 block 195917 count 35 flag 0
      ino 0x5788 state  idx 8 offset 86 block 195964 count 32 flag 0
      
      Note that the last extent did not change offset. After the changing
      of the file size:
      
      ino 0x5788 state  idx 6 offset 24 block 195904 count 10 flag 0
      ino 0x5788 state  idx 7 offset 37 block 195917 count 35 flag 0
      ino 0x5788 state  idx 8 offset 86 block 195964 count 30 flag 0
      
      You can see that the last extent had it's length truncated,
      indicating that we've lost data.
      
      The reason for this is that the xfs_bmap_shift_extents() loop uses
      XFS_IFORK_NEXTENTS() to determine how many extents are in the inode.
      This, unfortunately, doesn't take into account delayed allocation
      extents - it's a count of physically allocated extents - and hence
      when the file being collapsed has a delalloc extent like this one
      does prior to the range being collapsed:
      
      ....
      ino 0x5788 state  idx 4 offset 11 block 4503599627239429 count 1 flag 0
      ....
      
      it gets the count wrong and terminates the shift loop early.
      
      Fix it by using the in-memory extent array size that includes
      delayed allocation extents to determine the number of extents on the
      inode.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Tested-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      d39a2ced
    • D
      xfs: don't map ranges that span EOF for direct IO · 0e1f789d
      Dave Chinner 提交于
      Al Viro tracked down the problem that has caused generic/263 to fail
      on XFS since the test was introduced. If is caused by
      xfs_get_blocks() mapping a single extent that spans EOF without
      marking it as buffer-new() so that the direct IO code does not zero
      the tail of the block at the new EOF. This is a long standing bug
      that has been around for many, many years.
      
      Because xfs_get_blocks() starts the map before EOF, it can't set
      buffer_new(), because that causes he direct IO code to also zero
      unaligned sectors at the head of the IO. This would overwrite valid
      data with zeros, and hence we cannot validly return a single extent
      that spans EOF to direct IO.
      
      Fix this by detecting a mapping that spans EOF and truncate it down
      to EOF. This results in the the direct IO code doing the right thing
      for unaligned data blocks before EOF, and then returning to get
      another mapping for the region beyond EOF which XFS treats correctly
      by setting buffer_new() on it. This makes direct Io behave correctly
      w.r.t. tail block zeroing beyond EOF, and fsx is happy about that.
      
      Again, thanks to Al Viro for finding what I couldn't.
      
      [ dchinner: Fix for __divdi3 build error:
      Reported-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Tested-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NMark Tinguely <tinguely@sgi.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      ]
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Tested-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      0e1f789d
    • T
      sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner() · 33ac1257
      Tejun Heo 提交于
      All device_schedule_callback_owner() users are converted to use
      device_remove_file_self().  Remove now unused
      {sysfs|device}_schedule_callback_owner().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      33ac1257
    • T
      kernfs: protect lazy kernfs_iattrs allocation with mutex · 4afddd60
      Tejun Heo 提交于
      kernfs_iattrs is allocated lazily when operations which require it
      take place; unfortunately, the lazy allocation and returning weren't
      properly synchronized and when there are multiple concurrent
      operations, it might end up returning kernfs_iattrs which hasn't
      finished initialization yet or different copies to different callers.
      
      Fix it by synchronizing with a mutex.  This can be smarter with memory
      barriers but let's go there if it actually turns out to be necessary.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Link: http://lkml.kernel.org/g/533ABA32.9080602@oracle.comReported-by: NSasha Levin <sasha.levin@oracle.com>
      Cc: stable@vger.kernel.org # 3.14
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4afddd60
    • T
      fs: Don't return 0 from get_anon_bdev · a2a4dc49
      Thomas Bächler 提交于
      Commit 9e30cc95 removed an internal mount. This
      has the side-effect that rootfs now has FSID 0. Many
      userspace utilities assume that st_dev in struct stat
      is never 0, so this change breaks a number of tools in
      early userspace.
      
      Since we don't know how many userspace programs are affected,
      make sure that FSID is at least 1.
      
      References: http://article.gmane.org/gmane.linux.kernel/1666905
      References: http://permalink.gmane.org/gmane.linux.utilities.util-linux-ng/8557
      Cc: 3.14 <stable@vger.kernel.org>
      Signed-off-by: NThomas Bächler <thomas@archlinux.org>
      Acked-by: NTejun Heo <tj@kernel.org>
      Acked-by: NH. Peter Anvin <hpa@zytor.com>
      Tested-by: NAlexandre Demers <alexandre.f.demers@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a2a4dc49
    • C
      fs: cifs: remove unused variable. · 8e3ecc87
      Cyril Roelandt 提交于
      In SMB2_set_compression(), the "res_key" variable is only initialized to NULL
      and later kfreed. It is therefore useless and should be removed.
      
      Found with the following semantic patch:
      
      <smpl>
      @@
      identifier foo;
      identifier f;
      type T;
      @@
      * f(...) {
      ...
      * T *foo = NULL;
      ... when forall
          when != foo
      * kfree(foo);
      ...
      }
      </smpl>
      Signed-off-by: NCyril Roelandt <tipecaml@gmail.com>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      8e3ecc87
    • S
      Return correct error on query of xattr on file with empty xattrs · 60977fcc
      Steve French 提交于
      xfstest 020 detected a problem with cifs xattr handling.  When a file
      had an empty xattr list, we returned success (with an empty xattr value)
      on query of particular xattrs rather than returning ENODATA.
      This patch fixes it so that query of an xattr returns ENODATA when the
      xattr list is empty for the file.
      Signed-off-by: NSteve French <smfrench@gmail.com>
      Reviewed-by: NJeff Layton <jlayton@redhat.com>
      60977fcc