1. 10 6月, 2014 21 次提交
    • M
      Btrfs: reclaim the reserved metadata space at background · 21c7e756
      Miao Xie 提交于
      Before applying this patch, the task had to reclaim the metadata space
      by itself if the metadata space was not enough. And When the task started
      the space reclamation, all the other tasks which wanted to reserve the
      metadata space were blocked. At some cases, they would be blocked for
      a long time, it made the performance fluctuate wildly.
      
      So we introduce the background metadata space reclamation, when the space
      is about to be exhausted, we insert a reclaim work into the workqueue, the
      worker of the workqueue helps us to reclaim the reserved space at the
      background. By this way, the tasks needn't reclaim the space by themselves at
      most cases, and even if the tasks have to reclaim the space or are blocked
      for the space reclamation, they will get enough space more quickly.
      
      Here is my test result(Tested by compilebench):
       Memory:	2GB
       CPU:		2Cores * 1CPU
       Partition:	40GB(SSD)
      
      Test command:
       # compilebench -D <mnt> -m
      
      Without this patch:
       intial create total runs 30 avg 54.36 MB/s (user 0.52s sys 2.44s)
       compile total runs 30 avg 123.72 MB/s (user 0.13s sys 1.17s)
       read compiled tree total runs 3 avg 81.15 MB/s (user 0.74s sys 4.89s)
       delete compiled tree total runs 30 avg 5.32 seconds (user 0.35s sys 4.37s)
      
      With this patch:
       intial create total runs 30 avg 59.80 MB/s (user 0.52s sys 2.53s)
       compile total runs 30 avg 151.44 MB/s (user 0.13s sys 1.11s)
       read compiled tree total runs 3 avg 83.25 MB/s (user 0.76s sys 4.91s)
       delete compiled tree total runs 30 avg 5.29 seconds (user 0.34s sys 4.34s)
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      21c7e756
    • M
      Btrfs: output warning instead of error when loading free space cache failed · 32d6b47f
      Miao Xie 提交于
      If we fail to load a free space cache, we can rebuild it from the extent tree,
      so it is not a serious error, we should not output a error message that
      would make the users uncomfortable. This patch uses warning message instead
      of it.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      32d6b47f
    • Q
      btrfs: Add ctime/mtime update for btrfs device add/remove. · 5a1972bd
      Qu Wenruo 提交于
      Btrfs will send uevent to udev inform the device change,
      but ctime/mtime for the block device inode is not udpated, which cause
      libblkid used by btrfs-progs unable to detect device change and use old
      cache, causing 'btrfs dev scan; btrfs dev rmove; btrfs dev scan' give an
      error message.
      Reported-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
      Cc: Karel Zak <kzak@redhat.com>
      Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      5a1972bd
    • D
      btrfs: assert that send is not in progres before root deletion · 61155aa0
      David Sterba 提交于
      CC: Miao Xie <miaox@cn.fujitsu.com>
      CC: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      61155aa0
    • D
      btrfs: protect snapshots from deleting during send · 521e0546
      David Sterba 提交于
      The patch "Btrfs: fix protection between send and root deletion"
      (18f687d5) does not actually prevent to delete the snapshot
      and just takes care during background cleaning, but this seems rather
      user unfriendly, this patch implements the idea presented in
      
      http://www.spinics.net/lists/linux-btrfs/msg30813.html
      
      - add an internal root_item flag to denote a dead root
      - check if the send_in_progress is set and refuse to delete, otherwise
        set the flag and proceed
      - check the flag in send similar to the btrfs_root_readonly checks, for
        all involved roots
      
      The root lookup in send via btrfs_read_fs_root_no_name will check if the
      root is really dead or not. If it is, ENOENT, aborted send. If it's
      alive, it's protected by send_in_progress, send can continue.
      
      CC: Miao Xie <miaox@cn.fujitsu.com>
      CC: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      521e0546
    • D
      btrfs: remove redundant null check in btrfs_dentry_release() · 944a4515
      Daeseok Youn 提交于
      It doesn't need to check NULL for kfree()
      Signed-off-by: NDaeseok Youn <daeseok.youn@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      944a4515
    • F
      Btrfs: implement inode_operations callback tmpfile · ef3b9af5
      Filipe Manana 提交于
      This implements the tmpfile callback of struct inode_operations, introduced
      in the linux kernel 3.11, and implemented already by some filesystems. This
      callback is invoked by the VFS when the flag O_TMPFILE is passed to the open
      system call.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      ef3b9af5
    • D
      btrfs: make FS_INFO ioctl available to anyone · e4ef90ff
      David Sterba 提交于
      This ioctl provides basic info about the filesystem that can be obtained
      in other ways (eg. sysfs), there's no reason to restrict it to
      CAP_SYSADMIN.
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      e4ef90ff
    • D
      btrfs: make DEV_INFO ioctl available to anyone · 7d6213c5
      David Sterba 提交于
      This ioctl provides basic info about the devices that can be obtained in
      other ways (eg. sysfs), there's no reason to restrict it to
      CAP_SYSADMIN.
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      7d6213c5
    • D
      btrfs: export more from FS_INFO to sysfs · df93589a
      David Sterba 提交于
      Similar to the FS_INFO updates, export the basic filesystem info through
      sysfs: node size, sector size and clone alignment.
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      df93589a
    • D
      btrfs: retrieve more info from FS_INFO ioctl · 80a773fb
      David Sterba 提交于
      Provide the basic information about filesystem through the ioctl:
      * b-tree node size (same as leaf size)
      * sector size
      * expected alignment of CLONE_RANGE and EXTENT_SAME ioctl arguments
      
      Backward compatibility: if the values are 0, kernel does not provide
      this information, the applications should ignore them.
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      80a773fb
    • D
      btrfs: balance filter: add limit of processed chunks · 7d824b6f
      David Sterba 提交于
      This started as debugging helper, to watch the effects of converting
      between raid levels on multiple devices, but could be useful standalone.
      
      In my case the usage filter was not finegrained enough and led to
      converting too many chunks at once. Another example use is in connection
      with drange+devid or vrange filters that allow to work with a specific
      chunk or even with a chunk on a given device.
      
      The limit filter applies last, the value of 0 means no limiting.
      
      CC: Ilya Dryomov <idryomov@gmail.com>
      CC: Hugo Mills <hugo@carfax.org.uk>
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      7d824b6f
    • F
      Btrfs: fix leaf corruption caused by ENOSPC while hole punching · fc19c5e7
      Filipe Manana 提交于
      While running a stress test with multiple threads writing to the same btrfs
      file system, I ended up with a situation where a leaf was corrupted in that
      it had 2 file extent item keys that had the same exact key. I was able to
      detect this quickly thanks to the following patch which triggers an assertion
      as soon as a leaf is marked dirty if there are duplicated keys or out of order
      keys:
      
          Btrfs: check if items are ordered when a leaf is marked dirty
          (https://patchwork.kernel.org/patch/3955431/)
      
      Basically while running the test, I got the following in dmesg:
      
          [28877.415877] WARNING: CPU: 2 PID: 10706 at fs/btrfs/file.c:553 btrfs_drop_extent_cache+0x435/0x440 [btrfs]()
          (...)
          [28877.415917] Call Trace:
          [28877.415922]  [<ffffffff816f1189>] dump_stack+0x4e/0x68
          [28877.415926]  [<ffffffff8104a32c>] warn_slowpath_common+0x8c/0xc0
          [28877.415929]  [<ffffffff8104a37a>] warn_slowpath_null+0x1a/0x20
          [28877.415944]  [<ffffffffa03775a5>] btrfs_drop_extent_cache+0x435/0x440 [btrfs]
          [28877.415949]  [<ffffffff8118e7be>] ? kmem_cache_alloc+0xfe/0x1c0
          [28877.415962]  [<ffffffffa03777d9>] fill_holes+0x229/0x3e0 [btrfs]
          [28877.415972]  [<ffffffffa0345865>] ? block_rsv_add_bytes+0x55/0x80 [btrfs]
          [28877.415984]  [<ffffffffa03792cb>] btrfs_fallocate+0xb6b/0xc20 [btrfs]
          (...)
          [29854.132560] BTRFS critical (device sdc): corrupt leaf, bad key order: block=955232256,root=1, slot=24
          [29854.132565] BTRFS info (device sdc): leaf 955232256 total ptrs 40 free space 778
          (...)
          [29854.132637] 	item 23 key (3486 108 667648) itemoff 2694 itemsize 53
          [29854.132638] 		extent data disk bytenr 14574411776 nr 286720
          [29854.132639] 		extent data offset 0 nr 286720 ram 286720
          [29854.132640] 	item 24 key (3486 108 954368) itemoff 2641 itemsize 53
          [29854.132641] 		extent data disk bytenr 0 nr 0
          [29854.132643] 		extent data offset 0 nr 0 ram 0
          [29854.132644] 	item 25 key (3486 108 954368) itemoff 2588 itemsize 53
          [29854.132645] 		extent data disk bytenr 8699670528 nr 77824
          [29854.132646] 		extent data offset 0 nr 77824 ram 77824
          [29854.132647] 	item 26 key (3486 108 1146880) itemoff 2535 itemsize 53
          [29854.132648] 		extent data disk bytenr 8699670528 nr 77824
          [29854.132649] 		extent data offset 0 nr 77824 ram 77824
          (...)
          [29854.132707] kernel BUG at fs/btrfs/ctree.h:3901!
          (...)
          [29854.132771] Call Trace:
          [29854.132779]  [<ffffffffa0342b5c>] setup_items_for_insert+0x2dc/0x400 [btrfs]
          [29854.132791]  [<ffffffffa0378537>] __btrfs_drop_extents+0xba7/0xdd0 [btrfs]
          [29854.132794]  [<ffffffff8109c0d6>] ? trace_hardirqs_on_caller+0x16/0x1d0
          [29854.132797]  [<ffffffff8109c29d>] ? trace_hardirqs_on+0xd/0x10
          [29854.132800]  [<ffffffff8118e7be>] ? kmem_cache_alloc+0xfe/0x1c0
          [29854.132810]  [<ffffffffa036783b>] insert_reserved_file_extent.constprop.66+0xab/0x310 [btrfs]
          [29854.132820]  [<ffffffffa036a6c6>] __btrfs_prealloc_file_range+0x116/0x340 [btrfs]
          [29854.132830]  [<ffffffffa0374d53>] btrfs_prealloc_file_range+0x23/0x30 [btrfs]
          (...)
      
      So this is caused by getting an -ENOSPC error while punching a file hole, more
      specifically, we get -ENOSPC error from __btrfs_drop_extents in the while loop
      of file.c:btrfs_punch_hole() when it's unable to modify the btree to delete one
      or more file extent items due to lack of enough free space. When this happens,
      in btrfs_punch_hole(), we attempt to reclaim free space by switching our transaction
      block reservation object to root->fs_info->trans_block_rsv, end our transaction and
      start a new transaction basically - and, we keep increasing our current offset
      (cur_offset) as long as it's smaller than the end of the target range (lockend) -
      this makes use leave the loop with cur_offset == drop_end which in turn makes us
      call fill_holes() for inserting a file extent item that represents a 0 bytes range
      hole (and this insertion succeeds, as in the meanwhile more space became available).
      
      This 0 bytes file hole extent item is a problem because any subsequent caller of
      __btrfs_drop_extents (regular file writes, or fallocate calls for e.g.), with a
      start file offset that is equal to the offset of the hole, will not remove this
      extent item due to the following conditional in the while loop of
      __btrfs_drop_extents:
      
          if (extent_end <= search_start) {
                  path->slots[0]++;
                  goto next_slot;
          }
      
      This later makes the call to setup_items_for_insert() (at the very end of
      __btrfs_drop_extents), insert a new file extent item with the same offset as
      the 0 bytes file hole extent item that follows it. Needless is to say that this
      causes chaos, either when reading the leaf from disk (btree_readpage_end_io_hook),
      where we perform leaf sanity checks or in subsequent operations that manipulate
      file extent items, as in the fallocate call as shown by the dmesg trace above.
      
      Without my other patch to perform the leaf sanity checks once a leaf is marked
      as dirty (if the integrity checker is enabled), it would have been much harder
      to debug this issue.
      
      This change might fix a few similar issues reported by users in the mailing
      list regarding assertion failures in btrfs_set_item_key_safe calls performed
      by __btrfs_drop_extents, such as the following report:
      
          http://comments.gmane.org/gmane.comp.file-systems.btrfs/32938
      
      Asking fill_holes() to create a 0 bytes wide file hole item also produced the
      first warning in the trace above, as we passed a range to btrfs_drop_extent_cache
      that has an end smaller (by -1) than its start.
      
      On 3.14 kernels this issue manifests itself through leaf corruption, as we get
      duplicated file extent item keys in a leaf when calling setup_items_for_insert(),
      but on older kernels, setup_items_for_insert() isn't called by __btrfs_drop_extents(),
      instead we have callers of __btrfs_drop_extents(), namely the functions
      inode.c:insert_inline_extent() and inode.c:insert_reserved_file_extent(), calling
      btrfs_insert_empty_item() to insert the new file extent item, which would fail with
      error -EEXIST, instead of inserting a duplicated key - which is still a serious
      issue as it would make all similar file extent item replace operations keep
      failing if they target the same file range.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      fc19c5e7
    • L
      Btrfs: do not increment on bio_index one by one · d2cbf2a2
      Liu Bo 提交于
      'bio_index' is just a index, it's really not necessary to do increment
      one by one.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      d2cbf2a2
    • F
      Btrfs: read inode size after acquiring the mutex when punching a hole · a1a50f60
      Filipe Manana 提交于
      In a previous change, commit 12870f1c,
      I accidentally moved the roundup of inode->i_size to outside of the
      critical section delimited by the inode mutex, which is not atomic and
      not correct since the size can be changed by other task before we acquire
      the mutex. Therefore fix it.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      a1a50f60
    • T
      btrfs: Remove unnecessary check for NULL · 7fb18a06
      Tobias Klauser 提交于
      iput() already checks for the inode being NULL, thus it's unnecessary to
      check before calling.
      Signed-off-by: NTobias Klauser <tklauser@distanz.ch>
      Signed-off-by: NChris Mason <clm@fb.com>
      7fb18a06
    • Z
      btrfs: fix inline compressed read err corruption · 166ae5a4
      Zach Brown 提交于
      uncompress_inline() is dropping the error from btrfs_decompress() after
      testing it and zeroing the page that was supposed to hold decompressed
      data.  This can silently turn compressed inline data in to zeros if
      decompression fails due to corrupt compressed data or memory allocation
      failure.
      
      I verified this by manually forcing the error from btrfs_decompress()
      for a silly named copy of od:
      
      	if (!strcmp(current->comm, "failod"))
      		ret = -ENOMEM;
      
        # od -x /mnt/btrfs/dir/80 | head -1
        0000000 3031 3038 310a 2d30 6f70 6e69 0a74 3031
        # echo 3 > /proc/sys/vm/drop_caches
        # cp $(which od) /tmp/failod
        # /tmp/failod -x /mnt/btrfs/dir/80 | head -1
        0000000 0000 0000 0000 0000 0000 0000 0000 0000
      
      The fix is to pass the error to its caller.  Which still has a BUG_ON().
      So we fix that too.
      
      There seems to be no reason for the zeroing of the page on the error
      from btrfs_decompress() but not from the allocation error a few lines
      above.  So the page zeroing is removed.
      Signed-off-by: NZach Brown <zab@redhat.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      166ae5a4
    • Z
      btrfs: return ptr error from compression workspace · 774bcb35
      Zach Brown 提交于
      The btrfs compression wrappers translated errors from workspace
      allocation to either -ENOMEM or -1.  The compression type workspace
      allocators are already returning a ERR_PTR(-ENOMEM).  Just return that
      and get rid of the magical -1.
      
      This helps a future patch return errors from the compression wrappers.
      Signed-off-by: NZach Brown <zab@redhat.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      774bcb35
    • Z
      btrfs: return errno instead of -1 from compression · 60e1975a
      Zach Brown 提交于
      The compression layer seems to have been built to return -1 and have
      callers make up errors that make sense.  This isn't great because there
      are different errors that originate down in the compression layer.
      
      Let's return real negative errnos from the compression layer so that
      callers can pass on the error without having to guess what happened.
      ENOMEM for allocation failure, E2BIG when compression exceeds the
      uncompressed input, and EIO for everything else.
      
      This helps a future path return errors from btrfs_decompress().
      Signed-off-by: NZach Brown <zab@redhat.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      60e1975a
    • S
      btrfs: check_int: propagate out-of-memory error upwards · 98806b44
      Stefan Behrens 提交于
      This issue was not causing any harm but IMO (and in the opinion of the
      static code checker) it is better to propagate this error status upwards.
      Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      98806b44
    • F
      Btrfs: fix hang on error (such as ENOSPC) when writing extent pages · 61391d56
      Filipe Manana 提交于
      When running low on available disk space and having several processes
      doing buffered file IO, I got the following trace in dmesg:
      
      [ 4202.720152] INFO: task kworker/u8:1:5450 blocked for more than 120 seconds.
      [ 4202.720401]       Not tainted 3.13.0-fdm-btrfs-next-26+ #1
      [ 4202.720596] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 4202.720874] kworker/u8:1    D 0000000000000001     0  5450      2 0x00000000
      [ 4202.720904] Workqueue: btrfs-flush_delalloc normal_work_helper [btrfs]
      [ 4202.720908]  ffff8801f62ddc38 0000000000000082 ffff880203ac2490 00000000001d3f40
      [ 4202.720913]  ffff8801f62ddfd8 00000000001d3f40 ffff8800c4f0c920 ffff880203ac2490
      [ 4202.720918]  00000000001d4a40 ffff88020fe85a40 ffff88020fe85ab8 0000000000000001
      [ 4202.720922] Call Trace:
      [ 4202.720931]  [<ffffffff816a3cb9>] schedule+0x29/0x70
      [ 4202.720950]  [<ffffffffa01ec48d>] btrfs_start_ordered_extent+0x6d/0x110 [btrfs]
      [ 4202.720956]  [<ffffffff8108e620>] ? bit_waitqueue+0xc0/0xc0
      [ 4202.720972]  [<ffffffffa01ec559>] btrfs_run_ordered_extent_work+0x29/0x40 [btrfs]
      [ 4202.720988]  [<ffffffffa0201987>] normal_work_helper+0x137/0x2c0 [btrfs]
      [ 4202.720994]  [<ffffffff810680e5>] process_one_work+0x1f5/0x530
      (...)
      [ 4202.721027] 2 locks held by kworker/u8:1/5450:
      [ 4202.721028]  #0:  (%s-%s){++++..}, at: [<ffffffff81068083>] process_one_work+0x193/0x530
      [ 4202.721037]  #1:  ((&work->normal_work)){+.+...}, at: [<ffffffff81068083>] process_one_work+0x193/0x530
      [ 4202.721054] INFO: task btrfs:7891 blocked for more than 120 seconds.
      [ 4202.721258]       Not tainted 3.13.0-fdm-btrfs-next-26+ #1
      [ 4202.721444] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 4202.721699] btrfs           D 0000000000000001     0  7891   7890 0x00000001
      [ 4202.721704]  ffff88018c2119e8 0000000000000086 ffff8800a33d2490 00000000001d3f40
      [ 4202.721710]  ffff88018c211fd8 00000000001d3f40 ffff8802144b0000 ffff8800a33d2490
      [ 4202.721714]  ffff8800d8576640 ffff88020fe85bc0 ffff88020fe85bc8 7fffffffffffffff
      [ 4202.721718] Call Trace:
      [ 4202.721723]  [<ffffffff816a3cb9>] schedule+0x29/0x70
      [ 4202.721727]  [<ffffffff816a2ebc>] schedule_timeout+0x1dc/0x270
      [ 4202.721732]  [<ffffffff8109bd79>] ? mark_held_locks+0xb9/0x140
      [ 4202.721736]  [<ffffffff816a90c0>] ? _raw_spin_unlock_irq+0x30/0x40
      [ 4202.721740]  [<ffffffff8109bf0d>] ? trace_hardirqs_on_caller+0x10d/0x1d0
      [ 4202.721744]  [<ffffffff816a488f>] wait_for_completion+0xdf/0x120
      [ 4202.721749]  [<ffffffff8107fa90>] ? try_to_wake_up+0x310/0x310
      [ 4202.721765]  [<ffffffffa01ebee4>] btrfs_wait_ordered_extents+0x1f4/0x280 [btrfs]
      [ 4202.721781]  [<ffffffffa020526e>] btrfs_mksubvol.isra.62+0x30e/0x5a0 [btrfs]
      [ 4202.721786]  [<ffffffff8108e620>] ? bit_waitqueue+0xc0/0xc0
      [ 4202.721799]  [<ffffffffa02056a9>] btrfs_ioctl_snap_create_transid+0x1a9/0x1b0 [btrfs]
      [ 4202.721813]  [<ffffffffa020583a>] btrfs_ioctl_snap_create_v2+0x10a/0x170 [btrfs]
      (...)
      
      It turns out that extent_io.c:__extent_writepage(), which ends up being called
      through filemap_fdatawrite_range() in btrfs_start_ordered_extent(), was getting
      -ENOSPC when calling the fill_delalloc callback. In this situation, it returned
      without the writepage_end_io_hook callback (inode.c:btrfs_writepage_end_io_hook)
      ever being called for the respective page, which prevents the ordered extent's
      bytes_left count from ever reaching 0, and therefore a finish_ordered_fn work
      is never queued into the endio_write_workers queue. This makes the task that
      called btrfs_start_ordered_extent() hang forever on the wait queue of the ordered
      extent.
      
      This is fairly easy to reproduce using a small filesystem and fsstress on
      a quad core vm:
      
          mkfs.btrfs -f -b `expr 2100 \* 1024 \* 1024` /dev/sdd
          mount /dev/sdd /mnt
      
          fsstress -p 6 -d /mnt -n 100000 -x \
              "btrfs subvolume snapshot -r /mnt /mnt/mysnap" \
      	    -f allocsp=0 \
      	    -f bulkstat=0 \
      	    -f bulkstat1=0 \
      	    -f chown=0 \
      	    -f creat=1 \
      	    -f dread=0 \
      	    -f dwrite=0 \
      	    -f fallocate=1 \
      	    -f fdatasync=0 \
      	    -f fiemap=0 \
      	    -f freesp=0 \
      	    -f fsync=0 \
      	    -f getattr=0 \
      	    -f getdents=0 \
      	    -f link=0 \
      	    -f mkdir=0 \
      	    -f mknod=0 \
      	    -f punch=1 \
      	    -f read=0 \
      	    -f readlink=0 \
      	    -f rename=0 \
      	    -f resvsp=0 \
      	    -f rmdir=0 \
      	    -f setxattr=0 \
      	    -f stat=0 \
      	    -f symlink=0 \
      	    -f sync=0 \
      	    -f truncate=1 \
      	    -f unlink=0 \
      	    -f unresvsp=0 \
      	    -f write=4
      
      So just ensure that if an error happens while writing the extent page
      we call the writepage_end_io_hook callback. Also make it return the
      error code and ensure the caller (extent_write_cache_pages) processes
      all pages in the page vector even if an error happens only for some
      of them, so that ordered extents end up released.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      61391d56
  2. 07 6月, 2014 1 次提交
    • F
      Btrfs: send, fix corrupted path strings for long paths · 01a9a8a9
      Filipe Manana 提交于
      If a path has more than 230 characters, we allocate a new buffer to
      use for the path, but we were forgotting to copy the contents of the
      previous buffer into the new one, which has random content from the
      kmalloc call.
      
      Test:
      
          mkfs.btrfs -f /dev/sdd
          mount /dev/sdd /mnt
      
          TEST_PATH="/mnt/fdmanana/.config/google-chrome-mysetup/Default/Pepper_Data/Shockwave_Flash/WritableRoot/#SharedObjects/JSHJ4ZKN/s.wsj.net/[[IMPORT]]/players.edgesuite.net/flash/plugins/osmf/advanced-streaming-plugin/v2.7/osmf1.6/Ak#"
          mkdir -p $TEST_PATH
          echo "hello world" > $TEST_PATH/amaiAdvancedStreamingPlugin.txt
      
          btrfs subvolume snapshot -r /mnt /mnt/mysnap1
          btrfs send /mnt/mysnap1 -f /tmp/1.snap
      
      A test for xfstests follows.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Cc: Marc Merlin <marc@merlins.org>
      Tested-by: NMarc MERLIN <marc@merlins.org>
      Signed-off-by: NChris Mason <clm@fb.com>
      01a9a8a9
  3. 21 5月, 2014 2 次提交
    • F
      Btrfs: send, fix incorrect ref access when using extrefs · 51a60253
      Filipe Manana 提交于
      When running send, if an inode only has extended reference items
      associated to it and no regular references, send.c:get_first_ref()
      was incorrectly assuming the reference it found was of type
      BTRFS_INODE_REF_KEY due to use of the wrong key variable.
      This caused weird behaviour when using the found item has a regular
      reference, such as weird path string, and occasionally (when lucky)
      a crash:
      
      [  190.600652] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
      [  190.600994] Modules linked in: btrfs xor raid6_pq binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc psmouse serio_raw evbug pcspkr i2c_piix4 e1000 floppy
      [  190.602565] CPU: 2 PID: 14520 Comm: btrfs Not tainted 3.13.0-fdm-btrfs-next-26+ #1
      [  190.602728] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  190.602868] task: ffff8800d447c920 ti: ffff8801fa79e000 task.ti: ffff8801fa79e000
      [  190.603030] RIP: 0010:[<ffffffff813266b4>]  [<ffffffff813266b4>] memcpy+0x54/0x110
      [  190.603262] RSP: 0018:ffff8801fa79f880  EFLAGS: 00010202
      [  190.603395] RAX: ffff8800d4326e3f RBX: 000000000000036a RCX: ffff880000000000
      [  190.603553] RDX: 000000000000032a RSI: ffe708844042936a RDI: ffff8800d43271a9
      [  190.603710] RBP: ffff8801fa79f8c8 R08: 00000000003a4ef0 R09: 0000000000000000
      [  190.603867] R10: 793a4ef09f000000 R11: 9f0000000053726f R12: ffff8800d43271a9
      [  190.604020] R13: 0000160000000000 R14: ffff8802110134f0 R15: 000000000000036a
      [  190.604020] FS:  00007fb423d09b80(0000) GS:ffff880216200000(0000) knlGS:0000000000000000
      [  190.604020] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [  190.604020] CR2: 00007fb4229d4b78 CR3: 00000001f5d76000 CR4: 00000000000006e0
      [  190.604020] Stack:
      [  190.604020]  ffffffffa01f4d49 ffff8801fa79f8f0 00000000000009f9 ffff8801fa79f8c8
      [  190.604020]  00000000000009f9 ffff880211013260 000000000000f971 ffff88021147dba8
      [  190.604020]  00000000000009f9 ffff8801fa79f918 ffffffffa02367f5 ffff8801fa79f928
      [  190.604020] Call Trace:
      [  190.604020]  [<ffffffffa01f4d49>] ? read_extent_buffer+0xb9/0x120 [btrfs]
      [  190.604020]  [<ffffffffa02367f5>] fs_path_add_from_extent_buffer+0x45/0x60 [btrfs]
      [  190.604020]  [<ffffffffa0238806>] get_first_ref+0x1f6/0x210 [btrfs]
      [  190.604020]  [<ffffffffa0238994>] __get_cur_name_and_parent+0x174/0x3a0 [btrfs]
      [  190.604020]  [<ffffffff8118df3d>] ? kmem_cache_alloc_trace+0x11d/0x1e0
      [  190.604020]  [<ffffffffa0236674>] ? fs_path_alloc+0x24/0x60 [btrfs]
      [  190.604020]  [<ffffffffa0238c91>] get_cur_path+0xd1/0x240 [btrfs]
      (...)
      
      Steps to reproduce (either crash or some weirdness like an odd path string):
      
          mkfs.btrfs -f -O extref /dev/sdd
          mount /dev/sdd /mnt
      
          mkdir /mnt/testdir
          touch /mnt/testdir/foobar
      
          for i in `seq 1 2550`; do
              ln /mnt/testdir/foobar /mnt/testdir/foobar_link_`printf "%04d" $i`
          done
      
          ln /mnt/testdir/foobar /mnt/testdir/final_foobar_name
      
          rm -f /mnt/testdir/foobar
          for i in `seq 1 2550`; do
              rm -f /mnt/testdir/foobar_link_`printf "%04d" $i`
          done
      
          btrfs subvolume snapshot -r /mnt /mnt/mysnap
          btrfs send /mnt/mysnap -f /tmp/mysnap.send
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
      51a60253
    • L
      Btrfs: fix EIO on reading file after ioctl clone works on it · d3ecfcdf
      Liu Bo 提交于
      For inline data extent, we need to make its length aligned, otherwise,
      we can get a phantom extent map which confuses readpages() to return -EIO.
      
      This can be detected by xfstests/btrfs/035.
      Reported-by: NDavid Disseldorp <ddiss@suse.de>
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      d3ecfcdf
  4. 26 4月, 2014 1 次提交
    • C
      Btrfs: limit the path size in send to PATH_MAX · cfd4a535
      Chris Mason 提交于
      fs_path_ensure_buf is used to make sure our path buffers for
      send are big enough for the path names as we construct them.
      The buffer size is limited to 32K by the length field in
      the struct.
      
      But bugs in the path construction can end up trying to build
      a huge buffer, and we'll do invalid memmmoves when the
      buffer length field wraps.
      
      This patch is step one, preventing the overflows.
      Signed-off-by: NChris Mason <clm@fb.com>
      cfd4a535
  5. 25 4月, 2014 8 次提交
    • F
      Btrfs: correctly set profile flags on seqlock retry · f8213bdc
      Filipe Manana 提交于
      If we had to retry on the profiles seqlock (due to a concurrent write), we
      would set bits on the input flags that corresponded both to the current
      profile and to previous values of the profile.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      f8213bdc
    • F
      Btrfs: use correct key when repeating search for extent item · 9ce49a0b
      Filipe Manana 提交于
      If skinny metadata is enabled and our first tree search fails to find a
      skinny extent item, we may repeat a tree search for a "fat" extent item
      (if the previous item in the leaf is not the "fat" extent we're looking
      for). However we were not setting the new key's objectid to the right
      value, as we previously used the same key variable to peek at the previous
      item in the leaf, which has a different objectid. So just set the right
      objectid to avoid modifying/deleting a wrong item if we repeat the tree
      search.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      9ce49a0b
    • M
      Btrfs: fix inode caching vs tree log · 1c70d8fb
      Miao Xie 提交于
      Currently, with inode cache enabled, we will reuse its inode id immediately
      after unlinking file, we may hit something like following:
      
      |->iput inode
      |->return inode id into inode cache
      |->create dir,fsync
      |->power off
      
      An easy way to reproduce this problem is:
      
      mkfs.btrfs -f /dev/sdb
      mount /dev/sdb /mnt -o inode_cache,commit=100
      dd if=/dev/zero of=/mnt/data bs=1M count=10 oflag=sync
      inode_id=`ls -i /mnt/data | awk '{print $1}'`
      rm -f /mnt/data
      
      i=1
      while [ 1 ]
      do
              mkdir /mnt/dir_$i
              test1=`stat /mnt/dir_$i | grep Inode: | awk '{print $4}'`
              if [ $test1 -eq $inode_id ]
              then
      		dd if=/dev/zero of=/mnt/dir_$i/data bs=1M count=1 oflag=sync
      		echo b > /proc/sysrq-trigger
      	fi
      	sleep 1
              i=$(($i+1))
      done
      
      mount /dev/sdb /mnt
      umount /dev/sdb
      btrfs check /dev/sdb
      
      We fix this problem by adding unlinked inode's id into pinned tree,
      and we can not reuse them until committing transaction.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      1c70d8fb
    • W
      Btrfs: fix possible memory leaks in open_ctree() · 28c16cbb
      Wang Shilong 提交于
      Fix possible memory leaks in the following error handling paths:
      
      read_tree_block()
      btrfs_recover_log_trees
      btrfs_commit_super()
      btrfs_find_orphan_roots()
      btrfs_cleanup_fs_roots()
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      28c16cbb
    • W
      Btrfs: avoid triggering bug_on() when we fail to start inode caching task · e60efa84
      Wang Shilong 提交于
      When running stress test(including snapshots,balance,fstress), we trigger
      the following BUG_ON() which is because we fail to start inode caching task.
      
      [  181.131945] kernel BUG at fs/btrfs/inode-map.c:179!
      [  181.137963] invalid opcode: 0000 [#1] SMP
      [  181.217096] CPU: 11 PID: 2532 Comm: btrfs Not tainted 3.14.0 #1
      [  181.240521] task: ffff88013b621b30 ti: ffff8800b6ada000 task.ti: ffff8800b6ada000
      [  181.367506] Call Trace:
      [  181.371107]  [<ffffffffa036c1be>] btrfs_return_ino+0x9e/0x110 [btrfs]
      [  181.379191]  [<ffffffffa038082b>] btrfs_evict_inode+0x46b/0x4c0 [btrfs]
      [  181.387464]  [<ffffffff810b5a70>] ? autoremove_wake_function+0x40/0x40
      [  181.395642]  [<ffffffff811dc5fe>] evict+0x9e/0x190
      [  181.401882]  [<ffffffff811dcde3>] iput+0xf3/0x180
      [  181.408025]  [<ffffffffa03812de>] btrfs_orphan_cleanup+0x1ee/0x430 [btrfs]
      [  181.416614]  [<ffffffffa03a6abd>] btrfs_mksubvol.isra.29+0x3bd/0x450 [btrfs]
      [  181.425399]  [<ffffffffa03a6cd6>] btrfs_ioctl_snap_create_transid+0x186/0x190 [btrfs]
      [  181.435059]  [<ffffffffa03a6e3b>] btrfs_ioctl_snap_create_v2+0xeb/0x130 [btrfs]
      [  181.444148]  [<ffffffffa03a9656>] btrfs_ioctl+0xf76/0x2b90 [btrfs]
      [  181.451971]  [<ffffffff8117e565>] ? handle_mm_fault+0x475/0xe80
      [  181.459509]  [<ffffffff8167ba0c>] ? __do_page_fault+0x1ec/0x520
      [  181.467046]  [<ffffffff81185b35>] ? do_mmap_pgoff+0x2f5/0x3c0
      [  181.474393]  [<ffffffff811d4da8>] do_vfs_ioctl+0x2d8/0x4b0
      [  181.481450]  [<ffffffff811d5001>] SyS_ioctl+0x81/0xa0
      [  181.488021]  [<ffffffff81680b69>] system_call_fastpath+0x16/0x1b
      
      We should avoid triggering BUG_ON() here, instead, we output warning messages
      and clear inode_cache option.
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      e60efa84
    • W
      9d89ce65
    • D
      btrfs: replace error code from btrfs_drop_extents · 3f9e3df8
      David Sterba 提交于
      There's a case which clone does not handle and used to BUG_ON instead,
      (testcase xfstests/btrfs/035), now returns EINVAL. This error code is
      confusing to the ioctl caller, as it normally signifies errorneous
      arguments.
      
      Change it to ENOPNOTSUPP which allows a fall back to copy instead of
      clone. This does not affect the common reflink operation.
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      3f9e3df8
    • Q
      btrfs: Change the hole range to a more accurate value. · c5f7d0bb
      Qu Wenruo 提交于
      Commit 3ac0d7b9 fixed the btrfs expanding
      write problem but the hole punched is sometimes too large for some
      iovec, which has unmapped data ranges.
      This patch will change to hole range to a more accurate value using the
      counts checked by the write check routines.
      Reported-by: NAl Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      c5f7d0bb
  6. 15 4月, 2014 1 次提交
  7. 11 4月, 2014 2 次提交
    • W
      Btrfs: fix compile warnings on on avr32 platform · e4fbaee2
      Wang Shilong 提交于
      fs/btrfs/scrub.c: In function 'get_raid56_logic_offset':
      fs/btrfs/scrub.c:2269: warning: comparison of distinct pointer types lacks a cast
      fs/btrfs/scrub.c:2269: warning: right shift count >= width of type
      fs/btrfs/scrub.c:2269: warning: passing argument 1 of '__div64_32' from incompatible pointer type
      
      Since @rot is an int type, we should not use do_div(), fix it.
      Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      e4fbaee2
    • H
      btrfs: allow mounting btrfs subvolumes with different ro/rw options · 0723a047
      Harald Hoyer 提交于
      Given the following /etc/fstab entries:
      
      /dev/sda3 /mnt/foo btrfs subvol=foo,ro 0 0
      /dev/sda3 /mnt/bar btrfs subvol=bar,rw 0 0
      
      you can't issue:
      
      $ mount /mnt/foo
      $ mount /mnt/bar
      
      You would have to do:
      
      $ mount /mnt/foo
      $ mount -o remount,rw /mnt/foo
      $ mount --bind -o remount,ro /mnt/foo
      $ mount /mnt/bar
      
      or
      
      $ mount /mnt/bar
      $ mount --rw /mnt/foo
      $ mount --bind -o remount,ro /mnt/foo
      
      With this patch you can do
      
      $ mount /mnt/foo
      $ mount /mnt/bar
      
      $ cat /proc/self/mountinfo
      49 33 0:41 /foo /mnt/foo ro,relatime shared:36 - btrfs /dev/sda3 rw,ssd,space_cache
      87 33 0:41 /bar /mnt/bar rw,relatime shared:74 - btrfs /dev/sda3 rw,ssd,space_cache
      Signed-off-by: NChris Mason <clm@fb.com>
      0723a047
  8. 08 4月, 2014 4 次提交