1. 02 Nov 2017 (2 commits)
    • btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents · c995ab3c
      Authored by Zygo Blaxell
      The LOGICAL_INO ioctl provides a backward mapping from extent bytenr and
      offset (encoded as a single logical address) to a list of extent refs.
      LOGICAL_INO complements TREE_SEARCH, which provides the forward mapping
      (extent ref -> extent bytenr and offset, or logical address).  These are
      useful capabilities for programs that manipulate extents and extent
      references from userspace (e.g. dedup and defrag utilities).
      
      When the extents are uncompressed (and not encrypted or otherwise
      encoded), check_extent_in_eb filters the extent refs to remove any
      that do not contain the same extent offset as the 'logical'
      parameter's extent offset.  This prevents LOGICAL_INO from returning
      references to more than a single block.
      
      To find the set of extent references to an uncompressed extent from [a, b),
      userspace has to run a loop like this pseudocode:
      
      	for (i = a; i < b; i += block_size)
      		extent_ref_set += LOGICAL_INO(i);
      
      At each iteration of the loop (up to 32768 iterations for a 128M extent),
      data we are interested in is collected in the kernel, then deleted by
      the filter in check_extent_in_eb.
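
      As an illustration, a minimal userspace sketch of that loop (assuming
      the v1 BTRFS_IOC_LOGICAL_INO ioctl from <linux/btrfs.h>, a 4K block
      size, and leaving out error handling and merging of results):

      	#include <string.h>
      	#include <stdint.h>
      	#include <sys/ioctl.h>
      	#include <linux/btrfs.h>

      	#define CONTAINER_BYTES 65536	/* output buffer size, arbitrary */

      	static int logical_ino(int fd, __u64 logical,
      			       struct btrfs_data_container *out)
      	{
      		struct btrfs_ioctl_logical_ino_args args;

      		memset(&args, 0, sizeof(args));
      		args.logical = logical;		/* the 'i' of the loop above */
      		args.size = CONTAINER_BYTES;	/* bytes behind args.inodes */
      		args.inodes = (uintptr_t)out;	/* root/inode/offset results */
      		return ioctl(fd, BTRFS_IOC_LOGICAL_INO, &args);
      	}

      	/* for (i = a; i < b; i += 4096) if (!logical_ino(fd, i, out)) ... */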
      
      When the extents are compressed (or encrypted or otherwise encoded), the
      'logical' parameter must be an extent bytenr (the 'a' parameter in the loop).
      No filtering by extent offset is done (or possible?) so the result is
      the complete set of extent refs for the entire extent.  This removes
      the need for the loop, since we get all the extent refs in one call.
      
      Add an 'ignore_offset' argument to iterate_inodes_from_logical,
      [...several levels of function call graph...], and check_extent_in_eb, so
      that we can disable the extent offset filtering for uncompressed extents.
      This flag can be set by an improved version of the LOGICAL_INO ioctl to
      get either behavior as desired.
      
      There is no functional change in this patch.  The new flag is always
      false.
      Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ minor coding style fixes ]
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: allow to set compression level for zlib · f51d2b59
      Authored by David Sterba
      Preliminary support for setting the compression level for zlib; the
      following works:
      
      	$ mount -o compress=zlib                # default
      	$ mount -o compress=zlib0               # same
      	$ mount -o compress=zlib9               # level 9, slower sync, less data
      	$ mount -o compress=zlib1               # level 1, faster sync, more data
      	$ mount -o remount,compress=zlib3       # level set by remount

      The compress-force option works the same way as compress.  The level is
      visible in the same format in /proc/mounts.  Setting the level via the
      file property does not work yet.
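
      For illustration only, parsing the level suffix could look roughly like
      this sketch (the function name and exact behaviour are assumptions, not
      the kernel's actual option parser):

      	#include <string.h>

      	/* map "zlib", "zlib0" ... "zlib9" to a level, 0 meaning default */
      	static unsigned int parse_zlib_level(const char *opt)
      	{
      		unsigned int level = 0;

      		if (strncmp(opt, "zlib", 4) == 0 &&
      		    opt[4] >= '1' && opt[4] <= '9')
      			level = opt[4] - '0';
      		return level;
      	}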
      
      Required patch: "btrfs: prepare for extensions in compression options"
      Signed-off-by: David Sterba <dsterba@suse.com>
  2. 30 Oct 2017 (8 commits)
  3. 26 Sep 2017 (3 commits)
    • Btrfs: fix unexpected result when dio reading corrupted blocks · 99c4e3b9
      Authored by Liu Bo
      commit 4246a0b6 ("block: add a bi_error field to struct bio")
      changed the logic of how dio read endio reports errors.
      
      For a single-stripe dio read, %bio->bi_status reflects the error before
      the checksum is verified; we currently update it when the data block
      matches its checksum, but in the mismatch case %bio->bi_status is not
      updated to reflect the failure.
      
      When some blocks in a file have been corrupted on disk, reading such a
      file ends up with
      
      1) checksum errors are reported in kernel log
      2) read(2) returns successfully with some content being 0x01.
      
      In order to fix it, we need to report its checksum mismatch error to
      the upper layer (dio layer in this case) as well.
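
      Conceptually the fix amounts to something like the following sketch
      (illustrative only; the real change lives in the btrfs dio read
      endio/checksum code and the helper name here is made up):

      	#include <linux/bio.h>

      	/* if the checksum did not match, make the bio carry the error */
      	static void report_csum_mismatch(struct bio *bio, bool csum_ok)
      	{
      		if (!csum_ok && !bio->bi_status)
      			bio->bi_status = BLK_STS_IOERR;	/* seen by dio completion */
      	}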
      
      Fixes: 4246a0b6 ("block: add a bi_error field to struct bio")
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Reported-by: Goffredo Baroncelli <kreijack@inwind.it>
      Tested-by: Goffredo Baroncelli <kreijack@inwind.it>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: finish ordered extent cleaning if no progress is found · 67c003f9
      Authored by Naohiro Aota
      __endio_write_update_ordered() repeats the search until it reaches the end
      of the specified range. This works well for the direct IO path, because
      before the function is called it is ensured that ordered extents fill the
      whole range. That is not the case, however, when it is called from
      run_delalloc_range(): an error can occur in the middle of the loop in
      e.g. run_delalloc_nocow(), leaving a range that is not covered by any
      ordered extent. When cleaning up such an incomplete range,
      __endio_write_update_ordered() gets stuck at an offset where there are no
      ordered extents.
      
      Since the ordered extents are created from head to tail, we can stop the
      search once the offset makes no progress.
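
      A simplified sketch of that idea (the helper name is made up; the real
      loop is in __endio_write_update_ordered()):

      	static void cleanup_ordered_range(struct inode *inode, u64 start, u64 end)
      	{
      		u64 prev = start;

      		while (start < end) {
      			/* advances @start past the next ordered extent, if any */
      			if (!advance_past_next_ordered_extent(inode, &start, end))
      				break;	/* nothing found in the range */
      			if (start <= prev)
      				break;	/* no forward progress, stop the search */
      			prev = start;
      		}
      	}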
      
      Fixes: 52427260 ("btrfs: Handle delalloc error correctly to avoid ordered extent hang")
      Cc: <stable@vger.kernel.org> # 4.12
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Reviewed-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
      Reviewed-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: clear ordered flag on cleaning up ordered extents · 63d71450
      Authored by Naohiro Aota
      Commit 52427260 ("btrfs: Handle delalloc error correctly to avoid
      ordered extent hang") introduced btrfs_cleanup_ordered_extents() to cleanup
      submitted ordered extents. However, it does not clear the ordered bit
      (Private2) of the corresponding pages. Thus, the following BUG occurs in
      free_pages_check_bad() (on btrfs/125 with nospace_cache):
      
      BUG: Bad page state in process btrfs  pfn:3fa787
      page:ffffdf2acfe9e1c0 count:0 mapcount:0 mapping:          (null) index:0xd
      flags: 0x8000000000002008(uptodate|private_2)
      raw: 8000000000002008 0000000000000000 000000000000000d 00000000ffffffff
      raw: ffffdf2acf5c1b20 ffffb443802238b0 0000000000000000 0000000000000000
      page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
      bad because of flags: 0x2000(private_2)
      
      This patch clears the flag, in the same way as the other places that call
      btrfs_dec_test_ordered_pending(), for every page in the specified range.
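
      In sketch form (page-cache details simplified, loop bounds illustrative):

      	#include <linux/pagemap.h>
      	#include <linux/page-flags.h>

      	/* clear the ordered (Private2) bit on every page of [start, end] */
      	static void clear_ordered_bit_range(struct address_space *mapping,
      					    u64 start, u64 end)
      	{
      		pgoff_t index = start >> PAGE_SHIFT;
      		pgoff_t last = end >> PAGE_SHIFT;

      		for (; index <= last; index++) {
      			struct page *page = find_get_page(mapping, index);

      			if (!page)
      				continue;
      			ClearPagePrivate2(page);	/* what the cleanup path was missing */
      			put_page(page);
      		}
      	}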
      
      Fixes: 52427260 ("btrfs: Handle delalloc error correctly to avoid ordered extent hang")
      Cc: <stable@vger.kernel.org> # 4.12
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Reviewed-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
      Reviewed-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  4. 24 Aug 2017 (1 commit)
  5. 22 Aug 2017 (1 commit)
    • btrfs: remove unnecessary memory barrier in btrfs_direct_IO · dc59215d
      Authored by Nikolay Borisov
      Commit 38851cc1 ("Btrfs: implement unlocked dio write") implemented
      unlocked dio write, allowing multiple dio writers to write to
      non-overlapping and non-eof-extending regions. In doing so it also
      introduced a broken memory barrier. It is broken for two reasons:

      1. Memory barriers _MUST_ always be paired; this is clearly not the case
         here.

      2. Checkpatch actually produces a warning if a memory barrier is
         introduced that doesn't have a comment explaining how it is being
         paired.
      
      Specifically for inode::i_dio_count, which is wrapped inside
      inode_dio_begin, there are no explicit barrier semantics attached, so
      removing the barrier is fine, as the atomic is used in the common
      waiter/wakeup pattern.
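
      For reference, the i_dio_count usage follows the plain begin/end plus
      wait pattern, roughly (illustrative, not the btrfs_direct_IO() code):

      	inode_dio_begin(inode);	/* atomic_inc(&inode->i_dio_count) */
      	/* ... submit and complete the direct IO ... */
      	inode_dio_end(inode);	/* atomic_dec(); wakes inode_dio_wait() callers */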
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ enhance changelog ]
      Signed-off-by: David Sterba <dsterba@suse.com>
  6. 18 Aug 2017 (2 commits)
  7. 16 Aug 2017 (11 commits)
  8. 17 Jul 2017 (1 commit)
    • VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb) · bc98a42c
      Authored by David Howells
      Firstly by applying the following with coccinelle's spatch:
      
      	@@ expression SB; @@
      	-SB->s_flags & MS_RDONLY
      	+sb_rdonly(SB)
      
      to effect the conversion to sb_rdonly(sb), then by applying:
      
      	@@ expression A, SB; @@
      	(
      	-(!sb_rdonly(SB)) && A
      	+!sb_rdonly(SB) && A
      	|
      	-A != (sb_rdonly(SB))
      	+A != sb_rdonly(SB)
      	|
      	-A == (sb_rdonly(SB))
      	+A == sb_rdonly(SB)
      	|
      	-!(sb_rdonly(SB))
      	+!sb_rdonly(SB)
      	|
      	-A && (sb_rdonly(SB))
      	+A && sb_rdonly(SB)
      	|
      	-A || (sb_rdonly(SB))
      	+A || sb_rdonly(SB)
      	|
      	-(sb_rdonly(SB)) != A
      	+sb_rdonly(SB) != A
      	|
      	-(sb_rdonly(SB)) == A
      	+sb_rdonly(SB) == A
      	|
      	-(sb_rdonly(SB)) && A
      	+sb_rdonly(SB) && A
      	|
      	-(sb_rdonly(SB)) || A
      	+sb_rdonly(SB) || A
      	)
      
      	@@ expression A, B, SB; @@
      	(
      	-(sb_rdonly(SB)) ? 1 : 0
      	+sb_rdonly(SB)
      	|
      	-(sb_rdonly(SB)) ? A : B
      	+sb_rdonly(SB) ? A : B
      	)
      
      to remove left-over excess brackets, and finally by applying:
      
      	@@ expression A, SB; @@
      	(
      	-(A & MS_RDONLY) != sb_rdonly(SB)
      	+(bool)(A & MS_RDONLY) != sb_rdonly(SB)
      	|
      	-(A & MS_RDONLY) == sb_rdonly(SB)
      	+(bool)(A & MS_RDONLY) == sb_rdonly(SB)
      	)
      
      to make comparisons against the result of sb_rdonly() (which is a bool)
      work correctly.
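
      For reference, sb_rdonly() is just a bool-returning accessor, roughly of
      this shape (the real definition lives in include/linux/fs.h):

      	static inline bool sb_rdonly(const struct super_block *sb)
      	{
      		return sb->s_flags & MS_RDONLY;
      	}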
      Signed-off-by: David Howells <dhowells@redhat.com>
  9. 15 Jul 2017 (2 commits)
  10. 30 Jun 2017 (3 commits)
    • btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges · bc42bda2
      Authored by Qu Wenruo
      [BUG]
      For the following case, btrfs can underflow qgroup reserved space
      at an error path:
      (Page size 4K, function name without "btrfs_" prefix)
      
               Task A                  |             Task B
      ----------------------------------------------------------------------
      Buffered_write [0, 2K)           |
      |- check_data_free_space()       |
      |  |- qgroup_reserve_data()      |
      |     Range aligned to page      |
      |     range [0, 4K)          <<< |
      |     4K bytes reserved      <<< |
      |- copy pages to page cache      |
                                       | Buffered_write [2K, 4K)
                                       | |- check_data_free_space()
                                       | |  |- qgroup_reserve_data()
                                       | |     Range aligned to page
                                       | |     range [0, 4K)
                                       | |     Already reserved by A <<<
                                       | |     0 bytes reserved      <<<
                                       | |- delalloc_reserve_metadata()
                                       | |  And it *FAILED* (Maybe EQUOTA)
                                       | |- free_reserved_data_space()
                                            |- qgroup_free_data()
                                               Range aligned to page range
                                               [0, 4K)
                                               Freeing 4K
      (Special thanks to Chandan for the detailed report and analysis)
      
      [CAUSE]
      Above Task B is freeing reserved data range [0, 4K) which is actually
      reserved by Task A.
      
      And at writeback time, page dirty by Task A will go through writeback
      routine, which will free 4K reserved data space at file extent insert
      time, causing the qgroup underflow.
      
      [FIX]
      For btrfs_qgroup_free_data(), add a @reserved parameter so that it only
      frees data ranges reserved by a previous btrfs_qgroup_reserve_data().
      In the above case, Task B will then free 0 bytes, so there is no underflow.
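
      In sketch form, the error path then looks roughly like this (call shapes
      illustrative, error handling trimmed):

      	struct extent_changeset *reserved = NULL;

      	ret = btrfs_qgroup_reserve_data(inode, &reserved, start, len);
      	if (ret < 0)
      		goto out;
      	ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), len);
      	if (ret < 0)
      		/* free only the ranges recorded in @reserved, not all of [start, len) */
      		btrfs_qgroup_free_data(inode, reserved, start, len);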
      Reported-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
      Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
      Tested-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: qgroup: Introduce extent changeset for qgroup reserve functions · 364ecf36
      Authored by Qu Wenruo
      Introduce a new parameter, struct extent_changeset, for
      btrfs_qgroup_reserve_data() and its callers.

      The extent_changeset is used in btrfs_qgroup_reserve_data() to record
      which ranges it reserved in the current reserve call, so that they can
      be freed in error paths.

      The reason we need to expose it to callers is that, on the buffered write
      error path, without knowing exactly which ranges were reserved by the
      current allocation, we could free space that was not reserved by us.

      That would lead to a qgroup reserved space underflow.
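
      The changeset records roughly the following (a shape sketch only; see the
      btrfs headers for the real definition):

      	struct extent_changeset {
      		unsigned int bytes_changed;	/* how many bytes this reserve added */
      		struct ulist range_changed;	/* the ranges it actually touched */
      	};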
      Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
      Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: qgroup: Fix qgroup reserved space underflow caused by buffered write and quotas being enabled · a12b877b
      Authored by Qu Wenruo
      
      [BUG]
      Under the following case, we can underflow qgroup reserved space.
      
                  Task A                |            Task B
      ---------------------------------------------------------------
       Quota disabled                   |
       Buffered write                   |
       |- btrfs_check_data_free_space() |
       |  *NO* qgroup space is reserved |
       |  since quota is *DISABLED*     |
       |- All pages are copied to page  |
          cache                         |
                                        | Enable quota
                                        | Quota scan finished
                                        |
                                        | Sync_fs
                                        | |- run_delalloc_range
                                        | |- Write pages
                                        | |- btrfs_finish_ordered_io
                                        |    |- insert_reserved_file_extent
                                        |       |- btrfs_qgroup_release_data()
                                        |          Since no qgroup space is
                                                   reserved in Task A, we
                                                   underflow qgroup reserved
                                                   space
      This can be detected by fstest btrfs/104.
      
      [CAUSE]
      In insert_reserved_file_extent() we tell qgroup to release @ram_bytes
      worth of qgroup reserved space in all cases.
      And btrfs_qgroup_release_data() will check if quotas are enabled.
      
      However in the above case, the buffered write happens before quota is
      enabled, so we don't have the reserved space for that range.
      
      [FIX]
      In insert_reserved_file_extent(), tell qgroup to release the actual
      number of bytes it released.
      In the above case, since we don't have the reserved space, we tell
      qgroup to release 0 bytes, so the problem is fixed.
      
      And thanks to the @reserved parameter introduced by the qgroup rework
      and the previous patch returning the released bytes, the fix can be as
      small as 10 lines.
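
      Roughly (illustrative call shape, not the exact
      insert_reserved_file_extent() code):

      	ret = btrfs_qgroup_release_data(inode, file_pos, ram_bytes);
      	if (ret < 0)
      		goto out;
      	qgroup_released = ret;	/* may be 0 if nothing was reserved for this range */
      	/* account qgroup_released instead of assuming the full ram_bytes */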
      Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      [ changelog updates ]
      Signed-off-by: David Sterba <dsterba@suse.com>
  11. 22 Jun 2017 (1 commit)
    • btrfs: Check name_len with boundary in verify dir_item · e79a3327
      Authored by Su Yue
      Originally, verify_dir_item verifies the name_len of a dir_item against
      fixed values but not against the item boundary.
      If a corrupted name_len is not bigger than the fixed value, for example
      255, the function considers the dir_item fine, and reading beyond the
      item boundary then causes a crash.
      
      Example:
      	1. Corrupt one dir_item name_len to be 255.
      	2. Run 'ls -lar /mnt/test/ > /dev/null'
      dmesg:
      [   48.451449] BTRFS info (device vdb1): disk space caching is enabled
      [   48.451453] BTRFS info (device vdb1): has skinny extents
      [   48.489420] general protection fault: 0000 [#1] SMP
      [   48.489571] Modules linked in: ext4 jbd2 mbcache btrfs xor raid6_pq
      [   48.489716] CPU: 1 PID: 2710 Comm: ls Not tainted 4.10.0-rc1 #5
      [   48.489853] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
      [   48.490008] task: ffff880035df1bc0 task.stack: ffffc90004800000
      [   48.490008] RIP: 0010:read_extent_buffer+0xd2/0x190 [btrfs]
      [   48.490008] RSP: 0018:ffffc90004803d98 EFLAGS: 00010202
      [   48.490008] RAX: 000000000000001b RBX: 000000000000001b RCX: 0000000000000000
      [   48.490008] RDX: ffff880079dbf36c RSI: 0005080000000000 RDI: ffff880079dbf368
      [   48.490008] RBP: ffffc90004803dc8 R08: ffff880078e8cc48 R09: ffff880000000000
      [   48.490008] R10: 0000160000000000 R11: 0000000000001000 R12: ffff880079dbf288
      [   48.490008] R13: ffff880078e8ca88 R14: 0000000000000003 R15: ffffc90004803e20
      [   48.490008] FS:  00007fef50c60800(0000) GS:ffff88007d400000(0000) knlGS:0000000000000000
      [   48.490008] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   48.490008] CR2: 000055f335ac2ff8 CR3: 000000007356d000 CR4: 00000000001406e0
      [   48.490008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   48.490008] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   48.490008] Call Trace:
      [   48.490008]  btrfs_real_readdir+0x3b7/0x4a0 [btrfs]
      [   48.490008]  iterate_dir+0x181/0x1b0
      [   48.490008]  SyS_getdents+0xa7/0x150
      [   48.490008]  ? fillonedir+0x150/0x150
      [   48.490008]  entry_SYSCALL_64_fastpath+0x18/0xad
      [   48.490008] RIP: 0033:0x7fef5032546b
      [   48.490008] RSP: 002b:00007ffeafcdb830 EFLAGS: 00000206 ORIG_RAX: 000000000000004e
      [   48.490008] RAX: ffffffffffffffda RBX: 00007fef5061db38 RCX: 00007fef5032546b
      [   48.490008] RDX: 0000000000008000 RSI: 000055f335abaff0 RDI: 0000000000000003
      [   48.490008] RBP: 00007fef5061dae0 R08: 00007fef5061db48 R09: 0000000000000000
      [   48.490008] R10: 000055f335abafc0 R11: 0000000000000206 R12: 00007fef5061db38
      [   48.490008] R13: 0000000000008040 R14: 00007fef5061db38 R15: 000000000000270e
      [   48.490008] RIP: read_extent_buffer+0xd2/0x190 [btrfs] RSP: ffffc90004803d98
      [   48.499455] ---[ end trace 321920d8e8339505 ]---
      
      Fix it by adding a @slot parameter and checking name_len against the
      item boundary by calling btrfs_is_name_len_valid().
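
      The check is essentially "does the name fit inside its item?", roughly
      (the body below is illustrative, not the final btrfs_is_name_len_valid()):

      	static bool name_fits_in_item(struct extent_buffer *leaf, int slot,
      				      unsigned long name_start, u16 name_len)
      	{
      		unsigned long item_end = btrfs_item_ptr_offset(leaf, slot) +
      					 btrfs_item_size_nr(leaf, slot);

      		return name_start + name_len <= item_end;
      	}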
      Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  12. 21 Jun 2017 (1 commit)
    • percpu_counter: Rename __percpu_counter_add to percpu_counter_add_batch · 104b4e51
      Authored by Nikolay Borisov
      Currently, percpu_counter_add is a wrapper around __percpu_counter_add
      which is preempt safe due to explicit calls to preempt_disable.  Given
      how __ prefix is used in percpu related interfaces, the naming
      unfortunately creates the false sense that __percpu_counter_add is
      less safe than percpu_counter_add.  In terms of context-safety,
      they're equivalent.  The only difference is that the __ version takes
      a batch parameter.
      
      Make this a bit more explicit by just renaming __percpu_counter_add to
      percpu_counter_add_batch.
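
      For reference, the wrapper relationship is roughly (a sketch; see
      include/linux/percpu_counter.h for the real definitions):

      	static inline void percpu_counter_add(struct percpu_counter *fbc, s64 amount)
      	{
      		percpu_counter_add_batch(fbc, amount, percpu_counter_batch);
      	}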
      
      This patch doesn't cause any functional changes.
      
      tj: Minor updates to patch description for clarity.  Cosmetic
          indentation updates.
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: David Sterba <dsterba@suse.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: linux-mm@kvack.org
      Cc: "David S. Miller" <davem@davemloft.net>
  13. 20 Jun 2017 (4 commits)