1. 06 8月, 2018 11 次提交
  2. 19 7月, 2018 1 次提交
    • F
      Btrfs: fix file data corruption after cloning a range and fsync · bd3599a0
      Filipe Manana 提交于
      When we clone a range into a file we can end up dropping existing
      extent maps (or trimming them) and replacing them with new ones if the
      range to be cloned overlaps with a range in the destination inode.
      When that happens we add the new extent maps to the list of modified
      extents in the inode's extent map tree, so that a "fast" fsync (the flag
      BTRFS_INODE_NEEDS_FULL_SYNC not set in the inode) will see the extent maps
      and log corresponding extent items. However, at the end of range cloning
      operation we do truncate all the pages in the affected range (in order to
      ensure future reads will not get stale data). Sometimes this truncation
      will release the corresponding extent maps besides the pages from the page
      cache. If this happens, then a "fast" fsync operation will miss logging
      some extent items, because it relies exclusively on the extent maps being
      present in the inode's extent tree, leading to data loss/corruption if
      the fsync ends up using the same transaction used by the clone operation
      (that transaction was not committed in the meanwhile). An extent map is
      released through the callback btrfs_invalidatepage(), which gets called by
      truncate_inode_pages_range(), and it calls __btrfs_releasepage(). The
      later ends up calling try_release_extent_mapping() which will release the
      extent map if some conditions are met, like the file size being greater
      than 16Mb, gfp flags allow blocking and the range not being locked (which
      is the case during the clone operation) nor being the extent map flagged
      as pinned (also the case for cloning).
      
      The following example, turned into a test for fstests, reproduces the
      issue:
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /mnt
      
        $ xfs_io -f -c "pwrite -S 0x18 9000K 6908K" /mnt/foo
        $ xfs_io -f -c "pwrite -S 0x20 2572K 156K" /mnt/bar
      
        $ xfs_io -c "fsync" /mnt/bar
        # reflink destination offset corresponds to the size of file bar,
        # 2728Kb minus 4Kb.
        $ xfs_io -c ""reflink ${SCRATCH_MNT}/foo 0 2724K 15908K" /mnt/bar
        $ xfs_io -c "fsync" /mnt/bar
      
        $ md5sum /mnt/bar
        95a95813a8c2abc9aa75a6c2914a077e  /mnt/bar
      
        <power fail>
      
        $ mount /dev/sdb /mnt
        $ md5sum /mnt/bar
        207fd8d0b161be8a84b945f0df8d5f8d  /mnt/bar
        # digest should be 95a95813a8c2abc9aa75a6c2914a077e like before the
        # power failure
      
      In the above example, the destination offset of the clone operation
      corresponds to the size of the "bar" file minus 4Kb. So during the clone
      operation, the extent map covering the range from 2572Kb to 2728Kb gets
      trimmed so that it ends at offset 2724Kb, and a new extent map covering
      the range from 2724Kb to 11724Kb is created. So at the end of the clone
      operation when we ask to truncate the pages in the range from 2724Kb to
      2724Kb + 15908Kb, the page invalidation callback ends up removing the new
      extent map (through try_release_extent_mapping()) when the page at offset
      2724Kb is passed to that callback.
      
      Fix this by setting the bit BTRFS_INODE_NEEDS_FULL_SYNC whenever an extent
      map is removed at try_release_extent_mapping(), forcing the next fsync to
      search for modified extents in the fs/subvolume tree instead of relying on
      the presence of extent maps in memory. This way we can continue doing a
      "fast" fsync if the destination range of a clone operation does not
      overlap with an existing range or if any of the criteria necessary to
      remove an extent map at try_release_extent_mapping() is not met (file
      size not bigger then 16Mb or gfp flags do not allow blocking).
      
      CC: stable@vger.kernel.org # 3.16+
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      bd3599a0
  3. 17 7月, 2018 1 次提交
    • Q
      btrfs: scrub: Don't use inode page cache in scrub_handle_errored_block() · 665d4953
      Qu Wenruo 提交于
      In commit ac0b4145 ("btrfs: scrub: Don't use inode pages for device
      replace") we removed the branch of copy_nocow_pages() to avoid
      corruption for compressed nodatasum extents.
      
      However above commit only solves the problem in scrub_extent(), if
      during scrub_pages() we failed to read some pages,
      sctx->no_io_error_seen will be non-zero and we go to fixup function
      scrub_handle_errored_block().
      
      In scrub_handle_errored_block(), for sctx without csum (no matter if
      we're doing replace or scrub) we go to scrub_fixup_nodatasum() routine,
      which does the similar thing with copy_nocow_pages(), but does it
      without the extra check in copy_nocow_pages() routine.
      
      So for test cases like btrfs/100, where we emulate read errors during
      replace/scrub, we could corrupt compressed extent data again.
      
      This patch will fix it just by avoiding any "optimization" for
      nodatasum, just falls back to the normal fixup routine by try read from
      any good copy.
      
      This also solves WARN_ON() or dead lock caused by lame backref iteration
      in scrub_fixup_nodatasum() routine.
      
      The deadlock or WARN_ON() won't be triggered before commit ac0b4145
      ("btrfs: scrub: Don't use inode pages for device replace") since
      copy_nocow_pages() have better locking and extra check for data extent,
      and it's already doing the fixup work by try to read data from any good
      copy, so it won't go scrub_fixup_nodatasum() anyway.
      
      This patch disables the faulty code and will be removed completely in a
      followup patch.
      
      Fixes: ac0b4145 ("btrfs: scrub: Don't use inode pages for device replace")
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      665d4953
  4. 13 7月, 2018 2 次提交
  5. 28 6月, 2018 3 次提交
    • F
      Btrfs: fix mount failure when qgroup rescan is in progress · e4e7ede7
      Filipe Manana 提交于
      If a power failure happens while the qgroup rescan kthread is running,
      the next mount operation will always fail. This is because of a recent
      regression that makes qgroup_rescan_init() incorrectly return -EINVAL
      when we are mounting the filesystem (through btrfs_read_qgroup_config()).
      This causes the -EINVAL error to be returned regardless of any qgroup
      flags being set instead of returning the error only when neither of
      the flags BTRFS_QGROUP_STATUS_FLAG_RESCAN nor BTRFS_QGROUP_STATUS_FLAG_ON
      are set.
      
      A test case for fstests follows up soon.
      
      Fixes: 9593bf49 ("btrfs: qgroup: show more meaningful qgroup_rescan_init error message")
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      e4e7ede7
    • C
      Btrfs: fix regression in btrfs_page_mkwrite() from vm_fault_t conversion · 717beb96
      Chris Mason 提交于
      The vm_fault_t conversion commit introduced a ret2 variable for tracking
      the integer return values from internal btrfs functions.  It was
      sometimes returning VM_FAULT_LOCKED for pages that were actually invalid
      and had been removed from the radix.  Something like this:
      
          ret2 = btrfs_delalloc_reserve_space() // returns zero on success
      
          lock_page(page)
          if (page->mapping != inode->i_mapping)
      	goto out_unlock;
      
      ...
      
      out_unlock:
          if (!ret2) {
      	    ...
      	    return VM_FAULT_LOCKED;
          }
      
      This ends up triggering this WARNING in btrfs_destroy_inode()
          WARN_ON(BTRFS_I(inode)->block_rsv.size);
      
      xfstests generic/095 was able to reliably reproduce the errors.
      
      Since out_unlock: is only used for errors, this fix moves it below the
      if (!ret2) check we use to return VM_FAULT_LOCKED for success.
      
      Fixes: a528a241 (btrfs: change return type of btrfs_page_mkwrite to vm_fault_t)
      Signed-off-by: NChris Mason <clm@fb.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      717beb96
    • Q
      btrfs: quota: Set rescan progress to (u64)-1 if we hit last leaf · 6f7de19e
      Qu Wenruo 提交于
      Commit ff3d27a0 ("btrfs: qgroup: Finish rescan when hit the last leaf
      of extent tree") added a new exit for rescan finish.
      
      However after finishing quota rescan, we set
      fs_info->qgroup_rescan_progress to (u64)-1 before we exit through the
      original exit path.
      While we missed that assignment of (u64)-1 in the new exit path.
      
      The end result is, the quota status item doesn't have the same value.
      (-1 vs the last bytenr + 1)
      Although it doesn't affect quota accounting, it's still better to keep
      the original behavior.
      Reported-by: NMisono Tomohiro <misono.tomohiro@jp.fujitsu.com>
      Fixes: ff3d27a0 ("btrfs: qgroup: Finish rescan when hit the last leaf of extent tree")
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Reviewed-by: NMisono Tomohiro <misono.tomohiro@jp.fujitsu.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      6f7de19e
  6. 22 6月, 2018 3 次提交
    • F
      Btrfs: fix return value on rename exchange failure · c5b4a50b
      Filipe Manana 提交于
      If we failed during a rename exchange operation after starting/joining a
      transaction, we would end up replacing the return value, stored in the
      local 'ret' variable, with the return value from btrfs_end_transaction().
      So this could end up returning 0 (success) to user space despite the
      operation having failed and aborted the transaction, because if there are
      multiple tasks having a reference on the transaction at the time
      btrfs_end_transaction() is called by the rename exchange, that function
      returns 0 (otherwise it returns -EIO and not the original error value).
      So fix this by not overwriting the return value on error after getting
      a transaction handle.
      
      Fixes: cdd1fedf ("btrfs: add support for RENAME_EXCHANGE and RENAME_WHITEOUT")
      CC: stable@vger.kernel.org # 4.9+
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      c5b4a50b
    • L
      btrfs: fix invalid-free in btrfs_extent_same · 22883ddc
      Lu Fengqi 提交于
      If this condition ((BTRFS_I(src)->flags & BTRFS_INODE_NODATASUM) !=
      		   (BTRFS_I(dst)->flags & BTRFS_INODE_NODATASUM))
      is hit, we will go to free the uninitialized cmp.src_pages and
      cmp.dst_pages.
      
      Fixes: 67b07bd4 ("Btrfs: reuse cmp workspace in EXTENT_SAME ioctl")
      Signed-off-by: NLu Fengqi <lufq.fnst@cn.fujitsu.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      22883ddc
    • F
      Btrfs: fix physical offset reported by fiemap for inline extents · f0986318
      Filipe Manana 提交于
      Commit 9d311e11 ("Btrfs: fiemap: pass correct bytenr when
      fm_extent_count is zero") introduced a regression where we no longer
      report 0 as the physical offset for inline extents (and other extents
      with a special block_start value). This is because it always sets the
      variable used to report the physical offset ("disko") as em->block_start
      plus some offset, and em->block_start has the value 18446744073709551614
      ((u64) -2) for inline extents.
      
      This made the btrfs test 004 (from fstests) often fail, for example, for
      a file with an inline extent we have the following items in the subvolume
      tree:
      
          item 101 key (418 INODE_ITEM 0) itemoff 11029 itemsize 160
                 generation 25 transid 38 size 1525 nbytes 1525
                 block group 0 mode 100666 links 1 uid 0 gid 0 rdev 0
                 sequence 0 flags 0x2(none)
                 atime 1529342058.461891730 (2018-06-18 18:14:18)
                 ctime 1529342058.461891730 (2018-06-18 18:14:18)
                 mtime 1529342058.461891730 (2018-06-18 18:14:18)
                 otime 1529342055.869892885 (2018-06-18 18:14:15)
          item 102 key (418 INODE_REF 264) itemoff 11016 itemsize 13
                 index 25 namelen 3 name: fc7
          item 103 key (418 EXTENT_DATA 0) itemoff 9470 itemsize 1546
                 generation 38 type 0 (inline)
                 inline extent data size 1525 ram_bytes 1525 compression 0 (none)
      
      Then when test 004 invoked fiemap against the file it got a non-zero
      physical offset:
      
       $ filefrag -v /mnt/p0/d4/d7/fc7
       Filesystem type is: 9123683e
       File size of /mnt/p0/d4/d7/fc7 is 1525 (1 block of 4096 bytes)
        ext:     logical_offset:        physical_offset: length:   expected: flags:
          0:        0..    4095: 18446744073709551614..      4093:   4096:             last,not_aligned,inline,eof
       /mnt/p0/d4/d7/fc7: 1 extent found
      
      This resulted in the test failing like this:
      
      btrfs/004 49s ... [failed, exit status 1]- output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad)
          --- tests/btrfs/004.out	2016-08-23 10:17:35.027012095 +0100
          +++ /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad	2018-06-18 18:15:02.385872155 +0100
          @@ -1,3 +1,10 @@
           QA output created by 004
           *** test backref walking
          -*** done
          +./tests/btrfs/004: line 227: [: 7.55578637259143e+22: integer expression expected
          +ERROR: 7.55578637259143e+22 is not a valid numeric value.
          +unexpected output from
          +	/home/fdmanana/git/hub/btrfs-progs/btrfs inspect-internal logical-resolve -s 65536 -P 7.55578637259143e+22 /home/fdmanana/btrfs-tests/scratch_1
          ...
          (Run 'diff -u tests/btrfs/004.out /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad'  to see the entire diff)
      Ran: btrfs/004
      
      The large number in scientific notation reported as an invalid numeric
      value is the result from the filter passed to perl which multiplies the
      physical offset by the block size reported by fiemap.
      
      So fix this by ensuring the physical offset is always set to 0 when we
      are processing an extent with a special block_start value.
      
      Fixes: 9d311e11 ("Btrfs: fiemap: pass correct bytenr when fm_extent_count is zero")
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      f0986318
  7. 13 6月, 2018 1 次提交
    • K
      treewide: kzalloc() -> kcalloc() · 6396bb22
      Kees Cook 提交于
      The kzalloc() function has a 2-factor argument form, kcalloc(). This
      patch replaces cases of:
      
              kzalloc(a * b, gfp)
      
      with:
              kcalloc(a * b, gfp)
      
      as well as handling cases of:
      
              kzalloc(a * b * c, gfp)
      
      with:
      
              kzalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kzalloc_array(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kzalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kzalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kzalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kzalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kzalloc
      + kcalloc
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kzalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kzalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kzalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc(sizeof(THING) * C2, ...)
      |
        kzalloc(sizeof(TYPE) * C2, ...)
      |
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(C1 * C2, ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      Signed-off-by: NKees Cook <keescook@chromium.org>
      6396bb22
  8. 11 6月, 2018 1 次提交
    • Q
      btrfs: scrub: Don't use inode pages for device replace · ac0b4145
      Qu Wenruo 提交于
      [BUG]
      Btrfs can create compressed extent without checksum (even though it
      shouldn't), and if we then try to replace device containing such extent,
      the result device will contain all the uncompressed data instead of the
      compressed one.
      
      Test case already submitted to fstests:
      https://patchwork.kernel.org/patch/10442353/
      
      [CAUSE]
      When handling compressed extent without checksum, device replace will
      goe into copy_nocow_pages() function.
      
      In that function, btrfs will get all inodes referring to this data
      extents and then use find_or_create_page() to get pages direct from that
      inode.
      
      The problem here is, pages directly from inode are always uncompressed.
      And for compressed data extent, they mismatch with on-disk data.
      Thus this leads to corrupted compressed data extent written to replace
      device.
      
      [FIX]
      In this attempt, we could just remove the "optimization" branch, and let
      unified scrub_pages() to handle it.
      
      Although scrub_pages() won't bother reusing page cache, it will be a
      little slower, but it does the correct csum checking and won't cause
      such data corruption caused by "optimization".
      
      Note about the fix: this is the minimal fix that can be backported to
      older stable trees without conflicts. The whole callchain from
      copy_nocow_pages() can be deleted, and will be in followup patches.
      
      Fixes: ff023aac ("Btrfs: add code to scrub to copy read data to another disk")
      CC: stable@vger.kernel.org # 4.4+
      Reported-by: NJames Harvey <jamespharvey20@gmail.com>
      Reviewed-by: NJames Harvey <jamespharvey20@gmail.com>
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      [ remove code removal, add note why ]
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      ac0b4145
  9. 07 6月, 2018 2 次提交
    • S
      btrfs: change return type of btrfs_page_mkwrite to vm_fault_t · a528a241
      Souptick Joarder 提交于
      Use the new return type vm_fault_t for fault handler. For now, this is
      just documenting that the function returns a VM_FAULT value rather than
      an errno. Once all instances are converted, vm_fault_t will become a
      distinct type.
      
      Reference commit 1c8f4220 ("mm: change return type to vm_fault_t")
      
      vmf_error() is the newly introduced inline function in 4.17-rc6.
      Signed-off-by: NSouptick Joarder <jrdr.linux@gmail.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      a528a241
    • R
      Btrfs: fiemap: pass correct bytenr when fm_extent_count is zero · 9d311e11
      Robbie Ko 提交于
      [BUG]
      fm_mapped_extents is not correct when fm_extent_count is 0
      Like:
         # mount /dev/vdb5 /mnt/btrfs
         # dd if=/dev/zero bs=16K count=4 oflag=dsync of=/mnt/btrfs/file
         # xfs_io -c "fiemap -v" /mnt/btrfs/file
         /mnt/btrfs/file:
         EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
           0: [0..127]:        25088..25215       128   0x1
      
      When user space wants to get the number of file extents,
      set fm_extent_count to 0 to run fiemap and then read fm_mapped_extents.
      
      In the above example, fiemap will return with fm_mapped_extents set to 4,
      but it should be 1 since there's only one entry in the output.
      
      [REASON]
      The problem seems to be that disko is only set if
      fieinfo->fi_extents_max is set. And this member is initialized, in the
      generic ioctl_fiemap function, to the value of used-passed
      fm_extent_count. So when the user passes 0 then fi_extent_max is also
      set to zero and this causes btrfs to not initialize disko at all.
      Eventually this leads emit_fiemap_extent being called with a bogus
      'phys' argument preventing proper fiemap entries merging.
      
      [FIX]
      Move the disko initialization earlier in extent_fiemap making it
      independent of user-passed arguments, allowing emit_fiemap_extent to
      properly handle consecutive extent entries.
      Signed-off-by: NRobbie Ko <robbieko@synology.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      9d311e11
  10. 06 6月, 2018 1 次提交
    • D
      vfs: change inode times to use struct timespec64 · 95582b00
      Deepa Dinamani 提交于
      struct timespec is not y2038 safe. Transition vfs to use
      y2038 safe struct timespec64 instead.
      
      The change was made with the help of the following cocinelle
      script. This catches about 80% of the changes.
      All the header file and logic changes are included in the
      first 5 rules. The rest are trivial substitutions.
      I avoid changing any of the function signatures or any other
      filesystem specific data structures to keep the patch simple
      for review.
      
      The script can be a little shorter by combining different cases.
      But, this version was sufficient for my usecase.
      
      virtual patch
      
      @ depends on patch @
      identifier now;
      @@
      - struct timespec
      + struct timespec64
        current_time ( ... )
        {
      - struct timespec now = current_kernel_time();
      + struct timespec64 now = current_kernel_time64();
        ...
      - return timespec_trunc(
      + return timespec64_trunc(
        ... );
        }
      
      @ depends on patch @
      identifier xtime;
      @@
       struct \( iattr \| inode \| kstat \) {
       ...
      -       struct timespec xtime;
      +       struct timespec64 xtime;
       ...
       }
      
      @ depends on patch @
      identifier t;
      @@
       struct inode_operations {
       ...
      int (*update_time) (...,
      -       struct timespec t,
      +       struct timespec64 t,
      ...);
       ...
       }
      
      @ depends on patch @
      identifier t;
      identifier fn_update_time =~ "update_time$";
      @@
       fn_update_time (...,
      - struct timespec *t,
      + struct timespec64 *t,
       ...) { ... }
      
      @ depends on patch @
      identifier t;
      @@
      lease_get_mtime( ... ,
      - struct timespec *t
      + struct timespec64 *t
        ) { ... }
      
      @te depends on patch forall@
      identifier ts;
      local idexpression struct inode *inode_node;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier fn_update_time =~ "update_time$";
      identifier fn;
      expression e, E3;
      local idexpression struct inode *node1;
      local idexpression struct inode *node2;
      local idexpression struct iattr *attr1;
      local idexpression struct iattr *attr2;
      local idexpression struct iattr attr;
      identifier i_xtime1 =~ "^i_[acm]time$";
      identifier i_xtime2 =~ "^i_[acm]time$";
      identifier ia_xtime1 =~ "^ia_[acm]time$";
      identifier ia_xtime2 =~ "^ia_[acm]time$";
      @@
      (
      (
      - struct timespec ts;
      + struct timespec64 ts;
      |
      - struct timespec ts = current_time(inode_node);
      + struct timespec64 ts = current_time(inode_node);
      )
      
      <+... when != ts
      (
      - timespec_equal(&inode_node->i_xtime, &ts)
      + timespec64_equal(&inode_node->i_xtime, &ts)
      |
      - timespec_equal(&ts, &inode_node->i_xtime)
      + timespec64_equal(&ts, &inode_node->i_xtime)
      |
      - timespec_compare(&inode_node->i_xtime, &ts)
      + timespec64_compare(&inode_node->i_xtime, &ts)
      |
      - timespec_compare(&ts, &inode_node->i_xtime)
      + timespec64_compare(&ts, &inode_node->i_xtime)
      |
      ts = current_time(e)
      |
      fn_update_time(..., &ts,...)
      |
      inode_node->i_xtime = ts
      |
      node1->i_xtime = ts
      |
      ts = inode_node->i_xtime
      |
      <+... attr1->ia_xtime ...+> = ts
      |
      ts = attr1->ia_xtime
      |
      ts.tv_sec
      |
      ts.tv_nsec
      |
      btrfs_set_stack_timespec_sec(..., ts.tv_sec)
      |
      btrfs_set_stack_timespec_nsec(..., ts.tv_nsec)
      |
      - ts = timespec64_to_timespec(
      + ts =
      ...
      -)
      |
      - ts = ktime_to_timespec(
      + ts = ktime_to_timespec64(
      ...)
      |
      - ts = E3
      + ts = timespec_to_timespec64(E3)
      |
      - ktime_get_real_ts(&ts)
      + ktime_get_real_ts64(&ts)
      |
      fn(...,
      - ts
      + timespec64_to_timespec(ts)
      ,...)
      )
      ...+>
      (
      <... when != ts
      - return ts;
      + return timespec64_to_timespec(ts);
      ...>
      )
      |
      - timespec_equal(&node1->i_xtime1, &node2->i_xtime2)
      + timespec64_equal(&node1->i_xtime2, &node2->i_xtime2)
      |
      - timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2)
      + timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2)
      |
      - timespec_compare(&node1->i_xtime1, &node2->i_xtime2)
      + timespec64_compare(&node1->i_xtime1, &node2->i_xtime2)
      |
      node1->i_xtime1 =
      - timespec_trunc(attr1->ia_xtime1,
      + timespec64_trunc(attr1->ia_xtime1,
      ...)
      |
      - attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2,
      + attr1->ia_xtime1 =  timespec64_trunc(attr2->ia_xtime2,
      ...)
      |
      - ktime_get_real_ts(&attr1->ia_xtime1)
      + ktime_get_real_ts64(&attr1->ia_xtime1)
      |
      - ktime_get_real_ts(&attr.ia_xtime1)
      + ktime_get_real_ts64(&attr.ia_xtime1)
      )
      
      @ depends on patch @
      struct inode *node;
      struct iattr *attr;
      identifier fn;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      expression e;
      @@
      (
      - fn(node->i_xtime);
      + fn(timespec64_to_timespec(node->i_xtime));
      |
       fn(...,
      - node->i_xtime);
      + timespec64_to_timespec(node->i_xtime));
      |
      - e = fn(attr->ia_xtime);
      + e = fn(timespec64_to_timespec(attr->ia_xtime));
      )
      
      @ depends on patch forall @
      struct inode *node;
      struct iattr *attr;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier fn;
      @@
      {
      + struct timespec ts;
      <+...
      (
      + ts = timespec64_to_timespec(node->i_xtime);
      fn (...,
      - &node->i_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      fn (...,
      - &attr->ia_xtime,
      + &ts,
      ...);
      )
      ...+>
      }
      
      @ depends on patch forall @
      struct inode *node;
      struct iattr *attr;
      struct kstat *stat;
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier i_xtime =~ "^i_[acm]time$";
      identifier xtime =~ "^[acm]time$";
      identifier fn, ret;
      @@
      {
      + struct timespec ts;
      <+...
      (
      + ts = timespec64_to_timespec(node->i_xtime);
      ret = fn (...,
      - &node->i_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(node->i_xtime);
      ret = fn (...,
      - &node->i_xtime);
      + &ts);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      ret = fn (...,
      - &attr->ia_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      ret = fn (...,
      - &attr->ia_xtime);
      + &ts);
      |
      + ts = timespec64_to_timespec(stat->xtime);
      ret = fn (...,
      - &stat->xtime);
      + &ts);
      )
      ...+>
      }
      
      @ depends on patch @
      struct inode *node;
      struct inode *node2;
      identifier i_xtime1 =~ "^i_[acm]time$";
      identifier i_xtime2 =~ "^i_[acm]time$";
      identifier i_xtime3 =~ "^i_[acm]time$";
      struct iattr *attrp;
      struct iattr *attrp2;
      struct iattr attr ;
      identifier ia_xtime1 =~ "^ia_[acm]time$";
      identifier ia_xtime2 =~ "^ia_[acm]time$";
      struct kstat *stat;
      struct kstat stat1;
      struct timespec64 ts;
      identifier xtime =~ "^[acmb]time$";
      expression e;
      @@
      (
      ( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1  ;
      |
       node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \);
      |
       node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
      |
       node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
      |
       stat->xtime = node2->i_xtime1;
      |
       stat1.xtime = node2->i_xtime1;
      |
      ( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1  ;
      |
      ( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2;
      |
      - e = node->i_xtime1;
      + e = timespec64_to_timespec( node->i_xtime1 );
      |
      - e = attrp->ia_xtime1;
      + e = timespec64_to_timespec( attrp->ia_xtime1 );
      |
      node->i_xtime1 = current_time(...);
      |
       node->i_xtime2 = node->i_xtime1 = node->i_xtime3 =
      - e;
      + timespec_to_timespec64(e);
      |
       node->i_xtime1 = node->i_xtime3 =
      - e;
      + timespec_to_timespec64(e);
      |
      - node->i_xtime1 = e;
      + node->i_xtime1 = timespec_to_timespec64(e);
      )
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Cc: <anton@tuxera.com>
      Cc: <balbi@kernel.org>
      Cc: <bfields@fieldses.org>
      Cc: <darrick.wong@oracle.com>
      Cc: <dhowells@redhat.com>
      Cc: <dsterba@suse.com>
      Cc: <dwmw2@infradead.org>
      Cc: <hch@lst.de>
      Cc: <hirofumi@mail.parknet.co.jp>
      Cc: <hubcap@omnibond.com>
      Cc: <jack@suse.com>
      Cc: <jaegeuk@kernel.org>
      Cc: <jaharkes@cs.cmu.edu>
      Cc: <jslaby@suse.com>
      Cc: <keescook@chromium.org>
      Cc: <mark@fasheh.com>
      Cc: <miklos@szeredi.hu>
      Cc: <nico@linaro.org>
      Cc: <reiserfs-devel@vger.kernel.org>
      Cc: <richard@nod.at>
      Cc: <sage@redhat.com>
      Cc: <sfrench@samba.org>
      Cc: <swhiteho@redhat.com>
      Cc: <tj@kernel.org>
      Cc: <trond.myklebust@primarydata.com>
      Cc: <tytso@mit.edu>
      Cc: <viro@zeniv.linux.org.uk>
      95582b00
  11. 05 6月, 2018 1 次提交
  12. 31 5月, 2018 7 次提交
  13. 30 5月, 2018 6 次提交