1. 12 Apr, 2018 1 commit
    • Btrfs: fix loss of prealloc extents past i_size after fsync log replay · 471d557a
      Filipe Manana committed
      Currently if we allocate extents beyond an inode's i_size (through the
      fallocate system call) and then fsync the file, we log the extents but
      after a power failure we replay them and then immediately drop them.
      This behaviour has existed since about 2009, commit c71bf099 ("Btrfs:
      Avoid orphan inodes cleanup while replaying log"), because it marks
      the inode as an orphan instead of dropping any extents beyond i_size
      before replaying logged extents, so after the log replay, and while
      the mount operation is still ongoing, we find the inode marked as an
      orphan and then perform a truncation (drop extents beyond the inode's
      i_size). Because the processing of orphan inodes is still done
      right after replaying the log and before the mount operation finishes,
      the intention of that commit does not make any sense (at least as
      of today). However reverting that behaviour is not enough, because
      we can not simply discard all extents beyond i_size and then replay
      logged extents, because we risk dropping extents beyond i_size created
      in past transactions, for example:
      
        add prealloc extent beyond i_size
        fsync - clears the flag BTRFS_INODE_NEEDS_FULL_SYNC from the inode
        transaction commit
        add another prealloc extent beyond i_size
        fsync - triggers the fast fsync path
        power failure
      
      In that scenario, we would drop the first extent and then replay the
      second one. To fix this just make sure that all prealloc extents
      beyond i_size are logged, and if we find too many (which is far from
      a common case), fall back to a full transaction commit (like we do when
      logging regular extents in the fast fsync path).
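The scenario above can be sketched as a toy model. This is only an illustration, not kernel code: extents are plain (offset, length) tuples, and `drop_then_replay` is a hypothetical helper that mimics "drop everything past i_size, then replay the logged extents".

```python
KIB = 1024

def drop_then_replay(i_size, committed, logged):
    """Drop committed extents starting at or past i_size, then add logged ones.

    Extents are (file_offset, length) tuples; this is a toy model only.
    """
    survivors = [e for e in committed if e[0] < i_size]
    return sorted(set(survivors) | set(logged))

i_size = 256 * KIB
data_extent = (0, 256 * KIB)
first_prealloc = (256 * KIB, 1024 * KIB)    # persisted by the transaction commit
second_prealloc = (1280 * KIB, 1024 * KIB)  # added after the commit

committed = [data_extent, first_prealloc]

# Fast fsync path: only the extent created in the current transaction is logged.
after_fast_fsync = drop_then_replay(i_size, committed, [second_prealloc])

# With the fix: every prealloc extent beyond i_size is logged.
after_fixed_fsync = drop_then_replay(
    i_size, committed, [first_prealloc, second_prealloc]
)
```

In this model the first prealloc extent disappears when only the second one is logged, and survives once all prealloc extents beyond i_size are logged.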
      
      Trivial reproducer:
      
       $ mkfs.btrfs -f /dev/sdb
       $ mount /dev/sdb /mnt
       $ xfs_io -f -c "pwrite -S 0xab 0 256K" /mnt/foo
       $ sync
       $ xfs_io -c "falloc -k 256K 1M" /mnt/foo
       $ xfs_io -c "fsync" /mnt/foo
       <power failure>
      
       # mount to replay log
       $ mount /dev/sdb /mnt
       # at this point the file only has one extent, at offset 0, size 256K
      
      A test case for fstests follows soon, covering multiple scenarios that
      involve adding prealloc extents with previous shrinking truncates and
      without such truncates.
      
      Fixes: c71bf099 ("Btrfs: Avoid orphan inodes cleanup while replaying log")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  2. 06 Apr, 2018 2 commits
  3. 31 Mar, 2018 3 commits
    • btrfs: Validate child tree block's level and first key · 581c1760
      Qu Wenruo committed
      We have several reports of node pointers that point to incorrect child
      tree blocks, which can even have the wrong owner and level while still
      having a valid generation and checksum.
      
      Although btrfs check can handle this and prints an error message like:
      leaf parent key incorrect 60670574592
      
      the kernel does not check for this type of corruption properly.
      At least add such a check to read_tree_block() and btrfs_read_buffer(),
      where we need two new parameters, @level and @first_key, to verify the
      child tree block.
      
      The new @level check is mandatory, and all call sites have been
      modified to extract the expected level from their call chain.
      
      The @first_key check is optional; the following call sites skip it:
      1) Root node/leaf
         As ROOT_ITEM doesn't contain the first key, skip the @first_key check.
      2) Direct backref
         Only the parent bytenr and level are known and we need to resolve the
         key all by ourselves, so skip the @first_key check.
      
      Note that this verification needs extra information from the node
      pointer or ROOT_ITEM, so it can't fit into the current tree-checker
      framework, which is limited to node/leaf boundaries.
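A minimal sketch of the check described above, assuming keys are modelled as (objectid, type, offset) tuples. The function name `verify_child` and its signature are hypothetical and are not the kernel's API; the point is only that the level comparison always runs while the first-key comparison is skipped when the caller has no expected key.

```python
from typing import Optional, Tuple

Key = Tuple[int, int, int]  # (objectid, type, offset), illustrative only

def verify_child(found_level: int, found_first_key: Key,
                 expected_level: int,
                 expected_first_key: Optional[Key] = None) -> bool:
    """Return True if the child block matches what its parent promised."""
    # The level check is mandatory.
    if found_level != expected_level:
        return False
    # The first-key check is optional (root node/leaf, direct backref).
    if expected_first_key is not None and found_first_key != expected_first_key:
        return False
    return True
```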
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Btrfs: fix copy_items() return value when logging an inode · 8434ec46
      Filipe Manana committed
      When logging an inode, at tree-log.c:copy_items(), if we call
      btrfs_next_leaf() in the loop which checks for the need to log holes, we
      need to make sure copy_items() returns the value 1 to its caller and
      not 0 (on success). This is because the path the caller passed was
      released and is now different from what it was before, and the caller
      expects a return value of 0 to mean both success and that the path
      has not changed, while a return value of 1 means both success and
      that the caller can not reuse the path and has to perform another
      tree search.
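The return-value contract described above can be sketched as follows. This is a simplified model, not the kernel function: `copy_items_result` and `caller_action` are hypothetical names used only to show how the three outcomes are told apart.

```python
def copy_items_result(called_next_leaf: bool, error: int = 0) -> int:
    """Model of the contract: <0 error, 0 path unchanged, 1 path released."""
    if error:
        return error
    # If btrfs_next_leaf() was called, the caller's path no longer points
    # where it did, so success must be reported as 1 rather than 0.
    return 1 if called_next_leaf else 0

def caller_action(ret: int) -> str:
    """What the caller is expected to do for each return value."""
    if ret < 0:
        return "abort"
    if ret == 1:
        return "search again"
    return "reuse path"
```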
      
      Even though this is a case that should not be triggered under normal
      circumstances, or only very rarely, its consequences can be very
      unpredictable (especially when replaying a log tree).
      
      Fixes: 16e7549f ("Btrfs: incompatible format change to remove hole extents")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Btrfs: fix fsync after hole punching when using no-holes feature · 4ee3fad3
      Filipe Manana committed
      When we have the no-holes mode enabled and fsync a file after punching a
      hole in it, we can end up not logging the whole hole range in the log tree.
      This happens if the file has extent items that span more than one leaf and
      we punch a hole that covers a range that starts in a leaf but does not go
      beyond the offset of the first extent in the next leaf.
      
      Example:
      
        $ mkfs.btrfs -f -O no-holes -n 65536 /dev/sdb
        $ mount /dev/sdb /mnt
        $ for ((i = 0; i <= 831; i++)); do
      	offset=$((i * 2 * 256 * 1024))
      	xfs_io -f -c "pwrite -S 0xab -b 256K $offset 256K" \
      		/mnt/foobar >/dev/null
          done
        $ sync
      
        # We now have 2 leaves in our filesystem fs tree, the first leaf has an
        # item corresponding to the extent at file offset 216530944 and the second
        # leaf has a first item corresponding to the extent at offset 217055232.
        # Now we punch a hole that partially covers the range of the extent at
        # offset 216530944 but does not go beyond the offset 217055232.
      
        $ xfs_io -c "fpunch $((216530944 + 128 * 1024 - 4000)) 256K" /mnt/foobar
        $ xfs_io -c "fsync" /mnt/foobar
      
        <power fail>
      
        # mount to replay the log
        $ mount /dev/sdb /mnt
      
        # Before this patch, only the subrange [216658016, 216662016[ (length of
        # 4000 bytes) was logged, leaving an incorrect file layout after log
        # replay.
      
      Fix this by checking if there is a hole between the last extent item that
      we processed and the first extent item in the next leaf, and if there is
      one, log an explicit hole extent item.
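The offsets in the reproducer can be checked with a few lines of arithmetic. The 4 KiB sector size is an assumption (the message does not state it). Under that assumption, the 4000-byte subrange quoted in the reproducer coincides with the unaligned head of the punched range, and the punched range ends before the first extent of the next leaf.

```python
KIB = 1024
SECTOR = 4 * KIB  # assumed sector size, not stated in the commit message

extent_start = 216530944      # last extent item in the first leaf
next_leaf_first = 217055232   # first extent item in the second leaf

punch_start = extent_start + 128 * KIB - 4000
punch_len = 256 * KIB
punch_end = punch_start + punch_len

def round_up(value: int, align: int) -> int:
    """Round value up to the next multiple of align."""
    return -(-value // align) * align

# Unaligned head of the punched range, up to the next sector boundary.
head_end = round_up(punch_start, SECTOR)
head_len = head_end - punch_start

print(punch_start, head_end, head_len, punch_end)
```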
      
      Fixes: 16e7549f ("Btrfs: incompatible format change to remove hole extents")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  4. 26 Mar, 2018 4 commits
    • btrfs: Remove root argument from btrfs_log_dentry_safe · e5b84f7a
      Nikolay Borisov committed
      Now that nothing uses the root arg of btrfs_log_dentry_safe it can be
      safely removed. No functional changes.
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: Remove root arg from btrfs_log_inode_parent · f882274b
      Nikolay Borisov committed
      btrfs_log_inode_parent is called from 2 places (btrfs_log_dentry_safe
      and btrfs_log_new_name) both of which pass inode->root as the root
      argument and the inode itself. Remove the redundant root argument and
      get a reference to the root directly from the inode, also remove
      redundant root != inode->root check from the same function. No
      functional change.
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: Remove custom crc32c init code · 9678c543
      Nikolay Borisov committed
      The custom crc32 init code was introduced in
      14a958e6 ("Btrfs: fix btrfs boot when compiled as built-in") to
      enable using btrfs as a built-in. However, later as pointed out by
      60efa5eb ("Btrfs: use late_initcall instead of module_init") this
      wasn't enough and finally btrfs was switched to late_initcall which
      comes after the generic crc32c implementation is initialised. The
      latter commit superseded the former. Now that we don't have to
      maintain our own code let's just remove it and switch to using the
      generic implementation.
      
      Despite touching a lot of files the patch is really simple. Here is the gist of
      the changes:
      
      1. Select LIBCRC32C rather than the low-level modules.
      2. s/btrfs_crc32c/crc32c/g
      3. replace hash.h with linux/crc32c.h
      4. Move the btrfs namehash funcs to ctree.h and change the tree accordingly.
      
      I've tested this with btrfs being both a module and a built-in and xfstests
      doesn't complain.
      
      This also seems to fix the longstanding problem of not automatically
      selecting the crc32c module when btrfs is used. Possibly there is a
      workaround in dracut.
      
      The modinfo confirms that now all the module dependencies are there:
      
      before:
      depends:        zstd_compress,zstd_decompress,raid6_pq,xor,zlib_deflate
      
      after:
      depends:        libcrc32c,zstd_compress,zstd_decompress,raid6_pq,xor,zlib_deflate
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ add more info to changelog from mails ]
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: Don't pass fs_info to btrfs_run_delayed_items/_nr · e5c304e6
      Nikolay Borisov committed
      We already pass the transaction which has a reference to the fs_info,
      so use that. No functional changes.
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  5. 01 Mar, 2018 2 commits
    • Btrfs: fix log replay failure after unlink and link combination · 1f250e92
      Filipe Manana committed
      If we have a file with 2 (or more) hard links in the same directory,
      remove one of the hard links, create a new file (or link an existing file)
      in the same directory with the name of the removed hard link, and then
      finally fsync the new file, we end up with a log that fails to replay,
      causing a mount failure.
      
      Example:
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /mnt
      
        $ mkdir /mnt/testdir
        $ touch /mnt/testdir/foo
        $ ln /mnt/testdir/foo /mnt/testdir/bar
      
        $ sync
      
        $ unlink /mnt/testdir/bar
        $ touch /mnt/testdir/bar
        $ xfs_io -c "fsync" /mnt/testdir/bar
      
        <power failure>
      
        $ mount /dev/sdb /mnt
        mount: mount(2) failed: /mnt: No such file or directory
      
      When replaying the log, for that example, we also see the following in
      dmesg/syslog:
      
        [71813.671307] BTRFS info (device dm-0): failed to delete reference to bar, inode 258 parent 257
        [71813.674204] ------------[ cut here ]------------
        [71813.675694] BTRFS: Transaction aborted (error -2)
        [71813.677236] WARNING: CPU: 1 PID: 13231 at fs/btrfs/inode.c:4128 __btrfs_unlink_inode+0x17b/0x355 [btrfs]
        [71813.679669] Modules linked in: btrfs xfs f2fs dm_flakey dm_mod dax ghash_clmulni_intel ppdev pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper evdev psmouse i2c_piix4 parport_pc i2c_core pcspkr sg serio_raw parport button sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod ata_generic sd_mod virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel floppy virtio e1000 scsi_mod [last unloaded: btrfs]
        [71813.679669] CPU: 1 PID: 13231 Comm: mount Tainted: G        W        4.15.0-rc9-btrfs-next-56+ #1
        [71813.679669] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
        [71813.679669] RIP: 0010:__btrfs_unlink_inode+0x17b/0x355 [btrfs]
        [71813.679669] RSP: 0018:ffffc90001cef738 EFLAGS: 00010286
        [71813.679669] RAX: 0000000000000025 RBX: ffff880217ce4708 RCX: 0000000000000001
        [71813.679669] RDX: 0000000000000000 RSI: ffffffff81c14bae RDI: 00000000ffffffff
        [71813.679669] RBP: ffffc90001cef7c0 R08: 0000000000000001 R09: 0000000000000001
        [71813.679669] R10: ffffc90001cef5e0 R11: ffffffff8343f007 R12: ffff880217d474c8
        [71813.679669] R13: 00000000fffffffe R14: ffff88021ccf1548 R15: 0000000000000101
        [71813.679669] FS:  00007f7cee84c480(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000
        [71813.679669] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [71813.679669] CR2: 00007f7cedc1abf9 CR3: 00000002354b4003 CR4: 00000000001606e0
        [71813.679669] Call Trace:
        [71813.679669]  btrfs_unlink_inode+0x17/0x41 [btrfs]
        [71813.679669]  drop_one_dir_item+0xfa/0x131 [btrfs]
        [71813.679669]  add_inode_ref+0x71e/0x851 [btrfs]
        [71813.679669]  ? __lock_is_held+0x39/0x71
        [71813.679669]  ? replay_one_buffer+0x53/0x53a [btrfs]
        [71813.679669]  replay_one_buffer+0x4a4/0x53a [btrfs]
        [71813.679669]  ? rcu_read_unlock+0x3a/0x57
        [71813.679669]  ? __lock_is_held+0x39/0x71
        [71813.679669]  walk_up_log_tree+0x101/0x1d2 [btrfs]
        [71813.679669]  walk_log_tree+0xad/0x188 [btrfs]
        [71813.679669]  btrfs_recover_log_trees+0x1fa/0x31e [btrfs]
        [71813.679669]  ? replay_one_extent+0x544/0x544 [btrfs]
        [71813.679669]  open_ctree+0x1cf6/0x2209 [btrfs]
        [71813.679669]  btrfs_mount_root+0x368/0x482 [btrfs]
        [71813.679669]  ? trace_hardirqs_on_caller+0x14c/0x1a6
        [71813.679669]  ? __lockdep_init_map+0x176/0x1c2
        [71813.679669]  ? mount_fs+0x64/0x10b
        [71813.679669]  mount_fs+0x64/0x10b
        [71813.679669]  vfs_kern_mount+0x68/0xce
        [71813.679669]  btrfs_mount+0x13e/0x772 [btrfs]
        [71813.679669]  ? trace_hardirqs_on_caller+0x14c/0x1a6
        [71813.679669]  ? __lockdep_init_map+0x176/0x1c2
        [71813.679669]  ? mount_fs+0x64/0x10b
        [71813.679669]  mount_fs+0x64/0x10b
        [71813.679669]  vfs_kern_mount+0x68/0xce
        [71813.679669]  do_mount+0x6e5/0x973
        [71813.679669]  ? memdup_user+0x3e/0x5c
        [71813.679669]  SyS_mount+0x72/0x98
        [71813.679669]  entry_SYSCALL_64_fastpath+0x1e/0x8b
        [71813.679669] RIP: 0033:0x7f7cedf150ba
        [71813.679669] RSP: 002b:00007ffca71da688 EFLAGS: 00000206
        [71813.679669] Code: 7f a0 e8 51 0c fd ff 48 8b 43 50 f0 0f ba a8 30 2c 00 00 02 72 17 41 83 fd fb 74 11 44 89 ee 48 c7 c7 7d 11 7f a0 e8 38 f5 8d e0 <0f> ff 44 89 e9 ba 20 10 00 00 eb 4d 48 8b 4d b0 48 8b 75 88 4c
        [71813.679669] ---[ end trace 83bd473fc5b4663b ]---
        [71813.854764] BTRFS: error (device dm-0) in __btrfs_unlink_inode:4128: errno=-2 No such entry
        [71813.886994] BTRFS: error (device dm-0) in btrfs_replay_log:2307: errno=-2 No such entry (Failed to recover log tree)
        [71813.903357] BTRFS error (device dm-0): cleaner transaction attach returned -30
        [71814.128078] BTRFS error (device dm-0): open_ctree failed
      
      This happens because the log has inode reference items for both inode 258
      (the first file we created) and inode 259 (the second file created), and
      when processing the reference item for inode 258, we replace the
      corresponding item in the subvolume tree (which has two names, "foo" and
      "bar") with the one in the log (which only has one name, "foo") without
      removing the corresponding dir index keys from the parent directory.
      Later, when processing the inode reference item for inode 259, which has
      a name of "bar" associated to it, we notice that dir index entries exist
      for that name and for a different inode, so we attempt to unlink that
      name, which fails because the inode reference item for inode 258 no longer
      has the name "bar" associated to it, making a call to btrfs_unlink_inode()
      fail with a -ENOENT error.
      
      Fix this by unlinking all the names in an inode reference item from a
      subvolume tree that are not present in the inode reference item found in
      the log tree, before overwriting it with the item from the log tree.
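The fix can be sketched with a toy model of the replay, assuming a directory index that maps names to inode numbers and a table of names per inode. `replay_inode_ref` is a hypothetical helper, not the kernel's add_inode_ref(); it only illustrates unlinking stale names before overwriting the reference item.

```python
def replay_inode_ref(dir_index, subvol_refs, inode, log_names):
    """Replay one inode reference item from the log into the subvolume model."""
    # Unlink every name the subvolume item has that the log item lacks,
    # so no stale dir index entry is left pointing at this inode.
    stale = sorted(set(subvol_refs.get(inode, [])) - set(log_names))
    for name in stale:
        del dir_index[name]
    # Only then overwrite the item with the one from the log.
    subvol_refs[inode] = list(log_names)
    for name in log_names:
        dir_index[name] = inode
    return stale

# State persisted by the last transaction commit: inode 258 is "foo" and "bar".
dir_index = {"foo": 258, "bar": 258}
subvol_refs = {258: ["foo", "bar"]}

removed = replay_inode_ref(dir_index, subvol_refs, 258, ["foo"])
replay_inode_ref(dir_index, subvol_refs, 259, ["bar"])
```

After both items are replayed, "bar" belongs to inode 259 and nothing refers to a name that inode 258 no longer has.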
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Btrfs: fix log replay failure after linking special file and fsync · 9a6509c4
      Filipe Manana committed
      If in the same transaction we rename a special file (fifo, character/block
      device or symbolic link), create a hard link for it having its old name
      then sync the log, we will end up with a log that can not be replayed and
      when attempting to replay it, an EEXIST error is returned and mounting
      the filesystem fails. Example scenario:
      
        $ mkfs.btrfs -f /dev/sdc
        $ mount /dev/sdc /mnt
        $ mkdir /mnt/testdir
        $ mkfifo /mnt/testdir/foo
        # Make sure everything done so far is durably persisted.
        $ sync
      
        # Create some unrelated file and fsync it, this is just to create a log
        # tree. The file must be in the same directory as our special file.
        $ touch /mnt/testdir/f1
        $ xfs_io -c "fsync" /mnt/testdir/f1
      
        # Rename our special file and then create a hard link with its old name.
        $ mv /mnt/testdir/foo /mnt/testdir/bar
        $ ln /mnt/testdir/bar /mnt/testdir/foo
      
        # Create some other unrelated file and fsync it, this is just to persist
        # the log tree which was modified by the previous rename and link
        # operations. Alternatively we could have modified file f1 and fsync it.
        $ touch /mnt/f2
        $ xfs_io -c "fsync" /mnt/f2
      
        <power failure>
      
        $ mount /dev/sdc /mnt
        mount: mount /dev/sdc on /mnt failed: File exists
      
      This happens because both the log tree and the subvolume's tree have
      an entry in the directory "testdir" with the same name, that is, there
      is one key (258 INODE_REF 257) in the subvolume tree and another one in
      the log tree (where 258 is the inode number of our special file and 257
      is the inode for directory "testdir"). Only the data of those two keys
      differs: in the subvolume tree the index field of the inode reference has
      a value of 3 while in the log tree it has a value of 5. Because the same
      key exists in both trees but with a different index, the log replay fails
      with an -EEXIST error when attempting to replay the inode reference from
      the log tree.
      
      Fix this by setting the last_unlink_trans field of the inode (our special
      file) to the current transaction id when a hard link is created, as this
      forces logging the parent directory inode, solving the conflict at log
      replay time.
      
      A new generic test case for fstests was also submitted.
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  6. 02 Feb, 2018 3 commits
  7. 29 Jan, 2018 1 commit
  8. 22 Jan, 2018 2 commits
  9. 28 Nov, 2017 1 commit
    • Btrfs: fix list_add corruption and soft lockups in fsync · ebb70442
      Liu Bo committed
      Xfstests btrfs/146 revealed this corruption:
      
      [   58.138831] Buffer I/O error on dev dm-0, logical block 2621424, async page read
      [   58.151233] BTRFS error (device sdf): bdev /dev/mapper/error-test errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
      [   58.152403] list_add corruption. prev->next should be next (ffff88005e6775d8), but was ffffc9000189be88. (prev=ffffc9000189be88).
      [   58.153518] ------------[ cut here ]------------
      [   58.153892] WARNING: CPU: 1 PID: 1287 at lib/list_debug.c:31 __list_add_valid+0x169/0x1f0
      ...
      [   58.157379] RIP: 0010:__list_add_valid+0x169/0x1f0
      ...
      [   58.161956] Call Trace:
      [   58.162264]  btrfs_log_inode_parent+0x5bd/0xfb0 [btrfs]
      [   58.163583]  btrfs_log_dentry_safe+0x60/0x80 [btrfs]
      [   58.164003]  btrfs_sync_file+0x4c2/0x6f0 [btrfs]
      [   58.164393]  vfs_fsync_range+0x5f/0xd0
      [   58.164898]  do_fsync+0x5a/0x90
      [   58.165170]  SyS_fsync+0x10/0x20
      [   58.165395]  entry_SYSCALL_64_fastpath+0x1f/0xbe
      ...
      
      It turns out that we could record btrfs_log_ctx:io_err in
      log_one_extents() when IO fails, but make log_one_extents() return '0'
      instead of -EIO, so the IO error is not acknowledged by the callers,
      i.e. btrfs_log_inode_parent(), which would remove btrfs_log_ctx:list
      from the list head 'root->log_ctxs'. Since btrfs_log_ctx is allocated
      from stack memory, it'd get freed with an object still alive on the
      list. Then a future list_add will throw the above warning.
      
      This returns the correct error in the above case.
      
      Jeff also reported this while testing against his fsync error
      patch set[1].
      
      [1]: https://www.spinics.net/lists/linux-btrfs/msg65308.html
      "btrfs list corruption and soft lockups while testing writeback error handling"
      
      Fixes: 8407f553 ("Btrfs: fix data corruption after fast fsync and writeback error")
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  10. 30 Oct, 2017 3 commits
  11. 26 Sep, 2017 1 commit
    • btrfs: log csums for all modified extents · 8c6c5928
      Josef Bacik committed
      Amir reported a bug discovered by his cleaned up version of my
      dm-log-writes xfstests where we were missing csums at certain replay
      points.  This is because fsx was doing an msync(), which essentially
      fsync()'s a specific range of a file.  We will log all modified extents,
      but only search for the checksums in the range we are being asked to
      sync.  We cannot simply log the extents in the range we're being asked
      because we are logging the inode item as it is currently, which if it
      has had an i_size update before the msync means we will miss extents when
      replaying.  We could possibly get around this by marking the inode with
      the transaction that extended the i_size to see if we have this case,
      but this would be racy and we'd have to lock the whole range of the
      inode to make sure we didn't have an ordered extent outside of our range
      that was in the middle of completing.
      
      Fix this simply by keeping track of the modified extents range and
      logging the csums for the entire range of extents that we are logging.
      This makes the xfstest pass.
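A small sketch of the idea, assuming modified extents are (offset, length) tuples and ranges are half-open (start, end) pairs. Both helper names are hypothetical. It contrasts looking up checksums only in the requested msync range with looking them up over the whole range spanned by the logged extents.

```python
def covering_range(extents):
    """Smallest half-open range that contains every modified extent."""
    start = min(off for off, _ in extents)
    end = max(off + length for off, length in extents)
    return (start, end)

def extents_missing_csums(extents, csum_range):
    """Logged extents that are not fully inside the checksum search range."""
    lo, hi = csum_range
    return [(off, length) for off, length in extents
            if off < lo or off + length > hi]

# All modified extents get logged, whatever range msync() asked for.
modified = [(0, 4096), (8192, 4096), (65536, 8192)]
msync_range = (8192, 12288)

before_fix = extents_missing_csums(modified, msync_range)
after_fix = extents_missing_csums(modified, covering_range(modified))
```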
      Reported-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  12. 21 Aug, 2017 1 commit
  13. 18 Aug, 2017 2 commits
    • Btrfs: fix assertion failure during fsync in no-holes mode · 6399fb5a
      Filipe Manana committed
      When logging an inode in full mode that has an inline compressed extent
      that represents a range with a size matching the sector size (currently
      the same as the page size), has a trailing hole and the no-holes feature
      is enabled, we end up failing an assertion leading to a trace like the
      following:
      
      [141812.031528] assertion failed: len == i_size, file: fs/btrfs/tree-log.c, line: 4453
      [141812.033069] ------------[ cut here ]------------
      [141812.034330] kernel BUG at fs/btrfs/ctree.h:3452!
      [141812.035137] invalid opcode: 0000 [#1] PREEMPT SMP
      [141812.035932] Modules linked in: btrfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio dm_flakey dm_mod dax ppdev evdev ghash_clmulni_intel pcbc aesni_intel aes_x86_64 tpm_tis psmouse crypto_simd parport_pc sg pcspkr tpm_tis_core cryptd parport serio_raw glue_helper tpm i2c_piix4 i2c_core button sunrpc loop autofs4 ext4 crc16 jbd2 mbcache raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod ata_generic virtio_scsi ata_piix floppy crc32c_intel libata scsi_mod virtio_pci virtio_ring e1000 virtio [last unloaded: btrfs]
      [141812.036790] CPU: 3 PID: 845 Comm: fdm-stress Tainted: G    B   W       4.12.3-btrfs-next-52+ #1
      [141812.036790] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
      [141812.036790] task: ffff8801e6694180 task.stack: ffffc90009004000
      [141812.036790] RIP: 0010:assfail.constprop.18+0x1c/0x1e [btrfs]
      [141812.036790] RSP: 0018:ffffc90009007bc0 EFLAGS: 00010282
      [141812.036790] RAX: 0000000000000046 RBX: ffff88017512c008 RCX: 0000000000000001
      [141812.036790] RDX: ffff88023fd95201 RSI: ffffffff8182264c RDI: 00000000ffffffff
      [141812.036790] RBP: ffffc90009007bc0 R08: 0000000000000001 R09: 0000000000000001
      [141812.036790] R10: 0000000000001000 R11: ffffffff82f5a0c9 R12: ffff88014e5947e8
      [141812.036790] R13: 00000000000b4000 R14: ffff8801b234d008 R15: 0000000000000000
      [141812.036790] FS:  00007fdba6ffd700(0000) GS:ffff88023fd80000(0000) knlGS:0000000000000000
      [141812.036790] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [141812.036790] CR2: 00007fdb9c000010 CR3: 000000016efa2000 CR4: 00000000001406e0
      [141812.036790] Call Trace:
      [141812.036790]  btrfs_log_inode+0x9f0/0xd3d [btrfs]
      [141812.036790]  ? __mutex_lock+0x120/0x3ce
      [141812.036790]  btrfs_log_inode_parent+0x224/0x685 [btrfs]
      [141812.036790]  ? lock_acquire+0x16b/0x1af
      [141812.036790]  btrfs_log_dentry_safe+0x60/0x7b [btrfs]
      [141812.036790]  btrfs_sync_file+0x32e/0x3f8 [btrfs]
      [141812.036790]  vfs_fsync_range+0x8a/0x9d
      [141812.036790]  vfs_fsync+0x1c/0x1e
      [141812.036790]  do_fsync+0x31/0x4a
      [141812.036790]  SyS_fdatasync+0x13/0x17
      [141812.036790]  entry_SYSCALL_64_fastpath+0x18/0xad
      [141812.036790] RIP: 0033:0x7fdbac41a47d
      [141812.036790] RSP: 002b:00007fdba6ffce30 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
      [141812.036790] RAX: ffffffffffffffda RBX: ffffffff81092c9f RCX: 00007fdbac41a47d
      [141812.036790] RDX: 0000004cf0160a40 RSI: 0000000000000000 RDI: 0000000000000006
      [141812.036790] RBP: ffffc90009007f98 R08: 0000000000000000 R09: 0000000000000010
      [141812.036790] R10: 00000000000002e8 R11: 0000000000000293 R12: ffffffff8110cd90
      [141812.036790] R13: ffffc90009007f78 R14: 0000000000000000 R15: 0000000000000000
      [141812.036790]  ? time_hardirqs_off+0x9/0x14
      [141812.036790]  ? trace_hardirqs_off_caller+0x1f/0xa3
      [141812.036790] Code: c7 d6 61 6b a0 48 89 e5 e8 ba ef a8 e0 0f 0b 55 89 f1 48 c7 c2 6d 65 6b a0 48 89 fe 48 c7 c7 81 65 6b a0 48 89 e5 e8 9c ef a8 e0 <0f> 0b 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89
      [141812.036790] RIP: assfail.constprop.18+0x1c/0x1e [btrfs] RSP: ffffc90009007bc0
      [141812.084448] ---[ end trace 44e472684c7a32cc ]---
      
      Which happens because the code that logs a trailing hole when the no-holes
      feature is enabled, did not consider that a compressed inline extent can
      represent a range with a size matching the sector size, in which case
      expanding the inode's i_size, through a truncate operation, won't lead
      to padding with zeroes the page that represents the inline extent, and
      therefore the inline extent remains after the truncation.
      
      Fix this by adapting the assertion to accept inline extents representing
      data with a sector size length if, and only if, the inline extents are
      compressed.
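The adapted condition can be written as a small predicate. This is a sketch of the logic as described above, not the kernel source; the function names and the 4096-byte sector size are assumptions chosen to match the reproducer for 4K page systems.

```python
def old_assertion(extent_len: int, i_size: int) -> bool:
    """Original condition: the inline extent must cover exactly i_size."""
    return extent_len == i_size

def adapted_assertion(extent_len: int, i_size: int,
                      sectorsize: int, compressed: bool) -> bool:
    """Also accept a sector-sized inline extent if, and only if, compressed."""
    return extent_len == i_size or (extent_len == sectorsize and compressed)

# Reproducer values: 4K compressed inline extent, file truncated up to 32K.
SECTORSIZE = 4096  # assumed, matches a 4K page size
trips_old = not old_assertion(4096, 32768)
passes_new = adapted_assertion(4096, 32768, SECTORSIZE, compressed=True)
```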
      
      A sample and trivial reproducer (for systems with a 4K page size) for this
      issue:
      
        mkfs.btrfs -O no-holes -f /dev/sdc
        mount -o compress /dev/sdc /mnt
        xfs_io -f -c "pwrite -S 0xab 0 4K" /mnt/foobar
        sync
        xfs_io -c "truncate 32K" /mnt/foobar
        xfs_io -c "fsync" /mnt/foobar
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: remove redundant check on ret being non-zero · 938e1c77
      Colin Ian King committed
      The error return variable ret is initialized to zero and then is
      checked to see if it is non-zero in the if-block that follows it.
      It is therefore impossible for ret to be non-zero after the if-block
      hence the check is redundant and can be removed.
      
      Detected by CoverityScan, CID#1021040 ("Logically dead code")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  14. 20 Jul, 2017 1 commit
  15. 22 Jun, 2017 4 commits
    • btrfs: Check name_len in btrfs_check_ref_name_override · 3c1d4184
      Su Yue committed
      In btrfs_log_inode, btrfs_search_forward gets the buffer and then
      btrfs_check_ref_name_override will read name from ref/extref for the
      first time.
      
      Call btrfs_is_name_len_valid before reading name.
      Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: Verify dir_item in replay_xattr_deletes · 8ee8c2d6
      Su Yue committed
      replay_xattr_deletes calls btrfs_search_slot to get buffer and reads
      name.
      
      Call verify_dir_item to check name_len in replay_xattr_deletes to avoid
      reading beyond the item boundary.
      Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: Check name_len on add_inode_ref call path · 26a836ce
      Su Yue committed
      replay_one_buffer first reads buffers and dispatches items according to
      the item type.
      In this patch, add_inode_ref handles inode_ref and inode_extref.
      Then add_inode_ref calls ref_get_fields and extref_get_fields to read
      ref/extref name for the first time.
      So checking name_len before reading those two is fine.
      
      add_inode_ref also calls inode_in_dir to match ref/extref in parent_dir.
      The call graph includes btrfs_match_dir_item_name to read dir_item name
      in the parent dir.
      Checking first dir_item is not enough. Change it to verify every
      dir_item while doing matches.
      Signed-off-by: NSu Yue <suy.fnst@cn.fujitsu.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      26a836ce
    • S
      btrfs: Check name_len with boundary in verify dir_item · e79a3327
      Su Yue committed
      Originally, verify_dir_item checked the name_len of a dir_item against
      fixed values but not against the item boundary.
      If a corrupted name_len was not bigger than the fixed value, for
      example 255, the function would consider the dir_item fine, and the
      subsequent read beyond the item boundary would cause a crash.
      
      Example:
              1. Corrupt one dir_item name_len to be 255.
              2. Run 'ls -lar /mnt/test/ > /dev/null'
      dmesg:
      [   48.451449] BTRFS info (device vdb1): disk space caching is enabled
      [   48.451453] BTRFS info (device vdb1): has skinny extents
      [   48.489420] general protection fault: 0000 [#1] SMP
      [   48.489571] Modules linked in: ext4 jbd2 mbcache btrfs xor raid6_pq
      [   48.489716] CPU: 1 PID: 2710 Comm: ls Not tainted 4.10.0-rc1 #5
      [   48.489853] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
      [   48.490008] task: ffff880035df1bc0 task.stack: ffffc90004800000
      [   48.490008] RIP: 0010:read_extent_buffer+0xd2/0x190 [btrfs]
      [   48.490008] RSP: 0018:ffffc90004803d98 EFLAGS: 00010202
      [   48.490008] RAX: 000000000000001b RBX: 000000000000001b RCX: 0000000000000000
      [   48.490008] RDX: ffff880079dbf36c RSI: 0005080000000000 RDI: ffff880079dbf368
      [   48.490008] RBP: ffffc90004803dc8 R08: ffff880078e8cc48 R09: ffff880000000000
      [   48.490008] R10: 0000160000000000 R11: 0000000000001000 R12: ffff880079dbf288
      [   48.490008] R13: ffff880078e8ca88 R14: 0000000000000003 R15: ffffc90004803e20
      [   48.490008] FS:  00007fef50c60800(0000) GS:ffff88007d400000(0000) knlGS:0000000000000000
      [   48.490008] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   48.490008] CR2: 000055f335ac2ff8 CR3: 000000007356d000 CR4: 00000000001406e0
      [   48.490008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   48.490008] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   48.490008] Call Trace:
      [   48.490008]  btrfs_real_readdir+0x3b7/0x4a0 [btrfs]
      [   48.490008]  iterate_dir+0x181/0x1b0
      [   48.490008]  SyS_getdents+0xa7/0x150
      [   48.490008]  ? fillonedir+0x150/0x150
      [   48.490008]  entry_SYSCALL_64_fastpath+0x18/0xad
      [   48.490008] RIP: 0033:0x7fef5032546b
      [   48.490008] RSP: 002b:00007ffeafcdb830 EFLAGS: 00000206 ORIG_RAX: 000000000000004e
      [   48.490008] RAX: ffffffffffffffda RBX: 00007fef5061db38 RCX: 00007fef5032546b
      [   48.490008] RDX: 0000000000008000 RSI: 000055f335abaff0 RDI: 0000000000000003
      [   48.490008] RBP: 00007fef5061dae0 R08: 00007fef5061db48 R09: 0000000000000000
      [   48.490008] R10: 000055f335abafc0 R11: 0000000000000206 R12: 00007fef5061db38
      [   48.490008] R13: 0000000000008040 R14: 00007fef5061db38 R15: 000000000000270e
      [   48.490008] RIP: read_extent_buffer+0xd2/0x190 [btrfs] RSP: ffffc90004803d98
      [   48.499455] ---[ end trace 321920d8e8339505 ]---
      
      Fix it by adding a parameter @slot and checking name_len against the
      item boundary by calling btrfs_is_name_len_valid.
      Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
      rev
      Signed-off-by: David Sterba <dsterba@suse.com>
      e79a3327
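
The difference between the two checks can be sketched as follows. This is an illustration under simplified assumptions, not the body of btrfs_is_name_len_valid: `struct sketch_item`, `SKETCH_NAME_MAX` and both function names are hypothetical, and offsets are plain byte positions inside a leaf.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NAME_MAX 255u /* fixed upper bound, as in the old check */

/* Hypothetical, simplified stand-in for an item inside a leaf buffer. */
struct sketch_item {
    size_t item_offset; /* where the item starts inside the leaf */
    size_t item_size;   /* total number of bytes the item occupies */
};

/* Old-style check: compares name_len against a fixed maximum only. */
static bool name_len_ok_fixed(uint16_t name_len)
{
    return name_len <= SKETCH_NAME_MAX;
}

/* Boundary-aware check: the name must also lie entirely inside the item. */
static bool name_len_ok_bounded(const struct sketch_item *item,
                                size_t name_offset, uint16_t name_len)
{
    size_t item_end = item->item_offset + item->item_size;

    if (name_len > SKETCH_NAME_MAX)
        return false;
    if (name_offset < item->item_offset || name_offset > item_end)
        return false;
    /* Compare against the bytes that remain between the name and the end. */
    return name_len <= item_end - name_offset;
}
```

With an item covering bytes 100 to 139 and a name starting at byte 130, a corrupted name_len of 255 passes the fixed check but fails the boundary-aware one, which matches the scenario in the reproducer.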
  16. 18 Apr, 2017 1 commit
  17. 28 Feb, 2017 5 commits
  18. 24 Feb, 2017 1 commit
    • F
      Btrfs: do not create explicit holes when replaying log tree if NO_HOLES enabled · 3168021c
      Filipe Manana committed
      We log holes explicitly by using file extent items. However, when
      replaying a log tree, if a logged file extent item corresponds to a
      hole and the NO_HOLES feature is enabled, we do not need to copy the
      file extent item
      into the fs/subvolume tree, as the absence of such file extent items is
      the purpose of the NO_HOLES feature. So skip the copying of file extent
      items representing holes when the NO_HOLES feature is enabled.
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      3168021c
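
The replay decision described in this commit can be sketched as a small predicate. This is a simplified model, not the kernel implementation: the struct, the enum and the function names are hypothetical, and the sketch assumes a hole is represented as a regular file extent item whose disk_bytenr is 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical extent types for this sketch. */
enum sketch_extent_type {
    SKETCH_EXTENT_INLINE,
    SKETCH_EXTENT_REG,
    SKETCH_EXTENT_PREALLOC
};

/* Hypothetical, simplified view of a logged file extent item. */
struct sketch_file_extent {
    enum sketch_extent_type type;
    uint64_t disk_bytenr; /* assumed: 0 means no disk space backs the range */
    uint64_t num_bytes;   /* length of the file range the item covers */
};

/* Assumption of this sketch: a hole is a regular extent with disk_bytenr 0. */
static bool extent_is_hole(const struct sketch_file_extent *fe)
{
    return fe->type == SKETCH_EXTENT_REG && fe->disk_bytenr == 0;
}

/*
 * Decide whether a logged file extent item should be copied into the
 * fs/subvolume tree during log replay. With NO_HOLES enabled, hole
 * items are skipped, since their absence already denotes a hole.
 */
static bool should_copy_on_replay(const struct sketch_file_extent *fe,
                                  bool no_holes_enabled)
{
    return !(no_holes_enabled && extent_is_hole(fe));
}
```

Without NO_HOLES the predicate still returns true for hole items, so explicit holes keep being copied in that configuration.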
  19. 17 Feb, 2017 2 commits