1. 21 8月, 2017 3 次提交
    • L
      btrfs: use btrfsic_submit_bio instead of submit_bio in write_dev_flush · 43a01111
      Lu Fengqi 提交于
      Although this bio has no data attached, it will reach this condition
      (bio->bi_opf & REQ_PREFLUSH) and then update the flush_gen of dev_state
      in __btrfsic_submit_bio. So we should still submit it through integrity
      checker. Otherwise, the integrity checker will throw the following warning
      when I mount a newly created btrfs filesystem.
      
      [10264.755497] btrfs: attempt to write superblock which references block M @29523968 (sdb1/1111654400/0) which is not flushed out of disk's write cache (block flush_gen=1, dev->flush_gen=0)!
      [10264.755498] btrfs: attempt to write superblock which references block M @29523968 (sdb1/37912576/0) which is not flushed out of disk's write cache (block flush_gen=1, dev->flush_gen=0)!
      Signed-off-by: NLu Fengqi <lufq.fnst@cn.fujitsu.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      43a01111
    • F
      Btrfs: incremental send, fix emission of invalid clone operations · 72610b1b
      Filipe Manana 提交于
      When doing an incremental send it's possible that the computed send stream
      contains clone operations that will fail on the receiver if the receiver
      has compression enabled and the clone operations target a sector sized
      extent that starts at a zero file offset, is not compressed on the source
      filesystem but ends up being compressed and inlined at the destination
      filesystem.
      
      Example scenario:
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount -o compress /dev/sdb /mnt
      
        # By doing a direct IO write, the data is not compressed.
        $ xfs_io -f -d -c "pwrite -S 0xab 0 4K" /mnt/foobar
        $ btrfs subvolume snapshot -r /mnt /mnt/mysnap1
      
        $ xfs_io -c "reflink /mnt/foobar 0 8K 4K" /mnt/foobar
        $ btrfs subvolume snapshot -r /mnt /mnt/mysnap2
      
        $ btrfs send -f /tmp/1.snap /mnt/mysnap1
        $ btrfs send -f /tmp/2.snap -p /mnt/mysnap1 /mnt/mysnap2
        $ umount /mnt
      
        $ mkfs.btrfs -f /dev/sdc
        $ mount -o compress /dev/sdc /mnt
        $ btrfs receive -f /tmp/1.snap /mnt
        $ btrfs receive -f /tmp/2.snap /mnt
        ERROR: failed to clone extents to foobar
        Operation not supported
      
      The same could be achieved by mounting the source filesystem without
      compression and doing a buffered IO write instead of a direct IO one,
      and mounting the destination filesystem with compression enabled.
      
      So fix this by issuing regular write operations in the send stream
      instead of clone operations when the source offset is zero and the
      range has a length matching the sector size.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      72610b1b
    • L
      Btrfs: fix out of bounds array access while reading extent buffer · f716abd5
      Liu Bo 提交于
      There is a corner case that slips through the checkers in functions
      reading extent buffer, ie.
      
      if (start < eb->len) and (start + len > eb->len),
      then
      
      a) map_private_extent_buffer() returns immediately because
      it's thinking the range spans across two pages,
      
      b) and the checkers in read_extent_buffer(), WARN_ON(start > eb->len)
      and WARN_ON(start + len > eb->start + eb->len), both are OK in this
      corner case, but it'd actually try to access the eb->pages out of
      bounds because of (start + len > eb->len).
      
      The case is found by switching extent inline ref type from shared data
      ref to non-shared data ref, which is a kind of metadata corruption.
      
      It'd use the wrong helper to access the eb,
      eg. btrfs_extent_data_ref_root(eb, ref) is used but the %ref passing
      here is "struct btrfs_shared_data_ref".  And if the extent item
      happens to be the first item in the eb, then offset/length will get
      over eb->len which ends up an invalid memory access.
      
      This is adding proper checks in order to avoid invalid memory access,
      ie. 'general protection fault', before it's too late.
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      f716abd5
  2. 18 8月, 2017 12 次提交
    • N
      btrfs: Fix -EOVERFLOW handling in btrfs_ioctl_tree_search_v2 · c59efa7e
      Nikolay Borisov 提交于
      The buffer passed to btrfs_ioctl_tree_search* functions have to be at least
      sizeof(struct btrfs_ioctl_search_header). If this is not the case then the
      ioctl should return -EOVERFLOW and set the uarg->buf_size to the minimum
      required size. Currently btrfs_ioctl_tree_search_v2 would return an -EOVERFLOW
      error with ->buf_size being set to the value passed by user space. Fix this by
      removing the size check and relying on search_ioctl, which already includes it
      and correctly sets buf_size.
      Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      c59efa7e
    • N
      btrfs: Move skip checksum check from btrfs_submit_direct to __btrfs_submit_dio_bio · e6961cac
      Nikolay Borisov 提交于
      Currently the code checks whether we should do data checksumming in
      btrfs_submit_direct and the boolean result of this check is passed to
      btrfs_submit_direct_hook, in turn passing it to __btrfs_submit_dio_bio which
      actually consumes it. The last function actually has all the necessary context
      to figure out whether to skip the check or not, so let's move the check closer
      to where it's being consumed. No functional changes.
      Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NChris Mason <clm@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      e6961cac
    • F
      Btrfs: fix assertion failure during fsync in no-holes mode · 6399fb5a
      Filipe Manana 提交于
      When logging an inode in full mode that has an inline compressed extent
      that represents a range with a size matching the sector size (currently
      the same as the page size), has a trailing hole and the no-holes feature
      is enabled, we end up failing an assertion leading to a trace like the
      following:
      
      [141812.031528] assertion failed: len == i_size, file: fs/btrfs/tree-log.c, line: 4453
      [141812.033069] ------------[ cut here ]------------
      [141812.034330] kernel BUG at fs/btrfs/ctree.h:3452!
      [141812.035137] invalid opcode: 0000 [#1] PREEMPT SMP
      [141812.035932] Modules linked in: btrfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio dm_flakey dm_mod dax ppdev evdev ghash_clmulni_intel pcbc aesni_intel aes_x86_64 tpm_tis psmouse crypto_simd parport_pc sg pcspkr tpm_tis_core cryptd parport serio_raw glue_helper tpm i2c_piix4 i2c_core button sunrpc loop autofs4 ext4 crc16 jbd2 mbcache raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod ata_generic virtio_scsi ata_piix floppy crc32c_intel libata scsi_mod virtio_pci virtio_ring e1000 virtio [last unloaded: btrfs]
      [141812.036790] CPU: 3 PID: 845 Comm: fdm-stress Tainted: G    B   W       4.12.3-btrfs-next-52+ #1
      [141812.036790] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
      [141812.036790] task: ffff8801e6694180 task.stack: ffffc90009004000
      [141812.036790] RIP: 0010:assfail.constprop.18+0x1c/0x1e [btrfs]
      [141812.036790] RSP: 0018:ffffc90009007bc0 EFLAGS: 00010282
      [141812.036790] RAX: 0000000000000046 RBX: ffff88017512c008 RCX: 0000000000000001
      [141812.036790] RDX: ffff88023fd95201 RSI: ffffffff8182264c RDI: 00000000ffffffff
      [141812.036790] RBP: ffffc90009007bc0 R08: 0000000000000001 R09: 0000000000000001
      [141812.036790] R10: 0000000000001000 R11: ffffffff82f5a0c9 R12: ffff88014e5947e8
      [141812.036790] R13: 00000000000b4000 R14: ffff8801b234d008 R15: 0000000000000000
      [141812.036790] FS:  00007fdba6ffd700(0000) GS:ffff88023fd80000(0000) knlGS:0000000000000000
      [141812.036790] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [141812.036790] CR2: 00007fdb9c000010 CR3: 000000016efa2000 CR4: 00000000001406e0
      [141812.036790] Call Trace:
      [141812.036790]  btrfs_log_inode+0x9f0/0xd3d [btrfs]
      [141812.036790]  ? __mutex_lock+0x120/0x3ce
      [141812.036790]  btrfs_log_inode_parent+0x224/0x685 [btrfs]
      [141812.036790]  ? lock_acquire+0x16b/0x1af
      [141812.036790]  btrfs_log_dentry_safe+0x60/0x7b [btrfs]
      [141812.036790]  btrfs_sync_file+0x32e/0x3f8 [btrfs]
      [141812.036790]  vfs_fsync_range+0x8a/0x9d
      [141812.036790]  vfs_fsync+0x1c/0x1e
      [141812.036790]  do_fsync+0x31/0x4a
      [141812.036790]  SyS_fdatasync+0x13/0x17
      [141812.036790]  entry_SYSCALL_64_fastpath+0x18/0xad
      [141812.036790] RIP: 0033:0x7fdbac41a47d
      [141812.036790] RSP: 002b:00007fdba6ffce30 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
      [141812.036790] RAX: ffffffffffffffda RBX: ffffffff81092c9f RCX: 00007fdbac41a47d
      [141812.036790] RDX: 0000004cf0160a40 RSI: 0000000000000000 RDI: 0000000000000006
      [141812.036790] RBP: ffffc90009007f98 R08: 0000000000000000 R09: 0000000000000010
      [141812.036790] R10: 00000000000002e8 R11: 0000000000000293 R12: ffffffff8110cd90
      [141812.036790] R13: ffffc90009007f78 R14: 0000000000000000 R15: 0000000000000000
      [141812.036790]  ? time_hardirqs_off+0x9/0x14
      [141812.036790]  ? trace_hardirqs_off_caller+0x1f/0xa3
      [141812.036790] Code: c7 d6 61 6b a0 48 89 e5 e8 ba ef a8 e0 0f 0b 55 89 f1 48 c7 c2 6d 65 6b a0 48 89 fe 48 c7 c7 81 65 6b a0 48 89 e5 e8 9c ef a8 e0 <0f> 0b 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89
      [141812.036790] RIP: assfail.constprop.18+0x1c/0x1e [btrfs] RSP: ffffc90009007bc0
      [141812.084448] ---[ end trace 44e472684c7a32cc ]---
      
      Which happens because the code that logs a trailing hole when the no-holes
      feature is enabled, did not consider that a compressed inline extent can
      represent a range with a size matching the sector size, in which case
      expanding the inode's i_size, through a truncate operation, won't lead
      to padding with zeroes the page that represents the inline extent, and
      therefore the inline extent remains after the truncation.
      
      Fix this by adapting the assertion to accept inline extents representing
      data with a sector size length if, and only if, the inline extents are
      compressed.
      
      A sample and trivial reproducer (for systems with a 4K page size) for this
      issue:
      
        mkfs.btrfs -O no-holes -f /dev/sdc
        mount -o compress /dev/sdc /mnt
        xfs_io -f -c "pwrite -S 0xab 0 4K" /mnt/foobar
        sync
        xfs_io -c "truncate 32K" /mnt/foobar
        xfs_io -c "fsync" /mnt/foobar
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      6399fb5a
    • F
      Btrfs: avoid unnecessarily locking inode when clearing a range · 4a4b964f
      Filipe Manana 提交于
      If the range being cleared was not marked for defrag and we are not
      about to clear the range from the defrag status, we don't need to
      lock and unlock the inode.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NChris Mason <clm@fb.com>
      Reviewed-by: NWang Shilong <wangshilong1991@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      4a4b964f
    • C
      btrfs: remove redundant check on ret being non-zero · 938e1c77
      Colin Ian King 提交于
      The error return variable ret is initialized to zero and then is
      checked to see if it is non-zero in the if-block that follows it.
      It is therefore impossible for ret to be non-zero after the if-block
      hence the check is redundant and can be removed.
      
      Detected by CoverityScan, CID#1021040 ("Logically dead code")
      Signed-off-by: NColin Ian King <colin.king@canonical.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      938e1c77
    • N
      btrfs: expose internal free space tree routine only if sanity tests are enabled · 2d77ab3c
      Nikolay Borisov 提交于
      The internal free space tree management routines are always exposed for
      testing purposes. Make them dependent on SANITY_TESTS being on so that
      they are exposed only when they really have to.
      Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      2d77ab3c
    • N
      btrfs: Remove unused sectorsize variable from struct map_lookup · db7c942c
      Nikolay Borisov 提交于
      This variable was added in 1abe9b8a ("Btrfs: add initial tracepointi
      support for btrfs"), yet it never really got used, only assigned to. So
      let's remove it.
      Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      db7c942c
    • N
      btrfs: Remove never-reached WARN_ON · 92ac58ec
      Nikolay Borisov 提交于
      We have a WARN_ON(!var) inside an if branch which is executed (among
      others) only when var is true.
      Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      92ac58ec
    • A
      btrfs: remove unused BTRFS_COMPRESS_LAST · dc2f2921
      Anand Jain 提交于
      We aren't using this define, so removing it.
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      dc2f2921
    • A
      btrfs: use BTRFS_FSID_SIZE for fsid · b94417ea
      Anand Jain 提交于
      We have define for FSID size so use it.
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      b94417ea
    • A
      btrfs: use appropriate define for the fsid · 44880fdc
      Anand Jain 提交于
      Though BTRFS_FSID_SIZE and BTRFS_UUID_SIZE are of the same size, we
      should use the matching constant for the fsid buffer.
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      44880fdc
    • J
      btrfs: increase ctx->pos for delayed dir index · 42e9cc46
      Josef Bacik 提交于
      Our dir_context->pos is supposed to hold the next position we're
      supposed to look.  If we successfully insert a delayed dir index we
      could end up with a duplicate entry because we don't increase ctx->pos
      after doing the dir_emit.
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      42e9cc46
  3. 16 8月, 2017 25 次提交