1. 22 1月, 2018 4 次提交
  2. 30 10月, 2017 2 次提交
  3. 24 8月, 2017 2 次提交
    • O
      Btrfs: fix blk_status_t/errno confusion · 58efbc9f
      Omar Sandoval 提交于
      This fixes several instances of blk_status_t and bare errno ints being
      mixed up, some of which are real bugs.
      
      In the normal case, 0 matches BLK_STS_OK, so we don't observe any
      effects of the missing conversion, but in case of errors or passes
      through the repair/retry paths, the errors get mixed up.
      
      The changes were identified using 'sparse', we don't have reports of the
      buggy behaviour.
      
      Fixes: 4e4cbee9 ("block: switch bios to blk_status_t")
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      58efbc9f
    • C
      block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Christoph Hellwig 提交于
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different life time rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      74d46992
  4. 14 7月, 2017 1 次提交
    • F
      Btrfs: fix write corruption due to bio cloning on raid5/6 · 6592e58c
      Filipe Manana 提交于
      The recent changes to make bio cloning faster (added in the 4.13 merge
      window) by using the bio_clone_fast() API introduced a regression on
      raid5/6 modes, because cloned bios have an invalid bi_vcnt field
      (therefore it can not be used) and the raid5/6 code uses the
      bio_for_each_segment_all() API to iterate the segments of a bio, and this
      API uses a bio's bi_vcnt field.
      
      The issue is very simple to trigger by doing for example a direct IO write
      against a raid5 or raid6 filesystem and then attempting to read what we
      wrote before:
      
        $ mkfs.btrfs -m raid5 -d raid5 -f /dev/sdc /dev/sdd /dev/sde /dev/sdf
        $ mount /dev/sdc /mnt
        $ xfs_io -f -d -c "pwrite -S 0xab 0 1M" /mnt/foobar
        $ od -t x1 /mnt/foobar
        od: /mnt/foobar: read error: Input/output error
      
      For that example, the following is also reported in dmesg/syslog:
      
        [18274.985557] btrfs_print_data_csum_error: 18 callbacks suppressed
        [18274.995277] BTRFS warning (device sdf): csum failed root 5 ino 257 off 0 csum 0x98f94189 expected csum 0x94374193 mirror 1
        [18274.997205] BTRFS warning (device sdf): csum failed root 5 ino 257 off 4096 csum 0x98f94189 expected csum 0x94374193 mirror 1
        [18275.025221] BTRFS warning (device sdf): csum failed root 5 ino 257 off 8192 csum 0x98f94189 expected csum 0x94374193 mirror 1
        [18275.047422] BTRFS warning (device sdf): csum failed root 5 ino 257 off 12288 csum 0x98f94189 expected csum 0x94374193 mirror 1
        [18275.054818] BTRFS warning (device sdf): csum failed root 5 ino 257 off 4096 csum 0x98f94189 expected csum 0x94374193 mirror 1
        [18275.054834] BTRFS warning (device sdf): csum failed root 5 ino 257 off 8192 csum 0x98f94189 expected csum 0x94374193 mirror 1
        [18275.054943] BTRFS warning (device sdf): csum failed root 5 ino 257 off 8192 csum 0x98f94189 expected csum 0x94374193 mirror 2
        [18275.055207] BTRFS warning (device sdf): csum failed root 5 ino 257 off 8192 csum 0x98f94189 expected csum 0x94374193 mirror 3
        [18275.055571] BTRFS warning (device sdf): csum failed root 5 ino 257 off 0 csum 0x98f94189 expected csum 0x94374193 mirror 1
        [18275.062171] BTRFS warning (device sdf): csum failed root 5 ino 257 off 12288 csum 0x98f94189 expected csum 0x94374193 mirror 1
      
      A scrub will also fail correcting bad copies, mentioning the following in
      dmesg/syslog:
      
        [18276.128696] scrub_handle_errored_block: 498 callbacks suppressed
        [18276.129617] BTRFS warning (device sdf): checksum error at logical 2186346496 on dev /dev/sde, sector 2116608, root 5, inode 257, offset 65536, length 4096, links $
        [18276.149235] btrfs_dev_stat_print_on_error: 498 callbacks suppressed
        [18276.157897] BTRFS error (device sdf): bdev /dev/sde errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
        [18276.206059] BTRFS warning (device sdf): checksum error at logical 2186477568 on dev /dev/sdd, sector 2116736, root 5, inode 257, offset 196608, length 4096, links$
        [18276.206059] BTRFS error (device sdf): bdev /dev/sdd errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
        [18276.306552] BTRFS warning (device sdf): checksum error at logical 2186543104 on dev /dev/sdd, sector 2116864, root 5, inode 257, offset 262144, length 4096, links$
        [18276.319152] BTRFS error (device sdf): bdev /dev/sdd errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
        [18276.394316] BTRFS warning (device sdf): checksum error at logical 2186739712 on dev /dev/sdf, sector 2116992, root 5, inode 257, offset 458752, length 4096, links$
        [18276.396348] BTRFS error (device sdf): bdev /dev/sdf errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
        [18276.434127] BTRFS warning (device sdf): checksum error at logical 2186870784 on dev /dev/sde, sector 2117120, root 5, inode 257, offset 589824, length 4096, links$
        [18276.434127] BTRFS error (device sdf): bdev /dev/sde errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
        [18276.500504] BTRFS error (device sdf): unable to fixup (regular) error at logical 2186477568 on dev /dev/sdd
        [18276.538400] BTRFS warning (device sdf): checksum error at logical 2186481664 on dev /dev/sdd, sector 2116744, root 5, inode 257, offset 200704, length 4096, links$
        [18276.540452] BTRFS error (device sdf): bdev /dev/sdd errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
        [18276.542012] BTRFS error (device sdf): unable to fixup (regular) error at logical 2186481664 on dev /dev/sdd
        [18276.585030] BTRFS error (device sdf): unable to fixup (regular) error at logical 2186346496 on dev /dev/sde
        [18276.598306] BTRFS warning (device sdf): checksum error at logical 2186412032 on dev /dev/sde, sector 2116736, root 5, inode 257, offset 131072, length 4096, links$
        [18276.598310] BTRFS error (device sdf): bdev /dev/sde errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
        [18276.598582] BTRFS error (device sdf): unable to fixup (regular) error at logical 2186350592 on dev /dev/sde
        [18276.603455] BTRFS error (device sdf): bdev /dev/sde errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
        [18276.638362] BTRFS warning (device sdf): checksum error at logical 2186354688 on dev /dev/sde, sector 2116624, root 5, inode 257, offset 73728, length 4096, links $
        [18276.640445] BTRFS error (device sdf): bdev /dev/sde errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
        [18276.645942] BTRFS error (device sdf): unable to fixup (regular) error at logical 2186354688 on dev /dev/sde
        [18276.657204] BTRFS error (device sdf): unable to fixup (regular) error at logical 2186412032 on dev /dev/sde
        [18276.660563] BTRFS warning (device sdf): checksum error at logical 2186416128 on dev /dev/sde, sector 2116744, root 5, inode 257, offset 135168, length 4096, links$
        [18276.664609] BTRFS error (device sdf): bdev /dev/sde errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
        [18276.664609] BTRFS error (device sdf): unable to fixup (regular) error at logical 2186358784 on dev /dev/sde
      
      So fix this by using the bio_for_each_segment() API and setting before
      the bio's bi_iter field to the value of the corresponding btrfs bio
      container's saved iterator if we are processing a cloned bio in the
      raid5/6 code (the same code processes both cloned and non-cloned bios).
      
      This incorrect iteration of cloned bios was also causing some occasional
      BUG_ONs when running fstest btrfs/064, which have a trace like the
      following:
      
        [ 6674.416156] ------------[ cut here ]------------
        [ 6674.416157] kernel BUG at fs/btrfs/raid56.c:1897!
        [ 6674.416159] invalid opcode: 0000 [#1] PREEMPT SMP
        [ 6674.416160] Modules linked in: dm_flakey dm_mod dax ppdev tpm_tis parport_pc tpm_tis_core evdev tpm psmouse sg i2c_piix4 pcspkr parport i2c_core serio_raw button s
        [ 6674.416184] CPU: 3 PID: 19236 Comm: kworker/u32:10 Not tainted 4.12.0-rc6-btrfs-next-44+ #1
        [ 6674.416185] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
        [ 6674.416210] Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
        [ 6674.416211] task: ffff880147f6c740 task.stack: ffffc90001fb8000
        [ 6674.416229] RIP: 0010:__raid_recover_end_io+0x1ac/0x370 [btrfs]
        [ 6674.416230] RSP: 0018:ffffc90001fbbb90 EFLAGS: 00010217
        [ 6674.416231] RAX: ffff8801ff4b4f00 RBX: 0000000000000002 RCX: 0000000000000001
        [ 6674.416232] RDX: ffff880099b045d8 RSI: ffffffff81a5f6e0 RDI: 0000000000000004
        [ 6674.416232] RBP: ffffc90001fbbbc8 R08: 0000000000000001 R09: 0000000000000001
        [ 6674.416233] R10: ffffc90001fbbac8 R11: 0000000000001000 R12: 0000000000000002
        [ 6674.416234] R13: ffff880099b045c0 R14: 0000000000000004 R15: ffff88012bff2000
        [ 6674.416235] FS:  0000000000000000(0000) GS:ffff88023f2c0000(0000) knlGS:0000000000000000
        [ 6674.416235] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [ 6674.416236] CR2: 00007f28cf282000 CR3: 00000001000c6000 CR4: 00000000000006e0
        [ 6674.416239] Call Trace:
        [ 6674.416259]  __raid56_parity_recover+0xfc/0x16e [btrfs]
        [ 6674.416276]  raid56_parity_recover+0x157/0x16b [btrfs]
        [ 6674.416293]  btrfs_map_bio+0xe0/0x259 [btrfs]
        [ 6674.416310]  btrfs_submit_bio_hook+0xbf/0x147 [btrfs]
        [ 6674.416327]  end_bio_extent_readpage+0x27b/0x4a0 [btrfs]
        [ 6674.416331]  bio_endio+0x17d/0x1b3
        [ 6674.416346]  end_workqueue_fn+0x3c/0x3f [btrfs]
        [ 6674.416362]  btrfs_scrubparity_helper+0x1aa/0x3b8 [btrfs]
        [ 6674.416379]  btrfs_endio_helper+0xe/0x10 [btrfs]
        [ 6674.416381]  process_one_work+0x276/0x4b6
        [ 6674.416384]  worker_thread+0x1ac/0x266
        [ 6674.416386]  ? rescuer_thread+0x278/0x278
        [ 6674.416387]  kthread+0x106/0x10e
        [ 6674.416389]  ? __list_del_entry+0x22/0x22
        [ 6674.416391]  ret_from_fork+0x27/0x40
        [ 6674.416395] Code: 44 89 e2 be 00 10 00 00 ff 15 b0 ab ef ff eb 72 4d 89 e8 89 d9 44 89 e2 be 00 10 00 00 ff 15 a3 ab ef ff eb 5d 41 83 fc ff 74 02 <0f> 0b 49 63 97
        [ 6674.416432] RIP: __raid_recover_end_io+0x1ac/0x370 [btrfs] RSP: ffffc90001fbbb90
        [ 6674.416434] ---[ end trace 74d56ebe7489dd6a ]---
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
      6592e58c
  5. 20 6月, 2017 3 次提交
  6. 09 6月, 2017 1 次提交
  7. 18 4月, 2017 3 次提交
    • Q
      btrfs: Wait for in-flight bios before freeing target device for raid56 · ae6529c3
      Qu Wenruo 提交于
      When raid56 dev-replace is cancelled by running scrub, we will free
      target device without waiting for in-flight bios, causing the following
      NULL pointer deference or general protection failure.
      
       BUG: unable to handle kernel NULL pointer dereference at 00000000000005e0
       IP: generic_make_request_checks+0x4d/0x610
       CPU: 1 PID: 11676 Comm: kworker/u4:14 Tainted: G  O    4.11.0-rc2 #72
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
       Workqueue: btrfs-endio-raid56 btrfs_endio_raid56_helper [btrfs]
       task: ffff88002875b4c0 task.stack: ffffc90001334000
       RIP: 0010:generic_make_request_checks+0x4d/0x610
       Call Trace:
        ? generic_make_request+0xc7/0x360
        generic_make_request+0x24/0x360
        ? generic_make_request+0xc7/0x360
        submit_bio+0x64/0x120
        ? page_in_rbio+0x4d/0x80 [btrfs]
        ? rbio_orig_end_io+0x80/0x80 [btrfs]
        finish_rmw+0x3f4/0x540 [btrfs]
        validate_rbio_for_rmw+0x36/0x40 [btrfs]
        raid_rmw_end_io+0x7a/0x90 [btrfs]
        bio_endio+0x56/0x60
        end_workqueue_fn+0x3c/0x40 [btrfs]
        btrfs_scrubparity_helper+0xef/0x620 [btrfs]
        btrfs_endio_raid56_helper+0xe/0x10 [btrfs]
        process_one_work+0x2af/0x720
        ? process_one_work+0x22b/0x720
        worker_thread+0x4b/0x4f0
        kthread+0x10f/0x150
        ? process_one_work+0x720/0x720
        ? kthread_create_on_node+0x40/0x40
        ret_from_fork+0x2e/0x40
       RIP: generic_make_request_checks+0x4d/0x610 RSP: ffffc90001337bb8
      
      In btrfs_dev_replace_finishing(), we will call
      btrfs_rm_dev_replace_blocked() to wait bios before destroying the target
      device when scrub is finished normally.
      
      However when dev-replace is aborted, either due to error or cancelled by
      scrub, we didn't wait for bios, this can lead to use-after-free if there
      are bios holding the target device.
      
      Furthermore, for raid56 scrub, at least 2 places are calling
      btrfs_map_sblock() without protection of bio_counter, leading to the
      problem.
      
      This patch fixes the problem:
      1) Wait for bio_counter before freeing target device when canceling
         replace
      2) When calling btrfs_map_sblock() for raid56, use bio_counter to
         protect the call.
      
      Cc: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      ae6529c3
    • L
      Btrfs: fix wrong failed mirror_num of read-repair on raid56 · abad60c6
      Liu Bo 提交于
      In raid56 scenario, after trying parity recovery, we didn't set
      mirror_num for btrfs_bio with failed mirror_num, hence
      end_bio_extent_readpage() will report a random mirror_num in dmesg
      log.
      
      Cc: David Sterba <dsterba@suse.cz>
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      abad60c6
    • E
      btrfs: convert btrfs_raid_bio.refs from atomic_t to refcount_t · dec95574
      Elena Reshetova 提交于
      refcount_t type and corresponding API should be
      used instead of atomic_t when the variable is used as
      a reference counter. This allows to avoid accidental
      refcounter overflows that might lead to use-after-free
      situations.
      Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
      Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NDavid Windsor <dwindsor@gmail.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      dec95574
  8. 14 2月, 2017 1 次提交
  9. 06 12月, 2016 3 次提交
  10. 30 11月, 2016 1 次提交
  11. 26 9月, 2016 1 次提交
  12. 08 6月, 2016 2 次提交
  13. 26 5月, 2016 1 次提交
  14. 05 4月, 2016 1 次提交
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  15. 20 1月, 2016 5 次提交
  16. 07 1月, 2016 1 次提交
  17. 11 10月, 2015 1 次提交
  18. 09 8月, 2015 1 次提交
    • O
      Btrfs: add RAID 5/6 BTRFS_RBIO_REBUILD_MISSING operation · b4ee1782
      Omar Sandoval 提交于
      The current RAID 5/6 recovery code isn't quite prepared to handle
      missing devices. In particular, it expects a bio that we previously
      attempted to use in the read path, meaning that it has valid pages
      allocated. However, missing devices have a NULL blkdev, and we can't
      call bio_add_page() on a bio with a NULL blkdev. We could do manual
      manipulation of bio->bi_io_vec, but that's pretty gross. So instead, add
      a separate path that allows us to manually add pages to the rbio.
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      b4ee1782
  19. 29 7月, 2015 1 次提交
    • C
      block: add a bi_error field to struct bio · 4246a0b6
      Christoph Hellwig 提交于
      Currently we have two different ways to signal an I/O error on a BIO:
      
       (1) by clearing the BIO_UPTODATE flag
       (2) by returning a Linux errno value to the bi_end_io callback
      
      The first one has the drawback of only communicating a single possible
      error (-EIO), and the second one has the drawback of not beeing persistent
      when bios are queued up, and are not passed along from child to parent
      bio in the ever more popular chaining scenario.  Having both mechanisms
      available has the additional drawback of utterly confusing driver authors
      and introducing bugs where various I/O submitters only deal with one of
      them, and the others have to add boilerplate code to deal with both kinds
      of error returns.
      
      So add a new bi_error field to store an errno value directly in struct
      bio and remove the existing mechanisms to clean all this up.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NHannes Reinecke <hare@suse.de>
      Reviewed-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      4246a0b6
  20. 04 3月, 2015 1 次提交
  21. 17 2月, 2015 1 次提交
  22. 22 1月, 2015 3 次提交