1. 05 10月, 2019 1 次提交
    • F
      Btrfs: fix use-after-free when using the tree modification log · b08344be
      Filipe Manana 提交于
      commit efad8a853ad2057f96664328a0d327a05ce39c76 upstream.
      
      At ctree.c:get_old_root(), we are accessing a root's header owner field
      after we have freed the respective extent buffer. This results in an
      use-after-free that can lead to crashes, and when CONFIG_DEBUG_PAGEALLOC
      is set, results in a stack trace like the following:
      
        [ 3876.799331] stack segment: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
        [ 3876.799363] CPU: 0 PID: 15436 Comm: pool Not tainted 5.3.0-rc3-btrfs-next-54 #1
        [ 3876.799385] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
        [ 3876.799433] RIP: 0010:btrfs_search_old_slot+0x652/0xd80 [btrfs]
        (...)
        [ 3876.799502] RSP: 0018:ffff9f08c1a2f9f0 EFLAGS: 00010286
        [ 3876.799518] RAX: ffff8dd300000000 RBX: ffff8dd85a7a9348 RCX: 000000038da26000
        [ 3876.799538] RDX: 0000000000000000 RSI: ffffe522ce368980 RDI: 0000000000000246
        [ 3876.799559] RBP: dae1922adadad000 R08: 0000000008020000 R09: ffffe522c0000000
        [ 3876.799579] R10: ffff8dd57fd788c8 R11: 000000007511b030 R12: ffff8dd781ddc000
        [ 3876.799599] R13: ffff8dd9e6240578 R14: ffff8dd6896f7a88 R15: ffff8dd688cf90b8
        [ 3876.799620] FS:  00007f23ddd97700(0000) GS:ffff8dda20200000(0000) knlGS:0000000000000000
        [ 3876.799643] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [ 3876.799660] CR2: 00007f23d4024000 CR3: 0000000710bb0005 CR4: 00000000003606f0
        [ 3876.799682] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [ 3876.799703] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [ 3876.799723] Call Trace:
        [ 3876.799735]  ? do_raw_spin_unlock+0x49/0xc0
        [ 3876.799749]  ? _raw_spin_unlock+0x24/0x30
        [ 3876.799779]  resolve_indirect_refs+0x1eb/0xc80 [btrfs]
        [ 3876.799810]  find_parent_nodes+0x38d/0x1180 [btrfs]
        [ 3876.799841]  btrfs_check_shared+0x11a/0x1d0 [btrfs]
        [ 3876.799870]  ? extent_fiemap+0x598/0x6e0 [btrfs]
        [ 3876.799895]  extent_fiemap+0x598/0x6e0 [btrfs]
        [ 3876.799913]  do_vfs_ioctl+0x45a/0x700
        [ 3876.799926]  ksys_ioctl+0x70/0x80
        [ 3876.799938]  ? trace_hardirqs_off_thunk+0x1a/0x20
        [ 3876.799953]  __x64_sys_ioctl+0x16/0x20
        [ 3876.799965]  do_syscall_64+0x62/0x220
        [ 3876.799977]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [ 3876.799993] RIP: 0033:0x7f23e0013dd7
        (...)
        [ 3876.800056] RSP: 002b:00007f23ddd96ca8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
        [ 3876.800078] RAX: ffffffffffffffda RBX: 00007f23d80210f8 RCX: 00007f23e0013dd7
        [ 3876.800099] RDX: 00007f23d80210f8 RSI: 00000000c020660b RDI: 0000000000000003
        [ 3876.800626] RBP: 000055fa2a2a2440 R08: 0000000000000000 R09: 00007f23ddd96d7c
        [ 3876.801143] R10: 00007f23d8022000 R11: 0000000000000246 R12: 00007f23ddd96d80
        [ 3876.801662] R13: 00007f23ddd96d78 R14: 00007f23d80210f0 R15: 00007f23ddd96d80
        (...)
        [ 3876.805107] ---[ end trace e53161e179ef04f9 ]---
      
      Fix that by saving the root's header owner field into a local variable
      before freeing the root's extent buffer, and then use that local variable
      when needed.
      
      Fixes: 30b0463a ("Btrfs: fix accessing the root pointer in tree mod log functions")
      CC: stable@vger.kernel.org # 3.10+
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NAnand Jain <anand.jain@oracle.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b08344be
  2. 22 5月, 2019 1 次提交
    • Q
      btrfs: Check the first key and level for cached extent buffer · d8925a1f
      Qu Wenruo 提交于
      commit 448de471cd4cab0cedd15770082567a69a784a11 upstream.
      
      [BUG]
      When reading a file from a fuzzed image, kernel can panic like:
      
        BTRFS warning (device loop0): csum failed root 5 ino 270 off 0 csum 0x98f94189 expected csum 0x00000000 mirror 1
        assertion failed: !memcmp_extent_buffer(b, &disk_key, offsetof(struct btrfs_leaf, items[0].key), sizeof(disk_key)), file: fs/btrfs/ctree.c, line: 2544
        ------------[ cut here ]------------
        kernel BUG at fs/btrfs/ctree.h:3500!
        invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
        RIP: 0010:btrfs_search_slot.cold.24+0x61/0x63 [btrfs]
        Call Trace:
         btrfs_lookup_csum+0x52/0x150 [btrfs]
         __btrfs_lookup_bio_sums+0x209/0x640 [btrfs]
         btrfs_submit_bio_hook+0x103/0x170 [btrfs]
         submit_one_bio+0x59/0x80 [btrfs]
         extent_read_full_page+0x58/0x80 [btrfs]
         generic_file_read_iter+0x2f6/0x9d0
         __vfs_read+0x14d/0x1a0
         vfs_read+0x8d/0x140
         ksys_read+0x52/0xc0
         do_syscall_64+0x60/0x210
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      [CAUSE]
      The fuzzed image has a corrupted leaf whose first key doesn't match its
      parent:
      
        checksum tree key (CSUM_TREE ROOT_ITEM 0)
        node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
        fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6
        chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c
        	...
                key (EXTENT_CSUM EXTENT_CSUM 79691776) block 29761536 gen 19
      
        leaf 29761536 items 1 free space 1726 generation 19 owner CSUM_TREE
        leaf 29761536 flags 0x1(WRITTEN) backref revision 1
        fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6
        chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c
                item 0 key (EXTENT_CSUM EXTENT_CSUM 8798638964736) itemoff 1751 itemsize 2244
                        range start 8798638964736 end 8798641262592 length 2297856
      
      When reading the above tree block, we have extent_buffer->refs = 2 in
      the context:
      
      - initial one from __alloc_extent_buffer()
        alloc_extent_buffer()
        |- __alloc_extent_buffer()
           |- atomic_set(&eb->refs, 1)
      
      - one being added to fs_info->buffer_radix
        alloc_extent_buffer()
        |- check_buffer_tree_ref()
           |- atomic_inc(&eb->refs)
      
      So if even we call free_extent_buffer() in read_tree_block or other
      similar situation, we only decrease the refs by 1, it doesn't reach 0
      and won't be freed right now.
      
      The staled eb and its corrupted content will still be kept cached.
      
      Furthermore, we have several extra cases where we either don't do first
      key check or the check is not proper for all callers:
      
      - scrub
        We just don't have first key in this context.
      
      - shared tree block
        One tree block can be shared by several snapshot/subvolume trees.
        In that case, the first key check for one subvolume doesn't apply to
        another.
      
      So for the above reasons, a corrupted extent buffer can sneak into the
      buffer cache.
      
      [FIX]
      Call verify_level_key in read_block_for_search to do another
      verification. For that purpose the function is exported.
      
      Due to above reasons, although we can free corrupted extent buffer from
      cache, we still need the check in read_block_for_search(), for scrub and
      shared tree blocks.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=202755
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=202757
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=202759
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=202761
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=202767
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=202769Reported-by: NYoon Jungyeon <jungyeon@gatech.edu>
      CC: stable@vger.kernel.org # 4.19+
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d8925a1f
  3. 07 2月, 2019 1 次提交
    • F
      Btrfs: fix deadlock when allocating tree block during leaf/node split · 5bce1436
      Filipe Manana 提交于
      commit a6279470762c19ba97e454f90798373dccdf6148 upstream.
      
      When splitting a leaf or node from one of the trees that are modified when
      flushing pending block groups (extent, chunk, device and free space trees),
      we need to allocate a new tree block, which in turn can result in the need
      to allocate a new block group. After allocating the new block group we may
      need to flush new block groups that were previously allocated during the
      course of the current transaction, which is what may cause a deadlock due
      to attempts to write lock twice the same leaf or node, as when splitting
      a leaf or node we are holding a write lock on it and its parent node.
      
      The same type of deadlock can also happen when increasing the tree's
      height, since we are holding a lock on the existing root while allocating
      the tree block to use as the new root node.
      
      An example trace when the deadlock happens during the leaf split path is:
      
        [27175.293054] CPU: 0 PID: 3005 Comm: kworker/u17:6 Tainted: G        W         4.19.16 #1
        [27175.293942] Hardware name: Penguin Computing Relion 1900/MD90-FS0-ZB-XX, BIOS R15 06/25/2018
        [27175.294846] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
        (...)
        [27175.298384] RSP: 0018:ffffab2087107758 EFLAGS: 00010246
        [27175.299269] RAX: 0000000000000bbd RBX: ffff9fadc7141c48 RCX: 0000000000000001
        [27175.300155] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff9fadc7141c48
        [27175.301023] RBP: 0000000000000001 R08: ffff9faeb6ac1040 R09: ffff9fa9c0000000
        [27175.301887] R10: 0000000000000000 R11: 0000000000000040 R12: ffff9fb21aac8000
        [27175.302743] R13: ffff9fb1a64d6a20 R14: 0000000000000001 R15: ffff9fb1a64d6a18
        [27175.303601] FS:  0000000000000000(0000) GS:ffff9fb21fa00000(0000) knlGS:0000000000000000
        [27175.304468] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [27175.305339] CR2: 00007fdc8743ead8 CR3: 0000000763e0a006 CR4: 00000000003606f0
        [27175.306220] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [27175.307087] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [27175.307940] Call Trace:
        [27175.308802]  btrfs_search_slot+0x779/0x9a0 [btrfs]
        [27175.309669]  ? update_space_info+0xba/0xe0 [btrfs]
        [27175.310534]  btrfs_insert_empty_items+0x67/0xc0 [btrfs]
        [27175.311397]  btrfs_insert_item+0x60/0xd0 [btrfs]
        [27175.312253]  btrfs_create_pending_block_groups+0xee/0x210 [btrfs]
        [27175.313116]  do_chunk_alloc+0x25f/0x300 [btrfs]
        [27175.313984]  find_free_extent+0x706/0x10d0 [btrfs]
        [27175.314855]  btrfs_reserve_extent+0x9b/0x1d0 [btrfs]
        [27175.315707]  btrfs_alloc_tree_block+0x100/0x5b0 [btrfs]
        [27175.316548]  split_leaf+0x130/0x610 [btrfs]
        [27175.317390]  btrfs_search_slot+0x94d/0x9a0 [btrfs]
        [27175.318235]  btrfs_insert_empty_items+0x67/0xc0 [btrfs]
        [27175.319087]  alloc_reserved_file_extent+0x84/0x2c0 [btrfs]
        [27175.319938]  __btrfs_run_delayed_refs+0x596/0x1150 [btrfs]
        [27175.320792]  btrfs_run_delayed_refs+0xed/0x1b0 [btrfs]
        [27175.321643]  delayed_ref_async_start+0x81/0x90 [btrfs]
        [27175.322491]  normal_work_helper+0xd0/0x320 [btrfs]
        [27175.323328]  ? move_linked_works+0x6e/0xa0
        [27175.324160]  process_one_work+0x191/0x370
        [27175.324976]  worker_thread+0x4f/0x3b0
        [27175.325763]  kthread+0xf8/0x130
        [27175.326531]  ? rescuer_thread+0x320/0x320
        [27175.327284]  ? kthread_create_worker_on_cpu+0x50/0x50
        [27175.328027]  ret_from_fork+0x35/0x40
        [27175.328741] ---[ end trace 300a1b9f0ac30e26 ]---
      
      Fix this by preventing the flushing of new blocks groups when splitting a
      leaf/node and when inserting a new root node for one of the trees modified
      by the flushing operation, similar to what is done when COWing a node/leaf
      from on of these trees.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202383Reported-by: NEli V <eliventer@gmail.com>
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5bce1436
  4. 17 1月, 2019 1 次提交
  5. 10 1月, 2019 1 次提交
    • F
      Btrfs: send, fix race with transaction commits that create snapshots · 9eec74b4
      Filipe Manana 提交于
      commit be6821f82c3cc36e026f5afd10249988852b35ea upstream.
      
      If we create a snapshot of a snapshot currently being used by a send
      operation, we can end up with send failing unexpectedly (returning
      -ENOENT error to user space for example). The following diagram shows
      how this happens.
      
                  CPU 1                                   CPU2                                CPU3
      
       btrfs_ioctl_send()
        (...)
                                           create_snapshot()
                                            -> creates snapshot of a
                                               root used by the send
                                               task
                                            btrfs_commit_transaction()
                                             create_pending_snapshot()
        __get_inode_info()
         btrfs_search_slot()
          btrfs_search_slot_get_root()
           down_read commit_root_sem
      
           get reference on eb of the
           commit root
            -> eb with bytenr == X
      
           up_read commit_root_sem
      
                                              btrfs_cow_block(root node)
                                               btrfs_free_tree_block()
                                                -> creates delayed ref to
                                                   free the extent
      
                                             btrfs_run_delayed_refs()
                                              -> runs the delayed ref,
                                                 adds extent to
                                                 fs_info->pinned_extents
      
                                             btrfs_finish_extent_commit()
                                              unpin_extent_range()
                                               -> marks extent as free
                                                  in the free space cache
      
                                            transaction commit finishes
      
                                                                             btrfs_start_transaction()
                                                                              (...)
                                                                              btrfs_cow_block()
                                                                               btrfs_alloc_tree_block()
                                                                                btrfs_reserve_extent()
                                                                                 -> allocates extent at
                                                                                    bytenr == X
                                                                                btrfs_init_new_buffer(bytenr X)
                                                                                 btrfs_find_create_tree_block()
                                                                                  alloc_extent_buffer(bytenr X)
                                                                                   find_extent_buffer(bytenr X)
                                                                                    -> returns existing eb,
                                                                                       which the send task got
      
                                                                              (...)
                                                                               -> modifies content of the
                                                                                  eb with bytenr == X
      
          -> uses an eb that now
             belongs to some other
             tree and no more matches
             the commit root of the
             snapshot, resuts will be
             unpredictable
      
      The consequences of this race can be various, and can lead to searches in
      the commit root performed by the send task failing unexpectedly (unable to
      find inode items, returning -ENOENT to user space, for example) or not
      failing because an inode item with the same number was added to the tree
      that reused the metadata extent, in which case send can behave incorrectly
      in the worst case or just fail later for some reason.
      
      Fix this by performing a copy of the commit root's extent buffer when doing
      a search in the context of a send operation.
      
      CC: stable@vger.kernel.org # 4.4.x: 1fc28d8e: Btrfs: move get root out of btrfs_search_slot to a helper
      CC: stable@vger.kernel.org # 4.4.x: f9ddfd05: Btrfs: remove unused check of skip_locking
      CC: stable@vger.kernel.org # 4.4.x
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9eec74b4
  6. 14 11月, 2018 1 次提交
    • F
      Btrfs: fix deadlock when writing out free space caches · ea9c846f
      Filipe Manana 提交于
      commit 5ce55557 upstream.
      
      When writing out a block group free space cache we can end deadlocking
      with ourselves on an extent buffer lock resulting in a warning like the
      following:
      
        [245043.379979] WARNING: CPU: 4 PID: 2608 at fs/btrfs/locking.c:251 btrfs_tree_lock+0x1be/0x1d0 [btrfs]
        [245043.392792] CPU: 4 PID: 2608 Comm: btrfs-transacti Tainted: G
          W I      4.16.8 #1
        [245043.395489] RIP: 0010:btrfs_tree_lock+0x1be/0x1d0 [btrfs]
        [245043.396791] RSP: 0018:ffffc9000424b840 EFLAGS: 00010246
        [245043.398093] RAX: 0000000000000a30 RBX: ffff8807e20a3d20 RCX: 0000000000000001
        [245043.399414] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff8807e20a3d20
        [245043.400732] RBP: 0000000000000001 R08: ffff88041f39a700 R09: ffff880000000000
        [245043.402021] R10: 0000000000000040 R11: ffff8807e20a3d20 R12: ffff8807cb220630
        [245043.403296] R13: 0000000000000001 R14: ffff8807cb220628 R15: ffff88041fbdf000
        [245043.404780] FS:  0000000000000000(0000) GS:ffff88082fc80000(0000) knlGS:0000000000000000
        [245043.406050] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [245043.407321] CR2: 00007fffdbdb9f10 CR3: 0000000001c09005 CR4: 00000000000206e0
        [245043.408670] Call Trace:
        [245043.409977]  btrfs_search_slot+0x761/0xa60 [btrfs]
        [245043.411278]  btrfs_insert_empty_items+0x62/0xb0 [btrfs]
        [245043.412572]  btrfs_insert_item+0x5b/0xc0 [btrfs]
        [245043.413922]  btrfs_create_pending_block_groups+0xfb/0x1e0 [btrfs]
        [245043.415216]  do_chunk_alloc+0x1e5/0x2a0 [btrfs]
        [245043.416487]  find_free_extent+0xcd0/0xf60 [btrfs]
        [245043.417813]  btrfs_reserve_extent+0x96/0x1e0 [btrfs]
        [245043.419105]  btrfs_alloc_tree_block+0xfb/0x4a0 [btrfs]
        [245043.420378]  __btrfs_cow_block+0x127/0x550 [btrfs]
        [245043.421652]  btrfs_cow_block+0xee/0x190 [btrfs]
        [245043.422979]  btrfs_search_slot+0x227/0xa60 [btrfs]
        [245043.424279]  ? btrfs_update_inode_item+0x59/0x100 [btrfs]
        [245043.425538]  ? iput+0x72/0x1e0
        [245043.426798]  write_one_cache_group.isra.49+0x20/0x90 [btrfs]
        [245043.428131]  btrfs_start_dirty_block_groups+0x102/0x420 [btrfs]
        [245043.429419]  btrfs_commit_transaction+0x11b/0x880 [btrfs]
        [245043.430712]  ? start_transaction+0x8e/0x410 [btrfs]
        [245043.432006]  transaction_kthread+0x184/0x1a0 [btrfs]
        [245043.433341]  kthread+0xf0/0x130
        [245043.434628]  ? btrfs_cleanup_transaction+0x4e0/0x4e0 [btrfs]
        [245043.435928]  ? kthread_create_worker_on_cpu+0x40/0x40
        [245043.437236]  ret_from_fork+0x1f/0x30
        [245043.441054] ---[ end trace 15abaa2aaf36827f ]---
      
      This is because at write_one_cache_group() when we are COWing a leaf from
      the extent tree we end up allocating a new block group (chunk) and,
      because we have hit a threshold on the number of bytes reserved for system
      chunks, we attempt to finalize the creation of new block groups from the
      current transaction, by calling btrfs_create_pending_block_groups().
      However here we also need to modify the extent tree in order to insert
      a block group item, and if the location for this new block group item
      happens to be in the same leaf that we were COWing earlier, we deadlock
      since btrfs_search_slot() tries to write lock the extent buffer that we
      locked before at write_one_cache_group().
      
      We have already hit similar cases in the past and commit d9a0540a
      ("Btrfs: fix deadlock when finalizing block group creation") fixed some
      of those cases by delaying the creation of pending block groups at the
      known specific spots that could lead to a deadlock. This change reworks
      that commit to be more generic so that we don't have to add similar logic
      to every possible path that can lead to a deadlock. This is done by
      making __btrfs_cow_block() disallowing the creation of new block groups
      (setting the transaction's can_flush_pending_bgs to false) before it
      attempts to allocate a new extent buffer for either the extent, chunk or
      device trees, since those are the trees that pending block creation
      modifies. Once the new extent buffer is allocated, it allows creation of
      pending block groups to happen again.
      
      This change depends on a recent patch from Josef which is not yet in
      Linus' tree, named "btrfs: make sure we create all new block groups" in
      order to avoid occasional warnings at btrfs_trans_release_chunk_metadata().
      
      Fixes: d9a0540a ("Btrfs: fix deadlock when finalizing block group creation")
      CC: stable@vger.kernel.org # 4.4+
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199753
      Link: https://lore.kernel.org/linux-btrfs/CAJtFHUTHna09ST-_EEiyWmDH6gAqS6wa=zMNMBsifj8ABu99cw@mail.gmail.com/Reported-by: NE V <eliventer@gmail.com>
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ea9c846f
  7. 06 8月, 2018 3 次提交
  8. 30 5月, 2018 6 次提交
  9. 17 5月, 2018 1 次提交
    • L
      btrfs: fix reading stale metadata blocks after degraded raid1 mounts · 02a3307a
      Liu Bo 提交于
      If a btree block, aka. extent buffer, is not available in the extent
      buffer cache, it'll be read out from the disk instead, i.e.
      
      btrfs_search_slot()
        read_block_for_search()  # hold parent and its lock, go to read child
          btrfs_release_path()
          read_tree_block()  # read child
      
      Unfortunately, the parent lock got released before reading child, so
      commit 5bdd3536 ("Btrfs: Fix block generation verification race") had
      used 0 as parent transid to read the child block.  It forces
      read_tree_block() not to check if parent transid is different with the
      generation id of the child that it reads out from disk.
      
      A simple PoC is included in btrfs/124,
      
      0. A two-disk raid1 btrfs,
      
      1. Right after mkfs.btrfs, block A is allocated to be device tree's root.
      
      2. Mount this filesystem and put it in use, after a while, device tree's
         root got COW but block A hasn't been allocated/overwritten yet.
      
      3. Umount it and reload the btrfs module to remove both disks from the
         global @fs_devices list.
      
      4. mount -odegraded dev1 and write some data, so now block A is allocated
         to be a leaf in checksum tree.  Note that only dev1 has the latest
         metadata of this filesystem.
      
      5. Umount it and mount it again normally (with both disks), since raid1
         can pick up one disk by the writer task's pid, if btrfs_search_slot()
         needs to read block A, dev2 which does NOT have the latest metadata
         might be read for block A, then we got a stale block A.
      
      6. As parent transid is not checked, block A is marked as uptodate and
         put into the extent buffer cache, so the future search won't bother
         to read disk again, which means it'll make changes on this stale
         one and make it dirty and flush it onto disk.
      
      To avoid the problem, parent transid needs to be passed to
      read_tree_block().
      
      In order to get a valid parent transid, we need to hold the parent's
      lock until finishing reading child.
      
      This patch needs to be slightly adapted for stable kernels, the
      &first_key parameter added to read_tree_block() is from 4.16+
      (581c1760). The fix is to replace 0 by 'gen'.
      
      Fixes: 5bdd3536 ("Btrfs: Fix block generation verification race")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: NLiu Bo <bo.liu@linux.alibaba.com>
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NQu Wenruo <wqu@suse.com>
      [ update changelog ]
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      02a3307a
  10. 14 5月, 2018 1 次提交
    • R
      Btrfs: send, fix invalid access to commit roots due to concurrent snapshotting · 6f2f0b39
      Robbie Ko 提交于
      [BUG]
      btrfs incremental send BUG happens when creating a snapshot of snapshot
      that is being used by send.
      
      [REASON]
      The problem can happen if while we are doing a send one of the snapshots
      used (parent or send) is snapshotted, because snapshoting implies COWing
      the root of the source subvolume/snapshot.
      
      1. When doing an incremental send, the send process will get the commit
         roots from the parent and send snapshots, and add references to them
         through extent_buffer_get().
      
      2. When a snapshot/subvolume is snapshotted, its root node is COWed
         (transaction.c:create_pending_snapshot()).
      
      3. COWing releases the space used by the node immediately, through:
      
         __btrfs_cow_block()
         --btrfs_free_tree_block()
         ----btrfs_add_free_space(bytenr of node)
      
      4. Because send doesn't hold a transaction open, it's possible that
         the transaction used to create the snapshot commits, switches the
         commit root and the old space used by the previous root node gets
         assigned to some other node allocation. Allocation of a new node will
         use the existing extent buffer found in memory, which we previously
         got a reference through extent_buffer_get(), and allow the extent
         buffer's content (pages) to be modified:
      
         btrfs_alloc_tree_block
         --btrfs_reserve_extent
         ----find_free_extent (get bytenr of old node)
         --btrfs_init_new_buffer (use bytenr of old node)
         ----btrfs_find_create_tree_block
         ------alloc_extent_buffer
         --------find_extent_buffer (get old node)
      
      5. So send can access invalid memory content and have unpredictable
         behaviour.
      
      [FIX]
      So we fix the problem by copying the commit roots of the send and
      parent snapshots and use those copies.
      
      CallTrace looks like this:
       ------------[ cut here ]------------
       kernel BUG at fs/btrfs/ctree.c:1861!
       invalid opcode: 0000 [#1] SMP
       CPU: 6 PID: 24235 Comm: btrfs Tainted: P           O 3.10.105 #23721
       ffff88046652d680 ti: ffff88041b720000 task.ti: ffff88041b720000
       RIP: 0010:[<ffffffffa08dd0e8>] read_node_slot+0x108/0x110 [btrfs]
       RSP: 0018:ffff88041b723b68  EFLAGS: 00010246
       RAX: ffff88043ca6b000 RBX: ffff88041b723c50 RCX: ffff880000000000
       RDX: 000000000000004c RSI: ffff880314b133f8 RDI: ffff880458b24000
       RBP: 0000000000000000 R08: 0000000000000001 R09: ffff88041b723c66
       R10: 0000000000000001 R11: 0000000000001000 R12: ffff8803f3e48890
       R13: ffff8803f3e48880 R14: ffff880466351800 R15: 0000000000000001
       FS:  00007f8c321dc8c0(0000) GS:ffff88047fcc0000(0000)
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       R2: 00007efd1006d000 CR3: 0000000213a24000 CR4: 00000000003407e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Stack:
       ffff88041b723c50 ffff8803f3e48880 ffff8803f3e48890 ffff8803f3e48880
       ffff880466351800 0000000000000001 ffffffffa08dd9d7 ffff88041b723c50
       ffff8803f3e48880 ffff88041b723c66 ffffffffa08dde85 a9ff88042d2c4400
       Call Trace:
       [<ffffffffa08dd9d7>] ? tree_move_down.isra.33+0x27/0x50 [btrfs]
       [<ffffffffa08dde85>] ? tree_advance+0xb5/0xc0 [btrfs]
       [<ffffffffa08e83d4>] ? btrfs_compare_trees+0x2d4/0x760 [btrfs]
       [<ffffffffa0982050>] ? finish_inode_if_needed+0x870/0x870 [btrfs]
       [<ffffffffa09841ea>] ? btrfs_ioctl_send+0xeda/0x1050 [btrfs]
       [<ffffffffa094bd3d>] ? btrfs_ioctl+0x1e3d/0x33f0 [btrfs]
       [<ffffffff81111133>] ? handle_pte_fault+0x373/0x990
       [<ffffffff8153a096>] ? atomic_notifier_call_chain+0x16/0x20
       [<ffffffff81063256>] ? set_task_cpu+0xb6/0x1d0
       [<ffffffff811122c3>] ? handle_mm_fault+0x143/0x2a0
       [<ffffffff81539cc0>] ? __do_page_fault+0x1d0/0x500
       [<ffffffff81062f07>] ? check_preempt_curr+0x57/0x90
       [<ffffffff8115075a>] ? do_vfs_ioctl+0x4aa/0x990
       [<ffffffff81034f83>] ? do_fork+0x113/0x3b0
       [<ffffffff812dd7d7>] ? trace_hardirqs_off_thunk+0x3a/0x6c
       [<ffffffff81150cc8>] ? SyS_ioctl+0x88/0xa0
       [<ffffffff8153e422>] ? system_call_fastpath+0x16/0x1b
       ---[ end trace 29576629ee80b2e1 ]---
      
      Fixes: 7069830a ("Btrfs: add btrfs_compare_trees function")
      CC: stable@vger.kernel.org # 3.6+
      Signed-off-by: NRobbie Ko <robbieko@synology.com>
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      6f2f0b39
  11. 12 4月, 2018 1 次提交
  12. 31 3月, 2018 14 次提交
  13. 26 3月, 2018 1 次提交
  14. 22 1月, 2018 3 次提交
  15. 07 12月, 2017 1 次提交
  16. 30 10月, 2017 2 次提交
  17. 16 8月, 2017 1 次提交