1. 31 Mar, 2018 1 commit
  2. 26 Mar, 2018 9 commits
  3. 02 Feb, 2018 1 commit
    • F
      Btrfs: fix null pointer dereference when replacing missing device · 627e0873
      Filipe Manana committed
      When we are replacing a missing device we mount the filesystem with the
      degraded mode option in which case we are allowed to have a btrfs device
      structure without a backing device member (its bdev member is NULL) and
      therefore we can't dereference that member. Commit 38b5f68e
      ("btrfs: drop btrfs_device::can_discard to query directly") started to
      dereference that member when discarding extents, resulting in a null
      pointer dereference:
      
       [ 3145.322257] BTRFS warning (device sdf): devid 2 uuid 4d922414-58eb-4880-8fed-9c3840f6c5d5 is missing
       [ 3145.364116] BTRFS info (device sdf): dev_replace from <missing disk> (devid 2) to /dev/sdg started
       [ 3145.413489] BUG: unable to handle kernel NULL pointer dereference at 00000000000000e0
       [ 3145.415085] IP: btrfs_discard_extent+0x6a/0xf8 [btrfs]
       [ 3145.415085] PGD 0 P4D 0
       [ 3145.415085] Oops: 0000 [#1] PREEMPT SMP PTI
       [ 3145.415085] Modules linked in: ppdev ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper evdev psmouse parport_pc serio_raw i2c_piix4 i2
       [ 3145.415085] CPU: 0 PID: 11989 Comm: btrfs Tainted: G        W        4.15.0-rc9-btrfs-next-55+ #1
       [ 3145.415085] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
       [ 3145.415085] RIP: 0010:btrfs_discard_extent+0x6a/0xf8 [btrfs]
       [ 3145.415085] RSP: 0018:ffffc90004813c60 EFLAGS: 00010293
       [ 3145.415085] RAX: ffff88020d39cc00 RBX: ffff88020c4ea2a0 RCX: 0000000000000002
       [ 3145.415085] RDX: 0000000000000000 RSI: ffff88020c4ea240 RDI: 0000000000000000
       [ 3145.415085] RBP: 0000000000000000 R08: 0000000000004000 R09: 0000000000000000
       [ 3145.415085] R10: ffffc90004813ae8 R11: 0000000000000000 R12: 0000000000000000
       [ 3145.415085] R13: ffff88020c418000 R14: 0000000000000000 R15: 0000000000000000
       [ 3145.415085] FS:  00007f565681f8c0(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
       [ 3145.415085] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [ 3145.415085] CR2: 00000000000000e0 CR3: 000000020d208006 CR4: 00000000001606f0
       [ 3145.415085] Call Trace:
       [ 3145.415085]  btrfs_finish_extent_commit+0x9a/0x1be [btrfs]
       [ 3145.415085]  btrfs_commit_transaction+0x649/0x7a0 [btrfs]
       [ 3145.415085]  ? start_transaction+0x2b0/0x3b3 [btrfs]
       [ 3145.415085]  btrfs_dev_replace_start+0x274/0x30c [btrfs]
       [ 3145.415085]  btrfs_dev_replace_by_ioctl+0x45/0x59 [btrfs]
       [ 3145.415085]  btrfs_ioctl+0x1a91/0x1d62 [btrfs]
       [ 3145.415085]  ? lock_acquire+0x16a/0x1af
       [ 3145.415085]  ? vfs_ioctl+0x1b/0x28
       [ 3145.415085]  ? trace_hardirqs_on_caller+0x14c/0x1a6
       [ 3145.415085]  vfs_ioctl+0x1b/0x28
       [ 3145.415085]  do_vfs_ioctl+0x5a9/0x5e0
       [ 3145.415085]  ? _raw_spin_unlock_irq+0x34/0x46
       [ 3145.415085]  ? entry_SYSCALL_64_fastpath+0x5/0x8b
       [ 3145.415085]  ? trace_hardirqs_on_caller+0x14c/0x1a6
       [ 3145.415085]  SyS_ioctl+0x52/0x76
       [ 3145.415085]  entry_SYSCALL_64_fastpath+0x1e/0x8b
       [ 3145.415085] RIP: 0033:0x7f56558b3c47
       [ 3145.415085] RSP: 002b:00007ffdcfac4c58 EFLAGS: 00000202
       [ 3145.415085] Code: be 02 00 00 00 4c 89 ef e8 b9 e7 03 00 85 c0 89 c5 75 75 48 8b 44 24 08 45 31 f6 48 8d 58 60 eb 52 48 8b 03 48 8b b8 a0 00 00 00 <48> 8b 87 e0 00
       [ 3145.415085] RIP: btrfs_discard_extent+0x6a/0xf8 [btrfs] RSP: ffffc90004813c60
       [ 3145.415085] CR2: 00000000000000e0
       [ 3145.458185] ---[ end trace 06302e7ac31902bf ]---
      
      This is trivially reproduced by running the test btrfs/027 from fstests
      like this:
      
        $ MOUNT_OPTIONS="-o discard" ./check btrfs/027
      
      Fix this by skipping devices without a backing device before attempting
      to discard.
      
      Fixes: 38b5f68e ("btrfs: drop btrfs_device::can_discard to query directly")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      627e0873
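The fix described above can be illustrated with a small user-space model. This is only a sketch: `model_bdev`, `model_device` and `model_discard_stripes` are hypothetical stand-ins, not the kernel's structures or the actual patch. It shows the one idea the commit message states, which is to check for a missing backing device before dereferencing it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real definitions. */
struct model_bdev {
	int supports_discard;
};

struct model_device {
	struct model_bdev *bdev; /* NULL when the device is missing (degraded mount) */
};

/*
 * Count how many devices would be sent a discard. A device without a
 * backing device is skipped before its bdev pointer is dereferenced.
 */
static int model_discard_stripes(const struct model_device *devs, size_t n)
{
	int issued = 0;

	assert(devs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!devs[i].bdev)
			continue; /* missing device: nothing to discard */
		if (!devs[i].bdev->supports_discard)
			continue; /* device present but cannot discard */
		issued++;
	}
	return issued;
}
```

Without the first check, the second `if` would dereference a NULL pointer for the missing device, which is the shape of the oops shown in the log.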
  4. 22 Jan, 2018 7 commits
  5. 07 Dec, 2017 1 commit
  6. 21 Nov, 2017 1 commit
    • J
      btrfs: clear space cache inode generation always · 8e138e0d
      Josef Bacik committed
      We discovered a box that had double allocations, and suspected the space
      cache may be to blame.  While auditing the write out path I noticed that
      if we've already set up the space cache we will just carry on.  This
      means that on any error we hit after cache_save_setup, but before we
      actually write the cache out, we won't reset the inode generation, so
      whatever was already written will be considered correct even though it
      is stale.  Fix this by _always_ resetting the generation on the block
      group inode, so that we only ever have a valid or an invalid cache.
      
      With this patch I was no longer able to reproduce cache corruption with
      dm-log-writes and my bpf error injection tool.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      8e138e0d
  7. 13 Nov, 2017 1 commit
    • D
      Pass mode to wait_on_atomic_t() action funcs and provide default actions · 5e4def20
      David Howells committed
      Make wait_on_atomic_t() pass the TASK_* mode onto its action function as an
      extra argument and make it 'unsigned int' throughout.
      
      Also, consolidate a bunch of identical action functions into a default
      function that can do the appropriate thing for the mode.
      
      Also, change the argument name in the bit_wait*() function declarations to
      reflect the fact that it's the mode and not the bit number.
      
      [Peter Z gives this a grudging ACK, but thinks that the whole atomic_t wait
      should be done differently, though he's not immediately sure as to how]
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      cc: Ingo Molnar <mingo@kernel.org>
      5e4def20
  8. 02 Nov, 2017 3 commits
    • J
      btrfs: track refs in a rb_tree instead of a list · 0e0adbcf
      Josef Bacik committed
      If we get a significant amount of delayed refs for a single block (think
      modifying multiple snapshots) we can end up spending an ungodly amount
      of time looping through all of the entries trying to see if they can be
      merged.  This is because we only add them to a list, so we have O(2n)
      for every ref head.  This doesn't make any sense as we likely have refs
      for different roots, and so they cannot be merged.  Tracking in a tree
      will allow us to break as soon as we hit an entry that doesn't match,
      making our worst case O(n).
      
      With this we can also merge entries more easily.  Before we had to hope
      that matching refs were on the ends of our list, but with the tree we
      can search down to exact matches and merge them at insert time.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      0e0adbcf
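Merging at insert time, as described above, can be sketched with an ordered tree. This is a minimal illustration under stated assumptions: `model_ref` is a hypothetical structure keyed only by root id, and the tree is a plain unbalanced binary search tree rather than the kernel's red-black tree, so it shows the search-then-merge logic but not the balancing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical delayed ref, keyed only by the root it belongs to. */
struct model_ref {
	uint64_t root; /* key */
	int ref_mod;   /* net reference count change */
	struct model_ref *left, *right;
};

/*
 * Walk down to the slot for ref->root. On an exact match, merge into the
 * existing entry and return 1; otherwise link the new node and return 0.
 */
static int model_insert_ref(struct model_ref **tree, struct model_ref *ref)
{
	assert(tree != NULL && ref != NULL);
	ref->left = ref->right = NULL;
	while (*tree) {
		if (ref->root < (*tree)->root) {
			tree = &(*tree)->left;
		} else if (ref->root > (*tree)->root) {
			tree = &(*tree)->right;
		} else {
			(*tree)->ref_mod += ref->ref_mod; /* merged at insert */
			return 1;
		}
	}
	*tree = ref;
	return 0;
}

/* Number of distinct entries left in the tree. */
static size_t model_count_refs(const struct model_ref *node)
{
	if (!node)
		return 0;
	return 1 + model_count_refs(node->left) + model_count_refs(node->right);
}
```

Refs for different roots never meet during the walk, so no time is spent comparing entries that cannot be merged.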
    • J
      btrfs: make the delalloc block rsv per inode · 69fe2d75
      Josef Bacik committed
      The way we handle delalloc metadata reservations has gotten
      progressively more complicated over the years.  There is so much cruft
      and weirdness around keeping the reserved count and outstanding counters
      consistent and handling the error cases that it's impossible to
      understand.
      
      Fix this by making the delalloc block rsv per-inode.  This way we can
      calculate the actual size of the outstanding metadata reservations every
      time we make a change, and then reserve the delta based on that amount.
      This greatly simplifies the code everywhere, and makes the error
      handling in btrfs_delalloc_reserve_metadata far less terrifying.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      69fe2d75
    • J
      Btrfs: rework outstanding_extents · 8b62f87b
      Josef Bacik committed
      Right now we jump through a lot of weird hoops around outstanding_extents
      in order to keep the extent count consistent.  This is because we
      logically transfer the outstanding_extent count from the initial
      reservation through the set_delalloc_bits.  This makes it pretty
      difficult to get a handle on how and when we need to mess with
      outstanding_extents.
      
      Fix this by revamping the rules of how we deal with outstanding_extents.
      Now instead everybody that is holding on to a delalloc extent is
      required to increase the outstanding extents count for itself.  This
      means we'll have something like this
      
      btrfs_delalloc_reserve_metadata	- outstanding_extents = 1
       btrfs_set_extent_delalloc	- outstanding_extents = 2
      btrfs_delalloc_release_extents	- outstanding_extents = 1
      
      for an initial file write.  Now take the append write where we extend an
      existing delalloc range but still under the maximum extent size
      
      btrfs_delalloc_reserve_metadata - outstanding_extents = 2
        btrfs_set_extent_delalloc
          btrfs_set_bit_hook		- outstanding_extents = 3
          btrfs_merge_extent_hook	- outstanding_extents = 2
      btrfs_delalloc_release_extents	- outstanding_extents = 1
      
      In order to make the ordered extent transition we of course must now
      make ordered extents carry their own outstanding_extent reservation, so
      for cow_file_range we end up with
      
      btrfs_add_ordered_extent	- outstanding_extents = 2
      clear_extent_bit		- outstanding_extents = 1
      btrfs_remove_ordered_extent	- outstanding_extents = 0
      
      This makes all manipulations of outstanding_extents much more explicit.
      Every successful call to btrfs_delalloc_reserve_metadata _must_ now be
      paired with btrfs_delalloc_release_extents, even in the error case, as
      that is the only function that actually modifies the
      outstanding_extents counter.
      
      The drawback to this is now we are much more likely to have transient
      cases where outstanding_extents is much larger than it actually should
      be.  This could happen before as we manipulated the delalloc bits, but
      now it happens basically at every write.  This may put more pressure on
      the ENOSPC flushing code, but I think making this code simpler is worth
      the cost.  I have another change coming to mitigate this side-effect
      somewhat.
      
      I also added trace points for the counter manipulation.  These were used
      by a bpf script I wrote to help track down leak issues.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      8b62f87b
  9. 30 Oct, 2017 14 commits
  10. 21 Aug, 2017 2 commits