1. 16 8月, 2017 1 次提交
    • N
      btrfs: remove unused sectorsize member · 23d1f737
      Nikolay Borisov 提交于
      The sectorsize member of btrfs_block_group_cache is unused. So remove it, this
      reduces the number of holes in the struct.
      
      With patch:
      /* size: 856, cachelines: 14, members: 40 */
      /* sum members: 837, holes: 4, sum holes: 19 */
      /* bit holes: 1, sum bit holes: 29 bits */
      /* last cacheline: 24 bytes */
      
      Without patch:
      /* size: 864, cachelines: 14, members: 41 */
      /* sum members: 841, holes: 5, sum holes: 23 */
      /* bit holes: 1, sum bit holes: 29 bits */
      /* last cacheline: 32 bytes */
      Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      23d1f737
  2. 24 7月, 2017 2 次提交
    • O
      Btrfs: fix early ENOSPC due to delalloc · 17024ad0
      Omar Sandoval 提交于
      If a lot of metadata is reserved for outstanding delayed allocations, we
      rely on shrink_delalloc() to reclaim metadata space in order to fulfill
      reservation tickets. However, shrink_delalloc() has a shortcut where if
      it determines that space can be overcommitted, it will stop early. This
      made sense before the ticketed enospc system, but now it means that
      shrink_delalloc() will often not reclaim enough space to fulfill any
      tickets, leading to an early ENOSPC. (Reservation tickets don't care
      about being able to overcommit, they need every byte accounted for.)
      
      Fix it by getting rid of the shortcut so that shrink_delalloc() reclaims
      all of the metadata it is supposed to. This fixes early ENOSPCs we were
      seeing when doing a btrfs receive to populate a new filesystem, as well
      as early ENOSPCs Christoph saw when doing a big cp -r onto Btrfs.
      
      Fixes: 957780eb ("Btrfs: introduce ticketed enospc infrastructure")
      Tested-by: NChristoph Anton Mitterer <mail@christoph.anton.mitterer.name>
      Cc: stable@vger.kernel.org
      Reviewed-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      17024ad0
    • J
      btrfs: fix lockup in find_free_extent with read-only block groups · 14443937
      Jeff Mahoney 提交于
      If we have a block group that is all of the following:
      1) uncached in memory
      2) is read-only
      3) has a disk cache state that indicates we need to recreate the cache
      
      AND the file system has enough free space fragmentation such that the
      request for an extent of a given size can't be honored;
      
      AND have a single CPU core;
      
      AND it's the block group with the highest starting offset such that
      there are no opportunities (like reading from disk) for the loop to
      yield the CPU;
      
      We can end up with a lockup.
      
      The root cause is simple.  Once we're in the position that we've read in
      all of the other block groups directly and none of those block groups
      can honor the request, there are no more opportunities to sleep.  We end
      up trying to start a caching thread which never gets run if we only have
      one core.  This *should* present as a hung task waiting on the caching
      thread to make some progress, but it doesn't.  Instead, it degrades into
      a busy loop because of the placement of the read-only check.
      
      During the first pass through the loop, block_group->cached will be set
      to BTRFS_CACHE_STARTED and have_caching_bg will be set.  Then we hit the
      read-only check and short circuit the loop.  We're not yet in
      LOOP_CACHING_WAIT, so we skip that loop back before going through the
      loop again for other raid groups.
      
      Then we move to LOOP_CACHING_WAIT state.
      
      During the this pass through the loop, ->cached will still be
      BTRFS_CACHE_STARTED, which means it's not cached, so we'll enter
      cache_block_group, do a lot of nothing, and return, and also set
      have_caching_bg again.  Then we hit the read-only check and short circuit
      the loop.  The same thing happens as before except now we DO trigger
      the LOOP_CACHING_WAIT && have_caching_bg check and loop back up to the
      top.  We do this forever.
      
      There are two fixes in this patch since they address the same underlying
      bug.
      
      The first is to add a cond_resched to the end of the loop to ensure
      that the caching thread always has an opportunity to run.  This will
      fix the soft lockup issue, but find_free_extent will still loop doing
      nothing until the thread has completed.
      
      The second is to move the read-only check to the top of the loop.  We're
      never going to return an allocation within a read-only block group so
      we may as well skip it early.  The check for ->cached == BTRFS_CACHE_ERROR
      would cause the same problem except that BTRFS_CACHE_ERROR is considered
      a "done" state and we won't re-set have_caching_bg again.
      
      Many thanks to Stephan Kulow <coolo@suse.de> for his excellent help in
      the testing process.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      14443937
  3. 30 6月, 2017 10 次提交
  4. 20 6月, 2017 7 次提交
  5. 01 6月, 2017 2 次提交
    • J
      btrfs: fix race with relocation recovery and fs_root setup · a9b3311e
      Jeff Mahoney 提交于
      If we have to recover relocation during mount, we'll ultimately have to
      evict the orphan inode.  That goes through the reservation dance, where
      priority_reclaim_metadata_space and flush_space expect fs_info->fs_root
      to be valid.  That's the next thing to be set up during mount, so we
      crash, almost always in flush_space trying to join the transaction
      but priority_reclaim_metadata_space is possible as well.  This call
      path has been problematic in the past WRT whether ->fs_root is valid
      yet.  Commit 957780eb (Btrfs: introduce ticketed enospc
      infrastructure) added new users that are called in the direct path
      instead of the async path that had already been worked around.
      
      The thing is that we don't actually need the fs_root, specifically, for
      anything.  We either use it to determine whether the root is the
      chunk_root for use in choosing an allocation profile or as a root to pass
      btrfs_join_transaction before immediately committing it.  Anything that
      isn't the chunk root works in the former case and any root works in
      the latter.
      
      A simple fix is to use a root we know will always be there: the
      extent_root.
      
      Cc: <stable@vger.kernel.org> # v4.8+
      Fixes: 957780eb (Btrfs: introduce ticketed enospc infrastructure)
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      a9b3311e
    • J
      btrfs: fix memory leak in update_space_info failure path · 896533a7
      Jeff Mahoney 提交于
      If we fail to add the space_info kobject, we'll leak the memory
      for the percpu counter.
      
      Fixes: 6ab0a202 (btrfs: publish allocation data in sysfs)
      Cc: <stable@vger.kernel.org> # v3.14+
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      896533a7
  6. 18 4月, 2017 6 次提交
  7. 02 3月, 2017 1 次提交
  8. 28 2月, 2017 9 次提交
  9. 24 2月, 2017 1 次提交
    • F
      Btrfs: fix assertion failure when freeing block groups at close_ctree() · 5cdd7db6
      Filipe Manana 提交于
      At close_ctree() we free the block groups and then only after we wait for
      any running worker kthreads to finish and shutdown the workqueues. This
      behaviour is racy and it triggers an assertion failure when freeing block
      groups because while we are doing it we can have for example a block group
      caching kthread running, and in that case the block group's reference
      count can still be greater than 1 by the time we assert its reference count
      is 1, leading to an assertion failure:
      
      [19041.198004] assertion failed: atomic_read(&block_group->count) == 1, file: fs/btrfs/extent-tree.c, line: 9799
      [19041.200584] ------------[ cut here ]------------
      [19041.201692] kernel BUG at fs/btrfs/ctree.h:3418!
      [19041.202830] invalid opcode: 0000 [#1] PREEMPT SMP
      [19041.203929] Modules linked in: btrfs xor raid6_pq dm_flakey dm_mod crc32c_generic ppdev sg psmouse acpi_cpufreq pcspkr parport_pc evdev tpm_tis parport tpm_tis_core i2c_piix4 i2c_core tpm serio_raw processor button loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix virtio_pci libata virtio_ring virtio e1000 scsi_mod floppy [last unloaded: btrfs]
      [19041.208082] CPU: 6 PID: 29051 Comm: umount Not tainted 4.9.0-rc7-btrfs-next-36+ #1
      [19041.208082] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
      [19041.208082] task: ffff88015f028980 task.stack: ffffc9000ad34000
      [19041.208082] RIP: 0010:[<ffffffffa03e319e>]  [<ffffffffa03e319e>] assfail.constprop.41+0x1c/0x1e [btrfs]
      [19041.208082] RSP: 0018:ffffc9000ad37d60  EFLAGS: 00010286
      [19041.208082] RAX: 0000000000000061 RBX: ffff88015ecb4000 RCX: 0000000000000001
      [19041.208082] RDX: ffff88023f392fb8 RSI: ffffffff817ef7ba RDI: 00000000ffffffff
      [19041.208082] RBP: ffffc9000ad37d60 R08: 0000000000000001 R09: 0000000000000000
      [19041.208082] R10: ffffc9000ad37cb0 R11: ffffffff82f2b66d R12: ffff88023431d170
      [19041.208082] R13: ffff88015ecb40c0 R14: ffff88023431d000 R15: ffff88015ecb4100
      [19041.208082] FS:  00007f44f3d42840(0000) GS:ffff88023f380000(0000) knlGS:0000000000000000
      [19041.208082] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [19041.208082] CR2: 00007f65d623b000 CR3: 00000002166f2000 CR4: 00000000000006e0
      [19041.208082] Stack:
      [19041.208082]  ffffc9000ad37d98 ffffffffa035989f ffff88015ecb4000 ffff88015ecb5630
      [19041.208082]  ffff88014f6be000 0000000000000000 00007ffcf0ba6a10 ffffc9000ad37df8
      [19041.208082]  ffffffffa0368cd4 ffff88014e9658e0 ffffc9000ad37e08 ffffffff811a634d
      [19041.208082] Call Trace:
      [19041.208082]  [<ffffffffa035989f>] btrfs_free_block_groups+0x17f/0x392 [btrfs]
      [19041.208082]  [<ffffffffa0368cd4>] close_ctree+0x1c5/0x2e1 [btrfs]
      [19041.208082]  [<ffffffff811a634d>] ? evict_inodes+0x132/0x141
      [19041.208082]  [<ffffffffa034356d>] btrfs_put_super+0x15/0x17 [btrfs]
      [19041.208082]  [<ffffffff8118fc32>] generic_shutdown_super+0x6a/0xeb
      [19041.208082]  [<ffffffff8119004f>] kill_anon_super+0x12/0x1c
      [19041.208082]  [<ffffffffa0343370>] btrfs_kill_super+0x16/0x21 [btrfs]
      [19041.208082]  [<ffffffff8118fad1>] deactivate_locked_super+0x3b/0x68
      [19041.208082]  [<ffffffff8118fb34>] deactivate_super+0x36/0x39
      [19041.208082]  [<ffffffff811a9946>] cleanup_mnt+0x58/0x76
      [19041.208082]  [<ffffffff811a99a2>] __cleanup_mnt+0x12/0x14
      [19041.208082]  [<ffffffff81071573>] task_work_run+0x6f/0x95
      [19041.208082]  [<ffffffff81001897>] prepare_exit_to_usermode+0xa3/0xc1
      [19041.208082]  [<ffffffff81001a23>] syscall_return_slowpath+0x16e/0x1d2
      [19041.208082]  [<ffffffff814c607d>] entry_SYSCALL_64_fastpath+0xab/0xad
      [19041.208082] Code: c7 ae a0 3e a0 48 89 e5 e8 4e 74 d4 e0 0f 0b 55 89 f1 48 c7 c2 0b a4 3e a0 48 89 fe 48 c7 c7 a4 a6 3e a0 48 89 e5 e8 30 74 d4 e0 <0f> 0b 55 31 d2 48 89 e5 e8 d5 b9 f7 ff 5d c3 48 63 f6 55 31 c9
      [19041.208082] RIP  [<ffffffffa03e319e>] assfail.constprop.41+0x1c/0x1e [btrfs]
      [19041.208082]  RSP <ffffc9000ad37d60>
      [19041.279264] ---[ end trace 23330586f16f064d ]---
      
      This started happening as of kernel 4.8, since commit f3bca802
      ("Btrfs: add ASSERT for block group's memory leak") introduced these
      assertions.
      
      So fix this by freeing the block groups only after waiting for all
      worker kthreads to complete and shutdown the workqueues.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
      5cdd7db6
  10. 17 2月, 2017 1 次提交