1. 27 9月, 2016 2 次提交
  2. 26 9月, 2016 1 次提交
    • J
      Btrfs: add a flags field to btrfs_fs_info · afcdd129
      Josef Bacik 提交于
      We have a lot of random ints in btrfs_fs_info that can be put into flags.  This
      is mostly equivalent with the exception of how we deal with quota going on or
      off, now instead we set a flag when we are turning it on or off and deal with
      that appropriately, rather than just having a pending state that the current
      quota_enabled gets set to.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      afcdd129
  3. 25 8月, 2016 1 次提交
    • W
      btrfs: fix fsfreeze hang caused by delayed iputs deal · 9e7cc91a
      Wang Xiaoguang 提交于
      When running fstests generic/068, sometimes we got below deadlock:
        xfs_io          D ffff8800331dbb20     0  6697   6693 0x00000080
        ffff8800331dbb20 ffff88007acfc140 ffff880034d895c0 ffff8800331dc000
        ffff880032d243e8 fffffffeffffffff ffff880032d24400 0000000000000001
        ffff8800331dbb38 ffffffff816a9045 ffff880034d895c0 ffff8800331dbba8
        Call Trace:
        [<ffffffff816a9045>] schedule+0x35/0x80
        [<ffffffff816abab2>] rwsem_down_read_failed+0xf2/0x140
        [<ffffffff8118f5e1>] ? __filemap_fdatawrite_range+0xd1/0x100
        [<ffffffff8134f978>] call_rwsem_down_read_failed+0x18/0x30
        [<ffffffffa06631fc>] ? btrfs_alloc_block_rsv+0x2c/0xb0 [btrfs]
        [<ffffffff810d32b5>] percpu_down_read+0x35/0x50
        [<ffffffff81217dfc>] __sb_start_write+0x2c/0x40
        [<ffffffffa067f5d5>] start_transaction+0x2a5/0x4d0 [btrfs]
        [<ffffffffa067f857>] btrfs_join_transaction+0x17/0x20 [btrfs]
        [<ffffffffa068ba34>] btrfs_evict_inode+0x3c4/0x5d0 [btrfs]
        [<ffffffff81230a1a>] evict+0xba/0x1a0
        [<ffffffff812316b6>] iput+0x196/0x200
        [<ffffffffa06851d0>] btrfs_run_delayed_iputs+0x70/0xc0 [btrfs]
        [<ffffffffa067f1d8>] btrfs_commit_transaction+0x928/0xa80 [btrfs]
        [<ffffffffa0646df0>] btrfs_freeze+0x30/0x40 [btrfs]
        [<ffffffff81218040>] freeze_super+0xf0/0x190
        [<ffffffff81229275>] do_vfs_ioctl+0x4a5/0x5c0
        [<ffffffff81003176>] ? do_audit_syscall_entry+0x66/0x70
        [<ffffffff810038cf>] ? syscall_trace_enter_phase1+0x11f/0x140
        [<ffffffff81229409>] SyS_ioctl+0x79/0x90
        [<ffffffff81003c12>] do_syscall_64+0x62/0x110
        [<ffffffff816acbe1>] entry_SYSCALL64_slow_path+0x25/0x25
      
      >From this warning, freeze_super() already holds SB_FREEZE_FS, but
      btrfs_freeze() will call btrfs_commit_transaction() again, if
      btrfs_commit_transaction() finds that it has delayed iputs to handle,
      it'll start_transaction(), which will try to get SB_FREEZE_FS lock
      again, then deadlock occurs.
      
      The root cause is that in btrfs, sync_filesystem(sb) does not make
      sure all metadata is updated. There still maybe some codes adding
      delayed iputs, see below sample race window:
      
               CPU1                                  |         CPU2
      |-> freeze_super()                             |
          |-> sync_filesystem(sb);                   |
          |                                          |-> cleaner_kthread()
          |                                          |   |-> btrfs_delete_unused_bgs()
          |                                          |       |-> btrfs_remove_chunk()
          |                                          |           |-> btrfs_remove_block_group()
          |                                          |               |-> btrfs_add_delayed_iput()
          |                                          |
          |-> sb->s_writers.frozen = SB_FREEZE_FS;   |
          |-> sb_wait_write(sb, SB_FREEZE_FS);       |
          |   acquire SB_FREEZE_FS lock.             |
          |                                          |
          |-> btrfs_freeze()                         |
              |-> btrfs_commit_transaction()         |
                  |-> btrfs_run_delayed_iputs()      |
                  |   will handle delayed iputs,     |
                  |   that means start_transaction() |
                  |   will be called, which will try |
                  |   to get SB_FREEZE_FS lock.      |
      
      To fix this issue, introduce a "int fs_frozen" to record internally whether
      fs has been frozen. If fs has been frozen, we can not handle delayed iputs.
      Signed-off-by: NWang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      [ add comment to btrfs_freeze ]
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      9e7cc91a
  4. 26 7月, 2016 6 次提交
  5. 18 6月, 2016 2 次提交
    • Z
      btrfs: avoid blocking open_ctree from cleaner_kthread · 90c711ab
      Zygo Blaxell 提交于
      This fixes a problem introduced in commit 2f3165ec
      "btrfs: don't force mounts to wait for cleaner_kthread to delete one or more subvolumes".
      
      open_ctree eventually calls btrfs_replay_log which in turn calls
      btrfs_commit_super which tries to lock the cleaner_mutex, causing a
      recursive mutex deadlock during mount.
      
      Instead of playing whack-a-mole trying to keep up with all the
      functions that may want to lock cleaner_mutex, put all the cleaner_mutex
      lockers back where they were, and attack the problem more directly:
      keep cleaner_kthread asleep until the filesystem is mounted.
      
      When filesystems are mounted read-only and later remounted read-write,
      open_ctree did not set fs_info->open and neither does anything else.
      Set this flag in btrfs_remount so that neither btrfs_delete_unused_bgs
      nor cleaner_kthread get confused by the common case of "/" filesystem
      read-only mount followed by read-write remount.
      Signed-off-by: NZygo Blaxell <ce3g8jdj@umail.furryterror.org>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      90c711ab
    • J
      btrfs: account for non-CoW'd blocks in btrfs_abort_transaction · 64c12921
      Jeff Mahoney 提交于
      The test for !trans->blocks_used in btrfs_abort_transaction is
      insufficient to determine whether it's safe to drop the transaction
      handle on the floor.  btrfs_cow_block, informed by should_cow_block,
      can return blocks that have already been CoW'd in the current
      transaction.  trans->blocks_used is only incremented for new block
      allocations. If an operation overlaps the blocks in the current
      transaction entirely and must abort the transaction, we'll happily
      let it clean up the trans handle even though it may have modified
      the blocks and will commit an incomplete operation.
      
      In the long-term, I'd like to do closer tracking of when the fs
      is actually modified so we can still recover as gracefully as possible,
      but that approach will need some discussion.  In the short term,
      since this is the only code using trans->blocks_used, let's just
      switch it to a bool indicating whether any blocks were used and set
      it when should_cow_block returns false.
      
      Cc: stable@vger.kernel.org # 3.4+
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      64c12921
  6. 06 6月, 2016 2 次提交
  7. 03 6月, 2016 1 次提交
  8. 26 5月, 2016 1 次提交
  9. 13 5月, 2016 1 次提交
  10. 06 5月, 2016 2 次提交
  11. 28 4月, 2016 3 次提交
  12. 12 3月, 2016 1 次提交
  13. 23 2月, 2016 2 次提交
  14. 12 2月, 2016 3 次提交
  15. 30 1月, 2016 1 次提交
  16. 22 1月, 2016 1 次提交
  17. 16 1月, 2016 1 次提交
    • T
      Btrfs: fix output of compression message in btrfs_parse_options() · b7c47bbb
      Tsutomu Itoh 提交于
      The compression message might not be correctly output.
      Fix it.
      
      [[before fix]]
      
      # mount -o compress /dev/sdb3 /test3
      [  996.874264] BTRFS info (device sdb3): disk space caching is enabled
      [  996.874268] BTRFS: has skinny extents
      # mount | grep /test3
      /dev/sdb3 on /test3 type btrfs (rw,relatime,compress=zlib,space_cache,subvolid=5,subvol=/)
      
      # mount -o remount,compress-force /dev/sdb3 /test3
      [ 1035.075017] BTRFS info (device sdb3): force zlib compression
      [ 1035.075021] BTRFS info (device sdb3): disk space caching is enabled
      # mount | grep /test3
      /dev/sdb3 on /test3 type btrfs (rw,relatime,compress-force=zlib,space_cache,subvolid=5,subvol=/)
      
      # mount -o remount,compress /dev/sdb3 /test3
      [ 1053.679092] BTRFS info (device sdb3): disk space caching is enabled
      # mount | grep /test3
      /dev/sdb3 on /test3 type btrfs (rw,relatime,compress=zlib,space_cache,subvolid=5,subvol=/)
      
      [[after fix]]
      
      # mount -o compress /dev/sdb3 /test3
      [  401.021753] BTRFS info (device sdb3): use zlib compression
      [  401.021758] BTRFS info (device sdb3): disk space caching is enabled
      [  401.021760] BTRFS: has skinny extents
      # mount | grep /test3
      /dev/sdb3 on /test3 type btrfs (rw,relatime,compress=zlib,space_cache,subvolid=5,subvol=/)
      
      # mount -o remount,compress-force /dev/sdb3 /test3
      [  439.824624] BTRFS info (device sdb3): force zlib compression
      [  439.824629] BTRFS info (device sdb3): disk space caching is enabled
      # mount | grep /test3
      /dev/sdb3 on /test3 type btrfs (rw,relatime,compress-force=zlib,space_cache,subvolid=5,subvol=/)
      
      # mount -o remount,compress /dev/sdb3 /test3
      [  459.918430] BTRFS info (device sdb3): use zlib compression
      [  459.918434] BTRFS info (device sdb3): disk space caching is enabled
      # mount | grep /test3
      /dev/sdb3 on /test3 type btrfs (rw,relatime,compress=zlib,space_cache,subvolid=5,subvol=/)
      Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      b7c47bbb
  18. 07 1月, 2016 4 次提交
    • D
      btrfs: statfs: report zero available if metadata are exhausted · ca8a51b3
      David Sterba 提交于
      There is one ENOSPC case that's very confusing. There's Available
      greater than zero but no file operation succeds (besides removing
      files). This happens when the metadata are exhausted and there's no
      possibility to allocate another chunk.
      
      In this scenario it's normal that there's still some space in the data
      chunk and the calculation in df reflects that in the Avail value.
      
      To at least give some clue about the ENOSPC situation, let statfs report
      zero value in Avail, even if there's still data space available.
      
      Current:
        /dev/sdb1             4.0G  3.3G  719M  83% /mnt/test
      
      New:
        /dev/sdb1             4.0G  3.3G     0 100% /mnt/test
      
      We calculate the remaining metadata space minus global reserve. If this
      is (supposedly) smaller than zero, there's no space. But this does not
      hold in practice, the exhausted state happens where's still some
      positive delta. So we apply some guesswork and compare the delta to a 4M
      threshold. (Practically observed delta was 2M.)
      
      We probably cannot calculate the exact threshold value because this
      depends on the internal reservations requested by various operations, so
      some operations that consume a few metadata will succeed even if the
      Avail is zero. But this is better than the other way around.
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      ca8a51b3
    • D
      btrfs: constify static arrays · 4d4ab6d6
      David Sterba 提交于
      There are a few statically initialized arrays that can be made const.
      The remaining (like file_system_type, sysfs attributes or prop handlers)
      do not allow that due to type mismatch when passed to the APIs or
      because the structures are modified through other members.
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      4d4ab6d6
    • B
      Btrfs: use linux/sizes.h to represent constants · ee22184b
      Byongho Lee 提交于
      We use many constants to represent size and offset value.  And to make
      code readable we use '256 * 1024 * 1024' instead of '268435456' to
      represent '256MB'.  However we can make far more readable with 'SZ_256MB'
      which is defined in the 'linux/sizes.h'.
      
      So this patch replaces 'xxx * 1024 * 1024' kind of expression with
      single 'SZ_xxxMB' if 'xxx' is a power of 2 then 'xxx * SZ_1M' if 'xxx' is
      not a power of 2. And I haven't touched to '4096' & '8192' because it's
      more intuitive than 'SZ_4KB' & 'SZ_8KB'.
      Signed-off-by: NByongho Lee <bhlee.kernel@gmail.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      ee22184b
    • D
      a1c6f057
  19. 18 12月, 2015 2 次提交
  20. 22 10月, 2015 1 次提交
    • J
      Btrfs: add fragment=* debug mount option · d0bd4560
      Josef Bacik 提交于
      In tracking down these weird bitmap problems it was helpful to artificially
      create an extremely fragmented file system.  These mount options let us either
      fragment data or metadata or both.  With these options I could reproduce all
      sorts of weird latencies and hangs that occur under extreme fragmentation and
      get them fixed.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      d0bd4560
  21. 29 9月, 2015 1 次提交
    • A
      Btrfs: __btrfs_std_error() logic should be consistent w/out CONFIG_PRINTK defined · 57d816a1
      Anand Jain 提交于
      error handling logic behaves differently with or without
      CONFIG_PRINTK defined, since there are two copies of the same
      function which a bit of different logic
      
      One, when CONFIG_PRINTK is defined, code is
      
      __btrfs_std_error(..)
      {
      ::
             save_error_info(fs_info);
             if (sb->s_flags & MS_BORN)
                     btrfs_handle_error(fs_info);
      }
      
      and two when CONFIG_PRINTK is not defined, the code is
      
      __btrfs_std_error(..)
      {
      ::
             if (sb->s_flags & MS_BORN) {
                     save_error_info(fs_info);
                     btrfs_handle_error(fs_info);
              }
      }
      
      I doubt if this was intentional ? and appear to have caused since
      we maintain two copies of the same function and they got diverged
      with commits.
      
      Now to decide which logic is correct reviewed changes as below,
      
       533574c6
      Commit added two copies of this function
      
       cf79ffb5
      Commit made change to only one copy of the function and to the
      copy when CONFIG_PRINTK is defined.
      
      To fix this, instead of maintaining two copies of same function
      approach, maintain single function, and just put the extra
      portion of the code under CONFIG_PRINTK define.
      
      This patch just does that. And keeps code of with CONFIG_PRINTK
      defined.
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      57d816a1
  22. 10 9月, 2015 1 次提交
    • F
      Btrfs: remove unnecessary locking of cleaner_mutex to avoid deadlock · 85e0a0f2
      Filipe Manana 提交于
      After commmit e44163e1 ("btrfs: explictly delete unused block groups
      in close_ctree and ro-remount"), added in the 4.3 merge window, we have
      calls to btrfs_delete_unused_bgs() while holding the cleaner_mutex.
      This can cause a deadlock with a concurrent block group relocation (when
      a filesystem balance or shrink operation is in progress for example)
      because btrfs_delete_unused_bgs() locks delete_unused_bgs_mutex and the
      relocation path locks first delete_unused_bgs_mutex and then it locks
      cleaner_mutex, resulting in a classic ABBA deadlock:
      
               CPU 0                                        CPU 1
      
      lock fs_info->cleaner_mutex
      
                                                 __btrfs_balance() || btrfs_shrink_device()
                                                   lock fs_info->delete_unused_bgs_mutex
                                                   btrfs_relocate_chunk()
                                                     btrfs_relocate_block_group()
                                                       lock fs_info->cleaner_mutex
      btrfs_delete_unused_bgs()
        lock fs_info->delete_unused_bgs_mutex
      
      Fix this by not taking the cleaner_mutex before calling
      btrfs_delete_unused_bgs() because it's no longer needed after
      commit 67c5e7d4 ("Btrfs: fix race between balance and unused block
      group deletion"). The mutex fs_info->delete_unused_bgs_mutex, the
      spinlock fs_info->unused_bgs_lock and a block group's spinlock are
      enough to get correct serialization between tasks running relocation
      and unused block group deletion (as well as between multiple tasks
      concurrently calling btrfs_delete_unused_bgs()).
      
      This issue was discussed (in the mailing list) during the review of
      the patch titled "btrfs: explictly delete unused block groups in
      close_ctree and ro-remount" and it was agreed that acquiring the
      cleaner mutex had to be dropped after the patch titled
      "Btrfs: fix race between balance and unused block group deletion"
      got merged (both patches were submitted at about the same time, but
      one landed in kernel 4.2 and the other in the 4.3 merge window).
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      85e0a0f2