1. 06 8月, 2018 16 次提交
  2. 30 5月, 2018 1 次提交
    • L
      Btrfs: add parent_transid parameter to veirfy_level_key · ff76a864
      Liu Bo 提交于
      As verify_level_key() is checked after verify_parent_transid(), i.e.
      
      if (verify_parent_transid())
         ret = -EIO;
      else if (verify_level_key())
         ret = -EUCLEAN;
      
      if parent_transid is 0, verify_parent_transid() skips verifying
      parent_transid and considers eb as valid, and if verify_level_key()
      reports something wrong, we're not going to know if it's caused by
      corrupted metadata or non-checkecd eb (e.g. stale eb).
      
      The stale eb can be from an outdated raid1 mirror after a degraded
      mount, see eg "btrfs: fix reading stale metadata blocks after degraded
      raid1 mounts" (02a3307a) for more details.
      
      @parent_transid is able to tell whether the eb's generation has been
      verified by the caller.
      Signed-off-by: NLiu Bo <bo.liu@linux.alibaba.com>
      Reviewed-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      ff76a864
  3. 29 5月, 2018 7 次提交
    • O
      Btrfs: get rid of unused orphan infrastructure · a575ceeb
      Omar Sandoval 提交于
      Now that we don't keep long-standing reservations for orphan items,
      root->orphan_block_rsv isn't used. We can git rid of it, along with:
      
      - root->orphan_lock, which was used to protect root->orphan_block_rsv
      - root->orphan_inodes, which was used as a refcount for root->orphan_block_rsv
      - BTRFS_INODE_ORPHAN_META_RESERVED, which was used to track reservations
        in root->orphan_block_rsv
      - btrfs_orphan_commit_root(), which was the last user of any of these
        and does nothing else
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      a575ceeb
    • Q
      btrfs: Do super block verification before writing it to disk · 75cb857d
      Qu Wenruo 提交于
      There are already 2 reports about strangely corrupted super blocks,
      where csum still matches but extra garbage gets slipped into super block.
      
      The corruption would looks like:
      ------
      superblock: bytenr=65536, device=/dev/sdc1
      ---------------------------------------------------------
      csum_type               41700 (INVALID)
      csum                    0x3b252d3a [match]
      bytenr                  65536
      flags                   0x1
                              ( WRITTEN )
      magic                   _BHRfS_M [match]
      ...
      incompat_flags          0x5b22400000000169
                              ( MIXED_BACKREF |
                                COMPRESS_LZO |
                                BIG_METADATA |
                                EXTENDED_IREF |
                                SKINNY_METADATA |
                                unknown flag: 0x5b22400000000000 )
      ...
      ------
      Or
      ------
      superblock: bytenr=65536, device=/dev/mapper/x
      ---------------------------------------------------------
      csum_type              35355 (INVALID)
      csum_size              32
      csum                   0xf0dbeddd [match]
      bytenr                 65536
      flags                  0x1
                             ( WRITTEN )
      magic                  _BHRfS_M [match]
      ...
      incompat_flags         0x176d200000000169
                             ( MIXED_BACKREF |
                               COMPRESS_LZO |
                               BIG_METADATA |
                               EXTENDED_IREF |
                               SKINNY_METADATA |
                               unknown flag: 0x176d200000000000 )
      ------
      
      Obviously, csum_type and incompat_flags get some garbage, but its csum
      still matches, which means kernel calculates the csum based on corrupted
      super block memory.
      And after manually fixing these values, the filesystem is completely
      healthy without any problem exposed by btrfs check.
      
      Although the cause is still unknown, at least detect it and prevent further
      corruption.
      
      Both reports have same symptoms, there's an overwrite on offset 192 of
      the superblock, by 4 bytes. The superblock structure is not allocated or
      freed and stays in the memory for the whole filesystem lifetime, so it's
      not a use-after-free kind of error on someone else's leaked page.
      
      As a vague point for the problable cause is mentioning of other system
      freezing related to graphic card drivers.
      Reported-by: NKen Swenson <flat@imo.uto.moe>
      Reported-by: NBen Parsons <9parsonsb@gmail.com>
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      [ add brief analysis of the reports ]
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      75cb857d
    • Q
      btrfs: Refactor btrfs_check_super_valid · 069ec957
      Qu Wenruo 提交于
      Refactor btrfs_check_super_valid:
      
      1) Rename it to btrfs_validate_mount_super()
         Now it's more obvious when the function should be called.
      
      2) Extract core check routine into validate_super()
         Later write time check can reuse it, and if needed, we could also
         use validate_super() to check each super block.
      
      3) Add more comments about btrfs_validate_mount_super()
         Mostly about what it doesn't check and when it should be called.
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      [ rename to validate_super ]
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      069ec957
    • Q
      btrfs: Move btrfs_check_super_valid() to avoid forward declaration · 21a852b0
      Qu Wenruo 提交于
      Move btrfs_check_super_valid() before its single caller to avoid forward
      declaration.
      
      Though such code motion is not recommended as it pollutes git history,
      in this case the following patches would need to add new forward
      declarations for static functions that we want to avoid.
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      21a852b0
    • A
      btrfs: move btrfs_raid_group values to btrfs_raid_attr table · 41a6e891
      Anand Jain 提交于
      Add a new member struct btrfs_raid_attr::bg_flag so that
      btrfs_raid_array can maintain the bit map flag of the raid type, and
      so we can drop btrfs_raid_group.
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      41a6e891
    • D
      btrfs: track running balance in a simpler way · 3009a62f
      David Sterba 提交于
      Currently fs_info::balance_running is 0 or 1 and does not use the
      semantics of atomics. The pause and cancel check for 0, that can happen
      only after __btrfs_balance exits for whatever reason.
      
      Parallel calls to balance ioctl may enter btrfs_ioctl_balance multiple
      times but will block on the balance_mutex that protects the
      fs_info::flags bit.
      Reviewed-by: NAnand Jain <anand.jain@oracle.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      3009a62f
    • D
      btrfs: kill btrfs_fs_info::volume_mutex · dccdb07b
      David Sterba 提交于
      Mutual exclusion of device add/rm and balance was done by the volume
      mutex up to version 3.7. The commit 5ac00add ("Btrfs: disallow
      mutually exclusive admin operations from user mode") added a bit that
      essentially tracked the same information.
      
      The status bit has an advantage over a mutex that it can be set without
      restrictions of function context, so it started to be used in the
      mount-time resuming of balance or device replace.
      
      But we don't really need to track the same information in two ways.
      
      1) After the previous cleanups, the main ioctl handlers for
         add/del/resize copy the EXCL_OP bit next to the volume mutex, here
         it's clearly safe.
      
      2) Resuming balance during mount or after rw remount will set only the
         EXCL_OP bit and the volume_mutex is held in the kernel thread that
         calls btrfs_balance.
      
      3) Resuming device replace during mount or after rw remount is done
         after balance and is excluded by the EXCL_OP bit. It does not take
         the volume_mutex at all and completely relies on the EXCL_OP bit.
      
      4) The resuming of balance and dev-replace cannot hapen at the same time
         as the ioctls cannot be started in parallel. Nevertheless, a crafted
         image could trigger that and a warning is printed.
      
      5) Balance is normally excluded by EXCL_OP and also uses own mutex to
         protect against concurrent access to its status data. There's some
         trickery to maintain the right lock nesting in case we need to
         reexamine the status in btrfs_ioctl_balance. The volume_mutex is
         removed and the unlock/lock sequence is left in place as we might
         expect other waiters to proceed.
      
      6) Similar to 5, the unlock/lock sequence is kept in
         btrfs_cancel_balance to allow waiters to continue.
      Reviewed-by: NAnand Jain <anand.jain@oracle.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      dccdb07b
  4. 17 5月, 2018 1 次提交
    • N
      btrfs: Fix delalloc inodes invalidation during transaction abort · fe816d0f
      Nikolay Borisov 提交于
      When a transaction is aborted btrfs_cleanup_transaction is called to
      cleanup all the various in-flight bits and pieces which migth be
      active. One of those is delalloc inodes - inodes which have dirty
      pages which haven't been persisted yet. Currently the process of
      freeing such delalloc inodes in exceptional circumstances such as
      transaction abort boiled down to calling btrfs_invalidate_inodes whose
      sole job is to invalidate the dentries for all inodes related to a
      root. This is in fact wrong and insufficient since such delalloc inodes
      will likely have pending pages or ordered-extents and will be linked to
      the sb->s_inode_list. This means that unmounting a btrfs instance with
      an aborted transaction could potentially lead inodes/their pages
      visible to the system long after their superblock has been freed. This
      in turn leads to a "use-after-free" situation once page shrink is
      triggered. This situation could be simulated by running generic/019
      which would cause such inodes to be left hanging, followed by
      generic/176 which causes memory pressure and page eviction which lead
      to touching the freed super block instance. This situation is
      additionally detected by the unmount code of VFS with the following
      message:
      
      "VFS: Busy inodes after unmount of Self-destruct in 5 seconds.  Have a nice day..."
      
      Additionally btrfs hits WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree));
      in free_fs_root for the same reason.
      
      This patch aims to rectify the sitaution by doing the following:
      
      1. Change btrfs_destroy_delalloc_inodes so that it calls
      invalidate_inode_pages2 for every inode on the delalloc list, this
      ensures that all the pages of the inode are released. This function
      boils down to calling btrfs_releasepage. During test I observed cases
      where inodes on the delalloc list were having an i_count of 0, so this
      necessitates using igrab to be sure we are working on a non-freed inode.
      
      2. Since calling btrfs_releasepage might queue delayed iputs move the
      call out to btrfs_cleanup_transaction in btrfs_error_commit_super before
      calling run_delayed_iputs for the last time. This is necessary to ensure
      that delayed iputs are run.
      
      Note: this patch is tagged for 4.14 stable but the fix applies to older
      versions too but needs to be backported manually due to conflicts.
      
      CC: stable@vger.kernel.org # 4.14.x: 2b877331: btrfs: Split btrfs_del_delalloc_inode into 2 functions
      CC: stable@vger.kernel.org # 4.14.x
      Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      [ add comment to igrab ]
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      fe816d0f
  5. 18 4月, 2018 1 次提交
    • Q
      btrfs: qgroup: Commit transaction in advance to reduce early EDQUOT · a514d638
      Qu Wenruo 提交于
      Unlike previous method that tries to commit transaction inside
      qgroup_reserve(), this time we will try to commit transaction using
      fs_info->transaction_kthread to avoid nested transaction and no need to
      worry about locking context.
      
      Since it's an asynchronous function call and we won't wait for
      transaction commit, unlike previous method, we must call it before we
      hit the qgroup limit.
      
      So this patch will use the ratio and size of qgroup meta_pertrans
      reservation as indicator to check if we should trigger a transaction
      commit.  (meta_prealloc won't be cleaned in transaction committ, it's
      useless anyway)
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      a514d638
  6. 13 4月, 2018 1 次提交
    • Q
      btrfs: Only check first key for committed tree blocks · 5d41be6f
      Qu Wenruo 提交于
      When looping btrfs/074 with many cpus (>= 8), it's possible to trigger
      kernel warning due to first key verification:
      
      [ 4239.523446] WARNING: CPU: 5 PID: 2381 at fs/btrfs/disk-io.c:460 btree_read_extent_buffer_pages+0x1ad/0x210
      [ 4239.523830] Modules linked in:
      [ 4239.524630] RIP: 0010:btree_read_extent_buffer_pages+0x1ad/0x210
      [ 4239.527101] Call Trace:
      [ 4239.527251]  read_tree_block+0x42/0x70
      [ 4239.527434]  read_node_slot+0xd2/0x110
      [ 4239.527632]  push_leaf_right+0xad/0x1b0
      [ 4239.527809]  split_leaf+0x4ea/0x700
      [ 4239.527988]  ? leaf_space_used+0xbc/0xe0
      [ 4239.528192]  ? btrfs_set_lock_blocking_rw+0x99/0xb0
      [ 4239.528416]  btrfs_search_slot+0x8cc/0xa40
      [ 4239.528605]  btrfs_insert_empty_items+0x71/0xc0
      [ 4239.528798]  __btrfs_run_delayed_refs+0xa98/0x1680
      [ 4239.529013]  btrfs_run_delayed_refs+0x10b/0x1b0
      [ 4239.529205]  btrfs_commit_transaction+0x33/0xaf0
      [ 4239.529445]  ? start_transaction+0xa8/0x4f0
      [ 4239.529630]  btrfs_alloc_data_chunk_ondemand+0x1b0/0x4e0
      [ 4239.529833]  btrfs_check_data_free_space+0x54/0xa0
      [ 4239.530045]  btrfs_delalloc_reserve_space+0x25/0x70
      [ 4239.531907]  btrfs_direct_IO+0x233/0x3d0
      [ 4239.532098]  generic_file_direct_write+0xcb/0x170
      [ 4239.532296]  btrfs_file_write_iter+0x2bb/0x5f4
      [ 4239.532491]  aio_write+0xe2/0x180
      [ 4239.532669]  ? lock_acquire+0xac/0x1e0
      [ 4239.532839]  ? __might_fault+0x3e/0x90
      [ 4239.533032]  do_io_submit+0x594/0x860
      [ 4239.533223]  ? do_io_submit+0x594/0x860
      [ 4239.533398]  SyS_io_submit+0x10/0x20
      [ 4239.533560]  ? SyS_io_submit+0x10/0x20
      [ 4239.533729]  do_syscall_64+0x75/0x1d0
      [ 4239.533979]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
      [ 4239.534182] RIP: 0033:0x7f8519741697
      
      The problem here is, at btree_read_extent_buffer_pages() we don't have
      acquired read/write lock on that extent buffer, only basic info like
      level/bytenr is reliable.
      
      So race condition leads to such false alert.
      
      However in current call site, it's impossible to acquire proper lock
      without race window.
      To fix the problem, we only verify first key for committed tree blocks
      (whose generation is no larger than fs_info->last_trans_committed), so
      the content of such tree blocks will not change and there is no need to
      get read/write lock.
      Reported-by: NNikolay Borisov <nborisov@suse.com>
      Fixes: 581c1760 ("btrfs: Validate child tree block's level and first key")
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      5d41be6f
  7. 12 4月, 2018 2 次提交
    • D
      btrfs: replace GPL boilerplate by SPDX -- sources · c1d7c514
      David Sterba 提交于
      Remove GPL boilerplate text (long, short, one-line) and keep the rest,
      ie. personal, company or original source copyright statements. Add the
      SPDX header.
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      c1d7c514
    • L
      Btrfs: clean up resources during umount after trans is aborted · af722733
      Liu Bo 提交于
      Currently if some fatal errors occur, like all IO get -EIO, resources
      would be cleaned up when
      a) transaction is being committed or
      b) BTRFS_FS_STATE_ERROR is set
      
      However, in some rare cases, resources may be left alone after transaction
      gets aborted and umount may run into some ASSERT(), e.g.
      ASSERT(list_empty(&block_group->dirty_list));
      
      For case a), in btrfs_commit_transaciton(), there're several places at the
      beginning where we just call btrfs_end_transaction() without cleaning up
      resources.  For case b), it is possible that the trans handle doesn't have
      any dirty stuff, then only trans hanlde is marked as aborted while
      BTRFS_FS_STATE_ERROR is not set, so resources remain in memory.
      
      This makes btrfs also check BTRFS_FS_STATE_TRANS_ABORTED to make sure that
      all resources won't stay in memory after umount.
      Signed-off-by: NLiu Bo <bo.liu@linux.alibaba.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      af722733
  8. 31 3月, 2018 11 次提交
    • L
      Btrfs: print error messages when failing to read trees · f50f4353
      Liu Bo 提交于
      When mount fails to read trees like fs tree, checksum tree, extent
      tree, etc, there is not enough information about where went wrong.
      
      With this, messages like
      
      "BTRFS warning (device sdf): failed to read root (objectid=7): -5"
      
      would help us a bit.
      Signed-off-by: NLiu Bo <bo.liu@linux.alibaba.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      f50f4353
    • Q
      btrfs: Validate child tree block's level and first key · 581c1760
      Qu Wenruo 提交于
      We have several reports about node pointer points to incorrect child
      tree blocks, which could have even wrong owner and level but still with
      valid generation and checksum.
      
      Although btrfs check could handle it and print error message like:
      leaf parent key incorrect 60670574592
      
      Kernel doesn't have enough check on this type of corruption correctly.
      At least add such check to read_tree_block() and btrfs_read_buffer(),
      where we need two new parameters @level and @first_key to verify the
      child tree block.
      
      The new @level check is mandatory and all call sites are already
      modified to extract expected level from its call chain.
      
      While @first_key is optional, the following call sites are skipping such
      check:
      1) Root node/leaf
         As ROOT_ITEM doesn't contain the first key, skip @first_key check.
      2) Direct backref
         Only parent bytenr and level is known and we need to resolve the key
         all by ourselves, skip @first_key check.
      
      Another note of this verification is, it needs extra info from nodeptr
      or ROOT_ITEM, so it can't fit into current tree-checker framework, which
      is limited to node/leaf boundary.
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      581c1760
    • Q
      btrfs: qgroup: Use root::qgroup_meta_rsv_* to record qgroup meta reserved space · 8287475a
      Qu Wenruo 提交于
      For quota disabled->enable case, it's possible that at reservation time
      quota was not enabled so no bytes were really reserved, while at release
      time, quota was enabled so we will try to release some bytes we didn't
      really own.
      
      Such situation can cause metadata reserveation underflow, for both types,
      also less possible for per-trans type since quota enable will commit
      transaction.
      
      To address this, record qgroup meta reserved bytes into
      root::qgroup_meta_rsv_pertrans and ::prealloc.
      So at releasing time we won't free any bytes we didn't reserve.
      
      For DATA, it's already handled by io_tree, so nothing needs to be done
      there.
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      8287475a
    • Q
      btrfs: qgroup: Don't use root->qgroup_meta_rsv for qgroup · e1211d0e
      Qu Wenruo 提交于
      Since qgroup has seperate metadata reservation types now, we can
      completely get rid of the old root->qgroup_meta_rsv, which mostly acts
      as current META_PERTRANS reservation type.
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      e1211d0e
    • J
      btrfs: defer adding raid type kobject until after chunk relocation · 75cb379d
      Jeff Mahoney 提交于
      Any time the first block group of a new type is created, we add a new
      kobject to sysfs to hold the attributes for that type.  Kobject-internal
      allocations always use GFP_KERNEL, making them prone to fs-reclaim races.
      While it appears as if this can occur any time a block group is created,
      the only times the first block group of a new type can be created in
      memory is at mount and when we create the first new block group during
      raid conversion.
      
      This patch adds a new list to track pending kobject additions and then
      handles them after we do chunk relocation.  Between relocating the
      target chunk (or forcing allocation of a new chunk in the case of data)
      and removing the old chunk, we're in a safe place for fs-reclaim to
      occur.  We're holding the volume mutex, which is already held across
      page faults, and the delete_unused_bgs_mutex, which will only stall
      the cleaner thread.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      75cb379d
    • J
      btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers · 8a5a916d
      Jeff Mahoney 提交于
      While running btrfs/011, I hit the following lockdep splat.
      
      This is the important bit:
         pcpu_alloc+0x1ac/0x5e0
         __percpu_counter_init+0x4e/0xb0
         btrfs_init_fs_root+0x99/0x1c0 [btrfs]
         btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
         resolve_indirect_refs+0x130/0x830 [btrfs]
         find_parent_nodes+0x69e/0xff0 [btrfs]
         btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
         btrfs_find_all_roots+0x50/0x70 [btrfs]
         btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
         btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
      
      The percpu_counter_init call in btrfs_alloc_subvolume_writers
      uses GFP_KERNEL, which we can't do during transaction commit.
      
      This switches it to GFP_NOFS.
      
      ========================================================
      WARNING: possible irq lock inversion dependency detected
      4.12.14-kvmsmall #8 Tainted: G        W
      --------------------------------------------------------
      kswapd0/50 just changed the state of lock:
       (&delayed_node->mutex){+.+.-.}, at: [<ffffffffc06994fa>] __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
      but this lock took another, RECLAIM_FS-unsafe lock in the past:
       (pcpu_alloc_mutex){+.+.+.}
      
      and interrupts could create inverse lock ordering between them.
      
      other info that might help us debug this:
      Chain exists of:
        &delayed_node->mutex --> &found->groups_sem --> pcpu_alloc_mutex
      
       Possible interrupt unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(pcpu_alloc_mutex);
                                     local_irq_disable();
                                     lock(&delayed_node->mutex);
                                     lock(&found->groups_sem);
        <Interrupt>
          lock(&delayed_node->mutex);
      
       *** DEADLOCK ***
      
      2 locks held by kswapd0/50:
       #0:  (shrinker_rwsem){++++..}, at: [<ffffffff811dc11f>] shrink_slab+0x7f/0x5b0
       #1:  (&type->s_umount_key#30){+++++.}, at: [<ffffffff8126dec6>] trylock_super+0x16/0x50
      
      the shortest dependencies between 2nd lock and 1st lock:
         -> (pcpu_alloc_mutex){+.+.+.} ops: 4904 {
            HARDIRQ-ON-W at:
                                __mutex_lock+0x4e/0x8c0
                                pcpu_alloc+0x1ac/0x5e0
                                alloc_kmem_cache_cpus.isra.70+0x25/0xa0
                                __do_tune_cpucache+0x2c/0x220
                                do_tune_cpucache+0x26/0xc0
                                enable_cpucache+0x6d/0xf0
                                kmem_cache_init_late+0x42/0x75
                                start_kernel+0x343/0x4cb
                                x86_64_start_kernel+0x127/0x134
                                secondary_startup_64+0xa5/0xb0
            SOFTIRQ-ON-W at:
                                __mutex_lock+0x4e/0x8c0
                                pcpu_alloc+0x1ac/0x5e0
                                alloc_kmem_cache_cpus.isra.70+0x25/0xa0
                                __do_tune_cpucache+0x2c/0x220
                                do_tune_cpucache+0x26/0xc0
                                enable_cpucache+0x6d/0xf0
                                kmem_cache_init_late+0x42/0x75
                                start_kernel+0x343/0x4cb
                                x86_64_start_kernel+0x127/0x134
                                secondary_startup_64+0xa5/0xb0
            RECLAIM_FS-ON-W at:
                                   __kmalloc+0x47/0x310
                                   pcpu_extend_area_map+0x2b/0xc0
                                   pcpu_alloc+0x3ec/0x5e0
                                   alloc_kmem_cache_cpus.isra.70+0x25/0xa0
                                   __do_tune_cpucache+0x2c/0x220
                                   do_tune_cpucache+0x26/0xc0
                                   enable_cpucache+0x6d/0xf0
                                   __kmem_cache_create+0x1bf/0x390
                                   create_cache+0xba/0x1b0
                                   kmem_cache_create+0x1f8/0x2b0
                                   ksm_init+0x6f/0x19d
                                   do_one_initcall+0x50/0x1b0
                                   kernel_init_freeable+0x201/0x289
                                   kernel_init+0xa/0x100
                                   ret_from_fork+0x3a/0x50
            INITIAL USE at:
                               __mutex_lock+0x4e/0x8c0
                               pcpu_alloc+0x1ac/0x5e0
                               alloc_kmem_cache_cpus.isra.70+0x25/0xa0
                               setup_cpu_cache+0x2f/0x1f0
                               __kmem_cache_create+0x1bf/0x390
                               create_boot_cache+0x8b/0xb1
                               kmem_cache_init+0xa1/0x19e
                               start_kernel+0x270/0x4cb
                               x86_64_start_kernel+0x127/0x134
                               secondary_startup_64+0xa5/0xb0
          }
          ... key      at: [<ffffffff821d8e70>] pcpu_alloc_mutex+0x70/0xa0
          ... acquired at:
         pcpu_alloc+0x1ac/0x5e0
         __percpu_counter_init+0x4e/0xb0
         btrfs_init_fs_root+0x99/0x1c0 [btrfs]
         btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
         resolve_indirect_refs+0x130/0x830 [btrfs]
         find_parent_nodes+0x69e/0xff0 [btrfs]
         btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
         btrfs_find_all_roots+0x50/0x70 [btrfs]
         btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
         btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
         transaction_kthread+0x176/0x1b0 [btrfs]
         kthread+0x102/0x140
         ret_from_fork+0x3a/0x50
      
        -> (&fs_info->commit_root_sem){++++..} ops: 1566382 {
           HARDIRQ-ON-W at:
                              down_write+0x3e/0xa0
                              cache_block_group+0x287/0x420 [btrfs]
                              find_free_extent+0x106c/0x12d0 [btrfs]
                              btrfs_reserve_extent+0xd8/0x170 [btrfs]
                              cow_file_range.isra.66+0x133/0x470 [btrfs]
                              run_delalloc_range+0x121/0x410 [btrfs]
                              writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
                              __extent_writepage+0x19a/0x360 [btrfs]
                              extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
                              extent_writepages+0x4d/0x60 [btrfs]
                              do_writepages+0x1a/0x70
                              __filemap_fdatawrite_range+0xa7/0xe0
                              btrfs_rename+0x5ee/0xdb0 [btrfs]
                              vfs_rename+0x52a/0x7e0
                              SyS_rename+0x351/0x3b0
                              do_syscall_64+0x79/0x1e0
                              entry_SYSCALL_64_after_hwframe+0x42/0xb7
           HARDIRQ-ON-R at:
                              down_read+0x35/0x90
                              caching_thread+0x57/0x560 [btrfs]
                              normal_work_helper+0x1c0/0x5e0 [btrfs]
                              process_one_work+0x1e0/0x5c0
                              worker_thread+0x44/0x390
                              kthread+0x102/0x140
                              ret_from_fork+0x3a/0x50
           SOFTIRQ-ON-W at:
                              down_write+0x3e/0xa0
                              cache_block_group+0x287/0x420 [btrfs]
                              find_free_extent+0x106c/0x12d0 [btrfs]
                              btrfs_reserve_extent+0xd8/0x170 [btrfs]
                              cow_file_range.isra.66+0x133/0x470 [btrfs]
                              run_delalloc_range+0x121/0x410 [btrfs]
                              writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
                              __extent_writepage+0x19a/0x360 [btrfs]
                              extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
                              extent_writepages+0x4d/0x60 [btrfs]
                              do_writepages+0x1a/0x70
                              __filemap_fdatawrite_range+0xa7/0xe0
                              btrfs_rename+0x5ee/0xdb0 [btrfs]
                              vfs_rename+0x52a/0x7e0
                              SyS_rename+0x351/0x3b0
                              do_syscall_64+0x79/0x1e0
                              entry_SYSCALL_64_after_hwframe+0x42/0xb7
           SOFTIRQ-ON-R at:
                              down_read+0x35/0x90
                              caching_thread+0x57/0x560 [btrfs]
                              normal_work_helper+0x1c0/0x5e0 [btrfs]
                              process_one_work+0x1e0/0x5c0
                              worker_thread+0x44/0x390
                              kthread+0x102/0x140
                              ret_from_fork+0x3a/0x50
           INITIAL USE at:
                             down_write+0x3e/0xa0
                             cache_block_group+0x287/0x420 [btrfs]
                             find_free_extent+0x106c/0x12d0 [btrfs]
                             btrfs_reserve_extent+0xd8/0x170 [btrfs]
                             cow_file_range.isra.66+0x133/0x470 [btrfs]
                             run_delalloc_range+0x121/0x410 [btrfs]
                             writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
                             __extent_writepage+0x19a/0x360 [btrfs]
                             extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
                             extent_writepages+0x4d/0x60 [btrfs]
                             do_writepages+0x1a/0x70
                             __filemap_fdatawrite_range+0xa7/0xe0
                             btrfs_rename+0x5ee/0xdb0 [btrfs]
                             vfs_rename+0x52a/0x7e0
                             SyS_rename+0x351/0x3b0
                             do_syscall_64+0x79/0x1e0
                             entry_SYSCALL_64_after_hwframe+0x42/0xb7
         }
         ... key      at: [<ffffffffc0729578>] __key.61970+0x0/0xfffffffffff9aa88 [btrfs]
         ... acquired at:
         cache_block_group+0x287/0x420 [btrfs]
         find_free_extent+0x106c/0x12d0 [btrfs]
         btrfs_reserve_extent+0xd8/0x170 [btrfs]
         btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs]
         btrfs_create_tree+0xbb/0x2a0 [btrfs]
         btrfs_create_uuid_tree+0x37/0x140 [btrfs]
         open_ctree+0x23c0/0x2660 [btrfs]
         btrfs_mount+0xd36/0xf90 [btrfs]
         mount_fs+0x3a/0x160
         vfs_kern_mount+0x66/0x150
         btrfs_mount+0x18c/0xf90 [btrfs]
         mount_fs+0x3a/0x160
         vfs_kern_mount+0x66/0x150
         do_mount+0x1c1/0xcc0
         SyS_mount+0x7e/0xd0
         do_syscall_64+0x79/0x1e0
         entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
       -> (&found->groups_sem){++++..} ops: 2134587 {
          HARDIRQ-ON-W at:
                            down_write+0x3e/0xa0
                            __link_block_group+0x34/0x130 [btrfs]
                            btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
                            open_ctree+0x2054/0x2660 [btrfs]
                            btrfs_mount+0xd36/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            btrfs_mount+0x18c/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            do_mount+0x1c1/0xcc0
                            SyS_mount+0x7e/0xd0
                            do_syscall_64+0x79/0x1e0
                            entry_SYSCALL_64_after_hwframe+0x42/0xb7
          HARDIRQ-ON-R at:
                            down_read+0x35/0x90
                            btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs]
                            open_ctree+0x207b/0x2660 [btrfs]
                            btrfs_mount+0xd36/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            btrfs_mount+0x18c/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            do_mount+0x1c1/0xcc0
                            SyS_mount+0x7e/0xd0
                            do_syscall_64+0x79/0x1e0
                            entry_SYSCALL_64_after_hwframe+0x42/0xb7
          SOFTIRQ-ON-W at:
                            down_write+0x3e/0xa0
                            __link_block_group+0x34/0x130 [btrfs]
                            btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
                            open_ctree+0x2054/0x2660 [btrfs]
                            btrfs_mount+0xd36/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            btrfs_mount+0x18c/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            do_mount+0x1c1/0xcc0
                            SyS_mount+0x7e/0xd0
                            do_syscall_64+0x79/0x1e0
                            entry_SYSCALL_64_after_hwframe+0x42/0xb7
          SOFTIRQ-ON-R at:
                            down_read+0x35/0x90
                            btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs]
                            open_ctree+0x207b/0x2660 [btrfs]
                            btrfs_mount+0xd36/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            btrfs_mount+0x18c/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            do_mount+0x1c1/0xcc0
                            SyS_mount+0x7e/0xd0
                            do_syscall_64+0x79/0x1e0
                            entry_SYSCALL_64_after_hwframe+0x42/0xb7
          INITIAL USE at:
                           down_write+0x3e/0xa0
                           __link_block_group+0x34/0x130 [btrfs]
                           btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
                           open_ctree+0x2054/0x2660 [btrfs]
                           btrfs_mount+0xd36/0xf90 [btrfs]
                           mount_fs+0x3a/0x160
                           vfs_kern_mount+0x66/0x150
                           btrfs_mount+0x18c/0xf90 [btrfs]
                           mount_fs+0x3a/0x160
                           vfs_kern_mount+0x66/0x150
                           do_mount+0x1c1/0xcc0
                           SyS_mount+0x7e/0xd0
                           do_syscall_64+0x79/0x1e0
                           entry_SYSCALL_64_after_hwframe+0x42/0xb7
        }
        ... key      at: [<ffffffffc0729488>] __key.59101+0x0/0xfffffffffff9ab78 [btrfs]
        ... acquired at:
         find_free_extent+0xcb4/0x12d0 [btrfs]
         btrfs_reserve_extent+0xd8/0x170 [btrfs]
         btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs]
         __btrfs_cow_block+0x110/0x5b0 [btrfs]
         btrfs_cow_block+0xd7/0x290 [btrfs]
         btrfs_search_slot+0x1f6/0x960 [btrfs]
         btrfs_lookup_inode+0x2a/0x90 [btrfs]
         __btrfs_update_delayed_inode+0x65/0x210 [btrfs]
         btrfs_commit_inode_delayed_inode+0x121/0x130 [btrfs]
         btrfs_evict_inode+0x3fe/0x6a0 [btrfs]
         evict+0xc4/0x190
         __dentry_kill+0xbf/0x170
         dput+0x2ae/0x2f0
         SyS_rename+0x2a6/0x3b0
         do_syscall_64+0x79/0x1e0
         entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      -> (&delayed_node->mutex){+.+.-.} ops: 5580204 {
         HARDIRQ-ON-W at:
                          __mutex_lock+0x4e/0x8c0
                          btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
                          btrfs_update_inode+0x83/0x110 [btrfs]
                          btrfs_dirty_inode+0x62/0xe0 [btrfs]
                          touch_atime+0x8c/0xb0
                          do_generic_file_read+0x818/0xb10
                          __vfs_read+0xdc/0x150
                          vfs_read+0x8a/0x130
                          SyS_read+0x45/0xa0
                          do_syscall_64+0x79/0x1e0
                          entry_SYSCALL_64_after_hwframe+0x42/0xb7
         SOFTIRQ-ON-W at:
                          __mutex_lock+0x4e/0x8c0
                          btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
                          btrfs_update_inode+0x83/0x110 [btrfs]
                          btrfs_dirty_inode+0x62/0xe0 [btrfs]
                          touch_atime+0x8c/0xb0
                          do_generic_file_read+0x818/0xb10
                          __vfs_read+0xdc/0x150
                          vfs_read+0x8a/0x130
                          SyS_read+0x45/0xa0
                          do_syscall_64+0x79/0x1e0
                          entry_SYSCALL_64_after_hwframe+0x42/0xb7
         IN-RECLAIM_FS-W at:
                             __mutex_lock+0x4e/0x8c0
                             __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
                             btrfs_evict_inode+0x22c/0x6a0 [btrfs]
                             evict+0xc4/0x190
                             dispose_list+0x35/0x50
                             prune_icache_sb+0x42/0x50
                             super_cache_scan+0x139/0x190
                             shrink_slab+0x262/0x5b0
                             shrink_node+0x2eb/0x2f0
                             kswapd+0x2eb/0x890
                             kthread+0x102/0x140
                             ret_from_fork+0x3a/0x50
         INITIAL USE at:
                         __mutex_lock+0x4e/0x8c0
                         btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
                         btrfs_update_inode+0x83/0x110 [btrfs]
                         btrfs_dirty_inode+0x62/0xe0 [btrfs]
                         touch_atime+0x8c/0xb0
                         do_generic_file_read+0x818/0xb10
                         __vfs_read+0xdc/0x150
                         vfs_read+0x8a/0x130
                         SyS_read+0x45/0xa0
                         do_syscall_64+0x79/0x1e0
                         entry_SYSCALL_64_after_hwframe+0x42/0xb7
       }
       ... key      at: [<ffffffffc072d488>] __key.56935+0x0/0xfffffffffff96b78 [btrfs]
       ... acquired at:
         __lock_acquire+0x264/0x11c0
         lock_acquire+0xbd/0x1e0
         __mutex_lock+0x4e/0x8c0
         __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
         btrfs_evict_inode+0x22c/0x6a0 [btrfs]
         evict+0xc4/0x190
         dispose_list+0x35/0x50
         prune_icache_sb+0x42/0x50
         super_cache_scan+0x139/0x190
         shrink_slab+0x262/0x5b0
         shrink_node+0x2eb/0x2f0
         kswapd+0x2eb/0x890
         kthread+0x102/0x140
         ret_from_fork+0x3a/0x50
      
      stack backtrace:
      CPU: 1 PID: 50 Comm: kswapd0 Tainted: G        W        4.12.14-kvmsmall #8 SLE15 (unreleased)
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
      Call Trace:
       dump_stack+0x78/0xb7
       print_irq_inversion_bug.part.38+0x19f/0x1aa
       check_usage_forwards+0x102/0x120
       ? ret_from_fork+0x3a/0x50
       ? check_usage_backwards+0x110/0x110
       mark_lock+0x16c/0x270
       __lock_acquire+0x264/0x11c0
       ? pagevec_lookup_entries+0x1a/0x30
       ? truncate_inode_pages_range+0x2b3/0x7f0
       lock_acquire+0xbd/0x1e0
       ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
       __mutex_lock+0x4e/0x8c0
       ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
       ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
       ? btrfs_evict_inode+0x1f6/0x6a0 [btrfs]
       __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
       btrfs_evict_inode+0x22c/0x6a0 [btrfs]
       evict+0xc4/0x190
       dispose_list+0x35/0x50
       prune_icache_sb+0x42/0x50
       super_cache_scan+0x139/0x190
       shrink_slab+0x262/0x5b0
       shrink_node+0x2eb/0x2f0
       kswapd+0x2eb/0x890
       kthread+0x102/0x140
       ? mem_cgroup_shrink_node+0x2c0/0x2c0
       ? kthread_create_on_node+0x40/0x40
       ret_from_fork+0x3a/0x50
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Reviewed-by: NLiu Bo <bo.liu@linux.alibaba.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      8a5a916d
    • N
      btrfs: remove max_active var from open_ctree · 338dae1a
      Nikolay Borisov 提交于
      Introduced by 5cdc7ad3 ("btrfs: Replace fs_info->workers with
      btrfs_workqueue.") but obsoleted by 2a458198 ("btrfs: factor
      btrfs_init_workqueues() out of open_ctree()").
      Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      338dae1a
    • N
      btrfs: Use sizeof directly instead of a constant variable · 776c4a7c
      Nikolay Borisov 提交于
      The kernel would like to have all stack VLA usage removed[1].
      Unfortunately using an integer constant variable as the size of an
      array is still considered a VLA. Instead let's use directly sizeof(var)
      which removes the VLA usage. Use the occasion to remove csum_size
      altogether and use sizeof() also for the size passed to memcmp
      
      [1]: https://lkml.org/lkml/2018/3/7/621Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      776c4a7c
    • D
      d0ee3934
    • D
      btrfs: remove unused parameters from extent_submit_bio_done_t · 6c553435
      David Sterba 提交于
      Remove parameters not used by any of the callbacks.
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      6c553435
    • D
      btrfs: remove unused parameters from extent_submit_bio_start_t · d0779291
      David Sterba 提交于
      Remove parameters not used by any of the callbacks.
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      d0779291