1. 12 Jan 2016, 2 commits
    • xfs: handle dquot buffer readahead in log recovery correctly · 7d6a13f0
      Committed by Dave Chinner
      When we do dquot readahead in log recovery, we do not use a verifier
      as the underlying buffer may not have dquots in it. e.g. the
      allocation operation hasn't yet been replayed. Hence we do not want
      to fail recovery because we detect an operation to be replayed has
      not been run yet. This problem was addressed for inodes in commit
      d8914002 ("xfs: inode buffers may not be valid during recovery
      readahead") but the problem was not recognised to exist for dquots
      and their buffers as the dquot readahead did not have a verifier.
      
      The result of not using a verifier is that when the buffer is then
      next read to replay a dquot modification, the dquot buffer verifier
      will only be attached to the buffer if *readahead is not complete*.
      Hence we can read the buffer, replay the dquot changes and then add
      it to the delwri submission list without it having a verifier
      attached to it. This then generates warnings in xfs_buf_ioapply(),
      which catches and warns about this case.
      
      Fix this and make it handle the same readahead verifier error cases
      as for inode buffers by adding a new readahead verifier that has a
      write operation as well as a read operation that marks the buffer as
      not done if any corruption is detected.  Also make sure we don't run
      readahead if the dquot buffer has been marked as cancelled by
      recovery.
      
      This will result in readahead either succeeding and the buffer
      having a valid write verifier, or readahead failing and the buffer
      state requiring the subsequent read to resubmit the IO with the new
      verifier.  In either case, this will result in the buffer always
      ending up with a valid write verifier on it.
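      
      A minimal sketch of the shape of such a readahead verifier, assuming a
      helper dquot_buf_is_valid() that stands in for the usual CRC and dquot
      checks (the real verifier and ops names may differ from this sketch):
      
      	static void
      	xfs_dquot_buf_readahead_verify(
      		struct xfs_buf	*bp)
      	{
      		struct xfs_mount	*mp = bp->b_target->bt_mount;
      
      		/* corrupt or not-yet-replayed contents: fail quietly */
      		if (!dquot_buf_is_valid(mp, bp)) {
      			xfs_buf_ioerror(bp, -EIO);	/* record the error */
      			bp->b_flags &= ~XBF_DONE;	/* force a re-read later */
      		}
      	}
      
      	const struct xfs_buf_ops xfs_dquot_buf_ra_ops = {
      		.verify_read = xfs_dquot_buf_readahead_verify,
      		.verify_write = xfs_dquot_buf_write_verify,	/* normal write checks */
      	};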
      
      Note: we also need to fix the inode buffer readahead error handling
      to mark the buffer with EIO. Brian noticed during review that the code
      I copied from there was wrong, so fix it at the same time. Add comments
      linking the two functions that handle readahead verifier errors
      together so we don't forget this behavioural link in future.
      
      cc: <stable@vger.kernel.org> # 3.12 - current
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: inode recovery readahead can race with inode buffer creation · b79f4a1c
      Committed by Dave Chinner
      When we do inode readahead in log recovery, we can do the
      readahead before we've replayed the icreate transaction that stamps
      the buffer with inode cores. The inode readahead verifier catches
      this and marks the buffer as !done to indicate that it doesn't yet
      contain valid inodes.
      
      In adding buffer error notification (i.e. setting b_error = -EIO at
      the same time as we clear the done flag) to such a readahead
      verifier failure, we can then get subsequent inode recovery failing
      with this error:
      
      XFS (dm-0): metadata I/O error: block 0xa00060 ("xlog_recover_do..(read#2)") error 5 numblks 32
      
      This occurs when readahead completion races with icreate item replay
      such as:
      
      	inode readahead
      		find buffer
      		lock buffer
      		submit RA io
      	....
      	icreate recovery
      	    xfs_trans_get_buffer
      		find buffer
      		lock buffer
      		<blocks on RA completion>
      	.....
      	<ra completion>
      		fails verifier
      		clear XBF_DONE
      		set bp->b_error = -EIO
      		release and unlock buffer
      	<icreate gains lock>
      	icreate initialises buffer
      	marks buffer as done
      	adds buffer to delayed write queue
      	releases buffer
      
      At this point, we have an initialised inode buffer that is up to
      date but has an -EIO state registered against it. When we finally
      get to recovering an inode in that buffer:
      
      	inode item recovery
      	    xfs_trans_read_buffer
      		find buffer
      		lock buffer
      		sees XBF_DONE is set, returns buffer
      	    sees bp->b_error is set
      		fail log recovery!
      
      Essentially, we need xfs_trans_get_buf_map() to clear the error status of
      the buffer when doing a lookup. This function returns uninitialised
      buffers, so the buffer returned can not be in an error state and
      none of the code that uses this function expects b_error to be set
      on return. Indeed, there is an ASSERT(!bp->b_error); in the
      transaction case in xfs_trans_get_buf_map() that would have caught
      this if log recovery used transactions....
      
      This patch firstly changes the inode readahead failure to set -EIO
      on the buffer, and secondly changes xfs_buf_get_map() to never
      return a buffer with an error state set so this first change doesn't
      cause unexpected log recovery failures.
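      
      The second half of the fix amounts to discarding stale errors on lookup;
      a rough sketch (the helper name here is made up, not the exact patch):
      
      	static void
      	xfs_buf_clear_lookup_error(
      		struct xfs_buf	*bp)
      	{
      		/*
      		 * A failed readahead leaves b_error set with XBF_DONE clear.
      		 * A get-style lookup returns uninitialised contents anyway,
      		 * so any such stale error must not be handed to the caller.
      		 */
      		if (bp->b_error)
      			xfs_buf_ioerror(bp, 0);
      	}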
      
      cc: <stable@vger.kernel.org> # 3.12 - current
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
  2. 11 Jan 2016, 1 commit
    • xfs: eliminate committed arg from xfs_bmap_finish · f6106efa
      Committed by Eric Sandeen
      Calls to xfs_bmap_finish() and xfs_trans_ijoin(), and the
      associated comments were replicated several times across
      the attribute code, all dealing with what to do if the
      transaction was or wasn't committed.
      
      And in that replicated code, an ASSERT() test of an
      uninitialized variable occurs in several locations:
      
      	error = xfs_attr_thing(&args);
      	if (!error) {
      		error = xfs_bmap_finish(&args.trans, args.flist,
      					&committed);
      	}
      	if (error) {
      		ASSERT(committed);
      
      If the first xfs_attr_thing() failed, we'd skip the xfs_bmap_finish,
      never set "committed", and then test it in the ASSERT.
      
      Fix this up by moving the committed state internal to xfs_bmap_finish,
      and add a new inode argument.  If an inode is passed in, it is passed
      through to __xfs_trans_roll() and joined to the transaction there if
      the transaction was committed.
      
      xfs_qm_dqalloc() was a little unique in that it called bjoin rather
      than ijoin, but as Dave points out we can detect the committed state
      by checking whether (*tpp != tp).
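      
      With the change, a hypothetical caller ends up looking like this ("dp" and
      the error label are illustrative; the new inode argument is what lets
      xfs_bmap_finish() rejoin the inode itself when the transaction rolls):
      
      	error = xfs_attr_thing(&args);
      	if (!error)
      		error = xfs_bmap_finish(&args.trans, args.flist, dp);
      	if (error)
      		goto out;		/* no "committed" to test any more */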
      
      Addresses-Coverity-Id: 102360
      Addresses-Coverity-Id: 102361
      Addresses-Coverity-Id: 102363
      Addresses-Coverity-Id: 102364
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
  3. 08 Jan 2016, 2 commits
    • xfs: bmapbt checking on debug kernels too expensive · e3543819
      Committed by Dave Chinner
      For large sparse or fragmented files, checking every single entry in
      the bmapbt on every operation is prohibitively expensive. Especially
      as such checks rarely discover problems during normal operations on
      high extent count files. Our regression tests don't tend to exercise
      files with hundreds of thousands to millions of extents, so mostly
      this isn't noticed.
      
      However, trying to run things like xfs_mdrestore of large filesystem
      dumps on a debug kernel quickly becomes impossible as the CPU is
      completely burnt up repeatedly walking the sparse file bmapbt that
      is generated for every allocation that is made.
      
      Hence, if the file has more than 10,000 extents, just don't bother
      with walking the tree to check it exhaustively. The btree code has
      checks that ensure that the newly inserted/removed/modified record
      is correctly ordered, so the entire tree walk in these cases has
      limited additional value.
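      
      The guard is a simple early return at the top of the DEBUG-only check;
      roughly (the exact way the extent count is read is an assumption here):
      
      	/* skip the exhaustive walk for high extent count files */
      	if (XFS_IFORK_NEXTENTS(ip, whichfork) > 10000)
      		return;
      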
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: add tracepoints to readpage calls · 121e213e
      Committed by Dave Chinner
      This allows us to see page cache driven readahead in action as it
      passes through XFS. This helps to understand buffered read
      throughput problems such as readahead IO sizes being too small
      for the underlying device to reach max throughput.
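      
      Roughly, the hook is a one-line tracepoint at the top of each readpage
      entry point (the tracepoint name follows the usual xfs_vm_* convention
      and is an assumption here, not a quote of the patch):
      
      	STATIC int
      	xfs_vm_readpage(
      		struct file		*unused,
      		struct page		*page)
      	{
      		/* make page-cache driven reads visible to tracing */
      		trace_xfs_vm_readpage(page->mapping->host, 1);
      		return mpage_readpage(page, xfs_get_blocks);
      	}
      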
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
  4. 05 Jan 2016, 2 commits
    • xfs: debug mode log record crc error injection · 609adfc2
      Committed by Brian Foster
      XFS now uses CRC verification over a limited section of the log to
      detect torn writes prior to a crash. This is difficult to test directly
      due to the timing and hardware requirements to cause a short write.
      
      Add a mechanism to inject CRC errors into log records to facilitate
      testing torn write detection during log recovery. This mechanism is
      dangerous and can result in filesystem corruption. Thus, it is only
      available in DEBUG mode for testing/development purposes. Set a non-zero
      value to the following sysfs entry to enable error injection:
      
      	/sys/fs/xfs/<dev>/log/log_badcrc_factor
      
      Once enabled, XFS intentionally writes an invalid CRC to a log record at
      some random point in the future based on the provided frequency. The
      filesystem immediately shuts down once the record has been written to
      the physical log to prevent metadata writeback (e.g., AIL insertion)
      once the log write completes. This helps reasonably simulate a torn
      write to the log as the affected record must be safe to discard. The
      next mount after the intentional shutdown requires log recovery and
      should detect and recover from the torn write.
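      
      The injection point itself is tiny; a sketch under DEBUG, with assumed
      field names (l_badcrc_factor mirroring the sysfs knob above):
      
      	#ifdef DEBUG
      		/* roughly one record in every l_badcrc_factor log writes */
      		if (log->l_badcrc_factor &&
      		    (prandom_u32() % log->l_badcrc_factor == 0)) {
      			/* corrupt the CRC, then arrange an immediate shutdown */
      			iclog->ic_header.h_crc &= cpu_to_le32(0xAAAAAAAA);
      			xfs_warn(log->l_mp,
      				"Intentionally corrupted log record. Shutdown imminent.");
      		}
      	#endif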
      
      Note again that this _will_ result in data loss or worse. For testing
      and development purposes only!
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      
    • xfs: detect and trim torn writes during log recovery · 7088c413
      Committed by Brian Foster
      Certain types of storage, such as persistent memory, do not provide
      sector atomicity for writes. This means that if a crash occurs while XFS
      is writing log records, only part of those records might make it to the
      storage. This is problematic because log recovery uses the cycle value
      packed at the top of each log block to locate the head/tail of the log.
      This can lead to CRC verification failures during log recovery and an
      unmountable fs for a filesystem that is otherwise consistent.
      
      Update log recovery to incorporate log record CRC verification as part
      of the head/tail discovery process. Once the head is located via the
      traditional algorithm, run a CRC-only pass over the records up to the
      head of the log. If CRC verification fails, assume that the records are
      torn as a matter of policy and trim the head block back to the start of
      the first bad record.
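      
      Conceptually the trimming pass looks like this (record_crc_ok() and
      next_record() are made-up helpers standing in for the real record
      parsing, and the real code limits how far back it scans):
      
      	/* walk records towards the found head; stop at the first bad CRC */
      	static xfs_daddr_t
      	xlog_trim_torn_head(
      		struct xlog	*log,
      		xfs_daddr_t	tail_blk,
      		xfs_daddr_t	head_blk)
      	{
      		xfs_daddr_t	blk;
      
      		for (blk = tail_blk; blk != head_blk; blk = next_record(log, blk)) {
      			if (!record_crc_ok(log, blk))
      				return blk;	/* new head: start of first torn record */
      		}
      		return head_blk;		/* everything verified, keep the head */
      	}
      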
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      
  5. 04 Jan 2016, 19 commits
  6. 30 Dec 2015, 3 commits
    • ocfs2/dlm: clear migration_pending when migration target goes down · cc28d6d8
      Committed by xuejiufei
      We have found a BUG on res->migration_pending when migrating lock
      resources.  The situation is as follows.
      
      dlm_mark_lockres_migrating
        res->migration_pending = 1;
        __dlm_lockres_reserve_ast
        dlm_lockres_release_ast returns with res->migration_pending remains
            because other threads reserve asts
        wait dlm_migration_can_proceed returns 1
        >>>>>>> o2hb found that target goes down and remove target
                from domain_map
        dlm_migration_can_proceed returns 1
        dlm_mark_lockres_migrating returns -ESHUTDOWN with
            res->migration_pending still remains.
      
      When reentering dlm_mark_lockres_migrating(), it will trigger the BUG_ON
      with res->migration_pending.  So clear migration_pending when target is
      down.
      Signed-off-by: Jiufei Xue <xuejiufei@huawei.com>
      Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: fix flock panic issue · b5a8bc33
      Committed by Junxiao Bi
      Commit 4f656367 ("Move locks API users to locks_lock_inode_wait()")
      moved the flock/posix lock identification code into locks_lock_inode_wait(),
      but missed setting fl_flags to FL_FLOCK, which caused the following
      kernel panic on 4.4.0-rc5.
      
        kernel BUG at fs/locks.c:1895!
        invalid opcode: 0000 [#1] SMP
        Modules linked in: ocfs2(O) ocfs2_dlmfs(O) ocfs2_stack_o2cb(O) ocfs2_dlm(O) ocfs2_nodemanager(O) ocfs2_stackglue(O) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi xen_kbdfront xen_netfront xen_fbfront xen_blkfront
        CPU: 0 PID: 20268 Comm: flock_unit_test Tainted: G           O    4.4.0-rc5-next-20151217 #1
        Hardware name: Xen HVM domU, BIOS 4.3.1OVM 05/14/2014
        task: ffff88007b3672c0 ti: ffff880028b58000 task.ti: ffff880028b58000
        RIP: locks_lock_inode_wait+0x2e/0x160
        Call Trace:
          ocfs2_do_flock+0x91/0x160 [ocfs2]
          ocfs2_flock+0x76/0xd0 [ocfs2]
          SyS_flock+0x10f/0x1a0
          entry_SYSCALL_64_fastpath+0x12/0x71
        Code: e5 41 57 41 56 49 89 fe 41 55 41 54 53 48 89 f3 48 81 ec 88 00 00 00 8b 46 40 83 e0 03 83 f8 01 0f 84 ad 00 00 00 83 f8 02 74 04 <0f> 0b eb fe 4c 8d ad 60 ff ff ff 4c 8d 7b 58 e8 0e 8e 73 00 4d
        RIP  locks_lock_inode_wait+0x2e/0x160
         RSP <ffff880028b5bce8>
        ---[ end trace dfca74ec9b5b274c ]---
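      
      The fix described above boils down to tagging the request before handing
      it to the common code; roughly (not the exact patch):
      
      	/*
      	 * locks_lock_inode_wait() dispatches on fl_flags and BUG()s if
      	 * neither FL_POSIX nor FL_FLOCK is set, so mark the flock request
      	 * explicitly before calling it.
      	 */
      	fl->fl_flags |= FL_FLOCK;
      	ret = locks_lock_inode_wait(inode, fl);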
      
      Fixes: 4f656367 ("Move locks API users to locks_lock_inode_wait()")
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: fix BUG when calculate new backup super · 5c9ee4cb
      Committed by Joseph Qi
      When resizing, it first extends the last gd.  Once it needs to back up
      the super in that gd, it calculates the new backup super and updates the
      corresponding value.
      
      But it currently doesn't consider the situation where the backup super is
      already done.  In this case, it still sets the bit in the gd bitmap and
      then decreases bg_free_bits_count, which leads to a corrupted gd and
      triggers the BUG in ocfs2_block_group_set_bits:
      
          BUG_ON(le16_to_cpu(bg->bg_free_bits_count) < num_bits);
      
      So check whether the backup super is done and then do the updates.
      Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
      Reviewed-by: Jiufei Xue <xuejiufei@huawei.com>
      Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 19 Dec 2015, 1 commit
  8. 17 Dec 2015, 1 commit
  9. 16 Dec 2015, 2 commits
    • Btrfs: check prepare_uptodate_page() error code earlier · bb1591b4
      Committed by Chris Mason
      prepare_pages() may end up calling prepare_uptodate_page() twice if our
      write only spans a single page.  But if the first call returns an error,
      our page will be unlocked and it's not safe to call it again.
      
      This bug goes all the way back to 2011, and it's not something commonly
      hit.
      
      While we're here, add a more explicit check for the page being truncated
      away.  The bare lock_page() alone is protected only by good thoughts and
      i_mutex, which we're sure to regret eventually.
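      
      The added truncation check is of this shape (the -EAGAIN retry convention
      here is an assumption, not the exact patch):
      
      		lock_page(page);
      		/*
      		 * The page was unlocked while the read ran, so truncate may
      		 * have pulled it out of this mapping; bail out and let the
      		 * caller redo the page lookup instead of writing to it.
      		 */
      		if (page->mapping != inode->i_mapping) {
      			unlock_page(page);
      			return -EAGAIN;
      		}
      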
      Reported-by: Dave Jones <dsj@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: check for empty bitmap list in setup_cluster_bitmaps · 1b9b922a
      Committed by Chris Mason
      Dave Jones found a warning from kasan in setup_cluster_bitmaps()
      
      ==================================================================
      BUG: KASAN: stack-out-of-bounds in setup_cluster_bitmap+0xc4/0x5a0 at
      addr ffff88039bef6828
      Read of size 8 by task nfsd/1009
      page:ffffea000e6fbd80 count:0 mapcount:0 mapping:          (null)
      index:0x0
      flags: 0x8000000000000000()
      page dumped because: kasan: bad access detected
      CPU: 1 PID: 1009 Comm: nfsd Tainted: G        W
      4.4.0-rc3-backup-debug+ #1
       ffff880065647b50 000000006bb712c2 ffff88039bef6640 ffffffffa680a43e
       0000004559c00000 ffff88039bef66c8 ffffffffa62638d1 ffffffffa61121c0
       ffff8803a5769de8 0000000000000296 ffff8803a5769df0 0000000000046280
      Call Trace:
       [<ffffffffa680a43e>] dump_stack+0x4b/0x6d
       [<ffffffffa62638d1>] kasan_report_error+0x501/0x520
       [<ffffffffa61121c0>] ? debug_show_all_locks+0x1e0/0x1e0
       [<ffffffffa6263948>] kasan_report+0x58/0x60
       [<ffffffffa6814b00>] ? rb_last+0x10/0x40
       [<ffffffffa66f8af4>] ? setup_cluster_bitmap+0xc4/0x5a0
       [<ffffffffa6262ead>] __asan_load8+0x5d/0x70
       [<ffffffffa66f8af4>] setup_cluster_bitmap+0xc4/0x5a0
       [<ffffffffa66f675a>] ? setup_cluster_no_bitmap+0x6a/0x400
       [<ffffffffa66fcd16>] btrfs_find_space_cluster+0x4b6/0x640
       [<ffffffffa66fc860>] ? btrfs_alloc_from_cluster+0x4e0/0x4e0
       [<ffffffffa66fc36e>] ? btrfs_return_cluster_to_free_space+0x9e/0xb0
       [<ffffffffa702dc37>] ? _raw_spin_unlock+0x27/0x40
       [<ffffffffa666a1a1>] find_free_extent+0xba1/0x1520
      
      Andrey noticed this was because we were doing list_first_entry on a list
      that might be empty.  Rework the tests a bit so we don't do that.
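      
      The defensive shape of the rework is simply to test the list before
      taking its first entry (the struct member name is assumed here):
      
      	struct btrfs_free_space *entry = NULL;
      
      	/* only dereference the first bitmap entry if one actually exists */
      	if (!list_empty(bitmaps))
      		entry = list_first_entry(bitmaps, struct btrfs_free_space, list);
      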
      Signed-off-by: Chris Mason <clm@fb.com>
      Reported-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Reported-by: Dave Jones <dsj@fb.com>
  10. 14 Dec 2015, 1 commit
  11. 13 Dec 2015, 2 commits
    • ocfs2: fix SGID not inherited issue · 854ee2e9
      Committed by Junxiao Bi
      Commit 8f1eb487 ("ocfs2: fix umask ignored issue") introduced an
      issue: the SGID bit of a sub directory was not inherited from its parent
      directory.  This is because SGID is set into "inode->i_mode" in
      ocfs2_get_init_inode(), but is later overwritten by "mode", which doesn't
      have SGID set.
      
      Fixes: 8f1eb487 ("ocfs2: fix umask ignored issue")
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Acked-by: Srinivas Eeda <srinivas.eeda@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • osd fs: __r4w_get_page rely on PageUptodate for uptodate · 3066a967
      Committed by Hugh Dickins
      Commit 42cb14b1 ("mm: migrate dirty page without
      clear_page_dirty_for_io etc") simplified the migration of a PageDirty
      pagecache page: one stat needs moving from zone to zone and that's about
      all.
      
      It's convenient and safest for it to shift the PageDirty bit from old
      page to new, just before updating the zone stats: before copying data
      and marking the new PageUptodate.  This is all done while both pages are
      isolated and locked, just as before; and just as before, there's a
      moment when the new page is visible in the radix_tree, but not yet
      PageUptodate.  What's new is that it may now be briefly visible as
      PageDirty before it is PageUptodate.
      
      When I scoured the tree to see if this could cause a problem anywhere,
      the only places I found were in two similar functions __r4w_get_page():
      which look up a page with find_get_page() (not using page lock), then
      claim it's uptodate if it's PageDirty or PageWriteback or PageUptodate.
      
      I'm not sure whether that was right before, but now it might be wrong
      (on rare occasions): only claim the page is uptodate if PageUptodate.
      Or perhaps the page in question could never be migratable anyway?
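      
      The tightened check reduces to trusting only the uptodate flag; a sketch:
      
      	/*
      	 * With migration able to briefly expose a PageDirty page that is
      	 * not yet PageUptodate, dirty/writeback no longer imply readable
      	 * contents, so report uptodate only when PageUptodate says so.
      	 */
      	*uptodate = PageUptodate(page);
      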
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Tested-by: Boaz Harrosh <ooo@electrozaur.com>
      Cc: Benny Halevy <bhalevy@panasas.com>
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 10 Dec 2015, 3 commits
    • btrfs: fix misleading warning when space cache failed to load · 94356889
      Committed by Holger Hoffstätte
      When an inconsistent space cache is detected during loading we log a
      warning that users frequently mistake as instruction to invalidate the
      cache manually, even though this is not required. Fix the message to
      indicate that the cache will be rebuilt automatically.
      Signed-off-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com>
      Acked-by: Filipe Manana <fdmanana@suse.com>
    • Btrfs: fix transaction handle leak in balance · 8a7d656f
      Committed by Filipe Manana
      If we fail to allocate a new data chunk, we were jumping to the error path
      without releasing the transaction handle we got before. Fix this by always
      releasing it before doing the jump.
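      
      A sketch of the corrected error path (allocate_data_chunk() is a
      hypothetical stand-in for the chunk allocation call in balance):
      
      	trans = btrfs_start_transaction(chunk_root, 0);
      	if (IS_ERR(trans)) {
      		ret = PTR_ERR(trans);
      		goto error;
      	}
      
      	ret = allocate_data_chunk(trans, chunk_root);
      	if (ret < 0) {
      		/* release the handle we hold before jumping out */
      		btrfs_end_transaction(trans, chunk_root);
      		goto error;
      	}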
      
      Fixes: 2c9fe835 ("btrfs: Fix lost-data-profile caused by balance bg")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
    • Btrfs: fix unprotected list move from unused_bgs to deleted_bgs list · 348a0013
      Committed by Filipe Manana
      As of my previous change titled "Btrfs: fix scrub preventing unused block
      groups from being deleted", the following warning at
      extent-tree.c:btrfs_delete_unused_bgs() can be hit when we mount a
      filesystem with "-o discard":
      
       10263  void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
       10264  {
       (...)
       10405                  if (trimming) {
       10406                          WARN_ON(!list_empty(&block_group->bg_list));
       10407                          spin_lock(&trans->transaction->deleted_bgs_lock);
       10408                          list_move(&block_group->bg_list,
       10409                                    &trans->transaction->deleted_bgs);
       10410                          spin_unlock(&trans->transaction->deleted_bgs_lock);
       10411                          btrfs_get_block_group(block_group);
       10412                  }
       (...)
      
      This happens because scrub can now add back the block group to the list of
      unused block groups (fs_info->unused_bgs). This is dangerous because we
      are moving the block group from the unused block groups list to the list
      of deleted block groups without holding the lock that protects the source
      list (fs_info->unused_bgs_lock).
      
      The following diagram illustrates how this happens:
      
                  CPU 1                                     CPU 2
      
       cleaner_kthread()
         btrfs_delete_unused_bgs()
      
           sees bg X in list
            fs_info->unused_bgs
      
           deletes bg X from list
            fs_info->unused_bgs
      
                                                  scrub_enumerate_chunks()
      
                                                    searches device tree using
                                                    its commit root
      
                                                    finds device extent for
                                                    block group X
      
                                                    gets block group X from the tree
                                                    fs_info->block_group_cache_tree
                                                    (via btrfs_lookup_block_group())
      
                                                    sets bg X to RO (again)
      
                                                    scrub_chunk(bg X)
      
                                                    sets bg X back to RW mode
      
                                                    adds bg X to the list
                                                    fs_info->unused_bgs again,
                                                    since it's still unused and
                                                    currently not in that list
      
           sets bg X to RO mode
      
           btrfs_remove_chunk(bg X)
      
           --> discard is enabled and bg X
               is in the fs_info->unused_bgs
               list again so the warning is
               triggered
           --> we move it from that list into
               the transaction's delete_bgs
               list, but we can have another
               task currently manipulating
               the first list (fs_info->unused_bgs)
      
      Fix this by using the same lock (fs_info->unused_bgs_lock) to protect both
      the list of unused block groups and the list of deleted block groups. This
      makes it safe and there's not much worry for more lock contention, as this
      lock is seldom used and only the cleaner kthread adds elements to the list
      of deleted block groups. The warning goes away too, as this was previously
      an impossible case (and would have been better as a BUG_ON/ASSERT) but it's
      not impossible anymore.
      Reproduced with fstest btrfs/073 (using MOUNT_OPTIONS="-o discard").
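      
      The fix, as described, reuses the lock of the source list for the move;
      roughly:
      
      	if (trimming) {
      		/*
      		 * unused_bgs_lock protects fs_info->unused_bgs, which the
      		 * block group may still sit on, so take it (rather than a
      		 * separate deleted_bgs_lock) while moving the block group
      		 * to the transaction's deleted_bgs list.
      		 */
      		spin_lock(&fs_info->unused_bgs_lock);
      		list_move(&block_group->bg_list,
      			  &trans->transaction->deleted_bgs);
      		spin_unlock(&fs_info->unused_bgs_lock);
      		btrfs_get_block_group(block_group);
      	}
      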
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
  13. 09 Dec 2015, 1 commit