1. 03 April 2018 · 1 commit
      xfs: fix intent use-after-free on abort · 0612d116
      By Dave Chinner
      When an intent is aborted during its initial commit through
      xfs_defer_trans_abort(), there is a use-after-free. The current
      report is for an RUI through this path in generic/388:
      
       Freed by task 6274:
        __kasan_slab_free+0x136/0x180
        kmem_cache_free+0xe7/0x4b0
        xfs_trans_free_items+0x198/0x2e0
        __xfs_trans_commit+0x27f/0xcc0
        xfs_trans_roll+0x17b/0x2a0
        xfs_defer_trans_roll+0x6ad/0xe60
        xfs_defer_finish+0x2a6/0x2140
        xfs_alloc_file_space+0x53a/0xf90
        xfs_file_fallocate+0x5c6/0xac0
        vfs_fallocate+0x2f5/0x930
        ioctl_preallocate+0x1dc/0x320
        do_vfs_ioctl+0xfe4/0x1690
      
      The problem is that the RUI has two active references - one in the
      current transaction, and another held by the defer_ops structure
      that is passed to the RUD (intent done) so that both the intent and
      the intent done structures are freed on commit of the intent done.
      
      Hence during abort, we need to release the intent item, because the
      defer_ops reference is released separately via ->abort_intent
      callback. Fix all the intent code to do this correctly.
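
      The two-owner lifecycle described above can be modeled with plain
      reference counting. The sketch below is a toy model, not XFS code:
      intent_item, intent_alloc, intent_put, and abort_intent are all
      hypothetical names. Each owner drops only its own reference and the
      item is freed on the last drop, so neither the transaction commit
      nor the ->abort_intent path can touch freed memory.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model, not XFS code: an intent item is held by two owners, the
 * current transaction and the defer_ops structure.  All names here are
 * hypothetical. */
struct intent_item {
	int refcount;
};

struct intent_item *intent_alloc(void)
{
	struct intent_item *ip = malloc(sizeof(*ip));
	ip->refcount = 2;	/* one ref for the transaction, one for defer_ops */
	return ip;
}

/* Drop one reference; free only when the last owner lets go.
 * Returns 1 if the item was freed, 0 if it is still referenced. */
int intent_put(struct intent_item *ip)
{
	if (--ip->refcount == 0) {
		free(ip);
		return 1;
	}
	return 0;
}

/* Abort path: the transaction drops its reference, then the defer_ops
 * reference is dropped separately (modeling the ->abort_intent callback).
 * Neither drop alone frees the item, so neither side sees freed memory. */
void abort_intent(struct intent_item *ip)
{
	intent_put(ip);		/* transaction reference */
	intent_put(ip);		/* defer_ops reference */
}
```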
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  2. 23 February 2018 · 1 commit
  3. 28 November 2017 · 1 commit
      xfs: log recovery should replay deferred ops in order · 50995582
      By Darrick J. Wong
      As part of testing log recovery with dm_log_writes, Amir Goldstein
      discovered an error in the deferred ops recovery that led to corruption
      of the filesystem metadata if a reflink+rmap filesystem happened to shut
      down midway through a CoW remap:
      
      "This is what happens [after failed log recovery]:
      
      "Phase 1 - find and verify superblock...
      "Phase 2 - using internal log
      "        - zero log...
      "        - scan filesystem freespace and inode maps...
      "        - found root inode chunk
      "Phase 3 - for each AG...
      "        - scan (but don't clear) agi unlinked lists...
      "        - process known inodes and perform inode discovery...
      "        - agno = 0
      "data fork in regular inode 134 claims CoW block 376
      "correcting nextents for inode 134
      "bad data fork in inode 134
      "would have cleared inode 134"
      
      Hou Tao dissected the log contents of exactly such a crash:
      
      "According to the implementation of xfs_defer_finish(), these ops should
      be completed in the following sequence:
      
      "Have been done:
      "(1) CUI: Oper (160)
      "(2) BUI: Oper (161)
      "(3) CUD: Oper (194), for CUI Oper (160)
      "(4) RUI A: Oper (197), free rmap [0x155, 2, -9]
      
      "Should be done:
      "(5) BUD: for BUI Oper (161)
      "(6) RUI B: add rmap [0x155, 2, 137]
      "(7) RUD: for RUI A
      "(8) RUD: for RUI B
      
      "Actually be done by xlog_recover_process_intents()
      "(5) BUD: for BUI Oper (161)
      "(6) RUI B: add rmap [0x155, 2, 137]
      "(7) RUD: for RUI B
      "(8) RUD: for RUI A
      
      "So the rmap entry [0x155, 2, -9] for CoW should be freed first,
      then the new rmap entry [0x155, 2, 137] added. However, as we can see
      from the log record in post_mount.log (generated after umount) and the trace
      print, the new rmap entry [0x155, 2, 137] is added first, then the rmap
      entry [0x155, 2, -9] is freed."
      
      When reconstructing the internal log state from the log items found on
      disk, it's required that deferred ops replay in exactly the same order
      that they would have, had the filesystem not gone down.  However,
      replaying unfinished deferred ops can create /more/ deferred ops.  These
      new deferred ops are finished in the wrong order.  This causes fs
      corruption and replay crashes, so let's create a single defer_ops to
      handle the subsequent ops created during replay, then use one single
      transaction at the end of log recovery to ensure that everything is
      replayed in the order it is supposed to be.
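
      The ordering fix described above amounts to draining a single FIFO
      queue: ops created while replaying an op join the same queue behind
      everything already pending, so they finish in creation order. The
      sketch below is a toy model, not the XFS API; defer_add and drain
      are hypothetical names, and "op + 100" stands in for the follow-up
      op an intent can spawn during replay.

```c
#include <assert.h>

/* Toy model, not XFS code: finish deferred ops strictly in FIFO order.
 * Replaying one op may queue follow-up ops; because they join the same
 * queue behind everything already pending, they finish in creation order. */
#define MAXOPS 16
static int queue[MAXOPS];
static int head, tail;

void defer_add(int op)
{
	queue[tail++] = op;
}

/* Drain the queue into 'order'.  In this model, any op numbered below
 * 100 spawns one follow-up op (its number plus 100), mimicking an intent
 * that creates a new deferred op during replay.  Returns the count drained. */
int drain(int *order, int max)
{
	int n = 0;
	while (head < tail && n < max) {
		int op = queue[head++];
		order[n++] = op;
		if (op < 100)
			defer_add(op + 100);
	}
	return n;
}
```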
      Reported-by: Amir Goldstein <amir73il@gmail.com>
      Analyzed-by: Hou Tao <houtao1@huawei.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  4. 02 September 2017 · 1 commit
  5. 26 April 2017 · 1 commit
  6. 04 January 2017 · 1 commit
  7. 06 October 2016 · 1 commit
      xfs: store in-progress CoW allocations in the refcount btree · 174edb0e
      By Darrick J. Wong
      Due to the way the CoW algorithm in XFS works, there's an interval
      during which blocks allocated to handle a CoW can be lost -- if the FS
      goes down after the blocks are allocated but before the block
      remapping takes place.  This is exacerbated by the cowextsz hint --
      allocated reservations can sit around for a while, waiting to get
      used.
      
      Since the refcount btree doesn't normally store records with refcount
      of 1, we can use it to record these in-progress extents.  In-progress
      blocks cannot be shared because they're not user-visible, so there
      shouldn't be any conflicts with other programs.  This is a better
      solution than holding EFIs during writeback because (a) EFIs can't be
      relogged currently, (b) even if they could, EFIs are bound by
      available log space, which puts an unnecessary upper bound on how much
      CoW we can have in flight, and (c) we already have a mechanism to
      track blocks.
      
      At mount time, read the refcount records and free anything we find
      with a refcount of 1 because those were in-progress when the FS went
      down.
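
      The mount-time cleanup described above is a single scan: any record
      whose refcount is exactly 1 is, by construction, an in-progress CoW
      staging extent and can be reclaimed. A toy sketch, not XFS code,
      under assumed names (struct rec and reclaim_cow_staging are
      hypothetical):

```c
#include <assert.h>

/* Toy model, not XFS code: refcount records with refcount == 1 mark
 * in-progress CoW staging extents (shared extents always carry >= 2).
 * At mount, reclaim exactly those records. */
struct rec {
	unsigned long start;	/* starting block of the extent */
	int refcount;
};

/* Scan the records, "free" every in-progress CoW extent by zeroing its
 * refcount, and return how many extents were reclaimed. */
int reclaim_cow_staging(struct rec *recs, int nrecs)
{
	int freed = 0;
	for (int i = 0; i < nrecs; i++) {
		if (recs[i].refcount == 1) {
			recs[i].refcount = 0;
			freed++;
		}
	}
	return freed;
}
```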
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  8. 04 October 2016 · 3 commits