1. 06 Jan, 2023 1 commit
    • xfs: fix super block buf log item UAF during force shutdown · 766ae6eb
      Committed by Guo Xuenan
      mainline inclusion
      from mainline-v6.1-rc4
      commit 575689fc
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=575689fc0ffa6c4bb4e72fd18e31a6525a6124e0
      
      --------------------------------
      
      An XFS log I/O error triggers an xlog shutdown, and the end_io worker
      calls xlog_state_shutdown_callbacks to unpin and release the buf log
      item. The race is that a thread committing a transaction may not be
      intercepted by the xlog_is_shutdown check, so its log items are still
      inserted into the CIL. Once the shutdown path has unpinned and released
      those buf log items, any later access to them is a use-after-free (UAF).
      Adding a delay before `xlog_cil_commit` increases the probability of
      reproducing the problem.
      
      The following call graph shows the sequence that actually hit this race.
      fsstress                    io end worker kworker/0:1H-216
                                  xlog_ioend_work
                                    ->xlog_force_shutdown
                                      ->xlog_state_shutdown_callbacks
                                        ->xlog_cil_process_committed
                                          ->xlog_cil_committed
                                            ->xfs_trans_committed_bulk
      ->xfs_trans_apply_sb_deltas             ->li_ops->iop_unpin(lip, 1);
        ->xfs_trans_getsb
          ->_xfs_trans_bjoin
            ->xfs_buf_item_init
              ->if (bip) { return 0;} //relog
      ->xlog_cil_commit
        ->xlog_cil_insert_items //insert into CIL
                                                 ->xfs_buf_ioend_fail(bp);
                                                   ->xfs_buf_ioend
                                                     ->xfs_buf_item_done
                                                       ->xfs_buf_item_relse
                                                         ->xfs_buf_item_free
      
      The UAF occurs when the CIL push worker gathers the per-CPU CIL and
      inserts the super block buf log item into ctx->log_items.
      
      ==================================================================
      BUG: KASAN: use-after-free in xlog_cil_push_work+0x1c8f/0x22f0
      Write of size 8 at addr ffff88801800f3f0 by task kworker/u4:4/105
      
      CPU: 0 PID: 105 Comm: kworker/u4:4 Tainted: G W
      6.1.0-rc1-00001-g274115149b42 #136
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
      1.13.0-1ubuntu1.1 04/01/2014
      Workqueue: xfs-cil/sda xlog_cil_push_work
      Call Trace:
       <TASK>
       dump_stack_lvl+0x4d/0x66
       print_report+0x171/0x4a6
       kasan_report+0xb3/0x130
       xlog_cil_push_work+0x1c8f/0x22f0
       process_one_work+0x6f9/0xf70
       worker_thread+0x578/0xf30
       kthread+0x28c/0x330
       ret_from_fork+0x1f/0x30
       </TASK>
      
      Allocated by task 2145:
       kasan_save_stack+0x1e/0x40
       kasan_set_track+0x21/0x30
       __kasan_slab_alloc+0x54/0x60
       kmem_cache_alloc+0x14a/0x510
       xfs_buf_item_init+0x160/0x6d0
       _xfs_trans_bjoin+0x7f/0x2e0
       xfs_trans_getsb+0xb6/0x3f0
       xfs_trans_apply_sb_deltas+0x1f/0x8c0
       __xfs_trans_commit+0xa25/0xe10
       xfs_symlink+0xe23/0x1660
       xfs_vn_symlink+0x157/0x280
       vfs_symlink+0x491/0x790
       do_symlinkat+0x128/0x220
       __x64_sys_symlink+0x7a/0x90
       do_syscall_64+0x35/0x80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Freed by task 216:
       kasan_save_stack+0x1e/0x40
       kasan_set_track+0x21/0x30
       kasan_save_free_info+0x2a/0x40
       __kasan_slab_free+0x105/0x1a0
       kmem_cache_free+0xb6/0x460
       xfs_buf_ioend+0x1e9/0x11f0
       xfs_buf_item_unpin+0x3d6/0x840
       xfs_trans_committed_bulk+0x4c2/0x7c0
       xlog_cil_committed+0xab6/0xfb0
       xlog_cil_process_committed+0x117/0x1e0
       xlog_state_shutdown_callbacks+0x208/0x440
       xlog_force_shutdown+0x1b3/0x3a0
       xlog_ioend_work+0xef/0x1d0
       process_one_work+0x6f9/0xf70
       worker_thread+0x578/0xf30
       kthread+0x28c/0x330
       ret_from_fork+0x1f/0x30
      
      The buggy address belongs to the object at ffff88801800f388
       which belongs to the cache xfs_buf_item of size 272
      The buggy address is located 104 bytes inside of
       272-byte region [ffff88801800f388, ffff88801800f498)
      
      The buggy address belongs to the physical page:
      page:ffffea0000600380 refcount:1 mapcount:0 mapping:0000000000000000
      index:0xffff88801800f208 pfn:0x1800e
      head:ffffea0000600380 order:1 compound_mapcount:0 compound_pincount:0
      flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
      raw: 001fffff80010200 ffffea0000699788 ffff88801319db50 ffff88800fb50640
      raw: ffff88801800f208 000000000015000a 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff88801800f280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff88801800f300: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc
      >ffff88801800f380: fc fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                                   ^
       ffff88801800f400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff88801800f480: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc
      ==================================================================
      Disabling lock debugging due to kernel taint
      Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      (cherry picked from commit 5a5e896a)
  2. 09 Mar, 2022 2 commits
    • xfs: remove dead stale buf unpin handling code · 55619cc7
      Committed by Brian Foster
      mainline inclusion
      from mainline-v5.13-rc4
      commit e53d3aa0
      category: bugfix
      bugzilla: 185862 https://gitee.com/openeuler/kernel/issues/I4KIAO
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e53d3aa0b605c49d780e1b2fd0b49dba4154f32b
      
      --------------------------------
      
      This code goes back to a time when transaction commits wrote
      directly to iclogs. The associated log items were pinned, written to
      the log, and then "uncommitted" if some part of the log write had
      failed. This uncommit sequence called an ->iop_unpin_remove()
      handler that was eventually folded into ->iop_unpin() via the remove
      parameter. The log subsystem has since changed significantly in that
      transactions commit to the CIL instead of direct to iclogs, though
      log items must still be aborted in the event of an eventual log I/O
      error. However, the context for a log item abort is now asynchronous
      from transaction commit, which means the committing transaction has
      been freed by this point in time and the transaction uncommit
      sequence of events is no longer relevant.
      
      Further, since stale buffers remain locked at transaction commit
      through unpin, we can be certain that the buffer is not associated
      with any transaction when the unpin callback executes. Remove this
      unused hunk of code and replace it with an assertion that the buffer
      is disassociated from transaction context.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
      Reviewed-by: Lihong Kou <koulihong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • xfs: hold buffer across unpin and potential shutdown processing · baa590d3
      Committed by Brian Foster
      mainline inclusion
      from mainline-v5.13-rc4
      commit 84d8949e
      category: bugfix
      bugzilla: 185862 https://gitee.com/openeuler/kernel/issues/I4KIAO
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=84d8949e770745b16a7e8a68dcb1d0f3687bdee9
      
      --------------------------------
      
      The special processing used to simulate a buffer I/O failure on fs
      shutdown has a difficult to reproduce race that can result in a use
      after free of the associated buffer. Consider a buffer that has been
      committed to the on-disk log and thus is AIL resident. The buffer
      lands on the writeback delwri queue, but is subsequently locked,
      committed and pinned by another transaction before submitted for
      I/O. At this point, the buffer is stuck on the delwri queue as it
      cannot be submitted for I/O until it is unpinned. A log checkpoint
      I/O failure occurs sometime later, which aborts the bli. The unpin
      handler is called with the aborted log item, drops the bli reference
      count, the pin count, and falls into the I/O failure simulation
      path.
      
      The potential problem here is that once the pin count falls to zero
      in ->iop_unpin(), xfsaild is free to retry delwri submission of the
      buffer at any time, before the unpin handler even completes. If
      delwri queue submission wins the race to the buffer lock, it
      observes the shutdown state and simulates the I/O failure itself.
      This releases both the bli and delwri queue holds and frees the
      buffer while xfs_buf_item_unpin() sits on xfs_buf_lock() waiting to
      run through the same failure sequence. This problem is rare and
      requires many iterations of fstest generic/019 (which simulates disk
      I/O failures) to reproduce.
      
      To avoid this problem, grab a hold on the buffer before the log item
      is unpinned if the associated item has been aborted and will require
      a simulated I/O failure. The hold is already required for the
      simulated I/O failure, so the ordering simply guarantees the unpin
      handler access to the buffer before it is unpinned and thus
      processed by the AIL. This particular ordering is required so long
      as the AIL does not acquire a reference on the bli, which is the
      long term solution to this problem.
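
      The ordering described above can be illustrated with a small userspace
      model. This is only a sketch under stated assumptions: `buf_model`,
      `buf_hold`, `buf_rele`, `unpin_aborted` and `racer_drop_all` are
      hypothetical names invented for illustration, not the kernel's
      functions, and the "racer" callback merely stands in for the delwri
      submission path running in the window after the pin count reaches zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a buffer: hold (reference) count, pin count,
 * and a flag recording that the last hold was dropped. */
struct buf_model {
    int  hold_count;
    int  pin_count;
    bool freed;
};

static void buf_hold(struct buf_model *bp)
{
    assert(!bp->freed);
    bp->hold_count++;
}

static void buf_rele(struct buf_model *bp)
{
    assert(!bp->freed && bp->hold_count > 0);
    if (--bp->hold_count == 0)
        bp->freed = true;       /* last hold dropped: buffer is gone */
}

typedef void (*race_fn)(struct buf_model *);

/* Models the competing path that, per the commit message, releases
 * both the bli hold and the delwri queue hold. */
static void racer_drop_all(struct buf_model *bp)
{
    buf_rele(bp);
    buf_rele(bp);
}

/* Returns true if the unpin path could still safely touch the buffer.
 * hold_first selects the ordering: take the hold before dropping the
 * pin count (the fixed ordering) or only afterwards (the racy one). */
static bool unpin_aborted(struct buf_model *bp, bool hold_first, race_fn racer)
{
    if (hold_first)
        buf_hold(bp);

    bp->pin_count--;
    if (bp->pin_count == 0 && racer != NULL)
        racer(bp);              /* window in which the other path may run */

    if (bp->freed)
        return false;           /* touching bp now would be a use-after-free */

    if (!hold_first)
        buf_hold(bp);

    buf_rele(bp);               /* simulated I/O failure consumes the hold */
    return true;
}
```

      With two holds outstanding and one pin, taking the hold first leaves
      one reference alive after the competing path runs, so the unpin path
      performs the final release itself. Taking it afterwards finds the
      buffer already freed.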
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
      Reviewed-by: Lihong Kou <koulihong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
  3. 27 Dec, 2021 2 commits
    • xfs: xfs_log_force_lsn isn't passed a LSN · e100c699
      Committed by Dave Chinner
      mainline-inclusion
      from mainline-v5.13-rc4
      commit 5f9b4b0d
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5f9b4b0de8dc2fb8eb655463b438001c111570fe
      
      -------------------------------------------------
      
      In doing an investigation into AIL push stalls, I was looking at the
      log force code to see if an async CIL push could be done instead.
      This led me to xfs_log_force_lsn() and looking at how it works.
      
      xfs_log_force_lsn() is only called from inode synchronisation
      contexts such as fsync(), and it takes the ip->i_itemp->ili_last_lsn
      value as the LSN to sync the log to. This gets passed to
      xlog_cil_force_lsn() via xfs_log_force_lsn() to flush the CIL to the
      journal, and then used by xfs_log_force_lsn() to flush the iclogs to
      the journal.
      
      The problem is that ip->i_itemp->ili_last_lsn does not store a
      log sequence number. What it stores is passed to it from the
      ->iop_committing method, which is called by xfs_log_commit_cil().
      The value this passes to the iop_committing method is the CIL
      context sequence number that the item was committed to.
      
      As it turns out, xlog_cil_force_lsn() converts the sequence to an
      actual commit LSN for the related context and returns that to
      xfs_log_force_lsn(). xfs_log_force_lsn() overwrites its "lsn"
      variable that contained a sequence with an actual LSN and then uses
      that to sync the iclogs.
      
      This caused me some confusion for a while, even though I originally
      wrote all this code a decade ago. ->iop_committing is only used by
      a couple of log item types, and only inode items use the sequence
      number it is passed.
      
      Let's clean up the API, CIL structures and inode log item to call it
      a sequence number, and make it clear that the high level code is
      using CIL sequence numbers and not on-disk LSNs for integrity
      synchronisation purposes.
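
      The distinction between a CIL sequence number and an on-disk LSN can be
      sketched in userspace C by giving each its own type, so that passing
      one where the other is expected fails to compile. Everything here is a
      hypothetical illustration: `cil_seq_t`, `disk_lsn_t`, `ctx_model` and
      `seq_to_commit_lsn` are invented names, and the linear lookup only
      stands in for the sequence-to-commit-LSN conversion described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper types: distinct structs, so the compiler rejects
 * a CIL sequence number passed where an on-disk LSN is expected. */
typedef struct { uint64_t v; } cil_seq_t;
typedef struct { int64_t  v; } disk_lsn_t;

/* Hypothetical record of one CIL context: the sequence it was assigned
 * and the LSN its commit record ended up at in the journal. */
struct ctx_model {
    cil_seq_t  seq;
    disk_lsn_t commit_lsn;
};

/* Translate a CIL sequence into the commit LSN of the matching context.
 * Returns false if no context with that sequence is known. */
static bool seq_to_commit_lsn(const struct ctx_model *ctxs, size_t n,
                              cil_seq_t seq, disk_lsn_t *out)
{
    assert(ctxs != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (ctxs[i].seq.v == seq.v) {
            *out = ctxs[i].commit_lsn;
            return true;
        }
    }
    return false;
}
```

      A caller holding only a sequence number must go through the
      translation step before it has anything usable as an LSN, which is the
      separation the cleanup makes explicit.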
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
      Reviewed-by: Lihong Kou <koulihong@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • xfs: Fix CIL throttle hang when CIL space used going backwards · 59ab51ec
      Committed by Dave Chinner
      mainline-inclusion
      from mainline-v5.13-rc4
      commit 19f4e7cc
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=19f4e7cc819771812a7f527d7897c2deffbf7a00
      
      -------------------------------------------------
      
      A hang with tasks stuck on the CIL hard throttle was reported and
      largely diagnosed by Donald Buczek, who discovered that it was a
      result of the CIL context space usage decrementing in committed
      transactions once the hard throttle limit had been hit and processes
      were already blocked.  This resulted in the CIL push not waking up
      those waiters because the CIL context was no longer over the hard
      throttle limit.
      
      The surprising aspect of this was the CIL space usage going
      backwards regularly enough to trigger this situation. Assumptions
      had been made in design that the relogging process would only
      increase the size of the objects in the CIL, and so that space would
      only increase.
      
      This change and commit message fixes the issue and documents the
      result of an audit of the triggers that can cause the CIL space to
      go backwards, how large the backwards steps tend to be, the
      frequency in which they occur, and what the impact on the CIL
      accounting code is.
      
      Even though the CIL ctx->space_used can go backwards, it will only
      do so if the log item is already logged to the CIL and contains a
      space reservation for its entire logged state. This is tracked by
      the shadow buffer state on the log item. If the item is not
      previously logged in the CIL it has no shadow buffer nor log vector,
      and hence the entire size of the logged item copied to the log
      vector is accounted to the CIL space usage. i.e.  it will always go
      up in this case.
      
      If the item has a log vector (i.e. already in the CIL) and the size
      decreases, then the existing log vector will be overwritten and the
      space usage will go down. This is the only condition where the space
      usage reduces, and it can only occur when an item is already tracked
      in the CIL. Hence we are safe from CIL space usage underruns as a
      result of log items decreasing in size when they are relogged.
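
      The accounting rule in the two paragraphs above can be modelled in a
      few lines of userspace C. This is a simplified sketch, not the
      kernel's accounting code: `cil_space_model`, `item_model` and
      `cil_relog_item` are hypothetical names, and a `logged_size` of zero
      is an assumed stand-in for "no shadow buffer or log vector yet".

```c
#include <assert.h>

/* Hypothetical model of the CIL context's space accounting. */
struct cil_space_model {
    long space_used;
};

/* Hypothetical log item: logged_size is what the CIL currently accounts
 * for this item; 0 models an item not yet logged in the CIL. */
struct item_model {
    long logged_size;
};

/* Account a (re)logged item and return the change in CIL space usage.
 * A first-time item contributes its whole size (positive delta); a
 * relogged item contributes only the difference, which may be negative. */
static long cil_relog_item(struct cil_space_model *cil,
                           struct item_model *item, long new_size)
{
    long delta = new_size - item->logged_size;

    cil->space_used += delta;
    /* The item's old size was already counted, so shrinking it can
     * never push the total below zero. */
    assert(cil->space_used >= 0);

    item->logged_size = new_size;
    return delta;
}
```

      Because a negative delta is bounded by the size previously accounted
      for the same item, the total can step backwards but cannot underrun.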
      
      Typically this reduction in CIL usage occurs from metadata blocks
      being freed, such as when a btree block merge occurs or a directory
      entry/xattr entry is removed and the da-tree is reduced in size.
      This generally results in a reduction in size of around a single
      block in the CIL, but also tends to increase the number of log
      vectors because the parent and sibling nodes in the tree need to be
      updated when a btree block is removed. If a multi-level merge
      occurs, then we see reduction in size of 2+ blocks, but again the
      log vector count goes up.
      
      The other vector is inode fork size changes, which only log the
      current size of the fork and ignore the previously logged size when
      the fork is relogged. Hence if we are removing items from the inode
      fork (dir/xattr removal in shortform, extent record removal in
      extent form, etc) the relogged size of the inode fork can decrease.
      
      No other log items can decrease in size either because they are a
      fixed size (e.g. dquots) or they cannot be relogged (e.g. relogging
      an intent actually creates a new intent log item and doesn't relog
      the old item at all.) Hence the only two vectors for CIL context
      size reduction are relogging inode forks and marking buffers active
      in the CIL as stale.
      
      Long story short: the majority of the code does the right thing and
      handles the reduction in log item size correctly, and only the CIL
      hard throttle implementation is problematic and needs fixing. This
      patch makes that fix, as well as adds comments in the log item code
      that result in items shrinking in size when they are relogged as a
      clear reminder that this can and does happen frequently.
      
      The throttle fix is based upon the change Donald proposed, though it
      goes further to ensure that once the throttle is activated, it
      captures all tasks until the CIL push issues a wakeup, regardless of
      whether the CIL space used has gone back under the throttle
      threshold.
      
      This ensures that we prevent tasks reducing the CIL slightly under
      the throttle threshold and then making more changes that push it
      well over the throttle limit. This is achieved by checking if the
      throttle wait queue is already active as a condition of throttling.
      Hence once we start throttling, we continue to apply the throttle
      until the CIL context push wakes everything on the wait queue.
      
      We can use waitqueue_active() for the waitqueue manipulations and
      checks as they are all done under the ctx->xc_push_lock. Hence the
      waitqueue has external serialisation and we can safely peek inside
      the wait queue without holding the internal waitqueue locks.
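
      The "sticky" throttle condition described above can be sketched as a
      userspace model. The names `cil_throttle_model`,
      `cil_should_throttle`, `cil_commit` and `cil_push_wake_all` are
      hypothetical, a plain waiter counter stands in for the wait queue, and
      no locking is shown; the text above states that the real checks are
      serialised by ctx->xc_push_lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the CIL throttle state. */
struct cil_throttle_model {
    long   space_used;   /* space accounted to the current CIL context */
    long   hard_limit;   /* hard (blocking) throttle threshold */
    size_t waiters;      /* tasks parked on the throttle wait queue */
};

/* Throttle if over the hard limit OR if anyone is already waiting.
 * The second term keeps the throttle engaged even when space_used has
 * dropped back under the limit. */
static bool cil_should_throttle(const struct cil_throttle_model *cil)
{
    return cil->space_used >= cil->hard_limit || cil->waiters > 0;
}

/* Account a commit (delta may be negative on relog) and report whether
 * the committing task had to join the wait queue. */
static bool cil_commit(struct cil_throttle_model *cil, long delta)
{
    cil->space_used += delta;
    if (cil_should_throttle(cil)) {
        cil->waiters++;
        return true;
    }
    return false;
}

/* The push starts a new context and wakes every waiter; returns the
 * number of tasks released. */
static size_t cil_push_wake_all(struct cil_throttle_model *cil)
{
    assert(cil != NULL);
    size_t woken = cil->waiters;

    cil->waiters = 0;
    cil->space_used = 0;
    return woken;
}
```

      In this model a commit that shrinks the CIL back under the limit is
      still throttled while earlier waiters are queued, and only the push
      wakeup clears the condition.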
      
      Many thanks to Donald for his diagnostic and analysis work to
      isolate the cause of this hang.
      Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
      Reviewed-by: Lihong Kou <koulihong@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
  4. 16 Sep, 2020 4 commits
  5. 05 Aug, 2020 1 commit
  6. 29 Jul, 2020 1 commit
  7. 14 Jul, 2020 1 commit
  8. 07 Jul, 2020 12 commits
  9. 07 May, 2020 6 commits
  10. 19 Mar, 2020 1 commit
  11. 27 Jan, 2020 1 commit
  12. 17 Jan, 2020 3 commits
  13. 19 Dec, 2019 1 commit
  14. 19 Nov, 2019 1 commit
  15. 08 Nov, 2019 1 commit
  16. 27 Aug, 2019 1 commit
  17. 29 Jun, 2019 1 commit