1. 02 Dec, 2021 9 commits
    • ACPI: EC: Make the event work state machine visible · c33676aa
      Rafael J. Wysocki committed
      The EC driver uses a relatively simple state machine for the event
      work handling, but it is not really straightforward to figure out.
      
      The states are as follows:
      
       "Ready": The event handling work can be submitted.
      
        In this state, the EC_FLAGS_QUERY_PENDING flag is clear.
      
       "In progress": The event handling work is pending or is being
                      processed.  It cannot be submitted again.
      
        In this state, the EC_FLAGS_QUERY_PENDING flag is set, the
        events_to_process count is nonzero, and the EC_FLAGS_QUERY_GUARDING
        flag is clear.
      
       "Complete": The event handling work has been completed, but it still
                   cannot be submitted again.
      
        In this state, the EC_FLAGS_QUERY_PENDING flag is set and the
        events_to_process count is zero or the EC_FLAGS_QUERY_GUARDING
        flag is set.
      
      The state changes from "Ready" to "In progress" when a new event is
      detected by advance_transaction() and acpi_ec_submit_event() is
      called by it.
      
      Next, the state can change from "In progress" directly to "Ready" in
      the following situations:
      
       * ec_event_clearing is ACPI_EC_EVT_TIMING_STATUS and the state of
         an ACPI_EC_COMMAND_QUERY transaction becomes ACPI_EC_COMMAND_POLL.
      
       * ec_event_clearing is ACPI_EC_EVT_TIMING_QUERY and the state of
         an ACPI_EC_COMMAND_QUERY transaction becomes
         ACPI_EC_COMMAND_COMPLETE.
      
       * ec_event_clearing is either ACPI_EC_EVT_TIMING_STATUS or
         ACPI_EC_EVT_TIMING_QUERY and there are no more events to
         process (i.e. ec->events_to_process becomes 0).
      
      If ec_event_clearing is ACPI_EC_EVT_TIMING_EVENT, however, the
      state must change from "In progress" to "Complete" before it
      can change to "Ready".  The changes from "In progress" to
      "Complete" in that case occur in the following situations:
      
       * The state of an ACPI_EC_COMMAND_QUERY transaction becomes
         ACPI_EC_COMMAND_COMPLETE.
      
       * There are no more events to process (i.e. ec->events_to_process
         becomes 0).
      
      Finally, the state changes from "Complete" to "Ready" when
      advance_transaction() is invoked in the "Complete" state and
      the state of the current transaction is not ACPI_EC_COMMAND_POLL.
      
      To make this state machine visible in the code, add a new
      event_state field to struct acpi_ec and modify the code to use
      it instead of the EC_FLAGS_QUERY_PENDING and EC_FLAGS_QUERY_GUARDING
      flags.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
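The transitions listed in the commit message above can be modelled in a few lines of C. This is only a sketch: every identifier here (`ec_model`, `submit_event()`, `query_step_reached()`, `event_processed()`, `advance()` and the enum constants) is a hypothetical stand-in chosen for illustration, and the code is not the driver's implementation.

```c
#include <stdbool.h>

/* All identifiers below are hypothetical stand-ins, not the driver's own. */
enum event_state { EVENT_READY, EVENT_IN_PROGRESS, EVENT_COMPLETE };
enum clearing_mode { TIMING_STATUS, TIMING_QUERY, TIMING_EVENT };
enum query_step { STEP_OTHER, STEP_POLL, STEP_COMPLETE };

struct ec_model {
	enum event_state state;
	enum clearing_mode clearing;
	unsigned int events_to_process;
};

/* "Ready" -> "In progress": a new event was detected. */
static bool submit_event(struct ec_model *ec)
{
	if (ec->state != EVENT_READY)
		return false;	/* work is pending or not closed yet */
	ec->state = EVENT_IN_PROGRESS;
	ec->events_to_process++;
	return true;
}

/* A query transaction reached the given step. */
static void query_step_reached(struct ec_model *ec, enum query_step step)
{
	if (ec->state != EVENT_IN_PROGRESS)
		return;
	if (ec->clearing == TIMING_STATUS && step == STEP_POLL)
		ec->state = EVENT_READY;
	else if (ec->clearing == TIMING_QUERY && step == STEP_COMPLETE)
		ec->state = EVENT_READY;
	else if (ec->clearing == TIMING_EVENT && step == STEP_COMPLETE)
		ec->state = EVENT_COMPLETE;
}

/* One event was handled; change state when the count drops to zero. */
static void event_processed(struct ec_model *ec)
{
	if (ec->events_to_process == 0 || --ec->events_to_process > 0)
		return;
	if (ec->state != EVENT_IN_PROGRESS)
		return;
	ec->state = ec->clearing == TIMING_EVENT ? EVENT_COMPLETE : EVENT_READY;
}

/* "Complete" -> "Ready": a transaction step that is not a poll. */
static void advance(struct ec_model *ec, enum query_step current)
{
	if (ec->state == EVENT_COMPLETE && current != STEP_POLL)
		ec->state = EVENT_READY;
}
```

Keeping the state in one field makes each transition a single assignment, which is the readability gain the commit message argues for over combining two flags and a counter.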
    • ACPI: EC: Avoid queuing unnecessary work in acpi_ec_submit_event() · c793570d
      Rafael J. Wysocki committed
      Notice that it is not necessary to queue up the event work again
      if the while () loop in acpi_ec_event_handler() is still running,
      which is the case if nr_pending_queries is greater than 0 at the
      beginning of acpi_ec_submit_event(), and modify the code to avoid
      doing that.
      
      While at it, rename nr_pending_queries in struct acpi_ec to
      events_to_process which actually matches the role of that field
      and change its data type to unsigned int which is sufficient.
      
      No expected functional impact.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
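The idea of queuing the work only when the handler loop is not already running can be sketched as follows. The names `ec_queue_model`, `submit_event_once()` and `event_handler()` are hypothetical, and the `work_queued` counter merely stands in for a call that would queue a work item; this is a simplified model, not the driver code.

```c
/* Hypothetical model: work_queued counts would-be work submissions. */
struct ec_queue_model {
	unsigned int events_to_process;
	unsigned int work_queued;
};

static void submit_event_once(struct ec_queue_model *ec)
{
	/* A nonzero count means the handler loop has not finished yet. */
	if (ec->events_to_process++ > 0)
		return;
	ec->work_queued++;
}

/* The handler drains every pending event in a single run. */
static void event_handler(struct ec_queue_model *ec)
{
	while (ec->events_to_process > 0)
		ec->events_to_process--;
}
```

Three back-to-back submissions in this model queue the work once; the running loop picks up the other two events through the counter.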
    • ACPI: EC: Rename three functions · eafe7509
      Rafael J. Wysocki committed
      Rename acpi_ec_submit_query() to acpi_ec_submit_event(),
      acpi_ec_query() to acpi_ec_submit_query(), and
      acpi_ec_complete_query() to acpi_ec_close_event() to make
      the names reflect what the functions do.
      
      No expected functional impact.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • ACPI: EC: Simplify locking in acpi_ec_event_handler() · a105acd7
      Rafael J. Wysocki committed
      Because acpi_ec_event_handler() is a work function, it always
      runs in process context with interrupts enabled, so it can use
      spin_lock_irq() and spin_unlock_irq() for the locking.
      
      Make it do so and adjust white space around those calls.
      
      No expected functional impact.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • ACPI: EC: Rearrange the loop in acpi_ec_event_handler() · 388fb77d
      Rafael J. Wysocki committed
      It is not necessary to check ec->nr_pending_queries against 0 in the
      while () loop in acpi_ec_event_handler(), because that loop terminates
      when ec->nr_pending_queries is 0 and the code depending on that can be
      run after the loop has ended.
      
      Modify the code accordingly and while at it rewrite the comment
      regarding that code to make it clearer.
      
      No intentional functional impact.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • ACPI: EC: Fold acpi_ec_check_event() into acpi_ec_event_handler() · 98d36450
      Rafael J. Wysocki committed
      Because acpi_ec_event_handler() is the only caller of
      acpi_ec_check_event() and the separation of these two functions
      makes it harder to follow the code flow, fold the latter into the
      former (and simplify that code while at it).
      
      No expected functional impact.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • ACPI: EC: Pass one argument to acpi_ec_query() · 1f235044
      Rafael J. Wysocki committed
      Notice that the second argument to acpi_ec_query() is redundant,
      because in the only case when it is not NULL, the value passed
      through it is only checked against 0 and it can only be 0 when
      acpi_ec_query() returns an error code, but its return value
      is checked along with the value passed through its second
      argument.
      
      Accordingly, modify acpi_ec_query() to take only one argument
      and while at it, change its handling of the case when
      acpi_ec_transaction() returns an error so as to return that
      error value to the caller right away.
      
      No expected functional impact.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • ACPI: EC: Call advance_transaction() from acpi_ec_dispatch_gpe() · ca8283dc
      Rafael J. Wysocki committed
      Calling acpi_dispatch_gpe() from acpi_ec_dispatch_gpe() is generally
      problematic, because it may cause the spurious interrupt handling in
      advance_transaction() to trigger in theory.
      
      However, instead of calling acpi_dispatch_gpe() to dispatch the EC
      GPE, acpi_ec_dispatch_gpe() can call advance_transaction() directly
      on first_ec and it can pass 'false' as its second argument to indicate
      calling it from process context.
      
      Moreover, if advance_transaction() is modified to return a bool value
      indicating whether or not the EC work needs to be flushed, it can be
      used to avoid unnecessary EC work flushing in acpi_ec_dispatch_gpe(),
      so change the code accordingly.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • ACPI: EC: Rework flushing of EC work while suspended to idle · 4a9af6ca
      Rafael J. Wysocki committed
      The flushing of pending work in the EC driver uses drain_workqueue()
      to flush the event handling work that can requeue itself via
      advance_transaction(), but this is problematic, because that
      work may also be requeued from the query workqueue.
      
      Namely, if an EC transaction is carried out during the execution of
      a query handler, it involves calling advance_transaction() which
      may queue up the event handling work again.  This causes the kernel
      to complain about attempts to add a work item to the EC event
      workqueue while it is being drained and worst-case it may cause a
      valid event to be skipped.
      
      To avoid this problem, introduce two new counters, events_in_progress
      and queries_in_progress, incremented when a work item is queued on
      the event workqueue or the query workqueue, respectively, and
      decremented at the end of the corresponding work function, and make
      acpi_ec_dispatch_gpe() flush the workqueues in a loop until both of
      these counters are zero (or system wakeup is pending) instead of
      calling acpi_ec_flush_work().
      
      At the same time, change __acpi_ec_flush_work() to call
      flush_workqueue() instead of drain_workqueue() to flush the event
      workqueue.
      
      While at it, use the observation that the work item queued in
      acpi_ec_query() cannot be pending at that time, because it is used
      only once, to simplify the code in there.
      
      Additionally, clean up a comment in acpi_ec_query() and adjust white
      space in acpi_ec_event_processor().
      
      Fixes: f0ac20c3 ("ACPI: EC: Fix flushing of pending work")
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
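The counter-based flushing described in the commit message above can be illustrated with a single-threaded model. Everything here is an assumption made for the sketch: `ec_flush_model`, `run_event_work()`, `run_query_work()` and `flush_until_idle()` are hypothetical names, the two "run" helpers stand in for flushing the event and query workqueues, and `requeues_left` simulates a query handler whose EC transaction queues the event work again.

```c
#include <stdbool.h>

/* Hypothetical single-threaded model of the two in-progress counters. */
struct ec_flush_model {
	unsigned int events_in_progress;
	unsigned int queries_in_progress;
	unsigned int requeues_left;	/* simulated requeues from query handlers */
	bool wakeup_pending;
};

/* Stand-in for flushing the event workqueue: each event queues one query. */
static void run_event_work(struct ec_flush_model *ec)
{
	while (ec->events_in_progress > 0) {
		ec->queries_in_progress++;
		ec->events_in_progress--;
	}
}

/* Stand-in for flushing the query workqueue: a query may requeue event work. */
static void run_query_work(struct ec_flush_model *ec)
{
	while (ec->queries_in_progress > 0) {
		if (ec->requeues_left > 0) {
			ec->requeues_left--;
			ec->events_in_progress++;
		}
		ec->queries_in_progress--;
	}
}

/* Flush in a loop until both counters are zero or a wakeup is pending. */
static unsigned int flush_until_idle(struct ec_flush_model *ec)
{
	unsigned int passes = 0;

	while ((ec->events_in_progress > 0 || ec->queries_in_progress > 0) &&
	       !ec->wakeup_pending) {
		run_event_work(ec);
		run_query_work(ec);
		passes++;
	}
	return passes;
}
```

Because the loop rechecks both counters after every pass, work that a query handler requeues is picked up on the next pass instead of being rejected by a draining workqueue.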
  2. 29 Nov, 2021 7 commits
  3. 28 Nov, 2021 16 commits
  4. 27 Nov, 2021 8 commits
    • io_uring: Fix undefined-behaviour in io_issue_sqe · f6223ff7
      Ye Bin committed
      We got the following issue:
      ================================================================================
      UBSAN: Undefined behaviour in ./include/linux/ktime.h:42:14
      signed integer overflow:
      -4966321760114568020 * 1000000000 cannot be represented in type 'long long int'
      CPU: 1 PID: 2186 Comm: syz-executor.2 Not tainted 4.19.90+ #12
      Hardware name: linux,dummy-virt (DT)
      Call trace:
       dump_backtrace+0x0/0x3f0 arch/arm64/kernel/time.c:78
       show_stack+0x28/0x38 arch/arm64/kernel/traps.c:158
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x170/0x1dc lib/dump_stack.c:118
       ubsan_epilogue+0x18/0xb4 lib/ubsan.c:161
       handle_overflow+0x188/0x1dc lib/ubsan.c:192
       __ubsan_handle_mul_overflow+0x34/0x44 lib/ubsan.c:213
       ktime_set include/linux/ktime.h:42 [inline]
       timespec64_to_ktime include/linux/ktime.h:78 [inline]
       io_timeout fs/io_uring.c:5153 [inline]
       io_issue_sqe+0x42c8/0x4550 fs/io_uring.c:5599
       __io_queue_sqe+0x1b0/0xbc0 fs/io_uring.c:5988
       io_queue_sqe+0x1ac/0x248 fs/io_uring.c:6067
       io_submit_sqe fs/io_uring.c:6137 [inline]
       io_submit_sqes+0xed8/0x1c88 fs/io_uring.c:6331
       __do_sys_io_uring_enter fs/io_uring.c:8170 [inline]
       __se_sys_io_uring_enter fs/io_uring.c:8129 [inline]
       __arm64_sys_io_uring_enter+0x490/0x980 fs/io_uring.c:8129
       invoke_syscall arch/arm64/kernel/syscall.c:53 [inline]
       el0_svc_common+0x374/0x570 arch/arm64/kernel/syscall.c:121
       el0_svc_handler+0x190/0x260 arch/arm64/kernel/syscall.c:190
       el0_svc+0x10/0x218 arch/arm64/kernel/entry.S:1017
      ================================================================================
      
      ktime_set() only checks 'secs' against the KTIME_SEC_MAX upper bound,
      so passing a negative value may lead to an overflow.
      To address this issue, we must check if 'sec' is negative.
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Link: https://lore.kernel.org/r/20211118015907.844807-1-yebin10@huawei.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
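A user-space sketch of the check described above: reject negative fields before multiplying seconds by 10^9. The helper name `timespec_to_ns_checked()` and the `SEC_MAX` macro are hypothetical, and the extra upper bound on `nsec` is an assumption added here so that the sum cannot overflow either; the saturation at the maximum mirrors what the message says ktime_set already guards against.

```c
#include <errno.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL
#define SEC_MAX (INT64_MAX / NSEC_PER_SEC)	/* hypothetical name */

/* Convert (sec, nsec) to nanoseconds without signed overflow. */
static int timespec_to_ns_checked(int64_t sec, int64_t nsec, int64_t *ns)
{
	/* Negative values are rejected before any multiplication happens. */
	if (sec < 0 || nsec < 0 || nsec >= NSEC_PER_SEC)
		return -EINVAL;

	/* Saturate instead of overflowing for very large second counts. */
	if (sec >= SEC_MAX) {
		*ns = INT64_MAX;
		return 0;
	}

	*ns = sec * NSEC_PER_SEC + nsec;
	return 0;
}
```

With the value from the UBSAN report, `timespec_to_ns_checked(-4966321760114568020LL, 0, &ns)` returns `-EINVAL` without performing the multiplication.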
    • io_uring: fix soft lockup when call __io_remove_buffers · 1d0254e6
      Ye Bin committed
      I got the following issue:
      [ 567.094140] __io_remove_buffers: [1]start ctx=0xffff8881067bf000 bgid=65533 buf=0xffff8881fefe1680
      [  594.360799] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/u32:5:108]
      [  594.364987] Modules linked in:
      [  594.365405] irq event stamp: 604180238
      [  594.365906] hardirqs last  enabled at (604180237): [<ffffffff93fec9bd>] _raw_spin_unlock_irqrestore+0x2d/0x50
      [  594.367181] hardirqs last disabled at (604180238): [<ffffffff93fbbadb>] sysvec_apic_timer_interrupt+0xb/0xc0
      [  594.368420] softirqs last  enabled at (569080666): [<ffffffff94200654>] __do_softirq+0x654/0xa9e
      [  594.369551] softirqs last disabled at (569080575): [<ffffffff913e1d6a>] irq_exit_rcu+0x1ca/0x250
      [  594.370692] CPU: 2 PID: 108 Comm: kworker/u32:5 Tainted: G            L    5.15.0-next-20211112+ #88
      [  594.371891] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
      [  594.373604] Workqueue: events_unbound io_ring_exit_work
      [  594.374303] RIP: 0010:_raw_spin_unlock_irqrestore+0x33/0x50
      [  594.375037] Code: 48 83 c7 18 53 48 89 f3 48 8b 74 24 10 e8 55 f5 55 fd 48 89 ef e8 ed a7 56 fd 80 e7 02 74 06 e8 43 13 7b fd fb bf 01 00 00 00 <e8> f8 78 474
      [  594.377433] RSP: 0018:ffff888101587a70 EFLAGS: 00000202
      [  594.378120] RAX: 0000000024030f0d RBX: 0000000000000246 RCX: 1ffffffff2f09106
      [  594.379053] RDX: 0000000000000000 RSI: ffffffff9449f0e0 RDI: 0000000000000001
      [  594.379991] RBP: ffffffff9586cdc0 R08: 0000000000000001 R09: fffffbfff2effcab
      [  594.380923] R10: ffffffff977fe557 R11: fffffbfff2effcaa R12: ffff8881b8f3def0
      [  594.381858] R13: 0000000000000246 R14: ffff888153a8b070 R15: 0000000000000000
      [  594.382787] FS:  0000000000000000(0000) GS:ffff888399c00000(0000) knlGS:0000000000000000
      [  594.383851] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  594.384602] CR2: 00007fcbe71d2000 CR3: 00000000b4216000 CR4: 00000000000006e0
      [  594.385540] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  594.386474] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  594.387403] Call Trace:
      [  594.387738]  <TASK>
      [  594.388042]  find_and_remove_object+0x118/0x160
      [  594.389321]  delete_object_full+0xc/0x20
      [  594.389852]  kfree+0x193/0x470
      [  594.390275]  __io_remove_buffers.part.0+0xed/0x147
      [  594.390931]  io_ring_ctx_free+0x342/0x6a2
      [  594.392159]  io_ring_exit_work+0x41e/0x486
      [  594.396419]  process_one_work+0x906/0x15a0
      [  594.399185]  worker_thread+0x8b/0xd80
      [  594.400259]  kthread+0x3bf/0x4a0
      [  594.401847]  ret_from_fork+0x22/0x30
      [  594.402343]  </TASK>
      
      Message from syslogd@localhost at Nov 13 09:09:54 ...
      kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/u32:5:108]
      [  596.793660] __io_remove_buffers: [2099199]start ctx=0xffff8881067bf000 bgid=65533 buf=0xffff8881fefe1680
      
      We can reproduce this issue with the following syzkaller log:
      r0 = syz_io_uring_setup(0x401, &(0x7f0000000300), &(0x7f0000003000/0x2000)=nil, &(0x7f0000ff8000/0x4000)=nil, &(0x7f0000000280)=<r1=>0x0, &(0x7f0000000380)=<r2=>0x0)
      sendmsg$ETHTOOL_MSG_FEATURES_SET(0xffffffffffffffff, &(0x7f0000003080)={0x0, 0x0, &(0x7f0000003040)={&(0x7f0000000040)=ANY=[], 0x18}}, 0x0)
      syz_io_uring_submit(r1, r2, &(0x7f0000000240)=@IORING_OP_PROVIDE_BUFFERS={0x1f, 0x5, 0x0, 0x401, 0x1, 0x0, 0x100, 0x0, 0x1, {0xfffd}}, 0x0)
      io_uring_enter(r0, 0x3a2d, 0x0, 0x0, 0x0, 0x0)
      
      The cause of the issue is that 'buf->list' has 2,100,000 nodes, and
      walking it occupies the CPU long enough to cause a soft lockup.
      To solve this issue, add a schedule point to the while loop in
      '__io_remove_buffers'.
      After adding the schedule point, a regression run gives the following data:
      [  240.141864] __io_remove_buffers: [1]start ctx=0xffff888170603000 bgid=65533 buf=0xffff8881116fcb00
      [  268.408260] __io_remove_buffers: [1]start ctx=0xffff8881b92d2000 bgid=65533 buf=0xffff888130c83180
      [  275.899234] __io_remove_buffers: [2099199]start ctx=0xffff888170603000 bgid=65533 buf=0xffff8881116fcb00
      [  296.741404] __io_remove_buffers: [1]start ctx=0xffff8881b659c000 bgid=65533 buf=0xffff8881010fe380
      [  305.090059] __io_remove_buffers: [2099199]start ctx=0xffff8881b92d2000 bgid=65533 buf=0xffff888130c83180
      [  325.415746] __io_remove_buffers: [1]start ctx=0xffff8881b92d1000 bgid=65533 buf=0xffff8881a17d8f00
      [  333.160318] __io_remove_buffers: [2099199]start ctx=0xffff8881b659c000 bgid=65533 buf=0xffff8881010fe380
      ...
      
      Fixes: 8bab4c09 ("io_uring: allow conditional reschedule for intensive iterators")
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Link: https://lore.kernel.org/r/20211122024737.2198530-1-yebin10@huawei.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
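The schedule-point idea can be approximated in user space as shown below. This is an analogy rather than the kernel change: `buf_node`, `add_buffers()` and `remove_buffers()` are hypothetical names, POSIX `sched_yield()` stands in for a kernel schedule point, and yielding once every 1024 nodes is an arbitrary choice made for the sketch.

```c
#include <sched.h>
#include <stdlib.h>

/* Hypothetical stand-in for a node of the buffer list. */
struct buf_node {
	struct buf_node *next;
};

/* Prepend count nodes; returns how many were actually allocated. */
static unsigned long add_buffers(struct buf_node **head, unsigned long count)
{
	unsigned long i;

	for (i = 0; i < count; i++) {
		struct buf_node *node = malloc(sizeof(*node));

		if (!node)
			break;
		node->next = *head;
		*head = node;
	}
	return i;
}

/* Free up to nbufs nodes, giving up the CPU at regular intervals. */
static unsigned long remove_buffers(struct buf_node **head, unsigned long nbufs)
{
	unsigned long removed = 0;

	while (*head && removed < nbufs) {
		struct buf_node *node = *head;

		*head = node->next;
		free(node);
		removed++;

		/* User-space stand-in for a schedule point in a long loop. */
		if (removed % 1024 == 0)
			sched_yield();
	}
	return removed;
}
```

The loop still frees every node; it only gives other runnable tasks a chance to make progress while a very long list is being torn down.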
    • tracing: Fix pid filtering when triggers are attached · a55f224f
      Steven Rostedt (VMware) committed
      If an event is filtered by pid and a trigger that requires processing of
      the event to happen is attached to the event, the discard portion does
      not take the pid filtering into account, and the event will then be
      recorded when it should not have been.
      
      Cc: stable@vger.kernel.org
      Fixes: 3fdaf80f ("tracing: Implement event pid filtering")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • iommu/vt-d: Fix unmap_pages support · 86dc40c7
      Alex Williamson committed
      When supporting only the .map and .unmap callbacks of iommu_ops,
      the IOMMU driver can make assumptions about the size and alignment
      used for mappings based on the driver provided pgsize_bitmap.  VT-d
      previously used essentially PAGE_MASK for this bitmap as any power
      of two mapping was acceptably filled by native page sizes.
      
      However, with the .map_pages and .unmap_pages interface we're now
      getting page-size and count arguments.  If we simply combine these
      as (page-size * count) and make use of the previous map/unmap
      functions internally, any size and alignment assumptions are very
      different.
      
      As an example, a given vfio device assignment VM will often create
      a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff].  On a system that
      does not support IOMMU super pages, the unmap_pages interface will
      ask to unmap 1024 4KB pages at the base IOVA.  dma_pte_clear_level()
      will recurse down to level 2 of the page table where the first half
      of the pfn range exactly matches the entire pte level.  We clear the
      pte, increment the pfn by the level size, but (oops) the next pte is
      on a new page, so we exit the loop and pop back up a level.  When we
      then update the pfn based on that higher level, we seem to assume
      that the previous pfn value was at the start of the level.  In this
      case the level size is 256K pfns, which we add to the base pfn and
      get a result of 0x7fe00, which is clearly greater than 0x401ff,
      so we're done.  Meanwhile we never cleared the ptes for the remainder
      of the range.  When the VM remaps this range, we're overwriting valid
      ptes and the VT-d driver complains loudly, as described in the user
      report linked below.
      
      The fix for this seems relatively simple: if each iteration of the
      loop in dma_pte_clear_level() is assumed to clear to the end of the
      level pte page, then our next pfn should be calculated from level_pfn
      rather than our working pfn.
      
      Fixes: 3f34f125 ("iommu/vt-d: Implement map/unmap_pages() iommu_ops callback")
      Reported-by: Ajay Garg <ajaygargnsit@gmail.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Tested-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
      Link: https://lore.kernel.org/all/20211002124012.18186-1-ajaygargnsit@gmail.com/
      Link: https://lore.kernel.org/r/163659074748.1617923.12716161410774184024.stgit@omen
      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Link: https://lore.kernel.org/r/20211126135556.397932-3-baolu.lu@linux.intel.com
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
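The pfn arithmetic from the example above can be checked with a short C sketch. It assumes 9 address bits per page-table level, which is consistent with the 256K-pfn level size quoted in the message; the helper names `next_pfn_buggy()` and `next_pfn_fixed()` are hypothetical and only contrast the two ways of computing the next pfn.

```c
#include <stdint.h>

#define LEVEL_STRIDE 9	/* assumed: 512 entries per page-table page */

/* Number of pfns covered by one entry at the given level. */
static uint64_t level_size(int level)
{
	return 1ULL << ((level - 1) * LEVEL_STRIDE);
}

/* Mask that rounds a pfn down to the start of its entry at that level. */
static uint64_t level_mask(int level)
{
	return ~(level_size(level) - 1);
}

/* Advance from the working pfn, which may sit in the middle of an entry. */
static uint64_t next_pfn_buggy(uint64_t pfn, int level)
{
	return pfn + level_size(level);
}

/* Advance from the start of the entry (level_pfn in the message). */
static uint64_t next_pfn_fixed(uint64_t pfn, int level)
{
	return (pfn & level_mask(level)) + level_size(level);
}
```

For the range in the message, `next_pfn_buggy(0x3fe00, 3)` gives 0x7fe00, which is past the last pfn 0x401ff, while `next_pfn_fixed(0x3fe00, 3)` gives 0x40000, so the remaining pfns 0x40000 to 0x401ff are still visited.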
    • iommu/vt-d: Fix an unbalanced rcu_read_lock/rcu_read_unlock() · 4e5973dd
      Christophe JAILLET committed
      If we return -EOPNOTSUPP, the RCU read lock remains held. This is spurious.
      Go through the end of the function instead. This way, the missing
      'rcu_read_unlock()' is called.
      
      Fixes: 7afd7f6a ("iommu/vt-d: Check FL and SL capability sanity in scalable mode")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Link: https://lore.kernel.org/r/40cc077ca5f543614eab2a10e84d29dd190273f6.1636217517.git.christophe.jaillet@wanadoo.fr
      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Link: https://lore.kernel.org/r/20211126135556.397932-2-baolu.lu@linux.intel.com
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
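The "go through the end of the function" pattern can be shown with a mock lock. In this sketch `mock_read_lock()`, `mock_read_unlock()`, `read_lock_depth` and `check_capability()` are all hypothetical; the depth counter stands in for the read-side critical section so that the balance can be observed from user space.

```c
#include <errno.h>
#include <stdbool.h>

/* Mock lock: the depth counter stands in for a read-side critical section. */
static int read_lock_depth;

static void mock_read_lock(void)
{
	read_lock_depth++;
}

static void mock_read_unlock(void)
{
	read_lock_depth--;
}

static int check_capability(bool supported)
{
	int ret = 0;

	mock_read_lock();
	if (!supported) {
		ret = -EOPNOTSUPP;
		goto out;	/* do not return with the lock still held */
	}
	/* ... work that needs the lock would go here ... */
out:
	mock_read_unlock();
	return ret;
}
```

Both the success path and the error path leave `read_lock_depth` at 0, whereas an early `return -EOPNOTSUPP;` inside the `if` would leave it at 1.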
    • iommu/rockchip: Fix PAGE_DESC_HI_MASKs for RK3568 · f7ff3cff
      Alex Bee committed
      With the submission of the iommu driver for RK3568 a subtle bug was
      introduced: PAGE_DESC_HI_MASK1 and PAGE_DESC_HI_MASK2 have to be
      the other way around. That leads to random errors, especially when
      addresses beyond 32 bit are used.
      
      Fix it.
      
      Fixes: c55356c5 ("iommu: rockchip: Add support for iommu v2")
      Signed-off-by: Alex Bee <knaerzche@gmail.com>
      Tested-by: Peter Geis <pgwipeout@gmail.com>
      Reviewed-by: Heiko Stuebner <heiko@sntech.de>
      Tested-by: Dan Johansen <strit@manjaro.org>
      Reviewed-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
      Link: https://lore.kernel.org/r/20211124021325.858139-1-knaerzche@gmail.com
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
    • iommu/amd: Clarify AMD IOMMUv2 initialization messages · 717e88aa
      Joerg Roedel committed
      The messages printed on the initialization of the AMD IOMMUv2 driver
      have caused some confusion in the past. Clarify the messages to reduce
      confusion in the future.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Link: https://lore.kernel.org/r/20211123105507.7654-3-joro@8bytes.org
    • iommu/vt-d: Remove unused PASID_DISABLED · 21e96a20
      Joerg Roedel committed
      The macro is unused after commit 00ecd540 so it can be removed.
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Fixes: 00ecd540 ("iommu/vt-d: Clean up unused PASID updating functions")
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
      Link: https://lore.kernel.org/r/20211123105507.7654-2-joro@8bytes.org