1. 02 September 2020, 10 commits
  2. 01 December 2019, 1 commit
  3. 19 September 2019, 1 commit
    • genirq: Prevent NULL pointer dereference in resend_irqs() · 991b3458
      Authored by Yunfeng Ye
      commit eddf3e9c7c7e4d0707c68d1bb22cc6ec8aef7d4a upstream.
      
      The following crash was observed:
      
        Unable to handle kernel NULL pointer dereference at 0000000000000158
        Internal error: Oops: 96000004 [#1] SMP
        pc : resend_irqs+0x68/0xb0
        lr : resend_irqs+0x64/0xb0
        ...
        Call trace:
         resend_irqs+0x68/0xb0
         tasklet_action_common.isra.6+0x84/0x138
         tasklet_action+0x2c/0x38
         __do_softirq+0x120/0x324
         run_ksoftirqd+0x44/0x60
         smpboot_thread_fn+0x1ac/0x1e8
         kthread+0x134/0x138
         ret_from_fork+0x10/0x18
      
      The reason for this is that the interrupt resend mechanism happens in soft
      interrupt context, which is an asynchronous mechanism with respect to other
      operations on interrupts. free_irq() does not take resend handling into
      account. Thus, the irq descriptor might already be freed before the resend
      tasklet is executed. resend_irqs() does not check the return value of the
      interrupt descriptor lookup and dereferences the return value
      unconditionally.
      
        1):
        __setup_irq
          irq_startup
            check_irq_resend  // activate softirq to handle resend irq
        2):
        irq_domain_free_irqs
          irq_free_descs
            free_desc
              call_rcu(&desc->rcu, delayed_free_desc)
        3):
        __do_softirq
          tasklet_action
            resend_irqs
              desc = irq_to_desc(irq)
              desc->handle_irq(desc)  // desc is NULL --> Ooops
      
      Fix this by adding a NULL pointer check in resend_irqs() before dereferencing
      the irq descriptor.
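      
      A minimal sketch of the fixed tasklet loop (kernel/irq/resend.c); the
      surrounding lines are reconstructed from the 4.x sources, only the NULL
      check is the change described above:
      
        static void resend_irqs(unsigned long arg)
        {
                struct irq_desc *desc;
                int irq;
        
                while (!bitmap_empty(irqs_resend, nr_irqs)) {
                        irq = find_first_bit(irqs_resend, nr_irqs);
                        clear_bit(irq, irqs_resend);
                        desc = irq_to_desc(irq);
                        if (!desc)      /* descriptor may already be gone */
                                continue;
                        local_irq_disable();
                        desc->handle_irq(desc);
                        local_irq_enable();
                }
        }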
      
      Fixes: a4633adc ("[PATCH] genirq: add genirq sw IRQ-retrigger")
      Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1630ae13-5c8e-901e-de09-e740b6a426a7@huawei.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      991b3458
  4. 29 August 2019, 1 commit
  5. 21 July 2019, 3 commits
  6. 10 May 2019, 1 commit
  7. 17 April 2019, 2 commits
    • genirq: Initialize request_mutex if CONFIG_SPARSE_IRQ=n · 8b4f68b4
      Authored by Kefeng Wang
      commit e8458e7afa855317b14915d7b86ab3caceea7eb6 upstream.
      
      When CONFIG_SPARSE_IRQ is disabled, the request_mutex in struct irq_desc
      is not initialized, which causes malfunction.
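      
      The fix boils down to initializing the mutex in the !CONFIG_SPARSE_IRQ setup
      path. A sketch of the loop in early_irq_init() (kernel/irq/irqdesc.c); the
      surrounding lines are reconstructed, the mutex_init() call is the addition:
      
        for (i = 0; i < count; i++) {
                desc[i].kstat_irqs = alloc_percpu(unsigned int);
                alloc_masks(&desc[i], node);
                raw_spin_lock_init(&desc[i].lock);
                lockdep_set_class(&desc[i].lock, &irq_desc_lock_class);
                mutex_init(&desc[i].request_mutex);     /* previously missing */
                desc_set_defaults(i, &desc[i], node, NULL, NULL);
        }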
      
      Fixes: 9114014c ("genirq: Add mutex to irq desc to serialize request/free_irq()")
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: <linux-arm-kernel@lists.infradead.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190404074512.145533-1-wangkefeng.wang@huawei.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8b4f68b4
    • genirq: Respect IRQCHIP_SKIP_SET_WAKE in irq_chip_set_wake_parent() · cd5b06a9
      Authored by Stephen Boyd
      commit 325aa19598e410672175ed50982f902d4e3f31c5 upstream.
      
      If a child irqchip calls irq_chip_set_wake_parent() but its parent irqchip
      has the IRQCHIP_SKIP_SET_WAKE flag set, an error is returned.
      
      This is inconsistent behaviour vs. set_irq_wake_real() which returns 0 when
      the irqchip has the IRQCHIP_SKIP_SET_WAKE flag set. It doesn't attempt to
      walk the chain of parents and set irq wake on any chips that don't have the
      flag set either. If the intent is to call the .irq_set_wake() callback of
      the parent irqchip, then we expect irqchip implementations to omit the
      IRQCHIP_SKIP_SET_WAKE flag and implement an .irq_set_wake() function that
      calls irq_chip_set_wake_parent().
      
      The problem has been observed on a Qualcomm sdm845 device where set wake
      fails on any GPIO interrupts after applying work in progress wakeup irq
      patches to the GPIO driver. The chain of chips looks like this:
      
           QCOM GPIO -> QCOM PDC (SKIP) -> ARM GIC (SKIP)
      
      The GPIO controller's parent is the QCOM PDC irqchip, which in turn has the
      ARM GIC as parent.  The QCOM PDC irqchip has the IRQCHIP_SKIP_SET_WAKE flag
      set, and so does the grandparent ARM GIC.
      
      The GPIO driver doesn't know if the parent needs to set wake or not, so it
      unconditionally calls irq_chip_set_wake_parent() causing this function to
      return a failure because the parent irqchip (PDC) doesn't have the
      .irq_set_wake() callback set. Returning 0 instead makes everything work and
      irqs from the GPIO controller can be configured for wakeup.
      
      Make it consistent by returning 0 (success) from irq_chip_set_wake_parent()
      when a parent chip has IRQCHIP_SKIP_SET_WAKE set.
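      
      A sketch of the fixed helper (kernel/irq/chip.c, reconstructed); the
      IRQCHIP_SKIP_SET_WAKE check is the addition:
      
        int irq_chip_set_wake_parent(struct irq_data *data, unsigned int on)
        {
                data = data->parent_data;
        
                /* Parent opted out of set_wake handling: report success */
                if (data->chip->flags & IRQCHIP_SKIP_SET_WAKE)
                        return 0;
        
                if (data->chip->irq_set_wake)
                        return data->chip->irq_set_wake(data, on);
        
                return -ENOSYS;
        }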
      
      [ tglx: Massaged changelog ]
      
      Fixes: 08b55e2a ("genirq: Add irqchip_set_wake_parent")
      Signed-off-by: Stephen Boyd <swboyd@chromium.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-gpio@vger.kernel.org
      Cc: Lina Iyer <ilina@codeaurora.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190325181026.247796-1-swboyd@chromium.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cd5b06a9
  8. 06 April 2019, 1 commit
    • genirq: Avoid summation loops for /proc/stat · 1f369486
      Authored by Thomas Gleixner
      [ Upstream commit 1136b0728969901a091f0471968b2b76ed14d9ad ]
      
      Waiman reported that on large systems with a large number of interrupts the
      readout of /proc/stat takes a long time to sum up the interrupt
      statistics. In principle this is not a problem, but for unknown reasons
      some enterprise quality software reads /proc/stat with a high frequency.
      
      The reason for this is that interrupt statistics are accounted per cpu. So
      the /proc/stat logic has to sum up the interrupt stats for each interrupt.
      
      This can be largely avoided for interrupts which are not marked as
      'PER_CPU' interrupts by simply adding a per interrupt summation counter
      which is incremented along with the per interrupt per cpu counter.
      
      The PER_CPU interrupts need to avoid that and use only per cpu accounting
      because they share the interrupt number and the interrupt descriptor and
      concurrent updates would conflict or require unwanted synchronization.
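      
      A sketch of the read side after this change (kernel/irq/irqdesc.c,
      reconstructed): non-PER_CPU interrupts return the new aggregate counter
      directly and only PER_CPU interrupts keep the per-cpu summation loop:
      
        unsigned int kstat_irqs(unsigned int irq)
        {
                struct irq_desc *desc = irq_to_desc(irq);
                unsigned int sum = 0;
                int cpu;
        
                if (!desc || !desc->kstat_irqs)
                        return 0;
        
                /* tot_count is incremented along with the per cpu counter */
                if (!irq_settings_is_per_cpu_devid(desc) &&
                    !irq_settings_is_per_cpu(desc))
                        return desc->tot_count;
        
                for_each_possible_cpu(cpu)
                        sum += *per_cpu_ptr(desc->kstat_irqs, cpu);
                return sum;
        }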
      Reported-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Waiman Long <longman@redhat.com>
      Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Daniel Colascione <dancol@google.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Link: https://lkml.kernel.org/r/20190208135020.925487496@linutronix.de
      
      8<-------------
      
      v2: Undo the unintentional layout change of struct irq_desc.
      
       include/linux/irqdesc.h |    1 +
       kernel/irq/chip.c       |   12 ++++++++++--
       kernel/irq/internals.h  |    8 +++++++-
       kernel/irq/irqdesc.c    |    7 ++++++-
       4 files changed, 24 insertions(+), 4 deletions(-)
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1f369486
  9. 06 March 2019, 4 commits
    • genirq: Make sure the initial affinity is not empty · 17fab891
      Authored by Srinivas Ramana
      [ Upstream commit bddda606ec76550dd63592e32a6e87e7d32583f7 ]
      
      If all CPUs in the irq_default_affinity mask are offline when an interrupt
      is initialized then irq_setup_affinity() can set an empty affinity mask for
      a newly allocated interrupt.
      
      Fix this by falling back to cpu_online_mask in case the resulting affinity
      mask is zero.
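      
      The fallback is a small hunk in irq_setup_affinity() (kernel/irq/manage.c);
      a sketch with the context lines reconstructed:
      
        cpumask_and(&mask, cpu_online_mask, set);
        
        /*
         * If all CPUs of the preferred mask are offline, the resulting
         * mask is empty. Fall back to the online mask.
         */
        if (cpumask_empty(&mask))
                cpumask_copy(&mask, cpu_online_mask);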
      Signed-off-by: Srinivas Ramana <sramana@codeaurora.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arm-msm@vger.kernel.org
      Link: https://lkml.kernel.org/r/1545312957-8504-1-git-send-email-sramana@codeaurora.org
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      17fab891
    • genirq/matrix: Improve target CPU selection for managed interrupts. · 765c30b3
      Authored by Long Li
      [ Upstream commit e8da8794a7fd9eef1ec9a07f0d4897c68581c72b ]
      
      On large systems with multiple devices of the same class (e.g. NVMe disks,
      using managed interrupts), the kernel can affinitize these interrupts to a
      small subset of CPUs instead of spreading them out evenly.
      
      irq_matrix_alloc_managed() tries to select the CPU in the supplied cpumask
      of possible target CPUs which has the lowest number of interrupt vectors
      allocated.
      
      This is done by searching the CPU with the highest number of available
      vectors. While this is correct for non-managed interrupts, it can select the
      wrong CPU for managed interrupts. Under certain constellations this results
      in affinitizing the managed interrupts of several devices to a single CPU in
      a set.
      
      The bookkeeping of available vectors works the following way:
      
       1) Non-managed interrupts:
      
          available is decremented when the interrupt is actually requested by
          the device driver and a vector is assigned. It's incremented when the
          interrupt and the vector are freed.
      
       2) Managed interrupts:
      
          Managed interrupts guarantee vector reservation when the MSI/MSI-X
          functionality of a device is enabled, which is achieved by reserving
          vectors in the bitmaps of the possible target CPUs. This reservation
          decrements the available count on each possible target CPU.
      
          When the interrupt is requested by the device driver then a vector is
          allocated from the reserved region. The operation is reversed when the
          interrupt is freed by the device driver. Neither of these operations
          affect the available count.
      
          The reservation persists up to the point where the MSI/MSI-X
          functionality is disabled and only this operation increments the
          available count again.
      
      For non-managed interrupts the available count is the correct selection
      criterion because the guaranteed reservations need to be taken into
      account. Using the allocated counter could lead to a failing allocation in
      the following situation (total vector space of 10 assumed):
      
      		 CPU0	CPU1
       available:	    2	   0
       allocated:	    5	   3   <--- CPU1 is selected, but available space = 0
       managed reserved:  3	   7
      
       while available yields the correct result.
      
      For managed interrupts the available count is not the appropriate
      selection criterion because as explained above the available count is not
      affected by the actual vector allocation.
      
      The following example illustrates that. Total vector space of 10
      assumed. The starting point is:
      
      		 CPU0	CPU1
       available:	    5	   4
       allocated:	    2	   3
       managed reserved:  3	   3
      
       Allocating vectors for three non-managed interrupts will result in
       affinitizing the first two to CPU0 and the third one to CPU1 because the
       available count is adjusted with each allocation:
      
      		  CPU0	CPU1
       available:	     5	   4	<- Select CPU0 for 1st allocation
       --> allocated:	     3	   3
      
       available:	     4	   4	<- Select CPU0 for 2nd allocation
       --> allocated:	     4	   3
      
       available:	     3	   4	<- Select CPU1 for 3rd allocation
       --> allocated:	     4	   4
      
       But the allocation of three managed interrupts starting from the same
       point will affinitize all of them to CPU0 because the available count is
       not affected by the allocation (see above). So the end result is:
      
      		  CPU0	CPU1
       available:	     5	   4
       allocated:	     5	   3
      
      Introduce a "managed_allocated" field in struct cpumap to track the vector
      allocation for managed interrupts separately. Use this information to
      select the target CPU when a vector is allocated for a managed interrupt,
      which results in more evenly distributed vector assignments. The above
      example results in the following allocations:
      
      		 CPU0	CPU1
       managed_allocated: 0	   0	<- Select CPU0 for 1st allocation
       --> allocated:	    3	   3
      
       managed_allocated: 1	   0	<- Select CPU1 for 2nd allocation
       --> allocated:	    3	   4
      
       managed_allocated: 1	   1	<- Select CPU0 for 3rd allocation
       --> allocated:	    4	   4
      
      The allocation of non-managed interrupts is not affected by this change and
      is still evaluating the available count.
      
      The overall distribution of interrupt vectors for both types of interrupts
      might still not be perfectly even depending on the number of non-managed
      and managed interrupts in a system, but due to the reservation guarantee
      for managed interrupts this cannot be avoided.
      
      Expose the new field in debugfs as well.
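      
      A sketch of the new selection helper (kernel/irq/matrix.c, reconstructed from
      the upstream change), which picks the CPU with the fewest managed vectors
      actually allocated:
      
        /* Find the best CPU which has the lowest number of managed IRQs allocated */
        static unsigned int matrix_find_best_cpu_managed(struct irq_matrix *m,
                                                         const struct cpumask *msk)
        {
                unsigned int cpu, best_cpu, allocated = UINT_MAX;
                struct cpumap *cm;
        
                best_cpu = UINT_MAX;
        
                for_each_cpu(cpu, msk) {
                        cm = per_cpu_ptr(m->maps, cpu);
        
                        if (!cm->online || cm->managed_allocated > allocated)
                                continue;
        
                        best_cpu = cpu;
                        allocated = cm->managed_allocated;
                }
                return best_cpu;
        }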
      
      [ tglx: Clarified the background of the problem in the changelog and
        	described it independent of NVME ]
      Signed-off-by: Long Li <longli@microsoft.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Michael Kelley <mikelley@microsoft.com>
      Link: https://lkml.kernel.org/r/20181106040000.27316-1-longli@linuxonhyperv.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      765c30b3
    • irq/matrix: Spread managed interrupts on allocation · 8cae7757
      Authored by Dou Liyang
      [ Upstream commit 76f99ae5b54d48430d1f0c5512a84da0ff9761e0 ]
      
      Linux spreads out the non-managed interrupts across the possible target CPUs
      to avoid vector space exhaustion.
      
      Managed interrupts are treated differently, as for them the vectors are
      reserved (with guarantee) when the interrupt descriptors are initialized.
      
      When the interrupt is requested a real vector is assigned. The assignment
      logic uses the first CPU in the affinity mask for assignment. If the
      interrupt has more than one CPU in the affinity mask, which happens when a
      multi queue device has fewer queues than CPUs, then doing the same search as
      for non-managed interrupts makes sense as it puts the interrupt on the
      least interrupt-plagued CPU. For single CPU affine vectors that's obviously
      a NOOP.
      
      Restructure the matrix allocation code so it does the 'best CPU' search, add
      the sanity check for an empty affinity mask and adapt the call site in the
      x86 vector management code.
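      
      A sketch of the restructured allocator (kernel/irq/matrix.c, reconstructed):
      irq_matrix_alloc_managed() now rejects an empty mask and uses the 'best CPU'
      search (see the helper sketched in the next entry) before picking a reserved bit:
      
        int irq_matrix_alloc_managed(struct irq_matrix *m, const struct cpumask *msk,
                                     unsigned int *mapped_cpu)
        {
                unsigned int bit, cpu, end;
                struct cpumap *cm;
        
                if (cpumask_empty(msk))
                        return -EINVAL;
        
                cpu = matrix_find_best_cpu(m, msk);
                if (cpu == UINT_MAX)
                        return -ENOSPC;
        
                cm = per_cpu_ptr(m->maps, cpu);
                end = m->alloc_end;
                /* Get managed bits which are not allocated yet */
                bitmap_andnot(m->scratch_map, cm->managed_map, cm->alloc_map, end);
                bit = find_first_bit(m->scratch_map, end);
                if (bit >= end)
                        return -ENOSPC;
                set_bit(bit, cm->alloc_map);
                cm->allocated++;
                m->total_allocated++;
                *mapped_cpu = cpu;
                return 0;
        }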
      
      [ tglx: Added the empty mask check to the core and improved change log ]
      Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: hpa@zytor.com
      Link: https://lkml.kernel.org/r/20180908175838.14450-2-dou_liyang@163.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8cae7757
    • irq/matrix: Split out the CPU selection code into a helper · 2948b887
      Authored by Dou Liyang
      [ Upstream commit 8ffe4e61c06a48324cfd97f1199bb9838acce2f2 ]
      
      Linux finds the CPU which has the lowest vector allocation count to spread
      out the non-managed interrupts across the possible target CPUs, but does
      not do so for managed interrupts.
      
      Split out the CPU selection code into a helper function for reuse. No
      functional change.
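      
      A sketch of the extracted helper (kernel/irq/matrix.c, reconstructed), which
      returns the online CPU with the most available vectors:
      
        /* Find the best CPU which has the lowest vector allocation count */
        static unsigned int matrix_find_best_cpu(struct irq_matrix *m,
                                                 const struct cpumask *msk)
        {
                unsigned int cpu, best_cpu, maxavl = 0;
                struct cpumap *cm;
        
                best_cpu = UINT_MAX;
        
                for_each_cpu(cpu, msk) {
                        cm = per_cpu_ptr(m->maps, cpu);
        
                        if (!cm->online || cm->available <= maxavl)
                                continue;
        
                        best_cpu = cpu;
                        maxavl = cm->available;
                }
                return best_cpu;
        }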
      Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: hpa@zytor.com
      Link: https://lkml.kernel.org/r/20180908175838.14450-1-dou_liyang@163.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2948b887
  10. 13 February 2019, 1 commit
    • genirq/affinity: Spread IRQs to all available NUMA nodes · 46ed4f4f
      Authored by Long Li
      [ Upstream commit b82592199032bf7c778f861b936287e37ebc9f62 ]
      
      If the number of NUMA nodes exceeds the number of MSI/MSI-X interrupts
      which are allocated for a device, the interrupt affinity spreading code
      fails to spread them across all nodes.
      
      The reason is that the spreading code starts from node 0 and continues up
      to the number of interrupts requested for allocation. This leaves the nodes
      past the last interrupt unused.
      
      This results in interrupt concentration on the first nodes which violates
      the assumption of the block layer that all nodes are covered evenly. As a
      consequence the NUMA nodes above the number of interrupts are all assigned
      to hardware queue 0 and therefore NUMA node 0, which results in bad
      performance and has CPU hotplug implications, because queue 0 gets shut
      down when the last CPU of node 0 is offlined.
      
      Go over all NUMA nodes and assign them round-robin to all requested
      interrupts to solve this.
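      
      A sketch of the round-robin assignment in the affinity spreading code
      (kernel/irq/affinity.c, reconstructed): instead of stopping at the last
      vector, the loop wraps around so every node lands on some vector:
      
        /* Spread the vectors across the nodes when nodes >= vectors */
        if (numvecs <= nodes) {
                for_each_node_mask(n, nodemsk) {
                        cpumask_or(masks + curvec, masks + curvec,
                                   node_to_cpumask[n]);
                        if (++curvec == last_affv)
                                curvec = affd->pre_vectors;     /* wrap around */
                }
                done = numvecs;
                goto out;
        }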
      
      [ tglx: Massaged changelog ]
      Signed-off-by: Long Li <longli@microsoft.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Cc: Michael Kelley <mikelley@microsoft.com>
      Link: https://lkml.kernel.org/r/20181102180248.13583-1-longli@linuxonhyperv.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      46ed4f4f
  11. 14 November 2018, 1 commit
    • genirq: Fix race on spurious interrupt detection · e6d2f788
      Authored by Lukas Wunner
      commit 746a923b863a1065ef77324e1e43f19b1a3eab5c upstream.
      
      Commit 1e77d0a1 ("genirq: Sanitize spurious interrupt detection of
      threaded irqs") made detection of spurious interrupts work for threaded
      handlers by:
      
      a) incrementing a counter every time the thread returns IRQ_HANDLED, and
      b) checking whether that counter has increased every time the thread is
         woken.
      
      However for oneshot interrupts, the commit unmasks the interrupt before
      incrementing the counter.  If another interrupt occurs right after
      unmasking but before the counter is incremented, that interrupt is
      incorrectly considered spurious:
      
      time
       |  irq_thread()
       |    irq_thread_fn()
       |      action->thread_fn()
       |      irq_finalize_oneshot()
       |        unmask_threaded_irq()            /* interrupt is unmasked */
       |
       |                  /* interrupt fires, incorrectly deemed spurious */
       |
       |    atomic_inc(&desc->threads_handled); /* counter is incremented */
       v
      
      This is observed with a hi3110 CAN controller receiving data at high volume
      (from a separate machine sending with "cangen -g 0 -i -x"): The controller
      signals a huge number of interrupts (hundreds of millions per day) and
      every second there are about a dozen which are deemed spurious.
      
      In theory with high CPU load and the presence of higher priority tasks, the
      number of incorrectly detected spurious interrupts might increase beyond
      the 99,900 threshold and cause disablement of the interrupt.
      
      In practice it just increments the spurious interrupt count. But that can
      cause people to waste time investigating it over and over.
      
      Fix it by moving the accounting before the invocation of
      irq_finalize_oneshot().
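      
      A sketch of the reordered thread handler (kernel/irq/manage.c,
      reconstructed); the counter is now incremented before the interrupt is
      unmasked in irq_finalize_oneshot():
      
        static irqreturn_t irq_thread_fn(struct irq_desc *desc,
                                         struct irqaction *action)
        {
                irqreturn_t ret;
        
                ret = action->thread_fn(action->irq, action->dev_id);
                if (ret == IRQ_HANDLED)
                        atomic_inc(&desc->threads_handled);
        
                irq_finalize_oneshot(desc, action);
                return ret;
        }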
      
      [ tglx: Folded change log update ]
      
      Fixes: 1e77d0a1 ("genirq: Sanitize spurious interrupt detection of threaded irqs")
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Mathias Duckeck <m.duckeck@kunbus.de>
      Cc: Akshay Bhat <akshay.bhat@timesys.com>
      Cc: Casey Fitzpatrick <casey.fitzpatrick@timesys.com>
      Cc: stable@vger.kernel.org # v3.16+
      Link: https://lkml.kernel.org/r/1dfd8bbd16163940648045495e3e9698e63b50ad.1539867047.git.lukas@wunner.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e6d2f788
  12. 03 August 2018, 2 commits
    • genirq: Make force irq threading setup more robust · d1f0301b
      Authored by Thomas Gleixner
      The support of force threading interrupts which are set up with both a
      primary and a threaded handler wrecked the setup of regular requested
      threaded interrupts (primary handler == NULL).
      
      The reason is that it does not check whether the primary handler is set to
      the default handler which wakes the handler thread. Instead it replaces the
      thread handler with the primary handler as it would do with force threaded
      interrupts which have been requested via request_irq(). So both the primary
      and the thread handler become the same, which then triggers the warning that
      the thread handler tries to wake up a not configured secondary thread.
      
      Fortunately this only happens when the driver omits the IRQF_ONESHOT flag
      when requesting the threaded interrupt, which is normally caught by the
      sanity checks when force irq threading is disabled.
      
      Fix it by skipping the force threading setup when a regular threaded
      interrupt is requested. As a consequence an interrupt request which lacks
      the IRQF_ONESHOT flag is rejected correctly instead of being silently
      wrecked.
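      
      A sketch of the added bail-out in irq_setup_forced_threading()
      (kernel/irq/manage.c, reconstructed); requests which already use the default
      primary handler are left alone:
      
        static int irq_setup_forced_threading(struct irqaction *new)
        {
                if (!force_irqthreads)
                        return 0;
                if (new->flags & (IRQF_NO_THREAD | IRQF_PERCPU | IRQF_ONESHOT))
                        return 0;
        
                /*
                 * No further action required for interrupts which are requested
                 * as threaded interrupts already
                 */
                if (new->handler == irq_default_primary_handler)
                        return 0;
        
                new->flags |= IRQF_ONESHOT;
                /* ... set up the forced threading handlers ... */
                return 0;
        }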
      
      Fixes: 2a1d3ab8 ("genirq: Handle force threading of irqs with primary and thread handler")
      Reported-by: Kurt Kanzenbach <kurt.kanzenbach@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Kurt Kanzenbach <kurt.kanzenbach@linutronix.de>
      Cc: stable@vger.kernel.org
      d1f0301b
    • genirq/irqchip: Remove MULTI_IRQ_HANDLER as it's now obselete · 4f7799d9
      Authored by Palmer Dabbelt
      Now that every user of MULTI_IRQ_HANDLER has been converted over to use
      GENERIC_IRQ_MULTI_HANDLER, remove the references to MULTI_IRQ_HANDLER.
      Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux@armlinux.org.uk
      Cc: catalin.marinas@arm.com
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: jonas@southpole.se
      Cc: stefan.kristiansson@saunalahti.fi
      Cc: shorne@gmail.com
      Cc: jason@lakedaemon.net
      Cc: marc.zyngier@arm.com
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: nicolas.pitre@linaro.org
      Cc: vladimir.murzin@arm.com
      Cc: keescook@chromium.org
      Cc: jinb.park7@gmail.com
      Cc: yamada.masahiro@socionext.com
      Cc: alexandre.belloni@bootlin.com
      Cc: pombredanne@nexb.com
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: kstewart@linuxfoundation.org
      Cc: jhogan@kernel.org
      Cc: mark.rutland@arm.com
      Cc: ard.biesheuvel@linaro.org
      Cc: james.morse@arm.com
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: openrisc@lists.librecores.org
      Link: https://lkml.kernel.org/r/20180622170126.6308-6-palmer@sifive.com
      4f7799d9
  13. 17 July 2018, 1 commit
  14. 24 June 2018, 2 commits
    • genirq: Synchronize only with single thread on free_irq() · 519cc865
      Authored by Lukas Wunner
      When pciehp is converted to threaded IRQ handling, removal of unplugged
      devices below a PCIe hotplug port happens synchronously in the IRQ thread.
      Removal of devices typically entails a call to free_irq() by their drivers.
      
      If those devices share their IRQ with the hotplug port, __free_irq()
      deadlocks because it calls synchronize_irq() to wait for all hard IRQ
      handlers as well as all threads sharing the IRQ to finish.
      
      Actually it's sufficient to wait only for the IRQ thread of the removed
      device, so call synchronize_hardirq() to wait for all hard IRQ handlers to
      finish, but no longer for any threads.  Compensate by rearranging the
      control flow in irq_wait_for_interrupt() such that the device's thread is
      allowed to run one last time after kthread_stop() has been called.
      
      kthread_stop() blocks until the IRQ thread has completed.  On completion
      the IRQ thread clears its oneshot thread_mask bit.  This is safe because
      __free_irq() holds the request_mutex, thereby preventing __setup_irq() from
      handing out the same oneshot thread_mask bit to a newly requested action.
      
      Stack trace for posterity:
          INFO: task irq/17-pciehp:94 blocked for more than 120 seconds.
          schedule+0x28/0x80
          synchronize_irq+0x6e/0xa0
          __free_irq+0x15a/0x2b0
          free_irq+0x33/0x70
          pciehp_release_ctrl+0x98/0xb0
          pcie_port_remove_service+0x2f/0x40
          device_release_driver_internal+0x157/0x220
          bus_remove_device+0xe2/0x150
          device_del+0x124/0x340
          device_unregister+0x16/0x60
          remove_iter+0x1a/0x20
          device_for_each_child+0x4b/0x90
          pcie_port_device_remove+0x1e/0x30
          pci_device_remove+0x36/0xb0
          device_release_driver_internal+0x157/0x220
          pci_stop_bus_device+0x7d/0xa0
          pci_stop_bus_device+0x3d/0xa0
          pci_stop_and_remove_bus_device+0xe/0x20
          pciehp_unconfigure_device+0xb8/0x160
          pciehp_disable_slot+0x84/0x130
          pciehp_ist+0x158/0x190
          irq_thread_fn+0x1b/0x50
          irq_thread+0x143/0x1a0
          kthread+0x111/0x130
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: linux-pci@vger.kernel.org
      Link: https://lkml.kernel.org/r/d72b41309f077c8d3bee6cc08ad3662d50b5d22a.1529828292.git.lukas@wunner.de
      519cc865
    • genirq: Update code comments wrt recycled thread_mask · 836557bd
      Authored by Lukas Wunner
      Previously a race existed between __free_irq() and __setup_irq() wherein
      the thread_mask of a just removed action could be handed out to a newly
      added action and the freed irq thread would then tread on the oneshot
      mask bit of the newly added irq thread in irq_finalize_oneshot():
      
      time
       |  __free_irq()
       |    raw_spin_lock_irqsave(&desc->lock, flags);
       |    <remove action from linked list>
       |    raw_spin_unlock_irqrestore(&desc->lock, flags);
       |
       |  __setup_irq()
       |    raw_spin_lock_irqsave(&desc->lock, flags);
       |    <traverse linked list to determine oneshot mask bit>
       |    raw_spin_unlock_irqrestore(&desc->lock, flags);
       |
       |  irq_thread() of freed irq (__free_irq() waits in synchronize_irq())
       |    irq_thread_fn()
       |      irq_finalize_oneshot()
       |        raw_spin_lock_irq(&desc->lock);
       |        desc->threads_oneshot &= ~action->thread_mask;
       |        raw_spin_unlock_irq(&desc->lock);
       v
      
      The race was known at least since 2012 when it was documented in a code
      comment by commit e04268b0 ("genirq: Remove paranoid warnons and bogus
      fixups"). The race itself is harmless as nothing touches any of the
      potentially freed data after synchronize_irq().
      
      In 2017 the race was closed by commit 9114014c ("genirq: Add mutex to
      irq desc to serialize request/free_irq()"), apparently inadvertently so,
      because the race is neither mentioned in the commit message nor was the
      code comment updated.  Make up for that.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: linux-pci@vger.kernel.org
      Link: https://lkml.kernel.org/r/32fc25aa35ecef4b2692f57687bb7fc2a57230e2.1529828292.git.lukas@wunner.de
      836557bd
  15. 22 June 2018, 2 commits
  16. 19 June 2018, 2 commits
  17. 06 June 2018, 4 commits
    • genirq/affinity: Defer affinity setting if irq chip is busy · 12f47073
      Authored by Thomas Gleixner
      The case that interrupt affinity setting fails with -EBUSY can be handled
      in the kernel completely by using the already available generic pending
      infrastructure.
      
      If an irq_chip::set_affinity() fails with -EBUSY, handle it like the
      interrupts for which irq_chip::set_affinity() can only be invoked from
      interrupt context. Copy the new affinity mask to irq_desc::pending_mask and
      set the affinity pending bit. The next raised interrupt for the affected
      irq will check the pending bit and try to set the new affinity from the
      handler. This avoids that -EBUSY is returned when an affinity change is
      requested from user space and the previous change has not been cleaned
      up. The new affinity will take effect when the next interrupt is raised
      from the device.
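      
      A sketch of the deferral path (kernel/irq/manage.c, reconstructed): when
      irq_do_set_affinity() reports -EBUSY, the request is parked in the generic
      pending machinery instead of being returned to the caller:
      
        static int irq_try_set_affinity(struct irq_data *data,
                                        const struct cpumask *dest, bool force)
        {
                int ret = irq_do_set_affinity(data, dest, force);
        
                /*
                 * In case the underlying vector management is busy and the
                 * architecture supports the generic pending mechanism, utilize
                 * it to avoid returning an error to user space.
                 */
                if (ret == -EBUSY && !force) {
                        irqd_set_move_pending(data);
                        irq_copy_pending(irq_data_to_desc(data), dest);
                        ret = 0;
                }
                return ret;
        }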
      
      Fixes: dccfe314 ("x86/vector: Simplify vector move cleanup")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Song Liu <songliubraving@fb.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <liu.song.a23@gmail.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: stable@vger.kernel.org
      Cc: Mike Travis <mike.travis@hpe.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Tariq Toukan <tariqt@mellanox.com>
      Link: https://lkml.kernel.org/r/20180604162224.819273597@linutronix.de
      12f47073
    • genirq/migration: Avoid out of line call if pending is not set · d340ebd6
      Authored by Thomas Gleixner
      The upcoming fix for the -EBUSY return from affinity settings requires using
      the irq_move_irq() functionality even on irq remapped interrupts. To
      avoid the out of line call, move the check for the pending bit into an
      inline helper.
      
      Preparatory change for the real fix. No functional change.
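      
      A sketch of the resulting inline helper (include/linux/irq.h,
      reconstructed); the out of line __irq_move_irq() is only called when a move
      is actually pending:
      
        void __irq_move_irq(struct irq_data *data);
        
        static inline void irq_move_irq(struct irq_data *data)
        {
                if (unlikely(irqd_is_setaffinity_pending(data)))
                        __irq_move_irq(data);
        }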
      
      Fixes: dccfe314 ("x86/vector: Simplify vector move cleanup")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <liu.song.a23@gmail.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: stable@vger.kernel.org
      Cc: Mike Travis <mike.travis@hpe.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Tariq Toukan <tariqt@mellanox.com>
      Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Link: https://lkml.kernel.org/r/20180604162224.471925894@linutronix.de
      d340ebd6
    • genirq/generic_pending: Do not lose pending affinity update · a33a5d2d
      Authored by Thomas Gleixner
      The generic pending interrupt mechanism moves interrupts from the interrupt
      handler on the original target CPU to the new destination CPU. This is
      required for x86 and ia64 due to the way the interrupt delivery and
      acknowledge works if the interrupts are not remapped.
      
      However that update can fail for various reasons. Some of them are valid
      reasons to discard the pending update, but the case when the previous move
      has not been fully cleaned up is not a legitimate reason to fail.
      
      Check the return value of irq_do_set_affinity() for -EBUSY, which indicates
      a pending cleanup, and rearm the pending move in the irq descriptor so it's
      tried again when the next interrupt arrives.
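      
      A sketch of the rearm logic in irq_move_masked_irq() (kernel/irq/migration.c,
      reconstructed); on -EBUSY the pending mask is kept and the move pending bit
      is set again:
      
        ret = irq_do_set_affinity(data, desc->pending_mask, false);
        /*
         * If there is a cleanup pending in the underlying vector
         * management, reschedule the move for the next interrupt.
         * Leave desc->pending_mask intact.
         */
        if (ret == -EBUSY) {
                irqd_set_move_pending(data);
                return;
        }
        cpumask_clear(desc->pending_mask);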
      
      Fixes: 996c591227d9 ("x86/irq: Plug vector cleanup race")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Song Liu <songliubraving@fb.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <liu.song.a23@gmail.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: stable@vger.kernel.org
      Cc: Mike Travis <mike.travis@hpe.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Tariq Toukan <tariqt@mellanox.com>
      Link: https://lkml.kernel.org/r/20180604162224.386544292@linutronix.de
      a33a5d2d
    • ide: don't enable/disable interrupts in force threaded-IRQ mode · 47b82e88
      Authored by Sebastian Andrzej Siewior
      The interrupts are enabled/disabled so the interrupt handler can run
      with enabled interrupts while serving the interrupt and not lose other
      interrupts, especially the timer tick.
      If the system runs with force-threaded interrupts then there is no need
      to enable the interrupts.
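      
      A hypothetical illustration of the pattern described above (not the verbatim
      driver hunk): the hard IRQ path only re-enables interrupts when the kernel
      is not running with force-threaded IRQs:
      
        /* Threaded handlers already run with interrupts enabled */
        if (!force_irqthreads)
                local_irq_enable_in_hardirq();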
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      47b82e88
  18. 16 May 2018, 1 commit