1. May 10, 2019 (1 commit)
  2. Apr 17, 2019 (2 commits)
    • genirq: Initialize request_mutex if CONFIG_SPARSE_IRQ=n · 8b4f68b4
      Authored by Kefeng Wang
      commit e8458e7afa855317b14915d7b86ab3caceea7eb6 upstream.
      
      When CONFIG_SPARSE_IRQ is disabled, the request_mutex in struct irq_desc
      is not initialized, which causes malfunction.
      
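      The missing initialization amounts to something like the following sketch,
      written with stand-in types rather than the kernel's real definitions (the
      actual fix adds a mutex_init() call to the CONFIG_SPARSE_IRQ=n descriptor
      setup in kernel/irq/irqdesc.c):

        /* Illustrative sketch only; stand-in types, not the kernel's. */
        struct mutex { int locked; };
        static void mutex_init(struct mutex *m) { m->locked = 0; }

        #define NR_IRQS 64
        struct irq_desc { struct mutex request_mutex; /* ... other fields ... */ };
        static struct irq_desc irq_desc[NR_IRQS];

        static void early_irq_init_sketch(void)
        {
                int i;

                for (i = 0; i < NR_IRQS; i++) {
                        /* The fix: initialize request_mutex for the statically
                         * allocated descriptors, as the sparse-IRQ path already
                         * does for dynamically allocated ones. */
                        mutex_init(&irq_desc[i].request_mutex);
                }
        }
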
      Fixes: 9114014c ("genirq: Add mutex to irq desc to serialize request/free_irq()")
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: <linux-arm-kernel@lists.infradead.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190404074512.145533-1-wangkefeng.wang@huawei.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • genirq: Respect IRQCHIP_SKIP_SET_WAKE in irq_chip_set_wake_parent() · cd5b06a9
      Authored by Stephen Boyd
      commit 325aa19598e410672175ed50982f902d4e3f31c5 upstream.
      
      If a child irqchip calls irq_chip_set_wake_parent() but its parent irqchip
      has the IRQCHIP_SKIP_SET_WAKE flag set, an error is returned.
      
      This is inconsistent behaviour vs. set_irq_wake_real() which returns 0 when
      the irqchip has the IRQCHIP_SKIP_SET_WAKE flag set. It doesn't attempt to
      walk the chain of parents and set irq wake on any chips that don't have the
      flag set either. If the intent is to call the .irq_set_wake() callback of
      the parent irqchip, then we expect irqchip implementations to omit the
      IRQCHIP_SKIP_SET_WAKE flag and implement an .irq_set_wake() function that
      calls irq_chip_set_wake_parent().
      
      The problem has been observed on a Qualcomm sdm845 device where set wake
      fails on any GPIO interrupts after applying work in progress wakeup irq
      patches to the GPIO driver. The chain of chips looks like this:
      
           QCOM GPIO -> QCOM PDC (SKIP) -> ARM GIC (SKIP)
      
      The GPIO controller's parent is the QCOM PDC irqchip which in turn has the ARM
      GIC as parent.  The QCOM PDC irqchip has the IRQCHIP_SKIP_SET_WAKE flag
      set, and so does the grandparent ARM GIC.
      
      The GPIO driver doesn't know if the parent needs to set wake or not, so it
      unconditionally calls irq_chip_set_wake_parent() causing this function to
      return a failure because the parent irqchip (PDC) doesn't have the
      .irq_set_wake() callback set. Returning 0 instead makes everything work and
      irqs from the GPIO controller can be configured for wakeup.
      
      Make it consistent by returning 0 (success) from irq_chip_set_wake_parent()
      when a parent chip has IRQCHIP_SKIP_SET_WAKE set.
      
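      A sketch of the resulting logic, using stand-in types instead of the
      kernel's struct definitions (the real function lives in kernel/irq/chip.c):

        #include <errno.h>

        /* Stand-in types for illustration only. */
        struct irq_data;
        struct irq_chip {
                unsigned int flags;
                int (*irq_set_wake)(struct irq_data *data, unsigned int on);
        };
        struct irq_data { struct irq_data *parent_data; struct irq_chip *chip; };
        #define IRQCHIP_SKIP_SET_WAKE 0x1

        static int irq_chip_set_wake_parent(struct irq_data *data, unsigned int on)
        {
                data = data->parent_data;

                /* Parent opted out of set_wake: report success instead of
                 * failing, matching what set_irq_wake_real() does. */
                if (data->chip->flags & IRQCHIP_SKIP_SET_WAKE)
                        return 0;

                if (data->chip->irq_set_wake)
                        return data->chip->irq_set_wake(data, on);

                return -ENOSYS;
        }
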
      [ tglx: Massaged changelog ]
      
      Fixes: 08b55e2a ("genirq: Add irqchip_set_wake_parent")
      Signed-off-by: Stephen Boyd <swboyd@chromium.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-gpio@vger.kernel.org
      Cc: Lina Iyer <ilina@codeaurora.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190325181026.247796-1-swboyd@chromium.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  3. Apr 06, 2019 (1 commit)
    • genirq: Avoid summation loops for /proc/stat · 1f369486
      Authored by Thomas Gleixner
      [ Upstream commit 1136b0728969901a091f0471968b2b76ed14d9ad ]
      
      Waiman reported that on large systems with a large number of interrupts the
      readout of /proc/stat takes a long time to sum up the interrupt
      statistics. In principle this is not a problem, but for unknown reasons
      some enterprise quality software reads /proc/stat with a high frequency.
      
      The reason for this is that interrupt statistics are accounted per cpu. So
      the /proc/stat logic has to sum up the interrupt stats for each interrupt.
      
      This can be largely avoided for interrupts which are not marked as
      'PER_CPU' interrupts by simply adding a per interrupt summation counter
      which is incremented along with the per interrupt per cpu counter.
      
      The PER_CPU interrupts need to avoid that and use only per cpu accounting
      because they share the interrupt number and the interrupt descriptor and
      concurrent updates would conflict or require unwanted synchronization.
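      
      A self-contained sketch of the idea, using plain arrays in place of the
      kernel's per-CPU machinery (names are illustrative):

        #define NR_CPUS 4

        /* Per-interrupt bookkeeping: per-CPU counts plus a summation counter. */
        struct irq_stat {
                unsigned long per_cpu[NR_CPUS];
                unsigned long total;        /* incremented alongside per_cpu */
                int is_per_cpu_irq;         /* PER_CPU irqs keep per-CPU counts only */
        };

        static void account_irq(struct irq_stat *s, int cpu)
        {
                s->per_cpu[cpu]++;
                if (!s->is_per_cpu_irq)
                        s->total++;
        }

        static unsigned long stat_read(const struct irq_stat *s)
        {
                unsigned long sum = 0;

                if (!s->is_per_cpu_irq)
                        return s->total;    /* O(1) readout for /proc/stat */

                for (int cpu = 0; cpu < NR_CPUS; cpu++)
                        sum += s->per_cpu[cpu];
                return sum;
        }
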
      Reported-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Waiman Long <longman@redhat.com>
      Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Daniel Colascione <dancol@google.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Link: https://lkml.kernel.org/r/20190208135020.925487496@linutronix.de
      
      8<-------------
      
      v2: Undo the unintentional layout change of struct irq_desc.
      
       include/linux/irqdesc.h |    1 +
       kernel/irq/chip.c       |   12 ++++++++++--
       kernel/irq/internals.h  |    8 +++++++-
       kernel/irq/irqdesc.c    |    7 ++++++-
       4 files changed, 24 insertions(+), 4 deletions(-)
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  4. Mar 06, 2019 (4 commits)
    • genirq: Make sure the initial affinity is not empty · 17fab891
      Authored by Srinivas Ramana
      [ Upstream commit bddda606ec76550dd63592e32a6e87e7d32583f7 ]
      
      If all CPUs in the irq_default_affinity mask are offline when an interrupt
      is initialized then irq_setup_affinity() can set an empty affinity mask for
      a newly allocated interrupt.
      
      Fix this by falling back to cpu_online_mask in case the resulting affinity
      mask is zero.
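      
      A minimal sketch of the fallback, with a simplified bitmask standing in for
      the kernel's cpumask type:

        typedef unsigned long cpumask_sketch_t;     /* stand-in for struct cpumask */

        /* Never hand out an empty affinity mask: fall back to the online CPUs. */
        static cpumask_sketch_t setup_initial_affinity(cpumask_sketch_t dflt,
                                                       cpumask_sketch_t online)
        {
                cpumask_sketch_t mask = dflt & online;

                return mask ? mask : online;
        }
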
      Signed-off-by: Srinivas Ramana <sramana@codeaurora.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arm-msm@vger.kernel.org
      Link: https://lkml.kernel.org/r/1545312957-8504-1-git-send-email-sramana@codeaurora.org
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • genirq/matrix: Improve target CPU selection for managed interrupts. · 765c30b3
      Authored by Long Li
      [ Upstream commit e8da8794a7fd9eef1ec9a07f0d4897c68581c72b ]
      
      On large systems with multiple devices of the same class (e.g. NVMe disks,
      using managed interrupts), the kernel can affinitize these interrupts to a
      small subset of CPUs instead of spreading them out evenly.
      
      irq_matrix_alloc_managed() tries to select the CPU in the supplied cpumask
      of possible target CPUs which has the lowest number of interrupt vectors
      allocated.
      
      This is done by searching the CPU with the highest number of available
      vectors. While this is correct for non-managed interrupts it can select the
      wrong
      CPU for managed interrupts. Under certain constellations this results in
      affinitizing the managed interrupts of several devices to a single CPU in
      a set.
      
      The bookkeeping of available vectors works the following way:
      
       1) Non-managed interrupts:
      
          available is decremented when the interrupt is actually requested by
          the device driver and a vector is assigned. It's incremented when the
          interrupt and the vector are freed.
      
       2) Managed interrupts:
      
          Managed interrupts guarantee vector reservation when the MSI/MSI-X
          functionality of a device is enabled, which is achieved by reserving
          vectors in the bitmaps of the possible target CPUs. This reservation
          decrements the available count on each possible target CPU.
      
          When the interrupt is requested by the device driver then a vector is
          allocated from the reserved region. The operation is reversed when the
          interrupt is freed by the device driver. Neither of these operations
          affect the available count.
      
      The reservation persists up to the point where the MSI/MSI-X
          functionality is disabled and only this operation increments the
          available count again.
      
      For non-managed interrupts the available count is the correct selection
      criterion because the guaranteed reservations need to be taken into
      account. Using the allocated counter could lead to a failing allocation in
      the following situation (total vector space of 10 assumed):
      
      		 CPU0	CPU1
       available:	    2	   0
       allocated:	    5	   3   <--- CPU1 is selected, but available space = 0
       managed reserved:  3	   7
      
       while available yields the correct result.
      
      For managed interrupts the available count is not the appropriate
      selection criterion because as explained above the available count is not
      affected by the actual vector allocation.
      
      The following example illustrates that. Total vector space of 10
      assumed. The starting point is:
      
      		 CPU0	CPU1
       available:	    5	   4
       allocated:	    2	   3
       managed reserved:  3	   3
      
       Allocating vectors for three non-managed interrupts will result in
       affinitizing the first two to CPU0 and the third one to CPU1 because the
       available count is adjusted with each allocation:
      
      		  CPU0	CPU1
       available:	     5	   4	<- Select CPU0 for 1st allocation
       --> allocated:	     3	   3
      
       available:	     4	   4	<- Select CPU0 for 2nd allocation
       --> allocated:	     4	   3
      
       available:	     3	   4	<- Select CPU1 for 3rd allocation
       --> allocated:	     4	   4
      
       But the allocation of three managed interrupts starting from the same
       point will affinitize all of them to CPU0 because the available count is
       not affected by the allocation (see above). So the end result is:
      
      		  CPU0	CPU1
       available:	     5	   4
       allocated:	     5	   3
      
      Introduce a "managed_allocated" field in struct cpumap to track the vector
      allocation for managed interrupts separately. Use this information to
      select the target CPU when a vector is allocated for a managed interrupt,
      which results in more evenly distributed vector assignments. The above
      example results in the following allocations:
      
      		 CPU0	CPU1
       managed_allocated: 0	   0	<- Select CPU0 for 1st allocation
       --> allocated:	    3	   3
      
       managed_allocated: 1	   0	<- Select CPU1 for 2nd allocation
       --> allocated:	    3	   4
      
       managed_allocated: 1	   1	<- Select CPU0 for 3rd allocation
       --> allocated:	    4	   4
      
      The allocation of non-managed interrupts is not affected by this change and
      is still evaluating the available count.
      
      The overall distribution of interrupt vectors for both types of interrupts
      might still not be perfectly even depending on the number of non-managed
      and managed interrupts in a system, but due to the reservation guarantee
      for managed interrupts this cannot be avoided.
      
      Expose the new field in debugfs as well.
      
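      A self-contained sketch of the new selection criterion, with a simplified
      stand-in for struct cpumap (the real helper walks the supplied cpumask of
      online target CPUs in kernel/irq/matrix.c):

        #include <limits.h>

        struct cpumap_sketch {                  /* stand-in for struct cpumap */
                unsigned int online;
                unsigned int available;         /* criterion for non-managed irqs */
                unsigned int managed_allocated; /* new field, criterion below */
        };

        /* Pick the online CPU with the fewest managed vectors already allocated. */
        static unsigned int find_best_cpu_managed(const struct cpumap_sketch *maps,
                                                  unsigned int nr_cpus)
        {
                unsigned int cpu, best_cpu = UINT_MAX, least = UINT_MAX;

                for (cpu = 0; cpu < nr_cpus; cpu++) {
                        if (!maps[cpu].online || maps[cpu].managed_allocated >= least)
                                continue;
                        best_cpu = cpu;
                        least = maps[cpu].managed_allocated;
                }
                return best_cpu;
        }
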
      [ tglx: Clarified the background of the problem in the changelog and
        	described it independent of NVME ]
      Signed-off-by: Long Li <longli@microsoft.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Michael Kelley <mikelley@microsoft.com>
      Link: https://lkml.kernel.org/r/20181106040000.27316-1-longli@linuxonhyperv.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • irq/matrix: Spread managed interrupts on allocation · 8cae7757
      Authored by Dou Liyang
      [ Upstream commit 76f99ae5b54d48430d1f0c5512a84da0ff9761e0 ]
      
      Linux spreads out the non-managed interrupts across the possible target
      CPUs to avoid vector space exhaustion.
      
      Managed interrupts are treated differently, as for them the vectors are
      reserved (with guarantee) when the interrupt descriptors are initialized.
      
      When the interrupt is requested a real vector is assigned. The assignment
      logic uses the first CPU in the affinity mask for assignment. If the
      interrupt has more than one CPU in the affinity mask, which happens when a
      multi queue device has fewer queues than CPUs, then doing the same search as
      for non-managed interrupts makes sense as it puts the interrupt on the
      least interrupt plagued CPU. For single CPU affine vectors that's obviously
      a NOOP.
      
      Restructure the matrix allocation code so it does the 'best CPU' search, add
      the sanity check for an empty affinity mask and adapt the call site in the
      x86 vector management code.
      
      [ tglx: Added the empty mask check to the core and improved change log ]
      Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: hpa@zytor.com
      Link: https://lkml.kernel.org/r/20180908175838.14450-2-dou_liyang@163.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • irq/matrix: Split out the CPU selection code into a helper · 2948b887
      Authored by Dou Liyang
      [ Upstream commit 8ffe4e61c06a48324cfd97f1199bb9838acce2f2 ]
      
      Linux finds the CPU which has the lowest vector allocation count to spread
      out the non-managed interrupts across the possible target CPUs, but does
      not do so for managed interrupts.
      
      Split out the CPU selection code into a helper function for reuse. No
      functional change.
      Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: hpa@zytor.com
      Link: https://lkml.kernel.org/r/20180908175838.14450-1-dou_liyang@163.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  5. Feb 13, 2019 (1 commit)
    • genirq/affinity: Spread IRQs to all available NUMA nodes · 46ed4f4f
      Authored by Long Li
      [ Upstream commit b82592199032bf7c778f861b936287e37ebc9f62 ]
      
      If the number of NUMA nodes exceeds the number of MSI/MSI-X interrupts
      which are allocated for a device, the interrupt affinity spreading code
      fails to spread them across all nodes.
      
      The reason is that the spreading code starts from node 0 and continues up
      to the number of interrupts requested for allocation. This leaves the nodes
      past the last interrupt unused.
      
      This results in interrupt concentration on the first nodes which violates
      the assumption of the block layer that all nodes are covered evenly. As a
      consequence the NUMA nodes above the number of interrupts are all assigned
      to hardware queue 0 and therefore NUMA node 0, which results in bad
      performance and has CPU hotplug implications, because queue 0 gets shut
      down when the last CPU of node 0 is offlined.
      
      Go over all NUMA nodes and assign them round-robin to all requested
      interrupts to solve this.
      
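      A self-contained sketch of the round-robin assignment (illustrative only;
      the real logic spreads per NUMA node via cpumasks in kernel/irq/affinity.c):

        #define NR_NODES 8

        /* Map every NUMA node to one of the requested vectors, wrapping around
         * so nodes beyond the vector count are still covered. */
        static void assign_nodes_round_robin(int vector_of_node[NR_NODES], int nvecs)
        {
                for (int node = 0; node < NR_NODES; node++)
                        vector_of_node[node] = node % nvecs;
        }
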
      [ tglx: Massaged changelog ]
      Signed-off-by: Long Li <longli@microsoft.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Cc: Michael Kelley <mikelley@microsoft.com>
      Link: https://lkml.kernel.org/r/20181102180248.13583-1-longli@linuxonhyperv.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  6. Nov 14, 2018 (1 commit)
    • genirq: Fix race on spurious interrupt detection · e6d2f788
      Authored by Lukas Wunner
      commit 746a923b863a1065ef77324e1e43f19b1a3eab5c upstream.
      
      Commit 1e77d0a1 ("genirq: Sanitize spurious interrupt detection of
      threaded irqs") made detection of spurious interrupts work for threaded
      handlers by:
      
      a) incrementing a counter every time the thread returns IRQ_HANDLED, and
      b) checking whether that counter has increased every time the thread is
         woken.
      
      However for oneshot interrupts, the commit unmasks the interrupt before
      incrementing the counter.  If another interrupt occurs right after
      unmasking but before the counter is incremented, that interrupt is
      incorrectly considered spurious:
      
      time
       |  irq_thread()
       |    irq_thread_fn()
       |      action->thread_fn()
       |      irq_finalize_oneshot()
       |        unmask_threaded_irq()            /* interrupt is unmasked */
       |
       |                  /* interrupt fires, incorrectly deemed spurious */
       |
       |    atomic_inc(&desc->threads_handled); /* counter is incremented */
       v
      
      This is observed with a hi3110 CAN controller receiving data at high volume
      (from a separate machine sending with "cangen -g 0 -i -x"): The controller
      signals a huge number of interrupts (hundreds of millions per day) and
      every second there are about a dozen which are deemed spurious.
      
      In theory with high CPU load and the presence of higher priority tasks, the
      number of incorrectly detected spurious interrupts might increase beyond
      the 99,900 threshold and cause disablement of the interrupt.
      
      In practice it just increments the spurious interrupt count. But that can
      cause people to waste time investigating it over and over.
      
      Fix it by moving the accounting before the invocation of
      irq_finalize_oneshot().
      
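      A sketch of the reordered thread-handler path, with stand-in types (the real
      change is in the threaded-handler wrappers in kernel/irq/manage.c):

        /* Stand-ins for illustration only. */
        typedef enum { IRQ_NONE, IRQ_HANDLED, IRQ_WAKE_THREAD } irqreturn_t;
        struct irqaction { irqreturn_t (*thread_fn)(int irq, void *dev_id); int irq; void *dev_id; };
        struct irq_desc { long threads_handled; };

        static void irq_finalize_oneshot(struct irq_desc *desc, struct irqaction *action)
        {
                /* unmasks the oneshot interrupt line again (elided) */
        }

        static irqreturn_t irq_thread_fn_sketch(struct irq_desc *desc, struct irqaction *action)
        {
                irqreturn_t ret = action->thread_fn(action->irq, action->dev_id);

                /* Account the handled interrupt *before* unmasking, so an
                 * interrupt firing right after the unmask is not compared
                 * against a stale counter and misdetected as spurious. */
                if (ret == IRQ_HANDLED)
                        desc->threads_handled++;    /* atomic_inc() in the kernel */

                irq_finalize_oneshot(desc, action);
                return ret;
        }
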
      [ tglx: Folded change log update ]
      
      Fixes: 1e77d0a1 ("genirq: Sanitize spurious interrupt detection of threaded irqs")
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Mathias Duckeck <m.duckeck@kunbus.de>
      Cc: Akshay Bhat <akshay.bhat@timesys.com>
      Cc: Casey Fitzpatrick <casey.fitzpatrick@timesys.com>
      Cc: stable@vger.kernel.org # v3.16+
      Link: https://lkml.kernel.org/r/1dfd8bbd16163940648045495e3e9698e63b50ad.1539867047.git.lukas@wunner.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  7. Aug 03, 2018 (2 commits)
    • genirq: Make force irq threading setup more robust · d1f0301b
      Authored by Thomas Gleixner
      The support for force threading of interrupts which are set up with both a
      primary and a threaded handler broke the setup of regular requested
      threaded interrupts (primary handler == NULL).
      
      The reason is that it does not check whether the primary handler is set to
      the default handler which wakes the handler thread. Instead it replaces the
      thread handler with the primary handler as it would do with force threaded
      interrupts which have been requested via request_irq(). So both the primary
      and the thread handler become the same, which then triggers the warning that
      the thread handler tries to wake up an unconfigured secondary thread.
      
      Fortunately this only happens when the driver omits the IRQF_ONESHOT flag
      when requesting the threaded interrupt, which is normally caught by the
      sanity checks when force irq threading is disabled.
      
      Fix it by skipping the force threading setup when a regular threaded
      interrupt is requested. As a consequence the interrupt request which lacks
      the IRQF_ONESHOT flag is rejected correctly instead of being silently
      broken.
      
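      A sketch of the added guard (stand-in names; the real check compares against
      the kernel's default primary handler in irq_setup_forced_threading()):

        /* Stand-ins for illustration only. */
        typedef int irqreturn_t;
        struct irqaction { irqreturn_t (*handler)(int, void *); irqreturn_t (*thread_fn)(int, void *); };
        static int force_irqthreads = 1;
        static irqreturn_t default_primary_handler_sketch(int irq, void *dev_id) { return 1; }

        static void irq_setup_forced_threading_sketch(struct irqaction *new)
        {
                if (!force_irqthreads)
                        return;

                /* A regular threaded request already uses the default primary
                 * handler that only wakes the thread: skip the force-threading
                 * rewrite instead of clobbering the thread handler. */
                if (new->handler == default_primary_handler_sketch)
                        return;

                /* ... convert a real primary handler into a threaded one ... */
        }
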
      Fixes: 2a1d3ab8 ("genirq: Handle force threading of irqs with primary and thread handler")
      Reported-by: Kurt Kanzenbach <kurt.kanzenbach@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Kurt Kanzenbach <kurt.kanzenbach@linutronix.de>
      Cc: stable@vger.kernel.org
    • genirq/irqchip: Remove MULTI_IRQ_HANDLER as it's now obsolete · 4f7799d9
      Authored by Palmer Dabbelt
      Now that every user of MULTI_IRQ_HANDLER has been converted over to use
      GENERIC_IRQ_MULTI_HANDLER, remove the references to MULTI_IRQ_HANDLER.
      Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux@armlinux.org.uk
      Cc: catalin.marinas@arm.com
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: jonas@southpole.se
      Cc: stefan.kristiansson@saunalahti.fi
      Cc: shorne@gmail.com
      Cc: jason@lakedaemon.net
      Cc: marc.zyngier@arm.com
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: nicolas.pitre@linaro.org
      Cc: vladimir.murzin@arm.com
      Cc: keescook@chromium.org
      Cc: jinb.park7@gmail.com
      Cc: yamada.masahiro@socionext.com
      Cc: alexandre.belloni@bootlin.com
      Cc: pombredanne@nexb.com
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: kstewart@linuxfoundation.org
      Cc: jhogan@kernel.org
      Cc: mark.rutland@arm.com
      Cc: ard.biesheuvel@linaro.org
      Cc: james.morse@arm.com
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: openrisc@lists.librecores.org
      Link: https://lkml.kernel.org/r/20180622170126.6308-6-palmer@sifive.com
  8. Jul 17, 2018 (1 commit)
  9. Jun 24, 2018 (2 commits)
    • genirq: Synchronize only with single thread on free_irq() · 519cc865
      Authored by Lukas Wunner
      When pciehp is converted to threaded IRQ handling, removal of unplugged
      devices below a PCIe hotplug port happens synchronously in the IRQ thread.
      Removal of devices typically entails a call to free_irq() by their drivers.
      
      If those devices share their IRQ with the hotplug port, __free_irq()
      deadlocks because it calls synchronize_irq() to wait for all hard IRQ
      handlers as well as all threads sharing the IRQ to finish.
      
      Actually it's sufficient to wait only for the IRQ thread of the removed
      device, so call synchronize_hardirq() to wait for all hard IRQ handlers to
      finish, but no longer for any threads.  Compensate by rearranging the
      control flow in irq_wait_for_interrupt() such that the device's thread is
      allowed to run one last time after kthread_stop() has been called.
      
      kthread_stop() blocks until the IRQ thread has completed.  On completion
      the IRQ thread clears its oneshot thread_mask bit.  This is safe because
      __free_irq() holds the request_mutex, thereby preventing __setup_irq() from
      handing out the same oneshot thread_mask bit to a newly requested action.
      
      Stack trace for posterity:
          INFO: task irq/17-pciehp:94 blocked for more than 120 seconds.
          schedule+0x28/0x80
          synchronize_irq+0x6e/0xa0
          __free_irq+0x15a/0x2b0
          free_irq+0x33/0x70
          pciehp_release_ctrl+0x98/0xb0
          pcie_port_remove_service+0x2f/0x40
          device_release_driver_internal+0x157/0x220
          bus_remove_device+0xe2/0x150
          device_del+0x124/0x340
          device_unregister+0x16/0x60
          remove_iter+0x1a/0x20
          device_for_each_child+0x4b/0x90
          pcie_port_device_remove+0x1e/0x30
          pci_device_remove+0x36/0xb0
          device_release_driver_internal+0x157/0x220
          pci_stop_bus_device+0x7d/0xa0
          pci_stop_bus_device+0x3d/0xa0
          pci_stop_and_remove_bus_device+0xe/0x20
          pciehp_unconfigure_device+0xb8/0x160
          pciehp_disable_slot+0x84/0x130
          pciehp_ist+0x158/0x190
          irq_thread_fn+0x1b/0x50
          irq_thread+0x143/0x1a0
          kthread+0x111/0x130
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: linux-pci@vger.kernel.org
      Link: https://lkml.kernel.org/r/d72b41309f077c8d3bee6cc08ad3662d50b5d22a.1529828292.git.lukas@wunner.de
    • genirq: Update code comments wrt recycled thread_mask · 836557bd
      Authored by Lukas Wunner
      Previously a race existed between __free_irq() and __setup_irq() wherein
      the thread_mask of a just removed action could be handed out to a newly
      added action and the freed irq thread would then tread on the oneshot
      mask bit of the newly added irq thread in irq_finalize_oneshot():
      
      time
       |  __free_irq()
       |    raw_spin_lock_irqsave(&desc->lock, flags);
       |    <remove action from linked list>
       |    raw_spin_unlock_irqrestore(&desc->lock, flags);
       |
       |  __setup_irq()
       |    raw_spin_lock_irqsave(&desc->lock, flags);
       |    <traverse linked list to determine oneshot mask bit>
       |    raw_spin_unlock_irqrestore(&desc->lock, flags);
       |
       |  irq_thread() of freed irq (__free_irq() waits in synchronize_irq())
       |    irq_thread_fn()
       |      irq_finalize_oneshot()
       |        raw_spin_lock_irq(&desc->lock);
       |        desc->threads_oneshot &= ~action->thread_mask;
       |        raw_spin_unlock_irq(&desc->lock);
       v
      
      The race was known at least since 2012 when it was documented in a code
      comment by commit e04268b0 ("genirq: Remove paranoid warnons and bogus
      fixups"). The race itself is harmless as nothing touches any of the
      potentially freed data after synchronize_irq().
      
      In 2017 the race was closed by commit 9114014c ("genirq: Add mutex to
      irq desc to serialize request/free_irq()"), apparently inadvertently so
      because the race is neither mentioned in the commit message nor was the
      code comment updated.  Make up for that.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: linux-pci@vger.kernel.org
      Link: https://lkml.kernel.org/r/32fc25aa35ecef4b2692f57687bb7fc2a57230e2.1529828292.git.lukas@wunner.de
  10. Jun 22, 2018 (2 commits)
  11. Jun 19, 2018 (2 commits)
  12. Jun 06, 2018 (4 commits)
    • genirq/affinity: Defer affinity setting if irq chip is busy · 12f47073
      Authored by Thomas Gleixner
      The case that interrupt affinity setting fails with -EBUSY can be handled
      in the kernel completely by using the already available generic pending
      infrastructure.
      
      If an irq_chip::set_affinity() call fails with -EBUSY, handle it like the
      interrupts for which irq_chip::set_affinity() can only be invoked from
      interrupt context. Copy the new affinity mask to irq_desc::pending_mask and
      set the affinity pending bit. The next raised interrupt for the affected
      irq will check the pending bit and try to set the new affinity from the
      handler. This avoids that -EBUSY is returned when an affinity change is
      requested from user space and the previous change has not been cleaned
      up. The new affinity will take effect when the next interrupt is raised
      from the device.
      
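      A self-contained sketch of the fallback (field and function names are
      stand-ins, not the kernel's):

        #include <errno.h>

        struct irq_sketch {                     /* stand-in, not a kernel type */
                unsigned long affinity;
                unsigned long pending_mask;
                int move_pending;
        };

        static int chip_set_affinity(struct irq_sketch *irq, unsigned long mask)
        {
                return -EBUSY;  /* e.g. previous vector move not yet cleaned up */
        }

        /* On -EBUSY queue the requested mask instead of failing the caller;
         * the next interrupt applies it via the generic pending machinery. */
        static int set_affinity_sketch(struct irq_sketch *irq, unsigned long mask)
        {
                int ret = chip_set_affinity(irq, mask);

                if (ret == -EBUSY) {
                        irq->pending_mask = mask;
                        irq->move_pending = 1;
                        ret = 0;
                }
                return ret;
        }
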
      Fixes: dccfe314 ("x86/vector: Simplify vector move cleanup")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Song Liu <songliubraving@fb.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <liu.song.a23@gmail.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: stable@vger.kernel.org
      Cc: Mike Travis <mike.travis@hpe.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Tariq Toukan <tariqt@mellanox.com>
      Link: https://lkml.kernel.org/r/20180604162224.819273597@linutronix.de
    • genirq/migration: Avoid out of line call if pending is not set · d340ebd6
      Authored by Thomas Gleixner
      The upcoming fix for the -EBUSY return from affinity settings requires the
      use of the irq_move_irq() functionality even on irq remapped interrupts. To
      avoid the out of line call, move the check for the pending bit into an
      inline helper.
      
      Preparatory change for the real fix. No functional change.
      
      Fixes: dccfe314 ("x86/vector: Simplify vector move cleanup")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <liu.song.a23@gmail.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: stable@vger.kernel.org
      Cc: Mike Travis <mike.travis@hpe.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Tariq Toukan <tariqt@mellanox.com>
      Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Link: https://lkml.kernel.org/r/20180604162224.471925894@linutronix.de
    • genirq/generic_pending: Do not lose pending affinity update · a33a5d2d
      Authored by Thomas Gleixner
      The generic pending interrupt mechanism moves interrupts from the interrupt
      handler on the original target CPU to the new destination CPU. This is
      required for x86 and ia64 due to the way the interrupt delivery and
      acknowledge works if the interrupts are not remapped.
      
      However that update can fail for various reasons. Some of them are valid
      reasons to discard the pending update, but the case when the previous move
      has not been fully cleaned up is not a legitimate reason to fail.
      
      Check the return value of irq_do_set_affinity() for -EBUSY, which indicates
      a pending cleanup, and rearm the pending move in the irq descriptor so it's
      tried again when the next interrupt arrives.
      
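      A self-contained sketch of the rearming logic (stand-in types; the real
      change checks the return of irq_do_set_affinity() in the pending-move
      handler):

        #include <errno.h>

        struct desc_sketch {                    /* stand-in for irq_desc */
                unsigned long pending_mask;
                int move_pending;
        };

        static int do_set_affinity(struct desc_sketch *desc) { return -EBUSY; /* e.g. */ }

        static void move_pending_irq_sketch(struct desc_sketch *desc)
        {
                if (do_set_affinity(desc) == -EBUSY) {
                        /* Cleanup of the previous move still in flight: keep
                         * the pending mask and retry on the next interrupt. */
                        desc->move_pending = 1;
                        return;
                }
                /* Applied (or failed for a legitimate reason): consume it. */
                desc->pending_mask = 0;
                desc->move_pending = 0;
        }
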
      Fixes: 996c591227d9 ("x86/irq: Plug vector cleanup race")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Song Liu <songliubraving@fb.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <liu.song.a23@gmail.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: stable@vger.kernel.org
      Cc: Mike Travis <mike.travis@hpe.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Tariq Toukan <tariqt@mellanox.com>
      Link: https://lkml.kernel.org/r/20180604162224.386544292@linutronix.de
    • ide: don't enable/disable interrupts in force threaded-IRQ mode · 47b82e88
      Authored by Sebastian Andrzej Siewior
      The interrupts are enabled/disabled so the interrupt handler can run
      with enabled interrupts while serving the interrupt and not lose other
      interrupts, especially the timer tick.
      If the system runs with force-threaded interrupts then there is no need
      to enable the interrupts.
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. May 16, 2018 (1 commit)
  14. May 13, 2018 (1 commit)
    • genirq/msi: Allow level-triggered MSIs to be exposed by MSI providers · 0be8153c
      Authored by Marc Zyngier
      So far, MSIs have been used to signal edge-triggered interrupts, as
      a write is a good model for an edge (you can't "unwrite" something).
      On the other hand, routing zillions of wires in an SoC because you
      need level interrupts is a bit extreme.
      
      People have come up with a variety of schemes to support this, which
      involves sending two messages: one to signal the interrupt, and one
      to clear it. Since the kernel cannot represent this, we've ended up
      with side-band mechanisms that are pretty awful.
      
      Instead, let's acknowledge the requirement, and ensure that, under the
      right circumstances, the irq_compose_msg and irq_write_msg can take
      as a parameter an array of two messages instead of a pointer to a
      single one. We also add some checking that the compose method only
      clobbers the second message if the MSI domain has been created with
      the MSI_FLAG_LEVEL_CAPABLE flags.
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Jason Cooper <jason@lakedaemon.net>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
      Cc: Miquel Raynal <miquel.raynal@bootlin.com>
      Link: https://lkml.kernel.org/r/20180508121438.11301-2-marc.zyngier@arm.com
  15. Apr 27, 2018 (1 commit)
  16. Apr 06, 2018 (5 commits)
    • genirq/affinity: Spread irq vectors among present CPUs as far as possible · d3056812
      Authored by Ming Lei
      Commit 84676c1f ("genirq/affinity: assign vectors to all possible CPUs")
      tried to spread the interrupts across all possible CPUs to make sure that
      in case of physical hotplug (e.g. virtualization) the CPUs which get
      plugged in after the device was initialized are targeted by a hardware
      queue and the corresponding interrupt.
      
      This has a downside in cases where the ACPI tables claim that there are
      more possible CPUs than present CPUs and the number of interrupts to spread
      out is smaller than the number of possible CPUs. These bogus ACPI tables
      are unfortunately not uncommon.
      
      In such a case the vector spreading algorithm assigns interrupts to CPUs
      which can never be utilized and as a consequence these interrupts are
      unused instead of being mapped to present CPUs. As a result the performance
      of the device is suboptimal.
      
      To fix this spread the interrupt vectors in two stages:
      
       1) Spread as many interrupts as possible among the present CPUs
      
       2) Spread the remaining vectors among non present CPUs
      
      On an 8 core system, where CPU 0-3 are present and CPU 4-7 are not present,
      for a device with 4 queues the resulting interrupt affinity is:
      
        1) Before 84676c1f ("genirq/affinity: assign vectors to all possible CPUs")
      	irq 39, cpu list 0
      	irq 40, cpu list 1
      	irq 41, cpu list 2
      	irq 42, cpu list 3
      
        2) With 84676c1f ("genirq/affinity: assign vectors to all possible CPUs")
      	irq 39, cpu list 0-2
      	irq 40, cpu list 3-4,6
      	irq 41, cpu list 5
      	irq 42, cpu list 7
      
        3) With the refined vector spread applied:
      	irq 39, cpu list 0,4
      	irq 40, cpu list 1,6
      	irq 41, cpu list 2,5
      	irq 42, cpu list 3,7
      
      On an 8 core system, where all CPUs are present the resulting interrupt
      affinity for the 4 queues is:
      
      	irq 39, cpu list 0,1
      	irq 40, cpu list 2,3
      	irq 41, cpu list 4,5
      	irq 42, cpu list 6,7
      
      This is independent of the number of CPUs which are online at the point of
      initialization because in such a system the offline CPUs can be easily
      onlined afterwards, while non-present CPUs need to be plugged physically
      or virtually which requires external interaction.
      
      The downside of this approach is that in case of physical hotplug the
      interrupt vector spreading might be suboptimal when CPUs 4-7 are physically
      plugged. Suboptimal from a NUMA point of view and due to the single target
      nature of interrupt affinities the later plugged CPUs might not be targeted
      by interrupts at all.
      
      Though, physical hotplug systems are not the common case while the broken
      ACPI table disease is widespread. So it's preferred to have as many
      interrupts as possible utilized at the point where the device is
      initialized.
      
      Block multi-queue devices like NVME create a hardware queue per possible
      CPU, so the goal of commit 84676c1f to assign one interrupt vector per
      possible CPU is still achieved even with physical/virtual hotplug.
      
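      A self-contained sketch of the two-stage idea (illustrative only; the real
      code spreads per NUMA node via cpumasks in kernel/irq/affinity.c):

        #define NR_POSSIBLE_CPUS 8
        #define NR_PRESENT_CPUS  4

        /* Stage 1 spreads vectors over present CPUs, stage 2 over the
         * remaining possible-but-not-present CPUs. */
        static void spread_vectors(int vector_of_cpu[NR_POSSIBLE_CPUS], int nvecs)
        {
                int vec = 0;

                for (int cpu = 0; cpu < NR_PRESENT_CPUS; cpu++)
                        vector_of_cpu[cpu] = vec++ % nvecs;         /* stage 1 */

                for (int cpu = NR_PRESENT_CPUS; cpu < NR_POSSIBLE_CPUS; cpu++)
                        vector_of_cpu[cpu] = vec++ % nvecs;         /* stage 2 */
        }
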
      [ tglx: Changed from online to present CPUs for the first spreading stage,
        	renamed variables for readability sake, added comments and massaged
        	changelog ]
      Reported-by: Laurence Oberman <loberman@redhat.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Link: https://lkml.kernel.org/r/20180308105358.1506-5-ming.lei@redhat.com
    • genirq/affinity: Allow irq spreading from a given starting point · 1a2d0914
      Authored by Ming Lei
      To support two stage irq vector spreading, it's required to add a starting
      point to the spreading function. No functional change, just preparatory
      work for the actual two stage change.
      
      [ tglx: Renamed variables, tidied up the code and massaged changelog ]
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Cc: Laurence Oberman <loberman@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Link: https://lkml.kernel.org/r/20180308105358.1506-4-ming.lei@redhat.com
    • genirq/affinity: Move actual irq vector spreading into a helper function · b3e6aaa8
      Authored by Ming Lei
      No functional change, just prepare for converting to 2-stage irq vector
      spreading.
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Cc: Laurence Oberman <loberman@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Link: https://lkml.kernel.org/r/20180308105358.1506-3-ming.lei@redhat.com
    • genirq/affinity: Rename *node_to_possible_cpumask as *node_to_cpumask · 47778f33
      Authored by Ming Lei
      The following patches will introduce two stage irq spreading for improving
      irq spread on all possible CPUs.
      
      No functional change.
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Cc: Laurence Oberman <loberman@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Link: https://lkml.kernel.org/r/20180308105358.1506-2-ming.lei@redhat.com
    • genirq/affinity: Don't return with empty affinity masks on error · 0211e12d
      Authored by Thomas Gleixner
      When the allocation of node_to_possible_cpumask fails, then
      irq_create_affinity_masks() returns with a pointer to the empty affinity
      masks array, which will cause malfunction.
      
      Reorder the allocations so the masks array allocation comes last and every
      failure path returns NULL.
      
      Fixes: 9a0ef98e ("genirq/affinity: Assign vectors to all present CPUs")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ming Lei <ming.lei@redhat.com>
  17. Apr 04, 2018 (1 commit)
  18. Mar 20, 2018 (5 commits)
  19. Mar 15, 2018 (1 commit)
    • genirq: Add CONFIG_GENERIC_IRQ_MULTI_HANDLER · caacdbf4
      Authored by Palmer Dabbelt
      The arm multi irq handler registration mechanism has been copied into a
      handful of architectures, including arm64 and openrisc. RISC-V needs the
      same mechanism.
      
      Instead of adding yet another copy for RISC-V, copy the arm implementation
      into the core code depending on a new Kconfig symbol:
      CONFIG_GENERIC_IRQ_MULTI_HANDLER.
      
      Subsequent patches will convert the various architectures.
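      
      A sketch of the registration interface this adds (stand-in declarations; the
      real implementation warns on double registration):

        struct pt_regs;

        /* Root interrupt handler pointer, set once by the architecture. */
        static void (*handle_arch_irq)(struct pt_regs *regs);

        void set_handle_irq_sketch(void (*handle_irq)(struct pt_regs *regs))
        {
                if (handle_arch_irq)
                        return;
                handle_arch_irq = handle_irq;
        }
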
      Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: jonas@southpole.se
      Cc: catalin.marinas@arm.com
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux@armlinux.org.uk
      Cc: stefan.kristiansson@saunalahti.fi
      Cc: openrisc@lists.librecores.org
      Cc: shorne@gmail.com
      Cc: linux-riscv@lists.infradead.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lkml.kernel.org/r/20180307235731.22627-2-palmer@sifive.com
  20. Mar 09, 2018 (2 commits)