1. 12 Oct 2010, 3 commits
  2. 04 Oct 2010, 15 commits
  3. 24 Sep 2010, 1 commit
  4. 24 Mar 2010, 1 commit
  5. 11 Mar 2010, 1 commit
    • genirq: Prevent oneshot irq thread race · 0b1adaa0
      Committed by Thomas Gleixner
      Lars-Peter pointed out that the oneshot threaded interrupt handler
      code has the following race:
      
       CPU0                            CPU1
       handle_level_irq(irq X)
         mask_ack_irq(irq X)
         handle_IRQ_event(irq X)
           wake_up(thread_handler)
                                       thread handler(irq X) runs
                                       finalize_oneshot(irq X)
                                         does not unmask due to
                                         !(desc->status & IRQ_MASKED)

       return from irq
       does not unmask due to
       (desc->status & IRQ_ONESHOT)

      This leaves the interrupt line masked forever.
      
      The reason for this is the inconsistent handling of the IRQ_MASKED
      flag. Instead of setting it in the mask function, the oneshot support
      sets the flag after waking up the irq thread.
      
      The solution for this is to set/clear the IRQ_MASKED status whenever
      we mask/unmask an interrupt line. That's the easy part, but that
      cleanup opens another race:
      
       CPU0                            CPU1
       handle_level_irq(irq)
         mask_ack_irq(irq)
         handle_IRQ_event(irq)
           wake_up(thread_handler)
                                       thread handler(irq) runs
                                       finalize_oneshot_irq(irq)
                                         unmask(irq)
           irq triggers again
           handle_level_irq(irq)
             mask_ack_irq(irq)
           return from irq due to IRQ_INPROGRESS

       return from irq
       does not unmask due to
       (desc->status & IRQ_ONESHOT)
      
      This requires that we synchronize finalize_oneshot_irq() with the
      primary handler. If IRQ_INPROGRESS is set we wait until the primary
      handler on the other CPU has returned before unmasking the interrupt
      line again.
      
      We probably have never seen that problem because it does not happen on
      UP and on SMP the irqbalancer protects us by pinning the primary
      handler and the thread to the same CPU.
      Reported-by: Lars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@kernel.org
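      A minimal sketch of the scheme described above (illustrative only; it
      uses the desc->status flags and desc->chip callbacks of that kernel
      generation and is not the actual patch):

      static inline void mask_irq(struct irq_desc *desc, unsigned int irq)
      {
              if (desc->chip->mask) {
                      desc->chip->mask(irq);
                      /* keep the flag in sync with the hardware state */
                      desc->status |= IRQ_MASKED;
              }
      }

      static inline void unmask_irq(struct irq_desc *desc, unsigned int irq)
      {
              if (desc->chip->unmask) {
                      desc->chip->unmask(irq);
                      desc->status &= ~IRQ_MASKED;
              }
      }

      static void irq_finalize_oneshot(unsigned int irq, struct irq_desc *desc)
      {
      again:
              raw_spin_lock_irq(&desc->lock);
              /*
               * Wait until the primary handler on the other CPU has
               * returned, otherwise we would unmask the line while
               * handle_level_irq() is still running and its early exit
               * would leave the line masked forever.
               */
              if (desc->status & IRQ_INPROGRESS) {
                      raw_spin_unlock_irq(&desc->lock);
                      cpu_relax();
                      goto again;
              }
              if (!(desc->status & IRQ_DISABLED) && (desc->status & IRQ_MASKED))
                      unmask_irq(desc, irq);
              raw_spin_unlock_irq(&desc->lock);
      }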
  6. 19 Feb 2010, 1 commit
    • x86, irq: Keep chip_data in create_irq_nr and destroy_irq · eb5b3794
      Committed by Brandon Philips
      Version 4: use get_irq_chip_data() in destroy_irq() to get rid of some
      local vars.
      
      When two drivers are setting up MSI-X at the same time via
      pci_enable_msix() there is a race.  See this dmesg excerpt:
      
      [   85.170610] ixgbe 0000:02:00.1: irq 97 for MSI/MSI-X
      [   85.170611]   alloc irq_desc for 99 on node -1
      [   85.170613] igb 0000:08:00.1: irq 98 for MSI/MSI-X
      [   85.170614]   alloc kstat_irqs on node -1
      [   85.170616] alloc irq_2_iommu on node -1
      [   85.170617]   alloc irq_desc for 100 on node -1
      [   85.170619]   alloc kstat_irqs on node -1
      [   85.170621] alloc irq_2_iommu on node -1
      [   85.170625] ixgbe 0000:02:00.1: irq 99 for MSI/MSI-X
      [   85.170626]   alloc irq_desc for 101 on node -1
      [   85.170628] igb 0000:08:00.1: irq 100 for MSI/MSI-X
      [   85.170630]   alloc kstat_irqs on node -1
      [   85.170631] alloc irq_2_iommu on node -1
      [   85.170635]   alloc irq_desc for 102 on node -1
      [   85.170636]   alloc kstat_irqs on node -1
      [   85.170639] alloc irq_2_iommu on node -1
      [   85.170646] BUG: unable to handle kernel NULL pointer dereference
      at 0000000000000088
      
      As you can see igb and ixgbe are both alternating on create_irq_nr()
      via pci_enable_msix() in their probe function.
      
      ixgbe: While looping through irq_desc_ptrs[] via create_irq_nr(),
      ixgbe chooses irq_desc_ptrs[102], exits the loop, drops vector_lock
      and calls dynamic_irq_init(), which sets
      irq_desc_ptrs[102]->chip_data = NULL.
      
      igb: Grabs the vector_lock now and starts looping over irq_desc_ptrs[]
      via create_irq_nr(). It gets to irq_desc_ptrs[102] and does this:
      
      	cfg_new = irq_desc_ptrs[102]->chip_data;
      	if (cfg_new->vector != 0)
      		continue;
      
      This hits the NULL deref.
      
      Another possible race exists via pci_disable_msix() in a driver or
      in a number of error paths that call free_msi_irqs():
      
      destroy_irq()
      dynamic_irq_cleanup() which sets desc->chip_data = NULL
      ...race window...
      desc->chip_data = cfg;
      
      Remove the save and restore code for cfg in create_irq_nr() and
      destroy_irq() and take the desc->lock when checking the irq_cfg.
      Reported-and-analyzed-by: Brandon Philips <bphilips@suse.de>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <20100207210250.GB8256@jenkins.home.ifup.org>
      Signed-off-by: Brandon Philips <bphilips@suse.de>
      Cc: stable@kernel.org
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
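      A conceptual sketch of the locking rule the fix enforces (illustrative
      only, not the verbatim diff; irq_slot_is_free is a hypothetical helper
      name): chip_data is only examined under desc->lock, so a concurrent
      destroy_irq()/dynamic_irq_cleanup() cannot pull it out from under the
      allocation loop.

      static bool irq_slot_is_free(struct irq_desc *desc)
      {
              struct irq_cfg *cfg;
              unsigned long flags;
              bool free;

              raw_spin_lock_irqsave(&desc->lock, flags);
              cfg = desc->chip_data;
              /*
               * NULL chip_data means another CPU is still setting up or
               * tearing down this descriptor - treat it as not usable.
               */
              free = cfg && cfg->vector == 0;
              raw_spin_unlock_irqrestore(&desc->lock, flags);

              return free;
      }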
  7. 15 Feb 2010, 1 commit
  8. 11 Feb 2010, 1 commit
    • x86: Avoid race condition in pci_enable_msix() · ced5b697
      Committed by Brandon Philips
      Keep chip_data in create_irq_nr and destroy_irq.
      
      When two drivers are setting up MSI-X at the same time via
      pci_enable_msix() there is a race.  See this dmesg excerpt:
      
      [   85.170610] ixgbe 0000:02:00.1: irq 97 for MSI/MSI-X
      [   85.170611]   alloc irq_desc for 99 on node -1
      [   85.170613] igb 0000:08:00.1: irq 98 for MSI/MSI-X
      [   85.170614]   alloc kstat_irqs on node -1
      [   85.170616] alloc irq_2_iommu on node -1
      [   85.170617]   alloc irq_desc for 100 on node -1
      [   85.170619]   alloc kstat_irqs on node -1
      [   85.170621] alloc irq_2_iommu on node -1
      [   85.170625] ixgbe 0000:02:00.1: irq 99 for MSI/MSI-X
      [   85.170626]   alloc irq_desc for 101 on node -1
      [   85.170628] igb 0000:08:00.1: irq 100 for MSI/MSI-X
      [   85.170630]   alloc kstat_irqs on node -1
      [   85.170631] alloc irq_2_iommu on node -1
      [   85.170635]   alloc irq_desc for 102 on node -1
      [   85.170636]   alloc kstat_irqs on node -1
      [   85.170639] alloc irq_2_iommu on node -1
      [   85.170646] BUG: unable to handle kernel NULL pointer dereference
      at 0000000000000088
      
      As you can see igb and ixgbe are both alternating on create_irq_nr()
      via pci_enable_msix() in their probe function.
      
      ixgbe: While looping through irq_desc_ptrs[] via create_irq_nr(),
      ixgbe chooses irq_desc_ptrs[102], exits the loop, drops vector_lock
      and calls dynamic_irq_init(), which sets
      irq_desc_ptrs[102]->chip_data = NULL.
      
      igb: Grabs the vector_lock now and starts looping over irq_desc_ptrs[]
      via create_irq_nr(). It gets to irq_desc_ptrs[102] and does this:
      
      	cfg_new = irq_desc_ptrs[102]->chip_data;
      	if (cfg_new->vector != 0)
      		continue;
      
      This hits the NULL deref.
      
      Another possible race exists via pci_disable_msix() in a driver or
      in a number of error paths that call free_msi_irqs():
      
      destroy_irq()
      dynamic_irq_cleanup() which sets desc->chip_data = NULL
      ...race window...
      desc->chip_data = cfg;
      
      Remove the save and restore code for cfg in create_irq_nr() and
      destroy_irq() and take the desc->lock when checking the irq_cfg.
      Reported-and-analyzed-by: Brandon Philips <bphilips@suse.de>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <1265793639-15071-3-git-send-email-yinghai@kernel.org>
      Signed-off-by: Brandon Philips <bphilips@suse.de>
      Cc: stable@kernel.org
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  9. 15 Dec 2009, 1 commit
  10. 04 Dec 2009, 1 commit
  11. 04 Nov 2009, 1 commit
  12. 27 Aug 2009, 1 commit
    • genirq: Do not mask oneshot edge type interrupts · 4dbc9ca2
      Committed by Thomas Gleixner
      Masking oneshot edge type interrupts is wrong, as we might lose an
      interrupt which is issued while the threaded handler is handling the
      device. We can safely keep the irq unmasked, because with edge type
      interrupts there is no danger of interrupt floods. If the threaded
      handler has not yet finished, IRQTF_RUNTHREAD is set, which keeps
      the handler thread active.
      
      Debugged and verified in preempt-rt.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
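      A sketch of why an unmasked edge interrupt is not lost (simplified
      from the irq thread loop of that era; illustrative, not the exact
      kernel code): the hard irq path sets IRQTF_RUNTHREAD before waking the
      thread, and the thread only goes back to sleep once the flag is clear,
      so an edge that fires while the handler runs just causes another loop
      iteration.

      static int irq_thread_sketch(void *data)
      {
              struct irqaction *action = data;

              while (!kthread_should_stop()) {
                      set_current_state(TASK_INTERRUPTIBLE);
                      if (!test_and_clear_bit(IRQTF_RUNTHREAD,
                                              &action->thread_flags)) {
                              /* nothing pending: sleep until the hard irq
                               * path sets IRQTF_RUNTHREAD and wakes us */
                              schedule();
                              continue;
                      }
                      __set_current_state(TASK_RUNNING);
                      /* run the device handler; a new edge arriving while
                       * we are in here just sets IRQTF_RUNTHREAD again */
                      action->thread_fn(action->irq, action->dev_id);
              }
              return 0;
      }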
  13. 17 Aug 2009, 3 commits
    • genirq: Support nested threaded irq handling · 399b5da2
      Committed by Thomas Gleixner
      Interrupt chips which are behind a slow bus (i2c, spi ...) and
      demultiplex other interrupt sources need to run their interrupt
      handler in a thread. 
      
      The demultiplexed interrupt handlers need to run in thread context as
      well and need to finish before the demux handler thread can reenable
      the interrupt line. So the easiest way is to run the sub device
      handlers in the context of the demultiplexing handler thread.
      
      To avoid creating a separate thread for the subdevices, the function
      set_nested_irq_thread() is provided, which sets the
      IRQ_NESTED_THREAD flag in the interrupt descriptor.
      
      A driver which calls request_threaded_irq() need not be aware of the
      fact that the threaded handler is called in the context of the
      demultiplexing handler thread. The setup code checks the
      IRQ_NESTED_THREAD flag, which was set by the irq chip setup code, and
      does not set up a separate thread for the interrupt. The primary
      handler provided by the device driver is replaced by an internal
      dummy function which warns when it is called.
      
      For the demultiplexing handler a helper function handle_nested_irq()
      is provided. It calls the demux interrupt thread function in the
      context of the caller, does the proper interrupt accounting and
      takes the disabled state of the demultiplexed subdevice into
      account.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: Trilok Soni <soni.trilok@gmail.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Brian Swetland <swetland@google.com>
      Cc: Joonyoung Shim <jy0922.shim@samsung.com>
      Cc: m.szyprowski@samsung.com
      Cc: t.fujak@samsung.com
      Cc: kyungmin.park@samsung.com,
      Cc: David Brownell <david-b@pacbell.net>
      Cc: Daniel Ribeiro <drwyrm@gmail.com>
      Cc: arve@android.com
      Cc: Barry Song <21cnbao@gmail.com>
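      A sketch of how a driver for such a slow-bus interrupt chip might
      dispatch its children from the demux thread. handle_nested_irq() is
      the helper named above; my_chip, my_chip_read_status() and the child
      count are hypothetical, invented for illustration:

      #define MY_CHIP_NR_IRQS 8                 /* hypothetical child count */

      struct my_chip {                          /* hypothetical driver data */
              unsigned int irq_base;            /* first child irq number */
              /* i2c/spi client, register cache, ... */
      };

      unsigned long my_chip_read_status(struct my_chip *chip); /* hypothetical,
                                                                   sleeps on the bus */

      /* Registered as the thread_fn of request_threaded_irq() for the parent
       * line, so it runs in thread context and may sleep on the bus access. */
      static irqreturn_t demux_irq_thread(int irq, void *data)
      {
              struct my_chip *chip = data;
              unsigned long pending;
              unsigned int child;

              pending = my_chip_read_status(chip);

              for (child = 0; child < MY_CHIP_NR_IRQS; child++)
                      if (pending & BIT(child))
                              /* runs the child's threaded handler here,
                               * in the context of this demux thread */
                              handle_nested_irq(chip->irq_base + child);

              return IRQ_HANDLED;
      }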
    • genirq: Add buslock support · 70aedd24
      Committed by Thomas Gleixner
      Some interrupt chips are connected to a "slow" bus (i2c, spi ...). The
      bus access needs to sleep and therefore cannot happen in atomic
      contexts.
      
      Some of the generic interrupt management functions like disable_irq(),
      enable_irq() ... call interrupt chip functions with the irq_desc->lock
      held and interrupts disabled. This does not work for such devices.
      
      Provide a separate synchronization mechanism for such interrupt
      chips. The irq_chip structure is extended by two optional functions
      (bus_lock and bus_sync_unlock).
      
      The idea is to serialize the bus access for those operations in the
      core code so that drivers behind such a bus-operated interrupt
      controller do not have to worry about it and can just use the normal
      interfaces. To achieve this we add two function pointers to the
      irq_chip: bus_lock and bus_sync_unlock.
      
      bus_lock() is called to serialize access to the interrupt controller
      bus.
      
      Now the core code can issue chip->mask/unmask ... commands without
      changing the fast path code at all. The chip implementation merely
      stores that information in a chip private data structure and
      returns. No bus interaction, as these functions are called from atomic
      context.
      
      After that bus_sync_unlock() is called outside the atomic context. Now
      the chip implementation issues the bus commands, waits for completion
      and unlocks the interrupt controller bus.
      
      The irq_chip implementation as pseudo code:
      
      struct irq_chip_data {
             struct mutex   mutex;
             unsigned int   irq_offset;
             unsigned long  mask;
             unsigned long  mask_status;
      };
      
      static void bus_lock(unsigned int irq)
      {
              struct irq_chip_data *data = get_irq_desc_chip_data(irq);
      
              mutex_lock(&data->mutex);
      }
      
      static void mask(unsigned int irq)
      {
              struct irq_chip_data *data = get_irq_desc_chip_data(irq);
      
              irq -= data->irq_offset;
              data->mask |= (1 << irq);
      }
      
      static void unmask(unsigned int irq)
      {
              struct irq_chip_data *data = get_irq_desc_chip_data(irq);
      
              irq -= data->irq_offset;
              data->mask &= ~(1 << irq);
      }
      
      static void bus_sync_unlock(unsigned int irq)
      {
              struct irq_chip_data *data = get_irq_desc_chip_data(irq);
      
              if (data->mask != data->mask_status) {
                      do_bus_magic_to_set_mask(data->mask);
                      data->mask_status = data->mask;
              }
              mutex_unlock(&data->mutex);
      }
      
      The device drivers can use request_threaded_irq, free_irq, disable_irq
      and enable_irq as usual, with the only restriction that the calls need
      to come from non-atomic context.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: Trilok Soni <soni.trilok@gmail.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Brian Swetland <swetland@google.com>
      Cc: Joonyoung Shim <jy0922.shim@samsung.com>
      Cc: m.szyprowski@samsung.com
      Cc: t.fujak@samsung.com
      Cc: kyungmin.park@samsung.com,
      Cc: David Brownell <david-b@pacbell.net>
      Cc: Daniel Ribeiro <drwyrm@gmail.com>
      Cc: arve@android.com
      Cc: Barry Song <21cnbao@gmail.com>
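      Wiring the pseudo code above into an irq_chip might look roughly like
      this (the bus_lock/bus_sync_unlock field names are the ones introduced
      by this patch; the chip name and the rest of the chip are made up and
      omitted):

      static struct irq_chip my_slow_bus_irq_chip = {
              .name            = "my-slow-bus",
              .bus_lock        = bus_lock,          /* takes data->mutex */
              .bus_sync_unlock = bus_sync_unlock,   /* flushes the cached mask
                                                       over the bus, then drops
                                                       the mutex */
              .mask            = mask,              /* cache only, atomic safe */
              .unmask          = unmask,
      };

      With that in place, a disable_irq()/enable_irq() call from sleepable
      context ends up as: bus_lock(), then the chip mask/unmask callbacks
      under desc->lock, then bus_sync_unlock() outside the atomic section.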
    • genirq: Add oneshot support · b25c340c
      Committed by Thomas Gleixner
      For threaded interrupt handlers we expect the hard interrupt handler
      part to mask the interrupt on the originating device. The interrupt
      line itself is reenabled after the hard interrupt handler has
      executed.
      
      This requires access to the originating device from hard interrupt
      context which is not always possible. There are devices which can only
      be accessed via a bus (i2c, spi, ...). The bus access requires thread
      context. For such devices we need to keep the interrupt line masked
      until the threaded handler has executed.
      
      Add a new flag IRQF_ONESHOT which allows drivers to request that the
      interrupt is not unmasked after the hard interrupt context handler has
      been executed and the thread has been woken. The interrupt line is
      unmasked after the thread handler function has been executed.
      
      Note that for now IRQF_ONESHOT cannot be used with IRQF_SHARED to
      avoid complex accounting mechanisms.
      
      For oneshot interrupts the primary handler simply returns
      IRQ_WAKE_THREAD and does nothing else. A generic implementation
      irq_default_primary_handler() is provided to avoid useless copies all
      over the place. It is automatically installed when
      request_threaded_irq() is called with handler=NULL and
      thread_fn!=NULL.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: Trilok Soni <soni.trilok@gmail.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Brian Swetland <swetland@google.com>
      Cc: Joonyoung Shim <jy0922.shim@samsung.com>
      Cc: m.szyprowski@samsung.com
      Cc: t.fujak@samsung.com
      Cc: kyungmin.park@samsung.com,
      Cc: David Brownell <david-b@pacbell.net>
      Cc: Daniel Ribeiro <drwyrm@gmail.com>
      Cc: arve@android.com
      Cc: Barry Song <21cnbao@gmail.com>
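      A short usage sketch for a device behind a slow bus (the my_dev names
      and helpers are made up for illustration): with handler == NULL the
      core installs irq_default_primary_handler(), which just returns
      IRQ_WAKE_THREAD.

      struct my_dev {                           /* hypothetical device */
              unsigned int irq;
              /* i2c client, locks, ... */
      };

      void my_dev_handle_and_ack(struct my_dev *dev); /* hypothetical, may sleep */

      static irqreturn_t my_dev_irq_thread(int irq, void *dev_id)
      {
              struct my_dev *dev = dev_id;

              my_dev_handle_and_ack(dev);       /* talks to the device over the bus */
              return IRQ_HANDLED;               /* line is unmasked after this */
      }

      static int my_dev_setup_irq(struct my_dev *dev, unsigned int irq)
      {
              /* line stays masked until my_dev_irq_thread() has returned */
              return request_threaded_irq(irq, NULL, my_dev_irq_thread,
                                          IRQF_ONESHOT | IRQF_TRIGGER_LOW,
                                          "my_dev", dev);
      }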
  14. 28 Apr 2009, 1 commit
    • x86/irq: remove leftover code from NUMA_MIGRATE_IRQ_DESC · fcef5911
      Committed by Yinghai Lu
      The original feature of migrating irq_desc dynamically was too fragile
      and was causing problems: it caused crashes on systems with lots of
      cards with MSI-X when the user-space irq balancer was enabled.
      
      We now have new patches that create irq_desc according to the device's
      NUMA node. This patch removes the leftover bits of the dynamic balancer.
      
      [ Impact: remove dead code ]
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      LKML-Reference: <49F654AF.8000808@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  15. 09 Feb 2009, 1 commit
    • irq: clear kstat_irqs · 0f3c2a89
      Committed by Yinghai Lu
      Impact: get correct kstat_irqs [/proc/interrupts] for msi/msi-x etc
      
      We need to call clear_kstat_irqs(), so that when we reuse an irq_desc
      we get correct counts in /proc/interrupts.
      
      This keeps <NULL> entries out of /proc/interrupts.
      
      Architectures that don't support genirq are not affected, because they
      never call dynamic_irq_cleanup().
      
      v2: simplify and make clear_kstat_irqs more robust
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
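      The idea, sketched (as described by the changelog, not necessarily the
      verbatim patch): zero the per-cpu counters when the descriptor is torn
      down, so the next user of that irq number starts from a clean count.

      static void clear_kstat_irqs(struct irq_desc *desc)
      {
              /* one counter per possible cpu, hanging off the descriptor */
              memset(desc->kstat_irqs, 0,
                     nr_cpu_ids * sizeof(*desc->kstat_irqs));
      }

      /* dynamic_irq_cleanup() calls this under desc->lock while it resets
       * the rest of the descriptor for reuse. */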
  16. 14 Jan 2009, 1 commit
  17. 12 Jan 2009, 1 commit
    • cpumask: update irq_desc to use cpumask_var_t · 7f7ace0c
      Committed by Mike Travis
      Impact: reduce memory usage, use new cpumask API.
      
      Replace the affinity and pending_mask members with cpumask_var_t's.
      This adds to the significant size reduction done with the SPARSE_IRQ
      changes.
      
      The added functions (init_alloc_desc_masks & init_copy_desc_masks) are
      in the include file so they can be inlined (and optimized out for the
      !CONFIG_CPUMASK_OFFSTACK case).  [Naming was chosen to be consistent
      with the other init*irq functions, as well as the backwards arg
      declaration of "from, to" instead of the more common "to, from"
      standard.]
      
      Includes a slight change to the declaration of struct irq_desc to embed
      the pending_mask within ifdef(CONFIG_SMP) to be consistent with other
      references, and some small changes to Xen.
      
      Tested: sparse/non-sparse/cpumask_offstack/non-cpumask_offstack/nonuma/nosmp on x86_64
      Signed-off-by: Mike Travis <travis@sgi.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: virtualization@lists.osdl.org
      Cc: xen-devel@lists.xensource.com
      Cc: Yinghai Lu <yhlu.kernel@gmail.com>
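      A simplified sketch of the shape of the added helper (the parameter
      list, GFP flags and the bootmem path of the real version are
      approximations here; cpumask_var_t only performs an allocation when
      CONFIG_CPUMASK_OFFSTACK=y):

      static inline bool init_alloc_desc_masks(struct irq_desc *desc,
                                               int node, bool boot)
      {
      #ifdef CONFIG_SMP
              /* boot-time (bootmem) allocation path omitted in this sketch */
              if (!alloc_cpumask_var_node(&desc->affinity, GFP_ATOMIC, node))
                      return false;
              cpumask_setall(desc->affinity);
      #ifdef CONFIG_GENERIC_PENDING_IRQ
              if (!alloc_cpumask_var_node(&desc->pending_mask, GFP_ATOMIC, node)) {
                      free_cpumask_var(desc->affinity);
                      return false;
              }
              cpumask_clear(desc->pending_mask);
      #endif
      #endif
              return true;
      }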
  18. 29 Dec 2008, 1 commit
  19. 17 Dec 2008, 1 commit
    • x86, sparseirq: move irq_desc according to smp_affinity, v7 · 48a1b10a
      Committed by Yinghai Lu
      Impact: improve NUMA handling by migrating irq_desc on smp_affinity changes
      
      If CONFIG_NUMA_MIGRATE_IRQ_DESC is set:
      
      -  make irq_desc follow the smp_affinity setting, i.e. move the irq_desc
      -  call move_irq_desc() in irq_complete_move()
      -  legacy irq_descs are not moved, because they are allocated in a
         static array
      
      For logical APIC mode we need to add move_desc_in_progress_in_same_domain,
      otherwise the descriptor is not moved; moving it can also require two
      phases.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  20. 13 Dec 2008, 1 commit
  21. 08 Dec 2008, 1 commit
    • sparse irq_desc[] array: core kernel and x86 changes · 0b8f1efa
      Committed by Yinghai Lu
      Impact: new feature
      
      Problem on distro kernels: irq_desc[NR_IRQS] takes megabytes of RAM with
      NR_CPUS set to large values. The goal is to be able to scale up to much
      larger NR_IRQS values without impacting the (important) common case.
      
      To solve this, we generalize irq_desc[NR_IRQS] to an (optional) array of
      irq_desc pointers.
      
      When CONFIG_SPARSE_IRQ=y is used, we use kzalloc_node to get the
      irq_desc; this also makes the IRQ descriptors NUMA-local (to the site
      that calls request_irq()).
      
      This gets rid of the static irq_cfg[] array on x86 as well: x86 now
      stores the irq_cfg in desc->chip_data.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
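      A conceptual sketch of the lookup after this change (names approximate
      the code of that time; the real version also guards the pointer array
      with a lock, fully initializes the new descriptor and has a bootmem
      path):

      #ifdef CONFIG_SPARSE_IRQ
      static struct irq_desc *irq_desc_ptrs[NR_IRQS];   /* mostly NULL */

      struct irq_desc *irq_to_desc(unsigned int irq)
      {
              return irq < NR_IRQS ? irq_desc_ptrs[irq] : NULL;
      }

      /* Allocate the descriptor on the NUMA node of the requesting device;
       * on x86 the irq_cfg now hangs off desc->chip_data instead of living
       * in a static irq_cfg[] array. */
      struct irq_desc *irq_to_desc_alloc_node(unsigned int irq, int node)
      {
              struct irq_desc *desc = irq_to_desc(irq);

              if (desc)
                      return desc;

              desc = kzalloc_node(sizeof(*desc), GFP_ATOMIC, node);
              if (desc)
                      irq_desc_ptrs[irq] = desc;    /* init of lock, name,
                                                       kstat_irqs etc. omitted */
              return desc;
      }
      #else
      struct irq_desc *irq_to_desc(unsigned int irq)
      {
              return irq < NR_IRQS ? irq_desc + irq : NULL;
      }
      #endif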
  22. 02 Dec 2008, 1 commit