1. 05 5月, 2010 2 次提交
    • E
      x86, ioapic: Fix io_apic_redir_entries to return the number of entries. · 4b6b19a1
      Eric W. Biederman 提交于
      io_apic_redir_entries has a huge conceptual bug.  It returns the maximum
      redirection entry not the number of redirection entries.  Which simply
      does not match what the name of the function.  This just caught me
      and it caught  Feng Tang, and  Len Brown when they wrote sfi_parse_ioapic.
      
      Modify io_apic_redir_entries to actually return the number of redirection
      entries, and fix the callers so that they properly handle receiving the
      number of the number of redirection table entries, instead of the
      number of redirection table entries less one.
      
      While the usage in sfi.c does not show up in this patch it is fixed
      by virtue of the fact that io_apic_redir_entries now has the semantics
      sfi_parse_ioapic most reasonably expects.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      LKML-Reference: <1269936436-7039-7-git-send-email-ebiederm@xmission.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      4b6b19a1
    • E
      x86, acpi/irq: Teach acpi_get_override_irq to take a gsi not an isa_irq · 9a0a91bb
      Eric W. Biederman 提交于
      In perverse acpi implementations the isa irqs are not identity mapped
      to the first 16 gsi.  Furthermore at least the extended interrupt
      resource capability may return gsi's and not isa irqs.  So since
      what we get from acpi is a gsi teach acpi_get_overrride_irq to
      operate on a gsi instead of an isa_irq.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      LKML-Reference: <1269936436-7039-2-git-send-email-ebiederm@xmission.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      9a0a91bb
  2. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  3. 16 3月, 2010 1 次提交
    • S
      x86: Handle legacy PIC interrupts on all the cpu's · 36e9e1ea
      Suresh Siddha 提交于
      Ingo Molnar reported that with the recent changes of not
      statically blocking IRQ0_VECTOR..IRQ15_VECTOR's on all the
      cpu's, broke an AMD platform (with Nvidia chipset) boot when
      "noapic" boot option is used.
      
      On this platform, legacy PIC interrupts are getting delivered to
      all the cpu's instead of just the boot cpu. Thus not
      initializing the vector to irq mapping for the legacy irq's
      resulted in not handling certain interrupts causing boot hang.
      
      Fix this by initializing the vector to irq mapping on all the
      logical cpu's, if the legacy IRQ is handled by the legacy PIC.
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      [ -v2: io-apic-enabled improvement ]
      Acked-by: NYinghai Lu <yinghai@kernel.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      LKML-Reference: <1268692386.3296.43.camel@sbs-t61.sc.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      36e9e1ea
  4. 28 2月, 2010 1 次提交
    • E
      x86: Fix out of order of gsi · fad53995
      Eric W. Biederman 提交于
      Iranna D Ankad reported that IBM x3950 systems have boot
      problems after this commit:
      
       |
       | commit b9c61b70
       |
       |    x86/pci: update pirq_enable_irq() to setup io apic routing
       |
      
      The problem is that with the patch, the machine freezes when
      console=ttyS0,... kernel serial parameter is passed.
      
      It seem to freeze at DVD initialization and the whole problem
      seem to be DVD/pata related, but somehow exposed through the
      serial parameter.
      
      Such apic problems can expose really weird behavior:
      
        ACPI: IOAPIC (id[0x10] address[0xfecff000] gsi_base[0])
        IOAPIC[0]: apic_id 16, version 0, address 0xfecff000, GSI 0-2
        ACPI: IOAPIC (id[0x0f] address[0xfec00000] gsi_base[3])
        IOAPIC[1]: apic_id 15, version 0, address 0xfec00000, GSI 3-38
        ACPI: IOAPIC (id[0x0e] address[0xfec01000] gsi_base[39])
        IOAPIC[2]: apic_id 14, version 0, address 0xfec01000, GSI 39-74
        ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 4 dfl dfl)
        ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 5 dfl dfl)
        ACPI: INT_SRC_OVR (bus 0 bus_irq 3 global_irq 6 dfl dfl)
        ACPI: INT_SRC_OVR (bus 0 bus_irq 4 global_irq 7 dfl dfl)
        ACPI: INT_SRC_OVR (bus 0 bus_irq 6 global_irq 9 dfl dfl)
        ACPI: INT_SRC_OVR (bus 0 bus_irq 7 global_irq 10 dfl dfl)
        ACPI: INT_SRC_OVR (bus 0 bus_irq 8 global_irq 11 low edge)
        ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 12 dfl dfl)
        ACPI: INT_SRC_OVR (bus 0 bus_irq 12 global_irq 15 dfl dfl)
        ACPI: INT_SRC_OVR (bus 0 bus_irq 13 global_irq 16 dfl dfl)
        ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 17 low edge)
        ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 18 dfl dfl)
      
      It turns out that the system has three io apic controllers, but
      boot ioapic routing is in the second one, and that gsi_base is
      not 0 - it is using a bunch of INT_SRC_OVR...
      
      So these recent changes:
      
       1. one set routing for first io apic controller
       2. assume irq = gsi
      
      ... will break that system.
      
      So try to remap those gsis, need to seperate boot_ioapic_idx
      detection out of enable_IO_APIC() and call them early.
      
      So introduce boot_ioapic_idx, and remap_ioapic_gsi()...
      
       -v2: shift gsi with delta instead of gsi_base of boot_ioapic_idx
      
       -v3: double check with find_isa_irq_apic(0, mp_INT) to get right
            boot_ioapic_idx
      
       -v4: nr_legacy_irqs
      
       -v5: add print out for boot_ioapic_idx, and also make it could be
            applied for current kernel and previous kernel
      
       -v6: add bus_irq, in acpi_sci_ioapic_setup, so can get overwride
            for sci right mapping...
      
       -v7: looks like pnpacpi get irq instead of gsi, so need to revert
            them back...
      
       -v8: split into two patches
      
       -v9: according to Eric, use fixed 16 for shifting instead of remap
      
       -v10: still need to touch rsparser.c
      
       -v11: just revert back to way Eric suggest...
            anyway the ioapic in first ioapic is blocked by second...
      
       -v12: two patches, this one will add more loop but check apic_id and irq > 16
      Reported-by: NIranna D Ankad <iranna.ankad@in.ibm.com>
      Bisected-by: NIranna D Ankad <iranna.ankad@in.ibm.com>
      Tested-by: NGary Hade <garyhade@us.ibm.com>
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: len.brown@intel.com
      LKML-Reference: <4B8A321A.1000008@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      fad53995
  5. 27 2月, 2010 1 次提交
  6. 25 2月, 2010 2 次提交
  7. 24 2月, 2010 1 次提交
  8. 20 2月, 2010 3 次提交
  9. 19 2月, 2010 1 次提交
    • B
      x86, irq: Keep chip_data in create_irq_nr and destroy_irq · eb5b3794
      Brandon Philips 提交于
      Version 4: use get_irq_chip_data() in destroy_irq() to get rid of some
      local vars.
      
      When two drivers are setting up MSI-X at the same time via
      pci_enable_msix() there is a race.  See this dmesg excerpt:
      
      [   85.170610] ixgbe 0000:02:00.1: irq 97 for MSI/MSI-X
      [   85.170611]   alloc irq_desc for 99 on node -1
      [   85.170613] igb 0000:08:00.1: irq 98 for MSI/MSI-X
      [   85.170614]   alloc kstat_irqs on node -1
      [   85.170616] alloc irq_2_iommu on node -1
      [   85.170617]   alloc irq_desc for 100 on node -1
      [   85.170619]   alloc kstat_irqs on node -1
      [   85.170621] alloc irq_2_iommu on node -1
      [   85.170625] ixgbe 0000:02:00.1: irq 99 for MSI/MSI-X
      [   85.170626]   alloc irq_desc for 101 on node -1
      [   85.170628] igb 0000:08:00.1: irq 100 for MSI/MSI-X
      [   85.170630]   alloc kstat_irqs on node -1
      [   85.170631] alloc irq_2_iommu on node -1
      [   85.170635]   alloc irq_desc for 102 on node -1
      [   85.170636]   alloc kstat_irqs on node -1
      [   85.170639] alloc irq_2_iommu on node -1
      [   85.170646] BUG: unable to handle kernel NULL pointer dereference
      at 0000000000000088
      
      As you can see igb and ixgbe are both alternating on create_irq_nr()
      via pci_enable_msix() in their probe function.
      
      ixgbe: While looping through irq_desc_ptrs[] via create_irq_nr() ixgbe
      choses irq_desc_ptrs[102] and exits the loop, drops vector_lock and
      calls dynamic_irq_init. Then it sets irq_desc_ptrs[102]->chip_data =
      NULL via dynamic_irq_init().
      
      igb: Grabs the vector_lock now and starts looping over irq_desc_ptrs[]
      via create_irq_nr(). It gets to irq_desc_ptrs[102] and does this:
      
      	cfg_new = irq_desc_ptrs[102]->chip_data;
      	if (cfg_new->vector != 0)
      		continue;
      
      This hits the NULL deref.
      
      Another possible race exists via pci_disable_msix() in a driver or in
      the number of error paths that call free_msi_irqs():
      
      destroy_irq()
      dynamic_irq_cleanup() which sets desc->chip_data = NULL
      ...race window...
      desc->chip_data = cfg;
      
      Remove the save and restore code for cfg in create_irq_nr() and
      destroy_irq() and take the desc->lock when checking the irq_cfg.
      Reported-and-analyzed-by: NBrandon Philips <bphilips@suse.de>
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      LKML-Reference: <20100207210250.GB8256@jenkins.home.ifup.org>
      Signed-off-by: NBrandon Phiilps <bphilips@suse.de>
      Cc: stable@kernel.org
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      eb5b3794
  10. 18 2月, 2010 1 次提交
  11. 17 2月, 2010 2 次提交
  12. 11 2月, 2010 2 次提交
    • B
      x86: Avoid race condition in pci_enable_msix() · ced5b697
      Brandon Phiilps 提交于
      Keep chip_data in create_irq_nr and destroy_irq.
      
      When two drivers are setting up MSI-X at the same time via
      pci_enable_msix() there is a race.  See this dmesg excerpt:
      
      [   85.170610] ixgbe 0000:02:00.1: irq 97 for MSI/MSI-X
      [   85.170611]   alloc irq_desc for 99 on node -1
      [   85.170613] igb 0000:08:00.1: irq 98 for MSI/MSI-X
      [   85.170614]   alloc kstat_irqs on node -1
      [   85.170616] alloc irq_2_iommu on node -1
      [   85.170617]   alloc irq_desc for 100 on node -1
      [   85.170619]   alloc kstat_irqs on node -1
      [   85.170621] alloc irq_2_iommu on node -1
      [   85.170625] ixgbe 0000:02:00.1: irq 99 for MSI/MSI-X
      [   85.170626]   alloc irq_desc for 101 on node -1
      [   85.170628] igb 0000:08:00.1: irq 100 for MSI/MSI-X
      [   85.170630]   alloc kstat_irqs on node -1
      [   85.170631] alloc irq_2_iommu on node -1
      [   85.170635]   alloc irq_desc for 102 on node -1
      [   85.170636]   alloc kstat_irqs on node -1
      [   85.170639] alloc irq_2_iommu on node -1
      [   85.170646] BUG: unable to handle kernel NULL pointer dereference
      at 0000000000000088
      
      As you can see igb and ixgbe are both alternating on create_irq_nr()
      via pci_enable_msix() in their probe function.
      
      ixgbe: While looping through irq_desc_ptrs[] via create_irq_nr() ixgbe
      choses irq_desc_ptrs[102] and exits the loop, drops vector_lock and
      calls dynamic_irq_init. Then it sets irq_desc_ptrs[102]->chip_data =
      NULL via dynamic_irq_init().
      
      igb: Grabs the vector_lock now and starts looping over irq_desc_ptrs[]
      via create_irq_nr(). It gets to irq_desc_ptrs[102] and does this:
      
      	cfg_new = irq_desc_ptrs[102]->chip_data;
      	if (cfg_new->vector != 0)
      		continue;
      
      This hits the NULL deref.
      
      Another possible race exists via pci_disable_msix() in a driver or in
      the number of error paths that call free_msi_irqs():
      
      destroy_irq()
      dynamic_irq_cleanup() which sets desc->chip_data = NULL
      ...race window...
      desc->chip_data = cfg;
      
      Remove the save and restore code for cfg in create_irq_nr() and
      destroy_irq() and take the desc->lock when checking the irq_cfg.
      Reported-and-analyzed-by: NBrandon Philips <bphilips@suse.de>
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      LKML-Reference: <1265793639-15071-3-git-send-email-yinghai@kernel.org>
      Signed-off-by: NBrandon Phililps <bphilips@suse.de>
      Cc: stable@kernel.org
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      ced5b697
    • Y
      x86: Fix SCI on IOAPIC != 0 · 18dce6ba
      Yinghai Lu 提交于
      Thomas Renninger <trenn@suse.de> reported on IBM x3330
      
      booting a latest kernel on this machine results in:
      
      PCI: PCI BIOS revision 2.10 entry at 0xfd61c, last bus=1
      PCI: Using configuration type 1 for base access bio: create slab <bio-0> at 0
      ACPI: SCI (IRQ30) allocation failed
      ACPI Exception: AE_NOT_ACQUIRED, Unable to install System Control Interrupt handler (20090903/evevent-161)
      ACPI: Unable to start the ACPI Interpreter
      
      Later all kind of devices fail...
      
      and bisect it down to this commit:
      commit b9c61b70
      
          x86/pci: update pirq_enable_irq() to setup io apic routing
      
      it turns out we need to set irq routing for the sci on ioapic1 early.
      
      -v2: make it work without sparseirq too.
      -v3: fix checkpatch.pl warning, and cc to stable
      Reported-by: NThomas Renninger <trenn@suse.de>
      Bisected-by: NThomas Renninger <trenn@suse.de>
      Tested-by: NThomas Renninger <trenn@suse.de>
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      LKML-Reference: <1265793639-15071-2-git-send-email-yinghai@kernel.org>
      Cc: stable@kernel.org
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      18dce6ba
  13. 08 2月, 2010 1 次提交
  14. 30 1月, 2010 2 次提交
    • S
      x86, irq: Move __setup_vector_irq() before the first irq enable in cpu online path · 9d133e5d
      Suresh Siddha 提交于
      Lowest priority delivery of logical flat mode is broken on some systems,
      such that even when IO-APIC RTE says deliver the interrupt to a particular CPU,
      interrupt subsystem delivers the interrupt to totally different CPU.
      
      For example, this behavior was observed on a P4 based system with SiS chipset
      which was reported by Li Zefan. We have been handling this kind of behavior by
      making sure that in logical flat mode, we assign the same vector to irq
      mappings on all the 8 possible logical cpu's.
      
      But we have been doing this initial assignment (__setup_vector_irq()) a little
      late (before which interrupts were already enabled for a short duration).
      
      Move the __setup_vector_irq() before the first irq enable point in the
      cpu online path to avoid the issue of not handling some interrupts that
      wrongly hit the cpu which is still coming online.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <20100129194330.283696385@sbs-t61.sc.intel.com>
      Tested-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      9d133e5d
    • S
      x86, irq: Update the vector domain for legacy irqs handled by io-apic · 69c89efb
      Suresh Siddha 提交于
      In the recent change of not reserving IRQ0_VECTOR..IRQ15_VECTOR's on all
      cpu's, we start with irq 0..15 getting directed to (and handled on) cpu-0.
      
      In the logical flat mode, once the AP's are online (and before irqbalance
      comes into picture), kernel intends to handle these IRQ's on any cpu (as the
      logical flat mode allows to specify multiple cpu's for the irq destination and
      the chipset based routing can deliver to the interrupt to any one of
      the specified cpu's). This was broken with our recent change, which was ending
      up using only cpu 0 as the destination, even when the kernel was specifying to
      use all online cpu's for the logical flat mode case.
      
      Fix this by updating vector allocation domain (cfg->domain) for legacy irqs,
      when the IO-APIC handles them.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <20100129194330.207790269@sbs-t61.sc.intel.com>
      Tested-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      69c89efb
  15. 20 1月, 2010 1 次提交
    • S
      x86, irq: Don't block IRQ0_VECTOR..IRQ15_VECTOR's on all cpu's · 97943390
      Suresh Siddha 提交于
      Currently IRQ0..IRQ15 are assigned to IRQ0_VECTOR..IRQ15_VECTOR's on
      all the cpu's.
      
      If these IRQ's are handled by legacy pic controller, then the kernel
      handles them only on cpu 0. So there is no need to block this vector
      space on all cpu's.
      
      Similarly if these IRQ's are handled by IO-APIC, then the IRQ affinity
      will determine on which cpu's we need allocate the vector resource for
      that particular IRQ. This can be done dynamically and here also there
      is no need to block 16 vectors for IRQ0..IRQ15 on all cpu's.
      
      Fix this by initially assigning IRQ0..IRQ15 to IRQ0_VECTOR..IRQ15_VECTOR's only
      on cpu 0. If the legacy controllers like pic handles these irq's, then
      this configuration will be fixed. If more modern controllers like IO-APIC
      handle these IRQ's, then we start with this configuration and as IRQ's
      migrate, vectors (/and cpu's) associated with these IRQ's change dynamically.
      
      This will freeup the block of 16 vectors on other cpu's which don't handle
      IRQ0..IRQ15, which can now be used for other IRQ's that the particular cpu
      handle.
      
      [ hpa: this also an architectural cleanup for future legacy-PIC-free
        configurations. ]
      [ hpa: fixed typo NR_LEGACY_IRQS -> NR_IRQS_LEGACY ]
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <1263932453.2814.52.camel@sbs-t61.sc.intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      97943390
  16. 19 1月, 2010 1 次提交
    • S
      x86, irq: Use 0x20 for the IRQ_MOVE_CLEANUP_VECTOR instead of 0x1f · 6579b474
      Suresh Siddha 提交于
      After talking to some more folks inside intel (Peter Anvin, Asit Mallick),
      the safest option (for future compatibility etc) seen was to use vector 0x20
      for IRQ_MOVE_CLEANUP_VECTOR instead of using vector 0x1f (which is documented as
      reserved vector in the Intel IA32 manuals).
      
      Also we don't need to reserve the entire privilege level (all 16 vectors in
      the priority bucket that IRQ_MOVE_CLEANUP_VECTOR falls into), as the
      x86 architecture (section 10.9.3 in SDM Vol3a) specifies that with in the
      priority level, the higher the vector number the higher the priority.
      And hence we don't need to reserve the complete priority level 0x20-0x2f for
      the IRQ migration cleanup logic.
      
      So change the IRQ_MOVE_CLEANUP_VECTOR to 0x20 and  allow 0x21-0x2f to be used
      for device interrupts. 0x30-0x3f will be used for ISA interrupts (these
      also can be migrated in the context of IOAPIC and hence need to be at a higher
      priority level than IRQ_MOVE_CLEANUP_VECTOR).
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <20100114002118.521826763@sbs-t61.sc.intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Maciej W. Rozycki <macro@linux-mips.org>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      6579b474
  17. 07 1月, 2010 1 次提交
    • S
      x86, irq: Check move_in_progress before freeing the vector mapping · 7f41c2e1
      Suresh Siddha 提交于
      With the recent irq migration fixes (post 2.6.32), Gary Hade has noticed
      "No IRQ handler for vector" messages during the 2.6.33-rc1 kernel boot on IBM
      AMD platforms and root caused the issue to this commit:
      
      > commit 23359a88
      > Author: Suresh Siddha <suresh.b.siddha@intel.com>
      > Date:   Mon Oct 26 14:24:33 2009 -0800
      >
      >    x86: Remove move_cleanup_count from irq_cfg
      
      As part of this patch, we have removed the move_cleanup_count check
      in smp_irq_move_cleanup_interrupt(). With this change, we can run into a
      situation where an irq cleanup interrupt on a cpu can cleanup the vector
      mappings associated with multiple irqs, of which one of the irq's migration
      might be still in progress. As such when that irq hits the old cpu, we get
      the "No IRQ handler" messages.
      
      Fix this by checking for the irq_cfg's move_in_progress and if the move
      is still in progress delay the vector cleanup to another irq cleanup
      interrupt request (which will happen when the irq starts arriving at the
      new cpu destination).
      Reported-and-tested-by: NGary Hade <garyhade@us.ibm.com>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <1262804191.2732.7.camel@sbs-t61.sc.intel.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      7f41c2e1
  18. 05 1月, 2010 1 次提交
    • H
      x86, apic: Don't waste a vector to improve vector spread · ea943966
      H. Peter Anvin 提交于
      We want to use a vector-assignment sequence that avoids stumbling onto
      0x80 earlier in the sequence, in order to improve the spread of
      vectors across priority levels on machines with a small number of
      interrupt sources.  Right now, this is done by simply making the first
      vector (0x31 or 0x41) completely unusable.  This is unnecessary; all
      we need is to start assignment at a +1 offset, we don't actually need
      to prohibit the usage of this vector once we have wrapped around.
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      LKML-Reference: <4B426550.6000209@kernel.org>
      ea943966
  19. 30 12月, 2009 1 次提交
    • Y
      x86: Increase NR_IRQS and nr_irqs · 9959c888
      Yinghai Lu 提交于
      I have a system with lots of igb and ixgbe, when iov/vf are
      enabled for them, we hit the limit of 3064.
      
      when system has 20 pcie installed, and one card has 2
      functions, and one function needs 64 msi-x,
       may need 20 * 2 * 64 = 2560 for msi-x
      
      but if iov and vf are enabled
       may need 20 * 2 * 64 * 3 = 7680 for msi-x
      assume system with 5 ioapic, nr_irqs_gsi will be 120.
      
      NR_CPUS = 512, and nr_cpu_ids = 128
      will have NR_IRQS = 256 + 512 * 64 = 33024
      will have nr_irqs = 120 + 8 * 128 + 120 * 64 = 8824
      
      When SPARSE_IRQ is not set, there is no increase with kernel data
      size.
      
      when NR_CPUS=128, and SPARSE_IRQ is set:
         text		   data	    bss		   dec		 hex	filename
      21837444	4216564	12480736	38534744	24bfe58	vmlinux.before
      21837442	4216580	12480736	38534758	24bfe66	vmlinux.after
      when NR_CPUS=4096, and SPARSE_IRQ is set
         text		   data	    bss		   dec		 hex	filename
      21878619	5610244	13415392	40904255	270263f	vmlinux.before
      21878617	5610244	13415392	40904253	270263d	vmlinux.after
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <4B398ECD.1080506@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9959c888
  20. 18 12月, 2009 1 次提交
    • S
      x86, irq: Allow 0xff for /proc/irq/[n]/smp_affinity on an 8-cpu system · 18374d89
      Suresh Siddha 提交于
      John Blackwood reported:
      > on an older Dell PowerEdge 6650 system with 8 cpus (4 are hyper-threaded),
      > and  32 bit (x86) kernel, once you change the irq smp_affinity of an irq
      > to be less than all cpus in the system, you can never change really the
      > irq smp_affinity back to be all cpus in the system (0xff) again,
      > even though no error status is returned on the "/bin/echo ff >
      > /proc/irq/[n]/smp_affinity" operation.
      >
      > This is due to that fact that BAD_APICID has the same value as
      > all cpus (0xff) on 32bit kernels, and thus the value returned from
      > set_desc_affinity() via the cpu_mask_to_apicid_and() function is treated
      > as a failure in set_ioapic_affinity_irq_desc(), and no affinity changes
      > are made.
      
      set_desc_affinity() is already checking if the incoming cpu mask
      intersects with the cpu online mask or not. So there is no need
      for the apic op cpu_mask_to_apicid_and() to check again
      and return BAD_APICID.
      
      Remove the BAD_APICID return value from cpu_mask_to_apicid_and()
      and also fix set_desc_affinity() to return -1 instead of using BAD_APICID
      to represent error conditions (as cpu_mask_to_apicid_and() can return
      logical or physical apicid values and BAD_APICID is really to represent
      bad physical apic id).
      Reported-by: NJohn Blackwood <john.blackwood@ccur.com>
      Root-caused-by: NJohn Blackwood <john.blackwood@ccur.com>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <1261103386.2535.409.camel@sbs-t61>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      18374d89
  21. 15 12月, 2009 1 次提交
  22. 02 12月, 2009 3 次提交
    • S
      x86, ioapic: Document another case when level irq is seen as an edge · 1c83995b
      Suresh Siddha 提交于
      In the case when cpu goes offline, fixup_irqs() will forward any
      unhandled interrupt on the offlined cpu to the new cpu
      destination that is handling the corresponding interrupt. This
      interrupt forwarding is done via IPI's. Hence, in this case also
      level-triggered io-apic interrupt will be seen as an edge
      interrupt in the cpu's APIC IRR.
      
      Document this scenario in the code which handles this case by doing
      an explicit EOI to the io-apic to clear remote IRR of the io-apic RTE.
      Requested-by: NMaciej W. Rozycki <macro@linux-mips.org>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Cc: Maciej W. Rozycki <macro@linux-mips.org>
      Cc: ebiederm@xmission.com
      Cc: garyhade@us.ibm.com
      LKML-Reference: <20091201233335.143970505@sbs-t61.sc.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1c83995b
    • S
      x86, ioapic: Fix the EOI register detection mechanism · c29d9db3
      Suresh Siddha 提交于
      Maciej W. Rozycki reported:
      
      > 82093AA I/O APIC has its version set to 0x11 and it
      > does not support the EOI register.  Similarly I/O APICs
      > integrated into the 82379AB south bridge and the 82374EB/SB
      > EISA component.
      
      IO-APIC versions below 0x20 don't support EOI register.
      
      Some of the Intel ICH Specs (ICH2 to ICH5) documents the io-apic
      version as 0x2. This is an error with documentation and these
      ICH chips use io-apic's of version 0x20 and indeed has a working
      EOI register for the io-apic.
      
      Fix the EOI register detection mechanism to check for version
      0x20 and beyond.
      
      And also, a platform can potentially  have io-apic's with
      different versions. Make the EOI register check per io-apic.
      Reported-by: NMaciej W. Rozycki <macro@linux-mips.org>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Cc: ebiederm@xmission.com
      Cc: garyhade@us.ibm.com
      LKML-Reference: <20091201233335.065361533@sbs-t61.sc.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c29d9db3
    • M
      x86, io-apic: Move the effort of clearing remoteIRR explicitly before migrating the irq · ca64c47c
      Maciej W. Rozycki 提交于
      When the level-triggered interrupt is seen as an edge interrupt,
      we try to clear the remoteIRR explicitly (using either an
      io-apic eoi register when present or through the idea of
      changing trigger mode of the io-apic RTE to edge and then back
      to level). But this explicit try also needs to happen before we
      try to migrate the irq. Otherwise irq migration attempt will
      fail anyhow, as it postpones the irq migration to a later
      attempt when it sees the remoteIRR in the io-apic RTE still set.
      Signed-off-by: N"Maciej W. Rozycki" <macro@linux-mips.org>
      Reviewed-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Cc: ebiederm@xmission.com
      Cc: garyhade@us.ibm.com
      LKML-Reference: <20091201233334.975416130@sbs-t61.sc.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ca64c47c
  23. 23 11月, 2009 1 次提交
  24. 16 11月, 2009 1 次提交
  25. 10 11月, 2009 1 次提交
    • C
      x86: apic: Do not use stacked physid_mask_t · 7abc0753
      Cyrill Gorcunov 提交于
      We should not use physid_mask_t as a stack based
      variable in apic code. This type depends on MAX_APICS
      parameter which may be huge enough.
      
      Especially it became a problem with apic NOOP driver which
      is portable between 32 bit and 64 bit environment
      (where we have really huge MAX_APICS).
      
      So apic driver should operate with pointers and a caller
      in turn should aware of allocation physid_mask_t variable.
      
      As a side (but positive) effect -- we may use already
      implemented physid_set_mask_of_physid function eliminating
      default_apicid_to_cpu_present completely.
      
      Note that physids_coerce and physids_promote turned into static
      inline from macro (since macro hides the fact that parameter is
      being interpreted as unsigned long, make it explicit).
      Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Maciej W. Rozycki <macro@linux-mips.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      LKML-Reference: <20091109220659.GA5568@lenovo>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7abc0753
  26. 09 11月, 2009 2 次提交
  27. 02 11月, 2009 3 次提交
    • S
      x86: Use EOI register in io-apic on intel platforms · b3ec0a37
      Suresh Siddha 提交于
      IO-APIC's in intel chipsets support EOI register starting from
      IO-APIC version 2. Use that when ever we need to clear the
      IO-APIC RTE's RemoteIRR bit explicitly.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Acked-by: NGary Hade <garyhade@us.ibm.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      LKML-Reference: <20091026230001.947855317@sbs-t61.sc.intel.com>
      [ Marked use_eio_reg as __read_mostly, fixed small details ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b3ec0a37
    • S
      x86: Force irq complete move during cpu offline · a5e74b84
      Suresh Siddha 提交于
      When a cpu goes offline, fixup_irqs() try to move irq's
      currently destined to the offline cpu to a new cpu. But this
      attempt will fail if the irq is recently moved to this cpu and
      the irq still hasn't arrived at this cpu (for non intr-remapping
      platforms this is when we free the vector allocation at the
      previous destination) that is about to go offline.
      
      This will endup with the interrupt subsystem still pointing the
      irq to the offline cpu, causing that irq to not work any more.
      
      Fix this by forcing the irq to complete its move (its been a
      long time we moved the irq to this cpu which we are offlining
      now) and then move this irq to a new cpu before this cpu goes
      offline.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Acked-by: NGary Hade <garyhade@us.ibm.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      LKML-Reference: <20091026230001.848830905@sbs-t61.sc.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a5e74b84
    • S
      x86: Remove move_cleanup_count from irq_cfg · 23359a88
      Suresh Siddha 提交于
      move_cleanup_count for each irq in irq_cfg is keeping track of
      the total number of cpus that need to free the corresponding
      vectors associated with the irq which has now been migrated to
      new destination. As long as this move_cleanup_count is non-zero
      (i.e., as long as we have n't freed the vector allocations on
      the old destinations) we were preventing the irq's further
      migration.
      
      This cleanup count is unnecessary and it is enough to not allow
      the irq migration till we send the cleanup vector to the
      previous irq destination, for which we already have irq_cfg's
      move_in_progress.  All we need to make sure is that we free the
      vector at the old desintation but we don't need to wait till
      that gets freed.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Acked-by: NGary Hade <garyhade@us.ibm.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      LKML-Reference: <20091026230001.752968906@sbs-t61.sc.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      23359a88
  28. 14 10月, 2009 1 次提交