1. 08 12月, 2008 1 次提交
    • Y
      sparse irq_desc[] array: core kernel and x86 changes · 0b8f1efa
      Yinghai Lu 提交于
      Impact: new feature
      
      Problem on distro kernels: irq_desc[NR_IRQS] takes megabytes of RAM with
      NR_CPUS set to large values. The goal is to be able to scale up to much
      larger NR_IRQS value without impacting the (important) common case.
      
      To solve this, we generalize irq_desc[NR_IRQS] to an (optional) array of
      irq_desc pointers.
      
      When CONFIG_SPARSE_IRQ=y is used, we use kzalloc_node to get irq_desc,
      this also makes the IRQ descriptors NUMA-local (to the site that calls
      request_irq()).
      
      This gets rid of the irq_cfg[] static array on x86 as well: irq_cfg now
      uses desc->chip_data for x86 to store irq_cfg.
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0b8f1efa
  2. 04 12月, 2008 1 次提交
  3. 03 12月, 2008 3 次提交
  4. 01 12月, 2008 2 次提交
  5. 26 11月, 2008 4 次提交
    • A
      [CPUFREQ] powernow-k8: ignore out-of-range PstateStatus value · a266d9f1
      Andreas Herrmann 提交于
      A workaround for AMD CPU family 11h erratum 311 might cause that the
      P-state Status Register shows a "current P-state" which is larger than
      the "current P-state limit" in P-state Current Limit Register. For the
      wrong P-state value there is no ACPI _PSS object defined and
      powernow-k8/cpufreq can't determine the proper CPU frequency for that
      state.
      
      As a consequence this can cause a panic during boot (potentially with
      all recent kernel versions -- at least I have reproduced it with
      various 2.6.27 kernels and with the current .28 series), as an
      example:
      
      powernow-k8: Found 1 AMD Turion(tm)X2 Ultra DualCore Mobile ZM-82 processors (2 \
      )
      powernow-k8:    0 : pstate 0 (2200 MHz)
      powernow-k8:    1 : pstate 1 (1100 MHz)
      powernow-k8:    2 : pstate 2 (600 MHz)
      BUG: unable to handle kernel paging request at ffff88086e7528b8
      IP: [<ffffffff80486361>] cpufreq_stats_update+0x4a/0x5f
      PGD 202063 PUD 0
      Oops: 0002 [#1] SMP
      last sysfs file:
      CPU 1
      Modules linked in:
      Pid: 1, comm: swapper Not tainted 2.6.28-rc3-dirty #16
      RIP: 0010:[<ffffffff80486361>]  [<ffffffff80486361>] cpufreq_stats_update+0x4a/0\
      f
      Synaptics claims to have extended capabilities, but I'm not able to read them.<6\
      6
      RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff88006e7528c0
      RDX: 00000000ffffffff RSI: ffff88006e54af00 RDI: ffffffff808f056c
      RBP: 00000000fffee697 R08: 0000000000000003 R09: ffff88006e73f080
      R10: 0000000000000001 R11: 00000000002191c0 R12: ffff88006fb83c10
      R13: 00000000ffffffff R14: 0000000000000001 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff88006fb50740(0000) knlGS:0000000000000000
      Unable to initialize Synaptics hardware.
      CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      CR2: ffff88086e7528b8 CR3: 0000000000201000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process swapper (pid: 1, threadinfo ffff88006fb82000, task ffff88006fb816d0)
      Stack:
       ffff88006e74da50 0000000000000000 ffff88006e54af00 ffffffff804863c7
       ffff88006e74da50 0000000000000000 00000000ffffffff 0000000000000000
       ffff88006fb83c10 ffffffff8024b46c ffffffff808f0560 ffff88006fb83c10
      Call Trace:
       [<ffffffff804863c7>] ? cpufreq_stat_notifier_trans+0x51/0x83
       [<ffffffff8024b46c>] ? notifier_call_chain+0x29/0x4c
       [<ffffffff8024b561>] ? __srcu_notifier_call_chain+0x46/0x61
       [<ffffffff8048496d>] ? cpufreq_notify_transition+0x93/0xa9
       [<ffffffff8021ab8d>] ? powernowk8_target+0x1e8/0x5f3
       [<ffffffff80486687>] ? cpufreq_governor_performance+0x1b/0x20
       [<ffffffff80484886>] ? __cpufreq_governor+0x71/0xa8
       [<ffffffff80484b21>] ? __cpufreq_set_policy+0x101/0x13e
       [<ffffffff80485bcd>] ? cpufreq_add_dev+0x3f0/0x4cd
       [<ffffffff8048577a>] ? handle_update+0x0/0x8
       [<ffffffff803c2062>] ? sysdev_driver_register+0xb6/0x10d
       [<ffffffff8056592c>] ? powernowk8_init+0x0/0x7e
       [<ffffffff8048604c>] ? cpufreq_register_driver+0x8f/0x140
       [<ffffffff80209056>] ? _stext+0x56/0x14f
       [<ffffffff802c2234>] ? proc_register+0x122/0x17d
       [<ffffffff802c23a0>] ? create_proc_entry+0x73/0x8a
       [<ffffffff8025c259>] ? register_irq_proc+0x92/0xaa
       [<ffffffff8025c2c8>] ? init_irq_proc+0x57/0x69
       [<ffffffff807fc85f>] ? kernel_init+0x116/0x169
       [<ffffffff8020cc79>] ? child_rip+0xa/0x11
       [<ffffffff807fc749>] ? kernel_init+0x0/0x169
       [<ffffffff8020cc6f>] ? child_rip+0x0/0x11
      Code: 05 c5 83 36 00 48 c7 c2 48 5d 86 80 48 8b 04 d8 48 8b 40 08 48 8b 34 02 48\
      
      RIP  [<ffffffff80486361>] cpufreq_stats_update+0x4a/0x5f
       RSP <ffff88006fb83b20>
      CR2: ffff88086e7528b8
      ---[ end trace 0678bac75e67a2f7 ]---
      Kernel panic - not syncing: Attempted to kill init!
      
      In short, aftereffect of the wrong P-state is that
      cpufreq_stats_update() uses "-1" as index for some array in
      
      cpufreq_stats_update (unsigned int cpu)
      {
      ...
           if (stat->time_in_state)
                      stat->time_in_state[stat->last_index] =
                              cputime64_add(stat->time_in_state[stat->last_index],
                                            cputime_sub(cur_time, stat->last_time));
      ...
      }
      
      Fortunately, the wrong P-state value is returned only if the core is
      in P-state 0. This fix solves the problem by detecting the
      out-of-range P-state, ignoring it, and using "0" instead.
      
      Cc: Mark Langsdorf <mark.langsdorf@amd.com>
      Signed-off-by: NAndreas Herrmann <andreas.herrmann3@amd.com>
      Signed-off-by: NDave Jones <davej@redhat.com>
      a266d9f1
    • M
      x86, bts: fix wrmsr and spinlock over kmalloc · de90add3
      Markus Metzger 提交于
      Impact: fix sleeping-with-spinlock-held bugs/crashes
      
      - Turn a wrmsr to write the DS_AREA MSR into a wrmsrl.
      - Use irqsave variants of spinlocks.
      - Do not allocate memory while holding spinlocks.
      Reported-by: NStephane Eranian <eranian@googlemail.com>
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NMarkus Metzger <markus.t.metzger@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      de90add3
    • M
      x86, pebs: fix PEBS record size configuration · c4858ffc
      Markus Metzger 提交于
      Impact: fix DS hw enablement on 64-bit x86
      
      Fix the PEBS record size in the DS configuration.
      Reported-by: NStephane Eranian <eranian@googlemail.com>
      Signed-off-by: NMarkus Metzger <markus.t.metzger@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c4858ffc
    • M
      x86, bts: exclude ds.c from build when disabled · 292c669c
      Markus Metzger 提交于
      Impact: cleanup
      
      Move the CONFIG guard from the .c file into the makefile.
      Reported-by: NAndi Kleen <andi-suse@firstfloor.org>
      Signed-off-by: NMarkus Metzger <markus.t.metzger@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      292c669c
  6. 25 11月, 2008 1 次提交
  7. 23 11月, 2008 1 次提交
  8. 21 11月, 2008 1 次提交
    • M
      x86: Fix interrupt leak due to migration · 0ca4b6b0
      Matthew Wilcox 提交于
      When we migrate an interrupt from one CPU to another, we set the
      move_in_progress flag and clean up the vectors later once they're not
      being used.  If you're unlucky and call destroy_irq() before the vectors
      become un-used, the move_in_progress flag is never cleared, which causes
      the interrupt to become unusable.
      
      This was discovered by Jesse Brandeburg for whom it manifested as an
      MSI-X device refusing to use MSI-X mode when the driver was unloaded
      and reloaded repeatedly.
      Signed-off-by: NMatthew Wilcox <willy@linux.intel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0ca4b6b0
  9. 20 11月, 2008 2 次提交
  10. 19 11月, 2008 1 次提交
  11. 18 11月, 2008 7 次提交
  12. 16 11月, 2008 3 次提交
    • Y
      x86: fix es7000 compiling · d3c6aa1e
      Yinghai Lu 提交于
      Impact: fix es7000 build
      
        CC      arch/x86/kernel/es7000_32.o
      arch/x86/kernel/es7000_32.c: In function find_unisys_acpi_oem_table:
      arch/x86/kernel/es7000_32.c:255: error: implicit declaration of function acpi_get_table_with_size
      arch/x86/kernel/es7000_32.c:261: error: implicit declaration of function early_acpi_os_unmap_memory
      arch/x86/kernel/es7000_32.c: In function unmap_unisys_acpi_oem_table:
      arch/x86/kernel/es7000_32.c:277: error: implicit declaration of function __acpi_unmap_table
      make[1]: *** [arch/x86/kernel/es7000_32.o] Error 1
      
      we applied one patch out of order...
      
      | commit a73aaedd
      | Author: Yinghai Lu <yhlu.kernel@gmail.com>
      | Date:   Sun Sep 14 02:33:14 2008 -0700
      |
      |    x86: check dsdt before find oem table for es7000, v2
      |
      |    v2: use __acpi_unmap_table()
      
      that patch need:
      
      	x86: use early_ioremap in __acpi_map_table
      	x86: always explicitly map acpi memory
      	acpi: remove final __acpi_map_table mapping before setting acpi_gbl_permanent_mmap
      	acpi/x86: introduce __apci_map_table, v4
      
      submitted to the ACPI tree but not upstream yet.
      
      fix it until those patches applied, need to revert this one
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d3c6aa1e
    • M
      x86, bts: fix unlock problem in ds.c · d1f1e9c0
      Markus Metzger 提交于
      Fix a problem where ds_request() returned an error without releasing the
      ds lock.
      Reported-by: NStephane Eranian <eranian@gmail.com>
      Signed-off-by: NMarkus Metzger <markus.t.metzger@gmail.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d1f1e9c0
    • D
      Revert "x86: blacklist DMAR on Intel G31/G33 chipsets" · 52168e60
      David Woodhouse 提交于
      This reverts commit e51af663, which was
      wrongly hoovered up and submitted about a month after a better fix had
      already been merged.
      
      The better fix is commit cbda1ba8
      ("PCI/iommu: blacklist DMAR on Intel G31/G33 chipsets"), where we do
      this blacklisting based on the DMI identification for the offending
      motherboard, since sometimes this chipset (or at least a chipset with
      the same PCI ID) apparently _does_ actually have an IOMMU.
      Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      52168e60
  13. 12 11月, 2008 2 次提交
    • B
      ACPI: pci_link: remove acpi_irq_balance_set() interface · 32836259
      Bjorn Helgaas 提交于
      This removes the acpi_irq_balance_set() interface from the PCI
      interrupt link driver.
      
      x86 used acpi_irq_balance_set() to tell the PCI interrupt link
      driver to configure links to minimize IRQ sharing.  But the link
      driver can easily figure out whether to turn on IRQ balancing
      based on the IRQ model (PIC/IOAPIC/etc), so we can get rid of
      that external interface.
      
      It's better for the driver to figure this out at init-time.  If
      we set it externally via the x86 code, the interface reduces
      modularity, and we depend on the fact that acpi_process_madt()
      happens before we process the kernel command line.
      Signed-off-by: NBjorn Helgaas <bjorn.helgaas@hp.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      32836259
    • R
      x86: KVM guest: fix section mismatch warning in kvmclock.c · a29a2af3
      Rakib Mullick 提交于
      WARNING: arch/x86/kernel/built-in.o(.text+0x1722c): Section mismatch
      in reference from the function kvm_setup_secondary_clock() to the
      function .devinit.text:setup_secondary_APIC_clock()
      The function kvm_setup_secondary_clock() references
      the function __devinit setup_secondary_APIC_clock().
      This is often because kvm_setup_secondary_clock lacks a __devinit
      annotation or the annotation of setup_secondary_APIC_clock is wrong.
      Signed-off-by: NMd.Rakib H. Mullick <rakib.mullick@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      a29a2af3
  14. 11 11月, 2008 3 次提交
  15. 10 11月, 2008 1 次提交
  16. 09 11月, 2008 1 次提交
    • I
      sched: optimize sched_clock() a bit · 7cbaef9c
      Ingo Molnar 提交于
      sched_clock() uses cycles_2_ns() needlessly - which is an irq-disabling
      variant of __cycles_2_ns().
      
      Most of the time sched_clock() is called with irqs disabled already.
      The few places that call it with irqs enabled need to be updated.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7cbaef9c
  17. 06 11月, 2008 4 次提交
    • E
      Revert "x86: default to reboot via ACPI" · 8d00450d
      Eduardo Habkost 提交于
      This reverts commit c7ffa6c2.
      
      the assumptio of this change was that this would not break
      any existing machine. Andrey Borzenkov reported troubles with
      the ACPI reboot method: the system would hang on reboot, necessiating
      a power cycle. Probably more systems are affected as well.
      
      Also, there are patches queued up for v2.6.29 to disable virtualization
      on emergency_restart() - which was the original motivation of
      this change.
      Reported-by: NAndrey Borzenkov <arvidjaar@mail.ru>
      Bisected-by: NAndrey Borzenkov <arvidjaar@mail.ru>
      Signed-off-by: NEduardo Habkost <ehabkost@redhat.com>
      Acked-by: NAvi Kivity <avi@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8d00450d
    • J
      AMD IOMMU: fix lazy IO/TLB flushing in unmap path · 80be308d
      Joerg Roedel 提交于
      Lazy flushing needs to take care of the unmap path too which is not yet
      implemented and leads to stale IO/TLB entries. This is fixed by this
      patch.
      Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com>
      80be308d
    • S
      x86: add smp_mb() before sending INVALIDATE_TLB_VECTOR · d6f0f39b
      Suresh Siddha 提交于
      Impact: fix rare x2apic hang
      
      On x86, x2apic mode accesses for sending IPI's don't have serializing
      semantics. If the IPI receivner refers(in lock-free fashion) to some
      memory setup by the sender, the need for smp_mb() before sending the
      IPI becomes critical in x2apic mode.
      
      Add the smp_mb() in native_flush_tlb_others() before sending the IPI.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d6f0f39b
    • B
      x86: don't allow nr_irqs > NR_IRQS · c78d0cf2
      Ben Hutchings 提交于
      Impact: fix boot hang on 32-bit systems with more than 224 IO-APIC pins
      
      On some 32-bit systems with a lot of IO-APICs probe_nr_irqs() can
      return a value larger than NR_IRQS. This will lead to probe_irq_on()
      overrunning the irq_desc array.
      
      I hit this when running net-next-2.6 (close to 2.6.28-rc3) on a
      Supermicro dual Xeon system.  NR_IRQS is 224 but probe_nr_irqs() detects
      5 IOAPICs and returns 240.  Here are the log messages:
      
      Tue Nov  4 16:53:47 2008 ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
      Tue Nov  4 16:53:47 2008 IOAPIC[0]: apic_id 1, version 32, address 0xfec00000, GSI 0-23
      Tue Nov  4 16:53:47 2008 ACPI: IOAPIC (id[0x02] address[0xfec81000] gsi_base[24])
      Tue Nov  4 16:53:47 2008 IOAPIC[1]: apic_id 2, version 32, address 0xfec81000, GSI 24-47
      Tue Nov  4 16:53:47 2008 ACPI: IOAPIC (id[0x03] address[0xfec81400] gsi_base[48])
      Tue Nov  4 16:53:47 2008 IOAPIC[2]: apic_id 3, version 32, address 0xfec81400, GSI 48-71
      Tue Nov  4 16:53:47 2008 ACPI: IOAPIC (id[0x04] address[0xfec82000] gsi_base[72])
      Tue Nov  4 16:53:47 2008 IOAPIC[3]: apic_id 4, version 32, address 0xfec82000, GSI 72-95
      Tue Nov  4 16:53:47 2008 ACPI: IOAPIC (id[0x05] address[0xfec82400] gsi_base[96])
      Tue Nov  4 16:53:47 2008 IOAPIC[4]: apic_id 5, version 32, address 0xfec82400, GSI 96-119
      Tue Nov  4 16:53:47 2008 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
      Tue Nov  4 16:53:47 2008 ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
      Tue Nov  4 16:53:47 2008 Enabling APIC mode:  Flat.  Using 5 I/O APICs
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      Acked-by: NYinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c78d0cf2
  18. 04 11月, 2008 1 次提交
  19. 02 11月, 2008 1 次提交
    • L
      x86: Clean up late e820 resource allocation · 1f987577
      Linus Torvalds 提交于
      This makes the late e820 resources use 'insert_resource_expand_to_fit()'
      instead of doing a 'reserve_region_with_split()', and also avoids
      marking them as IORESOURCE_BUSY.
      
      This results in us being perfectly happy to use pre-existing PCI
      resources even if they were marked as being in a reserved region, while
      still avoiding any _new_ allocations in the reserved regions.  It also
      makes for a simpler and more accurate resource tree.
      
      Example resource allocation from Jonathan Corbet, who has firmware that
      has an e820 reserved entry that covered a big range (e0000000-fed003ff),
      and that had various PCI resources in it set up by firmware.
      
      With old kernels, the reserved range would force us to re-allocate all
      pre-existing PCI resources, and his reserved range would end up looking
      like this:
      
      	e0000000-fed003ff : reserved
      	  fec00000-fec00fff : IOAPIC 0
      	  fed00000-fed003ff : HPET 0
      
      where only the pre-allocated special regions (IOAPIC and HPET) were kept
      around.
      
      With 2.6.28-rc2, which uses 'reserve_region_with_split()', Jonathan's
      resource tree looked like this:
      
      	e0000000-fe7fffff : reserved
      	fe800000-fe8fffff : PCI Bus 0000:01
      	 fe800000-fe8fffff : reserved
      	fe900000-fe9d9aff : reserved
      	fe9d9b00-fe9d9bff : 0000:00:1f.3
      	 fe9d9b00-fe9d9bff : reserved
      	fe9d9c00-fe9d9fff : 0000:00:1a.7
      	 fe9d9c00-fe9d9fff : reserved
      	fe9da000-fe9dafff : 0000:00:03.3
      	 fe9da000-fe9dafff : reserved
      	fe9db000-fe9dbfff : 0000:00:19.0
      	 fe9db000-fe9dbfff : reserved
      	fe9dc000-fe9dffff : 0000:00:1b.0
      	 fe9dc000-fe9dffff : reserved
      	fe9e0000-fe9fffff : 0000:00:19.0
      	 fe9e0000-fe9fffff : reserved
      	fea00000-fea7ffff : 0000:00:02.0
      	 fea00000-fea7ffff : reserved
      	fea80000-feafffff : 0000:00:02.1
      	 fea80000-feafffff : reserved
      	feb00000-febfffff : 0000:00:02.0
      	 feb00000-febfffff : reserved
      	fec00000-fed003ff : reserved
      	 fec00000-fec00fff : IOAPIC 0
      	 fed00000-fed003ff : HPET 0
      
      and because the reserved entry had been split and moved into the
      individual resources, and because it used the IORESOURCE_BUSY flag, the
      drivers that actually wanted to _use_ those resources couldn't actually
      attach to them:
      
      	e1000e 0000:00:19.0: BAR 0: can't reserve mem region [0xfe9e0000-0xfe9fffff]
      	HDA Intel 0000:00:1b.0: BAR 0: can't reserve mem region [0xfe9dc000-0xfe9dffff]
      
      with this patch, the resource tree instead becomes
      
      	e0000000-fed003ff : reserved
      	  fe800000-fe8fffff : PCI Bus 0000:01
      	  fe9d9b00-fe9d9bff : 0000:00:1f.3
      	  fe9d9c00-fe9d9fff : 0000:00:1a.7
      	    fe9d9c00-fe9d9fff : ehci_hcd
      	  fe9da000-fe9dafff : 0000:00:03.3
      	  fe9db000-fe9dbfff : 0000:00:19.0
      	    fe9db000-fe9dbfff : e1000e
      	  fe9dc000-fe9dffff : 0000:00:1b.0
      	    fe9dc000-fe9dffff : ICH HD audio
      	  fe9e0000-fe9fffff : 0000:00:19.0
      	    fe9e0000-fe9fffff : e1000e
      	  fea00000-fea7ffff : 0000:00:02.0
      	  fea80000-feafffff : 0000:00:02.1
      	  feb00000-febfffff : 0000:00:02.0
      	  fec00000-fec00fff : IOAPIC 0
      	  fed00000-fed003ff : HPET 0
      
      ie the one reserved region now ends up surrounding all the PCI resources
      that were allocated inside of it by firmware, and because it is not
      marked BUSY, drivers have no problem attaching to the pre-allocated
      resources.
      Reported-and-tested-by: NJonathan Corbet <corbet@lwn.net>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Robert Hancock <hancockr@shaw.ca>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1f987577