1. 04 6月, 2009 5 次提交
  2. 29 5月, 2009 10 次提交
  3. 28 5月, 2009 1 次提交
  4. 18 5月, 2009 2 次提交
    • Y
      x86, irq: don't call mp_config_acpi_gsi() if update_mptable is not enabled · f1bdb523
      Yinghai Lu 提交于
      Len expressed concern that the update_mptable feature has
      side-effects on the ACPI code.
      
      Make it sure explicitly that the code only ever gets called if
      the (default disabled) update_mptable boot quirk option is
      disabled.
      
      [ Impact: isolate the update_mptable feature from ACPI code more ]
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Cc: Len Brown <lenb@kernel.org>
      LKML-Reference: <4A0DC832.5090200@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f1bdb523
    • Y
      x86, apic: introduce io_apic_irq_attr · e5198075
      Yinghai Lu 提交于
      according to Ingo, io_apic irq-setup related functions have too many
      parameters with a repetitive signature.
      
      So reduce related funcs to get less params by passing a pointer
      to a newly defined io_apic_irq_attr structure.
      
      v2: io_apic_irq ==> irq_attr
          triggering ==> trigger
      
      v3: add set_io_apic_irq_attr
      
      [ Impact: cleanup ]
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
      Cc: Len Brown <lenb@kernel.org>
      LKML-Reference: <4A08ACD3.2070401@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e5198075
  5. 16 5月, 2009 1 次提交
    • J
      x86: Fix performance regression caused by paravirt_ops on native kernels · b4ecc126
      Jeremy Fitzhardinge 提交于
      Xiaohui Xin and some other folks at Intel have been looking into what's
      behind the performance hit of paravirt_ops when running native.
      
      It appears that the hit is entirely due to the paravirtualized
      spinlocks introduced by:
      
       | commit 8efcbab6
       | Date:   Mon Jul 7 12:07:51 2008 -0700
       |
       |     paravirt: introduce a "lock-byte" spinlock implementation
      
      The extra call/return in the spinlock path is somehow
      causing an increase in the cycles/instruction of somewhere around 2-7%
      (seems to vary quite a lot from test to test).  The working theory is
      that the CPU's pipeline is getting upset about the
      call->call->locked-op->return->return, and seems to be failing to
      speculate (though I haven't seen anything definitive about the precise
      reasons).  This doesn't entirely make sense, because the performance
      hit is also visible on unlock and other operations which don't involve
      locked instructions.  But spinlock operations clearly swamp all the
      other pvops operations, even though I can't imagine that they're
      nearly as common (there's only a .05% increase in instructions
      executed).
      
      If I disable just the pv-spinlock calls, my tests show that pvops is
      identical to non-pvops performance on native (my measurements show that
      it is actually about .1% faster, but Xiaohui shows a .05% slowdown).
      
      Summary of results, averaging 10 runs of the "mmperf" test, using a
      no-pvops build as baseline:
      
      		nopv		Pv-nospin	Pv-spin
      CPU cycles	100.00%		99.89%		102.18%
      instructions	100.00%		100.10%		100.15%
      CPI		100.00%		99.79%		102.03%
      cache ref	100.00%		100.84%		100.28%
      cache miss	100.00%		90.47%		88.56%
      cache miss rate	100.00%		89.72%		88.31%
      branches	100.00%		99.93%		100.04%
      branch miss	100.00%		103.66%		107.72%
      branch miss rt	100.00%		103.73%		107.67%
      wallclock	100.00%		99.90%		102.20%
      
      The clear effect here is that the 2% increase in CPI is
      directly reflected in the final wallclock time.
      
      (The other interesting effect is that the more ops are
      out of line calls via pvops, the lower the cache access
      and miss rates.  Not too surprising, but it suggests that
      the non-pvops kernel is over-inlined.  On the flipside,
      the branch misses go up correspondingly...)
      
      So, what's the fix?
      
      Paravirt patching turns all the pvops calls into direct calls, so
      _spin_lock etc do end up having direct calls.  For example, the compiler
      generated code for paravirtualized _spin_lock is:
      
      <_spin_lock+0>:		mov    %gs:0xb4c8,%rax
      <_spin_lock+9>:		incl   0xffffffffffffe044(%rax)
      <_spin_lock+15>:	callq  *0xffffffff805a5b30
      <_spin_lock+22>:	retq
      
      The indirect call will get patched to:
      <_spin_lock+0>:		mov    %gs:0xb4c8,%rax
      <_spin_lock+9>:		incl   0xffffffffffffe044(%rax)
      <_spin_lock+15>:	callq <__ticket_spin_lock>
      <_spin_lock+20>:	nop; nop		/* or whatever 2-byte nop */
      <_spin_lock+22>:	retq
      
      One possibility is to inline _spin_lock, etc, when building an
      optimised kernel (ie, when there's no spinlock/preempt
      instrumentation/debugging enabled).  That will remove the outer
      call/return pair, returning the instruction stream to a single
      call/return, which will presumably execute the same as the non-pvops
      case.  The downsides arel 1) it will replicate the
      preempt_disable/enable code at eack lock/unlock callsite; this code is
      fairly small, but not nothing; and 2) the spinlock definitions are
      already a very heavily tangled mass of #ifdefs and other preprocessor
      magic, and making any changes will be non-trivial.
      
      The other obvious answer is to disable pv-spinlocks.  Making them a
      separate config option is fairly easy, and it would be trivial to
      enable them only when Xen is enabled (as the only non-default user).
      But it doesn't really address the common case of a distro build which
      is going to have Xen support enabled, and leaves the open question of
      whether the native performance cost of pv-spinlocks is worth the
      performance improvement on a loaded Xen system (10% saving of overall
      system CPU when guests block rather than spin).  Still it is a
      reasonable short-term workaround.
      
      [ Impact: fix pvops performance regression when running native ]
      Analysed-by: N"Xin Xiaohui" <xiaohui.xin@intel.com>
      Analysed-by: N"Li Xin" <xin.li@intel.com>
      Analysed-by: N"Nakajima Jun" <jun.nakajima@intel.com>
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Acked-by: NH. Peter Anvin <hpa@zytor.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Xen-devel <xen-devel@lists.xensource.com>
      LKML-Reference: <4A0B62F7.5030802@goop.org>
      [ fixed the help text ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b4ecc126
  6. 12 5月, 2009 2 次提交
    • Y
      x86: read apic ID in the !acpi_lapic case · 4797f6b0
      Yinghai Lu 提交于
      Ed found that on 32-bit, boot_cpu_physical_apicid is not read right,
      when the mptable is broken.
      
      Interestingly, actually three paths use/set it:
      
       1. acpi: at that time that is already read from reg
       2. mptable: only read from mptable
       3. no madt, and no mptable, that use default apic id 0 for 64-bit, -1 for 32-bit
      
      so we could read the apic id for the 2/3 path. We trust the hardware
      register more than we trust a BIOS data structure (the mptable).
      
      We can also avoid the double set_fixmap() when acpi_lapic
      is used, and also need to move cpu_has_apic earlier and
      call apic_disable().
      
      Also when need to update the apic id, we'd better read and
      set the apic version as well - so that quirks are applied precisely.
      
      v2: make path 3 with 64bit, use -1 as apic id, so could read it later.
      v3: fix whitespace problem pointed out by Ed Swierk
      v5: fix boot crash
      
      [ Impact: get correct apic id for bsp other than acpi path ]
      Reported-by: NEd Swierk <eswierk@aristanetworks.com>
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Acked-by: NCyrill Gorcunov <gorcunov@openvz.org>
      LKML-Reference: <49FC85A9.2070702@kernel.org>
      [ v4: sanity-check in the ACPI case too ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4797f6b0
    • M
      x86, 32-bit: fix kernel_trap_sp() · 7b6c6c77
      Masami Hiramatsu 提交于
      Use &regs->sp instead of regs for getting the top of stack in kernel mode.
      (on x86-64, regs->sp always points the top of stack)
      
      [ Impact: Oprofile decodes only stack for backtracing on i386 ]
      Signed-off-by: NMasami Hiramatsu <mhiramat@redhat.com>
      [ v2: rename the API to kernel_stack_pointer(), move variable inside ]
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: systemtap@sources.redhat.com
      Cc: Harvey Harrison <harvey.harrison@gmail.com>
      Cc: Jan Blunck <jblunck@suse.de>
      Cc: Christoph Hellwig <hch@infradead.org>
      LKML-Reference: <20090511210300.17332.67549.stgit@localhost.localdomain>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7b6c6c77
  7. 11 5月, 2009 6 次提交
  8. 03 5月, 2009 1 次提交
  9. 28 4月, 2009 1 次提交
    • Y
      irq: change ACPI GSI APIs to also take a device argument · a2f809b0
      Yinghai Lu 提交于
      We want to use dev_to_node() later on, to be aware of the 'home node'
      of the GSI in question.
      
      [ Impact: cleanup, prepare the IRQ code to be more NUMA aware ]
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Acked-by: NLen Brown <lenb@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-acpi@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
      LKML-Reference: <49F65560.20904@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a2f809b0
  10. 23 4月, 2009 2 次提交
  11. 22 4月, 2009 2 次提交
    • S
      x86: x2apic, IR: remove reinit_intr_remapped_IO_APIC() · ff166cb5
      Suresh Siddha 提交于
      When interrupt-remapping is enabled, we are relying on
      setup_IO_APIC_irqs() to configure remapped entries in the
      IO-APIC, which comes little bit later after enabling
      interrupt-remapping.
      
      Meanwhile, restoration of old io-apic entries after enabling
      interrupt-remapping will not make the interrupts through
      io-apic functional anyway.
      
      So remove the unnecessary reinit_intr_remapped_IO_APIC() step.
      
      The longer story:
      
      When interrupt-remapping is enabled, IO-APIC entries need to be
      setup in the re-mappable format (pointing to
      interrupt-remapping table entries setup by the OS). This
      remapping configuration is happening in the same place where we
      traditionally configure IO-APIC (i.e., in
      setup_IO_APIC_irqs()).
      
      So when we enable interrupt-remapping successfully, there is no
      need to restore old io-apic RTE entries before we actually do a
      complete configuration shortly in setup_IO_APIC_irqs(). Old
      IO-APIC RTE's may be in traditional format (non re-mappable) or
      in re-mappable format pointing to interrupt-remapping table
      entries setup by BIOS. Restoring both of these will not make
      IO-APIC functional. We have to rely on setup_IO_APIC_irqs() for
      proper configuration by OS.
      
      So I am removing this unnecessary and broken step.
      
      [ Impact: remove unnecessary/broken IO-APIC setup step ]
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Acked-by: NWeidong Han <weidong.han@intel.com>
      Cc: dwmw2@infradead.org
      LKML-Reference: <20090420200450.552359000@linux-os.sc.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ff166cb5
    • D
      FRV: Fix the section attribute on UP DECLARE_PER_CPU() · 9b8de747
      David Howells 提交于
      In non-SMP mode, the variable section attribute specified by DECLARE_PER_CPU()
      does not agree with that specified by DEFINE_PER_CPU().  This means that
      architectures that have a small data section references relative to a base
      register may throw up linkage errors due to too great a displacement between
      where the base register points and the per-CPU variable.
      
      On FRV, the .h declaration says that the variable is in the .sdata section, but
      the .c definition says it's actually in the .data section.  The linker throws
      up the following errors:
      
      kernel/built-in.o: In function `release_task':
      kernel/exit.c:78: relocation truncated to fit: R_FRV_GPREL12 against symbol `per_cpu__process_counts' defined in .data section in kernel/built-in.o
      kernel/exit.c:78: relocation truncated to fit: R_FRV_GPREL12 against symbol `per_cpu__process_counts' defined in .data section in kernel/built-in.o
      
      To fix this, DECLARE_PER_CPU() should simply apply the same section attribute
      as does DEFINE_PER_CPU().  However, this is made slightly more complex by
      virtue of the fact that there are several variants on DEFINE, so these need to
      be matched by variants on DECLARE.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9b8de747
  12. 21 4月, 2009 1 次提交
  13. 19 4月, 2009 3 次提交
    • R
      lguest: fix guest crash on non-linear addresses in gdt pvops · a489f0b5
      Rusty Russell 提交于
      Fixes guest crash 'lguest: bad read address 0x4800000 len 256'
      
      The new per-cpu allocator ends up handing a non-linear address to
      write_gdt_entry.  We do __pa() on it, and hand it to the host, which
      kills us.
      
      I've long wanted to make the hypercall "LOAD_GDT_ENTRY" to match the IDT
      code, but had no pressing reason until now.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: lguest@ozlabs.org
      a489f0b5
    • W
      x86, intr-remap: enable interrupt remapping early · 93758238
      Weidong Han 提交于
      Currently, when x2apic is not enabled, interrupt remapping
      will be enabled in init_dmars(), where it is too late to remap
      ioapic interrupts, that is, ioapic interrupts are really in
      compatibility mode, not remappable mode.
      
      This patch always enables interrupt remapping before ioapic
      setup, it guarantees all interrupts will be remapped when
      interrupt remapping is enabled. Thus it doesn't need to set
      the compatibility interrupt bit.
      
      [ Impact: refactor intr-remap init sequence, enable fuller remap mode ]
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NWeidong Han <weidong.han@intel.com>
      Acked-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      Cc: iommu@lists.linux-foundation.org
      Cc: allen.m.kay@intel.com
      Cc: fenghua.yu@intel.com
      LKML-Reference: <1239957736-6161-4-git-send-email-weidong.han@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      93758238
    • W
      x86, intr-remap: fix ack for interrupt remapping · 5d0ae2db
      Weidong Han 提交于
      Shouldn't call ack_apic_edge() in ir_ack_apic_edge(), because
      ack_apic_edge() does more than just ack: it also does irq migration
      in the non-interrupt-remapping case. But there is no such need for
      interrupt-remapping case, as irq migration is done in the process
      context.
      
      Similarly, ir_ack_apic_level() shouldn't call ack_apic_level, and
      instead should do the local cpu's EOI + directed EOI to the io-apic.
      
      ack_x2APIC_irq() is not neccessary, because ack_APIC_irq() will use MSR
      write for x2apic, and uncached write for non-x2apic.
      
      [ Impact: simplify/standardize intr-remap IRQ acking, fix on !x2apic ]
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NWeidong Han <weidong.han@intel.com>
      Acked-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      Cc: iommu@lists.linux-foundation.org
      Cc: allen.m.kay@intel.com
      Cc: fenghua.yu@intel.com
      LKML-Reference: <1239957736-6161-3-git-send-email-weidong.han@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      5d0ae2db
  14. 13 4月, 2009 1 次提交
    • C
      x86: apic - introduce dummy apic operations · 08306ce6
      Cyrill Gorcunov 提交于
      Impact: refactor, speed up and robustize code
      
      In case if apic was disabled by kernel option
      or by hardware limits we can use dummy operations
      in apic->write to simplify the ack_APIC_irq() code.
      
      At the lame time the patch fixes the missed EOI in
      do_IRQ function (which has place if kernel is compiled
      as X86-32 and interrupt without handler happens where
      apic was not asked to be disabled via kernel option).
      
      Note that native_apic_write_dummy() consists of
      WARN_ON_ONCE to catch any buggy writes on enabled
      APICs. Could be removed after some time of testing.
      Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
      LKML-Reference: <20090412165058.724788431@openvz.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      08306ce6
  15. 12 4月, 2009 1 次提交
  16. 11 4月, 2009 1 次提交