  1. 11 Jun, 2009 2 commits
  2. 09 Jun, 2009 1 commit
  3. 07 Jun, 2009 1 commit
    • x86, apic: Fix dummy apic read operation together with broken MP handling · 103428e5
      Cyrill Gorcunov committed
      Ingo Molnar reported that read_apic is buggy nowadays:
      
      [    0.000000] Using APIC driver default
      [    0.000000] SMP: Allowing 1 CPUs, 0 hotplug CPUs
      [    0.000000] Local APIC disabled by BIOS -- you can enable it with "lapic"
      [    0.000000] APIC: disable apic facility
      [    0.000000] ------------[ cut here ]------------
      [    0.000000] WARNING: at arch/x86/kernel/apic/apic.c:254 native_apic_read_dummy+0x2d/0x3b()
      [    0.000000] Hardware name: HP OmniBook PC
      
      Indeed, we still rely on the apic->read operation for an SMP-compiled
      kernel. And instead of disfiguring the SMP code with #ifdef, we
      allow apic->read to be called. To capture any unexpected results,
      we check that apic->read is being called for a sane reason via
      WARN_ON_ONCE, but(!) instead of a logical OR we should use a logical
      AND (thanks Yinghai for spotting the root of the problem).
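
The OR-versus-AND point above can be illustrated with a minimal C sketch. It assumes the check combines two conditions: whether the CPU has a local APIC, and whether the APIC has been disabled. The struct, its field names, and the helper functions are hypothetical stand-ins and are not copied from arch/x86/kernel/apic/apic.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two conditions the check is assumed to use. */
struct apic_state {
    bool cpu_has_apic;  /* the CPU reports a local APIC */
    bool disable_apic;  /* the APIC was disabled (BIOS or command line) */
};

/* OR form: true whenever either condition holds, so it also fires on a
 * dummy read made while the APIC is deliberately disabled. */
static bool warns_with_or(struct apic_state s)
{
    return s.cpu_has_apic || !s.disable_apic;
}

/* AND form: true only when an APIC is present and was not disabled,
 * which is the case where a dummy read really is unexpected. */
static bool warns_with_and(struct apic_state s)
{
    return s.cpu_has_apic && !s.disable_apic;
}

/* Self-check for an APIC that is present but disabled. */
static void check_disabled_apic_case(void)
{
    struct apic_state s = { .cpu_has_apic = true, .disable_apic = true };

    assert(warns_with_or(s));    /* spurious warning with OR */
    assert(!warns_with_and(s));  /* no warning with AND */
}
```

The two forms only differ when exactly one of the conditions holds, which is the situation in the boot log quoted above.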
      
      Along with that, we could have a bad MP table, so we fix
      things up such that SMP is not started and there are no complaints
      about a BIOS bug if the apic was just disabled via the command line.
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <20090607124840.GD4547@lenovo>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  4. 06 Jun, 2009 1 commit
  5. 05 Jun, 2009 1 commit
  6. 04 Jun, 2009 1 commit
  7. 03 Jun, 2009 1 commit
  8. 02 Jun, 2009 2 commits
    • x86, apic: Restore irqs on fail paths · 3d58829b
      Jiri Slaby committed
      lapic_resume forgets to restore interrupts on fail paths.
      Fix that.
      Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
      Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
      LKML-Reference: <1243497289-18591-1-git-send-email-jirislaby@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@zytor.com>
    • x86: Print real IOAPIC version for x86-64 · 58f892e0
      Naga Chumbalkar committed
      Fix the fact that the IOAPIC version number in the x86_64 code path always
      gets assigned to 0, instead of the correct value.
      
      Before the patch (from "dmesg" output):
      
       ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
       IOAPIC[0]: apic_id 8, version 0, address 0xfec00000, GSI 0-23     <---
      
      After the patch:
       ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
       IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23    <---
      
      History:
      
      io_apic_get_version() was compiled out of the x86_64 code path in the commit
      f2c2cca3:
      
      Author: Andi Kleen <ak@suse.de>
      Date:   Tue Sep 26 10:52:37 2006 +0200
      
          [PATCH] Remove APIC version/cpu capability mpparse checking/printing
      
          ACPI went to great trouble to get the APIC version and CPU capabilities
          of different CPUs before passing them to the mpparser. But all
          that data was used was to print it out.  Actually it even faked some data
          based on the boot cpu, not on the actual CPU being booted.
      
          Remove all this code because it's not needed.
      
          Cc: len.brown@intel.com
      
      At the time, the IOAPIC version number was deliberately not printed
      in the x86_64 code path. However, after the x86 and x86_64 files were
      merged, the net result is that the IOAPIC version is printed incorrectly
      in the x86_64 code path.
      
      The patch below provides a fix. I have tested it with acpi, and with
      acpi=off, and did not see any problems.
      Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
      Acked-by: Yinghai Lu <yhlu.kernel@gmail.com>
      LKML-Reference: <20090416014230.4885.94926.sendpatchset@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  9. 30 May, 2009 1 commit
  10. 29 May, 2009 1 commit
    • x86: ignore VM_LOCKED when determining if hugetlb-backed page tables can be shared or not · 32b154c0
      Mel Gorman committed
      Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13302
      
      On x86 and x86-64, it is possible that page tables are shared between
      shared mappings backed by hugetlbfs.  As part of this,
      page_table_shareable() checks a pair of vma->vm_flags and they must match
      if they are to be shared.  All VMA flags are taken into account, including
      VM_LOCKED.
      
      The problem is that VM_LOCKED is cleared on fork().  When a process with a
      shared memory segment forks() to exec() a helper, there will be shared
      VMAs with different flags.  The impact is that the shared segment is
      sometimes considered shareable and other times not, depending on what
      process is checking.
      
      What happens is that the segment page tables are being shared but the
      count is inaccurate depending on the ordering of events.  As the page
      tables are freed with put_page(), bad pmd's are found when some of the
      children exit.  The hugepage counters also get corrupted and the Total and
      Free count will no longer match even when all the hugepage-backed regions
      are freed.  This requires a reboot of the machine to "fix".
      
      This patch addresses the problem by comparing all flags except VM_LOCKED
      when deciding if pagetables should be shared or not for hugetlbfs-backed
      mapping.
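
The comparison described above can be sketched in a few lines of C. This is an illustration only: the flag values are assumed for the example, and the helper name flags_allow_sharing is hypothetical and is not the kernel's page_table_shareable().

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values assumed for illustration; the real definitions live in the
 * kernel's mm headers. */
#define VM_SHARED 0x0008UL
#define VM_LOCKED 0x2000UL

/* Hypothetical helper: two VMAs may share page tables when their flags
 * match once VM_LOCKED is masked out of both sides. */
static bool flags_allow_sharing(unsigned long a, unsigned long b)
{
    return (a & ~VM_LOCKED) == (b & ~VM_LOCKED);
}

/* Self-check: a locked parent mapping and its forked child, whose
 * VM_LOCKED bit was cleared, still compare as shareable. */
static void check_fork_case(void)
{
    unsigned long parent = VM_SHARED | VM_LOCKED;
    unsigned long child = VM_SHARED;

    assert(parent != child);                    /* a strict compare fails */
    assert(flags_allow_sharing(parent, child)); /* a masked compare passes */
}
```

Masking both sides makes the result independent of which process performs the check, which is the asymmetry the commit message describes.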
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: <stable@kernel.org>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <starlight@binnacle.cx>
      Cc: Eric B Munson <ebmunson@us.ibm.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 28 May, 2009 1 commit
  12. 27 May, 2009 5 commits
  13. 26 May, 2009 3 commits
    • x86, relocs: ignore R_386_NONE in kernel relocation entries · 46176b4f
      Tejun Heo committed
      For relocatable 32bit kernels, boot/compressed/relocs.c processes
      relocation entries in the kernel image and appends them to the kernel
      image such that boot/compressed/head_32.S can relocate the kernel.
      The kernel image is one statically linked object and only uses two
      relocation types, R_386_PC32 and R_386_32. Of the two, only the latter
      needs massaging during kernel relocation and is thus handled by relocs.
      R_386_PC32 is ignored and all other relocation types are considered
      an error.
      
      When the target of a relocation resides in a discarded section,
      binutils doesn't throw away the relocation record but nullifies it by
      changing it to R_386_NONE, which unfortunately makes relocs fail.
      
      The problem was triggered by as-yet out-of-tree x86 stack unwind patches,
      but given the binutils behavior, ignoring R_386_NONE is the right
      thing to do.
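
The per-type handling described above can be sketched as a small classifier. The relocation type numbers are the standard i386 ELF values; the enum and the function name classify_reloc are hypothetical and are not taken from relocs.c.

```c
#include <assert.h>

/* i386 ELF relocation type numbers. */
#define R_386_NONE 0
#define R_386_32   1
#define R_386_PC32 2

/* Hypothetical outcome of looking at one relocation entry. */
enum reloc_action { RELOC_SKIP, RELOC_RECORD, RELOC_ERROR };

static enum reloc_action classify_reloc(unsigned int type)
{
    switch (type) {
    case R_386_NONE:  /* nullified by binutils: the target was discarded */
    case R_386_PC32:  /* PC-relative: unaffected by moving the whole image */
        return RELOC_SKIP;
    case R_386_32:    /* absolute address: must be adjusted at boot */
        return RELOC_RECORD;
    default:          /* anything else is unexpected in the kernel image */
        return RELOC_ERROR;
    }
}

/* Self-check for the two types the text says are ignored. */
static void check_ignored_types(void)
{
    assert(classify_reloc(R_386_NONE) == RELOC_SKIP);
    assert(classify_reloc(R_386_PC32) == RELOC_SKIP);
}
```

Treating R_386_NONE like R_386_PC32 keeps the "everything else is an error" rule intact for genuinely unknown types.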
      
      The problem has been tracked down to binutils behavior by Jan Beulich.
      
      [ Impact: fix build with certain binutils by ignoring R_386_NONE ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      LKML-Reference: <4A1B8150.40702@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • KVM: Fix PDPTR reloading on CR4 writes · a2edf57f
      Avi Kivity committed
      The processor is documented to reload the PDPTRs while in PAE mode if any
      of the CR4 bits PSE, PGE, or PAE change.  Linux relies on this
      behaviour when zapping the low mappings of PAE kernels during boot.
      
      The code already handled changes to CR4.PAE; augment it to also notice changes
      to PSE and PGE.
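
A minimal sketch of the "did a relevant CR4 bit change" test follows. The CR4 bit positions are the architectural ones; the function name and the in_pae_mode parameter are hypothetical simplifications and do not reproduce KVM's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Architectural CR4 bit positions. */
#define X86_CR4_PSE 0x00000010UL  /* bit 4: page size extensions */
#define X86_CR4_PAE 0x00000020UL  /* bit 5: physical address extension */
#define X86_CR4_PGE 0x00000080UL  /* bit 7: global pages */

/* Hypothetical helper: a CR4 write reloads the PDPTRs if the guest is in
 * PAE mode and any of PSE, PGE, or PAE differs between old and new. */
static bool cr4_write_reloads_pdptrs(bool in_pae_mode,
                                     unsigned long old_cr4,
                                     unsigned long new_cr4)
{
    const unsigned long reload_bits = X86_CR4_PSE | X86_CR4_PGE | X86_CR4_PAE;

    return in_pae_mode && ((old_cr4 ^ new_cr4) & reload_bits) != 0;
}

/* Self-check: toggling PGE alone must now count as a reload trigger. */
static void check_pge_toggle(void)
{
    unsigned long old_cr4 = X86_CR4_PAE;
    unsigned long new_cr4 = X86_CR4_PAE | X86_CR4_PGE;

    assert(cr4_write_reloads_pdptrs(true, old_cr4, new_cr4));
}
```

XOR-ing the old and new values isolates the changed bits, so one mask covers all three flags.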
      
      This triggered while booting an F11 PAE kernel; the futex initialization code
      runs before any CR3 reloads and writes to a NULL pointer; the futex subsystem
      ended up uninitialized, killing PI futexes and pulseaudio which uses them.
      
      Cc: stable@kernel.org
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Make paravirt tlb flush also reload the PAE PDPTRs · a8cd0244
      Avi Kivity committed
      The paravirt tlb flush may be used not only to flush TLBs, but also
      to reload the four page-directory-pointer-table entries, as it is used
      as a replacement for reloading CR3.  Change the code to do the entire
      CR3 reloading dance instead of simply flushing the TLB.
      
      Cc: stable@kernel.org
      Signed-off-by: Avi Kivity <avi@redhat.com>
  14. 25 May, 2009 1 commit
  15. 23 May, 2009 4 commits
  16. 22 May, 2009 1 commit
  17. 21 May, 2009 1 commit
  18. 19 May, 2009 1 commit
    • x86, io-apic: Don't mark pin_programmed early · 4c6f18fc
      Yinghai Lu committed
      Peter bisected that:
      
      | commit b9c61b70
      | Date:   Wed May 6 10:10:06 2009 -0700
      |
      |     x86/pci: update pirq_enable_irq() to setup io apic routing
      |
      |     So we can set io apic routing only when enabling the device irq.
      
      wrecked his opteron box, ata1 interrupts fail to get through.
      
      ata1 is using irq 11:
      
      [    1.451839] sata_svw 0000:01:0e.0: version 2.3
      [    1.456333] sata_svw 0000:01:0e.0: PCI INT A -> GSI 11 (level, low) -> IRQ 11
      [    1.463639] scsi0 : sata_svw
      [    1.466949] scsi1 : sata_svw
      [    1.470022] scsi2 : sata_svw
      [    1.473090] scsi3 : sata_svw
      [    1.476112] ata1: SATA max UDMA/133 mmio m8192@0xff3fe000 port 0xff3fe000 irq 11
      [    1.483490] ata2: SATA max UDMA/133 mmio m8192@0xff3fe000 port 0xff3fe100 irq 11
      [    1.490870] ata3: SATA max UDMA/133 mmio m8192@0xff3fe000 port 0xff3fe200 irq 11
      [    1.498247] ata4: SATA max UDMA/133 mmio m8192@0xff3fe000 port 0xff3fe300 irq 11
      
      That pin overlaps with the legacy pins.
      
      We should not set bits in pin_programmed here, so that those bits can
      be set later via io_apic_set_pci_routing().
      
      [ Impact: fix boot hang on certain systems ]
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Yinghai Lu <yinghai.lu@kernel.org>
      Tested-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Jack Steiner <steiner@sgi.com>
      LKML-Reference: <4A119990.9020606@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  19. 18 May, 2009 4 commits
  20. 16 May, 2009 1 commit
    • x86: Fix performance regression caused by paravirt_ops on native kernels · b4ecc126
      Jeremy Fitzhardinge committed
      Xiaohui Xin and some other folks at Intel have been looking into what's
      behind the performance hit of paravirt_ops when running native.
      
      It appears that the hit is entirely due to the paravirtualized
      spinlocks introduced by:
      
       | commit 8efcbab6
       | Date:   Mon Jul 7 12:07:51 2008 -0700
       |
       |     paravirt: introduce a "lock-byte" spinlock implementation
      
      The extra call/return in the spinlock path is somehow
      causing an increase in the cycles/instruction of somewhere around 2-7%
      (seems to vary quite a lot from test to test).  The working theory is
      that the CPU's pipeline is getting upset about the
      call->call->locked-op->return->return, and seems to be failing to
      speculate (though I haven't seen anything definitive about the precise
      reasons).  This doesn't entirely make sense, because the performance
      hit is also visible on unlock and other operations which don't involve
      locked instructions.  But spinlock operations clearly swamp all the
      other pvops operations, even though I can't imagine that they're
      nearly as common (there's only a .05% increase in instructions
      executed).
      
      If I disable just the pv-spinlock calls, my tests show that pvops is
      identical to non-pvops performance on native (my measurements show that
      it is actually about .1% faster, but Xiaohui shows a .05% slowdown).
      
      Summary of results, averaging 10 runs of the "mmperf" test, using a
      no-pvops build as baseline:
      
      		nopv		Pv-nospin	Pv-spin
      CPU cycles	100.00%		99.89%		102.18%
      instructions	100.00%		100.10%		100.15%
      CPI		100.00%		99.79%		102.03%
      cache ref	100.00%		100.84%		100.28%
      cache miss	100.00%		90.47%		88.56%
      cache miss rate	100.00%		89.72%		88.31%
      branches	100.00%		99.93%		100.04%
      branch miss	100.00%		103.66%		107.72%
      branch miss rt	100.00%		103.73%		107.67%
      wallclock	100.00%		99.90%		102.20%
      
      The clear effect here is that the 2% increase in CPI is
      directly reflected in the final wallclock time.
      
      (The other interesting effect is that the more ops are
      out of line calls via pvops, the lower the cache access
      and miss rates.  Not too surprising, but it suggests that
      the non-pvops kernel is over-inlined.  On the flipside,
      the branch misses go up correspondingly...)
      
      So, what's the fix?
      
      Paravirt patching turns all the pvops calls into direct calls, so
      _spin_lock etc do end up having direct calls.  For example, the compiler
      generated code for paravirtualized _spin_lock is:
      
      <_spin_lock+0>:		mov    %gs:0xb4c8,%rax
      <_spin_lock+9>:		incl   0xffffffffffffe044(%rax)
      <_spin_lock+15>:	callq  *0xffffffff805a5b30
      <_spin_lock+22>:	retq
      
      The indirect call will get patched to:
      <_spin_lock+0>:		mov    %gs:0xb4c8,%rax
      <_spin_lock+9>:		incl   0xffffffffffffe044(%rax)
      <_spin_lock+15>:	callq <__ticket_spin_lock>
      <_spin_lock+20>:	nop; nop		/* or whatever 2-byte nop */
      <_spin_lock+22>:	retq
      
      One possibility is to inline _spin_lock, etc, when building an
      optimised kernel (ie, when there's no spinlock/preempt
      instrumentation/debugging enabled).  That will remove the outer
      call/return pair, returning the instruction stream to a single
      call/return, which will presumably execute the same as the non-pvops
      case.  The downsides are: 1) it will replicate the
      preempt_disable/enable code at each lock/unlock callsite; this code is
      fairly small, but not nothing; and 2) the spinlock definitions are
      already a very heavily tangled mass of #ifdefs and other preprocessor
      magic, and making any changes will be non-trivial.
      
      The other obvious answer is to disable pv-spinlocks.  Making them a
      separate config option is fairly easy, and it would be trivial to
      enable them only when Xen is enabled (as the only non-default user).
      But it doesn't really address the common case of a distro build which
      is going to have Xen support enabled, and leaves the open question of
      whether the native performance cost of pv-spinlocks is worth the
      performance improvement on a loaded Xen system (10% saving of overall
      system CPU when guests block rather than spin).  Still it is a
      reasonable short-term workaround.
      
      [ Impact: fix pvops performance regression when running native ]
      Analysed-by: "Xin Xiaohui" <xiaohui.xin@intel.com>
      Analysed-by: "Li Xin" <xin.li@intel.com>
      Analysed-by: "Nakajima Jun" <jun.nakajima@intel.com>
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Xen-devel <xen-devel@lists.xensource.com>
      LKML-Reference: <4A0B62F7.5030802@goop.org>
      [ fixed the help text ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  21. 15 May, 2009 1 commit
  22. 14 May, 2009 1 commit
    • x86/function-graph: fix constraint for recording old return value · aa512a27
      Steven Rostedt committed
      After upgrading from gcc 4.2.2 to 4.4.0, the function graph tracer broke.
      Investigating, I found that in the asm that replaces the return value,
      gcc was using the same register for the old value as it was for the
      new value.
      
      	mov	(addr), old
      	mov	new, (addr)
      
      But if old and new are the same register, we clobber new with old!
      I first thought this was a bug in gcc 4.4.0 and reported it:
      
        http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40132
      
      Andrew Pinski responded (quickly), saying that it was correct gcc behavior
      and the code needed to denote old as an "early clobber".
      
      Instead of "=r"(old), we need "=&r"(old).
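
The two-instruction sequence and the early-clobber constraint can be sketched as GCC-style inline assembly. This is an illustration under assumptions, not the tracer's actual code: the function name replace_slot is hypothetical, and a plain C fallback is included so the sketch also builds on non-x86 targets.

```c
#include <assert.h>

/* Store new_val at *slot and return the value that was there before. */
static unsigned long replace_slot(unsigned long *slot, unsigned long new_val)
{
    unsigned long old;

#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile(
        "mov (%1), %0\n\t"  /* old = *slot */
        "mov %2, (%1)\n\t"  /* *slot = new_val */
        : "=&r" (old)       /* "&": output is written before inputs are done */
        : "r" (slot), "r" (new_val)
        : "memory");
#else
    /* Portable fallback so the sketch builds on other architectures. */
    old = *slot;
    *slot = new_val;
#endif
    return old;
}

/* Self-check: the old value comes back and the new one is stored. */
static void check_replace_slot(void)
{
    unsigned long slot = 1;

    assert(replace_slot(&slot, 2) == 1);
    assert(slot == 2);
}
```

Without the "&", the compiler is free to place old in the same register as one of the inputs, because it assumes all inputs are consumed before any output is written.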
      
      [Impact: keep function graph tracer from breaking with gcc 4.4.0 ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  23. 13 May, 2009 3 commits
    • xen: use header for EXPORT_SYMBOL_GPL · 44408ad7
      Randy Dunlap committed
      mmu.c needs to #include module.h to prevent these warnings:
      
       arch/x86/xen/mmu.c:239: warning: data definition has no type or storage class
       arch/x86/xen/mmu.c:239: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL'
       arch/x86/xen/mmu.c:239: warning: parameter names (without types) in function declaration
      
      [ Impact: cleanup ]
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86-64: align __PHYSICAL_START, remove __KERNEL_ALIGN · c4f68236
      H. Peter Anvin committed
      Handle the misconfiguration where CONFIG_PHYSICAL_START is
      incompatible with CONFIG_PHYSICAL_ALIGN.  This is a configuration
      error, but one which arises easily since Kconfig doesn't have the
      smarts to express the true relationship between these two variables.
      Hence, align __PHYSICAL_START the same way we align LOAD_PHYSICAL_ADDR
      in <asm/boot.h>.
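
The alignment step can be sketched as a conventional round-up to a power-of-two boundary. The helper name align_up is hypothetical, and the sketch assumes the alignment is a non-zero power of two; it is meant to illustrate the idea rather than reproduce the macro in <asm/boot.h>.

```c
#include <assert.h>

/* Round addr up to the next multiple of align (a non-zero power of two). */
static unsigned long align_up(unsigned long addr, unsigned long align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + (align - 1)) & ~(align - 1);
}

/* Self-check: a start address that is not a multiple of the alignment
 * is moved up to the next boundary. */
static void check_misconfigured_start(void)
{
    unsigned long start = 0x100000UL;  /* 1 MiB */
    unsigned long align = 0x200000UL;  /* 2 MiB */

    assert(align_up(start, align) == 0x200000UL);
}
```

An already-aligned start address is returned unchanged, so a consistent configuration is unaffected.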
      
      For non-relocatable kernels, this would cause the boot to fail.
      
      [ Impact: fix boot failures for non-relocatable kernels ]
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, boot: correct sanity checks in boot/compressed/misc.c · 7ed42a28
      H. Peter Anvin committed
      arch/x86/boot/compressed/misc.c contains several sanity checks on the
      output address.  Correct constraints that are no longer correct:
      
      - the alignment test should be MIN_KERNEL_ALIGN on both 32 and 64
        bits.
      - the 64 bit maximum address was set to 2^40, which was the limit of
        one specific x86-64 implementation.  Change the test to 2^46, the
        current Linux limit, and at least try to test the end rather than
        the beginning.
      - for non-relocatable kernels, test against LOAD_PHYSICAL_ADDR on both
        32 and 64 bits.
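
The three checks listed above can be sketched as one validation function. Everything here is an assumption made for illustration: the function name, its parameters, the MIN_KERNEL_ALIGN value, and the choice to treat the LOAD_PHYSICAL_ADDR test as an equality check. It is not the code in misc.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIN_KERNEL_ALIGN (UINT64_C(1) << 21)  /* value assumed for illustration */
#define MAX_ADDR_64      (UINT64_C(1) << 46)  /* 2^46, the limit named above */

static bool output_address_ok(uint64_t output, uint64_t size,
                              bool relocatable, uint64_t load_physical_addr)
{
    /* 1. The output address must be aligned to MIN_KERNEL_ALIGN. */
    if (output & (MIN_KERNEL_ALIGN - 1))
        return false;

    /* 2. Test the end of the image, not just its start, against 2^46.
     *    Written this way to avoid overflow in output + size. */
    if (size > MAX_ADDR_64 || output > MAX_ADDR_64 - size)
        return false;

    /* 3. A non-relocatable kernel is assumed to require an exact match
     *    with LOAD_PHYSICAL_ADDR. */
    if (!relocatable && output != load_physical_addr)
        return false;

    return true;
}

/* Self-check: an image whose start is below 2^46 but whose end is not. */
static void check_end_is_tested(void)
{
    uint64_t output = MAX_ADDR_64 - MIN_KERNEL_ALIGN;

    assert(!output_address_ok(output, 2 * MIN_KERNEL_ALIGN, true, 0));
}
```

Checking the end address matters because an image can start below the limit and still extend past it.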
      
      [ Impact: fix potential boot failure due to invalid tests ]
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  24. 12 May, 2009 1 commit
    • x86: read apic ID in the !acpi_lapic case · 4797f6b0
      Yinghai Lu committed
      Ed found that on 32-bit, boot_cpu_physical_apicid is not read right,
      when the mptable is broken.
      
      Interestingly, actually three paths use/set it:
      
       1. acpi: at that time that is already read from reg
       2. mptable: only read from mptable
       3. no madt and no mptable: uses default apic id 0 for 64-bit, -1 for 32-bit
      
      So we can read the apic id for paths 2 and 3. We trust the hardware
      register more than we trust a BIOS data structure (the mptable).
      
      We can also avoid the double set_fixmap() when acpi_lapic
      is used, and also need to move cpu_has_apic earlier and
      call apic_disable().
      
      Also, when we need to update the apic id, we'd better read and
      set the apic version as well - so that quirks are applied precisely.
      
      v2: make path 3 with 64bit, use -1 as apic id, so could read it later.
      v3: fix whitespace problem pointed out by Ed Swierk
      v5: fix boot crash
      
      [ Impact: get correct apic id for bsp other than acpi path ]
      Reported-by: Ed Swierk <eswierk@aristanetworks.com>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
      LKML-Reference: <49FC85A9.2070702@kernel.org>
      [ v4: sanity-check in the ACPI case too ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>