1. 18 5月, 2010 1 次提交
  2. 15 5月, 2010 3 次提交
    • H
      x86, mrst: Don't blindly access extended config space · e9b1d5d0
      H. Peter Anvin 提交于
      Do not blindly access extended configuration space unless we actively
      know we're on a Moorestown platform.  The fixed-size BAR capability
      lives in the extended configuration space, and thus is not applicable
      if the configuration space isn't appropriately sized.
      
      This fixes booting certain VMware configurations with CONFIG_MRST=y.
      
      Moorestown will add a fake PCI-X 266 capability to advertise the
      presence of extended configuration space.
      Reported-and-tested-by: NPetr Vandrovec <petr@vandrovec.name>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      Acked-by: NJacob Pan <jacob.jun.pan@intel.com>
      Acked-by: NJesse Barnes <jbarnes@virtuousgeek.org>
      LKML-Reference: <AANLkTiltKUa3TrKR1M51eGw8FLNoQJSLT0k0_K5X3-OJ@mail.gmail.com>
      e9b1d5d0
    • F
      x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments · 7f284d3c
      Frank Arnold 提交于
      When running a quest kernel on xen we get:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
      IP: [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df
      PGD 0
      Oops: 0000 [#1] SMP
      last sysfs file:
      CPU 0
      Modules linked in:
      
      Pid: 0, comm: swapper Tainted: G        W  2.6.34-rc3 #1 /HVM domU
      RIP: 0010:[<ffffffff8142f2fb>]  [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x
      2ca/0x3df
      RSP: 0018:ffff880002203e08  EFLAGS: 00010046
      RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000060
      RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000000
      RBP: ffff880002203ed8 R08: 00000000000017c0 R09: ffff880002203e38
      R10: ffff8800023d5d40 R11: ffffffff81a01e28 R12: ffff880187e6f5c0
      R13: ffff880002203e34 R14: ffff880002203e58 R15: ffff880002203e68
      FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000038 CR3: 0000000001a3c000 CR4: 00000000000006f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process swapper (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a44020)
      Stack:
       ffffffff810d7ecb ffff880002203e20 ffffffff81059140 ffff880002203e30
      <0> ffffffff810d7ec9 0000000002203e40 000000000050d140 ffff880002203e70
      <0> 0000000002008140 0000000000000086 ffff880040020140 ffffffff81068b8b
      Call Trace:
       <IRQ>
       [<ffffffff810d7ecb>] ? sync_supers_timer_fn+0x0/0x1c
       [<ffffffff81059140>] ? mod_timer+0x23/0x25
       [<ffffffff810d7ec9>] ? arm_supers_timer+0x34/0x36
       [<ffffffff81068b8b>] ? hrtimer_get_next_event+0xa7/0xc3
       [<ffffffff81058e85>] ? get_next_timer_interrupt+0x19a/0x20d
       [<ffffffff8142fa23>] get_cpu_leaves+0x5c/0x232
       [<ffffffff8106a7b1>] ? sched_clock_local+0x1c/0x82
       [<ffffffff8106a9a0>] ? sched_clock_tick+0x75/0x7a
       [<ffffffff8107748c>] generic_smp_call_function_single_interrupt+0xae/0xd0
       [<ffffffff8101f6ef>] smp_call_function_single_interrupt+0x18/0x27
       [<ffffffff8100a773>] call_function_single_interrupt+0x13/0x20
       <EOI>
       [<ffffffff8143c468>] ? notifier_call_chain+0x14/0x63
       [<ffffffff810295c6>] ? native_safe_halt+0xc/0xd
       [<ffffffff810114eb>] ? default_idle+0x36/0x53
       [<ffffffff81008c22>] cpu_idle+0xaa/0xe4
       [<ffffffff81423a9a>] rest_init+0x7e/0x80
       [<ffffffff81b10dd2>] start_kernel+0x40e/0x419
       [<ffffffff81b102c8>] x86_64_start_reservations+0xb3/0xb7
       [<ffffffff81b103c4>] x86_64_start_kernel+0xf8/0x107
      Code: 14 d5 40 ff ae 81 8b 14 02 31 c0 3b 15 47 1c 8b 00 7d 0e 48 8b 05 36 1c 8b
       00 48 63 d2 48 8b 04 d0 c7 85 5c ff ff ff 00 00 00 00 <8b> 70 38 48 8d 8d 5c ff
       ff ff 48 8b 78 10 ba c4 01 00 00 e8 eb
      RIP  [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df
       RSP <ffff880002203e08>
      CR2: 0000000000000038
      ---[ end trace a7919e7f17c0a726 ]---
      
      The L3 cache index disable feature of AMD CPUs has to be disabled if the
      kernel is running as guest on top of a hypervisor because northbridge
      devices are not available to the guest. Currently, this fixes a boot
      crash on top of Xen. In the future this will become an issue on KVM as
      well.
      
      Check if northbridge devices are present and do not enable the feature
      if there are none.
      
      [ hpa: backported to 2.6.34 ]
      Signed-off-by: NFrank Arnold <frank.arnold@amd.com>
      LKML-Reference: <1271945222-5283-3-git-send-email-bp@amd64.org>
      Acked-by: NBorislav Petkov <borislav.petkov@amd.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      Cc: <stable@kernel.org>
      7f284d3c
    • B
      x86, k8: Fix build error when K8_NB is disabled · ade029e2
      Borislav Petkov 提交于
      K8_NB depends on PCI and when the last is disabled (allnoconfig) we fail
      at the final linking stage due to missing exported num_k8_northbridges.
      Add a header stub for that.
      Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
      LKML-Reference: <20100503183036.GJ26107@aftab>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      Cc: <stable@kernel.org>
      ade029e2
  3. 14 5月, 2010 1 次提交
  4. 13 5月, 2010 3 次提交
  5. 11 5月, 2010 1 次提交
    • M
      kprobes/x86: Fix removed int3 checking order · 829e9245
      Masami Hiramatsu 提交于
      Fix kprobe/x86 to check removed int3 when failing to get kprobe
      from hlist. Since we have a time window between checking int3
      exists on probed address and getting kprobe on that address,
      we can have following scenario:
      
       -------
       CPU1                     CPU2
       hit int3
       check int3 exists
                                remove int3
                                remove kprobe from hlist
       get kprobe from hlist
       no kprobe->OOPS!
       -------
      
      This patch moves int3 checking if there is no kprobe on that
      address for fixing this problem as follows:
      
       ------
       CPU1                     CPU2
       hit int3
                                remove int3
                                remove kprobe from hlist
       get kprobe from hlist
       no kprobe->check int3 exists
                ->rollback&retry
       ------
      Signed-off-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Acked-by: NAnanth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: systemtap <systemtap@sources.redhat.com>
      Cc: DLE <dle-develop@lists.sourceforge.net>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20100427223348.2322.9112.stgit@localhost6.localdomain6>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      829e9245
  6. 06 5月, 2010 1 次提交
    • D
      x86: Fix fake apicid to node mapping for numa emulation · b0c4d952
      David Rientjes 提交于
      With NUMA emulation, it's possible for a single cpu to be bound
      to multiple nodes since more than one may have affinity if
      allocated on a physical node that is local to the cpu.
      
      APIC ids must therefore be mapped to the lowest node ids to
      maintain generic kernel use of functions such as cpu_to_node()
      that determine device affinity.  For example, if a device has
      proximity to physical node 1, for instance, and a cpu happens to
      be mapped to a higher emulated node id 8, the proximity may not
      be correctly determined by comparison in generic code even
      though the cpu may be truly local and allocated on physical node 1.
      
      When this happens, the true topology of the machine isn't
      accurately represented in the emulated environment; although
      this isn't critical to the system's uptime, any generic code
      that is NUMA aware benefits from the physical topology being
      accurately represented.
      
      This can affect any system that maps multiple APIC ids to a
      single node and is booted with numa=fake=N where N is greater
      than the number of physical nodes.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <alpine.DEB.2.00.1005060224140.19473@chino.kir.corp.google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b0c4d952
  7. 05 5月, 2010 1 次提交
  8. 03 5月, 2010 2 次提交
  9. 01 5月, 2010 1 次提交
  10. 30 4月, 2010 1 次提交
    • L
      x86: Fix 'reservetop=' functionality · e67a807f
      Liang Li 提交于
      When specifying the 'reservetop=0xbadc0de' kernel parameter,
      the kernel will stop booting due to a early_ioremap bug that
      relates to commit 8827247f.
      
      The root cause of boot failure problem is the value of
      'slot_virt[i]' was initialized in setup_arch->early_ioremap_init().
      But later in setup_arch, the function 'parse_early_param' will
      modify 'FIXADDR_TOP' when 'reservetop=0xbadc0de' being specified.
      
      The simplest fix might be use __fix_to_virt(idx0) to get updated
      value of 'FIXADDR_TOP' in '__early_ioremap' instead of reference
      old value from slot_virt[slot] directly.
      
      Changelog since v0:
      
      -v1: When reservetop being handled then FIXADDR_TOP get
           adjusted, Hence check prev_map then re-initialize slot_virt and
           PMD based on new FIXADDR_TOP.
      
      -v2: place fixup_early_ioremap hence call early_ioremap_init in
           reserve_top_address  to re-initialize slot_virt and
           corresponding PMD when parse_reservertop
      
      -v3: move fixup_early_ioremap out of reserve_top_address to make
           sure other clients of reserve_top_address like xen/lguest won't
           broken
      Signed-off-by: NLiang Li <liang.li@windriver.com>
      Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: NYinghai Lu <yinghai@kernel.org>
      Acked-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Wang Chen <wangchen@cn.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <1272621711-8683-1-git-send-email-liang.li@windriver.com>
      [ fixed three small cleanliness details in fixup_early_ioremap() ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e67a807f
  11. 29 4月, 2010 1 次提交
  12. 27 4月, 2010 1 次提交
  13. 25 4月, 2010 1 次提交
    • D
      VMware Balloon driver · 453dc659
      Dmitry Torokhov 提交于
      This is a standalone version of VMware Balloon driver.  Ballooning is a
      technique that allows hypervisor dynamically limit the amount of memory
      available to the guest (with guest cooperation).  In the overcommit
      scenario, when hypervisor set detects that it needs to shuffle some
      memory, it instructs the driver to allocate certain number of pages, and
      the underlying memory gets returned to the hypervisor.  Later hypervisor
      may return memory to the guest by reattaching memory to the pageframes and
      instructing the driver to "deflate" balloon.
      
      We are submitting a standalone driver because KVM maintainer (Avi Kivity)
      expressed opinion (rightly) that our transport does not fit well into
      virtqueue paradigm and thus it does not make much sense to integrate with
      virtio.
      
      There were also some concerns whether current ballooning technique is the
      right thing.  If there appears a better framework to achieve this we are
      prepared to evaluate and switch to using it, but in the meantime we'd like
      to get this driver upstream.
      
      We want to get the driver accepted in distributions so that users do not
      have to deal with an out-of-tree module and many distributions have
      "upstream first" requirement.
      
      The driver has been shipping for a number of years and users running on
      VMware platform will have it installed as part of VMware Tools even if it
      will not come from a distribution, thus there should not be additional
      risk in pulling the driver into mainline.  The driver will only activate
      if host is VMware so everyone else should not be affected at all.
      Signed-off-by: NDmitry Torokhov <dtor@vmware.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      453dc659
  14. 24 4月, 2010 2 次提交
    • H
      x86: Disable large pages on CPUs with Atom erratum AAE44 · 7a0fc404
      H. Peter Anvin 提交于
      Atom erratum AAE44/AAF40/AAG38/AAH41:
      
      "If software clears the PS (page size) bit in a present PDE (page
      directory entry), that will cause linear addresses mapped through this
      PDE to use 4-KByte pages instead of using a large page after old TLB
      entries are invalidated. Due to this erratum, if a code fetch uses
      this PDE before the TLB entry for the large page is invalidated then
      it may fetch from a different physical address than specified by
      either the old large page translation or the new 4-KByte page
      translation. This erratum may also cause speculative code fetches from
      incorrect addresses."
      
      [http://download.intel.com/design/processor/specupdt/319536.pdf]
      
      Where as commit 211b3d03 seems to
      workaround errata AAH41 (mixed 4K TLBs) it reduces the window of
      opportunity for the bug to occur and does not totally remove it.  This
      patch disables mixed 4K/4MB page tables totally avoiding the page
      splitting and not tripping this processor issue.
      
      This is based on an original patch by Colin King.
      Originally-by: NColin Ian King <colin.king@canonical.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      LKML-Reference: <1269271251-19775-1-git-send-email-colin.king@canonical.com>
      Cc: <stable@kernel.org>
      7a0fc404
    • H
      x86-64: Clear a 64-bit FS/GS base on fork if selector is nonzero · 7ce5a2b9
      H. Peter Anvin 提交于
      When we do a thread switch, we clear the outgoing FS/GS base if the
      corresponding selector is nonzero.  This is taken by __switch_to() as
      an entry invariant; it does not verify that it is true on entry.
      However, copy_thread() doesn't enforce this constraint, which can
      result in inconsistent results after fork().
      
      Make copy_thread() match the behavior of __switch_to().
      Reported-and-tested-by: NSamuel Thibault <samuel.thibault@inria.fr>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      LKML-Reference: <4BD1E061.8030605@zytor.com>
      Cc: <stable@kernel.org>
      7ce5a2b9
  15. 23 4月, 2010 1 次提交
  16. 21 4月, 2010 3 次提交
  17. 20 4月, 2010 7 次提交
  18. 14 4月, 2010 1 次提交
    • R
      lguest: stop using KVM hypercall mechanism · 091ebf07
      Rusty Russell 提交于
      This is a partial revert of 4cd8b5e2 "lguest: use KVM hypercalls";
      we revert to using (just as questionable but more reliable) int $15 for
      hypercalls.  I didn't revert the register mapping, so we still use the
      same calling convention as kvm.
      
      KVM in more recent incarnations stopped injecting a fault when a guest
      tried to use the VMCALL instruction from ring 1, so lguest under kvm
      fails to make hypercalls.  It was nice to share code with our KVM
      cousins, but this was overreach.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: Matias Zabaljauregui <zabaljauregui@gmail.com>
      Cc: Avi Kivity <avi@redhat.com>
      091ebf07
  19. 09 4月, 2010 2 次提交
    • F
      perf: Fix unsafe frame rewinding with hot regs fetching · ab285f2b
      Frederic Weisbecker 提交于
      When we fetch the hot regs and rewind to the nth caller, it
      might happen that we dereference a frame pointer outside the
      kernel stack boundaries, like in this example:
      
      	perf_trace_sched_switch+0xd5/0x120
              schedule+0x6b5/0x860
              retint_careful+0xd/0x21
      
      Since we directly dereference a userspace frame pointer here while
      rewinding behind retint_careful, this may end up in a crash.
      
      Fix this by simply using probe_kernel_address() when we rewind the
      frame pointer.
      
      This issue will have a much more proper fix in the next version of the
      perf_arch_fetch_caller_regs() API that will only need to rewind to the
      first caller.
      Reported-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Tested-by: NEric Dumazet <eric.dumazet@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Archs <linux-arch@vger.kernel.org>
      ab285f2b
    • B
      x86/PCI: ignore Consumer/Producer bit in ACPI window descriptions · 73a0e614
      Bjorn Helgaas 提交于
      ACPI Address Space Descriptors (used in _CRS) have a Consumer/Producer
      bit that is supposed to distinguish regions that are consumed directly
      by a device from those that are forwarded ("produced") by a bridge.
      But BIOSes have apparently not used this consistently, and Windows
      seems to ignore it, so I think Linux should ignore it as well.
      
      I can't point to any of these supposed broken BIOSes, but since we
      now rely on _CRS by default, I think it's safer to ignore this bit
      from the start.
      
      Here are details of my experiments with how Windows handles it:
          https://bugzilla.kernel.org/show_bug.cgi?id=15701Signed-off-by: NBjorn Helgaas <bjorn.helgaas@hp.com>
      Signed-off-by: NJesse Barnes <jbarnes@virtuousgeek.org>
      73a0e614
  20. 07 4月, 2010 5 次提交
  21. 06 4月, 2010 1 次提交
    • V
      perf, x86: Enable Nehalem-EX support · 134fbadf
      Vince Weaver 提交于
      According to Intel Software Devel Manual Volume 3B, the
      Nehalem-EX PMU is just like regular Nehalem (except for the
      uncore support, which is completely different).
      Signed-off-by: NVince Weaver <vweaver1@eecs.utk.edu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Lin Ming <ming.m.lin@intel.com>
      LKML-Reference: <alpine.DEB.2.00.1004060956580.1417@cl320.eecs.utk.edu>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      134fbadf