1. 05 May, 2010 1 commit
  2. 03 May, 2010 2 commits
  3. 01 May, 2010 1 commit
  4. 30 Apr, 2010 1 commit
    • L
      x86: Fix 'reservetop=' functionality · e67a807f
      Liang Li committed
      When the 'reservetop=0xbadc0de' kernel parameter is specified,
      the kernel stops booting due to an early_ioremap bug that
      relates to commit 8827247f.

      The root cause of the boot failure is that the value of
      'slot_virt[i]' is initialized in setup_arch->early_ioremap_init(),
      but later in setup_arch the function 'parse_early_param' modifies
      'FIXADDR_TOP' when 'reservetop=0xbadc0de' is specified.

      The simplest fix might be to use __fix_to_virt(idx0) to get the
      updated value of 'FIXADDR_TOP' in '__early_ioremap' instead of
      referencing the old value from slot_virt[slot] directly.

      Changelog since v0:

      -v1: When reservetop is handled, FIXADDR_TOP gets adjusted.
           Hence check prev_map, then re-initialize slot_virt and
           PMD based on the new FIXADDR_TOP.

      -v2: Place fixup_early_ioremap, and hence the call to
           early_ioremap_init, in reserve_top_address to re-initialize
           slot_virt and the corresponding PMD in parse_reservetop.

      -v3: Move fixup_early_ioremap out of reserve_top_address to make
           sure other clients of reserve_top_address like xen/lguest
           won't break.
      Signed-off-by: Liang Li <liang.li@windriver.com>
      Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: Yinghai Lu <yinghai@kernel.org>
      Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Wang Chen <wangchen@cn.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <1272621711-8683-1-git-send-email-liang.li@windriver.com>
      [ fixed three small cleanliness details in fixup_early_ioremap() ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e67a807f
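The stale-address problem described in this commit can be sketched in user space. The following is a minimal model, not the kernel code: all `model_*` names, the 32-bit address arithmetic, the slot layout constants, and the page-aligned reserve value are assumptions made for illustration. It shows that addresses cached from the fixmap top go stale once the top moves, and that re-running the initializer refreshes them.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12u
#define MODEL_PAGE_SIZE  (1u << MODEL_PAGE_SHIFT)
#define MODEL_NR_SLOTS   4u
#define MODEL_SLOT_PAGES 64u   /* hypothetical pages per early-ioremap slot */
#define MODEL_IDX0       8u    /* hypothetical fixmap index of slot 0 */

/* Stand-in for FIXADDR_TOP; it changes when 'reservetop=' is parsed. */
static uint32_t model_fixaddr_top = 0xfffff000u;

/* Stand-in for slot_virt[]: addresses cached at early init time. */
static uint32_t model_slot_virt[MODEL_NR_SLOTS];

/* Fixmap addresses grow downward from the top, one page per index. */
static uint32_t model_fix_to_virt(uint32_t idx)
{
    return model_fixaddr_top - (idx << MODEL_PAGE_SHIFT);
}

/* Caches one virtual address per slot, based on the current top. */
static void model_early_ioremap_init(void)
{
    for (uint32_t i = 0; i < MODEL_NR_SLOTS; i++)
        model_slot_virt[i] =
            model_fix_to_virt(MODEL_IDX0 + MODEL_SLOT_PAGES * i);
}

/* Moves the top down by 'reserve' bytes; the cache is NOT updated here. */
static void model_reserve_top_address(uint32_t reserve)
{
    assert((reserve & (MODEL_PAGE_SIZE - 1u)) == 0u); /* model assumption */
    model_fixaddr_top = (uint32_t)(0u - reserve - MODEL_PAGE_SIZE);
}
```

In this model, calling `model_early_ioremap_init()`, then `model_reserve_top_address()`, leaves `model_slot_virt[0]` different from `model_fix_to_virt(MODEL_IDX0)` until the initializer is run a second time, which is the role the commit message assigns to fixup_early_ioremap.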
  5. 29 Apr, 2010 1 commit
  6. 27 Apr, 2010 1 commit
  7. 25 Apr, 2010 1 commit
    • D
      VMware Balloon driver · 453dc659
      Dmitry Torokhov committed
      This is a standalone version of the VMware Balloon driver.  Ballooning is
      a technique that allows the hypervisor to dynamically limit the amount of
      memory available to the guest (with guest cooperation).  In the overcommit
      scenario, when the hypervisor detects that it needs to shuffle some
      memory, it instructs the driver to allocate a certain number of pages, and
      the underlying memory gets returned to the hypervisor.  Later the
      hypervisor may return memory to the guest by reattaching memory to the
      pageframes and instructing the driver to "deflate" the balloon.
      
      We are submitting a standalone driver because KVM maintainer (Avi Kivity)
      expressed opinion (rightly) that our transport does not fit well into
      virtqueue paradigm and thus it does not make much sense to integrate with
      virtio.
      
      There were also some concerns whether current ballooning technique is the
      right thing.  If there appears a better framework to achieve this we are
      prepared to evaluate and switch to using it, but in the meantime we'd like
      to get this driver upstream.
      
      We want to get the driver accepted in distributions so that users do not
      have to deal with an out-of-tree module and many distributions have
      "upstream first" requirement.
      
      The driver has been shipping for a number of years and users running on
      VMware platform will have it installed as part of VMware Tools even if it
      will not come from a distribution, thus there should not be additional
      risk in pulling the driver into mainline.  The driver will only activate
      if host is VMware so everyone else should not be affected at all.
      Signed-off-by: Dmitry Torokhov <dtor@vmware.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      453dc659
  8. 24 Apr, 2010 2 commits
    • H
      x86: Disable large pages on CPUs with Atom erratum AAE44 · 7a0fc404
      H. Peter Anvin committed
      Atom erratum AAE44/AAF40/AAG38/AAH41:
      
      "If software clears the PS (page size) bit in a present PDE (page
      directory entry), that will cause linear addresses mapped through this
      PDE to use 4-KByte pages instead of using a large page after old TLB
      entries are invalidated. Due to this erratum, if a code fetch uses
      this PDE before the TLB entry for the large page is invalidated then
      it may fetch from a different physical address than specified by
      either the old large page translation or the new 4-KByte page
      translation. This erratum may also cause speculative code fetches from
      incorrect addresses."
      
      [http://download.intel.com/design/processor/specupdt/319536.pdf]
      
      Whereas commit 211b3d03 seems to
      work around erratum AAH41 (mixed 4K TLBs), it only reduces the window of
      opportunity for the bug to occur and does not totally remove it.  This
      patch disables mixed 4K/4MB page tables entirely, avoiding the page
      splitting and not tripping this processor issue.
      
      This is based on an original patch by Colin King.
      Originally-by: Colin Ian King <colin.king@canonical.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      LKML-Reference: <1269271251-19775-1-git-send-email-colin.king@canonical.com>
      Cc: <stable@kernel.org>
      7a0fc404
    • H
      x86-64: Clear a 64-bit FS/GS base on fork if selector is nonzero · 7ce5a2b9
      H. Peter Anvin committed
      When we do a thread switch, we clear the outgoing FS/GS base if the
      corresponding selector is nonzero.  This is taken by __switch_to() as
      an entry invariant; it does not verify that it is true on entry.
      However, copy_thread() doesn't enforce this constraint, which can
      result in inconsistent results after fork().
      
      Make copy_thread() match the behavior of __switch_to().
      Reported-and-tested-by: Samuel Thibault <samuel.thibault@inria.fr>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      LKML-Reference: <4BD1E061.8030605@zytor.com>
      Cc: <stable@kernel.org>
      7ce5a2b9
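The invariant described in this commit can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel's copy_thread(): the `model_thread` structure and `model_copy_thread()` are hypothetical names, and the live selector values are passed in as plain arguments instead of being read from the segment registers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced view of the per-thread FS/GS state. */
struct model_thread {
    unsigned short fsindex, gsindex; /* segment selectors */
    unsigned long fs, gs;            /* 64-bit segment bases */
};

/*
 * Copies FS/GS state from parent to child the way the commit message
 * describes for a thread switch: a nonzero selector means the saved
 * 64-bit base must be cleared, otherwise the parent's base is kept.
 */
static void model_copy_thread(struct model_thread *child,
                              const struct model_thread *parent,
                              unsigned short live_fs_sel,
                              unsigned short live_gs_sel)
{
    assert(child != NULL && parent != NULL);

    child->fsindex = live_fs_sel;
    child->fs = live_fs_sel ? 0 : parent->fs;

    child->gsindex = live_gs_sel;
    child->gs = live_gs_sel ? 0 : parent->gs;
}
```

With a nonzero FS selector and a zero GS selector, the child in this model ends up with an FS base of 0 and the parent's GS base, so both sides of the invariant are visible in one call.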
  9. 23 Apr, 2010 1 commit
  10. 21 Apr, 2010 3 commits
  11. 20 Apr, 2010 7 commits
  12. 14 Apr, 2010 1 commit
    • R
      lguest: stop using KVM hypercall mechanism · 091ebf07
      Rusty Russell committed
      This is a partial revert of 4cd8b5e2 "lguest: use KVM hypercalls";
      we revert to using (just as questionable but more reliable) int $15 for
      hypercalls.  I didn't revert the register mapping, so we still use the
      same calling convention as kvm.
      
      KVM in more recent incarnations stopped injecting a fault when a guest
      tried to use the VMCALL instruction from ring 1, so lguest under kvm
      fails to make hypercalls.  It was nice to share code with our KVM
      cousins, but this was overreach.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Matias Zabaljauregui <zabaljauregui@gmail.com>
      Cc: Avi Kivity <avi@redhat.com>
      091ebf07
  13. 09 Apr, 2010 2 commits
    • F
      perf: Fix unsafe frame rewinding with hot regs fetching · ab285f2b
      Frederic Weisbecker committed
      When we fetch the hot regs and rewind to the nth caller, it
      might happen that we dereference a frame pointer outside the
      kernel stack boundaries, like in this example:
      
              perf_trace_sched_switch+0xd5/0x120
              schedule+0x6b5/0x860
              retint_careful+0xd/0x21
      
      Since we directly dereference a userspace frame pointer here while
      rewinding behind retint_careful, this may end up in a crash.
      
      Fix this by simply using probe_kernel_address() when we rewind the
      frame pointer.
      
      This issue will have a much more proper fix in the next version of the
      perf_arch_fetch_caller_regs() API that will only need to rewind to the
      first caller.
      Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Archs <linux-arch@vger.kernel.org>
      ab285f2b
    • B
      x86/PCI: ignore Consumer/Producer bit in ACPI window descriptions · 73a0e614
      Bjorn Helgaas committed
      ACPI Address Space Descriptors (used in _CRS) have a Consumer/Producer
      bit that is supposed to distinguish regions that are consumed directly
      by a device from those that are forwarded ("produced") by a bridge.
      But BIOSes have apparently not used this consistently, and Windows
      seems to ignore it, so I think Linux should ignore it as well.
      
      I can't point to any of these supposed broken BIOSes, but since we
      now rely on _CRS by default, I think it's safer to ignore this bit
      from the start.
      
      Here are details of my experiments with how Windows handles it:
          https://bugzilla.kernel.org/show_bug.cgi?id=15701
      Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      73a0e614
  14. 07 Apr, 2010 5 commits
  15. 06 Apr, 2010 1 commit
    • V
      perf, x86: Enable Nehalem-EX support · 134fbadf
      Vince Weaver committed
      According to the Intel Software Developer's Manual, Volume 3B, the
      Nehalem-EX PMU is just like regular Nehalem (except for the
      uncore support, which is completely different).
      Signed-off-by: Vince Weaver <vweaver1@eecs.utk.edu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Lin Ming <ming.m.lin@intel.com>
      LKML-Reference: <alpine.DEB.2.00.1004060956580.1417@cl320.eecs.utk.edu>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      134fbadf
  16. 03 Apr, 2010 5 commits
    • S
      x86: Fix double enable_IR_x2apic() call on SMP kernel on !SMP boards · 472a474c
      Suresh Siddha committed
      Jan Grossmann reported kernel boot panic while booting SMP
      kernel on his system with a single core cpu. SMP kernels call
      enable_IR_x2apic() from native_smp_prepare_cpus() and on
      platforms where the kernel doesn't find SMP configuration we
      ended up calling enable_IR_x2apic() again from the
      APIC_init_uniprocessor() call in smp_sanity_check(), thus
      leading to a kernel panic.
      
      Don't call enable_IR_x2apic() and default_setup_apic_routing()
      from APIC_init_uniprocessor() in CONFIG_SMP case.
      
      NOTE: this kind of non-idempotent and asymmetric initialization
      sequence is rather fragile and unclean, we'll clean that up
      in v2.6.35. This is the minimal fix for v2.6.34.
      
      Reported-by: Jan.Grossmann@kielnet.net
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: <jbarnes@virtuousgeek.org>
      Cc: <david.woodhouse@intel.com>
      Cc: <weidong.han@intel.com>
      Cc: <youquan.song@intel.com>
      Cc: <Jan.Grossmann@kielnet.net>
      Cc: <stable@kernel.org> # [v2.6.32.x, v2.6.33.x]
      LKML-Reference: <1270083887.7835.78.camel@sbs-t61.sc.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      472a474c
    • T
      perf, x86: Fix callgraphs of 32-bit processes on 64-bit kernels · 257ef9d2
      Torok Edwin committed
      When profiling a 32-bit process on a 64-bit kernel, callgraph tracing
      stopped after the first function because it saw a garbage memory
      address (it tried to interpret the frame pointer and return address as
      64-bit pointers).
      
      Fix this by using a struct stack_frame with 32-bit pointers when the
      TIF_IA32 flag is set.
      
      Note that TIF_IA32 flag must be used, and not is_compat_task(), because
      the latter is only set when the 32-bit process is executing a syscall,
      which may not always be the case (when tracing page fault events for
      example).
      Signed-off-by: Török Edwin <edwintorok@gmail.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: x86@kernel.org
      Cc: linux-kernel@vger.kernel.org
      LKML-Reference: <1268820436-13145-1-git-send-email-edwintorok@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      257ef9d2
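The frame-layout issue in this commit can be shown with a user-space sketch. It is an illustration only: the `model_frame32` layout, the flat byte buffer standing in for user memory, and `model_walk32()` are assumptions, not the perf code. The point is that a 32-bit process stores two 4-byte fields per frame, so reading them with a 16-byte, two-pointer layout would merge both fields into one bogus "next frame" value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Frame record as laid out by a 32-bit process (hypothetical name). */
struct model_frame32 {
    uint32_t next_frame;      /* saved frame pointer of the caller */
    uint32_t return_address;  /* where this function returns to */
};

/*
 * Walks a chain of 32-bit frames stored in 'mem', a flat buffer that
 * stands in for user memory; 'fp' is an offset into that buffer.
 * Stores up to 'max' return addresses in 'out' and returns the count.
 */
static int model_walk32(const uint8_t *mem, size_t len, uint32_t fp,
                        uint32_t *out, int max)
{
    int n = 0;

    assert(mem != NULL && out != NULL);

    while (n < max && fp != 0 &&
           (size_t)fp + sizeof(struct model_frame32) <= len) {
        struct model_frame32 f;

        memcpy(&f, mem + fp, sizeof f);   /* bounded copy, no raw deref */
        out[n++] = f.return_address;

        if (f.next_frame <= fp)           /* chain must move up the stack */
            break;
        fp = f.next_frame;
    }
    return n;
}
```

The walker stops when the next frame pointer is zero, out of bounds, or not above the current one, which keeps a corrupted chain from looping.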
    • P
      perf, x86: Fix AMD hotplug & constraint initialization · b38b24ea
      Peter Zijlstra committed
      Commit 3f6da390 ("perf: Rework and fix the arch CPU-hotplug hooks") moved
      the AMD northbridge allocation from CPUS_ONLINE to CPUS_PREPARE_UP.
      However, amd_nb_id() doesn't work yet on prepare, so it would simply bail,
      basically reverting to a state where we do not properly track node-wide
      constraints and causing weird perf results.
      
      Fix up the AMD NorthBridge initialization code by allocating from
      CPU_UP_PREPARE and installing it from CPU_STARTING once we have the
      proper nb_id. It also properly deals with the allocation failing.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      [ robustify using amd_has_nb() ]
      Signed-off-by: Stephane Eranian <eranian@google.com>
      LKML-Reference: <1269353485.5109.48.camel@twins>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      b38b24ea
    • P
      x86: Move notify_cpu_starting() callback to a later stage · 85257024
      Peter Zijlstra committed
      Because we need to have cpu identification things done by the time we run
      CPU_STARTING notifiers.
      
      ( This init ordering will be relied on by the next fix. )
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1269353485.5109.48.camel@twins>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      85257024
    • D
      x86: Increase CONFIG_NODES_SHIFT max to 10 · 51591e31
      David Rientjes committed
      Some larger systems require more than 512 nodes, so increase the
      maximum CONFIG_NODES_SHIFT to 10 for a new max of 1024 nodes.
      
      This was tested with numa=fake=64M on systems with more than
      64GB of RAM. A total of 1022 nodes were initialized.
      
      Successfully builds with no additional warnings on x86_64
      allyesconfig.
      
      ( No effect on any existing config. Newly enabled CONFIG_MAXSMP=y
        will see the new default. )
      Signed-off-by: David Rientjes <rientjes@google.com>
      LKML-Reference: <alpine.DEB.2.00.1003251538060.8589@chino.kir.corp.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      51591e31
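The node-count arithmetic behind this change is a power of two of the shift value. A minimal sketch, with `model_max_numnodes()` as a hypothetical helper name:

```c
#include <assert.h>

/* Maximum node count for a given shift: 2 raised to the shift. */
static unsigned long model_max_numnodes(unsigned int nodes_shift)
{
    assert(nodes_shift < 32u);  /* keep the shift well defined */
    return 1ul << nodes_shift;
}
```

A shift of 9 gives 512 and a shift of 10 gives 1024, so the 1022 nodes reported in the test run fit only under the new maximum.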
  17. 02 Apr, 2010 4 commits
    • Y
      ibft, x86: Change reserve_ibft_region() to find_ibft_region() · 042be38e
      Yinghai Lu committed
      This allows arch code to decide how to reserve the ibft.
      
      And we should reserve ibft as early as possible, instead of BOOTMEM
      stage, in case the table is in RAM range and is not reserved by BIOS
      (this will often be the case.)
      
      Move to just after find_smp_config().
      
      Also, when CONFIG_NO_BOOTMEM=y, we will not have reserve_bootmem() anymore.

      -v2: fix typo about ibft pointed out by Konrad Rzeszutek Wilk <konrad@darnok.org>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4BB510FB.80601@kernel.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Peter Jones <pjones@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
      CC: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      042be38e
    • A
      x86, hpet: Fix bug in RTC emulation · b4a5e8a1
      Alok Kataria committed
      We think there exists a bug in the HPET code that emulates the RTC.
      
      In the normal case, when the RTC frequency is set, the rtc driver tells
      the hpet code about it here:
      
      int hpet_set_periodic_freq(unsigned long freq)
      {
              uint64_t clc;
      
              if (!is_hpet_enabled())
                      return 0;
      
              if (freq <= DEFAULT_RTC_INT_FREQ)
                      hpet_pie_limit = DEFAULT_RTC_INT_FREQ / freq;
              else {
                      clc = (uint64_t) hpet_clockevent.mult * NSEC_PER_SEC;
                      do_div(clc, freq);
                      clc >>= hpet_clockevent.shift;
                      hpet_pie_delta = (unsigned long) clc;
              }
              return 1;
      }
      
      If freq is set to 64Hz (DEFAULT_RTC_INT_FREQ) or lower, then
      hpet_pie_limit (a static) is set to non-zero.  Then, on every one-shot
      HPET interrupt, hpet_rtc_timer_reinit is called to compute the next
      timeout.  Well, that function has this logic:
      
              if (!(hpet_rtc_flags & RTC_PIE) || hpet_pie_limit)
                      delta = hpet_default_delta;
              else
                      delta = hpet_pie_delta;
      
      Since hpet_pie_limit is not 0, hpet_default_delta is used.  That
      corresponds to 64Hz.
      
      Now, if you set a different rtc frequency, you'll take the else path
      through hpet_set_periodic_freq, but unfortunately no one resets
      hpet_pie_limit back to 0.
      
      Boom....now you are stuck with 64Hz RTC interrupts forever.
      
      The patch below just resets the hpet_pie_limit value when requested freq
      is greater than DEFAULT_RTC_INT_FREQ, which we think fixes this problem.
      Signed-off-by: Alok N Kataria <akataria@vmware.com>
      LKML-Reference: <201003112200.o2BM0Hre012875@imap1.linux-foundation.org>
      Signed-off-by: Daniel Hecht <dhecht@vmware.com>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      b4a5e8a1
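The sticky-limit behavior described in this commit, and the reset that the message proposes, can be modeled in user space. This is a sketch under assumptions: the `model_*` functions and `MODEL_TICKS_PER_SEC` are hypothetical, the tick arithmetic replaces the clockevent mult/shift math, and the state variables only borrow the names used in the commit message.

```c
#include <assert.h>

#define DEFAULT_RTC_INT_FREQ 64ul
#define MODEL_TICKS_PER_SEC  64000ul  /* hypothetical timer resolution */

/* Stand-ins for the static state named in the commit message. */
static unsigned long hpet_pie_limit;
static unsigned long hpet_pie_delta;
static const unsigned long hpet_default_delta =
    MODEL_TICKS_PER_SEC / DEFAULT_RTC_INT_FREQ;  /* 64 Hz period */

/* Records the requested periodic frequency, with the reset applied. */
static int model_set_periodic_freq(unsigned long freq)
{
    assert(freq != 0ul);

    if (freq <= DEFAULT_RTC_INT_FREQ) {
        hpet_pie_limit = DEFAULT_RTC_INT_FREQ / freq;
    } else {
        hpet_pie_delta = MODEL_TICKS_PER_SEC / freq;
        hpet_pie_limit = 0;  /* the reset the commit message describes */
    }
    return 1;
}

/* Same selection logic as the reinit snippet quoted in the message. */
static unsigned long model_next_delta(int pie_enabled)
{
    if (!pie_enabled || hpet_pie_limit)
        return hpet_default_delta;
    return hpet_pie_delta;
}
```

Without the `hpet_pie_limit = 0` line, a request for 32 Hz followed by a request for 128 Hz would leave the limit nonzero, and `model_next_delta()` would keep returning the 64 Hz period.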
    • P
      x86, hpet: Erratum workaround for read after write of HPET comparator · 8da854cb
      Pallipadi, Venkatesh committed
      On Wed, Feb 24, 2010 at 03:37:04PM -0800, Justin Piszcz wrote:
      > Hello,
      >
      > Again, on the Intel DP55KG board:
      >
      > # uname -a
      > Linux host 2.6.33 #1 SMP Wed Feb 24 18:31:00 EST 2010 x86_64 GNU/Linux
      >
      > [    1.237600] ------------[ cut here ]------------
      > [    1.237890] WARNING: at arch/x86/kernel/hpet.c:404 hpet_next_event+0x70/0x80()
      > [    1.238221] Hardware name:
      > [    1.238504] hpet: compare register read back failed.
      > [    1.238793] Modules linked in:
      > [    1.239315] Pid: 0, comm: swapper Not tainted 2.6.33 #1
      > [    1.239605] Call Trace:
      > [    1.239886]  <IRQ>  [<ffffffff81056c13>] ? warn_slowpath_common+0x73/0xb0
      > [    1.240409]  [<ffffffff81079608>] ? tick_dev_program_event+0x38/0xc0
      > [    1.240699]  [<ffffffff81056cb0>] ? warn_slowpath_fmt+0x40/0x50
      > [    1.240992]  [<ffffffff81079608>] ? tick_dev_program_event+0x38/0xc0
      > [    1.241281]  [<ffffffff81041ad0>] ? hpet_next_event+0x70/0x80
      > [    1.241573]  [<ffffffff81079608>] ? tick_dev_program_event+0x38/0xc0
      > [    1.241859]  [<ffffffff81078e32>] ? tick_handle_oneshot_broadcast+0xe2/0x100
      > [    1.246533]  [<ffffffff8102a67a>] ? timer_interrupt+0x1a/0x30
      > [    1.246826]  [<ffffffff81085499>] ? handle_IRQ_event+0x39/0xd0
      > [    1.247118]  [<ffffffff81087368>] ? handle_edge_irq+0xb8/0x160
      > [    1.247407]  [<ffffffff81029f55>] ? handle_irq+0x15/0x20
      > [    1.247689]  [<ffffffff810294a2>] ? do_IRQ+0x62/0xe0
      > [    1.247976]  [<ffffffff8146be53>] ? ret_from_intr+0x0/0xa
      > [    1.248262]  <EOI>  [<ffffffff8102f277>] ? mwait_idle+0x57/0x80
      > [    1.248796]  [<ffffffff8102645c>] ? cpu_idle+0x5c/0xb0
      > [    1.249080] ---[ end trace db7f668fb6fef4e1 ]---
      >
      > Is this something Intel has to fix or is it a bug in the kernel?
      
      This is a chipset erratum.
      
      Thomas: You mentioned we can retain this check only for known-buggy and
      hpet debug kind of options. But here is the simple workaround patch for
      this particular erratum.
      
      Some chipsets have an erratum due to which a read immediately following
      a write of the HPET comparator returns the old comparator value instead
      of the most recently written value.
      
      Erratum 15 in
      "Intel I/O Controller Hub 9 (ICH9) Family Specification Update"
      (http://www.intel.com/assets/pdf/specupdate/316973.pdf)
      
      The workaround for the erratum is to read the comparator twice if the
      first read fails.
      Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      LKML-Reference: <20100225185348.GA9674@linux-os.sc.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@gmail.com>
      Cc: <stable@kernel.org>
      8da854cb
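The read-twice workaround can be sketched against a simulated register. Everything here is an assumption for illustration: `model_hpet` and the `model_*` functions are hypothetical, and the erratum is modeled as exactly one stale read after each write, which is a simplification of the hardware behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated comparator whose first read after a write is stale. */
struct model_hpet {
    uint32_t cmp;         /* value most recently written */
    uint32_t stale;       /* value held before that write */
    int      stale_reads; /* reads that still return 'stale' */
};

static void model_write_cmp(struct model_hpet *h, uint32_t v)
{
    h->stale = h->cmp;
    h->cmp = v;
    h->stale_reads = 1;   /* model assumption: one stale read per write */
}

static uint32_t model_read_cmp(struct model_hpet *h)
{
    if (h->stale_reads > 0) {
        h->stale_reads--;
        return h->stale;
    }
    return h->cmp;
}

/*
 * Writes the comparator and verifies it by reading back; if the first
 * read does not match, reads once more before reporting a failure.
 * Returns 0 on a matching readback, -1 otherwise.
 */
static int model_program_cmp(struct model_hpet *h, uint32_t v)
{
    assert(h != NULL);

    model_write_cmp(h, v);
    if (model_read_cmp(h) != v) {
        if (model_read_cmp(h) != v)
            return -1;
    }
    return 0;
}
```

With a single readback the model would report a failure on every write to a new value; the second read lets the check pass while still catching a comparator that never takes the new value.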
    • A
      x86: Handle overlapping mptables · 909fc87b
      Andi Kleen committed
      We found a system where the MP table MPC and MPF structures overlap.
      
      That doesn't really matter because the mptable is not used anyways with ACPI,
      but it leads to a panic in the early allocator due to the overlapping
      reservations in 2.6.33.
      
      Earlier kernels handled this without problems.
      
      Simply change these reservations to reserve_early_overlap_ok to avoid
      the panic.
      Reported-by: Thomas Renninger <trenn@suse.de>
      Tested-by: Thomas Renninger <trenn@suse.de>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      LKML-Reference: <20100329074111.GA22821@basil.fritz.box>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Cc: <stable@kernel.org>
      909fc87b
  18. 01 Apr, 2010 1 commit