1. 24 Mar, 2007 2 commits
    • R
      [PATCH] i386: clear segment register padding in core dumps · 6ea65ff7
      Roland McGrath committed
      The segment register slots in struct pt_regs are padded to 32 bits.
      Some of these are stored with instructions like "pushl %es", which
      leaves the high 16 bits as they were.  So the high bits of these
      fields in struct pt_regs contain kernel stack garbage.  These bits are
      ignored by everything and never leak to user space, except in core
      dumps.  The user struct pt_regs is always at the base of the thread's
      kernel stack and so it seems unlikely the information that leaks from
      here is ever worthwhile so as to be a security concern, but I'm not
      sure about that.  It has been this way for ages; userland consumers of
      core dumps all mask off these high bits themselves.  So it is not urgent.
      
      This change masks off the padding bits of the segment register slots
      in core dumps.  ptrace already masks off these high bits, so this
      makes the values in core dumps consistent with what ptrace would
      report just before the process died.
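
      The masking described above can be sketched in user-space C.  This is
      only an illustration, not the kernel change itself: the helper name
      mask_segment_slot is hypothetical, and the slot is modeled as a plain
      32-bit integer.

      ```c
      #include <stdint.h>

      /* Hypothetical helper: a segment selector is 16 bits wide, so keep the
       * low 16 bits of the padded 32-bit slot and clear the rest. */
      static inline uint32_t mask_segment_slot(uint32_t slot)
      {
              return slot & 0xffffu;
      }
      ```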
      
      As I read the processor manuals, the cs and ss values will always be
      padded with zero bits rather than stack garbage.  But unlike "pushl %es",
      this is not simple to test with a userland program.  So I added the two
      instructions rather than wonder if they are really never necessary.
      
      I think that x86_64 does not have this problem (for either 32-bit or
      64-bit processes).  It only uses "mov" instructions from segment
      registers, which zero-extend.
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • T
      [PATCH] i386: add command line option "local_apic_timer_c2_ok" · e585bef8
      Thomas Gleixner committed
      It turned out that it is almost impossible to trust ACPI, BIOS & Co.
      regarding the C states. This was the reason to switch the local apic
      timer off in C2 state already. OTOH there are sane and well behaving
      systems, which get punished by that decision.
      
      Allow the user to confirm that the local apic timer is trustworthy in C2
      state. This keeps the default behaviour on the safe side.
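
      A minimal sketch of the decision this option influences.  The names
      lapic_timer_c2_ok and lapic_timer_trusted are hypothetical, and the
      sketch assumes the option only moves the first untrusted state from C2
      to C3; the real kernel code may be structured differently.

      ```c
      #include <stdbool.h>

      enum cstate { C1 = 1, C2 = 2, C3 = 3 };

      /* Hypothetical flag, set when "local_apic_timer_c2_ok" is given on the
       * command line.  It defaults to false, the safe side. */
      static bool lapic_timer_c2_ok;

      /* Returns true if the local APIC timer is treated as reliable in the
       * given C state (assumption: C3 is never trusted). */
      static bool lapic_timer_trusted(enum cstate state)
      {
              enum cstate first_untrusted = lapic_timer_c2_ok ? C3 : C2;

              return state < first_untrusted;
      }
      ```
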
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 17 Mar, 2007 1 commit
  3. 15 Mar, 2007 2 commits
  4. 13 Mar, 2007 1 commit
  5. 07 Mar, 2007 2 commits
  6. 06 Mar, 2007 1 commit
    • I
      [PATCH] disable NMI watchdog by default · 6ebf622b
      Ingo Molnar committed
      there's a new NMI watchdog related problem: KVM crashes on certain
      bzImages because ... we enable the NMI watchdog by default (even if the
      user does not ask for it), and no other OS on this planet does that so
      KVM doesn't have emulation for that yet. So KVM injects a #GP, which
      crashes the Linux guest:
      
       general protection fault: 0000 [#1]
       PREEMPT SMP
       Modules linked in:
       CPU:    0
       EIP:    0060:[<c011a8ae>]    Not tainted VLI
       EFLAGS: 00000246   (2.6.20-rc5-rt0 #3)
       EIP is at setup_apic_nmi_watchdog+0x26d/0x3d3
      
      and no, i did /not/ request an nmi_watchdog on the boot command line!
      
      Solution: turn off that darn thing! It's a debug tool, not a 'make life
      harder' tool!!
      
      with this patch the KVM guest boots up just fine.
      
      And with this my laptop (Lenovo T60) also stopped its sporadic hard
      hanging (sometimes in acpi_init(), sometimes later during bootup,
      sometimes much later during actual use) as well. It hung with both
      nmi_watchdog=1 and nmi_watchdog=2, so it's generally the fact of NMI
      injection that is causing problems, not the NMI watchdog variant, nor
      any particular bootup code.
      
      [ NMI breaks on some systems, esp in combination with SMM -Arjan ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 05 Mar, 2007 8 commits
    • Z
      [PATCH] vmi: apic ops · 772205f6
      Zachary Amsden committed
      Use para_fill instead of directly setting the APIC ops to the result of the
      vmi_get_function call - this allows one to implement a VMI ROM without
      implementing APIC functions, just using the native APIC functions.
      
      While doing this, I realized that there is a lot more cleanup that should have
      been done.  Basically, we should never assume that the ROM implements a
      specific set of functions, and always allow fallback to the native
      implementation.
      
      This is critical for future compatibility.
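
      The fallback idea can be sketched as follows.  This is not the actual
      para_fill macro: rom_get_function, fill_op and the ops table below are
      hypothetical stand-ins, and the ROM is assumed to return NULL for a call
      it does not implement.

      ```c
      #include <stddef.h>

      typedef void (*op_fn)(void);

      static void native_apic_read(void)  { }
      static void native_apic_write(void) { }
      static void rom_apic_write(void)    { }

      /* Ops table, pre-populated with the native implementations. */
      static struct { op_fn apic_read; op_fn apic_write; } ops = {
              native_apic_read, native_apic_write
      };

      /* Hypothetical ROM lookup: this ROM only implements call 1 (write). */
      static op_fn rom_get_function(int call_id)
      {
              return call_id == 1 ? rom_apic_write : NULL;
      }

      /* Replace a slot only if the ROM provides the call; otherwise the
       * native function already stored in the slot stays in place. */
      static void fill_op(op_fn *slot, int call_id)
      {
              op_fn fn = rom_get_function(call_id);

              if (fn)
                      *slot = fn;
      }

      static void init_ops(void)
      {
              fill_op(&ops.apic_read, 0);
              fill_op(&ops.apic_write, 1);
      }
      ```
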
      Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Z
      [PATCH] vmi: pit override · e30fab3a
      Zachary Amsden committed
      The time_init_hook in paravirt-ops no longer functions in the correct manner
      after the integration of the hrtimers code.  The problem is that now the call
      path for time initialization is:
      
        time_init :
             late_time_init = hpet_time_init;
      
        late_time_init -> hpet_time_init:
             setup_pit_timer (BAD)
             do_time_init --> (via paravirt.h)
                time_init_hook --> (via arch_hooks.h)
                    time_init_hook (in SUBARCH/setup.c)
      
      If this isn't confusing enough, the paravirt case goes through an indirect
      function pointer in the paravirt-ops table.  The problem is, by the time the
      paravirt hook is called, the pit timer is already enabled.
      
      But paravirt guests have their own timer, and don't want to use the PIT.
      Rather than intensify the struggle for power going on here, just make it all
      nice and simple and just unconditionally do all timer setup in the
      late_time_init hook.  This also has the advantage of enabling timers in the
      same place in all code paths, so everyone has the same bugs and we don't have
      outliers who break other code because they turn on timer too early or too
      late.
      
      So the paravirt-ops time init function is now by default hpet_time_init, which
      is the time init function used for native hardware.  Paravirt guests have the
      chance to override this when they set up the paravirt-ops table, and should
      need no change.
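
      A rough user-space model of the resulting flow, assuming a one-entry ops
      table.  All names ending in _sketch are hypothetical; the point is only
      that the PIT is never touched once a guest has overridden the default.

      ```c
      typedef void (*time_init_fn)(void);

      static int pit_enabled;
      static int guest_timer_enabled;

      /* Native default: programs the PIT (modeled as a flag here). */
      static void hpet_time_init_sketch(void)  { pit_enabled = 1; }

      /* Paravirt guest replacement: sets up its own timer instead. */
      static void guest_time_init_sketch(void) { guest_timer_enabled = 1; }

      /* The ops table defaults to the native time init function. */
      static struct { time_init_fn time_init; } pv_ops_sketch = {
              hpet_time_init_sketch
      };

      /* All timer setup happens here, in one place, for every code path. */
      static void late_time_init_sketch(void)
      {
              pv_ops_sketch.time_init();
      }
      ```
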
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Z
      [PATCH] vmi: paravirt drop udelay op · eda08b1b
      Zachary Amsden committed
      Not respecting udelay causes problems with any virtual hardware that is passed
      through to real hardware.  This can be noticed by any device that interacts
      with the real world in real time - like AP startup, which takes real time.  Or
      keyboard LEDs, which should blink in real-time.  Or floppy drives, but only
      when passed through to a real floppy controller on OSes which can't
      sufficiently buffer the floppy commands to emulate a zero latency floppy.  Or
      IDE drives, when connecting to a physical CDROM.
      
      This was mostly a hack to get the kernel to boot faster, but it introduced a
      number of misvirtualization bugs, and Alan and Pavel argued pretty strongly
      against it.  We were the only client, and now want to clean up this cruft.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Z
      [PATCH] vmi: fix highpte · 9a1c13e9
      Zachary Amsden committed
      Provide a PT map hook for HIGHPTE kernels to designate where they are mapping
      page tables.  This information is required so the physical address of PTE
      updates can be determined; otherwise, the mm layer would have to carry the
      physical address all the way to each PTE modification callsite, which is even
      more hideous than the macros required to provide the proper hooks.
      
      So let's not mess up arch neutral code to achieve this, but keep the horror in
      an #ifdef HIGHPTE in include/asm-i386/pgtable.h.  I had to use macros here
      because some types are not yet defined in all the include paths for this
      header.
      
      This patch is absolutely required for HIGHPTE kernels to operate properly with
      VMI.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Z
      [PATCH] vmi: cpu cycles fix · 1182d852
      Zachary Amsden committed
      In order to share the common code in tsc.c which does CPU Khz calibration, we
      need to make an accurate value of CPU speed available to the tsc.c code.  This
      value loses a lot of precision in a VM because of the timing differences with
      real hardware, but we need it to be as precise as possible so the guest can
      make accurate time calculations with the cycle counters.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Z
      [PATCH] vmi: sched clock paravirt op fix · 6cb9a835
      Zachary Amsden committed
      The custom_sched_clock hook is broken.  The result from sched_clock needs to
      be in nanoseconds, not in CPU cycles.  The TSC is insufficient for this
      purpose, because TSC is poorly defined in a virtual environment, and mostly
      represents real world time instead of scheduled process time (which can be
      interrupted without notice when a virtual machine is descheduled).
      
      To make the scheduler consistent, we must expose a different nature of time,
      that is scheduled time.  So deprecate this custom_sched_clock hack and turn it
      into a paravirt-op, as it should have been all along.  This allows the tsc.c
      code which converts cycles to nanoseconds to be shared by all paravirt-ops
      backends.
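
      The cycles-to-nanoseconds conversion mentioned above is commonly done
      with a fixed-point scale factor.  The sketch below shows that technique
      under stated assumptions: the names and the 10-bit shift are
      illustrative and are not claimed to match tsc.c exactly.

      ```c
      #include <stdint.h>

      #define SCALE_SHIFT 10   /* fixed-point fraction bits (assumed) */

      static uint64_t cyc2ns_scale_sketch;

      /* Precompute nanoseconds-per-cycle as a fixed-point number.
       * cpu_khz is the calibrated CPU frequency in kHz and must be nonzero. */
      static void set_cyc2ns_scale_sketch(uint64_t cpu_khz)
      {
              cyc2ns_scale_sketch = (1000000ULL << SCALE_SHIFT) / cpu_khz;
      }

      /* Convert a cycle count to nanoseconds.  The multiplication can
       * overflow for very large cycle counts; this sketch ignores that. */
      static uint64_t cycles_to_ns_sketch(uint64_t cycles)
      {
              return (cycles * cyc2ns_scale_sketch) >> SCALE_SHIFT;
      }
      ```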
      
      It is unfortunate to add a new paravirt-op, but this is a very distinct
      abstraction which is clearly different for all virtual machine
      implementations, and it gets rid of an ugly indirect function which I
      ashamedly admit I hacked in to try to get this to work earlier, and then even
      got in the wrong units.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • C
      [PATCH] sched: remove SMT nice · 69f7c0a1
      Con Kolivas committed
      Remove the SMT-nice feature which idles sibling cpus on SMT cpus to
      facilitate nice working properly where cpu power is shared.  The idling of
      cpus in the presence of runnable tasks is considered too fragile, easy to
      break with outside code, and the complexity of managing this system if an
      architecture comes along with many logical cores sharing cpu power will be
      unworkable.
      
      Remove the associated per_cpu_gain variable in sched_domains used only by
      this code.
      
      Also:
      
        The reason is that with dynticks enabled, this code breaks without yet
        further tweaks so dynticks brought on the rapid demise of this code.  So
        either we tweak this code or kill it off entirely.  It was Ingo's preference
        to kill it off.  Either way this needs to happen for 2.6.21 since dynticks
        has gone in.
      Signed-off-by: Con Kolivas <kernel@kolivas.org>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • J
      [PATCH] io_apic.h needs apicdef.h · 58a53b24
      Jean Delvare committed
      A -mm patch caused:
      
      In file included from drivers/pci/quirks.c:532:
      include/asm/io_apic.h:61: error: "MAX_IO_APICS" undeclared here (not in a function)
      
      So let's include the needed header.
      Signed-off-by: Jean Delvare <khali@linux-fr.org>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 27 Feb, 2007 1 commit
    • L
      Revert "[PATCH] i386: add idle notifier" · ea3d5226
      Linus Torvalds committed
      This reverts commit 2ff2d3d7.
      
      Uwe Bugla reports that he cannot mount a floppy drive any more, and Jiri
      Slaby bisected it down to this commit.
      
      Benjamin LaHaise also points out that this is a big hot-path, and that
      interrupt delivery while idle is very common and should not go through
      all these expensive gyrations.
      
      Fix up conflicts in arch/i386/kernel/apic.c and arch/i386/kernel/irq.c
      due to other unrelated irq changes.
      
      Cc: Stephane Eranian <eranian@hpl.hp.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Andrew Morton <akpm@osdl.org>
      Cc: Uwe Bugla <uwe.bugla@gmx.de>
      Cc: Jiri Slaby <jirislaby@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 21 Feb, 2007 2 commits
  10. 17 Feb, 2007 5 commits
    • T
      [PATCH] i386 rework local apic timer calibration · d36b49b9
      Thomas Gleixner committed
      The local apic timer calibration has two problem cases:
      
      1.  The calibration is based on readout of the PIT/HPET timer to detect the
         wrap of the periodic tick.  It happens that a box gets stuck in the
         calibration loop due to a PIT with a broken readout function.
      
      2.  CoreDuo boxen show a sporadic PIT runs too slow defect, which results
         in a wrong lapic calibration.  The PIT goes back to normal operation once
         the lapic timer is switched to periodic mode.
      
      Both are existing and unfixed problems in the current upstream kernel and
      prevent certain laptops and other systems from booting Linux.
      
      Rework the code to address both problems:
      
      - Make the calibration interrupt driven.  This removes the wait_timer_tick
        magic hackery from lapic.c and time_hpet.c.  The clockevents framework
        allows easy substitution of the global tick event handler for the
        calibration.  This is more accurate than monitoring jiffies.  At this point
        of the boot process, nothing disturbs the interrupt delivery, so the
        results are very accurate.
      
      - Verify the calibration against the PM timer, when available by using the
        early access function.  When the measured calibration period is outside of
        a one percent window, then the lapic timer calibration is adjusted to the
        pm timer result.
      
      - Verify the calibration by running the lapic timer with the calibration
        handler.  Disable lapic timer in case of deviation.
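
      The PM timer cross-check in the second item can be sketched like this.
      The 3.579545 MHz PM timer frequency is fixed by ACPI; the 100 ms
      measurement window, the function name and the handling of a missing PM
      timer are assumptions made for this example, and counter wraparound is
      ignored.

      ```c
      #include <stdint.h>

      #define PM_TICKS_PER_SEC 3579545ULL   /* ACPI PM timer frequency, Hz */

      /* lapic_delta: lapic timer ticks counted during the calibration window.
       * pm_delta:    PM timer ticks counted over the same window (0 if the
       *              PM timer is not available).
       * Returns the lapic tick count to use for a true 100 ms window. */
      static uint64_t adjust_lapic_delta(uint64_t lapic_delta, uint64_t pm_delta)
      {
              const uint64_t expected = PM_TICKS_PER_SEC / 10; /* 100 ms */
              const uint64_t thresh = expected / 100;          /* one percent */

              if (pm_delta == 0)
                      return lapic_delta;   /* nothing to verify against */

              if (pm_delta > expected - thresh && pm_delta < expected + thresh)
                      return lapic_delta;   /* inside the one percent window */

              /* Window was too long or too short: rescale to the PM result. */
              return lapic_delta * expected / pm_delta;
      }
      ```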
      
      This also removes the "synchronization" of the local apic timer to the global
      tick.  This synchronization never worked, as there is no way to synchronize
      PIT(HPET) and local APIC timer.  The synchronization by waiting for the tick
      just aligns the local APIC timer for the first events, but later the events
      drift away due to the different clocks.  Removing the "sync" is just
      randomizing the asynchronous behaviour at setup time.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Rohit Seth <rohitseth@google.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: john stultz <johnstul@us.ibm.com>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • T
      [PATCH] clockevents: i386 drivers · e9e2cdb4
      Thomas Gleixner committed
      Add clockevent drivers for i386: lapic (local) and PIT/HPET (global).  Update
      the timer IRQ to call into the PIT/HPET driver's event handler and the
      lapic-timer IRQ to call into the lapic clockevent driver.  The assignment of
      timer functionality is delegated to the core framework code and replaces the
      compile-time and runtime evaluation in do_timer_interrupt_hook().
      
      Use the clockevents broadcast support and implement the lapic_broadcast
      function for ACPI.
      
      No changes to existing functionality.
      
      [ kdump fix from Vivek Goyal <vgoyal@in.ibm.com> ]
      [ fixes based on review feedback from Arjan van de Ven <arjan@infradead.org> ]
      Cleanups-from: Adrian Bunk <bunk@stusta.de>
      Build-fixes-from: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: john stultz <johnstul@us.ibm.com>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • T
      [PATCH] i386, apic: clean up the APIC code · e05d723f
      Thomas Gleixner committed
      The apic code is quite unstructured and missing a lot of comments.
      
      - Restructure the code into helper functions, timer, setup/shutdown,
        interrupt and power management blocks.
      - Fixup comments.
      - Namespace fixups
      - Inline helpers for version and is_integrated
      - Combine the ack_bad_irq functions
      
      No functional changes.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Rohit Seth <rohitseth@google.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: john stultz <johnstul@us.ibm.com>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • M
      [PATCH] Mark TSC on GeodeLX reliable · 07190a08
      Marcelo Tosatti committed
      The Geode can safely use the TSC for highres, since:
      
      1) Does not support frequency scaling,
      
      2) The TSC _does_ count when the CPU is halted.  Furthermore, the Geode
         supports a mode called "suspension on halt", where Suspend mode (which
         interacts with the power management states) is entered.  TSC counting
         during suspend mode is controlled by bit 8 of the Bus Controller
         Configuration Register #0 (thanks Tom!).
      
      3) no SMP :)
      
      Check if "RTSC counts during suspension" and remove the requirement for
      verification, so the clocksource code can safely select it as a timesource
      for the highres timers subsystem.
      Signed-off-by: Marcelo Tosatti <marcelo@kvack.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: john stultz <johnstul@us.ibm.com>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • I
      [PATCH] x86: rewrite SMP TSC sync code · 95492e46
      Ingo Molnar committed
      make the TSC synchronization code more robust, and unify it between x86_64 and
      i386.
      
      The biggest change is the removal of the 'fix up TSCs' code on x86_64 and
      i386; in some rare cases it was /causing/ time-warps on SMP systems.
      
      The new code only checks for TSC asynchronicity - and if it can prove a
      time-warp (if it can observe the TSC going backwards when going from one CPU
      to another within a critical section), then the TSC clock-source is turned
      off.
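
      The warp test can be modeled in a few lines.  This is a single-threaded
      sketch with hypothetical names: in a real check the samples would come
      from different CPUs and the shared state would be protected by a lock.

      ```c
      #include <stdbool.h>
      #include <stdint.h>

      struct warp_check {
              uint64_t last_tsc;   /* most recent sample seen by any CPU */
              uint64_t max_warp;   /* largest backwards jump observed */
      };

      /* Record one TSC sample taken inside the critical section. */
      static void warp_check_sample(struct warp_check *wc, uint64_t now)
      {
              if (now < wc->last_tsc) {
                      uint64_t warp = wc->last_tsc - now;

                      if (warp > wc->max_warp)
                              wc->max_warp = warp;
              }
              wc->last_tsc = now;
      }

      /* The TSC stays usable as a clock source only if no warp was proven. */
      static bool tsc_usable(const struct warp_check *wc)
      {
              return wc->max_warp == 0;
      }
      ```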
      
      The TSC synchronization-checking code also got moved into a separate file.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: john stultz <johnstul@us.ibm.com>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 13 Feb, 2007 15 commits
    • R
      [PATCH] i386: Remove extern declaration from mm/discontig.c, put in header. · 40d22c1b
      Rusty Russell committed
      Extern declarations belong in headers.  Times, they are a'changin.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • R
      [PATCH] i386: Rename cpu_gdt_descr and remove extern declaration from smpboot.c · 2a57ff1a
      Rusty Russell committed
      When I implemented the DECLARE_PER_CPU(var) macros, I was careful that
      people couldn't use "var" in a non-percpu context, by prepending
      percpu__.  I never considered that this would allow them to overload
      the same name for a per-cpu and a non-percpu variable.
      
      It is only one of many horrors in the i386 boot code, but let's rename
      the non-percpu cpu_gdt_descr to early_gdt_descr (not boot_gdt_descr,
      that's something else...)
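
      How a prefixing macro permits this kind of name overloading can be
      shown with a small sketch.  DEFINE_PER_CPU_SKETCH and the
      sketch_percpu__ prefix are hypothetical and only mimic the idea; they
      are not the kernel's macros.

      ```c
      /* The declared name is pasted onto a prefix, so the bare name is
       * never defined by the macro and remains free for other uses. */
      #define DEFINE_PER_CPU_SKETCH(type, name) type sketch_percpu__##name

      /* Expands to: int sketch_percpu__gdt_descr = 1; */
      DEFINE_PER_CPU_SKETCH(int, gdt_descr) = 1;

      /* An unrelated variable with the same source-level name compiles
       * without any conflict, which is the overloading described above. */
      int gdt_descr = 2;
      ```
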
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • R
      [PATCH] i386: Move mce_disabled to asm/mce.h · 105fddb8
      Rusty Russell committed
      Allows external actors to disable mce.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • A
      [PATCH] i386: Remove fastcall in paravirt.[ch] · 1a1eecd1
      Andi Kleen committed
      Not needed because fastcall is always default now
      Signed-off-by: Andi Kleen <ak@suse.de>
    • I
      [PATCH] i386: improve sched_clock() on i686 · f9690982
      Ingo Molnar committed
      Clean up sched_clock() on i686: it will use the TSC if available and falls
      back to jiffies only if the user asked for it to be disabled via notsc or
      the CPU calibration code didn't figure out the right cpu_khz.
      
      This generally makes the scheduler timestamps more finegrained, on all
      hardware.  (the current scheduler is pretty resistant against asynchronous
      sched_clock() values on different CPUs, it will allow at most up to a jiffy
      of jitter.)
      
      Also simplify sched_clock()'s check for TSC availability: propagate the
      desire and ability to use the TSC into the tsc_disable flag, previously
      this flag only indicated whether the notsc option was passed.  This makes
      the rare low-res sched_clock() codepath a single branch off a read-mostly
      flag.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • S
      [PATCH] i386: add idle notifier · 2ff2d3d7
      Stephane Eranian committed
      Add a notifier mechanism to the low level idle loop.  You can register a
      callback function which gets invoked on entry and exit from the low level idle
      loop.  The low level idle loop is defined as the polling loop, low-power call,
      or the mwait instruction.  Interrupts processed by the idle thread are not
      considered part of the low level loop.
      
      The notifier can be used to measure precisely how much time is spent in useless
      execution (or low power mode).  The perfmon subsystem uses it to turn on/off
      monitoring.
      Signed-off-by: stephane eranian <eranian@hpl.hp.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • Z
      [PATCH] i386: Profile pc badness · 7b355202
      Zachary Amsden committed
      Profile_pc was broken when using paravirtualization because the
      assumption the kernel was running at CPL 0 was violated, causing
      bad logic to read a random value off the stack.
      
      The only way to be in kernel lock functions is to be in kernel
      code, so validate that assumption explicitly by checking the CS
      value.  We don't want to be fooled by BIOS / APM segments and
      try to read those stacks, so only match KERNEL_CS.
      
      I moved some stuff in segment.h to make it prettier.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • Z
      [PATCH] i386: vMI timer patches · bbab4f3b
      Zachary Amsden committed
      VMI timer code.  It works by taking over the local APIC clock when APIC is
      configured, which requires a couple hooks into the APIC code.  The backend
      timer code could be commonized into the timer infrastructure, but there are
      some pieces missing (stolen time, in particular), and the exact semantics of
      when to do accounting for NO_IDLE need to be shared between different
      hypervisors as well.  So for now, VMI timer is a separate module.
      
      [Adrian Bunk: cleanups]
      
      Subject: VMI timer patches
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
    • Z
      [PATCH] i386: vMI backend for paravirt-ops · 7ce0bcfd
      Zachary Amsden committed
      Fairly straightforward implementation of VMI backend for paravirt-ops.
      
      [Adrian Bunk: some cleanups]
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
    • Z
      [PATCH] i386: SMP boot hook for paravirt · ae5da273
      Zachary Amsden committed
      Add VMI SMP boot hook.  We emulate a regular boot sequence and use the same
      APIC IPI initiation, we just poke magic values to load into the CPU state when
      the startup IPI is received, rather than having to jump through a real mode
      trampoline.
      
      This is all that was needed to get SMP to work.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
    • Z
      [PATCH] i386: paravirt CPU hypercall batching mode · 9226d125
      Zachary Amsden committed
      The VMI ROM has a mode where hypercalls can be queued and batched.  This turns
      out to be a significant win during context switch, but must be done at a
      specific point before side effects to CPU state are visible to subsequent
      instructions.  This is similar to the MMU batching hooks already provided.
      The same hooks could be used by the Xen backend to implement a context switch
      multicall.
      
      To explain a bit more about lazy modes in the paravirt patches, basically, the
      idea is that only one of lazy CPU or MMU mode can be active at any given time.
      Lazy MMU mode is similar to this lazy CPU mode, and allows for batching of
      multiple PTE updates (say, inside a remap loop), but to avoid keeping some
      kind of state machine about when to flush cpu or mmu updates, we just allow
      one or the other to be active.  Although there is no real reason a more
      comprehensive scheme could not be implemented, there is also no demonstrated
      need for this extra complexity.
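
      The "one lazy mode at a time" rule can be modeled as below.  This is a
      simplified sketch, not the paravirt-ops interface: the queue, the
      counters and all function names are hypothetical, and a real backend
      would hand the queued calls to the hypervisor rather than count them.

      ```c
      #include <stddef.h>

      enum lazy_mode { LAZY_NONE, LAZY_MMU, LAZY_CPU };

      #define QUEUE_MAX 16

      static enum lazy_mode current_mode = LAZY_NONE;
      static int queue[QUEUE_MAX];
      static size_t queued;
      static int submissions;   /* trips to the hypervisor */
      static int issued;        /* individual calls delivered */

      /* Deliver everything queued so far in a single submission. */
      static void flush_queue(void)
      {
              if (queued == 0)
                      return;
              issued += (int)queued;
              queued = 0;
              submissions++;
      }

      /* Changing mode always flushes first, so CPU and MMU batching are
       * never active together and no state machine is needed. */
      static void set_lazy_mode(enum lazy_mode mode)
      {
              flush_queue();
              current_mode = mode;
      }

      static void hypercall(int op)
      {
              if (current_mode == LAZY_NONE) {
                      issued++;          /* delivered immediately */
                      submissions++;
                      return;
              }
              if (queued == QUEUE_MAX)
                      flush_queue();
              queue[queued++] = op;
      }
      ```
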
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
    • Z
      [PATCH] MM: page allocation hooks for VMI backend · c119ecce
      Zachary Amsden committed
      The VMI backend uses explicit page type notification to track shadow page
      tables.  The allocation of page table roots is especially tricky.  We need to
      clone the root for non-PAE mode while it is protected under the pgd lock to
      correctly copy the shadow.
      
      We don't need to allocate pgds in PAE mode (PDPs in Intel terminology), as
      they only have 4 entries, and are cached entirely by the processor, which
      makes shadowing them rather simple.
      
      For base page table level allocation, pmd_populate provides the exact hook
      point we need.  Also, we need to allocate pages when splitting a large page,
      and we must release pages before returning the page to any free pool.
      
      Despite being required with these slightly odd semantics for VMI, Xen also
      uses these hooks to determine the exact moment when page tables are created or
      released.
      
      AK: All nops for other architectures
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
    • J
      [PATCH] i386: Convert i386 PDA code to use %fs · 464d1a78
      Jeremy Fitzhardinge committed
      Convert the PDA code to use %fs rather than %gs as the segment for
      per-processor data.  This is because some processors show a small but
      measurable performance gain for reloading a NULL segment selector (as %fs
      generally is in user-space) versus a non-NULL one (as %gs generally is).
      
      On modern processors the difference is very small, perhaps undetectable.
      Some old AMD "K6 3D+" processors are noticeably slower when %fs is used
      rather than %gs; I have no idea why this might be, but I think they're
      sufficiently rare that it doesn't matter much.
      
      This patch also fixes the math emulator, which had not been adjusted to
      match the changed struct pt_regs.
      
      [frederik.deweerdt@gmail.com: fixit with gdb]
      [mingo@elte.hu: Fix KVM too]
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Ian Campbell <Ian.Campbell@XenSource.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Zachary Amsden <zach@vmware.com>
      Cc: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
    • R
      ACPI: cleanup: make disable_acpi() valid w/o CONFIG_ACPI · a795ca58
      Rusty Russell committed
      Len Brown <lenb@kernel.org> said:
      > Okay, but better to use disable_acpi()
      > indeed, since this would be the first code not already inside CONFIG_ACPI
      > to invoke disable_acpi(), we could define the inline as empty and you could
      > then scratch the #ifdef too.
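
      The empty-inline pattern suggested in the quote looks roughly like this.
      CONFIG_ACPI_SKETCH and every _sketch name are hypothetical stand-ins
      used so the example compiles on its own; the real header differs.

      ```c
      /* CONFIG_ACPI_SKETCH stands in for the real config option and is left
       * undefined here, so the empty stub below is the one that is used. */
      #ifdef CONFIG_ACPI_SKETCH
      static int acpi_disabled_sketch;
      static inline void disable_acpi_sketch(void) { acpi_disabled_sketch = 1; }
      #else
      /* Empty stub: callers compile unchanged and need no #ifdef. */
      static inline void disable_acpi_sketch(void) { }
      #endif

      /* A caller outside any ACPI-specific #ifdef block. */
      static int platform_quirk_sketch(void)
      {
              disable_acpi_sketch();   /* valid with or without the option */
              return 0;
      }
      ```
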
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Len Brown <len.brown@intel.com>
    • A
      [PATCH] i386: 2048-byte command line · 7bf9f974
      Alon Bar-Lev committed
      The current implementation allows the kernel to receive up to 255 characters
      from the bootloader, while the boot protocol allows larger buffers to be sent.
      
      In the current environment, the command line is used to specify many
      values, including suspend/resume, module arguments, splash, initramfs and
      more.
      
      255 characters are not enough anymore.
      
      Now that the edd issue has been fixed and the dynamic kernel command-line
      patch has been accepted, we can extend COMMAND_LINE_SIZE without runtime
      memory requirements.
      Signed-off-by: Alon Bar-Lev <alon.barlev@gmail.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>