1. 12 8月, 2007 3 次提交
    • A
      i386: Use global flag to disable broken local apic timer on AMD CPUs. · d3f7eae1
      Andi Kleen 提交于
      The Averatec 2370 and some other Turion laptop BIOS seems to program the
      ENABLE_C1E MSR inconsistently between cores. This confuses the lapic
      use heuristics because when C1E is enabled anywhere it seems to affect
      the complete chip.
      
      Use a global flag instead of a per cpu flag to handle this.
      If any CPU has C1E enabled disabled lapic use.
      
      Thanks to Cal Peake for debugging.
      
      Cc: tglx@linutronix.de
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d3f7eae1
    • A
      i386: Make patching more robust, fix paravirt issue · ab144f5e
      Andi Kleen 提交于
      Commit 19d36ccd "x86: Fix alternatives
      and kprobes to remap write-protected kernel text" uses code which is
      being patched for patching.
      
      In particular, paravirt_ops does patching in two stages: first it
      calls paravirt_ops.patch, then it fills any remaining instructions
      with nop_out().  nop_out calls text_poke() which calls
      lookup_address() which calls pgd_val() (aka paravirt_ops.pgd_val):
      that call site is one of the places we patch.
      
      If we always do patching as one single call to text_poke(), we only
      need make sure we're not patching the memcpy in text_poke itself.
      This means the prototype to paravirt_ops.patch needs to change, to
      marshal the new code into a buffer rather than patching in place as it
      does now.  It also means all patching goes through text_poke(), which
      is known to be safe (apply_alternatives is also changed to make a
      single patch).
      
      AK: fix compilation on x86-64 (bad rusty!)
      AK: fix boot on x86-64 (sigh)
      AK: merged with other patches
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ab144f5e
    • M
      finish i386 and x86-64 sysdata conversion · 73c59afc
      Muli Ben-Yehuda 提交于
      This patch finishes the i386 and x86-64 ->sysdata conversion and hopefully
      also fixes Riku's and Andy's observed bugs.  It is based on Yinghai Lu's
      and Andy Whitcroft's patches (thanks!) with some changes:
      
      - introduce pci_scan_bus_with_sysdata() and use it instead of
        pci_scan_bus() where appropriate. pci_scan_bus_with_sysdata() will
        allocate the sysdata structure and then call pci_scan_bus().
      - always allocate pci_sysdata dynamically. The whole point of this
        sysdata work is to make it easy to do root-bus specific things
        (e.g., support PCI domains and IOMMU's). I dislike using a default
        struct pci_sysdata in some places and a dynamically allocated
        pci_sysdata elsewhere - the potential for someone indavertantly
        changing the default structure is too high.
      - this patch only makes the minimal changes necessary, i.e., the NUMA node is
        always initialized to -1. Patches to do the right thing with regards
        to the NUMA node can build on top of this (either add a 'node'
        parameter to pci_scan_bus_with_sysdata() or just update the node
        when it becomes known).
      
      The patch was compile tested with various configurations (e.g., NUMAQ,
      VISWS) and run-time tested on i386 and x86-64.  Unfortunately none of my
      machines exhibited the bugs so caveat emptor.
      
      Andy, could you please see if this fixes the NUMA issues you've seen?
      Riku, does this fix "pci=noacpi" on your laptop?
      Signed-off-by: NMuli Ben-Yehuda <muli@il.ibm.com>
      Cc: Yinghai Lu <yhlu.kernel@gmail.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Chuck Ebbert <cebbert@redhat.com>
      Cc: <riku.seppala@kymp.net>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Jeff Garzik <jeff@garzik.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      73c59afc
  2. 01 8月, 2007 2 次提交
    • A
      revert "x86, serial: convert legacy COM ports to platform devices" · 57d4810e
      Andrew Morton 提交于
      Revert 7e92b4fc.  It broke Sébastien Dugué's
      machine and Jeff said (persuasively)
      
        This seems like it will break decades-long-working stuff, in favor of
        breaking new ground in our favorite area, "trusting the BIOS."
      
        It's just not worth it for serial ports, IMO.  Serial ports are something
        that just shouldn't break at this late stage in the game.  My new Intel
        platform boxes don't even have serial ports, so I question the value of
        messing with serial port probing even more...  because...  just wait a year,
        and your box won't have a serial port either!  :)
      
        I certainly don't object to the use of platform devices (or isa_driver),
        but the probe change seems questionable.  That's sorta analagous to
        rewriting the floppy driver probe routine.  Sure you could do it...  but why
        risk all that damage and go through debugging all over again?
      
        It seems clear from this report that we cannot, should not, trust BIOS for
        something (a) so simple and (b) that has been working for over a decade.
      
      Much discussion ensued and we've decided to have another go at all of this.
      
      Cc: Sébastien Dugué <sebastien.dugue@bull.net>
      Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Adam Belay <ambx1@neo.rr.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Jeff Garzik <jeff@garzik.org>
      Acked-by: NAlan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
      Cc: Sascha Sommer <saschasommer@freenet.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      57d4810e
    • S
      remove unused TIF_NOTIFY_RESUME flag · a583f1b5
      Stephane Eranian 提交于
      Remove unused TIF_NOTIFY_RESUME flag for all processor architectures.  The
      flag was not used excecpt on IA-64 where the patch replaces it with
      TIF_PERFMON_WORK.
      Signed-off-by: Nstephane eranian <eranian@hpl.hp.com>
      Cc: <linux-arch@vger.kernel.org>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a583f1b5
  3. 30 7月, 2007 1 次提交
  4. 26 7月, 2007 2 次提交
  5. 25 7月, 2007 1 次提交
  6. 23 7月, 2007 4 次提交
    • A
      i386: Use patchable lock prefix in set_64bit · e05aff85
      Andi Kleen 提交于
      Previously lock was unconditionally used, but shouldn't be needed on
      UP systems.
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e05aff85
    • J
      x86: Replace NSC/Cyrix specific chipset access macros by inlined functions. · f25f64ed
      Juergen Beisert 提交于
      Due to index register access ordering problems, when using macros a line
      like this fails (and does nothing):
      
      	setCx86(CX86_CCR2, getCx86(CX86_CCR2) | 0x88);
      
      With inlined functions this line will work as expected.
      
      Note about a side effect: Seems on Geode GX1 based systems the
      "suspend on halt power saving feature" was never enabled due to this
      wrong macro expansion. With inlined functions it will be enabled, but
      this will stop the TSC when the CPU runs into a HLT instruction.
      Kernel output something like this:
      	Clocksource tsc unstable (delta = -472746897 ns)
      
      This is the 3rd version of this patch.
      
       - Adding missed arch/i386/kernel/cpu/mtrr/state.c
      	Thanks to Andres Salomon
       - Adding some big fat comments into the new header file
       	Suggested by Andi Kleen
      
      AK: fixed x86-64 compilation
      Signed-off-by: NJuergen Beisert <juergen@kreuzholzen.de>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f25f64ed
    • A
      x86: Stop MCEs and NMIs during code patching · 8f4e956b
      Andi Kleen 提交于
      When a machine check or NMI occurs while multiple byte code is patched
      the CPU could theoretically see an inconsistent instruction and crash.
      Prevent this by temporarily disabling MCEs and returning early in the
      NMI handler.
      
      Based on discussion with Mathieu Desnoyers.
      
      Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8f4e956b
    • A
      x86: Fix alternatives and kprobes to remap write-protected kernel text · 19d36ccd
      Andi Kleen 提交于
      Reenable kprobes and alternative patching when the kernel text is write
      protected by DEBUG_RODATA
      
      Add a general utility function to change write protected text.  The new
      function remaps the code using vmap to write it and takes care of CPU
      synchronization.  It also does CLFLUSH to make icache recovery faster.
      
      There are some limitations on when the function can be used, see the
      comment.
      
      This is a newer version that also changes the paravirt_ops code.
      text_poke also supports multi byte patching now.
      
      Contains bug fixes from Zach Amsden and suggestions from Mathieu
      Desnoyers.
      
      Cc: Jan Beulich <jbeulich@novell.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
      Cc: Zach Amsden <zach@vmware.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      19d36ccd
  7. 22 7月, 2007 15 次提交
  8. 20 7月, 2007 6 次提交
  9. 18 7月, 2007 6 次提交
    • J
      xen: add the Xenbus sysfs and virtual device hotplug driver · 4bac07c9
      Jeremy Fitzhardinge 提交于
      This communicates with the machine control software via a registry
      residing in a controlling virtual machine. This allows dynamic
      creation, destruction and modification of virtual device
      configurations (network devices, block devices and CPUS, to name some
      examples).
      
      [ Greg, would you mind giving this a review?  Thanks -J ]
      Signed-off-by: NIan Pratt <ian.pratt@xensource.com>
      Signed-off-by: NChristian Limpach <Christian.Limpach@cl.cam.ac.uk>
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NChris Wright <chrisw@sous-sol.org>
      Cc: Greg KH <greg@kroah.com>
      4bac07c9
    • J
      xen: Core Xen implementation · 5ead97c8
      Jeremy Fitzhardinge 提交于
      This patch is a rollup of all the core pieces of the Xen
      implementation, including:
       - booting and setup
       - pagetable setup
       - privileged instructions
       - segmentation
       - interrupt flags
       - upcalls
       - multicall batching
      
      BOOTING AND SETUP
      
      The vmlinux image is decorated with ELF notes which tell the Xen
      domain builder what the kernel's requirements are; the domain builder
      then constructs the address space accordingly and starts the kernel.
      
      Xen has its own entrypoint for the kernel (contained in an ELF note).
      The ELF notes are set up by xen-head.S, which is included into head.S.
      In principle it could be linked separately, but it seems to provoke
      lots of binutils bugs.
      
      Because the domain builder starts the kernel in a fairly sane state
      (32-bit protected mode, paging enabled, flat segments set up), there's
      not a lot of setup needed before starting the kernel proper.  The main
      steps are:
        1. Install the Xen paravirt_ops, which is simply a matter of a
           structure assignment.
        2. Set init_mm to use the Xen-supplied pagetables (analogous to the
           head.S generated pagetables in a native boot).
        3. Reserve address space for Xen, since it takes a chunk at the top
           of the address space for its own use.
        4. Call start_kernel()
      
      PAGETABLE SETUP
      
      Once we hit the main kernel boot sequence, it will end up calling back
      via paravirt_ops to set up various pieces of Xen specific state.  One
      of the critical things which requires a bit of extra care is the
      construction of the initial init_mm pagetable.  Because Xen places
      tight constraints on pagetables (an active pagetable must always be
      valid, and must always be mapped read-only to the guest domain), we
      need to be careful when constructing the new pagetable to keep these
      constraints in mind.  It turns out that the easiest way to do this is
      use the initial Xen-provided pagetable as a template, and then just
      insert new mappings for memory where a mapping doesn't already exist.
      
      This means that during pagetable setup, it uses a special version of
      xen_set_pte which ignores any attempt to remap a read-only page as
      read-write (since Xen will map its own initial pagetable as RO), but
      lets other changes to the ptes happen, so that things like NX are set
      properly.
      
      PRIVILEGED INSTRUCTIONS AND SEGMENTATION
      
      When the kernel runs under Xen, it runs in ring 1 rather than ring 0.
      This means that it is more privileged than user-mode in ring 3, but it
      still can't run privileged instructions directly.  Non-performance
      critical instructions are dealt with by taking a privilege exception
      and trapping into the hypervisor and emulating the instruction, but
      more performance-critical instructions have their own specific
      paravirt_ops.  In many cases we can avoid having to do any hypercalls
      for these instructions, or the Xen implementation is quite different
      from the normal native version.
      
      The privileged instructions fall into the broad classes of:
        Segmentation: setting up the GDT and the GDT entries, LDT,
           TLS and so on.  Xen doesn't allow the GDT to be directly
           modified; all GDT updates are done via hypercalls where the new
           entries can be validated.  This is important because Xen uses
           segment limits to prevent the guest kernel from damaging the
           hypervisor itself.
        Traps and exceptions: Xen uses a special format for trap entrypoints,
           so when the kernel wants to set an IDT entry, it needs to be
           converted to the form Xen expects.  Xen sets int 0x80 up specially
           so that the trap goes straight from userspace into the guest kernel
           without going via the hypervisor.  sysenter isn't supported.
        Kernel stack: The esp0 entry is extracted from the tss and provided to
           Xen.
        TLB operations: the various TLB calls are mapped into corresponding
           Xen hypercalls.
        Control registers: all the control registers are privileged.  The most
           important is cr3, which points to the base of the current pagetable,
           and we handle it specially.
      
      Another instruction we treat specially is CPUID, even though its not
      privileged.  We want to control what CPU features are visible to the
      rest of the kernel, and so CPUID ends up going into a paravirt_op.
      Xen implements this mainly to disable the ACPI and APIC subsystems.
      
      INTERRUPT FLAGS
      
      Xen maintains its own separate flag for masking events, which is
      contained within the per-cpu vcpu_info structure.  Because the guest
      kernel runs in ring 1 and not 0, the IF flag in EFLAGS is completely
      ignored (and must be, because even if a guest domain disables
      interrupts for itself, it can't disable them overall).
      
      (A note on terminology: "events" and interrupts are effectively
      synonymous.  However, rather than using an "enable flag", Xen uses a
      "mask flag", which blocks event delivery when it is non-zero.)
      
      There are paravirt_ops for each of cli/sti/save_fl/restore_fl, which
      are implemented to manage the Xen event mask state.  The only thing
      worth noting is that when events are unmasked, we need to explicitly
      see if there's a pending event and call into the hypervisor to make
      sure it gets delivered.
      
      UPCALLS
      
      Xen needs a couple of upcall (or callback) functions to be implemented
      by each guest.  One is the event upcalls, which is how events
      (interrupts, effectively) are delivered to the guests.  The other is
      the failsafe callback, which is used to report errors in either
      reloading a segment register, or caused by iret.  These are
      implemented in i386/kernel/entry.S so they can jump into the normal
      iret_exc path when necessary.
      
      MULTICALL BATCHING
      
      Xen provides a multicall mechanism, which allows multiple hypercalls
      to be issued at once in order to mitigate the cost of trapping into
      the hypervisor.  This is particularly useful for context switches,
      since the 4-5 hypercalls they would normally need (reload cr3, update
      TLS, maybe update LDT) can be reduced to one.  This patch implements a
      generic batching mechanism for hypercalls, which gets used in many
      places in the Xen code.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NChris Wright <chrisw@sous-sol.org>
      Cc: Ian Pratt <ian.pratt@xensource.com>
      Cc: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
      Cc: Adrian Bunk <bunk@stusta.de>
      5ead97c8
    • J
      xen: Add Xen interface header files · a42089dd
      Jeremy Fitzhardinge 提交于
      Add Xen interface header files. These are taken fairly directly from
      the Xen tree, but somewhat rearranged to suit the kernel's conventions.
      
      Define macros and inline functions for doing hypercalls into the
      hypervisor.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NIan Pratt <ian.pratt@xensource.com>
      Signed-off-by: NChristian Limpach <Christian.Limpach@cl.cam.ac.uk>
      Signed-off-by: NChris Wright <chrisw@sous-sol.org>
      a42089dd
    • J
      Add a sched_clock paravirt_op · 688340ea
      Jeremy Fitzhardinge 提交于
      The tsc-based get_scheduled_cycles interface is not a good match for
      Xen's runstate accounting, which reports everything in nanoseconds.
      
      This patch replaces this interface with a sched_clock interface, which
      matches both Xen and VMI's requirements.
      
      In order to do this, we:
         1. replace get_scheduled_cycles with sched_clock
         2. hoist cycles_2_ns into a common header
         3. update vmi accordingly
      
      One thing to note: because sched_clock is implemented as a weak
      function in kernel/sched.c, we must define a real function in order to
      override this weak binding.  This means the usual paravirt_ops
      technique of using an inline function won't work in this case.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Dan Hecht <dhecht@vmware.com>
      Cc: john stultz <johnstul@us.ibm.com>
      688340ea
    • J
      paravirt: helper to disable all IO space · d572929c
      Jeremy Fitzhardinge 提交于
      In a virtual environment, device drivers such as legacy IDE will waste
      quite a lot of time probing for their devices which will never appear.
      This helper function allows a paravirt implementation to lay claim to
      the whole iomem and ioport space, thereby disabling all device drivers
      trying to claim IO resources.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NChris Wright <chrisw@sous-sol.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      d572929c
    • J
      paravirt: make siblingmap functions visible · c70df743
      Jeremy Fitzhardinge 提交于
      Paravirt implementations need to set the sibling map on new cpus.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      c70df743