1. 03 5月, 2007 40 次提交
    • B
      [PATCH] i386: Enable support for fixed-range IORRs to keep RdMem & WrMem in sync · de938c51
      Bernhard Kaindl 提交于
      If our copy of the MTRRs of the BSP has RdMem or WrMem set, and
      we are running on an AMD64/K8 system, the boot CPU must have had
      MtrrFixDramEn and MtrrFixDramModEn set (otherwise our RDMSR would
      have copied these bits cleared), so we set them on this CPU as well.
      
      This allows us to keep the AMD64/K8 RdMem and WrMem bits in sync
      across the CPUs of SMP systems in order to fullfill the duty of
      system software to "initialize and maintain MTRR consistency
      across all processors." as written in the AMD and Intel manuals.
      
      If an WRMSR instruction fails because MtrrFixDramModEn is not
      set, I expect that also the Intel-style MTRR bits are not updated.
      
      AK: minor cleanup, moved MSR defines around
      Signed-off-by: NBernhard Kaindl <bk@suse.de>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      de938c51
    • B
      [PATCH] x86: Save and restore the fixed-range MTRRs of the BSP when suspending · 3ebad590
      Bernhard Kaindl 提交于
      Note: This patch didn'nt need an update since it's initial post.
      
      Some BIOSes may modify fixed-range MTRRs in SMM, e.g. when they
      transition the system into ACPI mode, which is entered thru an SMI,
      triggered by Linux in acpi_enable().
      
      SMIs which cause that Linux is interrupted and BIOS code is
      executed (which may change e.g. fixed-range MTRRs) in SMM may
      be raised by an embedded system controller which is often found
      in notebooks also at other occasions.
      
      If we would not update our copy of the fixed-range MTRRs before
      suspending to RAM or to disk, restore_processor_state() would
      set the fixed-range MTRRs of the BSP using old backup values
      which may be outdated and this could cause the system to fail
      later during resume.
      
      This patch ensures that our copy of the fixed-range MTRRs
      is updated when saving the boot processor state on suspend
      to disk and suspend to RAM.
      
      In combination with other patches this allows to fix s2ram
      and s2disk on the Acer Ferrari 1000 notebook and at least
      s2disk on the Acer Ferrari 5000 notebook.
      Signed-off-by: NBernhard Kaindl <bk@suse.de>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      3ebad590
    • B
      [PATCH] x86: Save the MTRRs of the BSP before booting an AP · 2b1f6278
      Bernhard Kaindl 提交于
      Applied fix by Andew Morton:
      http://lkml.org/lkml/2007/4/8/88 - Fix `make headers_check'.
      
      AMD and Intel x86 CPU manuals state that it is the responsibility of
      system software to initialize and maintain MTRR consistency across
      all processors in Multi-Processing Environments.
      
      Quote from page 188 of the AMD64 System Programming manual (Volume 2):
      
      7.6.5 MTRRs in Multi-Processing Environments
      
      "In multi-processing environments, the MTRRs located in all processors must
      characterize memory in the same way. Generally, this means that identical
      values are written to the MTRRs used by the processors." (short omission here)
      "Failure to do so may result in coherency violations or loss of atomicity.
      Processor implementations do not check the MTRR settings in other processors
      to ensure consistency. It is the responsibility of system software to
      initialize and maintain MTRR consistency across all processors."
      
      Current Linux MTRR code already implements the above in the case that the
      BIOS does not properly initialize MTRRs on the secondary processors,
      but the case where the fixed-range MTRRs of the boot processor are changed
      after Linux started to boot, before the initialsation of a secondary
      processor, is not handled yet.
      
      In this case, secondary processors are currently initialized by Linux
      with MTRRs which the boot processor had very early, when mtrr_bp_init()
      did run, but not with the MTRRs which the boot processor uses at the
      time when that secondary processors is actually booted,
      causing differing MTRR contents on the secondary processors.
      
      Such situation happens on Acer Ferrari 1000 and 5000 notebooks where the
      BIOS enables and sets AMD-specific IORR bits in the fixed-range MTRRs
      of the boot processor when it transitions the system into ACPI mode.
      The SMI handler of the BIOS does this in SMM, entered while Linux ACPI
      code runs acpi_enable().
      
      Other occasions where the SMI handler of the BIOS may change bits in
      the MTRRs could occur as well. To initialize newly booted secodary
      processors with the fixed-range MTRRs which the boot processor uses
      at that time, this patch saves the fixed-range MTRRs of the boot
      processor before new secondary processors are started. When the
      secondary processors run their Linux initialisation code, their
      fixed-range MTRRs will be updated with the saved fixed-range MTRRs.
      
      If CONFIG_MTRR is not set, we define mtrr_save_state
      as an empty statement because there is nothing to do.
      
      Possible TODOs:
      
      *) CPU-hotplugging outside of SMP suspend/resume is not yet tested
         with this patch.
      
      *) If, even in this case, an AP never runs i386/do_boot_cpu or x86_64/cpu_up,
         then the calls to mtrr_save_state() could be replaced by calls to
         mtrr_save_fixed_ranges(NULL) and  mtrr_save_state() would not be
         needed.
      
         That would need either verification of the CPU-hotplug code or
         at least a test on a >2 CPU machine.
      
      *) The MTRRs of other running processors are not yet checked at this
         time but it might be interesting to syncronize the MTTRs of all
         processors before booting. That would be an incremental patch,
         but of rather low priority since there is no machine known so
         far which would require this.
      
      AK: moved prototypes on x86-64 around to fix warnings
      Signed-off-by: NBernhard Kaindl <bk@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      2b1f6278
    • B
      [PATCH] x86: Adds mtrr_save_fixed_ranges() for use in two later patches. · 2b3b4835
      Bernhard Kaindl 提交于
      In this current implementation which is used in other patches,
      mtrr_save_fixed_ranges() accepts a dummy void pointer because
      in the current implementation of one of these patches, this
      function may be called from smp_call_function_single() which
      requires that this function takes a void pointer argument.
      
      This function calls get_fixed_ranges(), passing mtrr_state.fixed_ranges
      which is the element of the static struct which stores our current
      backup of the fixed-range MTRR values which all CPUs shall be
      using.
      
      Because  mtrr_save_fixed_ranges calls get_fixed_ranges after
      kernel initialisation time, __init needs to be removed from
      the declaration of get_fixed_ranges().
      
      If CONFIG_MTRR is not set, we define mtrr_save_fixed_ranges
      as an empty statement because there is nothing to do.
      
      AK: Moved prototypes for x86-64 around to fix warnings
      Signed-off-by: NBernhard Kaindl <bk@suse.de>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      2b3b4835
    • A
      [PATCH] x86-64: Move mtrr prototypes from proto.h to mtrr.h · 856f44ff
      Andi Kleen 提交于
      Signed-off-by: NAndi Kleen <ak@suse.de>
      856f44ff
    • J
      [PATCH] i386: Clean up ELF note generation · 03df4f6e
      Jeremy Fitzhardinge 提交于
      Three cleanups:
      
      1: ELF notes are never mapped, so there's no need to have any access
      flags in their phdr.
      
      2: When generating them from asm, tell the assembler to use a SHT_NOTE
      section type.  There doesn't seem to be a way to do this from C.
      
      3: Use ANSI rather than traditional cpp behaviour to stringify the
      macro argument.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      03df4f6e
    • A
      [PATCH] i386: PARAVIRT: Export paravirt_ops for non GPL modules too · 21564fd2
      Andi Kleen 提交于
      Otherwise non GPL modules cannot even do basic operations
      like disabling interrupts anymore, which would be excessive.
      
      Longer term should split the single structure up into
      internal and external symbols and not export the internal
      ones at all.
      Signed-off-by: NAndi Kleen <ak@suse.de>
      21564fd2
    • J
      [PATCH] x86: PARAVIRT: Jeremy Fitzhardinge <jeremy@goop.org> · 441d40dc
      Jeremy Fitzhardinge 提交于
      The other symbols used to delineate the alt-instructions sections have the
      form __foo/__foo_end.  Rename parainstructions to match.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      441d40dc
    • Z
      [PATCH] i386: Convert VMI timer to use clock events · e0bb8643
      Zachary Amsden 提交于
      Convert VMI timer to use clock events, making it properly able to use the NO_HZ
      infrastructure.  On UP systems, with no local APIC, we just continue to route
      these events through the PIT.  On systems with a local APIC, or SMP, we provide
      a single source interrupt chip which creates the local timer IRQ.  It actually
      gets delivered by the APIC hardware, but we don't want to use the same local
      APIC clocksource processing, so we create our own handler here.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      CC: Dan Hecht <dhecht@vmware.com>
      CC: Ingo Molnar <mingo@elte.hu>
      CC: Thomas Gleixner <tglx@linutronix.de>
      e0bb8643
    • Z
      [PATCH] i386: Implement vmi_kmap_atomic_pte · eeef9c68
      Zachary Amsden 提交于
      Implement vmi_kmap_atomic_pte in terms of the backend set_linear_mapping
      operation.  The conversion is rather straighforward; call kmap_atomic
      and then inform the hypervisor of the page mapping.
      
      The _flush_tlb damage is due to macros being pulled in from highmem.h.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      eeef9c68
    • Z
    • Z
      [PATCH] i386: Clean up arch/i386/kernel/cpu/mcheck/p4.c · 18420001
      Zachary Amsden 提交于
      No, just no.  You do not use goto to skip a code block.  You do not
      return an obvious variable from a singly-inlined function and give
      the function a return value.  You don't put unexplained comments
      about kmalloc in code which doesn't do dynamic allocation.  And
      you don't leave stray warnings around for no good reason.
      
      Also, when possible, it is better to use block scoped variables
      because gcc can sometime generate better code.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      18420001
    • J
      [PATCH] i386: PARAVIRT: Allow boot-time disable of paravirt_ops patching · 959b4fdf
      Jeremy Fitzhardinge 提交于
      Add "noreplace-paravirt" to disable paravirt_ops patching.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      959b4fdf
    • Z
      752783c0
    • J
      [PATCH] x86: update for i386 and x86-64 check_bugs · 57decbda
      Jeremy Fitzhardinge 提交于
      Remove spurious comments, headers and keywords from x86-64 bugs.[ch].
      
      Use identify_boot_cpu()
      
      AK: merged with other patch
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      57decbda
    • J
      [PATCH] i386: map enough initial memory to create lowmem mappings · 9ce8c2ed
      Jeremy Fitzhardinge 提交于
      head.S creates the very initial pagetable for the kernel.  This just
      maps enough space for the kernel itself, and an allocation bitmap.
      The amount of mapped memory is rounded up to 4Mbytes, and so this
      typically ends up mapping 8Mbytes of memory.
      
      When booting, pagetable_init() needs to create mappings for all
      lowmem, and the pagetables for these mappings are allocated from the
      free pages around the kernel in low memory.  If the number of
      pagetable pages + kernel size exceeds head.S's initial mapping, it
      will end up faulting on an unmapped page.  This will only happen with
      specific combinations of kernel size and memory size.
      
      This patch makes sure that head.S also maps enough space to fit the
      kernel pagetables as well as the kernel itself.  It ends up using an
      additional two pages of unreclaimable memory.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Acked-by: N"H. Peter Anvin" <hpa@zytor.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>,
      9ce8c2ed
    • J
      [PATCH] i386: Fix UP gdt bugs · c5413fbe
      Jeremy Fitzhardinge 提交于
      Fixes two problems with the GDT when compiling for uniprocessor:
       - There's no percpu segment, so trying to load its selector into %fs fails.
         Use a null selector instead.
       - The real gdt needs to be loaded at some point.  Do it in cpu_init().
      Signed-off-by: NChris Wright <chrisw@sous-sol.org>
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      c5413fbe
    • J
      [PATCH] i386: Define per_cpu_offset · 1956c73b
      Jeremy Fitzhardinge 提交于
      Define per_cpu_offset in asm-i386/percpu.h when SMP defined, like
      asm-generic/percpu.h does for UP.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Andi Kleen <ak@suse.de>
      1956c73b
    • J
      [PATCH] i386: cleanups to help using per-cpu variables from asm · 978c038e
      Jeremy Fitzhardinge 提交于
      This patch does a few small cleanups:
       - use PER_CPU_NAME to generate the names of per-cpu variables
       - use lea to add the per_cpu offset in PER_CPU(), because it doesn't
         affect condition flags
       - add PER_CPU_VAR which allows direct access to pre-cpu variables
         with the %fs: prefix on SMP.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Andi Kleen <ak@suse.de>
      978c038e
    • J
      [PATCH] i386: Convert PDA into the percpu section · 7c3576d2
      Jeremy Fitzhardinge 提交于
      Currently x86 (similar to x84-64) has a special per-cpu structure
      called "i386_pda" which can be easily and efficiently referenced via
      the %fs register.  An ELF section is more flexible than a structure,
      allowing any piece of code to use this area.  Indeed, such a section
      already exists: the per-cpu area.
      
      So this patch:
      (1) Removes the PDA and uses per-cpu variables for each current member.
      (2) Replaces the __KERNEL_PDA segment with __KERNEL_PERCPU.
      (3) Creates a per-cpu mirror of __per_cpu_offset called this_cpu_off, which
          can be used to calculate addresses for this CPU's variables.
      (4) Simplifies startup, because %fs doesn't need to be loaded with a
          special segment at early boot; it can be deferred until the first
          percpu area is allocated (or never for UP).
      
      The result is less code and one less x86-specific concept.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      7c3576d2
    • J
      [PATCH] i386: Page-align the GDT · 7a61d35d
      Jeremy Fitzhardinge 提交于
      Xen wants a dedicated page for the GDT.  I believe VMI likes it too.
      lguest, KVM and native don't care.
      
      Simple transformation to page-aligned "struct gdt_page".
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Acked-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      7a61d35d
    • J
      [PATCH] x86-64: deflate inflate_dynamic too · 39b7ee06
      Jeremy Fitzhardinge 提交于
      inflate_dynamic() has piggy stack usage too, so heap allocate it too.
      I'm not sure it actually gets used, but it shows up large in "make
      checkstack".
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      39b7ee06
    • J
      [PATCH] x86: deflate stack usage in lib/inflate.c · 35c74226
      Jeremy Fitzhardinge 提交于
      inflate_fixed and huft_build together use around 2.7k of stack.  When
      using 4k stacks, I saw stack overflows from interrupts arriving while
      unpacking the root initrd:
      
      do_IRQ: stack overflow: 384
       [<c0106b64>] show_trace_log_lvl+0x1a/0x30
       [<c01075e6>] show_trace+0x12/0x14
       [<c010763f>] dump_stack+0x16/0x18
       [<c0107ca4>] do_IRQ+0x6d/0xd9
       [<c010202b>] xen_evtchn_do_upcall+0x6e/0xa2
       [<c0106781>] xen_hypervisor_callback+0x25/0x2c
       [<c010116c>] xen_restore_fl+0x27/0x29
       [<c0330f63>] _spin_unlock_irqrestore+0x4a/0x50
       [<c0117aab>] change_page_attr+0x577/0x584
       [<c0117b45>] kernel_map_pages+0x8d/0xb4
       [<c016a314>] cache_alloc_refill+0x53f/0x632
       [<c016a6c2>] __kmalloc+0xc1/0x10d
       [<c0463d34>] malloc+0x10/0x12
       [<c04641c1>] huft_build+0x2a7/0x5fa
       [<c04645a5>] inflate_fixed+0x91/0x136
       [<c04657e2>] unpack_to_rootfs+0x5f2/0x8c1
       [<c0465acf>] populate_rootfs+0x1e/0xe4
      
      (This was under Xen, but there's no reason it couldn't happen on bare
        hardware.)
      
      This patch mallocs the local variables, thereby reducing the stack
      usage to sane levels.
      
      Also, up the heap size for the kernel decompressor to deal with the
      extra allocation.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Tim Yamin <plasmaroo@gentoo.org>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ian Molton <spyro@f2s.com>
      35c74226
    • J
      [PATCH] i386: PARAVIRT: drop unused ptep_get_and_clear · 4cdd9c89
      Jeremy Fitzhardinge 提交于
      In shadow mode hypervisors, ptep_get_and_clear achieves the desired
      purpose of keeping the shadows in sync by issuing a native_get_and_clear,
      followed by a call to pte_update, which indicates the PTE has been
      modified.
      
      Direct mode hypervisors (Xen) have no need for this anyway, and will trap
      the update using writable pagetables.
      
      This means no hypervisor makes use of ptep_get_and_clear; there is no
      reason to have it in the paravirt-ops structure.  Change confusing
      terminology about raw vs. native functions into consistent use of
      native_pte_xxx for operations which do not invoke paravirt-ops.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      4cdd9c89
    • J
      [PATCH] i386: PARAVIRT: Clean up paravirt patchable wrappers · 1a45b7aa
      Jeremy Fitzhardinge 提交于
      Replace all the open-coded macros for generating calls with a pair of
      more general macros (__PVOP_CALL/VCALL), and redefine all the
      PVOP_V?CALL[0-4] in terms of them.
      
      [ Andrew, Andi: this should slot in immediately after "Document asm-i386/paravirt.h"
        (paravirt_ops-document-asm-i386-paravirth.patch) ]
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      1a45b7aa
    • J
      [PATCH] i386: PARAVIRT: Use enums for paravirt lazy flush modi · 4e0fa856
      Jeremy Fitzhardinge 提交于
      Remove #defines, add enum for PARAVIRT_LAZY_FLUSH.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      4e0fa856
    • J
      [PATCH] i386: PARAVIRT: flush lazy mmu updates on kunmap_atomic · 7b2f27f4
      Jeremy Fitzhardinge 提交于
      kunmap_atomic should flush any pending lazy mmu updates, mainly to be
      consistent with kmap_atomic, and to preserve its normal behaviour.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      7b2f27f4
    • J
      [PATCH] i386: PARAVIRT: add kmap_atomic_pte for mapping highpte pages · ce6234b5
      Jeremy Fitzhardinge 提交于
      Xen and VMI both have special requirements when mapping a highmem pte
      page into the kernel address space.  These can be dealt with by adding
      a new kmap_atomic_pte() function for mapping highptes, and hooking it
      into the paravirt_ops infrastructure.
      
      Xen specifically wants to map the pte page RO, so this patch exposes a
      helper function, kmap_atomic_prot, which maps the page with the
      specified page protections.
      
      This also adds a kmap_flush_unused() function to clear out the cached
      kmap mappings.  Xen needs this to clear out any potential stray RW
      mappings of pages which will become part of a pagetable.
      
      [ Zach - vmi.c will need some attention after this patch.  It wasn't
        immediately obvious to me what needs to be done. ]
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Zachary Amsden <zach@vmware.com>
      ce6234b5
    • J
      [PATCH] i386: PARAVIRT: revert map_pt_hook. · a27fe809
      Jeremy Fitzhardinge 提交于
      Back out the map_pt_hook to clear the way for kmap_atomic_pte.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Zachary Amsden <zach@vmware.com>
      a27fe809
    • J
      [PATCH] i386: PARAVIRT: add flush_tlb_others paravirt_op · d4c10477
      Jeremy Fitzhardinge 提交于
      This patch adds a pv_op for flush_tlb_others.  Linux running on native
      hardware uses cross-CPU IPIs to flush the TLB on any CPU which may
      have a particular mm's pagetable entries cached in its TLB.  This is
      inefficient in a paravirtualized environment, since the hypervisor
      knows which real CPUs actually contain cached mappings, which may be a
      small subset of a guest's VCPUs.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      d4c10477
    • J
      [PATCH] i386: PARAVIRT: add common patching machinery · 63f70270
      Jeremy Fitzhardinge 提交于
      Implement the actual patching machinery.  paravirt_patch_default()
      contains the logic to automatically patch a callsite based on a few
      simple rules:
      
       - if the paravirt_op function is paravirt_nop, then patch nops
       - if the paravirt_op function is a jmp target, then jmp to it
       - if the paravirt_op function is callable and doesn't clobber too much
          for the callsite, call it directly
      
      paravirt_patch_default is suitable as a default implementation of
      paravirt_ops.patch, will remove most of the expensive indirect calls
      in favour of either a direct call or a pile of nops.
      
      Backends may implement their own patcher, however.  There are several
      helper functions to help with this:
      
      paravirt_patch_nop	nop out a callsite
      paravirt_patch_ignore	leave the callsite as-is
      paravirt_patch_call	patch a call if the caller and callee
      			have compatible clobbers
      paravirt_patch_jmp	patch in a jmp
      paravirt_patch_insns	patch some literal instructions over
      			the callsite, if they fit
      
      This patch also implements more direct patches for the native case, so
      that when running on native hardware many common operations are
      implemented inline.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Anthony Liguori <anthony@codemonkey.ws>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      63f70270
    • J
      [PATCH] i386: PARAVIRT: Document asm-i386/paravirt.h · 294688c0
      Jeremy Fitzhardinge 提交于
      Clean things up, and broadly document:
       - the paravirt_ops functions themselves
       - the patching mechanism
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      294688c0
    • J
      [PATCH] i386: PARAVIRT: Consistently wrap paravirt ops callsites to make them patchable · f8822f42
      Jeremy Fitzhardinge 提交于
      Wrap a set of interesting paravirt_ops calls in a wrapper which makes
      the callsites available for patching.  Unfortunately this is pretty
      ugly because there's no way to get gcc to generate a function call,
      but also wrap just the callsite itself with the necessary labels.
      
      This patch supports functions with 0-4 arguments, and either void or
      returning a value.  64-bit arguments must be split into a pair of
      32-bit arguments (lower word first).  Small structures are returned in
      registers.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Anthony Liguori <anthony@codemonkey.ws>
      f8822f42
    • J
      [PATCH] i386: PARAVIRT: Fix patch site clobbers to include return register · 42c24fa2
      Jeremy Fitzhardinge 提交于
      Fix a few clobbers to include the return register.  The clobbers set
      is the set of all registers modified (or may be modified) by the code
      snippet, regardless of whether it was deliberate or accidental.
      
      Also, make sure that callsites which are used in contexts which don't
      allow clobbers actually save and restore all clobberable registers.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Zachary Amsden <zach@vmware.com>
      42c24fa2
    • J
      [PATCH] i386: PARAVIRT: Use patch site IDs computed from offset in paravirt_ops structure · d5822035
      Jeremy Fitzhardinge 提交于
      Use patch type identifiers derived from the offset of the operation in
      the paravirt_ops structure.  This avoids having to maintain a separate
      enum for patch site types.
      
      Also, since the identifier is derived from the offset into
      paravirt_ops, the offset can be derived from the identifier.  This is
      used to remove replicated information in the various callsite macros,
      which has been a source of bugs in the past.
      
      This patch also drops the fused save_fl+cli operation, which doesn't
      really add much and makes things more complex - specifically because
      it breaks the 1:1 relationship between identifiers and offsets.  If
      this operation turns out to be particularly beneficial, then the right
      answer is to define a new entrypoint for it.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Zachary Amsden <zach@vmware.com>
      d5822035
    • J
      [PATCH] i386: PARAVIRT: rename struct paravirt_patch to paravirt_patch_site for clarity · 98de032b
      Jeremy Fitzhardinge 提交于
      Rename struct paravirt_patch to paravirt_patch_site, so that it
      clearly refers to a callsite, and not the patch which may be applied
      to that callsite.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Zachary Amsden <zach@vmware.com>
      98de032b
    • J
      [PATCH] x86: PARAVIRT: add hooks to intercept mm creation and destruction · d6dd61c8
      Jeremy Fitzhardinge 提交于
      Add hooks to allow a paravirt implementation to track the lifetime of
      an mm.  Paravirtualization requires three hooks, but only two are
      needed in common code.  They are:
      
      arch_dup_mmap, which is called when a new mmap is created at fork
      
      arch_exit_mmap, which is called when the last process reference to an
        mm is dropped, which typically happens on exit and exec.
      
      The third hook is activate_mm, which is called from the arch-specific
      activate_mm() macro/function, and so doesn't need stub versions for
      other architectures.  It's called when an mm is first used.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: linux-arch@vger.kernel.org
      Cc: James Bottomley <James.Bottomley@SteelEye.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      d6dd61c8
    • J
      [PATCH] i386: PARAVIRT: Allow paravirt backend to choose kernel PMD sharing · 5311ab62
      Jeremy Fitzhardinge 提交于
      Normally when running in PAE mode, the 4th PMD maps the kernel address space,
      which can be shared among all processes (since they all need the same kernel
      mappings).
      
      Xen, however, does not allow guests to have the kernel pmd shared between page
      tables, so parameterize pgtable.c to allow both modes of operation.
      
      There are several side-effects of this.  One is that vmalloc will update the
      kernel address space mappings, and those updates need to be propagated into
      all processes if the kernel mappings are not intrinsically shared.  In the
      non-PAE case, this is done by maintaining a pgd_list of all processes; this
      list is used when all process pagetables must be updated.  pgd_list is
      threaded via otherwise unused entries in the page structure for the pgd, which
      means that the pgd must be page-sized for this to work.
      
      Normally the PAE pgd is only 4x64 byte entries large, but Xen requires the PAE
      pgd to page aligned anyway, so this patch forces the pgd to be page
      aligned+sized when the kernel pmd is unshared, to accomodate both these
      requirements.
      
      Also, since there may be several distinct kernel pmds (if the user/kernel
      split is below 3G), there's no point in allocating them from a slab cache;
      they're just allocated with get_free_page and initialized appropriately.  (Of
      course the could be cached if there is just a single kernel pmd - which is the
      default with a 3G user/kernel split - but it doesn't seem worthwhile to add
      yet another case into this code).
      
      [ Many thanks to wli for review comments. ]
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NWilliam Lee Irwin III <wli@holomorphy.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      5311ab62
    • J
      [PATCH] i386: PARAVIRT: Allocate a fixmap slot · 90caccb9
      Jeremy Fitzhardinge 提交于
      Allocate a fixmap slot for use by a paravirt_ops implementation.  This
      is intended for early-boot bootstrap mappings.  Once the zones and
      allocator have been set up, it would be better to use get_vm_area() to
      allocate some virtual space.
      
      Xen uses this to map the hypervisor's shared info page, which doesn't
      have a pseudo-physical page number, and therefore can't be mapped
      ordinarily.  It is needed early because it contains the vcpu state,
      including the interrupt mask.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      90caccb9
    • J
      [PATCH] i386: PARAVIRT: Hooks to set up initial pagetable · b239fb25
      Jeremy Fitzhardinge 提交于
      This patch introduces paravirt_ops hooks to control how the kernel's
      initial pagetable is set up.
      
      In the case of a native boot, the very early bootstrap code creates a
      simple non-PAE pagetable to map the kernel and physical memory.  When
      the VM subsystem is initialized, it creates a proper pagetable which
      respects the PAE mode, large pages, etc.
      
      When booting under a hypervisor, there are many possibilities for what
      paging environment the hypervisor establishes for the guest kernel, so
      the constructon of the kernel's pagetable depends on the hypervisor.
      
      In the case of Xen, the hypervisor boots the kernel with a fully
      constructed pagetable, which is already using PAE if necessary.  Also,
      Xen requires particular care when constructing pagetables to make sure
      all pagetables are always mapped read-only.
      
      In order to make this easier, kernel's initial pagetable construction
      has been changed to only allocate and initialize a pagetable page if
      there's no page already present in the pagetable.  This allows the Xen
      paravirt backend to make a copy of the hypervisor-provided pagetable,
      allowing the kernel to establish any more mappings it needs while
      keeping the existing ones.
      
      A slightly subtle point which is worth highlighting here is that Xen
      requires all kernel mappings to share the same pte_t pages between all
      pagetables, so that updating a kernel page's mapping in one pagetable
      is reflected in all other pagetables.  This makes it possible to
      allocate a page and attach it to a pagetable without having to
      explicitly enumerate that page's mapping in all pagetables.
      
      And:
      
      +From: "Eric W. Biederman" <ebiederm@xmission.com>
      
      If we don't set the leaf page table entries it is quite possible that
      will inherit and incorrect page table entry from the initial boot
      page table setup in head.S.  So we need to redo the effort here,
      so we pick up PSE, PGE and the like.
      
      Hypervisors like Xen require that their page tables be read-only,
      which is slightly incompatible with our low identity mappings, however
      I discussed this with Jeremy he has modified the Xen early set_pte
      function to avoid problems in this area.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Acked-by: NWilliam Irwin <bill.irwin@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      b239fb25