1. 31 3月, 2009 1 次提交
  2. 30 3月, 2009 2 次提交
  3. 28 3月, 2009 1 次提交
    • C
      generic compat_sys_ustat · 2b1c6bd7
      Christoph Hellwig 提交于
      Due to a different size of ino_t ustat needs a compat handler, but
      currently only x86 and mips provide one.  Add a generic compat_sys_ustat
      and switch all architectures over to it.  Instead of doing various
      user copy hacks compat_sys_ustat just reimplements sys_ustat as
      it's trivial.  This was suggested by Arnd Bergmann.
      
      Found by Eric Sandeen when running xfstests/017 on ppc64, which causes
      stack smashing warnings on RHEL/Fedora due to the too large amount of
      data writen by the syscall.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      2b1c6bd7
  4. 27 3月, 2009 1 次提交
  5. 26 3月, 2009 1 次提交
  6. 25 3月, 2009 1 次提交
  7. 24 3月, 2009 17 次提交
    • G
      KVM: Report IRQ injection status to userspace. · 4925663a
      Gleb Natapov 提交于
      IRQ injection status is either -1 (if there was no CPU found
      that should except the interrupt because IRQ was masked or
      ioapic was misconfigured or ...) or >= 0 in that case the
      number indicates to how many CPUs interrupt was injected.
      If the value is 0 it means that the interrupt was coalesced
      and probably should be reinjected.
      Signed-off-by: NGleb Natapov <gleb@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      4925663a
    • A
      x86: Add EFER descriptions for FFXSR · d2062693
      Alexander Graf 提交于
      AMD k10 includes support for the FFXSR feature, which leaves out
      XMM registers on FXSAVE/FXSAVE when the EFER_FFXSR bit is set in
      EFER.
      
      The CPUID feature bit exists already, but the EFER bit is missing
      currently, so this patch adds it to the list of known EFER bits.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      CC: Joerg Roedel <joerg.roedel@amd.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      d2062693
    • A
      KVM: Avoid using CONFIG_ in userspace visible headers · 91b2ae77
      Avi Kivity 提交于
      Kconfig symbols are not available in userspace, and are not stripped by
      headers-install.  Avoid their use by adding #defines in <asm/kvm.h> to
      suit each architecture.
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      91b2ae77
    • A
      KVM: MMU: Rename "metaphysical" attribute to "direct" · f6e2c02b
      Avi Kivity 提交于
      This actually describes what is going on, rather than alerting the reader
      that something strange is going on.
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      f6e2c02b
    • A
      KVM: Move struct kvm_pio_request into x86 kvm_host.h · 1c08364c
      Avi Kivity 提交于
      This is an x86 specific stucture and has no business living in common code.
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      1c08364c
    • M
      KVM: PIT: provide an option to disable interrupt reinjection · 52d939a0
      Marcelo Tosatti 提交于
      Certain clocks (such as TSC) in older 2.6 guests overaccount for lost
      ticks, causing severe time drift. Interrupt reinjection magnifies the
      problem.
      
      Provide an option to disable it.
      
      [avi: allow room for expansion in case we want to disable reinjection
            of other timers]
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      52d939a0
    • I
      KVM: introduce kvm_read_guest_virt, kvm_write_guest_virt · 77c2002e
      Izik Eidus 提交于
      This commit change the name of emulator_read_std into kvm_read_guest_virt,
      and add new function name kvm_write_guest_virt that allow writing into a
      guest virtual address.
      Signed-off-by: NIzik Eidus <ieidus@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      77c2002e
    • M
      KVM: VMX: initialize TSC offset relative to vm creation time · 53f658b3
      Marcelo Tosatti 提交于
      VMX initializes the TSC offset for each vcpu at different times, and
      also reinitializes it for vcpus other than 0 on APIC SIPI message.
      
      This bug causes the TSC's to appear unsynchronized in the guest, even if
      the host is good.
      
      Older Linux kernels don't handle the situation very well, so
      gettimeofday is likely to go backwards in time:
      
      http://www.mail-archive.com/kvm@vger.kernel.org/msg02955.html
      http://sourceforge.net/tracker/index.php?func=detail&aid=2025534&group_id=180599&atid=893831
      
      Fix it by initializating the offset of each vcpu relative to vm creation
      time, and moving it from vmx_vcpu_reset to vmx_vcpu_setup, out of the
      APIC MP init path.
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      53f658b3
    • A
      KVM: MMU: Segregate mmu pages created with different cr4.pge settings · 2f0b3d60
      Avi Kivity 提交于
      Don't allow a vcpu with cr4.pge cleared to use a shadow page created with
      cr4.pge set; this might cause a cr3 switch not to sync ptes that have the
      global bit set (the global bit has no effect if !cr4.pge).
      
      This can only occur on smp with different cr4.pge settings for different
      vcpus (since a cr4 change will resync the shadow ptes), but there's no
      cost to being correct here.
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      2f0b3d60
    • A
      KVM: MMU: Inherit a shadow page's guest level count from vcpu setup · a770f6f2
      Avi Kivity 提交于
      Instead of "calculating" it on every shadow page allocation, set it once
      when switching modes, and copy it when allocating pages.
      
      This doesn't buy us much, but sets up the stage for inheriting more
      information related to the mmu setup.
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      a770f6f2
    • J
      KVM: x86: Virtualize debug registers · 42dbaa5a
      Jan Kiszka 提交于
      So far KVM only had basic x86 debug register support, once introduced to
      realize guest debugging that way. The guest itself was not able to use
      those registers.
      
      This patch now adds (almost) full support for guest self-debugging via
      hardware registers. It refactors the code, moving generic parts out of
      SVM (VMX was already cleaned up by the KVM_SET_GUEST_DEBUG patches), and
      it ensures that the registers are properly switched between host and
      guest.
      
      This patch also prepares debug register usage by the host. The latter
      will (once wired-up by the following patch) allow for hardware
      breakpoints/watchpoints in guest code. If this is enabled, the guest
      will only see faked debug registers without functionality, but with
      content reflecting the guest's modifications.
      
      Tested on Intel only, but SVM /should/ work as well, but who knows...
      
      Known limitations: Trapping on tss switch won't work - most probably on
      Intel.
      
      Credits also go to Joerg Roedel - I used his once posted debugging
      series as platform for this patch.
      Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      42dbaa5a
    • J
      KVM: New guest debug interface · d0bfb940
      Jan Kiszka 提交于
      This rips out the support for KVM_DEBUG_GUEST and introduces a new IOCTL
      instead: KVM_SET_GUEST_DEBUG. The IOCTL payload consists of a generic
      part, controlling the "main switch" and the single-step feature. The
      arch specific part adds an x86 interface for intercepting both types of
      debug exceptions separately and re-injecting them when the host was not
      interested. Moveover, the foundation for guest debugging via debug
      registers is layed.
      
      To signal breakpoint events properly back to userland, an arch-specific
      data block is now returned along KVM_EXIT_DEBUG. For x86, the arch block
      contains the PC, the debug exception, and relevant debug registers to
      tell debug events properly apart.
      
      The availability of this new interface is signaled by
      KVM_CAP_SET_GUEST_DEBUG. Empty stubs for not yet supported archs are
      provided.
      
      Note that both SVM and VTX are supported, but only the latter was tested
      yet. Based on the experience with all those VTX corner case, I would be
      fairly surprised if SVM will work out of the box.
      Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      d0bfb940
    • J
      KVM: VMX: Support for injecting software exceptions · 8ab2d2e2
      Jan Kiszka 提交于
      VMX differentiates between processor and software generated exceptions
      when injecting them into the guest. Extend vmx_queue_exception
      accordingly (and refactor related constants) so that we can use this
      service reliably for the new guest debugging framework.
      Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      8ab2d2e2
    • A
      KVM: SVM: Add VMRUN handler · 3d6368ef
      Alexander Graf 提交于
      This patch implements VMRUN. VMRUN enters a virtual CPU and runs that
      in the same context as the normal guest CPU would run.
      So basically it is implemented the same way, a normal CPU would do it.
      
      We also prepare all intercepts that get OR'ed with the original
      intercepts, as we do not allow a level 2 guest to be intercepted less
      than the first level guest.
      
      v2 implements the following improvements:
      
      - fixes the CPL check
      - does not allocate iopm when not used
      - remembers the host's IF in the HIF bit in the hflags
      
      v3:
      
      - make use of the new permission checking
      - add support for V_INTR_MASKING_MASK
      
      v4:
      
      - use host page backed hsave
      
      v5:
      
      - remove IOPM merging code
      
      v6:
      
      - save cr4 so PAE l1 guests work
      
      v7:
      
      - return 0 on vmrun so we check the MSRs too
      - fix MSR check to use the correct variable
      Acked-by: NJoerg Roedel <joro@8bytes.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      3d6368ef
    • A
      KVM: SVM: Implement GIF, clgi and stgi · 1371d904
      Alexander Graf 提交于
      This patch implements the GIF flag and the clgi and stgi instructions that
      set this flag. Only if the flag is set (default), interrupts can be received by
      the CPU.
      
      To keep the information about that somewhere, this patch adds a new hidden
      flags vector. that is used to store information that does not go into the
      vmcb, but is SVM specific.
      
      I tried to write some code to make -no-kvm-irqchip work too, but the first
      level guest won't even boot with that atm, so I ditched it.
      
      v2 moves the hflags to x86 generic code
      v3 makes use of the new permission helper
      v6 only enables interrupt_window if GIF=1
      Acked-by: NJoerg Roedel <joro@8bytes.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      1371d904
    • A
      KVM: SVM: Move EFER and MSR constants to generic x86 code · 9962d032
      Alexander Graf 提交于
      MSR_EFER_SVME_MASK, MSR_VM_CR and MSR_VM_HSAVE_PA are set in KVM
      specific headers. Linux does have nice header files to collect
      EFER bits and MSR IDs, so IMHO we should put them there.
      
      While at it, I also changed the naming scheme to match that
      of the other defines.
      
      (introduced in v6)
      Acked-by: NJoerg Roedel <joro@8bytes.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      9962d032
    • J
      x86/dmi: fix dmi_alloc() section mismatches · c8608d6b
      Jeremy Fitzhardinge 提交于
      Impact: section mismatch fix
      
      Ingo reports these warnings:
      > WARNING: vmlinux.o(.text+0x6a288e): Section mismatch in reference from
      > the function dmi_alloc() to the function .init.text:extend_brk()
      > The function dmi_alloc() references
      > the function __init extend_brk().
      > This is often because dmi_alloc lacks a __init annotation or the
      > annotation of extend_brk is wrong.
      
      dmi_alloc() is a static inline, and so should be immune to this
      kind of error.  But force it to be inlined and make it __init
      anyway, just to be extra sure.
      
      All of dmi_alloc()'s callers are already __init.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <49C6B23C.2040308@goop.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c8608d6b
  8. 23 3月, 2009 1 次提交
    • J
      x86: e820 fix various signedness issues in setup.c and e820.c · ba639039
      Jaswinder Singh Rajput 提交于
      Impact: cleanup
      
      This fixed various signedness issues in setup.c and e820.c:
      arch/x86/kernel/setup.c:455:53: warning: incorrect type in argument 3 (different signedness)
      arch/x86/kernel/setup.c:455:53:    expected int *pnr_map
      arch/x86/kernel/setup.c:455:53:    got unsigned int extern [toplevel] *<noident>
      arch/x86/kernel/setup.c:639:53: warning: incorrect type in argument 3 (different signedness)
      arch/x86/kernel/setup.c:639:53:    expected int *pnr_map
      arch/x86/kernel/setup.c:639:53:    got unsigned int extern [toplevel] *<noident>
      arch/x86/kernel/setup.c:820:54: warning: incorrect type in argument 3 (different signedness)
      arch/x86/kernel/setup.c:820:54:    expected int *pnr_map
      arch/x86/kernel/setup.c:820:54:    got unsigned int extern [toplevel] *<noident>
      
      arch/x86/kernel/e820.c:670:53: warning: incorrect type in argument 3 (different signedness)
      arch/x86/kernel/e820.c:670:53:    expected int *pnr_map
      arch/x86/kernel/e820.c:670:53:    got unsigned int [toplevel] *<noident>
      Signed-off-by: NJaswinder Singh Rajput <jaswinderrajput@gmail.com>
      ba639039
  9. 20 3月, 2009 1 次提交
  10. 19 3月, 2009 1 次提交
  11. 18 3月, 2009 6 次提交
    • S
      x86: add x2apic_wrmsr_fence() to x2apic flush tlb paths · ce4e240c
      Suresh Siddha 提交于
      Impact: optimize APIC IPI related barriers
      
      Uncached MMIO accesses for xapic are inherently serializing and hence
      we don't need explicit barriers for xapic IPI paths.
      
      x2apic MSR writes/reads don't have serializing semantics and hence need
      a serializing instruction or mfence, to make all the previous memory
      stores globally visisble before the x2apic msr write for IPI.
      
      Add x2apic_wrmsr_fence() in flush tlb path to x2apic specific paths.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "steiner@sgi.com" <steiner@sgi.com>
      Cc: Nick Piggin <npiggin@suse.de>
      LKML-Reference: <1237313814.27006.203.camel@localhost.localdomain>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ce4e240c
    • S
      x86, ioapic: Fix non atomic allocation with interrupts disabled · 05c3dc2c
      Suresh Siddha 提交于
      Impact: fix possible race
      
      save_mask_IO_APIC_setup() was using non atomic memory allocation while getting
      called with interrupts disabled. Fix this by splitting this into two different
      function. Allocation part save_IO_APIC_setup() now happens before
      disabling interrupts.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      05c3dc2c
    • S
      x86, x2apic: cleanup the IO-APIC level migration with interrupt-remapping · 0280f7c4
      Suresh Siddha 提交于
      Impact: simplification
      
      In the current code, for level triggered migration, we need to modify the
      io-apic RTE with the update vector information, along with modifying interrupt
      remapping table entry(IRTE) with vector and destination. This is to ensure that
      remote IRR bit inthe IOAPIC RTE gets cleared when the cpu does EOI.
      
      With this patch, for level triggered, we eliminate the io-apic RTE modification
      (with the updated vector information), by using a virtual vector (io-apic pin
      number).  Real vector that is used for interrupting cpu will be coming from
      the interrupt-remapping table entry. Trigger mode in the IRTE will always be
      edge, and the actual level or edge trigger will be setup in the IO-APIC RTE.
      So a level triggered interrupt will appear as an edge to the local apic
      cpu but still as level to the IO-APIC.
      
      With this change, level irq migration can be done by simply modifying
      the interrupt-remapping table entry with out changing the io-apic RTE.
      And as the interrupt appears as edge at the cpu, in addition to do the
      local apic EOI, we need to do IO-APIC directed EOI to clear the remote
      IRR bit in  the IO-APIC RTE.
      
      This simplies the irq migration in the presence of interrupt-remapping.
      Idea-by: NRajesh Sankaran <rajesh.sankaran@intel.com>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      0280f7c4
    • S
      x86, x2apic: fix clear_local_APIC() in the presence of x2apic · cf6567fe
      Suresh Siddha 提交于
      Impact: cleanup, paranoia
      
      We were not clearing the local APIC in clear_local_APIC() in the
      presence of x2apic. Fix it.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      cf6567fe
    • S
      x86, x2apic: enable fault handling for intr-remapping · 9d783ba0
      Suresh Siddha 提交于
      Impact: interface augmentation (not yet used)
      
      Enable fault handling flow for intr-remapping aswell. Fault handling
      code now shared by both dma-remapping and intr-remapping.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      9d783ba0
    • J
      x86/brk: make the brk reservation symbols inaccessible from C · 0b1c723d
      Jeremy Fitzhardinge 提交于
      Impact: bulletproofing, clarification
      
      The brk reservation symbols are just there to document the amount
      of space reserved by brk users in the final vmlinux file.  Their
      addresses are irrelevent, and using their addresses will cause
      certain havok.  Name them ".brk.NAME", which is a valid asm symbol
      but C can't reference it; it also highlights their special
      role in the symbol table.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      0b1c723d
  12. 17 3月, 2009 2 次提交
  13. 15 3月, 2009 5 次提交
    • J
      x86: allow extend_brk users to reserve brk space · 796216a5
      Jeremy Fitzhardinge 提交于
      Impact: new interface; remove hard-coded limit
      
      Add RESERVE_BRK(name, size) macro to reserve space in the brk
      area.  This should be a conservative (ie, larger) estimate of
      how much space might possibly be required from the brk area.
      Any unused space will be freed, so there's no real downside
      on making the reservation too large (within limits).
      
      The name should be unique within a given file, and somewhat
      descriptive.
      
      The C definition of RESERVE_BRK() ends up being more complex than
      one would expect to work around a cluster of gcc infelicities:
      
        The first attempt was to simply try putting __section(.brk_reservation)
        on a variable.  This doesn't work because it ends up making it a
        @progbits section, which gets actual space allocated in the vmlinux
        executable.
      
        The second attempt was to emit the space into a section using asm,
        but gcc doesn't allow arguments to be passed to file-level asm()
        statements, making it hard to pass in the size.
      
        The final attempt is to wrap the asm() in a function to allow
        it to have arguments, and put the function itself into the
        .discard section, which vmlinux*.lds drops entirely from the
        emitted vmlinux.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      796216a5
    • Y
      x86-32: compute initial mapping size more accurately · 7543c1de
      Yinghai Lu 提交于
      Impact: simplification
      
      We only need to map the kernel in head_32.S, not the whole of
      lowmem.  We use 512MB as a reasonable (but arbitrary) limit on
      the maximum size of the kernel image.
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      7543c1de
    • J
      x86: use brk allocation for DMI · 6de6cb44
      Jeremy Fitzhardinge 提交于
      Impact: use new interface instead of previous ad hoc implementation
      
      Use extend_brk() to allocate memory for DMI rather than having an
      ad-hoc allocator.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      6de6cb44
    • J
      x86-32: use brk segment for allocating initial kernel pagetable · ccf3fe02
      Jeremy Fitzhardinge 提交于
      Impact: use new interface instead of previous ad hoc implementation
      
      Rather than having special purpose init_pg_table_start/end variables
      to delimit the kernel pagetable built by head_32.S, just use the brk
      mechanism to extend the bss for the new pagetable.
      
      This patch removes init_pg_table_start/end and pg0, defines __brk_base
      (which is page-aligned and immediately follows _end), initializes
      the brk region to start there, and uses it for the 32-bit pagetable.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      ccf3fe02
    • J
      x86: add brk allocation for very, very early allocations · 93dbda7c
      Jeremy Fitzhardinge 提交于
      Impact: new interface
      
      Add a brk()-like allocator which effectively extends the bss in order
      to allow very early code to do dynamic allocations.  This is better than
      using statically allocated arrays for data in subsystems which may never
      get used.
      
      The space for brk allocations is in the bss ELF segment, so that the
      space is mapped properly by the code which maps the kernel, and so
      that bootloaders keep the space free rather than putting a ramdisk or
      something into it.
      
      The bss itself, delimited by __bss_stop, ends before the brk area
      (__brk_base to __brk_limit).  The kernel text, data and bss is reserved
      up to __bss_stop.
      
      Any brk-allocated data is reserved separately just before the kernel
      pagetable is built, as that code allocates from unreserved spaces
      in the e820 map, potentially allocating from any unused brk memory.
      Ultimately any unused memory in the brk area is used in the general
      kernel memory pool.
      
      Initially the brk space is set to 1MB, which is probably much larger
      than any user needs (the largest current user is i386 head_32.S's code
      to build the pagetables to map the kernel, which can get fairly large
      with a big kernel image and no PSE support).  So long as the system
      has sufficient memory for the bootloader to reserve the kernel+1MB brk,
      there are no bad effects resulting from an over-large brk.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      93dbda7c