1. 03 Apr, 2009: 6 commits
  2. 01 Apr, 2009: 1 commit
  3. 31 Mar, 2009: 1 commit
  4. 30 Mar, 2009: 2 commits
  5. 28 Mar, 2009: 1 commit
    • generic compat_sys_ustat · 2b1c6bd7
      Committed by Christoph Hellwig
      Because ino_t has a different size for compat tasks, ustat needs a compat
      handler, but currently only x86 and mips provide one.  Add a generic
      compat_sys_ustat and switch all architectures over to it.  Instead of
      doing various user copy hacks, compat_sys_ustat simply reimplements
      sys_ustat, as it's trivial.  This was suggested by Arnd Bergmann.
      
      Found by Eric Sandeen when running xfstests/017 on ppc64, which triggers
      stack-smashing warnings on RHEL/Fedora because the syscall writes too
      much data.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
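The following is a minimal userspace sketch of the problem. It assumes illustrative field widths: a 64-bit ino_t natively and a 32-bit one for compat tasks. The struct and function names are hypothetical, not the kernel's. Filling the compat layout field by field keeps the write within the smaller buffer the 32-bit caller passed in:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layouts loosely modeled on struct ustat. Only the inode
 * field differs in width: 64-bit ino_t natively, 32-bit for compat. */
struct native_ustat {
	int32_t  f_tfree;      /* free blocks */
	uint64_t f_tinode;     /* free inodes, 64-bit ino_t */
	char     f_fname[6];
	char     f_fpack[6];
};

struct compat_ustat32 {
	int32_t  f_tfree;
	uint32_t f_tinode;     /* free inodes, 32-bit ino_t */
	char     f_fname[6];
	char     f_fpack[6];
};

/* Fill the caller's smaller layout field by field instead of copying the
 * native struct wholesale, which would write past the end of its buffer. */
static void fill_compat_ustat(struct compat_ustat32 *out,
			      int32_t free_blocks, uint64_t free_inodes)
{
	memset(out, 0, sizeof(*out));          /* f_fname/f_fpack stay empty */
	out->f_tfree = free_blocks;
	out->f_tinode = (uint32_t)free_inodes; /* narrowed to the compat width */
}
```

In a real handler, the filled struct would then be copied out to the user buffer. That step is omitted here.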
  6. 27 Mar, 2009: 1 commit
  7. 26 Mar, 2009: 1 commit
  8. 25 Mar, 2009: 1 commit
  9. 24 Mar, 2009: 17 commits
    • KVM: Report IRQ injection status to userspace. · 4925663a
      Committed by Gleb Natapov
      The IRQ injection status is either -1, if no CPU that should accept the
      interrupt was found (because the IRQ was masked, the ioapic was
      misconfigured, etc.), or >= 0, in which case the number indicates how
      many CPUs the interrupt was injected into.  A value of 0 means that the
      interrupt was coalesced and should probably be reinjected.
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
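The following minimal sketch shows how a userspace caller might interpret the returned status, following the three cases described above. The enum and helper names are hypothetical; they are not part of the KVM API:

```c
#include <assert.h>

/* Hypothetical classification of the injection status:
 * -1 = no CPU accepted it, 0 = coalesced, >0 = delivered to that many CPUs. */
enum irq_outcome { IRQ_NOT_DELIVERED, IRQ_COALESCED, IRQ_DELIVERED };

static enum irq_outcome classify_irq_status(int status)
{
	if (status < 0)
		return IRQ_NOT_DELIVERED;   /* masked, misconfigured ioapic, ... */
	if (status == 0)
		return IRQ_COALESCED;       /* merged with a pending one */
	return IRQ_DELIVERED;               /* status == number of CPUs hit */
}

/* A userspace timer model might count coalesced ticks for reinjection. */
static int pending_after(int status, int pending)
{
	return classify_irq_status(status) == IRQ_COALESCED ? pending + 1
							    : pending;
}
```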
    • x86: Add EFER descriptions for FFXSR · d2062693
      Committed by Alexander Graf
      AMD K10 includes support for the FFXSR feature, which leaves out the
      XMM registers on FXSAVE/FXRSTOR when the EFER_FFXSR bit is set in
      EFER.
      
      The CPUID feature bit exists already, but the EFER bit is missing
      currently, so this patch adds it to the list of known EFER bits.
      Signed-off-by: Alexander Graf <agraf@suse.de>
      CC: Joerg Roedel <joerg.roedel@amd.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
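The following small sketch shows what such an EFER bit definition looks like. It assumes AMD's EFER layout, with SVME at bit 12 and FFXSR at bit 14. The macro style mirrors the kernel's _EFER_x/EFER_x pairs, but the helper is hypothetical. It also ignores the additional mode conditions the hardware applies before FFXSR takes effect:

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions per AMD's EFER layout (sketch, not the kernel header). */
#define _EFER_SVME   12                   /* SVM enable */
#define _EFER_FFXSR  14                   /* fast FXSAVE/FXRSTOR */
#define EFER_SVME    (1ULL << _EFER_SVME)
#define EFER_FFXSR   (1ULL << _EFER_FFXSR)

/* Hypothetical helper: does this EFER value request that FXSAVE/FXRSTOR
 * leave out the XMM registers? (Mode restrictions not modeled.) */
static int efer_requests_ffxsr(uint64_t efer)
{
	return (efer & EFER_FFXSR) != 0;
}
```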
    • KVM: Avoid using CONFIG_ in userspace visible headers · 91b2ae77
      Committed by Avi Kivity
      Kconfig symbols are not available in userspace, and are not stripped by
      headers_install.  Avoid their use by adding #defines in <asm/kvm.h> to
      suit each architecture.
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: MMU: Rename "metaphysical" attribute to "direct" · f6e2c02b
      Committed by Avi Kivity
      This actually describes what is going on, rather than alerting the reader
      that something strange is going on.
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Move struct kvm_pio_request into x86 kvm_host.h · 1c08364c
      Committed by Avi Kivity
      This is an x86-specific structure and has no business living in common code.
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: PIT: provide an option to disable interrupt reinjection · 52d939a0
      Committed by Marcelo Tosatti
      Certain clocks (such as TSC) in older 2.6 guests overaccount for lost
      ticks, causing severe time drift. Interrupt reinjection magnifies the
      problem.
      
      Provide an option to disable it.
      
      [avi: allow room for expansion in case we want to disable reinjection
            of other timers]
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
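The following hypothetical model shows why reinjection can make drift worse. Ticks that are missed while the vcpu is descheduled are either queued for later delivery (reinjection) or dropped. A guest that already compensates for lost ticks on its own, for example via the TSC, would count queued ticks twice. All names here are illustrative; this is not the KVM PIT code:

```c
#include <assert.h>
#include <stdbool.h>

struct pit_model {
	bool reinject;        /* the option this commit makes switchable */
	unsigned pending;     /* missed ticks still owed to the guest */
};

static void pit_ticks_missed(struct pit_model *p, unsigned n)
{
	if (p->reinject)
		p->pending += n;  /* guest later sees a burst of catch-up ticks */
	/* else: drop them; a guest that compensates for lost ticks itself
	 * would otherwise count them twice */
}

/* Number of timer interrupts delivered at the next tick. */
static unsigned pit_deliver(struct pit_model *p)
{
	unsigned n = p->pending + 1;  /* the current tick plus any backlog */
	p->pending = 0;
	return n;
}
```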
    • KVM: introduce kvm_read_guest_virt, kvm_write_guest_virt · 77c2002e
      Committed by Izik Eidus
      This commit renames emulator_read_std to kvm_read_guest_virt, and adds
      a new function, kvm_write_guest_virt, that allows writing to a guest
      virtual address.
      Signed-off-by: Izik Eidus <ieidus@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: VMX: initialize TSC offset relative to vm creation time · 53f658b3
      Committed by Marcelo Tosatti
      VMX initializes the TSC offset for each vcpu at different times, and
      also reinitializes it for vcpus other than 0 on APIC SIPI message.
      
      This bug causes the TSCs to appear unsynchronized in the guest, even if
      the host's TSCs are synchronized.
      
      Older Linux kernels don't handle the situation very well, so
      gettimeofday is likely to go backwards in time:
      
      http://www.mail-archive.com/kvm@vger.kernel.org/msg02955.html
      http://sourceforge.net/tracker/index.php?func=detail&aid=2025534&group_id=180599&atid=893831
      
      Fix it by initializing the offset of each vcpu relative to VM creation
      time, and by moving it from vmx_vcpu_reset to vmx_vcpu_setup, out of the
      APIC MP init path.
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
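The arithmetic behind the fix can be sketched as follows, assuming the VMX model in which the guest reads host_tsc + offset. The struct, helper names, and sample TSC values are hypothetical. If two vcpus pick their offsets at different host times, their TSCs disagree forever. If both derive the offset from one VM-wide base, they agree:

```c
#include <assert.h>
#include <stdint.h>

struct vm_model {
	uint64_t created_tsc;   /* host TSC sampled once at VM creation */
};

/* Offset that makes the guest TSC read 0 at host time 'base'. */
static int64_t tsc_offset_for(uint64_t base)
{
	return -(int64_t)base;
}

/* Guest-visible TSC; the add wraps modulo 2^64 like the hardware. */
static uint64_t guest_tsc(uint64_t host_tsc, int64_t offset)
{
	return host_tsc + (uint64_t)offset;
}

/* Fixed scheme: every vcpu uses the same VM-wide base, no matter when
 * the vcpu itself is set up or reset (e.g. on SIPI). */
static int64_t vcpu_tsc_offset(const struct vm_model *vm)
{
	return tsc_offset_for(vm->created_tsc);
}
```

For example, with vcpu 0 set up at host TSC 1000 and vcpu 1 reset at 5000, the old scheme gives guest readings of 8000 and 4000 at host TSC 9000. With a VM-wide base, both read the same value.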
    • KVM: MMU: Segregate mmu pages created with different cr4.pge settings · 2f0b3d60
      Committed by Avi Kivity
      Don't allow a vcpu with cr4.pge cleared to use a shadow page created with
      cr4.pge set; this might cause a cr3 switch not to sync ptes that have the
      global bit set (the global bit has no effect if !cr4.pge).
      
      This can only occur on smp with different cr4.pge settings for different
      vcpus (since a cr4 change will resync the shadow ptes), but there's no
      cost to being correct here.
      Signed-off-by: Avi Kivity <avi@redhat.com>
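The segregation can be sketched as a "role" that must match exactly before a shadow page is reused. Recording cr4.pge in the role keeps pages built under different PGE settings apart. The struct and field names are hypothetical. A real implementation would likely pack the role into one word so the match is a single integer compare:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical shadow-page role: reuse requires every field to match. */
struct sp_role {
	unsigned level;       /* paging level of the shadow page */
	bool direct;          /* formerly "metaphysical" */
	bool cr4_pge;         /* new: cr4.pge at the time the page was built */
};

static bool role_matches(struct sp_role a, struct sp_role b)
{
	return a.level == b.level && a.direct == b.direct &&
	       a.cr4_pge == b.cr4_pge;
}
```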
    • KVM: MMU: Inherit a shadow page's guest level count from vcpu setup · a770f6f2
      Committed by Avi Kivity
      Instead of "calculating" it on every shadow page allocation, set it once
      when switching modes, and copy it when allocating pages.
      
      This doesn't buy us much, but sets the stage for inheriting more
      information related to the mmu setup.
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: x86: Virtualize debug registers · 42dbaa5a
      Committed by Jan Kiszka
      So far KVM only had basic x86 debug register support, originally
      introduced to implement guest debugging that way. The guest itself was
      not able to use those registers.
      
      This patch now adds (almost) full support for guest self-debugging via
      hardware registers. It refactors the code, moving generic parts out of
      SVM (VMX was already cleaned up by the KVM_SET_GUEST_DEBUG patches), and
      it ensures that the registers are properly switched between host and
      guest.
      
      This patch also prepares the debug registers for use by the host. Once
      wired up by the following patch, this will allow hardware
      breakpoints/watchpoints in guest code. If this is enabled, the guest
      will only see faked debug registers without functionality, but with
      content reflecting the guest's modifications.
      
      Tested on Intel only; SVM /should/ work as well, but who knows...
      
      Known limitations: trapping on TSS switches won't work, most probably on
      Intel.
      
      Credits also go to Joerg Roedel: I used his previously posted debugging
      series as the basis for this patch.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: New guest debug interface · d0bfb940
      Committed by Jan Kiszka
      This rips out the support for KVM_DEBUG_GUEST and introduces a new IOCTL
      instead: KVM_SET_GUEST_DEBUG. The IOCTL payload consists of a generic
      part, controlling the "main switch" and the single-step feature. The
      arch-specific part adds an x86 interface for intercepting both types of
      debug exceptions separately and re-injecting them when the host is not
      interested. Moreover, it lays the foundation for guest debugging via
      debug registers.
      
      To signal breakpoint events properly back to userland, an arch-specific
      data block is now returned along with KVM_EXIT_DEBUG. For x86, the arch
      block contains the PC, the debug exception, and the relevant debug
      registers, so that debug events can be told apart properly.
      
      The availability of this new interface is signaled by
      KVM_CAP_SET_GUEST_DEBUG. Empty stubs for not yet supported archs are
      provided.
      
      Note that both SVM and VT-x are supported, but only the latter has been
      tested so far. Based on the experience with all those VT-x corner cases,
      I would be fairly surprised if SVM worked out of the box.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
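The split between a generic "main switch"/single-step part and an arch-specific x86 part can be sketched as a control word. The flag names and bit values below are hypothetical and illustrative; they are not the real KVM constants. The helper decides whether a debug exception (#DB is vector 1, #BP is vector 3) exits to the host debugger or is reinjected into the guest:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DBG_ENABLE            (1u << 0)   /* generic "main switch" */
#define DBG_SINGLESTEP        (1u << 1)   /* generic single-step */
#define DBG_X86_INTERCEPT_DB  (1u << 16)  /* x86: intercept #DB */
#define DBG_X86_INTERCEPT_BP  (1u << 17)  /* x86: intercept #BP */

/* true: exit to userspace; false: reinject into the guest. */
static bool exit_to_host(uint32_t ctl, int vector)
{
	if (!(ctl & DBG_ENABLE))
		return false;
	if (vector == 1)
		return (ctl & (DBG_SINGLESTEP | DBG_X86_INTERCEPT_DB)) != 0;
	if (vector == 3)
		return (ctl & DBG_X86_INTERCEPT_BP) != 0;
	return false;
}
```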
    • KVM: VMX: Support for injecting software exceptions · 8ab2d2e2
      Committed by Jan Kiszka
      VMX differentiates between processor and software generated exceptions
      when injecting them into the guest. Extend vmx_queue_exception
      accordingly (and refactor related constants) so that we can use this
      service reliably for the new guest debugging framework.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
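The difference between the two exception kinds shows up in the type field of the VM-entry interruption-information word. This sketch follows the Intel SDM layout: bits 7:0 hold the vector, bits 10:8 the type, bit 11 the deliver-error-code flag, and bit 31 the valid flag. The macro and function names are illustrative. Treating only #BP (int3) and #OF (into) as software exceptions is an assumption of this sketch:

```c
#include <assert.h>
#include <stdint.h>

#define INTR_INFO_VALID           (1u << 31)
#define INTR_INFO_DELIVER_CODE    (1u << 11)
#define INTR_TYPE_HARD_EXCEPTION  (3u << 8)
#define INTR_TYPE_SOFT_EXCEPTION  (6u << 8)

#define BP_VECTOR 3   /* raised by int3 */
#define OF_VECTOR 4   /* raised by into */

static uint32_t encode_exception(unsigned vector, int has_error_code)
{
	uint32_t info = (vector & 0xffu) | INTR_INFO_VALID;

	/* Instruction-generated exceptions are injected as software ones. */
	if (vector == BP_VECTOR || vector == OF_VECTOR)
		info |= INTR_TYPE_SOFT_EXCEPTION;
	else
		info |= INTR_TYPE_HARD_EXCEPTION;
	if (has_error_code)
		info |= INTR_INFO_DELIVER_CODE;
	return info;
}
```

For software exceptions, VMX also uses the VM-entry instruction length to compute the return address, so a real injection path has to set it as well. That part is not modeled here.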
    • KVM: SVM: Add VMRUN handler · 3d6368ef
      Committed by Alexander Graf
      This patch implements VMRUN. VMRUN enters a virtual CPU and runs it
      in the same context as the normal guest CPU would run.
      So it is implemented essentially the same way a normal CPU would do it.
      
      We also prepare all intercepts that get OR'ed with the original
      intercepts, as we do not allow a level 2 guest to be intercepted less
      than the first level guest.
      
      v2 implements the following improvements:
      
      - fixes the CPL check
      - does not allocate iopm when not used
      - remembers the host's IF in the HIF bit in the hflags
      
      v3:
      
      - make use of the new permission checking
      - add support for V_INTR_MASKING_MASK
      
      v4:
      
      - use host page backed hsave
      
      v5:
      
      - remove IOPM merging code
      
      v6:
      
      - save cr4 so PAE l1 guests work
      
      v7:
      
      - return 0 on vmrun so we check the MSRs too
      - fix MSR check to use the correct variable
      Acked-by: Joerg Roedel <joro@8bytes.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
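The intercept-merging rule above can be sketched as a bitwise OR. While the level 2 guest runs, anything that either the host wants for L1 or L1 wants for L2 is trapped, so L2 is never intercepted less than L1. The intercept bits and names here are hypothetical; the real VMCB keeps several separate intercept vectors:

```c
#include <assert.h>
#include <stdint.h>

#define INTERCEPT_CPUID  (1u << 0)
#define INTERCEPT_HLT    (1u << 1)
#define INTERCEPT_MSR    (1u << 2)
#define INTERCEPT_IOIO   (1u << 3)

/* Effective intercepts while running the nested (L2) guest. */
static uint32_t merged_intercepts(uint32_t host_for_l1, uint32_t l1_for_l2)
{
	return host_for_l1 | l1_for_l2;
}
```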
    • KVM: SVM: Implement GIF, clgi and stgi · 1371d904
      Committed by Alexander Graf
      This patch implements the GIF flag and the clgi and stgi instructions that
      clear and set this flag. Interrupts can be received by the CPU only if
      the flag is set (the default).
      
      To keep track of this information, this patch adds a new hidden
      flags vector that is used to store information that does not go into the
      vmcb, but is SVM specific.
      
      I tried to write some code to make -no-kvm-irqchip work too, but the first
      level guest won't even boot with that atm, so I ditched it.
      
      v2 moves the hflags to x86 generic code
      v3 makes use of the new permission helper
      v6 only enables interrupt_window if GIF=1
      Acked-by: Joerg Roedel <joro@8bytes.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
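The GIF semantics can be sketched as one bit in a per-vcpu hidden-flags word: clgi clears it, stgi sets it, and an interrupt is only taken when both GIF and the guest's IF are set. The names below are hypothetical and only modeled on the description:

```c
#include <assert.h>
#include <stdbool.h>

#define HF_GIF_MASK (1u << 0)   /* hidden flag, not stored in the VMCB */

struct vcpu_model {
	unsigned hflags;
	bool rflags_if;          /* guest EFLAGS.IF */
};

static void emulate_stgi(struct vcpu_model *v) { v->hflags |= HF_GIF_MASK; }
static void emulate_clgi(struct vcpu_model *v) { v->hflags &= ~HF_GIF_MASK; }

/* With GIF clear, no interrupt is taken regardless of IF. */
static bool can_take_interrupt(const struct vcpu_model *v)
{
	return (v->hflags & HF_GIF_MASK) && v->rflags_if;
}
```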
    • KVM: SVM: Move EFER and MSR constants to generic x86 code · 9962d032
      Committed by Alexander Graf
      MSR_EFER_SVME_MASK, MSR_VM_CR and MSR_VM_HSAVE_PA are defined in
      KVM-specific headers. Linux does have nice header files to collect
      EFER bits and MSR IDs, so IMHO we should put them there.
      
      While at it, I also changed the naming scheme to match that
      of the other defines.
      
      (introduced in v6)
      Acked-by: Joerg Roedel <joro@8bytes.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • x86/dmi: fix dmi_alloc() section mismatches · c8608d6b
      Committed by Jeremy Fitzhardinge
      Impact: section mismatch fix
      
      Ingo reports these warnings:
      > WARNING: vmlinux.o(.text+0x6a288e): Section mismatch in reference from
      > the function dmi_alloc() to the function .init.text:extend_brk()
      > The function dmi_alloc() references
      > the function __init extend_brk().
      > This is often because dmi_alloc lacks a __init annotation or the
      > annotation of extend_brk is wrong.
      
      dmi_alloc() is a static inline, and so should be immune to this
      kind of error.  But force it to be inlined and make it __init
      anyway, just to be extra sure.
      
      All of dmi_alloc()'s callers are already __init.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <49C6B23C.2040308@goop.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  10. 23 Mar, 2009: 1 commit
    • x86: e820 fix various signedness issues in setup.c and e820.c · ba639039
      Committed by Jaswinder Singh Rajput
      Impact: cleanup
      
      This fixed various signedness issues in setup.c and e820.c:
      arch/x86/kernel/setup.c:455:53: warning: incorrect type in argument 3 (different signedness)
      arch/x86/kernel/setup.c:455:53:    expected int *pnr_map
      arch/x86/kernel/setup.c:455:53:    got unsigned int extern [toplevel] *<noident>
      arch/x86/kernel/setup.c:639:53: warning: incorrect type in argument 3 (different signedness)
      arch/x86/kernel/setup.c:639:53:    expected int *pnr_map
      arch/x86/kernel/setup.c:639:53:    got unsigned int extern [toplevel] *<noident>
      arch/x86/kernel/setup.c:820:54: warning: incorrect type in argument 3 (different signedness)
      arch/x86/kernel/setup.c:820:54:    expected int *pnr_map
      arch/x86/kernel/setup.c:820:54:    got unsigned int extern [toplevel] *<noident>
      
      arch/x86/kernel/e820.c:670:53: warning: incorrect type in argument 3 (different signedness)
      arch/x86/kernel/e820.c:670:53:    expected int *pnr_map
      arch/x86/kernel/e820.c:670:53:    got unsigned int [toplevel] *<noident>
      Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
  11. 20 Mar, 2009: 2 commits
    • x86, CPA: Add set_pages_array_uc and set_pages_array_wb · 0f350755
      Committed by venkatesh.pallipadi@intel.com
      Add new interfaces:
      
        set_pages_array_uc()
        set_pages_array_wb()
      
      that can be used to change the page attributes of a batch of pages, with
      the flush etc. done only once at the end of all the changes. These
      interfaces are similar to the existing set_memory_array_uc() and
      set_memory_array_wc().
      Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: arjan@infradead.org
      Cc: eric@anholt.net
      Cc: airlied@redhat.com
      LKML-Reference: <20090319215358.901545000@intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
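The benefit of the array interfaces can be sketched by counting flushes. Changing pages one at a time pays one flush per page, while the batched form updates every page and flushes once. This is a hypothetical model with illustrative names, not the CPA code:

```c
#include <assert.h>
#include <stddef.h>

enum cache_attr { ATTR_WB, ATTR_UC };

struct page_model { enum cache_attr attr; };

static unsigned flushes;               /* TLB/cache flushes performed */

static void flush_all(void) { flushes++; }

/* Per-page change: one flush for every page. */
static void set_page_uc_one(struct page_model *p)
{
	p->attr = ATTR_UC;
	flush_all();
}

/* Batched change: update all pages, then flush once at the end. */
static void set_pages_array_uc_model(struct page_model *pages, size_t n)
{
	for (size_t i = 0; i < n; i++)
		pages[i].attr = ATTR_UC;
	flush_all();
}
```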
    • PCI/MSI: Use #ifdefs instead of weak functions · 11df1f05
      Committed by Michael Ellerman
      Weak functions aren't all they're cracked up to be. They lead to
      incorrect binaries with some toolchains, they require us to have empty
      functions we otherwise wouldn't, and the unused code is not elided
      (as of gcc 4.3.2 anyway).
      
      So replace the weak MSI arch hooks with the #define foo foo idiom. We no
      longer need empty versions of arch_setup/teardown_msi_irq().
      
      This is less source (by 1 line!), and results in smaller binaries too:
      
         text	   data	    bss	    dec	    hex	filename
      9354300	1693916	 678424	11726640 b2ef30	build/powerpc/vmlinux-before
      9354052	1693852	 678424	11726328 b2edf8	build/powerpc/vmlinux-after
      
      Also smaller on x86_64 and arm (iop13xx).
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
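The "#define foo foo" idiom can be sketched in a single file. The hook names and the int signature are hypothetical stand-ins; the real MSI arch hooks take different arguments. An arch header defines its override plus a same-named macro, and generic code compiles a default only when that macro is absent:

```c
#include <assert.h>

/* ---- arch header: this arch overrides one hook ---- */
static int arch_msi_setup(int irq)
{
	return irq + 100;                   /* arch-specific behaviour */
}
#define arch_msi_setup arch_msi_setup       /* announce the override */

/* ---- generic code: defaults only for hooks not overridden ---- */
#ifndef arch_msi_setup
static int arch_msi_setup(int irq)
{
	(void)irq;
	return -1;                          /* never compiled in this file */
}
#endif

#ifndef arch_msi_teardown
static int arch_msi_teardown(int irq)
{
	(void)irq;
	return 0;                           /* default: nothing to undo */
}
#endif
```

Because the choice is made by the preprocessor, an unused default is never compiled at all. That fits with the smaller binaries reported above.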
  12. 19 Mar, 2009: 1 commit
  13. 18 Mar, 2009: 5 commits
    • x86: add x2apic_wrmsr_fence() to x2apic flush tlb paths · ce4e240c
      Committed by Suresh Siddha
      Impact: optimize APIC IPI related barriers
      
      Uncached MMIO accesses for xapic are inherently serializing and hence
      we don't need explicit barriers for xapic IPI paths.
      
      x2apic MSR writes/reads don't have serializing semantics and hence need
      a serializing instruction or mfence to make all previous memory stores
      globally visible before the x2apic MSR write for the IPI.
      
      Add x2apic_wrmsr_fence() to the x2apic-specific TLB flush paths.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "steiner@sgi.com" <steiner@sgi.com>
      Cc: Nick Piggin <npiggin@suse.de>
      LKML-Reference: <1237313814.27006.203.camel@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, ioapic: Fix non atomic allocation with interrupts disabled · 05c3dc2c
      Committed by Suresh Siddha
      Impact: fix possible race
      
      save_mask_IO_APIC_setup() was using non-atomic memory allocation while
      being called with interrupts disabled. Fix this by splitting it into two
      different functions. The allocation part, save_IO_APIC_setup(), now
      happens before interrupts are disabled.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
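The split can be sketched in userspace by modeling "interrupts disabled" as a flag. The model asserts that the possibly-sleeping allocation never runs while that flag is set. All names, the 24-pin table size, and the use of bit 16 as the RTE mask bit are assumptions of this hypothetical model:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static bool irqs_disabled;    /* stand-in for the CPU's interrupt state */

/* Stand-in for a non-atomic allocation, which may sleep. */
static void *alloc_may_sleep(size_t n)
{
	assert(!irqs_disabled);   /* the bug this commit fixes */
	return calloc(1, n);
}

struct rte_model { unsigned entries[24]; };

/* Step 1, interrupts enabled: only allocate the save area. */
static struct rte_model *save_setup_alloc(void)
{
	return alloc_may_sleep(sizeof(struct rte_model));
}

/* Step 2, interrupts disabled: only copy and mask, no allocation. */
static void save_and_mask(struct rte_model *save, unsigned *hw, size_t pins)
{
	assert(pins <= sizeof(save->entries) / sizeof(save->entries[0]));
	for (size_t i = 0; i < pins; i++) {
		save->entries[i] = hw[i];
		hw[i] |= 1u << 16;    /* mask bit of the redirection entry */
	}
}
```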
    • x86, x2apic: cleanup the IO-APIC level migration with interrupt-remapping · 0280f7c4
      Committed by Suresh Siddha
      Impact: simplification
      
      In the current code, for level-triggered migration, we need to modify the
      io-apic RTE with the updated vector information, along with modifying the
      interrupt-remapping table entry (IRTE) with the vector and destination.
      This ensures that the remote IRR bit in the IOAPIC RTE gets cleared when
      the cpu does the EOI.
      
      With this patch, for level-triggered interrupts, we eliminate the io-apic
      RTE modification (with the updated vector information) by using a
      virtual vector (the io-apic pin number).  The real vector used for
      interrupting the cpu will come from the interrupt-remapping table entry.
      The trigger mode in the IRTE will always be edge, and the actual level
      or edge trigger will be set up in the IO-APIC RTE. So a level-triggered
      interrupt will appear as edge-triggered to the local apic cpu but still
      as level-triggered to the IO-APIC.
      
      With this change, level irq migration can be done by simply modifying
      the interrupt-remapping table entry without changing the io-apic RTE.
      And as the interrupt appears as edge-triggered at the cpu, in addition
      to doing the local apic EOI, we need to do an IO-APIC directed EOI to
      clear the remote IRR bit in the IO-APIC RTE.
      
      This simplifies the irq migration in the presence of interrupt-remapping.
      Idea-by: Rajesh Sankaran <rajesh.sankaran@intel.com>
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
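The scheme can be sketched with two small structs. The RTE holds the pin number as a virtual vector plus the real trigger mode. The IRTE holds the real vector and destination and is always edge-triggered, so migration rewrites only the IRTE. The structs and helpers are hypothetical, and the IO-APIC directed EOI that clears remote IRR is not modeled:

```c
#include <assert.h>
#include <stdbool.h>

struct ioapic_rte { unsigned vector; bool level; };           /* pin side */
struct irte { unsigned vector; unsigned dest; bool level; };  /* remap side */

static void setup_irq(struct ioapic_rte *rte, struct irte *ir, unsigned pin,
		      unsigned vector, unsigned dest, bool level)
{
	rte->vector = pin;     /* virtual vector: the io-apic pin number */
	rte->level = level;    /* real trigger mode stays in the RTE */
	ir->vector = vector;   /* real vector delivered to the cpu */
	ir->dest = dest;
	ir->level = false;     /* always edge towards the local apic */
}

/* Migration touches only the IRTE; the RTE is left untouched. */
static void migrate_irq(struct irte *ir, unsigned new_vector, unsigned new_dest)
{
	ir->vector = new_vector;
	ir->dest = new_dest;
}
```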
    • x86, x2apic: fix clear_local_APIC() in the presence of x2apic · cf6567fe
      Committed by Suresh Siddha
      Impact: cleanup, paranoia
      
      We were not clearing the local APIC in clear_local_APIC() in the
      presence of x2apic. Fix it.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, x2apic: enable fault handling for intr-remapping · 9d783ba0
      Committed by Suresh Siddha
      Impact: interface augmentation (not yet used)
      
      Enable the fault handling flow for intr-remapping as well. The fault
      handling code is now shared by both dma-remapping and intr-remapping.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>