1. 15 Jan, 2010 1 commit
    • vhost_net: a kernel-level virtio server · 3a4d5c94
      Michael S. Tsirkin committed
      What it is: vhost net is a character device that can be used to reduce
      the number of system calls involved in virtio networking.
      Existing virtio net code is used in the guest without modification.
      
      There's similarity with vringfd, with some differences and reduced scope
      - uses eventfd for signalling
      - structures can be moved around in memory at any time (good for
        migration, bug work-arounds in userspace)
      - write logging is supported (good for migration)
      - support memory table and not just an offset (needed for kvm)
      
      common virtio related code has been put in a separate file vhost.c and
      can be made into a separate module if/when more backends appear.  I used
      Rusty's lguest.c as the source for developing this part : this supplied
      me with witty comments I wouldn't be able to write myself.
      
      What it is not: vhost net is not a bus, and not a generic new system
      call. No assumptions are made on how guest performs hypercalls.
      Userspace hypervisors are supported as well as kvm.
      
      How it works: Basically, we connect virtio frontend (configured by
      userspace) to a backend. The backend could be a network device, or a tap
      device.  Backend is also configured by userspace, including vlan/mac
      etc.
      
      Status: This works for me, and I haven't seen any crashes.
      Compared to userspace, people reported improved latency (as I save up to
      4 system calls per packet), as well as better bandwidth and CPU
      utilization.
      
      Features that I plan to look at in the future:
      - mergeable buffers
      - zero copy
      - scalability tuning: figure out the best threading model to use
      
      Note on RCU usage (this is also documented in vhost.h, near
      private_pointer which is the value protected by this variant of RCU):
      what is happening is that the rcu_dereference() is being used in a
      workqueue item.  The role of rcu_read_lock() is taken on by the start of
      execution of the workqueue item, of rcu_read_unlock() by the end of
      execution of the workqueue item, and of synchronize_rcu() by
      flush_workqueue()/flush_work(). In the future we might need to apply
      some gcc attribute or sparse annotation to the function passed to
      INIT_WORK(). Paul's ack below is for this RCU usage.
      
      (Includes fixes by Alan Cox <alan@linux.intel.com>,
      David L Stevens <dlstevens@us.ibm.com>,
      Chris Wright <chrisw@redhat.com>)
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
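The "memory table and not just an offset" point in the vhost_net message can be pictured with a small model. This is a minimal Python sketch, not kernel code: the `MemoryRegion` class, the `translate` helper, and the sample addresses are all illustrative assumptions; only the idea of a table of guest-physical ranges mapped to userspace addresses comes from the commit message.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MemoryRegion:
    """One table entry (hypothetical model): a guest-physical range and
    the userspace address where that range is mapped."""
    guest_phys_addr: int
    memory_size: int
    userspace_addr: int


def translate(table: List[MemoryRegion], gpa: int) -> Optional[int]:
    """Return the userspace address backing guest-physical address gpa,
    or None if no region in the table covers it."""
    for region in table:
        offset = gpa - region.guest_phys_addr
        if 0 <= offset < region.memory_size:
            return region.userspace_addr + offset
    return None


# Two disjoint regions with a hole between them (made-up addresses).
table = [
    MemoryRegion(0x0000_0000, 0x000A_0000, 0x7F00_0000_0000),
    MemoryRegion(0x0010_0000, 0x0100_0000, 0x7F00_1000_0000),
]
```

A single fixed offset cannot describe the hole between the two regions, which is one way to read why a table is needed for kvm-style guest memory layouts.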
  2. 07 Dec, 2009 1 commit
    • [S390] Improve address space mode selection. · b11b5334
      Martin Schwidefsky committed
      Introduce user_mode to replace the two variables switch_amode and
      s390_noexec. There are three valid combinations of the old values:
        1) switch_amode == 0 && s390_noexec == 0
        2) switch_amode == 1 && s390_noexec == 0
        3) switch_amode == 1 && s390_noexec == 1
      They get replaced by
        1) user_mode == HOME_SPACE_MODE
        2) user_mode == PRIMARY_SPACE_MODE
        3) user_mode == SECONDARY_SPACE_MODE
      The new kernel parameter user_mode=[primary,secondary,home] lets
      you choose the address space mode the user space processes should
      use. In addition the CONFIG_S390_SWITCH_AMODE config option
      is removed.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
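The mapping from the old variable pair to the new user_mode value, and the three accepted values of the user_mode= kernel parameter, can be written out as a lookup. This is a minimal Python sketch for illustration only: the `UserMode` enum and the helpers `from_old_flags` and `parse_user_mode` are hypothetical names, while the mode names and the three valid combinations are taken from the commit message.

```python
from enum import Enum


class UserMode(Enum):
    # Enum values are the strings accepted by the user_mode= parameter.
    HOME_SPACE_MODE = "home"
    PRIMARY_SPACE_MODE = "primary"
    SECONDARY_SPACE_MODE = "secondary"


# The three valid (switch_amode, s390_noexec) combinations listed above.
_OLD_TO_NEW = {
    (0, 0): UserMode.HOME_SPACE_MODE,
    (1, 0): UserMode.PRIMARY_SPACE_MODE,
    (1, 1): UserMode.SECONDARY_SPACE_MODE,
}


def from_old_flags(switch_amode: int, s390_noexec: int) -> UserMode:
    """Map an old flag pair to the replacement user_mode value."""
    try:
        return _OLD_TO_NEW[(switch_amode, s390_noexec)]
    except KeyError:
        raise ValueError("invalid combination of old flags") from None


def parse_user_mode(value: str) -> UserMode:
    """Parse the value part of user_mode=...; raises ValueError if unknown."""
    return UserMode(value)
```

The fourth combination (switch_amode == 0 with s390_noexec == 1) is rejected here because the commit message does not list it as valid.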
  3. 03 Dec, 2009 4 commits
    • KVM: s390: Fix prefix register checking in arch/s390/kvm/sigp.c · f50146bd
      Carsten Otte committed
      This patch corrects the checking of the new address for the prefix register.
      On s390, the prefix register is used to address the cpu's lowcore (address
      0...8k). This check is supposed to verify that the memory is readable and
      present.
      copy_from_guest is a helper function that can be used to read from guest
      memory. It applies prefixing, adds the start address of the guest memory
      in user address space, and then calls copy_from_user. The previous code
      was broken for two reasons:
      - prefixing should not be applied here. The current prefix register is
        going to be updated soon, and the address we're looking for will be
        0..8k after we've updated the register
      - we're adding the guest origin (gmsor) twice: once in subject code
        and once in copy_from_guest
      
      With kuli, we did not hit this problem because (a) we were lucky with
      previous prefix register content, and (b) our guest memory was mmaped
      very low into user address space.
      
      Cc: stable@kernel.org
      Signed-off-by: Carsten Otte <cotte@de.ibm.com>
      Reported-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
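The two problems described in the prefix register fix can be modelled with plain integer arithmetic. This is a rough Python sketch under stated assumptions, not the code from sigp.c: all function names are hypothetical, the 8 KiB prefix area follows the "0...8k" wording of the message, and the "fixed" variant only shows the address the check is meant to reach, not how the actual patch reads it.

```python
PREFIX_AREA_SIZE = 0x2000  # 8 KiB lowcore, per the "0...8k" in the message


def apply_prefix(addr: int, prefix: int) -> int:
    """Model of prefixing: swap the first 8 KiB with the prefix area."""
    if addr < PREFIX_AREA_SIZE:
        return addr + prefix
    if prefix <= addr < prefix + PREFIX_AREA_SIZE:
        return addr - prefix
    return addr


def copy_from_guest_addr(guest_addr: int, prefix: int, origin: int) -> int:
    """User address a copy_from_guest-like helper would read:
    prefixing first, then adding the guest origin."""
    return origin + apply_prefix(guest_addr, prefix)


def buggy_check_addr(new_prefix: int, old_prefix: int, origin: int) -> int:
    """Old behaviour as described: the caller already added the origin,
    and the helper then prefixes and adds the origin a second time."""
    return copy_from_guest_addr(new_prefix + origin, old_prefix, origin)


def intended_check_addr(new_prefix: int, origin: int) -> int:
    """Address that should be probed: no prefixing, origin added once."""
    return origin + new_prefix
```

With an origin of 0 and an old prefix of 0 the two functions agree, which is consistent with the remark that the problem stayed hidden when guest memory was mapped very low and the previous prefix content was favourable.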
    • KVM: s390: Make psw available on all exits, not just a subset · d7b0b5eb
      Carsten Otte committed
      This patch moves the s390 processor status word (PSW) into the base
      kvm_run struct and keeps it up to date on all userspace exits.
      
      The userspace ABI is broken by this, however there are no applications
      in the wild using this.  A capability check is provided so users can
      verify the updated API exists.
      
      Cc: stable@kernel.org
      Signed-off-by: Carsten Otte <cotte@de.ibm.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Activate Virtualization On Demand · 10474ae8
      Alexander Graf committed
      X86 CPUs need to have some magic happening to enable the virtualization
      extensions on them. This magic can result in unpleasant results for
      users, like blocking other VMMs from working (vmx) or using invalid TLB
      entries (svm).
      
      Currently KVM activates virtualization when the respective kernel module
      is loaded. This blocks us from autoloading KVM modules without breaking
      other VMMs.
      
      To circumvent this problem at least a bit, this patch introduces on
      demand activation of virtualization. This means that virtualization is
      instead enabled on creation of the first virtual machine and disabled
      on destruction of the last one.
      
      So using this, KVM can be easily autoloaded, while keeping other
      hypervisors usable.
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
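The "enable on first VM, disable on last VM" behaviour amounts to a usage counter around two callbacks. This is a minimal Python sketch of that idea only: the `VirtualizationGate` class and its method names are hypothetical, and locking and error handling, which real kernel code would need, are left out.

```python
from typing import Callable


class VirtualizationGate:
    """Counts live VMs and toggles the hardware extensions at the edges."""

    def __init__(self, enable: Callable[[], None], disable: Callable[[], None]):
        self._enable = enable
        self._disable = disable
        self._vm_count = 0

    def create_vm(self) -> None:
        # First VM: turn the virtualization extensions on.
        if self._vm_count == 0:
            self._enable()
        self._vm_count += 1

    def destroy_vm(self) -> None:
        if self._vm_count == 0:
            raise RuntimeError("no virtual machine to destroy")
        self._vm_count -= 1
        # Last VM gone: turn the extensions off again.
        if self._vm_count == 0:
            self._disable()
```

While the counter is zero the extensions stay off, so merely loading the module in this model does not interfere with another hypervisor.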
    • KVM: Return -ENOTTY on unrecognized ioctls · 367e1319
      Avi Kivity committed
      Not the incorrect -EINVAL.
      Signed-off-by: Avi Kivity <avi@redhat.com>
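The convention behind this one-line change is that an ioctl request number the handler does not know yields -ENOTTY, while -EINVAL is left for known requests with bad arguments. This is a minimal Python sketch of such a dispatcher; `handle_ioctl` and the request numbers used with it are hypothetical and do not correspond to real KVM ioctls.

```python
import errno
from typing import Callable, Dict, Optional


def handle_ioctl(
    handlers: Dict[int, Callable[[Optional[int]], int]],
    request: int,
    arg: Optional[int] = None,
) -> int:
    """Dispatch request to its handler; unknown requests get -ENOTTY."""
    handler = handlers.get(request)
    if handler is None:
        return -errno.ENOTTY
    return handler(arg)
```

A handler registered in the table remains free to return -errno.EINVAL itself when the request is recognized but its argument is invalid.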
  4. 04 Oct, 2009 1 commit
  5. 21 Sep, 2009 1 commit
  6. 10 Sep, 2009 7 commits
  7. 07 Aug, 2009 1 commit
  8. 05 Aug, 2009 1 commit
    • KVM: s390: fix wait_queue handling · d3bc2f91
      Christian Borntraeger committed
      There are two waitqueues in kvm for wait handling:
      vcpu->wq for virt/kvm/kvm_main.c and
      vcpu->arch.local_int.wq for the s390 specific wait code.
      
      The wait handling in kvm_s390_handle_wait was broken by using different
      wait_queues for add_wait_queue and remove_wait_queue.
      
      There are two options to fix the problem:
      o  move all the s390 specific code to vcpu->wq and remove
         vcpu->arch.local_int.wq
      o  move all the s390 specific code to vcpu->arch.local_int.wq
      
      This patch chooses the 2nd variant for two reasons:
      o  s390 does not use kvm_vcpu_block but implements its own enabled wait
         handling.
         Having a separate wait_queue makes it clear that our wait mechanism
         is different
      o  the patch is much smaller
      Report-by: Julia Lawall <julia@diku.dk>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
  9. 28 Jun, 2009 1 commit
    • KVM: s390: Allow stfle instruction in the guest · ef50f7ac
      Christian Borntraeger committed
      2.6.31-rc introduced an architecture level set checker based on facility
      bits. For example, if the kernel is compiled to run only on z9, several
      facility bits are checked very early and the kernel refuses to boot if a
      z9 specific facility is missing.
      Until now kvm on s390 did not implement the store facility extended (STFLE)
      instruction. A 2.6.31-rc kernel that was compiled for z9 or higher did not
      boot in kvm. This patch implements stfle.
      
      This patch should go in before 2.6.31.
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
  10. 12 Jun, 2009 1 commit
  11. 10 Jun, 2009 7 commits
  12. 26 Mar, 2009 2 commits
    • [S390] split/move machine check handler code · f5daba1d
      Heiko Carstens committed
      Split machine check handler code and move it to cio and kernel code
      where it belongs. No functional change.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • [S390] Fix hypervisor detection for KVM · 92e6ecf3
      Christian Borntraeger committed
      Currently we use the cpuid (via STIDP instruction) to recognize LPAR,
      z/VM and KVM.
      The architecture states that bits 0-7 of the STIDP result are all zero,
      and that if STIDP is executed in a virtual machine, the VM operating
      system will replace bits 0-7 with FF.
      
      KVM should not use FE to distinguish z/VM from KVM for interested
      guests. The proper way to detect the hypervisor is the STSI (Store
      System Information) instruction, which returns information about the
      hypervisors via function code 3, selector1=2, selector2=2.
      
      This patch changes the detection routine of Linux to use STSI instead
      of STIDP. Because this detection runs earlier than bootmem, we have to
      use a static buffer. Since STSI expects a 4kb block (4kb aligned), this
      patch also changes the init.data alignment for s390. As this section
      will be freed during boot, this should be no problem.
      
      Patch is tested with LPAR, z/VM, KVM on LPAR, and KVM under z/VM.
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
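The STIDP-based scheme that this patch moves away from can be illustrated by extracting bits 0-7 of the 64-bit CPU ID. This is a minimal Python sketch assuming the s390 convention that bit 0 is the most significant bit; the function names and the sample CPU ID values in the usage below are hypothetical, and the 00/FF/FE meanings are taken from the commit message.

```python
def stidp_version_code(cpu_id: int) -> int:
    """Return bits 0-7 of a 64-bit CPU ID (bit 0 = most significant bit)."""
    return (cpu_id >> 56) & 0xFF


def classify_by_stidp(cpu_id: int) -> str:
    """Classify the environment the way the old detection is described."""
    code = stidp_version_code(cpu_id)
    if code == 0x00:
        return "native or LPAR"
    if code == 0xFF:
        return "virtual machine"
    if code == 0xFE:
        # Marker the message says KVM should not rely on.
        return "KVM (old FE marker)"
    return "unknown"
```

For example, `classify_by_stidp(0xFF00_0000_0000_0000)` falls into the "virtual machine" branch. The sketch cannot tell z/VM from KVM once both report FF, which matches the motivation given for switching to STSI.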
  13. 24 Mar, 2009 5 commits
    • KVM: s390: Fix SIGP set prefix ioctl · b7e6e4d3
      Christian Borntraeger committed
      This patch fixes the SET PREFIX interrupt if triggered by userspace.
      Until now, it was not necessary, but live migration will need it. In
      addition, it helped me create SMP support for my kvm_crashme tool
      (which lets kvm execute random guest memory content).
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: s390: Fix problem state check for b2 intercepts · 70455a36
      Christian Borntraeger committed
      The kernel handles some privileged instruction exits. While I was
      unable to trigger such an exit from guest userspace, the code should
      check for supervisor state before emulating a privileged instruction.
      
      I also renamed kvm_s390_handle_priv to kvm_s390_handle_b2. After all
      there are non-privileged b2 instructions like stck (store clock).
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: s390: Fix printk on SIGP set arch · 2c411b48
      Christian Borntraeger committed
      KVM on s390 does not support the ESA/390 architecture. We refuse to
      change the architecture mode and print a warning. This patch removes
      the printk for several reasons:
      
      o A malicious guest can flood host dmesg
      o The old message had no newline
      o There is no connection between the message and the failing guest
      
      This patch simply removes the printk. We already set the condition
      code to 3 - the guest knows that something went wrong.
      Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Add CONFIG_HAVE_KVM_IRQCHIP · 5d9b8e30
      Avi Kivity committed
      Two KVM archs support irqchips and two don't.  Add a Kconfig item to
      make selecting between the two models easier.
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: New guest debug interface · d0bfb940
      Jan Kiszka committed
      This rips out the support for KVM_DEBUG_GUEST and introduces a new IOCTL
      instead: KVM_SET_GUEST_DEBUG. The IOCTL payload consists of a generic
      part, controlling the "main switch" and the single-step feature. The
      arch specific part adds an x86 interface for intercepting both types of
      debug exceptions separately and re-injecting them when the host was not
      interested. Moreover, the foundation for guest debugging via debug
      registers is laid.
      
      To signal breakpoint events properly back to userland, an arch-specific
      data block is now returned along with KVM_EXIT_DEBUG. For x86, the arch block
      contains the PC, the debug exception, and relevant debug registers to
      tell debug events properly apart.
      
      The availability of this new interface is signaled by
      KVM_CAP_SET_GUEST_DEBUG. Empty stubs for not yet supported archs are
      provided.
      
      Note that both SVM and VTX are supported, but only the latter has been
      tested so far. Based on the experience with all those VTX corner cases,
      I would be fairly surprised if SVM worked out of the box.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
  14. 15 Feb, 2009 1 commit
  15. 09 Jan, 2009 1 commit
  16. 31 Dec, 2008 3 commits
  17. 23 Nov, 2008 1 commit
  18. 15 Oct, 2008 1 commit