1. 08 8月, 2017 1 次提交
    • L
      KVM: add spinlock optimization framework · 199b5763
      Longpeng(Mike) 提交于
      If a vcpu exits due to request a user mode spinlock, then
      the spinlock-holder may be preempted in user mode or kernel mode.
      (Note that not all architectures trap spin loops in user mode,
      only AMD x86 and ARM/ARM64 currently do).
      
      But if a vcpu exits in kernel mode, then the holder must be
      preempted in kernel mode, so we should choose a vcpu in kernel mode
      as a more likely candidate for the lock holder.
      
      This introduces kvm_arch_vcpu_in_kernel() to decide whether the
      vcpu is in kernel-mode when it's preempted.  kvm_vcpu_on_spin's
      new argument says the same of the spinning VCPU.
      Signed-off-by: NLongpeng(Mike) <longpeng2@huawei.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      199b5763
  2. 25 7月, 2017 1 次提交
    • C
      KVM: s390: take srcu lock when getting/setting storage keys · 4f899147
      Christian Borntraeger 提交于
      The following warning was triggered by missing srcu locks around
      the storage key handling functions.
      
      =============================
      WARNING: suspicious RCU usage
      4.12.0+ #56 Not tainted
      -----------------------------
      ./include/linux/kvm_host.h:572 suspicious rcu_dereference_check() usage!
      rcu_scheduler_active = 2, debug_locks = 1
       1 lock held by live_migration/4936:
        #0:  (&mm->mmap_sem){++++++}, at: [<0000000000141be0>]
      kvm_arch_vm_ioctl+0x6b8/0x22d0
      
       CPU: 8 PID: 4936 Comm: live_migration Not tainted 4.12.0+ #56
       Hardware name: IBM 2964 NC9 704 (LPAR)
       Call Trace:
       ([<000000000011378a>] show_stack+0xea/0xf0)
        [<000000000055cc4c>] dump_stack+0x94/0xd8
        [<000000000012ee70>] gfn_to_memslot+0x1a0/0x1b8
        [<0000000000130b76>] gfn_to_hva+0x2e/0x48
        [<0000000000141c3c>] kvm_arch_vm_ioctl+0x714/0x22d0
        [<000000000013306c>] kvm_vm_ioctl+0x11c/0x7b8
        [<000000000037e2c0>] do_vfs_ioctl+0xa8/0x6c8
        [<000000000037e984>] SyS_ioctl+0xa4/0xb8
        [<00000000008b20a4>] system_call+0xc4/0x27c
       1 lock held by live_migration/4936:
        #0:  (&mm->mmap_sem){++++++}, at: [<0000000000141be0>]
      kvm_arch_vm_ioctl+0x6b8/0x22d0
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: Pierre Morel<pmorel@linux.vnet.ibm.com>
      4f899147
  3. 28 6月, 2017 1 次提交
    • Q
      KVM: s390: Inject machine check into the guest · 4d62fcc0
      QingFeng Hao 提交于
      If the exit flag of SIE indicates that a machine check has happened
      during guest's running and needs to be injected, inject it to the guest
      accordingly.
      But some machine checks, e.g. Channel Report Pending (CRW), refer to
      host conditions only (the guest's channel devices are not managed by
      the kernel directly) and are therefore not injected into the guest.
      External Damage (ED) is also not reinjected into the guest because ETR
      conditions are gone in Linux and STP conditions are not enabled in the
      guest, and ED contains only these 8 ETR and STP conditions.
      In general, instruction-processing damage, system recovery,
      storage error, service-processor damage and channel subsystem damage
      will be reinjected into the guest, and the remain (System damage,
      timing-facility damage, warning, ED and CRW) will be handled on the host.
      Signed-off-by: NQingFeng Hao <haoqf@linux.vnet.ibm.com>
      Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      4d62fcc0
  4. 27 6月, 2017 1 次提交
    • Q
      KVM: s390: Backup the guest's machine check info · da72ca4d
      QingFeng Hao 提交于
      When a machine check happens in the guest, related mcck info (mcic,
      external damage code, ...) is stored in the vcpu's lowcore on the host.
      Then the machine check handler's low-level part is executed, followed
      by the high-level part.
      
      If the high-level part's execution is interrupted by a new machine check
      happening on the same vcpu on the host, the mcck info in the lowcore is
      overwritten with the new machine check's data.
      
      If the high-level part's execution is scheduled to a different cpu,
      the mcck info in the lowcore is uncertain.
      
      Therefore, for both cases, the further reinjection to the guest will use
      the wrong data.
      Let's backup the mcck info in the lowcore to the sie page
      for further reinjection, so that the right data will be used.
      
      Add new member into struct sie_page to store related machine check's
      info of mcic, failing storage address and external damage code.
      Signed-off-by: NQingFeng Hao <haoqf@linux.vnet.ibm.com>
      Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      da72ca4d
  5. 22 6月, 2017 2 次提交
  6. 04 6月, 2017 1 次提交
  7. 01 6月, 2017 1 次提交
  8. 09 5月, 2017 1 次提交
    • M
      treewide: use kv[mz]alloc* rather than opencoded variants · 752ade68
      Michal Hocko 提交于
      There are many code paths opencoding kvmalloc.  Let's use the helper
      instead.  The main difference to kvmalloc is that those users are
      usually not considering all the aspects of the memory allocator.  E.g.
      allocation requests <= 32kB (with 4kB pages) are basically never failing
      and invoke OOM killer to satisfy the allocation.  This sounds too
      disruptive for something that has a reasonable fallback - the vmalloc.
      On the other hand those requests might fallback to vmalloc even when the
      memory allocator would succeed after several more reclaim/compaction
      attempts previously.  There is no guarantee something like that happens
      though.
      
      This patch converts many of those places to kv[mz]alloc* helpers because
      they are more conservative.
      
      Link: http://lkml.kernel.org/r/20170306103327.2766-2-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> # Xen bits
      Acked-by: NKees Cook <keescook@chromium.org>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: Andreas Dilger <andreas.dilger@intel.com> # Lustre
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> # KVM/s390
      Acked-by: Dan Williams <dan.j.williams@intel.com> # nvdim
      Acked-by: David Sterba <dsterba@suse.com> # btrfs
      Acked-by: Ilya Dryomov <idryomov@gmail.com> # Ceph
      Acked-by: Tariq Toukan <tariqt@mellanox.com> # mlx4
      Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx5
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Anton Vorontsov <anton@enomsg.org>
      Cc: Colin Cross <ccross@android.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Santosh Raspatur <santosh@chelsio.com>
      Cc: Hariprasad S <hariprasad@chelsio.com>
      Cc: Yishai Hadas <yishaih@mellanox.com>
      Cc: Oleg Drokin <oleg.drokin@intel.com>
      Cc: "Yan, Zheng" <zyan@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      752ade68
  9. 27 4月, 2017 1 次提交
  10. 26 4月, 2017 2 次提交
  11. 25 4月, 2017 1 次提交
  12. 21 4月, 2017 1 次提交
  13. 07 4月, 2017 1 次提交
  14. 06 4月, 2017 1 次提交
    • F
      KVM: s390: introduce ais mode modify function · 51978393
      Fei Li 提交于
      Provide an interface for userspace to modify AIS
      (adapter-interruption-suppression) mode state, and add documentation
      for the interface. Allowed target modes are ALL-Interruptions mode
      and SINGLE-Interruption mode.
      
      We introduce the 'simm' and 'nimm' fields in kvm_s390_float_interrupt
      to store interruption modes for each ISC. Each bit in 'simm' and
      'nimm' targets to one ISC, and collaboratively indicate three modes:
      ALL-Interruptions, SINGLE-Interruption and NO-Interruptions. This
      interface can initiate most transitions between the states; transition
      from SINGLE-Interruption to NO-Interruptions via adapter interrupt
      injection will be introduced in a following patch. The meaningful
      combinations are as follows:
      
          interruption mode | simm bit | nimm bit
          ------------------|----------|----------
                   ALL      |    0     |     0
                 SINGLE     |    1     |     0
                   NO       |    1     |     1
      
      Besides, add tracepoint to track AIS mode transitions.
      Co-Authored-By: NYi Min Zhao <zyimin@linux.vnet.ibm.com>
      Signed-off-by: NYi Min Zhao <zyimin@linux.vnet.ibm.com>
      Signed-off-by: NFei Li <sherrylf@linux.vnet.ibm.com>
      Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      51978393
  15. 23 3月, 2017 1 次提交
  16. 16 3月, 2017 2 次提交
  17. 02 3月, 2017 1 次提交
  18. 17 2月, 2017 2 次提交
    • P
      KVM: race-free exit from KVM_RUN without POSIX signals · 460df4c1
      Paolo Bonzini 提交于
      The purpose of the KVM_SET_SIGNAL_MASK API is to let userspace "kick"
      a VCPU out of KVM_RUN through a POSIX signal.  A signal is attached
      to a dummy signal handler; by blocking the signal outside KVM_RUN and
      unblocking it inside, this possible race is closed:
      
                VCPU thread                     service thread
         --------------------------------------------------------------
              check flag
                                                set flag
                                                raise signal
              (signal handler does nothing)
              KVM_RUN
      
      However, one issue with KVM_SET_SIGNAL_MASK is that it has to take
      tsk->sighand->siglock on every KVM_RUN.  This lock is often on a
      remote NUMA node, because it is on the node of a thread's creator.
      Taking this lock can be very expensive if there are many userspace
      exits (as is the case for SMP Windows VMs without Hyper-V reference
      time counter).
      
      As an alternative, we can put the flag directly in kvm_run so that
      KVM can see it:
      
                VCPU thread                     service thread
         --------------------------------------------------------------
                                                raise signal
              signal handler
                set run->immediate_exit
              KVM_RUN
                check run->immediate_exit
      Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      460df4c1
    • P
      s390: Audit and remove any remaining unnecessary uses of module.h · d3217967
      Paul Gortmaker 提交于
      Historically a lot of these existed because we did not have
      a distinction between what was modular code and what was providing
      support to modules via EXPORT_SYMBOL and friends.  That changed
      when we forked out support for the latter into the export.h file.
      
      This means we should be able to reduce the usage of module.h
      in code that is obj-y Makefile or bool Kconfig.  The advantage
      in doing so is that module.h itself sources about 15 other headers;
      adding significantly to what we feed cpp, and it can obscure what
      headers we are effectively using.
      
      Since module.h was the source for init.h (for __init) and for
      export.h (for EXPORT_SYMBOL) we consider each change instance
      for the presence of either and replace as needed.  An instance
      where module_param was used without moduleparam.h was also fixed,
      as well as implicit use of ptrace.h and string.h headers.
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      d3217967
  19. 06 2月, 2017 1 次提交
  20. 30 1月, 2017 6 次提交
  21. 20 1月, 2017 1 次提交
    • C
      KVM: s390: do not expose random data via facility bitmap · 04478197
      Christian Borntraeger 提交于
      kvm_s390_get_machine() populates the facility bitmap by copying bytes
      from the host results that are stored in a 256 byte array in the prefix
      page. The KVM code does use the size of the target buffer (2k), thus
      copying and exposing unrelated kernel memory (mostly machine check
      related logout data).
      
      Let's use the size of the source buffer instead.  This is ok, as the
      target buffer will always be greater or equal than the source buffer as
      the KVM internal buffers (and thus S390_ARCH_FAC_LIST_SIZE_BYTE) cover
      the maximum possible size that is allowed by STFLE, which is 256
      doublewords. All structures are zero allocated so we can leave bytes
      256-2047 unchanged.
      
      Add a similar fix for kvm_arch_init_vm().
      Reported-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      [found with smatch]
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      CC: stable@vger.kernel.org
      Acked-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      04478197
  22. 23 11月, 2016 2 次提交
    • C
      KVM: s390: handle floating point registers in the run ioctl not in vcpu_put/load · e1788bb9
      Christian Borntraeger 提交于
      Right now we switch the host fprs/vrs in kvm_arch_vcpu_load and switch
      back in kvm_arch_vcpu_put. This process is already optimized
      since commit 9977e886 ("s390/kernel: lazy restore fpu registers")
      avoiding double save/restores on schedule. We still reload the pointers
      and test the guest fpc on each context switch, though.
      
      We can minimize the cost of vcpu_load/put by doing the test in the
      VCPU_RUN ioctl itself. As most VCPU threads almost never exit to
      userspace in the common fast path, this allows to avoid this overhead
      for the common case (eventfd driven I/O, all exits including sleep
      handled in the kernel) - making kvm_arch_vcpu_load/put basically
      disappear in perf top.
      
      Also adapt the fpu get/set ioctls.
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      e1788bb9
    • C
      KVM: s390: handle access registers in the run ioctl not in vcpu_put/load · 31d8b8d4
      Christian Borntraeger 提交于
      Right now we save the host access registers in kvm_arch_vcpu_load
      and load them in kvm_arch_vcpu_put. Vice versa for the guest access
      registers. On schedule this means, that we load/save access registers
      multiple times.
      
      e.g. VCPU_RUN with just one reschedule and then return does
      
      [from user space via VCPU_RUN]
      - save the host registers in kvm_arch_vcpu_load (via ioctl)
      - load the guest registers in kvm_arch_vcpu_load (via ioctl)
      - do guest stuff
      - decide to schedule/sleep
      - save the guest registers in kvm_arch_vcpu_put (via sched)
      - load the host registers in kvm_arch_vcpu_put (via sched)
      - save the host registers in switch_to (via sched)
      - schedule
      - return
      - load the host registers in switch_to (via sched)
      - save the host registers in kvm_arch_vcpu_load (via sched)
      - load the guest registers in kvm_arch_vcpu_load (via sched)
      - do guest stuff
      - decide to go to userspace
      - save the guest registers in kvm_arch_vcpu_put (via ioctl)
      - load the host registers in kvm_arch_vcpu_put (via ioctl)
      [back to user space]
      
      As the kernel does not use access registers, we can avoid
      this reloading and simply piggy back on switch_to (let it save
      the guest values instead of host values in thread.acrs) by
      moving the host/guest switch into the VCPU_RUN ioctl function.
      We now do
      
      [from user space via VCPU_RUN]
      - save the host registers in kvm_arch_vcpu_ioctl_run
      - load the guest registers in kvm_arch_vcpu_ioctl_run
      - do guest stuff
      - decide to schedule/sleep
      - save the guest registers in switch_to
      - schedule
      - return
      - load the guest registers in switch_to (via sched)
      - do guest stuff
      - decide to go to userspace
      - save the guest registers in kvm_arch_vcpu_ioctl_run
      - load the host registers in kvm_arch_vcpu_ioctl_run
      
      This seems to save about 10% of the vcpu_put/load functions
      according to perf.
      
      As vcpu_load no longer switches the acrs, We can also loading
      the acrs in kvm_arch_vcpu_ioctl_set_sregs.
      Suggested-by: NFan Zhang <zhangfan@linux.vnet.ibm.com>
      Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      31d8b8d4
  23. 16 9月, 2016 1 次提交
  24. 08 9月, 2016 3 次提交
  25. 29 8月, 2016 1 次提交
  26. 26 8月, 2016 1 次提交
  27. 25 8月, 2016 1 次提交
  28. 12 8月, 2016 1 次提交
    • J
      KVM: s390: reset KVM_REQ_MMU_RELOAD if mapping the prefix failed · aca411a4
      Julius Niedworok 提交于
      When triggering KVM_RUN without a user memory region being mapped
      (KVM_SET_USER_MEMORY_REGION) a validity intercept occurs. This could
      happen, if the user memory region was not mapped initially or if it
      was unmapped after the vcpu is initialized. The function
      kvm_s390_handle_requests checks for the KVM_REQ_MMU_RELOAD bit. The
      check function always clears this bit. If gmap_mprotect_notify
      returns an error code, the mapping failed, but the KVM_REQ_MMU_RELOAD
      was not set anymore. So the next time kvm_s390_handle_requests is
      called, the execution would fall trough the check for
      KVM_REQ_MMU_RELOAD. The bit needs to be resetted, if
      gmap_mprotect_notify returns an error code. Resetting the bit with
      kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu) fixes the bug.
      Reviewed-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: NJulius Niedworok <jniedwor@linux.vnet.ibm.com>
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      aca411a4