1. 10 Jun 2016, 4 commits
  2. 13 May 2016, 2 commits
    • KVM: s390: set halt polling to 80 microseconds · c4a8de35
      Christian Borntraeger committed
      On s390 we disabled halt polling with commit 920552b2
      ("KVM: disable halt_poll_ns as default for s390x"), as floating
      interrupts would let all CPUs have a successful poll, resulting
      in much higher CPU usage (on otherwise idle systems).
      
      With the improved qualification of wakeups during halt polling, we can
      now retry halt polling on s390. Performance measurements with different
      values (25, 50, 80, 100 and 200 microseconds) showed that 80 microseconds
      improves several cases without increasing the CPU cost too much. Higher
      values would improve performance even further, but also increase CPU time.
      So let's start small and use 80 microseconds on s390 until we have a
      better understanding of the cost/benefit of higher values.
      Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
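
      A minimal sketch of where such an architecture default lives, assuming
      (as in kernels of this era) that the arch header defines
      KVM_HALT_POLL_NS_DEFAULT and the generic code seeds the runtime-tunable
      halt_poll_ns module parameter from it; this is illustrative, not a
      verbatim copy of the patch:

      #include <linux/module.h>

      /* arch/s390/include/asm/kvm_host.h: the per-arch default poll window */
      #define KVM_HALT_POLL_NS_DEFAULT 80000          /* 80 microseconds */

      /* virt/kvm/kvm_main.c: generic knob, adjustable at runtime */
      unsigned int halt_poll_ns = KVM_HALT_POLL_NS_DEFAULT;
      module_param(halt_poll_ns, uint, 0644);
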
    • KVM: halt_polling: provide a way to qualify wakeups during poll · 3491caf2
      Christian Borntraeger committed
      Some wakeups should not be considered a successful poll. For example, on
      s390 I/O interrupts are usually floating, which means that _ALL_ CPUs
      would be considered runnable, letting all vCPUs poll all the time for
      transaction-like workloads even if one vCPU would be enough.
      This can result in huge CPU usage for large guests.
      This patch lets architectures provide a way to qualify wakeups as good
      or bad with regard to polling.
      
      For s390 the implementation will fence off halt polling for anything but
      known good, single-vCPU events. The s390 implementation for floating
      interrupts does a wakeup for one vCPU, but the interrupt will be delivered
      by whatever CPU checks first for a pending interrupt. We give preference
      to the woken-up CPU by marking its poll as a "good" poll.
      This code will also mark several other wakeup reasons like IPIs or
      expired timers as "good". This will of course also mark some events as
      not successful; as KVM on z always runs as a 2nd-level hypervisor, we
      prefer not to poll unless we are really sure.
      
      This patch successfully limits the CPU usage for cases like a uperf
      1-byte transactional ping-pong workload or wakeup-heavy workloads like
      OLTP, while still providing a proper speedup.
      
      This also introduces a new vcpu stat "halt_poll_no_tuning" that marks
      wakeups that are considered not good for polling.
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Radim Krčmář <rkrcmar@redhat.com> (for an earlier version)
      Cc: David Matlack <dmatlack@google.com>
      Cc: Wanpeng Li <kernellwp@gmail.com>
      [Rename config symbol. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
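
      A rough sketch of the qualification hook described above; since the
      config symbol was renamed on merge, the names valid_wakeup,
      CONFIG_HAVE_KVM_INVALID_WAKEUPS and kvm_s390_vcpu_wakeup_sketch() are
      assumptions used for illustration only:

      #include <linux/kvm_host.h>
      #include <linux/swait.h>

      /* Generic side: a wakeup only counts as a successful poll if the
       * architecture marked it as a "good" one for this vCPU. */
      static inline bool vcpu_valid_wakeup(struct kvm_vcpu *vcpu)
      {
      #ifdef CONFIG_HAVE_KVM_INVALID_WAKEUPS
              return vcpu->valid_wakeup;      /* set by arch code */
      #else
              return true;                    /* default: every wakeup qualifies */
      #endif
      }

      /* s390 side: only the vCPU explicitly woken for a known good,
       * single-vCPU event gets credited with a good poll. */
      static void kvm_s390_vcpu_wakeup_sketch(struct kvm_vcpu *vcpu)
      {
              vcpu->valid_wakeup = true;
              swake_up(&vcpu->wq);            /* swait API of this era */
      }
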
  3. 09 May 2016, 1 commit
  4. 08 Mar 2016, 3 commits
    • KVM: s390: allocate only one DMA page per VM · c54f0d6a
      David Hildenbrand committed
      We can fit the 2k for the STFLE interpretation and the crypto
      control block into one DMA page. As we now only have to allocate
      one DMA page, we can clean up the code a bit.
      
      As a nice side effect, this also fixes a problem with crycbd alignment in
      case special allocation debug options are enabled, debugged by Sascha
      Silbe.
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
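
      A structural sketch of the idea; the type and field names (sie_page2,
      fac_list, crycb, kvm->arch.sie_page2) are assumptions for illustration:

      #include <linux/gfp.h>
      #include <linux/kvm_host.h>

      /* 2k of STFLE facility list plus the crypto control block fit into a
       * single 4k page below 2GB, so one GFP_DMA allocation per VM suffices. */
      struct sie_page2 {
              __u64 fac_list[256];                    /* 2k facility list */
              struct kvm_s390_crypto_cb crycb;        /* crypto control block */
              u8 reserved[4096 - 2048 - sizeof(struct kvm_s390_crypto_cb)];
      } __packed;

      static int alloc_vm_dma_page(struct kvm *kvm)
      {
              kvm->arch.sie_page2 =
                      (struct sie_page2 *) get_zeroed_page(GFP_KERNEL | GFP_DMA);
              if (!kvm->arch.sie_page2)
                      return -ENOMEM;
              return 0;
      }
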
    • KVM: s390: protect VCPU cpu timer with a seqcount · 9c23a131
      David Hildenbrand committed
      For now, only the owning VCPU thread (that has loaded the VCPU) can get a
      consistent cpu timer value when calculating the delta. However, other
      threads might also be interested in a more recent, consistent value. Of
      special interest will be the timer callback of a VCPU that executes without
      having the VCPU loaded and could run in parallel with the VCPU thread.
      
      The cpu timer has a nice property: it is only updated by the owning VCPU
      thread. And speaking about accounting, a consistent value can only be
      calculated by looking at cputm_start and the cpu timer itself in
      one shot, otherwise the result might be wrong.
      
      As we only have one writing thread at a time (owning VCPU thread), we can
      use a seqcount instead of a seqlock and retry if the VCPU refreshed its
      cpu timer. This avoids any heavy locking and only introduces a counter
      update/check plus a handful of smp_wmb().
      
      The owning VCPU thread should never have to retry on reads, and also for
      other threads this might be a very rare scenario.
      
      Please note that we have to use the raw_* variants for locking the seqcount
      as lockdep will produce false warnings otherwise. The rq->lock held during
      vcpu_load/put is also acquired from hardirq context. Lockdep cannot know
      that we avoid potential deadlocks by disabling preemption and thereby
      disabling concurrent write locking attempts (via vcpu_put/load).
      Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
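
      The pattern boils down to a single-writer seqcount. A sketch, assuming
      the fields are called cputm_seqcount and cputm_start (illustrative
      names, not necessarily the merged ones):

      #include <linux/kvm_host.h>
      #include <linux/seqlock.h>
      #include <asm/timex.h>                  /* get_tod_clock_fast() */

      /* Writer: only the owning VCPU thread, with preemption disabled, so
       * the raw_write_seqcount_*() pair needs no writer-side lock. */
      static void start_cpu_timer_accounting_sketch(struct kvm_vcpu *vcpu)
      {
              raw_write_seqcount_begin(&vcpu->arch.cputm_seqcount);
              vcpu->arch.cputm_start = get_tod_clock_fast();
              raw_write_seqcount_end(&vcpu->arch.cputm_seqcount);
      }

      /* Reader: any thread; retries if the owning VCPU refreshed the timer
       * in between, so cputm_start and the (down-counting) CPU timer are
       * always seen in one shot. */
      static __u64 get_cpu_timer_sketch(struct kvm_vcpu *vcpu)
      {
              unsigned int seq;
              __u64 value;

              do {
                      seq = read_seqcount_begin(&vcpu->arch.cputm_seqcount);
                      value = vcpu->arch.sie_block->cputm;
                      if (vcpu->arch.cputm_start)     /* accounting active? */
                              value -= get_tod_clock_fast() -
                                       vcpu->arch.cputm_start;
              } while (read_seqcount_retry(&vcpu->arch.cputm_seqcount, seq));
              return value;
      }
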
    • KVM: s390: step VCPU cpu timer during kvm_run ioctl · db0758b2
      David Hildenbrand committed
      Architecturally we should only provide steal time if we are scheduled
      away, and not if the host interprets a guest exit. We have to step
      the guest CPU timer in these cases.
      
      In the first shot, we will step the VCPU timer only during the kvm_run
      ioctl. Therefore all time spent e.g. in interception handlers or on irq
      delivery will be accounted for that VCPU.
      
      We have to take care of a few special cases:
      - Other VCPUs can test for pending irqs. We can only report a consistent
        value for the VCPU thread itself when adding the delta.
      - We have to take care of STP sync, therefore we have to extend
        kvm_clock_sync() and disable preemption accordingly
      - During any call to disable/enable/start/stop we could get preempted
        and therefore get additional start/stop calls. We have to make sure we
        don't get into an inconsistent state.
      
      Whenever a VCPU is scheduled out, sleeping, in user space or just about
      to enter the SIE, the guest cpu timer isn't stepped.
      
      Please note that all primitives are prepared to be called from both
      environments (cpu timer accounting enabled or not), although not completely
      used in this patch yet (e.g. kvm_s390_set_cpu_timer() will never be called
      while cpu timer accounting is enabled).
      Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
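
      A sketch of how the accounting is toggled around KVM_RUN; the helper
      and flag names here are assumptions, and the start helper refers to the
      seqcount sketch for the previous commit:

      #include <linux/kvm_host.h>
      #include <linux/preempt.h>

      /* Toggling runs with preemption disabled so that a sched-out/in
       * (which also stops/starts accounting) cannot interleave halfway
       * through and leave the state inconsistent. */
      static void enable_cpu_timer_accounting_sketch(struct kvm_vcpu *vcpu)
      {
              preempt_disable();
              WARN_ON_ONCE(vcpu->arch.cputm_enabled);         /* assumed flag */
              vcpu->arch.cputm_enabled = true;
              start_cpu_timer_accounting_sketch(vcpu);        /* previous sketch */
              preempt_enable();
      }

      This would be called at the start of the KVM_RUN ioctl, with a matching
      disable right before returning to user space, so that time spent in
      interception handlers and on interrupt delivery is accounted to the
      VCPU as the commit describes.
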
  5. 25 Feb 2016, 1 commit
    • KVM: Use simple waitqueue for vcpu->wq · 8577370f
      Marcelo Tosatti committed
      The problem:
      
      On -rt, an emulated LAPIC timer instance has the following path:
      
      1) hard interrupt
      2) ksoftirqd is scheduled
      3) ksoftirqd wakes up vcpu thread
      4) vcpu thread is scheduled
      
      This extra context switch introduces unnecessary latency in the
      LAPIC path for a KVM guest.
      
      The solution:
      
      Allow waking up vcpu thread from hardirq context,
      thus avoiding the need for ksoftirqd to be scheduled.
      
      Normal waitqueues make use of spinlocks, which on -RT
      are sleepable locks. Therefore, waking up a waitqueue
      waiter involves locking a sleeping lock, which
      is not allowed from hard interrupt context.
      
      cyclictest command line:
      
      This patch reduces the average latency in my tests from 14us to 11us.
      
      Daniel writes:
      Paolo asked for numbers from kvm-unit-tests/tscdeadline_latency
      benchmark on mainline. The test was run 1000 times on
      tip/sched/core 4.4.0-rc8-01134-g0905f04e:
      
        ./x86-run x86/tscdeadline_latency.flat -cpu host
      
      with idle=poll.
      
      The test does not seem to deliver really stable numbers, though most of
      them are smaller. Paolo writes:
      
      "Anything above ~10000 cycles means that the host went to C1 or
      lower---the number means more or less nothing in that case.
      
      The mean shows an improvement indeed."
      
      Before:
      
                     min             max         mean           std
      count  1000.000000     1000.000000  1000.000000   1000.000000
      mean   5162.596000  2019270.084000  5824.491541  20681.645558
      std      75.431231   622607.723969    89.575700   6492.272062
      min    4466.000000    23928.000000  5537.926500    585.864966
      25%    5163.000000  1613252.750000  5790.132275  16683.745433
      50%    5175.000000  2281919.000000  5834.654000  23151.990026
      75%    5190.000000  2382865.750000  5861.412950  24148.206168
      max    5228.000000  4175158.000000  6254.827300  46481.048691
      
      After
                     min            max         mean           std
      count  1000.000000     1000.00000  1000.000000   1000.000000
      mean   5143.511000  2076886.10300  5813.312474  21207.357565
      std      77.668322   610413.09583    86.541500   6331.915127
      min    4427.000000    25103.00000  5529.756600    559.187707
      25%    5148.000000  1691272.75000  5784.889825  17473.518244
      50%    5160.000000  2308328.50000  5832.025000  23464.837068
      75%    5172.000000  2393037.75000  5853.177675  24223.969976
      max    5222.000000  3922458.00000  6186.720500  42520.379830
      
      [Patch was originally based on the swait implementation found in the -rt
       tree. Daniel ported it to mainline's version and gathered the
       benchmark numbers for tscdeadline_latency test.]
      Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: linux-rt-users@vger.kernel.org
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1455871601-27484-4-git-send-email-wagi@monom.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
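
      A minimal sketch of the simple-waitqueue pattern, using the swait API
      names of this era (swake_up() and friends were renamed in later
      kernels); the vcpu_like type is just a stand-in for illustration:

      #include <linux/swait.h>

      struct vcpu_like {
              struct swait_queue_head wq;     /* raw spinlock inside */
              bool runnable;
      };

      static void init_vcpu_like(struct vcpu_like *v)
      {
              init_swait_queue_head(&v->wq);
              v->runnable = false;
      }

      /* Wakeup side: no sleeping lock is taken, so this may run from hard
       * interrupt context even on PREEMPT_RT, avoiding the ksoftirqd hop. */
      static void kick_vcpu(struct vcpu_like *v)
      {
              v->runnable = true;
              if (swait_active(&v->wq))
                      swake_up(&v->wq);
      }

      /* Blocking side: the vCPU thread sleeps until it becomes runnable. */
      static void block_vcpu(struct vcpu_like *v)
      {
              swait_event_interruptible(v->wq, v->runnable);
      }
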
  6. 10 Feb 2016, 1 commit
  7. 26 Jan 2016, 1 commit
    • KVM: s390: fix memory overwrites when vx is disabled · 9abc2a08
      David Hildenbrand committed
      The kernel now always uses vector registers when available, however KVM
      has special logic if support is really enabled for a guest. If support
      is disabled, guest_fpregs.fregs will only contain memory for the fpu.
      The kernel, however, will store vector registers into that area,
      resulting in crazy memory overwrites.
      
      Simply extending that area is not enough, because the format of the
      registers also changes. We would have to do additional conversions, making
      the code even more complex. Therefore let's directly use one place for
      the vector/fpu registers + fpc (in kvm_run). We just have to convert the
      data properly when accessing it. This makes current code much easier.
      
      Please note that vector/fpu registers are now always stored to
      vcpu->run->s.regs.vrs. Although this data is visible to QEMU and
      used for migration, we only guarantee valid values to user space  when
      KVM_SYNC_VRS is set. As that is only the case when we have vector
      register support, we are on the safe side.
      
      Fixes: b5510d9b ("s390/fpu: always enable the vector facility if it is available")
      Cc: stable@vger.kernel.org # v4.4 d9a3a09a s390/kvm: remove dependency on struct save_area definition
      Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      [adopt to d9a3a09a]
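
      Roughly what the conversion amounts to (helper names here are made up):
      architecturally, FP register n is the leftmost doubleword of vector
      register n, so an freg_t overlays the start of a __vector128 and the
      fpu-only format can be folded into and out of vcpu->run->s.regs.vrs:

      #include <asm/fpu/types.h>      /* freg_t */
      #include <asm/types.h>          /* __vector128 */

      static void fprs_to_vrs_sketch(__vector128 *vrs, const freg_t *fprs)
      {
              int i;

              /* store the 16 FP registers into the high halves of VR0..VR15 */
              for (i = 0; i < 16; i++)
                      *(freg_t *)(vrs + i) = fprs[i];
      }

      static void vrs_to_fprs_sketch(freg_t *fprs, const __vector128 *vrs)
      {
              int i;

              /* extract the FP view again from the vector array */
              for (i = 0; i < 16; i++)
                      fprs[i] = *(const freg_t *)(vrs + i);
      }
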
  8. 09 Jan 2016, 1 commit
  9. 07 Jan 2016, 1 commit
  10. 16 Dec 2015, 1 commit
  11. 30 Nov 2015, 4 commits
  12. 23 Oct 2015, 1 commit
  13. 16 Oct 2015, 1 commit
  14. 25 Sep 2015, 1 commit
  15. 16 Sep 2015, 1 commit
    • KVM: add halt_attempted_poll to VCPU stats · 62bea5bf
      Paolo Bonzini committed
      This new statistic can help diagnose VCPUs that, for any reason,
      trigger bad behavior of halt_poll_ns autotuning.
      
      For example, say halt_poll_ns = 480000, and wakeups are spaced exactly
      like 479us, 481us, 479us, 481us. Then KVM always fails polling and wastes
      10+20+40+80+160+320+480 = 1110 microseconds out of every
      479+481+479+481+479+481+479 = 3359 microseconds. The VCPU then
      is consuming about 30% more CPU than it would use without
      polling.  This shows up as an abnormally high number of attempted
      polls compared to successful polls.
      
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: David Matlack <dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
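
      Simplified from the generic halt path to show how the two counters
      relate (the real kvm_vcpu_block() logic around this is abbreviated):

      #include <linux/kvm_host.h>
      #include <linux/ktime.h>

      static void vcpu_block_sketch(struct kvm_vcpu *vcpu)
      {
              if (vcpu->halt_poll_ns) {
                      ktime_t stop = ktime_add_ns(ktime_get(), vcpu->halt_poll_ns);

                      ++vcpu->stat.halt_attempted_poll;       /* every window */
                      do {
                              if (kvm_vcpu_check_block(vcpu) < 0) {
                                      /* wakeup arrived within the window */
                                      ++vcpu->stat.halt_successful_poll;
                                      return;
                              }
                              cpu_relax();
                      } while (ktime_before(ktime_get(), stop));
              }
              /* window expired: fall back to sleeping on the waitqueue; the
               * attempted/successful ratio exposes exactly this case */
      }
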
  16. 29 Jul 2015, 2 commits
  17. 22 Jul 2015, 1 commit
    • s390/kernel: lazy restore fpu registers · 9977e886
      Hendrik Brueckner committed
      Improve the save and restore behavior of FPU register contents to use the
      vector extension within the kernel.
      
      The kernel does not use floating-point or vector registers and, therefore,
      saving and restoring the FPU register contents are performed for handling
      signals or switching processes only.  To prepare for using vector
      instructions and vector registers within the kernel, enhance the save
      behavior and implement a lazy restore at return to user space from a
      system call or interrupt.
      
      To implement the lazy restore, the save_fpu_regs() sets a CPU information
      flag, CIF_FPU, to indicate that the FPU registers must be restored.
      Saving and setting CIF_FPU is performed in an atomic fashion to be
      interrupt-safe.  When the kernel wants to use the vector extension or
      wants to change the FPU register state for a task during signal handling,
      the save_fpu_regs() must be called first.  The CIF_FPU flag is also set at
      process switch.  At return to user space, the FPU state is restored.  In
      particular, the FPU state includes the floating-point or vector register
      contents, as well as the vector-enablement and floating-point control.  The
      FPU state restore and clearing CIF_FPU is also performed in an atomic
      fashion.
      
      For KVM, the restore of the FPU register state is performed when restoring
      the general-purpose guest registers before the SIE instruction is started.
      Because the path towards the SIE instruction is interruptible, the CIF_FPU
      flag must be checked again right before going into SIE.  If set, the guest
      registers must be reloaded again by re-entering the outer SIE loop.  This
      is the same behavior as if the SIE critical section is interrupted.
      Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
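
      Two small sketches of what the lazy scheme means for callers; the
      thread_struct field layout follows the s390 code of that era, the
      include locations vary by kernel version, and the sketch function
      names are made up:

      #include <linux/sched.h>
      #include <asm/fpu/api.h>        /* save_fpu_regs() */
      #include <asm/processor.h>      /* CIF_FPU, test_cpu_flag() */

      /* Changing FPU/vector state of the current task: save first (this also
       * sets CIF_FPU), then edit the memory image; the registers are reloaded
       * lazily before user space (or the guest) runs again. */
      static void set_fpc_sketch(u32 fpc)
      {
              save_fpu_regs();
              current->thread.fpu.fpc = fpc;
      }

      /* Right before entering SIE: the path there is interruptible, so if
       * CIF_FPU was set again the guest registers must be reloaded by
       * re-entering the outer SIE loop. */
      static bool must_reload_guest_regs_sketch(void)
      {
              return test_cpu_flag(CIF_FPU);
      }
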
  18. 26 May 2015, 1 commit
  19. 08 May 2015, 2 commits
  20. 01 Apr 2015, 1 commit
    • KVM: s390: deliver floating interrupts in order of priority · 6d3da241
      Jens Freimann committed
      This patch makes interrupt handling compliant to the z/Architecture
      Principles of Operation with regard to interrupt priorities.
      
      Add a bitmap for pending floating interrupts. Each bit relates to an
      interrupt type and its list. A set bit indicates that the corresponding
      list contains items (interrupts) which need to be delivered.  When delivering
      interrupts on a cpu we can merge the existing bitmap for cpu-local
      interrupts and floating interrupts and have a single mechanism for
      delivery.
      Currently we have one list for all kinds of floating interrupts and a
      corresponding spin lock. This patch adds a separate list per
      interrupt type. An exception to this are service signal and machine check
      interrupts, as there can be only one pending interrupt at a time.
      Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
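
      A structural sketch of the bitmap-plus-lists arrangement; the enum
      values and field names are illustrative only, and the bit positions are
      chosen so that ascending bit order matches architectural priority:

      #include <linux/bitops.h>
      #include <linux/list.h>
      #include <linux/spinlock.h>

      enum irq_type_sketch {
              IRQ_PEND_MCHK_SKETCH,           /* highest priority */
              IRQ_PEND_EXT_CLOCK_SKETCH,
              IRQ_PEND_IO_SKETCH,             /* lowest priority */
              IRQ_PEND_COUNT_SKETCH
      };

      struct float_int_sketch {
              spinlock_t lock;
              unsigned long pending;                          /* one bit per type */
              struct list_head lists[IRQ_PEND_COUNT_SKETCH];  /* queued items */
      };

      static void deliver_one_sketch(struct float_int_sketch *fi, unsigned int type)
      {
              /* dequeue the head of fi->lists[type] and inject it; clear the
               * bit in fi->pending once that list becomes empty */
      }

      /* Merging the floating bitmap with the cpu-local one gives a single
       * delivery loop that walks the set bits in priority order. */
      static void deliver_pending_sketch(struct float_int_sketch *fi,
                                         unsigned long cpu_local_pending)
      {
              unsigned long pending = fi->pending | cpu_local_pending;
              unsigned int type;

              for_each_set_bit(type, &pending, IRQ_PEND_COUNT_SKETCH)
                      deliver_one_sketch(fi, type);
      }
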
  21. 17 Mar 2015, 2 commits
  22. 06 Mar 2015, 4 commits
  23. 04 Mar 2015, 1 commit
    • KVM: s390: include guest facilities in kvm facility test · 981467c9
      Michael Mueller committed
      Most facility related decisions in KVM have to take into account:
      
      - the facilities offered by the underlying run container (LPAR/VM)
      - the facilities supported by the KVM code itself
      - the facilities requested by a guest VM
      
      This patch adds the KVM driver requested facilities to the test routine.
      
      It additionally renames struct s390_model_fac to kvm_s390_fac and its field
      names to be more meaningful.
      
      The semantics of the facilities stored in the KVM architecture structure
      is changed. The address arch.model.fac->list now points to the guest
      facility list and arch.model.fac->mask points to the KVM facility mask.
      
      This patch fixes the behaviour of KVM for some facilities for guests
      that ignore the guest-visible facility bits, e.g. such guests could use
      transactional-memory instructions on hosts supporting them even if the
      chosen cpu model would not offer them.
      
      The userspace interface is not affected by this change.
      Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
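
      A sketch of the combined check, close to (but not necessarily identical
      with) the merged helper; __test_facility() is the existing s390 facility
      bit test:

      #include <linux/kvm_host.h>
      #include <asm/facility.h>

      /* A facility is only usable by the guest if KVM/host support it (mask)
       * AND the configured cpu model offers it (guest facility list). */
      static inline int test_kvm_facility_sketch(struct kvm *kvm, unsigned long nr)
      {
              return __test_facility(nr, kvm->arch.model.fac->mask) &&
                     __test_facility(nr, kvm->arch.model.fac->list);
      }
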
  24. 09 Feb 2015, 2 commits