1. 21 Apr, 2015 (21 commits)
    • KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8 · 66feed61
      Paul Mackerras committed
      This uses msgsnd where possible for signalling other threads within
      the same core on POWER8 systems, rather than IPIs through the XICS
      interrupt controller.  This includes waking secondary threads to run
      the guest, the interrupts generated by the virtual XICS, and the
      interrupts to bring the other threads out of the guest when exiting.
      
      Aggregated statistics from debugfs across vcpus for a guest with 32
      vcpus, 8 threads/vcore, running on a POWER8, show this before the
      change:
      
       rm_entry:     3387.6ns (228 - 86600, 1008969 samples)
       rm_exit:      4561.5ns (12 - 3477452, 1009402 samples)
       rm_intr:      1660.0ns (12 - 553050, 3600051 samples)
      
      and this after the change:
      
       rm_entry:     3060.1ns (212 - 65138, 953873 samples)
       rm_exit:      4244.1ns (12 - 9693408, 954331 samples)
       rm_intr:      1342.3ns (12 - 1104718, 3405326 samples)
      
      for a test of booting Fedora 20 big-endian to the login prompt.
      
      The time taken for a H_PROD hcall (which is handled in the host
      kernel) went down from about 35 microseconds to about 16 microseconds
      with this change.
      
      The noinline added to kvmppc_run_core turned out to be necessary for
      good performance, at least with gcc 4.9.2 as packaged with Fedora 21
      and a little-endian POWER8 host.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
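      A minimal sketch of the core decision, in kernel-flavoured C (an
      illustration, not the patch's verbatim code: the helper name is
      invented and the doorbell message encoding is simplified; msgsnd
      can only reach threads on the sender's own core, hence the check):

          /* Illustrative: signal another hardware thread on POWER8,
           * assuming 8 threads per core. */
          static void sketch_send_ipi(int this_cpu, int target_cpu)
          {
                  if (this_cpu / 8 == target_cpu / 8) {
                          /* Same core: doorbell message of type "server"
                           * carrying the target's thread number. */
                          unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER) |
                                              (target_cpu & 7);
                          asm volatile("msgsnd %0" : : "r" (msg));
                  } else {
                          xics_wake_cpu(target_cpu);  /* IPI via the XICS */
                  }
          }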
    • KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C · eddb60fb
      Paul Mackerras committed
      This replaces the assembler code for kvmhv_commence_exit() with C code
      in book3s_hv_builtin.c.  It also moves the IPI sending code that was
      in book3s_hv_rm_xics.c into a new kvmhv_rm_send_ipi() function so it
      can be used by kvmhv_commence_exit() as well as icp_rm_set_vcpu_irq().
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Streamline guest entry and exit · 6af27c84
      Paul Mackerras committed
      On entry to the guest, secondary threads now wait for the primary to
      switch the MMU after loading up most of their state, rather than before.
      This means that the secondary threads get into the guest sooner, in the
      common case where the secondary threads get to kvmppc_hv_entry before
      the primary thread.
      
      On exit, the first thread out increments the exit count and interrupts
      the other threads (to get them out of the guest) before saving most
      of its state, rather than after.  That means that the other threads
      exit sooner and means that the first thread doesn't spend so much
      time waiting for the other threads at the point where the MMU gets
      switched back to the host.
      
      This pulls out the code that increments the exit count and interrupts
      other threads into a separate function, kvmhv_commence_exit().
      This also makes sure that r12 and vcpu->arch.trap are set correctly
      in some corner cases.
      
      Statistics from /sys/kernel/debug/kvm/vm*/vcpu*/timings show the
      improvement.  Aggregating across vcpus for a guest with 32 vcpus,
      8 threads/vcore, running on a POWER8, gives this before the change:
      
       rm_entry:     avg 4537.3ns (222 - 48444, 1068878 samples)
       rm_exit:      avg 4787.6ns (152 - 165490, 1010717 samples)
       rm_intr:      avg 1673.6ns (12 - 341304, 3818691 samples)
      
      and this after the change:
      
       rm_entry:     avg 3427.7ns (232 - 68150, 1118921 samples)
       rm_exit:      avg 4716.0ns (12 - 150720, 1119477 samples)
       rm_intr:      avg 1614.8ns (12 - 522436, 3850432 samples)
      
      showing a substantial reduction in the time spent per guest entry in
      the real-mode guest entry code, and smaller reductions in the real
      mode guest exit and interrupt handling times.  (The test was to start
      the guest and boot Fedora 20 big-endian to the login prompt.)
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Use bitmap of active threads rather than count · 7d6c40da
      Paul Mackerras committed
      Currently, the entry_exit_count field in the kvmppc_vcore struct
      contains two 8-bit counts, one of the threads that have started entering
      the guest, and one of the threads that have started exiting the guest.
      This changes it to an entry_exit_map field which contains two bitmaps
      of 8 bits each.  The advantage of doing this is that it gives us a
      bitmap of which threads need to be signalled when exiting the guest.
      That means that we no longer need to use the trick of setting the
      HDEC to 0 to pull the other threads out of the guest, which led in
      some cases to a spurious HDEC interrupt on the next guest entry.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
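      The payoff is easy to see in a standalone sketch (the field layout
      below is assumed for illustration, not lifted from the patch):

          #include <stdint.h>

          /* Two packed 8-bit maps: low byte = threads that have started
           * entering the guest, high byte = threads that have started
           * exiting it. */
          static inline void note_entry(uint16_t *map, int thread)
          {
                  *map |= 1u << thread;
          }

          static inline void note_exit(uint16_t *map, int thread)
          {
                  *map |= 1u << (8 + thread);
          }

          /* Threads still in the guest - exactly the set to signal on
           * exit, with no HDEC-to-0 trick needed. */
          static inline uint8_t threads_to_signal(uint16_t map)
          {
                  uint8_t entered = map & 0xff;
                  uint8_t exiting = map >> 8;
                  return entered & ~exiting;
          }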
    • KVM: PPC: Book3S HV: Use decrementer to wake napping threads · fd6d53b1
      Paul Mackerras committed
      This arranges for threads that are napping due to their vcpu having
      ceded or due to not having a vcpu to wake up at the end of the guest's
      timeslice without having to be poked with an IPI.  We do that by
      arranging for the decrementer to contain a value no greater than the
      number of timebase ticks remaining until the end of the timeslice.
      In the case of a thread with no vcpu, this number is in the hypervisor
      decrementer already.  In the case of a ceded vcpu, we use the smaller
      of the HDEC value and the DEC value.
      
      Using the DEC like this when ceded means we need to save and restore
      the guest decrementer value around the nap.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
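      The wakeup value is just a clamped minimum; a hedged sketch with
      invented names:

          /* Illustrative: ticks to program so a napping thread wakes no
           * later than the end of the vcore's timeslice. */
          static unsigned long nap_wakeup_ticks(unsigned long hdec_ticks,
                                                unsigned long dec_ticks,
                                                int has_ceded_vcpu)
          {
                  if (!has_ceded_vcpu)
                          return hdec_ticks;  /* HDEC already bounds the nap */
                  return dec_ticks < hdec_ticks ? dec_ticks : hdec_ticks;
          }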
    • KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI · ccc07772
      Paul Mackerras committed
      When running a multi-threaded guest and vcpu 0 in a virtual core
      is not running in the guest (i.e. it is busy elsewhere in the host),
      thread 0 of the physical core will switch the MMU to the guest and
      then go to nap mode in the code at kvm_do_nap.  If the guest sends
      an IPI to thread 0 using the msgsndp instruction, that will wake
      up thread 0 and cause all the threads in the guest to exit to the
      host unnecessarily.  To avoid the unnecessary exit, this arranges
      for the PECEDP bit to be cleared in this situation.  When napping
      due to a H_CEDE from the guest, we still set PECEDP so that the
      thread will wake up on an IPI sent using msgsndp.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken · 5d5b99cd
      Paul Mackerras committed
      We can tell when a secondary thread has finished running a guest by
      the fact that it clears its kvm_hstate.kvm_vcpu pointer, so there
      is no real need for the nap_count field in the kvmppc_vcore struct.
      This changes kvmppc_wait_for_nap to poll the kvm_hstate.kvm_vcpu
      pointers of the secondary threads rather than polling vc->nap_count.
      Besides reducing the size of the kvmppc_vcore struct by 8 bytes,
      this also means that we can tell which secondary threads have got
      stuck and thus print a more informative error message.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
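      A sketch of the new wait loop (struct and field names assumed; the
      point is that a non-NULL kvm_vcpu pointer after the timeout names
      the stuck thread):

          static void sketch_wait_for_nap(struct thread_state *thr[], int n)
          {
                  long tries = 1000000;
                  int i, busy;

                  do {
                          busy = 0;
                          for (i = 1; i < n; ++i)
                                  if (thr[i]->kvm_vcpu)  /* still in a guest */
                                          busy = 1;
                  } while (busy && --tries);

                  for (i = 1; i < n; ++i)
                          if (thr[i]->kvm_vcpu)
                                  pr_err("KVM: thread %d is stuck\n", i);
          }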
    • KVM: PPC: Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu · 25fedfca
      Paul Mackerras committed
      Rather than calling cond_resched() in kvmppc_run_core() before doing
      the post-processing for the vcpus that we have just run (that is,
      calling kvmppc_handle_exit_hv(), kvmppc_set_timer(), etc.), we now do
      that post-processing before calling cond_resched(), and that post-
      processing is moved out into its own function, post_guest_process().
      
      The reschedule point is now in kvmppc_run_vcpu() and we define a new
      vcore state, VCORE_PREEMPT, to indicate that the vcore's runner
      task is runnable but not running.  (Doing the reschedule with the
      vcore in VCORE_INACTIVE state would be bad because there are potentially
      other vcpus waiting for the runner in kvmppc_wait_for_exec() which
      then wouldn't get woken up.)
      
      Also, we make use of the handy cond_resched_lock() function, which
      unlocks and relocks vc->lock for us around the reschedule.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
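      The heart of the change is the cond_resched_lock() idiom; a sketch
      of the pattern (vc->lock held on entry, surrounding code omitted):

          vc->vcore_state = VCORE_PREEMPT;  /* runner runnable, not running */
          cond_resched_lock(&vc->lock);     /* may unlock, schedule(), relock */
          vc->vcore_state = VCORE_INACTIVE;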
    • KVM: PPC: Book3S HV: Minor cleanups · 1f09c3ed
      Paul Mackerras committed
      * Remove unused kvmppc_vcore::n_busy field.
      * Remove setting of RMOR, since it was only used on PPC970 and the
        PPC970 KVM support has been removed.
      * Don't use r1 or r2 in setting the runlatch since they are
        conventionally reserved for other things; use r0 instead.
      * Streamline the code a little and remove the ext_interrupt_to_host
        label.
      * Add some comments about register usage.
      * hcall_try_real_mode doesn't need to be global, and can't be
        called from C code anyway.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Simplify handling of VCPUs that need a VPA update · d911f0be
      Paul Mackerras committed
      Previously, if kvmppc_run_core() was running a VCPU that needed a VPA
      update (i.e. one of its 3 virtual processor areas needed to be pinned
      in memory so the host real mode code can update it on guest entry and
      exit), we would drop the vcore lock and do the update there and then.
      Future changes will make it inconvenient to drop the lock, so instead
      we now remove it from the list of runnable VCPUs and wake up its
      VCPU task.  This will have the effect that the VCPU task will exit
      kvmppc_run_vcpu(), go around the do loop in kvmppc_vcpu_run_hv(), and
      re-enter kvmppc_run_vcpu(), whereupon it will do the necessary call
      to kvmppc_update_vpas() and then rejoin the vcore.
      
      The one complication is that the runner VCPU (whose VCPU task is the
      current task) might be one of the ones that gets removed from the
      runnable list.  In that case we just return from kvmppc_run_core()
      and let the code in kvmppc_run_vcpu() wake up another VCPU task to be
      the runner if necessary.
      
      This all means that the VCORE_STARTING state is no longer used, so we
      remove it.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Accumulate timing information for real-mode code · b6c295df
      Paul Mackerras committed
      This reads the timebase at various points in the real-mode guest
      entry/exit code and uses that to accumulate total, minimum and
      maximum time spent in those parts of the code.  Currently these
      times are accumulated per vcpu in 5 parts of the code:
      
      * rm_entry - time taken from the start of kvmppc_hv_entry() until
        just before entering the guest.
      * rm_intr - time from when we take a hypervisor interrupt in the
        guest until we either re-enter the guest or decide to exit to the
        host.  This includes time spent handling hcalls in real mode.
      * rm_exit - time from when we decide to exit the guest until the
        return from kvmppc_hv_entry().
      * guest - time spent in the guest.
      * cede - time spent napping in real mode due to an H_CEDE hcall
        while other threads in the same vcore are active.
      
      These times are exposed in debugfs in a directory per vcpu that
      contains a file called "timings".  This file contains one line for
      each of the 5 timings above, with the name followed by a colon and
      4 numbers, which are the count (number of times the code has been
      executed), the total time, the minimum time, and the maximum time,
      all in nanoseconds.
      
      The overhead of the extra code amounts to about 30ns for an hcall that
      is handled in real mode (e.g. H_SET_DABR), which is about 25%.  Since
      production environments may not wish to incur this overhead, the new
      code is conditional on a new config symbol,
      CONFIG_KVM_BOOK3S_HV_EXIT_TIMING.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
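      Each "timings" line reduces to a four-field accumulator; a
      standalone sketch of the bookkeeping (field names assumed):

          #include <stdint.h>

          struct timing_acc {
                  uint64_t count;      /* number of timed executions */
                  uint64_t total_ns;
                  uint64_t min_ns;
                  uint64_t max_ns;
          };

          static void accumulate(struct timing_acc *a, uint64_t delta_ns)
          {
                  a->total_ns += delta_ns;
                  if (a->count == 0 || delta_ns < a->min_ns)
                          a->min_ns = delta_ns;
                  if (delta_ns > a->max_ns)
                          a->max_ns = delta_ns;
                  a->count++;
          }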
    • KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT · e23a808b
      Paul Mackerras committed
      This creates a debugfs directory for each HV guest (assuming debugfs
      is enabled in the kernel config), and within that directory, a file
      by which the contents of the guest's HPT (hashed page table) can be
      read.  The directory is named vmnnnn, where nnnn is the PID of the
      process that created the guest.  The file is named "htab".  This is
      intended to help in debugging problems in the host's management
      of guest memory.
      
      The contents of the file consist of a series of lines like this:
      
        3f48 4000d032bf003505 0000000bd7ff1196 00000003b5c71196
      
      The first field is the index of the entry in the HPT; the second and
      third are the two doublewords of the HPT entry, so the third field
      contains the real page number mapped by the entry if the entry's
      valid bit is set.  The fourth field is the guest's view of the
      second doubleword of the entry, so it contains the guest physical
      address.  (The format of the second through fourth fields is
      described in the Power ISA and also in
      arch/powerpc/include/asm/mmu-hash64.h.)
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
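      A hedged userspace sketch of consuming one "htab" line (field
      meanings as described above; HPTE_V_VALID is the low bit of the
      first doubleword per mmu-hash64.h):

          #include <stdio.h>
          #include <inttypes.h>

          static int parse_htab_line(const char *line)
          {
                  unsigned long idx;
                  uint64_t v, r, guest_r;  /* dword 0, dword 1, guest dword 1 */

                  if (sscanf(line, "%lx %" SCNx64 " %" SCNx64 " %" SCNx64,
                             &idx, &v, &r, &guest_r) != 4)
                          return -1;
                  printf("index %#lx valid=%d real-dw1 %#" PRIx64
                         " guest-dw1 %#" PRIx64 "\n",
                         idx, (int)(v & 1), r, guest_r);
                  return 0;
          }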
    • KVM: PPC: Book3S HV: Add ICP real mode counters · 6e0365b7
      Suresh Warrier committed
      Add two counters to count how often we generate real-mode ICS resend
      and reject events. The counters provide some performance statistics
      that could be used in the future to consider if the real mode functions
      need further optimizing.  The counters are displayed as part of the
      ICS and ICP state provided by /sys/kernel/debug/powerpc/kvm* for each VM.
      
      Also added two counters that count (approximately) how many times we
      don't find an ICP or ICS we're looking for. These are not currently
      exposed through sysfs, but can be useful when debugging crashes.
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Move virtual mode ICP functions to real-mode · b0221556
      Suresh Warrier committed
      Interrupt-based hypercalls return H_TOO_HARD to inform KVM that it needs
      to switch to the host to complete the rest of the hypercall function in
      virtual mode. This patch ports the virtual mode ICS/ICP reject and resend
      functions to be runnable in hypervisor real mode, thus avoiding the need
      to switch to the host to execute these functions in virtual mode. However,
      the hypercalls continue to return H_TOO_HARD for vcpu_wakeup and notify
      events - these events cannot be done in real mode and they will still need
      a switch to host virtual mode.
      
      There are sufficient differences between the real mode code and the
      virtual mode code for the ICS/ICP resend and reject functions that
      for now the code has been duplicated instead of sharing common code.
      In the future, we can look at creating common functions.
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lock · 34cb7954
      Suresh Warrier committed
      Replaces the ICS mutex lock with a spin lock since we will be porting
      these routines to real mode. Note that we need to disable interrupts
      before we take the lock in anticipation of the fact that on the guest
      side, we are running in the context of a hard irq and interrupts are
      disabled (EE bit off) when the lock is acquired. Again, because we
      will be acquiring the lock in hypervisor real mode, we need to use
      an arch_spinlock_t instead of a normal spinlock here as we want to
      avoid running any lockdep code (which may not be safe to execute in
      real mode).
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
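      The resulting locking pattern, sketched (structure layout assumed):

          unsigned long flags;

          local_irq_save(flags);       /* mirror the guest side: EE off */
          arch_spin_lock(&ics->lock);  /* raw lock: no lockdep/debug hooks,
                                        * which may be unsafe in real mode */
          /* ... update ICS state ... */
          arch_spin_unlock(&ics->lock);
          local_irq_restore(flags);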
    • KVM: PPC: Book3S HV: Add guest->host real mode completion counters · 878610fe
      Suresh E. Warrier committed
      Add counters to track number of times we switch from guest real mode
      to host virtual mode during an interrupt-related hypercall because the
      hypercall requires actions that cannot be completed in real mode. This
      will help when making optimizations that reduce guest-host transitions.
      
      It is safe to use an ordinary increment rather than an atomic operation
      because there is one ICP per virtual CPU and kvmppc_xics_rm_complete()
      only works on the ICP for the current VCPU.
      
      The counters are displayed as part of the ICS and ICP state provided by
      /sys/kernel/debug/powerpc/kvm* for each VM.
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Add helpers for lock/unlock hpte · a4bd6eb0
      Aneesh Kumar K.V committed
      This adds helper routines for locking and unlocking HPTEs, and uses
      them in the rest of the code.  We don't change any locking rules in
      this patch.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
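      A hedged sketch of what such a helper pair can look like (the real
      helpers use larx/stcx. loops; HPTE_V_HVLOCK is KVM's software lock
      bit in the HPTE's first doubleword, and the value below is quoted
      from memory):

          #include <stdint.h>

          #define HPTE_V_HVLOCK 0x40UL   /* assumed; see kvm_book3s_64.h */

          static inline int sketch_try_lock_hpte(uint64_t *hptep)
          {
                  uint64_t old = *hptep;

                  if (old & HPTE_V_HVLOCK)
                          return 0;      /* already held */
                  return __sync_bool_compare_and_swap(hptep, old,
                                                      old | HPTE_V_HVLOCK);
          }

          static inline void sketch_unlock_hpte(uint64_t *hptep, uint64_t v)
          {
                  __sync_synchronize();  /* release barrier */
                  *hptep = v;            /* v must not contain HPTE_V_HVLOCK */
          }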
    • KVM: PPC: Book3S HV: Remove RMA-related variables from code · 31037eca
      Aneesh Kumar K.V committed
      We don't support real-mode areas (RMAs) now that PPC970 support is
      removed.  Remove the remaining RMA-related details from the code.
      Also rename rma_setup_done to hpte_setup_done to better reflect the
      change.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Add fast real-mode H_RANDOM implementation. · e928e9cb
      Michael Ellerman committed
      Some PowerNV systems include a hardware random-number generator.
      This HWRNG is present on POWER7+ and POWER8 chips and is capable of
      generating one 64-bit random number every microsecond.  The random
      numbers are produced by sampling a set of 64 unstable high-frequency
      oscillators and are almost completely entropic.
      
      PAPR defines an H_RANDOM hypercall which guests can use to obtain one
      64-bit random sample from the HWRNG.  This adds a real-mode
      implementation of the H_RANDOM hypercall.  This hypercall was
      implemented in real mode because the latency of reading the HWRNG is
      generally small compared to the latency of a guest exit and entry for
      all the threads in the same virtual core.
      
      Userspace can detect the presence of the HWRNG and the H_RANDOM
      implementation by querying the KVM_CAP_PPC_HWRNG capability.  The
      H_RANDOM hypercall implementation will only be invoked when the guest
      does an H_RANDOM hypercall if userspace first enables the in-kernel
      H_RANDOM implementation using the KVM_CAP_PPC_ENABLE_HCALL capability.
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
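      From userspace the two capabilities combine roughly as below (a
      sketch with error handling trimmed; H_RANDOM's number comes from
      the PAPR hcall list in hvcall.h, quoted here from memory):

          #include <sys/ioctl.h>
          #include <linux/kvm.h>

          #define H_RANDOM 0x300   /* PAPR hcall number (assumed) */

          static int enable_hrandom(int kvm_fd, int vm_fd)
          {
                  struct kvm_enable_cap cap = {
                          .cap  = KVM_CAP_PPC_ENABLE_HCALL,
                          .args = { H_RANDOM, 1 },  /* hcall, enable */
                  };

                  if (ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_HWRNG) <= 0)
                          return -1;   /* no HWRNG / no real-mode handler */
                  return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
          }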
    • kvmppc: Implement H_LOGICAL_CI_{LOAD,STORE} in KVM · 99342cf8
      David Gibson committed
      On POWER, storage caching is usually configured via the MMU - attributes
      such as cache-inhibited are stored in the TLB and the hashed page table.
      
      This makes correctly performing cache inhibited IO accesses awkward when
      the MMU is turned off (real mode).  Some CPU models provide special
      registers to control the cache attributes of real mode load and stores but
      this is not at all consistent.  This is a problem in particular for SLOF,
      the firmware used on KVM guests, which runs entirely in real mode, but
      which needs to do IO to load the kernel.
      
      To simplify this, qemu implements two special hypercalls, H_LOGICAL_CI_LOAD
      and H_LOGICAL_CI_STORE which simulate a cache-inhibited load or store to
      a logical address (aka guest physical address).  SLOF uses these for IO.
      
      However, because these are implemented within qemu, not the host kernel,
      these bypass any IO devices emulated within KVM itself.  The simplest way
      to see this problem is to attempt to boot a KVM guest from a virtio-blk
      device with iothread / dataplane enabled.  The iothread code relies on an
      in-kernel implementation of the virtio queue notification, which is not
      triggered by the IO hcalls, and so the guest will stall in SLOF unable to
      load the guest OS.
      
      This patch addresses this by providing in-kernel implementations of the
      2 hypercalls, which correctly scan the KVM IO bus.  Any access to an
      address not handled by the KVM IO bus will cause a VM exit, hitting the
      qemu implementation as before.
      
      Note that a userspace change is also required, in order to enable these
      new hcall implementations with KVM_CAP_PPC_ENABLE_HCALL.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      [agraf: fix compilation]
      Signed-off-by: Alexander Graf <agraf@suse.de>
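      The shape of the in-kernel handler, sketched (simplified from the
      real kvmppc_h_logical_ci_load(); locking and per-size endian
      handling are omitted):

          static long sketch_h_logical_ci_load(struct kvm_vcpu *vcpu,
                                               unsigned long size,
                                               unsigned long addr)
          {
                  u64 buf = 0;

                  /* Scan the in-kernel IO bus so kernel-emulated devices
                   * (e.g. ioeventfd-backed virtio notification) see it. */
                  if (kvm_io_bus_read(vcpu, KVM_MMIO_BUS, addr, size, &buf))
                          return H_TOO_HARD;     /* punt to qemu as before */

                  kvmppc_set_gpr(vcpu, 4, buf);  /* PAPR: result in r4 */
                  return H_SUCCESS;
          }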
    • powerpc: Export __spin_yield · ae75116e
      Suresh E. Warrier committed
      Export __spin_yield so that the arch_spin_unlock() function can
      be invoked from a module. This will be required for modules where
      we want to take a lock that is also acquired in hypervisor
      real mode. Because we want to avoid running any lockdep code
      (which may not be safe in real mode), this lock needs to be
      an arch_spinlock_t instead of a normal spinlock.
      Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Alexander Graf <agraf@suse.de>
  2. 09 Apr, 2015 (1 commit)
  3. 08 Apr, 2015 (1 commit)
  4. 01 Apr, 2015 (1 commit)
  5. 27 Mar, 2015 (2 commits)
  6. 23 Mar, 2015 (1 commit)
  7. 20 Mar, 2015 (6 commits)
    • KVM: PPC: Book3S HV: Fix instruction emulation · 2bf27601
      Paul Mackerras committed
      Commit 4a157d61 ("KVM: PPC: Book3S HV: Fix endianness of
      instruction obtained from HEIR register") had the side effect that
      we no longer reset vcpu->arch.last_inst to -1 on guest exit in
      the cases where the instruction is not fetched from the guest.
      This means that if instruction emulation turns out to be required
      in those cases, the host will emulate the wrong instruction, since
      vcpu->arch.last_inst will contain the last instruction that was
      emulated.
      
      This fixes it by making sure that vcpu->arch.last_inst is reset
      to -1 in those cases.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Endian fix for accessing VPA yield count · ecb6d618
      Paul Mackerras committed
      The VPA (virtual processor area) is defined by PAPR and is therefore
      big-endian, so we need a be32_to_cpu when reading it in
      kvmppc_get_yield_count().  Without this, H_CONFER always fails on a
      little-endian host, causing SMP guests to waste time spinning on
      spinlocks.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
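      The fix is one conversion on the read side; sketched (locking
      elided, the VPA being the PAPR lppaca):

          static u32 sketch_get_yield_count(struct kvm_vcpu *vcpu)
          {
                  struct lppaca *vpa = vcpu->arch.vpa.pinned_addr;

                  /* The VPA is big-endian by definition, so convert on
                   * every host, little-endian included. */
                  return vpa ? be32_to_cpu(vpa->yield_count) : 0;
          }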
    • KVM: PPC: Book3S HV: Fix spinlock/mutex ordering issue in kvmppc_set_lpcr() · 8f902b00
      Paul Mackerras committed
      Currently, kvmppc_set_lpcr() has a spinlock around the whole function,
      and inside that does mutex_lock(&kvm->lock).  It is not permitted to
      take a mutex while holding a spinlock, because the mutex_lock might
      call schedule().  In addition, this causes lockdep to warn about a
      lock ordering issue:
      
      ======================================================
      [ INFO: possible circular locking dependency detected ]
      3.18.0-kvm-04645-gdfea862-dirty #131 Not tainted
      -------------------------------------------------------
      qemu-system-ppc/8179 is trying to acquire lock:
       (&kvm->lock){+.+.+.}, at: [<d00000000ecc1f54>] .kvmppc_set_lpcr+0xf4/0x1c0 [kvm_hv]
      
      but task is already holding lock:
       (&(&vcore->lock)->rlock){+.+...}, at: [<d00000000ecc1ea0>] .kvmppc_set_lpcr+0x40/0x1c0 [kvm_hv]
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (&(&vcore->lock)->rlock){+.+...}:
             [<c000000000b3c120>] .mutex_lock_nested+0x80/0x570
             [<d00000000ecc7a14>] .kvmppc_vcpu_run_hv+0xc4/0xe40 [kvm_hv]
             [<d00000000eb9f5cc>] .kvmppc_vcpu_run+0x2c/0x40 [kvm]
             [<d00000000eb9cb24>] .kvm_arch_vcpu_ioctl_run+0x54/0x160 [kvm]
             [<d00000000eb94478>] .kvm_vcpu_ioctl+0x4a8/0x7b0 [kvm]
             [<c00000000026cbb4>] .do_vfs_ioctl+0x444/0x770
             [<c00000000026cfa4>] .SyS_ioctl+0xc4/0xe0
             [<c000000000009264>] syscall_exit+0x0/0x98
      
      -> #0 (&kvm->lock){+.+.+.}:
             [<c0000000000ff28c>] .lock_acquire+0xcc/0x1a0
             [<c000000000b3c120>] .mutex_lock_nested+0x80/0x570
             [<d00000000ecc1f54>] .kvmppc_set_lpcr+0xf4/0x1c0 [kvm_hv]
             [<d00000000ecc510c>] .kvmppc_set_one_reg_hv+0x4dc/0x990 [kvm_hv]
             [<d00000000eb9f234>] .kvmppc_set_one_reg+0x44/0x330 [kvm]
             [<d00000000eb9c9dc>] .kvm_vcpu_ioctl_set_one_reg+0x5c/0x150 [kvm]
             [<d00000000eb9ced4>] .kvm_arch_vcpu_ioctl+0x214/0x2c0 [kvm]
             [<d00000000eb940b0>] .kvm_vcpu_ioctl+0xe0/0x7b0 [kvm]
             [<c00000000026cbb4>] .do_vfs_ioctl+0x444/0x770
             [<c00000000026cfa4>] .SyS_ioctl+0xc4/0xe0
             [<c000000000009264>] syscall_exit+0x0/0x98
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&(&vcore->lock)->rlock);
                                     lock(&kvm->lock);
                                     lock(&(&vcore->lock)->rlock);
        lock(&kvm->lock);
      
       *** DEADLOCK ***
      
      2 locks held by qemu-system-ppc/8179:
       #0:  (&vcpu->mutex){+.+.+.}, at: [<d00000000eb93f18>] .vcpu_load+0x28/0x90 [kvm]
       #1:  (&(&vcore->lock)->rlock){+.+...}, at: [<d00000000ecc1ea0>] .kvmppc_set_lpcr+0x40/0x1c0 [kvm_hv]
      
      stack backtrace:
      CPU: 4 PID: 8179 Comm: qemu-system-ppc Not tainted 3.18.0-kvm-04645-gdfea862-dirty #131
      Call Trace:
      [c000001a66c0f310] [c000000000b486ac] .dump_stack+0x88/0xb4 (unreliable)
      [c000001a66c0f390] [c0000000000f8bec] .print_circular_bug+0x27c/0x3d0
      [c000001a66c0f440] [c0000000000fe9e8] .__lock_acquire+0x2028/0x2190
      [c000001a66c0f5d0] [c0000000000ff28c] .lock_acquire+0xcc/0x1a0
      [c000001a66c0f6a0] [c000000000b3c120] .mutex_lock_nested+0x80/0x570
      [c000001a66c0f7c0] [d00000000ecc1f54] .kvmppc_set_lpcr+0xf4/0x1c0 [kvm_hv]
      [c000001a66c0f860] [d00000000ecc510c] .kvmppc_set_one_reg_hv+0x4dc/0x990 [kvm_hv]
      [c000001a66c0f8d0] [d00000000eb9f234] .kvmppc_set_one_reg+0x44/0x330 [kvm]
      [c000001a66c0f960] [d00000000eb9c9dc] .kvm_vcpu_ioctl_set_one_reg+0x5c/0x150 [kvm]
      [c000001a66c0f9f0] [d00000000eb9ced4] .kvm_arch_vcpu_ioctl+0x214/0x2c0 [kvm]
      [c000001a66c0faf0] [d00000000eb940b0] .kvm_vcpu_ioctl+0xe0/0x7b0 [kvm]
      [c000001a66c0fcb0] [c00000000026cbb4] .do_vfs_ioctl+0x444/0x770
      [c000001a66c0fd90] [c00000000026cfa4] .SyS_ioctl+0xc4/0xe0
      [c000001a66c0fe30] [c000000000009264] syscall_exit+0x0/0x98
      
      This fixes it by moving the mutex_lock()/mutex_unlock() pair outside
      the spin-locked region.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
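      The ordering rule the fix restores, sketched (not the function
      verbatim): the sleeping mutex is taken outside the spinlocked
      region, never inside it.

          mutex_lock(&kvm->lock);   /* may sleep: must come first */
          spin_lock(&vc->lock);     /* atomic context starts here  */
          /* ... update the vcore's LPCR value ... */
          spin_unlock(&vc->lock);
          mutex_unlock(&kvm->lock);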
    • powerpc/pseries: Little endian fixes for post mobility device tree update · f6ff0414
      Tyrel Datwyler committed
      We currently use the device tree update code in the kernel after resuming
      from a suspend operation to re-sync the kernel's view of the device tree
      with that of the hypervisor.  The code as it stands is not endian safe,
      as it relies on parsing buffers returned by RTAS calls, which therefore
      contain data in big-endian format.
      
      This patch annotates variables and structure members with __be types as well
      as performing necessary byte swaps to cpu endian for data that needs to be
      parsed.
      Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
      Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Cc: Cyril Bur <cyrilbur@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Add PVR for POWER8NVL processor · ddee09c0
      Benjamin Herrenschmidt committed
      There's a new variant of POWER8 coming called "POWER8 with NVLink". The
      core is identical to POWER8 but unfortunately they strapped it with a
      different PVR, so we need to add an explicit entry for it.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Fixes for hypervisor doorbell handling · 755563bc
      Paul Mackerras committed
      Since we can now use hypervisor doorbells for host IPIs, this makes
      sure we clear the host IPI flag when taking a doorbell interrupt, and
      clears any pending doorbell IPI in pnv_smp_cpu_kill_self() (as we
      already do for IPIs sent via the XICS interrupt controller).  Otherwise
      if there did happen to be a leftover pending doorbell interrupt for
      an offline CPU thread for any reason, it would prevent that thread from
      going into a power-saving mode; it would instead keep waking up because
      of the interrupt.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  8. 04 Mar, 2015 (2 commits)
    • powerpc/iommu: Remove IOMMU device references via bus notifier · 4ad04e59
      Nishanth Aravamudan committed
      After d905c5df ("PPC: POWERNV: move iommu_add_device earlier"), the
      refcnt on the kobject backing the IOMMU group for a PCI device is
      elevated by each call to pci_dma_dev_setup_pSeriesLP() (via
      set_iommu_table_base_and_group).  When we DLPAR-remove a multi-function
      PCI device:
      
              iommu_reconfig_notifier ->
                      iommu_free_table ->
                              iommu_group_put
                              BUG_ON(tbl->it_group)
      
      We trip this BUG_ON, because there are still references on the table, so
      it is not freed. Fix this by moving the powernv bus notifier to common
      code and calling it for both powernv and pseries.
      
      Fixes: d905c5df ("PPC: POWERNV: move iommu_add_device earlier")
      Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Tested-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/smp: Wait until secondaries are active & online · 875ebe94
      Michael Ellerman committed
      Anton has a busy ppc64le KVM box where guests sometimes hit the infamous
      "kernel BUG at kernel/smpboot.c:134!" issue during boot:
      
        BUG_ON(td->cpu != smp_processor_id());
      
      Basically a per CPU hotplug thread scheduled on the wrong CPU. The oops
      output confirms it:
      
        CPU: 0
        Comm: watchdog/130
      
      The problem is that we aren't ensuring the CPU active bit is set for the
      secondary before allowing the master to continue on. The master unparks
      the secondary CPU's kthreads and the scheduler looks for a CPU to run
      on. It calls select_task_rq() and realises the suggested CPU is not in
      the cpus_allowed mask. It then ends up in select_fallback_rq(), and
      since the active bit isn't set we choose some other CPU to run on.
      
      This seems to have been introduced by 6acbfb96 "sched: Fix hotplug
      vs. set_cpus_allowed_ptr()", which changed from setting active before
      online to setting active after online. However that was in turn fixing a
      bug where other code assumed an active CPU was also online, so we can't
      just revert that fix.
      
      The simplest fix is just to spin waiting for both active & online to be
      set. We already have a barrier prior to set_cpu_online() (which also
      sets active), to ensure all other setup is completed before online &
      active are set.
      
      Fixes: 6acbfb96 ("sched: Fix hotplug vs. set_cpus_allowed_ptr()")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
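      The fix amounts to widening the wait condition on the boot CPU; a
      sketch of the loop:

          /* Previously only the online bit was awaited; now wait until
           * the scheduler may also place tasks on the new CPU. */
          while (!cpu_online(cpu) || !cpu_active(cpu))
                  cpu_relax();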
  9. 23 Feb, 2015 (1 commit)
    • powerpc: Re-enable dynticks · fea559f3
      Paul Clarke committed
      Implement arch_irq_work_has_interrupt() for powerpc
      
      Commit 9b01f5bf introduced a dependency on "IRQ work self-IPIs" for
      full dynamic ticks to be enabled, by expecting architectures to
      implement a suitable arch_irq_work_has_interrupt() routine.
      
      Several arches have implemented this routine, including x86 (3010279f)
      and arm (09f6edd4), but powerpc was omitted.
      
      This patch implements this routine for powerpc.
      
      The symptom: at boot on powerpc systems with "nohz_full=<CPU list>",
      this message is displayed:
      
           NO_HZ: Can't run full dynticks because arch doesn't support irq work self-IPIs
      
      after this patch:
      
           NO_HZ: Full dynticks CPUs: <CPU list>.
      
      Tested against 3.19.
      
      powerpc implements "IRQ work self-IPIs" by setting the decrementer to 1 in
      arch_irq_work_raise(), which causes a decrementer exception on the next
      timebase tick. We then handle the work in __timer_interrupt().
      
      CC: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Paul A. Clarke <pc@us.ibm.com>
      Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      [mpe: Flesh out change log, fix ws & include guards, remove include of processor.h]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
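      The patch is essentially one predicate; a sketch of the new inline
      (header path assumed):

          /* arch/powerpc/include/asm/irq_work.h: powerpc can raise an
           * IRQ-work self-IPI via the decrementer, so say so. */
          #ifndef _ASM_POWERPC_IRQ_WORK_H
          #define _ASM_POWERPC_IRQ_WORK_H

          static inline bool arch_irq_work_has_interrupt(void)
          {
                  return true;
          }

          #endif /* _ASM_POWERPC_IRQ_WORK_H */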
  10. 19 Feb, 2015 (1 commit)
  11. 18 Feb, 2015 (1 commit)
  12. 17 Feb, 2015 (1 commit)
  13. 14 Feb, 2015 (1 commit)