1. 31 1月, 2017 18 次提交
    • P
      KVM: PPC: Book3S HV: Invalidate TLB on radix guest vcpu movement · a29ebeaf
      Paul Mackerras 提交于
      With radix, the guest can do TLB invalidations itself using the tlbie
      (global) and tlbiel (local) TLB invalidation instructions.  Linux guests
      use local TLB invalidations for translations that have only ever been
      accessed on one vcpu.  However, that doesn't mean that the translations
      have only been accessed on one physical cpu (pcpu) since vcpus can move
      around from one pcpu to another.  Thus a tlbiel might leave behind stale
      TLB entries on a pcpu where the vcpu previously ran, and if that task
      then moves back to that previous pcpu, it could see those stale TLB
      entries and thus access memory incorrectly.  The usual symptom of this
      is random segfaults in userspace programs in the guest.
      
      To cope with this, we detect when a vcpu is about to start executing on
      a thread in a core that is a different core from the last time it
      executed.  If that is the case, then we mark the core as needing a
      TLB flush and then send an interrupt to any thread in the core that is
      currently running a vcpu from the same guest.  This will get those vcpus
      out of the guest, and the first one to re-enter the guest will do the
      TLB flush.  The reason for interrupting the vcpus executing on the old
      core is to cope with the following scenario:
      
      	CPU 0			CPU 1			CPU 4
      	(core 0)			(core 0)			(core 1)
      
      	VCPU 0 runs task X      VCPU 1 runs
      	core 0 TLB gets
      	entries from task X
      	VCPU 0 moves to CPU 4
      							VCPU 0 runs task X
      							Unmap pages of task X
      							tlbiel
      
      				(still VCPU 1)			task X moves to VCPU 1
      				task X runs
      				task X sees stale TLB
      				entries
      
      That is, as soon as the VCPU starts executing on the new core, it
      could unmap and tlbiel some page table entries, and then the task
      could migrate to one of the VCPUs running on the old core and
      potentially see stale TLB entries.
      
      Since the TLB is shared between all the threads in a core, we only
      use the bit of kvm->arch.need_tlb_flush corresponding to the first
      thread in the core.  To ensure that we don't have a window where we
      can miss a flush, this moves the clearing of the bit from before the
      actual flush to after it.  This way, two threads might both do the
      flush, but we prevent the situation where one thread can enter the
      guest before the flush is finished.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a29ebeaf
    • P
      KVM: PPC: Book3S HV: Make HPT-specific hypercalls return error in radix mode · 65dae540
      Paul Mackerras 提交于
      If the guest is in radix mode, then it doesn't have a hashed page
      table (HPT), so all of the hypercalls that manipulate the HPT can't
      work and should return an error.  This adds checks to make them
      return H_FUNCTION ("function not supported").
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      65dae540
    • P
      KVM: PPC: Book3S HV: Implement dirty page logging for radix guests · 8f7b79b8
      Paul Mackerras 提交于
      This adds code to keep track of dirty pages when requested (that is,
      when memslot->dirty_bitmap is non-NULL) for radix guests.  We use the
      dirty bits in the PTEs in the second-level (partition-scoped) page
      tables, together with a bitmap of pages that were dirty when their
      PTE was invalidated (e.g., when the page was paged out).  This bitmap
      is stored in the first half of the memslot->dirty_bitmap area, and
      kvm_vm_ioctl_get_dirty_log_hv() now uses the second half for the
      bitmap that gets returned to userspace.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      8f7b79b8
    • P
      KVM: PPC: Book3S HV: MMU notifier callbacks for radix guests · 01756099
      Paul Mackerras 提交于
      This adapts our implementations of the MMU notifier callbacks
      (unmap_hva, unmap_hva_range, age_hva, test_age_hva, set_spte_hva)
      to call radix functions when the guest is using radix.  These
      implementations are much simpler than for HPT guests because we
      have only one PTE to deal with, so we don't need to traverse
      rmap chains.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      01756099
    • P
      KVM: PPC: Book3S HV: Page table construction and page faults for radix guests · 5a319350
      Paul Mackerras 提交于
      This adds the code to construct the second-level ("partition-scoped" in
      architecturese) page tables for guests using the radix MMU.  Apart from
      the PGD level, which is allocated when the guest is created, the rest
      of the tree is all constructed in response to hypervisor page faults.
      
      As well as hypervisor page faults for missing pages, we also get faults
      for reference/change (RC) bits needing to be set, as well as various
      other error conditions.  For now, we only set the R or C bit in the
      guest page table if the same bit is set in the host PTE for the
      backing page.
      
      This code can take advantage of the guest being backed with either
      transparent or ordinary 2MB huge pages, and insert 2MB page entries
      into the guest page tables.  There is no support for 1GB huge pages
      yet.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      5a319350
    • P
      KVM: PPC: Book3S HV: Modify guest entry/exit paths to handle radix guests · f4c51f84
      Paul Mackerras 提交于
      This adds code to  branch around the parts that radix guests don't
      need - clearing and loading the SLB with the guest SLB contents,
      saving the guest SLB contents on exit, and restoring the host SLB
      contents.
      
      Since the host is now using radix, we need to save and restore the
      host value for the PID register.
      
      On hypervisor data/instruction storage interrupts, we don't do the
      guest HPT lookup on radix, but just save the guest physical address
      for the fault (from the ASDR register) in the vcpu struct.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      f4c51f84
    • P
      KVM: PPC: Book3S HV: Add basic infrastructure for radix guests · 9e04ba69
      Paul Mackerras 提交于
      This adds a field in struct kvm_arch and an inline helper to
      indicate whether a guest is a radix guest or not, plus a new file
      to contain the radix MMU code, which currently contains just a
      translate function which knows how to traverse the guest page
      tables to translate an address.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      9e04ba69
    • P
      KVM: PPC: Book3S HV: Use ASDR for HPT guests on POWER9 · ef8c640c
      Paul Mackerras 提交于
      POWER9 adds a register called ASDR (Access Segment Descriptor
      Register), which is set by hypervisor data/instruction storage
      interrupts to contain the segment descriptor for the address
      being accessed, assuming the guest is using HPT translation.
      (For radix guests, it contains the guest real address of the
      access.)
      
      Thus, for HPT guests on POWER9, we can use this register rather
      than looking up the SLB with the slbfee. instruction.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      ef8c640c
    • P
      KVM: PPC: Book3S HV: Set process table for HPT guests on POWER9 · 468808bd
      Paul Mackerras 提交于
      This adds the implementation of the KVM_PPC_CONFIGURE_V3_MMU ioctl
      for HPT guests on POWER9.  With this, we can return 1 for the
      KVM_CAP_PPC_MMU_HASH_V3 capability.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      468808bd
    • P
      KVM: PPC: Book3S HV: Add userspace interfaces for POWER9 MMU · c9270132
      Paul Mackerras 提交于
      This adds two capabilities and two ioctls to allow userspace to
      find out about and configure the POWER9 MMU in a guest.  The two
      capabilities tell userspace whether KVM can support a guest using
      the radix MMU, or using the hashed page table (HPT) MMU with a
      process table and segment tables.  (Note that the MMUs in the
      POWER9 processor cores do not use the process and segment tables
      when in HPT mode, but the nest MMU does).
      
      The KVM_PPC_CONFIGURE_V3_MMU ioctl allows userspace to specify
      whether a guest will use the radix MMU or the HPT MMU, and to
      specify the size and location (in guest space) of the process
      table.
      
      The KVM_PPC_GET_RMMU_INFO ioctl gives userspace information about
      the radix MMU.  It returns a list of supported radix tree geometries
      (base page size and number of bits indexed at each level of the
      radix tree) and the encoding used to specify the various page
      sizes for the TLB invalidate entry instruction.
      
      Initially, both capabilities return 0 and the ioctls return -EINVAL,
      until the necessary infrastructure for them to operate correctly
      is added.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      c9270132
    • P
      powerpc/64: Allow for relocation-on interrupts from guest to host · bc355125
      Paul Mackerras 提交于
      With host and guest both using radix translation, it is feasible
      for the host to take interrupts that come from the guest with
      relocation on, and that is in fact what the POWER9 hardware will
      do when LPCR[AIL] = 3.  All such interrupts use HSRR0/1 not SRR0/1
      except for system call with LEV=1 (hcall).
      
      Therefore this adds the KVM tests to the _HV variants of the
      relocation-on interrupt handlers, and adds the KVM test to the
      relocation-on system call entry point.
      
      We also instantiate the relocation-on versions of the hypervisor
      data storage and instruction interrupt handlers, since these can
      occur with relocation on in radix guests.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      bc355125
    • P
      powerpc/64: Make type of partition table flush depend on partition type · 16ed1416
      Paul Mackerras 提交于
      When changing a partition table entry on POWER9, we do a particular
      form of the tlbie instruction which flushes all TLBs and caches of
      the partition table for a given logical partition ID (LPID).
      This instruction has a field in the instruction word, labelled R
      (radix), which should be 1 if the partition was previously a radix
      partition and 0 if it was a HPT partition.  This implements that
      logic.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      16ed1416
    • P
      powerpc/64: Export pgtable_cache and pgtable_cache_add for KVM · ba9b399a
      Paul Mackerras 提交于
      This exports the pgtable_cache array and the pgtable_cache_add
      function so that HV KVM can use them for allocating radix page
      tables for guests.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      ba9b399a
    • P
      powerpc/64: More definitions for POWER9 · dbcbfee0
      Paul Mackerras 提交于
      This adds definitions for bits in the DSISR register which are used
      by POWER9 for various translation-related exception conditions, and
      for some more bits in the partition table entry that will be needed
      by KVM.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      dbcbfee0
    • P
      powerpc/64: Enable use of radix MMU under hypervisor on POWER9 · cc3d2940
      Paul Mackerras 提交于
      To use radix as a guest, we first need to tell the hypervisor via
      the ibm,client-architecture call first that we support POWER9 and
      architecture v3.00, and that we can do either radix or hash and
      that we would like to choose later using an hcall (the
      H_REGISTER_PROC_TBL hcall).
      
      Then we need to check whether the hypervisor agreed to us using
      radix.  We need to do this very early on in the kernel boot process
      before any of the MMU initialization is done.  If the hypervisor
      doesn't agree, we can't use radix and therefore clear the radix
      MMU feature bit.
      
      Later, when we have set up our process table, which points to the
      radix tree for each process, we need to install that using the
      H_REGISTER_PROC_TBL hcall.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      cc3d2940
    • P
      powerpc/pseries: Fixes for the "ibm,architecture-vec-5" options · 3f4ab2f8
      Paul Mackerras 提交于
      This fixes the byte index values for some of the option bits in
      the "ibm,architectur-vec-5" property. The "platform facilities options"
      bits are in byte 17 not byte 14, so the upper 8 bits of their
      definitions need to be 0x11 not 0x0E. The "sub processor support" option
      is in byte 21 not byte 15.
      
      Note none of these options are actually looked up in
      "ibm,architecture-vec-5" at this time, so there is no bug.
      
      When checking whether option bits are set, we should check that
      the offset of the byte being checked is less than the vector
      length that we got from the hypervisor.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      3f4ab2f8
    • P
      powerpc/64: Don't try to use radix MMU under a hypervisor · 18569c1f
      Paul Mackerras 提交于
      Currently, if the kernel is running on a POWER9 processor under a
      hypervisor, it will try to use the radix MMU even though it doesn't have
      the necessary code to use radix under a hypervisor (it doesn't negotiate
      use of radix, and it doesn't do the H_REGISTER_PROC_TBL hcall). The
      result is that the guest kernel will crash when it tries to turn on the
      MMU.
      
      This fixes it by looking for the /chosen/ibm,architecture-vec-5
      property, and if it exists, clears the radix MMU feature bit, before we
      decide whether to initialize for radix or HPT. This property is created
      by the hypervisor as a result of the guest calling the
      ibm,client-architecture-support method to indicate its capabilities, so
      it will indicate whether the hypervisor agreed to us using radix.
      
      Systems without a hypervisor may have this property also (for example,
      skiboot creates it), so we check the HV bit in the MSR to see whether we
      are running as a guest or not. If we are in hypervisor mode, then we can
      do whatever we like including using the radix MMU.
      
      The reason for using this property is that in future, when we have
      support for using radix under a hypervisor, we will need to check this
      property to see whether the hypervisor agreed to us using radix.
      
      Fixes: 2bfd65e4 ("powerpc/mm/radix: Add radix callbacks for early init routines")
      Cc: stable@vger.kernel.org # v4.7+
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      18569c1f
    • N
      KVM: PPC: Book3S: 64-bit CONFIG_RELOCATABLE support for interrupts · a97a65d5
      Nicholas Piggin 提交于
      64-bit Book3S exception handlers must find the dynamic kernel base
      to add to the target address when branching beyond __end_interrupts,
      in order to support kernel running at non-0 physical address.
      
      Support this in KVM by branching with CTR, similarly to regular
      interrupt handlers. The guest CTR saved in HSTATE_SCRATCH1 and
      restored after the branch.
      
      Without this, the host kernel hangs and crashes randomly when it is
      running at a non-0 address and a KVM guest is started.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Acked-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a97a65d5
  2. 27 1月, 2017 2 次提交
  3. 26 12月, 2016 2 次提交
    • L
      powerpc: Fix build warning on 32-bit PPC · 8ae679c4
      Larry Finger 提交于
      I am getting the following warning when I build kernel 4.9-git on my
      PowerBook G4 with a 32-bit PPC processor:
      
          AS      arch/powerpc/kernel/misc_32.o
        arch/powerpc/kernel/misc_32.S:299:7: warning: "CONFIG_FSL_BOOKE" is not defined [-Wundef]
      
      This problem is evident after commit 989cea5c ("kbuild: prevent
      lib-ksyms.o rebuilds"); however, this change in kbuild only exposes an
      error that has been in the code since 2005 when this source file was
      created.  That was with commit 9994a338 ("powerpc: Introduce
      entry_{32,64}.S, misc_{32,64}.S, systbl.S").
      
      The offending line does not make a lot of sense.  This error does not
      seem to cause any errors in the executable, thus I am not recommending
      that it be applied to any stable versions.
      
      Thanks to Nicholas Piggin for suggesting this solution.
      
      Fixes: 9994a338 ("powerpc: Introduce entry_{32,64}.S, misc_{32,64}.S, systbl.S")
      Signed-off-by: NLarry Finger <Larry.Finger@lwfinger.net>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8ae679c4
    • T
      ktime: Cleanup ktime_set() usage · 8b0e1953
      Thomas Gleixner 提交于
      ktime_set(S,N) was required for the timespec storage type and is still
      useful for situations where a Seconds and Nanoseconds part of a time value
      needs to be converted. For anything where the Seconds argument is 0, this
      is pointless and can be replaced with a simple assignment.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      8b0e1953
  4. 25 12月, 2016 3 次提交
  5. 21 12月, 2016 3 次提交
    • M
      powerpc: fsl/fman: remove fsl,fman from of_device_ids[] · ae6021d4
      Madalin Bucur 提交于
      The fsl/fman drivers will use of_platform_populate() on all
      supported platforms. Call of_platform_populate() to probe the
      FMan sub-nodes.
      Signed-off-by: NIgal Liberman <igal.liberman@freescale.com>
      Signed-off-by: NMadalin Bucur <madalin.bucur@nxp.com>
      Acked-by: NScott Wood <oss@buserror.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ae6021d4
    • T
      powerpc: ima: send the kexec buffer to the next kernel · ab6b1d1f
      Thiago Jung Bauermann 提交于
      The IMA kexec buffer allows the currently running kernel to pass the
      measurement list via a kexec segment to the kernel that will be kexec'd.
      
      This is the architecture-specific part of setting up the IMA kexec
      buffer for the next kernel.  It will be used in the next patch.
      
      Link: http://lkml.kernel.org/r/1480554346-29071-6-git-send-email-zohar@linux.vnet.ibm.comSigned-off-by: NThiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
      Signed-off-by: NMimi Zohar <zohar@linux.vnet.ibm.com>
      Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Andreas Steffen <andreas.steffen@strongswan.org>
      Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
      Cc: Josh Sklar <sklar@linux.vnet.ibm.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Stewart Smith <stewart@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ab6b1d1f
    • T
      powerpc: ima: get the kexec buffer passed by the previous kernel · 467d2782
      Thiago Jung Bauermann 提交于
      Patch series "ima: carry the measurement list across kexec", v8.
      
      The TPM PCRs are only reset on a hard reboot.  In order to validate a
      TPM's quote after a soft reboot (eg.  kexec -e), the IMA measurement
      list of the running kernel must be saved and then restored on the
      subsequent boot, possibly of a different architecture.
      
      The existing securityfs binary_runtime_measurements file conveniently
      provides a serialized format of the IMA measurement list.  This patch
      set serializes the measurement list in this format and restores it.
      
      Up to now, the binary_runtime_measurements was defined as architecture
      native format.  The assumption being that userspace could and would
      handle any architecture conversions.  With the ability of carrying the
      measurement list across kexec, possibly from one architecture to a
      different one, the per boot architecture information is lost and with it
      the ability of recalculating the template digest hash.  To resolve this
      problem, without breaking the existing ABI, this patch set introduces
      the boot command line option "ima_canonical_fmt", which is arbitrarily
      defined as little endian.
      
      The need for this boot command line option will be limited to the
      existing version 1 format of the binary_runtime_measurements.
      Subsequent formats will be defined as canonical format (eg.  TPM 2.0
      support for larger digests).
      
      A simplified method of Thiago Bauermann's "kexec buffer handover" patch
      series for carrying the IMA measurement list across kexec is included in
      this patch set.  The simplified method requires all file measurements be
      taken prior to executing the kexec load, as subsequent measurements will
      not be carried across the kexec and restored.
      
      This patch (of 10):
      
      The IMA kexec buffer allows the currently running kernel to pass the
      measurement list via a kexec segment to the kernel that will be kexec'd.
      The second kernel can check whether the previous kernel sent the buffer
      and retrieve it.
      
      This is the architecture-specific part which enables IMA to receive the
      measurement list passed by the previous kernel.  It will be used in the
      next patch.
      
      The change in machine_kexec_64.c is to factor out the logic of removing
      an FDT memory reservation so that it can be used by remove_ima_buffer.
      
      Link: http://lkml.kernel.org/r/1480554346-29071-2-git-send-email-zohar@linux.vnet.ibm.comSigned-off-by: NThiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
      Signed-off-by: NMimi Zohar <zohar@linux.vnet.ibm.com>
      Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Andreas Steffen <andreas.steffen@strongswan.org>
      Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
      Cc: Josh Sklar <sklar@linux.vnet.ibm.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Stewart Smith <stewart@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      467d2782
  6. 15 12月, 2016 2 次提交
  7. 13 12月, 2016 4 次提交
  8. 10 12月, 2016 6 次提交