1. 21 6月, 2017 2 次提交
  2. 20 6月, 2017 1 次提交
    • P
      KVM: PPC: Book3S HV: Don't sleep if XIVE interrupt pending on POWER9 · ee3308a2
      Paul Mackerras 提交于
      On a POWER9 system, it is possible for an interrupt to become pending
      for a VCPU when that VCPU is about to cede (execute a H_CEDE hypercall)
      and has already disabled interrupts, or in the H_CEDE processing up
      to the point where the XIVE context is pulled from the hardware.  In
      such a case, the H_CEDE should not sleep, but should return immediately
      to the guest.  However, the conditions tested in kvmppc_vcpu_woken()
      don't include the condition that a XIVE interrupt is pending, so the
      VCPU could sleep until the next decrementer interrupt.
      
      To fix this, we add a new xive_interrupt_pending() helper which looks
      in the XIVE context that was pulled from the hardware to see if the
      priority of any pending interrupt is higher (numerically lower than)
      the CPU priority.  If so then kvmppc_vcpu_woken() will return true.
      If the XIVE context has never been used, then both the pipr and the
      cppr fields will be zero and the test will indicate that no interrupt
      is pending.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      ee3308a2
  3. 19 6月, 2017 5 次提交
    • P
      KVM: PPC: Book3S HV: Virtualize doorbell facility on POWER9 · 57900694
      Paul Mackerras 提交于
      On POWER9, we no longer have the restriction that we had on POWER8
      where all threads in a core have to be in the same partition, so
      the CPU threads are now independent.  However, we still want to be
      able to run guests with a virtual SMT topology, if only to allow
      migration of guests from POWER8 systems to POWER9.
      
      A guest that has a virtual SMT mode greater than 1 will expect to
      be able to use the doorbell facility; it will expect the msgsndp
      and msgclrp instructions to work appropriately and to be able to read
      sensible values from the TIR (thread identification register) and
      DPDES (directed privileged doorbell exception status) special-purpose
      registers.  However, since each CPU thread is a separate sub-processor
      in POWER9, these instructions and registers can only be used within
      a single CPU thread.
      
      In order for these instructions to appear to act correctly according
      to the guest's virtual SMT mode, we have to trap and emulate them.
      We cause them to trap by clearing the HFSCR_MSGP bit in the HFSCR
      register.  The emulation is triggered by the hypervisor facility
      unavailable interrupt that occurs when the guest uses them.
      
      To cause a doorbell interrupt to occur within the guest, we set the
      DPDES register to 1.  If the guest has interrupts enabled, the CPU
      will generate a doorbell interrupt and clear the DPDES register in
      hardware.  The DPDES hardware register for the guest is saved in the
      vcpu->arch.vcore->dpdes field.  Since this gets written by the guest
      exit code, other VCPUs wishing to cause a doorbell interrupt don't
      write that field directly, but instead set a vcpu->arch.doorbell_request
      flag.  This is consumed and set to 0 by the guest entry code, which
      then sets DPDES to 1.
      
      Emulating reads of the DPDES register is somewhat involved, because
      it requires reading the doorbell pending interrupt status of all of the
      VCPU threads in the virtual core, and if any of those VCPUs are
      running, their doorbell status is only up-to-date in the hardware
      DPDES registers of the CPUs where they are running.  In order to get
      a reasonable approximation of the current doorbell status, we send
      those CPUs an IPI, causing an exit from the guest which will update
      the vcpu->arch.vcore->dpdes field.  We then use that value in
      constructing the emulated DPDES register value.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      57900694
    • P
      KVM: PPC: Book3S HV: Allow userspace to set the desired SMT mode · 3c313524
      Paul Mackerras 提交于
      This allows userspace to set the desired virtual SMT (simultaneous
      multithreading) mode for a VM, that is, the number of VCPUs that
      get assigned to each virtual core.  Previously, the virtual SMT mode
      was fixed to the number of threads per subcore, and if userspace
      wanted to have fewer vcpus per vcore, then it would achieve that by
      using a sparse CPU numbering.  This had the disadvantage that the
      vcpu numbers can get quite large, particularly for SMT1 guests on
      a POWER8 with 8 threads per core.  With this patch, userspace can
      set its desired virtual SMT mode and then use contiguous vcpu
      numbering.
      
      On POWER8, where the threading mode is "strict", the virtual SMT mode
      must be less than or equal to the number of threads per subcore.  On
      POWER9, which implements a "loose" threading mode, the virtual SMT
      mode can be any power of 2 between 1 and 8, even though there is
      effectively one thread per subcore, since the threads are independent
      and can all be in different partitions.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      3c313524
    • P
      KVM: PPC: Book3S HV: Context-switch HFSCR between host and guest on POWER9 · 769377f7
      Paul Mackerras 提交于
      This adds code to allow us to use a different value for the HFSCR
      (Hypervisor Facilities Status and Control Register) when running the
      guest from that which applies in the host.  The reason for doing this
      is to allow us to trap the msgsndp instruction and related operations
      in future so that they can be virtualized.  We also save the value of
      HFSCR when a hypervisor facility unavailable interrupt occurs, because
      the high byte of HFSCR indicates which facility the guest attempted to
      access.
      
      We save and restore the host value on guest entry/exit because some
      bits of it affect host userspace execution.
      
      We only do all this on POWER9, not on POWER8, because we are not
      intending to virtualize any of the facilities controlled by HFSCR on
      POWER8.  In particular, the HFSCR bit that controls execution of
      msgsndp and related operations does not exist on POWER8.  The HFSCR
      doesn't exist at all on POWER7.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      769377f7
    • P
      KVM: PPC: Book3S HV: Don't let VCPU sleep if it has a doorbell pending · 1da4e2f4
      Paul Mackerras 提交于
      It is possible, through a narrow race condition, for a VCPU to exit
      the guest with a H_CEDE hypercall while it has a doorbell interrupt
      pending.  In this case, the H_CEDE should return immediately, but in
      fact it puts the VCPU to sleep until some other interrupt becomes
      pending or a prod is received (via another VCPU doing H_PROD).
      
      This fixes it by checking the DPDES (Directed Privileged Doorbell
      Exception Status) bit for the thread along with the other interrupt
      pending bits.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      1da4e2f4
    • P
      KVM: PPC: Book3S HV: Enable guests to use large decrementer mode on POWER9 · 1bc3fe81
      Paul Mackerras 提交于
      This allows userspace (e.g. QEMU) to enable large decrementer mode for
      the guest when running on a POWER9 host, by setting the LPCR_LD bit in
      the guest LPCR value.  With this, the guest exit code saves 64 bits of
      the guest DEC value on exit.  Other places that use the guest DEC
      value check the LPCR_LD bit in the guest LPCR value, and if it is set,
      omit the 32-bit sign extension that would otherwise be done.
      
      This doesn't change the DEC emulation used by PR KVM because PR KVM
      is not supported on POWER9 yet.
      
      This is partly based on an earlier patch by Oliver O'Halloran.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      1bc3fe81
  4. 16 6月, 2017 2 次提交
    • P
      KVM: PPC: Book3S HV: Ignore timebase offset on POWER9 DD1 · 3d3efb68
      Paul Mackerras 提交于
      POWER9 DD1 has an erratum where writing to the TBU40 register, which
      is used to apply an offset to the timebase, can cause the timebase to
      lose counts.  This results in the timebase on some CPUs getting out of
      sync with other CPUs, which then results in misbehaviour of the
      timekeeping code.
      
      To work around the problem, we make KVM ignore the timebase offset for
      all guests on POWER9 DD1 machines.  This means that live migration
      cannot be supported on POWER9 DD1 machines.
      
      Cc: stable@vger.kernel.org # v4.10+
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      3d3efb68
    • P
      KVM: PPC: Book3S HV: Save/restore host values of debug registers · 7ceaa6dc
      Paul Mackerras 提交于
      At present, HV KVM on POWER8 and POWER9 machines loses any instruction
      or data breakpoint set in the host whenever a guest is run.
      Instruction breakpoints are currently only used by xmon, but ptrace
      and the perf_event subsystem can set data breakpoints as well as xmon.
      
      To fix this, we save the host values of the debug registers (CIABR,
      DAWR and DAWRX) before entering the guest and restore them on exit.
      To provide space to save them in the stack frame, we expand the stack
      frame allocated by kvmppc_hv_entry() from 112 to 144 bytes.
      
      Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
      Cc: stable@vger.kernel.org # v3.14+
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      7ceaa6dc
  5. 15 6月, 2017 2 次提交
    • P
      KVM: PPC: Book3S HV: Preserve userspace HTM state properly · 46a704f8
      Paul Mackerras 提交于
      If userspace attempts to call the KVM_RUN ioctl when it has hardware
      transactional memory (HTM) enabled, the values that it has put in the
      HTM-related SPRs TFHAR, TFIAR and TEXASR will get overwritten by
      guest values.  To fix this, we detect this condition and save those
      SPR values in the thread struct, and disable HTM for the task.  If
      userspace goes to access those SPRs or the HTM facility in future,
      a TM-unavailable interrupt will occur and the handler will reload
      those SPRs and re-enable HTM.
      
      If userspace has started a transaction and suspended it, we would
      currently lose the transactional state in the guest entry path and
      would almost certainly get a "TM Bad Thing" interrupt, which would
      cause the host to crash.  To avoid this, we detect this case and
      return from the KVM_RUN ioctl with an EINVAL error, with the KVM
      exit reason set to KVM_EXIT_FAIL_ENTRY.
      
      Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
      Cc: stable@vger.kernel.org # v3.14+
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      46a704f8
    • P
      KVM: PPC: Book3S HV: Restore critical SPRs to host values on guest exit · 4c3bb4cc
      Paul Mackerras 提交于
      This restores several special-purpose registers (SPRs) to sane values
      on guest exit that were missed before.
      
      TAR and VRSAVE are readable and writable by userspace, and we need to
      save and restore them to prevent the guest from potentially affecting
      userspace execution (not that TAR or VRSAVE are used by any known
      program that run uses the KVM_RUN ioctl).  We save/restore these
      in kvmppc_vcpu_run_hv() rather than on every guest entry/exit.
      
      FSCR affects userspace execution in that it can prohibit access to
      certain facilities by userspace.  We restore it to the normal value
      for the task on exit from the KVM_RUN ioctl.
      
      IAMR is normally 0, and is restored to 0 on guest exit.  However,
      with a radix host on POWER9, it is set to a value that prevents the
      kernel from executing user-accessible memory.  On POWER9, we save
      IAMR on guest entry and restore it on guest exit to the saved value
      rather than 0.  On POWER8 we continue to set it to 0 on guest exit.
      
      PSPB is normally 0.  We restore it to 0 on guest exit to prevent
      userspace taking advantage of the guest having set it non-zero
      (which would allow userspace to set its SMT priority to high).
      
      UAMOR is normally 0.  We restore it to 0 on guest exit to prevent
      the AMR from being used as a covert channel between userspace
      processes, since the AMR is not context-switched at present.
      
      Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
      Cc: stable@vger.kernel.org # v3.14+
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      4c3bb4cc
  6. 13 6月, 2017 1 次提交
    • P
      KVM: PPC: Book3S HV: Context-switch EBB registers properly · ca8efa1d
      Paul Mackerras 提交于
      This adds code to save the values of three SPRs (special-purpose
      registers) used by userspace to control event-based branches (EBBs),
      which are essentially interrupts that get delivered directly to
      userspace.  These registers are loaded up with guest values when
      entering the guest, and their values are saved when exiting the
      guest, but we were not saving the host values and restoring them
      before going back to userspace.
      
      On POWER8 this would only affect userspace programs which explicitly
      request the use of EBBs and also use the KVM_RUN ioctl, since the
      only source of EBBs on POWER8 is the PMU, and there is an explicit
      enable bit in the PMU registers (and those PMU registers do get
      properly context-switched between host and guest).  On POWER9 there
      is provision for externally-generated EBBs, and these are not subject
      to the control in the PMU registers.
      
      Since these registers only affect userspace, we can save them when
      we first come in from userspace and restore them before returning to
      userspace, rather than saving/restoring the host values on every
      guest entry/exit.  Similarly, we don't need to worry about their
      values on offline secondary threads since they execute in the context
      of the idle task, which never executes in userspace.
      
      Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
      Cc: stable@vger.kernel.org # v3.14+
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      ca8efa1d
  7. 29 5月, 2017 1 次提交
    • P
      KVM: PPC: Book3S HV: Cope with host using large decrementer mode · 2f272463
      Paul Mackerras 提交于
      POWER9 introduces a new mode for the decrementer register, called
      large decrementer mode, in which the decrementer counter is 56 bits
      wide rather than 32, and reads are sign-extended rather than
      zero-extended.  For the decrementer, this new mode is optional and
      controlled by a bit in the LPCR.  The hypervisor decrementer (HDEC)
      is 56 bits wide on POWER9 and has no mode control.
      
      Since KVM code reads and writes the decrementer and hypervisor
      decrementer registers in a few places, it needs to be aware of the
      need to treat the decrementer value as a 64-bit quantity, and only do
      a 32-bit sign extension when large decrementer mode is not in effect.
      Similarly, the HDEC should always be treated as a 64-bit quantity on
      POWER9.  We define a new EXTEND_HDEC macro to encapsulate the feature
      test for POWER9 and the sign extension.
      
      To enable the sign extension to be removed in large decrementer mode,
      we test the LPCR_LD bit in the host LPCR image stored in the struct
      kvm for the guest.  If is set then large decrementer mode is enabled
      and the sign extension should be skipped.
      
      This is partly based on an earlier patch by Oliver O'Halloran.
      
      Cc: stable@vger.kernel.org # v4.10+
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      2f272463
  8. 19 5月, 2017 2 次提交
    • O
      devicetree: Move include prefixes from arch to separate directory · d5d332d3
      Olof Johansson 提交于
      We use a directory under arch/$ARCH/boot/dts as an include path
      that has links outside of the subtree to find dt-bindings from under
      include/dt-bindings. That's been working well, but new DT architectures
      haven't been adding them by default.
      
      Recently there's been a desire to share some of the DT material between
      arm and arm64, which originally caused developers to create symlinks or
      relative includes between the subtrees. This isn't ideal -- it breaks
      if the DT files aren't stored in the exact same hierarchy as the kernel
      tree, and generally it's just icky.
      
      As a somewhat cleaner solution we decided to add a $ARCH/ prefix link
      once, and allow DTS files to reference dtsi (and dts) files in other
      architectures that way.
      
      Original approach was to create these links under each architecture,
      but it lead to the problem of recursive symlinks.
      
      As a remedy, move the include link directories out of the architecture
      trees into a common location. At the same time, they can now share one
      directory and one dt-bindings/ link as well.
      
      Fixes: 4027494a ('ARM: dts: add arm/arm64 include symlinks')
      Reported-by: NRussell King <linux@armlinux.org.uk>
      Reported-by: NOmar Sandoval <osandov@osandov.com>
      Reviewed-by: NHeiko Stuebner <heiko@sntech.de>
      Reviewed-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Tested-by: NHeiko Stuebner <heiko@sntech.de>
      Acked-by: NRob Herring <robh@kernel.org>
      Cc: Heiko Stuebner <heiko@sntech.de>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Frank Rowand <frowand.list@gmail.com>
      Cc: linux-arch <linux-arch@vger.kernel.org>
      Signed-off-by: NOlof Johansson <olof@lixom.net>
      d5d332d3
    • M
      powerpc/mm: Fix virt_addr_valid() etc. on 64-bit hash · e41e53cd
      Michael Ellerman 提交于
      virt_addr_valid() is supposed to tell you if it's OK to call virt_to_page() on
      an address. What this means in practice is that it should only return true for
      addresses in the linear mapping which are backed by a valid PFN.
      
      We are failing to properly check that the address is in the linear mapping,
      because virt_to_pfn() will return a valid looking PFN for more or less any
      address. That bug is actually caused by __pa(), used in virt_to_pfn().
      
      eg: __pa(0xc000000000010000) = 0x10000  # Good
          __pa(0xd000000000010000) = 0x10000  # Bad!
          __pa(0x0000000000010000) = 0x10000  # Bad!
      
      This started happening after commit bdbc29c1 ("powerpc: Work around gcc
      miscompilation of __pa() on 64-bit") (Aug 2013), where we changed the definition
      of __pa() to work around a GCC bug. Prior to that we subtracted PAGE_OFFSET from
      the value passed to __pa(), meaning __pa() of a 0xd or 0x0 address would give
      you something bogus back.
      
      Until we can verify if that GCC bug is no longer an issue, or come up with
      another solution, this commit does the minimal fix to make virt_addr_valid()
      work, by explicitly checking that the address is in the linear mapping region.
      
      Fixes: bdbc29c1 ("powerpc: Work around gcc miscompilation of __pa() on 64-bit")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NPaul Mackerras <paulus@ozlabs.org>
      Reviewed-by: NBalbir Singh <bsingharora@gmail.com>
      Tested-by: NBreno Leitao <breno.leitao@gmail.com>
      e41e53cd
  9. 17 5月, 2017 1 次提交
    • M
      powerpc/mm: Fix crash in page table dump with huge pages · bfb9956a
      Michael Ellerman 提交于
      The page table dump code doesn't know about huge pages, so currently
      it crashes (or walks random memory, usually leading to a crash), if it
      finds a huge page. On Book3S we only see huge pages in the Linux page
      tables when we're using the P9 Radix MMU.
      
      Teaching the code to properly handle huge pages is a bit more involved,
      so for now just prevent the crash.
      
      Cc: stable@vger.kernel.org # v4.10+
      Fixes: 8eb07b18 ("powerpc/mm: Dump linux pagetables")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      bfb9956a
  10. 16 5月, 2017 2 次提交
  11. 15 5月, 2017 2 次提交
    • M
      powerpc/tm: Fix FP and VMX register corruption · f48e91e8
      Michael Neuling 提交于
      In commit dc310669 ("powerpc: tm: Always use fp_state and vr_state
      to store live registers"), a section of code was removed that copied
      the current state to checkpointed state. That code should not have been
      removed.
      
      When an FP (Floating Point) unavailable is taken inside a transaction,
      we need to abort the transaction. This is because at the time of the
      tbegin, the FP state is bogus so the state stored in the checkpointed
      registers is incorrect. To fix this, we treclaim (to get the
      checkpointed GPRs) and then copy the thread_struct FP live state into
      the checkpointed state. We then trecheckpoint so that the FP state is
      correctly restored into the CPU.
      
      The copying of the FP registers from live to checkpointed is what was
      missing.
      
      This simplifies the logic slightly from the original patch.
      tm_reclaim_thread() will now always write the checkpointed FP
      state. Either the checkpointed FP state will be written as part of
      the actual treclaim (in tm.S), or it'll be a copy of the live
      state. Which one we use is based on MSR[FP] from userspace.
      
      Similarly for VMX.
      
      Fixes: dc310669 ("powerpc: tm: Always use fp_state and vr_state to store live registers")
      Cc: stable@vger.kernel.org # 4.9+
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Reviewed-by: cyrilbur@gmail.com
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      f48e91e8
    • M
      powerpc/modules: If mprofile-kernel is enabled add it to vermagic · 43e24e82
      Michael Ellerman 提交于
      On powerpc we can build the kernel with two different ABIs for mcount(), which
      is used by ftrace. Kernels built with one ABI do not know how to load modules
      built with the other ABI. The new style ABI is called "mprofile-kernel", for
      want of a better name.
      
      Currently if we build a module using the old style ABI, and the kernel with
      mprofile-kernel, when we load the module we'll oops something like:
      
        # insmod autofs4-no-mprofile-kernel.ko
        ftrace-powerpc: Unexpected instruction f8810028 around bl _mcount
        ------------[ cut here ]------------
        WARNING: CPU: 6 PID: 3759 at ../kernel/trace/ftrace.c:2024 ftrace_bug+0x2b8/0x3c0
        CPU: 6 PID: 3759 Comm: insmod Not tainted 4.11.0-rc3-gcc-5.4.1-00017-g5a61ef74 #11
        ...
        NIP [c0000000001eaa48] ftrace_bug+0x2b8/0x3c0
        LR [c0000000001eaff8] ftrace_process_locs+0x4a8/0x590
        Call Trace:
          alloc_pages_current+0xc4/0x1d0 (unreliable)
          ftrace_process_locs+0x4a8/0x590
          load_module+0x1c8c/0x28f0
          SyS_finit_module+0x110/0x140
          system_call+0x38/0xfc
        ...
        ftrace failed to modify
        [<d000000002a31024>] 0xd000000002a31024
         actual:   35:65:00:48
      
      We can avoid this by including in the vermagic whether the kernel/module was
      built with mprofile-kernel. Which results in:
      
        # insmod autofs4-pg.ko
        autofs4: version magic
        '4.11.0-rc3-gcc-5.4.1-00017-g5a61ef74 SMP mod_unload modversions '
        should be
        '4.11.0-rc3-gcc-5.4.1-00017-g5a61ef74-dirty SMP mod_unload modversions mprofile-kernel'
        insmod: ERROR: could not insert module autofs4-pg.ko: Invalid module format
      
      Fixes: 8c50b72a ("powerpc/ftrace: Add Kconfig & Make glue for mprofile-kernel")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Acked-by: NBalbir Singh <bsingharora@gmail.com>
      Acked-by: NJessica Yu <jeyu@redhat.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      43e24e82
  12. 12 5月, 2017 3 次提交
    • P
      KVM: PPC: Book3S PR: Don't include SPAPR TCE code on non-pseries platforms · 76d837a4
      Paul Mackerras 提交于
      Commit e91aa8e6 ("KVM: PPC: Enable IOMMU_API for KVM_BOOK3S_64
      permanently", 2017-03-22) enabled the SPAPR TCE code for all 64-bit
      Book 3S kernel configurations in order to simplify the code and
      reduce #ifdefs.  However, 64-bit Book 3S PPC platforms other than
      pseries and powernv don't implement the necessary IOMMU callbacks,
      leading to build failures like the following (for a pasemi config):
      
      scripts/kconfig/conf  --silentoldconfig Kconfig
      warning: (KVM_BOOK3S_64) selects SPAPR_TCE_IOMMU which has unmet direct dependencies (IOMMU_SUPPORT && (PPC_POWERNV || PPC_PSERIES))
      
      ...
      
        CC [M]  arch/powerpc/kvm/book3s_64_vio.o
      /home/paulus/kernel/kvm/arch/powerpc/kvm/book3s_64_vio.c: In function ‘kvmppc_clear_tce’:
      /home/paulus/kernel/kvm/arch/powerpc/kvm/book3s_64_vio.c:363:2: error: implicit declaration of function ‘iommu_tce_xchg’ [-Werror=implicit-function-declaration]
        iommu_tce_xchg(tbl, entry, &hpa, &dir);
        ^
      
      To fix this, we make the inclusion of the SPAPR TCE support, and the
      code that uses it in book3s_vio.c and book3s_vio_hv.c, depend on
      the inclusion of support for the pseries and/or powernv platforms.
      This means that when running a 'pseries' guest on those platforms,
      the guest won't have in-kernel acceleration of the PAPR TCE hypercalls,
      but at least now they compile.
      Reviewed-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      76d837a4
    • P
      KVM: PPC: Book3S PR: Check copy_to/from_user return values · 67325e98
      Paul Mackerras 提交于
      The PR KVM implementation of the PAPR HPT hypercalls (H_ENTER etc.)
      access an image of the HPT in userspace memory using copy_from_user
      and copy_to_user.  Recently, the declarations of those functions were
      annotated to indicate that the return value must be checked.  Since
      this code doesn't currently check the return value, this causes
      compile warnings like the ones shown below, and since on PPC the
      default is to compile arch/powerpc with -Werror, this causes the
      build to fail.
      
      To fix this, we check the return values, and if non-zero, fail the
      hypercall being processed with a H_FUNCTION error return value.
      There is really no good error return value to use since PAPR didn't
      envisage the possibility that the hypervisor may not be able to access
      the guest's HPT, and H_FUNCTION (function not supported) seems as
      good as any.
      
      The typical compile warnings look like this:
      
        CC      arch/powerpc/kvm/book3s_pr_papr.o
      /home/paulus/kernel/kvm/arch/powerpc/kvm/book3s_pr_papr.c: In function ‘kvmppc_h_pr_enter’:
      /home/paulus/kernel/kvm/arch/powerpc/kvm/book3s_pr_papr.c:53:2: error: ignoring return value of ‘copy_from_user’, declared with attribute warn_unused_result [-Werror=unused-result]
        copy_from_user(pteg, (void __user *)pteg_addr, sizeof(pteg));
        ^
      /home/paulus/kernel/kvm/arch/powerpc/kvm/book3s_pr_papr.c:74:2: error: ignoring return value of ‘copy_to_user’, declared with attribute warn_unused_result [-Werror=unused-result]
        copy_to_user((void __user *)pteg_addr, hpte, HPTE_SIZE);
        ^
      
      ... etc.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      67325e98
    • P
      KVM: PPC: Book3S HV: Add radix checks in real-mode hypercall handlers · acde2572
      Paul Mackerras 提交于
      POWER9 running a radix guest will take some hypervisor interrupts
      without going to real mode (turning off the MMU).  This means that
      early hypercall handlers may now be called in virtual mode.  Most of
      the handlers work just fine in both modes, but there are some that
      can crash the host if called in virtual mode, notably the TCE (IOMMU)
      hypercalls H_PUT_TCE, H_STUFF_TCE and H_PUT_TCE_INDIRECT.  These
      already have both a real-mode and a virtual-mode version, so we
      arrange for the real-mode version to return H_TOO_HARD for radix
      guests, which will result in the virtual-mode version being called.
      
      The other hypercall which is sensitive to the MMU mode is H_RANDOM.
      It doesn't have a virtual-mode version, so this adds code to enable
      it to be called in either mode.
      
      An alternative solution was considered which would refuse to call any
      of the early hypercall handlers when doing a virtual-mode exit from a
      radix guest.  However, the XICS-on-XIVE code depends on the XICS
      hypercalls being handled early even for virtual-mode exits, because
      the handlers need to be called before the XIVE vCPU state has been
      pulled off the hardware.  Therefore that solution would have become
      quite invasive and complicated, and was rejected in favour of the
      simpler, though less elegant, solution presented here.
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Tested-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      acde2572
  13. 10 5月, 2017 1 次提交
    • N
      uapi: export all headers under uapi directories · fcc8487d
      Nicolas Dichtel 提交于
      Regularly, when a new header is created in include/uapi/, the developer
      forgets to add it in the corresponding Kbuild file. This error is usually
      detected after the release is out.
      
      In fact, all headers under uapi directories should be exported, thus it's
      useless to have an exhaustive list.
      
      After this patch, the following files, which were not exported, are now
      exported (with make headers_install_all):
      asm-arc/kvm_para.h
      asm-arc/ucontext.h
      asm-blackfin/shmparam.h
      asm-blackfin/ucontext.h
      asm-c6x/shmparam.h
      asm-c6x/ucontext.h
      asm-cris/kvm_para.h
      asm-h8300/shmparam.h
      asm-h8300/ucontext.h
      asm-hexagon/shmparam.h
      asm-m32r/kvm_para.h
      asm-m68k/kvm_para.h
      asm-m68k/shmparam.h
      asm-metag/kvm_para.h
      asm-metag/shmparam.h
      asm-metag/ucontext.h
      asm-mips/hwcap.h
      asm-mips/reg.h
      asm-mips/ucontext.h
      asm-nios2/kvm_para.h
      asm-nios2/ucontext.h
      asm-openrisc/shmparam.h
      asm-parisc/kvm_para.h
      asm-powerpc/perf_regs.h
      asm-sh/kvm_para.h
      asm-sh/ucontext.h
      asm-tile/shmparam.h
      asm-unicore32/shmparam.h
      asm-unicore32/ucontext.h
      asm-x86/hwcap2.h
      asm-xtensa/kvm_para.h
      drm/armada_drm.h
      drm/etnaviv_drm.h
      drm/vgem_drm.h
      linux/aspeed-lpc-ctrl.h
      linux/auto_dev-ioctl.h
      linux/bcache.h
      linux/btrfs_tree.h
      linux/can/vxcan.h
      linux/cifs/cifs_mount.h
      linux/coresight-stm.h
      linux/cryptouser.h
      linux/fsmap.h
      linux/genwqe/genwqe_card.h
      linux/hash_info.h
      linux/kcm.h
      linux/kcov.h
      linux/kfd_ioctl.h
      linux/lightnvm.h
      linux/module.h
      linux/nbd-netlink.h
      linux/nilfs2_api.h
      linux/nilfs2_ondisk.h
      linux/nsfs.h
      linux/pr.h
      linux/qrtr.h
      linux/rpmsg.h
      linux/sched/types.h
      linux/sed-opal.h
      linux/smc.h
      linux/smc_diag.h
      linux/stm.h
      linux/switchtec_ioctl.h
      linux/vfio_ccw.h
      linux/wil6210_uapi.h
      rdma/bnxt_re-abi.h
      
      Note that I have removed from this list the files which are generated in every
      exported directories (like .install or .install.cmd).
      
      Thanks to Julien Floret <julien.floret@6wind.com> for the tip to get all
      subdirs with a pure makefile command.
      
      For the record, note that exported files for asm directories are a mix of
      files listed by:
       - include/uapi/asm-generic/Kbuild.asm;
       - arch/<arch>/include/uapi/asm/Kbuild;
       - arch/<arch>/include/asm/Kbuild.
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      Acked-by: NRussell King <rmk+kernel@armlinux.org.uk>
      Acked-by: NMark Salter <msalter@redhat.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      fcc8487d
  14. 09 5月, 2017 8 次提交
  15. 05 5月, 2017 1 次提交
    • S
      powerpc/64e: Don't place the stack beyond TASK_SIZE · 61baf155
      Scott Wood 提交于
      Commit f4ea6dcb ("powerpc/mm: Enable mappings above 128TB") increased
      the task size on book3s, and introduced a mechanism to dynamically
      control whether a task uses these larger addresses.  While the change to
      the task size itself was ifdef-protected to only apply on book3s, the
      change to STACK_TOP_USER64 was not.  On book3e, this had the effect of
      trying to use addresses up to 128TiB for the stack despite a 64TiB task
      size limit -- which broke 64-bit userspace producing the following errors:
      
      Starting init: /sbin/init exists but couldn't execute it (error -14)
      Starting init: /bin/sh exists but couldn't execute it (error -14)
      Kernel panic - not syncing: No working init found.  Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.
      
      Fixes: f4ea6dcb ("powerpc/mm: Enable mappings above 128TB")
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NScott Wood <oss@buserror.net>
      61baf155
  16. 04 5月, 2017 1 次提交
  17. 03 5月, 2017 5 次提交
    • N
      powerpc/64s: Power9 has no LPCR[VRMASD] field so don't set it · 700b7ead
      Nicholas Piggin 提交于
      Power9/ISAv3 has no VRMASD field in LPCR, we shouldn't be setting reserved bits,
      so don't set them on Power9.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      700b7ead
    • A
      powerpc/powernv: Fix TCE kill on NVLink2 · 6b3d12a9
      Alistair Popple 提交于
      Commit 616badd2 ("powerpc/powernv: Use OPAL call for TCE kill on
      NVLink2") forced all TCE kills to go via the OPAL call for
      NVLink2. However the PHB3 implementation of TCE kill was still being
      called directly from some functions which in some circumstances caused
      a machine check.
      
      This patch adds an equivalent IODA2 version of the function which uses
      the correct invalidation method depending on PHB model and changes all
      external callers to use it instead.
      
      Fixes: 616badd2 ("powerpc/powernv: Use OPAL call for TCE kill on NVLink2")
      Cc: stable@vger.kernel.org # v4.11+
      Signed-off-by: NAlistair Popple <alistair@popple.id.au>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      6b3d12a9
    • M
      powerpc/mm/radix: Drop support for CPUs without lockless tlbie · 3c9ac2bc
      Michael Ellerman 提交于
      Currently the radix TLB code includes support for CPUs that do *not*
      have MMU_FTR_LOCKLESS_TLBIE. On those CPUs we are required to take a
      global spinlock before issuing a tlbie.
      
      Radix can only be built for 64-bit Book3s CPUs, and of those, only
      POWER4, 970, Cell and PA6T do not have MMU_FTR_LOCKLESS_TLBIE. Although
      it's possible to build a kernel with Radix support that can also boot on
      those CPUs, we happen to know that in reality none of those CPUs support
      the Radix MMU, so the code can never actually run on those CPUs.
      
      So remove the native_tlbie_lock in the Radix TLB code.
      
      Note that there is another lock of the same name in the hash code, which
      is unaffected by this patch.
      Reviewed-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      3c9ac2bc
    • M
      powerpc/book3s/mce: Move add_taint() later in virtual mode · d93b0ac0
      Mahesh Salgaonkar 提交于
      machine_check_early() gets called in real mode. The very first time when
      add_taint() is called, it prints a warning which ends up calling opal
      call (that uses OPAL_CALL wrapper) for writing it to console. If we get a
      very first machine check while we are in opal we are doomed. OPAL_CALL
      overwrites the PACASAVEDMSR in r13 and in this case when we are done with
      MCE handling the original opal call will use this new MSR on it's way
      back to opal_return. This usually leads to unexpected behaviour or the
      kernel to panic. Instead move the add_taint() call later in the virtual
      mode where it is safe to call.
      
      This is broken with current FW level. We got lucky so far for not getting
      very first MCE hit while in OPAL. But easily reproducible on Mambo.
      
      Fixes: 27ea2c42 ("powerpc: Set the correct kernel taint on machine check errors.")
      Cc: stable@vger.kernel.org # v4.2+
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      d93b0ac0
    • M
      powerpc/sysfs: Move #ifdef CONFIG_HOTPLUG_CPU out of the function body · 3f2290e1
      Michael Ellerman 提交于
      The entire body of unregister_cpu_online() is inside an #ifdef
      CONFIG_HOTPLUG_CPU block. This is ugly and means we create an empty function
      when hotplug is disabled for no reason.
      
      Instead move the #ifdef out of the function body and define the function to be
      NULL in the else case. This means we'll pass NULL to cpuhp_setup_state(), but
      that's fine because it accepts NULL to mean there is no teardown callback, which
      is exactly what we want.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      3f2290e1