1. 19 June 2017, 2 commits
    • KVM: PPC: Book3S HV: Context-switch HFSCR between host and guest on POWER9 · 769377f7
      Committed by Paul Mackerras
      This adds code to allow us to use a different value for the HFSCR
      (Hypervisor Facilities Status and Control Register) when running the
      guest from that which applies in the host.  The reason for doing this
      is to allow us to trap the msgsndp instruction and related operations
      in future so that they can be virtualized.  We also save the value of
      HFSCR when a hypervisor facility unavailable interrupt occurs, because
      the high byte of HFSCR indicates which facility the guest attempted to
      access.
      
      We save and restore the host value on guest entry/exit because some
      bits of it affect host userspace execution.
      
      We only do all this on POWER9, not on POWER8, because we are not
      intending to virtualize any of the facilities controlled by HFSCR on
      POWER8.  In particular, the HFSCR bit that controls execution of
      msgsndp and related operations does not exist on POWER8.  The HFSCR
      doesn't exist at all on POWER7.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
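      As an illustration only (a sketch, not the patch itself; the helper name is
      made up and the field/SPR names are assumed from context):

        /* Sketch: swap host and guest HFSCR around guest entry/exit (POWER9 only). */
        static void sketch_switch_hfscr(struct kvm_vcpu *vcpu)
        {
                unsigned long host_hfscr = mfspr(SPRN_HFSCR);   /* save host value */

                mtspr(SPRN_HFSCR, vcpu->arch.hfscr);            /* install guest value */
                /* ... run the guest ... */
                vcpu->arch.hfscr = mfspr(SPRN_HFSCR);           /* high byte identifies the
                                                                   facility on an HFAC interrupt */
                mtspr(SPRN_HFSCR, host_hfscr);                  /* restore host value on exit */
        }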
    • KVM: PPC: Book3S HV: Enable guests to use large decrementer mode on POWER9 · 1bc3fe81
      Committed by Paul Mackerras
      This allows userspace (e.g. QEMU) to enable large decrementer mode for
      the guest when running on a POWER9 host, by setting the LPCR_LD bit in
      the guest LPCR value.  With this, the guest exit code saves 64 bits of
      the guest DEC value on exit.  Other places that use the guest DEC
      value check the LPCR_LD bit in the guest LPCR value, and if it is set,
      omit the 32-bit sign extension that would otherwise be done.
      
      This doesn't change the DEC emulation used by PR KVM because PR KVM
      is not supported on POWER9 yet.
      
      This is partly based on an earlier patch by Oliver O'Halloran.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
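      A minimal sketch of the conditional sign extension described above (the
      helper is hypothetical; field names are assumed from context):

        /* Sketch: interpret the guest DEC value saved at exit according to LPCR_LD. */
        static u64 sketch_read_guest_dec(struct kvm_vcpu *vcpu, u64 dec)
        {
                if (!(vcpu->arch.vcore->lpcr & LPCR_LD))
                        dec = (u64)(s64)(s32)dec;   /* legacy mode: 32-bit, sign-extended */
                return dec;                         /* large decrementer: full 64 bits */
        }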
  2. 19 May 2017, 1 commit
    • powerpc/mm: Fix virt_addr_valid() etc. on 64-bit hash · e41e53cd
      Committed by Michael Ellerman
      virt_addr_valid() is supposed to tell you if it's OK to call virt_to_page() on
      an address. What this means in practice is that it should only return true for
      addresses in the linear mapping which are backed by a valid PFN.
      
      We are failing to properly check that the address is in the linear mapping,
      because virt_to_pfn() will return a valid-looking PFN for more or less any
      address. That bug is actually caused by __pa(), used in virt_to_pfn().
      
      eg: __pa(0xc000000000010000) = 0x10000  # Good
          __pa(0xd000000000010000) = 0x10000  # Bad!
          __pa(0x0000000000010000) = 0x10000  # Bad!
      
      This started happening after commit bdbc29c1 ("powerpc: Work around gcc
      miscompilation of __pa() on 64-bit") (Aug 2013), where we changed the definition
      of __pa() to work around a GCC bug. Prior to that we subtracted PAGE_OFFSET from
      the value passed to __pa(), meaning __pa() of a 0xd or 0x0 address would give
      you something bogus back.
      
      Until we can verify if that GCC bug is no longer an issue, or come up with
      another solution, this commit does the minimal fix to make virt_addr_valid()
      work, by explicitly checking that the address is in the linear mapping region.
      
      Fixes: bdbc29c1 ("powerpc: Work around gcc miscompilation of __pa() on 64-bit")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
      Reviewed-by: Balbir Singh <bsingharora@gmail.com>
      Tested-by: Breno Leitao <breno.leitao@gmail.com>
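      A sketch of the kind of check this describes (close to, but not necessarily
      identical to, the definition the patch adds for 64-bit hash):

        /* Sketch: only linear-mapping (kernel region) addresses can be valid. */
        #define virt_addr_valid(kaddr)  (REGION_ID(kaddr) == KERNEL_REGION_ID && \
                                         pfn_valid(virt_to_pfn(kaddr)))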
  3. 15 May 2017, 1 commit
    • powerpc/modules: If mprofile-kernel is enabled add it to vermagic · 43e24e82
      Committed by Michael Ellerman
      On powerpc we can build the kernel with two different ABIs for mcount(), which
      is used by ftrace. Kernels built with one ABI do not know how to load modules
      built with the other ABI. The new style ABI is called "mprofile-kernel", for
      want of a better name.
      
      Currently if we build a module using the old style ABI, and the kernel with
      mprofile-kernel, when we load the module we'll oops something like:
      
        # insmod autofs4-no-mprofile-kernel.ko
        ftrace-powerpc: Unexpected instruction f8810028 around bl _mcount
        ------------[ cut here ]------------
        WARNING: CPU: 6 PID: 3759 at ../kernel/trace/ftrace.c:2024 ftrace_bug+0x2b8/0x3c0
        CPU: 6 PID: 3759 Comm: insmod Not tainted 4.11.0-rc3-gcc-5.4.1-00017-g5a61ef74 #11
        ...
        NIP [c0000000001eaa48] ftrace_bug+0x2b8/0x3c0
        LR [c0000000001eaff8] ftrace_process_locs+0x4a8/0x590
        Call Trace:
          alloc_pages_current+0xc4/0x1d0 (unreliable)
          ftrace_process_locs+0x4a8/0x590
          load_module+0x1c8c/0x28f0
          SyS_finit_module+0x110/0x140
          system_call+0x38/0xfc
        ...
        ftrace failed to modify
        [<d000000002a31024>] 0xd000000002a31024
         actual:   35:65:00:48
      
      We can avoid this by including in the vermagic whether the kernel/module was
      built with mprofile-kernel, which results in:
      
        # insmod autofs4-pg.ko
        autofs4: version magic
        '4.11.0-rc3-gcc-5.4.1-00017-g5a61ef74 SMP mod_unload modversions '
        should be
        '4.11.0-rc3-gcc-5.4.1-00017-g5a61ef74-dirty SMP mod_unload modversions mprofile-kernel'
        insmod: ERROR: could not insert module autofs4-pg.ko: Invalid module format
      
      Fixes: 8c50b72a ("powerpc/ftrace: Add Kconfig & Make glue for mprofile-kernel")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
      Acked-by: Jessica Yu <jeyu@redhat.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
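      The hook used for this is the arch-specific vermagic string; a sketch of the
      powerpc side (paraphrased, not a verbatim copy of the patch):

        /* Sketch: make the mcount ABI part of the module version magic. */
        #ifdef CONFIG_MPROFILE_KERNEL
        #define MODULE_ARCH_VERMAGIC    "mprofile-kernel"
        #endif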
  4. 09 May 2017, 3 commits
  5. 05 May 2017, 1 commit
    • powerpc/64e: Don't place the stack beyond TASK_SIZE · 61baf155
      Committed by Scott Wood
      Commit f4ea6dcb ("powerpc/mm: Enable mappings above 128TB") increased
      the task size on book3s, and introduced a mechanism to dynamically
      control whether a task uses these larger addresses.  While the change to
      the task size itself was ifdef-protected to only apply on book3s, the
      change to STACK_TOP_USER64 was not.  On book3e, this had the effect of
      trying to use addresses up to 128TiB for the stack despite a 64TiB task
      size limit -- which broke 64-bit userspace producing the following errors:
      
      Starting init: /sbin/init exists but couldn't execute it (error -14)
      Starting init: /bin/sh exists but couldn't execute it (error -14)
      Kernel panic - not syncing: No working init found.  Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.
      
      Fixes: f4ea6dcb ("powerpc/mm: Enable mappings above 128TB")
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Scott Wood <oss@buserror.net>
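      A sketch of the kind of guard involved (the macro values are assumed from
      context, not copied from the patch):

        /* Sketch: only book3s gets the enlarged stack top; book3e keeps its task size. */
        #ifdef CONFIG_PPC_BOOK3S_64
        #define STACK_TOP_USER64        TASK_SIZE_128TB
        #else
        #define STACK_TOP_USER64        TASK_SIZE_USER64
        #endif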
  6. 03 May 2017, 1 commit
    • powerpc/8xx: Adding support of IRQ in MPC8xx GPIO · 726bd223
      Committed by Christophe Leroy
      This patch allows the use of an IRQ to signal a change of GPIO status
      on the MPC8xx CPM IO ports. This then allows IRQs to be associated with
      GPIOs in the Device Tree.
      
      Ex:
      	CPM1_PIO_C: gpio-controller@960 {
      		#gpio-cells = <2>;
      		compatible = "fsl,cpm1-pario-bank-c";
      		reg = <0x960 0x10>;
      		fsl,cpm1-gpio-irq-mask = <0x0fff>;
      		interrupts = <1 2 6 9 10 11 14 15 23 24 26 31>;
      		interrupt-parent = <&CPM_PIC>;
      		gpio-controller;
      	};
      
      The property 'fsl,cpm1-gpio-irq-mask' defines which of the 16 GPIOs
      have the associated interrupts defined in the 'interrupts' property.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Scott Wood <oss@buserror.net>
  7. 28 April 2017, 8 commits
  8. 27 April 2017, 2 commits
  9. 24 April 2017, 3 commits
    • powerpc/mm: Ensure IRQs are off in switch_mm() · 9765ad13
      Committed by David Gibson
      powerpc expects IRQs to already be (soft) disabled when switch_mm() is
      called, as made clear in the commit message of 9c1e1052 ("powerpc: Allow
      perf_counters to access user memory at interrupt time").
      
      Aside from any race conditions that might exist between switch_mm() and an IRQ,
      there is also an unconditional hard_irq_disable() in switch_slb(). If that isn't
      followed at some point by an IRQ enable then interrupts will remain disabled
      until we return to userspace.
      
      It is true that when switch_mm() is called from the scheduler IRQs are off, but
      not when it's called by use_mm(). Looking closer we see that last year in commit
      f98db601 ("sched/core: Add switch_mm_irqs_off() and use it in the scheduler")
      this was made more explicit by the addition of switch_mm_irqs_off() which is now
      called by the scheduler, vs switch_mm() which is used by use_mm().
      
      Arguably it is a bug in use_mm() to call switch_mm() in a different context than
      it expects, but fixing that will take time.
      
      This was discovered recently when vhost started throwing warnings such as:
      
        BUG: sleeping function called from invalid context at kernel/mutex.c:578
        in_atomic(): 0, irqs_disabled(): 1, pid: 10768, name: vhost-10760
        no locks held by vhost-10760/10768.
        irq event stamp: 10
        hardirqs last  enabled at (9):  _raw_spin_unlock_irq+0x40/0x80
        hardirqs last disabled at (10): switch_slb+0x2e4/0x490
        softirqs last  enabled at (0):  copy_process+0x5e8/0x1260
        softirqs last disabled at (0):  (null)
        Call Trace:
          show_stack+0x88/0x390 (unreliable)
          dump_stack+0x30/0x44
          __might_sleep+0x1c4/0x2d0
          mutex_lock_nested+0x74/0x5c0
          cgroup_attach_task_all+0x5c/0x180
          vhost_attach_cgroups_work+0x58/0x80 [vhost]
          vhost_worker+0x24c/0x3d0 [vhost]
          kthread+0xec/0x100
          ret_from_kernel_thread+0x5c/0xd4
      
      Prior to commit 04b96e55 ("vhost: lockless enqueuing") (Aug 2016) the
      vhost_worker() would do a spin_unlock_irq() not long after calling use_mm(),
      which had the effect of reenabling IRQs. Since that commit removed the locking
      in vhost_worker() the body of the vhost_worker() loop now runs with interrupts
      off causing the warnings.
      
      This patch addresses the problem by making the powerpc code mirror the x86 code,
      ie. we disable interrupts in switch_mm(), and optimise the scheduler case by
      defining switch_mm_irqs_off().
      
      Cc: stable@vger.kernel.org # v4.7+
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      [mpe: Flesh out/rewrite change log, add stable]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
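      A minimal sketch of the pattern described, mirroring x86 (simplified; the
      original body of switch_mm() is elided):

        /* Sketch: the real context-switch work moves into switch_mm_irqs_off(). */
        static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
                                     struct task_struct *tsk)
        {
                unsigned long flags;

                local_irq_save(flags);                  /* callers such as use_mm() may
                                                           arrive with IRQs enabled */
                switch_mm_irqs_off(prev, next, tsk);    /* the scheduler calls this directly */
                local_irq_restore(flags);
        }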
    • powerpc: Introduce a new helper to obtain function entry points · 1b32cd17
      Committed by Naveen N. Rao
      kprobe_lookup_name() is specific to the kprobe subsystem and, once a
      subsequent patch adds KPROBES_ON_FTRACE support, may not always return the
      function entry point. For looking up function entry points, introduce a
      separate helper and use it in optprobes.c.
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/kprobes: Add support for KPROBES_ON_FTRACE · ead514d5
      Committed by Naveen N. Rao
      Allow kprobes to be placed on ftrace _mcount() call sites. This optimization
      avoids the use of a trap by riding on the ftrace infrastructure.
      
      This depends on HAVE_DYNAMIC_FTRACE_WITH_REGS which depends on MPROFILE_KERNEL,
      which is only currently enabled on powerpc64le with newer toolchains.
      
      Based on the x86 code by Masami.
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  10. 23 April 2017, 8 commits
  11. 21 April 2017, 1 commit
  12. 20 April 2017, 9 commits
    • kprobes: Convert kprobe_lookup_name() to a function · 49e0b465
      Committed by Naveen N. Rao
      The macro is now pretty long and ugly on powerpc. In light of further
      changes needed here, convert it to a __weak variant to be overridden with a
      nicer-looking function.
      Suggested-by: Masami Hiramatsu <mhiramat@kernel.org>
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
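      A sketch of the __weak pattern this enables (the exact signature in the tree
      may differ):

        /* Sketch: generic default lookup, which an architecture can override. */
        kprobe_opcode_t * __weak kprobe_lookup_name(const char *name)
        {
                return (kprobe_opcode_t *)kallsyms_lookup_name(name);
        }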
    • powerpc/64s: Use relon prolog for EXC_VIRT_OOL_MASKABLE_HV handlers · a050d20d
      Committed by Nicholas Piggin
      Hypervisor Virtualization and Directed Hypervisor Doorbell interrupt handlers
      use the macro EXC_VIRT_OOL_MASKABLE_HV for their relocation-on handlers, which
      calls MASKABLE_RELON_EXCEPTION_HV_OOL, which uses the *real mode* interrupt
      prolog. This means we needlessly rfid from virtual mode to virtual mode.
      
      For POWER8 it only affects doorbell IPIs. Context switch microbenchmark between
      threads with snooze disabled (which causes IPI) gets about 3% faster, about 370
      cycles. Should be more important on POWER9 with global doorbells and HVI for
      host interrupts.
      
      Use the RELON variant instead to reduce overhead.
      
      Fixes: 1707dd16 ("powerpc: Save CFAR before branching in interrupt entry paths")
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Fold some more detail into the change log]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • KVM: PPC: VFIO: Add in-kernel acceleration for VFIO · 121f80ba
      Committed by Alexey Kardashevskiy
      This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
      and H_STUFF_TCE requests targeted at an IOMMU TCE table used for VFIO
      without passing them to user space, which saves the time spent switching
      to user space and back.
      
      This adds H_PUT_TCE/H_PUT_TCE_INDIRECT/H_STUFF_TCE handlers to KVM.
      KVM tries to handle a TCE request in real mode; if that fails,
      it passes the request to the virtual mode handler to complete the operation.
      If the virtual mode handler also fails, the request is passed to
      user space; this is not expected to happen though.
      
      To avoid dealing with page use counters (which is tricky in real mode),
      this only accelerates SPAPR TCE IOMMU v2 clients which are required
      to pre-register the userspace memory. The very first TCE request will
      be handled in the VFIO SPAPR TCE driver anyway as the userspace view
      of the TCE table (iommu_table::it_userspace) is not allocated till
      the very first mapping happens and we cannot call vmalloc in real mode.
      
      If we fail to update a hardware IOMMU table for an unexpected reason, we just
      clear it and move on, as there is nothing we can really do about it -
      for example, if we hot plug a VFIO device into a guest, existing TCE tables
      will be mirrored automatically to the hardware and there is no interface
      to report possible failures to the guest.
      
      This adds new attribute - KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE - to
      the VFIO KVM device. It takes a VFIO group fd and SPAPR TCE table fd
      and associates a physical IOMMU table with the SPAPR TCE table (which
      is a guest view of the hardware IOMMU table). The iommu_table object
      is cached and referenced so we do not have to look up for it in real mode.
      
      This does not implement the UNSET counterpart as there is no use for it -
      once the acceleration is enabled, the existing userspace won't
      disable it unless a VFIO container is destroyed; this adds necessary
      cleanup to the KVM_DEV_VFIO_GROUP_DEL handler.
      
      This advertises the new KVM_CAP_SPAPR_TCE_VFIO capability to the user
      space.
      
      This adds a real mode version of WARN_ON_ONCE(), as the generic version
      causes problems with rcu_sched. Since we are testing what vmalloc_to_phys()
      returns in the code, this also adds a check on the already existing
      vmalloc_to_phys() call in kvmppc_rm_h_put_tce_indirect().
      
      This finally makes use of vfio_external_user_iommu_id() which was
      introduced quite some time ago and was considered for removal.
      
      Tests show that this patch increases transmission speed from 220MB/s
      to 750..1020MB/s on a 10Gb network (Chelsio CXGB3 10Gb ethernet card).
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
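      A sketch of how userspace wires a VFIO group to a TCE table through the new
      attribute (error handling omitted; the fd variables are assumed to have been
      obtained via the usual VFIO setup and KVM_CREATE_SPAPR_TCE ioctls):

        /* Sketch: attach a VFIO group fd and a SPAPR TCE table fd on the KVM VFIO device. */
        struct kvm_vfio_spapr_tce param = {
                .groupfd = vfio_group_fd,
                .tablefd = spapr_tce_fd,
        };
        struct kvm_device_attr attr = {
                .group = KVM_DEV_VFIO_GROUP,
                .attr  = KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE,
                .addr  = (__u64)(uintptr_t)&param,
        };
        ioctl(kvm_vfio_device_fd, KVM_SET_DEVICE_ATTR, &attr);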
    • KVM: PPC: iommu: Unify TCE checking · b1af23d8
      Committed by Alexey Kardashevskiy
      This reworks the helpers for checking TCE update parameters so that they
      can be used in KVM.
      
      This should cause no behavioral change.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Pass kvm* to kvmppc_find_table() · 503bfcbe
      Committed by Alexey Kardashevskiy
      The guest view TCE tables are per KVM anyway (not per VCPU) so pass kvm*
      there. This will be used in the following patches where we will be
      attaching VFIO containers to LIOBNs via ioctl() to KVM (rather than
      to VCPU).
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S PR: Preserve storage control bits · 96df2267
      Committed by Alexey Kardashevskiy
      The PR KVM page fault handler performs eaddr-to-pte translation for a guest;
      however, kvmppc_mmu_book3s_64_xlate() does not preserve the WIMG bits
      (storage control) in the kvmppc_pte struct. If PR KVM is running as
      a second-level guest under HV KVM and tries to insert an HPT entry,
      this fails in HV KVM if it already has this mapping.
      
      This preserves WIMG bits between kvmppc_mmu_book3s_64_xlate() and
      kvmppc_mmu_map_page().
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Add MMIO emulation for remaining floating-point instructions · 9b5ab005
      Committed by Paul Mackerras
      For completeness, this adds emulation of the lfiwax and lfiwzx
      instructions.  With this, all floating-point load and store instructions
      as of Power ISA V2.07 are emulated.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Emulation for more integer loads and stores · ceba57df
      Committed by Paul Mackerras
      This adds emulation for the following integer loads and stores,
      thus enabling them to be used in a guest for accessing emulated
      MMIO locations.
      
      - lhaux
      - lwaux
      - lwzux
      - ldu
      - lwa
      - stdux
      - stwux
      - stdu
      - ldbrx
      - stdbrx
      
      Previously, most of these would cause an emulation failure exit to
      userspace, though ldu and lwa got treated incorrectly as ld, and
      stdu got treated incorrectly as std.
      
      This also tidies up some of the formatting and updates the comment
      listing instructions that still need to be implemented.
      
      With this, all integer loads and stores that are defined in the Power
      ISA v2.07 are emulated, except for those that are permitted to trap
      when used on cache-inhibited or write-through mappings (and which do
      in fact trap on POWER8), that is, lmw/stmw, lswi/stswi, lswx/stswx,
      lq/stq, and l[bhwdq]arx/st[bhwdq]cx.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
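      A rough sketch of what one of these cases looks like in the load/store
      emulation switch (the opcode macro name is assumed; the helpers are the
      existing kvmppc_handle_loads()/kvmppc_set_gpr() calls used by this code):

        /* Sketch: lwaux - load word algebraic (sign-extended) with update, indexed. */
        case OP_31_XOP_LWAUX:                                         /* assumed macro name */
                emulated = kvmppc_handle_loads(run, vcpu, rt, 4, 1);  /* 4 bytes, sign-extended */
                kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed);  /* update form: RA = EA */
                break;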
    • KVM: PPC: Add MMIO emulation for stdx (store doubleword indexed) · 91242fd1
      Committed by Alexey Kardashevskiy
      This adds missing stdx emulation for emulated MMIO accesses by KVM
      guests.  This allows the Mellanox mlx5_core driver from recent kernels
      to work when MMIO emulation is enforced by userspace.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>