1. 21 Feb 2019, 6 commits
    • powerpc/mm/hash: Increase vmalloc space to 512T with hash MMU · 3d8810e0
      Committed by Michael Ellerman
      This patch updates the kernel non-linear virtual map to 512TB when
      we're built with 64K page size and are using the hash MMU. We allocate
      one context for the vmalloc region and hence the max virtual area size
      is limited by the context map size (512TB for 64K and 64TB for 4K page
      size).
      
      This patch fixes boot failures with large amounts of system RAM where
      we need large vmalloc space to handle per cpu allocations.
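
      As a rough illustration of the sizing arithmetic (a standalone userspace
      sketch; the constants come from the description above, not from kernel
      headers):

        #include <stdio.h>

        #define TB (1ULL << 40)

        int main(void)
        {
        	/* One MMU context is allocated for the vmalloc region, so its
        	 * maximum size equals a single context's span. */
        	unsigned long long span_64k = 512 * TB;	/* 64K page size */
        	unsigned long long span_4k  =  64 * TB;	/*  4K page size */

        	printf("max vmalloc, 64K pages: %lluTB\n", span_64k / TB);
        	printf("max vmalloc,  4K pages: %lluTB\n", span_4k / TB);
        	return 0;
        }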
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    • powerpc/ptrace: Simplify vr_get/set() to avoid GCC warning · ca6d5149
      Committed by Michael Ellerman
      GCC 8 warns about the logic in vr_get/set(), which with -Werror breaks
      the build:
      
        In function ‘user_regset_copyin’,
            inlined from ‘vr_set’ at arch/powerpc/kernel/ptrace.c:628:9:
        include/linux/regset.h:295:4: error: ‘memcpy’ offset [-527, -529] is
        out of the bounds [0, 16] of object ‘vrsave’ with type ‘union
        <anonymous>’ [-Werror=array-bounds]
        arch/powerpc/kernel/ptrace.c: In function ‘vr_set’:
        arch/powerpc/kernel/ptrace.c:623:5: note: ‘vrsave’ declared here
           } vrsave;
      
      This has been identified as a regression in GCC, see GCC bug 88273.
      
      However we can avoid the warning and also simplify the logic and make
      it more robust.
      
      Currently we pass -1 as end_pos to user_regset_copyout(). This says
      "copy up to the end of the regset".
      
      The definition of the regset is:
      	[REGSET_VMX] = {
      		.core_note_type = NT_PPC_VMX, .n = 34,
      		.size = sizeof(vector128), .align = sizeof(vector128),
      		.active = vr_active, .get = vr_get, .set = vr_set
      	},
      
      The end is calculated as (n * size), ie. 34 * sizeof(vector128).
      
      In vr_get/set() we pass start_pos as 33 * sizeof(vector128), meaning
      we can copy up to sizeof(vector128) into/out-of vrsave.
      
      The on-stack vrsave is defined as:
        union {
      	  elf_vrreg_t reg;
      	  u32 word;
        } vrsave;
      
      And elf_vrreg_t is:
        typedef __vector128 elf_vrreg_t;
      
      So there is no bug, but we rely on all those sizes lining up,
      otherwise we would have a kernel stack exposure/overwrite on our
      hands.
      
      Rather than relying on that we can pass an explicit end_pos based on
      sizeof(vrsave). The result should be exactly the same, but it's more
      obviously not over-reading/writing the stack and it avoids the
      compiler warning.
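
      A sketch of the resulting copy-out, pieced together from the description
      above (a fragment of vr_get(), reusing its existing locals; not a
      verbatim copy of the patch):

        int start, end;
        union {
        	elf_vrreg_t reg;
        	u32 word;
        } vrsave;

        memset(&vrsave, 0, sizeof(vrsave));
        vrsave.word = target->thread.vrsave;

        start = 33 * sizeof(vector128);
        end = start + sizeof(vrsave);	/* explicit bound instead of -1 */
        ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
        			  &vrsave, start, end);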
      Reported-by: Meelis Roos <mroos@linux.ee>
      Reported-by: Mathieu Malaterre <malat@debian.org>
      Cc: stable@vger.kernel.org
      Tested-by: Mathieu Malaterre <malat@debian.org>
      Tested-by: Meelis Roos <mroos@linux.ee>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv/npu: Remove redundant change_pte() hook · 1b58a975
      Committed by Peter Xu
      The change_pte() notifier was designed to be used as a quick path to
      update secondary MMU PTEs on write permission changes or PFN changes.
      For KVM, it can reduce vm-exits when a vcpu faults on pages that were
      touched up by KSM. It is not used to do cache invalidations; after all,
      the notifier is called before the real PTE update (see
      set_pte_at_notify(), where set_pte_at() is called afterwards).
      
      All the necessary cache invalidation is already done in
      invalidate_range().
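
      For reference, set_pte_at_notify() looks roughly like this (simplified
      from include/linux/mmu_notifier.h):

        #define set_pte_at_notify(mm, address, ptep, pte)		\
        ({								\
        	/* the notifier fires before the PTE is updated... */	\
        	mmu_notifier_change_pte(mm, address, pte);		\
        	/* ...and only then is the new PTE installed */		\
        	set_pte_at(mm, address, ptep, pte);			\
        })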
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: Alistair Popple <alistair@popple.id.au>
      Reviewed-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s: Better printing of machine check info for guest MCEs · c0577201
      Committed by Paul Mackerras
      This adds an "in_guest" parameter to machine_check_print_event_info()
      so that we can avoid trying to translate guest NIP values into
      symbolic form using the host kernel's symbol table.
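
      The updated prototype presumably looks something like this (illustrative;
      only the new in_guest flag comes from this commit, the other parameters
      are assumptions):

        void machine_check_print_event_info(struct machine_check_event *evt,
        				    bool user_mode, bool in_guest);

      Callers printing an event on behalf of a guest pass in_guest = true, so
      the NIP is reported raw rather than resolved against the host's symbol
      table.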
      Reviewed-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
      Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • KVM: PPC: Book3S HV: Simplify machine check handling · 884dfb72
      Committed by Paul Mackerras
      This makes the handling of machine check interrupts that occur inside
      a guest simpler and more robust, with less done in assembler code and
      in real mode.
      
      Now, when a machine check occurs inside a guest, we always get the
      machine check event struct and put a copy in the vcpu struct for the
      vcpu where the machine check occurred.  We no longer call
      machine_check_queue_event() from kvmppc_realmode_mc_power7(): on
      POWER8, when the vcpu is running on an offline secondary thread,
      machine_check_queue_event() calls irq_work_queue(), which doesn't
      work because the CPU is offline and instead triggers the
      WARN_ON(lazy_irq_pending()) in pnv_smp_cpu_kill_self() (which fires
      again and again because nothing clears the condition).
      
      All that machine_check_queue_event() actually does is to cause the
      event to be printed to the console.  For a machine check occurring in
      the guest, we now print the event in kvmppc_handle_exit_hv()
      instead.
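
      A sketch of that call site (illustrative: the mce_evt field name and the
      argument order are assumptions based on this and the previous commit):

        case BOOK3S_INTERRUPT_MACHINE_CHECK:
        	/* Print the MCE event saved in the vcpu struct to the host
        	 * console, flagged as a guest event so the guest NIP is not
        	 * symbolised using the host's symbol table. */
        	machine_check_print_event_info(&vcpu->arch.mce_evt,
        				       false /* user_mode */,
        				       true  /* in_guest */);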
      
      The assembly code at label machine_check_realmode now just calls C
      code and then continues exiting the guest.  We no longer either
      synthesize a machine check for the guest in assembly code or return
      to the guest without a machine check.
      
      The code in kvmppc_handle_exit_hv() is extended to handle the case
      where the guest is not FWNMI-capable.  In that case we now always
      synthesize a machine check interrupt for the guest.  Previously, if
      the host thought it had recovered the machine check fully, it would
      return to the guest without any notification that the machine check
      had occurred.  If the machine check was caused by some action of the
      guest (such as creating duplicate SLB entries), it is much better to
      tell the guest that it has caused a problem.  Therefore we now always
      generate a machine check interrupt for guests that are not
      FWNMI-capable.
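
      The policy for non-FWNMI guests can be sketched as follows (the field
      and helper names are placeholders; only the FWNMI check and the
      always-deliver behaviour come from the description above):

        if (!vcpu->kvm->arch.fwnmi_enabled) {
        	/* The guest can't take an FWNMI, so always deliver a machine
        	 * check interrupt, even if the host recovered the error. */
        	synthesize_guest_machine_check(vcpu);	/* placeholder helper */
        	r = RESUME_GUEST;
        }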
      Reviewed-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
      Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • KVM: PPC: Book3S HV: Context switch AMR on Power9 · d976f680
      Committed by Michael Ellerman
      kvmhv_p9_guest_entry() implements a fast-path guest entry for Power9
      when guest and host are both running with the Radix MMU.
      
      Currently in that path we don't save the host AMR (Authority Mask
      Register) value, and we always restore 0 on return to the host. That
      is OK at the moment because the AMR is not used for storage keys with
      the Radix MMU.
      
      However we plan to start using the AMR on Radix to prevent the kernel
      from reading/writing to userspace outside of copy_to/from_user(). In
      order to make that work we need to save/restore the AMR value.
      
      We only restore the value if it is different from the guest value,
      which is already in the register when we exit to the host. This should
      mean we rarely need to actually restore the value when running a
      modern Linux as a guest, because it will be using the same value as
      us.
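
      A minimal sketch of the exit path described above (mfspr()/mtspr() and
      SPRN_AMR are the real primitives; the surrounding structure and the
      vcpu->arch.amr field name are illustrative):

        unsigned long host_amr = mfspr(SPRN_AMR);	/* saved on guest entry */

        /* ... run the guest ... */

        /* On exit the guest's AMR value is still in the register; only do
         * the (relatively slow) mtspr if the host value actually differs,
         * which is rare when the guest runs the same modern Linux. */
        if (vcpu->arch.amr != host_amr)
        	mtspr(SPRN_AMR, host_amr);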
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Tested-by: Russell Currey <ruscur@russell.cc>
  2. 18 Feb 2019, 30 commits
  3. 17 Feb 2019, 1 commit
    • powerpc/64s: Fix possible corruption on big endian due to pgd/pud_present() · a5800762
      Committed by Michael Ellerman
      In v4.20 we changed our pgd/pud_present() to check for _PAGE_PRESENT
      rather than just checking that the value is non-zero, e.g.:
      
        static inline int pgd_present(pgd_t pgd)
        {
       -       return !pgd_none(pgd);
       +       return (pgd_raw(pgd) & cpu_to_be64(_PAGE_PRESENT));
        }
      
      Unfortunately this is broken on big endian, as the result of the
      bitwise & is truncated to int, which is always zero because
      _PAGE_PRESENT is 0x8000000000000000ul. This means pgd_present() and
      pud_present() are always false at compile time, and the compiler
      elides the subsequent code.
      
      Remarkably, with that bug present we are still able to boot and run
      with few noticeable effects. However, under some workloads we are able
      to trigger a warning in the ext4 code:
      
        WARNING: CPU: 11 PID: 29593 at fs/ext4/inode.c:3927 .ext4_set_page_dirty+0x70/0xb0
        CPU: 11 PID: 29593 Comm: debugedit Not tainted 4.20.0-rc1 #1
        ...
        NIP .ext4_set_page_dirty+0x70/0xb0
        LR  .set_page_dirty+0xa0/0x150
        Call Trace:
         .set_page_dirty+0xa0/0x150
         .unmap_page_range+0xbf0/0xe10
         .unmap_vmas+0x84/0x130
         .unmap_region+0xe8/0x190
         .__do_munmap+0x2f0/0x510
         .__vm_munmap+0x80/0x110
         .__se_sys_munmap+0x14/0x30
         system_call+0x5c/0x70
      
      The fix is simple: we need to convert the result of the bitwise & to
      an int before returning it.
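
      Concretely, that amounts to forcing the 64-bit mask result down to 0/1
      before the implicit truncation, e.g.:

        static inline int pgd_present(pgd_t pgd)
        {
        	/* !! collapses the 64-bit result to 0 or 1 before it is
        	 * truncated to int, so _PAGE_PRESENT (bit 63) is not lost. */
        	return !!(pgd_raw(pgd) & cpu_to_be64(_PAGE_PRESENT));
        }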
      
      Thanks to Erhard, Jan Kara and Aneesh for help with debugging.
      
      Fixes: da7ad366 ("powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit")
      Cc: stable@vger.kernel.org # v4.20+
      Reported-by: Erhard F. <erhard_f@mailbox.org>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  4. 06 Feb 2019, 2 commits
  5. 05 Feb 2019, 1 commit