  1. 30 Nov, 2021 1 commit
  2. 24 Nov, 2021 1 commit
  3. 22 Oct, 2021 1 commit
  4. 26 Aug, 2021 1 commit
  5. 13 Aug, 2021 2 commits
  6. 26 Jul, 2021 1 commit
  7. 09 Jul, 2021 3 commits
  8. 30 Jun, 2021 1 commit
  9. 22 Jun, 2021 1 commit
  10. 21 Jun, 2021 2 commits
    • KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE · f0c6fbbb
      Bharata B Rao committed
      H_RPT_INVALIDATE does two types of TLB invalidations:
      
      1. Process-scoped invalidations for guests when LPCR[GTSE]=0.
         This is currently not used in KVM as GTSE is not usually
         disabled in KVM.
      2. Partition-scoped invalidations that an L1 hypervisor does on
         behalf of an L2 guest. This is currently handled by the
         H_TLB_INVALIDATE hcall, which this new hcall replaces.
      
      This commit enables process-scoped invalidations for L1 guests.
      Support for process-scoped and partition-scoped invalidations
      from/for nested guests will be added separately.
      
      Process-scoped tlbie invalidations from L1 and nested guests
      need the RS register of the tlbie instruction to contain both
      PID and LPID. This patch introduces primitives that execute the
      tlbie instruction with both PID and LPID set, in preparation for
      the H_RPT_INVALIDATE hcall.
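
The RS operand described above can be sketched in plain C. This is only an illustration, not the patch's code: the helper name `tlbie_rs_pid_lpid` is hypothetical, and the layout (PID in the upper 32 bits, LPID in the lower 32 bits) is an assumption about how the two IDs are packed into RS for a process-scoped tlbie.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): pack a PID and an LPID into
 * the 64-bit value placed in RS for a process-scoped tlbie.
 * Assumed layout: PID in bits 63..32, LPID in bits 31..0.
 */
static inline uint64_t tlbie_rs_pid_lpid(uint32_t pid, uint32_t lpid)
{
    return ((uint64_t)pid << 32) | (uint64_t)lpid;
}
```

Under that assumption, PID 1 with LPID 2 packs to 0x0000000100000002. The actual primitives would also set RB, RIC, PRS and R, which are left out here.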
      
      A description of H_RPT_INVALIDATE follows:
      
      int64   /* H_Success: Return code on successful completion */
              /* H_Busy - repeat the call with the same */
              /* H_Parameter, H_P2, H_P3, H_P4, H_P5 : Invalid
      	   parameters */
      hcall(const uint64 H_RPT_INVALIDATE, /* Invalidate RPT
      					translation
      					lookaside information */
            uint64 id,        /* PID/LPID to invalidate */
            uint64 target,    /* Invalidation target */
            uint64 type,      /* Type of lookaside information */
            uint64 pg_sizes,  /* Page sizes */
            uint64 start,     /* Start of Effective Address (EA)
      			   range (inclusive) */
            uint64 end)       /* End of EA range (exclusive) */
      
      Invalidation targets (target)
      -----------------------------
      Core MMU        0x01 /* All virtual processors in the
      			partition */
      Core local MMU  0x02 /* Current virtual processor */
      Nest MMU        0x04 /* All nest/accelerator agents
      			in use by the partition */
      
      A combination of the above can be specified,
      except core and core local.
      
      Type of translation to invalidate (type)
      ----------------------------------------
      NESTED       0x0001  /* Invalidate nested guest partition-scope */
      TLB          0x0002  /* Invalidate TLB */
      PWC          0x0004  /* Invalidate Page Walk Cache */
      PRT          0x0008  /* Invalidate caching of Process Table
      			Entries if NESTED is clear */
      PAT          0x0008  /* Invalidate caching of Partition Table
      			Entries if NESTED is set */
      
      A combination of the above can be specified.
      
      Page size mask (pg_sizes)
      -------------------------
      4K              0x01
      64K             0x02
      2M              0x04
      1G              0x08
      All sizes       (-1UL)
      
      A combination of the above can be specified.
      All page sizes can be selected with -1.
      
      Semantics: Invalidate radix tree lookaside information
                 matching the parameters given.
      * Return H_P2, H_P3 or H_P4 if the target, type, or pg_sizes
        parameters are different from the defined values.
      * Return H_PARAMETER if NESTED is set and pid is not a valid nested
        LPID allocated to this partition.
      * Return H_P5 if (start, end) doesn't form a valid range. Start and
        end should be valid Quadrant addresses and end > start.
      * Return H_NotSupported if the partition is not running in radix
        translation mode.
      * May invalidate more translation information than requested.
      * If start = 0 and end = -1, set the range to cover all valid
        addresses. Else start and end should be aligned to 4kB (lower 12
        bits clear).
      * If NESTED is clear, then invalidate process-scoped lookaside
        information. Else pid specifies a nested LPID, and the invalidation
        is performed on the nested guest partition table and nested guest
        partition scope real addresses.
      * If pid = 0 and NESTED is clear, then valid addresses are quadrant 3
        and quadrant 0 spaces. Else valid addresses are quadrant 0.
      * Pages which are fully covered by the range are to be invalidated.
        Those which are partially covered are considered outside the
        invalidation range, which allows a caller to optimally invalidate
        ranges that may contain mixed page sizes.
      * Return H_SUCCESS on success.
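
The semantics listed above can be sketched as a small argument check plus a range helper. This is not the kernel implementation. The `RPTI_*` names and the `rpti_rc` enum are hypothetical stand-ins; only the bit values come from the tables in this commit message. Treating a zero target, type or page-size mask as invalid is an assumption, and the Quadrant-address and nested-LPID checks are left out.

```c
#include <stdint.h>

/* Bit values from the tables above; the macro names are illustrative. */
#define RPTI_TARGET_CMMU        0x01ULL
#define RPTI_TARGET_CMMU_LOCAL  0x02ULL
#define RPTI_TARGET_NMMU        0x04ULL
#define RPTI_TARGET_DEFINED     0x07ULL

#define RPTI_TYPE_NESTED        0x0001ULL
#define RPTI_TYPE_TLB           0x0002ULL
#define RPTI_TYPE_PWC           0x0004ULL
#define RPTI_TYPE_PRT_PAT       0x0008ULL
#define RPTI_TYPE_DEFINED       0x000fULL

#define RPTI_PAGE_4K            0x01ULL
#define RPTI_PAGE_64K           0x02ULL
#define RPTI_PAGE_2M            0x04ULL
#define RPTI_PAGE_1G            0x08ULL
#define RPTI_PAGE_DEFINED       0x0fULL
#define RPTI_PAGE_ALL           UINT64_MAX   /* "-1UL" in the table */

/* Symbolic results only; real hcall return codes have other numeric values. */
enum rpti_rc { RPTI_RC_SUCCESS, RPTI_RC_P2, RPTI_RC_P3, RPTI_RC_P4, RPTI_RC_P5 };

static enum rpti_rc rpti_check_args(uint64_t target, uint64_t type,
                                    uint64_t pg_sizes,
                                    uint64_t start, uint64_t end)
{
    /* Assumption: an empty mask counts as "different from the defined values". */
    if (target == 0 || (target & ~RPTI_TARGET_DEFINED))
        return RPTI_RC_P2;
    /* Core and core local may not be combined. */
    if ((target & RPTI_TARGET_CMMU) && (target & RPTI_TARGET_CMMU_LOCAL))
        return RPTI_RC_P2;
    if (type == 0 || (type & ~RPTI_TYPE_DEFINED))
        return RPTI_RC_P3;
    if (pg_sizes != RPTI_PAGE_ALL &&
        (pg_sizes == 0 || (pg_sizes & ~RPTI_PAGE_DEFINED)))
        return RPTI_RC_P4;
    /* start = 0, end = -1 selects the whole address space. */
    if (start == 0 && end == UINT64_MAX)
        return RPTI_RC_SUCCESS;
    /* Otherwise end > start and both must be 4 KiB aligned. */
    if (end <= start || (start & 0xfffULL) || (end & 0xfffULL))
        return RPTI_RC_P5;
    return RPTI_RC_SUCCESS;
}

/*
 * Compute the sub-range [*first, *last) made of pages of size 2^shift that
 * lie fully inside [start, end). Returns 0 if no page is fully covered.
 */
static int rpti_fully_covered(uint64_t start, uint64_t end, unsigned int shift,
                              uint64_t *first, uint64_t *last)
{
    uint64_t size = 1ULL << shift;
    uint64_t lo, hi;

    if (start > UINT64_MAX - (size - 1))
        return 0;                          /* rounding up would overflow */
    lo = (start + size - 1) & ~(size - 1); /* round start up */
    hi = end & ~(size - 1);                /* round end down */
    if (lo >= hi)
        return 0;
    *first = lo;
    *last = hi;
    return 1;
}
```

For example, with 64K pages (shift 16) the range [0x1000, 0x30000) fully covers [0x10000, 0x30000), while [0x1000, 0x11000) covers no 64K page at all, so a caller would fall back to the smaller page size for that range.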
      Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210621085003.904767-4-bharata@linux.ibm.com
    • powerpc/book3s64/radix: Add H_RPT_INVALIDATE pgsize encodings to mmu_psize_def · d6265cb3
      Bharata B Rao committed
      Add a field to mmu_psize_def to store the page size encodings
      of the H_RPT_INVALIDATE hcall. Initialize it while scanning the
      radix AP encodings. It will be used to pass the required page size
      encoding when invalidating via the hcall.
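
A minimal sketch of the idea, assuming the 4K/64K/2M/1G encodings (0x01, 0x02, 0x04, 0x08) from the H_RPT_INVALIDATE description elsewhere in this log. The struct and function names are hypothetical and do not reproduce the real mmu_psize_def layout.

```c
#include <stdint.h>

/* Illustrative stand-in for a page size definition; not the kernel struct. */
struct psize_def_sketch {
    unsigned int shift;      /* log2 of the page size in bytes */
    uint64_t h_rpt_pgsize;   /* encoding passed to the hcall, 0 if none */
};

/* Map a page shift to its hcall page size encoding, or 0 if unsupported. */
static uint64_t rpt_pgsize_for_shift(unsigned int shift)
{
    switch (shift) {
    case 12: return 0x01;    /* 4K  */
    case 16: return 0x02;    /* 64K */
    case 21: return 0x04;    /* 2M  */
    case 30: return 0x08;    /* 1G  */
    default: return 0;       /* no encoding defined for this size */
    }
}

/* Fill in the encoding once, e.g. while the supported sizes are scanned. */
static void psize_def_init(struct psize_def_sketch *def, unsigned int shift)
{
    def->shift = shift;
    def->h_rpt_pgsize = rpt_pgsize_for_shift(shift);
}
```

Storing the encoding alongside the shift means the invalidation path can OR together the encodings of the sizes it needs without repeating the lookup.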
      Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210621085003.904767-3-bharata@linux.ibm.com
  11. 17 Jun, 2021 1 commit
  12. 10 Jun, 2021 2 commits
  13. 01 May, 2021 2 commits
  14. 17 Apr, 2021 1 commit
  15. 14 Apr, 2021 3 commits
  16. 08 Apr, 2021 5 commits
    • powerpc/mm/64s/hash: Add real-mode change_memory_range() for hash LPAR · 87e65ad7
      Michael Ellerman committed
      When we enabled STRICT_KERNEL_RWX we received some reports of boot
      failures when using the Hash MMU and running under phyp. The crashes
      are intermittent, and often manifest as a completely unresponsive
      system, or possibly an oops.
      
      One example, which was caught in xmon:
      
        [   14.068327][    T1] devtmpfs: mounted
        [   14.069302][    T1] Freeing unused kernel memory: 5568K
        [   14.142060][  T347] BUG: Unable to handle kernel instruction fetch
        [   14.142063][    T1] Run /sbin/init as init process
        [   14.142074][  T347] Faulting instruction address: 0xc000000000004400
        cpu 0x2: Vector: 400 (Instruction Access) at [c00000000c7475e0]
            pc: c000000000004400: exc_virt_0x4400_instruction_access+0x0/0x80
            lr: c0000000001862d4: update_rq_clock+0x44/0x110
            sp: c00000000c747880
           msr: 8000000040001031
          current = 0xc00000000c60d380
          paca    = 0xc00000001ec9de80   irqmask: 0x03   irq_happened: 0x01
            pid   = 347, comm = kworker/2:1
        ...
        enter ? for help
        [c00000000c747880] c0000000001862d4 update_rq_clock+0x44/0x110 (unreliable)
        [c00000000c7478f0] c000000000198794 update_blocked_averages+0xb4/0x6d0
        [c00000000c7479f0] c000000000198e40 update_nohz_stats+0x90/0xd0
        [c00000000c747a20] c0000000001a13b4 _nohz_idle_balance+0x164/0x390
        [c00000000c747b10] c0000000001a1af8 newidle_balance+0x478/0x610
        [c00000000c747be0] c0000000001a1d48 pick_next_task_fair+0x58/0x480
        [c00000000c747c40] c000000000eaab5c __schedule+0x12c/0x950
        [c00000000c747cd0] c000000000eab3e8 schedule+0x68/0x120
        [c00000000c747d00] c00000000016b730 worker_thread+0x130/0x640
        [c00000000c747da0] c000000000174d50 kthread+0x1a0/0x1b0
        [c00000000c747e10] c00000000000e0f0 ret_from_kernel_thread+0x5c/0x6c
      
      This shows that CPU 2, which was idle, woke up and then appears to
      randomly take an instruction fault on a completely valid area of
      kernel text.
      
      The cause turns out to be the call to hash__mark_rodata_ro(), late in
      boot. Due to the way we lay out text and rodata, that function actually
      changes the permissions for all of text and rodata to read-only plus
      execute.
      
      To do the permission change we use a hypervisor call, H_PROTECT. On
      phyp that appears to be implemented by briefly removing the mapping of
      the kernel text, before putting it back with the updated permissions.
      If any other CPU is executing during that window, it will see spurious
      faults on the kernel text and/or data, leading to crashes.
      
      To fix it we use stop machine to collect all other CPUs, and then have
      them drop into real mode (MMU off), while we change the mapping. That
      way they are unaffected by the mapping temporarily disappearing.
      
      We don't see this bug on KVM because KVM always uses VPM=1, where
      faults are directed to the hypervisor, and the fault will be
      serialised vs the h_protect() by HPTE_V_HVLOCK.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210331003845.216246-5-mpe@ellerman.id.au
    • powerpc/mm/64s/hash: Factor out change_memory_range() · 6f223ebe
      Michael Ellerman committed
      Pull the loop calling hpte_updateboltedpp() out of
      hash__change_memory_range() into a helper function. We need it to be a
      separate function for the next patch.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210331003845.216246-4-mpe@ellerman.id.au
    • powerpc/64s: Use htab_convert_pte_flags() in hash__mark_rodata_ro() · 2c02e656
      Michael Ellerman committed
      In hash__mark_rodata_ro() we pass the raw PP_RXXX value to
      hash__change_memory_range(). That has the effect of setting the key to
      zero, because PP_RXXX contains no key value.
      
      Fix it by using htab_convert_pte_flags(), which knows how to convert a
      pgprot into a pp value, including the key.
      
      Fixes: d94b827e ("powerpc/book3s64/kuap: Use Key 3 for kernel mapping with hash translation")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Daniel Axtens <dja@axtens.net>
      Link: https://lore.kernel.org/r/20210331003845.216246-3-mpe@ellerman.id.au
    • powerpc/64s: Fix pte update for kernel memory on radix · b8b2f37c
      Jordan Niethe committed
      When adding a PTE, a ptesync is needed to order the update of the PTE
      with subsequent accesses; otherwise a spurious fault may be raised.
      
      radix__set_pte_at() omits the ptesync for performance reasons. For
      non-kernel memory this is not an issue as any faults of this kind are
      corrected by the page fault handler. For kernel memory these faults
      are not handled. The current solution is that there is a ptesync in
      flush_cache_vmap() which should be called when mapping from the
      vmalloc region.
      
      However, map_kernel_page() does not call flush_cache_vmap(). This is
      troublesome in particular for code patching with Strict RWX on radix.
      In do_patch_instruction() the page frame that contains the instruction
      to be patched is mapped and then immediately patched. With no ordering
      or synchronization between setting up the PTE and writing to the page,
      faults are possible.
      
      As the code patching is done using __put_user_asm_goto(), the resulting
      fault is obscured, but with a normal store instead it can be seen:
      
        BUG: Unable to handle kernel data access on write at 0xc008000008f24a3c
        Faulting instruction address: 0xc00000000008bd74
        Oops: Kernel access of bad area, sig: 11 [#1]
        LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
        Modules linked in: nop_module(PO+) [last unloaded: nop_module]
        CPU: 4 PID: 757 Comm: sh Tainted: P           O      5.10.0-rc5-01361-ge3c1b78c8440-dirty #43
        NIP:  c00000000008bd74 LR: c00000000008bd50 CTR: c000000000025810
        REGS: c000000016f634a0 TRAP: 0300   Tainted: P           O       (5.10.0-rc5-01361-ge3c1b78c8440-dirty)
        MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 44002884  XER: 00000000
        CFAR: c00000000007c68c DAR: c008000008f24a3c DSISR: 42000000 IRQMASK: 1
      
      This results in the kind of issue reported here:
        https://lore.kernel.org/linuxppc-dev/15AC5B0E-A221-4B8C-9039-FA96B8EF7C88@lca.pw/
      
      Chris Riedl suggested a reliable way to reproduce the issue:
        $ mount -t debugfs none /sys/kernel/debug
        $ (while true; do echo function > /sys/kernel/debug/tracing/current_tracer ; echo nop > /sys/kernel/debug/tracing/current_tracer ; done) &
      
      Turning ftrace on and off does a large amount of code patching, which
      usually crashes in less than 5 minutes, giving a trace like:
      
         ftrace-powerpc: (____ptrval____): replaced (4b473b11) != old (60000000)
         ------------[ ftrace bug ]------------
         ftrace failed to modify
         [<c000000000bf8e5c>] napi_busy_loop+0xc/0x390
          actual:   11:3b:47:4b
         Setting ftrace call site to call ftrace function
         ftrace record flags: 80000001
          (1)
          expected tramp: c00000000006c96c
         ------------[ cut here ]------------
         WARNING: CPU: 4 PID: 809 at kernel/trace/ftrace.c:2065 ftrace_bug+0x28c/0x2e8
         Modules linked in: nop_module(PO-) [last unloaded: nop_module]
         CPU: 4 PID: 809 Comm: sh Tainted: P           O      5.10.0-rc5-01360-gf878ccaf250a #1
         NIP:  c00000000024f334 LR: c00000000024f330 CTR: c0000000001a5af0
         REGS: c000000004c8b760 TRAP: 0700   Tainted: P           O       (5.10.0-rc5-01360-gf878ccaf250a)
         MSR:  900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28008848  XER: 20040000
         CFAR: c0000000001a9c98 IRQMASK: 0
         GPR00: c00000000024f330 c000000004c8b9f0 c000000002770600 0000000000000022
         GPR04: 00000000ffff7fff c000000004c8b6d0 0000000000000027 c0000007fe9bcdd8
         GPR08: 0000000000000023 ffffffffffffffd8 0000000000000027 c000000002613118
         GPR12: 0000000000008000 c0000007fffdca00 0000000000000000 0000000000000000
         GPR16: 0000000023ec37c5 0000000000000000 0000000000000000 0000000000000008
         GPR20: c000000004c8bc90 c0000000027a2d20 c000000004c8bcd0 c000000002612fe8
         GPR24: 0000000000000038 0000000000000030 0000000000000028 0000000000000020
         GPR28: c000000000ff1b68 c000000000bf8e5c c00000000312f700 c000000000fbb9b0
         NIP ftrace_bug+0x28c/0x2e8
         LR  ftrace_bug+0x288/0x2e8
         Call Trace:
           ftrace_bug+0x288/0x2e8 (unreliable)
           ftrace_modify_all_code+0x168/0x210
           arch_ftrace_update_code+0x18/0x30
           ftrace_run_update_code+0x44/0xc0
           ftrace_startup+0xf8/0x1c0
           register_ftrace_function+0x4c/0xc0
           function_trace_init+0x80/0xb0
           tracing_set_tracer+0x2a4/0x4f0
           tracing_set_trace_write+0xd4/0x130
           vfs_write+0xf0/0x330
           ksys_write+0x84/0x140
           system_call_exception+0x14c/0x230
           system_call_common+0xf0/0x27c
      
      To fix this, use a ptesync when updating kernel memory PTEs.
      
      Fixes: f1cb8f9b ("powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags")
      Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Tidy up change log slightly]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210208032957.1232102-1-jniethe5@gmail.com
    • powerpc: Spelling/typo fixes · 4763d378
      Bhaskar Chowdhury committed
      Various spelling/typo fixes.
      Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
      Acked-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  17. 29 Mar, 2021 3 commits
  18. 11 Feb, 2021 2 commits
  19. 08 Feb, 2021 7 commits