1. 15 Jan 2022 (1 commit)
  2. 23 Dec 2021 (2 commits)
  3. 25 Nov 2021 (1 commit)
  4. 25 Aug 2021 (1 commit)
  5. 16 Jun 2021 (1 commit)
  6. 15 Jun 2021 (9 commits)
  7. 23 May 2021 (1 commit)
  8. 21 Apr 2021 (1 commit)
  9. 03 Apr 2021 (1 commit)
  10. 10 Dec 2020 (1 commit)
  11. 09 Dec 2020 (1 commit)
  12. 29 Jul 2020 (1 commit)
  13. 23 Jul 2020 (2 commits)
  14. 16 Jul 2020 (7 commits)
  15. 15 Jul 2020 (1 commit)
    • powerpc/64/signal: Balance return predictor stack in signal trampoline · 0138ba57
      Authored by Nicholas Piggin
      Returning from an interrupt or syscall to a signal handler currently
      begins execution directly at the handler's entry point, with LR set to
      the address of the sigreturn trampoline. When the signal handler
      function returns, it runs the trampoline. It looks like this:
      
          # interrupt at user address xyz
          # kernel stuff... signal is raised
          rfid
          # void handler(int sig)
          addis 2,12,.TOC.-.LCF0@ha
          addi 2,2,.TOC.-.LCF0@l
          mflr 0
          std 0,16(1)
          stdu 1,-96(1)
          # handler stuff
          ld 0,16(1)
          mtlr 0
          blr
          # __kernel_sigtramp_rt64
          addi    r1,r1,__SIGNAL_FRAMESIZE
          li      r0,__NR_rt_sigreturn
          sc
          # kernel executes rt_sigreturn
          rfid
          # back to user address xyz
      
      Note the blr with no matching bl. This can corrupt the return
      predictor.
      
      Solve this by instead resuming execution at the signal trampoline,
      which then calls the signal handler. The qtrace-tools link_stack
      checker confirms the entire user/kernel/vdso cycle is balanced after
      this patch, whereas it is not balanced upstream.
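
      Resuming at the trampoline makes the cycle look roughly like this (a
      sketch only: per the patch, the handler is now reached via a bctrl in
      the vdso trampoline, so every blr has a matching branch-and-link):

          # interrupt at user address xyz
          # kernel stuff... signal is raised
          rfid
          # __kernel_start_sigtramp_rt64
          bctrl                   # call the handler via CTR; bctrl sets LR,
                                  # so the handler's blr is now matched
          # void handler(int sig)
          # ...prologue, handler stuff, epilogue as before...
          blr
          # __kernel_sigtramp_rt64
          addi    r1,r1,__SIGNAL_FRAMESIZE
          li      r0,__NR_rt_sigreturn
          sc
          # kernel executes rt_sigreturn
          rfid
          # back to user address xyz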
      
      Alan confirms the DWARF unwind info still looks good. gdb still
      recognises the signal frame and can step into parent frames after
      breaking inside a signal handler.
      
      Performance is pretty noisy and there is no very significant change
      on a POWER9 here, but branch misses are consistently a lot lower on a
      microbenchmark (first run without the patch, second run with it):
      
       Performance counter stats for './signal':
      
             13,085.72 msec task-clock                #    1.000 CPUs utilized
        45,024,760,101      cycles                    #    3.441 GHz
        65,102,895,542      instructions              #    1.45  insn per cycle
        11,271,673,787      branches                  #  861.372 M/sec
            59,468,979      branch-misses             #    0.53% of all branches
      
             12,989.09 msec task-clock                #    1.000 CPUs utilized
        44,692,719,559      cycles                    #    3.441 GHz
        65,109,984,964      instructions              #    1.46  insn per cycle
        11,282,136,057      branches                  #  868.585 M/sec
            39,786,942      branch-misses             #    0.35% of all branches
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200511101952.1463138-1-npiggin@gmail.com
  16. 18 May 2020 (1 commit)
  17. 05 Jul 2019 (1 commit)
  18. 03 Jul 2019 (2 commits)
  19. 14 Jun 2019 (1 commit)
  20. 31 May 2019 (1 commit)
  21. 16 Mar 2019 (1 commit)
    • powerpc: bpf: Fix generation of load/store DW instructions · 86be36f6
      Authored by Naveen N. Rao
      Yauheni Kaliuta pointed out that the PTR_TO_STACK store/load verifier
      test was failing on powerpc64 BE, and rightly indicated that the
      PPC_LD() macro does not mask away the last two bits of the offset per
      the ISA, resulting in the generation of an 'lwa' instruction instead
      of the intended 'ld' instruction. (In the DS instruction form, the
      two lowest bits of the instruction word select the operation, so
      stray low bits in the offset silently turn 'ld' into 'lwa'.)
      
      Segher also pointed out that we can't simply mask away the last two bits
      as that will result in loading/storing from/to a memory location that
      was not intended.
      
      This patch addresses the problem by using ldx/stdx if the offset is
      not word-aligned. We load the offset into a temporary register
      (TMP_REG_2) and use that as the index register in a subsequent
      ldx/stdx. We fix the PPC_LD() macro to mask off the last two bits,
      but enhance PPC_BPF_LL() and PPC_BPF_STL() to factor in the offset
      value and generate the proper instruction sequence. We also convert
      all existing users of PPC_LD() and PPC_STD() to use these macros. All
      existing uses of these macros have been audited to ensure that
      TMP_REG_2 can be clobbered.
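
      As a rough sketch of the emitted load sequences (stores are analogous
      with std/stdx; the register names here are illustrative placeholders,
      not the JIT's actual register assignments):

          # offset word-aligned: DS-form 'ld' encodes it directly
          ld      r_dst, off(r_src)
          # offset not word-aligned: DS-form cannot encode it, so place
          # the offset in TMP_REG_2 and use the X-form 'ldx' instead
          li      r_tmp2, off
          ldx     r_dst, r_src, r_tmp2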
      
      Fixes: 156d0e29 ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF")
      Cc: stable@vger.kernel.org # v4.9+
      Reported-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  22. 25 Feb 2019 (1 commit)
  23. 23 Feb 2019 (1 commit)