1. 24 May, 2018 2 commits
    • S
      bpf: powerpc64: add JIT support for multi-function programs · 8484ce83
      Sandipan Das committed on
      This adds support for bpf-to-bpf function calls in the powerpc64
      JIT compiler. The JIT compiler converts the bpf call instructions
      to native branch instructions. After a round of the usual passes,
      the start addresses of the JITed images for the callee functions
      are known. Finally, to fixup the branch target addresses, we need
      to perform an extra pass.
      
      Because of the address range in which JITed images are allocated
      on powerpc64, the offsets of the start addresses of these images
      from __bpf_call_base are as large as 64 bits. So, for a function
      call, we cannot use the imm field of the instruction to determine
      the callee's address. Instead, we use the alternative method of
      getting it from the list of function addresses in the auxiliary
      data of the caller by using the off field as an index.
      Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      8484ce83
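The lookup described in this commit message can be sketched in plain C. This is a userspace illustration under stated assumptions, not the kernel's code: `struct demo_insn`, `struct demo_aux`, and `resolve_callee()` are hypothetical names that only mirror the idea of using the `off` field as an index into the caller's list of function addresses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a bpf call instruction. */
struct demo_insn {
    int16_t off;   /* used here as an index into the callee address list */
    int32_t imm;   /* 32 bits wide, so it cannot carry a 64-bit offset */
};

/* Hypothetical stand-in for the caller's auxiliary data. */
struct demo_aux {
    const uint64_t *func_addrs; /* start addresses of JITed callee images */
    size_t func_cnt;            /* number of entries in func_addrs */
};

/* Store the callee address and return 0, or return -1 for a bad index. */
static int resolve_callee(const struct demo_aux *aux,
                          const struct demo_insn *insn, uint64_t *addr)
{
    if (insn->off < 0 || (size_t)insn->off >= aux->func_cnt)
        return -1;
    *addr = aux->func_addrs[insn->off];
    return 0;
}
```

A caller would fill `demo_aux` once the start addresses are known and call `resolve_callee()` during the extra pass to obtain each branch target.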
    • S
      bpf: powerpc64: pad function address loads with NOPs · 4ea69b2f
      Sandipan Das committed on
      For multi-function programs, loading the address of a callee
      function to a register requires emitting instructions whose
      count varies from one to five depending on the nature of the
      address.
      
      Since we come to know of the callee's address only before the
      extra pass, the number of instructions required to load this
      address may vary from what was previously generated. This can
      make the JITed image grow or shrink.
      
      To avoid this, we should generate a constant five-instruction
      sequence when loading function addresses by padding the optimized
      load sequence with NOPs.
      Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      4ea69b2f
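The padding idea can be sketched as follows. This is a minimal illustration, not the JIT's actual emitter: `pad_func_addr_load()` and `FUNC_ADDR_INSNS` are hypothetical names, and the load instructions themselves are assumed to have been written into the buffer already. The NOP encoding used is the usual PowerPC `ori r0,r0,0`.

```c
#include <stddef.h>
#include <stdint.h>

#define PPC_NOP         0x60000000u /* ori r0,r0,0 */
#define FUNC_ADDR_INSNS 5           /* fixed length named in the commit message */

/*
 * Hypothetical helper: the optimized load sequence occupies
 * buf[0..emitted-1]. Fill the rest of the slot with NOPs so that the
 * sequence always has the same length, whatever the address is.
 * Returns the final length, or 0 if the sequence does not fit.
 */
static size_t pad_func_addr_load(uint32_t buf[FUNC_ADDR_INSNS], size_t emitted)
{
    if (emitted > FUNC_ADDR_INSNS)
        return 0;
    for (size_t i = emitted; i < FUNC_ADDR_INSNS; i++)
        buf[i] = PPC_NOP;
    return FUNC_ADDR_INSNS;
}
```

Because every function address load then takes five instruction slots, a different address found before the extra pass cannot change the size of the image.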
  2. 22 May, 2018 1 commit
  3. 18 May, 2018 1 commit
    • M
      powerpc/64s: Clear PCR on boot · faf37c44
      Michael Neuling committed on
      Clear the PCR (Processor Compatibility Register) on boot to ensure we
      are not running in a compatibility mode.
      
      We've seen this cause problems when a crash (and kdump) occurs while
      running compat mode guests. The kdump kernel then runs with the PCR
      set and causes problems. The symptom in the kdump kernel (also seen in
      petitboot after fast-reboot) is early userspace programs taking
      sigills on newer instructions (seen in libc).
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      faf37c44
  4. 17 May, 2018 7 commits
    • N
      powerpc/powernv: Fix NVRAM sleep in invalid context when crashing · c1d2a313
      Nicholas Piggin committed on
      Similarly to opal_event_shutdown, opal_nvram_write can be called in
      the crash path with irqs disabled. Special case the delay to avoid
      sleeping in invalid context.
      
      Fixes: 3b807033 ("powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops")
      Cc: stable@vger.kernel.org # v3.2
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      c1d2a313
    • N
      powerpc: Allow LD_DEAD_CODE_DATA_ELIMINATION to be selected · 4c1d9bb0
      Nicholas Piggin committed on
      This requires further changes to the linker script to KEEP some
      tables and wildcard compiler-generated sections into the right
      place. This includes ppc32 modifications from Christophe Leroy.
      
      When compiling powernv_defconfig with this option, the resulting
      kernel is almost 400kB smaller (and still boots):
      
          text      data       bss        dec   filename
      11827621   4810490   1341080   17979191   vmlinux
      11752437   4598858   1338776   17690071   vmlinux.dcde
      
      Mathieu's numbers for a custom Mac Mini G4 config show almost 200kB
      of savings. It also had some increase in vmlinux size for as-yet
      unknown reasons.
      
          text      data       bss        dec   filename
       7461457   2475122   1428064   11364643   vmlinux
       7386425   2364370   1425432   11176227   vmlinux.dcde
      
      Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> [8xx]
      Tested-by: Mathieu Malaterre <malat@debian.org> [32-bit powermac]
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      4c1d9bb0
    • P
      KVM: PPC: Book 3S HV: Do ptesync in radix guest exit path · df158189
      Paul Mackerras committed on
      A radix guest can execute tlbie instructions to invalidate TLB entries.
      After a tlbie or a group of tlbies, it must then do the architected
      sequence eieio; tlbsync; ptesync to ensure that the TLB invalidation
      has been processed by all CPUs in the system before it can rely on
      no CPU using any translation that it just invalidated.
      
      In fact it is the ptesync which does the actual synchronization in
      this sequence, and hardware has a requirement that the ptesync must
      be executed on the same CPU thread as the tlbies which it is expected
      to order.  Thus, if a vCPU gets moved from one physical CPU to
      another after it has done some tlbies but before it can get to do the
      ptesync, the ptesync will not have the desired effect when it is
      executed on the second physical CPU.
      
      To fix this, we do a ptesync in the exit path for radix guests.  If
      there are any pending tlbies, this will wait for them to complete.
      If there aren't, then ptesync will just do the same as sync.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      df158189
    • B
      KVM: PPC: Book3S HV: XIVE: Resend re-routed interrupts on CPU priority change · 9dc81d6b
      Benjamin Herrenschmidt committed on
      When a vcpu priority (CPPR) is set to a lower value (masking more
      interrupts), we stop processing interrupts already in the queue
      for the priorities that have now been masked.
      
      If those interrupts were previously re-routed to a different
      CPU, they might still be stuck until the older one that has
      them in its queue processes them. In the case of guest CPU
      unplug, that can be never.
      
      To address that without creating additional overhead for
      the normal interrupt processing path, this changes H_CPPR
      handling so that when such a priority change occurs, we
      scan the interrupt queue for that vCPU, and for any
      interrupt in there that has been re-routed, we replace it
      with a dummy and force a re-trigger.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      9dc81d6b
    • N
      KVM: PPC: Book3S HV: Make radix clear pte when unmapping · 7e3d9a1d
      Nicholas Piggin committed on
      The current partition table unmap code clears the _PAGE_PRESENT bit
      out of the pte, which leaves pud_huge/pmd_huge true and does not
      clear pud_present/pmd_present.  This can confuse subsequent page
      faults and possibly lead to the guest looping doing continual
      hypervisor page faults.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      7e3d9a1d
    • N
      KVM: PPC: Book3S HV: Make radix use correct tlbie sequence in kvmppc_radix_tlbie_page · e2560b10
      Nicholas Piggin committed on
      The standard eieio ; tlbsync ; ptesync must follow tlbie to ensure it
      is ordered with respect to subsequent operations.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      e2560b10
    • P
      KVM: PPC: Book3S HV: Snapshot timebase offset on guest entry · 57b8daa7
      Paul Mackerras committed on
      Currently, the HV KVM guest entry/exit code adds the timebase offset
      from the vcore struct to the timebase on guest entry, and subtracts
      it on guest exit.  Which is fine, except that it is possible for
      userspace to change the offset using the SET_ONE_REG interface while
      the vcore is running, as there is only one timebase offset per vcore
      but potentially multiple VCPUs in the vcore.  If that were to happen,
      KVM would subtract a different offset on guest exit from that which
      it had added on guest entry, leading to the timebase being out of sync
      between cores in the host, which then leads to bad things happening
      such as hangs and spurious watchdog timeouts.
      
      To fix this, we add a new field 'tb_offset_applied' to the vcore struct
      which stores the offset that is currently applied to the timebase.
      This value is set from the vcore tb_offset field on guest entry, and
      is what is subtracted from the timebase on guest exit.  Since it is
      zero when the timebase offset is not applied, we can simplify the
      logic in kvmhv_start_timing and kvmhv_accumulate_time.
      
      In addition, we had secondary threads reading the timebase while
      running concurrently with code on the primary thread which would
      eventually add or subtract the timebase offset from the timebase.
      This occurred while saving or restoring the DEC register value on
      the secondary threads.  Although no specific incorrect behaviour has
      been observed, this is a race which should be fixed.  To fix it, we
      move the DEC saving code to just before we call kvmhv_commence_exit,
      and the DEC restoring code to after the point where we have waited
      for the primary thread to switch the MMU context and add the timebase
      offset.  That way we are sure that the timebase contains the guest
      timebase value in both cases.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      57b8daa7
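The snapshot approach from this commit message can be sketched in userspace C. The field names `tb_offset` and `tb_offset_applied` come from the message; `struct demo_vcore`, `demo_guest_entry()`, and `demo_guest_exit()` are hypothetical simplifications, and the timebase is modelled as a plain integer.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the vcore struct. */
struct demo_vcore {
    uint64_t tb_offset;         /* may be changed by userspace at any time */
    uint64_t tb_offset_applied; /* offset currently added to the timebase */
};

/* On entry, snapshot the offset and apply the snapshot. */
static void demo_guest_entry(struct demo_vcore *vc, uint64_t *timebase)
{
    vc->tb_offset_applied = vc->tb_offset;
    *timebase += vc->tb_offset_applied;
}

/* On exit, subtract exactly what entry added, then mark it as not applied. */
static void demo_guest_exit(struct demo_vcore *vc, uint64_t *timebase)
{
    *timebase -= vc->tb_offset_applied;
    vc->tb_offset_applied = 0;
}
```

Even if `tb_offset` is rewritten between entry and exit, the exit path undoes the value that was actually applied, so the modelled timebase returns to its host value.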
  5. 16 May, 2018 1 commit
  6. 14 May, 2018 2 commits
  7. 09 May, 2018 6 commits
  8. 08 May, 2018 3 commits
  9. 07 May, 2018 4 commits
  10. 04 May, 2018 1 commit
  11. 27 April, 2018 2 commits
  12. 25 April, 2018 5 commits
    • E
      signal/powerpc: Replace TRAP_FIXME with TRAP_UNK · e821fa42
      Eric W. Biederman committed on
      Using an si_code of 0 that aliases with SI_USER is clearly the wrong
      thing to do, and causes problems in interesting ways.
      
      For use in unknown_exception the recently defined TRAP_UNK
      semantically is a perfect fit.  For use in RunModeException it looks
      like something more specific than TRAP_UNK could be used.  No one has
      bothered to find a better fit than the broken si_code of 0 in all of
      these years and I don't see an obvious better fit, so switching
      RunModeException to return TRAP_UNK is clearly an improvement.
      
      Recent history suggests no one actually cares about crazy corner
      cases of the kernel behavior like this so I don't expect any
      regressions from changing this.  However if something does
      happen this change is easy to revert.
      
      Though I wonder if SIGKILL might not be a better fit.
      
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Kumar Gala <kumar.gala@freescale.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Fixes: 9bad068c24d7 ("[PATCH] ppc32: support for e500 and 85xx")
      Fixes: 0ed70f6105ef ("PPC32: Provide proper siginfo information on various exceptions.")
      History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      e821fa42
    • E
      signal/powerpc: Replace FPE_FIXME with FPE_FLTUNK · aeb1c0f6
      Eric W. Biederman committed on
      Using an si_code of 0 that aliases with SI_USER is clearly the
      wrong thing to do, and causes problems in interesting ways.
      
      The newly defined FPE_FLTUNK semantically appears to fit the
      bill so use it instead.
      
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Kumar Gala <kumar.gala@freescale.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc:  linuxppc-dev@lists.ozlabs.org
      Fixes: 9bad068c24d7 ("[PATCH] ppc32: support for e500 and 85xx")
      Fixes: 0ed70f6105ef ("PPC32: Provide proper siginfo information on various exceptions.")
      History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      aeb1c0f6
    • E
      signal: Ensure every siginfo we send has all bits initialized · 3eb0f519
      Eric W. Biederman committed on
      Call clear_siginfo to ensure every stack allocated siginfo is properly
      initialized before being passed to the signal sending functions.
      
      Note: It is not safe to depend on C initializers to initialize struct
      siginfo on the stack because C is allowed to skip holes when
      initializing a structure.
      
      The initialization of struct siginfo in tracehook_report_syscall_exit
      was moved from the helper user_single_step_siginfo into
      tracehook_report_syscall_exit itself, to make it clear that the local
      variable siginfo gets fully initialized.
      
      In a few cases the scope of struct siginfo has been reduced to make it
      clear that siginfo is not used on other paths in the function in
      which it is declared.
      
      Instances of using memset to initialize siginfo have been replaced
      with calls to clear_siginfo for clarity.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      3eb0f519
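The point about holes can be illustrated with a small userspace sketch. `struct demo_info` and `demo_clear_info()` are hypothetical; the helper is only analogous in spirit to clear_siginfo, zeroing every byte of the object, padding included, which a C initializer is not required to do.

```c
#include <string.h>

/*
 * Hypothetical struct: on most ABIs there is a padding hole between
 * 'tag' and 'value' that member-wise initialization may leave untouched.
 */
struct demo_info {
    char tag;
    long value;
};

/* Zero every byte of the object, including any padding holes. */
static void demo_clear_info(struct demo_info *info)
{
    memset(info, 0, sizeof(*info));
}
```

Calling `demo_clear_info()` before filling in the fields means no stale stack bytes remain in the object when it is handed on.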
    • N
      powerpc: Fix smp_send_stop NMI IPI handling · ac61c115
      Nicholas Piggin committed on
      The NMI IPI handler for a receiving CPU increments nmi_ipi_busy_count
      over the handler function call, which causes later smp_send_nmi_ipi()
      callers to spin until the call is finished.
      
      The stop_this_cpu() function never returns, so the busy count is never
      decremented, which can cause the system to hang in some cases. For
      example panic() will call smp_send_stop() early on which calls
      stop_this_cpu() on other CPUs, then later in the reboot path,
      pnv_restart() will call smp_send_stop() again, which hangs.
      
      Fix this by adding a special case to the stop_this_cpu() handler to
      decrement the busy count, because it will never return.
      
      Now that the NMI/non-NMI versions of stop_this_cpu() are different,
      split them out into separate functions rather than doing #ifdef tricks
      to share the body between the two functions.
      
      Fixes: 6bed3237 ("powerpc: use NMI IPI for smp_send_stop")
      Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Split out the functions, tweak change log a bit]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      ac61c115
    • N
      rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops · 682e6b4d
      Nicholas Piggin committed on
      The OPAL RTC driver does not sleep when it gets OPAL_BUSY or
      OPAL_BUSY_EVENT from firmware, which causes large scheduling
      latencies; up to 50 seconds have been observed here when the RTC
      stops responding (a BMC reboot can do it).
      
      Fix this by converting it to the standard form OPAL_BUSY loop that
      sleeps.
      
      Fixes: 628daa8d ("powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks")
      Cc: stable@vger.kernel.org # v3.2+
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      682e6b4d
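A busy loop that sleeps between retries might look like the sketch below. Everything here is a userspace stand-in: the return codes, `struct demo_fw`, and the `demo_*` functions are hypothetical, the firmware is simulated, and the sleep and event poll are reduced to counters so the behaviour can be checked.

```c
/* Hypothetical return codes; the real OPAL values are not reproduced here. */
enum demo_rc { DEMO_SUCCESS, DEMO_BUSY, DEMO_BUSY_EVENT };

/* Simulated firmware state, invented for this sketch. */
struct demo_fw {
    int busy_event_replies; /* BUSY_EVENT replies still to come */
    int busy_replies;       /* BUSY replies still to come */
    int sleeps;             /* how often the caller slept */
    int polls;              /* how often the caller polled for events */
};

/* Simulated firmware call: reports busy until the counters run out. */
static enum demo_rc demo_rtc_read(struct demo_fw *fw)
{
    if (fw->busy_event_replies > 0) {
        fw->busy_event_replies--;
        return DEMO_BUSY_EVENT;
    }
    if (fw->busy_replies > 0) {
        fw->busy_replies--;
        return DEMO_BUSY;
    }
    return DEMO_SUCCESS;
}

/* Stand-ins for a sleeping delay and an event poll. */
static void demo_sleep(struct demo_fw *fw) { fw->sleeps++; }
static void demo_poll_events(struct demo_fw *fw) { fw->polls++; }

/* Retry until the firmware stops reporting busy, sleeping each time. */
static enum demo_rc demo_rtc_read_retry(struct demo_fw *fw)
{
    enum demo_rc rc = DEMO_BUSY;

    while (rc == DEMO_BUSY || rc == DEMO_BUSY_EVENT) {
        rc = demo_rtc_read(fw);
        if (rc == DEMO_BUSY_EVENT) {
            demo_sleep(fw);
            demo_poll_events(fw);
        } else if (rc == DEMO_BUSY) {
            demo_sleep(fw);
        }
    }
    return rc;
}
```

The key property is that every busy reply leads to a sleep rather than an immediate retry, so other tasks can be scheduled while the firmware is unresponsive.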
  13. 24 April, 2018 5 commits
    • M
      powerpc/mce: Fix a bug where mce loops on memory UE. · 75ecfb49
      Mahesh Salgaonkar committed on
      The current code extracts the physical address for UE errors and then
      hooks it up into memory failure infrastructure. On successful
      extraction of physical address it wrongly sets "handled = 1" which
      means this UE error has been recovered. Since MCE handler gets return
      value as handled = 1, it assumes that error has been recovered and
      goes back to same NIP. This causes MCE interrupt again and again in a
      loop leading to hard lockup.
      
      Also, initialize phys_addr to ULONG_MAX so that we don't end up
      queuing undesired page to hwpoison.
      
      Without this patch we see:
        Severe Machine check interrupt [Recovered]
          NIP: [000000001002588c] PID: 7109 Comm: find
          Initiator: CPU
          Error type: UE [Load/Store]
            Effective address: 00007fffd2755940
            Physical address:  000020181a080000
        ...
        Severe Machine check interrupt [Recovered]
          NIP: [000000001002588c] PID: 7109 Comm: find
          Initiator: CPU
          Error type: UE [Load/Store]
            Effective address: 00007fffd2755940
            Physical address:  000020181a080000
        Severe Machine check interrupt [Recovered]
          NIP: [000000001002588c] PID: 7109 Comm: find
          Initiator: CPU
          Error type: UE [Load/Store]
            Effective address: 00007fffd2755940
            Physical address:  000020181a080000
        Memory failure: 0x20181a08: recovery action for dirty LRU page: Recovered
        Memory failure: 0x20181a08: already hardware poisoned
        Memory failure: 0x20181a08: already hardware poisoned
        Memory failure: 0x20181a08: already hardware poisoned
        Memory failure: 0x20181a08: already hardware poisoned
        Memory failure: 0x20181a08: already hardware poisoned
        Memory failure: 0x20181a08: already hardware poisoned
        ...
        Watchdog CPU:38 Hard LOCKUP
      
      After this patch we see:
      
        Severe Machine check interrupt [Not recovered]
          NIP: [00007fffaae585f4] PID: 7168 Comm: find
          Initiator: CPU
          Error type: UE [Load/Store]
            Effective address: 00007fffaafe28ac
            Physical address:  00002017c0bd0000
        find[7168]: unhandled signal 7 at 00007fffaae585f4 nip 00007fffaae585f4 lr 00007fffaae585e0 code 4
        Memory failure: 0x2017c0bd: recovery action for dirty LRU page: Recovered
      
      Fixes: 01eaac2b ("powerpc/mce: Hookup ierror (instruction) UE errors")
      Fixes: ba41e1e1 ("powerpc/mce: Hookup derror (load/store) UE errors")
      Cc: stable@vger.kernel.org # v4.15+
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Balbir Singh <bsingharora@gmail.com>
      Reviewed-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      75ecfb49
    • A
      powerpc/powernv/npu: Do a PID GPU TLB flush when invalidating a large address range · d0cf9b56
      Alistair Popple committed on
      The NPU has a limited number of address translation shootdown (ATSD)
      registers and the GPU has limited bandwidth to process ATSDs. This can
      result in contention of ATSD registers leading to soft lockups on some
      threads, particularly when invalidating a large address range in
      pnv_npu2_mn_invalidate_range().
      
      At some threshold it becomes more efficient to flush the entire GPU
      TLB for the given MM context (PID) than individually flushing each
      address in the range. This patch will result in ranges greater than
      2MB being converted from 32+ ATSDs into a single ATSD which will flush
      the TLB for the given PID on each GPU.
      
      Fixes: 1ab66d1f ("powerpc/powernv: Introduce address translation services for Nvlink2")
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: Alistair Popple <alistair@popple.id.au>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
      Tested-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      d0cf9b56
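The threshold decision can be sketched as a small C function. The 2MB figure comes from the commit message; `DEMO_FLUSH_THRESHOLD`, `enum demo_flush`, and `demo_choose_flush()` are hypothetical names, and the sketch assumes `end >= start`.

```c
#include <stdint.h>

#define DEMO_FLUSH_THRESHOLD (2ULL * 1024 * 1024) /* 2MB, from the commit message */

enum demo_flush {
    DEMO_FLUSH_RANGE, /* flush each address in the range individually */
    DEMO_FLUSH_PID    /* flush the whole TLB for the MM context (PID) */
};

/* Choose a single PID flush once the range is greater than the threshold. */
static enum demo_flush demo_choose_flush(uint64_t start, uint64_t end)
{
    if (end - start > DEMO_FLUSH_THRESHOLD)
        return DEMO_FLUSH_PID;
    return DEMO_FLUSH_RANGE;
}
```

Ranges at or below the threshold keep the per-address path, so small invalidations do not throw away unrelated translations.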
    • A
      powerpc/powernv/npu: Prevent overwriting of pnv_npu2_init_context() callback parameters · a1409ada
      Alistair Popple committed on
      There is a single npu context per set of callback parameters. Callers
      should be prevented from overwriting existing callback values so
      instead return an error if different parameters are passed.
      
      Fixes: 1ab66d1f ("powerpc/powernv: Introduce address translation services for Nvlink2")
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: Alistair Popple <alistair@popple.id.au>
      Reviewed-by: Mark Hairgrove <mhairgrove@nvidia.com>
      Tested-by: Mark Hairgrove <mhairgrove@nvidia.com>
      Reviewed-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      a1409ada
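The rule of refusing to overwrite existing callback values can be sketched as below. This is an assumption-laden illustration, not the kernel function: `struct demo_npu_context`, `demo_release_cb`, and `demo_init_context()` are hypothetical, and the plain `-1` error value is a placeholder for whatever error the real code returns.

```c
#include <stddef.h>

typedef void (*demo_release_cb)(void *priv);

/* Hypothetical context holding one set of callback parameters. */
struct demo_npu_context {
    int in_use;
    demo_release_cb release_cb;
    void *priv;
};

/* Returns 0 on success, -1 if the context exists with different parameters. */
static int demo_init_context(struct demo_npu_context *ctx,
                             demo_release_cb cb, void *priv)
{
    if (ctx->in_use) {
        if (ctx->release_cb != cb || ctx->priv != priv)
            return -1; /* refuse to overwrite the existing values */
        return 0;      /* same parameters: reuse the existing context */
    }
    ctx->in_use = 1;
    ctx->release_cb = cb;
    ctx->priv = priv;
    return 0;
}
```

A second caller passing the same parameters succeeds, while one passing different parameters gets an error and the stored values stay intact.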
    • A
      powerpc/powernv/npu: Add lock to prevent race in concurrent context init/destroy · 28a5933e
      Alistair Popple committed on
      The pnv_npu2_init_context() and pnv_npu2_destroy_context() functions
      are used to allocate/free contexts to allow address translation and
      shootdown by the NPU on a particular GPU. Context initialisation is
      implicitly safe as it is protected by the requirement mmap_sem be held
      in write mode, however pnv_npu2_destroy_context() does not require
      mmap_sem to be held and it is not safe to call with a concurrent
      initialisation for a different GPU.
      
      It was assumed the driver would ensure destruction was not called
      concurrently with initialisation. However the driver may be simplified
      by allowing concurrent initialisation and destruction for different
      GPUs. As npu context creation/destruction is not a performance
      critical path and the critical section is not large a single spinlock
      is used for simplicity.
      
      Fixes: 1ab66d1f ("powerpc/powernv: Introduce address translation services for Nvlink2")
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: Alistair Popple <alistair@popple.id.au>
      Reviewed-by: Mark Hairgrove <mhairgrove@nvidia.com>
      Tested-by: Mark Hairgrove <mhairgrove@nvidia.com>
      Reviewed-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      28a5933e
    • B
      powerpc/powernv/memtrace: Let the arch hotunplug code flush cache · 7fd6641d
      Balbir Singh committed on
      Don't do this via custom code, instead now that we have support in the
      arch hotplug/hotunplug code, rely on those routines to do the right
      thing.
      
      The existing flush doesn't work because it uses ppc64_caches.l1d.size
      instead of ppc64_caches.l1d.line_size.
      
      Fixes: 9d5171a8 ("powerpc/powernv: Enable removal of memory for in memory tracing")
      Signed-off-by: Balbir Singh <bsingharora@gmail.com>
      Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      7fd6641d
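Why stepping by the cache size instead of the line size breaks a flush loop can be shown with a small counting sketch. `demo_count_flushes()` is a hypothetical helper; it only counts iterations where a real loop would issue a flush instruction. The 128-byte line and 32KB cache used in the note below are illustrative values, not taken from the commit.

```c
#include <stddef.h>

/*
 * Count how many flush operations a loop issues when it walks
 * [0, len) in steps of 'stride' bytes.
 */
static size_t demo_count_flushes(size_t len, size_t stride)
{
    size_t n = 0;

    if (stride == 0)
        return 0;
    for (size_t off = 0; off < len; off += stride)
        n++; /* a real loop would flush the line at 'off' here */
    return n;
}
```

For a 1MB region, a 128-byte stride touches 8192 lines, while a 32KB stride touches only 32 and skips almost every cache line in between.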