1. 09 2月, 2018 3 次提交
    • A
      KVM: PPC: Book3S HV: Branch inside feature section · d20fe50a
      Alexander Graf 提交于
      We ended up with code that did a conditional branch inside a feature
      section to code outside of the feature section. Depending on how the
      object file gets organized, that might mean we exceed the 14bit
      relocation limit for conditional branches:
      
        arch/powerpc/kvm/built-in.o:arch/powerpc/kvm/book3s_hv_rmhandlers.S:416:(__ftr_alt_97+0x8): relocation truncated to fit: R_PPC64_REL14 against `.text'+1ca4
      
      So instead of doing a conditional branch outside of the feature section,
      let's just jump at the end of the same, making the branch very short.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      d20fe50a
    • D
      KVM: PPC: Book3S HV: Make HPT resizing work on POWER9 · 790a9df5
      David Gibson 提交于
      This adds code to enable the HPT resizing code to work on POWER9,
      which uses a slightly modified HPT entry format compared to POWER8.
      On POWER9, we convert HPTEs read from the HPT from the new format to
      the old format so that the rest of the HPT resizing code can work as
      before.  HPTEs written to the new HPT are converted to the new format
      as the last step before writing them into the new HPT.
      
      This takes out the checks added by commit bcd3bb63 ("KVM: PPC:
      Book3S HV: Disable HPT resizing on POWER9 for now", 2017-02-18),
      now that HPT resizing works on POWER9.
      
      On POWER9, when we pivot to the new HPT, we now call
      kvmppc_setup_partition_table() to update the partition table in order
      to make the hardware use the new HPT.
      
      [paulus@ozlabs.org - added kvmppc_setup_partition_table() call,
       wrote commit message.]
      Tested-by: NLaurent Vivier <lvivier@redhat.com>
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      790a9df5
    • P
      KVM: PPC: Book3S HV: Fix handling of secondary HPTEG in HPT resizing code · 05f2bb03
      Paul Mackerras 提交于
      This fixes the computation of the HPTE index to use when the HPT
      resizing code encounters a bolted HPTE which is stored in its
      secondary HPTE group.  The code inverts the HPTE group number, which
      is correct, but doesn't then mask it with new_hash_mask.  As a result,
      new_pteg will be effectively negative, resulting in new_hptep
      pointing before the new HPT, which will corrupt memory.
      
      In addition, this removes two BUG_ON statements.  The condition that
      the BUG_ONs were testing -- that we have computed the hash value
      incorrectly -- has never been observed in testing, and if it did
      occur, would only affect the guest, not the host.  Given that
      BUG_ON should only be used in conditions where the kernel (i.e.
      the host kernel, in this case) can't possibly continue execution,
      it is not appropriate here.
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      05f2bb03
  2. 08 2月, 2018 1 次提交
  3. 01 2月, 2018 2 次提交
    • A
      KVM: PPC: Book3S PR: Fix svcpu copying with preemption enabled · 07ae5389
      Alexander Graf 提交于
      When copying between the vcpu and svcpu, we may get scheduled away onto
      a different host CPU which in turn means our svcpu pointer may change.
      
      That means we need to atomically copy to and from the svcpu with preemption
      disabled, so that all code around it always sees a coherent state.
      Reported-by: NSimon Guo <wei.guo.simon@gmail.com>
      Fixes: 3d3319b4 ("KVM: PPC: Book3S: PR: Enable interrupts earlier")
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      07ae5389
    • P
      KVM: PPC: Book3S HV: Drop locks before reading guest memory · 36ee41d1
      Paul Mackerras 提交于
      Running with CONFIG_DEBUG_ATOMIC_SLEEP reveals that HV KVM tries to
      read guest memory, in order to emulate guest instructions, while
      preempt is disabled and a vcore lock is held.  This occurs in
      kvmppc_handle_exit_hv(), called from post_guest_process(), when
      emulating guest doorbell instructions on POWER9 systems, and also
      when checking whether we have hit a hypervisor breakpoint.
      Reading guest memory can cause a page fault and thus cause the
      task to sleep, so we need to avoid reading guest memory while
      holding a spinlock or when preempt is disabled.
      
      To fix this, we move the preempt_enable() in kvmppc_run_core() to
      before the loop that calls post_guest_process() for each vcore that
      has just run, and we drop and re-take the vcore lock around the calls
      to kvmppc_emulate_debug_inst() and kvmppc_emulate_doorbell_instr().
      
      Dropping the lock is safe with respect to the iteration over the
      runnable vcpus in post_guest_process(); for_each_runnable_thread
      is actually safe to use locklessly.  It is possible for a vcpu
      to become runnable and add itself to the runnable_threads array
      (code near the beginning of kvmppc_run_vcpu()) and then get included
      in the iteration in post_guest_process despite the fact that it
      has not just run.  This is benign because vcpu->arch.trap and
      vcpu->arch.ceded will be zero.
      
      Cc: stable@vger.kernel.org # v4.13+
      Fixes: 57900694 ("KVM: PPC: Book3S HV: Virtualize doorbell facility on POWER9")
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      36ee41d1
  4. 19 1月, 2018 6 次提交
  5. 18 1月, 2018 2 次提交
    • P
      KVM: PPC: Book3S HV: Improve handling of debug-trigger HMIs on POWER9 · d075745d
      Paul Mackerras 提交于
      Hypervisor maintenance interrupts (HMIs) are generated by various
      causes, signalled by bits in the hypervisor maintenance exception
      register (HMER).  In most cases calling OPAL to handle the interrupt
      is the correct thing to do, but the "debug trigger" HMIs signalled by
      PPC bit 17 (bit 46) of HMER are used to invoke software workarounds
      for hardware bugs, and OPAL does not have any code to handle this
      cause.  The debug trigger HMI is used in POWER9 DD2.0 and DD2.1 chips
      to work around a hardware bug in executing vector load instructions to
      cache inhibited memory.  In POWER9 DD2.2 chips, it is generated when
      conditions are detected relating to threads being in TM (transactional
      memory) suspended mode when the core SMT configuration needs to be
      reconfigured.
      
      The kernel currently has code to detect the vector CI load condition,
      but only when the HMI occurs in the host, not when it occurs in a
      guest.  If a HMI occurs in the guest, it is always passed to OPAL, and
      then we always re-sync the timebase, because the HMI cause might have
      been a timebase error, for which OPAL would re-sync the timebase, thus
      removing the timebase offset which KVM applied for the guest.  Since
      we don't know what OPAL did, we don't know whether to subtract the
      timebase offset from the timebase, so instead we re-sync the timebase.
      
      This adds code to determine explicitly what the cause of a debug
      trigger HMI will be.  This is based on a new device-tree property
      under the CPU nodes called ibm,hmi-special-triggers, if it is
      present, or otherwise based on the PVR (processor version register).
      The handling of debug trigger HMIs is pulled out into a separate
      function which can be called from the KVM guest exit code.  If this
      function handles and clears the HMI, and no other HMI causes remain,
      then we skip calling OPAL and we proceed to subtract the guest
      timebase offset from the timebase.
      
      The overall handling for HMIs that occur in the host (i.e. not in a
      KVM guest) is largely unchanged, except that we now don't set the flag
      for the vector CI load workaround on DD2.2 processors.
      
      This also removes a BUG_ON in the KVM code.  BUG_ON is generally not
      useful in KVM guest entry/exit code since it is difficult to handle
      the resulting trap gracefully.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      d075745d
    • P
      KVM: PPC: Book3S HV: Allow HPT and radix on the same core for POWER9 v2.2 · 00608e1f
      Paul Mackerras 提交于
      POWER9 chip versions starting with "Nimbus" v2.2 can support running
      with some threads of a core in HPT mode and others in radix mode.
      This means that we don't have to prohibit independent-threads mode
      when running a HPT guest on a radix host, and we don't have to do any
      of the synchronization between threads that was introduced in commit
      c0101509 ("KVM: PPC: Book3S HV: Run HPT guests on POWER9 radix
      hosts", 2017-10-19).
      
      Rather than using up another CPU feature bit, we just do an
      explicit test on the PVR (processor version register) at module
      startup time to determine whether we have to take steps to avoid
      having some threads in HPT mode and some in radix mode (so-called
      "mixed mode").  We test for "Nimbus" (indicated by 0 or 1 in the top
      nibble of the lower 16 bits) v2.2 or later, or "Cumulus" (indicated by
      2 or 3 in that nibble) v1.1 or later.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      00608e1f
  6. 17 1月, 2018 2 次提交
    • P
      KVM: PPC: Book3S HV: Do SLB load/unload with guest LPCR value loaded · 6964e6a4
      Paul Mackerras 提交于
      This moves the code that loads and unloads the guest SLB values so that
      it is done while the guest LPCR value is loaded in the LPCR register.
      The reason for doing this is that on POWER9, the behaviour of the
      slbmte instruction depends on the LPCR[UPRT] bit.  If UPRT is 1, as
      it is for a radix host (or guest), the SLB index is truncated to
      2 bits.  This means that for a HPT guest on a radix host, the SLB
      was not being loaded correctly, causing the guest to crash.
      
      The SLB is now loaded much later in the guest entry path, after the
      LPCR is loaded, which for a secondary thread is after it sees that
      the primary thread has switched the MMU to the guest.  The loop that
      waits for the primary thread has a branch out to the exit code that
      is taken if it sees that other threads have commenced exiting the
      guest.  Since we have now not loaded the SLB at this point, we make
      this path branch to a new label 'guest_bypass' and we move the SLB
      unload code to before this label.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      6964e6a4
    • P
      KVM: PPC: Book3S HV: Make sure we don't re-enter guest without XIVE loaded · 43ff3f65
      Paul Mackerras 提交于
      This fixes a bug where it is possible to enter a guest on a POWER9
      system without having the XIVE (interrupt controller) context loaded.
      This can happen because we unload the XIVE context from the CPU
      before doing the real-mode handling for machine checks.  After the
      real-mode handler runs, it is possible that we re-enter the guest
      via a fast path which does not load the XIVE context.
      
      To fix this, we move the unloading of the XIVE context to come after
      the real-mode machine check handler is called.
      
      Fixes: 5af50993 ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
      Cc: stable@vger.kernel.org # v4.11+
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      43ff3f65
  7. 16 1月, 2018 1 次提交
    • P
      KVM: PPC: Book3S HV: Enable migration of decrementer register · 5855564c
      Paul Mackerras 提交于
      This adds a register identifier for use with the one_reg interface
      to allow the decrementer expiry time to be read and written by
      userspace.  The decrementer expiry time is in guest timebase units
      and is equal to the sum of the decrementer and the guest timebase.
      (The expiry time is used rather than the decrementer value itself
      because the expiry time is not constantly changing, though the
      decrementer value is, while the guest vcpu is not running.)
      
      Without this, a guest vcpu migrated to a new host will see its
      decrementer set to some random value.  On POWER8 and earlier, the
      decrementer is 32 bits wide and counts down at 512MHz, so the
      guest vcpu will potentially see no decrementer interrupts for up
      to about 4 seconds, which will lead to a stall.  With POWER9, the
      decrementer is now 56 bits side, so the stall can be much longer
      (up to 2.23 years) and more noticeable.
      
      To help work around the problem in cases where userspace has not been
      updated to migrate the decrementer expiry time, we now set the
      default decrementer expiry at vcpu creation time to the current time
      rather than the maximum possible value.  This should mean an
      immediate decrementer interrupt when a migrated vcpu starts
      running.  In cases where the decrementer is 32 bits wide and more
      than 4 seconds elapse between the creation of the vcpu and when it
      first runs, the decrementer would have wrapped around to positive
      values and there may still be a stall - but this is no worse than
      the current situation.  In the large-decrementer case, we are sure
      to get an immediate decrementer interrupt (assuming the time from
      vcpu creation to first run is less than 2.23 years) and we thus
      avoid a very long stall.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      5855564c
  8. 12 1月, 2018 2 次提交
  9. 11 1月, 2018 2 次提交
  10. 09 1月, 2018 1 次提交
  11. 30 11月, 2017 1 次提交
  12. 29 11月, 2017 2 次提交
    • V
      powerpc: Do not assign thread.tidr if already assigned · 7e4d4233
      Vaibhav Jain 提交于
      If set_thread_tidr() is called twice for same task_struct then it will
      allocate a new tidr value to it leaving the previous value still
      dangling in the vas_thread_ida table.
      
      To fix this the patch changes set_thread_tidr() to check if a tidr
      value is already assigned to the task_struct and if yes then returns
      zero.
      
      Fixes: ec233ede("powerpc: Add support for setting SPRN_TIDR")
      Signed-off-by: NVaibhav Jain <vaibhav@linux.vnet.ibm.com>
      Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      [mpe: Modify to return 0 in the success case, not the TID value]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      7e4d4233
    • V
      powerpc: Avoid signed to unsigned conversion in set_thread_tidr() · aca7573f
      Vaibhav Jain 提交于
      There is an unsafe signed to unsigned conversion in set_thread_tidr()
      that may cause an error value to be assigned to SPRN_TIDR register and
      used as thread-id.
      
      The issue happens as assign_thread_tidr() returns an int and
      thread.tidr is an unsigned-long. So a negative error code returned
      from assign_thread_tidr() will fail the error check and gets assigned
      as tidr as a large positive value.
      
      To fix this the patch assigns the return value of assign_thread_tidr()
      to a temporary int and assigns it to thread.tidr iff its '> 0'.
      
      The patch shouldn't impact the calling convention of set_thread_tidr()
      i.e all -ve return-values are error codes and a return value of '0'
      indicates success.
      
      Fixes: ec233ede("powerpc: Add support for setting SPRN_TIDR")
      Signed-off-by: NVaibhav Jain <vaibhav@linux.vnet.ibm.com>
      Reviewed-by: Christophe Lombard clombard@linux.vnet.ibm.com
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      aca7573f
  13. 28 11月, 2017 1 次提交
    • J
      KVM: Let KVM_SET_SIGNAL_MASK work as advertised · 20b7035c
      Jan H. Schönherr 提交于
      KVM API says for the signal mask you set via KVM_SET_SIGNAL_MASK, that
      "any unblocked signal received [...] will cause KVM_RUN to return with
      -EINTR" and that "the signal will only be delivered if not blocked by
      the original signal mask".
      
      This, however, is only true, when the calling task has a signal handler
      registered for a signal. If not, signal evaluation is short-circuited for
      SIG_IGN and SIG_DFL, and the signal is either ignored without KVM_RUN
      returning or the whole process is terminated.
      
      Make KVM_SET_SIGNAL_MASK behave as advertised by utilizing logic similar
      to that in do_sigtimedwait() to avoid short-circuiting of signals.
      Signed-off-by: NJan H. Schönherr <jschoenh@amazon.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      20b7035c
  14. 24 11月, 2017 1 次提交
  15. 23 11月, 2017 7 次提交
    • M
      powerpc/powernv: Fix kexec crashes caused by tlbie tracing · a3961f82
      Mahesh Salgaonkar 提交于
      Rebooting into a new kernel with kexec fails in trace_tlbie() which is
      called from native_hpte_clear(). This happens if the running kernel
      has CONFIG_LOCKDEP enabled. With lockdep enabled, the tracepoints
      always execute few RCU checks regardless of whether tracing is on or
      off. We are already in the last phase of kexec sequence in real mode
      with HILE_BE set. At this point the RCU check ends up in
      RCU_LOCKDEP_WARN and causes kexec to fail.
      
      Fix this by not calling trace_tlbie() from native_hpte_clear().
      
      mpe: It's not safe to call trace points at this point in the kexec
      path, even if we could avoid the RCU checks/warnings. The only
      solution is to not call them.
      
      Fixes: 0428491c ("powerpc/mm: Trace tlbie(l) instructions")
      Cc: stable@vger.kernel.org # v4.13+
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Reported-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Suggested-by: NMichael Ellerman <mpe@ellerman.id.au>
      Acked-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a3961f82
    • P
      KVM: PPC: Book3S: Eliminate some unnecessary checks · 9aa6825b
      Paul Mackerras 提交于
      In an excess of caution, commit 6f63e81b ("KVM: PPC: Book3S: Add
      MMIO emulation for FP and VSX instructions", 2017-02-21) included
      checks for the case that vcpu->arch.mmio_vsx_copy_nums is less than
      zero, even though its type is u8.  This causes a Coverity warning,
      so we remove the check for < 0.  We also adjust the associated
      comment to be more accurate ("4 or less" rather than "less than 4").
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      9aa6825b
    • P
      KVM: PPC: Book3S HV: Fix conditions for starting vcpu · c0093f1a
      Paul Mackerras 提交于
      This corrects the test that determines whether a vcpu that has just
      become able to run in the guest (e.g. it has just finished handling
      a hypercall or hypervisor page fault) and whose virtual core is
      already running somewhere as a "piggybacked" vcore can start
      immediately or not.  (A piggybacked vcore is one which is executing
      along with another vcore as a result of dynamic micro-threading.)
      
      Previously the test tried to lock the piggybacked vcore using
      spin_trylock, which would always fail because the vcore was already
      locked, and so the vcpu would have to wait until its vcore exited
      the guest before it could enter.
      
      In fact the vcpu can enter if its vcore is in VCORE_PIGGYBACK state
      and not already exiting (or exited) the guest, so the test in
      VCORE_PIGGYBACK state is basically the same as for VCORE_RUNNING
      state.
      
      Coverity detected this as a double unlock issue, which it isn't
      because the spin_trylock would always fail.  This will fix the
      apparent double unlock as well.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      c0093f1a
    • P
      KVM: PPC: Book3S HV: Remove useless statement · 4fcf361d
      Paul Mackerras 提交于
      This removes a statement that has no effect.  It should have been
      removed in commit 898b25b2 ("KVM: PPC: Book3S HV: Simplify dynamic
      micro-threading code", 2017-06-22) along with the loop over the
      piggy-backed virtual cores.
      
      This issue was reported by Coverity.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      4fcf361d
    • P
      KVM: PPC: Book3S HV: Fix typo in kvmppc_hv_get_dirty_log_radix() · 117647ff
      Paul Mackerras 提交于
      This fixes a typo where the intent was to assign to 'j' in order to
      skip some number of bits in the dirty bitmap for a guest.  The effect
      of the typo is benign since it means we just iterate through all the
      bits rather than skipping bits which we know will be zero.  This issue
      was found by Coverity.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      117647ff
    • P
      KVM: PPC: Book3S HV: Avoid shifts by negative amounts · cda2eaa3
      Paul Mackerras 提交于
      The kvmppc_hpte_page_shifts function decodes the actual and base page
      sizes for a HPTE, returning -1 if it doesn't recognize the page size
      encoding.  This then gets used as a shift amount in various places,
      which is undefined behaviour.  This was reported by Coverity.
      
      In fact this should never occur, since we should only get HPTEs in the
      HPT which have a recognized page size encoding.  The only place where
      this might not be true is in the call to kvmppc_actual_pgsz() near the
      beginning of kvmppc_do_h_enter(), where we are validating the HPTE
      value passed in from the guest.
      
      So to fix this and eliminate the undefined behaviour, we make
      kvmppc_hpte_page_shifts return 0 for unrecognized page size encodings,
      and make kvmppc_actual_pgsz() detect that case and return 0 for the
      page size, which will then cause kvmppc_do_h_enter() to return an
      error and refuse to insert any HPTE with an unrecognized page size
      encoding.
      
      To ensure that we don't get undefined behaviour in compute_tlbie_rb(),
      we take the 4k page size path for any unrecognized page size encoding.
      This should never be hit in practice because it is only used on HPTE
      values which have previously been checked for having a recognized
      page size encoding.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      cda2eaa3
    • P
      KVM: PPC: Book3S HV: Fix migration and HPT resizing of HPT guests on radix hosts · ded13fc1
      Paul Mackerras 提交于
      This fixes two errors that prevent a guest using the HPT MMU from
      successfully migrating to a POWER9 host in radix MMU mode, or resizing
      its HPT when running on a radix host.
      
      The first bug was that commit 8dc6cca5 ("KVM: PPC: Book3S HV:
      Don't rely on host's page size information", 2017-09-11) missed two
      uses of hpte_base_page_size(), one in the HPT rehashing code and
      one in kvm_htab_write() (which is used on the destination side in
      migrating a HPT guest).  Instead we use kvmppc_hpte_base_page_shift().
      Having the shift count means that we can use left and right shifts
      instead of multiplication and division in a few places.
      
      Along the way, this adds a check in kvm_htab_write() to ensure that the
      page size encoding in the incoming HPTEs is recognized, and if not
      return an EINVAL error to userspace.
      
      The second bug was that kvm_htab_write was performing some but not all
      of the functions of kvmhv_setup_mmu(), resulting in the destination VM
      being left in radix mode as far as the hardware is concerned.  The
      simplest fix for now is make kvm_htab_write() call
      kvmppc_setup_partition_table() like kvmppc_hv_setup_htab_rma() does.
      In future it would be better to refactor the code more extensively
      to remove the duplication.
      
      Fixes: 8dc6cca5 ("KVM: PPC: Book3S HV: Don't rely on host's page size information")
      Fixes: 7a84084c ("KVM: PPC: Book3S HV: Set partition table rather than SDR1 on POWER9")
      Reported-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Tested-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      ded13fc1
  16. 22 11月, 2017 6 次提交
    • M
      powerpc/64s: Fix Power9 DD2.1 logic in DT CPU features · 4d6c51b1
      Michael Ellerman 提交于
      I got the logic wrong in the DT CPU features code when I added the
      Power9 DD2.1 feature. We should be setting the bit if we detect a
      DD2.1, not clearing it if we detect a DD2.0.
      
      This code isn't actually exercised at the moment so nothing is
      actually broken.
      
      Fixes: 3ffa9d9e ("powerpc/64s: Fix Power9 DD2.0 workarounds by adding DD2.1 feature")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      4d6c51b1
    • M
      powerpc/perf: Fix IMC_MAX_PMU macro · 73ce9aec
      Madhavan Srinivasan 提交于
      IMC_MAX_PMU is used for static storage (per_nest_pmu_arr) which holds
      nest pmu information. Current value for the macro is 32 based on
      the initial number of nest pmu units supported by the nest microcode.
      But going forward, microcode could support more nest units. Instead
      of static storage, patch to fix the code to dynamically allocate an
      array based on the number of nest imc units found in the device tree.
      
      Fixes:8f95faaa ('powerpc/powernv: Detect and create IMC device')
      Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      73ce9aec
    • M
      powerpc/perf: Fix pmu_count to count only nest imc pmus · de34787f
      Madhavan Srinivasan 提交于
      "pmu_count" in opal_imc_counters_probe() is intended to hold
      the number of successful nest imc pmu registerations. But
      current code also counts other imc units like core_imc and
      thread_imc. Patch add a check to count only nest imc pmus.
      Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      de34787f
    • C
      powerpc: Fix boot on BOOK3S_32 with CONFIG_STRICT_KERNEL_RWX · 252eb558
      Christophe Leroy 提交于
      On powerpc32, patch_instruction() is called by apply_feature_fixups()
      which is called from early_init()
      
      There is the following note in front of early_init():
       * Note that the kernel may be running at an address which is different
       * from the address that it was linked at, so we must use RELOC/PTRRELOC
       * to access static data (including strings).  -- paulus
      
      Therefore, slab_is_available() cannot be called yet, and
      text_poke_area must be addressed with PTRRELOC()
      
      Fixes: 95902e6c ("powerpc/mm: Implement STRICT_KERNEL_RWX on PPC32")
      Cc: stable@vger.kernel.org # v4.14+
      Reported-by: NMeelis Roos <mroos@linux.ee>
      Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      252eb558
    • M
      powerpc/perf/imc: Use cpu_to_node() not topology_physical_package_id() · f3f1dfd6
      Michael Ellerman 提交于
      init_imc_pmu() uses topology_physical_package_id() to detect the
      node id of the processor it is on to get local memory, but that's
      wrong, and can lead to crashes. Fix it to use cpu_to_node().
      
      Fixes: 885dcd70 ("powerpc/perf: Add nest IMC PMU support")
      Cc: stable@vger.kernel.org # v4.14+
      Reported-By: NRob Lippert <rlippert@google.com>
      Tested-By: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      f3f1dfd6
    • K
      treewide: setup_timer() -> timer_setup() (2 field) · 86cb30ec
      Kees Cook 提交于
      This converts all remaining setup_timer() calls that use a nested field
      to reach a struct timer_list. Coccinelle does not have an easy way to
      match multiple fields, so a new script is needed to change the matches of
      "&_E->_timer" into "&_E->_field1._timer" in all the rules.
      
      spatch --very-quiet --all-includes --include-headers \
      	-I ./arch/x86/include -I ./arch/x86/include/generated \
      	-I ./include -I ./arch/x86/include/uapi \
      	-I ./arch/x86/include/generated/uapi -I ./include/uapi \
      	-I ./include/generated/uapi --include ./include/linux/kconfig.h \
      	--dir . \
      	--cocci-file ~/src/data/timer_setup-2fields.cocci
      
      @fix_address_of depends@
      expression e;
      @@
      
       setup_timer(
      -&(e)
      +&e
       , ...)
      
      // Update any raw setup_timer() usages that have a NULL callback, but
      // would otherwise match change_timer_function_usage, since the latter
      // will update all function assignments done in the face of a NULL
      // function initialization in setup_timer().
      @change_timer_function_usage_NULL@
      expression _E;
      identifier _field1;
      identifier _timer;
      type _cast_data;
      @@
      
      (
      -setup_timer(&_E->_field1._timer, NULL, _E);
      +timer_setup(&_E->_field1._timer, NULL, 0);
      |
      -setup_timer(&_E->_field1._timer, NULL, (_cast_data)_E);
      +timer_setup(&_E->_field1._timer, NULL, 0);
      |
      -setup_timer(&_E._field1._timer, NULL, &_E);
      +timer_setup(&_E._field1._timer, NULL, 0);
      |
      -setup_timer(&_E._field1._timer, NULL, (_cast_data)&_E);
      +timer_setup(&_E._field1._timer, NULL, 0);
      )
      
      @change_timer_function_usage@
      expression _E;
      identifier _field1;
      identifier _timer;
      struct timer_list _stl;
      identifier _callback;
      type _cast_func, _cast_data;
      @@
      
      (
      -setup_timer(&_E->_field1._timer, _callback, _E);
      +timer_setup(&_E->_field1._timer, _callback, 0);
      |
      -setup_timer(&_E->_field1._timer, &_callback, _E);
      +timer_setup(&_E->_field1._timer, _callback, 0);
      |
      -setup_timer(&_E->_field1._timer, _callback, (_cast_data)_E);
      +timer_setup(&_E->_field1._timer, _callback, 0);
      |
      -setup_timer(&_E->_field1._timer, &_callback, (_cast_data)_E);
      +timer_setup(&_E->_field1._timer, _callback, 0);
      |
      -setup_timer(&_E->_field1._timer, (_cast_func)_callback, _E);
      +timer_setup(&_E->_field1._timer, _callback, 0);
      |
      -setup_timer(&_E->_field1._timer, (_cast_func)&_callback, _E);
      +timer_setup(&_E->_field1._timer, _callback, 0);
      |
      -setup_timer(&_E->_field1._timer, (_cast_func)_callback, (_cast_data)_E);
      +timer_setup(&_E->_field1._timer, _callback, 0);
      |
      -setup_timer(&_E->_field1._timer, (_cast_func)&_callback, (_cast_data)_E);
      +timer_setup(&_E->_field1._timer, _callback, 0);
      |
      -setup_timer(&_E._field1._timer, _callback, (_cast_data)_E);
      +timer_setup(&_E._field1._timer, _callback, 0);
      |
      -setup_timer(&_E._field1._timer, _callback, (_cast_data)&_E);
      +timer_setup(&_E._field1._timer, _callback, 0);
      |
      -setup_timer(&_E._field1._timer, &_callback, (_cast_data)_E);
      +timer_setup(&_E._field1._timer, _callback, 0);
      |
      -setup_timer(&_E._field1._timer, &_callback, (_cast_data)&_E);
      +timer_setup(&_E._field1._timer, _callback, 0);
      |
      -setup_timer(&_E._field1._timer, (_cast_func)_callback, (_cast_data)_E);
      +timer_setup(&_E._field1._timer, _callback, 0);
      |
      -setup_timer(&_E._field1._timer, (_cast_func)_callback, (_cast_data)&_E);
      +timer_setup(&_E._field1._timer, _callback, 0);
      |
      -setup_timer(&_E._field1._timer, (_cast_func)&_callback, (_cast_data)_E);
      +timer_setup(&_E._field1._timer, _callback, 0);
      |
      -setup_timer(&_E._field1._timer, (_cast_func)&_callback, (_cast_data)&_E);
      +timer_setup(&_E._field1._timer, _callback, 0);
      |
       _E->_field1._timer@_stl.function = _callback;
      |
       _E->_field1._timer@_stl.function = &_callback;
      |
       _E->_field1._timer@_stl.function = (_cast_func)_callback;
      |
       _E->_field1._timer@_stl.function = (_cast_func)&_callback;
      |
       _E._field1._timer@_stl.function = _callback;
      |
       _E._field1._timer@_stl.function = &_callback;
      |
       _E._field1._timer@_stl.function = (_cast_func)_callback;
      |
       _E._field1._timer@_stl.function = (_cast_func)&_callback;
      )
      
      // callback(unsigned long arg)
      @change_callback_handle_cast
       depends on change_timer_function_usage@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._field1;
      identifier change_timer_function_usage._timer;
      type _origtype;
      identifier _origarg;
      type _handletype;
      identifier _handle;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *t
       )
       {
      (
      	... when != _origarg
      	_handletype *_handle =
      -(_handletype *)_origarg;
      +from_timer(_handle, t, _field1._timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle =
      -(void *)_origarg;
      +from_timer(_handle, t, _field1._timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle;
      	... when != _handle
      	_handle =
      -(_handletype *)_origarg;
      +from_timer(_handle, t, _field1._timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle;
      	... when != _handle
      	_handle =
      -(void *)_origarg;
      +from_timer(_handle, t, _field1._timer);
      	... when != _origarg
      )
       }
      
      // callback(unsigned long arg) without existing variable
      @change_callback_handle_cast_no_arg
       depends on change_timer_function_usage &&
                           !change_callback_handle_cast@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._field1;
      identifier change_timer_function_usage._timer;
      type _origtype;
      identifier _origarg;
      type _handletype;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *t
       )
       {
      +	_handletype *_origarg = from_timer(_origarg, t, _field1._timer);
      +
      	... when != _origarg
      -	(_handletype *)_origarg
      +	_origarg
      	... when != _origarg
       }
      
      // Avoid already converted callbacks.
      @match_callback_converted
       depends on change_timer_function_usage &&
                  !change_callback_handle_cast &&
      	    !change_callback_handle_cast_no_arg@
      identifier change_timer_function_usage._callback;
      identifier t;
      @@
      
       void _callback(struct timer_list *t)
       { ... }
      
      // callback(struct something *handle)
      @change_callback_handle_arg
       depends on change_timer_function_usage &&
      	    !match_callback_converted &&
                  !change_callback_handle_cast &&
                  !change_callback_handle_cast_no_arg@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._field1;
      identifier change_timer_function_usage._timer;
      type _handletype;
      identifier _handle;
      @@
      
       void _callback(
      -_handletype *_handle
      +struct timer_list *t
       )
       {
      +	_handletype *_handle = from_timer(_handle, t, _field1._timer);
      	...
       }
      
      // If change_callback_handle_arg ran on an empty function, remove
      // the added handler.
      @unchange_callback_handle_arg
       depends on change_timer_function_usage &&
      	    change_callback_handle_arg@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._field1;
      identifier change_timer_function_usage._timer;
      type _handletype;
      identifier _handle;
      identifier t;
      @@
      
       void _callback(struct timer_list *t)
       {
      -	_handletype *_handle = from_timer(_handle, t, _field1._timer);
       }
      
      // We only want to refactor the setup_timer() data argument if we've found
      // the matching callback. This undoes changes in change_timer_function_usage.
      @unchange_timer_function_usage
       depends on change_timer_function_usage &&
                  !change_callback_handle_cast &&
                  !change_callback_handle_cast_no_arg &&
      	    !change_callback_handle_arg@
      expression change_timer_function_usage._E;
      identifier change_timer_function_usage._field1;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type change_timer_function_usage._cast_data;
      @@
      
      (
      -timer_setup(&_E->_field1._timer, _callback, 0);
      +setup_timer(&_E->_field1._timer, _callback, (_cast_data)_E);
      |
      -timer_setup(&_E._field1._timer, _callback, 0);
      +setup_timer(&_E._field1._timer, _callback, (_cast_data)&_E);
      )
      
      // If we fixed a callback from a .function assignment, fix the
      // assignment cast now.
      @change_timer_function_assignment
       depends on change_timer_function_usage &&
                  (change_callback_handle_cast ||
                   change_callback_handle_cast_no_arg ||
                   change_callback_handle_arg)@
      expression change_timer_function_usage._E;
      identifier change_timer_function_usage._field1;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type _cast_func;
      typedef TIMER_FUNC_TYPE;
      @@
      
      (
       _E->_field1._timer.function =
      -_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_field1._timer.function =
      -&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_field1._timer.function =
      -(_cast_func)_callback;
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_field1._timer.function =
      -(_cast_func)&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._field1._timer.function =
      -_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._field1._timer.function =
      -&_callback;
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._field1._timer.function =
      -(_cast_func)_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._field1._timer.function =
      -(_cast_func)&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      )
      
      // Sometimes timer functions are called directly. Replace matched args.
      @change_timer_function_calls
       depends on change_timer_function_usage &&
                  (change_callback_handle_cast ||
                   change_callback_handle_cast_no_arg ||
                   change_callback_handle_arg)@
      expression _E;
      identifier change_timer_function_usage._field1;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type _cast_data;
      @@
      
       _callback(
      (
      -(_cast_data)_E
      +&_E->_field1._timer
      |
      -(_cast_data)&_E
      +&_E._field1._timer
      |
      -_E
      +&_E->_field1._timer
      )
       )
      
      // If a timer has been configured without a data argument, it can be
      // converted without regard to the callback argument, since it is unused.
      @match_timer_function_unused_data@
      expression _E;
      identifier _field1;
      identifier _timer;
      identifier _callback;
      @@
      
      (
      -setup_timer(&_E->_field1._timer, _callback, 0);
      +timer_setup(&_E->_field1._timer, _callback, 0);
      |
      -setup_timer(&_E->_field1._timer, _callback, 0L);
      +timer_setup(&_E->_field1._timer, _callback, 0);
      |
      -setup_timer(&_E->_field1._timer, _callback, 0UL);
      +timer_setup(&_E->_field1._timer, _callback, 0);
      |
      -setup_timer(&_E._field1._timer, _callback, 0);
      +timer_setup(&_E._field1._timer, _callback, 0);
      |
      -setup_timer(&_E._field1._timer, _callback, 0L);
      +timer_setup(&_E._field1._timer, _callback, 0);
      |
      -setup_timer(&_E._field1._timer, _callback, 0UL);
      +timer_setup(&_E._field1._timer, _callback, 0);
      |
      -setup_timer(&_field1._timer, _callback, 0);
      +timer_setup(&_field1._timer, _callback, 0);
      |
      -setup_timer(&_field1._timer, _callback, 0L);
      +timer_setup(&_field1._timer, _callback, 0);
      |
      -setup_timer(&_field1._timer, _callback, 0UL);
      +timer_setup(&_field1._timer, _callback, 0);
      |
      -setup_timer(_field1._timer, _callback, 0);
      +timer_setup(_field1._timer, _callback, 0);
      |
      -setup_timer(_field1._timer, _callback, 0L);
      +timer_setup(_field1._timer, _callback, 0);
      |
      -setup_timer(_field1._timer, _callback, 0UL);
      +timer_setup(_field1._timer, _callback, 0);
      )
      
      @change_callback_unused_data
       depends on match_timer_function_unused_data@
      identifier match_timer_function_unused_data._callback;
      type _origtype;
      identifier _origarg;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *unused
       )
       {
      	... when != _origarg
       }
      Signed-off-by: NKees Cook <keescook@chromium.org>
      86cb30ec