1. 13 4月, 2017 1 次提交
    • N
      powerpc: Change the doorbell IPI calling convention · b866cc21
      Nicholas Piggin 提交于
      Change the doorbell callers to know about their msgsnd addressing,
      rather than have them set a per-cpu target data tag at boot that gets
      sent to the cause_ipi functions. The data is only used for doorbell IPI
      functions, no other IPI types, so it makes sense to keep that detail
      local to doorbell.
      
      Have the platform code understand doorbell IPIs, rather than the
      interrupt controller code understand them. Platform code can look at
      capabilities it has available and decide which to use.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b866cc21
  2. 20 3月, 2015 1 次提交
    • P
      powerpc/powernv: Fixes for hypervisor doorbell handling · 755563bc
      Paul Mackerras 提交于
      Since we can now use hypervisor doorbells for host IPIs, this makes
      sure we clear the host IPI flag when taking a doorbell interrupt, and
      clears any pending doorbell IPI in pnv_smp_cpu_kill_self() (as we
      already do for IPIs sent via the XICS interrupt controller).  Otherwise
      if there did happen to be a leftover pending doorbell interrupt for
      an offline CPU thread for any reason, it would prevent that thread from
      going into a power-saving mode; it would instead keep waking up because
      of the interrupt.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      755563bc
  3. 03 11月, 2014 1 次提交
    • C
      powerpc: Replace __get_cpu_var uses · 69111bac
      Christoph Lameter 提交于
      This still has not been merged and now powerpc is the only arch that does
      not have this change. Sorry about missing linuxppc-dev before.
      
      V2->V2
        - Fix up to work against 3.18-rc1
      
      __get_cpu_var() is used for multiple purposes in the kernel source. One of
      them is address calculation via the form &__get_cpu_var(x).  This calculates
      the address for the instance of the percpu variable of the current processor
      based on an offset.
      
      Other use cases are for storing and retrieving data from the current
      processors percpu area.  __get_cpu_var() can be used as an lvalue when
      writing data or on the right side of an assignment.
      
      __get_cpu_var() is defined as :
      
      __get_cpu_var() always only does an address determination. However, store
      and retrieve operations could use a segment prefix (or global register on
      other platforms) to avoid the address calculation.
      
      this_cpu_write() and this_cpu_read() can directly take an offset into a
      percpu area and use optimized assembly code to read and write per cpu
      variables.
      
      This patch converts __get_cpu_var into either an explicit address
      calculation using this_cpu_ptr() or into a use of this_cpu operations that
      use the offset.  Thereby address calculations are avoided and less registers
      are used when code is generated.
      
      At the end of the patch set all uses of __get_cpu_var have been removed so
      the macro is removed too.
      
      The patch set includes passes over all arches as well. Once these operations
      are used throughout then specialized macros can be defined in non -x86
      arches as well in order to optimize per cpu access by f.e.  using a global
      register that may be set to the per cpu base.
      
      Transformations done to __get_cpu_var()
      
      1. Determine the address of the percpu instance of the current processor.
      
      	DEFINE_PER_CPU(int, y);
      	int *x = &__get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(&y);
      
      2. Same as #1 but this time an array structure is involved.
      
      	DEFINE_PER_CPU(int, y[20]);
      	int *x = __get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(y);
      
      3. Retrieve the content of the current processors instance of a per cpu
      variable.
      
      	DEFINE_PER_CPU(int, y);
      	int x = __get_cpu_var(y)
      
         Converts to
      
      	int x = __this_cpu_read(y);
      
      4. Retrieve the content of a percpu struct
      
      	DEFINE_PER_CPU(struct mystruct, y);
      	struct mystruct x = __get_cpu_var(y);
      
         Converts to
      
      	memcpy(&x, this_cpu_ptr(&y), sizeof(x));
      
      5. Assignment to a per cpu variable
      
      	DEFINE_PER_CPU(int, y)
      	__get_cpu_var(y) = x;
      
         Converts to
      
      	__this_cpu_write(y, x);
      
      6. Increment/Decrement etc of a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y)++
      
         Converts to
      
      	__this_cpu_inc(y)
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: Paul Mackerras <paulus@samba.org>
      Signed-off-by: NChristoph Lameter <cl@linux.com>
      [mpe: Fix build errors caused by set/or_softirq_pending(), and rework
            assignment in __set_breakpoint() to use memcpy().]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      69111bac
  4. 27 8月, 2014 2 次提交
    • T
      Revert "powerpc: Replace __get_cpu_var uses" · 23f66e2d
      Tejun Heo 提交于
      This reverts commit 5828f666 due to
      build failure after merging with pending powerpc changes.
      
      Link: http://lkml.kernel.org/g/20140827142243.6277eaff@canb.auug.org.auSigned-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      23f66e2d
    • C
      powerpc: Replace __get_cpu_var uses · 5828f666
      Christoph Lameter 提交于
      __get_cpu_var() is used for multiple purposes in the kernel source. One of
      them is address calculation via the form &__get_cpu_var(x).  This calculates
      the address for the instance of the percpu variable of the current processor
      based on an offset.
      
      Other use cases are for storing and retrieving data from the current
      processors percpu area.  __get_cpu_var() can be used as an lvalue when
      writing data or on the right side of an assignment.
      
      __get_cpu_var() is defined as :
      
      #define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
      
      __get_cpu_var() always only does an address determination. However, store
      and retrieve operations could use a segment prefix (or global register on
      other platforms) to avoid the address calculation.
      
      this_cpu_write() and this_cpu_read() can directly take an offset into a
      percpu area and use optimized assembly code to read and write per cpu
      variables.
      
      This patch converts __get_cpu_var into either an explicit address
      calculation using this_cpu_ptr() or into a use of this_cpu operations that
      use the offset.  Thereby address calculations are avoided and less registers
      are used when code is generated.
      
      At the end of the patch set all uses of __get_cpu_var have been removed so
      the macro is removed too.
      
      The patch set includes passes over all arches as well. Once these operations
      are used throughout then specialized macros can be defined in non -x86
      arches as well in order to optimize per cpu access by f.e.  using a global
      register that may be set to the per cpu base.
      
      Transformations done to __get_cpu_var()
      
      1. Determine the address of the percpu instance of the current processor.
      
      	DEFINE_PER_CPU(int, y);
      	int *x = &__get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(&y);
      
      2. Same as #1 but this time an array structure is involved.
      
      	DEFINE_PER_CPU(int, y[20]);
      	int *x = __get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(y);
      
      3. Retrieve the content of the current processors instance of a per cpu
      variable.
      
      	DEFINE_PER_CPU(int, y);
      	int x = __get_cpu_var(y)
      
         Converts to
      
      	int x = __this_cpu_read(y);
      
      4. Retrieve the content of a percpu struct
      
      	DEFINE_PER_CPU(struct mystruct, y);
      	struct mystruct x = __get_cpu_var(y);
      
         Converts to
      
      	memcpy(&x, this_cpu_ptr(&y), sizeof(x));
      
      5. Assignment to a per cpu variable
      
      	DEFINE_PER_CPU(int, y)
      	__get_cpu_var(y) = x;
      
         Converts to
      
      	__this_cpu_write(y, x);
      
      6. Increment/Decrement etc of a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y)++
      
         Converts to
      
      	__this_cpu_inc(y)
      
      tj: Folded a fix patch.
          http://lkml.kernel.org/g/alpine.DEB.2.11.1408172143020.9652@gentwo.org
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: Paul Mackerras <paulus@samba.org>
      Signed-off-by: NChristoph Lameter <cl@linux.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      5828f666
  5. 18 4月, 2013 1 次提交
    • I
      powerpc: Add accounting for Doorbell interrupts · a6a058e5
      Ian Munsie 提交于
      This patch adds a new line to /proc/interrupts to account for the
      doorbell interrupts that each hardware thread has received. The total
      interrupt count in /proc/stat will now also include doorbells.
      
       # cat /proc/interrupts
                 CPU0       CPU1       CPU2       CPU3
       16:        551       1267        281        175      XICS Level     IPI
      LOC:       2037       1503       1688       1625   Local timer interrupts
      SPU:          0          0          0          0   Spurious interrupts
      CNT:          0          0          0          0   Performance monitoring interrupts
      MCE:          0          0          0          0   Machine check exceptions
      DBL:         42        550         20         91   Doorbell interrupts
      Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
      a6a058e5
  6. 10 1月, 2013 1 次提交
  7. 05 9月, 2012 1 次提交
    • P
      powerpc: Make sure IPI handlers see data written by IPI senders · 9fb1b36c
      Paul Mackerras 提交于
      We have been observing hangs, both of KVM guest vcpu tasks and more
      generally, where a process that is woken doesn't properly wake up and
      continue to run, but instead sticks in TASK_WAKING state.  This
      happens because the update of rq->wake_list in ttwu_queue_remote()
      is not ordered with the update of ipi_message in
      smp_muxed_ipi_message_pass(), and the reading of rq->wake_list in
      scheduler_ipi() is not ordered with the reading of ipi_message in
      smp_ipi_demux().  Thus it is possible for the IPI receiver not to see
      the updated rq->wake_list and therefore conclude that there is nothing
      for it to do.
      
      In order to make sure that anything done before smp_send_reschedule()
      is ordered before anything done in the resulting call to scheduler_ipi(),
      this adds barriers in smp_muxed_message_pass() and smp_ipi_demux().
      The barrier in smp_muxed_message_pass() is a full barrier to ensure that
      there is a full ordering between the smp_send_reschedule() caller and
      scheduler_ipi().  In smp_ipi_demux(), we use xchg() rather than
      xchg_local() because xchg() includes release and acquire barriers.
      Using xchg() rather than xchg_local() makes sense given that
      ipi_message is not just accessed locally.
      
      This moves the barrier between setting the message and calling the
      cause_ipi() function into the individual cause_ipi implementations.
      Most of them -- those that used outb, out_8 or similar -- already had
      a full barrier because out_8 etc. include a sync before the MMIO
      store.  This adds an explicit barrier in the two remaining cases.
      
      These changes made no measurable difference to the speed of IPIs as
      measured using a simple ping-pong latency test across two CPUs on
      different cores of a POWER7 machine.
      
      The analysis of the reason why processes were not waking up properly
      is due to Milton Miller.
      
      Cc: stable@vger.kernel.org # v3.0+
      Reported-by: NMilton Miller <miltonm@bga.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      9fb1b36c
  8. 09 3月, 2012 1 次提交
    • B
      powerpc: Rework lazy-interrupt handling · 7230c564
      Benjamin Herrenschmidt 提交于
      The current implementation of lazy interrupts handling has some
      issues that this tries to address.
      
      We don't do the various workarounds we need to do when re-enabling
      interrupts in some cases such as when returning from an interrupt
      and thus we may still lose or get delayed decrementer or doorbell
      interrupts.
      
      The current scheme also makes it much harder to handle the external
      "edge" interrupts provided by some BookE processors when using the
      EPR facility (External Proxy) and the Freescale Hypervisor.
      
      Additionally, we tend to keep interrupts hard disabled in a number
      of cases, such as decrementer interrupts, external interrupts, or
      when a masked decrementer interrupt is pending. This is sub-optimal.
      
      This is an attempt at fixing it all in one go by reworking the way
      we do the lazy interrupt disabling from the ground up.
      
      The base idea is to replace the "hard_enabled" field with a
      "irq_happened" field in which we store a bit mask of what interrupt
      occurred while soft-disabled.
      
      When re-enabling, either via arch_local_irq_restore() or when returning
      from an interrupt, we can now decide what to do by testing bits in that
      field.
      
      We then implement replaying of the missed interrupts either by
      re-using the existing exception frame (in exception exit case) or via
      the creation of a new one from an assembly trampoline (in the
      arch_local_irq_enable case).
      
      This removes the need to play with the decrementer to try to create
      fake interrupts, among others.
      
      In addition, this adds a few refinements:
      
       - We no longer  hard disable decrementer interrupts that occur
      while soft-disabled. We now simply bump the decrementer back to max
      (on BookS) or leave it stopped (on BookE) and continue with hard interrupts
      enabled, which means that we'll potentially get better sample quality from
      performance monitor interrupts.
      
       - Timer, decrementer and doorbell interrupts now hard-enable
      shortly after removing the source of the interrupt, which means
      they no longer run entirely hard disabled. Again, this will improve
      perf sample quality.
      
       - On Book3E 64-bit, we now make the performance monitor interrupt
      act as an NMI like Book3S (the necessary C code for that to work
      appear to already be present in the FSL perf code, notably calling
      nmi_enter instead of irq_enter). (This also fixes a bug where BookE
      perfmon interrupts could clobber r14 ... oops)
      
       - We could make "masked" decrementer interrupts act as NMIs when doing
      timer-based perf sampling to improve the sample quality.
      
      Signed-off-by-yet: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      ---
      
      v2:
      
      - Add hard-enable to decrementer, timer and doorbells
      - Fix CR clobber in masked irq handling on BookE
      - Make embedded perf interrupt act as an NMI
      - Add a PACA_HAPPENED_EE_EDGE for use by FSL if they want
        to retrigger an interrupt without preventing hard-enable
      
      v3:
      
       - Fix or vs. ori bug on Book3E
       - Fix enabling of interrupts for some exceptions on Book3E
      
      v4:
      
       - Fix resend of doorbells on return from interrupt on Book3E
      
      v5:
      
       - Rebased on top of my latest series, which involves some significant
      rework of some aspects of the patch.
      
      v6:
       - 32-bit compile fix
       - more compile fixes with various .config combos
       - factor out the asm code to soft-disable interrupts
       - remove the C wrapper around preempt_schedule_irq
      
      v7:
       - Fix a bug with hard irq state tracking on native power7
      7230c564
  9. 19 5月, 2011 2 次提交
    • M
      powerpc: Consolidate ipi message mux and demux · 23d72bfd
      Milton Miller 提交于
      Consolidate the mux and demux of ipi messages into smp.c and call
      a new smp_ops callback to actually trigger the ipi.
      
      The powerpc architecture code is optimised for having 4 distinct
      ipi triggers, which are mapped to 4 distinct messages (ipi many, ipi
      single, scheduler ipi, and enter debugger).  However, several interrupt
      controllers only provide a single software triggered interrupt that
      can be delivered to each cpu.  To resolve this limitation, each smp_ops
      implementation created a per-cpu variable that is manipulated with atomic
      bitops.  Since these lines will be contended they are optimialy marked as
      shared_aligned and take a full cache line for each cpu.  Distro kernels
      may have 2 or 3 of these in their config, each taking per-cpu space
      even though at most one will be in use.
      
      This consolidation removes smp_message_recv and replaces the single call
      actions cases with direct calls from the common message recognition loop.
      The complicated debugger ipi case with its muxed crash handling code is
      moved to debug_ipi_action which is now called from the demux code (instead
      of the multi-message action calling smp_message_recv).
      
      I put a call to reschedule_action to increase the likelyhood of correctly
      merging the anticipated scheduler_ipi() hook coming from the scheduler
      tree; that single required call can be inlined later.
      
      The actual message decode is a copy of the old pseries xics code with its
      memory barriers and cache line spacing, augmented with a per-cpu unsigned
      long based on the book-e doorbell code.  The optional data is set via a
      callback from the implementation and is passed to the new cause-ipi hook
      along with the logical cpu number.  While currently only the doorbell
      implemntation uses this data it should be almost zero cost to retrieve and
      pass it -- it adds a single register load for the argument from the same
      cache line to which we just completed a store and the register is dead
      on return from the call.  I extended the data element from unsigned int
      to unsigned long in case some other code wanted to associate a pointer.
      
      The doorbell check_self is replaced by a call to smp_muxed_ipi_resend,
      conditioned on the CPU_DBELL feature.  The ifdef guard could be relaxed
      to CONFIG_SMP but I left it with BOOKE for now.
      
      Also, the doorbell interrupt vector for book-e was not calling irq_enter
      and irq_exit, which throws off cpu accounting and causes code to not
      realize it is running in interrupt context.  Add the missing calls.
      Signed-off-by: NMilton Miller <miltonm@bga.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      23d72bfd
    • M
      powerpc: Remove checks for MSG_ALL and MSG_ALL_BUT_SELF · f1072939
      Milton Miller 提交于
      Now that smp_ops->smp_message_pass is always called with an (online) cpu
      number for the target remove the checks for MSG_ALL and MSG_ALL_BUT_SELF.
      Signed-off-by: NMilton Miller <miltonm@bga.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      f1072939
  10. 09 7月, 2010 4 次提交
  11. 23 2月, 2009 1 次提交