1. 22 May 2012 (1 commit)
  2. 12 May 2012 (1 commit)
    • powerpc/irq: Fix another case of lazy IRQ state getting out of sync · 7c0482e3
      Committed by Benjamin Herrenschmidt
      So we have another case of paca->irq_happened getting out of
      sync with the HW irq state. This can happen when a perfmon
      interrupt occurs while soft disabled, as it will return to a
      soft disabled but hard enabled context while leaving a stale
      PACA_IRQ_HARD_DIS flag set.
      
      This patch fixes it, and also adds a test for the condition
      of those flags being out of sync in arch_local_irq_restore()
      when CONFIG_TRACE_IRQFLAGS is enabled.
      
      This helps catch those gremlins faster (and so far I
      can't seem to see any anymore, so that's good news).
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
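      As a hedged illustration of the kind of consistency check this commit describes (PACA_IRQ_HARD_DIS comes from the commit text; the exact test in the real patch may differ), the idea is to warn whenever the lazy bookkeeping claims hard interrupts are disabled while MSR[EE] says otherwise:

          #ifdef CONFIG_TRACE_IRQFLAGS
                  /*
                   * Sketch: PACA_IRQ_HARD_DIS set while MSR[EE] is enabled means
                   * the software state went stale; warn and re-disable to resync.
                   */
                  if (WARN_ON((local_paca->irq_happened & PACA_IRQ_HARD_DIS) &&
                              (mfmsr() & MSR_EE)))
                          __hard_irq_disable();
          #endif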
  3. 09 May 2012 (1 commit)
  4. 30 April 2012 (1 commit)
    • powerpc/irqdomain: Fix broken NR_IRQ references · 4013369f
      Committed by Grant Likely
      The switch from using irq_map to irq_alloc_desc*() for managing irq
      number allocations introduced new bugs in some of the powerpc
      interrupt code.  Several functions rely on the value of NR_IRQS to
      determine the maximum irq number that could get allocated.  However,
      with sparse_irq and using irq_alloc_desc*() the maximum possible irq
      number is now specified with 'nr_irqs' which may be a number larger
      than NR_IRQS.  This has caused breakage on powermac when
      CONFIG_NR_IRQS is set to 32.
      
      This patch removes most of the direct references to NR_IRQS in the
      powerpc code and replaces them with either a nr_irqs reference or by
      using the common for_each_irq_desc() macro.  The powerpc-specific
      for_each_irq() macro is removed at the same time.
      
      Also, the Cell axon_msi driver is refactored to remove the global
      build assumption on the size of NR_IRQS and instead add a limit to the
      maximum irq number when calling irq_domain_add_nomap().
      Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
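      A hedged sketch of the substitution this commit describes (not a hunk from the patch; do_something() and the loop variable are illustrative):

          unsigned int i;
          struct irq_desc *desc;

          /* Before: bounded by the compile-time constant */
          for (i = 0; i < NR_IRQS; i++)
                  if (irq_to_desc(i))
                          do_something(i);

          /* After: bounded by the runtime nr_irqs, or iterate the descriptors */
          for (i = 0; i < nr_irqs; i++)
                  if (irq_to_desc(i))
                          do_something(i);

          for_each_irq_desc(i, desc)
                  do_something(i);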
  5. 11 April 2012 (1 commit)
  6. 29 March 2012 (1 commit)
  7. 28 March 2012 (2 commits)
  8. 21 March 2012 (1 commit)
  9. 09 March 2012 (1 commit)
    • powerpc: Rework lazy-interrupt handling · 7230c564
      Committed by Benjamin Herrenschmidt
      The current implementation of lazy interrupt handling has some
      issues that this patch tries to address.
      
      We don't do the various workarounds we need to do when re-enabling
      interrupts in some cases, such as when returning from an interrupt,
      and thus we may still lose decrementer or doorbell interrupts or
      get them delayed.
      
      The current scheme also makes it much harder to handle the external
      "edge" interrupts provided by some BookE processors when using the
      EPR facility (External Proxy) and the Freescale Hypervisor.
      
      Additionally, we tend to keep interrupts hard disabled in a number
      of cases, such as decrementer interrupts, external interrupts, or
      when a masked decrementer interrupt is pending. This is sub-optimal.
      
      This is an attempt at fixing it all in one go by reworking the way
      we do the lazy interrupt disabling from the ground up.
      
      The base idea is to replace the "hard_enabled" field with a
      "irq_happened" field in which we store a bit mask of what interrupt
      occurred while soft-disabled.
      
      When re-enabling, either via arch_local_irq_restore() or when returning
      from an interrupt, we can now decide what to do by testing bits in that
      field.
      
      We then implement replaying of the missed interrupts either by
      re-using the existing exception frame (in exception exit case) or via
      the creation of a new one from an assembly trampoline (in the
      arch_local_irq_enable case).
      
      This removes the need to play with the decrementer to try to create
      fake interrupts, among others.
      
      In addition, this adds a few refinements:
      
       - We no longer  hard disable decrementer interrupts that occur
      while soft-disabled. We now simply bump the decrementer back to max
      (on BookS) or leave it stopped (on BookE) and continue with hard interrupts
      enabled, which means that we'll potentially get better sample quality from
      performance monitor interrupts.
      
       - Timer, decrementer and doorbell interrupts now hard-enable
      shortly after removing the source of the interrupt, which means
      they no longer run entirely hard disabled. Again, this will improve
      perf sample quality.
      
       - On Book3E 64-bit, we now make the performance monitor interrupt
      act as an NMI like Book3S (the necessary C code for that to work
      appears to already be present in the FSL perf code, notably calling
      nmi_enter instead of irq_enter). (This also fixes a bug where BookE
      perfmon interrupts could clobber r14 ... oops)
      
       - We could make "masked" decrementer interrupts act as NMIs when doing
      timer-based perf sampling to improve the sample quality.
      
      Signed-off-by-yet: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      ---
      
      v2:
      
      - Add hard-enable to decrementer, timer and doorbells
      - Fix CR clobber in masked irq handling on BookE
      - Make embedded perf interrupt act as an NMI
      - Add a PACA_HAPPENED_EE_EDGE for use by FSL if they want
        to retrigger an interrupt without preventing hard-enable
      
      v3:
      
       - Fix or vs. ori bug on Book3E
       - Fix enabling of interrupts for some exceptions on Book3E
      
      v4:
      
       - Fix resend of doorbells on return from interrupt on Book3E
      
      v5:
      
       - Rebased on top of my latest series, which involves some significant
      rework of some aspects of the patch.
      
      v6:
       - 32-bit compile fix
       - more compile fixes with various .config combos
       - factor out the asm code to soft-disable interrupts
       - remove the C wrapper around preempt_schedule_irq
      
      v7:
       - Fix a bug with hard irq state tracking on native power7
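      A hedged sketch of the replay decision this commit describes (PACA_IRQ_HARD_DIS comes from the commit text; the other flag names and the replay helpers are illustrative, and the real code replays via the existing exception frame or an assembly trampoline rather than plain calls):

          /* While soft-disabled, an interrupt handler just records itself
           * (illustrative flag name) after masking or acking its source: */
          local_paca->irq_happened |= PACA_IRQ_DEC;

          /* On re-enable, look at the mask to decide what to replay: */
          unsigned char happened = local_paca->irq_happened;

          if (happened & PACA_IRQ_HARD_DIS)
                  __hard_irq_enable();          /* drop the stale hard-disable */
          if (happened & PACA_IRQ_DEC)
                  replay_decrementer();         /* illustrative helper */
          if (happened & PACA_IRQ_DBELL)
                  replay_doorbell();            /* illustrative flag and helper */
          local_paca->irq_happened = 0;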
  10. 07 March 2012 (1 commit)
    • powerpc: Make SPARSE_IRQ required · ad5b7f13
      Committed by Grant Likely
      All IRQs on powerpc are managed via irq_domain anyway, there isn't really
      any advantage to turning SPARSE_IRQ off, and it's the direction we want
      to take the kernel design anyway.  This patch makes powerpc always use
      SPARSE_IRQ.
      
      On pseries_defconfig, SPARSE_IRQ adds only about 0x300 bytes to the
      .text sections, and removes about 0x20000 from the data section for the
      static irq_desc table.
      Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
      Cc: Rob Herring <rob.herring@calxeda.com>
      Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  11. 16 February 2012 (2 commits)
  12. 15 February 2012 (2 commits)
    • irq_domain/powerpc: eliminate irq_map; use irq_alloc_desc() instead · 4bbdd45a
      Committed by Grant Likely
      This patch drops the powerpc-specific irq_map table and replaces it with
      directly using the irq_alloc_desc()/irq_free_desc() interfaces for allocating
      and freeing irq_desc structures.
      
      This patch is a preparation step for generalizing the powerpc-specific virq
      infrastructure to become irq_domains.
      
      As part of this change, the irq_big_lock is changed to a mutex from a raw
      spinlock.  There is no longer any need to use a spin lock since the irq_desc
      allocation code is now responsible for the critical section of finding
      an unused range of irq numbers.
      
      The radix lookup table is also changed to store the irq_data pointer instead
      of the irq_map entry since the irq_map is removed.  This should end up being
      functionally equivalent since only allocated irq_descs are ever added to the
      radix tree.
      
      v5: - Really don't ever allocate virq 0.  The previous version could still
            do it if hint == 0
          - Respect irq_virq_count setting for NOMAP.  Some NOMAP domains cannot
            use virq values above irq_virq_count.
          - Use numa_node_id() when allocating irq_descs.  Ideally the API should
            obtain that value from the caller, but that touches a lot of call sites
            so will be deferred to a follow-on patch.
          - Fix irq_find_mapping() to include irq numbers lower than
            NUM_ISA_INTERRUPTS.  With the switch to irq_alloc_desc*(), the lowest
            possible allocated irq is now returned by arch_probe_nr_irqs().
      v4: - Fix incorrect access to irq_data structure in debugfs code
          - Don't ever allocate virq 0
      Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Milton Miller <miltonm@bga.com>
      Tested-by: Olof Johansson <olof@lixom.net>
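      A hedged sketch of the new allocation pattern (alloc_virq() and free_virq() are illustrative names; the real code still takes a mutex, honours allocation hints and handles the legacy and NOMAP cases):

          /* Ask genirq for a free descriptor instead of scanning irq_map[];
           * start from 1 so that virq 0 is never handed out. */
          static unsigned int alloc_virq(int node)
          {
                  int virq = irq_alloc_desc_from(1, node);

                  return virq > 0 ? virq : 0;
          }

          static void free_virq(unsigned int virq)
          {
                  irq_free_desc(virq);   /* replaces clearing an irq_map[] slot */
          }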
    • irq_domain/powerpc: Use common irq_domain structure instead of irq_host · bae1d8f1
      Committed by Grant Likely
      This patch drops the powerpc-specific irq_host structures and uses the common
      irq_domain structures defined in linux/irqdomain.h.  It also fixes all
      the users to use the new structure names.
      
      Renaming irq_host to irq_domain has been discussed for a long time, and this
      patch is a step in the process of generalizing the powerpc virq code to be
      usable by all architectures.
      
      An astute reader will notice that this patch actually removes the irq_host
      structure instead of renaming it.  This is because the irq_domain structure
      already exists in include/linux/irqdomain.h and has the needed data members.
      Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Milton Miller <miltonm@bga.com>
      Tested-by: Olof Johansson <olof@lixom.net>
  13. 14 February 2012 (1 commit)
  14. 08 December 2011 (1 commit)
  15. 25 November 2011 (2 commits)
    • powerpc/time: Optimise decrementer_check_overflow · 7df10275
      Committed by Anton Blanchard
      decrementer_check_overflow is called from arch_local_irq_restore so
      we want to make it as lightweight as possible. As such, turn
      decrementer_check_overflow into an inline function.
      
      To avoid a circular mess of includes, separate out the two components
      of struct decrementer_clock and keep the struct clock_event_device
      part local to time.c.
      
      The fast path improves from:
      
      arch_local_irq_restore
           0:       mflr    r0
           4:       std     r0,16(r1)
           8:       stdu    r1,-112(r1)
           c:       stb     r3,578(r13)
          10:       cmpdi   cr7,r3,0
          14:       beq-    cr7,24 <.arch_local_irq_restore+0x24>
      ...
          24:       addi    r1,r1,112
          28:       ld      r0,16(r1)
          2c:       mtlr    r0
          30:       blr
      
      to:
      
      arch_local_irq_restore
          0:       std     r30,-16(r1)
          4:       ld      r30,0(r2)
          8:       stb     r3,578(r13)
          c:       cmpdi   cr7,r3,0
         10:       beq-    cr7,6c <.arch_local_irq_restore+0x6c>
      ...
         6c:       ld      r30,-16(r1)
         70:       blr
      
      Unfortunately we still setup a local TOC (due to -mminimal-toc). Yet
      another sign we should be moving to -mcmodel=medium.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/time: Handle wrapping of decrementer · 37fb9a02
      Committed by Anton Blanchard
      When re-enabling interrupts we have code to handle edge sensitive
      decrementers by resetting the decrementer to 1 whenever it is negative.
      If interrupts were disabled long enough that the decrementer wrapped to
      positive we do nothing. This means interrupts can be delayed for a long
      time until it finally goes negative again.
      
      While we hope interrupts are never disabled long enough for the
      decrementer to go positive, we have a very good test team that can
      drive any kernel into the ground. The softlockup data we get back
      from these failures could be seconds in the future, completely missing
      the cause of the lockup.
      
      We already keep track of the timebase of the next event so use that
      to work out if we should trigger a decrementer exception.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Cc: stable@kernel.org
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
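      A hedged sketch of the check this commit describes (the per-CPU variable name is illustrative):

          static void decrementer_check_overflow(void)
          {
                  u64 now  = get_tb();
                  u64 next = __this_cpu_read(next_decrementer_tb);  /* illustrative */

                  /* If the next event is already due, the decrementer may have
                   * wrapped back to a positive value while interrupts were
                   * disabled; force an immediate decrementer exception. */
                  if (now >= next)
                          set_dec(1);
          }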
  16. 01 November 2011 (1 commit)
  17. 22 July 2011 (1 commit)
  18. 19 July 2011 (1 commit)
  19. 29 June 2011 (1 commit)
  20. 23 June 2011 (1 commit)
  21. 26 May 2011 (5 commits)
    • powerpc: Fix irq_free_virt by adjusting bounds before loop · 4dd60290
      Committed by Milton Miller
      Instead of looping over each irq and checking against the irq array
      bounds, adjust the bounds before looping.
      
      The old code will not free any irq if the irq + count is above
      irq_virq_count because the test in the loop is testing irq + count
      instead of irq + i.
      
      This code checks the limits to avoid unsigned integer overflows.
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
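      A hedged sketch of the shape of the fix (free_one_virq() is an illustrative helper; the real function also skips the ISA range):

          static void irq_free_virt(unsigned int virq, unsigned int count)
          {
                  unsigned int i;

                  /* Clamp [virq, virq + count) to the managed range once, up
                   * front, so the loop cannot run past irq_virq_count and the
                   * unsigned arithmetic cannot wrap. */
                  if (virq >= irq_virq_count)
                          return;
                  if (count > irq_virq_count - virq)
                          count = irq_virq_count - virq;

                  for (i = 0; i < count; i++)
                          free_one_virq(virq + i);
          }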
    • powerpc/irq: Protect irq_radix_revmap_lookup against irq_free_virt · 9b788251
      Committed by Milton Miller
      The radix-tree code uses call_rcu when freeing internal elements.
      We must protect against the elements being freed while we traverse
      the tree, even if the returned pointer will still be valid.
      
      While preparing a patch to expand the context in which
      irq_radix_revmap_lookup will be called, I realized that the
      radix tree was not locked.
      
      When asked
      
          For a normal call_rcu usage, is it allowed to read the structure in
          irq_enter / irq_exit, without additional rcu_read_lock?  Could an
          element freed with call_rcu advance with the cpu still between
          irq_enter/irq_exit (and irq_disabled())?
      
      Paul McKenney replied:
      
          Absolutely illegal to do so. OK for call_rcu_sched(), but a
          flaming bug for call_rcu().
      
          And thank you very much for finding this!!!
      
      Further analysis:
      
      In the current CONFIG_TREE_RCU implementation, CONFIG_TREE_PREEMPT_RCU
      (and CONFIG_TINY_PREEMPT_RCU) use explicit counters.
      
      These counters are reflected from per-CPU to global in the
      scheduling-clock-interrupt handler, so disabling irq does prevent the
      grace period from completing. But there are real-time implementations
      (such as the one used by the Concurrent guys) where disabling irq
      does -not- prevent the grace period from completing.
      
      While an alternative fix would be to switch radix-tree to rcu_sched, I
      don't want to audit the other users of radix trees (nor put alternative
      freeing in the library).  The normal overhead for rcu_read_lock and
      unlock are a local counter increment and decrement.
      
      This does not show up in the RCU lockdep checks because 2.6.34 commit
      2676a58c (radix-tree: Disable RCU lockdep checking in radix tree)
      deemed it too hard to pass the condition of the protecting lock
      to the library.
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
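      A hedged sketch of the locking added (the field names follow the old irq_host layout and may not match exactly; how the stored pointer is turned back into a virq is omitted):

          void *ptr;

          /* The radix tree frees internal nodes with call_rcu(), so the
           * traversal must sit inside an RCU read-side critical section
           * even when we are between irq_enter() and irq_exit(). */
          rcu_read_lock();
          ptr = radix_tree_lookup(&host->revmap_data.tree, hwirq);
          rcu_read_unlock();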
    • powerpc/irq: Check desc in handle_one_irq and expand generic_handle_irq · 2e455257
      Committed by Milton Miller
      Look up the descriptor and check that it is found in handle_one_irq
      before checking if we are on the irq stack, and call the handler
      directly using the descriptor if we are on the stack.
      
      We need to check that irq_to_desc finds the descriptor to avoid a NULL
      pointer dereference.  It could have failed because the number from
      ppc_md.get_irq was above NR_IRQS, or because of various exceptional
      conditions with sparse irqs (e.g. race conditions while freeing an irq
      if it was not shut down in the controller).
      
      fe12bc2c (genirq: Uninline and sanity check generic_handle_irq())
      moved generic_handle_irq out of line to allow its use by interrupt
      controllers in modules.  However, handle_one_irq is core arch code.
      It already knows the details of struct irq_desc and handling irqs in
      the nested irq case.  This will avoid the extra stack frame to return
      the value we don't check.
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
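      A hedged sketch of the resulting flow (the stack check and the stack-switching call are illustrative names):

          static inline void handle_one_irq(unsigned int irq)
          {
                  struct irq_desc *desc = irq_to_desc(irq);

                  /* The number from ppc_md.get_irq may have no descriptor
                   * (out of range, or racing with a free): bail out quietly. */
                  if (unlikely(!desc))
                          return;

                  if (already_on_irq_stack())              /* illustrative */
                          desc->handle_irq(irq, desc);     /* no generic_handle_irq() */
                  else
                          call_handler_on_irq_stack(irq, desc);   /* illustrative */
          }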
    • powerpc/irq: Always free duplicate IRQ_LEGACY hosts · 3d1b5e20
      Committed by Milton Miller
      Since kmem caches are allocated before init_IRQ as noted in 3af259d1
      (powerpc: Radix trees are available before init_IRQ), we now call
      kmalloc in all cases and can always call kfree if we are asked
      to allocate a duplicate or conflicting IRQ_HOST_MAP_LEGACY host.
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/irq: Remove stale and misleading comment · 8142f032
      Committed by Milton Miller
      The comment claims we will call host->ops->map() to update the flags if
      we find a previously established mapping, but we never did.  We used
      to call remap, but that call was removed in da051980 (powerpc: Remove
      irq_host_ops->remap hook).
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  22. 19 May 2011 (7 commits)
    • powerpc: Make IRQ_NOREQUEST last to clear, first to set · 41fb5e62
      Committed by Milton Miller
      When creating an irq, don't allow a concurrent driver request until
      we have called map, which will likely call set_chip_and_handler to
      change the irq_chip and its operations.
      
      Similarly, when tearing down an IRQ, make sure no new uses come
      along while we change the irq back to the nop chip and then reset
      the descriptor to freed status.
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
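      A hedged sketch of the ordering (simplified; error handling and the legacy path are omitted):

          /* Creation: keep request_irq() away until ->map() has installed
           * the real chip and flow handler; clear the flag as the last step. */
          irq_set_status_flags(virq, IRQ_NOREQUEST);
          host->ops->map(host, virq, hwirq);
          irq_clear_status_flags(virq, IRQ_NOREQUEST);

          /* Teardown: set the flag first so no new request can race with
           * switching back to the nop chip and freeing the descriptor. */
          irq_set_status_flags(virq, IRQ_NOREQUEST);
          irq_set_chip_and_handler(virq, &no_irq_chip, handle_bad_irq);
          irq_free_desc(virq);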
    • powerpc: Remove virq_to_host · 1e8c2301
      Committed by Milton Miller
      The only references to the irq_map[].host field are internal to
      arch/powerpc/kernel/irq.c
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Acked-by: Grant Likely <grant.likely@secretlab.ca>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Add virq_is_host to reduce virq_to_host usage · 3ee62d36
      Committed by Milton Miller
      Some irq_host implementations are using virq_to_host to check if
      they are the irq_host for a virtual irq.  To allow us to make space
      versus time tradeoffs, replace this usage with an assertive
      virq_is_host that confirms or denies the irq is associated with the
      given irq_host.
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Acked-by: Grant Likely <grant.likely@secretlab.ca>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
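      One plausible shape of the helper, assuming the then-current irq_map[] table (a hedged sketch, not necessarily the committed implementation):

          /* Confirm or deny that this virq belongs to the given host,
           * without exposing the host pointer to the caller. */
          bool virq_is_host(unsigned int virq, struct irq_host *host)
          {
                  return irq_map[virq].host == host;
          }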
    • powerpc: Remove irq_host_ops->remap hook · da051980
      Committed by Milton Miller
      It was called from irq_create_mapping if that was called for a host
      and hwirq that was previously mapped, "to update the flags".  But the
      only implementation was in beat_interrupt and all it did was repeat a
      hypervisor call without error checking that was performed with error
      checking at the beginning of the map hook.  In addition, the comment on
      the beat remap hook says it will only be called once for a given mapping,
      which would apply to map, not remap.
      
      All flags should be known by the time the match hook is called, before
      we call the map hook.  Removing this mostly unused hook will simplify
      the requirements of the irq_domain concept.
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Return early if irq_host lookup type is wrong · 2d441681
      Committed by Milton Miller
      If for some reason the code incorrectly calls the wrong function to
      manage the revmap, not only should we warn, we should take action.
      However, in the paths we expect to be taken on every delivered
      interrupt, change to WARN_ON_ONCE.  Use the if (WARN_ON(x)) format to get the
      unlikely for free.
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Reviewed-by: Grant Likely <grant.likely@secretlab.ca>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
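      A hedged sketch of the two variants described (the revmap type constants follow the old powerpc names):

          /* Hot path, hit on every delivered interrupt: warn only once. */
          if (WARN_ON_ONCE(host->revmap_type != IRQ_HOST_MAP_TREE))
                  return NO_IRQ;

          /* In a different, void-returning setup path the plain form is fine;
           * WARN_ON(x) wraps its condition in unlikely(), so the early return
           * costs nothing in the common case. */
          if (WARN_ON(host->revmap_type != IRQ_HOST_MAP_LINEAR))
                  return;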
    • powerpc: Radix trees are available before init_IRQ · 3af259d1
      Committed by Milton Miller
      Since the generic irq code uses a radix tree for sparse interrupts,
      the initcall ordering has been changed to initialize radix trees before
      irqs.   We no longer need to defer creating revmap radix trees to the
      arch_initcall irq_late_init.
      
      Also, the kmem caches are allocated so we don't need to use
      zalloc_maybe_bootmem.
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Reviewed-by: Grant Likely <grant.likely@secretlab.ca>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Consolidate ipi message mux and demux · 23d72bfd
      Committed by Milton Miller
      Consolidate the mux and demux of ipi messages into smp.c and call
      a new smp_ops callback to actually trigger the ipi.
      
      The powerpc architecture code is optimised for having 4 distinct
      ipi triggers, which are mapped to 4 distinct messages (ipi many, ipi
      single, scheduler ipi, and enter debugger).  However, several interrupt
      controllers only provide a single software triggered interrupt that
      can be delivered to each cpu.  To resolve this limitation, each smp_ops
      implementation created a per-cpu variable that is manipulated with atomic
      bitops.  Since these lines will be contended they are optimally marked as
      shared_aligned and take a full cache line for each cpu.  Distro kernels
      may have 2 or 3 of these in their config, each taking per-cpu space
      even though at most one will be in use.
      
      This consolidation removes smp_message_recv and replaces the single call
      actions cases with direct calls from the common message recognition loop.
      The complicated debugger ipi case with its muxed crash handling code is
      moved to debug_ipi_action which is now called from the demux code (instead
      of the multi-message action calling smp_message_recv).
      
      I put a call to reschedule_action to increase the likelihood of correctly
      merging the anticipated scheduler_ipi() hook coming from the scheduler
      tree; that single required call can be inlined later.
      
      The actual message decode is a copy of the old pseries xics code with its
      memory barriers and cache line spacing, augmented with a per-cpu unsigned
      long based on the book-e doorbell code.  The optional data is set via a
      callback from the implementation and is passed to the new cause-ipi hook
      along with the logical cpu number.  While currently only the doorbell
      implementation uses this data it should be almost zero cost to retrieve and
      pass it -- it adds a single register load for the argument from the same
      cache line to which we just completed a store and the register is dead
      on return from the call.  I extended the data element from unsigned int
      to unsigned long in case some other code wanted to associate a pointer.
      
      The doorbell check_self is replaced by a call to smp_muxed_ipi_resend,
      conditioned on the CPU_DBELL feature.  The ifdef guard could be relaxed
      to CONFIG_SMP but I left it with BOOKE for now.
      
      Also, the doorbell interrupt vector for book-e was not calling irq_enter
      and irq_exit, which throws off cpu accounting and causes code to not
      realize it is running in interrupt context.  Add the missing calls.
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
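      A hedged sketch of the mux/demux this commit describes (the function names are illustrative; the message constants, cause_ipi hook and action handlers are the ones named in or implied by the commit text):

          /* One word per cpu, one bit per message type, cacheline aligned. */
          static DEFINE_PER_CPU_SHARED_ALIGNED(unsigned long, ipi_messages);

          static void muxed_ipi_send(int cpu, int msg)          /* illustrative */
          {
                  set_bit(msg, &per_cpu(ipi_messages, cpu));
                  smp_ops->cause_ipi(cpu, 0);   /* the single HW trigger per cpu */
          }

          static void muxed_ipi_demux(void)                     /* illustrative */
          {
                  unsigned long *mbox = this_cpu_ptr(&ipi_messages);
                  unsigned long pending;

                  /* Drain everything that arrived, including messages that
                   * land while we are still processing the previous batch. */
                  while ((pending = xchg(mbox, 0)) != 0) {
                          if (test_bit(PPC_MSG_RESCHEDULE, &pending))
                                  reschedule_action(0, NULL);
                          if (test_bit(PPC_MSG_DEBUGGER_BREAK, &pending))
                                  debug_ipi_action(0, NULL);
                          /* ... other message types handled the same way ... */
                  }
          }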
  23. 04 May 2011 (1 commit)
  24. 27 April 2011 (2 commits)
  25. 01 April 2011 (1 commit)