1. 12 5月, 2012 1 次提交
    • B
      powerpc/irq: Fix another case of lazy IRQ state getting out of sync · 7c0482e3
      Benjamin Herrenschmidt 提交于
      So we have another case of paca->irq_happened getting out of
      sync with the HW irq state. This can happen when a perfmon
      interrupt occurs while soft disabled, as it will return to a
      soft disabled but hard enabled context while leaving a stale
      PACA_IRQ_HARD_DIS flag set.
      
      This patch fixes it, and also adds a test for the condition
      of those flags being out of sync in arch_local_irq_restore()
      when CONFIG_TRACE_IRQFLAGS is enabled.
      
      This helps catching those gremlins faster (and so far I
      can't seem see any anymore, so that's good news).
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      7c0482e3
  2. 09 5月, 2012 2 次提交
  3. 08 5月, 2012 3 次提交
  4. 05 5月, 2012 1 次提交
  5. 30 4月, 2012 1 次提交
    • G
      powerpc/irqdomain: Fix broken NR_IRQ references · 4013369f
      Grant Likely 提交于
      The switch from using irq_map to irq_alloc_desc*() for managing irq
      number allocations introduced new bugs in some of the powerpc
      interrupt code.  Several functions rely on the value of NR_IRQS to
      determine the maximum irq number that could get allocated.  However,
      with sparse_irq and using irq_alloc_desc*() the maximum possible irq
      number is now specified with 'nr_irqs' which may be a number larger
      than NR_IRQS.  This has caused breakage on powermac when
      CONFIG_NR_IRQS is set to 32.
      
      This patch removes most of the direct references to NR_IRQS in the
      powerpc code and replaces them with either a nr_irqs reference or by
      using the common for_each_irq_desc() macro.  The powerpc-specific
      for_each_irq() macro is removed at the same time.
      
      Also, the Cell axon_msi driver is refactored to remove the global
      build assumption on the size of NR_IRQS and instead add a limit to the
      maximum irq number when calling irq_domain_add_nomap().
      Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      4013369f
  6. 26 4月, 2012 2 次提交
    • T
      powerpc: Use generic idle thread allocation · 17e32eac
      Thomas Gleixner 提交于
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Link: http://lkml.kernel.org/r/20120420124557.311212868@linutronix.de
      17e32eac
    • T
      smp: Add task_struct argument to __cpu_up() · 8239c25f
      Thomas Gleixner 提交于
      Preparatory patch to make the idle thread allocation for secondary
      cpus generic.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: x86@kernel.org
      Link: http://lkml.kernel.org/r/20120420124556.964170564@linutronix.de
      8239c25f
  7. 20 4月, 2012 1 次提交
    • B
      powerpc: fix build when CONFIG_BOOKE_WDT is enabled · eda713e2
      Baruch Siach 提交于
      Commit ae3a197e (Disintegrate asm/system.h for PowerPC) broke build of
      assembly files when CONFIG_BOOKE_WDT is enabled as follows:
      
        AS      arch/powerpc/lib/string.o
      /home/baruch/git/stable/arch/powerpc/include/asm/reg_booke.h: Assembler messages:
      /home/baruch/git/stable/arch/powerpc/include/asm/reg_booke.h:19: Error: Unrecognized opcode: `extern'
      /home/baruch/git/stable/arch/powerpc/include/asm/reg_booke.h:20: Error: Unrecognized opcode: `extern'
      
      Since setup_32.c is the only user of the booke_wdt configuration variables, move
      the declarations there.
      
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: NBaruch Siach <baruch@tkos.co.il>
      Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
      eda713e2
  8. 11 4月, 2012 2 次提交
  9. 10 4月, 2012 1 次提交
    • B
      powerpc: Fix page fault with lockdep regression · 08f1ec8a
      Benjamin Herrenschmidt 提交于
      commit a546498f
      introduced a regression on 32-bit when irq tracing
      is enabled by exposing an old bug in our irq tracing
      code for exception entry.
      
      The code would save and restore some GPRs around the
      calls to the C lockdep code, however, it tries to be
      too smart for its own good and restores some of the
      GPRs from the exception frame (as saved there on
      exception entry).
      
      However, for page faults, we do replace those GPRs with
      arguments to do_page_fault before we call transfer_to_handler
      and so restoring from the exception frame is plain wrong in
      this case.
      
      This was fine as long as we didn't touch the interrupt state
      when taking page fault, but when I started doing it, it would
      trigger the lockdep calls and the bug.
      
      This fixes it by cleaning up that code a bit. It did create
      a small stack frame for the sake of backtraces, so let's
      make it a bit bigger and use it to save and restore the
      stuff we care about.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      08f1ec8a
  10. 02 4月, 2012 1 次提交
  11. 29 3月, 2012 1 次提交
  12. 28 3月, 2012 5 次提交
  13. 24 3月, 2012 1 次提交
    • J
      coredump: remove VM_ALWAYSDUMP flag · 909af768
      Jason Baron 提交于
      The motivation for this patchset was that I was looking at a way for a
      qemu-kvm process, to exclude the guest memory from its core dump, which
      can be quite large.  There are already a number of filter flags in
      /proc/<pid>/coredump_filter, however, these allow one to specify 'types'
      of kernel memory, not specific address ranges (which is needed in this
      case).
      
      Since there are no more vma flags available, the first patch eliminates
      the need for the 'VM_ALWAYSDUMP' flag.  The flag is used internally by
      the kernel to mark vdso and vsyscall pages.  However, it is simple
      enough to check if a vma covers a vdso or vsyscall page without the need
      for this flag.
      
      The second patch then replaces the 'VM_ALWAYSDUMP' flag with a new
      'VM_NODUMP' flag, which can be set by userspace using new madvise flags:
      'MADV_DONTDUMP', and unset via 'MADV_DODUMP'.  The core dump filters
      continue to work the same as before unless 'MADV_DONTDUMP' is set on the
      region.
      
      The qemu code which implements this features is at:
      
        http://people.redhat.com/~jbaron/qemu-dump/qemu-dump.patch
      
      In my testing the qemu core dump shrunk from 383MB -> 13MB with this
      patch.
      
      I also believe that the 'MADV_DONTDUMP' flag might be useful for
      security sensitive apps, which might want to select which areas are
      dumped.
      
      This patch:
      
      The VM_ALWAYSDUMP flag is currently used by the coredump code to
      indicate that a vma is part of a vsyscall or vdso section.  However, we
      can determine if a vma is in one these sections by checking it against
      the gate_vma and checking for a non-NULL return value from
      arch_vma_name().  Thus, freeing a valuable vma bit.
      Signed-off-by: NJason Baron <jbaron@redhat.com>
      Acked-by: NRoland McGrath <roland@hack.frob.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Avi Kivity <avi@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      909af768
  14. 21 3月, 2012 4 次提交
  15. 17 3月, 2012 1 次提交
  16. 16 3月, 2012 2 次提交
  17. 09 3月, 2012 11 次提交
    • B
      powerpc: Rework lazy-interrupt handling · 7230c564
      Benjamin Herrenschmidt 提交于
      The current implementation of lazy interrupts handling has some
      issues that this tries to address.
      
      We don't do the various workarounds we need to do when re-enabling
      interrupts in some cases such as when returning from an interrupt
      and thus we may still lose or get delayed decrementer or doorbell
      interrupts.
      
      The current scheme also makes it much harder to handle the external
      "edge" interrupts provided by some BookE processors when using the
      EPR facility (External Proxy) and the Freescale Hypervisor.
      
      Additionally, we tend to keep interrupts hard disabled in a number
      of cases, such as decrementer interrupts, external interrupts, or
      when a masked decrementer interrupt is pending. This is sub-optimal.
      
      This is an attempt at fixing it all in one go by reworking the way
      we do the lazy interrupt disabling from the ground up.
      
      The base idea is to replace the "hard_enabled" field with a
      "irq_happened" field in which we store a bit mask of what interrupt
      occurred while soft-disabled.
      
      When re-enabling, either via arch_local_irq_restore() or when returning
      from an interrupt, we can now decide what to do by testing bits in that
      field.
      
      We then implement replaying of the missed interrupts either by
      re-using the existing exception frame (in exception exit case) or via
      the creation of a new one from an assembly trampoline (in the
      arch_local_irq_enable case).
      
      This removes the need to play with the decrementer to try to create
      fake interrupts, among others.
      
      In addition, this adds a few refinements:
      
       - We no longer  hard disable decrementer interrupts that occur
      while soft-disabled. We now simply bump the decrementer back to max
      (on BookS) or leave it stopped (on BookE) and continue with hard interrupts
      enabled, which means that we'll potentially get better sample quality from
      performance monitor interrupts.
      
       - Timer, decrementer and doorbell interrupts now hard-enable
      shortly after removing the source of the interrupt, which means
      they no longer run entirely hard disabled. Again, this will improve
      perf sample quality.
      
       - On Book3E 64-bit, we now make the performance monitor interrupt
      act as an NMI like Book3S (the necessary C code for that to work
      appear to already be present in the FSL perf code, notably calling
      nmi_enter instead of irq_enter). (This also fixes a bug where BookE
      perfmon interrupts could clobber r14 ... oops)
      
       - We could make "masked" decrementer interrupts act as NMIs when doing
      timer-based perf sampling to improve the sample quality.
      
      Signed-off-by-yet: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      ---
      
      v2:
      
      - Add hard-enable to decrementer, timer and doorbells
      - Fix CR clobber in masked irq handling on BookE
      - Make embedded perf interrupt act as an NMI
      - Add a PACA_HAPPENED_EE_EDGE for use by FSL if they want
        to retrigger an interrupt without preventing hard-enable
      
      v3:
      
       - Fix or vs. ori bug on Book3E
       - Fix enabling of interrupts for some exceptions on Book3E
      
      v4:
      
       - Fix resend of doorbells on return from interrupt on Book3E
      
      v5:
      
       - Rebased on top of my latest series, which involves some significant
      rework of some aspects of the patch.
      
      v6:
       - 32-bit compile fix
       - more compile fixes with various .config combos
       - factor out the asm code to soft-disable interrupts
       - remove the C wrapper around preempt_schedule_irq
      
      v7:
       - Fix a bug with hard irq state tracking on native power7
      7230c564
    • G
      powerpc/eeh: Introduce EEH device · eb740b5f
      Gavin Shan 提交于
      Original EEH implementation depends on struct pci_dn heavily. However,
      EEH shouldn't depend on that actually because EEH needn't share much
      information with other PCI components. That's to say, EEH should have
      worked independently.
      
      The patch introduces struct eeh_dev so that EEH core components needn't
      be working based on struct pci_dn in future. Also, struct pci_dn, struct
      eeh_dev instances are created in dynamic fasion and the binding with EEH
      device, OF node, PCI device is implemented as well.
      
      The EEH devices are created after PHBs are detected and initialized, but
      PCI emunation hasn't started yet. Apart from that, PHB might be created
      dynamically through DLPAR component and the EEH devices should be creatd
      as well. Another case might be OF node is created dynamically by DR
      (Dynamic Reconfiguration), which has been defined by PAPR. For those OF
      nodes created by DR, EEH devices should be also created accordingly. The
      binding between EEH device and OF node is done while the EEH device is
      initially created.
      
      The binding between EEH device and PCI device should be done after PCI
      emunation is done. Besides, PCI hotplug also needs the binding so that
      the EEH devices could be traced from the newly coming PCI buses or PCI
      devices.
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      eb740b5f
    • B
      powerpc: Replace mfmsr instructions with load from PACA kernel_msr field · d9ada91a
      Benjamin Herrenschmidt 提交于
      On 64-bit, the mfmsr instruction can be quite slow, slower
      than loading a field from the cache-hot PACA, which happens
      to already contain the value we want in most cases.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      d9ada91a
    • B
      powerpc: Fix 64-bit BookE FP unavailable exceptions · 9424fabf
      Benjamin Herrenschmidt 提交于
      We were using CR0.EQ after EXCEPTION_COMMON, hoping it still
      contained whether we came from userspace or kernel space.
      
      However, under some circumstances, EXCEPTION_COMMON will
      call C code and clobber non-volatile registers, so we really
      need to re-load the previous MSR from the stackframe and
      re-test.
      
      While there, invert the condition to make the fast path more
      obvious and remove the BUG_OPCODE which was a debugging
      leftover and call .ret_from_except as we should.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      9424fabf
    • B
      powerpc: Disable interrupts in 64-bit kernel FP and vector faults · 9f2f79e3
      Benjamin Herrenschmidt 提交于
      If we get a floating point, altivec or vsx unavaible interrupt in
      kernel, we trigger a kernel error. There is no point preserving
      the interrupt state, in fact, that can even make debugging harder
      as the processor state might change (we may even preempt) between
      taking the exception and landing in a debugger.
      
      So just make those 3 disable interrupts unconditionally.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      ---
      
      v2: On BookE only disable when hitting the kernel unavailable
          path, otherwise it will fail to restore softe as
          fast_exception_return doesn't do it.
      9f2f79e3
    • B
      powerpc: Call do_page_fault() with interrupts off · a546498f
      Benjamin Herrenschmidt 提交于
      We currently turn interrupts back to their previous state before
      calling do_page_fault(). This can be annoying when debugging as
      a bad fault will potentially have lost some processor state before
      getting into the debugger.
      
      We also end up calling some generic code with interrupts enabled
      such as notify_page_fault() with interrupts enabled, which could
      be unexpected.
      
      This changes our code to behave more like other architectures,
      and make the assembly entry code call into do_page_faults() with
      interrupts disabled. They are conditionally re-enabled from
      within do_page_fault() in the same spot x86 does it.
      
      While there, add the might_sleep() test in the case of a successful
      trylock of the mmap semaphore, again like x86.
      
      Also fix a bug in the existing assembly where r12 (_MSR) could get
      clobbered by C calls (the DTL accounting in the exception common
      macro and DISABLE_INTS) in some cases.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      ---
      
      v2. Add the r12 clobber fix
      a546498f
    • B
      powerpc: Improve 64-bit syscall entry/exit · 1421ae0b
      Benjamin Herrenschmidt 提交于
      We unconditionally hard enable interrupts. This is unnecessary as
      syscalls are expected to always be called with interrupts enabled.
      
      While at it, we add a WARN_ON if that is not the case and
      CONFIG_TRACE_IRQFLAGS is enabled (we don't want to add overhead
      to the fast path when this is not set though).
      
      Thus let's remove the enabling (and associated irq tracing) from
      the syscall entry path. Also on Book3S, replace a few mfmsr
      instructions with loads of PACAMSR from the PACA, which should be
      faster & schedule better.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      1421ae0b
    • B
      powerpc: Rework runlatch code · fe1952fc
      Benjamin Herrenschmidt 提交于
      This moves the inlines into system.h and changes the runlatch
      code to use the thread local flags (non-atomic) rather than
      the TIF flags (atomic) to keep track of the latch state.
      
      The code to turn it back on in an asynchronous interrupt is
      now simplified and partially inlined.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      fe1952fc
    • B
      powerpc: Use the same interrupt prolog for perfmon as other interrupts · 7450f6f0
      Benjamin Herrenschmidt 提交于
      The perfmon interrupt is the sole user of a special variant of the
      interrupt prolog which differs from the one used by external and timer
      interrupts in that it saves the non-volatile GPRs and doesn't turn the
      runlatch on.
      
      The former is unnecessary and the later is arguably incorrect, so
      let's clean that up by using the same prolog. While at it we rename
      that prolog to use the _ASYNC prefix.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      7450f6f0
    • B
      powerpc: Remove legacy iSeries bits from assembly files · 4f8cf36f
      Benjamin Herrenschmidt 提交于
      This removes the various bits of assembly in the kernel entry,
      exception handling and SLB management code that were specific
      to running under the legacy iSeries hypervisor which is no
      longer supported.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      4f8cf36f
    • S
      powerpc: clean up vio.c · b0787660
      Stephen Rothwell 提交于
      This cleans up vio.c after the removal of the legacy iSeries platform.
      It also removes some no longer referenced include files.
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      b0787660