1. 17 3月, 2012 1 次提交
  2. 16 3月, 2012 2 次提交
  3. 09 3月, 2012 11 次提交
    • B
      powerpc: Rework lazy-interrupt handling · 7230c564
      Benjamin Herrenschmidt 提交于
      The current implementation of lazy interrupts handling has some
      issues that this tries to address.
      
      We don't do the various workarounds we need to do when re-enabling
      interrupts in some cases such as when returning from an interrupt
      and thus we may still lose or get delayed decrementer or doorbell
      interrupts.
      
      The current scheme also makes it much harder to handle the external
      "edge" interrupts provided by some BookE processors when using the
      EPR facility (External Proxy) and the Freescale Hypervisor.
      
      Additionally, we tend to keep interrupts hard disabled in a number
      of cases, such as decrementer interrupts, external interrupts, or
      when a masked decrementer interrupt is pending. This is sub-optimal.
      
      This is an attempt at fixing it all in one go by reworking the way
      we do the lazy interrupt disabling from the ground up.
      
      The base idea is to replace the "hard_enabled" field with a
      "irq_happened" field in which we store a bit mask of what interrupt
      occurred while soft-disabled.
      
      When re-enabling, either via arch_local_irq_restore() or when returning
      from an interrupt, we can now decide what to do by testing bits in that
      field.
      
      We then implement replaying of the missed interrupts either by
      re-using the existing exception frame (in exception exit case) or via
      the creation of a new one from an assembly trampoline (in the
      arch_local_irq_enable case).
      
      This removes the need to play with the decrementer to try to create
      fake interrupts, among others.
      
      In addition, this adds a few refinements:
      
       - We no longer  hard disable decrementer interrupts that occur
      while soft-disabled. We now simply bump the decrementer back to max
      (on BookS) or leave it stopped (on BookE) and continue with hard interrupts
      enabled, which means that we'll potentially get better sample quality from
      performance monitor interrupts.
      
       - Timer, decrementer and doorbell interrupts now hard-enable
      shortly after removing the source of the interrupt, which means
      they no longer run entirely hard disabled. Again, this will improve
      perf sample quality.
      
       - On Book3E 64-bit, we now make the performance monitor interrupt
      act as an NMI like Book3S (the necessary C code for that to work
      appear to already be present in the FSL perf code, notably calling
      nmi_enter instead of irq_enter). (This also fixes a bug where BookE
      perfmon interrupts could clobber r14 ... oops)
      
       - We could make "masked" decrementer interrupts act as NMIs when doing
      timer-based perf sampling to improve the sample quality.
      
      Signed-off-by-yet: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      ---
      
      v2:
      
      - Add hard-enable to decrementer, timer and doorbells
      - Fix CR clobber in masked irq handling on BookE
      - Make embedded perf interrupt act as an NMI
      - Add a PACA_HAPPENED_EE_EDGE for use by FSL if they want
        to retrigger an interrupt without preventing hard-enable
      
      v3:
      
       - Fix or vs. ori bug on Book3E
       - Fix enabling of interrupts for some exceptions on Book3E
      
      v4:
      
       - Fix resend of doorbells on return from interrupt on Book3E
      
      v5:
      
       - Rebased on top of my latest series, which involves some significant
      rework of some aspects of the patch.
      
      v6:
       - 32-bit compile fix
       - more compile fixes with various .config combos
       - factor out the asm code to soft-disable interrupts
       - remove the C wrapper around preempt_schedule_irq
      
      v7:
       - Fix a bug with hard irq state tracking on native power7
      7230c564
    • G
      powerpc/eeh: Introduce EEH device · eb740b5f
      Gavin Shan 提交于
      Original EEH implementation depends on struct pci_dn heavily. However,
      EEH shouldn't depend on that actually because EEH needn't share much
      information with other PCI components. That's to say, EEH should have
      worked independently.
      
      The patch introduces struct eeh_dev so that EEH core components needn't
      be working based on struct pci_dn in future. Also, struct pci_dn, struct
      eeh_dev instances are created in dynamic fasion and the binding with EEH
      device, OF node, PCI device is implemented as well.
      
      The EEH devices are created after PHBs are detected and initialized, but
      PCI emunation hasn't started yet. Apart from that, PHB might be created
      dynamically through DLPAR component and the EEH devices should be creatd
      as well. Another case might be OF node is created dynamically by DR
      (Dynamic Reconfiguration), which has been defined by PAPR. For those OF
      nodes created by DR, EEH devices should be also created accordingly. The
      binding between EEH device and OF node is done while the EEH device is
      initially created.
      
      The binding between EEH device and PCI device should be done after PCI
      emunation is done. Besides, PCI hotplug also needs the binding so that
      the EEH devices could be traced from the newly coming PCI buses or PCI
      devices.
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      eb740b5f
    • B
      powerpc: Replace mfmsr instructions with load from PACA kernel_msr field · d9ada91a
      Benjamin Herrenschmidt 提交于
      On 64-bit, the mfmsr instruction can be quite slow, slower
      than loading a field from the cache-hot PACA, which happens
      to already contain the value we want in most cases.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      d9ada91a
    • B
      powerpc: Fix 64-bit BookE FP unavailable exceptions · 9424fabf
      Benjamin Herrenschmidt 提交于
      We were using CR0.EQ after EXCEPTION_COMMON, hoping it still
      contained whether we came from userspace or kernel space.
      
      However, under some circumstances, EXCEPTION_COMMON will
      call C code and clobber non-volatile registers, so we really
      need to re-load the previous MSR from the stackframe and
      re-test.
      
      While there, invert the condition to make the fast path more
      obvious and remove the BUG_OPCODE which was a debugging
      leftover and call .ret_from_except as we should.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      9424fabf
    • B
      powerpc: Disable interrupts in 64-bit kernel FP and vector faults · 9f2f79e3
      Benjamin Herrenschmidt 提交于
      If we get a floating point, altivec or vsx unavaible interrupt in
      kernel, we trigger a kernel error. There is no point preserving
      the interrupt state, in fact, that can even make debugging harder
      as the processor state might change (we may even preempt) between
      taking the exception and landing in a debugger.
      
      So just make those 3 disable interrupts unconditionally.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      ---
      
      v2: On BookE only disable when hitting the kernel unavailable
          path, otherwise it will fail to restore softe as
          fast_exception_return doesn't do it.
      9f2f79e3
    • B
      powerpc: Call do_page_fault() with interrupts off · a546498f
      Benjamin Herrenschmidt 提交于
      We currently turn interrupts back to their previous state before
      calling do_page_fault(). This can be annoying when debugging as
      a bad fault will potentially have lost some processor state before
      getting into the debugger.
      
      We also end up calling some generic code with interrupts enabled
      such as notify_page_fault() with interrupts enabled, which could
      be unexpected.
      
      This changes our code to behave more like other architectures,
      and make the assembly entry code call into do_page_faults() with
      interrupts disabled. They are conditionally re-enabled from
      within do_page_fault() in the same spot x86 does it.
      
      While there, add the might_sleep() test in the case of a successful
      trylock of the mmap semaphore, again like x86.
      
      Also fix a bug in the existing assembly where r12 (_MSR) could get
      clobbered by C calls (the DTL accounting in the exception common
      macro and DISABLE_INTS) in some cases.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      ---
      
      v2. Add the r12 clobber fix
      a546498f
    • B
      powerpc: Improve 64-bit syscall entry/exit · 1421ae0b
      Benjamin Herrenschmidt 提交于
      We unconditionally hard enable interrupts. This is unnecessary as
      syscalls are expected to always be called with interrupts enabled.
      
      While at it, we add a WARN_ON if that is not the case and
      CONFIG_TRACE_IRQFLAGS is enabled (we don't want to add overhead
      to the fast path when this is not set though).
      
      Thus let's remove the enabling (and associated irq tracing) from
      the syscall entry path. Also on Book3S, replace a few mfmsr
      instructions with loads of PACAMSR from the PACA, which should be
      faster & schedule better.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      1421ae0b
    • B
      powerpc: Rework runlatch code · fe1952fc
      Benjamin Herrenschmidt 提交于
      This moves the inlines into system.h and changes the runlatch
      code to use the thread local flags (non-atomic) rather than
      the TIF flags (atomic) to keep track of the latch state.
      
      The code to turn it back on in an asynchronous interrupt is
      now simplified and partially inlined.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      fe1952fc
    • B
      powerpc: Use the same interrupt prolog for perfmon as other interrupts · 7450f6f0
      Benjamin Herrenschmidt 提交于
      The perfmon interrupt is the sole user of a special variant of the
      interrupt prolog which differs from the one used by external and timer
      interrupts in that it saves the non-volatile GPRs and doesn't turn the
      runlatch on.
      
      The former is unnecessary and the later is arguably incorrect, so
      let's clean that up by using the same prolog. While at it we rename
      that prolog to use the _ASYNC prefix.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      7450f6f0
    • B
      powerpc: Remove legacy iSeries bits from assembly files · 4f8cf36f
      Benjamin Herrenschmidt 提交于
      This removes the various bits of assembly in the kernel entry,
      exception handling and SLB management code that were specific
      to running under the legacy iSeries hypervisor which is no
      longer supported.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      4f8cf36f
    • S
      powerpc: clean up vio.c · b0787660
      Stephen Rothwell 提交于
      This cleans up vio.c after the removal of the legacy iSeries platform.
      It also removes some no longer referenced include files.
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      b0787660
  4. 07 3月, 2012 3 次提交
    • G
      powerpc: Make SPARSE_IRQ required · ad5b7f13
      Grant Likely 提交于
      All IRQs on powerpc are managed via irq_domain anyway, there isn't really
      any advantage to turning SPARSE_IRQ off, and it's the direction we want
      to take the kernel design anyway.  This patch makes powerpc always use
      SPARSE_IRQ.
      
      On pseries_defconfig, SPARSE_IRQ adds only about 0x300 bytes to the
      .text sections, and removes about 0x20000 from the data section for the
      static irq_desc table.
      Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>
      Cc: Rob Herring <rob.herring@calxeda.com>
      Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      ad5b7f13
    • N
      powerpc/prom: Remove limit on maximum size of properties · e9daf2ad
      Nishanth Aravamudan 提交于
      On a 16TB system (using AMS/CMO), I get:
      
      WARNING: ignoring large property [/ibm,dynamic-reconfiguration-memory] ibm,dynamic-memory length 0x000000000017ffec
      
      and significantly less memory is thus shown to the partition. As far as
      I can tell, the constant used is arbitrary. Ben Herrenschmidt provided
      additional background that
      
      > The limit was originally set because of Apple machines carrying ROM
      > images in the device-tree, at a time where we were much more memory
      > constrained than we are now.
      
      and that it is likely not very useful any longer.
      Signed-off-by: NNishanth Aravamudan <nacc@us.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      e9daf2ad
    • M
      powerpc: Use set_current_blocked() and block_sigmask() · a2007ce8
      Matt Fleming 提交于
      As described in e6fa16ab ("signal: sigprocmask() should do
      retarget_shared_pending()") the modification of current->blocked is
      incorrect as we need to check whether the signal we're about to block
      is pending in the shared queue.
      
      Also, use the new helper function introduced in commit 5e6292c0
      ("signal: add block_sigmask() for adding sigmask to current->blocked")
      which centralises the code for updating current->blocked after
      successfully delivering a signal and reduces the amount of duplicate
      code across architectures. In the past some architectures got this
      code wrong, so using this helper function should stop that from
      happening again.
      
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      a2007ce8
  5. 23 2月, 2012 9 次提交
    • M
      powerpc/perf: Move perf core & PMU code into a subdirectory · f2699491
      Michael Ellerman 提交于
      The perf code has grown a lot since it started, and is big enough to
      warrant its own subdirectory. For reference it's ~60% bigger than the
      oprofile code. It declutters the kernel directory, makes it simpler to
      grep for "just perf stuff", and allows us to shorten some filenames.
      
      While we're at it, make it more obvious that we have two implementations
      of the core perf logic. One for (roughly) Book3S CPUs, which was the
      original implementation, and the other for Freescale embedded CPUs.
      Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      f2699491
    • M
      fadump: Remove the phyp assisted dump code. · 12d92992
      Mahesh Salgaonkar 提交于
      Remove the phyp assisted dump implementation which is not is use.
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      12d92992
    • M
      fadump: Invalidate the fadump registration during machine shutdown. · 67b43b9d
      Mahesh Salgaonkar 提交于
      If dump is active during system reboot, shutdown or halt then invalidate
      the fadump registration as it does not get invalidated automatically.
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      67b43b9d
    • M
      fadump: Invalidate registration and release reserved memory for general use. · b500afff
      Mahesh Salgaonkar 提交于
      This patch introduces an sysfs interface '/sys/kernel/fadump_release_mem' to
      invalidate the last fadump registration, invalidate '/proc/vmcore', release
      the reserved memory for general use and re-register for future kernel dump.
      Once the dump is copied to the disk, unlike phyp dump, the userspace tool
      can release all the memory reserved for dump with one single operation of
      echo 1 to '/sys/kernel/fadump_release_mem'.
      
      Release the reserved memory region excluding the size of the memory required
      for future kernel dump registration. And therefore, unlike kdump, Fadump
      doesn't need a 2nd reboot to get back the system to the production
      configuration.
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      b500afff
    • M
      fadump: Add PT_NOTE program header for vmcoreinfo · d34c5f26
      Mahesh Salgaonkar 提交于
      Introduce a PT_NOTE program header that points to physical address of
      vmcoreinfo_note buffer declared in kernel/kexec.c. The vmcoreinfo
      note buffer is populated during crash_fadump() at the time of system
      crash.
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      d34c5f26
    • M
      fadump: Convert firmware-assisted cpu state dump data into elf notes. · ebaeb5ae
      Mahesh Salgaonkar 提交于
      When registered for firmware assisted dump on powerpc, firmware preserves
      the registers for the active CPUs during a system crash. This patch reads
      the cpu register data stored in Firmware-assisted dump format (except for
      crashing cpu) and converts it into elf notes and updates the PT_NOTE program
      header accordingly. The exact register state for crashing cpu is saved to
      fadump crash info structure in scratch area during crash_fadump() and read
      during second kernel boot.
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      ebaeb5ae
    • M
      fadump: Initialize elfcore header and add PT_LOAD program headers. · 2df173d9
      Mahesh Salgaonkar 提交于
      Build the crash memory range list by traversing through system memory during
      the first kernel before we register for firmware-assisted dump. After the
      successful dump registration, initialize the elfcore header and populate
      PT_LOAD program headers with crash memory ranges. The elfcore header is
      saved in the scratch area within the reserved memory. The scratch area starts
      at the end of the memory reserved for saving RMR region contents. The
      scratch area contains fadump crash info structure that contains magic number
      for fadump validation and physical address where the eflcore header can be
      found. This structure will also be used to pass some important crash info
      data to the second kernel which will help second kernel to populate ELF core
      header with correct data before it gets exported through /proc/vmcore. Since
      the firmware preserves the entire partition memory at the time of crash the
      contents of the scratch area will be preserved till second kernel boot.
      
      Since the memory dump exported through /proc/vmcore is in ELF format similar
      to kdump, it will help us to reuse the kdump infrastructure for dump capture
      and filtering. Unlike phyp dump, userspace tool does not need to refer any
      sysfs interface while reading /proc/vmcore.
      
      NOTE: The current design implementation does not address a possibility of
      introducing additional fields (in future) to this structure without affecting
      compatibility. It's on TODO list to come up with better approach to
      address this.
      
      Reserved dump area start => +-------------------------------------+
                                  |  CPU state dump data                |
                                  +-------------------------------------+
                                  |  HPTE region data                   |
                                  +-------------------------------------+
                                  |  RMR region data                    |
      Scratch area start       => +-------------------------------------+
                                  |  fadump crash info structure {      |
                                  |     magic nummber                   |
                           +------|---- elfcorehdr_addr                 |
                           |      |  }                                  |
                           +----> +-------------------------------------+
                                  |  ELF core header                    |
      Reserved dump area end   => +-------------------------------------+
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      2df173d9
    • M
      fadump: Register for firmware assisted dump. · 3ccc00a7
      Mahesh Salgaonkar 提交于
      On 2012-02-20 11:02:51 Mon, Paul Mackerras wrote:
      > On Thu, Feb 16, 2012 at 04:44:30PM +0530, Mahesh J Salgaonkar wrote:
      >
      > If I have read the code correctly, we are going to get this printk on
      > non-pSeries machines or on older pSeries machines, even if the user
      > has not put the fadump=on option on the kernel command line.  The
      > printk will be annoying since there is no actual error condition.  It
      > seems to me that the condition for the printk should include
      > fw_dump.fadump_enabled.  In other words you should probably add
      >
      > 	if (!fw_dump.fadump_enabled)
      > 		return 0;
      >
      > at the beginning of the function.
      
      Hi Paul,
      
      Thanks for pointing it out. Please find the updated patch below.
      
      The existing patches above this (4/10 through 10/10) cleanly applies
      on this update.
      
      Thanks,
      -Mahesh.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      3ccc00a7
    • M
      fadump: Reserve the memory for firmware assisted dump. · eb39c880
      Mahesh Salgaonkar 提交于
      Reserve the memory during early boot to preserve CPU state data, HPTE region
      and RMA (real mode area) region data in case of kernel crash. At the time of
      crash, powerpc firmware will store CPU state data, HPTE region data and move
      RMA region data to the reserved memory area.
      
      If the firmware-assisted dump fails to reserve the memory, then fallback
      to existing kexec-based kdump.
      
      Most of the code implementation to reserve memory has been
      adapted from phyp assisted dump implementation written by Linas Vepstas
      and Manish Ahuja
      
      This patch also introduces a config option CONFIG_FA_DUMP for firmware
      assisted dump feature on Powerpc (ppc64) architecture.
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      eb39c880
  6. 22 2月, 2012 2 次提交
  7. 16 2月, 2012 3 次提交
    • A
      powerpc/perf: power_pmu_start restores incorrect values, breaking frequency events · 9a45a940
      Anton Blanchard 提交于
      perf on POWER stopped working after commit e050e3f0 (perf: Fix
      broken interrupt rate throttling). That patch exposed a bug in
      the POWER perf_events code.
      
      Since the PMCs count upwards and take an exception when the top bit
      is set, we want to write 0x80000000 - left in power_pmu_start. We were
      instead programming in left which effectively disables the counter
      until we eventually hit 0x80000000. This could take seconds or longer.
      
      With the patch applied I get the expected number of samples:
      
                SAMPLE events:       9948
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: <stable@kernel.org>
      9a45a940
    • B
      powerpc: Disable interrupts early in Program Check · 54321242
      Benjamin Herrenschmidt 提交于
      Program Check exceptions are the result of WARNs, BUGs, some
      type of breakpoints, kprobe, and other illegal instructions.
      
      We want interrupts (and thus preemption) to remain disabled
      while doing the initial stage of testing the reason and
      branching off to a debugger or kprobe, so we are still on
      the original CPU which makes debugging easier in various cases.
      
      This is how the code was intended, hence the local_irq_enable()
      right in the middle of program_check_exception().
      
      However, the assembly exception prologue for that exception was
      incorrectly marked as enabling interrupts, which defeats that
      (and records a redundant enable with lockdep).
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      54321242
    • I
      powerpc: Fix kernel log of oops/panic instruction dump · 40c8cefa
      Ira Snyder 提交于
      A kernel oops/panic prints an instruction dump showing several
      instructions before and after the instruction which caused the
      oops/panic.
      
      The code intended that the faulting instruction be enclosed in angle
      brackets, however a bug caused the faulting instruction to be
      interpreted by printk() as the message log level.
      
      To fix this, the KERN_CONT log level is added before the actual text of
      the printed message.
      
      === Before the patch ===
      
      [ 1081.587266] Instruction dump:
      [ 1081.590236] 7c000110 7c0000f8 5400077c 552907f6 7d290378 992b0003 4e800020 38000001
      [ 1081.598034] 3d20c03a 9009a114 7c0004ac 39200000
      [ 1081.602500]  4e800020 3803ffd0 2b800009
      
      <4>[ 1081.587266] Instruction dump:
      <4>[ 1081.590236] 7c000110 7c0000f8 5400077c 552907f6 7d290378 992b0003 4e800020 38000001
      <4>[ 1081.598034] 3d20c03a 9009a114 7c0004ac 39200000
      <98090000>[ 1081.602500]  4e800020 3803ffd0 2b800009
      
      === After the patch ===
      
      [   51.385216] Instruction dump:
      [   51.388186] 7c000110 7c0000f8 5400077c 552907f6 7d290378 992b0003 4e800020 38000001
      [   51.395986] 3d20c03a 9009a114 7c0004ac 39200000 <98090000> 4e800020 3803ffd0 2b800009
      
      <4>[   51.385216] Instruction dump:
      <4>[   51.388186] 7c000110 7c0000f8 5400077c 552907f6 7d290378 992b0003 4e800020 38000001
      <4>[   51.395986] 3d20c03a 9009a114 7c0004ac 39200000 <98090000> 4e800020 3803ffd0 2b800009
      Signed-off-by: NIra W. Snyder <iws@ovro.caltech.edu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      40c8cefa
  8. 14 2月, 2012 2 次提交
    • B
      powerpc/pseries: Fix partition migration hang in stop_topology_update · 444080d1
      Brian King 提交于
      This fixes a hang that was observed during live partition migration.
      Since stop_topology_update must not be called from an interrupt
      context, call it earlier in the migration process. The hang observed
      can be seen below:
      
      WARNING: at kernel/timer.c:1011
      Modules linked in: ip6t_LOG xt_tcpudp xt_pkttype ipt_LOG xt_limit ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw xt_NOTRACK ipt_REJECT xt_state iptable_raw iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables ip6table_filter ip6_tables x_tables ipv6 fuse loop ibmveth sg ext3 jbd mbcache raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid10 raid1 raid0 scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc dm_round_robin dm_multipath scsi_dh sd_mod crc_t10dif ibmvfc scsi_transport_fc scsi_tgt scsi_mod dm_snapshot dm_mod
      NIP: c0000000000c52d8 LR: c00000000004be28 CTR: 0000000000000000
      REGS: c00000005ffd77d0 TRAP: 0700   Not tainted  (3.2.0-git-00001-g07d106d0)
      MSR: 8000000000021032 <ME,CE,IR,DR>  CR: 48000084  XER: 00000001
      CFAR: c00000000004be20
      TASK = c00000005ec78860[0] 'swapper/3' THREAD: c00000005ec98000 CPU: 3
      GPR00: 0000000000000001 c00000005ffd7a50 c000000000fbbc98 c000000000ec8340
      GPR04: 00000000282a0020 0000000000000000 0000000000004000 0000000000000101
      GPR08: 0000000000000012 c00000005ffd4000 0000000000000020 c000000000f3ba88
      GPR12: 0000000000000000 c000000007f40900 0000000000000001 0000000000000004
      GPR16: 0000000000000001 0000000000000000 0000000000000000 c000000001022310
      GPR20: 0000000000000001 0000000000000000 0000000000200200 c000000001029e14
      GPR24: 0000000000000000 0000000000000001 0000000000000040 c00000003f74bc80
      GPR28: c00000003f74bc84 c000000000f38038 c000000000f16b58 c000000000ec8340
      NIP [c0000000000c52d8] .del_timer_sync+0x28/0x60
      LR [c00000000004be28] .stop_topology_update+0x20/0x38
      Call Trace:
      [c00000005ffd7a50] [c00000005ec78860] 0xc00000005ec78860 (unreliable)
      [c00000005ffd7ad0] [c00000000004be28] .stop_topology_update+0x20/0x38
      [c00000005ffd7b40] [c000000000028378] .__rtas_suspend_last_cpu+0x58/0x260
      [c00000005ffd7bf0] [c0000000000fa230] .generic_smp_call_function_interrupt+0x160/0x358
      [c00000005ffd7cf0] [c000000000036ec8] .smp_ipi_demux+0x88/0x100
      [c00000005ffd7d80] [c00000000005c154] .icp_hv_ipi_action+0x5c/0x80
      [c00000005ffd7e00] [c00000000012a088] .handle_irq_event_percpu+0x100/0x318
      [c00000005ffd7f00] [c00000000012e774] .handle_percpu_irq+0x84/0xd0
      [c00000005ffd7f90] [c000000000022ba8] .call_handle_irq+0x1c/0x2c
      [c00000005ec9ba20] [c00000000001157c] .do_IRQ+0x22c/0x2a8
      [c00000005ec9bae0] [c0000000000054bc] hardware_interrupt_entry+0x18/0x1c
      Exception: 501 at .cpu_idle+0x194/0x2f8
          LR = .cpu_idle+0x194/0x2f8
      [c00000005ec9bdd0] [c000000000017e58] .cpu_idle+0x188/0x2f8 (unreliable)
      [c00000005ec9be90] [c00000000067ec18] .start_secondary+0x3e4/0x524
      [c00000005ec9bf90] [c0000000000093e8] .start_secondary_prolog+0x10/0x14
      Instruction dump:
      ebe1fff8 4e800020 fbe1fff8 7c0802a6 f8010010 7c7f1b78 f821ff81 78290464
      80090014 5400019e 7c0000d0 78000fe0 <0b000000> 4800000c 7c210b78 7c421378
      Signed-off-by: NBrian King <brking@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      444080d1
    • B
      powerpc: Fix WARN_ON in decrementer_check_overflow · 6fe5f5f3
      Benjamin Herrenschmidt 提交于
      We use __get_cpu_var() which triggers a false positive warning
      in smp_processor_id() thinking interrupts are enabled (at this
      point, they are soft-enabled but hard-disabled).
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      6fe5f5f3
  9. 25 1月, 2012 2 次提交
  10. 18 1月, 2012 2 次提交
    • E
      audit: inline audit_syscall_entry to reduce burden on archs · b05d8447
      Eric Paris 提交于
      Every arch calls:
      
      if (unlikely(current->audit_context))
      	audit_syscall_entry()
      
      which requires knowledge about audit (the existance of audit_context) in
      the arch code.  Just do it all in static inline in audit.h so that arch's
      can remain blissfully ignorant.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      b05d8447
    • E
      Audit: push audit success and retcode into arch ptrace.h · d7e7528b
      Eric Paris 提交于
      The audit system previously expected arches calling to audit_syscall_exit to
      supply as arguments if the syscall was a success and what the return code was.
      Audit also provides a helper AUDITSC_RESULT which was supposed to simplify things
      by converting from negative retcodes to an audit internal magic value stating
      success or failure.  This helper was wrong and could indicate that a valid
      pointer returned to userspace was a failed syscall.  The fix is to fix the
      layering foolishness.  We now pass audit_syscall_exit a struct pt_reg and it
      in turns calls back into arch code to collect the return value and to
      determine if the syscall was a success or failure.  We also define a generic
      is_syscall_success() macro which determines success/failure based on if the
      value is < -MAX_ERRNO.  This works for arches like x86 which do not use a
      separate mechanism to indicate syscall failure.
      
      We make both the is_syscall_success() and regs_return_value() static inlines
      instead of macros.  The reason is because the audit function must take a void*
      for the regs.  (uml calls theirs struct uml_pt_regs instead of just struct
      pt_regs so audit_syscall_exit can't take a struct pt_regs).  Since the audit
      function takes a void* we need to use static inlines to cast it back to the
      arch correct structure to dereference it.
      
      The other major change is that on some arches, like ia64, MIPS and ppc, we
      change regs_return_value() to give us the negative value on syscall failure.
      THE only other user of this macro, kretprobe_example.c, won't notice and it
      makes the value signed consistently for the audit functions across all archs.
      
      In arch/sh/kernel/ptrace_64.c I see that we were using regs[9] in the old
      audit code as the return value.  But the ptrace_64.h code defined the macro
      regs_return_value() as regs[3].  I have no idea which one is correct, but this
      patch now uses the regs_return_value() function, so it now uses regs[3].
      
      For powerpc we previously used regs->result but now use the
      regs_return_value() function which uses regs->gprs[3].  regs->gprs[3] is
      always positive so the regs_return_value(), much like ia64 makes it negative
      before calling the audit code when appropriate.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Acked-by: H. Peter Anvin <hpa@zytor.com> [for x86 portion]
      Acked-by: Tony Luck <tony.luck@intel.com> [for ia64]
      Acked-by: Richard Weinberger <richard@nod.at> [for uml]
      Acked-by: David S. Miller <davem@davemloft.net> [for sparc]
      Acked-by: Ralf Baechle <ralf@linux-mips.org> [for mips]
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [for ppc]
      d7e7528b
  11. 13 1月, 2012 2 次提交
  12. 11 1月, 2012 1 次提交
    • A
      powerpc: Fix RCU idle and hcall tracing · a5ccfee0
      Anton Blanchard 提交于
      Tracepoints should not be called inside an rcu_idle_enter/rcu_idle_exit
      region. Since pSeries calls H_CEDE in the idle loop, we were violating
      this rule.
      
      commit a7b152d5 (powerpc: Tell RCU about idle after hcall tracing)
      tried to work around it by delaying the rcu_idle_enter until after we
      called the hcall tracepoint, but there are a number of issues with it.
      
      The hcall tracepoint trampoline code is called conditionally when the
      tracepoint is enabled. If the tracepoint is not enabled we never call
      rcu_idle_enter. The idle_uses_rcu check was also done at compile time
      which breaks multiplatform builds.
      
      The simple fix is to avoid tracing H_CEDE and rely on other tracepoints
      and the hypervisor dispatch trace log to work out if we called H_CEDE.
      
      This fixes a hang during boot on pSeries.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      a5ccfee0