1. 16 2月, 2012 1 次提交
    • I
      powerpc: Fix kernel log of oops/panic instruction dump · 40c8cefa
      Ira Snyder 提交于
      A kernel oops/panic prints an instruction dump showing several
      instructions before and after the instruction which caused the
      oops/panic.
      
      The code intended that the faulting instruction be enclosed in angle
      brackets, however a bug caused the faulting instruction to be
      interpreted by printk() as the message log level.
      
      To fix this, the KERN_CONT log level is added before the actual text of
      the printed message.
      
      === Before the patch ===
      
      [ 1081.587266] Instruction dump:
      [ 1081.590236] 7c000110 7c0000f8 5400077c 552907f6 7d290378 992b0003 4e800020 38000001
      [ 1081.598034] 3d20c03a 9009a114 7c0004ac 39200000
      [ 1081.602500]  4e800020 3803ffd0 2b800009
      
      <4>[ 1081.587266] Instruction dump:
      <4>[ 1081.590236] 7c000110 7c0000f8 5400077c 552907f6 7d290378 992b0003 4e800020 38000001
      <4>[ 1081.598034] 3d20c03a 9009a114 7c0004ac 39200000
      <98090000>[ 1081.602500]  4e800020 3803ffd0 2b800009
      
      === After the patch ===
      
      [   51.385216] Instruction dump:
      [   51.388186] 7c000110 7c0000f8 5400077c 552907f6 7d290378 992b0003 4e800020 38000001
      [   51.395986] 3d20c03a 9009a114 7c0004ac 39200000 <98090000> 4e800020 3803ffd0 2b800009
      
      <4>[   51.385216] Instruction dump:
      <4>[   51.388186] 7c000110 7c0000f8 5400077c 552907f6 7d290378 992b0003 4e800020 38000001
      <4>[   51.395986] 3d20c03a 9009a114 7c0004ac 39200000 <98090000> 4e800020 3803ffd0 2b800009
      Signed-off-by: NIra W. Snyder <iws@ovro.caltech.edu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      40c8cefa
  2. 28 11月, 2011 1 次提交
  3. 17 11月, 2011 2 次提交
  4. 01 11月, 2011 1 次提交
  5. 19 7月, 2011 2 次提交
  6. 12 7月, 2011 2 次提交
    • P
      KVM: PPC: Add support for Book3S processors in hypervisor mode · de56a948
      Paul Mackerras 提交于
      This adds support for KVM running on 64-bit Book 3S processors,
      specifically POWER7, in hypervisor mode.  Using hypervisor mode means
      that the guest can use the processor's supervisor mode.  That means
      that the guest can execute privileged instructions and access privileged
      registers itself without trapping to the host.  This gives excellent
      performance, but does mean that KVM cannot emulate a processor
      architecture other than the one that the hardware implements.
      
      This code assumes that the guest is running paravirtualized using the
      PAPR (Power Architecture Platform Requirements) interface, which is the
      interface that IBM's PowerVM hypervisor uses.  That means that existing
      Linux distributions that run on IBM pSeries machines will also run
      under KVM without modification.  In order to communicate the PAPR
      hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code
      to include/linux/kvm.h.
      
      Currently the choice between book3s_hv support and book3s_pr support
      (i.e. the existing code, which runs the guest in user mode) has to be
      made at kernel configuration time, so a given kernel binary can only
      do one or the other.
      
      This new book3s_hv code doesn't support MMIO emulation at present.
      Since we are running paravirtualized guests, this isn't a serious
      restriction.
      
      With the guest running in supervisor mode, most exceptions go straight
      to the guest.  We will never get data or instruction storage or segment
      interrupts, alignment interrupts, decrementer interrupts, program
      interrupts, single-step interrupts, etc., coming to the hypervisor from
      the guest.  Therefore this introduces a new KVMTEST_NONHV macro for the
      exception entry path so that we don't have to do the KVM test on entry
      to those exception handlers.
      
      We do however get hypervisor decrementer, hypervisor data storage,
      hypervisor instruction storage, and hypervisor emulation assist
      interrupts, so we have to handle those.
      
      In hypervisor mode, real-mode accesses can access all of RAM, not just
      a limited amount.  Therefore we put all the guest state in the vcpu.arch
      and use the shadow_vcpu in the PACA only for temporary scratch space.
      We allocate the vcpu with kzalloc rather than vzalloc, and we don't use
      anything in the kvmppc_vcpu_book3s struct, so we don't allocate it.
      We don't have a shared page with the guest, but we still need a
      kvm_vcpu_arch_shared struct to store the values of various registers,
      so we include one in the vcpu_arch struct.
      
      The POWER7 processor has a restriction that all threads in a core have
      to be in the same partition.  MMU-on kernel code counts as a partition
      (partition 0), so we have to do a partition switch on every entry to and
      exit from the guest.  At present we require the host and guest to run
      in single-thread mode because of this hardware restriction.
      
      This code allocates a hashed page table for the guest and initializes
      it with HPTEs for the guest's Virtual Real Memory Area (VRMA).  We
      require that the guest memory is allocated using 16MB huge pages, in
      order to simplify the low-level memory management.  This also means that
      we can get away without tracking paging activity in the host for now,
      since huge pages can't be paged or swapped.
      
      This also adds a few new exports needed by the book3s_hv code.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      de56a948
    • Y
      powerpc/e500: Save SPEFCSR in flush_spe_to_thread() · 685659ee
      yu liu 提交于
      giveup_spe() saves the SPE state which is protected by MSR[SPE].
      However, modifying SPEFSCR does not trap when MSR[SPE]=0.
      And since SPEFSCR is already saved/restored in _switch(),
      not all the callers want to save SPEFSCR again.
      Thus, saving SPEFSCR should not belong to giveup_spe().
      
      This patch moves SPEFSCR saving to flush_spe_to_thread(),
      and cleans up the caller that needs to save SPEFSCR accordingly.
      Signed-off-by: NLiu Yu <yu.liu@freescale.com>
      Acked-by: NKumar Gala <galak@kernel.crashing.org>
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      685659ee
  7. 25 5月, 2011 1 次提交
    • P
      powerpc: mmu_gather rework · d6bf29b4
      Peter Zijlstra 提交于
      Fix up powerpc to the new mmu_gather stuff.
      
      PPC has an extra batching queue to RCU free the actual pagetable
      allocations, use the ARCH extentions for that for now.
      
      For the ppc64_tlb_batch, which tracks the vaddrs to unhash from the
      hardware hash-table, keep using per-cpu arrays but flush on context switch
      and use a TLF bit to track the lazy_mmu state.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d6bf29b4
  8. 27 4月, 2011 2 次提交
  9. 23 3月, 2011 1 次提交
  10. 02 3月, 2011 1 次提交
  11. 21 1月, 2011 1 次提交
  12. 02 9月, 2010 2 次提交
    • P
      powerpc: Account time using timebase rather than PURR · cf9efce0
      Paul Mackerras 提交于
      Currently, when CONFIG_VIRT_CPU_ACCOUNTING is enabled, we use the
      PURR register for measuring the user and system time used by
      processes, as well as other related times such as hardirq and
      softirq times.  This turns out to be quite confusing for users
      because it means that a program will often be measured as taking
      less time when run on a multi-threaded processor (SMT2 or SMT4 mode)
      than it does when run on a single-threaded processor (ST mode), even
      though the program takes longer to finish.  The discrepancy is
      accounted for as stolen time, which is also confusing, particularly
      when there are no other partitions running.
      
      This changes the accounting to use the timebase instead, meaning that
      the reported user and system times are the actual number of real-time
      seconds that the program was executing on the processor thread,
      regardless of which SMT mode the processor is in.  Thus a program will
      generally show greater user and system times when run on a
      multi-threaded processor than on a single-threaded processor.
      
      On pSeries systems on POWER5 or later processors, we measure the
      stolen time (time when this partition wasn't running) using the
      hypervisor dispatch trace log.  We check for new entries in the
      log on every entry from user mode and on every transition from
      kernel process context to soft or hard IRQ context (i.e. when
      account_system_vtime() gets called).  So that we can correctly
      distinguish time stolen from user time and time stolen from system
      time, without having to check the log on every exit to user mode,
      we store separate timestamps for exit to user mode and entry from
      user mode.
      
      On systems that have a SPURR (POWER6 and POWER7), we read the SPURR
      in account_system_vtime() (as before), and then apportion the SPURR
      ticks since the last time we read it between scaled user time and
      scaled system time according to the relative proportions of user
      time and system time over the same interval.  This avoids having to
      read the SPURR on every kernel entry and exit.  On systems that have
      PURR but not SPURR (i.e., POWER5), we do the same using the PURR
      rather than the SPURR.
      
      This disables the DTL user interface in /sys/debug/kernel/powerpc/dtl
      for now since it conflicts with the use of the dispatch trace log
      by the time accounting code.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      cf9efce0
    • M
      powerpc: Move arch_sd_sibling_asym_packing() to smp.c · e1f0ece1
      Michael Neuling 提交于
      Simple cleanup by moving arch_sd_sibling_asym_packing from process.c to
      smp.c to save an #ifdef CONFIG_SMP
      
      No functionality change.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      e1f0ece1
  13. 24 8月, 2010 2 次提交
  14. 18 8月, 2010 1 次提交
    • D
      Make do_execve() take a const filename pointer · d7627467
      David Howells 提交于
      Make do_execve() take a const filename pointer so that kernel_execve() compiles
      correctly on ARM:
      
      arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type
      
      This also requires the argv and envp arguments to be consted twice, once for
      the pointer array and once for the strings the array points to.  This is
      because do_execve() passes a pointer to the filename (now const) to
      copy_strings_kernel().  A simpler alternative would be to cast the filename
      pointer in do_execve() when it's passed to copy_strings_kernel().
      
      do_execve() may not change any of the strings it is passed as part of the argv
      or envp lists as they are some of them in .rodata, so marking these strings as
      const should be fine.
      
      Further kernel_execve() and sys_execve() need to be changed to match.
      
      This has been test built on x86_64, frv, arm and mips.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Tested-by: NRalf Baechle <ralf@linux-mips.org>
      Acked-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d7627467
  15. 14 8月, 2010 1 次提交
  16. 09 7月, 2010 1 次提交
    • B
      powerpc/book3e: Hack to get gdb moving along on Book3E 64-bit · a2e19811
      Benjamin Herrenschmidt 提交于
      Our handling of debug interrupts on Book3E 64-bit is not quite
      the way it should be just yet. This is a workaround to let gdb
      work at least for now. We ensure that when context switching,
      we set the appropriate DBCR0 value for the new task. We also
      make sure that we turn off MSR[DE] within the kernel, and set
      it as part of the bits that get set when going back to userspace.
      
      In the long run, we will probably set the userspace DBCR0 on the
      exception exit code path and ensure we have some proper kernel
      value to set on the way into the kernel, a bit like ppc32 does,
      but that will take more work.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      a2e19811
  17. 29 6月, 2010 1 次提交
  18. 22 6月, 2010 1 次提交
    • K
      powerpc, hw_breakpoints: Implement hw_breakpoints for 64-bit server processors · 5aae8a53
      K.Prasad 提交于
      Implement perf-events based hw-breakpoint interfaces for PowerPC
      64-bit server (Book III S) processors.  This allows access to a
      given location to be used as an event that can be counted or
      profiled by the perf_events subsystem.
      
      This is done using the DABR (data breakpoint register), which can
      also be used for process debugging via ptrace.  When perf_event
      hw_breakpoint support is configured in, the perf_event subsystem
      manages the DABR and arbitrates access to it, and ptrace then
      creates a perf_event when it is requested to set a data breakpoint.
      
      [Adopted suggestions from Paul Mackerras <paulus@samba.org> to
      - emulate_step() all system-wide breakpoints and single-step only the
        per-task breakpoints
      - perform arch-specific cleanup before unregistration through
        arch_unregister_hw_breakpoint()
      ]
      Signed-off-by: NK.Prasad <prasad@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      5aae8a53
  19. 15 6月, 2010 1 次提交
  20. 09 6月, 2010 2 次提交
    • P
      powerpc: Exclude arch_sd_sibiling_asym_packing() on UP · 89275d59
      Peter Zijlstra 提交于
      Only SMP systems care about load-balance features, plus this
      saves some .text space on UP and also fixes the build.
      Reported-by: NMichael Ellerman <michael@ellerman.id.au>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Michael Neuling <mikey@neuling.org>
      LKML-Reference: <tip-76cbd8a8@git.kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      89275d59
    • M
      powerpc: Enable asymmetric SMT scheduling on POWER7 · 76cbd8a8
      Michael Neuling 提交于
      The POWER7 core has dynamic SMT mode switching which is controlled by
      the hypervisor.  There are 3 SMT modes:
      	SMT1 uses thread  0
      	SMT2 uses threads 0 & 1
      	SMT4 uses threads 0, 1, 2 & 3
      When in any particular SMT mode, all threads have the same performance
      as each other (ie. at any moment in time, all threads perform the same).
      
      The SMT mode switching works such that when linux has threads 2 & 3 idle
      and 0 & 1 active, it will cede (H_CEDE hypercall) threads 2 and 3 in the
      idle loop and the hypervisor will automatically switch to SMT2 for that
      core (independent of other cores).  The opposite is not true, so if
      threads 0 & 1 are idle and 2 & 3 are active, we will stay in SMT4 mode.
      
      Similarly if thread 0 is active and threads 1, 2 & 3 are idle, we'll go
      into SMT1 mode.
      
      If we can get the core into a lower SMT mode (SMT1 is best), the threads
      will perform better (since they share less core resources).  Hence when
      we have idle threads, we want them to be the higher ones.
      
      This adds a feature bit for asymmetric packing to powerpc and then
      enables it on POWER7.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: linuxppc-dev@ozlabs.org
      LKML-Reference: <20100608045702.31FB5CC8C7@localhost.localdomain>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      76cbd8a8
  21. 05 5月, 2010 1 次提交
  22. 17 2月, 2010 2 次提交
    • D
      powerpc/booke: Add support for advanced debug registers · 3bffb652
      Dave Kleikamp 提交于
      powerpc/booke: Add support for advanced debug registers
      
      From: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      
      Based on patches originally written by Torez Smith.
      
      This patch defines context switch and trap related functionality
      for BookE specific Debug Registers. It adds support to ptrace()
      for setting and getting BookE related Debug Registers
      Signed-off-by: NDave Kleikamp <shaggy@linux.vnet.ibm.com>
      Cc: Torez Smith  <lnxtorez@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Gibson <dwg@au1.ibm.com>
      Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com>
      Cc: Kumar Gala <galak@kernel.crashing.org>
      Cc: Sergio Durigan Junior <sergiodj@br.ibm.com>
      Cc: Thiago Jung Bauermann <bauerman@br.ibm.com>
      Cc: linuxppc-dev list <Linuxppc-dev@ozlabs.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      3bffb652
    • D
      powerpc/booke: Introduce new CONFIG options for advanced debug registers · 172ae2e7
      Dave Kleikamp 提交于
      powerpc/booke: Introduce new CONFIG options for advanced debug registers
      
      From: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      
      Introduce new config options to simplify the ifdefs pertaining to the
      advanced debug registers for booke and 40x processors:
      
      CONFIG_PPC_ADV_DEBUG_REGS - boolean: true for dac-based processors
      CONFIG_PPC_ADV_DEBUG_IACS - number of IAC registers
      CONFIG_PPC_ADV_DEBUG_DACS - number of DAC registers
      CONFIG_PPC_ADV_DEBUG_DVCS - number of DVC registers
      CONFIG_PPC_ADV_DEBUG_DAC_RANGE - DAC ranges supported
      
      Beginning conservatively, since I only have the facilities to test 440
      hardware.  I believe all 40x and booke platforms support at least 2 IAC
      and 2 DAC registers.  For 440, 4 IAC and 2 DVC registers are enabled, as
      well as the DAC ranges.
      Signed-off-by: NDave Kleikamp <shaggy@linux.vnet.ibm.com>
      Acked-by: NDavid Gibson <dwg@au1.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      172ae2e7
  23. 01 2月, 2010 1 次提交
  24. 27 10月, 2009 1 次提交
    • K
      powerpc: Fix compile errors found by new ppc64e_defconfig · ce7a35c7
      Kumar Gala 提交于
      Fix the following 3 issues:
      
      arch/powerpc/kernel/process.c: In function 'arch_randomize_brk':
      arch/powerpc/kernel/process.c:1183: error: 'mmu_highuser_ssize' undeclared (first use in this function)
      arch/powerpc/kernel/process.c:1183: error: (Each undeclared identifier is reported only once
      arch/powerpc/kernel/process.c:1183: error: for each function it appears in.)
      arch/powerpc/kernel/process.c:1183: error: 'MMU_SEGSIZE_1T' undeclared (first use in this function)
      
      In file included from arch/powerpc/kernel/setup_64.c:60:
      arch/powerpc/include/asm/mmu-hash64.h:132: error: redefinition of 'struct mmu_psize_def'
      arch/powerpc/include/asm/mmu-hash64.h:159: error: expected identifier or '(' before numeric constant
      arch/powerpc/include/asm/mmu-hash64.h:396: error: conflicting types for 'mm_context_t'
      arch/powerpc/include/asm/mmu-book3e.h:184: error: previous declaration of 'mm_context_t' was here
      
      cc1: warnings being treated as errors
      arch/powerpc/kernel/pci_64.c: In function 'pcibios_unmap_io_space':
      arch/powerpc/kernel/pci_64.c:100: error: unused variable 'res'
      Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      ce7a35c7
  25. 14 10月, 2009 1 次提交
    • S
      powerpc/ftrace: show real return addresses in modules · 9135c3cc
      Steven Rostedt 提交于
      When the function graph tracer is enabled, it replaces the return address
      with a hook back to the tracer. This makes back traces see the hook instead
      of the actual return address.
      
      The current code also shows the real address by checking if the return
      address jumps to the return_to_handler. If it is, is also prints out
      the saved real return address.
      
      On powerpc64, some modules may return to mod_return_to_handler, which
      is not checked. This patch will also show the real address if a return
      is to mod_return_to_handler as well.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      9135c3cc
  26. 24 9月, 2009 1 次提交
    • A
      powerpc: Move 64bit heap above 1TB on machines with 1TB segments · 8bbde7a7
      Anton Blanchard 提交于
      If we are using 1TB segments and we are allowed to randomise the heap, we can
      put it above 1TB so it is backed by a 1TB segment. Otherwise the heap will be
      in the bottom 1TB which always uses 256MB segments and this may result in a
      performance penalty.
      
      This functionality is disabled when heap randomisation is turned off:
      
      echo 1 > /proc/sys/kernel/randomize_va_space
      
      which may be useful when trying to allocate the maximum amount of 16M or 16G
      pages.
      
      On a microbenchmark that repeatedly touches 32GB of memory with a stride of
      256MB + 4kB (designed to stress 256MB segments while still mapping nicely into
      the L1 cache), we see the improvement:
      
      Force malloc to use heap all the time:
      # export MALLOC_MMAP_MAX_=0 MALLOC_TRIM_THRESHOLD_=-1
      
      Disable heap randomization:
      # echo 1 > /proc/sys/kernel/randomize_va_space
      # time ./test
      12.51s
      
      Enable heap randomization:
      # echo 2 > /proc/sys/kernel/randomize_va_space
      # time ./test
      1.70s
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      8bbde7a7
  27. 11 9月, 2009 1 次提交
  28. 20 8月, 2009 1 次提交
  29. 26 6月, 2009 1 次提交
  30. 09 6月, 2009 1 次提交
  31. 03 4月, 2009 1 次提交
  32. 23 2月, 2009 1 次提交