1. 13 10月, 2010 2 次提交
  2. 06 10月, 2010 2 次提交
    • S
      powerpc: remove unused variable · 7c6d45e6
      Stephen Rothwell 提交于
      Since powerpc uses -Werror on arch powerpc, the build was broken like
      this:
      
        cc1: warnings being treated as errors
        arch/powerpc/kernel/module.c: In function 'module_finalize':
        arch/powerpc/kernel/module.c:66: error: unused variable 'err'
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7c6d45e6
    • L
      modules: Fix module_bug_list list corruption race · 5336377d
      Linus Torvalds 提交于
      With all the recent module loading cleanups, we've minimized the code
      that sits under module_mutex, fixing various deadlocks and making it
      possible to do most of the module loading in parallel.
      
      However, that whole conversion totally missed the rather obscure code
      that adds a new module to the list for BUG() handling.  That code was
      doubly obscure because (a) the code itself lives in lib/bugs.c (for
      dubious reasons) and (b) it gets called from the architecture-specific
      "module_finalize()" rather than from generic code.
      
      Calling it from arch-specific code makes no sense what-so-ever to begin
      with, and is now actively wrong since that code isn't protected by the
      module loading lock any more.
      
      So this commit moves the "module_bug_{finalize,cleanup}()" calls away
      from the arch-specific code, and into the generic code - and in the
      process protects it with the module_mutex so that the list operations
      are now safe.
      
      Future fixups:
       - move the module list handling code into kernel/module.c where it
         belongs.
       - get rid of 'module_bug_list' and just use the regular list of modules
         (called 'modules' - imagine that) that we already create and maintain
         for other reasons.
      Reported-and-tested-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Adrian Bunk <bunk@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5336377d
  3. 23 9月, 2010 2 次提交
  4. 02 9月, 2010 9 次提交
    • B
      powerpc/dma: Add optional platform override of dma_set_mask() · 5b6e9ff6
      Benjamin Herrenschmidt 提交于
      Some platforms may want to override dma_set_mask() to take into
      account some specific "features" such as the availability of
      a direct-map window in addition to an iommu.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      5b6e9ff6
    • D
      powerpc: Use is_32bit_task() helper to test 32-bit binary · cab175f9
      Denis Kirjanov 提交于
      This patch removes all explicit tests for the TIF_32BIT flag
      Signed-off-by: NDenis Kirjanov <dkirjanov@kernel.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      cab175f9
    • A
      powerpc: Remove fpscr use from [kvm_]cvt_{fd,df} · 05d77ac9
      Andreas Schwab 提交于
      Neither lfs nor stfs touch the fpscr, so remove the restore/save of it
      around them.
      Signed-off-by: NAndreas Schwab <schwab@linux-m68k.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      05d77ac9
    • P
      powerpc/pseries: Re-enable dispatch trace log userspace interface · 872e439a
      Paul Mackerras 提交于
      Since the cpu accounting code uses the hypervisor dispatch trace log
      now when CONFIG_VIRT_CPU_ACCOUNTING = y, the previous commit disabled
      access to it via files in the /sys/kernel/debug/powerpc/dtl/ directory
      in that case.  This restores those files.
      
      To do this, we now have a hook that the cpu accounting code will call
      as it processes each entry from the hypervisor dispatch trace log.
      The code in dtl.c now uses that to fill up its ring buffer, rather
      than having the hypervisor fill the ring buffer directly.
      
      This also fixes dtl_file_read() to handle overflow conditions a bit
      better and adds a spinlock to ensure that race conditions (multiple
      processes opening or reading the file concurrently) are handled
      correctly.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      872e439a
    • P
      powerpc: Account time using timebase rather than PURR · cf9efce0
      Paul Mackerras 提交于
      Currently, when CONFIG_VIRT_CPU_ACCOUNTING is enabled, we use the
      PURR register for measuring the user and system time used by
      processes, as well as other related times such as hardirq and
      softirq times.  This turns out to be quite confusing for users
      because it means that a program will often be measured as taking
      less time when run on a multi-threaded processor (SMT2 or SMT4 mode)
      than it does when run on a single-threaded processor (ST mode), even
      though the program takes longer to finish.  The discrepancy is
      accounted for as stolen time, which is also confusing, particularly
      when there are no other partitions running.
      
      This changes the accounting to use the timebase instead, meaning that
      the reported user and system times are the actual number of real-time
      seconds that the program was executing on the processor thread,
      regardless of which SMT mode the processor is in.  Thus a program will
      generally show greater user and system times when run on a
      multi-threaded processor than on a single-threaded processor.
      
      On pSeries systems on POWER5 or later processors, we measure the
      stolen time (time when this partition wasn't running) using the
      hypervisor dispatch trace log.  We check for new entries in the
      log on every entry from user mode and on every transition from
      kernel process context to soft or hard IRQ context (i.e. when
      account_system_vtime() gets called).  So that we can correctly
      distinguish time stolen from user time and time stolen from system
      time, without having to check the log on every exit to user mode,
      we store separate timestamps for exit to user mode and entry from
      user mode.
      
      On systems that have a SPURR (POWER6 and POWER7), we read the SPURR
      in account_system_vtime() (as before), and then apportion the SPURR
      ticks since the last time we read it between scaled user time and
      scaled system time according to the relative proportions of user
      time and system time over the same interval.  This avoids having to
      read the SPURR on every kernel entry and exit.  On systems that have
      PURR but not SPURR (i.e., POWER5), we do the same using the PURR
      rather than the SPURR.
      
      This disables the DTL user interface in /sys/debug/kernel/powerpc/dtl
      for now since it conflicts with the use of the dispatch trace log
      by the time accounting code.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      cf9efce0
    • P
      powerpc: Dynamically allocate most lppaca structs · 93c22703
      Paul Mackerras 提交于
      This arranges for the lppaca structs for most cpus to be dynamically
      allocated in the same manner as the paca structs.  If we don't include
      support for legacy iSeries, only the first lppaca is statically
      allocated; the rest are dynamically allocated.  If we include legacy
      iSeries support, then we statically allocate the first 64 lppaca
      structs, since the iSeries hypervisor requires that the lppaca
      structs be present in the data section of the kernel image, but
      legacy iSeries supports at most 64 cpus.
      
      With CONFIG_NR_CPUS, the kernel image size for a typical pSeries config
      went from:
      
         text    data     bss     dec     hex filename
      9524478 4734564 8469944 22728986        15ad11a ../test-1024/vmlinux
      
      to:
      
         text    data     bss     dec     hex filename
      9524482 3751508 8469944 21745934        14bd10e ../test-1024/vmlinux
      
      a reduction of 983052 bytes overall.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      93c22703
    • P
      powerpc: Abstract indexing of lppaca structs · 8154c5d2
      Paul Mackerras 提交于
      Currently we have the lppaca structs as a simple array of NR_CPUS
      entries, taking up space in the data section of the kernel image.
      In future we would like to allocate them dynamically, so this
      abstracts out the accesses to the array, making it easier to
      change how we locate the lppaca for a given cpu in future.
      Specifically, lppaca[cpu] changes to lppaca_of(cpu).
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      8154c5d2
    • M
      powerpc: Move arch_sd_sibling_asym_packing() to smp.c · e1f0ece1
      Michael Neuling 提交于
      Simple cleanup by moving arch_sd_sibling_asym_packing from process.c to
      smp.c to save an #ifdef CONFIG_SMP
      
      No functionality change.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      e1f0ece1
    • A
      powerpc: Feature nop out reservation clear when stcx checks address · f89451fb
      Anton Blanchard 提交于
      The POWER architecture does not require stcx to check that it is operating
      on the same address as the larx. This means it is possible for an
      an exception handler to execute a larx, get a reservation, decide
      not to do the stcx and then return back with an active reservation. If the
      interrupted code was in the middle of a larx/stcx sequence the stcx could
      incorrectly succeed.
      
      All recent POWER CPUs check the address before letting the stcx succeed
      so we can create a CPU feature and nop it out. As Ben suggested, we can
      only do this in our syscall path because there is a remote possibility
      some kernel code gets interrupted by an exception that ends up operating
      on the same cacheline.
      
      Thanks to Paul Mackerras and Derek Williams for the idea.
      
      To test this I used a very simple null syscall (actually getppid) testcase
      at http://ozlabs.org/~anton/junkcode/null_syscall.c
      
      I tested against 2.6.35-git10 with the following changes against the
      pseries_defconfig:
      
      CONFIG_VIRT_CPU_ACCOUNTING=n
      CONFIG_AUDIT=n
      CONFIG_PPC_4K_PAGES=n
      CONFIG_PPC_64K_PAGES=y
      CONFIG_FORCE_MAX_ZONEORDER=9
      CONFIG_PPC_SUBPAGE_PROT=n
      CONFIG_FUNCTION_TRACER=n
      CONFIG_FUNCTION_GRAPH_TRACER=n
      CONFIG_IRQSOFF_TRACER=n
      CONFIG_STACK_TRACER=n
      
      to remove the overhead of virtual CPU accounting, syscall auditing and
      the ftrace mcount tracers. 64kB pages were enabled to minimise TLB misses.
      
      POWER6: +8.2%
      POWER7: +7.0%
      
      Another suggestion was to use a larx to something in the L1 instead of a stcx.
      This was almost as fast as removing the larx on POWER6, but only 3.5% faster
      on POWER7. We can use this to speed up the reservation clear in our
      exception exit code.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      f89451fb
  5. 31 8月, 2010 3 次提交
    • M
      powerpc: Don't use kernel stack with translation off · 54a83404
      Michael Neuling 提交于
      In f761622e we changed
      early_setup_secondary so it's called using the proper kernel stack
      rather than the emergency one.
      
      Unfortunately, this stack pointer can't be used when translation is off
      on PHYP as this stack pointer might be outside the RMO.  This results in
      the following on all non zero cpus:
        cpu 0x1: Vector: 300 (Data Access) at [c00000001639fd10]
            pc: 000000000001c50c
            lr: 000000000000821c
            sp: c00000001639ff90
           msr: 8000000000001000
           dar: c00000001639ffa0
         dsisr: 42000000
          current = 0xc000000016393540
          paca    = 0xc000000006e00200
            pid   = 0, comm = swapper
      
      The original patch was only tested on bare metal system, so it never
      caught this problem.
      
      This changes __secondary_start so that we calculate the new stack
      pointer but only start using it after we've called early_setup_secondary.
      
      With this patch, the above problem goes away.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      54a83404
    • P
      powerpc/perf_event: Reduce latency of calling perf_event_do_pending · b0d278b7
      Paul Mackerras 提交于
      Commit 0fe1ac48 ("powerpc/perf_event: Fix oops due to
      perf_event_do_pending call") moved the call to perf_event_do_pending
      in timer_interrupt() down so that it was after the irq_enter() call.
      Unfortunately this moved it after the code that checks whether it
      is time for the next decrementer clock event.  The result is that
      the call to perf_event_do_pending() won't happen until the next
      decrementer clock event is due.  This was pointed out by Milton
      Miller.
      
      This fixes it by moving the check for whether it's time for the
      next decrementer clock event down to the point where we're about
      to call the event handler, after we've called perf_event_do_pending.
      
      This has the side effect that on old pre-Core99 Powermacs where we
      use the ppc_n_lost_interrupts mechanism to replay interrupts, a
      replayed interrupt will incur a little more latency since it will
      now do the code from the irq_enter down to the irq_exit, that it
      used to skip.  However, these machines are now old and rare enough
      that this doesn't matter.  To make it clear that ppc_n_lost_interrupts
      is only used on Powermacs, and to speed up the code slightly on
      non-Powermac ppc32 machines, the code that tests ppc_n_lost_interrupts
      is now conditional on CONFIG_PMAC as well as CONFIG_PPC32.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Cc: stable@kernel.org
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      b0d278b7
    • M
      powerpc/kexec: Adds correct calling convention for kexec purgatory · 4562c986
      Matthew McClintock 提交于
      Call kexec purgatory code correctly. We were getting lucky before.
      If you examine the powerpc 32bit kexec "purgatory" code you will
      see it expects the following:
      
      >From kexec-tools: purgatory/arch/ppc/v2wrap_32.S
      -> calling convention:
      ->   r3 = physical number of this cpu (all cpus)
      ->   r4 = address of this chunk (master only)
      
      As such, we need to set r3 to the current core, r4 happens to be
      unused by purgatory at the moment but we go ahead and set it
      here as well
      Signed-off-by: NMatthew McClintock <msm@freescale.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      4562c986
  6. 24 8月, 2010 11 次提交
  7. 23 8月, 2010 3 次提交
  8. 18 8月, 2010 1 次提交
    • D
      Make do_execve() take a const filename pointer · d7627467
      David Howells 提交于
      Make do_execve() take a const filename pointer so that kernel_execve() compiles
      correctly on ARM:
      
      arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type
      
      This also requires the argv and envp arguments to be consted twice, once for
      the pointer array and once for the strings the array points to.  This is
      because do_execve() passes a pointer to the filename (now const) to
      copy_strings_kernel().  A simpler alternative would be to cast the filename
      pointer in do_execve() when it's passed to copy_strings_kernel().
      
      do_execve() may not change any of the strings it is passed as part of the argv
      or envp lists as they are some of them in .rodata, so marking these strings as
      const should be fine.
      
      Further kernel_execve() and sys_execve() need to be changed to match.
      
      This has been test built on x86_64, frv, arm and mips.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Tested-by: NRalf Baechle <ralf@linux-mips.org>
      Acked-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d7627467
  9. 14 8月, 2010 1 次提交
  10. 07 8月, 2010 1 次提交
  11. 05 8月, 2010 1 次提交
  12. 03 8月, 2010 2 次提交
  13. 01 8月, 2010 2 次提交