1. 09 9月, 2008 1 次提交
    • M
      kernel/cpu.c: create a CPU_STARTING cpu_chain notifier · e545a614
      Manfred Spraul 提交于
      Right now, there is no notifier that is called on a new cpu, before the new
      cpu begins processing interrupts/softirqs.
      Various kernel function would need that notification, e.g. kvm works around
      by calling smp_call_function_single(), rcu polls cpu_online_map.
      
      The patch adds a CPU_STARTING notification. It also adds a helper function
      that sends the message to all cpu_chain handlers.
      
      Tested on x86-64.
      All other archs are untested. Especially on sparc, I'm not sure if I got
      it right.
      Signed-off-by: NManfred Spraul <manfred@colorfullife.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e545a614
  2. 28 7月, 2008 3 次提交
  3. 26 6月, 2008 2 次提交
  4. 14 5月, 2008 1 次提交
    • M
      [POWERPC] Fix sparse warnings in arch/powerpc/kernel · 1c21a293
      Michael Ellerman 提交于
      Make a few things static in lparcfg.c
      Make init and exit routines static in rtas_flash.c
      Make things static in rtas_pci.c
      Make some functions static in rtas.c
      Make fops static in rtas-proc.c
      Remove unneeded extern for do_gtod in smp.c
      Make clocksource_init() static in time.c
      Make last_tick_len and ticklen_to_xs static in time.c
      Move the declaration of the pvr per-cpu into smp.h
      Make kexec_smp_down() and kexec_stack static in machine_kexec_64.c
      Don't return void in arch_teardown_msi_irqs() in msi.c
      Move declaration of GregorianDay()into asm/time.h
      Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      1c21a293
  5. 02 5月, 2008 1 次提交
    • P
      [POWERPC] Bolt in SLB entry for kernel stack on secondary cpus · 3b575064
      Paul Mackerras 提交于
      This fixes a regression reported by Kamalesh Bulabel where a POWER4
      machine would crash because of an SLB miss at a point where the SLB
      miss exception was unrecoverable.  This regression is tracked at:
      
      http://bugzilla.kernel.org/show_bug.cgi?id=10082
      
      SLB misses at such points shouldn't happen because the kernel stack is
      the only memory accessed other than things in the first segment of the
      linear mapping (which is mapped at all times by entry 0 of the SLB).
      The context switch code ensures that SLB entry 2 covers the kernel
      stack, if it is not already covered by entry 0.  None of entries 0
      to 2 are ever replaced by the SLB miss handler.
      
      Where this went wrong is that the context switch code assumes it
      doesn't have to write to SLB entry 2 if the new kernel stack is in the
      same segment as the old kernel stack, since entry 2 should already be
      correct.  However, when we start up a secondary cpu, it calls
      slb_initialize, which doesn't set up entry 2.  This is correct for
      the boot cpu, where we will be using a stack in the kernel BSS at this
      point (i.e. init_thread_union), but not necessarily for secondary
      cpus, whose initial stack can be allocated anywhere.  This doesn't
      cause any immediate problem since the SLB miss handler will just
      create an SLB entry somewhere else to cover the initial stack.
      
      In fact it's possible for the cpu to go quite a long time without SLB
      entry 2 being valid.  Eventually, though, the entry created by the SLB
      miss handler will get overwritten by some other entry, and if the next
      access to the stack is at an unrecoverable point, we get the crash.
      
      This fixes the problem by making slb_initialize create a suitable
      entry for the kernel stack, if we are on a secondary cpu and the stack
      isn't covered by SLB entry 0.  This requires initializing the
      get_paca()->kstack field earlier, so I do that in smp_create_idle
      where the current field is initialized.  This also abstracts a bit of
      the computation that mk_esid_data in slb.c does so that it can be used
      in slb_initialize.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      3b575064
  6. 25 1月, 2008 2 次提交
    • O
      [POWERPC] Make smp_send_stop() handle panic and xmon reboot · e057d985
      Olof Johansson 提交于
      smp_send_stop() will send an IPI to all other cpus to shut them down.
      However, for the case of xmon-based reboots (as well as potentially some
      panics), the other cpus are (or might be) spinning with interrupts off,
      and won't take the IPI.
      
      Current code will drop us into the debugger when the IPI fails, which
      means we're in an infinite loop that we can't get out of without an
      external reset of some sort.
      
      Instead, make the smp_send_stop() IPI call path just print the warning
      about being unable to send IPIs, but make it return so the rest of the
      shutdown sequence can continue. It's not perfect, but the lesser of
      two evils.
      
      Also move the call_lock handling outside of smp_call_function_map so we
      can avoid deadlocks in smp_send_stop().
      Signed-off-by: NOlof Johansson <olof@lixom.net>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      e057d985
    • O
      [POWERPC] Make smp_call_function_map static · b616de5e
      Olof Johansson 提交于
      smp_call_function_map should be static, and for consistency prepend it
      with __ like other local helper functions in the same file.
      Signed-off-by: NOlof Johansson <olof@lixom.net>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      b616de5e
  7. 17 10月, 2007 1 次提交
  8. 03 10月, 2007 1 次提交
  9. 22 9月, 2007 1 次提交
    • S
      [POWERPC] Avoid pointless WARN_ON(irqs_disabled()) from panic codepath · 8fd7675c
      Satyam Sharma 提交于
      > ------------[ cut here ]------------
      > Badness at arch/powerpc/kernel/smp.c:202
      
      comes when smp_call_function_map() has been called with irqs disabled,
      which is illegal. However, there is a special case, the panic() codepath,
      when we do not want to warn about this -- warning at that time is pointless
      anyway, and only serves to scroll away the *real* cause of the panic and
      distracts from the real bug.
      
      * So let's extract the WARN_ON() from smp_call_function_map() into all its
        callers -- smp_call_function() and smp_call_function_single()
      
      * Also, introduce another caller of smp_call_function_map(), namely
        __smp_call_function() (and make smp_call_function() a wrapper over this)
        which does *not* warn about disabled irqs
      
      * Use this __smp_call_function() from the panic codepath's smp_send_stop()
      
      We also end having to move code of smp_send_stop() below the definition
      of __smp_call_function().
      Signed-off-by: NSatyam Sharma <satyam@infradead.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      8fd7675c
  10. 03 8月, 2007 1 次提交
  11. 22 7月, 2007 1 次提交
  12. 22 5月, 2007 1 次提交
  13. 07 5月, 2007 1 次提交
  14. 13 4月, 2007 1 次提交
    • B
      [POWERPC] Make tlb flush batch use lazy MMU mode · a741e679
      Benjamin Herrenschmidt 提交于
      The current tlb flush code on powerpc 64 bits has a subtle race since we
      lost the page table lock due to the possible faulting in of new PTEs
      after a previous one has been removed but before the corresponding hash
      entry has been evicted, which can leads to all sort of fatal problems.
      
      This patch reworks the batch code completely. It doesn't use the mmu_gather
      stuff anymore. Instead, we use the lazy mmu hooks that were added by the
      paravirt code. They have the nice property that the enter/leave lazy mmu
      mode pair is always fully contained by the PTE lock for a given range
      of PTEs. Thus we can guarantee that all batches are flushed on a given
      CPU before it drops that lock.
      
      We also generalize batching for any PTE update that require a flush.
      
      Batching is now enabled on a CPU by arch_enter_lazy_mmu_mode() and
      disabled by arch_leave_lazy_mmu_mode(). The code epects that this is
      always contained within a PTE lock section so no preemption can happen
      and no PTE insertion in that range from another CPU. When batching
      is enabled on a CPU, every PTE updates that need a hash flush will
      use the batch for that flush.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      a741e679
  15. 14 2月, 2007 1 次提交
  16. 12 1月, 2007 1 次提交
    • G
      [PATCH] Change cpu_up and co from __devinit to __cpuinit · b282b6f8
      Gautham R Shenoy 提交于
      Compiling the kernel with CONFIG_HOTPLUG = y and CONFIG_HOTPLUG_CPU = n
      with CONFIG_RELOCATABLE = y generates the following modpost warnings
      
      WARNING: vmlinux - Section mismatch: reference to .init.data: from
      .text between '_cpu_up' (at offset 0xc0141b7d) and 'cpu_up'
      WARNING: vmlinux - Section mismatch: reference to .init.data: from
      .text between '_cpu_up' (at offset 0xc0141b9c) and 'cpu_up'
      WARNING: vmlinux - Section mismatch: reference to .init.text:__cpu_up
      from .text between '_cpu_up' (at offset 0xc0141bd8) and 'cpu_up'
      WARNING: vmlinux - Section mismatch: reference to .init.data: from
      .text between '_cpu_up' (at offset 0xc0141c05) and 'cpu_up'
      WARNING: vmlinux - Section mismatch: reference to .init.data: from
      .text between '_cpu_up' (at offset 0xc0141c26) and 'cpu_up'
      WARNING: vmlinux - Section mismatch: reference to .init.data: from
      .text between '_cpu_up' (at offset 0xc0141c37) and 'cpu_up'
      
      This is because cpu_up, _cpu_up and __cpu_up (in some architectures) are
      defined as __devinit
      AND
      __cpu_up calls some __cpuinit functions.
      
      Since __cpuinit would map to __init with this kind of a configuration,
      we get a .text refering .init.data warning.
      
      This patch solves the problem by converting all of __cpu_up, _cpu_up
      and cpu_up from __devinit to __cpuinit. The approach is justified since
      the callers of cpu_up are either dependent on CONFIG_HOTPLUG_CPU or
      are of __init type.
      
      Thus when CONFIG_HOTPLUG_CPU=y, all these cpu up functions would land up
      in .text section, and when CONFIG_HOTPLUG_CPU=n, all these functions would
      land up in .init section.
      
      Tested on a i386 SMP machine running linux-2.6.20-rc3-mm1.
      Signed-off-by: NGautham R Shenoy <ego@in.ibm.com>
      Cc: Vivek Goyal <vgoyal@in.ibm.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      b282b6f8
  17. 25 10月, 2006 1 次提交
  18. 05 10月, 2006 1 次提交
    • D
      IRQ: Maintain regs pointer globally rather than passing to IRQ handlers · 7d12e780
      David Howells 提交于
      Maintain a per-CPU global "struct pt_regs *" variable which can be used instead
      of passing regs around manually through all ~1800 interrupt handlers in the
      Linux kernel.
      
      The regs pointer is used in few places, but it potentially costs both stack
      space and code to pass it around.  On the FRV arch, removing the regs parameter
      from all the genirq function results in a 20% speed up of the IRQ exit path
      (ie: from leaving timer_interrupt() to leaving do_IRQ()).
      
      Where appropriate, an arch may override the generic storage facility and do
      something different with the variable.  On FRV, for instance, the address is
      maintained in GR28 at all times inside the kernel as part of general exception
      handling.
      
      Having looked over the code, it appears that the parameter may be handed down
      through up to twenty or so layers of functions.  Consider a USB character
      device attached to a USB hub, attached to a USB controller that posts its
      interrupts through a cascaded auxiliary interrupt controller.  A character
      device driver may want to pass regs to the sysrq handler through the input
      layer which adds another few layers of parameter passing.
      
      I've build this code with allyesconfig for x86_64 and i386.  I've runtested the
      main part of the code on FRV and i386, though I can't test most of the drivers.
      I've also done partial conversion for powerpc and MIPS - these at least compile
      with minimal configurations.
      
      This will affect all archs.  Mostly the changes should be relatively easy.
      Take do_IRQ(), store the regs pointer at the beginning, saving the old one:
      
      	struct pt_regs *old_regs = set_irq_regs(regs);
      
      And put the old one back at the end:
      
      	set_irq_regs(old_regs);
      
      Don't pass regs through to generic_handle_irq() or __do_IRQ().
      
      In timer_interrupt(), this sort of change will be necessary:
      
      	-	update_process_times(user_mode(regs));
      	-	profile_tick(CPU_PROFILING, regs);
      	+	update_process_times(user_mode(get_irq_regs()));
      	+	profile_tick(CPU_PROFILING);
      
      I'd like to move update_process_times()'s use of get_irq_regs() into itself,
      except that i386, alone of the archs, uses something other than user_mode().
      
      Some notes on the interrupt handling in the drivers:
      
       (*) input_dev() is now gone entirely.  The regs pointer is no longer stored in
           the input_dev struct.
      
       (*) finish_unlinks() in drivers/usb/host/ohci-q.c needs checking.  It does
           something different depending on whether it's been supplied with a regs
           pointer or not.
      
       (*) Various IRQ handler function pointers have been moved to type
           irq_handler_t.
      Signed-Off-By: NDavid Howells <dhowells@redhat.com>
      (cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)
      7d12e780
  19. 25 7月, 2006 1 次提交
  20. 07 7月, 2006 1 次提交
  21. 01 7月, 2006 1 次提交
  22. 21 6月, 2006 1 次提交
  23. 29 3月, 2006 1 次提交
  24. 24 2月, 2006 1 次提交
    • P
      powerpc: Implement accurate task and CPU time accounting · c6622f63
      Paul Mackerras 提交于
      This implements accurate task and cpu time accounting for 64-bit
      powerpc kernels.  Instead of accounting a whole jiffy of time to a
      task on a timer interrupt because that task happened to be running at
      the time, we now account time in units of timebase ticks according to
      the actual time spent by the task in user mode and kernel mode.  We
      also count the time spent processing hardware and software interrupts
      accurately.  This is conditional on CONFIG_VIRT_CPU_ACCOUNTING.  If
      that is not set, we do tick-based approximate accounting as before.
      
      To get this accurate information, we read either the PURR (processor
      utilization of resources register) on POWER5 machines, or the timebase
      on other machines on
      
      * each entry to the kernel from usermode
      * each exit to usermode
      * transitions between process context, hard irq context and soft irq
        context in kernel mode
      * context switches.
      
      On POWER5 systems with shared-processor logical partitioning we also
      read both the PURR and the timebase at each timer interrupt and
      context switch in order to determine how much time has been taken by
      the hypervisor to run other partitions ("steal" time).  Unfortunately,
      since we need values of the PURR on both threads at the same time to
      accurately calculate the steal time, and since we can only calculate
      steal time on a per-core basis, the apportioning of the steal time
      between idle time (time which we ceded to the hypervisor in the idle
      loop) and actual stolen time is somewhat approximate at the moment.
      
      This is all based quite heavily on what s390 does, and it uses the
      generic interfaces that were added by the s390 developers,
      i.e. account_system_time(), account_user_time(), etc.
      
      This patch doesn't add any new interfaces between the kernel and
      userspace, and doesn't change the units in which time is reported to
      userspace by things such as /proc/stat, /proc/<pid>/stat, getrusage(),
      times(), etc.  Internally the various task and cpu times are stored in
      timebase units, but they are converted to USER_HZ units (1/100th of a
      second) when reported to userspace.  Some precision is therefore lost
      but there should not be any accumulating error, since the internal
      accumulation is at full precision.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      c6622f63
  25. 07 2月, 2006 1 次提交
  26. 13 1月, 2006 1 次提交
  27. 09 1月, 2006 3 次提交
  28. 16 11月, 2005 1 次提交
  29. 11 11月, 2005 1 次提交
    • B
      [PATCH] powerpc: Merge vdso's and add vdso support to 32 bits kernel · a7f290da
      Benjamin Herrenschmidt 提交于
      This patch moves the vdso's to arch/powerpc, adds support for the 32
      bits vdso to the 32 bits kernel, rename systemcfg (finally !), and adds
      some new (still untested) routines to both vdso's: clock_gettime() with
      support for CLOCK_REALTIME and CLOCK_MONOTONIC, clock_getres() (same
      clocks) and get_tbfreq() for glibc to retreive the timebase frequency.
      
      Tom,Steve: The implementation of get_tbfreq() I've done for 32 bits
      returns a long long (r3, r4) not a long. This is such that if we ever
      add support for >4Ghz timebases on ppc32, the userland interface won't
      have to change.
      
      I have tested gettimeofday() using some glibc patches in both ppc32 and
      ppc64 kernels using 32 bits userland (I haven't had a chance to test a
      64 bits userland yet, but the implementation didn't change and was
      tested earlier). I haven't tested yet the new functions.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      a7f290da
  30. 10 11月, 2005 3 次提交
  31. 08 11月, 2005 1 次提交
  32. 05 11月, 2005 1 次提交