1. 04 Jan, 2013 1 commit
    • POWERPC: drivers: remove __dev* attributes. · cad5cef6
      Committed by Greg Kroah-Hartman
      CONFIG_HOTPLUG is going away as an option.  As a result, the __dev*
      markings need to be removed.
      
      This change removes the use of __devinit, __devexit_p, __devinitdata,
      __devinitconst, and __devexit from these drivers.
      
      Based on patches originally written by Bill Pemberton, but redone by me
      in order to handle some of the coding style issues better, by hand.
      
      Cc: Bill Pemberton <wfp5p@virginia.edu>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 30 Oct, 2012 1 commit
    • KVM: PPC: Book3S HV: Allow KVM guests to stop secondary threads coming online · 512691d4
      Committed by Paul Mackerras
      When a Book3S HV KVM guest is running, we need the host to be in
      single-thread mode, that is, all of the cores (or at least all of
      the cores where the KVM guest could run) to be running only one
      active hardware thread.  This is because of the hardware restriction
      in POWER processors that all of the hardware threads in the core
      must be in the same logical partition.  Complying with this restriction
      is much easier if, from the host kernel's point of view, only one
      hardware thread is active.
      
      This adds two hooks in the SMP hotplug code to allow the KVM code to
      make sure that secondary threads (i.e. hardware threads other than
      thread 0) cannot come online while any KVM guest exists.  The KVM
      code still has to check that any core where it runs a guest has the
      secondary threads offline, but having done that check it can now be
      sure that they will not come online while the guest is running.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
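
The two hooks described above can be pictured with a small user-space model. This is only a sketch under stated assumptions: the function names, the `guests_active` counter, and the fixed `THREADS_PER_CORE` value are hypothetical and are not taken from the patch, and real kernel code would also need locking against concurrent hotplug operations.

```c
#include <assert.h>
#include <stdbool.h>

#define THREADS_PER_CORE 4   /* assumption: SMT4 numbering, thread 0 is cpu % 4 == 0 */

/* Hypothetical counter: number of KVM guests that currently exist. */
static int guests_active;

/* Hook called when a guest is created: secondaries may no longer come online. */
static void inhibit_secondary_onlining(void)
{
    guests_active++;
}

/* Hook called when a guest is destroyed. */
static void uninhibit_secondary_onlining(void)
{
    assert(guests_active > 0);
    guests_active--;
}

/*
 * Check made on the CPU-online path: thread 0 of a core may always come
 * online; any other thread may do so only while no guest exists.
 */
static bool cpu_may_come_online(int cpu)
{
    assert(cpu >= 0);
    if (cpu % THREADS_PER_CORE == 0)
        return true;
    return guests_active == 0;
}
```

In this model the KVM side still has to verify that the secondaries of a core are offline before it runs a guest there; the hooks only guarantee that the answer cannot change afterwards.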
  3. 19 Sep, 2012 1 commit
  4. 13 Sep, 2012 1 commit
  5. 05 Sep, 2012 1 commit
    • powerpc: Make sure IPI handlers see data written by IPI senders · 9fb1b36c
      Committed by Paul Mackerras
      We have been observing hangs, both of KVM guest vcpu tasks and more
      generally, where a process that is woken doesn't properly wake up and
      continue to run, but instead sticks in TASK_WAKING state.  This
      happens because the update of rq->wake_list in ttwu_queue_remote()
      is not ordered with the update of ipi_message in
      smp_muxed_ipi_message_pass(), and the reading of rq->wake_list in
      scheduler_ipi() is not ordered with the reading of ipi_message in
      smp_ipi_demux().  Thus it is possible for the IPI receiver not to see
      the updated rq->wake_list and therefore conclude that there is nothing
      for it to do.
      
      In order to make sure that anything done before smp_send_reschedule()
      is ordered before anything done in the resulting call to scheduler_ipi(),
      this adds barriers in smp_muxed_ipi_message_pass() and smp_ipi_demux().
      The barrier in smp_muxed_ipi_message_pass() is a full barrier to ensure that
      there is a full ordering between the smp_send_reschedule() caller and
      scheduler_ipi().  In smp_ipi_demux(), we use xchg() rather than
      xchg_local() because xchg() includes release and acquire barriers.
      Using xchg() rather than xchg_local() makes sense given that
      ipi_message is not just accessed locally.
      
      This moves the barrier between setting the message and calling the
      cause_ipi() function into the individual cause_ipi implementations.
      Most of them -- those that used outb, out_8 or similar -- already had
      a full barrier because out_8 etc. include a sync before the MMIO
      store.  This adds an explicit barrier in the two remaining cases.
      
      These changes made no measurable difference to the speed of IPIs as
      measured using a simple ping-pong latency test across two CPUs on
      different cores of a POWER7 machine.
      
      The analysis of the reason why processes were not waking up properly
      is due to Milton Miller.
      
      Cc: stable@vger.kernel.org # v3.0+
      Reported-by: Milton Miller <miltonm@bga.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
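
The ordering requirement in this commit can be modelled in user space with C11 atomics. This is a rough sketch, not the kernel code: the names and the single-CPU state are hypothetical, a sequentially consistent `atomic_fetch_or` plus fence stands in for the kernel's store followed by a full barrier, and an acquire/release `atomic_exchange` stands in for `xchg()`.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical message numbers, one bit each in the pending word. */
enum { MSG_CALL_FUNCTION, MSG_RESCHEDULE, MSG_CALL_FUNC_SINGLE,
       MSG_DEBUGGER_BREAK, MSG_COUNT };

static int wake_list;            /* payload written before the IPI is sent */
static atomic_uint ipi_message;  /* pending-message bits for one CPU */
static int ipis_raised;          /* counts calls to the stand-in cause_ipi() */

/* Stand-in for the platform hook that actually raises the interrupt. */
static void cause_ipi(void)
{
    ipis_raised++;
}

/* Sender: publish the message with full ordering, then raise the IPI. */
static void muxed_ipi_message_pass(int msg)
{
    assert(msg >= 0 && msg < MSG_COUNT);
    atomic_fetch_or_explicit(&ipi_message, 1u << msg, memory_order_seq_cst);
    /* Full fence: everything written before this point is ordered
     * before the interrupt is triggered. */
    atomic_thread_fence(memory_order_seq_cst);
    cause_ipi();
}

/* Receiver: take all pending messages at once; returns the bits handled. */
static unsigned ipi_demux(int *seen_wake_list)
{
    unsigned handled = 0, all;

    /* Exchange with acquire and release semantics, so reads of the
     * payload below cannot be satisfied before the message is read. */
    while ((all = atomic_exchange_explicit(&ipi_message, 0,
                                           memory_order_acq_rel)) != 0) {
        if (all & (1u << MSG_RESCHEDULE))
            *seen_wake_list = wake_list;
        handled |= all;
    }
    return handled;
}
```

The point of the pairing is that a receiver which observes the message bit is also guaranteed to observe the payload written before the message was posted.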
  6. 11 Jul, 2012 1 commit
    • powerpc: Add VDSO version of getcpu · 18ad51dd
      Committed by Anton Blanchard
      We have a request for a fast method of getting CPU and NUMA node IDs
      from userspace. This patch implements a getcpu VDSO function,
      similar to x86.
      
      Ben suggested we use SPRG3 which is userspace readable. SPRG3 can be
      modified by a KVM guest, so we save the SPRG3 value in the paca and
      restore it when transitioning from the guest to the host.
      
      I have a glibc patch that implements sched_getcpu on top of this.
      Testing on a POWER7:
      
      baseline: 538 cycles
      vdso:      30 cycles
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
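
The save-and-restore idea can be illustrated with a small model. Everything here is a sketch under assumptions: the `sprg3` variable stands in for the register, `struct paca` is reduced to one field, the function names are hypothetical, and the packing of CPU and node IDs into the value (16 bits each) is chosen for illustration only and is not stated in the commit message.

```c
#include <assert.h>

/* Stand-in for the user-readable SPRG3 register. */
static unsigned long sprg3;

/* Reduced per-CPU area holding the host's saved SPRG3 value. */
struct paca {
    unsigned long sprg3;
};

/* Host setup: encode CPU and node IDs and load them into the register. */
static void getcpu_init(struct paca *p, unsigned cpu, unsigned node)
{
    assert(cpu < (1u << 16) && node < (1u << 16));
    p->sprg3 = (unsigned long)cpu | ((unsigned long)node << 16);
    sprg3 = p->sprg3;
}

/* A guest may overwrite the register; the host value is restored on exit. */
static void run_guest(const struct paca *p, unsigned long guest_value)
{
    sprg3 = guest_value;   /* guest clobbers the register */
    sprg3 = p->sprg3;      /* restore on the guest-to-host transition */
}

/* Userspace fast path: decode the IDs without entering the kernel. */
static void vdso_getcpu(unsigned *cpu, unsigned *node)
{
    *cpu = (unsigned)(sprg3 & 0xffff);
    *node = (unsigned)((sprg3 >> 16) & 0xffff);
}
```

Without the restore step in `run_guest()`, a host process calling the fast path after a guest exit would decode whatever the guest left behind.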
  7. 03 Jul, 2012 1 commit
    • powerpc/smp: remove call to ipi_call_lock()/ipi_call_unlock() · e250d4bc
      Committed by Yong Zhang
      1) call_function.lock, used in smp_call_function_many(), only protects
         call_function.queue and &data->refs; cpu_online_mask is outside of the
         lock. It is not necessary to protect cpu_online_mask, because
         data->cpumask is precalculated, and even if a cpu is brought up
         while calling arch_send_call_function_ipi_mask(), it is harmless because
         the validation test in generic_smp_call_function_interrupt() will take
         care of it.
      
      2) For the cpu down issue, stop_machine() will guarantee that no concurrent
         smp_call_function() is processing.
      Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  8. 05 Jun, 2012 1 commit
  9. 26 Apr, 2012 2 commits
    • powerpc: Use generic idle thread allocation · 17e32eac
      Committed by Thomas Gleixner
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Link: http://lkml.kernel.org/r/20120420124557.311212868@linutronix.de
    • smp: Add task_struct argument to __cpu_up() · 8239c25f
      Committed by Thomas Gleixner
      Preparatory patch to make the idle thread allocation for secondary
      cpus generic.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: x86@kernel.org
      Link: http://lkml.kernel.org/r/20120420124556.964170564@linutronix.de
  10. 29 Mar, 2012 1 commit
  11. 22 Dec, 2011 1 commit
    • cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystem · 8a25a2fd
      Committed by Kay Sievers
      This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem
      and converts the devices to regular devices. The sysdev drivers are
      implemented as subsystem interfaces now.
      
      After all sysdev classes are ported to regular driver core entities, the
      sysdev implementation will be entirely removed from the kernel.
      
      Userspace relies on events and generic sysfs subsystem infrastructure
      from sysdev devices, which are made available with this conversion.
      
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Borislav Petkov <bp@amd64.org>
      Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
  12. 25 Nov, 2011 1 commit
  13. 08 Nov, 2011 1 commit
  14. 01 Nov, 2011 1 commit
  15. 20 Sep, 2011 1 commit
  16. 27 Jul, 2011 1 commit
  17. 12 Jul, 2011 1 commit
    • KVM: PPC: Add support for Book3S processors in hypervisor mode · de56a948
      Committed by Paul Mackerras
      This adds support for KVM running on 64-bit Book 3S processors,
      specifically POWER7, in hypervisor mode.  Using hypervisor mode means
      that the guest can use the processor's supervisor mode.  That means
      that the guest can execute privileged instructions and access privileged
      registers itself without trapping to the host.  This gives excellent
      performance, but does mean that KVM cannot emulate a processor
      architecture other than the one that the hardware implements.
      
      This code assumes that the guest is running paravirtualized using the
      PAPR (Power Architecture Platform Requirements) interface, which is the
      interface that IBM's PowerVM hypervisor uses.  That means that existing
      Linux distributions that run on IBM pSeries machines will also run
      under KVM without modification.  In order to communicate the PAPR
      hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code
      to include/linux/kvm.h.
      
      Currently the choice between book3s_hv support and book3s_pr support
      (i.e. the existing code, which runs the guest in user mode) has to be
      made at kernel configuration time, so a given kernel binary can only
      do one or the other.
      
      This new book3s_hv code doesn't support MMIO emulation at present.
      Since we are running paravirtualized guests, this isn't a serious
      restriction.
      
      With the guest running in supervisor mode, most exceptions go straight
      to the guest.  We will never get data or instruction storage or segment
      interrupts, alignment interrupts, decrementer interrupts, program
      interrupts, single-step interrupts, etc., coming to the hypervisor from
      the guest.  Therefore this introduces a new KVMTEST_NONHV macro for the
      exception entry path so that we don't have to do the KVM test on entry
      to those exception handlers.
      
      We do however get hypervisor decrementer, hypervisor data storage,
      hypervisor instruction storage, and hypervisor emulation assist
      interrupts, so we have to handle those.
      
      In hypervisor mode, real-mode accesses can access all of RAM, not just
      a limited amount.  Therefore we put all the guest state in the vcpu.arch
      and use the shadow_vcpu in the PACA only for temporary scratch space.
      We allocate the vcpu with kzalloc rather than vzalloc, and we don't use
      anything in the kvmppc_vcpu_book3s struct, so we don't allocate it.
      We don't have a shared page with the guest, but we still need a
      kvm_vcpu_arch_shared struct to store the values of various registers,
      so we include one in the vcpu_arch struct.
      
      The POWER7 processor has a restriction that all threads in a core have
      to be in the same partition.  MMU-on kernel code counts as a partition
      (partition 0), so we have to do a partition switch on every entry to and
      exit from the guest.  At present we require the host and guest to run
      in single-thread mode because of this hardware restriction.
      
      This code allocates a hashed page table for the guest and initializes
      it with HPTEs for the guest's Virtual Real Memory Area (VRMA).  We
      require that the guest memory is allocated using 16MB huge pages, in
      order to simplify the low-level memory management.  This also means that
      we can get away without tracking paging activity in the host for now,
      since huge pages can't be paged or swapped.
      
      This also adds a few new exports needed by the book3s_hv code.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
  18. 08 Jul, 2011 1 commit
  19. 29 Jun, 2011 1 commit
  20. 20 Jun, 2011 1 commit
  21. 26 May, 2011 1 commit
    • powerpc/cell: Use common smp ipi actions · 7ef71d75
      Committed by Milton Miller
      The cell iic interrupt controller has enough software caused interrupts
      to use a unique interrupt for each of the 4 messages powerpc uses.
      This means each interrupt gets its own irq action/data combination.
      
      Use the separate, optimized, arch common ipi action functions
      registered via the helper smp_request_message_ipi instead of passing the
      message as action data to a single action that then demultiplexes to
      the required action via a switch statement.
      
      smp_request_message_ipi will register the action as IRQF_PER_CPU
      and IRQF_DISABLED, and WARN if the allocation fails for some reason,
      so no need to print on that failure.  It will return positive if
      the message will not be used by the kernel, in which case we can
      free the virq.
      
      In addition to eliminating inefficient code, this also corrects the
      error that a kernel built with kexec but without a debugger would
      not register the ipi for kdump to notify the other cpus of a crash.
      
      This also restores the debugger action to be static to kernel/smp.c.
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  22. 19 May, 2011 5 commits
    • powerpc: Use bytes instead of bitops in smp ipi multiplexing · 71454272
      Committed by Milton Miller
      Since there are only 4 messages, we can replace the atomic bit set
      (which uses an atomic load reserve and store conditional sequence) with
      byte stores to separate bytes.  We still have to perform a load
      reserve and store conditional sequence to avoid losing messages on
      reception but we can do that with a single call to xchg.
      
      The do {} while and __BIG_ENDIAN specific mask testing was chosen by
      looking at the generated asm code.  On gcc-4.4, the bit masking becomes
      a simple bit mask and test of the register returned from xchg without
      storing and loading the value to the stack like attempts with a union
      of bytes and an int (or worse, loading single bit constants from the
      constant pool into non-volatile registers that had to be preserved on
      the stack).  The do {} while avoids an unconditional branch to the
      end of the loop to test the entry / repeat condition of a while loop
      and instead optimises for the expected single iteration of the loop.
      
      We have a full mb() at the beginning to cover ordering between send,
      ipi, and receive so we can use xchg_local and forgo the further
      acquire and release barriers of xchg.
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
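
The byte-per-message layout and the endian-dependent mask test can be sketched as follows. This model shows only the data layout: it is deliberately non-atomic (the commit uses xchg for the receive step), and the struct and function names are hypothetical. The little-endian branch is included so the sketch runs on any host; the commit itself describes only the big-endian masks.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_MSGS 4

/* One byte per message; the four bytes are also read as one 32-bit word. */
struct ipi_word {
    unsigned char byte[NUM_MSGS];
};

/* Returns 1 on a big-endian host, 0 on a little-endian one. */
static int host_is_big_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 0;
}

/* Mask that selects message msg's byte in the 32-bit view of the word. */
static uint32_t msg_mask(int msg)
{
    assert(msg >= 0 && msg < NUM_MSGS);
    return host_is_big_endian() ? (uint32_t)1 << (24 - 8 * msg)
                                : (uint32_t)1 << (8 * msg);
}

/* Sender: a plain byte store, no read-modify-write needed. */
static void post_msg(struct ipi_word *w, int msg)
{
    assert(msg >= 0 && msg < NUM_MSGS);
    w->byte[msg] = 1;
}

/* Receiver: fetch all four bytes as one word and clear them.
 * The real code must do this atomically; this model does not. */
static uint32_t take_all(struct ipi_word *w)
{
    uint32_t all;

    memcpy(&all, w->byte, sizeof all);
    memset(w->byte, 0, sizeof w->byte);
    return all;
}
```

Because each sender touches only its own byte, posting a message needs no atomic sequence; only the receiver's fetch-and-clear has to be atomic so that a message posted in between is not lost.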
    • powerpc: Add kconfig for muxed smp ipi support · 1ece355b
      Committed by Milton Miller
      Compile the new smp ipi mux and demux code only if a platform
      will make use of it.  The new config is selected as required.
      
      The new cause_ipi smp op is only available conditionally to point out
      configs where the select is required; this makes setting the op an
      immediate fail instead of a deferred unresolved symbol at link.
      
      This also creates a new config for power surge powermac upgrade support
      that can be disabled in expert mode but is default on.
      
      I also removed the depends / default y on CONFIG_XICS since it is selected
      by PSERIES.
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Consolidate ipi message mux and demux · 23d72bfd
      Committed by Milton Miller
      Consolidate the mux and demux of ipi messages into smp.c and call
      a new smp_ops callback to actually trigger the ipi.
      
      The powerpc architecture code is optimised for having 4 distinct
      ipi triggers, which are mapped to 4 distinct messages (ipi many, ipi
      single, scheduler ipi, and enter debugger).  However, several interrupt
      controllers only provide a single software triggered interrupt that
      can be delivered to each cpu.  To resolve this limitation, each smp_ops
      implementation created a per-cpu variable that is manipulated with atomic
      bitops.  Since these lines will be contended they are optimally marked as
      shared_aligned and take a full cache line for each cpu.  Distro kernels
      may have 2 or 3 of these in their config, each taking per-cpu space
      even though at most one will be in use.
      
      This consolidation removes smp_message_recv and replaces the single call
      actions cases with direct calls from the common message recognition loop.
      The complicated debugger ipi case with its muxed crash handling code is
      moved to debug_ipi_action which is now called from the demux code (instead
      of the multi-message action calling smp_message_recv).
      
      I put a call to reschedule_action to increase the likelihood of correctly
      merging the anticipated scheduler_ipi() hook coming from the scheduler
      tree; that single required call can be inlined later.
      
      The actual message decode is a copy of the old pseries xics code with its
      memory barriers and cache line spacing, augmented with a per-cpu unsigned
      long based on the book-e doorbell code.  The optional data is set via a
      callback from the implementation and is passed to the new cause-ipi hook
      along with the logical cpu number.  While currently only the doorbell
      implementation uses this data it should be almost zero cost to retrieve and
      pass it -- it adds a single register load for the argument from the same
      cache line to which we just completed a store and the register is dead
      on return from the call.  I extended the data element from unsigned int
      to unsigned long in case some other code wanted to associate a pointer.
      
      The doorbell check_self is replaced by a call to smp_muxed_ipi_resend,
      conditioned on the CPU_DBELL feature.  The ifdef guard could be relaxed
      to CONFIG_SMP but I left it with BOOKE for now.
      
      Also, the doorbell interrupt vector for book-e was not calling irq_enter
      and irq_exit, which throws off cpu accounting and causes code to not
      realize it is running in interrupt context.  Add the missing calls.
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Remove call sites of MSG_ALL_BUT_SELF · e0476371
      Committed by Milton Miller
      The only user of MSG_ALL_BUT_SELF in the whole kernel tree is powerpc,
      and it only uses it to start the debugger. Both debuggers always call
      smp_send_debugger_break with MSG_ALL_BUT_SELF, and only mpic can do
      anything more optimal than a loop over all online cpus, but all message
      passing implementations have to code for this special delivery target.
      
      Convert smp_send_debugger_break to take void and loop calling the smp_ops
      message_pass function for each of the other cpus in the online cpumask.
      
      Use raw_smp_processor_id() because we are either entering the debugger
      or trying to start kdump, and the additional warning would not be useful
      were it to trigger.
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
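
The replacement for the special "all but self" target is a plain loop over the online CPUs. The following is a minimal user-space sketch of that loop, assuming a bitmask in place of the kernel's online cpumask; `NR_CPUS`, `online_mask`, `message_pass()` and `breaks_sent` are hypothetical stand-ins.

```c
#include <assert.h>

#define NR_CPUS 8

static unsigned online_mask;   /* bit n set means CPU n is online */
static unsigned breaks_sent;   /* records which CPUs were sent the message */

/* Stand-in for the per-CPU message_pass operation. */
static void message_pass(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    breaks_sent |= 1u << cpu;
}

/* Send the debugger break to every online CPU except the caller (me). */
static void send_debugger_break(int me)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (cpu == me || !(online_mask & (1u << cpu)))
            continue;
        message_pass(cpu);
    }
}
```

With this shape, individual message-passing implementations only ever need to handle a single target CPU.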
    • powerpc/4xx: Fix regression in SMP on 476 · c560bbce
      Committed by Kerstin Jonsson
      Commit c56e5853 breaks SMP support on the PPC_47x chip.
      secondary_ti must be set to the current thread info before calling kick_cpu,
      or else start_secondary_47x will jump into the void when trying to return to
      C code. In the current setup secondary_ti is initialized before the CPU idle
      task is started, and only the boot core will start. I am not sure this is the
      correct solution, but it makes SMP possible on my chip.
      Note: the HOTPLUG support probably needs some fixing too. There is no
      trampoline code available in head_44x.S - start_secondary_resume?
      Signed-off-by: Kerstin Jonsson <kerstin.jonsson@ericsson.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  23. 04 May, 2011 1 commit
  24. 20 Apr, 2011 1 commit
  25. 14 Apr, 2011 1 commit
  26. 01 Apr, 2011 9 commits
  27. 29 Nov, 2010 1 commit
    • powerpc: Cleanup APIs for cpu/thread/core mappings · 99d86705
      Committed by Vaidyanathan Srinivasan
      These APIs take logical cpu number as input
      Change cpu_first_thread_in_core() to cpu_first_thread_sibling()
      Change cpu_last_thread_in_core() to cpu_last_thread_sibling()
      
      These APIs convert core number (index) to logical cpu/thread numbers
      Add cpu_first_thread_of_core(int core)
      Change cpu_thread_to_core() to cpu_core_index_of_thread(int cpu)
      
      The goal is to make 'threads_per_core' accessible to the
      pseries_energy module.  Instead of making an API to read
      threads_per_core, this is a higher level wrapper function to
      convert from logical cpu number to core number.
      
      The current APIs cpu_first_thread_in_core() and
      cpu_last_thread_in_core() return a logical CPU number, while
      cpu_thread_to_core() returns a core number or index, which is
      not a logical CPU number.  The new APIs are now clearly named to
      distinguish 'core number' versus first and last 'logical cpu
      number' in that core.
      
      The new APIs cpu_{first,last}_thread_sibling() work on
      logical cpu numbers, while cpu_first_thread_of_core() and
      cpu_core_index_of_thread() work on the core index.
      
      Example usage:  (4 threads per core system)
      
      cpu_first_thread_sibling(5) = 4
      cpu_last_thread_sibling(5) = 7
      cpu_core_index_of_thread(5) = 1
      cpu_first_thread_of_core(1) = 4
      
      cpu_core_index_of_thread() is used in cpu_to_drc_index() in the
      module and cpu_first_thread_of_core() is used in
      drc_index_to_cpu() in the module.
      
      Make API changes to few callers.  Export symbols for use in modules.
      Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
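
The four mappings in the example usage above can be reproduced with simple shift and mask arithmetic. This is a sketch, assuming a power-of-two thread count fixed at 4 per core (`THREADS_SHIFT` is a hypothetical compile-time constant here; the kernel derives the real value at boot), and it is not claimed to match the kernel's implementation line for line.

```c
#include <assert.h>

#define THREADS_SHIFT    2                      /* assumption: 4 threads per core */
#define THREADS_PER_CORE (1 << THREADS_SHIFT)

/* Logical cpu number of the first thread in the same core as cpu. */
static int cpu_first_thread_sibling(int cpu)
{
    assert(cpu >= 0);
    return cpu & ~(THREADS_PER_CORE - 1);
}

/* Logical cpu number of the last thread in the same core as cpu. */
static int cpu_last_thread_sibling(int cpu)
{
    assert(cpu >= 0);
    return cpu | (THREADS_PER_CORE - 1);
}

/* Core index (not a logical cpu number) that contains cpu. */
static int cpu_core_index_of_thread(int cpu)
{
    assert(cpu >= 0);
    return cpu >> THREADS_SHIFT;
}

/* Logical cpu number of the first thread of the given core index. */
static int cpu_first_thread_of_core(int core)
{
    assert(core >= 0);
    return core << THREADS_SHIFT;
}
```

The first two functions map a logical cpu number to another logical cpu number, while the last two convert between logical cpu numbers and core indexes, which is the distinction the renaming is meant to make visible.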