1. 02 3月, 2016 3 次提交
    • T
      cpu/hotplug: Unpark smpboot threads from the state machine · 931ef163
      Thomas Gleixner 提交于
      Handle the smpboot threads in the state machine.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: http://lkml.kernel.org/r/20160226182341.295777684@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      931ef163
    • T
      cpu/hotplug: Convert to a state machine for the control processor · cff7d378
      Thomas Gleixner 提交于
      Move the split out steps into a callback array and let the cpu_up/down
      code iterate through the array functions. For now most of the
      callbacks are asymmetric to resemble the current hotplug maze.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: http://lkml.kernel.org/r/20160226182340.671816690@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      cff7d378
    • T
      cpu/hotplug: Restructure FROZEN state handling · 090e77c3
      Thomas Gleixner 提交于
      There are only a few callbacks which really care about FROZEN
      vs. !FROZEN. No need to have extra states for this.
      
      Publish the frozen state in an extra variable which is updated under
      the hotplug lock and let the users interested deal with it w/o
      imposing that extra state checks on everyone.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: http://lkml.kernel.org/r/20160226182340.334912357@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      090e77c3
  2. 08 10月, 2015 1 次提交
  3. 18 7月, 2015 1 次提交
    • N
      include, lib: add __printf attributes to several function prototypes · 8db14860
      Nicolas Iooss 提交于
      Using __printf attributes helps to detect several format string issues
      at compile time (even though -Wformat-security is currently disabled in
      Makefile).  For example it can detect when formatting a pointer as a
      number, like the issue fixed in commit a3fa71c4 ("wl18xx: show
      rx_frames_per_rates as an array as it really is"), or when the arguments
      do not match the format string, c.f.  for example commit 5ce1aca8
      ("reiserfs: fix __RASSERT format string").
      
      To prevent similar bugs in the future, add a __printf attribute to every
      function prototype which needs one in include/linux/ and lib/.  These
      functions were mostly found by using gcc's -Wsuggest-attribute=format
      flag.
      Signed-off-by: NNicolas Iooss <nicolas.iooss_linux@m4x.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8db14860
  4. 13 4月, 2015 2 次提交
    • I
      cpu: Provide smpboot_thread_init() on !CONFIG_SMP kernels as well · 590ee7db
      Ingo Molnar 提交于
      Now that we are using smpboot_thread_init() in init/main.c as well,
      provide it for !CONFIG_SMP as well.
      
      This addresses a !CONFIG_SMP build failure.
      
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      590ee7db
    • P
      cpu: Defer smpboot kthread unparking until CPU known to scheduler · 00df35f9
      Paul E. McKenney 提交于
      Currently, smpboot_unpark_threads() is invoked before the incoming CPU
      has been added to the scheduler's runqueue structures.  This might
      potentially cause the unparked kthread to run on the wrong CPU, since the
      correct CPU isn't fully set up yet.
      
      That causes a sporadic, hard to debug boot crash triggering on some
      systems, reported by Borislav Petkov, and bisected down to:
      
        2a442c9c ("x86: Use common outgoing-CPU-notification code")
      
      This patch places smpboot_unpark_threads() in a CPU hotplug
      notifier with priority set so that these kthreads are unparked just after
      the CPU has been added to the runqueues.
      Reported-and-tested-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      00df35f9
  5. 13 3月, 2015 1 次提交
    • P
      rcu: Handle outgoing CPUs on exit from idle loop · 88428cc5
      Paul E. McKenney 提交于
      This commit informs RCU of an outgoing CPU just before that CPU invokes
      arch_cpu_idle_dead() during its last pass through the idle loop (via a
      new CPU_DYING_IDLE notifier value).  This change means that RCU need not
      deal with outgoing CPUs passing through the scheduler after informing
      RCU that they are no longer online.  Note that removing the CPU from
      the rcu_node ->qsmaskinit bit masks is done at CPU_DYING_IDLE time,
      and orphaning callbacks is still done at CPU_DEAD time, the reason being
      that at CPU_DEAD time we have another CPU that can adopt them.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      88428cc5
  6. 12 3月, 2015 1 次提交
    • P
      smpboot: Add common code for notification from dying CPU · 8038dad7
      Paul E. McKenney 提交于
      RCU ignores offlined CPUs, so they cannot safely run RCU read-side code.
      (They -can- use SRCU, but not RCU.)  This means that any use of RCU
      during or after the call to arch_cpu_idle_dead().  Unfortunately,
      commit 2ed53c0d added a complete() call, which will contain RCU
      read-side critical sections if there is a task waiting to be awakened.
      
      Which, as it turns out, there almost never is.  In my qemu/KVM testing,
      the to-be-awakened task is not yet asleep more than 99.5% of the time.
      In current mainline, failure is even harder to reproduce, requiring a
      virtualized environment that delays the outgoing CPU by at least three
      jiffies between the time it exits its stop_machine() task at CPU_DYING
      time and the time it calls arch_cpu_idle_dead() from the idle loop.
      However, this problem really can occur, especially in virtualized
      environments, and therefore really does need to be fixed
      
      This suggests moving back to the polling loop, but using a much shorter
      wait, with gentle exponential backoff instead of the old 100-millisecond
      wait.  Most of the time, the loop will exit without waiting at all,
      and almost all of the remaining uses will wait only five microseconds.
      If the outgoing CPU is preempted, a loop will wait one jiffy, then
      increase the wait by a factor of 11/10ths, rounding up.  As before, there
      is a five-second timeout.
      
      This commit therefore provides common-code infrastructure to do the
      dying-to-surviving CPU handoff in a safe manner.  This code also
      provides an indication at CPU-online of whether the CPU to be onlined
      previously timed out on offline.  The new cpu_check_up_prepare() function
      returns -EBUSY if this CPU previously took more than five seconds to
      go offline, or -EAGAIN if it has not yet managed to go offline.  The
      rationale for -EAGAIN is that it might still be preempted, so an additional
      wait might well find it correctly offlined.  Architecture-specific code
      can decide how to handle these conditions.  Systems in which CPUs take
      themselves completely offline might respond to an -EBUSY return as if
      it was a zero (success) return.  Systems in which the surviving CPU must
      take some action might take it at this time, or might simply mark the
      other CPU as unusable.
      
      Note that architectures that take the easy way out and simply pass the
      -EBUSY and -EAGAIN upwards will change the sysfs API.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: <linux-api@vger.kernel.org>
      Cc: <linux-arch@vger.kernel.org>
      [ paulmck: Fixed state machine for architectures that don't check earlier
        CPU-hotplug results as suggested by James Hogan. ]
      8038dad7
  7. 08 11月, 2014 1 次提交
    • S
      drivers: base: add cpu_device_create to support per-cpu devices · 3d52943b
      Sudeep Holla 提交于
      This patch adds a new function to create per-cpu devices.
      This helps in:
      1. reusing the device infrastructure to create any cpu related
         attributes and corresponding sysfs instead of creating and
         dealing with raw kobjects directly
      2. retaining the legacy path(/sys/devices/system/cpu/..) to support
         existing sysfs ABI
      3. avoiding to create links in the bus directory pointing to the
         device as there would be per-cpu instance of these devices with
         the same name since dev->bus is not populated to cpu_sysbus on
         purpose
      Signed-off-by: NSudeep Holla <sudeep.holla@arm.com>
      Tested-by: NStephen Boyd <sboyd@codeaurora.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: David Herrmann <dh.herrmann@gmail.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3d52943b
  8. 19 9月, 2014 1 次提交
    • P
      rcu: Eliminate deadlock between CPU hotplug and expedited grace periods · dd56af42
      Paul E. McKenney 提交于
      Currently, the expedited grace-period primitives do get_online_cpus().
      This greatly simplifies their implementation, but means that calls
      to them holding locks that are acquired by CPU-hotplug notifiers (to
      say nothing of calls to these primitives from CPU-hotplug notifiers)
      can deadlock.  But this is starting to become inconvenient, as can be
      seen here: https://lkml.org/lkml/2014/8/5/754.  The problem in this
      case is that some developers need to acquire a mutex from a CPU-hotplug
      notifier, but also need to hold it across a synchronize_rcu_expedited().
      As noted above, this currently results in deadlock.
      
      This commit avoids the deadlock and retains the simplicity by creating
      a try_get_online_cpus(), which returns false if the get_online_cpus()
      reference count could not immediately be incremented.  If a call to
      try_get_online_cpus() returns true, the expedited primitives operate as
      before.  If a call returns false, the expedited primitives fall back to
      normal grace-period operations.  This falling back of course results in
      increased grace-period latency, but only during times when CPU hotplug
      operations are actually in flight.  The effect should therefore be
      negligible during normal operation.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Tested-by: NLan Tianyu <tianyu.lan@intel.com>
      dd56af42
  9. 07 6月, 2014 1 次提交
  10. 20 3月, 2014 1 次提交
    • S
      CPU hotplug: Provide lockless versions of callback registration functions · 93ae4f97
      Srivatsa S. Bhat 提交于
      The following method of CPU hotplug callback registration is not safe
      due to the possibility of an ABBA deadlock involving the cpu_add_remove_lock
      and the cpu_hotplug.lock.
      
      	get_online_cpus();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	register_cpu_notifier(&foobar_cpu_notifier);
      
      	put_online_cpus();
      
      The deadlock is shown below:
      
                CPU 0                                         CPU 1
                -----                                         -----
      
         Acquire cpu_hotplug.lock
         [via get_online_cpus()]
      
                                                    CPU online/offline operation
                                                    takes cpu_add_remove_lock
                                                    [via cpu_maps_update_begin()]
      
         Try to acquire
         cpu_add_remove_lock
         [via register_cpu_notifier()]
      
                                                    CPU online/offline operation
                                                    tries to acquire cpu_hotplug.lock
                                                    [via cpu_hotplug_begin()]
      
                                  *** DEADLOCK! ***
      
      The problem here is that callback registration takes the locks in one order
      whereas the CPU hotplug operations take the same locks in the opposite order.
      To avoid this issue and to provide a race-free method to register CPU hotplug
      callbacks (along with initialization of already online CPUs), introduce new
      variants of the callback registration APIs that simply register the callbacks
      without holding the cpu_add_remove_lock during the registration. That way,
      we can avoid the ABBA scenario. However, we will need to hold the
      cpu_add_remove_lock throughout the entire critical section, to protect updates
      to the callback/notifier chain.
      
      This can be achieved by writing the callback registration code as follows:
      
      	cpu_maps_update_begin(); [ or cpu_notifier_register_begin(); see below ]
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	/* This doesn't take the cpu_add_remove_lock */
      	__register_cpu_notifier(&foobar_cpu_notifier);
      
      	cpu_maps_update_done();  [ or cpu_notifier_register_done(); see below ]
      
      Note that we can't use get_online_cpus() here instead of cpu_maps_update_begin()
      because the cpu_hotplug.lock is dropped during the invocation of CPU_POST_DEAD
      notifiers, and hence get_online_cpus() cannot provide the necessary
      synchronization to protect the callback/notifier chains against concurrent
      reads and writes. On the other hand, since the cpu_add_remove_lock protects
      the entire hotplug operation (including CPU_POST_DEAD), we can use
      cpu_maps_update_begin/done() to guarantee proper synchronization.
      
      Also, since cpu_maps_update_begin/done() is like a super-set of
      get/put_online_cpus(), the former naturally protects the critical sections
      from concurrent hotplug operations.
      
      Since the names cpu_maps_update_begin/done() don't make much sense in CPU
      hotplug callback registration scenarios, we'll introduce new APIs named
      cpu_notifier_register_begin/done() and map them to cpu_maps_update_begin/done().
      
      In summary, introduce the lockless variants of un/register_cpu_notifier() and
      also export the cpu_notifier_register_begin/done() APIs for use by modules.
      This way, we provide a race-free way to register hotplug callbacks as well as
      perform initialization for the CPUs that are already online.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NToshi Kani <toshi.kani@hp.com>
      Reviewed-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      93ae4f97
  11. 19 2月, 2014 1 次提交
  12. 16 10月, 2013 1 次提交
  13. 01 10月, 2013 1 次提交
    • T
      hotplug, powerpc, x86: Remove cpu_hotplug_driver_lock() · 6dedcca6
      Toshi Kani 提交于
      cpu_hotplug_driver_lock() serializes CPU online/offline operations
      when ARCH_CPU_PROBE_RELEASE is set.  This lock interface is no longer
      necessary with the following reason:
      
       - lock_device_hotplug() now protects CPU online/offline operations,
         including the probe & release interfaces enabled by
         ARCH_CPU_PROBE_RELEASE.  The use of cpu_hotplug_driver_lock() is
         redundant.
       - cpu_hotplug_driver_lock() is only valid when ARCH_CPU_PROBE_RELEASE
         is defined, which is misleading and is only enabled on powerpc.
      
      This patch removes the cpu_hotplug_driver_lock() interface.  As
      a result, ARCH_CPU_PROBE_RELEASE only enables / disables the cpu
      probe & release interface as intended.  There is no functional change
      in this patch.
      Signed-off-by: NToshi Kani <toshi.kani@hp.com>
      Reviewed-by: NNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      6dedcca6
  14. 21 8月, 2013 1 次提交
    • S
      of: move of_get_cpu_node implementation to DT core library · 183912d3
      Sudeep KarkadaNagesha 提交于
      This patch moves the generalized implementation of of_get_cpu_node from
      PowerPC to DT core library, thereby adding support for retrieving cpu
      node for a given logical cpu index on any architecture.
      
      The CPU subsystem can now use this function to assign of_node in the
      cpu device while registering CPUs.
      
      It is recommended to use these helper function only in pre-SMP/early
      initialisation stages to retrieve CPU device node pointers in logical
      ordering. Once the cpu devices are registered, it can be retrieved easily
      from cpu device of_node which avoids unnecessary parsing and matching.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Acked-by: NRob Herring <rob.herring@calxeda.com>
      Signed-off-by: NSudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
      183912d3
  15. 13 8月, 2013 1 次提交
    • T
      ACPI / processor: Acquire writer lock to update CPU maps · b9d10be7
      Toshi Kani 提交于
      CPU system maps are protected with reader/writer locks.  The reader
      lock, get_online_cpus(), assures that the maps are not updated while
      holding the lock.  The writer lock, cpu_hotplug_begin(), is used to
      udpate the cpu maps along with cpu_maps_update_begin().
      
      However, the ACPI processor handler updates the cpu maps without
      holding the the writer lock.
      
      acpi_map_lsapic() is called from acpi_processor_hotadd_init() to
      update cpu_possible_mask and cpu_present_mask.  acpi_unmap_lsapic()
      is called from acpi_processor_remove() to update cpu_possible_mask.
      Currently, they are either unprotected or protected with the reader
      lock, which is not correct.
      
      For example, the get_online_cpus() below is supposed to assure that
      cpu_possible_mask is not changed while the code is iterating with
      for_each_possible_cpu().
      
              get_online_cpus();
              for_each_possible_cpu(cpu) {
      		:
              }
              put_online_cpus();
      
      However, this lock has no protection with CPU hotplug since the ACPI
      processor handler does not use the writer lock when it updates
      cpu_possible_mask.  The reader lock does not serialize within the
      readers.
      
      This patch protects them with the writer lock with cpu_hotplug_begin()
      along with cpu_maps_update_begin(), which must be held before calling
      cpu_hotplug_begin().  It also protects arch_register_cpu() /
      arch_unregister_cpu(), which creates / deletes a sysfs cpu device
      interface.  For this purpose it changes cpu_hotplug_begin() and
      cpu_hotplug_done() to global and exports them in cpu.h.
      Signed-off-by: NToshi Kani <toshi.kani@hp.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      b9d10be7
  16. 15 7月, 2013 1 次提交
    • P
      kernel: delete __cpuinit usage from all core kernel files · 0db0628d
      Paul Gortmaker 提交于
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      This removes all the uses of the __cpuinit macros from C files in
      the core kernel directories (kernel, init, lib, mm, and include)
      that don't really have a specific maintainer.
      
      [1] https://lkml.org/lkml/2013/5/20/589Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      0db0628d
  17. 13 6月, 2013 1 次提交
  18. 28 5月, 2013 1 次提交
  19. 08 4月, 2013 2 次提交
  20. 18 7月, 2012 1 次提交
    • T
      workqueue: perform cpu down operations from low priority cpu_notifier() · 65758202
      Tejun Heo 提交于
      Currently, all workqueue cpu hotplug operations run off
      CPU_PRI_WORKQUEUE which is higher than normal notifiers.  This is to
      ensure that workqueue is up and running while bringing up a CPU before
      other notifiers try to use workqueue on the CPU.
      
      Per-cpu workqueues are supposed to remain working and bound to the CPU
      for normal CPU_DOWN_PREPARE notifiers.  This holds mostly true even
      with workqueue offlining running with higher priority because
      workqueue CPU_DOWN_PREPARE only creates a bound trustee thread which
      runs the per-cpu workqueue without concurrency management without
      explicitly detaching the existing workers.
      
      However, if the trustee needs to create new workers, it creates
      unbound workers which may wander off to other CPUs while
      CPU_DOWN_PREPARE notifiers are in progress.  Furthermore, if the CPU
      down is cancelled, the per-CPU workqueue may end up with workers which
      aren't bound to the CPU.
      
      While reliably reproducible with a convoluted artificial test-case
      involving scheduling and flushing CPU burning work items from CPU down
      notifiers, this isn't very likely to happen in the wild, and, even
      when it happens, the effects are likely to be hidden by the following
      successful CPU down.
      
      Fix it by using different priorities for up and down notifiers - high
      priority for up operations and low priority for down operations.
      
      Workqueue cpu hotplug operations will soon go through further cleanup.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
      Acked-by: N"Rafael J. Wysocki" <rjw@sisk.pl>
      65758202
  21. 01 6月, 2012 1 次提交
    • A
      cpu: introduce clear_tasks_mm_cpumask() helper · cb79295e
      Anton Vorontsov 提交于
      Many architectures clear tasks' mm_cpumask like this:
      
      	read_lock(&tasklist_lock);
      	for_each_process(p) {
      		if (p->mm)
      			cpumask_clear_cpu(cpu, mm_cpumask(p->mm));
      	}
      	read_unlock(&tasklist_lock);
      
      Depending on the context, the code above may have several problems,
      such as:
      
      1. Working with task->mm w/o getting mm or grabing the task lock is
         dangerous as ->mm might disappear (exit_mm() assigns NULL under
         task_lock(), so tasklist lock is not enough).
      
      2. Checking for process->mm is not enough because process' main
         thread may exit or detach its mm via use_mm(), but other threads
         may still have a valid mm.
      
      This patch implements a small helper function that does things
      correctly, i.e.:
      
      1. We take the task's lock while whe handle its mm (we can't use
         get_task_mm()/mmput() pair as mmput() might sleep);
      
      2. To catch exited main thread case, we use find_lock_task_mm(),
         which walks up all threads and returns an appropriate task
         (with task lock held).
      
      Also, Per Peter Zijlstra's idea, now we don't grab tasklist_lock in
      the new helper, instead we take the rcu read lock. We can do this
      because the function is called after the cpu is taken down and marked
      offline, so no new tasks will get this cpu set in their mm mask.
      Signed-off-by: NAnton Vorontsov <anton.vorontsov@linaro.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cb79295e
  22. 17 5月, 2012 1 次提交
    • P
      sched: Remove stale power aware scheduling remnants and dysfunctional knobs · 8e7fbcbc
      Peter Zijlstra 提交于
      It's been broken forever (i.e. it's not scheduling in a power
      aware fashion), as reported by Suresh and others sending
      patches, and nobody cares enough to fix it properly ...
      so remove it to make space free for something better.
      
      There's various problems with the code as it stands today, first
      and foremost the user interface which is bound to topology
      levels and has multiple values per level. This results in a
      state explosion which the administrator or distro needs to
      master and almost nobody does.
      
      Furthermore large configuration state spaces aren't good, it
      means the thing doesn't just work right because it's either
      under so many impossibe to meet constraints, or even if
      there's an achievable state workloads have to be aware of
      it precisely and can never meet it for dynamic workloads.
      
      So pushing this kind of decision to user-space was a bad idea
      even with a single knob - it's exponentially worse with knobs
      on every node of the topology.
      
      There is a proposal to replace the user interface with a single
      3 state knob:
      
       sched_balance_policy := { performance, power, auto }
      
      where 'auto' would be the preferred default which looks at things
      like Battery/AC mode and possible cpufreq state or whatever the hw
      exposes to show us power use expectations - but there's been no
      progress on it in the past many months.
      
      Aside from that, the actual implementation of the various knobs
      is known to be broken. There have been sporadic attempts at
      fixing things but these always stop short of reaching a mergable
      state.
      
      Therefore this wholesale removal with the hopes of spurring
      people who care to come forward once again and work on a
      coherent replacement.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1326104915.2442.53.camel@twinsSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8e7fbcbc
  23. 16 3月, 2012 1 次提交
    • P
      device.h: audit and cleanup users in main include dir · 313162d0
      Paul Gortmaker 提交于
      The <linux/device.h> header includes a lot of stuff, and
      it in turn gets a lot of use just for the basic "struct device"
      which appears so often.
      
      Clean up the users as follows:
      
      1) For those headers only needing "struct device" as a pointer
      in fcn args, replace the include with exactly that.
      
      2) For headers not really using anything from device.h, simply
      delete the include altogether.
      
      3) For headers relying on getting device.h implicitly before
      being included themselves, now explicitly include device.h
      
      4) For files in which doing #1 or #2 uncovers an implicit
      dependency on some other header, fix by explicitly adding
      the required header(s).
      
      Any C files that were implicitly relying on device.h to be
      present have already been dealt with in advance.
      
      Total removals from #1 and #2: 51.  Total additions coming
      from #3: 9.  Total other implicit dependencies from #4: 7.
      
      As of 3.3-rc1, there were 110, so a net removal of 42 gives
      about a 38% reduction in device.h presence in include/*
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      313162d0
  24. 27 1月, 2012 1 次提交
  25. 22 12月, 2011 1 次提交
    • K
      cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystem · 8a25a2fd
      Kay Sievers 提交于
      This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem
      and converts the devices to regular devices. The sysdev drivers are
      implemented as subsystem interfaces now.
      
      After all sysdev classes are ported to regular driver core entities, the
      sysdev implementation will be entirely removed from the kernel.
      
      Userspace relies on events and generic sysfs subsystem infrastructure
      from sysdev devices, which are made available with this conversion.
      
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Borislav Petkov <bp@amd64.org>
      Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NKay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      8a25a2fd
  26. 12 12月, 2011 1 次提交
    • J
      driver-core/cpu: Expose hotpluggability to the rest of the kernel · 2987557f
      Josh Triplett 提交于
      When architectures register CPUs, they indicate whether the CPU allows
      hotplugging; notably, x86 and ARM don't allow hotplugging CPU 0.
      Userspace can easily query the hotpluggability of a CPU via sysfs;
      however, the kernel has no convenient way of accessing that property in
      an architecture-independent way.  While the kernel can simply try it and
      see, some code needs to distinguish between "hotplug failed" and
      "hotplug has no hope of working on this CPU"; for example, rcutorture's
      CPU hotplug tests want to avoid drowning out real hotplug failures with
      expected failures.
      
      Expose this property via a new cpu_is_hotpluggable function, so that the
      rest of the kernel can access it in an architecture-independent way.
      Signed-off-by: NJosh Triplett <josh@joshtriplett.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      2987557f
  27. 05 11月, 2011 1 次提交
  28. 26 7月, 2011 1 次提交
    • A
      notifiers: cpu: move cpu notifiers into cpu.h · 80f1ff97
      Amerigo Wang 提交于
      We presently define all kinds of notifiers in notifier.h.  This is not
      necessary at all, since different subsystems use different notifiers, they
      are almost non-related with each other.
      
      This can also save much build time.  Suppose I add a new netdevice event,
      really I don't have to recompile all the source, just network related.
      Without this patch, all the source will be recompiled.
      
      I move the notify events near to their subsystem notifier registers, so
      that they can be found more easily.
      
      This patch:
      
      It is not necessary to share the same notifier.h.
      Signed-off-by: NWANG Cong <amwang@redhat.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      80f1ff97
  29. 11 11月, 2010 1 次提交
  30. 29 6月, 2010 1 次提交
    • T
      workqueue: reimplement CPU hotplugging support using trustee · db7bccf4
      Tejun Heo 提交于
      Reimplement CPU hotplugging support using trustee thread.  On CPU
      down, a trustee thread is created and each step of CPU down is
      executed by the trustee and workqueue_cpu_callback() simply drives and
      waits for trustee state transitions.
      
      CPU down operation no longer waits for works to be drained but trustee
      sticks around till all pending works have been completed.  If CPU is
      brought back up while works are still draining,
      workqueue_cpu_callback() tells trustee to step down and tell workers
      to rebind to the cpu.
      
      As it's difficult to tell whether cwqs are empty if it's freezing or
      frozen, trustee doesn't consider draining to be complete while a gcwq
      is freezing or frozen (tracked by new GCWQ_FREEZING flag).  Also,
      workers which get unbound from their cpu are marked with WORKER_ROGUE.
      
      Trustee based implementation doesn't bring any new feature at this
      point but it will be used to manage worker pool when dynamic shared
      worker pool is implemented.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      db7bccf4
  31. 09 6月, 2010 2 次提交
    • T
      sched: adjust when cpu_active and cpuset configurations are updated during cpu on/offlining · 3a101d05
      Tejun Heo 提交于
      Currently, when a cpu goes down, cpu_active is cleared before
      CPU_DOWN_PREPARE starts and cpuset configuration is updated from a
      default priority cpu notifier.  When a cpu is coming up, it's set
      before CPU_ONLINE but cpuset configuration again is updated from the
      same cpu notifier.
      
      For cpu notifiers, this presents an inconsistent state.  Threads which
      a CPU_DOWN_PREPARE notifier expects to be bound to the CPU can be
      migrated to other cpus because the cpu is no more inactive.
      
      Fix it by updating cpu_active in the highest priority cpu notifier and
      cpuset configuration in the second highest when a cpu is coming up.
      Down path is updated similarly.  This guarantees that all other cpu
      notifiers see consistent cpu_active and cpuset configuration.
      
      cpuset_track_online_cpus() notifier is converted to
      cpuset_update_active_cpus() which just updates the configuration and
      now called from cpuset_cpu_[in]active() notifiers registered from
      sched_init_smp().  If cpuset is disabled, cpuset_update_active_cpus()
      degenerates into partition_sched_domains() making separate notifier
      for !CONFIG_CPUSETS unnecessary.
      
      This problem is triggered by cmwq.  During CPU_DOWN_PREPARE, hotplug
      callback creates a kthread and kthread_bind()s it to the target cpu,
      and the thread is expected to run on that cpu.
      
      * Ingo's test discovered __cpuinit/exit markups were incorrect.
        Fixed.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Menage <menage@google.com>
      3a101d05
    • T
      sched: define and use CPU_PRI_* enums for cpu notifier priorities · 50a323b7
      Tejun Heo 提交于
      Instead of hardcoding priority 10 and 20 in sched and perf, collect
      them into CPU_PRI_* enums.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      50a323b7
  32. 09 12月, 2009 2 次提交
    • G
      powerpc/pseries: Serialize cpu hotplug operations during deactivate Vs deallocate · 51badebd
      Gautham R Shenoy 提交于
      Currently the cpu-allocation/deallocation process comprises of two steps:
      - Set the indicators and to update the device tree with DLPAR node
        information.
      
      - Online/offline the allocated/deallocated CPU.
      
      This is achieved by writing to the sysfs tunables "probe" during allocation
      and "release" during deallocation.
      
      At the sametime, the userspace can independently online/offline the CPUs of
      the system using the sysfs tunable "online".
      
      It is quite possible that when a userspace tool offlines a CPU
      for the purpose of deallocation and is in the process of updating the device
      tree, some other userspace tool could bring the CPU back online by writing to
      the "online" sysfs tunable thereby causing the deallocate process to fail.
      
      The solution to this is to serialize writes to the "probe/release" sysfs
      tunable with the writes to the "online" sysfs tunable.
      
      This patch employs a mutex to provide this serialization, which is a no-op on
      all architectures except PPC_PSERIES
      Signed-off-by: NGautham R Shenoy <ego@in.ibm.com>
      Acked-by: NVaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      51badebd
    • N
      sysfs/cpu: Add probe/release files · 12633e80
      Nathan Fontenot 提交于
      Version 3 of this patch is updated with documentation added to
      Documentation/ABI.  There are no changes to any of the C code from v2
      of the patch.
      
      In order to support kernel DLPAR of CPU resources we need to provide an
      interface to add (probe) and remove (release) the resource from the system.
      This patch Creates new generic probe and release sysfs files to facilitate
      cpu probe/release.  The probe/release interface provides for allowing each
      arch to supply their own routines for implementing the backend of adding
      and removing cpus to/from the system.
      
      This also creates the powerpc specific stubs to handle the arch callouts
      from writes to the sysfs files.
      
      The creation and use of these files is regulated by the
      CONFIG_ARCH_CPU_PROBE_RELEASE option so that only architectures that need the
      capability will have the files created.
      Signed-off-by: NNathan Fontenot <nfont@austin.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      12633e80
  33. 16 8月, 2009 1 次提交
  34. 23 6月, 2009 1 次提交