1. 25 12月, 2016 1 次提交
  2. 08 12月, 2016 1 次提交
    • M
      hotplug: Make register and unregister notifier API symmetric · 777c6e0d
      Michal Hocko 提交于
      Yu Zhao has noticed that __unregister_cpu_notifier only unregisters its
      notifiers when HOTPLUG_CPU=y while the registration might succeed even
      when HOTPLUG_CPU=n if MODULE is enabled. This means that e.g. zswap
      might keep a stale notifier on the list on the manual clean up during
      the pool tear down and thus corrupt the list. Resulting in the following
      
      [  144.964346] BUG: unable to handle kernel paging request at ffff880658a2be78
      [  144.971337] IP: [<ffffffffa290b00b>] raw_notifier_chain_register+0x1b/0x40
      <snipped>
      [  145.122628] Call Trace:
      [  145.125086]  [<ffffffffa28e5cf8>] __register_cpu_notifier+0x18/0x20
      [  145.131350]  [<ffffffffa2a5dd73>] zswap_pool_create+0x273/0x400
      [  145.137268]  [<ffffffffa2a5e0fc>] __zswap_param_set+0x1fc/0x300
      [  145.143188]  [<ffffffffa2944c1d>] ? trace_hardirqs_on+0xd/0x10
      [  145.149018]  [<ffffffffa2908798>] ? kernel_param_lock+0x28/0x30
      [  145.154940]  [<ffffffffa2a3e8cf>] ? __might_fault+0x4f/0xa0
      [  145.160511]  [<ffffffffa2a5e237>] zswap_compressor_param_set+0x17/0x20
      [  145.167035]  [<ffffffffa2908d3c>] param_attr_store+0x5c/0xb0
      [  145.172694]  [<ffffffffa290848d>] module_attr_store+0x1d/0x30
      [  145.178443]  [<ffffffffa2b2b41f>] sysfs_kf_write+0x4f/0x70
      [  145.183925]  [<ffffffffa2b2a5b9>] kernfs_fop_write+0x149/0x180
      [  145.189761]  [<ffffffffa2a99248>] __vfs_write+0x18/0x40
      [  145.194982]  [<ffffffffa2a9a412>] vfs_write+0xb2/0x1a0
      [  145.200122]  [<ffffffffa2a9a732>] SyS_write+0x52/0xa0
      [  145.205177]  [<ffffffffa2ff4d97>] entry_SYSCALL_64_fastpath+0x12/0x17
      
      This can be even triggered manually by changing
      /sys/module/zswap/parameters/compressor multiple times.
      
      Fix this issue by making unregister APIs symmetric to the register so
      there are no surprises.
      
      Fixes: 47e627bc ("[PATCH] hotplug: Allow modules to use the cpu hotplug notifiers even if !CONFIG_HOTPLUG_CPU")
      Reported-and-tested-by: NYu Zhao <yuzhao@google.com>
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Cc: linux-mm@kvack.org
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Link: http://lkml.kernel.org/r/20161207135438.4310-1-mhocko@kernel.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      777c6e0d
  3. 29 11月, 2016 1 次提交
    • P
      sched/idle: Add support for tasks that inject idle · c1de45ca
      Peter Zijlstra 提交于
      Idle injection drivers such as Intel powerclamp and ACPI PAD drivers use
      realtime tasks to take control of CPU then inject idle. There are two
      issues with this approach:
      
       1. Low efficiency: injected idle task is treated as busy so sched ticks
          do not stop during injected idle period, the result of these
          unwanted wakeups can be ~20% loss in power savings.
      
       2. Idle accounting: injected idle time is presented to user as busy.
      
      This patch addresses the issues by introducing a new PF_IDLE flag which
      allows any given task to be treated as idle task while the flag is set.
      Therefore, idle injection tasks can run through the normal flow of NOHZ
      idle enter/exit to get the correct accounting as well as tick stop when
      possible.
      
      The implication is that idle task is then no longer limited to PID == 0.
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NJacob Pan <jacob.jun.pan@linux.intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      c1de45ca
  4. 08 10月, 2016 1 次提交
  5. 07 9月, 2016 1 次提交
  6. 26 8月, 2016 1 次提交
    • J
      cpu/hotplug: Allow suspend/resume CPU to be specified · d391e552
      James Morse 提交于
      disable_nonboot_cpus() assumes that the lowest numbered online CPU is
      the boot CPU, and that this is the correct CPU to run any power
      management code on.
      
      On x86 this is always correct, as CPU0 cannot (easily) by taken offline.
      
      On arm64 CPU0 can be taken offline. For hibernate/resume this means we
      may hibernate on a CPU other than CPU0. If the system is rebooted with
      kexec 'CPU0' will be assigned to a different physical CPU. This
      complicates hibernate/resume as now we can't trust the CPU numbers.
      Arch code can find the correct physical CPU, and ensure it is online
      before resume from hibernate begins, but also needs to influence
      disable_nonboot_cpus()s choice of CPU.
      
      Rename disable_nonboot_cpus() as freeze_secondary_cpus() and add an
      argument indicating which CPU should be left standing. Follow the logic
      in migrate_to_reboot_cpu() to use the lowest numbered online CPU if the
      requested CPU is not online.
      Add disable_nonboot_cpus() as an inline function that has the existing
      behaviour.
      
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NJames Morse <james.morse@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      d391e552
  7. 14 7月, 2016 2 次提交
  8. 06 5月, 2016 3 次提交
  9. 02 3月, 2016 5 次提交
    • T
      rcu: Make CPU_DYING_IDLE an explicit call · 27d50c7e
      Thomas Gleixner 提交于
      Make the RCU CPU_DYING_IDLE callback an explicit function call, so it gets
      invoked at the proper place.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: http://lkml.kernel.org/r/20160226182341.870167933@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      27d50c7e
    • T
      cpu/hotplug: Make wait for dead cpu completion based · e69aab13
      Thomas Gleixner 提交于
      Kill the busy spinning on the control side and just wait for the hotplugged
      cpu to tell that it reached the dead state.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: http://lkml.kernel.org/r/20160226182341.776157858@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      e69aab13
    • T
      cpu/hotplug: Unpark smpboot threads from the state machine · 931ef163
      Thomas Gleixner 提交于
      Handle the smpboot threads in the state machine.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: http://lkml.kernel.org/r/20160226182341.295777684@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      931ef163
    • T
      cpu/hotplug: Convert to a state machine for the control processor · cff7d378
      Thomas Gleixner 提交于
      Move the split out steps into a callback array and let the cpu_up/down
      code iterate through the array functions. For now most of the
      callbacks are asymmetric to resemble the current hotplug maze.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: http://lkml.kernel.org/r/20160226182340.671816690@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      cff7d378
    • T
      cpu/hotplug: Restructure FROZEN state handling · 090e77c3
      Thomas Gleixner 提交于
      There are only a few callbacks which really care about FROZEN
      vs. !FROZEN. No need to have extra states for this.
      
      Publish the frozen state in an extra variable which is updated under
      the hotplug lock and let the users interested deal with it w/o
      imposing that extra state checks on everyone.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: http://lkml.kernel.org/r/20160226182340.334912357@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      090e77c3
  10. 08 10月, 2015 1 次提交
  11. 18 7月, 2015 1 次提交
    • N
      include, lib: add __printf attributes to several function prototypes · 8db14860
      Nicolas Iooss 提交于
      Using __printf attributes helps to detect several format string issues
      at compile time (even though -Wformat-security is currently disabled in
      Makefile).  For example it can detect when formatting a pointer as a
      number, like the issue fixed in commit a3fa71c4 ("wl18xx: show
      rx_frames_per_rates as an array as it really is"), or when the arguments
      do not match the format string, c.f.  for example commit 5ce1aca8
      ("reiserfs: fix __RASSERT format string").
      
      To prevent similar bugs in the future, add a __printf attribute to every
      function prototype which needs one in include/linux/ and lib/.  These
      functions were mostly found by using gcc's -Wsuggest-attribute=format
      flag.
      Signed-off-by: NNicolas Iooss <nicolas.iooss_linux@m4x.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8db14860
  12. 13 4月, 2015 2 次提交
    • I
      cpu: Provide smpboot_thread_init() on !CONFIG_SMP kernels as well · 590ee7db
      Ingo Molnar 提交于
      Now that we are using smpboot_thread_init() in init/main.c as well,
      provide it for !CONFIG_SMP as well.
      
      This addresses a !CONFIG_SMP build failure.
      
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      590ee7db
    • P
      cpu: Defer smpboot kthread unparking until CPU known to scheduler · 00df35f9
      Paul E. McKenney 提交于
      Currently, smpboot_unpark_threads() is invoked before the incoming CPU
      has been added to the scheduler's runqueue structures.  This might
      potentially cause the unparked kthread to run on the wrong CPU, since the
      correct CPU isn't fully set up yet.
      
      That causes a sporadic, hard to debug boot crash triggering on some
      systems, reported by Borislav Petkov, and bisected down to:
      
        2a442c9c ("x86: Use common outgoing-CPU-notification code")
      
      This patch places smpboot_unpark_threads() in a CPU hotplug
      notifier with priority set so that these kthreads are unparked just after
      the CPU has been added to the runqueues.
      Reported-and-tested-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      00df35f9
  13. 13 3月, 2015 1 次提交
    • P
      rcu: Handle outgoing CPUs on exit from idle loop · 88428cc5
      Paul E. McKenney 提交于
      This commit informs RCU of an outgoing CPU just before that CPU invokes
      arch_cpu_idle_dead() during its last pass through the idle loop (via a
      new CPU_DYING_IDLE notifier value).  This change means that RCU need not
      deal with outgoing CPUs passing through the scheduler after informing
      RCU that they are no longer online.  Note that removing the CPU from
      the rcu_node ->qsmaskinit bit masks is done at CPU_DYING_IDLE time,
      and orphaning callbacks is still done at CPU_DEAD time, the reason being
      that at CPU_DEAD time we have another CPU that can adopt them.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      88428cc5
  14. 12 3月, 2015 1 次提交
    • P
      smpboot: Add common code for notification from dying CPU · 8038dad7
      Paul E. McKenney 提交于
      RCU ignores offlined CPUs, so they cannot safely run RCU read-side code.
      (They -can- use SRCU, but not RCU.)  This means that any use of RCU
      during or after the call to arch_cpu_idle_dead().  Unfortunately,
      commit 2ed53c0d added a complete() call, which will contain RCU
      read-side critical sections if there is a task waiting to be awakened.
      
      Which, as it turns out, there almost never is.  In my qemu/KVM testing,
      the to-be-awakened task is not yet asleep more than 99.5% of the time.
      In current mainline, failure is even harder to reproduce, requiring a
      virtualized environment that delays the outgoing CPU by at least three
      jiffies between the time it exits its stop_machine() task at CPU_DYING
      time and the time it calls arch_cpu_idle_dead() from the idle loop.
      However, this problem really can occur, especially in virtualized
      environments, and therefore really does need to be fixed
      
      This suggests moving back to the polling loop, but using a much shorter
      wait, with gentle exponential backoff instead of the old 100-millisecond
      wait.  Most of the time, the loop will exit without waiting at all,
      and almost all of the remaining uses will wait only five microseconds.
      If the outgoing CPU is preempted, a loop will wait one jiffy, then
      increase the wait by a factor of 11/10ths, rounding up.  As before, there
      is a five-second timeout.
      
      This commit therefore provides common-code infrastructure to do the
      dying-to-surviving CPU handoff in a safe manner.  This code also
      provides an indication at CPU-online of whether the CPU to be onlined
      previously timed out on offline.  The new cpu_check_up_prepare() function
      returns -EBUSY if this CPU previously took more than five seconds to
      go offline, or -EAGAIN if it has not yet managed to go offline.  The
      rationale for -EAGAIN is that it might still be preempted, so an additional
      wait might well find it correctly offlined.  Architecture-specific code
      can decide how to handle these conditions.  Systems in which CPUs take
      themselves completely offline might respond to an -EBUSY return as if
      it was a zero (success) return.  Systems in which the surviving CPU must
      take some action might take it at this time, or might simply mark the
      other CPU as unusable.
      
      Note that architectures that take the easy way out and simply pass the
      -EBUSY and -EAGAIN upwards will change the sysfs API.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: <linux-api@vger.kernel.org>
      Cc: <linux-arch@vger.kernel.org>
      [ paulmck: Fixed state machine for architectures that don't check earlier
        CPU-hotplug results as suggested by James Hogan. ]
      8038dad7
  15. 08 11月, 2014 1 次提交
    • S
      drivers: base: add cpu_device_create to support per-cpu devices · 3d52943b
      Sudeep Holla 提交于
      This patch adds a new function to create per-cpu devices.
      This helps in:
      1. reusing the device infrastructure to create any cpu related
         attributes and corresponding sysfs instead of creating and
         dealing with raw kobjects directly
      2. retaining the legacy path(/sys/devices/system/cpu/..) to support
         existing sysfs ABI
      3. avoiding to create links in the bus directory pointing to the
         device as there would be per-cpu instance of these devices with
         the same name since dev->bus is not populated to cpu_sysbus on
         purpose
      Signed-off-by: NSudeep Holla <sudeep.holla@arm.com>
      Tested-by: NStephen Boyd <sboyd@codeaurora.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: David Herrmann <dh.herrmann@gmail.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3d52943b
  16. 19 9月, 2014 1 次提交
    • P
      rcu: Eliminate deadlock between CPU hotplug and expedited grace periods · dd56af42
      Paul E. McKenney 提交于
      Currently, the expedited grace-period primitives do get_online_cpus().
      This greatly simplifies their implementation, but means that calls
      to them holding locks that are acquired by CPU-hotplug notifiers (to
      say nothing of calls to these primitives from CPU-hotplug notifiers)
      can deadlock.  But this is starting to become inconvenient, as can be
      seen here: https://lkml.org/lkml/2014/8/5/754.  The problem in this
      case is that some developers need to acquire a mutex from a CPU-hotplug
      notifier, but also need to hold it across a synchronize_rcu_expedited().
      As noted above, this currently results in deadlock.
      
      This commit avoids the deadlock and retains the simplicity by creating
      a try_get_online_cpus(), which returns false if the get_online_cpus()
      reference count could not immediately be incremented.  If a call to
      try_get_online_cpus() returns true, the expedited primitives operate as
      before.  If a call returns false, the expedited primitives fall back to
      normal grace-period operations.  This falling back of course results in
      increased grace-period latency, but only during times when CPU hotplug
      operations are actually in flight.  The effect should therefore be
      negligible during normal operation.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Tested-by: NLan Tianyu <tianyu.lan@intel.com>
      dd56af42
  17. 07 6月, 2014 1 次提交
  18. 20 3月, 2014 1 次提交
    • S
      CPU hotplug: Provide lockless versions of callback registration functions · 93ae4f97
      Srivatsa S. Bhat 提交于
      The following method of CPU hotplug callback registration is not safe
      due to the possibility of an ABBA deadlock involving the cpu_add_remove_lock
      and the cpu_hotplug.lock.
      
      	get_online_cpus();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	register_cpu_notifier(&foobar_cpu_notifier);
      
      	put_online_cpus();
      
      The deadlock is shown below:
      
                CPU 0                                         CPU 1
                -----                                         -----
      
         Acquire cpu_hotplug.lock
         [via get_online_cpus()]
      
                                                    CPU online/offline operation
                                                    takes cpu_add_remove_lock
                                                    [via cpu_maps_update_begin()]
      
         Try to acquire
         cpu_add_remove_lock
         [via register_cpu_notifier()]
      
                                                    CPU online/offline operation
                                                    tries to acquire cpu_hotplug.lock
                                                    [via cpu_hotplug_begin()]
      
                                  *** DEADLOCK! ***
      
      The problem here is that callback registration takes the locks in one order
      whereas the CPU hotplug operations take the same locks in the opposite order.
      To avoid this issue and to provide a race-free method to register CPU hotplug
      callbacks (along with initialization of already online CPUs), introduce new
      variants of the callback registration APIs that simply register the callbacks
      without holding the cpu_add_remove_lock during the registration. That way,
      we can avoid the ABBA scenario. However, we will need to hold the
      cpu_add_remove_lock throughout the entire critical section, to protect updates
      to the callback/notifier chain.
      
      This can be achieved by writing the callback registration code as follows:
      
      	cpu_maps_update_begin(); [ or cpu_notifier_register_begin(); see below ]
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	/* This doesn't take the cpu_add_remove_lock */
      	__register_cpu_notifier(&foobar_cpu_notifier);
      
      	cpu_maps_update_done();  [ or cpu_notifier_register_done(); see below ]
      
      Note that we can't use get_online_cpus() here instead of cpu_maps_update_begin()
      because the cpu_hotplug.lock is dropped during the invocation of CPU_POST_DEAD
      notifiers, and hence get_online_cpus() cannot provide the necessary
      synchronization to protect the callback/notifier chains against concurrent
      reads and writes. On the other hand, since the cpu_add_remove_lock protects
      the entire hotplug operation (including CPU_POST_DEAD), we can use
      cpu_maps_update_begin/done() to guarantee proper synchronization.
      
      Also, since cpu_maps_update_begin/done() is like a super-set of
      get/put_online_cpus(), the former naturally protects the critical sections
      from concurrent hotplug operations.
      
      Since the names cpu_maps_update_begin/done() don't make much sense in CPU
      hotplug callback registration scenarios, we'll introduce new APIs named
      cpu_notifier_register_begin/done() and map them to cpu_maps_update_begin/done().
      
      In summary, introduce the lockless variants of un/register_cpu_notifier() and
      also export the cpu_notifier_register_begin/done() APIs for use by modules.
      This way, we provide a race-free way to register hotplug callbacks as well as
      perform initialization for the CPUs that are already online.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NToshi Kani <toshi.kani@hp.com>
      Reviewed-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      93ae4f97
  19. 19 2月, 2014 1 次提交
  20. 16 10月, 2013 1 次提交
  21. 01 10月, 2013 1 次提交
    • T
      hotplug, powerpc, x86: Remove cpu_hotplug_driver_lock() · 6dedcca6
      Toshi Kani 提交于
      cpu_hotplug_driver_lock() serializes CPU online/offline operations
      when ARCH_CPU_PROBE_RELEASE is set.  This lock interface is no longer
      necessary with the following reason:
      
       - lock_device_hotplug() now protects CPU online/offline operations,
         including the probe & release interfaces enabled by
         ARCH_CPU_PROBE_RELEASE.  The use of cpu_hotplug_driver_lock() is
         redundant.
       - cpu_hotplug_driver_lock() is only valid when ARCH_CPU_PROBE_RELEASE
         is defined, which is misleading and is only enabled on powerpc.
      
      This patch removes the cpu_hotplug_driver_lock() interface.  As
      a result, ARCH_CPU_PROBE_RELEASE only enables / disables the cpu
      probe & release interface as intended.  There is no functional change
      in this patch.
      Signed-off-by: NToshi Kani <toshi.kani@hp.com>
      Reviewed-by: NNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      6dedcca6
  22. 21 8月, 2013 1 次提交
    • S
      of: move of_get_cpu_node implementation to DT core library · 183912d3
      Sudeep KarkadaNagesha 提交于
      This patch moves the generalized implementation of of_get_cpu_node from
      PowerPC to DT core library, thereby adding support for retrieving cpu
      node for a given logical cpu index on any architecture.
      
      The CPU subsystem can now use this function to assign of_node in the
      cpu device while registering CPUs.
      
      It is recommended to use these helper function only in pre-SMP/early
      initialisation stages to retrieve CPU device node pointers in logical
      ordering. Once the cpu devices are registered, it can be retrieved easily
      from cpu device of_node which avoids unnecessary parsing and matching.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Acked-by: NRob Herring <rob.herring@calxeda.com>
      Signed-off-by: NSudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
      183912d3
  23. 13 8月, 2013 1 次提交
    • T
      ACPI / processor: Acquire writer lock to update CPU maps · b9d10be7
      Toshi Kani 提交于
      CPU system maps are protected with reader/writer locks.  The reader
      lock, get_online_cpus(), assures that the maps are not updated while
      holding the lock.  The writer lock, cpu_hotplug_begin(), is used to
      udpate the cpu maps along with cpu_maps_update_begin().
      
      However, the ACPI processor handler updates the cpu maps without
      holding the the writer lock.
      
      acpi_map_lsapic() is called from acpi_processor_hotadd_init() to
      update cpu_possible_mask and cpu_present_mask.  acpi_unmap_lsapic()
      is called from acpi_processor_remove() to update cpu_possible_mask.
      Currently, they are either unprotected or protected with the reader
      lock, which is not correct.
      
      For example, the get_online_cpus() below is supposed to assure that
      cpu_possible_mask is not changed while the code is iterating with
      for_each_possible_cpu().
      
              get_online_cpus();
              for_each_possible_cpu(cpu) {
      		:
              }
              put_online_cpus();
      
      However, this lock has no protection with CPU hotplug since the ACPI
      processor handler does not use the writer lock when it updates
      cpu_possible_mask.  The reader lock does not serialize within the
      readers.
      
      This patch protects them with the writer lock with cpu_hotplug_begin()
      along with cpu_maps_update_begin(), which must be held before calling
      cpu_hotplug_begin().  It also protects arch_register_cpu() /
      arch_unregister_cpu(), which creates / deletes a sysfs cpu device
      interface.  For this purpose it changes cpu_hotplug_begin() and
      cpu_hotplug_done() to global and exports them in cpu.h.
      Signed-off-by: NToshi Kani <toshi.kani@hp.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      b9d10be7
  24. 15 7月, 2013 1 次提交
    • P
      kernel: delete __cpuinit usage from all core kernel files · 0db0628d
      Paul Gortmaker 提交于
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      This removes all the uses of the __cpuinit macros from C files in
      the core kernel directories (kernel, init, lib, mm, and include)
      that don't really have a specific maintainer.
      
      [1] https://lkml.org/lkml/2013/5/20/589Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      0db0628d
  25. 13 6月, 2013 1 次提交
  26. 28 5月, 2013 1 次提交
  27. 08 4月, 2013 2 次提交
  28. 18 7月, 2012 1 次提交
    • T
      workqueue: perform cpu down operations from low priority cpu_notifier() · 65758202
      Tejun Heo 提交于
      Currently, all workqueue cpu hotplug operations run off
      CPU_PRI_WORKQUEUE which is higher than normal notifiers.  This is to
      ensure that workqueue is up and running while bringing up a CPU before
      other notifiers try to use workqueue on the CPU.
      
      Per-cpu workqueues are supposed to remain working and bound to the CPU
      for normal CPU_DOWN_PREPARE notifiers.  This holds mostly true even
      with workqueue offlining running with higher priority because
      workqueue CPU_DOWN_PREPARE only creates a bound trustee thread which
      runs the per-cpu workqueue without concurrency management without
      explicitly detaching the existing workers.
      
      However, if the trustee needs to create new workers, it creates
      unbound workers which may wander off to other CPUs while
      CPU_DOWN_PREPARE notifiers are in progress.  Furthermore, if the CPU
      down is cancelled, the per-CPU workqueue may end up with workers which
      aren't bound to the CPU.
      
      While reliably reproducible with a convoluted artificial test-case
      involving scheduling and flushing CPU burning work items from CPU down
      notifiers, this isn't very likely to happen in the wild, and, even
      when it happens, the effects are likely to be hidden by the following
      successful CPU down.
      
      Fix it by using different priorities for up and down notifiers - high
      priority for up operations and low priority for down operations.
      
      Workqueue cpu hotplug operations will soon go through further cleanup.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
      Acked-by: N"Rafael J. Wysocki" <rjw@sisk.pl>
      65758202
  29. 01 6月, 2012 1 次提交
    • A
      cpu: introduce clear_tasks_mm_cpumask() helper · cb79295e
      Anton Vorontsov 提交于
      Many architectures clear tasks' mm_cpumask like this:
      
      	read_lock(&tasklist_lock);
      	for_each_process(p) {
      		if (p->mm)
      			cpumask_clear_cpu(cpu, mm_cpumask(p->mm));
      	}
      	read_unlock(&tasklist_lock);
      
      Depending on the context, the code above may have several problems,
      such as:
      
      1. Working with task->mm w/o getting mm or grabing the task lock is
         dangerous as ->mm might disappear (exit_mm() assigns NULL under
         task_lock(), so tasklist lock is not enough).
      
      2. Checking for process->mm is not enough because process' main
         thread may exit or detach its mm via use_mm(), but other threads
         may still have a valid mm.
      
      This patch implements a small helper function that does things
      correctly, i.e.:
      
      1. We take the task's lock while whe handle its mm (we can't use
         get_task_mm()/mmput() pair as mmput() might sleep);
      
      2. To catch exited main thread case, we use find_lock_task_mm(),
         which walks up all threads and returns an appropriate task
         (with task lock held).
      
      Also, Per Peter Zijlstra's idea, now we don't grab tasklist_lock in
      the new helper, instead we take the rcu read lock. We can do this
      because the function is called after the cpu is taken down and marked
      offline, so no new tasks will get this cpu set in their mm mask.
      Signed-off-by: NAnton Vorontsov <anton.vorontsov@linaro.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cb79295e
  30. 17 5月, 2012 1 次提交
    • P
      sched: Remove stale power aware scheduling remnants and dysfunctional knobs · 8e7fbcbc
      Peter Zijlstra 提交于
      It's been broken forever (i.e. it's not scheduling in a power
      aware fashion), as reported by Suresh and others sending
      patches, and nobody cares enough to fix it properly ...
      so remove it to make space free for something better.
      
      There's various problems with the code as it stands today, first
      and foremost the user interface which is bound to topology
      levels and has multiple values per level. This results in a
      state explosion which the administrator or distro needs to
      master and almost nobody does.
      
      Furthermore large configuration state spaces aren't good, it
      means the thing doesn't just work right because it's either
      under so many impossibe to meet constraints, or even if
      there's an achievable state workloads have to be aware of
      it precisely and can never meet it for dynamic workloads.
      
      So pushing this kind of decision to user-space was a bad idea
      even with a single knob - it's exponentially worse with knobs
      on every node of the topology.
      
      There is a proposal to replace the user interface with a single
      3 state knob:
      
       sched_balance_policy := { performance, power, auto }
      
      where 'auto' would be the preferred default which looks at things
      like Battery/AC mode and possible cpufreq state or whatever the hw
      exposes to show us power use expectations - but there's been no
      progress on it in the past many months.
      
      Aside from that, the actual implementation of the various knobs
      is known to be broken. There have been sporadic attempts at
      fixing things but these always stop short of reaching a mergable
      state.
      
      Therefore this wholesale removal with the hopes of spurring
      people who care to come forward once again and work on a
      coherent replacement.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1326104915.2442.53.camel@twinsSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8e7fbcbc
  31. 16 3月, 2012 1 次提交
    • P
      device.h: audit and cleanup users in main include dir · 313162d0
      Paul Gortmaker 提交于
      The <linux/device.h> header includes a lot of stuff, and
      it in turn gets a lot of use just for the basic "struct device"
      which appears so often.
      
      Clean up the users as follows:
      
      1) For those headers only needing "struct device" as a pointer
      in fcn args, replace the include with exactly that.
      
      2) For headers not really using anything from device.h, simply
      delete the include altogether.
      
      3) For headers relying on getting device.h implicitly before
      being included themselves, now explicitly include device.h
      
      4) For files in which doing #1 or #2 uncovers an implicit
      dependency on some other header, fix by explicitly adding
      the required header(s).
      
      Any C files that were implicitly relying on device.h to be
      present have already been dealt with in advance.
      
      Total removals from #1 and #2: 51.  Total additions coming
      from #3: 9.  Total other implicit dependencies from #4: 7.
      
      As of 3.3-rc1, there were 110, so a net removal of 42 gives
      about a 38% reduction in device.h presence in include/*
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      313162d0