1. 01 Aug 2014 (1 commit)
    • timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks · 504d5874
      Committed by Jan Kara
      clockevents_increase_min_delta() calls printk() from under
      hrtimer_bases.lock. That causes lock inversion on scheduler locks because
      printk() can call into the scheduler. Lockdep puts it as:
      
      ======================================================
      [ INFO: possible circular locking dependency detected ]
      3.15.0-rc8-06195-g939f04be #2 Not tainted
      -------------------------------------------------------
      trinity-main/74 is trying to acquire lock:
       (&port_lock_key){-.....}, at: [<811c60be>] serial8250_console_write+0x8c/0x10c
      
      but task is already holding lock:
       (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #5 (hrtimer_bases.lock){-.-...}:
             [<8104a942>] lock_acquire+0x92/0x101
             [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
             [<8103c918>] __hrtimer_start_range_ns+0x1c/0x197
             [<8107ec20>] perf_swevent_start_hrtimer.part.41+0x7a/0x85
             [<81080792>] task_clock_event_start+0x3a/0x3f
             [<810807a4>] task_clock_event_add+0xd/0x14
             [<8108259a>] event_sched_in+0xb6/0x17a
             [<810826a2>] group_sched_in+0x44/0x122
             [<81082885>] ctx_sched_in.isra.67+0x105/0x11f
             [<810828e6>] perf_event_sched_in.isra.70+0x47/0x4b
             [<81082bf6>] __perf_install_in_context+0x8b/0xa3
             [<8107eb8e>] remote_function+0x12/0x2a
             [<8105f5af>] smp_call_function_single+0x2d/0x53
             [<8107e17d>] task_function_call+0x30/0x36
             [<8107fb82>] perf_install_in_context+0x87/0xbb
             [<810852c9>] SYSC_perf_event_open+0x5c6/0x701
             [<810856f9>] SyS_perf_event_open+0x17/0x19
             [<8142f8ee>] syscall_call+0x7/0xb
      
      -> #4 (&ctx->lock){......}:
             [<8104a942>] lock_acquire+0x92/0x101
             [<8142f04c>] _raw_spin_lock+0x21/0x30
             [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f
             [<8142cacc>] __schedule+0x4c6/0x4cb
             [<8142cae0>] schedule+0xf/0x11
             [<8142f9a6>] work_resched+0x5/0x30
      
      -> #3 (&rq->lock){-.-.-.}:
             [<8104a942>] lock_acquire+0x92/0x101
             [<8142f04c>] _raw_spin_lock+0x21/0x30
             [<81040873>] __task_rq_lock+0x33/0x3a
             [<8104184c>] wake_up_new_task+0x25/0xc2
             [<8102474b>] do_fork+0x15c/0x2a0
             [<810248a9>] kernel_thread+0x1a/0x1f
             [<814232a2>] rest_init+0x1a/0x10e
             [<817af949>] start_kernel+0x303/0x308
             [<817af2ab>] i386_start_kernel+0x79/0x7d
      
      -> #2 (&p->pi_lock){-.-...}:
             [<8104a942>] lock_acquire+0x92/0x101
             [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
             [<810413dd>] try_to_wake_up+0x1d/0xd6
             [<810414cd>] default_wake_function+0xb/0xd
             [<810461f3>] __wake_up_common+0x39/0x59
             [<81046346>] __wake_up+0x29/0x3b
             [<811b8733>] tty_wakeup+0x49/0x51
             [<811c3568>] uart_write_wakeup+0x17/0x19
             [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb
             [<811c5f28>] serial8250_handle_irq+0x54/0x6a
             [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c
             [<811c56d8>] serial8250_interrupt+0x38/0x9e
             [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2
             [<81051296>] handle_irq_event+0x2c/0x43
             [<81052cee>] handle_level_irq+0x57/0x80
             [<81002a72>] handle_irq+0x46/0x5c
             [<810027df>] do_IRQ+0x32/0x89
             [<8143036e>] common_interrupt+0x2e/0x33
             [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49
             [<811c25a4>] uart_start+0x2d/0x32
             [<811c2c04>] uart_write+0xc7/0xd6
             [<811bc6f6>] n_tty_write+0xb8/0x35e
             [<811b9beb>] tty_write+0x163/0x1e4
             [<811b9cd9>] redirected_tty_write+0x6d/0x75
             [<810b6ed6>] vfs_write+0x75/0xb0
             [<810b7265>] SyS_write+0x44/0x77
             [<8142f8ee>] syscall_call+0x7/0xb
      
      -> #1 (&tty->write_wait){-.....}:
             [<8104a942>] lock_acquire+0x92/0x101
             [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
             [<81046332>] __wake_up+0x15/0x3b
             [<811b8733>] tty_wakeup+0x49/0x51
             [<811c3568>] uart_write_wakeup+0x17/0x19
             [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb
             [<811c5f28>] serial8250_handle_irq+0x54/0x6a
             [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c
             [<811c56d8>] serial8250_interrupt+0x38/0x9e
             [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2
             [<81051296>] handle_irq_event+0x2c/0x43
             [<81052cee>] handle_level_irq+0x57/0x80
             [<81002a72>] handle_irq+0x46/0x5c
             [<810027df>] do_IRQ+0x32/0x89
             [<8143036e>] common_interrupt+0x2e/0x33
             [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49
             [<811c25a4>] uart_start+0x2d/0x32
             [<811c2c04>] uart_write+0xc7/0xd6
             [<811bc6f6>] n_tty_write+0xb8/0x35e
             [<811b9beb>] tty_write+0x163/0x1e4
             [<811b9cd9>] redirected_tty_write+0x6d/0x75
             [<810b6ed6>] vfs_write+0x75/0xb0
             [<810b7265>] SyS_write+0x44/0x77
             [<8142f8ee>] syscall_call+0x7/0xb
      
      -> #0 (&port_lock_key){-.....}:
             [<8104a62d>] __lock_acquire+0x9ea/0xc6d
             [<8104a942>] lock_acquire+0x92/0x101
             [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
             [<811c60be>] serial8250_console_write+0x8c/0x10c
             [<8104e402>] call_console_drivers.constprop.31+0x87/0x118
             [<8104f5d5>] console_unlock+0x1d7/0x398
             [<8104fb70>] vprintk_emit+0x3da/0x3e4
             [<81425f76>] printk+0x17/0x19
             [<8105bfa0>] clockevents_program_min_delta+0x104/0x116
             [<8105c548>] clockevents_program_event+0xe7/0xf3
             [<8105cc1c>] tick_program_event+0x1e/0x23
             [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f
             [<8103c49e>] __remove_hrtimer+0x5b/0x79
             [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66
             [<8103cb4b>] hrtimer_cancel+0xd/0x18
             [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
             [<81080705>] task_clock_event_stop+0x20/0x64
             [<81080756>] task_clock_event_del+0xd/0xf
             [<81081350>] event_sched_out+0xab/0x11e
             [<810813e0>] group_sched_out+0x1d/0x66
             [<81081682>] ctx_sched_out+0xaf/0xbf
             [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f
             [<8142cacc>] __schedule+0x4c6/0x4cb
             [<8142cae0>] schedule+0xf/0x11
             [<8142f9a6>] work_resched+0x5/0x30
      
      other info that might help us debug this:
      
      Chain exists of:
        &port_lock_key --> &ctx->lock --> hrtimer_bases.lock
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(hrtimer_bases.lock);
                                     lock(&ctx->lock);
                                     lock(hrtimer_bases.lock);
        lock(&port_lock_key);
      
       *** DEADLOCK ***
      
      4 locks held by trinity-main/74:
       #0:  (&rq->lock){-.-.-.}, at: [<8142c6f3>] __schedule+0xed/0x4cb
       #1:  (&ctx->lock){......}, at: [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f
       #2:  (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66
       #3:  (console_lock){+.+...}, at: [<8104fb5d>] vprintk_emit+0x3c7/0x3e4
      
      stack backtrace:
      CPU: 0 PID: 74 Comm: trinity-main Not tainted 3.15.0-rc8-06195-g939f04be #2
       00000000 81c3a310 8b995c14 81426f69 8b995c44 81425a99 8161f671 8161f570
       8161f538 8161f559 8161f538 8b995c78 8b142bb0 00000004 8b142fdc 8b142bb0
       8b995ca8 8104a62d 8b142fac 000016f2 81c3a310 00000001 00000001 00000003
      Call Trace:
       [<81426f69>] dump_stack+0x16/0x18
       [<81425a99>] print_circular_bug+0x18f/0x19c
       [<8104a62d>] __lock_acquire+0x9ea/0xc6d
       [<8104a942>] lock_acquire+0x92/0x101
       [<811c60be>] ? serial8250_console_write+0x8c/0x10c
       [<811c6032>] ? wait_for_xmitr+0x76/0x76
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<811c60be>] ? serial8250_console_write+0x8c/0x10c
       [<811c60be>] serial8250_console_write+0x8c/0x10c
       [<8104af87>] ? lock_release+0x191/0x223
       [<811c6032>] ? wait_for_xmitr+0x76/0x76
       [<8104e402>] call_console_drivers.constprop.31+0x87/0x118
       [<8104f5d5>] console_unlock+0x1d7/0x398
       [<8104fb70>] vprintk_emit+0x3da/0x3e4
       [<81425f76>] printk+0x17/0x19
       [<8105bfa0>] clockevents_program_min_delta+0x104/0x116
       [<8105cc1c>] tick_program_event+0x1e/0x23
       [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f
       [<8103c49e>] __remove_hrtimer+0x5b/0x79
       [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66
       [<8103cb4b>] hrtimer_cancel+0xd/0x18
       [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
       [<81080705>] task_clock_event_stop+0x20/0x64
       [<81080756>] task_clock_event_del+0xd/0xf
       [<81081350>] event_sched_out+0xab/0x11e
       [<810813e0>] group_sched_out+0x1d/0x66
       [<81081682>] ctx_sched_out+0xaf/0xbf
       [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f
       [<8104416d>] ? __dequeue_entity+0x23/0x27
       [<81044505>] ? pick_next_task_fair+0xb1/0x120
       [<8142cacc>] __schedule+0x4c6/0x4cb
       [<81047574>] ? trace_hardirqs_off_caller+0xd7/0x108
       [<810475b0>] ? trace_hardirqs_off+0xb/0xd
       [<81056346>] ? rcu_irq_exit+0x64/0x77
      
      Fix the problem by using printk_deferred() which does not call into the
      scheduler.
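
      A rough sketch of the change (illustrative only; the exact messages and
      context live in clockevents_increase_min_delta() in kernel/time/clockevents.c):

        /* Called under hrtimer_bases.lock: plain printk() can call into the
         * scheduler (e.g. to wake console/log waiters), completing the
         * inversion shown above.  printk_deferred() queues the message instead. */
        printk_deferred(KERN_WARNING
                        "CE: %s increased min_delta_ns to %llu nsec\n",
                        dev->name ? dev->name : "?",
                        (unsigned long long) dev->min_delta_ns);
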
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: stable@vger.kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      504d5874
  2. 31 Jul 2014 (3 commits)
  3. 30 Jul 2014 (1 commit)
  4. 29 Jul 2014 (1 commit)
  5. 24 Jul 2014 (1 commit)
    • sched_clock: Avoid corrupting hrtimer tree during suspend · f723aa18
      Committed by Stephen Boyd
      During suspend we call sched_clock_poll() to update the epoch and
      accumulated time and reprogram the sched_clock_timer to fire
      before the next wrap-around time. Unfortunately,
      sched_clock_poll() doesn't restart the timer, instead it relies
      on the hrtimer layer to do that and during suspend we aren't
      calling that function from the hrtimer layer. Instead, we're
      reprogramming the expires time while the hrtimer is enqueued,
      which can cause the hrtimer tree to be corrupted. Furthermore, we
      restart the timer during suspend but we update the epoch during
      resume which seems counter-intuitive.
      
      Let's fix this by saving the accumulated state and canceling the
      timer during suspend. On resume we can update the epoch and
      restart the timer similar to what we would do if we were starting
      the clock for the first time.
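
      A sketch of what the suspend/resume pair looks like with this approach
      (names follow kernel/time/sched_clock.c of that era; details illustrative):

        static int sched_clock_suspend(void)
        {
                update_sched_clock();                   /* save accumulated time */
                hrtimer_cancel(&sched_clock_timer);     /* don't poke an enqueued timer */
                cd.suspended = true;
                return 0;
        }

        static void sched_clock_resume(void)
        {
                cd.epoch_cyc = read_sched_clock();      /* fresh epoch, as at boot */
                hrtimer_start(&sched_clock_timer, cd.wrap_kt, HRTIMER_MODE_REL);
                cd.suspended = false;
        }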
      
      Fixes: a08ca5d1 "sched_clock: Use an hrtimer instead of timer"
      Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Link: http://lkml.kernel.org/r/1406174630-23458-1-git-send-email-john.stultz@linaro.org
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      f723aa18
  6. 23 Jul 2014 (6 commits)
    • workqueue: use nr_node_ids instead of wq_numa_tbl_len · ddcb57e2
      Committed by Lai Jiangshan
      They are the same and nr_node_ids is provided by the memory subsystem.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      ddcb57e2
    • workqueue: remove the misnamed out_unlock label in get_unbound_pool() · 3fb1823c
      Committed by Lai Jiangshan
      After the locking was moved up to the caller of get_unbound_pool(), the
      out_unlock label no longer performs any unlock operation and its name became
      misleading.  Remove the label and substitute its only usage site,
      "goto out_unlock", with "return pool".
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      3fb1823c
    • workqueue: remove the stale comment in pwq_unbound_release_workfn() · 29b1cb41
      Committed by Lai Jiangshan
      In 75ccf595 ("workqueue: prepare flush_workqueue() for dynamic
      creation and destruction of unbound pool_workqueues"), a comment about
      the synchronization of the pwq in pwq_unbound_release_workfn() was added.
      The comment claimed that the flush_mutex wasn't strictly necessary, which
      was correct at the time because the pwq was protected by workqueue_lock.

      That is no longer true: wq->flush_mutex has since been renamed to
      wq->mutex and workqueue_lock has been removed, so wq->mutex is now
      strictly required.  The comment was simply not updated when the
      synchronization changed.

      This patch removes the incorrect comments and doesn't add a new one to
      explain why wq->mutex is needed here; that is obvious enough, and the
      "WQ" annotation on the definition of wq->pwqs_node already documents it
      better.

      The old commit mentioned above also introduced a comment in link_pwq()
      about the synchronization.  That comment is removed as well, since the
      whole of link_pwq() is protected by wq->mutex.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      29b1cb41
    • workqueue: move rescuer pool detachment to the end · 13b1d625
      Committed by Lai Jiangshan
      Since 51697d39 ("workqueue: use generic attach/detach routine for
      rescuers"), the rescuer detaches itself from the pool before put_pwq() so
      that put_unbound_pool() will not destroy a pool that still has the rescuer
      attached.

      This is unnecessary.  worker_detach_from_pool() can be the last statement
      that accesses the pool, just as it is for regular workers;
      put_unbound_pool() will wait for the rescuer to detach and only then free
      the pool.

      So move worker_detach_from_pool() down to the end, matching the behavior
      of regular workers.
      
      tj: Minor description update.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      13b1d625
    • workqueue: unfold start_worker() into create_worker() · 051e1850
      Committed by Lai Jiangshan
      Simply unfold the code of start_worker() into create_worker() and
      remove the original start_worker() and create_and_start_worker().
      
      The only trade-off is the added overhead of releasing and re-grabbing
      pool->lock after the new worker is started.  The overhead is acceptable
      since the manager is a slow path.

      Because of this new locking behavior, the newly created worker may grab
      the lock before the manager and start processing work items.  In that
      case the recheck of need_to_create_worker() may be true, as expected,
      and the manager goes on to restart, which is the correct behavior.
      
      tj: Minor updates to description and comments.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      051e1850
    • workqueue: remove @wakeup from worker_set_flags() · 228f1d00
      Committed by Lai Jiangshan
      worker_set_flags() has only two callers, each specifying %true and
      %false for @wakeup.  Let's push the wake up to the caller and remove
      @wakeup from worker_set_flags().  The caller can use the following
      instead if wakeup is necessary:
      
      	worker_set_flags();
      	if (need_more_worker(pool))
       		wake_up_worker(pool);
      
      This makes the code simpler.  This patch doesn't introduce behavior
      changes.
      
      tj: Updated description and comments.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      228f1d00
  7. 22 Jul 2014 (1 commit)
    • workqueue: remove an unneeded UNBOUND test before waking up the next worker · a489a03e
      Committed by Lai Jiangshan
      In process_one_work():
      
      	if ((worker->flags & WORKER_UNBOUND) && need_more_worker(pool))
      		wake_up_worker(pool);
      
      the first test is unneeded.  Even if the first test is removed, it
      doesn't affect the wake-up logic for WORKER_UNBOUND, and it will not
      introduce any useless wake-ups for normal per-cpu workers since
      nr_running is always >= 1.  It will introduce useless/redundant
      wake-ups for CPU_INTENSIVE, but this case is rare and the next patch
      will also remove this redundant wake-up.
      
      tj: Minor updates to the description and comment.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      a489a03e
  8. 21 Jul 2014 (1 commit)
  9. 19 Jul 2014 (1 commit)
  10. 18 Jul 2014 (1 commit)
  11. 16 Jul 2014 (12 commits)
    • locking/rwsem: Add CONFIG_RWSEM_SPIN_ON_OWNER · 5db6c6fe
      Committed by Davidlohr Bueso
      Just like with mutexes (CONFIG_MUTEX_SPIN_ON_OWNER),
      encapsulate the dependencies for rwsem optimistic spinning.
      No logical changes here as it continues to depend on both
      SMP and the XADD algorithm variant.
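
      Conceptually, the new option gates all the optimistic-spinning state and
      code in one place; a sketch assuming the rwsem layout of that time:

        /* CONFIG_RWSEM_SPIN_ON_OWNER is only enabled when SMP, the XADD rwsem
         * algorithm and ARCH_SUPPORTS_ATOMIC_RMW are all available. */
        struct rw_semaphore {
                long count;
                struct list_head wait_list;
                raw_spinlock_t wait_lock;
        #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
                struct optimistic_spin_queue osq;       /* spinner MCS lock */
                struct task_struct *owner;              /* write owner, if any */
        #endif
        };
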
      Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
      Acked-by: Jason Low <jason.low2@hp.com>
      [ Also make it depend on ARCH_SUPPORTS_ATOMIC_RMW. ]
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1405112406-13052-2-git-send-email-davidlohr@hp.com
      Cc: aswin@hp.com
      Cc: Chris Mason <clm@fb.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Josef Bacik <jbacik@fusionio.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5db6c6fe
    • locking/mutex: Disable optimistic spinning on some architectures · 4badad35
      Committed by Peter Zijlstra
      The optimistic spin code assumes regular stores and cmpxchg() play nice;
      this is found to not be true for at least: parisc, sparc32, tile32,
      metag-lock1, arc-!llsc and hexagon.
      
      There is further wreckage, but this in particular seemed easy to
      trigger, so blacklist this.
      
      Opt in for known good archs.
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Reported-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Waiman Long <waiman.long@hp.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: John David Anglin <dave.anglin@bell.net>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: stable@vger.kernel.org
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: sparclinux@vger.kernel.org
      Link: http://lkml.kernel.org/r/20140606175316.GV13930@laptop.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      4badad35
    • locking/rwsem: Rename 'activity' to 'count' · 13b9a962
      Committed by Peter Zijlstra
      There are two definitions of struct rw_semaphore, one in linux/rwsem.h
      and one in linux/rwsem-spinlock.h.
      
      For some reason they have different names for the initial field. This
      makes it impossible to use C99 named initialization for
      __RWSEM_INITIALIZER() -- or we have to duplicate that entire thing
      along with the structure definitions.
      
      The simpler patch is renaming the rwsem-spinlock variant to match the
      regular rwsem.
      
      This allows us to switch to C99 named initialization.
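
      With both variants agreeing on the field name, a single designated
      initializer shape works for either definition; a sketch (spinlock variant
      shown, constants differ per variant):

        #define __RWSEM_INITIALIZER(name)                               \
        {                                                               \
                .count          = 0,                                    \
                .wait_lock      = __RAW_SPIN_LOCK_UNLOCKED((name).wait_lock), \
                .wait_list      = LIST_HEAD_INIT((name).wait_list)      \
        }
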
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-bmrZolsbGmautmzrerog27io@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      13b9a962
    • sched: Fix possible divide by zero in avg_atom() calculation · b0ab99e7
      Committed by Mateusz Guzik
      proc_sched_show_task() does:
      
        if (nr_switches)
      	do_div(avg_atom, nr_switches);
      
      nr_switches is unsigned long and do_div truncates it to 32 bits, which
      means it can test non-zero on e.g. x86-64 and be truncated to zero for
      division.
      
      Fix the problem by using div64_ul() instead.
      
      As a side effect calculations of avg_atom for big nr_switches are now correct.
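
      The fix, roughly (avg_atom is u64, nr_switches is unsigned long):

        if (nr_switches)
                avg_atom = div64_ul(avg_atom, nr_switches);     /* no 32-bit truncation */
        else
                avg_atom = -1LL;
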
      Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1402750809-31991-1-git-send-email-mguzik@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b0ab99e7
    • locking/spinlocks/mcs: Micro-optimize osq_unlock() · 33ecd208
      Committed by Jason Low
      In the unlock function of the cancellable MCS spinlock, the first
      thing we do is retrieve the current CPU's osq node.  However, due to
      the changes made in the previous patch, in the common case where the
      lock is not contended we no longer need to access the current CPU's
      osq node at all.

      This patch optimizes this by only retrieving this CPU's osq node
      after we attempt the initial cmpxchg to unlock the osq and find
      that it is contended.
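
      A rough sketch of the reordered unlock path (abbreviated; the real slow
      path also handles a racing unqueue via osq_wait_next()):

        void osq_unlock(struct optimistic_spin_queue *lock)
        {
                struct optimistic_spin_node *node, *next;
                int curr = encode_cpu(smp_processor_id());

                /* Fast path: nobody queued behind us, a single cmpxchg on the
                 * tail releases the lock without touching this CPU's node. */
                if (likely(atomic_cmpxchg(&lock->tail, curr, OSQ_UNLOCKED_VAL) == curr))
                        return;

                /* Contended: only now fetch the per-cpu node and hand off. */
                node = this_cpu_ptr(&osq_node);
                next = xchg(&node->next, NULL);
                if (next)
                        ACCESS_ONCE(next->locked) = 1;
        }
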
      Signed-off-by: Jason Low <jason.low2@hp.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Scott Norton <scott.norton@hp.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Waiman Long <waiman.long@hp.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1405358872-3732-5-git-send-email-jason.low2@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      33ecd208
    • locking/spinlocks/mcs: Introduce and use init macro and function for osq locks · 4d9d951e
      Committed by Jason Low
      Currently, we initialize the osq lock by directly setting the lock's values.  It
      would be preferable to use an init macro to do the initialization, as we do
      with other locks.
      
      This patch introduces and uses a macro and function for initializing the osq lock.
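
      The new init interfaces are small; a sketch:

        #define OSQ_LOCK_UNLOCKED { ATOMIC_INIT(OSQ_UNLOCKED_VAL) }

        static inline void osq_lock_init(struct optimistic_spin_queue *lock)
        {
                atomic_set(&lock->tail, OSQ_UNLOCKED_VAL);
        }
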
      Signed-off-by: Jason Low <jason.low2@hp.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Scott Norton <scott.norton@hp.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Waiman Long <waiman.long@hp.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Josef Bacik <jbacik@fusionio.com>
      Link: http://lkml.kernel.org/r/1405358872-3732-4-git-send-email-jason.low2@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      4d9d951e
    • locking/spinlocks/mcs: Convert osq lock to atomic_t to reduce overhead · 90631822
      Committed by Jason Low
      The cancellable MCS spinlock is currently used to queue threads that are
      doing optimistic spinning. It uses per-cpu nodes, where a thread obtaining
      the lock would access and queue the local node corresponding to the CPU that
      it's running on. Currently, the cancellable MCS lock is implemented by using
      pointers to these nodes.
      
      In this patch, instead of operating on pointers to the per-cpu nodes, we
      store the CPU numbers that the per-cpu nodes correspond to in an atomic_t.
      A similar concept is used with the qspinlock.

      By operating on the CPU numbers of the nodes in an atomic_t instead of on
      pointers to those nodes, we reduce the size of the cancellable MCS spinlock
      by 32 bits (on 64-bit systems).
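
      A sketch of the new lock "handle" and the CPU-number encoding it relies on
      (0 is reserved to mean "unlocked", so CPU numbers are stored offset by one):

        struct optimistic_spin_queue {
                /* Encoded CPU # of the tail node, or OSQ_UNLOCKED_VAL if empty. */
                atomic_t tail;
        };

        #define OSQ_UNLOCKED_VAL (0)

        static inline int encode_cpu(int cpu_nr)
        {
                return cpu_nr + 1;
        }

        static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val)
        {
                return per_cpu_ptr(&osq_node, encoded_cpu_val - 1);
        }
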
      Signed-off-by: Jason Low <jason.low2@hp.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Scott Norton <scott.norton@hp.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Waiman Long <waiman.long@hp.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Josef Bacik <jbacik@fusionio.com>
      Link: http://lkml.kernel.org/r/1405358872-3732-3-git-send-email-jason.low2@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      90631822
    • locking/spinlocks/mcs: Rename optimistic_spin_queue() to optimistic_spin_node() · 046a619d
      Committed by Jason Low
      Currently, the per-cpu node structure for the cancellable MCS spinlock is
      named "optimistic_spin_queue".  However, a follow-up patch in this series
      will introduce a new structure that serves as the new "handle" for the
      lock, and it would make more sense for that structure to be named
      "optimistic_spin_queue".  Additionally, since the current uses of the
      "optimistic_spin_queue" structure are really "nodes", it is better to
      rename them to "node" anyway.
      
      This preparatory patch renames all current "optimistic_spin_queue"
      to "optimistic_spin_node".
      Signed-off-by: Jason Low <jason.low2@hp.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Scott Norton <scott.norton@hp.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Waiman Long <waiman.long@hp.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Josef Bacik <jbacik@fusionio.com>
      Link: http://lkml.kernel.org/r/1405358872-3732-2-git-send-email-jason.low2@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      046a619d
    • locking/rwsem: Allow conservative optimistic spinning when readers have lock · 37e95624
      Committed by Jason Low
      Commit 4fc828e2 ("locking/rwsem: Support optimistic spinning")
      introduced a major performance regression for workloads such as
      xfs_repair which mix read and write locking of the mmap_sem across
      many threads. The result was xfs_repair ran 5x slower on 3.16-rc2
      than on 3.15 and using 20x more system CPU time.
      
      Perf profiles indicate in some workloads that significant time can
      be spent spinning on !owner. This is because we don't set the lock
      owner when reader(s) obtain the rwsem.
      
      In this patch, we'll modify rwsem_can_spin_on_owner() such that we'll
      return false if there is no lock owner. The rationale is that if we
      just entered the slowpath, yet there is no lock owner, then there is
      a possibility that a reader has the lock. To be conservative, we'll
      avoid spinning in these situations.
      
      This patch reduced the total run time of the xfs_repair workload from
      about 4 minutes 24 seconds down to approximately 1 minute 26 seconds,
      back to close to the same performance as on 3.15.
      
      Retesting of AIM7, which were some of the workloads used to test the
      original optimistic spinning code, confirmed that we still get big
      performance gains with optimistic spinning, even with this additional
      regression fix. Davidlohr found that while the 'custom' workload took
      a performance hit of ~-14% to throughput for >300 users with this
      additional patch, the overall gain with optimistic spinning is
      still ~+45%. The 'disk' workload even improved by ~+15% at >1000 users.
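
      A sketch of the conservative check (the key change is defaulting to
      "don't spin" when no lock owner is recorded):

        static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
        {
                struct task_struct *owner;
                bool on_cpu = false;    /* be conservative: a reader may own it */

                if (need_resched())
                        return false;

                rcu_read_lock();
                owner = ACCESS_ONCE(sem->owner);
                if (owner)
                        on_cpu = owner->on_cpu;
                rcu_read_unlock();

                /* No owner recorded right after entering the slowpath?
                 * A reader may hold the lock: don't spin. */
                return on_cpu;
        }
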
      Tested-by: Dave Chinner <dchinner@redhat.com>
      Acked-by: Davidlohr Bueso <davidlohr@hp.com>
      Signed-off-by: Jason Low <jason.low2@hp.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1404532172.2572.30.camel@j-VirtualBox
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      37e95624
    • perf: Fix lockdep warning on process exit · 4a1c0f26
      Committed by Peter Zijlstra
      Sasha Levin reported:
      
      > While fuzzing with trinity inside a KVM tools guest running the latest -next
      > kernel I've stumbled on the following spew:
      >
      > ======================================================
      > [ INFO: possible circular locking dependency detected ]
      > 3.15.0-next-20140613-sasha-00026-g6dd125d-dirty #654 Not tainted
      > -------------------------------------------------------
      > trinity-c578/9725 is trying to acquire lock:
      > (&(&pool->lock)->rlock){-.-...}, at: __queue_work (kernel/workqueue.c:1346)
      >
      > but task is already holding lock:
      > (&ctx->lock){-.....}, at: perf_event_exit_task (kernel/events/core.c:7471 kernel/events/core.c:7533)
      >
      > which lock already depends on the new lock.
      
      > 1 lock held by trinity-c578/9725:
      > #0: (&ctx->lock){-.....}, at: perf_event_exit_task (kernel/events/core.c:7471 kernel/events/core.c:7533)
      >
      >  Call Trace:
      >  dump_stack (lib/dump_stack.c:52)
      >  print_circular_bug (kernel/locking/lockdep.c:1216)
      >  __lock_acquire (kernel/locking/lockdep.c:1840 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182)
      >  lock_acquire (./arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
      >  _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
      >  __queue_work (kernel/workqueue.c:1346)
      >  queue_work_on (kernel/workqueue.c:1424)
      >  free_object (lib/debugobjects.c:209)
      >  __debug_check_no_obj_freed (lib/debugobjects.c:715)
      >  debug_check_no_obj_freed (lib/debugobjects.c:727)
      >  kmem_cache_free (mm/slub.c:2683 mm/slub.c:2711)
      >  free_task (kernel/fork.c:221)
      >  __put_task_struct (kernel/fork.c:250)
      >  put_ctx (include/linux/sched.h:1855 kernel/events/core.c:898)
      >  perf_event_exit_task (kernel/events/core.c:907 kernel/events/core.c:7478 kernel/events/core.c:7533)
      >  do_exit (kernel/exit.c:766)
      >  do_group_exit (kernel/exit.c:884)
      >  get_signal_to_deliver (kernel/signal.c:2347)
      >  do_signal (arch/x86/kernel/signal.c:698)
      >  do_notify_resume (arch/x86/kernel/signal.c:751)
      >  int_signal (arch/x86/kernel/entry_64.S:600)
      
      Urgh.. so the only way I can make that happen is through:
      
        perf_event_exit_task_context()
          raw_spin_lock(&child_ctx->lock);
          unclone_ctx(child_ctx)
            put_ctx(ctx->parent_ctx);
          raw_spin_unlock_irqrestore(&child_ctx->lock);
      
      And we can avoid this by doing the change below.
      
      I can't immediately see how this changed recently, but given that you
      say it's easy to reproduce, let's fix this.
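
      The shape of the fix is to defer the put_ctx() of the parent context until
      after child_ctx->lock has been dropped; a sketch:

        raw_spin_lock_irqsave(&child_ctx->lock, flags);
        clone_ctx = child_ctx->parent_ctx;      /* unclone, but defer the put */
        child_ctx->parent_ctx = NULL;
        /* ... sched out / cleanup under the lock ... */
        raw_spin_unlock_irqrestore(&child_ctx->lock, flags);

        if (clone_ctx)
                put_ctx(clone_ctx);     /* may free, but no ctx->lock is held now */
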
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140623141242.GB19860@laptop.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      4a1c0f26
    • perf: Revert ("perf: Always destroy groups on exit") · 1903d50c
      Committed by Peter Zijlstra
      Vince reported that commit 15a2d4de ("perf: Always destroy groups
      on exit") causes a regression with grouped events. In particular his
      read_group_attached.c test fails.
      
        https://github.com/deater/perf_event_tests/blob/master/tests/bugs/read_group_attached.c
      
      Because of the context switch optimization in
      perf_event_context_sched_out() the 'original' event may end up in the
      child process and when that exits the change in the patch in question
      destroys the actual grouping.
      
      Therefore revert that change and only destroy inherited groups.
      Reported-by: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-zedy3uktcp753q8fw8dagx7a@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1903d50c
    • ring-buffer: Fix polling on trace_pipe · 97b8ee84
      Committed by Martin Lau
      ring_buffer_poll_wait() should always add the poll_table to its wait_queue
      even when there is immediate data available.  Otherwise, the following epoll
      and read sequence will eventually hang forever:
      
      1. Put some data to make the trace_pipe ring_buffer read ready first
      2. epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, ee)
      3. epoll_wait()
      4. read(trace_pipe_fd) till EAGAIN
      5. Add some more data to the trace_pipe ring_buffer
      6. epoll_wait() -> this epoll_wait() will block forever
      
      ~ During the epoll_ctl(efd, EPOLL_CTL_ADD,...) call in step 2,
        ring_buffer_poll_wait() returns immediately without adding poll_table,
        which has poll_table->_qproc pointing to ep_poll_callback(), to its
        wait_queue.
      ~ During the epoll_wait() call in step 3 and step 6,
        ring_buffer_poll_wait() cannot add ep_poll_callback() to its wait_queue
        because the poll_table->_qproc is NULL and it is how epoll works.
      ~ When there is new data available in step 6, ring_buffer does not know
        it has to call ep_poll_callback() because it is not in its wait queue.
        Hence, block forever.
      
      Other poll implementations seem to call poll_wait() unconditionally as the
      very first thing they do; for example, tcp_poll() in tcp.c.
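
      A sketch of the fixed ordering inside ring_buffer_poll_wait() (field names
      follow kernel/trace/ring_buffer.c of that era):

        /* Register with the poll_table first, unconditionally, so that
         * ep_poll_callback() gets hooked up even when data is already ready. */
        poll_wait(filp, &work->waiters, poll_table);
        work->waiters_pending = true;

        if ((cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer)) ||
            (cpu != RING_BUFFER_ALL_CPUS && !ring_buffer_empty_cpu(buffer, cpu)))
                return POLLIN | POLLRDNORM;

        return 0;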
      
      Link: http://lkml.kernel.org/p/20140610060637.GA14045@devbig242.prn2.facebook.com
      
      Cc: stable@vger.kernel.org # 2.6.27
      Fixes: 2a2cc8f7 "ftrace: allow the event pipe to be polled"
      Reviewed-by: Chris Mason <clm@fb.com>
      Signed-off-by: Martin Lau <kafai@fb.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      97b8ee84
  12. 15 Jul 2014 (11 commits)
    • tracing: Add TRACE_ITER_PRINTK flag check in __trace_puts/__trace_bputs · f0160a5a
      Committed by zhangwei(Jovi)
      The TRACE_ITER_PRINTK check in __trace_puts/__trace_bputs is missing,
      so add it, to be consistent with __trace_printk/__trace_bprintk.
      Those functions are all called by the same function: trace_printk().
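
      The added guard mirrors the one already present in __trace_printk() and
      __trace_bprintk(); a sketch:

        /* At the top of __trace_puts() and __trace_bputs(): */
        if (!(trace_flags & TRACE_ITER_PRINTK))
                return 0;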
      
      Link: http://lkml.kernel.org/p/51E7A7D6.8090900@huawei.com
      
      Cc: stable@vger.kernel.org # 3.11+
      Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      f0160a5a
    • workqueue: alloc struct worker on its local node · f7537df5
      Committed by Lai Jiangshan
      When create_worker() is called from a non-manager, the struct worker is
      allocated on the node of the caller, which may be different from pool->node.

      So add a node ID argument to alloc_worker() to ensure the struct worker is
      allocated on the preferred node.
      
      tj: @nid renamed to @node for consistency.
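
      A sketch of the allocation with the new argument (field names per
      kernel/workqueue.c of that era):

        static struct worker *alloc_worker(int node)
        {
                struct worker *worker;

                /* Allocate on the pool's node, not the calling task's node. */
                worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node);
                if (worker) {
                        INIT_LIST_HEAD(&worker->entry);
                        INIT_LIST_HEAD(&worker->scheduled);
                        INIT_LIST_HEAD(&worker->node);
                        worker->flags = WORKER_PREP;    /* not yet running */
                }
                return worker;
        }
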
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      f7537df5
    • tracing: Fix graph tracer with stack tracer on other archs · 5f8bf2d2
      Committed by Steven Rostedt (Red Hat)
      Running my ftrace tests on PowerPC, it failed the test that checks
      if function_graph tracer is affected by the stack tracer. It was.
      Looking into this, I found that the update_function_graph_func()
      must be called even if the trampoline function is not changed.
      This is because archs like PowerPC do not support ftrace_ops being
      passed by assembly and instead uses a helper function (what the
      trampoline function points to). Since this function is not changed
      even when multiple ftrace_ops are added to the code, the test that
      falls out before calling update_function_graph_func() will miss that
      the update must still be done.
      
      Call update_function_graph_func() for all calls to
      update_ftrace_function().
      
      Cc: stable@vger.kernel.org # 3.3+
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      5f8bf2d2
    • cgroup: initialize cgrp_dfl_root_inhibit_ss_mask from !->dfl_files test · 5de4fa13
      Committed by Tejun Heo
      cgrp_dfl_root_inhibit_ss_mask determines which subsystems are not
      supported on the default hierarchy and is currently initialized
      statically and just includes the debug subsystem.  Now that there's
      cgroup_subsys->dfl_files, we can easily tell which subsystems support
      the default hierarchy or not.
      
      Let's initialize cgrp_dfl_root_inhibit_ss_mask by testing whether
      cgroup_subsys->dfl_files is NULL.  After all, subsystems with NULL
      ->dfl_files aren't useable on the default hierarchy anyway.
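
      Roughly, during cgroup initialization (a sketch):

        for_each_subsys(ss, ssid) {
                /* No default-hierarchy files means the subsystem can't be
                 * used on the default hierarchy, so inhibit it there. */
                if (!ss->dfl_files)
                        cgrp_dfl_root_inhibit_ss_mask |= 1 << ssid;
        }
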
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      5de4fa13
    • cgroup: make CFTYPE_ONLY_ON_DFL and CFTYPE_NO_ internal to cgroup core · 05ebb6e6
      Committed by Tejun Heo
      cgroup now distinguishes cftypes for the default and legacy
      hierarchies more explicitly by using separate arrays and
      CFTYPE_ONLY_ON_DFL and CFTYPE_INSANE should be and are used only
      inside cgroup core proper.  Let's make it clear that the flags are
      internal by prefixing them with double underscores.
      
      CFTYPE_INSANE is renamed to __CFTYPE_NOT_ON_DFL for consistency.  The
      two flags are also collected and assigned bits >= 16 so that they
      aren't mixed with the published flags.
      
      v2: Convert the extra ones in cgroup_exit_cftypes() which are added by
          revision to the previous patch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      05ebb6e6
    • cgroup: distinguish the default and legacy hierarchies when handling cftypes · a8ddc821
      Committed by Tejun Heo
      Until now, cftype arrays carried files for both the default and legacy
      hierarchies and the files which needed to be used on only one of them
      were flagged with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE.  This
      gets confusing very quickly and we may end up exposing interface files
      to the default hierarchy without thinking it through.
      
      This patch makes cgroup core provide separate sets of interfaces for
      cftype handling so that the cftypes for the default and legacy
      hierarchies are clearly distinguished.  The previous two patches
      renamed the existing ones so that they clearly indicate that they're
      for the legacy hierarchies.  This patch adds the interface for the
      default hierarchy and apply them selectively depending on the
      hierarchy type.
      
      * cftypes added through cgroup_subsys->dfl_cftypes and
        cgroup_add_dfl_cftypes() only show up on the default hierarchy.
      
      * cftypes added through cgroup_subsys->legacy_cftypes and
        cgroup_add_legacy_cftypes() only show up on the legacy hierarchies.
      
      * cgroup_subsys->dfl_cftypes and ->legacy_cftypes can point to the
        same array for the cases where the interface files are identical on
        both types of hierarchies.
      
      * This makes all the existing subsystem interface files legacy-only by
        default and all subsystems will have no interface file created when
        enabled on the default hierarchy.  Each subsystem should explicitly
        review and compose the interface for the default hierarchy.
      
      * A boot param "cgroup__DEVEL__legacy_files_on_dfl" is added which
        makes subsystems that haven't yet decided their interface files for
        the default hierarchy present the legacy files there, so that their
        behavior on the default hierarchy can be tested.  As the awkward name
        suggests, this is for development only.
      
      * memcg's CFTYPE_INSANE on "use_hierarchy" is noop now as the whole
        array isn't used on the default hierarchy.  The flag is removed.
      
      v2: Updated documentation for cgroup__DEVEL__legacy_files_on_dfl.
      
      v3: Clear CFTYPE_ONLY_ON_DFL and CFTYPE_INSANE when cfts are removed
          as suggested by Li.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Aristeu Rozanski <aris@redhat.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      a8ddc821
    • cgroup: replace cgroup_add_cftypes() with cgroup_add_legacy_cftypes() · 2cf669a5
      Committed by Tejun Heo
      Currently, cftypes added by cgroup_add_cftypes() are used for both the
      unified default hierarchy and legacy ones and subsystems can mark each
      file with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE if it has to
      appear only on one of them.  This is quite hairy and error-prone.
      Also, we may end up exposing interface files to the default hierarchy
      without thinking it through.
      
      cgroup_subsys will grow two separate cftype addition functions and
      apply each only on the hierarchies of the matching type.  This will
      allow organizing cftypes in a lot clearer way and encourage subsystems
      to scrutinize the interface which is being exposed in the new default
      hierarchy.
      
      In preparation, this patch adds cgroup_add_legacy_cftypes() which
      currently is a simple wrapper around cgroup_add_cftypes() and replaces
      all cgroup_add_cftypes() usages with it.
      
      While at it, this patch drops a completely spurious return from
      __hugetlb_cgroup_file_init().
      
      This patch doesn't introduce any functional differences.
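
      At this point the new helper is just a pass-through; a sketch:

        /*
         * For now identical to cgroup_add_cftypes(); later patches make each
         * addition path apply only to the matching hierarchy type.
         */
        int cgroup_add_legacy_cftypes(struct cgroup_subsys *ss, struct cftype *cfts)
        {
                return cgroup_add_cftypes(ss, cfts);
        }
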
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      2cf669a5
    • cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes · 5577964e
      Committed by Tejun Heo
      Currently, cgroup_subsys->base_cftypes is used for both the unified
      default hierarchy and legacy ones and subsystems can mark each file
      with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE if it has to appear
      only on one of them.  This is quite hairy and error-prone.  Also, we
      may end up exposing interface files to the default hierarchy without
      thinking it through.
      
      cgroup_subsys will grow two separate cftype arrays and apply each only
      on the hierarchies of the matching type.  This will allow organizing
      cftypes in a lot clearer way and encourage subsystems to scrutinize
      the interface which is being exposed in the new default hierarchy.
      
      In preparation, this patch renames cgroup_subsys->base_cftypes to
      cgroup_subsys->legacy_cftypes.  This patch is pure rename.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Aristeu Rozanski <aris@redhat.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      5577964e
    • cgroup: split cgroup_base_files[] into cgroup_{dfl|legacy}_base_files[] · a14c6874
      Committed by Tejun Heo
      Currently cgroup_base_files[] contains the cgroup core interface files
      for both legacy and default hierarchies with each file tagged with
      CFTYPE_INSANE and CFTYPE_ONLY_ON_DFL.  This is difficult to read.
      
      Let's separate it out to two separate tables, cgroup_dfl_base_files[]
      and cgroup_legacy_base_files[], and use the appropriate one in
      cgroup_mkdir() depending on the hierarchy type.  This makes tagging
      each file unnecessary.
      
      This patch doesn't introduce any behavior changes.
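
      The selection then happens once in cgroup_mkdir(); a sketch (helper and
      table names as used in kernel/cgroup.c of that era):

        struct cftype *base_files;

        base_files = cgroup_on_dfl(cgrp) ? cgroup_dfl_base_files
                                         : cgroup_legacy_base_files;
        ret = cgroup_addrm_files(cgrp, base_files, true);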
      
      v2: cgroup_dfl_base_files[] was missing the termination entry
          triggering WARN in cgroup_init_cftypes() for 0day kernel testing
          robot.  Fixed.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Jet Chen <jet.chen@intel.com>
      a14c6874
    • tracing: Add ftrace_trace_stack into __trace_puts/__trace_bputs · 8abfb872
      Committed by zhangwei(Jovi)
      Currently the stacktrace trace option has no effect for trace_printk()
      calls with a constant string argument; the reason is that the
      ftrace_trace_stack() call is missing in __trace_puts/__trace_bputs.

      In contrast, when trace_printk() is used with a non-constant string
      argument (which calls into __trace_printk/__trace_bprintk), the
      stacktrace option works.  This inconsistent behavior confuses users a lot.
      
      Link: http://lkml.kernel.org/p/51E7A7C9.9040401@huawei.com
      
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      8abfb872
    • PM / sleep: fix freeze_ops NULL pointer dereferences · 653a3538
      Committed by Zhang Rui
      This patch fixes a NULL pointer dereference issue introduced by
      commit 1f0b6386 (ACPI / PM: Hold ACPI scan lock over the "freeze"
      sleep state).
      
      Fixes: 1f0b6386 (ACPI / PM: Hold ACPI scan lock over the "freeze" sleep state)
      Link: http://marc.info/?l=linux-pm&m=140541182017443&w=2
      Reported-and-tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
      Signed-off-by: Zhang Rui <rui.zhang@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      653a3538