1. 30 Jun 2013, 1 commit
  2. 26 Jun 2013, 2 commits
    • futex: Use freezable blocking call · 88c8004f
      Committed by Colin Cross
      Avoid waking up every thread sleeping in a futex_wait call during
      suspend and resume by calling a freezable blocking call.  Previous
      patches modified the freezer to avoid sending wakeups to threads
      that are blocked in freezable blocking calls.
      
      This call was selected to be converted to a freezable call because
      it doesn't hold any locks or release any resources when interrupted
      that might be needed by another freezing task or a kernel driver
      during suspend, and is a common site where idle userspace tasks are
      blocked.
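
      A minimal sketch of the conversion, assuming the blocking point sits in
      the futex_wait path (futex_wait_queue_me() in mainline at the time); the
      queueing and timeout handling are elided and the helper name below is
      only illustrative:

          #include <linux/freezer.h>
          #include <linux/sched.h>

          /* Sketch: block in a freezer-aware call instead of plain schedule(). */
          static void futex_wait_block_sketch(void)
          {
                  set_current_state(TASK_INTERRUPTIBLE);
                  /* ... queue the futex_q and arm the optional timeout ... */
                  freezable_schedule();              /* was: schedule() */
                  __set_current_state(TASK_RUNNING);
          }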
      Signed-off-by: Colin Cross <ccross@android.com>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: arve@android.com
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Darren Hart <dvhart@linux.intel.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Link: http://lkml.kernel.org/r/1367458508-9133-8-git-send-email-ccross@android.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      88c8004f
    • futex: Take hugepages into account when generating futex_key · 13d60f4b
      Committed by Zhang Yi
      The futex_keys of process shared futexes are generated from the page
      offset, the mapping host and the mapping index of the futex user space
      address. This should result in a unique identifier for each futex.
      
      However, this is not true when futexes are located in different subpages
      of a hugepage. The reason is that the mapping index for all those
      futexes evaluates to the index of the base page of the hugetlbfs
      mapping. So a futex at offset 0 of the hugepage mapping and another
      one at offset PAGE_SIZE of the same hugepage mapping have identical
      futex_keys. This happens because the futex code blindly uses
      page->index.
      
      Steps to reproduce the bug:
      
      1. Map a file from hugetlbfs. Initialize pthread_mutex1 at offset 0
         and pthread_mutex2 at offset PAGE_SIZE of the hugetlbfs
         mapping.
      
         The mutexes must be initialized as PTHREAD_PROCESS_SHARED because
         PTHREAD_PROCESS_PRIVATE mutexes are not affected by this issue as
         their keys solely depend on the user space address.
      
      2. Lock mutex1 and mutex2
      
      3. Create thread1 and in the thread function lock mutex1, which
         results in thread1 blocking on the locked mutex1.
      
      4. Create thread2 and in the thread function lock mutex2, which
         results in thread2 blocking on the locked mutex2.
      
      5. Unlock mutex2. Despite the fact that mutex2 got unlocked, thread2
         still blocks on mutex2 because the futex_key points to mutex1.
      
      To solve this issue we need to take the normal page index of the page
      which contains the futex into account, if the futex is in a hugetlbfs
      mapping. In other words, we calculate the normal page mapping index of
      the subpage in the hugetlbfs mapping.
      
      Mappings which are not based on hugetlbfs are not affected and still
      use page->index.
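
      A minimal sketch of the resulting index selection, assuming a helper
      along the lines of the basepage_index() evaluation function mentioned
      below; the wrapper name is only illustrative:

          #include <linux/hugetlb.h>
          #include <linux/pagemap.h>

          /* Sketch: pick a subpage-accurate index for hugetlbfs-backed pages. */
          static unsigned long futex_page_index_sketch(struct page *page)
          {
                  if (PageHuge(page))
                          return basepage_index(page);  /* index of this subpage */
                  return page->index;                   /* normal mappings */
          }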
      
      Thanks to Mel Gorman who provided a patch for adding proper evaluation
      functions to the hugetlbfs code to avoid exposing hugetlbfs specific
      details to the futex code.
      
      [ tglx: Massaged changelog ]
      Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn>
      Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
      Tested-by: Ma Chenggong <ma.chenggong@zte.com.cn>
      Reviewed-by: 'Mel Gorman' <mgorman@suse.de>
      Acked-by: 'Darren Hart' <dvhart@linux.intel.com>
      Cc: 'Peter Zijlstra' <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/000101ce71a6%24a83c5880%24f8b50980%24@com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      13d60f4b
  3. 21 Jun 2013, 1 commit
    • tick: Fix tick_broadcast_pending_mask not cleared · ea8deb8d
      Committed by Daniel Lezcano
      The recent modification in the cpuidle framework consolidated the
      timer broadcast code across the different drivers by setting a new
      flag in the idle state. It tells the cpuidle core code to enter/exit
      the broadcast mode for the cpu when entering a deep idle state. The
      broadcast timer enter/exit is no longer handled by the back-end
      driver.
      
      This change caused local interrupts to be enabled *before* calling
      CLOCK_EVENT_NOTIFY_EXIT.
      
      On a tegra114, a four-core system, the following warning appeared once
      the flag had been introduced in the driver:
      
      WARNING: at kernel/time/tick-broadcast.c:578 tick_broadcast_oneshot_control
      CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.10.0-rc3-next-20130529+ #15
      [<c00667f8>] (tick_broadcast_oneshot_control+0x1a4/0x1d0) from [<c0065cd0>] (tick_notify+0x240/0x40c)
      [<c0065cd0>] (tick_notify+0x240/0x40c) from [<c0044724>] (notifier_call_chain+0x44/0x84)
      [<c0044724>] (notifier_call_chain+0x44/0x84) from [<c0044828>] (raw_notifier_call_chain+0x18/0x20)
      [<c0044828>] (raw_notifier_call_chain+0x18/0x20) from [<c00650cc>] (clockevents_notify+0x28/0x170)
      [<c00650cc>] (clockevents_notify+0x28/0x170) from [<c033f1f0>] (cpuidle_idle_call+0x11c/0x168)
      [<c033f1f0>] (cpuidle_idle_call+0x11c/0x168) from [<c000ea94>] (arch_cpu_idle+0x8/0x38)
      [<c000ea94>] (arch_cpu_idle+0x8/0x38) from [<c005ea80>] (cpu_startup_entry+0x60/0x134)
      [<c005ea80>] (cpu_startup_entry+0x60/0x134) from [<804fe9a4>] (0x804fe9a4)
      
      I don't have the hardware, so I wasn't able to reproduce the warning
      but after looking a while at the code, I deduced the following:
      
       1. the CPU2 enters a deep idle state and sets the broadcast timer
      
       2. the timer expires, the tick_handle_oneshot_broadcast function is
          called, setting the tick_broadcast_pending_mask and waking up the
          idle cpu CPU2
      
       3. the CPU2 exits idle, handles the interrupt and then invokes
          tick_broadcast_oneshot_control with CLOCK_EVENT_NOTIFY_EXIT which
          runs the following code:
      
          [...]
          if (dev->next_event.tv64 == KTIME_MAX)
                  goto out;
      
          if (cpumask_test_and_clear_cpu(cpu,
                                       tick_broadcast_pending_mask))
                  goto out;
          [...]
      
          So if there is no next event scheduled for CPU2, we fulfil the
          first condition and jump out without clearing the
          tick_broadcast_pending_mask.
      
       4. CPU2 goes to deep idle again and calls
          tick_broadcast_oneshot_control with CLOCK_NOTIFY_EVENT_ENTER but
          with the tick_broadcast_pending_mask set for CPU2, triggering the
          warning.
      
      The issue only surfaced due to the modifications of the cpuidle
      framework, which resulted in interrupts being enabled before the call
      to the clockevents code. If the call happens before interrupts have
      been enabled, the warning cannot trigger, because there is still the
      event pending which caused the broadcast timer expiry.
      
      Move the check for the next event below the check for the pending bit,
      so the pending bit gets cleared whether an event is scheduled on the
      cpu or not.
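
      With the checks swapped, the exit path would look roughly like this (a
      sketch mirroring the snippet above):

          [...]
          if (cpumask_test_and_clear_cpu(cpu,
                                       tick_broadcast_pending_mask))
                  goto out;

          if (dev->next_event.tv64 == KTIME_MAX)
                  goto out;
          [...]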
      
      [ tglx: Massaged changelog ]
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Reported-and-tested-by: Joseph Lo <josephl@nvidia.com>
      Cc: Stephen Warren <swarren@nvidia.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linaro-kernel@lists.linaro.org
      Link: http://lkml.kernel.org/r/1371485735-31249-1-git-send-email-daniel.lezcano@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      ea8deb8d
  4. 20 Jun 2013, 2 commits
  5. 19 Jun 2013, 4 commits
    • tracing/context-tracking: Add preempt_schedule_context() for tracing · 29bb9e5a
      Committed by Steven Rostedt
      Dave Jones hit the following bug report:
      
       ===============================
       [ INFO: suspicious RCU usage. ]
       3.10.0-rc2+ #1 Not tainted
       -------------------------------
       include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
       other info that might help us debug this:
       RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0
       RCU used illegally from extended quiescent state!
       2 locks held by cc1/63645:
        #0:  (&rq->lock){-.-.-.}, at: [<ffffffff816b39fd>] __schedule+0xed/0x9b0
        #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff8109d645>] cpuacct_charge+0x5/0x1f0
      
       CPU: 1 PID: 63645 Comm: cc1 Not tainted 3.10.0-rc2+ #1 [loadavg: 40.57 27.55 13.39 25/277 64369]
       Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
        0000000000000000 ffff88010f78fcf8 ffffffff816ae383 ffff88010f78fd28
        ffffffff810b698d ffff88011c092548 000000000023d073 ffff88011c092500
        0000000000000001 ffff88010f78fd60 ffffffff8109d7c5 ffffffff8109d645
       Call Trace:
        [<ffffffff816ae383>] dump_stack+0x19/0x1b
        [<ffffffff810b698d>] lockdep_rcu_suspicious+0xfd/0x130
        [<ffffffff8109d7c5>] cpuacct_charge+0x185/0x1f0
        [<ffffffff8109d645>] ? cpuacct_charge+0x5/0x1f0
        [<ffffffff8108dffc>] update_curr+0xec/0x240
        [<ffffffff8108f528>] put_prev_task_fair+0x228/0x480
        [<ffffffff816b3a71>] __schedule+0x161/0x9b0
        [<ffffffff816b4721>] preempt_schedule+0x51/0x80
        [<ffffffff816b4800>] ? __cond_resched_softirq+0x60/0x60
        [<ffffffff816b6824>] ? retint_careful+0x12/0x2e
        [<ffffffff810ff3cc>] ftrace_ops_control_func+0x1dc/0x210
        [<ffffffff816be280>] ftrace_call+0x5/0x2f
        [<ffffffff816b681d>] ? retint_careful+0xb/0x2e
        [<ffffffff816b4805>] ? schedule_user+0x5/0x70
        [<ffffffff816b4805>] ? schedule_user+0x5/0x70
        [<ffffffff816b6824>] ? retint_careful+0x12/0x2e
       ------------[ cut here ]------------
      
      What happened was that the function tracer traced the schedule_user() code
      that tells RCU that the system is coming back from userspace, and to
      add the CPU back to the RCU monitoring.
      
      Because the function tracer does preempt_disable/enable_notrace() calls,
      the preempt_enable_notrace() checks the NEED_RESCHED flag. If it is set,
      then preempt_schedule() is called. But this is called before the user_exit()
      function can inform the kernel that the CPU is no longer in user mode and
      needs to be accounted for by RCU.
      
      The fix is to create a new preempt_schedule_context() that checks if
      the kernel is still in user mode and, if so, switches it to kernel mode
      before calling schedule(). It also switches back to user mode when
      returning from schedule(), if need be.
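
      A minimal sketch of what such a helper could look like, assuming the
      context tracking APIs context_tracking_in_user(), user_exit() and
      user_enter(); this is an illustration of the changelog, not the actual
      patch:

          #include <linux/context_tracking.h>
          #include <linux/preempt.h>

          void preempt_schedule_context(void)
          {
                  bool was_in_user;

                  if (likely(!preemptible()))
                          return;

                  /* If context tracking thinks we are in user mode, leave it. */
                  was_in_user = context_tracking_in_user();
                  if (was_in_user)
                          user_exit();

                  preempt_schedule();

                  /* Switch back to user mode if that is where we came from. */
                  if (was_in_user)
                          user_enter();
          }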
      
      The only user of this currently is the preempt_enable_notrace(), which is
      only used by the tracing subsystem.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1369423420.6828.226.camel@gandalf.local.home
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      29bb9e5a
    • sched: Fix clear NOHZ_BALANCE_KICK · 873b4c65
      Committed by Vincent Guittot
      I have faced a sequence where the Idle Load Balance was sometimes not
      triggered for a while on my platform, in the following scenario:
      
       CPU 0 and CPU 1 are running tasks and CPU 2 is idle
      
       CPU 1 kicks the Idle Load Balance
       CPU 1 selects CPU 2 as the new Idle Load Balancer
       CPU 1 sets NOHZ_BALANCE_KICK for CPU 2
       CPU 1 sends a reschedule IPI to CPU 2

       While CPU 2 wakes up, CPU 0 or CPU 1 migrates a waking up task A on CPU 2
      
       CPU 2 finally wakes up, runs task A and discards the Idle Load Balance
             task A quickly goes back to sleep (before a tick occurs on CPU 2)
       CPU 2 goes back to idle with NOHZ_BALANCE_KICK set
      
      Whenever CPU 2 is then selected as the ILB, no reschedule IPI will be sent,
      because NOHZ_BALANCE_KICK is already set, and no Idle Load Balance will be
      performed.

      We must then wait for the sched softirq to be raised on CPU 2 by some other
      part of the kernel before NOHZ_BALANCE_KICK gets cleared again.

      The proposed solution clears NOHZ_BALANCE_KICK in scheduler_ipi() if we
      cannot raise the sched softirq for the Idle Load Balance.
      
      Change since V1:
      
      - move the clear of NOHZ_BALANCE_KICK in got_nohz_idle_kick if the ILB
        can't run on this CPU (as suggested by Peter)
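
      A sketch of the resulting check, roughly following got_nohz_idle_kick()
      as the changelog describes it (treat the details as an approximation):

          static inline bool got_nohz_idle_kick(void)
          {
                  int cpu = smp_processor_id();

                  if (!test_bit(NOHZ_BALANCE_KICK, nohz_flags(cpu)))
                          return false;

                  if (idle_cpu(cpu) && !need_resched())
                          return true;

                  /* We can't run the ILB here, so clear the kick for a retry. */
                  clear_bit(NOHZ_BALANCE_KICK, nohz_flags(cpu));
                  return false;
          }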
      Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1370419991-13870-1-git-send-email-vincent.guittot@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      873b4c65
    • perf: Fix mmap() accounting hole · 9bb5d40c
      Committed by Peter Zijlstra
      Vince's fuzzer once again found holes. This time it spotted a leak in
      the locked page accounting.
      
      When an event had redirected output and its close() was the last
      reference to the buffer we didn't have a vm context to undo accounting.
      
      Change the code to destroy the buffer on the last munmap() and detach
      all redirected events at that time. This provides us the right context
      to undo the vm accounting.
      Reported-and-tested-by: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20130604084421.GI8923@twins.programming.kicks-ass.net
      Cc: <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9bb5d40c
    • range: Do not add new blank slot with add_range_with_merge · 05418815
      Committed by Yinghai Lu
      Joshua reported that commit cd7b304dfaf1 (x86, range: fix missing merge
      during add range) broke mtrr cleanup on his setup in 3.9.5.
      The corresponding upstream commit is fbe06b7b.
      
      The reason is that add_range_with_merge could generate a blank spot.

      We can avoid that by searching for the new expanded start/end, so that
      the new range includes all connected ranges in the range array, and then
      adding that expanded start/end to the range array at the end.
      Also move the remaining entries up so that no new blank slot is added to
      the range array.
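
      A sketch of the merge strategy described above, loosely modeled on
      add_range_with_merge() in kernel/range.c (simplified, details
      approximate):

          static int add_range_with_merge_sketch(struct range *range, int az,
                                                 int nr_range, u64 start, u64 end)
          {
                  int i;

                  if (start >= end)
                          return nr_range;

                  /* Expand [start, end) over every connected range. */
                  for (i = 0; i < nr_range; i++) {
                          u64 common_start, common_end;

                          if (!range[i].end)
                                  continue;

                          common_start = max(range[i].start, start);
                          common_end = min(range[i].end, end);
                          if (common_start > common_end)
                                  continue;       /* disjoint, leave it alone */

                          start = min(range[i].start, start);
                          end = max(range[i].end, end);

                          /* Close the gap instead of leaving a blank slot. */
                          memmove(&range[i], &range[i + 1],
                                  (nr_range - (i + 1)) * sizeof(range[i]));
                          range[nr_range - 1].start = 0;
                          range[nr_range - 1].end = 0;
                          nr_range--;
                          i--;
                  }

                  /* Finally add the single expanded range. */
                  return add_range(range, az, nr_range, start, end);
          }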
      
      -v2: move the remaining entries up instead of enhancing add_range()
      -v3: include fix from Joshua about the memmove declaration when
           DYN_DEBUG is used.
      Reported-by: Joshua Covington <joshuacov@googlemail.com>
      Tested-by: Joshua Covington <joshuacov@googlemail.com>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1371154622-8929-3-git-send-email-yinghai@kernel.org
      Cc: <stable@vger.kernel.org> v3.9
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      05418815
  6. 15 Jun 2013, 2 commits
  7. 13 Jun 2013, 5 commits
    • kernel/audit_tree.c:audit_add_tree_rule(): protect `rule' from kill_rules() · 736f3203
      Committed by Chen Gang
      audit_add_tree_rule() must set 'rule->tree = NULL;' first, to protect
      the rule itself from being freed in kill_rules().

      The reason is that by the time it is killed, the 'rule' itself may
      already have been released and we must not access it. One example: we
      add a rule to an inode just as another task is deleting that inode.
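
      A heavily simplified sketch of the ordering this asks for; the helper
      below is hypothetical and only the publication order matters:

          static int audit_add_tree_rule_sketch(struct audit_krule *rule)
          {
                  struct audit_tree *tree;

                  rule->tree = NULL;      /* not yet visible to kill_rules() */

                  tree = lookup_or_alloc_tree(rule);      /* hypothetical helper */
                  if (!tree)
                          return -ENOMEM;

                  /* ... drop audit_filter_mutex, tag mounts, re-take the mutex ... */

                  rule->tree = tree;      /* publish only once fully wired up */
                  return 0;
          }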
      
      The work flow for adding a rule:
      
          audit_receive() -> (need audit_cmd_mutex lock)
            audit_receive_skb() ->
              audit_receive_msg() ->
                audit_receive_filter() ->
                  audit_add_rule() ->
                    audit_add_tree_rule() -> (need audit_filter_mutex lock)
                      ...
                      unlock audit_filter_mutex
                      get_tree()
                      ...
                      iterate_mounts() -> (iterate all related inodes)
                        tag_mount() ->
                          tag_trunk() ->
                            create_trunk() -> (assume it is 1st rule)
                              fsnotify_add_mark() ->
                                fsnotify_add_inode_mark() ->  (add mark to inode->i_fsnotify_marks)
                              ...
                              get_tree(); (each inode will get one)
                      ...
                      lock audit_filter_mutex
      
      The work flow for deleting an inode:
      
          __destroy_inode() ->
           fsnotify_inode_delete() ->
             __fsnotify_inode_delete() ->
              fsnotify_clear_marks_by_inode() ->  (get mark from inode->i_fsnotify_marks)
                fsnotify_destroy_mark() ->
                 fsnotify_destroy_mark_locked() ->
                   audit_tree_freeing_mark() ->
                     evict_chunk() ->
                       ...
                       tree->goner = 1
                       ...
                       kill_rules() ->   (assume current->audit_context == NULL)
                         call_rcu() ->   (rule->tree != NULL)
                           audit_free_rule_rcu() ->
                             audit_free_rule()
                       ...
                       audit_schedule_prune() ->  (assume current->audit_context == NULL)
                         kthread_run() ->    (need audit_cmd_mutex and audit_filter_mutex lock)
                            prune_one() ->    (delete it from prune_list)
                             put_tree(); (match the original get_tree above)
      Signed-off-by: Chen Gang <gang.chen@asianux.com>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      736f3203
    • audit: wait_for_auditd() should use TASK_UNINTERRUPTIBLE · f000cfdd
      Committed by Oleg Nesterov
      audit_log_start() does wait_for_auditd() in a loop until
      audit_backlog_wait_time passes or audit_skb_queue has room.
      
      If signal_pending() is true this becomes a busy-wait loop, schedule() in
      TASK_INTERRUPTIBLE won't block.
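
      A sketch of the fixed wait, assuming the wait_for_auditd() shape
      described in the changelog:

          static void wait_for_auditd(unsigned long sleep_time)
          {
                  DECLARE_WAITQUEUE(wait, current);

                  /* Uninterruptible: a pending signal cannot turn this into
                   * a busy loop. */
                  set_current_state(TASK_UNINTERRUPTIBLE);
                  add_wait_queue(&audit_backlog_wait, &wait);

                  if (audit_backlog_limit &&
                      skb_queue_len(&audit_skb_queue) > audit_backlog_limit)
                          schedule_timeout(sleep_time);

                  __set_current_state(TASK_RUNNING);
                  remove_wait_queue(&audit_backlog_wait, &wait);
          }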
      
      Thanks to Guy for fully investigating and explaining the problem.
      
      (akpm: that'll cause the system to lock up on a non-preemptible
      uniprocessor kernel)
      
      (Guy: "Our customer was in fact running a uniprocessor machine, and they
      reported a system hang.")
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reported-by: Guy Streeter <streeter@redhat.com>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f000cfdd
    • kmsg: honor dmesg_restrict sysctl on /dev/kmsg · 637241a9
      Committed by Kees Cook
      The dmesg_restrict sysctl currently covers the syslog method for
      accessing dmesg, however /dev/kmsg isn't covered by the same protections.
      Most people haven't noticed because older versions of util-linux dmesg(1)
      default to using the syslog method for access. Newer util-linux dmesg(1)
      defaults to reading directly from /dev/kmsg.
      
      To fix /dev/kmsg, let's compare the existing interfaces and what they
      allow:
      
       - /proc/kmsg allows:
        - open (SYSLOG_ACTION_OPEN) if CAP_SYSLOG since it uses a destructive
          single-reader interface (SYSLOG_ACTION_READ).
        - everything, after an open.
      
       - syslog syscall allows:
        - anything, if CAP_SYSLOG.
        - SYSLOG_ACTION_READ_ALL and SYSLOG_ACTION_SIZE_BUFFER, if
          dmesg_restrict==0.
        - nothing else (EPERM).
      
      The use-cases were:
       - dmesg(1) needs to do non-destructive SYSLOG_ACTION_READ_ALLs.
       - sysklog(1) needs to open /proc/kmsg, drop privs, and still issue the
         destructive SYSLOG_ACTION_READs.
      
      AIUI, dmesg(1) is moving to /dev/kmsg, and systemd-journald doesn't
      clear the ring buffer.
      
      Based on the comments in devkmsg_llseek, it sounds like actions besides
      reading aren't going to be supported by /dev/kmsg (i.e.
      SYSLOG_ACTION_CLEAR), so we have a strict subset of the non-destructive
      syslog syscall actions.
      
      To this end, move the check as Josh had done, but also rename the
      constants to reflect their new uses (SYSLOG_FROM_CALL becomes
      SYSLOG_FROM_READER, and SYSLOG_FROM_FILE becomes SYSLOG_FROM_PROC).
      SYSLOG_FROM_READER allows non-destructive actions, and SYSLOG_FROM_PROC
      allows destructive actions after a capabilities-constrained
      SYSLOG_ACTION_OPEN check.
      
       - /dev/kmsg allows:
        - open if CAP_SYSLOG or dmesg_restrict==0
        - reading/polling, after open
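
      A sketch of the resulting open-time check, assuming the renamed
      SYSLOG_FROM_READER constant and the existing check_syslog_permissions()
      helper; error handling and the per-open state setup are elided:

          static int devkmsg_open(struct inode *inode, struct file *file)
          {
                  int err;

                  /* Write-only opens only append and need no read permission. */
                  if ((file->f_flags & O_ACCMODE) == O_WRONLY)
                          return 0;

                  err = check_syslog_permissions(SYSLOG_ACTION_READ_ALL,
                                                 SYSLOG_FROM_READER);
                  if (err)
                          return err;

                  /* ... allocate and attach the devkmsg_user state as before ... */
                  return 0;
          }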
      
      Addresses https://bugzilla.redhat.com/show_bug.cgi?id=903192
      
      [akpm@linux-foundation.org: use pr_warn_once()]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reported-by: Christian Kujau <lists@nerdbynature.de>
      Tested-by: Josh Boyer <jwboyer@redhat.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      637241a9
    • reboot: migrate shutdown/reboot to boot cpu · cf7df378
      Committed by Robin Holt
      We recently noticed that reboot of a 1024 cpu machine takes approx 16
      minutes of just stopping the cpus.  The slowdown was tracked to commit
      f96972f2 ("kernel/sys.c: call disable_nonboot_cpus() in
      kernel_restart()").
      
      The current implementation does all the work of hot removing the cpus
      before halting the system.  We are switching to just migrating to the
      boot cpu and then continuing with shutdown/reboot.
      
      This also has the effect of not breaking x86's command line parameter
      for specifying the reboot cpu.  Note, this code was shamelessly copied
      from arch/x86/kernel/reboot.c with bits removed pertaining to the
      reboot_cpu command line parameter.
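
      A sketch of the approach, assuming the generic hotplug-disable helper
      from the following commit in this list; the exact placement and the
      affinity-pinning details are approximations:

          #include <linux/cpu.h>
          #include <linux/sched.h>

          static void migrate_to_reboot_cpu(void)
          {
                  int cpu = 0;    /* the boot cpu is always logical cpu 0 */

                  /* Keep hotplug quiet while we shut down. */
                  cpu_hotplug_disable();

                  /* Make certain the cpu we reboot on is online. */
                  if (!cpu_online(cpu))
                          cpu = cpumask_first(cpu_online_mask);

                  /* Make certain we only run on that processor. */
                  set_cpus_allowed_ptr(current, cpumask_of(cpu));
          }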
      Signed-off-by: Robin Holt <holt@sgi.com>
      Tested-by: Shawn Guo <shawn.guo@linaro.org>
      Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Robin Holt <holt@sgi.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cf7df378
    • CPU hotplug: provide a generic helper to disable/enable CPU hotplug · 16e53dbf
      Committed by Srivatsa S. Bhat
      There are instances in the kernel where we would like to disable CPU
      hotplug (from sysfs) during some important operation.  Today the freezer
      code depends on this and the code to do it was kinda tailor-made for
      that.
      
      Restructure the code and make it generic enough to be useful for other
      usecases too.
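
      A minimal sketch of such helpers, assuming the existing
      cpu_maps_update_begin()/done() locking and the cpu_hotplug_disabled flag
      already consulted by the sysfs online path:

          void cpu_hotplug_disable(void)
          {
                  cpu_maps_update_begin();
                  cpu_hotplug_disabled = 1;
                  cpu_maps_update_done();
          }

          void cpu_hotplug_enable(void)
          {
                  cpu_maps_update_begin();
                  cpu_hotplug_disabled = 0;
                  cpu_maps_update_done();
          }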
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Robin Holt <holt@sgi.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Robin Holt <holt@sgi.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Shawn Guo <shawn.guo@linaro.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      16e53dbf
  8. 12 Jun 2013, 2 commits
    • idle: Add the stack canary init to cpu_startup_entry() · d7880812
      Committed by Thomas Gleixner
      Moving x86 to the generic idle implementation (commit 7d1a9417 "x86:
      Use generic idle loop") wrecked the stack protector.
      
      I stupidly missed that boot_init_stack_canary() must be inlined from a
      function which never returns, but I put that call into
      arch_cpu_idle_prepare() which of course returns.
      
      I pondered to play tricks with arch_cpu_idle_prepare() first, but then
      I noticed, that the other archs which have implemented the
      stackprotector (ARM and SH) do not initialize the canary for the
      non-boot cpus.
      
      So I decided to move the boot_init_stack_canary() call into
      cpu_startup_entry() ifdeffed with CONFIG_X86 for now. This #ifdef
      is just a temporary measure as I don't want to inflict the
      boot_init_stack_canary() call on ARM and SH that late in the cycle.
      
      I'll queue a patch for 3.11 which removes the #ifdef if the ARM/SH
      maintainers have no objection.
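
      A sketch of the resulting entry point (other setup in the function is
      elided):

          void cpu_startup_entry(enum cpuhp_state state)
          {
          #ifdef CONFIG_X86
                  /*
                   * Temporary: the canary must be initialised from a function
                   * that never returns; see the changelog above.
                   */
                  boot_init_stack_canary();
          #endif
                  arch_cpu_idle_prepare();
                  cpu_idle_loop();
          }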
      Reported-by: Wouter van Kesteren <woutershep@gmail.com>
      Cc: x86@kernel.org
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      d7880812
    • tracing: Fix outputting formats of x86-tsc and counter when using trace_clock · 58e8eedf
      Committed by Yoshihiro YUNOMAE
      The output formats of the x86-tsc and counter clocks should be raw, but
      after applying the patch (2b6080f2) the format was changed to nanoseconds.
      This is because the global variable trace_clock_id was used. When multiple
      buffers are in use, the clock_id of each sub-buffer should be used instead.
      This patch therefore uses tr->clock_id instead of the global variable
      trace_clock_id.
      
      [ Basically, this fixes a regression where the multibuffer code changed the
        trace_clock file to update tr->clock_id but the traces still use the old
        global trace_clock_id variable, negating the file's effect. The global
        trace_clock_id variable is obsolete and removed. - SR ]
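
      The gist of the change, shown here as a small illustrative wrapper
      (whether the output paths open-code this or use a helper is not shown):

          static bool trace_clock_in_ns(struct trace_array *tr)
          {
                  /* x86-tsc and counter are printed raw, not as nanoseconds. */
                  return trace_clocks[tr->clock_id].in_ns;
          }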
      
      Link: http://lkml.kernel.org/r/20130423013239.22334.7394.stgit@yunodevel
      Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      58e8eedf
  9. 11 Jun 2013, 3 commits
    • Fix lockup related to stop_machine being stuck in __do_softirq. · 34376a50
      Committed by Ben Greear
      The stop machine logic can lock up if all but one of the migration
      threads make it through the disable-irq step and the one remaining
      thread gets stuck in __do_softirq.  The reason __do_softirq can hang is
      that it has a bail-out based on jiffies timeout, but in the lockup case,
      jiffies itself is not incremented.
      
      To work around this, re-add the max_restart counter in __do_softirq() and
      stop processing irqs after 10 restarts.
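
      A sketch of the bail-out logic with the restart cap back in place
      (constants and the surrounding handler loop abbreviated):

          #define MAX_SOFTIRQ_RESTART 10

          asmlinkage void __do_softirq(void)
          {
                  unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
                  int max_restart = MAX_SOFTIRQ_RESTART;
                  __u32 pending;

          restart:
                  pending = local_softirq_pending();
                  /* ... run the pending softirq handlers ... */

                  pending = local_softirq_pending();
                  if (pending) {
                          /* The time limit alone is useless if jiffies is stuck. */
                          if (time_before(jiffies, end) && !need_resched() &&
                              --max_restart)
                                  goto restart;

                          wakeup_softirqd();
                  }
          }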
      
      Thanks to Tejun Heo and Rusty Russell and others for helping me track
      this down.
      
      This was introduced in 3.9 by commit c10d7367 ("softirq: reduce
      latencies").
      
      It may be worth looking into ath9k to see if it has issues with its irq
      handler at a later date.
      
      The hang stack traces look something like this:
      
          ------------[ cut here ]------------
          WARNING: at kernel/watchdog.c:245 watchdog_overflow_callback+0x9c/0xa7()
          Watchdog detected hard LOCKUP on cpu 2
          Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
          Pid: 23, comm: migration/2 Tainted: G         C   3.9.4+ #11
          Call Trace:
           <NMI>   warn_slowpath_common+0x85/0x9f
            warn_slowpath_fmt+0x46/0x48
            watchdog_overflow_callback+0x9c/0xa7
            __perf_event_overflow+0x137/0x1cb
            perf_event_overflow+0x14/0x16
            intel_pmu_handle_irq+0x2dc/0x359
            perf_event_nmi_handler+0x19/0x1b
            nmi_handle+0x7f/0xc2
            do_nmi+0xbc/0x304
            end_repeat_nmi+0x1e/0x2e
           <<EOE>>
            cpu_stopper_thread+0xae/0x162
            smpboot_thread_fn+0x258/0x260
            kthread+0xc7/0xcf
            ret_from_fork+0x7c/0xb0
          ---[ end trace 4947dfa9b0a4cec3 ]---
          BUG: soft lockup - CPU#1 stuck for 22s! [migration/1:17]
          Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
          irq event stamp: 835637905
          hardirqs last  enabled at (835637904): __do_softirq+0x9f/0x257
          hardirqs last disabled at (835637905): apic_timer_interrupt+0x6d/0x80
          softirqs last  enabled at (5654720): __do_softirq+0x1ff/0x257
          softirqs last disabled at (5654725): irq_exit+0x5f/0xbb
          CPU 1
          Pid: 17, comm: migration/1 Tainted: G        WC   3.9.4+ #11 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
          RIP: tasklet_hi_action+0xf0/0xf0
          Process migration/1
          Call Trace:
           <IRQ>
            __do_softirq+0x117/0x257
            irq_exit+0x5f/0xbb
            smp_apic_timer_interrupt+0x8a/0x98
            apic_timer_interrupt+0x72/0x80
           <EOI>
            printk+0x4d/0x4f
            stop_machine_cpu_stop+0x22c/0x274
            cpu_stopper_thread+0xae/0x162
            smpboot_thread_fn+0x258/0x260
            kthread+0xc7/0xcf
            ret_from_fork+0x7c/0xb0
      Signed-off-by: Ben Greear <greearb@candelatech.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Pekka Riikonen <priikone@iki.fi>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      34376a50
    • rcu: Fix deadlock with CPU hotplug, RCU GP init, and timer migration · 971394f3
      Committed by Paul E. McKenney
      In Steven Rostedt's words:
      
      > I've been debugging the last couple of days why my tests have been
      > locking up. One of my tracing tests, runs all available tracers. The
      > lockup always happened with the mmiotrace, which is used to trace
      > interactions between priority drivers and the kernel. But to do this
      > easily, when the tracer gets registered, it disables all but the boot
      > CPUs. The lockup always happened after it got done disabling the CPUs.
      >
      > Then I decided to try this:
      >
      > while :; do
      > 	for i in 1 2 3; do
      > 		echo 0 > /sys/devices/system/cpu/cpu$i/online
      > 	done
      > 	for i in 1 2 3; do
      > 		echo 1 > /sys/devices/system/cpu/cpu$i/online
      > 	done
      > done
      >
      > Well, sure enough, that locked up too, with the same users. Doing a
      > sysrq-w (showing all blocked tasks):
      >
      > [ 2991.344562]   task                        PC stack   pid father
      > [ 2991.344562] rcu_preempt     D ffff88007986fdf8     0    10      2 0x00000000
      > [ 2991.344562]  ffff88007986fc98 0000000000000002 ffff88007986fc48 0000000000000908
      > [ 2991.344562]  ffff88007986c280 ffff88007986ffd8 ffff88007986ffd8 00000000001d3c80
      > [ 2991.344562]  ffff880079248a40 ffff88007986c280 0000000000000000 00000000fffd4295
      > [ 2991.344562] Call Trace:
      > [ 2991.344562]  [<ffffffff815437ba>] schedule+0x64/0x66
      > [ 2991.344562]  [<ffffffff81541750>] schedule_timeout+0xbc/0xf9
      > [ 2991.344562]  [<ffffffff8154bec0>] ? ftrace_call+0x5/0x2f
      > [ 2991.344562]  [<ffffffff81049513>] ? cascade+0xa8/0xa8
      > [ 2991.344562]  [<ffffffff815417ab>] schedule_timeout_uninterruptible+0x1e/0x20
      > [ 2991.344562]  [<ffffffff810c980c>] rcu_gp_kthread+0x502/0x94b
      > [ 2991.344562]  [<ffffffff81062791>] ? __init_waitqueue_head+0x50/0x50
      > [ 2991.344562]  [<ffffffff810c930a>] ? rcu_gp_fqs+0x64/0x64
      > [ 2991.344562]  [<ffffffff81061cdb>] kthread+0xb1/0xb9
      > [ 2991.344562]  [<ffffffff81091e31>] ? lock_release_holdtime.part.23+0x4e/0x55
      > [ 2991.344562]  [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58
      > [ 2991.344562]  [<ffffffff8154c1dc>] ret_from_fork+0x7c/0xb0
      > [ 2991.344562]  [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58
      > [ 2991.344562] kworker/0:1     D ffffffff81a30680     0    47      2 0x00000000
      > [ 2991.344562] Workqueue: events cpuset_hotplug_workfn
      > [ 2991.344562]  ffff880078dbbb58 0000000000000002 0000000000000006 00000000000000d8
      > [ 2991.344562]  ffff880078db8100 ffff880078dbbfd8 ffff880078dbbfd8 00000000001d3c80
      > [ 2991.344562]  ffff8800779ca5c0 ffff880078db8100 ffffffff81541fcf 0000000000000000
      > [ 2991.344562] Call Trace:
      > [ 2991.344562]  [<ffffffff81541fcf>] ? __mutex_lock_common+0x3d4/0x609
      > [ 2991.344562]  [<ffffffff815437ba>] schedule+0x64/0x66
      > [ 2991.344562]  [<ffffffff81543a39>] schedule_preempt_disabled+0x18/0x24
      > [ 2991.344562]  [<ffffffff81541fcf>] __mutex_lock_common+0x3d4/0x609
      > [ 2991.344562]  [<ffffffff8103d11b>] ? get_online_cpus+0x3c/0x50
      > [ 2991.344562]  [<ffffffff8103d11b>] ? get_online_cpus+0x3c/0x50
      > [ 2991.344562]  [<ffffffff815422ff>] mutex_lock_nested+0x3b/0x40
      > [ 2991.344562]  [<ffffffff8103d11b>] get_online_cpus+0x3c/0x50
      > [ 2991.344562]  [<ffffffff810af7e6>] rebuild_sched_domains_locked+0x6e/0x3a8
      > [ 2991.344562]  [<ffffffff810b0ec6>] rebuild_sched_domains+0x1c/0x2a
      > [ 2991.344562]  [<ffffffff810b109b>] cpuset_hotplug_workfn+0x1c7/0x1d3
      > [ 2991.344562]  [<ffffffff810b0ed9>] ? cpuset_hotplug_workfn+0x5/0x1d3
      > [ 2991.344562]  [<ffffffff81058e07>] process_one_work+0x2d4/0x4d1
      > [ 2991.344562]  [<ffffffff81058d3a>] ? process_one_work+0x207/0x4d1
      > [ 2991.344562]  [<ffffffff8105964c>] worker_thread+0x2e7/0x3b5
      > [ 2991.344562]  [<ffffffff81059365>] ? rescuer_thread+0x332/0x332
      > [ 2991.344562]  [<ffffffff81061cdb>] kthread+0xb1/0xb9
      > [ 2991.344562]  [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58
      > [ 2991.344562]  [<ffffffff8154c1dc>] ret_from_fork+0x7c/0xb0
      > [ 2991.344562]  [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58
      > [ 2991.344562] bash            D ffffffff81a4aa80     0  2618   2612 0x10000000
      > [ 2991.344562]  ffff8800379abb58 0000000000000002 0000000000000006 0000000000000c2c
      > [ 2991.344562]  ffff880077fea140 ffff8800379abfd8 ffff8800379abfd8 00000000001d3c80
      > [ 2991.344562]  ffff8800779ca5c0 ffff880077fea140 ffffffff81541fcf 0000000000000000
      > [ 2991.344562] Call Trace:
      > [ 2991.344562]  [<ffffffff81541fcf>] ? __mutex_lock_common+0x3d4/0x609
      > [ 2991.344562]  [<ffffffff815437ba>] schedule+0x64/0x66
      > [ 2991.344562]  [<ffffffff81543a39>] schedule_preempt_disabled+0x18/0x24
      > [ 2991.344562]  [<ffffffff81541fcf>] __mutex_lock_common+0x3d4/0x609
      > [ 2991.344562]  [<ffffffff81530078>] ? rcu_cpu_notify+0x2f5/0x86e
      > [ 2991.344562]  [<ffffffff81530078>] ? rcu_cpu_notify+0x2f5/0x86e
      > [ 2991.344562]  [<ffffffff815422ff>] mutex_lock_nested+0x3b/0x40
      > [ 2991.344562]  [<ffffffff81530078>] rcu_cpu_notify+0x2f5/0x86e
      > [ 2991.344562]  [<ffffffff81091c99>] ? __lock_is_held+0x32/0x53
      > [ 2991.344562]  [<ffffffff81548912>] notifier_call_chain+0x6b/0x98
      > [ 2991.344562]  [<ffffffff810671fd>] __raw_notifier_call_chain+0xe/0x10
      > [ 2991.344562]  [<ffffffff8103cf64>] __cpu_notify+0x20/0x32
      > [ 2991.344562]  [<ffffffff8103cf8d>] cpu_notify_nofail+0x17/0x36
      > [ 2991.344562]  [<ffffffff815225de>] _cpu_down+0x154/0x259
      > [ 2991.344562]  [<ffffffff81522710>] cpu_down+0x2d/0x3a
      > [ 2991.344562]  [<ffffffff81526351>] store_online+0x4e/0xe7
      > [ 2991.344562]  [<ffffffff8134d764>] dev_attr_store+0x20/0x22
      > [ 2991.344562]  [<ffffffff811b3c5f>] sysfs_write_file+0x108/0x144
      > [ 2991.344562]  [<ffffffff8114c5ef>] vfs_write+0xfd/0x158
      > [ 2991.344562]  [<ffffffff8114c928>] SyS_write+0x5c/0x83
      > [ 2991.344562]  [<ffffffff8154c494>] tracesys+0xdd/0xe2
      >
      > As well as held locks:
      >
      > [ 3034.728033] Showing all locks held in the system:
      > [ 3034.728033] 1 lock held by rcu_preempt/10:
      > [ 3034.728033]  #0:  (rcu_preempt_state.onoff_mutex){+.+...}, at: [<ffffffff810c9471>] rcu_gp_kthread+0x167/0x94b
      > [ 3034.728033] 4 locks held by kworker/0:1/47:
      > [ 3034.728033]  #0:  (events){.+.+.+}, at: [<ffffffff81058d3a>] process_one_work+0x207/0x4d1
      > [ 3034.728033]  #1:  (cpuset_hotplug_work){+.+.+.}, at: [<ffffffff81058d3a>] process_one_work+0x207/0x4d1
      > [ 3034.728033]  #2:  (cpuset_mutex){+.+.+.}, at: [<ffffffff810b0ec1>] rebuild_sched_domains+0x17/0x2a
      > [ 3034.728033]  #3:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8103d11b>] get_online_cpus+0x3c/0x50
      > [ 3034.728033] 1 lock held by mingetty/2563:
      > [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
      > [ 3034.728033] 1 lock held by mingetty/2565:
      > [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
      > [ 3034.728033] 1 lock held by mingetty/2569:
      > [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
      > [ 3034.728033] 1 lock held by mingetty/2572:
      > [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
      > [ 3034.728033] 1 lock held by mingetty/2575:
      > [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
      > [ 3034.728033] 7 locks held by bash/2618:
      > [ 3034.728033]  #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff8114bc3f>] file_start_write+0x2a/0x2c
      > [ 3034.728033]  #1:  (&buffer->mutex#2){+.+.+.}, at: [<ffffffff811b3b93>] sysfs_write_file+0x3c/0x144
      > [ 3034.728033]  #2:  (s_active#54){.+.+.+}, at: [<ffffffff811b3c3e>] sysfs_write_file+0xe7/0x144
      > [ 3034.728033]  #3:  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff810217c2>] cpu_hotplug_driver_lock+0x17/0x19
      > [ 3034.728033]  #4:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff8103d196>] cpu_maps_update_begin+0x17/0x19
      > [ 3034.728033]  #5:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8103cfd8>] cpu_hotplug_begin+0x2c/0x6d
      > [ 3034.728033]  #6:  (rcu_preempt_state.onoff_mutex){+.+...}, at: [<ffffffff81530078>] rcu_cpu_notify+0x2f5/0x86e
      > [ 3034.728033] 1 lock held by bash/2980:
      > [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
      >
      > Things looked a little weird. Also, this is a deadlock that lockdep did
      > not catch. But what we have here does not look like a circular lock
      > issue:
      >
      > Bash is blocked in rcu_cpu_notify():
      >
      > 1961		/* Exclude any attempts to start a new grace period. */
      > 1962		mutex_lock(&rsp->onoff_mutex);
      >
      >
      > kworker is blocked in get_online_cpus(), which makes sense as we are
      > currently taking down a CPU.
      >
      > But rcu_preempt is not blocked on anything. It is simply sleeping in
      > rcu_gp_kthread (really rcu_gp_init) here:
      >
      > 1453	#ifdef CONFIG_PROVE_RCU_DELAY
      > 1454			if ((prandom_u32() % (rcu_num_nodes * 8)) == 0 &&
      > 1455			    system_state == SYSTEM_RUNNING)
      > 1456				schedule_timeout_uninterruptible(2);
      > 1457	#endif /* #ifdef CONFIG_PROVE_RCU_DELAY */
      >
      > And it does this while holding the onoff_mutex that bash is waiting for.
      >
      > Doing a function trace, it showed me where it happened:
      >
      > [  125.940066] rcu_pree-10      3.... 28384115273: schedule_timeout_uninterruptible <-rcu_gp_kthread
      > [...]
      > [  125.940066] rcu_pree-10      3d..3 28384202439: sched_switch: prev_comm=rcu_preempt prev_pid=10 prev_prio=120 prev_state=D ==> next_comm=watchdog/3 next_pid=38 next_prio=120
      >
      > The watchdog ran, and then:
      >
      > [  125.940066] watchdog-38      3d..3 28384692863: sched_switch: prev_comm=watchdog/3 prev_pid=38 prev_prio=120 prev_state=P ==> next_comm=modprobe next_pid=2848 next_prio=118
      >
      > Not sure what modprobe was doing, but shortly after that:
      >
      > [  125.940066] modprobe-2848    3d..3 28385041749: sched_switch: prev_comm=modprobe prev_pid=2848 prev_prio=118 prev_state=R+ ==> next_comm=migration/3 next_pid=40 next_prio=0
      >
      > Where the migration thread took down the CPU:
      >
      > [  125.940066] migratio-40      3d..3 28389148276: sched_switch: prev_comm=migration/3 prev_pid=40 prev_prio=0 prev_state=P ==> next_comm=swapper/3 next_pid=0 next_prio=120
      >
      > which finally did:
      >
      > [  125.940066]   <idle>-0       3...1 28389282142: arch_cpu_idle_dead <-cpu_startup_entry
      > [  125.940066]   <idle>-0       3...1 28389282548: native_play_dead <-arch_cpu_idle_dead
      > [  125.940066]   <idle>-0       3...1 28389282924: play_dead_common <-native_play_dead
      > [  125.940066]   <idle>-0       3...1 28389283468: idle_task_exit <-play_dead_common
      > [  125.940066]   <idle>-0       3...1 28389284644: amd_e400_remove_cpu <-play_dead_common
      >
      >
      > CPU 3 is now offline, the rcu_preempt thread that ran on CPU 3 is still
      > doing a schedule_timeout_uninterruptible() and it registered it's
      > timeout to the timer base for CPU 3. You would think that it would get
      > migrated right? The issue here is that the timer migration happens at
      > the CPU notifier for CPU_DEAD. The problem is that the rcu notifier for
      > CPU_DOWN is blocked waiting for the onoff_mutex to be released, which is
      > held by the thread that just put itself into a uninterruptible sleep,
      > that wont wake up until the CPU_DEAD notifier of the timer
      > infrastructure is called, which wont happen until the rcu notifier
      > finishes. Here's our deadlock!
      
      This commit breaks this deadlock cycle by substituting a shorter udelay()
      for the previous schedule_timeout_uninterruptible(), while at the same
      time increasing the probability of the delay.  This maintains the intensity
      of the testing.
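
      A sketch of the change at the CONFIG_PROVE_RCU_DELAY test point in
      rcu_gp_init(); the constants here are illustrative, the point is
      busy-waiting instead of sleeping:

          #ifdef CONFIG_PROVE_RCU_DELAY
                  if ((prandom_u32() % (rcu_num_nodes * 4)) == 0 &&
                      system_state == SYSTEM_RUNNING)
                          udelay(200);  /* was: schedule_timeout_uninterruptible(2) */
          #endif /* #ifdef CONFIG_PROVE_RCU_DELAY */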
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Steven Rostedt <rostedt@goodmis.org>
      971394f3
    • rcu: Don't call wakeup() with rcu_node structure ->lock held · 016a8d5b
      Committed by Steven Rostedt
      This commit fixes a lockdep-detected deadlock by moving a wake_up()
      call out from a rnp->lock critical section.  Please see below for
      the long version of this story.
      
      On Tue, 2013-05-28 at 16:13 -0400, Dave Jones wrote:
      
      > [12572.705832] ======================================================
      > [12572.750317] [ INFO: possible circular locking dependency detected ]
      > [12572.796978] 3.10.0-rc3+ #39 Not tainted
      > [12572.833381] -------------------------------------------------------
      > [12572.862233] trinity-child17/31341 is trying to acquire lock:
      > [12572.870390]  (rcu_node_0){..-.-.}, at: [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
      > [12572.878859]
      > but task is already holding lock:
      > [12572.894894]  (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0
      > [12572.903381]
      > which lock already depends on the new lock.
      >
      > [12572.927541]
      > the existing dependency chain (in reverse order) is:
      > [12572.943736]
      > -> #4 (&ctx->lock){-.-...}:
      > [12572.960032]        [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
      > [12572.968337]        [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
      > [12572.976633]        [<ffffffff8113c987>] __perf_event_task_sched_out+0x2e7/0x5e0
      > [12572.984969]        [<ffffffff81088953>] perf_event_task_sched_out+0x93/0xa0
      > [12572.993326]        [<ffffffff816ea0bf>] __schedule+0x2cf/0x9c0
      > [12573.001652]        [<ffffffff816eacfe>] schedule_user+0x2e/0x70
      > [12573.009998]        [<ffffffff816ecd64>] retint_careful+0x12/0x2e
      > [12573.018321]
      > -> #3 (&rq->lock){-.-.-.}:
      > [12573.034628]        [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
      > [12573.042930]        [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
      > [12573.051248]        [<ffffffff8108e6a7>] wake_up_new_task+0xb7/0x260
      > [12573.059579]        [<ffffffff810492f5>] do_fork+0x105/0x470
      > [12573.067880]        [<ffffffff81049686>] kernel_thread+0x26/0x30
      > [12573.076202]        [<ffffffff816cee63>] rest_init+0x23/0x140
      > [12573.084508]        [<ffffffff81ed8e1f>] start_kernel+0x3f1/0x3fe
      > [12573.092852]        [<ffffffff81ed856f>] x86_64_start_reservations+0x2a/0x2c
      > [12573.101233]        [<ffffffff81ed863d>] x86_64_start_kernel+0xcc/0xcf
      > [12573.109528]
      > -> #2 (&p->pi_lock){-.-.-.}:
      > [12573.125675]        [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
      > [12573.133829]        [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90
      > [12573.141964]        [<ffffffff8108e881>] try_to_wake_up+0x31/0x320
      > [12573.150065]        [<ffffffff8108ebe2>] default_wake_function+0x12/0x20
      > [12573.158151]        [<ffffffff8107bbf8>] autoremove_wake_function+0x18/0x40
      > [12573.166195]        [<ffffffff81085398>] __wake_up_common+0x58/0x90
      > [12573.174215]        [<ffffffff81086909>] __wake_up+0x39/0x50
      > [12573.182146]        [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50
      > [12573.190119]        [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0
      > [12573.198023]        [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930
      > [12573.205860]        [<ffffffff8107a91d>] kthread+0xed/0x100
      > [12573.213656]        [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0
      > [12573.221379]
      > -> #1 (&rsp->gp_wq){..-.-.}:
      > [12573.236329]        [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
      > [12573.243783]        [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90
      > [12573.251178]        [<ffffffff810868f3>] __wake_up+0x23/0x50
      > [12573.258505]        [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50
      > [12573.265891]        [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0
      > [12573.273248]        [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930
      > [12573.280564]        [<ffffffff8107a91d>] kthread+0xed/0x100
      > [12573.287807]        [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0
      
      Notice the above call chain.
      
      rcu_start_future_gp() is called with the rnp->lock held. Then it calls
      rcu_start_gp_advanced(), which does a wakeup.
      
      You can't do wakeups while holding the rnp->lock, as that would mean
      that you could not do a rcu_read_unlock() while holding the rq lock, or
      any lock that was taken while holding the rq lock. This is because...
      (See below).
      
      > [12573.295067]
      > -> #0 (rcu_node_0){..-.-.}:
      > [12573.309293]        [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0
      > [12573.316568]        [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
      > [12573.323825]        [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
      > [12573.331081]        [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
      > [12573.338377]        [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0
      > [12573.345648]        [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0
      > [12573.352942]        [<ffffffff8113938e>] find_get_context+0x4e/0x1f0
      > [12573.360211]        [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0
      > [12573.367514]        [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10
      > [12573.374816]        [<ffffffff816f4dd4>] tracesys+0xdd/0xe2
      
      Notice the above trace.
      
      perf took its own ctx->lock, which can be taken while holding the rq
      lock. While holding this lock, it did a rcu_read_unlock(). The
      perf_lock_task_context() basically looks like:
      
      rcu_read_lock();
      raw_spin_lock(ctx->lock);
      rcu_read_unlock();
      
      Now, what looks to have happened, is that we scheduled after taking that
      first rcu_read_lock() but before taking the spin lock. When we scheduled
      back in and took the ctx->lock, the following rcu_read_unlock()
      triggered the "special" code.
      
      The rcu_read_unlock_special() takes the rnp->lock, which gives us a
      possible deadlock scenario.
      
      	CPU0		CPU1		CPU2
      	----		----		----
      
      				     rcu_nocb_kthread()
          lock(rq->lock);
      		    lock(ctx->lock);
      				     lock(rnp->lock);
      
      				     wake_up();
      
      				     lock(rq->lock);
      
      		    rcu_read_unlock();
      
      		    rcu_read_unlock_special();
      
      		    lock(rnp->lock);
          lock(ctx->lock);
      
      **** DEADLOCK ****
      
      > [12573.382068]
      > other info that might help us debug this:
      >
      > [12573.403229] Chain exists of:
      >   rcu_node_0 --> &rq->lock --> &ctx->lock
      >
      > [12573.424471]  Possible unsafe locking scenario:
      >
      > [12573.438499]        CPU0                    CPU1
      > [12573.445599]        ----                    ----
      > [12573.452691]   lock(&ctx->lock);
      > [12573.459799]                                lock(&rq->lock);
      > [12573.467010]                                lock(&ctx->lock);
      > [12573.474192]   lock(rcu_node_0);
      > [12573.481262]
      >  *** DEADLOCK ***
      >
      > [12573.501931] 1 lock held by trinity-child17/31341:
      > [12573.508990]  #0:  (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0
      > [12573.516475]
      > stack backtrace:
      > [12573.530395] CPU: 1 PID: 31341 Comm: trinity-child17 Not tainted 3.10.0-rc3+ #39
      > [12573.545357]  ffffffff825b4f90 ffff880219f1dbc0 ffffffff816e375b ffff880219f1dc00
      > [12573.552868]  ffffffff816dfa5d ffff880219f1dc50 ffff88023ce4d1f8 ffff88023ce4ca40
      > [12573.560353]  0000000000000001 0000000000000001 ffff88023ce4d1f8 ffff880219f1dcc0
      > [12573.567856] Call Trace:
      > [12573.575011]  [<ffffffff816e375b>] dump_stack+0x19/0x1b
      > [12573.582284]  [<ffffffff816dfa5d>] print_circular_bug+0x200/0x20f
      > [12573.589637]  [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0
      > [12573.596982]  [<ffffffff810918f5>] ? sched_clock_cpu+0xb5/0x100
      > [12573.604344]  [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
      > [12573.611652]  [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0
      > [12573.619030]  [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
      > [12573.626331]  [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0
      > [12573.633671]  [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
      > [12573.640992]  [<ffffffff811390ed>] ? perf_lock_task_context+0x7d/0x2d0
      > [12573.648330]  [<ffffffff810b429e>] ? put_lock_stats.isra.29+0xe/0x40
      > [12573.655662]  [<ffffffff813095a0>] ? delay_tsc+0x90/0xe0
      > [12573.662964]  [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0
      > [12573.670276]  [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0
      > [12573.677622]  [<ffffffff81139070>] ? __perf_event_enable+0x370/0x370
      > [12573.684981]  [<ffffffff8113938e>] find_get_context+0x4e/0x1f0
      > [12573.692358]  [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0
      > [12573.699753]  [<ffffffff8108cd9d>] ? get_parent_ip+0xd/0x50
      > [12573.707135]  [<ffffffff810b71fd>] ? trace_hardirqs_on_caller+0xfd/0x1c0
      > [12573.714599]  [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10
      > [12573.721996]  [<ffffffff816f4dd4>] tracesys+0xdd/0xe2
      
      This commit delays the wakeup via irq_work(), which is what
      perf and ftrace use to perform wakeups in critical sections.
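
      A sketch of the deferral, assuming an irq_work member (wakeup_work)
      embedded in the rcu_state structure; the second function stands in for
      the former wake_up() call site:

          #include <linux/irq_work.h>

          static void rsp_wakeup(struct irq_work *work)
          {
                  struct rcu_state *rsp = container_of(work, struct rcu_state,
                                                       wakeup_work);

                  wake_up(&rsp->gp_wq);   /* no rcu_node ->lock held here */
          }

          static void rcu_gp_kick_sketch(struct rcu_state *rsp)
          {
                  /* Called with rnp->lock held: defer the wakeup to irq_work. */
                  irq_work_queue(&rsp->wakeup_work);
          }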
      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      016a8d5b
  10. 09 Jun 2013, 3 commits
  11. 07 Jun 2013, 1 commit
    • tracing: Use current_uid() for critical time tracing · f17a5194
      Committed by Steven Rostedt (Red Hat)
      The irqsoff tracer records the max time that interrupts are disabled.
      There are hooks in the assembly code that calls back into the tracer when
      interrupts are disabled or enabled.
      
      When they are enabled, the tracer checks if the amount of time they
      were disabled is larger than the previous recorded max interrupts off
      time. If it is, it creates a snapshot of the currently running trace
      to store where the last largest interrupts off time was held and how
      it happened.
      
      During testing, this RCU lockdep dump appeared:
      
      [ 1257.829021] ===============================
      [ 1257.829021] [ INFO: suspicious RCU usage. ]
      [ 1257.829021] 3.10.0-rc1-test+ #171 Tainted: G        W
      [ 1257.829021] -------------------------------
      [ 1257.829021] /home/rostedt/work/git/linux-trace.git/include/linux/rcupdate.h:780 rcu_read_lock() used illegally while idle!
      [ 1257.829021]
      [ 1257.829021] other info that might help us debug this:
      [ 1257.829021]
      [ 1257.829021]
      [ 1257.829021] RCU used illegally from idle CPU!
      [ 1257.829021] rcu_scheduler_active = 1, debug_locks = 0
      [ 1257.829021] RCU used illegally from extended quiescent state!
      [ 1257.829021] 2 locks held by trace-cmd/4831:
      [ 1257.829021]  #0:  (max_trace_lock){......}, at: [<ffffffff810e2b77>] stop_critical_timing+0x1a3/0x209
      [ 1257.829021]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff810dae5a>] __update_max_tr+0x88/0x1ee
      [ 1257.829021]
      [ 1257.829021] stack backtrace:
      [ 1257.829021] CPU: 3 PID: 4831 Comm: trace-cmd Tainted: G        W    3.10.0-rc1-test+ #171
      [ 1257.829021] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
      [ 1257.829021]  0000000000000001 ffff880065f49da8 ffffffff8153dd2b ffff880065f49dd8
      [ 1257.829021]  ffffffff81092a00 ffff88006bd78680 ffff88007add7500 0000000000000003
      [ 1257.829021]  ffff88006bd78680 ffff880065f49e18 ffffffff810daebf ffffffff810dae5a
      [ 1257.829021] Call Trace:
      [ 1257.829021]  [<ffffffff8153dd2b>] dump_stack+0x19/0x1b
      [ 1257.829021]  [<ffffffff81092a00>] lockdep_rcu_suspicious+0x109/0x112
      [ 1257.829021]  [<ffffffff810daebf>] __update_max_tr+0xed/0x1ee
      [ 1257.829021]  [<ffffffff810dae5a>] ? __update_max_tr+0x88/0x1ee
      [ 1257.829021]  [<ffffffff811002b9>] ? user_enter+0xfd/0x107
      [ 1257.829021]  [<ffffffff810dbf85>] update_max_tr_single+0x11d/0x12d
      [ 1257.829021]  [<ffffffff811002b9>] ? user_enter+0xfd/0x107
      [ 1257.829021]  [<ffffffff810e2b15>] stop_critical_timing+0x141/0x209
      [ 1257.829021]  [<ffffffff8109569a>] ? trace_hardirqs_on+0xd/0xf
      [ 1257.829021]  [<ffffffff811002b9>] ? user_enter+0xfd/0x107
      [ 1257.829021]  [<ffffffff810e3057>] time_hardirqs_on+0x2a/0x2f
      [ 1257.829021]  [<ffffffff811002b9>] ? user_enter+0xfd/0x107
      [ 1257.829021]  [<ffffffff8109550c>] trace_hardirqs_on_caller+0x16/0x197
      [ 1257.829021]  [<ffffffff8109569a>] trace_hardirqs_on+0xd/0xf
      [ 1257.829021]  [<ffffffff811002b9>] user_enter+0xfd/0x107
      [ 1257.829021]  [<ffffffff810029b4>] do_notify_resume+0x92/0x97
      [ 1257.829021]  [<ffffffff8154bdca>] int_signal+0x12/0x17
      
      What happened was that on entering user code, interrupts were enabled and
      a new max interrupts-off time was recorded. The trace buffer was saved
      along with various information about the task: comm, pid, uid, priority,
      etc.
      
      The uid is recorded with task_uid(tsk). But this is a macro that uses
      rcu_read_lock() to retrieve the data, and here it happened to run at a
      point where RCU is blind (user_enter).

      As only the preempt and irqs-off tracers can trigger this, and for both
      of them tsk == current, use current_uid() instead of task_uid() when
      tsk == current. current_uid() does not use RCU, since only current can
      change its own uid.
      
      This fixes the RCU suspicious splat.
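       
       The essence of the change, as a sketch (context simplified from
       __update_max_tr(); max_data is the per-cpu max-trace data there):
       
          if (tsk == current)
                  max_data->uid = current_uid();  /* no RCU needed for current */
          else
                  max_data->uid = task_uid(tsk);  /* other tasks still need task_uid() */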
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      f17a5194
  12. 04 Jun, 2013 1 commit
  13. 31 May, 2013 4 commits
    • J
      tick: Remove useless timekeeping duty attribution to broadcast source · f5d00c1f
       Authored by Jiri Bohac
      Since 7300711e ("clockevents: broadcast fixup possible waiters"),
      the timekeeping duty is assigned to the CPU that handles the tick
      broadcast clock device by the time it is set in one shot mode.
      
      This is an issue in full dynticks mode where the timekeeping duty
      must stay handled by the boot CPU for now. Otherwise it prevents
      secondary CPUs from offlining and this breaks
      suspend/shutdown/reboot/...
      
       As there appears to be no reason for this timekeeping duty to be moved
       to the broadcast CPU, and nothing prevents it from being re-assigned to
       another target later anyway, simply remove it.
      Signed-off-by: NJiri Bohac <jbohac@suse.cz>
      Reported-by: NSteven Rostedt <rostedt@goodmis.org>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      f5d00c1f
    • L
      nohz: Fix notifier return val that enforce timekeeping · 1a7f829f
       Authored by Li Zhong
       In tick_nohz_cpu_down_callback(), if the cpu is the one handling
       timekeeping, we must return something that stops the CPU_DOWN_PREPARE
       notifiers and then starts notifying CPU_DOWN_FAILED on the notifier
       callbacks that were already called.
       
       However, traditional errno values are not handled by the notifier core
       unless they are encapsulated using notifier_from_errno().
       
       Hence the current -EINVAL is misinterpreted and converted to junk after
       notifier_to_errno(), leaving the notifier subsystem to random behaviour
       such as eventually allowing the cpu to go down.
      
      Fix this by using the standard NOTIFY_BAD instead.
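       
       A minimal sketch of a hotplug notifier that vetoes a CPU-down request
       this way (the callback and the timekeeping_cpu variable are illustrative,
       not the actual tick_nohz_cpu_down_callback()):
       
          #include <linux/cpu.h>
          #include <linux/notifier.h>
          
          static unsigned int timekeeping_cpu;    /* illustrative */
          
          static int cpu_down_veto(struct notifier_block *nb,
                                   unsigned long action, void *hcpu)
          {
                  unsigned int cpu = (unsigned long)hcpu;
          
                  switch (action & ~CPU_TASKS_FROZEN) {
                  case CPU_DOWN_PREPARE:
                          /* NOTIFY_BAD is what the notifier core understands as
                           * "abort and roll back", unlike a raw -EINVAL */
                          if (cpu == timekeeping_cpu)
                                  return NOTIFY_BAD;
                          break;
                  }
                  return NOTIFY_OK;
          }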
      Signed-off-by: NLi Zhong <zhong@linux.vnet.ibm.com>
      Reviewed-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      1a7f829f
    • F
      kvm: Move guest entry/exit APIs to context_tracking · 521921ba
       Authored by Frederic Weisbecker
       The kvm_host.h header file doesn't handle
       inclusion well on archs that don't support KVM.
       
       This results in build crashes for such archs when they
       want to implement context tracking, because this subsystem
       includes kvm_host.h in order to implement the
       guest_enter/exit APIs but doesn't handle the KVM-off case.
      
       To fix this, move the guest_enter()/guest_exit()
       declarations and generic implementation to the context
       tracking headers. These generic APIs actually belong in
       this subsystem, alongside other domain-boundary tracking
       such as user_enter() et al.
       
       KVM now properly becomes a user of this library, not the
       other (buggy) way around.
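       
       A sketch of the intended usage after the move (run_one_guest_iteration()
       is an illustrative placeholder for the arch's guest-run loop):
       
          #include <linux/context_tracking.h>
          
          static void vcpu_run_sketch(void)
          {
                  guest_enter();                  /* account time as guest time */
                  run_one_guest_iteration();      /* placeholder: arch-specific guest execution */
                  guest_exit();                   /* back to host accounting */
          }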
      Reported-by: NKevin Hilman <khilman@linaro.org>
      Reviewed-by: NKevin Hilman <khilman@linaro.org>
      Tested-by: NKevin Hilman <khilman@linaro.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      521921ba
    • F
      vtime: Use consistent clocks among nohz accounting · 45eacc69
       Authored by Frederic Weisbecker
       While computing the cputime delta of dynticks CPUs,
       we are mixing up clocks of different natures:
       
       * local_clock() which takes care of unstable clock
       sources and fixes them up if needed.
       
       * sched_clock() which is the weaker version of
       local_clock(). It doesn't apply any fixup in case
       of an unstable source.
       
       If the clock source is stable, those two clocks are the
       same and we can safely compute the difference between
       two arbitrary points.
       
       Otherwise it results in random deltas, as sched_clock()
       can randomly drift away, backward or forward, from local_clock().
       
       As a consequence, some strange behaviour has been observed
       with an unstable tsc, such as cputime stuck at a constant zero
       (the 'top' command showing no load).
      
      Fix this by only using local_clock(), or its irq safe/remote
      equivalent, in vtime code.
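       
       A minimal sketch of the rule this enforces (do_some_work() is an
       illustrative placeholder): both samples of a delta must come from
       local_clock(), never one from each clock.
       
          #include <linux/sched.h>
          
          static u64 sample_delta_ns(void)
          {
                  u64 start = local_clock();
          
                  do_some_work();         /* placeholder for the measured section */
          
                  /* same clock for both points, so the delta is meaningful
                   * even with an unstable clock source */
                  return local_clock() - start;
          }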
      Reported-by: NMike Galbraith <efault@gmx.de>
      Suggested-by: NMike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      45eacc69
  14. 30 May, 2013 1 commit
  15. 29 May, 2013 4 commits
    • S
      ftrace: Use the rcu _notrace variants for rcu_dereference_raw() and friends · 1bb539ca
       Authored by Steven Rostedt
       As rcu_dereference_raw() under RCU debug config options can add quite a
       few checks, and tracing uses rcu_dereference_raw(), these checks run
       inside the function tracer. The function tracer also happens to trace
       these debug checks themselves. This added overhead can livelock the system.
      
      Have the function tracer use the new RCU _notrace equivalents that do
      not do the debug checks for RCU.
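       
       A sketch of the kind of substitution this means (simplified; the real
       sites are the ftrace ops/hash lookups in kernel/trace/ftrace.c):
       
          /* before: debug-checked accessor, itself traced by the function tracer */
          op = rcu_dereference_raw(list);
          
          /* after: _notrace variant, skips the RCU debug checks */
          op = rcu_dereference_raw_notrace(list);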
      
       Link: http://lkml.kernel.org/r/20130528184209.467603904@goodmis.org
       Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      1bb539ca
    • J
      cgroup: warn about mismatching options of a new mount of an existing hierarchy · 2a0ff3fb
       Authored by Jeff Liu
       With the new __DEVEL__sane_behavior mount option introduced,
       if the root cgroup is alive without the xattr feature, mounting a
       new cgroup with xattr is rejected by design, which is
       just fine.  However, if the root cgroup was not mounted with
       __DEVEL__sane_behavior, creating a new cgroup with the xattr option
       will succeed, although the EA feature then does not work
       as expected: setting attributes under either cgroup fails
       with ENOTSUPP. e.g.
      
      setfattr: /cgroup2/test: Operation not supported
      
       Instead of keeping silent in this case, it's better to drop a log
       entry at warning level.  That helps the user understand the
       reason behind the scenes, and it is essentially an improvement
       that does not break backward compatibility.
      
       With this fix, the above mount attempt keeps working as usual, but
       the following line can be found in the system log:
      
      [ ...] cgroup: new mount options do not match the existing superblock
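       
       A hedged sketch of how such a warning can be emitted when the new options
       differ from the existing hierarchy (the flag variables are illustrative,
       not the exact cgroup internals):
       
          if (new_opts_flags != root_flags)
                  pr_warn("cgroup: new mount options do not match the existing superblock\n");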
      
      tj: minor formatting / message updates.
      Signed-off-by: NJie Liu <jeff.liu@oracle.com>
      Reported-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
      2a0ff3fb
    • Z
      timekeeping: Correct run-time detection of persistent_clock. · 0d6bd995
       Authored by Zoran Markovic
       Since commit 31ade306, timekeeping_init()
       checks for the presence of a persistent clock by attempting to read a
       non-zero time value. This is an issue on platforms where persistent_clock
       is implemented as a free-running counter (instead of an RTC) starting
       from zero on each boot and running during suspend. Examples are some ARM
       platforms (e.g. PandaBoard).
      
       An attempt to read such a clock during timekeeping_init() may return a
       zero value and falsely declare the persistent clock missing. Additionally, in
      the above case suspend times may be accounted twice (once from
      timekeeping_resume() and once from rtc_resume()), resulting in a gradual
      drift of system time.
      
      This patch does a run-time correction of the issue by doing the same check
      during timekeeping_suspend().
      
       A better long-term solution would be to return an error when trying to read
       a non-existent clock and zero when trying to read an uninitialized clock, but
       that would require changing all persistent_clock implementations.
      
      This patch addresses the immediate breakage, for now.
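       
       A sketch of the run-time check on the suspend path (simplified; the
       persistent_clock_present flag name is illustrative): a non-zero reading
       means a persistent clock really is there, even if it read zero at boot.
       
          struct timespec ts;
          
          read_persistent_clock(&ts);
          if (ts.tv_sec || ts.tv_nsec)
                  persistent_clock_present = true;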
      
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NZoran Markovic <zoran.markovic@linaro.org>
      [jstultz: Tweaked commit message and subject]
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      0d6bd995
    • G
      ntp: Remove unused variable flags in __hardpps · aa848233
       Authored by Geert Uytterhoeven
      kernel/time/ntp.c: In function ‘__hardpps’:
      kernel/time/ntp.c:877: warning: unused variable ‘flags’
      
      commit a076b214 ("ntp: Remove ntp_lock,
      using the timekeeping locks to protect ntp state") removed its users,
      but not the actual variable.
      Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      aa848233
  16. 28 May, 2013 4 commits
    • S
      ring-buffer: Do not poll non allocated cpu buffers · 6721cb60
       Authored by Steven Rostedt (Red Hat)
       The tracing infrastructure sets up for possible CPUs, but since it uses
       the ring buffer polling, it is possible to call the ring buffer
       polling code with a CPU whose buffer hasn't been allocated. This will cause
       a kernel oops when it accesses a ring buffer cpu buffer that belongs
       to the possible cpus but hasn't been allocated yet, because the CPU has
       never been online.
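       
       A hedged sketch of the kind of guard the title describes (simplified;
       RING_BUFFER_ALL_CPUS and the buffer cpumask are existing ring-buffer
       concepts): reject CPUs that are possible but never got a buffer.
       
          if (cpu != RING_BUFFER_ALL_CPUS &&
              !cpumask_test_cpu(cpu, buffer->cpumask))
                  return -EINVAL;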
      Reported-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      Tested-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      6721cb60
    • P
      perf: Fix perf mmap bugs · 26cb63ad
       Authored by Peter Zijlstra
      Vince reported a problem found by his perf specific trinity
      fuzzer.
      
      Al noticed 2 problems with perf's mmap():
      
       - it has issues against fork() since we use vma->vm_mm for accounting.
       - it has an rb refcount leak on double mmap().
      
      We fix the issues against fork() by using VM_DONTCOPY; I don't
      think there's code out there that uses this; we didn't hear
      about weird accounting problems/crashes. If we do need this to
      work, the previously proposed VM_PINNED could make this work.
      
      Aside from the rb reference leak spotted by Al, Vince's example
      prog was indeed doing a double mmap() through the use of
      perf_event_set_output().
      
       This exposes another problem: since we now have 2 events with
       one buffer, the accounting gets screwy because we account per
       event. Fix this by making the buffer responsible for its own
       accounting.
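       
       A sketch of the fork() side of the fix (simplified from perf's mmap
       handler): mark the mapping so it is not copied into children, keeping
       the accounting tied to the mm that created it.
       
          static int my_mmap(struct file *file, struct vm_area_struct *vma)
          {
                  vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP;
                  return 0;
          }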
      Reported-by: NVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
       Link: http://lkml.kernel.org/r/20130528085548.GA12193@twins.programming.kicks-ass.net
       Signed-off-by: NIngo Molnar <mingo@kernel.org>
      26cb63ad
    • M
      kprobes: Fix to free gone and unused optprobes · 7b959fc5
       Authored by Masami Hiramatsu
      Fix to free gone and unused optprobes. This bug will
      cause a kernel panic if the user reuses the killed and
      unused probe.
      
      Reported at:
      
        http://sourceware.org/ml/systemtap/2013-q2/msg00142.html
      
      In the normal path, an optprobe on an init function is
      unregistered when a module goes live.
      
      unregister_kprobe(kp)
       -> __unregister_kprobe_top
         ->__disable_kprobe
           ->disarm_kprobe(ap == op)
             ->__disarm_kprobe
              ->unoptimize_kprobe : the op is queued
                                    on unoptimizing_list
      and do nothing in __unregister_kprobe_bottom
      
       After a while (usually after waiting 5 jiffies), kprobe_optimizer
       runs to unoptimize and free the optprobe.
      
      kprobe_optimizer
       ->do_unoptimize_kprobes
         ->arch_unoptimize_kprobes : moved to free_list
       ->do_free_cleaned_kprobes
         ->hlist_del: the op is removed
         ->free_aggr_kprobe
           ->arch_remove_optimized_kprobe
           ->arch_remove_kprobe
           ->kfree: the op is freed
      
      Here, if kprobes_module_callback is called and the delayed
      unoptimizing probe is picked BEFORE kprobe_optimizer runs,
      
      kprobes_module_callback
       ->kill_kprobe
         ->kill_optimized_kprobe : dequeued from unoptimizing_list <=!!!
           ->arch_remove_optimized_kprobe
         ->arch_remove_kprobe
         (but op is not freed, and on the kprobe hash table)
      
      This doesn't happen if the probe unregistration is done AFTER
      kprobes_module_callback is called (because at that time the op
      is gone), and kprobe-tracer does it.
      
      To fix this bug, this patch changes kprobes_module_callback to
      enqueue the op to freeing_list at kill_optimized_kprobe only
      if the op is unused. The unused probes on freeing_list will
      be freed in do_free_cleaned_kprobes.
      
       Note that this calls arch_remove_*kprobe twice on the
       same probe. Thus those functions have to check for a double free.
       Fortunately, most arch code already checks for that, except
       for mips. That will be fixed in the next patch.
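       
       A sketch of the double-call guard expected from the arch code (modelled
       on the x86 variant; field names may differ per arch):
       
          void arch_remove_optimized_kprobe(struct optimized_kprobe *op)
          {
                  if (!op->optinsn.insn)
                          return;         /* already removed: nothing to do */
                  free_optinsn_slot(op->optinsn.insn, 1);
                  op->optinsn.insn = NULL;
          }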
      Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Timo Juhani Lindfors <timo.lindfors@iki.fi>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: systemtap@sourceware.org
      Cc: yrl.pp-manager.tt@hitachi.com
      Cc: David S. Miller <davem@davemloft.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Link: http://lkml.kernel.org/r/20130522093409.9084.63554.stgit@mhiramat-M0-7522
      [ Minor edits. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      7b959fc5
    • T
      tick: Cure broadcast false positive pending bit warning · 2938d275
       Authored by Thomas Gleixner
      commit 26517f3e (tick: Avoid programming the local cpu timer if
      broadcast pending) added a warning if the cpu enters broadcast mode
      again while the pending bit is still set. Meelis reported that the
       warning triggers. There are two corner cases which have not been
       considered:
      
      1) cpuidle calls clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER)
         twice. That can result in the following scenario
      
         CPU0                    CPU1
                                 cpuidle_idle_call()
                                   clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER)
                                     set cpu in tick_broadcast_oneshot_mask
      
         broadcast interrupt
           event expired for cpu1
           set pending bit
      
                                   acpi_idle_enter_simple()
                                     clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER)
                                       WARN_ON(pending bit)
      
         Move the WARN_ON into the section where we enter broadcast mode so
         it won't produce false positives on the second call.
      
       2) safe_halt() enables interrupts, so a broadcast interrupt can be
          delivered before broadcast mode is disabled. That sets the
          pending bit for the CPU which receives the broadcast
          interrupt. However, the interrupt is handled right away by the
          broadcast handler, which leaves the pending bit stale.
       
          Clear the pending bit for the current cpu in the broadcast handler,
          as sketched below.
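       
       A sketch of that second fix (the mask name follows the one introduced by
       commit 26517f3e; simplified from the broadcast handler):
       
          /* the broadcast interrupt is handled right here, so this CPU's
           * pending bit must not stay set */
          cpumask_clear_cpu(smp_processor_id(), tick_broadcast_pending_mask);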
      Reported-and-tested-by: NMeelis Roos <mroos@linux.ee>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
       Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305271841130.4220@ionos
       Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      2938d275