1. Jan 24, 2014 (6 commits)
  2. Jan 22, 2014 (8 commits)
    • M
      sched: add tracepoints related to NUMA task migration · 286549dc
      Committed by Mel Gorman
      This patch adds three tracepoints
       o trace_sched_move_numa	when a task is moved to a node
       o trace_sched_swap_numa	when a task is swapped with another task
       o trace_sched_stick_numa	when a numa-related migration fails
      
      The tracepoints allow the NUMA scheduler activity to be monitored and the
      following high-level metrics can be calculated
      
       o NUMA migrated stuck	 nr trace_sched_stick_numa
       o NUMA migrated idle	 nr trace_sched_move_numa
       o NUMA migrated swapped nr trace_sched_swap_numa
       o NUMA local swapped	 trace_sched_swap_numa src_nid == dst_nid (should never happen)
       o NUMA remote swapped	 trace_sched_swap_numa src_nid != dst_nid (should == NUMA migrated swapped)
       o NUMA group swapped	 trace_sched_swap_numa src_ngid == dst_ngid
      			 Maybe a small number of these are acceptable
      			 but a high number would be a major surprise.
      			 It would be even worse if bounces are frequent.
       o NUMA avg task migs.	 Average number of migrations for tasks
       o NUMA stddev task mig	 Self-explanatory
       o NUMA max task migs.	 Maximum number of migrations for a single task
      
      In general the intent of the tracepoints is to help diagnose problems
      where automatic NUMA balancing appears to be doing an excessive amount
      of useless work.
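
      As a hedged usage sketch (not part of the patch), the three events can be
      enabled from userspace through tracefs and the raw trace streamed out for
      aggregation. The event names are inferred from the tracepoint names above,
      and the conventional debugfs mount point is assumed:

      #include <stdio.h>

      /* Enable the sched/NUMA events, then stream trace_pipe; counting lines
       * per event name yields the "migrated stuck/idle/swapped" metrics. */
      int main(void)
      {
              static const char *events[] = {
                      "sched/sched_move_numa",
                      "sched/sched_swap_numa",
                      "sched/sched_stick_numa",
              };
              char path[256], line[512];
              unsigned int i;

              for (i = 0; i < sizeof(events) / sizeof(events[0]); i++) {
                      snprintf(path, sizeof(path),
                               "/sys/kernel/debug/tracing/events/%s/enable",
                               events[i]);
                      FILE *f = fopen(path, "w");
                      if (f) {
                              fputs("1\n", f);
                              fclose(f);
                      }
              }

              FILE *tp = fopen("/sys/kernel/debug/tracing/trace_pipe", "r");
              if (!tp)
                      return 1;
              while (fgets(line, sizeof(line), tp))
                      fputs(line, stdout);    /* aggregate per event here */
              fclose(tp);
              return 0;
      }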
      
      [akpm@linux-foundation.org: remove semicolon-after-if, repair coding-style]
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      286549dc
    • S
      kernel/power/snapshot.c: use memblock apis for early memory allocations · c2f69cda
      Committed by Santosh Shilimkar
      Switch to memblock interfaces for the early memory allocator instead of
      the bootmem allocator.  No functional change in behavior from the
      bootmem users' point of view.

      Archs already converted to NO_BOOTMEM now use memblock interfaces
      directly instead of bootmem wrappers built on top of memblock.  On the
      archs which still use bootmem, these new APIs just fall back to the
      existing bootmem APIs.
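
      A minimal before/after sketch of the conversion pattern (the variable
      names are illustrative, not the actual snapshot.c hunk; as I read the
      series, memblock_virt_alloc() is the memblock-backed helper these
      conversions switch to):

      /* before: bootmem wrapper */
      bm->data = alloc_bootmem(bm_size);

      /* after: memblock-backed allocation; falls back to bootmem on archs
       * that have not been converted to NO_BOOTMEM yet */
      bm->data = memblock_virt_alloc(bm_size, 0);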
      Acked-by: N"Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Grygorii Strashko <grygorii.strashko@ti.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Paul Walmsley <paul@pwsan.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Tony Lindgren <tony@atomide.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c2f69cda
    • S
      kernel/printk/printk.c: use memblock apis for early memory allocations · 9da791df
      Committed by Santosh Shilimkar
      Switch to memblock interfaces for the early memory allocator instead of
      the bootmem allocator.  No functional change in behavior from the
      bootmem users' point of view.

      Archs already converted to NO_BOOTMEM now use memblock interfaces
      directly instead of bootmem wrappers built on top of memblock.  On the
      archs which still use bootmem, these new APIs just fall back to the
      existing bootmem APIs.
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Paul Walmsley <paul@pwsan.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Tony Lindgren <tony@atomide.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9da791df
    • O
      introduce for_each_thread() to replace the buggy while_each_thread() · 0c740d0a
      Committed by Oleg Nesterov
      while_each_thread() and next_thread() should die; almost every lockless
      usage is wrong.
      
      1. Unless g == current, the lockless while_each_thread() is not safe.

         while_each_thread(g, t) can loop forever if g exits: next_thread()
         cannot reach the unhashed thread in this case. Note that this can
         happen even if g is the group leader, since it can exec.
      
      2. Even if while_each_thread() itself was correct, people often use
         it wrongly.

         It was never safe to just take rcu_read_lock() and loop unless
         you verify that pid_alive(g) == T; even the first next_thread()
         can point to already freed/reused memory.
      
      This patch adds signal_struct->thread_head and task->thread_node to
      create the normal rcu-safe list with the stable head.  The new
      for_each_thread(g, t) helper is always safe under rcu_read_lock() as
      long as this task_struct can't go away.
      
      Note: of course it is ugly to have both task_struct->thread_node and the
      old task_struct->thread_group; we will kill the latter later, after we
      change the users of while_each_thread() to use for_each_thread().

      Perhaps we can kill it even before we convert all users, by
      reimplementing next_thread(t) using the new thread_head/thread_node.  But
      we can't do this right now because it would lead to subtle behavioural
      changes.  For example, do/while_each_thread() always sees at least one
      task, while for_each_thread() can do nothing if the whole thread group
      has died.  Or thread_group_empty(): currently its semantics are not clear
      unless thread_group_leader(p), and we need to audit the callers before we
      can change it.
      
      So this patch adds the new interface, which has to coexist with the old
      one for some time; hopefully the next changes will be more or less
      straightforward and the old one will go away soon.
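
      A hedged usage sketch of the new helper (the per-thread callback is made
      up; the point is only the locking rule stated above, given a
      task_struct *g that our caller keeps pinned):

      struct task_struct *t;

      rcu_read_lock();
      /* Visits every live thread of g's group; unlike while_each_thread(),
       * this cannot loop forever if g exits. */
      for_each_thread(g, t)
              inspect_thread(t);      /* hypothetical per-thread work */
      rcu_read_unlock();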
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Sergey Dyasly <dserrg@gmail.com>
      Tested-by: Sergey Dyasly <dserrg@gmail.com>
      Reviewed-by: Sameer Nanda <snanda@chromium.org>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mandeep Singh Baines <msb@chromium.org>
      Cc: "Ma, Xindong" <xindong.ma@intel.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: "Tu, Xiaobing" <xiaobing.tu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0c740d0a
    • J
      mm: add overcommit_kbytes sysctl variable · 49f0ce5f
      Committed by Jerome Marchand
      Some applications that run on HPC clusters are designed around the
      availability of RAM and the overcommit ratio is fine-tuned to get the
      maximum usage of memory without swapping.  With growing memory, the
      1%-of-all-RAM grain provided by overcommit_ratio has become too coarse
      for these workloads (on a 2TB machine it represents no less than 20GB).

      This patch adds the new overcommit_kbytes sysctl variable that allows a
      much finer grain.
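
      A hedged usage sketch: setting a 16 GiB commit limit through the new
      sysctl from userspace (the /proc path follows the standard vm sysctl
      layout):

      #include <stdio.h>

      int main(void)
      {
              /* vm.overcommit_kbytes: the finer-grained alternative to
               * vm.overcommit_ratio described above */
              FILE *f = fopen("/proc/sys/vm/overcommit_kbytes", "w");

              if (!f) {
                      perror("overcommit_kbytes");
                      return 1;
              }
              fprintf(f, "%llu\n", 16ULL * 1024 * 1024);  /* value in kbytes */
              fclose(f);
              return 0;
      }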
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: fix nommu build]
      Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      49f0ce5f
    • J
      fsnotify: remove pointless NULL initializers · 56b27cf6
      Committed by Jan Kara
      We usually rely on the fact that struct members not specified in the
      initializer are set to NULL.  So do that with fsnotify function pointers
      as well.
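
      For illustration, the C rule being relied on (plain C, not the actual
      fsnotify structures): members omitted from a designated initializer are
      zero-initialized, so spelling out ".foo = NULL" adds nothing:

      struct example_ops {
              int  (*handle_event)(int event);
              void (*free_priv)(void *priv);
      };

      static int my_handler(int event) { return event; }

      /* .free_priv is implicitly NULL because it is not named here */
      static const struct example_ops ops = {
              .handle_event = my_handler,
      };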
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      56b27cf6
    • J
      fsnotify: remove .should_send_event callback · 83c4c4b0
      Committed by Jan Kara
      After removing event structure creation from the generic layer there is
      no reason for separate .should_send_event and .handle_event callbacks.
      So just remove the first one.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      83c4c4b0
    • J
      fsnotify: do not share events between notification groups · 7053aee2
      Committed by Jan Kara
      Currently the fsnotify framework creates one event structure for each
      notification event and links this event into all interested notification
      groups.  This is done so that we save memory when several notification
      groups are interested in the event.  However, the need for an event
      structure shared between inotify & fanotify bloats the event structure,
      so the result is often higher memory consumption.
      
      Another problem is that the fsnotify framework keeps path references with
      outstanding events so that fanotify can return open file descriptors
      with its events.  This has the undesirable effect that a filesystem cannot
      be unmounted while there are outstanding events - a regression for
      inotify compared to the situation before it was converted to the fsnotify
      framework.  For fanotify this problem is hard to avoid and users of
      fanotify should kind of expect this behavior when they ask for file
      descriptors from notified files.
      
      This patch changes fsnotify and its users to create a separate event
      structure for each group.  This allows for much simpler code (~400 lines
      removed by this patch) and also smaller event structures.  For example,
      on a 64-bit system the original struct fsnotify_event consumes 120 bytes,
      plus additional space for the file name, an additional 24 bytes for the
      second and each subsequent group linking the event, and an additional
      32 bytes for each inotify group for private data.  After the conversion an
      inotify event consumes 48 bytes plus space for the file name, which is
      considerably less memory unless file names are long and several groups
      are interested in the events (both of which are uncommon).  A fanotify
      event fits in 56 bytes after the conversion (fanotify doesn't care about
      file names, so its events don't need them allocated); a win unless
      there are four or more fanotify groups interested in the event.
      
      The conversion also solves the problem with unmount when only inotify is
      used as we don't have to grab path references for inotify events.
      
      [hughd@google.com: fanotify: fix corruption preventing startup]
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7053aee2
  3. Jan 21, 2014 (1 commit)
  4. Jan 20, 2014 (1 commit)
    • A
      tracing: Fix buggered tee(2) on tracing_pipe · 92fdd98c
      Committed by Al Viro
      In kernel/trace/trace.c we have this:
      static void tracing_pipe_buf_release(struct pipe_inode_info *pipe,
                                           struct pipe_buffer *buf)
      {
              __free_page(buf->page);
      }
      static const struct pipe_buf_operations tracing_pipe_buf_ops = {
              .can_merge              = 0,
              .map                    = generic_pipe_buf_map,
              .unmap                  = generic_pipe_buf_unmap,
              .confirm                = generic_pipe_buf_confirm,
              .release                = tracing_pipe_buf_release,
              .steal                  = generic_pipe_buf_steal,
              .get                    = generic_pipe_buf_get,
      };
      with
      void generic_pipe_buf_get(struct pipe_inode_info *pipe, struct pipe_buffer *buf)
      {
              page_cache_get(buf->page);
      }
      
      and I don't see anything that would've prevented tee(2) called on the pipe
      that got stuff spliced into it from that sucker.  ->ops->get() will be
      called, then buf gets copied into target pipe's ->bufs[] and eventually
      readers get to both copies of the buffer.  With
      	get_page(page)
      	look at that page
      	__free_page(page)
      	look at that page
      	__free_page(page)
      which is not a good thing, to put it mildly.  AFAICS, that ought to use
      the normal generic_pipe_buf_release() (aka page_cache_release(buf->page)),
      shouldn't it?
      
      [
       SDR - As trace_pipe just allocates the page with alloc_page(GFP_KERNEL)
        and doesn't do anything special with it (no LRU logic), the __free_page()
        should be fine, as it won't actually free a page that still holds a
        reference count. Maybe there's a chance to leak memory? Anyway, this
        change is at a minimum good for being symmetric with
        generic_pipe_buf_get, so it is fine to add.
      ]
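
      A hedged sketch of the ops table after the change described above (the
      exact final hunk may differ slightly, but the point is that .get and
      .release become the symmetric generic page-refcount helpers):

      static const struct pipe_buf_operations tracing_pipe_buf_ops = {
              .can_merge              = 0,
              .map                    = generic_pipe_buf_map,
              .unmap                  = generic_pipe_buf_unmap,
              .confirm                = generic_pipe_buf_confirm,
              .release                = generic_pipe_buf_release, /* page_cache_release() */
              .steal                  = generic_pipe_buf_steal,
              .get                    = generic_pipe_buf_get,     /* page_cache_get() */
      };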
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      [ SDR - Removed no longer used tracing_pipe_buf_release ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      92fdd98c
  5. Jan 18, 2014 (1 commit)
  6. Jan 17, 2014 (1 commit)
  7. Jan 16, 2014 (9 commits)
  8. Jan 14, 2014 (4 commits)
  9. Jan 13, 2014 (9 commits)
    • S
      ftrace: Have function graph only trace based on global_ops filters · 23a8e844
      Committed by Steven Rostedt (Red Hat)
      Doing some different tests, I discovered that function graph tracing, when
      filtered via the set_ftrace_filter and set_ftrace_notrace files, does
      not always stay within those filters if another ftrace_ops is registered
      to trace functions.
      
      The reason is that function graph just happens to trace all functions
      that the function tracer enables. When there was only one user of
      function tracing, the function graph tracer did not need to worry about
      being called by functions that it did not want to trace. But now that there
      are other users, this becomes a problem.
      
      For example, one just needs to do the following:
      
       # cd /sys/kernel/debug/tracing
       # echo schedule > set_ftrace_filter
       # echo function_graph > current_tracer
       # cat trace
      [..]
       0)               |  schedule() {
       ------------------------------------------
       0)    <idle>-0    =>   rcu_pre-7
       ------------------------------------------
      
       0) ! 2980.314 us |  }
       0)               |  schedule() {
       ------------------------------------------
       0)   rcu_pre-7    =>    <idle>-0
       ------------------------------------------
      
       0) + 20.701 us   |  }
      
       # echo 1 > /proc/sys/kernel/stack_tracer_enabled
       # cat trace
      [..]
       1) + 20.825 us   |      }
       1) + 21.651 us   |    }
       1) + 30.924 us   |  } /* SyS_ioctl */
       1)               |  do_page_fault() {
       1)               |    __do_page_fault() {
       1)   0.274 us    |      down_read_trylock();
       1)   0.098 us    |      find_vma();
       1)               |      handle_mm_fault() {
       1)               |        _raw_spin_lock() {
       1)   0.102 us    |          preempt_count_add();
       1)   0.097 us    |          do_raw_spin_lock();
       1)   2.173 us    |        }
       1)               |        do_wp_page() {
       1)   0.079 us    |          vm_normal_page();
       1)   0.086 us    |          reuse_swap_page();
       1)   0.076 us    |          page_move_anon_rmap();
       1)               |          unlock_page() {
       1)   0.082 us    |            page_waitqueue();
       1)   0.086 us    |            __wake_up_bit();
       1)   1.801 us    |          }
       1)   0.075 us    |          ptep_set_access_flags();
       1)               |          _raw_spin_unlock() {
       1)   0.098 us    |            do_raw_spin_unlock();
       1)   0.105 us    |            preempt_count_sub();
       1)   1.884 us    |          }
       1)   9.149 us    |        }
       1) + 13.083 us   |      }
       1)   0.146 us    |      up_read();
      
      When the stack tracer was enabled, it enabled all functions to be traced,
      and the function graph tracer then traced them as well. This is a side
      effect that should not occur.

      To fix this, a test is added when the function tracing is changed, as well
      as when the graph tracer is enabled, to see if anything other than the
      ftrace global_ops function tracer is enabled. If so, then the graph tracer
      calls a test trampoline that will look at the function that is being
      traced and compare it with the filters defined by the global_ops.

      As an optimization, if there are no other function tracers registered, or
      if the only registered function tracers also use the global_ops, the
      function graph infrastructure will call the registered function graph
      callback directly and not go through the test trampoline.
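
      A heavily hedged sketch of the test-trampoline idea (the function names
      below are guesses from the description, not necessarily the ones used in
      the patch):

      /* Installed as the graph entry callback whenever some other ftrace_ops
       * (e.g. the stack tracer) is also registered. */
      static int ftrace_graph_entry_test(struct ftrace_graph_ent *trace)
      {
              /* Only trace what global_ops' filter/notrace hashes allow. */
              if (ftrace_ops_test(&global_ops, trace->func, NULL))
                      return __ftrace_graph_entry(trace);

              return 0;
      }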
      
      Cc: stable@vger.kernel.org # 3.3+
      Fixes: d2d45c7a "tracing: Have stack_tracer use a separate list of functions"
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      23a8e844
    • P
      sched/clock: Fix up clear_sched_clock_stable() · 6577e42a
      Committed by Peter Zijlstra
      The below tells us the static_key conversion has a problem; since the
      exact point of clearing that flag isn't too important, delay the flip
      and use a workqueue to process it.
      
      [ ] TSC synchronization [CPU#0 -> CPU#22]:
      [ ] Measured 8 cycles TSC warp between CPUs, turning off TSC clock.
      [ ]
      [ ] ======================================================
      [ ] [ INFO: possible circular locking dependency detected ]
      [ ] 3.13.0-rc3-01745-g848b0d0322cb-dirty #637 Not tainted
      [ ] -------------------------------------------------------
      [ ] swapper/0/1 is trying to acquire lock:
      [ ]  (jump_label_mutex){+.+...}, at: [<ffffffff8115a637>] jump_label_lock+0x17/0x20
      [ ]
      [ ] but task is already holding lock:
      [ ]  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8109408b>] cpu_hotplug_begin+0x2b/0x60
      [ ]
      [ ] which lock already depends on the new lock.
      [ ]
      [ ]
      [ ] the existing dependency chain (in reverse order) is:
      [ ]
      [ ] -> #1 (cpu_hotplug.lock){+.+.+.}:
      [ ]        [<ffffffff810def00>] lock_acquire+0x90/0x130
      [ ]        [<ffffffff81661f83>] mutex_lock_nested+0x63/0x3e0
      [ ]        [<ffffffff81093fdc>] get_online_cpus+0x3c/0x60
      [ ]        [<ffffffff8104cc67>] arch_jump_label_transform+0x37/0x130
      [ ]        [<ffffffff8115a3cf>] __jump_label_update+0x5f/0x80
      [ ]        [<ffffffff8115a48d>] jump_label_update+0x9d/0xb0
      [ ]        [<ffffffff8115aa6d>] static_key_slow_inc+0x9d/0xb0
      [ ]        [<ffffffff810c0f65>] sched_feat_set+0xf5/0x100
      [ ]        [<ffffffff810c5bdc>] set_numabalancing_state+0x2c/0x30
      [ ]        [<ffffffff81d12f3d>] numa_policy_init+0x1af/0x1b7
      [ ]        [<ffffffff81cebdf4>] start_kernel+0x35d/0x41f
      [ ]        [<ffffffff81ceb5a5>] x86_64_start_reservations+0x2a/0x2c
      [ ]        [<ffffffff81ceb6a2>] x86_64_start_kernel+0xfb/0xfe
      [ ]
      [ ] -> #0 (jump_label_mutex){+.+...}:
      [ ]        [<ffffffff810de141>] __lock_acquire+0x1701/0x1eb0
      [ ]        [<ffffffff810def00>] lock_acquire+0x90/0x130
      [ ]        [<ffffffff81661f83>] mutex_lock_nested+0x63/0x3e0
      [ ]        [<ffffffff8115a637>] jump_label_lock+0x17/0x20
      [ ]        [<ffffffff8115aa3b>] static_key_slow_inc+0x6b/0xb0
      [ ]        [<ffffffff810ca775>] clear_sched_clock_stable+0x15/0x20
      [ ]        [<ffffffff810503b3>] mark_tsc_unstable+0x23/0x70
      [ ]        [<ffffffff810772cb>] check_tsc_sync_source+0x14b/0x150
      [ ]        [<ffffffff81076612>] native_cpu_up+0x3a2/0x890
      [ ]        [<ffffffff810941cb>] _cpu_up+0xdb/0x160
      [ ]        [<ffffffff810942c9>] cpu_up+0x79/0x90
      [ ]        [<ffffffff81d0af6b>] smp_init+0x60/0x8c
      [ ]        [<ffffffff81cebf42>] kernel_init_freeable+0x8c/0x197
      [ ]        [<ffffffff8164e32e>] kernel_init+0xe/0x130
      [ ]        [<ffffffff8166beec>] ret_from_fork+0x7c/0xb0
      [ ]
      [ ] other info that might help us debug this:
      [ ]
      [ ]  Possible unsafe locking scenario:
      [ ]
      [ ]        CPU0                    CPU1
      [ ]        ----                    ----
      [ ]   lock(cpu_hotplug.lock);
      [ ]                                lock(jump_label_mutex);
      [ ]                                lock(cpu_hotplug.lock);
      [ ]   lock(jump_label_mutex);
      [ ]
      [ ]  *** DEADLOCK ***
      [ ]
      [ ] 2 locks held by swapper/0/1:
      [ ]  #0:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff81094037>] cpu_maps_update_begin+0x17/0x20
      [ ]  #1:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8109408b>] cpu_hotplug_begin+0x2b/0x60
      [ ]
      [ ] stack backtrace:
      [ ] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc3-01745-g848b0d0322cb-dirty #637
      [ ] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 01/08/2010
      [ ]  ffffffff82c9c270 ffff880236843bb8 ffffffff8165c5f5 ffffffff82c9c270
      [ ]  ffff880236843bf8 ffffffff81658c02 ffff880236843c80 ffff8802368586a0
      [ ]  ffff880236858678 0000000000000001 0000000000000002 ffff880236858000
      [ ] Call Trace:
      [ ]  [<ffffffff8165c5f5>] dump_stack+0x4e/0x7a
      [ ]  [<ffffffff81658c02>] print_circular_bug+0x1f9/0x207
      [ ]  [<ffffffff810de141>] __lock_acquire+0x1701/0x1eb0
      [ ]  [<ffffffff816680ff>] ? __atomic_notifier_call_chain+0x8f/0xb0
      [ ]  [<ffffffff810def00>] lock_acquire+0x90/0x130
      [ ]  [<ffffffff8115a637>] ? jump_label_lock+0x17/0x20
      [ ]  [<ffffffff8115a637>] ? jump_label_lock+0x17/0x20
      [ ]  [<ffffffff81661f83>] mutex_lock_nested+0x63/0x3e0
      [ ]  [<ffffffff8115a637>] ? jump_label_lock+0x17/0x20
      [ ]  [<ffffffff8115a637>] jump_label_lock+0x17/0x20
      [ ]  [<ffffffff8115aa3b>] static_key_slow_inc+0x6b/0xb0
      [ ]  [<ffffffff810ca775>] clear_sched_clock_stable+0x15/0x20
      [ ]  [<ffffffff810503b3>] mark_tsc_unstable+0x23/0x70
      [ ]  [<ffffffff810772cb>] check_tsc_sync_source+0x14b/0x150
      [ ]  [<ffffffff81076612>] native_cpu_up+0x3a2/0x890
      [ ]  [<ffffffff810941cb>] _cpu_up+0xdb/0x160
      [ ]  [<ffffffff810942c9>] cpu_up+0x79/0x90
      [ ]  [<ffffffff81d0af6b>] smp_init+0x60/0x8c
      [ ]  [<ffffffff81cebf42>] kernel_init_freeable+0x8c/0x197
      [ ]  [<ffffffff8164e320>] ? rest_init+0xd0/0xd0
      [ ]  [<ffffffff8164e32e>] kernel_init+0xe/0x130
      [ ]  [<ffffffff8166beec>] ret_from_fork+0x7c/0xb0
      [ ]  [<ffffffff8164e320>] ? rest_init+0xd0/0xd0
      [ ] ------------[ cut here ]------------
      [ ] WARNING: CPU: 0 PID: 1 at /usr/src/linux-2.6/kernel/smp.c:374 smp_call_function_many+0xad/0x300()
      [ ] Modules linked in:
      [ ] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc3-01745-g848b0d0322cb-dirty #637
      [ ] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 01/08/2010
      [ ]  0000000000000009 ffff880236843be0 ffffffff8165c5f5 0000000000000000
      [ ]  ffff880236843c18 ffffffff81093d8c 0000000000000000 0000000000000000
      [ ]  ffffffff81ccd1a0 ffffffff810ca951 0000000000000000 ffff880236843c28
      [ ] Call Trace:
      [ ]  [<ffffffff8165c5f5>] dump_stack+0x4e/0x7a
      [ ]  [<ffffffff81093d8c>] warn_slowpath_common+0x8c/0xc0
      [ ]  [<ffffffff810ca951>] ? sched_clock_tick+0x1/0xa0
      [ ]  [<ffffffff81093dda>] warn_slowpath_null+0x1a/0x20
      [ ]  [<ffffffff8110b72d>] smp_call_function_many+0xad/0x300
      [ ]  [<ffffffff8104f200>] ? arch_unregister_cpu+0x30/0x30
      [ ]  [<ffffffff8104f200>] ? arch_unregister_cpu+0x30/0x30
      [ ]  [<ffffffff810ca951>] ? sched_clock_tick+0x1/0xa0
      [ ]  [<ffffffff8110ba96>] smp_call_function+0x46/0x80
      [ ]  [<ffffffff8104f200>] ? arch_unregister_cpu+0x30/0x30
      [ ]  [<ffffffff8110bb3c>] on_each_cpu+0x3c/0xa0
      [ ]  [<ffffffff810ca950>] ? sched_clock_idle_sleep_event+0x20/0x20
      [ ]  [<ffffffff810ca951>] ? sched_clock_tick+0x1/0xa0
      [ ]  [<ffffffff8104f964>] text_poke_bp+0x64/0xd0
      [ ]  [<ffffffff810ca950>] ? sched_clock_idle_sleep_event+0x20/0x20
      [ ]  [<ffffffff8104ccde>] arch_jump_label_transform+0xae/0x130
      [ ]  [<ffffffff8115a3cf>] __jump_label_update+0x5f/0x80
      [ ]  [<ffffffff8115a48d>] jump_label_update+0x9d/0xb0
      [ ]  [<ffffffff8115aa6d>] static_key_slow_inc+0x9d/0xb0
      [ ]  [<ffffffff810ca775>] clear_sched_clock_stable+0x15/0x20
      [ ]  [<ffffffff810503b3>] mark_tsc_unstable+0x23/0x70
      [ ]  [<ffffffff810772cb>] check_tsc_sync_source+0x14b/0x150
      [ ]  [<ffffffff81076612>] native_cpu_up+0x3a2/0x890
      [ ]  [<ffffffff810941cb>] _cpu_up+0xdb/0x160
      [ ]  [<ffffffff810942c9>] cpu_up+0x79/0x90
      [ ]  [<ffffffff81d0af6b>] smp_init+0x60/0x8c
      [ ]  [<ffffffff81cebf42>] kernel_init_freeable+0x8c/0x197
      [ ]  [<ffffffff8164e320>] ? rest_init+0xd0/0xd0
      [ ]  [<ffffffff8164e32e>] kernel_init+0xe/0x130
      [ ]  [<ffffffff8166beec>] ret_from_fork+0x7c/0xb0
      [ ]  [<ffffffff8164e320>] ? rest_init+0xd0/0xd0
      [ ] ---[ end trace 6ff1df5620c49d26 ]---
      [ ] tsc: Marking TSC unstable due to check_tsc_sync_source failed
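
      A heavily hedged sketch of the deferral described at the top of this
      message: flip the key from a workqueue (process context) instead of from
      the TSC-sync/hotplug path. The symbol names are taken from the backtrace
      above and may not match the final code exactly:

      static void __sched_clock_work(struct work_struct *work)
      {
              /* Process context: taking jump_label_mutex here cannot nest
               * inside cpu_hotplug.lock the way the splat above shows. */
              static_key_slow_inc(&__sched_clock_stable);
      }

      static DECLARE_WORK(sched_clock_work, __sched_clock_work);

      void clear_sched_clock_stable(void)
      {
              /* The exact moment of the flip isn't important; defer it. */
              schedule_work(&sched_clock_work);
      }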
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-v55fgqj3nnyqnngmvuu8ep6h@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6577e42a
    • P
      sched/clock, x86: Use a static_key for sched_clock_stable · 35af99e6
      Committed by Peter Zijlstra
      In order to avoid the runtime condition and variable load, turn
      sched_clock_stable into a static_key.
      
      Also provide a shorter implementation of local_clock() and
      cpu_clock(int) when sched_clock_stable==1.
      
                              MAINLINE   PRE       POST
      
          sched_clock_stable: 1          1         1
          (cold) sched_clock: 329841     221876    215295
          (cold) local_clock: 301773     234692    220773
          (warm) sched_clock: 38375      25602     25659
          (warm) local_clock: 100371     33265     27242
          (warm) rdtsc:       27340      24214     24208
          sched_clock_stable: 0          0         0
          (cold) sched_clock: 382634     235941    237019
          (cold) local_clock: 396890     297017    294819
          (warm) sched_clock: 38194      25233     25609
          (warm) local_clock: 143452     71234     71232
          (warm) rdtsc:       27345      24245     24243
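
      A hedged sketch of the general pattern (names here are illustrative, not
      the exact kernel/sched/clock.c code): the stability check becomes a
      patched branch instead of a variable load, which is where the cycle
      savings in the table come from:

      static struct static_key clock_stable_key = STATIC_KEY_INIT_FALSE;

      static inline bool clock_is_stable(void)
      {
              /* Compiles to a nop/jmp that is live-patched when the key flips,
               * so the fast path loads no flag from memory. */
              return static_key_false(&clock_stable_key);
      }

      static inline u64 example_local_clock(void)
      {
              if (clock_is_stable())
                      return sched_clock();           /* short, stable path */

              return sched_clock_cpu(raw_smp_processor_id());
      }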
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-eummbdechzz37mwmpags1gjr@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      35af99e6
    • P
      sched/clock: Remove local_irq_disable() from the clocks · ef08f0ff
      Committed by Peter Zijlstra
      Now that x86 no longer requires IRQs disabled for sched_clock() and
      ia64 never had this requirement (it doesn't seem to do cpufreq at
      all), we can remove the requirement of disabling IRQs.
      
                              MAINLINE   PRE        POST
      
          sched_clock_stable: 1          1          1
          (cold) sched_clock: 329841     257223     221876
          (cold) local_clock: 301773     309889     234692
          (warm) sched_clock: 38375      25280      25602
          (warm) local_clock: 100371     85268      33265
          (warm) rdtsc:       27340      24247      24214
          sched_clock_stable: 0          0          0
          (cold) sched_clock: 382634     301224     235941
          (cold) local_clock: 396890     399870     297017
          (warm) sched_clock: 38194      25630      25233
          (warm) local_clock: 143452     129629     71234
          (warm) rdtsc:       27345      24307      24245
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-36e5kohiasnr106d077mgubp@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ef08f0ff
    • P
      locking: Optimize lock_bh functions · 9ea4c380
      Committed by Peter Zijlstra
      Currently all _bh_ lock functions do two preempt_count operations:
      
        local_bh_disable();
        preempt_disable();
      
      and for the unlock:
      
        preempt_enable_no_resched();
        local_bh_enable();
      
      Since it's a waste of perfectly good cycles to modify the same variable
      twice when you can do it in one go, use the new
      __local_bh_{dis,en}able_ip() functions that allow us to provide a
      preempt_count value to add/sub.
      
      So define SOFTIRQ_LOCK_OFFSET as the offset a _bh_ lock needs to
      add/sub to be done in one go.
      
      As a bonus it gets rid of the preempt_enable_no_resched() usage.
      
      This reduces 1000 loops of:

        spin_lock_bh(&bh_lock);
        spin_unlock_bh(&bh_lock);

      from 53596 cycles to 51995 cycles. I didn't do enough measurements to say
      with absolute certainty that the result is significant, but the few runs
      I did for each suggest it is.
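
      A hedged sketch of the resulting lock path (modelled on the description
      above; the exact call sites in the patch may differ):

      static inline void __raw_spin_lock_bh(raw_spinlock_t *lock)
      {
              /* One preempt_count update bundles the BH-disable and the
               * preempt-disable counts via SOFTIRQ_LOCK_OFFSET. */
              __local_bh_disable_ip(_RET_IP_, SOFTIRQ_LOCK_OFFSET);
              LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
      }

      static inline void __raw_spin_unlock_bh(raw_spinlock_t *lock)
      {
              do_raw_spin_unlock(lock);
              /* Again a single update; no preempt_enable_no_resched() needed. */
              __local_bh_enable_ip(_RET_IP_, SOFTIRQ_LOCK_OFFSET);
      }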
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: jacob.jun.pan@linux.intel.com
      Cc: Mike Galbraith <bitbucket@online.de>
      Cc: hpa@zytor.com
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: lenb@kernel.org
      Cc: rjw@rjwysocki.net
      Cc: rui.zhang@intel.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20131119151338.GF3694@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9ea4c380
    • D
      sched: Factor out the on_null_domain() checks in trigger_load_balance() · c726099e
      Committed by Daniel Lezcano
      The on_null_domain() test is done twice in the trigger_load_balance()
      function.

      Move the test to the beginning of the function, so there is only one check.
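
      A rough sketch of the resulting shape (the body is abbreviated; the point
      is the single early check):

      void trigger_load_balance(struct rq *rq)
      {
              /* Don't need to rebalance while attached to the NULL domain. */
              if (unlikely(on_null_domain(rq)))
                      return;

              if (time_after_eq(jiffies, rq->next_balance))
                      raise_softirq(SCHED_SOFTIRQ);
              /* ... nohz kick handling follows ... */
      }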
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1389008085-9069-9-git-send-email-daniel.lezcano@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c726099e
    • D
      sched: Pass 'struct rq' to nohz_idle_balance() · 208cb16b
      Committed by Daniel Lezcano
      The cpu information is stored in the struct rq. Pass the struct rq to
      nohz_idle_balance(), so all the functions called in run_rebalance_domains()
      have the same parameters and the 'this_cpu' variable becomes pointless.
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      [ Added !SMP build fix. ]
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1389008085-9069-8-git-send-email-daniel.lezcano@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      208cb16b
    • D
      sched: Pass 'struct rq' to rebalance_domains() · f7ed0a89
      Committed by Daniel Lezcano
      The cpu information is stored in the struct rq and the caller of the
      rebalance_domains() function passes the cpu to retrieve the struct rq, but
      it already has the struct rq info. Replace the cpu parameter with the
      struct rq.
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1389008085-9069-7-git-send-email-daniel.lezcano@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f7ed0a89
    • D
      sched: Remove unused parameter from nohz_balancer_kick() · 0aeeeeba
      Committed by Daniel Lezcano
      The cpu parameter is no longer needed in nohz_balancer_kick(), so remove
      it.
      Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1389008085-9069-6-git-send-email-daniel.lezcano@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0aeeeeba