1. 23 April 2013 (4 commits)
    • nohz: Re-evaluate the tick for the new task after a context switch · 99e5ada9
      Committed by Frederic Weisbecker
      When a task is scheduled in, it may have some properties
      of its own that make the CPU reconsider the need for
      the tick: posix cpu timers, perf events, ...

      So notify the full dynticks subsystem when a task gets
      scheduled in and re-check the tick dependency at this
      stage. This is done through a self-IPI so that we don't
      mess with any locks currently held.
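
      A minimal sketch of such a context-switch hook, assuming the helper
      names (tick_nohz_full_cpu(), tick_nohz_tick_stopped(),
      tick_nohz_full_kick()) from the rest of this series rather than
      quoting the patch itself:

              /* Sketch: called from the context switch tail. If the tick
               * is stopped on this full dynticks CPU, queue a self-IPI to
               * re-evaluate the tick dependency from IRQ context, outside
               * of whatever locks the scheduler holds here. */
              void tick_nohz_task_switch(struct task_struct *tsk)
              {
                      unsigned long flags;

                      local_irq_save(flags);
                      if (tick_nohz_full_cpu(smp_processor_id()) &&
                          tick_nohz_tick_stopped() && !is_idle_task(tsk))
                              tick_nohz_full_kick();  /* self-IPI */
                      local_irq_restore(flags);
              }
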
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      99e5ada9
    • nohz: Re-evaluate the tick from the scheduler IPI · ff442c51
      Committed by Frederic Weisbecker
      The scheduler IPI is used by the scheduler to kick
      full dynticks CPUs asynchronously when more than one
      task is running or when a new timer list timer is
      enqueued. This way the destination CPU can decide
      to restart the tick to handle this new situation.

      Now let's perform that kick from the scheduler IPI
      handler itself.

      (Reusing the scheduler IPI rather than implementing
      a new IPI was suggested by Peter Zijlstra a while ago.)
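
      A sketch of how the kick could hang off the existing IPI handler;
      tick_nohz_full_check() is an assumed name for the per-CPU tick
      re-evaluation, and the surrounding handler is simplified:

              /* Sketch: scheduler IPI handler, extended to re-evaluate
               * the tick on full dynticks CPUs before the usual work. */
              void scheduler_ipi(void)
              {
                      irq_enter();
                      if (tick_nohz_full_cpu(smp_processor_id()))
                              tick_nohz_full_check(); /* restart tick if needed */
                      sched_ttwu_pending();           /* process remote wakeups */
                      irq_exit();
              }
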
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      ff442c51
    • sched: New helper to prevent from stopping the tick in full dynticks · ce831b38
      Committed by Frederic Weisbecker
      Provide a new helper, called from the full dynticks engine
      before stopping the tick, to make sure we don't stop it
      while there is more than one task running on the CPU.

      This way we make sure that the tick stays alive to maintain
      fairness.
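
      A minimal sketch of such a helper, assuming it simply inspects the
      runqueue's nr_running count (the patch may carry extra
      memory-ordering details):

              /* Sketch: called by the dynticks engine before stopping
               * the tick; returns false if it must stay alive. */
              bool sched_can_stop_tick(void)
              {
                      struct rq *rq = this_rq();

                      /* More than one runnable task: the tick drives
                       * preemption between them, so keep it. */
                      if (rq->nr_running > 1)
                              return false;

                      return true;
              }
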
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      ce831b38
    • sched: Kick full dynticks CPU that have more than one task enqueued. · 9f3660c2
      Committed by Frederic Weisbecker
      Kick the tick on full dynticks CPUs when they get more
      than one task running on their queue. This makes sure that
      local fairness is maintained by the tick on the destination.

      This is done regardless of these tasks' scheduling class.
      We should be able to be more clever in the future depending
      on the classes involved, e.g. a CPU that runs a SCHED_FIFO
      task doesn't need to maintain fairness against locally
      pending tasks of the fair class.

      But keep things simple for now.
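
      A sketch of the enqueue-side check, assuming the kick fires when
      nr_running crosses from one to two and that the scheduler IPI
      (smp_send_reschedule()) is the kick mechanism, per the previous
      entries:

              /* Sketch: when a second task shows up on a full dynticks
               * CPU, kick it so its possibly-stopped tick gets
               * re-evaluated for local fairness. */
              static inline void inc_nr_running(struct rq *rq)
              {
                      rq->nr_running++;

                      if (rq->nr_running == 2 &&
                          tick_nohz_full_cpu(cpu_of(rq))) {
                              /* Publish nr_running before the IPI */
                              smp_wmb();
                              smp_send_reschedule(cpu_of(rq));
                      }
              }
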
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      9f3660c2
  2. 16 April 2013 (1 commit)
    • nohz: Switch from "extended nohz" to "full nohz" based naming · c5bfece2
      Committed by Frederic Weisbecker
      "Extended nohz" was used as a naming base for the full dynticks
      API and Kconfig symbols. It reflects the fact the system tries
      to stop the tick in more places than just idle.
      
      But that "extended" name is a bit opaque and vague. Rename it to
      "full" makes it clearer what the system tries to do under this
      config: try to shutdown the tick anytime it can. The various
      constraints that prevent that to happen shouldn't be considered
      as fundamental properties of this feature but rather technical
      issues that may be solved in the future.
      Reported-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      c5bfece2
  3. 03 April 2013 (1 commit)
    • nohz: Rename CONFIG_NO_HZ to CONFIG_NO_HZ_COMMON · 3451d024
      Committed by Frederic Weisbecker
      We are planning to convert the dynticks Kconfig options layout
      into a choice menu. The user must be able to easily pick
      any of the following implementations: constant periodic tick,
      idle dynticks, full dynticks.
      
      As this implies a mutual exclusion, the two dynticks implementations
      need to converge on the selection of a common Kconfig option in order
      to ease the sharing of a common infrastructure.
      
      It would thus seem pretty natural to reuse CONFIG_NO_HZ to
      that end. It already implements all the idle dynticks code
      and the full dynticks depends on all that code for now.
      So ideally the choice menu would propose CONFIG_NO_HZ_IDLE and
      CONFIG_NO_HZ_EXTENDED then both would select CONFIG_NO_HZ.
      
      On the other hand we want to stay backward compatible: if
      CONFIG_NO_HZ is set in an older config file, we want to
      enable CONFIG_NO_HZ_IDLE by default.
      
      But we can't afford both at the same time or we run into
      a circular dependency:
      
      1) CONFIG_NO_HZ_IDLE and CONFIG_NO_HZ_EXTENDED both select
         CONFIG_NO_HZ
      2) If CONFIG_NO_HZ is set, we default to CONFIG_NO_HZ_IDLE
      
      We might be able to support that from Kconfig/Kbuild but it
      may not be wise to introduce such a confusing behaviour.
      
      So to solve this, create a new CONFIG_NO_HZ_COMMON option
      which gathers the common code between idle and full dynticks
      (for now that common code is simply the idle dynticks code)
      and have both options select it from their respective Kconfig
      entries, as sketched below.

      Then we'll later create CONFIG_NO_HZ_IDLE and map CONFIG_NO_HZ
      to it for backward compatibility.
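
      A sketch of the intended Kconfig layout; the symbol names come from
      the message above, while the prompts and the exact wiring are
      assumptions:

              # Hidden symbol gathering the code shared by both modes
              config NO_HZ_COMMON
                      bool

              choice
                      prompt "Timer tick handling"

              config HZ_PERIODIC
                      bool "Periodic timer ticks (constant rate, no dynticks)"

              config NO_HZ_IDLE
                      bool "Idle dynticks system (tickless idle)"
                      select NO_HZ_COMMON

              config NO_HZ_EXTENDED
                      bool "Full dynticks system (tickless)"
                      select NO_HZ_COMMON

              endchoice
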
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      3451d024
  4. 21 March 2013 (1 commit)
    • nohz: Wake up full dynticks CPUs when a timer gets enqueued · 1c20091e
      Committed by Frederic Weisbecker
      Wake up a CPU when a timer list timer is enqueued there and
      the target is part of the full dynticks range. Sending an IPI
      to it makes it reconsider the next timer to program in light
      of recent updates.

      This may later be improved by checking if the tick is really
      stopped on the target. That would need some careful
      synchronization though, so leave such an optimization for
      later and start simple.
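
      A sketch of the wakeup path, assuming a helper along these lines
      gets called from the timer enqueue code (add_timer_on()) when the
      target is a full dynticks CPU:

              /* Sketch: kick a remote full dynticks CPU after enqueueing
               * a timer list timer on it, so it re-programs its next
               * timer event on top of the update. */
              static bool wake_up_full_nohz_cpu(int cpu)
              {
                      if (tick_nohz_full_cpu(cpu)) {
                              if (cpu != smp_processor_id() ||
                                  tick_nohz_tick_stopped())
                                      smp_send_reschedule(cpu);
                              return true;
                      }

                      return false;
              }
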
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      1c20091e
  5. 18 March 2013 (1 commit)
  6. 14 March 2013 (2 commits)
    • sched: Fix variable name misnomer, add comments · 1bf08230
      Committed by Andrei Epure
      The min_vruntime variable actually stores the maximum value.
      The added comment was taken from the place_entity() function.
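
      An illustrative sketch of the misnomer (not the exact diff): the
      helper computes a maximum, so an accumulator named min_vruntime was
      misleading:

              /* Sketch: wrap-safe max of two vruntimes; the accumulator
               * is now named for what it actually holds. */
              static inline u64 max_vruntime(u64 max_vruntime, u64 vruntime)
              {
                      s64 delta = (s64)(vruntime - max_vruntime);

                      if (delta > 0)
                              max_vruntime = vruntime;

                      return max_vruntime;
              }
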
      Signed-off-by: Andrei Epure <epure.andrei@gmail.com>
      Cc: peterz@infradead.org
      Link: http://lkml.kernel.org/r/1363115544-1964-1-git-send-email-epure.andrei@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1bf08230
    • sched: Lower chances of cputime scaling overflow · d9a3c982
      Committed by Frederic Weisbecker
      Some users have reported that after running a process with
      hundreds of threads on intensive CPU-bound loads, the cputime
      of the group started to freeze after a few days.
      
      This is due to how we scale the tick-based cputime against
      the scheduler precise execution time value.
      
      We add the values of all threads in the group and we multiply
      that against the sum of the scheduler exec runtime of the whole
      group.
      
      This easily overflows after a few days/weeks of execution.
      
      A proposed solution was to compute the multiplication on
      stime instead of utime:
         62188451
         ("cputime: Avoid multiplication overflow on utime scaling")
      
      The rationale behind that was that it's easy for a thread to
      spend most of its time in userspace under an intensive CPU-bound
      workload, but it's much harder to sustain a long CPU-bound run in
      the kernel.
      
      This postulate was defeated when a user recently reported that he
      was still seeing cputime freezes after the above patch. The
      workload that triggers this issue is an intensive networking load
      where most of the cputime is consumed in the kernel.
      
      To reduce the opportunities for multiplication overflow much
      further, let's reduce the multiplication factors to the remainders
      of the division between sched exec runtime and cputime. Assuming
      the difference between these shouldn't ever be that large, it
      should work in many situations.

      This gets the same results as the upstream scaling code except for
      a small difference: the upstream code always rounds the result to
      the nearest integer not greater than the precise value, whereas
      the new code rounds to the nearest integer, which may be greater
      or smaller than the precise value. In practice this difference
      probably doesn't matter, but it's worth mentioning.
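
      A sketch of the remainder-based scaling described above; the helper
      name scale_stime() is an assumption, while div64_u64_rem() and
      div64_u64() are the stock kernel 64-bit division helpers:

              /* Sketch: compute stime * rtime / total while keeping
               * every intermediate product small. Splitting the larger
               * factor into quotient and remainder avoids the full
               * 128-bit multiplication. */
              static u64 scale_stime(u64 stime, u64 rtime, u64 total)
              {
                      u64 rem, res, scaled;

                      if (rtime >= total) {
                              /* rtime = res * total + rem */
                              res = div64_u64_rem(rtime, total, &rem);
                              scaled = stime * res;
                              scaled += div64_u64(stime * rem, total);
                      } else {
                              /* total = res * rtime + rem */
                              res = div64_u64_rem(total, rtime, &rem);
                              scaled = div64_u64(stime, res);
                              scaled -= div64_u64(scaled * rem, total);
                      }

                      return scaled;
              }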
      
      If this solution appears not to be enough in the end, we'll
      need to partly revert back to the behaviour prior to commit
           0cf55e1e
           ("sched, cputime: Introduce thread_group_times()")
      
      Back then, the scaling was done at exit() time, before adding the
      cputime of an exiting thread to the signal struct. We would then
      need to scale the live threads' cputime one by one in
      thread_group_cputime(). The drawback may be slightly slower code
      at exit time.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      d9a3c982
  7. 11 March 2013 (2 commits)
  8. 08 March 2013 (2 commits)
    • cputime: Dynamically scale cputime for full dynticks accounting · 9fbc42ea
      Committed by Frederic Weisbecker
      The full dynticks cputime accounting can account either
      through the tick or through the context tracking subsystem.
      This way the housekeeping CPU can keep the low-overhead
      tick-based solution.

      This latter mode has a coarse jiffies-resolution granularity
      and needs to be scaled against the precise CFS runtime
      accounting to improve its results. We already do this for
      CONFIG_TICK_CPU_ACCOUNTING; now we need to expand it to the
      dynamically-off case of full dynticks accounting as well.
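
      A sketch of the read-side switch, under the assumptions that
      vtime_accounting_enabled() is the runtime test from this series and
      cputime_adjust() is the existing tick-based scaling path:

              /* Sketch: choose the accounting flavor at read time. Live
               * context-tracking (vtime) values are already precise;
               * tick-based jiffies-granular values get scaled against
               * the CFS sum_exec_runtime. */
              void task_cputime_adjusted(struct task_struct *p,
                                         cputime_t *ut, cputime_t *st)
              {
                      struct task_cputime cputime;

                      if (vtime_accounting_enabled()) {
                              task_cputime(p, ut, st); /* precise */
                              return;
                      }

                      task_cputime(p, &cputime.utime, &cputime.stime);
                      cputime.sum_exec_runtime = p->se.sum_exec_runtime;
                      cputime_adjust(&cputime, &p->prev_cputime, ut, st);
              }
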
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Mats Liljegren <mats.liljegren@enea.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      9fbc42ea
    • context_tracking: Restore preempted context state after preempt_schedule_irq() · b22366cd
      Committed by Frederic Weisbecker
      From the context tracking POV, preempt_schedule_irq() behaves pretty much
      like an exception: It can be called anytime and schedule another task.
      
      But currently it doesn't restore the context tracking state of the preempted
      code on preempt_schedule_irq() return.
      
      As a result, if preempt_schedule_irq() is called in the tiny frame between
      user_enter() and the actual return to userspace, we resume userspace with
      the wrong context tracking state.
      
      Fix this by using exception_enter/exit() which are a perfect fit for this
      kind of issue.
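
      A sketch of the fix, mirroring the description above (the body of
      the preemption loop is the pre-existing code, shown slightly
      simplified):

              /* Sketch: treat this preemption point like an exception.
               * Exit the tracked context state we preempted (it may be
               * the user state if we hit the tiny frame after
               * user_enter()) and restore it before resuming. */
              asmlinkage void __sched preempt_schedule_irq(void)
              {
                      enum ctx_state prev_state;

                      prev_state = exception_enter();

                      do {
                              add_preempt_count(PREEMPT_ACTIVE);
                              local_irq_enable();
                              __schedule();
                              local_irq_disable();
                              sub_preempt_count(PREEMPT_ACTIVE);
                      } while (need_resched());

                      exception_exit(prev_state);
              }
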
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Mats Liljegren <mats.liljegren@enea.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      b22366cd
  9. 06 March 2013 (7 commits)
  10. 28 February 2013 (1 commit)
    • hlist: drop the node parameter from iterators · b67bfe0d
      Committed by Sasha Levin
      I'm not sure why, but the hlist for-each-entry iterators were
      conceived differently from the list ones:
      
              list_for_each_entry(pos, head, member)
      
      The hlist ones were greedy and wanted an extra parameter:
      
              hlist_for_each_entry(tpos, pos, head, member)
      
      Why did they need an extra pos parameter? I'm not quite sure. Not
      only do they not really need it, it also prevents the iterator from
      looking exactly like the list iterator, which is unfortunate.
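
      As a before/after illustration (the structure and member names here
      are hypothetical), the conversion looks like this:

              struct connection *conn;
              struct hlist_node *node;        /* cursor, no longer needed */

              /* before: extra struct hlist_node cursor required */
              hlist_for_each_entry(conn, node, head, hash_link)
                      handle(conn);

              /* after: same shape as list_for_each_entry() */
              hlist_for_each_entry(conn, head, hash_link)
                      handle(conn);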
      
      Besides the semantic patch, there was some manual work required:
      
       - Fix up the actual hlist iterators in linux/list.h
       - Fix up the declarations of other iterators based on the hlist ones.
       - A very small number of places were using the 'node' parameter; these
         were modified to use 'obj->member' instead.
       - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
       properly, so those had to be fixed up manually.
      
      The semantic patch which is mostly the work of Peter Senna Tschudin is here:
      
      @@
      iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
      
      type T;
      expression a,c,d,e;
      identifier b;
      statement S;
      @@
      
      -T b;
          <+... when != b
      (
      hlist_for_each_entry(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue(a,
      - b,
      c) S
      |
      hlist_for_each_entry_from(a,
      - b,
      c) S
      |
      hlist_for_each_entry_rcu(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_rcu_bh(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue_rcu_bh(a,
      - b,
      c) S
      |
      for_each_busy_worker(a, c,
      - b,
      d) S
      |
      ax25_uid_for_each(a,
      - b,
      c) S
      |
      ax25_for_each(a,
      - b,
      c) S
      |
      inet_bind_bucket_for_each(a,
      - b,
      c) S
      |
      sctp_for_each_hentry(a,
      - b,
      c) S
      |
      sk_for_each(a,
      - b,
      c) S
      |
      sk_for_each_rcu(a,
      - b,
      c) S
      |
      sk_for_each_from
      -(a, b)
      +(a)
      S
      + sk_for_each_from(a) S
      |
      sk_for_each_safe(a,
      - b,
      c, d) S
      |
      sk_for_each_bound(a,
      - b,
      c) S
      |
      hlist_for_each_entry_safe(a,
      - b,
      c, d, e) S
      |
      hlist_for_each_entry_continue_rcu(a,
      - b,
      c) S
      |
      nr_neigh_for_each(a,
      - b,
      c) S
      |
      nr_neigh_for_each_safe(a,
      - b,
      c, d) S
      |
      nr_node_for_each(a,
      - b,
      c) S
      |
      nr_node_for_each_safe(a,
      - b,
      c, d) S
      |
      - for_each_gfn_sp(a, c, d, b) S
      + for_each_gfn_sp(a, c, d) S
      |
      - for_each_gfn_indirect_valid_sp(a, c, d, b) S
      + for_each_gfn_indirect_valid_sp(a, c, d) S
      |
      for_each_host(a,
      - b,
      c) S
      |
      for_each_host_safe(a,
      - b,
      c, d) S
      |
      for_each_mesh_entry(a,
      - b,
      c, d) S
      )
          ...+>
      
      [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
      [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: fix warnings]
      [akpm@linux-foundation.org: redo intrusive kvm changes]
      Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b67bfe0d
  11. 24 February 2013 (2 commits)
    • cputime: Use local_clock() for full dynticks cputime accounting · 7f6575f1
      Committed by Frederic Weisbecker
      Running the full dynticks cputime accounting with preemptible
      kernel debugging triggers the following warning:
      
      	[    4.488303] BUG: using smp_processor_id() in preemptible [00000000] code: init/1
      	[    4.490971] caller is native_sched_clock+0x22/0x80
      	[    4.493663] Pid: 1, comm: init Not tainted 3.8.0+ #13
      	[    4.496376] Call Trace:
      	[    4.498996]  [<ffffffff813410eb>] debug_smp_processor_id+0xdb/0xf0
      	[    4.501716]  [<ffffffff8101e642>] native_sched_clock+0x22/0x80
      	[    4.504434]  [<ffffffff8101db99>] sched_clock+0x9/0x10
      	[    4.507185]  [<ffffffff81096ccd>] fetch_task_cputime+0xad/0x120
      	[    4.509916]  [<ffffffff81096dd5>] task_cputime+0x35/0x60
      	[    4.512622]  [<ffffffff810f146e>] acct_update_integrals+0x1e/0x40
      	[    4.515372]  [<ffffffff8117d2cf>] do_execve_common+0x4ff/0x5c0
      	[    4.518117]  [<ffffffff8117cf14>] ? do_execve_common+0x144/0x5c0
      	[    4.520844]  [<ffffffff81867a10>] ? rest_init+0x160/0x160
      	[    4.523554]  [<ffffffff8117d457>] do_execve+0x37/0x40
      	[    4.526276]  [<ffffffff810021a3>] run_init_process+0x23/0x30
      	[    4.528953]  [<ffffffff81867aac>] kernel_init+0x9c/0xf0
      	[    4.531608]  [<ffffffff8188356c>] ret_from_fork+0x7c/0xb0
      
      We use sched_clock() to perform and fix up the cputime
      accounting. However we are calling it with preemption enabled
      from the read side, which triggers the bug above.

      To fix this up, use local_clock() instead. It takes care of
      preemption and also provides a more reliable clock source. This
      is welcome for this kind of statistic, which is widely relied
      on in userspace.
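
      A sketch of the read-side delta after the change; vtime_snap is the
      snapshot field from this series and the helper name is an
      assumption:

              /* Sketch: tickless time elapsed since the last cputime
               * snapshot. local_clock() is preemption-safe, unlike a
               * raw sched_clock() call, which trips
               * debug_smp_processor_id() in preemptible context. */
              static u64 vtime_delta(struct task_struct *tsk)
              {
                      return local_clock() - tsk->vtime_snap;
              }
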
      Reported-by: Thomas Gleixner <tglx@linutronix.de>
      Reported-by: Ingo Molnar <mingo@kernel.org>
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Link: http://lkml.kernel.org/r/1361636925-22288-3-git-send-email-fweisbec@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      7f6575f1
    • sched: do not use cpu_to_node() to find an offlined cpu's node. · aa00d89c
      Committed by Tang Chen
      If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu)
      will return -1.  As a result, cpumask_of_node(nid) will return NULL.
      In this case, find_next_bit() in for_each_cpu will be handed a NULL
      pointer and cause a panic.
      
      Here is a call trace:
        Call Trace:
         <IRQ>
          select_fallback_rq+0x71/0x190
          try_to_wake_up+0x2cb/0x2f0
          wake_up_process+0x15/0x20
          hrtimer_wakeup+0x22/0x30
          __run_hrtimer+0x83/0x320
          hrtimer_interrupt+0x106/0x280
          smp_apic_timer_interrupt+0x69/0x99
          apic_timer_interrupt+0x6f/0x80
      
      There is a sleeping process, woken by an hrtimer, whose cpu has
      already been offlined.  When it is woken up, it tries to find
      another cpu to run on, and gets a -1 nid.  As a result,
      cpumask_of_node(-1) returns NULL, and causes a kernel panic.

      This patch fixes the problem by checking whether the nid is -1.  If
      nid is not -1, a cpu on the same node will be picked.  Else, an
      online cpu on another node will be picked.
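
      A sketch of the guarded node-local search in select_fallback_rq();
      the later fallback stages are condensed and the helper names follow
      the mainline scheduler of that era:

              /* Sketch: pick a fallback runqueue for a task whose cpu
               * went away. Skip the node-local pass when the cpu's node
               * is gone (nid == -1), since cpumask_of_node(-1) would
               * hand for_each_cpu a NULL mask. */
              static int select_fallback_rq(int cpu, struct task_struct *p)
              {
                      int nid = cpu_to_node(cpu);
                      const struct cpumask *nodemask = NULL;
                      int dest_cpu;

                      if (nid != -1) {
                              nodemask = cpumask_of_node(nid);

                              /* Allowed, online CPUs on the same node */
                              for_each_cpu(dest_cpu, nodemask) {
                                      if (cpu_online(dest_cpu) &&
                                          cpumask_test_cpu(dest_cpu,
                                                  tsk_cpus_allowed(p)))
                                              return dest_cpu;
                              }
                      }

                      /* Otherwise any allowed, online cpu elsewhere */
                      for_each_cpu(dest_cpu, tsk_cpus_allowed(p)) {
                              if (cpu_online(dest_cpu))
                                      return dest_cpu;
                      }

                      return -1; /* total failure handled by the caller */
              }
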
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      aa00d89c
  12. 22 February 2013 (2 commits)
  13. 20 February 2013 (1 commit)
  14. 19 February 2013 (1 commit)
  15. 15 February 2013 (1 commit)
  16. 08 February 2013 (3 commits)
  17. 05 February 2013 (2 commits)
  18. 04 February 2013 (1 commit)
  19. 31 January 2013 (1 commit)
  20. 29 January 2013 (1 commit)
  21. 28 January 2013 (3 commits)
    • cputime: Safely read cputime of full dynticks CPUs · 6a61671b
      Committed by Frederic Weisbecker
      While remotely reading the cputime of a task running on a
      full dynticks CPU, the values stored in the utime/stime fields
      of struct task_struct may be stale. They may reflect the last
      kernel <-> user transition time snapshot, so we need to add
      the tickless time spent since that snapshot.

      To fix this, flush the cputime of the dynticks CPUs on every
      kernel <-> user transition and record the time / context
      where we did this. Then, on top of this snapshot and the current
      time, perform the fixup on the reader side from the task_times()
      accessors.
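
      A sketch of the seqlock-protected reader-side fixup; the
      vtime_seqlock / vtime_snap_whence names are assumptions taken from
      this series:

              /* Sketch: remote reader of a full dynticks task's cputime.
               * Retry under a seqlock against the writer (the dynticks
               * CPU flushing on kernel <-> user transitions), then add
               * the tickless time since the recorded snapshot to the
               * right bucket. */
              void task_cputime(struct task_struct *t,
                                cputime_t *utime, cputime_t *stime)
              {
                      unsigned int seq;
                      cputime_t delta;

                      do {
                              seq = read_seqbegin(&t->vtime_seqlock);

                              *utime = t->utime;
                              *stime = t->stime;
                              delta = vtime_delta(t);

                              if (t->vtime_snap_whence == VTIME_USER)
                                      *utime += delta; /* tickless user time */
                              else if (t->vtime_snap_whence == VTIME_SYS)
                                      *stime += delta; /* tickless kernel time */
                      } while (read_seqretry(&t->vtime_seqlock, seq));
              }
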
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      [fixed kvm module related build errors]
      Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
      6a61671b
    • kvm: Prepare to add generic guest entry/exit callbacks · c11f11fc
      Committed by Frederic Weisbecker
      Do some preparatory groundwork before adding the guest_enter()
      and guest_exit() context tracking callbacks. Those will
      later be used to read the guest cputime safely when we
      run in full dynticks mode.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      c11f11fc
    • cputime: Use accessors to read task cputime stats · 6fac4829
      Committed by Frederic Weisbecker
      This is in preparation for the full dynticks feature. While
      remotely reading the cputime of a task running on a full
      dynticks CPU, we'll need to do some extra computation. This
      way we can account the time it spent tickless in userspace
      since its last cputime snapshot.
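
      A sketch of the trivial accessor that call sites switch to; on full
      dynticks kernels a later patch can override it with the snapshot
      fixup:

              /* Sketch: default accessor, a plain field read. Converting
               * call sites from t->utime / t->stime to this function
               * gives the full dynticks code a single place to hook its
               * extra computation. */
              static inline void task_cputime(struct task_struct *t,
                                              cputime_t *utime,
                                              cputime_t *stime)
              {
                      if (utime)
                              *utime = t->utime;
                      if (stime)
                              *stime = t->stime;
              }
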
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      6fac4829