1. 18 Nov 2010, 6 commits
    • sched: Fix update_cfs_load() synchronization · e33078ba
      Authored by Paul Turner
      Using cfs_rq->nr_running is not sufficient to synchronize update_cfs_load with
      the put path since nr_running accounting occurs at deactivation.
      
      It's also not safe to make the removal decision based on load_avg as this fails
      with both high periods and low shares.  Resolve this by clipping history after
      4 periods without activity.
      
      Note: the clipping above will always occur via update_shares(), since in
      the last-task-sleep case that task will still be cfs_rq->curr when
      update_cfs_load() is called.
      Signed-off-by: Paul Turner <pjt@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20101115234937.933428187@google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
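      A minimal model of the clipping rule in plain C; the struct below is an
      illustrative stand-in for the kernel's cfs_rq bookkeeping, and the
      per-period decay step is an assumption for the example, not the
      kernel's actual arithmetic:

      #include <stdint.h>

      /* Toy model: once four periods pass with no activity, discard the
       * load history outright instead of decaying it further, so the
       * removal decision no longer depends on load_avg magnitude (which
       * misbehaves with high periods and low shares). */
      struct cfs_rq_model {
              uint64_t load_avg;      /* decayed load contribution */
              unsigned idle_periods;  /* consecutive periods without activity */
      };

      static void update_cfs_load_model(struct cfs_rq_model *cfs_rq, int active)
      {
              if (active) {
                      cfs_rq->idle_periods = 0;
                      return;
              }

              if (++cfs_rq->idle_periods >= 4)
                      cfs_rq->load_avg = 0;   /* clip history */
              else
                      cfs_rq->load_avg >>= 1; /* illustrative decay */
      }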
    • sched: Fix load corruption from update_cfs_shares() · f0d7442a
      Authored by Paul Turner
      As part of enqueue_entity() both a new entity weight and its contribution
      to the queueing cfs_rq / rq are updated.  Since update_cfs_shares() will
      only update the queueing weights when the entity is on_rq (which in this
      case it is not yet), there's a dependency loop here:
      
      update_cfs_shares() needs account_entity_enqueue() to update cfs_rq->load.weight
      account_entity_enqueue() needs the updated weight for the queueing cfs_rq load [*]
      
      Fix this and avoid spurious dequeue/enqueues by issuing update_cfs_shares as
      if we had accounted the enqueue already.
      
      This was also resulting in rq->load corruption previously.
      
      [*]: this dependency also exists when using the group cfs_rq with
           update_cfs_shares(), as the weight of the enqueued entity changes
           without the load being updated.
      Signed-off-by: Paul Turner <pjt@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20101115234937.844900206@google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
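      The resulting enqueue ordering can be sketched as follows; the names
      follow kernel/sched_fair.c of this era, but signatures and surrounding
      detail are simplified:

      static void enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
      {
              update_curr(cfs_rq);
              update_cfs_load(cfs_rq);
              /*
               * Pass the incoming weight so shares are computed as if the
               * enqueue had already been accounted, which breaks the loop
               * without a spurious dequeue/enqueue cycle.
               */
              update_cfs_shares(cfs_rq, se->load.weight);
              account_entity_enqueue(cfs_rq, se); /* cfs_rq->load.weight += ... */
              /* ... place the entity on the rbtree, update stats ... */
      }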
    • sched: Make tg_shares_up() walk on-demand · 9e3081ca
      Authored by Peter Zijlstra
      Make tg_shares_up() use the active cgroup list. This means we cannot do a
      strict bottom-up walk of the hierarchy, but assuming a very wide tree with
      a small number of active groups it should be a win.
      Signed-off-by: Paul Turner <pjt@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20101115234937.754159484@google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
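      The shape of the resulting walk, simplified from the patch: only
      cfs_rqs that registered themselves as active are visited, under RCU:

      static void update_shares(int cpu)
      {
              struct cfs_rq *cfs_rq;
              struct rq *rq = cpu_rq(cpu);

              rcu_read_lock();
              /* Active groups only, not the full hierarchy. */
              for_each_leaf_cfs_rq(rq, cfs_rq)
                      update_shares_cpu(cfs_rq->tg, cpu);
              rcu_read_unlock();
      }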
    • sched: Implement on-demand (active) cfs_rq list · 3d4b47b4
      Authored by Peter Zijlstra
      Make certain load-balance actions scale with the number of active cgroups
      instead of the number of existing cgroups.
      
      This makes wakeup/sleep paths more expensive, but is a win for systems
      where the vast majority of existing cgroups are idle.
      Signed-off-by: Paul Turner <pjt@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20101115234937.666535048@google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
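      A sketch of the bookkeeping this introduces (simplified; list-ordering
      details elided): a cfs_rq is linked onto a per-rq leaf list when it
      first gains a runnable entity and unlinked when it goes idle, so the
      walks above only ever see active groups:

      static void list_add_leaf_cfs_rq(struct cfs_rq *cfs_rq)
      {
              if (cfs_rq->on_list)
                      return;

              list_add_rcu(&cfs_rq->leaf_cfs_rq_list,
                           &rq_of(cfs_rq)->leaf_cfs_rq_list);
              cfs_rq->on_list = 1;
      }

      static void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
      {
              if (cfs_rq->on_list) {
                      list_del_rcu(&cfs_rq->leaf_cfs_rq_list);
                      cfs_rq->on_list = 0;
              }
      }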
    • sched: Rewrite tg_shares_up() · 2069dd75
      Authored by Peter Zijlstra
      By tracking a per-cpu load-avg for each cfs_rq and folding it into a
      global task_group load on each tick we can rework tg_shares_up() to be
      strictly per-cpu.
      
      This should improve cpu-cgroup performance for SMP systems
      significantly.
      
      [ Paul: changed to use queueing cfs_rq + bug fixes ]
      Signed-off-by: Paul Turner <pjt@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20101115234937.580480400@google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
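      The per-cpu share computation this enables can be sketched as below;
      the idea (this cpu's load over the globally folded sum) follows the
      patch, while the function name and clamping details are simplified:

      static long calc_shares(struct task_group *tg, struct cfs_rq *cfs_rq)
      {
              long tg_weight, load, shares;

              /* Sum of per-cpu load averages, folded in on each tick. */
              tg_weight = atomic_read(&tg->load_weight);
              load = cfs_rq->load.weight;

              shares = tg->shares * load;
              if (tg_weight)
                      shares /= tg_weight;

              if (shares < MIN_SHARES)        /* floor, as in sched.c */
                      shares = MIN_SHARES;
              if (shares > tg->shares)
                      shares = tg->shares;

              return shares;
      }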
    • sched: Simplify cpu-hot-unplug task migration · 48c5ccae
      Authored by Peter Zijlstra
      While discussing the need for sched_idle_next(), Oleg remarked that
      since try_to_wake_up() ensures sleeping tasks will end up running on a
      sane cpu, we can do away with migrate_live_tasks().
      
      If we then extend the existing hack of migrating current from
      CPU_DYING to migrating the full rq worth of tasks from CPU_DYING, the
      need for the sched_idle_next() abomination disappears as well, since
      idle will be the only possible thread left after the migration thread
      stops.
      
      This greatly simplifies the hot-unplug task migration path, as can be
      seen from the resulting code reduction (and about half the new lines
      are comments).
      Suggested-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1289851597.2109.547.camel@laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
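      The core of the simplified path, close to the patch's migrate_tasks()
      but with the rq-lock juggling elided:

      static void migrate_tasks(unsigned int dead_cpu)
      {
              struct rq *rq = cpu_rq(dead_cpu);
              struct task_struct *next;
              int dest_cpu;

              /*
               * Runs from CPU_DYING with the machine stopped, so the
               * dying rq cannot change underneath us: pick tasks off it
               * one by one until only the idle task remains.
               */
              for (;;) {
                      if (rq->nr_running == 1)        /* only idle left */
                              break;

                      next = pick_next_task(rq);
                      next->sched_class->put_prev_task(rq, next);

                      dest_cpu = select_fallback_rq(dead_cpu, next);
                      __migrate_task(next, dead_cpu, dest_cpu);
              }
      }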
  2. 16 Nov 2010, 1 commit
  3. 12 Nov 2010, 3 commits
  4. 11 Nov 2010, 1 commit
    • perf_events: Fix time tracking in samples · eed01528
      Authored by Stephane Eranian
      This patch corrects time tracking in samples. Without this patch both
      time_enabled and time_running are bogus when the user asks for
      PERF_SAMPLE_READ.
      
      One uses PERF_SAMPLE_READ to sample the values of other counters in each
      sample. Because of multiplexing, it is necessary to know both time_enabled
      and time_running to be able to scale counts correctly.
      
      In this second version of the patch, we maintain a shadow
      copy of ctx->time which allows us to compute ctx->time without
      calling update_context_time() from NMI context. We avoid the
      issue that update_context_time() must always be called with
      ctx->lock held.
      
      We do not keep shadow copies of the other event timings because if the
      lead event is overflowing then it is active, and thus it has been
      scheduled in via event_sched_in(), in which case neither tstamp_stopped
      nor tstamp_running can be modified.
      
      This timing logic only applies to samples when PERF_SAMPLE_READ
      is used.
      
      Note that this patch does not address timing issues related
      to sampling inheritance between tasks. This will be addressed
      in a future patch.
      
      With this patch, the libpfm4 example task_smpl now reports
      correct counts (shown on 2.4GHz Core 2):
      
      $ task_smpl -p 2400000000 -e unhalted_core_cycles:u,instructions_retired:u,baclears  noploop 5
      noploop for 5 seconds
      IIP:0x000000004006d6 PID:5596 TID:5596 TIME:466,210,211,430 STREAM_ID:33 PERIOD:2,400,000,000 ENA=1,010,157,814 RUN=1,010,157,814 NR=3
      	2,400,000,254 unhalted_core_cycles:u (33)
      	2,399,273,744 instructions_retired:u (34)
      	53,340 baclears (35)
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4cc6e14b.1e07e30a.256e.5190@mx.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
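      The shadow-copy trick reduces to an offset maintained under ctx->lock
      and consumed lock-free; a sketch with illustrative signatures:

      /* Updated whenever ctx->time is advanced, with ctx->lock held. */
      static void perf_set_shadow_time(struct perf_event *event,
                                       u64 ctx_time, u64 now)
      {
              event->shadow_ctx_time = ctx_time - now;
      }

      /* Safe from NMI context: reconstructs context time without the lock. */
      static u64 perf_shadow_ctx_time(struct perf_event *event)
      {
              return event->shadow_ctx_time + perf_clock();
      }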
  5. 10 Nov 2010, 1 commit
    • block: remove REQ_HARDBARRIER · 02e031cb
      Authored by Christoph Hellwig
      REQ_HARDBARRIER is dead now, so remove the leftovers.  What's left
      at this point is:
      
       - various checks inside the block layer.
       - sanity checks in bio based drivers.
       - now unused bio_empty_barrier helper.
       - Xen blockfront use of BLKIF_OP_WRITE_BARRIER - it's been dead for a
         while, but Xen really needs to sort out its barrier situation.
       - setting of ordered tags in uas - dead code copied from old scsi
         drivers.
       - scsi different retry for barriers - it's dead and should have been
         removed when flushes were converted to FS requests.
       - blktrace handling of barriers - removed.  Someone who knows blktrace
         better should add support for REQ_FLUSH and REQ_FUA, though.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
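      For reference, the idiom that replaced hard barriers: a cache pre-flush
      plus a forced-unit-access write, expressed through REQ_FLUSH/REQ_FUA.
      A sketch, assuming a prepared bio (WRITE_FLUSH_FUA bundled these flags
      in kernels of this era):

      static void submit_durable_write(struct bio *bio)
      {
              /*
               * Flush the device cache, write the data, and force it to
               * stable storage, without draining the whole queue the way
               * REQ_HARDBARRIER did.
               */
              submit_bio(WRITE_FLUSH_FUA, bio);
      }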
  6. 06 Nov 2010, 2 commits
    • watchdog: Fix section mismatch and potential undefined behavior. · 433039e9
      Authored by David Daney
      Commit d9ca07a0 ("watchdog: Avoid kernel crash when disabling
      watchdog") introduces a section mismatch.
      
      Now that we reference no_watchdog from non-__init code it can no longer
      be __initdata.
      Signed-off-by: David Daney <ddaney@caviumnetworks.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
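      The essence of the fix, as an illustrative anti-pattern (some_flag and
      runtime_check are hypothetical names, not from the patch):

      #include <linux/init.h>

      /* __initdata places the variable in .init.data, which the kernel
       * frees once boot completes. */
      static int some_flag __initdata;

      /* Referencing it from non-__init code is a section mismatch: after
       * the init sections are freed, this read is undefined behavior.
       * The fix is simply to drop the __initdata annotation. */
      static int runtime_check(void)
      {
              return some_flag;
      }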
    • posix-cpu-timers: workaround to suppress the problems with mt exec · e0a70217
      Authored by Oleg Nesterov
      posix-cpu-timers.c correctly assumes that the dying process does
      posix_cpu_timers_exit_group() and removes all !CPUCLOCK_PERTHREAD
      timers from signal->cpu_timers list.
      
      But it also assumes that timer->it.cpu.task is always the group leader,
      and thus that a dead ->task means a dead thread group.
      
      This is obviously not true after de_thread() changes the leader. After
      that, almost every posix_cpu_timer_*() method has problems.
      
      It is not simple to fix this bug correctly. First of all, I think
      that timer->it.cpu should use struct pid instead of task_struct.
      Also, the locking should be reworked completely. In particular,
      tasklist_lock should not be used at all. This all needs a lot of
      nontrivial and hard-to-test changes.
      
      Change __exit_signal() to do posix_cpu_timers_exit_group() when the old
      leader dies during exec. This is not a real fix, just a temporary hack to
      hide the problem for 2.6.37 and stable. IOW, this is obviously wrong, but
      it matches what we currently have anyway: cpu timers do not work after mt
      exec.
      
      In theory this change adds another race. The exiting leader can
      detach the timers which were attached to the new leader. However,
      the window between de_thread() and release_task() is small, we
      can pretend that sys_timer_create() was called before de_thread().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
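      The workaround's shape in exit.c, simplified from the patch; the
      has_group_leader_pid() test identifies an old leader that de_thread()
      has replaced:

      static void __exit_signal(struct task_struct *tsk)
      {
              bool group_dead = thread_group_leader(tsk);
              /* ... */
              if (group_dead) {
                      posix_cpu_timers_exit_group(tsk);
              } else {
                      /*
                       * Temporary hack: the old leader is dying during
                       * exec, so flush the group's CPU timers here rather
                       * than leave them pointing at a dead ->task.
                       */
                      if (unlikely(has_group_leader_pid(tsk)))
                              posix_cpu_timers_exit_group(tsk);
              }
              /* ... */
      }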
  7. 05 Nov 2010, 1 commit
  8. 30 Oct 2010, 12 commits
  9. 29 Oct 2010, 1 commit
  10. 28 Oct 2010, 12 commits