1. 24 April 2011: 1 commit
  2. 21 April 2011: 1 commit
  3. 19 April 2011: 4 commits
    • sched: Fix sched_domain iterations vs. RCU · 057f3fad
      Authored by Peter Zijlstra
      Valdis Kletnieks reported a new RCU debug warning in the scheduler.
      
      Since commit dce840a0 ("sched: Dynamically allocate sched_domain/
      sched_group data-structures") the sched_domain trees are protected by
      RCU instead of RCU-sched.
      
      This means that we need rcu_read_lock() protection when we iterate
      them, since disabling preemption no longer suffices.
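
      A minimal sketch of the resulting pattern (hedged: illustrative, not
      the verbatim patch; for_each_domain() is the iterator the scheduler
      uses for this walk):

            struct sched_domain *sd;

            /*
             * The domain tree can be freed via call_rcu() as soon as the
             * read-side critical section ends, so hold rcu_read_lock()
             * across the walk; disabled preemption is no longer enough.
             */
            rcu_read_lock();
            for_each_domain(cpu, sd) {
                    /* ... inspect sd->flags, sd->span, etc. ... */
            }
            rcu_read_unlock();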
      
      Reported-by: Valdis.Kletnieks@vt.edu
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1302882741.2388.241.camel@twins
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Next buddy hint on sleep and preempt path · 2f36825b
      Authored by Venkatesh Pallipadi
      When a task in a taskgroup sleeps, pick_next_task starts all the way back at
      the root and picks the task/taskgroup with the min vruntime across all
      runnable tasks.
      
      But when there are many frequently sleeping tasks across different
      taskgroups, it makes better sense to stay with the same taskgroup for
      its slice period (or until all tasks in the taskgroup sleep) instead
      of switching across taskgroups on each sleep after a short runtime.
      
      This helps specifically where each taskgroup corresponds to a process
      with multiple threads. The change reduces the number of CR3 switches
      in this case.
      
      Example:
      
      Two taskgroups with two threads each, running for 2ms and
      sleeping for 1ms. Looking at sched:sched_switch shows:
      
      BEFORE: taskgroup_1 threads [5004, 5005], taskgroup_2 threads [5016, 5017]
            cpu-soaker-5004  [003]  3683.391089
            cpu-soaker-5016  [003]  3683.393106
            cpu-soaker-5005  [003]  3683.395119
            cpu-soaker-5017  [003]  3683.397130
            cpu-soaker-5004  [003]  3683.399143
            cpu-soaker-5016  [003]  3683.401155
            cpu-soaker-5005  [003]  3683.403168
            cpu-soaker-5017  [003]  3683.405170
      
      AFTER: taskgroup_1 threads [21890, 21891], taskgroup_2 threads [21934, 21935]
            cpu-soaker-21890 [003]   865.895494
            cpu-soaker-21935 [003]   865.897506
            cpu-soaker-21934 [003]   865.899520
            cpu-soaker-21935 [003]   865.901532
            cpu-soaker-21934 [003]   865.903543
            cpu-soaker-21935 [003]   865.905546
            cpu-soaker-21891 [003]   865.907548
            cpu-soaker-21890 [003]   865.909560
            cpu-soaker-21891 [003]   865.911571
            cpu-soaker-21890 [003]   865.913582
            cpu-soaker-21891 [003]   865.915594
            cpu-soaker-21934 [003]   865.917606
      
      A similar problem exists when there are multiple taskgroups and, say,
      a task A preempts the currently running task B of taskgroup_1. On the
      next schedule, pick_next_task can pick an unrelated task from
      taskgroup_2. Here it would be better to give some preference to task
      B in pick_next_task; see the sketch below.
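
      A hedged, condensed sketch of both hints (fragments in the style of
      kernel/sched_fair.c of this period; the real patch carries more
      guard conditions):

            /* sleep path, in dequeue_task_fair(): when a task sleeps
             * mid-slice, cache its group entity so that pick_next_task()
             * prefers a sibling from the same taskgroup */
            if (task_sleep && parent_entity(se))
                    set_next_buddy(parent_entity(se));

            /* preempt path, in check_preempt_wakeup(): bias the next
             * pick toward the entity that triggered this preemption */
            if (wakeup_preempt_entity(se, pse) == 1) {
                    set_next_buddy(pse);
                    goto preempt;
            }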
      
      A simple (maybe extreme) benchmark I tried was tbench with two tbench
      client processes of two threads each, running on a single CPU.
      Average throughput across five 50-second runs was:
      
       BEFORE: 105.84 MB/sec
       AFTER:  112.42 MB/sec
      Signed-off-by: Venkatesh Pallipadi <venki@google.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1302802253-25760-1-git-send-email-venki@google.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Make set_*_buddy() work on non-task entities · 69c80f3e
      Authored by Venkatesh Pallipadi
      Make set_*_buddy() work on non-task sched_entities, to facilitate the
      use of next_buddy to cache a group entity in cases where one of the
      tasks within that entity sleeps or gets preempted.
      
      set_skip_buddy() was incorrectly checking that the yielding task's
      policy was not equal to SCHED_IDLE. Yielding should happen even when
      the yielding task is SCHED_IDLE, so this change removes the policy
      check on the yielding task.
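
      A hedged sketch of the resulting helper (reconstructed from the
      description; not guaranteed to match the patch line for line):

            static void set_next_buddy(struct sched_entity *se)
            {
                    /* a group entity carries no task policy, so only
                     * filter out genuine SCHED_IDLE tasks */
                    if (entity_is_task(se) &&
                        unlikely(task_of(se)->policy == SCHED_IDLE))
                            return;

                    for_each_sched_entity(se)
                            cfs_rq_of(se)->next = se;
            }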
      Signed-off-by: Venkatesh Pallipadi <venki@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1302744070-30079-2-git-send-email-venki@google.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • next_pidmap: fix overflow condition · c78193e9
      Authored by Linus Torvalds
      next_pidmap() just quietly accepted whatever 'last' pid was passed
      in, which is not all that safe when one of the users is /proc.

      Admittedly the proc code should do some sanity checking on the range
      (and that will be the next commit), but that doesn't mean that the
      helper function should just do the pidmap pointer arithmetic without
      checking the range of its arguments.
      
      So clamp 'last' to PID_MAX_LIMIT.  The fact that we then do "last+1"
      doesn't really matter; the for-loop properly checks against the end
      of the pidmap array (it's only the actual pointer-arithmetic overflow
      case we need to worry about, and going one bit beyond isn't going to
      overflow).
      
      [ Use PID_MAX_LIMIT rather than pid_max as per Eric Biederman ]
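
      A hedged sketch of the clamp in kernel/pid.c (reconstructed from the
      description; the scan body is elided):

            static int next_pidmap(struct pid_namespace *pid_ns,
                                   unsigned int last)
            {
                    int offset;
                    struct pidmap *map, *end;

                    /* clamp: a huge 'last' from a caller like /proc would
                     * otherwise push the pointer arithmetic below past
                     * the end of the pidmap array */
                    if (last >= PID_MAX_LIMIT)
                            return -1;

                    offset = (last + 1) & BITS_PER_PAGE_MASK;
                    map = &pid_ns->pidmap[(last + 1)/BITS_PER_PAGE];
                    end = &pid_ns->pidmap[PIDMAP_ENTRIES];
                    /* ... scan from map to end as before ... */
                    return -1;
            }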
      Reported-by: Tavis Ormandy <taviso@cmpxchg8b.com>
      Analyzed-by: Robert Święcki <robert@swiecki.net>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 16 April 2011: 2 commits
    • block: make unplug timer trace event correspond to the schedule() unplug · 49cac01e
      Authored by Jens Axboe
      It's a pretty close match to what we had before: a triggering timer
      meant that nobody unplugged the plug in due time, and in the new
      scheme that corresponds very closely to the schedule() unplug. It's
      essentially the difference between an explicit unplug (IO unplug) and
      an implicit unplug (timer unplug: we scheduled with pending IO
      queued); see the sketch below.
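
      A hedged reconstruction of the idea (the function, trace-point name,
      and run-queue call are assumptions based on block/blk-core.c of this
      period, not the verbatim patch):

            static void queue_unplugged(struct request_queue *q,
                                        unsigned int depth,
                                        bool from_schedule)
            {
                    /* one unplug trace event, with a flag distinguishing
                     * an explicit (IO-driven) unplug from the implicit
                     * one issued when we schedule with IO still plugged */
                    trace_block_unplug(q, depth, !from_schedule);
                    __blk_run_queue(q, from_schedule);
            }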
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • block: let io_schedule() flush the plug inline · a237c1c5
      Authored by Jens Axboe
      Linus correctly observes that the most important dispatch cases are
      now done from kblockd, which isn't ideal for latency reasons. The
      original reason for switching dispatches out-of-line was to avoid too
      deep a stack, so by letting _only_ the "accidental" flush done
      directly in schedule() be guarded by offload to kblockd, we should be
      able to get the best of both worlds.
      
      So add a blk_schedule_flush_plug() that offloads to kblockd,
      and only use that from the schedule() path.
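
      A hedged sketch of the helper (reconstructed from the description;
      blk_flush_plug_list()'s second argument requests the kblockd
      offload):

            static inline void blk_schedule_flush_plug(struct task_struct *tsk)
            {
                    struct blk_plug *plug = tsk->plug;

                    /* punt the dispatch to kblockd so the flush taken on
                     * the schedule() path adds no I/O stack depth */
                    if (plug)
                            blk_flush_plug_list(plug, true);
            }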
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  5. 15 April 2011: 1 commit
  6. 14 April 2011: 21 commits
  7. 13 April 2011: 1 commit
  8. 12 April 2011: 4 commits
  9. 11 April 2011: 5 commits