1. 19 10月, 2010 6 次提交
    • N
      sched: Force balancing on newidle balance if local group has capacity · fab47622
      Nikhil Rao 提交于
      This patch forces a load balance on a newly idle cpu when the local group has
      extra capacity and the busiest group does not have any. It improves system
      utilization when balancing tasks with a large weight differential.
      
      Under certain situations, such as a niced down task (i.e. nice = -15) in the
      presence of nr_cpus NICE0 tasks, the niced task lands on a sched group and
      kicks away other tasks because of its large weight. This leads to sub-optimal
      utilization of the machine. Even though the sched group has capacity, it does
      not pull tasks because sds.this_load >> sds.max_load, and f_b_g() returns NULL.
      
      With this patch, if the local group has extra capacity, we shortcut the checks
      in f_b_g() and try to pull a task over. A sched group has extra capacity if the
      group capacity is greater than the number of running tasks in that group.
      
      Thanks to Mike Galbraith for discussions leading to this patch and for the
      insight to reuse SD_NEWIDLE_BALANCE.
      Signed-off-by: NNikhil Rao <ncrao@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1287173550-30365-4-git-send-email-ncrao@google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      fab47622
    • N
      sched: Set group_imb only a task can be pulled from the busiest cpu · 2582f0eb
      Nikhil Rao 提交于
      When cycling through sched groups to determine the busiest group, set
      group_imb only if the busiest cpu has more than 1 runnable task. This patch
      fixes the case where two cpus in a group have one runnable task each, but there
      is a large weight differential between these two tasks. The load balancer is
      unable to migrate any task from this group, and hence do not consider this
      group to be imbalanced.
      Signed-off-by: NNikhil Rao <ncrao@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1286996978-7007-3-git-send-email-ncrao@google.com>
      [ small code readability edits ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2582f0eb
    • N
      sched: Do not consider SCHED_IDLE tasks to be cache hot · ef8002f6
      Nikhil Rao 提交于
      This patch adds a check in task_hot to return if the task has SCHED_IDLE
      policy. SCHED_IDLE tasks have very low weight, and when run with regular
      workloads, are typically scheduled many milliseconds apart. There is no
      need to consider these tasks hot for load balancing.
      Signed-off-by: NNikhil Rao <ncrao@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1287173550-30365-2-git-send-email-ncrao@google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ef8002f6
    • L
      sched: Drop all load weight manipulation for RT tasks · 17bdcf94
      Linus Walleij 提交于
      Load weights are for the CFS, they do not belong in the RT task. This makes all
      RT scheduling classes leave the CFS weights alone.
      
      This fixes a real bug as well: I noticed the following phonomena: a process
      elevated to SCHED_RR forks with SCHED_RESET_ON_FORK set, and the child is
      indeed SCHED_OTHER, and the niceval is indeed reset to 0. However the weight
      inserted by set_load_weight() remains at 0, giving the task insignificat
      priority.
      
      With this fix, the weight is reset to what the task had before being elevated
      to SCHED_RR/SCHED_FIFO.
      
      Cc: Lennart Poettering <lennart@poettering.net>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Walleij <linus.walleij@stericsson.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1286807811-10568-1-git-send-email-linus.walleij@stericsson.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      17bdcf94
    • P
      sched: Create special class for stop/migrate work · 34f971f6
      Peter Zijlstra 提交于
      In order to separate the stop/migrate work thread from the SCHED_FIFO
      implementation, create a special class for it that is of higher priority than
      SCHED_FIFO itself.
      
      This currently solves a problem where cpu-hotplug consumes so much cpu-time
      that the SCHED_FIFO class gets throttled, but has the bandwidth replenishment
      timer pending on the now dead cpu.
      
      It is also required for when we add the planned deadline scheduling class above
      SCHED_FIFO, as the stop/migrate thread still needs to transcent those tasks.
      Tested-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1285165776.2275.1022.camel@laptop>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      34f971f6
    • P
      sched: Unindent labels · 49246274
      Peter Zijlstra 提交于
      Labels should be on column 0.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      49246274
  2. 14 10月, 2010 11 次提交
  3. 13 10月, 2010 5 次提交
    • R
      ARM: relax ioremap prohibition (309caa9c) for -final and -stable · 06c10884
      Russell King 提交于
      ... but produce a big warning about the problem as encouragement
      for people to fix their drivers.
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      06c10884
    • R
    • M
      ARM: 6440/1: ep93xx: DMA: fix channel_disable · 10d48b39
      Mika Westerberg 提交于
      When channel_disable() is called, it disables per channel interrupts and
      waits until channels state becomes STATE_STALL, and then disables the
      channel. Now, if the DMA transfer is disabled while the channel is in
      STATE_NEXT we will not wait anything and disable the channel immediately.
      This seems to cause weird data corruption for example in audio transfers.
      
      Fix is to wait while we are in STATE_NEXT or STATE_ON and only then
      disable the channel.
      Signed-off-by: NMika Westerberg <mika.westerberg@iki.fi>
      Acked-by: NRyan Mallon <ryan@bluewatersys.com>
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      10d48b39
    • L
      Merge branch 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 0acc1b2a
      Linus Torvalds 提交于
      * 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: x86: Move TSC reset out of vmcb_init
        KVM: x86: Fix SVM VMCB reset
      0acc1b2a
    • S
      ring-buffer: Fix typo of time extends per page · d0134324
      Steven Rostedt 提交于
      Time stamps for the ring buffer are created by the difference between
      two events. Each page of the ring buffer holds a full 64 bit timestamp.
      Each event has a 27 bit delta stamp from the last event. The unit of time
      is nanoseconds, so 27 bits can hold ~134 milliseconds. If two events
      happen more than 134 milliseconds apart, a time extend is inserted
      to add more bits for the delta. The time extend has 59 bits, which
      is good for ~18 years.
      
      Currently the time extend is committed separately from the event.
      If an event is discarded before it is committed, due to filtering,
      the time extend still exists. If all events are being filtered, then
      after ~134 milliseconds a new time extend will be added to the buffer.
      
      This can only happen till the end of the page. Since each page holds
      a full timestamp, there is no reason to add a time extend to the
      beginning of a page. Time extends can only fill a page that has actual
      data at the beginning, so there is no fear that time extends will fill
      more than a page without any data.
      
      When reading an event, a loop is made to skip over time extends
      since they are only used to maintain the time stamp and are never
      given to the caller. As a paranoid check to prevent the loop running
      forever, with the knowledge that time extends may only fill a page,
      a check is made that tests the iteration of the loop, and if the
      iteration is more than the number of time extends that can fit in a page
      a warning is printed and the ring buffer is disabled (all of ftrace
      is also disabled with it).
      
      There is another event type that is called a TIMESTAMP which can
      hold 64 bits of data in the theoretical case that two events happen
      18 years apart. This code has not been implemented, but the name
      of this event exists, as well as the structure for it. The
      size of a TIMESTAMP is 16 bytes, where as a time extend is only
      8 bytes. The macro used to calculate how many time extends can fit on
      a page used the TIMESTAMP size instead of the time extend size
      cutting the amount in half.
      
      The following test case can easily trigger the warning since we only
      need to have half the page filled with time extends to trigger the
      warning:
      
       # cd /sys/kernel/debug/tracing/
       # echo function > current_tracer
       # echo 'common_pid < 0' > events/ftrace/function/filter
       # echo > trace
       # echo 1 > trace_marker
       # sleep 120
       # cat trace
      
      Enabling the function tracer and then setting the filter to only trace
      functions where the process id is negative (no events), then clearing
      the trace buffer to ensure that we have nothing in the buffer,
      then write to trace_marker to add an event to the beginning of a page,
      sleep for 2 minutes (only 35 seconds is probably needed, but this
      guarantees the bug), and then finally reading the trace which will
      trigger the bug.
      
      This patch fixes the typo and prevents the false positive of that warning.
      Reported-by: NHans J. Koch <hjk@linutronix.de>
      Tested-by: NHans J. Koch <hjk@linutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Stable Kernel <stable@kernel.org>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      d0134324
  4. 12 10月, 2010 13 次提交
  5. 11 10月, 2010 5 次提交