1. 30 9月, 2017 2 次提交
  2. 12 9月, 2017 1 次提交
  3. 09 9月, 2017 2 次提交
  4. 29 8月, 2017 1 次提交
    • Y
      smp: Avoid using two cache lines for struct call_single_data · 966a9671
      Ying Huang 提交于
      struct call_single_data is used in IPIs to transfer information between
      CPUs.  Its size is bigger than sizeof(unsigned long) and less than
      cache line size.  Currently it is not allocated with any explicit alignment
      requirements.  This makes it possible for allocated call_single_data to
      cross two cache lines, which results in double the number of the cache lines
      that need to be transferred among CPUs.
      
      This can be fixed by requiring call_single_data to be aligned with the
      size of call_single_data. Currently the size of call_single_data is the
      power of 2.  If we add new fields to call_single_data, we may need to
      add padding to make sure the size of new definition is the power of 2
      as well.
      
      Fortunately, this is enforced by GCC, which will report bad sizes.
      
      To set alignment requirements of call_single_data to the size of
      call_single_data, a struct definition and a typedef is used.
      
      To test the effect of the patch, I used the vm-scalability multiple
      thread swap test case (swap-w-seq-mt).  The test will create multiple
      threads and each thread will eat memory until all RAM and part of swap
      is used, so that huge number of IPIs are triggered when unmapping
      memory.  In the test, the throughput of memory writing improves ~5%
      compared with misaligned call_single_data, because of faster IPIs.
      Suggested-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NHuang, Ying <ying.huang@intel.com>
      [ Add call_single_data_t and align with size of call_single_data. ]
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Aaron Lu <aaron.lu@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/87bmnqd6lz.fsf@yhuang-mobile.sh.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      966a9671
  5. 25 8月, 2017 1 次提交
  6. 01 8月, 2017 1 次提交
    • V
      sched: cpufreq: Allow remote cpufreq callbacks · 674e7541
      Viresh Kumar 提交于
      With Android UI and benchmarks the latency of cpufreq response to
      certain scheduling events can become very critical. Currently, callbacks
      into cpufreq governors are only made from the scheduler if the target
      CPU of the event is the same as the current CPU. This means there are
      certain situations where a target CPU may not run the cpufreq governor
      for some time.
      
      One testcase to show this behavior is where a task starts running on
      CPU0, then a new task is also spawned on CPU0 by a task on CPU1. If the
      system is configured such that the new tasks should receive maximum
      demand initially, this should result in CPU0 increasing frequency
      immediately. But because of the above mentioned limitation though, this
      does not occur.
      
      This patch updates the scheduler core to call the cpufreq callbacks for
      remote CPUs as well.
      
      The schedutil, ondemand and conservative governors are updated to
      process cpufreq utilization update hooks called for remote CPUs where
      the remote CPU is managed by the cpufreq policy of the local CPU.
      
      The intel_pstate driver is updated to always reject remote callbacks.
      
      This is tested with couple of usecases (Android: hackbench, recentfling,
      galleryfling, vellamo, Ubuntu: hackbench) on ARM hikey board (64 bit
      octa-core, single policy). Only galleryfling showed minor improvements,
      while others didn't had much deviation.
      
      The reason being that this patch only targets a corner case, where
      following are required to be true to improve performance and that
      doesn't happen too often with these tests:
      
      - Task is migrated to another CPU.
      - The task has high demand, and should take the target CPU to higher
        OPPs.
      - And the target CPU doesn't call into the cpufreq governor until the
        next tick.
      
      Based on initial work from Steve Muckle.
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Acked-by: NSaravana Kannan <skannan@codeaurora.org>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      674e7541
  7. 23 6月, 2017 2 次提交
  8. 20 6月, 2017 1 次提交
    • I
      sched/debug: Fix SCHED_WARN_ON() to return a value on !CONFIG_SCHED_DEBUG as well · 6d3aed3d
      Ingo Molnar 提交于
      This definition of SCHED_WARN_ON():
      
       #define SCHED_WARN_ON(x)        ((void)(x))
      
      is not fully compatible with the 'real' WARN_ON_ONCE() primitive, as it
      has no return value, so it cannot be used in conditionals.
      
      Fix it.
      
      Cc: Daniel Axtens <dja@axtens.net>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      6d3aed3d
  9. 08 6月, 2017 7 次提交
  10. 15 5月, 2017 5 次提交
    • P
      sched/topology: Rename sched_group_cpus() · ae4df9d6
      Peter Zijlstra 提交于
      There's a discrepancy in naming between the sched_domain and
      sched_group cpumask accessor. Since we're doing changes, fix it.
      
        $ git grep sched_group_cpus | wc -l
        28
        $ git grep sched_domain_span | wc -l
        38
      
      Suggests changing sched_group_cpus() into sched_group_span():
      
        for i  in `git grep -l sched_group_cpus`
        do
          sed -ie 's/sched_group_cpus/sched_group_span/g' $i
        done
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      ae4df9d6
    • P
      sched/topology: Rename sched_group_mask() · e5c14b1f
      Peter Zijlstra 提交于
      Since sched_group_mask() is now an independent cpumask (it no longer
      masks sched_group_cpus()), rename the thing.
      Suggested-by: NLauro Ramos Venancio <lvenanci@redhat.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      e5c14b1f
    • P
      sched/topology: Add sched_group_capacity debugging · 005f874d
      Peter Zijlstra 提交于
      Add sgc::id to easier spot domain construction issues.
      
      Take the opportunity to slightly rework the group printing, because
      adding more "(id: %d)" strings makes the entire thing very hard to
      read. Also the individual groups are very hard to separate, so add
      explicit visual grouping, which allows replacing all the "(%s: %d)"
      format things with shorter "%s=%d" variants.
      
      Then fix up some inconsistencies in surrounding prints for domains.
      
      The end result looks like:
      
        [] CPU0 attaching sched-domain(s):
        []  domain-0: span=0,4 level=DIE
        []   groups: 0:{ span=0 }, 4:{ span=4 }
        []   domain-1: span=0-1,3-5,7 level=NUMA
        []    groups: 0:{ span=0,4 mask=0,4 cap=2048 }, 1:{ span=1,5 mask=1,5 cap=2048 }, 3:{ span=3,7 mask=3,7 cap=2048 }
        []    domain-2: span=0-7 level=NUMA
        []     groups: 0:{ span=0-1,3-5,7 mask=0,4 cap=6144 }, 2:{ span=1-3,5-7 mask=2,6 cap=6144 }
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      005f874d
    • P
      sched/topology: Small cleanup · 8d5dc512
      Peter Zijlstra 提交于
      Move the allocation of topology specific cpumasks into the topology
      code.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      8d5dc512
    • S
      sched/core: Call __schedule() from do_idle() without enabling preemption · 8663effb
      Steven Rostedt (VMware) 提交于
      I finally got around to creating trampolines for dynamically allocated
      ftrace_ops with using synchronize_rcu_tasks(). For users of the ftrace
      function hook callbacks, like perf, that allocate the ftrace_ops
      descriptor via kmalloc() and friends, ftrace was not able to optimize
      the functions being traced to use a trampoline because they would also
      need to be allocated dynamically. The problem is that they cannot be
      freed when CONFIG_PREEMPT is set, as there's no way to tell if a task
      was preempted on the trampoline. That was before Paul McKenney
      implemented synchronize_rcu_tasks() that would make sure all tasks
      (except idle) have scheduled out or have entered user space.
      
      While testing this, I triggered this bug:
      
       BUG: unable to handle kernel paging request at ffffffffa0230077
       ...
       RIP: 0010:0xffffffffa0230077
       ...
       Call Trace:
        schedule+0x5/0xe0
        schedule_preempt_disabled+0x18/0x30
        do_idle+0x172/0x220
      
      What happened was that the idle task was preempted on the trampoline.
      As synchronize_rcu_tasks() ignores the idle thread, there's nothing
      that lets ftrace know that the idle task was preempted on a trampoline.
      
      The idle task shouldn't need to ever enable preemption. The idle task
      is simply a loop that calls schedule or places the cpu into idle mode.
      In fact, having preemption enabled is inefficient, because it can
      happen when idle is just about to call schedule anyway, which would
      cause schedule to be called twice. Once for when the interrupt came in
      and was returning back to normal context, and then again in the normal
      path that the idle loop is running in, which would be pointless, as it
      had already scheduled.
      
      The only reason schedule_preempt_disable() enables preemption is to be
      able to call sched_submit_work(), which requires preemption enabled. As
      this is a nop when the task is in the RUNNING state, and idle is always
      in the running state, there's no reason that idle needs to enable
      preemption. But that means it cannot use schedule_preempt_disable() as
      other callers of that function require calling sched_submit_work().
      
      Adding a new function local to kernel/sched/ that allows idle to call
      the scheduler without enabling preemption, fixes the
      synchronize_rcu_tasks() issue, as well as removes the pointless spurious
      schedule calls caused by interrupts happening in the brief window where
      preemption is enabled just before it calls schedule.
      
      Reviewed: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170414084809.3dacde2a@gandalf.local.homeSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8663effb
  11. 27 4月, 2017 1 次提交
  12. 16 3月, 2017 2 次提交
    • P
      sched/core: Add {EN,DE}QUEUE_NOCLOCK flags · 0a67d1ee
      Peter Zijlstra 提交于
      Currently {en,de}queue_task() do an unconditional update_rq_clock().
      However since we want to avoid duplicate updates, so that each
      rq->lock section appears atomic in time, we need to be able to skip
      these clock updates.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      0a67d1ee
    • P
      sched/core: Add rq->lock wrappers · 8a8c69c3
      Peter Zijlstra 提交于
      The missing update_rq_clock() check can work with partial rq->lock
      wrappery, since a missing wrapper can cause the warning to not be
      emitted when it should have, but cannot cause the warning to trigger
      when it should not have.
      
      The duplicate update_rq_clock() check however can cause false warnings
      to trigger. Therefore add more comprehensive rq->lock wrappery.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      8a8c69c3
  13. 02 3月, 2017 14 次提交