1. 14 1月, 2015 7 次提交
  2. 09 1月, 2015 4 次提交
    • T
      sched/fair: Fix RCU stall upon -ENOMEM in sched_create_group() · 7f1a169b
      Tetsuo Handa 提交于
      When alloc_fair_sched_group() in sched_create_group() fails,
      free_sched_group() is called, and free_fair_sched_group() is called by
      free_sched_group(). Since destroy_cfs_bandwidth() is called by
      free_fair_sched_group() without calling init_cfs_bandwidth(),
      RCU stall occurs at hrtimer_cancel():
      
        INFO: rcu_sched self-detected stall on CPU { 1}  (t=60000 jiffies g=13074 c=13073 q=0)
        Task dump for CPU 1:
        (fprintd)       R  running task        0  6249      1 0x00000088
        ...
        Call Trace:
         <IRQ>  [<ffffffff81094988>] sched_show_task+0xa8/0x110
         [<ffffffff81097acd>] dump_cpu_task+0x3d/0x50
         [<ffffffff810c3a80>] rcu_dump_cpu_stacks+0x90/0xd0
         [<ffffffff810c7751>] rcu_check_callbacks+0x491/0x700
         [<ffffffff810cbf2b>] update_process_times+0x4b/0x80
         [<ffffffff810db046>] tick_sched_handle.isra.20+0x36/0x50
         [<ffffffff810db0a2>] tick_sched_timer+0x42/0x70
         [<ffffffff810ccb19>] __run_hrtimer+0x69/0x1a0
         [<ffffffff810db060>] ? tick_sched_handle.isra.20+0x50/0x50
         [<ffffffff810ccedf>] hrtimer_interrupt+0xef/0x230
         [<ffffffff810452cb>] local_apic_timer_interrupt+0x3b/0x70
         [<ffffffff8164a465>] smp_apic_timer_interrupt+0x45/0x60
         [<ffffffff816485bd>] apic_timer_interrupt+0x6d/0x80
         <EOI>  [<ffffffff810cc588>] ? lock_hrtimer_base.isra.23+0x18/0x50
         [<ffffffff81193cf1>] ? __kmalloc+0x211/0x230
         [<ffffffff810cc9d2>] hrtimer_try_to_cancel+0x22/0xd0
         [<ffffffff81193cf1>] ? __kmalloc+0x211/0x230
         [<ffffffff810ccaa2>] hrtimer_cancel+0x22/0x30
         [<ffffffff810a3cb5>] free_fair_sched_group+0x25/0xd0
         [<ffffffff8108df46>] free_sched_group+0x16/0x40
         [<ffffffff810971bb>] sched_create_group+0x4b/0x80
         [<ffffffff810aa383>] sched_autogroup_create_attach+0x43/0x1c0
         [<ffffffff8107dc9c>] sys_setsid+0x7c/0x110
         [<ffffffff81647729>] system_call_fastpath+0x12/0x17
      
      Check whether init_cfs_bandwidth() was called before calling
      destroy_cfs_bandwidth().
      Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      [ Move the check into destroy_cfs_bandwidth() to aid compilability. ]
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Paul Turner <pjt@google.com>
      Cc: Ben Segall <bsegall@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/201412252210.GCC30204.SOMVFFOtQJFLOH@I-love.SAKURA.ne.jpSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7f1a169b
    • L
      sched/deadline: Avoid double-accounting in case of missed deadlines · 269ad801
      Luca Abeni 提交于
      The dl_runtime_exceeded() function is supposed to ckeck if
      a SCHED_DEADLINE task must be throttled, by checking if its
      current runtime is <= 0. However, it also checks if the
      scheduling deadline has been missed (the current time is
      larger than the current scheduling deadline), further
      decreasing the runtime if this happens.
      This "double accounting" is wrong:
      
      - In case of partitioned scheduling (or single CPU), this
        happens if task_tick_dl() has been called later than expected
        (due to small HZ values). In this case, the current runtime is
        also negative, and replenish_dl_entity() can take care of the
        deadline miss by recharging the current runtime to a value smaller
        than dl_runtime
      
      - In case of global scheduling on multiple CPUs, scheduling
        deadlines can be missed even if the task did not consume more
        runtime than expected, hence penalizing the task is wrong
      
      This patch fix this problem by throttling a SCHED_DEADLINE task
      only when its runtime becomes negative, and not modifying the runtime
      Signed-off-by: NLuca Abeni <luca.abeni@unitn.it>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NJuri Lelli <juri.lelli@gmail.com>
      Cc: <stable@vger.kernel.org>
      Cc: Dario Faggioli <raistlin@linux.it>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1418813432-20797-3-git-send-email-luca.abeni@unitn.itSigned-off-by: NIngo Molnar <mingo@kernel.org>
      269ad801
    • L
      sched/deadline: Fix migration of SCHED_DEADLINE tasks · 6a503c3b
      Luca Abeni 提交于
      According to global EDF, tasks should be migrated between runqueues
      without checking if their scheduling deadlines and runtimes are valid.
      However, SCHED_DEADLINE currently performs such a check:
      a migration happens doing:
      
      	deactivate_task(rq, next_task, 0);
      	set_task_cpu(next_task, later_rq->cpu);
      	activate_task(later_rq, next_task, 0);
      
      which ends up calling dequeue_task_dl(), setting the new CPU, and then
      calling enqueue_task_dl().
      
      enqueue_task_dl() then calls enqueue_dl_entity(), which calls
      update_dl_entity(), which can modify scheduling deadline and runtime,
      breaking global EDF scheduling.
      
      As a result, some of the properties of global EDF are not respected:
      for example, a taskset {(30, 80), (40, 80), (120, 170)} scheduled on
      two cores can have unbounded response times for the third task even
      if 30/80+40/80+120/170 = 1.5809 < 2
      
      This can be fixed by invoking update_dl_entity() only in case of
      wakeup, or if this is a new SCHED_DEADLINE task.
      Signed-off-by: NLuca Abeni <luca.abeni@unitn.it>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NJuri Lelli <juri.lelli@gmail.com>
      Cc: <stable@vger.kernel.org>
      Cc: Dario Faggioli <raistlin@linux.it>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1418813432-20797-2-git-send-email-luca.abeni@unitn.itSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6a503c3b
    • Y
      sched: Fix odd values in effective_load() calculations · 32a8df4e
      Yuyang Du 提交于
      In effective_load, we have (long w * unsigned long tg->shares) / long W,
      when w is negative, it is cast to unsigned long and hence the product is
      insanely large. Fix this by casting tg->shares to long.
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NYuyang Du <yuyang.du@intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Andrey Ryabinin <a.ryabinin@samsung.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20141219002956.GA25405@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      32a8df4e
  3. 23 12月, 2014 1 次提交
  4. 11 12月, 2014 1 次提交
  5. 08 12月, 2014 1 次提交
  6. 04 12月, 2014 1 次提交
  7. 24 11月, 2014 1 次提交
    • T
      sched: Provide update_curr callbacks for stop/idle scheduling classes · 90e362f4
      Thomas Gleixner 提交于
      Chris bisected a NULL pointer deference in task_sched_runtime() to
      commit 6e998916 'sched/cputime: Fix clock_nanosleep()/clock_gettime()
      inconsistency'.
      
      Chris observed crashes in atop or other /proc walking programs when he
      started fork bombs on his machine.  He assumed that this is a new exit
      race, but that does not make any sense when looking at that commit.
      
      What's interesting is that, the commit provides update_curr callbacks
      for all scheduling classes except stop_task and idle_task.
      
      While nothing can ever hit that via the clock_nanosleep() and
      clock_gettime() interfaces, which have been the target of the commit in
      question, the author obviously forgot that there are other code paths
      which invoke task_sched_runtime()
      
      do_task_stat(()
       thread_group_cputime_adjusted()
         thread_group_cputime()
           task_cputime()
             task_sched_runtime()
              if (task_current(rq, p) && task_on_rq_queued(p)) {
                update_rq_clock(rq);
                up->sched_class->update_curr(rq);
              }
      
      If the stats are read for a stomp machine task, aka 'migration/N' and
      that task is current on its cpu, this will happily call the NULL pointer
      of stop_task->update_curr.  Ooops.
      
      Chris observation that this happens faster when he runs the fork bomb
      makes sense as the fork bomb will kick migration threads more often so
      the probability to hit the issue will increase.
      
      Add the missing update_curr callbacks to the scheduler classes stop_task
      and idle_task.  While idle tasks cannot be monitored via /proc we have
      other means to hit the idle case.
      
      Fixes: 6e998916 'sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency'
      Reported-by: NChris Mason <clm@fb.com>
      Reported-and-tested-by: NBorislav Petkov <bp@alien8.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      90e362f4
  8. 16 11月, 2014 11 次提交
  9. 10 11月, 2014 1 次提交
    • A
      sched/numa: Fix out of bounds read in sched_init_numa() · c123588b
      Andrey Ryabinin 提交于
      On latest mm + KASan patchset I've got this:
      
          ==================================================================
          BUG: AddressSanitizer: out of bounds access in sched_init_smp+0x3ba/0x62c at addr ffff88006d4bee6c
          =============================================================================
          BUG kmalloc-8 (Not tainted): kasan error
          -----------------------------------------------------------------------------
      
          Disabling lock debugging due to kernel taint
          INFO: Allocated in alloc_vfsmnt+0xb0/0x2c0 age=75 cpu=0 pid=0
           __slab_alloc+0x4b4/0x4f0
           __kmalloc_track_caller+0x15f/0x1e0
           kstrdup+0x44/0x90
           alloc_vfsmnt+0xb0/0x2c0
           vfs_kern_mount+0x35/0x190
           kern_mount_data+0x25/0x50
           pid_ns_prepare_proc+0x19/0x50
           alloc_pid+0x5e2/0x630
           copy_process.part.41+0xdf5/0x2aa0
           do_fork+0xf5/0x460
           kernel_thread+0x21/0x30
           rest_init+0x1e/0x90
           start_kernel+0x522/0x531
           x86_64_start_reservations+0x2a/0x2c
           x86_64_start_kernel+0x15b/0x16a
          INFO: Slab 0xffffea0001b52f80 objects=24 used=22 fp=0xffff88006d4befc0 flags=0x100000000004080
          INFO: Object 0xffff88006d4bed20 @offset=3360 fp=0xffff88006d4bee70
      
          Bytes b4 ffff88006d4bed10: 00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a  ........ZZZZZZZZ
          Object ffff88006d4bed20: 70 72 6f 63 00 6b 6b a5                          proc.kk.
          Redzone ffff88006d4bed28: cc cc cc cc cc cc cc cc                          ........
          Padding ffff88006d4bee68: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
          CPU: 0 PID: 1 Comm: swapper/0 Tainted: G    B          3.18.0-rc3-mm1+ #108
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
           ffff88006d4be000 0000000000000000 ffff88006d4bed20 ffff88006c86fd18
           ffffffff81cd0a59 0000000000000058 ffff88006d404240 ffff88006c86fd48
           ffffffff811fa3a8 ffff88006d404240 ffffea0001b52f80 ffff88006d4bed20
          Call Trace:
          dump_stack (lib/dump_stack.c:52)
          print_trailer (mm/slub.c:645)
          object_err (mm/slub.c:652)
          ? sched_init_smp (kernel/sched/core.c:6552 kernel/sched/core.c:7063)
          kasan_report_error (mm/kasan/report.c:102 mm/kasan/report.c:178)
          ? kasan_poison_shadow (mm/kasan/kasan.c:48)
          ? kasan_unpoison_shadow (mm/kasan/kasan.c:54)
          ? kasan_poison_shadow (mm/kasan/kasan.c:48)
          ? kasan_kmalloc (mm/kasan/kasan.c:311)
          __asan_load4 (mm/kasan/kasan.c:371)
          ? sched_init_smp (kernel/sched/core.c:6552 kernel/sched/core.c:7063)
          sched_init_smp (kernel/sched/core.c:6552 kernel/sched/core.c:7063)
          kernel_init_freeable (init/main.c:869 init/main.c:997)
          ? finish_task_switch (kernel/sched/sched.h:1036 kernel/sched/core.c:2248)
          ? rest_init (init/main.c:924)
          kernel_init (init/main.c:929)
          ? rest_init (init/main.c:924)
          ret_from_fork (arch/x86/kernel/entry_64.S:348)
          ? rest_init (init/main.c:924)
          Read of size 4 by task swapper/0:
          Memory state around the buggy address:
           ffff88006d4beb80: fc fc fc fc fc fc fc fc fc fc 00 fc fc fc fc fc
           ffff88006d4bec00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
           ffff88006d4bec80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
           ffff88006d4bed00: fc fc fc fc 00 fc fc fc fc fc fc fc fc fc fc fc
           ffff88006d4bed80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
          >ffff88006d4bee00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc 04 fc
                                                                    ^
           ffff88006d4bee80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
           ffff88006d4bef00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
           ffff88006d4bef80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
           ffff88006d4bf000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
           ffff88006d4bf080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
          ==================================================================
      
      Zero 'level' (e.g. on non-NUMA system) causing out of bounds
      access in this line:
      
           sched_max_numa_distance = sched_domains_numa_distance[level - 1];
      
      Fix this by exiting from sched_init_numa() earlier.
      Signed-off-by: NAndrey Ryabinin <a.ryabinin@samsung.com>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Fixes: 9942f79b ("sched/numa: Export info needed for NUMA balancing on complex topologies")
      Cc: peterz@infradead.org
      Link: http://lkml.kernel.org/r/1415372020-1871-1-git-send-email-a.ryabinin@samsung.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c123588b
  10. 04 11月, 2014 12 次提交