1. 11 Dec 2016 (1 commit)
    • sched/core: Fix find_idlest_group() for fork · f519a3f1
      Committed by Vincent Guittot
      During fork, the utilization of a task is initialized once the rq has
      been selected, because the current utilization level of the rq is used
      to set the utilization of the forked task. As the task's utilization is
      still 0 at this step of the fork sequence, it doesn't make sense to
      look for spare capacity that can fit the task's utilization.
      Furthermore, I can see perf regressions for the test:
      
         hackbench -P -g 1
      
      because the least loaded policy is always bypassed and tasks are not
      spread during fork.
      
      With this patch and the fix below, we are back to the same performance
      as v4.8. The fix below is only a temporary one, used for the test until
      a smarter solution is found, because we can't simply remove the check,
      which is useful for other benchmarks:
      
      | @@ -5708,13 +5708,6 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
      |
      |	avg_cost = this_sd->avg_scan_cost;
      |
      | -	/*
      | -	 * Due to large variance we need a large fuzz factor; hackbench in
      | -	 * particularly is sensitive here.
      | -	 */
      | -	if ((avg_idle / 512) < avg_cost)
      | -		return -1;
      | -
      |	time = local_clock();
      |
      |	for_each_cpu_wrap(cpu, sched_domain_span(sd), target, wrap) {
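      
      The main change itself can be sketched roughly as follows (a simplified
      illustration of the described behaviour, not the actual upstream hunk;
      variable names are assumed):
      
      	/*
      	 * In find_idlest_group(): a forking task still has util == 0,
      	 * so the spare-capacity comparison is meaningless; fall back
      	 * to the least-loaded policy instead.
      	 */
      	if (sd_flag & SD_BALANCE_FORK)
      		goto skip_spare;
      
      	if (most_spare > task_util(p) / 2)
      		return most_spare_sg;
      
      skip_spare:
      	if (!idlest || 100 * this_load < imbalance * min_load)
      		return NULL;
      	return idlest;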
      Tested-by: Matt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
      Acked-by: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dietmar.eggemann@arm.com
      Cc: kernellwp@gmail.com
      Cc: umgwanakikbuti@gmail.com
      Cc: yuyang.du@intel.com
      Link: http://lkml.kernel.org/r/1481216215-24651-2-git-send-email-vincent.guittot@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 08 Dec 2016 (6 commits)
    • kthread: Don't abuse kthread_create_on_cpu() in __kthread_create_worker() · 8fb9dcbd
      Committed by Oleg Nesterov
      kthread_create_on_cpu() sets KTHREAD_IS_PER_CPU and kthread->cpu; this
      only makes sense if the kthread can be parked/unparked by cpuhp code.
      kthread workers never call kthread_parkme(), so this has no effect.
      
      Change __kthread_create_worker() to simply call kthread_bind(task, cpu).
      The very fact that kthread_create_on_cpu() doesn't accept a generic fmt
      shows that it should not be used outside of smpboot.c.
      
      Now, the only reason we can not unexport this helper and move it into
      smpboot.c is that it sets kthread->cpu and struct kthread is not exported.
      And the only reason we can not kill kthread->cpu is that kthread_unpark()
      is used by drivers/gpu/drm/amd/scheduler/gpu_scheduler.c and thus we can
      not turn _unpark into kthread_unpark(struct smp_hotplug_thread *, cpu).
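      
      The resulting creation path looks roughly like this (a sketch of the
      described change, reusing the existing kthread API):
      
      	/*
      	 * __kthread_create_worker(): create an ordinary kthread and,
      	 * when a CPU is requested, simply bind it; no parking involved.
      	 */
      	task = __kthread_create_on_node(kthread_worker_fn, worker,
      					node, namefmt, args);
      	if (IS_ERR(task))
      		return ERR_CAST(task);
      
      	if (cpu >= 0)
      		kthread_bind(task, cpu);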
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Tested-by: Petr Mladek <pmladek@suse.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Petr Mladek <pmladek@suse.com>
      Cc: Chunming Zhou <David1.Zhou@amd.com>
      Cc: Roman Pen <roman.penyaev@profitbricks.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20161129175110.GA5342@redhat.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • kthread: Don't use to_live_kthread() in kthread_[un]park() · cf380a4a
      Committed by Oleg Nesterov
      Now that to_kthread() is always valid, change kthread_park() and
      kthread_unpark() to use it, and kill to_live_kthread().
      
      The conversion of kthread_unpark() is trivial. If KTHREAD_IS_PARKED is
      set then the task has called complete(&self->parked), and in that case
      the function cannot race against a concurrent kthread_stop() and exit.
      
      kthread_park() is more tricky, because its semantics are not well
      defined. It returns -ENOSYS if the thread exited, but this can never
      happen, and as Roman pointed out, kthread_park() can obviously block
      forever if it races with the exiting kthread.
      
      The usage of kthread_park() in cpuhp code (cpu.c, smpboot.c,
      stop_machine.c) is fine. It can never see an exiting/exited kthread:
      smpboot_destroy_threads() clears *ht->store, and smpboot_park_thread()
      checks it is not NULL under the same smpboot_threads_lock.
      cpuhp_threads and cpu_stop_threads never exit, so other callers are
      fine too.
      
      But it has two more users:
      
      - watchdog_park_threads():
      
        The code is actually correct, get_online_cpus() ensures that
        kthread_park() can't race with itself (note that kthread_park() can't
        handle this race correctly), but it should not use kthread_park()
        directly.
      
      - drivers/gpu/drm/amd/scheduler/gpu_scheduler.c should not use
        kthread_park() either.
      
        kthread_park() must not be called after amd_sched_fini() which does
        kthread_stop(), otherwise even to_live_kthread() is not safe because
        task_struct can be already freed and sched->thread can point to nowhere.
      
      The usage of kthread_park/unpark should either be restricted to core
      code which is properly protected against the exit race, or made more
      robust so it is safe to use in drivers.
      
      To catch eventual exit issues, add a WARN_ON(PF_EXITING) for now.
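      
      With that, kthread_park() looks roughly like this (a sketch based on
      the description above, not the verbatim kernel source):
      
      	int kthread_park(struct task_struct *k)
      	{
      		struct kthread *kthread = to_kthread(k);
      
      		/* Catch callers racing with kthread exit: */
      		if (WARN_ON(k->flags & PF_EXITING))
      			return -ENOSYS;
      
      		if (!test_bit(KTHREAD_IS_PARKED, &kthread->flags)) {
      			set_bit(KTHREAD_SHOULD_PARK, &kthread->flags);
      			if (k != current) {
      				wake_up_process(k);
      				wait_for_completion(&kthread->parked);
      			}
      		}
      		return 0;
      	}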
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Chunming Zhou <David1.Zhou@amd.com>
      Cc: Roman Pen <roman.penyaev@profitbricks.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20161129175107.GA5339@redhat.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • kthread: Don't use to_live_kthread() in kthread_stop() · efb29fbf
      Committed by Oleg Nesterov
      kthread_stop() had to use to_live_kthread() simply because it was not
      possible to access kthread->exited after the exiting task clears
      task_struct->vfork_done. Now that to_kthread() is always valid,
      wake_up_process() + wait_for_completion() can be done unconditionally.
      It is no longer an issue if the task has already issued
      complete_vfork_done() or died.
      
      The exiting task can get a spurious wakeup after mm_release(), but this
      is possible without this change too, and it is fine; do_task_dead()
      ensures that this can't do any harm.
      
      As a further enhancement this could be converted to task_work_add() later,
      so ->vfork_done can be avoided completely.
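      
      The simplified flow then becomes (a sketch; flag and field names follow
      the existing kthread code):
      
      	int kthread_stop(struct task_struct *k)
      	{
      		struct kthread *kthread;
      		int ret;
      
      		get_task_struct(k);
      		kthread = to_kthread(k);	/* always valid now */
      		set_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
      		wake_up_process(k);		/* unconditional */
      		wait_for_completion(&kthread->exited);
      		ret = k->exit_code;
      		put_task_struct(k);
      		return ret;
      	}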
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Chunming Zhou <David1.Zhou@amd.com>
      Cc: Roman Pen <roman.penyaev@profitbricks.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20161129175103.GA5336@redhat.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • Revert "kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function" · eff96625
      Committed by Oleg Nesterov
      
      This reverts commit 23196f2e.
      
      Now that struct kthread is kmalloc'ed and no longer lives on the task
      stack, there is no need to pin the stack anymore.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Chunming Zhou <David1.Zhou@amd.com>
      Cc: Roman Pen <roman.penyaev@profitbricks.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20161129175100.GA5333@redhat.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      
    • kthread: Make struct kthread kmalloc'ed · 1da5c46f
      Committed by Oleg Nesterov
      commit 23196f2e "kthread: Pin the stack via try_get_task_stack() /
      put_task_stack() in to_live_kthread() function" is a workaround for the
      fragile design of struct kthread being allocated on the task stack.
      
      struct kthread in its current form should be removed, but this needs
      cleanups outside of kthread.c.
      
      As a first step, move struct kthread away from the task stack by making
      it kmalloc'ed. This allows kthread.exited to be accessed without the
      magic of pinning the task stack and the try logic in to_live_kthread().
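      
      A sketch of the new setup inside the kthread() entry function
      (simplified; the pointer is stashed in an otherwise unused task_struct
      field so that to_kthread() always works):
      
      	struct kthread *self;
      
      	self = kmalloc(sizeof(*self), GFP_KERNEL);
      	/* Publish the pointer for to_kthread(task): */
      	current->set_child_tid = (__force void __user *)self;
      	init_completion(&self->exited);
      	init_completion(&self->parked);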
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Chunming Zhou <David1.Zhou@amd.com>
      Cc: Roman Pen <roman.penyaev@profitbricks.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20161129175057.GA5330@redhat.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • kcov: add missing #include <linux/sched.h> · 166ad0e1
      Committed by Kefeng Wang
      In __sanitizer_cov_trace_pc we use task_struct and fields within it, but
      as we haven't included <linux/sched.h>, it is not guaranteed to be
      defined.  While we usually happen to acquire the definition through a
      transitive include, this is fragile (and hasn't been true in the past,
      causing issues with backports).
      
      Include <linux/sched.h> to avoid any fragility.
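      
      The fix itself is just the explicit include at the top of kernel/kcov.c:
      
      	#include <linux/sched.h>	/* task_struct, current */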
      
      [mark.rutland@arm.com: rewrote changelog]
      Link: http://lkml.kernel.org/r/1481007384-27529-1-git-send-email-wangkefeng.wang@huawei.com
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: James Morse <james.morse@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 06 Dec 2016 (2 commits)
    • lockdep: Fix report formatting · f943fe0f
      Committed by Dmitry Vyukov
      Since commit:
      
        4bcc595c ("printk: reinstate KERN_CONT for printing continuation lines")
      
      printk() requires KERN_CONT to continue log messages. Lots of printk()
      calls in lockdep.c and print_ip_sym() don't have it. As a result,
      lockdep reports are completely messed up.
      
      Add missing KERN_CONT and inline print_ip_sym() where necessary.
      
      Example of a messed up report:
      
        0-rc5+ #41 Not tainted
        -------------------------------------------------------
        syz-executor0/5036 is trying to acquire lock:
         (
        rtnl_mutex
        ){+.+.+.}
        , at:
        [<ffffffff86b3d6ac>] rtnl_lock+0x1c/0x20
        but task is already holding lock:
         (
        &net->packet.sklist_lock
        ){+.+...}
        , at:
        [<ffffffff873541a6>] packet_diag_dump+0x1a6/0x1920
        which lock already depends on the new lock.
        the existing dependency chain (in reverse order) is:
        -> #3
         (
        &net->packet.sklist_lock
        +.+...}
        ...
      
      Without this patch all scripts that parse kernel bug reports are broken.
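      
      The pattern of the fix, illustrated on a hypothetical report line built
      from several printk() calls:
      
      	/* Before: since 4bcc595c each printk() starts a new record: */
      	printk("%s (", curr->comm);
      	printk("%s", lock_name);
      	printk(")\n");
      
      	/* After: continuations are marked, so the line stays intact: */
      	printk("%s (", curr->comm);
      	printk(KERN_CONT "%s", lock_name);
      	printk(KERN_CONT ")\n");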
      Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: andreyknvl@google.com
      Cc: aryabinin@virtuozzo.com
      Cc: joe@perches.com
      Cc: syzkaller@googlegroups.com
      Link: http://lkml.kernel.org/r/1480343083-48731-1-git-send-email-dvyukov@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/core: Remove invalid warning from list_update_cgroup_event() · 8fc31ce8
      Committed by David Carrillo-Cisneros
      The warning introduced in commit:
      
        864c2357 ("perf/core: Do not set cpuctx->cgrp for unscheduled cgroups")
      
      assumed that a cgroup switch always precedes list_del_event(). This is
      not the case. Remove the warning.
      
      Make sure that cpuctx->cgrp is NULL until a cgroup event is sched in
      or ctx->nr_cgroups == 0.
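      
      A sketch of the invariant kept by list_update_cgroup_event() after this
      change (simplified, not the verbatim code; the comparison helper is
      hypothetical):
      
      	/*
      	 * Only the first cgroup event added or the last one removed
      	 * may touch cpuctx->cgrp:
      	 */
      	if (add && ctx->nr_cgroups++)
      		return;
      	else if (!add && --ctx->nr_cgroups)
      		return;
      
      	cpuctx = __get_cpu_context(ctx);
      	if (!add)
      		cpuctx->cgrp = NULL;		/* no WARN anymore */
      	else if (event_cgrp_is_current(event))	/* hypothetical check */
      		cpuctx->cgrp = event->cgrp;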
      Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Nilay Vaish <nilayvaish@gmail.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi V Shankar <ravi.v.shankar@intel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1480841177-27299-1-git-send-email-davidcc@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  4. 02 Dec 2016 (2 commits)
    • locking/rtmutex: Use READ_ONCE() in rt_mutex_owner() · 1be5d4fa
      Committed by Thomas Gleixner
      While debugging the rtmutex unlock vs. dequeue race, Will suggested
      using READ_ONCE() in rt_mutex_owner(), as it might race against the
      cmpxchg_release() in unlock_rt_mutex_safe().
      
      Will: "It's a minor thing which will most likely not matter in practice"
      
      A careful search did not unearth an actual problem in today's code, but
      it's better to be safe than surprised.
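      
      The resulting helper, roughly (a sketch; the owner-mask name is assumed
      from rtmutex_common.h):
      
      	static inline struct task_struct *rt_mutex_owner(struct rt_mutex *lock)
      	{
      		unsigned long owner = (unsigned long)READ_ONCE(lock->owner);
      
      		return (struct task_struct *)(owner & ~RT_MUTEX_OWNER_MASKALL);
      	}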
      Suggested-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: David Daney <ddaney@caviumnetworks.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20161130210030.431379999@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/rtmutex: Prevent dequeue vs. unlock race · dbb26055
      Committed by Thomas Gleixner
      David reported a futex/rtmutex state corruption. It's caused by the
      following problem:
      
      CPU0		CPU1		CPU2
      
      l->owner=T1
      		rt_mutex_lock(l)
      		lock(l->wait_lock)
      		l->owner = T1 | HAS_WAITERS;
      		enqueue(T2)
      		boost()
      		  unlock(l->wait_lock)
      		schedule()
      
      				rt_mutex_lock(l)
      				lock(l->wait_lock)
      				l->owner = T1 | HAS_WAITERS;
      				enqueue(T3)
      				boost()
      				  unlock(l->wait_lock)
      				schedule()
      		signal(->T2)	signal(->T3)
      		lock(l->wait_lock)
      		dequeue(T2)
      		deboost()
      		  unlock(l->wait_lock)
      				lock(l->wait_lock)
      				dequeue(T3)
      				  ===> wait list is now empty
      				deboost()
      				 unlock(l->wait_lock)
      		lock(l->wait_lock)
      		fixup_rt_mutex_waiters()
      		  if (wait_list_empty(l)) {
      		    owner = l->owner & ~HAS_WAITERS;
      		    l->owner = owner
      		     ==> l->owner = T1
      		  }
      
      				lock(l->wait_lock)
      rt_mutex_unlock(l)		fixup_rt_mutex_waiters()
      				  if (wait_list_empty(l)) {
      				    owner = l->owner & ~HAS_WAITERS;
      cmpxchg(l->owner, T1, NULL)
       ===> Success (l->owner = NULL)
      				    l->owner = owner
      				     ==> l->owner = T1
      				  }
      
      That means the problem is caused by fixup_rt_mutex_waiters(), which does
      the RMW to clear the waiters bit unconditionally when there are no
      waiters in the rtmutex's rbtree.
      
      This can be fatal: a concurrent unlock can release the rtmutex in the
      fastpath because the waiters bit is not set. If the cmpxchg() gets in
      the middle of the RMW operation, then the previous owner, which just
      unlocked the rtmutex, is set as the owner again when the write takes
      place after the successful cmpxchg().
      
      The solution is rather trivial: verify that the owner member of the rtmutex
      has the waiters bit set before clearing it. This does not require a
      cmpxchg() or other atomic operations because the waiters bit can only be
      set and cleared with the rtmutex wait_lock held. It's also safe against the
      fast path unlock attempt. The unlock attempt via cmpxchg() will either see
      the bit set and take the slowpath or see the bit cleared and release it
      atomically in the fastpath.
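      
      A sketch of the fixed helper, following the description above
      (simplified):
      
      	static void fixup_rt_mutex_waiters(struct rt_mutex *lock)
      	{
      		unsigned long owner, *p = (unsigned long *)&lock->owner;
      
      		if (rt_mutex_has_waiters(lock))
      			return;
      
      		/*
      		 * The bit is only set and cleared with wait_lock held,
      		 * so plain accesses suffice; clear it only if it is
      		 * actually set:
      		 */
      		owner = READ_ONCE(*p);
      		if (owner & RT_MUTEX_HAS_WAITERS)
      			WRITE_ONCE(*p, owner & ~RT_MUTEX_HAS_WAITERS);
      	}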
      
      It's remarkable that the test program provided by David triggers on
      ARM64 and MIPS64 really quickly, but it refuses to reproduce on x86-64,
      while the problem exists there as well. That refusal might explain why
      this was not discovered earlier, despite the bug existing from day one
      of the rtmutex implementation, more than 10 years ago.
      
      Thanks to David for meticulously instrumenting the code and providing
      the information which allowed this subtle problem to be decoded.
      Reported-by: David Daney <ddaney@caviumnetworks.com>
      Tested-by: David Daney <david.daney@cavium.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: stable@vger.kernel.org
      Fixes: 23f78d4a ("[PATCH] pi-futex: rt mutex core")
      Link: http://lkml.kernel.org/r/20161130210030.351136722@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  5. 01 Dec 2016 (1 commit)
  6. 30 Nov 2016 (1 commit)
    • Re-enable CONFIG_MODVERSIONS in a slightly weaker form · faaae2a5
      Committed by Linus Torvalds
      This enables CONFIG_MODVERSIONS again, but allows for missing symbol CRC
      information in order to work around the issue that newer binutils
      versions seem to occasionally drop the CRC on the floor.  binutils 2.26
      seems to work fine, while binutils 2.27 seems to break MODVERSIONS of
      symbols that have been defined in assembler files.
      
      [ We've had random missing CRC's before - it may be an old problem that
        just is now reliably triggered with the weak asm symbols and a new
        version of binutils ]
      
      Some day I really do want to remove MODVERSIONS entirely.  Sadly, today
      does not appear to be that day: Debian people apparently do want the
      option to enable MODVERSIONS to make it easier to have external modules
      across kernel versions, and this seems to be a fairly minimal fix for
      the annoying problem.
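      
      The "slightly weaker form" boils down to tolerating a missing CRC
      instead of rejecting the module (a sketch, assuming the check lives in
      the symbol version check in kernel/module.c):
      
      	if (!crc) {
      		/* Broken toolchains can drop the CRC: warn, don't fail. */
      		pr_warn_once("%s: no symbol version for %s\n",
      			     mod->name, symname);
      		return 1;
      	}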
      
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Acked-by: Michal Marek <mmarek@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 24 Nov 2016 (2 commits)
  8. 23 Nov 2016 (2 commits)
  9. 22 Nov 2016 (2 commits)
    • sched/autogroup: Do not use autogroup->tg in zombie threads · 8e5bfa8c
      Committed by Oleg Nesterov
      Exactly because for_each_thread() in autogroup_move_group() can't see
      the exiting task and update its ->sched_task_group before the final
      _put() and possible free().
      
      So the exiting task needs another sched_move_task() before
      exit_notify(), and we need to re-introduce the PF_EXITING (or similar)
      check removed for another reason by the previous change.
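      
      The fix boils down to an explicit move on the exit path (a sketch of
      the described change):
      
      	/* Called before exit_notify() (sketch): */
      	void sched_autogroup_exit_task(struct task_struct *p)
      	{
      		/*
      		 * After exit_notify(), autogroup_move_group() can no
      		 * longer see this thread, so give it a final
      		 * sched_move_task() now, while it can still participate
      		 * in the ag->tg refcounting.
      		 */
      		sched_move_task(p);
      	}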
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: hartsjc@redhat.com
      Cc: vbendel@redhat.com
      Cc: vlovejoy@redhat.com
      Link: http://lkml.kernel.org/r/20161114184612.GA15968@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/autogroup: Fix autogroup_move_group() to never skip sched_move_task() · 18f649ef
      Committed by Oleg Nesterov
      The PF_EXITING check in task_wants_autogroup() is no longer needed. Remove
      it, but see the next patch.
      
      However, the comment is correct in that autogroup_move_group() must
      always change task_group() for every thread, so the sysctl_ check is
      very wrong; we can race with cgroups, and even sys_setsid() is not safe
      because a task running with task_group() == ag->tg must participate in
      refcounting:
      
      	#include <assert.h>
      	#include <fcntl.h>
      	#include <signal.h>
      	#include <sys/wait.h>
      	#include <unistd.h>
      
      	int main(void)
      	{
      		int sctl = open("/proc/sys/kernel/sched_autogroup_enabled", O_WRONLY);
      
      		assert(sctl > 0);
      		if (fork()) {
      			wait(NULL); // destroy the child's ag/tg
      			pause();
      		}
      
      		assert(pwrite(sctl, "1\n", 2, 0) == 2);
      		assert(setsid() > 0);
      		if (fork())
      			pause();
      
      		kill(getppid(), SIGKILL);
      		sleep(1);
      
      		// The child has gone, the grandchild runs with kref == 1
      		assert(pwrite(sctl, "0\n", 2, 0) == 2);
      		assert(setsid() > 0);
      
      		// runs with the freed ag/tg
      		for (;;)
      			sleep(1);
      
      		return 0;
      	}
      
      crashes the kernel. It doesn't really need the sleep(1); it doesn't
      matter whether autogroup_move_group() actually frees the task_group or
      whether this happens later.
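      
      A sketch of the corrected move (simplified, locking omitted; every
      thread is switched unconditionally so the refcounting stays balanced):
      
      	static void autogroup_move_group(struct task_struct *p,
      					 struct autogroup *ag)
      	{
      		struct autogroup *prev = p->signal->autogroup;
      		struct task_struct *t;
      
      		p->signal->autogroup = autogroup_kref_get(ag);
      
      		/* No sysctl check here: the move must always happen. */
      		for_each_thread(p, t)
      			sched_move_task(t);
      
      		autogroup_kref_put(prev);
      	}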
      Reported-by: Vern Lovejoy <vlovejoy@redhat.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: hartsjc@redhat.com
      Cc: vbendel@redhat.com
      Link: http://lkml.kernel.org/r/20161114184609.GA15965@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  10. 21 Nov 2016 (1 commit)
  11. 19 Nov 2016 (1 commit)
  12. 17 Nov 2016 (1 commit)
    • bpf: fix range arithmetic for bpf map access · f23cc643
      Committed by Josef Bacik
      I made some invalid assumptions with BPF_AND and BPF_MOD that could
      result in invalid accesses to bpf map entries. Fix this up by doing a
      few things:
      1) Kill BPF_MOD support.  This doesn't actually get used by the compiler in real
      life and just adds extra complexity.
      
      2) Fix the logic for BPF_AND: don't allow AND of negative numbers, and
      set the minimum value to 0 for positive ANDs.
      
      3) Don't do operations on the ranges if they are set to the limits, as they are
      by definition undefined, and allowing arithmetic operations on those values
      could make them appear valid when they really aren't.
      
      This fixes the testcase provided by Jann as well as a few other theoretical
      problems.
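      
      A sketch of rule 2) in the verifier's range tracking (simplified; field
      and constant names follow the verifier of that era):
      
      	case BPF_AND:
      		/*
      		 * Don't allow AND of negative numbers; for positive
      		 * ANDs the result is bounded by [0, max_val].
      		 */
      		if (min_val < 0)
      			dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
      		else
      			dst_reg->min_value = 0;
      		dst_reg->max_value = max_val;
      		break;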
      Reported-by: Jann Horn <jannh@google.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 16 Nov 2016 (12 commits)
  14. 15 Nov 2016 (6 commits)
    • perf/core: Do not set cpuctx->cgrp for unscheduled cgroups · 864c2357
      Committed by David Carrillo-Cisneros
      Commit:
      
        db4a8356 ("perf/core: Set cgroup in CPU contexts for new cgroup events")
      
      failed to verify that event->cgrp is actually the scheduled cgroup
      in a CPU before setting cpuctx->cgrp. This patch fixes that.
      
      Now that there is a different path for scheduled and unscheduled
      cgroups, add a warning to catch when cpuctx->cgrp is still set after
      the last cgroup event has been unscheduled.
      
      To verify the bug:
      
        # Create 2 cgroups.
        mkdir /dev/cgroups/devices/g1
        mkdir /dev/cgroups/devices/g2
      
        # launch a task, bind it to a cpu and move it to g1
        CPU=2
        while :; do : ; done &
        P=$!
      
        taskset -pc $CPU $P
        echo $P > /dev/cgroups/devices/g1/tasks
      
        # monitor g2 (it runs no tasks) and observe output
        perf stat -e cycles -I 1000 -C $CPU -G g2
      
        #           time             counts unit events
           1.000091408          7,579,527      cycles                    g2
           2.000350111      <not counted>      cycles                    g2
           3.000589181      <not counted>      cycles                    g2
           4.000771428      <not counted>      cycles                    g2
      
        # note first line that displays that a task run in g2, despite
        # g2 having no tasks. This is because cpuctx->cgrp was wrongly
        # set when context of new event was installed.
        # After applying the fix we obtain the right output:
      
        perf stat -e cycles -I 1000 -C $CPU -G g2
        #           time             counts unit events
           1.000119615      <not counted>      cycles                    g2
           2.000389430      <not counted>      cycles                    g2
           3.000590962      <not counted>      cycles                    g2
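      
        A sketch of the added verification when a cgroup event is installed
        (simplified; not the verbatim kernel code):
      
      	cgrp = perf_cgroup_from_task(current, ctx);
      	/* Set cpuctx->cgrp only if the event's cgroup matches what is
      	 * actually running on this CPU: */
      	if (cgroup_is_descendant(cgrp->css.cgroup,
      				 event->cgrp->css.cgroup))
      		cpuctx->cgrp = cgrp;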
      Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nilay Vaish <nilayvaish@gmail.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Link: http://lkml.kernel.org/r/1478026378-86083-1-git-send-email-davidcc@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/cputime: Simplify task_cputime() · 353c50eb
      Committed by Stanislaw Gruszka
      Now that fetch_task_cputime() has no users other than task_cputime(),
      its code can be used directly in task_cputime().
      
      Moreover, since only 2 of the 17 task_cputime() calls use a NULL
      argument, we can add dummy variables to those calls and remove the
      NULL checks from task_cputime().
      
      Also remove the NULL checks from task_cputime_scaled().
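      
      The simplification, sketched (vtime handling omitted):
      
      	void task_cputime(struct task_struct *t,
      			  cputime_t *utime, cputime_t *stime)
      	{
      		/*
      		 * Callers now always pass valid pointers (dummy
      		 * variables where the value is unused), so the NULL
      		 * checks go away:
      		 */
      		*utime = t->utime;
      		*stime = t->stime;
      	}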
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1479175612-14718-5-git-send-email-fweisbec@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/cputime, powerpc, s390: Make scaled cputime arch specific · 40565b5a
      Committed by Stanislaw Gruszka
      Only s390 and powerpc have hardware facilities that allow cputime to be
      measured scaled by frequency. On all other architectures
      utimescaled/stimescaled are equal to utime/stime (however they are
      accounted separately).
      
      Remove {u,s}timescaled accounting on all architectures except
      powerpc and s390, where those values are explicitly accounted
      in the proper places.
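      
      On everything except powerpc and s390, the change amounts to dropping
      the redundant scaled value from the generic accounting paths, roughly
      (a sketch; the exact signature change is assumed):
      
      	/* Before: generic code carried a scaled time that was always
      	 * equal to the unscaled one on most architectures: */
      	account_user_time(p, cputime, cputime_scaled);
      
      	/* After: scaled time is accounted only in powerpc/s390
      	 * arch code: */
      	account_user_time(p, cputime);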
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20161031162143.GB12646@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/cputime, powerpc: Remove cputime_to_scaled() · 981ee2d4
      Committed by Stanislaw Gruszka
      Currently cputime_to_scaled() just returns its argument on all
      implementations, so we don't need to call this function.
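      
      What is being removed, for reference:
      
      	/* Every implementation was the identity function: */
      	static inline cputime_t cputime_to_scaled(cputime_t ct)
      	{
      		return ct;
      	}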
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1479175612-14718-3-git-send-email-fweisbec@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • ftrace: Add more checks for FTRACE_FL_DISABLED in processing ip records · 546fece4
      Committed by Steven Rostedt (Red Hat)
      When a module is first loaded and its function ip records are added to
      the ftrace list of functions to modify, they are set to DISABLED, as
      their text is still in a read-only state. When the module is fully
      loaded, and can be updated, the flag is cleared, and if there are any
      functions that should be tracing them, they are updated at that moment.
      
      But there are several locations that do record accounting and should
      ignore records that are marked as disabled, or they can cause issues.
      
      Alexei already fixed one location, but others need to be addressed.
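      
      The pattern added at the affected accounting sites (a sketch; the
      per-record work is hypothetical):
      
      	do_for_each_ftrace_rec(pg, rec) {
      		/* Module text is not yet writable; skip the record: */
      		if (rec->flags & FTRACE_FL_DISABLED)
      			continue;
      
      		update_record_accounting(rec);	/* hypothetical */
      	} while_for_each_ftrace_rec();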
      
      Cc: stable@vger.kernel.org
      Fixes: b7ffffbb "ftrace: Add infrastructure for delayed enabling of module functions"
      Reported-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ftrace: Ignore FTRACE_FL_DISABLED while walking dyn_ftrace records · 977c1f9c
      Committed by Alexei Starovoitov
      ftrace_shutdown() checks the sanity of ftrace records, and if
      dyn_ftrace->flags is not zero, it will warn. It can happen that 'flags'
      is set to FTRACE_FL_DISABLED at this point, because some module was
      loaded, but ftrace_module_enable() has not yet cleared the flags for
      this module.
      
      In other words, module.c is doing:
      
      	ftrace_module_init(mod);          // calls ftrace_update_code(), which sets flags = FTRACE_FL_DISABLED
      	...                               // here ftrace_shutdown() is called and warns, since
      	err = prepare_coming_module(mod); // it didn't have a chance to clear FTRACE_FL_DISABLED
      
      Fix it by ignoring disabled records.
      It's similar to what __ftrace_hash_rec_update() is already doing.
      
      Link: http://lkml.kernel.org/r/1478560460-3818619-1-git-send-email-ast@fb.com
      
      Cc: stable@vger.kernel.org
      Fixes: b7ffffbb "ftrace: Add infrastructure for delayed enabling of module functions"
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>