1. 04 May, 2017 1 commit
  2. 11 Apr, 2017 1 commit
    • sched/core: Remove 'task' parameter and rename tsk_restore_flags() to current_restore_flags() · 717a94b5
      Committed by NeilBrown
      It is not safe for one thread to modify the ->flags
      of another thread as there is no locking that can protect
      the update.
      
      So tsk_restore_flags(), which takes a task pointer and modifies
      the flags, is an invitation to do the wrong thing.
      
      All current users pass "current" as the task, so no developers have
      accepted that invitation.  It would be best to ensure it remains
      that way.
      
      So rename tsk_restore_flags() to current_restore_flags() and don't
      pass in a task_struct pointer.  Always operate on current->flags.
      
      Signed-off-by: NeilBrown <neilb@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
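The save/set/restore pattern that current_restore_flags() serves can be sketched in user space as follows. This is a minimal model, not kernel code: `struct task`, `this_task`, the `current` macro, and `with_memalloc()` are hypothetical stand-ins, and the `PF_MEMALLOC` value is only illustrative. The point is that the helper takes no task pointer, so it can only ever touch the calling thread's own flags.

```c
/* Hypothetical stand-ins for the kernel's task_struct and `current`. */
struct task {
	unsigned long flags;
};

static struct task this_task;
#define current (&this_task)

/* Illustrative flag bit; treat the exact value as an assumption. */
#define PF_MEMALLOC 0x00000800UL

/*
 * Restore only the bits selected by `flags` to the values they had in
 * `orig_flags`. There is no task parameter: the helper always operates
 * on current->flags.
 */
static inline void current_restore_flags(unsigned long orig_flags,
					 unsigned long flags)
{
	current->flags &= ~flags;
	current->flags |= orig_flags & flags;
}

/*
 * Hypothetical caller: set a flag for a critical region, then put the
 * bit back exactly as it was. Returns the flags seen inside the region.
 */
static unsigned long with_memalloc(void)
{
	unsigned long pflags = current->flags;	/* save */
	unsigned long seen;

	current->flags |= PF_MEMALLOC;		/* set */
	seen = current->flags;			/* ... work would go here ... */
	current_restore_flags(pflags, PF_MEMALLOC);	/* restore */

	return seen;
}
```

If the flag was already set before the region, the restore leaves it set; if it was clear, the restore clears it again. Other bits in current->flags are never touched.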
  3. 04 Apr, 2017 1 commit
    • sched/rtmutex/deadline: Fix a PI crash for deadline tasks · e96a7705
      Committed by Xunlei Pang
      A crash happened while I was playing with deadline PI rtmutex.
      
          BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
          IP: [<ffffffff810eeb8f>] rt_mutex_get_top_task+0x1f/0x30
          PGD 232a75067 PUD 230947067 PMD 0
          Oops: 0000 [#1] SMP
          CPU: 1 PID: 10994 Comm: a.out Not tainted
      
          Call Trace:
          [<ffffffff810b658c>] enqueue_task+0x2c/0x80
          [<ffffffff810ba763>] activate_task+0x23/0x30
          [<ffffffff810d0ab5>] pull_dl_task+0x1d5/0x260
          [<ffffffff810d0be6>] pre_schedule_dl+0x16/0x20
          [<ffffffff8164e783>] __schedule+0xd3/0x900
          [<ffffffff8164efd9>] schedule+0x29/0x70
          [<ffffffff8165035b>] __rt_mutex_slowlock+0x4b/0xc0
          [<ffffffff81650501>] rt_mutex_slowlock+0xd1/0x190
          [<ffffffff810eeb33>] rt_mutex_timed_lock+0x53/0x60
          [<ffffffff810ecbfc>] futex_lock_pi.isra.18+0x28c/0x390
          [<ffffffff810ed8b0>] do_futex+0x190/0x5b0
          [<ffffffff810edd50>] SyS_futex+0x80/0x180
      
      This is because rt_mutex_enqueue_pi() and rt_mutex_dequeue_pi()
      are protected only by pi_lock when operating on PI waiters, while
      rt_mutex_get_top_task() accesses them with the rq lock held but
      without holding pi_lock.
      
      To fix this, introduce a new "pi_top_task" pointer cached in
      task_struct, and add a new rt_mutex_update_top_task() to update
      its value. It can be called by rt_mutex_setprio(), which holds
      both the owner's pi_lock and the rq lock. Thus "pi_top_task"
      can be safely accessed by enqueue_task_dl() under the rq lock.
      
      Originally-From: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Xunlei Pang <xlpang@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: juri.lelli@arm.com
      Cc: bigeasy@linutronix.de
      Cc: mathieu.desnoyers@efficios.com
      Cc: jdesfossez@efficios.com
      Cc: bristot@redhat.com
      Link: http://lkml.kernel.org/r/20170323150216.157682758@infradead.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
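The locking rule behind the fix can be modeled in a few lines. This is a single-threaded sketch under stated assumptions, not the kernel implementation: locks are reduced to boolean "held" markers, the waiter tree is reduced to one `top_waiter` pointer, and every function name here is hypothetical. It only shows which lock each access path requires: the waiter data is written under pi_lock, the cached pointer is refreshed only with both locks held, and the scheduler side reads the cache with just the rq lock.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced task: locks are modeled as "held" flags. */
struct task {
	bool pi_lock_held;
	struct task *top_waiter;	/* stands in for pi_waiters; needs pi_lock */
	struct task *pi_top_task;	/* cached copy; readable under rq lock */
};

static bool rq_lock_held;

/* Models rt_mutex_enqueue_pi(): only pi_lock is required. */
static int enqueue_pi(struct task *owner, struct task *waiter)
{
	if (!owner->pi_lock_held)
		return -1;
	owner->top_waiter = waiter;
	return 0;
}

/*
 * Models the cache refresh done from the setprio path: both the owner's
 * pi_lock and the rq lock must be held.
 */
static int update_top_task(struct task *owner)
{
	if (!owner->pi_lock_held || !rq_lock_held)
		return -1;
	owner->pi_top_task = owner->top_waiter;
	return 0;
}

/*
 * Models the scheduler-side read: only the rq lock is held, so only the
 * cached pointer is read, never the waiter data itself.
 */
static int get_top_task(const struct task *t, struct task **out)
{
	if (!rq_lock_held)
		return -1;
	*out = t->pi_top_task;
	return 0;
}
```

Because every write to the cache happens with the rq lock held, a reader that holds the rq lock cannot observe the pointer while it is being changed, even though it never takes pi_lock.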
  4. 28 Mar, 2017 1 commit
    • LSM: Revive security_task_alloc() hook and per "struct task_struct" security blob. · e4e55b47
      Committed by Tetsuo Handa
      We switched from "struct task_struct"->security to "struct cred"->security
      in Linux 2.6.29. But not all LSM modules were happy with that change.
      The TOMOYO LSM module is one example that wants a per "struct task_struct"
      security blob, because TOMOYO's security context is defined based on "struct
      task_struct" rather than "struct cred". The AppArmor LSM module is another
      example that wants one, because AppArmor is currently abusing the cred
      a little bit to store the change_hat and setexeccon info. Although the
      security_task_free() hook was revived in Linux 3.4 because the Yama LSM module
      wanted to release its per "struct task_struct" security blob, the
      security_task_alloc() hook and the "struct task_struct"->security field were
      not revived. Nowadays, we are getting proposals for lightweight LSM modules
      that want to use a per "struct task_struct" security blob.
      
      We already allow multiple concurrent LSM modules (up to one fully
      armored module which uses the "struct cred"->security field or exclusive hooks
      like security_xfrm_state_pol_flow_match(), plus an unlimited number of
      lightweight modules which use neither "struct cred"->security nor exclusive
      hooks) as long as they are built into the kernel. But this patch does not
      implement a variable-length "struct task_struct"->security field, which will
      become necessary when multiple LSM modules want to use the
      "struct task_struct"->security field. Although a variable-length
      "struct task_struct"->security field won't be difficult to implement, let's
      think about it after this patch is merged.
      
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: John Johansen <john.johansen@canonical.com>
      Acked-by: Serge Hallyn <serge@hallyn.com>
      Acked-by: Casey Schaufler <casey@schaufler-ca.com>
      Tested-by: Djalal Harouni <tixxdz@gmail.com>
      Acked-by: José Bollo <jobol@nonadev.net>
      Cc: Paul Moore <paul@paul-moore.com>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: José Bollo <jobol@nonadev.net>
      Signed-off-by: James Morris <james.l.morris@oracle.com>
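A per-task security blob managed by an alloc/free hook pair might look roughly like the sketch below. It is a user-space model under stated assumptions: `struct task_struct` is reduced to the single `security` field, `struct task_blob` and its `domain_id` member are invented for illustration, and the `example_*` functions are hypothetical module hooks rather than code from any real LSM. Since the patch keeps the field fixed-size, the model assumes exactly one module owns it.

```c
#include <stdlib.h>

/* Reduced stand-in for the kernel structure: only the revived field. */
struct task_struct {
	void *security;
};

/* Hypothetical per-task state a lightweight module might keep. */
struct task_blob {
	int domain_id;
};

/*
 * Hypothetical module hook, shaped like a task_alloc hook: allocate the
 * blob when a task is created. Returns 0 on success, -1 on allocation
 * failure (a kernel module would return -ENOMEM).
 */
static int example_task_alloc(struct task_struct *task,
			      unsigned long clone_flags)
{
	struct task_blob *blob = calloc(1, sizeof(*blob));

	(void)clone_flags;	/* unused in this sketch */
	if (!blob)
		return -1;
	task->security = blob;
	return 0;
}

/* Matching task_free-style hook: release the blob when the task goes away. */
static void example_task_free(struct task_struct *task)
{
	free(task->security);
	task->security = NULL;
}
```

Pairing allocation with task creation means the blob exists for the whole lifetime of the task, which is what a module whose security context follows the task, rather than the credentials, needs.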
  5. 24 Mar, 2017 1 commit
  6. 17 Mar, 2017 1 commit
    • cgroup, kthread: close race window where new kthreads can be migrated to non-root cgroups · 77f88796
      Committed by Tejun Heo
      Creation of a kthread goes through a couple of interlocked stages between
      the kthread itself and its creator.  Once the new kthread starts
      running, it initializes itself and wakes up the creator.  The creator
      then can further configure the kthread and then let it start doing its
      job by waking it up.
      
      In this configuration-by-creator stage, the creator is the only one
      that can wake it up but the kthread is visible to userland.  When
      altering the kthread's attributes from userland is allowed, this is
      fine; however, for cases where CPU affinity is critical,
      kthread_bind() is used to first disable affinity changes from userland
      and then set the affinity.  This also prevents the kthread from being
      migrated into non-root cgroups as that can affect the CPU affinity and
      many other things.
      
      Unfortunately, the cgroup side of protection is racy.  While the
      PF_NO_SETAFFINITY flag prevents further migrations, userland can win
      the race before the creator sets the flag with kthread_bind() and put
      the kthread in a non-root cgroup, which can lead to all sorts of
      problems including incorrect CPU affinity and starvation.
      
      This bug got triggered by userland which periodically tries to migrate
      all processes in the root cpuset cgroup to a non-root one.  Per-cpu
      workqueue workers got caught while being created and ended up with
      incorrect CPU affinity, breaking concurrency management and sometimes
      stalling workqueue execution.
      
      This patch adds task->no_cgroup_migration, which prevents the task from
      being migrated by userland.  kthreadd starts with the flag set, making
      every child kthread start in the root cgroup with migration
      disallowed.  The flag is cleared after the kthread finishes
      initialization by which time PF_NO_SETAFFINITY is set if the kthread
      should stay in the root cgroup.
      
      It'd be better to wait for the initialization instead of failing but I
      couldn't think of a way of implementing that without adding either a
      new PF flag, or sleeping and retrying from waiting side.  Even if
      userland depends on changing cgroup membership of a kthread, it either
      has to be synchronized with kthread_create() or periodically repeat,
      so it's unlikely that this would break anything.
      
      v2: Switch to a simpler implementation using a new task_struct bit
          field suggested by Oleg.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Suggested-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Reported-and-debugged-by: Chris Mason <clm@fb.com>
      Cc: stable@vger.kernel.org # v4.3+ (we can't close the race on < v4.3)
      Signed-off-by: Tejun Heo <tj@kernel.org>
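The way the new bit closes the window can be sketched as a small state model. This is an illustration under stated assumptions, not the kernel code: `struct task` is reduced to the two fields involved, all function names are hypothetical, and the `PF_NO_SETAFFINITY` value is only illustrative. The check models a userland-initiated migration request, which is refused while either the bit or the flag is set.

```c
#include <errno.h>

/* Illustrative flag bit; treat the exact value as an assumption. */
#define PF_NO_SETAFFINITY 0x04000000UL

/* Reduced stand-in for task_struct: just the two fields involved. */
struct task {
	unsigned long flags;
	unsigned no_cgroup_migration:1;
};

/* Models the check made when userland asks to move a task to a cgroup. */
static int may_migrate(const struct task *t)
{
	if (t->no_cgroup_migration || (t->flags & PF_NO_SETAFFINITY))
		return -EINVAL;
	return 0;
}

/* A new kthread starts with the bit inherited from kthreadd, i.e. set. */
static void kthread_created(struct task *t)
{
	t->flags = 0;
	t->no_cgroup_migration = 1;
}

/* Models the creator binding the kthread, which sets PF_NO_SETAFFINITY. */
static void kthread_bind_model(struct task *t)
{
	t->flags |= PF_NO_SETAFFINITY;
}

/* Models the end of initialization, where the bit is cleared. */
static void kthread_init_done(struct task *t)
{
	t->no_cgroup_migration = 0;
}
```

In this model there is no point at which a kthread that will be bound is migratable: the bit covers the configuration stage, and by the time it is cleared the flag has taken over. An unbound kthread becomes migratable only after initialization completes.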
  7. 08 Mar, 2017 1 commit
    • livepatch: change to a per-task consistency model · d83a7cb3
      Committed by Josh Poimboeuf
      Change livepatch to use a basic per-task consistency model.  This is the
      foundation which will eventually enable us to patch those ~10% of
      security patches which change function or data semantics.  This is the
      biggest remaining piece needed to make livepatch more generally useful.
      
      This code stems from the design proposal made by Vojtech [1] in November
      2014.  It's a hybrid of kGraft and kpatch: it uses kGraft's per-task
      consistency and syscall barrier switching combined with kpatch's stack
      trace switching.  There are also a number of fallback options which make
      it quite flexible.
      
      Patches are applied on a per-task basis, when the task is deemed safe to
      switch over.  When a patch is enabled, livepatch enters into a
      transition state where tasks are converging to the patched state.
      Usually this transition state can complete in a few seconds.  The same
      sequence occurs when a patch is disabled, except the tasks converge from
      the patched state to the unpatched state.
      
      An interrupt handler inherits the patched state of the task it
      interrupts.  The same is true for forked tasks: the child inherits the
      patched state of the parent.
      
      Livepatch uses several complementary approaches to determine when it's
      safe to patch tasks:
      
      1. The first and most effective approach is stack checking of sleeping
         tasks.  If no affected functions are on the stack of a given task,
         the task is patched.  In most cases this will patch most or all of
         the tasks on the first try.  Otherwise it'll keep trying
         periodically.  This option is only available if the architecture has
         reliable stacks (HAVE_RELIABLE_STACKTRACE).
      
      2. The second approach, if needed, is kernel exit switching.  A
         task is switched when it returns to user space from a system call, a
         user space IRQ, or a signal.  It's useful in the following cases:
      
         a) Patching I/O-bound user tasks which are sleeping on an affected
            function.  In this case you have to send SIGSTOP and SIGCONT to
            force it to exit the kernel and be patched.
         b) Patching CPU-bound user tasks.  If the task is highly CPU-bound
            then it will get patched the next time it gets interrupted by an
            IRQ.
         c) In the future it could be useful for applying patches for
            architectures which don't yet have HAVE_RELIABLE_STACKTRACE.  In
            this case you would have to signal most of the tasks on the
            system.  However this isn't supported yet because there's
            currently no way to patch kthreads without
            HAVE_RELIABLE_STACKTRACE.
      
      3. For idle "swapper" tasks, since they don't ever exit the kernel, they
         instead have a klp_update_patch_state() call in the idle loop which
         allows them to be patched before the CPU enters the idle state.
      
         (Note there's not yet such an approach for kthreads.)
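The first approach, stack checking, can be sketched as follows. This is a simplified user-space model, not the livepatch implementation: a task's stack is represented as an array of function names (a hypothetical stand-in for a reliable stack trace), and `struct task`, `stack_is_clear()`, and `try_switch_all()` are names invented for the example. The caller would repeat the pass periodically until nothing is pending.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { KLP_UNPATCHED = 0, KLP_PATCHED = 1 };

/* Hypothetical task model: the stack is a list of function names. */
struct task {
	const char *const *stack;
	size_t depth;
	int patch_state;
};

/* True if none of the affected functions appears on the task's stack. */
static bool stack_is_clear(const struct task *t,
			   const char *const *affected, size_t naffected)
{
	for (size_t i = 0; i < t->depth; i++)
		for (size_t j = 0; j < naffected; j++)
			if (strcmp(t->stack[i], affected[j]) == 0)
				return false;
	return true;
}

/*
 * One pass over all tasks: switch every task whose stack is clear to the
 * target state. Returns how many tasks are still pending.
 */
static size_t try_switch_all(struct task *tasks, size_t ntasks,
			     const char *const *affected, size_t naffected,
			     int target)
{
	size_t pending = 0;

	for (size_t i = 0; i < ntasks; i++) {
		if (tasks[i].patch_state == target)
			continue;
		if (stack_is_clear(&tasks[i], affected, naffected))
			tasks[i].patch_state = target;
		else
			pending++;
	}
	return pending;
}
```

A task sleeping inside an affected function stays pending across passes, which is the case the kernel exit switching in the second approach is meant to cover.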
      
      All the above approaches may be skipped by setting the 'immediate' flag
      in the 'klp_patch' struct, which will disable per-task consistency and
      patch all tasks immediately.  This can be useful if the patch doesn't
      change any function or data semantics.  Note that, even with this flag
      set, it's possible that some tasks may still be running with an old
      version of the function, until that function returns.
      
      There's also an 'immediate' flag in the 'klp_func' struct which allows
      you to specify that certain functions in the patch can be applied
      without per-task consistency.  This might be useful if you want to patch
      a common function like schedule(), and the function change doesn't need
      consistency but the rest of the patch does.
      
      For architectures which don't have HAVE_RELIABLE_STACKTRACE, the user
      must set patch->immediate which causes all tasks to be patched
      immediately.  This option should be used with care, only when the patch
      doesn't change any function or data semantics.
      
      In the future, architectures which don't have HAVE_RELIABLE_STACKTRACE
      may be allowed to use per-task consistency if we can come up with
      another way to patch kthreads.
      
      The /sys/kernel/livepatch/<patch>/transition file shows whether a patch
      is in transition.  Only a single patch (the topmost patch on the stack)
      can be in transition at a given time.  A patch can remain in transition
      indefinitely, if any of the tasks are stuck in the initial patch state.
      
      A transition can be reversed and effectively canceled by writing the
      opposite value to the /sys/kernel/livepatch/<patch>/enabled file while
      the transition is in progress.  Then all the tasks will attempt to
      converge back to the original patch state.
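Reversing a transition can be pictured as flipping the target state while some tasks have already switched. The sketch below is a toy model under stated assumptions: `struct transition`, `in_transition()`, and `reverse()` are hypothetical names, and "in transition" is derived here from the task states rather than tracked the way the kernel tracks it.

```c
#include <stdbool.h>
#include <stddef.h>

enum { KLP_UNPATCHED = 0, KLP_PATCHED = 1 };

/* Hypothetical model of one transition over a fixed set of tasks. */
struct transition {
	int target;		/* state the tasks are converging to */
	int *task_state;	/* per-task patch state */
	size_t ntasks;
};

/* In this model a transition is in progress until every task has converged. */
static bool in_transition(const struct transition *tr)
{
	for (size_t i = 0; i < tr->ntasks; i++)
		if (tr->task_state[i] != tr->target)
			return true;
	return false;
}

/*
 * Models writing the opposite value to the 'enabled' file mid-transition:
 * the target flips, so tasks that had already switched must switch back.
 */
static void reverse(struct transition *tr)
{
	tr->target = (tr->target == KLP_PATCHED) ? KLP_UNPATCHED : KLP_PATCHED;
}
```

After a reversal, the tasks that had not yet switched are already in the new target state, so only the ones that switched early have to converge again.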
      
      [1] https://lkml.kernel.org/r/20141107140458.GA21774@suse.cz
      
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Acked-by: Miroslav Benes <mbenes@suse.cz>
      Acked-by: Ingo Molnar <mingo@kernel.org>        # for the scheduler changes
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
  8. 03 Mar, 2017 33 commits