1. 11 12月, 2014 27 次提交
    • O
      exit: pidns: fix/update the comments in zap_pid_ns_processes() · a53b8315
      Oleg Nesterov 提交于
      The comments in zap_pid_ns_processes() are not clear, we need to explain
      how this code actually works.
      
      1. "Ignore SIGCHLD" looks like optimization but it is not, we also
         need this for correctness.
      
      2. The comment above sys_wait4() could tell more.
      
         EXIT_ZOMBIE child is only possible if it has exited before we
         ignored SIGCHLD. Or if it is traced from the parent namespace,
         but in this case it will be reaped by debugger after detach,
         sys_wait4() acts as a synchronization point.
      
      3. The comment about TASK_DEAD (EXIT_DEAD in fact) children is
         outdated. Contrary to what it says we do not need to make sure
         they all go away after 0a01f2cc "pidns: Make the pidns proc
         mount/umount logic obvious".
      
         At the same time, we do need to wait for nr_hashed==init_pids,
         but the reasons are quite different and not obvious: setns().
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a53b8315
    • O
      exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting · 24c037eb
      Oleg Nesterov 提交于
      alloc_pid() does get_pid_ns() beforehand but forgets to put_pid_ns() if it
      fails because disable_pid_allocation() was called by the exiting
      child_reaper.
      
      We could simply move get_pid_ns() down to successful return, but this fix
      tries to be as trivial as possible.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Reviewed-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      24c037eb
    • O
      exit: exit_notify: re-use "dead" list to autoreap current · 6c66e7db
      Oleg Nesterov 提交于
      After the previous change we can add just the exiting EXIT_DEAD task to
      the "dead" list and remove another release_task(tsk).
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6c66e7db
    • O
      exit: reparent: call forget_original_parent() under tasklist_lock · 482a3767
      Oleg Nesterov 提交于
      Shift "release dead children" loop from forget_original_parent() to its
      caller, exit_notify().  It is safe to reap them even if our parent reaps
      us right after we drop tasklist_lock, those children no longer have any
      connection to the exiting task.
      
      And this allows us to avoid write_lock_irq(tasklist_lock) right after it
      was released by forget_original_parent(), we can simply call it with
      tasklist_lock held.
      
      While at it, move the comment about forget_original_parent() up to
      this function.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      482a3767
    • O
      exit: reparent: avoid find_new_reaper() if no children · ad9e206a
      Oleg Nesterov 提交于
      Now that pid_ns logic was isolated we can change forget_original_parent()
      to return right after find_child_reaper() when father->children is empty,
      there is nothing to reparent in this case.
      
      In particular this avoids find_alive_thread() and this can help if the
      whole process exits and it has a lot of PF_EXITING threads at the start of
      the thread list, this can easily lead to O(nr_threads ** 2) iterations.
      
      Trivial test case (tested under KVM, 2 CPUs):
      
          static void *tfunc(void *arg)
          {
              pause();
              return NULL;
          }
      
          static int child(unsigned int nt)
          {
              pthread_t pt;
      
              while (nt--)
                  assert(pthread_create(&pt, NULL, tfunc, NULL) == 0);
      
              pthread_kill(pt, SIGTRAP);
              pause();
              return 0;
          }
      
          int main(int argc, const char *argv[])
          {
              int stat;
              unsigned int nf = atoi(argv[1]);
              unsigned int nt = atoi(argv[2]);
      
              while (nf--) {
                  if (!fork())
                      return child(nt);
      
                  wait(&stat);
                  assert(stat == SIGTRAP);
              }
      
              return 0;
          }
      
      $ time ./test 16 16536 shows:
      
                    real        user         sys
          -    5m37.628s    0m4.437s    8m5.560s
          +    0m50.032s    0m7.130s    1m4.927s
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ad9e206a
    • O
      exit: reparent: introduce find_alive_thread() · c9dc05bf
      Oleg Nesterov 提交于
      Add the new simple helper to factor out the for_each_thread() code in
      find_child_reaper() and find_new_reaper().  It can also simplify the
      potential PF_EXITING -> exit_state change, plus perhaps we can change this
      code to take SIGNAL_GROUP_EXIT into account.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Lennart Poettering <lennart@poettering.net>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c9dc05bf
    • O
      exit: reparent: introduce find_child_reaper() · 1109909c
      Oleg Nesterov 提交于
      find_new_reaper() does 2 completely different things.  Not only it finds a
      reaper, it also updates pid_ns->child_reaper or kills the whole namespace
      if the caller is ->child_reaper.
      
      Now that has_child_subreaper logic doesn't depend on child_reaper check we
      can move that pid_ns code into a separate helper.  IMHO this makes the
      code more clean, and this allows the next changes.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Lennart Poettering <lennart@poettering.net>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1109909c
    • O
      exit: reparent: document the ->has_child_subreaper checks · 175aed3f
      Oleg Nesterov 提交于
      Swap the "init_task" and same_thread_group() checks.  This way it is more
      simple to document these checks and we can remove the link to the previous
      discussion on lkml.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Lennart Poettering <lennart@poettering.net>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      175aed3f
    • O
      exit: reparent: s/while_each_thread/for_each_thread/ in find_new_reaper() · 3750ef97
      Oleg Nesterov 提交于
      Change find_new_reaper() to use for_each_thread() instead of deprecated
      while_each_thread().  We do not bother to check "thread != father" in the
      1st loop, we can rely on PF_EXITING check.
      
      Note: this means the minor behavioural change: for_each_thread() starts
      from the group leader.  But this should be fine, nobody should make any
      assumption about do_wait(__WNOTHREAD) when it comes to reparented tasks.
      And this can avoid the pointless reparenting to a short-living thread
      While zombie leaders are not that common.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Lennart Poettering <lennart@poettering.net>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3750ef97
    • O
      exit: reparent: fix the cross-namespace PR_SET_CHILD_SUBREAPER reparenting · 7d24e2df
      Oleg Nesterov 提交于
      find_new_reaper() assumes that "has_child_subreaper" logic is safe as
      long as we are not the exiting ->child_reaper and this is doubly wrong:
      
      1. In fact it is safe if "pid_ns->child_reaper == father"; there must
         be no children after zap_pid_ns_processes() returns, so it doesn't
         matter what we return in this case and even pid_ns->child_reaper is
         wrong otherwise: we can't reparent to ->child_reaper == current.
      
         This is not a bug, but this is confusing.
      
      2. It is not safe if we are not pid_ns->child_reaper but from the same
         thread group. We drop tasklist_lock before zap_pid_ns_processes(),
         so another thread can lock it and choose the new reaper from the
         upper namespace if has_child_subreaper == T, and this is obviously
         wrong.
      
         This is not that bad, zap_pid_ns_processes() won't return until the
         the new reaper reaps all zombies, but this should be fixed anyway.
      
      We could change for_each_thread() loop to use ->exit_state instead of
      PF_EXITING which we had to use until 8aac6270, or we could change
      copy_signal() to check CLONE_NEWPID before setting has_child_subreaper,
      but lets change this code so that it is clear we can't look outside of
      our namespace, otherwise same_thread_group(reaper, child_reaper) check
      will look wrong and confusing anyway.
      
      We can simply start from "father" and fix the problem. We can't wrongly
      return a thread from the same thread group if ->is_child_subreaper == T,
      we know that all threads have PF_EXITING set.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Lennart Poettering <lennart@poettering.net>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7d24e2df
    • O
      exit: reparent: fix the dead-parent PR_SET_CHILD_SUBREAPER reparenting · 8a1296ae
      Oleg Nesterov 提交于
      The ->has_child_subreaper code in find_new_reaper() finds alive "thread"
      but returns another "reaper" thread which can be dead.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Lennart Poettering <lennart@poettering.net>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8a1296ae
    • O
      exit: release_task: fix the comment about group leader accounting · 26e75b5c
      Oleg Nesterov 提交于
      Contrary to what the comment in __exit_signal() says we do account the
      group leader. Fix this and explain why.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      26e75b5c
    • O
      exit: wait: drop tasklist_lock before psig->c* accounting · 986094df
      Oleg Nesterov 提交于
      wait_task_zombie() no longer needs tasklist_lock to accumulate the
      psig->c* counters, we can drop it right after cmpxchg(exit_state).
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      986094df
    • O
      exit: wait: don't use zombie->real_parent · f953ccd0
      Oleg Nesterov 提交于
      1. wait_task_zombie() uses p->real_parent to get psig/siglock. This is
         correct but needs tasklist_lock, ->real_parent can exit.
      
         We can use "current" instead. This is our natural child, its parent
         must be our sub-thread.
      
      2. Read psig/sig outside of ->siglock, ->signal is no longer protected
         by this lock.
      
      3. Fix the outdated comments about tasklist_lock. We can not race with
         __exit_signal(), the whole thread group is dead, nobody but us can
         call it.
      
         Also clarify the usage of ->stats_lock and ->siglock.
      
      Note: thread_group_cputime_adjusted() is sub-optimal in this case, we
      probably want to export cputime_adjust() to avoid thread_group_cputime().
      The comment says "all threads" but there are no other threads.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f953ccd0
    • O
      exit: wait: cleanup the ptrace_reparented() checks · f6507f83
      Oleg Nesterov 提交于
      Now that EXIT_DEAD is the terminal state we can kill "int traced"
      variable and check "state == EXIT_DEAD" instead to cleanup the code.  In
      particular, this way it is clear that the check obviously doesn't need
      tasklist_lock.
      
      Also fix the type of "unsigned long state", "long" was always wrong
      although this doesn't matter because cmpxchg/xchg uses typeof(*ptr).
      
      [akpm@linux-foundation.org: don't make me google the C Operator Precedence table]
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f6507f83
    • O
      usermodehelper: kill the kmod_thread_locker logic · 7f6def9f
      Oleg Nesterov 提交于
      Now that we do not call kernel_thread(CLONE_VFORK) from the worker
      thread we can not deadlock if do_execve() in turn triggers another
      call_usermodehelper(), we can remove the kmod_thread_locker code.
      
      Note: we should probably kill khelper_wq and simply use one of the
      global workqueues, say, system_unbound_wq, this special wq for umh buys
      nothing nowadays.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7f6def9f
    • O
      usermodehelper: don't use CLONE_VFORK for ____call_usermodehelper() · 7117bc88
      Oleg Nesterov 提交于
      After "kernel/kmod: fix use-after-free of the sub_infostructure"
      CLONE_VFORK in __call_usermodehelper() buys nothing, we rely on on
      umh_complete() in ____call_usermodehelper() anyway.
      
      Remove it.  This also eliminates the unnecessary sleep/wakeup in the
      likely case, and this allows the next change.
      
      While at it, kill the "int wait" locals in ____call_usermodehelper() and
      __call_usermodehelper(), they can safely use sub_info->wait.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7117bc88
    • A
      printk: drop logbuf_cpu volatile qualifier · f099755d
      Alex Elder 提交于
      Pranith Kumar posted a patch in which removed the "volatile"
      qualifier for the "logbuf_cpu" variable in vprintk_emit().
          https://lkml.org/lkml/2014/11/13/894
      In his patch, he used ACCESS_ONCE() for all references to
      that symbol to provide whatever protection was intended.
      
      There was some discussion that followed, and in the end Steven Rostedt
      concluded that not only was "volatile" not needed, neither was it
      required to use ACCESS_ONCE().  I offered an elaborate description that
      concluded Steven was right, and Pranith asked me to submit an
      alternative patch.  And this is it.
      
      The basic reason "volatile" is not needed is that "logbuf_cpu" has
      static storage duration, and vprintk_emit() is an exported
      interface.  This means that the value of logbuf_cpu must be read
      from memory the first time it is used in a particular call of
      vprintk_emit().  The variable's value is read only once in that
      function, when it's read it'll be the copy from memory (or cache).
      
      In addition, the value of "logbuf_cpu" is only ever written under
      protection of a spinlock.  So the value that is read is the "real"
      value (and not an out-of-date cached one).  If its value is not
      UINT_MAX, it is the current CPU's processor id, and it will have
      been last written by the running CPU.
      Signed-off-by: NAlex Elder <elder@linaro.org>
      Reported-by: NPranith Kumar <bobby.prani@gmail.com>
      Suggested-by: NSteven Rostedt <rostedt@goodmis.org>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: Petr Mladek <pmladek@suse.cz>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f099755d
    • J
      printk: add and use LOGLEVEL_<level> defines for KERN_<LEVEL> equivalents · a39d4a85
      Joe Perches 提交于
      Use #defines instead of magic values.
      Signed-off-by: NJoe Perches <joe@perches.com>
      Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jason Baron <jbaron@akamai.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a39d4a85
    • J
      printk: remove used-once early_vprintk · 1dc6244b
      Joe Perches 提交于
      Eliminate the unlikely possibility of message interleaving for
      early_printk/early_vprintk use.
      
      early_vprintk can be done via the %pV extension so remove this
      unnecessary function and change early_printk to have the equivalent
      vprintk code.
      
      All uses of early_printk already end with a newline so also remove the
      unnecessary newline from the early_printk function.
      Signed-off-by: NJoe Perches <joe@perches.com>
      Acked-by: NChris Metcalf <cmetcalf@tilera.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1dc6244b
    • P
      kernel: add panic_on_warn · 9e3961a0
      Prarit Bhargava 提交于
      There have been several times where I have had to rebuild a kernel to
      cause a panic when hitting a WARN() in the code in order to get a crash
      dump from a system.  Sometimes this is easy to do, other times (such as
      in the case of a remote admin) it is not trivial to send new images to
      the user.
      
      A much easier method would be a switch to change the WARN() over to a
      panic.  This makes debugging easier in that I can now test the actual
      image the WARN() was seen on and I do not have to engage in remote
      debugging.
      
      This patch adds a panic_on_warn kernel parameter and
      /proc/sys/kernel/panic_on_warn calls panic() in the
      warn_slowpath_common() path.  The function will still print out the
      location of the warning.
      
      An example of the panic_on_warn output:
      
      The first line below is from the WARN_ON() to output the WARN_ON()'s
      location.  After that the panic() output is displayed.
      
          WARNING: CPU: 30 PID: 11698 at /home/prarit/dummy_module/dummy-module.c:25 init_dummy+0x1f/0x30 [dummy_module]()
          Kernel panic - not syncing: panic_on_warn set ...
      
          CPU: 30 PID: 11698 Comm: insmod Tainted: G        W  OE  3.17.0+ #57
          Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
           0000000000000000 000000008e3f87df ffff88080f093c38 ffffffff81665190
           0000000000000000 ffffffff818aea3d ffff88080f093cb8 ffffffff8165e2ec
           ffffffff00000008 ffff88080f093cc8 ffff88080f093c68 000000008e3f87df
          Call Trace:
           [<ffffffff81665190>] dump_stack+0x46/0x58
           [<ffffffff8165e2ec>] panic+0xd0/0x204
           [<ffffffffa038e05f>] ? init_dummy+0x1f/0x30 [dummy_module]
           [<ffffffff81076b90>] warn_slowpath_common+0xd0/0xd0
           [<ffffffffa038e040>] ? dummy_greetings+0x40/0x40 [dummy_module]
           [<ffffffff81076c8a>] warn_slowpath_null+0x1a/0x20
           [<ffffffffa038e05f>] init_dummy+0x1f/0x30 [dummy_module]
           [<ffffffff81002144>] do_one_initcall+0xd4/0x210
           [<ffffffff811b52c2>] ? __vunmap+0xc2/0x110
           [<ffffffff810f8889>] load_module+0x16a9/0x1b30
           [<ffffffff810f3d30>] ? store_uevent+0x70/0x70
           [<ffffffff810f49b9>] ? copy_module_from_fd.isra.44+0x129/0x180
           [<ffffffff810f8ec6>] SyS_finit_module+0xa6/0xd0
           [<ffffffff8166cf29>] system_call_fastpath+0x12/0x17
      
      Successfully tested by me.
      
      hpa said: There is another very valid use for this: many operators would
      rather a machine shuts down than being potentially compromised either
      functionally or security-wise.
      Signed-off-by: NPrarit Bhargava <prarit@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Acked-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Fabian Frederick <fabf@skynet.be>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9e3961a0
    • O
      exit: ptrace: shift "reap dead" code from exit_ptrace() to forget_original_parent() · 7c8bd232
      Oleg Nesterov 提交于
      Now that forget_original_parent() uses ->ptrace_entry for EXIT_DEAD tasks,
      we can simply pass "dead_children" list to exit_ptrace() and remove
      another release_task() loop.  Plus this way we do not need to drop and
      reacquire tasklist_lock.
      
      Also shift the list_empty(ptraced) check, if we want this optimization it
      makes sense to eliminate the function call altogether.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
      Cc: Sterling Alexander <stalexan@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roland McGrath <roland@hack.frob.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7c8bd232
    • O
      exit: reparent: cleanup the usage of reparent_leader() · 2831096e
      Oleg Nesterov 提交于
      1. Now that reparent_leader() doesn't abuse ->sibling we can shift
         list_move_tail() from reparent_leader() to forget_original_parent()
         and turn it into a single list_splice_tail_init(). This also makes
         BUG_ON(!list_empty()) and list_for_each_entry_safe() unnecessary.
      
      2. This also allows to shift the same_thread_group() check, it looks
         a bit more clear in the caller.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
      Cc: Sterling Alexander <stalexan@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roland McGrath <roland@hack.frob.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2831096e
    • O
      exit: reparent: cleanup the changing of ->parent · 57a05918
      Oleg Nesterov 提交于
      1. Cosmetic, but "if (t->parent == father)" looks a bit confusing.
         We need to change t->parent if and only if t is not traced.
      
      2. If we actually want this BUG_ON() to ensure that parent/ptrace
         match each other, then we should also take ptrace_reparented()
         case into account too.
      
      3. Change this code to use for_each_thread() instead of deprecated
         while_each_thread().
      
      [dan.carpenter@oracle.com: silence a bogus static checker warning]
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
      Cc: Sterling Alexander <stalexan@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roland McGrath <roland@hack.frob.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      57a05918
    • O
      exit: reparent: use ->ptrace_entry rather than ->sibling for EXIT_DEAD tasks · dc2fd4b0
      Oleg Nesterov 提交于
      reparent_leader() reuses ->sibling as a list node to add an EXIT_DEAD task
      into dead_children list we are going to release.  This obviously removes
      the dead task from its real_parent->children list and this is even good;
      the parent can do nothing with the EXIT_DEAD reparented zombie, it only
      makes do_wait() slower.
      
      But, this also means that it can not be reparented once again, so if its
      new parent dies too nobody will update ->parent/real_parent, they can
      point to the freed memory even before release_task() we are going to call,
      this breaks the code which relies on pid_alive() to access
      ->real_parent/parent.
      
      Fortunately this is mostly theoretical, this can only happen if init or
      PR_SET_CHILD_SUBREAPER process ignores SIGCHLD and the new parent
      sub-thread exits right after we drop tasklist_lock.
      
      Change this code to use ->ptrace_entry instead, we know that the child is
      not traced so nobody can ever use this member.  This also allows to unify
      this logic with exit_ptrace(), see the next changes.
      
      Note: we really need to change release_task() to nullify real_parent/
      parent/group_leader pointers, but we need to change the current users
      first somehow.  And it would be better to reap this zombie immediately but
      release_task_locked() we need is complicated by proc_flush_task().
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
      Cc: Sterling Alexander <stalexan@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roland McGrath <roland@hack.frob.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dc2fd4b0
    • O
      sched_show_task: fix unsafe usage of ->real_parent · a90e984c
      Oleg Nesterov 提交于
      rcu_read_lock() can not protect p->real_parent if release_task(p) was
      already called, change sched_show_task() to check pis_alive() like other
      users do.
      
      Note: we need some helpers to cleanup the code like this.  And it seems
      that that the usage of cpu_curr(cpu) in dump_cpu_task() is not safe too.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
      Cc: Sterling Alexander <stalexan@redhat.com>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Roland McGrath <roland@hack.frob.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a90e984c
    • J
      kernel: res_counter: remove the unused API · 5b1efc02
      Johannes Weiner 提交于
      All memory accounting and limiting has been switched over to the
      lockless page counters.  Bye, res_counter!
      
      [akpm@linux-foundation.org: update Documentation/cgroups/memory.txt]
      [mhocko@suse.cz: ditch the last remainings of res_counter]
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: NVladimir Davydov <vdavydov@parallels.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Paul Bolle <pebolle@tiscali.nl>
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5b1efc02
  2. 08 12月, 2014 2 次提交
  3. 06 12月, 2014 1 次提交
  4. 04 12月, 2014 4 次提交
  5. 03 12月, 2014 2 次提交
  6. 25 11月, 2014 1 次提交
  7. 24 11月, 2014 2 次提交
    • A
      uprobes, x86: Fix _TIF_UPROBE vs _TIF_NOTIFY_RESUME · 82975bc6
      Andy Lutomirski 提交于
      x86 call do_notify_resume on paranoid returns if TIF_UPROBE is set but
      not on non-paranoid returns.  I suspect that this is a mistake and that
      the code only works because int3 is paranoid.
      
      Setting _TIF_NOTIFY_RESUME in the uprobe code was probably a workaround
      for the x86 bug.  With that bug fixed, we can remove _TIF_NOTIFY_RESUME
      from the uprobes code.
      Reported-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Acked-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      82975bc6
    • T
      sched: Provide update_curr callbacks for stop/idle scheduling classes · 90e362f4
      Thomas Gleixner 提交于
      Chris bisected a NULL pointer deference in task_sched_runtime() to
      commit 6e998916 'sched/cputime: Fix clock_nanosleep()/clock_gettime()
      inconsistency'.
      
      Chris observed crashes in atop or other /proc walking programs when he
      started fork bombs on his machine.  He assumed that this is a new exit
      race, but that does not make any sense when looking at that commit.
      
      What's interesting is that, the commit provides update_curr callbacks
      for all scheduling classes except stop_task and idle_task.
      
      While nothing can ever hit that via the clock_nanosleep() and
      clock_gettime() interfaces, which have been the target of the commit in
      question, the author obviously forgot that there are other code paths
      which invoke task_sched_runtime()
      
      do_task_stat(()
       thread_group_cputime_adjusted()
         thread_group_cputime()
           task_cputime()
             task_sched_runtime()
              if (task_current(rq, p) && task_on_rq_queued(p)) {
                update_rq_clock(rq);
                up->sched_class->update_curr(rq);
              }
      
      If the stats are read for a stomp machine task, aka 'migration/N' and
      that task is current on its cpu, this will happily call the NULL pointer
      of stop_task->update_curr.  Ooops.
      
      Chris observation that this happens faster when he runs the fork bomb
      makes sense as the fork bomb will kick migration threads more often so
      the probability to hit the issue will increase.
      
      Add the missing update_curr callbacks to the scheduler classes stop_task
      and idle_task.  While idle tasks cannot be monitored via /proc we have
      other means to hit the idle case.
      
      Fixes: 6e998916 'sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency'
      Reported-by: NChris Mason <clm@fb.com>
      Reported-and-tested-by: NBorislav Petkov <bp@alien8.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      90e362f4
  8. 23 11月, 2014 1 次提交