1. 11 Dec, 2014 40 commits
    • L
      Merge tag 'ktest-v3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest · f74ea368
      Linus Torvalds committed
      Pull ktest changes from Steven Rostedt:
       "The following ktest updates were done:
      
         - Fix handling the make kernelrelease change
         - Fix make_min_config that was broken by new bisect_config changes
         - Allow tests to undefine default options (not just being able to
           override them)
         - Print name of test (if defined) to start of test output"
      
      * tag 'ktest-v3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
        ktest: Add back "tail -1" to kernelrelease make
        ktest: Add name to running title
        ktest: Allow tests to undefine default options
        ktest: Fix make_min_config to handle new assign_configs call
        ktest: Use make -s kernelrelease
      f74ea368
    • L
      Merge tag 'trace-seq-buf-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 350e4f49
      Linus Torvalds committed
      Pull nmi-safe seq_buf printk update from Steven Rostedt:
       "This code is a fork from the trace-3.19 pull as it needed the
        trace_seq clean ups from that branch.
      
        This code solves the issue of performing stack dumps from NMI context.
        The issue is that printk() is not safe from NMI context as if the NMI
        were to trigger when a printk() was being performed, the NMI could
        deadlock from the printk() internal locks.  This has been seen in
        practice.
      
        With lots of review from Petr Mladek, this code went through several
        iterations, and we feel that it is now at a point of quality to be
        accepted into mainline.
      
        Here's what is contained in this patch set:
      
         - Creates a "seq_buf" generic buffer utility that allows a descriptor
           to be passed around where functions can write their own "printk()"
           formatted strings into it.  The generic version was pulled out of
           the trace_seq() code that was made specifically for tracing.
      
         - The seq_buf code was changed to model the seq_file code.  I have a
           patch (not included for 3.19) that converts the seq_file.c code
           over to use seq_buf.c like the trace_seq.c code does.  This was
           done to make sure that seq_buf.c is compatible with seq_file.c.  I
           may try to get that patch in for 3.20.
      
         - The seq_buf.c file was moved to lib/ to remove it from being
           dependent on CONFIG_TRACING.
      
         - The printk() was updated to allow for a per_cpu "override" of the
           internal calls.  That is, instead of writing to the console, a call
           to printk() may do something else.  This made it easier to allow
           the NMI to change what printk() does in order to call dump_stack()
           without needing to update that code as well.
      
         - Finally, the dump_stack from all CPUs via NMI code was converted to
           use the seq_buf code.  The caller to trigger the NMI code would
           wait till all the NMIs finished, and then it would print the
           seq_buf data to the console safely from a non NMI context
      
        One added bonus is that this code also makes the NMI dump stack work
        on PREEMPT_RT kernels.  As printk() includes sleeping locks on
        PREEMPT_RT, printk() only writes to console if the console does not
        use any rt_mutex converted spin locks.  Which a lot do"
      
      * tag 'trace-seq-buf-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        x86/nmi: Fix use of unallocated cpumask_var_t
        printk/percpu: Define printk_func when printk is not defined
        x86/nmi: Perform a safe NMI stack trace on all CPUs
        printk: Add per_cpu printk func to allow printk to be diverted
        seq_buf: Move the seq_buf code to lib/
        seq-buf: Make seq_buf_bprintf() conditional on CONFIG_BINARY_PRINTF
        tracing: Add seq_buf_get_buf() and seq_buf_commit() helper functions
        tracing: Have seq_buf use full buffer
        seq_buf: Add seq_buf_can_fit() helper function
        tracing: Add paranoid size check in trace_printk_seq()
        tracing: Use trace_seq_used() and seq_buf_used() instead of len
        tracing: Clean up tracing_fill_pipe_page()
        seq_buf: Create seq_buf_used() to find out how much was written
        tracing: Add a seq_buf_clear() helper and clear len and readpos in init
        tracing: Convert seq_buf fields to be like seq_file fields
        tracing: Convert seq_buf_path() to be like seq_path()
        tracing: Create seq_buf layer in trace_seq
      350e4f49
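The buffering idea in the pull message above (formatted strings go into a descriptor first, then get printed later from a safe context) can be illustrated with a minimal userspace sketch. The `sbuf_*` names and the overflow convention below are assumptions made for illustration; this is not the kernel's seq_buf API.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace analog of a sequence buffer descriptor. */
struct sbuf {
    char  *data;   /* caller-provided storage */
    size_t size;   /* capacity of data, assumed to be at least 1 */
    size_t len;    /* bytes written so far; size + 1 marks overflow */
};

static void sbuf_init(struct sbuf *s, char *data, size_t size)
{
    memset(data, 0, size);
    s->data = data;
    s->size = size;
    s->len = 0;
}

/* Callers check this once at the end instead of testing every write. */
static int sbuf_has_overflowed(const struct sbuf *s)
{
    return s->len > s->size;
}

/* Append printf-style formatted text; a write that does not fit marks overflow. */
static void sbuf_printf(struct sbuf *s, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (sbuf_has_overflowed(s))
        return;

    va_start(ap, fmt);
    n = vsnprintf(s->data + s->len, s->size - s->len, fmt, ap);
    va_end(ap);

    if (n < 0 || (size_t)n >= s->size - s->len)
        s->len = s->size + 1;   /* output was truncated */
    else
        s->len += (size_t)n;
}

/* Deferred output: emit whatever was collected once it is safe to do so. */
static void sbuf_flush(const struct sbuf *s, FILE *out)
{
    fputs(s->data, out);
}
```

In this sketch a writer that must not touch the console calls only `sbuf_printf()`, and a later caller in an ordinary context calls `sbuf_flush()`.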
    • L
      Merge tag 'ftracetest-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · c3280952
      Linus Torvalds committed
      Pull ftrace self-test updates from Steven Rostedt:
       "Updates for the ftrace self tests:
      
         - Added kprobes on ftrace testcase
         - Sort test cases
         - Add file to hold helper functions
         - Use logfile name supported by busybox's mktemp
         - Clear trace buffer after running kprobe test
         - Fix show descriptions when run on dash shell
         - Add --verbose option for showing echo output"
      
      * tag 'ftracetest-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        ftracetest: Add --verbose option for showing echo output
        ftracetest: Fix to show descriptions on dash
        ftracetest: Add basic event tracing test cases
        ftracetest: Clear trace buffer after running kprobe testcases
        ftracetest: Use logfile name supported by busybox's mktemp
        ftracetest: Add a couple of ftrace test cases
        ftracetest: Add functions file that holds helper functions
        ftracetest: Sort testcases
        ftracetest: Add kprobes on ftrace testcase
      c3280952
    • L
      Merge tag 'trace-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 1dd7dcb6
      Linus Torvalds committed
      Pull tracing updates from Steven Rostedt:
       "There was a lot of clean ups and minor fixes.  One of those clean ups
        was to the trace_seq code.  It also removed the return values to the
        trace_seq_*() functions and use trace_seq_has_overflowed() to see if
        the buffer filled up or not.  This is similar to work being done to
        the seq_file code as well in another tree.
      
        Some of the other goodies include:
      
         - Added some "!" (NOT) logic to the tracing filter.
      
         - Fixed the frame pointer logic to the x86_64 mcount trampolines
      
         - Added the logic for dynamic trampolines on !CONFIG_PREEMPT systems.
           That is, the ftrace trampoline can be dynamically allocated and be
           called directly by functions that only have a single hook to them"
      
      * tag 'trace-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (55 commits)
        tracing: Truncated output is better than nothing
        tracing: Add additional marks to signal very large time deltas
        Documentation: describe trace_buf_size parameter more accurately
        tracing: Allow NOT to filter AND and OR clauses
        tracing: Add NOT to filtering logic
        ftrace/fgraph/x86: Have prepare_ftrace_return() take ip as first parameter
        ftrace/x86: Get rid of ftrace_caller_setup
        ftrace/x86: Have save_mcount_regs macro also save stack frames if needed
        ftrace/x86: Add macro MCOUNT_REG_SIZE for amount of stack used to save mcount regs
        ftrace/x86: Simplify save_mcount_regs on getting RIP
        ftrace/x86: Have save_mcount_regs store RIP in %rdi for first parameter
        ftrace/x86: Rename MCOUNT_SAVE_FRAME and add more detailed comments
        ftrace/x86: Move MCOUNT_SAVE_FRAME out of header file
        ftrace/x86: Have static tracing also use ftrace_caller_setup
        ftrace/x86: Have static function tracing always test for function graph
        kprobes: Add IPMODIFY flag to kprobe_ftrace_ops
        ftrace, kprobes: Support IPMODIFY flag to find IP modify conflict
        kprobes/ftrace: Recover original IP if pre_handler doesn't change it
        tracing/trivial: Fix typos and make an int into a bool
        tracing: Deletion of an unnecessary check before iput()
        ...
      1dd7dcb6
    • L
      Merge branch 'akpm' (patchbomb from Andrew) · b6da0076
      Linus Torvalds committed
      Merge first patchbomb from Andrew Morton:
       - a few minor cifs fixes
       - dma-debug updates
       - ocfs2
       - slab
       - about half of MM
       - procfs
       - kernel/exit.c
       - panic.c tweaks
       - printk updates
       - lib/ updates
       - checkpatch updates
       - fs/binfmt updates
       - the drivers/rtc tree
       - nilfs
       - kmod fixes
       - more kernel/exit.c
       - various other misc tweaks and fixes
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits)
        exit: pidns: fix/update the comments in zap_pid_ns_processes()
        exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting
        exit: exit_notify: re-use "dead" list to autoreap current
        exit: reparent: call forget_original_parent() under tasklist_lock
        exit: reparent: avoid find_new_reaper() if no children
        exit: reparent: introduce find_alive_thread()
        exit: reparent: introduce find_child_reaper()
        exit: reparent: document the ->has_child_subreaper checks
        exit: reparent: s/while_each_thread/for_each_thread/ in find_new_reaper()
        exit: reparent: fix the cross-namespace PR_SET_CHILD_SUBREAPER reparenting
        exit: reparent: fix the dead-parent PR_SET_CHILD_SUBREAPER reparenting
        exit: proc: don't try to flush /proc/tgid/task/tgid
        exit: release_task: fix the comment about group leader accounting
        exit: wait: drop tasklist_lock before psig->c* accounting
        exit: wait: don't use zombie->real_parent
        exit: wait: cleanup the ptrace_reparented() checks
        usermodehelper: kill the kmod_thread_locker logic
        usermodehelper: don't use CLONE_VFORK for ____call_usermodehelper()
        fs/hfs/catalog.c: fix comparison bug in hfs_cat_keycmp
        nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races
        ...
      b6da0076
    • O
      exit: pidns: fix/update the comments in zap_pid_ns_processes() · a53b8315
      Oleg Nesterov committed
      The comments in zap_pid_ns_processes() are not clear, we need to explain
      how this code actually works.
      
      1. "Ignore SIGCHLD" looks like optimization but it is not, we also
         need this for correctness.
      
      2. The comment above sys_wait4() could tell more.
      
         EXIT_ZOMBIE child is only possible if it has exited before we
         ignored SIGCHLD. Or if it is traced from the parent namespace,
         but in this case it will be reaped by debugger after detach,
         sys_wait4() acts as a synchronization point.
      
      3. The comment about TASK_DEAD (EXIT_DEAD in fact) children is
         outdated. Contrary to what it says we do not need to make sure
         they all go away after 0a01f2cc "pidns: Make the pidns proc
         mount/umount logic obvious".
      
         At the same time, we do need to wait for nr_hashed==init_pids,
         but the reasons are quite different and not obvious: setns().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a53b8315
    • O
      exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting · 24c037eb
      Oleg Nesterov committed
      alloc_pid() does get_pid_ns() beforehand but forgets to put_pid_ns() if it
      fails because disable_pid_allocation() was called by the exiting
      child_reaper.
      
      We could simply move get_pid_ns() down to successful return, but this fix
      tries to be as trivial as possible.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      24c037eb
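The leak described in the commit above follows a common pattern: a reference is taken up front and an early failure return forgets to drop it. The sketch below shows that pattern in plain userspace C. The `struct ns` type and the `ns_get()`/`ns_put()`/`alloc_*()` helpers are hypothetical stand-ins, not the kernel's `alloc_pid()` code.

```c
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted namespace object. */
struct ns {
    int refcount;
    int allocation_disabled;   /* set once the reaper is exiting */
};

static void ns_get(struct ns *ns) { ns->refcount++; }
static void ns_put(struct ns *ns) { ns->refcount--; }

/* Leaky variant: takes the reference first, forgets it on failure. */
static int alloc_leaky(struct ns *ns)
{
    ns_get(ns);
    if (ns->allocation_disabled)
        return -1;             /* the reference is never dropped */
    return 0;
}

/* Fixed variant: the failure path drops the reference it took. */
static int alloc_fixed(struct ns *ns)
{
    ns_get(ns);
    if (ns->allocation_disabled) {
        ns_put(ns);
        return -1;
    }
    return 0;
}
```

The alternative mentioned in the message, taking the reference only on the successful return path, would avoid the extra put but changes more lines.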
    • O
      exit: exit_notify: re-use "dead" list to autoreap current · 6c66e7db
      Oleg Nesterov committed
      After the previous change we can add just the exiting EXIT_DEAD task to
      the "dead" list and remove another release_task(tsk).
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6c66e7db
    • O
      exit: reparent: call forget_original_parent() under tasklist_lock · 482a3767
      Oleg Nesterov committed
      Shift "release dead children" loop from forget_original_parent() to its
      caller, exit_notify().  It is safe to reap them even if our parent reaps
      us right after we drop tasklist_lock, those children no longer have any
      connection to the exiting task.
      
      And this allows us to avoid write_lock_irq(tasklist_lock) right after it
      was released by forget_original_parent(), we can simply call it with
      tasklist_lock held.
      
      While at it, move the comment about forget_original_parent() up to
      this function.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      482a3767
    • O
      exit: reparent: avoid find_new_reaper() if no children · ad9e206a
      Oleg Nesterov committed
      Now that pid_ns logic was isolated we can change forget_original_parent()
      to return right after find_child_reaper() when father->children is empty,
      there is nothing to reparent in this case.
      
      In particular this avoids find_alive_thread() and this can help if the
      whole process exits and it has a lot of PF_EXITING threads at the start of
      the thread list, this can easily lead to O(nr_threads ** 2) iterations.
      
      Trivial test case (tested under KVM, 2 CPUs):
      
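          #include <assert.h>
          #include <pthread.h>
          #include <signal.h>
          #include <stdlib.h>
          #include <sys/wait.h>
          #include <unistd.h>

          static void *tfunc(void *arg)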
          {
              pause();
              return NULL;
          }
      
          static int child(unsigned int nt)
          {
              pthread_t pt;
      
              while (nt--)
                  assert(pthread_create(&pt, NULL, tfunc, NULL) == 0);
      
              pthread_kill(pt, SIGTRAP);
              pause();
              return 0;
          }
      
          int main(int argc, const char *argv[])
          {
              int stat;
              unsigned int nf = atoi(argv[1]);
              unsigned int nt = atoi(argv[2]);
      
              while (nf--) {
                  if (!fork())
                      return child(nt);
      
                  wait(&stat);
                  assert(stat == SIGTRAP);
              }
      
              return 0;
          }
      
      $ time ./test 16 16536 shows:
      
                    real        user         sys
          -    5m37.628s    0m4.437s    8m5.560s
          +    0m50.032s    0m7.130s    1m4.927s
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ad9e206a
    • O
      exit: reparent: introduce find_alive_thread() · c9dc05bf
      Oleg Nesterov committed
      Add the new simple helper to factor out the for_each_thread() code in
      find_child_reaper() and find_new_reaper().  It can also simplify the
      potential PF_EXITING -> exit_state change, plus perhaps we can change this
      code to take SIGNAL_GROUP_EXIT into account.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Lennart Poettering <lennart@poettering.net>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c9dc05bf
    • O
      exit: reparent: introduce find_child_reaper() · 1109909c
      Oleg Nesterov committed
      find_new_reaper() does 2 completely different things.  Not only does it find a
      reaper, it also updates pid_ns->child_reaper or kills the whole namespace
      if the caller is ->child_reaper.
      
      Now that has_child_subreaper logic doesn't depend on child_reaper check we
      can move that pid_ns code into a separate helper.  IMHO this makes the
      code more clean, and this allows the next changes.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Lennart Poettering <lennart@poettering.net>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1109909c
    • O
      exit: reparent: document the ->has_child_subreaper checks · 175aed3f
      Oleg Nesterov committed
      Swap the "init_task" and same_thread_group() checks.  This way it is more
      simple to document these checks and we can remove the link to the previous
      discussion on lkml.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Lennart Poettering <lennart@poettering.net>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      175aed3f
    • O
      exit: reparent: s/while_each_thread/for_each_thread/ in find_new_reaper() · 3750ef97
      Oleg Nesterov committed
      Change find_new_reaper() to use for_each_thread() instead of deprecated
      while_each_thread().  We do not bother to check "thread != father" in the
      1st loop, we can rely on PF_EXITING check.
      
      Note: this means the minor behavioural change: for_each_thread() starts
      from the group leader.  But this should be fine, nobody should make any
      assumption about do_wait(__WNOTHREAD) when it comes to reparented tasks.
      And this can avoid the pointless reparenting to a short-living thread,
      while zombie leaders are not that common.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Lennart Poettering <lennart@poettering.net>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3750ef97
    • O
      exit: reparent: fix the cross-namespace PR_SET_CHILD_SUBREAPER reparenting · 7d24e2df
      Oleg Nesterov committed
      find_new_reaper() assumes that "has_child_subreaper" logic is safe as
      long as we are not the exiting ->child_reaper and this is doubly wrong:
      
      1. In fact it is safe if "pid_ns->child_reaper == father"; there must
         be no children after zap_pid_ns_processes() returns, so it doesn't
         matter what we return in this case and even pid_ns->child_reaper is
         wrong otherwise: we can't reparent to ->child_reaper == current.
      
         This is not a bug, but this is confusing.
      
      2. It is not safe if we are not pid_ns->child_reaper but from the same
         thread group. We drop tasklist_lock before zap_pid_ns_processes(),
         so another thread can lock it and choose the new reaper from the
         upper namespace if has_child_subreaper == T, and this is obviously
         wrong.
      
         This is not that bad, zap_pid_ns_processes() won't return until the
         new reaper reaps all zombies, but this should be fixed anyway.
      
      We could change for_each_thread() loop to use ->exit_state instead of
      PF_EXITING which we had to use until 8aac6270, or we could change
      copy_signal() to check CLONE_NEWPID before setting has_child_subreaper,
      but let's change this code so that it is clear we can't look outside of
      our namespace, otherwise same_thread_group(reaper, child_reaper) check
      will look wrong and confusing anyway.
      
      We can simply start from "father" and fix the problem. We can't wrongly
      return a thread from the same thread group if ->is_child_subreaper == T,
      we know that all threads have PF_EXITING set.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Lennart Poettering <lennart@poettering.net>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7d24e2df
    • O
      exit: reparent: fix the dead-parent PR_SET_CHILD_SUBREAPER reparenting · 8a1296ae
      Oleg Nesterov committed
      The ->has_child_subreaper code in find_new_reaper() finds alive "thread"
      but returns another "reaper" thread which can be dead.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Lennart Poettering <lennart@poettering.net>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8a1296ae
    • O
      exit: proc: don't try to flush /proc/tgid/task/tgid · c35a7f18
      Oleg Nesterov committed
      proc_flush_task_mnt() always tries to flush task/pid, but this is
      pointless if we reap the leader. d_invalidate() is recursive, and
      if nothing else the next d_hash_and_lookup(tgid) should fail anyway.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c35a7f18
    • O
      exit: release_task: fix the comment about group leader accounting · 26e75b5c
      Oleg Nesterov committed
      Contrary to what the comment in __exit_signal() says we do account the
      group leader. Fix this and explain why.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      26e75b5c
    • O
      exit: wait: drop tasklist_lock before psig->c* accounting · 986094df
      Oleg Nesterov committed
      wait_task_zombie() no longer needs tasklist_lock to accumulate the
      psig->c* counters, we can drop it right after cmpxchg(exit_state).
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      986094df
    • O
      exit: wait: don't use zombie->real_parent · f953ccd0
      Oleg Nesterov committed
      1. wait_task_zombie() uses p->real_parent to get psig/siglock. This is
         correct but needs tasklist_lock, ->real_parent can exit.
      
         We can use "current" instead. This is our natural child, its parent
         must be our sub-thread.
      
      2. Read psig/sig outside of ->siglock, ->signal is no longer protected
         by this lock.
      
      3. Fix the outdated comments about tasklist_lock. We can not race with
         __exit_signal(), the whole thread group is dead, nobody but us can
         call it.
      
         Also clarify the usage of ->stats_lock and ->siglock.
      
      Note: thread_group_cputime_adjusted() is sub-optimal in this case, we
      probably want to export cputime_adjust() to avoid thread_group_cputime().
      The comment says "all threads" but there are no other threads.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f953ccd0
    • O
      exit: wait: cleanup the ptrace_reparented() checks · f6507f83
      Oleg Nesterov committed
      Now that EXIT_DEAD is the terminal state we can kill "int traced"
      variable and check "state == EXIT_DEAD" instead to cleanup the code.  In
      particular, this way it is clear that the check obviously doesn't need
      tasklist_lock.
      
      Also fix the type of "unsigned long state", "long" was always wrong
      although this doesn't matter because cmpxchg/xchg uses typeof(*ptr).
      
      [akpm@linux-foundation.org: don't make me google the C Operator Precedence table]
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f6507f83
    • O
      usermodehelper: kill the kmod_thread_locker logic · 7f6def9f
      Oleg Nesterov committed
      Now that we do not call kernel_thread(CLONE_VFORK) from the worker
      thread we can not deadlock if do_execve() in turn triggers another
      call_usermodehelper(), we can remove the kmod_thread_locker code.
      
      Note: we should probably kill khelper_wq and simply use one of the
      global workqueues, say, system_unbound_wq, this special wq for umh buys
      nothing nowadays.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7f6def9f
    • O
      usermodehelper: don't use CLONE_VFORK for ____call_usermodehelper() · 7117bc88
      Oleg Nesterov committed
      After "kernel/kmod: fix use-after-free of the sub_info structure"
      CLONE_VFORK in __call_usermodehelper() buys nothing, we rely on
      umh_complete() in ____call_usermodehelper() anyway.
      
      Remove it.  This also eliminates the unnecessary sleep/wakeup in the
      likely case, and this allows the next change.
      
      While at it, kill the "int wait" locals in ____call_usermodehelper() and
      __call_usermodehelper(), they can safely use sub_info->wait.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7117bc88
    • R
      fs/hfs/catalog.c: fix comparison bug in hfs_cat_keycmp · ddbc22e2
      Rasmus Villemoes committed
      Relying on the sign (after casting to int) of the difference of two
      quantities for comparison is usually wrong.  For example, should a-b
      turn out to be 2^31, the return value of cmp(a,b) is -2^31; but that
      would also be the return value from cmp(b, a).  So a compares less than
      b and b compares less than a.  One can also easily find three values
      a,b,c such that a compares less than b, b compares less than c, but a
      does not compare less than c.
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ddbc22e2
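The comparison pitfall described in the commit above can be reproduced with two small functions. This is an illustrative sketch with hypothetical names, not the code from fs/hfs/catalog.c. It assumes 32-bit `int` and the usual wrap-around result when an out-of-range unsigned value is converted to `int` (implementation-defined in C, but what mainstream compilers do).

```c
#include <stdint.h>

/* Fragile pattern: the sign of a truncated difference decides the order. */
static int cmp_by_subtraction(uint32_t a, uint32_t b)
{
    return (int)(a - b);
}

/* Safer pattern: compare directly and return -1, 0, or 1. */
static int cmp_by_comparison(uint32_t a, uint32_t b)
{
    return (a > b) - (a < b);
}

/*
 * Returns 1 when cmp claims both "a < b" and "b < a",
 * which no valid ordering allows.
 */
static int is_contradictory(int (*cmp)(uint32_t, uint32_t),
                            uint32_t a, uint32_t b)
{
    return cmp(a, b) < 0 && cmp(b, a) < 0;
}
```

With a difference of exactly 2^31, for example `a = 0x80000000` and `b = 0`, the subtraction variant reports a negative result in both directions, while the direct comparison stays consistent.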
    • R
      nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races · 705304a8
      Ryusuke Konishi committed
      Same story as in commit 41080b5a ("nfsd race fixes: ext2") (similar
      ext2 fix) except that nilfs2 needs to use insert_inode_locked4() instead
      of insert_inode_locked() and a bug of a check for dead inodes needs to
      be fixed.
      
      If nilfs_iget() is called from nfsd after nilfs_new_inode() calls
      insert_inode_locked4(), nilfs_iget() will wait for unlock_new_inode() at
      the end of nilfs_mkdir()/nilfs_create()/etc to unlock the inode.
      
      If nilfs_iget() is called before nilfs_new_inode() calls
      insert_inode_locked4(), it will create an in-core inode and read its
      data from the on-disk inode.  But, nilfs_iget() will find i_nlink equals
      zero and fail at nilfs_read_inode_common(), which will lead it to call
      iget_failed() and cleanly fail.
      
      However, this sanity check doesn't work as expected for reused on-disk
      inodes because they leave a non-zero value in i_mode field and it
      hinders the test of i_nlink.  This patch also fixes the issue by
      removing the test on i_mode that nilfs2 doesn't need.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      705304a8
    • M
      nilfs2: deletion of an unnecessary check before the function call "iput" · 72b9918e
      Markus Elfring committed
      The iput() function tests whether its argument is NULL and then returns
      immediately.  Thus the test around the call is not needed.
      
      This issue was detected by using the Coccinelle software.
      Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      72b9918e
    • A
      nilfs2: avoid duplicate segment construction for fsync() · 75dc857c
      Andreas Rohner committed
      This patch removes filemap_write_and_wait_range() from nilfs_sync_file(),
      because it triggers a data segment construction by calling
      nilfs_writepages() with WB_SYNC_ALL.  A data segment construction does not
      remove the inode from the i_dirty list and it does not clear the
      NILFS_I_DIRTY flag.  Therefore nilfs_inode_dirty() still returns true,
      which leads to an unnecessary duplicate segment construction in
      nilfs_sync_file().
      
      A call to filemap_write_and_wait_range() is not needed, because NILFS2
      does not rely on the generic writeback mechanisms.  Instead it implements
      its own mechanism to collect all dirty pages and write them into segments.
      It is more efficient to initiate the segment construction directly in
      nilfs_sync_file() without the detour over filemap_write_and_wait_range().
      
      Additionally the lock of i_mutex is not needed, because all code blocks
      that are protected by i_mutex are also protected by a NILFS transaction:
      
        Function                i_mutex     nilfs_transaction
        ------------------------------------------------------
        nilfs_ioctl_setflags:   yes         yes
        nilfs_fiemap:           yes         no
        nilfs_write_begin:      yes         yes
        nilfs_write_end:        yes         yes
        nilfs_lookup:           yes         no
        nilfs_create:           yes         yes
        nilfs_link:             yes         yes
        nilfs_mknod:            yes         yes
        nilfs_symlink:          yes         yes
        nilfs_mkdir:            yes         yes
        nilfs_unlink:           yes         yes
        nilfs_rmdir:            yes         yes
        nilfs_rename:           yes         yes
        nilfs_setattr:          yes         yes
      
      For nilfs_lookup() i_mutex is held for the parent directory, to protect it
      from modification.  The segment construction does not modify directory
      inodes, so no lock is needed.
      
      nilfs_fiemap() reads the block layout on the disk, by using
      nilfs_bmap_lookup_contig(). This is already protected by bmap->b_sem.
      Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      75dc857c
    • X
      rtc: refine rtc_timer_do_work() to consider other set alarm failures · 6528b889
      Xunlei Pang committed
      rtc_timer_do_work() only handles the -ETIME failure of __rtc_set_alarm(), but
      doesn't handle other failures like -EIO, -EBUSY, etc.
      
      If there is a failure other than -ETIME, the next rtc_timer will stay in
      the timerqueue.  Then later rtc_timers will be enqueued directly because
      they have a later expires time, so the alarm irq will never be programmed.
      
      When such a failure happens, this patch retries __rtc_set_alarm().  If
      the alarm time still can't be programmed, it removes the current rtc_timer
      from the timerqueue and fetches the next one, thus preventing the failure
      from affecting other rtc timers.
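      The retry-then-skip idea can be modelled in user space roughly as
      follows.  This is a sketch, not the code of the patch: the callback
      type, the retry budget MAX_RETRIES, the array-based queue and the demo
      stub are all assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_RETRIES 3 /* assumed retry budget, not taken from the patch */

/* Hypothetical callback type: returns 0, -ETIME, or another negative errno. */
typedef int (*set_alarm_fn)(long expires);

/*
 * Walk a queue sorted by expiry time and program the first timer the
 * hardware accepts. Returns the index of that timer, or -1 if none is left.
 */
static int program_next_alarm(const long *expires, size_t n, set_alarm_fn set_alarm)
{
	for (size_t i = 0; i < n; i++) {
		int retries = MAX_RETRIES;
		int err;

		/* Retry failures other than -ETIME until the budget runs out. */
		do {
			err = set_alarm(expires[i]);
		} while (err && err != -ETIME && retries-- > 0);

		if (!err)
			return (int)i;
		/* -ETIME or a persistent failure: skip this timer, try the next. */
	}
	return -1;
}

/* Demo stub: the hardware rejects the alarm at t=100 with -EIO, accepts others. */
static int demo_set_alarm(long expires)
{
	return expires == 100 ? -EIO : 0;
}
```

      With a queue of {100, 200, 300} and the demo stub, the failing first
      entry no longer blocks the queue and the entry at index 1 is programmed.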
      Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Arnd Bergmann <arnd.bergmann@linaro.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6528b889
    • X
      rtc/ab8500: set uie_unsupported flag · c594d678
      Xunlei Pang committed
      Currently, ab8500 doesn't set uie_unsupported of rtc_device, although it
      doesn't support UIE, see ab8500_rtc_set_alarm().
      
      Thus, when going through rtc_update_irq_enable()->rtc_timer_enqueue(),
      there's a chance that an alarm timer1 which is about to fire has been
      queued before, so this update timer2 will simply be queued because it
      isn't the leftmost one, which means rtc_timer_enqueue() will return 0.
      
      This will result in two problems:
      1) UIE EMUL will not be used.
      2) When the alarm timer1 is fired, in rtc_timer_do_work() timer2 will
         fail to set the alarm time, so this rtc will become dysfunctional due
         to timer2 having the earliest expiry in the timerqueue.
      
      So, rtc drivers must set this flag if they don't support UIE.
      Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Arnd Bergmann <arnd.bergmann@linaro.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c594d678
    • S
      drivers/rtc/rtc-snvs: fix suspend/resume · 7654e9d4
      Sanchayan Maity committed
      The alarm interrupt handler also reads registers which are part of SNVS
      and need clocks enabled.  However, the resume function is called after
      IRQs have been enabled, hence this leads to an abort:
      
          Unhandled fault: external abort on non-linefetch (0x1008) at 0x908c604c
          Internal error: : 1008 [#1] ARM
          Modules linked in:
          CPU: 0 PID: 421 Comm: sh Not tainted 3.18.0-rc5-00135-g0689c67-dirty #1592
          task: 8e03e800 ti: 8cad8000 task.ti: 8cad8000
          PC is at snvs_rtc_irq_handler+0x14/0x74
          LR is at handle_irq_event_percpu+0x3c/0x144
      
      Fix this by using the .{suspend/resume}_noirq callbacks instead of
      .{suspend/resume}.
      Signed-off-by: Sanchayan Maity <maitysanchayan@gmail.com>
      Cc: Shawn Guo <shawn.guo@linaro.org>
      Cc: Stefan Agner <stefan@agner.ch>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7654e9d4
    • S
      drivers/rtc/rtc-snvs: add clock support · 7f899399
      Sanchayan Maity committed
      Add clock enable and disable support for the SNVS peripheral, which is
      required for using the RTC within the SNVS block.
      
      The clock is not strictly enforced, as this would break the i.MX devices.
      The clocking for the i.MX devices seems to be enabled elsewhere, and
      enabling RTC SNVS for Vybrid results in a crash.  This patch adds the
      clock support but also makes it optional, so the Vybrid platform can use
      the clock if defined while making sure not to break i.MX.
      Signed-off-by: Sanchayan Maity <maitysanchayan@gmail.com>
      Cc: Shawn Guo <shawn.guo@linaro.org>
      Acked-by: Stefan Agner <stefan@agner.ch>
      Acked-by: Alessandro Zummo <a.zummo@towertech.it>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7f899399
    • J
      rtc: omap: drop vendor-prefix from power-controller dt property · 094d3ee3
      Johan Hovold committed
      Drop the vendor-prefix from the "ti,system-power-controller" device-tree
      property name.
      
      It has been agreed to make "system-power-controller" a standard property
      and to drop the vendor-prefix that is currently used by several drivers.
      
      Note that drivers that have used "<vendor>,system-power-controller" in a
      released kernel will need to support both versions.
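      Supporting both spellings might look like the user-space sketch below.
      The node_model struct and both helpers are hypothetical stand-ins; an
      in-kernel driver would more likely query the node with
      of_property_read_bool(), which is not modelled here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a device-tree node: a list of boolean property names. */
struct node_model {
	const char *const *props;
	size_t nprops;
};

/* Return true if the node carries a property with the given name. */
static bool has_prop(const struct node_model *np, const char *name)
{
	for (size_t i = 0; i < np->nprops; i++)
		if (strcmp(np->props[i], name) == 0)
			return true;
	return false;
}

/* Accept the standard name first, then fall back to the legacy prefixed one. */
static bool is_system_power_controller(const struct node_model *np)
{
	return has_prop(np, "system-power-controller") ||
	       has_prop(np, "ti,system-power-controller");
}
```

      Checking the unprefixed name first keeps new device trees on the
      standard property while old ones continue to work.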
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Cc: Tony Lindgren <tony@atomide.com>
      Cc: Benoît Cousson <bcousson@baylibre.com>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Cc: Felipe Balbi <balbi@ti.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      094d3ee3
    • A
      drivers/rtc/rtc-isl12057.c: report error code upon failure in dev_err() calls · cf67d0b6
      Arnaud Ebalard committed
      As pointed out by Mark, it is generally useful to log the error code when
      reporting a failure.  This patch improves existing calls to dev_err() in
      the ISL12057 driver to also report the error code.
      Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
      Suggested-by: Mark Brown <broonie@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Cc: Peter Huewe <peter.huewe@infineon.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Thierry Reding <treding@nvidia.com>
      Cc: Grant Likely <grant.likely@linaro.org>
      Acked-by: Uwe Kleine-König <uwe@kleine-koenig.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cf67d0b6
    • A
      drivers/rtc/rtc-isl12057.c: add proper handling of oscillator failure bit · 10df1e67
      Arnaud Ebalard committed
      As suggested by Uwe, instead of clearing the oscillator failure bit
      unconditionally at driver load, this patch adds proper handling of the
      flag.  The driver now returns -ENODATA when reading time from the device
      while the oscillator failure bit is set.  The flag is now cleared only
      when a new time value is pushed to the device.
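      The resulting behaviour can be modelled in user space as below.  The
      rtc_model struct and its two functions are hypothetical and do not
      talk to any hardware; they only reproduce the read/set semantics
      described above.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device model; osf mirrors the oscillator failure flag. */
struct rtc_model {
	bool osf;
	long seconds;
};

/* Reading refuses to return a time that may be invalid. */
static int rtc_model_read_time(const struct rtc_model *rtc, long *out)
{
	if (rtc->osf)
		return -ENODATA;
	*out = rtc->seconds;
	return 0;
}

/* Writing a fresh time is the only place where the flag is cleared. */
static int rtc_model_set_time(struct rtc_model *rtc, long seconds)
{
	rtc->seconds = seconds;
	rtc->osf = false;
	return 0;
}
```

      A caller that sees -ENODATA therefore knows the stored time cannot be
      trusted until a new time has been set.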
      Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
      Reported-by: Uwe Kleine-König <uwe@kleine-koenig.org>
      Acked-by: Uwe Kleine-König <uwe@kleine-koenig.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Cc: Peter Huewe <peter.huewe@infineon.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Thierry Reding <treding@nvidia.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      10df1e67
    • A
      drivers/rtc/rtc-isl12057.c: add support for century bit · b5f4184d
      Arnaud Ebalard committed
      The month register of ISL12057 RTC chip includes a century bit which
      reports overflow of year register from 99 to 0.  This bit can also be
      written, which allows using it to extend the time interval the chip can
      support from 99 to 199 years.
      
      This patch adds support for century overflow bit in tm to regs and regs to
      tm helpers in ISL12057 driver.
      
      This was tested by putting a device 100 years in the future (using a
      specific kernel due to the inability of userland tools such as date or
      hwclock to pass year 2038), rebooting on a kernel w/ this patch applied
      and verifying the device was still 100 years in the future.
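      A round trip through such a century bit could be sketched as follows.
      The bit position MO_CEN, the base year 2000 and all function names are
      assumptions made for illustration; they are not copied from the driver
      or the datasheet.

```c
#include <stdint.h>

#define MO_CEN 0x80 /* assumed position of the century bit in the month register */

/* Local BCD helpers so the example is self-contained. */
static uint8_t bin2bcd(unsigned v)
{
	return (uint8_t)(((v / 10) << 4) | (v % 10));
}

static unsigned bcd2bin(uint8_t v)
{
	return (v >> 4) * 10u + (v & 0x0f);
}

/* Encode month (0-11) and years since 2000 (0-199) into two registers. */
static void date_to_regs(unsigned mon, unsigned years_since_2000,
			 uint8_t *mo_reg, uint8_t *yr_reg)
{
	*mo_reg = bin2bcd(mon + 1);
	if (years_since_2000 >= 100)
		*mo_reg |= MO_CEN;
	*yr_reg = bin2bcd(years_since_2000 % 100);
}

/* Decode the two registers back; the century bit adds 100 years. */
static void regs_to_date(uint8_t mo_reg, uint8_t yr_reg,
			 unsigned *mon, unsigned *years_since_2000)
{
	*mon = bcd2bin(mo_reg & 0x1f) - 1;
	*years_since_2000 = bcd2bin(yr_reg) + ((mo_reg & MO_CEN) ? 100u : 0u);
}
```

      Masking the month register with 0x1f before the BCD conversion keeps
      the century bit from leaking into the decoded month.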
      Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
      Suggested-by: Uwe Kleine-König <uwe@kleine-koenig.org>
      Acked-by: Uwe Kleine-König <uwe@kleine-koenig.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Cc: Peter Huewe <peter.huewe@infineon.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Thierry Reding <treding@nvidia.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b5f4184d
    • A
      drivers/rtc/rtc-isl12057.c: fix masking of register values · 5945b288
      Arnaud Ebalard committed
      When Intersil ISL12057 support was added by commit 70e12337 ("rtc: Add
      support for Intersil ISL12057 I2C RTC chip"), two masks for time register
      values imported from the device were either wrong or omitted, causing
      additional bits from those registers to affect the values read:
      
       - mask for hour register value when reading it in AM/PM mode. As
         AM/PM mode is not the usual mode used by the driver, this error
         would only have an impact on an externally configured RTC hour
         later read by the driver.
       - mask for month value. The lack of masking would provide an
         erroneous value if century bit is set.
      
      This patch fixes those two masks.
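      Why the hour mask matters in AM/PM mode can be seen in the sketch
      below.  The bit layout (HR_MIL, HR_PM and the 0x1f / 0x3f masks) is an
      assumption for illustration and the function names are hypothetical;
      consult the datasheet for the real register map.

```c
#include <stdint.h>

/* Assumed bit layout of the hour register, for illustration only. */
#define HR_MIL 0x40 /* set: 12-hour (AM/PM) mode */
#define HR_PM  0x20 /* set: PM, only meaningful in 12-hour mode */

static unsigned bcd_to_bin(uint8_t v)
{
	return (v >> 4) * 10u + (v & 0x0f);
}

/* Decode the hour register into a 0-23 hour. */
static unsigned decode_hour(uint8_t reg)
{
	if (reg & HR_MIL) {
		/* 12-hour mode: keep only the BCD hour bits, drop MIL and PM. */
		unsigned h = bcd_to_bin(reg & 0x1f) % 12;

		return (reg & HR_PM) ? h + 12 : h;
	}
	/* 24-hour mode: the hour occupies the low six bits. */
	return bcd_to_bin(reg & 0x3f);
}
```

      Without the mask in the 12-hour branch, the mode and PM flags would be
      fed into the BCD conversion and yield an hour outside the valid range.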
      
      Fixes: 70e12337 ("rtc: Add support for Intersil ISL12057 I2C RTC chip")
      Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Cc: Peter Huewe <peter.huewe@infineon.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Thierry Reding <treding@nvidia.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Acked-by: Uwe Kleine-König <uwe@kleine-koenig.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5945b288
    • T
      of: add vendor prefix for Pericom Technology · dd01a1c5
      Tomas Novotny committed
      Signed-off-by: Tomas Novotny <tomas@novotny.cz>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dd01a1c5
    • T
      rtc: ds1307: add support for mcp7940x chips · f4199f85
      Tomas Novotny committed
      MCP7940x is the same RTC as MCP7941x.  The difference is that MCP7941x chips
      contain an additional EEPROM on a different i2c address.
      
      The DS1307 driver already supports MCP7941x, so just add a new i2c device id
      and rename functions and defines accordingly.
      Signed-off-by: Tomas Novotny <tomas@novotny.cz>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f4199f85
    • S
      drivers/rtc/rtc-ds1374.c: add watchdog support · 920f91e5
      Søren Andersen committed
      Add support for the watchdog functionality of the DS1374 rtc.  Based on
      the m41t80 watchdog functionality.  Note: the watchdog uses the same
      registers as the alarm.
      
      [akpm@linux-foundation.org: don't forget mutex_unlock() in ds1374_wdt_open() error path]
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Soeren Andersen <san@rosetechnology.dk>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      920f91e5
    • B
      drivers/rtc/rtc-sirfsoc.c: replace local_irq_disable by spin_lock_irq for SMP safety · e9bc7363
      Barry Song committed
      Signed-off-by: Barry Song <Baohua.Song@csr.com>
      Cc: hao liu <hao.liu@csr.com>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e9bc7363