1. 23 Jul, 2015 6 commits
  2. 16 Jul, 2015 2 commits
    • rcu: Change return type to bool · f765d113
      Nicholas Mc Guire committed
      Type-checking coccinelle spatches are being used to locate type mismatches
      between function signatures and return values. In this case this produced:
      ./kernel/rcu/srcu.c:271 WARNING: return of wrong type
              int != unsigned long,
      
      srcu_readers_active() returns an int that is the sum of per-CPU unsigned
      long counters, but the only user is cleanup_srcu_struct(), which uses it as a
      boolean (condition) to see if there are any readers rather than actually
      using the approximate number of readers. The theoretically possible
      unsigned long overflow case does not need to be handled explicitly: if
      we had 4G+ readers then something else went wrong a long time ago.
      
      proposal: change the return type to boolean. The function name is left
                unchanged as it fits the naming expectation for a boolean.
      
      patch was compile tested for x86_64_defconfig (implies CONFIG_SRCU=y)
      
      patch is against 4.1-rc5 (localversion-next is -next-20150525)
      Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
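The idea in the commit above can be sketched in plain C. This is only an illustration under stated assumptions: `readers_active()` and its array argument are hypothetical stand-ins for the kernel's per-CPU counters, not the actual srcu code.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace analogue of srcu_readers_active(): sum the
 * per-CPU reader counts and report only whether any reader exists.
 * Returning bool avoids narrowing an unsigned long sum to int.
 */
static bool readers_active(const unsigned long *per_cpu_counts, size_t ncpus)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < ncpus; i++)
		sum += per_cpu_counts[i];

	return sum != 0;
}
```

A caller that only needs a condition, as cleanup_srcu_struct() is described as doing, can then test the result directly.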
    • rcu: Deinline rcu_read_lock_sched_held() if DEBUG_LOCK_ALLOC · d5671f6b
      Denys Vlasenko committed
      DEBUG_LOCK_ALLOC=y is not a production setting, but it is
      not very unusual either. Many developers routinely
      use kernels built with it enabled.
      
      Apart from being selected by hand, it is also auto-selected by
      PROVE_LOCKING "Lock debugging: prove locking correctness" and
      LOCK_STAT "Lock usage statistics" config options.
      LOCK_STAT is necessary for "perf lock" to work.
      
      I wouldn't spend too much time optimizing it, but this particular
      function has a very large cost in code size: when it is deinlined,
      code size decreases by about 830,000 bytes:
      
          text     data      bss       dec     hex filename
      85674192 22294776 20627456 128596424 7aa39c8 vmlinux.before
      84837612 22294424 20627456 127759492 79d7484 vmlinux
      
      (with this config: http://busybox.net/~vda/kernel_config)
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      CC: Josh Triplett <josh@joshtriplett.org>
      CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      CC: Lai Jiangshan <laijs@cn.fujitsu.com>
      CC: Tejun Heo <tj@kernel.org>
      CC: Oleg Nesterov <oleg@redhat.com>
      CC: linux-kernel@vger.kernel.org
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  3. 07 Jul, 2015 1 commit
  4. 04 Jul, 2015 5 commits
  5. 03 Jul, 2015 1 commit
  6. 01 Jul, 2015 8 commits
  7. 28 Jun, 2015 2 commits
  8. 27 Jun, 2015 1 commit
    • timer: Fix hotplug regression · 24bfcb10
      Thomas Gleixner committed
      The recent timer wheel rework removed the get/put_cpu_var() pair in
      the hotplug migration code, which results in:
      
      BUG: using smp_processor_id() in preemptible [00000000] code: hib.sh/2845
      ...
      [<ffffffff810d4fa3>] timer_cpu_notify+0x53/0x12
      
      That hunk is a leftover from an earlier iteration and went unnoticed
      so far.
      
      Restore the previous code which was obviously correct.
      
      Fixes: 0eeda71b 'timer: Replace timer base by a cpu index'
      Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  9. 26 Jun, 2015 13 commits
    • exit,stats: /* obey this comment */ · 51229b49
      Rik van Riel committed
      There is a helpful comment in do_exit() that states we sync the mm's RSS
      info before statistics gathering.
      
      The function that does the statistics gathering is called right above that
      comment.
      
      Change the code to obey the comment.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kernel/trace/blktrace.c: use strreplace() in do_blk_trace_setup() · ff14417c
      Rasmus Villemoes committed
      Part of the disassembly of do_blk_trace_setup:
      
          231b:       e8 00 00 00 00          callq  2320 <do_blk_trace_setup+0x50>
                              231c: R_X86_64_PC32     strlen+0xfffffffffffffffc
          2320:       eb 0a                   jmp    232c <do_blk_trace_setup+0x5c>
          2322:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
          2328:       48 83 c3 01             add    $0x1,%rbx
          232c:       48 39 d8                cmp    %rbx,%rax
          232f:       76 47                   jbe    2378 <do_blk_trace_setup+0xa8>
          2331:       41 80 3c 1c 2f          cmpb   $0x2f,(%r12,%rbx,1)
          2336:       75 f0                   jne    2328 <do_blk_trace_setup+0x58>
          2338:       41 c6 04 1c 5f          movb   $0x5f,(%r12,%rbx,1)
          233d:       4c 89 e7                mov    %r12,%rdi
          2340:       e8 00 00 00 00          callq  2345 <do_blk_trace_setup+0x75>
                              2341: R_X86_64_PC32     strlen+0xfffffffffffffffc
          2345:       eb e1                   jmp    2328 <do_blk_trace_setup+0x58>
      
      Yep, that's right: gcc isn't smart enough to realize that replacing '/' by
      '_' cannot change the strlen(), so we call it again and again (at least
      when a '/' is found).  Even if gcc were that smart, this construction
      would still loop over the string twice, once for the initial strlen() call
      and then the open-coded loop.
      
      Let's simply use strreplace() instead.
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Liked-by: Jens Axboe <axboe@fb.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
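The single-pass replacement that the blktrace commit argues for can be sketched in userspace C. `replace_char()` is a hypothetical stand-in written for this illustration; it is not the kernel's strreplace(), and returning a pointer to the terminating NUL is a choice made for this sketch.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a strreplace()-style helper: walk the
 * string exactly once, replacing every 'old' with 'new', without
 * calling strlen() at all. Returns a pointer to the terminating NUL.
 */
static char *replace_char(char *s, char old, char new)
{
	for (; *s; ++s)
		if (*s == old)
			*s = new;
	return s;
}
```

Compared with a loop bounded by `i < strlen(buf)`, this version never re-measures the string, so its cost stays linear in the string length regardless of how many characters are replaced.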
    • kernel/trace/trace_events_filter.c: use strreplace() · 1bb56471
      Rasmus Villemoes committed
      There's no point in starting over every time we see a ','...
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • check_syslog_permissions() cleanup · 3ea4331c
      Vasily Averin committed
      Patch fixes drawbacks in check_syslog_permissions() noticed by AKPM:
      "from_file handling makes me cry.
      
      That's not a boolean - it's an enumerated value with two values
      currently defined.
      
      But the code in check_syslog_permissions() treats it as a boolean and
      also hardwires the knowledge that SYSLOG_FROM_PROC == 1 (or == `true`).
      
      And the name is wrong: it should be called from_proc to match
      SYSLOG_FROM_PROC."
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Josh Boyer <jwboyer@redhat.com>
      Cc: Eric Paris <eparis@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • security_syslog() should be called once only · d194e5d6
      Vasily Averin committed
      The final version of commit 637241a9 ("kmsg: honor dmesg_restrict
      sysctl on /dev/kmsg") lost a few hooks; as a result, security_syslog() calls
      are processed incorrectly:
      
      - open of /dev/kmsg checks syslog access permissions by using
        check_syslog_permissions() where security_syslog() is not called if
        dmesg_restrict is set.
      
      - the syslog syscall and /proc/kmsg call do_syslog(), where security_syslog()
        can be executed twice (inside check_syslog_permissions() and then
        directly in do_syslog())
      
      With this patch security_syslog() is called once only in all
      syslog-related operations regardless of dmesg_restrict value.
      
      Fixes: 637241a9 ("kmsg: honor dmesg_restrict sysctl on /dev/kmsg")
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Josh Boyer <jwboyer@redhat.com>
      Cc: Eric Paris <eparis@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • printk: implement support for extended console drivers · 6fe29354
      Tejun Heo committed
      printk log_buf keeps various metadata for each message including its
      sequence number and timestamp.  The metadata is currently available only
      through /dev/kmsg and stripped out before being passed on to console drivers.  We
      want this metadata to be available to console drivers too so that console
      consumers can get full information including the metadata and dictionary,
      which among other things can be used to detect whether messages got lost
      in transit.
      
      This patch implements support for extended console drivers.  Consoles can
      indicate that they want extended messages by setting the new CON_EXTENDED
      flag and they'll be fed messages formatted the same way as /dev/kmsg.
      
       "<level>,<sequnum>,<timestamp>,<contflag>;<message text>\n"
      
      If extended consoles exist, in-kernel fragment assembly is disabled.  This
      ensures that all messages emitted to consoles have full metadata including
      sequence number.  The contflag carries enough information to reassemble
      the fragments from the reader side trivially.  Note that this only affects
      /dev/kmsg.  Regular console and /proc/kmsg outputs are not affected by
      this change.
      
      * Extended message formatting for console drivers is enabled iff there
        are registered extended consoles.
      
      * Comment describing /dev/kmsg message format updated to add missing
        contflag field and help distinguish variable from verbatim terms.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Kay Sievers <kay@vrfy.org>
      Reviewed-by: Petr Mladek <pmladek@suse.cz>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
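The extended record layout quoted in the commit above can be illustrated with a small formatter. This is a rough sketch, not the kernel's implementation: `format_ext_record()` and its parameters are hypothetical, the timestamp unit and the meaning of the continuation flag character are assumptions, and the escaping of message text and the dictionary are left out entirely.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical formatter for one record in the
 * "<level>,<seqnum>,<timestamp>,<contflag>;<message text>\n" layout.
 * Returns the number of bytes written (excluding the NUL), or -1 if
 * the record does not fit into 'size' bytes. Assumed inputs: 'ts' is
 * an opaque timestamp value, 'cont_flag' is a single flag character.
 */
static int format_ext_record(char *buf, size_t size, unsigned int level,
			     uint64_t seq, uint64_t ts, char cont_flag,
			     const char *text)
{
	int n = snprintf(buf, size, "%u,%" PRIu64 ",%" PRIu64 ",%c;%s\n",
			 level, seq, ts, cont_flag, text);

	/* snprintf() reports the untruncated length; reject overruns. */
	if (n < 0 || (size_t)n >= size)
		return -1;
	return n;
}
```

Because every record carries its sequence number, a receiver of such lines can detect gaps between consecutive records, which is the lost-message detection the commit message refers to.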
    • printk: factor out message formatting from devkmsg_read() · 0a295e67
      Tejun Heo committed
      The extended message formatting used for /dev/kmsg will be used to implement
      extended consoles.  Factor out msg_print_ext_header() and
      msg_print_ext_body() from devkmsg_read().
      
      This is pure restructuring.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Kay Sievers <kay@vrfy.org>
      Reviewed-by: Petr Mladek <pmladek@suse.cz>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • printk: guard the amount written per line by devkmsg_read() · d43ff430
      Tejun Heo committed
      This patchset updates netconsole so that it can emit messages with the
      same header as used in /dev/kmsg, which gives the netconsole receiver full
      log information and enables things like structured logging and detection
      of lost messages.
      
      This patch (of 7):
      
      devkmsg_read() uses an 8k buffer and assumes that the formatted output
      message won't overrun, which seems safe given LOG_LINE_MAX, the current use
      of dict and the escaping method being used; however, we're planning to use
      devkmsg formatting more widely and accounting for the buffer size properly isn't
      that complicated.
      
      This patch defines CONSOLE_EXT_LOG_MAX as 8192 and updates devkmsg_read()
      so that it limits output accordingly.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Kay Sievers <kay@vrfy.org>
      Reviewed-by: Petr Mladek <pmladek@suse.cz>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • clone: support passing tls argument via C rather than pt_regs magic · 3033f14a
      Josh Triplett committed
      clone has some of the quirkiest syscall handling in the kernel, with a
      pile of special cases, historical curiosities, and architecture-specific
      calling conventions.  In particular, clone with CLONE_SETTLS accepts a
      parameter "tls" that the C entry point completely ignores and some
      assembly entry points overwrite; instead, the low-level arch-specific
      code pulls the tls parameter out of the arch-specific register captured
      as part of pt_regs on entry to the kernel.  That's a massive hack, and
      it makes the arch-specific code only work when called via the specific
      existing syscall entry points; because of this hack, any new clone-like
      system call would have to accept an identical tls argument in exactly
      the same arch-specific position, rather than providing a unified system
      call entry point across architectures.
      
      The first patch allows architectures to handle the tls argument via
      normal C parameter passing, if they opt in by selecting
      HAVE_COPY_THREAD_TLS.  The second patch makes 32-bit and 64-bit x86 opt
      into this.
      
      These two patches came out of the clone4 series, which isn't ready for
      this merge window, but these first two cleanup patches were entirely
      uncontroversial and have acks.  I'd like to go ahead and submit these
      two so that other architectures can begin building on top of this and
      opting into HAVE_COPY_THREAD_TLS.  However, I'm also happy to wait and
      send these through the next merge window (along with v3 of clone4) if
      anyone would prefer that.
      
      This patch (of 2):
      
      clone with CLONE_SETTLS accepts an argument to set the thread-local
      storage area for the new thread.  sys_clone declares an int argument
      tls_val in the appropriate point in the argument list (based on the
      various CLONE_BACKWARDS variants), but doesn't actually use or pass along
      that argument.  Instead, sys_clone calls do_fork, which calls
      copy_process, which calls the arch-specific copy_thread, and copy_thread
      pulls the corresponding syscall argument out of the pt_regs captured at
      kernel entry (knowing what argument of clone that architecture passes tls
      in).
      
      Apart from being awful and inscrutable, that also only works because only
      one code path into copy_thread can pass the CLONE_SETTLS flag, and that
      code path comes from sys_clone with its architecture-specific
      argument-passing order.  This prevents introducing a new version of the
      clone system call without propagating the same architecture-specific
      position of the tls argument.
      
      However, there's no reason to pull the argument out of pt_regs when
      sys_clone could just pass it down via C function call arguments.
      
      Introduce a new CONFIG_HAVE_COPY_THREAD_TLS for architectures to opt into,
      and a new copy_thread_tls that accepts the tls parameter as an additional
      unsigned long (syscall-argument-sized) argument.  Change sys_clone's tls
      argument to an unsigned long (which does not change the ABI), and pass
      that down to copy_thread_tls.
      
      Architectures that don't opt into copy_thread_tls will continue to ignore
      the C argument to sys_clone in favor of the pt_regs captured at kernel
      entry, and thus will be unable to introduce new versions of the clone
      syscall.
      
      Patch co-authored by Josh Triplett and Thiago Macieira.
      Signed-off-by: Josh Triplett <josh@joshtriplett.org>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thiago Macieira <thiago.macieira@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • prctl: more prctl(PR_SET_MM_*) checks · 4a00e9df
      Alexey Dobriyan committed
      Individual prctl(PR_SET_MM_*) calls do some checking to maintain a
      consistent view of mm->arg_start et al fields, but not enough.  In
      particular PR_SET_MM_ARG_START/PR_SET_MM_ARG_END/PR_SET_MM_ENV_START/
      PR_SET_MM_ENV_END only check that the address lies in an existing VMA,
      but don't check that the start address is lower than the end address _at
      all_.
      
      Consolidate all consistency checks, so there will be no difference in
      the future between PR_SET_MM_MAP and individual PR_SET_MM_* calls.
      
      The program below reverses both the ARGV and ENVP areas.  It makes
      /proc/$PID/cmdline show garbage (only by luck does it not oops).
      
      #include <sys/mman.h>
      #include <sys/prctl.h>
      #include <unistd.h>
      
      enum {PAGE_SIZE=4096};
      
      int main(void)
      {
      	void *p;
      
      	p = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
      
      #define PR_SET_MM               35
      #define PR_SET_MM_ARG_START     8
      #define PR_SET_MM_ARG_END       9
      #define PR_SET_MM_ENV_START     10
      #define PR_SET_MM_ENV_END       11
      	prctl(PR_SET_MM, PR_SET_MM_ARG_START, (unsigned long)p + PAGE_SIZE - 1, 0, 0);
      	prctl(PR_SET_MM, PR_SET_MM_ARG_END,   (unsigned long)p, 0, 0);
      	prctl(PR_SET_MM, PR_SET_MM_ENV_START, (unsigned long)p + PAGE_SIZE - 1, 0, 0);
      	prctl(PR_SET_MM, PR_SET_MM_ENV_END,   (unsigned long)p, 0, 0);
      
      	pause();
      	return 0;
      }
      
      [akpm@linux-foundation.org: tidy code, tweak comment]
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Jarod Wilson <jarod@redhat.com>
      Cc: Jan Stancek <jstancek@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
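The ordering check that the prctl commit says was missing can be sketched as follows. The struct and function are hypothetical and cover only the start/end ordering; the kernel's consolidated validation is described as checking more than this (for example, that the addresses lie in an existing VMA).

```c
#include <stdbool.h>

/* Hypothetical subset of the mm fields named in the commit message. */
struct mm_ranges {
	unsigned long arg_start, arg_end;
	unsigned long env_start, env_end;
};

/*
 * Sketch of an ordering check: each start address must not be above
 * its matching end address. Applying the same predicate to every
 * update path is what keeps the individual PR_SET_MM_* calls and
 * PR_SET_MM_MAP from drifting apart.
 */
static bool mm_ranges_consistent(const struct mm_ranges *r)
{
	return r->arg_start <= r->arg_end && r->env_start <= r->env_end;
}
```

With such a check in place, the reversed ranges produced by the reproducer program would be rejected instead of being stored.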
    • tracing: Fix typo from "static inlin" to "static inline" · cc9e4bde
      Steven Rostedt (Red Hat) committed
      The trace.h header, when included without CONFIG_EVENT_TRACING enabled
      (seldom done), will not compile because of a typo in the prototype
      of trace_event_enum_update().
      
      Cc: stable@vger.kernel.org # 4.1+
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing/filter: Do not allow infix to exceed end of string · 6b88f44e
      Steven Rostedt (Red Hat) committed
      While debugging a WARN_ON() for filtering, I found that it is possible
      for the filter string to be referenced after its end. With the filter:
      
       # echo '>' > /sys/kernel/debug/events/ext4/ext4_truncate_exit/filter
      
      The filter_parse() function can call infix_get_op() which calls
      infix_advance() that updates the infix filter pointers for the cnt
      and tail without checking if the filter is already at the end, which
      will put the cnt to zero and the tail beyond the end. The loop then calls
      infix_next() that has
      
      	ps->infix.cnt--;
      	return ps->infix.string[ps->infix.tail++];
      
      The cnt will now be below zero, and the tail that is returned is
      already past the end of the filter string. So far the allocation
      of the filter string usually has some buffer that is zeroed out, but
      if the filter string is of the exact size of the allocated buffer
      there's no guarantee that the character after the terminating nul
      character will be zero.
      
      Luckily, only root can write to the filter.
      
      Cc: stable@vger.kernel.org # 2.6.33+
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
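The overrun described in the infix commit can be illustrated with a guarded reader. This is a minimal sketch under assumptions: `struct infix_state` is a cut-down, hypothetical version of the parser state, and `infix_next_checked()` shows one way to stop at the end of input rather than the exact change that was merged.

```c
#include <stddef.h>

/* Hypothetical cut-down version of the parser state described above. */
struct infix_state {
	const char *string;
	unsigned int cnt;	/* characters still to be consumed */
	unsigned int tail;	/* index of the next character */
};

/*
 * Return the next character, or 0 once the input is exhausted.
 * The early return keeps cnt from going below zero and keeps tail
 * from moving past the end of the string.
 */
static char infix_next_checked(struct infix_state *s)
{
	if (!s->cnt)
		return 0;

	s->cnt--;
	return s->string[s->tail++];
}
```

For the one-character filter '>' from the example, the first call returns '>' and every later call returns 0 without touching memory beyond the string.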
    • tracing/filter: Do not WARN on operand count going below zero · b4875bbe
      Steven Rostedt (Red Hat) committed
      When testing the fix for the trace filter, I could not come up with
      a scenario where the operand count goes below zero, so I added a
      WARN_ON_ONCE(cnt < 0) to the logic. But there is a legitimate case
      where it can happen (although the filter would be wrong).
      
       # echo '>' > /sys/kernel/debug/events/ext4/ext4_truncate_exit/filter
      
      That is, a single operation without any operands will hit the path
      where the WARN_ON_ONCE() can trigger. This is harmless, and the
      filter is reported as an error. But instead of spitting out
      a warning to the kernel dmesg, just fail nicely and report it via
      the proper channels.
      
      Link: http://lkml.kernel.org/r/558C6082.90608@oracle.com
      Reported-by: Vince Weaver <vincent.weaver@maine.edu>
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Cc: stable@vger.kernel.org # 2.6.33+
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  10. 25 Jun, 2015 1 commit