1. 29 Mar 2012: 5 commits
    • lib/cpumask.c: remove __any_online_cpu() · 38b93780
      Srivatsa S. Bhat committed
      __any_online_cpu() is neither optimal nor necessary, so replace its
      uses with faster cpumask_* operations.
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • smp: add func to IPI cpus based on parameter func · b3a7e98e
      Gilad Ben-Yossef committed
      Add the on_each_cpu_cond() function that wraps on_each_cpu_mask() and
      calculates the cpumask of cpus to IPI by calling a function supplied as a
      parameter in order to determine whether to IPI each specific cpu.
      
      The function works around a failure to allocate the cpumask variable
      under CONFIG_CPUMASK_OFFSTACK=y by iterating over the CPUs and sending
      one IPI at a time via smp_call_function_single().
      
      The function is useful because it separates the code that decides, in
      each case, whether to IPI a specific CPU for a specific request from the
      common boilerplate of creating the mask, handling failures, etc.
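The selection logic can be modeled in userspace. This is a simulation, not kernel code: the `sketch_*` names, the fixed count of 8 CPUs, and the extra `cpu` argument passed to `func` (the kernel callback takes only `info`) are assumptions for illustration. "Running on a CPU" is a plain function call, and the CONFIG_CPUMASK_OFFSTACK fallback is omitted.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 8   /* assumed number of CPUs in this simulation */

typedef bool (*sketch_cond_fn)(int cpu, void *info);
typedef void (*sketch_call_fn)(int cpu, void *info);

/* Stand-in for on_each_cpu_mask(): "runs" func on every CPU whose bit is
 * set.  Here that is just a direct call; no IPIs are involved. */
static void sketch_on_each_cpu_mask(unsigned long mask, sketch_call_fn func,
                                    void *info)
{
    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        if (mask & (1UL << cpu))
            func(cpu, info);
}

/* Stand-in for on_each_cpu_cond(): ask cond_func about each CPU, collect
 * the answers into a mask, then hand the mask to the mask-based helper.
 * Returns the mask so callers can see which CPUs were selected. */
unsigned long sketch_on_each_cpu_cond(sketch_cond_fn cond_func,
                                      sketch_call_fn func, void *info)
{
    unsigned long mask = 0;

    assert(SKETCH_NR_CPUS <= (int)(8 * sizeof(mask)));
    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        if (cond_func(cpu, info))
            mask |= 1UL << cpu;
    sketch_on_each_cpu_mask(mask, func, info);
    return mask;
}

/* Example callbacks: select even-numbered CPUs, count the calls made. */
bool sketch_cpu_is_even(int cpu, void *info)
{
    (void)info;
    return cpu % 2 == 0;
}

void sketch_count_call(int cpu, void *info)
{
    (void)cpu;
    ++*(int *)info;
}
```

The design point mirrored here is the split the commit describes: the caller only writes the per-CPU predicate, while mask building and dispatch live in shared code.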
      
      [akpm@linux-foundation.org: s/gfpflags/gfp_flags/]
      [akpm@linux-foundation.org: avoid double-evaluation of `info' (per Michal), parenthesise evaluation of `cond_func']
      [akpm@linux-foundation.org: s/CPU/CPUs, use all 80 cols in comment]
      Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Sasha Levin <levinsasha928@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Avi Kivity <avi@redhat.com>
      Acked-by: Michal Nazarewicz <mina86@mina86.org>
      Cc: Kosaki Motohiro <kosaki.motohiro@gmail.com>
      Cc: Milton Miller <miltonm@bga.com>
      Reviewed-by: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • smp: introduce a generic on_each_cpu_mask() function · 3fc498f1
      Gilad Ben-Yossef committed
      We have lots of infrastructure in place to partition multi-core systems
      such that a group of CPUs is dedicated to a specific task: cgroups,
      scheduler and interrupt affinity, and the isolcpus= boot parameter.
      Still, kernel code will at times interrupt all CPUs in the system via IPIs
      for various needs.  These IPIs are useful and cannot be avoided
      altogether, but in certain cases it is possible to interrupt only specific
      CPUs that have useful work to do and not the entire system.
      
      This patch set, inspired by discussions with Peter Zijlstra and Frederic
      Weisbecker when testing the nohz task patch set, is a first stab at trying
      to explore doing this by locating the places where such global IPI calls
      are being made and turning the global IPI into an IPI for a specific group
      of CPUs.  The purpose of the patch set is to get feedback on whether this
      is the right way to deal with this issue and, indeed, whether the issue
      is even worth dealing with at all.  Based on the feedback from this patch
      set I plan to offer further patches that address similar issues in other
      code paths.
      
      This patch creates the on_each_cpu_mask() and on_each_cpu_cond()
      infrastructure APIs (the former derived from existing arch-specific
      versions in Tile and Arm) and uses them to turn several global IPI
      invocations into per-CPU-group invocations.
      
      Core kernel:
      
      on_each_cpu_mask() calls a function on processors specified by cpumask,
      which may or may not include the local processor.
      
      You must not call this function with disabled interrupts or from a
      hardware interrupt handler or from a bottom half handler.
      
      arch/arm:
      
      Note that the generic version is a little different than the Arm one:
      
      1. It has the mask as first parameter
      2. It calls the function on the calling CPU with interrupts disabled,
         but this should be OK since the function is called on the other CPUs
         with interrupts disabled anyway.
      
      arch/tile:
      
      The API is the same as the tile private one, but the generic version
      also calls the function on the local CPU with interrupts disabled in the
      UP case.

      This is OK since the function is called on the other CPUs with
      interrupts disabled.
      Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
      Reviewed-by: Christoph Lameter <cl@linux.com>
      Acked-by: Chris Metcalf <cmetcalf@tilera.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Sasha Levin <levinsasha928@gmail.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Avi Kivity <avi@redhat.com>
      Acked-by: Michal Nazarewicz <mina86@mina86.org>
      Cc: Kosaki Motohiro <kosaki.motohiro@gmail.com>
      Cc: Milton Miller <miltonm@bga.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • swapon: check validity of swap_flags · d15cab97
      Hugh Dickins committed
      Most system calls taking flags first check that the flags passed in are
      valid, and that helps userspace to detect when new flags are supported.
      
      But swapon never did so: start checking now, to help if we ever want to
      support more swap_flags in future.
      
      It's difficult to get stray bits set in an int, and swapon is not widely
      used, so this is most unlikely to break any userspace; but we can just
      revert if it turns out to do so.
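The check amounts to rejecting any bit outside a known-valid mask. A minimal sketch: the `SK_*` constants are assumed to match the Linux swap flag values of the time (SWAP_FLAG_PREFER 0x8000, priority mask 0x7fff, SWAP_FLAG_DISCARD 0x10000), and both the set of valid bits and the helper name are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Assumed values, mirroring the Linux swap flags of the time. */
#define SK_SWAP_FLAG_PREFER    0x8000
#define SK_SWAP_FLAG_PRIO_MASK 0x7fff
#define SK_SWAP_FLAG_DISCARD   0x10000
#define SK_SWAP_FLAGS_VALID \
    (SK_SWAP_FLAG_PRIO_MASK | SK_SWAP_FLAG_PREFER | SK_SWAP_FLAG_DISCARD)

/* Reject any bit we do not know about, so that userspace can probe for
 * support of a future flag by checking for EINVAL. */
int sk_check_swap_flags(int swap_flags)
{
    if (swap_flags & ~SK_SWAP_FLAGS_VALID)
        return -EINVAL;
    return 0;
}
```

Rejecting unknown bits up front is what makes a future flag detectable: an old kernel fails with EINVAL instead of silently ignoring the new bit.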
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm for fs: add truncate_pagecache_range() · 623e3db9
      Hugh Dickins committed
      Holepunching filesystems ext4 and xfs are using truncate_inode_pages_range
      but forgetting to unmap pages first (ocfs2 remembers).  This is not really
      a bug, since races already require truncate_inode_page() to handle that
      case once the page is locked; but it can be very inefficient if the file
      being punched happens to be mapped into many vmas.
      
      Provide a drop-in replacement truncate_pagecache_range() which does the
      unmapping pass first, handling the awkward mismatch between arguments to
      truncate_inode_pages_range() and arguments to unmap_mapping_range().
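The argument mismatch is an inclusive byte range [lstart, lend] on one side and (start, length) on the other. A hypothetical userspace sketch of the conversion, assuming 4096-byte pages and choosing to shrink the hole inward so that partially covered pages stay mapped; the kernel's own rounding may differ, and the names are made up for illustration.

```c
#include <assert.h>

#define SK_PAGE_SIZE 4096LL   /* assumed page size */

struct sk_unmap_range {
    long long start;  /* first byte to unmap (page aligned) */
    long long len;    /* bytes to unmap; 0 means nothing to do */
};

/* Convert an inclusive byte range [lstart, lend] into (start, len),
 * keeping only whole pages that lie entirely inside the range. */
struct sk_unmap_range sk_hole_to_unmap_range(long long lstart, long long lend)
{
    struct sk_unmap_range r = { 0, 0 };
    long long start, end_excl;

    assert(lstart >= 0 && lend >= lstart);
    start = (lstart + SK_PAGE_SIZE - 1) / SK_PAGE_SIZE * SK_PAGE_SIZE; /* round up */
    end_excl = (lend + 1) / SK_PAGE_SIZE * SK_PAGE_SIZE;             /* round down */
    if (end_excl > start) {
        r.start = start;
        r.len = end_excl - start;
    }
    return r;
}
```

For example, punching bytes 100 to 8191 unmaps only the page at 4096; the partial first page is left to the page-cache truncation, which zeroes the partial range.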
      
      Note that holepunching does not unmap privately COWed pages in the range:
      POSIX requires that we do so when truncating, but it's hard to justify,
      difficult to implement without an i_size cutoff, and no filesystem is
      attempting to implement it.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: Alex Elder <elder@kernel.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 26 Mar 2012: 1 commit
  3. 24 Mar 2012: 27 commits
    • procfs: speed up /proc/pid/stat, statm · bda7bad6
      KAMEZAWA Hiroyuki committed
      Process accounting applications such as top and ps read several files
      under /proc/<pid>.  With seq_put_decimal_ull(), we can optimize the
      /proc/<pid>/stat and /proc/<pid>/statm files.
      
      This patch:
        - adds seq_put_decimal_ll() for signed values,
        - allows delimiter == 0,
        - converts seq_printf() to seq_put_decimal_ull/ll in /proc/<pid>/stat
          and statm.

      Test results on a system with 2000+ processes:
      
      Before patch:
        [kamezawa@bluextal test]$ top -b -n 1 | wc -l
        2223
        [kamezawa@bluextal test]$ time top -b -n 1 > /dev/null
      
        real    0m0.675s
        user    0m0.044s
        sys     0m0.121s
      
        [kamezawa@bluextal test]$ time ps -elf > /dev/null
      
        real    0m0.236s
        user    0m0.056s
        sys     0m0.176s
      
      After patch:
        [kamezawa@bluextal ~]$ time top -b -n 1 > /dev/null
      
        real    0m0.657s
        user    0m0.052s
        sys     0m0.100s
      
        [kamezawa@bluextal ~]$ time ps -elf > /dev/null
      
        real    0m0.198s
        user    0m0.050s
        sys     0m0.145s
      
      Since top and ps tend to scan /proc periodically, this will reduce the
      CPU consumption of top and ps to some extent.
      
      [akpm@linux-foundation.org: checkpatch fixes]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • procfs: add num_to_str() to speed up /proc/stat · 1ac101a5
      KAMEZAWA Hiroyuki committed
      == stat_check.py
      #!/usr/bin/env python3
      # Read /proc/stat 1000 times, rewinding after each full read.
      with open("/proc/stat") as f:
          for _ in range(1000):
              f.read()
              f.seek(0, 0)
      ==
      
      perf shows
      
          20.39%  stat_check.py  [kernel.kallsyms]    [k] format_decode
          13.41%  stat_check.py  [kernel.kallsyms]    [k] number
          12.61%  stat_check.py  [kernel.kallsyms]    [k] vsnprintf
          10.85%  stat_check.py  [kernel.kallsyms]    [k] memcpy
           4.85%  stat_check.py  [kernel.kallsyms]    [k] radix_tree_lookup
           4.43%  stat_check.py  [kernel.kallsyms]    [k] seq_printf
      
      This patch removes most of the calls to vsnprintf() by adding
      num_to_str() and seq_put_decimal_ull(), which print decimal numbers
      without the rich formatting machinery of printf().
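The speedup comes from skipping format-string parsing entirely: collect digits least-significant first, then copy them out in reverse. A hypothetical userspace sketch in the spirit of num_to_str(); its contract (no terminating NUL, return the number of characters written or 0 if they do not fit) is an assumption here rather than a quote of the patch.

```c
#include <assert.h>

/* Write the decimal form of num into buf (no terminating NUL).
 * Returns the number of characters written, or 0 if size is too small. */
int sk_num_to_str(char *buf, int size, unsigned long long num)
{
    char tmp[20];          /* enough digits for 2^64 - 1 */
    int len = 0, i;

    do {                   /* digits come out least-significant first */
        tmp[len++] = (char)('0' + num % 10);
        num /= 10;
    } while (num);

    if (len > size)
        return 0;
    for (i = 0; i < len; i++)
        buf[i] = tmp[len - 1 - i];
    return len;
}
```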
      
      On my 8-CPU box:
      == Before patch ==
      [root@bluextal test]# time ./stat_check.py
      
      real    0m0.150s
      user    0m0.026s
      sys     0m0.121s
      
      == After patch ==
      [root@bluextal test]# time ./stat_check.py
      
      real    0m0.055s
      user    0m0.022s
      sys     0m0.030s
      
      [akpm@linux-foundation.org: remove incorrect comment, use less stack in num_to_str(), move comment from .h to .c, simplify seq_put_decimal_ull()]
      [andrea@betterlinux.com: avoid breaking the ABI in /proc/stat]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrea Righi <andrea@betterlinux.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Turner <pjt@google.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • coredump: add VM_NODUMP, MADV_NODUMP, MADV_CLEAR_NODUMP · accb61fe
      Jason Baron committed
      Since we no longer need the VM_ALWAYSDUMP flag, let's use the freed bit
      for a 'VM_NODUMP' flag.  The idea is to add a new madvise() flag,
      MADV_DONTDUMP, which applications can set to request that specific
      memory regions be left out of core dumps.

      The specific application I have in mind is qemu: we can add a flag there
      so that qemu does not dump all of guest memory when it dumps core.  This
      flag might also be useful for security-sensitive apps that want to make
      absolutely sure that parts of memory are not dumped.  To clear the flag,
      use MADV_DODUMP.
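From userspace the new flags are used through madvise(2). A minimal sketch, assuming Linux and a C library that exposes MADV_DONTDUMP and MADV_DODUMP in <sys/mman.h> when _GNU_SOURCE is defined (glibc does); the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map an anonymous region and ask the kernel to leave it out of core
 * dumps (this sets VM_NODUMP on the vma).  Returns NULL on failure. */
void *sk_alloc_nodump(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return NULL;
    if (madvise(p, len, MADV_DONTDUMP) != 0) {
        munmap(p, len);
        return NULL;
    }
    return p;
}

/* Undo the above: include the region in core dumps again. */
int sk_allow_dump(void *p, size_t len)
{
    return madvise(p, len, MADV_DODUMP);
}
```

A program like qemu would apply the first call to its large guest-RAM mapping, leaving the rest of its address space subject to the normal coredump_filter rules.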
      
      [akpm@linux-foundation.org: s/MADV_NODUMP/MADV_DONTDUMP/, s/MADV_CLEAR_NODUMP/MADV_DODUMP/, per Roland]
      [akpm@linux-foundation.org: fix up the architectures which broke]
      Signed-off-by: Jason Baron <jbaron@redhat.com>
      Acked-by: Roland McGrath <roland@hack.frob.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • coredump: remove VM_ALWAYSDUMP flag · 909af768
      Jason Baron committed
      The motivation for this patchset was that I was looking for a way for a
      qemu-kvm process to exclude the guest memory, which can be quite large,
      from its core dump.  There are already a number of filter flags in
      /proc/<pid>/coredump_filter; however, these allow one to specify 'types'
      of kernel memory, not specific address ranges (which is what is needed
      in this case).
      
      Since there are no more vma flags available, the first patch eliminates
      the need for the 'VM_ALWAYSDUMP' flag.  The flag is used internally by
      the kernel to mark vdso and vsyscall pages.  However, it is simple
      enough to check if a vma covers a vdso or vsyscall page without the need
      for this flag.
      
      The second patch then replaces the 'VM_ALWAYSDUMP' flag with a new
      'VM_NODUMP' flag, which can be set by userspace using new madvise flags:
      'MADV_DONTDUMP', and unset via 'MADV_DODUMP'.  The core dump filters
      continue to work the same as before unless 'MADV_DONTDUMP' is set on the
      region.
      
      The qemu code which implements this feature is at:
      
        http://people.redhat.com/~jbaron/qemu-dump/qemu-dump.patch
      
      In my testing the qemu core dump shrank from 383MB to 13MB with this
      patch.
      
      I also believe that the 'MADV_DONTDUMP' flag might be useful for
      security sensitive apps, which might want to select which areas are
      dumped.
      
      This patch:
      
      The VM_ALWAYSDUMP flag is currently used by the coredump code to
      indicate that a vma is part of a vsyscall or vdso section.  However, we
      can determine whether a vma is in one of these sections by checking it
      against the gate_vma and checking for a non-NULL return value from
      arch_vma_name().  This frees a valuable vma bit.
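The resulting decision order can be sketched as a predicate: vdso/vsyscall areas are always dumped, recognized from the vma itself rather than from a flag, and only then is VM_NODUMP consulted. Everything here is hypothetical: the `sk_vma` struct, its fields standing in for the gate_vma comparison and arch_vma_name(), and the bit value chosen for `SK_VM_NODUMP`. The coredump_filter handling that follows in real code is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_VM_NODUMP 0x1UL   /* hypothetical bit value for illustration */

/* Hypothetical, pared-down view of a vma for this sketch. */
struct sk_vma {
    unsigned long vm_flags;
    bool is_gate_vma;        /* stands in for "vma is the gate vma" */
    const char *arch_name;   /* stands in for arch_vma_name(vma) */
};

/* vdso/vsyscall areas are recognized from the vma itself, so no
 * VM_ALWAYSDUMP flag is needed to mark them. */
bool sk_always_dump_vma(const struct sk_vma *vma)
{
    return vma->is_gate_vma || vma->arch_name != NULL;
}

/* Decide whether a vma's contents go into the core dump. */
bool sk_should_dump(const struct sk_vma *vma)
{
    if (sk_always_dump_vma(vma))
        return true;
    if (vma->vm_flags & SK_VM_NODUMP)   /* set via MADV_DONTDUMP */
        return false;
    return true;   /* real code would apply coredump_filter rules here */
}
```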
      Signed-off-by: Jason Baron <jbaron@redhat.com>
      Acked-by: Roland McGrath <roland@hack.frob.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Avi Kivity <avi@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • usermodehelper: kill umh_wait, renumber UMH_* constants · 9d944ef3
      Oleg Nesterov committed
      No functional changes.  It is not sane to use UMH_KILLABLE with enum
      umh_wait, but obviously we do not want another argument in the
      call_usermodehelper_* helpers.  Kill this enum and use a plain int.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • usermodehelper: implement UMH_KILLABLE · d0bd587a
      Oleg Nesterov committed
      Implement UMH_KILLABLE, which should be used along with
      UMH_WAIT_EXEC/PROC.  The caller must ensure that subprocess_info->path
      (etc.) cannot go away until call_usermodehelper_freeinfo().
      
      call_usermodehelper_exec(UMH_KILLABLE) does
      wait_for_completion_killable().  If that fails, it uses
      xchg(&sub_info->complete, NULL) to serialize with umh_complete(), which
      does the same xchg() to access sub_info->complete.
      
      If call_usermodehelper_exec wins, it can safely return.  umh_complete()
      should get NULL and call call_usermodehelper_freeinfo().
      
      Otherwise we know that umh_complete() was already called; in this case
      call_usermodehelper_exec() falls back to wait_for_completion(), which
      should succeed "very soon".
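The race above can be modeled with C11 atomics: whichever side performs the exchange first gets the non-NULL pointer, and that decides who cleans up. The `sk_*` types and functions are hypothetical stand-ins for `struct completion`, `struct subprocess_info`, umh_complete() and the killed-waiter path; the actual waiting and freeing are reduced to setting flags.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct completion and subprocess_info. */
struct sk_completion { bool done; };

struct sk_sub_info {
    _Atomic(struct sk_completion *) complete;
    bool freed_by_helper;
};

void sk_sub_info_init(struct sk_sub_info *info, struct sk_completion *c)
{
    atomic_init(&info->complete, c);
    info->freed_by_helper = false;
}

/* Helper side, modeled on umh_complete(): grab the completion pointer.
 * NULL means the killed waiter already left, so the helper frees info. */
void sk_umh_complete(struct sk_sub_info *info)
{
    struct sk_completion *c = atomic_exchange(&info->complete, NULL);

    if (c)
        c->done = true;               /* complete(): wake the waiter */
    else
        info->freed_by_helper = true; /* call_usermodehelper_freeinfo() */
}

/* Waiter side after wait_for_completion_killable() failed.
 * true:  the waiter won the race and may return immediately.
 * false: the helper already completed; fall back to an uninterruptible
 *        wait, which should finish very soon. */
bool sk_waiter_killed(struct sk_sub_info *info)
{
    return atomic_exchange(&info->complete, NULL) != NULL;
}
```

Because both sides use the same atomic exchange, exactly one of them observes the original pointer, so the structure is never freed twice and never leaked.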
      
      Note: UMH_NO_WAIT == -1, but it obviously should not be used with
      UMH_KILLABLE.  We delay the necessary cleanup to simplify
      backporting.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ptrace: remove PTRACE_SEIZE_DEVEL bit · ee00560c
      Denys Vlasenko committed
      The PTRACE_SEIZE code is tested and ready for production use; remove the
      code which requires a special bit in the data argument to make
      PTRACE_SEIZE work.

      The strace team is preparing a new release of strace, and we would like
      to ship the code which uses PTRACE_SEIZE, preferably after this change
      goes into a released kernel.
      Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ptrace: renumber PTRACE_EVENT_STOP so that future new options and events can match · 5cdf389a
      Denys Vlasenko committed
      PTRACE_EVENT_foo and PTRACE_O_TRACEfoo used to match.
      
      The new PTRACE_EVENT_STOP is the first event which has no corresponding
      PTRACE_O_TRACE option.  If we ever want to add another option, its
      PTRACE_EVENT value would collide with PTRACE_EVENT_STOP's value.

      This patch changes the PTRACE_EVENT_STOP value to prevent this.
      
      While at it, add a comment: the one atop the PTRACE_EVENT block, saying
      "Wait extended result codes for the above trace options", is not true
      for PTRACE_EVENT_STOP.
      Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ptrace: simplify PTRACE_foo constants and PTRACE_SETOPTIONS code · 86b6c1f3
      Denys Vlasenko committed
      Exchange the PT_TRACESYSGOOD and PT_PTRACE_CAP bit positions, which makes
      the PT_option bits contiguous and therefore makes the code in
      ptrace_setoptions() much simpler.

      Every PTRACE_O_TRACEevent is defined as (1 << PTRACE_EVENT_event)
      instead of using explicit numeric constants, to ensure we don't mess up
      the relationship between bit positions and event ids.
      
      PT_EVENT_FLAG_SHIFT was not particularly useful; PT_OPT_FLAG_SHIFT, with
      the value PT_EVENT_FLAG_SHIFT-1, is easier to use.

      The PT_TRACE_MASK constant is removed; its only use is replaced by
      (PTRACE_O_MASK << PT_OPT_FLAG_SHIFT).
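A userspace model of the layout these two commits aim for. The event numbers 1 to 6, and 128 for PTRACE_EVENT_STOP, follow the ptrace ABI; the `SK_*` names, the option mask 0x7f and the low state bits are assumptions made for the sketch, which only illustrates the idea that contiguous option bits reduce option setting to one clear-and-shift.

```c
#include <assert.h>

/* Event numbers as used by the Linux ptrace ABI. */
enum {
    SK_EVENT_FORK = 1, SK_EVENT_VFORK = 2, SK_EVENT_CLONE = 3,
    SK_EVENT_EXEC = 4, SK_EVENT_VFORK_DONE = 5, SK_EVENT_EXIT = 6,
    SK_EVENT_STOP = 128   /* no option bit; kept far from the others */
};

/* Option bits are derived from event numbers, so they cannot drift apart. */
#define SK_O_TRACESYSGOOD 1
#define SK_O_TRACEFORK    (1 << SK_EVENT_FORK)
#define SK_O_TRACEEXEC    (1 << SK_EVENT_EXEC)
#define SK_O_TRACEEXIT    (1 << SK_EVENT_EXIT)
#define SK_O_MASK         0x7f   /* assumed: bits 0..6 are the valid options */

/* Internal per-task flags: a few low state bits, then the option bits
 * stored contiguously starting at SK_PT_OPT_FLAG_SHIFT. */
#define SK_PT_PTRACED           0x1
#define SK_PT_OPT_FLAG_SHIFT    3
#define SK_PT_EVENT_FLAG(event) (1u << (SK_PT_OPT_FLAG_SHIFT + (event)))

/* Because the option bits are contiguous, setting options is a single
 * clear-and-shift instead of a chain of per-option if statements. */
unsigned int sk_setoptions(unsigned int ptrace_flags, unsigned int data)
{
    assert((data & ~(unsigned int)SK_O_MASK) == 0);  /* caller validated */
    ptrace_flags &= ~((unsigned int)SK_O_MASK << SK_PT_OPT_FLAG_SHIFT);
    ptrace_flags |= data << SK_PT_OPT_FLAG_SHIFT;
    return ptrace_flags;
}
```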
      Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ptrace: don't send SIGTRAP on exec if SEIZED · b1845ff5
      Oleg Nesterov committed
      ptrace_event(PTRACE_EVENT_EXEC) sends SIGTRAP if PT_TRACE_EXEC is not
      set.  This is because this SIGTRAP predates the PTRACE_O_TRACEEXEC
      option; we do not need or want it with PT_SEIZED, which can set the
      options during attach.
      Suggested-by: Pedro Alves <palves@redhat.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Chris Evans <scarybeasts@gmail.com>
      Cc: Indan Zupancic <indan@nul.nu>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ptrace: the killed tracee should not enter the syscall · 15cab952
      Oleg Nesterov committed
      Another old/known problem.  If the tracee is killed after it reports
      syscall_entry, it starts the syscall and the debugger can't control
      this.  This confuses users and creates security problems for ptrace
      jailers.

      Change tracehook_report_syscall_entry() to return non-zero if killed;
      this instructs syscall_trace_enter() to abort the syscall.
      Reported-by: Chris Evans <scarybeasts@gmail.com>
      Tested-by: Indan Zupancic <indan@nul.nu>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • poll: add poll_requested_events() and poll_does_not_wait() functions · 626cf236
      Hans Verkuil committed
      In some cases the poll() implementation in a driver has to do different
      things depending on the events the caller wants to poll for.  An example
      is when a driver needs to start a DMA engine if the caller polls for
      POLLIN, but doesn't want to do that if POLLIN is not requested but instead
      only POLLOUT or POLLPRI is requested.  This is something that can happen
      in the video4linux subsystem among others.
      
      Unfortunately, the current epoll/poll/select implementation doesn't
      provide that information reliably.  The poll_table_struct does have it: it
      has a key field with the event mask.  But once a poll() call matches one
      or more bits of that mask any following poll() calls are passed a NULL
      poll_table pointer.
      
      Also, the eventpoll implementation always left the key field at ~0 instead
      of using the requested events mask.
      
      This was changed in eventpoll.c so the key field now contains the actual
      events that should be polled for as set by the caller.
      
      The solution to the NULL poll_table pointer is to set the qproc field to
      NULL in poll_table once poll() matches the events, not the poll_table
      pointer itself.  That way drivers can obtain the mask through a new
      poll_requested_events inline.
      
      The poll_table_struct can still be NULL since some kernel code calls it
      internally (netfs_state_poll() in ./drivers/staging/pohmelfs/netfs.h).  In
      that case poll_requested_events() returns ~0 (i.e.  all events).
      
      Very rarely drivers might want to know whether poll_wait will actually
      wait.  If another earlier file descriptor in the set already matched the
      events the caller wanted to wait for, then the kernel will return from the
      select() call without waiting.  This might be useful information in order
      to avoid doing expensive work.
      
      A new helper function poll_does_not_wait() is added that drivers can use
      to detect this situation.  This is now used in sock_poll_wait() in
      include/net/sock.h.  This was the only place in the kernel that needed
      this information.
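Going by the description above, both helpers reduce to one-line checks. The sketch uses a cut-down, hypothetical `sk_poll_table` instead of the kernel's `poll_table_struct` so that it runs in userspace; the logic (a NULL table means all events, and no table or a NULL `_qproc` means poll_wait() will not wait) is what the text describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down poll_table; underscores mean "do not touch". */
typedef void (*sk_queue_proc)(void *wait_address);
typedef struct {
    sk_queue_proc _qproc;  /* NULL once an earlier fd already matched */
    unsigned long _key;    /* events the caller asked for */
} sk_poll_table;

/* No table at all: treat every event as requested. */
unsigned long sk_poll_requested_events(const sk_poll_table *p)
{
    return p ? p->_key : ~0UL;
}

/* poll_wait() will not queue us without a table or a queue callback. */
bool sk_poll_does_not_wait(const sk_poll_table *p)
{
    return p == NULL || p->_qproc == NULL;
}

/* Example queue callback for the usage below; does nothing. */
void sk_dummy_qproc(void *wait_address)
{
    (void)wait_address;
}
```

A driver would call the first helper to decide, for instance, whether POLLIN was requested before starting DMA, and the second to skip expensive setup when the call will return without sleeping.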
      
      Drivers should no longer access any of the poll_table internals, but use
      the poll_requested_events() and poll_does_not_wait() access functions
      instead.  To enforce that, the poll_table fields are now prefixed with
      an underscore, and a comment was added warning against using them
      directly.
      
      This required a change in unix_dgram_poll() in unix/af_unix.c which used
      the key field to get the requested events.  It's been replaced by a call
      to poll_requested_events().
      
      For qproc it was especially important to change its name, since the
      behavior of that field changes with this patch: the function pointer
      can now be NULL, which wasn't possible in the past.
      
      Any driver accessing the qproc or key fields directly will now fail to compile.
      
      Some notes regarding the correctness of this patch: the driver's poll()
      function is called with a 'struct poll_table_struct *wait' argument.  This
      pointer may or may not be NULL; drivers can never rely on it being one or
      the other, as that depends on whether or not an earlier file descriptor in
      the select()'s fdset matched the requested events.
      
      There are only three things a driver can do with the wait argument:
      
      1) obtain the key field:
      
      	events = wait ? wait->key : ~0;
      
         This will still work although it should be replaced with the new
         poll_requested_events() function (which does exactly the same).
         This will now even work better, since wait is no longer set to NULL
         unnecessarily.
      
      2) use the qproc callback. This could be deadly since qproc can now be
         NULL. Renaming qproc should prevent this from happening. There are no
         kernel drivers that actually access this callback directly, BTW.
      
      3) test whether wait == NULL to determine whether poll would return without
         waiting. This is no longer sufficient as the correct test is now
         wait == NULL || wait->_qproc == NULL.
      
         However, the worst that can happen here is a slight performance hit in
         the case where wait != NULL and wait->_qproc == NULL. In that case the
         driver will assume that poll_wait() will actually add the fd to the set
         of waiting file descriptors. Of course, poll_wait() will not do that
         since it tests for wait->_qproc. This will not break anything, though.
      
         There is only one place in the whole kernel where this happens
         (sock_poll_wait() in include/net/sock.h) and that code will be replaced
         by a call to poll_does_not_wait() in the next patch.
      
         Note that even if wait->_qproc != NULL drivers cannot rely on poll_wait()
         actually waiting. The next file descriptor from the set might match the
         event mask and thus any possible waits will never happen.
      Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
      Reviewed-by: Jonathan Corbet <corbet@lwn.net>
      Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • crc32: bolt on crc32c · 46c5801e
      Darrick J. Wong committed
      Reuse the existing crc32 code to stamp out a crc32c implementation.
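The reuse idea can be shown with a single CRC routine parameterized by polynomial. This is a bitwise (table-free) sketch, not the kernel's table-driven code; the names are hypothetical, and the usual ~0 seed and final inversion are applied in the wrappers. The two constants are the standard reflected polynomials of CRC-32 and CRC-32C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reflected polynomials: CRC-32 (IEEE 802.3) and CRC-32C (Castagnoli). */
#define SK_CRC32_POLY_LE  0xEDB88320u
#define SK_CRC32C_POLY_LE 0x82F63B78u

/* One bitwise little-endian CRC routine shared by both variants. */
static uint32_t sk_crc32_le_generic(uint32_t crc, const unsigned char *p,
                                    size_t len, uint32_t poly)
{
    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ ((crc & 1) ? poly : 0);
    }
    return crc;
}

uint32_t sk_crc32(const void *buf, size_t len)
{
    return ~sk_crc32_le_generic(~0u, buf, len, SK_CRC32_POLY_LE);
}

uint32_t sk_crc32c(const void *buf, size_t len)
{
    return ~sk_crc32_le_generic(~0u, buf, len, SK_CRC32C_POLY_LE);
}
```

On the usual check input "123456789", these give the standard check values 0xCBF43926 (CRC-32) and 0xE3069283 (CRC-32C).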
      Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Bob Pearson <rpearson@systemfabricworks.com>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • include/ and checkpatch: prefer __scanf to __attribute__((format(scanf,...) · 6061d949
      Joe Perches committed
      As with __printf, prefer the __scanf shorthand.
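The kernel's `__scanf(a, b)` expands to `__attribute__((format(scanf, a, b)))`, which lets GCC and Clang check the variadic arguments against the format string. The userspace sketch below uses a renamed macro (`SK_SCANF`) on a `vsscanf()` wrapper; both names are hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Same expansion as the kernel shorthand, under a userspace-safe name. */
#define SK_SCANF(a, b) __attribute__((format(scanf, a, b)))

/* Argument 2 is the format string; checking starts at argument 3. */
int sk_parse(const char *str, const char *fmt, ...) SK_SCANF(2, 3);

int sk_parse(const char *str, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsscanf(str, fmt, ap);
    va_end(ap);
    return n;
}
```

With the attribute in place, a call such as `sk_parse(s, "%d", &some_long)` draws a format warning at compile time, just as it would for sscanf() itself.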
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • leds-lm3530: support pwm input mode · bb982009
      Kim, Milo committed
      * add 'struct lm3530_pwm_data' to the platform data
        The pwm data holds the platform-specific functions which generate the
        PWM.  The pwm data is only valid when brightness is in PWM input mode.
        The functions should be implemented by the PWM driver.
        pwm_set_intensity() : set the PWM duty.
        pwm_get_intensity() : get the current brightness.

      * brightness control by PWM
        If the control mode is PWM, the brightness is changed by the PWM duty,
        so the PWM platform function should be called in lm3530_brightness_set().

      * do not update the brightness register in PWM input mode
        In PWM input mode, the brightness register is not used.
        If any value is written to this register, the LED will be turned off.

      * when the input mode is changed, set the PWM duty to 0 if PWM is no
        longer needed.
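The brightness path described in the list can be sketched as a single branch: in PWM mode, forward the value to the platform hook and leave the register alone. All names here (`sk_pwm_data`, `sk_lm3530`, the hook signature, the mock register) are hypothetical; they only loosely follow the description and are not the driver's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical platform hook, loosely following the description. */
struct sk_pwm_data {
    void (*pwm_set_intensity)(int brightness, int max_brightness);
};

struct sk_lm3530 {
    bool pwm_mode;                 /* brightness comes from the PWM input */
    const struct sk_pwm_data *pwm;
    int brt_register;              /* mock of the brightness register */
    int max_brightness;
};

void sk_brightness_set(struct sk_lm3530 *drv, int value)
{
    if (drv->pwm_mode) {
        /* The register is unused in PWM mode; writing it would turn
         * the LED off, so only the PWM duty is changed. */
        if (drv->pwm && drv->pwm->pwm_set_intensity)
            drv->pwm->pwm_set_intensity(value, drv->max_brightness);
        return;
    }
    drv->brt_register = value;
}

/* Example hook for the usage below: remembers the last duty requested. */
int sk_last_duty = -1;

void sk_record_duty(int brightness, int max_brightness)
{
    (void)max_brightness;
    sk_last_duty = brightness;
}
```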
      Signed-off-by: Milo(Woogyom) Kim <milo.kim@ti.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Richard Purdie <rpurdie@rpsys.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drivers/leds/leds-lp5521.c: support led pattern data · 011af7bc
      Kim, Milo committed
      The lp5521 has an autonomous operation mode that needs no external
      control.  Using lp5521_platform_data, various LED patterns can be
      configured.  To support this feature, new functions and a device
      attribute are added.

      Structure of lp5521_led_pattern: three channels (red, green and blue)
      are supported.  The pattern(s) of each channel and the number of
      patterns are defined in the platform data.  Pattern data are hex codes
      which include pattern commands such as set pwm, wait, ramp up/down,
      branch and so on.
      
      Pattern mode functions:
       * lp5521_clear_program_memory
      	Before running new led pattern, program memory should be cleared.
       * lp5521_write_program_memory
      	Pattern data updated in the program memory via the i2c.
       * lp5521_get_pattern
      	Get pattern from predefined in the platform data.
       * lp5521_run_led_pattern
      	Stop current pattern or run new pattern.
      	Transition time is required between different operation mode.
      
      Device attribute - 'led_pattern': To load specific led pattern, new device
      attribute is added.
      
      When the lp5521 driver is unloaded, stop current led pattern mode.
      
      Documentation updated : description about how to define the led patterns
      and example.
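      The sketch below models the pattern flow in user space, assuming the
      per-channel layout described above (one program-code array and one
      length per channel). The field names, the 32-byte per-channel program
      memory, struct fake_pdata, struct fake_chip and the numbering (0 means
      stop, 1..N select a pattern) are assumptions, not the driver's actual
      definitions; the I2C writes are replaced by memcpy() into a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: one program-code array plus its length per channel. */
struct lp5521_led_pattern {
	const uint8_t *r, *g, *b;
	uint8_t size_r, size_g, size_b;
};

/* Hypothetical slice of the platform data. */
struct fake_pdata {
	const struct lp5521_led_pattern *patterns;
	int num_patterns;
};

/* Assumption: 32 bytes of program memory per channel. */
#define PROG_MEM_SIZE 32

/* Stand-in for the chip; prog[] replaces the I2C-accessed program memory. */
struct fake_chip {
	uint8_t prog[3][PROG_MEM_SIZE];
};

/* Mode 0 means "stop"; modes 1..num_patterns pick a predefined pattern. */
static const struct lp5521_led_pattern *
get_pattern(int mode, const struct fake_pdata *pdata)
{
	assert(pdata != NULL);
	if (mode <= 0 || mode > pdata->num_patterns)
		return NULL;
	return &pdata->patterns[mode - 1];
}

static void clear_program_memory(struct fake_chip *chip)
{
	memset(chip->prog, 0, sizeof(chip->prog));
}

/* Returns 0 on success, -1 if the code does not fit in the channel. */
static int write_program_memory(struct fake_chip *chip, int ch,
				const uint8_t *code, uint8_t size)
{
	if (size > PROG_MEM_SIZE)
		return -1;
	if (size)
		memcpy(chip->prog[ch], code, size);
	return 0;
}

/* Always clear first; mode 0 only stops, other modes load all channels.
 * (The transition delay between operation modes is omitted here.) */
static int run_led_pattern(struct fake_chip *chip, int mode,
			   const struct fake_pdata *pdata)
{
	const struct lp5521_led_pattern *p = get_pattern(mode, pdata);

	clear_program_memory(chip);
	if (!p)
		return mode == 0 ? 0 : -1;
	if (write_program_memory(chip, 0, p->r, p->size_r) ||
	    write_program_memory(chip, 1, p->g, p->size_g) ||
	    write_program_memory(chip, 2, p->b, p->size_b))
		return -1;
	return 0;
}
```

      In this model, writing N to the 'led_pattern' attribute would correspond
      to run_led_pattern(chip, N, pdata).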
      
      [akpm@linux-foundation.org: checkpatch fixes]
      Signed-off-by: Milo(Woogyom) Kim <milo.kim@ti.com>
      Acked-by: Linus Walleij <linus.walleij@linaro.org>
      Cc: Arun MURTHY <arun.murthy@stericsson.com>
      Cc: Srinidhi Kasagar <srinidhi.kasagar@stericsson.com>
      Cc: Richard Purdie <rpurdie@rpsys.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drivers/leds/leds-lp5521.c: add 'update_config' in the lp5521_platform_data · 3b49aacd
      Kim, Milo committed
      The value of the CONFIG register (address 08h) is now configurable.  To
      support this, 'update_config' is added to the platform data.  If
      'update_config' is not defined, the default value is 'LP5521_PWRSAVE_EN |
      LP5521_CP_MODE_AUTO | LP5521_R_TO_BATT'.
      
      So that the CONFIG register can be defined in the platform data, the bit
      definitions were moved to the header file.
      
      Documentation updated: description of 'update_config', with an example.
      Signed-off-by: Milo(Woogyom) Kim <milo.kim@ti.com>
      Acked-by: Linus Walleij <linus.walleij@linaro.org>
      Cc: Arun MURTHY <arun.murthy@stericsson.com>
      Cc: Srinidhi Kasagar <srinidhi.kasagar@stericsson.com>
      Cc: Richard Purdie <rpurdie@rpsys.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drivers/leds/leds-lp5521.c: add 'name' in the lp5521_led_config · 5ae4e8a7
      Kim, Milo committed
      The name of each LED channel is now configurable.  For compatibility, the
      name is set to the default value (xx:channelN) when 'name' is not defined.
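      A small sketch of the naming fallback. The function lp5521_led_name(),
      the buffer size LED_NAME_LEN and the 'prefix' parameter (standing in for
      the 'xx' part of the default name) are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LED_NAME_LEN 32   /* hypothetical buffer size */

/* Use the configured name if given, else "<prefix>:channel<N>". */
static void lp5521_led_name(char *buf, size_t len, const char *cfg_name,
			    const char *prefix, int chan)
{
	assert(buf != NULL && len > 0);
	if (cfg_name && cfg_name[0] != '\0')
		snprintf(buf, len, "%s", cfg_name);
	else
		snprintf(buf, len, "%s:channel%d", prefix, chan);
}
```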
      Signed-off-by: Milo(Woogyom) Kim <milo.kim@ti.com>
      Acked-by: Linus Walleij <linus.walleij@linaro.org>
      Cc: Arun MURTHY <arun.murthy@stericsson.com>
      Cc: Srinidhi Kasagar <srinidhi.kasagar@stericsson.com>
      Cc: Richard Purdie <rpurdie@rpsys.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • bitops: introduce for_each_clear_bit() · 03f4a822
      Akinobu Mita committed
      Introduce for_each_clear_bit() and for_each_clear_bit_from().  They are
      similar to for_each_set_bit() and for_each_set_bit_from(), but they
      iterate over all the cleared bits in a memory region.
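      For illustration, a user-space sketch of the two iterators.
      find_next_zero_bit() here is a simple bit-by-bit stand-in for the kernel
      helper of the same name, and the macros are modeled on, not copied
      from, the kernel's definitions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Index of the next clear bit at or after 'offset', or 'size' if none. */
static size_t find_next_zero_bit(const unsigned long *addr, size_t size,
				 size_t offset)
{
	assert(addr != NULL);
	for (; offset < size; offset++)
		if (!((addr[offset / BITS_PER_LONG] >>
		       (offset % BITS_PER_LONG)) & 1UL))
			return offset;
	return size;
}

/* Visit every clear bit in [0, size). */
#define for_each_clear_bit(bit, addr, size)                       \
	for ((bit) = find_next_zero_bit((addr), (size), 0);         \
	     (bit) < (size);                                        \
	     (bit) = find_next_zero_bit((addr), (size), (bit) + 1))

/* Visit every clear bit in [bit, size), including 'bit' itself. */
#define for_each_clear_bit_from(bit, addr, size)                  \
	for ((bit) = find_next_zero_bit((addr), (size), (bit));     \
	     (bit) < (size);                                        \
	     (bit) = find_next_zero_bit((addr), (size), (bit) + 1))
```

      For a bitmap holding 0xF5 with size 8, for_each_clear_bit() visits
      bits 1 and 3.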
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Stefano Panella <stefano.panella@csr.com>
      Cc: David Vrabel <david.vrabel@csr.com>
      Cc: Sergei Shtylyov <sshtylyov@mvista.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • bitops: remove for_each_set_bit_cont() · 0a329d2d
      Akinobu Mita committed
      Remove for_each_set_bit_cont() after confirming that no one uses
      for_each_set_bit_cont() anymore.
      
      [sfr@canb.auug.org.au: regmap: cope with bitops API change]
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • bitops: rename for_each_set_bit_cont() in favor of analogous list.h function · 307b1cd7
      Akinobu Mita committed
      This renames for_each_set_bit_cont() to for_each_set_bit_from() because
      it is analogous to list_for_each_entry_from() in list.h rather than
      list_for_each_entry_continue().
      
      This doesn't remove for_each_set_bit_cont() for now.
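      The naming distinction can be shown with a small user-space sketch: a
      '_from' walk starts at the current bit, while a '_continue' walk would
      start after it. find_next_bit() is a simplified stand-in for the kernel
      helper, and for_each_set_bit_continue_demo() is a hypothetical macro
      defined only for contrast; it is not a kernel API.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

static int test_bit_ul(size_t nr, const unsigned long *addr)
{
	return (addr[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
}

/* Index of the next set bit at or after 'offset', or 'size' if none. */
static size_t find_next_bit(const unsigned long *addr, size_t size,
			    size_t offset)
{
	assert(addr != NULL);
	while (offset < size && !test_bit_ul(offset, addr))
		offset++;
	return offset;
}

/* Like list_for_each_entry_from(): starts AT the current 'bit'. */
#define for_each_set_bit_from(bit, addr, size)                   \
	for ((bit) = find_next_bit((addr), (size), (bit));         \
	     (bit) < (size);                                       \
	     (bit) = find_next_bit((addr), (size), (bit) + 1))

/* Hypothetical contrast, like list_for_each_entry_continue():
 * starts AFTER the current 'bit'. */
#define for_each_set_bit_continue_demo(bit, addr, size)          \
	for ((bit) = find_next_bit((addr), (size), (bit) + 1);     \
	     (bit) < (size);                                       \
	     (bit) = find_next_bit((addr), (size), (bit) + 1))
```

      With bits 1, 2 and 4 set and 'bit' starting at 2, the '_from' walk
      visits 2 and 4, while the '_continue' walk visits only 4.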
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • backlight: new backlight driver for LP855x devices · 7be865ab
      Kim, Milo committed
      This driver supports TI LP8550/LP8551/LP8552/LP8553/LP8556 backlight
      devices.
      
      The brightness can be controlled through either I2C or the PWM input,
      and the lp855x driver supports both modes.  For PWM control,
      PWM-specific functions can be defined in the platform data.  Some
      information can also be read via sysfs (lp855x device attributes).
      
      For details, please refer to Documentation/backlight/lp855x-driver.txt.
      
      [axel.lin@gmail.com: add missing mutex_unlock in lp855x_read_byte() error path]
      [axel.lin@gmail.com: check platform data in lp855x_probe()]
      [axel.lin@gmail.com: small cleanups]
      [dan.carpenter@oracle.com: silence a compiler warning]
      [axel.lin@gmail.com: use id->driver_data to differentiate lp855x chips]
      [akpm@linux-foundation.org: simplify boolean return expression]
      Signed-off-by: Milo(Woogyom) Kim <milo.kim@ti.com>
      Signed-off-by: Axel Lin <axel.lin@gmail.com>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Richard Purdie <rpurdie@rpsys.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • prctl: add PR_{SET,GET}_CHILD_SUBREAPER to allow simple process supervision · ebec18a6
      Lennart Poettering committed
      Userspace service managers/supervisors need to track their started
      services.  Many services daemonize by double-forking and get implicitly
      re-parented to PID 1.  The service manager will no longer be able to
      receive the SIGCHLD signals for them, and is no longer in charge of
      reaping the children with wait().  All information about the children is
      lost at the moment PID 1 cleans up the re-parented processes.
      
      With this prctl, a service manager process can mark itself as a sort of
      'sub-init', able to stay as the parent for all orphaned processes
      created by the started services.  All SIGCHLD signals will be delivered
      to the service manager.
      
      For a service manager, receiving SIGCHLD and calling wait() is much
      preferable to any asynchronous notification about specific PIDs: the
      service manager has full access to the child process data in /proc,
      and the PID cannot be reused until the wait() has happened, which the
      service manager itself controls.
      
      As a side effect, the relevant parent PID information is not lost by a
      double-fork, which results in a more elaborate process tree and 'ps'
      output:
      
      before:
        # ps afx
        253 ?        Ss     0:00 /bin/dbus-daemon --system --nofork
        294 ?        Sl     0:00 /usr/libexec/polkit-1/polkitd
        328 ?        S      0:00 /usr/sbin/modem-manager
        608 ?        Sl     0:00 /usr/libexec/colord
        658 ?        Sl     0:00 /usr/libexec/upowerd
        819 ?        Sl     0:00 /usr/libexec/imsettings-daemon
        916 ?        Sl     0:00 /usr/libexec/udisks-daemon
        917 ?        S      0:00  \_ udisks-daemon: not polling any devices
      
      after:
        # ps afx
        294 ?        Ss     0:00 /bin/dbus-daemon --system --nofork
        426 ?        Sl     0:00  \_ /usr/libexec/polkit-1/polkitd
        449 ?        S      0:00  \_ /usr/sbin/modem-manager
        635 ?        Sl     0:00  \_ /usr/libexec/colord
        705 ?        Sl     0:00  \_ /usr/libexec/upowerd
        959 ?        Sl     0:00  \_ /usr/libexec/udisks-daemon
        960 ?        S      0:00  |   \_ udisks-daemon: not polling any devices
        977 ?        Sl     0:00  \_ /usr/libexec/packagekitd
      
      This prctl is orthogonal to PID namespaces.  PID namespaces are isolated
      from each other, while a service management process usually requires the
      services to live in the same namespace, to be able to talk to each
      other.
      
      The intended users are the systemd per-user instance, which provides
      init-like functionality for the user's login session, and D-Bus, which
      activates bus services on demand.  Both need init-like capabilities to
      properly keep track of the services they start.
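      A minimal user-space sketch of the new prctl, assuming Linux 3.4 or
      later. PR_SET_CHILD_SUBREAPER is 36 in <linux/prctl.h>; the fallback
      #define only covers older C library headers. demo_subreaper() is a
      hypothetical helper: it marks itself as a subreaper, starts a process
      that double-forks, and then checks that it can reap the orphaned
      grandchild with waitpid().

```c
#include <assert.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef PR_SET_CHILD_SUBREAPER
#define PR_SET_CHILD_SUBREAPER 36   /* value from <linux/prctl.h> */
#endif

/* Hypothetical helper.  Returns 1 if the orphaned grandchild could be
 * reaped by us, 0 if not, -1 on setup errors. */
static int demo_subreaper(void)
{
	int fds[2];
	pid_t child, service = -1;

	/* Become a "sub-init" for descendants orphaned from now on. */
	if (prctl(PR_SET_CHILD_SUBREAPER, 1UL, 0UL, 0UL, 0UL) != 0)
		return -1;
	if (pipe(fds) != 0)
		return -1;

	child = fork();
	if (child < 0)
		return -1;
	if (child == 0) {
		/* Daemonizing service: fork again, report the PID, exit. */
		pid_t gc = fork();

		if (gc == 0)
			_exit(0);   /* a real service would keep running */
		close(fds[0]);
		if (gc < 0 ||
		    write(fds[1], &gc, sizeof(gc)) != (ssize_t)sizeof(gc))
			_exit(1);
		_exit(0);
	}

	close(fds[1]);
	if (read(fds[0], &service, sizeof(service)) != (ssize_t)sizeof(service))
		service = -1;
	close(fds[0]);
	waitpid(child, NULL, 0);   /* reap the intermediate process */
	if (service <= 0)
		return -1;

	/* The orphaned service was re-parented to us, so we can wait() for
	 * it; without the flag it would have been re-parented to init. */
	return waitpid(service, NULL, 0) == service;
}
```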
      
      Many thanks to Oleg for several rounds of review and insights.
      
      [akpm@linux-foundation.org: fix comment layout and spelling]
      [akpm@linux-foundation.org: add lengthy code comment from Oleg]
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Lennart Poettering <lennart@poettering.net>
      Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
      Acked-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • consolidate WARN_...ONCE() static variables · 7ccaba53
      Jan Beulich committed
      Because of the alignment of the variables that follow them, these flags
      typically consume more than the single byte a 'bool' requires, and with
      a few hundred instances the cache pollution (more so than the wasted
      memory) adds up.  Put these variables into their own section, away from
      any even moderately frequently used memory range.
      
      Do the same for the __warned variable of rcu_lockdep_assert().  (Don't,
      however, include the ones used by printk_once() and the like, as they
      can potentially be hot.)
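      A user-space sketch of the idea: each call site of a WARN_ON_ONCE()-style
      macro gets its own static flag, and the flag is placed in a separate
      section so it does not share cache lines with hot data. The section name
      '.data.unlikely', the macro names, check_value() and the demo counter
      are assumptions; the macro relies on GCC statement expressions, as
      kernel code does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed section name for rarely written once-flags. */
#define ONCE_SECTION __attribute__((section(".data.unlikely")))

static int warnings_emitted;   /* demo only: counts printed warnings */

/* WARN_ON_ONCE()-style macro (GCC statement expression).  Each call site
 * gets its own static flag, placed in ONCE_SECTION instead of hot .data. */
#define MY_WARN_ON_ONCE(cond) ({                                   \
	static bool ONCE_SECTION warned_;                          \
	bool ret_ = !!(cond);                                      \
	if (ret_ && !warned_) {                                    \
		warned_ = true;                                    \
		warnings_emitted++;                                \
		fprintf(stderr, "warning: %s at %s:%d\n", #cond,   \
			__FILE__, __LINE__);                       \
	}                                                          \
	ret_;                                                      \
})

/* Hypothetical caller: one call site, so at most one message ever. */
static bool check_value(int v)
{
	return MY_WARN_ON_ONCE(v < 0);
}
```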
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • headers: include linux/types.h where appropriate · 10db4e1e
      Bobby Powers committed
      This addresses some header check warnings.  DRM headers which include
      "drm.h" have been excluded, as they indirectly include types.h.
      Signed-off-by: Bobby Powers <bobbypowers@gmail.com>
      Cc: Chris Ball <cjb@laptop.org>
      Cc: Dave Airlie <airlied@linux.ie>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nmi watchdog: do not use cpp symbol in Kconfig · d314d74c
      Cong Wang committed
      ARCH_HAS_NMI_WATCHDOG is a macro defined by the architecture code, yet
      the Kconfig option HARDLOCKUP_DETECTOR depends on it.  This is wrong:
      ARCH_HAS_NMI_WATCHDOG has to be a Kconfig symbol, and architectures that
      need it should select it explicitly.
      Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
      Acked-by: Don Zickus <dzickus@redhat.com>
      Acked-by: Mike Frysinger <vapier@gentoo.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • magic.h: move some FS magic numbers into magic.h · b502bd11
      Muthu Kumar committed
      - Move open-coded filesystem magic numbers into magic.h
      
      - Rearrange magic.h so that the filesystem-related constants are grouped
        together.
      Signed-off-by: Muthukumar R <muthur@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 23 Mar 2012, 1 commit
  5. 22 Mar 2012, 6 commits