1. 25 Mar, 2012 1 commit
    • L
      Merge tag 'bug-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux · ed2d265d
      Linus Torvalds committed
      Pull <linux/bug.h> cleanup from Paul Gortmaker:
       "The changes shown here are to unify linux's BUG support under the one
        <linux/bug.h> file.  Due to historical reasons, we have some BUG code
        in bug.h and some in kernel.h -- i.e.  the support for BUILD_BUG in
        linux/kernel.h predates the addition of linux/bug.h, but old code in
        kernel.h wasn't moved to bug.h at that time.  As a band-aid, kernel.h
        was including <asm/bug.h> to pseudo link them.
      
        This has caused confusion[1] and general yuck/WTF[2] reactions.  Here
        is an example that violates the principle of least surprise:
      
            CC      lib/string.o
            lib/string.c: In function 'strlcat':
            lib/string.c:225:2: error: implicit declaration of function 'BUILD_BUG_ON'
            make[2]: *** [lib/string.o] Error 1
            $
            $ grep linux/bug.h lib/string.c
            #include <linux/bug.h>
            $
      
        We've included <linux/bug.h> for the BUG infrastructure and yet we
        still get a compile fail! [We have not included kernel.h, which is
        where BUILD_BUG_ON lives.] Ugh - very confusing for someone who is
        new to kernel development.
      
        With the above in mind, the goals of this changeset are:
      
        1) find and fix any include/*.h files that were relying on the
           implicit presence of BUG code.
        2) find and fix any C files that were consuming kernel.h and hence
           relying on implicitly getting some/all BUG code.
        3) Move the BUG related code living in kernel.h to <linux/bug.h>
        4) remove the asm/bug.h from kernel.h to finally break the chain.
      
        During development, the order was more like 3-4, build-test, 1-2.  But
        to ensure that git history for bisect doesn't get needless build
        failures introduced, the commits have been reordered to fix the problem
        areas in advance.
      
      	[1]  https://lkml.org/lkml/2012/1/3/90
      	[2]  https://lkml.org/lkml/2012/1/17/414"
      
      Fix up conflicts (new radeon file, reiserfs header cleanups) as per Paul
      and linux-next.
      
      * tag 'bug-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
        kernel.h: doesn't explicitly use bug.h, so don't include it.
        bug: consolidate BUILD_BUG_ON with other bug code
        BUG: headers with BUG/BUG_ON etc. need linux/bug.h
        bug.h: add include of it to various implicit C users
        lib: fix implicit users of kernel.h for TAINT_WARN
        spinlock: macroize assert_spin_locked to avoid bug.h dependency
        x86: relocate get/set debugreg fcns to include/asm/debugreg.
  2. 24 Mar, 2012 39 commits
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl · f1d38e42
      Linus Torvalds committed
      Pull sysctl updates from Eric Biederman:
      
       - Rewrite of sysctl for speed and clarity.
      
         Insert/remove/Lookup in sysctl are all now O(log N) operations, and
         are no longer bottlenecks in the process of adding and removing
         network devices.
      
         sysctl is now focused on being a filesystem instead of system call
         and the code can all be found in fs/proc/proc_sysctl.c.  Hopefully
         this means the code is now approachable.
      
         Much thanks is owed to Lucian Grinjincu for keeping at this until
         something was found that was usable.
      
       - The recent proc_sys_poll oops found by the fuzzer during hibernation
         is fixed.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl: (36 commits)
        sysctl: protect poll() in entries that may go away
        sysctl: Don't call sysctl_follow_link unless we are a link.
        sysctl: Comments to make the code clearer.
        sysctl: Correct error return from get_subdir
        sysctl: An easier to read version of find_subdir
        sysctl: fix memset parameters in setup_sysctl_set()
        sysctl: remove an unused variable
        sysctl: Add register_sysctl for normal sysctl users
        sysctl: Index sysctl directories with rbtrees.
        sysctl: Make the header lists per directory.
        sysctl: Move sysctl_check_dups into insert_header
        sysctl: Modify __register_sysctl_paths to take a set instead of a root and an nsproxy
        sysctl: Replace root_list with links between sysctl_table_sets.
        sysctl: Add sysctl_print_dir and use it in get_subdir
        sysctl: Stop requiring explicit management of sysctl directories
        sysctl: Add a root pointer to ctl_table_set
        sysctl: Rewrite proc_sys_readdir in terms of first_entry and next_entry
        sysctl: Rewrite proc_sys_lookup introducing find_entry and lookup_entry.
        sysctl: Normalize the root_table data structure.
        sysctl: Factor out insert_header and erase_header
        ...
    • L
      Merge tag 'amd64-edac-updates-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp · dae430c6
      Linus Torvalds committed
      Pull AMD64 EDAC fixes from Borislav Petkov:
       "A bunch of fixes/updates for the AMD side of EDAC including
      
         * MCE decoding updates
         * tree-wide EDAC sweep making pci_device_ids __devinitconst
         * Scrub rate API correction
         * two amd64_edac corrections for K8 boxes and sysfs csrow nodes"
      
      * tag 'amd64-edac-updates-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
        MCE, AMD: Constify error tables
        MCE, AMD: Correct bank 5 error signatures
        MCE, AMD: Rework NB MCE signatures
        MCE, AMD: Correct VB data error description
        MCE, AMD: Correct ucode patch buffer description
        MCE, AMD: Correct some MC0 error types
        EDAC: Make pci_device_id tables __devinitconst.
        EDAC: Correct scrub rate API
        amd64_edac: Fix K8 revD and later chip select sizes
        amd64_edac: Fix missing csrows sysfs nodes
    • L
      Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq · cf821923
      Linus Torvalds committed
      Pull cpufreq updates for 3.4 from Dave Jones: new drivers and some fixes.
      
      * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
        provide disable_cpufreq() function to disable the API.
        EXYNOS5250: Add support cpufreq for EXYNOS5250
        EXYNOS4X12: Add support cpufreq for EXYNOS4X12
        [CPUFREQ] CPUfreq ondemand: update sampling rate without waiting for next sampling
        [CPUFREQ] Add S3C2416/S3C2450 cpufreq driver
        [CPUFREQ] Fix exposure of ARM_EXYNOS4210_CPUFREQ
        [CPUFREQ] EXYNOS4210: update the name of EXYNOS clock register
        [CPUFREQ] EXYNOS: Initialize locking_frequency with initial frequency
        [CPUFREQ] s3c64xx: Fix mis-cherry pick of VDDINT
      
      Fix up trivial conflicts in Kconfig and Makefile due to just changes
      next to each other (OMAP2PLUS changes vs some new EXYNOS cpufreq
      drivers).
    • L
      Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq · 4416b0ea
      Linus Torvalds committed
      Pull cpufreq fixes from Dave Jones:
       "I meant to get some of these in for 3.3 final, but left things too
        late, so I've got two trees this time."
      
      * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
        cpufreq: OMAP: specify range for voltage scaling
        cpufreq: OMAP: scale voltage along with frequency
        cpufreq: OMAP driver depends CPUfreq tables
    • L
      Merge branch 'pcmcia' of git://git.linaro.org/people/rmk/linux-arm · 24613ff9
      Linus Torvalds committed
      Pull #3 ARM updates from Russell King:
       "This adds gpio support to soc_common, allowing an amount of code to be
        deleted from each PCMCIA socket driver for the PXA/SA11x0 SoCs."
      
      * 'pcmcia' of git://git.linaro.org/people/rmk/linux-arm:
        PCMCIA: sa1111: rename sa1111 socket drivers to have sa1111_ prefix.
        PCMCIA: make lubbock socket driver part of sa1111_cs
        PCMCIA: add Kconfig control for building sa11xx_base.c
        PCMCIA: sa1111: jornada720: no need to disable IRQs around sa1111_set_io
        PCMCIA: sa1111: pass along sa1111_pcmcia_configure_socket() failure code
        PCMCIA: soc_common: remove explicit wrprot initialization in socket drivers
        PCMCIA: soc_common: remove soc_pcmcia_*_irqs functions
        PCMCIA: sa11x0: h3600: convert to use new irq/gpio management
        PCMCIA: sa11x0: simpad: convert to use new irq/gpio management
        PCMCIA: sa11x0: shannon: convert to use new irq/gpio management
        PCMCIA: sa11x0: nanoengine: convert reset handling to use GPIO subsystem
        PCMCIA: sa11x0: nanoengine: convert to use new irq/gpio management
        PCMCIA: sa11x0: cerf: convert reset handling to use GPIO subsystem
        PCMCIA: sa11x0: cerf: convert to use new irq/gpio management
        PCMCIA: sa11x0: assabet: convert to use new irq/gpio management
        PCMCIA: sa1111: use new per-socket irq/gpio infrastructure
        PCMCIA: pxa: convert PXA socket drivers to use new irq/gpio management
        PCMCIA: soc_common: add GPIO support for card status signals
        PCMCIA: soc_common: move common initialization into soc_common
    • L
      Merge branch 'amba' of git://git.linaro.org/people/rmk/linux-arm · 0d19eac1
      Linus Torvalds committed
      Pull #2 ARM updates from Russell King:
       "Further ARM AMBA primecell updates which aren't included directly in
        the previous commit.  I wanted to keep these separate as they're
        touching stuff outside arch/arm/."
      
      * 'amba' of git://git.linaro.org/people/rmk/linux-arm:
        ARM: 7362/1: AMBA: Add module_amba_driver() helper macro for amba_driver
        ARM: 7335/1: mach-u300: do away with MMC config files
        ARM: 7280/1: mmc: mmci: Cache MMCICLOCK and MMCIPOWER register
        ARM: 7309/1: realview: fix unconnected interrupts on EB11MP
        ARM: 7230/1: mmc: mmci: Fix PIO read for small SDIO packets
        ARM: 7227/1: mmc: mmci: Prepare for SDIO before setting up DMA job
        ARM: 7223/1: mmc: mmci: Fixup use of runtime PM and use autosuspend
        ARM: 7221/1: mmc: mmci: Change from using legacy suspend
        ARM: 7219/1: mmc: mmci: Change vdd_handler to a generic ios_handler
        ARM: 7218/1: mmc: mmci: Provide option to configure bus signal direction
        ARM: 7217/1: mmc: mmci: Put power register deviations in variant data
        ARM: 7216/1: mmc: mmci: Do not release spinlock in request_end
        ARM: 7215/1: mmc: mmci: Increase max_segs from 16 to 128
    • L
      Merge branch 'for-armsoc' of git://git.linaro.org/people/rmk/linux-arm · 56c10bf8
      Linus Torvalds committed
      Pull #1 ARM updates from Russell King:
       "This one covers stuff which Arnd is waiting for me to push, as this is
        shared between both our trees and probably other trees elsewhere.
      
        Essentially, this contains:
         - AMBA primecell device initializer updates - mostly shrinking the
           size of the device declarations in platform code to something more
           reasonable.
         - Getting rid of the NO_IRQ crap from AMBA primecell stuff.
         - Nicolas' idle cleanups.  This in combination with the restart
           cleanups from the last merge window results in a great many
           mach/system.h files being deleted."
      
      Yay: ~80 files, ~2000 lines deleted.
      
      * 'for-armsoc' of git://git.linaro.org/people/rmk/linux-arm: (60 commits)
        ARM: remove disable_fiq and arch_ret_to_user macros
        ARM: make entry-macro.S depend on !MULTI_IRQ_HANDLER
        ARM: rpc: make default fiq handler run-time installed
        ARM: make arch_ret_to_user macro optional
        ARM: amba: samsung: use common amba device initializers
        ARM: amba: spear: use common amba device initializers
        ARM: amba: nomadik: use common amba device initializers
        ARM: amba: u300: use common amba device initializers
        ARM: amba: lpc32xx: use common amba device initializers
        ARM: amba: netx: use common amba device initializers
        ARM: amba: bcmring: use common amba device initializers
        ARM: amba: ep93xx: use common amba device initializers
        ARM: amba: omap2: use common amba device initializers
        ARM: amba: integrator: use common amba device initializers
        ARM: amba: realview: get rid of private platform amba_device initializer
        ARM: amba: versatile: get rid of private platform amba_device initializer
        ARM: amba: vexpress: get rid of private platform amba_device initializer
        ARM: amba: provide common initializers for static amba devices
        ARM: amba: make use of -1 IRQs warn
        ARM: amba: u300: get rid of NO_IRQ initializers
        ...
    • L
      Merge tag 'for-3.4' of git://openrisc.net/jonas/linux · bab2d8c6
      Linus Torvalds committed
      Pull OpenRISC changes for 3.4 from Jonas Bonn:
       "This series for the OpenRISC architecture consists of mostly trivial
        fixups.  The most interesting bits of the series are:
      
        * A fix to the timer code whereby the shortest trigger period is set
          to 100 cycles; previously, it was possible to set this to 1 cycle,
          but by the time the register was written, that time had already
          passed and the timer interrupt would not go off until the cycle
          counter had gone a full cycle.
      
        * Allowing a device tree binary to be passed in to the kernel from
          u-boot.  The OpenRISC architecture has been recently merged into
          upstream u-boot, so this change gets OpenRISC Linux into sync with
          that project."
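The timer fix in the first bullet comes down to modular arithmetic. The sketch below is a simplified model, not the OpenRISC code: it assumes a free-running counter of `width` bits that fires when it equals a match value, and a hypothetical `write_latency` for the cycles that elapse between computing the match value and the register write taking effect. The default width of 28 bits is an assumption made for illustration.

```python
def cycles_until_fire(delta: int, write_latency: int, width: int = 28) -> int:
    """Cycles from the register write until the counter equals the match value.

    Model (an assumption for illustration): software reads the counter,
    programs match = now + delta, and the write lands write_latency cycles
    later. The counter wraps modulo 2**width.
    """
    return (delta - write_latency) % (2 ** width)


# A 1-cycle period is already in the past when the write lands, so the
# timer only fires after the counter has wrapped almost all the way around.
late = cycles_until_fire(delta=1, write_latency=5)

# With a 100-cycle minimum, the match value is still ahead of the counter.
on_time = cycles_until_fire(delta=100, write_latency=5)
```

In this model a minimum period only helps while it stays larger than the write latency, which is presumably why a comfortable margin such as 100 cycles was chosen.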
      
      * tag 'for-3.4' of git://openrisc.net/jonas/linux:
        OpenRISC: Remove memory_start/end prototypes
        openrisc: remove semicolon from KSTK_ defs
        openrisc: sanitize use of orig_gpr11
        openrisc: fix virt_addr_valid
        OpenRISC: Export dump_stack()
        OpenRISC: Select GENERIC_ATOMIC64
        openrisc: Set shortest clock event to 100 ticks
        openrisc: included linux/thread_info.h twice
        OpenRISC: Use set_current_blocked() and block_sigmask()
        OpenRISC: Don't mask signals if we fail to setup signal stack
        OpenRISC: No need to reset handler if SA_ONESHOT
        OpenRISC: Don't reimplement force_sigsegv()
        openrisc: enable passing of flattened device tree pointer
        arch/openrisc/mm/init.c: trivial: use BUG_ON
    • L
      Merge tag 'ia64-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux · 0e65ae09
      Linus Torvalds committed
      Pull miscellaneous Itanium patches from Tony Luck.
      
      The conflicts in arch/ia64/hp/sim/simserial.c were due to patches to
      simserial that had already been included (with lots of further cleanups)
      in the serial tree.
      
      * tag 'ia64-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
        Documentation/kernel-parameters: remove inttest parameter
        [IA64] Fix ISA IRQ trigger model and polarity setting
        [IA64] Fix a couple of warnings for EXPORT_SYMBOL
        [IA64] Check return from device_register() in cx_device_register()
        [IA64] Fix warning from machine_kexec.c
        [IA64] simserial, bail out when request_irq fails
        [IA64] hpsim, initialize chip for assigned irqs
        [IA64] simserial, include some headers
        [IA64] hpsim, fix SAL handling in fw-emu
        [IA64] genirq fixup for SGI/SN
        [IA64] disable interrupts when exiting from ia64_mca_cmc_int_handler()
    • R
      Merge branch 'mmci' into amba · bba1594d
      Russell King committed
    • L
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2fb9e96c
      Linus Torvalds committed
      Pull additional x86 fixes from Peter Anvin:
       - address a long-standing bug related to when a kernel-spawned process
         gets a signal on an i386 kernel compiled without CONFIG_VM86.
      
       - fix the newly introduced build warning in arch/x86/boot.
      
       - fix a typo in the i386 system call table which affects building some
         libcs.
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86-32: Fix endless loop when processing signals for kernel tasks
        x86, boot: Correct CFLAGS for hostprogs
        x86-32: Fix typo for mq_getsetattr in syscall table
    • L
      Merge branch 'akpm' (Andrew's patch-bomb) · 8e3ade25
      Linus Torvalds committed
      Merge second batch of patches from Andrew Morton:
       - various misc things
       - core kernel changes to prctl, exit, exec, init, etc.
       - kernel/watchdog.c updates
       - get_maintainer
       - MAINTAINERS
       - the backlight driver queue
       - core bitops code cleanups
       - the led driver queue
       - some core prio_tree work
       - checkpatch updates
       - largeish crc32 update
       - a new poll() feature for the v4l guys
       - the rtc driver queue
       - fatfs
       - ptrace
       - signals
       - kmod/usermodehelper updates
       - coredump
       - procfs updates
      
      * emailed from Andrew Morton <akpm@linux-foundation.org>: (141 commits)
        seq_file: add seq_set_overflow(), seq_overflow()
        proc-ns: use d_set_d_op() API to set dentry ops in proc_ns_instantiate().
        procfs: speed up /proc/pid/stat, statm
        procfs: add num_to_str() to speed up /proc/stat
        proc: speed up /proc/stat handling
        fs/proc/kcore.c: make get_sparsemem_vmemmap_info() static
        coredump: add VM_NODUMP, MADV_NODUMP, MADV_CLEAR_NODUMP
        coredump: remove VM_ALWAYSDUMP flag
        kmod: make __request_module() killable
        kmod: introduce call_modprobe() helper
        usermodehelper: ____call_usermodehelper() doesn't need do_exit()
        usermodehelper: kill umh_wait, renumber UMH_* constants
        usermodehelper: implement UMH_KILLABLE
        usermodehelper: introduce umh_complete(sub_info)
        usermodehelper: use UMH_WAIT_PROC consistently
        signal: zap_pid_ns_processes: s/SEND_SIG_NOINFO/SEND_SIG_FORCED/
        signal: oom_kill_task: use SEND_SIG_FORCED instead of force_sig()
        signal: cosmetic, s/from_ancestor_ns/force/ in prepare_signal() paths
        signal: give SEND_SIG_FORCED more power to beat SIGNAL_UNKILLABLE
        Hexagon: use set_current_blocked() and block_sigmask()
        ...
    • K
      seq_file: add seq_set_overflow(), seq_overflow() · e075f591
      KAMEZAWA Hiroyuki committed
      It is undocumented but a seq_file's overflow state is indicated by
      m->count == m->size.  Add seq_set_overflow() and seq_overflow() to
      set/check overflow status explicitly.
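The convention described above can be modelled in a few lines. This is a toy Python stand-in, not the kernel's `struct seq_file`: the class name `SeqBuf` and its `puts()` method are hypothetical, and only the `count == size` rule is taken from the commit message.

```python
class SeqBuf:
    """Toy stand-in for the count/size bookkeeping of a seq_file."""

    def __init__(self, size: int) -> None:
        self.size = size        # capacity of the output buffer
        self.count = 0          # bytes written so far
        self.buf = bytearray()

    def set_overflow(self) -> None:
        # Mark the buffer as overflowed by making it look exactly full.
        self.count = self.size

    def overflow(self) -> bool:
        return self.count == self.size

    def puts(self, text: bytes) -> None:
        # Accept the write only if at least one byte stays free afterwards,
        # so that count == size can only ever mean "overflowed".
        if self.count + len(text) < self.size:
            self.buf += text
            self.count += len(text)
        else:
            self.set_overflow()
```

Naming the two helpers makes the otherwise implicit convention visible at each call site instead of relying on readers to recognise the comparison.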
      
      Based on an idea from Eric Dumazet.
      
      [akpm@linux-foundation.org: tweak code comment]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • P
      proc-ns: use d_set_d_op() API to set dentry ops in proc_ns_instantiate(). · 1b26c9b3
      Pravin B Shelar committed
      The namespace cleanup path leaks a dentry which holds a reference count
      on a network namespace.  This keeps that network namespace from being
      freed when the last user goes away, leaving things like vlan devices in
      the leaked network namespace.
      
      If you use ip netns add for much real work this problem becomes apparent
      pretty quickly.  In light testing the problem hides because frequently
      you simply don't notice the leak.
      
      Use d_set_d_op() so that DCACHE_OP_* flags are set correctly.
      
      This issue exists back to 3.0.
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Reported-by: Justin Pettit <jpettit@nicira.com>
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: Jesse Gross <jesse@nicira.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • K
      procfs: speed up /proc/pid/stat, statm · bda7bad6
      KAMEZAWA Hiroyuki committed
      Process accounting applications such as top and ps visit some files under
      /proc/<pid>.  With seq_put_decimal_ull(), we can optimize /proc/<pid>/stat
      and /proc/<pid>/statm files.
      
      This patch adds
        - seq_put_decimal_ll() for signed values.
        - allow delimiter == 0.
        - convert seq_printf() to seq_put_decimal_ull/ll in /proc/stat, statm.
      
      Test result on a system with 2000+ procs.
      
      Before patch:
        [kamezawa@bluextal test]$ top -b -n 1 | wc -l
        2223
        [kamezawa@bluextal test]$ time top -b -n 1 > /dev/null
      
        real    0m0.675s
        user    0m0.044s
        sys     0m0.121s
      
        [kamezawa@bluextal test]$ time ps -elf > /dev/null
      
        real    0m0.236s
        user    0m0.056s
        sys     0m0.176s
      
      After patch:
        [kamezawa@bluextal ~]$ time top -b -n 1 > /dev/null
      
        real    0m0.657s
        user    0m0.052s
        sys     0m0.100s
      
        [kamezawa@bluextal ~]$ time ps -elf > /dev/null
      
        real    0m0.198s
        user    0m0.050s
        sys     0m0.145s
      
      Considering top, ps tend to scan /proc periodically, this will reduce cpu
      consumption by top/ps to some extent.
      
      [akpm@linux-foundation.org: checkpatch fixes]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • K
      procfs: add num_to_str() to speed up /proc/stat · 1ac101a5
      KAMEZAWA Hiroyuki committed
      == stat_check.py
      num = 0
      with open("/proc/stat") as f:
              while num < 1000 :
                      data = f.read()
                      f.seek(0, 0)
                      num = num + 1
      ==
      
      perf shows
      
          20.39%  stat_check.py  [kernel.kallsyms]    [k] format_decode
          13.41%  stat_check.py  [kernel.kallsyms]    [k] number
          12.61%  stat_check.py  [kernel.kallsyms]    [k] vsnprintf
          10.85%  stat_check.py  [kernel.kallsyms]    [k] memcpy
           4.85%  stat_check.py  [kernel.kallsyms]    [k] radix_tree_lookup
           4.43%  stat_check.py  [kernel.kallsyms]    [k] seq_printf
      
      This patch removes most of the calls to vsnprintf() by adding num_to_str()
      and seq_put_decimal_ull(), which print decimal numbers without the rich
      formatting features provided by printf().
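The idea behind a dedicated decimal converter is that it skips format-string parsing entirely. The following is a minimal Python sketch of that technique, not the kernel implementation: it borrows the name `num_to_str` from the commit message, while the signature and the "return 0 when the buffer is too small" convention are assumptions made for illustration.

```python
def num_to_str(buf: bytearray, num: int) -> int:
    """Write the decimal digits of a non-negative num into buf.

    Returns the number of bytes written, or 0 if buf is too small.
    """
    if num < 0:
        raise ValueError("num must be non-negative")
    # Collect digits least significant first.
    tmp = bytearray()
    while True:
        num, digit = divmod(num, 10)
        tmp.append(ord("0") + digit)
        if num == 0:
            break
    if len(tmp) > len(buf):
        return 0
    # Copy the digits into the caller's buffer in reading order.
    buf[:len(tmp)] = tmp[::-1]
    return len(tmp)
```

There is no width, padding, sign or base handling here, which is exactly the work that shows up as format_decode and number in the perf profile above.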
      
      On my 8cpu box.
      == Before patch ==
      [root@bluextal test]# time ./stat_check.py
      
      real    0m0.150s
      user    0m0.026s
      sys     0m0.121s
      
      == After patch ==
      [root@bluextal test]# time ./stat_check.py
      
      real    0m0.055s
      user    0m0.022s
      sys     0m0.030s
      
      [akpm@linux-foundation.org: remove incorrect comment, use less stack in num_to_str(), move comment from .h to .c, simplify seq_put_decimal_ull()]
      [andrea@betterlinux.com: avoid breaking the ABI in /proc/stat]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrea Righi <andrea@betterlinux.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Turner <pjt@google.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • E
      proc: speed up /proc/stat handling · 59a32e2c
      Eric Dumazet committed
      On a typical 16 cpus machine, "cat /proc/stat" gives more than 4096 bytes,
      and is slow :
      
        # strace -T -o /tmp/STRACE cat /proc/stat | wc -c
        5826
        # grep "cpu " /tmp/STRACE
        read(0, "cpu  1949310 19 2144714 12117253"..., 32768) = 5826 <0.001504>
      
      That's partly because show_stat() must be called twice since initial
      buffer size is too small (4096 bytes for less than 32 possible cpus)
      
      Fix this by :
      
       1) Taking into account nr_irqs in the initial buffer sizing.
      
       2) Using ksize() to allow better filling of initial buffer.
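The cost of an undersized buffer can be sketched with a small model. It assumes, for illustration only, that output fits when it is strictly smaller than the buffer and that the buffer is doubled before each retry. `show_passes` and `sized_for` are hypothetical helpers, and the constants in `sized_for` (base, per-CPU and per-IRQ allowances) and the `nr_irqs` figure are made up rather than taken from the patch. The ksize() part of the fix is not modelled.

```python
def show_passes(output_len: int, initial_size: int) -> int:
    """Count how many times the show callback runs before the output fits.

    Assumed model: the output fits only if it is strictly smaller than the
    buffer; otherwise the buffer is doubled and the callback runs again.
    """
    if initial_size <= 0:
        raise ValueError("initial_size must be positive")
    size = initial_size
    passes = 1
    while output_len >= size:
        size *= 2
        passes += 1
    return passes


def sized_for(cpus: int, nr_irqs: int, per_cpu: int = 128, per_irq: int = 2) -> int:
    # Hypothetical sizing rule: a fixed base plus room per CPU and per IRQ.
    return 1024 + per_cpu * cpus + per_irq * nr_irqs


# 5826 bytes is the output length reported above for the 16-CPU machine.
old = show_passes(5826, 4096)
new = show_passes(5826, sized_for(cpus=16, nr_irqs=2048))  # nr_irqs is made up
```

Under these assumptions `old` is 2 and `new` is 1, which matches the motivation of the patch: size the first buffer generously enough that the expensive callback runs once.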
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • D
      fs/proc/kcore.c: make get_sparsemem_vmemmap_info() static · b908243c
      Djalal Harouni committed
      get_sparsemem_vmemmap_info() is only used inside fs/proc/kcore.c
      Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
      Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • J
      coredump: add VM_NODUMP, MADV_NODUMP, MADV_CLEAR_NODUMP · accb61fe
      Jason Baron committed
      Since we no longer need the VM_ALWAYSDUMP flag, let's use the freed bit
      for 'VM_NODUMP' flag.  The idea is to add a new madvise() flag:
      MADV_DONTDUMP, which can be set by applications to specifically request
      memory regions which should not dump core.
      
      The specific application I have in mind is qemu: we can add a flag there
      that wouldn't dump all of guest memory when qemu dumps core.  This flag
      might also be useful for security sensitive apps that want to absolutely
      make sure that parts of memory are not dumped.  To clear the flag use:
      MADV_DODUMP.
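From userspace, the new advice value is passed to madvise() on the region to be excluded. The sketch below uses Python's standard `mmap` module as a stand-in for a C caller. The helper name `exclude_from_core_dump` and the `guest_ram` mapping are hypothetical, and the code deliberately degrades to a no-op where the platform or Python build does not expose `MADV_DONTDUMP`.

```python
import mmap


def exclude_from_core_dump(region: mmap.mmap) -> bool:
    """Ask the kernel to leave region out of core dumps.

    Returns False when the platform or Python build lacks MADV_DONTDUMP.
    """
    advice = getattr(mmap, "MADV_DONTDUMP", None)
    if advice is None or not hasattr(region, "madvise"):
        return False
    try:
        region.madvise(advice)
    except OSError:
        return False  # for example, a kernel that predates the flag
    return True


# Anonymous mapping standing in for a large buffer such as guest memory.
guest_ram = mmap.mmap(-1, 4 * mmap.PAGESIZE)
excluded = exclude_from_core_dump(guest_ram)
guest_ram[:5] = b"hello"  # the mapping stays fully usable either way
```

The advice only affects what ends up in a core dump; reads and writes to the region behave as before.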
      
      [akpm@linux-foundation.org: s/MADV_NODUMP/MADV_DONTDUMP/, s/MADV_CLEAR_NODUMP/MADV_DODUMP/, per Roland]
      [akpm@linux-foundation.org: fix up the architectures which broke]
      Signed-off-by: Jason Baron <jbaron@redhat.com>
      Acked-by: Roland McGrath <roland@hack.frob.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • J
      coredump: remove VM_ALWAYSDUMP flag · 909af768
      Jason Baron committed
      The motivation for this patchset was that I was looking at a way for a
      qemu-kvm process, to exclude the guest memory from its core dump, which
      can be quite large.  There are already a number of filter flags in
      /proc/<pid>/coredump_filter, however, these allow one to specify 'types'
      of kernel memory, not specific address ranges (which is needed in this
      case).
      
      Since there are no more vma flags available, the first patch eliminates
      the need for the 'VM_ALWAYSDUMP' flag.  The flag is used internally by
      the kernel to mark vdso and vsyscall pages.  However, it is simple
      enough to check if a vma covers a vdso or vsyscall page without the need
      for this flag.
      
      The second patch then replaces the 'VM_ALWAYSDUMP' flag with a new
      'VM_NODUMP' flag, which can be set by userspace using new madvise flags:
      'MADV_DONTDUMP', and unset via 'MADV_DODUMP'.  The core dump filters
      continue to work the same as before unless 'MADV_DONTDUMP' is set on the
      region.
      
      The qemu code which implements this feature is at:
      
        http://people.redhat.com/~jbaron/qemu-dump/qemu-dump.patch
      
      In my testing the qemu core dump shrank from 383MB -> 13MB with this
      patch.
      
      I also believe that the 'MADV_DONTDUMP' flag might be useful for
      security sensitive apps, which might want to select which areas are
      dumped.
      
      This patch:
      
      The VM_ALWAYSDUMP flag is currently used by the coredump code to
      indicate that a vma is part of a vsyscall or vdso section.  However, we
      can determine if a vma is in one of these sections by checking it against
      the gate_vma and checking for a non-NULL return value from
      arch_vma_name().  Thus, freeing a valuable vma bit.
      Signed-off-by: Jason Baron <jbaron@redhat.com>
      Acked-by: Roland McGrath <roland@hack.frob.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Avi Kivity <avi@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • O
      kmod: make __request_module() killable · 1cc684ab
      Oleg Nesterov committed
      As Tetsuo Handa pointed out, request_module() can stress the system
      while the oom-killed caller sleeps in TASK_UNINTERRUPTIBLE.
      
      The task T uses "almost all" memory, then it does something which
      triggers request_module().  Say, it can simply call sys_socket().  This
      in turn needs more memory and leads to OOM.  oom-killer correctly
      chooses T and kills it, but this can't help because it sleeps in
      TASK_UNINTERRUPTIBLE and after that oom-killer becomes "disabled" by the
      TIF_MEMDIE task T.
      
      Make __request_module() killable.  The only necessary change is that
      call_modprobe() should kmalloc argv and module_name, they can't live in
      the stack if we use UMH_KILLABLE.  This memory is freed via
      call_usermodehelper_freeinfo()->cleanup.
      Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • O
      kmod: introduce call_modprobe() helper · 3e63a93b
      Oleg Nesterov committed
      No functional changes.  Move the call_usermodehelper code from
      __request_module() into the new simple helper, call_modprobe().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3e63a93b
    • O
      usermodehelper: ____call_usermodehelper() doesn't need do_exit() · 5b9bd473
      Oleg Nesterov committed
      Minor cleanup.  ____call_usermodehelper() can simply return, no need to
      call do_exit() explicitly.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • O
      usermodehelper: kill umh_wait, renumber UMH_* constants · 9d944ef3
      Oleg Nesterov committed
      No functional changes.  It is not sane to use UMH_KILLABLE with enum
      umh_wait, but obviously we do not want another argument in
      call_usermodehelper_* helpers.  Kill this enum, use the plain int.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • O
      usermodehelper: implement UMH_KILLABLE · d0bd587a
      Oleg Nesterov committed
      Implement UMH_KILLABLE, which should be used along with UMH_WAIT_EXEC/PROC.
      The caller must ensure that subprocess_info->path/etc can not go away
      until call_usermodehelper_freeinfo().
      
      call_usermodehelper_exec(UMH_KILLABLE) does
      wait_for_completion_killable.  If it fails, it uses
      xchg(&sub_info->complete, NULL) to serialize with umh_complete() which
      does the same xchg() to access sub_info->complete.
      
      If call_usermodehelper_exec wins, it can safely return.  umh_complete()
      should get NULL and call call_usermodehelper_freeinfo().
      
      Otherwise we know that umh_complete() was already called, in this case
      call_usermodehelper_exec() falls back to wait_for_completion() which
      should succeed "very soon".
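The handoff described above can be modelled in userspace with C11 atomics. This is only a sketch under stated assumptions: `sub_info_model`, `umh_complete_model()` and `waiter_abandon_model()` are hypothetical names, an `int` flag stands in for the struct completion, and atomic_exchange() plays the role of xchg():

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of subprocess_info; not the kernel struct. */
struct sub_info_model {
	_Atomic(int *) complete;	/* stands in for the completion pointer */
	bool freed;			/* set when the helper side frees the info */
};

/*
 * Helper side, modelling umh_complete(): take the pointer.  If it was
 * still set, the waiter is present and gets signalled.  If it was NULL,
 * the waiter has already given up, so the helper side frees the info.
 */
static void umh_complete_model(struct sub_info_model *info)
{
	int *done = atomic_exchange(&info->complete, (int *)NULL);

	if (done)
		*done = 1;		/* models complete() */
	else
		info->freed = true;	/* models call_usermodehelper_freeinfo() */
}

/*
 * Caller side, after a killable wait was interrupted.  Returns true if
 * the caller won the race and may return immediately.  Returns false if
 * the helper already took the pointer; completion is then imminent and
 * the caller should fall back to a plain, non-killable wait.
 */
static bool waiter_abandon_model(struct sub_info_model *info)
{
	return atomic_exchange(&info->complete, (int *)NULL) != NULL;
}
```

Exactly one side observes the non-NULL pointer, so exactly one side ends up responsible for the final cleanup, whichever order the two exchanges happen in.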
      
      Note: UMH_NO_WAIT == -1 but it obviously should not be used with
      UMH_KILLABLE.  We delay the necessary cleanup to simplify
      backporting.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • O
      usermodehelper: introduce umh_complete(sub_info) · b3449922
      Oleg Nesterov committed
      Preparation.  Add the new trivial helper, umh_complete().  Currently it
      simply does complete(sub_info->complete).
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • O
      usermodehelper: use UMH_WAIT_PROC consistently · 70834d30
      Oleg Nesterov committed
      A few call_usermodehelper() callers use a hardcoded constant instead of
      the proper UMH_WAIT_PROC; fix them.
      Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Michal Januszewski <spock@gentoo.org>
      Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: James Morris <jmorris@namei.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • O
      signal: zap_pid_ns_processes: s/SEND_SIG_NOINFO/SEND_SIG_FORCED/ · a02d6fd6
      Oleg Nesterov committed
      Change zap_pid_ns_processes() to use SEND_SIG_FORCED; it is clearer
      than SEND_SIG_NOINFO, which relies on the from_ancestor_ns logic in
      send_signal().
      
      It is also more efficient if we need to kill a lot of tasks because it
      doesn't alloc sigqueue.
      
      While at it, add the __fatal_signal_pending(task) check as a minor
      optimization.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • O
      signal: oom_kill_task: use SEND_SIG_FORCED instead of force_sig() · d2d39309
      Oleg Nesterov committed
      Change oom_kill_task() to use do_send_sig_info(SEND_SIG_FORCED) instead
      of force_sig(SIGKILL).  With the recent changes we do not need force_ to
      kill the CLONE_NEWPID tasks.
      
      And this is more correct.  force_sig() can race with the exiting thread
      even if oom_kill_task() checks p->mm != NULL, while
      do_send_sig_info(group => true) kills the whole process.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • O
      signal: cosmetic, s/from_ancestor_ns/force/ in prepare_signal() paths · def8cf72
      Oleg Nesterov committed
      Cosmetic, rename the from_ancestor_ns argument in prepare_signal()
      paths.  After the previous change it doesn't match the reality.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • O
      signal: give SEND_SIG_FORCED more power to beat SIGNAL_UNKILLABLE · 629d362b
      Oleg Nesterov committed
      force_sig_info() and friends have the special semantics for synchronous
      signals, this interface should not be used if the target is not current.
      And it needs the fixes, in particular the clearing of SIGNAL_UNKILLABLE
      is not exactly right.
      
      However there are callers which have to use force_ exactly because it
      clears SIGNAL_UNKILLABLE and thus it can kill the CLONE_NEWPID tasks,
      although this is almost always wrong for various reasons.
      
      With this patch SEND_SIG_FORCED ignores SIGNAL_UNKILLABLE, like we do if
      the signal comes from the ancestor namespace.
      
      This makes the naming in prepare_signal() paths insane, fixed by the
      next cleanup.
      
      Note: this only affects SIGKILL/SIGSTOP, but this is enough for
      force_sig() abusers.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • M
      Hexagon: use set_current_blocked() and block_sigmask() · 43aca324
      Matt Fleming committed
      As described in e6fa16ab ("signal: sigprocmask() should do
      retarget_shared_pending()") the modification of current->blocked is
      incorrect as we need to check whether the signal we're about to block is
      pending in the shared queue.
      
      Also, use the new helper function introduced in commit 5e6292c0
      ("signal: add block_sigmask() for adding sigmask to current->blocked")
      which centralises the code for updating current->blocked after
      successfully delivering a signal and reduces the amount of duplicate
      code across architectures.  In the past some architectures got this code
      wrong, so using this helper function should stop that from happening
      again.
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Richard Kuo <rkuo@codeaurora.org>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • D
      ptrace: remove PTRACE_SEIZE_DEVEL bit · ee00560c
      Denys Vlasenko committed
      PTRACE_SEIZE code is tested and ready for production use, so remove the
      code which requires a special bit in the data argument to make
      PTRACE_SEIZE work.
      
      The strace team is preparing a new release of strace, and we would like to
      ship the code which uses PTRACE_SEIZE, preferably after this change goes
      into a released kernel.
      Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • D
      ptrace: renumber PTRACE_EVENT_STOP so that future new options and events can match · 5cdf389a
      Denys Vlasenko committed
      PTRACE_EVENT_foo and PTRACE_O_TRACEfoo used to match.
      
      New PTRACE_EVENT_STOP is the first event which has no corresponding
      PTRACE_O_TRACE option.  If we ever want to add another such option,
      its PTRACE_EVENT's value will collide with PTRACE_EVENT_STOP's value.
      
      This patch changes PTRACE_EVENT_STOP value to prevent this.
      
      While at it, added a comment - the one atop PTRACE_EVENT block, saying
      "Wait extended result codes for the above trace options", is not true
      for PTRACE_EVENT_STOP.
      Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • D
      ptrace: make PTRACE_SEIZE set ptrace options specified in 'data' parameter · aa9147c9
      Denys Vlasenko committed
      This can be used to close a few corner cases in strace where we get
      unwanted racy behavior after attach, but before we have a chance to set
      options (the notorious post-execve SIGTRAP comes to mind), and removes
      the need to track "did we set opts for this task" state in strace
      internals.
      
      While we are at it:
      
      Make it possible to extend SEIZE in the future with more functionality
      by passing non-zero 'addr' parameter.  To that end, error out if 'addr'
      is non-zero.  PTRACE_ATTACH did not (and still does not) have such
      check, and users (strace) do pass garbage there...  let's avoid
      repeating this mistake with SEIZE.
      
      Set all task->ptrace bits in one operation - before this change, we were
      adding PT_SEIZED and PT_PTRACE_CAP with task->ptrace |= BIT ops.  This
      was probably ok (not a bug), but let's be on the safe side.
      
      Changes since v2: use (unsigned long) casts instead of (long) ones, move
      PTRACE_SEIZE_DEVEL-related code to separate lines of code.
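The argument checks and the single-assignment of the task flags might look roughly like the following userspace model. The `SK_`-prefixed constants and `sk_seize_flags()` are hypothetical names whose values are assumed for illustration, and the -EIO error code is likewise an assumption, since the message only says the call errors out:

```c
#include <errno.h>

/* Hypothetical mirrors of kernel constants; values assumed for illustration. */
#define SK_PT_PTRACED		0x00000001u
#define SK_PT_SEIZED		0x00010000u
#define SK_O_MASK		0x0000007fUL	/* all option bits known to this model */
#define SK_OPT_FLAG_SHIFT	3

/*
 * Validate the SEIZE arguments, then compute every flag bit in one
 * expression so the caller can store the result with a single assignment.
 * On failure *flags_out is left untouched.
 */
static int sk_seize_flags(unsigned long addr, unsigned long data,
			  unsigned int *flags_out)
{
	if (addr != 0)
		return -EIO;	/* reserved for future extensions */
	if (data & ~SK_O_MASK)
		return -EIO;	/* unknown option bits */

	*flags_out = SK_PT_PTRACED | SK_PT_SEIZED |
		     ((unsigned int)data << SK_OPT_FLAG_SHIFT);
	return 0;
}
```

Rejecting a non-zero 'addr' today is what keeps the parameter usable later: no existing caller can come to depend on garbage being accepted there.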
      Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Pedro Alves <palves@redhat.com>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • D
      ptrace: simplify PTRACE_foo constants and PTRACE_SETOPTIONS code · 86b6c1f3
      Denys Vlasenko committed
      Exchange PT_TRACESYSGOOD and PT_PTRACE_CAP bit positions, which makes
      PT_option bits contiguous and therefore makes code in
      ptrace_setoptions() much simpler.
      
      Every PTRACE_O_TRACEevent is defined to (1 << PTRACE_EVENT_event)
      instead of using explicit numeric constants, to ensure we don't mess up
      relationship between bit positions and event ids.
      
      PT_EVENT_FLAG_SHIFT was not particularly useful, PT_OPT_FLAG_SHIFT with
      value of PT_EVENT_FLAG_SHIFT-1 is easier to use.
      
      PT_TRACE_MASK constant is nuked; its only use is replaced by
      (PTRACE_O_MASK << PT_OPT_FLAG_SHIFT).
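The relationship between event ids, option bits and task flag bits can be sketched as below. All `SK_`-prefixed names are hypothetical mirrors of the kernel constants, and the numeric values (event ids 1 to 6, a shift of 3, a 0x7f mask) are assumptions made for illustration rather than values quoted from this commit:

```c
/* Hypothetical mirrors of the PTRACE_EVENT_* ids; values assumed. */
enum {
	SK_EVENT_FORK		= 1,
	SK_EVENT_VFORK		= 2,
	SK_EVENT_CLONE		= 3,
	SK_EVENT_EXEC		= 4,
	SK_EVENT_VFORK_DONE	= 5,
	SK_EVENT_EXIT		= 6,
};

/* Each option bit is derived from its event id instead of a magic number. */
#define SK_O_TRACESYSGOOD	1u
#define SK_O_TRACEFORK		(1u << SK_EVENT_FORK)
#define SK_O_TRACEVFORK		(1u << SK_EVENT_VFORK)
#define SK_O_TRACECLONE		(1u << SK_EVENT_CLONE)
#define SK_O_TRACEEXEC		(1u << SK_EVENT_EXEC)
#define SK_O_TRACEVFORKDONE	(1u << SK_EVENT_VFORK_DONE)
#define SK_O_TRACEEXIT		(1u << SK_EVENT_EXIT)
#define SK_O_MASK		0x7fu

/* Option bits sit contiguously in the task flags, starting at this shift. */
#define SK_OPT_FLAG_SHIFT	3
#define SK_EVENT_FLAG(event)	(1u << (SK_OPT_FLAG_SHIFT + (event)))

/* With contiguous bits, translating options to task flags is one shift. */
static unsigned int sk_options_to_flags(unsigned int opts)
{
	return opts << SK_OPT_FLAG_SHIFT;
}
```

Under these assumptions the mask of all option bits inside the task flags is simply `SK_O_MASK << SK_OPT_FLAG_SHIFT`, which is the role the removed PT_TRACE_MASK used to play.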
      Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • D
      ptrace: don't modify flags on PTRACE_SETOPTIONS failure · 8c5cf9e5
      Denys Vlasenko committed
      On ptrace(PTRACE_SETOPTIONS, pid, 0, <opts>), we used to set those
      option bits which are known, and then fail with -EINVAL if there are
      some unknown bits in <opts>.
      
      This is inconsistent with typical error handling, which does not change
      any state if input is invalid.
      
      This patch changes PTRACE_SETOPTIONS behavior so that in this case, we
      return -EINVAL and don't change any bits in task->ptrace.
      
      It's very unlikely that there is userspace code in the wild which will
      be affected by this change: it should have the form
      
          ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_BOGUSOPT)
      
      where PTRACE_O_BOGUSOPT is a constant unknown to the kernel.  But kernel
      headers, naturally, don't contain any PTRACE_O_BOGUSOPTs, thus the only
      way userspace can use one is if it defines one itself.  I can't see why
      anyone would do such a thing deliberately.
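The validate-then-commit behavior described in this message can be sketched as a small userspace model. `sk_setoptions()` and the `SK_` constants are hypothetical names with assumed values, not the kernel's own code:

```c
#include <errno.h>

/* Hypothetical mirrors of kernel constants; values assumed for illustration. */
#define SK_O_MASK		0x7fUL	/* option bits known to this model */
#define SK_OPT_FLAG_SHIFT	3

/*
 * Reject unknown bits before touching any state.  Only when the whole
 * request is valid are the old option bits cleared and the new ones set.
 */
static int sk_setoptions(unsigned int *task_flags, unsigned long data)
{
	if (data & ~SK_O_MASK)
		return -EINVAL;	/* *task_flags is left exactly as it was */

	*task_flags &= ~(unsigned int)(SK_O_MASK << SK_OPT_FLAG_SHIFT);
	*task_flags |= (unsigned int)(data << SK_OPT_FLAG_SHIFT);
	return 0;
}
```

The old behavior corresponds to doing the two flag updates first and the validity check last, which is how a failed call could still leave some bits changed.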
      Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • O
      ptrace: don't send SIGTRAP on exec if SEIZED · b1845ff5
      Oleg Nesterov committed
      ptrace_event(PTRACE_EVENT_EXEC) sends SIGTRAP if PT_TRACE_EXEC is not
      set.  This is because this SIGTRAP predates the PTRACE_O_TRACEEXEC option;
      we do not need or want it with PT_SEIZED, which can set the options during
      attach.
      Suggested-by: Pedro Alves <palves@redhat.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Chris Evans <scarybeasts@gmail.com>
      Cc: Indan Zupancic <indan@nul.nu>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • O
      ptrace: the killed tracee should not enter the syscall · 15cab952
      Oleg Nesterov committed
      Another old/known problem.  If the tracee is killed after it reports
      syscall_entry, it starts the syscall and the debugger can't control this.
      This confuses users and creates security problems for
      ptrace jailers.
      
      Change tracehook_report_syscall_entry() to return non-zero if killed,
      this instructs syscall_trace_enter() to abort the syscall.
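A simplified model of that control flow, assuming a hypothetical `tracee_model` struct and a boolean that simulates a SIGKILL arriving while the tracee is stopped for the debugger. The function names only echo the kernel ones; this is not the kernel implementation:

```c
#include <stdbool.h>

/* Hypothetical model of the tracee's state. */
struct tracee_model {
	bool traced;
	bool fatal_signal_pending;
	int syscalls_run;	/* counts syscalls that actually started */
};

/*
 * Models tracehook_report_syscall_entry(): stop for the debugger, then
 * return non-zero if a fatal signal arrived during that stop.
 */
static int report_syscall_entry_model(struct tracee_model *t,
				      bool killed_during_stop)
{
	if (!t->traced)
		return 0;
	/* The tracee would sleep here until the debugger resumes it. */
	if (killed_during_stop)
		t->fatal_signal_pending = true;
	return t->fatal_signal_pending ? 1 : 0;
}

/*
 * Models syscall_trace_enter(): a non-zero report aborts the syscall
 * (-1 here) instead of letting a dying tracee start it.
 */
static long syscall_enter_model(struct tracee_model *t,
				bool killed_during_stop)
{
	if (report_syscall_entry_model(t, killed_during_stop))
		return -1;
	t->syscalls_run++;
	return 0;
}
```

The point for a ptrace-based jailer is that the syscall counter never advances once the kill has been observed, so a killed tracee cannot slip one last unchecked syscall past the debugger.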
      Reported-by: Chris Evans <scarybeasts@gmail.com>
      Tested-by: Indan Zupancic <indan@nul.nu>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>