1. 31 7月, 2012 40 次提交
    • L
      Merge branch 'akpm' (Andrew's patch-bomb) · 27c1ee3f
      Linus Torvalds 提交于
      Merge Andrew's first set of patches:
       "Non-MM patches:
      
         - lots of misc bits
      
         - tree-wide have_clk() cleanups
      
         - quite a lot of printk tweaks.  I draw your attention to "printk:
           convert the format for KERN_<LEVEL> to a 2 byte pattern" which
           looks a bit scary.  But afaict it's solid.
      
         - backlight updates
      
         - lib/ feature work (notably the addition and use of memweight())
      
         - checkpatch updates
      
         - rtc updates
      
         - nilfs updates
      
         - fatfs updates (partial, still waiting for acks)
      
         - kdump, proc, fork, IPC, sysctl, taskstats, pps, etc
      
         - new fault-injection feature work"
      
      * Merge emailed patches from Andrew Morton <akpm@linux-foundation.org>: (128 commits)
        drivers/misc/lkdtm.c: fix missing allocation failure check
        lib/scatterlist: do not re-write gfp_flags in __sg_alloc_table()
        fault-injection: add tool to run command with failslab or fail_page_alloc
        fault-injection: add selftests for cpu and memory hotplug
        powerpc: pSeries reconfig notifier error injection module
        memory: memory notifier error injection module
        PM: PM notifier error injection module
        cpu: rewrite cpu-notifier-error-inject module
        fault-injection: notifier error injection
        c/r: fcntl: add F_GETOWNER_UIDS option
        resource: make sure requested range is included in the root range
        include/linux/aio.h: cpp->C conversions
        fs: cachefiles: add support for large files in filesystem caching
        pps: return PTR_ERR on error in device_create
        taskstats: check nla_reserve() return
        sysctl: suppress kmemleak messages
        ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION
        ipc: compat: use signed size_t types for msgsnd and msgrcv
        ipc: allow compat IPC version field parsing if !ARCH_WANT_OLD_COMPAT_IPC
        ipc: add COMPAT_SHMLBA support
        ...
      27c1ee3f
    • A
      086ff4b3
    • M
      lib/scatterlist: do not re-write gfp_flags in __sg_alloc_table() · e04f2283
      Mandeep Singh Baines 提交于
      We are seeing a lot of sg_alloc_table allocation failures using the new
      drm prime infrastructure.  We isolated the cause to code in
      __sg_alloc_table that was re-writing the gfp_flags.
      
      There is a comment in the code that suggest that there is an assumption
      about the allocation coming from a memory pool.  This was likely true
      when sg lists were primarily used for disk I/O.
      Signed-off-by: NMandeep Singh Baines <msb@chromium.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Cong Wang <amwang@redhat.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Rob Clark <rob.clark@linaro.org>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: Inki Dae <inki.dae@samsung.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Sonny Rao <sonnyrao@chromium.org>
      Cc: Olof Johansson <olofj@chromium.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e04f2283
    • A
      fault-injection: add tool to run command with failslab or fail_page_alloc · c24aa64d
      Akinobu Mita 提交于
      This adds tools/testing/fault-injection/failcmd.sh to run a command while
      injecting slab/page allocation failures via fault injection.
      
      Example:
      
      Run a command "make -C tools/testing/selftests/ run_tests" with
      injecting slab allocation failure.
      
      	# ./tools/testing/fault-injection/failcmd.sh \
      		-- make -C tools/testing/selftests/ run_tests
      
      Same as above except to specify 100 times failures at most instead of
      one time at most by default.
      
      	# ./tools/testing/fault-injection/failcmd.sh --times=100 \
      		-- make -C tools/testing/selftests/ run_tests
      
      Same as above except to inject page allocation failure instead of slab
      allocation failure.
      
      	# env FAILCMD_TYPE=fail_page_alloc \
      		./tools/testing/fault-injection/failcmd.sh --times=100 \
      		-- make -C tools/testing/selftests/ run_tests
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c24aa64d
    • A
      fault-injection: add selftests for cpu and memory hotplug · d89dffa9
      Akinobu Mita 提交于
      This adds two selftests
      
      * tools/testing/selftests/cpu-hotplug/on-off-test.sh is testing script
      for CPU hotplug
      
      1. Online all hot-pluggable CPUs
      2. Offline all hot-pluggable CPUs
      3. Online all hot-pluggable CPUs again
      4. Exit if cpu-notifier-error-inject.ko is not available
      5. Offline all hot-pluggable CPUs in preparation for testing
      6. Test CPU hot-add error handling by injecting notifier errors
      7. Online all hot-pluggable CPUs in preparation for testing
      8. Test CPU hot-remove error handling by injecting notifier errors
      
      * tools/testing/selftests/memory-hotplug/on-off-test.sh is doing the
      similar thing for memory hotplug.
      
      1. Online all hot-pluggable memory
      2. Offline 10% of hot-pluggable memory
      3. Online all hot-pluggable memory again
      4. Exit if memory-notifier-error-inject.ko is not available
      5. Offline 10% of hot-pluggable memory in preparation for testing
      6. Test memory hot-add error handling by injecting notifier errors
      7. Online all hot-pluggable memory in preparation for testing
      8. Test memory hot-remove error handling by injecting notifier errors
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Suggested-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Greg KH <greg@kroah.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Cc: Dave Jones <davej@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d89dffa9
    • A
      powerpc: pSeries reconfig notifier error injection module · 08dfb4dd
      Akinobu Mita 提交于
      This provides the ability to inject artifical errors to pSeries reconfig
      notifier chain callbacks.  It is controlled through debugfs interface
      under /sys/kernel/debug/notifier-error-inject/pSeries-reconfig
      
      If the notifier call chain should be failed with some events
      notified, write the error code to "actions/<notifier event>/error".
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Greg KH <greg@kroah.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Cc: Dave Jones <davej@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      08dfb4dd
    • A
      memory: memory notifier error injection module · 9579f5bd
      Akinobu Mita 提交于
      This provides the ability to inject artifical errors to memory hotplug
      notifier chain callbacks.  It is controlled through debugfs interface
      under /sys/kernel/debug/notifier-error-inject/memory
      
      If the notifier call chain should be failed with some events notified,
      write the error code to "actions/<notifier event>/error".
      
      Example: Inject memory hotplug offline error (-12 == -ENOMEM)
      
      	# cd /sys/kernel/debug/notifier-error-inject/memory
      	# echo -12 > actions/MEM_GOING_OFFLINE/error
      	# echo offline > /sys/devices/system/memory/memoryXXX/state
      	bash: echo: write error: Cannot allocate memory
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Greg KH <greg@kroah.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Cc: Dave Jones <davej@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9579f5bd
    • A
      PM: PM notifier error injection module · 048b9c35
      Akinobu Mita 提交于
      This provides the ability to inject artifical errors to PM notifier chain
      callbacks.  It is controlled through debugfs interface under
      /sys/kernel/debug/notifier-error-inject/pm
      
      Each of the files in "error" directory represents an event which can be
      failed and contains the error code.  If the notifier call chain should be
      failed with some events notified, write the error code to the files.
      
      If the notifier call chain should be failed with some events notified,
      write the error code to "actions/<notifier event>/error".
      
      Example: Inject PM suspend error (-12 = -ENOMEM)
      
      	# cd /sys/kernel/debug/notifier-error-inject/pm
      	# echo -12 > actions/PM_SUSPEND_PREPARE/error
      	# echo mem > /sys/power/state
      	bash: echo: write error: Cannot allocate memory
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Acked-by: N"Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Greg KH <greg@kroah.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Cc: Dave Jones <davej@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      048b9c35
    • A
      cpu: rewrite cpu-notifier-error-inject module · f5a9f52e
      Akinobu Mita 提交于
      Rewrite existing cpu-notifier-error-inject module to use debugfs based new
      framework.
      
      This change removes cpu_up_prepare_error and cpu_down_prepare_error module
      parameters which were used to specify error code to be injected.  We could
      keep these module parameters for backward compatibility by module_param_cb
      but it seems overkill for this module.
      
      This provides the ability to inject artifical errors to CPU notifier chain
      callbacks.  It is controlled through debugfs interface under
      /sys/kernel/debug/notifier-error-inject/cpu
      
      If the notifier call chain should be failed with some events notified,
      write the error code to "actions/<notifier event>/error".
      
      Example1: inject CPU offline error (-1 == -EPERM)
      
      	# cd /sys/kernel/debug/notifier-error-inject/cpu
      	# echo -1 > actions/CPU_DOWN_PREPARE/error
      	# echo 0 > /sys/devices/system/cpu/cpu1/online
      	bash: echo: write error: Operation not permitted
      
      Example2: inject CPU online error (-2 == -ENOENT)
      
      	# cd /sys/kernel/debug/notifier-error-inject/cpu
      	# echo -2 > actions/CPU_UP_PREPARE/error
      	# echo 1 > /sys/devices/system/cpu/cpu1/online
      	bash: echo: write error: No such file or directory
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Greg KH <greg@kroah.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Cc: Dave Jones <davej@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f5a9f52e
    • A
      fault-injection: notifier error injection · 8d438288
      Akinobu Mita 提交于
      This patchset provides kernel modules that can be used to test the error
      handling of notifier call chain failures by injecting artifical errors to
      the following notifier chain callbacks.
      
       * CPU notifier
       * PM notifier
       * memory hotplug notifier
       * powerpc pSeries reconfig notifier
      
      Example: Inject CPU offline error (-1 == -EPERM)
      
        # cd /sys/kernel/debug/notifier-error-inject/cpu
        # echo -1 > actions/CPU_DOWN_PREPARE/error
        # echo 0 > /sys/devices/system/cpu/cpu1/online
        bash: echo: write error: Operation not permitted
      
      The patchset also adds cpu and memory hotplug tests to
      tools/testing/selftests These tests first do simple online and offline
      test and then do fault injection tests if notifier error injection
      module is available.
      
      This patch:
      
      The notifier error injection provides the ability to inject artifical
      errors to specified notifier chain callbacks.  It is useful to test the
      error handling of notifier call chain failures.
      
      This adds common basic functions to define which type of events can be
      fail and to initialize the debugfs interface to control what error code
      should be returned and which event should be failed.
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Greg KH <greg@kroah.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Cc: Dave Jones <davej@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8d438288
    • C
      c/r: fcntl: add F_GETOWNER_UIDS option · 1d151c33
      Cyrill Gorcunov 提交于
      When we restore file descriptors we would like them to look exactly as
      they were at dumping time.
      
      With help of fcntl it's almost possible, the missing snippet is file
      owners UIDs.
      
      To be able to read their values the F_GETOWNER_UIDS is introduced.
      
      This option is valid iif CONFIG_CHECKPOINT_RESTORE is turned on, otherwise
      returning -EINVAL.
      Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1d151c33
    • O
      resource: make sure requested range is included in the root range · 65fed8f6
      Octavian Purdila 提交于
      When the requested range is outside of the root range the logic in
      __reserve_region_with_split will cause an infinite recursion which will
      overflow the stack as seen in the warning bellow.
      
      This particular stack overflow was caused by requesting the
      (100000000-107ffffff) range while the root range was (0-ffffffff).  In
      this case __request_resource would return the whole root range as
      conflict range (i.e.  0-ffffffff).  Then, the logic in
      __reserve_region_with_split would continue the recursion requesting the
      new range as (conflict->end+1, end) which incidentally in this case
      equals the originally requested range.
      
      This patch aborts looking for an usable range when the request does not
      intersect with the root range.  When the request partially overlaps with
      the root range, it ajust the request to fall in the root range and then
      continues with the new request.
      
      When the request is modified or aborted errors and a stack trace are
      logged to allow catching the errors in the upper layers.
      
      [    5.968374] WARNING: at kernel/sched.c:4129 sub_preempt_count+0x63/0x89()
      [    5.975150] Modules linked in:
      [    5.978184] Pid: 1, comm: swapper Not tainted 3.0.22-mid27-00004-gb72c817 #46
      [    5.985324] Call Trace:
      [    5.987759]  [<c1039dfc>] ? console_unlock+0x17b/0x18d
      [    5.992891]  [<c1039620>] warn_slowpath_common+0x48/0x5d
      [    5.998194]  [<c1031758>] ? sub_preempt_count+0x63/0x89
      [    6.003412]  [<c1039644>] warn_slowpath_null+0xf/0x13
      [    6.008453]  [<c1031758>] sub_preempt_count+0x63/0x89
      [    6.013499]  [<c14d60c4>] _raw_spin_unlock+0x27/0x3f
      [    6.018453]  [<c10c6349>] add_partial+0x36/0x3b
      [    6.022973]  [<c10c7c0a>] deactivate_slab+0x96/0xb4
      [    6.027842]  [<c14cf9d9>] __slab_alloc.isra.54.constprop.63+0x204/0x241
      [    6.034456]  [<c103f78f>] ? kzalloc.constprop.5+0x29/0x38
      [    6.039842]  [<c103f78f>] ? kzalloc.constprop.5+0x29/0x38
      [    6.045232]  [<c10c7dc9>] kmem_cache_alloc_trace+0x51/0xb0
      [    6.050710]  [<c103f78f>] ? kzalloc.constprop.5+0x29/0x38
      [    6.056100]  [<c103f78f>] kzalloc.constprop.5+0x29/0x38
      [    6.061320]  [<c17b45e9>] __reserve_region_with_split+0x1c/0xd1
      [    6.067230]  [<c17b4693>] __reserve_region_with_split+0xc6/0xd1
      ...
      [    7.179057]  [<c17b4693>] __reserve_region_with_split+0xc6/0xd1
      [    7.184970]  [<c17b4779>] reserve_region_with_split+0x30/0x42
      [    7.190709]  [<c17a8ebf>] e820_reserve_resources_late+0xd1/0xe9
      [    7.196623]  [<c17c9526>] pcibios_resource_survey+0x23/0x2a
      [    7.202184]  [<c17cad8a>] pcibios_init+0x23/0x35
      [    7.206789]  [<c17ca574>] pci_subsys_init+0x3f/0x44
      [    7.211659]  [<c1002088>] do_one_initcall+0x72/0x122
      [    7.216615]  [<c17ca535>] ? pci_legacy_init+0x3d/0x3d
      [    7.221659]  [<c17a27ff>] kernel_init+0xa6/0x118
      [    7.226265]  [<c17a2759>] ? start_kernel+0x334/0x334
      [    7.231223]  [<c14d7482>] kernel_thread_helper+0x6/0x10
      Signed-off-by: NOctavian Purdila <octavian.purdila@intel.com>
      Signed-off-by: NRam Pai <linuxram@us.ibm.com>
      Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      65fed8f6
    • A
      include/linux/aio.h: cpp->C conversions · f7e1becb
      Andrew Morton 提交于
      Convert init_sync_kiocb() from a nasty macro into a nice C function.  The
      struct assignment trick takes care of zeroing all unmentioned fields.
      Shrinks fs/read_write.o's .text from 9857 bytes to 9714.
      
      Also demacroize is_sync_kiocb() and aio_ring_avail().  The latter fixes an
      arg-referenced-multiple-times hand grenade.
      
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Acked-by: NJeff Moyer <jmoyer@redhat.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f7e1becb
    • J
    • E
      pps: return PTR_ERR on error in device_create · 668f06b9
      Emil Goode 提交于
      We should return PTR_ERR if the call to the device_create function fails.
      Without this patch we instead return the value from a successful call to
      cdev_add if the call to device_create fails.
      Signed-off-by: NEmil Goode <emilgoode@gmail.com>
      Acked-by: NDevendra Naga <devendra.aaru@gmail.com>
      Cc: Alexander Gordeev <lasaine@lvk.cs.msu.su>
      Cc: Rodolfo Giometti <giometti@enneenne.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      668f06b9
    • A
      taskstats: check nla_reserve() return · 25353b33
      Alan Cox 提交于
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=44621
      
      Reported-by: <rucsoftsec@gmail.com>
      Signed-off-by: NAlan Cox <alan@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      25353b33
    • S
      sysctl: suppress kmemleak messages · fd4b616b
      Steven Rostedt 提交于
      register_sysctl_table() is a strange function, as it makes internal
      allocations (a header) to register a sysctl_table.  This header is a
      handle to the table that is created, and can be used to unregister the
      table.  But if the table is permanent and never unregistered, the header
      acts the same as a static variable.
      
      Unfortunately, this allocation of memory that is never expected to be
      freed fools kmemleak in thinking that we have leaked memory.  For those
      sysctl tables that are never unregistered, and have no pointer referencing
      them, kmemleak will think that these are memory leaks:
      
      unreferenced object 0xffff880079fb9d40 (size 192):
        comm "swapper/0", pid 0, jiffies 4294667316 (age 12614.152s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff8146b590>] kmemleak_alloc+0x73/0x98
          [<ffffffff8110a935>] kmemleak_alloc_recursive.constprop.42+0x16/0x18
          [<ffffffff8110b852>] __kmalloc+0x107/0x153
          [<ffffffff8116fa72>] kzalloc.constprop.8+0xe/0x10
          [<ffffffff811703c9>] __register_sysctl_paths+0xe1/0x160
          [<ffffffff81170463>] register_sysctl_paths+0x1b/0x1d
          [<ffffffff8117047d>] register_sysctl_table+0x18/0x1a
          [<ffffffff81afb0a1>] sysctl_init+0x10/0x14
          [<ffffffff81b05a6f>] proc_sys_init+0x2f/0x31
          [<ffffffff81b0584c>] proc_root_init+0xa5/0xa7
          [<ffffffff81ae5b7e>] start_kernel+0x3d0/0x40a
          [<ffffffff81ae52a7>] x86_64_start_reservations+0xae/0xb2
          [<ffffffff81ae53ad>] x86_64_start_kernel+0x102/0x111
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      The sysctl_base_table used by sysctl itself is one such instance that
      registers the table to never be unregistered.
      
      Use kmemleak_not_leak() to suppress the kmemleak false positive.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fd4b616b
    • W
      ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION · c1d7e01d
      Will Deacon 提交于
      Rather than #define the options manually in the architecture code, add
      Kconfig options for them and select them there instead.  This also allows
      us to select the compat IPC version parsing automatically for platforms
      using the old compat IPC interface.
      Reported-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c1d7e01d
    • W
      ipc: compat: use signed size_t types for msgsnd and msgrcv · 05ba3f1a
      Will Deacon 提交于
      The msgsnd and msgrcv system calls use size_t to represent the size of the
      message being transferred.  POSIX states that values of msgsz greater than
      SSIZE_MAX cause the result to be implementation-defined.  On Linux, this
      equates to returning -EINVAL if (long) msgsz < 0.
      
      For compat tasks where !CONFIG_ARCH_WANT_OLD_COMPAT_IPC and compat_size_t
      is smaller than size_t, negative size values passed from userspace will be
      interpreted as positive values by do_msg{rcv,snd} and will fail to exit
      early with -EINVAL.
      
      This patch changes the compat prototypes for msg{rcv,snd} so that the
      message size is represented as a compat_ssize_t, which we cast to the
      native ssize_t type for the core IPC code.
      
      Cc: Arnd Bergmann <arnd@arndb.de>
      Acked-by: NChris Metcalf <cmetcalf@tilera.com>
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      05ba3f1a
    • W
      ipc: allow compat IPC version field parsing if !ARCH_WANT_OLD_COMPAT_IPC · b610c04c
      Will Deacon 提交于
      Commit 48b25c43 ("ipc: provide generic compat versions of IPC
      syscalls") added a new ARCH_WANT_OLD_COMPAT_IPC config option for
      architectures to select if their compat target requires the old IPC
      syscall interface.
      
      For architectures (such as AArch64) that do not require the internal
      calling conventions provided by this option, but have a compat target
      where the C library passes the IPC_64 flag explicitly,
      compat_ipc_parse_version no longer strips out the flag before calling
      the native system call implementation, resulting in unknown SHM/IPC
      commands and -EINVAL being returned to userspace.
      
      This patch separates the selection of the internal calling conventions
      for the IPC syscalls from the version parsing, allowing architectures to
      select __ARCH_WANT_COMPAT_IPC_PARSE_VERSION if they want to use version
      parsing whilst retaining the newer syscall calling conventions.
      Acked-by: NChris Metcalf <cmetcalf@tilera.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b610c04c
    • W
      ipc: add COMPAT_SHMLBA support · 079a96ae
      Will Deacon 提交于
      If the SHMLBA definition for a native task differs from the definition for
      a compat task, the do_shmat() function would need to handle both.
      
      This patch introduces COMPAT_SHMLBA, which is used by the compat shmat
      syscall when calling the ipc code and allows architectures such as AArch64
      (where the native SHMLBA is 64k but the compat (AArch32) definition is
      16k) to provide the correct semantics for compat IPC system calls.
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      079a96ae
    • V
      kdump: append newline to the last lien of vmcoreinfo note · 63dca8d5
      Vivek Goyal 提交于
      The last line of vmcoreinfo note does not end with \n.  Parsing all the
      lines in note becomes easier if all lines end with \n instead of trying to
      special case the last line.
      
      I know at least one tool, vmcore-dmesg in kexec-tools tree which made the
      assumption that all lines end with \n.  I think it is a good idea to fix
      it.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      63dca8d5
    • A
      fork: fix error handling in dup_task() · f19b9f74
      Akinobu Mita 提交于
      The function dup_task() may fail at the following function calls in the
      following order.
      
      0) alloc_task_struct_node()
      1) alloc_thread_info_node()
      2) arch_dup_task_struct()
      
      Error by 0) is not a matter, it can just return.  But error by 1) requires
      releasing task_struct allocated by 0) before it returns.  Likewise, error
      by 2) requires releasing task_struct and thread_info allocated by 0) and
      1).
      
      The existing error handling calls free_task_struct() and
      free_thread_info() which do not only release task_struct and thread_info,
      but also call architecture specific arch_release_task_struct() and
      arch_release_thread_info().
      
      The problem is that task_struct and thread_info are not fully initialized
      yet at this point, but arch_release_task_struct() and
      arch_release_thread_info() are called with them.
      
      For example, x86 defines its own arch_release_task_struct() that releases
      a task_xstate.  If alloc_thread_info_node() fails in dup_task(),
      arch_release_task_struct() is called with task_struct which is just
      allocated and filled with garbage in this error handling.
      
      This actually happened with tools/testing/fault-injection/failcmd.sh
      
      	# env FAILCMD_TYPE=fail_page_alloc \
      		./tools/testing/fault-injection/failcmd.sh --times=100 \
      		--min-order=0 --ignore-gfp-wait=0 \
      		-- make -C tools/testing/selftests/ run_tests
      
      In order to fix this issue, make free_{task_struct,thread_info}() not to
      call arch_release_{task_struct,thread_info}() and call
      arch_release_{task_struct,thread_info}() implicitly where needed.
      
      Default arch_release_task_struct() and arch_release_thread_info() are
      defined as empty by default.  So this change only affects the
      architectures which implement their own arch_release_task_struct() or
      arch_release_thread_info() as listed below.
      
      arch_release_task_struct(): x86, sh
      arch_release_thread_info(): mn10300, tile
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Salman Qazi <sqazi@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f19b9f74
    • A
      revert "sched: Fix fork() error path to not crash" · 87bec58a
      Andrew Morton 提交于
      To make way for "fork: fix error handling in dup_task()", which fixes the
      errors more completely.
      
      Cc: Salman Qazi <sqazi@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Akinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      87bec58a
    • H
      fork: use vma_pages() to simplify the code · b2412b7f
      Huang Shijie 提交于
      The current code can be replaced by vma_pages().  So use it to simplify
      the code.
      
      [akpm@linux-foundation.org: initialise `len' at its definition site]
      Signed-off-by: NHuang Shijie <shijie8@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b2412b7f
    • D
      proc: do not allow negative offsets on /proc/<pid>/environ · bc452b4b
      Djalal Harouni 提交于
      __mem_open() which is called by both /proc/<pid>/environ and
      /proc/<pid>/mem ->open() handlers will allow the use of negative offsets.
      /proc/<pid>/mem has negative offsets but not /proc/<pid>/environ.
      
      Clean this by moving the 'force FMODE_UNSIGNED_OFFSET flag' to mem_open()
      to allow negative offsets only on /proc/<pid>/mem.
      Signed-off-by: NDjalal Harouni <tixxdz@opendz.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Brad Spengler <spender@grsecurity.net>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bc452b4b
    • D
      proc: environ_read() make sure offset points to environment address range · e8905ec2
      Djalal Harouni 提交于
      Currently the following offset and environment address range check in
      environ_read() of /proc/<pid>/environ is buggy:
      
        int this_len = mm->env_end - (mm->env_start + src);
        if (this_len <= 0)
          break;
      
      Large or negative offsets on /proc/<pid>/environ converted to 'unsigned
      long' may pass this check since '(mm->env_start + src)' can overflow and
      'this_len' will be positive.
      
      This can turn /proc/<pid>/environ to act like /proc/<pid>/mem since
      (mm->env_start + src) will point and read from another VMA.
      
      There are two fixes here plus some code cleaning:
      
      1) Fix the overflow by checking if the offset that was converted to
         unsigned long will always point to the [mm->env_start, mm->env_end]
         address range.
      
      2) Remove the truncation that was made to the result of the check,
         storing the result in 'int this_len' will alter its value and we can
         not depend on it.
      
      For kernels that have commit b409e578 ("proc: clean up
      /proc/<pid>/environ handling") which adds the appropriate ptrace check and
      saves the 'mm' at ->open() time, this is not a security issue.
      
      This patch is taken from the grsecurity patch since it was just made
      available.
      Signed-off-by: NDjalal Harouni <tixxdz@opendz.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Brad Spengler <spender@grsecurity.net>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e8905ec2
    • J
      coredump: fix wrong comments on core limits of pipe coredump case · 108ceeb0
      Jovi Zhang 提交于
      In commit 898b374a ("exec: replace call_usermodehelper_pipe with use
      of umh init function and resolve limit"), the core limits recursive
      check value was changed from 0 to 1, but the corresponding comments were
      not updated.
      Signed-off-by: NJovi Zhang <bookjovi@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      108ceeb0
    • T
      kmod: avoid deadlock from recursive kmod call · 0f20784d
      Tetsuo Handa 提交于
      The system deadlocks (at least since 2.6.10) when
      call_usermodehelper(UMH_WAIT_EXEC) request triggers
      call_usermodehelper(UMH_WAIT_PROC) request.
      
      This is because "khelper thread is waiting for the worker thread at
      wait_for_completion() in do_fork() since the worker thread was created
      with CLONE_VFORK flag" and "the worker thread cannot call complete()
      because do_execve() is blocked at UMH_WAIT_PROC request" and "the khelper
      thread cannot start processing UMH_WAIT_PROC request because the khelper
      thread is waiting for the worker thread at wait_for_completion() in
      do_fork()".
      
      The easiest example to observe this deadlock is to use a corrupted
      /sbin/hotplug binary (like shown below).
      
        # : > /tmp/dummy
        # chmod 755 /tmp/dummy
        # echo /tmp/dummy > /proc/sys/kernel/hotplug
        # modprobe whatever
      
      call_usermodehelper("/tmp/dummy", UMH_WAIT_EXEC) is called from
      kobject_uevent_env() in lib/kobject_uevent.c upon loading/unloading a
      module.  do_execve("/tmp/dummy") triggers a call to
      request_module("binfmt-0000") from search_binary_handler() which in turn
      calls call_usermodehelper(UMH_WAIT_PROC).
      
      In order to avoid deadlock, as a for-now and easy-to-backport solution, do
      not try to call wait_for_completion() in call_usermodehelper_exec() if the
      worker thread was created by khelper thread with CLONE_VFORK flag.  Future
      and fundamental solution might be replacing singleton khelper thread with
      some workqueue so that recursive calls up to max_active dependency loop
      can be handled without deadlock.
      
      [akpm@linux-foundation.org: add comment to kmod_thread_locker]
      Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Acked-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0f20784d
    • A
      kernel/kmod.c: document call_usermodehelper_fns() a bit · 79c743dd
      Andrew Morton 提交于
      This function's interface is, uh, subtle.  Attempt to apologise for it.
      
      Cc: WANG Cong <xiyou.wangcong@gmail.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      79c743dd
    • S
      fat: refactor shortname parsing · deb8274a
      Steven J. Magnani 提交于
      Nearly identical shortname parsing is performed in fat_search_long() and
      __fat_readdir().  Extract this code into a function that may be called by
      both.
      Signed-off-by: NSteven J. Magnani <steve@digidescorp.com>
      Acked-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      deb8274a
    • S
      fat: accessors for msdos_dir_entry 'start' fields · a943ed71
      Steven J. Magnani 提交于
      Simplify code by providing accessor functions for the directory entry
      start cluster fields.
      Signed-off-by: NSteven J. Magnani <steve@digidescorp.com>
      Acked-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a943ed71
    • N
      hfsplus: use -ENOMEM when kzalloc() fails · 497d48bd
      Namjae Jeon 提交于
      Use -ENOMEM return value instead of -EINVAL when kzalloc() fails.
      Signed-off-by: NNamjae Jeon <linkinjeon@gmail.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      497d48bd
    • V
      nilfs2: add omitted comments for different structures in driver implementation · f5974c8f
      Vyacheslav Dubeyko 提交于
      Add omitted comments for different structures in driver implementation.
      Signed-off-by: NVyacheslav Dubeyko <slava@dubeyko.com>
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f5974c8f
    • V
      nilfs2: add omitted comments for structures in nilfs2_fs.h · 8c74ac05
      Vyacheslav Dubeyko 提交于
      Add omitted comments for structures in nilfs2_fs.h.
      Signed-off-by: NVyacheslav Dubeyko <slava@dubeyko.com>
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8c74ac05
    • R
      nilfs2: fix deadlock issue between chcp and thaw ioctls · 572d8b39
      Ryusuke Konishi 提交于
      An fs-thaw ioctl causes deadlock with a chcp or mkcp -s command:
      
       chcp            D ffff88013870f3d0     0  1325   1324 0x00000004
       ...
       Call Trace:
         nilfs_transaction_begin+0x11c/0x1a0 [nilfs2]
         wake_up_bit+0x20/0x20
         copy_from_user+0x18/0x30 [nilfs2]
         nilfs_ioctl_change_cpmode+0x7d/0xcf [nilfs2]
         nilfs_ioctl+0x252/0x61a [nilfs2]
         do_page_fault+0x311/0x34c
         get_unmapped_area+0x132/0x14e
         do_vfs_ioctl+0x44b/0x490
         __set_task_blocked+0x5a/0x61
         vm_mmap_pgoff+0x76/0x87
         __set_current_blocked+0x30/0x4a
         sys_ioctl+0x4b/0x6f
         system_call_fastpath+0x16/0x1b
       thaw            D ffff88013870d890     0  1352   1351 0x00000004
       ...
       Call Trace:
         rwsem_down_failed_common+0xdb/0x10f
         call_rwsem_down_write_failed+0x13/0x20
         down_write+0x25/0x27
         thaw_super+0x13/0x9e
         do_vfs_ioctl+0x1f5/0x490
         vm_mmap_pgoff+0x76/0x87
         sys_ioctl+0x4b/0x6f
         filp_close+0x64/0x6c
         system_call_fastpath+0x16/0x1b
      
      where the thaw ioctl deadlocked at thaw_super() when called while chcp was
      waiting at nilfs_transaction_begin() called from
      nilfs_ioctl_change_cpmode().  This deadlock is 100% reproducible.
      
      This is because nilfs_ioctl_change_cpmode() first locks sb->s_umount in
      read mode and then waits for unfreezing in nilfs_transaction_begin(),
      whereas thaw_super() locks sb->s_umount in write mode.  The locking of
      sb->s_umount here was intended to make snapshot mounts and the downgrade
      of snapshots to checkpoints exclusive.
      
      This fixes the deadlock issue by replacing the sb->s_umount usage in
      nilfs_ioctl_change_cpmode() with a dedicated mutex which protects snapshot
      mounts.
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Tested-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      572d8b39
    • R
      nilfs2: fix timing issue between rmcp and chcp ioctls · fe0627e7
      Ryusuke Konishi 提交于
      The checkpoint deletion ioctl (rmcp ioctl) has potential for breaking
      snapshot because it is not fully exclusive with checkpoint mode change
      ioctl (chcp ioctl).
      
      The rmcp ioctl first tests if the specified checkpoint is a snapshot or
      not within nilfs_cpfile_delete_checkpoint function, and then calls
      nilfs_cpfile_delete_checkpoints function to actually invalidate the
      checkpoint only if it's not a snapshot.  However, the checkpoint can be
      changed into a snapshot by the chcp ioctl between these two operations.
      In that case, calling nilfs_cpfile_delete_checkpoints() wrongly
      invalidates the snapshot, which leads to snapshot list corruption and
      snapshot count mismatch.
      
      This fixes the issue by changing nilfs_cpfile_delete_checkpoints() so
      that it reconfirms the target checkpoints are snapshot or not.
      
      This second check is exclusive with the chcp operation since it is
      protected by an existing semaphore.
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fe0627e7
    • F
      nilfs2: remove references to long gone super operations · 278038ac
      Fernando Luis Vazquez Cao 提交于
      ->delete_inode(), ->write_super_lockfs(), ->unlockfs() are gone so remove
      references to them in the NTFS code.  Noticed while cleaning up the
      fsfreeze mess.
      Signed-off-by: NFernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      278038ac
    • V
      nilfs2: add omitted comment for ns_mount_state field of the_nilfs structure · 6b0f3393
      Vyacheslav Dubeyko 提交于
      Add omitted comment for ns_mount_state field of the_nilfs structure.
      Signed-off-by: NVyacheslav Dubeyko <slava@dubeyko.com>
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6b0f3393
    • V
      minixfs: fix block limit check · 6ed6a722
      Vladimir Serbinenko 提交于
      On minix2 and minix3 usually max_size is 7fffffff and the check in
      question prohibits creation of last block spanning right before 7fffffff,
      due to downward rounding during the division.  Fix it by using
      multiplication instead.
      
      [akpm@linux-foundation.org: fix up code layout, use local `sb']
      Signed-off-by: NVladimir Serbinenko <phcoder@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6ed6a722