1. 07 Jun, 2014 40 commits
    • FS/CACHEFILES: convert printk to pr_foo() · 4e1eb883
      Fabian Frederick committed
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/pstore: logging clean-up · ef748853
      Fabian Frederick committed
      - Define pr_fmt in platform.c and ram_core.c for global prefix.
      
      - Coalesce format fragments.
      
      - Separate format/arguments on lines > 80 characters.
      
      Note: Some pr_foo() calls were initially declared without a prefix, so
      this change could break existing log analyzers.
      
      [akpm@linux-foundation.org: missed a couple of prefix removals]
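
      As a rough illustration of the global-prefix idea described above, the
      userspace sketch below pastes a prefix onto every format string at
      compile time. It is not the pstore code: demo_pr_info() and has_prefix()
      are hypothetical stand-ins, and the "pstore: " prefix is assumed for the
      example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed prefix for illustration; kernel files typically define pr_fmt near the top. */
#define pr_fmt(fmt) "pstore: " fmt

/*
 * Userspace stand-in for pr_info(): the prefix is concatenated with the
 * format string at compile time, so call sites no longer repeat it.
 * ##__VA_ARGS__ is a GNU extension (accepted by gcc and clang).
 */
#define demo_pr_info(buf, len, fmt, ...) \
        snprintf((buf), (len), pr_fmt(fmt), ##__VA_ARGS__)

/* Returns 1 when the formatted line starts with the module prefix. */
static int has_prefix(const char *line)
{
        assert(line != NULL);
        return strncmp(line, "pstore: ", 8) == 0;
}
```

      Because the prefix now comes from one macro, any message that used to be
      printed without it changes shape, which is the log-analyzer risk the note
      above mentions.
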
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Cc: Joe Perches <joe@perches.com>
      Cc: Anton Vorontsov <anton@enomsg.org>
      Cc: Colin Cross <ccross@android.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kernel/profile.c: use static const char instead of static char · f3da64d1
      Fabian Frederick committed
      schedstr, sleepstr and kvmstr are only used in strcmp & strlen
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kernel/profile.c: convert printk to pr_foo() · aba871f1
      Fabian Frederick committed
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/affs: pr_debug cleanup · 9606d9aa
      Fabian Frederick committed
      - Remove AFFS: prefix (defined in pr_fmt)
      
      - Use __func__
      
      - Separate format/arguments on lines > 80 characters.
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/affs: convert printk to pr_foo() · 0158de12
      Fabian Frederick committed
      - All printk(KERN_foo ...) converted to pr_foo()

      - Default printk converted to pr_warn()

      - Add pr_fmt to affs.h
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/affs/file.c: remove unnecessary function parameters · 0c89d670
      Fabian Frederick committed
      - affs_do_readpage_ofs is always called with from = 0, i.e. reading from
        page->index
      
      - File parameter is never used
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • include/asm-generic/ioctl.h: fix _IOC_TYPECHECK sparse error · d55875f5
      Hans Verkuil committed
      When running sparse over drivers/media/v4l2-core/v4l2-ioctl.c I get these
      errors:
      
        drivers/media/v4l2-core/v4l2-ioctl.c:2043:9: error: bad integer constant expression
        drivers/media/v4l2-core/v4l2-ioctl.c:2044:9: error: bad integer constant expression
        drivers/media/v4l2-core/v4l2-ioctl.c:2045:9: error: bad integer constant expression
        drivers/media/v4l2-core/v4l2-ioctl.c:2046:9: error: bad integer constant expression
      
      etc.
      
      The root cause of that turns out to be in include/asm-generic/ioctl.h:
      
      #include <uapi/asm-generic/ioctl.h>
      
      /* provoke compile error for invalid uses of size argument */
      extern unsigned int __invalid_size_argument_for_IOC;
      #define _IOC_TYPECHECK(t) \
              ((sizeof(t) == sizeof(t[1]) && \
                sizeof(t) < (1 << _IOC_SIZEBITS)) ? \
                sizeof(t) : __invalid_size_argument_for_IOC)
      
      If it is defined as this (as is already done if __KERNEL__ is not defined):
      
        #define _IOC_TYPECHECK(t) (sizeof(t))
      
      then all is well with the world.
      
      This patch allows sparse to work correctly.
      Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kernel/user_namespace.c: kernel-doc/checkpatch fixes · 68a9a435
      Fabian Frederick committed
      - uid->gid
      - split some function declarations
      - if/then/else warning
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tools/testing/selftests/sysctl: validate sysctl_writes_strict · 24fe831c
      Kees Cook committed
      This adds several behavioral tests to sysctl string and number writing
      to detect unexpected cases that behaved differently when the sysctl
      kernel.sysctl_writes_strict != 1.
      
      [ original ]
          root@localhost:~# make test_num
          == Testing sysctl behavior against /proc/sys/kernel/domainname ==
          Writing test file ... ok
          Checking sysctl is not set to test value ... ok
          Writing sysctl from shell ... ok
          Resetting sysctl to original value ... ok
          Writing entire sysctl in single write ... ok
          Writing middle of sysctl after synchronized seek ... FAIL
          Writing beyond end of sysctl ... FAIL
          Writing sysctl with multiple long writes ... FAIL
          Writing entire sysctl in short writes ... FAIL
          Writing middle of sysctl after unsynchronized seek ... ok
          Checking sysctl maxlen is at least 65 ... ok
          Checking sysctl keeps original string on overflow append ... FAIL
          Checking sysctl stays NULL terminated on write ... ok
          Checking sysctl stays NULL terminated on overwrite ... ok
          make: *** [test_num] Error 1
          root@localhost:~# make test_string
          == Testing sysctl behavior against /proc/sys/vm/swappiness ==
          Writing test file ... ok
          Checking sysctl is not set to test value ... ok
          Writing sysctl from shell ... ok
          Resetting sysctl to original value ... ok
          Writing entire sysctl in single write ... ok
          Writing middle of sysctl after synchronized seek ... FAIL
          Writing beyond end of sysctl ... FAIL
          Writing sysctl with multiple long writes ... ok
          make: *** [test_string] Error 1
      
      [ with CONFIG_PROC_SYSCTL_STRICT_WRITES ]
          root@localhost:~# make run_tests
          == Testing sysctl behavior against /proc/sys/kernel/domainname ==
          Writing test file ... ok
          Checking sysctl is not set to test value ... ok
          Writing sysctl from shell ... ok
          Resetting sysctl to original value ... ok
          Writing entire sysctl in single write ... ok
          Writing middle of sysctl after synchronized seek ... ok
          Writing beyond end of sysctl ... ok
          Writing sysctl with multiple long writes ... ok
          Writing entire sysctl in short writes ... ok
          Writing middle of sysctl after unsynchronized seek ... ok
          Checking sysctl maxlen is at least 65 ... ok
          Checking sysctl keeps original string on overflow append ... ok
          Checking sysctl stays NULL terminated on write ... ok
          Checking sysctl stays NULL terminated on overwrite ... ok
          == Testing sysctl behavior against /proc/sys/vm/swappiness ==
          Writing test file ... ok
          Checking sysctl is not set to test value ... ok
          Writing sysctl from shell ... ok
          Resetting sysctl to original value ... ok
          Writing entire sysctl in single write ... ok
          Writing middle of sysctl after synchronized seek ... ok
          Writing beyond end of sysctl ... ok
          Writing sysctl with multiple long writes ... ok
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sysctl: allow for strict write position handling · f4aacea2
      Kees Cook committed
      When writing to a sysctl string, each write, regardless of VFS position,
      begins writing the string from the start.  This means the contents of
      the last write to the sysctl controls the string contents instead of the
      first:
      
        open("/proc/sys/kernel/modprobe", O_WRONLY)   = 1
        write(1, "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"..., 4096) = 4096
        write(1, "/bin/true", 9)                = 9
        close(1)                                = 0
      
        $ cat /proc/sys/kernel/modprobe
        /bin/true
      
      Expected behaviour would be to have the sysctl be "AAAA..." capped at
      maxlen (in this case KMOD_PATH_LEN: 256), instead of truncating to the
      contents of the second write.  Similarly, multiple short writes would
      not append to the sysctl.
      
      The old behavior is unlike regular POSIX files enough that doing audits
      of software that interact with sysctls can end up in unexpected or
      dangerous situations.  For example, "as long as the input starts with a
      trusted path" turns out to be an insufficient filter, as what must also
      happen is for the input to be entirely contained in a single write
      syscall -- not a common consideration, especially for high level tools.
      
      This provides kernel.sysctl_writes_strict as a way to make this behavior
      act in a less surprising manner for strings, and disallows non-zero file
      position when writing numeric sysctls (similar to what is already done
      when reading from non-zero file positions).  For now, the default (0) is
      to warn about non-zero file position use, but retain the legacy
      behavior.  Setting this to -1 disables the warning, and setting this to
      1 enables the file position respecting behavior.
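
      The difference between the two modes can be sketched with a toy userspace
      model. This is a simplified illustration, not the kernel implementation:
      struct demo_sysctl, write_legacy() and write_strict() are hypothetical
      names, and the 16-byte buffer stands in for maxlen.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of a string sysctl: a fixed buffer that is always NUL terminated. */
struct demo_sysctl {
        char data[16];          /* stands in for maxlen, including the NUL */
};

/* Legacy model: every write restarts at offset 0, so the last write wins. */
static void write_legacy(struct demo_sysctl *s, const char *buf, size_t len,
                         size_t *pos)
{
        size_t max = sizeof(s->data) - 1;
        size_t n = len < max ? len : max;

        assert(s && buf && pos);
        memcpy(s->data, buf, n);
        s->data[n] = '\0';
        *pos += len;            /* position advances but is otherwise ignored */
}

/* Strict model: honor the file position and drop data that does not fit. */
static void write_strict(struct demo_sysctl *s, const char *buf, size_t len,
                         size_t *pos)
{
        size_t max = sizeof(s->data) - 1;
        size_t cur = strlen(s->data);

        assert(s && buf && pos);
        if (*pos <= cur) {      /* writes past the current end are ignored */
                size_t room = max - *pos;
                size_t n = len < room ? len : room;

                memcpy(s->data + *pos, buf, n);
                s->data[*pos + n] = '\0';
        }
        *pos += len;
}
```

      In this model, writing "AAAA" and then "/bin/true" through the same
      descriptor leaves "/bin/true" in the legacy case and "AAAA/bin/true" in
      the strict case.
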
      
      [akpm@linux-foundation.org: fix build]
      [akpm@linux-foundation.org: move misplaced hunk, per Randy]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sysctl: refactor sysctl string writing logic · 2ca9bb45
      Kees Cook committed
      Consolidate buffer length checking with new-line/end-of-line checking.
      Additionally, instead of reading user memory twice, just do the
      assignment during the loop.
      
      This change doesn't affect the potential races here.  It was already
      possible to read a sysctl that was in the middle of a write.  In both
      cases, the string will always be NULL terminated.  The pre-existing race
      remains a problem to be solved.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sysctl: clean up char buffer arguments · f8808300
      Kees Cook committed
      When writing to a sysctl string, each write, regardless of VFS position,
      began writing the string from the start.  This meant the contents of the
      last write to the sysctl controlled the string contents instead of the
      first.
      
      This misbehavior was featured in an exploit against Chrome OS.  While
      it's not in itself a vulnerability, it's a weirdness that isn't on the
      mind of most auditors: "This filter looks correct, the first line
      written would not be meaningful to sysctl" doesn't apply here, since the
      size of the write and the contents of the final write are what matter
      when writing to sysctls.
      
      This adds the sysctl kernel.sysctl_writes_strict to control the write
      behavior.  The default (0) reports when VFS position is non-0 on a
      write, but retains legacy behavior, -1 disables the warning, and 1
      enables the position-respecting behavior.
      
      The long-term plan here is to wait for userspace to be fixed in response
      to the new warning and to then switch the default kernel behavior to the
      new position-respecting behavior.
      
      This patch (of 4):
      
      The char buffer arguments are needlessly cast in weird places.  Clean it
      up so things are easier to read.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • rapidio/tsi721: use pci_enable_msix_exact() instead of pci_enable_msix() · 1c92ab1e
      Alexander Gordeev committed
      As a result of the deprecation of the MSI-X/MSI enablement functions
      pci_enable_msix() and pci_enable_msi_block(), all drivers using these two
      interfaces need to be updated to use the new pci_enable_msi_range() or
      pci_enable_msi_exact() and pci_enable_msix_range() or
      pci_enable_msix_exact() interfaces.
      
      The patch has no runtime effect.
      Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Acked-by: Alexandre Bounine <alexandre.bounine@idt.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • idr: reorder the fields · dcbff5d1
      Lai Jiangshan committed
      idr_layer->layer is always accessed in the read path, so move it to the
      front.

      idr_layer->bitmap is moved to the bottom, and rcu_head now shares storage
      with bitmap because the two are never accessed at the same time.

      idr->id_free/id_free_cnt/lock are free-list fields and are moved to the
      bottom.  They will be removed in the near future.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • idr: reduce the unneeded check in free_layer() · 15f3ec3f
      Lai Jiangshan committed
      If "idr->hint == p" is true, it also implies that "idr->hint" is true (not NULL).
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • idr: don't need to shrink the free list when idr_remove() · aefb7682
      Lai Jiangshan committed
      Now that the idr subsystem is RCU-aware, a freed layer no longer goes
      to the free list, so idr_remove() never fills the free list up and there
      is no need to shrink it there.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • idr: fix idr_replace()'s returned error code · b93804b2
      Lai Jiangshan committed
      When a small id is not found, idr_replace() returns -ENOENT.  But when
      the id is big enough, idr_replace() returns -EINVAL, even though there
      is no real difference between these two kinds of ids.

      Both are unallocated ids, so idr_replace() should return the same value
      for them: -ENOENT.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • idr: fix NULL pointer dereference when ida_remove(unallocated_id) · aef0f62e
      Lai Jiangshan committed
      If the ida has at least one existing id, and when an unallocated ID
      which meets a certain condition is passed to the ida_remove(), the
      system will crash because it hits NULL pointer dereference.
      
      The condition is that the unallocated ID shares the same lowest idr
      layer with the existing ID, but the idr slot would be different if the
      unallocated ID were to be allocated.
      
      In this case the matching idr slot for the unallocated_id is NULL,
      causing @bitmap to be NULL, which the function dereferences without
      checking, crashing the kernel.
      
      See the test code:
      
        static void test3(void)
        {
              int id;
              DEFINE_IDA(test_ida);
      
              printk(KERN_INFO "Start test3\n");
              if (ida_pre_get(&test_ida, GFP_KERNEL) < 0) return;
              if (ida_get_new(&test_ida,  &id) < 0) return;
              ida_remove(&test_ida, 4000); /* bug: NULL dereference here */
              printk(KERN_INFO "End of test3\n");
        }
      
      It happens only when the caller tries to free an unallocated ID which is
      the caller's fault.  It is not a bug.  But it is better to add the
      proper check and complain rather than crashing the kernel.
      
      [tj@kernel.org: updated patch description]
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • idr: fix unexpected ID-removal when idr_remove(unallocated_id) · 8f9f665a
      Lai Jiangshan committed
      If unallocated_id = (ANY * idr_max(idp->layers) + existing_id) is passed
      to idr_remove(), the existing_id will be removed unexpectedly.
      
      The following test shows this unexpected id-removal:
      
        static void test4(void)
        {
              int id;
              DEFINE_IDR(test_idr);
      
              printk(KERN_INFO "Start test4\n");
              id = idr_alloc(&test_idr, (void *)1, 42, 43, GFP_KERNEL);
              BUG_ON(id != 42);
              idr_remove(&test_idr, 42 + IDR_SIZE);
              TEST_BUG_ON(idr_find(&test_idr, 42) != (void *)1);
              idr_destroy(&test_idr);
              printk(KERN_INFO "End of test4\n");
        }
      
      ida_remove() has a similar problem.
      
      It happens only when the caller tries to free an unallocated ID which is
      the caller's fault.  It is not a bug.  But it is better to add the
      proper check and complain rather than removing an existing_id silently.
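
      The aliasing described above can be sketched with a single-layer toy
      model. This is an illustration only, not the idr code: struct demo_idr,
      remove_unchecked() and remove_checked() are hypothetical, and 8 bits per
      layer is an assumed value.

```c
#include <assert.h>

#define DEMO_IDR_BITS 8                         /* assumed bits per layer */
#define DEMO_IDR_SIZE (1 << DEMO_IDR_BITS)
#define DEMO_IDR_MASK (DEMO_IDR_SIZE - 1)

/* Single-layer toy tree: one presence flag per slot. */
struct demo_idr {
        unsigned char used[DEMO_IDR_SIZE];
};

/* Unchecked removal: high bits of the ID are masked away, aliasing other IDs. */
static void remove_unchecked(struct demo_idr *idr, int id)
{
        assert(idr && id >= 0);
        idr->used[id & DEMO_IDR_MASK] = 0;
}

/* Checked removal: reject IDs the tree cannot hold. Returns 0 or -1. */
static int remove_checked(struct demo_idr *idr, int id)
{
        assert(idr);
        if (id < 0 || id >= DEMO_IDR_SIZE || !idr->used[id])
                return -1;
        idr->used[id] = 0;
        return 0;
}
```

      In the toy model, removing 42 + DEMO_IDR_SIZE through the unchecked path
      clears slot 42, while the checked path refuses and leaves ID 42 intact.
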
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • idr: fix overflow bug during maximum ID calculation at maximum height · 3afb69cb
      Lai Jiangshan committed
      idr_replace() open-codes the logic to calculate the maximum valid ID
      given the height of the idr tree; unfortunately, the open-coded logic
      doesn't account for the fact that the top layer may have unused slots
      and over-shifts the limit to zero when the tree is at its maximum
      height.
      
      The following test code shows it fails to replace the value for
      id=((1<<27)+42):
      
        static void test5(void)
        {
              int id;
              DEFINE_IDR(test_idr);
        #define TEST5_START ((1<<27)+42) /* use the highest layer */
      
              printk(KERN_INFO "Start test5\n");
              id = idr_alloc(&test_idr, (void *)1, TEST5_START, 0, GFP_KERNEL);
              BUG_ON(id != TEST5_START);
              TEST_BUG_ON(idr_replace(&test_idr, (void *)2, TEST5_START) != (void *)1);
              idr_destroy(&test_idr);
              printk(KERN_INFO "End of test5\n");
        }
      
      Fix the bug by using idr_max() which correctly takes into account the
      maximum allowed shift.
      
      sub_alloc() shares the same problem and may incorrectly fail with
      -EAGAIN; however, this bug doesn't affect correct operation because
      idr_get_empty_slot(), which already uses idr_max(), retries with the
      increased @id in such cases.
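
      The over-shift can be reproduced in a small userspace sketch. The
      constants are assumptions for illustration (8 bits per layer, a maximum
      shift of 31), and limit_open_coded() and max_id_clamped() are
      hypothetical helpers written in the spirit of the description above, not
      the kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_IDR_BITS      8    /* assumed bits per layer */
#define DEMO_MAX_IDR_SHIFT 31   /* assumed: bits in an int minus the sign bit */

/* Open-coded limit: the shift is not clamped, so 4 layers give 1 << 32. */
static uint32_t limit_open_coded(int layers)
{
        assert(layers > 0 && layers <= 4);
        /* Shift in 64 bits so that the truncation to 32 bits is well defined. */
        return (uint32_t)(UINT64_C(1) << (layers * DEMO_IDR_BITS));
}

/* Clamped variant: highest valid ID for a tree of the given height. */
static uint32_t max_id_clamped(int layers)
{
        int bits = layers * DEMO_IDR_BITS;

        assert(layers > 0);
        if (bits > DEMO_MAX_IDR_SHIFT)
                bits = DEMO_MAX_IDR_SHIFT;
        return (UINT32_C(1) << bits) - 1;
}
```

      With four layers the open-coded limit wraps to zero, so a check of the
      form "id >= limit" rejects every ID, including (1<<27)+42. The clamped
      variant yields 0x7fffffff and accepts it.
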
      
      [tj@kernel.org: Updated patch description.]
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kernel/kexec.c: convert printk to pr_foo() · e1bebcf4
      Fabian Frederick committed
      + some pr_warning -> pr_warn and checkpatch warning fixes
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after panic_notifiers · f06e5153
      Masami Hiramatsu committed
      Add a "crash_kexec_post_notifiers" boot option to run kdump after
      running panic_notifiers and dumping kmsg.  This can help in rare
      situations where kdump fails because of an unstable crashed kernel or a
      hardware failure (memory corruption of critical data/code), or where the
      2nd kernel is already broken by the 1st kernel (it's a broken behavior,
      but who can guarantee that the "crashed" kernel works correctly?).

      Usage: add "crash_kexec_post_notifiers" to the kernel boot options.

      Note that this actually increases the risk of kdump failure.  This
      option should be set only if you worry about the rare case of kdump
      failure rather than increasing the chance of success.
      Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Acked-by: Motohiro Kosaki <Motohiro.Kosaki@us.fujitsu.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
      Cc: Satoru MORIYA <satoru.moriya.br@hitachi.com>
      Cc: Tomoki Sekiyama <tomoki.sekiyama@hds.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • smp: print more useful debug info upon receiving IPI on an offline CPU · a219ccf4
      Srivatsa S. Bhat committed
      There is a longstanding problem related to CPU hotplug which causes IPIs
      to be delivered to offline CPUs, and the smp-call-function IPI handler
      code prints out a warning whenever this is detected.  Every once in a
      while this (usually harmless) warning gets reported on LKML, but so far
      it has not been completely fixed.  Usually the solution involves finding
      out the IPI sender and fixing it by adding appropriate synchronization
      with CPU hotplug.
      
      However, while going through one such internal bug report, I found that
      there is a significant bug in the receiver side itself (more
      specifically, in stop-machine) that can lead to this problem even when
      the sender code is perfectly fine.  This patchset fixes that
      synchronization problem in the CPU hotplug stop-machine code.
      
      Patch 1 adds some additional debug code to the smp-call-function
      framework, to help debug such issues easily.
      
      Patch 2 modifies the stop-machine code to ensure that any IPIs that were
      sent while the target CPU was online, would be noticed and handled by
      that CPU without fail before it goes offline.  Thus, this avoids
      scenarios where IPIs are received on offline CPUs (as long as the sender
      uses proper hotplug synchronization).
      
      In fact, I debugged the problem by using Patch 1, and found that the
      payload of the IPI was always the block layer's trigger_softirq()
      function.  But I was not able to find anything wrong with the block
      layer code.  That's when I started looking at the stop-machine code and
      realized that there is a race-window which makes the IPI _receiver_ the
      culprit, not the sender.  Patch 2 fixes that race and hence this should
      put an end to most of the hard-to-debug IPI-to-offline-CPU issues.
      
      This patch (of 2):
      
      Today the smp-call-function code just prints a warning if we get an IPI
      on an offline CPU.  This info is sufficient to let us know that
      something went wrong, but often it is very hard to debug exactly who
      sent the IPI and why, from this info alone.
      
      In most cases, we get the warning about the IPI to an offline CPU,
      immediately after the CPU going offline comes out of the stop-machine
      phase and reenables interrupts.  Since all online CPUs participate in
      stop-machine, the information regarding the sender of the IPI is already
      lost by the time we exit the stop-machine loop.  So even if we dump the
      stack on each CPU at this point, we won't find anything useful since all
      of them will show the stack-trace of the stopper thread.  So we need a
      better way to figure out who sent the IPI and why.
      
      To achieve this, when we detect an IPI targeted to an offline CPU, loop
      through the call-single-data linked list and print out the payload
      (i.e., the name of the function which was supposed to be executed by the
      target CPU).  This would give us an insight as to who might have sent
      the IPI and help us debug this further.
      
      [akpm@linux-foundation.org: correctly suppress warning output on second and later occurrences]
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Mike Galbraith <mgalbraith@suse.de>
      Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/proc/vmcore.c: remove NULL assignment to static · a05e16ad
      Fabian Frederick committed
      Static values are automatically initialized to NULL.
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/proc/task_mmu.c: replace seq_printf by seq_puts · 17c2b4ee
      Fabian Frederick committed
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • signals: change wait_for_helper() to use kernel_sigaction() · 76e0a6f4
      Oleg Nesterov committed
      Now that we have kernel_sigaction() we can change wait_for_helper() to
      use it and clean up the code a bit.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • signals: introduce kernel_sigaction() · b4e74264
      Oleg Nesterov committed
      Now that allow_signal() is really trivial we can unify it with
      disallow_signal().  Add the new helper, kernel_sigaction(), and
      reimplement allow_signal/disallow_signal as trivial wrappers.
      
      This saves one EXPORT_SYMBOL() and the new helper can have more users.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • signals: disallow_signal() should flush the potentially pending signal · 580d34e4
      Oleg Nesterov committed
      disallow_signal() simply sets SIG_IGN.  This is not enough, and
      recalc_sigpending() is simply pointless because it can never change the
      state of TIF_SIGPENDING.
      
      If we ignore a signal, we also need to do flush_sigqueue_mask() for the
      case when this signal is pending, this way recalc_sigpending() can
      actually clear TIF_SIGPENDING and we do not "leak" the allocated
      siginfo's.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • signals: kill the obsolete sigdelset() and recalc_sigpending() in allow_signal() · ec5955b8
      Oleg Nesterov committed
      allow_signal() does sigdelset(current->blocked) for historical reasons:
      previously it could be called by a daemonize()'ed kthread, and
      daemonize() played with current->blocked.

      Now that daemonize() has gone away we can remove sigdelset() and
      recalc_sigpending().  If a user really wants to unblock a signal, it
      must use sigprocmask() or set_current_blocked() explicitly.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ec5955b8
    • O
      signals: jffs2: fix the wrong usage of disallow_signal() · c240837f
      Oleg Nesterov committed
      jffs2_garbage_collect_thread() does disallow_signal(SIGHUP) around
      jffs2_garbage_collect_pass() and the comment says "We don't want SIGHUP
      to interrupt us".
      
      But disallow_signal() can't ensure that jffs2_garbage_collect_pass()
      won't be interrupted by SIGHUP. The problem is that SIGHUP can be
      already pending when disallow_signal() is called, and in this case any
      interruptible sleep won't block.
      
      Note: this is in fact because disallow_signal() is buggy and should be
      fixed, see the next changes.
      
      But there is another reason why disallow_signal() is wrong: SIG_IGN set
      by disallow_signal() silently discards any SIGHUP which can be sent
      before the next allow_signal(SIGHUP).
      
      Change this code to use sigprocmask(SIG_UNBLOCK/SIG_BLOCK, SIGHUP).
      This even matches the old (and wrong) semantics allow/disallow had when
      this logic was written.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c240837f
    • O
      signals: mv {dis,}allow_signal() from sched.h/exit.c to signal.[ch] · 0341729b
      Oleg Nesterov committed
      Move the declaration/definition of allow_signal/disallow_signal to
      signal.h/signal.c.  The new place is more logical and allows the use of
      the static helpers in signal.c (see the next changes).
      
      While at it, make them return void and remove the valid_signal() check.
      Nobody checks the returned value, and in-kernel users must not pass the
      wrong signal number.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0341729b
    • O
      signals: cleanup the usage of t/current in do_sigaction() · afe2b038
      Oleg Nesterov committed
      The usage of "task_struct *t" and "current" in do_sigaction() looks really
      annoying and chaotic.  Initially "t" is used as a cached value of current
      but not consistently, then it is reused as a loop variable and we have to
      use "current" again.
      
      Clean up this mess and also convert the code to use for_each_thread().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      afe2b038
    • O
      signals: rename rm_from_queue_full() to flush_sigqueue_mask() · c09c1441
      Oleg Nesterov committed
      "rm_from_queue_full" looks ugly and misleading, especially now that
      rm_from_queue() has gone away.  Rename it to flush_sigqueue_mask(); this
      matches flush_sigqueue() we already have.
      
      Also remove the obsolete comment which explains the difference from
      rm_from_queue(), which we already killed.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c09c1441
    • O
      signals: kill rm_from_queue(), change prepare_signal() to use for_each_thread() · 9490592f
      Oleg Nesterov committed
      rm_from_queue() doesn't make sense.  The only caller, prepare_signal(),
      can use rm_from_queue_full() with the same effect.
      
      While at it, change prepare_signal() to use for_each_thread() instead of
      do/while_each_thread.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9490592f
    • O
      signals: s/siginitset/sigemptyset/ in do_sigtimedwait() · 6114041a
      Oleg Nesterov committed
      Cosmetic, but siginitset(0) looks a bit strange; sigemptyset() is what
      do_sigtimedwait() needs.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6114041a
    • O
      signals: kill sigfindinword() · 36fac0a2
      Oleg Nesterov committed
      It has no users and it doesn't look useful.  I do not know why/when it was
      introduced; I can't even find any user in the git history.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      36fac0a2
    • O
      ptrace: task_clear_jobctl_trapping()->wake_up_bit() needs mb() · 650226bd
      Oleg Nesterov committed
      __wake_up_bit() checks waitqueue_active() and thus the caller needs mb(),
      as wake_up_bit() documents.  Fix task_clear_jobctl_trapping() accordingly.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      650226bd
    • M
      ptrace: fix fork event messages across pid namespaces · 4e52365f
      Matthew Dempsky committed
      When tracing a process in another pid namespace, it's important for fork
      event messages to contain the child's pid as seen from the tracer's pid
      namespace, not the parent's.  Otherwise, the tracer won't be able to
      correlate the fork event with later SIGTRAP signals it receives from the
      child.
      
      We still risk a race condition if a ptracer from a different pid
      namespace attaches after we compute the pid_t value.  However, sending a
      bogus fork event message in this unlikely scenario is still a vast
      improvement over the status quo where we always send bogus fork event
      messages to debuggers in a different pid namespace than the forking
      process.
      Signed-off-by: Matthew Dempsky <mdempsky@chromium.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Julien Tinnes <jln@chromium.org>
      Cc: Roland McGrath <mcgrathr@chromium.org>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4e52365f
    • A
      Documentation/memory-barriers.txt: fix important typo re memory barriers · 615cc2c9
      Alexey Dobriyan committed
      Examples introducing the necessity of an RMB+WMB pair read as
      
              A=3	READ B
              www	rrrrrr
              B=4	READ A
      
      Note the opposite order of reads vs writes.
      
      But the first example without barriers reads as
      
              A=3	READ A
              B=4	READ B
      
      There are 4 outcomes in the first example.
      
      But if someone new to the concept tries to insert barriers like this:
      
              A=3	READ A
              www	rrrrrr
              B=4	READ B
      
      he will still get all 4 possible outcomes, because "READ A" is first.
      
      All this can be utterly confusing because the barrier pair seems to be
      superfluous.  In short, fix up the first example to match the later
      examples with barriers.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      615cc2c9
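The paired layout from the commit message above (writes A then B, reads B then A) can be sketched in userspace with C11 atomics. This is an approximation under stated assumptions: the release and acquire fences stand in for the kernel's write and read barriers, and the names A, B, struct seen, writer(), reader() and run_once() are hypothetical. The property of interest is that a reader which observes B == 4 must also observe A == 3.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int A, B;

struct seen {
    int a;
    int b;
};

static void *writer(void *arg)
{
    (void)arg;
    atomic_store_explicit(&A, 3, memory_order_relaxed);     /* A=3 */
    atomic_thread_fence(memory_order_release);              /* www */
    atomic_store_explicit(&B, 4, memory_order_relaxed);     /* B=4 */
    return NULL;
}

static void *reader(void *arg)
{
    struct seen *s = arg;

    s->b = atomic_load_explicit(&B, memory_order_relaxed);  /* READ B */
    atomic_thread_fence(memory_order_acquire);              /* rrrrrr */
    s->a = atomic_load_explicit(&A, memory_order_relaxed);  /* READ A */
    return NULL;
}

/* Run one writer/reader race and return what the reader observed. */
static struct seen run_once(void)
{
    pthread_t w, r;
    struct seen s = {0, 0};

    atomic_store(&A, 0);
    atomic_store(&B, 0);
    pthread_create(&r, NULL, reader, &s);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return s;
}
```

With the reads in the opposite order to the writes, the outcome b == 4 with a == 0 is excluded. If the reader instead read A first, as in the confusing layout the commit removes, the fences would no longer rule out any of the four outcomes.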