1. 15 12月, 2016 3 次提交
    • D
      ipc/sem: optimize perform_atomic_semop() · 4ce33ec2
      Davidlohr Bueso 提交于
      This is the main workhorse that deals with semop user calls such that
      the waitforzero or semval update operations, on the set, can complete on
      not as the sma currently stands.  Currently, the set is iterated twice
      (setting semval, then backwards for the sempid value).  Slowpaths, and
      particularly SEM_UNDO calls, must undo any altered sem when it is
      detected that the caller must block or has errored-out.
      
      With larger sets, there can occur situations where this involves a lot
      of cycles and can obviously be a suboptimal use of cached resources in
      shared memory.  Ie, discarding CPU caches that are also calling semop
      and have the sembuf cached (and can complete), while the current lock
      holder doing the semop will block, error, or does a waitforzero
      operation.
      
      This patch proposes still iterating the set twice, but the first scan is
      read-only, and we perform the actual updates afterward, once we know
      that the call will succeed.  In order to not suffer from the overhead of
      dealing with sops that act on the same sem_num, such (rare) cases use
      perform_atomic_semop_slow(), which is exactly what we have now.
      Duplicates are detected before grabbing sem_lock, and uses simple a
      32/64-bit hash array variable to based on the sem_num we are working on.
      
      In addition add some comments to when we expect to the caller to block.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [colin.king@canonical.com: ensure we left shift a ULL rather than a 32 bit integer]
        Link: http://lkml.kernel.org/r/20161028181129.7311-1-colin.king@canonical.com
      Link: http://lkml.kernel.org/r/20160921194603.GB21438@linux-80c1.suseSigned-off-by: NDavidlohr Bueso <dbueso@suse.de>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: NColin Ian King <colin.king@canonical.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4ce33ec2
    • D
      ipc/sem: rework task wakeups · 9ae949fa
      Davidlohr Bueso 提交于
      Our sysv sems have been using the notion of lockless wakeups for a
      while, ever since commit 0a2b9d4c ("ipc/sem.c: move wake_up_process
      out of the spinlock section"), in order to reduce the sem_lock hold
      times.  This in-house pending queue can be replaced by wake_q (just like
      all the rest of ipc now), in that it provides the following advantages:
      
       o Simplifies and gets rid of unnecessary code.
      
       o We get rid of the IN_WAKEUP complexities. Given that wake_q_add()
         grabs reference to the task, if awoken due to an unrelated event,
         between the wake_q_add() and wake_up_q() window, we cannot race with
         sys_exit and the imminent call to wake_up_process().
      
       o By not spinning IN_WAKEUP, we no longer need to disable preemption.
      
      In consequence, the wakeup paths (after schedule(), that is) must
      acknowledge an external signal/event, as well spurious wakeup occurring
      during the pending wakeup window.  Obviously no changes in semantics
      that could be visible to the user.  The fastpath is _only_ for when we
      know for sure that we were awoken due to a the waker's successful semop
      call (queue.status is not -EINTR).
      
      On a 48-core Haswell, running the ipcscale 'waitforzero' test, the
      following is seen with increasing thread counts:
      
                                     v4.8-rc5                v4.8-rc5
                                                              semopv2
      Hmean    sembench-sem-2      574733.00 (  0.00%)   578322.00 (  0.62%)
      Hmean    sembench-sem-8      811708.00 (  0.00%)   824689.00 (  1.59%)
      Hmean    sembench-sem-12     842448.00 (  0.00%)   845409.00 (  0.35%)
      Hmean    sembench-sem-21     933003.00 (  0.00%)   977748.00 (  4.80%)
      Hmean    sembench-sem-48     935910.00 (  0.00%)  1004759.00 (  7.36%)
      Hmean    sembench-sem-79     937186.00 (  0.00%)   983976.00 (  4.99%)
      Hmean    sembench-sem-234    974256.00 (  0.00%)  1060294.00 (  8.83%)
      Hmean    sembench-sem-265    975468.00 (  0.00%)  1016243.00 (  4.18%)
      Hmean    sembench-sem-296    991280.00 (  0.00%)  1042659.00 (  5.18%)
      Hmean    sembench-sem-327    975415.00 (  0.00%)  1029977.00 (  5.59%)
      Hmean    sembench-sem-358   1014286.00 (  0.00%)  1049624.00 (  3.48%)
      Hmean    sembench-sem-389    972939.00 (  0.00%)  1043127.00 (  7.21%)
      Hmean    sembench-sem-420    981909.00 (  0.00%)  1056747.00 (  7.62%)
      Hmean    sembench-sem-451    990139.00 (  0.00%)  1051609.00 (  6.21%)
      Hmean    sembench-sem-482    965735.00 (  0.00%)  1040313.00 (  7.72%)
      
      [akpm@linux-foundation.org: coding-style fixes]
      [sfr@canb.auug.org.au: merge fix for WAKE_Q to DEFINE_WAKE_Q rename]
        Link: http://lkml.kernel.org/r/20161122210410.5eca9fc2@canb.auug.org.au
      Link: http://lkml.kernel.org/r/1474225896-10066-3-git-send-email-dave@stgolabs.netSigned-off-by: NDavidlohr Bueso <dbueso@suse.de>
      Acked-by: NManfred Spraul <manfred@colorfullife.com>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9ae949fa
    • D
      ipc/sem: do not call wake_sem_queue_do() prematurely ... as this call should... · 248e7357
      Davidlohr Bueso 提交于
      ipc/sem: do not call wake_sem_queue_do() prematurely ... as this call should obviously be paired with its _prepare()
      
      counterpart.  At least whenever possible, as there is no harm in calling
      it bogusly as we do now in a few places.  Immediate error semop(2) paths
      that are far from ever having the task block can be simplified and avoid
      a few unnecessary loads on their way out of the call as it is not deeply
      nested.
      
      Link: http://lkml.kernel.org/r/1474225896-10066-2-git-send-email-dave@stgolabs.netSigned-off-by: NDavidlohr Bueso <dbueso@suse.de>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      248e7357
  2. 12 10月, 2016 2 次提交
    • N
      ipc/sem.c: add cond_resched in exit_sme · 2a1613a5
      Nikolay Borisov 提交于
      In CONFIG_PREEMPT=n kernel a softlockup was observed while the for loop in
      exit_sem.  Apparently it's possible for the loop to take quite a long time
      and it doesn't have a scheduling point in it.  Since the codes is
      executing under an rcu read section this may also cause rcu stalls, which
      in turn block synchronize_rcu operations, which more or less de-stabilises
      the whole system.
      
      Fix this by introducing a cond_resched() at the beginning of the loop.
      
      So this patch fixes the following:
      
        NMI watchdog: BUG: soft lockup - CPU#10 stuck for 23s! [httpd:18119]
        CPU: 10 PID: 18119 Comm: httpd Tainted: G           O    4.4.20-clouder2 #6
        Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1 04/14/2015
        task: ffff88348d695280 ti: ffff881c95550000 task.ti: ffff881c95550000
        RIP: 0010:[<ffffffff81614bc7>]  [<ffffffff81614bc7>] _raw_spin_lock+0x17/0x30
        RSP: 0018:ffff881c95553e40  EFLAGS: 00000246
        RAX: 0000000000000000 RBX: ffff883161b1eea8 RCX: 000000000000000d
        RDX: 0000000000000001 RSI: 000000000000000e RDI: ffff883161b1eea4
        RBP: ffff881c95553ea0 R08: ffff881c95553e68 R09: ffff883fef376f88
        R10: ffff881fffb58c20 R11: ffffea0072556600 R12: ffff883161b1eea0
        R13: ffff88348d695280 R14: ffff883dec427000 R15: ffff8831621672a0
        FS:  0000000000000000(0000) GS:ffff881fffb40000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f3b3723e020 CR3: 0000000001c0a000 CR4: 00000000001406e0
        Call Trace:
          ? exit_sem+0x7c/0x280
          do_exit+0x338/0xb40
          do_group_exit+0x43/0xd0
          SyS_exit_group+0x14/0x20
          entry_SYSCALL_64_fastpath+0x16/0x6e
      
      Link: http://lkml.kernel.org/r/1475154992-6363-1-git-send-email-kernel@kyup.comSigned-off-by: NNikolay Borisov <kernel@kyup.com>
      Cc: Herton R. Krzesinski <herton@redhat.com>
      Cc: Fabian Frederick <fabf@skynet.be>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2a1613a5
    • M
      ipc/sem.c: fix complex_count vs. simple op race · 5864a2fd
      Manfred Spraul 提交于
      Commit 6d07b68c ("ipc/sem.c: optimize sem_lock()") introduced a
      race:
      
      sem_lock has a fast path that allows parallel simple operations.
      There are two reasons why a simple operation cannot run in parallel:
       - a non-simple operations is ongoing (sma->sem_perm.lock held)
       - a complex operation is sleeping (sma->complex_count != 0)
      
      As both facts are stored independently, a thread can bypass the current
      checks by sleeping in the right positions.  See below for more details
      (or kernel bugzilla 105651).
      
      The patch fixes that by creating one variable (complex_mode)
      that tracks both reasons why parallel operations are not possible.
      
      The patch also updates stale documentation regarding the locking.
      
      With regards to stable kernels:
      The patch is required for all kernels that include the
      commit 6d07b68c ("ipc/sem.c: optimize sem_lock()") (3.10?)
      
      The alternative is to revert the patch that introduced the race.
      
      The patch is safe for backporting, i.e. it makes no assumptions
      about memory barriers in spin_unlock_wait().
      
      Background:
      Here is the race of the current implementation:
      
      Thread A: (simple op)
      - does the first "sma->complex_count == 0" test
      
      Thread B: (complex op)
      - does sem_lock(): This includes an array scan. But the scan can't
        find Thread A, because Thread A does not own sem->lock yet.
      - the thread does the operation, increases complex_count,
        drops sem_lock, sleeps
      
      Thread A:
      - spin_lock(&sem->lock), spin_is_locked(sma->sem_perm.lock)
      - sleeps before the complex_count test
      
      Thread C: (complex op)
      - does sem_lock (no array scan, complex_count==1)
      - wakes up Thread B.
      - decrements complex_count
      
      Thread A:
      - does the complex_count test
      
      Bug:
      Now both thread A and thread C operate on the same array, without
      any synchronization.
      
      Fixes: 6d07b68c ("ipc/sem.c: optimize sem_lock()")
      Link: http://lkml.kernel.org/r/1469123695-5661-1-git-send-email-manfred@colorfullife.com
      Reported-by: <felixh@informatik.uni-bremen.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: <1vier1@web.de>
      Cc: <stable@vger.kernel.org>	[3.10+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5864a2fd
  3. 03 8月, 2016 1 次提交
    • F
      sysv, ipc: fix security-layer leaking · 9b24fef9
      Fabian Frederick 提交于
      Commit 53dad6d3 ("ipc: fix race with LSMs") updated ipc_rcu_putref()
      to receive rcu freeing function but used generic ipc_rcu_free() instead
      of msg_rcu_free() which does security cleaning.
      
      Running LTP msgsnd06 with kmemleak gives the following:
      
        cat /sys/kernel/debug/kmemleak
      
        unreferenced object 0xffff88003c0a11f8 (size 8):
          comm "msgsnd06", pid 1645, jiffies 4294672526 (age 6.549s)
          hex dump (first 8 bytes):
            1b 00 00 00 01 00 00 00                          ........
          backtrace:
            kmemleak_alloc+0x23/0x40
            kmem_cache_alloc_trace+0xe1/0x180
            selinux_msg_queue_alloc_security+0x3f/0xd0
            security_msg_queue_alloc+0x2e/0x40
            newque+0x4e/0x150
            ipcget+0x159/0x1b0
            SyS_msgget+0x39/0x40
            entry_SYSCALL_64_fastpath+0x13/0x8f
      
      Manfred Spraul suggested to fix sem.c as well and Davidlohr Bueso to
      only use ipc_rcu_free in case of security allocation failure in newary()
      
      Fixes: 53dad6d3 ("ipc: fix race with LSMs")
      Link: http://lkml.kernel.org/r/1470083552-22966-1-git-send-email-fabf@skynet.beSigned-off-by: NFabian Frederick <fabf@skynet.be>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: <stable@vger.kernel.org>	[3.12+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9b24fef9
  4. 14 6月, 2016 2 次提交
  5. 23 3月, 2016 1 次提交
    • D
      ipc/sem: make semctl setting sempid consistent · a5f4db87
      Davidlohr Bueso 提交于
      As indicated by bug#112271, Linux sets the sempid value upon semctl, and
      not only for semop calls.  However, within semctl we only do this for
      SETVAL, leaving SETALL without updating the field, and therefore rather
      inconsistent behavior when compared to other Unices.
      
      There is really no documentation regarding this and therefore users
      should not make assumptions.  With this patch, along with updating
      semctl.2 manpages, this scenario should become less ambiguous As such,
      set sempid on SETALL cmd.
      
      Also update some in-code documentation, specifying where the sempid is
      set.
      
      Passes ltp and custom testcase where a child (fork) does SETALL to the
      set.
      Signed-off-by: NDavidlohr Bueso <dbueso@suse.de>
      Reported-by: NPhilip Semanchuk <linux_kernel.20.ick@spamgourmet.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: PrasannaKumar Muralidharan <prasannatsmkumar@gmail.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Herton R. Krzesinski <herton@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a5f4db87
  6. 23 1月, 2016 1 次提交
  7. 15 8月, 2015 3 次提交
    • M
      ipc/sem.c: update/correct memory barriers · 3ed1f8a9
      Manfred Spraul 提交于
      sem_lock() did not properly pair memory barriers:
      
      !spin_is_locked() and spin_unlock_wait() are both only control barriers.
      The code needs an acquire barrier, otherwise the cpu might perform read
      operations before the lock test.
      
      As no primitive exists inside <include/spinlock.h> and since it seems
      noone wants another primitive, the code creates a local primitive within
      ipc/sem.c.
      
      With regards to -stable:
      
      The change of sem_wait_array() is a bugfix, the change to sem_lock() is a
      nop (just a preprocessor redefinition to improve the readability).  The
      bugfix is necessary for all kernels that use sem_wait_array() (i.e.:
      starting from 3.10).
      Signed-off-by: NManfred Spraul <manfred@colorfullife.com>
      Reported-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Kirill Tkhai <ktkhai@parallels.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: <stable@vger.kernel.org>	[3.10+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3ed1f8a9
    • H
      ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem() · a9795584
      Herton R. Krzesinski 提交于
      After we acquire the sma->sem_perm lock in exit_sem(), we are protected
      against a racing IPC_RMID operation.  Also at that point, we are the last
      user of sem_undo_list.  Therefore it isn't required that we acquire or use
      ulp->lock.
      Signed-off-by: NHerton R. Krzesinski <herton@redhat.com>
      Acked-by: NManfred Spraul <manfred@colorfullife.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Rafael Aquini <aquini@redhat.com>
      CC: Aristeu Rozanski <aris@redhat.com>
      Cc: David Jeffery <djeffery@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a9795584
    • H
      ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits · 602b8593
      Herton R. Krzesinski 提交于
      The current semaphore code allows a potential use after free: in
      exit_sem we may free the task's sem_undo_list while there is still
      another task looping through the same semaphore set and cleaning the
      sem_undo list at freeary function (the task called IPC_RMID for the same
      semaphore set).
      
      For example, with a test program [1] running which keeps forking a lot
      of processes (which then do a semop call with SEM_UNDO flag), and with
      the parent right after removing the semaphore set with IPC_RMID, and a
      kernel built with CONFIG_SLAB, CONFIG_SLAB_DEBUG and
      CONFIG_DEBUG_SPINLOCK, you can easily see something like the following
      in the kernel log:
      
         Slab corruption (Not tainted): kmalloc-64 start=ffff88003b45c1c0, len=64
         000: 6b 6b 6b 6b 6b 6b 6b 6b 00 6b 6b 6b 6b 6b 6b 6b  kkkkkkkk.kkkkkkk
         010: ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff  ....kkkk........
         Prev obj: start=ffff88003b45c180, len=64
         000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
         010: ff ff ff ff ff ff ff ff c0 fb 01 37 00 88 ff ff  ...........7....
         Next obj: start=ffff88003b45c200, len=64
         000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
         010: ff ff ff ff ff ff ff ff 68 29 a7 3c 00 88 ff ff  ........h).<....
         BUG: spinlock wrong CPU on CPU#2, test/18028
         general protection fault: 0000 [#1] SMP
         Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
         CPU: 2 PID: 18028 Comm: test Not tainted 4.2.0-rc5+ #1
         Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
         RIP: spin_dump+0x53/0xc0
         Call Trace:
           spin_bug+0x30/0x40
           do_raw_spin_unlock+0x71/0xa0
           _raw_spin_unlock+0xe/0x10
           freeary+0x82/0x2a0
           ? _raw_spin_lock+0xe/0x10
           semctl_down.clone.0+0xce/0x160
           ? __do_page_fault+0x19a/0x430
           ? __audit_syscall_entry+0xa8/0x100
           SyS_semctl+0x236/0x2c0
           ? syscall_trace_leave+0xde/0x130
           entry_SYSCALL_64_fastpath+0x12/0x71
         Code: 8b 80 88 03 00 00 48 8d 88 60 05 00 00 48 c7 c7 a0 2c a4 81 31 c0 65 8b 15 eb 40 f3 7e e8 08 31 68 00 4d 85 e4 44 8b 4b 08 74 5e <45> 8b 84 24 88 03 00 00 49 8d 8c 24 60 05 00 00 8b 53 04 48 89
         RIP  [<ffffffff810d6053>] spin_dump+0x53/0xc0
          RSP <ffff88003750fd68>
         ---[ end trace 783ebb76612867a0 ]---
         NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [test:18053]
         Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
         CPU: 3 PID: 18053 Comm: test Tainted: G      D         4.2.0-rc5+ #1
         Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
         RIP: native_read_tsc+0x0/0x20
         Call Trace:
           ? delay_tsc+0x40/0x70
           __delay+0xf/0x20
           do_raw_spin_lock+0x96/0x140
           _raw_spin_lock+0xe/0x10
           sem_lock_and_putref+0x11/0x70
           SYSC_semtimedop+0x7bf/0x960
           ? handle_mm_fault+0xbf6/0x1880
           ? dequeue_task_fair+0x79/0x4a0
           ? __do_page_fault+0x19a/0x430
           ? kfree_debugcheck+0x16/0x40
           ? __do_page_fault+0x19a/0x430
           ? __audit_syscall_entry+0xa8/0x100
           ? do_audit_syscall_entry+0x66/0x70
           ? syscall_trace_enter_phase1+0x139/0x160
           SyS_semtimedop+0xe/0x10
           SyS_semop+0x10/0x20
           entry_SYSCALL_64_fastpath+0x12/0x71
         Code: 47 10 83 e8 01 85 c0 89 47 10 75 08 65 48 89 3d 1f 74 ff 7e c9 c3 0f 1f 44 00 00 55 48 89 e5 e8 87 17 04 00 66 90 c9 c3 0f 1f 00 <55> 48 89 e5 0f 31 89 c1 48 89 d0 48 c1 e0 20 89 c9 48 09 c8 c9
         Kernel panic - not syncing: softlockup: hung tasks
      
      I wasn't able to trigger any badness on a recent kernel without the
      proper config debugs enabled, however I have softlockup reports on some
      kernel versions, in the semaphore code, which are similar as above (the
      scenario is seen on some servers running IBM DB2 which uses semaphore
      syscalls).
      
      The patch here fixes the race against freeary, by acquiring or waiting
      on the sem_undo_list lock as necessary (exit_sem can race with freeary,
      while freeary sets un->semid to -1 and removes the same sem_undo from
      list_proc or when it removes the last sem_undo).
      
      After the patch I'm unable to reproduce the problem using the test case
      [1].
      
      [1] Test case used below:
      
          #include <stdio.h>
          #include <sys/types.h>
          #include <sys/ipc.h>
          #include <sys/sem.h>
          #include <sys/wait.h>
          #include <stdlib.h>
          #include <time.h>
          #include <unistd.h>
          #include <errno.h>
      
          #define NSEM 1
          #define NSET 5
      
          int sid[NSET];
      
          void thread()
          {
                  struct sembuf op;
                  int s;
                  uid_t pid = getuid();
      
                  s = rand() % NSET;
                  op.sem_num = pid % NSEM;
                  op.sem_op = 1;
                  op.sem_flg = SEM_UNDO;
      
                  semop(sid[s], &op, 1);
                  exit(EXIT_SUCCESS);
          }
      
          void create_set()
          {
                  int i, j;
                  pid_t p;
                  union {
                          int val;
                          struct semid_ds *buf;
                          unsigned short int *array;
                          struct seminfo *__buf;
                  } un;
      
                  /* Create and initialize semaphore set */
                  for (i = 0; i < NSET; i++) {
                          sid[i] = semget(IPC_PRIVATE , NSEM, 0644 | IPC_CREAT);
                          if (sid[i] < 0) {
                                  perror("semget");
                                  exit(EXIT_FAILURE);
                          }
                  }
                  un.val = 0;
                  for (i = 0; i < NSET; i++) {
                          for (j = 0; j < NSEM; j++) {
                                  if (semctl(sid[i], j, SETVAL, un) < 0)
                                          perror("semctl");
                          }
                  }
      
                  /* Launch threads that operate on semaphore set */
                  for (i = 0; i < NSEM * NSET * NSET; i++) {
                          p = fork();
                          if (p < 0)
                                  perror("fork");
                          if (p == 0)
                                  thread();
                  }
      
                  /* Free semaphore set */
                  for (i = 0; i < NSET; i++) {
                          if (semctl(sid[i], NSEM, IPC_RMID))
                                  perror("IPC_RMID");
                  }
      
                  /* Wait for forked processes to exit */
                  while (wait(NULL)) {
                          if (errno == ECHILD)
                                  break;
                  };
          }
      
          int main(int argc, char **argv)
          {
                  pid_t p;
      
                  srand(time(NULL));
      
                  while (1) {
                          p = fork();
                          if (p < 0) {
                                  perror("fork");
                                  exit(EXIT_FAILURE);
                          }
                          if (p == 0) {
                                  create_set();
                                  goto end;
                          }
      
                          /* Wait for forked processes to exit */
                          while (wait(NULL)) {
                                  if (errno == ECHILD)
                                          break;
                          };
                  }
          end:
                  return 0;
          }
      
      [akpm@linux-foundation.org: use normal comment layout]
      Signed-off-by: NHerton R. Krzesinski <herton@redhat.com>
      Acked-by: NManfred Spraul <manfred@colorfullife.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Rafael Aquini <aquini@redhat.com>
      CC: Aristeu Rozanski <aris@redhat.com>
      Cc: David Jeffery <djeffery@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      602b8593
  8. 01 7月, 2015 1 次提交
  9. 16 4月, 2015 1 次提交
  10. 18 2月, 2015 1 次提交
  11. 14 12月, 2014 1 次提交
  12. 04 12月, 2014 1 次提交
  13. 07 6月, 2014 9 次提交
  14. 28 1月, 2014 6 次提交
  15. 17 10月, 2013 1 次提交
    • M
      ipc/sem.c: synchronize semop and semctl with IPC_RMID · 6e224f94
      Manfred Spraul 提交于
      After acquiring the semlock spinlock, operations must test that the
      array is still valid.
      
       - semctl() and exit_sem() would walk stale linked lists (ugly, but
         should be ok: all lists are empty)
      
       - semtimedop() would sleep forever - and if woken up due to a signal -
         access memory after free.
      
      The patch also:
       - standardizes the tests for .deleted, so that all tests in one
         function leave the function with the same approach.
       - unconditionally tests for .deleted immediately after every call to
         sem_lock - even it it means that for semctl(GETALL), .deleted will be
         tested twice.
      
      Both changes make the review simpler: After every sem_lock, there must
      be a test of .deleted, followed by a goto to the cleanup code (if the
      function uses "goto cleanup").
      
      The only exception is semctl_down(): If sem_ids().rwsem is locked, then
      the presence in ids->ipcs_idr is equivalent to !.deleted, thus no
      additional test is required.
      Signed-off-by: NManfred Spraul <manfred@colorfullife.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Acked-by: NDavidlohr Bueso <davidlohr@hp.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6e224f94
  16. 01 10月, 2013 4 次提交
    • M
      ipc/sem.c: update sem_otime for all operations · 0e8c6656
      Manfred Spraul 提交于
      In commit 0a2b9d4c ("ipc/sem.c: move wake_up_process out of the
      spinlock section"), the update of semaphore's sem_otime(last semop time)
      was moved to one central position (do_smart_update).
      
      But since do_smart_update() is only called for operations that modify
      the array, this means that wait-for-zero semops do not update sem_otime
      anymore.
      
      The fix is simple:
      Non-alter operations must update sem_otime.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NManfred Spraul <manfred@colorfullife.com>
      Reported-by: NJia He <jiakernel@gmail.com>
      Tested-by: NJia He <jiakernel@gmail.com>
      Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0e8c6656
    • M
      ipc/sem.c: synchronize the proc interface · d8c63376
      Manfred Spraul 提交于
      The proc interface is not aware of sem_lock(), it instead calls
      ipc_lock_object() directly.  This means that simple semop() operations
      can run in parallel with the proc interface.  Right now, this is
      uncritical, because the implementation doesn't do anything that requires
      a proper synchronization.
      
      But it is dangerous and therefore should be fixed.
      Signed-off-by: NManfred Spraul <manfred@colorfullife.com>
      Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d8c63376
    • M
      ipc/sem.c: optimize sem_lock() · 6d07b68c
      Manfred Spraul 提交于
      Operations that need access to the whole array must guarantee that there
      are no simple operations ongoing.  Right now this is achieved by
      spin_unlock_wait(sem->lock) on all semaphores.
      
      If complex_count is nonzero, then this spin_unlock_wait() is not
      necessary, because it was already performed in the past by the thread
      that increased complex_count and even though sem_perm.lock was dropped
      inbetween, no simple operation could have started, because simple
      operations cannot start when complex_count is non-zero.
      Signed-off-by: NManfred Spraul <manfred@colorfullife.com>
      Cc: Mike Galbraith <bitbucket@online.de>
      Cc: Rik van Riel <riel@redhat.com>
      Reviewed-by: NDavidlohr Bueso <davidlohr@hp.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6d07b68c
    • M
      ipc/sem.c: fix race in sem_lock() · 5e9d5275
      Manfred Spraul 提交于
      The exclusion of complex operations in sem_lock() is insufficient: after
      acquiring the per-semaphore lock, a simple op must first check that
      sem_perm.lock is not locked and only after that test check
      complex_count.  The current code does it the other way around - and that
      creates a race.  Details are below.
      
      The patch is a complete rewrite of sem_lock(), based in part on the code
      from Mike Galbraith.  It removes all gotos and all loops and thus the
      risk of livelocks.
      
      I have tested the patch (together with the next one) on my i3 laptop and
      it didn't cause any problems.
      
      The bug is probably also present in 3.10 and 3.11, but for these kernels
      it might be simpler just to move the test of sma->complex_count after
      the spin_is_locked() test.
      
      Details of the bug:
      
      Assume:
       - sma->complex_count = 0.
       - Thread 1: semtimedop(complex op that must sleep)
       - Thread 2: semtimedop(simple op).
      
      Pseudo-Trace:
      
      Thread 1: sem_lock(): acquire sem_perm.lock
      Thread 1: sem_lock(): check for ongoing simple ops
      			Nothing ongoing, thread 2 is still before sem_lock().
      Thread 1: try_atomic_semop()
      	<<< preempted.
      
      Thread 2: sem_lock():
              static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
                                            int nsops)
              {
                      int locknum;
               again:
                      if (nsops == 1 && !sma->complex_count) {
                              struct sem *sem = sma->sem_base + sops->sem_num;
      
                              /* Lock just the semaphore we are interested in. */
                              spin_lock(&sem->lock);
      
                              /*
                               * If sma->complex_count was set while we were spinning,
                               * we may need to look at things we did not lock here.
                               */
                              if (unlikely(sma->complex_count)) {
                                      spin_unlock(&sem->lock);
                                      goto lock_array;
                              }
              <<<<<<<<<
      	<<< complex_count is still 0.
      	<<<
              <<< Here it is preempted
              <<<<<<<<<
      
      Thread 1: try_atomic_semop() returns, notices that it must sleep.
      Thread 1: increases sma->complex_count.
      Thread 1: drops sem_perm.lock
      Thread 2:
                      /*
                       * Another process is holding the global lock on the
                       * sem_array; we cannot enter our critical section,
                       * but have to wait for the global lock to be released.
                       */
                      if (unlikely(spin_is_locked(&sma->sem_perm.lock))) {
                              spin_unlock(&sem->lock);
                              spin_unlock_wait(&sma->sem_perm.lock);
                              goto again;
                      }
      	<<< sem_perm.lock already dropped, thus no "goto again;"
      
                      locknum = sops->sem_num;
      Signed-off-by: NManfred Spraul <manfred@colorfullife.com>
      Cc: Mike Galbraith <bitbucket@online.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: <stable@vger.kernel.org>	[3.10+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5e9d5275
  17. 25 9月, 2013 1 次提交
    • D
      ipc: fix race with LSMs · 53dad6d3
      Davidlohr Bueso 提交于
      Currently, IPC mechanisms do security and auditing related checks under
      RCU.  However, since security modules can free the security structure,
      for example, through selinux_[sem,msg_queue,shm]_free_security(), we can
      race if the structure is freed before other tasks are done with it,
      creating a use-after-free condition.  Manfred illustrates this nicely,
      for instance with shared mem and selinux:
      
       -> do_shmat calls rcu_read_lock()
       -> do_shmat calls shm_object_check().
           Checks that the object is still valid - but doesn't acquire any locks.
           Then it returns.
       -> do_shmat calls security_shm_shmat (e.g. selinux_shm_shmat)
       -> selinux_shm_shmat calls ipc_has_perm()
       -> ipc_has_perm accesses ipc_perms->security
      
      shm_close()
       -> shm_close acquires rw_mutex & shm_lock
       -> shm_close calls shm_destroy
       -> shm_destroy calls security_shm_free (e.g. selinux_shm_free_security)
       -> selinux_shm_free_security calls ipc_free_security(&shp->shm_perm)
       -> ipc_free_security calls kfree(ipc_perms->security)
      
      This patch delays the freeing of the security structures after all RCU
      readers are done.  Furthermore it aligns the security life cycle with
      that of the rest of IPC - freeing them based on the reference counter.
      For situations where we need not free security, the current behavior is
      kept.  Linus states:
      
       "... the old behavior was suspect for another reason too: having the
        security blob go away from under a user sounds like it could cause
        various other problems anyway, so I think the old code was at least
        _prone_ to bugs even if it didn't have catastrophic behavior."
      
      I have tested this patch with IPC testcases from LTP on both my
      quad-core laptop and on a 64 core NUMA server.  In both cases selinux is
      enabled, and tests pass for both voluntary and forced preemption models.
      While the mentioned races are theoretical (at least no one as reported
      them), I wanted to make sure that this new logic doesn't break anything
      we weren't aware of.
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NDavidlohr Bueso <davidlohr@hp.com>
      Acked-by: NManfred Spraul <manfred@colorfullife.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      53dad6d3
  18. 12 9月, 2013 1 次提交