1. 28 Jan 2014 (1 commit)
    • ipc/sem.c: avoid overflow of semop undo (semadj) value · 78f5009c
      Petr Mladek committed
      While trying to understand the semop code, I found a small mistake in
      the check for semadj (undo) value overflow.  The new undo value is not
      stored immediately, so subsequent checks are done against the old
      value.

      The failing scenario is not very practical.  One semop call has to do
      more than one operation on the same semaphore.  Also, semval and
      semadj must have different values, so there have to be some operations
      without the SEM_UNDO flag.  For example:
      
      	struct sembuf depositor_op[1];
      	struct sembuf collector_op[2];
      
      	depositor_op[0].sem_num = 0;
      	depositor_op[0].sem_op = 20000;
      	depositor_op[0].sem_flg = 0;
      
      	collector_op[0].sem_num = 0;
      	collector_op[0].sem_op = -10000;
      	collector_op[0].sem_flg = SEM_UNDO;
      	collector_op[1].sem_num = 0;
      	collector_op[1].sem_op = -10000;
      	collector_op[1].sem_flg = SEM_UNDO;
      
      	if (semop(semid, depositor_op, 1) == -1)
      		{ perror("Failed to do 1st deposit"); return 1; }
      
      	if (semop(semid, collector_op, 2) == -1)
      		{ perror("Failed to do 1st collect"); return 1; }
      
      	if (semop(semid, depositor_op, 1) == -1)
      		{ perror("Failed to do 2nd deposit"); return 1; }
      
      	if (semop(semid, collector_op, 2) == -1)
      		{ perror("Failed to do 2nd collect"); return 1; }
      
      	return 0;
      
      It passes without error now, but the semadj value has overflowed in
      the 2nd collector operation.
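
      For completeness, the fragment above assumes a semid created earlier in
      the same main().  A minimal, hypothetical setup (not part of the
      original report) would be:

          #include <stdio.h>
          #include <sys/ipc.h>
          #include <sys/sem.h>

          int main(void)
          {
              int semid;

              /* one private semaphore; on Linux a new semaphore starts at 0 */
              semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
              if (semid == -1)
                  { perror("Failed to create semaphore"); return 1; }

              /* ... depositor/collector code from above goes here ... */
          }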
      
      [akpm@linux-foundation.org: restore lessened scope of local `undo']
      [davidlohr@hp.com: correct header comment for perform_atomic_semop]
      Signed-off-by: Petr Mladek <pmladek@suse.cz>
      Acked-by: Davidlohr Bueso <davidlohr@hp.com>
      Acked-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 17 Oct 2013 (1 commit)
    • ipc/sem.c: synchronize semop and semctl with IPC_RMID · 6e224f94
      Manfred Spraul committed
      After acquiring the semlock spinlock, operations must test that the
      array is still valid.
      
       - semctl() and exit_sem() would walk stale linked lists (ugly, but
         should be ok: all lists are empty)
      
       - semtimedop() would sleep forever - and if woken up due to a signal -
         access memory after free.
      
      The patch also:
       - standardizes the tests for .deleted, so that all tests in one
         function leave the function with the same approach.
       - unconditionally tests for .deleted immediately after every call to
         sem_lock - even if it means that for semctl(GETALL), .deleted will be
         tested twice.
      
      Both changes make the review simpler: After every sem_lock, there must
      be a test of .deleted, followed by a goto to the cleanup code (if the
      function uses "goto cleanup").
      
      The only exception is semctl_down(): If sem_ids().rwsem is locked, then
      the presence in ids->ipcs_idr is equivalent to !.deleted, thus no
      additional test is required.
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Acked-by: Davidlohr Bueso <davidlohr@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
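
      The userspace-visible contract being protected here is easy to state: a
      task sleeping in semop() must be woken with EIDRM when the array is
      removed, instead of sleeping forever or touching freed memory.  A small
      hypothetical test program (not part of the patch) that exercises it:

          #include <errno.h>
          #include <stdio.h>
          #include <unistd.h>
          #include <sys/types.h>
          #include <sys/ipc.h>
          #include <sys/sem.h>
          #include <sys/wait.h>

          int main(void)
          {
              int semid, status;
              struct sembuf dec = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };

              /* one private semaphore; on Linux its value starts at 0 */
              semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
              if (semid == -1) {
                  perror("semget");
                  return 1;
              }

              if (fork() == 0) {
                  /* child blocks: semval is 0, so -1 cannot proceed */
                  if (semop(semid, &dec, 1) == -1 && errno == EIDRM)
                      _exit(0);        /* woken up cleanly by IPC_RMID */
                  _exit(1);
              }

              sleep(1);                        /* let the child fall asleep */
              semctl(semid, 0, IPC_RMID);      /* remove the array under it */

              wait(&status);
              printf("sleeper woke with EIDRM: %s\n",
                     WIFEXITED(status) && WEXITSTATUS(status) == 0 ? "yes" : "no");
              return 0;
          }
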
  3. 01 Oct 2013 (4 commits)
    • ipc/sem.c: update sem_otime for all operations · 0e8c6656
      Manfred Spraul committed
      In commit 0a2b9d4c ("ipc/sem.c: move wake_up_process out of the
      spinlock section"), the update of the semaphore's sem_otime (the time
      of the last semop) was moved to one central position (do_smart_update).
      
      But since do_smart_update() is only called for operations that modify
      the array, this means that wait-for-zero semops do not update sem_otime
      anymore.
      
      The fix is simple:
      Non-alter operations must update sem_otime.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Reported-by: Jia He <jiakernel@gmail.com>
      Tested-by: Jia He <jiakernel@gmail.com>
      Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
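
      The regression is visible from userspace through semctl(IPC_STAT):
      after a wait-for-zero semop(), sem_otime should be updated just as it
      is after an altering operation.  A small hypothetical check (not part
      of the patch, error handling mostly omitted):

          #include <stdio.h>
          #include <sys/ipc.h>
          #include <sys/sem.h>

          /* on Linux the caller has to define union semun itself */
          union semun {
              int val;
              struct semid_ds *buf;
              unsigned short *array;
          };

          int main(void)
          {
              int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
              struct sembuf wait_for_zero = { .sem_num = 0, .sem_op = 0, .sem_flg = 0 };
              struct semid_ds ds;
              union semun arg = { .buf = &ds };

              /* semval is 0, so this non-alter op returns immediately ... */
              semop(semid, &wait_for_zero, 1);

              /* ... and, with the fix, it still bumps sem_otime */
              semctl(semid, 0, IPC_STAT, arg);
              printf("sem_otime after wait-for-zero: %ld (0 means it was never set)\n",
                     (long)ds.sem_otime);

              semctl(semid, 0, IPC_RMID);
              return 0;
          }
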
    • ipc/sem.c: synchronize the proc interface · d8c63376
      Manfred Spraul committed
      The proc interface is not aware of sem_lock(); it calls
      ipc_lock_object() directly.  This means that simple semop() operations
      can run in parallel with the proc interface.  Right now this is not
      critical, because the implementation doesn't do anything that requires
      proper synchronization.
      
      But it is dangerous and therefore should be fixed.
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc/sem.c: optimize sem_lock() · 6d07b68c
      Manfred Spraul committed
      Operations that need access to the whole array must guarantee that there
      are no simple operations ongoing.  Right now this is achieved by
      spin_unlock_wait(sem->lock) on all semaphores.
      
      If complex_count is nonzero, then this spin_unlock_wait() is not
      necessary: it was already performed in the past by the thread that
      increased complex_count, and even though sem_perm.lock was dropped in
      between, no simple operation could have started, because simple
      operations cannot start while complex_count is non-zero.
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Mike Galbraith <bitbucket@online.de>
      Cc: Rik van Riel <riel@redhat.com>
      Reviewed-by: Davidlohr Bueso <davidlohr@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc/sem.c: fix race in sem_lock() · 5e9d5275
      Manfred Spraul committed
      The exclusion of complex operations in sem_lock() is insufficient:
      after acquiring the per-semaphore lock, a simple op must first check
      that sem_perm.lock is not locked, and only then check complex_count.
      The current code does it the other way around - and that creates a
      race.  Details are below.
      
      The patch is a complete rewrite of sem_lock(), based in part on the code
      from Mike Galbraith.  It removes all gotos and all loops and thus the
      risk of livelocks.
      
      I have tested the patch (together with the next one) on my i3 laptop and
      it didn't cause any problems.
      
      The bug is probably also present in 3.10 and 3.11, but for these kernels
      it might be simpler just to move the test of sma->complex_count after
      the spin_is_locked() test.
      
      Details of the bug:
      
      Assume:
       - sma->complex_count = 0.
       - Thread 1: semtimedop(complex op that must sleep)
       - Thread 2: semtimedop(simple op).
      
      Pseudo-Trace:
      
      Thread 1: sem_lock(): acquire sem_perm.lock
      Thread 1: sem_lock(): check for ongoing simple ops
      			Nothing ongoing, thread 2 is still before sem_lock().
      Thread 1: try_atomic_semop()
      	<<< preempted.
      
      Thread 2: sem_lock():
              static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
                                            int nsops)
              {
                      int locknum;
               again:
                      if (nsops == 1 && !sma->complex_count) {
                              struct sem *sem = sma->sem_base + sops->sem_num;
      
                              /* Lock just the semaphore we are interested in. */
                              spin_lock(&sem->lock);
      
                              /*
                               * If sma->complex_count was set while we were spinning,
                               * we may need to look at things we did not lock here.
                               */
                              if (unlikely(sma->complex_count)) {
                                      spin_unlock(&sem->lock);
                                      goto lock_array;
                              }
              <<<<<<<<<
      	<<< complex_count is still 0.
      	<<<
              <<< Here it is preempted
              <<<<<<<<<
      
      Thread 1: try_atomic_semop() returns, notices that it must sleep.
      Thread 1: increases sma->complex_count.
      Thread 1: drops sem_perm.lock
      Thread 2:
                      /*
                       * Another process is holding the global lock on the
                       * sem_array; we cannot enter our critical section,
                       * but have to wait for the global lock to be released.
                       */
                      if (unlikely(spin_is_locked(&sma->sem_perm.lock))) {
                              spin_unlock(&sem->lock);
                              spin_unlock_wait(&sma->sem_perm.lock);
                              goto again;
                      }
      	<<< sem_perm.lock already dropped, thus no "goto again;"
      
                      locknum = sops->sem_num;
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Mike Galbraith <bitbucket@online.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: <stable@vger.kernel.org>    [3.10+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
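
      For reference, "simple" and "complex" refer only to the shape of the
      caller's request: a single sembuf on a single semaphore is a simple op
      and can take just that semaphore's lock, while an op touching more than
      one semaphore is complex and takes the whole array.  A hypothetical
      userspace stress sketch that mixes both kinds (with correct locking,
      both semaphores must end up at 0; build with -lpthread):

          #include <pthread.h>
          #include <stdio.h>
          #include <sys/ipc.h>
          #include <sys/sem.h>

          static int semid;

          static void *simple_worker(void *arg)
          {
              /* simple ops: one sembuf on semaphore 0 */
              struct sembuf up   = { .sem_num = 0, .sem_op =  1, .sem_flg = 0 };
              struct sembuf down = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };

              (void)arg;
              for (int i = 0; i < 100000; i++) {
                  semop(semid, &up, 1);
                  semop(semid, &down, 1);
              }
              return NULL;
          }

          static void *complex_worker(void *arg)
          {
              /* complex ops: one atomic semop touching both semaphores */
              struct sembuf both_up[2] = {
                  { .sem_num = 0, .sem_op =  1, .sem_flg = 0 },
                  { .sem_num = 1, .sem_op =  1, .sem_flg = 0 },
              };
              struct sembuf both_down[2] = {
                  { .sem_num = 0, .sem_op = -1, .sem_flg = 0 },
                  { .sem_num = 1, .sem_op = -1, .sem_flg = 0 },
              };

              (void)arg;
              for (int i = 0; i < 100000; i++) {
                  semop(semid, both_up, 2);
                  semop(semid, both_down, 2);
              }
              return NULL;
          }

          int main(void)
          {
              pthread_t t1, t2;

              semid = semget(IPC_PRIVATE, 2, IPC_CREAT | 0600);
              pthread_create(&t1, NULL, simple_worker, NULL);
              pthread_create(&t2, NULL, complex_worker, NULL);
              pthread_join(t1, NULL);
              pthread_join(t2, NULL);

              printf("final semvals: %d %d (both should be 0)\n",
                     semctl(semid, 0, GETVAL), semctl(semid, 1, GETVAL));
              semctl(semid, 0, IPC_RMID);
              return 0;
          }
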
  4. 25 Sep 2013 (1 commit)
    • ipc: fix race with LSMs · 53dad6d3
      Davidlohr Bueso committed
      Currently, IPC mechanisms do security and auditing related checks under
      RCU.  However, since security modules can free the security structure,
      for example, through selinux_[sem,msg_queue,shm]_free_security(), we can
      race if the structure is freed before other tasks are done with it,
      creating a use-after-free condition.  Manfred illustrates this nicely,
      for instance with shared mem and selinux:
      
       -> do_shmat calls rcu_read_lock()
       -> do_shmat calls shm_object_check().
           Checks that the object is still valid - but doesn't acquire any locks.
           Then it returns.
       -> do_shmat calls security_shm_shmat (e.g. selinux_shm_shmat)
       -> selinux_shm_shmat calls ipc_has_perm()
       -> ipc_has_perm accesses ipc_perms->security
      
      shm_close()
       -> shm_close acquires rw_mutex & shm_lock
       -> shm_close calls shm_destroy
       -> shm_destroy calls security_shm_free (e.g. selinux_shm_free_security)
       -> selinux_shm_free_security calls ipc_free_security(&shp->shm_perm)
       -> ipc_free_security calls kfree(ipc_perms->security)
      
      This patch delays the freeing of the security structures until all RCU
      readers are done.  Furthermore it aligns the security life cycle with
      that of the rest of IPC - freeing them based on the reference counter.
      For situations where we need not free security, the current behavior is
      kept.  Linus states:
      
       "... the old behavior was suspect for another reason too: having the
        security blob go away from under a user sounds like it could cause
        various other problems anyway, so I think the old code was at least
        _prone_ to bugs even if it didn't have catastrophic behavior."
      
      I have tested this patch with IPC testcases from LTP on both my
      quad-core laptop and on a 64 core NUMA server.  In both cases selinux is
      enabled, and tests pass for both voluntary and forced preemption models.
      While the mentioned races are theoretical (at least no one has reported
      them), I wanted to make sure that this new logic doesn't break anything
      we weren't aware of.
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
      Acked-by: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 12 Sep 2013 (1 commit)
  6. 10 Jul 2013 (8 commits)
  7. 27 May 2013 (1 commit)
    • ipc/sem.c: Fix missing wakeups in do_smart_update_queue() · ab465df9
      Manfred Spraul committed
      do_smart_update_queue() is called when an operation (semop,
      semctl(SETVAL), semctl(SETALL), ...) modified the array.  It must check
      which of the sleeping tasks can proceed.
      
      do_smart_update_queue() missed a few wakeups:
       - if a sleeping complex op was completed, then all per-semaphore queues
         must be scanned - not only those that were modified by *sops
       - if a sleeping simple op proceeded, then the global queue must be
         scanned again
      
      And:
       - the test for "sops == NULL" before scanning the global queue is not
         required: if the global queue is empty, then it doesn't need to be
         scanned - regardless of the reason for calling do_smart_update_queue()
      
      The patch is not optimized, i.e.  even completing a wait-for-zero
      operation causes a rescan.  This is done to keep the patch as simple as
      possible.
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
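
      The userspace-visible requirement behind these rescans is simply that a
      sleeping operation is woken as soon as it can proceed.  A tiny
      hypothetical illustration with a wait-for-zero sleeper (not the
      kernel-side queue logic itself; error checking omitted):

          #include <stdio.h>
          #include <unistd.h>
          #include <sys/ipc.h>
          #include <sys/sem.h>
          #include <sys/wait.h>

          union semun { int val; };

          int main(void)
          {
              int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
              struct sembuf wait_for_zero = { .sem_num = 0, .sem_op =  0, .sem_flg = 0 };
              struct sembuf dec           = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
              union semun arg = { .val = 1 };

              semctl(semid, 0, SETVAL, arg);          /* semval = 1 */

              if (fork() == 0) {
                  semop(semid, &wait_for_zero, 1);    /* sleeps until semval == 0 */
                  printf("wait-for-zero sleeper woke up\n");
                  _exit(0);
              }

              sleep(1);
              semop(semid, &dec, 1);                  /* 1 -> 0: must wake the sleeper */
              wait(NULL);
              semctl(semid, 0, IPC_RMID);
              return 0;
          }
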
  8. 10 May 2013 (2 commits)
  9. 05 May 2013 (7 commits)
  10. 03 May 2013 (1 commit)
  11. 01 May 2013 (4 commits)
    • ipc,sem: fine grained locking for semtimedop · 6062a8dc
      Rik van Riel committed
      Introduce finer grained locking for semtimedop, to handle the common case
      of a program wanting to manipulate one semaphore from an array with
      multiple semaphores.
      
      If the call is a semop manipulating just one semaphore in an array with
      multiple semaphores, only take the lock for that semaphore itself.
      
      If the call needs to manipulate multiple semaphores, or another caller is
      in a transaction that manipulates multiple semaphores, the sem_array lock
      is taken, as well as all the locks for the individual semaphores.
      
      On a 24 CPU system, performance numbers with the semop-multi test
      (N threads and N semaphores) look like this:
      
      	vanilla		Davidlohr's	Davidlohr's +	Davidlohr's +
      threads			patches		rwlock patches	v3 patches
      10	610652		726325		1783589		2142206
      20	341570		365699		1520453		1977878
      30	288102		307037		1498167		2037995
      40	290714		305955		1612665		2256484
      50	288620		312890		1733453		2650292
      60	289987		306043		1649360		2388008
      70	291298		306347		1723167		2717486
      80	290948		305662		1729545		2763582
      90	290996		306680		1736021		2757524
      100	292243		306700		1773700		3059159
      
      [davidlohr.bueso@hp.com: do not call sem_lock when bogus sma]
      [davidlohr.bueso@hp.com: make refcounter atomic]
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Chegu Vinod <chegu_vinod@hp.com>
      Cc: Jason Low <jason.low2@hp.com>
      Reviewed-by: Michel Lespinasse <walken@google.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Tested-by: Emmanuel Benisty <benisty.e@gmail.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
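
      For context, the benchmark puts N threads on an array of N semaphores,
      each thread hammering its own semaphore with single-sembuf (simple)
      ops, which is exactly the case the per-semaphore lock helps.  A rough,
      hypothetical sketch in that spirit (not Rik's actual semop-multi
      source; build with -lpthread):

          #include <pthread.h>
          #include <stdio.h>
          #include <unistd.h>
          #include <sys/ipc.h>
          #include <sys/sem.h>

          #define NTHREADS 10
          #define SECONDS  10

          static int semid;
          static volatile int stop;
          static long counts[NTHREADS];

          static void *worker(void *arg)
          {
              long idx = (long)arg;
              struct sembuf up   = { .sem_num = idx, .sem_op =  1, .sem_flg = 0 };
              struct sembuf down = { .sem_num = idx, .sem_op = -1, .sem_flg = 0 };

              while (!stop) {
                  semop(semid, &up, 1);      /* simple op on this thread's semaphore */
                  semop(semid, &down, 1);
                  counts[idx] += 2;
              }
              return NULL;
          }

          int main(void)
          {
              pthread_t threads[NTHREADS];
              long total = 0;

              /* one array, one semaphore per thread: every op is "simple" */
              semid = semget(IPC_PRIVATE, NTHREADS, IPC_CREAT | 0600);

              for (long i = 0; i < NTHREADS; i++)
                  pthread_create(&threads[i], NULL, worker, (void *)i);

              sleep(SECONDS);                /* measurement window */
              stop = 1;

              for (int i = 0; i < NTHREADS; i++) {
                  pthread_join(threads[i], NULL);
                  total += counts[i];
              }
              printf("ops/sec: %ld\n", total / SECONDS);

              semctl(semid, 0, IPC_RMID);
              return 0;
          }
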
    • ipc,sem: have only one list in struct sem_queue · 9f1bc2c9
      Rik van Riel committed
      Having only one list in struct sem_queue, and only queueing simple
      semaphore operations on the list for the semaphore involved, allows us to
      introduce finer grained locking for semtimedop.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Chegu Vinod <chegu_vinod@hp.com>
      Cc: Emmanuel Benisty <benisty.e@gmail.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc,sem: open code and rename sem_lock · c460b662
      Rik van Riel committed
      Rename sem_lock() to sem_obtain_lock(), so we can introduce a sem_lock()
      later that only locks the sem_array and does nothing else.
      
      Open code the locking from ipc_lock() in sem_obtain_lock() so we can
      introduce finer grained locking for the sem_array in the next patch.
      
      [akpm@linux-foundation.org: propagate the ipc_obtain_object() errno out of sem_obtain_lock()]
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Chegu Vinod <chegu_vinod@hp.com>
      Cc: Emmanuel Benisty <benisty.e@gmail.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc,sem: do not hold ipc lock more than necessary · 16df3674
      Davidlohr Bueso committed
      Instead of holding the ipc lock while doing permission and security
      checks, among other things, only acquire it when necessary.
      
      Some numbers....
      
      1) With Rik's semop-multi.c microbenchmark we can see the following
         results:
      
      Baseline (3.9-rc1):
      cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
      total operations: 151452270, ops/sec 5048409
      
      +  59.40%            a.out  [kernel.kallsyms]  [k] _raw_spin_lock
      +   6.14%            a.out  [kernel.kallsyms]  [k] sys_semtimedop
      +   3.84%            a.out  [kernel.kallsyms]  [k] avc_has_perm_flags
      +   3.64%            a.out  [kernel.kallsyms]  [k] __audit_syscall_exit
      +   2.06%            a.out  [kernel.kallsyms]  [k] copy_user_enhanced_fast_string
      +   1.86%            a.out  [kernel.kallsyms]  [k] ipc_lock
      
      With this patchset:
      cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
      total operations: 273156400, ops/sec 9105213
      
      +  18.54%            a.out  [kernel.kallsyms]  [k] _raw_spin_lock
      +  11.72%            a.out  [kernel.kallsyms]  [k] sys_semtimedop
      +   7.70%            a.out  [kernel.kallsyms]  [k] ipc_has_perm.isra.21
      +   6.58%            a.out  [kernel.kallsyms]  [k] avc_has_perm_flags
      +   6.54%            a.out  [kernel.kallsyms]  [k] __audit_syscall_exit
      +   4.71%            a.out  [kernel.kallsyms]  [k] ipc_obtain_object_check
      
      2) While on an Oracle swingbench DSS (data mining) workload the
         improvements are not as exciting as with Rik's benchmark, we can see
         some positive numbers.  For an 8 socket machine the following are the
         percentages of %sys time incurred in the ipc lock:
      
      Baseline (3.9-rc1):
      100 swingbench users: 8.74%
      400 swingbench users: 21.86%
      800 swingbench users: 84.35%
      
      With this patchset:
      100 swingbench users: 8.11%
      400 swingbench users: 19.93%
      800 swingbench users: 77.69%
      
      [riel@redhat.com: fix two locking bugs]
      [sasha.levin@oracle.com: prevent releasing RCU read lock twice in semctl_main]
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
      Acked-by: Michel Lespinasse <walken@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Emmanuel Benisty <benisty.e@gmail.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 06 Mar 2013 (1 commit)
  13. 04 Mar 2013 (1 commit)
  14. 07 Sep 2012 (1 commit)
  15. 03 Nov 2011 (3 commits)
  16. 26 Jul 2011 (1 commit)
  17. 21 Jul 2011 (1 commit)
  18. 31 Mar 2011 (1 commit)