1. 01 Oct, 2013 1 commit
    • M
      ipc/sem.c: fix race in sem_lock() · 5e9d5275
      Authored by Manfred Spraul
      The exclusion of complex operations in sem_lock() is insufficient: after
      acquiring the per-semaphore lock, a simple op must first check that
      sem_perm.lock is not locked and only after that test check
      complex_count.  The current code does it the other way around - and that
      creates a race.  Details are below.
      
      The patch is a complete rewrite of sem_lock(), based in part on the code
      from Mike Galbraith.  It removes all gotos and all loops and thus the
      risk of livelocks.
      
      I have tested the patch (together with the next one) on my i3 laptop and
      it didn't cause any problems.
      
      The bug is probably also present in 3.10 and 3.11, but for these kernels
      it might be simpler just to move the test of sma->complex_count after
      the spin_is_locked() test.
      
      Details of the bug:
      
      Assume:
       - sma->complex_count = 0.
       - Thread 1: semtimedop(complex op that must sleep)
       - Thread 2: semtimedop(simple op).
      
      Pseudo-Trace:
      
      Thread 1: sem_lock(): acquire sem_perm.lock
      Thread 1: sem_lock(): check for ongoing simple ops
      			Nothing ongoing, thread 2 is still before sem_lock().
      Thread 1: try_atomic_semop()
      	<<< preempted.
      
      Thread 2: sem_lock():
              static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
                                            int nsops)
              {
                      int locknum;
               again:
                      if (nsops == 1 && !sma->complex_count) {
                              struct sem *sem = sma->sem_base + sops->sem_num;
      
                              /* Lock just the semaphore we are interested in. */
                              spin_lock(&sem->lock);
      
                              /*
                               * If sma->complex_count was set while we were spinning,
                               * we may need to look at things we did not lock here.
                               */
                              if (unlikely(sma->complex_count)) {
                                      spin_unlock(&sem->lock);
                                      goto lock_array;
                              }
              <<<<<<<<<
      	<<< complex_count is still 0.
      	<<<
              <<< Here it is preempted
              <<<<<<<<<
      
      Thread 1: try_atomic_semop() returns, notices that it must sleep.
      Thread 1: increases sma->complex_count.
      Thread 1: drops sem_perm.lock
      Thread 2:
                      /*
                       * Another process is holding the global lock on the
                       * sem_array; we cannot enter our critical section,
                       * but have to wait for the global lock to be released.
                       */
                      if (unlikely(spin_is_locked(&sma->sem_perm.lock))) {
                              spin_unlock(&sem->lock);
                              spin_unlock_wait(&sma->sem_perm.lock);
                              goto again;
                      }
      	<<< sem_perm.lock already dropped, thus no "goto again;"
      
                      locknum = sops->sem_num;
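
The check ordering described at the top of this message can be sketched in userspace. This is a minimal model, not the kernel patch: every `toy_` name is hypothetical, the spinlock is a plain C11 test-and-set flag, and the slow path leaves out the step where the real code waits for in-flight simple operations to drain.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_NSEMS 4

/* Minimal test-and-set spinlock; a stand-in for the kernel's spinlock_t. */
typedef struct { atomic_bool held; } toy_spinlock;

static void toy_spin_lock(toy_spinlock *l)
{
	while (atomic_exchange(&l->held, true))
		;	/* spin until the previous value was "unlocked" */
}

static void toy_spin_unlock(toy_spinlock *l)
{
	atomic_store(&l->held, false);
}

static bool toy_spin_is_locked(toy_spinlock *l)
{
	return atomic_load(&l->held);
}

/* Hypothetical, heavily reduced model of struct sem_array. */
struct toy_sem_array {
	toy_spinlock perm_lock;			/* models sem_perm.lock */
	atomic_int complex_count;		/* sleeping complex ops */
	toy_spinlock sem_lock[TOY_NSEMS];	/* models sem->lock */
};

/*
 * Returns sem_num if only the per-semaphore lock was taken,
 * or -1 if the array-wide lock was taken instead.
 */
static int toy_sem_lock(struct toy_sem_array *sma, int sem_num, int nsops)
{
	assert(sem_num >= 0 && sem_num < TOY_NSEMS);

	if (nsops == 1 && atomic_load(&sma->complex_count) == 0) {
		toy_spinlock *sl = &sma->sem_lock[sem_num];

		toy_spin_lock(sl);
		/* Step 1: is a complex op holding the array-wide lock? */
		if (!toy_spin_is_locked(&sma->perm_lock)) {
			/* Step 2: only now is complex_count meaningful. */
			if (atomic_load(&sma->complex_count) == 0)
				return sem_num;
		}
		toy_spin_unlock(sl);
	}

	/* Slow path (simplified): fall back to the array-wide lock. */
	toy_spin_lock(&sma->perm_lock);
	return -1;
}
```

With a zero-initialized `struct toy_sem_array`, a single-operation call takes the fast path and returns the semaphore index. Once `complex_count` is non-zero, the same call releases the per-semaphore lock and returns -1 with the array-wide lock held.
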
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Mike Galbraith <bitbucket@online.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: <stable@vger.kernel.org>	[3.10+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 25 Sep, 2013 1 commit
    • D
      ipc: fix race with LSMs · 53dad6d3
      Authored by Davidlohr Bueso
      Currently, IPC mechanisms do security and auditing related checks under
      RCU.  However, since security modules can free the security structure,
      for example, through selinux_[sem,msg_queue,shm]_free_security(), we can
      race if the structure is freed before other tasks are done with it,
      creating a use-after-free condition.  Manfred illustrates this nicely,
      for instance with shared mem and selinux:
      
       -> do_shmat calls rcu_read_lock()
       -> do_shmat calls shm_object_check().
           Checks that the object is still valid - but doesn't acquire any locks.
           Then it returns.
       -> do_shmat calls security_shm_shmat (e.g. selinux_shm_shmat)
       -> selinux_shm_shmat calls ipc_has_perm()
       -> ipc_has_perm accesses ipc_perms->security
      
      shm_close()
       -> shm_close acquires rw_mutex & shm_lock
       -> shm_close calls shm_destroy
       -> shm_destroy calls security_shm_free (e.g. selinux_shm_free_security)
       -> selinux_shm_free_security calls ipc_free_security(&shp->shm_perm)
       -> ipc_free_security calls kfree(ipc_perms->security)
      
      This patch delays the freeing of the security structures until all RCU
      readers are done.  Furthermore it aligns the security life cycle with
      that of the rest of IPC - freeing them based on the reference counter.
      For situations where we need not free security, the current behavior is
      kept.  Linus states:
      
       "... the old behavior was suspect for another reason too: having the
        security blob go away from under a user sounds like it could cause
        various other problems anyway, so I think the old code was at least
        _prone_ to bugs even if it didn't have catastrophic behavior."
      
      I have tested this patch with IPC testcases from LTP on both my
      quad-core laptop and on a 64 core NUMA server.  In both cases selinux is
      enabled, and tests pass for both voluntary and forced preemption models.
      While the mentioned races are theoretical (at least no one has reported
      them), I wanted to make sure that this new logic doesn't break anything
      we weren't aware of.
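
The idea of tying the security blob's lifetime to the object's reference count can be illustrated with a small userspace sketch. It is an assumption-laden model rather than the patch itself: the `toy_` types and functions are hypothetical, the counter is a plain `int` where the kernel uses an atomic counter, and there is no RCU grace period here at all.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the LSM security blob. */
struct toy_security { int sid; };

/* Hypothetical, reduced model of a reference-counted IPC object. */
struct toy_ipc_obj {
	int refcount;			/* the kernel uses an atomic counter */
	struct toy_security *security;
};

/* Counts freed blobs so the behaviour can be observed from outside. */
static int toy_blobs_freed;

static struct toy_ipc_obj *toy_obj_new(int sid)
{
	struct toy_ipc_obj *obj = malloc(sizeof(*obj));

	if (!obj)
		return NULL;
	obj->security = malloc(sizeof(*obj->security));
	if (!obj->security) {
		free(obj);
		return NULL;
	}
	obj->security->sid = sid;
	obj->refcount = 1;		/* reference held by the creator */
	return obj;
}

static void toy_obj_get(struct toy_ipc_obj *obj)
{
	obj->refcount++;
}

/* Drops one reference; returns 1 if this call freed the object. */
static int toy_obj_put(struct toy_ipc_obj *obj)
{
	assert(obj->refcount > 0);
	if (--obj->refcount > 0)
		return 0;
	/* Last user is gone: only now release the security blob. */
	free(obj->security);
	toy_blobs_freed++;
	free(obj);
	return 1;
}
```

In this model a reader that took a reference with `toy_obj_get()` can keep dereferencing `obj->security` even after the destroy path has called `toy_obj_put()`, because the blob is released only by whichever call drops the count to zero.
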
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
      Acked-by: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 12 Sep, 2013 1 commit
  4. 10 Jul, 2013 8 commits
  5. 27 May, 2013 1 commit
    • M
      ipc/sem.c: Fix missing wakeups in do_smart_update_queue() · ab465df9
      Authored by Manfred Spraul
      do_smart_update_queue() is called when an operation (semop,
      semctl(SETVAL), semctl(SETALL), ...) modified the array.  It must check
      which of the sleeping tasks can proceed.
      
      do_smart_update_queue() missed a few wakeups:
       - if a sleeping complex op was completed, then all per-semaphore queues
         must be scanned - not only those that were modified by *sops
       - if a sleeping simple op proceeded, then the global queue must be
         scanned again
      
      And:
       - the test for "|| sops == NULL" before scanning the global queue is not
         required: If the global queue is empty, then it doesn't need to be
         scanned - regardless of the reason for calling do_smart_update_queue()
      
      The patch is not optimized, i.e.  even completing a wait-for-zero
      operation causes a rescan.  This is done to keep the patch as simple as
      possible.
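
The "rescan whenever something completed" rule can be sketched as a fixed-point loop. This is a simplified illustration under stated assumptions: all `toy_` names are hypothetical, and a single flat array of pending operations replaces the kernel's separate per-semaphore and global queues. Like the patch, it makes no attempt to avoid redundant rescans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NSEMS 2
#define TOY_MAX_SOPS 2

struct toy_sop { int sem_num; int sem_op; };

/* One sleeping operation: simple if nsops == 1, complex otherwise. */
struct toy_pending {
	struct toy_sop sops[TOY_MAX_SOPS];
	int nsops;
	bool done;
};

/* Apply all sub-operations atomically; returns true on success. */
static bool toy_try_op(int *semval, const struct toy_pending *q)
{
	int next[TOY_NSEMS];

	for (int i = 0; i < TOY_NSEMS; i++)
		next[i] = semval[i];
	for (int i = 0; i < q->nsops; i++) {
		const struct toy_sop *s = &q->sops[i];

		assert(s->sem_num >= 0 && s->sem_num < TOY_NSEMS);
		if (s->sem_op == 0) {
			if (next[s->sem_num] != 0)
				return false;	/* wait-for-zero not met */
		} else {
			next[s->sem_num] += s->sem_op;
			if (next[s->sem_num] < 0)
				return false;	/* would go negative */
		}
	}
	for (int i = 0; i < TOY_NSEMS; i++)
		semval[i] = next[i];
	return true;
}

/* Rescan every pending op until a full pass wakes nobody. */
static int toy_update(int *semval, struct toy_pending *q, size_t n)
{
	int woken = 0;
	bool progress;

	do {
		progress = false;
		for (size_t i = 0; i < n; i++) {
			if (q[i].done || !toy_try_op(semval, &q[i]))
				continue;
			q[i].done = true;
			woken++;
			progress = true;	/* values changed: scan again */
		}
	} while (progress);
	return woken;
}
```

Consider a sleeping simple operation that decrements semaphore 1 and a sleeping complex operation that decrements semaphore 0 and increments semaphore 1. After semaphore 0 is raised to 1, a single pass wakes only the complex operation. The second pass is what notices that the simple operation can now proceed.
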
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 10 May, 2013 2 commits
  7. 05 May, 2013 7 commits
  8. 03 May, 2013 1 commit
  9. 01 May, 2013 4 commits
    • R
      ipc,sem: fine grained locking for semtimedop · 6062a8dc
      Authored by Rik van Riel
      Introduce finer grained locking for semtimedop, to handle the common case
      of a program wanting to manipulate one semaphore from an array with
      multiple semaphores.
      
      If the call is a semop manipulating just one semaphore in an array with
      multiple semaphores, only take the lock for that semaphore itself.
      
      If the call needs to manipulate multiple semaphores, or another caller is
      in a transaction that manipulates multiple semaphores, the sem_array lock
      is taken, as well as all the locks for the individual semaphores.
      
      On a 24 CPU system, performance numbers with the semop-multi
      test with N threads and N semaphores, look like this:
      
      threads  vanilla  Davidlohr's  Davidlohr's +   Davidlohr's +
                        patches      rwlock patches  v3 patches
      10       610652   726325       1783589         2142206
      20       341570   365699       1520453         1977878
      30       288102   307037       1498167         2037995
      40       290714   305955       1612665         2256484
      50       288620   312890       1733453         2650292
      60       289987   306043       1649360         2388008
      70       291298   306347       1723167         2717486
      80       290948   305662       1729545         2763582
      90       290996   306680       1736021         2757524
      100      292243   306700       1773700         3059159
      
      [davidlohr.bueso@hp.com: do not call sem_lock when bogus sma]
      [davidlohr.bueso@hp.com: make refcounter atomic]
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Chegu Vinod <chegu_vinod@hp.com>
      Cc: Jason Low <jason.low2@hp.com>
      Reviewed-by: Michel Lespinasse <walken@google.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Tested-by: Emmanuel Benisty <benisty.e@gmail.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • R
      ipc,sem: have only one list in struct sem_queue · 9f1bc2c9
      Authored by Rik van Riel
      Having only one list in struct sem_queue, and only queueing simple
      semaphore operations on the list for the semaphore involved, allows us to
      introduce finer grained locking for semtimedop.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Chegu Vinod <chegu_vinod@hp.com>
      Cc: Emmanuel Benisty <benisty.e@gmail.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • R
      ipc,sem: open code and rename sem_lock · c460b662
      Authored by Rik van Riel
      Rename sem_lock() to sem_obtain_lock(), so we can introduce a sem_lock()
      later that only locks the sem_array and does nothing else.
      
      Open code the locking from ipc_lock() in sem_obtain_lock() so we can
      introduce finer grained locking for the sem_array in the next patch.
      
      [akpm@linux-foundation.org: propagate the ipc_obtain_object() errno out of sem_obtain_lock()]
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Chegu Vinod <chegu_vinod@hp.com>
      Cc: Emmanuel Benisty <benisty.e@gmail.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • D
      ipc,sem: do not hold ipc lock more than necessary · 16df3674
      Authored by Davidlohr Bueso
      Instead of holding the ipc lock for permissions and security checks, among
      others, only acquire it when necessary.
      
      Some numbers....
      
      1) With Rik's semop-multi.c microbenchmark we can see the following
         results:
      
      Baseline (3.9-rc1):
      cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
      total operations: 151452270, ops/sec 5048409
      
      +  59.40%            a.out  [kernel.kallsyms]  [k] _raw_spin_lock
      +   6.14%            a.out  [kernel.kallsyms]  [k] sys_semtimedop
      +   3.84%            a.out  [kernel.kallsyms]  [k] avc_has_perm_flags
      +   3.64%            a.out  [kernel.kallsyms]  [k] __audit_syscall_exit
      +   2.06%            a.out  [kernel.kallsyms]  [k] copy_user_enhanced_fast_string
      +   1.86%            a.out  [kernel.kallsyms]  [k] ipc_lock
      
      With this patchset:
      cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
      total operations: 273156400, ops/sec 9105213
      
      +  18.54%            a.out  [kernel.kallsyms]  [k] _raw_spin_lock
      +  11.72%            a.out  [kernel.kallsyms]  [k] sys_semtimedop
      +   7.70%            a.out  [kernel.kallsyms]  [k] ipc_has_perm.isra.21
      +   6.58%            a.out  [kernel.kallsyms]  [k] avc_has_perm_flags
      +   6.54%            a.out  [kernel.kallsyms]  [k] __audit_syscall_exit
      +   4.71%            a.out  [kernel.kallsyms]  [k] ipc_obtain_object_check
      
      2) While on an Oracle swingbench DSS (data mining) workload the
         improvements are not as exciting as with Rik's benchmark, we can see
         some positive numbers.  For an 8 socket machine the following are the
         percentages of %sys time incurred in the ipc lock:
      
      Baseline (3.9-rc1):
      100 swingbench users: 8.74%
      400 swingbench users: 21.86%
      800 swingbench users: 84.35%
      
      With this patchset:
      100 swingbench users: 8.11%
      400 swingbench users: 19.93%
      800 swingbench users: 77.69%
      
      [riel@redhat.com: fix two locking bugs]
      [sasha.levin@oracle.com: prevent releasing RCU read lock twice in semctl_main]
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
      Acked-by: Michel Lespinasse <walken@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Emmanuel Benisty <benisty.e@gmail.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 06 Mar, 2013 1 commit
  11. 04 Mar, 2013 1 commit
  12. 07 Sep, 2012 1 commit
  13. 03 Nov, 2011 3 commits
  14. 26 Jul, 2011 1 commit
  15. 21 Jul, 2011 1 commit
  16. 31 Mar, 2011 1 commit
  17. 24 Mar, 2011 1 commit
  18. 02 Oct, 2010 1 commit
    • D
      sys_semctl: fix kernel stack leakage · 982f7c2b
      Authored by Dan Rosenberg
      The semctl syscall has several code paths that lead to the leakage of
      uninitialized kernel stack memory (namely the IPC_INFO, SEM_INFO,
      IPC_STAT, and SEM_STAT commands) during the use of the older, obsolete
      version of the semid_ds struct.
      
      The copy_semid_to_user() function declares a semid_ds struct on the stack
      and copies it back to the user without initializing or zeroing the
      "sem_base", "sem_pending", "sem_pending_last", and "undo" pointers,
      allowing the leakage of 16 bytes of kernel stack memory.
      
      The code is still reachable on 32-bit systems - when calling semctl()
      newer glibc's automatically OR the IPC command with the IPC_64 flag, but
      invoking the syscall directly allows users to use the older versions of
      the struct.
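
A common remedy for this class of leak is to zero the whole on-stack structure before filling in the fields that are meant to be exported. The sketch below only illustrates that pattern: `struct toy_semid_ds` and `toy_fill_semid()` are hypothetical userspace stand-ins, and the field layout is an assumption rather than the kernel's real legacy `semid_ds`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the legacy struct semid_ds layout. */
struct toy_semid_ds {
	long sem_otime;
	long sem_ctime;
	void *sem_base;			/* kernel-internal, never exported */
	void *sem_pending;		/* kernel-internal, never exported */
	void *sem_pending_last;		/* kernel-internal, never exported */
	void *undo;			/* kernel-internal, never exported */
	unsigned short sem_nsems;
};

/*
 * Prepare the structure that would be copied to user space.
 * Zeroing first guarantees that fields which are never assigned
 * (and any padding) carry no stale stack contents.
 */
static void toy_fill_semid(struct toy_semid_ds *out, long otime,
			   long ctime, unsigned short nsems)
{
	memset(out, 0, sizeof(*out));
	out->sem_otime = otime;
	out->sem_ctime = ctime;
	out->sem_nsems = nsems;
}
```

On a 32-bit system the four pointer fields account for the 16 bytes mentioned above. After `toy_fill_semid()` they read as zero no matter what the stack slot held before.
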
      Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  19. 21 Jul, 2010 1 commit
  20. 28 May, 2010 2 commits
    • J
      ipc/sem.c: use ERR_CAST · 4de85cd6
      Authored by Julia Lawall
      Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)).  The former makes the
      purpose of the operation clearer; the latter looks like a no-op.
      
      The semantic patch that makes this change is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @@
      type T;
      T x;
      identifier f;
      @@
      
      T f (...) { <+...
      - ERR_PTR(PTR_ERR(x))
      + x
       ...+> }
      
      @@
      expression x;
      @@
      
      - ERR_PTR(PTR_ERR(x))
      + ERR_CAST(x)
      // </smpl>
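
To see why the two spellings are interchangeable, here is a userspace approximation of the error-pointer helpers. The `toy_` functions are assumptions modelled on the kernel's `ERR_PTR()`, `PTR_ERR()`, `IS_ERR()` and `ERR_CAST()`, and the lookup functions and structures are purely hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *toy_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode the errno value from an error pointer. */
static inline long toy_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Error pointers occupy the top TOY_MAX_ERRNO addresses. */
static inline bool toy_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* Re-type an error pointer without touching its value. */
static inline void *toy_err_cast(const void *ptr)
{
	return (void *)ptr;
}

struct toy_perm { int id; };
struct toy_sem_array { struct toy_perm perm; int nsems; };

static struct toy_sem_array toy_array = { { 1 }, 4 };

/* Hypothetical lookup that fails with -EINVAL for unknown ids. */
static struct toy_perm *toy_perm_lookup(int id)
{
	if (id != toy_array.perm.id)
		return toy_err_ptr(-EINVAL);
	return &toy_array.perm;
}

static struct toy_sem_array *toy_sem_lookup(int id)
{
	struct toy_perm *p = toy_perm_lookup(id);

	if (toy_is_err(p))
		return toy_err_cast(p);	/* same value as toy_err_ptr(toy_ptr_err(p)) */
	return (struct toy_sem_array *)p;	/* perm is the first member */
}
```

Both forms yield the same pointer value. The cast version simply states the intent, which is to forward an error across a change of pointer type.
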
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • M
      ipc/sem.c: update description of the implementation · c5cf6359
      Authored by Manfred Spraul
      ipc/sem.c begins with a 15-year-old description of bugs in the initial
      implementation in Linux-1.0.  The patch replaces that with a top level
      description of the current code.
      
      A TODO could be derived from this text:
      
      The Open Group man page for semop() does not mandate FIFO.  Thus there is
      no need for a semaphore array list of pending operations.
      
      If
      
      - this list is removed
      - the per-semaphore array spinlock is removed (possible if there is no
        list to protect)
      - sem_otime is moved into the semaphores and calculated on demand during
        semctl()
      
      then the array would be read-mostly - which would significantly improve
      scaling for applications that use semaphore arrays with lots of entries.
      
      The price would be expensive semctl() calls:
      
      	for(i=0;i<sma->sem_nsems;i++) spin_lock(sma->sem_lock);
      	<do stuff>
      	for(i=0;i<sma->sem_nsems;i++) spin_unlock(sma->sem_lock);
      
      I'm not sure if the complexity is worth the effort, thus here is the
      documentation of the current behavior first.
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>