1. 01 May, 2013 3 commits
    • ipc,sem: have only one list in struct sem_queue · 9f1bc2c9
      Committed by Rik van Riel
      Having only one list in struct sem_queue, and only queueing simple
      semaphore operations on the list for the semaphore involved, allows us to
      introduce finer grained locking for semtimedop.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Chegu Vinod <chegu_vinod@hp.com>
      Cc: Emmanuel Benisty <benisty.e@gmail.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc,sem: open code and rename sem_lock · c460b662
      Committed by Rik van Riel
      Rename sem_lock() to sem_obtain_lock(), so we can introduce a sem_lock()
      later that only locks the sem_array and does nothing else.
      
      Open code the locking from ipc_lock() in sem_obtain_lock() so we can
      introduce finer grained locking for the sem_array in the next patch.
      
      [akpm@linux-foundation.org: propagate the ipc_obtain_object() errno out of sem_obtain_lock()]
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Chegu Vinod <chegu_vinod@hp.com>
      Cc: Emmanuel Benisty <benisty.e@gmail.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc,sem: do not hold ipc lock more than necessary · 16df3674
      Committed by Davidlohr Bueso
      Instead of holding the ipc lock for permissions and security checks, among
      others, only acquire it when necessary.
      
      Some numbers....
      
      1) With Rik's semop-multi.c microbenchmark we can see the following
         results:
      
      Baseline (3.9-rc1):
      cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
      total operations: 151452270, ops/sec 5048409
      
      +  59.40%            a.out  [kernel.kallsyms]  [k] _raw_spin_lock
      +   6.14%            a.out  [kernel.kallsyms]  [k] sys_semtimedop
      +   3.84%            a.out  [kernel.kallsyms]  [k] avc_has_perm_flags
      +   3.64%            a.out  [kernel.kallsyms]  [k] __audit_syscall_exit
      +   2.06%            a.out  [kernel.kallsyms]  [k] copy_user_enhanced_fast_string
      +   1.86%            a.out  [kernel.kallsyms]  [k] ipc_lock
      
      With this patchset:
      cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
      total operations: 273156400, ops/sec 9105213
      
      +  18.54%            a.out  [kernel.kallsyms]  [k] _raw_spin_lock
      +  11.72%            a.out  [kernel.kallsyms]  [k] sys_semtimedop
      +   7.70%            a.out  [kernel.kallsyms]  [k] ipc_has_perm.isra.21
      +   6.58%            a.out  [kernel.kallsyms]  [k] avc_has_perm_flags
      +   6.54%            a.out  [kernel.kallsyms]  [k] __audit_syscall_exit
      +   4.71%            a.out  [kernel.kallsyms]  [k] ipc_obtain_object_check
      
      2) While on an Oracle swingbench DSS (data mining) workload the
         improvements are not as exciting as with Rik's benchmark, we can see
         some positive numbers.  For an 8 socket machine the following are the
         percentages of %sys time incurred in the ipc lock:
      
      Baseline (3.9-rc1):
      100 swingbench users: 8.74%
      400 swingbench users: 21.86%
      800 swingbench users: 84.35%
      
      With this patchset:
      100 swingbench users: 8.11%
      400 swingbench users: 19.93%
      800 swingbench users: 77.69%
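
The semop-multi figures quoted in this commit can be cross-checked with a little arithmetic. The sketch below is only an illustration (the helper names `ops_per_sec` and `ratio` are made up here, not part of the benchmark): dividing the reported totals by the 30 second duration reproduces the reported ops/sec values, and the throughput with the patchset comes out at roughly 1.8 times the baseline.

```c
#include <assert.h>

/* Throughput as the benchmark reports it: operations per second. */
static double ops_per_sec(unsigned long long total_ops, unsigned int secs)
{
    assert(secs > 0);
    return (double)total_ops / secs;
}

/* Relative change between two measurements; 1.5 would mean 50% more. */
static double ratio(double before, double after)
{
    assert(before > 0.0);
    return after / before;
}
```

For example, `ratio(5048409.0, 9105213.0)` gives the speedup between the two runs quoted above.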
      
      [riel@redhat.com: fix two locking bugs]
      [sasha.levin@oracle.com: prevent releasing RCU read lock twice in semctl_main]
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
      Acked-by: Michel Lespinasse <walken@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Emmanuel Benisty <benisty.e@gmail.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
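
The idea behind the commit above, taking the ipc lock only around the part that needs it, can be sketched in user space. This is a minimal model under stated assumptions, not the kernel code: `struct ipc_obj_demo`, `perm_check()` and both `set_value_*()` functions are hypothetical, and the "lock" is a plain flag with a counter so the difference is observable. In the kernel, the unlocked checks still run under protection such as the RCU read lock mentioned in the commit notes.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical object guarded by a lock; the flag stands in for a spinlock. */
struct ipc_obj_demo {
    int owner_uid;
    int value;
    int locked;
    unsigned int lock_acquisitions;   /* how often the lock was taken */
};

static void obj_lock(struct ipc_obj_demo *o)
{
    assert(!o->locked);
    o->locked = 1;
    o->lock_acquisitions++;
}

static void obj_unlock(struct ipc_obj_demo *o)
{
    assert(o->locked);
    o->locked = 0;
}

/* Stand-in for the permission and security checks. */
static int perm_check(const struct ipc_obj_demo *o, int uid)
{
    return o->owner_uid == uid ? 0 : -EACCES;
}

/* Before: the check runs while the lock is held. */
static int set_value_coarse(struct ipc_obj_demo *o, int uid, int v)
{
    int err;

    obj_lock(o);
    err = perm_check(o, uid);
    if (!err)
        o->value = v;
    obj_unlock(o);
    return err;
}

/* After: check first, take the lock only around the update. */
static int set_value_narrow(struct ipc_obj_demo *o, int uid, int v)
{
    int err = perm_check(o, uid);

    if (err)
        return err;
    obj_lock(o);
    o->value = v;
    obj_unlock(o);
    return 0;
}
```

In this model a caller that fails the permission check never touches the lock in the narrow variant, which is the kind of contention the `_raw_spin_lock` profile numbers reflect.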
  2. 06 Mar, 2013 1 commit
  3. 04 Mar, 2013 1 commit
  4. 07 Sep, 2012 1 commit
  5. 03 Nov, 2011 3 commits
  6. 26 Jul, 2011 1 commit
  7. 21 Jul, 2011 1 commit
  8. 31 Mar, 2011 1 commit
  9. 24 Mar, 2011 1 commit
  10. 02 Oct, 2010 1 commit
    • sys_semctl: fix kernel stack leakage · 982f7c2b
      Committed by Dan Rosenberg
      The semctl syscall has several code paths that lead to the leakage of
      uninitialized kernel stack memory (namely the IPC_INFO, SEM_INFO,
      IPC_STAT, and SEM_STAT commands) during the use of the older, obsolete
      version of the semid_ds struct.
      
      The copy_semid_to_user() function declares a semid_ds struct on the stack
      and copies it back to the user without initializing or zeroing the
      "sem_base", "sem_pending", "sem_pending_last", and "undo" pointers,
      allowing the leakage of 16 bytes of kernel stack memory.
      
      The code is still reachable on 32-bit systems: when calling semctl(),
      newer glibc versions automatically OR the IPC command with the IPC_64
      flag, but invoking the syscall directly allows users to use the older
      versions of the struct.
      Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
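
The leak described in the commit above follows a common pattern: a struct is declared on the stack, only some fields are assigned, and the whole struct is copied out. A minimal user-space sketch of the usual remedy, clearing the struct before filling it, is shown below. The layout of `struct old_semid_demo` and the helper names are assumptions made for illustration; they are not the kernel's `semid_ds` or `copy_semid_to_user()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the old struct: the pointer fields are
 * never assigned by the code that fills it in. */
struct old_semid_demo {
    int            perm_mode;
    long           otime;
    void          *sem_base;     /* left unassigned */
    void          *sem_pending;  /* left unassigned */
    unsigned short nsems;
};

/* Stand-in for copy_to_user(): copies every byte of the struct,
 * including padding and unassigned fields. */
static void copy_out(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
}

/* Zero the on-stack struct first, so unassigned fields and padding
 * cannot carry stale stack contents to the caller. */
static void fill_semid_zeroed(void *user_buf, int mode, long otime,
                              unsigned short nsems)
{
    struct old_semid_demo out;

    assert(user_buf != NULL);
    memset(&out, 0, sizeof(out));
    out.perm_mode = mode;
    out.otime = otime;
    out.nsems = nsems;
    copy_out(user_buf, &out, sizeof(out));
}
```

Without the `memset()`, the two pointer fields and any padding bytes would hold whatever was previously on the stack.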
  11. 21 Jul, 2010 1 commit
  12. 28 May, 2010 4 commits
    • ipc/sem.c: use ERR_CAST · 4de85cd6
      Committed by Julia Lawall
      Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)).  The former makes the
      purpose of the operation clearer; the latter otherwise looks like a
      no-op.
      
      The semantic patch that makes this change is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @@
      type T;
      T x;
      identifier f;
      @@
      
      T f (...) { <+...
      - ERR_PTR(PTR_ERR(x))
      + x
       ...+> }
      
      @@
      expression x;
      @@
      
      - ERR_PTR(PTR_ERR(x))
      + ERR_CAST(x)
      // </smpl>
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc/sem.c: update description of the implementation · c5cf6359
      Committed by Manfred Spraul
      ipc/sem.c begins with a 15-year-old description of bugs in the initial
      implementation in Linux-1.0.  The patch replaces that with a top-level
      description of the current code.
      
      A TODO could be derived from this text:
      
      The Open Group man page for semop() does not mandate FIFO.  Thus there is
      no need for a semaphore array list of pending operations.
      
      If
      
      - this list is removed
      - the per-semaphore array spinlock is removed (possible if there is no
        list to protect)
      - sem_otime is moved into the semaphores and calculated on demand during
        semctl()
      
      then the array would be read-mostly - which would significantly improve
      scaling for applications that use semaphore arrays with lots of entries.
      
      The price would be expensive semctl() calls:
      
      	for(i=0;i<sma->sem_nsems;i++) spin_lock(sma->sem_lock);
      	<do stuff>
      	for(i=0;i<sma->sem_nsems;i++) spin_unlock(sma->sem_lock);
      
      I'm not sure if the complexity is worth the effort, thus here is the
      documentation of the current behavior first.
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc/sem.c: move wake_up_process out of the spinlock section · 0a2b9d4c
      Committed by Manfred Spraul
      The wake-up part of semtimedop() consists of two steps:
      
      - the right tasks must be identified.
      - they must be woken up.
      
      Right now, both steps run while the array spinlock is held.  This patch
      reorders the code and moves the actual wake_up_process() behind the point
      where the spinlock is dropped.
      
      The code also moves setting sem->sem_otime to one place: It does not make
      sense to set the last modify time multiple times.
      
      [akpm@linux-foundation.org: repair kerneldoc]
      [akpm@linux-foundation.org: fix uninitialised retval]
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
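
The two-step structure described in the commit above (identify the tasks under the lock, wake them after the lock is dropped) can be modeled without threads. The sketch below is a simplification under stated assumptions: `struct sem_demo`, `struct waiter` and `sem_post_demo()` are hypothetical, the lock is a plain flag, and `wake_up()` merely records whether the lock was held when it ran. The kernel code additionally has to cope with races against the sleeping task, which this model ignores.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WAITERS 8

struct waiter {
    int needed;            /* units this sleeper is waiting for */
    int woken;             /* set by wake_up() */
    int woken_under_lock;  /* was the lock held during the wake-up? */
};

struct sem_demo {
    int locked;            /* stand-in for the array spinlock */
    int value;
    struct waiter *pending[MAX_WAITERS];
    size_t npending;
};

/* Stand-in for wake_up_process(). */
static void wake_up(const struct sem_demo *s, struct waiter *w)
{
    w->woken = 1;
    w->woken_under_lock = s->locked;
}

/* Add delta to the semaphore and wake every sleeper that can now run.
 * Returns the number of sleepers woken. */
static size_t sem_post_demo(struct sem_demo *s, int delta)
{
    struct waiter *ready[MAX_WAITERS];
    size_t nready = 0, kept = 0;

    assert(s->npending <= MAX_WAITERS);

    s->locked = 1;                          /* spin_lock() */
    s->value += delta;
    for (size_t i = 0; i < s->npending; i++) {
        struct waiter *w = s->pending[i];

        if (w->needed <= s->value) {
            s->value -= w->needed;          /* step 1: identify, dequeue */
            ready[nready++] = w;
        } else {
            s->pending[kept++] = w;
        }
    }
    s->npending = kept;
    s->locked = 0;                          /* spin_unlock() */

    for (size_t i = 0; i < nready; i++)     /* step 2: wake, lock dropped */
        wake_up(s, ready[i]);
    return nready;
}
```

Keeping only the bookkeeping inside the locked region shortens the time other tasks spend spinning on the lock, which is the motivation the commit gives.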
    • ipc/sem.c: optimize update_queue() for bulk wakeup calls · fd5db422
      Committed by Manfred Spraul
      The following series of patches tries to fix the spinlock contention
      reported by Chris Mason - his benchmark exposes problems of the current
      code:
      
      - In the worst case, the algorithm used by update_queue() is O(N^2).
        Bulk wake-up calls can enter this worst case.  The patch series fixes
        that.
      
        Note that the benchmark app doesn't expose the problem; it should be
        fixed anyway, because real-world apps might do the wake-ups in an
        order other than perfect FIFO.
      
      - The part of the code that runs within the semaphore array spinlock is
        significantly larger than necessary.
      
        The patch series fixes that.  This change is responsible for the main
        improvement.
      
      - The cacheline with the spinlock is also used for a variable that is
        read in the hot path (sem_base) and for a variable that is unnecessarily
        written to multiple times (sem_otime).  The last step of the series
        cacheline-aligns the spinlock.
      
      This patch:
      
      The SysV semaphore code allows multiple operations on the semaphores in
      an array to be performed as one atomic operation.  After a modification,
      update_queue() checks which of the waiting tasks can complete.
      
      The algorithm that is used to identify the tasks is O(N^2) in the worst
      case.  For some cases, it is simple to avoid the O(N^2).
      
      The patch adds detection logic for some cases, especially for the case
      of an array where all sleeping tasks are single sembuf operations and a
      multi-sembuf operation is used to wake up multiple tasks.
      
      A big database application uses that approach.
      
      The patch fixes wakeup due to semctl(,,SETALL,); the initial version of
      the patch broke that.
      
      [akpm@linux-foundation.org: make do_smart_update() static]
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
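
The O(N^2) worst case mentioned in the commit above can be illustrated with a toy model. This is not the kernel's update_queue(): the functions below are hypothetical and cover only one easy case, a single semaphore whose sleepers are all simple decrements. A naive scan restarts from the head of the queue after every completed operation; since a completed decrement only lowers the value, it cannot unblock any other sleeper, so a single pass completes the same set of sleepers with fewer attempts.

```c
#include <assert.h>
#include <stddef.h>

/* Try one sleeper: complete it if the semaphore value is large enough. */
static int try_decrement(int *semval, int need)
{
    assert(need > 0);
    if (need > *semval)
        return 0;
    *semval -= need;
    return 1;
}

/* Naive scan: after every completion, start again from the head.
 * Returns the number of attempts made. */
static size_t scan_with_restart(int *semval, const int *need,
                                int *done, size_t n)
{
    size_t tries = 0, i = 0;

    while (i < n) {
        if (done[i]) {          /* already removed from the queue */
            i++;
            continue;
        }
        tries++;
        if (try_decrement(semval, need[i])) {
            done[i] = 1;
            i = 0;              /* restart from the head */
        } else {
            i++;
        }
    }
    return tries;
}

/* Single pass: valid here because a decrement never enables another
 * pending decrement. Returns the number of attempts made. */
static size_t scan_single_pass(int *semval, const int *need,
                               int *done, size_t n)
{
    size_t tries = 0;

    for (size_t i = 0; i < n; i++) {
        if (done[i])
            continue;
        tries++;
        if (try_decrement(semval, need[i]))
            done[i] = 1;
    }
    return tries;
}
```

With four sleepers that cannot complete queued ahead of four that can, the restarting scan retries the blocked head of the queue after each completion, while the single pass visits each sleeper once.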
  13. 16 Dec, 2009 9 commits
  14. 15 Apr, 2009 1 commit
  15. 14 Jan, 2009 2 commits
  16. 07 Jan, 2009 1 commit
  17. 06 Jan, 2009 1 commit
  18. 17 Oct, 2008 1 commit
  19. 26 Jul, 2008 4 commits
  20. 29 Apr, 2008 2 commits