1. 02 May 2013, 29 commits
  2. 01 May 2013, 11 commits
    • Merge branch 'next' into for-linus · bf61c884
      Dmitry Torokhov committed
      Prepare first set of updates for 3.10 merge window.
      bf61c884
    • Merge branch 'ipc-scalability' · 823e75f7
      Linus Torvalds committed
      Merge IPC cleanup and scalability patches from Andrew Morton.
      
      This cleans up many of the oddities in the IPC code, uses the list
      iterator helpers, splits out locking and adds per-semaphore locks for
      greater scalability of the IPC semaphore code.
      
      Most normal user-level locking by now uses futexes (i.e. pthreads, but
      also a lot of specialized locks), but SysV IPC semaphores are apparently
      still used in some big applications, either for portability reasons or
      because they offer tracking and undo (and you don't need to have a
      special shared memory area for them); a minimal user-space example of
      that pattern follows this entry.
      
      Our IPC semaphore scalability was pitiful.  We used to lock far too
      large ranges, and we used to have a single ipc lock per ipc semaphore array.
      Most loads never cared, but some do.  There are some numbers in the
      individual commits.
      
      * ipc-scalability:
        ipc: sysv shared memory limited to 8TiB
        ipc/msg.c: use list_for_each_entry_[safe] for list traversing
        ipc,sem: fine grained locking for semtimedop
        ipc,sem: have only one list in struct sem_queue
        ipc,sem: open code and rename sem_lock
        ipc,sem: do not hold ipc lock more than necessary
        ipc: introduce lockless pre_down ipcctl
        ipc: introduce obtaining a lockless ipc object
        ipc: remove bogus lock comment for ipc_checkid
        ipc/msgutil.c: use linux/uaccess.h
        ipc: refactor msg list search into separate function
        ipc: simplify msg list search
        ipc: implement MSG_COPY as a new receive mode
        ipc: remove msg handling from queue scan
        ipc: set EFAULT as default error in load_msg()
        ipc: tighten msg copy loops
        ipc: separate msg allocation from userspace copy
        ipc: clamp with min()
      823e75f7
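      As a concrete illustration of the workload these patches target, here is
      a minimal user-space sketch (an editorial example, not part of the
      series): an array with many semaphores where each semop() call touches
      only one of them.

          #include <stdio.h>
          #include <sys/ipc.h>
          #include <sys/sem.h>

          int main(void)
          {
                  /* An array of 128 SysV semaphores: the "many semaphores,
                   * touch one at a time" pattern per-semaphore locks help. */
                  int semid = semget(IPC_PRIVATE, 128, IPC_CREAT | 0600);
                  if (semid < 0) { perror("semget"); return 1; }

                  /* nsops == 1 and a single sem_num: a "simple" operation,
                   * eligible for the fine-grained fast path. */
                  struct sembuf op = { .sem_num = 42, .sem_op = 1, .sem_flg = 0 };
                  if (semop(semid, &op, 1) < 0)
                          perror("semop");

                  semctl(semid, 0, IPC_RMID);     /* remove the array */
                  return 0;
          }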
    • ipc: sysv shared memory limited to 8TiB · d69f3bad
      Robin Holt committed
      While trying to run an application that put data into half of memory
      using shmget(), we found that a shmall value below 8EiB-8TiB would
      prevent us from using anything more than 8TiB; setting kernel.shmall
      greater than 8EiB-8TiB made the job work.

      The culprit is in the newseg() function: the page count is computed in
      an int, and with 4 KiB pages an 8 TiB segment needs 2^31 pages, one past
      INT_MAX, so the ns->shm_tot accounting check below overflows.
      
      ipc/shm.c:
       458 static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
       459 {
      ...
       465         int numpages = (size + PAGE_SIZE -1) >> PAGE_SHIFT;
      ...
       474         if (ns->shm_tot + numpages > ns->shm_ctlall)
       475                 return -ENOSPC;
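      A minimal user-space illustration of that overflow (an editorial sketch
      assuming 4 KiB pages; the actual fix, noted below, makes numpages
      size_t):

          #include <stdio.h>

          #define PAGE_SIZE  4096UL
          #define PAGE_SHIFT 12

          int main(void)
          {
                  unsigned long long size = 8ULL << 40;   /* an 8 TiB request */

                  /* As int: 2^43 / 2^12 = 2^31 pages, one past INT_MAX, so
                   * the value wraps negative on typical implementations. */
                  int bad = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;

                  /* As a wide unsigned type, the count stays correct. */
                  unsigned long long good = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;

                  printf("int: %d  unsigned: %llu\n", bad, good);
                  return 0;
          }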
      
      [akpm@linux-foundation.org: make ipc/shm.c:newseg()'s numpages size_t, not int]
      Signed-off-by: Robin Holt <holt@sgi.com>
      Reported-by: Alex Thorlton <athorlton@sgi.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d69f3bad
    • ipc/msg.c: use list_for_each_entry_[safe] for list traversing · 41239fe8
      Nikola Pajkovsky committed
      The ipc/msg.c code does its list operations by hand, open-coding the
      accesses instead of using the list_for_each_entry_[safe] helpers.
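      For illustration, a condensed before/after sketch of the pattern (using
      the real q_messages/m_list field names, but not the actual hunk; the
      _safe variant is for walks that remove entries):

          /* before: open-coded walk of the message list */
          struct msg_msg *msg;
          struct list_head *tmp = msq->q_messages.next;
          while (tmp != &msq->q_messages) {
                  msg = list_entry(tmp, struct msg_msg, m_list);
                  tmp = tmp->next;
                  /* ... examine msg ... */
          }

          /* after: the helper expresses the same walk directly */
          list_for_each_entry(msg, &msq->q_messages, m_list) {
                  /* ... examine msg ... */
          }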
      Signed-off-by: Nikola Pajkovsky <npajkovs@redhat.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      41239fe8
    • ipc,sem: fine grained locking for semtimedop · 6062a8dc
      Rik van Riel committed
      Introduce finer grained locking for semtimedop, to handle the common case
      of a program wanting to manipulate one semaphore from an array with
      multiple semaphores.
      
      If the call is a semop manipulating just one semaphore in an array with
      multiple semaphores, only take the lock for that semaphore itself.
      
      If the call needs to manipulate multiple semaphores, or another caller is
      in a transaction that manipulates multiple semaphores, the sem_array lock
      is taken, as well as all the locks for the individual semaphores.
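      Condensed, the lock choice looks roughly like this (a sketch of the
      idea, not the exact upstream code; complex_count counts in-flight
      multi-semaphore transactions):

          if (nsops == 1 && !sma->complex_count) {
                  struct sem *sem = sma->sem_base + sops->sem_num;

                  spin_lock(&sem->lock);          /* per-semaphore lock only */
                  if (!sma->complex_count)
                          return sops->sem_num;   /* fast path: one lock held */
                  spin_unlock(&sem->lock);        /* raced with a complex op */
          }
          /* slow path: the array lock plus every per-semaphore lock */
          spin_lock(&sma->sem_perm.lock);
          for (i = 0; i < sma->sem_nsems; i++)
                  spin_lock(&sma->sem_base[i].lock);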
      
      On a 24-CPU system, performance numbers with the semop-multi test with
      N threads and N semaphores look like this:
      
      	vanilla		Davidlohr's	Davidlohr's +	Davidlohr's +
      threads			patches		rwlock patches	v3 patches
      10	610652		726325		1783589		2142206
      20	341570		365699		1520453		1977878
      30	288102		307037		1498167		2037995
      40	290714		305955		1612665		2256484
      50	288620		312890		1733453		2650292
      60	289987		306043		1649360		2388008
      70	291298		306347		1723167		2717486
      80	290948		305662		1729545		2763582
      90	290996		306680		1736021		2757524
      100	292243		306700		1773700		3059159
      
      [davidlohr.bueso@hp.com: do not call sem_lock when bogus sma]
      [davidlohr.bueso@hp.com: make refcounter atomic]
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Chegu Vinod <chegu_vinod@hp.com>
      Cc: Jason Low <jason.low2@hp.com>
      Reviewed-by: Michel Lespinasse <walken@google.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Tested-by: Emmanuel Benisty <benisty.e@gmail.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6062a8dc
    • ipc,sem: have only one list in struct sem_queue · 9f1bc2c9
      Rik van Riel committed
      Having only one list in struct sem_queue, and only queueing simple
      semaphore operations on the list for the semaphore involved, allows us to
      introduce finer grained locking for semtimedop.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Chegu Vinod <chegu_vinod@hp.com>
      Cc: Emmanuel Benisty <benisty.e@gmail.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9f1bc2c9
    • ipc,sem: open code and rename sem_lock · c460b662
      Rik van Riel committed
      Rename sem_lock() to sem_obtain_lock(), so we can introduce a sem_lock()
      later that only locks the sem_array and does nothing else.
      
      Open code the locking from ipc_lock() in sem_obtain_lock() so we can
      introduce finer grained locking for the sem_array in the next patch.
      
      [akpm@linux-foundation.org: propagate the ipc_obtain_object() errno out of sem_obtain_lock()]
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Chegu Vinod <chegu_vinod@hp.com>
      Cc: Emmanuel Benisty <benisty.e@gmail.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c460b662
    • ipc,sem: do not hold ipc lock more than necessary · 16df3674
      Davidlohr Bueso committed
      Instead of holding the ipc lock for permission and security checks,
      among other things, acquire it only when necessary.
      
      Some numbers....
      
      1) With Rik's semop-multi.c microbenchmark we can see the following
         results:
      
      Baseline (3.9-rc1):
      cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
      total operations: 151452270, ops/sec 5048409
      
      +  59.40%            a.out  [kernel.kallsyms]  [k] _raw_spin_lock
      +   6.14%            a.out  [kernel.kallsyms]  [k] sys_semtimedop
      +   3.84%            a.out  [kernel.kallsyms]  [k] avc_has_perm_flags
      +   3.64%            a.out  [kernel.kallsyms]  [k] __audit_syscall_exit
      +   2.06%            a.out  [kernel.kallsyms]  [k] copy_user_enhanced_fast_string
      +   1.86%            a.out  [kernel.kallsyms]  [k] ipc_lock
      
      With this patchset:
      cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
      total operations: 273156400, ops/sec 9105213
      
      +  18.54%            a.out  [kernel.kallsyms]  [k] _raw_spin_lock
      +  11.72%            a.out  [kernel.kallsyms]  [k] sys_semtimedop
      +   7.70%            a.out  [kernel.kallsyms]  [k] ipc_has_perm.isra.21
      +   6.58%            a.out  [kernel.kallsyms]  [k] avc_has_perm_flags
      +   6.54%            a.out  [kernel.kallsyms]  [k] __audit_syscall_exit
      +   4.71%            a.out  [kernel.kallsyms]  [k] ipc_obtain_object_check
      
      2) On an Oracle swingbench DSS (data mining) workload the improvements
         are not as dramatic as with Rik's benchmark, but we still see
         positive numbers.  For an 8-socket machine, the following are the
         percentages of %sys time spent in the ipc lock:
      
      Baseline (3.9-rc1):
      100 swingbench users: 8,74%
      400 swingbench users: 21,86%
      800 swingbench users: 84,35%
      
      With this patchset:
      100 swingbench users: 8,11%
      400 swingbench users: 19,93%
      800 swingbench users: 77,69%
      
      [riel@redhat.com: fix two locking bugs]
      [sasha.levin@oracle.com: prevent releasing RCU read lock twice in semctl_main]
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
      Acked-by: Michel Lespinasse <walken@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Emmanuel Benisty <benisty.e@gmail.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      16df3674
    • ipc: introduce lockless pre_down ipcctl · 444d0f62
      Davidlohr Bueso committed
      Various forms of ipc use ipcctl_pre_down() to retrieve an ipc object and
      check permissions, mostly for IPC_RMID and IPC_SET commands.
      
      Introduce ipcctl_pre_down_nolock(), a lockless version of this function.
      The locking version is retained, yet modified to call the nolock version
      without affecting its semantics, thus transparent to all ipc callers.
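      A sketch of that delegation (illustrative signature, not the verbatim
      ipc/util.c code):

          struct kern_ipc_perm *ipcctl_pre_down(struct ipc_ids *ids, int id,
                                                int cmd, struct ipc64_perm *perm,
                                                int extra_perm)
          {
                  struct kern_ipc_perm *ipcp;

                  ipcp = ipcctl_pre_down_nolock(ids, id, cmd, perm, extra_perm);
                  if (IS_ERR(ipcp))
                          return ipcp;    /* lookup or permission check failed */

                  spin_lock(&ipcp->lock); /* preserve the old locked semantics */
                  return ipcp;
          }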
      Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Chegu Vinod <chegu_vinod@hp.com>
      Cc: Emmanuel Benisty <benisty.e@gmail.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      444d0f62
    • ipc: introduce obtaining a lockless ipc object · 4d2bff5e
      Davidlohr Bueso committed
      Through ipc_lock() and therefore ipc_lock_check() we currently return the
      locked ipc object.  This is not necessary for all situations and can,
      therefore, cause unnecessary ipc lock contention.
      
      Introduce the analogous ipc_obtain_object() and ipc_obtain_object_check()
      functions, which only look up and return the ipc object.

      Both functions must be called within an RCU read-side critical section.
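      The intended calling pattern, sketched (condensed; error paths and
      security hooks omitted):

          rcu_read_lock();
          ipcp = ipc_obtain_object_check(ids, id);  /* lookup only, no lock */
          if (IS_ERR(ipcp)) {
                  rcu_read_unlock();
                  return PTR_ERR(ipcp);
          }
          /* read-mostly work (e.g. permission checks) runs under RCU alone;
           * the object's spinlock is taken only when state must change */
          spin_lock(&ipcp->lock);
          /* ... modify the object ... */
          spin_unlock(&ipcp->lock);
          rcu_read_unlock();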
      
      [akpm@linux-foundation.org: propagate the ipc_obtain_object() errno from ipc_lock()]
      Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
      Acked-by: Michel Lespinasse <walken@google.com>
      Cc: Emmanuel Benisty <benisty.e@gmail.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4d2bff5e
    • ipc: remove bogus lock comment for ipc_checkid · 7bb4deff
      Davidlohr Bueso committed
      This series makes the sysv semaphore code more scalable by reducing the
      time the semaphore lock is held, and by making the locking finer grained
      for semaphore arrays with multiple semaphores.

      The first four patches were written by Davidlohr Bueso, and reduce the
      hold time of the semaphore lock.
      
      The last three patches change the sysv semaphore code locking to be more
      fine grained, providing a performance boost when multiple semaphores in a
      semaphore array are being manipulated simultaneously.
      
      On a 24-CPU system, performance numbers with the semop-multi test with
      N threads and N semaphores look like this:
      
      	vanilla		Davidlohr's	Davidlohr's +	Davidlohr's +
      	threads			patches		rwlock patches	v3 patches
      	10	610652		726325		1783589		2142206
      	20	341570		365699		1520453		1977878
      	30	288102		307037		1498167		2037995
      	40	290714		305955		1612665		2256484
      	50	288620		312890		1733453		2650292
      	60	289987		306043		1649360		2388008
      	70	291298		306347		1723167		2717486
      	80	290948		305662		1729545		2763582
      	90	290996		306680		1736021		2757524
      	100	292243		306700		1773700		3059159
      
      This patch:

      There is no reason to be holding the ipc lock while reading ipcp->seq,
      hence remove the misleading comment.

      Also simplify the function's return value.
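      Roughly, the change collapses an if/return-1/return-0 body into a single
      boolean expression (a sketch reconstructed from the description, not the
      verbatim diff):

          /* before: a comment claimed the ipc lock had to be held */
          static inline int ipc_checkid(struct kern_ipc_perm *ipcp, int uid)
          {
                  if (uid / SEQ_MULTIPLIER != ipcp->seq)
                          return 1;
                  return 0;
          }

          /* after: same test, direct boolean result, no lock implied */
          static inline int ipc_checkid(struct kern_ipc_perm *ipcp, int uid)
          {
                  return uid / SEQ_MULTIPLIER != ipcp->seq;
          }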
      Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Cc: Chegu Vinod <chegu_vinod@hp.com>
      Cc: Emmanuel Benisty <benisty.e@gmail.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7bb4deff