提交 · 321310ced2d6cc0175c76fa512fa8a829ee35223 · OpenHarmony / kernel_linux

05 5月, 2013 5 次提交

ipc: move sem_obtain_lock() rcu locking into the only caller · 321310ce

由 Linus Torvalds 提交于 5月 04, 2013

sem_obtain_lock() was another of those functions that returned with the
RCU lock held for reading in the success case.  Move the RCU locking to
the caller (semtimedop()), making it more obvious.  We already did RCU
locking elsewhere in that function.

Side note: why does semtimedop() re-do the semphore lookup after the
sleep, rather than just getting a reference to the semaphore it already
looked up originally?
Acked-by: NDavidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

321310ce

ipc: fix double sem unlock in semctl error path · fbfd1d28

由 Linus Torvalds 提交于 5月 04, 2013

Fix another ipc locking buglet introduced by the scalability patches:
when semctl_down() was changed to delay the semaphore locking, one error
path for security_sem_semctl() went through the semaphore unlock logic
even though the semaphore had never been locked.

Introduced by commit 16df3674 ("ipc,sem: do not hold ipc lock more
than necessary")
Acked-by: NDavidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fbfd1d28

ipc: move the rcu_read_lock() from sem_lock_and_putref() into callers · 4091fd94

由 Linus Torvalds 提交于 5月 04, 2013

This is another ipc semaphore locking cleanup, trying to make the
locking more straightforward.  We move the rcu read locking into the
callers of sem_lock_and_putref(), which in general means that we now
mostly do the rcu_read_lock() and rcu_read_unlock() in the same
function.

Mostly.  We still have the ipc_addid/newary/freeary mess, and things
like ipcctl_pre_down_nolock().
Acked-by: NDavidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4091fd94

ipc: sem_putref() does not need the semaphore lock any more · 73b29505

由 Linus Torvalds 提交于 5月 03, 2013

ipc_rcu_putref() uses atomics for the refcount, and the games to lock
and unlock the semaphore just to try to keep the reference counting
working are no longer useful.
Acked-by: NDavidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

73b29505

ipc: move rcu_read_unlock() out of sem_unlock() and into callers · 6d49dab8

由 Linus Torvalds 提交于 5月 03, 2013

The IPC locking is a mess, and sem_unlock() unlocks not only the
semaphore spinlock, it also drops the rcu read lock.  Unlike sem_lock(),
which just gets the spin-lock, and expects the caller to get the rcu
read lock.

This all makes things very hard to follow, and it's very confusing when
you take the rcu read lock in one function, and then release it in
another.  And it has caused actual bugs: the sem_obtain_lock() function
ended up dropping the RCU read lock twice in one error path, because it
first did the sem_unlock(), and then did a rcu_read_unlock() to match
the rcu_read_lock() it had done.

This is just a totally mindless "remove rcu_read_unlock() from
sem_unlock() and add it immediately after each caller" (except for the
aforementioned bug where we did too many rcu_read_unlock(), and in
find_alloc_undo() where we just got the rcu_read_lock() to correct for
the fact that sem_unlock would immediately drop it again).

We can (and should) clean things up further, but this fixes the bug with
the minimal amount of subtlety.
Reviewed-by: NDavidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6d49dab8

03 5月, 2013 1 次提交

ipc: fix GETALL/IPC_RM race for sysv semaphores · ce857229

由 Al Viro 提交于 5月 03, 2013

We can step on WARN_ON_ONCE() in sem_getref() if a semaphore is removed
just as we are about to call sem_getref() from semctl_main(); results
are not pretty.

We should fail with -EIDRM, same as if IPC_RM happened while we'd been
doing allocation there.  This also expands sem_getref() at its only
callsite (and fixed there), while sem_getref_and_unlock() is simply
killed off - it has no callers at all.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Acked-by: NDavidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ce857229

01 5月, 2013 4 次提交

ipc,sem: fine grained locking for semtimedop · 6062a8dc

由 Rik van Riel 提交于 4月 30, 2013

Introduce finer grained locking for semtimedop, to handle the common case
of a program wanting to manipulate one semaphore from an array with
multiple semaphores.

If the call is a semop manipulating just one semaphore in an array with
multiple semaphores, only take the lock for that semaphore itself.

If the call needs to manipulate multiple semaphores, or another caller is
in a transaction that manipulates multiple semaphores, the sem_array lock
is taken, as well as all the locks for the individual semaphores.

On a 24 CPU system, performance numbers with the semop-multi
test with N threads and N semaphores, look like this:

	vanilla		Davidlohr's	Davidlohr's +	Davidlohr's +
threads			patches		rwlock patches	v3 patches
10	610652		726325		1783589		2142206
20	341570		365699		1520453		1977878
30	288102		307037		1498167		2037995
40	290714		305955		1612665		2256484
50	288620		312890		1733453		2650292
60	289987		306043		1649360		2388008
70	291298		306347		1723167		2717486
80	290948		305662		1729545		2763582
90	290996		306680		1736021		2757524
100	292243		306700		1773700		3059159

[davidlohr.bueso@hp.com: do not call sem_lock when bogus sma]
[davidlohr.bueso@hp.com: make refcounter atomic]
Signed-off-by: NRik van Riel <riel@redhat.com>
Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
Acked-by: NDavidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Chegu Vinod <chegu_vinod@hp.com>
Cc: Jason Low <jason.low2@hp.com>
Reviewed-by: NMichel Lespinasse <walken@google.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Tested-by: NEmmanuel Benisty <benisty.e@gmail.com>
Tested-by: NSedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6062a8dc

ipc,sem: have only one list in struct sem_queue · 9f1bc2c9

由 Rik van Riel 提交于 4月 30, 2013

Having only one list in struct sem_queue, and only queueing simple
semaphore operations on the list for the semaphore involved, allows us to
introduce finer grained locking for semtimedop.
Signed-off-by: NRik van Riel <riel@redhat.com>
Acked-by: NDavidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Chegu Vinod <chegu_vinod@hp.com>
Cc: Emmanuel Benisty <benisty.e@gmail.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Tested-by: NSedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9f1bc2c9

ipc,sem: open code and rename sem_lock · c460b662

由 Rik van Riel 提交于 4月 30, 2013

Rename sem_lock() to sem_obtain_lock(), so we can introduce a sem_lock()
later that only locks the sem_array and does nothing else.

Open code the locking from ipc_lock() in sem_obtain_lock() so we can
introduce finer grained locking for the sem_array in the next patch.

[akpm@linux-foundation.org: propagate the ipc_obtain_object() errno out of sem_obtain_lock()]
Signed-off-by: NRik van Riel <riel@redhat.com>
Acked-by: NDavidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Chegu Vinod <chegu_vinod@hp.com>
Cc: Emmanuel Benisty <benisty.e@gmail.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Tested-by: NSedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c460b662

ipc,sem: do not hold ipc lock more than necessary · 16df3674

由 Davidlohr Bueso 提交于 4月 30, 2013

Instead of holding the ipc lock for permissions and security checks, among
others, only acquire it when necessary.

Some numbers....

1) With Rik's semop-multi.c microbenchmark we can see the following
   results:

Baseline (3.9-rc1):
cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
total operations: 151452270, ops/sec 5048409

+  59.40%            a.out  [kernel.kallsyms]  [k] _raw_spin_lock
+   6.14%            a.out  [kernel.kallsyms]  [k] sys_semtimedop
+   3.84%            a.out  [kernel.kallsyms]  [k] avc_has_perm_flags
+   3.64%            a.out  [kernel.kallsyms]  [k] __audit_syscall_exit
+   2.06%            a.out  [kernel.kallsyms]  [k] copy_user_enhanced_fast_string
+   1.86%            a.out  [kernel.kallsyms]  [k] ipc_lock

With this patchset:
cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
total operations: 273156400, ops/sec 9105213

+  18.54%            a.out  [kernel.kallsyms]  [k] _raw_spin_lock
+  11.72%            a.out  [kernel.kallsyms]  [k] sys_semtimedop
+   7.70%            a.out  [kernel.kallsyms]  [k] ipc_has_perm.isra.21
+   6.58%            a.out  [kernel.kallsyms]  [k] avc_has_perm_flags
+   6.54%            a.out  [kernel.kallsyms]  [k] __audit_syscall_exit
+   4.71%            a.out  [kernel.kallsyms]  [k] ipc_obtain_object_check

2) While on an Oracle swingbench DSS (data mining) workload the
   improvements are not as exciting as with Rik's benchmark, we can see
   some positive numbers.  For an 8 socket machine the following are the
   percentages of %sys time incurred in the ipc lock:

Baseline (3.9-rc1):
100 swingbench users: 8,74%
400 swingbench users: 21,86%
800 swingbench users: 84,35%

With this patchset:
100 swingbench users: 8,11%
400 swingbench users: 19,93%
800 swingbench users: 77,69%

[riel@redhat.com: fix two locking bugs]
[sasha.levin@oracle.com: prevent releasing RCU read lock twice in semctl_main]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: NDavidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: NRik van Riel <riel@redhat.com>
Reviewed-by: NChegu Vinod <chegu_vinod@hp.com>
Acked-by: NMichel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Emmanuel Benisty <benisty.e@gmail.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Tested-by: NSedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

16df3674

06 3月, 2013 1 次提交

get rid of union semop in sys_semctl(2) arguments · e1fd1f49

由 Al Viro 提交于 3月 05, 2013

just have the bugger take unsigned long and deal with SETVAL
case (when we use an int member in the union) explicitly.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

e1fd1f49

04 3月, 2013 1 次提交
- A
  make HAVE_SYSCALL_WRAPPERS unconditional · 22d1a35d
  由 Al Viro 提交于 1月 21, 2013
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  22d1a35d
07 9月, 2012 1 次提交

userns: Convert ipc to use kuid and kgid where appropriate · 1efdb69b

由 Eric W. Biederman 提交于 2月 07, 2012

- Store the ipc owner and creator with a kuid
- Store the ipc group and the crators group with a kgid.
- Add error handling to ipc_update_perms, allowing it to
  fail if the uids and gids can not be converted to kuids
  or kgids.
- Modify the proc files to display the ipc creator and
  owner in the user namespace of the opener of the proc file.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

1efdb69b

03 11月, 2011 3 次提交

ipc/sem.c: remove private structures from public header file · e57940d7

由 Manfred Spraul 提交于 11月 02, 2011

include/linux/sem.h contains several structures that are only used within
ipc/sem.c.

The patch moves them into ipc/sem.c - there is no need to expose the
structures to the whole kernel.

No functional changes, only whitespace cleanups and 80-char per line
fixes.
Signed-off-by: NManfred Spraul <manfred@colorfullife.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e57940d7

ipc/sem.c: handle spurious wakeups · 0b0577f6

由 Manfred Spraul 提交于 11月 02, 2011

semtimedop() does not handle spurious wakeups, it returns -EINTR to user
space.  Most other schedule() users would just loop and not return to user
space.  The patch adds such a loop to semtimedop()
Signed-off-by: NManfred Spraul <manfred@colorfullife.com>
Reported-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0b0577f6

ipc/sem.c: fix return code race with semop vs. semop +semctl(IPC_RMID) · 3c24783b

由 Manfred Spraul 提交于 11月 02, 2011

sys_semtimedop() may return -EIDRM although the semaphore operation
completed successfully:

thread 1:	thread 2:
		semtimedop(), sleeps
semop():
* acquires sem_lock()
		semtimedop() woken up due to timeout
		sem_lock() loops
* notices that thread 2 could be completed.
* performs the operations that thread 2 is sleeping on.
* marks the semaphore operation as IN_WAKEUP
* drops sem_lock(), does wakeup, sets return code to 0
		* thread delayed due to interrupt, whatever
* returns to user space
		* thread still delayed
semctl(IPC_RMID)
* acquires sem_lock()
* ipc_rmid(), ipcp->deleted=1
* drops sem_lock()
		* thread finally continues - but seem_lock()
		  now fails due to ipcp->deleted == 1
		* returns -EIDRM instead of 0

The fix is trivial: Always use the return code in queue.status.

In real world, the race probably doesn't matter:
If the semaphore array is destroyed, the app is probably not interested
if the last operation succeeded or was already cancelled.
Signed-off-by: NManfred Spraul <manfred@colorfullife.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <efault@gmx.de>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3c24783b

26 7月, 2011 1 次提交

ipc/sem.c: fix race with concurrent semtimedop() timeouts and IPC_RMID · d694ad62

由 Manfred Spraul 提交于 7月 25, 2011

If a semaphore array is removed and in parallel a sleeping task is woken
up (signal or timeout, does not matter), then the woken up task does not
wait until wake_up_sem_queue_do() is completed. This will cause crashes,
because wake_up_sem_queue_do() will read from a stale pointer.

The fix is simple: Regardless of anything, always call get_queue_result().
This function waits until wake_up_sem_queue_do() has finished it's task.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=27142Reported-by: NYuriy Yevtukhov <yuriy@ucoz.com>
Reported-by: NHarald Laabs <kernel@dasr.de>
Signed-off-by: NManfred Spraul <manfred@colorfullife.com>
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
Cc: <stable@kernel.org> [2.6.35+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d694ad62

21 7月, 2011 1 次提交

ipc,rcu: Convert call_rcu(free_un) to kfree_rcu() · 693a8b6e

由 Lai Jiangshan 提交于 3月 18, 2011

The rcu callback free_un() just calls a kfree(),
so we use kfree_rcu() instead of the call_rcu(free_un).
Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>

693a8b6e

31 3月, 2011 1 次提交

Fix common misspellings · 25985edc

由 Lucas De Marchi 提交于 3月 30, 2011

Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: NLucas De Marchi <lucas.demarchi@profusion.mobi>

25985edc

24 3月, 2011 1 次提交

userns: user namespaces: convert several capable() calls · b0e77598

由 Serge E. Hallyn 提交于 3月 23, 2011

CAP_IPC_OWNER and CAP_IPC_LOCK can be checked against current_user_ns(),
because the resource comes from current's own ipc namespace.

setuid/setgid are to uids in own namespace, so again checks can be against
current_user_ns().

Changelog:
	Jan 11: Use task_ns_capable() in place of sched_capable().
	Jan 11: Use nsown_capable() as suggested by Bastian Blank.
	Jan 11: Clarify (hopefully) some logic in futex and sched.c
	Feb 15: use ns_capable for ipc, not nsown_capable
	Feb 23: let copy_ipcs handle setting ipc_ns->user_ns
	Feb 23: pass ns down rather than taking it from current

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: NSerge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: NDaniel Lezcano <daniel.lezcano@free.fr>
Acked-by: NDavid Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b0e77598

02 10月, 2010 1 次提交

sys_semctl: fix kernel stack leakage · 982f7c2b

由 Dan Rosenberg 提交于 9月 30, 2010

The semctl syscall has several code paths that lead to the leakage of
uninitialized kernel stack memory (namely the IPC_INFO, SEM_INFO,
IPC_STAT, and SEM_STAT commands) during the use of the older, obsolete
version of the semid_ds struct.

The copy_semid_to_user() function declares a semid_ds struct on the stack
and copies it back to the user without initializing or zeroing the
"sem_base", "sem_pending", "sem_pending_last", and "undo" pointers,
allowing the leakage of 16 bytes of kernel stack memory.

The code is still reachable on 32-bit systems - when calling semctl()
newer glibc's automatically OR the IPC command with the IPC_64 flag, but
invoking the syscall directly allows users to use the older versions of
the struct.
Signed-off-by: NDan Rosenberg <dan.j.rosenberg@gmail.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

982f7c2b

21 7月, 2010 1 次提交

ipc/sem.c: bugfix for semop() not reporting successful operation · c61284e9

由 Manfred Spraul 提交于 7月 20, 2010

The last change to improve the scalability moved the actual wake-up out of
the section that is protected by spin_lock(sma->sem_perm.lock).

This means that IN_WAKEUP can be in queue.status even when the spinlock is
acquired by the current task.  Thus the same loop that is performed when
queue.status is read without the spinlock acquired must be performed when
the spinlock is acquired.

Thanks to kamezawa.hiroyu@jp.fujitsu.com for noticing lack of the memory
barrier.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16255

[akpm@linux-foundation.org: clean up kerneldoc, checkpatch warning and whitespace]
Signed-off-by: NManfred Spraul <manfred@colorfullife.com>
Reported-by: NLuca Tettamanti <kronos.it@gmail.com>
Tested-by: NLuca Tettamanti <kronos.it@gmail.com>
Reported-by: NChristoph Lameter <cl@linux-foundation.org>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c61284e9

28 5月, 2010 4 次提交

ipc/sem.c: use ERR_CAST · 4de85cd6

由 Julia Lawall 提交于 5月 26, 2010

Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)).  The former makes more
clear what is the purpose of the operation, which otherwise looks like a
no-op.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
type T;
T x;
identifier f;
@@

T f (...) { <+...
- ERR_PTR(PTR_ERR(x))
+ x
 ...+> }

@@
expression x;
@@

- ERR_PTR(PTR_ERR(x))
+ ERR_CAST(x)
// </smpl>
Signed-off-by: NJulia Lawall <julia@diku.dk>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4de85cd6

ipc/sem.c: update description of the implementation · c5cf6359

由 Manfred Spraul 提交于 5月 26, 2010

ipc/sem.c begins with a 15 year old description about bugs in the initial
implementation in Linux-1.0.  The patch replaces that with a top level
description of the current code.

A TODO could be derived from this text:

The opengroup man page for semop() does not mandate FIFO.  Thus there is
no need for a semaphore array list of pending operations.

If

- this list is removed
- the per-semaphore array spinlock is removed (possible if there is no
  list to protect)
- sem_otime is moved into the semaphores and calculated on demand during
  semctl()

then the array would be read-mostly - which would significantly improve
scaling for applications that use semaphore arrays with lots of entries.

The price would be expensive semctl() calls:

	for(i=0;i<sma->sem_nsems;i++) spin_lock(sma->sem_lock);
	<do stuff>
	for(i=0;i<sma->sem_nsems;i++) spin_unlock(sma->sem_lock);

I'm not sure if the complexity is worth the effort, thus here is the
documentation of the current behavior first.
Signed-off-by: NManfred Spraul <manfred@colorfullife.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c5cf6359

ipc/sem.c: move wake_up_process out of the spinlock section · 0a2b9d4c

由 Manfred Spraul 提交于 5月 26, 2010

The wake-up part of semtimedop() consists out of two steps:

- the right tasks must be identified.
- they must be woken up.

Right now, both steps run while the array spinlock is held.  This patch
reorders the code and moves the actual wake_up_process() behind the point
where the spinlock is dropped.

The code also moves setting sem->sem_otime to one place: It does not make
sense to set the last modify time multiple times.

[akpm@linux-foundation.org: repair kerneldoc]
[akpm@linux-foundation.org: fix uninitialised retval]
Signed-off-by: NManfred Spraul <manfred@colorfullife.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0a2b9d4c

ipc/sem.c: optimize update_queue() for bulk wakeup calls · fd5db422

由 Manfred Spraul 提交于 5月 26, 2010

The following series of patches tries to fix the spinlock contention
reported by Chris Mason - his benchmark exposes problems of the current
code:

- In the worst case, the algorithm used by update_queue() is O(N^2).
  Bulk wake-up calls can enter this worst case.  The patch series fix
  that.

  Note that the benchmark app doesn't expose the problem, it just should
  be fixed: Real world apps might do the wake-ups in another order than
  perfect FIFO.

- The part of the code that runs within the semaphore array spinlock is
  significantly larger than necessary.

  The patch series fixes that.  This change is responsible for the main
  improvement.

- The cacheline with the spinlock is also used for a variable that is
  read in the hot path (sem_base) and for a variable that is unnecessarily
  written to multiple times (sem_otime).  The last step of the series
  cacheline-aligns the spinlock.

This patch:

The SysV semaphore code allows to perform multiple operations on all
semaphores in the array as atomic operations.  After a modification,
update_queue() checks which of the waiting tasks can complete.

The algorithm that is used to identify the tasks is O(N^2) in the worst
case.  For some cases, it is simple to avoid the O(N^2).

The patch adds a detection logic for some cases, especially for the case
of an array where all sleeping tasks are single sembuf operations and a
multi-sembuf operation is used to wake up multiple tasks.

A big database application uses that approach.

The patch fixes wakeup due to semctl(,,SETALL,) - the initial version of
the patch breaks that.

[akpm@linux-foundation.org: make do_smart_update() static]
Signed-off-by: NManfred Spraul <manfred@colorfullife.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fd5db422

16 12月, 2009 9 次提交

ipc: remove unreachable code in sem.c · e5cc9c7b

由 Amerigo Wang 提交于 12月 15, 2009

This line is unreachable, remove it.

[akpm@linux-foundation.org: remove unneeded initialisation of `err']
Signed-off-by: NWANG Cong <amwang@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e5cc9c7b

ipc/sem.c: optimize single sops when semval is zero · d987f8b2

由 Manfred Spraul 提交于 12月 15, 2009

If multiple simple decrements on the same semaphore are pending, then the
current code scans all decrement operations, even if the semaphore value
is already 0.

The patch optimizes that: if the semaphore value is 0, then there is no
need to scan the q->alter entries.

Note that this is a common case: It happens if 100 decrements by one are
pending and now an increment by one increases the semaphore value from 0
to 1.  Without this patch, all 100 entries are scanned.  With the patch,
only one entry is scanned, then woken up.  Then the new rule triggers and
the scanning is aborted, without looking at the remaining 99 tasks.

With this patch, single sop increment/decrement by 1 are now O(1).
(same as with Nick's patch)
Signed-off-by: NManfred Spraul <manfred@colorfullife.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d987f8b2

ipc/sem.c: optimize single semop operations · 636c6be8

由 Manfred Spraul 提交于 12月 15, 2009

sysv sem has the concept of semaphore arrays that consist out of multiple
semaphores.  Atomic operations that affect multiple semaphores are
supported.

The patch optimizes single semaphore operation calls that affect only one
semaphore: It's not necessary to scan all pending operations, it is
sufficient to scan the per-semaphore list.

The idea is from Nick Piggin version of an ipc sem improvement, the
implementation is different: The code tries to keep as much common code as
possible.

As the result, the patch is simpler, but optimizes fewer cases.
Signed-off-by: NManfred Spraul <manfred@colorfullife.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

636c6be8

ipc/sem.c: add a per-semaphore pending list · b97e820f

由 Manfred Spraul 提交于 12月 15, 2009

Based on Nick's findings:

sysv sem has the concept of semaphore arrays that consist out of multiple
semaphores.  Atomic operations that affect multiple semaphores are
supported.

The patch is the first step for optimizing simple, single semaphore
operations: In addition to the global list of all pending operations, a
2nd, per-semaphore list with the simple operations is added.

Note: this patch does not make sense by itself, the new list is used
nowhere.
Signed-off-by: NManfred Spraul <manfred@colorfullife.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b97e820f

ipc/sem.c: optimize if semops fail · b6e90822

由 Manfred Spraul 提交于 12月 15, 2009

Reduce the amount of scanning of the list of pending semaphore operations:
If try_atomic_semop failed, then no changes were applied.  Thus no need to
restart.

Additionally, this patch correct an incorrect comment: It's possible to
wait for arbitrary semaphore values (do a dec by <x>, wait-for-zero, inc
by <x> in one atomic operation)

Both changes are from Nick Piggin, the patch is the result of a different
split of the individual changes.
Signed-off-by: NManfred Spraul <manfred@colorfullife.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b6e90822

ipc/sem.c: sem preempt improve · d4212093

由 Nick Piggin 提交于 12月 15, 2009

The strange sysv semaphore wakeup scheme has a kind of busy-wait lock
involved, which could deadlock if preemption is enabled during the "lock".

It is an implementation detail (due to a spinlock being held) that this is
actually the case. However if "spinlocks" are made preemptible, or if the
sem lock is changed to a sleeping lock for example, then the wakeup would
become buggy. So this might be a bugfix for -rt kernels.

Imagine waker being preempted by wakee and never clearing IN_WAKEUP -- if
wakee has higher RT priority then there is a priority inversion deadlock.
Even if there is not a priority inversion to cause a deadlock, then there
is still time wasted spinning.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NManfred Spraul <manfred@colorfullife.com>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d4212093

ipc/sem.c: sem use list operations · 9cad200c

由 Nick Piggin 提交于 12月 15, 2009

Replace the handcoded list operations in update_queue() with the standard
list_for_each_entry macros.

list_for_each_entry_safe() must be used, because list entries can
disappear immediately uppon the wakeup event.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NManfred Spraul <manfred@colorfullife.com>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9cad200c

ipc/sem.c: sem optimise undo list search · bf17bb71

由 Nick Piggin 提交于 12月 15, 2009

Around a month ago, there was some discussion about an improvement of the
sysv sem algorithm: Most (at least: some important) users only use simple
semaphore operations, therefore it's worthwile to optimize this use case.

This patch:

Move last looked up sem_undo struct to the head of the task's undo list.
Attempt to move common entries to the front of the list so search time is
reduced.  This reduces lookup_undo on oprofile of problematic SAP workload
by 30% (see patch 4 for a description of SAP workload).
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NManfred Spraul <manfred@colorfullife.com>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bf17bb71

ipc ns: fix memory leak (idr) · 7d6feeb2

由 Serge E. Hallyn 提交于 12月 15, 2009

We have apparently had a memory leak since
7ca7e564 "ipc: store ipcs into IDRs" in
2007.  The idr of which 3 exist for each ipc namespace is never freed.

This patch simply frees them when the ipcns is freed.  I don't believe any
idr_remove() are done from rcu (and could therefore be delayed until after
this idr_destroy()), so the patch should be safe.  Some quick testing
showed no harm, and the memory leak fixed.

Caught by kmemleak.
Signed-off-by: NSerge E. Hallyn <serue@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7d6feeb2

15 4月, 2009 1 次提交

rculist: use list_entry_rcu in places where it's appropriate · 05725f7e

由 Jiri Pirko 提交于 4月 14, 2009

Use previously introduced list_entry_rcu instead of an open-coded
list_entry + rcu_dereference combination.
Signed-off-by: NJiri Pirko <jpirko@redhat.com>
Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: dipankar@in.ibm.com
LKML-Reference: <20090414181715.GA3634@psychotron.englab.brq.redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

05725f7e

14 1月, 2009 2 次提交

H
[CVE-2009-0029] System call wrappers part 25 · d5460c99
由 Heiko Carstens 提交于 1月 14, 2009
```
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
```
d5460c99

[CVE-2009-0029] System call wrapper special cases · 6673e0c3

由 Heiko Carstens 提交于 1月 14, 2009

System calls with an unsigned long long argument can't be converted with
the standard wrappers since that would include a cast to long, which in
turn means that we would lose the upper 32 bit on 32 bit architectures.
Also semctl can't use the standard wrapper since it has a 'union'
parameter.

So we handle them as special case and add some extra wrappers instead.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>

6673e0c3

07 1月, 2009 1 次提交

ipc: do not goto to the next line · e953ac21

由 Denis V. Lunev 提交于 1月 06, 2009

Signed-off-by: NDenis V. Lunev <den@openvz.org>
Reviewed-by: NWANG Cong <wangcong@zeuux.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e953ac21

06 1月, 2009 1 次提交

mm: update my address · 046c6884

由 Alan Cox 提交于 1月 05, 2009

Signed-off-by: NAlan Cox <alan@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

046c6884

OpenHarmony / kernel_linux 上一次同步 3 年多

OpenHarmony / kernel_linux
上一次同步 3 年多