Commits · 4395eb1f16cc55406fe3de4546134fc61253a06b · openanolis / cloud-kernel

18 Feb 2015 · 1 commit

ipc,sem: use current->state helpers · 52644c9a

Committed by Davidlohr Bueso on Feb 17, 2015

Call __set_current_state() instead of assigning the new state directly.
These interfaces also aid CONFIG_DEBUG_ATOMIC_SLEEP environments, keeping
track of who changed the state.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

14 Dec 2014 · 4 commits

shmdt: use i_size_read() instead of ->i_size · 07a46ed2

Committed by Dave Hansen on Dec 12, 2014

Andrew Morton noted

	http://lkml.kernel.org/r/20141104142027.a7a0d010772d84560b445f59@linux-foundation.org

that shmdt uses inode->i_size without holding i_mutex.
There is one more case in shm.c in shm_destroy().  This converts
both users over to use i_size_read().
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc/shm.c: fix overly aggressive shmdt() when calls span multiple segments · d3c97900

Committed by Dave Hansen on Dec 12, 2014

This is a highly contrived scenario, but a single shmdt() call can be
induced into unmapping memory from multiple shm segments.  Example code
is here:

	http://www.sr71.net/~dave/intel/shmfun.c

The fix is pretty simple: Record the 'struct file' for the first VMA we
encounter and then stick to it.  Decline to unmap anything not from the
same file and thus the same segment.
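A minimal userspace sketch of that rule, assuming a simplified VMA model: `struct sketch_vma` and `sketch_shmdt_range()` are hypothetical names, not the kernel's `shmdt()` code, and the array stands in for the VMAs found from the attach address onward.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified VMA: only the fields this sketch needs. */
struct sketch_vma {
	const void *file;  /* backing file; identifies the shm segment */
	int unmapped;      /* set to 1 when the sketch "unmaps" the VMA */
};

/*
 * Walk the VMAs in address order, starting at the attach address.
 * The first VMA fixes which file (and thus which segment) is being
 * detached; VMAs backed by any other file are skipped, so a single
 * call cannot spill over into a neighboring segment.
 * Returns the number of VMAs unmapped.
 */
static size_t sketch_shmdt_range(struct sketch_vma *vmas, size_t n)
{
	const void *file;
	size_t count = 0;

	if (n == 0)
		return 0;
	file = vmas[0].file;  /* remember the first VMA's file */

	for (size_t i = 0; i < n; i++) {
		if (vmas[i].file != file)
			continue;     /* different segment: leave it mapped */
		vmas[i].unmapped = 1;
		count++;
	}
	return count;
}
```

With VMAs backed by files A, B, A, only the two A-backed VMAs are unmapped.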

I found this by inspection and the odds of anyone hitting this in practice
are pretty darn small.

Lightly tested, but it's a pretty small patch.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc/msg: increase MSGMNI, remove scaling · 0050ee05

Committed by Manfred Spraul on Dec 12, 2014

SysV can be abused to allocate locked kernel memory.  For most systems, a
small limit doesn't make sense; see the discussion with regard to SHMMAX.

Therefore: increase MSGMNI to the maximum supported.

And: If we ignore the risk of locking too much memory, then an automatic
scaling of MSGMNI doesn't make sense.  Therefore the logic can be removed.

The code preserves auto_msgmni to avoid breaking any user space applications
that expect that the value exists.

Notes:
1) If an administrator must limit the memory allocations, then he can set
MSGMNI as necessary.

Or he can disable sysv entirely (as e.g. done by Android).

2) MSGMAX and MSGMNB are intentionally not increased, as these values are used
to control latency vs. throughput:
If MSGMNB is large, then msgsnd() just returns and more messages can be queued
before a task switch to a task that calls msgrcv() is forced.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc/sem.c: change memory barrier in sem_lock() to smp_rmb() · 2e094abf

Committed by Manfred Spraul on Dec 12, 2014

When I fixed bugs in the sem_lock() logic, I was more conservative than
necessary.  Therefore it is safe to replace the smp_mb() with smp_rmb().
And: With smp_rmb(), semop() syscalls are up to 10% faster.

The race we must protect against is:

	sem->lock is free
	sma->complex_count = 0
	sma->sem_perm.lock held by thread B

thread A:

A: spin_lock(&sem->lock)

			B: sma->complex_count++; (now 1)
			B: spin_unlock(&sma->sem_perm.lock);

A: spin_is_locked(&sma->sem_perm.lock);
A: XXXXX memory barrier
A: if (sma->complex_count == 0)

Thread A must read the increased complex_count value, i.e. the read must
not be reordered with the read of sem_perm.lock done by spin_is_locked().

Since it's about ordering of reads, smp_rmb() is sufficient.
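The reader side of this race can be modeled in userspace with C11 `<stdatomic.h>`. This is a sketch, not the kernel's `sem_lock()`: `struct sketch_sma`, `sketch_fast_path_ok()` and `sketch_thread_b()` are hypothetical, the lock is reduced to a flag, and `atomic_thread_fence(memory_order_acquire)` stands in for `smp_rmb()` because C11 has no load-load-only fence (the acquire fence is slightly stronger).

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of the two fields in the race above. */
struct sketch_sma {
	atomic_int perm_lock;      /* 1 while sma->sem_perm.lock is held */
	atomic_int complex_count;  /* number of pending complex operations */
};

/*
 * Thread A's check after taking the per-semaphore lock.
 * The fence keeps the complex_count load from being satisfied
 * before the perm_lock load.
 * Returns 1 if the per-semaphore fast path may be used.
 */
static int sketch_fast_path_ok(struct sketch_sma *sma)
{
	if (atomic_load_explicit(&sma->perm_lock, memory_order_relaxed))
		return 0;                               /* global lock held */
	atomic_thread_fence(memory_order_acquire);  /* analog of smp_rmb() */
	return atomic_load_explicit(&sma->complex_count,
				    memory_order_relaxed) == 0;
}

/* Thread B's side: bump the counter, then release the global lock. */
static void sketch_thread_b(struct sketch_sma *sma)
{
	atomic_fetch_add_explicit(&sma->complex_count, 1,
				  memory_order_relaxed);
	atomic_store_explicit(&sma->perm_lock, 0, memory_order_release);
}
```

If A's load observes B's release store of the unlocked flag, the fence guarantees that A's following load also observes `complex_count == 1`, so A does not take the fast path.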

[akpm@linux-foundation.org: update sem_lock() comment, from Davidlohr]
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

05 Dec 2014 · 5 commits

copy address of proc_ns_ops into ns_common · 33c42940

Committed by Al Viro on Nov 01, 2014

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

new helpers: ns_alloc_inum/ns_free_inum · 6344c433

Committed by Al Viro on Nov 01, 2014

take struct ns_common *, for now simply wrappers around proc_{alloc,free}_inum()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

make proc_ns_operations work with struct ns_common * instead of void * · 64964528

Committed by Al Viro on Nov 01, 2014

We can do that now.  And kill ->inum(), while we are at it - all instances
are identical.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

switch the rest of proc_ns_operations to working with &...->ns · 3c041184

Committed by Al Viro on Nov 01, 2014

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

common object embedded into various struct ....ns · 435d5f4b

Committed by Al Viro on Oct 31, 2014

for now - just move corresponding ->proc_inum instances over there
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

04 Dec 2014 · 1 commit

ipc/sem.c: fully initialize sem_array before making it visible · e8577d1f

Committed by Manfred Spraul on Dec 02, 2014

ipc_addid() makes a new ipc identifier visible to everyone.  New objects
start as locked, so that the caller can complete the initialization
after the call.  Within struct sem_array, at least sma->sem_base and
sma->sem_nsems are accessed without any locks, therefore this approach
doesn't work.

Thus: Move the ipc_addid() to the end of the initialization.
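The ordering this fix relies on can be sketched with a C11 release/acquire pair. The kernel publishes through `ipc_addid()` and its IDR, not a plain array; `sketch_ids`, `sketch_newary()` and `sketch_lookup()` are hypothetical, and the point is only that every field a lock-free reader touches is written before the pointer becomes visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdatomic.h>

/* Hypothetical, cut-down sem_array: only the fields read without locks. */
struct sketch_sem_array {
	int sem_nsems;
	int *sem_base;
};

#define SKETCH_MAX_IDS 8

/* Hypothetical ID table; a NULL slot means "not visible yet". */
static _Atomic(struct sketch_sem_array *) sketch_ids[SKETCH_MAX_IDS];

/* Initialize every field first, then publish with a release store. */
static void sketch_newary(int id, struct sketch_sem_array *sma,
			  int *base, int nsems)
{
	assert(id >= 0 && id < SKETCH_MAX_IDS);
	sma->sem_base = base;
	sma->sem_nsems = nsems;
	/* Only now make the object reachable (the role of ipc_addid()). */
	atomic_store_explicit(&sketch_ids[id], sma, memory_order_release);
}

/* Lock-free lookup: the acquire load pairs with the release store. */
static struct sketch_sem_array *sketch_lookup(int id)
{
	assert(id >= 0 && id < SKETCH_MAX_IDS);
	return atomic_load_explicit(&sketch_ids[id], memory_order_acquire);
}
```

A reader that gets a non-NULL pointer from `sketch_lookup()` is guaranteed to see the initialized `sem_base` and `sem_nsems`; publishing first and initializing afterwards would lose that guarantee.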
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reported-by: Rik van Riel <riel@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

20 Nov 2014 · 1 commit

new helper: audit_file() · 9f45f5bf

Committed by Al Viro on Oct 31, 2014

... for situations when we don't have any candidate in pathnames - basically,
in descriptor-based syscalls.

[Folded the build fix for !CONFIG_AUDITSYSCALL configs from Chen Gang]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

14 Oct 2014 · 4 commits

ipc: resolve shadow warnings · 0d5e7580

Committed by Mark Rustad on Oct 13, 2014

Resolve some shadow warnings produced in W=2 builds by changing the name
of some parameters and local variables.  Change instances of "s64"
because that clashes with the well-known typedef.  Also change a local
variable with the name "up" because that clashes with the name of the
"up" function for semaphores.  These names are hazards, so eliminate the
hazards by renaming them.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc/util.c: use __seq_open_private() instead of seq_open() · d66a0520

Committed by Rob Jones on Oct 13, 2014

Using __seq_open_private() removes boilerplate code from
sysvipc_proc_open().

The resultant code is shorter and easier to follow.

However, please note that __seq_open_private() calls kzalloc() rather than
kmalloc(), which may affect timing due to the memory initialisation
overhead.
Signed-off-by: Rob Jones <rob.jones@codethink.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc/shm: kill the historical/wrong mm->start_stack check · bf77b94c

Committed by Oleg Nesterov on Oct 13, 2014

do_shmat() is the only user of ->start_stack (proc just reports its
value), and this check looks ugly and wrong.

The reason for this check is not clear at all, and it wrongly assumes that
the stack can only grow down.

But the main problem is that in general mm->start_stack has nothing to do
with stack_vma->vm_start.  Not only can the application switch to another
stack and even unmap this area, but setup_arg_pages() also expands the stack
without updating mm->start_stack during exec().  This means that in the
likely case "addr > start_stack - size - PAGE_SIZE * 5" is simply
impossible after find_vma_intersection() == F, or the stack can't grow
anyway because of RLIMIT_STACK.

Many thanks to Hugh for his explanations.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc: always handle a new value of auto_msgmni · 1195d94e

Committed by Andrey Vagin on Oct 13, 2014

proc_dointvec_minmax() returns zero if a new value has been set.  So we
don't need to check that all characters have been handled.

Below you can find two examples.  In the first one, the new value has not
been handled properly.

```
$ strace ./a.out
open("/proc/sys/kernel/auto_msgmni", O_WRONLY) = 3
write(3, "0\n\0", 3)                    = 2
close(3)                                = 0
exit_group(0)
$ cat /sys/kernel/debug/tracing/trace

$ strace ./a.out
open("/proc/sys/kernel/auto_msgmni", O_WRONLY) = 3
write(3, "0\n", 2)                      = 2
close(3)                                = 0

$ cat /sys/kernel/debug/tracing/trace
a.out-697   [000] ....  3280.998235: unregister_ipcns_notifier <-proc_ipcauto_dointvec_minmax
```

Fixes: 9eefe520 ("ipc: do not use a negative value to re-enable msgmni automatic recomputin")
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Joe Perches <joe@perches.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

09 Sep 2014 · 1 commit

Documentation: Docbook: Fix generated DocBook/kernel-api.xml · da3dae54

Committed by Masanari Iida on Sep 09, 2014

This patch fixes spelling typos found in DocBook/kernel-api.xml.
Because the file is generated from the source comments,
the typos have to be fixed in the comments in the source code.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

09 Aug 2014 · 2 commits

shm: allow exit_shm in parallel if only marking orphans · 83293c0f

Committed by Jack Miller on Aug 08, 2014

If shm_rmid_forced is not set (the default state), then the shmids are only
marked as orphaned and do not require any add, delete, or locking of the
tree structure.

Separate the sysctl on and off cases, and only obtain the read lock.  The
newly added list head can be deleted under the read lock because we are
only called with current and will only change the shmids allocated by this
task and not manipulate the list.

This commit assumes that up_read includes a sufficient memory barrier for
the writes to be seen by others that later obtain a write lock.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Jack Miller <millerjo@us.ibm.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

shm: make exit_shm work proportional to task activity · ab602f79

Committed by Jack Miller on Aug 08, 2014

This is a small set of patches our team has had kicking around for a few
versions internally that fixes tasks getting hung on shm_exit when there
are many threads hammering it at once.

Anton wrote a simple test to cause the issue:

  http://ozlabs.org/~anton/junkcode/bust_shm_exit.c

Before applying this patchset, this test code will cause either hanging
tracebacks or pthread out of memory errors.

After this patchset, it will still produce output like:

  root@somehost:~# ./bust_shm_exit 1024 160
  ...
  INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 116, t=2111 jiffies, g=241, c=240, q=7113)
  INFO: Stall ended before state dump start
  ...

But the task will continue to run along happily, so we consider this an
improvement over hanging, even if it's a bit noisy.

This patch (of 3):

exit_shm obtains the ipc_ns shm rwsem for write and holds it while it
walks every shared memory segment in the namespace.  Thus the amount of
work is related to the number of shm segments in the namespace, not the
number of segments that might need to be cleaned.

In addition, this occurs after the task has been notified the thread has
exited, so the number of tasks waiting for the ns shm rwsem can grow
without bound until memory is exhausted.

Add a list to the task struct of all shmids allocated by this task.  Init
the list head in copy_process.  Use the ns->rwsem for locking.  Add
segments after id is added, remove before removing from id.
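A sketch of the per-task bookkeeping, assuming a singly linked list in place of the kernel's `list_head` and omitting the `ns->rwsem` locking: `struct sketch_task`, `struct sketch_shm` and both functions are hypothetical. It shows only why exit cost becomes proportional to the segments the task created, and it models the default (orphan-only) case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical segment, linked into its creator's list. */
struct sketch_shm {
	struct sketch_shm *next_in_task;  /* next segment made by same task */
	int orphaned;                     /* set once the creator has exited */
};

/* Hypothetical task: head of the list of segments it created. */
struct sketch_task {
	struct sketch_shm *shm_list;
};

/* Called when the task creates a segment (after the ID is added). */
static void sketch_shm_add(struct sketch_task *t, struct sketch_shm *shp)
{
	shp->orphaned = 0;
	shp->next_in_task = t->shm_list;
	t->shm_list = shp;
}

/*
 * Exit path: visit only this task's segments instead of every
 * segment in the namespace.  Returns the number visited.
 */
static size_t sketch_exit_shm(struct sketch_task *t)
{
	size_t visited = 0;

	for (struct sketch_shm *shp = t->shm_list; shp;
	     shp = shp->next_in_task) {
		shp->orphaned = 1;
		visited++;
	}
	t->shm_list = NULL;  /* the task no longer owns any segment */
	return visited;
}
```

However many segments other tasks own, `sketch_exit_shm()` touches only the caller's list.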

On unshare of NEW_IPCNS orphan any ids as if the task had exited, similar
to handling of semaphore undo.

I chose a define for the init sequence since it's a simple list init;
otherwise it would require a function call to avoid include loops between
the semaphore code and the task struct.  Converting the list_del to
list_del_init for the unshare cases would remove the exit followed by
init, but I left it to blow up if not initialized.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Jack Miller <millerjo@us.ibm.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

30 Jul 2014 · 1 commit

namespaces: Use task_lock and not rcu to protect nsproxy · 728dba3a

Committed by Eric W. Biederman on Feb 03, 2014

The synchronous synchronize_rcu in switch_task_namespaces makes setns
a sufficiently expensive system call that people have complained.

Upon inspection, nsproxy no longer needs rcu protection for remote reads.
Remote reads are rare.  So optimize for same-process reads and writes
by switching to task_lock instead.

This yields a simpler to understand lock, and a faster setns system call.

In particular this fixes a performance regression observed
by Rafael David Tinoco <rafael.tinoco@canonical.com>.

This is effectively a revert of Pavel Emelyanov's commit
cf7b708c Make access to task's nsproxy lighter
from 2007.  The race this originally fixed no longer exists as
do_notify_parent uses task_active_pid_ns(parent) instead of
parent->nsproxy.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

07 Jun 2014 · 16 commits

ipc: convert use of typedef ctl_table to struct ctl_table · a5c5928b

Committed by Joe Perches on Jun 06, 2014

This typedef is unnecessary and should just be removed.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc/sem.c: add a printk_once for semctl(GETNCNT/GETZCNT) · 9b44ee2e

Committed by Manfred Spraul on Jun 06, 2014

The actual Linux implementation for semctl(GETNCNT) and semctl(GETZCNT)
always (since 0.99.10) reported a thread as sleeping on all semaphores
that are listed in the semop() call.

The documented behavior (both in the Linux man page and in the Single
Unix Specification) is that a task should be reported on exactly one
semaphore: The semaphore that caused the thread to go to sleep.

This patch adds a pr_info_once() that is triggered if a thread hits the
relevant case.

The code triggers slightly too often, otherwise it would be necessary to
replicate the old code.  As there are no known users of GETNCNT or
GETZCNT, this is done to prevent unnecessary bloat.

The task that triggered is reported with name (tsk->comm) and pid.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Davidlohr Bueso <davidlohr@hp.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc/sem.c: make semctl(,,{GETNCNT,GETZCNT}) standard compliant · b220c57a

Committed by Manfred Spraul on Jun 06, 2014

SUSv4 clearly defines how semncnt and semzcnt must be calculated: A task
waits on exactly one semaphore: The semaphore from the first operation
in the sop array that cannot proceed.
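The SUSv4 rule can be expressed as a small function. This is a hypothetical sketch, not the kernel's `perform_atomic_semop()`: `struct sketch_sembuf` and `sketch_first_blocker()` are illustrative names, the function applies the operations in order to a scratch copy of the values, ignores `sem_flg` and the `SEMVMX` range check, and reports which semaphore (and which count) the task would be charged to.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for struct sembuf (sem_flg omitted). */
struct sketch_sembuf {
	unsigned short sem_num;
	short sem_op;
};

enum sketch_wait { SKETCH_NO_WAIT, SKETCH_WAIT_NCNT, SKETCH_WAIT_ZCNT };

/*
 * Apply the operations in order to vals (pass a scratch copy) and
 * stop at the first one that cannot proceed.  The task is counted
 * only on that semaphore: in semncnt if it needs the value to grow,
 * in semzcnt if it waits for zero.  *blocked_sem is set when waiting.
 */
static enum sketch_wait sketch_first_blocker(int *vals,
		const struct sketch_sembuf *sops, size_t nsops,
		unsigned short *blocked_sem)
{
	for (size_t i = 0; i < nsops; i++) {
		int *v = &vals[sops[i].sem_num];

		if (sops[i].sem_op == 0 && *v != 0) {
			*blocked_sem = sops[i].sem_num;
			return SKETCH_WAIT_ZCNT;
		}
		if (sops[i].sem_op < 0 && *v + sops[i].sem_op < 0) {
			*blocked_sem = sops[i].sem_num;
			return SKETCH_WAIT_NCNT;
		}
		*v += sops[i].sem_op;  /* this operation can proceed */
	}
	return SKETCH_NO_WAIT;
}
```

With values {1, 0, 5} and the operations {sem 0: -1, sem 1: -1, sem 2: 0}, the first operation succeeds and the second blocks, so the task is charged only to semaphore 1's semncnt, whereas the older logic described here could also count it on semaphores 0 and 2.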

The Linux implementation never followed the standard, it tried to count
all semaphores that might be the reason why a task sleeps.

This patch fixes that.

Note:
a) The implementation assumes that GETNCNT and GETZCNT are rare operations,
   therefore the code counts them only on demand.
   (If they weren't rare, then the non-compliance would have
   been found earlier.)

b) compared to the initial version of the patch, the BUG_ONs were removed
   and it was clarified that the new behavior conforms to SUS.

Back-compatibility concerns:

Manfred:

: - there is no application in Fedora that uses GETNCNT or GETZCNT.
:
: - applications that use only single-sop semop() are also safe; the
:   difference only affects complex apps.
:
: - portable applications are also safe; the new behavior is standard
:   compliant.
:
: But that's it.  The old behavior existed in Linux from 0.99.something
: until now.

Michael:

: * These operations seem to be very little used.  Grepping the public
:   source contained on the Fedora 20 source DVD, there appear to be no
:   uses.  Of course, this says nothing about uses in private /
:   non-mainstream FOSS code, but it seems likely that the same pattern
:   is followed there.
:
: * The existing behavior is hard enough to understand that I suspect
:   that no one understood it well enough to rely on it anyway
:   (especially as that behavior contradicted both man page and POSIX).
:
: So, there's a chance of breakage, but I estimate that it's minute.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc/sem.c: store which operation blocks in perform_atomic_semop() · ed247b7c

Committed by Manfred Spraul on Jun 06, 2014

Preparation for the next patch:

In the slow-path of perform_atomic_semop(), store a pointer to the
operation that caused the operation to block.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc/sem.c: change perform_atomic_semop parameters · d198cd6d

Committed by Manfred Spraul on Jun 06, 2014

Right now, perform_atomic_semop gets the content of sem_queue as
individual fields.  Change that: instead, pass a pointer to sem_queue.

This is a preparation for the next patch: it uses sem_queue to store the
reason why a task must sleep.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc/sem.c: remove code duplication · 2f2ed41d

Committed by Manfred Spraul on Jun 06, 2014

count_semzcnt and count_semncnt are more or less identical.  The patch
creates a single function that either counts the number of tasks waiting
for zero or waiting due to a decrease operation.

Compared to the initial version, the BUG_ONs were removed.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc/sem.c: bugfix for semctl(,,GETZCNT) · 1994862d

Committed by Manfred Spraul on Jun 06, 2014

GETZCNT is supposed to return the number of threads that wait until a
semaphore value becomes 0.

The current implementation overlooks complex operations that contain
both wait-for-zero operations and operations that alter at least one
semaphore.

The patch fixes that.  It's intentionally copy&paste, this will be
cleaned up in the next patch.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc,msg: document volatile r_msg · 4bb6657d

Committed by Davidlohr Bueso on Jun 06, 2014

The need for volatile is not obvious, document it.
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc,msg: move some msgq ns code around · 3440a6bd

Committed by Davidlohr Bueso on Jun 06, 2014

Nothing big and no logical changes, just get rid of some redundant
function declarations.  Move msg_[init/exit]_ns down to the end of the
file.
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc,msg: use current->state helpers · f75a2f35

Committed by Davidlohr Bueso on Jun 06, 2014

Call __set_current_state() instead of assigning the new state directly.
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Manfred Spraul <manfred@colorfullif.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc/shm.c: check for integer overflow during shmget. · 1376327c

Committed by Manfred Spraul on Jun 06, 2014

SHMMAX is the upper limit for the size of a shared memory segment, counted
in bytes.  The actual allocation is that size, rounded up to the next full
page.

Add a check that prevents the creation of segments where the rounded up
size causes an integer overflow.
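A check of this shape can be sketched as follows, assuming 4 KiB pages; `SKETCH_PAGE_SHIFT` and `sketch_shm_numpages()` are hypothetical names, not the code added to `newseg()`. If `size + PAGE_SIZE - 1` wraps, the computed page count is too small, so shifting it back to bytes gives a value below `size`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>  /* SIZE_MAX, handy for exercising the overflow case */

#define SKETCH_PAGE_SHIFT 12                        /* assumes 4 KiB pages */
#define SKETCH_PAGE_SIZE ((size_t)1 << SKETCH_PAGE_SHIFT)

/*
 * Round a requested segment size up to whole pages.
 * Returns 0 and stores the page count on success, or -1 if the
 * rounding wrapped around (the kernel would refuse the segment).
 */
static int sketch_shm_numpages(size_t size, size_t *numpages)
{
	size_t pages = (size + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;

	/* A wrapped sum yields too few pages: pages * PAGE_SIZE < size. */
	if (pages << SKETCH_PAGE_SHIFT < size)
		return -1;
	*numpages = pages;
	return 0;
}
```

For example, `sketch_shm_numpages(4097, &n)` stores 2, while a request of `SIZE_MAX` bytes is rejected because the rounded-up size wraps to almost zero.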
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc/shm.c: check for overflows of shm_tot · 09c6eb1f

Committed by Manfred Spraul on Jun 06, 2014

shm_tot counts the total number of pages used by shm segments.

If SHMALL is ULONG_MAX (or nearly ULONG_MAX), then the number can
overflow.  Subsequent calls to shmctl(,SHM_INFO,) would return wrong
values for shm_tot.

The patch adds a detection for overflows.
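The detection can be sketched as a helper that refuses a new segment when the unsigned total would wrap or exceed the limit. `sketch_shm_tot_exceeds()` is a hypothetical name; its parameters only mirror `shm_tot`, the new segment's page count, and SHMALL.

```c
#include <assert.h>
#include <limits.h>  /* ULONG_MAX, used when exercising the wrap case */

/*
 * Returns 1 if adding numpages to the running total would either
 * wrap the unsigned long counter or exceed the SHMALL-style limit.
 */
static int sketch_shm_tot_exceeds(unsigned long shm_tot,
				  unsigned long numpages,
				  unsigned long shm_ctlall)
{
	/* Unsigned addition wraps modulo 2^N, so a wrapped sum is smaller. */
	if (shm_tot + numpages < shm_tot)
		return 1;
	return shm_tot + numpages > shm_ctlall;
}
```

Checking the wrap first matters when SHMALL is `ULONG_MAX`: the limit comparison alone can never fail then, yet the sum may still have wrapped.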
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc/shm.c: check for ulong overflows in shmat · 247a8ce8

Committed by Manfred Spraul on Jun 06, 2014

The increase of SHMMAX/SHMALL is a 4 patch series.

The change itself is trivial; the only problems are integer overflows.
The overflows are not new, but if we make huge values the default, then
the code should be free from overflows.

SHMMAX:

- shmem_file_setup places a hard limit on the segment size:
  MAX_LFS_FILESIZE.

  On 32-bit, the limit is > 1 TB, i.e. 4 GB-1 byte segments are
  possible. Rounded up to full pages the actual allocated size
  is 0. --> must be fixed, patch 3

- shmat:
  - find_vma_intersection does not handle overflows properly.
    --> must be fixed, patch 1

  - the rest is fine, do_mmap_pgoff limits mappings to TASK_SIZE
    and checks for overflows (i.e.: map 2 GB, starting from
    addr=2.5GB fails).

SHMALL:
- after creating 8192 segments of size (1L<<63)-1, shm_tot overflows and
  returns 0.  --> must be fixed, patch 2.

Userspace:
- Obviously, there could be overflows in userspace. There is nothing
  we can do except use values smaller than ULONG_MAX.
  I ended up with "ULONG_MAX - 1L<<24":

  - TASK_SIZE cannot be used because it is the size of the current
    task. Could be 4G if it's a 32-bit task on a 64-bit kernel.

  - The maximum size is not standardized across archs:
    I found TASK_MAX_SIZE, TASK_SIZE_MAX and TASK_SIZE_64.

  - Just in case some arch revives a 4G/4G split, nearly
    ULONG_MAX is a valid segment size.

  - Using "0" as a magic value for infinity is even worse, because
    right now 0 means 0, i.e. fail all allocations.
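The SHMALL figure above works out as follows, assuming 4 KiB pages and a 64-bit `unsigned long`: each segment rounds up to $2^{51}$ pages, and 8192 of them bring the page counter exactly to $2^{64}$, which wraps to zero.

$$\left\lceil \frac{2^{63}-1}{2^{12}} \right\rceil = 2^{51}\ \text{pages per segment}, \qquad 8192 \cdot 2^{51} = 2^{13} \cdot 2^{51} = 2^{64} \equiv 0 \pmod{2^{64}}$$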

This patch (of 4):

find_vma_intersection() does not work as intended if addr+size overflows.
The patch adds a manual check before the call to find_vma_intersection.
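A sketch of the kind of manual check described, assuming addresses are `unsigned long`; `sketch_range_ok()` is a hypothetical helper rather than the code added to `do_shmat()`.

```c
#include <assert.h>
#include <limits.h>  /* ULONG_MAX, used when exercising the wrap case */

/*
 * Reject an [addr, addr + size) range whose end wraps past ULONG_MAX:
 * an intersection search given the wrapped end would examine the
 * wrong address range.  Returns 1 if the range is usable.
 */
static int sketch_range_ok(unsigned long addr, unsigned long size)
{
	return addr + size >= addr;  /* false only if the sum wrapped */
}
```

Only when the range passes this test is it meaningful to ask whether it intersects an existing mapping.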
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc, kernel: clear whitespace · 46c0a8ca

Committed by Paul McQuade on Jun 06, 2014

trailing whitespace
Signed-off-by: Paul McQuade <paulmcquad@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc, kernel: use Linux headers · 7153e402

Committed by Paul McQuade on Jun 06, 2014

Use #include <linux/uaccess.h> instead of <asm/uaccess.h>
Use #include <linux/types.h> instead of <asm/types.h>
Signed-off-by: Paul McQuade <paulmcquad@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc: constify ipc_ops · eb66ec44

Committed by Mathias Krause on Jun 06, 2014

There is no need to recreate the very same ipc_ops structure on every
kernel entry for msgget/semget/shmget.  Just declare it static and be
done with it.  While at it, constify it as we don't modify the structure
at runtime.
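A hypothetical illustration of the pattern in plain C: `struct sketch_ipc_ops` only loosely resembles the kernel's `struct ipc_ops`, and the functions are placeholders. A `static const` table is initialized once at compile time instead of being filled in on every call, typically lands in read-only data, and turns accidental writes into compile errors.

```c
#include <assert.h>

/* Hypothetical ops table, loosely modeled on struct ipc_ops. */
struct sketch_ipc_ops {
	int (*getnew)(int key);
	int (*more_checks)(int id, int flags);
};

/* Placeholder callbacks for the sketch. */
static int sketch_newque(int key)
{
	return key + 1;
}

static int sketch_check(int id, int flags)
{
	(void)id;
	return flags == 0 ? 0 : -1;
}

/* One shared, immutable table instead of a stack copy per call. */
static const struct sketch_ipc_ops sketch_msg_ops = {
	.getnew      = sketch_newque,
	.more_checks = sketch_check,
};

/* A syscall-style entry point simply references the shared table. */
static int sketch_msgget(int key)
{
	return sketch_msg_ops.getnew(key);
}
```

An assignment such as `sketch_msg_ops.getnew = NULL;` would now be rejected by the compiler.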

Found in the PaX patch, written by the PaX Team.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

08 Apr 2014 · 2 commits

ipc: use device_initcall · 6d08a256

Committed by Davidlohr Bueso on Apr 07, 2014

... since __initcall is now deprecated.
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc/compat.c: remove sc_semopm macro · 187841a8

Committed by Davidlohr Bueso on Apr 07, 2014

This macro appears to have been introduced back in the 2.5 era for
semtimedop32 backward compatibility on ia32:

  https://lkml.org/lkml/2003/4/28/78

Nowadays, this syscall in compat just defaults back to the code found in
sem.c, so it is no longer used and can thus be removed:

```c
long compat_sys_semtimedop(int semid, struct sembuf __user *tsems,
		unsigned nsops, const struct compat_timespec __user *timeout)
{
	struct timespec __user *ts64;
	if (compat_convert_timespec(&ts64, timeout))
		return -EFAULT;
	return sys_semtimedop(semid, tsems, nsops, ts64);
}
```

Furthermore, there are no users in compat.c.  After this change, kernel
builds just fine with both CONFIG_SYSVIPC_COMPAT and CONFIG_SYSVIPC.
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

17 Mar 2014 · 1 commit

ipc: Fix 2 bugs in msgrcv() MSG_COPY implementation · 4f87dac3

Committed by Michael Kerrisk on Mar 10, 2014

While testing and documenting the msgrcv() MSG_COPY flag that Stanislav
Kinsbursky added in commit 4a674f34 ("ipc: introduce message queue
copy feature" => kernel 3.8), I discovered a couple of bugs in the
implementation.  The two bugs concern MSG_COPY interactions with other
msgrcv() flags, namely:

 (A) MSG_COPY + MSG_EXCEPT
 (B) MSG_COPY + !IPC_NOWAIT

The bugs are distinct (and the fix for the first one is obvious),
however my fix for both is a single-line patch, which is why I'm
combining them in a single mail, rather than writing two mails+patches.

 ===== (A) MSG_COPY + MSG_EXCEPT =====

With the addition of the MSG_COPY flag, there are now two msgrcv()
flags--MSG_COPY and MSG_EXCEPT--that modify the meaning of the 'msgtyp'
argument in unrelated ways.  Specifying both in the same call is a
logical error that is currently permitted, with the effect that MSG_COPY
has priority and MSG_EXCEPT is ignored.  The call should give an error
if both flags are specified.  The patch below implements that behavior.

 ===== (B) MSG_COPY + !IPC_NOWAIT =====

The test code that was submitted in commit 3a665531 ("selftests: IPC
message queue copy feature test") shows MSG_COPY being used in
conjunction with IPC_NOWAIT.  In other words, if there is no message at
the position 'msgtyp', return immediately with the error ENOMSG.

What was not (fully) tested is the behavior if MSG_COPY is specified
*without* IPC_NOWAIT, and there is an odd behavior.  If the queue
contains fewer than 'msgtyp' messages, then the call blocks until the
next message is written to the queue.  At that point, the msgrcv() call
returns a copy of the newly added message, regardless of whether that
message is at the ordinal position 'msgtyp'.  This is clearly bogus, and
problematic for applications that might want to make use of the MSG_COPY
flag.

I considered the following possible solutions to this problem:

 (1) Force the call to block until a message *does* appear at the
     position 'msgtyp'.

 (2) If the MSG_COPY flag is specified, the kernel should implicitly add
     IPC_NOWAIT, so that the call fails with ENOMSG for this case.

 (3) If the MSG_COPY flag is specified, but IPC_NOWAIT is not, generate
     an error (probably, EINVAL is the right one).

I do not know if any application would really want to have the
functionality of solution (1), especially since an application can
determine in advance the number of messages in the queue using msgctl()
IPC_STAT.  Obviously, this solution would be the most work to implement.

Solution (2) would have the effect of silently fixing any applications
that tried to employ broken behavior.  However, it would mean that if we
later decided to implement solution (1), then user-space could not
easily detect what the kernel supports (but, since I'm somewhat doubtful
that solution (1) is needed, I'm not sure that this is much of a
problem).

Solution (3) would have the effect of informing broken applications that
they are doing something broken.  The downside is that this would cause
an ABI breakage for any applications that are currently employing the
broken behavior.  However:

a) Those applications are almost certainly not getting the results they
   expect.
b) Possibly, those applications don't even exist, because MSG_COPY is
   currently hidden behind CONFIG_CHECKPOINT_RESTORE.

The upside of solution (3) is that if we later decided to implement
solution (1), user-space could determine what the kernel supports, via
the error return.

In my view, solution (3) is mildly preferable to solution (2), and
solution (1) could still be done later if anyone really cares.  The
patch below implements solution (3).
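Solution (3), together with the MSG_EXCEPT check from (A), can be sketched as a flag check. The octal values are the Linux ones as found in the kernel's uapi headers, copied under hypothetical `SKETCH_` names so the sketch needs no system headers; `sketch_check_msgrcv_flags()` is likewise illustrative, not the patched `do_msgrcv()`.

```c
#include <assert.h>
#include <errno.h>

/* Linux flag values, copied under sketch names to avoid header clashes. */
#define SKETCH_IPC_NOWAIT 04000
#define SKETCH_MSG_EXCEPT 020000
#define SKETCH_MSG_COPY   040000

/*
 * MSG_COPY may not be combined with MSG_EXCEPT (case A) and must be
 * accompanied by IPC_NOWAIT (case B, solution 3).
 * Returns 0 if the flags are acceptable, -EINVAL otherwise.
 */
static int sketch_check_msgrcv_flags(int msgflg)
{
	if ((msgflg & SKETCH_MSG_COPY) &&
	    ((msgflg & SKETCH_MSG_EXCEPT) || !(msgflg & SKETCH_IPC_NOWAIT)))
		return -EINVAL;
	return 0;
}
```

Because the blocking combination now fails with EINVAL instead of silently misbehaving, a later kernel that implemented solution (1) could be told apart from user space by the absence of that error.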

PS.  For anyone out there still listening, it's the usual story:
documenting an API (and the thinking about, and the testing of the API,
that documentation entails) is one of the single best ways of
finding bugs in the API, as I've learned from a lot of experience.  Best
to do that documentation before releasing the API.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: stable@vger.kernel.org
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

06 Mar 2014 · 1 commit

ipc/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types · 8eee9093

Committed by Heiko Carstens on Mar 04, 2014

In order to allow the COMPAT_SYSCALL_DEFINE macro to generate code that
performs proper zero and sign extension, convert all 64 bit parameters
to their corresponding 32 bit compat counterparts.
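The difference between the two extensions can be shown in userspace C. `sketch_widen_unsigned()` and `sketch_widen_signed()` are hypothetical helpers modeling what a generated compat wrapper does with a 64-bit register that carries a 32-bit argument whose upper half may not be clean; the wrapper can only pick the right extension if it knows the parameter's 32-bit type.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 32-bit parameter: keep the low half, zero-fill the rest. */
static uint64_t sketch_widen_unsigned(uint64_t reg)
{
	return (uint64_t)(uint32_t)reg;  /* zero extension */
}

/*
 * Signed 32-bit parameter: keep the low half, copy bit 31 upward.
 * Converting an out-of-range value to int32_t is implementation-defined
 * in ISO C; GCC and Clang wrap modulo 2^32, which this sketch assumes.
 */
static int64_t sketch_widen_signed(uint64_t reg)
{
	return (int64_t)(int32_t)(uint32_t)reg;  /* sign extension */
}
```

For a register holding `0xdeadbeefffffffff`, the unsigned view is `0xffffffff` while the signed view is `-1`; passing the raw 64-bit value through would be wrong for both.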
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>

openanolis / cloud-kernel: synced successfully about 1 year ago
