1. 07 Jun 2014, 13 commits
  2. 08 Apr 2014, 2 commits
  3. 17 Mar 2014, 1 commit
    • ipc: Fix 2 bugs in msgrcv() MSG_COPY implementation · 4f87dac3
      Authored by Michael Kerrisk
      While testing and documenting the msgrcv() MSG_COPY flag that Stanislav
      Kinsbursky added in commit 4a674f34 ("ipc: introduce message queue
      copy feature" => kernel 3.8), I discovered a couple of bugs in the
      implementation.  The two bugs concern MSG_COPY interactions with other
      msgrcv() flags, namely:
      
       (A) MSG_COPY + MSG_EXCEPT
       (B) MSG_COPY + !IPC_NOWAIT
      
      The bugs are distinct (and the fix for the first one is obvious),
      however my fix for both is a single-line patch, which is why I'm
      combining them in a single mail, rather than writing two mails+patches.
      
       ===== (A) MSG_COPY + MSG_EXCEPT =====
      
      With the addition of the MSG_COPY flag, there are now two msgrcv()
      flags--MSG_COPY and MSG_EXCEPT--that modify the meaning of the 'msgtyp'
      argument in unrelated ways.  Specifying both in the same call is a
      logical error that is currently permitted, with the effect that MSG_COPY
      has priority and MSG_EXCEPT is ignored.  The call should give an error
      if both flags are specified.  The patch below implements that behavior.
      
       ===== (B) MSG_COPY + !IPC_NOWAIT =====
      
      The test code that was submitted in commit 3a665531 ("selftests: IPC
      message queue copy feature test") shows MSG_COPY being used in
      conjunction with IPC_NOWAIT.  In other words, if there is no message at
      the position 'msgtyp', the call fails immediately with the error ENOMSG.
      
      What was not (fully) tested is the behavior if MSG_COPY is specified
      *without* IPC_NOWAIT, and here the behavior is odd.  If the queue
      contains fewer than 'msgtyp' messages, then the call blocks until the
      next message is written to the queue.  At that point, the msgrcv() call
      returns a copy of the newly added message, regardless of whether that
      message is at the ordinal position 'msgtyp'.  This is clearly bogus, and
      problematic for applications that might want to make use of the MSG_COPY
      flag.
      
      I considered the following possible solutions to this problem:
      
       (1) Force the call to block until a message *does* appear at the
           position 'msgtyp'.
      
       (2) If the MSG_COPY flag is specified, the kernel should implicitly add
           IPC_NOWAIT, so that the call fails with ENOMSG for this case.
      
       (3) If the MSG_COPY flag is specified, but IPC_NOWAIT is not, generate
           an error (probably, EINVAL is the right one).
      
      I do not know if any application would really want to have the
      functionality of solution (1), especially since an application can
      determine in advance the number of messages in the queue using msgctl()
      IPC_STAT.  Obviously, this solution would be the most work to implement.
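      As a minimal illustration of that advance check (assuming an existing
      queue id 'msqid'; msg_qnum is the standard msqid_ds field holding the
      current message count):

        #include <sys/msg.h>

        /* Return the number of messages currently in the queue, or -1. */
        long queued_messages(int msqid)
        {
            struct msqid_ds ds;

            if (msgctl(msqid, IPC_STAT, &ds) == -1)
                return -1;
            return (long)ds.msg_qnum;
        }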
      
      Solution (2) would have the effect of silently fixing any applications
      that tried to employ broken behavior.  However, it would mean that if we
      later decided to implement solution (1), then user-space could not
      easily detect what the kernel supports (but, since I'm somewhat doubtful
      that solution (1) is needed, I'm not sure that this is much of a
      problem).
      
      Solution (3) would have the effect of informing broken applications that
      they are doing something broken.  The downside is that this would cause
      an ABI break for any applications that are currently relying on the
      broken behavior.  However:
      
      a) Those applications are almost certainly not getting the results they
         expect.
      b) Possibly, those applications don't even exist, because MSG_COPY is
         currently hidden behind CONFIG_CHECKPOINT_RESTORE.
      
      The upside of solution (3) is that if we later decided to implement
      solution (1), user-space could determine what the kernel supports, via
      the error return.
      
      In my view, solution (3) is mildly preferable to solution (2), and
      solution (1) could still be done later if anyone really cares.  The
      patch below implements solution (3).
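      With solution (3), a conforming caller always pairs MSG_COPY with
      IPC_NOWAIT.  A hedged sketch of such a caller (MSG_COPY is Linux-specific
      and may not be exposed by glibc, hence the fallback definition; the
      queue id and buffer size are assumptions):

        #include <errno.h>
        #include <stdio.h>
        #include <sys/types.h>
        #include <sys/msg.h>

        #ifndef MSG_COPY
        #define MSG_COPY 040000  /* from <linux/msg.h>; glibc may omit it */
        #endif

        struct pmsg { long mtype; char mtext[8192]; };

        /* Peek at (copy, do not remove) the message at ordinal position n. */
        int peek_message(int msqid, long n)
        {
            struct pmsg buf;
            ssize_t r;

            /* IPC_NOWAIT is now mandatory with MSG_COPY (EINVAL otherwise);
             * MSG_COPY | MSG_EXCEPT likewise fails with EINVAL. */
            r = msgrcv(msqid, &buf, sizeof(buf.mtext), n,
                       MSG_COPY | IPC_NOWAIT);
            if (r == -1)
                return errno == ENOMSG ? 0 : -1;  /* nothing at position n */
            printf("message %ld: type=%ld, %zd bytes\n", n, buf.mtype, r);
            return 1;
        }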
      
      PS.  For anyone out there still listening, it's the usual story:
      documenting an API (and the thinking about, and testing of, the API that
      documentation entails) is one of the single best ways of finding bugs in
      an API, as I've learned from a lot of experience.  Best to do that
      documentation before releasing the API.
      Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
      Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Cc: stable@vger.kernel.org
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 06 Mar 2014, 3 commits
  5. 26 Feb 2014, 1 commit
    • ipc,mqueue: remove limits for the amount of system-wide queues · f3713fd9
      Authored by Davidlohr Bueso
      Commit 93e6f119 ("ipc/mqueue: cleanup definition names and
      locations") added global hardcoded limits on the number of message
      queues that can be created.  While these limits are per-namespace, in
      practice they end up breaking userspace applications.  Historically,
      users have, at least in theory, been able to create up to INT_MAX
      queues, and limiting that to just 1024 is far too low and drastic for
      some workloads and use cases.  For instance, Madars reports:
      
       "This update imposes bad limits on our multi-process application.  As
        our app uses approaches that each process opens its own set of queues
        (usually something about 3-5 queues per process).  In some scenarios
        we might run up to 3000 processes or more (which of-course for linux
        is not a problem).  Thus we might need up to 9000 queues or more.  All
        processes run under one user."
      
      Other affected users can be found in launchpad bug #1155695:
        https://bugs.launchpad.net/ubuntu/+source/manpages/+bug/1155695
      
      Instead of increasing this limit, revert it entirely and fall back to
      the original way of dealing with queue limits -- where new queues cannot
      be created once a user's resource limit is reached and all memory is
      used.
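      As a hedged illustration, the per-user byte budget that now bounds queue
      creation can be inspected with getrlimit(); RLIMIT_MSGQUEUE is the
      standard limit on bytes chargeable to a user's POSIX message queues:

        #include <stdio.h>
        #include <sys/resource.h>

        int main(void)
        {
            struct rlimit rl;

            if (getrlimit(RLIMIT_MSGQUEUE, &rl) == -1) {
                perror("getrlimit");
                return 1;
            }
            /* Queue creation starts failing once this budget is exhausted. */
            printf("RLIMIT_MSGQUEUE: soft=%llu hard=%llu\n",
                   (unsigned long long)rl.rlim_cur,
                   (unsigned long long)rl.rlim_max);
            return 0;
        }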
      Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
      Reported-by: Madars Vitolins <m@silodev.com>
      Acked-by: Doug Ledford <dledford@redhat.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: <stable@vger.kernel.org>	[3.5+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 03 Feb 2014, 1 commit
    • compat: Get rid of (get|put)_compat_time(val|spec) · 81993e81
      Authored by H. Peter Anvin
      We have two APIs for compatibility timespec/timeval handling, with
      confusingly similar names.  compat_(get|put)_time(val|spec) *do* handle
      the case where COMPAT_USE_64BIT_TIME is set, whereas
      (get|put)_compat_time(val|spec) do not.  This is an accident waiting
      to happen.
      
      Clean it up by favoring the full-service version; the limited version
      is replaced with double-underscore versions static to kernel/compat.c.
      
      A common pattern is to convert a struct timespec to kernel format in
      an allocation on the user stack.  Unfortunately it is open-coded in
      several places.  Since this allocation isn't actually needed if
      COMPAT_USE_64BIT_TIME is true (because user format == kernel format),
      encapsulate the whole pattern in the function
      compat_convert_timespec().  An equivalent function should be written
      for struct timeval if it is needed in the future.
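      A simplified sketch of the helper's shape (condensed from
      kernel/compat.c; not the verbatim kernel code):

        int compat_convert_timespec(struct timespec __user **kts,
                                    const void __user *cts)
        {
            struct timespec ts;
            struct timespec __user *uts;

            /* user format == kernel format: pass the pointer through */
            if (!cts || COMPAT_USE_64BIT_TIME) {
                *kts = (struct timespec __user *)cts;
                return 0;
            }

            /* otherwise convert into an allocation on the user stack */
            uts = compat_alloc_user_space(sizeof(ts));
            if (!uts)
                return -EFAULT;
            if (compat_get_timespec(&ts, cts) ||
                copy_to_user(uts, &ts, sizeof(ts)))
                return -EFAULT;

            *kts = uts;
            return 0;
        }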
      
      Finally, get rid of compat_(get|put)_timeval_convert(): each was used
      only once, and the latter was not even doing what its name said (no
      conversion was actually being done).  Moving the conversion into
      compat_sys_settimeofday() itself makes the code much more similar to
      sys_settimeofday().
      
      v3: Remove unused compat_convert_timeval().
      
      v2: Drop bogus "const" in the destination argument for
          compat_convert_time*().
      
      Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Hans Verkuil <hans.verkuil@cisco.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Mateusz Guzik <mguzik@redhat.com>
      Cc: Rafael Aquini <aquini@redhat.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Tested-by: H.J. Lu <hjl.tools@gmail.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  7. 28 Jan 2014, 11 commits
  8. 22 Nov 2013, 2 commits
    • ipc,shm: correct error return value in shmctl (SHM_UNLOCK) · 3a72660b
      Authored by Jesper Nilsson
      Commit 2caacaa8 ("ipc,shm: shorten critical region for shmctl")
      restructured the IPC shm code to shorten the critical region, but
      introduced a path where the return value could be -EPERM even though
      the operation was actually performed.
      
      Before that commit, the err return value was reset by the return value
      from security_shm_shmctl() after the if (!ns_capable(...)) statement.

      Now we still exit the if statement with err set to -EPERM, and in the
      case of SHM_UNLOCK it is never reset, so it is used as the return value
      from shmctl().
      
      To fix this, only set err when an error actually occurs, leaving the
      fallthrough case alone, as in the sketch below.
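      A hedged userspace illustration of the bug shape and the fix (allowed()
      and do_unlock() are hypothetical stand-ins for the permission checks and
      the actual SHM_UNLOCK work):

        #include <errno.h>
        #include <stdbool.h>

        static bool allowed(void)   { return true; }  /* stand-in */
        static int  do_unlock(void) { return 0; }     /* stand-in */

        /* Buggy shape: err is preset before the check, so the success
         * path falls through still carrying -EPERM. */
        static int unlock_buggy(void)
        {
            int err = -EPERM;

            if (!allowed())
                goto out;       /* error path: -EPERM, as intended    */
            do_unlock();        /* success path: err is still -EPERM! */
        out:
            return err;
        }

        /* Fixed shape: err is assigned only on the failing branch. */
        static int unlock_fixed(void)
        {
            if (!allowed())
                return -EPERM;
            return do_unlock();
        }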
      Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>	[3.12.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc,shm: fix shm_file deletion races · a399b29d
      Authored by Greg Thelen
      When IPC_RMID races with other shm operations there's potential for
      use-after-free of the shm object's associated file (shm_file).
      
      Here's the race before this patch:
      
        TASK 1                     TASK 2
        ------                     ------
        shm_rmid()
          ipc_lock_object()
                                   shmctl()
                                   shp = shm_obtain_object_check()
      
          shm_destroy()
            shm_unlock()
            fput(shp->shm_file)
                                   ipc_lock_object()
                                   shmem_lock(shp->shm_file)
                                   <OOPS>
      
      The oops is caused because shm_destroy() calls fput() after dropping the
      ipc_lock.  fput() clears the file's f_inode, f_path.dentry, and
      f_path.mnt, which causes various NULL pointer references in task 2.  I
      reliably see the oops in task 2 with the shmlock/shm_rmid races in the
      workloads below.
      
      This patch fixes the races by:
      1) setting shm_file = NULL in shm_destroy() while holding
         ipc_object_lock();
      2) modifying at-risk operations to check shm_file while holding
         ipc_object_lock().
      This pattern is sketched below.
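      A hedged userspace sketch of the pattern, with a pthread mutex standing
      in for ipc_object_lock() (illustrative only, not the kernel code):

        #include <pthread.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        static pthread_mutex_t object_lock = PTHREAD_MUTEX_INITIALIZER;
        static char *shm_file;  /* stand-in for shp->shm_file */

        /* Destroy path: detach the pointer under the lock; release the
         * final reference only after dropping it (like the deferred fput). */
        static void destroy_path(void)
        {
            pthread_mutex_lock(&object_lock);
            char *file = shm_file;
            shm_file = NULL;                 /* fix step 1 */
            pthread_mutex_unlock(&object_lock);
            free(file);
        }

        /* At-risk operation: re-check shm_file under the lock (step 2). */
        static int use_path(void)
        {
            int err = 0;

            pthread_mutex_lock(&object_lock);
            if (!shm_file)
                err = -1;                    /* object is being destroyed */
            else
                printf("using %s\n", shm_file);
            pthread_mutex_unlock(&object_lock);
            return err;
        }

        int main(void)
        {
            shm_file = strdup("shm_file");
            use_path();                       /* ok: pointer still valid   */
            destroy_path();
            return use_path() == -1 ? 0 : 1;  /* fails cleanly, no oops    */
        }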
      
      Example workloads, each of which triggers the oops:
      
      Workload 1:
        while true; do
          id=$(shmget 1 4096)
          shm_rmid $id &
          shmlock $id &
          wait
        done
      
        The oops stack shows accessing NULL f_inode due to racing fput:
          _raw_spin_lock
          shmem_lock
          SyS_shmctl
      
      Workload 2:
        while true; do
          id=$(shmget 1 4096)
          shmat $id 4096 &
          shm_rmid $id &
          wait
        done
      
        The oops stack is similar to workload 1 due to NULL f_inode:
          touch_atime
          shmem_mmap
          shm_mmap
          mmap_region
          do_mmap_pgoff
          do_shmat
          SyS_shmat
      
      Workload 3:
        while true; do
          id=$(shmget 1 4096)
          shmlock $id
          shm_rmid $id &
          shmunlock $id &
          wait
        done
      
        The oops stack shows a second fput() tripping on a NULL f_inode.  The
        first fput() completed via shm_destroy(), but a racing thread did a
        get_file() and queued this fput():
          locks_remove_flock
          __fput
          ____fput
          task_work_run
          do_notify_resume
          int_signal
      
      Fixes: c2c737a0 ("ipc,shm: shorten critical region for shmat")
      Fixes: 2caacaa8 ("ipc,shm: shorten critical region for shmctl")
      Signed-off-by: Greg Thelen <gthelen@google.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: <stable@vger.kernel.org>  # 3.10.17+ 3.11.6+
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 13 Nov 2013, 2 commits
    • ipc, msg: fix message length check for negative values · 4e9b45a1
      Authored by Mathias Krause
      On 64-bit systems the test for negative message sizes is bogus: the
      size, which may be positive when evaluated as a long, gets truncated to
      an int when passed to load_msg().  So a long might very well contain a
      positive value that becomes negative once truncated to an int.
      
      That, in combination with a small negative value of msg_ctlmax (which
      will be promoted to an unsigned type for the comparison against msgsz,
      making it a big positive value and therefore letting it pass the check),
      will lead to two problems: 1/ The kmalloc() call in alloc_msg() will
      allocate a buffer that is too small, as the addition of alen is
      effectively a subtraction.  2/ The
      copy_from_user() call in load_msg() will first overflow the buffer with
      userland data and then, when the userland access generates an access
      violation, the fixup handler copy_user_handle_tail() will try to fill the
      remainder with zeros -- roughly 4GB.  That almost instantly results in a
      system crash or reset.
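      A minimal demonstration of the truncation on an LP64 system (the
      constant mirrors the reproducer below):

        #include <stdio.h>

        int main(void)
        {
            long msgsz = 0xfffffff0L;  /* positive as a 64-bit long        */
            int  alen  = (int)msgsz;   /* truncates to -16 as a 32-bit int */

            printf("as long: %ld, as int: %d\n", msgsz, alen);
            return 0;
        }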
      
        ,-[ Reproducer (needs to be run as root) ]--
        | #include <sys/stat.h>
        | #include <sys/msg.h>
        | #include <unistd.h>
        | #include <fcntl.h>
        |
        | int main(void) {
        |     long msg = 1;
        |     int fd;
        |
        |     fd = open("/proc/sys/kernel/msgmax", O_WRONLY);
        |     write(fd, "-1", 2);
        |     close(fd);
        |
        |     msgsnd(0, &msg, 0xfffffff0, IPC_NOWAIT);
        |
        |     return 0;
        | }
        '---
      
      Fix the issue by preventing msgsz from getting truncated by consistently
      using size_t for the message length.  This way the size checks in
      do_msgsnd() could still be passed with a negative value for msg_ctlmax but
      we would fail on the buffer allocation in that case and error out.
      
      Also change the type of m_ts from int to size_t to avoid similar nastiness
      in other code paths -- it is used in similar constructs, i.e.  signed vs.
      unsigned checks.  It should never become negative under normal
      circumstances, though.
      
      Setting msg_ctlmax to a negative value is an odd configuration and should
      be prevented.  As that might break existing userland, it will be handled
      in a separate commit so it could easily be reverted and reworked without
      reintroducing the above described bug.
      
      Hardening mechanisms for user copy operations would have caught that
      bug early -- e.g. checking slab object sizes on user copy operations,
      as the usercopy feature of the PaX patch does.  Or, for that matter,
      detecting the long vs. int sign change due to truncation, as the size
      overflow plugin of the very same patch does.
      
      [akpm@linux-foundation.org: fix i386 min() warnings]
      Signed-off-by: Mathias Krause <minipli@googlemail.com>
      Cc: Pax Team <pageexec@freemail.hu>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Brad Spengler <spender@grsecurity.net>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: <stable@vger.kernel.org>	[ v2.3.27+ -- yes, that old ;) ]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc/util.c: remove unnecessary work pending test · 206fa940
      Authored by Xie XiuQi
      Remove the unnecessary work-pending test before calling
      schedule_work(): the same test is already performed in queue_work_on().
      No functional change.
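      An illustrative kernel-style fragment of the change ('my_work' is a
      hypothetical work item):

        /* before: redundant guard */
        if (!work_pending(&my_work))
            schedule_work(&my_work);

        /* after: schedule_work() -> queue_work_on() already refuses to
         * queue a work item whose WORK_STRUCT_PENDING bit is set */
        schedule_work(&my_work);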
      Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
      Cc: Tejun Heo <tj@kernel.org>
      Reviewed-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 09 Nov 2013, 1 commit
    • locks: break delegations on unlink · b21996e3
      Authored by J. Bruce Fields
      We need to break delegations on any operation that changes the set of
      links pointing to an inode.  Start with unlink.
      
      Such operations also hold the i_mutex on a parent directory.  Breaking a
      delegation may require waiting for a timeout (by default 90 seconds) in
      the case of an unresponsive NFS client.  To avoid blocking all directory
      operations, we therefore drop locks before waiting for the delegation.
      The logic then looks like:
      
      	acquire locks
      	...
      	test for delegation; if found:
      		take reference on inode
      		release locks
      		wait for delegation break
      		drop reference on inode
      		retry
      
      It is possible this could never terminate.  (Even if we take precautions
      to prevent another delegation being acquired on the same inode, we could
      get a different inode on each retry.)  But this seems very unlikely.
      
      The initial test for a delegation happens after the lock on the target
      inode is acquired, but the directory inode may have been acquired
      further up the call stack.  We therefore add a "struct inode **"
      argument to any intervening functions, which we use to pass the inode
      back up to the caller when a delegation needs to be broken synchronously
      (see the sketch below).
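      A simplified sketch of the resulting caller pattern (condensed from the
      shape of do_unlinkat() in fs/namei.c; lookup and error handling
      trimmed):

        struct inode *delegated_inode = NULL;
        int error;

        retry_deleg:
            mutex_lock(&dir->d_inode->i_mutex);
            /* ... look up the victim dentry ... */
            error = vfs_unlink(dir->d_inode, dentry, &delegated_inode);
            mutex_unlock(&dir->d_inode->i_mutex);

            if (delegated_inode) {
                /* locks are dropped: wait out the lease break, then retry */
                error = break_deleg_wait(&delegated_inode);
                if (!error)
                    goto retry_deleg;
            }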
      
      Cc: David Howells <dhowells@redhat.com>
      Cc: Tyler Hicks <tyhicks@canonical.com>
      Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
      Acked-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  11. 04 Nov 2013, 1 commit
  12. 17 Oct 2013, 2 commits