Commits · 95603e2293de556de7e82221649bfd7fd98b64a3 · openeuler / raspberrypi-kernel

13 Jun, 2012 3 commits

net-next: add dev_loopback_xmit() to avoid duplicate code · 95603e22

Authored by Michel Machado on Jun 12, 2012

Add dev_loopback_xmit() in order to deduplicate functions
ip_dev_loopback_xmit() (in net/ipv4/ip_output.c) and
ip6_dev_loopback_xmit() (in net/ipv6/ip6_output.c).

I was about to reinvent the wheel when I noticed that
ip_dev_loopback_xmit() and ip6_dev_loopback_xmit() do exactly what I
need and are not IP-only functions, but they were not available to reuse
elsewhere.

ip6_dev_loopback_xmit() does not have line "skb_dst_force(skb);", but I
understand that this is harmless, and should be in dev_loopback_xmit().
Signed-off-by: Michel Machado <michel@digirati.com.br>
CC: "David S. Miller" <davem@davemloft.net>
CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
CC: James Morris <jmorris@namei.org>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Patrick McHardy <kaber@trash.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jpirko@redhat.com>
CC: "Michał Mirosław" <mirq-linux@rere.qmqm.pl>
CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

95603e22

usbnet: remove flag of EVENT_DEV_WAKING · 4a5a14d3

Authored by tom.leiming@gmail.com on Jun 11, 2012

The EVENT_DEV_WAKING flag is no longer used, so just remove it.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4a5a14d3

ipv4: Add interface option to enable routing of 127.0.0.0/8 · d0daebc3

Authored by Thomas Graf on Jun 12, 2012

Routing of 127/8 is traditionally forbidden: we consider
packets from that address block martian when routing and do
not process corresponding ARP requests.

This is a sane default but renders a huge address space
practically unusable.

The RFC states that no address within the 127/8 block should
ever appear on any network anywhere but it does not forbid
the use of such addresses outside of the loopback device in
particular. For example to address a pool of virtual guests
behind a load balancer.

This patch adds a new interface option 'route_localnet'
enabling routing of the 127/8 address block and processing
of ARP requests on a specific interface.

Note that for the feature to work, the default local route
covering 127/8 dev lo needs to be removed.

Example:
  $ sysctl -w net.ipv4.conf.eth0.route_localnet=1
  $ ip route del 127.0.0.0/8 dev lo table local
  $ ip addr add 127.1.0.1/16 dev eth0
  $ ip route flush cache

V2: Fix invalid check to auto flush cache (thanks davem)
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d0daebc3

12 Jun, 2012 1 commit

net: keep name_hlist close to name · 9136461a

Authored by Eric Dumazet on Jun 11, 2012

__dev_get_by_name() is slow because pm_qos_req has been inserted between
name[] and name_hlist, adding cache misses.

pm_qos_req has nothing to do at the beginning of struct net_device
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9136461a

10 Jun, 2012 1 commit

[PATCH] tcp: Cache inetpeer in timewait socket, and only when necessary. · 2397849b

Authored by David S. Miller on Jun 09, 2012

Since it's guaranteed that we will access the inetpeer if we're trying
to do timewait recycling and TCP options were enabled on the
connection, just cache the peer in the timewait socket.

In the future, inetpeer lookups will be context dependent (per routing
realm), and this helps facilitate that as well.
Signed-off-by: David S. Miller <davem@davemloft.net>

2397849b

08 Jun, 2012 1 commit

Added kernel support in EEE Ethtool commands · 80f12ecc

Authored by Yuval Mintz on Jun 06, 2012

This patch extends the kernel's ethtool interface by adding support
for 2 new EEE commands - get_eee and set_eee.

Thanks go to Giuseppe Cavallaro for his original patch adding this support.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

80f12ecc

07 Jun, 2012 6 commits

netfilter: xt_recent: add address masking option · efdedd54

Authored by Denys Fedoryshchenko on May 17, 2012

The mask option lets you put all addresses belonging to that mask into
the same recent slot. This can be useful when recent is used
to detect attacks from the same network segment.

Tested for backward compatibility.
Signed-off-by: Denys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

efdedd54
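
The effect of the mask option described above can be sketched in a few lines. This is a userspace illustration only, not the xt_recent code: the function name `recent_slot_key` and the sample addresses are hypothetical. It shows how ANDing a source address with a mask makes every host in one segment map to the same key.

```python
import ipaddress


def recent_slot_key(addr: str, mask: str) -> str:
    """Return the masked address that would identify the shared slot.

    Hypothetical helper: it only models "address AND mask" for IPv4.
    """
    ip = int(ipaddress.IPv4Address(addr))
    netmask = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(ip & netmask))


# Two hosts in the same /24 end up with one key; a full mask keeps them apart.
same_a = recent_slot_key("192.0.2.17", "255.255.255.0")
same_b = recent_slot_key("192.0.2.200", "255.255.255.0")
distinct = recent_slot_key("192.0.2.17", "255.255.255.255")
```

With a full mask (255.255.255.255) the key is the address itself, which is consistent with the backward-compatibility note in the commit message.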

netfilter: Add fail-open support · fdb694a0

Authored by Krishna Kumar on May 24, 2012

Implement a new "fail-open" mode where packets are not dropped
upon queue-full condition. This mode can be enabled/disabled per
queue using netlink NFQA_CFG_FLAGS & NFQA_CFG_MASK attributes.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Vivek Kashyap <vivk@us.ibm.com>
Signed-off-by: Sridhar Samudrala <samudrala@us.ibm.com>

fdb694a0

netfilter: xt_connlimit: remove revision 0 · 68c07cb6

Authored by Cong Wang on May 19, 2012

It was scheduled to be removed.

Cc: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

68c07cb6

netfilter: remove include/linux/netfilter_ipv4/ipt_addrtype.h · 7a74c1a1

Authored by Cong Wang on May 19, 2012

It was scheduled to be removed.
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

7a74c1a1

cfg80211: validate remain-on-channel time better · ebf348fc

Authored by Johannes Berg on Jun 01, 2012

The remain-on-channel time validation shouldn't
depend on the value of HZ, as it does now with
the check against jiffies, since then you might
use a value that works on one system but not on
another. Fix it by checking against a minimum
that's fixed.

Also add validation of the wait duration for a
management frame TX since this also translates
into remain-on-channel internally.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

ebf348fc
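
To see why a check against jiffies depends on HZ, consider the following sketch. It is not the cfg80211 code: the conversion is a simplified round-up model of milliseconds to ticks, and the threshold `MIN_THRESHOLD = 10` is an assumed value chosen for illustration. The point is that the same duration can pass on one system and fail on another when it is compared after conversion to ticks.

```python
MIN_THRESHOLD = 10  # assumed minimum, used for both checks in this sketch


def msecs_to_jiffies(ms: int, hz: int) -> int:
    """Simplified model: convert milliseconds to timer ticks, rounding up."""
    return (ms * hz + 999) // 1000


def hz_dependent_check(duration_ms: int, hz: int) -> bool:
    """Models a check made after converting the duration to ticks."""
    return msecs_to_jiffies(duration_ms, hz) >= MIN_THRESHOLD


def fixed_check(duration_ms: int) -> bool:
    """Models a check against a fixed minimum, independent of HZ."""
    return duration_ms >= MIN_THRESHOLD


# The same 20 ms request is 20 ticks at HZ=1000 but only 2 ticks at HZ=100.
passes_at_1000 = hz_dependent_check(20, 1000)
passes_at_100 = hz_dependent_check(20, 100)
passes_fixed = fixed_check(20)
```

In this model the tick-based check accepts 20 ms at HZ=1000 and rejects it at HZ=100, while the fixed check gives the same answer everywhere.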

ssb: recognize ARM Cortex M3 · ccaf8c32

Authored by Hauke Mehrtens on May 31, 2012

I found this core on a BCM4322, a PCI card in the Linksys WRT610N V1.
This core is not used by the driver; this patch just makes ssb show the
correct name.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

ccaf8c32

06 Jun, 2012 2 commits

cfg80211: provide channel to start_ap function · aa430da4

Authored by Johannes Berg on May 16, 2012

Instead of setting the channel first and then
starting the AP, let cfg80211 store the channel
and provide it as one of the AP settings.

This means that now you have to set the channel
before you can start an AP interface, but since
hostapd/wpa_supplicant always do that we're OK
with this change.

Alternatively, it's now possible to give the
channel as an attribute to the start-ap nl80211
command, overriding any preset channel.

Cc: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

aa430da4

nl80211: add new rssi event to indicate beacon loss · 5dad021d

Authored by Eliad Peller on May 15, 2012

Tell userspace about beacon loss event.
This event doesn't replace the deauth/disassoc that
might come if the AP is not available.

The driver can send this event in order to hint
userspace what might follow (which in turn can
use it as roaming trigger).
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

5dad021d

05 Jun, 2012 3 commits

NFC: Set the NFC device RF mode appropriately · f212ad5e

Authored by Samuel Ortiz on May 31, 2012

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

f212ad5e

NFC: Add target mode activation netlink event · fc40a8c1

Authored by Samuel Ortiz on Jun 01, 2012

Userspace gets a netlink event upon target mode activation.
The LLCP layer is also signaled when we get an ATR_REQ in order to get
the remote general bytes.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

fc40a8c1

NFC: Add target mode protocols to the polling loop startup routine · fe7c5800

Authored by Samuel Ortiz on May 15, 2012

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

fe7c5800

04 Jun, 2012 2 commits

net/ethernet: ks8851_mll mac address configuration support added · 29a6b6c0

Authored by Raffaele Recalcati on Jun 03, 2012

Signed-off-by: Raffaele Recalcati <raffaele.recalcati@bticino.it>
Signed-off-by: David S. Miller <davem@davemloft.net>

29a6b6c0

sock_diag: add SK_MEMINFO_BACKLOG · d594e987

Authored by Eric Dumazet on Jun 04, 2012

Adding socket backlog len in INET_DIAG_SKMEMINFO is really useful to
diagnose various TCP problems.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d594e987

03 Jun, 2012 1 commit

tty: Revert the tty locking series, it needs more work · f309532b

Authored by Linus Torvalds on Jun 02, 2012

This reverts the tty layer change to use per-tty locking, because it's
not correct yet, and fixing it will require some more deep surgery.

The main revert is d29f3ef3 ("tty_lock: Localise the lock"), but
there are several smaller commits that built upon it, they also get
reverted here. The list of reverted commits is:

  fde86d31 - tty: add lockdep annotations
  8f6576ad - tty: fix ldisc lock inversion trace
  d3ca8b64 - pty: Fix lock inversion
  b1d679af - tty: drop the pty lock during hangup
  abcefe5f - tty/amiserial: Add missing argument for tty_unlock()
  fd11b42e - cris: fix missing tty arg in wait_event_interruptible_tty call
  d29f3ef3 - tty_lock: Localise the lock

The revert had a trivial conflict in the 68360serial.c staging driver
that got removed in the meantime.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f309532b

02 Jun, 2012 9 commits

new helper: signal_delivered() · efee984c

Authored by Al Viro on Apr 28, 2012

Does block_sigmask() + tracehook_signal_handler();  called when
sigframe has been successfully built.  All architectures converted
to it; block_sigmask() itself is gone now (merged into this one).

I'm still not too happy with the signature, but that's a separate
story (IMO we need a structure that would contain signal number +
siginfo + k_sigaction, so that get_signal_to_deliver() would fill one,
signal_delivered(), handle_signal() and probably setup...frame() -
take one).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

efee984c

most of set_current_blocked() callers want SIGKILL/SIGSTOP removed from set · 77097ae5

Authored by Al Viro on Apr 27, 2012

Only 3 out of 63 do not.  Renamed the current variant to __set_current_blocked(),
added set_current_blocked() that will exclude unblockable signals, switched
open-coded instances to it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

77097ae5
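
The split described above, a raw setter plus a wrapper that strips unblockable signals, can be modeled in userspace as follows. This is a sketch, not kernel code: the task is a plain dictionary and the function names `raw_set_blocked` and `set_blocked` are hypothetical stand-ins for `__set_current_blocked()` and `set_current_blocked()`. It assumes a POSIX platform where `signal.SIGKILL` and `signal.SIGSTOP` exist.

```python
import signal

# Signals that can never be blocked.
UNBLOCKABLE = frozenset({signal.SIGKILL, signal.SIGSTOP})


def raw_set_blocked(task: dict, mask) -> None:
    """Models the raw variant: stores the mask exactly as given."""
    task["blocked"] = frozenset(mask)


def set_blocked(task: dict, mask) -> None:
    """Models the new wrapper: drops SIGKILL/SIGSTOP, then calls the raw one."""
    raw_set_blocked(task, frozenset(mask) - UNBLOCKABLE)


task = {}
set_blocked(task, {signal.SIGINT, signal.SIGKILL, signal.SIGSTOP})
filtered = task["blocked"]  # only SIGINT remains in the stored mask
```

Callers that previously removed the two signals by hand before setting the mask would call the wrapper instead; the few that need the unfiltered behavior keep using the raw variant.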

set_restore_sigmask() is never called without SIGPENDING (and never should be) · edd63a27

Authored by Al Viro on Apr 27, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

edd63a27

new helper: sigmask_to_save() · b7f9a11a

Authored by Al Viro on May 02, 2012

replace boilerplate "should we use ->saved_sigmask or ->blocked?"
with calls of obvious inlined helper...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b7f9a11a
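
The boilerplate question quoted above can be pictured with a small model. This is an assumption-laden sketch rather than the kernel helper: the `Task` class and its `restore_sigmask` flag are hypothetical, and the sketch assumes the saved mask is chosen when that flag is set and the currently blocked mask otherwise.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical stand-in for the per-task signal state."""
    blocked: frozenset
    saved_sigmask: frozenset
    restore_sigmask: bool = False


def sigmask_to_save(task: Task) -> frozenset:
    """Pick the mask to record: saved mask if a restore is pending (assumed)."""
    return task.saved_sigmask if task.restore_sigmask else task.blocked


plain = Task(blocked=frozenset({2}), saved_sigmask=frozenset({15}))
pending = Task(blocked=frozenset({2}), saved_sigmask=frozenset({15}),
               restore_sigmask=True)
```

Centralizing the choice in one helper means each call site no longer repeats the conditional.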

new helper: restore_saved_sigmask() · 51a7b448

Authored by Al Viro on May 21, 2012

first fruits of ..._restore_sigmask() helpers: now we can take
boilerplate "signal didn't have a handler, clear RESTORE_SIGMASK
and restore the blocked mask from ->saved_mask" into a common
helper.  Open-coded instances switched...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

51a7b448

new helpers: {clear,test,test_and_clear}_restore_sigmask() · 4ebefe3e

Authored by Al Viro on Apr 26, 2012

helpers parallel to set_restore_sigmask(), used in the next commits
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

4ebefe3e

HAVE_RESTORE_SIGMASK is defined on all architectures now · 754421c8

Authored by Al Viro on Apr 26, 2012

Everyone either defines it in arch thread_info.h or has TIF_RESTORE_SIGMASK
and picks default set_restore_sigmask() in linux/thread_info.h.  Kill the
ifdefs, slap #error in linux/thread_info.h to catch breakage when new ones
get merged.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

754421c8

vfs: retry last component if opening stale dentry · 16b1c1cd

Authored by Miklos Szeredi on May 21, 2012

NFS optimizes away d_revalidates for last component of open.  This means that
open itself can find the dentry stale.

This patch allows the filesystem to return EOPENSTALE and the VFS will retry the
lookup on just the last component if possible.

If the lookup was done using RCU mode, including the last component, then this
is not possible since the parent dentry is lost.  In this case fall back to
non-RCU lookup.  Currently this is not used since NFS will always leave RCU
mode.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

16b1c1cd

fs: introduce inode operation ->update_time · c3b2da31

Authored by Josef Bacik on Mar 26, 2012

Btrfs has to make sure we have space to allocate new blocks in order to modify
the inode, so updating time can fail.  We've gotten around this by having our
own file_update_time but this is kind of a pain, and Christoph has indicated he
would like to make xfs do something different with atime updates.  So introduce
->update_time, where we will deal with i_version and a/m/c time updates and
indicate which changes need to be made.  The normal version just does what it
has always done, updates the time and marks the inode dirty, and then
filesystems can choose to do something different.

I've gone through all of the users of file_update_time and made them check for
errors with the exception of the fault code since it's complicated and I wasn't
quite sure what to do there, also Jan is going to be pushing the file time
updates into page_mkwrite for those who have it so that should satisfy btrfs and
make it not a big deal to check the file_update_time() return code in the
generic fault path. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

c3b2da31

01 Jun, 2012 11 commits

switch aio and shm to do_mmap_pgoff(), make do_mmap() static · e3fc629d

Authored by Al Viro on May 30, 2012

after all, 0 bytes and 0 pages is the same thing...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e3fc629d

take security_mmap_file() outside of ->mmap_sem · 8b3ec681

Authored by Al Viro on May 30, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

8b3ec681

c/r: prctl: add ability to set new mm_struct::exe_file · b32dfe37

Authored by Cyrill Gorcunov on May 31, 2012

When we do restore we would like to have a way to setup a former
mm_struct::exe_file so that /proc/pid/exe would point to the original
executable file a process had at checkpoint time.

For this the PR_SET_MM_EXE_FILE code is introduced.  This option takes a
file descriptor which will be set as a source for new /proc/$pid/exe
symlink.

Note it allows to change /proc/$pid/exe if there are no VM_EXECUTABLE
vmas present for current process, simply because this feature is a special
to C/R and mm::num_exe_file_vmas become meaningless after that.

To minimize the amount of transition the /proc/pid/exe symlink might have,
this feature is implemented in one-shot manner.  Thus once changed the
symlink can't be changed again.  This should help sysadmins to monitor the
symlinks over all process running in a system.

In particular one could make a snapshot of processes and ring alarm if
there unexpected changes of /proc/pid/exe's in a system.

Note -- this feature is available iff CONFIG_CHECKPOINT_RESTORE is set and
the caller must have CAP_SYS_RESOURCE capability granted, otherwise the
request to change symlink will be rejected.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b32dfe37

c/r: prctl: extend PR_SET_MM to set up more mm_struct entries · fe8c7f5c

Authored by Cyrill Gorcunov on May 31, 2012

During checkpoint we dump whole process memory to a file and the dump
includes process stack memory.  But among stack data itself, the stack
carries additional parameters such as command line arguments, environment
data and auxiliary vector.

So when we do restore procedure and once we've restored stack data itself
we need to setup mm_struct::arg_start/end, env_start/end, so restored
process would be able to find command line arguments and environment data
it had at checkpoint time.  The same applies to auxiliary vector.

For this reason additional PR_SET_MM_(ARG_START | ARG_END | ENV_START |
ENV_END | AUXV) codes are introduced.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fe8c7f5c

syscalls, x86: add __NR_kcmp syscall · d97b46a6

Authored by Cyrill Gorcunov on May 31, 2012

While doing the checkpoint-restore in the user space one needs to determine
whether various kernel objects (like mm_struct-s or file_struct-s) are
shared between tasks and restore this state.

The 2nd step can be solved by using appropriate CLONE_ flags and the
unshare syscall, while there's currently no ways for solving the 1st one.

One of the ways for checking whether two tasks share e.g.  mm_struct is to
provide some mm_struct ID of a task to its proc file, but showing such
info considered to be not that good for security reasons.

Thus after some debates we end up in conclusion that using that named
'comparison' syscall might be the best candidate.  So here is it --
__NR_kcmp.

It takes up to 5 arguments - the pids of the two tasks (which
characteristics should be compared), the comparison type and (in case of
comparison of files) two file descriptors.

Lookups for pids are done in the caller's PID namespace only.

At moment only x86 is supported and tested.

[akpm@linux-foundation.org: fix up selftests, warnings]
[akpm@linux-foundation.org: include errno.h]
[akpm@linux-foundation.org: tweak comment text]
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Valdis.Kletnieks@vt.edu
Cc: Michal Marek <mmarek@suse.cz>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d97b46a6

aio/vfs: cleanup of rw_copy_check_uvector() and compat_rw_copy_check_uvector() · ac34ebb3

Authored by Christopher Yeoh on May 31, 2012

A cleanup of rw_copy_check_uvector and compat_rw_copy_check_uvector after
changes made to support CMA in an earlier patch.

Rather than having an additional check_access parameter to these
functions, the first parameter type is overloaded to allow the caller to
specify CHECK_IOVEC_ONLY which means check that the contents of the iovec
are valid, but do not check the memory that they point to. This is used
by process_vm_readv/writev where we need to validate that a iovec passed
to the syscall is valid but do not want to check the memory that it points
to at this point because it refers to an address space in another process.
Signed-off-by: Chris Yeoh <yeohc@au1.ibm.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ac34ebb3

eventfd: change int to __u64 in eventfd_signal() · ee62c6b2

Authored by Sha Zhengju on May 31, 2012

eventfd_ctx->count is an __u64 counter which is allowed to reach
ULLONG_MAX.  eventfd_write() adds a __u64 value to "count", but the kernel
side eventfd_signal() only adds an int value to it.  Make them consistent.

[akpm@linux-foundation.org: update interface documentation]
Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ee62c6b2
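
The mismatch between an int argument and a __u64 counter can be illustrated numerically. The sketch below is not kernel code: it models the value conversions with plain integer masking, assuming a 32-bit int that wraps in the usual two's-complement way, and the helper names `as_c_int` and `as_u64` are hypothetical.

```python
U64_MASK = (1 << 64) - 1


def as_c_int(n: int) -> int:
    """Value n would take if squeezed into a 32-bit signed int (assumed wrap)."""
    n &= 0xFFFFFFFF
    return n - (1 << 32) if n & 0x80000000 else n


def as_u64(n: int) -> int:
    """Value n would take as an unsigned 64-bit quantity."""
    return n & U64_MASK


big = 1 << 31                # one past the largest positive 32-bit int
via_int = as_c_int(big)      # no longer the value the caller meant
via_u64 = as_u64(big)        # preserved unchanged
```

Any increment above 2^31 - 1 cannot be expressed through the narrower parameter in this model, whereas the 64-bit type carries it intact, matching what the userspace write path already accepts.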

rapidio: add DMA engine support for RIO data transfers · e42d98eb

Authored by Alexandre Bounine on May 31, 2012

Adds DMA Engine framework support into RapidIO subsystem.

Uses DMA Engine DMA_SLAVE interface to generate data transfers to/from
remote RapidIO target devices.

Introduces RapidIO-specific wrapper for prep_slave_sg() interface with an
extra parameter to pass target specific information.

Uses scatterlist to describe local data buffer.  Address flat data buffer
on a remote side.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Acked-by: Vinod Koul <vinod.koul@linux.intel.com>
Cc: Li Yang <leoli@freescale.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e42d98eb

mqueue: separate mqueue default value from maximum value · cef0184c

Authored by KOSAKI Motohiro on May 31, 2012

Commit b231cca4 ("message queues: increase range limits") changed
mqueue default value when attr parameter is specified NULL from hard
coded value to fs.mqueue.{msg,msgsize}_max sysctl value.

This had a large side effect.  When a user needs to run two mqueue
applications, 1) one using a !NULL attr parameter that requires a big
message size and 2) one using a NULL attr parameter that only needs small
messages, app (1) requires raising fs.mqueue.msgsize_max and app (2) then
consumes a large amount of memory even though it doesn't need it.

Doug Ledford proposed switching it back to a static hard-coded value.
However, that also has a compatibility problem.  Some applications might
have started to depend on the default value being tunable.

The solution is to separate default value from maximum value.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Acked-by: Joe Korty <joe.korty@ccur.com>
Cc: Amerigo Wang <amwang@redhat.com>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cef0184c
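
The idea of separating the default from the maximum can be sketched as a small decision function. This is a model under stated assumptions, not the ipc/mqueue code: the `MqLimits` class, its field names, the `resolve_attr` function, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MqLimits:
    """Hypothetical tunables: defaults and maximums kept as separate knobs."""
    msg_default: int
    msgsize_default: int
    msg_max: int
    msgsize_max: int


def resolve_attr(attr: Optional[Tuple[int, int]],
                 limits: MqLimits) -> Tuple[int, int]:
    """Return (maxmsg, msgsize) for a new queue.

    A NULL attr (None here) takes the defaults; an explicit attr is only
    validated against the maximums.
    """
    if attr is None:
        return (limits.msg_default, limits.msgsize_default)
    maxmsg, msgsize = attr
    if maxmsg <= 0 or msgsize <= 0:
        raise ValueError("attributes must be positive")
    if maxmsg > limits.msg_max or msgsize > limits.msgsize_max:
        raise ValueError("attributes exceed the configured maximums")
    return (maxmsg, msgsize)


limits = MqLimits(msg_default=10, msgsize_default=8192,
                  msg_max=10, msgsize_max=8192)
# Raising the maximum for a demanding app leaves NULL-attr users untouched.
limits.msgsize_max = 4 * 1024 * 1024
small_app = resolve_attr(None, limits)
big_app = resolve_attr((10, 4 * 1024 * 1024), limits)
```

In this model, raising the maximum for application (1) no longer inflates the memory used by application (2), while the default remains an independently tunable value.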

mqueue: revert bump up DFLT_*MAX · e6315bb1

Authored by KOSAKI Motohiro on May 31, 2012

The mqueue limit is a slightly naive parameter, like those of other ipcs,
because an unprivileged user can consume kernel memory by using ipcs.

Thus, raising it too aggressively brings a security issue.  For example, the
current setting allows an evil unprivileged user to use 256GB (= 256 * 1024 *
1024 * 1024), which is large enough to make the system become unresponsive.
Don't do that.

Instead, every admin should adjust the knobs for their own systems.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Acked-by: Joe Korty <joe.korty@ccur.com>
Cc: Amerigo Wang <amwang@redhat.com>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e6315bb1

ipc/mqueue: update maximums for the mqueue subsystem · 5b5c4d1a

Authored by Doug Ledford on May 31, 2012

Commit b231cca4 ("message queues: increase range limits") changed the
maximum size of a message in a message queue from INT_MAX to 8192*128.
Unfortunately, we had customers that relied on a size much larger than
8192*128 on their production systems.  After reviewing POSIX, we found
that it is silent on the maximum message size.  We did find a couple other
areas in which it was not silent.  Fix up the mqueue maximums so that the
customer's system can continue to work, and document both the POSIX and
real world requirements in ipc_namespace.h so that we don't have this
issue crop back up.

Also, commit 9cf18e1d ("ipc: HARD_MSGMAX should be higher not lower
on 64bit") fiddled with HARD_MSGMAX without realizing that the number was
intentionally in place to limit the msg queue depth to one that was small
enough to kmalloc an array of pointers (hence why we divided 128k by
sizeof(long)).  If we wish to meet POSIX requirements, we have no choice
but to change our allocation to a vmalloc instead (at least for the large
queue size case).  With that, it's possible to increase our allowed
maximum to the POSIX requirements (or more if we choose).

[sfr@canb.auug.org.au: using vmalloc requires including vmalloc.h]
Signed-off-by: Doug Ledford <dledford@redhat.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: Amerigo Wang <amwang@redhat.com>
Cc: Joe Korty <joe.korty@ccur.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

5b5c4d1a
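
The arithmetic behind "divided 128k by sizeof(long)" in the commit above can be worked through directly. This is a back-of-the-envelope sketch: it assumes "128k" means 131072 bytes and that sizeof(long) is 4 or 8 depending on the platform; the names `ARRAY_BUDGET` and `max_queue_depth` are hypothetical.

```python
ARRAY_BUDGET = 128 * 1024  # bytes available for the array of pointers (assumed)


def max_queue_depth(word_size: int) -> int:
    """Number of pointer-sized slots that fit in the array budget."""
    return ARRAY_BUDGET // word_size


depth_32bit = max_queue_depth(4)  # 4-byte long
depth_64bit = max_queue_depth(8)  # 8-byte long
```

Under these assumptions the cap works out to 32768 entries with a 4-byte long and 16384 with an 8-byte long, which shows why the limit was lower rather than higher on 64-bit systems as long as the array had to fit in a single kmalloc of that size.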