Commits · 7b4ce2353467fdab6e003be7a3129fb09b09deac · openanolis / cloud-kernel

14 Nov, 2014 2 commits

rhashtable: Add parent argument to mutex_is_held · 7b4ce235

Committed by Herbert Xu on Nov 13, 2014

Currently mutex_is_held can only test locks that are global
since it takes no arguments.  This prevents rhashtable from being
used in places where locks are local, e.g., per-namespace locks.

This patch adds a parent field to mutex_is_held and rhashtable_params
so that local locks can be used (and tested).
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

7b4ce235
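A minimal userspace sketch of the idea in 7b4ce235: a lock-check callback that receives a parent pointer can test a per-namespace lock instead of a single global one. All names here (`fake_lock`, `fake_namespace`, `fake_table_params`, `ns_lock_is_held`, `table_lock_is_held`) are hypothetical and only illustrate the shape of the change; this is not the rhashtable API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a lock and an object that owns one. */
struct fake_lock { bool held; };
struct fake_namespace { struct fake_lock lock; };

/* Hypothetical parameter block: the callback plus the parent it is called with. */
struct fake_table_params {
	bool (*mutex_is_held)(void *parent);
	void *parent;
};

/* Callback that inspects the lock belonging to the given parent object. */
static bool ns_lock_is_held(void *parent)
{
	struct fake_namespace *ns = parent;

	return ns->lock.held;
}

/* The table code forwards its stored parent to the callback. */
static bool table_lock_is_held(const struct fake_table_params *p)
{
	return p->mutex_is_held(p->parent);
}
```

With a callback that takes no arguments, both tables in the usage below would have to share one global lock.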

netlink: Move mutex_is_held under PROVE_LOCKING · 97127566

Committed by Herbert Xu on Nov 13, 2014

The rhashtable function mutex_is_held is only used when PROVE_LOCKING
is enabled. This patch modifies netlink so that rhashtable.h
itself can later make mutex_is_held optional depending on PROVE_LOCKING.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

97127566

06 Nov, 2014 1 commit

net: Add and use skb_copy_datagram_msg() helper. · 51f3d02b

Committed by David S. Miller on Nov 05, 2014

This encapsulates all of the skb_copy_datagram_iovec() callers
with call argument signature "skb, offset, msghdr->msg_iov, length".

When we move to iov_iters in networking, the iov_iter object will
sit in the msghdr.

Having a helper like this means there will be fewer places to touch
during that transformation.

Based upon descriptions and patch from Al Viro.
Signed-off-by: David S. Miller <davem@davemloft.net>

51f3d02b

22 Oct, 2014 1 commit

netlink: Re-add locking to netlink_lookup() and seq walker · 78fd1d0a

Committed by Thomas Graf on Oct 21, 2014

The synchronize_rcu() in netlink_release() introduces unacceptable
latency. Reintroduce minimal lookup locking so we can drop the
synchronize_rcu() until socket destruction has been RCUfied.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com>
Reported-and-tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

78fd1d0a

09 Oct, 2014 1 commit

fix misuses of f_count() in ppp and netlink · 24dff96a

Committed by Al Viro on Oct 08, 2014

we used to check for "nobody else could start doing anything with
that opened file" by checking that refcount was 2 or less - one
for descriptor table and one we'd acquired in fget() on the way to
wherever we are.  That was race-prone (somebody else might have
had a reference to descriptor table and do fget() just as we'd
been checking) and it had become flat-out incorrect back when
we switched to fget_light() on those codepaths - unlike fget(),
it doesn't grab an extra reference unless the descriptor table
is shared.  The same change allowed a race-free check, though -
we are safe exactly when refcount is less than 2.

It was a long time ago; pre-2.6.12 for ioctl() (the codepath leading
to ppp one) and 2.6.17 for sendmsg() (netlink one).  OTOH,
netlink hadn't grown that check until 3.9 and ppp used to live
in drivers/net, not drivers/net/ppp until 3.1.  The bug existed
well before that, though, and the same fix used to apply in old
location of file.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

24dff96a

15 Aug, 2014 1 commit

netlink: Annotate RCU locking for seq_file walker · 9ce12eb1

Committed by Thomas Graf on Aug 13, 2014

Silences the following sparse warnings:
net/netlink/af_netlink.c:2926:21: warning: context imbalance in 'netlink_seq_start' - wrong count at exit
net/netlink/af_netlink.c:2972:13: warning: context imbalance in 'netlink_seq_stop' - unexpected unlock
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

9ce12eb1

08 Aug, 2014 1 commit

netlink: reset network header before passing to taps · 4e48ed88

Committed by Daniel Borkmann on Aug 07, 2014

netlink doesn't set any network header offset thus when the skb is
being passed to tap devices via dev_queue_xmit_nit(), it emits klog
false positives due to it being unset like:

  ...
  [  124.990397] protocol 0000 is buggy, dev nlmon0
  [  124.990411] protocol 0000 is buggy, dev nlmon0
  ...

So just reset the network header before passing to the device; for
packet sockets that just means nothing will change - mac and net
offset hold the same value just as before.
Reported-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4e48ed88

07 Aug, 2014 1 commit

netlink: hold nl_sock_hash_lock during diag dump · 6c8f7e70

Committed by Thomas Graf on Aug 07, 2014

Although RCU protection would be possible during diag dump, doing
so allows for concurrent table mutations which can render the
in-table offset between individual Netlink messages invalid and
thus cause legitimate sockets to be skipped in the dump.

Since the diag dump is relatively low volume and consistency is
more important than performance, the table mutex is held during
dump.
Reported-by: Andrey Wagin <avagin@gmail.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Fixes: e341694e ("netlink: Convert netlink_lookup() to use RCU protected hash table")
Signed-off-by: David S. Miller <davem@davemloft.net>

6c8f7e70

05 Aug, 2014 1 commit

netlink: fix lockdep splats · 67a24ac1

Committed by Eric Dumazet on Aug 05, 2014

With netlink_lookup() conversion to RCU, we need to use appropriate
rcu dereference in netlink_seq_socket_idx() & netlink_seq_next().
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: e341694e ("netlink: Convert netlink_lookup() to use RCU protected hash table")
Signed-off-by: David S. Miller <davem@davemloft.net>

67a24ac1

03 Aug, 2014 1 commit

netlink: Convert netlink_lookup() to use RCU protected hash table · e341694e

Committed by Thomas Graf on Aug 02, 2014

Heavy Netlink users such as Open vSwitch spend a considerable amount of
time in netlink_lookup() due to the read-lock on nl_table_lock. Use of
RCU relieves the lock contention.

Makes use of the new resizable hash table to avoid locking on the
lookup.

The hash table will grow if the number of entries exceeds 75% of the
table size, up to a total table size of 64K. It will automatically
shrink if usage falls below 30%.

Also splits nl_table_lock into a separate mutex to protect hash table
mutations and allow synchronize_rcu() to sleep while waiting for readers
during expansion and shrinking.

Before:
   9.16%  kpktgend_0  [openvswitch]      [k] masked_flow_lookup
   6.42%  kpktgend_0  [pktgen]           [k] mod_cur_headers
   6.26%  kpktgend_0  [pktgen]           [k] pktgen_thread_worker
   6.23%  kpktgend_0  [kernel.kallsyms]  [k] memset
   4.79%  kpktgend_0  [kernel.kallsyms]  [k] netlink_lookup
   4.37%  kpktgend_0  [kernel.kallsyms]  [k] memcpy
   3.60%  kpktgend_0  [openvswitch]      [k] ovs_flow_extract
   2.69%  kpktgend_0  [kernel.kallsyms]  [k] jhash2

After:
  15.26%  kpktgend_0  [openvswitch]      [k] masked_flow_lookup
   8.12%  kpktgend_0  [pktgen]           [k] pktgen_thread_worker
   7.92%  kpktgend_0  [pktgen]           [k] mod_cur_headers
   5.11%  kpktgend_0  [kernel.kallsyms]  [k] memset
   4.11%  kpktgend_0  [openvswitch]      [k] ovs_flow_extract
   4.06%  kpktgend_0  [kernel.kallsyms]  [k] _raw_spin_lock
   3.90%  kpktgend_0  [kernel.kallsyms]  [k] jhash2
   [...]
   0.67%  kpktgend_0  [kernel.kallsyms]  [k] netlink_lookup
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e341694e
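The resize thresholds quoted in e341694e can be sketched as two predicates. This is an illustration under assumptions, not the kernel's resizable hash table code: `should_grow()`, `should_shrink()` and `SKETCH_MAX_TABLE_SIZE` are hypothetical names, and "64K" is read here as a cap on the table size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed reading of the "64K" upper bound from the commit message. */
#define SKETCH_MAX_TABLE_SIZE ((size_t)64 * 1024)

/* Grow when entries exceed 75% of the table size and the cap is not yet reached. */
static bool should_grow(size_t entries, size_t table_size)
{
	return entries * 4 > table_size * 3 &&
	       table_size < SKETCH_MAX_TABLE_SIZE;
}

/* Shrink when usage falls below 30% of the table size. */
static bool should_shrink(size_t entries, size_t table_size)
{
	return entries * 10 < table_size * 3;
}
```

Integer cross-multiplication avoids floating point, which is the usual choice in kernel-style code.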

01 Aug, 2014 1 commit

netlink: Use PAGE_ALIGNED macro · 74e83b23

Committed by Tobias Klauser on Jul 31, 2014

Use PAGE_ALIGNED(...) instead of IS_ALIGNED(..., PAGE_SIZE).
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

74e83b23
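The equivalence behind 74e83b23 can be shown with a small userspace sketch. The helper names `is_aligned()` and `page_aligned()` and the constant `SKETCH_PAGE_SIZE` are hypothetical stand-ins; the page size of 4096 is an assumption, since the real value depends on the architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for the sketch; the real value is per-architecture. */
#define SKETCH_PAGE_SIZE ((uintptr_t)4096)

/* True when x is a multiple of a; a must be a power of two. */
static bool is_aligned(uintptr_t x, uintptr_t a)
{
	return (x & (a - 1)) == 0;
}

/* Shorthand for the page-size case, mirroring the idea of PAGE_ALIGNED(). */
static bool page_aligned(uintptr_t x)
{
	return is_aligned(x, SKETCH_PAGE_SIZE);
}
```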

17 Jul, 2014 1 commit

netlink: remove bool varible · 498044bb

Committed by Varka Bhadram on Jul 16, 2014

This patch removes the bool variable 'pass'.
Return true directly from the matching switch cases and false otherwise.
Signed-off-by: Varka Bhadram <varkab@cdac.in>
Signed-off-by: David S. Miller <davem@davemloft.net>

498044bb

10 Jul, 2014 1 commit

netlink: Fix handling of error from netlink_dump(). · ac30ef83

Committed by Ben Pfaff on Jul 09, 2014

netlink_dump() returns a negative errno value on error.  Until now,
netlink_recvmsg() directly recorded that negative value in sk->sk_err, but
that's wrong since sk_err takes positive errno values.  (This manifests as
userspace receiving a positive return value from the recv() system call,
falsely indicating success.) This bug was introduced in the commit that
started checking the netlink_dump() return value, commit b44d211e (netlink:
handle errors from netlink_dump()).

Multithreaded Netlink dumps are one way to trigger this behavior in
practice, as described in the commit message for the userspace workaround
posted here:
    http://openvswitch.org/pipermail/dev/2014-June/042339.html

This commit also fixes the same bug in netlink_poll(), introduced in commit
cd1df525 (netlink: add flow control for memory mapped I/O).
Signed-off-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ac30ef83
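The sign convention at the heart of ac30ef83 can be illustrated in userspace. Nothing below is kernel code: `struct fake_sock`, `fake_dump()` and `record_dump_error()` are hypothetical names, with `sk_err` standing in for the field that, per the message, takes positive errno values.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a socket: sk_err holds a POSITIVE errno, 0 if none. */
struct fake_sock {
	int sk_err;
};

/* Hypothetical dump routine: 0 on success, negative errno on failure. */
static int fake_dump(int should_fail)
{
	return should_fail ? -ENOBUFS : 0;
}

/* Record a dump failure, negating the negative return value before storing it. */
static void record_dump_error(struct fake_sock *sk, int ret)
{
	if (ret < 0)
		sk->sk_err = -ret;	/* storing ret unchanged would be the bug */
}
```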

08 Jul, 2014 1 commit

netlink: Fix do_one_broadcast() prototype. · 46c9521f

Committed by Rami Rosen on Jul 01, 2014

This patch changes the prototype of the do_one_broadcast() function so that it returns void.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

46c9521f

03 Jun, 2014 1 commit

netlink: Only check file credentials for implicit destinations · 2d7a85f4

Committed by Eric W. Biederman on May 30, 2014

It was possible to get a setuid root or setcap executable to write to
its stdout or stderr (which had been made a netlink socket) and
inadvertently reconfigure the networking stack.

To prevent this we check that both the creator of the socket and
the current application have permission to reconfigure the network
stack.

Unfortunately this breaks Zebra which always uses sendto/sendmsg
and creates its socket without any privileges.

To keep Zebra working don't bother checking if the creator of the
socket has privilege when a destination address is specified.  Instead
rely exclusively on the privileges of the sender of the socket.

Note from Andy: This is exactly Eric's code except for some comment
clarifications and formatting fixes.  Neither I nor, I think, anyone
else is thrilled with this approach, but I'm hesitant to wait on a
better fix since 3.15 is almost here.

Note to stable maintainers: This is a mess.  An earlier series of
patches in 3.15 fix a rather serious security issue (CVE-2014-0181),
but they did so in a way that breaks Zebra.  The offending series
includes:

    commit aa4cf945
    Author: Eric W. Biederman <ebiederm@xmission.com>
    Date:   Wed Apr 23 14:28:03 2014 -0700

        net: Add variants of capable for use on netlink messages

If a given kernel version is missing that series of fixes, it's
probably worth backporting it and this patch.  If that series is
present, then this fix is critical if you care about Zebra.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

2d7a85f4
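The policy described in 2d7a85f4 reduces to a small decision function. The sketch below is an interpretation of the commit message only; `send_allowed()` and its three boolean inputs are hypothetical and do not correspond to any kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical policy check:
 * - the sender must always be privileged;
 * - the socket creator is checked only when no destination address is
 *   given, i.e. for an implicit destination.
 */
static bool send_allowed(bool sender_privileged, bool creator_privileged,
			 bool has_explicit_dest)
{
	if (!sender_privileged)
		return false;
	if (has_explicit_dest)
		return true;
	return creator_privileged;
}
```

The first assertion in a test of this function would be the Zebra case (unprivileged creator, privileged sender, explicit destination), which is allowed; the setuid-program case (unprivileged creator, implicit destination) stays blocked.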

25 Apr, 2014 2 commits

net: Add variants of capable for use on netlink messages · aa4cf945

Committed by Eric W. Biederman on Apr 23, 2014

netlink_net_capable - The common case use, for operations that are safe on a network namespace
netlink_capable - For operations that are only known to be safe for the global root
netlink_ns_capable - The general case of capable used to handle special cases

__netlink_ns_capable - Same as netlink_ns_capable except taking a netlink_skb_parms instead of
		       the skbuff of a netlink message.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aa4cf945

netlink: Rename netlink_capable netlink_allowed · 5187cd05

Committed by Eric W. Biederman on Apr 23, 2014

netlink_capable is a static internal function in af_netlink.c and we
have better uses for the name netlink_capable.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5187cd05

23 Apr, 2014 2 commits

netlink: implement unbind to netlink_setsockopt NETLINK_DROP_MEMBERSHIP · 7774d5e0

Committed by Richard Guy Briggs on Apr 22, 2014

Call the per-protocol unbind function rather than bind function on
NETLINK_DROP_MEMBERSHIP in netlink_setsockopt().
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7774d5e0

netlink: have netlink per-protocol bind function return an error code. · 4f520900

Committed by Richard Guy Briggs on Apr 22, 2014

Have the netlink per-protocol optional bind function return an int error code
rather than void to signal a failure.

This will enable netlink protocols to perform extra checks including
capabilities and permissions verifications when updating memberships in
multicast groups.

In netlink_bind() and netlink_setsockopt() the call to the per-protocol bind
function was moved above the multicast group update to prevent any access to
the multicast socket groups before checking with the per-protocol bind
function.  This will enable the per-protocol bind function to be used to check
permissions which could be denied before making them available, and to avoid
the messy job of undoing the addition should the per-protocol bind function
fail.

The netfilter subsystem seems to be the only one currently using the
per-protocol bind function.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4f520900

12 Apr, 2014 1 commit

net: Fix use after free by removing length arg from sk_data_ready callbacks. · 676d2369

Committed by David S. Miller on Apr 11, 2014

Several spots in the kernel perform a sequence like:

	skb_queue_tail(&sk->sk_receive_queue, skb);
	sk->sk_data_ready(sk, skb->len);

But at the moment we place the SKB onto the socket receive queue it
can be consumed and freed up.  So this skb->len access potentially
touches freed memory.

Furthermore, the skb->len can be modified by the consumer so it is
possible that the value isn't accurate.

And finally, no actual implementation of this callback actually uses
the length argument.  And since nobody actually cared about its
value, lots of call sites pass arbitrary values in such as '0' and
even '1'.

So just remove the length argument from the callback, that way there
is no confusion whatsoever and all of these use-after-free cases get
fixed as a side effect.

Based upon a patch by Eric Dumazet and his suggestion to audit this
issue tree-wide.
Signed-off-by: David S. Miller <davem@davemloft.net>

676d2369
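The hazard described in 676d2369 can be modelled in userspace. In this sketch, queueing a buffer hands it to a consumer that may free it at once, so the notification callback takes no length argument and never touches the buffer. All names (`fake_buf`, `fake_sock`, `queue_tail`, `count_ready`, `deliver`) are hypothetical; the immediate `free()` simulates the worst-case consumer.

```c
#include <assert.h>
#include <stdlib.h>

struct fake_buf { size_t len; };

struct fake_sock {
	int ready_calls;
	void (*data_ready)(struct fake_sock *sk);	/* no length argument */
};

/* Example callback: only counts notifications. */
static void count_ready(struct fake_sock *sk)
{
	sk->ready_calls++;
}

/* Worst case: the consumer drains and frees the buffer as soon as it is queued. */
static void queue_tail(struct fake_sock *sk, struct fake_buf *buf)
{
	(void)sk;
	free(buf);
}

/* Queue a buffer, then notify without reading anything from it. */
static int deliver(struct fake_sock *sk, size_t len)
{
	struct fake_buf *buf = malloc(sizeof(*buf));

	if (!buf)
		return -1;
	buf->len = len;
	queue_tail(sk, buf);	/* buf must not be touched after this point */
	sk->data_ready(sk);	/* safe: no access to buf->len */
	return 0;
}
```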

11 Mar, 2014 1 commit

netlink: autosize skb lengthes · 9063e21f

Committed by Eric Dumazet on Mar 07, 2014

One known problem with netlink is the fact that NLMSG_GOODSIZE is
really small on PAGE_SIZE==4096 architectures, and it is difficult
to know in advance what buffer size is used by the application.

This patch adds automatic learning of the size.

The first netlink message will still be limited to ~4K, but if the user
used bigger buffers, then the following messages will be able to use up
to 16KB.

This speeds up dump() operations by a large factor and should be safe
for legacy applications.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

9063e21f
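The size learning described in 9063e21f can be sketched as follows. This is a simplified model, not the kernel implementation: `fake_nlsock`, `note_recv_buffer()` and `next_dump_size()` are hypothetical names, and 4096 is used as a round approximation of the "~4K" default.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_DEFAULT_SIZE ((size_t)4096)	/* approximation of the ~4K default */
#define SKETCH_MAX_SIZE     ((size_t)16384)	/* 16KB ceiling from the message */

/* Hypothetical per-socket state: the largest receive buffer seen so far. */
struct fake_nlsock {
	size_t max_recv_len;
};

/* Remember the largest buffer the application has passed to a receive call. */
static void note_recv_buffer(struct fake_nlsock *s, size_t user_len)
{
	if (user_len > s->max_recv_len)
		s->max_recv_len = user_len;
}

/* Pick the size for the next dump message: the default, or the learned size capped at 16KB. */
static size_t next_dump_size(const struct fake_nlsock *s)
{
	size_t size = SKETCH_DEFAULT_SIZE;

	if (s->max_recv_len > size)
		size = s->max_recv_len < SKETCH_MAX_SIZE ?
		       s->max_recv_len : SKETCH_MAX_SIZE;
	return size;
}
```

Because the learned size only ever increases and never drops below the default, an application that keeps using small buffers sees no change in behavior.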

26 Feb, 2014 1 commit

net: Fix permission check in netlink_connect() · 46833a86

Committed by Mike Pecovnik on Feb 24, 2014

netlink_sendmsg() was changed to prevent non-root processes from sending
messages with dst_pid != 0.
netlink_connect() however still only checks if nladdr->nl_groups is set.
This patch modifies netlink_connect() to check for the same condition.
Signed-off-by: Mike Pecovnik <mike.pecovnik@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

46833a86

18 Feb, 2014 1 commit

netlink: fix checkpatch errors space and "foo *bar" · 23b45672

Committed by Wang Yufen on Feb 17, 2014

ERROR: spaces required and "(foo*)" should be "(foo *)"
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

23b45672

19 Jan, 2014 1 commit

net: add build-time checks for msg->msg_name size · 342dfc30

Committed by Steffen Hurrle on Jan 17, 2014

This is a follow-up patch to f3d33426 ("net: rework recvmsg
handler msg_name and msg_namelen logic").

DECLARE_SOCKADDR validates that the structure we use for writing the
name information to is not larger than the buffer which is reserved
for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR
consistently in sendmsg code paths.
Signed-off-by: Steffen Hurrle <steffen@hurrle.net>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

342dfc30

07 Jan, 2014 1 commit

netlink: Avoid netlink mmap alloc if msg size exceeds frame size · aae9f0e2

Committed by Thomas Graf on Nov 30, 2013

An insufficient ring frame size configuration can lead to an
unnecessary skb allocation for every Netlink message. Check frame
size before taking the queue lock and allocating the skb and
re-check with the lock held to be safe.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

aae9f0e2

02 Jan, 2014 1 commit

netlink: cleanup tap related functions · 2173f8d9

Committed by stephen hemminger on Dec 30, 2013

Cleanups in netlink_tap code
 * remove unused function netlink_clear_multicast_users
 * make local function static
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

2173f8d9

01 Jan, 2014 2 commits

netlink: specify netlink packet direction for nlmon · 604d13c9

Committed by Daniel Borkmann on Dec 23, 2013

In order to facilitate development for netlink protocol dissector,
fill the unused field skb->pkt_type of the cloned skb with a hint
of the address space of the new owner (receiver) socket in the
notion of "to kernel" resp. "to user".

At the time we invoke __netlink_deliver_tap_skb(), we already have
set the new skb owner via netlink_skb_set_owner_r(), so we can use
that for netlink_is_kernel() probing.

In normal PF_PACKET network traffic, this field denotes if the
packet is destined for us (PACKET_HOST), if it's broadcast
(PACKET_BROADCAST), etc.

As we only have 3 bits reserved, we can use the value (= 6) of
PACKET_FASTROUTE as it's _not used_ anywhere in the whole kernel
and not supported anywhere, and packets of such type were never
exposed to user space, so there are no overlapping users of such
kind. Thus, as wished, that seems the only way to make both
PACKET_* values non-overlapping and therefore device agnostic.

By using those two flags for netlink skbs on nlmon devices, they
can be made available and picked up via sll_pkttype (previously
unused in netlink context) in struct sockaddr_ll. We now have
these two directions:

 - PACKET_USER (= 6)    ->  to user space
 - PACKET_KERNEL (= 7)  ->  to kernel space

Partial `ip a` example strace for sa_family=AF_NETLINK with
detected nl msg direction:

syscall:                     direction:
sendto(3,  ...) = 40         /* to kernel */
recvmsg(3, ...) = 3404       /* to user */
recvmsg(3, ...) = 1120       /* to user */
recvmsg(3, ...) = 20         /* to user */
sendto(3,  ...) = 40         /* to kernel */
recvmsg(3, ...) = 168        /* to user */
recvmsg(3, ...) = 144        /* to user */
recvmsg(3, ...) = 20         /* to user */
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>

604d13c9
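The direction tagging described in 604d13c9 can be sketched as a single mapping. The constants below copy the values quoted in the commit message (6 and 7) under `SKETCH_` names to keep them apart from the real kernel headers; `tap_pkt_type()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies of the values quoted in the commit message. */
#define SKETCH_PACKET_USER   6u	/* to user space */
#define SKETCH_PACKET_KERNEL 7u	/* to kernel space */
#define SKETCH_PKT_TYPE_MASK 0x7u	/* pkt_type is described as a 3-bit field */

/* Choose the direction hint from the kind of socket that receives the skb. */
static unsigned int tap_pkt_type(bool receiver_is_kernel)
{
	return receiver_is_kernel ? SKETCH_PACKET_KERNEL : SKETCH_PACKET_USER;
}
```

Both values fit the 3-bit field, which is why 6 and 7 were available for this purpose.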

netlink: only do not deliver to tap when both sides are kernel sks · 73bfd370

Committed by Daniel Borkmann on Dec 23, 2013

We should also deliver packets to nlmon devices when we are in
netlink_unicast_kernel(), and only one of the {src,dst} sockets
is user sk and the other one kernel sk. That's e.g. the case in
netlink diag, netlink route, etc. Still, do not deliver messages
from kernel to kernel sks.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>

73bfd370

21 Nov, 2013 1 commit

net: rework recvmsg handler msg_name and msg_namelen logic · f3d33426

Committed by Hannes Frederic Sowa on Nov 21, 2013

This patch now always passes msg->msg_namelen as 0. recvmsg handlers must
set msg_namelen to the proper size <= sizeof(struct sockaddr_storage)
to return msg_name to the user.

This prevents numerous uninitialized memory leaks we had in the
recvmsg handlers and makes it harder for new code to accidentally leak
uninitialized memory.

Optimize for the case recvfrom is called with NULL as address. We don't
need to copy the address at all, so set it to NULL before invoking the
recvmsg handler. We can do so, because all the recvmsg handlers must
cope with the case a plain read() is called on them. read() also sets
msg_name to NULL.

Also document these changes in include/linux/net.h as suggested by David
Miller.

Changes since RFC:

Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
affect sendto as it would bail out earlier while trying to copy-in the
address. It also more naturally reflects the logic by the callers of
verify_iovec.

With this change in place I could remove "
if (!uaddr || msg_sys->msg_namelen == 0)
	msg->msg_name = NULL
".

This change does not alter the user visible error logic as we ignore
msg_namelen as long as msg_name is NULL.

Also remove two unnecessary curly brackets in ___sys_recvmsg and change
comments to netdev style.

Cc: David Miller <davem@davemloft.net>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

f3d33426

20 Nov, 2013 1 commit

netlink: fix documentation typo in netlink_set_err() · 840e93f2

Committed by Johannes Berg on Nov 19, 2013

The parameter is just 'group', not 'groups', fix the documentation typo.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

840e93f2

07 Sep, 2013 1 commit

net: netlink: filter particular protocols from analyzers · 5ffd5cdd

Committed by Daniel Borkmann on Sep 05, 2013

Fix finer-grained control and let only a whitelist of allowed netlink
protocols pass, in our case related to networking. If later on, other
subsystems decide they want to add their protocol as well to the list
of allowed protocols they shall simply add it. While at it, we also
need to tell what protocol is in use, otherwise BPF_S_ANC_PROTOCOL
cannot pick it up (as it's not filled out).
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5ffd5cdd

16 Aug, 2013 1 commit

netlink: Eliminate kmalloc in netlink dump operation. · 16b304f3

Committed by Pravin B Shelar on Aug 15, 2013

This patch stores struct netlink_callback in netlink_sock
to avoid allocating and freeing it on every netlink dump msg.
Only one dump operation is allowed for a given socket at a time,
therefore we can safely convert cb pointer to cb struct inside
netlink_sock.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

16b304f3

03 Aug, 2013 1 commit

net: netlink: minor: remove unused pointer in alloc_pg_vec · 8a849bb7

Committed by Daniel Borkmann on Aug 02, 2013

Variable ptr is being assigned, but never used, so just remove it.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8a849bb7

28 Jun, 2013 1 commit

netlink: fix splat in skb_clone with large messages · 3a36515f

Committed by Pablo Neira on Jun 28, 2013

Since (c05cdb1b netlink: allow large data transfers from user-space),
netlink splats if it invokes skb_clone on large netlink skbs since:

* skb_shared_info was not correctly initialized.
* skb->destructor is not set in the cloned skb.

This was spotted by trinity:

[  894.990671] BUG: unable to handle kernel paging request at ffffc9000047b001
[  894.991034] IP: [<ffffffff81a212c4>] skb_clone+0x24/0xc0
[...]
[  894.991034] Call Trace:
[  894.991034]  [<ffffffff81ad299a>] nl_fib_input+0x6a/0x240
[  894.991034]  [<ffffffff81c3b7e6>] ? _raw_read_unlock+0x26/0x40
[  894.991034]  [<ffffffff81a5f189>] netlink_unicast+0x169/0x1e0
[  894.991034]  [<ffffffff81a601e1>] netlink_sendmsg+0x251/0x3d0

Fix it by:

1) introducing a new netlink_skb_clone function that is used in nl_fib_input,
   that sets our special skb->destructor in the cloned skb. Moreover, handle
   the release of the large cloned skb head area in the destructor path.

2) not allowing large skbuffs in the netlink broadcast path. I cannot find
   any reasonable use of the large data transfer using netlink in that path,
   moreover this helps to skip extra skb_clone handling.

I found two more netlink clients that are cloning the skbs, but they are
not in the sendmsg path. Therefore, the sole client cloning that I found
seems to be the fib frontend.

Thanks to Eric Dumazet for helping to address this issue.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

3a36515f

25 Jun, 2013 1 commit

net: netlink: virtual tap device management · bcbde0d4

Committed by Daniel Borkmann on Jun 21, 2013

Similarly to the networking receive path with ptype_all taps, we add
the possibility to register netdevices that are for ARPHRD_NETLINK to
the netlink subsystem, so that those can be used for netlink analyzers
resp. debuggers. We do not offer a direct callback function as out-of-tree
modules could do crap with it. Instead, a netdevice must be registered
properly and only receives a clone, managed by the netlink layer. Symbols
are exported as GPL-only.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bcbde0d4

13 Jun, 2013 1 commit

netlink: make compare exist all the time · ca15febf

Committed by Gao feng on Jun 13, 2013

Commit da12c90e
"netlink: Add compare function for netlink_table"
only set compare at the time we create a kernel netlink socket,
and reset compare to NULL at the time we finally
release the netlink socket, but netlink_lookup wants
compare to always exist.

So we should set compare after we allocate nl_table,
and never reset it. Make compare exist all the time.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ca15febf

11 Jun, 2013 2 commits

netlink: fix error propagation in netlink_mmap() · 7cdbac71

Committed by Patrick McHardy on Jun 11, 2013

Return the error if something went wrong instead of unconditionally
returning 0.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

7cdbac71

netlink: Add compare function for netlink_table · da12c90e

Committed by Gao feng on Jun 06, 2013

As we know, netlink sockets are a private resource of a
net namespace; they can communicate with each other
only when they are in the same net namespace. This works
well until we try to add namespace support for other
subsystems which use netlink.

Unlike ipv4 and the routing table, some subsystems, such as
audit and crypto, are not suited to belong to a net namespace;
they are more suitable for a user namespace.

So we must make it possible for netlink sockets
in the same user namespace to communicate with each other.

This patch adds a new function pointer "compare" to
netlink_table, so that we can decide whether netlink sockets can
communicate with each other through this netlink_table's
self-defined compare function.

The behavior isn't changed if we don't provide the compare
function for netlink_table.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

da12c90e

08 Jun, 2013 1 commit

netlink: allow large data transfers from user-space · c05cdb1b

Committed by Pablo Neira Ayuso on Jun 03, 2013

I can hit ENOBUFS in the sendmsg() path with a large batch that is
composed of many netlink messages. Here that limit is 8 MBytes of
skbuff data area as kmalloc does not manage to get more than that.

While discussing atomic rule-set for nftables with Patrick McHardy,
we decided to put all rule-set updates that need to be applied
atomically in one single batch to simplify the existing approach.
However, as explained above, the existing netlink code limits us
to a maximum of ~20000 rules that fit in one single batch without
hitting ENOBUFS. iptables does not have such limitation as it is
using vmalloc.

This patch adds netlink_alloc_large_skb() which is only used in
the netlink_sendmsg() path. It uses alloc_skb if the memory
requested is <= one memory page, that should be the common case
for most subsystems, else vmalloc for higher memory allocations.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

c05cdb1b
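The allocation choice described in c05cdb1b can be summarized by a size test. The sketch below follows the wording of the commit message (one memory page as the threshold) rather than the kernel source; `pick_allocator()`, `enum alloc_kind` and `SKETCH_PAGE_SIZE` are hypothetical, and the page size of 4096 is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for the sketch; the real value is per-architecture. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/* Which allocator a request of a given size would be routed to. */
enum alloc_kind {
	ALLOC_SKB,	/* regular alloc_skb-style allocation */
	ALLOC_VMALLOC	/* vmalloc-backed data area for large requests */
};

/* Requests of up to one page take the common path, larger ones use vmalloc. */
static enum alloc_kind pick_allocator(size_t size)
{
	return size <= SKETCH_PAGE_SIZE ? ALLOC_SKB : ALLOC_VMALLOC;
}
```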

05 Jun, 2013 1 commit

net: fix sk_buff head without data area · 5e71d9d7

Committed by Pablo Neira on Jun 03, 2013

Eric Dumazet spotted that we have to check skb->head instead
of skb->data as skb->head points to the beginning of the
data area of the skbuff. Similarly, we have to initialize the
skb->head pointer, not skb->data in __alloc_skb_head.

After this fix, netlink crashes in the release path of the
sk_buff, so let's fix that as well.

This bug was introduced in (0ebd0ac5 net: add function to
allocate sk_buff head without data area).
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

5e71d9d7

openanolis / cloud-kernel synced successfully more than 1 year ago