提交 · f0627cfa24389cab25c67bb7ca902912216a8a2d · openanolis / cloud-kernel

24 9月, 2013 2 次提交

openvswitch: remove duplicated include from vport-gre.c · f0627cfa

由 Wei Yongjun 提交于 9月 23, 2013

Remove duplicated include.
Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: NJesse Gross <jesse@nicira.com>

f0627cfa

openvswitch: remove duplicated include from vport-vxlan.c · 9db55079

由 Wei Yongjun 提交于 9月 23, 2013

Remove duplicated include.
Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: NJesse Gross <jesse@nicira.com>

9db55079

18 9月, 2013 1 次提交

openvswitch: Move flow table rehashing to flow install. · e7f13329

由 Pravin B Shelar 提交于 9月 17, 2013

Rehashing in ovs-workqueue can cause ovs-mutex lock contentions
in case of heavy flow setups where both needs ovs-mutex.  So by
moving rehashing to flow-setup we can eliminate contention.
This also simplify ovs locking and reduces dependence on
workqueue.
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NJesse Gross <jesse@nicira.com>

e7f13329

13 9月, 2013 2 次提交

Remove GENERIC_HARDIRQ config option · 0244ad00

由 Martin Schwidefsky 提交于 8月 30, 2013

After the last architecture switched to generic hard irqs the config
options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code
for !CONFIG_GENERIC_HARDIRQS can be removed.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

0244ad00

memcg: rename RESOURCE_MAX to RES_COUNTER_MAX · 6de5a8bf

由 Sha Zhengju 提交于 9月 12, 2013

RESOURCE_MAX is far too general name, change it to RES_COUNTER_MAX.
Signed-off-by: NSha Zhengju <handai.szj@taobao.com>
Signed-off-by: NQiang Huang <h.huangqiang@huawei.com>
Acked-by: NMichal Hocko <mhocko@suse.cz>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Jeff Liu <jeff.liu@oracle.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6de5a8bf

12 9月, 2013 12 次提交

SUNRPC: No, I did not intend to create a 256KiB hashtable · 23c323af

由 Trond Myklebust 提交于 9月 12, 2013

Fix the declaration of the gss_auth_hash_table so that it creates
a 16 bucket hashtable, as I had intended.
Reported-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

23c323af

sunrpc: Add missing kuids conversion for printing · 13429305

由 Geert Uytterhoeven 提交于 9月 12, 2013

m68k/allmodconfig:

net/sunrpc/auth_generic.c: In function ‘generic_key_timeout’:
net/sunrpc/auth_generic.c:241: warning: format ‘%d’ expects type ‘int’, but
argument 2 has type ‘kuid_t’

commit cdba321e ("sunrpc: Convert kuids and
kgids to uids and gids for printing") forgot to convert one instance.
Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

13429305

kernel-wide: fix missing validations on __get/__put/__copy_to/__copy_from_user() · 3ddc5b46

由 Mathieu Desnoyers 提交于 9月 11, 2013

I found the following pattern that leads in to interesting findings:

  grep -r "ret.*|=.*__put_user" *
  grep -r "ret.*|=.*__get_user" *
  grep -r "ret.*|=.*__copy" *

The __put_user() calls in compat_ioctl.c, ptrace compat, signal compat,
since those appear in compat code, we could probably expect the kernel
addresses not to be reachable in the lower 32-bit range, so I think they
might not be exploitable.

For the "__get_user" cases, I don't think those are exploitable: the worse
that can happen is that the kernel will copy kernel memory into in-kernel
buffers, and will fail immediately afterward.

The alpha csum_partial_copy_from_user() seems to be missing the
access_ok() check entirely.  The fix is inspired from x86.  This could
lead to information leak on alpha.  I also noticed that many architectures
map csum_partial_copy_from_user() to csum_partial_copy_generic(), but I
wonder if the latter is performing the access checks on every
architectures.
Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3ddc5b46

net_sched: htb: fix a typo in htb_change_class() · f3ad857e

由 Vimalkumar 提交于 9月 10, 2013

Fix a typo added in commit 56b765b7 ("htb: improved accuracy at high
rates")

cbuffer should not be a copy of buffer.
Signed-off-by: NVimalkumar <j.vimal@gmail.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Jiri Pirko <jpirko@redhat.com>
Reviewed-by: NJiri Pirko <jiri@resnulli.us>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f3ad857e

ipv6: don't call fib6_run_gc() until routing is ready · 2c861cc6

由 Michal Kubeček 提交于 9月 09, 2013

When loading the ipv6 module, ndisc_init() is called before
ip6_route_init(). As the former registers a handler calling
fib6_run_gc(), this opens a window to run the garbage collector
before necessary data structures are initialized. If a network
device is initialized in this window, adding MAC address to it
triggers a NETDEV_CHANGEADDR event, leading to a crash in
fib6_clean_all().

Take the event handler registration out of ndisc_init() into a
separate function ndisc_late_init() and move it after
ip6_route_init().
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2c861cc6

fib6_rules: fix indentation · 04f0888d

由 Stefan Tomanek 提交于 9月 08, 2013

This change just removes two tabs from the source file.
Signed-off-by: NStefan Tomanek <stefan.tomanek@wertarbyte.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

04f0888d

net: fix multiqueue selection · 50d1784e

由 Eric Dumazet 提交于 9月 07, 2013

commit 416186fb ("net: Split core bits of netdev_pick_tx
into __netdev_pick_tx") added a bug that disables caching of queue
index in the socket.

This is the source of packet reorders for TCP flows, and
again this is happening more often when using FQ pacing.

Old code was doing

if (queue_index != old_index)
	sk_tx_queue_set(sk, queue_index);

Alexander renamed the variables but forgot to change sk_tx_queue_set()
2nd parameter.

if (queue_index != new_index)
	sk_tx_queue_set(sk, queue_index);

This means we store -1 over and over in sk->sk_tx_queue_mapping
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

50d1784e

net: sctp: fix smatch warning in sctp_send_asconf_del_ip · 88362ad8

由 Daniel Borkmann 提交于 9月 07, 2013

This was originally reported in [1] and posted by Neil Horman [2], he said:

  Fix up a missed null pointer check in the asconf code. If we don't find
  a local address, but we pass in an address length of more than 1, we may
  dereference a NULL laddr pointer. Currently this can't happen, as the only
  users of the function pass in the value 1 as the addrcnt parameter, but
  its not hot path, and it doesn't hurt to check for NULL should that ever
  be the case.

The callpath from sctp_asconf_mgmt() looks okay. But this could be triggered
from sctp_setsockopt_bindx() call with SCTP_BINDX_REM_ADDR and addrcnt > 1
while passing all possible addresses from the bind list to SCTP_BINDX_REM_ADDR
so that we do *not* find a single address in the association's bind address
list that is not in the packed array of addresses. If this happens when we
have an established association with ASCONF-capable peers, then we could get
a NULL pointer dereference as we only check for laddr == NULL && addrcnt == 1
and call later sctp_make_asconf_update_ip() with NULL laddr.

BUT: this actually won't happen as sctp_bindx_rem() will catch such a case
and return with an error earlier. As this is incredably unintuitive and error
prone, add a check to catch at least future bugs here. As Neil says, its not
hot path. Introduced by 8a07eb0a ("sctp: Add ASCONF operation on the
single-homed host").

 [1] http://www.spinics.net/lists/linux-sctp/msg02132.html
 [2] http://www.spinics.net/lists/linux-sctp/msg02133.htmlReported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Cc: Michio Honda <micchie@sfc.wide.ad.jp>
Acked-By: NNeil Horman <nhorman@tuxdriver.com>
Acked-by: NVlad Yasevich <vyasevich@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

88362ad8

net: sctp: fix bug in sctp_poll for SOCK_SELECT_ERR_QUEUE · a0fb05d1

由 Daniel Borkmann 提交于 9月 07, 2013

If we do not add braces around ...

  mask |= POLLERR |
          sock_flag(sk, SOCK_SELECT_ERR_QUEUE) ? POLLPRI : 0;

... then this condition always evaluates to true as POLLERR is
defined as 8 and binary or'd with whatever result comes out of
sock_flag(). Hence instead of (X | Y) ? A : B, transform it into
X | (Y ? A : B). Unfortunatelty, commit 8facd5fb ("net: fix
smatch warnings inside datagram_poll") forgot about SCTP. :-(

Introduced by 7d4c04fc ("net: add option to enable error queue
packets waking select").
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Cc: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Acked-by: NVlad Yasevich <vyasevich@gmail.com>
Acked-by: NJacob Keller <jacob.e.keller@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a0fb05d1

net: fib: fib6_add: fix potential NULL pointer dereference · ae7b4e1f

由 Daniel Borkmann 提交于 9月 07, 2013

When the kernel is compiled with CONFIG_IPV6_SUBTREES, and we return
with an error in fn = fib6_add_1(), then error codes are encoded into
the return pointer e.g. ERR_PTR(-ENOENT). In such an error case, we
write the error code into err and jump to out, hence enter the if(err)
condition. Now, if CONFIG_IPV6_SUBTREES is enabled, we check for:

  if (pn != fn && pn->leaf == rt)
    ...
  if (pn != fn && !pn->leaf && !(pn->fn_flags & RTN_RTINFO))
    ...

Since pn is NULL and fn is f.e. ERR_PTR(-ENOENT), then pn != fn
evaluates to true and causes a NULL-pointer dereference on further
checks on pn. Fix it, by setting both NULL in error case, so that
pn != fn already evaluates to false and no further dereference
takes place.

This was first correctly implemented in 4a287eba ("IPv6 routing,
NLM_F_* flag support: REPLACE and EXCL flags support, warn about
missing CREATE flag"), but the bug got later on introduced by
188c517a ("ipv6: return errno pointers consistently for fib6_add_1()").
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Cc: Lin Ming <mlin@ss.pku.edu.cn>
Cc: Matti Vaittinen <matti.vaittinen@nsn.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: NMatti Vaittinen <matti.vaittinen@nsn.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ae7b4e1f

net: ovs: flow: fix potential illegal memory access in __parse_flow_nlattrs · 3bf4b5b1

由 Daniel Borkmann 提交于 9月 07, 2013

In function __parse_flow_nlattrs(), we check for condition
(type > OVS_KEY_ATTR_MAX) and if true, print an error, but we do
not return from this function as in other checks. It seems this
has been forgotten, as otherwise, we could access beyond the
memory of ovs_key_lens, which is of ovs_key_lens[OVS_KEY_ATTR_MAX + 1].
Hence, a maliciously prepared nla_type from user space could access
beyond this upper limit.

Introduced by 03f0d916 ("openvswitch: Mega flow implementation").
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Cc: Andy Zhou <azhou@nicira.com>
Acked-by: NJesse Gross <jesse@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3bf4b5b1

ipv6/exthdrs: accept tlv which includes only padding · 8112b1fe

由 Jiri Pirko 提交于 9月 06, 2013

In rfc4942 and rfc2460 I cannot find anything which would implicate to
drop packets which have only padding in tlv.

Current behaviour breaks TAHI Test v6LC.1.2.6.

Problem was intruduced in:
9b905fe6 "ipv6/exthdrs: strict Pad1 and PadN check"
Signed-off-by: NJiri Pirko <jiri@resnulli.us>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8112b1fe

11 9月, 2013 1 次提交

shrinker: convert remaining shrinkers to count/scan API · 70534a73

由 Dave Chinner 提交于 8月 28, 2013

Convert the remaining couple of random shrinkers in the tree to the new
API.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NGlauber Costa <glommer@openvz.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

70534a73

07 9月, 2013 3 次提交

tcp: properly increase rcv_ssthresh for ofo packets · 4e4f1fc2

由 Eric Dumazet 提交于 9月 06, 2013

TCP receive window handling is multi staged.

A socket has a memory budget, static or dynamic, in sk_rcvbuf.

Because we do not really know how this memory budget translates to
a TCP window (payload), TCP announces a small initial window
(about 20 MSS).

When a packet is received, we increase TCP rcv_win depending
on the payload/truesize ratio of this packet. Good citizen
packets give a hint that it's reasonable to have rcv_win = sk_rcvbuf/2

This heuristic takes place in tcp_grow_window()

Problem is : We currently call tcp_grow_window() only for in-order
packets.

This means that reorders or packet losses stop proper grow of
rcv_win, and senders are unable to benefit from fast recovery,
or proper reordering level detection.

Really, a packet being stored in OFO queue is not a bad citizen.
It should be part of the game as in-order packets.

In our traces, we very often see sender is limited by linux small
receive windows, even if linux hosts use autotuning (DRS) and should
allow rcv_win to grow to ~3MB.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4e4f1fc2

tcp: fix no cwnd growth after timeout · 16edfe7e

由 Yuchung Cheng 提交于 9月 05, 2013

In commit 0f7cc9a3 "tcp: increase throughput when reordering is high",
it only allows cwnd to increase in Open state. This mistakenly disables
slow start after timeout (CA_Loss). Moreover cwnd won't grow if the
state moves from Disorder to Open later in tcp_fastretrans_alert().

Therefore the correct logic should be to allow cwnd to grow as long
as the data is received in order in Open, Loss, or even Disorder state.
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

16edfe7e

net: netlink: filter particular protocols from analyzers · 5ffd5cdd

由 Daniel Borkmann 提交于 9月 05, 2013

Fix finer-grained control and let only a whitelist of allowed netlink
protocols pass, in our case related to networking. If later on, other
subsystems decide they want to add their protocol as well to the list
of allowed protocols they shall simply add it. While at it, we also
need to tell what protocol is in use otherwise BPF_S_ANC_PROTOCOL can
not pick it up (as it's not filled out).
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5ffd5cdd

06 9月, 2013 11 次提交

rpc: let xdr layer allocate gssproxy receieve pages · d4a51656

由 J. Bruce Fields 提交于 8月 23, 2013

In theory the linux cred in a gssproxy reply can include up to
NGROUPS_MAX data, 256K of data.  In the common case we expect it to be
shorter.  So do as the nfsv3 ACL code does and let the xdr code allocate
the pages as they come in, instead of allocating a lot of pages that
won't typically be used.
Tested-by: NSimo Sorce <simo@redhat.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

d4a51656

rpc: fix huge kmalloc's in gss-proxy · 9dfd87da

由 J. Bruce Fields 提交于 8月 20, 2013

The reply to a gssproxy can include up to NGROUPS_MAX gid's, which will
take up more than a page.  We therefore need to allocate an array of
pages to hold the reply instead of trying to allocate a single huge
buffer.
Tested-by: NSimo Sorce <simo@redhat.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

9dfd87da

rpc: comment on linux_cred encoding, treat all as unsigned · 6a36978e

由 J. Bruce Fields 提交于 8月 23, 2013

The encoding of linux creds is a bit confusing.

Also: I think in practice it doesn't really matter whether we treat any
of these things as signed or unsigned, but unsigned seems more
straightforward: uid_t/gid_t are unsigned and it simplifies the ngroups
overflow check.
Tested-by: NSimo Sorce <simo@redhat.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

6a36978e

rpc: clean up decoding of gssproxy linux creds · 778e512b

由 J. Bruce Fields 提交于 8月 21, 2013

We can use the normal coding infrastructure here.

Two minor behavior changes:

	- we're assuming no wasted space at the end of the linux cred.
	  That seems to match gss-proxy's behavior, and I can't see why
	  it would need to do differently in the future.

	- NGROUPS_MAX check added: note groups_alloc doesn't do this,
	  this is the caller's responsibility.
Tested-by: NSimo Sorce <simo@redhat.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

778e512b

openvswitch: Fix alignment of struct sw_flow_key. · 0d40f75b

由 Jesse Gross 提交于 9月 05, 2013

sw_flow_key alignment was declared as " __aligned(__alignof__(long))".
However, this breaks on the m68k architecture where long is 32 bit in
size but 16 bit aligned by default. This aligns to the size of a long to
ensure that we can always do comparsions in full long-sized chunks. It
also adds an additional build check to catch any reduction in alignment.

CC: Andy Zhou <azhou@nicira.com>
Reported-by: NFengguang Wu <fengguang.wu@intel.com>
Reported-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: NJesse Gross <jesse@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0d40f75b

netfilter: Fix build errors with xt_socket.c · 1a5bbfc3

由 David S. Miller 提交于 9月 05, 2013

As reported by Randy Dunlap:

====================
when CONFIG_IPV6=m
and CONFIG_NETFILTER_XT_MATCH_SOCKET=y:

net/built-in.o: In function `socket_mt6_v1_v2':
xt_socket.c:(.text+0x51b55): undefined reference to `udp6_lib_lookup'
net/built-in.o: In function `socket_mt_init':
xt_socket.c:(.init.text+0x1ef8): undefined reference to `nf_defrag_ipv6_enable'
====================

Like several other modules under net/netfilter/ we have to
have a dependency "IPV6 disabled or set compatibly with this
module" clause.
Reported-by: NRandy Dunlap <rdunlap@infradead.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1a5bbfc3

tcp: Add missing braces to do_tcp_setsockopt · e2e5c4c0

由 Dave Jones 提交于 9月 05, 2013

Signed-off-by: NDave Jones <davej@fedoraproject.org>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e2e5c4c0

caif: Add missing braces to multiline if in cfctrl_linkup_request · 0c1db731

由 Dave Jones 提交于 9月 05, 2013

The indentation here implies this was meant to be a multi-line if.

Introduced several years back in commit c85c2951
("caif: Handle dev_queue_xmit errors.")
Signed-off-by: NDave Jones <davej@fedoraproject.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0c1db731

ipv6:introduce function to find route for redirect · b55b76b2

由 Duan Jiong 提交于 9月 04, 2013

RFC 4861 says that the IP source address of the Redirect is the
same as the current first-hop router for the specified ICMP
Destination Address, so the gateway should be taken into
consideration when we find the route for redirect.

There was once a check in commit
a6279458 ("NDISC: Search over
all possible rules on receipt of redirect.") and the check
went away in commit b94f1c09
("ipv6: Use icmpv6_notify() to propagate redirect, instead of
rt6_redirect()").

The bug is only "exploitable" on layer-2 because the source
address of the redirect is checked to be a valid link-local
address but it makes spoofing a lot easier in the same L2
domain nonetheless.

Thanks very much for Hannes's help.
Signed-off-by: NDuan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b55b76b2

bridge: apply multicast snooping to IPv6 link-local, too · 3c3769e6

由 Linus Lüssing 提交于 9月 04, 2013

The multicast snooping code should have matured enough to be safely
applicable to IPv6 link-local multicast addresses (excluding the
link-local all nodes address, ff02::1), too.
Signed-off-by: NLinus Lüssing <linus.luessing@web.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3c3769e6

bridge: prevent flooding IPv6 packets that do not have a listener · 8fad9c39

由 Linus Lüssing 提交于 9月 04, 2013

Currently if there is no listener for a certain group then IPv6 packets
for that group are flooded on all ports, even though there might be no
host and router interested in it on a port.

With this commit they are only forwarded to ports with a multicast
router.

Just like commit bd4265fe ("bridge: Only flood unregistered groups
to routers") did for IPv4, let's do the same for IPv6 with the same
reasoning.
Signed-off-by: NLinus Lüssing <linus.luessing@web.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8fad9c39

05 9月, 2013 8 次提交

SUNRPC: Add an identifier for struct rpc_clnt · 2f048db4

由 Trond Myklebust 提交于 9月 04, 2013

Add an identifier in order to aid debugging.
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

2f048db4

net: ipv6: mld: introduce mld_{gq, ifc, dad}_stop_timer functions · b4af8def

由 Daniel Borkmann 提交于 9月 04, 2013

We already have mld_{gq,ifc,dad}_start_timer() functions, so introduce
mld_{gq,ifc,dad}_stop_timer() functions to reduce code size and make it
more readable.
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b4af8def

net: ipv6: mld: refactor query processing into v1/v2 functions · 2b7c121f

由 Daniel Borkmann 提交于 9月 04, 2013

Make igmp6_event_query() a bit easier to read by refactoring code
parts into mld_process_v1() and mld_process_v2().
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2b7c121f

net: ipv6: mld: similarly to MLDv2 have min max_delay of 1 · cc7f7ab7

由 Daniel Borkmann 提交于 9月 04, 2013

Similarly as we do in MLDv2 queries, set a forged MLDv1 query with
0 ms mld_maxdelay to minimum timer shot time of 1 jiffies. This is
eventually done in igmp6_group_queried() anyway, so we can simplify
a check there.
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cc7f7ab7

net: ipv6: mld: implement RFC3810 MLDv2 mode only · 58c0ecfd

由 Daniel Borkmann 提交于 9月 04, 2013

RFC3810, 10. Security Considerations says under subsection 10.1.
Query Message:

  A forged Version 1 Query message will put MLDv2 listeners on that
  link in MLDv1 Host Compatibility Mode. This scenario can be avoided
  by providing MLDv2 hosts with a configuration option to ignore
  Version 1 messages completely.

Hence, implement a MLDv2-only mode that will ignore MLDv1 traffic:

  echo 2 > /proc/sys/net/ipv6/conf/ethX/force_mld_version  or
  echo 2 > /proc/sys/net/ipv6/conf/all/force_mld_version

Note that <all> device has a higher precedence as it was previously
also the case in the macro MLD_V1_SEEN() that would "short-circuit"
if condition on <all> case.
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

58c0ecfd

net: ipv6: mld: get rid of MLDV2_MRC and simplify calculation · e3f5b170

由 Daniel Borkmann 提交于 9月 04, 2013

Get rid of MLDV2_MRC and use our new macros for mantisse and
exponent to calculate Maximum Response Delay out of the Maximum
Response Code.
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e3f5b170

net: ipv6: mld: clean up MLD_V1_SEEN macro · 6c567b78

由 Daniel Borkmann 提交于 9月 04, 2013

Replace the macro with a function to make it more readable. GCC will
eventually decide whether to inline this or not (also, that's not
fast-path anyway).
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6c567b78

net: ipv6: mld: fix v1/v2 switchback timeout to rfc3810, 9.12. · 89225d1c

由 Daniel Borkmann 提交于 9月 04, 2013

i) RFC3810, 9.2. Query Interval [QI] says:

   The Query Interval variable denotes the interval between General
   Queries sent by the Querier. Default value: 125 seconds. [...]

ii) RFC3810, 9.3. Query Response Interval [QRI] says:

  The Maximum Response Delay used to calculate the Maximum Response
  Code inserted into the periodic General Queries. Default value:
  10000 (10 seconds) [...] The number of seconds represented by the
  [Query Response Interval] must be less than the [Query Interval].

iii) RFC3810, 9.12. Older Version Querier Present Timeout [OVQPT] says:

  The Older Version Querier Present Timeout is the time-out for
  transitioning a host back to MLDv2 Host Compatibility Mode. When an
  MLDv1 query is received, MLDv2 hosts set their Older Version Querier
  Present Timer to [Older Version Querier Present Timeout].

  This value MUST be ([Robustness Variable] times (the [Query Interval]
  in the last Query received)) plus ([Query Response Interval]).

Hence, on *default* the timeout results in:

  [RV] = 2, [QI] = 125sec, [QRI] = 10sec
  [OVQPT] = [RV] * [QI] + [QRI] = 260sec

Having that said, we currently calculate [OVQPT] (here given as 'switchback'
variable) as ...

  switchback = (idev->mc_qrv + 1) * max_delay

RFC3810, 9.12. says "the [Query Interval] in the last Query received". In
section "9.14. Configuring timers", it is said:

  This section is meant to provide advice to network administrators on
  how to tune these settings to their network. Ambitious router
  implementations might tune these settings dynamically based upon
  changing characteristics of the network. [...]

iv) RFC38010, 9.14.2. Query Interval:

  The overall level of periodic MLD traffic is inversely proportional
  to the Query Interval. A longer Query Interval results in a lower
  overall level of MLD traffic. The value of the Query Interval MUST
  be equal to or greater than the Maximum Response Delay used to
  calculate the Maximum Response Code inserted in General Query
  messages.

I assume that was why switchback is calculated as is (3 * max_delay), although
this setting seems to be meant for routers only to configure their [QI]
interval for non-default intervals. So usage here like this is clearly wrong.

Concluding, the current behaviour in IPv6's multicast code is not conform
to the RFC as switch back is calculated wrongly. That is, it has a too small
value, so MLDv2 hosts switch back again to MLDv2 way too early, i.e. ~30secs
instead of ~260secs on default.

Hence, introduce necessary helper functions and fix this up properly as it
should be.

Introduced in 06da92283 ("[IPV6]: Add MLDv2 support."). Credits to Hannes
Frederic Sowa who also had a hand in this as well. Also thanks to Hangbin Liu
who did initial testing.
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Cc: David Stevens <dlstevens@us.ibm.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

89225d1c

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功