提交 · 3d1bec99320d4e96897805440f8cf4f68eff226b · openeuler / raspberrypi-kernel

24 3月, 2015 3 次提交

ipv6: introduce secret_stable to ipv6_devconf · 3d1bec99

由 Hannes Frederic Sowa 提交于 3月 23, 2015

This patch implements the procfs logic for the stable_address knob:
The secret is formatted as an ipv6 address and will be stored per
interface and per namespace. We track initialized flag and return EIO
errors until the secret is set.

We don't inherit the secret to newly created namespaces.

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3d1bec99

ipv6: tcp: handle ICMP messages on TCP_NEW_SYN_RECV request sockets · 2215089b

由 Eric Dumazet 提交于 3月 22, 2015

tcp_v6_err() can restrict lookups to ehash table, and not to listeners.

Note this patch creates the infrastructure, but this means that ICMP
messages for request sockets are ignored until complete conversion.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2215089b

net: convert syn_wait_lock to a spinlock · b2827053

由 Eric Dumazet 提交于 3月 22, 2015

This is a low hanging fruit, as we'll get rid of syn_wait_lock eventually.

We hold syn_wait_lock for such small sections, that it makes no sense to use
a read/write lock. A spin lock is simply faster.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b2827053

23 3月, 2015 1 次提交

netfilter: ip6t_REJECT: check for IP6T_F_PROTO · e35158e4

由 Pablo Neira Ayuso 提交于 3月 21, 2015

Make sure IP6T_F_PROTO is set to enforce layer 4 protocol matching from
the ip6_tables core.
Suggested-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

e35158e4

21 3月, 2015 5 次提交

ipv6: invert join/leave anycast rtnl/socket locking order · c4a6853d

由 Marcelo Ricardo Leitner 提交于 3月 20, 2015

Commit baf606d9 ("ipv4,ipv6: grab rtnl before locking the socket")
missed to update two setsockopt options, IPV6_JOIN_ANYCAST and
IPV6_LEAVE_ANYCAST, causing a lock inverstion regarding to the updated ones.

As ipv6_sock_ac_join and ipv6_sock_ac_leave are only called from
do_ipv6_setsockopt, we are good to just move the rtnl lock upper.

Fixes: baf606d9 ("ipv4,ipv6: grab rtnl before locking the socket")
Reported-by: NYing Huang <ying.huang@intel.com>
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c4a6853d

ipv6: fix backtracking for throw routes · 73ba57bf

由 Steven Barth 提交于 3月 19, 2015

for throw routes to trigger evaluation of other policy rules
EAGAIN needs to be propagated up to fib_rules_lookup
similar to how its done for IPv4

A simple testcase for verification is:

ip -6 rule add lookup 33333 priority 33333
ip -6 route add throw 2001:db8::1
ip -6 route add 2001:db8::1 via fe80::1 dev wlan0 table 33333
ip route get 2001:db8::1
Signed-off-by: NSteven Barth <cyrus@openwrt.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

73ba57bf

ipv6: call ipv6_proxy_select_ident instead of ipv6_select_ident in udp6_ufo_fragment · 8e199dfd

由 Sabrina Dubroca 提交于 3月 19, 2015

Matt Grant reported frequent crashes in ipv6_select_ident when
udp6_ufo_fragment is called from openvswitch on a skb that doesn't
have a dst_entry set.

ipv6_proxy_select_ident generates the frag_id without using the dst
associated with the skb.  This approach was suggested by Vladislav
Yasevich.

Fixes: 0508c07f ("ipv6: Select fragment id during UFO segmentation if not set.")
Cc: Vladislav Yasevich <vyasevic@redhat.com>
Reported-by: NMatt Grant <matt@mattgrant.net.nz>
Tested-by: NMatt Grant <matt@mattgrant.net.nz>
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Acked-by: NVladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8e199dfd

inet: get rid of central tcp/dccp listener timer · fa76ce73

由 Eric Dumazet 提交于 3月 19, 2015

One of the major issue for TCP is the SYNACK rtx handling,
done by inet_csk_reqsk_queue_prune(), fired by the keepalive
timer of a TCP_LISTEN socket.

This function runs for awful long times, with socket lock held,
meaning that other cpus needing this lock have to spin for hundred of ms.

SYNACK are sent in huge bursts, likely to cause severe drops anyway.

This model was OK 15 years ago when memory was very tight.

We now can afford to have a timer per request sock.

Timer invocations no longer need to lock the listener,
and can be run from all cpus in parallel.

With following patch increasing somaxconn width to 32 bits,
I tested a listener with more than 4 million active request sockets,
and a steady SYNFLOOD of ~200,000 SYN per second.
Host was sending ~830,000 SYNACK per second.

This is ~100 times more what we could achieve before this patch.

Later, we will get rid of the listener hash and use ehash instead.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fa76ce73

inet: drop prev pointer handling in request sock · 52452c54

由 Eric Dumazet 提交于 3月 19, 2015

When request sock are put in ehash table, the whole notion
of having a previous request to update dl_next is pointless.

Also, following patch will get rid of big purge timer,
so we want to delete a request sock without holding listener lock.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

52452c54

19 3月, 2015 6 次提交

ipv4, ipv6: kill ip_mc_{join, leave}_group and ipv6_sock_mc_{join, drop} · 54ff9ef3

由 Marcelo Ricardo Leitner 提交于 3月 18, 2015

in favor of their inner __ ones, which doesn't grab rtnl.

As these functions need to operate on a locked socket, we can't be
grabbing rtnl by then. It's too late and doing so causes reversed
locking.

So this patch:
- move rtnl handling to callers instead while already fixing some
  reversed locking situations, like on vxlan and ipvs code.
- renames __ ones to not have the __ mark:
  __ip_mc_{join,leave}_group -> ip_mc_{join,leave}_group
  __ipv6_sock_mc_{join,drop} -> ipv6_sock_mc_{join,drop}
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

54ff9ef3

ipv4,ipv6: grab rtnl before locking the socket · baf606d9

由 Marcelo Ricardo Leitner 提交于 3月 18, 2015

There are some setsockopt operations in ipv4 and ipv6 that are grabbing
rtnl after having grabbed the socket lock. Yet this makes it impossible
to do operations that have to lock the socket when already within a rtnl
protected scope, like ndo dev_open and dev_stop.

We normally take coarse grained locks first but setsockopt inverted that.

So this patch invert the lock logic for these operations and makes
setsockopt grab rtnl if it will be needed prior to grabbing socket lock.
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

baf606d9

inet: get rid of last __inet_hash_connect() argument · b4d6444e

由 Eric Dumazet 提交于 3月 18, 2015

We now always call __inet_hash_nolisten(), no need to pass it
as an argument.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b4d6444e

ipv6: get rid of __inet6_hash() · 77a6a471

由 Eric Dumazet 提交于 3月 18, 2015

We can now use inet_hash() and __inet_hash() instead of private
functions.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

77a6a471

inet: add IPv6 support to sk_ehashfn() · d1e559d0

由 Eric Dumazet 提交于 3月 18, 2015

Intent is to converge IPv4 & IPv6 inet_hash functions to
factorize code.

IPv4 sockets initialize sk_rcv_saddr and sk_v6_daddr
in this patch, thanks to new sk_daddr_set() and sk_rcv_saddr_set()
helpers.

__inet6_hash can now use sk_ehashfn() instead of a private
inet6_sk_ehashfn() and will simply use __inet_hash() in a
following patch.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d1e559d0

netns: constify net_hash_mix() and various callers · 6eada011

由 Eric Dumazet 提交于 3月 18, 2015

const qualifiers ease code review by making clear
which objects are not written in a function.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6eada011

18 3月, 2015 4 次提交

inet: fix request sock refcounting · 0470c8ca

由 Eric Dumazet 提交于 3月 17, 2015

While testing last patch series, I found req sock refcounting was wrong.

We must set skc_refcnt to 1 for all request socks added in hashes,
but also on request sockets created by FastOpen or syncookies.

It is tricky because we need to defer this initialization so that
future RCU lookups do not try to take a refcount on a not yet
fully initialized request socket.

Also get rid of ireq_refcnt alias.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Fixes: 13854e5a ("inet: add proper refcounting to request sock")
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0470c8ca

tcp: rename struct tcp_request_sock listener · 9439ce00

由 Eric Dumazet 提交于 3月 17, 2015

The listener field in struct tcp_request_sock is a pointer
back to the listener. We now have req->rsk_listener, so TCP
only needs one boolean and not a full pointer.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9439ce00

inet: add sk_listener argument to inet_reqsk_alloc() · 407640de

由 Eric Dumazet 提交于 3月 17, 2015

listener socket can be used to set net pointer, and will
be later used to hold a reference on listener.

Add a const qualifier to first argument (struct request_sock_ops *),
and factorize all write_pnet(&ireq->ireq_net, sock_net(sk));
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

407640de

ip6_tunnel: fix error code when tunnel exists · 37355565

由 Nicolas Dichtel 提交于 3月 16, 2015

After commit 2b0bb01b, the kernel returns -ENOBUFS when user tries to add
an existing tunnel with ioctl API:
$ ip -6 tunnel add ip6tnl1 mode ip6ip6 dev eth1
add tunnel "ip6tnl0" failed: No buffer space available

It's confusing, the right error is EEXIST.

This patch also change a bit the code returned:
 - ENOBUFS -> ENOMEM
 - ENOENT -> ENODEV

Fixes: 2b0bb01b ("ip6_tunnel: Return an error when adding an existing tunnel.")
CC: Steffen Klassert <steffen.klassert@secunet.com>
Reported-by: NPierre Cheynier <me@pierre-cheynier.net>
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

37355565

17 3月, 2015 1 次提交

inet: ip early demux should avoid request sockets · f7e4eb03

由 Eric Dumazet 提交于 3月 15, 2015

When a request socket is created, we do not cache ip route
dst entry, like for timewait sockets.

Let's use sk_fullsock() helper.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f7e4eb03

15 3月, 2015 1 次提交

inet: fill request sock ir_iif for IPv4 · 16f86165

由 Eric Dumazet 提交于 3月 13, 2015

Once request socks will be in ehash table, they will need to have
a valid ir_iff field.

This is currently true only for IPv6. This patch extends support
for IPv4 as well.

This means inet_diag_fill_req() can now properly use ir_iif,
which is better for IPv6 link locals anyway, as request sockets
and established sockets will propagate consistent netlink idiag_if.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

16f86165

13 3月, 2015 5 次提交

inet: introduce ireq_family · 3f66b083

由 Eric Dumazet 提交于 3月 12, 2015

Before inserting request socks into general hash table,
fill their socket family.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3f66b083

inet: get_openreq4() & get_openreq6() do not need listener · d4f06873

由 Eric Dumazet 提交于 3月 12, 2015

ireq->ir_num contains local port, use it.

Also, get_openreq4() dumping listen_sk->refcnt makes litle sense.

inet_diag_fill_req() can also use ireq->ir_num
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d4f06873

ipv6: add missing ireq_net & ir_cookie initializations · bd337c58

由 Eric Dumazet 提交于 3月 12, 2015

I forgot to update dccp_v6_conn_request() & cookie_v6_check().
They both need to set ireq->ireq_net and ireq->ir_cookie

Lets clear ireq->ir_cookie in inet_reqsk_alloc()
Signed-off-by: NEric Dumazet <edumazet@google.com>
Fixes: 33cf7c90 ("net: add real socket cookies")
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bd337c58

net: Introduce possible_net_t · 0c5c9fb5

由 Eric W. Biederman 提交于 3月 11, 2015

Having to say
> #ifdef CONFIG_NET_NS
> 	struct net *net;
> #endif

in structures is a little bit wordy and a little bit error prone.

Instead it is possible to say:
> typedef struct {
> #ifdef CONFIG_NET_NS
>       struct net *net;
> #endif
> } possible_net_t;

And then in a header say:

> 	possible_net_t net;

Which is cleaner and easier to use and easier to test, as the
possible_net_t is always there no matter what the compile options.

Further this allows read_pnet and write_pnet to be functions in all
cases which is better at catching typos.

This change adds possible_net_t, updates the definitions of read_pnet
and write_pnet, updates optional struct net * variables that
write_pnet uses on to have the type possible_net_t, and finally fixes
up the b0rked users of read_pnet and write_pnet.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0c5c9fb5

net: Kill hold_net release_net · efd7ef1c

由 Eric W. Biederman 提交于 3月 11, 2015

hold_net and release_net were an idea that turned out to be useless.
The code has been disabled since 2008.  Kill the code it is long past due.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

efd7ef1c

12 3月, 2015 2 次提交

xps: must clear sender_cpu before forwarding · c29390c6

由 Eric Dumazet 提交于 3月 11, 2015

John reported that my previous commit added a regression
on his router.

This is because sender_cpu & napi_id share a common location,
so get_xps_queue() can see garbage and perform an out of bound access.

We need to make sure sender_cpu is cleared before doing the transmit,
otherwise any NIC busy poll enabled (skb_mark_napi_id()) can trigger
this bug.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NJohn <jw@nuclearfallout.net>
Bisected-by: NJohn <jw@nuclearfallout.net>
Fixes: 2bd82484 ("xps: fix xps for stacked devices")
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c29390c6

ipv6: expose RFC4191 route preference via rtnetlink · c78ba6d6

由 Lubomir Rintel 提交于 3月 11, 2015

This makes it possible to retain the route preference when RAs are handled in
userspace.
Signed-off-by: NLubomir Rintel <lkundrak@v3.sk>
Reviewed-by: NJiri Pirko <jiri@resnulli.us>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c78ba6d6

10 3月, 2015 2 次提交

netfilter: fix sparse warnings in reject handling · a03a8dbe

由 Florian Westphal 提交于 3月 09, 2015

make C=1 CF=-D__CHECK_ENDIAN__ shows following:

net/bridge/netfilter/nft_reject_bridge.c:65:50: warning: incorrect type in argument 3 (different base types)
net/bridge/netfilter/nft_reject_bridge.c:65:50: expected restricted __be16 [usertype] protocol [..]
net/bridge/netfilter/nft_reject_bridge.c:102:37: warning: cast from restricted __be16
net/bridge/netfilter/nft_reject_bridge.c:102:37: warning: incorrect type in argument 1 (different base types) [..]
net/bridge/netfilter/nft_reject_bridge.c:121:50: warning: incorrect type in argument 3 (different base types) [..]
net/bridge/netfilter/nft_reject_bridge.c:168:52: warning: incorrect type in argument 3 (different base types) [..]
net/bridge/netfilter/nft_reject_bridge.c:233:52: warning: incorrect type in argument 3 (different base types) [..]

Caused by two (harmless) errors:
1. htons() instead of ntohs()
2. __be16 for protocol in nf_reject_ipXhdr_put API, use u8 instead.
Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

a03a8dbe

net: Remove protocol from struct dst_ops · ddb3b603

由 Eric W. Biederman 提交于 3月 09, 2015

After my change to neigh_hh_init to obtain the protocol from the
neigh_table there are no more users of protocol in struct dst_ops.
Remove the protocol field from dst_ops and all of it's initializers.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ddb3b603

09 3月, 2015 1 次提交

ip: fix error queue empty skb handling · c247f053

由 Willem de Bruijn 提交于 3月 07, 2015

When reading from the error queue, msg_name and msg_control are only
populated for some errors. A new exception for empty timestamp skbs
added a false positive on icmp errors without payload.

`traceroute -M udpconn` only displayed gateways that return payload
with the icmp error: the embedded network headers are pulled before
sock_queue_err_skb, leaving an skb with skb->len == 0 otherwise.

Fix this regression by refining when msg_name and msg_control
branches are taken. The solutions for the two fields are independent.

msg_name only makes sense for errors that configure serr->port and
serr->addr_offset. Test the first instead of skb->len. This also fixes
another issue. saddr could hold the wrong data, as serr->addr_offset
is not initialized  in some code paths, pointing to the start of the
network header. It is only valid when serr->port is set (non-zero).

msg_control support differs between IPv4 and IPv6. IPv4 only honors
requests for ICMP and timestamps with SOF_TIMESTAMPING_OPT_CMSG. The
skb->len test can simply be removed, because skb->dev is also tested
and never true for empty skbs. IPv6 honors requests for all errors
aside from local errors and timestamps on empty skbs.

In both cases, make the policy more explicit by moving this logic to
a new function that decides whether to process msg_control and that
optionally prepares the necessary fields in skb->cb[]. After this
change, the IPv4 and IPv6 paths are more similar.

The last case is rxrpc. Here, simply refine to only match timestamps.

Fixes: 49ca0d8b ("net-timestamp: no-payload option")
Reported-by: NJan Niehusmann <jan@gondor.com>
Signed-off-by: NWillem de Bruijn <willemb@google.com>

----

Changes
  v1->v2
  - fix local origin test inversion in ip6_datagram_support_cmsg
  - make v4 and v6 code paths more similar by introducing analogous
    ipv4_datagram_support_cmsg
  - fix compile bug in rxrpc
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c247f053

06 3月, 2015 1 次提交
- P
  netfilter: nf_tables: consolidate Kconfig options · f04e599e
  由 Pablo Neira Ayuso 提交于 3月 05, 2015
```
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
```
  f04e599e
05 3月, 2015 1 次提交

net: ping: Return EAFNOSUPPORT when appropriate. · 9145736d

由 Lorenzo Colitti 提交于 3月 03, 2015

1. For an IPv4 ping socket, ping_check_bind_addr does not check
   the family of the socket address that's passed in. Instead,
   make it behave like inet_bind, which enforces either that the
   address family is AF_INET, or that the family is AF_UNSPEC and
   the address is 0.0.0.0.
2. For an IPv6 ping socket, ping_check_bind_addr returns EINVAL
   if the socket family is not AF_INET6. Return EAFNOSUPPORT
   instead, for consistency with inet6_bind.
3. Make ping_v4_sendmsg and ping_v6_sendmsg return EAFNOSUPPORT
   instead of EINVAL if an incorrect socket address structure is
   passed in.
4. Make IPv6 ping sockets be IPv6-only. The code does not support
   IPv4, and it cannot easily be made to support IPv4 because
   the protocol numbers for ICMP and ICMPv6 are different. This
   makes connect(::ffff:192.0.2.1) fail with EAFNOSUPPORT instead
   of making the socket unusable.

Among other things, this fixes an oops that can be triggered by:

    int s = socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP);
    struct sockaddr_in6 sin6 = {
        .sin6_family = AF_INET6,
        .sin6_addr = in6addr_any,
    };
    bind(s, (struct sockaddr *) &sin6, sizeof(sin6));

Change-Id: If06ca86d9f1e4593c0d6df174caca3487c57a241
Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9145736d

04 3月, 2015 1 次提交

neigh: Factor out ___neigh_lookup_noref · 60395a20

由 Eric W. Biederman 提交于 3月 03, 2015

While looking at the mpls code I found myself writing yet another
version of neigh_lookup_noref.  We currently have __ipv4_lookup_noref
and __ipv6_lookup_noref.

So to make my work a little easier and to make it a smidge easier to
verify/maintain the mpls code in the future I stopped and wrote
___neigh_lookup_noref.  Then I rewote __ipv4_lookup_noref and
__ipv6_lookup_noref in terms of this new function.  I tested my new
version by verifying that the same code is generated in
ip_finish_output2 and ip6_finish_output2 where these functions are
inlined.

To get to ___neigh_lookup_noref I added a new neighbour cache table
function key_eq.  So that the static size of the key would be
available.

I also added __neigh_lookup_noref for people who want to to lookup
a neighbour table entry quickly but don't know which neibhgour table
they are going to look up.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

60395a20

03 3月, 2015 4 次提交

udp: only allow UFO for packets from SOCK_DGRAM sockets · acf8dd0a

由 Michal Kubeček 提交于 3月 02, 2015

If an over-MTU UDP datagram is sent through a SOCK_RAW socket to a
UFO-capable device, ip_ufo_append_data() sets skb->ip_summed to
CHECKSUM_PARTIAL unconditionally as all GSO code assumes transport layer
checksum is to be computed on segmentation. However, in this case,
skb->csum_start and skb->csum_offset are never set as raw socket
transmit path bypasses udp_send_skb() where they are usually set. As a
result, driver may access invalid memory when trying to calculate the
checksum and store the result (as observed in virtio_net driver).

Moreover, the very idea of modifying the userspace provided UDP header
is IMHO against raw socket semantics (I wasn't able to find a document
clearly stating this or the opposite, though). And while allowing
CHECKSUM_NONE in the UFO case would be more efficient, it would be a bit
too intrusive change just to handle a corner case like this. Therefore
disallowing UFO for packets from SOCK_DGRAM seems to be the best option.
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

acf8dd0a

netfilter: reject: don't send icmp error if csum is invalid · ee586bbc

由 Florian Westphal 提交于 2月 16, 2015

tcp resets are never emitted if the packet that triggers the
reject/reset has an invalid checksum.

For icmp error responses there was no such check.
It allows to distinguish icmp response generated via

iptables -I INPUT -p udp --dport 42 -j REJECT

and those emitted by network stack (won't respond if csum is invalid,
REJECT does).

Arguably its possible to avoid this by using conntrack and only
using REJECT with -m conntrack NEW/RELATED.

However, this doesn't work when connection tracking is not in use
or when using nf_conntrack_checksum=0.

Furthermore, sending errors in response to invalid csums doesn't make
much sense so just add similar test as in nf_send_reset.

Validate csum if needed and only send the response if it is ok.

Reference: http://bugzilla.redhat.com/show_bug.cgi?id=1169829Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

ee586bbc

neigh: Don't require dst in neigh_hh_init · bdf53c58

由 Eric W. Biederman 提交于 3月 02, 2015

- Add protocol to neigh_tbl so that dst->ops->protocol is not needed
- Acquire the device from neigh->dev

This results in a neigh_hh_init that will cache the samve values
regardless of the packets flowing through it.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bdf53c58

net: Remove iocb argument from sendmsg and recvmsg · 1b784140

由 Ying Xue 提交于 3月 02, 2015

After TIPC doesn't depend on iocb argument in its internal
implementations of sendmsg() and recvmsg() hooks defined in proto
structure, no any user is using iocb argument in them at all now.
Then we can drop the redundant iocb argument completely from kinds of
implementations of both sendmsg() and recvmsg() in the entire
networking stack.

Cc: Christoph Hellwig <hch@lst.de>
Suggested-by: NAl Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1b784140

02 3月, 2015 1 次提交

net: use common macro for assering skb->cb[] available size in protocol families · b4772ef8

由 Eyal Birger 提交于 3月 01, 2015

As part of an effort to move skb->dropcount to skb->cb[] use a common
macro in protocol families using skb->cb[] for ancillary data to
validate available room in skb->cb[].
Signed-off-by: NEyal Birger <eyal.birger@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b4772ef8

01 3月, 2015 1 次提交

tcp: cleanup static functions · 74abc20c

由 Eric Dumazet 提交于 2月 26, 2015

tcp_fastopen_create_child() is static and should not be exported.

tcp4_gso_segment() and tcp6_gso_segment() should be static.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

74abc20c