Commits · 7b1311807f3d3eb8bef3ccc53127838b3bea3771 · _Walt / cloud-kernel

22 Oct, 2015 2 commits

ipv6: gro: support sit protocol · feec0cb3

Authored by Eric Dumazet on Oct 19, 2015

Tom Herbert added SIT support to GRO with commit
19424e05 ("sit: Add gro callbacks to sit_offload"),
later reverted by Herbert Xu.

The problem arose because Tom's patch was building GRO
packets without proper metadata: if packets were locally
delivered, we would not care.

But if packets needed to be forwarded, GSO engine was not
able to segment individual segments.

With the following patch, we correctly set skb->encapsulation
and inner network header. We also update gso_type.

Tested:

Server :
netserver
modprobe dummy
ifconfig dummy0 8.0.0.1 netmask 255.255.255.0 up
arp -s 8.0.0.100 4e:32:51:04:47:e5
iptables -I INPUT -s 10.246.7.151 -j TEE --gateway 8.0.0.100
ifconfig sixtofour0
sixtofour0 Link encap:IPv6-in-IPv4
          inet6 addr: 2002:af6:798::1/128 Scope:Global
          inet6 addr: 2002:af6:798::/128 Scope:Global
          UP RUNNING NOARP  MTU:1480  Metric:1
          RX packets:411169 errors:0 dropped:0 overruns:0 frame:0
          TX packets:409414 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:20319631739 (20.3 GB)  TX bytes:29529556 (29.5 MB)

Client :
netperf -H 2002:af6:798::1 -l 1000 &

Checked the traffic copied to dummy0 on the server and verified that
segments were properly rebuilt, with proper IP headers, TCP checksums...

tcpdump on eth0 shows proper GRO aggregation takes place.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: Rightsize IFLA_AF_SPEC size calculation · b1974ed0

Authored by Arad, Ronen on Oct 19, 2015

if_nlmsg_size() overestimates the minimum allocation size of netlink
dump request (when called from rtnl_calcit()) or the size of the
message (when called from rtnl_getlink()). This is because
ext_filter_mask is not supported by rtnl_link_get_af_size() and
rtnl_link_get_size().

The over-estimation is significant when at least one netdev has many
VLANs configured (8 bytes for each configured VLAN).

This patch-set "rightsizes" the protocol specific attribute size
calculation by propagating ext_filter_mask to rtnl_link_get_af_size()
and adding it as an argument to the get_link_af_size op in rtnl_af_ops.

Bridge module already used filtering aware sizing for notifications.
br_get_link_af_size_filtered() is consistent with the modified
get_link_af_size op so it replaces br_get_link_af_size() in br_af_ops.
br_get_link_af_size() becomes unused and thus removed.

Signed-off-by: Ronen Arad <ronen.arad@intel.com>
Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

19 Oct, 2015 1 commit

tcp: do not set queue_mapping on SYNACK · dc6ef6be

Authored by Eric Dumazet on Oct 16, 2015

At the time of commit fff32699 ("tcp: reflect SYN queue_mapping into
SYNACK packets") we had few ways to cope with SYN floods.

We no longer need to reflect incoming skb queue mappings, and instead
can pick a TX queue based on cpu cooking the SYNACK, with normal XPS
affinities.

Note that all SYNACK retransmits were picking TX queue 0, this no longer
is a win given that SYNACK rtx are now distributed on all cpus.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

17 Oct, 2015 1 commit

netfilter: remove hook owner refcounting · 2ffbceb2

Authored by Florian Westphal on Oct 13, 2015

Since commit 8405a8ff ("netfilter: nf_qeueue: Drop queue entries on
nf_unregister_hook") all pending queued entries are discarded.

So we can simply remove all of the owner handling -- when a module is
removed, it also needs to unregister all its hooks.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

16 Oct, 2015 3 commits

tcp/dccp: add inet_csk_reqsk_queue_drop_and_put() helper · f03f2e15

Authored by Eric Dumazet on Oct 14, 2015

Let's reduce the confusion about inet_csk_reqsk_queue_drop() :
In many cases we also need to release reference on request socket,
so add a helper to do this, reducing code size and complexity.

Fixes: 4bdc3d66 ("tcp/dccp: fix behavior of stale SYN_RECV request sockets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
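
The "drop from the queue, then release the caller's reference" pattern that this helper folds into one call can be modelled in userspace as below. This is only a simplified sketch, not the kernel code: `struct req`, `struct listener`, `req_put()`, `queue_drop()` and `queue_drop_and_put()` are hypothetical stand-ins, and a plain `int` replaces the kernel's atomic refcount. It assumes the queue owns one reference and the caller owns another.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a request socket and its listener. */
struct req {
	int refcnt;	/* plain int for illustration; the kernel uses atomics */
	bool queued;	/* true while the listener's queue holds a reference */
};

struct listener {
	int qlen;	/* number of requests currently queued */
};

/* Release one reference; free the request when the last one goes away. */
static void req_put(struct req *r)
{
	if (--r->refcnt == 0)
		free(r);
}

/* Unlink the request from the queue and drop the queue's reference.
 * Does nothing if the request was already unlinked. */
static void queue_drop(struct listener *l, struct req *r)
{
	if (r->queued) {
		r->queued = false;
		l->qlen--;
		req_put(r);
	}
}

/* Combined helper: unlink, then also drop the caller's own reference. */
static void queue_drop_and_put(struct listener *l, struct req *r)
{
	queue_drop(l, r);
	req_put(r);
}
```

With a combined helper, call sites that would otherwise need two calls (and could forget the second) collapse to one, which is the code-size and clarity gain the commit message describes.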

ipv6: Initialize rt6_info properly in ip6_blackhole_route() · 0a1f5962

Authored by Martin KaFai Lau on Oct 15, 2015

ip6_blackhole_route() does not initialize the newly allocated
rt6_info properly.  This patch:
1. Call rt6_info_init() to initialize rt6i_siblings and rt6i_uncached

2. The current rt->dst._metrics init code is incorrect:
   - 'rt->dst._metrics = ort->dst._metrics' is not always safe
   - Not sure what dst_copy_metrics() is trying to do here
     considering ip6_rt_blackhole_cow_metrics() always returns
     NULL

   Fix:
   - Always do dst_copy_metrics()
   - Replace ip6_rt_blackhole_cow_metrics() with
     dst_cow_metrics_generic()

3. Mask out the RTF_PCPU bit from the newly allocated blackhole route.
   This bug triggers an oops (reported by Phil Sutter) in rt6_get_cookie().
   It is because RTF_PCPU is set while rt->dst.from is NULL.

Fixes: d52d3997 ("ipv6: Create percpu rt6_info")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reported-by: Phil Sutter <phil@nwl.cc>
Tested-by: Phil Sutter <phil@nwl.cc>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Phil Sutter <phil@nwl.cc>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Move common init code for rt6_info to a new function rt6_info_init() · ebfa45f0

Authored by Martin KaFai Lau on Oct 15, 2015

Introduce rt6_info_init() to do the common init work for
'struct rt6_info' (after calling dst_alloc).

It is prep work for fixing the rt6_info init logic in
ip6_blackhole_route().

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Phil Sutter <phil@nwl.cc>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

14 Oct, 2015 3 commits

netfilter: ipv6: pointer cast layout · dbb526eb

Authored by Ian Morris on Oct 11, 2015

Correct whitespace layout of a pointer casting.

No changes detected by objdiff.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: ip6_tables: improve if statements · 4305ae44

Authored by Ian Morris on Oct 11, 2015

Correct whitespace layout of if statements.

No changes detected by objdiff.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

tcp/dccp: fix behavior of stale SYN_RECV request sockets · 4bdc3d66

Authored by Eric Dumazet on Oct 13, 2015

When a TCP/DCCP listener is closed, its pending SYN_RECV request sockets
become stale, meaning 3WHS can not complete.

But current behavior is wrong :
incoming packets finding such stale sockets are dropped.

We need instead to cleanup the request socket and perform another
lookup :
- Incoming ACK will give a RST answer,
- SYN rtx might find another listener if available.
- We expedite cleanup of request sockets and old listener socket.

Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

13 Oct, 2015 12 commits

netfilter: ip6_tables: ternary operator layout · 544d9b17

Authored by Ian Morris on Oct 11, 2015

Correct whitespace layout of ternary operators in the netfilter-ipv6
code.

No changes detected by objdiff.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: ipv6: whitespace around operators · f9527ea9

Authored by Ian Morris on Oct 11, 2015

This patch cleanses whitespace around arithmetical operators.

No changes detected by objdiff.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: ipv6: code indentation · 7695495d

Authored by Ian Morris on Oct 11, 2015

Use tabs instead of spaces to indent code.

No changes detected by objdiff.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: ip6_tables: function definition layout · cda219c6

Authored by Ian Morris on Oct 11, 2015

Use tabs instead of spaces to indent second line of parameters in
function definitions.

No changes detected by objdiff.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: ip6_tables: label placement · 6ac94619

Authored by Ian Morris on Oct 11, 2015

Whitespace cleansing: Labels should not be indented.

No changes detected by objdiff.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

net: Add VRF support to IPv6 stack · ca254490

Authored by David Ahern on Oct 12, 2015

As with IPv4, support for VRFs is added to the IPv6 stack by replacing
hardcoded table ids with possibly device specific ones and manipulating the
oif in the flowi6. The flow flags are used to skip the oif compare in nexthop
lookups if the device is enslaved to a VRF via the L3 master device.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Export fib6_get_table and nd_tbl · c4850687

Authored by David Ahern on Oct 12, 2015

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Don't call with rt6_uncached_list_flush_dev · e332bc67

Authored by Eric W. Biederman on Oct 12, 2015

As originally written rt6_uncached_list_flush_dev makes no sense when
called with dev == NULL as it attempts to flush all uncached routes
regardless of network namespace when dev == NULL.  Which is simply
incorrect behavior.

Furthermore at the point rt6_ifdown is called with dev == NULL no more
network devices exist in the network namespace so even if the code in
rt6_uncached_list_flush_dev were to attempt something sensible it
would be meaningless.

Therefore remove support in rt6_uncached_list_flush_dev for handling
network devices where dev == NULL, and only call rt6_uncached_list_flush_dev
when rt6_ifdown is called with a network device.

Fixes: 8d0b94af ("ipv6: Keep track of DST_NOCACHE routes in case of iface down/unregister")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reviewed-by: Martin KaFai Lau <kafai@fb.com>
Tested-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6 route: use err pointers instead of returning pointer by reference · 8c5b83f0

Authored by Roopa Prabhu on Oct 10, 2015

This patch makes ip6_route_info_create return an err pointer instead of
returning the rt pointer by reference, as suggested by Dave.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
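
For readers unfamiliar with the err-pointer convention, the following userspace sketch mimics it. It is an illustration under stated assumptions, not the kernel implementation: the `ERR_PTR()`, `PTR_ERR()` and `IS_ERR()` definitions here are simplified re-creations of the kernel macros of the same names, while `struct route` and `route_create()` are hypothetical and only loosely stand in for `ip6_route_info_create`.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace re-creation of the kernel's err-pointer helpers.
 * The top 4095 addresses are assumed never to be valid objects, so a
 * small negative errno can be encoded in the pointer value itself. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical object, standing in for a route entry. */
struct route {
	int metric;
};

/* Returns a valid pointer on success or an encoded errno on failure,
 * instead of returning an int and filling a 'struct route **' argument. */
static struct route *route_create(int metric)
{
	struct route *rt;

	if (metric < 0)
		return ERR_PTR(-EINVAL);

	rt = malloc(sizeof(*rt));
	if (!rt)
		return ERR_PTR(-ENOMEM);

	rt->metric = metric;
	return rt;
}
```

A caller checks `IS_ERR(rt)` and, on failure, recovers the error code with `PTR_ERR(rt)`; there is no separate output parameter that could be left uninitialized on an error path.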

ipv6: Pass struct net into nf_ct_frag6_gather · b7277597

Authored by Eric W. Biederman on Oct 09, 2015

The function nf_ct_frag6_gather is called on both the input and the
output paths of the networking stack.  In particular ipv6_defrag which
calls nf_ct_frag6_gather is called from both the PRE_ROUTING chain
on input and the LOCAL_OUT chain on output.

The addition of a net parameter makes it explicit which network
namespace the packets are being reassembled in, and removes the need
for nf_ct_frag6_gather to guess.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: shrink struct sock and request_sock by 8 bytes · ed53d0ab

Authored by Eric Dumazet on Oct 08, 2015

One 32bit hole is following skc_refcnt, use it.
skc_incoming_cpu can also be a union for request_sock rcv_wnd.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: SO_INCOMING_CPU setsockopt() support · 70da268b

Authored by Eric Dumazet on Oct 08, 2015

SO_INCOMING_CPU, as added in commit 2c8c56e1, was a getsockopt() command
to fetch the incoming cpu handling a particular TCP flow after accept().

This commit adds setsockopt() support and extends the SO_REUSEPORT selection
logic: if a TCP listener or UDP socket has this option set, a packet is
delivered to this socket only if the CPU handling the packet matches the
specified one.

This allows building very efficient TCP servers, using one listener per
RX queue, as the associated TCP listener should only accept flows handled
in softirq by the same cpu.
This provides optimal NUMA behavior and keeps cpu caches hot.

Note that __inet_lookup_listener() still has to iterate over the list of
all listeners. Following patch puts sk_refcnt in a different cache line
to let this iteration hit only shared and read mostly cache lines.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

11 Oct, 2015 2 commits

ipv6: drop frames with attached skb->sk in forwarding · 9ef2e965

Authored by Hannes Frederic Sowa on Oct 08, 2015

This is a clone of commit 2ab95749 ("ip_forward: Drop frames with
attached skb->sk") for ipv6.

This commit has exactly the same reasons as the above mentioned commit,
namely to prevent panics during netfilter reload or a misconfigured stack.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: gre: setup default multicast routes over PtP links · d9e4ce65

Authored by Hannes Frederic Sowa on Oct 08, 2015

GRE point-to-point interfaces should also support ipv6 multicast. Setting
up default multicast routes on interface creation was forgotten. Add it.

Bugzilla: <https://bugzilla.kernel.org/show_bug.cgi?id=103231>
Cc: Julien Muchembled <jm@jmuchemb.eu>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Nicolas Dumazet <ndumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

08 Oct, 2015 7 commits

dst: Pass net into dst->output · ede2059d

Authored by Eric W. Biederman on Oct 07, 2015

The network namespace is already passed into dst_output; pass it into
dst->output, lwt->output and friends.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4, ipv6: Pass net into ip_local_out and ip6_local_out · 33224b16

Authored by Eric W. Biederman on Oct 07, 2015

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4, ipv6: Pass net into __ip_local_out and __ip6_local_out · cf91a99d

Authored by Eric W. Biederman on Oct 07, 2015

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Merge ip6_local_out and ip6_local_out_sk · 79288330

Authored by Eric W. Biederman on Oct 07, 2015

Stop hiding the sk parameter with an inline helper function and make
all of the callers pass it, so that it is clear what the function is
doing.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Merge __ip6_local_out and __ip6_local_out_sk · 9f8955cc

Authored by Eric W. Biederman on Oct 07, 2015

Only __ip6_local_out_sk has callers, so rename __ip6_local_out_sk to
__ip6_local_out and remove the previous __ip6_local_out.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dst: Pass a sk into .local_out · 4ebdfba7

Authored by Eric W. Biederman on Oct 07, 2015

For consistency with the other similar methods in the kernel pass a
struct sock into the dst_ops .local_out method.

Simplifying the socket passing case is needed as a prequel to passing a
struct net reference into .local_out.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Pass net into dst_output and remove dst_output_okfn · 13206b6b

Authored by Eric W. Biederman on Oct 07, 2015

Replace dst_output_okfn with dst_output.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

07 Oct, 2015 2 commits

net: Fix vti use case with oif in dst lookups for IPv6 · 6e28b000

Authored by David Ahern on Oct 05, 2015

It occurred to me yesterday that 741a11d9 ("net: ipv6: Add
RT6_LOOKUP_F_IFACE flag if oif is set") means that xfrm6_dst_lookup
needs the FLOWI_FLAG_SKIP_NH_OIF flag set. This latest commit causes
the oif to be considered in lookups which is known to break vti. This
explains why 58189ca7 did not make the IPv6 change at the time it was
submitted.

Fixes: 42a7b32b ("xfrm: Add oif to dst lookups")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Fix vti use case with oif in dst lookups for IPv6 · 4148987a

Authored by David Ahern on Oct 05, 2015

It occurred to me yesterday that 741a11d9 ("net: ipv6: Add
RT6_LOOKUP_F_IFACE flag if oif is set") means that xfrm6_dst_lookup
needs the FLOWI_FLAG_SKIP_NH_OIF flag set. This latest commit causes
the oif to be considered in lookups which is known to break vti. This
explains why 58189ca7 did not make the IPv6 change at the time it was
submitted.

Fixes: 42a7b32b ("xfrm: Add oif to dst lookups")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

05 Oct, 2015 2 commits

ipv6: use ktime_t for internal timestamps · 3dd7669f

Authored by Arnd Bergmann on Sep 30, 2015

The ipv6 mip6 implementation is one of only a few users of the
skb_get_timestamp() function in the kernel, which is both unsafe
on 32-bit architectures because of the 2038 overflow, and slightly
less efficient than the skb_get_ktime() based approach.

This converts the function call and the mip6_report_rate_limiter
structure that stores the time stamp, eliminating all uses of
timeval in the ipv6 code.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
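
To show why a single 64-bit nanosecond stamp avoids the 2038 problem in a rate limiter, here is a small userspace sketch. It is not the mip6 code: `stamp_ns_t`, `struct rate_limiter`, `stamp_from_secs()` and `rate_limit_allow()` are hypothetical names, and the plain `int64_t` only approximates what the kernel's `ktime_t` provides.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical stand-in for ktime_t: signed 64-bit nanoseconds. */
typedef int64_t stamp_ns_t;

/* Hypothetical limiter state: the time of the last allowed event. */
struct rate_limiter {
	stamp_ns_t last;
	bool armed;	/* false until the first event has been recorded */
};

/* Convert whole seconds to a nanosecond stamp. Values beyond 2^31 - 1
 * seconds (January 2038) still fit comfortably in 64 bits. */
static stamp_ns_t stamp_from_secs(int64_t sec)
{
	return sec * NSEC_PER_SEC;
}

/* Allow an event only if at least 'interval_ns' has elapsed since the
 * last allowed one. A single subtraction replaces the separate
 * seconds/microseconds comparison that a timeval would need. */
static bool rate_limit_allow(struct rate_limiter *rl, stamp_ns_t now,
			     int64_t interval_ns)
{
	if (rl->armed && now - rl->last < interval_ns)
		return false;

	rl->last = now;
	rl->armed = true;
	return true;
}
```

Because the stamp is one scalar, comparing and storing it is also cheaper than handling a two-field timeval, which matches the efficiency point in the commit message.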

tcp: avoid two atomic ops for syncookies · a1a5344d

Authored by Eric Dumazet on Oct 04, 2015

inet_reqsk_alloc() is used to allocate a temporary request
in order to generate a SYNACK with a cookie. Then later,
syncookie validation also uses a temporary request.

These paths already took a reference on listener refcount,
we can avoid a couple of atomic operations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

03 Oct, 2015 5 commits

tcp: do not lock listener to process SYN packets · e994b2f0