提交 · 9fe516ba3fb29b6f6a752ffd93342fdee500ec01 · openeuler / raspberrypi-kernel

02 7月, 2014 3 次提交

inet: move ipv6only in sock_common · 9fe516ba

由 Eric Dumazet 提交于 6月 27, 2014

When an UDP application switches from AF_INET to AF_INET6 sockets, we
have a small performance degradation for IPv4 communications because of
extra cache line misses to access ipv6only information.

This can also be noticed for TCP listeners, as ipv6_only_sock() is also
used from __inet_lookup_listener()->compute_score()

This is magnified when SO_REUSEPORT is used.

Move ipv6only into struct sock_common so that it is available at
no extra cost in lookups.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9fe516ba

ipv6: Allow accepting RA from local IP addresses. · d9333196

由 Ben Greear 提交于 6月 25, 2014

This can be used in virtual networking applications, and
may have other uses as well.  The option is disabled by
default.

A specific use case is setting up virtual routers, bridges, and
hosts on a single OS without the use of network namespaces or
virtual machines.  With proper use of ip rules, routing tables,
veth interface pairs and/or other virtual interfaces,
and applications that can bind to interfaces and/or IP addresses,
it is possibly to create one or more virtual routers with multiple
hosts attached.  The host interfaces can act as IPv6 systems,
with radvd running on the ports in the virtual routers.  With the
option provided in this patch enabled, those hosts can now properly
obtain IPv6 addresses from the radvd.
Signed-off-by: NBen Greear <greearb@candelatech.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d9333196

ipv6: Add more debugging around accept-ra logic. · f2a762d8

由 Ben Greear 提交于 6月 25, 2014

This is disabled by default, just like similar debug info
already in this module.  But, makes it easier to find out
why RA is not being accepted when debugging strange behaviour.
Signed-off-by: NBen Greear <greearb@candelatech.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f2a762d8

28 6月, 2014 12 次提交

tcp: add tcp_conn_request · 1fb6f159

由 Octavian Purdila 提交于 6月 25, 2014

Create tcp_conn_request and remove most of the code from
tcp_v4_conn_request and tcp_v6_conn_request.
Signed-off-by: NOctavian Purdila <octavian.purdila@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1fb6f159

tcp: add queue_add_hash to tcp_request_sock_ops · 695da14e

由 Octavian Purdila 提交于 6月 25, 2014

Add queue_add_hash member to tcp_request_sock_ops so that we can later
unify tcp_v4_conn_request and tcp_v6_conn_request.
Signed-off-by: NOctavian Purdila <octavian.purdila@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

695da14e

tcp: add mss_clamp to tcp_request_sock_ops · 2aec4a29

由 Octavian Purdila 提交于 6月 25, 2014

Add mss_clamp member to tcp_request_sock_ops so that we can later
unify tcp_v4_conn_request and tcp_v6_conn_request.
Signed-off-by: NOctavian Purdila <octavian.purdila@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2aec4a29

tcp: unify tcp_v4_rtx_synack and tcp_v6_rtx_synack · 5db92c99

由 Octavian Purdila 提交于 6月 25, 2014

Signed-off-by: NOctavian Purdila <octavian.purdila@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5db92c99

tcp: add send_synack method to tcp_request_sock_ops · d6274bd8

由 Octavian Purdila 提交于 6月 25, 2014

Create a new tcp_request_sock_ops method to unify the IPv4/IPv6
signature for tcp_v[46]_send_synack. This allows us to later unify
tcp_v4_rtx_synack with tcp_v6_rtx_synack and tcp_v4_conn_request with
tcp_v4_conn_request.
Signed-off-by: NOctavian Purdila <octavian.purdila@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d6274bd8

tcp: add init_seq method to tcp_request_sock_ops · 936b8bdb

由 Octavian Purdila 提交于 6月 25, 2014

More work in preparation of unifying tcp_v4_conn_request and
tcp_v6_conn_request: indirect the init sequence calls via the
tcp_request_sock_ops.
Signed-off-by: NOctavian Purdila <octavian.purdila@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

936b8bdb

tcp: move around a few calls in tcp_v6_conn_request · 94037159

由 Octavian Purdila 提交于 6月 25, 2014

Make the tcp_v6_conn_request calls flow similar with that of
tcp_v4_conn_request.

Note that want_cookie can be true only if isn is zero and that is why
we can move the if (want_cookie) block out of the if (!isn) block.

Moving security_inet_conn_request() has a couple of side effects:
missing inet_rsk(req)->ecn_ok update and the req->cookie_ts
update. However, neither SELinux nor Smack security hooks seems to
check them. This change should also avoid future different behaviour
for IPv4 and IPv6 in the security hooks.
Signed-off-by: NOctavian Purdila <octavian.purdila@intel.com>
Acked-by: NPaul Moore <paul@paul-moore.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

94037159

tcp: add route_req method to tcp_request_sock_ops · d94e0417

由 Octavian Purdila 提交于 6月 25, 2014

Create wrappers with same signature for the IPv4/IPv6 request routing
calls and use these wrappers (via route_req method from
tcp_request_sock_ops) in tcp_v4_conn_request and tcp_v6_conn_request
with the purpose of unifying the two functions in a later patch.

We can later drop the wrapper functions and modify inet_csk_route_req
and inet6_cks_route_req to use the same signature.
Signed-off-by: NOctavian Purdila <octavian.purdila@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d94e0417

tcp: add init_cookie_seq method to tcp_request_sock_ops · fb7b37a7

由 Octavian Purdila 提交于 6月 25, 2014

Move the specific IPv4/IPv6 cookie sequence initialization to a new
method in tcp_request_sock_ops in preparation for unifying
tcp_v4_conn_request and tcp_v6_conn_request.
Signed-off-by: NOctavian Purdila <octavian.purdila@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fb7b37a7

tcp: add init_req method to tcp_request_sock_ops · 16bea70a

由 Octavian Purdila 提交于 6月 25, 2014

Move the specific IPv4/IPv6 intializations to a new method in
tcp_request_sock_ops in preparation for unifying tcp_v4_conn_request
and tcp_v6_conn_request.
Signed-off-by: NOctavian Purdila <octavian.purdila@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

16bea70a

net: remove inet6_reqsk_alloc · 476eab82

由 Octavian Purdila 提交于 6月 25, 2014

Since pktops is only used for IPv6 only and opts is used for IPv4
only, we can move these fields into a union and this allows us to drop
the inet6_reqsk_alloc function as after this change it becomes
equivalent with inet_reqsk_alloc.

This patch also fixes a kmemcheck issue in the IPv6 stack: the flags
field was not annotated after a request_sock was allocated.
Signed-off-by: NOctavian Purdila <octavian.purdila@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

476eab82

tcp: tcp_v[46]_conn_request: fix snt_synack initialization · aa27fc50

由 Octavian Purdila 提交于 6月 25, 2014

Commit 016818d0 (tcp: TCP Fast Open Server - take SYNACK RTT after
completing 3WHS) changes the code to only take a snt_synack timestamp
when a SYNACK transmit or retransmit succeeds. This behaviour is later
broken by commit 843f4a55 (tcp: use tcp_v4_send_synack on first
SYN-ACK), as snt_synack is now updated even if tcp_v4_send_synack
fails.

Also, commit 3a19ce0e (tcp: IPv6 support for fastopen server) misses
the required IPv6 updates for 016818d0.

This patch makes sure that snt_synack is updated only when the SYNACK
trasnmit/retransmit succeeds, for both IPv4 and IPv6.

Cc: Cardwell <ncardwell@google.com>
Cc: Daniel Lee <longinus00@gmail.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: NOctavian Purdila <octavian.purdila@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aa27fc50

18 6月, 2014 1 次提交

tcp: move ir_mark initialization to tcp_openreq_init · e0f802fb

由 Octavian Purdila 提交于 6月 17, 2014

ir_mark initialization is done for both TCP v4 and v6, move it in the
common tcp_openreq_init function.
Signed-off-by: NOctavian Purdila <octavian.purdila@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e0f802fb

11 6月, 2014 2 次提交

net: ipv6: Fixed up ipsec packet be re-routing issue · f6c20c59

由 huizhang 提交于 6月 09, 2014

Bug report on https://bugzilla.kernel.org/show_bug.cgi?id=75781

When a local output ipsec packet match the mangle table rule,
and be set mark value, the packet will be route again in
route_me_harder -> _session_decoder6

In this case, the nhoff in CB of skb was still the default
value 0. So the protocal match can't success and the packet can't match
correct SA rule,and then the packet be send out in plaintext.

To fixed up the issue. The CB->nhoff must be set.
Signed-off-by: NHui Zhang <huizhang@marvell.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f6c20c59

ipip, sit: fix ipv4_{update_pmtu,redirect} calls · 2346829e

由 Dmitry Popov 提交于 6月 06, 2014

ipv4_{update_pmtu,redirect} were called with tunnel's ifindex (t->dev is a
tunnel netdevice). It caused wrong route lookup and failure of pmtu update or
redirect. We should use the same ifindex that we use in ip_route_output_* in
*tunnel_xmit code. It is t->parms.link .
Signed-off-by: NDmitry Popov <ixaphire@qrator.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2346829e

06 6月, 2014 1 次提交

ipv6: Shrink udp_v6_mcast_next() to one socket variable · 9e89fd8b

由 Sven Wegener 提交于 5月 29, 2014

To avoid the confusion of having two variables, shrink the function to
only use the parameter variable for looping.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: NSven Wegener <sven.wegener@stealer.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9e89fd8b

05 6月, 2014 4 次提交

gre: Call gso_make_checksum · 4749c09c

由 Tom Herbert 提交于 6月 04, 2014

Call gso_make_checksum. This should have the benefit of using a
checksum that may have been previously computed for the packet.

This also adds NETIF_F_GSO_GRE_CSUM to differentiate devices that
offload GRE GSO with and without the GRE checksum offloaed.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4749c09c

net: Add GSO support for UDP tunnels with checksum · 0f4f4ffa

由 Tom Herbert 提交于 6月 04, 2014

Added a new netif feature for GSO_UDP_TUNNEL_CSUM. This indicates
that a device is capable of computing the UDP checksum in the
encapsulating header of a UDP tunnel.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0f4f4ffa

udp: Generic functions to set checksum · af5fcba7

由 Tom Herbert 提交于 6月 04, 2014

Added udp_set_csum and udp6_set_csum functions to set UDP checksums
in packets. These are for simple UDP packets such as those that might
be created in UDP tunnels.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

af5fcba7

ipv6: Fix regression caused by in udp_v6_mcast_next() · 3bfdc59a

由 Sven Wegener 提交于 5月 29, 2014

Commit efe4208f ("ipv6: make lookups simpler and faster") introduced a
regression in udp_v6_mcast_next(), resulting in multicast packets not
reaching the destination sockets under certain conditions.

The packet's IPv6 addresses are wrongly compared to the IPv6 addresses
from the function's socket argument, which indicates the starting point
for looping, instead of the loop variable. If the addresses from the
first socket do not match the packet's addresses, no socket in the list
will match.
Signed-off-by: NSven Wegener <sven.wegener@stealer.net>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3bfdc59a

03 6月, 2014 2 次提交

net: fix inet_getid() and ipv6_select_ident() bugs · 39c36094

由 Eric Dumazet 提交于 5月 29, 2014

I noticed we were sending wrong IPv4 ID in TCP flows when MTU discovery
is disabled.
Note how GSO/TSO packets do not have monotonically incrementing ID.

06:37:41.575531 IP (id 14227, proto: TCP (6), length: 4396)
06:37:41.575534 IP (id 14272, proto: TCP (6), length: 65212)
06:37:41.575544 IP (id 14312, proto: TCP (6), length: 57972)
06:37:41.575678 IP (id 14317, proto: TCP (6), length: 7292)
06:37:41.575683 IP (id 14361, proto: TCP (6), length: 63764)

It appears I introduced this bug in linux-3.1.

inet_getid() must return the old value of peer->ip_id_count,
not the new one.

Lets revert this part, and remove the prevention of
a null identification field in IPv6 Fragment Extension Header,
which is dubious and not even done properly.

Fixes: 87c48fa3 ("ipv6: make fragment identifications less predictable")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

39c36094

inetpeer: get rid of ip_id_count · 73f156a6

由 Eric Dumazet 提交于 6月 02, 2014

Ideally, we would need to generate IP ID using a per destination IP
generator.

linux kernels used inet_peer cache for this purpose, but this had a huge
cost on servers disabling MTU discovery.

1) each inet_peer struct consumes 192 bytes

2) inetpeer cache uses a binary tree of inet_peer structs,
   with a nominal size of ~66000 elements under load.

3) lookups in this tree are hitting a lot of cache lines, as tree depth
   is about 20.

4) If server deals with many tcp flows, we have a high probability of
   not finding the inet_peer, allocating a fresh one, inserting it in
   the tree with same initial ip_id_count, (cf secure_ip_id())

5) We garbage collect inet_peer aggressively.

IP ID generation do not have to be 'perfect'

Goal is trying to avoid duplicates in a short period of time,
so that reassembly units have a chance to complete reassembly of
fragments belonging to one message before receiving other fragments
with a recycled ID.

We simply use an array of generators, and a Jenkin hash using the dst IP
as a key.

ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
belongs (it is only used from this file)

secure_ip_id() and secure_ipv6_id() no longer are needed.

Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
unnecessary decrement/increment of the number of segments.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

73f156a6

24 5月, 2014 3 次提交

net: Make enabling of zero UDP6 csums more restrictive · 1c19448c

由 Tom Herbert 提交于 5月 23, 2014

RFC 6935 permits zero checksums to be used in IPv6 however this is
recommended only for certain tunnel protocols, it does not make
checksums completely optional like they are in IPv4.

This patch restricts the use of IPv6 zero checksums that was previously
intoduced. no_check6_tx and no_check6_rx have been added to control
the use of checksums in UDP6 RX and TX path. The normal
sk_no_check_{rx,tx} settings are not used (this avoids ambiguity when
dealing with a dual stack socket).

A helper function has been added (udp_set_no_check6) which can be
called by tunnel impelmentations to all zero checksums (send on the
socket, and accept them as valid).
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1c19448c

net: Split sk_no_check into sk_no_check_{rx,tx} · 28448b80

由 Tom Herbert 提交于 5月 23, 2014

Define separate fields in the sock structure for configuring disabling
checksums in both TX and RX-- sk_no_check_tx and sk_no_check_rx.
The SO_NO_CHECK socket option only affects sk_no_check_tx. Also,
removed UDP_CSUM_* defines since they are no longer necessary.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

28448b80

net: Eliminate no_check from protosw · b26ba202

由 Tom Herbert 提交于 5月 23, 2014

It doesn't seem like an protocols are setting anything other
than the default, and allowing to arbitrarily disable checksums
for a whole protocol seems dangerous. This can be done on a per
socket basis.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b26ba202

22 5月, 2014 3 次提交

ipv6: gro: fix CHECKSUM_COMPLETE support · 4de462ab

由 Eric Dumazet 提交于 5月 19, 2014

When GRE support was added in linux-3.14, CHECKSUM_COMPLETE handling
broke on GRE+IPv6 because we did not update/use the appropriate csum :

GRO layer is supposed to use/update NAPI_GRO_CB(skb)->csum instead of
skb->csum

Tested using a GRE tunnel and IPv6 traffic. GRO aggregation now happens
at the first level (ethernet device) instead of being done in gre
tunnel. Native IPv6+TCP is still properly aggregated.

Fixes: bf5a755f ("net-gre-gro: Add GRE support to the GRO stack")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Jerry Chu <hkchu@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4de462ab

ipv6: slight optimization in ip6_dst_gc · 14956643

由 Li RongQing 提交于 5月 19, 2014

entries is always greater than rt_max_size here, since if entries is less
than rt_max_size, the fib6_run_gc function will be skipped
Signed-off-by: NLi RongQing <roy.qing.li@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

14956643

net: tunnels - enable module autoloading · f98f89a0

由 Tom Gundersen 提交于 5月 15, 2014

Enable the module alias hookup to allow tunnel modules to be autoloaded on demand.

This is in line with how most other netdev kinds work, and will allow userspace
to create tunnels without having CAP_SYS_MODULE.
Signed-off-by: NTom Gundersen <teg@jklm.no>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f98f89a0

17 5月, 2014 1 次提交

net: ipv6: make "ip -6 route get mark xyz" work. · 2e47b291

由 Lorenzo Colitti 提交于 5月 15, 2014

Currently, "ip -6 route get mark xyz" ignores the mark passed in
by userspace. Make it honour the mark, just like IPv4 does.
Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2e47b291

16 5月, 2014 2 次提交

ipv6: update Destination Cache entries when gateway turn into host · be7a010d

由 Duan Jiong 提交于 5月 15, 2014

RFC 4861 states in 7.2.5:

	The IsRouter flag in the cache entry MUST be set based on the
         Router flag in the received advertisement.  In those cases
         where the IsRouter flag changes from TRUE to FALSE as a result
         of this update, the node MUST remove that router from the
         Default Router List and update the Destination Cache entries
         for all destinations using that neighbor as a router as
         specified in Section 7.3.3.  This is needed to detect when a
         node that is used as a router stops forwarding packets due to
         being configured as a host.

Currently, when dealing with NA Message which IsRouter flag changes from
TRUE to FALSE, the kernel only removes router from the Default Router List,
and don't update the Destination Cache entries.

Now in order to update those Destination Cache entries, i introduce
function rt6_clean_tohost().
Signed-off-by: NDuan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

be7a010d

vti6: delete unneeded call to netdev_priv · 112a3513

由 Julia Lawall 提交于 5月 15, 2014

Netdev_priv is an accessor function, and has no purpose if its result is
not used.

A simplified version of the semantic match that fixes this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@ local idexpression x; @@
-x = netdev_priv(...);
... when != x
// </smpl>
Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

112a3513

15 5月, 2014 1 次提交

snmp: fix some left over of snmp stats · c9f2dba6

由 WANG Cong 提交于 5月 12, 2014

Fengguang reported the following sparse warning:

>> net/ipv6/proc.c:198:41: sparse: incorrect type in argument 1 (different address spaces)
   net/ipv6/proc.c:198:41:    expected void [noderef] <asn:3>*mib
   net/ipv6/proc.c:198:41:    got void [noderef] <asn:3>**pcpumib

Fixes: commit 698365fa (net: clean up snmp stats code)
Reported-by: NFengguang Wu <fengguang.wu@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c9f2dba6

14 5月, 2014 5 次提交

ipv6: fix calculation of option len in ip6_append_data · 3a1cebe7

由 Hannes Frederic Sowa 提交于 5月 11, 2014

tot_len does specify the size of struct ipv6_txoptions. We need opt_flen +
opt_nflen to calculate the overall length of additional ipv6 extensions.

I found this while auditing the ipv6 output path for a memory corruption
reported by Alexey Preobrazhensky while he fuzzed an instrumented
AddressSanitizer kernel with trinity. This may or may not be the cause
of the original bug.

Fixes: 4df98e76 ("ipv6: pmtudisc setting not respected with UFO/CORK")
Reported-by: NAlexey Preobrazhensky <preobr@google.com>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3a1cebe7

net: support marking accepting TCP sockets · 84f39b08

由 Lorenzo Colitti 提交于 5月 13, 2014

When using mark-based routing, sockets returned from accept()
may need to be marked differently depending on the incoming
connection request.

This is the case, for example, if different socket marks identify
different networks: a listening socket may want to accept
connections from all networks, but each connection should be
marked with the network that the request came in on, so that
subsequent packets are sent on the correct network.

This patch adds a sysctl to mark TCP sockets based on the fwmark
of the incoming SYN packet. If enabled, and an unmarked socket
receives a SYN, then the SYN packet's fwmark is written to the
connection's inet_request_sock, and later written back to the
accepted socket when the connection is established.  If the
socket already has a nonzero mark, then the behaviour is the same
as it is today, i.e., the listening socket's fwmark is used.

Black-box tested using user-mode linux:

- IPv4/IPv6 SYN+ACK, FIN, etc. packets are routed based on the
  mark of the incoming SYN packet.
- The socket returned by accept() is marked with the mark of the
  incoming SYN packet.
- Tested with syncookies=1 and syncookies=2.
Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

84f39b08

net: Use fwmark reflection in PMTU discovery. · 1b3c61dc

由 Lorenzo Colitti 提交于 5月 13, 2014

Currently, routing lookups used for Path PMTU Discovery in
absence of a socket or on unmarked sockets use a mark of 0.
This causes PMTUD not to work when using routing based on
netfilter fwmark mangling and fwmark ip rules, such as:

  iptables -j MARK --set-mark 17
  ip rule add fwmark 17 lookup 100

This patch causes these route lookups to use the fwmark from the
received ICMP error when the fwmark_reflect sysctl is enabled.
This allows the administrator to make PMTUD work by configuring
appropriate fwmark rules to mark the inbound ICMP packets.

Black-box tested using user-mode linux by pointing different
fwmarks at routing tables egressing on different interfaces, and
using iptables mangling to mark packets inbound on each interface
with the interface's fwmark. ICMPv4 and ICMPv6 PMTU discovery
work as expected when mark reflection is enabled and fail when
it is disabled.
Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1b3c61dc

net: add a sysctl to reflect the fwmark on replies · e110861f

由 Lorenzo Colitti 提交于 5月 13, 2014

Kernel-originated IP packets that have no user socket associated
with them (e.g., ICMP errors and echo replies, TCP RSTs, etc.)
are emitted with a mark of zero. Add a sysctl to make them have
the same mark as the packet they are replying to.

This allows an administrator that wishes to do so to use
mark-based routing, firewalling, etc. for these replies by
marking the original packets inbound.

Tested using user-mode linux:
 - ICMP/ICMPv6 echo replies and errors.
 - TCP RST packets (IPv4 and IPv6).
Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e110861f

tcp: IPv6 support for fastopen server · 3a19ce0e

由 Daniel Lee 提交于 5月 11, 2014

After all the preparatory works, supporting IPv6 in Fast Open is now easy.
We pretty much just mirror v4 code. The only difference is how we
generate the Fast Open cookie for IPv6 sockets. Since Fast Open cookie
is 128 bits and we use AES 128, we use CBC-MAC to encrypt both the
source and destination IPv6 addresses since the cookie is a MAC tag.
Signed-off-by: NDaniel Lee <longinus00@gmail.com>
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NJerry Chu <hkchu@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3a19ce0e