1. 15 April 2016, 3 commits
    • GSO: Support partial segmentation offload · 802ab55a
      Committed by Alexander Duyck
      This patch adds support for something I am referring to as GSO partial.
      The basic idea is that we can support a broader range of devices for
      segmentation if we use fixed outer headers and have the hardware deal
      only with segmenting the inner header.  The naming reflects the fact that
      everything before csum_start will be fixed headers, and everything after
      will be the region that is handled by hardware.
      
      With the current implementation it allows us to add support for the
      following GSO types with an inner TSO_MANGLEID or TSO6 offload:
      NETIF_F_GSO_GRE
      NETIF_F_GSO_GRE_CSUM
      NETIF_F_GSO_IPIP
      NETIF_F_GSO_SIT
      NETIF_F_GSO_UDP_TUNNEL
      NETIF_F_GSO_UDP_TUNNEL_CSUM
      
      In the case of hardware that already supports tunneling we may be able to
      extend this further to support TSO_TCPV4 without TSO_MANGLEID if the
      hardware can support updating inner IPv4 headers.
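      
      As a rough illustration (not code from this patch), a driver able to
      segment inner TCP under fixed outer headers might advertise the new
      bits as below; gso_partial_features and NETIF_F_GSO_PARTIAL come from
      this series, while the surrounding driver context is hypothetical:
      
          /* Hypothetical probe fragment: advertise partial GSO for GRE
           * and UDP tunnels.  Only the feature wiring is shown. */
          netdev->gso_partial_features = NETIF_F_GSO_GRE |
                                         NETIF_F_GSO_UDP_TUNNEL;
          netdev->hw_features |= NETIF_F_GSO_PARTIAL;
          netdev->features |= NETIF_F_GSO_PARTIAL |
                              netdev->gso_partial_features;
      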
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      802ab55a
    • GRO: Add support for TCP with fixed IPv4 ID field, limit tunnel IP ID values · 1530545e
      Committed by Alexander Duyck
      This patch does two things.
      
      First it allows GRO to aggregate TCP frames with a fixed IPv4 ID field.  As
      a result we should now be able to aggregate flows that were converted from
      IPv6 to IPv4.  In addition this allows us more flexibility for future
      implementations of segmentation as we may be able to use a fixed IP ID when
      segmenting the flow.
      
      The second thing this does is place limitations on the ID field of the
      outer IPv4 header in the case of tunneled frames.  Specifically it forces
      the IP ID to increment by 1 unless the DF bit is set in the outer IPv4
      header.
      This way we can avoid creating overlapping series of IP IDs that could
      possibly be fragmented if the frame goes through GRO and is then
      resegmented via GSO.
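      
      A simplified sketch of the two rules (illustrative C, not the exact
      patch code; iph points at the incoming header, id2 is the IP ID of the
      first held segment, count the number of segments aggregated so far,
      and flush marks the flow as non-mergeable):
      
          u16 id = ntohs(iph->id);
      
          if (iph->frag_off & htons(IP_DF))
                  /* DF set: a fixed ID is acceptable too. */
                  flush = (id != id2) && (id != (u16)(id2 + count));
          else
                  /* No DF: the ID must increment by 1 per segment. */
                  flush = (id != (u16)(id2 + count));
      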
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1530545e
    • GSO: Add GSO type for fixed IPv4 ID · cbc53e08
      Committed by Alexander Duyck
      This patch adds support for TSO using IPv4 headers with a fixed IP ID
      field.  This is meant to allow us to do a lossless GRO in the case of TCP
      flows that use a fixed IP ID such as those that convert IPv6 headers to IPv4
      headers.
      
      In addition I am adding a feature that for now I am referring to as TSO
      with IP ID mangling.  Basically when this flag is enabled the device has the
      option to either output the flow with incrementing IP IDs or with a fixed
      IP ID regardless of what the original IP ID ordering was.  This is useful
      in cases where the DF bit is set and we do not care if the original IP ID
      value is maintained.
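      
      In segmentation terms the distinction looks roughly like this (sketch;
      SKB_GSO_TCP_FIXEDID is the type flag added here, fixed_id and id are
      illustrative locals):
      
          if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID)
                  iph->id = fixed_id;        /* constant across segments */
          else
                  iph->id = htons(id++);     /* conventional increment */
      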
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cbc53e08
  2. 14 April 2016, 2 commits
  3. 12 April 2016, 1 commit
    • net: ipv4: Consider failed nexthops in multipath routes · a6db4494
      Committed by David Ahern
      Multipath route lookups should consider knowledge about next hops and not
      select a hop that is known to be failed.
      
      Example:
      
                           [h2]                   [h3]   15.0.0.5
                            |                      |
                           3|                     3|
                          [SP1]                  [SP2]--+
                           1  2                   1     2
                           |  |     /-------------+     |
                           |   \   /                    |
                           |     X                      |
                           |    / \                     |
                           |   /   \---------------\    |
                           1  2                     1   2
               12.0.0.2  [TOR1] 3-----------------3 [TOR2] 12.0.0.3
                           4                         4
                            \                       /
                              \                    /
                               \                  /
                                -------|   |-----/
                                       1   2
                                      [TOR3]
                                        3|
                                         |
                                        [h1]  12.0.0.1
      
      host h1 with IP 12.0.0.1 has 2 paths to host h3 at 15.0.0.5:
      
          root@h1:~# ip ro ls
          ...
          12.0.0.0/24 dev swp1  proto kernel  scope link  src 12.0.0.1
          15.0.0.0/16
                  nexthop via 12.0.0.2  dev swp1 weight 1
                  nexthop via 12.0.0.3  dev swp1 weight 1
          ...
      
      If the link between tor3 and tor1 is down and the link between tor1
      and tor2 is down, then tor1 is effectively cut off from h1. Yet the route lookups
      in h1 are alternating between the 2 routes: ping 15.0.0.5 gets one and
      ssh 15.0.0.5 gets the other. Connections that attempt to use the
      12.0.0.2 nexthop fail since that neighbor is not reachable:
      
          root@h1:~# ip neigh show
          ...
          12.0.0.3 dev swp1 lladdr 00:02:00:00:00:1b REACHABLE
          12.0.0.2 dev swp1  FAILED
          ...
      
      The failed path can be avoided by considering known neighbor information
      when selecting next hops. If the neighbor lookup fails we have no
      knowledge about the nexthop, so give it a shot. If there is an entry
      then only select the nexthop if the state is sane. This is similar to
      what fib_detect_death does.
      
      To maintain backward compatibility use of the neighbor information is
      based on a new sysctl, fib_multipath_use_neigh.
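      
      The nexthop test is close in spirit to the following sketch
      (simplified; the real helper also restricts the check to
      RT_SCOPE_LINK nexthops):
      
          static bool fib_good_nh(const struct fib_nh *nh)
          {
                  int state = NUD_REACHABLE;
                  struct neighbour *n;
      
                  rcu_read_lock_bh();
                  n = __ipv4_neigh_lookup_noref(nh->nh_dev,
                                                (__force u32)nh->nh_gw);
                  if (n)
                          state = n->nud_state;
                  rcu_read_unlock_bh();
      
                  return !!(state & NUD_VALID);
          }
      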
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Reviewed-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a6db4494
  4. 08 April 2016, 8 commits
  5. 07 April 2016, 1 commit
  6. 06 April 2016, 2 commits
  7. 05 April 2016, 10 commits
    • tcp: rate limit ACK sent by SYN_RECV request sockets · 4ce7e93c
      Committed by Eric Dumazet
      Attackers like to use SYNFLOOD targeting one 5-tuple, as they
      hit a single RX queue (and cpu) on the victim.
      
      If they use random sequence numbers in their SYN, we detect
      they do not match the expected window and send back an ACK.
      
      This patch adds a rate limitation, so that the effect of such
      attacks is limited to ingress only.
      
      We roughly double our ability to absorb such attacks.
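      
      The gist, as a sketch (tcp_oow_rate_limited() and the MIB counter
      pre-date this patch; the placement in tcp_check_req() is simplified):
      
          /* Only answer an out-of-window segment if the rate limiter
           * allows it; otherwise drop it silently. */
          if (!tcp_oow_rate_limited(sock_net(sk), skb,
                                    LINUX_MIB_TCPACKSKIPPEDSYNRECV,
                                    &tcp_rsk(req)->last_oow_ack_time))
                  req->rsk_ops->send_ack(sk, skb, req);
      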
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Maciej Żenczykowski <maze@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4ce7e93c
    • ipv4: tcp: set SOCK_USE_WRITE_QUEUE for ip_send_unicast_reply() · a9d6532b
      Committed by Eric Dumazet
      TCP uses per-cpu 'sockets' to send some packets:
      - RST packets (tcp_v4_send_reset())
      - ACK packets for SYN_RECV and TIMEWAIT sockets
      
      By setting the SOCK_USE_WRITE_QUEUE flag, we tell sock_wfree()
      to not call sk_write_space() since these internal sockets
      do not care.
      
      This gives a small performance improvement, merely by allowing the
      cpu to properly predict the sock_wfree() conditional branch,
      and avoiding one atomic operation.
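      
      The change itself is tiny; as a sketch, in the per-cpu control
      socket setup:
      
          /* Tell sock_wfree() to skip sk_write_space() for this
           * internal socket. */
          sock_set_flag(sk, SOCK_USE_WRITE_QUEUE);
      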
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a9d6532b
    • tcp: increment sk_drops for listeners · 9caad864
      Committed by Eric Dumazet
      Goal: packets dropped by a listener are accounted for.
      
      This adds a tcp_listendrop() helper, and clears sk_drops in sk_clone_lock()
      so that children do not inherit their parent's drop count.
      
      Note that we no longer increment the LINUX_MIB_LISTENDROPS counter when
      sending a SYNCOOKIE, since the SYN packet generated a SYNACK.
      We already have a separate LINUX_MIB_SYNCOOKIESSENT counter for that.
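      
      The helper is essentially the following (a sketch close to what the
      patch adds; the exact stats macro spelling varies by kernel version):
      
          static inline void tcp_listendrop(const struct sock *sk)
          {
                  atomic_inc(&((struct sock *)sk)->sk_drops);
                  NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
          }
      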
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9caad864
    • tcp: increment sk_drops for dropped rx packets · 532182cd
      Committed by Eric Dumazet
      Now that ss can report sk_drops, we can instruct TCP to increment
      this per-socket counter when it drops an incoming frame, to refine
      monitoring and debugging.
      
      A following patch takes care of listener drops.
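      
      The accounting helper counts GSO segments rather than skbs; a sketch
      close to what the patch adds:
      
          static inline void sk_drops_add(struct sock *sk,
                                          const struct sk_buff *skb)
          {
                  int segs = max_t(u16, 1, skb_shinfo(skb)->gso_segs);
      
                  atomic_add(segs, &sk->sk_drops);
          }
      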
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      532182cd
    • tcp/dccp: do not touch listener sk_refcnt under synflood · 3b24d854
      Committed by Eric Dumazet
      When a SYNFLOOD targets a non-SO_REUSEPORT listener, multiple
      cpus contend on changes to sk->sk_refcnt and sk->sk_wmem_alloc.
      
      By letting listeners use the SOCK_RCU_FREE infrastructure,
      we can relax TCP_LISTEN lookup rules and avoid touching sk_refcnt.
      
      Note that we still use SLAB_DESTROY_BY_RCU rules for other sockets;
      only listeners are impacted by this change.
      
      Peak performance under SYNFLOOD is increased by ~33%:
      
      On my test machine, I could process 3.2 Mpps instead of 2.4 Mpps
      
      The most cycle-consuming functions are now skb_set_owner_w() and
      sock_wfree(), contending on sk->sk_wmem_alloc when cooking SYNACK
      packets and freeing them.
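      
      Conceptually the listener path becomes (illustrative sketch, not the
      patch diff):
      
          /* Listener sockets are now freed only after an RCU grace
           * period ... */
          sock_set_flag(sk, SOCK_RCU_FREE);
      
          /* ... so a lookup may use the listener without touching
           * sk_refcnt, as long as it stays inside an RCU section. */
          rcu_read_lock();
          sk = __inet_lookup_listener(net, hashinfo, skb, doff,
                                      saddr, sport, daddr, dport, dif);
          /* use sk; no sock_hold()/sock_put() pair for TCP_LISTEN */
          rcu_read_unlock();
      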
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3b24d854
    • tcp/dccp: use rcu locking in inet_diag_find_one_icsk() · 2d331915
      Committed by Eric Dumazet
      RX packet processing holds rcu_read_lock(), so we can remove
      pairs of rcu_read_lock()/rcu_read_unlock() in lookup functions
      if inet_diag also holds the RCU read lock before calling them.
      
      This is needed anyway as __inet_lookup_listener() and
      inet6_lookup_listener() will soon no longer increment the
      refcount on the found listener.
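      
      A sketch of the adjusted caller (simplified from
      inet_diag_find_one_icsk()):
      
          rcu_read_lock();
          sk = inet_lookup(net, hashinfo, NULL, 0,
                           req->id.idiag_dst[0], req->id.idiag_dport,
                           req->id.idiag_src[0], req->id.idiag_sport,
                           req->id.idiag_if);
          rcu_read_unlock();
      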
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2d331915
    • udp: no longer use SLAB_DESTROY_BY_RCU · ca065d0c
      Committed by Eric Dumazet
      Tom Herbert would like to avoid touching the UDP socket refcnt for
      encapsulated traffic. For this to happen, we need to use normal RCU
      rules, with a grace period before freeing a socket. UDP sockets are not
      short-lived in the high-usage case, so the added cost of call_rcu()
      should not be a concern.
      
      This actually removes a lot of complexity in the UDP stack.
      
      Multicast receives no longer need to hold a bucket spinlock.
      
      Note that ip early demux still needs to take a reference on the socket.
      
      The same remark applies to functions used by the xt_socket and
      xt_TPROXY netfilter modules, but this might be changed later.
      
      Performance for a single UDP socket receiving flood traffic from
      many RX queues/cpus:
      
      A simple udp_rx test using a plain recvfrom() loop achieves 438 kpps
      instead of 374 kpps, a 17% increase in the peak rate.
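      
      For reference, a minimal receive loop of the kind the benchmark
      describes (userspace sketch; the port number is arbitrary and the
      actual test harness is not part of this commit):
      
          #include <arpa/inet.h>
          #include <netinet/in.h>
          #include <string.h>
          #include <sys/socket.h>
      
          int main(void)
          {
                  char buf[2048];
                  struct sockaddr_in addr;
                  int fd = socket(AF_INET, SOCK_DGRAM, 0);
      
                  memset(&addr, 0, sizeof(addr));
                  addr.sin_family = AF_INET;
                  addr.sin_port = htons(9000);    /* arbitrary test port */
                  addr.sin_addr.s_addr = htonl(INADDR_ANY);
                  bind(fd, (struct sockaddr *)&addr, sizeof(addr));
      
                  for (;;)        /* packet rate is measured externally */
                          recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
          }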
      
      v2: Addressed Willem de Bruijn's feedback in multicast handling
       - keep early demux break in __udp4_lib_demux_lookup()
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <tom@herbertland.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Tested-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ca065d0c
    • sock: enable timestamping using control messages · c14ac945
      Committed by Soheil Hassas Yeganeh
      Currently, SO_TIMESTAMPING can only be enabled using setsockopt.
      This is very costly when users want to sample writes to gather
      tx timestamps.
      
      Add support for enabling SO_TIMESTAMPING via control messages by
      using tsflags added in `struct sockcm_cookie` (added in the previous
      patches in this series) to set the tx_flags of the last skb created in
      a sendmsg. With this patch, the timestamp recording bits in tx_flags
      of the skbuff are overridden if SO_TIMESTAMPING is passed in a cmsg.
      
      Please note that this is only effective for overriding the timestamp
      recording flags. Users should enable timestamp reporting (e.g.,
      SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_OPT_ID) using
      socket options and then should ask for SOF_TIMESTAMPING_TX_*
      using control messages per sendmsg to sample timestamps for each
      write.
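      
      For example (userspace sketch; fd is a connected socket and error
      handling is omitted), a single sampled write could look like:
      
          #include <stdint.h>
          #include <string.h>
          #include <sys/socket.h>
          #include <linux/net_tstamp.h>   /* SOF_TIMESTAMPING_* */
      
          ssize_t send_sampled(int fd, void *buf, size_t len)
          {
                  char control[CMSG_SPACE(sizeof(uint32_t))];
                  struct iovec iov = { .iov_base = buf, .iov_len = len };
                  struct msghdr msg = {
                          .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = control,
                          .msg_controllen = sizeof(control),
                  };
                  struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
      
                  memset(control, 0, sizeof(control));
                  cmsg->cmsg_level = SOL_SOCKET;
                  cmsg->cmsg_type = SO_TIMESTAMPING;
                  cmsg->cmsg_len = CMSG_LEN(sizeof(uint32_t));
                  *(uint32_t *)CMSG_DATA(cmsg) = SOF_TIMESTAMPING_TX_ACK;
      
                  return sendmsg(fd, &msg, 0);
          }
      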
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c14ac945
    • ipv4: process socket-level control messages in IPv4 · 24025c46
      Committed by Soheil Hassas Yeganeh
      Process socket-level control messages by invoking
      __sock_cmsg_send in ip_cmsg_send for control messages on
      the SOL_SOCKET layer.
      
      This makes sure whenever ip_cmsg_send is called in udp, icmp,
      and raw, we also process socket-level control messages.
      
      Note that this commit interprets new control messages that
      were ignored before. Aside from that, it does not change
      the behavior of existing IPv4-level control messages.
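      
      The hook itself is short; a sketch simplified from the patch, inside
      the cmsg loop of ip_cmsg_send():
      
          if (cmsg->cmsg_level == SOL_SOCKET) {
                  err = __sock_cmsg_send(sk, msg, cmsg, &ipc->sockc);
                  if (err)
                          return err;
                  continue;
          }
      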
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      24025c46
    • tcp: use one bit in TCP_SKB_CB to mark ACK timestamps · 6b084928
      Committed by Soheil Hassas Yeganeh
      Currently, to avoid a cache line miss for accessing skb_shinfo,
      tcp_ack_tstamp skips sockets that do not have the
      SOF_TIMESTAMPING_TX_ACK bit set in sk_tsflags. This is
      implemented based on an implicit assumption that the
      SOF_TIMESTAMPING_TX_ACK is set via socket options for the
      duration that ACK timestamps are needed.
      
      To implement per-write timestamps, this check should be
      removed and replaced with a per-packet alternative that
      quickly skips packets missing ACK timestamp marks without
      a cache-line miss.
      
      To enable per-packet marking without a cache-line miss, use
      one bit in TCP_SKB_CB to mark whether an SKB might need an
      ACK tx timestamp or not. Further checks in tcp_ack_tstamp are not
      modified and work as before.
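      
      The marking happens once at send time; roughly (sketch, with the
      bit name as added by this series):
      
          /* TCP transmit path: remember in the scratch control block
           * that this skb wants a timestamp when it is fully ACKed. */
          if (tsflags & SOF_TIMESTAMPING_TX_ACK)
                  TCP_SKB_CB(skb)->txstamp_ack = 1;
      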
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6b084928
  8. 03 April 2016, 2 commits
    • netlink: use nla_get_in_addr and nla_put_in_addr for ipv4 address · 7822ce73
      Committed by Haishuang Yan
      Since nla_get_in_addr and nla_put_in_addr are available,
      use them where appropriate.
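      
      The conversion pattern, with an illustrative attribute:
      
          /* Before: raw 32-bit accessors. */
          __be32 addr = nla_get_be32(tb[RTA_GATEWAY]);
          nla_put_be32(skb, RTA_GATEWAY, addr);
      
          /* After: intent-revealing IPv4 address helpers. */
          addr = nla_get_in_addr(tb[RTA_GATEWAY]);
          nla_put_in_addr(skb, RTA_GATEWAY, addr);
      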
      Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7822ce73
    • tcp: remove cwnd moderation after recovery · 23492623
      Committed by Yuchung Cheng
      For non-SACK connections, cwnd is lowered to inflight plus 3 packets
      when the recovery ends. This is an optional feature in the NewReno
      RFC 2582 to reduce the potential burst when cwnd is "re-opened"
      after recovery and inflight is low.
      
      This feature is questionably effective because of PRR: when
      the recovery ends (i.e., snd_una == high_seq) NewReno holds the
      CA_Recovery state for another round trip to prevent false fast
      retransmits. But if the inflight is low, PRR will later overwrite the
      moderated cwnd in tcp_cwnd_reduction() regardless. So if a
      receiver responds with bogus ACKs (i.e., acking future data) to speed up
      transfer after recovery, it can only induce a burst of up to a window's
      worth of data packets by acking up to SND.NXT. A restart from (short)
      idle or receiving stretched ACKs can both cause such bursts as well.
      
      On the other hand, if the recovery ends because the sender
      detects that the losses were spurious (e.g., due to reordering),
      this feature unconditionally lowers a reverted cwnd even though
      nothing was lost.
      
      In principle the loss recovery module should not update cwnd.
      Furthermore, pacing is much more effective at reducing bursts.
      Hence this patch removes the cwnd moderation feature.
      
      v2 changes: revised the commit message on bogus ACKs and bursts, and
                  added a missing signature
      Signed-off-by: Matt Mathis <mattmathis@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      23492623
  9. 31 March 2016, 1 commit
    • gro: Allow tunnel stacking in the case of FOU/GUE · c3483384
      Committed by Alexander Duyck
      This patch should fix the issues seen with a recent fix to prevent
      tunnel-in-tunnel frames from being generated with GRO.  The fix itself is
      correct for now as long as we do not add any devices that support
      NETIF_F_GSO_GRE_CSUM.  When such a device is added it could potentially
      mess things up, because the outer transport header would point to the
      outer UDP header and not to the GRE header as expected.
      
      Fixes: fac8e0f5 ("tunnels: Don't apply GRO to multiple layers of encapsulation.")
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c3483384
  10. 28 March 2016, 6 commits
  11. 24 March 2016, 1 commit
  12. 23 March 2016, 2 commits
  13. 22 March 2016, 1 commit
    • net: ipv4: Fix truncated timestamp returned by inet_current_timestamp() · 3ba9d300
      Committed by Deepa Dinamani
      The millisecond timestamp returned by the function is
      converted to network byte order by making a call to htons().
      htons() only returns a __be16, while a __be32 is required here.
      
      This was identified by the sparse warning from the buildbot:
      net/ipv4/af_inet.c:1405:16: sparse: incorrect type in return
      			    expression (different base types)
      net/ipv4/af_inet.c:1405:16: expected restricted __be32
      net/ipv4/af_inet.c:1405:16: got restricted __be16 [usertype] <noident>
      
      Change the function to use htonl() to return the correct __be32 type
      instead so that the millisecond value doesn't get truncated.
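      
      In essence the fixed function becomes (simplified sketch of
      inet_current_timestamp(); the real code uses div_u64_rem()):
      
          __be32 inet_current_timestamp(void)
          {
                  struct timespec64 ts;
                  u32 msecs;
      
                  ktime_get_real_ts64(&ts);
                  /* Milliseconds since midnight UTC, as the ICMP
                   * timestamp format requires. */
                  msecs = (ts.tv_sec % 86400) * MSEC_PER_SEC;
                  msecs += (u32)ts.tv_nsec / NSEC_PER_MSEC;
                  /* htons() here truncated the value to 16 bits. */
                  return htonl(msecs);
          }
      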
      Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: James Morris <jmorris@namei.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Fixes: 822c8685 ("net: ipv4: Convert IP network timestamps to be y2038 safe")
      Reported-by: Fengguang Wu <fengguang.wu@intel.com> [0-day test robot]
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3ba9d300