Commits · 14ca0751c96f8d3d0f52e8ed3b3236f8b34d3460 · openeuler / raspberrypi-kernel

08 Mar, 2016 2 commits

tcp: fix tcpi_segs_in after connection establishment · a9d99ce2

Authored by Eric Dumazet on Mar 06, 2016

If the final packet (ACK) of the 3WHS is lost, it appears we do not properly
account for the following incoming segment in tcpi_segs_in.

While we are at it, start segs_in at one, to count the SYN packet.

We do not yet count the number of SYNs received for a request sock; we
might add this someday.

packetdrill script showing proper behavior after the fix:

// Tests tcpi_segs_in when 3rd packet (ACK) of 3WHS is lost
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
   +0 bind(3, ..., ...) = 0
   +0 listen(3, 1) = 0

   +0 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop>
   +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK>
+.020 < P. 1:1001(1000) ack 1 win 32792

   +0 accept(3, ..., ...) = 4

+.000 %{ assert tcpi_segs_in == 2, 'tcpi_segs_in=%d' % tcpi_segs_in }%

Fixes: 2efd055c ("tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

arp: correct return value of arp_rcv · 8dfd329f

Authored by Zhang Shengju on Mar 04, 2016

Currently, arp_rcv() always returns zero on a packet delivery upcall.

To make its behavior more compliant with the way this API should be
used, this patch changes it to return NET_RX_SUCCESS when the
packet is properly handled, and NET_RX_DROP otherwise.

v1->v2:
If the sanity check fails, call kfree_skb() instead of consume_skb(), then
return the correct return value.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

04 Mar, 2016 1 commit

mld, igmp: Fix reserved tailroom calculation · 1837b2e2

Authored by Benjamin Poirier on Feb 29, 2016

The current reserved_tailroom calculation fails to take hlen and tlen into
account.

skb:
[__hlen__|__data____________|__tlen___|__extra__]
^                                               ^
head                                            skb_end_offset

In this representation, hlen + data + tlen is the size passed to alloc_skb.
"extra" is the extra space made available in __alloc_skb because of
rounding up by kmalloc. We can reorder the representation like so:

[__hlen__|__data____________|__extra__|__tlen___]
^                                               ^
head                                            skb_end_offset

The maximum space available for ip headers and payload without
fragmentation is min(mtu, data + extra). Therefore,
reserved_tailroom
= data + extra + tlen - min(mtu, data + extra)
= skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen)
= skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen)

Compare the second line to the current expression:
reserved_tailroom = skb_end_offset - min(mtu, skb_end_offset)
and we can see that hlen and tlen are not taken into account.

The min() in the third line can be expanded into:
if mtu < skb_tailroom - tlen:
	reserved_tailroom = skb_tailroom - mtu
else:
	reserved_tailroom = tlen

Depending on hlen, tlen, mtu and the number of multicast address records,
the current code may output skbs that have less tailroom than
dev->needed_tailroom or it may output more skbs than needed because not all
space available is used.
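
The derivation above can be checked numerically. The following is a minimal Python sketch, not the kernel code: the function names and the sample sizes used in the note below are assumptions chosen for illustration only.

```python
def reserved_tailroom(skb_tailroom: int, mtu: int, tlen: int) -> int:
    """Third line of the derivation, valid after skb_reserve(hlen)."""
    return skb_tailroom - min(mtu, skb_tailroom - tlen)


def reserved_tailroom_expanded(skb_tailroom: int, mtu: int, tlen: int) -> int:
    """The same expression with the min() expanded into a branch."""
    if mtu < skb_tailroom - tlen:
        return skb_tailroom - mtu
    return tlen


def old_reserved_tailroom(skb_end_offset: int, mtu: int) -> int:
    """The expression used before the fix; it ignores hlen and tlen."""
    return skb_end_offset - min(mtu, skb_end_offset)
```

With the assumed sizes hlen=16, data=1400, tlen=8 and extra=112, skb_end_offset is 1536 and skb_tailroom after skb_reserve(hlen) is 1520. For mtu=9000 the old expression reserves 0 bytes, which is less than tlen, while the corrected one reserves exactly tlen=8. For mtu=1500 the old expression reserves 36 bytes where 20 are enough.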

Fixes: 4c672e4b ("ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

03 Mar, 2016 1 commit

net/ipv4: remove left over dead code · a9d56235

Authored by Eric Engestrom on Feb 29, 2016

8cc785f6 ("net: ipv4: make the ping /proc code AF-independent") removed
the code using it, but renamed this variable instead of removing it.

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

02 Mar, 2016 3 commits

net: remove skb_sender_cpu_clear() · 64d4e343

Authored by WANG Cong on Feb 27, 2016

After commit 52bd2d62 ("net: better skb->sender_cpu and skb->napi_id cohabitation")
skb_sender_cpu_clear() becomes empty and can be removed.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv4: tcp_probe: Replace timespec with timespec64 · b1b270d8

Authored by Deepa Dinamani on Feb 27, 2016

TCP probe log timestamps use struct timespec which is
not y2038 safe. Even though timespec might be good enough here
as it is used to represent delta time, the plan is to get rid
of all uses of timespec in the kernel.
Replace with struct timespec64 which is y2038 safe.

Prints still use unsigned long format and type.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv4: Convert IP network timestamps to be y2038 safe · 822c8685

Authored by Deepa Dinamani on Feb 27, 2016

ICMP timestamp messages and IP source route options require
timestamps to be in milliseconds modulo 24 hours from
midnight UT format.

Add inet_current_timestamp() function to support this. The function
returns the required timestamp in network byte order.

Timestamp calculation is also changed to call ktime_get_real_ts64()
which uses struct timespec64. struct timespec64 is y2038 safe.
Previously it called getnstimeofday() which uses struct timespec.
struct timespec is not y2038 safe.
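
The timestamp format described above can be illustrated in user space. This is a hedged Python sketch of the arithmetic only, not the kernel's inet_current_timestamp(); the helper names are hypothetical.

```python
import struct
import time

SECS_PER_DAY = 24 * 60 * 60
MSEC_PER_DAY = SECS_PER_DAY * 1000


def ms_since_midnight_ut(seconds: int, nanoseconds: int) -> int:
    """Milliseconds modulo 24 hours from midnight UT for a (sec, nsec) pair."""
    secs_today = seconds % SECS_PER_DAY
    return (secs_today * 1000 + nanoseconds // 1_000_000) % MSEC_PER_DAY


def pack_network_order(msecs: int) -> bytes:
    """Encode the value as a 32-bit unsigned integer in network byte order."""
    return struct.pack("!I", msecs)


def current_timestamp_bytes() -> bytes:
    """Timestamp for 'now', taken from the wall clock (assumed to be UTC based)."""
    now_ns = time.time_ns()
    return pack_network_order(ms_since_midnight_ut(now_ns // 10**9, now_ns % 10**9))
```

Because Python integers do not overflow, a seconds value past 2**31 is handled the same way as any other, which is the property a 64-bit seconds field is meant to provide.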

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: James Morris <jmorris@namei.org>
Cc: Patrick McHardy <kaber@trash.net>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

27 Feb, 2016 3 commits

GSO: Provide software checksum of tunneled UDP fragmentation offload · 22463876

Authored by Alexander Duyck on Feb 24, 2016

On reviewing the code I realized that GRE and UDP tunnels could cause a
kernel panic if we used GSO to segment a large UDP frame that was sent
through the tunnel with an outer checksum and hardware offloads were not
available.

In order to correct this we need to update the feature flags that are
passed to the skb_segment function so that in the event of UDP
fragmentation being requested for the inner header the segmentation
function will correctly generate the checksum for the payload if we cannot
segment the outer header.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: l3mdev: prefer VRF master for source address selection · 17b693cd

Authored by David Lamparter on Feb 24, 2016

When selecting an address in context of a VRF, the vrf master should be
preferred for address selection.  If it isn't, the user has a hard time
getting the system to select to their preference - the code will pick
the address off the first in-VRF interface it can find, which on a
router could well be a non-routable address.

Signed-off-by: David Lamparter <equinox@diac24.net>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
[dsa: Fixed comment style and removed extra blank line]
Signed-off-by: David S. Miller <davem@davemloft.net>

net: l3mdev: address selection should only consider devices in L3 domain · 3f2fb9a8

Authored by David Ahern on Feb 24, 2016

David Lamparter noted a use case where the source address selection fails
to pick an address from a VRF interface - unnumbered interfaces.

Relevant commands from his script:
    ip addr add 9.9.9.9/32 dev lo
    ip link set lo up

    ip link add name vrf0 type vrf table 101
    ip rule add oif vrf0 table 101
    ip rule add iif vrf0 table 101
    ip link set vrf0 up
    ip addr add 10.0.0.3/32 dev vrf0

    ip link add name dummy2 type dummy
    ip link set dummy2 master vrf0 up

    --> note dummy2 has no address - unnumbered device

    ip route add 10.2.2.2/32 dev dummy2 table 101
    ip neigh add 10.2.2.2 dev dummy2 lladdr 02:00:00:00:00:02

    tcpdump -ni dummy2 &

And using ping instead of his socat example:
    $ ping -I vrf0 -c1 10.2.2.2
    ping: Warning: source address might be selected on device other than vrf0.
    PING 10.2.2.2 (10.2.2.2) from 9.9.9.9 vrf0: 56(84) bytes of data.

From tcpdump:
    12:57:29.449128 IP 9.9.9.9 > 10.2.2.2: ICMP echo request, id 2491, seq 1, length 64

Note the source address is from lo and is not a VRF local address. With
this patch:

    $ ping -I vrf0 -c1 10.2.2.2
    PING 10.2.2.2 (10.2.2.2) from 10.0.0.3 vrf0: 56(84) bytes of data.

From tcpdump:
    12:59:25.096426 IP 10.0.0.3 > 10.2.2.2: ICMP echo request, id 2113, seq 1, length 64

Now the source address comes from vrf0.

The ipv4 function for selecting source address takes a const argument.
Removing the const requires touching a lot of places, so instead
l3mdev_master_ifindex_rcu is changed to take a const argument and then
do the typecast to non-const as required by netdev_master_upper_dev_get_rcu.
This is similar to what l3mdev_fib_table_rcu does.

IPv6 for unnumbered interfaces appears to be selecting the addresses
properly.

Cc: David Lamparter <david@opensourcerouting.org>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

25 Feb, 2016 2 commits

ipv4: only create late gso-skb if skb is already set up with CHECKSUM_PARTIAL · a8c4a252

Authored by Hannes Frederic Sowa on Feb 22, 2016

Otherwise we break the contract with GSO to only pass CHECKSUM_PARTIAL
skbs down. This can easily happen with UDP+IPv4 sockets with the first
MSG_MORE write smaller than the MTU, second write is a sendfile.

Returning -EOPNOTSUPP lets the callers fall back into the normal sendmsg path,
where we calculate the checksum manually during copying.

Commit d749c9cb ("ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked
sockets") started to expose this bug.

Fixes: d749c9cb ("ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked sockets")
Reported-by: Jiri Benc <jbenc@redhat.com>
Cc: Jiri Benc <jbenc@redhat.com>
Reported-by: Wakko Warner <wakko@animx.eu.org>
Cc: Wakko Warner <wakko@animx.eu.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

soreuseport: fix merge conflict in tcp bind · e5fbfc1c

Authored by Craig Gallek on Feb 22, 2016

One of the validation checks for the new array-based TCP SO_REUSEPORT
validation was unintentionally dropped in ea8add2b. This adds it back.

Lack of this check allows the user to allocate multiple sock_reuseport
structures (leaking all but the first).

Fixes: ea8add2b ("tcp/dccp: better use of ephemeral ports in bind()")
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24 Feb, 2016 2 commits

tunnel: Clear IPCB(skb)->opt before dst_link_failure called · 5146d1f1

Authored by Bernie Harris on Feb 22, 2016

IPCB may contain data from previous layers (in the observed case the
qdisc layer). In the observed scenario, the data was misinterpreted as
ip header options, which later caused the ihl to be set to an invalid
value (<5). This resulted in an infinite loop in the mips implementation
of ip_fast_csum.

This patch clears IPCB(skb)->opt before dst_link_failure can be called for
various types of tunnels. This change only applies to encapsulated ipv4
packets.

The code introduced in 11c21a30 which clears all of IPCB has been removed
to be consistent with these changes, and instead the opt field is cleared
unconditionally in ip_tunnel_xmit. The change in ip_tunnel_xmit applies to
SIT, GRE, and IPIP tunnels.

The relevant vti, l2tp, and pptp functions already contain similar code for
clearing the IPCB.

Signed-off-by: Bernie Harris <bernie.harris@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: convert cached rtt from usec to jiffies when feeding initial rto · 9bdfb3b7

Authored by Konstantin Khlebnikov on Feb 21, 2016

Currently it is converted into msecs, so only HZ=1000 is unaffected.
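
The unit mismatch can be shown with a small calculation. This is a hedged Python sketch of the conversion only; the function names are hypothetical, and any other scaling the kernel applies to the cached RTT is deliberately left out.

```python
USEC_PER_SEC = 1_000_000
USEC_PER_MSEC = 1_000


def cached_rtt_to_msecs(rtt_us: int) -> int:
    """Convert microseconds to milliseconds (the behavior described as current)."""
    return rtt_us // USEC_PER_MSEC


def cached_rtt_to_jiffies(rtt_us: int, hz: int) -> int:
    """Convert microseconds to jiffies for a given tick rate HZ."""
    return rtt_us // (USEC_PER_SEC // hz)
```

For a cached RTT of 200000 usec, both functions give 200 when hz is 1000, so treating milliseconds as jiffies is harmless there. With hz set to 250 the correct value is 50 jiffies, so a millisecond count used as jiffies would be four times too large.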

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: 740b0f18 ("tcp: switch rtt estimations to usec resolution")
Signed-off-by: David S. Miller <davem@davemloft.net>

20 Feb, 2016 1 commit

rtnl: RTM_GETNETCONF: fix wrong return value · a97eb33f

Authored by Anton Protopopov on Feb 16, 2016

An error response from a RTM_GETNETCONF request can return the positive
error value EINVAL in the struct nlmsgerr, which can mislead userspace.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

19 Feb, 2016 4 commits

gre: clear IFF_TX_SKB_SHARING · d13b161c

Authored by Jiri Benc on Feb 17, 2016

ether_setup sets IFF_TX_SKB_SHARING but this is not supported by gre
as it modifies the skb on xmit.

Also, clean up whitespace in ipgre_tap_setup when we're already touching it.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

iptunnel: scrub packet in iptunnel_pull_header · 7f290c94

Authored by Jiri Benc on Feb 18, 2016

Part of skb_scrub_packet was open coded in iptunnel_pull_header. Let it call
skb_scrub_packet directly instead.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp/dccp: fix another race at listener dismantle · 7716682c

Authored by Eric Dumazet on Feb 18, 2016

Ilya reported following lockdep splat:

kernel: =========================
kernel: [ BUG: held lock freed! ]
kernel: 4.5.0-rc1-ceph-00026-g5e0a311 #1 Not tainted
kernel: -------------------------
kernel: swapper/5/0 is freeing memory
ffff880035c9d200-ffff880035c9dbff, with a lock still held there!
kernel: (&(&queue->rskq_lock)->rlock){+.-...}, at:
[<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0
kernel: 4 locks held by swapper/5/0:
kernel: #0:  (rcu_read_lock){......}, at: [<ffffffff8169ef6b>]
netif_receive_skb_internal+0x4b/0x1f0
kernel: #1:  (rcu_read_lock){......}, at: [<ffffffff816e977f>]
ip_local_deliver_finish+0x3f/0x380
kernel: #2:  (slock-AF_INET){+.-...}, at: [<ffffffff81685ffb>]
sk_clone_lock+0x19b/0x440
kernel: #3:  (&(&queue->rskq_lock)->rlock){+.-...}, at:
[<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0

To properly fix this issue, inet_csk_reqsk_queue_add() needs
to return to its callers if the child has been queued
into accept queue.

We also need to make sure listener is still there before
calling sk->sk_data_ready(), by holding a reference on it,
since the reference carried by the child can disappear as
soon as the child is put on accept queue.

Reported-by: Ilya Dryomov <idryomov@gmail.com>
Fixes: ebb516af ("tcp/dccp: fix race at listener dismantle phase")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

route: check and remove route cache when we get route · deed49df

Authored by Xin Long on Feb 18, 2016

Since the gc of IPv4 routes was removed, a cached route has no chance
of being removed. Even after it has timed out it can still be used,
because no code checks its expiry.

Fix this issue by checking and removing the route cache when we get a route.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

18 Feb, 2016 2 commits

tcp: correctly crypto_alloc_hash return check · 1eea84b7

Authored by Insu Yun on Feb 15, 2016

crypto_alloc_hash never returns NULL.

Signed-off-by: Insu Yun <wuninsu@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Remove inet_lro library · 7bbf3cae

Authored by Ben Hutchings on Feb 15, 2016

There are no longer any in-tree drivers that use it.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

17 Feb, 2016 11 commits

net: Export ip fragment sysctl to unprivileged users · 52a773d6

Authored by Nikolay Borisov on Feb 15, 2016

Now that all the ip fragmentation related sysctls are namespacified
there is no reason to hide them anymore from "root" users inside
containers.

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: namespacify ip fragment max dist sysctl knob · 0fbf4cb2

Authored by Nikolay Borisov on Feb 15, 2016

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: namespacify ip_early_demux sysctl knob · e21145a9

Authored by Nikolay Borisov on Feb 15, 2016

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Namespacify ip_dynaddr sysctl knob · 287b7f38

Authored by Nikolay Borisov on Feb 15, 2016

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igmp: net: Move igmp namespace init to correct file · dcd87999

Authored by Nikolay Borisov on Feb 15, 2016

When igmp related sysctls were namespacified, their initialization was
erroneously put into the tcp socket namespace constructor. This
patch moves the relevant code into the igmp namespace constructor to
keep things consistent.

Also sprinkle some #ifdefs to silence warnings.

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Namespaceify ip_default_ttl sysctl knob · fa50d974

Authored by Nikolay Borisov on Feb 15, 2016

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: add tcpi_min_rtt and tcpi_notsent_bytes to tcp_info · cd9b2660

Authored by Eric Dumazet on Feb 11, 2016

tcpi_min_rtt reports the minimal rtt observed by TCP stack for the flow,
in usec unit. Might be ~0U if not yet known.

tcpi_notsent_bytes reports the amount of bytes in the write queue that
were not yet sent.

This is done in a single patch to not add a temporary 32bit padding hole
in tcp_info.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: md5: release request socket instead of listener · 72923555

Authored by Eric Dumazet on Feb 11, 2016

If tcp_v4_inbound_md5_hash() returns an error, we must release
the refcount on the request socket, not on the listener.

The bug was added for IPv4 only.

Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv4: add dst cache support for gre lwtunnels · 3c1cb4d2

Authored by Paolo Abeni on Feb 12, 2016

In case of UDP traffic with datagram length below MTU this
gives about a 4% performance increase.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Suggested-and-Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip_tunnel: replace dst_cache with generic implementation · e09acddf

Authored by Paolo Abeni on Feb 12, 2016

The current ip_tunnel cache implementation is prone to a race
that will cause the wrong dst to be cached on concurrent dst cache
miss and ip tunnel update via netlink.

Replacing it with the generic implementation fixes the issue.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Suggested-and-acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: do not set rtt_min to 1 · 37202283

Authored by Eric Dumazet on Feb 11, 2016

There are some cases where rtt_us derives from deltas of jiffies,
instead of using usec timestamps.

Since we want to track minimal rtt, better to assume a delta of 0 jiffies
might in fact be very close to 1 jiffy.

It is kind of sad jiffies_to_usecs(1) calls a function instead of simply
using a constant.

Fixes: f6722583 ("tcp: track min RTT using windowed min-filter")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

13 Feb, 2016 1 commit

ipv4: fix memory leaks in ip_cmsg_send() callers · 91948309

Authored by Eric Dumazet on Feb 04, 2016

Dmitry reported memory leaks of IP options allocated in
ip_cmsg_send() when/if this function returns an error.

Callers are responsible for the freeing.

Many thanks to Dmitry for the report and diagnostic.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 Feb, 2016 7 commits

net: ip_tunnel: remove 'csum_help' argument to iptunnel_handle_offloads · 6fa79666

Authored by Edward Cree on Feb 11, 2016

All users now pass false, so we can remove it, and remove the code that
was conditional upon it.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: gre: Implement LCO for GRE over IPv4 · 53936107

Authored by Edward Cree on Feb 11, 2016

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fou: enable LCO in FOU and GUE · 06f62292

Authored by Edward Cree on Feb 11, 2016

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: udp: always set up for CHECKSUM_PARTIAL offload · d75f1306

Authored by Edward Cree on Feb 11, 2016

If the dst device doesn't support it, it'll get fixed up later anyway
by validate_xmit_skb(). Also, this allows us to take advantage of LCO
to avoid summing the payload multiple times.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: local checksum offload for encapsulation · 179bc67f

Authored by Edward Cree on Feb 11, 2016

The arithmetic properties of the ones-complement checksum mean that a
correctly checksummed inner packet, including its checksum, has a ones
complement sum depending only on whatever value was used to initialise
the checksum field before checksumming (in the case of TCP and UDP,
this is the ones complement sum of the pseudo header, complemented).
Consequently, if we are going to offload the inner checksum with
CHECKSUM_PARTIAL, we can compute the outer checksum based only on the
packet data not covered by the inner checksum, and the initial value of
the inner checksum field.
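
The arithmetic property can be demonstrated outside the kernel. The following Python sketch is an illustration under stated assumptions, not the kernel's implementation: the helper names, the 8-byte UDP-like inner header layout and the sample bytes are all hypothetical, and the outer pseudo header is omitted for brevity.

```python
def ones_complement_sum(data: bytes, initial: int = 0) -> int:
    """16-bit ones-complement sum of data, folded, starting from initial."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = initial
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def build_inner(payload: bytes, initial: int) -> bytearray:
    """Hypothetical inner packet: ports, length, checksum field at offset 6."""
    header = bytes([0x30, 0x39, 0x00, 0x35])
    header += (8 + len(payload)).to_bytes(2, "big")
    header += initial.to_bytes(2, "big")  # checksum field holds the initial value
    return bytearray(header + payload)


def finalize_inner_checksum(packet: bytearray, csum_offset: int) -> None:
    """Stand-in for the later offload step: sum the packet, store the complement."""
    csum = ~ones_complement_sum(bytes(packet)) & 0xFFFF
    packet[csum_offset:csum_offset + 2] = csum.to_bytes(2, "big")


def lco_outer_checksum(outer_header: bytes, inner_initial: int) -> int:
    """Outer checksum from the outer bytes and the inner initial value only."""
    partial = ones_complement_sum(outer_header, ~inner_initial & 0xFFFF)
    return ~partial & 0xFFFF
```

With an initial value of 0x1234, any payload yields a finalized inner packet whose ones-complement sum is the complement of 0x1234. That is why lco_outer_checksum() never needs to read the payload, provided the outer header has an even length so that 16-bit word alignment is preserved.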

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp/dccp: better use of ephemeral ports in bind() · ea8add2b

Authored by Eric Dumazet on Feb 11, 2016

Implement strategy used in __inet_hash_connect() in opposite way :

Try to find a candidate using odd ports, then fallback to even ports.

We no longer disable BH for whole traversal, but one bucket at a time.
We also use cond_resched() to yield cpu to other tasks if needed.

I removed one indentation level and tried to mirror the loop we have
in __inet_hash_connect() and variable names to ease code maintenance.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp/dccp: better use of ephemeral ports in connect() · 1580ab63

Authored by Eric Dumazet on Feb 11, 2016

In commit 07f4c900 ("tcp/dccp: try to not exhaust ip_local_port_range
in connect()"), I added a very simple heuristic, so that we got better
chances to use even ports, and allow bind() users to have more available
slots.

It gave nice results, but with more than 200,000 TCP sessions on a typical
server, the ~30,000 ephemeral ports are still a rare resource.

I chose to go a step further, by looking at all even ports, and if none
was available, fallback to odd ports.

The companion patch does the same in bind(), but in opposite way.

I've seen exec times of up to 30ms on busy servers, so I no longer
disable BH for the whole traversal, but only for each hash bucket.
I also call cond_resched() to be gentle to other tasks.
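
The parity strategy can be sketched as follows. This is a simplified Python model, not the kernel loop: pick_port, its parameters and the in_use callback are hypothetical, and per-bucket locking and rescheduling are not modeled.

```python
from typing import Callable, Optional


def pick_port(low: int, high: int, start_offset: int,
              in_use: Callable[[int], bool],
              prefer_even: bool = True) -> Optional[int]:
    """Scan [low, high] for a free port, one parity first, then the other.

    prefer_even=True models the connect() side described above;
    prefer_even=False models the opposite order used by bind().
    """
    span = high - low + 1
    first_parity = 0 if prefer_even else 1
    for parity in (first_parity, 1 - first_parity):
        for i in range(span):
            port = low + (start_offset + i) % span
            if port % 2 == parity and not in_use(port):
                return port
    return None  # every port in the range is taken
```

A caller would pass something like a set's `__contains__` method as in_use. Odd ports are only handed out by the connect() variant once every even port in the range has been tried.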

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>