Commits · fbd40ea0180a2d328c5adc61414dc8bab9335ce2 · openeuler / Kernel

14 Mar, 2016 3 commits

ipv4: Don't do expensive useless work during inetdev destroy. · fbd40ea0

Committed by David S. Miller on Mar 13, 2016

When an inetdev is destroyed, every address assigned to the interface
is removed.  And in this scenario we do two pointless things which can
be very expensive if the number of assigned interfaces is large:

1) Address promotion.  We are deleting all addresses, so there is no
   point in doing this.

2) A full nf conntrack table purge for every address.  We only need to
   do this once, as is already caught by the existing
   masq_dev_notifier so masq_inet_event() can skip this.

Reported-by: Solar Designer <solar@openwall.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Cyrill Gorcunov <gorcunov@openvz.org>

fbd40ea0

netconf: add macro to represent all attributes · 136ba622

Committed by Zhang Shengju on Mar 10, 2016

This patch adds macro NETCONFA_ALL to represent all types of netconf
attributes for IPv4 and IPv6.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

136ba622

gro: Defer clearing of flush bit in tunnel paths · c194cf93

Committed by Alexander Duyck on Mar 09, 2016

This patch updates the GRO handlers for GRE, VXLAN, GENEVE, and FOU so that
we do not clear the flush bit until after we have called the next level GRO
handler. Previously this was being cleared before parsing through the list
of frames, however this resulted in several paths where either the bit
needed to be reset but wasn't as in the case of FOU, or cases where it was
being set as in GENEVE. By just deferring the clearing of the bit until
after the next level protocol has been parsed we can avoid any unnecessary
bit twiddling and avoid bugs.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c194cf93

10 Mar, 2016 1 commit

tcp: Add tcp_inq to get available receive bytes on socket · 473bd239

Committed by Tom Herbert on Mar 07, 2016

Create a common kernel function to get the number of bytes available
on a TCP socket. This is based on code in INQ getsockopt and we now call
the function for that getsockopt.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

473bd239

09 Mar, 2016 1 commit

bpf, vxlan, geneve, gre: fix usage of dst_cache on xmit · db3c6139

Committed by Daniel Borkmann on Mar 04, 2016

The assumptions from commits 0c1d70af ("net: use dst_cache for vxlan
device"), 468dfffc ("geneve: add dst caching support") and 3c1cb4d2
("net/ipv4: add dst cache support for gre lwtunnels") on dst_cache usage
when ip_tunnel_info is used are unfortunately not always valid.

While it seems correct for ip_tunnel_info front-ends such as OVS, eBPF
however can fill in ip_tunnel_info for consumers like vxlan, geneve or gre
with different remote dsts, tos, etc, therefore they cannot be assumed as
packet independent.

Right now vxlan, geneve, gre would cache the dst for eBPF and every packet
would reuse the same entry that was first created on the initial route
lookup. eBPF doesn't store/cache the ip_tunnel_info, so each skb may have
a different one.

Fix it by adding a flag that checks the ip_tunnel_info. Also the !tos test
in vxlan needs to be handled differently in this context as it is currently
inferred from ip_tunnel_info as well if present. ip_tunnel_dst_cache_usable()
helper is added for the three tunnel cases, which checks if we can use dst
cache.

Fixes: 0c1d70af ("net: use dst_cache for vxlan device")
Fixes: 468dfffc ("geneve: add dst caching support")
Fixes: 3c1cb4d2 ("net/ipv4: add dst cache support for gre lwtunnels")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

db3c6139

08 Mar, 2016 2 commits

tcp: fix tcpi_segs_in after connection establishment · a9d99ce2

Committed by Eric Dumazet on Mar 06, 2016

If the final packet (ACK) of the 3WHS is lost, it appears we do not
properly account the following incoming segment into tcpi_segs_in.

While we are at it, start segs_in at one, to count the SYN packet.

We do not yet count the number of SYNs we received for a request sock;
we might add this someday.

packetdrill script showing proper behavior after the fix:

// Tests tcpi_segs_in when 3rd packet (ACK) of 3WHS is lost
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
   +0 bind(3, ..., ...) = 0
   +0 listen(3, 1) = 0

   +0 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop>
   +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK>
+.020 < P. 1:1001(1000) ack 1 win 32792

   +0 accept(3, ..., ...) = 4

+.000 %{ assert tcpi_segs_in == 2, 'tcpi_segs_in=%d' % tcpi_segs_in }%

Fixes: 2efd055c ("tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a9d99ce2

arp: correct return value of arp_rcv · 8dfd329f

Committed by Zhang Shengju on Mar 04, 2016

Currently, arp_rcv() always returns zero on a packet delivery upcall.

To make its behavior more compliant with the way this API should be
used, this patch changes this to let it return NET_RX_SUCCESS when the
packet is properly handled, and NET_RX_DROP otherwise.

v1->v2:
If the sanity check fails, call kfree_skb() instead of consume_skb(), then
return the correct return value.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8dfd329f

04 Mar, 2016 1 commit

mld, igmp: Fix reserved tailroom calculation · 1837b2e2

Committed by Benjamin Poirier on Feb 29, 2016

The current reserved_tailroom calculation fails to take hlen and tlen into
account.

skb:
[__hlen__|__data____________|__tlen___|__extra__]
^                                               ^
head                                            skb_end_offset

In this representation, hlen + data + tlen is the size passed to alloc_skb.
"extra" is the extra space made available in __alloc_skb because of
rounding up by kmalloc. We can reorder the representation like so:

[__hlen__|__data____________|__extra__|__tlen___]
^                                               ^
head                                            skb_end_offset

The maximum space available for ip headers and payload without
fragmentation is min(mtu, data + extra). Therefore,
reserved_tailroom
= data + extra + tlen - min(mtu, data + extra)
= skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen)
= skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen)

Compare the second line to the current expression:
reserved_tailroom = skb_end_offset - min(mtu, skb_end_offset)
and we can see that hlen and tlen are not taken into account.

The min() in the third line can be expanded into:
if mtu < skb_tailroom - tlen:
	reserved_tailroom = skb_tailroom - mtu
else:
	reserved_tailroom = tlen

Depending on hlen, tlen, mtu and the number of multicast address records,
the current code may output skbs that have less tailroom than
dev->needed_tailroom or it may output more skbs than needed because not all
space available is used.

Fixes: 4c672e4b ("ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
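
The arithmetic in this message can be checked with a small userspace sketch. The names `reserved_tailroom_old`, `reserved_tailroom_fixed` and `min_uint`, and the use of plain integers instead of `struct sk_buff`, are assumptions made for illustration; this is not the kernel code touched by the patch.

```c
#include <assert.h>

/* Illustrative helper, not a kernel function. */
static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Expression the message calls "current":
 *   reserved_tailroom = skb_end_offset - min(mtu, skb_end_offset)
 * Neither hlen nor tlen appears in it.
 */
static unsigned int reserved_tailroom_old(unsigned int skb_end_offset,
					  unsigned int mtu)
{
	return skb_end_offset - min_uint(mtu, skb_end_offset);
}

/*
 * Third line of the derivation, evaluated after skb_reserve(hlen):
 *   reserved_tailroom = skb_tailroom - min(mtu, skb_tailroom - tlen)
 */
static unsigned int reserved_tailroom_fixed(unsigned int skb_tailroom,
					    unsigned int mtu,
					    unsigned int tlen)
{
	/* Sketch-only guard so the unsigned subtraction cannot wrap. */
	assert(tlen <= skb_tailroom);
	return skb_tailroom - min_uint(mtu, skb_tailroom - tlen);
}
```

As a worked case, take hlen = 16, tlen = 8 and skb_end_offset = 1536, so skb_tailroom is 1520 after the reserve. With an MTU of 9000 the old expression yields 0, which is less than tlen, while the corrected one yields 8. With an MTU of 1500 the old expression yields 36 and the corrected one 20.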

1837b2e2

03 Mar, 2016 5 commits

net/ipv4: remove left over dead code · a9d56235

Committed by Eric Engestrom on Feb 29, 2016

8cc785f6 ("net: ipv4: make the ping /proc code AF-independent") removed
the code using it, but renamed this variable instead of removing it.

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a9d56235

netfilter: nft_masq: support port range · 8a6bf5da

Committed by Pablo Neira Ayuso on Mar 01, 2016

Complete masquerading support by allowing port range selection.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

8a6bf5da

netfilter: xtables: don't hook tables by default · b9e69e12

Committed by Florian Westphal on Feb 25, 2016

Delay hook registration until the table is being requested inside a
namespace.

Historically, a particular table (iptables mangle, ip6tables filter, etc)
was registered on module load.

When netns support was added to iptables only the ip/ip6tables ruleset was
made namespace aware, not the actual hook points.

This means, for example, that when ipt_filter table/module is loaded on a system,
then each namespace on that system has an (empty) iptables filter ruleset.

In other words, if a namespace sends a packet, such skb is 'caught' by
netfilter machinery and fed to hooking points for that table (i.e. INPUT,
FORWARD, etc).

Thanks to Eric Biederman, hooks are no longer global, but per namespace.

This means that we can avoid allocation of empty ruleset in a namespace and
defer hook registration until we need the functionality.

We register a table's hook entry points ONLY in the initial namespace.
When an iptables get/setsockopt is issued inside a given namespace, we check
if the table is found in the per-namespace list.

If not, we attempt to find it in the initial namespace, and, if found,
create an empty default table in the requesting namespace and register the
needed hooks.

Hook points are destroyed only once the namespace is deleted; there is no
'usage count' (it makes no sense since there is no 'remove table' operation
in xtables api).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

b9e69e12

netfilter: xtables: prepare for on-demand hook register · a67dd266

Committed by Florian Westphal on Feb 25, 2016

This change prepares for upcoming on-demand xtables hook registration.

We change the prototypes of the register/unregister functions.
A followup patch will then add nf_hook_register/unregister calls
to the iptables one.

Once a hook is registered packets will be picked up, so all assignments
of the form

net->ipv4.iptable_$table = new_table

have to be moved to ip(6)t_register_table, else we can see NULL
net->ipv4.iptable_$table later.

This patch doesn't change functionality; without this the actual change
simply gets too big.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

a67dd266

netfilter: nf_defrag_ipv4: Drop redundant ip_send_check() · 5f547391

Committed by Joe Stringer on Feb 03, 2016

Since commit 0848f642 ("inet: frags: fix defragmented packet's IP
header for af_packet"), ip_send_check() would be called twice for
defragmentation that occurs from netfilter ipv4 defrag hooks. Remove the
extra call.

Signed-off-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

5f547391

02 Mar, 2016 3 commits

net: remove skb_sender_cpu_clear() · 64d4e343

Committed by WANG Cong on Feb 27, 2016

After commit 52bd2d62 ("net: better skb->sender_cpu and skb->napi_id cohabitation")
skb_sender_cpu_clear() becomes empty and can be removed.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

64d4e343

net: ipv4: tcp_probe: Replace timespec with timespec64 · b1b270d8

Committed by Deepa Dinamani on Feb 27, 2016

TCP probe log timestamps use struct timespec which is
not y2038 safe. Even though timespec might be good enough here
as it is used to represent delta time, the plan is to get rid
of all uses of timespec in the kernel.
Replace with struct timespec64 which is y2038 safe.

Prints still use unsigned long format and type.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

b1b270d8

net: ipv4: Convert IP network timestamps to be y2038 safe · 822c8685

Committed by Deepa Dinamani on Feb 27, 2016

ICMP timestamp messages and IP source route options require
timestamps to be in milliseconds modulo 24 hours from
midnight UT format.

Add inet_current_timestamp() function to support this. The function
returns the required timestamp in network byte order.

Timestamp calculation is also changed to call ktime_get_real_ts64()
which uses struct timespec64. struct timespec64 is y2038 safe.
Previously it called getnstimeofday() which uses struct timespec.
struct timespec is not y2038 safe.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: James Morris <jmorris@namei.org>
Cc: Patrick McHardy <kaber@trash.net>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
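
The timestamp format described in this message (milliseconds since midnight UT, in network byte order) can be sketched in userspace as follows. The function name `sketch_inet_timestamp` and its explicit time arguments are assumptions for illustration; per the message, the kernel function instead reads the clock through ktime_get_real_ts64(). A 64-bit seconds value is used so that times past 2038 are handled.

```c
#include <arpa/inet.h>	/* htonl(), ntohl() */
#include <assert.h>
#include <stdint.h>

#define SECONDS_PER_DAY	86400u
#define MSEC_PER_SEC	1000u
#define NSEC_PER_MSEC	1000000u

/*
 * Hypothetical userspace helper: milliseconds since midnight UT, in
 * network byte order, for a wall-clock time given as seconds and
 * nanoseconds since the epoch.
 */
static uint32_t sketch_inet_timestamp(int64_t tv_sec, long tv_nsec)
{
	uint32_t secs, msecs;

	/* Sketch-only sanity check on the inputs. */
	assert(tv_sec >= 0 && tv_nsec >= 0 && tv_nsec < 1000000000L);

	/* Seconds elapsed since the most recent midnight UT. */
	secs = (uint32_t)(tv_sec % SECONDS_PER_DAY);

	/* Whole milliseconds: at most 86399999, so this fits in 32 bits. */
	msecs = secs * MSEC_PER_SEC + (uint32_t)tv_nsec / NSEC_PER_MSEC;

	/* Convert to network byte order. */
	return htonl(msecs);
}
```

For example, one and a half seconds after midnight (tv_sec = 86401, tv_nsec = 500000000) decodes back through ntohl() to 1500.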

822c8685

27 Feb, 2016 3 commits

GSO: Provide software checksum of tunneled UDP fragmentation offload · 22463876

Committed by Alexander Duyck on Feb 24, 2016

On reviewing the code I realized that GRE and UDP tunnels could cause a
kernel panic if we used GSO to segment a large UDP frame that was sent
through the tunnel with an outer checksum and hardware offloads were not
available.

In order to correct this we need to update the feature flags that are
passed to the skb_segment function so that in the event of UDP
fragmentation being requested for the inner header the segmentation
function will correctly generate the checksum for the payload if we cannot
segment the outer header.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

22463876

net: l3mdev: prefer VRF master for source address selection · 17b693cd

Committed by David Lamparter on Feb 24, 2016

When selecting an address in context of a VRF, the vrf master should be
preferred for address selection.  If it isn't, the user has a hard time
getting the system to select to their preference - the code will pick
the address off the first in-VRF interface it can find, which on a
router could well be a non-routable address.

Signed-off-by: David Lamparter <equinox@diac24.net>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
[dsa: Fixed comment style and removed extra blank line]
Signed-off-by: David S. Miller <davem@davemloft.net>

17b693cd

net: l3mdev: address selection should only consider devices in L3 domain · 3f2fb9a8

Committed by David Ahern on Feb 24, 2016

David Lamparter noted a use case where the source address selection fails
to pick an address from a VRF interface - unnumbered interfaces.

Relevant commands from his script:
    ip addr add 9.9.9.9/32 dev lo
    ip link set lo up

    ip link add name vrf0 type vrf table 101
    ip rule add oif vrf0 table 101
    ip rule add iif vrf0 table 101
    ip link set vrf0 up
    ip addr add 10.0.0.3/32 dev vrf0

    ip link add name dummy2 type dummy
    ip link set dummy2 master vrf0 up

    --> note dummy2 has no address - unnumbered device

    ip route add 10.2.2.2/32 dev dummy2 table 101
    ip neigh add 10.2.2.2 dev dummy2 lladdr 02:00:00:00:00:02

    tcpdump -ni dummy2 &

And using ping instead of his socat example:
    $ ping -I vrf0 -c1 10.2.2.2
    ping: Warning: source address might be selected on device other than vrf0.
    PING 10.2.2.2 (10.2.2.2) from 9.9.9.9 vrf0: 56(84) bytes of data.

From tcpdump:
    12:57:29.449128 IP 9.9.9.9 > 10.2.2.2: ICMP echo request, id 2491, seq 1, length 64

Note the source address is from lo and is not a VRF local address. With
this patch:

    $ ping -I vrf0 -c1 10.2.2.2
    PING 10.2.2.2 (10.2.2.2) from 10.0.0.3 vrf0: 56(84) bytes of data.

From tcpdump:
    12:59:25.096426 IP 10.0.0.3 > 10.2.2.2: ICMP echo request, id 2113, seq 1, length 64

Now the source address comes from vrf0.

The ipv4 function for selecting source address takes a const argument.
Removing the const requires touching a lot of places, so instead
l3mdev_master_ifindex_rcu is changed to take a const argument and then
do the typecast to non-const as required by netdev_master_upper_dev_get_rcu.
This is similar to what l3mdev_fib_table_rcu does.

IPv6 for unnumbered interfaces appears to be selecting the addresses
properly.

Cc: David Lamparter <david@opensourcerouting.org>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3f2fb9a8

25 Feb, 2016 2 commits

ipv4: only create late gso-skb if skb is already set up with CHECKSUM_PARTIAL · a8c4a252

Committed by Hannes Frederic Sowa on Feb 22, 2016

Otherwise we break the contract with GSO to only pass CHECKSUM_PARTIAL
skbs down. This can easily happen with UDP+IPv4 sockets with the first
MSG_MORE write smaller than the MTU, second write is a sendfile.

Returning -EOPNOTSUPP lets the callers fall back into normal sendmsg path,
where we calculate the checksum manually during copying.

Commit d749c9cb ("ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked
sockets") started to expose this bug.

Fixes: d749c9cb ("ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked sockets")
Reported-by: Jiri Benc <jbenc@redhat.com>
Cc: Jiri Benc <jbenc@redhat.com>
Reported-by: Wakko Warner <wakko@animx.eu.org>
Cc: Wakko Warner <wakko@animx.eu.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

a8c4a252

soreuseport: fix merge conflict in tcp bind · e5fbfc1c

Committed by Craig Gallek on Feb 22, 2016

One of the validation checks for the new array-based TCP SO_REUSEPORT
validation was unintentionally dropped in ea8add2b.  This adds it back.

Lack of this check allows the user to allocate multiple sock_reuseport
structures (leaking all but the first).

Fixes: ea8add2b ("tcp/dccp: better use of ephemeral ports in bind()")
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e5fbfc1c

24 Feb, 2016 2 commits

tunnel: Clear IPCB(skb)->opt before dst_link_failure called · 5146d1f1

Committed by Bernie Harris on Feb 22, 2016

IPCB may contain data from previous layers (in the observed case the
qdisc layer). In the observed scenario, the data was misinterpreted as
ip header options, which later caused the ihl to be set to an invalid
value (<5). This resulted in an infinite loop in the mips implementation
of ip_fast_csum.

This patch clears IPCB(skb)->opt before dst_link_failure can be called for
various types of tunnels. This change only applies to encapsulated ipv4
packets.

The code introduced in 11c21a30 which clears all of IPCB has been removed
to be consistent with these changes, and instead the opt field is cleared
unconditionally in ip_tunnel_xmit. The change in ip_tunnel_xmit applies to
SIT, GRE, and IPIP tunnels.

The relevant vti, l2tp, and pptp functions already contain similar code for
clearing the IPCB.

Signed-off-by: Bernie Harris <bernie.harris@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>

5146d1f1

tcp: convert cached rtt from usec to jiffies when feeding initial rto · 9bdfb3b7

Committed by Konstantin Khlebnikov on Feb 21, 2016

Currently it is converted into msecs, which is only correct when HZ=1000.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: 740b0f18 ("tcp: switch rtt estimations to usec resolution")
Signed-off-by: David S. Miller <davem@davemloft.net>
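
A minimal sketch of why a millisecond value can stand in for a jiffies count only at HZ=1000. The helper names `sketch_usec_to_msec` and `sketch_usec_to_jiffies` are hypothetical, the tick rate is passed as a parameter rather than taken from the kernel's HZ, and truncating division is a simplification that may not match the kernel's own rounding.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_SEC	1000000u
#define USEC_PER_MSEC	1000u

/* Hypothetical helper: whole milliseconds in a microsecond interval. */
static uint32_t sketch_usec_to_msec(uint32_t usec)
{
	return usec / USEC_PER_MSEC;
}

/*
 * Hypothetical helper: whole ticks in a microsecond interval for a tick
 * rate of hz ticks per second. Assumes hz divides USEC_PER_SEC evenly.
 */
static uint32_t sketch_usec_to_jiffies(uint32_t usec, uint32_t hz)
{
	assert(hz > 0 && USEC_PER_SEC % hz == 0);
	return usec / (USEC_PER_SEC / hz);
}
```

For a cached RTT of 200000 usec, the millisecond conversion gives 200. That matches the tick count at 1000 ticks per second, but at 250 ticks per second the same interval is only 50 ticks, so treating 200 as jiffies would overstate it fourfold.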

9bdfb3b7

20 Feb, 2016 1 commit

rtnl: RTM_GETNETCONF: fix wrong return value · a97eb33f

Committed by Anton Protopopov on Feb 16, 2016

An error response from a RTM_GETNETCONF request can return the positive
error value EINVAL in the struct nlmsgerr that can mislead userspace.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
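
To illustrate how a positive value can mislead a receiver, here is a toy sketch. It assumes the usual netlink convention that the error field of struct nlmsgerr holds a negative errno on failure and 0 for an acknowledgement. The type `sketch_nlmsgerr` and the function `sketch_request_failed` are hypothetical stand-ins, not real netlink or library code.

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-in for the error field of struct nlmsgerr. */
struct sketch_nlmsgerr {
	int error;
};

/*
 * Hypothetical receiver-side check that relies on failures being
 * reported as negative errno values. Returns nonzero on failure.
 */
static int sketch_request_failed(const struct sketch_nlmsgerr *e)
{
	return e->error < 0;
}
```

A receiver written this way sees -EINVAL as a failure but lets a positive EINVAL pass as if the request had succeeded.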

a97eb33f

19 Feb, 2016 4 commits

gre: clear IFF_TX_SKB_SHARING · d13b161c

Committed by Jiri Benc on Feb 17, 2016

ether_setup sets IFF_TX_SKB_SHARING but this is not supported by gre
as it modifies the skb on xmit.

Also, clean up whitespace in ipgre_tap_setup when we're already touching it.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d13b161c

iptunnel: scrub packet in iptunnel_pull_header · 7f290c94

Committed by Jiri Benc on Feb 18, 2016

Part of skb_scrub_packet was open coded in iptunnel_pull_header. Let it call
skb_scrub_packet directly instead.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7f290c94

tcp/dccp: fix another race at listener dismantle · 7716682c

Committed by Eric Dumazet on Feb 18, 2016

Ilya reported the following lockdep splat:

kernel: =========================
kernel: [ BUG: held lock freed! ]
kernel: 4.5.0-rc1-ceph-00026-g5e0a311 #1 Not tainted
kernel: -------------------------
kernel: swapper/5/0 is freeing memory
ffff880035c9d200-ffff880035c9dbff, with a lock still held there!
kernel: (&(&queue->rskq_lock)->rlock){+.-...}, at:
[<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0
kernel: 4 locks held by swapper/5/0:
kernel: #0:  (rcu_read_lock){......}, at: [<ffffffff8169ef6b>]
netif_receive_skb_internal+0x4b/0x1f0
kernel: #1:  (rcu_read_lock){......}, at: [<ffffffff816e977f>]
ip_local_deliver_finish+0x3f/0x380
kernel: #2:  (slock-AF_INET){+.-...}, at: [<ffffffff81685ffb>]
sk_clone_lock+0x19b/0x440
kernel: #3:  (&(&queue->rskq_lock)->rlock){+.-...}, at:
[<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0

To properly fix this issue, inet_csk_reqsk_queue_add() needs
to tell its callers whether the child has been queued
into the accept queue.

We also need to make sure the listener is still there before
calling sk->sk_data_ready(), by holding a reference on it,
since the reference carried by the child can disappear as
soon as the child is put on the accept queue.

Reported-by: Ilya Dryomov <idryomov@gmail.com>
Fixes: ebb516af ("tcp/dccp: fix race at listener dismantle phase")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7716682c

route: check and remove route cache when we get route · deed49df

Committed by Xin Long on Feb 18, 2016

Since the gc of ipv4 routes was removed, a cached route has no chance
of being removed, and even after it has timed out it can still be
used, because no code checks its expiry.

Fix this issue by checking and removing the route cache when we get a route.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

deed49df

18 Feb, 2016 2 commits

tcp: correctly crypto_alloc_hash return check · 1eea84b7

Committed by Insu Yun on Feb 15, 2016

crypto_alloc_hash never returns NULL.

Signed-off-by: Insu Yun <wuninsu@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
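
The point that a NULL test alone is the wrong check can be illustrated with a userspace re-creation of the error-pointer idiom, in the style of the kernel's ERR_PTR() and IS_ERR() helpers. All `sketch_*` names below are hypothetical, and the sketch assumes failure is reported as an errno value encoded in the pointer; it does not reproduce the code changed by this commit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer (userspace re-creation). */
static void *sketch_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* True when the pointer lies in the top SKETCH_MAX_ERRNO addresses. */
static int sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-SKETCH_MAX_ERRNO);
}

/* Stand-in for an allocator that reports failure via an error pointer. */
static void *sketch_alloc_fail(void)
{
	return sketch_err_ptr(-ENOMEM);
}
```

A caller that only tests the result of `sketch_alloc_fail()` against NULL would treat the failure as success, because the error pointer is non-NULL; `sketch_is_err()` catches it.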

1eea84b7

ipv4: Remove inet_lro library · 7bbf3cae

Committed by Ben Hutchings on Feb 15, 2016

There are no longer any in-tree drivers that use it.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

7bbf3cae

17 Feb, 2016 10 commits

net: Export ip fragment sysctl to unprivileged users · 52a773d6

Committed by Nikolay Borisov on Feb 15, 2016

Now that all the ip fragmentation related sysctls are namespaceified
there is no reason to hide them anymore from "root" users inside
containers.

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

52a773d6

ipv4: namespacify ip fragment max dist sysctl knob · 0fbf4cb2

Committed by Nikolay Borisov on Feb 15, 2016

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0fbf4cb2

ipv4: namespacify ip_early_demux sysctl knob · e21145a9

Committed by Nikolay Borisov on Feb 15, 2016

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e21145a9

ipv4: Namespacify ip_dynaddr sysctl knob · 287b7f38

Committed by Nikolay Borisov on Feb 15, 2016

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

287b7f38

igmp: net: Move igmp namespace init to correct file · dcd87999

Committed by Nikolay Borisov on Feb 15, 2016

When the igmp-related sysctls were namespacified, their initialization
was erroneously put into the tcp socket namespace constructor. This
patch moves the relevant code into the igmp namespace constructor to
keep things consistent.

Also sprinkle some #ifdefs to silence warnings.

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dcd87999

ipv4: Namespaceify ip_default_ttl sysctl knob · fa50d974

Committed by Nikolay Borisov on Feb 15, 2016

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fa50d974

tcp: add tcpi_min_rtt and tcpi_notsent_bytes to tcp_info · cd9b2660

Committed by Eric Dumazet on Feb 11, 2016

tcpi_min_rtt reports the minimal rtt observed by TCP stack for the flow,
in usec unit. Might be ~0U if not yet known.

tcpi_notsent_bytes reports the amount of bytes in the write queue that
were not yet sent.

This is done in a single patch to not add a temporary 32bit padding hole
in tcp_info.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cd9b2660

tcp: md5: release request socket instead of listener · 72923555

Committed by Eric Dumazet on Feb 11, 2016

If tcp_v4_inbound_md5_hash() returns an error, we must release
the refcount on the request socket, not on the listener.

The bug was added for IPv4 only.

Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

72923555

net/ipv4: add dst cache support for gre lwtunnels · 3c1cb4d2

Committed by Paolo Abeni on Feb 12, 2016

In case of UDP traffic with datagram length below MTU this
gives about 4% performance increase.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Suggested-and-Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

3c1cb4d2

ip_tunnel: replace dst_cache with generic implementation · e09acddf

Committed by Paolo Abeni on Feb 12, 2016

The current ip_tunnel cache implementation is prone to a race
that will cause the wrong dst to be cached on concurrent dst cache
miss and ip tunnel update via netlink.

Replacing it with the generic implementation fixes the issue.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Suggested-and-acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

e09acddf

openeuler / Kernel: synced successfully almost 2 years ago
