提交 · 5b7ed0892f2af4e60b9a8d2c71c77774512a6cb9 · OpenHarmony / kernel_linux

14 5月, 2014 1 次提交

tcp: move fastopen functions to tcp_fastopen.c · 5b7ed089

由 Yuchung Cheng 提交于 5月 11, 2014

Move common TFO functions that will be used by both v4 and v6
to tcp_fastopen.c. Create a helper tcp_fastopen_queue_check().
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NDaniel Lee <longinus00@gmail.com>
Signed-off-by: NJerry Chu <hkchu@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5b7ed089

13 5月, 2014 1 次提交

net: rename local_df to ignore_df · 60ff7467

由 WANG Cong 提交于 5月 04, 2014

As suggested by several people, rename local_df to ignore_df,
since it means "ignore df bit if it is set".

Cc: Maciej Żenczykowski <maze@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Acked-by: NMaciej Żenczykowski <maze@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

60ff7467

09 5月, 2014 7 次提交

net: Verify UDP checksum before handoff to encap · 0a80966b

由 Tom Herbert 提交于 5月 07, 2014

Moving validation of UDP checksum to be done in UDP not encap layer.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0a80966b

icmp: Call skb_checksum_simple_validate · 29a96e1f

由 Tom Herbert 提交于 5月 07, 2014

Use skb_checksum_simple_validate to verify checksum.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

29a96e1f

igmp: Call skb_checksum_simple_validate · de08dc1a

由 Tom Herbert 提交于 5月 07, 2014

Use skb_checksum_simple_validate to verify checksum.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

de08dc1a

gre: Call skb_checksum_simple_validate · b1036c6a

由 Tom Herbert 提交于 5月 07, 2014

Use skb_checksum_simple_validate to verify checksum.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b1036c6a

ipv4: remove inet_addr_hash_lock in devinet.c · 32a4be48

由 WANG Cong 提交于 5月 06, 2014

All the callers hold RTNL lock, so there is no need to use inet_addr_hash_lock
to protect the hash list.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

32a4be48

ping: move ping_group_range out of CONFIG_SYSCTL · ba6b918a

由 Cong Wang 提交于 5月 06, 2014

Similarly, when CONFIG_SYSCTL is not set, ping_group_range should still
work, just that no one can change it. Therefore we should move it out of
sysctl_net_ipv4.c. And, it should not share the same seqlock with
ip_local_port_range.

BTW, rename it to ->ping_group_range instead.

Cc: David S. Miller <davem@davemloft.net>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Reported-by: NStefan de Konink <stefan@konink.de>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ba6b918a

ipv4: move local_port_range out of CONFIG_SYSCTL · c9d8f1a6

由 Cong Wang 提交于 5月 06, 2014

When CONFIG_SYSCTL is not set, ip_local_port_range should still work,
just that no one can change it. Therefore we should move it out of sysctl_inet.c.
Also, rename it to ->ip_local_ports instead.

Cc: David S. Miller <davem@davemloft.net>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Reported-by: NStefan de Konink <stefan@konink.de>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c9d8f1a6

08 5月, 2014 4 次提交

ipv4: fib_semantics: increment fib_info_cnt after fib_info allocation · aeefa1ec

由 Sergey Popovich 提交于 5月 06, 2014

Increment fib_info_cnt in fib_create_info() right after successfuly
alllocating fib_info structure, overwise fib_metrics allocation failure
leads to fib_info_cnt incorrectly decremented in free_fib_info(), called
on error path from fib_create_info().
Signed-off-by: NSergey Popovich <popovich_sergei@mail.ru>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aeefa1ec

net: clean up snmp stats code · 698365fa

由 WANG Cong 提交于 5月 05, 2014

commit 8f0ea0fe (snmp: reduce percpu needs by 50%)
reduced snmp array size to 1, so technically it doesn't have to be
an array any more. What's more, after the following commit:

	commit 933393f5
	Date:   Thu Dec 22 11:58:51 2011 -0600

	    percpu: Remove irqsafe_cpu_xxx variants

	    We simply say that regular this_cpu use must be safe regardless of
	    preemption and interrupt state.  That has no material change for x86
	    and s390 implementations of this_cpu operations.  However, arches that
	    do not provide their own implementation for this_cpu operations will
	    now get code generated that disables interrupts instead of preemption.

probably no arch wants to have SNMP_ARRAY_SZ == 2. At least after
almost 3 years, no one complains.

So, just convert the array to a single pointer and remove snmp_mib_init()
and snmp_mib_free() as well.

Cc: Christoph Lameter <cl@linux.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

698365fa

net: ip: push gso skb forwarding handling down the stack · c7ba65d7

由 Florian Westphal 提交于 5月 05, 2014

Doing the segmentation in the forward path has one major drawback:

When using virtio, we may process gso udp packets coming
from host network stack.  In that case, netfilter POSTROUTING
will see one packet with udp header followed by multiple ip
fragments.

Delay the segmentation and do it after POSTROUTING invocation
to avoid this.

Fixes: fe6cc55f ("net: ip, ipv6: handle gso skbs in forwarding path")
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c7ba65d7

net: ipv4: ip_forward: fix inverted local_df test · ca6c5d4a

由 Florian Westphal 提交于 5月 04, 2014

local_df means 'ignore DF bit if set', so if its set we're
allowed to perform ip fragmentation.

This wasn't noticed earlier because the output path also drops such skbs
(and emits needed icmp error) and because netfilter ip defrag did not
set local_df until couple of days ago.

Only difference is that DF-packets-larger-than MTU now discarded
earlier (f.e. we avoid pointless netfilter postrouting trip).

While at it, drop the repeated test ip_exceeds_mtu, checking it once
is enough...

Fixes: fe6cc55f ("net: ip, ipv6: handle gso skbs in forwarding path")
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ca6c5d4a

06 5月, 2014 2 次提交

ip_tunnel: Set network header properly for IP_ECN_decapsulate() · e96f2e7c

由 Ying Cai 提交于 5月 04, 2014

In ip_tunnel_rcv(), set skb->network_header to inner IP header
before IP_ECN_decapsulate().

Without the fix, IP_ECN_decapsulate() takes outer IP header as
inner IP header, possibly causing error messages or packet drops.

Note that this skb_reset_network_header() call was in this spot when
the original feature for checking consistency of ECN bits through
tunnels was added in eccc1bb8 ("tunnel: drop packet if ECN present
with not-ECT"). It was only removed from this spot in 3d7b46cd
("ip_tunnel: push generic protocol handling to ip_tunnel module.").

Fixes: 3d7b46cd ("ip_tunnel: push generic protocol handling to ip_tunnel module.")
Reported-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NYing Cai <ycai@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e96f2e7c

net: Call skb_checksum_init in IPv4 · ed70fcfc

由 Tom Herbert 提交于 5月 02, 2014

Call skb_checksum_init instead of private functions.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ed70fcfc

05 5月, 2014 1 次提交

ipv4: fix "conntrack zones" support for defrag user check in ip_expire · 7c3d5ab1

由 Vasily Averin 提交于 5月 03, 2014

Defrag user check in ip_expire was not updated after adding support for
"conntrack zones".

This bug manifests as a RFC violation, since the router will send
the icmp time exceeeded message when using conntrack zones.
Signed-off-by: NVasily Averin <vvs@openvz.org>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

7c3d5ab1

04 5月, 2014 2 次提交

netfilter: ipv4: defrag: set local_df flag on defragmented skb · 895162b1

由 Florian Westphal 提交于 5月 02, 2014

else we may fail to forward skb even if original fragments do fit
outgoing link mtu:

1. remote sends 2k packets in two 1000 byte frags, DF set
2. we want to forward but only see '2k > mtu and DF set'
3. we then send icmp error saying that outgoing link is 1500

But original sender never sent a packet that would not fit
the outgoing link.

Setting local_df makes outgoing path test size vs.
IPCB(skb)->frag_max_size, so we will still send the correct
error in case the largest original size did not fit
outgoing link mtu.
Reported-by: NMaxime Bizon <mbizon@freebox.fr>
Suggested-by: NMaxime Bizon <mbizon@freebox.fr>
Fixes: 5f2d04f1 (ipv4: fix path MTU discovery with connection tracking)
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

895162b1

tcp: remove in_flight parameter from cong_avoid() methods · 24901551

由 Eric Dumazet 提交于 5月 02, 2014

Commit e114a710 ("tcp: fix cwnd limited checking to improve
congestion control") obsoleted in_flight parameter from
tcp_is_cwnd_limited() and its callers.

This patch does the removal as promised.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

24901551

03 5月, 2014 1 次提交

tcp: fix cwnd limited checking to improve congestion control · e114a710

由 Eric Dumazet 提交于 4月 30, 2014

Yuchung discovered tcp_is_cwnd_limited() was returning false in
slow start phase even if the application filled the socket write queue.

All congestion modules take into account tcp_is_cwnd_limited()
before increasing cwnd, so this behavior limits slow start from
probing the bandwidth at full speed.

The problem is that even if write queue is full (aka we are _not_
application limited), cwnd can be under utilized if TSO should auto
defer or TCP Small queues decided to hold packets.

So the in_flight can be kept to smaller value, and we can get to the
point tcp_is_cwnd_limited() returns false.

With TCP Small Queues and FQ/pacing, this issue is more visible.

We fix this by having tcp_cwnd_validate(), which is supposed to track
such things, take into account unsent_segs, the number of segs that we
are not sending at the moment due to TSO or TSQ, but intend to send
real soon. Then when we are cwnd-limited, remember this fact while we
are processing the window of ACKs that comes back.

For example, suppose we have a brand new connection with cwnd=10; we
are in slow start, and we send a flight of 9 packets. By the time we
have received ACKs for all 9 packets we want our cwnd to be 18.
We implement this by setting tp->lsnd_pending to 9, and
considering ourselves to be cwnd-limited while cwnd is less than
twice tp->lsnd_pending (2*9 -> 18).

This makes tcp_is_cwnd_limited() more understandable, by removing
the GSO/TSO kludge, that tried to work around the issue.

Note the in_flight parameter can be removed in a followup cleanup
patch.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e114a710

01 5月, 2014 2 次提交

tcp_cubic: fix the range of delayed_ack · 0cda345d

由 Liu Yu 提交于 4月 30, 2014

commit b9f47a3a (tcp_cubic: limit delayed_ack ratio to prevent
divide error) try to prevent divide error, but there is still a little
chance that delayed_ack can reach zero. In case the param cnt get
negative value, then ratio+cnt would overflow and may happen to be zero.
As a result, min(ratio, ACK_RATIO_LIMIT) will calculate to be zero.

In some old kernels, such as 2.6.32, there is a bug that would
pass negative param, which then ultimately leads to this divide error.

commit 5b35e1e6 (tcp: fix tcp_trim_head() to adjust segment count
with skb MSS) fixed the negative param issue. However,
it's safe that we fix the range of delayed_ack as well,
to make sure we do not hit a divide by zero.

CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NLiu Yu <allanyuliu@tencent.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0cda345d

tcp: increment retransmit counters in tlp and fast open · fc9f3501

由 Eric Dumazet 提交于 4月 28, 2014

Both TLP and Fast Open call __tcp_retransmit_skb() instead of
tcp_retransmit_skb() to avoid changing tp->retrans_out.

This has the side effect of missing SNMP counters increments as well
as tcp_info tcpi_total_retrans updates.

Fix this by moving the stats increments of into __tcp_retransmit_skb()
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NNandita Dukkipati <nanditad@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fc9f3501

29 4月, 2014 1 次提交

ipv4: Use predefined value for readability · 5a2b646f

由 Hisao Tanabe 提交于 4月 27, 2014

Signed-off-by: NHisao Tanabe <xtanabe@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5a2b646f

27 4月, 2014 1 次提交

inetpeer_gc_worker: trivial cleanup · 851bdd11

由 xiao jin 提交于 4月 25, 2014

Do not initialize list twice.
list_replace_init() already takes care of initializing list.
We don't need to initialize it with LIST_HEAD() beforehand.
Signed-off-by: Nxiao jin <jin.xiao@intel.com>
Reviewed-by: NDavid Cohen <david.a.cohen@linux.intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

851bdd11

24 4月, 2014 1 次提交

gre: add x-netns support · b57708ad

由 Nicolas Dichtel 提交于 4月 22, 2014

This patch allows to switch the netns when packet is encapsulated or
decapsulated. In other word, the encapsulated packet is received in a netns,
where the lookup is done to find the tunnel. Once the tunnel is found, the
packet is decapsulated and injecting into the corresponding interface which
stands to another netns.

When one of the two netns is removed, the tunnel is destroyed.
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b57708ad

23 4月, 2014 1 次提交

tcp: avoid retransmits of TCP packets hanging in host queues · 1f3279ae

由 Eric Dumazet 提交于 4月 20, 2014

In commit 0e280af0 ("tcp: introduce TCPSpuriousRtxHostQueues SNMP
counter") we added a logic to detect when a packet was retransmitted
while the prior clone was still in a qdisc or driver queue.

We are now confident we can do better, and catch the problem before
we fragment a TSO packet before retransmit, or in TLP path.

This patch fully exploits the logic by simply canceling the spurious
retransmit.
Original packet is in a queue and will eventually leave the host.

This helps to avoid network collapses when some events make the RTO
estimations very wrong, particularly when dealing with huge number of
sockets with synchronized blast.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1f3279ae

21 4月, 2014 2 次提交

tcp: make tcp_cwnd_application_limited() static · 86fd14ad

由 Weiping Pan 提交于 4月 18, 2014

Make tcp_cwnd_application_limited() static and move it from tcp_input.c to
tcp_output.c
Signed-off-by: NWeiping Pan <wpan@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

86fd14ad

tcp: Add a TCP_FASTOPEN socket option to get a max backlog on its listner · 1536e285

由 Kenjiro Nakayama 提交于 4月 17, 2014

This patch adds a TCP_FASTOPEN socket option to get a max backlog on its
listener to getsockopt().
Signed-off-by: NKenjiro Nakayama <nakayamakenjiro@gmail.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1536e285

17 4月, 2014 3 次提交

ip_tunnel: use the right netns in ioctl handler · 8c923ce2

由 Nicolas Dichtel 提交于 4月 16, 2014

Because the netdevice may be in another netns than the i/o netns, we should
use the i/o netns instead of dev_net(dev).

The variable 'tunnel' was used only to get 'itn', hence to simplify code I
remove it and use 't' instead.
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8c923ce2

ipv4, route: pass 0 instead of LOOPBACK_IFINDEX to fib_validate_source() · 0d5edc68

由 Cong Wang 提交于 4月 15, 2014

In my special case, when a packet is redirected from veth0 to lo,
its skb->dev->ifindex would be LOOPBACK_IFINDEX. Meanwhile we
pass the hard-coded LOOPBACK_IFINDEX to fib_validate_source()
in ip_route_input_slow(). This would cause the following check
in fib_validate_source() fail:

            (dev->ifindex != oif || !IN_DEV_TX_REDIRECTS(idev))

when rp_filter is disabeld on loopback. As suggested by Julian,
the caller should pass 0 here so that we will not end up by
calling __fib_validate_source().

Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NCong Wang <cwang@twopensource.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0d5edc68

ipv4, fib: pass LOOPBACK_IFINDEX instead of 0 to flowi4_iif · 6a662719

由 Cong Wang 提交于 4月 15, 2014

As suggested by Julian:

	Simply, flowi4_iif must not contain 0, it does not
	look logical to ignore all ip rules with specified iif.

because in fib_rule_match() we do:

        if (rule->iifindex && (rule->iifindex != fl->flowi_iif))
                goto out;

flowi4_iif should be LOOPBACK_IFINDEX by default.

We need to move LOOPBACK_IFINDEX to include/net/flow.h:

1) It is mostly used by flowi_iif

2) Fix the following compile error if we use it in flow.h
by the patches latter:

In file included from include/linux/netfilter.h:277:0,
                 from include/net/netns/netfilter.h:5,
                 from include/net/net_namespace.h:21,
                 from include/linux/netdevice.h:43,
                 from include/linux/icmpv6.h:12,
                 from include/linux/ipv6.h:61,
                 from include/net/ipv6.h:16,
                 from include/linux/sunrpc/clnt.h:27,
                 from include/linux/nfs_fs.h:30,
                 from init/do_mounts.c:32:
include/net/flow.h: In function ‘flowi4_init_output’:
include/net/flow.h:84:32: error: ‘LOOPBACK_IFINDEX’ undeclared (first use in this function)

Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NCong Wang <cwang@twopensource.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6a662719

16 4月, 2014 2 次提交

ipv4: add a sock pointer to dst->output() path. · aad88724

由 Eric Dumazet 提交于 4月 15, 2014

In the dst->output() path for ipv4, the code assumes the skb it has to
transmit is attached to an inet socket, specifically via
ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the
provider of the packet is an AF_PACKET socket.

The dst->output() method gets an additional 'struct sock *sk'
parameter. This needs a cascade of changes so that this parameter can
be propagated from vxlan to final consumer.

Fixes: 8f646c92 ("vxlan: keep original skb ownership")
Reported-by: Nlucien xin <lucien.xin@gmail.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aad88724

ipv4: add a sock pointer to ip_queue_xmit() · b0270e91

由 Eric Dumazet 提交于 4月 15, 2014

ip_queue_xmit() assumes the skb it has to transmit is attached to an
inet socket. Commit 31c70d59 ("l2tp: keep original skb ownership")
changed l2tp to not change skb ownership and thus broke this assumption.

One fix is to add a new 'struct sock *sk' parameter to ip_queue_xmit(),
so that we do not assume skb->sk points to the socket used by l2tp
tunnel.

Fixes: 31c70d59 ("l2tp: keep original skb ownership")
Reported-by: NZhan Jianyu <nasa4836@gmail.com>
Tested-by: NZhan Jianyu <nasa4836@gmail.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b0270e91

14 4月, 2014 2 次提交

ipv4: return valid RTA_IIF on ip route get · 91146153

由 Julian Anastasov 提交于 4月 13, 2014

Extend commit 13378cad
("ipv4: Change rt->rt_iif encoding.") from 3.6 to return valid
RTA_IIF on 'ip route get ... iif DEVICE' instead of rt_iif 0
which is displayed as 'iif *'.

inet_iif is not appropriate to use because skb_iif is not set.
Use the skb->dev->ifindex instead.
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

91146153

net: ipv4: current group_info should be put after using. · b04c4619

由 Wang, Xiaoming 提交于 4月 14, 2014

Plug a group_info refcount leak in ping_init.
group_info is only needed during initialization and
the code failed to release the reference on exit.
While here move grabbing the reference to a place
where it is actually needed.
Signed-off-by: NChuansheng Liu <chuansheng.liu@intel.com>
Signed-off-by: NZhang Dongxing <dongxing.zhang@intel.com>
Signed-off-by: Nxiaoming wang <xiaoming.wang@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b04c4619

13 4月, 2014 2 次提交

vti: don't allow to add the same tunnel twice · 8d89dcdf

由 Nicolas Dichtel 提交于 4月 11, 2014

Before the patch, it was possible to add two times the same tunnel:
ip l a vti1 type vti remote 10.16.0.121 local 10.16.0.249 key 41
ip l a vti2 type vti remote 10.16.0.121 local 10.16.0.249 key 41

It was possible, because ip_tunnel_newlink() calls ip_tunnel_find() with the
argument dev->type, which was set only later (when calling ndo_init handler
in register_netdevice()). Let's set this type in the setup handler, which is
called before newlink handler.

Introduced by commit b9959fd3 ("vti: switch to new ip tunnel code").

CC: Cong Wang <amwang@redhat.com>
CC: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8d89dcdf

gre: don't allow to add the same tunnel twice · 5a455275

由 Nicolas Dichtel 提交于 4月 11, 2014

Before the patch, it was possible to add two times the same tunnel:
ip l a gre1 type gre remote 10.16.0.121 local 10.16.0.249
ip l a gre2 type gre remote 10.16.0.121 local 10.16.0.249

It was possible, because ip_tunnel_newlink() calls ip_tunnel_find() with the
argument dev->type, which was set only later (when calling ndo_init handler
in register_netdevice()). Let's set this type in the setup handler, which is
called before newlink handler.

Introduced by commit c5441932 ("GRE: Refactor GRE tunneling code.").

CC: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5a455275

12 4月, 2014 1 次提交

net: Fix use after free by removing length arg from sk_data_ready callbacks. · 676d2369

由 David S. Miller 提交于 4月 11, 2014

Several spots in the kernel perform a sequence like:

	skb_queue_tail(&sk->s_receive_queue, skb);
	sk->sk_data_ready(sk, skb->len);

But at the moment we place the SKB onto the socket receive queue it
can be consumed and freed up.  So this skb->len access is potentially
to freed up memory.

Furthermore, the skb->len can be modified by the consumer so it is
possible that the value isn't accurate.

And finally, no actual implementation of this callback actually uses
the length argument.  And since nobody actually cared about it's
value, lots of call sites pass arbitrary values in such as '0' and
even '1'.

So just remove the length argument from the callback, that way there
is no confusion whatsoever and all of these use-after-free cases get
fixed as a side effect.

Based upon a patch by Eric Dumazet and his suggestion to audit this
issue tree-wide.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

676d2369

08 4月, 2014 1 次提交

net: replace __this_cpu_inc in route.c with raw_cpu_inc · 3ed66e91

由 Christoph Lameter 提交于 4月 07, 2014

The RT_CACHE_STAT_INC macro triggers the new preemption checks
for __this_cpu ops.

I do not see any other synchronization that would allow the use of a
__this_cpu operation here however in commit dbd2915c ("[IPV4]:
RT_CACHE_STAT_INC() warning fix") Andrew justifies the use of
raw_smp_processor_id() here because "we do not care" about races.  In
the past we agreed that the price of disabling interrupts here to get
consistent counters would be too high.  These counters may be inaccurate
due to race conditions.

The use of __this_cpu op improves the situation already from what commit
dbd2915c did since the single instruction emitted on x86 does not
allow the race to occur anymore.  However, non x86 platforms could still
experience a race here.

Trace:

  __this_cpu_add operation in preemptible [00000000] code: avahi-daemon/1193
  caller is __this_cpu_preempt_check+0x38/0x60
  CPU: 1 PID: 1193 Comm: avahi-daemon Tainted: GF            3.12.0-rc4+ #187
  Call Trace:
    check_preemption_disabled+0xec/0x110
    __this_cpu_preempt_check+0x38/0x60
    __ip_route_output_key+0x575/0x8c0
    ip_route_output_flow+0x27/0x70
    udp_sendmsg+0x825/0xa20
    inet_sendmsg+0x85/0xc0
    sock_sendmsg+0x9c/0xd0
    ___sys_sendmsg+0x37c/0x390
    __sys_sendmsg+0x49/0x90
    SyS_sendmsg+0x12/0x20
    tracesys+0xe1/0xe6
Signed-off-by: NChristoph Lameter <cl@linux.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Acked-by: NIngo Molnar <mingo@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3ed66e91

05 4月, 2014 1 次提交

netfilter: Can't fail and free after table replacement · c58dd2dd

由 Thomas Graf 提交于 4月 04, 2014

All xtables variants suffer from the defect that the copy_to_user()
to copy the counters to user memory may fail after the table has
already been exchanged and thus exposed. Return an error at this
point will result in freeing the already exposed table. Any
subsequent packet processing will result in a kernel panic.

We can't copy the counters before exposing the new tables as we
want provide the counter state after the old table has been
unhooked. Therefore convert this into a silent error.

Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

c58dd2dd

29 3月, 2014 1 次提交

tcp: fix get_timewait4_sock() delay computation on 64bit · e2a1d3e4

由 Eric Dumazet 提交于 3月 27, 2014

It seems I missed one change in get_timewait4_sock() to compute
the remaining time before deletion of IPV4 timewait socket.

This could result in wrong output in /proc/net/tcp for tm->when field.

Fixes: 96f817fe ("tcp: shrink tcp6_timewait_sock by one cache line")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e2a1d3e4

OpenHarmony / kernel_linux 上一次同步 大约 4 年

OpenHarmony / kernel_linux
上一次同步大约 4 年