- 16 9月, 2014 1 次提交
-
-
由 Steffen Klassert 提交于
Currently we genarate a blackhole route route whenever we have matching policies but can not resolve the states. Here we assume that dst_output() is called to kill the balckholed packets. Unfortunately this assumption is not true in all cases, so it is possible that these packets leave the system unwanted. We fix this by generating blackhole routes only from the route lookup functions, here we can guarantee a call to dst_output() afterwards. Fixes: 2774c131 ("xfrm: Handle blackhole route creation via afinfo.") Reported-by: NKonstantinos Kolelis <k.kolelis@sirrix.com> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
- 10 9月, 2014 1 次提交
-
-
由 Florian Westphal 提交于
Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 8月, 2014 2 次提交
-
-
由 Ian Morris 提交于
This patch makes no changes to the logic of the code but simply addresses coding style issues as detected by checkpatch. Both objdump and diff -w show no differences. This patch removes some blank lines between the end of a function definition and the EXPORT_SYMBOL_GPL macro in order to prevent checkpatch warning that EXPORT_SYMBOL must immediately follow a function. Signed-off-by: NIan Morris <ipm@chirality.org.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ian Morris 提交于
This patch makes no changes to the logic of the code but simply addresses coding style issues as detected by checkpatch. Both objdump and diff -w show no differences. A number of items are addressed in this patch: * Multiple spaces converted to tabs * Spaces before tabs removed. * Spaces in pointer typing cleansed (char *)foo etc. * Remove space after sizeof * Ensure spacing around comparators such as if statements. Signed-off-by: NIan Morris <ipm@chirality.org.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 8月, 2014 1 次提交
-
-
由 Willem de Bruijn 提交于
Datagrams timestamped on transmission can coexist in the kernel stack and be reordered in packet scheduling. When reading looped datagrams from the socket error queue it is not always possible to unique correlate looped data with original send() call (for application level retransmits). Even if possible, it may be expensive and complex, requiring packet inspection. Introduce a data-independent ID mechanism to associate timestamps with send calls. Pass an ID alongside the timestamp in field ee_data of sock_extended_err. The ID is a simple 32 bit unsigned int that is associated with the socket and incremented on each send() call for which software tx timestamp generation is enabled. The feature is enabled only if SOF_TIMESTAMPING_OPT_ID is set, to avoid changing ee_data for existing applications that expect it 0. The counter is reset each time the flag is reenabled. Reenabling does not change the ID of already submitted data. It is possible to receive out of order IDs if the timestamp stream is not quiesced first. Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 7月, 2014 1 次提交
-
-
由 Eric Dumazet 提交于
In "Counting Packets Sent Between Arbitrary Internet Hosts", Jeffrey and Jedidiah describe ways exploiting linux IP identifier generation to infer whether two machines are exchanging packets. With commit 73f156a6 ("inetpeer: get rid of ip_id_count"), we changed IP id generation, but this does not really prevent this side-channel technique. This patch adds a random amount of perturbation so that IP identifiers for a given destination [1] are no longer monotonically increasing after an idle period. Note that prandom_u32_max(1) returns 0, so if generator is used at most once per jiffy, this patch inserts no hole in the ID suite and do not increase collision probability. This is jiffies based, so in the worst case (HZ=1000), the id can rollover after ~65 seconds of idle time, which should be fine. We also change the hash used in __ip_select_ident() to not only hash on daddr, but also saddr and protocol, so that ICMP probes can not be used to infer information for other protocols. For IPv6, adds saddr into the hash as well, but not nexthdr. If I ping the patched target, we can see ID are now hard to predict. 21:57:11.008086 IP (...) A > target: ICMP echo request, seq 1, length 64 21:57:11.010752 IP (... id 2081 ...) target > A: ICMP echo reply, seq 1, length 64 21:57:12.013133 IP (...) A > target: ICMP echo request, seq 2, length 64 21:57:12.015737 IP (... id 3039 ...) target > A: ICMP echo reply, seq 2, length 64 21:57:13.016580 IP (...) A > target: ICMP echo request, seq 3, length 64 21:57:13.019251 IP (... id 3437 ...) target > A: ICMP echo reply, seq 3, length 64 [1] TCP sessions uses a per flow ID generator not changed by this patch. Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NJeffrey Knockel <jeffk@cs.unm.edu> Reported-by: NJedidiah R. Crandall <crandall@cs.unm.edu> Cc: Willy Tarreau <w@1wt.eu> Cc: Hannes Frederic Sowa <hannes@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 7月, 2014 1 次提交
-
-
由 Li RongQing 提交于
After 11878b40[net-timestamp: SOCK_RAW and PING timestamping], this comment becomes obsolete since the codes check not only UDP socket, but also RAW sock; and the codes are clear, not need the comments Signed-off-by: NLi RongQing <roy.qing.li@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 7月, 2014 1 次提交
-
-
由 Willem de Bruijn 提交于
Add SO_TIMESTAMPING to sockets of type PF_INET[6]/SOCK_RAW: Add the necessary sock_tx_timestamp calls to the datapath for RAW sockets (ping sockets already had these calls). Fix the IP output path to pass the timestamp flags on the first fragment also for these sockets. The existing code relies on transhdrlen != 0 to indicate a first fragment. For these sockets, that assumption does not hold. This fixes http://bugzilla.kernel.org/show_bug.cgi?id=77221 Tested SOCK_RAW on IPv4 and IPv6, not PING. Signed-off-by: NWillem de Bruijn <willemb@google.com> Acked-by: NRichard Cochran <richardcochran@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 7月, 2014 1 次提交
-
-
由 Himangi Saraogi 提交于
The semantic patch that makes this transformation is as follows: // <smpl> @@ expression e; @@ -if (e) BUG(); +BUG_ON(e); // </smpl> Signed-off-by: NHimangi Saraogi <himangi774@gmail.com> Acked-by: NJulia Lawall <julia.lawall@lip6.fr> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 7月, 2014 1 次提交
-
-
由 Tom Herbert 提交于
Automatically generate flow labels for IPv6 packets on transmit. The flow label is computed based on skb_get_hash. The flow label will only automatically be set when it is zero otherwise (i.e. flow label manager hasn't set one). This supports the transmit side functionality of RFC 6438. Added an IPv6 sysctl auto_flowlabels to enable/disable this behavior system wide, and added IPV6_AUTOFLOWLABEL socket option to enable this functionality per socket. By default, auto flowlabels are disabled to avoid possible conflicts with flow label manager, however if this feature proves useful we may want to enable it by default. It should also be noted that FreeBSD has already implemented automatic flow labels (including the sysctl and socket option). In FreeBSD, automatic flow labels default to enabled. Performance impact: Running super_netperf with 200 flows for TCP_RR and UDP_RR for IPv6. Note that in UDP case, __skb_get_hash will be called for every packet with explains slight regression. In the TCP case the hash is saved in the socket so there is no regression. Automatic flow labels disabled: TCP_RR: 86.53% CPU utilization 127/195/322 90/95/99% latencies 1.40498e+06 tps UDP_RR: 90.70% CPU utilization 118/168/243 90/95/99% latencies 1.50309e+06 tps Automatic flow labels enabled: TCP_RR: 85.90% CPU utilization 128/199/337 90/95/99% latencies 1.40051e+06 UDP_RR 92.61% CPU utilization 115/164/236 90/95/99% latencies 1.4687e+06 Signed-off-by: NTom Herbert <therbert@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 6月, 2014 1 次提交
-
-
由 Eric Dumazet 提交于
Ideally, we would need to generate IP ID using a per destination IP generator. linux kernels used inet_peer cache for this purpose, but this had a huge cost on servers disabling MTU discovery. 1) each inet_peer struct consumes 192 bytes 2) inetpeer cache uses a binary tree of inet_peer structs, with a nominal size of ~66000 elements under load. 3) lookups in this tree are hitting a lot of cache lines, as tree depth is about 20. 4) If server deals with many tcp flows, we have a high probability of not finding the inet_peer, allocating a fresh one, inserting it in the tree with same initial ip_id_count, (cf secure_ip_id()) 5) We garbage collect inet_peer aggressively. IP ID generation do not have to be 'perfect' Goal is trying to avoid duplicates in a short period of time, so that reassembly units have a chance to complete reassembly of fragments belonging to one message before receiving other fragments with a recycled ID. We simply use an array of generators, and a Jenkin hash using the dst IP as a key. ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it belongs (it is only used from this file) secure_ip_id() and secure_ipv6_id() no longer are needed. Rename ip_select_ident_more() to ip_select_ident_segs() to avoid unnecessary decrement/increment of the number of segments. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 5月, 2014 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
tot_len does specify the size of struct ipv6_txoptions. We need opt_flen + opt_nflen to calculate the overall length of additional ipv6 extensions. I found this while auditing the ipv6 output path for a memory corruption reported by Alexey Preobrazhensky while he fuzzed an instrumented AddressSanitizer kernel with trinity. This may or may not be the cause of the original bug. Fixes: 4df98e76 ("ipv6: pmtudisc setting not respected with UFO/CORK") Reported-by: NAlexey Preobrazhensky <preobr@google.com> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 5月, 2014 1 次提交
-
-
由 WANG Cong 提交于
As suggested by several people, rename local_df to ignore_df, since it means "ignore df bit if it is set". Cc: Maciej Żenczykowski <maze@google.com> Cc: Florian Westphal <fw@strlen.de> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Acked-by: NMaciej Żenczykowski <maze@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 5月, 2014 1 次提交
-
-
由 Florian Westphal 提交于
If conntrack defragments incoming ipv6 frags it stores largest original frag size in ip6cb and sets ->local_df. We must thus first test the largest original frag size vs. mtu, and not vice versa. Without this patch PKTTOOBIG is still generated in ip6_fragment() later in the stack, but 1) IPSTATS_MIB_INTOOBIGERRORS won't increment 2) packet did (needlessly) traverse netfilter postrouting hook. Fixes: fe6cc55f ("net: ip, ipv6: handle gso skbs in forwarding path") Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 4月, 2014 1 次提交
-
-
由 Eric Dumazet 提交于
In the dst->output() path for ipv4, the code assumes the skb it has to transmit is attached to an inet socket, specifically via ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the provider of the packet is an AF_PACKET socket. The dst->output() method gets an additional 'struct sock *sk' parameter. This needs a cascade of changes so that this parameter can be propagated from vxlan to final consumer. Fixes: 8f646c92 ("vxlan: keep original skb ownership") Reported-by: Nlucien xin <lucien.xin@gmail.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 4月, 2014 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
After commit c15b1cca ("ipv6: move DAD and addrconf_verify processing to workqueue") some counters are now updated in process context and thus need to disable bh before doing so, otherwise deadlocks can happen on 32-bit archs. Fabio Estevam noticed this while while mounting a NFS volume on an ARM board. As a compensation for missing this I looked after the other *_STATS_BH and found three other calls which need updating: 1) icmp6_send: ip6_fragment -> icmpv6_send -> icmp6_send (error handling) 2) ip6_push_pending_frames: rawv6_sendmsg -> rawv6_push_pending_frames -> ... (only in case of icmp protocol with raw sockets in error handling) 3) ping6_v6_sendmsg (error handling) Fixes: c15b1cca ("ipv6: move DAD and addrconf_verify processing to workqueue") Reported-by: NFabio Estevam <festevam@gmail.com> Tested-by: NFabio Estevam <fabio.estevam@freescale.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 3月, 2014 1 次提交
-
-
由 lucien 提交于
In ip6_append_data_mtu(), when the xfrm mode is not tunnel(such as transport),the ipsec header need to be added in the first fragment, so the mtu will decrease to reserve space for it, then the second fragment come, the mtu should be turn back, as the commit 0c183379 said. however, in the commit a493e60ac4bbe2e977e7129d6d8cbb0dd236be, it use *mtu = min(*mtu, ...) to change the mtu, which lead to the new mtu is alway equal with the first fragment's. and cannot turn back. when I test through ping6 -c1 -s5000 $ip (mtu=1280): ...frag (0|1232) ESP(spi=0x00002000,seq=0xb), length 1232 ...frag (1232|1216) ...frag (2448|1216) ...frag (3664|1216) ...frag (4880|164) which should be: ...frag (0|1232) ESP(spi=0x00001000,seq=0x1), length 1232 ...frag (1232|1232) ...frag (2464|1232) ...frag (3696|1232) ...frag (4928|116) so delete the min() when change back the mtu. Signed-off-by: NXin Long <lucien.xin@gmail.com> Fixes: 75a493e6 ("ipv6: ip6_append_data_mtu did not care about pmtudisc and frag_size") Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 3月, 2014 1 次提交
-
-
由 Li RongQing 提交于
Packets which have L2 address different from ours should be already filtered before entering into ip6_forward(). Perform that check at the beginning to avoid processing such packets. Signed-off-by: NLi RongQing <roy.qing.li@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 2月, 2014 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
This option has the same semantic as IP_PMTUDISC_OMIT for IPv4 which got recently introduced. It doesn't honor the path mtu discovered by the host but in contrary to IPV6_PMTUDISC_INTERFACE allows the generation of fragments if the packet size exceeds the MTU of the outgoing interface MTU. Fixes: 93b36cf3 ("ipv6: support IPV6_PMTU_INTERFACE on sockets") Cc: Florian Weimer <fweimer@redhat.com> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 2月, 2014 1 次提交
-
-
由 Florian Westphal 提交于
When using nftables with CONFIG_NETFILTER_XT_TARGET_TRACE=n, we get lots of "TRACE: filter:output:policy:1 IN=..." warnings as several places will leave skb->nf_trace uninitialised. Unlike iptables tracing functionality is not conditional in nftables, so always copy/zero nf_trace setting when nftables is enabled. Move this into __nf_copy() helper. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 14 2月, 2014 1 次提交
-
-
由 Florian Westphal 提交于
Marcelo Ricardo Leitner reported problems when the forwarding link path has a lower mtu than the incoming one if the inbound interface supports GRO. Given: Host <mtu1500> R1 <mtu1200> R2 Host sends tcp stream which is routed via R1 and R2. R1 performs GRO. In this case, the kernel will fail to send ICMP fragmentation needed messages (or pkt too big for ipv6), as GSO packets currently bypass dstmtu checks in forward path. Instead, Linux tries to send out packets exceeding the mtu. When locking route MTU on Host (i.e., no ipv4 DF bit set), R1 does not fragment the packets when forwarding, and again tries to send out packets exceeding R1-R2 link mtu. This alters the forwarding dstmtu checks to take the individual gso segment lengths into account. For ipv6, we send out pkt too big error for gso if the individual segments are too big. For ipv4, we either send icmp fragmentation needed, or, if the DF bit is not set, perform software segmentation and let the output path create fragments when the packet is leaving the machine. It is not 100% correct as the error message will contain the headers of the GRO skb instead of the original/segmented one, but it seems to work fine in my (limited) tests. Eric Dumazet suggested to simply shrink mss via ->gso_size to avoid sofware segmentation. However it turns out that skb_segment() assumes skb nr_frags is related to mss size so we would BUG there. I don't want to mess with it considering Herbert and Eric disagree on what the correct behavior should be. Hannes Frederic Sowa notes that when we would shrink gso_size skb_segment would then also need to deal with the case where SKB_MAX_FRAGS would be exceeded. This uses sofware segmentation in the forward path when we hit ipv4 non-DF packets and the outgoing link mtu is too small. Its not perfect, but given the lack of bug reports wrt. GRO fwd being broken this is a rare case anyway. Also its not like this could not be improved later once the dust settles. Acked-by: NHerbert Xu <herbert@gondor.apana.org.au> Reported-by: NMarcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 1月, 2014 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
In the IPv6 forwarding path we are only concerend about the outgoing interface MTU, but also respect locked MTUs on routes. Tunnel provider or IPSEC already have to recheck and if needed send PtB notifications to the sending host in case the data does not fit into the packet with added headers (we only know the final header sizes there, while also using path MTU information). The reason for this change is, that path MTU information can be injected into the kernel via e.g. icmp_err protocol handler without verification of local sockets. As such, this could cause the IPv6 forwarding path to wrongfully emit Packet-too-Big errors and drop IPv6 packets. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: John Heffner <johnwheffner@gmail.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 12月, 2013 2 次提交
-
-
由 Hannes Frederic Sowa 提交于
Sockets marked with IPV6_PMTUDISC_PROBE (or later IPV6_PMTUDISC_INTERFACE) don't respect this setting when the outgoing interface supports UFO. We had the same problem in IPv4, which was fixed in commit daba287b ("ipv4: fix DO and PROBE pmtu mode regarding local fragmentation with UFO/CORK"). Also IPV6_DONTFRAG mode did not care about already corked data, thus it may generate a fragmented frame even if this socket option was specified. It also did not care about the length of the ipv6 header and possible options. In the error path allow the user to receive the pmtu notifications via both, rxpmtu method or error queue. The user may opted in for both, so deliver the notification to both error handlers (the handlers check if the error needs to be enqueued). Also report back consistent pmtu values when sending on an already cork-appended socket. Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Hannes Frederic Sowa 提交于
IPV6_PMTU_INTERFACE is the same as IPV6_PMTU_PROBE for ipv6. Add it nontheless for symmetry with IPv4 sockets. Also drop incoming MTU information if this mode is enabled. The additional bit in ipv6_pinfo just eats in the padding behind the bitfield. There are no changes to the layout of the struct at all. Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 12月, 2013 1 次提交
-
-
由 Eric Dumazet 提交于
ip6_forward() runs from softirq context, we can use the SNMP macros assuming this. Use same indentation for all IP6_INC_STATS_BH() calls. Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 12月, 2013 1 次提交
-
-
由 Steffen Klassert 提交于
FLOWI_FLAG_CAN_SLEEP was used to notify xfrm about the posibility to sleep until the needed states are resolved. This code is gone, so FLOWI_FLAG_CAN_SLEEP is not needed anymore. Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
- 01 12月, 2013 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
IPv6 stats are 64 bits and thus are protected with a seqlock. By not disabling bottom-half we could deadlock here if we don't disable bh and a softirq reentrantly updates the same mib. Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 11月, 2013 1 次提交
-
-
由 Jiri Pirko 提交于
If reassembled packet would fit into outdev MTU, it is not fragmented according the original frag size and it is send as single big packet. The second case is if skb is gso. In that case fragmentation does not happen according to the original frag size. This patch fixes these. Signed-off-by: NJiri Pirko <jiri@resnulli.us> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 11月, 2013 1 次提交
-
-
由 John Stultz 提交于
While enabling lockdep on seqlocks, I ran across the warning below caused by the ipv6 stats being updated in both irq and non-irq context. This patch changes from IP6_INC_STATS_BH to IP6_INC_STATS (suggested by Eric Dumazet) to resolve this problem. [ 11.120383] ================================= [ 11.121024] [ INFO: inconsistent lock state ] [ 11.121663] 3.12.0-rc1+ #68 Not tainted [ 11.122229] --------------------------------- [ 11.122867] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 11.123741] init/4483 [HC0[0]:SC1[3]:HE1:SE0] takes: [ 11.124505] (&stats->syncp.seq#6){+.?...}, at: [<c1ab80c2>] ndisc_send_ns+0xe2/0x130 [ 11.125736] {SOFTIRQ-ON-W} state was registered at: [ 11.126447] [<c10e0eb7>] __lock_acquire+0x5c7/0x1af0 [ 11.127222] [<c10e2996>] lock_acquire+0x96/0xd0 [ 11.127925] [<c1a9a2c3>] write_seqcount_begin+0x33/0x40 [ 11.128766] [<c1a9aa03>] ip6_dst_lookup_tail+0x3a3/0x460 [ 11.129582] [<c1a9e0ce>] ip6_dst_lookup_flow+0x2e/0x80 [ 11.130014] [<c1ad18e0>] ip6_datagram_connect+0x150/0x4e0 [ 11.130014] [<c1a4d0b5>] inet_dgram_connect+0x25/0x70 [ 11.130014] [<c198dd61>] SYSC_connect+0xa1/0xc0 [ 11.130014] [<c198f571>] SyS_connect+0x11/0x20 [ 11.130014] [<c198fe6b>] SyS_socketcall+0x12b/0x300 [ 11.130014] [<c1bbf880>] syscall_call+0x7/0xb [ 11.130014] irq event stamp: 1184 [ 11.130014] hardirqs last enabled at (1184): [<c1086901>] local_bh_enable+0x71/0x110 [ 11.130014] hardirqs last disabled at (1183): [<c10868cd>] local_bh_enable+0x3d/0x110 [ 11.130014] softirqs last enabled at (0): [<c108014d>] copy_process.part.42+0x45d/0x11a0 [ 11.130014] softirqs last disabled at (1147): [<c1086e05>] irq_exit+0xa5/0xb0 [ 11.130014] [ 11.130014] other info that might help us debug this: [ 11.130014] Possible unsafe locking scenario: [ 11.130014] [ 11.130014] CPU0 [ 11.130014] ---- [ 11.130014] lock(&stats->syncp.seq#6); [ 11.130014] <Interrupt> [ 11.130014] lock(&stats->syncp.seq#6); [ 11.130014] [ 11.130014] *** DEADLOCK *** [ 11.130014] [ 11.130014] 3 locks held by init/4483: [ 11.130014] #0: (rcu_read_lock){.+.+..}, at: [<c109363c>] SyS_setpriority+0x4c/0x620 [ 11.130014] #1: (((&ifa->dad_timer))){+.-...}, at: [<c108c1c0>] call_timer_fn+0x0/0xf0 [ 11.130014] #2: (rcu_read_lock){.+.+..}, at: [<c1ab6494>] ndisc_send_skb+0x54/0x5d0 [ 11.130014] [ 11.130014] stack backtrace: [ 11.130014] CPU: 0 PID: 4483 Comm: init Not tainted 3.12.0-rc1+ #68 [ 11.130014] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 11.130014] 00000000 00000000 c55e5c10 c1bb0e71 c57128b0 c55e5c4c c1badf79 c1ec1123 [ 11.130014] c1ec1484 00001183 00000000 00000000 00000001 00000003 00000001 00000000 [ 11.130014] c1ec1484 00000004 c5712dcc 00000000 c55e5c84 c10de492 00000004 c10755f2 [ 11.130014] Call Trace: [ 11.130014] [<c1bb0e71>] dump_stack+0x4b/0x66 [ 11.130014] [<c1badf79>] print_usage_bug+0x1d3/0x1dd [ 11.130014] [<c10de492>] mark_lock+0x282/0x2f0 [ 11.130014] [<c10755f2>] ? kvm_clock_read+0x22/0x30 [ 11.130014] [<c10dd8b0>] ? check_usage_backwards+0x150/0x150 [ 11.130014] [<c10e0e74>] __lock_acquire+0x584/0x1af0 [ 11.130014] [<c10b1baf>] ? sched_clock_cpu+0xef/0x190 [ 11.130014] [<c10de58c>] ? mark_held_locks+0x8c/0xf0 [ 11.130014] [<c10e2996>] lock_acquire+0x96/0xd0 [ 11.130014] [<c1ab80c2>] ? ndisc_send_ns+0xe2/0x130 [ 11.130014] [<c1ab66d3>] ndisc_send_skb+0x293/0x5d0 [ 11.130014] [<c1ab80c2>] ? ndisc_send_ns+0xe2/0x130 [ 11.130014] [<c1ab80c2>] ndisc_send_ns+0xe2/0x130 [ 11.130014] [<c108cc32>] ? mod_timer+0xf2/0x160 [ 11.130014] [<c1aa706e>] ? addrconf_dad_timer+0xce/0x150 [ 11.130014] [<c1aa70aa>] addrconf_dad_timer+0x10a/0x150 [ 11.130014] [<c1aa6fa0>] ? addrconf_dad_completed+0x1c0/0x1c0 [ 11.130014] [<c108c233>] call_timer_fn+0x73/0xf0 [ 11.130014] [<c108c1c0>] ? __internal_add_timer+0xb0/0xb0 [ 11.130014] [<c1aa6fa0>] ? addrconf_dad_completed+0x1c0/0x1c0 [ 11.130014] [<c108c5b1>] run_timer_softirq+0x141/0x1e0 [ 11.130014] [<c1086b20>] ? __do_softirq+0x70/0x1b0 [ 11.130014] [<c1086b70>] __do_softirq+0xc0/0x1b0 [ 11.130014] [<c1086e05>] irq_exit+0xa5/0xb0 [ 11.130014] [<c106cfd5>] smp_apic_timer_interrupt+0x35/0x50 [ 11.130014] [<c1bbfbca>] apic_timer_interrupt+0x32/0x38 [ 11.130014] [<c10936ed>] ? SyS_setpriority+0xfd/0x620 [ 11.130014] [<c10e26c9>] ? lock_release+0x9/0x240 [ 11.130014] [<c10936d7>] ? SyS_setpriority+0xe7/0x620 [ 11.130014] [<c1bbee6d>] ? _raw_read_unlock+0x1d/0x30 [ 11.130014] [<c1093701>] SyS_setpriority+0x111/0x620 [ 11.130014] [<c109363c>] ? SyS_setpriority+0x4c/0x620 [ 11.130014] [<c1bbf880>] syscall_call+0x7/0xb Signed-off-by: NJohn Stultz <john.stultz@linaro.org> Acked-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: "David S. Miller" <davem@davemloft.net> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: James Morris <jmorris@namei.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/1381186321-4906-5-git-send-email-john.stultz@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 22 10月, 2013 1 次提交
-
-
由 Julian Anastasov 提交于
Make sure rt6i_gateway contains nexthop information in all routes returned from lookup or when routes are directly attached to skb for generated ICMP packets. The effect of this patch should be a faster version of rt6_nexthop() and the consideration of local addresses as nexthop. Signed-off-by: NJulian Anastasov <ja@ssi.bg> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 10月, 2013 1 次提交
-
-
由 Jiri Pirko 提交于
Now, if user application does: sendto len<mtu flag MSG_MORE sendto len>mtu flag 0 The skb is not treated as fragmented one because it is not initialized that way. So move the initialization to fix this. introduced by: commit e89e9cf5 "[IPv4/IPv6]: UFO Scatter-gather approach" Signed-off-by: NJiri Pirko <jiri@resnulli.us> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 9月, 2013 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
In the following scenario the socket is corked: If the first UDP packet is larger then the mtu we try to append it to the write queue via ip6_ufo_append_data. A following packet, which is smaller than the mtu would be appended to the already queued up gso-skb via plain ip6_append_data. This causes random memory corruptions. In ip6_ufo_append_data we also have to be careful to not queue up the same skb multiple times. So setup the gso frame only when no first skb is available. This also fixes a shortcoming where we add the current packet's length to cork->length but return early because of a packet > mtu with dontfrag set (instead of sutracting it again). Found with trinity. Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Reported-by: NDmitry Vyukov <dvyukov@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 9月, 2013 1 次提交
-
-
由 Cong Wang 提交于
It will be used the vxlan kernel module. Signed-off-by: NCong Wang <amwang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 8月, 2013 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
Currently we don't initialize skb->protocol when transmitting data via tcp, raw(with and without inclhdr) or udp+ufo or appending data directly to the socket transmit queue (via ip6_append_data). This needs to be done so that we can get the correct mtu in the xfrm layer. Setting of skb->protocol happens only in functions where we also have a transmitting socket and a new skb, so we don't overwrite old values. Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
- 03 7月, 2013 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
If the socket had an IPV6_MTU value set, ip6_append_data_mtu lost track of this when appending the second frame on a corked socket. This results in the following splat: [37598.993962] ------------[ cut here ]------------ [37598.994008] kernel BUG at net/core/skbuff.c:2064! [37598.994008] invalid opcode: 0000 [#1] SMP [37598.994008] Modules linked in: tcp_lp uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media vfat fat usb_storage fuse ebtable_nat xt_CHECKSUM bridge stp llc ipt_MASQUERADE nf_conntrack_netbios_ns nf_conntrack_broadcast ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat +nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi +scsi_transport_iscsi rfcomm bnep iTCO_wdt iTCO_vendor_support snd_hda_codec_conexant arc4 iwldvm mac80211 snd_hda_intel acpi_cpufreq mperf coretemp snd_hda_codec microcode cdc_wdm cdc_acm [37598.994008] snd_hwdep cdc_ether snd_seq snd_seq_device usbnet mii joydev btusb snd_pcm bluetooth i2c_i801 e1000e lpc_ich mfd_core ptp iwlwifi pps_core snd_page_alloc mei cfg80211 snd_timer thinkpad_acpi snd tpm_tis soundcore rfkill tpm tpm_bios vhost_net tun macvtap macvlan kvm_intel kvm uinput binfmt_misc +dm_crypt i915 i2c_algo_bit drm_kms_helper drm i2c_core wmi video [37598.994008] CPU 0 [37598.994008] Pid: 27320, comm: t2 Not tainted 3.9.6-200.fc18.x86_64 #1 LENOVO 27744PG/27744PG [37598.994008] RIP: 0010:[<ffffffff815443a5>] [<ffffffff815443a5>] skb_copy_and_csum_bits+0x325/0x330 [37598.994008] RSP: 0018:ffff88003670da18 EFLAGS: 00010202 [37598.994008] RAX: ffff88018105c018 RBX: 0000000000000004 RCX: 00000000000006c0 [37598.994008] RDX: ffff88018105a6c0 RSI: ffff88018105a000 RDI: ffff8801e1b0aa00 [37598.994008] RBP: ffff88003670da78 R08: 0000000000000000 R09: ffff88018105c040 [37598.994008] R10: ffff8801e1b0aa00 R11: 0000000000000000 R12: 000000000000fff8 [37598.994008] R13: 00000000000004fc R14: 00000000ffff0504 R15: 0000000000000000 [37598.994008] FS: 00007f28eea59740(0000) GS:ffff88023bc00000(0000) knlGS:0000000000000000 [37598.994008] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [37598.994008] CR2: 0000003d935789e0 CR3: 00000000365cb000 CR4: 00000000000407f0 [37598.994008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [37598.994008] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [37598.994008] Process t2 (pid: 27320, threadinfo ffff88003670c000, task ffff88022c162ee0) [37598.994008] Stack: [37598.994008] ffff88022e098a00 ffff88020f973fc0 0000000000000008 00000000000004c8 [37598.994008] ffff88020f973fc0 00000000000004c4 ffff88003670da78 ffff8801e1b0a200 [37598.994008] 0000000000000018 00000000000004c8 ffff88020f973fc0 00000000000004c4 [37598.994008] Call Trace: [37598.994008] [<ffffffff815fc21f>] ip6_append_data+0xccf/0xfe0 [37598.994008] [<ffffffff8158d9f0>] ? ip_copy_metadata+0x1a0/0x1a0 [37598.994008] [<ffffffff81661f66>] ? _raw_spin_lock_bh+0x16/0x40 [37598.994008] [<ffffffff8161548d>] udpv6_sendmsg+0x1ed/0xc10 [37598.994008] [<ffffffff812a2845>] ? sock_has_perm+0x75/0x90 [37598.994008] [<ffffffff815c3693>] inet_sendmsg+0x63/0xb0 [37598.994008] [<ffffffff812a2973>] ? selinux_socket_sendmsg+0x23/0x30 [37598.994008] [<ffffffff8153a450>] sock_sendmsg+0xb0/0xe0 [37598.994008] [<ffffffff810135d1>] ? __switch_to+0x181/0x4a0 [37598.994008] [<ffffffff8153d97d>] sys_sendto+0x12d/0x180 [37598.994008] [<ffffffff810dfb64>] ? __audit_syscall_entry+0x94/0xf0 [37598.994008] [<ffffffff81020ed1>] ? syscall_trace_enter+0x231/0x240 [37598.994008] [<ffffffff8166a7e7>] tracesys+0xdd/0xe2 [37598.994008] Code: fe 07 00 00 48 c7 c7 04 28 a6 81 89 45 a0 4c 89 4d b8 44 89 5d a8 e8 1b ac b1 ff 44 8b 5d a8 4c 8b 4d b8 8b 45 a0 e9 cf fe ff ff <0f> 0b 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 48 [37598.994008] RIP [<ffffffff815443a5>] skb_copy_and_csum_bits+0x325/0x330 [37598.994008] RSP <ffff88003670da18> [37599.007323] ---[ end trace d69f6a17f8ac8eee ]--- While there, also check if path mtu discovery is activated for this socket. The logic was adapted from ip6_append_data when first writing on the corked socket. This bug was introduced with commit 0c183379 ("ipv6: fix incorrect ipsec fragment"). v2: a) Replace IPV6_PMTU_DISC_DO with IPV6_PMTUDISC_PROBE. b) Don't pass ipv6_pinfo to ip6_append_data_mtu (suggestion by Gao feng, thanks!). c) Change mtu to unsigned int, else we get a warning about non-matching types because of the min()-macro type-check. Acked-by: NGao feng <gaofeng@cn.fujitsu.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 6月, 2013 1 次提交
-
-
由 Eric Dumazet 提交于
It's possible to use AF_INET6 sockets and to connect to an IPv4 destination. After this, socket dst cache is a pointer to a rtable, not rt6_info. ip6_sk_dst_check() should check the socket dst cache is IPv6, or else various corruptions/crashes can happen. Dave Jones can reproduce immediate crash with trinity -q -l off -n -c sendmsg -c connect With help from Hannes Frederic Sowa Reported-by: NDave Jones <davej@redhat.com> Reported-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 6月, 2013 1 次提交
-
-
由 YOSHIFUJI Hideaki / 吉藤英明 提交于
Router Alert option is marked in skb. Previously, IP6CB(skb)->ra was set to positive value for such packets. Since commit dd3332bf ("ipv6: Store Router Alert option in IP6CB directly."), IP6SKB_ROUTERALERT is set in IP6CB(skb)->flags, and the value of Router Alert option (in network byte order) is set to IP6CB(skb)->ra for such packets. Multicast forwarding path uses that flag and value, but unicast forwarding path does not use the flag and misuses IP6CB(skb)->ra value. Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 5月, 2013 1 次提交
-
-
由 Eric Dumazet 提交于
commit 0178b695 ("ipv6: Copy cork options in ip6_append_data") added some code duplication and bad error recovery, leading to potential crash in ip6_cork_release() as kfree() could be called with garbage. use kzalloc() to make sure this wont happen. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Neal Cardwell <ncardwell@google.com>
-
- 15 4月, 2013 1 次提交
-
-
由 Daniel Borkmann 提交于
Currently, sock_tx_timestamp() always returns 0. The comment that describes the sock_tx_timestamp() function wrongly says that it returns an error when an invalid argument is passed (from commit 20d49473, ``net: socket infrastructure for SO_TIMESTAMPING''). Make the function void, so that we can also remove all the unneeded if conditions that check for such a _non-existant_ error case in the output path. Signed-off-by: NDaniel Borkmann <dborkman@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 2月, 2013 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
Reported-by: NErik Hugne <erik.hugne@ericsson.com> Cc: Erik Hugne <erik.hugne@ericsson.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-