- 11 3月, 2014 1 次提交
-
-
由 Li RongQing 提交于
Packets which have L2 address different from ours should be already filtered before entering into ip6_forward(). Perform that check at the beginning to avoid processing such packets. Signed-off-by: NLi RongQing <roy.qing.li@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 2月, 2014 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
This option has the same semantic as IP_PMTUDISC_OMIT for IPv4 which got recently introduced. It doesn't honor the path mtu discovered by the host but in contrary to IPV6_PMTUDISC_INTERFACE allows the generation of fragments if the packet size exceeds the MTU of the outgoing interface MTU. Fixes: 93b36cf3 ("ipv6: support IPV6_PMTU_INTERFACE on sockets") Cc: Florian Weimer <fweimer@redhat.com> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 2月, 2014 1 次提交
-
-
由 Florian Westphal 提交于
When using nftables with CONFIG_NETFILTER_XT_TARGET_TRACE=n, we get lots of "TRACE: filter:output:policy:1 IN=..." warnings as several places will leave skb->nf_trace uninitialised. Unlike iptables tracing functionality is not conditional in nftables, so always copy/zero nf_trace setting when nftables is enabled. Move this into __nf_copy() helper. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 14 2月, 2014 1 次提交
-
-
由 Florian Westphal 提交于
Marcelo Ricardo Leitner reported problems when the forwarding link path has a lower mtu than the incoming one if the inbound interface supports GRO. Given: Host <mtu1500> R1 <mtu1200> R2 Host sends tcp stream which is routed via R1 and R2. R1 performs GRO. In this case, the kernel will fail to send ICMP fragmentation needed messages (or pkt too big for ipv6), as GSO packets currently bypass dstmtu checks in forward path. Instead, Linux tries to send out packets exceeding the mtu. When locking route MTU on Host (i.e., no ipv4 DF bit set), R1 does not fragment the packets when forwarding, and again tries to send out packets exceeding R1-R2 link mtu. This alters the forwarding dstmtu checks to take the individual gso segment lengths into account. For ipv6, we send out pkt too big error for gso if the individual segments are too big. For ipv4, we either send icmp fragmentation needed, or, if the DF bit is not set, perform software segmentation and let the output path create fragments when the packet is leaving the machine. It is not 100% correct as the error message will contain the headers of the GRO skb instead of the original/segmented one, but it seems to work fine in my (limited) tests. Eric Dumazet suggested to simply shrink mss via ->gso_size to avoid sofware segmentation. However it turns out that skb_segment() assumes skb nr_frags is related to mss size so we would BUG there. I don't want to mess with it considering Herbert and Eric disagree on what the correct behavior should be. Hannes Frederic Sowa notes that when we would shrink gso_size skb_segment would then also need to deal with the case where SKB_MAX_FRAGS would be exceeded. This uses sofware segmentation in the forward path when we hit ipv4 non-DF packets and the outgoing link mtu is too small. Its not perfect, but given the lack of bug reports wrt. GRO fwd being broken this is a rare case anyway. Also its not like this could not be improved later once the dust settles. Acked-by: NHerbert Xu <herbert@gondor.apana.org.au> Reported-by: NMarcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 1月, 2014 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
In the IPv6 forwarding path we are only concerend about the outgoing interface MTU, but also respect locked MTUs on routes. Tunnel provider or IPSEC already have to recheck and if needed send PtB notifications to the sending host in case the data does not fit into the packet with added headers (we only know the final header sizes there, while also using path MTU information). The reason for this change is, that path MTU information can be injected into the kernel via e.g. icmp_err protocol handler without verification of local sockets. As such, this could cause the IPv6 forwarding path to wrongfully emit Packet-too-Big errors and drop IPv6 packets. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: John Heffner <johnwheffner@gmail.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 12月, 2013 2 次提交
-
-
由 Hannes Frederic Sowa 提交于
Sockets marked with IPV6_PMTUDISC_PROBE (or later IPV6_PMTUDISC_INTERFACE) don't respect this setting when the outgoing interface supports UFO. We had the same problem in IPv4, which was fixed in commit daba287b ("ipv4: fix DO and PROBE pmtu mode regarding local fragmentation with UFO/CORK"). Also IPV6_DONTFRAG mode did not care about already corked data, thus it may generate a fragmented frame even if this socket option was specified. It also did not care about the length of the ipv6 header and possible options. In the error path allow the user to receive the pmtu notifications via both, rxpmtu method or error queue. The user may opted in for both, so deliver the notification to both error handlers (the handlers check if the error needs to be enqueued). Also report back consistent pmtu values when sending on an already cork-appended socket. Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Hannes Frederic Sowa 提交于
IPV6_PMTU_INTERFACE is the same as IPV6_PMTU_PROBE for ipv6. Add it nontheless for symmetry with IPv4 sockets. Also drop incoming MTU information if this mode is enabled. The additional bit in ipv6_pinfo just eats in the padding behind the bitfield. There are no changes to the layout of the struct at all. Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 12月, 2013 1 次提交
-
-
由 Eric Dumazet 提交于
ip6_forward() runs from softirq context, we can use the SNMP macros assuming this. Use same indentation for all IP6_INC_STATS_BH() calls. Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 12月, 2013 1 次提交
-
-
由 Steffen Klassert 提交于
FLOWI_FLAG_CAN_SLEEP was used to notify xfrm about the posibility to sleep until the needed states are resolved. This code is gone, so FLOWI_FLAG_CAN_SLEEP is not needed anymore. Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
- 01 12月, 2013 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
IPv6 stats are 64 bits and thus are protected with a seqlock. By not disabling bottom-half we could deadlock here if we don't disable bh and a softirq reentrantly updates the same mib. Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 11月, 2013 1 次提交
-
-
由 Jiri Pirko 提交于
If reassembled packet would fit into outdev MTU, it is not fragmented according the original frag size and it is send as single big packet. The second case is if skb is gso. In that case fragmentation does not happen according to the original frag size. This patch fixes these. Signed-off-by: NJiri Pirko <jiri@resnulli.us> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 11月, 2013 1 次提交
-
-
由 John Stultz 提交于
While enabling lockdep on seqlocks, I ran across the warning below caused by the ipv6 stats being updated in both irq and non-irq context. This patch changes from IP6_INC_STATS_BH to IP6_INC_STATS (suggested by Eric Dumazet) to resolve this problem. [ 11.120383] ================================= [ 11.121024] [ INFO: inconsistent lock state ] [ 11.121663] 3.12.0-rc1+ #68 Not tainted [ 11.122229] --------------------------------- [ 11.122867] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 11.123741] init/4483 [HC0[0]:SC1[3]:HE1:SE0] takes: [ 11.124505] (&stats->syncp.seq#6){+.?...}, at: [<c1ab80c2>] ndisc_send_ns+0xe2/0x130 [ 11.125736] {SOFTIRQ-ON-W} state was registered at: [ 11.126447] [<c10e0eb7>] __lock_acquire+0x5c7/0x1af0 [ 11.127222] [<c10e2996>] lock_acquire+0x96/0xd0 [ 11.127925] [<c1a9a2c3>] write_seqcount_begin+0x33/0x40 [ 11.128766] [<c1a9aa03>] ip6_dst_lookup_tail+0x3a3/0x460 [ 11.129582] [<c1a9e0ce>] ip6_dst_lookup_flow+0x2e/0x80 [ 11.130014] [<c1ad18e0>] ip6_datagram_connect+0x150/0x4e0 [ 11.130014] [<c1a4d0b5>] inet_dgram_connect+0x25/0x70 [ 11.130014] [<c198dd61>] SYSC_connect+0xa1/0xc0 [ 11.130014] [<c198f571>] SyS_connect+0x11/0x20 [ 11.130014] [<c198fe6b>] SyS_socketcall+0x12b/0x300 [ 11.130014] [<c1bbf880>] syscall_call+0x7/0xb [ 11.130014] irq event stamp: 1184 [ 11.130014] hardirqs last enabled at (1184): [<c1086901>] local_bh_enable+0x71/0x110 [ 11.130014] hardirqs last disabled at (1183): [<c10868cd>] local_bh_enable+0x3d/0x110 [ 11.130014] softirqs last enabled at (0): [<c108014d>] copy_process.part.42+0x45d/0x11a0 [ 11.130014] softirqs last disabled at (1147): [<c1086e05>] irq_exit+0xa5/0xb0 [ 11.130014] [ 11.130014] other info that might help us debug this: [ 11.130014] Possible unsafe locking scenario: [ 11.130014] [ 11.130014] CPU0 [ 11.130014] ---- [ 11.130014] lock(&stats->syncp.seq#6); [ 11.130014] <Interrupt> [ 11.130014] lock(&stats->syncp.seq#6); [ 11.130014] [ 11.130014] *** DEADLOCK *** [ 11.130014] [ 11.130014] 3 locks held by init/4483: [ 11.130014] #0: (rcu_read_lock){.+.+..}, at: [<c109363c>] SyS_setpriority+0x4c/0x620 [ 11.130014] #1: (((&ifa->dad_timer))){+.-...}, at: [<c108c1c0>] call_timer_fn+0x0/0xf0 [ 11.130014] #2: (rcu_read_lock){.+.+..}, at: [<c1ab6494>] ndisc_send_skb+0x54/0x5d0 [ 11.130014] [ 11.130014] stack backtrace: [ 11.130014] CPU: 0 PID: 4483 Comm: init Not tainted 3.12.0-rc1+ #68 [ 11.130014] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 11.130014] 00000000 00000000 c55e5c10 c1bb0e71 c57128b0 c55e5c4c c1badf79 c1ec1123 [ 11.130014] c1ec1484 00001183 00000000 00000000 00000001 00000003 00000001 00000000 [ 11.130014] c1ec1484 00000004 c5712dcc 00000000 c55e5c84 c10de492 00000004 c10755f2 [ 11.130014] Call Trace: [ 11.130014] [<c1bb0e71>] dump_stack+0x4b/0x66 [ 11.130014] [<c1badf79>] print_usage_bug+0x1d3/0x1dd [ 11.130014] [<c10de492>] mark_lock+0x282/0x2f0 [ 11.130014] [<c10755f2>] ? kvm_clock_read+0x22/0x30 [ 11.130014] [<c10dd8b0>] ? check_usage_backwards+0x150/0x150 [ 11.130014] [<c10e0e74>] __lock_acquire+0x584/0x1af0 [ 11.130014] [<c10b1baf>] ? sched_clock_cpu+0xef/0x190 [ 11.130014] [<c10de58c>] ? mark_held_locks+0x8c/0xf0 [ 11.130014] [<c10e2996>] lock_acquire+0x96/0xd0 [ 11.130014] [<c1ab80c2>] ? ndisc_send_ns+0xe2/0x130 [ 11.130014] [<c1ab66d3>] ndisc_send_skb+0x293/0x5d0 [ 11.130014] [<c1ab80c2>] ? ndisc_send_ns+0xe2/0x130 [ 11.130014] [<c1ab80c2>] ndisc_send_ns+0xe2/0x130 [ 11.130014] [<c108cc32>] ? mod_timer+0xf2/0x160 [ 11.130014] [<c1aa706e>] ? addrconf_dad_timer+0xce/0x150 [ 11.130014] [<c1aa70aa>] addrconf_dad_timer+0x10a/0x150 [ 11.130014] [<c1aa6fa0>] ? addrconf_dad_completed+0x1c0/0x1c0 [ 11.130014] [<c108c233>] call_timer_fn+0x73/0xf0 [ 11.130014] [<c108c1c0>] ? __internal_add_timer+0xb0/0xb0 [ 11.130014] [<c1aa6fa0>] ? addrconf_dad_completed+0x1c0/0x1c0 [ 11.130014] [<c108c5b1>] run_timer_softirq+0x141/0x1e0 [ 11.130014] [<c1086b20>] ? __do_softirq+0x70/0x1b0 [ 11.130014] [<c1086b70>] __do_softirq+0xc0/0x1b0 [ 11.130014] [<c1086e05>] irq_exit+0xa5/0xb0 [ 11.130014] [<c106cfd5>] smp_apic_timer_interrupt+0x35/0x50 [ 11.130014] [<c1bbfbca>] apic_timer_interrupt+0x32/0x38 [ 11.130014] [<c10936ed>] ? SyS_setpriority+0xfd/0x620 [ 11.130014] [<c10e26c9>] ? lock_release+0x9/0x240 [ 11.130014] [<c10936d7>] ? SyS_setpriority+0xe7/0x620 [ 11.130014] [<c1bbee6d>] ? _raw_read_unlock+0x1d/0x30 [ 11.130014] [<c1093701>] SyS_setpriority+0x111/0x620 [ 11.130014] [<c109363c>] ? SyS_setpriority+0x4c/0x620 [ 11.130014] [<c1bbf880>] syscall_call+0x7/0xb Signed-off-by: NJohn Stultz <john.stultz@linaro.org> Acked-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: "David S. Miller" <davem@davemloft.net> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: James Morris <jmorris@namei.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/1381186321-4906-5-git-send-email-john.stultz@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 22 10月, 2013 1 次提交
-
-
由 Julian Anastasov 提交于
Make sure rt6i_gateway contains nexthop information in all routes returned from lookup or when routes are directly attached to skb for generated ICMP packets. The effect of this patch should be a faster version of rt6_nexthop() and the consideration of local addresses as nexthop. Signed-off-by: NJulian Anastasov <ja@ssi.bg> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 10月, 2013 1 次提交
-
-
由 Jiri Pirko 提交于
Now, if user application does: sendto len<mtu flag MSG_MORE sendto len>mtu flag 0 The skb is not treated as fragmented one because it is not initialized that way. So move the initialization to fix this. introduced by: commit e89e9cf5 "[IPv4/IPv6]: UFO Scatter-gather approach" Signed-off-by: NJiri Pirko <jiri@resnulli.us> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 9月, 2013 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
In the following scenario the socket is corked: If the first UDP packet is larger then the mtu we try to append it to the write queue via ip6_ufo_append_data. A following packet, which is smaller than the mtu would be appended to the already queued up gso-skb via plain ip6_append_data. This causes random memory corruptions. In ip6_ufo_append_data we also have to be careful to not queue up the same skb multiple times. So setup the gso frame only when no first skb is available. This also fixes a shortcoming where we add the current packet's length to cork->length but return early because of a packet > mtu with dontfrag set (instead of sutracting it again). Found with trinity. Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Reported-by: NDmitry Vyukov <dvyukov@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 9月, 2013 1 次提交
-
-
由 Cong Wang 提交于
It will be used the vxlan kernel module. Signed-off-by: NCong Wang <amwang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 8月, 2013 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
Currently we don't initialize skb->protocol when transmitting data via tcp, raw(with and without inclhdr) or udp+ufo or appending data directly to the socket transmit queue (via ip6_append_data). This needs to be done so that we can get the correct mtu in the xfrm layer. Setting of skb->protocol happens only in functions where we also have a transmitting socket and a new skb, so we don't overwrite old values. Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
- 03 7月, 2013 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
If the socket had an IPV6_MTU value set, ip6_append_data_mtu lost track of this when appending the second frame on a corked socket. This results in the following splat: [37598.993962] ------------[ cut here ]------------ [37598.994008] kernel BUG at net/core/skbuff.c:2064! [37598.994008] invalid opcode: 0000 [#1] SMP [37598.994008] Modules linked in: tcp_lp uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media vfat fat usb_storage fuse ebtable_nat xt_CHECKSUM bridge stp llc ipt_MASQUERADE nf_conntrack_netbios_ns nf_conntrack_broadcast ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat +nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi +scsi_transport_iscsi rfcomm bnep iTCO_wdt iTCO_vendor_support snd_hda_codec_conexant arc4 iwldvm mac80211 snd_hda_intel acpi_cpufreq mperf coretemp snd_hda_codec microcode cdc_wdm cdc_acm [37598.994008] snd_hwdep cdc_ether snd_seq snd_seq_device usbnet mii joydev btusb snd_pcm bluetooth i2c_i801 e1000e lpc_ich mfd_core ptp iwlwifi pps_core snd_page_alloc mei cfg80211 snd_timer thinkpad_acpi snd tpm_tis soundcore rfkill tpm tpm_bios vhost_net tun macvtap macvlan kvm_intel kvm uinput binfmt_misc +dm_crypt i915 i2c_algo_bit drm_kms_helper drm i2c_core wmi video [37598.994008] CPU 0 [37598.994008] Pid: 27320, comm: t2 Not tainted 3.9.6-200.fc18.x86_64 #1 LENOVO 27744PG/27744PG [37598.994008] RIP: 0010:[<ffffffff815443a5>] [<ffffffff815443a5>] skb_copy_and_csum_bits+0x325/0x330 [37598.994008] RSP: 0018:ffff88003670da18 EFLAGS: 00010202 [37598.994008] RAX: ffff88018105c018 RBX: 0000000000000004 RCX: 00000000000006c0 [37598.994008] RDX: ffff88018105a6c0 RSI: ffff88018105a000 RDI: ffff8801e1b0aa00 [37598.994008] RBP: ffff88003670da78 R08: 0000000000000000 R09: ffff88018105c040 [37598.994008] R10: ffff8801e1b0aa00 R11: 0000000000000000 R12: 000000000000fff8 [37598.994008] R13: 00000000000004fc R14: 00000000ffff0504 R15: 0000000000000000 [37598.994008] FS: 00007f28eea59740(0000) GS:ffff88023bc00000(0000) knlGS:0000000000000000 [37598.994008] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [37598.994008] CR2: 0000003d935789e0 CR3: 00000000365cb000 CR4: 00000000000407f0 [37598.994008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [37598.994008] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [37598.994008] Process t2 (pid: 27320, threadinfo ffff88003670c000, task ffff88022c162ee0) [37598.994008] Stack: [37598.994008] ffff88022e098a00 ffff88020f973fc0 0000000000000008 00000000000004c8 [37598.994008] ffff88020f973fc0 00000000000004c4 ffff88003670da78 ffff8801e1b0a200 [37598.994008] 0000000000000018 00000000000004c8 ffff88020f973fc0 00000000000004c4 [37598.994008] Call Trace: [37598.994008] [<ffffffff815fc21f>] ip6_append_data+0xccf/0xfe0 [37598.994008] [<ffffffff8158d9f0>] ? ip_copy_metadata+0x1a0/0x1a0 [37598.994008] [<ffffffff81661f66>] ? _raw_spin_lock_bh+0x16/0x40 [37598.994008] [<ffffffff8161548d>] udpv6_sendmsg+0x1ed/0xc10 [37598.994008] [<ffffffff812a2845>] ? sock_has_perm+0x75/0x90 [37598.994008] [<ffffffff815c3693>] inet_sendmsg+0x63/0xb0 [37598.994008] [<ffffffff812a2973>] ? selinux_socket_sendmsg+0x23/0x30 [37598.994008] [<ffffffff8153a450>] sock_sendmsg+0xb0/0xe0 [37598.994008] [<ffffffff810135d1>] ? __switch_to+0x181/0x4a0 [37598.994008] [<ffffffff8153d97d>] sys_sendto+0x12d/0x180 [37598.994008] [<ffffffff810dfb64>] ? __audit_syscall_entry+0x94/0xf0 [37598.994008] [<ffffffff81020ed1>] ? syscall_trace_enter+0x231/0x240 [37598.994008] [<ffffffff8166a7e7>] tracesys+0xdd/0xe2 [37598.994008] Code: fe 07 00 00 48 c7 c7 04 28 a6 81 89 45 a0 4c 89 4d b8 44 89 5d a8 e8 1b ac b1 ff 44 8b 5d a8 4c 8b 4d b8 8b 45 a0 e9 cf fe ff ff <0f> 0b 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 48 [37598.994008] RIP [<ffffffff815443a5>] skb_copy_and_csum_bits+0x325/0x330 [37598.994008] RSP <ffff88003670da18> [37599.007323] ---[ end trace d69f6a17f8ac8eee ]--- While there, also check if path mtu discovery is activated for this socket. The logic was adapted from ip6_append_data when first writing on the corked socket. This bug was introduced with commit 0c183379 ("ipv6: fix incorrect ipsec fragment"). v2: a) Replace IPV6_PMTU_DISC_DO with IPV6_PMTUDISC_PROBE. b) Don't pass ipv6_pinfo to ip6_append_data_mtu (suggestion by Gao feng, thanks!). c) Change mtu to unsigned int, else we get a warning about non-matching types because of the min()-macro type-check. Acked-by: NGao feng <gaofeng@cn.fujitsu.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 6月, 2013 1 次提交
-
-
由 Eric Dumazet 提交于
It's possible to use AF_INET6 sockets and to connect to an IPv4 destination. After this, socket dst cache is a pointer to a rtable, not rt6_info. ip6_sk_dst_check() should check the socket dst cache is IPv6, or else various corruptions/crashes can happen. Dave Jones can reproduce immediate crash with trinity -q -l off -n -c sendmsg -c connect With help from Hannes Frederic Sowa Reported-by: NDave Jones <davej@redhat.com> Reported-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 6月, 2013 1 次提交
-
-
由 YOSHIFUJI Hideaki / 吉藤英明 提交于
Router Alert option is marked in skb. Previously, IP6CB(skb)->ra was set to positive value for such packets. Since commit dd3332bf ("ipv6: Store Router Alert option in IP6CB directly."), IP6SKB_ROUTERALERT is set in IP6CB(skb)->flags, and the value of Router Alert option (in network byte order) is set to IP6CB(skb)->ra for such packets. Multicast forwarding path uses that flag and value, but unicast forwarding path does not use the flag and misuses IP6CB(skb)->ra value. Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 5月, 2013 1 次提交
-
-
由 Eric Dumazet 提交于
commit 0178b695 ("ipv6: Copy cork options in ip6_append_data") added some code duplication and bad error recovery, leading to potential crash in ip6_cork_release() as kfree() could be called with garbage. use kzalloc() to make sure this wont happen. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Neal Cardwell <ncardwell@google.com>
-
- 15 4月, 2013 1 次提交
-
-
由 Daniel Borkmann 提交于
Currently, sock_tx_timestamp() always returns 0. The comment that describes the sock_tx_timestamp() function wrongly says that it returns an error when an invalid argument is passed (from commit 20d49473, ``net: socket infrastructure for SO_TIMESTAMPING''). Make the function void, so that we can also remove all the unneeded if conditions that check for such a _non-existant_ error case in the output path. Signed-off-by: NDaniel Borkmann <dborkman@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 2月, 2013 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
Reported-by: NErik Hugne <erik.hugne@ericsson.com> Cc: Erik Hugne <erik.hugne@ericsson.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 2月, 2013 1 次提交
-
-
由 Steffen Klassert 提交于
Calling icmpv6_send() on a local message size error leads to an incorrect update of the path mtu in the case when IPsec is used. So use ipv6_local_error() instead to notify the socket about the error. Reported-by: NJiri Bohac <jbohac@suse.cz> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 1月, 2013 1 次提交
-
-
由 Cong Wang 提交于
It is declared in: include/net/ip6_route.h:187:int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *)); and net/ip6_route.h is already included. Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: NCong Wang <amwang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 1月, 2013 1 次提交
-
-
由 YOSHIFUJI Hideaki / 吉藤英明 提交于
- move ip6_nd_hdr() to its users' source files. In net/ipv6/mcast.c, it will be called ip6_mc_hdr(). - make return type to void since this function never fails. Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 1月, 2013 2 次提交
-
-
由 YOSHIFUJI Hideaki / 吉藤英明 提交于
If neigh is not found, create new one. Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 YOSHIFUJI Hideaki / 吉藤英明 提交于
Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 1月, 2013 1 次提交
-
-
由 Romain KUNTZ 提交于
Commit 299b0767 (ipv6: Fix IPsec slowpath fragmentation problem) has introduced a error in the header length calculation that provokes corrupted packets when non-fragmentable extensions headers (Destination Option or Routing Header Type 2) are used. rt->rt6i_nfheader_len is the length of the non-fragmentable extension header, and it should be substracted to rt->dst.header_len, and not to exthdrlen, as it was done before commit 299b0767. This patch reverts to the original and correct behavior. It has been successfully tested with and without IPsec on packets that include non-fragmentable extensions headers. Signed-off-by: NRomain Kuntz <r.kuntz@ipflavors.com> Acked-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 1月, 2013 1 次提交
-
-
由 YOSHIFUJI Hideaki / 吉藤英明 提交于
This is not only for readability but also for optimization. What we do here is to build the 32bit word at the beginning of the ipv6 header (the "ip6_flow" virtual member of struct ip6_hdr in RFC3542) and we do not need to read the tclass portion of the target buffer. Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 11月, 2012 1 次提交
-
-
由 Vlad Yasevich 提交于
UDP offload needs some additional functions to be in the static kernel for it work correclty. Move those functions into the core. Signed-off-by: NVlad Yasevich <vyasevic@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 11月, 2012 1 次提交
-
-
由 Amerigo Wang 提交于
As suggested by Eric, we could introduce a helper function for ipv6 too, to avoid checking if rt is NULL before dst_release(). Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: NCong Wang <amwang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 11月, 2012 1 次提交
-
-
由 Amerigo Wang 提交于
#if defined(CONFIG_FOO) || defined(CONFIG_FOO_MODULE) can be replaced by #if IS_ENABLED(CONFIG_FOO) Cc: David S. Miller <davem@davemloft.net> Signed-off-by: NCong Wang <amwang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 9月, 2012 1 次提交
-
-
由 Eric Dumazet 提交于
We currently use a per socket order-0 page cache for tcp_sendmsg() operations. This page is used to build fragments for skbs. Its done to increase probability of coalescing small write() into single segments in skbs still in write queue (not yet sent) But it wastes a lot of memory for applications handling many mostly idle sockets, since each socket holds one page in sk->sk_sndmsg_page Its also quite inefficient to build TSO 64KB packets, because we need about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit page allocator more than wanted. This patch adds a per task frag allocator and uses bigger pages, if available. An automatic fallback is done in case of memory pressure. (up to 32768 bytes per frag, thats order-3 pages on x86) This increases TCP stream performance by 20% on loopback device, but also benefits on other network devices, since 8x less frags are mapped on transmit and unmapped on tx completion. Alexander Duyck mentioned a probable performance win on systems with IOMMU enabled. Its possible some SG enabled hardware cant cope with bigger fragments, but their ndo_start_xmit() should already handle this, splitting a fragment in sub fragments, since some arches have PAGE_SIZE=65536 Successfully tested on various ethernet devices. (ixgbe, igb, bnx2x, tg3, mellanox mlx4) Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Vijay Subramanian <subramanian.vijay@gmail.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: NVijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 9月, 2012 1 次提交
-
-
由 Amerigo Wang 提交于
After this commit: commit 97cac082 Author: David S. Miller <davem@davemloft.net> Date: Mon Jul 2 22:43:47 2012 -0700 ipv6: Store route neighbour in rt6_info struct. we no longer use RCU to protect route neighbour. Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: NCong Wang <amwang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 8月, 2012 1 次提交
-
-
由 Patrick McHardy 提交于
The IPv6 conntrack fragmentation currently has a couple of shortcomings. Fragmentes are collected in PREROUTING/OUTPUT, are defragmented, the defragmented packet is then passed to conntrack, the resulting conntrack information is attached to each original fragment and the fragments then continue their way through the stack. Helper invocation occurs in the POSTROUTING hook, at which point only the original fragments are available. The result of this is that fragmented packets are never passed to helpers. This patch improves the situation in the following way: - If a reassembled packet belongs to a connection that has a helper assigned, the reassembled packet is passed through the stack instead of the original fragments. - During defragmentation, the largest received fragment size is stored. On output, the packet is refragmented if required. If the largest received fragment size exceeds the outgoing MTU, a "packet too big" message is generated, thus behaving as if the original fragments were passed through the stack from an outside point of view. - The ipv6_helper() hook function can't receive fragments anymore for connections using a helper, so it is switched to use ipv6_skip_exthdr() instead of the netfilter specific nf_ct_ipv6_skip_exthdr() and the reassembled packets are passed to connection tracking helpers. The result of this is that we can properly track fragmented packets, but still generate ICMPv6 Packet too big messages if we would have before. This patch is also required as a precondition for IPv6 NAT, where NAT helpers might enlarge packets up to a point that they require fragmentation. In that case we can't generate Packet too big messages since the proper MTU can't be calculated in all cases (f.i. when changing textual representation of a variable amount of addresses), so the packet is transparently fragmented iff the original packet or fragments would have fit the outgoing MTU. IPVS parts by Jesper Dangaard Brouer <brouer@redhat.com>. Signed-off-by: NPatrick McHardy <kaber@trash.net>
-
- 11 7月, 2012 1 次提交
-
-
由 David S. Miller 提交于
Only use it in the absolutely required cases: 1) COW'ing metrics 2) ipv4 PMTU 3) ipv4 redirects Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 7月, 2012 1 次提交
-
-
由 Eric Dumazet 提交于
Fix a bug in ip6_dst_lookup_tail(), where typeof(dst) is "struct dst_entry **", not "struct dst_entry *" Reported-by: NFengguang Wu <wfg@linux.intel.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 7月, 2012 2 次提交
-
-
由 David S. Miller 提交于
This makes for a simplified conversion away from dst_get_neighbour*(). All code outside of ipv6 will use neigh lookups via dst_neigh_lookup*(). Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
When a dst_confirm() happens, mark the confirmation as pending in the dst. Then on the next packet out, when we have the neigh in-hand, do the update. This removes the dependency in dst_confirm() of dst's having an attached neigh. While we're here, remove the explicit 'dst' NULL check, all except 2 or 3 call sites ensure it's not NULL. So just fix those cases up. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-