1. 11 Mar, 2014 1 commit
  2. 27 Feb, 2014 1 commit
  3. 17 Feb, 2014 1 commit
  4. 14 Feb, 2014 1 commit
    • F
      net: ip, ipv6: handle gso skbs in forwarding path · fe6cc55f
      Committed by Florian Westphal
      Marcelo Ricardo Leitner reported problems when the forwarding link path
      has a lower mtu than the incoming one if the inbound interface supports GRO.
      
      Given:
      Host <mtu1500> R1 <mtu1200> R2
      
      Host sends tcp stream which is routed via R1 and R2.  R1 performs GRO.
      
      In this case, the kernel will fail to send ICMP fragmentation needed
      messages (or pkt too big for ipv6), as GSO packets currently bypass dstmtu
      checks in forward path. Instead, Linux tries to send out packets exceeding
      the mtu.
      
      When locking route MTU on Host (i.e., no ipv4 DF bit set), R1 does
      not fragment the packets when forwarding, and again tries to send out
      packets exceeding R1-R2 link mtu.
      
      This alters the forwarding dstmtu checks to take the individual gso
      segment lengths into account.
      
      For ipv6, we send out pkt too big error for gso if the individual
      segments are too big.
      
      For ipv4, we either send icmp fragmentation needed, or, if the DF bit
      is not set, perform software segmentation and let the output path
      create fragments when the packet is leaving the machine.
      It is not 100% correct as the error message will contain the headers of
      the GRO skb instead of the original/segmented one, but it seems to
      work fine in my (limited) tests.
      
      Eric Dumazet suggested simply shrinking the mss via ->gso_size to avoid
      software segmentation.
      
      However it turns out that skb_segment() assumes skb nr_frags is related
      to mss size so we would BUG there.  I don't want to mess with it considering
      Herbert and Eric disagree on what the correct behavior should be.
      
      Hannes Frederic Sowa notes that when we would shrink gso_size
      skb_segment would then also need to deal with the case where
      SKB_MAX_FRAGS would be exceeded.
      
      This uses software segmentation in the forward path when we hit ipv4
      non-DF packets and the outgoing link mtu is too small.  It's not perfect,
      but given the lack of bug reports wrt. GRO fwd being broken this is a
      rare case anyway.  Also, it's not as if this could not be improved later
      once the dust settles.
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Reported-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fe6cc55f
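The forwarding decision described in this commit can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the kernel code: `struct fwd_pkt`, `enum fwd_action` and `ipv4_forward_check()` are hypothetical names, and the model covers only the IPv4 case from the message (per-segment length checked against the MTU, ICMP when DF is set, software segmentation otherwise).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a packet in the IPv4 forward path. */
struct fwd_pkt {
    size_t len;        /* total length, possibly a large GRO/GSO aggregate */
    size_t gso_seglen; /* network-layer length of one GSO segment, 0 if not GSO */
    bool df;           /* IPv4 Don't Fragment bit */
};

enum fwd_action {
    FWD_PASS,                  /* fits the outgoing MTU, forward unchanged */
    FWD_ICMP_FRAG_NEEDED,      /* DF set and too big: report back to sender */
    FWD_SEGMENT_THEN_FRAGMENT, /* GSO, no DF: segment in software first */
    FWD_FRAGMENT               /* plain packet, no DF: output path fragments */
};

/* For GSO packets, compare the individual segment length with the MTU
 * instead of the aggregate length. */
static enum fwd_action ipv4_forward_check(const struct fwd_pkt *p, size_t mtu)
{
    bool is_gso = p->gso_seglen != 0;
    size_t wire_len = is_gso ? p->gso_seglen : p->len;

    if (wire_len <= mtu)
        return FWD_PASS;
    if (p->df)
        return FWD_ICMP_FRAG_NEEDED;
    return is_gso ? FWD_SEGMENT_THEN_FRAGMENT : FWD_FRAGMENT;
}
```

In the Host/R1/R2 scenario above, a GRO aggregate built from 1500-byte segments and forwarded onto the 1200-byte link would map to `FWD_ICMP_FRAG_NEEDED` with DF set and to `FWD_SEGMENT_THEN_FRAGMENT` without it.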
  5. 14 Jan, 2014 1 commit
    • H
      ipv6: introduce ip6_dst_mtu_forward and protect forwarding path with it · 0954cf9c
      Committed by Hannes Frederic Sowa
      In the IPv6 forwarding path we are only concerned about the outgoing
      interface MTU, but also respect locked MTUs on routes. Tunnel provider
      or IPSEC already have to recheck and if needed send PtB notifications
      to the sending host in case the data does not fit into the packet with
      added headers (we only know the final header sizes there, while also
      using path MTU information).
      
      The reason for this change is that path MTU information can be injected
      into the kernel via e.g. icmp_err protocol handler without verification
      of local sockets. As such, this could cause the IPv6 forwarding path to
      wrongfully emit Packet-too-Big errors and drop IPv6 packets.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: John Heffner <johnwheffner@gmail.com>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0954cf9c
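A minimal model of the MTU selection this commit describes, assuming a simplified route representation. `struct route_model` and `forward_mtu()` are hypothetical names for illustration; falling back to the IPv6 minimum MTU of 1280 when no interface MTU is known is an assumption of this sketch and is not stated in the message.

```c
#include <stdbool.h>

#define MODEL_IPV6_MIN_MTU 1280u /* minimum link MTU required by IPv6 */

/* Hypothetical, simplified route state. */
struct route_model {
    bool mtu_locked;        /* administrator locked the MTU on this route */
    unsigned int route_mtu; /* route MTU metric, may hold a learned path MTU */
    unsigned int dev_mtu;   /* MTU of the outgoing interface, 0 if unknown */
};

/* Forwarding ignores learned path MTU values, which may have been injected,
 * and honours only a locked route MTU or the outgoing interface MTU. */
static unsigned int forward_mtu(const struct route_model *rt)
{
    if (rt->mtu_locked && rt->route_mtu)
        return rt->route_mtu;
    return rt->dev_mtu ? rt->dev_mtu : MODEL_IPV6_MIN_MTU;
}
```

With this rule, a spoofed Packet-too-Big that lowers the unlocked route metric to 1280 does not change what the forwarding path uses on a 1500-byte interface.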
  6. 19 Dec, 2013 2 commits
    • H
      ipv6: pmtudisc setting not respected with UFO/CORK · 4df98e76
      Committed by Hannes Frederic Sowa
      Sockets marked with IPV6_PMTUDISC_PROBE (or later IPV6_PMTUDISC_INTERFACE)
      don't respect this setting when the outgoing interface supports UFO.
      
      We had the same problem in IPv4, which was fixed in commit
      daba287b ("ipv4: fix DO and PROBE pmtu
      mode regarding local fragmentation with UFO/CORK").
      
      Also IPV6_DONTFRAG mode did not care about already corked data, thus
      it may generate a fragmented frame even if this socket option was
      specified. It also did not care about the length of the ipv6 header and
      possible options.
      
      In the error path, allow the user to receive the pmtu notifications via
      both the rxpmtu method and the error queue. The user may have opted in for
      both, so deliver the notification to both error handlers (the handlers check
      if the error needs to be enqueued).
      
      Also report back consistent pmtu values when sending on an already
      cork-appended socket.
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4df98e76
    • H
      ipv6: support IPV6_PMTU_INTERFACE on sockets · 93b36cf3
      Committed by Hannes Frederic Sowa
      IPV6_PMTU_INTERFACE is the same as IPV6_PMTU_PROBE for ipv6. Add it
      nonetheless for symmetry with IPv4 sockets. Also drop incoming MTU
      information if this mode is enabled.
      
      The additional bit in ipv6_pinfo just eats in the padding behind the
      bitfield. There are no changes to the layout of the struct at all.
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      93b36cf3
  7. 07 Dec, 2013 1 commit
  8. 06 Dec, 2013 1 commit
  9. 01 Dec, 2013 1 commit
  10. 11 Nov, 2013 1 commit
  11. 06 Nov, 2013 1 commit
    • J
      ipv6: Fix possible ipv6 seqlock deadlock · 5ac68e7c
      Committed by John Stultz
      While enabling lockdep on seqlocks, I ran across the warning below
      caused by the ipv6 stats being updated in both irq and non-irq context.
      
      This patch changes from IP6_INC_STATS_BH to IP6_INC_STATS (suggested
      by Eric Dumazet) to resolve this problem.
      
      [   11.120383] =================================
      [   11.121024] [ INFO: inconsistent lock state ]
      [   11.121663] 3.12.0-rc1+ #68 Not tainted
      [   11.122229] ---------------------------------
      [   11.122867] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      [   11.123741] init/4483 [HC0[0]:SC1[3]:HE1:SE0] takes:
      [   11.124505]  (&stats->syncp.seq#6){+.?...}, at: [<c1ab80c2>] ndisc_send_ns+0xe2/0x130
      [   11.125736] {SOFTIRQ-ON-W} state was registered at:
      [   11.126447]   [<c10e0eb7>] __lock_acquire+0x5c7/0x1af0
      [   11.127222]   [<c10e2996>] lock_acquire+0x96/0xd0
      [   11.127925]   [<c1a9a2c3>] write_seqcount_begin+0x33/0x40
      [   11.128766]   [<c1a9aa03>] ip6_dst_lookup_tail+0x3a3/0x460
      [   11.129582]   [<c1a9e0ce>] ip6_dst_lookup_flow+0x2e/0x80
      [   11.130014]   [<c1ad18e0>] ip6_datagram_connect+0x150/0x4e0
      [   11.130014]   [<c1a4d0b5>] inet_dgram_connect+0x25/0x70
      [   11.130014]   [<c198dd61>] SYSC_connect+0xa1/0xc0
      [   11.130014]   [<c198f571>] SyS_connect+0x11/0x20
      [   11.130014]   [<c198fe6b>] SyS_socketcall+0x12b/0x300
      [   11.130014]   [<c1bbf880>] syscall_call+0x7/0xb
      [   11.130014] irq event stamp: 1184
      [   11.130014] hardirqs last  enabled at (1184): [<c1086901>] local_bh_enable+0x71/0x110
      [   11.130014] hardirqs last disabled at (1183): [<c10868cd>] local_bh_enable+0x3d/0x110
      [   11.130014] softirqs last  enabled at (0): [<c108014d>] copy_process.part.42+0x45d/0x11a0
      [   11.130014] softirqs last disabled at (1147): [<c1086e05>] irq_exit+0xa5/0xb0
      [   11.130014]
      [   11.130014] other info that might help us debug this:
      [   11.130014]  Possible unsafe locking scenario:
      [   11.130014]
      [   11.130014]        CPU0
      [   11.130014]        ----
      [   11.130014]   lock(&stats->syncp.seq#6);
      [   11.130014]   <Interrupt>
      [   11.130014]     lock(&stats->syncp.seq#6);
      [   11.130014]
      [   11.130014]  *** DEADLOCK ***
      [   11.130014]
      [   11.130014] 3 locks held by init/4483:
      [   11.130014]  #0:  (rcu_read_lock){.+.+..}, at: [<c109363c>] SyS_setpriority+0x4c/0x620
      [   11.130014]  #1:  (((&ifa->dad_timer))){+.-...}, at: [<c108c1c0>] call_timer_fn+0x0/0xf0
      [   11.130014]  #2:  (rcu_read_lock){.+.+..}, at: [<c1ab6494>] ndisc_send_skb+0x54/0x5d0
      [   11.130014]
      [   11.130014] stack backtrace:
      [   11.130014] CPU: 0 PID: 4483 Comm: init Not tainted 3.12.0-rc1+ #68
      [   11.130014] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [   11.130014]  00000000 00000000 c55e5c10 c1bb0e71 c57128b0 c55e5c4c c1badf79 c1ec1123
      [   11.130014]  c1ec1484 00001183 00000000 00000000 00000001 00000003 00000001 00000000
      [   11.130014]  c1ec1484 00000004 c5712dcc 00000000 c55e5c84 c10de492 00000004 c10755f2
      [   11.130014] Call Trace:
      [   11.130014]  [<c1bb0e71>] dump_stack+0x4b/0x66
      [   11.130014]  [<c1badf79>] print_usage_bug+0x1d3/0x1dd
      [   11.130014]  [<c10de492>] mark_lock+0x282/0x2f0
      [   11.130014]  [<c10755f2>] ? kvm_clock_read+0x22/0x30
      [   11.130014]  [<c10dd8b0>] ? check_usage_backwards+0x150/0x150
      [   11.130014]  [<c10e0e74>] __lock_acquire+0x584/0x1af0
      [   11.130014]  [<c10b1baf>] ? sched_clock_cpu+0xef/0x190
      [   11.130014]  [<c10de58c>] ? mark_held_locks+0x8c/0xf0
      [   11.130014]  [<c10e2996>] lock_acquire+0x96/0xd0
      [   11.130014]  [<c1ab80c2>] ? ndisc_send_ns+0xe2/0x130
      [   11.130014]  [<c1ab66d3>] ndisc_send_skb+0x293/0x5d0
      [   11.130014]  [<c1ab80c2>] ? ndisc_send_ns+0xe2/0x130
      [   11.130014]  [<c1ab80c2>] ndisc_send_ns+0xe2/0x130
      [   11.130014]  [<c108cc32>] ? mod_timer+0xf2/0x160
      [   11.130014]  [<c1aa706e>] ? addrconf_dad_timer+0xce/0x150
      [   11.130014]  [<c1aa70aa>] addrconf_dad_timer+0x10a/0x150
      [   11.130014]  [<c1aa6fa0>] ? addrconf_dad_completed+0x1c0/0x1c0
      [   11.130014]  [<c108c233>] call_timer_fn+0x73/0xf0
      [   11.130014]  [<c108c1c0>] ? __internal_add_timer+0xb0/0xb0
      [   11.130014]  [<c1aa6fa0>] ? addrconf_dad_completed+0x1c0/0x1c0
      [   11.130014]  [<c108c5b1>] run_timer_softirq+0x141/0x1e0
      [   11.130014]  [<c1086b20>] ? __do_softirq+0x70/0x1b0
      [   11.130014]  [<c1086b70>] __do_softirq+0xc0/0x1b0
      [   11.130014]  [<c1086e05>] irq_exit+0xa5/0xb0
      [   11.130014]  [<c106cfd5>] smp_apic_timer_interrupt+0x35/0x50
      [   11.130014]  [<c1bbfbca>] apic_timer_interrupt+0x32/0x38
      [   11.130014]  [<c10936ed>] ? SyS_setpriority+0xfd/0x620
      [   11.130014]  [<c10e26c9>] ? lock_release+0x9/0x240
      [   11.130014]  [<c10936d7>] ? SyS_setpriority+0xe7/0x620
      [   11.130014]  [<c1bbee6d>] ? _raw_read_unlock+0x1d/0x30
      [   11.130014]  [<c1093701>] SyS_setpriority+0x111/0x620
      [   11.130014]  [<c109363c>] ? SyS_setpriority+0x4c/0x620
      [   11.130014]  [<c1bbf880>] syscall_call+0x7/0xb
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: James Morris <jmorris@namei.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: netdev@vger.kernel.org
      Link: http://lkml.kernel.org/r/1381186321-4906-5-git-send-email-john.stultz@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5ac68e7c
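To illustrate why lockdep objects to the same sequence counter being written from both softirq and non-softirq context, here is a toy single-threaded model. It is a sketch under assumptions, not the kernel's seqcount implementation: `struct seqcount_model`, `write_begin()`, `write_end()` and `writer_visible()` are hypothetical names, and the "interrupt" is simulated by simply nesting a second write.

```c
#include <stdbool.h>

/* Toy sequence counter: odd means "a writer is in progress". */
struct seqcount_model {
    unsigned int seq;
};

static void write_begin(struct seqcount_model *s) { s->seq++; }
static void write_end(struct seqcount_model *s)   { s->seq++; }

/* Readers use the parity of the counter to detect a concurrent writer. */
static bool writer_visible(const struct seqcount_model *s)
{
    return (s->seq & 1u) != 0;
}
```

If a softirq interrupts a writer on the same CPU and starts a second write on the same counter, the counter becomes even again while the outer update is still half done, so the parity check no longer signals a write in progress. Keeping bottom halves disabled around the update, which is what moving away from the `_BH` variant is meant to achieve according to the message, avoids that nesting.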
  12. 22 Oct, 2013 1 commit
  13. 20 Oct, 2013 1 commit
  14. 24 Sep, 2013 1 commit
  15. 01 Sep, 2013 1 commit
  16. 26 Aug, 2013 1 commit
  17. 03 Jul, 2013 1 commit
    • H
      ipv6: ip6_append_data_mtu did not care about pmtudisc and frag_size · 75a493e6
      Committed by Hannes Frederic Sowa
      If the socket had an IPV6_MTU value set, ip6_append_data_mtu lost track
      of this when appending the second frame on a corked socket. This results
      in the following splat:
      
      [37598.993962] ------------[ cut here ]------------
      [37598.994008] kernel BUG at net/core/skbuff.c:2064!
      [37598.994008] invalid opcode: 0000 [#1] SMP
      [37598.994008] Modules linked in: tcp_lp uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media vfat fat usb_storage fuse ebtable_nat xt_CHECKSUM bridge stp llc ipt_MASQUERADE nf_conntrack_netbios_ns nf_conntrack_broadcast ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat
      +nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi
      +scsi_transport_iscsi rfcomm bnep iTCO_wdt iTCO_vendor_support snd_hda_codec_conexant arc4 iwldvm mac80211 snd_hda_intel acpi_cpufreq mperf coretemp snd_hda_codec microcode cdc_wdm cdc_acm
      [37598.994008]  snd_hwdep cdc_ether snd_seq snd_seq_device usbnet mii joydev btusb snd_pcm bluetooth i2c_i801 e1000e lpc_ich mfd_core ptp iwlwifi pps_core snd_page_alloc mei cfg80211 snd_timer thinkpad_acpi snd tpm_tis soundcore rfkill tpm tpm_bios vhost_net tun macvtap macvlan kvm_intel kvm uinput binfmt_misc
      +dm_crypt i915 i2c_algo_bit drm_kms_helper drm i2c_core wmi video
      [37598.994008] CPU 0
      [37598.994008] Pid: 27320, comm: t2 Not tainted 3.9.6-200.fc18.x86_64 #1 LENOVO 27744PG/27744PG
      [37598.994008] RIP: 0010:[<ffffffff815443a5>]  [<ffffffff815443a5>] skb_copy_and_csum_bits+0x325/0x330
      [37598.994008] RSP: 0018:ffff88003670da18  EFLAGS: 00010202
      [37598.994008] RAX: ffff88018105c018 RBX: 0000000000000004 RCX: 00000000000006c0
      [37598.994008] RDX: ffff88018105a6c0 RSI: ffff88018105a000 RDI: ffff8801e1b0aa00
      [37598.994008] RBP: ffff88003670da78 R08: 0000000000000000 R09: ffff88018105c040
      [37598.994008] R10: ffff8801e1b0aa00 R11: 0000000000000000 R12: 000000000000fff8
      [37598.994008] R13: 00000000000004fc R14: 00000000ffff0504 R15: 0000000000000000
      [37598.994008] FS:  00007f28eea59740(0000) GS:ffff88023bc00000(0000) knlGS:0000000000000000
      [37598.994008] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [37598.994008] CR2: 0000003d935789e0 CR3: 00000000365cb000 CR4: 00000000000407f0
      [37598.994008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [37598.994008] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [37598.994008] Process t2 (pid: 27320, threadinfo ffff88003670c000, task ffff88022c162ee0)
      [37598.994008] Stack:
      [37598.994008]  ffff88022e098a00 ffff88020f973fc0 0000000000000008 00000000000004c8
      [37598.994008]  ffff88020f973fc0 00000000000004c4 ffff88003670da78 ffff8801e1b0a200
      [37598.994008]  0000000000000018 00000000000004c8 ffff88020f973fc0 00000000000004c4
      [37598.994008] Call Trace:
      [37598.994008]  [<ffffffff815fc21f>] ip6_append_data+0xccf/0xfe0
      [37598.994008]  [<ffffffff8158d9f0>] ? ip_copy_metadata+0x1a0/0x1a0
      [37598.994008]  [<ffffffff81661f66>] ? _raw_spin_lock_bh+0x16/0x40
      [37598.994008]  [<ffffffff8161548d>] udpv6_sendmsg+0x1ed/0xc10
      [37598.994008]  [<ffffffff812a2845>] ? sock_has_perm+0x75/0x90
      [37598.994008]  [<ffffffff815c3693>] inet_sendmsg+0x63/0xb0
      [37598.994008]  [<ffffffff812a2973>] ? selinux_socket_sendmsg+0x23/0x30
      [37598.994008]  [<ffffffff8153a450>] sock_sendmsg+0xb0/0xe0
      [37598.994008]  [<ffffffff810135d1>] ? __switch_to+0x181/0x4a0
      [37598.994008]  [<ffffffff8153d97d>] sys_sendto+0x12d/0x180
      [37598.994008]  [<ffffffff810dfb64>] ? __audit_syscall_entry+0x94/0xf0
      [37598.994008]  [<ffffffff81020ed1>] ? syscall_trace_enter+0x231/0x240
      [37598.994008]  [<ffffffff8166a7e7>] tracesys+0xdd/0xe2
      [37598.994008] Code: fe 07 00 00 48 c7 c7 04 28 a6 81 89 45 a0 4c 89 4d b8 44 89 5d a8 e8 1b ac b1 ff 44 8b 5d a8 4c 8b 4d b8 8b 45 a0 e9 cf fe ff ff <0f> 0b 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 48
      [37598.994008] RIP  [<ffffffff815443a5>] skb_copy_and_csum_bits+0x325/0x330
      [37598.994008]  RSP <ffff88003670da18>
      [37599.007323] ---[ end trace d69f6a17f8ac8eee ]---
      
      While there, also check if path mtu discovery is activated for this
      socket. The logic was adapted from ip6_append_data when first writing
      on the corked socket.
      
      This bug was introduced with commit
      0c183379 ("ipv6: fix incorrect ipsec
      fragment").
      
      v2:
      a) Replace IPV6_PMTU_DISC_DO with IPV6_PMTUDISC_PROBE.
      b) Don't pass ipv6_pinfo to ip6_append_data_mtu (suggestion by Gao
         feng, thanks!).
      c) Change mtu to unsigned int, else we get a warning about
         non-matching types because of the min()-macro type-check.
      Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      75a493e6
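The MTU bookkeeping for the second and later frames on a corked socket can be sketched as follows. This is a simplified model with hypothetical names (`next_frame_mtu()`, `max_frag_len()`, `MODEL_FRAG_HDR_LEN`); it assumes an 8-byte IPv6 fragment header and 8-byte alignment of fragment payloads, and it leaves out the IPsec tunnel handling in the real function.

```c
#include <stdbool.h>

#define MODEL_FRAG_HDR_LEN 8u /* size of the IPv6 fragment header */

/* MTU for a follow-up frame: never exceed the value already in use (which
 * may come from IPV6_MTU), and pick the interface MTU instead of the path
 * MTU when the socket is in probe mode. */
static unsigned int next_frame_mtu(unsigned int cur_mtu, unsigned int path_mtu,
                                   unsigned int dev_mtu, bool pmtu_probe)
{
    unsigned int limit = pmtu_probe ? dev_mtu : path_mtu;

    return cur_mtu < limit ? cur_mtu : limit;
}

/* Largest fragment length: payload rounded down to a multiple of 8, plus
 * the per-fragment headers, minus room for the fragment header itself. */
static unsigned int max_frag_len(unsigned int mtu, unsigned int fragheaderlen)
{
    return ((mtu - fragheaderlen) & ~7u) + fragheaderlen - MODEL_FRAG_HDR_LEN;
}
```

For example, a socket limited to 1280 bytes keeps that limit for the second frame even when the path MTU is 1500, which is the case the message says was lost before the fix.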
  18. 27 Jun, 2013 1 commit
  19. 26 Jun, 2013 1 commit
  20. 19 May, 2013 1 commit
  21. 15 Apr, 2013 1 commit
  22. 12 Feb, 2013 1 commit
  23. 07 Feb, 2013 1 commit
  24. 23 Jan, 2013 1 commit
  25. 22 Jan, 2013 1 commit
  26. 18 Jan, 2013 2 commits
  27. 17 Jan, 2013 1 commit
  28. 14 Jan, 2013 1 commit
  29. 16 Nov, 2012 1 commit
  30. 04 Nov, 2012 1 commit
  31. 02 Nov, 2012 1 commit
  32. 25 Sep, 2012 1 commit
    • E
      net: use a per task frag allocator · 5640f768
      Committed by Eric Dumazet
      We currently use a per socket order-0 page cache for tcp_sendmsg()
      operations.
      
      This page is used to build fragments for skbs.
      
      It's done to increase the probability of coalescing small write() calls
      into single segments in skbs still in the write queue (not yet sent).
      
      But it wastes a lot of memory for applications handling many mostly
      idle sockets, since each socket holds one page in sk->sk_sndmsg_page.
      
      It's also quite inefficient to build TSO 64KB packets, because we need
      about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit the
      page allocator more than wanted.
      
      This patch adds a per task frag allocator and uses bigger pages,
      if available. An automatic fallback is done in case of memory pressure.
      
      (up to 32768 bytes per frag, that's order-3 pages on x86)
      
      This increases TCP stream performance by 20% on the loopback device,
      but also benefits other network devices, since 8x fewer frags are
      mapped on transmit and unmapped on tx completion. Alexander Duyck
      mentioned a probable performance win on systems with IOMMU enabled.
      
      It's possible some SG-enabled hardware can't cope with bigger fragments,
      but their ndo_start_xmit() should already handle this, splitting a
      fragment into sub-fragments, since some arches have PAGE_SIZE=65536.
      
      Successfully tested on various ethernet devices.
      (ixgbe, igb, bnx2x, tg3, mellanox mlx4)
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5640f768
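The arithmetic behind the "16 pages" and "8x fewer frags" figures, plus the fallback idea, can be checked with a short sketch. The helper names (`frag_bytes_for_order()`, `frags_needed()`, `choose_order()`) and the availability bitmask are hypothetical and stand in for the real page allocator, which this model does not touch.

```c
#include <stddef.h>

/* Bytes covered by one allocation of the given page order. */
static size_t frag_bytes_for_order(unsigned int order, size_t page_size)
{
    return page_size << order;
}

/* Number of fragments needed to hold a payload, rounding up. */
static size_t frags_needed(size_t payload, size_t frag_size)
{
    return (payload + frag_size - 1) / frag_size;
}

/* Fallback under memory pressure: take the highest order up to max_order
 * whose bit is set in avail_mask, otherwise fall back to order 0. */
static unsigned int choose_order(unsigned int max_order, unsigned int avail_mask)
{
    for (unsigned int order = max_order; order > 0; order--) {
        if (avail_mask & (1u << order))
            return order;
    }
    return 0;
}
```

With 4096-byte pages, a 64KB TSO packet needs 16 order-0 fragments but only 2 fragments of 32768 bytes (order 3), which is the 8x reduction quoted above.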
  33. 11 Sep, 2012 1 commit
  34. 30 Aug, 2012 1 commit
    • P
      netfilter: nf_conntrack_ipv6: improve fragmentation handling · 4cdd3408
      Committed by Patrick McHardy
      The IPv6 conntrack fragmentation currently has a couple of shortcomings.
      Fragments are collected in PREROUTING/OUTPUT, are defragmented, the
      defragmented packet is then passed to conntrack, the resulting conntrack
      information is attached to each original fragment and the fragments then
      continue their way through the stack.
      
      Helper invocation occurs in the POSTROUTING hook, at which point only
      the original fragments are available. The result of this is that
      fragmented packets are never passed to helpers.
      
      This patch improves the situation in the following way:
      
      - If a reassembled packet belongs to a connection that has a helper
        assigned, the reassembled packet is passed through the stack instead
        of the original fragments.
      
      - During defragmentation, the largest received fragment size is stored.
        On output, the packet is refragmented if required. If the largest
        received fragment size exceeds the outgoing MTU, a "packet too big"
        message is generated, thus behaving as if the original fragments
        were passed through the stack from an outside point of view.
      
      - The ipv6_helper() hook function can't receive fragments anymore for
        connections using a helper, so it is switched to use ipv6_skip_exthdr()
        instead of the netfilter specific nf_ct_ipv6_skip_exthdr() and the
        reassembled packets are passed to connection tracking helpers.
      
      The result of this is that we can properly track fragmented packets, but
      still generate ICMPv6 Packet too big messages if we would have before.
      
      This patch is also required as a precondition for IPv6 NAT, where NAT
      helpers might enlarge packets up to a point that they require
      fragmentation. In that case we can't generate Packet too big messages
      since the proper MTU can't be calculated in all cases (e.g. when
      changing textual representation of a variable amount of addresses),
      so the packet is transparently fragmented iff the original packet or
      fragments would have fit the outgoing MTU.
      
      IPVS parts by Jesper Dangaard Brouer <brouer@redhat.com>.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      4cdd3408
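The output-side rule for reassembled packets described in this commit can be written as a small decision function. This is a sketch with hypothetical names (`enum out_action`, `reassembled_output()`); it models only packets that went through defragmentation and assumes the largest received fragment size was recorded, as the message describes.

```c
#include <stddef.h>

enum out_action {
    OUT_PASS,       /* reassembled packet fits the outgoing MTU */
    OUT_REFRAGMENT, /* too big as a whole, but the original fragments fit */
    OUT_PKT_TOO_BIG /* even the original fragments would not have fit */
};

/* Behave, from an outside point of view, as if the original fragments
 * had been passed through the stack. */
static enum out_action reassembled_output(size_t pkt_len, size_t max_frag_size,
                                          size_t mtu)
{
    if (max_frag_size > mtu)
        return OUT_PKT_TOO_BIG;
    if (pkt_len > mtu)
        return OUT_REFRAGMENT;
    return OUT_PASS;
}
```

A 3000-byte packet reassembled from fragments of at most 1280 bytes is refragmented on a 1400-byte link, while the same packet received as 1500-byte fragments triggers a "packet too big" message.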
  35. 11 Jul, 2012 1 commit
  36. 06 Jul, 2012 1 commit
  37. 05 Jul, 2012 2 commits