- 20 11月, 2016 1 次提交
-
-
由 Alexey Dobriyan 提交于
1) cast to "int" is unnecessary: u8 will be promoted to int before decrementing, small positive numbers fit into "int", so their values won't be changed during promotion. Once everything is int including loop counters, signedness doesn't matter: 32-bit operations will stay 32-bit operations. But! Someone tried to make this loop smart by making everything of the same type apparently in an attempt to optimise it. Do the optimization, just differently. Do the cast where it matters. :^) 2) frag size is unsigned entity and sum of fragments sizes is also unsigned. Make everything unsigned, leave no MOVSX instruction behind. add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-4 (-4) function old new delta skb_cow_data 835 834 -1 ip_do_fragment 2549 2548 -1 ip6_fragment 3130 3128 -2 Total: Before=154865032, After=154865028, chg -0.00% Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 11月, 2016 1 次提交
-
-
由 David Lebrun 提交于
This patch prepares for insertion of SRH through setsockopt(). The new source address argument is used when an HMAC field is present in the SRH, which must be filled. The HMAC signature process requires the source address as input text. Signed-off-by: NDavid Lebrun <david.lebrun@uclouvain.be> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 11月, 2016 1 次提交
-
-
由 Jakub Sitnicki 提交于
Similar to commit c146066a ("ipv4: Don't use ufo handling on later transformed packets"), don't perform UFO on packets that will be IPsec transformed. To detect it we rely on the fact that headerlen in dst_entry is non-zero only for transformation bundles (xfrm_dst objects). Unwanted segmentation can be observed with a NETIF_F_UFO capable device, such as a dummy device: DEV=dum0 LEN=1493 ip li add $DEV type dummy ip addr add fc00::1/64 dev $DEV nodad ip link set $DEV up ip xfrm policy add dir out src fc00::1 dst fc00::2 \ tmpl src fc00::1 dst fc00::2 proto esp spi 1 ip xfrm state add src fc00::1 dst fc00::2 \ proto esp spi 1 enc 'aes' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b tcpdump -n -nn -i $DEV -t & socat /dev/zero,readbytes=$LEN udp6:[fc00::2]:$LEN tcpdump output before: IP6 fc00::1 > fc00::2: frag (0|1448) ESP(spi=0x00000001,seq=0x1), length 1448 IP6 fc00::1 > fc00::2: frag (1448|48) IP6 fc00::1 > fc00::2: ESP(spi=0x00000001,seq=0x2), length 88 ... and after: IP6 fc00::1 > fc00::2: frag (0|1448) ESP(spi=0x00000001,seq=0x1), length 1448 IP6 fc00::1 > fc00::2: frag (1448|80) Fixes: e89e9cf5 ("[IPv4/IPv6]: UFO Scatter-gather approach") Signed-off-by: NJakub Sitnicki <jkbs@redhat.com> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 9月, 2016 3 次提交
-
-
由 David Ahern 提交于
No longer needed Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
A previous patch added l3mdev flow update making these hooks redundant. Remove them. Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
This patch adds the infrastructure to the output path to pass an skb to an l3mdev device if it has a hook registered. This is the Tx parallel to l3mdev_ip{6}_rcv in the receive path and is the basis for removing the existing hook that returns the vrf dst on the fib lookup. Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 31 8月, 2016 1 次提交
-
-
由 Roopa Prabhu 提交于
Today mpls iptunnel lwtunnel_output redirect expects the tunnel output function to handle fragmentation. This is ok but can be avoided if we did not do the mpls output redirect too early. ie we could wait until ip fragmentation is done and then call mpls output for each ip fragment. To make this work we will need, 1) the lwtunnel state to carry encap headroom 2) and do the redirect to the encap output handler on the ip fragment (essentially do the output redirect after fragmentation) This patch adds tunnel headroom in lwtstate to make sure we account for tunnel data in mtu calculations during fragmentation and adds new xmit redirect handler to redirect to lwtunnel xmit func after ip fragmentation. This includes IPV6 and some mtu fixes and testing from David Ahern. Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 6月, 2016 1 次提交
-
-
由 David Ahern 提交于
IPv6 source address selection needs to consider the real egress route. Similar to IPv4 implement a get_saddr6 method which is called if source address has not been set. The get_saddr6 method does a full lookup which means pulling a route from the VRF FIB table and properly considering linklocal/multicast destination addresses. Lookup failures (eg., unreachable) then cause the source address selection to fail which gets propagated back to the caller. Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 6月, 2016 1 次提交
-
-
由 Jakub Sitnicki 提交于
At present we perform an xfrm_lookup() for each UDPv6 message we send. The lookup involves querying the flow cache (flow_cache_lookup) and, in case of a cache miss, creating an XFRM bundle. If we miss the flow cache, we can end up creating a new bundle and deriving the path MTU (xfrm_init_pmtu) from on an already transformed dst_entry, which we pass from the socket cache (sk->sk_dst_cache) down to xfrm_lookup(). This can happen only if we're caching the dst_entry in the socket, that is when we're using a connected UDP socket. To put it another way, the path MTU shrinks each time we miss the flow cache, which later on leads to incorrectly fragmented payload. It can be observed with ESPv6 in transport mode: 1) Set up a transformation and lower the MTU to trigger fragmentation # ip xfrm policy add dir out src ::1 dst ::1 \ tmpl src ::1 dst ::1 proto esp spi 1 # ip xfrm state add src ::1 dst ::1 \ proto esp spi 1 enc 'aes' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b # ip link set dev lo mtu 1500 2) Monitor the packet flow and set up an UDP sink # tcpdump -ni lo -ttt & # socat udp6-listen:12345,fork /dev/null & 3) Send a datagram that needs fragmentation with a connected socket # perl -e 'print "@" x 1470 | socat - udp6:[::1]:12345 2016/06/07 18:52:52 socat[724] E read(3, 0x555bb3d5ba00, 8192): Protocol error 00:00:00.000000 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x2), length 1448 00:00:00.000014 IP6 ::1 > ::1: frag (1448|32) 00:00:00.000050 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x3), length 1272 (^ ICMPv6 Parameter Problem) 00:00:00.000022 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x5), length 136 4) Compare it to a non-connected socket # perl -e 'print "@" x 1500' | socat - udp6-sendto:[::1]:12345 00:00:40.535488 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x6), length 1448 00:00:00.000010 IP6 ::1 > ::1: frag (1448|64) What happens in step (3) is: 1) when connecting the socket in __ip6_datagram_connect(), we perform an XFRM lookup, miss the flow cache, create an XFRM bundle, and cache the destination, 2) afterwards, when sending the datagram, we perform an XFRM lookup, again, miss the flow cache (due to mismatch of flowi6_iif and flowi6_oif, which is an issue of its own), and recreate an XFRM bundle based on the cached (and already transformed) destination. To prevent the recreation of an XFRM bundle, avoid an XFRM lookup altogether whenever we already have a destination entry cached in the socket. This prevents the path MTU shrinkage and brings us on par with UDPv4. The fix also benefits connected PINGv6 sockets, another user of ip6_sk_dst_lookup_flow(), who also suffer messages being transformed twice. Joint work with Hannes Frederic Sowa. Reported-by: NJan Tluka <jtluka@redhat.com> Signed-off-by: NJakub Sitnicki <jkbs@redhat.com> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 6月, 2016 1 次提交
-
-
由 Marcelo Ricardo Leitner 提交于
skb_gso_network_seglen is not enough for checking fragment sizes if skb is using GSO_BY_FRAGS as we have to check frag per frag. This patch introduces skb_gso_validate_mtu, based on the former, which will wrap the use case inside it as all calls to skb_gso_network_seglen were to validate if it fits on a given TMU, and improve the check. Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Tested-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 5月, 2016 1 次提交
-
-
由 Wei Wang 提交于
In the sendmsg function of UDP, raw, ICMP and l2tp sockets, we use local variables like hlimits, tclass, opt and dontfrag and pass them to corresponding functions like ip6_make_skb, ip6_append_data and xxx_push_pending_frames. This is not a good practice and makes it hard to add new parameters. This fix introduces a new struct ipcm6_cookie similar to ipcm_cookie in ipv4 and include the above mentioned variables. And we only pass the pointer to this structure to corresponding functions. This makes it easier to add new parameters in the future and makes the function cleaner. Signed-off-by: NWei Wang <weiwan@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 4月, 2016 1 次提交
-
-
由 Eric Dumazet 提交于
Rename IP6_INC_STATS_BH() to __IP6_INC_STATS() and IP6_ADD_STATS_BH() to __IP6_ADD_STATS() Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 4月, 2016 1 次提交
-
-
由 Jakub Sitnicki 提交于
When sending a UDPv6 message longer than MTU, account for the length of fragmentable IPv6 extension headers in skb->network_header offset. Same as we do in alloc_new_skb path in __ip6_append_data(). This ensures that later on __ip6_make_skb() will make space in headroom for fragmentable extension headers: /* move skb->data to ip header from ext header */ if (skb->data < skb_network_header(skb)) __skb_pull(skb, skb_network_offset(skb)); Prevents a splat due to skb_under_panic: skbuff: skb_under_panic: text:ffffffff8143397b len:2126 put:14 \ head:ffff880005bacf50 data:ffff880005bacf4a tail:0x48 end:0xc0 dev:lo ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:104! invalid opcode: 0000 [#1] KASAN CPU: 0 PID: 160 Comm: reproducer Not tainted 4.6.0-rc2 #65 [...] Call Trace: [<ffffffff813eb7b9>] skb_push+0x79/0x80 [<ffffffff8143397b>] eth_header+0x2b/0x100 [<ffffffff8141e0d0>] neigh_resolve_output+0x210/0x310 [<ffffffff814eab77>] ip6_finish_output2+0x4a7/0x7c0 [<ffffffff814efe3a>] ip6_output+0x16a/0x280 [<ffffffff815440c1>] ip6_local_out+0xb1/0xf0 [<ffffffff814f1115>] ip6_send_skb+0x45/0xd0 [<ffffffff81518836>] udp_v6_send_skb+0x246/0x5d0 [<ffffffff8151985e>] udpv6_sendmsg+0xa6e/0x1090 [...] Reported-by: NJi Jianwen <jiji@redhat.com> Signed-off-by: NJakub Sitnicki <jkbs@redhat.com> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 4月, 2016 1 次提交
-
-
由 Soheil Hassas Yeganeh 提交于
Currently, SOL_TIMESTAMPING can only be enabled using setsockopt. This is very costly when users want to sample writes to gather tx timestamps. Add support for enabling SO_TIMESTAMPING via control messages by using tsflags added in `struct sockcm_cookie` (added in the previous patches in this series) to set the tx_flags of the last skb created in a sendmsg. With this patch, the timestamp recording bits in tx_flags of the skbuff is overridden if SO_TIMESTAMPING is passed in a cmsg. Please note that this is only effective for overriding the recording timestamps flags. Users should enable timestamp reporting (e.g., SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_OPT_ID) using socket options and then should ask for SOF_TIMESTAMPING_TX_* using control messages per sendmsg to sample timestamps for each write. Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com> Acked-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 3月, 2016 1 次提交
-
-
由 WANG Cong 提交于
After commit 52bd2d62 ("net: better skb->sender_cpu and skb->napi_id cohabitation") skb_sender_cpu_clear() becomes empty and can be removed. Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 1月, 2016 1 次提交
-
-
由 Paolo Abeni 提交于
The current implementation of ip6_dst_lookup_tail basically ignore the egress ifindex match: if the saddr is set, ip6_route_output() purposefully ignores flowi6_oif, due to the commit d46a9d67 ("net: ipv6: Dont add RT6_LOOKUP_F_IFACE flag if saddr set"), if the saddr is 'any' the first route lookup in ip6_dst_lookup_tail fails, but upon failure a second lookup will be performed with saddr set, thus ignoring the ifindex constraint. This commit adds an output route lookup function variant, which allows the caller to specify lookup flags, and modify ip6_dst_lookup_tail() to enforce the ifindex match on the second lookup via said helper. ip6_route_output() becames now a static inline function build on top of ip6_route_output_flags(); as a side effect, out-of-tree modules need now a GPL license to access the output route lookup functionality. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 1月, 2016 1 次提交
-
-
由 Michal Kubeček 提交于
Commit acf8dd0a ("udp: only allow UFO for packets from SOCK_DGRAM sockets") disallows UFO for packets sent from raw sockets. We need to do the same also for SOCK_DGRAM sockets with SO_NO_CHECK options, even if for a bit different reason: while such socket would override the CHECKSUM_PARTIAL set by ip_ufo_append_data(), gso_size is still set and bad offloading flags warning is triggered in __skb_gso_segment(). In the IPv6 case, SO_NO_CHECK option is ignored but we need to disallow UFO for packets sent by sockets with UDP_NO_CHECK6_TX option. Signed-off-by: NMichal Kubecek <mkubecek@suse.cz> Tested-by: NShannon Nelson <shannon.nelson@intel.com> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 12月, 2015 1 次提交
-
-
由 Tom Herbert 提交于
These netif flags are unnecessary convolutions. It is more straightforward to just use NETIF_F_HW_CSUM, NETIF_F_IP_CSUM, and NETIF_F_IPV6_CSUM directly. This patch also: - Cleans up can_checksum_protocol - Simplifies netdev_intersect_features Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 11月, 2015 2 次提交
-
-
由 Hannes Frederic Sowa 提交于
CHECKSUM_PARTIAL skbs should never arrive in ip_fragment. If we get one of those warn about them once and handle them gracefully by recalculating the checksum. Fixes: commit 32dce968 ("ipv6: Allow for partial checksums on non-ufo packets") See-also: commit 72e843bb ("ipv6: ip6_fragment() should check CHECKSUM_PARTIAL") Cc: Eric Dumazet <edumazet@google.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Benjamin Coddington <bcodding@redhat.com> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Hannes Frederic Sowa 提交于
We cannot reliable calculate packet size on MSG_MORE corked sockets and thus cannot decide if they are going to be fragmented later on, so better not use CHECKSUM_PARTIAL in the first place. The IPv6 code also intended to protect and not use CHECKSUM_PARTIAL in the existence of IPv6 extension headers, but the condition was wrong. Fix it up, too. Also the condition to check whether the packet fits into one fragment was wrong and has been corrected. Fixes: commit 32dce968 ("ipv6: Allow for partial checksums on non-ufo packets") See-also: commit 72e843bb ("ipv6: ip6_fragment() should check CHECKSUM_PARTIAL") Cc: Eric Dumazet <edumazet@google.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Benjamin Coddington <bcodding@redhat.com> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 10月, 2015 2 次提交
-
-
由 Hannes Frederic Sowa 提交于
Raw sockets with hdrincl enabled can insert ipv6 extension headers right into the data stream. In case we need to fragment those packets, we reparse the options header to find the place where we can insert the fragment header. If the extension headers exceed the link's MTU we actually cannot make progress in such a case. Instead of ending up in broken arithmetic or rounding towards 0 and entering an endless loop in ip6_fragment, just prevent those cases by aborting early and signal -EMSGSIZE to user space. This is the second version of the patch which doesn't use the overflow_usub function, which got reverted for now. Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: NDmitry Vyukov <dvyukov@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Hannes Frederic Sowa 提交于
Linus dislikes these changes. To not hold up the net-merge let's revert it for now and fix the bug like Linus suggested. This reverts commit ec3661b4, reversing changes made to c80dbe04. Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 10月, 2015 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
Raw sockets with hdrincl enabled can insert ipv6 extension headers right into the data stream. In case we need to fragment those packets, we reparse the options header to find the place where we can insert the fragment header. If the extension headers exceed the link's MTU we actually cannot make progress in such a case. Instead of ending up in broken arithmetic or rounding towards 0 and entering an endless loop in ip6_fragment, just prevent those cases by aborting early and signal -EMSGSIZE to user space. Reported-by: NDmitry Vyukov <dvyukov@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 10月, 2015 1 次提交
-
-
由 David Ahern 提交于
6e28b000 ("net: Fix vti use case with oif in dst lookups for IPv6") is missing the checks on FLOWI_FLAG_SKIP_NH_OIF. Add them. Fixes: 42a7b32b ("xfrm: Add oif to dst lookups") Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Acked-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 10月, 2015 1 次提交
-
-
由 David Ahern 提交于
As with IPv4 support for VRFs added to IPv6 stack by replacing hardcoded table ids with possibly device specific ones and manipulating the oif in the flowi6. The flow flags are used to skip oif compare in nexthop lookups if the device is enslaved to a VRF via the L3 master device. Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 10月, 2015 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
This is a clone of commit 2ab95749 ("ip_forward: Drop frames with attached skb->sk") for ipv6. This commit has exactly the same reasons as the above mentioned commit, namely to prevent panics during netfilter reload or a misconfigured stack. Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 10月, 2015 4 次提交
-
-
由 Eric W. Biederman 提交于
The network namespace is already passed into dst_output pass it into dst->output lwt->output and friends. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
Stop hidding the sk parameter with an inline helper function and make all of the callers pass it, so that it is clear what the function is doing. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
Replace dst_output_okfn with dst_output Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 9月, 2015 1 次提交
-
-
由 Eric W. Biederman 提交于
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
- 26 9月, 2015 2 次提交
-
-
由 Eric Dumazet 提交于
This is to document that socket lock might not be held at this point. skb_set_owner_w() and ipv6_local_error() are using proper atomic ops or spinlocks, so we promote the socket to non const when calling them. netfilter hooks should never assume socket lock is held, we also promote the socket to non const. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
ip6_dst_lookup_flow() and ip6_dst_lookup_tail() do not touch socket, lets add a const qualifier. This will permit the same change in inet6_csk_route_req() Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 9月, 2015 7 次提交
-
-
由 Florian Westphal 提交于
David Woodhouse reports skb_under_panic when we try to push ethernet header to fragmented ipv6 skbs: skbuff: skb_under_panic: text:c1277f1e len:1294 put:14 head:dec98000 data:dec97ffc tail:0xdec9850a end:0xdec98f40 dev:br-lan [..] ip6_finish_output2+0x196/0x4da David further debugged this: [..] offending fragments were arriving here with skb_headroom(skb)==10. Which is reasonable, being the Solos ADSL card's header of 8 bytes followed by 2 bytes of PPP frame type. The problem is that if netfilter ipv6 defragmentation is used, skb_cow() in ip6_forward will only see reassembled skb. Therefore, headroom is overestimated by 8 bytes (we pulled fragment header) and we don't check the skbs in the frag_list either. We can't do these checks in netfilter defrag since outdev isn't known yet. Furthermore, existing tests in ip6_fragment did not consider the fragment or ipv6 header size when checking headroom of the fraglist skbs. While at it, also fix a skb leak on memory allocation -- ip6_fragment must consume the skb. I tested this e1000 driver hacked to not allocate additional headroom (we end up in slowpath, since LL_RESERVED_SPACE is 16). If 2 bytes of headroom are allocated, fastpath is taken (14 byte ethernet header was pulled, so 16 byte headroom available in all fragments). Reported-by: NDavid Woodhouse <dwmw2@infradead.org> Diagnosed-by: NDavid Woodhouse <dwmw2@infradead.org> Signed-off-by: NFlorian Westphal <fw@strlen.de> Tested-by: NDavid Woodhouse <David.Woodhouse@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
In code review it was noticed that I had failed to add some blank lines in places where they are customarily used. Taking a second look at the code I have to agree blank lines would be nice so I have added them here. Reported-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
This is immediately motivated by the bridge code that chains functions that call into netfilter. Without passing net into the okfns the bridge code would need to guess about the best expression for the network namespace to process packets in. As net is frequently one of the first things computed in continuation functions after netfilter has done it's job passing in the desired network namespace is in many cases a code simplification. To support this change the function dst_output_okfn is introduced to simplify passing dst_output as an okfn. For the moment dst_output_okfn just silently drops the struct net. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
Pass a network namespace parameter into the netfilter hooks. At the call site of the netfilter hooks the path a packet is taking through the network stack is well known which allows the network namespace to be easily and reliabily. This allows the replacement of magic code like "dev_net(state->in?:state->out)" that appears at the start of most netfilter hooks with "state->net". In almost all cases the network namespace passed in is derived from the first network device passed in, guaranteeing those paths will not see any changes in practice. The exceptions are: xfrm/xfrm_output.c:xfrm_output_resume() xs_net(skb_dst(skb)->xfrm) ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont() ip_vs_conn_net(cp) ipvs/ip_vs_xmit.c:ip_vs_send_or_cont() ip_vs_conn_net(cp) ipv4/raw.c:raw_send_hdrinc() sock_net(sk) ipv6/ip6_output.c:ip6_xmit() sock_net(sk) ipv6/ndisc.c:ndisc_send_skb() dev_net(skb->dev) not dev_net(dst->dev) ipv6/raw.c:raw6_send_hdrinc() sock_net(sk) br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev In all cases these exceptions seem to be a better expression for the network namespace the packet is being processed in then the historic "dev_net(in?in:out)". I am documenting them in case something odd pops up and someone starts trying to track down what happened. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
Keep net in a local variable so I can use it in NF_HOOK_COND when I pass struct net to all of the netfilter hooks. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
Add a sock paramter to dst_output making dst_output_sk superfluous. Add a skb->sk parameter to all of the callers of dst_output Have the callers of dst_output_sk call dst_output. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-