- 14 8月, 2015 1 次提交
-
-
由 David Ahern 提交于
If output device is not specified use VRF device if input device is enslaved. This is needed to ensure tcp acks and resets go out VRF device. Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 6月, 2015 1 次提交
-
-
由 Florian Westphal 提交于
text data bss dec hex filename old: 16527 44 0 16571 40bb net/ipv4/ip_output.o new: 14935 44 0 14979 3a83 net/ipv4/ip_output.o Suggested-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 6月, 2015 1 次提交
-
-
由 Florian Westphal 提交于
since commit d6b915e2 ("ip_fragment: don't forward defragmented DF packet") the largest fragment size is available in the IPCB. Therefore we no longer need to care about 'encapsulation' overhead of stripped PPPOE/VLAN headers since ip_do_fragment doesn't use device mtu in such cases. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 28 5月, 2015 2 次提交
-
-
由 Florian Westphal 提交于
We currently always send fragments without DF bit set. Thus, given following setup: mtu1500 - mtu1500:1400 - mtu1400:1280 - mtu1280 A R1 R2 B Where R1 and R2 run linux with netfilter defragmentation/conntrack enabled, then if Host A sent a fragmented packet _with_ DF set to B, R1 will respond with icmp too big error if one of these fragments exceeded 1400 bytes. However, if R1 receives fragment sizes 1200 and 100, it would forward the reassembled packet without refragmenting, i.e. R2 will send an icmp error in response to a packet that was never sent, citing mtu that the original sender never exceeded. The other minor issue is that a refragmentation on R1 will conceal the MTU of R2-B since refragmentation does not set DF bit on the fragments. This modifies ip_fragment so that we track largest fragment size seen both for DF and non-DF packets, and set frag_max_size to the largest value. If the DF fragment size is larger or equal to the non-df one, we will consider the packet a path mtu probe: We set DF bit on the reassembled skb and also tag it with a new IPCB flag to force refragmentation even if skb fits outdev mtu. We will also set DF bit on each fragment in this case. Joint work with Hannes Frederic Sowa. Reported-by: NJesse Gross <jesse@nicira.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
ip_skb_dst_mtu is small inline helper, but its called in several places. before: 17061 44 0 17105 42d1 net/ipv4/ip_output.o after: 16805 44 0 16849 41d1 net/ipv4/ip_output.o Signed-off-by: NFlorian Westphal <fw@strlen.de> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 5月, 2015 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 5月, 2015 1 次提交
-
-
由 Andy Zhou 提交于
When bridge netfilter re-fragments an IP packet for output, all packets that can not be re-fragmented to their original input size should be silently discarded. However, current bridge netfilter output path generates an ICMP packet with 'size exceeded MTU' message for such packets, this is a bug. This patch refactors the ip_fragment() API to allow two separate use cases. The bridge netfilter user case will not send ICMP, the routing output will, as before. Signed-off-by: NAndy Zhou <azhou@nicira.com> Acked-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 5月, 2015 1 次提交
-
-
由 Eric Dumazet 提交于
__ip_local_out_sk() is only used from net/ipv4/ip_output.c net/ipv4/ip_output.c:94:5: warning: symbol '__ip_local_out_sk' was not declared. Should it be static? Fixes: 7026b1dd ("netfilter: Pass socket pointer down through okfn().") Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 4月, 2015 2 次提交
-
-
由 Sheng Yong 提交于
Signed-off-by: NSheng Yong <shengyong1@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Miller 提交于
On the output paths in particular, we have to sometimes deal with two socket contexts. First, and usually skb->sk, is the local socket that generated the frame. And second, is potentially the socket used to control a tunneling socket, such as one the encapsulates using UDP. We do not want to disassociate skb->sk when encapsulating in order to fix this, because that would break socket memory accounting. The most extreme case where this can cause huge problems is an AF_PACKET socket transmitting over a vxlan device. We hit code paths doing checks that assume they are dealing with an ipv4 socket, but are actually operating upon the AF_PACKET one. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 4月, 2015 2 次提交
-
-
由 Ian Morris 提交于
The ipv4 code uses a mixture of coding styles. In some instances check for non-NULL pointer is done as x != NULL and sometimes as x. x is preferred according to checkpatch and this patch makes the code consistent by adopting the latter form. No changes detected by objdiff. Signed-off-by: NIan Morris <ipm@chirality.org.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ian Morris 提交于
The ipv4 code uses a mixture of coding styles. In some instances check for NULL pointer is done as x == NULL and sometimes as !x. !x is preferred according to checkpatch and this patch makes the code consistent by adopting the latter form. No changes detected by objdiff. Signed-off-by: NIan Morris <ipm@chirality.org.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 3月, 2015 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
As namespaces are sometimes used with overlapping ip address ranges, we should also use the namespace as input to the hash to select the ip fragmentation counter bucket. Cc: Eric Dumazet <edumazet@google.com> Cc: Flavio Leitner <fbl@redhat.com> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 3月, 2015 1 次提交
-
-
由 Florian Westphal 提交于
The br_netfilter frag output function calls skb_cow_head() so in case it needs a larger headroom to e.g. re-add a previously stripped PPPOE or VLAN header things will still work (at cost of reallocation). We can then move nf_bridge_encap_header_len to br_netfilter. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 03 3月, 2015 1 次提交
-
-
由 Michal Kubeček 提交于
If an over-MTU UDP datagram is sent through a SOCK_RAW socket to a UFO-capable device, ip_ufo_append_data() sets skb->ip_summed to CHECKSUM_PARTIAL unconditionally as all GSO code assumes transport layer checksum is to be computed on segmentation. However, in this case, skb->csum_start and skb->csum_offset are never set as raw socket transmit path bypasses udp_send_skb() where they are usually set. As a result, driver may access invalid memory when trying to calculate the checksum and store the result (as observed in virtio_net driver). Moreover, the very idea of modifying the userspace provided UDP header is IMHO against raw socket semantics (I wasn't able to find a document clearly stating this or the opposite, though). And while allowing CHECKSUM_NONE in the UFO case would be more efficient, it would be a bit too intrusive change just to handle a corner case like this. Therefore disallowing UFO for packets from SOCK_DGRAM seems to be the best option. Signed-off-by: NMichal Kubecek <mkubecek@suse.cz> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 2月, 2015 1 次提交
-
-
由 Al Viro 提交于
That takes care of the majority of ->sendmsg() instances - most of them via memcpy_to_msg() or assorted getfrag() callbacks. One place where we still keep memcpy_fromiovecend() is tipc - there we potentially read the same data over and over; separate patch, that... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 02 2月, 2015 1 次提交
-
-
由 Eric Dumazet 提交于
In commit be9f4a44 ("ipv4: tcp: remove per net tcp_sock") I tried to address contention on a socket lock, but the solution I chose was horrible : commit 3a7c384f ("ipv4: tcp: unicast_sock should not land outside of TCP stack") addressed a selinux regression. commit 0980e56e ("ipv4: tcp: set unicast_sock uc_ttl to -1") took care of another regression. commit b5ec8eea ("ipv4: fix ip_send_skb()") fixed another regression. commit 811230cd ("tcp: ipv4: initialize unicast_sock sk_pacing_rate") was another shot in the dark. Really, just use a proper socket per cpu, and remove the skb_orphan() call, to re-enable flow control. This solves a serious problem with FQ packet scheduler when used in hostile environments, as we do not want to allocate a flow structure for every RST packet sent in response to a spoofed packet. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 1月, 2015 1 次提交
-
-
由 Eric Dumazet 提交于
When I added sk_pacing_rate field, I forgot to initialize its value in the per cpu unicast_sock used in ip_send_unicast_reply() This means that for sch_fq users, RST packets, or ACK packets sent on behalf of TIME_WAIT sockets might be sent to slowly or even dropped once we reach the per flow limit. Signed-off-by: NEric Dumazet <edumazet@google.com> Fixes: 95bd09eb ("tcp: TSO packets automatic sizing") Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 12月, 2014 2 次提交
-
-
由 Al Viro 提交于
Note that the code _using_ ->msg_iter at that point will be very unhappy with anything other than unshifted iovec-backed iov_iter. We still need to convert users to proper primitives. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 07 11月, 2014 1 次提交
-
-
由 Joe Perches 提交于
These messages aren't useful as there's a generic dump_stack() on OOM. Neaten the comment and if test above the OOM by separating the assign in if into an allocation then if test. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 10月, 2014 1 次提交
-
-
由 Florian Westphal 提交于
skb_gso_segment has three possible return values: 1. a pointer to the first segmented skb 2. an errno value (IS_ERR()) 3. NULL. This can happen when GSO is used for header verification. However, several callers currently test IS_ERR instead of IS_ERR_OR_NULL and would oops when NULL is returned. Note that these call sites should never actually see such a NULL return value; all callers mask out the GSO bits in the feature argument. However, there have been issues with some protocol handlers erronously not respecting the specified feature mask in some cases. It is preferable to get 'have to turn off hw offloading, else slow' reports rather than 'kernel crashes'. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 10月, 2014 1 次提交
-
-
由 Vasily Averin 提交于
ip_setup_cork() called inside ip_append_data() steals dst entry from rt to cork and in case errors in __ip_append_data() nobody frees stolen dst entry Fixes: 2e77d89b ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()") Signed-off-by: NVasily Averin <vvs@parallels.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 10月, 2014 1 次提交
-
-
由 Pablo Neira Ayuso 提交于
In 34666d46 ("netfilter: bridge: move br_netfilter out of the core"), the bridge netfilter code has been modularized. Use IS_ENABLED instead of ifdef to cover the module case. Fixes: 34666d46 ("netfilter: bridge: move br_netfilter out of the core") Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 29 9月, 2014 1 次提交
-
-
由 Eric Dumazet 提交于
ip_options_echo() assumes struct ip_options is provided in &IPCB(skb)->opt Lets break this assumption, but provide a helper to not change all call points. ip_send_unicast_reply() gets a new struct ip_options pointer. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 8月, 2014 1 次提交
-
-
由 Willem de Bruijn 提交于
Datagrams timestamped on transmission can coexist in the kernel stack and be reordered in packet scheduling. When reading looped datagrams from the socket error queue it is not always possible to unique correlate looped data with original send() call (for application level retransmits). Even if possible, it may be expensive and complex, requiring packet inspection. Introduce a data-independent ID mechanism to associate timestamps with send calls. Pass an ID alongside the timestamp in field ee_data of sock_extended_err. The ID is a simple 32 bit unsigned int that is associated with the socket and incremented on each send() call for which software tx timestamp generation is enabled. The feature is enabled only if SOF_TIMESTAMPING_OPT_ID is set, to avoid changing ee_data for existing applications that expect it 0. The counter is reset each time the flag is reenabled. Reenabling does not change the ID of already submitted data. It is possible to receive out of order IDs if the timestamp stream is not quiesced first. Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 7月, 2014 1 次提交
-
-
由 Willem de Bruijn 提交于
Add SO_TIMESTAMPING to sockets of type PF_INET[6]/SOCK_RAW: Add the necessary sock_tx_timestamp calls to the datapath for RAW sockets (ping sockets already had these calls). Fix the IP output path to pass the timestamp flags on the first fragment also for these sockets. The existing code relies on transhdrlen != 0 to indicate a first fragment. For these sockets, that assumption does not hold. This fixes http://bugzilla.kernel.org/show_bug.cgi?id=77221 Tested SOCK_RAW on IPv4 and IPv6, not PING. Signed-off-by: NWillem de Bruijn <willemb@google.com> Acked-by: NRichard Cochran <richardcochran@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 6月, 2014 1 次提交
-
-
由 Eric Dumazet 提交于
Ideally, we would need to generate IP ID using a per destination IP generator. linux kernels used inet_peer cache for this purpose, but this had a huge cost on servers disabling MTU discovery. 1) each inet_peer struct consumes 192 bytes 2) inetpeer cache uses a binary tree of inet_peer structs, with a nominal size of ~66000 elements under load. 3) lookups in this tree are hitting a lot of cache lines, as tree depth is about 20. 4) If server deals with many tcp flows, we have a high probability of not finding the inet_peer, allocating a fresh one, inserting it in the tree with same initial ip_id_count, (cf secure_ip_id()) 5) We garbage collect inet_peer aggressively. IP ID generation do not have to be 'perfect' Goal is trying to avoid duplicates in a short period of time, so that reassembly units have a chance to complete reassembly of fragments belonging to one message before receiving other fragments with a recycled ID. We simply use an array of generators, and a Jenkin hash using the dst IP as a key. ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it belongs (it is only used from this file) secure_ip_id() and secure_ipv6_id() no longer are needed. Rename ip_select_ident_more() to ip_select_ident_segs() to avoid unnecessary decrement/increment of the number of segments. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 5月, 2014 1 次提交
-
-
由 Lorenzo Colitti 提交于
Kernel-originated IP packets that have no user socket associated with them (e.g., ICMP errors and echo replies, TCP RSTs, etc.) are emitted with a mark of zero. Add a sysctl to make them have the same mark as the packet they are replying to. This allows an administrator that wishes to do so to use mark-based routing, firewalling, etc. for these replies by marking the original packets inbound. Tested using user-mode linux: - ICMP/ICMPv6 echo replies and errors. - TCP RST packets (IPv4 and IPv6). Signed-off-by: NLorenzo Colitti <lorenzo@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 5月, 2014 1 次提交
-
-
由 WANG Cong 提交于
As suggested by several people, rename local_df to ignore_df, since it means "ignore df bit if it is set". Cc: Maciej Żenczykowski <maze@google.com> Cc: Florian Westphal <fw@strlen.de> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Acked-by: NMaciej Żenczykowski <maze@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 5月, 2014 1 次提交
-
-
由 Florian Westphal 提交于
Doing the segmentation in the forward path has one major drawback: When using virtio, we may process gso udp packets coming from host network stack. In that case, netfilter POSTROUTING will see one packet with udp header followed by multiple ip fragments. Delay the segmentation and do it after POSTROUTING invocation to avoid this. Fixes: fe6cc55f ("net: ip, ipv6: handle gso skbs in forwarding path") Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 4月, 2014 2 次提交
-
-
由 Eric Dumazet 提交于
In the dst->output() path for ipv4, the code assumes the skb it has to transmit is attached to an inet socket, specifically via ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the provider of the packet is an AF_PACKET socket. The dst->output() method gets an additional 'struct sock *sk' parameter. This needs a cascade of changes so that this parameter can be propagated from vxlan to final consumer. Fixes: 8f646c92 ("vxlan: keep original skb ownership") Reported-by: Nlucien xin <lucien.xin@gmail.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
ip_queue_xmit() assumes the skb it has to transmit is attached to an inet socket. Commit 31c70d59 ("l2tp: keep original skb ownership") changed l2tp to not change skb ownership and thus broke this assumption. One fix is to add a new 'struct sock *sk' parameter to ip_queue_xmit(), so that we do not assume skb->sk points to the socket used by l2tp tunnel. Fixes: 31c70d59 ("l2tp: keep original skb ownership") Reported-by: NZhan Jianyu <nasa4836@gmail.com> Tested-by: NZhan Jianyu <nasa4836@gmail.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 2月, 2014 2 次提交
-
-
由 Hannes Frederic Sowa 提交于
IP_PMTUDISC_INTERFACE has a design error: because it does not allow the generation of fragments if the interface mtu is exceeded, it is very hard to make use of this option in already deployed name server software for which I introduced this option. This patch adds yet another new IP_MTU_DISCOVER option to not honor any path mtu information and not accepting new icmp notifications destined for the socket this option is enabled on. But we allow outgoing fragmentation in case the packet size exceeds the outgoing interface mtu. As such this new option can be used as a drop-in replacement for IP_PMTUDISC_DONT, which is currently in use by most name server software making the adoption of this option very smooth and easy. The original advantage of IP_PMTUDISC_INTERFACE is still maintained: ignoring incoming path MTU updates and not honoring discovered path MTUs in the output path. Fixes: 482fc609 ("ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE") Cc: Florian Weimer <fweimer@redhat.com> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Hannes Frederic Sowa 提交于
ip_skb_dst_mtu mostly falls back to ip_dst_mtu_maybe_forward if no socket is attached to the skb (in case of forwarding) or determines the mtu like we do in ip_finish_output, which actually checks if we should branch to ip_fragment. Thus use the same function to determine the mtu here, too. This is important for the introduction of IP_PMTUDISC_OMIT, where we want the packets getting cut in pieces of the size of the outgoing interface mtu. IPv6 already does this correctly. Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 2月, 2014 1 次提交
-
-
由 Florian Westphal 提交于
When using nftables with CONFIG_NETFILTER_XT_TARGET_TRACE=n, we get lots of "TRACE: filter:output:policy:1 IN=..." warnings as several places will leave skb->nf_trace uninitialised. Unlike iptables tracing functionality is not conditional in nftables, so always copy/zero nf_trace setting when nftables is enabled. Move this into __nf_copy() helper. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 15 1月, 2014 1 次提交
-
-
由 WANG Cong 提交于
We still need this notifier even when we don't config PROC_FS. It should be rare to have a kernel without PROC_FS, so just for completeness. Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 1月, 2014 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
While forwarding we should not use the protocol path mtu to calculate the mtu for a forwarded packet but instead use the interface mtu. We mark forwarded skbs in ip_forward with IPSKB_FORWARDED, which was introduced for multicast forwarding. But as it does not conflict with our usage in unicast code path it is perfect for reuse. I moved the functions ip_sk_accept_pmtu, ip_sk_use_pmtu and ip_skb_dst_mtu along with the new ip_dst_mtu_maybe_forward to net/ip.h to fix circular dependencies because of IPSKB_FORWARDED. Because someone might have written a software which does probe destinations manually and expects the kernel to honour those path mtus I introduced a new per-namespace "ip_forward_use_pmtu" knob so someone can disable this new behaviour. We also still use mtus which are locked on a route for forwarding. The reason for this change is, that path mtus information can be injected into the kernel via e.g. icmp_err protocol handler without verification of local sockets. As such, this could cause the IPv4 forwarding path to wrongfully emit fragmentation needed notifications or start to fragment packets along a path. Tunnel and ipsec output paths clear IPCB again, thus IPSKB_FORWARDED won't be set and further fragmentation logic will use the path mtu to determine the fragmentation size. They also recheck packet size with help of path mtu discovery and report appropriate errors. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: John Heffner <johnwheffner@gmail.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 12月, 2013 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
We report different pmtu values back on the first write and on further writes on an corked socket. Also don't include the dst.header_len (respectively exthdrlen) as this should already be dealt with by the interface mtu of the outgoing (virtual) interface and policy of that interface should dictate if fragmentation should happen. Instead reduce the pmtu data by IP options as we do for IPv6. Make the same changes for ip_append_data, where we did not care about options or dst.header_len at all. Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 11月, 2013 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
Sockets marked with IP_PMTUDISC_INTERFACE won't do path mtu discovery, their sockets won't accept and install new path mtu information and they will always use the interface mtu for outgoing packets. It is guaranteed that the packet is not fragmented locally. But we won't set the DF-Flag on the outgoing frames. Florian Weimer had the idea to use this flag to ensure DNS servers are never generating outgoing fragments. They may well be fragmented on the path, but the server never stores or usees path mtu values, which could well be forged in an attack. (The root of the problem with path MTU discovery is that there is no reliable way to authenticate ICMP Fragmentation Needed But DF Set messages because they are sent from intermediate routers with their source addresses, and the IMCP payload will not always contain sufficient information to identify a flow.) Recent research in the DNS community showed that it is possible to implement an attack where DNS cache poisoning is feasible by spoofing fragments. This work was done by Amir Herzberg and Haya Shulman: <https://sites.google.com/site/hayashulman/files/fragmentation-poisoning.pdf> This issue was previously discussed among the DNS community, e.g. <http://www.ietf.org/mail-archive/web/dnsext/current/msg01204.html>, without leading to fixes. This patch depends on the patch "ipv4: fix DO and PROBE pmtu mode regarding local fragmentation with UFO/CORK" for the enforcement of the non-fragmentable checks. If other users than ip_append_page/data should use this semantic too, we have to add a new flag to IPCB(skb)->flags to suppress local fragmentation and check for this in ip_finish_output. Many thanks to Florian Weimer for the idea and feedback while implementing this patch. Cc: David S. Miller <davem@davemloft.net> Suggested-by: NFlorian Weimer <fweimer@redhat.com> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-