- 17 2月, 2016 2 次提交
-
-
由 Nikolay Borisov 提交于
Now that all the ip fragmentation related sysctls are namespaceified there is no reason to hide them anymore from "root" users inside containers. Signed-off-by: NNikolay Borisov <kernel@kyup.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nikolay Borisov 提交于
Signed-off-by: NNikolay Borisov <kernel@kyup.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 1月, 2016 1 次提交
-
-
由 Joe Stringer 提交于
Later parts of the stack (including fragmentation) expect that there is never a socket attached to frag in a frag_list, however this invariant was not enforced on all defrag paths. This could lead to the BUG_ON(skb->sk) during ip_do_fragment(), as per the call stack at the end of this commit message. While the call could be added to openvswitch to fix this particular error, the head and tail of the frags list are already orphaned indirectly inside ip_defrag(), so it seems like the remaining fragments should all be orphaned in all circumstances. kernel BUG at net/ipv4/ip_output.c:586! [...] Call Trace: <IRQ> [<ffffffffa0205270>] ? do_output.isra.29+0x1b0/0x1b0 [openvswitch] [<ffffffffa02167a7>] ovs_fragment+0xcc/0x214 [openvswitch] [<ffffffff81667830>] ? dst_discard_out+0x20/0x20 [<ffffffff81667810>] ? dst_ifdown+0x80/0x80 [<ffffffffa0212072>] ? find_bucket.isra.2+0x62/0x70 [openvswitch] [<ffffffff810e0ba5>] ? mod_timer_pending+0x65/0x210 [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90 [<ffffffffa03205a2>] ? nf_conntrack_in+0x252/0x500 [nf_conntrack] [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70 [<ffffffffa02051a3>] do_output.isra.29+0xe3/0x1b0 [openvswitch] [<ffffffffa0206411>] do_execute_actions+0xe11/0x11f0 [openvswitch] [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70 [<ffffffffa0206822>] ovs_execute_actions+0x32/0xd0 [openvswitch] [<ffffffffa020b505>] ovs_dp_process_packet+0x85/0x140 [openvswitch] [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70 [<ffffffffa02068a2>] ovs_execute_actions+0xb2/0xd0 [openvswitch] [<ffffffffa020b505>] ovs_dp_process_packet+0x85/0x140 [openvswitch] [<ffffffffa0215019>] ? ovs_ct_get_labels+0x49/0x80 [openvswitch] [<ffffffffa0213a1d>] ovs_vport_receive+0x5d/0xa0 [openvswitch] [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90 [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90 [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90 [<ffffffffa0214895>] ? internal_dev_xmit+0x5/0x140 [openvswitch] [<ffffffffa02148fc>] internal_dev_xmit+0x6c/0x140 [openvswitch] [<ffffffffa0214895>] ? internal_dev_xmit+0x5/0x140 [openvswitch] [<ffffffff81660299>] dev_hard_start_xmit+0x2b9/0x5e0 [<ffffffff8165fc21>] ? netif_skb_features+0xd1/0x1f0 [<ffffffff81660f20>] __dev_queue_xmit+0x800/0x930 [<ffffffff81660770>] ? __dev_queue_xmit+0x50/0x930 [<ffffffff810b53f1>] ? mark_held_locks+0x71/0x90 [<ffffffff81669876>] ? neigh_resolve_output+0x106/0x220 [<ffffffff81661060>] dev_queue_xmit+0x10/0x20 [<ffffffff816698e8>] neigh_resolve_output+0x178/0x220 [<ffffffff816a8e6f>] ? ip_finish_output2+0x1ff/0x590 [<ffffffff816a8e6f>] ip_finish_output2+0x1ff/0x590 [<ffffffff816a8cee>] ? ip_finish_output2+0x7e/0x590 [<ffffffff816a9a31>] ip_do_fragment+0x831/0x8a0 [<ffffffff816a8c70>] ? ip_copy_metadata+0x1b0/0x1b0 [<ffffffff816a9ae3>] ip_fragment.constprop.49+0x43/0x80 [<ffffffff816a9c9c>] ip_finish_output+0x17c/0x340 [<ffffffff8169a6f4>] ? nf_hook_slow+0xe4/0x190 [<ffffffff816ab4c0>] ip_output+0x70/0x110 [<ffffffff816a9b20>] ? ip_fragment.constprop.49+0x80/0x80 [<ffffffff816aa9f9>] ip_local_out+0x39/0x70 [<ffffffff816abf89>] ip_send_skb+0x19/0x40 [<ffffffff816abfe3>] ip_push_pending_frames+0x33/0x40 [<ffffffff816df21a>] icmp_push_reply+0xea/0x120 [<ffffffff816df93d>] icmp_reply.constprop.23+0x1ed/0x230 [<ffffffff816df9ce>] icmp_echo.part.21+0x4e/0x50 [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70 [<ffffffff810d5f9e>] ? rcu_read_lock_held+0x5e/0x70 [<ffffffff816dfa06>] icmp_echo+0x36/0x70 [<ffffffff816e0d11>] icmp_rcv+0x271/0x450 [<ffffffff816a4ca7>] ip_local_deliver_finish+0x127/0x3a0 [<ffffffff816a4bc1>] ? ip_local_deliver_finish+0x41/0x3a0 [<ffffffff816a5160>] ip_local_deliver+0x60/0xd0 [<ffffffff816a4b80>] ? ip_rcv_finish+0x560/0x560 [<ffffffff816a46fd>] ip_rcv_finish+0xdd/0x560 [<ffffffff816a5453>] ip_rcv+0x283/0x3e0 [<ffffffff810b6302>] ? match_held_lock+0x192/0x200 [<ffffffff816a4620>] ? inet_del_offload+0x40/0x40 [<ffffffff8165d062>] __netif_receive_skb_core+0x392/0xae0 [<ffffffff8165e68e>] ? process_backlog+0x8e/0x230 [<ffffffff810b53f1>] ? mark_held_locks+0x71/0x90 [<ffffffff8165d7c8>] __netif_receive_skb+0x18/0x60 [<ffffffff8165e678>] process_backlog+0x78/0x230 [<ffffffff8165e6dd>] ? process_backlog+0xdd/0x230 [<ffffffff8165e355>] net_rx_action+0x155/0x400 [<ffffffff8106b48c>] __do_softirq+0xcc/0x420 [<ffffffff816a8e87>] ? ip_finish_output2+0x217/0x590 [<ffffffff8178e78c>] do_softirq_own_stack+0x1c/0x30 <EOI> [<ffffffff8106b88e>] do_softirq+0x4e/0x60 [<ffffffff8106b948>] __local_bh_enable_ip+0xa8/0xb0 [<ffffffff816a8eb0>] ip_finish_output2+0x240/0x590 [<ffffffff816a9a31>] ? ip_do_fragment+0x831/0x8a0 [<ffffffff816a9a31>] ip_do_fragment+0x831/0x8a0 [<ffffffff816a8c70>] ? ip_copy_metadata+0x1b0/0x1b0 [<ffffffff816a9ae3>] ip_fragment.constprop.49+0x43/0x80 [<ffffffff816a9c9c>] ip_finish_output+0x17c/0x340 [<ffffffff8169a6f4>] ? nf_hook_slow+0xe4/0x190 [<ffffffff816ab4c0>] ip_output+0x70/0x110 [<ffffffff816a9b20>] ? ip_fragment.constprop.49+0x80/0x80 [<ffffffff816aa9f9>] ip_local_out+0x39/0x70 [<ffffffff816abf89>] ip_send_skb+0x19/0x40 [<ffffffff816abfe3>] ip_push_pending_frames+0x33/0x40 [<ffffffff816d55d3>] raw_sendmsg+0x7d3/0xc30 [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90 [<ffffffff816e7557>] ? inet_sendmsg+0xc7/0x1d0 [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70 [<ffffffff816e759a>] inet_sendmsg+0x10a/0x1d0 [<ffffffff816e7495>] ? inet_sendmsg+0x5/0x1d0 [<ffffffff8163e398>] sock_sendmsg+0x38/0x50 [<ffffffff8163ec5f>] ___sys_sendmsg+0x25f/0x270 [<ffffffff811aadad>] ? handle_mm_fault+0x8dd/0x1320 [<ffffffff8178c147>] ? _raw_spin_unlock+0x27/0x40 [<ffffffff810529b2>] ? __do_page_fault+0x1e2/0x460 [<ffffffff81204886>] ? __fget_light+0x66/0x90 [<ffffffff8163f8e2>] __sys_sendmsg+0x42/0x80 [<ffffffff8163f932>] SyS_sendmsg+0x12/0x20 [<ffffffff8178cb17>] entry_SYSCALL_64_fastpath+0x12/0x6f Code: 00 00 44 89 e0 e9 7c fb ff ff 4c 89 ff e8 e7 e7 ff ff 41 8b 9d 80 00 00 00 2b 5d d4 89 d8 c1 f8 03 0f b7 c0 e9 33 ff ff f 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 RIP [<ffffffff816a9a92>] ip_do_fragment+0x892/0x8a0 RSP <ffff88006d603170> Fixes: 7f8a436e ("openvswitch: Add conntrack action") Signed-off-by: NJoe Stringer <joe@ovn.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 1月, 2016 1 次提交
-
-
由 Florian Westphal 提交于
The only user was removed in commit 029f7f3b ("netfilter: ipv6: nf_defrag: avoid/free clone operations"). Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 11月, 2015 1 次提交
-
-
由 Eric Dumazet 提交于
This patch fixes following problems : 1) percpu_counter_init() can return an error, therefore init_frag_mem_limit() must propagate this error so that inet_frags_init_net() can do the same up to its callers. 2) If ip[46]_frags_ns_ctl_register() fail, we must unwind properly and free the percpu_counter. Without this fix, we leave freed object in percpu_counters global list (if CONFIG_HOTPLUG_CPU) leading to crashes. This bug was detected by KASAN and syzkaller tool (http://github.com/google/syzkaller) Fixes: 6d7b857d ("net: use lib/percpu_counter API for fragmentation mem accounting") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NDmitry Vyukov <dvyukov@google.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 10月, 2015 1 次提交
-
-
由 Eric W. Biederman 提交于
The function ip_defrag is called on both the input and the output paths of the networking stack. In particular conntrack when it is tracking outbound packets from the local machine calls ip_defrag. So add a struct net parameter and stop making ip_defrag guess which network namespace it needs to defragment packets in. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Acked-by: NPablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 9月, 2015 1 次提交
-
-
由 David Ahern 提交于
Replace calls to vrf_master_ifindex_rcu and vrf_master_ifindex with either l3mdev_master_ifindex_rcu or l3mdev_master_ifindex. The pattern: oif = vrf_master_ifindex(dev) ? : dev->ifindex; is replaced with oif = l3mdev_fib_oif(dev); And remove the now unused vrf macros. Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 8月, 2015 1 次提交
-
-
由 David Ahern 提交于
inetpeer caches based on address only, so duplicate IP addresses within a namespace return the same cached entry. Enhance the ipv4 address key to contain both the IPv4 address and VRF device index. Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 8月, 2015 1 次提交
-
-
由 David Ahern 提交于
Fragmentation cache uses information from the IP header to reassemble packets. That information can be duplicated across VRFs -- same source and destination addresses, protocol and id. Handle fragmentation with VRFs by adding the VRF device index to entries in the cache and the lookup arg. Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 7月, 2015 2 次提交
-
-
由 Nikolay Aleksandrov 提交于
We can simply remove the INET_FRAG_EVICTED flag to avoid all the flags race conditions with the evictor and use a participation test for the evictor list, when we're at that point (after inet_frag_kill) in the timer there're 2 possible cases: 1. The evictor added the entry to its evictor list while the timer was waiting for the chainlock or 2. The timer unchained the entry and the evictor won't see it In both cases we should be able to see list_evictor correctly due to the sync on the chainlock. Joint work with Florian Westphal. Tested-by: NFrank Schreuder <fschreuder@transip.nl> Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
Followup patch will call it after inet_frag_queue was freed, so q->net doesn't work anymore (but netf = q->net; free(q); mem_limit(netf) would). Tested-by: NFrank Schreuder <fschreuder@transip.nl> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 7月, 2015 1 次提交
-
-
由 Edward Hyunkoo Jee 提交于
When ip_frag_queue() computes positions, it assumes that the passed sk_buff does not contain L2 headers. However, when PACKET_FANOUT_FLAG_DEFRAG is used, IP reassembly functions can be called on outgoing packets that contain L2 headers. Also, IPv4 checksum is not corrected after reassembly. Fixes: 7736d33f ("packet: Add pre-defragmentation support for ipv4 fanouts.") Signed-off-by: NEdward Hyunkoo Jee <edjee@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Jerry Chu <hkchu@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 7月, 2015 1 次提交
-
-
由 Florian Westphal 提交于
This reverts commit 3cc49492. There is nothing wrong with coalescing during defragmentation, it reduces truesize overhead and simplifies things for the receiving socket (no fraglist walk needed). However, it also destroys geometry of the original fragments. While that doesn't cause any breakage (we make sure to not exceed largest original size) ip_do_fragment contains a 'fastpath' that takes advantage of a present frag list and results in fragments that (in most cases) match what was received. In case its needed the coalescing could be done later, when we're sure the skb is not forwarded. But discussion during NFWS resulted in 'lets just remove this for now'. Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 5月, 2015 1 次提交
-
-
由 Florian Westphal 提交于
We currently always send fragments without DF bit set. Thus, given following setup: mtu1500 - mtu1500:1400 - mtu1400:1280 - mtu1280 A R1 R2 B Where R1 and R2 run linux with netfilter defragmentation/conntrack enabled, then if Host A sent a fragmented packet _with_ DF set to B, R1 will respond with icmp too big error if one of these fragments exceeded 1400 bytes. However, if R1 receives fragment sizes 1200 and 100, it would forward the reassembled packet without refragmenting, i.e. R2 will send an icmp error in response to a packet that was never sent, citing mtu that the original sender never exceeded. The other minor issue is that a refragmentation on R1 will conceal the MTU of R2-B since refragmentation does not set DF bit on the fragments. This modifies ip_fragment so that we track largest fragment size seen both for DF and non-DF packets, and set frag_max_size to the largest value. If the DF fragment size is larger or equal to the non-df one, we will consider the packet a path mtu probe: We set DF bit on the reassembled skb and also tag it with a new IPCB flag to force refragmentation even if skb fits outdev mtu. We will also set DF bit on each fragment in this case. Joint work with Hannes Frederic Sowa. Reported-by: NJesse Gross <jesse@nicira.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 5月, 2015 2 次提交
-
-
由 Andy Zhou 提交于
users in [IP_DEFRAG_CONNTRACK_BRIDGE_IN, __IP_DEFRAG_CONNTRACK_BR_IN] should not ICMP message also. Reported-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NAndy Zhou <azhou@nicira.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Andy Zhou 提交于
Improve readability of skip ICMP for de-fragmentation expiration logic. This change will also make the logic easier to maintain when the following patches in this series are applied. Signed-off-by: NAndy Zhou <azhou@nicira.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 4月, 2015 2 次提交
-
-
由 Ian Morris 提交于
The ipv4 code uses a mixture of coding styles. In some instances check for non-NULL pointer is done as x != NULL and sometimes as x. x is preferred according to checkpatch and this patch makes the code consistent by adopting the latter form. No changes detected by objdiff. Signed-off-by: NIan Morris <ipm@chirality.org.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ian Morris 提交于
The ipv4 code uses a mixture of coding styles. In some instances check for NULL pointer is done as x == NULL and sometimes as !x. !x is preferred according to checkpatch and this patch makes the code consistent by adopting the latter form. No changes detected by objdiff. Signed-off-by: NIan Morris <ipm@chirality.org.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 3月, 2015 1 次提交
-
-
由 Alexander Drozdov 提交于
ip_check_defrag() may be used by af_packet to defragment outgoing packets. skb_network_offset() of af_packet's outgoing packets is not zero. Signed-off-by: NAlexander Drozdov <al.drozdov@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 2月, 2015 1 次提交
-
-
由 Alexander Drozdov 提交于
skb_copy_bits() returns zero on success and negative value on error, so it is needed to invert the condition in ip_check_defrag(). Fixes: 1bf3751e ("ipv4: ip_check_defrag must not modify skb before unsharing") Signed-off-by: NAlexander Drozdov <al.drozdov@gmail.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 11月, 2014 1 次提交
-
-
由 Joe Perches 提交于
Use the more common dynamic_debug capable net_dbg_ratelimited and remove the LIMIT_NETDEBUG macro. All messages are still ratelimited. Some KERN_<LEVEL> uses are changed to KERN_DEBUG. This may have some negative impact on messages that were emitted at KERN_INFO that are not not enabled at all unless DEBUG is defined or dynamic_debug is enabled. Even so, these messages are now _not_ emitted by default. This also eliminates the use of the net_msg_warn sysctl "/proc/sys/net/core/warnings". For backward compatibility, the sysctl is not removed, but it has no function. The extern declaration of net_msg_warn is removed from sock.h and made static in net/core/sysctl_net_core.c Miscellanea: o Update the sysctl documentation o Remove the embedded uses of pr_fmt o Coalesce format fragments o Realign arguments Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 11月, 2014 1 次提交
-
-
由 Fabian Frederick 提交于
remove __inline__ / inline and let compiler decide what to do with static functions Inspired-by: N"David S. Miller" <davem@davemloft.net> Signed-off-by: NFabian Frederick <fabf@skynet.be> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 10月, 2014 1 次提交
-
-
由 Fabian Frederick 提交于
ip4_frags_ctl_register is only called by __init ipfrag_init Signed-off-by: NFabian Frederick <fabf@skynet.be> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 8月, 2014 3 次提交
-
-
由 Nikolay Aleksandrov 提交于
Use kmem_cache to allocate/free inet_frag_queue objects since they're all the same size per inet_frags user and are alloced/freed in high volumes thus making it a perfect case for kmem_cache. Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com> Acked-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nikolay Aleksandrov 提交于
Now that we have INET_FRAG_EVICTED we might as well use it to stop sending icmp messages in the "frag_expire" functions instead of stripping INET_FRAG_FIRST_IN from their flags when evicting. Also fix the comment style in ip6_expire_frag_queue(). Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com> Reviewed-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nikolay Aleksandrov 提交于
The last_in field has been used to store various flags different from first/last frag in so give it a more descriptive name: flags. Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 7月, 2014 9 次提交
-
-
由 Nikolay Aleksandrov 提交于
This patch makes init_net's high_thresh limit to be the maximum for all namespaces, thus introducing a global memory limit threshold equal to the sum of the individual high_thresh limits which are capped. It also introduces some sane minimums for low_thresh as it shouldn't be able to drop below 0 (or > high_thresh in the unsigned case), and overall low_thresh should not ever be above high_thresh, so we make the following relations for a namespace: init_net: high_thresh - max(not capped), min(init_net low_thresh) low_thresh - max(init_net high_thresh), min (0) all other namespaces: high_thresh = max(init_net high_thresh), min(namespace's low_thresh) low_thresh = max(namespace's high_thresh), min(0) The major issue with having low_thresh > high_thresh is that we'll schedule eviction but never evict anything and thus rely only on the timers. Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
rehash is rare operation, don't force readers to take the read-side rwlock. Instead, we only have to detect the (rare) case where the secret was altered while we are trying to insert a new inetfrag queue into the table. If it was changed, drop the bucket lock and recompute the hash to get the 'new' chain bucket that we have to insert into. Joint work with Nikolay Aleksandrov. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
merge functionality into the eviction workqueue. Instead of rebuilding every n seconds, take advantage of the upper hash chain length limit. If we hit it, mark table for rebuild and schedule workqueue. To prevent frequent rebuilds when we're completely overloaded, don't rebuild more than once every 5 seconds. ipfrag_secret_interval sysctl is now obsolete and has been marked as deprecated, it still can be changed so scripts won't be broken but it won't have any effect. A comment is left above each unused secret_timer variable to avoid confusion. Joint work with Nikolay Aleksandrov. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
no longer used. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
The 'nqueues' counter is protected by the lru list lock, once thats removed this needs to be converted to atomic counter. Given this isn't used for anything except for reporting it to userspace via /proc, just remove it. We still report the memory currently used by fragment reassembly queues. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
When the high_thresh limit is reached we try to toss the 'oldest' incomplete fragment queues until memory limits are below the low_thresh value. This happens in softirq/packet processing context. This has two drawbacks: 1) processors might evict a queue that was about to be completed by another cpu, because they will compete wrt. resource usage and resource reclaim. 2) LRU list maintenance is expensive. But when constantly overloaded, even the 'least recently used' element is recent, so removing 'lru' queue first is not 'fairer' than removing any other fragment queue. This moves eviction out of the fast path: When the low threshold is reached, a work queue is scheduled which then iterates over the table and removes the queues that exceed the memory limits of the namespace. It sets a new flag called INET_FRAG_EVICTED on the evicted queues so the proper counters will get incremented when the queue is forcefully expired. When the high threshold is reached, no more fragment queues are created until we're below the limit again. The LRU list is now unused and will be removed in a followup patch. Joint work with Nikolay Aleksandrov. Suggested-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
First step to move eviction handling into a work queue. We lose two spots that accounted evicted fragments in MIB counters. Accounting will be restored since the upcoming work-queue evictor invokes the frag queue timer callbacks instead. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
hide actual hash size from individual users: The _find function will now fold the given hash value into the required range. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 5月, 2014 1 次提交
-
-
由 Vasily Averin 提交于
Defrag user check in ip_expire was not updated after adding support for "conntrack zones". This bug manifests as a RFC violation, since the router will send the icmp time exceeeded message when using conntrack zones. Signed-off-by: NVasily Averin <vvs@openvz.org> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 18 12月, 2013 1 次提交
-
-
由 Tom Herbert 提交于
In several places 'skb->rxhash = 0' is being done to clear the rxhash value in an skb. This does not clear l4_rxhash which could still be set so that the rxhash wouldn't be recalculated on subsequent call to skb_get_rxhash. This patch adds an explict function to clear all the rxhash related information in the skb properly. skb_clear_hash_if_not_l4 clears the rxhash only if it is not marked as l4_rxhash. Fixed up places where 'skb->rxhash = 0' was being called. Signed-off-by: NTom Herbert <therbert@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 10月, 2013 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
Defer the generation of the first hash secret for the ipv4 fragmentation cache as late as possible. ip4_frags.rnd gets initial seeded by inet_frags_init and regulary reseeded by inet_frag_secret_rebuild. Either we call ipqhashfn directly from ip_fragment.c in which case we initialize the secret directly. If we first get called by inet_frag_secret_rebuild we install a new secret by a manual call to get_random_bytes. This secret will be overwritten as soon as the first call to ipqhashfn happens. This is safe because we won't race while publishing the new secrets with anyone else. Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 4月, 2013 1 次提交
-
-
由 Eric Dumazet 提交于
Commit 4a94445c (net: Use ip_route_input_noref() in input path) added a bug in IP defragmentation handling, as non refcounted dst could escape an RCU protected section. Commit 64f3b9e2 (net: ip_expire() must revalidate route) fixed the case of timeouts, but not the general problem. Tom Parkin noticed crashes in UDP stack and provided a patch, but further analysis permitted us to pinpoint the root cause. Before queueing a packet into a frag list, we must drop its dst, as this dst has limited lifetime (RCU protected) When/if a packet is finally reassembled, we use the dst of the very last skb, still protected by RCU and valid, as the dst of the reassembled packet. Use same logic in IPv6, as there is no need to hold dst references. Reported-by: NTom Parkin <tparkin@katalix.com> Tested-by: NTom Parkin <tparkin@katalix.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 3月, 2013 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
This patch just moves some code arround to make the ip4_frag_ecn_table and IPFRAG_ECN_* constants accessible from the other reassembly engines. I also renamed ip4_frag_ecn_table to ip_frag_ecn_table. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jesper Dangaard Brouer <jbrouer@redhat.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-