- 09 10月, 2012 6 次提交
-
-
由 Eric Dumazet 提交于
It seems IPV6_GRO_CB(skb)->proto can be destroyed in skb_gro_receive() if a new skb is allocated (to serve as an anchor for frag_list) We copy NAPI_GRO_CB() only (not the IPV6 specific part) in : *NAPI_GRO_CB(nskb) = *NAPI_GRO_CB(p); So we leave IPV6_GRO_CB(nskb)->proto to 0 (fresh skb allocation) instead of IPPROTO_TCP (6) ipv6_gro_complete() isnt able to call ops->gro_complete() [ tcp6_gro_complete() ] Fix this by moving proto in NAPI_GRO_CB() and getting rid of IPV6_GRO_CB Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Zumbiehl 提交于
6a32e4f9 made the vlan code skip marking vlan-tagged frames for not locally configured vlans as PACKET_OTHERHOST if there was an rx_handler, as the rx_handler could cause the frame to be received on a different (virtual) vlan-capable interface where that vlan might be configured. As rx_handlers do not necessarily return RX_HANDLER_ANOTHER, this could cause frames for unknown vlans to be delivered to the protocol stack as if they had been received untagged. For example, if an ipv6 router advertisement that's tagged for a locally not configured vlan is received on an interface with macvlan interfaces attached, macvlan's rx_handler returns RX_HANDLER_PASS after delivering the frame to the macvlan interfaces, which caused it to be passed to the protocol stack, leading to ipv6 addresses for the announced prefix being configured even though those are completely unusable on the underlying interface. The fix moves marking as PACKET_OTHERHOST after the rx_handler so the rx_handler, if there is one, sees the frame unchanged, but afterwards, before the frame is delivered to the protocol stack, it gets marked whether there is an rx_handler or not. Signed-off-by: NFlorian Zumbiehl <florz@florz.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Current GRO can hold packets in gro_list for almost unlimited time, in case napi->poll() handler consumes its budget over and over. In this case, napi_complete()/napi_gro_flush() are not called. Another problem is that gro_list is flushed in non friendly way : We scan the list and complete packets in the reverse order. (youngest packets first, oldest packets last) This defeats priorities that sender could have cooked. Since GRO currently only store TCP packets, we dont really notice the bug because of retransmits, but this behavior can add unexpected latencies, particularly on mice flows clamped by elephant flows. This patch makes sure no packet can stay more than 1 ms in queue, and only in stress situations. It also complete packets in the right order to minimize latencies. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jesse Gross <jesse@nicira.com> Cc: Tom Herbert <therbert@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Steffen Klassert 提交于
We report cached pmtu values even if they are already expired. Change this to not report these values after they are expired and fix a race in the expire time calculation, as suggested by Eric Dumazet. Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Steffen Klassert 提交于
When a local tool like tracepath tries to send packets bigger than the device mtu, we create a nh exeption and set the pmtu to device mtu. The device mtu does not expire, so check if the device mtu is smaller than the reported pmtu and don't crerate a nh exeption in that case. Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Steffen Klassert 提交于
Some protocols, like IPsec still cache routes. So we need to invalidate the old route on pmtu events to avoid the reuse of stale routes. We also need to update the mtu and expire time of the route if we already use a nh exception route, otherwise we ignore newly learned pmtu values after the first expiration. With this patch we always invalidate or update the route on pmtu events. Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 10月, 2012 3 次提交
-
-
由 Eric Dumazet 提交于
Before accessing skb first fragment, better make sure there is one. This is probably not needed for old kernels, since an ethernet frame cannot contain only an ethernet header, but the recent GRO addition to tunnels makes this patch needed. Also skb_gro_reset_offset() can be static, it actually allows compiler to inline it. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
IPv4 side of the problem was addressed in commit a9e050f4 (net: tcp: GRO should be ECN friendly) This patch does the same, but for IPv6 : A Traffic Class mismatch doesnt mean flows are different, but instead should force a flush of previous packets. This patch removes artificial packet reordering problem. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 ramesh.nagappa@gmail.com 提交于
The retry loop in neigh_resolve_output() and neigh_connected_output() call dev_hard_header() with out reseting the skb to network_header. This causes the retry to fail with skb_under_panic. The fix is to reset the network_header within the retry loop. Signed-off-by: NRamesh Nagappa <ramesh.nagappa@ericsson.com> Reviewed-by: NShawn Lu <shawn.lu@ericsson.com> Reviewed-by: NRobert Coulson <robert.coulson@ericsson.com> Reviewed-by: NBillie Alsup <billie.alsup@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 10月, 2012 2 次提交
-
-
由 Eric Dumazet 提交于
Over time, skb recycling infrastructure got litle interest and many bugs. Generic rx path skb allocation is now using page fragments for efficient GRO / TCP coalescing, and recyling a tx skb for rx path is not worth the pain. Last identified bug is that fat skbs can be recycled and it can endup using high order pages after few iterations. With help from Maxime Bizon, who pointed out that commit 87151b86 (net: allow pskb_expand_head() to get maximum tailroom) introduced this regression for recycled skbs. Instead of fixing this bug, lets remove skb recycling. Drivers wanting really hot skbs should use build_skb() anyway, to allocate/populate sk_buff right before netif_receive_skb() Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Gao feng 提交于
I get a panic when I use ss -a and rmmod inet_diag at the same time. It's because netlink_dump uses inet_diag_dump which belongs to module inet_diag. I search the codes and find many modules have the same problem. We need to add a reference to the module which the cb->dump belongs to. Thanks for all help from Stephen,Jan,Eric,Steffen and Pablo. Change From v3: change netlink_dump_start to inline,suggestion from Pablo and Eric. Change From v2: delete netlink_dump_done,and call module_put in netlink_dump and netlink_sock_destruct. Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 10月, 2012 2 次提交
-
-
由 Andi Kleen 提交于
Signed-off-by: NAndi Kleen <ak@linux.intel.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andi Kleen 提交于
Signed-off-by: NAndi Kleen <ak@linux.intel.com> Cc: Oliver Hartkopp <socketcan@hartkopp.net> Cc: David Miller <davem@davemloft.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 10月, 2012 7 次提交
-
-
由 Gao feng 提交于
as we hold dst_entry before we call __ip6_del_rt, so we should alse call dst_release not only return -ENOENT when the rt6_info is ip6_null_entry. and we already hold the dst entry, so I think it's safe to call dst_release out of the write-read lock. Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Dave Jones 提交于
Validation of userspace input shouldn't trigger dmesg spamming. Signed-off-by: NDave Jones <davej@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Erik Hugne 提交于
When large buffers are sent over connected TIPC sockets, it is likely that the sk_backlog will be filled up on the receiver side, but the TIPC flow control mechanism is happily unaware of this since that is based on message count. The sender will receive a TIPC_ERR_OVERLOAD message when this occurs and drop it's side of the connection, leaving it stale on the receiver end. By increasing the sk_rcvbuf to a 'worst case' value, we avoid the overload caused by a full backlog queue and the flow control will work properly. This worst case value is the max TIPC message size times the flow control window, multiplied by two because a sender will transmit up to double the window size before a port is marked congested. We multiply this by 2 to account for the sk_buff and other overheads. Signed-off-by: NErik Hugne <erik.hugne@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Dave Jones 提交于
Fuzzing causes these printks to spew constantly. Changing them to DEBUG statements is consistent with other usage in the file, and makes them disappear when CONFIG_IRDA_DEBUG is disabled. Signed-off-by: NDave Jones <davej@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nicolas Dichtel 提交于
Suppose we have an SCTP connection with two paths. After connection is established, path1 is not available, thus this path is marked as inactive. Then traffic goes through path2, but for some reasons packets are delayed (after rto.max). Because packets are delayed, the retransmit mechanism will switch again to path1. At this time, we receive a delayed SACK from path2. When we update the state of the path in sctp_check_transmitted(), we do not take into account the source address of the SACK, hence we update the wrong path. Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: NVlad Yasevich <vyasevich@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nicolas Dichtel 提交于
Just to avoid confusion when people only reads this prototype. Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: NVlad Yasevich <vyasevich@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
commit d2d68ba9 (ipv4: Cache input routes in fib_info nexthops.) introduced a regression for forwarding. This was hard to reproduce but the symptom was that packets were delivered to local host instead of being forwarded. David suggested to add fib_type to fib_info so that we dont inadvertently share same fib_info for different purposes. With help from Julian Anastasov who provided very helpful hints, reproduced here : <quote> Can it be a problem related to fib_info reuse from different routes. For example, when local IP address is created for subnet we have: broadcast 192.168.0.255 dev DEV proto kernel scope link src 192.168.0.1 192.168.0.0/24 dev DEV proto kernel scope link src 192.168.0.1 local 192.168.0.1 dev DEV proto kernel scope host src 192.168.0.1 The "dev DEV proto kernel scope link src 192.168.0.1" is a reused fib_info structure where we put cached routes. The result can be same fib_info for 192.168.0.255 and 192.168.0.0/24. RTN_BROADCAST is cached only for input routes. Incoming broadcast to 192.168.0.255 can be cached and can cause problems for traffic forwarded to 192.168.0.0/24. So, this patch should solve the problem because it separates the broadcast from unicast traffic. And the ip_route_input_slow caching will work for local and broadcast input routes (above routes 1 and 3) just because they differ in scope and use different fib_info. </quote> Many thanks to Chris Clayton for his patience and help. Reported-by: NChris Clayton <chris2553@googlemail.com> Bisected-by: NChris Clayton <chris2553@googlemail.com> Reported-by: NDave Jones <davej@redhat.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Julian Anastasov <ja@ssi.bg> Tested-by: NChris Clayton <chris2553@googlemail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 10月, 2012 3 次提交
-
-
由 Antonio Quartulli 提交于
skb_reset_mac_len() relies on the value of the skb->network_header pointer, therefore we must wait for such pointer to be recalculated before computing the new mac_len value. Signed-off-by: NAntonio Quartulli <ordex@autistici.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nicolas Dichtel 提交于
When an address is added on loopback (ip -6 a a 2002::1/128 dev lo), a route to fe80::/64 is added in the main table: unreachable fe80::/64 dev lo proto kernel metric 256 error -101 This route does not match any prefix (no fe80:: address on lo). In fact, addrconf_dev_config() will not add link local address because this function filters interfaces by type. If the link local address is added manually, the route to the link local prefix will be automatically added by addrconf_add_linklocal(). Note also, that this route is not deleted when the address is removed. After looking at the code, it seems that addrconf_add_lroute() is redundant with addrconf_add_linklocal(), because this function will add the link local route when the link local address is configured. Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Linus Torvalds 提交于
The network merge brought in a few users of functions that got deprecated by the workqueue cleanups: the 'system_nrt_wq' is now the same as the regular system_wq, since all workqueues are now non- reentrant. Similarly, remove one use of flush_work_sync() - the regular flush_work() has become synchronous, and the "_sync()" version is thus deprecated as being superfluous. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 02 10月, 2012 9 次提交
-
-
由 stephen hemminger 提交于
Needed for VXLAN. Signed-off-by: NStephen Hemminger <shemminger@vyatta.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 stephen hemminger 提交于
Later changes need to be able to refer to neighbour attributes when doing fdb_add. Signed-off-by: NStephen Hemminger <shemminger@vyatta.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 stephen hemminger 提交于
Use be16 consistently when looking at flags. Signed-off-by: NStephen Hemminger <shemminger@vyatta.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Dan Carpenter 提交于
Because sizeof() is size_t then if "len" is negative, it counts as a large positive value. The call tree looks like: pfkey_sendmsg() -> pfkey_process() -> pfkey_spdadd() -> parse_ipsecrequests() Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Add GRO capability to IPv4 GRE tunnels, using the gro_cells infrastructure. Tested using IPv4 and IPv6 TCP traffic inside this tunnel, and checking GRO is building large packets. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
This adds a new include file (include/net/gro_cells.h), to bring GRO (Generic Receive Offload) capability to tunnels, in a modular way. Because tunnels receive path is lockless, and GRO adds a serialization using a napi_struct, I chose to add an array of up to DEFAULT_MAX_NUM_RSS_QUEUES cells, so that multi queue devices wont be slowed down because of GRO layer. skb_get_rx_queue() is used as selector. In the future, we might add optional fanout capabilities, using rxhash for example. With help from Ben Hutchings who reminded me netif_get_num_default_rss_queues() function. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
skb with CHECKSUM_NONE cant currently be handled by GRO, and we notice this deep in GRO stack in tcp[46]_gro_receive() But there are cases where GRO can be a benefit, even with a lack of checksums. This preliminary work is needed to add GRO support to tunnels. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nicolas Dichtel 提交于
When an address is added on loopback (ip -6 a a 2002::1/128 dev lo), two routes are added: - one in the local table: local 2002::1 via :: dev lo proto none metric 0 - one the in main table (for the prefix): unreachable 2002::1 dev lo proto kernel metric 256 error -101 When the address is deleted, the route inserted in the main table remains because we use rt6_lookup(), which returns NULL when dst->error is set, which is the case here! Thus, it is better to use ip6_route_lookup() to avoid this kind of filter. Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Weiping Pan 提交于
Commit ec47ea82(skb: Add inline helper for getting the skb end offset from head) introduces this helper function, skb_end_offset(), we should make use of it. Signed-off-by: NWeiping Pan <wpan@redhat.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 9月, 2012 1 次提交
-
-
由 Lin Ming 提交于
fib6_add_1() should consistently return errno pointers, rather than a mixture of NULL and errno pointers. Signed-off-by: NLin Ming <mlin@ss.pku.edu.cn> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 9月, 2012 7 次提交
-
-
由 Eric Dumazet 提交于
We currently use percpu order-0 pages in __netdev_alloc_frag to deliver fragments used by __netdev_alloc_skb() Depending on NIC driver and arch being 32 or 64 bit, it allows a page to be split in several fragments (between 1 and 8), assuming PAGE_SIZE=4096 Switching to bigger pages (32768 bytes for PAGE_SIZE=4096 case) allows : - Better filling of space (the ending hole overhead is less an issue) - Less calls to page allocator or accesses to page->_count - Could allow struct skb_shared_info futures changes without major performance impact. This patch implements a transparent fallback to smaller pages in case of memory pressure. It also uses a standard "struct page_frag" instead of a custom one. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nicolas Dichtel 提交于
When jiffies wraps around (for example, 5 minutes after the boot, see INITIAL_JIFFIES) and peer has just been created, now - peer->rate_last can be < XRLIM_BURST_FACTOR * timeout, so token is not set to the maximum value, thus some icmp packets can be unexpectedly dropped. Fix this case by initializing last_rate to 60 seconds in the past. Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Christoph Paasch 提交于
struct sock *sk is not used inside tcp_v4_save_options. Thus it can be removed. Signed-off-by: NChristoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
dev_parse_header() callers provide 8 bytes of storage, so it's not possible to store an IPv6 address. Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
It seems sk_init() has no value today and even does strange things : # grep . /proc/sys/net/core/?mem_* /proc/sys/net/core/rmem_default:212992 /proc/sys/net/core/rmem_max:131071 /proc/sys/net/core/wmem_default:212992 /proc/sys/net/core/wmem_max:131071 We can remove it completely. Signed-off-by: NEric Dumazet <edumazet@google.com> Reviewed-by: NShan Wei <davidshan@tencent.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
GCC refuses to recognize that all error control flows do in fact set err to something. Add an explicit initialization to shut it up. net/sched/sch_drr.c: In function ‘drr_enqueue’: net/sched/sch_drr.c:359:11: warning: ‘err’ may be used uninitialized in this function [-Wmaybe-uninitialized] net/sched/sch_qfq.c: In function ‘qfq_enqueue’: net/sched/sch_qfq.c:885:11: warning: ‘err’ may be used uninitialized in this function [-Wmaybe-uninitialized] Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Konstantin Khlebnikov 提交于
fix copy-paste error introduced in linux-next commit "ipv6: add a new namespace for nf_conntrack_reasm" Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org> Cc: Amerigo Wang <amwang@redhat.com> Cc: David S. Miller <davem@davemloft.net> Acked-by: NCong Wang <amwang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-