- 28 Jan 2009, 3 commits
-
Committed by David S. Miller

We now only TX hash on pre-computed SKB properties. The thinking is:

1) High-performance routing and firewalling setups will have a multiqueue-capable card used for receive, and therefore will have RX queue recordings made into the SKB which can be used for the TX-side hash.

2) Locally generated packets will have an attached socket and thus a valid sk->sk_hash to make use of.

Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by David S. Miller

Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by David S. Miller

The idea is that drivers which implement multiqueue RX pre-seed the SKB by recording the RX queue selected by the hardware. If such a seed is found on TX, we use it to select the outgoing TX queue. This helps get more consistent load balancing on router and firewall loads.

Signed-off-by: David S. Miller <davem@davemloft.net>
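A minimal sketch of the pattern these two commits describe, assuming a multiqueue-capable driver; example_rx and example_select_queue are illustrative names, while the skb_record_rx_queue/skb_rx_queue_recorded/skb_get_rx_queue helpers follow the kernel's SKB queue-mapping API:

```c
#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <net/sock.h>

/* Driver RX path: record which hardware queue delivered the packet. */
static void example_rx(struct sk_buff *skb, u16 hw_rx_queue)
{
	skb_record_rx_queue(skb, hw_rx_queue);
	netif_receive_skb(skb);
}

/* TX queue selection: prefer the recorded RX queue if present. */
static u16 example_select_queue(struct net_device *dev, struct sk_buff *skb)
{
	if (skb_rx_queue_recorded(skb))
		return skb_get_rx_queue(skb) % dev->real_num_tx_queues;
	if (skb->sk && skb->sk->sk_hash)
		return skb->sk->sk_hash % dev->real_num_tx_queues;
	return 0;	/* fall back to queue 0 */
}
```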
-
- 21 Jan 2009, 3 commits
-
Committed by Herbert Xu

The previous fix to paged packets broke merging because it reset skb->len before we added it to the merged packet. This wasn't detected because it simply resulted in truncation of the packet, with the missing bit subsequently retransmitted. The fix is to store skb->len before we clobber it.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Herbert Xu

When a frag is shorter than an Ethernet header, we'd return a zeroed packet instead of aborting. This patch fixes that.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Jiri Slaby

register_pernet_gen_subsys omits mutex_unlock in one failure path. Fix it.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 20 Jan 2009, 2 commits
-
Committed by Jarek Poplawski

The trick in socket splicing where we try to convert the skb->data into a page-based reference using virt_to_page() does not work so well.

The idea is to pass the virt_to_page() reference via the pipe buffer, and refcount the buffer using an SKB reference. But if we are splicing from a socket to a socket (via sendpage) this doesn't work.

The from-side processing will grab the page (and SKB) references. The sendpage() calls will grab page references only, return, and then the from-side processing completes and drops the SKB reference.

The page-based reference to skb->data is not enough to keep the kmalloc() buffer backing it from being reused. Yet that is all the socket send side has at this point. This leads to data corruption if the skb->data buffer is reused by SLAB before the send-side socket actually gets the TX packet out to the device.

The fix employed here is to simply allocate a page and copy the skb->data bytes into that page. This will hurt performance, but there is no clear way to fix this properly without a copy at the present time, and it is important to get rid of the data corruption.

With fixes from Herbert Xu.

Tested-by: Willy Tarreau <w@1wt.eu>
Foreseen-by: Changli Gao <xiaosuo@gmail.com>
Diagnosed-by: Willy Tarreau <w@1wt.eu>
Reported-by: Willy Tarreau <w@1wt.eu>
Fixed-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
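A sketch of the copy-based fix this commit describes; linear_to_page is shown as an illustrative helper and the single-page bound is a simplifying assumption:

```c
#include <linux/mm.h>
#include <linux/skbuff.h>
#include <linux/string.h>

/*
 * Back the spliced bytes with a freshly allocated page, instead of a
 * virt_to_page() reference whose lifetime depends on the skb's
 * kmalloc'd linear buffer.
 */
static struct page *linear_to_page(struct sk_buff *skb, unsigned int offset,
				   unsigned int len)
{
	struct page *page;

	if (len > PAGE_SIZE)	/* simplifying assumption for this sketch */
		return NULL;
	page = alloc_page(GFP_KERNEL);
	if (!page)
		return NULL;
	memcpy(page_address(page), skb->data + offset, len);
	return page;
}
```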
-
Committed by Herbert Xu

I'm trying to track down why people are hitting the checksum warning in skb_gso_segment. As the problem seems to be hitting lots of people and I can't reproduce it or locate the bug, here is a patch to print out more details, which hopefully should help us track this down.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 15 Jan 2009, 3 commits
-
Committed by Benjamin Herrenschmidt

This adds an init_dummy_netdev() function that takes a network device structure (allocation and lifetime entirely under the caller's control) and initializes the minimum set of fields so it can be used to schedule NAPI polls without registering a full-blown interface. This is to be used by drivers that need to tie several hardware interfaces to a single NAPI poll scheduler due to HW limitations.

It also updates the ibm_newemac driver to use this, thus fixing the oops on 2.6.29 due to passing NULL as "dev" to netif_napi_add().

The symbol is exported GPL-only, as I don't think we want binary drivers doing that sort of acrobatics (if we want them at all).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
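A sketch of the intended usage, assuming a driver whose hardware forces several channels behind one poll routine; all names (and the era's four-argument netif_napi_add) are illustrative:

```c
#include <linux/netdevice.h>

static struct net_device dummy_dev;	/* lifetime owned by the driver */
static struct napi_struct shared_napi;

static int shared_poll(struct napi_struct *napi, int budget)
{
	int done = 0;
	/* ... service every hardware channel here ... */
	if (done < budget)
		napi_complete(napi);
	return done;
}

static void example_init(void)
{
	init_dummy_netdev(&dummy_dev);	/* no register_netdev() needed */
	netif_napi_add(&dummy_dev, &shared_napi, shared_poll, 64);
	napi_enable(&shared_napi);
}
```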
-
Committed by Herbert Xu

When an skb with page frags is merged into an existing one, we cannibalise its reference count. This is OK when the skb is reused, because we set nr_frags to zero in that case. However, for the case where the skb is freed through kfree_skb, we didn't clear nr_frags, which causes the page to be freed prematurely. This is fixed by moving the skb resetting into skb_gro_receive.

Reported-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Herbert Xu

As GRO cannot be applied to packets with a frag_list, we need to make sure that we reject such packets if they are fed to us, e.g., through a tunnel device. Also, there is no point in applying GRO to GSO packets, so they too should be rejected. This allows GRO to be used in virtio-net, which may produce GSO packets directly but may still benefit from GRO if the other end of it doesn't support GSO.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 11 Jan 2009, 1 commit
-
Committed by Dan Williams

The recent dmaengine rework removed the capability to remove dma device driver modules while net_dma is active. Rather than notifying dmaengine clients that channels are about to be removed, we now rely on clients to notify dmaengine when they no longer have a need for channels. Teach net_dma to release channels by taking dmaengine references at netdevice open and dropping references at netdevice close.

Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 07 Jan 2009, 5 commits
-
Committed by Herbert Xu

Previously GRO's only entry points from the outside were napi_gro_receive and napi_gro_frags. These interfaces are for device drivers. This patch rearranges things to provide a new set of interfaces for VLANs. These interfaces are for internal use only. The VLAN code itself can then provide a set of entry points for device drivers.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Dan Williams

All users have been converted to either the general-purpose allocator, dma_find_channel, or dma_request_channel.

Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
Committed by Dan Williams

Now that clients no longer need to be notified of channel arrival, dma_async_client_register can simply increment the dmaengine_ref_count.

Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
Committed by Dan Williams

Use the general-purpose channel allocation provided by dmaengine.

Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
Committed by Dan Williams

async_tx and net_dma each have open-coded versions of issue_pending_all, so provide a common routine in dmaengine. The implementation needs to walk the global device list, so use RCU to allow dma_issue_pending_all to run locklessly. Clients protect themselves from channel-removal events by holding a dmaengine reference.

Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
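A sketch of the lockless walk this enables; the list and field names mirror the dmaengine core but should be read as an illustration, not the exact patch:

```c
#include <linux/dmaengine.h>
#include <linux/rculist.h>
#include <linux/rcupdate.h>

/* Kick all pending descriptors without holding the device-list lock. */
static void example_issue_pending_all(void)
{
	struct dma_device *device;
	struct dma_chan *chan;

	rcu_read_lock();
	list_for_each_entry_rcu(device, &dma_device_list, global_node)
		list_for_each_entry(chan, &device->channels, device_node)
			if (chan->client_count)
				device->device_issue_pending(chan);
	rcu_read_unlock();
}
```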
-
- 06 Jan 2009, 1 commit
-
Committed by David S. Miller

This reverts commit 22604c86. We can't fix this issue this way, because we would now try to take the dev_base_lock rwlock as a writer in software interrupt context, and that is not allowed without major surgery elsewhere. This initial link-state problem needs to be solved in some other way.

Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 05 Jan 2009, 3 commits
-
Committed by Michael Marineau

From: Michael Marineau <mike@marineau.org>

Commit b4730016 "Do not fire linkwatch events until the device is registered." was made as a workaround for drivers that call netif_carrier_off before registering the device. Unfortunately this causes these drivers to incorrectly report their link status as IF_OPER_UNKNOWN, which can falsely set the IFF_RUNNING flag when the interface is first brought up. This issue was previously pointed out[1] but was dismissed, saying that IFF_RUNNING is not related to the link status. From my digging, IFF_RUNNING, as reported to userspace, is based on the link state. It is set based on __LINK_STATE_START and IF_OPER_UP or IF_OPER_UNKNOWN. See [2], [3], and [4]. (Whether or not the kernel has IFF_RUNNING set in flags is not reported to user space, so it may well be independent of the link; I don't know if and when it may get set.)

The end result varies slightly depending on the driver. The two I tested were e1000e and b44. With e1000e, if the system is booted without a network cable attached, the interface will falsely report RUNNING when it is brought up, causing NetworkManager to attempt to start it and eventually time out. With b44, when the system is booted with a network cable attached and brought up with dhcpcd, it will time out the first time.

The attached patch will still set the operstate variable correctly to IF_OPER_UP/DOWN/etc when linkwatch_fire_event is called, but then return, rather than skipping the linkwatch_fire_event call entirely as the previous fix did. (Sorry it isn't inline; I don't have a patch-friendly email client at the moment.)

Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Herbert Xu

This patch allows GRO to merge page frags (skb_shinfo(skb)->frags) in one skb, rather than using the less efficient frag_list. It also adds a new interface, napi_gro_frags, to allow drivers to inject page frags directly into the stack without allocating an skb. This is intended to be the GRO equivalent of LRO's lro_receive_frags interface. The existing GSO interface can already handle page frags with or without an appended frag_list, so nothing needs to change there.

The merging itself is rather simple. We store any new frag entries after the last existing entry, without checking whether the first new entry can be merged with the last existing entry. Making this check would actually be easy, but since no existing driver can produce contiguous frags anyway it would just be mental masturbation. If the total number of entries would exceed the capacity of a single skb, we simply resort to using frag_list as we do now.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
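A simplified sketch of the merging step; the frag_list fallback and length accounting are omitted, and example_merge_frags is an illustrative name:

```c
#include <linux/errno.h>
#include <linux/skbuff.h>
#include <linux/string.h>

/* Append skb's page frags after p's existing frags, donating the
 * page references; on overflow the caller falls back to frag_list. */
static int example_merge_frags(struct sk_buff *p, struct sk_buff *skb)
{
	struct skb_shared_info *pinfo = skb_shinfo(p);
	struct skb_shared_info *sinfo = skb_shinfo(skb);

	if (pinfo->nr_frags + sinfo->nr_frags > MAX_SKB_FRAGS)
		return -E2BIG;

	memcpy(pinfo->frags + pinfo->nr_frags, sinfo->frags,
	       sinfo->nr_frags * sizeof(skb_frag_t));
	pinfo->nr_frags += sinfo->nr_frags;
	sinfo->nr_frags = 0;	/* p now owns the page references */
	return 0;
}
```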
-
Committed by Herbert Xu

In order to allow GRO packets without a frag_list at all, we need to store the MSS in the packet itself. The obvious place is gso_size. The only thing to watch out for is that if the packet ends up not being GRO'd, we need to clear gso_size before pushing the packet into the stack.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 30 Dec 2008, 2 commits
-
Committed by Rusty Russell

In future, all cpumask ops will only be valid (in general) for bit numbers < nr_cpu_ids. So use that instead of NR_CPUS in iterators and other comparisons. This is always safe: no cpu number can be >= nr_cpu_ids, and nr_cpu_ids is initialized to NR_CPUS at boot.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
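A minimal before/after illustration; nothing here is networking-specific:

```c
#include <linux/cpumask.h>

static int example_count_possible_cpus(void)
{
	int cpu, n = 0;

	/* old: for (cpu = 0; cpu < NR_CPUS; cpu++) — wasteful when
	 * nr_cpu_ids (fixed at boot) is much smaller than NR_CPUS */
	for (cpu = 0; cpu < nr_cpu_ids; cpu++)
		if (cpu_possible(cpu))
			n++;
	return n;
}
```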
-
Committed by Eric W. Biederman

During network namespace teardown we either move or delete all of the network devices associated with a network namespace. In the case of veth devices, deleting one will also delete its pair device. If both devices are in the same network namespace, then for_each_netdev_safe is insufficient, as next may point to the second veth device we have deleted. To avoid problems, I do what we do in __rtnl_kill_links and restart the scan of the device list after we have deleted a device.

Currently dev_change_netnamespace does not appear to suffer from this problem, but wireless devices are also paired and likely should be moved between network namespaces together. So I have erred on the side of caution and restart the scan of the network devices in that case as well.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 27 Dec 2008, 1 commit
-
Committed by Herbert Xu

The initial skb may have been freed after napi_gro_complete in napi_gro_receive if it was merged into an existing packet. Thus we cannot check same_flow (which indicates whether it was merged) after calling napi_gro_complete. This patch fixes that by saving the same_flow status before the call to napi_gro_complete.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
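The fix reduces to reading the flag while the skb is still guaranteed to be alive. A sketch of the reordering, shown as an illustrative fragment of napi_gro_receive()-style core code rather than a standalone API:

```c
#include <linux/netdevice.h>

static void example_finish(struct napi_struct *napi, struct sk_buff *skb,
			   struct sk_buff *held)
{
	/* read before napi_gro_complete(): a merge may free skb */
	int same_flow = NAPI_GRO_CB(skb)->same_flow;

	if (held)
		napi_gro_complete(held);

	if (same_flow)
		return;		/* skb was merged; don't touch it again */

	/* otherwise hand the unmerged skb on, e.g. netif_receive_skb(skb) */
}
```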
-
- 26 Dec 2008, 1 commit
-
Committed by Peter P Waskiewicz Jr

The recent GRO patches introduced NAPI removal of devices in free_netdev. For drivers that can change the number of queues during driver operation, the NAPI infrastructure doesn't allow the freeing and re-addition of NAPI entities without reloading the driver. This change reinitializes the dev_list in each NAPI struct on delete, instead of just deleting it (and assigning the list pointers to POISON). Drivers that wish to remove/re-add NAPI will need to re-initialize the netdev napi_list after removing all NAPI instances and before re-adding NAPI devices.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
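A sketch of the driver-side sequence this change permits; the function name and fixed weight are illustrative:

```c
#include <linux/list.h>
#include <linux/netdevice.h>

/* Resize the set of NAPI instances at runtime, without a module reload. */
static void example_renapi(struct net_device *dev, struct napi_struct *napis,
			   int old_n, int new_n,
			   int (*poll)(struct napi_struct *, int))
{
	int i;

	for (i = 0; i < old_n; i++)
		netif_napi_del(&napis[i]);

	/* required after removing all instances, before re-adding */
	INIT_LIST_HEAD(&dev->napi_list);

	for (i = 0; i < new_n; i++)
		netif_napi_add(dev, &napis[i], poll, 64);
}
```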
-
- 23 Dec 2008, 1 commit
-
Committed by Jarek Poplawski

A command like "brctl addif br1 eth1", issued as a user, gave me an oops when the bridge module wasn't loaded. It's caused by using a dev pointer before checking it for NULL.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 18 Dec 2008, 3 commits
-
Committed by David S. Miller

This reverts commit 70355602. As pointed out by Mark McLoughlin, IP_PKTINFO cmsg data is one post-queueing user, so this optimization is not valid right now.

Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Rémi Denis-Courmont

A separate xmit lock class supports GPRS over a Phonet pipe over a TUN device (type ARPHRD_NONE).

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Rémi Denis-Courmont

Also leave some room for more 802.11 types.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 16 Dec 2008, 5 commits
-
Committed by Herbert Xu

This patch adds the ethtool ops to enable and disable GRO. It also makes GRO depend on RX checksum offload, much the same as how TSO depends on SG support.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Herbert Xu

This patch adds the helper skb_gro_receive to merge packets for GRO. The current method is to allocate a new header skb and then chain the original packets to its frag_list. This is done to make it easier to integrate into the existing GSO framework. In future, as GSO is moved into the drivers, we can undo this and simply chain the original packets together.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Herbert Xu

This patch adds the top-level GRO (Generic Receive Offload) infrastructure. This is pretty similar to LRO except that this is protocol-independent. Instead of holding packets in an lro_mgr structure, they're now held in napi_struct.

For drivers that intend to use this, they can set the NETIF_F_GRO bit and call napi_gro_receive instead of netif_receive_skb, or just call netif_rx. The latter will call napi_receive_skb automatically. When napi_gro_receive is used, the driver must either call napi_complete/napi_rx_complete, or call napi_gro_flush in softirq context if the driver uses the primitives __napi_complete/__napi_rx_complete.

Protocols will set the gro_receive and gro_complete function pointers in order to participate in this scheme. In addition to the packet, gro_receive will get a list of currently held packets. Each packet in the list has a same_flow field which is non-zero if it is a potential match for the new packet. For each packet that may match, it also has a flush field which is non-zero if the held packet must not be merged with the new packet.

Once gro_receive has determined that the new skb matches a held packet, the held packet may be processed immediately if the new skb cannot be merged with it. In this case gro_receive should return the pointer to the existing skb in gro_list. Otherwise the new skb should be merged into the existing packet and NULL should be returned, unless the new skb makes it impossible for any further merges to be made (e.g., a FIN packet), in which case the merged skb should be returned.

Whenever the skb is merged into an existing entry, the gro_receive function should set NAPI_GRO_CB(skb)->same_flow. Note that if an skb merely matches an existing entry but can't be merged with it, then this shouldn't be set. If gro_receive finds it pointless to hold the new skb for future merging, it should set NAPI_GRO_CB(skb)->flush.

Held packets will be flushed by napi_gro_flush, which is called by napi_complete and napi_rx_complete. Currently held packets are stored in a singly linked list, just like LRO. The list is limited to a maximum of 8 entries. In future, this may be expanded to use a hash table to allow more flows to be held for merging.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
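A schematic protocol-side gro_receive following the contract above; the four stubs stand in for real header comparison and merge logic and exist only for illustration:

```c
#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Stubs standing in for protocol-specific logic (illustrative only). */
static bool flows_match(struct sk_buff *p, struct sk_buff *skb);
static bool can_merge(struct sk_buff *p, struct sk_buff *skb);
static void merge(struct sk_buff *p, struct sk_buff *skb);
static bool merge_blocks_future(struct sk_buff *skb);	/* e.g. FIN seen */

static struct sk_buff **example_gro_receive(struct sk_buff **head,
					    struct sk_buff *skb)
{
	struct sk_buff *p;

	for (; (p = *head) != NULL; head = &p->next) {
		if (!NAPI_GRO_CB(p)->same_flow)
			continue;
		if (!flows_match(p, skb)) {
			NAPI_GRO_CB(p)->same_flow = 0;
			continue;
		}
		if (NAPI_GRO_CB(p)->flush || !can_merge(p, skb))
			return head;	/* process the held packet now */
		merge(p, skb);
		NAPI_GRO_CB(skb)->same_flow = 1;
		/* return the merged skb if no further merges are
		 * possible, otherwise NULL */
		return merge_blocks_future(skb) ? head : NULL;
	}
	/* no match: hold skb; set ->flush if holding it is pointless */
	NAPI_GRO_CB(skb)->flush = 0;
	return NULL;
}
```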
-
Committed by Herbert Xu

This patch allows GSO to handle frag_list in a limited way, for the purpose of allowing packets merged by GRO to be refragmented on output. Most hardware won't (and isn't expected to) support handling GRO frag_list packets directly, so we will perform GSO in software for those cases. However, for drivers that can support it (such as virtual NICs), we may not have to segment the packets at all.

Whether the added overhead of GRO/GSO is worthwhile for bridges and routers, when weighed against the benefit of potentially increasing the MTU within the host, is still an open question. However, for the case of host nodes this is undoubtedly a win.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Herbert Xu

This patch adds limited support for handling frag_list packets in skb_segment. The intention is to support GRO (Generic Receive Offload) packets which will be constructed by chaining normal packets using frag_list. As such, we require that all frag_list members terminate on exact MSS boundaries. This is checked using BUG_ON. As there should only be one producer of such packets in the kernel, namely GRO, this requirement should not be difficult to maintain.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 10 Dec 2008, 1 commit
-
Committed by Neil Horman

A few months back a race was discussed between the netpoll napi service path and the fast path through net_rx_action: http://kerneltrap.org/mailarchive/linux-netdev/2007/10/16/345470

A patch was submitted for that bug, but I think we missed a case. Consider the following scenario:

INITIAL STATE
CPU0 has one napi_struct A on its poll_list
CPU1 is calling netpoll_send_skb and needs to call poll_napi on the same napi_struct A that CPU0 has on its list

    CPU0                                CPU1
    net_rx_action                       poll_napi
     !list_empty (returns true)          locks poll_lock for A
                                         poll_one_napi
                                          napi->poll
                                           netif_rx_complete
                                            __napi_complete
                                            (removes A from poll_list)
     list_entry(list->next)

In the above scenario, net_rx_action assumes that the per-cpu poll_list is exclusive to that cpu. netpoll of course violates that, and because the netpoll path can dequeue from the poll list, it's possible for CPU0 to detect a non-empty list at the top of the while loop in net_rx_action, but have it become empty by the time it calls list_entry. Since the poll_list isn't surrounded by any other structure, the returned data from that list_entry call in this situation is garbage, and any number of crashes can result based on what exactly that garbage is.

Given that it's not feasible for performance reasons to place exclusive locks around each cpu's poll list to provide that mutual exclusion, I think the best solution is to modify the netpoll path in such a way that we continue to guarantee that the poll_list for a cpu is in fact exclusive to that cpu. To do this I've implemented the patch below. It adds an additional bit to the state field in the napi_struct. When executing napi->poll from the netpoll path, this bit will be set. When a driver calls netif_rx_complete, if that bit is set, it will not remove the napi_struct from the poll_list. That work will be saved for the next iteration of net_rx_action.

I've tested this and it seems to work well. About the biggest drawback I can see is that it might result in an extra loop through net_rx_action in the event that the device is actually contended for (i.e. the netpoll path actually performs all the needed work on the device, and the call to net_rx_action winds up doing nothing, except removing the napi_struct from the poll_list). However, I think this is probably a small price to pay, given that the alternative is a crash.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
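A sketch of the mechanism: a napi state bit set around the netpoll-driven poll and checked at completion time. NAPI_STATE_NPSVC is the name later kernels use for exactly this purpose; treat the code as an illustration rather than the literal patch:

```c
#include <linux/list.h>
#include <linux/netdevice.h>

/* netpoll side: flag the napi while we drive its ->poll ourselves */
static void example_poll_one_napi(struct napi_struct *napi, int budget)
{
	set_bit(NAPI_STATE_NPSVC, &napi->state);
	napi->poll(napi, budget);
	clear_bit(NAPI_STATE_NPSVC, &napi->state);
}

/* completion side: only the owning cpu may dequeue from its poll_list */
static void example_rx_complete(struct napi_struct *napi)
{
	if (test_bit(NAPI_STATE_NPSVC, &napi->state))
		return;	/* dequeue deferred to the next net_rx_action pass */
	list_del(&napi->poll_list);
	clear_bit(NAPI_STATE_SCHED, &napi->state);
}
```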
-
- 08 Dec 2008, 1 commit
-
Committed by Wang Chen

This is the last shot of this series. After removing all direct references to netdev->priv, I am killing "priv" in "struct net_device" and fixing the related comments/docs.

Nobody is allowed to reference netdev->priv directly anymore. If you want to reference the private data memory, use netdev_priv() instead. If the private data is not allocated in alloc_netdev(), use netdev->ml_priv to point to that memory after you create it.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
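A sketch of the sanctioned pattern after this change; the driver names are illustrative:

```c
#include <linux/errno.h>
#include <linux/etherdevice.h>
#include <linux/netdevice.h>

struct example_priv {
	int link_up;
};

static int example_probe(void)
{
	struct net_device *dev;
	struct example_priv *priv;

	/* private area is allocated together with the net_device */
	dev = alloc_etherdev(sizeof(struct example_priv));
	if (!dev)
		return -ENOMEM;

	priv = netdev_priv(dev);	/* never dev->priv */
	priv->link_up = 0;

	return register_netdev(dev);
}
```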
-
- 27 Nov 2008, 1 commit
-
Committed by Jarek Poplawski

Since all other gen_estimator functions use the bstats and rate_est params together, and searching for them is optimized now, let's use this in gen_estimator_active() as well. The return type of gen_estimator_active() is changed to bool, and the gen_find_node() parameters to const, btw. In tcf_act_police_locate() a check for ACT_P_CREATED is added before calling gen_estimator_active().

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 26 Nov 2008, 3 commits
-
Committed by Eric Dumazet

When queuing an skb to sk->sk_receive_queue, we can release its dst, which is no longer needed. Since the current cpu did the dst_hold(), the refcount is probably still hot in this cpu's caches. This saves readers from having to access the original dst to decrement its refcount, possibly a long time after packet reception. This should speed up the UDP and RAW receive paths.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet

Instead of using one atomic_t per protocol, use a percpu_counter for "sockets_allocated", to reduce cache line contention on heavy-duty network servers.

Note: we revert commit 248969ae ("net: af_unix can make unix_nr_socks visbile in /proc"), since it is no longer used after the sock_prot_inuse_add() addition.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
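A sketch of the percpu_counter pattern; note that percpu_counter_init's signature has varied across kernel versions, so this follows the era's two-argument form:

```c
#include <linux/percpu_counter.h>

static struct percpu_counter sockets_allocated;

static int example_proto_init(void)
{
	return percpu_counter_init(&sockets_allocated, 0);
}

static void example_sock_alloc(void)
{
	/* per-cpu add: no shared cache line bounced on every socket */
	percpu_counter_inc(&sockets_allocated);
}

static s64 example_sockets_allocated(void)
{
	/* cheap, slightly approximate readout for reporting */
	return percpu_counter_read_positive(&sockets_allocated);
}
```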
-
Committed by Stephen Hemminger

While trying average rate policing, I found that it was possible to request average rate policing without a rate estimator. This results in no policing, which is harmless but incorrect. Since policing can be set up in two steps, the check needs to happen in the kernel.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-