- 23 4月, 2013 12 次提交
-
-
由 Wei Liu 提交于
Also fix a typo in comment. Signed-off-by: NWei Liu <wei.liu2@citrix.com> Acked-by: NIan Campbell <ian.campbell@citrix.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Craig Hada 提交于
This patch sets the coherent DMA mask to 64-bit after the be2net driver has been acknowledged that the system is 64-bit DMA capable. The coherent DMA mask is examined by the Intel IOMMU driver to determine whether to allow pass through context mapping for all devices. With this patch, the be2net driver combined with be2net compatible hardware provides comparable performance to the case where vt-d is disabled. The main use-case for this change is to decrease the time necessary to copy virtual machine memory during KVM live migration instantiations. This patch was tested on a system that enables the IOMMU in non-coherent mode. Two DMA remapper issues were encountered in the previous version and both patches have been committed. commit ea2447f7 commit 2e12bc29Signed-off-by: NCraig Hada <craig.hada@hp.com> Signed-off-by: NSathya Perla <sathya.perla@emulex.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vasundhara Volam 提交于
Use GET_PROFILE_CONFIG_V1 cmd for BE3-R, to query the maximum number of TX rings available per function. On SH-R the same is queried via the GET_FUNCTION_CONFIG cmd. Signed-off-by: NVasundhara Volam <vasundhara.volam@emulex.com> Signed-off-by: NSathya Perla <sathya.perla@emulex.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vasundhara Volam 提交于
Avoid flashing BE3 UFI on BE3-R chip by verifying asic_revision number of the chip. Signed-off-by: NVasundhara Volam <vasundhara.volam@emulex.com> Signed-off-by: NSathya Perla <sathya.perla@emulex.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vasundhara Volam 提交于
Don't log "Out of MCCQ wrbs" msg. When the driver doesn't receive any response from the FW it already logs a "FW not responding" message. The driver runs out of MCCQ wrbs much later. Also, this message can swamp the kernel log in HW/FW error scenarios. Signed-off-by: NVasundhara Volam <vasundhara.volam@emulex.com> Signed-off-by: NSathya Perla <sathya.perla@emulex.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vasundhara Volam 提交于
Skyhawk-R and BE3-R (SuperNIC profile) require V2 version of TXQ_CREATE cmd to be used. Signed-off-by: NVasundhara Volam <vasundhara.volam@emulex.com> Signed-off-by: NSathya Perla <sathya.perla@emulex.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Dmitry Kravkov 提交于
Signed-off-by: NDmitry Kravkov <dmitry@broadcom.com> Signed-off-by: NEilon Greenstein <eilong@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Dmitry Kravkov 提交于
Signed-off-by: NDmitry Kravkov <dmitry@broadcom.com> Signed-off-by: NEilon Greenstein <eilong@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Dmitry Kravkov 提交于
a. Common tree of `dir` structures. b. Multi-port devices structures. CC: Francious Romieu <romieu@fz.zoreil.com> Signed-off-by: NDmitry Kravkov <dmitry@broadcom.com> Signed-off-by: NEilon Greenstein <eilong@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Dmitry Kravkov 提交于
CC: Francious Romieu <romieu@fz.zoreil.com> Signed-off-by: NDmitry Kravkov <dmitry@broadcom.com> Signed-off-by: NEilon Greenstein <eilong@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Dmitry Kravkov 提交于
CC: Francious Romieu <romieu@fz.zoreil.com> Signed-off-by: NDmitry Kravkov <dmitry@broadcom.com> Signed-off-by: NEilon Greenstein <eilong@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Dmitry Kravkov 提交于
introduce a procedure to read in u32 granularity. CC: Francious Romieu <romieu@fz.zoreil.com> Signed-off-by: NDmitry Kravkov <dmitry@broadcom.com> Signed-off-by: NEilon Greenstein <eilong@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 4月, 2013 4 次提交
-
-
由 Patrick McHardy 提交于
drivers/s390/net/qeth_l3_main.c: In function 'qeth_l3_add_vlan_mc': >> drivers/s390/net/qeth_l3_main.c:1662:3: error: too few arguments to function '__vlan_find_dev_deep' include/linux/if_vlan.h:88:27: note: declared here drivers/s390/net/qeth_l3_main.c: In function 'qeth_l3_add_vlan_mc6': >> drivers/s390/net/qeth_l3_main.c:1723:3: error: too few arguments to function '__vlan_find_dev_deep' include/linux/if_vlan.h:88:27: note: declared here drivers/s390/net/qeth_l3_main.c: In function 'qeth_l3_free_vlan_addresses4': >> drivers/s390/net/qeth_l3_main.c:1767:2: error: too few arguments to function '__vlan_find_dev_deep' include/linux/if_vlan.h:88:27: note: declared here drivers/s390/net/qeth_l3_main.c: In function 'qeth_l3_free_vlan_addresses6': >> drivers/s390/net/qeth_l3_main.c:1797:2: error: too few arguments to function '__vlan_find_dev_deep' include/linux/if_vlan.h:88:27: note: declared here drivers/s390/net/qeth_l3_main.c: In function 'qeth_l3_process_inbound_buffer': >> drivers/s390/net/qeth_l3_main.c:1980:6: error: too few arguments to function '__vlan_hwaccel_put_tag' include/linux/if_vlan.h:234:31: note: declared here drivers/s390/net/qeth_l3_main.c: In function 'qeth_l3_verify_vlan_dev': >> drivers/s390/net/qeth_l3_main.c:2089:3: error: too few arguments to function '__vlan_find_dev_deep' include/linux/if_vlan.h:88:27: note: declared here Reported-by: Nkbuild test robot <fengguang.wu@intel.com> Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Patrick McHardy 提交于
Add missing return statement for CONFIG_BUG=n. Reported-by: Nkbuild test robot <fengguang.wu@intel.com> Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Patrick McHardy 提交于
Fix up some function signatures for CONFIG_VLAN=n that were missed during the 802.1ad support patches. Found by the kbuild robot. Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Patrick McHardy 提交于
The following leak is reported by kmemleak: [ 86.812073] kmemleak: Found object by alias at 0xffff88006ecc76f0 [ 86.816019] Pid: 739, comm: kworker/u:1 Not tainted 3.9.0-rc5+ #842 [ 86.816019] Call Trace: [ 86.816019] <IRQ> [<ffffffff81151c58>] find_and_get_object+0x8c/0xdf [ 86.816019] [<ffffffff8190e90d>] ? vlan_info_rcu_free+0x33/0x49 [ 86.816019] [<ffffffff81151cbe>] delete_object_full+0x13/0x2f [ 86.816019] [<ffffffff8194bbb6>] kmemleak_free+0x26/0x45 [ 86.816019] [<ffffffff8113e8c7>] slab_free_hook+0x1e/0x7b [ 86.816019] [<ffffffff81141c05>] kfree+0xce/0x14b [ 86.816019] [<ffffffff8190e90d>] vlan_info_rcu_free+0x33/0x49 [ 86.816019] [<ffffffff810d0b0b>] rcu_do_batch+0x261/0x4e7 The reason is that in vlan_info_rcu_free() we don't take the VLAN protocol into account when iterating over the vlan_devices_array. Reported-by: NCong Wang <amwang@redhat.com> Signed-off-by: NPatrick McHardy <kaber@trash.net> Tested-by: NCong Wang <amwang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 4月, 2013 24 次提交
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next由 David S. Miller 提交于
Pablo Neira Ayuso says: ==================== The following patchset contains a small batch of Netfilter updates for your net-next tree, they are: * Three patches that provide more accurate error reporting to user-space, instead of -EPERM, in IPv4/IPv6 netfilter re-routing code and NAT, from Patrick McHardy. * Update copyright statements in Netfilter filters of Patrick McHardy, from himself. * Add Kconfig dependency on the raw/mangle tables to the rpfilter, from Florian Westphal. ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Andy Gospodarek 提交于
This patch adds support for the get_settings ethtool op to the bonding driver. This was motivated by users who wanted to get the speed of the bond and compare that against throughput to understand utilization. The behavior before this patch was added was problematic when computing line utilization after trying to get link-speed and throughput via SNMP. Output from ethtool looks like this for a round-robin bond: Settings for bond0: Supported ports: [ ] Supported link modes: Not reported Supported pause frame use: No Supports auto-negotiation: No Advertised link modes: Not reported Advertised pause frame use: No Advertised auto-negotiation: No Speed: 11000Mb/s Duplex: Full Port: Other PHYAD: 0 Transceiver: internal Auto-negotiation: off MDI-X: Unknown Link detected: yes I tested this and verified it works as expected. A test was also done on a version backported to an older kernel and it worked well there. v2: Switch to using ethtool_cmd_speed_set to set speed, added check to SLAVE_IS_OK for each slave in bond, dropped mode-specific calculations as they were not needed, and set port type to 'Other.' v3: Fix useless assignment and checkpatch warning. Signed-off-by: NAndy Gospodarek <andy@greyhouse.net> Reviewed-by: NBen Hutchings <bhutchings@solarflare.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
This patch introduces a small, internal helper function, that is used by PF_PACKET. Based on the flags that are passed, it extracts the packet timestamp in the receive path. This is merely a refactoring to remove some duplicate code in tpacket_rcv(), to make it more readable, and to enable others to use this function in PF_PACKET as well, e.g. for TX. Signed-off-by: NDaniel Borkmann <dborkman@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
Currently, ktime2ts is a small helper function that is only used in net/socket.c. Move this helper into the ktime API as a small inline function, so that i) it's maintained together with ktime routines, and ii) also other files can make use of it. The function is named ktime_to_timespec_cond() and placed into the generic part of ktime, since we internally make use of ktime_to_timespec(). ktime_to_timespec() itself does not check the ktime variable for zero, hence, we name this function ktime_to_timespec_cond() for only a conditional conversion, and adapt its users to it. Signed-off-by: NDaniel Borkmann <dborkman@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Noticed by Ben Hutchings. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Rajesh Borundia says: ==================== * "qlcnic: Change 82xx adapter VLAN id endian type". - Adapter requires VLAN id in little endian. VLAN id was being converted to __le16 and then passed as a parameter. Pass VLAN id as u16 and then use cpu_to_le16 at appropriate places. It is appropriate for net-next as SR-IOV patches have a dependency on it. * "qlcnic: Fix loopback test for SR-IOV PF". - It is appropriate for net-next as change is needed for SRIOV PF only. * Remaining patches add enhancements to SR-IOV functionality like - FLR handling - Adapter reset recovery handling - iproute2 tool support for configuring MAC address, Tx rate and VLAN id. - Mailbox polling support for SR-IOV PF in case mailbox interrupts are disabled. ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Rajesh Borundia 提交于
Signed-off-by: NRajesh Borundia <rajesh.borundia@qlogic.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Rajesh Borundia 提交于
o When mailbox interrupt is disabled PF should be able to process request from VF. Enable polling for such cases. Signed-off-by: NRajesh Borundia <rajesh.borundia@qlogic.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Rajesh Borundia 提交于
o Do not disable mailbox interrupts while running loopback test through SR-IOV PF. Signed-off-by: NManish Chopra <manish.chopra@qlogic.com> Signed-off-by: NRajesh Borundia <rajesh.borundia@qlogic.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Rajesh Borundia 提交于
o Add support for VLAN id configuration per VF using iproute2 tool. o VLAN id's 1-4094 are treated as PVID by the PF and Guest VLAN tagging is not allowed by default. o PVID is disabled when the VLAN id is set to 0 o Guest VLAN tagging is allowed when the VLAN id is set to 4095. o Only one Guest VLAN id is supported. o VLAN id can be changed only when the VF driver is not loaded. Signed-off-by: NManish Chopra <manish.chopra@qlogic.com> Signed-off-by: NSucheta Chakraborty <sucheta.chakraborty@qlogic.com> Signed-off-by: NRajesh Borundia <rajesh.borundia@qlogic.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Rajesh Borundia 提交于
o Add support for MAC address and Tx rate configuration per VF via iproute2 tool. o Tx rate change is allowed while the guest is running and the VF driver is loaded. o MAC address change is allowed only when VF driver is not loaded. Signed-off-by: NManish Chopra <manish.chopra@qlogic.com> Signed-off-by: NSucheta Chakraborty <sucheta.chakraborty@qlogic.com> Signed-off-by: NRajesh Borundia <rajesh.borundia@qlogic.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Rajesh Borundia 提交于
o Implement recovery mechanism for VF to recover from adapter resets. Signed-off-by: NManish Chopra <manish.chopra@qlogic.com> Signed-off-by: NSucheta Chakraborty <sucheta.chakraborty@qlogic.com> Signed-off-by: NRajesh Borundia <rajesh.borundia@qlogic.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Rajesh Borundia 提交于
o FLR from Hypervisor - When hypervisor issues a VF FLR request, adapter notifies the parent PF driver of the FLR request for PF driver to perform any cleanup on behalf of that VF. o FLR from VF Driver - VF driver may initiate a VF FLR request, if VF state needs to be cleaned up before a re-initialization. VF re-initialization during kdump is an example. o PF driver cleans up all resources allocated on behalf of a VF, on VF FLR notifications from the adapter or from the VF driver. Signed-off-by: NManish Chopra <manish.chopra@qlogic.com> Signed-off-by: NSucheta Chakraborty <sucheta.chakraborty@qlogic.com> Signed-off-by: NRajesh Borundia <rajesh.borundia@qlogic.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Rajesh Borundia 提交于
o 82xx adapter requires VLAN id in little endian format. Instead of passing vlan id parameter as __le16, pass the parameter as u16 and use cpu_to_le16 at appropriate places. Signed-off-by: NRajesh Borundia <rajesh.borundia@qlogic.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Patrick McHardy says: ==================== The following patches contain an implementation of memory mapped I/O for netlink. The implementation is modelled after AF_PACKET memory mapped I/O with a few differences: - In order to perform memory mapped I/O to userspace, the kernel allocates skbs with the data area pointing to the data area of the mapped frames. All netlink subsystems assume a linear data area, so for the sake of simplicity, the mapped data area is not attached to the paged area but to skb->data. This requires introduction of a special skb alloction function that just allocates an skb head without the data area. Since this is a quite rare use case, I introduced a new function based on __alloc_skb instead of splitting it up into head and data alloction. The alternative would be to introduce an __alloc_skb_head and __alloc_skb_data function, which would actually be useful for a specific error case in memory mapped netlink, but would require a couple of extra instructions for the common skb allocation case, so it doesn't really seem worth it. In order to get the destination memory area for skb->data before message construction, memory mapped netlink I/O needs to look up the destination socket during allocation instead of during transmission because the ring is owned by the receiveing socket/process. A special skb allocation function (netlink_alloc_skb) taking the destination pid as an argument is used for this, all subsystems that want to support memory mapped I/O need to use this function, automatic fallback to the receive queue happens for unconverted subsystems. Dumps automatically use memory mapped I/O if the receiving socket has enabled it. The visible effect of looking up the destination socket during allocation instead of transmission is that message ordering in userspace might change in case allocation and transmission aren't performed atomically. This usually doesn't matter since most subsystems have a BKL-like lock like the rtnl mutex, to my knowledge the currently only existing case where it might matter is nfnetlink_queue combined with the recently introduced batched verdicts, but a) that subsystem already includes sequence numbers which allow userspace to reorder messages in case it cares to, also the reodering window is quite small and b) with memory mapped transmission batching can be performed in a subsystem indepandant manner. - AF_NETLINK contains flow control for database dumps, with regular I/O dump continuation are triggered based on the sockets receive queue space and by recvmsg() calls. Since with memory mapped I/O there are no recvmsg() calls under normal operation, this is done in netlink_poll(), under the assumption that userspace has processed all pending frames before invoking poll(), thus the ring is expected to have room for new messages. Dumps currently don't benefit as much as they could from memory mapped I/O because each single continuation requires a poll() call. A more agressive approach seems like a good idea to me, especially in case the socket is not subscribed to any multicast groups (IOW only receiving explicitly requested data). Besides that, the memory mapped netlink implementation extends the states defined by AF_PACKET between userspace and the kernel by a SKIP status, this is intended for the case that userspace wants to queue frames (specifically when using nfnetlink_queue, an IDS and stream reassembly, requested by Eric Leblond) for a longer period of time. The kernel skips over all frames marked with SKIP when looking or unused frames and only fails when not finding a free frame or when having skipped the entire ring. Also noteworthy is memory mapped sendmsg: the kernel performs validation of messages before accepting and processing them, in order to prevent userspace from changing the messages contents after validation, the kernel checks that the ring is only mapped once and the file descriptor is not shared (in order to avoid having userspace set up another mapping after the first mentioned check). If either of both is not true, the message copied to an allocated skb and processed as with regular I/O. I'd especially appreciate review of this part since I'm not really versed in memory, file and process management, The remaining interesting details are included in the changelogs of the individual patches and the documentation, so I won't repeat them here. As an example, nfnetlink_queue is convererted to support memory mapped I/O. Other subsystems that would probably benefit are nfnetlink_log, audit and maybe ISCSI, not sure. Following are some numbers collected by Florian Westphal based on a slightly older version, which included an experimental patch for the nfnetlink_queue ordering issue. === Test hardware is a 12-core machine Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz ixgbe interfaces are used (i.e., multiqueue nics). irqs are distributed across the cpus. I've made several tests. The simple one consists of 3GBit UDP traffic, packets are 1500 bytes in size (i.e., no fragmentation), with a single nfqueue and the test client programs in libmnl examples directory. Packets are sent from one /24 net to another /24 net, i.e. there are a few hundred flows active at any given time. I've also tested with snort, but I disabled all rules. 6Gbit UDP traffic is generated in the snort case, and 6 nfqueues are used (i.e., 6 snorts run in parallel). I've tested with 3 different kernels, all based on 3.7.1. - 3.7.1, without the mmap patches - 3.7.1, with Patricks mmap patches - 3.7.1, with mmap patches and extended spinlock to ensure packet ids are monotonically increasing and cannot be re-ordered. This is what we currently ship in our product. [ the spinlock that is extended is the per nfqueue spinlock, it will be held from the time the netlink skb is allocated until the netlink skb is sent to userspace: http://1984.lsi.us.es/git/nf-next/commit/?h=mmap-netlink3&id=b8eb19c46650fef4e9e4fe53f367f99bbf72afc9 ] snort is normally used in "batch mode", i.e., after processing 25 packets a single "batch verdict" is sent to accept the packets seen so far. "mmap snort" means RX_RING + sendmsg(), i.e. TX_RING is not used at this time (except where noted below). One reason is that snort has a reload thread, so kernel needs to copy; also in the snort case no payload rewrite takes place, so compared to the rx path the tx path is cheap. Results: 3.7.1, without mmap patches, i.e. recv()+sendmsg() for everyone nfq-queue: 1.7 gbit out snort-recv-batch-25 5.1 gbit out snort-recv-no-batch 3.1 gbit out 3.7.1 + mmap + without extended spinlocked section nfq-queue: 1.7 gbit out (recv/sendmsg) nfq-queue-mmap: 2.4 gbit out snort-mmap-batch-25 5.6 gbit out (warning: since ids can be re-ordered, this version is "broken"). snort-recv-batch-25 5.1 gbit out snort-mmap-no-batch 4.6 gbit out (i.e., one verdict per packet) Kernel 3.7.1 + mmap + extended spinlock section: nfq-queue: 1.4 gbit out nfq-queue-mmap: 2.3 gbit out snort: 5.6 gbit out Conclusions: - The "extended spinlocked section" hurts performance in the single queue case; with 6 snorts there is no measureable slowdown. - I tried to re-write the mmap-snort to work without batch verdicts, but results were not very encouraging: kernel 3.7.1 + mmap (without extended spinlocked section): snort-mmap-batch-25 5.6 gbit out (what we currenlty ship) snort-recv-batch-25 5.1 gbit out (without using mmap) snort-mmap-batch-1 4.6 gbit out (with mmap but without batch verdicts) snort-mmap-txring-25 5.2 gbit out (with mmap but without batch verdicts) snort-mmap-txring-1 4.6 gbit out (with mmap but without batch verdicts) The difference between the last two is that in the txring-25 case, we put a verdict into the tx ring after every packet, but will only invoke sendmsg(, NULL, 0) after processing 25 packets. So the only difference is the number of sendmsg calls/context switches. So, i.o.w, kernel 3.7.1 + mmap + the extra locking crap is faster than 3.7.1 + mmap-without-extra-locking and single-verdict-per packet. ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Patrick McHardy 提交于
Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Patrick McHardy 提交于
Get rid of the confusing mix of pid and portid and use portid consistently for all netlink related socket identities. Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Patrick McHardy 提交于
Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Patrick McHardy 提交于
Based on AF_PACKET. Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Patrick McHardy 提交于
Add flow control for memory mapped RX. Since user-space usually doesn't invoke recvmsg() when using memory mapped I/O, flow control is performed in netlink_poll(). Dumps are allowed to continue if at least half of the ring frames are unused. Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Patrick McHardy 提交于
Add support for mmap'ed recvmsg(). To allow the kernel to construct messages into the mapped area, a dataless skb is allocated and the data pointer is set to point into the ring frame. This means frames will be delivered to userspace in order of allocation instead of order of transmission. This usually doesn't matter since the order is either not determinable by userspace or message creation/transmission is serialized. The only case where this can have a visible difference is nfnetlink_queue. Userspace can't assume mmap'ed messages have ordered IDs anymore and needs to check this if using batched verdicts. For non-mapped sockets, nothing changes. Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Patrick McHardy 提交于
Add support for mmap'ed sendmsg() to netlink. Since the kernel validates received messages before processing them, the code makes sure userspace can't modify the message contents after invoking sendmsg(). To do that only a single mapping of the TX ring is allowed to exist and the socket must not be shared. If either of these two conditions does not hold, it falls back to copying. Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Patrick McHardy 提交于
Add helper functions for looking up mmap'ed frame headers, reading and writing their status, allocating skbs with mmap'ed data areas and a poll function. Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-