- 07 1月, 2014 40 次提交
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch由 David S. Miller 提交于
Jesse Gross says: ==================== [GIT net-next] Open vSwitch Open vSwitch changes for net-next/3.14. Highlights are: * Performance improvements in the mechanism to get packets to userspace using memory mapped netlink and skb zero copy where appropriate. * Per-cpu flow stats in situations where flows are likely to be shared across CPUs. Standard flow stats are used in other situations to save memory and allocation time. * A handful of code cleanups and rationalization. ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Stephen Hemminger 提交于
Several functions and datastructures could be local Found with 'make namespacecheck' Signed-off-by: NStephen Hemminger <stephen@networkplumber.org> Signed-off-by: NJesse Gross <jesse@nicira.com>
-
由 Thomas Graf 提交于
The copy & csum optimization is no longer present with zerocopy enabled. Compute the checksum in skb_gso_segment() directly by dropping the HW CSUM capability from the features passed in. Signed-off-by: NThomas Graf <tgraf@suug.ch> Signed-off-by: NJesse Gross <jesse@nicira.com>
-
由 Thomas Graf 提交于
Use of skb_zerocopy() can avoid the expensive call to memcpy() when copying the packet data into the Netlink skb. Completes checksum through skb_checksum_help() if not already done in GSO segmentation. Zerocopy is only performed if user space supported unaligned Netlink messages. memory mapped netlink i/o is preferred over zerocopy if it is set up. Cost of upcall is significantly reduced from: + 7.48% vhost-8471 [k] memcpy + 5.57% ovs-vswitchd [k] memcpy + 2.81% vhost-8471 [k] csum_partial_copy_generic to: + 5.72% ovs-vswitchd [k] memcpy + 3.32% vhost-5153 [k] memcpy + 0.68% vhost-5153 [k] skb_zerocopy (megaflows disabled) Signed-off-by: NThomas Graf <tgraf@suug.ch> Signed-off-by: NJesse Gross <jesse@nicira.com>
-
由 Thomas Graf 提交于
Allows removing the net and dp_ifindex argument and simplify the code. Signed-off-by: NThomas Graf <tgraf@suug.ch> Signed-off-by: NJesse Gross <jesse@nicira.com>
-
由 Thomas Graf 提交于
Drop user features if an outdated user space instance that does not understand the concept of user_features attempted to create a new datapath. Signed-off-by: NThomas Graf <tgraf@suug.ch> Signed-off-by: NJesse Gross <jesse@nicira.com>
-
由 Thomas Graf 提交于
Signed-off-by: NThomas Graf <tgraf@suug.ch> Reviewed-by: NDaniel Borkmann <dborkman@redhat.com> Signed-off-by: NJesse Gross <jesse@nicira.com>
-
由 Thomas Graf 提交于
Make the skb zerocopy logic written for nfnetlink queue available for use by other modules. Signed-off-by: NThomas Graf <tgraf@suug.ch> Reviewed-by: NDaniel Borkmann <dborkman@redhat.com> Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NJesse Gross <jesse@nicira.com>
-
由 Wei Yongjun 提交于
Remove duplicated include. Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: NJesse Gross <jesse@nicira.com>
-
由 Daniel Borkmann 提交于
As we're only doing a kfree() anyway in the RCU callback, we can simply use kfree_rcu, which does the same job, and remove the function rcu_free_sw_flow_mask_cb() and rcu_free_acts_callback(). Signed-off-by: NDaniel Borkmann <dborkman@redhat.com> Signed-off-by: NJesse Gross <jesse@nicira.com>
-
由 Pravin B Shelar 提交于
With mega flow implementation ovs flow can be shared between multiple CPUs which makes stats updates highly contended operation. This patch uses per-CPU stats in cases where a flow is likely to be shared (if there is a wildcard in the 5-tuple and therefore likely to be spread by RSS). In other situations, it uses the current strategy, saving memory and allocation time. Signed-off-by: NPravin B Shelar <pshelar@nicira.com> Signed-off-by: NJesse Gross <jesse@nicira.com>
-
由 Thomas Graf 提交于
Use memory mapped Netlink i/o for all unicast openvswitch communication if a ring has been set up. Benchmark * pktgen -> ovs internal port * 5M pkts, 5M flows * 4 threads, 8 cores Before: Result: OK: 67418743(c67108212+d310530) usec, 5000000 (9000byte,0frags) 74163pps 5339Mb/sec (5339736000bps) errors: 0 + 2.98% ovs-vswitchd [k] copy_user_generic_string + 2.49% ovs-vswitchd [k] memcpy + 1.84% kpktgend_2 [k] memcpy + 1.81% kpktgend_1 [k] memcpy + 1.81% kpktgend_3 [k] memcpy + 1.78% kpktgend_0 [k] memcpy After: Result: OK: 24229690(c24127165+d102524) usec, 5000000 (9000byte,0frags) 206358pps 14857Mb/sec (14857776000bps) errors: 0 + 2.80% ovs-vswitchd [k] memcpy + 1.31% kpktgend_2 [k] memcpy + 1.23% kpktgend_0 [k] memcpy + 1.09% kpktgend_1 [k] memcpy + 1.04% kpktgend_3 [k] memcpy + 0.96% ovs-vswitchd [k] copy_user_generic_string Signed-off-by: NThomas Graf <tgraf@suug.ch> Reviewed-by: NDaniel Borkmann <dborkman@redhat.com> Signed-off-by: NJesse Gross <jesse@nicira.com>
-
由 Thomas Graf 提交于
An insufficent ring frame size configuration can lead to an unnecessary skb allocation for every Netlink message. Check frame size before taking the queue lock and allocating the skb and re-check with lock to be safe. Signed-off-by: NThomas Graf <tgraf@suug.ch> Reviewed-by: NDaniel Borkmann <dborkman@redhat.com> Signed-off-by: NJesse Gross <jesse@nicira.com>
-
由 Thomas Graf 提交于
Allocates a new sk_buff large enough to cover the specified payload plus required Netlink headers. Will check receiving socket for memory mapped i/o capability and use it if enabled. Will fall back to non-mapped skb if message size exceeds the frame size of the ring. Signed-of-by: NThomas Graf <tgraf@suug.ch> Reviewed-by: NDaniel Borkmann <dborkman@redhat.com> Signed-off-by: NJesse Gross <jesse@nicira.com>
-
由 Jesse Gross 提交于
Flow lookup can happen either in packet processing context or userspace context but it was annotated as requiring RCU read lock to be held. This also allows OVS mutex to be held without causing warnings. Reported-by: NJustin Pettit <jpettit@nicira.com> Signed-off-by: NJesse Gross <jesse@nicira.com> Reviewed-by: NThomas Graf <tgraf@redhat.com>
-
由 Andy Zhou 提交于
API changes only for code readability. No functional chnages. This patch removes the underscored version. Added a new API ovs_flow_tbl_lookup_stats() that returns the n_mask_hits. Reported by: Ben Pfaff <blp@nicira.com> Reviewed-by: NThomas Graf <tgraf@redhat.com> Signed-off-by: NAndy Zhou <azhou@nicira.com> Signed-off-by: NJesse Gross <jesse@nicira.com>
-
由 Ben Pfaff 提交于
We won't normally have a ton of flow masks but using a size_t to store values no bigger than sizeof(struct sw_flow_key) seems excessive. This reduces sw_flow_key_range and sw_flow_mask by 4 bytes on 32-bit systems. On 64-bit systems it shrinks sw_flow_key_range by 12 bytes but sw_flow_mask only by 8 bytes due to padding. Compile tested only. Signed-off-by: NBen Pfaff <blp@nicira.com> Acked-by: NAndy Zhou <azhou@nicira.com> Signed-off-by: NJesse Gross <jesse@nicira.com>
-
由 Ben Pfaff 提交于
Signed-off-by: NBen Pfaff <blp@nicira.com> Signed-off-by: NJesse Gross <jesse@nicira.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net由 David S. Miller 提交于
Conflicts: drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c net/ipv6/ip6_tunnel.c net/ipv6/ip6_vti.c ipv6 tunnel statistic bug fixes conflicting with consolidation into generic sw per-cpu net stats. qlogic conflict between queue counting bug fix and the addition of multiple MAC address support. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jamal Hadi Salim 提交于
action flushing missaccounting Account only for deleted actions Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jamal Hadi Salim 提交于
Remove unnecessary checks for act->ops (suggested by Eric Dumazet). Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
Use DEVICE_ATTR_RO/RW macros to simplify bridge sysfs attribute definitions. Signed-off-by: NScott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Curt Brune 提交于
br_multicast_set_hash_max() is called from process context in net/bridge/br_sysfs_br.c by the sysfs store_hash_max() function. br_multicast_set_hash_max() calls spin_lock(&br->multicast_lock), which can deadlock the CPU if a softirq that also tries to take the same lock interrupts br_multicast_set_hash_max() while the lock is held . This can happen quite easily when any of the bridge multicast timers expire, which try to take the same lock. The fix here is to use spin_lock_bh(), preventing other softirqs from executing on this CPU. Steps to reproduce: 1. Create a bridge with several interfaces (I used 4). 2. Set the "multicast query interval" to a low number, like 2. 3. Enable the bridge as a multicast querier. 4. Repeatedly set the bridge hash_max parameter via sysfs. # brctl addbr br0 # brctl addif br0 eth1 eth2 eth3 eth4 # brctl setmcqi br0 2 # brctl setmcquerier br0 1 # while true ; do echo 4096 > /sys/class/net/br0/bridge/hash_max; done Signed-off-by: NCurt Brune <curt@cumulusnetworks.com> Signed-off-by: NScott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Sathya Perla posted a patch trying to address following problem : <quote> The vxlan driver sets itself as the socket owner for all the TX flows it encapsulates (using vxlan_set_owner()) and assigns it's own skb destructor. This causes all tunneled traffic to land up on only one TXQ as all encapsulated skbs refer to the vxlan socket and not the original socket. Also, the vxlan skb destructor breaks some functionality for tunneled traffic like wmem accounting and as TCP small queues and FQ/pacing packet scheduler. </quote> I reworked Sathya patch and added some explanations. vxlan_xmit() can avoid one skb_clone()/dev_kfree_skb() pair and gain better drop monitor accuracy, by calling kfree_skb() when appropriate. The UDP socket used by vxlan to perform encapsulation of xmit packets do not need to be alive while packets leave vxlan code. Its better to keep original socket ownership to get proper feedback from qdisc and NIC layers. We use skb->sk to A) control amount of bytes/packets queued on behalf of a socket, but prior vxlan code did the skb->sk transfert without any limit/control on vxlan socket sk_sndbuf. B) security purposes (as selinux) or netfilter uses, and I do not think anything is prepared to handle vxlan stacked case in this area. By not changing ownership, vxlan tunnels behave like other tunnels. As Stephen mentioned, we might do the same change in L2TP. Reported-by: NSathya Perla <sathya.perla@emulex.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
TCP out_of_order_queue lock is not used, as queue manipulation happens with socket lock held and we therefore use the lockless skb queue routines (as __skb_queue_head()) We can use __skb_queue_head_init() instead of skb_queue_head_init() to make this more consistent. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Hannes Frederic Sowa 提交于
It does not make sense to create an anycast address for an /128-prefix. Suppress it. As 32019e65 ("ipv6: Do not leave router anycast address for /127 prefixes.") shows we also may not leave them, because we could accidentally remove an anycast address the user has allocated or got added via another prefix. Cc: François-Xavier Le Bail <fx.lebail@yahoo.com> Cc: Thomas Haller <thaller@redhat.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Dan Williams 提交于
The existing serial state notification handling expected older Option devices, having a hardcoded assumption that the Modem port was always USB interface #2. That isn't true for devices from the past few years. hso_serial_state_notification is a local cache of a USB Communications Interface Class SERIAL_STATE notification from the device, and the USB CDC specification (section 6.3, table 67 "Class-Specific Notifications") defines wIndex as the USB interface the event applies to. For hso devices this will always be the Modem port, as the Modem port is the only port which is set up to receive them by the driver. So instead of always expecting USB interface #2, instead validate the notification with the actual USB interface number of the Modem port. Signed-off-by: NDan Williams <dcbw@redhat.com> Tested-by: NH. Nikolaus Schaller <hns@goldelico.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Veaceslav Falico 提交于
It returns 0 in case of success, !0 error otherwise. Fix the improper error verification. Fixes: 2c9839c1 ("bonding: add num_grat_arp attribute netlink support") CC: sfeldma@cumulusnetworks.com CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: NVeaceslav Falico <vfalico@redhat.com> Acked-by: NScott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 hayeswang 提交于
Replace the boolean value with the error code for the return value of the rtl_ops_init(). Signed-off-by: NHayes Wang <hayeswang@realtek.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 hayeswang 提交于
Some information of the device may be used in other functions. Move the relative code to make sure it would be initialzed correctly before using it. Signed-off-by: NHayes Wang <hayeswang@realtek.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 hayeswang 提交于
Replace the tabs of the variables declaration with the spaces. Signed-off-by: NHayes Wang <hayeswang@realtek.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Guenter Roeck 提交于
With arm:allmodconfig, building the Teles PCI driver fails with telespci.c:294:2: error: #error "not running on big endian machines now" Similar, building the driver for HFC PCI-Bus cards fails with hfc_pci.c:1647:2: error: #error "not running on big endian machines now" Remove the big endian cpp check from both drivers to fix the build errors. Signed-off-by: NGuenter Roeck <linux@roeck-us.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vijay Subramanian 提交于
Proportional Integral controller Enhanced (PIE) is a scheduler to address the bufferbloat problem. >From the IETF draft below: " Bufferbloat is a phenomenon where excess buffers in the network cause high latency and jitter. As more and more interactive applications (e.g. voice over IP, real time video streaming and financial transactions) run in the Internet, high latency and jitter degrade application performance. There is a pressing need to design intelligent queue management schemes that can control latency and jitter; and hence provide desirable quality of service to users. We present here a lightweight design, PIE(Proportional Integral controller Enhanced) that can effectively control the average queueing latency to a target value. Simulation results, theoretical analysis and Linux testbed results have shown that PIE can ensure low latency and achieve high link utilization under various congestion situations. The design does not require per-packet timestamp, so it incurs very small overhead and is simple enough to implement in both hardware and software. " Many thanks to Dave Taht for extensive feedback, reviews, testing and suggestions. Thanks also to Stephen Hemminger and Eric Dumazet for reviews and suggestions. Naeem Khademi and Dave Taht independently contributed to ECN support. For more information, please see technical paper about PIE in the IEEE Conference on High Performance Switching and Routing 2013. A copy of the paper can be found at ftp://ftpeng.cisco.com/pie/. Please also refer to the IETF draft submission at http://tools.ietf.org/html/draft-pan-tsvwg-pie-00 All relevant code, documents and test scripts and results can be found at ftp://ftpeng.cisco.com/pie/. For problems with the iproute2/tc or Linux kernel code, please contact Vijay Subramanian (vijaynsu@cisco.com or subramanian.vijay@gmail.com) Mythili Prabhu (mysuryan@cisco.com) Signed-off-by: NVijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: NMythili Prabhu <mysuryan@cisco.com> CC: Dave Taht <dave.taht@bufferbloat.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
net/netfilter/nfnetlink_queue_core.c: In function 'nfqnl_put_sk_uidgid': net/netfilter/nfnetlink_queue_core.c:304:35: error: 'TCP_TIME_WAIT' undeclared (first use in this function) net/netfilter/nfnetlink_queue_core.c:304:35: note: each undeclared identifier is reported only once for each function it appears in make[3]: *** [net/netfilter/nfnetlink_queue_core.o] Error 1 Just a missing include of net/tcp_states.h Reported-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables由 David S. Miller 提交于
Pablo Neira Ayuso says: <pablo@netfilter.org> ==================== nftables updates for net-next The following patchset contains nftables updates for your net-next tree, they are: * Add set operation to the meta expression by means of the select_ops() infrastructure, this allows us to set the packet mark among other things. From Arturo Borrero Gonzalez. * Fix wrong format in sscanf in nf_tables_set_alloc_name(), from Daniel Borkmann. * Add new queue expression to nf_tables. These comes with two previous patches to prepare this new feature, one to add mask in nf_tables_core to evaluate the queue verdict appropriately and another to refactor common code with xt_NFQUEUE, from Eric Leblond. * Do not hide nftables from Kconfig if nfnetlink is not enabled, also from Eric Leblond. * Add the reject expression to nf_tables, this adds the missing TCP RST support. It comes with an initial patch to refactor common code with xt_NFQUEUE, again from Eric Leblond. * Remove an unused variable assignment in nf_tables_dump_set(), from Michal Nazarewicz. * Remove the nft_meta_target code, now that Arturo added the set operation to the meta expression, from me. * Add help information for nf_tables to Kconfig, also from me. * Allow to dump all sets by specifying NFPROTO_UNSPEC, similar feature is available to other nf_tables objects, requested by Arturo, from me. * Expose the table usage counter, so we can know how many chains are using this table without dumping the list of chains, from Tomasz Bursztyka. ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next由 David S. Miller 提交于
Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates This series contains updates to i40e only. Majority of this series contains patches from Greg and Mitch to fix up or add functionality to the PF/VF driver interactions. Notably, a fix for SR-IOV VF port VLAN which resolved the problem of port VLAN configurations not being persistent across VF driver loads and unloads and enable/disable of the feature. Also do not enable the default port on the VEB, which is designed only to bridge the PF to an Open vSwitch or bridge. Another fix to resolve a possible memory corruption condition where ARQ messages are written to random memory locations. Fix a problem where the 'ip link show' command would display stale link address information after the link address was set via the 'ip link set' command. Anjali provides several patches, one which saves information that can be used while cleaning the Tx ring and useful in detecting Tx hangs. Then provides a fixes to the admin queue shutdown function to ensure we are shutting down the queue in the shutdown path and ensure ASQ is alive before issuing the admin queue command. Shannon provides a fix for get/update vsi params where the incorrect struct was being used. ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Sathya Perla says: ==================== be2net: patch set Pls apply the following bug fixes to the 'net' tree. Thanks. Suresh Reddy (2): be2net: increase the timeout value for loopback-test FW cmd be2net: fix max_evt_qs calculation for BE3 in SR-IOV config Vasundhara Volam (1): be2net: disable RSS when number of RXQs is reduced to 1 via set-channels ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Suresh Reddy 提交于
The driver wrongly assumes 16 EQs/vectors are available for each BE3 PF. When SR-IOV is enabled, a BE3 PF can support only a max of 8 EQs. Signed-off-by: NSuresh Reddy <suresh.reddy@emulex.com> Signed-off-by: NSathya Perla <sathya.perla@emulex.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Suresh Reddy 提交于
The loopback test FW cmd may need upto 15 seconds to complete on certain PHYs. This patch also fixes the name of the completion variable used to synchronize FW cmd completions as it not used by the flashing cmd alone anymore. Signed-off-by: NSuresh Reddy <suresh.reddy@emulex.com> Signed-off-by: NSathya Perla <sathya.perla@emulex.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vasundhara Volam 提交于
When *only* the default RXQ is used, the RSS policy must be disabled so that all IP and no-IP traffic is placed into the default RXQ. If not, IP traffic is dropped. Also, issue the RSS_CONFIG cmd only if FW advertises RSS capability for the interface. Signed-off-by: NVasundhara Volam <vasundhara.volam@emulex.com> Signed-off-by: NSathya Perla <sathya.perla@emulex.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-