- 20 2月, 2016 6 次提交
-
-
由 Alexei Starovoitov 提交于
. avoid walking the stack when there is no room left in the buffer . generalize get_perf_callchain() to be called from bpf helper Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Kan Liang 提交于
This patch implements sub command ETHTOOL_SCOALESCE for ioctl ETHTOOL_PERQUEUE. It introduces an interface set_per_queue_coalesce to set coalesce of each masked queue to device driver. The wanted coalesce information are stored in "data" for each masked queue, which can copy from userspace. If it fails to set coalesce to device driver, the value which already set to specific queue will be tried to rollback. Signed-off-by: NKan Liang <kan.liang@intel.com> Reviewed-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Kan Liang 提交于
This patch implements sub command ETHTOOL_GCOALESCE for ioctl ETHTOOL_PERQUEUE. It introduces an interface get_per_queue_coalesce to get coalesce of each masked queue from device driver. Then the interrupt coalescing parameters will be copied back to user space one by one. Signed-off-by: NKan Liang <kan.liang@intel.com> Reviewed-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Kan Liang 提交于
Introduce a new ioctl ETHTOOL_PERQUEUE for per queue parameters setting. The following patches will enable some SUB_COMMANDs for per queue setting. Signed-off-by: NKan Liang <kan.liang@intel.com> Reviewed-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Decotigny 提交于
Aimed at transferring bitmaps to/from user-space in a 32/64-bit agnostic way. Tested: unit tests (next patch) on qemu i386, x86_64, ppc, ppc64 BE and LE, ARM. Signed-off-by: NDavid Decotigny <decot@googlers.com> Reviewed-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nikolay Aleksandrov 提交于
Currently mdb entries are exported directly as a structure inside MDBA_MDB_ENTRY_INFO attribute, we can't really extend it without breaking user-space. In order to export new mdb fields, I've converted the MDBA_MDB_ENTRY_INFO into a nested attribute which starts like before with struct br_mdb_entry (without header, as it's casted directly in iproute2) and continues with MDBA_MDB_EATTR_ attributes. This way we keep compatibility with older users and can export new data. I've tested this with iproute2, both with and without support for the added attribute and it works fine. So basically we again have MDBA_MDB_ENTRY_INFO with struct br_mdb_entry inside but it may contain also some additional MDBA_MDB_EATTR_ attributes such as MDBA_MDB_EATTR_TIMER which can be parsed by user-space. So the new structure is: [MDBA_MDB] = { [MDBA_MDB_ENTRY] = { [MDBA_MDB_ENTRY_INFO] [MDBA_MDB_ENTRY_INFO] { <- Nested attribute struct br_mdb_entry <- nla_put_nohdr() [MDBA_MDB_ENTRY attributes] <- normal netlink attributes } } } Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 2月, 2016 9 次提交
-
-
由 Yuval Mintz 提交于
Today, interfaces are working in vlan-promisc mode; But once vlan filtering offloaded would be supported, we'll need a method to control it directly [e.g., when setting device to PROMISC, or when running out of vlan credits]. This adds the necessary API for L2 client to manually choose whether to accept all vlans or only those for which filters were configured. Signed-off-by: NYuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
This patch takes advantage of several assumptions we can make about the headers of the frame in order to reduce overall processing overhead for computing the outer header checksum. First we can assume the entire header is in the region pointed to by skb->head as this is what csum_start is based on. Second, as a result of our first assumption, we can just call csum_partial instead of making a call to skb_checksum which would end up having to configure things so that we could walk through the frags list. Signed-off-by: NAlexander Duyck <aduyck@mirantis.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Benjamin Poirier 提交于
follows up commit 45f6fad8 ("ipv6: add complete rcu protection around np->opt") which added mixed rcu/refcount protection to np->opt. Given the current implementation of rcu_pointer_handoff(), this has no effect at runtime. Signed-off-by: NBenjamin Poirier <bpoirier@suse.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Benc 提交于
Part of skb_scrub_packet was open coded in iptunnel_pull_header. Let it call skb_scrub_packet directly instead. Signed-off-by: NJiri Benc <jbenc@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Benc 提交于
The tun_id field in struct ip_tunnel_key is __be64, not __be32. We need to convert the vni to tun_id correctly. Fixes: 54bfd872 ("vxlan: keep flags and vni in network byte order") Reported-by: NPaolo Abeni <pabeni@redhat.com> Tested-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NJiri Benc <jbenc@redhat.com> Acked-by: NThadeu Lima de Souza Cascardo <cascardo@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
reverts commit 3ab1f683 ("nfnetlink: add support for memory mapped netlink")' Like previous commits in the series, remove wrappers that are not needed after mmapped netlink removal. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
Following mmapped netlink removal this code can be simplified by removing the alloc wrapper. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
This reverts commit bb9b18fb ("genl: Add genlmsg_new_unicast() for unicast message allocation")'. Nothing wrong with it; its no longer needed since this was only for mmapped netlink support. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
mmapped netlink has a number of unresolved issues: - TX zerocopy support had to be disabled more than a year ago via commit 4682a035 ("netlink: Always copy on mmap TX.") because the content of the mmapped area can change after netlink attribute validation but before message processing. - RX support was implemented mainly to speed up nfqueue dumping packet payload to userspace. However, since commit ae08ce00 ("netfilter: nfnetlink_queue: zero copy support") we avoid one copy with the socket-based interface too (via the skb_zerocopy helper). The other problem is that skbs attached to mmaped netlink socket behave different from normal skbs: - they don't have a shinfo area, so all functions that use skb_shinfo() (e.g. skb_clone) cannot be used. - reserving headroom prevents userspace from seeing the content as it expects message to start at skb->head. See for instance commit aa3a0220 ("netlink: not trim skb for mmaped socket when dump"). - skbs handed e.g. to netlink_ack must have non-NULL skb->sk, else we crash because it needs the sk to check if a tx ring is attached. Also not obvious, leads to non-intuitive bug fixes such as 7c7bdf35 ("netfilter: nfnetlink: use original skbuff when acking batches"). mmaped netlink also didn't play nicely with the skb_zerocopy helper used by nfqueue and openvswitch. Daniel Borkmann fixed this via commit 6bb0fef4 ("netlink, mmap: fix edge-case leakages in nf queue zero-copy")' but at the cost of also needing to provide remaining length to the allocation function. nfqueue also has problems when used with mmaped rx netlink: - mmaped netlink doesn't allow use of nfqueue batch verdict messages. Problem is that in the mmap case, the allocation time also determines the ordering in which the frame will be seen by userspace (A allocating before B means that A is located in earlier ring slot, but this also means that B might get a lower sequence number then A since seqno is decided later. To fix this we would need to extend the spinlocked region to also cover the allocation and message setup which isn't desirable. - nfqueue can now be configured to queue large (GSO) skbs to userspace. Queing GSO packets is faster than having to force a software segmentation in the kernel, so this is a desirable option. However, with a mmap based ring one has to use 64kb per ring slot element, else mmap has to fall back to the socket path (NL_MMAP_STATUS_COPY) for all large packets. To use the mmap interface, userspace not only has to probe for mmap netlink support, it also has to implement a recv/socket receive path in order to handle messages that exceed the size of an rx ring element. Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Ken-ichirou MATSUZAWA <chamaken@gmail.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 2月, 2016 6 次提交
-
-
由 Jiri Benc 提交于
Prevent repeated conversions from and to network order in the fast path. To achieve this, define all flag constants in big endian order and store VNI as __be32. To prevent confusion between the actual VNI value and the VNI field from the header (which contains additional reserved byte), strictly distinguish between "vni" and "vni_field". Signed-off-by: NJiri Benc <jbenc@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Benc 提交于
Currently, pointer to the vxlan header is kept in a local variable. It has to be reloaded whenever the pskb pull operations are performed which usually happens somewhere deep in called functions. Create a vxlan_hdr function and use it to reference the vxlan header instead. Signed-off-by: NJiri Benc <jbenc@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Fastabend 提交于
By packing the structure we can remove a few holes as Jamal suggests. before: struct tc_cls_u32_knode { struct tcf_exts * exts; /* 0 8 */ u8 fshift; /* 8 1 */ /* XXX 3 bytes hole, try to pack */ u32 handle; /* 12 4 */ u32 val; /* 16 4 */ u32 mask; /* 20 4 */ u32 link_handle; /* 24 4 */ /* XXX 4 bytes hole, try to pack */ struct tc_u32_sel * sel; /* 32 8 */ /* size: 40, cachelines: 1, members: 7 */ /* sum members: 33, holes: 2, sum holes: 7 */ /* last cacheline: 40 bytes */ }; after: struct tc_cls_u32_knode { struct tcf_exts * exts; /* 0 8 */ struct tc_u32_sel * sel; /* 8 8 */ u32 handle; /* 16 4 */ u32 val; /* 20 4 */ u32 mask; /* 24 4 */ u32 link_handle; /* 28 4 */ u8 fshift; /* 32 1 */ /* size: 40, cachelines: 1, members: 7 */ /* padding: 7 */ /* last cacheline: 40 bytes */ }; Suggested-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ben Hutchings 提交于
There are no longer any in-tree drivers that use it. Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yuval Mintz 提交于
This patch moves the qed* driver into utilizing the 8.7.3.0 FW. This new FW is required for a lot of new SW features, including: - Vlan filtering offload - Encapsulation offload support - HW ingress aggregations As well as paving the way for the possibility of adding storage protocols in the future. V2: - Fix kbuild test robot error/warnings. Signed-off-by: NYuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: NSudarsana Reddy Kalluru <Sudarsana.Kalluru@qlogic.com> Signed-off-by: NManish Chopra <manish.chopra@qlogic.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Xin Long 提交于
Since commit 8b570dc9 ("sctp: only drop the reference on the datamsg after sending a msg") used sctp_datamsg_put in sctp_sendmsg, instead of sctp_datamsg_free, this function has no use in sctp. So we will remove it. Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 2月, 2016 16 次提交
-
-
由 John Fastabend 提交于
This is a helper function drivers can use to learn if the action type is a drop action. Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Fastabend 提交于
Its useful to turn off the qdisc offload feature at a per device level. This gives us a big hammer to enable/disable offloading. More fine grained control (i.e. per rule) may be supported later. Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Fastabend 提交于
This patch allows netdev drivers to consume cls_u32 offloads via the ndo_setup_tc ndo op. This works aligns with how network drivers have been doing qdisc offloads for mqprio. Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Fastabend 提交于
This patch updates setup_tc so we can pass additional parameters into the ndo op in a generic way. To do this we provide structured union and type flag. This lets each classifier and qdisc provide its own set of attributes without having to add new ndo ops or grow the signature of the callback. Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Fastabend 提交于
The ndo_setup_tc() op was added to support drivers offloading tx qdiscs however only support for mqprio was ever added. So we only ever added support for passing the number of traffic classes to the driver. This patch generalizes the ndo_setup_tc op so that a handle can be provided to indicate if the offload is for ingress or egress or potentially even child qdiscs. CC: Murali Karicheri <m-karicheri2@ti.com> CC: Shradha Shah <sshah@solarflare.com> CC: Or Gerlitz <ogerlitz@mellanox.com> CC: Ariel Elior <ariel.elior@qlogic.com> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Bruce Allan <bruce.w.allan@intel.com> CC: Jesse Brandeburg <jesse.brandeburg@intel.com> CC: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nikolay Borisov 提交于
Signed-off-by: NNikolay Borisov <kernel@kyup.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nikolay Borisov 提交于
Signed-off-by: NNikolay Borisov <kernel@kyup.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nikolay Borisov 提交于
Signed-off-by: NNikolay Borisov <kernel@kyup.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nikolay Borisov 提交于
Signed-off-by: NNikolay Borisov <kernel@kyup.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
tcpi_min_rtt reports the minimal rtt observed by TCP stack for the flow, in usec unit. Might be ~0U if not yet known. tcpi_notsent_bytes reports the amount of bytes in the write queue that were not yet sent. This is done in a single patch to not add a temporary 32bit padding hole in tcp_info. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
In case of UDP traffic with datagram length below MTU this give about 2% performance increase when tunneling over ipv4 and about 60% when tunneling over ipv6 Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Suggested-and-acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
In case of UDP traffic with datagram length below MTU this give about 3% performance increase when tunneling over ipv4 and about 70% when tunneling over ipv6. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Suggested-and-acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
The current ip_tunnel cache implementation is prone to a race that will cause the wrong dst to be cached on cuncurrent dst cache miss and ip tunnel update via netlink. Replacing with the generic implementation fix the issue. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Suggested-and-acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
This also fix a potential race into the existing tunnel code, which could lead to the wrong dst to be permanenty cached: CPU1: CPU2: <xmit on ip6_tunnel> <cache lookup fails> dst = ip6_route_output(...) <tunnel params are changed via nl> dst_cache_reset() // no effect, // the cache is empty dst_cache_set() // the wrong dst // is permanenty stored // into the cache With the new dst implementation the above race is not possible since the first cache lookup after dst_cache_reset will fail due to the timestamp check Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Suggested-and-acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
This patch add a generic, lockless dst cache implementation. The need for lock is avoided updating the dst cache fields only in per cpu scope, and requiring that the cache manipulation functions are invoked with the local bh disabled. The refresh_ts and reset_ts fields are used to ensure the cache consistency in case of cuncurrent cache update (dst_cache_set*) and reset operation (dst_cache_reset). Consider the following scenario: CPU1: CPU2: <cache lookup with emtpy cache: it fails> <get dst via uncached route lookup> <related configuration changes> dst_cache_reset() dst_cache_set() The dst entry set passed to dst_cache_set() should not be used for later dst cache lookup, because it's obtained using old configuration values. Since the refresh_ts is updated only on dst_cache lookup, the cached value in the above scenario will be discarded on the next lookup. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Suggested-and-acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Keller, Jacob E 提交于
Ethernet drivers implementing both {GS}RXFH and {GS}CHANNELS ethtool ops incorrectly allow SCHANNELS when it would conflict with the settings from SRXFH. This occurs because it is not possible for drivers to understand whether their Rx flow indirection table has been configured or is in the default state. In addition, drivers currently behave in various ways when increasing the number of Rx channels. Some drivers will always destroy the Rx flow indirection table when this occurs, whether it has been set by the user or not. Other drivers will attempt to preserve the table even if the user has never modified it from the default driver settings. Neither of these situation is desirable because it leads to unexpected behavior or loss of user configuration. The correct behavior is to simply return -EINVAL when SCHANNELS would conflict with the current Rx flow table settings. However, it should only do so if the current settings were modified by the user. If we required that the new settings never conflict with the current (default) Rx flow settings, we would force users to first reduce their Rx flow settings and then reduce the number of Rx channels. This patch proposes a solution implemented in net/core/ethtool.c which ensures that all drivers behave correctly. It checks whether the RXFH table has been configured to non-default settings, and stores this information in a private netdev flag. When the number of channels is requested to change, it first ensures that the current Rx flow table is not going to assign flows to now disabled channels. Signed-off-by: NJacob Keller <jacob.e.keller@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 2月, 2016 3 次提交
-
-
由 Edward Cree 提交于
Signed-off-by: NEdward Cree <ecree@solarflare.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Edward Cree 提交于
All users now pass false, so we can remove it, and remove the code that was conditional upon it. Signed-off-by: NEdward Cree <ecree@solarflare.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Edward Cree 提交于
The only protocol affected at present is Geneve. Signed-off-by: NEdward Cree <ecree@solarflare.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-