- 15 10月, 2014 6 次提交
-
-
由 Alexei Starovoitov 提交于
userspace programs that use eBPF instruction macros need to include two files: uapi/linux/filter.h and uapi/linux/bpf.h Move common macro definitions that are shared between classic BPF and eBPF into uapi/linux/bpf_common.h, so that user app can include only one bpf.h file Cc: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tilman Schmidt 提交于
Function capi20_manufacturer() is declared with unsigned int cmd argument but called with unsigned long. Fix by correcting the function prototype since the actual argument is part of the user visible API. Spotted with Coverity. Signed-off-by: NTilman Schmidt <tilman@imap.cc> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Li RongQing 提交于
no user uses this lock. Signed-off-by: NLi RongQing <roy.qing.li@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Aring 提交于
If the skb is not dropped afterwards we should run consume_skb instead kfree_skb. Inside of function skb_unshare we do always a kfree_skb, doesn't depend if skb_copy failed or was successful. This patch switch this behaviour like skb_share_check, if allocation of sk_buff failed we use kfree_skb otherwise consume_skb. Signed-off-by: NAlexander Aring <alex.aring@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
When receiving a e.g. semi-good formed connection scan in the form of ... -------------- INIT[ASCONF; ASCONF_ACK] -------------> <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------ -------------------- COOKIE-ECHO --------------------> <-------------------- COOKIE-ACK --------------------- ---------------- ASCONF_a; ASCONF_b -----------------> ... where ASCONF_a equals ASCONF_b chunk (at least both serials need to be equal), we panic an SCTP server! The problem is that good-formed ASCONF chunks that we reply with ASCONF_ACK chunks are cached per serial. Thus, when we receive a same ASCONF chunk twice (e.g. through a lost ASCONF_ACK), we do not need to process them again on the server side (that was the idea, also proposed in the RFC). Instead, we know it was cached and we just resend the cached chunk instead. So far, so good. Where things get nasty is in SCTP's side effect interpreter, that is, sctp_cmd_interpreter(): While incoming ASCONF_a (chunk = event_arg) is being marked !end_of_packet and !singleton, and we have an association context, we do not flush the outqueue the first time after processing the ASCONF_ACK singleton chunk via SCTP_CMD_REPLY. Instead, we keep it queued up, although we set local_cork to 1. Commit 2e3216cd changed the precedence, so that as long as we get bundled, incoming chunks we try possible bundling on outgoing queue as well. Before this commit, we would just flush the output queue. Now, while ASCONF_a's ASCONF_ACK sits in the corked outq, we continue to process the same ASCONF_b chunk from the packet. As we have cached the previous ASCONF_ACK, we find it, grab it and do another SCTP_CMD_REPLY command on it. So, effectively, we rip the chunk->list pointers and requeue the same ASCONF_ACK chunk another time. Since we process ASCONF_b, it's correctly marked with end_of_packet and we enforce an uncork, and thus flush, thus crashing the kernel. Fix it by testing if the ASCONF_ACK is currently pending and if that is the case, do not requeue it. When flushing the output queue we may relink the chunk for preparing an outgoing packet, but eventually unlink it when it's copied into the skb right before transmission. Joint work with Vlad Yasevich. Fixes: 2e3216cd ("sctp: Follow security requirement of responding with 1 packet") Signed-off-by: NDaniel Borkmann <dborkman@redhat.com> Signed-off-by: NVlad Yasevich <vyasevich@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
Commit 6f4c618d ("SCTP : Add paramters validity check for ASCONF chunk") added basic verification of ASCONF chunks, however, it is still possible to remotely crash a server by sending a special crafted ASCONF chunk, even up to pre 2.6.12 kernels: skb_over_panic: text:ffffffffa01ea1c3 len:31056 put:30768 head:ffff88011bd81800 data:ffff88011bd81800 tail:0x7950 end:0x440 dev:<NULL> ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:129! [...] Call Trace: <IRQ> [<ffffffff8144fb1c>] skb_put+0x5c/0x70 [<ffffffffa01ea1c3>] sctp_addto_chunk+0x63/0xd0 [sctp] [<ffffffffa01eadaf>] sctp_process_asconf+0x1af/0x540 [sctp] [<ffffffff8152d025>] ? _read_unlock_bh+0x15/0x20 [<ffffffffa01e0038>] sctp_sf_do_asconf+0x168/0x240 [sctp] [<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp] [<ffffffff8147645d>] ? fib_rules_lookup+0xad/0xf0 [<ffffffffa01e6b22>] ? sctp_cmp_addr_exact+0x32/0x40 [sctp] [<ffffffffa01e8393>] sctp_assoc_bh_rcv+0xd3/0x180 [sctp] [<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp] [<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp] [<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter] [<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0 [<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0 [<ffffffff81496ded>] ip_local_deliver_finish+0xdd/0x2d0 [<ffffffff81497078>] ip_local_deliver+0x98/0xa0 [<ffffffff8149653d>] ip_rcv_finish+0x12d/0x440 [<ffffffff81496ac5>] ip_rcv+0x275/0x350 [<ffffffff8145c88b>] __netif_receive_skb+0x4ab/0x750 [<ffffffff81460588>] netif_receive_skb+0x58/0x60 This can be triggered e.g., through a simple scripted nmap connection scan injecting the chunk after the handshake, for example, ... -------------- INIT[ASCONF; ASCONF_ACK] -------------> <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------ -------------------- COOKIE-ECHO --------------------> <-------------------- COOKIE-ACK --------------------- ------------------ ASCONF; UNKNOWN ------------------> ... where ASCONF chunk of length 280 contains 2 parameters ... 1) Add IP address parameter (param length: 16) 2) Add/del IP address parameter (param length: 255) ... followed by an UNKNOWN chunk of e.g. 4 bytes. Here, the Address Parameter in the ASCONF chunk is even missing, too. This is just an example and similarly-crafted ASCONF chunks could be used just as well. The ASCONF chunk passes through sctp_verify_asconf() as all parameters passed sanity checks, and after walking, we ended up successfully at the chunk end boundary, and thus may invoke sctp_process_asconf(). Parameter walking is done with WORD_ROUND() to take padding into account. In sctp_process_asconf()'s TLV processing, we may fail in sctp_process_asconf_param() e.g., due to removal of the IP address that is also the source address of the packet containing the ASCONF chunk, and thus we need to add all TLVs after the failure to our ASCONF response to remote via helper function sctp_add_asconf_response(), which basically invokes a sctp_addto_chunk() adding the error parameters to the given skb. When walking to the next parameter this time, we proceed with ... length = ntohs(asconf_param->param_hdr.length); asconf_param = (void *)asconf_param + length; ... instead of the WORD_ROUND()'ed length, thus resulting here in an off-by-one that leads to reading the follow-up garbage parameter length of 12336, and thus throwing an skb_over_panic for the reply when trying to sctp_addto_chunk() next time, which implicitly calls the skb_put() with that length. Fix it by using sctp_walk_params() [ which is also used in INIT parameter processing ] macro in the verification *and* in ASCONF processing: it will make sure we don't spill over, that we walk parameters WORD_ROUND()'ed. Moreover, we're being more defensive and guard against unknown parameter types and missized addresses. Joint work with Vlad Yasevich. Fixes: b896b82be4ae ("[SCTP] ADDIP: Support for processing incoming ASCONF_ACK chunks.") Signed-off-by: NDaniel Borkmann <dborkman@redhat.com> Signed-off-by: NVlad Yasevich <vyasevich@gmail.com> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 10月, 2014 1 次提交
-
-
由 Sascha Hauer 提交于
The KSZ8021 and KSZ8031 support RMII reference input clocks of 25MHz and 50MHz. Both PHYs differ in the default frequency they expect after reset. If this differs from the actual input clock, then register 0x1f bit 7 must be changed. Signed-off-by: NSascha Hauer <s.hauer@pengutronix.de> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 10月, 2014 3 次提交
-
-
由 Eric Dumazet 提交于
Add two helpers so that drivers do not have to care of BQL being available or not. Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NJim Davis <jim.epost@gmail.com> Fixes: 29d40c90 ("net/mlx4_en: Use prefetch in tx path") Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Masanari Iida 提交于
In commit 7bced397, dma_cookie was removed from struct skbuff. But the description of dma_cookie still exist. So the "make xmldocs" output following warning. Warning(.//include/linux/skbuff.h:609): Excess struct/union /enum/typedef member 'dma_cookie' description in 'sk_buff' Remove description of dma_cookie fix the symptom. Signed-off-by: NMasanari Iida <standby24x7@gmail.com> Acked-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 LEROY Christophe 提交于
In the probe function, use_napi is inconditionnaly set to 1. This patch removes all the code which is conditional to !use_napi, and removes use_napi which has then become useless. Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 10月, 2014 4 次提交
-
-
由 Pablo Neira Ayuso 提交于
NFT_REJECT_ICMPX_MAX should be __NFT_REJECT_ICMPX_MAX - 1. nft_reject_icmp_code() and nft_reject_icmpv6_code() are called from the packet path, so BUG_ON in case we try to access an unknown abstracted ICMP code. This should not happen since we already validate this from nft_reject_{inet,bridge}_init(). Fixes: 51b0a5d8 ("netfilter: nft_reject: introduce icmp code abstraction for inet and bridge") Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Pablo Neira Ayuso 提交于
nf_send_reset6() now resides in net/ipv6/netfilter/nf_reject_ipv6.c Fixes: c8d7b98b ("netfilter: move nf_send_resetX() code to nf_reject_ipvX modules") Reported-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org> Acked-by: NEric Dumazet <edumazet@google.com>
-
由 Eric Dumazet 提交于
Testing xmit_more support with netperf and connected UDP sockets, I found strange dst refcount false sharing. Current handling of IFF_XMIT_DST_RELEASE is not optimal. Dropping dst in validate_xmit_skb() is certainly too late in case packet was queued by cpu X but dequeued by cpu Y The logical point to take care of drop/force is in __dev_queue_xmit() before even taking qdisc lock. As Julian Anastasov pointed out, need for skb_dst() might come from some packet schedulers or classifiers. This patch adds new helper to cleanly express needs of various drivers or qdiscs/classifiers. Drivers that need skb_dst() in their ndo_start_xmit() should call following helper in their setup instead of the prior : dev->priv_flags &= ~IFF_XMIT_DST_RELEASE; -> netif_keep_dst(dev); Instead of using a single bit, we use two bits, one being eventually rebuilt in bonding/team drivers. The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being rebuilt in bonding/team. Eventually, we could add something smarter later. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Petri Gynther 提交于
Adjust fixed_phy_register() to return struct phy_device *, so that it becomes easy to use fixed PHYs without device tree support: phydev = fixed_phy_register(PHY_POLL, &fixed_phy_status, NULL); fixed_phy_set_link_update(phydev, fixed_phy_link_update); phy_connect_direct(netdev, phydev, handler_fn, phy_interface); This change is a prerequisite for modifying bcmgenet driver to work without a device tree on Broadcom's MIPS-based 7xxx platforms. Signed-off-by: NPetri Gynther <pgynther@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 10月, 2014 6 次提交
-
-
由 Andy Zhou 提交于
Fix a openvswitch compilation error when CONFIG_INET is not set: ===================================================== In file included from include/net/geneve.h:4:0, from net/openvswitch/flow_netlink.c:45: include/net/udp_tunnel.h: In function 'udp_tunnel_handle_offloads': >> include/net/udp_tunnel.h:100:2: error: implicit declaration of function 'iptunnel_handle_offloads' [-Werror=implicit-function-declaration] >> return iptunnel_handle_offloads(skb, udp_csum, type); >> ^ >> >> include/net/udp_tunnel.h:100:2: warning: return makes pointer from integer without a cast >> >> cc1: some warnings being treated as errors ===================================================== Reported-by: Nkbuild test robot <fengguang.wu@intel.com> Signed-off-by: NAndy Zhou <azhou@nicira.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Hannes Frederic Sowa 提交于
Try to reduce number of possible fn_sernum mutation by constraining them to their namespace. Also remove rt_genid which I forgot to remove in 705f1c86 ("ipv6: remove rt6i_genid"). Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org> Cc: Martin Lau <kafai@fb.com> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Hannes Frederic Sowa 提交于
Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org> Cc: Martin Lau <kafai@fb.com> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Hannes Frederic Sowa 提交于
Also renamed struct fib6_walker_t to fib6_walker and enum fib_walk_state_t to fib6_walk_state as recommended by Cong Wang. Cc: Cong Wang <cwang@twopensource.com> Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org> Cc: Martin Lau <kafai@fb.com> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Fastabend 提交于
This removes the tcf_proto argument from the ematch code paths that only need it to reference the net namespace. This allows simplifying qdisc code paths especially when we need to tear down the ematch from an RCU callback. In this case we can not guarentee that the tcf_proto structure is still valid. Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com> Acked-by: NCong Wang <cwang@twopensource.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Some TSO engines might have a too heavy setup cost, that impacts performance on hosts sending small bursts (2 MSS per packet). This patch adds a device gso_min_segs, allowing drivers to set a minimum segment size for TSO packets, according to the NIC performance. Tested on a mlx4 NIC, this allows to get a ~110% increase of throughput when sending 2 MSS per packet. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 10月, 2014 8 次提交
-
-
由 Eric Dumazet 提交于
Use new ethtool [sg]et_tunable() to set tx_copybread (inline threshold) Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NAmir Vadai <amirv@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
- Remove unused variable ring->poll_cnt - No need to set some fields if using blueflame - Add missing const's - Use unlikely - Remove unneeded new line - Make some comments more precise - struct mlx4_bf @offset field reduced to unsigned int to save space Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NAmir Vadai <amirv@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Standard qdisc API to setup a timer implies an atomic operation on every packet dequeue : qdisc_unthrottled() It turns out this is not really needed for FQ, as FQ has no concept of global qdisc throttling, being a qdisc handling many different flows, some of them can be throttled, while others are not. Fix is straightforward : add a 'bool throttle' to qdisc_watchdog_schedule_ns(), and remove calls to qdisc_unthrottled() in sch_fq. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jesse Gross 提交于
The Openvswitch implementation is completely agnostic to the options that are in use and can handle newly defined options without further work. It does this by simply matching on a byte array of options and allowing userspace to setup flows on this array. Signed-off-by: NJesse Gross <jesse@nicira.com> Singed-off-by: NAnsis Atteka <aatteka@nicira.com> Signed-off-by: NAndy Zhou <azhou@nicira.com> Acked-by: NThomas Graf <tgraf@noironetworks.com> Acked-by: NPravin B Shelar <pshelar@nicira.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jesse Gross 提交于
Currently, the flow information that is matched for tunnels and the tunnel data passed around with packets is the same. However, as additional information is added this is not necessarily desirable, as in the case of pointers. This adds a new structure for tunnel metadata which currently contains only the existing struct. This change is purely internal to the kernel since the current OVS_KEY_ATTR_IPV4_TUNNEL is simply a compressed version of OVS_KEY_ATTR_TUNNEL that is translated at flow setup. Signed-off-by: NJesse Gross <jesse@nicira.com> Signed-off-by: NAndy Zhou <azhou@nicira.com> Acked-by: NPravin B Shelar <pshelar@nicira.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jesse Gross 提交于
Some tunnel formats have mechanisms for indicating that packets are OAM frames that should be handled specially (either as high priority or not forwarded beyond an endpoint). This provides support for allowing those types of packets to be matched. Signed-off-by: NJesse Gross <jesse@nicira.com> Signed-off-by: NAndy Zhou <azhou@nicira.com> Acked-by: NPravin B Shelar <pshelar@nicira.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Andy Zhou 提交于
This adds a device level support for Geneve -- Generic Network Virtualization Encapsulation. The protocol is documented at http://tools.ietf.org/html/draft-gross-geneve-01 Only protocol layer Geneve support is provided by this driver. Openvswitch can be used for configuring, set up and tear down functional Geneve tunnels. Signed-off-by: NJesse Gross <jesse@nicira.com> Signed-off-by: NAndy Zhou <azhou@nicira.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vlad Yasevich 提交于
Currently association restarts do not take into consideration the state of the socket. When a restart happens, the current assocation simply transitions into established state. This creates a condition where a remote system, through a the restart procedure, may create a local association that is no way reachable by user. The conditions to trigger this are as follows: 1) Remote does not acknoledge some data causing data to remain outstanding. 2) Local application calls close() on the socket. Since data is still outstanding, the association is placed in SHUTDOWN_PENDING state. However, the socket is closed. 3) The remote tries to create a new association, triggering a restart on the local system. The association moves from SHUTDOWN_PENDING to ESTABLISHED. At this point, it is no longer reachable by any socket on the local system. This patch addresses the above situation by moving the newly ESTABLISHED association into SHUTDOWN-SENT state and bundling a SHUTDOWN after the COOKIE-ACK chunk. This way, the restarted associate immidiately enters the shutdown procedure and forces the termination of the unreachable association. Reported-by: NDavid Laight <David.Laight@aculab.com> Signed-off-by: NVlad Yasevich <vyasevich@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 10月, 2014 2 次提交
-
-
由 Sébastien Barré 提交于
the inet6 state INET6_IFADDR_STATE_UP only appeared in its definition. Cc: Christoph Paasch <christoph.paasch@uclouvain.be> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NSébastien Barré <sebastien.barre@uclouvain.be> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vijay Subramanian 提交于
SKB_FCLONE_UNAVAILABLE has overloaded meaning depending on type of skb. 1: If skb is allocated from head_cache, it indicates fclone is not available. 2: If skb is a companion fclone skb (allocated from fclone_cache), it indicates it is available to be used. To avoid confusion for case 2 above, this patch replaces SKB_FCLONE_UNAVAILABLE with SKB_FCLONE_FREE where appropriate. For fclone companion skbs, this indicates it is free for use. SKB_FCLONE_UNAVAILABLE will now simply indicate skb is from head_cache and cannot / will not have a companion fclone. Signed-off-by: NVijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 10月, 2014 10 次提交
-
-
由 Tom Herbert 提交于
This patch allows configuring IPIP, sit, and GRE tunnels to use GUE. This is very similar to fou excpet that we need to insert the GUE header in addition to the UDP header on transmit. Signed-off-by: NTom Herbert <therbert@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
This patch adds support receiving for GUE packets in the fou module. The fou module now supports direct foo-over-udp (no encapsulation header) and GUE. To support this a type parameter is added to the fou netlink parameters. For a GUE socket we define gue_udp_recv, gue_gro_receive, and gue_gro_complete to handle the specifics of the GUE protocol. Most of the code to manage and configure sockets is common with the fou. Signed-off-by: NTom Herbert <therbert@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
This patch removes fou[46]_gro_receive and fou[46]_gro_complete functions. The v4 or v6 variants were chosen for the UDP offloads based on the address family of the socket this is not necessary or correct. Alternatively, this patch adds is_ipv6 to napi_gro_skb. This is set in udp6_gro_receive and unset in udp4_gro_receive. In fou_gro_receive the value is used to select the correct inet_offloads for the protocol of the outer IP header. Signed-off-by: NTom Herbert <therbert@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eli Cohen 提交于
This patch puts a common part as the first field of mlx5_core_qp. This field is used to identify which resource generated an event. This is required since upcoming new resource types such as DC targets are allocated for the same numerical space as regular QPs and may generate the same events. By searching the resource in the same table we can then look at the common field to identify the resource. Signed-off-by: NEli Cohen <eli@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eli Cohen 提交于
Transform device capabilities related commands to use set/get macros to manipulate command mailboxes. Signed-off-by: NEli Cohen <eli@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eli Cohen 提交于
Add an auto generated header file that describes hardware registers along with set of macros that set/get values. The macros do static checks to avoid overflow, handle endianess, and overall provide a clean way to code commands. Currently the header file is small and we will add structs as we make use of the macros. A few commands were removed from the commands enum since they are not supported currently and will be added when support is available. Signed-off-by: NEli Cohen <eli@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eli Cohen 提交于
Rearrange struct mlx5_caps so it has a "gen" field to represent the current capabilities configured for the device. Max capabilities can also be queried from the device. Also update capabilities struct to contain more fields as per the latest revision if firmware specification. Signed-off-by: NEli Cohen <eli@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Validation of skb can be pretty expensive : GSO segmentation and/or checksum computations. We can do this without holding qdisc lock, so that other cpus can queue additional packets. Trick is that requeued packets were already validated, so we carry a boolean so that sch_direct_xmit() can validate a fresh skb list, or directly use an old one. Tested on 40Gb NIC (8 TX queues) and 200 concurrent flows, 48 threads host. Turning TSO on or off had no effect on throughput, only few more cpu cycles. Lock contention on qdisc lock disappeared. Same if disabling TX checksum offload. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Joe Perches 提交于
The return value is not used by callers of these functions so change the functions to return void. Signed-off-by: NJoe Perches <joe@perches.com> Acked-by: NJason Baron <jbaron@akamai.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Jesper Dangaard Brouer 提交于
Based on DaveM's recent API work on dev_hard_start_xmit(), that allows sending/processing an entire skb list. This patch implements qdisc bulk dequeue, by allowing multiple packets to be dequeued in dequeue_skb(). The optimization principle for this is two fold, (1) to amortize locking cost and (2) avoid expensive tailptr update for notifying HW. (1) Several packets are dequeued while holding the qdisc root_lock, amortizing locking cost over several packet. The dequeued SKB list is processed under the TXQ lock in dev_hard_start_xmit(), thus also amortizing the cost of the TXQ lock. (2) Further more, dev_hard_start_xmit() will utilize the skb->xmit_more API to delay HW tailptr update, which also reduces the cost per packet. One restriction of the new API is that every SKB must belong to the same TXQ. This patch takes the easy way out, by restricting bulk dequeue to qdisc's with the TCQ_F_ONETXQUEUE flag, that specifies the qdisc only have attached a single TXQ. Some detail about the flow; dev_hard_start_xmit() will process the skb list, and transmit packets individually towards the driver (see xmit_one()). In case the driver stops midway in the list, the remaining skb list is returned by dev_hard_start_xmit(). In sch_direct_xmit() this returned list is requeued by dev_requeue_skb(). To avoid overshooting the HW limits, which results in requeuing, the patch limits the amount of bytes dequeued, based on the drivers BQL limits. In-effect bulking will only happen for BQL enabled drivers. Small amounts for extra HoL blocking (2x MTU/0.24ms) were measured at 100Mbit/s, with bulking 8 packets, but the oscillating nature of the measurement indicate something, like sched latency might be causing this effect. More comparisons show, that this oscillation goes away occationally. Thus, we disregard this artifact completely and remove any "magic" bulking limit. For now, as a conservative approach, stop bulking when seeing TSO and segmented GSO packets. They already benefit from bulking on their own. A followup patch add this, to allow easier bisect-ability for finding regressions. Jointed work with Hannes, Daniel and Florian. Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDaniel Borkmann <dborkman@redhat.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-