- 11 5月, 2018 9 次提交
-
-
由 David Ahern 提交于
Provide a helper for doing a FIB and neighbor lookup in the kernel tables from an XDP program. The helper provides a fastpath for forwarding packets. If the packet is a local delivery or for any reason is not a simple lookup and forward, the packet continues up the stack. If it is to be forwarded, the forwarding can be done directly if the neighbor is already known. If the neighbor does not exist, the first few packets go up the stack for neighbor resolution. Once resolved, the xdp program provides the fast path. On successful lookup the nexthop dmac, current device smac and egress device index are returned. The API supports IPv4, IPv6 and MPLS protocols, but only IPv4 and IPv6 are implemented in this patch. The API includes layer 4 parameters if the XDP program chooses to do deep packet inspection to allow compare against ACLs implemented as FIB rules. Header rewrite is left to the XDP program. The lookup takes 2 flags: - BPF_FIB_LOOKUP_DIRECT to do a lookup that bypasses FIB rules and goes straight to the table associated with the device (expert setting for those looking to maximize throughput) - BPF_FIB_LOOKUP_OUTPUT to do a lookup from the egress perspective. Default is an ingress lookup. Initial performance numbers collected by Jesper, forwarded packets/sec: Full stack XDP FIB lookup XDP Direct lookup IPv4 1,947,969 7,074,156 7,415,333 IPv6 1,728,000 6,165,504 7,262,720 These number are single CPU core forwarding on a Broadwell E5-1650 v4 @ 3.60GHz. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Acked-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 David Ahern 提交于
Add stubs to retrieve a handle to an IPv6 FIB table, fib6_get_table, a stub to do a lookup in a specific table, fib6_table_lookup, and a stub for a full route lookup. The stubs are needed for core bpf code to handle the case when the IPv6 module is not builtin. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 David Ahern 提交于
Similar to IPv4, IPv6 should use the FIB lookup result in the tracepoint. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 David Ahern 提交于
Add IPv6 equivalent to fib_lookup. Does a fib lookup, including rules, but returns a FIB entry, fib6_info, rather than a dst based rt6_info. fib6_lookup is any where from 140% (MULTIPLE_TABLES config disabled) to 60% faster than any of the dst based lookup methods (without custom rules) and 25% faster with custom rules (e.g., l3mdev rule). Since the lookup function has a completely different signature, fib6_rule_action is split into 2 paths: the existing one is renamed __fib6_rule_action and a new one for the fib6_info path is added. fib6_rule_action decides which to call based on the lookup_ptr. If it is fib6_table_lookup then the new path is taken. Caller must hold rcu lock as no reference is taken on the returned fib entry. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 David Ahern 提交于
Move source address lookup from fib6_rule_action to a helper. It will be used in a later patch by a second variant for fib6_rule_action. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 David Ahern 提交于
ip6_pol_route is used for ingress and egress FIB lookups. Refactor it moving the table lookup into a separate fib6_table_lookup that can be invoked separately and export the new function. ip6_pol_route now calls fib6_table_lookup and uses the result to generate a dst based rt6_info. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 David Ahern 提交于
Rename rt6_multipath_select to fib6_multipath_select and export it. A later patch wants access to it similar to IPv4's fib_select_path. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 David Ahern 提交于
Rename fib6_lookup to fib6_node_lookup to better reflect what it returns. The fib6_lookup name will be used in a later patch for an IPv6 equivalent to IPv4's fib_lookup. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Wang YanQing 提交于
For me, as a reader whose mother language isn't English, the old words bring a little difficulty to catch the meaning, this patch rewords the subsection in a more clarificatory way. This patch also add blank lines as separator at two places to improve readability. Signed-off-by: NWang YanQing <udknight@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 10 5月, 2018 6 次提交
-
-
由 Sirio Balmelli 提交于
Update .gitignore files. Signed-off-by: NSirio Balmelli <sirio@b-ad.ch> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Sirio Balmelli 提交于
The BPF selftests fail to build with missing headers 'asm/bitsperlong.h' and 'asm/errno.h'. These already exist in 'tools/arch/[arch]/include'; add architecture-agnostic header files in 'tools/include/uapi' to reference them. Signed-off-by: NSirio Balmelli <sirio@b-ad.ch> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Björn Töpel 提交于
i386 builds report: net/xdp/xdp_umem.o: In function `xdp_umem_reg': xdp_umem.c:(.text+0x47e): undefined reference to `__udivdi3' This fix uses div_u64 instead of the GCC built-in. Fixes: c0c77d8f ("xsk: add user memory registration support sockopt") Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com> Reported-by: NRandy Dunlap <rdunlap@infradead.org> Tested-by: NRandy Dunlap <rdunlap@infradead.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Daniel Borkmann 提交于
Jakub Kicinski says: ==================== This small series adds a feature which extends BPF offload beyond a pure host processing offload and firmly into the realm of heterogeneous processing. Allowing offloaded XDP programs to set the RX queue index opens the door for defining fully programmable RSS/n-tuple filter replacement. In fact the device datapath will skip the RSS processing completely if BPF decided on the queue already, making the XDP program replace part of the standard NIC datapath. We hope some day the entire NIC datapath will be defined by BPF :) ==================== Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Jakub Kicinski 提交于
BPF has access to all internal FW datapath structures. Including the structure containing RX queue selection. With little coordination with the datapath we can let the offloaded BPF select the RX queue. We just need a way to tell the datapath that queue selection has already been done and it shouldn't overwrite it. Define a bit to tell datapath BPF already selected a queue (QSEL_SET), if the selected queue is not enabled (>= number of enabled queues) datapath will perform normal RSS. BPF queue selection on the NIC can be used to replace standard datapath RSS with fully programmable BPF/XDP RSS. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Jakub Kicinski 提交于
It's fairly easy for offloaded XDP programs to select the RX queue packets go to. We need a way of expressing this in the software. Allow write to the rx_queue_index field of struct xdp_md for device-bound programs. Skip convert_ctx_access callback entirely for offloads. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 09 5月, 2018 9 次提交
-
-
由 Daniel Borkmann 提交于
Martin KaFai Lau says: ==================== This series introduces BTF ID which is exposed through the new BPF_BTF_GET_FD_BY_ID cmd, new "struct bpf_btf_info" and new members in the "struct bpf_map_info". Please see individual patch for details. ==================== Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Martin KaFai Lau 提交于
This patch adds test for BPF_BTF_GET_FD_BY_ID and the new btf_id/btf_key_id/btf_value_id in the "struct bpf_map_info". It also modifies the existing BPF_OBJ_GET_INFO_BY_FD test to reflect the new "struct bpf_btf_info". Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Acked-by: NAlexei Starovoitov <ast@fb.com> Acked-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Martin KaFai Lau 提交于
This patch sync the tools/include/uapi/linux/btf.h with the newly introduced BTF ID support. Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Acked-by: NAlexei Starovoitov <ast@fb.com> Acked-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Martin KaFai Lau 提交于
This patch adds a CHECK() macro for condition checking and error report purpose. Something similar to test_progs.c It also counts the number of tests passed/skipped/failed and print them at the end of the test run. Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Acked-by: NAlexei Starovoitov <ast@fb.com> Acked-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Martin KaFai Lau 提交于
During BPF_OBJ_GET_INFO_BY_FD on a btf_fd, the current bpf_attr's info.info is directly filled with the BTF binary data. It is not extensible. In this case, we want to add BTF ID. This patch adds "struct bpf_btf_info" which has the BTF ID as one of its member. The BTF binary data itself is exposed through the "btf" and "btf_size" members. Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Acked-by: NAlexei Starovoitov <ast@fb.com> Acked-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Martin KaFai Lau 提交于
This patch gives an ID to each loaded BTF. The ID is allocated by the idr like the existing prog-id and map-id. The bpf_put(map->btf) is moved to __bpf_map_put() so that the userspace can stop seeing the BTF ID ASAP when the last BTF refcnt is gone. It also makes BTF accessible from userspace through the 1. new BPF_BTF_GET_FD_BY_ID command. It is limited to CAP_SYS_ADMIN which is inline with the BPF_BTF_LOAD cmd and the existing BPF_[MAP|PROG]_GET_FD_BY_ID cmd. 2. new btf_id (and btf_key_id + btf_value_id) in "struct bpf_map_info" Once the BTF ID handler is accessible from userspace, freeing a BTF object has to go through a rcu period. The BPF_BTF_GET_FD_BY_ID cmd can then be done under a rcu_read_lock() instead of taking spin_lock. [Note: A similar rcu usage can be done to the existing bpf_prog_get_fd_by_id() in a follow up patch] When processing the BPF_BTF_GET_FD_BY_ID cmd, refcount_inc_not_zero() is needed because the BTF object could be already in the rcu dead row . btf_get() is removed since its usage is currently limited to btf.c alone. refcount_inc() is used directly instead. Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Acked-by: NAlexei Starovoitov <ast@fb.com> Acked-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Martin KaFai Lau 提交于
If CONFIG_REFCOUNT_FULL=y, refcount_inc() WARN when refcount is 0. When creating a new btf, the initial btf->refcnt is 0 and triggered the following: [ 34.855452] refcount_t: increment on 0; use-after-free. [ 34.856252] WARNING: CPU: 6 PID: 1857 at lib/refcount.c:153 refcount_inc+0x26/0x30 .... [ 34.868809] Call Trace: [ 34.869168] btf_new_fd+0x1af6/0x24d0 [ 34.869645] ? btf_type_seq_show+0x200/0x200 [ 34.870212] ? lock_acquire+0x3b0/0x3b0 [ 34.870726] ? security_capable+0x54/0x90 [ 34.871247] __x64_sys_bpf+0x1b2/0x310 [ 34.871761] ? __ia32_sys_bpf+0x310/0x310 [ 34.872285] ? bad_area_access_error+0x310/0x310 [ 34.872894] do_syscall_64+0x95/0x3f0 This patch uses refcount_set() instead. Reported-by: NYonghong Song <yhs@fb.com> Tested-by: NYonghong Song <yhs@fb.com> Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Acked-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Fabio Estevam 提交于
If the example binding is used on a real dts file, the following DTC warning is seen with W=1: arch/arm/boot/dts/imx6q-b450v3.dtb: Warning (avoid_unnecessary_addr_size): /mdio-gpio/switch@0: unnecessary #address-cells/#size-cells without "ranges" or child "reg" property Remove unnecessary #address-cells/#size-cells to improve the binding document examples. Signed-off-by: NFabio Estevam <fabio.estevam@nxp.com> Reviewed-by: NRob Herring <robh@kernel.org> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Antoine Tenart 提交于
When computing the bitrate using values read from an SFP module EEPROM, we use the nominal BR plus BR,min and BR,max to determine the boundaries. But in some cases BR,min and BR,max aren't provided, which led the SFP code to end up having the nominal value for both the minimum and maximum bitrate values. When using a passive cable, the nominal value should be used as the maximum one, and there is no minimum one so we should use 0. Signed-off-by: NAntoine Tenart <antoine.tenart@bootlin.com> Acked-by: NRussell King <rmk+kernel@armlinux.org.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 5月, 2018 16 次提交
-
-
由 David S. Miller 提交于
Michael Chan says: ==================== bnxt_en: Fixes for net-next. This series includes a bug fix for a regression in firmware message polling introduced recently on net-next. There are 3 additional minor fixes for unsupported link speed checking, VF MAC address handling, and setting PHY eeprom length. ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michael Chan 提交于
The current code already forwards the VF MAC address to the PF, except in one case. If the VF driver gets a valid MAC address from the firmware during probe time, it will not forward the MAC address to the PF, incorrectly assuming that the PF already knows the MAC address. This causes "ip link show" to show zero VF MAC addresses for this case. This assumption is not correct. Newer firmware remembers the VF MAC address last used by the VF and provides it to the VF driver during probe. So we need to always forward the VF MAC address to the PF. The forwarded MAC address may now be the PF assigned MAC address and so we need to make sure we approve it for this case. Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vasundhara Volam 提交于
For SFP+ modules, 0xA2 page is available only when Diagnostic Monitoring Type [Address A0h, Byte 92] is implemented. Extend bnxt_get_module_info(), to read optical diagnostics support at offset 92(0x5c) and set eeprom_len length to ETH_MODULE_SFF_8436_LEN (to exclude A2 page), if dianostics is not supported. Also in bnxt_get_module_info(), module id is read from offset 0x5e which is not correct. It was working by accident, as offset was not effective without setting enables flag in the firmware request. SFP module id is present at location 0. Fix this by removing the offset and read it from location 0. Signed-off-by: NVasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michael Chan 提交于
Only non-NPAR PFs need to actively check and manage unsupported link speeds. NPAR functions and VFs do not control the link speed and should skip the unsupported speed detection logic, to avoid warning messages from firmware rejecting the unsupported firmware calls. Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michael Chan 提交于
A recent change to reduce delay granularity waiting for firmware reponse has caused a regression. With a tighter delay loop, the driver may see the beginning part of the response faster. The original 5 usec delay to wait for the rest of the message is not long enough and some messages are detected as invalid. Increase the maximum wait time from 5 usec to 20 usec. Also, fix the debug message that shows the total delay time for the response when the message times out. With the new logic, the delay time is not fixed per iteration of the loop, so we define a macro to show the total delay time. Fixes: 9751e8e7 ("bnxt_en: reduce timeout on initial HWRM calls") Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Zhao Chen 提交于
This patch adds PCI device IDs to support 25GE and 100GE card: 1. Add device id 0x0201 for HINIC 100GE dual port card. 2. Add device id 0x0200 for HINIC 25GE dual port card. 3. Macro of device id 0x1822 is modified for HINIC 25GE quad port card. Signed-off-by: NZhao Chen <zhaochen6@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
This change fixes a couple of type mismatch reported by the sparse tool, explicitly using the requested type for the offending arguments. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Acked-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
When the core networking needs to detect the transport offset in a given packet and parse it explicitly, a full-blown flow_keys struct is used for storage. This patch introduces a smaller keys store, rework the basic flow dissect helper to use it, and apply this new helper where possible - namely in skb_probe_transport_header(). The used flow dissector data structures are renamed to match more closely the new role. The above gives ~50% performance improvement in micro benchmarking around skb_probe_transport_header() and ~30% around eth_get_headlen(), mostly due to the smaller memset. Small, but measurable improvement is measured also in macro benchmarking. v1 -> v2: use the new helper in eth_get_headlen() and skb_get_poff(), as per DaveM suggestion Suggested-by: NDavid Miller <davem@davemloft.net> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next由 David S. Miller 提交于
Minor conflict in ip_output.c, overlapping changes to the body of an if() statement. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Tariq Toukan says: ==================== net/ipv6 misc This patchset contains two patches for net/ipv6. Patch 1 is a trivial typo fix in documentation. Patch 2 by Eran is a re-spin. It adds GRO support for IPv6 GRE tunnel, this significantly improves performance in case GRO in native interface is disabled. ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eran Ben Elisha 提交于
Add GRO capability for IPv6 GRE tunnel and ip6erspan tap, via gro_cells infrastructure. Performance testing: 55% higher badwidth. Measuring bandwidth of 1 thread IPv4 TCP traffic over IPv6 GRE tunnel while GRO on the physical interface is disabled. CPU: Intel Xeon E312xx (Sandy Bridge) NIC: Mellanox Technologies MT27700 Family [ConnectX-4] Before (GRO not working in tunnel) : 2.47 Gbits/sec After (GRO working in tunnel) : 3.85 Gbits/sec Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com> Signed-off-by: NTariq Toukan <tariqt@mellanox.com> CC: Eric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tariq Toukan 提交于
Fix 'an' into 'and', and use a comma instead of a period. Signed-off-by: NTariq Toukan <tariqt@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Sudarsana Reddy Kalluru says: ==================== qed*: Add support for new multi partitioning modes. The patch series simplifies the multi function (MF) mode implementation of qed/qede drivers, and adds support for new MF modes. Please consider applying it to net-next branch. ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sudarsana Reddy Kalluru 提交于
This patch adds driver changes for supporting the Unified Fabric Port (UFP). This is a new paritioning mode wherein MFW provides the set of parameters to be used by the device such as traffic class, outer-vlan tag value, priority type etc. Drivers receives this info via notifications from mfw and configures the hardware accordingly. Signed-off-by: NSudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: NAriel Elior <ariel.elior@cavium.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sudarsana Reddy Kalluru 提交于
The patch adds support for new Multi function mode wherein the traffic classification is done based on the 802.1ad tagging and the outer vlan tag provided by the management firmware. Signed-off-by: NSudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: NAriel Elior <ariel.elior@cavium.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sudarsana Reddy Kalluru 提交于
The data member 'is_mf_default' is not used by the qed/qede drivers, removing the same. Signed-off-by: NSudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: NAriel Elior <ariel.elior@cavium.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-