- 09 1月, 2018 13 次提交
-
-
由 Pablo Neira Ayuso 提交于
We cannot make a direct call to nf_ip6_reroute() because that would result in autoloading the 'ipv6' module because of symbol dependencies. Therefore, define reroute indirection in nf_ipv6_ops where this really belongs to. For IPv4, we can indeed make a direct function call, which is faster, given IPv4 is built-in in the networking code by default. Still, CONFIG_INET=n and CONFIG_NETFILTER=y is possible, so define empty inline stub for IPv4 in such case. Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org> -
由 Pablo Neira Ayuso 提交于
We cannot make a direct call to nf_ip6_route() because that would result in autoloading the 'ipv6' module because of symbol dependencies. Therefore, define route indirection in nf_ipv6_ops where this really belongs to. For IPv4, we can indeed make a direct function call, which is faster, given IPv4 is built-in in the networking code by default. Still, CONFIG_INET=n and CONFIG_NETFILTER=y is possible, so define empty inline stub for IPv4 in such case. Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org> -
由 Pablo Neira Ayuso 提交于
This is only used by nf_queue.c and this function comes with no symbol dependencies with IPv6, it just refers to structure layouts. Therefore, we can replace it by a direct function call from where it belongs. Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org> -
由 Pablo Neira Ayuso 提交于
We cannot make a direct call to nf_ip6_checksum_partial() because that would result in autoloading the 'ipv6' module because of symbol dependencies. Therefore, define checksum_partial indirection in nf_ipv6_ops where this really belongs to. For IPv4, we can indeed make a direct function call, which is faster, given IPv4 is built-in in the networking code by default. Still, CONFIG_INET=n and CONFIG_NETFILTER=y is possible, so define empty inline stub for IPv4 in such case. Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org> -
由 Pablo Neira Ayuso 提交于
We cannot make a direct call to nf_ip6_checksum() because that would result in autoloading the 'ipv6' module because of symbol dependencies. Therefore, define checksum indirection in nf_ipv6_ops where this really belongs to. For IPv4, we can indeed make a direct function call, which is faster, given IPv4 is built-in in the networking code by default. Still, CONFIG_INET=n and CONFIG_NETFILTER=y is possible, so define empty inline stub for IPv4 in such case. Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org> -
由 Florian Westphal 提交于
The netfilter NAT core cannot deal with more than one NAT hook per hook location (prerouting, input ...), because the NAT hooks install a NAT null binding in case the iptables nat table (iptable_nat hooks) or the corresponding nftables chain (nft nat hooks) doesn't specify a nat transformation. Null bindings are needed to detect port collsisions between NAT-ed and non-NAT-ed connections. This causes nftables NAT rules to not work when iptable_nat module is loaded, and vice versa because nat binding has already been attached when the second nat hook is consulted. The netfilter core is not really the correct location to handle this (hooks are just hooks, the core has no notion of what kinds of side effects a hook implements), but its the only place where we can check for conflicts between both iptables hooks and nftables hooks without adding dependencies. So add nat annotation to hook_ops to describe those hooks that will add NAT bindings and then make core reject if such a hook already exists. The annotation fills a padding hole, in case further restrictions appar we might change this to a 'u8 type' instead of bool. iptables error if nft nat hook active: iptables -t nat -A POSTROUTING -j MASQUERADE iptables v1.4.21: can't initialize iptables table `nat': File exists Perhaps iptables or your kernel needs to be upgraded. nftables error if iptables nat table present: nft -f /etc/nftables/ipv4-nat /usr/etc/nftables/ipv4-nat:3:1-2: Error: Could not process rule: File exists table nat { ^^ Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org> -
由 Florian Westphal 提交于
currently we always return -ENOENT to userspace if we can't find a particular table, or if the table initialization fails. Followup patch will make nat table init fail in case nftables already registered a nat hook so this change makes xt_find_table_lock return an ERR_PTR to return the errno value reported from the table init function. Add xt_request_find_table_lock as try_then_request_module replacement and use it where needed. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
This can be same as NF_INET_NUMHOOKS if we don't support DECNET. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
no need to define hook points if the family isn't supported. Because we need these hooks for either nftables, arp/ebtables or the 'call-iptables' hack we have in the bridge layer add two new dependencies, NETFILTER_FAMILY_{ARP,BRIDGE}, and have the users select them. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org> -
由 Florian Westphal 提交于
no need to define hook points if the family isn't supported. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
The kernel already has defines for this, but they are in uapi exposed headers. Including these from netns.h causes build errors and also adds unneeded dependencies on heads that we don't need. So move these defines to netfilter_defs.h and place the uapi ones in ifndef __KERNEL__ to keep them for userspace. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
struct net contains: struct nf_hook_entries __rcu *hooks[NFPROTO_NUMPROTO][NF_MAX_HOOKS]; which store the hook entry point locations for the various protocol families and the hooks. Using array results in compact c code when doing accesses, i.e. x = rcu_dereference(net->nf.hooks[pf][hook]); but its also wasting a lot of memory, as most families are not used. So split the array into those families that are used, which are only 5 (instead of 13). In most cases, the 'pf' argument is constant, i.e. gcc removes switch statement. struct net before: /* size: 5184, cachelines: 81, members: 46 */ after: /* size: 4672, cachelines: 73, members: 46 */ Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
Giuseppe Scrivano says: "SELinux, if enabled, registers for each new network namespace 6 netfilter hooks." Cost for this is high. With synchronize_net() removed: "The net benefit on an SMP machine with two cores is that creating a new network namespace takes -40% of the original time." This patch replaces synchronize_net+kvfree with call_rcu(). We store rcu_head at the tail of a structure that has no fixed layout, i.e. we cannot use offsetof() to compute the start of the original allocation. Thus store this information right after the rcu head. We could simplify this by just placing the rcu_head at the start of struct nf_hook_entries. However, this structure is used in packet processing hotpath, so only place what is needed for that at the beginning of the struct. Reported-by: NGiuseppe Scrivano <gscrivan@redhat.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 06 1月, 2018 2 次提交
-
-
由 Jesper Dangaard Brouer 提交于
Hook points for xdp_rxq_info: * reg : netif_alloc_rx_queues * unreg: netif_free_rx_queues The net_device have some members (num_rx_queues + real_num_rx_queues) and data-area (dev->_rx with struct netdev_rx_queue's) that were primarily used for exporting information about RPS (CONFIG_RPS) queues to sysfs (CONFIG_SYSFS). For generic XDP extend struct netdev_rx_queue with the xdp_rxq_info, and remove some of the CONFIG_SYSFS ifdefs. Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Jesper Dangaard Brouer 提交于
This patch only introduce the core data structures and API functions. All XDP enabled drivers must use the API before this info can used. There is a need for XDP to know more about the RX-queue a given XDP frames have arrived on. For both the XDP bpf-prog and kernel side. Instead of extending xdp_buff each time new info is needed, the patch creates a separate read-mostly struct xdp_rxq_info, that contains this info. We stress this data/cache-line is for read-only info. This is NOT for dynamic per packet info, use the data_meta for such use-cases. The performance advantage is this info can be setup at RX-ring init time, instead of updating N-members in xdp_buff. A possible (driver level) micro optimization is that xdp_buff->rxq assignment could be done once per XDP/NAPI loop. The extra pointer deref only happens for program needing access to this info (thus, no slowdown to existing use-cases). Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 05 1月, 2018 2 次提交
-
-
由 Egil Hjelmeland 提交于
chip->phy_addr_sel_strap is declared as a bool, but is also used as an integer address base. Rename 'phy_addr_sel_strap' to 'phy_addr_base', and change type to int. Signed-off-by: NEgil Hjelmeland <privat@egil-hjelmeland.no> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Fastabend 提交于
The sockmap infrastructure is only aware of TCP sockets at the moment. In the future we plan to add UDP. In both cases CONFIG_NET should be built-in. So lets only build sockmap if CONFIG_INET is enabled. Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 04 1月, 2018 4 次提交
-
-
由 Russell King 提交于
Add phy_modify() convenience accessor to complement the mdiobus counterpart. Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk> Reviewed-by: NAndrew Lunn <andrew@lunn.ch> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Russell King 提交于
Add a set of paged phy register accessors which are inherently safe in their design against other accesses interfering with the paged access. Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk> Reviewed-by: NAndrew Lunn <andrew@lunn.ch> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Russell King 提交于
Add unlocked versions of the bus accessors, which allows access to the bus with all the tracing. These accessors validate that the bus mutex is held, which is a basic requirement for all mii bus accesses. Also added is a read-modify-write unlocked accessor with the same locking requirements. Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk> Reviewed-by: NAndrew Lunn <andrew@lunn.ch> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Russell King 提交于
Add unlocked versions of the bus accessors, which allows access to the bus with all the tracing. These accessors validate that the bus mutex is held, which is a basic requirement for all mii bus accesses. Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 1月, 2018 8 次提交
-
-
由 Russell King 提交于
mvneta is the only user of fixed_phy_update_state(), which has been converted to use phylink instead. Remove fixed_phy_update_state(). Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Russell King 提交于
Improve the support for direct-attach copper so that we avoid kernel warning messages, and report the appropriate PORT_DA type to userspace. Direct Attach cables can use a number of protocols depending on their range of speeds. Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Russell King 提交于
Add a helper to convert the result of the autonegotiation advertisment into the PHYs speed and duplex settings. If the result is full duplex, also extract the pause mode settings from the link partner advertisment. Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk> Reviewed-by: NAndrew Lunn <andrew@lunn.ch> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Russell King 提交于
Add reporting of the MDI swap to the Marvell 10G PHY driver by providing a generic implementation for the standard 10GBASE-T pair swap register and polarity register. We also support reading the MDI swap status for 1G and below from a PCS register. Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk> Reviewed-by: NAndrew Lunn <andrew@lunn.ch> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tomer Tayar 提交于
Advance the qed* drivers to use firmware 8.33.1.0: Modify core driver (qed) to utilize the new FW and initialize the device with it. This is the lion's share of the patch, and includes changes to FW interface files, device initialization flows, FW interaction flows, and debug collection flows. Modify Ethernet driver (qede) to make use of new FW in fastpath. Modify RoCE/iWARP driver (qedr) to make use of new FW in fastpath. Modify FCoE driver (qedf) to make use of new FW in fastpath. Modify iSCSI driver (qedi) to make use of new FW in fastpath. Signed-off-by: NAriel Elior <Ariel.Elior@cavium.com> Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: NYuval Bason <Yuval.Bason@cavium.com> Signed-off-by: NRam Amrani <Ram.Amrani@cavium.com> Signed-off-by: NManish Chopra <Manish.Chopra@cavium.com> Signed-off-by: NChad Dupuis <Chad.Dupuis@cavium.com> Signed-off-by: NManish Rangankar <Manish.Rangankar@cavium.com> Signed-off-by: NTomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tomer Tayar 提交于
This patch renames defines and structures in the FW HSI files to allow a distinction between different types of HW. Signed-off-by: NAriel Elior <Ariel.Elior@cavium.com> Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: NChad Dupuis <Chad.Dupuis@cavium.com> Signed-off-by: NManish Rangankar <Manish.Rangankar@cavium.com> Signed-off-by: NTomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tomer Tayar 提交于
This patch refactors and reorders the FW API files in preparation of upgrading the code to support new FW. - Make use of the BIT macro in appropriate places. - Whitespace changes to align values and code blocks. - Comments are updated (spelling mistakes, removed if not clear). - Group together code blocks which are related or deal with similar matters. Signed-off-by: NAriel Elior <Ariel.Elior@cavium.com> Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: NTomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Fastabend 提交于
When running consumer and/or producer operations and empty checks in parallel its possible to have the empty check run past the end of the array. The scenario occurs when an empty check is run while __ptr_ring_discard_one() is in progress. Specifically after the consumer_head is incremented but before (consumer_head >= ring_size) check is made and the consumer head is zeroe'd. To resolve this, without having to rework how consumer/producer ops work on the array, simply add an extra dummy slot to the end of the array. Even if we did a rework to avoid the extra slot it looks like the normal case checks would suffer some so best to just allocate an extra pointer. Reported-by: NJakub Kicinski <jakub.kicinski@netronome.com> Fixes: c5ad119f ("net: sched: pfifo_fast use skb_array") Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 31 12月, 2017 5 次提交
-
-
由 Jakub Kicinski 提交于
Report to the user ifindex and namespace information of offloaded programs. If device has disappeared return -ENODEV. Specify the namespace using dev/inode combination. CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Jakub Kicinski 提交于
ns_get_path() takes struct task_struct and proc_ns_ops as its parameters. For path resolution directly from a namespace, e.g. based on a networking device's net name space, we need more flexibility. Add a ns_get_path_cb() helper which will allow callers to use any method of obtaining the name space reference. Convert ns_get_path() to use ns_get_path_cb(). Following patches will bring a networking user. CC: Eric W. Biederman <ebiederm@xmission.com> Suggested-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Jakub Kicinski 提交于
Bound programs are quite useless after their device disappears. They are simply waiting for reference count to go to zero, don't list them in BPF_PROG_GET_NEXT_ID by freeing their ID early. Note that orphaned offload programs will return -ENODEV on BPF_OBJ_GET_INFO_BY_FD so user will never see ID 0. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Jakub Kicinski 提交于
To allow verifier instruction callbacks without any extra locking NETDEV_UNREGISTER notification would wait on a waitqueue for verifier to finish. This design decision was made when rtnl lock was providing all the locking. Use the read/write lock instead and remove the workqueue. Verifier will now call into the offload code, so dev_ops are moved to offload structure. Since verifier calls are all under bpf_prog_is_dev_bound() we no longer need static inline implementations to please builds with CONFIG_NET=n. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Jakub Kicinski 提交于
We currently use aux->offload to indicate that program is bound to a specific device. This forces us to keep the offload structure around even after the device is gone. Add a bool member to struct bpf_prog_aux to indicate if offload was requested. Suggested-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 29 12月, 2017 1 次提交
-
-
由 Gal Pressman 提交于
Each vport has its own root flow table for the ACL flow tables and root flow table is per namespace, therefore we should create a namespace for each vport. Fixes: efdc810b ("net/mlx5: Flow steering, Add vport ACL support") Signed-off-by: NGal Pressman <galp@mellanox.com> Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
-
- 28 12月, 2017 2 次提交
-
-
由 Alexei Starovoitov 提交于
Instead of computing max stack depth for current call chain during the main verifier pass track stack depth of each function independently and after do_check() is done do another pass over all instructions analyzing depth of all possible call stacks. Fixes: f4d7e40a ("bpf: introduce function calls (verification)") Reported-by: NJann Horn <jannh@google.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Richard Leitner 提交于
As suggested by Rob Herring [1] rename the previously introduced reset-{,post-}delay-us bindings to the clearer reset-{,de}assert-us [1] https://patchwork.kernel.org/patch/10104905/Signed-off-by: NRichard Leitner <richard.leitner@skidata.com> Reviewed-by: NRob Herring <robh@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 12月, 2017 1 次提交
-
-
由 Leon Romanovsky 提交于
ASSERT_RTNL() macro is actual open-coded variant of WARN_ONCE() with two exceptions. First, it prints stack for multiple hits and not only once as WARN_ONCE() does. Second, the user can disable prints of WARN_ONCE by setting CONFIG_BUG to N. The multiple prints of dump stack are actually not needed, because calls without rtnl lock are programming errors and user can't do anything about them except to complain to the mailing list after first occurrence of such failure. The user who disabled BUG/WARN prints did it explicitly because by default in upstream kernel and distributions this option is enabled. It means that user doesn't want to see prints about missing locks too. This patch replaces open-coded variant in favor of already existing macro and change error prints to be once only. Reviewed-by: NMark Bloch <markb@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 12月, 2017 2 次提交
-
-
由 Majd Dibbiny 提交于
Congestion counters are counted and queried per physical function. When working in LAG mode, CNP packets can be sent or received on both of the functions, thus congestion counters should be aggregated from the two physical functions. Fixes: e1f24a79 ("IB/mlx5: Support congestion related counters") Signed-off-by: NMajd Dibbiny <majd@mellanox.com> Reviewed-by: NAviv Heller <avivh@mellanox.com> Signed-off-by: NLeon Romanovsky <leon@kernel.org> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Shaohua Li 提交于
sysctl.ip6.auto_flowlabels is default 1. In our hosts, we set it to 2. If sockopt doesn't set autoflowlabel, outcome packets from the hosts are supposed to not include flowlabel. This is true for normal packet, but not for reset packet. The reason is ipv6_pinfo.autoflowlabel is set in sock creation. Later if we change sysctl.ip6.auto_flowlabels, the ipv6_pinfo.autoflowlabel isn't changed, so the sock will keep the old behavior in terms of auto flowlabel. Reset packet is suffering from this problem, because reset packet is sent from a special control socket, which is created at boot time. Since sysctl.ipv6.auto_flowlabels is 1 by default, the control socket will always have its ipv6_pinfo.autoflowlabel set, even after user set sysctl.ipv6.auto_flowlabels to 1, so reset packset will always have flowlabel. Normal sock created before sysctl setting suffers from the same issue. We can't even turn off autoflowlabel unless we kill all socks in the hosts. To fix this, if IPV6_AUTOFLOWLABEL sockopt is used, we use the autoflowlabel setting from user, otherwise we always call ip6_default_np_autolabel() which has the new settings of sysctl. Note, this changes behavior a little bit. Before commit 42240901 (ipv6: Implement different admin modes for automatic flow labels), the autoflowlabel behavior of a sock isn't sticky, eg, if sysctl changes, existing connection will change autoflowlabel behavior. After that commit, autoflowlabel behavior is sticky in the whole life of the sock. With this patch, the behavior isn't sticky again. Cc: Martin KaFai Lau <kafai@fb.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Tom Herbert <tom@quantonium.net> Signed-off-by: NShaohua Li <shli@fb.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-