- 10 12月, 2016 1 次提交
-
-
由 Eric Dumazet 提交于
In UDP RX handler, we currently clear skb->dev before skb is added to receive queue, because device pointer is no longer available once we exit from RCU section. Since this first cache line is always hot, lets reuse this space to store skb->truesize and thus avoid a cache line miss at udp_recvmsg()/udp_skb_destructor time while receive queue spinlock is held. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 12月, 2016 7 次提交
-
-
由 Martin KaFai Lau 提交于
This patch allows XDP prog to extend/remove the packet data at the head (like adding or removing header). It is done by adding a new XDP helper bpf_xdp_adjust_head(). It also renames bpf_helper_changes_skb_data() to bpf_helper_changes_pkt_data() to better reflect that XDP prog does not work on skb. This patch adds one "xdp_adjust_head" bit to bpf_prog for the XDP-capable driver to check if the XDP prog requires bpf_xdp_adjust_head() support. The driver can then decide to error out during XDP_SETUP_PROG. Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NJohn Fastabend <john.r.fastabend@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
>From : Woojung Huh <woojung.huh@microchip.com> Add functions to unregister phy fixup for modules. int phy_unregister_fixup(const char *bus_id, u32 phy_uid, u32 phy_uid_mask) Unregister phy fixup from phy_fixup_list per bus_id, phy_uid & phy_uid_mask int phy_unregister_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask) Unregister phy fixup from phy_fixup_list. Use it for fixup registered by phy_register_fixup_for_uid() int phy_unregister_fixup_for_id(const char *bus_id) Unregister phy fixup from phy_fixup_list. Use it for fixup registered by phy_register_fixup_for_id() Signed-off-by: NWoojung Huh <woojung.huh@microchip.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexei Starovoitov 提交于
Commmits 57a09bf0 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers") and 48461135 ("bpf: allow access into map value arrays") by themselves are correct, but in combination they make state equivalence ignore 'id' field of the register state which can lead to accepting invalid program. Fixes: 57a09bf0 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers") Fixes: 48461135 ("bpf: allow access into map value arrays") Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NThomas Graf <tgraf@suug.ch> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Under UDP flood, many softirq producers try to add packets to UDP receive queue, and one user thread is burning one cpu trying to dequeue packets as fast as possible. Two parts of the per packet cost are : - copying payload from kernel space to user space, - freeing memory pieces associated with skb. If socket is under pressure, softirq handler(s) can try to pull in skb->head the payload of the packet if it fits. Meaning the softirq handler(s) can free/reuse the page fragment immediately, instead of letting udp_recvmsg() do this hundreds of usec later, possibly from another node. Additional gains : - We reduce skb->truesize and thus can store more packets per SO_RCVBUF - We avoid cache line misses at copyout() time and consume_skb() time, and avoid one put_page() with potential alien freeing on NUMA hosts. This comes at the cost of a copy, bounded to available tail room, which is usually small. (We might have to fix GRO_MAX_HEAD which looks bigger than necessary) This patch gave me about 5 % increase in throughput in my tests. skb_condense() helper could probably used in other contexts. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
RFS is not commonly used, so add a jump label to avoid some conditionals in fast path. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Niklas Cassel 提交于
The driver currently always sets the PBLx8/PBLx4 bit, which means that the pbl values configured via the pbl/txpbl/rxpbl DT properties are always multiplied by 8/4 in the hardware. In order to allow the DT to configure lower pbl values, while at the same time not changing behavior of any existing device trees using the pbl/txpbl/rxpbl settings, add a property to disable the multiplication of the pbl by 8/4 in the hardware. Suggested-by: NRabin Vincent <rabinv@axis.com> Signed-off-by: NNiklas Cassel <niklas.cassel@axis.com> Acked-by: NAlexandre Torgue <alexandre.torgue@st.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Niklas Cassel 提交于
GMAC and newer supports independent programmable burst lengths for DMA tx/rx. Add new optional devicetree properties representing this. To be backwards compatible, snps,pbl will still be valid, but snps,txpbl/snps,rxpbl will override the value in snps,pbl if set. If the IP is synthesized to use the AXI interface, there is a register and a matching DT property inside the optional stmmac-axi-config DT node for controlling burst lengths, named snps,blen. However, using this register, it is not possible to control tx and rx independently. Also, this register is not available if the IP was synthesized with, e.g., the AHB interface. Signed-off-by: NNiklas Cassel <niklas.cassel@axis.com> Acked-by: NAlexandre Torgue <alexandre.torgue@st.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 12月, 2016 6 次提交
-
-
由 Florian Westphal 提交于
The caller assumes that < 0 means that skb was stolen (or free'd). All other return values continue skb processing. nf_hook_slow returns 3 different return value types: A) a (negative) errno value: the skb was dropped (NF_DROP, e.g. by iptables '-j DROP' rule). B) 0. The skb was stolen by the hook or queued to userspace. C) 1. all hooks returned NF_ACCEPT so the caller should invoke the okfn so packet processing can continue. nft ingress facility currently doesn't have the 'okfn' that the NF_HOOK() macros use; there is no nfqueue support either. So 1 means that nf_hook_ingress() caller should go on processing the skb. In order to allow use of NF_STOLEN from ingress we need to translate this to an errno number, else we'd crash because we continue with already-free'd (or about to be free-d) skb. The errno value isn't checked, its just important that its less than 0, so return -1. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
instead of allocating each xt_counter individually, allocate 4k chunks and then use these for counter allocation requests. This should speed up rule evaluation by increasing data locality, also speeds up ruleset loading because we reduce calls to the percpu allocator. As Eric points out we can't use PAGE_SIZE, page_allocator would fail on arches with 64k page size. Suggested-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
Keeps some noise away from a followup patch. Signed-off-by: NFlorian Westphal <fw@strlen.de> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
On SMP we overload the packet counter (unsigned long) to contain percpu offset. Hide this from callers and pass xt_counters address instead. Preparation patch to allocate the percpu counters in page-sized batch chunks. Signed-off-by: NFlorian Westphal <fw@strlen.de> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Aaron Conole 提交于
During nfhook traversal we only need a very small subset of nf_hook_ops members. We need: - next element - hook function to call - hook function priv argument Bridge netfilter also needs 'thresh'; can be obtained via ->orig_ops. nf_hook_entry struct is now 32 bytes on x86_64. A followup patch will turn the run-time list into an array that only stores hook functions plus their priv arguments, eliminating the ->next element. Suggested-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NAaron Conole <aconole@bytheb.org> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Aaron Conole 提交于
This allows easier future refactoring. Signed-off-by: NAaron Conole <aconole@bytheb.org> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 06 12月, 2016 3 次提交
-
-
由 Daniel Borkmann 提交于
When loading a BPF program via bpf(2), calculate the digest over the program's instruction stream and store it in struct bpf_prog's digest member. This is done at a point in time before any instructions are rewritten by the verifier. Any unstable map file descriptor number part of the imm field will be zeroed for the hash. fdinfo example output for progs: # cat /proc/1590/fdinfo/5 pos: 0 flags: 02000002 mnt_id: 11 prog_type: 1 prog_jited: 1 prog_digest: b27e8b06da22707513aa97363dfb11c7c3675d28 memlock: 4096 When programs are pinned and retrieved by an ELF loader, the loader can check the program's digest through fdinfo and compare it against one that was generated over the ELF file's program section to see if the program needs to be reloaded. Furthermore, this can also be exposed through other means such as netlink in case of a tc cls/act dump (or xdp in future), but also through tracepoints or other facilities to identify the program. Other than that, the digest can also serve as a base name for the work in progress kallsyms support of programs. The digest doesn't depend/select the crypto layer, since we need to keep dependencies to a minimum. iproute2 will get support for this facility. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
tsq_flags being in the same cache line than sk_wmem_alloc makes a lot of sense. Both fields are changed from tcp_wfree() and more generally by various TSQ related functions. Prior patch made room in struct sock and added sk_tsq_flags, this patch deletes tsq_flags from struct tcp_sock. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
This is a cleanup, to ease code review of following patches. Old 'enum tsq_flags' is renamed, and a new enumeration is added with the flags used in cmpxchg() operations as opposed to single bit operations. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 12月, 2016 1 次提交
-
-
由 Davide Caratti 提交于
CONFIG_NF_CT_PROTO_DCCP is no more a tristate. When set to y, connection tracking support for DCCP protocol is built-in into nf_conntrack.ko. footprint test: $ ls -l net/netfilter/nf_conntrack{_proto_dccp,}.ko \ net/ipv4/netfilter/nf_conntrack_ipv4.ko \ net/ipv6/netfilter/nf_conntrack_ipv6.ko (builtin)|| dccp | ipv4 | ipv6 | nf_conntrack ---------++--------+--------+--------+-------------- none || 469140 | 828755 | 828676 | 6141434 DCCP || - | 830566 | 829935 | 6533526 Signed-off-by: NDavide Caratti <dcaratti@redhat.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 04 12月, 2016 1 次提交
-
-
由 Erik Nordmark 提交于
Implemented RFC7527 Enhanced DAD. IPv6 duplicate address detection can fail if there is some temporary loopback of Ethernet frames. RFC7527 solves this by including a random nonce in the NS messages used for DAD, and if an NS is received with the same nonce it is assumed to be a looped back DAD probe and is ignored. RFC7527 is enabled by default. Can be disabled by setting both of conf/{all,interface}/enhanced_dad to zero. Signed-off-by: NErik Nordmark <nordmark@arista.com> Signed-off-by: NBob Gilligan <gilligan@arista.com> Reviewed-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 12月, 2016 6 次提交
-
-
由 David Ahern 提交于
Add new cgroup based program type, BPF_PROG_TYPE_CGROUP_SOCK. Similar to BPF_PROG_TYPE_CGROUP_SKB programs can be attached to a cgroup and run any time a process in the cgroup opens an AF_INET or AF_INET6 socket. Currently only sk_bound_dev_if is exported to userspace for modification by a bpf program. This allows a cgroup to be configured such that AF_INET{6} sockets opened by processes are automatically bound to a specific device. In turn, this enables the running of programs that do not support SO_BINDTODEVICE in a specific VRF context / L3 domain. Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Code move and rename only; no functional change intended. Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Hadar Hen Zion 提交于
In order to support hardware offloading when the device given by the tc rule is different from the Hardware underline device, extract the mirred (egress) device from the tc action when a filter is added, using the new tc_action_ops, get_dev(). Flower caches the information about the mirred device and use it for calling ndo_setup_tc in filter change, update stats and delete. Calling ndo_setup_tc of the mirred (egress) device instead of the ingress device will allow a resolution between the software ingress device and the underline hardware device. The resolution will take place inside the offloading driver using 'egress_device' flag added to tc_to_netdev struct which is provided to the offloading driver. Signed-off-by: NHadar Hen Zion <hadarh@mellanox.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
jiffies based timestamps allow for easy inference of number of devices behind NAT translators and also makes tracking of hosts simpler. commit ceaa1fef ("tcp: adding a per-socket timestamp offset") added the main infrastructure that is needed for per-connection ts randomization, in particular writing/reading the on-wire tcp header format takes the offset into account so rest of stack can use normal tcp_time_stamp (jiffies). So only two items are left: - add a tsoffset for request sockets - extend the tcp isn generator to also return another 32bit number in addition to the ISN. Re-use of ISN generator also means timestamps are still monotonically increasing for same connection quadruple, i.e. PAWS will still work. Includes fixes from Eric Dumazet. Signed-off-by: NFlorian Westphal <fw@strlen.de> Acked-by: NEric Dumazet <edumazet@google.com> Acked-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yuval Mintz 提交于
This adds the backbone required for the various HW initalizations which are necessary for the iSCSI driver (qedi) for QLogic FastLinQ 4xxxx line of adapters - FW notification, resource initializations, etc. Signed-off-by: NArun Easi <arun.easi@cavium.com> Signed-off-by: NYuval Mintz <yuval.mintz@cavium.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Reviewed-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
After 326fe02d ("net/mlx4_en: protect ring->xdp_prog with rcu_read_lock"), the rcu_read_lock() in bpf_prog_run_xdp() is superfluous, since callers need to hold rcu_read_lock() already to make sure BPF program doesn't get released in the background. Thus, drop it from bpf_prog_run_xdp(), as it can otherwise be misleading. Still keeping the bpf_prog_run_xdp() is useful as it allows for grepping in XDP supported drivers and to keep the typecheck on the context intact. For mlx4, this means we don't have a double rcu_read_lock() anymore. nfp can just make use of bpf_prog_run_xdp(), too. For qede, just move rcu_read_lock() out of the helper. When the driver gets atomic replace support, this will move to call-sites eventually. mlx5 needs actual fixing as it has the same issue as described already in 326fe02d ("net/mlx4_en: protect ring->xdp_prog with rcu_read_lock"), that is, we're under RCU bh at this time, BPF programs are released via call_rcu(), and call_rcu() != call_rcu_bh(), so we need to properly mark read side as programs can get xchg()'ed in mlx5e_xdp_set() without queue reset. Fixes: 86994156 ("net/mlx5e: XDP fast RX drop bpf programs support") Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NJakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 12月, 2016 3 次提交
-
-
由 Thomas Graf 提交于
Registers new BPF program types which correspond to the LWT hooks: - BPF_PROG_TYPE_LWT_IN => dst_input() - BPF_PROG_TYPE_LWT_OUT => dst_output() - BPF_PROG_TYPE_LWT_XMIT => lwtunnel_xmit() The separate program types are required to differentiate between the capabilities each LWT hook allows: * Programs attached to dst_input() or dst_output() are restricted and may only read the data of an skb. This prevent modification and possible invalidation of already validated packet headers on receive and the construction of illegal headers while the IP headers are still being assembled. * Programs attached to lwtunnel_xmit() are allowed to modify packet content as well as prepending an L2 header via a newly introduced helper bpf_skb_change_head(). This is safe as lwtunnel_xmit() is invoked after the IP header has been assembled completely. All BPF programs receive an skb with L3 headers attached and may return one of the following error codes: BPF_OK - Continue routing as per nexthop BPF_DROP - Drop skb and return EPERM BPF_REDIRECT - Redirect skb to device as per redirect() helper. (Only valid in lwtunnel_xmit() context) The return codes are binary compatible with their TC_ACT_ relatives to ease compatibility. Signed-off-by: NThomas Graf <tgraf@suug.ch> Acked-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tariq Toukan 提交于
Add new type of struct mlx5_frag_buf which is used to allocate fragmented buffers rather than contiguous, and make the Completion Queues (CQs) use it as they are big (default of 2MB per CQ in Striding RQ). This fixes the failures of type: "mlx5e_open_locked: mlx5e_open_channels failed, -12" due to dma_zalloc_coherent insufficient contiguous coherent memory to satisfy the driver's request when the user tries to setup more or larger rings. Signed-off-by: NTariq Toukan <tariqt@mellanox.com> Reported-by: NSebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Raju Lakkaraju 提交于
Add new parameter mdix_ctrl to hold the user configuration. Existing mdix maintain the current status of MDI(X) crossover performed or not. mdix_ctrl can configure either ETH_TP_MDI or ETH_TP_MDI_X orETH_TP_MDI_AUTO. Signed-off-by: NRaju Lakkaraju <Raju.Lakkaraju@microsemi.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 12月, 2016 5 次提交
-
-
由 Kirill A. Shutemov 提交于
Hugetlb pages have ->index in size of the huge pages (PMD_SIZE or PUD_SIZE), not in PAGE_SIZE as other types of pages. This means we cannot user page_to_pgoff() to check whether we've got the right page for the radix-tree index. Let's introduce page_to_index() which would return radix-tree index for given page. We will be able to get rid of this once hugetlb will be switched to multi-order entries. Fixes: fc127da0 ("truncate: handle file thp") Link: http://lkml.kernel.org/r/20161123093053.mjbnvn5zwxw5e6lk@black.fi.intel.comSigned-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: NDoug Nelson <doug.nelson@intel.com> Tested-by: NDoug Nelson <doug.nelson@intel.com> Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: <stable@vger.kernel.org> [4.8+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dmitry Vyukov 提交于
kasan_global struct is part of compiler/runtime ABI. gcc revision 241983 has added a new field to kasan_global struct. Update kernel definition of kasan_global struct to include the new field. Without this patch KASAN is broken with gcc 7. Link: http://lkml.kernel.org/r/1479219743-28682-1-git-send-email-dvyukov@google.comSigned-off-by: NDmitry Vyukov <dvyukov@google.com> Acked-by: NAndrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: <stable@vger.kernel.org> [4.0+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mintz, Yuval 提交于
The driver needs to maintain several FW/HW-indices for each one of its queues. Currently, that mapping is done by the QED where it uses an rx/tx array of so-called hw-cids, populating them whenever a new queue is opened and clearing them upon destruction of said queues. This maintenance is far from ideal - there's no real reason why QED needs to maintain such a data-structure. It becomes even worse when considering the fact that the PF's queues and its child VFs' queues are all mapped into the same data-structure. As a by-product, the set of parameters an interface needs to supply for queue APIs is non-trivial, and some of the variables in the API structures have different meaning depending on their exact place in the configuration flow. This patch re-organizes the way L2 queues are configured and maintained. In short: - Required parameters for queue init are now well-defined. - Qed would allocate a queue-cid based on parameters. Upon initialization success, it would return a handle to caller. - Queue-handle would be maintained by entity requesting queue-init, not necessarily qed. - All further queue-APIs [update, destroy] would use the opaque handle as reference for the queue instead of various indices. The possible owners of such handles: - PF queues [qede] - complete handles based on provided configuration. - VF queues [qede] - fw-context-less handles, containing only relative information; Only the PF-side would need the absolute indices for configuration, so they're omitted here. - VF queues [qed, PF-side] - complete handles based on VF initialization. Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Mintz, Yuval 提交于
The chain structure and functions are widely used by the qed* modules, both for configuration and datapath. E.g., qede's Tx has one such chain and its Rx has two. Currently, the strucutre's fields which are required for datapath related functions [produce/consume] are intertwined with fields which are required only for configuration purposes [init/destroy/etc.]. This patch re-arranges the chain structure so that all the fields which are required for datapath usage could reside in a single cacheline instead of the two which are required today. Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexei Starovoitov 提交于
this #include is unnecessary and brings whole set of other headers into cgroup-defs.h. Remove it. Fixes: 30070984 ("cgroup: add support for eBPF programs") Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NRami Rosen <roszenrami@gmail.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NDaniel Mack <daniel@zonque.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 11月, 2016 6 次提交
-
-
由 Daniel Borkmann 提交于
Add an IFLA_XDP_FLAGS attribute that can be passed for setting up XDP along with IFLA_XDP_FD, which eventually allows user space to implement typical add/replace/delete logic for programs. Right now, calling into dev_change_xdp_fd() will always replace previous programs. When passed XDP_FLAGS_UPDATE_IF_NOEXIST, we can handle this more graceful when requested by returning -EBUSY in case we try to attach a new program, but we find that another one is already attached. This will be used by upcoming front-end for iproute2 as well. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Fainelli 提交于
Broadcom PHYs expose a number of PHY error counters: receive errors, false carrier sense, SerDes BER count, local and remote receive errors. Add support code to allow retrieving these error counters. Since the Broadcom PHY library code is used by several drivers, make it possible for them to specify the storage for the software copy of the statistics. Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Francis Yan 提交于
This patch exports the sender chronograph stats via the socket SO_TIMESTAMPING channel. Currently we can instrument how long a particular application unit of data was queued in TCP by tracking SOF_TIMESTAMPING_TX_SOFTWARE and SOF_TIMESTAMPING_TX_SCHED. Having these sender chronograph stats exported simultaneously along with these timestamps allow further breaking down the various sender limitation. For example, a video server can tell if a particular chunk of video on a connection takes a long time to deliver because TCP was experiencing small receive window. It is not possible to tell before this patch without packet traces. To prepare these stats, the user needs to set SOF_TIMESTAMPING_OPT_STATS and SOF_TIMESTAMPING_OPT_TSONLY flags while requesting other SOF_TIMESTAMPING TX timestamps. When the timestamps are available in the error queue, the stats are returned in a separate control message of type SCM_TIMESTAMPING_OPT_STATS, in a list of TLVs (struct nlattr) of types: TCP_NLA_BUSY_TIME, TCP_NLA_RWND_LIMITED, TCP_NLA_SNDBUF_LIMITED. Unit is microsecond. Signed-off-by: NFrancis Yan <francisyyan@gmail.com> Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Francis Yan 提交于
This patch implements the skeleton of the TCP chronograph instrumentation on sender side limits: 1) idle (unspec) 2) busy sending data other than 3-4 below 3) rwnd-limited 4) sndbuf-limited The limits are enumerated 'tcp_chrono'. Since a connection in theory can idle forever, we do not track the actual length of this uninteresting idle period. For the rest we track how long the sender spends in each limit. At any point during the life time of a connection, the sender must be in one of the four states. If there are multiple conditions worthy of tracking in a chronograph then the highest priority enum takes precedence over the other conditions. So that if something "more interesting" starts happening, stop the previous chrono and start a new one. The time unit is jiffy(u32) in order to save space in tcp_sock. This implies application must sample the stats no longer than every 49 days of 1ms jiffy. Signed-off-by: NFrancis Yan <francisyyan@gmail.com> Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Johan Hovold 提交于
Add helper to deregister fixed-link PHYs registered using of_phy_register_fixed_link(). Convert the two drivers that care to deregister their fixed-link PHYs to use the new helper, but note that most drivers currently fail to do so. Signed-off-by: NJohan Hovold <johan@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 jbrunet 提交于
This patch adds an option to disable EEE advertisement in the generic PHY by providing a mask of prohibited modes corresponding to the value found in the MDIO_AN_EEE_ADV register. On some platforms, PHY Low power idle seems to be causing issues, even breaking the link some cases. The patch provides a convenient way for these platforms to disable EEE advertisement and work around the issue. Signed-off-by: NJerome Brunet <jbrunet@baylibre.com> Tested-by: NYegor Yefremov <yegorslists@googlemail.com> Tested-by: NAndreas Färber <afaerber@suse.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 11月, 2016 1 次提交
-
-
由 Tariq Toukan 提交于
This reverts commit 9d769311. Using unregister_netdev at shutdown flow prevents calling the netdev's ndos or trying to access its freed resources. This fixes crashes like the following: Call Trace: [<ffffffff81587a6e>] dev_get_phys_port_id+0x1e/0x30 [<ffffffff815a36ce>] rtnl_fill_ifinfo+0x4be/0xff0 [<ffffffff815a53f3>] rtmsg_ifinfo_build_skb+0x73/0xe0 [<ffffffff815a5476>] rtmsg_ifinfo.part.27+0x16/0x50 [<ffffffff815a54c8>] rtmsg_ifinfo+0x18/0x20 [<ffffffff8158a6c6>] netdev_state_change+0x46/0x50 [<ffffffff815a5e78>] linkwatch_do_dev+0x38/0x50 [<ffffffff815a6165>] __linkwatch_run_queue+0xf5/0x170 [<ffffffff815a6205>] linkwatch_event+0x25/0x30 [<ffffffff81099a82>] process_one_work+0x152/0x400 [<ffffffff8109a325>] worker_thread+0x125/0x4b0 [<ffffffff8109a200>] ? rescuer_thread+0x350/0x350 [<ffffffff8109fc6a>] kthread+0xca/0xe0 [<ffffffff8109fba0>] ? kthread_park+0x60/0x60 [<ffffffff816a1285>] ret_from_fork+0x25/0x30 Fixes: 9d769311 ("net/mlx4_en: Avoid unregister_netdev at shutdown flow") Signed-off-by: NTariq Toukan <tariqt@mellanox.com> Reported-by: NSebastian Ott <sebott@linux.vnet.ibm.com> Reported-by: NSteve Wise <swise@opengridcomputing.com> Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-