- 26 9月, 2017 2 次提交
-
-
由 Alexey Dobriyan 提交于
Key length can't be negative. Leave comparisons against nla_len() signed just in case truncated attribute can sneak in there. Space savings: add/remove: 0/0 grow/shrink: 0/7 up/down: 0/-7 (-7) function old new delta pneigh_delete 273 272 -1 mlx5e_rep_netevent_event 1415 1414 -1 mlx5e_create_encap_header_ipv6 1194 1193 -1 mlx5e_create_encap_header_ipv4 1071 1070 -1 cxgb4_l2t_get 1104 1103 -1 __pneigh_lookup 69 68 -1 __neigh_create 2452 2451 -1 Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
As measured in my prior patch ("sch_netem: faster rb tree removal"), rbtree_postorder_for_each_entry_safe() is nice looking but much slower than using rb_next() directly, except when tree is small enough to fit in CPU caches (then the cost is the same) Also note that there is not even an increase of text size : $ size net/core/skbuff.o.before net/core/skbuff.o text data bss dec hex filename 40711 1298 0 42009 a419 net/core/skbuff.o.before 40711 1298 0 42009 a419 net/core/skbuff.o From: Eric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 9月, 2017 2 次提交
-
-
由 Alexey Dobriyan 提交于
Private part of allocation is never big enough to warrant size_t. Space savings: add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-10 (-10) function old new delta alloc_netdev_mqs 1120 1110 -10 Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Willem de Bruijn 提交于
Zerocopy skbs frags are copied when the skb is looped to a local sock. Commit 1080e512 ("net: orphan frags on receive") introduced calls to skb_orphan_frags to deliver_skb and __netif_receive_skb for this. With msg_zerocopy, these skbs can also exist in the tx path and thus loop from dev_queue_xmit_nit. This already calls deliver_skb in its loop. But it does not orphan before a separate pt_prev->func(). Add the missing skb_orphan_frags_rx. Changes v1->v2: handle skb_orphan_frags_rx failure Fixes: 1f8b977a ("sock: enable MSG_ZEROCOPY") Signed-off-by: NWillem de Bruijn <willemb@google.com> Reviewed-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 9月, 2017 1 次提交
-
-
由 Florian Fainelli 提交于
Commit 3f1ac7a7 ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API") deprecated the ethtool_cmd::transceiver field, which was fine in premise, except that the PHY library was actually using it to report the type of transceiver: internal or external. Use the first word of the reserved field to put this __u8 transceiver field back in. It is made read-only, and we don't expect the ETHTOOL_xLINKSETTINGS API to be doing anything with this anyway, so this is mostly for the legacy path where we do: ethtool_get_settings() -> dev->ethtool_ops->get_link_ksettings() -> convert_link_ksettings_to_legacy_settings() to have no information loss compared to the legacy get_settings API. Fixes: 3f1ac7a7 ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API") Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 9月, 2017 1 次提交
-
-
由 Edward Cree 提交于
Since XDP's view of the packet includes the MAC header, moving the start- of-packet with bpf_xdp_adjust_head needs to also update the offset of the MAC header (which is relative to skb->head, not to the skb->data that was changed). Without this, tcpdump sees packets starting from the old MAC header rather than the new one, at least in my tests on the loopback device. Fixes: b5cdae32 ("net: Generic XDP") Signed-off-by: NEdward Cree <ecree@solarflare.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 9月, 2017 1 次提交
-
-
由 Daniel Borkmann 提交于
Commit 109980b8 ("bpf: don't select potentially stale ri->map from buggy xdp progs") passed the pointer to the prog itself to be loaded into r4 prior on bpf_redirect_map() helper call, so that we can store the owner into ri->map_owner out of the helper. Issue with that is that the actual address of the prog is still subject to change when subsequent rewrites occur that require slow path in bpf_prog_realloc() to alloc more memory, e.g. from patching inlining helper functions or constant blinding. Thus, we really need to take prog->aux as the address we're holding, which also works with prog clones as they share the same aux object. Instead of then fetching aux->prog during runtime, which could potentially incur cache misses due to false sharing, we are going to just use aux for comparison on the map owner. This will also keep the patchlet of the same size, and later check in xdp_map_invalid() only accesses read-only aux pointer from the prog, it's also in the same cacheline already from prior access when calling bpf_func. Fixes: 109980b8 ("bpf: don't select potentially stale ri->map from buggy xdp progs") Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 9月, 2017 1 次提交
-
-
由 Eric Dumazet 提交于
Denys reported wrong rate estimations with HTB classes. It appears the bug was added in linux-4.10, since my tests where using intervals of one second only. HTB using 4 sec default rate estimators, reported rates were 4x higher. We need to properly scale the bytes/packets samples before integrating them in EWMA. Tested: echo 1 >/sys/module/sch_htb/parameters/htb_rate_est Setup HTB with one class with a rate/cail of 5Gbit Generate traffic on this class tc -s -d cl sh dev eth0 classid 7002:11 class htb 7002:11 parent 7002:1 prio 5 quantum 200000 rate 5Gbit ceil 5Gbit linklayer ethernet burst 80000b/1 mpu 0b cburst 80000b/1 mpu 0b level 0 rate_handle 1 Sent 1488215421648 bytes 982969243 pkt (dropped 0, overlimits 0 requeues 0) rate 5Gbit 412814pps backlog 136260b 2p requeues 0 TCP pkts/rtx 982969327/45 bytes 1488215557414/68130 lended: 22732826 borrowed: 0 giants: 0 tokens: -1684 ctokens: -1684 Fixes: 1c0d32fd ("net_sched: gen_estimator: complete rewrite of rate estimators") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NDenys Fedoryshchenko <nuclearcat@nuclearcat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 9月, 2017 1 次提交
-
-
由 Jesper Dangaard Brouer 提交于
Using bpf_redirect_map is allowed for generic XDP programs, but the appropriate map lookup was never performed in xdp_do_generic_redirect(). Instead the map-index is directly used as the ifindex. For the xdp_redirect_map sample in SKB-mode '-S', this resulted in trying sending on ifindex 0 which isn't valid, resulting in getting SKB packets dropped. Thus, the reported performance numbers are wrong in commit 24251c26 ("samples/bpf: add option for native and skb mode for redirect apps") for the 'xdp_redirect_map -S' case. Before commit 109980b8 ("bpf: don't select potentially stale ri->map from buggy xdp progs") it could crash the kernel. Like this commit also check that the map_owner owner is correct before dereferencing the map pointer. But make sure that this API misusage can be caught by a tracepoint. Thus, allowing userspace via tracepoints to detect misbehaving bpf_progs. Fixes: 6103aa96 ("net: implement XDP_REDIRECT for xdp generic") Fixes: 24251c26 ("samples/bpf: add option for native and skb mode for redirect apps") Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 9月, 2017 3 次提交
-
-
由 Daniel Borkmann 提交于
Differ between illegal XDP action code and just driver unsupported one to provide better feedback when we throw a one-time warning here. Reason is that with 814abfab ("xdp: add bpf_redirect helper function") not all drivers support the new XDP return code yet and thus they will fall into their 'default' case when checking for return codes after program return, which then triggers a bpf_warn_invalid_xdp_action() stating that the return code is illegal, but from XDP perspective it's not. I decided not to place something like a XDP_ACT_MAX define into uapi i) given we don't have this either for all other program types, ii) future action codes could have further encoding there, which would render such define unsuitable and we wouldn't be able to rip it out again, and iii) we rarely add new action codes. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Fastabend 提交于
do_xdp_generic must be called inside rcu critical section with preempt disabled to ensure BPF programs are valid and per-cpu variables used for redirect operations are consistent. This patch ensures this is true and fixes the splat below. The netif_receive_skb_internal() code path is now broken into two rcu critical sections. I decided it was better to limit the preempt_enable/disable block to just the xdp static key portion and the fallout is more rcu_read_lock/unlock calls. Seems like the best option to me. [ 607.596901] ============================= [ 607.596906] WARNING: suspicious RCU usage [ 607.596912] 4.13.0-rc4+ #570 Not tainted [ 607.596917] ----------------------------- [ 607.596923] net/core/dev.c:3948 suspicious rcu_dereference_check() usage! [ 607.596927] [ 607.596927] other info that might help us debug this: [ 607.596927] [ 607.596933] [ 607.596933] rcu_scheduler_active = 2, debug_locks = 1 [ 607.596938] 2 locks held by pool/14624: [ 607.596943] #0: (rcu_read_lock_bh){......}, at: [<ffffffff95445ffd>] ip_finish_output2+0x14d/0x890 [ 607.596973] #1: (rcu_read_lock_bh){......}, at: [<ffffffff953c8e3a>] __dev_queue_xmit+0x14a/0xfd0 [ 607.597000] [ 607.597000] stack backtrace: [ 607.597006] CPU: 5 PID: 14624 Comm: pool Not tainted 4.13.0-rc4+ #570 [ 607.597011] Hardware name: Dell Inc. Precision Tower 5810/0HHV7N, BIOS A17 03/01/2017 [ 607.597016] Call Trace: [ 607.597027] dump_stack+0x67/0x92 [ 607.597040] lockdep_rcu_suspicious+0xdd/0x110 [ 607.597054] do_xdp_generic+0x313/0xa50 [ 607.597068] ? time_hardirqs_on+0x5b/0x150 [ 607.597076] ? mark_held_locks+0x6b/0xc0 [ 607.597088] ? netdev_pick_tx+0x150/0x150 [ 607.597117] netif_rx_internal+0x205/0x3f0 [ 607.597127] ? do_xdp_generic+0xa50/0xa50 [ 607.597144] ? lock_downgrade+0x2b0/0x2b0 [ 607.597158] ? __lock_is_held+0x93/0x100 [ 607.597187] netif_rx+0x119/0x190 [ 607.597202] loopback_xmit+0xfd/0x1b0 [ 607.597214] dev_hard_start_xmit+0x127/0x4e0 Fixes: d4455169 ("net: xdp: support xdp generic on virtual devices") Fixes: b5cdae32 ("net: Generic XDP") Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net> -
由 Daniel Borkmann 提交于
We can potentially run into a couple of issues with the XDP bpf_redirect_map() helper. The ri->map in the per CPU storage can become stale in several ways, mostly due to misuse, where we can then trigger a use after free on the map: i) prog A is calling bpf_redirect_map(), returning XDP_REDIRECT and running on a driver not supporting XDP_REDIRECT yet. The ri->map on that CPU becomes stale when the XDP program is unloaded on the driver, and a prog B loaded on a different driver which supports XDP_REDIRECT return code. prog B would have to omit calling to bpf_redirect_map() and just return XDP_REDIRECT, which would then access the freed map in xdp_do_redirect() since not cleared for that CPU. ii) prog A is calling bpf_redirect_map(), returning a code other than XDP_REDIRECT. prog A is then detached, which triggers release of the map. prog B is attached which, similarly as in i), would just return XDP_REDIRECT without having called bpf_redirect_map() and thus be accessing the freed map in xdp_do_redirect() since not cleared for that CPU. iii) prog A is attached to generic XDP, calling the bpf_redirect_map() helper and returning XDP_REDIRECT. xdp_do_generic_redirect() is currently not handling ri->map (will be fixed by Jesper), so it's not being reset. Later loading a e.g. native prog B which would, say, call bpf_xdp_redirect() and then returns XDP_REDIRECT would find in xdp_do_redirect() that a map was set and uses that causing use after free on map access. Fix thus needs to avoid accessing stale ri->map pointers, naive way would be to call a BPF function from drivers that just resets it to NULL for all XDP return codes but XDP_REDIRECT and including XDP_REDIRECT for drivers not supporting it yet (and let ri->map being handled in xdp_do_generic_redirect()). There is a less intrusive way w/o letting drivers call a reset for each BPF run. The verifier knows we're calling into bpf_xdp_redirect_map() helper, so it can do a small insn rewrite transparent to the prog itself in the sense that it fills R4 with a pointer to the own bpf_prog. We have that pointer at verification time anyway and R4 is allowed to be used as per calling convention we scratch R0 to R5 anyway, so they become inaccessible and program cannot read them prior to a write. Then, the helper would store the prog pointer in the current CPUs struct redirect_info. Later in xdp_do_*_redirect() we check whether the redirect_info's prog pointer is the same as passed xdp_prog pointer, and if that's the case then all good, since the prog holds a ref on the map anyway, so it is always valid at that point in time and must have a reference count of at least 1. If in the unlikely case they are not equal, it means we got a stale pointer, so we clear and bail out right there. Also do reset map and the owning prog in bpf_xdp_redirect(), so that bpf_xdp_redirect_map() and bpf_xdp_redirect() won't get mixed up, only the last call should take precedence. A tc bpf_redirect() doesn't use map anywhere yet, so no need to clear it there since never accessed in that layer. Note that in case the prog is released, and thus the map as well we're still under RCU read critical section at that time and have preemption disabled as well. Once we commit with the __dev_map_insert_ctx() from xdp_do_redirect_map() and set the map to ri->map_to_flush, we still wait for a xdp_do_flush_map() to finish in devmap dismantle time once flush_needed bit is set, so that is fine. Fixes: 97f91a7c ("bpf: add bpf_redirect_map helper routine") Reported-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 9月, 2017 1 次提交
-
-
由 Paolo Abeni 提交于
After commit 0ddf3fb2 ("udp: preserve skb->dst if required for IP options processing") we clear the skb head state as soon as the skb carrying them is first processed. Since the same skb can be processed several times when MSG_PEEK is used, we can end up lacking the required head states, and eventually oopsing. Fix this clearing the skb head state only when processing the last skb reference. Reported-by: NEric Dumazet <edumazet@google.com> Fixes: 0ddf3fb2 ("udp: preserve skb->dst if required for IP options processing") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 9月, 2017 2 次提交
-
-
由 Tom Herbert 提交于
In flow dissector there are no limits to the number of nested encapsulations or headers that might be dissected which makes for a nice DOS attack. This patch sets a limit of the number of headers that flow dissector will parse. Headers includes network layer headers, transport layer headers, shim headers for encapsulation, IPv6 extension headers, etc. The limit for maximum number of headers to parse has be set to fifteen to account for a reasonable number of encapsulations, extension headers, VLAN, in a packet. Note that this limit does not supercede the STOP_AT_* flags which may stop processing before the headers limit is reached. Reported-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NTom Herbert <tom@quantonium.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
__skb_flow_dissect is riddled with gotos that make discerning the flow, debugging, and extending the capability difficult. This patch reorganizes things so that we only perform goto's after the two main switch statements (no gotos within the cases now). It also eliminates several goto labels so that there are only two labels that can be target for goto. Reported-by: NAlexander Popov <alex.popov@linux.com> Signed-off-by: NTom Herbert <tom@quantonium.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 9月, 2017 5 次提交
-
-
由 Ido Schimmel 提交于
When a listener registers to the FIB notification chain it receives a dump of the FIB entries and rules from existing address families by invoking their dump operations. While we call into these modules we need to make sure they aren't removed. Do that by increasing their reference count before invoking their dump operations and decrease it afterwards. Fixes: 04b1d4e5 ("net: core: Make the FIB notification chain generic") Signed-off-by: NIdo Schimmel <idosch@mellanox.com> Reviewed-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. v2: added the change in drivers/vhost/net.c as spotted by Willem. Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
In order to convert this atomic_t refcnt to refcount_t, we need to init the refcount to one to not trigger a 0 -> 1 transition. This also removes one atomic operation in fast path. v2: removed dead code in sock_zerocopy_put_abort() as suggested by Willem. Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Make sock_filter_is_valid_access consistent with other is_valid_access helpers. Requested-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid Ahern <dsahern@gmail.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ido Schimmel 提交于
The mlxsw driver relies on NETDEV_CHANGEUPPER events to configure the device in case a port is enslaved to a master netdev such as bridge or bond. Since the driver ignores events unrelated to its ports and their uppers, it's possible to engineer situations in which the device's data path differs from the kernel's. One example to such a situation is when a port is enslaved to a bond that is already enslaved to a bridge. When the bond was enslaved the driver ignored the event - as the bond wasn't one of its uppers - and therefore a bridge port instance isn't created in the device. Until such configurations are supported forbid them by checking that the upper device doesn't have uppers of its own. Fixes: 0d65fc13 ("mlxsw: spectrum: Implement LAG port join/leave") Signed-off-by: NIdo Schimmel <idosch@mellanox.com> Reported-by: NNogah Frankel <nogahf@mellanox.com> Tested-by: NNogah Frankel <nogahf@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 9月, 2017 4 次提交
-
-
由 David Ahern 提交于
Allow BPF programs run on sock create to use the get_current_uid_gid helper. IPv4 and IPv6 sockets are created in a process context so there is always a valid uid/gid Signed-off-by: NDavid Ahern <dsahern@gmail.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Add socket mark and priority to fields that can be set by ebpf program when a socket is created. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Arkadi Sharshevsky 提交于
This will be used by the IPv6 host table which will be introduced in the following patches. The fields in the header are added per-use. This header is global and can be reused by many drivers. Signed-off-by: NArkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Roopa Prabhu 提交于
This extends bridge fdb table tracepoints to also cover learned fdb entries in the br_fdb_update path. Note that unlike other tracepoints I have moved this to when the fdb is modified because this is in the datapath and can generate a lot of noise in the trace output. br_fdb_update is also called from added_by_user context in the NTF_USE case which is already traced ..hence the !added_by_user check. Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com> Acked-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 31 8月, 2017 1 次提交
-
-
由 David Ahern 提交于
IPv4 name uses "destination ip" as does the IPv6 patch set. Make the mac field consistent. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 8月, 2017 6 次提交
-
-
由 Eric Dumazet 提交于
Florian reported UDP xmit drops that could be root caused to the too small neigh limit. Current limit is 64 KB, meaning that even a single UDP socket would hit it, since its default sk_sndbuf comes from net.core.wmem_default (~212992 bytes on 64bit arches). Once ARP/ND resolution is in progress, we should allow a little more packets to be queued, at least for one producer. Once neigh arp_queue is filled, a rogue socket should hit its sk_sndbuf limit and either block in sendmsg() or return -EAGAIN. Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Roopa Prabhu 提交于
A few useful tracepoints to trace bridge forwarding database updates. Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Acked-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jesper Dangaard Brouer 提交于
Creating as specific xdp_redirect_map variant of the xdp tracepoints allow users to write simpler/faster BPF progs that get attached to these tracepoints. Goal is to still keep the tracepoints in xdp_redirect and xdp_redirect_map similar enough, that a tool can read the top part of the TP_STRUCT and produce similar monitor statistics. Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jesper Dangaard Brouer 提交于
There is a need to separate the xdp_redirect tracepoint into two tracepoints, for separating the error case from the normal forward case. Due to the extreme speeds XDP is operating at, loading a tracepoint have a measurable impact. Single core XDP REDIRECT (ethtool tuned rx-usecs 25) can do 13.7 Mpps forwarding, but loading a simple bpf_prog at the tracepoint (with a return 0) reduce perf to 10.2 Mpps (CPU E5-1650 v4 @ 3.60GHz, driver: ixgbe) The overhead of loading a bpf-based tracepoint can be calculated to cost 25 nanosec ((1/13782002-1/10267937)*10^9 = -24.83 ns). Using perf record on the tracepoint event, with a non-matching --filter expression, the overhead is much larger. Performance drops to 8.3 Mpps, cost 48 nanosec ((1/13782002-1/8312497)*10^9 = -47.74)) Having a separate tracepoint for err cases, which should be less frequent, allow running a continuous monitor for errors while not affecting the redirect forward performance (this have also been verified by measurements). Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jesper Dangaard Brouer 提交于
To make sense of the map index, the tracepoint user also need to know that map we are talking about. Supply the map pointer but only expose the map->id. The 'to_index' is renamed 'to_ifindex'. In the xdp_redirect_map case, this is the result of the devmap lookup. The map lookup key is exposed as map_index, which is needed to troubleshoot in case the lookup failed. The 'to_ifindex' is placed after 'err' to keep TP_STRUCT as common as possible. This also keeps the TP_STRUCT similar enough, that userspace can write a monitor program, that doesn't need to care about whether bpf_redirect or bpf_redirect_map were used. Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jesper Dangaard Brouer 提交于
Supplying the action argument XDP_REDIRECT to the tracepoint xdp_redirect is redundant as it is only called in-case this action was specified. Remove the argument, but keep "act" member of the tracepoint struct and populate it with XDP_REDIRECT. This makes it easier to write a common bpf_prog processing events. Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 8月, 2017 1 次提交
-
-
由 Jesper Dangaard Brouer 提交于
Noticed that busy_poll_stop() also invoke the drivers napi->poll() function pointer, but didn't have an associated call to trace_napi_poll() like all other call sites. Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 8月, 2017 1 次提交
-
-
由 Sabrina Dubroca 提交于
There are a few bugs around refcnt handling in the new BPF congestion control setsockopt: - The new ca is assigned to icsk->icsk_ca_ops even in the case where we cannot get a reference on it. This would lead to a use after free, since that ca is going away soon. - Changing the congestion control case doesn't release the refcnt on the previous ca. - In the reinit case, we first leak a reference on the old ca, then we call tcp_reinit_congestion_control on the ca that we have just assigned, leading to deinitializing the wrong ca (->release of the new ca on the old ca's data) and releasing the refcount on the ca that we actually want to use. This is visible by building (for example) BIC as a module and setting net.ipv4.tcp_congestion_control=bic, and using tcp_cong_kern.c from samples/bpf. This patch fixes the refcount issues, and moves reinit back into tcp core to avoid passing a ca pointer back to BPF. Fixes: 91b5b21c ("bpf: Add support for changing congestion control") Signed-off-by: NSabrina Dubroca <sd@queasysnail.net> Acked-by: NLawrence Brakmo <brakmo@fb.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 8月, 2017 7 次提交
-
-
由 Yuchung Cheng 提交于
This patch fixes a bug causing any sock operations to always return EINVAL. Fixes: a5192c52 ("bpf: fix to bpf_setsockops"). Reported-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NYuchung Cheng <ycheng@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NCraig Gallek <kraig@google.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NLawrence Brakmo <brakmo@fb.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jesper Dangaard Brouer 提交于
There is too much overhead in the current trace_xdp_redirect tracepoint as it does strcpy and strlen on the net_device names. Besides, exposing the ifindex/index is actually the information that is needed in the tracepoint to diagnose issues. When a lookup fails (either ifindex or devmap index) then there is a need for saying which to_index that have issues. V2: Adjust args to be aligned with trace_xdp_exception. Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jesper Dangaard Brouer 提交于
If the xdp_do_generic_redirect() call fails, it trigger the trace_xdp_exception tracepoint. It seems better to use the same tracepoint trace_xdp_redirect, as the native xdp_do_redirect{,_map} does. Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net> -
由 Jesper Dangaard Brouer 提交于
Given there is a tracepoint that can track the error code of xdp_do_redirect calls, the WARN_ONCE in bpf_warn_invalid_xdp_redirect doesn't seem relevant any longer. Simply remove the function. Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Arkadi Sharshevsky 提交于
The entry clear routine can be shared between the drivers, thus it is moved inside devlink. Signed-off-by: NArkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Arkadi Sharshevsky 提交于
Up until now the dpipe table's size was static and known at registration time. The host table does not have constant size and it is resized in dynamic manner. In order to support this behavior the size is changed to be obtained dynamically via an op. This patch also adjust the current dpipe table for the new API. Signed-off-by: NArkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Arkadi Sharshevsky 提交于
This will be used by the IPv4 host table which will be introduced in the following patches. This header is global and can be reused by many drivers. Signed-off-by: NArkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-