Commits · 29825717123fb9cfb9e709327d565c2f2fa89903 · openanolis / cloud-kernel

25 Aug, 2017 24 commits

net: Extend struct flowi6 with multipath hash · 29825717

Committed by Jakub Sitnicki on Aug 23, 2017

Allow for functions that fill out the IPv6 flow info to also pass a hash
computed over the skb contents. The hash value will drive the multipath
routing decisions.

This is intended for special treatment of ICMPv6 errors, where we would
like to make a routing decision based on the flow identifying the
offending IPv6 datagram that triggered the error, rather than the flow
of the ICMP error itself.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

29825717
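
The idea in this commit message can be illustrated with a minimal user-space sketch. It assumes a simplified flow description: `flow6_sketch`, its `mp_hash` member, `flow_hash`, and `select_nexthop` are hypothetical names for illustration and are not the kernel's definitions. A caller that has already hashed the packet contents (for example, the offending datagram embedded in an ICMPv6 error) passes that hash along; otherwise the hash is derived from the flow itself.

```c
#include <stdint.h>

/* Hypothetical, simplified flow description; not the kernel's struct flowi6. */
struct flow6_sketch {
    uint32_t saddr_word;  /* folded source address (illustrative) */
    uint32_t daddr_word;  /* folded destination address (illustrative) */
    uint32_t flowlabel;
    uint32_t mp_hash;     /* hash supplied by the caller; 0 means "not set" (assumption) */
};

/* Toy mixing function, for illustration only. */
static uint32_t flow_hash(const struct flow6_sketch *fl)
{
    uint32_t h = fl->saddr_word * 2654435761u;

    h ^= fl->daddr_word + 0x9e3779b9u + (h << 6) + (h >> 2);
    h ^= fl->flowlabel + 0x9e3779b9u + (h << 6) + (h >> 2);
    return h;
}

/* Pick one of `nexthops` paths (nexthops must be > 0). */
unsigned int select_nexthop(const struct flow6_sketch *fl, unsigned int nexthops)
{
    /* A caller-provided hash takes precedence over the flow's own fields. */
    uint32_t h = fl->mp_hash ? fl->mp_hash : flow_hash(fl);

    return h % nexthops;
}
```

With this shape, two different flows that carry the same caller-provided hash land on the same path, which is the property the ICMPv6 error case relies on.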

devlink: Fix devlink_dpipe_table_register() stub signature. · 790c6056

Committed by David S. Miller on Aug 24, 2017

One too many arguments compared to the non-stub version.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: ffd3cdcc ("devlink: Add support for dynamic table size")
Signed-off-by: David S. Miller <davem@davemloft.net>

790c6056

ipv6: Add sysctl for per namespace flow label reflection · 22b6722b

Committed by Jakub Sitnicki on Aug 23, 2017

Reflecting IPv6 Flow Label at server nodes is useful in environments
that employ multipath routing to load balance the requests. As the "IPv6
Flow Label Reflection" standard draft [1] points out, ICMPv6 PTB error
messages generated in response to downstream packets from the server
can be routed by a load balancer back to the original server without
looking at transport headers, if the server applies the flow label
reflection. This enables the Path MTU Discovery past the ECMP router in
load-balance or anycast environments where each server node is reachable
by only one path.

Introduce a sysctl to enable flow label reflection per net namespace for
all newly created sockets. Previously, the same could be achieved only per
socket, by setting the IPV6_FL_F_REFLECT flag for the IPV6_FLOWLABEL_MGR
socket option.

[1] https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

22b6722b
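
What "reflection" means at the packet level can be sketched in a few lines. The example below is a user-space illustration, not kernel code: the function names are hypothetical, and it relies only on the IPv6 header layout, in which the first 32 bits carry the version (4 bits), the traffic class (8 bits), and the flow label (20 bits). The reply keeps its own version and traffic class and copies the label from the request.

```c
#include <stdint.h>

/* Low 20 bits of the first 32-bit word of an IPv6 header. */
#define FLOWLABEL_MASK 0x000FFFFFu

/* Assemble the first header word from network byte order. */
uint32_t ipv6_first_word(const uint8_t hdr[4])
{
    return ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16) |
           ((uint32_t)hdr[2] << 8) | (uint32_t)hdr[3];
}

uint32_t flow_label_of(const uint8_t hdr[4])
{
    return ipv6_first_word(hdr) & FLOWLABEL_MASK;
}

/* Copy the request's flow label into the reply header in place. */
void reflect_flow_label(const uint8_t request_hdr[4], uint8_t reply_hdr[4])
{
    uint32_t word = (ipv6_first_word(reply_hdr) & ~FLOWLABEL_MASK) |
                    flow_label_of(request_hdr);

    reply_hdr[0] = (uint8_t)(word >> 24);
    reply_hdr[1] = (uint8_t)(word >> 16);
    reply_hdr[2] = (uint8_t)(word >> 8);
    reply_hdr[3] = (uint8_t)word;
}
```

Because the label alone identifies the flow, a load balancer can hash on it without parsing transport headers, which is what makes the ICMPv6 PTB case described above work.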

net/mlx5e: make mlx5e_profile const · 39a7e589

Committed by Bhumika Goyal on Aug 23, 2017

Make this const as it is only passed as an argument to the function
mlx5e_create_netdev and the corresponding argument is of type const.
Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

39a7e589

net/mlx4_core: make mlx4_profile const · 3f2c5fb2

Committed by Bhumika Goyal on Aug 23, 2017

Make these const as they are only used in a copy operation.
Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3f2c5fb2

Merge branch 'xdp-more-work-on-xdp-tracepoints' · e7d12ce1

Committed by David S. Miller on Aug 24, 2017

Jesper Dangaard Brouer says:

====================
xdp: more work on xdp tracepoints

More work on streamlining and performance optimizing the tracepoints
for XDP.

I've created a simple xdp_monitor application that uses this
tracepoint, and prints statistics. Available at github:

https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_monitor_kern.c
https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_monitor_user.c

The improvement over tracepoint with strcpy: 9810372 - 8428762 = +1381610 pps faster
 - (1/9810372 - 1/8428762)*10^9 = -16.7 nanosec
 - 100-(8428762/9810372*100) = strcpy-trace is 14.08% slower
 - 9810372/8428762*100 - 100 = removing strcpy made it 16.39% faster

V3: Fix merge conflict with commit e4a8e817 ("bpf: misc xdp redirect cleanups")
V2: Change trace_xdp_redirect() to align with args of trace_xdp_exception()
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e7d12ce1
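
The figures in the cover letter can be rechecked with a small worked example. The helpers below are hypothetical names introduced only for this sketch; the two inputs are the pps values quoted in the message (8428762 with strcpy, 9810372 without).

```c
/* Time spent per packet, in nanoseconds, at a given packets-per-second rate. */
double ns_per_packet(double pps)
{
    return 1e9 / pps;
}

/* How much slower the slow rate is, as a percentage of the fast rate. */
double percent_slower(double slow_pps, double fast_pps)
{
    return 100.0 - slow_pps / fast_pps * 100.0;
}

/* How much faster the fast rate is, as a percentage of the slow rate. */
double percent_faster(double slow_pps, double fast_pps)
{
    return (fast_pps / slow_pps - 1.0) * 100.0;
}
```

Note that "percent slower" and "percent faster" use different baselines, so the two percentages are not expected to match.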

xdp: get tracepoints xdp_exception and xdp_redirect in sync · 315ec399

Committed by Jesper Dangaard Brouer on Aug 24, 2017

Remove the net_device string name from the xdp_exception tracepoint,
like the xdp_redirect tracepoint.

Align the TP_STRUCT to have common entries between these two
tracepoints.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

315ec399

xdp: remove net_device names from xdp_redirect tracepoint · a8735855

Committed by Jesper Dangaard Brouer on Aug 24, 2017

There is too much overhead in the current trace_xdp_redirect
tracepoint as it does strcpy and strlen on the net_device names.

Besides, the ifindex/index is actually the information that
is needed in the tracepoint to diagnose issues.  When a lookup fails
(either ifindex or devmap index), there is a need to say which
to_index has the issue.

V2: Adjust args to be aligned with trace_xdp_exception.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

a8735855

ixgbe: use return codes from ndo_xdp_xmit that are distinguishable · 2886447d

Committed by Jesper Dangaard Brouer on Aug 24, 2017

For XDP_REDIRECT the use of return code -EINVAL is confusing, as it is
used in three different cases: (1) when the index or ifindex lookup
fails, and in the ixgbe driver (2) when the link is down and (3) when XDP
has not been enabled.

The return code can be picked up by the tracepoint xdp:xdp_redirect
for diagnosing why XDP_REDIRECT isn't working.  Thus, there is a need
for different return codes to tell the issues apart.

I'm considering using a specific err-code scheme for XDP_REDIRECT
instead of using these errno codes.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

2886447d
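
A minimal sketch of the pattern described here: one errno per failure cause, so that whoever reads the return code can tell the cases apart. The struct, the function names, and the specific errno values chosen below are assumptions for illustration; the commit message itself does not name the codes.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state; not the ixgbe driver's data structures. */
struct xmit_dev_sketch {
    bool link_up;
    bool xdp_enabled;
};

/* Returns 0 on success or a negative errno identifying the failing condition. */
int xdp_xmit_sketch(const struct xmit_dev_sketch *dev)
{
    if (!dev->link_up)
        return -ENETDOWN;   /* assumed code for "link is down" */
    if (!dev->xdp_enabled)
        return -ENXIO;      /* assumed code for "XDP not enabled" */
    return 0;
}

/* Map a return code back to a human-readable cause, as a trace consumer might. */
const char *xdp_xmit_reason(int err)
{
    switch (err) {
    case 0:
        return "ok";
    case -ENETDOWN:
        return "link down";
    case -ENXIO:
        return "XDP not enabled";
    case -EINVAL:
        return "index or ifindex lookup failed";
    default:
        return "unknown";
    }
}
```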

xdp: make generic xdp redirect use tracepoint trace_xdp_redirect · 2facaad6

Committed by Jesper Dangaard Brouer on Aug 24, 2017

If the xdp_do_generic_redirect() call fails, it triggers the
trace_xdp_exception tracepoint.  It seems better to use the same
tracepoint trace_xdp_redirect, as the native xdp_do_redirect{,_map} does.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

2facaad6

xdp: remove bpf_warn_invalid_xdp_redirect · d08adb82

Committed by Jesper Dangaard Brouer on Aug 24, 2017

Given there is a tracepoint that can track the error code
of xdp_do_redirect calls, the WARN_ONCE in bpf_warn_invalid_xdp_redirect
doesn't seem relevant any longer.  Simply remove the function.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

d08adb82

Merge branch 'mlxsw-ipv4-host-dpipe-table' · fb3bbbda

Committed by David S. Miller on Aug 24, 2017

Jiri Pirko says:

====================
mlxsw: Add IPv4 host dpipe table

Arkadi says:

This patchset adds IPv4 host dpipe table support. This will provide the
ability to observe the hardware offloaded IPv4 neighbors.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

fb3bbbda

mlxsw: spectrum_dpipe: Add support for controlling neighbor counters · a481d713

Committed by Arkadi Sharshevsky on Aug 24, 2017

Add support for controlling neighbor counters via dpipe.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a481d713

mlxsw: spectrum_dpipe: Add support for IPv4 host table dump · a86f0309

Committed by Arkadi Sharshevsky on Aug 24, 2017

Add support for IPv4 host table dump.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a86f0309

mlxsw: spectrum_router: Add support for setting counters on neighbors · 7cfcbc75

Committed by Arkadi Sharshevsky on Aug 24, 2017

Add support for setting counters on neighbors based on dpipe's host table
counter status. This patch also adds the ability to get the counter
value, which will be used by the dpipe host table implementation in the
next patches.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7cfcbc75

mlxsw: reg: Make flow counter set type enum to be shared · 6bba7e20

Committed by Arkadi Sharshevsky on Aug 24, 2017

This is done as a preparation before introducing support for neighbor
counters. The flow counter's type enum is used by many registers, yet,
until now it was used only by mgpc and thus it was private. This patch
updates the namespace for more generic usage.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6bba7e20

mlxsw: spectrum_dpipe: Add IPv4 host table initial support · 6aecb36b

Committed by Arkadi Sharshevsky on Aug 24, 2017

Add IPv4 host table initial support.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6aecb36b

mlxsw: spectrum_dpipe: Fix label name · 7e57ae9f

Committed by Arkadi Sharshevsky on Aug 24, 2017

Change the label name for the case of erif table init failure.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7e57ae9f

mlxsw: spectrum_router: Add helpers for neighbor access · f17cc84d

Committed by Arkadi Sharshevsky on Aug 24, 2017

This is done as a preparation before introducing the ability to dump the
host table via dpipe, and to count the table size. The mlxsw's neighbor
representative struct stays private to the router module.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f17cc84d

devlink: Move dpipe entry clear function into devlink · 35807324

Committed by Arkadi Sharshevsky on Aug 24, 2017

The entry clear routine can be shared between the drivers, thus it is
moved inside devlink.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

35807324

devlink: Add support for dynamic table size · ffd3cdcc

Committed by Arkadi Sharshevsky on Aug 24, 2017

Up until now the dpipe table's size was static and known at registration
time. The host table does not have a constant size and is resized
dynamically. In order to support this behavior, the size is now
obtained dynamically via an op.

This patch also adjusts the current dpipe table for the new API.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ffd3cdcc
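
The change from a static size to a size obtained "via an op" can be sketched as follows. This is a user-space illustration under assumed names (`table_ops_sketch`, `size_get`, `host_table_priv`, and so on are hypothetical), not the devlink API itself: instead of storing a number at registration time, the table carries a callback that is asked for the current size whenever it is needed.

```c
#include <stdint.h>

/* Hypothetical ops table: the size is queried through a callback. */
struct table_ops_sketch {
    uint64_t (*size_get)(void *priv);
};

struct table_sketch {
    const char *name;
    const struct table_ops_sketch *ops;
    void *priv;   /* driver-private data handed back to the ops */
};

/* Hypothetical private state of a host table whose size changes over time. */
struct host_table_priv {
    uint64_t neighbors;
};

static uint64_t host_table_size_get(void *priv)
{
    const struct host_table_priv *p = priv;

    return p->neighbors;
}

const struct table_ops_sketch host_table_ops = {
    .size_get = host_table_size_get,
};

/* Core-side helper: always reflects the table's current size. */
uint64_t table_size(const struct table_sketch *t)
{
    return t->ops->size_get(t->priv);
}
```

A table with a fixed size fits the same interface by returning a constant from its callback.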

mlxsw: spectrum_dpipe: Fix erif table op name space · 23ca5ec3

Committed by Arkadi Sharshevsky on Aug 24, 2017

Fix ERIF's table operations name space.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

23ca5ec3

devlink: Add IPv4 header for dpipe · 3fb886ec

Committed by Arkadi Sharshevsky on Aug 24, 2017

This will be used by the IPv4 host table which will be introduced in the
following patches. This header is global and can be reused by many
drivers.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3fb886ec

devlink: Add Ethernet header for dpipe · 11770091

Committed by Arkadi Sharshevsky on Aug 24, 2017

This will be used by the IPv4 host table which will be introduced in the
following patches. This header is global and can be reused by many
drivers.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

11770091

24 Aug, 2017 16 commits

bpf: netdev is never null in __dev_map_flush · a5e2da6e

Committed by Daniel Borkmann on Aug 24, 2017

No need to test for it in the fast path: every dev in bpf_dtab_netdev
is guaranteed to be non-NULL, otherwise dev_map_update_elem() will
fail in the first place.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a5e2da6e

bpf, doc: Add arm32 as arch supporting eBPF JIT · d2aaa3dc

Committed by Shubham Bansal on Aug 23, 2017

As eBPF JIT support for arm32 was added recently with commit 39c13c20,
it seems appropriate to add arm32 as an arch with support for eBPF JIT
in the bpf and sysctl docs as well.
Signed-off-by: Shubham Bansal <illusionist.neo@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

d2aaa3dc

Merge branch 'bpf-verifier-fixes' · 81152518

Committed by David S. Miller on Aug 23, 2017

Edward Cree says:

====================
bpf: verifier fixes

Fix a couple of bugs introduced in my recent verifier patches.
Patch #2 does slightly increase the insn count on bpf_lxc.o, but only by
 about a hundred insns (i.e. 0.2%).

v2: added test for write-marks bug (patch #1); reworded comment on
 propagate_liveness() for clarity.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

81152518

bpf/verifier: document liveness analysis · 8e9cd9ce

Committed by Edward Cree on Aug 23, 2017

The liveness tracking algorithm is quite subtle; add comments to explain it.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

8e9cd9ce

bpf/verifier: remove varlen_map_value_access flag · 1b688a19

Committed by Edward Cree on Aug 23, 2017

The optimisation it does is broken when the 'new' register value has a
 variable offset and the 'old' was constant.  I broke it with my pointer
 types unification (see Fixes tag below), before which the 'new' value
 would have type PTR_TO_MAP_VALUE_ADJ and would thus not compare equal;
 other changes in that patch mean that its original behaviour (ignore
 min/max values) cannot be restored.
Tests on a sample set of cilium programs show no change in count of
 processed instructions.

Fixes: f1174f77 ("bpf/verifier: rework value tracking")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

1b688a19

selftests/bpf: add a test for a pruning bug in the verifier · df20cb7e

Committed by Alexei Starovoitov on Aug 23, 2017

The test makes a read through a map value pointer, then considers pruning
 a branch where the register holds an adjusted map value pointer.  It
 should not prune, but currently it does.
Signed-off-by: Alexei Starovoitov <ast@fb.com>
[ecree@solarflare.com: added test-name and patch description]
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

df20cb7e

bpf/verifier: when pruning a branch, ignore its write marks · 63f45f84

Committed by Edward Cree on Aug 23, 2017

The fact that writes occurred in reaching the continuation state does
 not screen off its reads from us, because we're not really its parent.
So detect 'not really the parent' in do_propagate_liveness, and ignore
 write marks in that case.

Fixes: dc503a8a ("bpf/verifier: track liveness for pruning")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

63f45f84
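
The rule stated in this commit message can be modelled in a few lines. The sketch below is a deliberately simplified, user-space model with hypothetical names and only four registers; it is not the verifier's actual code. Read marks flow from a state into a "parent", and a write mark in the state blocks that flow only when the target really is the state's parent. When propagating into a state that was merely pruned against this one, write marks are ignored.

```c
#include <stdbool.h>

#define NREGS 4   /* illustrative; the real verifier tracks more registers */

enum live_sketch {
    LIVE_NONE = 0,
    LIVE_READ = 1,
    LIVE_WRITTEN = 2,
};

struct state_sketch {
    unsigned int live[NREGS];      /* bitmask of live_sketch flags per register */
    struct state_sketch *parent;   /* the state this one was actually reached from */
};

/* Propagate read marks from `state` into `parent`; returns true if any mark was added. */
bool propagate_liveness_sketch(const struct state_sketch *state,
                               struct state_sketch *parent)
{
    /* Honour write marks only if `parent` really is the parent of `state`. */
    bool observe_writes = (state->parent == parent);
    bool touched = false;

    for (int i = 0; i < NREGS; i++) {
        if (parent->live[i] & LIVE_READ)
            continue;   /* already marked as read */
        if (observe_writes && (state->live[i] & LIVE_WRITTEN))
            continue;   /* the write screens off later reads from the real parent */
        if (state->live[i] & LIVE_READ) {
            parent->live[i] |= LIVE_READ;
            touched = true;
        }
    }
    return touched;
}
```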

selftests/bpf: add a test for a bug in liveness-based pruning · d893dc26

Committed by Edward Cree on Aug 23, 2017

Writes in straight-line code should not prevent reads from propagating
 along jumps.  With current verifier code, the jump from 3 to 5 does not
 add a read mark on 3:R0 (because 5:R0 has a write mark), meaning that
 the jump from 1 to 3 gets pruned as safe even though R0 is NOT_INIT.

Verifier output:
0: (61) r2 = *(u32 *)(r1 +0)
1: (35) if r2 >= 0x0 goto pc+1
 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
2: (b7) r0 = 0
3: (35) if r2 >= 0x0 goto pc+1
 R0=inv0 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
4: (b7) r0 = 0
5: (95) exit

from 3 to 5: safe

from 1 to 3: safe
processed 8 insns, stack depth 0
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

d893dc26

gre: remove duplicated assignment of iph · 60890e04

Committed by Colin Ian King on Aug 23, 2017

iph is being assigned the same value twice; remove the redundant
first assignment. (Thanks to Nikolay Aleksandrov for pointing out
that the first assignment should be removed and not the second.)

Fixes warning:
net/ipv4/ip_gre.c:265:2: warning: Value stored to 'iph' is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

60890e04

net: tipc: constify genl_ops · 042a9010

Committed by Arvind Yadav on Aug 23, 2017

genl_ops are not supposed to change at runtime. All functions
working with genl_ops provided by <net/genetlink.h> work with
const genl_ops. So mark the non-const structs as const.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

042a9010

net: hinic: make functions set_ctrl0 and set_ctrl1 static · 5719e5eb

Committed by Colin Ian King on Aug 23, 2017

The functions set_ctrl0 and set_ctrl1 are local to the source and do
not need to be in global scope, so make them static.

Cleans up sparse warnings:
symbol 'set_ctrl0' was not declared. Should it be static?
symbol 'set_ctrl1' was not declared. Should it be static?
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5719e5eb

net/sock: allow the user to set negative peek offset · 257a7303

Committed by Paolo Abeni on Aug 23, 2017

This is necessary to allow the user to disable peeking with
offset once it's enabled.
Unix sockets already allow the above; with this patch we
permit it for udp[6] sockets, too.

Fixes: 627d2d6b ("udp: enable MSG_PEEK at non-zero offset")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

257a7303
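
Why a negative value needs to be accepted can be seen in a toy model of peeking with an offset. Everything below is a hypothetical user-space sketch (a byte queue with a `peek_off` field), loosely following the peek-offset behaviour described for sockets; it is not the kernel implementation. A non-negative offset makes each peek start at the offset and advance it; a negative value switches that mode off again, so peeks start from the head of the queue.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical socket model: a queue of bytes plus a peek offset. */
struct sock_sketch {
    const char *queue;   /* queued payload */
    size_t queue_len;
    int peek_off;        /* negative: peeking with offset is disabled */
};

/* Accept any value; negative values mean "disable". Returns 0 on success. */
int set_peek_off_sketch(struct sock_sketch *sk, int val)
{
    sk->peek_off = val;
    return 0;
}

/* Copy up to n bytes into out without consuming them; returns bytes copied. */
size_t peek_sketch(struct sock_sketch *sk, char *out, size_t n)
{
    size_t start = sk->peek_off < 0 ? 0 : (size_t)sk->peek_off;

    if (start > sk->queue_len)
        start = sk->queue_len;

    size_t avail = sk->queue_len - start;
    size_t copied = n < avail ? n : avail;

    memcpy(out, sk->queue + start, copied);
    if (sk->peek_off >= 0)
        sk->peek_off += (int)copied;   /* the next peek continues where this one ended */
    return copied;
}
```

If the setter rejected negative values, a socket that had once enabled the offset mode would have no way to return to plain peeking.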

Merge branch 'mlxsw-multichain-tc-offload' · 110d8465

Committed by David S. Miller on Aug 23, 2017

Jiri Pirko says:

====================
mlxsw: spectrum: Introduce multichain TC offload

This patchset introduces offloading of rules added to a chain with
a non-zero index, which was previously forbidden. Also, the goto_chain
termination action is offloaded, allowing a jump to the processing
of the desired chain.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

110d8465

mlxsw: spectrum_flower: Offload goto_chain termination action · 0ede6ba2

Committed by Jiri Pirko on Aug 23, 2017

If the action is gact goto_chain, offload it to HW by jumping to another
ruleset.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0ede6ba2
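
The steps this small series describes (look up the target ruleset, obtain its group_id, emit a jump) can be sketched as below. The structures and function names are hypothetical and greatly simplified compared with the driver; the sketch only shows how a chain index is resolved into a jump target.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ruleset record: one per chain, identified in HW by a group id. */
struct ruleset_sketch {
    uint32_t chain_index;
    uint16_t group_id;
};

/* Find the ruleset that serves a given chain index, or NULL if there is none. */
const struct ruleset_sketch *ruleset_lookup_sketch(const struct ruleset_sketch *sets,
                                                   size_t count, uint32_t chain_index)
{
    for (size_t i = 0; i < count; i++) {
        if (sets[i].chain_index == chain_index)
            return &sets[i];
    }
    return NULL;
}

/*
 * Resolve a goto_chain action into a jump target.
 * Returns 0 and stores the group id on success, -1 if the chain has no ruleset.
 */
int offload_goto_chain_sketch(const struct ruleset_sketch *sets, size_t count,
                              uint32_t chain_index, uint16_t *jump_group_id)
{
    const struct ruleset_sketch *target = ruleset_lookup_sketch(sets, count, chain_index);

    if (!target)
        return -1;
    *jump_group_id = target->group_id;
    return 0;
}
```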

mlxsw: spectrum_acl: Provide helper to lookup ruleset · dbec8ee9

Committed by Jiri Pirko on Aug 23, 2017

We need to look up a ruleset in order to offload the goto_chain termination
action. This patch adds a helper for that.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dbec8ee9

mlxsw: spectrum_acl: Allow to get group_id value for a ruleset · 0ade3b64

Committed by Jiri Pirko on Aug 23, 2017

For the goto_chain action we need to know the group_id of the ruleset to jump to.
Provide infrastructure to get it.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0ade3b64
