提交 · 057658cb33fbf4d4309f01fe8845903b1cd07fad · openanolis / cloud-kernel

09 10月, 2017 11 次提交

bridge: suppress arp pkts on BR_NEIGH_SUPPRESS ports · 057658cb

由 Roopa Prabhu 提交于 10月 06, 2017

This patch avoids flooding and proxies arp packets
for BR_NEIGH_SUPPRESS ports.

Moves existing br_do_proxy_arp to br_do_proxy_suppress_arp
to support both proxy arp and neigh suppress.
Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

057658cb

bridge: add new BR_NEIGH_SUPPRESS port flag to suppress arp and nd flood · 821f1b21

由 Roopa Prabhu 提交于 10月 06, 2017

This patch adds a new bridge port flag BR_NEIGH_SUPPRESS to
suppress arp and nd flood on bridge ports. It implements
rfc7432, section 10.
https://tools.ietf.org/html/rfc7432#section-10
for ethernet VPN deployments. It is similar to the existing
BR_PROXYARP* flags but has a few semantic differences to conform
to EVPN standard. Unlike the existing flags, this new flag suppresses
flood of all neigh discovery packets (arp and nd) to tunnel ports.
Supports both vlan filtering and non-vlan filtering bridges.

In case of EVPN, it is mainly used to avoid flooding
of arp and nd packets to tunnel ports like vxlan.

This patch adds netlink and sysfs support to set this bridge port
flag.
Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

821f1b21

ipv6: fix a BUG in rt6_get_pcpu_route() · 951f788a

由 Eric Dumazet 提交于 10月 08, 2017

Ido reported following splat and provided a patch.

[  122.221814] BUG: using smp_processor_id() in preemptible [00000000] code: sshd/2672
[  122.221845] caller is debug_smp_processor_id+0x17/0x20
[  122.221866] CPU: 0 PID: 2672 Comm: sshd Not tainted 4.14.0-rc3-idosch-next-custom #639
[  122.221880] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
[  122.221893] Call Trace:
[  122.221919]  dump_stack+0xb1/0x10c
[  122.221946]  ? _atomic_dec_and_lock+0x124/0x124
[  122.221974]  ? ___ratelimit+0xfe/0x240
[  122.222020]  check_preemption_disabled+0x173/0x1b0
[  122.222060]  debug_smp_processor_id+0x17/0x20
[  122.222083]  ip6_pol_route+0x1482/0x24a0
...

I believe we can simplify this code path a bit, since we no longer
hold a read_lock and need to release it to avoid a dead lock.

By disabling BH, we make sure we'll prevent code re-entry and
rt6_get_pcpu_route()/rt6_make_pcpu_route() run on the same cpu.

Fixes: 66f5d6ce ("ipv6: replace rwlock with rcu and spinlock in fib6_table")
Reported-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Tested-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

951f788a

Merge tag 'mlx5-updates-2017-10-06' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux · 51a0c00c

由 David S. Miller 提交于 10月 08, 2017

Saeed Mahameed says:

====================
Mellanox, mlx5 updates 2017-10-06

This series includes some shared code updates for kernel 4.15 to both
net-next and rdma-next trees.

The series includes mlx5 low level flow steering updates and optimizations
to support firmware command parallelism for flow steering requests from
Maor Gottlieb and two other small fixes from Matan and Maor.

One fix from Matan adds error handling for when the destination
list of the flow steering rule is full.

Maor introduced a patch to avoid NULL pointer dereference on steering cleanup.

Then Some refactoring patches needed by the series for code sharing purposes.
and split the Flow Table Entry (FTE) and Flow Group (FG) creation code to two parts:
    1) Object allocation - allocate the steering node and initialize
    its resources.

    2) The firmware command execution.

This change will give us the ability to take write lock on the
parent node (e.g. FG for FTE creating) only on the software data struct allocation
and creation part of the procedure where the synchronization is really required,
and will allow us to execute multiple firmware commands simultaneously and overcome the
firmware bottleneck.

Refactor the locking scheme of the mlx5 core flow steering as follows:

1) Replace the mutex lock with readers-writers semaphore and take
    the write lock only when necessary (e.g. allocating a new flow
    table entry index or adding a node to the parent's children list).
    When we try to find a suitable child in the parent's children list
    (e.g. search for flow group with the same match_criteria of the rule)
    then we only take the read lock.

2) Add versioning mechanism - each steering entity (FT, FG, FTE, DST)
    will have an incremental version. The version is increased when the
    entity is changed (e.g. when a new FTE was added to FG - the FG's
    version is increased).
    Versioning is used in order to determine if the last traverse of an
    entity's children is valid or a rescan under write lock is required.

Last patch adds FGs and FTEs memory pool, It is useful because these objects
are not small and could be allocated/deallocated many times.

This support improves the insertion rate of steering rules
from ~5k/sec to ~40k/sec.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

51a0c00c

Merge branch 'hv_netvsc-TCP-hash-level' · 28f50eb2

由 David S. Miller 提交于 10月 08, 2017

Haiyang Zhang says:

====================
hv_netvsc: support changing TCP hash level

The patch set simplifies the existing hash level switching code for
UDP. It also adds the support for changing TCP hash level. So users
can switch between L3 an L4 hash levels for TCP and UDP.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

28f50eb2

hv_netvsc: Update netvsc Document for TCP hash level setting · 78005d91

由 Haiyang Zhang 提交于 10月 06, 2017

Update Documentation/networking/netvsc.txt for TCP hash level setting
and related info.
Signed-off-by: NHaiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

78005d91

hv_netvsc: Add ethtool handler to set and get TCP hash levels · 0518ec4f

由 Haiyang Zhang 提交于 10月 06, 2017

The patch supports the options to switch TCP hash level between
L3 and L4 by ethtool command. TCP over IPv4 and v6 can be set
differently. The default hash level is L4. We currently only
allow switching TX hash level from within the guests.

For example, for TCP over IPv4 on eth0:
To include TCP port numbers in hashing:
	ethtool -N eth0 rx-flow-hash tcp4 sdfn
To exclude TCP port numbers in hashing:
	ethtool -N eth0 rx-flow-hash tcp4 sd
To show TCP hash level:
	ethtool -n eth0 rx-flow-hash tcp4
Signed-off-by: NHaiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0518ec4f

hv_netvsc: Change the hash level variable to bit flags · 486e3981

由 Haiyang Zhang 提交于 10月 06, 2017

This simplifies the logic and make it easier to add more
options.
Signed-off-by: NHaiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

486e3981

Merge branch 'mlxsw-more-extack' · c1b85a19

由 David S. Miller 提交于 10月 08, 2017

Jiri Pirko says:

====================
mlxsw: Add more extack error reporting

Ido says:

Add error messages to VLAN and bridge enslavements to help users
understand why the enslavement failed.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c1b85a19

mlxsw: spectrum: Propagate extack further for bridge enslavements · 9b63ef88

由 Ido Schimmel 提交于 10月 08, 2017

The code that actually takes care of bridge offload introduces a few
more non-trivial constraints with regards to bridge enslavements.
Propagate extack there to indicate the reason.

$ ip link add link enp1s0np1 name enp1s0np1.10 type vlan id 10
$ ip link add link enp1s0np1 name enp1s0np1.20 type vlan id 20
$ ip link add name br0 type bridge
$ ip link set dev enp1s0np1.10 master br0
$ ip link set dev enp1s0np1.20 master br0
Error: spectrum: Can not bridge VLAN uppers of the same port.
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Acked-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9b63ef88

mlxsw: spectrum: Add extack for VLAN enslavements · c1f2c6d0

由 Ido Schimmel 提交于 10月 08, 2017

Similar to physical ports, enslavement of VLAN devices can also fail.
Use extack to indicate why the enslavement failed.

$ ip link add link enp1s0np1 name enp1s0np1.10 type vlan id 10
$ ip link add name bond0 type bond mode 802.3ad
$ ip link set dev enp1s0np1.10 master bond0
Error: spectrum: VLAN devices only support bridge and VRF uppers.
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Acked-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c1f2c6d0

08 10月, 2017 29 次提交

Merge branch 'bpf-obj-name-misc' · c9f766bc

由 David S. Miller 提交于 10月 07, 2017

Martin KaFai Lau says:

====================
bpf: Misc improvements and a new usage on bpf obj name

The first two patches make improvements on the bpf obj name.

The last patch adds the prog name to kallsyms.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c9f766bc

bpf: Append prog->aux->name in bpf_get_prog_name() · 368211fb

由 Martin KaFai Lau 提交于 10月 05, 2017

This patch makes the bpf_prog's name available
in kallsyms.

The new format is bpf_prog_tag[_name].

Sample kallsyms from running selftests/bpf/test_progs:
[root@arch-fb-vm1 ~]# egrep ' bpf_prog_[0-9a-fA-F]{16}' /proc/kallsyms
ffffffffa0048000 t bpf_prog_dabf0207d1992486_test_obj_id
ffffffffa0038000 t bpf_prog_a04f5eef06a7f555__123456789ABCDE
ffffffffa0050000 t bpf_prog_a04f5eef06a7f555
Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

368211fb

bpf: Use char in prog and map name · 067cae47

由 Martin KaFai Lau 提交于 10月 05, 2017

Instead of u8, use char for prog and map name.  It can avoid the
userspace tool getting compiler's signess warning.  The
bpf_prog_aux, bpf_map, bpf_attr, bpf_prog_info and
bpf_map_info are changed.
Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

067cae47

bpf: Change bpf_obj_name_cpy() to better ensure map's name is init by 0 · 473d9734

由 Martin KaFai Lau 提交于 10月 05, 2017

During get_info_by_fd, the prog/map name is memcpy-ed.  It depends
on the prog->aux->name and map->name to be zero initialized.

bpf_prog_aux is easy to guarantee that aux->name is zero init.

The name in bpf_map may be harder to be guaranteed in the future when
new map type is added.

Hence, this patch makes bpf_obj_name_cpy() to always zero init
the prog/map name.
Suggested-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

473d9734

ip_gre: check packet length and mtu correctly in erspan tx · f192970d

由 William Tu 提交于 10月 05, 2017

Similarly to early patch for erspan_xmit(), the ARPHDR_ETHER device
is the length of the whole ether packet.  So skb->len should subtract
the dev->hard_header_len.

Fixes: 1a66a836 ("gre: add collect_md mode to ERSPAN tunnel")
Fixes: 84e54fe0 ("gre: introduce native tunnel support for ERSPAN")
Signed-off-by: NWilliam Tu <u9012063@gmail.com>
Cc: Xin Long <lucien.xin@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Reviewed-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f192970d

net: phonet: mark phonet_protocol as const · 548ec114

由 Lin Zhang 提交于 10月 06, 2017

The phonet_protocol structs don't need to be written by anyone and
so can be marked as const.
Signed-off-by: NLin Zhang <xiaolou4617@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

548ec114

net: phonet: mark header_ops as const · 64237470

由 Lin Zhang 提交于 10月 06, 2017

Signed-off-by: NLin Zhang <xiaolou4617@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

64237470

Merge branch 'bpf-perf-time-helpers' · a1d753d2

由 David S. Miller 提交于 10月 07, 2017

Yonghong Song says:

====================
bpf: add two helpers to read perf event enabled/running time

Hardware pmu counters are limited resources. When there are more
pmu based perf events opened than available counters, kernel will
multiplex these events so each event gets certain percentage
(but not 100%) of the pmu time. In case that multiplexing happens,
the number of samples or counter value will not reflect the
case compared to no multiplexing. This makes comparison between
different runs difficult.

Typically, the number of samples or counter value should be
normalized before comparing to other experiments. The typical
normalization is done like:
  normalized_num_samples = num_samples * time_enabled / time_running
  normalized_counter_value = counter_value * time_enabled / time_running
where time_enabled is the time enabled for event and time_running is
the time running for event since last normalization.

This patch set implements two helper functions.
The helper bpf_perf_event_read_value reads counter/time_enabled/time_running
for perf event array map. The helper bpf_perf_prog_read_value read
counter/time_enabled/time_running for bpf prog with type BPF_PROG_TYPE_PERF_EVENT.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a1d753d2

bpf: add a test case for helper bpf_perf_prog_read_value · 81b9cf80