Commits · 8f623a10c31bf7dd9cbb10bb53555b83d5ea3192 · openeuler / Kernel

19 Sep, 2020 13 commits

Merge tag 'batadv-net-for-davem-20200918' of git://git.open-mesh.org/linux-merge · 8f623a10

Authored by David S. Miller on Sep 18, 2020

Simon Wunderlich says:

====================
Here are some batman-adv bugfixes:

 - fix wrong type use in backbone_gw hash, by Linus Luessing

 - disable TT re-routing for multicast packets, by Linus Luessing

 - Add missing include for in_interrupt(), by Sven Eckelmann

 - fix BLA/multicast issues for packets sent via unicast,
   by Linus Luessing (3 patches)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

8f623a10

dpaa2-eth: fix a build warning in dpmac.c · a1285927

Authored by Yangbo Lu on Sep 18, 2020

Fix below sparse warning in dpmac.c.
warning: cast to restricted __le64
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a1285927

net: hns: kerneldoc fixes · a3a94156

Authored by Lu Wei on Sep 18, 2020

Fix some parameter description mistakes.
Signed-off-by: Lu Wei <luwei32@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a3a94156

hinic: fix sending pkts from core while self testing · fc25fa97

Authored by Luo bin on Sep 18, 2020

Call netif_tx_disable() before starting the self-test so that the
networking core cannot send packets at the same time as the self-test,
which may cause self-test failures or abnormal hardware behavior.

Fixes: 4aa218a4 ("hinic: add self test support")
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fc25fa97

Merge branch 'Bugfixes-in-Microsemi-Ocelot-switch-driver' · 2b33b202

Authored by David S. Miller on Sep 18, 2020

Vladimir Oltean says:

====================
Bugfixes in Microsemi Ocelot switch driver

This is a series of 8 assorted patches for "net", on the drivers for the
VSC7514 MIPS switch (Ocelot-1), the VSC9953 PowerPC (Seville), and a few
more that are common to all supported devices since they are in the
common library portion.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

2b33b202

net: mscc: ocelot: deinitialize only initialized ports · e5fb512d

Authored by Vladimir Oltean on Sep 18, 2020

Currently mscc_ocelot_init_ports() will skip initializing a port when it
doesn't have a phy-handle, so the ocelot->ports[port] pointer will be
NULL. Take this into consideration when tearing down the driver, and add
a new function ocelot_deinit_port() to the switch library, mirror of
ocelot_init_port(), which needs to be called by the driver for all ports
it has initialized.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e5fb512d
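
The teardown rule described in the commit above (release only the ports that were actually initialized, skipping NULL slots) can be sketched in plain userspace C. This is a minimal illustration, not the driver code: `struct sw`, `struct port`, `port_init()`, `release_ports()` and `NUM_PORTS` are hypothetical names, and `calloc()`/`free()` stand in for the real per-port setup and teardown.

```c
#include <stddef.h>
#include <stdlib.h>

#define NUM_PORTS 4 /* hypothetical port count */

struct port {
    int initialized;
};

struct sw {
    struct port *ports[NUM_PORTS]; /* NULL means "never initialized" */
};

/* Set up one port; returns 0 on success, -1 if allocation fails. */
static int port_init(struct sw *s, int idx)
{
    struct port *p = calloc(1, sizeof(*p));

    if (!p)
        return -1;
    p->initialized = 1;
    s->ports[idx] = p;
    return 0;
}

/* Tear down only the ports that were set up; returns how many were released. */
static int release_ports(struct sw *s)
{
    int released = 0;

    for (int i = 0; i < NUM_PORTS; i++) {
        if (!s->ports[i])
            continue; /* this port was skipped at init time */
        free(s->ports[i]);
        s->ports[i] = NULL;
        released++;
    }
    return released;
}
```

Clearing each slot after release makes a second call to `release_ports()` a no-op, which is convenient when the same helper is reached from both an error path and a normal unbind.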

net: mscc: ocelot: unregister net devices on unbind · 22cdb493

Authored by Vladimir Oltean on Sep 18, 2020

This driver was not unregistering its network interfaces on unbind.
Now it is.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

22cdb493

net: mscc: ocelot: refactor ports parsing code into a dedicated function · 7c411799

Authored by Vladimir Oltean on Sep 18, 2020

mscc_ocelot_probe() is already pretty large and hard to follow. So move
the code for parsing ports into a separate function.

This makes it easier for the next patch to just call
mscc_ocelot_release_ports from the error path of mscc_ocelot_init_ports.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7c411799

net: mscc: ocelot: error checking when calling ocelot_init() · d1cc0e93

Authored by Vladimir Oltean on Sep 18, 2020

ocelot_init() allocates memory, resets the switch and polls for a status
register, things which can fail. Stop probing the driver in that case,
and propagate the error result.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d1cc0e93

net: mscc: ocelot: check for errors on memory allocation of ports · c9d4b2cf

Authored by Vladimir Oltean on Sep 18, 2020

Do not proceed probing if we couldn't allocate memory for the ports
array, just error out.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c9d4b2cf

net: dsa: seville: fix buffer size of the queue system · a63ed92d

Authored by Vladimir Oltean on Sep 18, 2020

The VSC9953 Seville switch has 2 megabits of buffer split into 4360
words of 60 bytes each.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a63ed92d
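
The figures quoted in the commit above can be checked with a little arithmetic. The sketch below assumes that "2 megabits" means 2 x 2^20 bits (an assumption, the commit message does not say); under that reading, 4360 words of 60 bytes fit inside the buffer with a small remainder. The function and constant names are hypothetical.

```c
#include <stdint.h>

enum {
    WORD_BYTES = 60,   /* bytes per buffer word, from the commit message */
    WORD_COUNT = 4360  /* number of words, from the commit message */
};

/* Total bytes covered by the buffer words. */
static uint32_t queue_buffer_bytes(void)
{
    return WORD_COUNT * WORD_BYTES;
}

/* 2 megabits expressed in bytes, assuming 1 megabit = 2^20 bits. */
static uint32_t two_megabits_in_bytes(void)
{
    return 2u * 1024u * 1024u / 8u;
}
```

With these numbers the words cover 261600 bytes out of 262144, leaving 544 bytes unused.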

net: mscc: ocelot: add locking for the port TX timestamp ID · 6565243c

Authored by Vladimir Oltean on Sep 18, 2020

The ocelot_port->ts_id is used to:
(a) populate skb->cb[0] for matching the TX timestamp in the PTP IRQ
    with an skb.
(b) populate the REW_OP from the injection header of the ongoing skb.
Only then is ocelot_port->ts_id incremented.

This is a problem because, at least theoretically, another timestampable
skb might use the same ocelot_port->ts_id before that is incremented.
Normally all transmit calls are serialized by the netdev transmit
spinlock, but in this case, ocelot_port_add_txtstamp_skb() is also
called by DSA, which has started declaring the NETIF_F_LLTX feature
since commit 2b86cb82 ("net: dsa: declare lockless TX feature for
slave ports").  So the logic of using and incrementing the timestamp id
should be atomic per port.

The solution is to use the global ocelot_port->ts_id only while
protected by the associated ocelot_port->ts_id_lock. That's where we
populate skb->cb[0]. Note that for ocelot, ocelot_port_add_txtstamp_skb
is called for the actual skb, but for felix, it is called for the skb's
clone. That is something which will also be changed in the future.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6565243c
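
The idea in the commit above, reading the per-port timestamp ID and incrementing it as one atomic step, can be sketched in userspace C. This is only an illustration under stated assumptions: a C11 `atomic_flag` spin lock stands in for the kernel spinlock `ts_id_lock`, and `struct port_ts`, `port_ts_init()` and `port_ts_take_id()` are hypothetical names, not the driver's API.

```c
#include <stdatomic.h>
#include <stdint.h>

struct port_ts {
    atomic_flag lock; /* stand-in for ocelot_port->ts_id_lock */
    uint8_t ts_id;    /* next timestamp ID to hand out */
};

static void port_ts_init(struct port_ts *p)
{
    atomic_flag_clear(&p->lock);
    p->ts_id = 0;
}

/* Return the current ID and advance it while holding the lock, so two
 * concurrent transmitters can never observe the same value. */
static uint8_t port_ts_take_id(struct port_ts *p)
{
    uint8_t id;

    while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
        ; /* spin until the lock is free */
    id = p->ts_id;
    p->ts_id++;
    atomic_flag_clear_explicit(&p->lock, memory_order_release);
    return id;
}
```

The caller would store the returned value (in the driver, in skb->cb[0]) rather than re-reading the shared counter later.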

net: mscc: ocelot: fix race condition with TX timestamping · 9dda66ac

Authored by Vladimir Oltean on Sep 18, 2020

The TX-timestampable skb is added late to the ocelot_port->tx_skbs. It
is in a race with the TX timestamp IRQ, which checks that queue trying
to match the timestamp with the skb by the ts_id. The skb should be
added to the queue before the IRQ can fire.

Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9dda66ac

18 Sep, 2020 10 commits

nfp: use correct define to return NONE fec · 5f6857e8

Authored by Jakub Kicinski on Sep 17, 2020

struct ethtool_fecparam carries bitmasks not bit numbers.
We want to return 1 (NONE), not 0.

Fixes: 0d087093 ("nfp: implement ethtool FEC mode settings")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5f6857e8
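
The distinction the commit above relies on, a bit number versus the bitmask built from it, can be shown with local stand-in definitions. The `DEMO_` names below are hypothetical copies written for illustration; only the values 0 (bit number) and 1 (mask) for NONE come from the commit message.

```c
#include <stdint.h>

/* Bit numbers: positions inside the mask, not values to store in it. */
enum demo_fec_bits {
    DEMO_FEC_NONE_BIT, /* bit number 0 */
    DEMO_FEC_AUTO_BIT  /* bit number 1 */
};

/* Bitmasks: what a field "carrying bitmasks" is expected to hold. */
#define DEMO_FEC_NONE (1u << DEMO_FEC_NONE_BIT) /* 0x1 */
#define DEMO_FEC_AUTO (1u << DEMO_FEC_AUTO_BIT) /* 0x2 */

/* Returns nonzero when the given mode mask is present in the field. */
static int fec_mask_has_mode(uint32_t field, uint32_t mode_mask)
{
    return (field & mode_mask) != 0;
}
```

Storing the bit number 0 in the field yields an empty mask, so a consumer testing for NONE would not find it; storing the mask value 1 does what was intended.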

hinic: fix potential resource leak · ce000c61

Authored by Wei Li on Sep 17, 2020

rx_request_irq() simply returns whatever irq_set_affinity_hint()
returns. If that call fails, the napi and the requested irq are not
freed properly. So add error exits that release them on failure.
Signed-off-by: Wei Li <liwei391@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ce000c61
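
The kind of error unwinding the commit above adds can be sketched with the usual goto-label pattern. Everything here is a hypothetical stand-in: the `demo_*` helpers and the two counters only model "a napi context is held" and "an irq is held"; they are not the hinic or kernel functions.

```c
#include <stdbool.h>

static int napi_live; /* number of napi contexts currently held */
static int irq_live;  /* number of irqs currently held */

static void demo_napi_add(void) { napi_live++; }
static void demo_napi_del(void) { napi_live--; }
static void demo_free_irq(void) { irq_live--; }

static int demo_request_irq(bool fail)
{
    if (fail)
        return -1;
    irq_live++;
    return 0;
}

static int demo_set_affinity_hint(bool fail)
{
    return fail ? -1 : 0;
}

/* Acquire resources in order; on failure, release in reverse order. */
static int demo_rx_request_irq(bool irq_fails, bool hint_fails)
{
    int err;

    demo_napi_add();

    err = demo_request_irq(irq_fails);
    if (err)
        goto err_req_irq;

    err = demo_set_affinity_hint(hint_fails);
    if (err)
        goto err_irq_affinity;

    return 0;

err_irq_affinity:
    demo_free_irq();
err_req_irq:
    demo_napi_del();
    return err;
}
```

Each label undoes exactly the steps that succeeded before the failing one, so a failure in the last step leaves nothing held.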

Merge branch 'net-phy-Unbind-fixes' · 0dfdbc74

Authored by David S. Miller on Sep 17, 2020

Florian Fainelli says:

====================
net: phy: Unbind fixes

This patch series fixes a couple of issues with the unbinding of the PHY
drivers and then bringing down a network interface. The first is a NULL
pointer de-reference and the second was an incorrect warning being
triggered.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

0dfdbc74

net: phy: Do not warn in phy_stop() on PHY_DOWN · 5116a8ad

Authored by Florian Fainelli on Sep 16, 2020

When phy_is_started() was added to catch incorrect PHY states,
phy_stop() would not be qualified against PHY_DOWN. It is possible to
reach that state when the PHY driver has been unbound and the network
device is then brought down.

Fixes: 2b3e88ea ("net: phy: improve phy state checking")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

5116a8ad

net: phy: Avoid NPD upon phy_detach() when driver is unbound · c2b727df

Authored by Florian Fainelli on Sep 16, 2020

If we have unbound the PHY driver prior to calling phy_detach() (often
via phy_disconnect()) then we can cause a NULL pointer de-reference
accessing the driver owner member. The steps to reproduce are:

echo unimac-mdio-0:01 > /sys/class/net/eth0/phydev/driver/unbind
ip link set eth0 down

Fixes: cafe8df8 ("net: phy: Fix lack of reference count on PHY driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

c2b727df

ethtool: add and use message type for tunnel info reply · 19a83d36

Authored by Michal Kubecek on Sep 17, 2020

Tunnel offload info code uses ETHTOOL_MSG_TUNNEL_INFO_GET message type (cmd
field in genetlink header) for replies to tunnel info netlink request, i.e.
the same value as the request has. This is a problem because we are using
two separate enums for userspace to kernel and kernel to userspace message
types, so this ETHTOOL_MSG_TUNNEL_INFO_GET (28) collides with
ETHTOOL_MSG_CABLE_TEST_TDR_NTF, which is what message type 28 means for
kernel to userspace messages.

As the tunnel info request reached mainline in 5.9 merge window, we should
still be able to fix the reply message type without breaking backward
compatibility.

Fixes: c7d759eb ("ethtool: add tunnel info interface")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

19a83d36
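
The collision described in the commit above comes from numbering requests and replies in two independent enums. A minimal sketch, in which only the value 28 and its two meanings are taken from the commit message and all `demo_`/`DEMO_` names are hypothetical:

```c
/* Userspace-to-kernel message types (requests). */
enum demo_user_to_kernel {
    DEMO_U2K_TUNNEL_INFO_GET = 28
};

/* Kernel-to-userspace message types (replies and notifications). */
enum demo_kernel_to_user {
    DEMO_K2U_CABLE_TEST_TDR_NTF = 28
};

/* A receiver in userspace decodes cmd using the kernel-to-user numbering. */
static const char *demo_decode_reply(int cmd)
{
    switch (cmd) {
    case DEMO_K2U_CABLE_TEST_TDR_NTF:
        return "cable test TDR notification";
    default:
        return "unknown";
    }
}
```

A reply that reuses the request's value is therefore decoded as an unrelated notification, which is why the reply needs its own entry in the kernel-to-userspace enum.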

drivers/net/wan/hdlc: Set skb->protocol before transmitting · 9fb030a7

Authored by Xie He on Sep 16, 2020

This patch sets skb->protocol before transmitting frames on the HDLC
device, so that a user listening on the HDLC device with an AF_PACKET
socket will see outgoing frames' sll_protocol field correctly set and
consistent with that of incoming frames.

1. Control frames in hdlc_cisco and hdlc_ppp

When these drivers send control frames, skb->protocol is not set.

This value should be set to htons(ETH_P_HDLC), because when receiving
control frames, their skb->protocol is set to htons(ETH_P_HDLC).

When receiving, hdlc_type_trans in hdlc.h is called, which then calls
cisco_type_trans or ppp_type_trans. The skb->protocol of control frames
is set to htons(ETH_P_HDLC) so that the control frames can be received
by hdlc_rcv in hdlc.c, which calls cisco_rx or ppp_rx to process the
control frames.

2. hdlc_fr

When this driver sends control frames, skb->protocol is set to internal
values used in this driver.

When this driver sends data frames (from upper stacked PVC devices),
skb->protocol is the same as that of the user data packet being sent on
the upper PVC device (for normal PVC devices), or is htons(ETH_P_802_3)
(for Ethernet-emulating PVC devices).

However, skb->protocol for both control frames and data frames should be
set to htons(ETH_P_HDLC), because when receiving, all frames received on
the HDLC device will have their skb->protocol set to htons(ETH_P_HDLC).

When receiving, hdlc_type_trans in hdlc.h is called, and because this
driver doesn't provide a type_trans function in struct hdlc_proto,
all frames will have their skb->protocol set to htons(ETH_P_HDLC).
The frames are then received by hdlc_rcv in hdlc.c, which calls fr_rx
to process the frames (control frames are consumed and data frames
are re-received on upper PVC devices).

Cc: Krzysztof Halasa <khc@pm.waw.pl>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9fb030a7

drivers/net/wan/lapbether: Make skb->protocol consistent with the header · 83f9a9c8

Authored by Xie He on Sep 16, 2020

This driver is a virtual driver stacked on top of Ethernet interfaces.

When this driver transmits data on the Ethernet device, the skb->protocol
setting is inconsistent with the Ethernet header prepended to the skb.

This causes a user listening on the Ethernet interface with an AF_PACKET
socket, to see different sll_protocol values for incoming and outgoing
frames, because incoming frames would have this value set by parsing the
Ethernet header.

This patch changes the skb->protocol value for outgoing Ethernet frames,
making it consistent with the Ethernet header prepended. This makes a
user listening on the Ethernet device with an AF_PACKET socket, to see
the same sll_protocol value for incoming and outgoing frames.

Cc: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

83f9a9c8

cxgb4: fix memory leak during module unload · f4a26a9b

Authored by Raju Rangoju on Sep 16, 2020

Fix the memory leak in mps during the module unload
path by freeing mps reference entries if the list
adapter->mps_ref is not already empty.

Fixes: 28b38705 ("cxgb4: Re-work the logic for mps refcounting")
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f4a26a9b

hv_netvsc: Add validation for untrusted Hyper-V values · 44144185

Authored by Andres Beltran on Sep 16, 2020

For additional robustness in the face of Hyper-V errors or malicious
behavior, validate all values that originate from packets that Hyper-V
has sent to the guest in the host-to-guest ring buffer. Ensure that
invalid values cannot cause indexing off the end of an array, or
subvert an existing validation via integer overflow. Ensure that
outgoing packets do not have any leftover guest memory that has not
been zeroed out.
Signed-off-by: Andres Beltran <lkmlabelt@gmail.com>
Co-developed-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

44144185
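
Two of the checks named in the commit above, keeping an untrusted index inside an array and making sure an offset/length check cannot be bypassed by integer overflow, can be sketched generically. The helper names are hypothetical and this is not the hv_netvsc code; it only shows the shape such validation usually takes.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when [offset, offset + len) lies inside a buffer of buf_len bytes.
 * Written with a subtraction so that offset + len is never computed and
 * therefore cannot wrap around. */
static bool range_in_buffer(uint32_t offset, uint32_t len, uint32_t buf_len)
{
    if (offset > buf_len)
        return false;
    return len <= buf_len - offset;
}

/* True when an untrusted index can be used on a table of table_len entries. */
static bool index_in_table(uint32_t idx, uint32_t table_len)
{
    return idx < table_len;
}
```

A naive `offset + len <= buf_len` test would accept offset = 0xFFFFFFF0 and len = 0x20, because the 32-bit sum wraps to 0x10; the subtraction form rejects it.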

17 Sep, 2020 3 commits

net: dsa: microchip: ksz8795: really set the correct number of ports · fd944dc2

Authored by Matthias Schiffer on Sep 16, 2020

The KSZ9477 and KSZ8795 use the port_cnt field differently: For the
KSZ9477, it includes the CPU port(s), while for the KSZ8795, it doesn't.

It would be a good cleanup to make the handling of both drivers match,
but as a first step, fix the recently broken assignment of num_ports in
the KSZ8795 driver (which completely broke probing, as the CPU port
index was always failing the num_ports check).

Fixes: af199a1a ("net: dsa: microchip: set the correct number of ports")
Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Reviewed-by: Codrin Ciubotariu <codrin.ciubotariu@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fd944dc2

geneve: add transport ports in route lookup for geneve · 34beb215

Authored by Mark Gray on Sep 16, 2020

This patch adds transport ports information for route lookup so that
IPsec can select Geneve tunnel traffic to do encryption. This is
needed for OVS/OVN IPsec with encrypted Geneve tunnels.

This can be tested by configuring a host-host VPN using an IKE
daemon and specifying port numbers. For example, for an
Openswan-type configuration, the following parameters should be
configured on both hosts and IPsec set up as-per normal:

$ cat /etc/ipsec.conf

conn in
...
left=$IP1
right=$IP2
...
leftprotoport=udp/6081
rightprotoport=udp
...
conn out
...
left=$IP1
right=$IP2
...
leftprotoport=udp
rightprotoport=udp/6081
...

The tunnel can then be set up using "ip" on both hosts (but
changing the relevant IP addresses):

$ ip link add tun type geneve id 1000 remote $IP2
$ ip addr add 192.168.0.1/24 dev tun
$ ip link set tun up

This can then be tested by pinging from $IP1:

$ ping 192.168.0.2

Without this patch the traffic is unencrypted on the wire.

Fixes: 2d07dc79 ("geneve: add initial netdev driver for GENEVE tunnels")
Signed-off-by: Qiuyu Xiao <qiuyu.xiao.qyx@gmail.com>
Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

34beb215

net: hns: kerneldoc fixes · 5f1ab0f4

Authored by Lu Wei on Sep 16, 2020

Fix some parameter description or spelling mistakes.
Signed-off-by: Lu Wei <luwei32@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5f1ab0f4

16 Sep, 2020 6 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · d5d325ea

Authored by David S. Miller on Sep 15, 2020

Alexei Starovoitov says:

====================
pull-request: bpf 2020-09-15

The following pull-request contains BPF updates for your *net* tree.

We've added 12 non-merge commits during the last 19 day(s) which contain
a total of 10 files changed, 47 insertions(+), 38 deletions(-).

The main changes are:

1) docs/bpf fixes, from Andrii.

2) ld_abs fix, from Daniel.

3) socket casting helpers fix, from Martin.

4) hash iterator fixes, from Yonghong.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

d5d325ea

bpf: Fix a rcu warning for bpffs map pretty-print · ce880cb8

Authored by Yonghong Song on Sep 15, 2020

Running selftest
  ./test_btf -p
the kernel had the following warning:
  [   51.528185] WARNING: CPU: 3 PID: 1756 at kernel/bpf/hashtab.c:717 htab_map_get_next_key+0x2eb/0x300
  [   51.529217] Modules linked in:
  [   51.529583] CPU: 3 PID: 1756 Comm: test_btf Not tainted 5.9.0-rc1+ #878
  [   51.530346] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.el7.centos 04/01/2014
  [   51.531410] RIP: 0010:htab_map_get_next_key+0x2eb/0x300
  ...
  [   51.542826] Call Trace:
  [   51.543119]  map_seq_next+0x53/0x80
  [   51.543528]  seq_read+0x263/0x400
  [   51.543932]  vfs_read+0xad/0x1c0
  [   51.544311]  ksys_read+0x5f/0xe0
  [   51.544689]  do_syscall_64+0x33/0x40
  [   51.545116]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

The related source code in kernel/bpf/hashtab.c:
  709 static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
  710 {
  711         struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
  712         struct hlist_nulls_head *head;
  713         struct htab_elem *l, *next_l;
  714         u32 hash, key_size;
  715         int i = 0;
  716
  717         WARN_ON_ONCE(!rcu_read_lock_held());

In kernel/bpf/inode.c, bpffs map pretty print calls map->ops->map_get_next_key()
without holding rcu_read_lock(), hence causing the above warning.
To fix the issue, surround map->ops->map_get_next_key() with the rcu read lock.

Fixes: a26ca7c9 ("bpf: btf: Add pretty print support to the basic arraymap")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200916004401.146277-1-yhs@fb.com

ce880cb8

bpf: Bpf_skc_to_* casting helpers require a NULL check on sk · 8c33dadc

Authored by Martin KaFai Lau on Sep 15, 2020

The bpf_skc_to_* type casting helpers are available to
BPF_PROG_TYPE_TRACING.  The traced PTR_TO_BTF_ID may be NULL.
For example, the skb->sk may be NULL.  Thus, these casting helpers
need to check "!sk" also and this patch fixes them.

Fixes: 0d4fad3e ("bpf: Add bpf_skc_to_udp6_sock() helper")
Fixes: 478cfbdf ("bpf: Add bpf_skc_to_{tcp, tcp_timewait, tcp_request}_sock() helpers")
Fixes: af7ec138 ("bpf: Add bpf_skc_to_tcp6_sock() helper")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200915182959.241101-1-kafai@fb.com

8c33dadc

ipv4: Update exception handling for multipath routes via same device · 2fbc6e89

Authored by David Ahern on Sep 14, 2020

Kfir reported that pmtu exceptions are not created properly for
deployments where multipath routes use the same device.

After some digging I see 2 compounding problems:
1. ip_route_output_key_hash_rcu is updating the flowi4_oif *after*
   the route lookup. This is the second use case where this has
   been a problem (the first is related to use of vti devices with
   VRF). I can not find any reason for the oif to be changed after the
   lookup; the code goes back to the start of git. It does not seem
   logical so remove it.

2. fib_lookups for exceptions do not call fib_select_path to handle
   multipath route selection based on the hash.

The end result is that the fib_lookup used to add the exception
always creates it using the first leg of the route.

An example topology showing the problem:

                 |  host1
             +------+
             | eth0 |  .209
             +------+
                 |
             +------+
     switch  | br0  |
             +------+
                 |
       +---------+---------+
       | host2             |  host3
   +------+             +------+
   | eth0 | .250        | eth0 | 192.168.252.252
   +------+             +------+

   +-----+             +-----+
   | vti | .2          | vti | 192.168.247.3
   +-----+             +-----+
       \                  /
 =================================
 tunnels
         192.168.247.1/24

for h in host1 host2 host3; do
        ip netns add ${h}
        ip -netns ${h} link set lo up
        ip netns exec ${h} sysctl -wq net.ipv4.ip_forward=1
done

ip netns add switch
ip -netns switch li set lo up
ip -netns switch link add br0 type bridge stp 0
ip -netns switch link set br0 up

for n in 1 2 3; do
        ip -netns switch link add eth-sw type veth peer name eth-h${n}
        ip -netns switch li set eth-h${n} master br0 up
        ip -netns switch li set eth-sw netns host${n} name eth0
done

ip -netns host1 addr add 192.168.252.209/24 dev eth0
ip -netns host1 link set dev eth0 up
ip -netns host1 route add 192.168.247.0/24 \
        nexthop via 192.168.252.250 dev eth0 nexthop via 192.168.252.252 dev eth0

ip -netns host2 addr add 192.168.252.250/24 dev eth0
ip -netns host2 link set dev eth0 up

ip -netns host3 addr add 192.168.252.252/24 dev eth0
ip -netns host3 link set dev eth0 up

ip netns add tunnel
ip -netns tunnel li set lo up
ip -netns tunnel li add br0 type bridge
ip -netns tunnel li set br0 up
for n in $(seq 11 20); do
        ip -netns tunnel addr add dev br0 192.168.247.${n}/24
done

for n in 2 3
do
        ip -netns tunnel link add vti${n} type veth peer name eth${n}
        ip -netns tunnel link set eth${n} mtu 1360 master br0 up
        ip -netns tunnel link set vti${n} netns host${n} mtu 1360 up
        ip -netns host${n} addr add dev vti${n} 192.168.247.${n}/24
done
ip -netns tunnel ro add default nexthop via 192.168.247.2 nexthop via 192.168.247.3

ip netns exec host1 ping -M do -s 1400 -c3 -I 192.168.252.209 192.168.247.11
ip netns exec host1 ping -M do -s 1400 -c3 -I 192.168.252.209 192.168.247.15
ip -netns host1 ro ls cache

Before this patch the cache always shows exceptions against the first
leg in the multipath route; 192.168.252.250 per this example. Since the
hash has an initial random seed, you may need to vary the final octet
more than what is listed. In my tests, using addresses between 11 and 19
usually found 1 that used both legs.

With this patch, the cache will have exceptions for both legs.

Fixes: 4895c771 ("ipv4: Add FIB nexthop exceptions")
Reported-by: Kfir Itzhak <mastertheknife@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

2fbc6e89

net: tipc: kerneldoc fixes · 2e5117ba

Authored by Lu Wei on Sep 15, 2020

Fix parameter description of tipc_link_bc_create()
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 16ad3f40 ("tipc: introduce variable window congestion control")
Signed-off-by: Lu Wei <luwei32@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2e5117ba

ibmvnic: update MAINTAINERS · d3f2ef18

Authored by Dany Madden on Sep 14, 2020

Update supporters for IBM Power SRIOV Virtual NIC Device Driver.
Thomas Falcon is moving on to other work. Dany Madden, Lijun Pan
and Sukadev Bhattiprolu are the current supporters.
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d3f2ef18

15 Sep, 2020 8 commits

batman-adv: mcast: fix duplicate mcast packets from BLA backbone to mesh · 2369e827

Authored by Linus Lüssing on Sep 15, 2020

Scenario:
* Multicast frame sent from BLA backbone gateways (multiple nodes
  with their bat0 bridged together, with BLA enabled) sharing the same
  LAN to nodes in the mesh

Issue:
* Nodes receive the frame multiple times on bat0 from the mesh,
  once from each foreign BLA backbone gateway which shares the same LAN
  with another

For multicast frames via batman-adv broadcast packets coming from the
same BLA backbone but from different backbone gateways duplicates are
currently detected via a CRC history of previously received packets.

However, this CRC check was so far not performed for multicast frames
received via batman-adv unicast packets. Fix this by applying the same
check to such packets, too.

Room for improvements in the future: Ideally we would introduce the
possibility to not only claim a client, but a complete originator, too.
This would allow us to only send a multicast-in-unicast packet from a BLA
backbone gateway claiming the node and by that avoid potential redundant
transmissions in the first place.

Fixes: 279e89b2 ("batman-adv: add broadcast duplicate check")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>

2369e827

batman-adv: mcast: fix duplicate mcast packets in BLA backbone from mesh · 74c09b72

Authored by Linus Lüssing on Sep 15, 2020

Scenario:
* Multicast frame sent from mesh to a BLA backbone (multiple nodes
  with their bat0 bridged together, with BLA enabled)

Issue:
* BLA backbone nodes receive the frame multiple times on bat0,
  once from mesh->bat0 and once from each backbone_gw from LAN

For unicast, a node will send only to the best backbone gateway
according to the TQ. However for multicast we currently cannot determine
if multiple destination nodes share the same backbone if they don't share
the same backbone with us. So we need to keep sending the unicasts to
all backbone gateways and let the backbone gateways decide which one
will forward the frame. We can use the CLAIM mechanism to make this
decision.

One catch: The batman-adv gateway feature for DHCP packets potentially
sends multicast packets in the same batman-adv unicast header as the
multicast optimizations code. And we are not allowed to drop those even
if we did not claim the source address of the sender, as for such
packets there is only this one multicast-in-unicast packet.

How can we distinguish the two cases?

The gateway feature uses a batman-adv unicast 4 address header. While
the multicast-to-unicasts feature uses a simple, 3 address batman-adv
unicast header. So let's use this to distinguish.

Fixes: fe2da6ff ("batman-adv: check incoming packet type for bla")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>

74c09b72

batman-adv: mcast: fix duplicate mcast packets in BLA backbone from LAN · 3236d215

Authored by Linus Lüssing on Sep 15, 2020

Scenario:
* Multicast frame sent from a BLA backbone (multiple nodes with
  their bat0 bridged together, with BLA enabled)

Issue:
* BLA backbone nodes receive the frame multiple times on bat0

For multicast frames received via batman-adv broadcast packets the
originator of the broadcast packet is checked before decapsulating and
forwarding the frame to bat0 (batadv_bla_is_backbone_gw()->
batadv_recv_bcast_packet()). If it came from a node which shares the
same BLA backbone with us then it is not forwarded to bat0 to avoid a
loop.

When sending a multicast frame in a non-4-address batman-adv unicast
packet we are currently missing this check - and cannot do so because
the batman-adv unicast packet has no originator address field.

However, we can simply fix this on the sender side by only sending the
multicast frame via unicasts to interested nodes which do not share the
same BLA backbone with us. This also nicely avoids some unnecessary
transmissions on mesh side.

Note that no infinite loop was observed, probably because of dropping
via batadv_interface_tx()->batadv_bla_tx(). However, the duplicates still
utterly confuse switches/bridges, ICMPv6 duplicate address detection and
neighbor discovery, and therefore lead to long delays before being able
to establish TCP connections, for instance. They also lead to the Linux
bridge printing messages like:
"br-lan: received packet on eth1 with own address as source address ..."

Fixes: 2d3f6ccc ("batman-adv: Modified forwarding behaviour for multicast packets")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>

3236d215

docs/bpf: Remove source code links · 65dce596

Authored by Andrii Nakryiko on Sep 14, 2020

Make the path to bench_ringbufs.c plain text, not a special link.

Fixes: 97abb2b3 ("docs/bpf: Add BPF ring buffer design notes")
Reported-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200915005031.2748397-1-andriin@fb.com

65dce596

xsk: Fix number of pinned pages/umem size discrepancy · 2b1667e5

Authored by Björn Töpel on Sep 10, 2020

For AF_XDP sockets, there was a discrepancy between the number of
pinned pages and the size of the umem region.

The size of the umem region is used to validate the AF_XDP descriptor
addresses. The logic that pinned the pages covered by the region only
took whole pages into consideration, creating a mismatch between the
size and pinned pages. A user could then pass AF_XDP addresses outside
the range of pinned pages, but still within the size of the region,
crashing the kernel.

This change correctly calculates the number of pages to be
pinned. Further, the size check for the aligned mode is
simplified. Now the code simply checks if the size is divisible by the
chunk size.

Fixes: bbff2f32 ("xsk: new descriptor addressing scheme")
Reported-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200910075609.7904-1-bjorn.topel@gmail.com

2b1667e5
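
The mismatch described in the commit above, counting only whole pages while validating addresses against the full region size, can be illustrated with simple arithmetic. This is a sketch under assumptions: a 4096-byte page size is assumed, and `pages_floor()`, `pages_to_pin()` and `size_fits_chunks()` are hypothetical helpers, not the xsk code.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u /* assumed page size for the illustration */

/* Whole pages only: a trailing partial page is left uncovered. */
static uint64_t pages_floor(uint64_t size)
{
    return size / DEMO_PAGE_SIZE;
}

/* Pages needed to cover every byte of the region, partial page included. */
static uint64_t pages_to_pin(uint64_t size)
{
    uint64_t npgs = size / DEMO_PAGE_SIZE;

    if (size % DEMO_PAGE_SIZE)
        npgs++;
    return npgs;
}

/* Aligned-mode style check: the size must be a multiple of the chunk size. */
static bool size_fits_chunks(uint64_t size, uint32_t chunk_size)
{
    return chunk_size != 0 && size % chunk_size == 0;
}
```

For a 6144-byte region, the whole-page count is 1 while addresses up to byte 6143 pass the size check, so the second page must be pinned as well.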

net: sched: initialize with 0 before setting erspan md->u · 8e1b3ac4

Authored by Xin Long on Sep 13, 2020

In fl_set_erspan_opt(), all bits of erspan md are initially set to 1,
as this function is also used to set the opt MASK. However, when setting
md->u.index for the opt VALUE, the remaining bits of the union md->u
are left as 1. This causes the match of the whole md to fail when the
version is 1 and only index is set.

Fix this by initializing erspan md->u to 0 before setting it.
Reported-by: Shuang Li <shuali@redhat.com>
Fixes: 79b1011c ("net: sched: allow flower to match erspan options")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8e1b3ac4
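
The effect described in the commit above, writing a small union member after the whole structure was filled with 1 bits, can be reproduced with a stand-in layout. `struct demo_md` and `demo_set_index()` are hypothetical; the only assumption carried over from the commit message is that the index member is smaller than the union that contains it.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in: a union whose index member does not cover
 * all of the union's bytes. */
struct demo_md {
    int version;
    union {
        uint32_t index;
        uint8_t raw[8];
    } u;
};

/* Fill md as a VALUE with only index set. When zero_first is 0 the
 * bytes of u not covered by index keep the 0xff fill pattern. */
static void demo_set_index(struct demo_md *md, uint32_t index, int zero_first)
{
    memset(md, 0xff, sizeof(*md)); /* default fill, suitable for a MASK */
    md->version = 1;
    if (zero_first)
        memset(&md->u, 0, sizeof(md->u));
    md->u.index = index;
}
```

Comparing the whole union against metadata that carries only an index then fails in the first case and succeeds in the second.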

Merge branch 'net-improve-vxlan-option-process-in-net_sched-and-lwtunnel' · ad7b27c9

Authored by David S. Miller on Sep 14, 2020

Xin Long says:

====================
net: improve vxlan option process in net_sched and lwtunnel

This patch applies a mask when setting the vxlan option in net_sched
and lwtunnel, so that only the available bits can be set in vxlan md gbp.

This helps when users don't know exactly which gbp bits vxlan uses, and
avoids mismatches caused by users setting unavailable bits.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ad7b27c9

lwtunnel: only keep the available bits when setting vxlan md->gbp · 681d2cfb

Authored by Xin Long on Sep 13, 2020

As we can see from vxlan_build/parse_gbp_hdr(), when processing metadata
on the vxlan rx/tx path, only the dont_learn/policy_applied/policy_id
fields can be set in or parsed from the packet for the vxlan gbp option.

So apply the mask when setting it in lwtunnel, as is done in
act_tunnel_key and cls_flower.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

681d2cfb

openeuler / Kernel synced successfully almost 2 years ago
