1. 27 Apr, 2022 1 commit
  2. 26 Apr, 2022 4 commits
    • virtio_net: fix wrong buf address calculation when using xdp · acb16b39
      Nikolay Aleksandrov committed
      We received a report[1] of kernel crashes when Cilium is used in XDP
      mode with virtio_net after updating to newer kernels. After
      investigating the reason it turned out that when using mergeable bufs
      with an XDP program which adjusts xdp.data or xdp.data_meta, page_to_skb()
      calculates the build_skb address wrong because the offset can become less
      than the headroom, so it gets the address of the previous page (-X bytes,
      depending on how much lower the offset is):
       page_to_skb: page addr ffff9eb2923e2000 buf ffff9eb2923e1ffc offset 252 headroom 256
      
      This is a pr_err() I added in the beginning of page_to_skb which clearly
      shows offset that is less than headroom by adding 4 bytes of metadata
      via an xdp prog. The calculations done are:
       receive_mergeable():
       headroom = VIRTIO_XDP_HEADROOM; // VIRTIO_XDP_HEADROOM == 256 bytes
       offset = xdp.data - page_address(xdp_page) -
                vi->hdr_len - metasize;
      
       page_to_skb():
       p = page_address(page) + offset;
       ...
       buf = p - headroom;
      
      Now buf goes -4 bytes from the page's starting address as can be seen
      above which is set as skb->head and skb->data by build_skb later. Depending
      on what's done with the skb (when it's freed most often) we get all kinds
      of corruptions and BUG_ON() triggers in mm[2]. We have to recalculate
      the new headroom after the xdp program has run, similar to how offset
      and len are recalculated. Headroom is directly related to
      data_hard_start, data and data_meta, so we use them to get the new size.
      The result is correct (similar pr_err() in page_to_skb, one case of
      xdp_page and one case of virtnet buf):
       a) Case with 4 bytes of metadata
       [  115.949641] page_to_skb: page addr ffff8b4dcfad2000 offset 252 headroom 252
       [  121.084105] page_to_skb: page addr ffff8b4dcf018000 offset 20732 headroom 252
       b) Case of pushing data +32 bytes
       [  153.181401] page_to_skb: page addr ffff8b4dd0c4d000 offset 288 headroom 288
       [  158.480421] page_to_skb: page addr ffff8b4dd00b0000 offset 24864 headroom 288
       c) Case of pushing data -33 bytes
       [  835.906830] page_to_skb: page addr ffff8b4dd3270000 offset 223 headroom 223
       [  840.839910] page_to_skb: page addr ffff8b4dcdd68000 offset 12511 headroom 223
      
      Offset and headroom are equal because offset points to the start of
      reserved bytes for the virtio_net header which are at buf start +
      headroom, while data points at buf start + vnet hdr size + headroom so
      when data or data_meta are adjusted by the xdp prog both the headroom size
      and the offset change equally. We can use data_hard_start to compute the
      new headroom after the xdp prog (linearized / page start case, the
      virtnet buf case is similar just with bigger base offset):
       xdp.data_hard_start = page_address + vnet_hdr
       xdp.data = page_address + vnet_hdr + headroom
       new headroom after xdp prog = xdp.data - xdp.data_hard_start - metasize
      
      An example reproducer xdp prog[3] is below.
      
      [1] https://github.com/cilium/cilium/issues/19453
      
      [2] Two of the many traces:
       [   40.437400] BUG: Bad page state in process swapper/0  pfn:14940
       [   40.916726] BUG: Bad page state in process systemd-resolve  pfn:053b7
       [   41.300891] kernel BUG at include/linux/mm.h:720!
       [   41.301801] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
       [   41.302784] CPU: 1 PID: 1181 Comm: kubelet Kdump: loaded Tainted: G    B   W         5.18.0-rc1+ #37
       [   41.304458] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014
       [   41.306018] RIP: 0010:page_frag_free+0x79/0xe0
       [   41.306836] Code: 00 00 75 ea 48 8b 07 a9 00 00 01 00 74 e0 48 8b 47 48 48 8d 50 ff a8 01 48 0f 45 fa eb d0 48 c7 c6 18 b8 30 a6 e8 d7 f8 fc ff <0f> 0b 48 8d 78 ff eb bc 48 8b 07 a9 00 00 01 00 74 3a 66 90 0f b6
       [   41.310235] RSP: 0018:ffffac05c2a6bc78 EFLAGS: 00010292
       [   41.311201] RAX: 000000000000003e RBX: 0000000000000000 RCX: 0000000000000000
       [   41.312502] RDX: 0000000000000001 RSI: ffffffffa6423004 RDI: 00000000ffffffff
       [   41.313794] RBP: ffff993c98823600 R08: 0000000000000000 R09: 00000000ffffdfff
       [   41.315089] R10: ffffac05c2a6ba68 R11: ffffffffa698ca28 R12: ffff993c98823600
       [   41.316398] R13: ffff993c86311ebc R14: 0000000000000000 R15: 000000000000005c
       [   41.317700] FS:  00007fe13fc56740(0000) GS:ffff993cdd900000(0000) knlGS:0000000000000000
       [   41.319150] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [   41.320152] CR2: 000000c00008a000 CR3: 0000000014908000 CR4: 0000000000350ee0
       [   41.321387] Call Trace:
       [   41.321819]  <TASK>
       [   41.322193]  skb_release_data+0x13f/0x1c0
       [   41.322902]  __kfree_skb+0x20/0x30
       [   41.343870]  tcp_recvmsg_locked+0x671/0x880
       [   41.363764]  tcp_recvmsg+0x5e/0x1c0
       [   41.384102]  inet_recvmsg+0x42/0x100
       [   41.406783]  ? sock_recvmsg+0x1d/0x70
       [   41.428201]  sock_read_iter+0x84/0xd0
       [   41.445592]  ? 0xffffffffa3000000
       [   41.462442]  new_sync_read+0x148/0x160
       [   41.479314]  ? 0xffffffffa3000000
       [   41.496937]  vfs_read+0x138/0x190
       [   41.517198]  ksys_read+0x87/0xc0
       [   41.535336]  do_syscall_64+0x3b/0x90
       [   41.551637]  entry_SYSCALL_64_after_hwframe+0x44/0xae
       [   41.568050] RIP: 0033:0x48765b
       [   41.583955] Code: e8 4a 35 fe ff eb 88 cc cc cc cc cc cc cc cc e8 fb 7a fe ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
       [   41.632818] RSP: 002b:000000c000a2f5b8 EFLAGS: 00000212 ORIG_RAX: 0000000000000000
       [   41.664588] RAX: ffffffffffffffda RBX: 000000c000062000 RCX: 000000000048765b
       [   41.681205] RDX: 0000000000005e54 RSI: 000000c000e66000 RDI: 0000000000000016
       [   41.697164] RBP: 000000c000a2f608 R08: 0000000000000001 R09: 00000000000001b4
       [   41.713034] R10: 00000000000000b6 R11: 0000000000000212 R12: 00000000000000e9
       [   41.728755] R13: 0000000000000001 R14: 000000c000a92000 R15: ffffffffffffffff
       [   41.744254]  </TASK>
       [   41.758585] Modules linked in: br_netfilter bridge veth netconsole virtio_net
      
       and
      
       [   33.524802] BUG: Bad page state in process systemd-network  pfn:11e60
       [   33.528617] page ffffe05dc0147b00 ffffe05dc04e7a00 ffff8ae9851ec000 (1) len 82 offset 252 metasize 4 hroom 0 hdr_len 12 data ffff8ae9851ec10c data_meta ffff8ae9851ec108 data_end ffff8ae9851ec14e
       [   33.529764] page:000000003792b5ba refcount:0 mapcount:-512 mapping:0000000000000000 index:0x0 pfn:0x11e60
       [   33.532463] flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
       [   33.532468] raw: 000fffffc0000000 0000000000000000 dead000000000122 0000000000000000
       [   33.532470] raw: 0000000000000000 0000000000000000 00000000fffffdff 0000000000000000
       [   33.532471] page dumped because: nonzero mapcount
       [   33.532472] Modules linked in: br_netfilter bridge veth netconsole virtio_net
       [   33.532479] CPU: 0 PID: 791 Comm: systemd-network Kdump: loaded Not tainted 5.18.0-rc1+ #37
       [   33.532482] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014
       [   33.532484] Call Trace:
       [   33.532496]  <TASK>
       [   33.532500]  dump_stack_lvl+0x45/0x5a
       [   33.532506]  bad_page.cold+0x63/0x94
       [   33.532510]  free_pcp_prepare+0x290/0x420
       [   33.532515]  free_unref_page+0x1b/0x100
       [   33.532518]  skb_release_data+0x13f/0x1c0
       [   33.532524]  kfree_skb_reason+0x3e/0xc0
       [   33.532527]  ip6_mc_input+0x23c/0x2b0
       [   33.532531]  ip6_sublist_rcv_finish+0x83/0x90
       [   33.532534]  ip6_sublist_rcv+0x22b/0x2b0
      
      [3] XDP program to reproduce (xdp_pass.c):
       #include <linux/bpf.h>
       #include <bpf/bpf_helpers.h>
      
       SEC("xdp_pass")
       int xdp_pkt_pass(struct xdp_md *ctx)
       {
                bpf_xdp_adjust_head(ctx, -(int)32);
                return XDP_PASS;
       }
      
       char _license[] SEC("license") = "GPL";
      
       compile: clang -O2 -g -Wall -target bpf -c xdp_pass.c -o xdp_pass.o
       load on virtio_net: ip link set enp1s0 xdpdrv obj xdp_pass.o sec xdp_pass
      
      CC: stable@vger.kernel.org
      CC: Jason Wang <jasowang@redhat.com>
      CC: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      CC: Daniel Borkmann <daniel@iogearbox.net>
      CC: "Michael S. Tsirkin" <mst@redhat.com>
      CC: virtualization@lists.linux-foundation.org
      Fixes: 8fb7da9e ("virtio_net: get build_skb() buf by data ptr")
      Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
      Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Link: https://lore.kernel.org/r/20220425103703.3067292-1-razor@blackwall.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      acb16b39
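The headroom arithmetic described in this commit message can be replayed outside the kernel. The following is a minimal Python sketch, not the driver code: the function name and its parameters are hypothetical, `HDR_LEN = 12` is taken from the `hdr_len 12` field printed in the second trace, and addresses are modelled as plain integers.

```python
VIRTIO_XDP_HEADROOM = 256  # value quoted in the commit message
HDR_LEN = 12               # assumption: vi->hdr_len, as printed in the second trace


def build_skb_buf(page, data_delta=0, metasize=0, recompute_headroom=True):
    """Model offset, headroom and buf after an XDP program has run.

    data_delta is how far the program moved xdp.data; metasize is the
    number of metadata bytes it reserved in front of xdp.data.
    """
    data_hard_start = page + HDR_LEN
    data = data_hard_start + VIRTIO_XDP_HEADROOM + data_delta
    # receive_mergeable(): offset is recalculated after the program runs.
    offset = data - page - HDR_LEN - metasize
    if recompute_headroom:
        # Fixed behaviour: derive the headroom from the xdp pointers too.
        headroom = data - data_hard_start - metasize
    else:
        # Old behaviour: headroom stays at the compile-time constant.
        headroom = VIRTIO_XDP_HEADROOM
    # page_to_skb(): p = page_address(page) + offset; buf = p - headroom
    buf = page + offset - headroom
    return offset, headroom, buf


page = 0xFFFF9EB2923E2000
print([hex(v) for v in build_skb_buf(page, metasize=4, recompute_headroom=False)])
print([hex(v) for v in build_skb_buf(page, metasize=4)])
```

With 4 bytes of metadata, the old calculation places buf 4 bytes before the page (ffff9eb2923e1ffc, matching the first pr_err() line in the message), while the recomputed headroom puts buf back at the page start.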
    • net: dsa: mv88e6xxx: Fix port_hidden_wait to account for port_base_addr · 24cbdb91
      Nathan Rossi committed
      The other port_hidden functions rely on the port_read/port_write
      functions to access the hidden control port. These functions apply the
      offset for port_base_addr where applicable. Update port_hidden_wait to
      use the port_wait_bit so that port_base_addr offsets are accounted for
      when waiting for the busy bit to change.
      
      Without the offset the port_hidden_wait function would time out on
      devices that have a non-zero port_base_addr (e.g. MV88E6141), whereas
      devices that have a zero port_base_addr would operate correctly (e.g.
      MV88E6390).
      
      Fixes: 60907013 ("net: dsa: mv88e6xxx: update code operating on hidden registers")
      Signed-off-by: Nathan Rossi <nathan@nathanrossi.com>
      Reviewed-by: Marek Behún <kabel@kernel.org>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20220425070454.348584-1-nathan@nathanrossi.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      24cbdb91
    • net: phy: marvell10g: fix return value on error · 0ed9704b
      Baruch Siach committed
      Return the error value that we get from phy_read_mmd().
      
      Fixes: c84786fa ("net: phy: marvell10g: read copper results from CSSR1")
      Signed-off-by: Baruch Siach <baruch.siach@siklu.com>
      Reviewed-by: Marek Behún <kabel@kernel.org>
      Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://lore.kernel.org/r/f47cb031aeae873bb008ba35001607304a171a20.1650868058.git.baruch@tkos.co.il
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      0ed9704b
    • net: bcmgenet: hide status block before TX timestamping · acac0541
      Jonathan Lemon committed
      Hardware checksum offloading requires a transmit status block (TSB)
      inserted before the outgoing frame data; this was updated in
      9a9ba2a4 ("net: bcmgenet: always enable status blocks").
      
      However, skb_tx_timestamp() assumes that it is passed a raw frame
      and PTP parsing chokes on this status block.
      
      Fix this by calling __skb_pull(), which hides the TSB before calling
      skb_tx_timestamp(), so an outgoing PTP packet is parsed correctly.
      
      As the data in the skb has already been set up for DMA, and the
      dma_unmap_* calls use a separately stored address, there is
      no effective change in the data transmission.

      Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20220424165307.591145-1-jonathan.lemon@gmail.com
      Fixes: d03825fb ("net: bcmgenet: add skb_tx_timestamp call")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      acac0541
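Why hiding the status block matters for timestamping can be illustrated with a toy parser. This is a hedged Python sketch, not the driver or the kernel PTP classifier: `TSB_LEN = 64`, the helper names and the sample frame are assumptions made for illustration; only the PTP-over-Ethernet EtherType 0x88F7 is a standard value.

```python
import struct

TSB_LEN = 64          # assumption: size of the transmit status block
ETH_P_1588 = 0x88F7   # EtherType for PTP over Ethernet


def ethertype(frame):
    """Read the EtherType, assuming `frame` starts at the Ethernet header."""
    return struct.unpack_from("!H", frame, 12)[0]


def looks_like_ptp(frame):
    return ethertype(frame) == ETH_P_1588


# Hypothetical outgoing PTP frame: dst MAC, src MAC, EtherType, payload.
eth_frame = bytes(6) + bytes(6) + struct.pack("!H", ETH_P_1588) + bytes(34)
# Buffer as prepared for DMA: status block first, then the real frame.
tx_buf = bytes(TSB_LEN) + eth_frame

# Parsing from the start of the buffer reads TSB bytes as the header.
print(looks_like_ptp(tx_buf))
# Skipping the TSB (the role __skb_pull() plays in the fix) exposes the frame.
print(looks_like_ptp(memoryview(tx_buf)[TSB_LEN:]))
```

The underlying buffer is never modified in this model; only the starting point of the parse changes, which mirrors the message's point that the data handed to DMA is unaffected.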
  3. 25 Apr, 2022 10 commits
    • net: mscc: ocelot: don't add VID 0 to ocelot->vlans when leaving VLAN-aware bridge · 1fcb8fb3
      Vladimir Oltean committed
      DSA, through dsa_port_bridge_leave(), first notifies the port of the
      fact that it left a bridge, then, if that bridge was VLAN-aware, it
      notifies the port of the change in VLAN awareness state, towards
      VLAN-unaware mode.
      
      So ocelot_port_vlan_filtering() can be called when ocelot_port->bridge
      is NULL, and this makes ocelot_add_vlan_unaware_pvid() create a struct
      ocelot_bridge_vlan with a vid of 0 and an "untagged" setting of true on
      that port.
      
      In a way this structure correctly reflects the reality, but by design,
      VID 0 (OCELOT_STANDALONE_PVID) was not meant to be kept in the bridge
      VLAN list of the driver, but managed separately.
      
      Having OCELOT_STANDALONE_PVID in ocelot->vlans makes us trip up on
      several sanity checks that did not expect to have this VID there.
      For example, after we leave a VLAN-aware bridge and we re-join it, we
      can no longer program egress-tagged VLANs to hardware:
      
       # ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
       # ip link set swp0 master br0
       # ip link set swp0 nomaster
       # ip link set swp0 master br0
       # bridge vlan add dev swp0 vid 100
      Error: mscc_ocelot_switch_lib: Port with more than one egress-untagged VLAN cannot have egress-tagged VLANs.
      
      But this configuration is in fact supported by the hardware, since we
      could use OCELOT_PORT_TAG_NATIVE. According to its comment:
      
      /* all VLANs except the native VLAN and VID 0 are egress-tagged */
      
      yet when assessing the eligibility for this mode, we do not check for
      VID 0 in ocelot_port_uses_native_vlan(), instead we just ensure that
      ocelot_port_num_untagged_vlans() == 1. This is simply because VID 0
      doesn't have a bridge VLAN structure.
      
      The way I identify the problem is that ocelot_port_vlan_filtering(false)
      only means to call ocelot_add_vlan_unaware_pvid() when we dynamically
      turn off VLAN awareness for a bridge we are under, and the PVID changes
      from the bridge PVID to a reserved PVID based on the bridge number.
      
      Since OCELOT_STANDALONE_PVID is statically added to the VLAN table
      during ocelot_vlan_init() and never removed afterwards, calling
      ocelot_add_vlan_unaware_pvid() for it is not intended and does not serve
      any purpose.
      
      Fix the issue by avoiding the call to ocelot_add_vlan_unaware_pvid(vid=0)
      when we're resetting VLAN awareness after leaving the bridge, to become
      a standalone port.
      
      Fixes: 54c31984 ("net: mscc: ocelot: enforce FDB isolation when VLAN-unaware")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1fcb8fb3
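The failure mode described above can be modelled in a few lines. The sketch below is a simplified Python model under stated assumptions, not the driver: the class and function names are hypothetical stand-ins for `struct ocelot_bridge_vlan`, `ocelot_port_num_untagged_vlans()` and the eligibility check for `OCELOT_PORT_TAG_NATIVE`.

```python
from dataclasses import dataclass

OCELOT_STANDALONE_PVID = 0


@dataclass
class BridgeVlan:            # hypothetical stand-in for struct ocelot_bridge_vlan
    vid: int
    untagged: bool


def num_untagged_vlans(vlans):
    return sum(1 for v in vlans if v.untagged)


def can_add_tagged_vlan(vlans):
    """Native-VLAN mode tolerates at most one egress-untagged VLAN in the list."""
    return num_untagged_vlans(vlans) <= 1


# Port rejoined a VLAN-aware bridge: the bridge PVID 1 is egress-untagged.
clean = [BridgeVlan(vid=1, untagged=True)]
# Same port, but a stale VID 0 entry was left behind by the earlier leave.
stale = [BridgeVlan(vid=OCELOT_STANDALONE_PVID, untagged=True)] + clean

print(can_add_tagged_vlan(clean))   # VID 100 tagged would be accepted
print(can_add_tagged_vlan(stale))   # rejected: two untagged entries counted
```

Because the check only counts list entries, keeping VID 0 out of the list is enough to restore the expected result without special-casing VID 0 in the check itself.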
    • net: mscc: ocelot: ignore VID 0 added by 8021q module · 9323ac36
      Vladimir Oltean committed
      Both the felix DSA driver and ocelot switchdev driver declare
      dev->features & NETIF_F_HW_VLAN_CTAG_FILTER under certain circumstances*,
      so the 8021q module will add VID 0 to our RX filter when the port goes
      up, to ensure 802.1p traffic is not dropped.
      
      We treat VID 0 as a special value (OCELOT_STANDALONE_PVID) which
      deliberately does not have a struct ocelot_bridge_vlan associated with
      it. Instead, this gets programmed to the VLAN table in ocelot_vlan_init().
      
      If we allow external calls to modify VID 0, we reach the following
      situation:
      
       # ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
       # ip link set swp0 master br0
       # ip link set swp0 up # this adds VID 0 to ocelot->vlans with untagged=false
      bridge vlan
      port              vlan-id
      swp0              1 PVID Egress Untagged # the bridge also adds VID 1
      br0               1 PVID Egress Untagged
       # bridge vlan add dev swp0 vid 100 untagged
      Error: mscc_ocelot_switch_lib: Port with egress-tagged VLANs cannot have more than one egress-untagged (native) VLAN.
      
      This configuration should have been accepted, because
      ocelot_port_manage_port_tag() should select OCELOT_PORT_TAG_NATIVE.
      Yet it isn't, because we have an entry in ocelot->vlans which says
      VID 0 should be egress-tagged, something the hardware can't do.
      
      Fix this by suppressing additions/deletions on VID 0 and managing this
      VLAN exclusively using OCELOT_STANDALONE_PVID.
      
      *DSA toggles it when the port becomes VLAN-aware by joining a VLAN-aware
      bridge. Ocelot declares it unconditionally for some reason.
      
      Fixes: 54c31984 ("net: mscc: ocelot: enforce FDB isolation when VLAN-unaware")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9323ac36
    • net: lan966x: fix a couple off by one bugs · 9810c58c
      Dan Carpenter committed
      The lan966x->ports[] array has lan966x->num_phys_ports elements.  These
      are assigned in lan966x_probe().  That means the > comparison should be
      changed to >=.
      
      The first off by one check is harmless but the second one could lead to
      an out of bounds access and a crash.
      
      Fixes: 5ccd66e0 ("net: lan966x: add support for interrupts from analyzer")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9810c58c
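The off-by-one pattern is easy to reproduce in isolation. This is a minimal Python sketch, with a hypothetical `ports` list, port count and check functions standing in for the driver's bounds test:

```python
NUM_PHYS_PORTS = 8                      # hypothetical port count
ports = [f"port{i}" for i in range(NUM_PHYS_PORTS)]


def is_valid_buggy(idx):
    # Rejects only idx > num_phys_ports, so idx == num_phys_ports slips through.
    return not (idx > NUM_PHYS_PORTS)


def is_valid_fixed(idx):
    # Rejects idx >= num_phys_ports, matching the valid indices 0..n-1.
    return not (idx >= NUM_PHYS_PORTS)


idx = NUM_PHYS_PORTS
print(is_valid_buggy(idx), is_valid_fixed(idx))
```

In Python, `ports[idx]` with `idx == NUM_PHYS_PORTS` raises `IndexError`; the equivalent C array access reads past the end of the array instead, which is why the second check matters.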
    • net: hns: Add missing fwnode_handle_put in hns_mac_init · e85f8a9f
      Peng Wu committed
      In one of the error paths of the device_for_each_child_node() loop
      in hns_mac_init, add missing call to fwnode_handle_put.
      Signed-off-by: Peng Wu <wupeng58@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e85f8a9f
    • net: hns3: add return value for mailbox handling in PF · c59d6062
      Jian Shen committed
      Currently, some query mailboxes are sent from the VF to the PF, and the
      VF waits for the PF's handling result. Handling of the mailboxes
      HCLGE_MBX_GET_QID_IN_PF and HCLGE_MBX_GET_RSS_KEY may fail when the
      input parameter is invalid, but their handler functions return void.
      In this case the PF always returns success to the VF, which may cause
      the VF to get an incorrect result.
      
      Fix it by adding a return value to these functions.
      
      Fixes: 63b1279d ("net: hns3: check queue id range before using")
      Fixes: 532cfc0d ("net: hns3: add a check for index in hclge_get_rss_key()")
      Signed-off-by: Jian Shen <shenjian15@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c59d6062
    • net: hns3: add validity check for message data length · 7d413735
      Jian Shen committed
      Add a validity check for the message data length in
      hclge_send_mbx_msg() to avoid an unexpected overflow.
      
      Fixes: dde1a86e ("net: hns3: Add mailbox support to PF driver")
      Signed-off-by: Jian Shen <shenjian15@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7d413735
    • net: hns3: modify the return code of hclge_get_ring_chain_from_mbx · 48009e99
      Jie Wang committed
      Currently, hclge_get_ring_chain_from_mbx() returns -ENOMEM if
      ring_num is bigger than HCLGE_MBX_MAX_RING_CHAIN_PARAM_NUM. It is better to
      return -EINVAL for the invalid parameter case.
      
      So this patch fixes it by returning -EINVAL in this abnormal branch.
      
      Fixes: 5d02a58d ("net: hns3: fix for buffer overflow smatch warning")
      Signed-off-by: Jie Wang <wangjie125@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      48009e99
    • net: hns3: fix error log of tx/rx tqps stats · 123521b6
      Peng Li committed
      The comments in function hclge_comm_tqps_update_stats() are not right,
      so fix them.
      
      Fixes: 287db5c4 ("net: hns3: create new set of common tqp stats APIs for PF and VF reuse")
      Signed-off-by: Peng Li <lipeng321@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      123521b6
    • net: hns3: align the debugfs output to the left · 1ec1968e
      Hao Chen committed
      For the debugfs nodes rx/tx_queue_info and rx/tx_bd_info, the output is
      aligned to the right, which is inconsistent with the output of the other
      debugfs nodes, so make their output uniform.
      
      Fixes: 907676b1 ("net: hns3: use tx bounce buffer for small packets")
      Fixes: e44c495d ("net: hns3: refactor queue info of debugfs")
      Fixes: 77e91848 ("net: hns3: refactor dump bd info of debugfs")
      Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1ec1968e
    • net: hns3: clear inited state and stop client after failed to register netdev · e98365af
      Jian Shen committed
      If registering the netdev fails, the INITED state must be cleared and
      the client stopped, to avoid problems when this runs concurrently with
      the driver's uninitialization process.
      
      Fixes: a289a7e5 ("net: hns3: put off calling register_netdev() until client initialize complete")
      Signed-off-by: Jian Shen <shenjian15@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e98365af
  4. 23 Apr, 2022 3 commits
    • net: ethernet: stmmac: fix write to sgmii_adapter_base · 5fd1fe48
      Dinh Nguyen committed
      I made a mistake with the commit a6aaa003 ("net: ethernet: stmmac:
      fix altr_tse_pcs function when using a fixed-link"). I should have
      tested against both scenarios: having an SGMII interface and not
      having one.
      
      Without the SGMII PCS TSE adapter, the sgmii_adapter_base address is
      NULL, thus a write to this address will fail.
      
      Cc: stable@vger.kernel.org
      Fixes: a6aaa003 ("net: ethernet: stmmac: fix altr_tse_pcs function when using a fixed-link")
      Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
      Link: https://lore.kernel.org/r/20220420152345.27415-1-dinguyen@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      5fd1fe48
    • wireguard: device: check for metadata_dst with skb_valid_dst() · 45ac774c
      Nikolay Aleksandrov committed
      When we try to transmit an skb with md_dst attached through wireguard
      we hit a null pointer dereference in wg_xmit() due to the use of
      dst_mtu() which calls into dst_blackhole_mtu() which in turn tries to
      dereference dst->dev.
      
      Since wireguard doesn't use md_dsts we should use skb_valid_dst(), which
      checks for DST_METADATA flag, and if it's set, then falls back to
      wireguard's device mtu. That gives us the best chance of transmitting
      the packet; otherwise if the blackhole netdev is used we'd get
      ETH_MIN_MTU.
      
       [  263.693506] BUG: kernel NULL pointer dereference, address: 00000000000000e0
       [  263.693908] #PF: supervisor read access in kernel mode
       [  263.694174] #PF: error_code(0x0000) - not-present page
       [  263.694424] PGD 0 P4D 0
       [  263.694653] Oops: 0000 [#1] PREEMPT SMP NOPTI
       [  263.694876] CPU: 5 PID: 951 Comm: mausezahn Kdump: loaded Not tainted 5.18.0-rc1+ #522
       [  263.695190] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
       [  263.695529] RIP: 0010:dst_blackhole_mtu+0x17/0x20
       [  263.695770] Code: 00 00 00 0f 1f 44 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 47 10 48 83 e0 fc 8b 40 04 85 c0 75 09 48 8b 07 <8b> 80 e0 00 00 00 c3 66 90 0f 1f 44 00 00 48 89 d7 be 01 00 00 00
       [  263.696339] RSP: 0018:ffffa4a4422fbb28 EFLAGS: 00010246
       [  263.696600] RAX: 0000000000000000 RBX: ffff8ac9c3553000 RCX: 0000000000000000
       [  263.696891] RDX: 0000000000000401 RSI: 00000000fffffe01 RDI: ffffc4a43fb48900
       [  263.697178] RBP: ffffa4a4422fbb90 R08: ffffffff9622635e R09: 0000000000000002
       [  263.697469] R10: ffffffff9b69a6c0 R11: ffffa4a4422fbd0c R12: ffff8ac9d18b1a00
       [  263.697766] R13: ffff8ac9d0ce1840 R14: ffff8ac9d18b1a00 R15: ffff8ac9c3553000
       [  263.698054] FS:  00007f3704c337c0(0000) GS:ffff8acaebf40000(0000) knlGS:0000000000000000
       [  263.698470] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [  263.698826] CR2: 00000000000000e0 CR3: 0000000117a5c000 CR4: 00000000000006e0
       [  263.699214] Call Trace:
       [  263.699505]  <TASK>
       [  263.699759]  wg_xmit+0x411/0x450
       [  263.700059]  ? bpf_skb_set_tunnel_key+0x46/0x2d0
       [  263.700382]  ? dev_queue_xmit_nit+0x31/0x2b0
       [  263.700719]  dev_hard_start_xmit+0xd9/0x220
       [  263.701047]  __dev_queue_xmit+0x8b9/0xd30
       [  263.701344]  __bpf_redirect+0x1a4/0x380
       [  263.701664]  __dev_queue_xmit+0x83b/0xd30
       [  263.701961]  ? packet_parse_headers+0xb4/0xf0
       [  263.702275]  packet_sendmsg+0x9a8/0x16a0
       [  263.702596]  ? _raw_spin_unlock_irqrestore+0x23/0x40
       [  263.702933]  sock_sendmsg+0x5e/0x60
       [  263.703239]  __sys_sendto+0xf0/0x160
       [  263.703549]  __x64_sys_sendto+0x20/0x30
       [  263.703853]  do_syscall_64+0x3b/0x90
       [  263.704162]  entry_SYSCALL_64_after_hwframe+0x44/0xae
       [  263.704494] RIP: 0033:0x7f3704d50506
       [  263.704789] Code: 48 c7 c0 ff ff ff ff eb b7 66 2e 0f 1f 84 00 00 00 00 00 90 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 72 c3 90 55 48 83 ec 30 44 89 4c 24 2c 4c 89
       [  263.705652] RSP: 002b:00007ffe954b0b88 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
       [  263.706141] RAX: ffffffffffffffda RBX: 0000558bb259b490 RCX: 00007f3704d50506
       [  263.706544] RDX: 000000000000004a RSI: 0000558bb259b7b2 RDI: 0000000000000003
       [  263.706952] RBP: 0000000000000000 R08: 00007ffe954b0b90 R09: 0000000000000014
       [  263.707339] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe954b0b90
       [  263.707735] R13: 000000000000004a R14: 0000558bb259b7b2 R15: 0000000000000001
       [  263.708132]  </TASK>
       [  263.708398] Modules linked in: bridge netconsole bonding [last unloaded: bridge]
       [  263.708942] CR2: 00000000000000e0
      
      Fixes: e7096c13 ("net: WireGuard secure network tunnel")
      Link: https://github.com/cilium/cilium/issues/19428
      Reported-by: Martynas Pumputis <m@lambda.lt>
      Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      45ac774c
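The MTU selection described in this message can be sketched as follows. This is a simplified Python model, not the WireGuard code: the `Dst` class and `pick_mtu()` are hypothetical, and the numeric value of `DST_METADATA` is an assumption used only to have a distinct flag bit.

```python
from dataclasses import dataclass
from typing import Optional

DST_METADATA = 0x0080   # illustrative flag bit; the value is an assumption


@dataclass
class Dst:               # hypothetical, heavily simplified dst_entry
    flags: int
    mtu: int


def skb_valid_dst(dst: Optional[Dst]) -> bool:
    """True only for a real route, not for a metadata dst."""
    return dst is not None and not (dst.flags & DST_METADATA)


def pick_mtu(dst: Optional[Dst], dev_mtu: int) -> int:
    # Use the route MTU when the dst is usable, else the device MTU.
    return dst.mtu if skb_valid_dst(dst) else dev_mtu


print(pick_mtu(Dst(flags=0, mtu=1400), dev_mtu=1420))
print(pick_mtu(Dst(flags=DST_METADATA, mtu=0), dev_mtu=1420))
print(pick_mtu(None, dev_mtu=1420))
```

In this model the metadata dst is never asked for its MTU, which corresponds to avoiding the dst_mtu() call that led to the NULL dereference in the trace.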
    • net: dsa: realtek: remove realtek,rtl8367s string · b107a639
      Luiz Angelo Daros de Luca committed
      There is no need to add new compatible strings for each new supported
      chip version. The compatible string is used only to select the subdriver
      (rtl8365mb.c or rtl8366rb.c). Once in the subdriver, it will detect the
      chip model by itself, ignoring which compatible string was used.
      
      Link: https://lore.kernel.org/netdev/20220414014055.m4wbmr7tdz6hsa3m@bang-olufsen.dk/
      Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
      Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Link: https://lore.kernel.org/r/20220418233558.13541-2-luizluca@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b107a639
  5. 22 Apr, 2022 1 commit
  6. 21 Apr, 2022 1 commit
  7. 20 Apr, 2022 1 commit
    • net: stmmac: Use readl_poll_timeout_atomic() in atomic state · 234901de
      Kevin Hao committed
      init_systime() may be invoked in atomic state. We have observed the
      following call trace when running "phc_ctl /dev/ptp0 set" on an Intel
      Agilex board.
        BUG: sleeping function called from invalid context at drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c:74
        in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 381, name: phc_ctl
        preempt_count: 1, expected: 0
        RCU nest depth: 0, expected: 0
        Preemption disabled at:
        [<ffff80000892ef78>] stmmac_set_time+0x34/0x8c
        CPU: 2 PID: 381 Comm: phc_ctl Not tainted 5.18.0-rc2-next-20220414-yocto-standard+ #567
        Hardware name: SoCFPGA Agilex SoCDK (DT)
        Call trace:
         dump_backtrace.part.0+0xc4/0xd0
         show_stack+0x24/0x40
         dump_stack_lvl+0x7c/0xa0
         dump_stack+0x18/0x34
         __might_resched+0x154/0x1c0
         __might_sleep+0x58/0x90
         init_systime+0x78/0x120
         stmmac_set_time+0x64/0x8c
         ptp_clock_settime+0x60/0x9c
         pc_clock_settime+0x6c/0xc0
         __arm64_sys_clock_settime+0x88/0xf0
         invoke_syscall+0x5c/0x130
         el0_svc_common.constprop.0+0x4c/0x100
         do_el0_svc+0x7c/0xa0
         el0_svc+0x58/0xcc
         el0t_64_sync_handler+0xa4/0x130
         el0t_64_sync+0x18c/0x190
      
      So we should use readl_poll_timeout_atomic() here instead of
      readl_poll_timeout().
      
      Also adjust the delay time to 10us to fix a "__bad_udelay" build error
      reported by "kernel test robot <lkp@intel.com>". I have tested this on
      Intel Agilex and NXP S32G boards, where no delay is needed at all,
      so the 10us delay should be long enough for most cases.
      
      Fixes: ff8ed737 ("net: stmmac: use readl_poll_timeout() function in init_systime()")
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      234901de
  8. 19 Apr, 2022 1 commit
  9. 18 Apr, 2022 1 commit
    • net: atlantic: invert deep par in pm functions, preventing null derefs · cbe6c3a8
      Manuel Ullmann committed
      This will reset deeply on freeze and thaw instead of suspend and
      resume and prevent null pointer dereferences of the uninitialized ring
      0 buffer while thawing.
      
      The impact is an indefinitely hanging kernel. You can't switch
      consoles after this and the only possible user interaction is SysRq.
      
      BUG: kernel NULL pointer dereference
      RIP: 0010:aq_ring_rx_fill+0xcf/0x210 [atlantic]
      aq_vec_init+0x85/0xe0 [atlantic]
      aq_nic_init+0xf7/0x1d0 [atlantic]
      atl_resume_common+0x4f/0x100 [atlantic]
      pci_pm_thaw+0x42/0xa0
      
      resolves in aq_ring.o to
      
      ```
      0000000000000ae0 <aq_ring_rx_fill>:
      {
      /* ... */
       baf:	48 8b 43 08          	mov    0x8(%rbx),%rax
       		buff->flags = 0U; /* buff is NULL */
      ```
      
      The bug has been present since the introduction of the new pm code in
      8aaa112a ("net: atlantic: refactoring pm logic") and was hidden
      until 8ce84271 ("net: atlantic: changes for multi-TC support"),
      which refactored the aq_vec_{free,alloc} functions into
      aq_vec_{,ring}_{free,alloc}, but is technically not wrong. The
      original functions just always reinitialized the buffers on S3/S4. If
      the interface is down before freezing, the bug does not occur. It does
      not matter, whether the initrd contains and loads the module before
      thawing.
      
      So the fix is to invert the boolean parameter deep in all pm function
      calls, which was clearly intended to be set like that.
      
      First report was on Github [1], which you have to guess from the
      resume logs in the posted dmesg snippet. Recently I posted one on
      Bugzilla [2], since I did not have an AQC device so far.
      
      #regzbot introduced: 8ce84271
      #regzbot from: koo5 <kolman.jindrich@gmail.com>
      #regzbot monitor: https://github.com/Aquantia/AQtion/issues/32
      
      Fixes: 8aaa112a ("net: atlantic: refactoring pm logic")
      Link: https://github.com/Aquantia/AQtion/issues/32 [1]
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=215798 [2]
      Cc: stable@vger.kernel.org
      Reported-by: koo5 <kolman.jindrich@gmail.com>
      Signed-off-by: Manuel Ullmann <labre@posteo.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cbe6c3a8
  10. 17 Apr, 2022 1 commit
    • bonding: do not discard lowest hash bit for non layer3+4 hashing · 49aefd13
      suresh kumar committed
      Commit b5f86218 was introduced to discard the lowest hash bit for layer3+4
      hashing, but it also removes the last bit from non-layer3+4 hashing.
      
      The script below shows that layer2+3 hashing results in the same slave
      being used with the above commit.
      $ cat hash.py
      #!/usr/bin/python3.6
      
      h_dests=[0xa0, 0xa1]
      h_source=0xe3
      hproto=0x8
      saddr=0x1e7aa8c0
      daddr=0x17aa8c0
      
      for h_dest in h_dests:
          hash = (h_dest ^ h_source ^ hproto ^ saddr ^ daddr)
          hash ^= hash >> 16
          hash ^= hash >> 8
          print(hash)
      
      print("with last bit removed")
      for h_dest in h_dests:
          hash = (h_dest ^ h_source ^ hproto ^ saddr ^ daddr)
          hash ^= hash >> 16
          hash ^= hash >> 8
          hash = hash >> 1
          print(hash)
      
      Output:
      $ python3.6 hash.py
      522133332
      522133333   <-------------- will result in both slaves being used
      
      with last bit removed
      261066666
      261066666   <-------------- only single slave used
      Signed-off-by: suresh kumar <suresh2514@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      49aefd13
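One way to express the intended behaviour is to apply the shift only for layer3+4 hashing. The Python sketch below reuses the values from hash.py; the policy labels and the `bond_hash()` helper are hypothetical, and the exact condition used in the kernel patch is not shown in this message.

```python
H_SOURCE = 0xE3
HPROTO = 0x8
SADDR = 0x1E7AA8C0
DADDR = 0x17AA8C0


def bond_hash(h_dest, policy):
    """Hash as in hash.py, dropping the lowest bit only for layer3+4."""
    h = h_dest ^ H_SOURCE ^ HPROTO ^ SADDR ^ DADDR
    h ^= h >> 16
    h ^= h >> 8
    if policy == "layer3+4":      # hypothetical policy label
        h >>= 1
    return h


slaves = 2
# Which slave index each destination MAC maps to under layer2+3.
picked = {bond_hash(d, "layer2+3") % slaves for d in (0xA0, 0xA1)}
print(sorted(picked))
```

With the shift skipped for layer2+3, the two destination addresses from the script land on different slaves again, while a layer3+4 style policy still drops the lowest bit.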
  11. 16 Apr, 2022 10 commits
  12. 15 Apr, 2022 6 commits