1. 08 4月, 2021 2 次提交
  2. 07 4月, 2021 2 次提交
    • W
      ethtool: fix incorrect datatype in set_eee ops · 63cf3238
      Wong Vee Khee 提交于
      The member 'tx_lpi_timer' is defined with __u32 datatype in the ethtool
      header file. Hence, we should use ethnl_update_u32() in set_eee ops.
      
      Fixes: fd77be7b ("ethtool: set EEE settings with EEE_SET request")
      Cc: <stable@vger.kernel.org> # 5.10.x
      Cc: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: NWong Vee Khee <vee.khee.wong@linux.intel.com>
      Reviewed-by: NJakub Kicinski <kuba@kernel.org>
      Reviewed-by: NMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      63cf3238
    • X
      tipc: increment the tmp aead refcnt before attaching it · 2a2403ca
      Xin Long 提交于
      Li Shuang found a NULL pointer dereference crash in her testing:
      
        [] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
        [] RIP: 0010:tipc_crypto_rcv_complete+0xc8/0x7e0 [tipc]
        [] Call Trace:
        []  <IRQ>
        []  tipc_crypto_rcv+0x2d9/0x8f0 [tipc]
        []  tipc_rcv+0x2fc/0x1120 [tipc]
        []  tipc_udp_recv+0xc6/0x1e0 [tipc]
        []  udpv6_queue_rcv_one_skb+0x16a/0x460
        []  udp6_unicast_rcv_skb.isra.35+0x41/0xa0
        []  ip6_protocol_deliver_rcu+0x23b/0x4c0
        []  ip6_input+0x3d/0xb0
        []  ipv6_rcv+0x395/0x510
        []  __netif_receive_skb_core+0x5fc/0xc40
      
      This is caused by NULL returned by tipc_aead_get(), and then crashed when
      dereferencing it later in tipc_crypto_rcv_complete(). This might happen
      when tipc_crypto_rcv_complete() is called by two threads at the same time:
      the tmp attached by tipc_crypto_key_attach() in one thread may be released
      by the one attached by that in the other thread.
      
      This patch is to fix it by incrementing the tmp's refcnt before attaching
      it instead of calling tipc_aead_get() after attaching it.
      
      Fixes: fc1b6d6d ("tipc: introduce TIPC encryption & authentication")
      Reported-by: NLi Shuang <shuali@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2a2403ca
  3. 06 4月, 2021 3 次提交
  4. 03 4月, 2021 1 次提交
  5. 02 4月, 2021 3 次提交
  6. 01 4月, 2021 3 次提交
    • O
      xdp: fix xdp_return_frame() kernel BUG throw for page_pool memory model · 622d1369
      Ong Boon Leong 提交于
      xdp_return_frame() may be called outside of NAPI context to return
      xdpf back to page_pool. xdp_return_frame() calls __xdp_return() with
      napi_direct = false. For page_pool memory model, __xdp_return() calls
      xdp_return_frame_no_direct() unconditionally and below false negative
      kernel BUG throw happened under preempt-rt build:
      
      [  430.450355] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/3884
      [  430.451678] caller is __xdp_return+0x1ff/0x2e0
      [  430.452111] CPU: 0 PID: 3884 Comm: modprobe Tainted: G     U      E     5.12.0-rc2+ #45
      
      Changes in v2:
       - This patch fixes the issue by making xdp_return_frame_no_direct() is
         only called if napi_direct = true, as recommended for better by
         Jesper Dangaard Brouer. Thanks!
      
      Fixes: 2539650f ("xdp: Helpers for disabling napi_direct of xdp_return_frame")
      Signed-off-by: NOng Boon Leong <boon.leong.ong@intel.com>
      Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      622d1369
    • L
      net/rds: Fix a use after free in rds_message_map_pages · bdc2ab5c
      Lv Yunlong 提交于
      In rds_message_map_pages, the rm is freed by rds_message_put(rm).
      But rm is still used by rm->data.op_sg in return value.
      
      My patch assigns ERR_CAST(rm->data.op_sg) to err before the rm is
      freed to avoid the uaf.
      
      Fixes: 7dba9203 ("net/rds: Use ERR_PTR for rds_message_alloc_sgs()")
      Signed-off-by: NLv Yunlong <lyl2019@mail.ustc.edu.cn>
      Reviewed-by: NHåkon Bugge <haakon.bugge@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bdc2ab5c
    • T
      neighbour: Disregard DEAD dst in neigh_update · d47ec7a0
      Tong Zhu 提交于
      After a short network outage, the dst_entry is timed out and put
      in DST_OBSOLETE_DEAD. We are in this code because arp reply comes
      from this neighbour after network recovers. There is a potential
      race condition that dst_entry is still in DST_OBSOLETE_DEAD.
      With that, another neighbour lookup causes more harm than good.
      
      In best case all packets in arp_queue are lost. This is
      counterproductive to the original goal of finding a better path
      for those packets.
      
      I observed a worst case with 4.x kernel where a dst_entry in
      DST_OBSOLETE_DEAD state is associated with loopback net_device.
      It leads to an ethernet header with all zero addresses.
      A packet with all zero source MAC address is quite deadly with
      mac80211, ath9k and 802.11 block ack.  It fails
      ieee80211_find_sta_by_ifaddr in ath9k (xmit.c). Ath9k flushes tx
      queue (ath_tx_complete_aggr). BAW (block ack window) is not
      updated. BAW logic is damaged and ath9k transmission is disabled.
      Signed-off-by: NTong Zhu <zhutong@amazon.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d47ec7a0
  7. 31 3月, 2021 5 次提交
    • P
      net: let skb_orphan_partial wake-up waiters. · 9adc89af
      Paolo Abeni 提交于
      Currently the mentioned helper can end-up freeing the socket wmem
      without waking-up any processes waiting for more write memory.
      
      If the partially orphaned skb is attached to an UDP (or raw) socket,
      the lack of wake-up can hang the user-space.
      
      Even for TCP sockets not calling the sk destructor could have bad
      effects on TSQ.
      
      Address the issue using skb_orphan to release the sk wmem before
      setting the new sock_efree destructor. Additionally bundle the
      whole ownership update in a new helper, so that later other
      potential users could avoid duplicate code.
      
      v1 -> v2:
       - use skb_orphan() instead of sort of open coding it (Eric)
       - provide an helper for the ownership change (Eric)
      
      Fixes: f6ba8d33 ("netem: fix skb_orphan_partial()")
      Suggested-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9adc89af
    • Y
      sch_htb: fix null pointer dereference on a null new_q · ae81feb7
      Yunjian Wang 提交于
      sch_htb: fix null pointer dereference on a null new_q
      
      Currently if new_q is null, the null new_q pointer will be
      dereference when 'q->offload' is true. Fix this by adding
      a braces around htb_parent_to_leaf_offload() to avoid it.
      
      Addresses-Coverity: ("Dereference after null check")
      Fixes: d03b195b ("sch_htb: Hierarchical QoS hardware offload")
      Signed-off-by: NYunjian Wang <wangyunjian@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ae81feb7
    • L
      net: qrtr: Fix memory leak on qrtr_tx_wait failure · 8a03dd92
      Loic Poulain 提交于
      qrtr_tx_wait does not check for radix_tree_insert failure, causing
      the 'flow' object to be unreferenced after qrtr_tx_wait return. Fix
      that by releasing flow on radix_tree_insert failure.
      
      Fixes: 5fdeb0d3 ("net: qrtr: Implement outgoing flow control")
      Reported-by: syzbot+739016799a89c530b32a@syzkaller.appspotmail.com
      Signed-off-by: NLoic Poulain <loic.poulain@linaro.org>
      Reviewed-by: NBjorn Andersson <bjorn.andersson@linaro.org>
      Reviewed-by: NManivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8a03dd92
    • K
      net: sched: bump refcount for new action in ACT replace mode · 6855e821
      Kumar Kartikeya Dwivedi 提交于
      Currently, action creation using ACT API in replace mode is buggy.
      When invoking for non-existent action index 42,
      
      	tc action replace action bpf obj foo.o sec <xyz> index 42
      
      kernel creates the action, fills up the netlink response, and then just
      deletes the action after notifying userspace.
      
      	tc action show action bpf
      
      doesn't list the action.
      
      This happens due to the following sequence when ovr = 1 (replace mode)
      is enabled:
      
      tcf_idr_check_alloc is used to atomically check and either obtain
      reference for existing action at index, or reserve the index slot using
      a dummy entry (ERR_PTR(-EBUSY)).
      
      This is necessary as pointers to these actions will be held after
      dropping the idrinfo lock, so bumping the reference count is necessary
      as we need to insert the actions, and notify userspace by dumping their
      attributes. Finally, we drop the reference we took using the
      tcf_action_put_many call in tcf_action_add. However, for the case where
      a new action is created due to free index, its refcount remains one.
      This when paired with the put_many call leads to the kernel setting up
      the action, notifying userspace of its creation, and then tearing it
      down. For existing actions, the refcount is still held so they remain
      unaffected.
      
      Fortunately due to rtnl_lock serialization requirement, such an action
      with refcount == 1 will not be concurrently deleted by anything else, at
      best CLS API can move its refcount up and down by binding to it after it
      has been published from tcf_idr_insert_many. Since refcount is atleast
      one until put_many call, CLS API cannot delete it. Also __tcf_action_put
      release path already ensures deterministic outcome (either new action
      will be created or existing action will be reused in case CLS API tries
      to bind to action concurrently) due to idr lock serialization.
      
      We fix this by making refcount of newly created actions as 2 in ACT API
      replace mode. A relaxed store will suffice as visibility is ensured only
      after the tcf_idr_insert_many call.
      
      Note that in case of creation or overwriting using CLS API only (i.e.
      bind = 1), overwriting existing action object is not allowed, and any
      such request is silently ignored (without error).
      
      The refcount bump that occurs in tcf_idr_check_alloc call there for
      existing action will pair with tcf_exts_destroy call made from the
      owner module for the same action. In case of action creation, there
      is no existing action, so no tcf_exts_destroy callback happens.
      
      This means no code changes for CLS API.
      
      Fixes: cae422f3 ("net: sched: use reference counting action init")
      Signed-off-by: NKumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6855e821
    • M
      net/ncsi: Avoid channel_monitor hrtimer deadlock · 03cb4d05
      Milton Miller 提交于
      Calling ncsi_stop_channel_monitor from channel_monitor is a guaranteed
      deadlock on SMP because stop calls del_timer_sync on the timer that
      invoked channel_monitor as its timer function.
      
      Recognise the inherent race of marking the monitor disabled before
      deleting the timer by just returning if enable was cleared.  After
      a timeout (the default case -- reset to START when response received)
      just mark the monitor.enabled false.
      
      If the channel has an entry on the channel_queue list, or if the
      state is not ACTIVE or INACTIVE, then warn and mark the timer stopped
      and don't restart, as the locking is broken somehow.
      
      Fixes: 0795fb20 ("net/ncsi: Stop monitor if channel times out or is inactive")
      Signed-off-by: NMilton Miller <miltonm@us.ibm.com>
      Signed-off-by: NEddie James <eajames@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      03cb4d05
  8. 30 3月, 2021 3 次提交
    • D
      xfrm/compat: Cleanup WARN()s that can be user-triggered · ef19e111
      Dmitry Safonov 提交于
      Replace WARN_ONCE() that can be triggered from userspace with
      pr_warn_once(). Those still give user a hint what's the issue.
      
      I've left WARN()s that are not possible to trigger with current
      code-base and that would mean that the code has issues:
      - relying on current compat_msg_min[type] <= xfrm_msg_min[type]
      - expected 4-byte padding size difference between
        compat_msg_min[type] and xfrm_msg_min[type]
      - compat_policy[type].len <= xfrma_policy[type].len
      (for every type)
      
      Reported-by: syzbot+834ffd1afc7212eb8147@syzkaller.appspotmail.com
      Fixes: 5f3eea6b ("xfrm/compat: Attach xfrm dumps to 64=>32 bit translator")
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: netdev@vger.kernel.org
      Cc: stable@vger.kernel.org
      Signed-off-by: NDmitry Safonov <dima@arista.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      ef19e111
    • L
      net:tipc: Fix a double free in tipc_sk_mcast_rcv · 6bf24dc0
      Lv Yunlong 提交于
      In the if(skb_peek(arrvq) == skb) branch, it calls __skb_dequeue(arrvq) to get
      the skb by skb = skb_peek(arrvq). Then __skb_dequeue() unlinks the skb from arrvq
      and returns the skb which equals to skb_peek(arrvq). After __skb_dequeue(arrvq)
      finished, the skb is freed by kfree_skb(__skb_dequeue(arrvq)) in the first time.
      
      Unfortunately, the same skb is freed in the second time by kfree_skb(skb) after
      the branch completed.
      
      My patch removes kfree_skb() in the if(skb_peek(arrvq) == skb) branch, because
      this skb will be freed by kfree_skb(skb) finally.
      
      Fixes: cb1b7280 ("tipc: eliminate race condition at multicast reception")
      Signed-off-by: NLv Yunlong <lyl2019@mail.ustc.edu.cn>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6bf24dc0
    • M
      net: dsa: Fix type was not set for devlink port · fb6ec87f
      Maxim Kochetkov 提交于
      If PHY is not available on DSA port (described at devicetree but absent or
      failed to detect) then kernel prints warning after 3700 secs:
      
      [ 3707.948771] ------------[ cut here ]------------
      [ 3707.948784] Type was not set for devlink port.
      [ 3707.948894] WARNING: CPU: 1 PID: 17 at net/core/devlink.c:8097 0xc083f9d8
      
      We should unregister the devlink port as a user port and
      re-register it as an unused port before executing "continue" in case of
      dsa_port_setup error.
      
      Fixes: 86f8b1c0 ("net: dsa: Do not make user port errors fatal")
      Signed-off-by: NMaxim Kochetkov <fido_max@inbox.ru>
      Reviewed-by: NVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fb6ec87f
  9. 29 3月, 2021 3 次提交
  10. 26 3月, 2021 5 次提交
  11. 24 3月, 2021 3 次提交
    • X
      xfrm: BEET mode doesn't support fragments for inner packets · 68dc022d
      Xin Long 提交于
      BEET mode replaces the IP(6) Headers with new IP(6) Headers when sending
      packets. However, when it's a fragment before the replacement, currently
      kernel keeps the fragment flag and replace the address field then encaps
      it with ESP. It would cause in RX side the fragments to get reassembled
      before decapping with ESP, which is incorrect.
      
      In Xiumei's testing, these fragments went over an xfrm interface and got
      encapped with ESP in the device driver, and the traffic was broken.
      
      I don't have a good way to fix it, but only to warn this out in dmesg.
      Reported-by: NXiumei Mu <xmu@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      68dc022d
    • V
      net: bridge: don't notify switchdev for local FDB addresses · 6ab4c311
      Vladimir Oltean 提交于
      As explained in this discussion:
      https://lore.kernel.org/netdev/20210117193009.io3nungdwuzmo5f7@skbuf/
      
      the switchdev notifiers for FDB entries managed to have a zero-day bug.
      The bridge would not say that this entry is local:
      
      ip link add br0 type bridge
      ip link set swp0 master br0
      bridge fdb add dev swp0 00:01:02:03:04:05 master local
      
      and the switchdev driver would be more than happy to offload it as a
      normal static FDB entry. This is despite the fact that 'local' and
      non-'local' entries have completely opposite directions: a local entry
      is locally terminated and not forwarded, whereas a static entry is
      forwarded and not locally terminated. So, for example, DSA would install
      this entry on swp0 instead of installing it on the CPU port as it should.
      
      There is an even sadder part, which is that the 'local' flag is implicit
      if 'static' is not specified, meaning that this command produces the
      same result of adding a 'local' entry:
      
      bridge fdb add dev swp0 00:01:02:03:04:05 master
      
      I've updated the man pages for 'bridge', and after reading it now, it
      should be pretty clear to any user that the commands above were broken
      and should have never resulted in the 00:01:02:03:04:05 address being
      forwarded (this behavior is coherent with non-switchdev interfaces):
      https://patchwork.kernel.org/project/netdevbpf/cover/20210211104502.2081443-1-olteanv@gmail.com/
      If you're a user reading this and this is what you want, just use:
      
      bridge fdb add dev swp0 00:01:02:03:04:05 master static
      
      Because switchdev should have given drivers the means from day one to
      classify FDB entries as local/non-local, but didn't, it means that all
      drivers are currently broken. So we can just as well omit the switchdev
      notifications for local FDB entries, which is exactly what this patch
      does to close the bug in stable trees. For further development work
      where drivers might want to trap the local FDB entries to the host, we
      can add a 'bool is_local' to br_switchdev_fdb_call_notifiers(), and
      selectively make drivers act upon that bit, while all the others ignore
      those entries if the 'is_local' bit is set.
      
      Fixes: 6b26b51b ("net: bridge: Add support for notifying devices about FDB add/del")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6ab4c311
    • M
      net/sched: act_ct: clear post_ct if doing ct_clear · 8ca1b090
      Marcelo Ricardo Leitner 提交于
      Invalid detection works with two distinct moments: act_ct tries to find
      a conntrack entry and set post_ct true, indicating that that was
      attempted. Then, when flow dissector tries to dissect CT info and no
      entry is there, it knows that it was tried and no entry was found, and
      synthesizes/sets
                        key->ct_state = TCA_FLOWER_KEY_CT_FLAGS_TRACKED |
                                        TCA_FLOWER_KEY_CT_FLAGS_INVALID;
      mimicing what OVS does.
      
      OVS has this a bit more streamlined, as it recomputes the key after
      trying to find a conntrack entry for it.
      
      Issue here is, when we have 'tc action ct clear', it didn't clear
      post_ct, causing a subsequent match on 'ct_state -trk' to fail, due to
      the above. The fix, thus, is to clear it.
      
      Reproducer rules:
      tc filter add dev enp130s0f0np0_0 ingress prio 1 chain 0 \
      	protocol ip flower ip_proto tcp ct_state -trk \
      	action ct zone 1 pipe \
      	action goto chain 2
      tc filter add dev enp130s0f0np0_0 ingress prio 1 chain 2 \
      	protocol ip flower \
      	action ct clear pipe \
      	action goto chain 4
      tc filter add dev enp130s0f0np0_0 ingress prio 1 chain 4 \
      	protocol ip flower ct_state -trk \
      	action mirred egress redirect dev enp130s0f1np1_0
      
      With the fix, the 3rd rule matches, like it does with OVS kernel
      datapath.
      
      Fixes: 7baf2429 ("net/sched: cls_flower add CT_FLAGS_INVALID flag support")
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Reviewed-by: Nwenxu <wenxu@ucloud.cn>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8ca1b090
  12. 23 3月, 2021 2 次提交
  13. 22 3月, 2021 3 次提交
    • X
      esp: delete NETIF_F_SCTP_CRC bit from features for esp offload · 154deab6
      Xin Long 提交于
      Now in esp4/6_gso_segment(), before calling inner proto .gso_segment,
      NETIF_F_CSUM_MASK bits are deleted, as HW won't be able to do the
      csum for inner proto due to the packet encrypted already.
      
      So the UDP/TCP packet has to do the checksum on its own .gso_segment.
      But SCTP is using CRC checksum, and for that NETIF_F_SCTP_CRC should
      be deleted to make SCTP do the csum in own .gso_segment as well.
      
      In Xiumei's testing with SCTP over IPsec/veth, the packets are kept
      dropping due to the wrong CRC checksum.
      Reported-by: NXiumei Mu <xmu@redhat.com>
      Fixes: 7862b405 ("esp: Add gso handlers for esp4 and esp6")
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      154deab6
    • A
      net: xfrm: Use sequence counter with associated spinlock · bc8e0adf
      Ahmed S. Darwish 提交于
      A sequence counter write section must be serialized or its internal
      state can get corrupted. A plain seqcount_t does not contain the
      information of which lock must be held to guaranteee write side
      serialization.
      
      For xfrm_state_hash_generation, use seqcount_spinlock_t instead of plain
      seqcount_t.  This allows to associate the spinlock used for write
      serialization with the sequence counter. It thus enables lockdep to
      verify that the write serialization lock is indeed held before entering
      the sequence counter write section.
      
      If lockdep is disabled, this lock association is compiled out and has
      neither storage size nor runtime overhead.
      Signed-off-by: NAhmed S. Darwish <a.darwish@linutronix.de>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      bc8e0adf
    • A
      net: xfrm: Localize sequence counter per network namespace · e88add19
      Ahmed S. Darwish 提交于
      A sequence counter write section must be serialized or its internal
      state can get corrupted. The "xfrm_state_hash_generation" seqcount is
      global, but its write serialization lock (net->xfrm.xfrm_state_lock) is
      instantiated per network namespace. The write protection is thus
      insufficient.
      
      To provide full protection, localize the sequence counter per network
      namespace instead. This should be safe as both the seqcount read and
      write sections access data exclusively within the network namespace. It
      also lays the foundation for transforming "xfrm_state_hash_generation"
      data type from seqcount_t to seqcount_LOCKNAME_t in further commits.
      
      Fixes: b65e3d7b ("xfrm: state: add sequence count to detect hash resizes")
      Signed-off-by: NAhmed S. Darwish <a.darwish@linutronix.de>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      e88add19
  14. 21 3月, 2021 1 次提交
  15. 20 3月, 2021 1 次提交