1. 19 Aug, 2022 3 commits
  2. 18 Aug, 2022 1 commit
  3. 17 Aug, 2022 3 commits
  4. 16 Aug, 2022 2 commits
  5. 15 Aug, 2022 7 commits
    • netfilter: nf_tables: disallow NFT_SET_ELEM_CATCHALL and NFT_SET_ELEM_INTERVAL_END · fc0ae524
      Committed by Pablo Neira Ayuso
      These flags are mutually exclusive, report EINVAL in this case.
      
      Fixes: aaa31047 ("netfilter: nftables: add catch-all set element support")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
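A minimal user-space sketch of the check this commit describes, not the kernel code itself. The `SKETCH_*` constants and the function name are hypothetical; the constant values are assumed to mirror the uapi definitions and should be checked against `linux/netfilter/nf_tables.h`.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed mirrors of the uapi element flags (hypothetical local names). */
#define SKETCH_ELEM_INTERVAL_END 0x1u
#define SKETCH_ELEM_CATCHALL     0x2u

/* Returns 0 if the element flags are acceptable, -EINVAL if both
 * mutually exclusive flags are set at once. */
int sketch_check_elem_flags(uint32_t flags)
{
	const uint32_t both = SKETCH_ELEM_INTERVAL_END | SKETCH_ELEM_CATCHALL;

	if ((flags & both) == both)
		return -EINVAL;
	return 0;
}
```

Either flag on its own passes; only the combination is rejected.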
    • netfilter: nf_tables: NFTA_SET_ELEM_KEY_END requires concat and interval flags · 88cccd90
      Committed by Pablo Neira Ayuso
      If the NFT_SET_CONCAT|NFT_SET_INTERVAL flags are set, then the
      netlink attribute NFTA_SET_ELEM_KEY_END must be specified. Otherwise,
      NFTA_SET_ELEM_KEY_END should not be present.
      
      For a catch-all element, NFTA_SET_ELEM_KEY_END should not be present.
      The NFT_SET_ELEM_INTERVAL_END flag is never used with this combination
      of set flags.
      
      Fixes: 7b225d0b ("netfilter: nf_tables: add NFTA_SET_ELEM_KEY_END attribute")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
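The rule in this commit message can be sketched as a small decision function. This is a user-space illustration only: the function name and `SKETCH_*` constants are hypothetical, the constant values are assumed to match the uapi set flags, and returning `-EINVAL` is an assumption, since the message does not name the error code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed mirrors of the uapi set flags (hypothetical local names). */
#define SKETCH_SET_INTERVAL 0x4u
#define SKETCH_SET_CONCAT   0x80u

/* has_key_end: whether NFTA_SET_ELEM_KEY_END was supplied.
 * catchall:    whether the element is a catch-all element. */
int sketch_check_key_end(uint32_t set_flags, bool catchall, bool has_key_end)
{
	const uint32_t need = SKETCH_SET_CONCAT | SKETCH_SET_INTERVAL;
	bool concat_interval = (set_flags & need) == need;

	/* Catch-all elements never carry an end key. */
	if (catchall)
		return has_key_end ? -EINVAL : 0;

	/* Concatenated interval sets require the end key... */
	if (concat_interval)
		return has_key_end ? 0 : -EINVAL;

	/* ...and every other set type must not carry it. */
	return has_key_end ? -EINVAL : 0;
}
```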
    • net_sched: cls_route: disallow handle of 0 · 02799571
      Committed by Jamal Hadi Salim
      Follows up on:
      https://lore.kernel.org/all/20220809170518.164662-1-cascardo@canonical.com/
      
      A handle of 0 implies a from/to of the universe realm, which is not very
      sensible.
      
      Let's see what this patch will do:
      $ sudo tc qdisc add dev $DEV root handle 1:0 prio
      
      // Let's manufacture a way to insert a handle of 0
      $ sudo tc filter add dev $DEV parent 1:0 protocol ip prio 100 \
      route to 0 from 0 classid 1:10 action ok
      
      // Gets rejected...
      Error: handle of 0 is not valid.
      We have an error talking to the kernel, -1
      
      // Let's create a legit entry...
      $ sudo tc filter add dev $DEV parent 1:0 protocol ip prio 100 route from 10 \
      classid 1:10 action ok
      
      // What did the kernel insert?
      $ sudo tc filter ls dev $DEV parent 1:0
      filter protocol ip pref 100 route chain 0
      filter protocol ip pref 100 route chain 0 fh 0x000a8000 flowid 1:10 from 10
      	action order 1: gact action pass
      	 random type none pass val 0
      	 index 1 ref 1 bind 1
      
      // Let's try to replace that legit entry with a handle of 0
      $ sudo tc filter replace dev $DEV parent 1:0 protocol ip prio 100 \
      handle 0x000a8000 route to 0 from 0 classid 1:10 action drop
      
      Error: Replacing with handle of 0 is invalid.
      We have an error talking to the kernel, -1
      
      And last, let's run Cascardo's PoC:
      $ ./poc
      0
      0
      -22
      -22
      -22
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fix potential refcount leak in ndisc_router_discovery() · 7396ba87
      Committed by Xin Xiong
      The issue happens on specific paths in the function. After both
      objects `rt` and `neigh` are grabbed successfully, when `lifetime` is
      nonzero but the metric needs to change, the function just deletes the
      route and sets `rt` to NULL. Then, it may try grabbing `rt` and `neigh`
      again if the above conditions hold. The function simply overwrites `neigh`
      if it succeeds, or returns if it fails, without decreasing the reference
      count of the previous `neigh`. This may result in memory leaks.
      
      Fix it by decrementing the reference count of `neigh` in place.
      
      Fixes: 6b2e04bc ("net: allow user to set metric on default route learned via Router Advertisement")
      Signed-off-by: Xin Xiong <xiongx18@fudan.edu.cn>
      Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
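The leak pattern described in this commit message can be illustrated with a toy reference counter. This is a user-space sketch under stated assumptions, not the ndisc code: all `sketch_*` names are hypothetical, and the counter is a plain integer rather than the kernel's refcount type.

```c
#include <stddef.h>

/* Toy stand-in for a reference-counted neighbour entry. */
struct sketch_neigh {
	int refcnt;
};

void sketch_neigh_hold(struct sketch_neigh *n)    { n->refcnt++; }
void sketch_neigh_release(struct sketch_neigh *n) { n->refcnt--; }

/* Point *slot at `fresh`, dropping the reference that *slot already
 * holds first. Overwriting *slot without the release is the leak the
 * commit message describes. */
void sketch_neigh_replace(struct sketch_neigh **slot, struct sketch_neigh *fresh)
{
	if (*slot)
		sketch_neigh_release(*slot);
	if (fresh)
		sketch_neigh_hold(fresh);
	*slot = fresh;
}
```

After two consecutive replacements, only the most recently acquired entry should still carry a reference.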
    • neighbour: make proxy_queue.qlen limit per-device · 0ff4eb3d
      Committed by Alexander Mikhalitsyn
      Right now we have a neigh_param PROXY_QLEN which specifies the maximum
      length of neigh_table->proxy_queue. But in fact, this limit doesn't work
      well because the check condition looks like:
      tbl->proxy_queue.qlen > NEIGH_VAR(p, PROXY_QLEN)
      
      The problem is that p (struct neigh_parms) is a per-device thing,
      but tbl (struct neigh_table) is a system-wide global thing.
      
      It seems reasonable to make the proxy_queue limit per-device.
      
      v2:
      	- nothing changed in this patch
      v3:
      	- rebase to net tree
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David Ahern <dsahern@kernel.org>
      Cc: Yajun Deng <yajun.deng@linux.dev>
      Cc: Roopa Prabhu <roopa@nvidia.com>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
      Cc: Konstantin Khorenko <khorenko@virtuozzo.com>
      Cc: kernel@openvz.org
      Cc: devel@openvz.org
      Suggested-by: Denis V. Lunev <den@openvz.org>
      Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
      Reviewed-by: Denis V. Lunev <den@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
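A rough user-space model of the per-device limit this commit argues for. The struct and function names are hypothetical, and the per-device `qlen` counter is an assumption about how such a limit could be tracked; the `>` comparison mirrors the condition quoted in the message.

```c
#include <stdbool.h>

/* Per-device parameters: each device has its own limit and counter. */
struct sketch_parms {
	int proxy_qlen_limit;
	int qlen;
};

/* System-wide table shared by all devices. */
struct sketch_table {
	int total_qlen;
};

/* Try to enqueue one proxied request for the device described by p.
 * The limit is checked against the device's own counter, so one busy
 * device cannot exhaust the budget of the others. */
bool sketch_proxy_enqueue(struct sketch_table *tbl, struct sketch_parms *p)
{
	if (p->qlen > p->proxy_qlen_limit)
		return false; /* this device is over its limit: drop */

	p->qlen++;
	tbl->total_qlen++;
	return true;
}
```

With the global check quoted in the message, a device that fills the shared queue would instead cause drops for every other device.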
    • neigh: fix possible DoS due to net iface start/stop loop · 66ba215c
      Committed by Denis V. Lunev
      Normal processing of an ARP request (usually an Ethernet broadcast
      packet) arriving at the host looks like this:
      * the packet reaches the arp_process() call and is passed through the
        routing procedure
      * the request is put into the queue using pneigh_enqueue() if the
        corresponding ARP record is not local (the common case for container
        records on the host)
      * the request is processed by a timer (within 80 jiffies by default) and
        the ARP reply is sent from the same arp_process() using the
        NEIGH_CB(skb)->flags & LOCALLY_ENQUEUED condition (the flag is set inside
        pneigh_enqueue())
      
      And here the problem arises. The Linux kernel calls pneigh_queue_purge(),
      which destroys the whole queue of ARP requests on ANY network interface
      start/stop event through __neigh_ifdown().
      
      This was not a problem in the original world, as network
      interface start/stop was accessible only to the host 'root', which
      could do more destructive things. But the world has changed and there
      are Linux containers available. Here the container 'root' has access
      to this API and could be considered an untrusted user in the hosting
      (container's) world.
      
      Thus there is an attack vector against other containers on the node when
      a container's root endlessly starts/stops interfaces. We have observed
      a similar situation on a real production node, where a docker container
      was doing such activity and other containers on the node thus became
      inaccessible.
      
      The proposed patch does a very simple thing. It drops only packets from
      the same namespace in pneigh_queue_purge() where the network interface
      state change is detected. This is enough to prevent the problem for the
      whole node while preserving the original semantics of the code.
      
      v2:
      	- do del_timer_sync() if queue is empty after pneigh_queue_purge()
      v3:
      	- rebase to net tree
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David Ahern <dsahern@kernel.org>
      Cc: Yajun Deng <yajun.deng@linux.dev>
      Cc: Roopa Prabhu <roopa@nvidia.com>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
      Cc: Konstantin Khorenko <khorenko@virtuozzo.com>
      Cc: kernel@openvz.org
      Cc: devel@openvz.org
      Investigated-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
      Signed-off-by: Denis V. Lunev <den@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
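The namespace-scoped purge described in this commit message can be sketched on a plain array. This is an illustration only: the real queue holds skbs and the namespace is identified differently in the kernel, so the `sketch_*` names and the integer `netns_id` are hypothetical stand-ins.

```c
#include <stddef.h>

/* Toy queued request tagged with the namespace it arrived in. */
struct sketch_pkt {
	int netns_id;
};

/* Drop only the requests that belong to `netns_id`, compacting the
 * queue in place. Returns the number of requests that remain. */
size_t sketch_queue_purge(struct sketch_pkt *q, size_t len, int netns_id)
{
	size_t kept = 0;

	for (size_t i = 0; i < len; i++) {
		if (q[i].netns_id != netns_id)
			q[kept++] = q[i]; /* other namespaces are untouched */
	}
	return kept;
}
```

A return value of 0 corresponds to the case the v2 note mentions, where the queue is empty after the purge and the timer can be stopped.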
    • net: qrtr: start MHI channel after endpoint creation · 68a838b8
      Committed by Maxim Kochetkov
      An MHI channel may generate an event/interrupt right after it is enabled.
      This may lead to two race conditions.
      
      1)
      Such an event may be dropped by qcom_mhi_qrtr_dl_callback() at the check:
      
      	if (!qdev || mhi_res->transaction_status)
      		return;
      
      because dev_set_drvdata(&mhi_dev->dev, qdev) may not have been performed
      at this moment. In this situation qrtr-ns will be unable to enumerate
      the services on the device.
      ---------------------------------------------------------------
      
      2)
      Such an event may arrive after dev_set_drvdata() and
      before qrtr_endpoint_register(). In this case the kernel will panic by
      accessing a wrong pointer in qcom_mhi_qrtr_dl_callback():
      
      	rc = qrtr_endpoint_post(&qdev->ep, mhi_res->buf_addr,
      				mhi_res->bytes_xferd);
      
      because the endpoint has not been created yet.
      --------------------------------------------------------------
      So move mhi_prepare_for_transfer_autoqueue after endpoint creation
      to fix it.
      
      Fixes: a2e2cc0d ("net: qrtr: Start MHI channels during init")
      Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>
      Reviewed-by: Hemant Kumar <quic_hemantk@quicinc.com>
      Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
      Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 13 Aug, 2022 1 commit
  7. 12 Aug, 2022 5 commits
  8. 11 Aug, 2022 13 commits
  9. 10 Aug, 2022 5 commits
    • netfilter: nf_tables: possible module reference underflow in error path · c485c35f
      Committed by Pablo Neira Ayuso
      dst->ops is set when nft_expr_clone() fails, but the module refcount has
      not been bumped yet; therefore nft_expr_destroy() leads to a module
      reference underflow.
      
      Fixes: 8cfd9b0f ("netfilter: nftables: generalize set expressions support")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: disallow NFTA_SET_ELEM_KEY_END with NFT_SET_ELEM_INTERVAL_END flag · 4963674c
      Committed by Pablo Neira Ayuso
      These are mutually exclusive; NFTA_SET_ELEM_KEY_END actually replaces
      the flag notation.
      
      Fixes: 7b225d0b ("netfilter: nf_tables: add NFTA_SET_ELEM_KEY_END attribute")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: use READ_ONCE and WRITE_ONCE for shared generation id access · 34002783
      Committed by Pablo Neira Ayuso
      The generation ID is bumped from the commit path while holding the
      mutex; however, netlink dump operations rely on RCU.
      
      This patch also adds missing cb->base_eq initialization in
      nf_tables_dump_set().
      
      Fixes: 38e029f1 ("netfilter: nf_tables: set NLM_F_DUMP_INTR if netlink dumping is stale")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
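A user-space analogy for the access pattern in this commit message, using C11 relaxed atomics in place of the kernel's READ_ONCE()/WRITE_ONCE() macros. This is a sketch under assumptions: the variable and function names are hypothetical, the writer is assumed to be serialized by a lock that is not shown, and the stale-dump comparison only illustrates the idea behind the NLM_F_DUMP_INTR commit referenced in the Fixes tag.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Shared generation counter: written on commit, read by dumpers. */
static _Atomic uint32_t sketch_base_seq;

/* Writer side. Callers are assumed to hold the commit mutex, so the
 * load/store pair does not race with other writers. */
void sketch_commit_bump(void)
{
	uint32_t next =
		atomic_load_explicit(&sketch_base_seq, memory_order_relaxed) + 1;

	atomic_store_explicit(&sketch_base_seq, next, memory_order_relaxed);
}

/* Reader side: take a snapshot of the generation when a dump starts. */
uint32_t sketch_dump_begin(void)
{
	return atomic_load_explicit(&sketch_base_seq, memory_order_relaxed);
}

/* A dump is stale if the generation moved since the snapshot. */
bool sketch_dump_interrupted(uint32_t seen)
{
	return seen !=
	       atomic_load_explicit(&sketch_base_seq, memory_order_relaxed);
}
```

The point of the marked accesses is that the lockless reader gets a single, untorn load of a value that the writer may update concurrently.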
    • devlink: Fix use-after-free after a failed reload · 6b4db2e5
      Committed by Ido Schimmel
      After a failed devlink reload, devlink parameters are still registered,
      which means user space can set and get their values. In the case of the
      mlxsw "acl_region_rehash_interval" parameter, these operations will
      trigger a use-after-free [1].
      
      Fix this by rejecting set and get operations while in the failed state.
      Return the "-EOPNOTSUPP" error code which does not abort the parameters
      dump, but instead causes it to skip over the problematic parameter.
      
      Another possible fix is to perform these checks in the mlxsw parameter
      callbacks, but other drivers might be affected by the same problem and I
      am not aware of scenarios where these stricter checks will cause a
      regression.
      
      [1]
      mlxsw_spectrum3 0000:00:10.0: Port 125: Failed to register netdev
      mlxsw_spectrum3 0000:00:10.0: Failed to create ports
      
      ==================================================================
      BUG: KASAN: use-after-free in mlxsw_sp_acl_tcam_vregion_rehash_intrvl_get+0xbd/0xd0 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c:904
      Read of size 4 at addr ffff8880099dcfd8 by task kworker/u4:4/777
      
      CPU: 1 PID: 777 Comm: kworker/u4:4 Not tainted 5.19.0-rc7-custom-126601-gfe26f28c586d #1
      Hardware name: QEMU MSN4700, BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      Workqueue: netns cleanup_net
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0x92/0xbd lib/dump_stack.c:106
       print_address_description mm/kasan/report.c:313 [inline]
       print_report.cold+0x5e/0x5cf mm/kasan/report.c:429
       kasan_report+0xb9/0xf0 mm/kasan/report.c:491
       __asan_report_load4_noabort+0x14/0x20 mm/kasan/report_generic.c:306
       mlxsw_sp_acl_tcam_vregion_rehash_intrvl_get+0xbd/0xd0 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c:904
       mlxsw_sp_acl_region_rehash_intrvl_get+0x49/0x60 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c:1106
       mlxsw_sp_params_acl_region_rehash_intrvl_get+0x33/0x80 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:3854
       devlink_param_get net/core/devlink.c:4981 [inline]
       devlink_nl_param_fill+0x238/0x12d0 net/core/devlink.c:5089
       devlink_param_notify+0xe5/0x230 net/core/devlink.c:5168
       devlink_ns_change_notify net/core/devlink.c:4417 [inline]
       devlink_ns_change_notify net/core/devlink.c:4396 [inline]
       devlink_reload+0x15f/0x700 net/core/devlink.c:4507
       devlink_pernet_pre_exit+0x112/0x1d0 net/core/devlink.c:12272
       ops_pre_exit_list net/core/net_namespace.c:152 [inline]
       cleanup_net+0x494/0xc00 net/core/net_namespace.c:582
       process_one_work+0x9fc/0x1710 kernel/workqueue.c:2289
       worker_thread+0x675/0x10b0 kernel/workqueue.c:2436
       kthread+0x30c/0x3d0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
       </TASK>
      
      The buggy address belongs to the physical page:
      page:ffffea0000267700 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x99dc
      flags: 0x100000000000000(node=0|zone=1)
      raw: 0100000000000000 0000000000000000 dead000000000122 0000000000000000
      raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8880099dce80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff8880099dcf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      >ffff8880099dcf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                                          ^
       ffff8880099dd000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff8880099dd080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      ==================================================================
      
      Fixes: 98bbf70c ("mlxsw: spectrum: add "acl_region_rehash_interval" devlink param")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
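The fix described in this commit message amounts to a guard in front of the parameter callbacks. The following is a simplified user-space sketch, not the devlink code: the struct, its `reload_failed` field, and the function names are hypothetical stand-ins for the state the message calls "the failed state".

```c
#include <errno.h>
#include <stdbool.h>

/* Toy device instance with one integer parameter. */
struct sketch_devlink {
	bool reload_failed;
	int param_value;
};

/* Reject reads while the instance is in the failed state, so the
 * driver callback (and the memory it would touch) is never reached. */
int sketch_param_get(const struct sketch_devlink *dl, int *out)
{
	if (dl->reload_failed)
		return -EOPNOTSUPP;

	*out = dl->param_value;
	return 0;
}

/* Same guard for writes. */
int sketch_param_set(struct sketch_devlink *dl, int value)
{
	if (dl->reload_failed)
		return -EOPNOTSUPP;

	dl->param_value = value;
	return 0;
}
```

As the message notes, -EOPNOTSUPP is chosen so that a parameters dump skips the affected parameter instead of aborting.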
    • vsock: Set socket state back to SS_UNCONNECTED in vsock_connect_timeout() · a3e7b29e
      Committed by Peilin Ye
      Imagine two non-blocking vsock_connect() requests on the same socket.
      The first request schedules @connect_work, and after it times out,
      vsock_connect_timeout() sets *sock* state back to TCP_CLOSE, but keeps
      *socket* state as SS_CONNECTING.
      
      Later, the second request returns -EALREADY, meaning the socket "already
      has a pending connection in progress", even though the first request has
      already timed out.
      
      As suggested by Stefano, fix it by setting *socket* state back to
      SS_UNCONNECTED, so that the second request will return -ETIMEDOUT.
      Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
      Fixes: d021c344 ("VSOCK: Introduce VM Sockets")
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>