1. 10 Mar 2019, 7 commits
    • netlabel: fix out-of-bounds memory accesses · e3713abc
      Paul Moore committed
      [ Upstream commit 5578de4834fe0f2a34fedc7374be691443396d1f ]
      
      There are two array out-of-bounds memory accesses, one in
      cipso_v4_map_lvl_valid(), the other in netlbl_bitmap_walk().  Both
      errors are embarrassingly simple, and the fixes are straightforward.
      
      As an FYI for anyone backporting this patch to kernels prior to v4.8,
      you'll want to apply the netlbl_bitmap_walk() patch to
      cipso_v4_bitmap_walk() as netlbl_bitmap_walk() doesn't exist before
      Linux v4.8.
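
      The kind of bounds check these fixes restore can be sketched in userspace. This assumes an MSB-first, CIPSO-style bitmap that is `len` bits long. The name `bitmap_walk_sketch` is hypothetical: it is not the kernel's netlbl_bitmap_walk() and does not reproduce the patch. Each byte is read only after the bit index has been checked against `len`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the kernel's netlbl_bitmap_walk()): return the
 * index of the first bit at or after 'offset' whose value equals 'state',
 * or -1 if there is none. Bits are numbered MSB-first within each byte.
 * The loop checks 'bit < len' before indexing, so only the bytes
 * bitmap[0] .. bitmap[(len - 1) / 8] are ever read. */
long bitmap_walk_sketch(const uint8_t *bitmap, size_t len,
                        size_t offset, int state)
{
    for (size_t bit = offset; bit < len; bit++) {
        uint8_t mask = (uint8_t)(0x80u >> (bit % 8));
        int is_set = (bitmap[bit / 8] & mask) != 0;

        if (is_set == (state != 0))
            return (long)bit;
    }
    return -1; /* no matching bit within the first 'len' bits */
}
```
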
      Reported-by: Jann Horn <jannh@google.com>
      Fixes: 446fda4f ("[NetLabel]: CIPSOv4 engine")
      Fixes: 3faa8f98 ("netlabel: Move bitmap manipulation functions to the NetLabel core.")
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipv4: Add ICMPv6 support when parse route ipproto · 99ed9458
      Hangbin Liu committed
      [ Upstream commit 5e1a99eae84999a2536f50a0beaf5d5262337f40 ]
      
      For ip rules, we need to use 'ipproto ipv6-icmp' to match ICMPv6 headers.
      But for ip -6 route, currently we only support tcp, udp and icmp.
      
      Add ICMPv6 support so we can match ipv6-icmp rules for route lookup.
      
      v2: As David Ahern and Sabrina Dubroca suggested, add an argument to
      rtm_getroute_parse_ip_proto() to handle ICMP/ICMPv6 for different families.
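
      The family-aware check described in v2 can be sketched as follows. This is a userspace illustration with a hypothetical name, `ip_proto_valid_sketch`. The accepted set (TCP and UDP for both families, ICMP only for AF_INET, ICMPv6 only for AF_INET6) is inferred from the description above, not copied from the patch.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical sketch: TCP and UDP are valid for either family, while the
 * ICMP flavour must match the address family of the route lookup. */
int ip_proto_valid_sketch(int ip_proto, int family)
{
    switch (ip_proto) {
    case IPPROTO_TCP:
    case IPPROTO_UDP:
        return 1;
    case IPPROTO_ICMP:
        return family == AF_INET;
    case IPPROTO_ICMPV6:
        return family == AF_INET6;
    default:
        return 0;
    }
}
```
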
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Fixes: eacb9384 ("ipv6: support sport, dport and ip_proto in RTM_GETROUTE")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tipc: fix RDM/DGRAM connect() regression · 8d1b9800
      Erik Hugne committed
      [ Upstream commit 0e63208915a8d7590d0a6218dadb2a6a00ac705a ]
      
      Fix a regression introduced in
      commit 365ad353 ("tipc: reduce risk of user starvation during link
      congestion")
      
      Only signal -EDESTADDRREQ for RDM/DGRAM if we don't have a cached
      sockaddr.
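
      The intended rule can be written as a small decision. If the caller gives no destination, fall back to the cached peer address, and fail with -EDESTADDRREQ only when that cache is also empty. This is a hedged sketch: `struct peer_addr_sketch`, its `valid` field and `pick_dest_sketch()` are hypothetical stand-ins for the TIPC socket state.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached sockaddr: 'valid' is nonzero once
 * connect() has stored a peer address. */
struct peer_addr_sketch {
    int valid;
    unsigned int port;
};

/* Choose the destination for an RDM/DGRAM send. An explicit destination
 * wins; otherwise use the cached peer; report -EDESTADDRREQ only if
 * neither exists. On success *out points at the chosen address. */
int pick_dest_sketch(const struct peer_addr_sketch *dest,
                     const struct peer_addr_sketch *cached,
                     const struct peer_addr_sketch **out)
{
    if (dest == NULL) {
        if (!cached->valid)
            return -EDESTADDRREQ;
        dest = cached;
    }
    *out = dest;
    return 0;
}
```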
      
      Fixes: 365ad353 ("tipc: reduce risk of user starvation during link congestion")
      Signed-off-by: Erik Hugne <erik.hugne@gmail.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sctp: call iov_iter_revert() after sending ABORT · 8085d6d0
      Xin Long committed
      [ Upstream commit 901efe12318b1ea8d3e2c88a7b75ed6e6d5d7245 ]
      
      The user msg is also copied to the abort packet when doing SCTP_ABORT in
      sctp_sendmsg_check_sflags(). When SCTP_SENDALL is set, iov_iter_revert()
      should have been called after sending the abort, so that the msg can be
      copied again for the abort on the next asoc. Otherwise, memcpy_from_msg()
      in sctp_make_abort_user() will fail and return an error.
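
      The need for the revert can be modelled with a simple cursor over the user buffer. Each ABORT copy consumes the cursor, so it must be rewound before the same message is copied for the next association. `struct msg_iter_sketch`, `copy_from_iter_sketch()`, `iter_revert_sketch()` and `abort_all_sketch()` are hypothetical userspace stand-ins, not the kernel's iov_iter API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cursor over a flat user buffer. */
struct msg_iter_sketch {
    const char *base;
    size_t len;   /* total bytes in the message */
    size_t pos;   /* bytes already consumed */
};

/* Copy exactly n bytes and advance the cursor. If fewer than n bytes
 * remain, copy nothing and return -1, much as memcpy_from_msg() fails on
 * an exhausted iterator. */
int copy_from_iter_sketch(struct msg_iter_sketch *it, char *dst, size_t n)
{
    if (it->len - it->pos < n)
        return -1;
    memcpy(dst, it->base + it->pos, n);
    it->pos += n;
    return 0;
}

/* Rewind the cursor by n bytes, in the spirit of iov_iter_revert(). */
void iter_revert_sketch(struct msg_iter_sketch *it, size_t n)
{
    it->pos = n > it->pos ? 0 : it->pos - n;
}

/* Build one ABORT payload per association from the same message. With
 * 'revert' set, the cursor is rewound after each copy. Returns how many
 * associations got their payload. */
int abort_all_sketch(struct msg_iter_sketch *it, int nasoc, int revert)
{
    char payload[64];
    int sent = 0;

    if (it->len > sizeof(payload))
        return 0;
    for (int i = 0; i < nasoc; i++) {
        if (copy_from_iter_sketch(it, payload, it->len) != 0)
            break;
        sent++;
        if (revert)
            iter_revert_sketch(it, it->len);
    }
    return sent;
}
```

      Without the revert, only the first association receives its ABORT payload. Every later copy finds the cursor already at the end of the message.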
      
      Fixes: 49102805 ("sctp: add support for snd flag SCTP_SENDALL process in sendmsg")
      Reported-by: Ying Xu <yinxu@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net-sysfs: Fix mem leak in netdev_register_kobject · 7ce2a517
      YueHaibing committed
      [ Upstream commit 895a5e96dbd6386c8e78e5b78e067dcc67b7f0ab ]
      
      syzkaller reported this:
      BUG: memory leak
      unreferenced object 0xffff88837a71a500 (size 256):
        comm "syz-executor.2", pid 9770, jiffies 4297825125 (age 17.843s)
        hex dump (first 32 bytes):
          00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .....N..........
          ff ff ff ff ff ff ff ff 20 c0 ef 86 ff ff ff ff  ........ .......
        backtrace:
          [<00000000db12624b>] netdev_register_kobject+0x124/0x2e0 net/core/net-sysfs.c:1751
          [<00000000dc49a994>] register_netdevice+0xcc1/0x1270 net/core/dev.c:8516
          [<00000000e5f3fea0>] tun_set_iff drivers/net/tun.c:2649 [inline]
          [<00000000e5f3fea0>] __tun_chr_ioctl+0x2218/0x3d20 drivers/net/tun.c:2883
          [<000000001b8ac127>] vfs_ioctl fs/ioctl.c:46 [inline]
          [<000000001b8ac127>] do_vfs_ioctl+0x1a5/0x10e0 fs/ioctl.c:690
          [<0000000079b269f8>] ksys_ioctl+0x89/0xa0 fs/ioctl.c:705
          [<00000000de649beb>] __do_sys_ioctl fs/ioctl.c:712 [inline]
          [<00000000de649beb>] __se_sys_ioctl fs/ioctl.c:710 [inline]
          [<00000000de649beb>] __x64_sys_ioctl+0x74/0xb0 fs/ioctl.c:710
          [<000000007ebded1e>] do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
          [<00000000db315d36>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
          [<00000000115be9bb>] 0xffffffffffffffff
      
      It should call kset_unregister() to free 'dev->queues_kset' in the
      error path of register_queue_kobjects(); otherwise it causes a memory leak.
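
      The fix follows the usual goto-unwind idiom: anything allocated before a failing step is released on the error path. In this hedged userspace model, malloc()/free() stand in for creating dev->queues_kset and for kset_unregister(). `register_kobjects_sketch()`, `obj_alloc()`, `obj_free()` and the `fail_queues` flag are hypothetical.

```c
#include <stdlib.h>

/* Counts outstanding allocations so that a leak is observable. */
static int live_objects;

static void *obj_alloc(void)
{
    void *p = malloc(16);
    if (p)
        live_objects++;
    return p;
}

static void obj_free(void *p)
{
    if (p) {
        free(p);
        live_objects--;
    }
}

/* Hypothetical model: create the queues kset, then register the queue
 * kobjects. If the second step fails, the kset is released on the error
 * path instead of being leaked. */
int register_kobjects_sketch(void **queues_kset, int fail_queues)
{
    *queues_kset = obj_alloc();
    if (!*queues_kset)
        return -1;

    if (fail_queues)
        goto err_free_kset;   /* models register_queue_kobjects() failing */

    return 0;

err_free_kset:
    obj_free(*queues_kset);   /* models the missing kset_unregister() */
    *queues_kset = NULL;
    return -1;
}
```
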
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Fixes: 1d24eb48 ("xps: Transmit Packet Steering")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: sched: put back q.qlen into a single location · 3043bfe0
      Eric Dumazet committed
      [ Upstream commit 46b1c18f9deb326a7e18348e668e4c7ab7c7458b ]
      
      In the series fc8b81a5 ("Merge branch 'lockless-qdisc-series'")
      John made the assumption that the data path had no need to read
      the qdisc qlen (number of packets in the qdisc).
      
      It is true when pfifo_fast is used as the root qdisc, or as direct MQ/MQPRIO
      children.
      
      But pfifo_fast can be used as a leaf in classful qdiscs, and existing
      logic needs to access the child qlen in an efficient way.
      
      HTB breaks badly, since it uses cl->leaf.q->q.qlen in:
        htb_activate() -> WARN_ON()
        htb_dequeue_tree() to decide if a class can be htb_deactivated
        when it has no more packets.
      
      HFSC, DRR, CBQ, QFQ have similar issues, and some calls to
      qdisc_tree_reduce_backlog() also read q.qlen directly.
      
      Using qdisc_qlen_sum() (which iterates over all possible cpus)
      in the data path is a non-starter.
      
      It seems we have to put back qlen in a central location,
      at least for stable kernels.
      
      For all qdiscs but pfifo_fast, qlen is guarded by the qdisc lock,
      so the existing q.qlen{++|--} are correct.
      
      For 'lockless' qdiscs (pfifo_fast so far), we need to use atomic_{inc|dec}()
      because the spinlock might not be held (for example from
      pfifo_fast_enqueue() and pfifo_fast_dequeue()).
      
      This patch adds atomic_qlen (in the same location as qlen)
      and renames the following helpers, since we want to express
      they can be used without qdisc lock, and that qlen is no longer percpu.
      
      - qdisc_qstats_cpu_qlen_dec -> qdisc_qstats_atomic_qlen_dec()
      - qdisc_qstats_cpu_qlen_inc -> qdisc_qstats_atomic_qlen_inc()
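
      In the lockless case, the idea is a single shared counter updated with atomic read-modify-write operations. This replaces per-CPU counters that would have to be summed. The C11 sketch below is minimal, and `struct qdisc_sketch` and the `*_sketch` helpers are hypothetical. They only mirror the shape of qdisc_qstats_atomic_qlen_inc()/dec(), not the kernel implementation.

```c
#include <stdatomic.h>

/* Hypothetical qdisc with one central queue-length counter. */
struct qdisc_sketch {
    atomic_uint qlen;
};

void qdisc_init_sketch(struct qdisc_sketch *q)
{
    atomic_init(&q->qlen, 0);
}

/* Safe without a lock: the update is a single atomic RMW. */
void qlen_inc_sketch(struct qdisc_sketch *q)
{
    atomic_fetch_add_explicit(&q->qlen, 1, memory_order_relaxed);
}

void qlen_dec_sketch(struct qdisc_sketch *q)
{
    atomic_fetch_sub_explicit(&q->qlen, 1, memory_order_relaxed);
}

/* A parent qdisc (e.g. HTB) reads one location instead of summing
 * per-CPU counters. */
unsigned int qlen_read_sketch(struct qdisc_sketch *q)
{
    return atomic_load_explicit(&q->qlen, memory_order_relaxed);
}
```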
      
      Later (net-next) we might revert this patch by tracking all these
      qlen uses and replacing them with a more efficient method (not having
      to access a precise qlen, but an empty/non_empty status that might
      be less expensive to maintain/track).
      
      Another possibility is to have a legacy pfifo_fast version that would
      be used as a child qdisc, since the parent qdisc needs
      a spinlock anyway. But then, future lockless qdiscs would also
      have the same problem.
      
      Fixes: 7e66016f ("net: sched: helpers to sum qlen and qlen for per cpu logic")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ip6mr: Do not call __IP6_INC_STATS() from preemptible context · b5ff77dd
      Ido Schimmel committed
      [ Upstream commit 87c11f1ddbbad38ad8bad47af133a8208985fbdf ]
      
      Similar to commit 44f49dd8 ("ipmr: fix possible race resulting from
      improper usage of IP_INC_STATS_BH() in preemptible context."), we cannot
      assume preemption is disabled when incrementing the counter and
      accessing a per-CPU variable.
      
      Preemption can be enabled when we add a route in process context that
      corresponds to packets stored in the unresolved queue, which are then
      forwarded using this route [1].
      
      Fix this by using IP6_INC_STATS() which takes care of disabling
      preemption on architectures where it is needed.
      
      [1]
      [  157.451447] BUG: using __this_cpu_add() in preemptible [00000000] code: smcrouted/2314
      [  157.460409] caller is ip6mr_forward2+0x73e/0x10e0
      [  157.460434] CPU: 3 PID: 2314 Comm: smcrouted Not tainted 5.0.0-rc7-custom-03635-g22f2712113f1 #1336
      [  157.460449] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
      [  157.460461] Call Trace:
      [  157.460486]  dump_stack+0xf9/0x1be
      [  157.460553]  check_preemption_disabled+0x1d6/0x200
      [  157.460576]  ip6mr_forward2+0x73e/0x10e0
      [  157.460705]  ip6_mr_forward+0x9a0/0x1510
      [  157.460771]  ip6mr_mfc_add+0x16b3/0x1e00
      [  157.461155]  ip6_mroute_setsockopt+0x3cb/0x13c0
      [  157.461384]  do_ipv6_setsockopt.isra.8+0x348/0x4060
      [  157.462013]  ipv6_setsockopt+0x90/0x110
      [  157.462036]  rawv6_setsockopt+0x4a/0x120
      [  157.462058]  __sys_setsockopt+0x16b/0x340
      [  157.462198]  __x64_sys_setsockopt+0xbf/0x160
      [  157.462220]  do_syscall_64+0x14d/0x610
      [  157.462349]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: 0912ea38 ("[IPV6] MROUTE: Add stats in multicast routing module method ip6_mr_forward().")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reported-by: Amit Cohen <amitc@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 06 Mar 2019, 5 commits
    • cfg80211: extend range deviation for DMG · 99b1dbe6
      Chaitanya Tata committed
      [ Upstream commit 93183bdbe73bbdd03e9566c8dc37c9d06b0d0db6 ]
      
      Recently, DMG frequency bands have been extended up to 71 GHz, so extend
      the range check to 20 GHz (45-71 GHz); otherwise some channels will be
      marked as disabled.
      Signed-off-by: Chaitanya Tata <Chaitanya.Tata@bluwireless.co.uk>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • mac80211: Add attribute aligned(2) to struct 'action' · 7a27cb60
      Mathieu Malaterre committed
      [ Upstream commit 7c53eb5d87bc21464da4268c3c0c47457b6d9c9b ]
      
      During refactor in commit 9e478066 ("mac80211: fix MU-MIMO
      follow-MAC mode") a new struct 'action' was declared with packed
      attribute as:
      
        struct {
                struct ieee80211_hdr_3addr hdr;
                u8 category;
                u8 action_code;
        } __packed action;
      
      But since struct 'ieee80211_hdr_3addr' is declared with an aligned
      keyword as:
      
        struct ieee80211_hdr {
        	__le16 frame_control;
        	__le16 duration_id;
        	u8 addr1[ETH_ALEN];
        	u8 addr2[ETH_ALEN];
        	u8 addr3[ETH_ALEN];
        	__le16 seq_ctrl;
        	u8 addr4[ETH_ALEN];
        } __packed __aligned(2);
      
      Solve the ambiguity of placing aligned structure in a packed one by
      adding the aligned(2) attribute to struct 'action'.
      
      This removes the following warning (W=1):
      
        net/mac80211/rx.c:234:2: warning: alignment 1 of 'struct <anonymous>' is less than 2 [-Wpacked-not-aligned]
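
      The resulting layout can be checked in userspace. This sketch uses the GCC/Clang attribute spellings behind the kernel's __packed and __aligned(2) macros. The struct names and `ETH_ALEN_SKETCH` are hypothetical, uint16_t stands in for __le16, and the header follows the 3-address layout (no addr4). Because the outer struct also carries aligned(2), its alignment is no longer lower than that of the embedded 2-byte-aligned header.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN_SKETCH 6

/* 3-address 802.11 header: packed, but 2-byte aligned. */
struct hdr_3addr_sketch {
    uint16_t frame_control;
    uint16_t duration_id;
    uint8_t addr1[ETH_ALEN_SKETCH];
    uint8_t addr2[ETH_ALEN_SKETCH];
    uint8_t addr3[ETH_ALEN_SKETCH];
    uint16_t seq_ctrl;
} __attribute__((packed, aligned(2)));

/* The fixed 'action' struct: packed, plus aligned(2) so that its
 * alignment matches that of its first member. */
struct action_sketch {
    struct hdr_3addr_sketch hdr;
    uint8_t category;
    uint8_t action_code;
} __attribute__((packed, aligned(2)));
```
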
      
      Cc: Johannes Berg <johannes.berg@intel.com>
      Suggested-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: Mathieu Malaterre <malat@debian.org>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • mac80211: don't initiate TDLS connection if station is not associated to AP · 0a7c9282
      Balaji Pothunoori committed
      [ Upstream commit 7ed5285396c257fd4070b1e29e7b2341aae2a1ce ]
      
      The following call trace is observed while adding a TDLS peer entry in
      the driver during TDLS setup.
      
      Call Trace:
      [<c1301476>] dump_stack+0x47/0x61
      [<c10537d2>] __warn+0xe2/0x100
      [<fa22415f>] ? sta_apply_parameters+0x49f/0x550 [mac80211]
      [<c1053895>] warn_slowpath_null+0x25/0x30
      [<fa22415f>] sta_apply_parameters+0x49f/0x550 [mac80211]
      [<fa20ad42>] ? sta_info_alloc+0x1c2/0x450 [mac80211]
      [<fa224623>] ieee80211_add_station+0xe3/0x160 [mac80211]
      [<c1876fe3>] nl80211_new_station+0x273/0x420
      [<c170f6d9>] genl_rcv_msg+0x219/0x3c0
      [<c170f4c0>] ? genl_rcv+0x30/0x30
      [<c170ee7e>] netlink_rcv_skb+0x8e/0xb0
      [<c170f4ac>] genl_rcv+0x1c/0x30
      [<c170e8aa>] netlink_unicast+0x13a/0x1d0
      [<c170ec18>] netlink_sendmsg+0x2d8/0x390
      [<c16c5acd>] sock_sendmsg+0x2d/0x40
      [<c16c6369>] ___sys_sendmsg+0x1d9/0x1e0
      
      Fix this by allowing a TDLS setup request only once association has
      completed.
      Signed-off-by: Balaji Pothunoori <bpothuno@codeaurora.org>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • mac80211: fix miscounting of ttl-dropped frames · a2887f6f
      Bob Copeland committed
      [ Upstream commit a0dc02039a2ee54fb4ae400e0b755ed30e73e58c ]
      
      In ieee80211_rx_h_mesh_fwding, we increment the 'dropped_frames_ttl'
      counter when we decrement the ttl to zero.  For unicast frames
      destined for other hosts, we stop processing the frame at that point.
      
      For multicast frames, we do not rebroadcast it in this case, but we
      do pass the frame up the stack to process it on this STA.  That
      doesn't match the usual definition of "dropped," so don't count
      those as such.
      
      With this change, something like `ping6 -i0.2 ff02::1%mesh0` from a
      peer in a ttl=1 network no longer increments the counter rapidly.
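
      The accounting rule reduces to a small decision for a frame that is a forwarding candidate. In this hedged sketch, `enum fwd_action_sketch`, `mesh_ttl_sketch()` and the plain `dropped_frames_ttl` counter are hypothetical. The real ieee80211_rx_h_mesh_fwding() handles many more cases.

```c
/* What to do with a mesh frame after its TTL has been decremented. */
enum fwd_action_sketch {
    FWD_FORWARD,        /* ttl still > 0: keep forwarding */
    FWD_DROP,           /* unicast for another host with ttl == 0 */
    FWD_DELIVER_ONLY    /* multicast with ttl == 0: no rebroadcast, local delivery */
};

/* 'ttl' is the value after decrementing. Only frames that are really
 * discarded bump the counter. */
enum fwd_action_sketch mesh_ttl_sketch(unsigned int ttl, int is_multicast,
                                       unsigned long *dropped_frames_ttl)
{
    if (ttl > 0)
        return FWD_FORWARD;
    if (is_multicast)
        return FWD_DELIVER_ONLY;  /* passed up the stack: not "dropped" */
    (*dropped_frames_ttl)++;
    return FWD_DROP;
}
```
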
      Signed-off-by: Bob Copeland <bobcopeland@fb.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • mac80211: Change default tx_sk_pacing_shift to 7 · a7c6cf3b
      Toke Høiland-Jørgensen committed
      commit 5c14a4d05f68415af9e41a4e667d1748d41d1baf upstream.
      
      When we did the original tests for the optimal value of sk_pacing_shift, we
      came up with 6 ms of buffering as the default. Sadly, 6 is not a power of
      two, so when picking the shift value I erred on the side of less buffering
      and picked 4 ms instead of 8. This was probably wrong; those 2 ms of extra
      buffering make a larger difference than I thought.
      
      So, change the default pacing shift to 7, which corresponds to 8 ms of
      buffering. The point of diminishing returns really kicks in after 8 ms, and
      so having this as a default should cut down on the need for extensive
      per-device testing and overrides needed in the drivers.
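
      The shift-to-milliseconds mapping is plain power-of-two arithmetic. This assumes, as in TCP Small Queues, that the per-socket limit is about sk_pacing_rate >> sk_pacing_shift bytes, i.e. 1/2^shift seconds of data at the pacing rate. The helper name is hypothetical.

```c
/* Hypothetical helper: milliseconds of buffering that a given
 * sk_pacing_shift corresponds to (rate >> shift == rate / 2^shift). */
double pacing_shift_to_ms(unsigned int shift)
{
    return 1000.0 / (double)(1u << shift);
}
```

      Shift 8 gives about 3.9 ms (the "4 ms" above), and shift 7 gives about 7.8 ms (the "8 ms").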
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  3. 27 Feb 2019, 28 commits
    • netfilter: ipt_CLUSTERIP: fix sleep-in-atomic bug in clusterip_config_entry_put() · 6546e115
      Taehee Yoo committed
      commit 2a61d8b883bbad26b06d2e6cc3777a697e78830d upstream.
      
      proc_remove() can sleep, so it must not be called while holding a
      spin_lock. Hence proc_remove() is moved outside of the spin_lock, and a
      mutex is added to synchronize creation and removal of the proc entry
      (config->pde).
      
      test commands:
      SHELL#1
         %while :; do iptables -A INPUT -p udp -i enp2s0 -d 192.168.1.100 \
      	   --dport 9000  -j CLUSTERIP --new --hashmode sourceip \
      	   --clustermac 01:00:5e:00:00:21 --total-nodes 3 --local-node 3; \
      	   iptables -F; done
      
      SHELL#2
         %while :; do echo +1 > /proc/net/ipt_CLUSTERIP/192.168.1.100; \
      	   echo -1 > /proc/net/ipt_CLUSTERIP/192.168.1.100; done
      
      [ 2949.569864] BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
      [ 2949.579944] in_atomic(): 1, irqs_disabled(): 0, pid: 5472, name: iptables
      [ 2949.587920] 1 lock held by iptables/5472:
      [ 2949.592711]  #0: 000000008f0ebcf2 (&(&cn->lock)->rlock){+...}, at: refcount_dec_and_lock+0x24/0x50
      [ 2949.603307] CPU: 1 PID: 5472 Comm: iptables Tainted: G        W         4.19.0-rc5+ #16
      [ 2949.604212] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
      [ 2949.604212] Call Trace:
      [ 2949.604212]  dump_stack+0xc9/0x16b
      [ 2949.604212]  ? show_regs_print_info+0x5/0x5
      [ 2949.604212]  ___might_sleep+0x2eb/0x420
      [ 2949.604212]  ? set_rq_offline.part.87+0x140/0x140
      [ 2949.604212]  ? _rcu_barrier_trace+0x400/0x400
      [ 2949.604212]  wait_for_completion+0x94/0x710
      [ 2949.604212]  ? wait_for_completion_interruptible+0x780/0x780
      [ 2949.604212]  ? __kernel_text_address+0xe/0x30
      [ 2949.604212]  ? __lockdep_init_map+0x10e/0x5c0
      [ 2949.604212]  ? __lockdep_init_map+0x10e/0x5c0
      [ 2949.604212]  ? __init_waitqueue_head+0x86/0x130
      [ 2949.604212]  ? init_wait_entry+0x1a0/0x1a0
      [ 2949.604212]  proc_entry_rundown+0x208/0x270
      [ 2949.604212]  ? proc_reg_get_unmapped_area+0x370/0x370
      [ 2949.604212]  ? __lock_acquire+0x4500/0x4500
      [ 2949.604212]  ? complete+0x18/0x70
      [ 2949.604212]  remove_proc_subtree+0x143/0x2a0
      [ 2949.708655]  ? remove_proc_entry+0x390/0x390
      [ 2949.708655]  clusterip_tg_destroy+0x27a/0x630 [ipt_CLUSTERIP]
      [ ... ]
      
      Fixes: b3e456fc ("netfilter: ipt_CLUSTERIP: fix a race condition of proc file creation")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • netfilter: nfnetlink_osf: add missing fmatch check · 0c1054e0
      Fernando Fernandez Mancera committed
      commit 1a6a0951fc009f6d9fe8ebea2d2417d80d54097b upstream.
      
      When we check the tcp options of a packet and it doesn't match the current
      fingerprint, the tcp packet option pointer must be restored to its initial
      value in order to do the proper tcp options check for the next fingerprint.
      
      Here we can see an example.
      Assuming the following fingerprint base with two lines:
      
      S10:64:1:60:M*,S,T,N,W6:      Linux:3.0::Linux 3.0
      S20:64:1:60:M*,S,T,N,W7:      Linux:4.19:arch:Linux 4.1
      
      The TCP options are the last field in the OS signature; all of them overlap
      except for the last one, i.e. 'W6' versus 'W7'.
      
      When a packet from Linux 4.19 arrives, osf finds no match because the
      TCP options pointer has already been advanced while checking the TCP
      options of the first line.
      
      Therefore, reset the pointer back to where it should be.
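
      The fix amounts to re-initialising the option cursor from the start of the packet's options for every fingerprint. A partial match against one line then cannot leave the cursor advanced for the next. In this minimal model each option is one character (for example "MSTN6"). `osf_match_sketch()` is hypothetical and much simpler than nf_osf's real matching.

```c
#include <stddef.h>
#include <string.h>

/* Return the index of the first fingerprint whose option sequence equals
 * the packet's options, or -1. 'opts' and each entry of 'fps' are strings
 * with one character per TCP option (a simplification). */
int osf_match_sketch(const char *opts, const char *const fps[], size_t nfp)
{
    size_t optlen = strlen(opts);

    for (size_t i = 0; i < nfp; i++) {
        const char *p = opts;          /* reset for every fingerprint */
        size_t j = 0;

        if (strlen(fps[i]) != optlen)
            continue;
        while (j < optlen && *p == fps[i][j]) {
            p++;                       /* cursor advances while matching */
            j++;
        }
        if (j == optlen)
            return (int)i;
    }
    return -1;
}
```

      Declaring `p` once outside the loop would reproduce the bug. After the first line fails on its last option, `p` would already point past "MSTN", so the second line could never match.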
      
      Fixes: 11eeef41 ("netfilter: passive OS fingerprint xtables match")
      Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • netfilter: ipv6: Don't preserve original oif for loopback address · 783359cf
      Eli Cooper committed
      commit 15df03c661cb362366ecfc3a21820cb934f3e4ca upstream.
      
      Commit 508b09046c0f ("netfilter: ipv6: Preserve link scope traffic
      original oif") made ip6_route_me_harder() keep the original oif for
      link-local and multicast packets. However, it also affected packets
      for the loopback address because it used rt6_need_strict().
      
      REDIRECT rules in the OUTPUT chain rewrite the destination to the loopback
      address, so its oif should not be preserved. This commit fixes the bug
      where redirected local packets were being dropped. Strictly speaking, the
      packet was not dropped; instead, it was sent out of the original oif rather
      than lo. A packet with daddr ::1 sent to the router is effectively
      dropped.
      
      Fixes: 508b09046c0f ("netfilter: ipv6: Preserve link scope traffic original oif")
      Signed-off-by: Eli Cooper <elicooper@gmx.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • netfilter: nft_compat: use-after-free when deleting targets · a905b82e
      Pablo Neira Ayuso committed
      commit 753c111f655e38bbd52fc01321266633f022ebe2 upstream.
      
      Fetch the pointer to the module before the target object is released.
      
      Fixes: 29e3880109e3 ("netfilter: nf_tables: fix use-after-free when deleting compat expressions")
      Fixes: 0ca743a5 ("netfilter: nf_tables: add compatibility layer for x_tables")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • netfilter: nf_tables: fix flush after rule deletion in the same batch · 1500d94e
      Pablo Neira Ayuso committed
      commit 23b7ca4f745f21c2b9cfcb67fdd33733b3ae7e66 upstream.
      
      Flush after rule deletion bogusly hits -ENOENT. Skip rules that have
      already been deleted in nft_delrule_by_chain(), which is always called
      from the flush path.
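
      In this hedged model of the flush fix, rules already deleted earlier in the same batch are skipped instead of being deleted a second time. The second deletion is what produced -ENOENT. `struct rule_sketch`, its `active` flag, `delete_rule_sketch()` and `flush_chain_sketch()` are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

struct rule_sketch {
    int active;   /* 0 once the rule was deleted earlier in this batch */
};

/* Deleting an already-deleted rule is an error, as in the bug report. */
static int delete_rule_sketch(struct rule_sketch *r)
{
    if (!r->active)
        return -ENOENT;
    r->active = 0;
    return 0;
}

/* Flush every rule of a chain, skipping rules that are already deleted. */
int flush_chain_sketch(struct rule_sketch *rules, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int err;

        if (!rules[i].active)
            continue;             /* skip instead of failing */
        err = delete_rule_sketch(&rules[i]);
        if (err)
            return err;
    }
    return 0;
}
```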
      
      Fixes: cf9dc09d ("netfilter: nf_tables: fix missing rules flushing per table")
      Reported-by: Phil Sutter <phil@nwl.cc>
      Acked-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Revert "bridge: do not add port to router list when receives query with source 0.0.0.0" · 6ecc7407
      Hangbin Liu committed
      commit 278e2148c07559dd4ad8602f22366d61eb2ee7b7 upstream.
      
      This reverts commit 5a2de63fd1a5 ("bridge: do not add port to router list
      when receives query with source 0.0.0.0") and commit 0fe5119e267f ("net:
      bridge: remove ipv6 zero address check in mcast queries")
      
      The reason is that RFC 4541 is not a standard, only a recommendation.
      Currently we elect 0.0.0.0 as Querier if there is no IP address configured
      on the bridge. If we do not add the port that receives a query with source
      0.0.0.0 to the router list, IGMP reports cannot be forwarded to the
      Querier, and IGMP data cannot be forwarded to its destination either.
      
      As Nikolay suggested, revert this change first and add a boolopt API
      to disable non-zero election in the future if needed.
      Reported-by: Linus Lüssing <linus.luessing@c0d3.blue>
      Reported-by: Sebastian Gottschall <s.gottschall@newmedia-net.de>
      Fixes: 5a2de63fd1a5 ("bridge: do not add port to router list when receives query with source 0.0.0.0")
      Fixes: 0fe5119e267f ("net: bridge: remove ipv6 zero address check in mcast queries")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mac80211: allocate tailroom for forwarded mesh packets · 6bab27b6
      Felix Fietkau committed
      commit 51d0af222f6fa43134c6187ab4f374630f6e0d96 upstream.
      
      Forwarded packets enter the tx path through ieee80211_add_pending_skb,
      which skips the ieee80211_skb_resize call.
      This fixes a WARN_ON in ccmp_encrypt_skb and the resulting packet loss.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Felix Fietkau <nbd@nbd.name>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net_sched: fix two more memory leaks in cls_tcindex · d569cb5a
      Cong Wang committed
      [ Upstream commit 1db817e75f5b9387b8db11e37d5f0624eb9223e0 ]
      
      struct tcindex_filter_result contains two parts:
      struct tcf_exts and struct tcf_result.
      
      For the local variable 'cr', its exts part is never used, yet it is
      initialized and never properly released on the success path. So
      just remove the exts part completely to fix this leak.
      
      For the local variable 'new_filter_result', it is never properly
      released on the success path if it is not used by 'r'.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net_sched: fix a memory leak in cls_tcindex · dcd62aa6
      Cong Wang committed
      [ Upstream commit 033b228e7f26b29ae37f8bfa1bc6b209a5365e9f ]
      
      When tcindex_destroy() destroys all the filter results in
      the perfect hash table, it invokes the walker to delete
      each of them. However, results with class==0 are skipped
      in either tcindex_walk() or tcindex_delete(), which causes
      a memory leak reported by kmemleak.
      
      This patch fixes it by skipping the walker and directly
      deleting these filter results so we don't miss any filter
      result.
      
      As a result of this change, we have to initialize exts->net
      properly in tcindex_alloc_perfect_hash(). For net-next, we
      need to consider whether we should initialize ->net in
      tcf_exts_init() instead; until then, just test
      CONFIG_NET_CLS_ACT=y directly.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net_sched: fix a race condition in tcindex_destroy() · 056a1798
      Cong Wang committed
      [ Upstream commit 8015d93ebd27484418d4952284fd02172fa4b0b2 ]
      
      tcindex_destroy() invokes tcindex_destroy_element() via
      a walker to delete each filter result in its perfect hash
      table, and tcindex_destroy_element() calls tcindex_delete()
      which schedules tcf RCU works to do the final deletion work.
      Unfortunately this races with the RCU callback
      __tcindex_destroy(), which could lead to use-after-free as
      reported by Adrian.
      
      Fix this by migrating this RCU callback to tcf RCU work too; since
      that workqueue is ordered, the use-after-free cannot happen.
      
      Note, we don't need to hold netns refcnt because we don't call
      tcf_exts_destroy() here.
      
      Fixes: 27ce4f05 ("net_sched: use tcf_queue_work() in tcindex filter")
      Reported-by: Adrian <bugs@abtelecom.ro>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach() · 86260097
      Hangbin Liu committed
      [ Upstream commit 173656accaf583698bac3f9e269884ba60d51ef4 ]
      
      If we disabled IPv6 from the kernel command line (ipv6.disable=1), we should
      not call ip6_err_gen_icmpv6_unreach(). This:
      
        ip link add sit1 type sit local 192.0.2.1 remote 192.0.2.2 ttl 1
        ip link set sit1 up
        ip addr add 198.51.100.1/24 dev sit1
        ping 198.51.100.2
      
      if IPv6 is disabled at boot time, will crash the kernel.
      
      v2: there's no need to use in6_dev_get(), use __in6_dev_get() instead,
          as we only need to check that idev exists and we are under
          rcu_read_lock() (from netif_receive_skb_internal()).
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Fixes: ca15a078 ("sit: generate icmpv6 error when receiving icmpv4 error")
      Cc: Oussama Ghorbel <ghorbel@pivasoftware.com>
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: socket: make bond ioctls go through compat_ifreq_ioctl() · 7aab1e6d
      Johannes Berg committed
      [ Upstream commit 98406133dd9cb9f195676eab540c270dceca879a ]
      
      Same story as before: these use struct ifreq and thus need
      to be read with the shorter version to avoid causing faults.
      
      Cc: stable@vger.kernel.org
      Fixes: f92d4fc9 ("kill bond_ioctl()")
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: socket: fix SIOCGIFNAME in compat · e37c96c1
      Johannes Berg committed
      [ Upstream commit c6c9fee35dc27362b7bac34b2fc9f5b8ace2e22c ]
      
      As reported by Robert O'Callahan in
      https://bugzilla.kernel.org/show_bug.cgi?id=202273
      reverting the previous changes in this area broke
      the SIOCGIFNAME ioctl in compat again (I'd previously
      fixed it after his previous report of breakage in
      https://bugzilla.kernel.org/show_bug.cgi?id=199469).
      
      This is obviously because I fixed SIOCGIFNAME more or
      less by accident.
      
      Fix it explicitly now by making it pass through the
      restored compat translation code.
      
      Cc: stable@vger.kernel.org
      Fixes: 4cf808e7 ("kill dev_ifname32()")
      Reported-by: Robert O'Callahan <robert@ocallahan.org>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Revert "kill dev_ifsioc()" · 50021ba9
      Johannes Berg committed
      [ Upstream commit 37ac39bdddc528c998a9f36db36937de923fdf2a ]
      
      This reverts commit bf440573 ("kill dev_ifsioc()").
      
      This wasn't really unused as implied by the original commit:
      it still handles the copy to/from user differently, and the
      commit thus caused issues such as
        https://bugzilla.kernel.org/show_bug.cgi?id=199469
      and
        https://bugzilla.kernel.org/show_bug.cgi?id=202273
      
      However, deviating from a strict revert, rename dev_ifsioc()
      to compat_ifreq_ioctl() to be clearer as to its purpose and
      add a comment.
      
      Cc: stable@vger.kernel.org
      Fixes: bf440573 ("kill dev_ifsioc()")
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      50021ba9
    • J
      Revert "socket: fix struct ifreq size in compat ioctl" · 99f3c896
      Johannes Berg committed
      [ Upstream commit 63ff03ab786ab1bc6cca01d48eacd22c95b9b3eb ]
      
      This reverts commit 1cebf8f1 ("socket: fix struct ifreq
      size in compat ioctl"); it's a bugfix for another commit that
      I'll revert next.
      
      This is not a 'perfect' revert: I'm keeping some coding style
      intact rather than reverting to the state with indentation errors.
      
      Cc: stable@vger.kernel.org
      Fixes: 1cebf8f1 ("socket: fix struct ifreq size in compat ioctl")
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      99f3c896
    • X
      sctp: set stream ext to NULL after freeing it in sctp_stream_outq_migrate · 5716864d
      Xin Long committed
      [ Upstream commit af98c5a78517c04adb5fd68bb64b1ad6fe3d473f ]
      
      In sctp_stream_init(), if sctp_stream_alloc_out() returns -ENOMEM
      after sctp_stream_outq_migrate() has freed the surplus streams' ext,
      stream->outcnt is not set to 'outcnt'.
      
      Because stream->outcnt keeps the bigger value, closing the assoc and
      freeing its streams frees the ext of those surplus streams again,
      since those stream exts were not set to NULL after being freed in
      sctp_stream_outq_migrate(). This triggers the invalid-free issue
      reported by syzbot.
      
      We fix it by simply setting them to NULL after freeing.
      
      Fixes: 5bbbbe32 ("sctp: introduce stream scheduler foundations")
      Reported-by: syzbot+58e480e7b28f2d890bfd@syzkaller.appspotmail.com
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5716864d
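Illustration (not part of the upstream patch): a minimal userspace C model of the double free described above. It assumes a simplified stream table; outq_migrate() and stream_free() are hypothetical stand-ins for sctp_stream_outq_migrate() and the assoc teardown, and the failed sctp_stream_alloc_out() is simulated by leaving outcnt unchanged.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified per-stream state. */
struct stream_out {
	int *ext;
};

struct stream {
	struct stream_out *out;
	unsigned int outcnt;
};

/* Free the ext of the surplus streams [outcnt, s->outcnt). */
static void outq_migrate(struct stream *s, unsigned int outcnt)
{
	for (unsigned int i = outcnt; i < s->outcnt; i++) {
		free(s->out[i].ext);
		s->out[i].ext = NULL; /* the fix: leave no dangling pointer */
	}
}

/* Teardown walks s->outcnt entries; free(NULL) is a no-op. */
static void stream_free(struct stream *s)
{
	for (unsigned int i = 0; i < s->outcnt; i++)
		free(s->out[i].ext);
	free(s->out);
	s->out = NULL;
	s->outcnt = 0;
}
```

Because free(NULL) is a no-op, the teardown loop stays safe even though it still walks the stale, larger outcnt.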
    • X
      sctp: call gso_reset_checksum when computing checksum in sctp_gso_segment · e8eff9f4
      Xin Long committed
      [ Upstream commit fc228abc2347e106a44c0e9b29ab70b712c4ca51 ]
      
      Jianlin reported a panic when running sctp gso over gre over a vlan device:
      
        [   84.772930] RIP: 0010:do_csum+0x6d/0x170
        [   84.790605] Call Trace:
        [   84.791054]  csum_partial+0xd/0x20
        [   84.791657]  gre_gso_segment+0x2c3/0x390
        [   84.792364]  inet_gso_segment+0x161/0x3e0
        [   84.793071]  skb_mac_gso_segment+0xb8/0x120
        [   84.793846]  __skb_gso_segment+0x7e/0x180
        [   84.794581]  validate_xmit_skb+0x141/0x2e0
        [   84.795297]  __dev_queue_xmit+0x258/0x8f0
        [   84.795949]  ? eth_header+0x26/0xc0
        [   84.796581]  ip_finish_output2+0x196/0x430
        [   84.797295]  ? skb_gso_validate_network_len+0x11/0x80
        [   84.798183]  ? ip_finish_output+0x169/0x270
        [   84.798875]  ip_output+0x6c/0xe0
        [   84.799413]  ? ip_append_data.part.50+0xc0/0xc0
        [   84.800145]  iptunnel_xmit+0x144/0x1c0
        [   84.800814]  ip_tunnel_xmit+0x62d/0x930 [ip_tunnel]
        [   84.801699]  gre_tap_xmit+0xac/0xf0 [ip_gre]
        [   84.802395]  dev_hard_start_xmit+0xa5/0x210
        [   84.803086]  sch_direct_xmit+0x14f/0x340
        [   84.803733]  __dev_queue_xmit+0x799/0x8f0
        [   84.804472]  ip_finish_output2+0x2e0/0x430
        [   84.805255]  ? skb_gso_validate_network_len+0x11/0x80
        [   84.806154]  ip_output+0x6c/0xe0
        [   84.806721]  ? ip_append_data.part.50+0xc0/0xc0
        [   84.807516]  sctp_packet_transmit+0x716/0xa10 [sctp]
        [   84.808337]  sctp_outq_flush+0xd7/0x880 [sctp]
      
      It was caused by SKB_GSO_CB(skb)->csum_start not being set in
      sctp_gso_segment(). sctp_gso_segment() calls skb_segment() with
      'feature | NETIF_F_HW_CSUM', which causes SKB_GSO_CB(skb)->csum_start
      not to be set in skb_segment().
      
      For TCP/UDP, when the features support HW_CSUM, CHECKSUM_PARTIAL is
      set and gso_reset_checksum() is called to set SKB_GSO_CB(skb)->csum_start.
      
      So SCTP should do the same as TCP/UDP and call gso_reset_checksum() when
      computing the checksum in sctp_gso_segment().
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e8eff9f4
    • K
      net/packet: fix 4gb buffer limit due to overflow check · c2ee2c70
      Kal Conley committed
      [ Upstream commit fc62814d690cf62189854464f4bd07457d5e9e50 ]
      
      When calculating rb->frames_per_block * req->tp_block_nr, the result
      can overflow. Check it for overflow without limiting the total buffer
      size to UINT_MAX.
      
      This change fixes support for packet ring buffers of UINT_MAX bytes
      or larger.
      
      Fixes: 8f8d28e4 ("net/packet: fix overflow in check for tp_frame_nr")
      Signed-off-by: Kal Conley <kal.conley@dectris.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c2ee2c70
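Illustration (not the upstream code): a sketch contrasting the two kinds of check, assuming a simplified ring_req struct that reuses the tpacket_req field names from the commit. total_bytes_fit_uint() shows how bounding tp_block_size * tp_block_nr caps the whole ring at UINT_MAX bytes; frame_count_ok() only requires the frame count to fit, guarding the multiplication before comparing it with tp_frame_nr. Both function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical subset of the ring request fields named in the commit. */
struct ring_req {
	unsigned int tp_block_size;
	unsigned int tp_block_nr;
	unsigned int tp_frame_nr;
};

/* A check of this shape caps the whole ring (in bytes) at UINT_MAX. */
static bool total_bytes_fit_uint(const struct ring_req *req)
{
	return req->tp_block_nr != 0 &&
	       req->tp_block_size <= UINT_MAX / req->tp_block_nr;
}

/*
 * Overflow-safe consistency check: only the frame count must fit in an
 * unsigned int, so rings larger than UINT_MAX bytes remain possible.
 */
static bool frame_count_ok(unsigned int frames_per_block,
			   const struct ring_req *req)
{
	if (req->tp_block_nr == 0)
		return false;
	if (frames_per_block > UINT_MAX / req->tp_block_nr)
		return false; /* the product would wrap around */
	return frames_per_block * req->tp_block_nr == req->tp_frame_nr;
}
```

Without the division guard, 65536 frames per block times 65536 blocks wraps to 0 in unsigned arithmetic and would wrongly match a tp_frame_nr of 0.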
    • L
      ipv6: propagate genlmsg_reply return code · fd49ffa3
      Li RongQing committed
      [ Upstream commit d1f20798a119be71746949ba9b2e2ff330fdc038 ]
      
      genlmsg_reply() can fail, so propagate its return code.
      
      Fixes: 915d7e5e ("ipv6: sr: add code base for control plane support of SR-IPv6")
      Signed-off-by: Li RongQing <lirongqing@baidu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      fd49ffa3
    • K
      inet_diag: fix reporting cgroup classid and fallback to priority · 589503cb
      Konstantin Khlebnikov committed
      [ Upstream commit 1ec17dbd90f8b638f41ee650558609c1af63dfa0 ]
      
      The field idiag_ext in struct inet_diag_req_v2, used as a bitmap of
      requested extensions, has only 8 bits. Thus extensions starting from
      DCTCPINFO cannot be requested directly. Some of them are included in
      the response unconditionally or are tied to one of the lower 8 bits.
      
      Extension INET_DIAG_CLASS_ID has had no way to be requested from the
      beginning.
      
      This patch bundles it with INET_DIAG_TCLASS (ipv6 tos), fixes space
      reservation, and documents the behavior of the other extensions.
      
      This patch also adds a fallback to reporting the socket priority. This
      field is more widely used for traffic classification because ipv4
      sockets automatically map TOS to priority and the default qdisc
      pfifo_fast knows about that. But the priority can be changed via
      setsockopt SO_PRIORITY, so INET_DIAG_TOS isn't enough for predicting
      the class.
      
      Also, cgroup2 obsoletes the net_cls classid (it is always zero), but we
      cannot reuse this field for reporting the cgroup2 id because that id
      is 64-bit (ino+gen).
      
      So, after this patch INET_DIAG_CLASS_ID will report the socket priority
      for the most common setups, where net_cls isn't set and/or cgroup2 is
      in use.
      
      Fixes: 0888e372 ("net: inet: diag: expose sockets cgroup classid")
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      589503cb
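Illustration (not the kernel code): a sketch of the request-bitmap limitation and the priority fallback described above. It assumes extension N is requested through bit N - 1 of the 8-bit idiag_ext, and uses the values INET_DIAG_TCLASS = 6 and INET_DIAG_DCTCPINFO = 9 as we understand them from the uapi header; ext_requested() and report_class_id() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extension numbers assumed to match uapi/linux/inet_diag.h. */
enum {
	INET_DIAG_TCLASS = 6,
	INET_DIAG_DCTCPINFO = 9,
};

/* idiag_ext is a u8: extension N is requested via bit (N - 1). */
static bool ext_requested(uint8_t idiag_ext, int ext)
{
	if (ext < 1 || ext > 8)
		return false; /* no bit exists for it in an 8-bit bitmap */
	return (idiag_ext & (1u << (ext - 1))) != 0;
}

/*
 * Sketch of the new behaviour: the class id is reported when the TCLASS
 * bit is requested, falling back to the socket priority when no net_cls
 * classid is set (e.g. under cgroup2, where it is always zero).
 */
static bool report_class_id(uint8_t idiag_ext, uint32_t cgroup_classid,
			    uint32_t sk_priority, uint32_t *out)
{
	if (!ext_requested(idiag_ext, INET_DIAG_TCLASS))
		return false;
	*out = cgroup_classid ? cgroup_classid : sk_priority;
	return true;
}
```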
    • E
      batman-adv: fix uninit-value in batadv_interface_tx() · c580bb31
      Eric Dumazet committed
      [ Upstream commit 4ffcbfac60642f63ae3d80891f573ba7e94a265c ]
      
      KMSAN reported batadv_interface_tx() was possibly using a
      garbage value [1]
      
      batadv_get_vid() does have a pskb_may_pull() call
      but batadv_interface_tx() does not actually make sure
      this did not fail.
      
      [1]
      BUG: KMSAN: uninit-value in batadv_interface_tx+0x908/0x1e40 net/batman-adv/soft-interface.c:231
      CPU: 0 PID: 10006 Comm: syz-executor469 Not tainted 4.20.0-rc7+ #5
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x173/0x1d0 lib/dump_stack.c:113
       kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613
       __msan_warning+0x82/0xf0 mm/kmsan/kmsan_instr.c:313
       batadv_interface_tx+0x908/0x1e40 net/batman-adv/soft-interface.c:231
       __netdev_start_xmit include/linux/netdevice.h:4356 [inline]
       netdev_start_xmit include/linux/netdevice.h:4365 [inline]
       xmit_one net/core/dev.c:3257 [inline]
       dev_hard_start_xmit+0x607/0xc40 net/core/dev.c:3273
       __dev_queue_xmit+0x2e42/0x3bc0 net/core/dev.c:3843
       dev_queue_xmit+0x4b/0x60 net/core/dev.c:3876
       packet_snd net/packet/af_packet.c:2928 [inline]
       packet_sendmsg+0x8306/0x8f30 net/packet/af_packet.c:2953
       sock_sendmsg_nosec net/socket.c:621 [inline]
       sock_sendmsg net/socket.c:631 [inline]
       __sys_sendto+0x8c4/0xac0 net/socket.c:1788
       __do_sys_sendto net/socket.c:1800 [inline]
       __se_sys_sendto+0x107/0x130 net/socket.c:1796
       __x64_sys_sendto+0x6e/0x90 net/socket.c:1796
       do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
       entry_SYSCALL_64_after_hwframe+0x63/0xe7
      RIP: 0033:0x441889
      Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 bb 10 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffdda6fd468 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000441889
      RDX: 000000000000000e RSI: 00000000200000c0 RDI: 0000000000000003
      RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000216 R12: 00007ffdda6fd4c0
      R13: 00007ffdda6fd4b0 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline]
       kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:158
       kmsan_kmalloc+0xa6/0x130 mm/kmsan/kmsan_hooks.c:176
       kmsan_slab_alloc+0xe/0x10 mm/kmsan/kmsan_hooks.c:185
       slab_post_alloc_hook mm/slab.h:446 [inline]
       slab_alloc_node mm/slub.c:2759 [inline]
       __kmalloc_node_track_caller+0xe18/0x1030 mm/slub.c:4383
       __kmalloc_reserve net/core/skbuff.c:137 [inline]
       __alloc_skb+0x309/0xa20 net/core/skbuff.c:205
       alloc_skb include/linux/skbuff.h:998 [inline]
       alloc_skb_with_frags+0x1c7/0xac0 net/core/skbuff.c:5220
       sock_alloc_send_pskb+0xafd/0x10e0 net/core/sock.c:2083
       packet_alloc_skb net/packet/af_packet.c:2781 [inline]
       packet_snd net/packet/af_packet.c:2872 [inline]
       packet_sendmsg+0x661a/0x8f30 net/packet/af_packet.c:2953
       sock_sendmsg_nosec net/socket.c:621 [inline]
       sock_sendmsg net/socket.c:631 [inline]
       __sys_sendto+0x8c4/0xac0 net/socket.c:1788
       __do_sys_sendto net/socket.c:1800 [inline]
       __se_sys_sendto+0x107/0x130 net/socket.c:1796
       __x64_sys_sendto+0x6e/0x90 net/socket.c:1796
       do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
       entry_SYSCALL_64_after_hwframe+0x63/0xe7
      
      Fixes: c6c8fea2 ("net: Add batman-adv meshing protocol")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Marek Lindner <mareklindner@neomailbox.ch>
      Cc: Simon Wunderlich <sw@simonwunderlich.de>
      Cc: Antonio Quartulli <a@unstable.cc>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c580bb31
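Illustration (not the batman-adv code): a userspace sketch of the rule the fix enforces, namely that a failed length check must stop the caller from reading header bytes that are not there. It assumes a plain byte buffer instead of an sk_buff; get_vid(), interface_tx(), VID_NONE and VID_ERR are hypothetical, and the second length test stands in for a failed pskb_may_pull().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN    14
#define VLAN_HLEN   4
#define ETH_P_8021Q 0x8100
#define VID_NONE    (-1) /* hypothetical: frame is untagged */
#define VID_ERR     (-2) /* hypothetical: frame too short to parse */

/* Return the 802.1Q VID, VID_NONE if untagged, or VID_ERR if too short. */
static int get_vid(const uint8_t *buf, size_t len)
{
	if (len < ETH_HLEN)
		return VID_ERR;
	uint16_t proto = (uint16_t)((buf[12] << 8) | buf[13]);
	if (proto != ETH_P_8021Q)
		return VID_NONE;
	if (len < ETH_HLEN + VLAN_HLEN)
		return VID_ERR; /* plays the role of a failed pskb_may_pull() */
	return ((buf[14] << 8) | buf[15]) & 0x0fff;
}

/* The caller drops the frame instead of using an unparsed header. */
static int interface_tx(const uint8_t *buf, size_t len)
{
	if (get_vid(buf, len) == VID_ERR)
		return -1; /* drop */
	return 0;
}
```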
    • I
      net: bridge: Mark FDB entries that were added by user as such · 4799417b
      Ido Schimmel committed
      [ Upstream commit 710ae72877378e7cde611efd30fe90502a6e5b30 ]
      
      Externally learned entries can be added by a user or by a switch driver
      that is notifying the bridge driver about entries that were learned in
      hardware.
      
      In the first case, the entries are not marked with the 'added_by_user'
      flag, which causes switch drivers to ignore them and not offload them.
      
      The 'added_by_user' flag can be set on externally learned FDB entries
      based on the 'swdev_notify' parameter in br_fdb_external_learn_add(),
      which effectively indicates whether the created / updated FDB entry
      was added by a user or not.
      
      Fixes: 816a3bed ("switchdev: Add fdb.added_by_user to switchdev notifications")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
      Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Cc: bridge@lists.linux-foundation.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      4799417b
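Illustration (a simplified model, not the bridge code): it assumes that swdev_notify is true when the entry is added by a user and false when a switch driver reports an entry it learned in hardware, and that drivers offload only entries flagged as user-added. struct fdb_entry, external_learn_add() and driver_offloads() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified FDB entry flags. */
struct fdb_entry {
	bool added_by_external_learn;
	bool added_by_user;
};

/* Assumption: swdev_notify is true only for user-initiated additions. */
static void external_learn_add(struct fdb_entry *fdb, bool swdev_notify)
{
	fdb->added_by_external_learn = true;
	if (swdev_notify)
		fdb->added_by_user = true; /* the fix: mark user-added entries */
}

/* In this model, a switch driver offloads only user-added entries. */
static bool driver_offloads(const struct fdb_entry *fdb)
{
	return fdb->added_by_user;
}
```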
    • P
      bpf: bpf_setsockopt: reset sock dst on SO_MARK changes · 8b92162f
      Peter Oskolkov committed
      [ Upstream commit f4924f24da8c7ef64195096817f3cde324091d97 ]
      
      In sock_setsockopt() (net/core/sock.c), when the SO_MARK option is used
      to change sk_mark, sk_dst_reset(sk) is called. The same should be
      done in bpf_setsockopt().
      
      Fixes: 8c4b4c7e ("bpf: Add setsockopt helper function to bpf")
      Reported-by: Maciej Żenczykowski <maze@google.com>
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Reviewed-by: Maciej Żenczykowski <maze@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      8b92162f
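Illustration (not kernel code): a userspace model of why the cached route must be dropped when the mark changes, assuming the cached route was looked up under the old mark. struct sock_model, dst_reset(), bpf_set_mark_old() and bpf_set_mark_new() are hypothetical; in the kernel the reset is done by sk_dst_reset(sk).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cached route, tagged with the mark it was looked up under. */
struct route {
	uint32_t looked_up_with_mark;
};

/* Hypothetical socket model: sk_mark plus a cached route. */
struct sock_model {
	uint32_t sk_mark;
	const struct route *dst_cache;
};

static void dst_reset(struct sock_model *sk)
{
	sk->dst_cache = NULL; /* the next transmit must do a fresh lookup */
}

/* Before the fix: the mark changes but the stale route is kept. */
static void bpf_set_mark_old(struct sock_model *sk, uint32_t val)
{
	sk->sk_mark = val;
}

/* After the fix: same behaviour as the regular setsockopt() path. */
static void bpf_set_mark_new(struct sock_model *sk, uint32_t val)
{
	sk->sk_mark = val;
	dst_reset(sk);
}
```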
    • H
      netfilter: nft_flow_offload: fix checking method of conntrack helper · 73aa8292
      Henry Yen committed
      [ Upstream commit 2314e879747e82896f51cce4488f6a00f3e1af7b ]
      
      This patch uses nfct_help() to detect whether an established connection
      needs conntrack helper instead of using test_bit(IPS_HELPER_BIT,
      &ct->status).
      
      The reason is that IPS_HELPER_BIT is only set when an explicit CT
      target is used.
      
      However, when conntrack helpers are enabled via the command
      "echo 1 > /proc/sys/net/netfilter/nf_conntrack_helper", IPS_HELPER_BIT
      is not set, so the old check cannot detect that the connection needs
      a helper.
      Signed-off-by: Henry Yen <henry.yen@mediatek.com>
      Reviewed-by: Ryder Lee <ryder.lee@mediatek.com>
      Tested-by: John Crispin <john@phrozen.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      73aa8292
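Illustration (a simplified model, not netfilter code): checking whether a helper is actually attached (roughly the question nfct_help() answers) catches both explicit and automatic helper assignment, while a status bit set only on the explicit path misses the automatic case. struct conn, STATUS_HELPER_BIT and the assign_helper_/needs_helper_ functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper descriptor. */
struct helper {
	const char *name;
};

/* Hypothetical conntrack entry: status bits plus the attached helper. */
struct conn {
	unsigned long status;
	const struct helper *helper;
};

/* Hypothetical bit standing in for IPS_HELPER_BIT. */
#define STATUS_HELPER_BIT 0

/* Explicit assignment (CT target): attaches the helper and sets the bit. */
static void assign_helper_explicit(struct conn *ct, const struct helper *h)
{
	ct->helper = h;
	ct->status |= 1UL << STATUS_HELPER_BIT;
}

/* Automatic assignment (nf_conntrack_helper=1): only attaches the helper. */
static void assign_helper_auto(struct conn *ct, const struct helper *h)
{
	ct->helper = h;
}

/* Old check: misses helpers that were attached automatically. */
static bool needs_helper_old(const struct conn *ct)
{
	return (ct->status & (1UL << STATUS_HELPER_BIT)) != 0;
}

/* New check: looks at whether a helper is attached at all. */
static bool needs_helper_new(const struct conn *ct)
{
	return ct->helper != NULL;
}
```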
    • W
      netfilter: nft_flow_offload: fix interaction with vrf slave device · 6d26c375
      wenxu committed
      [ Upstream commit 10f4e765879e514e1ce7f52ed26603047af196e2 ]
      
      In the forward chain, the iif is changed from the slave device to the
      master vrf device. Thus, flow offload does not find a match on the
      lower slave device.
      
      This patch uses the cached route, i.e. dst->dev, to update the iif and
      oif fields in the flow entry.
      
      After this patch, the following example works fine:
      
       # ip addr add dev eth0 1.1.1.1/24
       # ip addr add dev eth1 10.0.0.1/24
       # ip link add user1 type vrf table 1
       # ip l set user1 up
       # ip l set dev eth0 master user1
       # ip l set dev eth1 master user1
      
       # nft add table f
       # nft add flowtable f fb1 { hook ingress priority 0 \; devices = { eth0, eth1 } \; }
       # nft add chain f ftb-all {type filter hook forward priority 0 \; policy accept \; }
       # nft add rule f ftb-all ct zone 1 ip protocol tcp flow offload @fb1
       # nft add rule f ftb-all ct zone 1 ip protocol udp flow offload @fb1
      Signed-off-by: wenxu <wenxu@ucloud.cn>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      6d26c375
    • Y
      bpf: correctly set initial window on active Fast Open sender · 26354d53
      Yuchung Cheng committed
      [ Upstream commit 31aa6503a15ba00182ea6dbbf51afb63bf9e851d ]
      
      The existing BPF TCP initial congestion window (TCP_BPF_IW) does not
      work on an (active) Fast Open sender. This is because it changes the
      (initial) window only if data_segs_out is zero, but data_segs_out
      is also incremented on SYN-data.  This patch fixes the issue by
      properly accounting for SYN-data as well.
      
      Fixes: fc747810 ("bpf: Adds support for setting initial cwnd")
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Reviewed-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      26354d53
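Illustration (not the kernel code): the condition change described above, modeled in userspace. It assumes, per the commit, that data_segs_out already counts the SYN-data segment, and that a one-bit syn_data flag records whether the SYN carried data; struct tcp_state, iw_allowed_old() and iw_allowed_new() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of TCP socket state relevant to TCP_BPF_IW. */
struct tcp_state {
	unsigned int data_segs_out; /* data segments sent, SYN-data included */
	unsigned int syn_data : 1;  /* 1 if the SYN carried data (Fast Open) */
};

/* Old rule: only allowed before any data segment has been counted. */
static bool iw_allowed_old(const struct tcp_state *tp)
{
	return tp->data_segs_out == 0;
}

/* Fixed rule: tolerate the single segment accounted for the SYN-data. */
static bool iw_allowed_new(const struct tcp_state *tp)
{
	return tp->data_segs_out <= tp->syn_data;
}
```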
    • W
      netfilter: nft_flow_offload: Fix reverse route lookup · 535be469
      wenxu committed
      [ Upstream commit a799aea0988ea0d1b1f263e996fdad2f6133c680 ]
      
      Using the following example:
      
      	client 1.1.1.7 ---> 2.2.2.7 which dnat to 10.0.0.7 server
      
      The first reply packet (i.e. the SYN+ACK) uses an incorrect destination
      address for the reverse route lookup since it uses:
      
      	daddr = ct->tuplehash[!dir].tuple.dst.u3.ip;
      
      which is 2.2.2.7 in the scenario that is described above, while this
      should be:
      
      	daddr = ct->tuplehash[dir].tuple.src.u3.ip;
      
      that is 10.0.0.7.
      Signed-off-by: wenxu <wenxu@ucloud.cn>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      535be469
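Illustration (a simplified model, not the nft_flow_offload code): the scenario above with tuples stored as strings, assuming dir is the reply direction of the SYN+ACK. The original-direction tuple is 1.1.1.7 to 2.2.2.7; after DNAT, the reply tuple is 10.0.0.7 to 1.1.1.7. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum ct_dir { DIR_ORIGINAL = 0, DIR_REPLY = 1 };

/* Hypothetical simplified conntrack tuple (IPv4 as dotted strings). */
struct tuple {
	const char *src;
	const char *dst;
};

struct conn {
	struct tuple tuplehash[2];
};

/* Buggy: destination of the opposite-direction tuple (pre-DNAT address). */
static const char *reverse_daddr_old(const struct conn *ct, enum ct_dir dir)
{
	return ct->tuplehash[!dir].dst;
}

/* Fixed: source of this direction's tuple, i.e. the real server. */
static const char *reverse_daddr_new(const struct conn *ct, enum ct_dir dir)
{
	return ct->tuplehash[dir].src;
}
```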
    • T
      netfilter: nf_tables: fix leaking object reference count · 95d4f951
      Taehee Yoo committed
      [ Upstream commit b91d9036883793122cf6575ca4dfbfbdd201a83d ]
      
      There is no code that decreases the reference count of stateful objects
      in the error path of nft_add_set_elem(). This causes a leak of the
      stateful objects' reference count.
      
      Test commands:
         $nft add table ip filter
         $nft add counter ip filter c1
         $nft add map ip filter m1 { type ipv4_addr : counter \;}
         $nft add element ip filter m1 { 1 : c1 }
         $nft add element ip filter m1 { 1 : c1 }
         $nft delete element ip filter m1 { 1 }
         $nft delete counter ip filter c1
      
      Result:
         Error: Could not process rule: Device or resource busy
         delete counter ip filter c1
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      
      At the second 'nft add element ip filter m1 { 1 : c1 }', the reference
      count of 'c1' is increased and then the kernel tries to insert the
      element into 'm1'. But 'm1' already has the same element, so it
      returns -EEXIST, and the error path doesn't decrease the reference
      count of 'c1'. Because of this leaked reference, 'c1' can't be
      removed by 'nft delete counter ip filter c1'.
      
      Fixes: 8aeff920 ("netfilter: nf_tables: add stateful object reference to set elements")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      95d4f951
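Illustration (a toy model, not nf_tables code): the reference taken before the insert must be dropped on every error path. It assumes a one-slot map and a counter object with a use count; map_add_elem(), map_del_elem() and obj_delete() are hypothetical. The test sequence mirrors the nft commands above: add the element twice, delete it, then delete the counter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stateful object (e.g. a named counter) with a use count. */
struct obj {
	unsigned int use;
};

/* Hypothetical one-slot map whose element references an object. */
struct map {
	bool occupied;
	int key;
	struct obj *data;
};

static int map_add_elem(struct map *m, int key, struct obj *o)
{
	int err;

	o->use++; /* the new element would reference the object */
	if (m->occupied) {
		err = (m->key == key) ? -EEXIST : -ENOSPC;
		goto err_drop_ref;
	}
	m->occupied = true;
	m->key = key;
	m->data = o;
	return 0;

err_drop_ref:
	o->use--; /* the fix: undo the reference on every error path */
	return err;
}

static void map_del_elem(struct map *m, int key)
{
	if (m->occupied && m->key == key) {
		m->data->use--;
		m->occupied = false;
		m->data = NULL;
	}
}

/* Deleting a still-referenced object fails, like "Device or resource busy". */
static int obj_delete(const struct obj *o)
{
	return o->use ? -EBUSY : 0;
}
```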