1. 22 Mar, 2019 1 commit
    • sctp: use memdup_user instead of vmemdup_user · 5dc16ac5
      Committed by Xin Long
      commit ef82bcfa671b9a635bab5fa669005663d8b177c5 upstream.
      
      In sctp_setsockopt_bindx()/__sctp_setsockopt_connectx(), it allocates
      memory with addrs_size which is passed from userspace. We used flag
      GFP_USER to put some more restrictions on it in Commit cacc0621
      ("sctp: use GFP_USER for user-controlled kmalloc").
      
      However, since Commit c981f254 ("sctp: use vmemdup_user() rather
      than badly open-coding memdup_user()"), vmemdup_user() has been used,
      which doesn't check the GFP_USER flag when it goes to vmalloc_*(). So
      when addrs_size is a huge value, it could exhaust memory and even
      trigger the OOM killer.
      
      This patch uses memdup_user() instead, in which GFP_USER takes effect
      and limits the memory allocation when addrs_size is huge.
      
      Note we can't fix it by limiting 'addrs_size', as the RFC does not
      call for such a limit.
      
      Reported-by: syzbot+ec1b7575afef85a0e5ca@syzkaller.appspotmail.com
      Fixes: c981f254 ("sctp: use vmemdup_user() rather than badly open-coding memdup_user()")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
  2. 15 Mar, 2019 1 commit
    • net: kernel hookers service for toa module · 33576e37
      Committed by George Zhang
      LVS fullnat will replace network traffic's source ip with its local ip,
      and thus the backend servers cannot obtain the real client ip.
      
      To solve this, LVS has introduced the tcp option address (TOA) to store
      the essential ip address information in the last tcp ack packet of the
      3-way handshake, and the backend servers need to retrieve it from the
      packet header.
      
      In this patch, we have introduced the sk_toa_data member in the sock
      structure to hold the TOA information. There used to be an in-tree
      module for managing TOA, but it is now maintained as a standalone
      module.
      
      In this case, the toa module should register its hook function(s) using
      the provided interfaces in the hookers module.
      
      TOA in sock structure:
      
      	__be32 sk_toa_data[16];
      
      The hookers module only provides the sk_toa_data placeholder, and the
      toa module can use this variable through the layout it needs.
      
      Hook interfaces:
      
      The hookers module replaces the kernel's syn_recv_sock and getname
      handler with a stub that chains the toa module's hook function(s) to the
      original handling function. The hookers module allows hook functions to
      be installed and uninstalled in any order.
      
      toa module:
      
      The external toa module will be provided in a separate RPM package.
      
      [xuyu@linux.alibaba.com: amend commit log]
      Signed-off-by: George Zhang <georgezhang@linux.alibaba.com>
      Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
      Reviewed-by: Caspar Zhang <caspar@linux.alibaba.com>
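
The hook chaining described in this commit message can be pictured with a small userspace sketch. It is not the hookers module's code: `hook_point`, `hook_install`, `hook_uninstall`, `hook_stub` and the demo handlers are hypothetical names, and the handler signature is simplified to `int (*)(int)`. It only shows a stub that calls the original handler first and then every installed hook, with hooks removable in any order.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HOOKS 8

/* Hypothetical handler type: takes a value, returns a (possibly adjusted) value. */
typedef int (*handler_fn)(int);

struct hook_point {
    handler_fn original;          /* the handler that the stub replaced */
    handler_fn hooks[MAX_HOOKS];  /* installed hooks; NULL marks a free slot */
};

/* Install a hook in the first free slot; returns 0 on success, -1 if full. */
int hook_install(struct hook_point *hp, handler_fn fn)
{
    assert(hp != NULL && fn != NULL);
    for (size_t i = 0; i < MAX_HOOKS; i++) {
        if (hp->hooks[i] == NULL) {
            hp->hooks[i] = fn;
            return 0;
        }
    }
    return -1;
}

/* Uninstall a hook wherever it sits; returns 0 on success, -1 if not found. */
int hook_uninstall(struct hook_point *hp, handler_fn fn)
{
    assert(hp != NULL && fn != NULL);
    for (size_t i = 0; i < MAX_HOOKS; i++) {
        if (hp->hooks[i] == fn) {
            hp->hooks[i] = NULL;
            return 0;
        }
    }
    return -1;
}

/* Stub: run the original handler, then pass its result through each hook. */
int hook_stub(const struct hook_point *hp, int arg)
{
    assert(hp != NULL && hp->original != NULL);
    int ret = hp->original(arg);
    for (size_t i = 0; i < MAX_HOOKS; i++) {
        if (hp->hooks[i] != NULL)
            ret = hp->hooks[i](ret);
    }
    return ret;
}

/* Demo handlers, used only for illustration. */
int demo_original(int x) { return x + 1; }
int demo_hook_double(int x) { return x * 2; }
int demo_hook_negate(int x) { return -x; }
```

Because free slots are marked with NULL rather than compacted, removing one hook never disturbs the others. That is one simple way to get the "any order" property.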
  3. 15 Feb, 2019 9 commits
    • svcrdma: Remove max_sge check at connect time · 9b65b18f
      Committed by Chuck Lever
      commit e248aa7be86e8179f20ac0931774ecd746f3f5bf upstream.
      
      Two and a half years ago, the client was changed to use gathered
      Send for larger inline messages, in commit 655fec69 ("xprtrdma:
      Use gathered Send for large inline messages"). Several fixes were
      required because there are a few in-kernel device drivers whose
      max_sge is 3, and these were broken by the change.
      
      Apparently my memory is going, because some time later, I submitted
      commit 25fd86ec ("svcrdma: Don't overrun the SGE array in
      svc_rdma_send_ctxt"), and after that, commit f3c1fd0ee294 ("svcrdma:
      Reduce max_send_sges"). These too incorrectly assumed in-kernel
      device drivers would have more than a few Send SGEs available.
      
      The fix for the server side is not the same. This is because the
      fundamental problem on the server is that, whether or not the client
      has provisioned a chunk for the RPC reply, the server must squeeze
      even the most complex RPC replies into a single RDMA Send. Failing
      in the send path because of Send SGE exhaustion should never be an
      option.
      
      Therefore, instead of failing when the send path runs out of SGEs,
      switch to using a bounce buffer mechanism to handle RPC replies that
      are too complex for the device to send directly. That allows us to
      remove the max_sge check to enable drivers with small max_sge to
      work again.
      
      Reported-by: Don Dutile <ddutile@redhat.com>
      Fixes: 25fd86ec ("svcrdma: Don't overrun the SGE array in ...")
      Cc: stable@vger.kernel.org
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
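
As a rough userspace illustration of the bounce-buffer fallback described above (not the svcrdma implementation), the sketch below passes a reply through unchanged when it fits within the device's SGE limit. Otherwise it copies all pieces into one contiguous buffer that needs a single SGE. `struct seg` and `map_reply` are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scatter/gather element: one contiguous piece of a reply. */
struct seg {
    const void *addr;
    size_t len;
};

/*
 * Build the SGE list for one send. If the reply has more pieces than the
 * device allows (max_sge), copy everything into the bounce buffer and send
 * that as a single element instead of failing.
 * Returns the number of SGEs written to out, or 0 if the bounce buffer is
 * too small to hold the whole reply.
 */
size_t map_reply(const struct seg *in, size_t n, size_t max_sge,
                 char *bounce, size_t bounce_len, struct seg *out)
{
    assert(in != NULL && out != NULL && bounce != NULL);
    assert(max_sge >= 1);

    if (n <= max_sge) {
        /* Simple case: the device can gather the pieces directly. */
        memcpy(out, in, n * sizeof(*in));
        return n;
    }

    /* Too complex for the device: flatten into the bounce buffer. */
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i].len > bounce_len - used)
            return 0;
        memcpy(bounce + used, in[i].addr, in[i].len);
        used += in[i].len;
    }
    out[0].addr = bounce;
    out[0].len = used;
    return 1;
}
```

The cost is an extra copy on the uncommon path. In exchange, a device with a small max_sge can still send every reply.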
    • svcrdma: Reduce max_send_sges · 4d376ab8
      Committed by Chuck Lever
      commit f3c1fd0ee294abd4367dfa72d89f016c682202f0 upstream.
      
      There's no need to request a large number of send SGEs because the
      inline threshold already constrains the number of SGEs per Send.
      
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Cc: Don Dutile <ddutile@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • batman-adv: Force mac header to start of data on xmit · 4dd911f1
      Committed by Sven Eckelmann
      commit 9114daa825fc3f335f9bea3313ce667090187280 upstream.
      
      The caller of ndo_start_xmit may not already have called
      skb_reset_mac_header. The returned value of skb_mac_header/eth_hdr
      therefore can be in the wrong position and even outside the current skbuff.
      This for example happens when the user binds to the device using a
      PF_PACKET-SOCK_RAW with enabled qdisc-bypass:
      
        int opt = 4;
        setsockopt(sock, SOL_PACKET, PACKET_QDISC_BYPASS, &opt, sizeof(opt));
      
      Since eth_hdr is used all over the codebase, the batadv_interface_tx
      function must always take care of resetting it.
      
      Fixes: c6c8fea2 ("net: Add batman-adv meshing protocol")
      Reported-by: syzbot+9d7405c7faa390e60b4e@syzkaller.appspotmail.com
      Reported-by: syzbot+7d20bc3f1ddddc0f9079@syzkaller.appspotmail.com
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • batman-adv: Avoid WARN on net_device without parent in netns · a2122230
      Committed by Sven Eckelmann
      commit 955d3411a17f590364238bd0d3329b61f20c1cd2 upstream.
      
      It is not allowed to use WARN* helpers on potentially incorrect input from
      the user or on transient problems because systems configured with
      panic_on_warn will reboot due to such a problem.
      
      A NULL return value of __dev_get_by_index can be caused by various problems
      which can either be related to the system configuration or problems
      (incorrectly returned network namespaces) in other (virtual) net_device
      drivers. batman-adv should not cause a (harmful) WARN in this situation and
      instead only report it via a simple message.
      
      Fixes: b7eddd0b ("batman-adv: prevent using any virtual device created on batman-adv as hard-interface")
      Reported-by: syzbot+c764de0fcfadca9a8595@syzkaller.appspotmail.com
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xfrm: refine validation of template and selector families · 9d84284c
      Committed by Florian Westphal
      commit 35e6103861a3a970de6c84688c6e7a1f65b164ca upstream.
      
      The check assumes that in transport mode, the first template's family
      must match the address family of the policy selector.
      
      Syzkaller managed to build a template using MODE_ROUTEOPTIMIZATION,
      with an ipv4-in-ipv6 chain, leading to the following splat:
      
      BUG: KASAN: stack-out-of-bounds in xfrm_state_find+0x1db/0x1854
      Read of size 4 at addr ffff888063e57aa0 by task a.out/2050
       xfrm_state_find+0x1db/0x1854
       xfrm_tmpl_resolve+0x100/0x1d0
       xfrm_resolve_and_create_bundle+0x108/0x1000 [..]
      
      The problem is that the addresses point into the flowi4 struct, but
      xfrm_state_find treats them as ipv6 because templ->encap_family is used
      (AF_INET6 in the case of the reproducer) rather than family (AF_INET).
      
      This patch inverts the logic: Enforce 'template family must match
      selector' EXCEPT for tunnel and BEET mode.
      
      In BEET and Tunnel mode, xfrm_tmpl_resolve_one will have remote/local
      address pointers changed to point at the addresses found in the template,
      rather than the flowi ones, so no oob read will occur.
      
      Reported-by: 3ntr0py1337@gmail.com
      Reported-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
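
The inverted rule ("template family must match selector, except for tunnel and BEET mode") can be sketched as a standalone check. This is a simplified illustration, not the kernel's validation function. The enum names and `validate_chain` are hypothetical, and carrying a running "previous family" along the chain is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's mode and family constants. */
enum tmpl_mode { MODE_TRANSPORT, MODE_TUNNEL, MODE_ROUTEOPTIMIZATION, MODE_BEET };
enum addr_family { FAMILY_INET, FAMILY_INET6 };

struct tmpl {
    enum tmpl_mode mode;
    enum addr_family family;
};

/*
 * Returns 0 if the template chain is acceptable, -1 otherwise.
 * Each template must use the family currently in effect, unless it is a
 * tunnel or BEET template, which may switch to its own family.
 */
int validate_chain(const struct tmpl *t, size_t n, enum addr_family selector_family)
{
    assert(t != NULL || n == 0);

    enum addr_family prev = selector_family;  /* assumption: running family */
    for (size_t i = 0; i < n; i++) {
        switch (t[i].mode) {
        case MODE_TUNNEL:
        case MODE_BEET:
            /* These modes bring their own addresses, so a switch is fine. */
            break;
        default:
            if (t[i].family != prev)
                return -1;
            break;
        }
        prev = t[i].family;
    }
    return 0;
}
```

Under this rule, a MODE_ROUTEOPTIMIZATION template with an IPv6 family on top of an IPv4 selector, as in the reproducer, is rejected up front.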
    • libceph: avoid KEEPALIVE_PENDING races in ceph_con_keepalive() · f7fb58a7
      Committed by Ilya Dryomov
      commit 4aac9228d16458cedcfd90c7fb37211cf3653ac3 upstream.
      
      con_fault() can transition the connection into STANDBY right after
      ceph_con_keepalive() clears STANDBY in clear_standby():
      
          libceph user thread               ceph-msgr worker
      
      ceph_con_keepalive()
        mutex_lock(&con->mutex)
        clear_standby(con)
        mutex_unlock(&con->mutex)
                                      mutex_lock(&con->mutex)
                                      con_fault()
                                        ...
                                        if KEEPALIVE_PENDING isn't set
                                          set state to STANDBY
                                        ...
                                      mutex_unlock(&con->mutex)
        set KEEPALIVE_PENDING
        set WRITE_PENDING
      
      This triggers warnings in clear_standby() when either ceph_con_send()
      or ceph_con_keepalive() get to clearing STANDBY next time.
      
      I don't see a reason to condition queue_con() call on the previous
      value of KEEPALIVE_PENDING, so move the setting of KEEPALIVE_PENDING
      into the critical section -- unlike WRITE_PENDING, KEEPALIVE_PENDING
      could have been a non-atomic flag.
      
      Reported-by: syzbot+acdeb633f6211ccdf886@syzkaller.appspotmail.com
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Tested-by: Myungho Jung <mhjungk@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xfrm: Make set-mark default behavior backward compatible · 8b8f7b04
      Committed by Benedict Wong
      commit e2612cd496e7b465711d219ea6118893d7253f52 upstream.
      
      Fixes 9b42c1f1, which changed the default route lookup behavior for
      tunnel mode SAs in the outbound direction to use the skb mark, whereas
      previously mark=0 was used if the output mark was unspecified. In
      mark-based routing schemes such as Android’s, this change in default
      behavior causes routing loops or lookup failures.
      
      This patch restores the default behavior of using a 0 mark while still
      incorporating the skb mark if the SET_MARK (and SET_MARK_MASK) is
      specified.
      
      Tested with additions to Android's kernel unit test suite:
      https://android-review.googlesource.com/c/kernel/tests/+/860150
      
      Fixes: 9b42c1f1 ("xfrm: Extend the output_mark to support input direction and masking")
      Signed-off-by: Benedict Wong <benedictwong@google.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • SUNRPC: Always drop the XPRT_LOCK on XPRT_CLOSE_WAIT · 2440f3ce
      Committed by Benjamin Coddington
      This patch is only appropriate for stable kernels v4.16 - v4.19
      
      Since commit 9b30889c ("SUNRPC: Ensure we always close the socket after
      a connection shuts down"), and until commit c544577daddb ("SUNRPC: Clean up
      transport write space handling"), it is possible for the NFS client to spin
      in the following tight loop:
      
      269.964083: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=0 action=call_bind [sunrpc]
      269.964083: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=0 action=call_connect [sunrpc]
      269.964083: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=0 action=call_transmit [sunrpc]
      269.964085: xprt_transmit: peer=[10.0.1.82]:2049 xid=0x761d3f77 status=-32
      269.964085: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=-32 action=call_transmit_status [sunrpc]
      269.964085: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=-32 action=call_status [sunrpc]
      269.964085: rpc_call_status: task:43@0 status=-32
      
      The issue is that the path through call_transmit_status does not release
      the XPRT_LOCK when the transmit result is -EPIPE, so the socket cannot be
      properly shut down.
      
      The below commit fixed things up in mainline by unconditionally calling
      xprt_end_transmit() and releasing the XPRT_LOCK after every pass through
      call_transmit.  However, the entirety of this commit is not appropriate for
      stable kernels because its original inclusion was part of a series that
      modifies the sunrpc code to use a different queueing model.  As a result,
      there are machinations within this patch that are not needed for a stable
      fix and will not make sense without a larger backport of the mainline
      series.
      
      In this patch, we take the slightly modified bit of the mainline patch
      below, which is to release the XPRT_LOCK on transmission error should we
      detect that the transport is waiting to close.
      
      commit c544577daddb618c7dd5fa7fb98d6a41782f020e upstream
      Author: Trond Myklebust <trond.myklebust@hammerspace.com>
      Date:   Mon Sep 3 23:39:27 2018 -0400
      
          SUNRPC: Clean up transport write space handling
      
          Treat socket write space handling in the same way we now treat transport
          congestion: by denying the XPRT_LOCK until the transport signals that it
          has free buffer space.
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      
      The original discussion of the problem is here:
      
          https://lore.kernel.org/linux-nfs/20181212135157.4489-1-dwysocha@redhat.com/T/#t
      
      This passes my usual cthon and xfstests on NFS as applied on v4.19 mainline.
      
      Reported-by: Dave Wysochanski <dwysocha@redhat.com>
      Suggested-by: Trond Myklebust <trondmy@hammerspace.com>
      Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mac80211: ensure that mgmt tx skbs have tailroom for encryption · a4c77aac
      Committed by Felix Fietkau
      commit 9d0f50b80222dc273e67e4e14410fcfa4130a90c upstream.
      
      Some drivers use IEEE80211_KEY_FLAG_SW_MGMT_TX to indicate that management
      frames need to be software encrypted. Since normal data packets are still
      encrypted by the hardware, crypto_tx_tailroom_needed_cnt gets decremented
      after key upload to hw. This can lead to passing skbs to ccmp_encrypt_skb,
      which don't have the necessary tailroom for software encryption.
      
      Change the code to add tailroom for encrypted management packets, even if
      crypto_tx_tailroom_needed_cnt is 0.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Felix Fietkau <nbd@nbd.name>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  4. 13 Feb, 2019 11 commits
    • sctp: walk the list of asoc safely · 7c236130
      Committed by Greg Kroah-Hartman
      [ Upstream commit ba59fb0273076637f0add4311faa990a5eec27c0 ]
      
      In sctp_sendmsg(), when walking the list of endpoint associations, the
      association can be dropped from the list, making the list corrupt.
      Properly handle this by using list_for_each_entry_safe().
      
      Fixes: 49102805 ("sctp: add support for snd flag SCTP_SENDALL process in sendmsg")
      Reported-by: Secunia Research <vuln@secunia.com>
      Tested-by: Secunia Research <vuln@secunia.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
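
The idea behind list_for_each_entry_safe() is to remember the successor before the current entry can be unlinked. A minimal userspace sketch of that pattern on a singly linked list follows. `struct node` and `remove_odd` are hypothetical, and the kernel macro operates on its own doubly linked `list_head` instead.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int id;
    struct node *next;
};

/*
 * Remove every node whose id is odd. The next pointer is saved before the
 * current node is unlinked, so the walk never follows a pointer that
 * belongs to a removed node. Returns the number of nodes removed.
 */
size_t remove_odd(struct node **head)
{
    assert(head != NULL);

    size_t removed = 0;
    struct node **link = head;   /* where the pointer to cur is stored */
    struct node *cur, *tmp;

    for (cur = *head; cur != NULL; cur = tmp) {
        tmp = cur->next;         /* save the successor first */
        if (cur->id % 2 != 0) {
            *link = tmp;         /* unlink cur */
            cur->next = NULL;    /* cur must not be followed any more */
            removed++;
        } else {
            link = &cur->next;
        }
    }
    return removed;
}
```

A plain walk that advanced with `cur = cur->next` after unlinking would read a pointer that is no longer valid for traversal. That is the class of bug this commit addresses.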
    • sctp: check and update stream->out_curr when allocating stream_out · 7cd4e833
      Committed by Xin Long
      [ Upstream commit cfe4bd7a257f6d6f81d3458d8c9d9ec4957539e6 ]
      
      Now when using stream reconfig to add out streams, stream->out
      will get re-allocated, and all old streams' information will
      be copied to the new ones and the old ones will be freed.
      
      So without stream->out_curr updated, next time when trying to
      send from stream->out_curr stream, a panic would be caused.
      
      This patch is to check and update stream->out_curr when
      allocating stream_out.
      
      v1->v2:
        - define fa_index() to get elem index from stream->out_curr.
      v2->v3:
        - repost with no change.
      
      Fixes: 5bbbbe32 ("sctp: introduce stream scheduler foundations")
      Reported-by: Ying Xu <yinxu@redhat.com>
      Reported-by: syzbot+e33a3a138267ca119c7d@syzkaller.appspotmail.com
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • rxrpc: bad unlock balance in rxrpc_recvmsg · 21b8697e
      Committed by Eric Dumazet
      [ Upstream commit 6dce3c20ac429e7a651d728e375853370c796e8d ]
      
      When either the "goto wait_interrupted;" or the "goto wait_error;"
      path is taken, the socket lock has already been released.
      
      This patch fixes the following syzbot splat:
      
      WARNING: bad unlock balance detected!
      5.0.0-rc4+ #59 Not tainted
      -------------------------------------
      syz-executor223/8256 is trying to release lock (sk_lock-AF_RXRPC) at:
      [<ffffffff86651353>] rxrpc_recvmsg+0x6d3/0x3099 net/rxrpc/recvmsg.c:598
      but there are no more locks to release!
      
      other info that might help us debug this:
      1 lock held by syz-executor223/8256:
       #0: 00000000fa9ed0f4 (slock-AF_RXRPC){+...}, at: spin_lock_bh include/linux/spinlock.h:334 [inline]
       #0: 00000000fa9ed0f4 (slock-AF_RXRPC){+...}, at: release_sock+0x20/0x1c0 net/core/sock.c:2798
      
      stack backtrace:
      CPU: 1 PID: 8256 Comm: syz-executor223 Not tainted 5.0.0-rc4+ #59
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       print_unlock_imbalance_bug kernel/locking/lockdep.c:3391 [inline]
       print_unlock_imbalance_bug.cold+0x114/0x123 kernel/locking/lockdep.c:3368
       __lock_release kernel/locking/lockdep.c:3601 [inline]
       lock_release+0x67e/0xa00 kernel/locking/lockdep.c:3860
       sock_release_ownership include/net/sock.h:1471 [inline]
       release_sock+0x183/0x1c0 net/core/sock.c:2808
       rxrpc_recvmsg+0x6d3/0x3099 net/rxrpc/recvmsg.c:598
       sock_recvmsg_nosec net/socket.c:794 [inline]
       sock_recvmsg net/socket.c:801 [inline]
       sock_recvmsg+0xd0/0x110 net/socket.c:797
       __sys_recvfrom+0x1ff/0x350 net/socket.c:1845
       __do_sys_recvfrom net/socket.c:1863 [inline]
       __se_sys_recvfrom net/socket.c:1859 [inline]
       __x64_sys_recvfrom+0xe1/0x1a0 net/socket.c:1859
       do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x446379
      Code: e8 2c b3 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 2b 09 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fe5da89fd98 EFLAGS: 00000246 ORIG_RAX: 000000000000002d
      RAX: ffffffffffffffda RBX: 00000000006dbc28 RCX: 0000000000446379
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
      RBP: 00000000006dbc20 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dbc2c
      R13: 0000000000000000 R14: 0000000000000000 R15: 20c49ba5e353f7cf
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: David Howells <dhowells@redhat.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • rds: fix refcount bug in rds_sock_addref · f2f054c4
      Committed by Eric Dumazet
      [ Upstream commit 6fa19f5637a6c22bc0999596bcc83bdcac8a4fa6 ]
      
      syzbot was able to catch a bug in rds [1]
      
      The issue here is that the socket might be found in a hash table
      but its refcount has already been set to 0 by another cpu.
      
      We need to use refcount_inc_not_zero() to be safe here.
      
      [1]
      
      refcount_t: increment on 0; use-after-free.
      WARNING: CPU: 1 PID: 23129 at lib/refcount.c:153 refcount_inc_checked lib/refcount.c:153 [inline]
      WARNING: CPU: 1 PID: 23129 at lib/refcount.c:153 refcount_inc_checked+0x61/0x70 lib/refcount.c:151
      Kernel panic - not syncing: panic_on_warn set ...
      CPU: 1 PID: 23129 Comm: syz-executor3 Not tainted 5.0.0-rc4+ #53
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
       panic+0x2cb/0x65c kernel/panic.c:214
       __warn.cold+0x20/0x48 kernel/panic.c:571
       report_bug+0x263/0x2b0 lib/bug.c:186
       fixup_bug arch/x86/kernel/traps.c:178 [inline]
       fixup_bug arch/x86/kernel/traps.c:173 [inline]
       do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
       do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:290
       invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
      RIP: 0010:refcount_inc_checked lib/refcount.c:153 [inline]
      RIP: 0010:refcount_inc_checked+0x61/0x70 lib/refcount.c:151
      Code: 1d 51 63 c8 06 31 ff 89 de e8 eb 1b f2 fd 84 db 75 dd e8 a2 1a f2 fd 48 c7 c7 60 9f 81 88 c6 05 31 63 c8 06 01 e8 af 65 bb fd <0f> 0b eb c1 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 54 49
      RSP: 0018:ffff8880a0cbf1e8 EFLAGS: 00010282
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc90006113000
      RDX: 000000000001047d RSI: ffffffff81685776 RDI: 0000000000000005
      RBP: ffff8880a0cbf1f8 R08: ffff888097c9e100 R09: ffffed1015ce5021
      R10: ffffed1015ce5020 R11: ffff8880ae728107 R12: ffff8880723c20c0
      R13: ffff8880723c24b0 R14: dffffc0000000000 R15: ffffed1014197e64
       sock_hold include/net/sock.h:647 [inline]
       rds_sock_addref+0x19/0x20 net/rds/af_rds.c:675
       rds_find_bound+0x97c/0x1080 net/rds/bind.c:82
       rds_recv_incoming+0x3be/0x1430 net/rds/recv.c:362
       rds_loop_xmit+0xf3/0x2a0 net/rds/loop.c:96
       rds_send_xmit+0x1355/0x2a10 net/rds/send.c:355
       rds_sendmsg+0x323c/0x44e0 net/rds/send.c:1368
       sock_sendmsg_nosec net/socket.c:621 [inline]
       sock_sendmsg+0xdd/0x130 net/socket.c:631
       __sys_sendto+0x387/0x5f0 net/socket.c:1788
       __do_sys_sendto net/socket.c:1800 [inline]
       __se_sys_sendto net/socket.c:1796 [inline]
       __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1796
       do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x458089
      Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fc266df8c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 0000000000458089
      RDX: 0000000000000000 RSI: 00000000204b3fff RDI: 0000000000000005
      RBP: 000000000073bf00 R08: 00000000202b4000 R09: 0000000000000010
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007fc266df96d4
      R13: 00000000004c56e4 R14: 00000000004d94a8 R15: 00000000ffffffff
      
      Fixes: cc4dfb7f ("rds: fix two RCU related problems")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Cc: rds-devel@oss.oracle.com
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
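
The "increment only if not already zero" semantics that refcount_inc_not_zero() provides can be sketched in userspace with C11 atomics. This is an illustration of the idea only. `ref_inc_not_zero` is a hypothetical name, and the kernel's refcount_t has additional behavior (such as saturation) that is not modeled here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Take a reference only if the object is still alive (count > 0).
 * Returns true if the count was incremented, false if it was already 0,
 * in which case the caller must treat the object as gone.
 */
bool ref_inc_not_zero(atomic_uint *count)
{
    assert(count != NULL);

    unsigned int old = atomic_load(count);
    while (old != 0) {
        /* On failure, old is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak(count, &old, old + 1))
            return true;
    }
    return false;
}
```

A lookup path that finds an object in a hash table would call this and skip the object when it returns false, instead of unconditionally incrementing a count that another CPU has already dropped to 0.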
    • net: dsa: slave: Don't propagate flag changes on down slave interfaces · 8104a5e7
      Committed by Rundong Ge
      [ Upstream commit 17ab4f61b8cd6f9c38e9d0b935d86d73b5d0d2b5 ]
      
      The master's promiscuity or allmulti count becomes unbalanced after
      ifdown and ifup of a slave interface which is in a bridge.
      
      When we ifdown a slave interface, both 'dsa_slave_close' and
      'dsa_slave_change_rx_flags' clear the master's flags, so the master's
      flags are decremented twice.
      On the other hand, if we ifup the slave interface again, since the
      slave's flags were cleared, 'dsa_slave_open' won't set the master's
      flags; only the 'dsa_slave_change_rx_flags' call triggered by
      'br_add_if' will set them. The master's flags are incremented only once.
      
      Only propagating flag changes when a slave interface is up makes
      sure this does not happen. 'vlan_dev_change_rx_flags' had the
      same problem and was fixed, and the changes here follow that fix.
      
      Fixes: 91da11f8 ("net: Distributed Switch Architecture protocol support")
      Signed-off-by: Rundong Ge <rdong.ge@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
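
The balancing rule ("only propagate flag changes while the slave is up") can be modeled with a toy counter. This is a simplified sketch with hypothetical types and function names. It does not reproduce the kernel's call ordering, only the invariant that each increment on the master is matched by exactly one decrement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct master { int promiscuity; };

struct slave {
    struct master *master;
    bool up;
    bool promisc;
};

/* Flag change on the slave: propagate to the master only while it is up. */
void slave_set_promisc(struct slave *s, bool on)
{
    assert(s != NULL && s->master != NULL);
    if (s->promisc == on)
        return;
    s->promisc = on;
    if (s->up)
        s->master->promiscuity += on ? 1 : -1;
}

/* Bringing the slave up applies whatever flags it currently carries. */
void slave_open(struct slave *s)
{
    assert(s != NULL && s->master != NULL);
    s->up = true;
    if (s->promisc)
        s->master->promiscuity++;
}

/* Taking the slave down withdraws its contribution exactly once. */
void slave_close(struct slave *s)
{
    assert(s != NULL && s->master != NULL);
    if (s->promisc)
        s->master->promiscuity--;
    s->up = false;
}
```

Without the `s->up` check in `slave_set_promisc`, a flag change arriving after `slave_close` would decrement the master a second time. That is the imbalance the commit message describes.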
    • net: dsa: Fix NULL checking in dsa_slave_set_eee() · c8dfab5c
      Committed by Dan Carpenter
      [ Upstream commit 00670cb8a73b10b10d3c40f045c15411715e4465 ]
      
      This function can't succeed if dp->pl is NULL.  It will Oops inside the
      call to return phylink_ethtool_get_eee(dp->pl, e);
      
      Fixes: 1be52e97 ("dsa: slave: eee: Allow ports to use phylink")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: dsa: Fix lockdep false positive splat · 98cedccb
      Committed by Marc Zyngier
      [ Upstream commit c8101f7729daee251f4f6505f9d135ec08e1342f ]
      
      Creating a macvtap on a DSA-backed interface results in the following
      splat when lockdep is enabled:
      
      [   19.638080] IPv6: ADDRCONF(NETDEV_CHANGE): lan0: link becomes ready
      [   23.041198] device lan0 entered promiscuous mode
      [   23.043445] device eth0 entered promiscuous mode
      [   23.049255]
      [   23.049557] ============================================
      [   23.055021] WARNING: possible recursive locking detected
      [   23.060490] 5.0.0-rc3-00013-g56c857a1b8d3 #118 Not tainted
      [   23.066132] --------------------------------------------
      [   23.071598] ip/2861 is trying to acquire lock:
      [   23.076171] 00000000f61990cb (_xmit_ETHER){+...}, at: dev_set_rx_mode+0x1c/0x38
      [   23.083693]
      [   23.083693] but task is already holding lock:
      [   23.089696] 00000000ecf0c3b4 (_xmit_ETHER){+...}, at: dev_uc_add+0x24/0x70
      [   23.096774]
      [   23.096774] other info that might help us debug this:
      [   23.103494]  Possible unsafe locking scenario:
      [   23.103494]
      [   23.109584]        CPU0
      [   23.112093]        ----
      [   23.114601]   lock(_xmit_ETHER);
      [   23.117917]   lock(_xmit_ETHER);
      [   23.121233]
      [   23.121233]  *** DEADLOCK ***
      [   23.121233]
      [   23.127325]  May be due to missing lock nesting notation
      [   23.127325]
      [   23.134315] 2 locks held by ip/2861:
      [   23.137987]  #0: 000000003b766c72 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x338/0x4e0
      [   23.146231]  #1: 00000000ecf0c3b4 (_xmit_ETHER){+...}, at: dev_uc_add+0x24/0x70
      [   23.153757]
      [   23.153757] stack backtrace:
      [   23.158243] CPU: 0 PID: 2861 Comm: ip Not tainted 5.0.0-rc3-00013-g56c857a1b8d3 #118
      [   23.166212] Hardware name: Globalscale Marvell ESPRESSOBin Board (DT)
      [   23.172843] Call trace:
      [   23.175358]  dump_backtrace+0x0/0x188
      [   23.179116]  show_stack+0x14/0x20
      [   23.182524]  dump_stack+0xb4/0xec
      [   23.185928]  __lock_acquire+0x123c/0x1860
      [   23.190048]  lock_acquire+0xc8/0x248
      [   23.193724]  _raw_spin_lock_bh+0x40/0x58
      [   23.197755]  dev_set_rx_mode+0x1c/0x38
      [   23.201607]  dev_set_promiscuity+0x3c/0x50
      [   23.205820]  dsa_slave_change_rx_flags+0x5c/0x70
      [   23.210567]  __dev_set_promiscuity+0x148/0x1e0
      [   23.215136]  __dev_set_rx_mode+0x74/0x98
      [   23.219167]  dev_uc_add+0x54/0x70
      [   23.222575]  macvlan_open+0x170/0x1d0
      [   23.226336]  __dev_open+0xe0/0x160
      [   23.229830]  __dev_change_flags+0x16c/0x1b8
      [   23.234132]  dev_change_flags+0x20/0x60
      [   23.238074]  do_setlink+0x2d0/0xc50
      [   23.241658]  __rtnl_newlink+0x5f8/0x6e8
      [   23.245601]  rtnl_newlink+0x50/0x78
      [   23.249184]  rtnetlink_rcv_msg+0x360/0x4e0
      [   23.253397]  netlink_rcv_skb+0xe8/0x130
      [   23.257338]  rtnetlink_rcv+0x14/0x20
      [   23.261012]  netlink_unicast+0x190/0x210
      [   23.265043]  netlink_sendmsg+0x288/0x350
      [   23.269075]  sock_sendmsg+0x18/0x30
      [   23.272659]  ___sys_sendmsg+0x29c/0x2c8
      [   23.276602]  __sys_sendmsg+0x60/0xb8
      [   23.280276]  __arm64_sys_sendmsg+0x1c/0x28
      [   23.284488]  el0_svc_common+0xd8/0x138
      [   23.288340]  el0_svc_handler+0x24/0x80
      [   23.292192]  el0_svc+0x8/0xc
      
      This looks fairly harmless (no actual deadlock occurs), and is
      fixed in a similar way to c6894dec ("bridge: fix lockdep
      addr_list_lock false positive splat") by putting the addr_list_lock
      in its own lockdep class.
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      98cedccb
    • E
      dccp: fool proof ccid_hc_[rt]x_parse_options() · 15ed55e3
      Eric Dumazet authored
      [ Upstream commit 9b1f19d810e92d6cdc68455fbc22d9f961a58ce1 ]
      
      Similarly to commit 276bdb82 ("dccp: check ccid before dereferencing")
      it is wise to test for a NULL ccid.
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.0.0-rc3+ #37
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:ccid_hc_tx_parse_options net/dccp/ccid.h:205 [inline]
      RIP: 0010:dccp_parse_options+0x8d9/0x12b0 net/dccp/options.c:233
      Code: c5 0f b6 75 b3 80 38 00 0f 85 d6 08 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 45 b8 4c 8b b8 f8 07 00 00 4c 89 f8 48 c1 e8 03 <80> 3c 08 00 0f 85 95 08 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b
      kobject: 'loop5' (0000000080f78fc1): kobject_uevent_env
      RSP: 0018:ffff8880a94df0b8 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff8880858ac723 RCX: dffffc0000000000
      RDX: 0000000000000100 RSI: 0000000000000007 RDI: 0000000000000001
      RBP: ffff8880a94df140 R08: 0000000000000001 R09: ffff888061b83a80
      R10: ffffed100c370752 R11: ffff888061b83a97 R12: 0000000000000026
      R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f0defa33518 CR3: 000000008db5e000 CR4: 00000000001406e0
      kobject: 'loop5' (0000000080f78fc1): fill_kobj_path: path = '/devices/virtual/block/loop5'
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       dccp_rcv_state_process+0x2b6/0x1af6 net/dccp/input.c:654
       dccp_v4_do_rcv+0x100/0x190 net/dccp/ipv4.c:688
       sk_backlog_rcv include/net/sock.h:936 [inline]
       __sk_receive_skb+0x3a9/0xea0 net/core/sock.c:473
       dccp_v4_rcv+0x10cb/0x1f80 net/dccp/ipv4.c:880
       ip_protocol_deliver_rcu+0xb6/0xa20 net/ipv4/ip_input.c:208
       ip_local_deliver_finish+0x23b/0x390 net/ipv4/ip_input.c:234
       NF_HOOK include/linux/netfilter.h:289 [inline]
       NF_HOOK include/linux/netfilter.h:283 [inline]
       ip_local_deliver+0x1f0/0x740 net/ipv4/ip_input.c:255
       dst_input include/net/dst.h:450 [inline]
       ip_rcv_finish+0x1f4/0x2f0 net/ipv4/ip_input.c:414
       NF_HOOK include/linux/netfilter.h:289 [inline]
       NF_HOOK include/linux/netfilter.h:283 [inline]
       ip_rcv+0xed/0x620 net/ipv4/ip_input.c:524
       __netif_receive_skb_one_core+0x160/0x210 net/core/dev.c:4973
       __netif_receive_skb+0x2c/0x1c0 net/core/dev.c:5083
       process_backlog+0x206/0x750 net/core/dev.c:5923
       napi_poll net/core/dev.c:6346 [inline]
       net_rx_action+0x76d/0x1930 net/core/dev.c:6412
       __do_softirq+0x30b/0xb11 kernel/softirq.c:292
       run_ksoftirqd kernel/softirq.c:654 [inline]
       run_ksoftirqd+0x8e/0x110 kernel/softirq.c:646
       smpboot_thread_fn+0x6ab/0xa10 kernel/smpboot.c:164
       kthread+0x357/0x430 kernel/kthread.c:246
       ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
      Modules linked in:
      ---[ end trace 58a0ba03bea2c376 ]---
      RIP: 0010:ccid_hc_tx_parse_options net/dccp/ccid.h:205 [inline]
      RIP: 0010:dccp_parse_options+0x8d9/0x12b0 net/dccp/options.c:233
      Code: c5 0f b6 75 b3 80 38 00 0f 85 d6 08 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 45 b8 4c 8b b8 f8 07 00 00 4c 89 f8 48 c1 e8 03 <80> 3c 08 00 0f 85 95 08 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b
      RSP: 0018:ffff8880a94df0b8 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff8880858ac723 RCX: dffffc0000000000
      RDX: 0000000000000100 RSI: 0000000000000007 RDI: 0000000000000001
      RBP: ffff8880a94df140 R08: 0000000000000001 R09: ffff888061b83a80
      R10: ffffed100c370752 R11: ffff888061b83a97 R12: 0000000000000026
      R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f0defa33518 CR3: 0000000009871000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      15ed55e3
    • Y
      xfrm6_tunnel: Fix spi check in __xfrm6_tunnel_alloc_spi · 1552557b
      YueHaibing authored
      [ Upstream commit fa89a4593b927b3f59c3b69379f31d3b22272e4e ]
      
      gcc warns about this:
      
      net/ipv6/xfrm6_tunnel.c:143 __xfrm6_tunnel_alloc_spi() warn:
       always true condition '(spi <= 4294967295) => (0-u32max <= u32max)'
      
      'spi' is a u32, so because of wraparound it can never be greater than
      XFRM6_TUNNEL_SPI_MAX. As a result, the second for loop is never reached.
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1552557b
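The wraparound described in the xfrm6_tunnel commit above can be illustrated with a small standalone sketch. It is not the kernel code: `SPI_MAX`, `iterations_until_exit()` and `iterations_bounded()` are hypothetical names, and the iteration cap exists only so the demonstration terminates.

```c
#include <stdint.h>

/* Assumption for this sketch: the upper bound equals the largest u32. */
#define SPI_MAX 0xffffffffu

/* Mirrors the pattern "for (spi = start; spi <= SPI_MAX; spi++)".
 * The condition is always true for a 32-bit unsigned value, so the
 * loop only stops because of the artificial cap. */
static unsigned long iterations_until_exit(uint32_t start, unsigned long cap)
{
    unsigned long n = 0;

    for (uint32_t spi = start; spi <= SPI_MAX; spi++) {
        if (++n == cap)
            break;  /* the loop condition alone never ends the loop */
    }
    return n;
}

/* One possible bounded variant: test for the last value explicitly,
 * before the increment can wrap around to 0. */
static unsigned long iterations_bounded(uint32_t start)
{
    unsigned long n = 0;

    for (uint32_t spi = start; ; spi++) {
        n++;
        if (spi == SPI_MAX)
            break;
    }
    return n;
}
```

Starting two values below the maximum, the first loop keeps running past the wrap until the cap is hit, while the bounded variant stops after visiting the last valid value.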
    • J
      mac80211: fix radiotap vendor presence bitmap handling · e5af9ce3
      Johannes Berg authored
      [ Upstream commit efc38dd7d5fa5c8cdd0c917c5d00947aa0539443 ]
      
      Due to the alignment handling, it actually matters where in the code
      we add the 4 bytes for the presence bitmap to the length; the first
      field is the timestamp with 8 byte alignment so we need to add the
      space for the extra vendor namespace presence bitmap *before* we do
      any alignment for the fields.
      
      Move the presence bitmap length accounting to the right place to fix
      the alignment for the data properly.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      e5af9ce3
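Why the order of "add 4 bytes" and "align to 8" matters in the mac80211 commit above can be shown with plain arithmetic. This is only a sketch under assumptions: the helper names are hypothetical, and the 8-byte starting length used in the usage note is the size of the fixed radiotap header, not a value taken from the patch.

```c
#include <stddef.h>

/* Round x up to the next multiple of a (a must be a power of two). */
static size_t align_up(size_t x, size_t a)
{
    return (x + a - 1) & ~(a - 1);
}

/* Offset of an 8-byte-aligned timestamp field when the extra 4-byte
 * presence bitmap word is counted BEFORE aligning. */
static size_t tsft_offset_bitmap_first(size_t hdr_len)
{
    size_t len = hdr_len + 4;   /* extra vendor namespace presence word */

    return align_up(len, 8);
}

/* Same computation with the 4 bytes added only AFTER aligning: the
 * field is pushed to an offset that is no longer a multiple of 8. */
static size_t tsft_offset_bitmap_late(size_t hdr_len)
{
    size_t len = align_up(hdr_len, 8);

    return len + 4;
}
```

With a starting length of 8, counting the bitmap first places the timestamp at offset 16, while adding it late places it at offset 12, which breaks the 8-byte alignment.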
    • H
      tipc: fix node keep alive interval calculation · 1de47c06
      Hoang Le authored
      [ Upstream commit f5d6c3e5a359c0507800e7ac68d565c21de9b5a1 ]
      
      When setting a link's tolerance, the node timer interval is calculated
      based on the link with the lowest tolerance.
      
      However, during that calculation the old node timer interval was only
      updated if the new value (tolerance/4) was less than the old one,
      regardless of the number of links or the links' lowest tolerance value.
      
      As a result, two cases were missed when the tolerance changed:
      Case 1:
      1.1/ There is one link (L1) available in the system
      1.2/ Set L1's tolerance from 1500ms => lower (e.g. 500ms)
      1.3/ Then, fall back to the default (1500ms) or higher (e.g. 2000ms)
      
      Expected:
          node timer interval is 1500/4=375ms after 1.3
      
      Result:
      node timer interval is not updated after changing the tolerance at 1.3,
      since its value 1500/4=375ms is not less than 500/4=125ms at 1.2.
      
      Case 2:
      2.1/ There are two links (L1, L2) available in the system
      2.2/ L1 and L2 tolerance values are initially 2000ms
      2.3/ Set L2's tolerance from 2000ms => lower 1500ms
      2.4/ Disable link L2 (bring down its bearer)
      
      Expected:
          node timer interval is 2000ms/4=500ms after 2.4
      
      Result:
      node timer interval is not updated after disabling L2, since
      its value 2000ms/4=500ms is still not less than 1500/4=375ms at 2.3,
      although L2 is no longer available in the system.
      
      To fix this, we start the node interval calculation by initializing it to
      a value larger than any conceivable calculated value. This way, the link
      with the lowest tolerance will always determine the calculated value.
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1de47c06
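The two cases in the tipc commit above can be replayed with a simplified model. This is a sketch, not the TIPC implementation: `struct link`, `sticky_min()` and `keepalive_intv()` are hypothetical, and `UINT_MAX` stands in for "a value larger than any conceivable calculated value".

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical, simplified link record for this sketch. */
struct link {
    int up;                     /* non-zero if the link is available */
    unsigned int tolerance_ms;  /* configured link tolerance */
};

/* Behaviour described as buggy: the stored interval can only shrink. */
static unsigned int sticky_min(unsigned int old_intv, unsigned int tolerance_ms)
{
    unsigned int v = tolerance_ms / 4;

    return v < old_intv ? v : old_intv;
}

/* Recompute from scratch: start above any realistic value and take the
 * minimum of tolerance/4 over the links that are currently available. */
static unsigned int keepalive_intv(const struct link *links, size_t n)
{
    unsigned int intv = UINT_MAX;

    for (size_t i = 0; i < n; i++) {
        unsigned int v;

        if (!links[i].up)
            continue;
        v = links[i].tolerance_ms / 4;
        if (v < intv)
            intv = v;
    }
    return intv;
}
```

In Case 1 the shrinking-only update stays at 125ms after the tolerance is raised back to 1500ms, whereas the recomputation yields 375ms. In Case 2 the recomputation yields 500ms once L2 is marked as unavailable.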
  5. 07 Feb, 2019 16 commits
    • X
      sctp: set flow sport from saddr only when it's 0 · 37b34a91
      Xin Long authored
      [ Upstream commit ecf938fe7d0088077ee1280419a2b3c5429b47c8 ]
      
      Now sctp_transport_pmtu() passes transport->saddr into .get_dst() to set
      flow sport from 'saddr'. However, transport->saddr is set only when
      transport->dst exists in sctp_transport_route().
      
      If sctp_transport_pmtu() is called without transport->saddr set, like
      when transport->dst doesn't exist, the flow sport will be set to 0
      from transport->saddr, which will cause the wrong route to be returned.
      
      Commit 6e91b578 ("sctp: re-use sctp_transport_pmtu in
      sctp_transport_route") made the issue easier to trigger,
      since sctp_transport_pmtu() is called in sctp_transport_route()
      after that commit.
      
      In general, fl4->fl4_sport should always be set to
      htons(asoc->base.bind_addr.port), unless transport->asoc doesn't exist
      in sctp_v4/6_get_dst(), which is the case:
      
        sctp_ootb_pkt_new() ->
          sctp_transport_route()
      
      For that, we can simply handle it by setting flow sport from saddr only
      when it's 0 in sctp_v4/6_get_dst().
      
      Fixes: 6e91b578 ("sctp: re-use sctp_transport_pmtu in sctp_transport_route")
      Reported-by: Ying Xu <yinxu@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      37b34a91
    • X
      sctp: set chunk transport correctly when it's a new asoc · cbf23d40
      Xin Long authored
      [ Upstream commit 4ff40b86262b73553ee47cc3784ce8ba0f220bd8 ]
      
      In the paths:
      
        sctp_sf_do_unexpected_init() ->
          sctp_make_init_ack()
        sctp_sf_do_dupcook_a/b() ->
          sctp_sf_do_5_1D_ce()
      
      The new chunk 'retval' transport is set from the incoming chunk 'chunk'
      transport. However, the 'retval' transport belongs to the new asoc, which
      is a different one from the 'chunk' transport's asoc.
      
      As a result, the 'retval' chunk gets set with the wrong transport.
      Later, when sending it, because of Commit b9fd6839 ("sctp: add
      sctp_packet_singleton"), sctp_packet_singleton() will set some fields,
      like vtag, on the 'retval' chunk from that wrong transport's asoc.
      
      This patch is to fix it by setting 'retval' transport correctly which
      belongs to the right asoc in sctp_make_init_ack() and
      sctp_sf_do_5_1D_ce().
      
      Fixes: b9fd6839 ("sctp: add sctp_packet_singleton")
      Reported-by: Ying Xu <yinxu@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cbf23d40
    • N
      ip6mr: Fix notifiers call on mroute_clean_tables() · 505e5f3d
      Nir Dotan authored
      [ Upstream commit 146820cc240f4389cf33481c058d9493aef95e25 ]
      
      When the MC route socket is closed, mroute_clean_tables() is called to
      clean up existing routes. The notifiers call was mistakenly placed on the
      cleanup of the unresolved MC route entries cache.
      In a case where the MC socket closes before an unresolved route expires,
      the notifier call leads to a crash, caused by the driver trying to
      increment an uninitialized refcount_t object [1] and then, when handling
      is done, to decrement it [2]. This was detected by a test recently added in
      commit 6d4efada3b82 ("selftests: forwarding: Add multicast routing test").
      
      Fix that by putting the notifiers call on the resolved entries traversal,
      instead of on the unresolved entries traversal.
      
      [1]
      
      [  245.748967] refcount_t: increment on 0; use-after-free.
      [  245.754829] WARNING: CPU: 3 PID: 3223 at lib/refcount.c:153 refcount_inc_checked+0x2b/0x30
      ...
      [  245.802357] Hardware name: Mellanox Technologies Ltd. MSN2740/SA001237, BIOS 5.6.5 06/07/2016
      [  245.811873] RIP: 0010:refcount_inc_checked+0x2b/0x30
      ...
      [  245.907487] Call Trace:
      [  245.910231]  mlxsw_sp_router_fib_event.cold.181+0x42/0x47 [mlxsw_spectrum]
      [  245.917913]  notifier_call_chain+0x45/0x7
      [  245.922484]  atomic_notifier_call_chain+0x15/0x20
      [  245.927729]  call_fib_notifiers+0x15/0x30
      [  245.932205]  mroute_clean_tables+0x372/0x3f
      [  245.936971]  ip6mr_sk_done+0xb1/0xc0
      [  245.940960]  ip6_mroute_setsockopt+0x1da/0x5f0
      ...
      
      [2]
      
      [  246.128487] refcount_t: underflow; use-after-free.
      [  246.133859] WARNING: CPU: 0 PID: 7 at lib/refcount.c:187 refcount_sub_and_test_checked+0x4c/0x60
      [  246.183521] Hardware name: Mellanox Technologies Ltd. MSN2740/SA001237, BIOS 5.6.5 06/07/2016
      ...
      [  246.193062] Workqueue: mlxsw_core_ordered mlxsw_sp_router_fibmr_event_work [mlxsw_spectrum]
      [  246.202394] RIP: 0010:refcount_sub_and_test_checked+0x4c/0x60
      ...
      [  246.298889] Call Trace:
      [  246.301617]  refcount_dec_and_test_checked+0x11/0x20
      [  246.307170]  mlxsw_sp_router_fibmr_event_work.cold.196+0x47/0x78 [mlxsw_spectrum]
      [  246.315531]  process_one_work+0x1fa/0x3f0
      [  246.320005]  worker_thread+0x2f/0x3e0
      [  246.324083]  kthread+0x118/0x130
      [  246.327683]  ? wq_update_unbound_numa+0x1b0/0x1b0
      [  246.332926]  ? kthread_park+0x80/0x80
      [  246.337013]  ret_from_fork+0x1f/0x30
      
      Fixes: 088aa3ee ("ip6mr: Support fib notifications")
      Signed-off-by: Nir Dotan <nird@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      505e5f3d
    • X
      sctp: improve the events for sctp stream adding · 4ec13999
      Xin Long authored
      [ Upstream commit 8220c870cb0f4eaa4e335c9645dbd9a1c461c1dd ]
      
      This patch is to improve sctp stream adding events in 2 places:
      
        1. In sctp_process_strreset_addstrm_out(), move up SCTP_MAX_STREAM
           and in stream allocation failure checks, as the adding has to
           succeed after reconf_timer stops for the in stream adding
           request retransmission.
      
        2. In sctp_process_strreset_addstrm_in(), no event should be sent,
           as no in or out stream is added here.
      
      Fixes: 50a41591 ("sctp: implement receiver-side procedures for the Add Outgoing Streams Request Parameter")
      Fixes: c5c4ebb3 ("sctp: implement receiver-side procedures for the Add Incoming Streams Request Parameter")
      Reported-by: Ying Xu <yinxu@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4ec13999
    • L
      net: ip6_gre: always reports o_key to userspace · 9f7d849b
      Lorenzo Bianconi authored
      [ Upstream commit c706863bc8902d0c2d1a5a27ac8e1ead5d06b79d ]
      
      Like Erspan_v4, the Erspan_v6 protocol relies on o_key to configure
      the session id header field. However, the TUNNEL_KEY bit is cleared in
      ip6erspan_tunnel_xmit since the ERSPAN protocol does not set the key field
      of the external GRE header, and so the configured o_key is not reported
      to userspace. The issue can be triggered with the following reproducer:
      
      $ip link add ip6erspan1 type ip6erspan local 2000::1 remote 2000::2 \
          key 1 seq erspan_ver 1
      $ip link set ip6erspan1 up
      $ip -d link sh ip6erspan1
      
      ip6erspan1@NONE: <BROADCAST,MULTICAST> mtu 1422 qdisc noop state DOWN mode DEFAULT
          link/ether ba:ff:09:24:c3:0e brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500
          ip6erspan remote 2000::2 local 2000::1 encaplimit 4 flowlabel 0x00000 ikey 0.0.0.1 iseq oseq
      
      Fix the issue by adding the TUNNEL_KEY bit to the o_flags parameter in
      ip6gre_fill_info.
      
      Fixes: 5a963eb6 ("ip6_gre: Add ERSPAN native tunnel support")
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9f7d849b
    • X
      sctp: improve the events for sctp stream reset · e569927a
      Xin Long authored
      [ Upstream commit 2e6dc4d95110becfe0ff4c3d4749c33ea166e9e7 ]
      
      This patch is to improve sctp stream reset events in 4 places:
      
        1. In sctp_process_strreset_outreq(), the flag should always be set with
           SCTP_STREAM_RESET_INCOMING_SSN instead of OUTGOING, as receiver's in
           stream is reset here.
        2. In sctp_process_strreset_outreq(), move up SCTP_STRRESET_ERR_WRONG_SSN
           check, as the reset has to succeed after reconf_timer stops for the
           in stream reset request retransmission.
        3. In sctp_process_strreset_inreq(), no event should be sent, as no in
           or out stream is reset here.
        4. In sctp_process_strreset_resp(), SCTP_STREAM_RESET_INCOMING_SSN or
           OUTGOING event should always be sent for stream reset requests,
           whether processing the request fails or succeeds.
      
      Fixes: 81054476 ("sctp: implement receiver-side procedures for the Outgoing SSN Reset Request Parameter")
      Fixes: 16e1a919 ("sctp: implement receiver-side procedures for the Incoming SSN Reset Request Parameter")
      Fixes: 11ae76e6 ("sctp: implement receiver-side procedures for the Reconf Response Parameter")
      Reported-by: Ying Xu <yinxu@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e569927a
    • J
      net: set default network namespace in init_dummy_netdev() · 5f1a18e0
      Josh Elsasser authored
      [ Upstream commit 35edfdc77f683c8fd27d7732af06cf6489af60a5 ]
      
      Assign a default net namespace to netdevs created by init_dummy_netdev().
      Fixes a NULL pointer dereference caused by busy-polling a socket bound to
      an iwlwifi wireless device, which bumps the per-net BUSYPOLLRXPACKETS stat
      if napi_poll() received packets:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000190
        IP: napi_busy_loop+0xd6/0x200
        Call Trace:
          sock_poll+0x5e/0x80
          do_sys_poll+0x324/0x5a0
          SyS_poll+0x6c/0xf0
          do_syscall_64+0x6b/0x1f0
          entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      Fixes: 7db6b048 ("net: Commonize busy polling code to focus on napi_id instead of socket")
      Signed-off-by: Josh Elsasser <jelsasser@appneta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f1a18e0
    • B
      net/rose: fix NULL ax25_cb kernel panic · fc4154c7
      Bernard Pidoux authored
      [ Upstream commit b0cf029234f9b18e10703ba5147f0389c382bccc ]
      
      When an internally generated frame is handled by rose_xmit(),
      rose_route_frame() is called:
      
              if (!rose_route_frame(skb, NULL)) {
                      dev_kfree_skb(skb);
                      stats->tx_errors++;
                      return NETDEV_TX_OK;
              }
      
      We have the same code sequence in Net/Rom where an internally generated
      frame is handled by nr_xmit() calling nr_route_frame(skb, NULL).
      However, nr_route_frame() tests for the NULL argument, while
      rose_route_frame() does not.
      A kernel panic then occurs later on when ax25cmp() is called with a NULL
      ax25_cb argument, as reported many times and recently by syzbot.
      
      We need to test if ax25 is NULL before using it.
      
      Testing:
      Built kernel with CONFIG_ROSE=y.
      Signed-off-by: Bernard Pidoux <f6bvp@free.fr>
      Acked-by: Dmitry Vyukov <dvyukov@google.com>
      Reported-by: syzbot+1a2c456a1ea08fa5b5f7@syzkaller.appspotmail.com
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Bernard Pidoux <f6bvp@free.fr>
      Cc: linux-hams@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      fc4154c7
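The guard described in the net/rose commit above, testing the possibly-NULL argument before using it, can be sketched with stand-in types. Everything here is hypothetical: `struct addr`, `struct conn`, `addr_cmp()` and `frame_from_neighbour()` are simplified placeholders, not the kernel's ax25_cb, ax25cmp() or rose_route_frame().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct addr { unsigned char call[7]; };
struct conn { struct addr dest; };

/* Returns 0 when both addresses are equal, like a memcmp-style compare. */
static int addr_cmp(const struct addr *a, const struct addr *b)
{
    return memcmp(a->call, b->call, sizeof(a->call));
}

/* Returns 1 if the frame arrived from `neigh`, 0 otherwise.
 * `from` is NULL for internally generated frames, so it has to be
 * checked before it is dereferenced. */
static int frame_from_neighbour(const struct conn *from, const struct addr *neigh)
{
    if (from == NULL)
        return 0;
    return addr_cmp(&from->dest, neigh) == 0;
}
```

An internally generated frame (NULL connection) is then simply treated as not coming from any neighbour instead of triggering a NULL dereference in the compare helper.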
    • C
      netrom: switch to sock timer API · 2c6b5724
      Cong Wang authored
      [ Upstream commit 63346650c1a94a92be61a57416ac88c0a47c4327 ]
      
      sk_reset_timer() and sk_stop_timer() properly handle
      sock refcnt for timer function. Switching to them
      could fix a refcounting bug reported by syzbot.
      
      Reported-and-tested-by: syzbot+defa700d16f1bd1b9a05@syzkaller.appspotmail.com
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-hams@vger.kernel.org
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2c6b5724
    • L
      net: ip_gre: use erspan key field for tunnel lookup · 0a198e0b
      Lorenzo Bianconi authored
      [ Upstream commit cb73ee40b1b381eaf3749e6dbeed567bb38e5258 ]
      
      Use the ERSPAN key header field as the tunnel key in the gre_parse_header
      routine, since the ERSPAN protocol sets the key field of the external GRE
      header to 0, resulting in a tunnel lookup failure in ip6gre_err.
      In addition, remove the key field parsing and the pskb_may_pull check in
      erspan_rcv and ip6erspan_rcv.
      
      Fixes: 5a963eb6 ("ip6_gre: Add ERSPAN native tunnel support")
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0a198e0b
    • L
      net: ip_gre: always reports o_key to userspace · 897ea28b
      Lorenzo Bianconi authored
      [ Upstream commit feaf5c796b3f0240f10d0d6d0b686715fd58a05b ]
      
      Erspan protocol (version 1 and 2) relies on o_key to configure
      session id header field. However TUNNEL_KEY bit is cleared in
      erspan_xmit since ERSPAN protocol does not set the key field
      of the external GRE header and so the configured o_key is not reported
      to userspace. The issue can be triggered with the following reproducer:
      
      $ip link add erspan1 type erspan local 192.168.0.1 remote 192.168.0.2 \
          key 1 seq erspan_ver 1
      $ip link set erspan1 up
      $ip -d link sh erspan1
      
      erspan1@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UNKNOWN mode DEFAULT
        link/ether 52:aa:99:95:9a:b5 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500
        erspan remote 192.168.0.2 local 192.168.0.1 ttl inherit ikey 0.0.0.1 iseq oseq erspan_index 0
      
      Fix the issue by adding the TUNNEL_KEY bit to the o_flags parameter in
      ipgre_fill_info.
      
      Fixes: 84e54fe0 ("gre: introduce native tunnel support for ERSPAN")
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      897ea28b
    • J
      l2tp: fix reading optional fields of L2TPv3 · 8de67666
      Jacob Wen authored
      [ Upstream commit 4522a70db7aa5e77526a4079628578599821b193 ]
      
      Use pskb_may_pull() to make sure the optional fields are in skb linear
      parts, so we can safely read them later.
      
      It's easy to reproduce the issue with a net driver that supports paged
      skb data. Just create an L2TPv3 over IP tunnel and then generate some
      network traffic.
      Once reproduced, rx err in /sys/kernel/debug/l2tp/tunnels will increase.
      
      Changes in v4:
      1. s/l2tp_v3_pull_opt/l2tp_v3_ensure_opt_in_linear/
      2. s/tunnel->version != L2TP_HDR_VER_2/tunnel->version == L2TP_HDR_VER_3/
      3. Add 'Fixes' in commit messages.
      
      Changes in v3:
      1. To keep consistency, move the code out of l2tp_recv_common.
      2. Use "net" instead of "net-next", since this is a bug fix.
      
      Changes in v2:
      1. Only fix L2TPv3 to make code simple.
         To fix both L2TPv3 and L2TPv2, we'd better refactor l2tp_recv_common.
         It's complicated to do so.
      2. Reloading pointers after pskb_may_pull
      
      Fixes: f7faffa3 ("l2tp: Add L2TPv3 protocol support")
      Fixes: 0d76751f ("l2tp: Add L2TPv3 IP encapsulation (no UDP) support")
      Fixes: a32e0eec ("l2tp: introduce L2TPv3 IP encapsulation support for IPv6")
      Signed-off-by: Jacob Wen <jian.w.wen@oracle.com>
      Acked-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8de67666
    • J
      l2tp: copy 4 more bytes to linear part if necessary · 3d418a25
      Jacob Wen authored
      [ Upstream commit 91c524708de6207f59dd3512518d8a1c7b434ee3 ]
      
      The size of L2TPv2 header with all optional fields is 14 bytes.
      l2tp_udp_recv_core only moves 10 bytes to the linear part of a
      skb. This may lead to l2tp_recv_common reading data outside of a skb.
      
      This patch makes sure that there are at least 14 bytes in the linear
      part of a skb to meet the maximum need of l2tp_udp_recv_core and
      l2tp_recv_common. The minimum size of both PPP HDLC-like frame and
      Ethernet frame is larger than 14 bytes, so we are safe to do so.
      
      Also remove L2TP_HDR_SIZE_NOSEQ, it is unused now.
      
      Fixes: fd558d18 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
      Suggested-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Jacob Wen <jian.w.wen@oracle.com>
      Acked-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3d418a25
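The 10-byte and 14-byte figures in the l2tp commit above follow from the L2TPv2 header layout, in which every field is 16 bits wide. The sketch below just adds up the field sizes; the enum and `l2tpv2_hdr_len()` are hypothetical names for illustration, and any padding announced by the Offset Size field is deliberately not counted.

```c
#include <stddef.h>

/* Field sizes in bytes of the L2TPv2 data header (names are made up
 * for this sketch). */
enum {
    SK_FLAGS_VER   = 2,  /* always present */
    SK_LENGTH      = 2,  /* optional, present when the L bit is set */
    SK_TUNNEL_ID   = 2,  /* always present */
    SK_SESSION_ID  = 2,  /* always present */
    SK_NS          = 2,  /* optional, present when the S bit is set */
    SK_NR          = 2,  /* optional, present when the S bit is set */
    SK_OFFSET_SIZE = 2   /* optional, present when the O bit is set */
};

/* Header length for a given combination of optional fields. */
static size_t l2tpv2_hdr_len(int has_length, int has_seq, int has_offset)
{
    size_t len = SK_FLAGS_VER + SK_TUNNEL_ID + SK_SESSION_ID;

    if (has_length)
        len += SK_LENGTH;
    if (has_seq)
        len += SK_NS + SK_NR;
    if (has_offset)
        len += SK_OFFSET_SIZE;
    return len;
}
```

With only the sequence numbers present the header is 10 bytes long, and with all optional fields present it is 14 bytes, which matches the size the commit requires in the linear part of the skb.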
    • Y
      ipv6: sr: clear IP6CB(skb) on SRH ip4ip6 encapsulation · 2f704348
      Yohei Kanemaru authored
      [ Upstream commit ef489749aae508e6f17886775c075f12ff919fb1 ]
      
      skb->cb may contain data from previous layers (in an observed case
      IPv4 with L3 Master Device). In the observed scenario, the data in
      IPCB(skb)->frags was misinterpreted as IP6CB(skb)->frag_max_size,
      which eventually caused an unexpected IPv6 fragmentation in
      ip6_fragment() through ip6_finish_output().
      
      This patch clears IP6CB(skb), which potentially contains garbage data,
      on the SRH ip4ip6 encapsulation.
      
      Fixes: 32d99d0b ("ipv6: sr: add support for ip4ip6 encapsulation")
      Signed-off-by: Yohei Kanemaru <yohei.kanemaru@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2f704348
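How stale control-block data can be misread by a later layer, as described in the ipv6 sr commit above, can be modelled with a shared scratch buffer. The layouts below are invented for the sketch (the real IPCB/IP6CB structures differ); they are only arranged so that one layer's `frags` field overlaps the other layer's `frag_max_size` field.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical per-layer views of the same scratch area. */
struct v4_cb { uint32_t flags; uint16_t frag_max_size; uint16_t frags; };
struct v6_cb { uint32_t iif;   uint16_t ra;            uint16_t frag_max_size; };

/* Hypothetical packet with a control block shared by all layers. */
struct pkt { unsigned char cb[48]; };

/* The IPv4 layer stores a value in its own view of the control block. */
static void v4_set_frags(struct pkt *p, uint16_t frags)
{
    struct v4_cb cb;

    memcpy(&cb, p->cb, sizeof(cb));
    cb.frags = frags;
    memcpy(p->cb, &cb, sizeof(cb));
}

/* The IPv6 layer later reads the same bytes through its own view. */
static uint16_t v6_frag_max_size(const struct pkt *p)
{
    struct v6_cb cb;

    memcpy(&cb, p->cb, sizeof(cb));
    return cb.frag_max_size;
}

/* Clearing the IPv6 view on encapsulation discards the leftover data. */
static void encap_clear_cb(struct pkt *p)
{
    memset(p->cb, 0, sizeof(struct v6_cb));
}
```

Without the clearing step, the value written by the IPv4 view shows up as a non-zero `frag_max_size` in the IPv6 view; after clearing, the IPv6 view reads 0.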
    • D
      ipv6: Consider sk_bound_dev_if when binding a socket to an address · 7e9a6476
      David Ahern authored
      [ Upstream commit c5ee066333ebc322a24a00a743ed941a0c68617e ]
      
      IPv6 does not consider if the socket is bound to a device when binding
      to an address. The result is that a socket can be bound to eth0 and then
      bound to the address of eth1. If the device is a VRF, the result is that
      a socket can only be bound to an address in the default VRF.
      
      Resolve by considering the device if sk_bound_dev_if is set.
      
      This problem has existed since the beginning of git history.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7e9a6476
    • G
      Fix "net: ipv4: do not handle duplicate fragments as overlapping" · 8c763a3c
      Greg Kroah-Hartman authored
      ade446403bfb ("net: ipv4: do not handle duplicate fragments as
      overlapping") was backported to many stable trees, but it had a problem
      that was "accidentally" fixed by the upstream commit 0ff89efb5246 ("ip:
      fail fast on IP defrag errors")
      
      This is the fixup for that problem as we do not want the larger patch in
      the older stable trees.
      
      Fixes: ade446403bfb ("net: ipv4: do not handle duplicate fragments as overlapping")
      Reported-by: Ivan Babrou <ivan@cloudflare.com>
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8c763a3c
  6. 31 Jan, 2019 2 commits