1. 08 Oct, 2019 40 commits
    • kexec: bail out upon SIGKILL when allocating memory. · d85bc11a
      Tetsuo Handa committed
      commit 7c3a6aedcd6aae0a32a527e68669f7dd667492d1 upstream.
      
      syzbot found that a thread can stall for minutes inside kexec_load() after
      that thread was killed by SIGKILL [1].  It turned out that the reproducer
      was trying to allocate 2408MB of memory using kimage_alloc_page() from
      kimage_load_normal_segment().  Let's check for SIGKILL before doing memory
      allocation.
      
      [1] https://syzkaller.appspot.com/bug?id=a0e3436829698d5824231251fad9d8e998f94f5e
      
      Link: http://lkml.kernel.org/r/993c9185-d324-2640-d061-bed2dd18b1f7@I-love.SAKURA.ne.jp
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: syzbot <syzbot+8ab2d0f39fb79fe6ca40@syzkaller.appspotmail.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • NFC: fix attrs checks in netlink interface · c8a65ec0
      Andrey Konovalov committed
      commit 18917d51472fe3b126a3a8f756c6b18085eb8130 upstream.
      
      nfc_genl_deactivate_target() relies on the NFC_ATTR_TARGET_INDEX
      attribute being present, but doesn't check whether it is actually
      provided by the user. Same goes for nfc_genl_fw_download() and
      NFC_ATTR_FIRMWARE_NAME.
      
      This patch adds appropriate checks.
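
      The kind of presence check described above can be sketched in user
      space. The attribute table, the ATTR_* indices and
      handle_deactivate_target() are hypothetical stand-ins for the netlink
      attribute array and the real handlers; only the shape of the check
      follows the description.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute indices standing in for NFC_ATTR_* values. */
enum { ATTR_DEV_INDEX, ATTR_TARGET_INDEX, ATTR_MAX };

/* Stand-in for the parsed attribute table: a NULL slot means the user
 * did not supply that attribute. */
struct request {
	const uint32_t *attrs[ATTR_MAX];
};

/* Reject the request unless every attribute the handler reads is present. */
int handle_deactivate_target(const struct request *req, uint32_t *target_out)
{
	if (!req->attrs[ATTR_DEV_INDEX] || !req->attrs[ATTR_TARGET_INDEX])
		return -EINVAL;

	*target_out = *req->attrs[ATTR_TARGET_INDEX];
	return 0;
}
```

      Without the check, the handler would dereference a NULL slot whenever
      the caller leaves the attribute out.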
      
      Found with syzkaller.
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • smack: use GFP_NOFS while holding inode_smack::smk_lock · 1b425032
      Eric Biggers committed
      commit e5bfad3d7acc5702f32aafeb388362994f4d7bd0 upstream.
      
      inode_smack::smk_lock is taken during smack_d_instantiate(), which is
      called during a filesystem transaction when creating a file on ext4.
      Therefore to avoid a deadlock, all code that takes this lock must use
      GFP_NOFS, to prevent memory reclaim from waiting for the filesystem
      transaction to complete.
      
      Reported-by: syzbot+0eefc1e06a77d327a056@syzkaller.appspotmail.com
      Cc: stable@vger.kernel.org
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Smack: Don't ignore other bprm->unsafe flags if LSM_UNSAFE_PTRACE is set · ef9744a0
      Jann Horn committed
      commit 3675f052b43ba51b99b85b073c7070e083f3e6fb upstream.
      
      There is a logic bug in the current smack_bprm_set_creds():
      If LSM_UNSAFE_PTRACE is set, but the ptrace state is deemed to be
      acceptable (e.g. because the ptracer detached in the meantime), the other
      ->unsafe flags aren't checked. As far as I can tell, this means that
      something like the following could work (but I haven't tested it):
      
       - task A: create task B with fork()
       - task B: set NO_NEW_PRIVS
       - task B: install a seccomp filter that makes open() return 0 under some
         conditions
       - task B: replace fd 0 with a malicious library
       - task A: attach to task B with PTRACE_ATTACH
       - task B: execve() a file with an SMACK64EXEC extended attribute
       - task A: while task B is still in the middle of execve(), exit (which
         destroys the ptrace relationship)
      
      Make sure that if any flags other than LSM_UNSAFE_PTRACE are set in
      bprm->unsafe, we reject the execve().
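
      A minimal user-space sketch of the intended flag handling, under
      stated assumptions: the UNSAFE_* values and check_unsafe() are
      hypothetical, and ptrace_ok models the outcome of the ptrace
      permission check. It is not the Smack implementation, only the
      bitmask logic described above.

```c
#include <errno.h>

/* Hypothetical flag values; the kernel's LSM_UNSAFE_* constants may differ. */
#define UNSAFE_SHARE        0x1u
#define UNSAFE_PTRACE       0x2u
#define UNSAFE_NO_NEW_PRIVS 0x4u

/* ptrace_ok models the result of the ptrace permission check. */
int check_unsafe(unsigned int unsafe, int ptrace_ok)
{
	if ((unsafe & UNSAFE_PTRACE) && !ptrace_ok)
		return -EPERM;

	/* An acceptable ptrace state must not hide the remaining flags. */
	if (unsafe & ~UNSAFE_PTRACE)
		return -EPERM;

	return 0;
}
```

      The second test runs regardless of how the ptrace check turned out,
      which is the behavior the buggy "else" style ordering failed to
      provide.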
      
      Cc: stable@vger.kernel.org
      Fixes: 5663884c ("Smack: unify all ptrace accesses in the smack")
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • soundwire: fix regmap dependencies and align with other serial links · 47035934
      Pierre-Louis Bossart committed
      [ Upstream commit 8676b3ca4673517650fd509d7fa586aff87b3c28 ]
      
      The existing code has a mixed select/depend usage which makes no sense.
      
      config SOUNDWIRE_BUS
             tristate
             select REGMAP_SOUNDWIRE
      
      config REGMAP_SOUNDWIRE
              tristate
              depends on SOUNDWIRE_BUS
      
      Let's remove one layer of Kconfig definitions and align with the
      solutions used by all other serial links.
      Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
      Link: https://lore.kernel.org/r/20190718230215.18675-1-pierre-louis.bossart@linux.intel.com
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • soundwire: Kconfig: fix help format · 322753c7
      Pierre-Louis Bossart committed
      [ Upstream commit 9d7cd9d500826a14fc68fb6994db375432866c6a ]
      
      Move to the regular help format, --help-- is no longer recommended.
      Reviewed-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • sch_cbq: validate TCA_CBQ_WRROPT to avoid crash · 74e2a311
      Eric Dumazet committed
      [ Upstream commit e9789c7cc182484fc031fd88097eb14cb26c4596 ]
      
      syzbot reported a crash in cbq_normalize_quanta() caused
      by an out of range cl->priority.
      
      iproute2 enforces this check, but malicious users do not.
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN PTI
      Modules linked in:
      CPU: 1 PID: 26447 Comm: syz-executor.1 Not tainted 5.3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:cbq_normalize_quanta.part.0+0x1fd/0x430 net/sched/sch_cbq.c:902
      RSP: 0018:ffff8801a5c333b0 EFLAGS: 00010206
      RAX: 0000000020000003 RBX: 00000000fffffff8 RCX: ffffc9000712f000
      RDX: 00000000000043bf RSI: ffffffff83be8962 RDI: 0000000100000018
      RBP: ffff8801a5c33420 R08: 000000000000003a R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000002ef
      R13: ffff88018da95188 R14: dffffc0000000000 R15: 0000000000000015
      FS:  00007f37d26b1700(0000) GS:ffff8801dad00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000004c7cec CR3: 00000001bcd0a006 CR4: 00000000001626f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       [<ffffffff83be9d57>] cbq_normalize_quanta include/net/pkt_sched.h:27 [inline]
       [<ffffffff83be9d57>] cbq_addprio net/sched/sch_cbq.c:1097 [inline]
       [<ffffffff83be9d57>] cbq_set_wrr+0x2d7/0x450 net/sched/sch_cbq.c:1115
       [<ffffffff83bee8a7>] cbq_change_class+0x987/0x225b net/sched/sch_cbq.c:1537
       [<ffffffff83b96985>] tc_ctl_tclass+0x555/0xcd0 net/sched/sch_api.c:2329
       [<ffffffff83a84655>] rtnetlink_rcv_msg+0x485/0xc10 net/core/rtnetlink.c:5248
       [<ffffffff83cadf0a>] netlink_rcv_skb+0x17a/0x460 net/netlink/af_netlink.c:2510
       [<ffffffff83a7db6d>] rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5266
       [<ffffffff83cac2c6>] netlink_unicast_kernel net/netlink/af_netlink.c:1324 [inline]
       [<ffffffff83cac2c6>] netlink_unicast+0x536/0x720 net/netlink/af_netlink.c:1350
       [<ffffffff83cacd4a>] netlink_sendmsg+0x89a/0xd50 net/netlink/af_netlink.c:1939
       [<ffffffff8399d46e>] sock_sendmsg_nosec net/socket.c:673 [inline]
       [<ffffffff8399d46e>] sock_sendmsg+0x12e/0x170 net/socket.c:684
       [<ffffffff8399f1fd>] ___sys_sendmsg+0x81d/0x960 net/socket.c:2359
       [<ffffffff839a2d05>] __sys_sendmsg+0x105/0x1d0 net/socket.c:2397
       [<ffffffff839a2df9>] SYSC_sendmsg net/socket.c:2406 [inline]
       [<ffffffff839a2df9>] SyS_sendmsg+0x29/0x30 net/socket.c:2404
       [<ffffffff8101ccc8>] do_syscall_64+0x528/0x770 arch/x86/entry/common.c:305
       [<ffffffff84400091>] entry_SYSCALL_64_after_hwframe+0x42/0xb7
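
      The crash comes from using an unvalidated, user-supplied priority as
      an array index. A minimal sketch of the range check, assuming a
      hypothetical MAX_PRIO bound and a simplified scheduler structure
      (neither is taken from sch_cbq.c):

```c
#include <errno.h>

#define MAX_PRIO 8  /* hypothetical bound; stands in for the qdisc's limit */

struct sched {
	unsigned int nclasses[MAX_PRIO + 1];
};

/* Validate the user-supplied priority before using it as an index. */
int set_priority(struct sched *q, unsigned int priority)
{
	if (priority > MAX_PRIO)
		return -EINVAL;

	q->nclasses[priority]++;
	return 0;
}
```

      A well-behaved tool never sends an out-of-range value, but the kernel
      side still has to reject it because any user can craft the message.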
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tipc: fix unlimited bundling of small messages · ed9420dd
      Tuong Lien committed
      [ Upstream commit e95584a889e1902fdf1ded9712e2c3c3083baf96 ]
      
      We have identified a problem with the "oversubscription" policy in the
      link transmission code.
      
      When small messages are transmitted, and the sending link has reached
      the transmit window limit, those messages will be bundled and put into
      the link backlog queue. However, bundles of data messages are counted
      at the 'CRITICAL' level, so that the counter for that level, instead of
      the counter for the real, bundled message's level is the one being
      increased.
      Subsequent, to-be-bundled data messages at non-CRITICAL levels continue
      to be tested against the unchanged counter for their own level, while
      contributing to an unrestrained increase at the CRITICAL backlog level.
      
      This leaves a gap in the congestion control algorithm for small messages
      that can result in starvation for other users or a "real" CRITICAL
      user. Eventually, it can even lead to buffer exhaustion and link reset.

      We fix this by keeping a 'target_bskb' buffer pointer at each level;
      when bundling, we only bundle messages at the same importance
      level. This way, we know exactly how many slots a certain level
      has occupied in the queue, so we can manage level congestion accurately.

      Bundling messages at the same level brings further benefits.
      Consider this:
      - One socket sends 64-byte messages at the 'CRITICAL' level;
      - Another sends 4096-byte messages at the 'LOW' level;
      
      When a 64-byte message arrives and is bundled for the first time, we pay
      the bundling overhead for it (+ 40-byte header, data copy, etc.)
      for later use, but the next message can be a 4096-byte one that cannot
      be bundled with the previous one. This means the last bundle carries only
      one payload message, which is inefficient for both the sender and the
      receiver. Later on, another 64-byte message arrives, we make a new bundle,
      and the same story repeats...
      
      With the new bundling algorithm, this will not happen, the 64-byte
      messages will be bundled together even when the 4096-byte message(s)
      comes in between. However, if the 4096-byte messages are sent at the
      same level i.e. 'CRITICAL', the bundling algorithm will again cause the
      same overhead.
      
      Also, the same will happen even with only one socket sending small
      messages at a rate close to the link transmit's one, so that, when one
      message is bundled, it's transmitted shortly. Then, another message
      comes, a new bundle is created and so on...
      
      We will solve this issue radically by another patch.
      
      Fixes: 365ad353 ("tipc: reduce risk of user starvation during link congestion")
      Reported-by: Hoang Le <hoang.h.le@dektech.com.au>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xen-netfront: do not use ~0U as error return value for xennet_fill_frags() · a1afd826
      Dongli Zhang committed
      [ Upstream commit a761129e3625688310aecf26e1be9e98e85f8eb5 ]
      
      xennet_fill_frags() uses ~0U as return value when the sk_buff is not able
      to cache extra fragments. This is incorrect because the return type of
      xennet_fill_frags() is RING_IDX and 0xffffffff is an expected value for
      ring buffer index.
      
      When rsp_cons approaches 0xffffffff, the return value of
      xennet_fill_frags() may legitimately be 0xffffffff, which xennet_poll()
      (the caller) would treat as an error. As a result, queue->rx.rsp_cons is
      set incorrectly, because xennet_fill_frags() updates it only on error; when
      there is no error, xennet_poll() is responsible for updating
      queue->rx.rsp_cons. In the end, queue->rx.rsp_cons points to rx ring buffer
      entries whose queue->rx_skbs[i] and queue->grant_rx_ref[i] have already
      been cleared to NULL. This leads to a NULL pointer access in the next
      iteration that processes rx ring buffer entries.
      
      The symptom is similar to the one fixed in
      commit 00b368502d18 ("xen-netfront: do not assume sk_buff_head list is
      empty in error handling").
      
      This patch changes the return type of xennet_fill_frags() to indicate
      whether it is successful or failed. The queue->rx.rsp_cons will be
      always updated inside this function.
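
      The in-band sentinel problem can be shown with a small user-space
      sketch. The names ring_idx_t, fill_frags_inband() and fill_frags()
      and the "room" parameter are hypothetical simplifications; the real
      function works on sk_buff lists and ring entries.

```c
#include <errno.h>
#include <stdint.h>

typedef uint32_t ring_idx_t;  /* free-running index; every 32-bit value is valid */

/* Old shape: the error is encoded in-band, so a legitimate index of
 * 0xffffffff is indistinguishable from failure. */
ring_idx_t fill_frags_inband(ring_idx_t cons, unsigned int nfrags,
			     unsigned int room)
{
	if (nfrags > room)
		return ~0U;
	return cons + nfrags;
}

/* New shape: the status is returned separately and the consumer index
 * is always updated by the callee. */
int fill_frags(ring_idx_t *cons, unsigned int nfrags, unsigned int room)
{
	*cons += nfrags;
	return nfrags > room ? -ENOENT : 0;
}
```

      With the index at 0xfffffffd and two fragments consumed, the old shape
      returns 0xffffffff on success, which the caller cannot tell apart from
      the error value.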
      
      Fixes: ad4f15dc ("xen/netfront: don't bug in case of too many frags")
      Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net/rds: Fix error handling in rds_ib_add_one() · 36a4043c
      Dotan Barak committed
      [ Upstream commit d64bf89a75b65f83f06be9fb8f978e60d53752db ]
      
      rds_ibdev:ipaddr_list and rds_ibdev:conn_list are initialized
      after allocating some resources, such as the protection domain.
      If allocation of such a resource fails, these uninitialized
      variables are accessed in rds_ib_dev_free() in the failure path. This
      can potentially crash the system. The code has been updated to
      initialize these variables very early in the function.
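
      A user-space sketch of the ordering, assuming hypothetical names
      (device_ctx, add_one(), dev_free()) and a simulated allocation
      failure; it only illustrates initializing the list heads before any
      step whose failure path walks them.

```c
#include <stdlib.h>

struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

struct device_ctx {
	struct list_head ipaddr_list;
	struct list_head conn_list;
	void *pd;  /* stands in for a resource allocated later */
};

/* Teardown inspects both lists, so they must be valid even on early
 * failure. Returns 1 when both lists were well-formed and empty. */
static int dev_free(struct device_ctx *dev)
{
	int ok = list_empty(&dev->ipaddr_list) && list_empty(&dev->conn_list);

	free(dev->pd);
	free(dev);
	return ok;
}

/* fail_alloc simulates the later resource allocation failing. */
int add_one(int fail_alloc, int *teardown_ok)
{
	struct device_ctx *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return -1;

	/* Initialize the lists first, before anything that can fail. */
	init_list_head(&dev->ipaddr_list);
	init_list_head(&dev->conn_list);

	dev->pd = fail_alloc ? NULL : malloc(16);
	if (!dev->pd) {
		*teardown_ok = dev_free(dev);  /* safe: lists are already valid */
		return -1;
	}

	*teardown_ok = dev_free(dev);
	return 0;
}
```
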
      Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
      Signed-off-by: Sudhakar Dindukurti <sudhakar.dindukurti@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • udp: only do GSO if # of segs > 1 · 012363f5
      Josh Hunt committed
      [ Upstream commit 4094871db1d65810acab3d57f6089aa39ef7f648 ]
      
      Prior to this change an application sending <= 1MSS worth of data and
      enabling UDP GSO would fail if the system had SW GSO enabled, but the
      same send would succeed if HW GSO offload is enabled. In addition to this
      inconsistency the error in the SW GSO case does not get back to the
      application if sending out of a real device so the user is unaware of this
      failure.
      
      With this change we only perform GSO if the # of segments is > 1 even
      if the application has enabled segmentation. I've also updated the
      relevant udpgso selftests.
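
      The rule "only segment when there is more than one segment" reduces
      to a single comparison. A sketch, with should_use_gso() as a
      hypothetical helper name rather than a kernel function:

```c
#include <stddef.h>

/* Returns nonzero when the payload should go through GSO: segmentation
 * must be enabled (gso_size != 0) and the data must exceed one segment. */
int should_use_gso(size_t datalen, size_t gso_size)
{
	return gso_size != 0 && datalen > gso_size;
}
```

      A payload that fits in a single segment is then sent as an ordinary
      datagram, so the software and hardware paths behave the same way.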
      
      Fixes: bec1f6f6 ("udp: generate gso with UDP_SEGMENT")
      Signed-off-by: Josh Hunt <johunt@akamai.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: dsa: rtl8366: Check VLAN ID and not ports · 5c08d7e4
      Linus Walleij committed
      [ Upstream commit e8521e53cca584ddf8ec4584d3c550a6c65f88c4 ]
      
      There has been some confusion between the port number and
      the VLAN ID in this driver. What we need to check for
      validity is the VLAN ID, nothing else.
      
      The current confusion came from assigning a few default
      VLANs for default routing and we need to rewrite that
      properly.
      
      Instead of checking if the port number is a valid VLAN
      ID, check the actual VLAN IDs passed in to the callback
      one by one as expected.
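
      Checking the VLAN IDs one by one might look like the following
      sketch. NUM_VLANS, vid_is_valid() and vlan_prepare() are
      hypothetical; the real driver asks the chip-specific code whether a
      VID is valid.

```c
#include <errno.h>
#include <stdint.h>

#define NUM_VLANS 4096u  /* hypothetical limit for the sketch */

static int vid_is_valid(uint16_t vid) { return vid < NUM_VLANS; }

/* Validate each VID in [vid_begin, vid_end]; the port number plays no
 * part in the validity check. */
int vlan_prepare(int port, uint16_t vid_begin, uint16_t vid_end)
{
	(void)port;

	/* A wider loop variable avoids wrap-around when vid_end is 65535. */
	for (unsigned int vid = vid_begin; vid <= vid_end; vid++)
		if (!vid_is_valid((uint16_t)vid))
			return -EINVAL;
	return 0;
}
```
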
      
      Fixes: d8652956 ("net: dsa: realtek-smi: Add Realtek SMI driver")
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • vsock: Fix a lockdep warning in __vsock_release() · 3c1f0704
      Dexuan Cui committed
      [ Upstream commit 0d9138ffac24cf8b75366ede3a68c951e6dcc575 ]
      
      Lockdep is unhappy if two locks from the same class are held.
      
      Fix the below warning for hyperv and virtio sockets (vmci socket code
      doesn't have the issue) by using lock_sock_nested() when __vsock_release()
      is called recursively:
      
      ============================================
      WARNING: possible recursive locking detected
      5.3.0+ #1 Not tainted
      --------------------------------------------
      server/1795 is trying to acquire lock:
      ffff8880c5158990 (sk_lock-AF_VSOCK){+.+.}, at: hvs_release+0x10/0x120 [hv_sock]
      
      but task is already holding lock:
      ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(sk_lock-AF_VSOCK);
        lock(sk_lock-AF_VSOCK);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      2 locks held by server/1795:
       #0: ffff8880c5d05ff8 (&sb->s_type->i_mutex_key#10){+.+.}, at: __sock_release+0x2d/0xa0
       #1: ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]
      
      stack backtrace:
      CPU: 5 PID: 1795 Comm: server Not tainted 5.3.0+ #1
      Call Trace:
       dump_stack+0x67/0x90
       __lock_acquire.cold.67+0xd2/0x20b
       lock_acquire+0xb5/0x1c0
       lock_sock_nested+0x6d/0x90
       hvs_release+0x10/0x120 [hv_sock]
       __vsock_release+0x24/0xf0 [vsock]
       __vsock_release+0xa0/0xf0 [vsock]
       vsock_release+0x12/0x30 [vsock]
       __sock_release+0x37/0xa0
       sock_close+0x14/0x20
       __fput+0xc1/0x250
       task_work_run+0x98/0xc0
       do_exit+0x344/0xc60
       do_group_exit+0x47/0xb0
       get_signal+0x15c/0xc50
       do_signal+0x30/0x720
       exit_to_usermode_loop+0x50/0xa0
       do_syscall_64+0x24e/0x270
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x7f4184e85f31
      Tested-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • udp: fix gso_segs calculations · 544aee54
      Josh Hunt committed
      [ Upstream commit 44b321e5020d782ad6e8ae8183f09b163be6e6e2 ]
      
      Commit dfec0ee2 ("udp: Record gso_segs when supporting UDP segmentation offload")
      added gso_segs calculation, but incorrectly got sizeof() the pointer and
      not the underlying data type. In addition let's fix the v6 case.
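
      The sizeof() mix-up is easy to reproduce in user space. In this
      sketch, struct udp_hdr mirrors the 8-byte UDP header layout, and the
      two helper functions are hypothetical names used only to contrast
      the expressions.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as a UDP header: four 16-bit fields, 8 bytes in total. */
struct udp_hdr {
	uint16_t source, dest, len, check;
};

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Buggy form: sizeof(uh) is the size of the pointer, not of the header. */
size_t gso_segs_buggy(const struct udp_hdr *uh, size_t skb_len, size_t gso_size)
{
	return DIV_ROUND_UP(skb_len - sizeof(uh), gso_size);
}

/* Fixed form: sizeof(*uh) is the size of the header itself. */
size_t gso_segs_fixed(const struct udp_hdr *uh, size_t skb_len, size_t gso_size)
{
	return DIV_ROUND_UP(skb_len - sizeof(*uh), gso_size);
}
```

      Where pointers are 8 bytes the two forms happen to agree, which can
      hide the bug; with 4-byte pointers, a 3000-byte payload split into
      1000-byte segments would be counted as 4 segments instead of 3.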
      
      Fixes: bec1f6f6 ("udp: generate gso with UDP_SEGMENT")
      Fixes: dfec0ee2 ("udp: Record gso_segs when supporting UDP segmentation offload")
      Signed-off-by: Josh Hunt <johunt@akamai.com>
      Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sch_dsmark: fix potential NULL deref in dsmark_init() · 79fd59ae
      Eric Dumazet committed
      [ Upstream commit 474f0813a3002cb299bb73a5a93aa1f537a80ca8 ]
      
      Make sure TCA_DSMARK_INDICES was provided by the user.
      
      syzbot reported :
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 8799 Comm: syz-executor235 Not tainted 5.3.0+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:nla_get_u16 include/net/netlink.h:1501 [inline]
      RIP: 0010:dsmark_init net/sched/sch_dsmark.c:364 [inline]
      RIP: 0010:dsmark_init+0x193/0x640 net/sched/sch_dsmark.c:339
      Code: 85 db 58 0f 88 7d 03 00 00 e8 e9 1a ac fb 48 8b 9d 70 ff ff ff 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 04 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 ca
      RSP: 0018:ffff88809426f3b8 EFLAGS: 00010247
      RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff85c6eb09
      RDX: 0000000000000000 RSI: ffffffff85c6eb17 RDI: 0000000000000004
      RBP: ffff88809426f4b0 R08: ffff88808c4085c0 R09: ffffed1015d26159
      R10: ffffed1015d26158 R11: ffff8880ae930ac7 R12: ffff8880a7e96940
      R13: dffffc0000000000 R14: ffff88809426f8c0 R15: 0000000000000000
      FS:  0000000001292880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000080 CR3: 000000008ca1b000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       qdisc_create+0x4ee/0x1210 net/sched/sch_api.c:1237
       tc_modify_qdisc+0x524/0x1c50 net/sched/sch_api.c:1653
       rtnetlink_rcv_msg+0x463/0xb00 net/core/rtnetlink.c:5223
       netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
       rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5241
       netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
       netlink_unicast+0x531/0x710 net/netlink/af_netlink.c:1328
       netlink_sendmsg+0x8a5/0xd60 net/netlink/af_netlink.c:1917
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg+0xd7/0x130 net/socket.c:657
       ___sys_sendmsg+0x803/0x920 net/socket.c:2311
       __sys_sendmsg+0x105/0x1d0 net/socket.c:2356
       __do_sys_sendmsg net/socket.c:2365 [inline]
       __se_sys_sendmsg net/socket.c:2363 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2363
       do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x440369
      
      Fixes: 758cc43c ("[PKT_SCHED]: Fix dsmark to apply changes consistent")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • rxrpc: Fix rxrpc_recvmsg tracepoint · 76b55277
      David Howells committed
      [ Upstream commit db9b2e0af605e7c994784527abfd9276cabd718a ]
      
      Fix the rxrpc_recvmsg tracepoint to handle being called with a NULL call
      parameter.
      
      Fixes: a25e21f0 ("rxrpc, afs: Use debug_ids rather than pointers in traces")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • qmi_wwan: add support for Cinterion CLS8 devices · 7047aae6
      Reinhard Speyerer committed
      [ Upstream commit cf74ac6db25d4002089e85cc623ad149ecc25614 ]
      
      Add support for Cinterion CLS8 devices.
      Use QMI_QUIRK_SET_DTR as required for Qualcomm MDM9x07 chipsets.
      
      T:  Bus=01 Lev=03 Prnt=05 Port=01 Cnt=02 Dev#= 25 Spd=480  MxCh= 0
      D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
      P:  Vendor=1e2d ProdID=00b0 Rev= 3.18
      S:  Manufacturer=GEMALTO
      S:  Product=USB Modem
      C:* #Ifs= 5 Cfg#= 1 Atr=80 MxPwr=500mA
      I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none)
      E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
      E:  Ad=83(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
      E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
      E:  Ad=85(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
      E:  Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
      E:  Ad=87(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
      E:  Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
      E:  Ad=89(I) Atr=03(Int.) MxPS=   8 Ivl=32ms
      E:  Ad=88(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      Signed-off-by: Reinhard Speyerer <rspmn@arcor.de>
      Acked-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • nfc: fix memory leak in llcp_sock_bind() · dd9c580a
      Eric Dumazet committed
      [ Upstream commit a0c2dc1fe63e2869b74c1c7f6a81d1745c8a695d ]
      
      syzbot reported a memory leak after a bind() has failed.
      
      While we are at it, abort the operation if kmemdup() has failed.
      
      BUG: memory leak
      unreferenced object 0xffff888105d83ec0 (size 32):
        comm "syz-executor067", pid 7207, jiffies 4294956228 (age 19.430s)
        hex dump (first 32 bytes):
          00 69 6c 65 20 72 65 61 64 00 6e 65 74 3a 5b 34  .ile read.net:[4
          30 32 36 35 33 33 30 39 37 5d 00 00 00 00 00 00  026533097]......
        backtrace:
          [<0000000036bac473>] kmemleak_alloc_recursive /./include/linux/kmemleak.h:43 [inline]
          [<0000000036bac473>] slab_post_alloc_hook /mm/slab.h:522 [inline]
          [<0000000036bac473>] slab_alloc /mm/slab.c:3319 [inline]
          [<0000000036bac473>] __do_kmalloc /mm/slab.c:3653 [inline]
          [<0000000036bac473>] __kmalloc_track_caller+0x169/0x2d0 /mm/slab.c:3670
          [<000000000cd39d07>] kmemdup+0x27/0x60 /mm/util.c:120
          [<000000008e57e5fc>] kmemdup /./include/linux/string.h:432 [inline]
          [<000000008e57e5fc>] llcp_sock_bind+0x1b3/0x230 /net/nfc/llcp_sock.c:107
          [<000000009cb0b5d3>] __sys_bind+0x11c/0x140 /net/socket.c:1647
          [<00000000492c3bbc>] __do_sys_bind /net/socket.c:1658 [inline]
          [<00000000492c3bbc>] __se_sys_bind /net/socket.c:1656 [inline]
          [<00000000492c3bbc>] __x64_sys_bind+0x1e/0x30 /net/socket.c:1656
          [<0000000008704b2a>] do_syscall_64+0x76/0x1a0 /arch/x86/entry/common.c:296
          [<000000009f4c57a4>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
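
      Both parts of the fix, aborting when the duplicate cannot be
      allocated and releasing it when a later step fails, can be sketched
      in user space. sock_ctx, reserve_ssap() and sock_bind() are
      hypothetical stand-ins, with malloc()/memcpy() playing the role of
      kmemdup().

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct sock_ctx {
	char *service_name;
	size_t service_name_len;
};

/* Hypothetical stand-in for the step that can fail after the copy is made. */
static int reserve_ssap(int fail) { return fail ? -EADDRINUSE : 0; }

int sock_bind(struct sock_ctx *sk, const char *name, size_t len,
	      int fail_reserve)
{
	int ret;

	sk->service_name = malloc(len);
	if (!sk->service_name)
		return -ENOMEM;          /* abort if the copy could not be made */
	memcpy(sk->service_name, name, len);
	sk->service_name_len = len;

	ret = reserve_ssap(fail_reserve);
	if (ret < 0) {
		free(sk->service_name);  /* release the copy on the failure path */
		sk->service_name = NULL;
		sk->service_name_len = 0;
		return ret;
	}
	return 0;
}
```

      Clearing the pointer after free() also keeps a later teardown from
      freeing the same buffer twice.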
      
      Fixes: 30cc4587 ("NFC: Move LLCP code to the NFC top level diirectory")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: Unpublish sk from sk_reuseport_cb before call_rcu · d5b1db1c
      Martin KaFai Lau committed
      [ Upstream commit 8c7138b33e5c690c308b2a7085f6313fdcb3f616 ]
      
      The "reuse->sock[]" array is shared by multiple sockets.  The going away
      sk must unpublish itself from "reuse->sock[]" before making call_rcu()
      call.  However, this unpublish-action is currently done after a grace
      period and it may cause use-after-free.
      
      The fix is to move reuseport_detach_sock() to sk_destruct().
      Due to the above reason, any socket with sk_reuseport_cb has
      to go through the rcu grace period before freeing it.
      
      It is a rather old bug (~3 yrs).  The Fixes tag is not necessarily
      the right commit, but it is the one that introduced the SOCK_RCU_FREE
      logic, and this fix depends on it.
      
      Fixes: a4298e45 ("net: add SOCK_RCU_FREE socket flag")
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: qlogic: Fix memory leak in ql_alloc_large_buffers · 9d0995cc
      Navid Emamdoost committed
      [ Upstream commit 1acb8f2a7a9f10543868ddd737e37424d5c36cf4 ]
      
      In ql_alloc_large_buffers, a new skb is allocated via netdev_alloc_skb.
      This skb should be released if pci_dma_mapping_error fails.
      
      Fixes: 0f8ab89e ("qla3xxx: Check return code from pci_map_single() in ql_release_to_lrg_buf_free_list(), ql_populate_free_queue(), ql_alloc_large_buffers(), and ql3xxx_send()")
      Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: ipv4: avoid mixed n_redirects and rate_tokens usage · 124b64fe
      Paolo Abeni committed
      [ Upstream commit b406472b5ad79ede8d10077f0c8f05505ace8b6d ]
      
      Since commit c09551c6ff7f ("net: ipv4: use a dedicated counter
      for icmp_v4 redirect packets") we use 'n_redirects' to account
      for redirect packets, but we still use 'rate_tokens' to compute
      the redirect packets exponential backoff.
      
      If the device sent any ICMP error packet to the relevant peer
      after sending a redirect, it will also update 'rate_tokens' according
      to the leaky bucket scheme; typically 'rate_tokens' will rise
      above BITS_PER_LONG and the redirect packets backoff algorithm
      will produce undefined behavior.
      
      Fix the issue using 'n_redirects' to compute the exponential backoff
      in ip_rt_send_redirect().
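
      In C, shifting by a count greater than or equal to the width of the
      type is undefined, which is why the shift count must come from a
      bounded counter. A sketch, assuming hypothetical REDIRECT_LOAD and
      REDIRECT_NUMBER values and a hypothetical redirect_backoff() helper:

```c
/* Hypothetical tunables for the sketch. */
#define REDIRECT_LOAD   20UL  /* base delay, arbitrary units */
#define REDIRECT_NUMBER 9U    /* stop sending after this many redirects */

/* Returns the backoff delay, or 0 when no further redirect should be
 * sent. Because n_redirects is capped at REDIRECT_NUMBER, the shift
 * count always stays far below the width of unsigned long. */
unsigned long redirect_backoff(unsigned int n_redirects)
{
	if (n_redirects >= REDIRECT_NUMBER)
		return 0;
	return REDIRECT_LOAD << n_redirects;
}
```

      A counter shared with another rate limiter can grow past that cap,
      which is the situation the dedicated 'n_redirects' counter avoids.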
      
      Note that we still clear rate_tokens after a redirect silence period,
      to avoid changing an established behaviour.
      
      The root cause predates git history; before the mentioned commit, the
      kernel stopped sending redirects in the critical scenario; after
      the mentioned commit, the behavior is more random.
      Reported-by: Xiumei Mu <xmu@redhat.com>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Fixes: c09551c6ff7f ("net: ipv4: use a dedicated counter for icmp_v4 redirect packets")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipv6: Handle missing host route in __ipv6_ifa_notify · 6f8564ed
      David Ahern committed
      [ Upstream commit 2d819d250a1393a3e725715425ab70a0e0772a71 ]
      
      Rajendra reported a kernel panic when a link was taken down:
      
          [ 6870.263084] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
          [ 6870.271856] IP: [<ffffffff8efc5764>] __ipv6_ifa_notify+0x154/0x290
      
          <snip>
      
          [ 6870.570501] Call Trace:
          [ 6870.573238] [<ffffffff8efc58c6>] ? ipv6_ifa_notify+0x26/0x40
          [ 6870.579665] [<ffffffff8efc98ec>] ? addrconf_dad_completed+0x4c/0x2c0
          [ 6870.586869] [<ffffffff8efe70c6>] ? ipv6_dev_mc_inc+0x196/0x260
          [ 6870.593491] [<ffffffff8efc9c6a>] ? addrconf_dad_work+0x10a/0x430
          [ 6870.600305] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
          [ 6870.606732] [<ffffffff8ea93a7a>] ? process_one_work+0x18a/0x430
          [ 6870.613449] [<ffffffff8ea93d6d>] ? worker_thread+0x4d/0x490
          [ 6870.619778] [<ffffffff8ea93d20>] ? process_one_work+0x430/0x430
          [ 6870.626495] [<ffffffff8ea99dd9>] ? kthread+0xd9/0xf0
          [ 6870.632145] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
          [ 6870.638573] [<ffffffff8ea99d00>] ? kthread_park+0x60/0x60
          [ 6870.644707] [<ffffffff8f01ae77>] ? ret_from_fork+0x57/0x70
          [ 6870.650936] Code: 31 c0 31 d2 41 b9 20 00 08 02 b9 09 00 00 0
      
      addrconf_dad_work is kicked to be scheduled when a device is brought
      up. There is a race between addrconf_dad_work getting scheduled and
      taking the rtnl lock and a process taking the link down (under rtnl).
      The latter removes the host route from the inet6_addr as part of
      addrconf_ifdown which is run for NETDEV_DOWN. The former attempts
      to use the host route in __ipv6_ifa_notify. If the down event removes
      the host route due to the race to the rtnl, then the BUG listed above
      occurs.
      
      Since the DAD sequence can not be aborted, add a check for the missing
      host route in __ipv6_ifa_notify. The only way this should happen is due
      to the previously mentioned race. The host route is created when the
      address is added to an interface; it is only removed on a down event
      where the address is kept. Add a warning if the host route is missing
      AND the device is up; this is a situation that should never happen.
      
      Fixes: f1705ec1 ("net: ipv6: Make address flushing on ifdown optional")
      Reported-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6f8564ed
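The check described in the entry above might look roughly like the following userspace sketch. The types `host_route` and `if_addr` and the function `notify_new_addr` are hypothetical simplifications, not the kernel's structures. The sketch only shows the shape of the guard: skip the work when the host route is gone, and warn only if the device is still up.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct host_route { int installed; };

struct if_addr {
    struct host_route *rt;  /* NULL once a down event has removed it */
    bool dev_up;            /* true while the owning device is up */
};

/*
 * Handle a "new address" notification.
 * Returns true if the host route was installed, false if it was skipped.
 */
bool notify_new_addr(struct if_addr *ifa)
{
    if (ifa->rt == NULL) {
        /* A missing route on a device that is still up is unexpected. */
        if (ifa->dev_up)
            fprintf(stderr, "warning: host route missing on an up device\n");
        return false;       /* never dereference the missing route */
    }
    ifa->rt->installed = 1;
    return true;
}
```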
    • E
      ipv6: drop incoming packets having a v4mapped source address · 658d7ee4
      Eric Dumazet committed
      [ Upstream commit 6af1799aaf3f1bc8defedddfa00df3192445bbf3 ]
      
      This began with a syzbot report. syzkaller was injecting
      IPv6 TCP SYN packets having a v4mapped source address.
      
      After an unsuccessful 4-tuple lookup, TCP creates a request
      socket (SYN_RECV) and calls reqsk_queue_hash_req().
      
      reqsk_queue_hash_req() calls sk_ehashfn(sk).
      
      At this point we have AF_INET6 sockets, and the heuristic
      used by sk_ehashfn() to either hash the IPv4 or IPv6 addresses
      is to use ipv6_addr_v4mapped(&sk->sk_v6_daddr).
      
      For the particular spoofed packet, we end up hashing V4 addresses
      which were not initialized by the TCP IPv6 stack, so KMSAN fired
      a warning.
      
      I first fixed sk_ehashfn() to test both source and destination addresses,
      but then faced various problems, including user-space programs
      like packetdrill that had similar assumptions.
      
      Instead of trying to fix the whole ecosystem, it is better
      to admit that we have a dual stack behavior, and that we
      can not build linux kernels without V4 stack anyway.
      
      The dual stack API automatically forces the traffic to be IPv4
      if v4mapped addresses are used at bind() or connect(), so it makes
      no sense to allow IPv6 traffic to use the same v4mapped class.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      658d7ee4
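As a rough userspace illustration of the filter described in the entry above, the sketch below tests a source address against the v4mapped prefix (::ffff:0:0/96) using the standard `IN6_IS_ADDR_V4MAPPED` macro from `<netinet/in.h>`. The function name `drop_v4mapped_source` is hypothetical, and this is not the kernel's receive path.

```c
#include <netinet/in.h>
#include <stdbool.h>

/*
 * Hypothetical receive-path filter: returns true when an incoming IPv6
 * packet should be dropped because its source address is v4mapped.
 */
bool drop_v4mapped_source(const struct in6_addr *saddr)
{
    /* IN6_IS_ADDR_V4MAPPED matches the ::ffff:0:0/96 prefix. */
    return IN6_IS_ADDR_V4MAPPED(saddr) != 0;
}
```

A source such as ::ffff:192.0.0.1 would be rejected, while an ordinary global address such as 2001::1 would pass.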
    • J
      hso: fix NULL-deref on tty open · a495fd19
      Johan Hovold committed
      [ Upstream commit 8353da9fa69722b54cba82b2ec740afd3d438748 ]
      
      Fix NULL-pointer dereference on tty open due to a failure to handle a
      missing interrupt-in endpoint when probing modem ports:
      
      	BUG: kernel NULL pointer dereference, address: 0000000000000006
      	...
      	RIP: 0010:tiocmget_submit_urb+0x1c/0xe0 [hso]
      	...
      	Call Trace:
      	hso_start_serial_device+0xdc/0x140 [hso]
      	hso_serial_open+0x118/0x1b0 [hso]
      	tty_open+0xf1/0x490
      
      Fixes: 542f5482 ("tty: Modem functions for the HSO driver")
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a495fd19
    • H
      erspan: remove the incorrect mtu limit for erspan · 7f30c44b
      Haishuang Yan committed
      [ Upstream commit 0e141f757b2c78c983df893e9993313e2dc21e38 ]
      
      The erspan driver calls ether_setup(). After commit 61e84623
      ("net: centralize net_device min/max MTU checking"), the range
      of mtu is [min_mtu, max_mtu], which is [68, 1500] by default.
      
      This prevents the mtu of an erspan device from being set above
      1500, a limit that is not correct for an ipgre tap device.
      
      Tested:
      Before patch:
      # ip link set erspan0 mtu 1600
      Error: mtu greater than device maximum.
      After patch:
      # ip link set erspan0 mtu 1600
      # ip -d link show erspan0
      21: erspan0@NONE: <BROADCAST,MULTICAST> mtu 1600 qdisc noop state DOWN
      mode DEFAULT group default qlen 1000
          link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 0
      
      Fixes: 61e84623 ("net: centralize net_device min/max MTU checking")
      Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7f30c44b
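The before/after behaviour in the entry above is consistent with a range check in which a `max_mtu` of 0 means "no upper bound", as the "maxmtu 0" in the output suggests. The sketch below is a userspace approximation under that assumption; `struct mtu_limits` and `check_mtu` are hypothetical names, not kernel APIs.

```c
#include <errno.h>

/* Hypothetical per-device limits; max_mtu == 0 is taken to mean "no limit". */
struct mtu_limits {
    unsigned int min_mtu;
    unsigned int max_mtu;
};

/* Return 0 if new_mtu is acceptable, -EINVAL otherwise. */
int check_mtu(const struct mtu_limits *lim, int new_mtu)
{
    if (new_mtu < 0 || (unsigned int)new_mtu < lim->min_mtu)
        return -EINVAL;     /* below the device minimum */
    if (lim->max_mtu > 0 && (unsigned int)new_mtu > lim->max_mtu)
        return -EINVAL;     /* above the device maximum */
    return 0;
}
```

With the default limits {68, 1500}, an mtu of 1600 is rejected; with {68, 0} it is accepted.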
    • V
      cxgb4:Fix out-of-bounds MSI-X info array access · 2b838911
      Vishal Kulkarni committed
      [ Upstream commit 6b517374f4ea5a3c6e307e1219ec5f35d42e6d00 ]
      
      When fetching free MSI-X vectors for ULDs, check for the error code
      before accessing MSI-X info array. Otherwise, an out-of-bounds access is
      attempted, which results in kernel panic.
      
      Fixes: 94cdb8bb ("cxgb4: Add support for dynamic allocation of resources for ULD")
      Signed-off-by: Shahjada Abul Husain <shahjada@chelsio.com>
      Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2b838911
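The pattern behind the fix above, checking a lookup's error return before using it as an array index, can be sketched as follows. All names here (`msix_slot`, `find_free_slot`, `claim_slot`) are hypothetical and the table is a plain array, so this is only an illustration of the ordering, not the driver's code.

```c
#include <stddef.h>

/* Hypothetical per-vector bookkeeping entry. */
struct msix_slot { int in_use; };

/* Return the index of the first free slot, or -1 if none is left. */
static int find_free_slot(const struct msix_slot *tbl, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!tbl[i].in_use)
            return (int)i;
    return -1;
}

/* Claim a free slot; the error code is checked before any array access. */
int claim_slot(struct msix_slot *tbl, size_t n)
{
    int idx = find_free_slot(tbl, n);

    if (idx < 0)
        return idx;         /* without this check, tbl[-1] would be touched */
    tbl[idx].in_use = 1;
    return idx;
}
```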
    • D
      bpf: fix use after free in prog symbol exposure · ed568ca7
      Daniel Borkmann committed
      commit c751798aa224fadc5124b49eeb38fb468c0fa039 upstream.
      
      syzkaller managed to trigger the warning in bpf_jit_free() which checks via
      bpf_prog_kallsyms_verify_off() for potentially unlinked JITed BPF progs
      in kallsyms, and subsequently trips over GPF when walking kallsyms entries:
      
        [...]
        8021q: adding VLAN 0 to HW filter on device batadv0
        8021q: adding VLAN 0 to HW filter on device batadv0
        WARNING: CPU: 0 PID: 9869 at kernel/bpf/core.c:810 bpf_jit_free+0x1e8/0x2a0
        Kernel panic - not syncing: panic_on_warn set ...
        CPU: 0 PID: 9869 Comm: kworker/0:7 Not tainted 5.0.0-rc8+ #1
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        Workqueue: events bpf_prog_free_deferred
        Call Trace:
         __dump_stack lib/dump_stack.c:77 [inline]
         dump_stack+0x113/0x167 lib/dump_stack.c:113
         panic+0x212/0x40b kernel/panic.c:214
         __warn.cold.8+0x1b/0x38 kernel/panic.c:571
         report_bug+0x1a4/0x200 lib/bug.c:186
         fixup_bug arch/x86/kernel/traps.c:178 [inline]
         do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
         do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
         invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
        RIP: 0010:bpf_jit_free+0x1e8/0x2a0
        Code: 02 4c 89 e2 83 e2 07 38 d0 7f 08 84 c0 0f 85 86 00 00 00 48 ba 00 02 00 00 00 00 ad de 0f b6 43 02 49 39 d6 0f 84 5f fe ff ff <0f> 0b e9 58 fe ff ff 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1
        RSP: 0018:ffff888092f67cd8 EFLAGS: 00010202
        RAX: 0000000000000007 RBX: ffffc90001947000 RCX: ffffffff816e9d88
        RDX: dead000000000200 RSI: 0000000000000008 RDI: ffff88808769f7f0
        RBP: ffff888092f67d00 R08: fffffbfff1394059 R09: fffffbfff1394058
        R10: fffffbfff1394058 R11: ffffffff89ca02c7 R12: ffffc90001947002
        R13: ffffc90001947020 R14: ffffffff881eca80 R15: ffff88808769f7e8
        BUG: unable to handle kernel paging request at fffffbfff400d000
        #PF error: [normal kernel read fault]
        PGD 21ffee067 P4D 21ffee067 PUD 21ffed067 PMD 9f942067 PTE 0
        Oops: 0000 [#1] PREEMPT SMP KASAN
        CPU: 0 PID: 9869 Comm: kworker/0:7 Not tainted 5.0.0-rc8+ #1
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        Workqueue: events bpf_prog_free_deferred
        RIP: 0010:bpf_get_prog_addr_region kernel/bpf/core.c:495 [inline]
        RIP: 0010:bpf_tree_comp kernel/bpf/core.c:558 [inline]
        RIP: 0010:__lt_find include/linux/rbtree_latch.h:115 [inline]
        RIP: 0010:latch_tree_find include/linux/rbtree_latch.h:208 [inline]
        RIP: 0010:bpf_prog_kallsyms_find+0x107/0x2e0 kernel/bpf/core.c:632
        Code: 00 f0 ff ff 44 38 c8 7f 08 84 c0 0f 85 fa 00 00 00 41 f6 45 02 01 75 02 0f 0b 48 39 da 0f 82 92 00 00 00 48 89 d8 48 c1 e8 03 <42> 0f b6 04 30 84 c0 74 08 3c 03 0f 8e 45 01 00 00 8b 03 48 c1 e0
        [...]
      
      Upon further debugging, it turns out that whenever we trigger this
      issue, the kallsyms removal in bpf_prog_ksym_node_del() was /skipped/
      but yet bpf_jit_free() reported that the entry is /in use/.
      
      Problem is that symbol exposure via bpf_prog_kallsyms_add() but also
      perf_event_bpf_event() were done /after/ bpf_prog_new_fd(). Once the
      fd is exposed to the public, a parallel close request came in right
      before we attempted to do the bpf_prog_kallsyms_add().
      
      Given at this time the prog reference count is one, we start to rip
      everything underneath us via bpf_prog_release() -> bpf_prog_put().
      The memory is eventually released via deferred free, so we're seeing
      that bpf_jit_free() has a kallsym entry because we added it from
      bpf_prog_load() but /after/ bpf_prog_put() from the remote CPU.
      
      Therefore, move both notifications /before/ we install the fd. The
      issue was never seen between bpf_prog_alloc_id() and bpf_prog_new_fd()
      because upon bpf_prog_get_fd_by_id() we'll take another reference to
      the BPF prog, so we're still holding the original reference from the
      bpf_prog_load().
      
      Fixes: 6ee52e2a3fe4 ("perf, bpf: Introduce PERF_RECORD_BPF_EVENT")
      Fixes: 74451e66 ("bpf: make jited programs visible in traces")
      Reported-by: syzbot+bd3bba6ff3fcea7a6ec6@syzkaller.appspotmail.com
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Song Liu <songliubraving@fb.com>
      Signed-off-by: Zubin Mithra <zsm@chromium.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ed568ca7
    • D
      block: mq-deadline: Fix queue restart handling · dbb7339c
      Damien Le Moal committed
      [ Upstream commit cb8acabbe33b110157955a7425ee876fb81e6bbc ]
      
      Commit 7211aef86f79 ("block: mq-deadline: Fix write completion
      handling") added a call to blk_mq_sched_mark_restart_hctx() in
      dd_dispatch_request() to make sure that write request dispatching does
      not stall when all target zones are locked. This fix left a subtle race
      when a write completion happens during a dispatch execution on another
      CPU:
      
      CPU 0: Dispatch			CPU1: write completion
      
      dd_dispatch_request()
          lock(&dd->lock);
          ...
          lock(&dd->zone_lock);	dd_finish_request()
          rq = find request		lock(&dd->zone_lock);
          unlock(&dd->zone_lock);
          				zone write unlock
      				unlock(&dd->zone_lock);
      				...
      				__blk_mq_free_request
                                            check restart flag (not set)
      				      -> queue not run
          ...
          if (!rq && have writes)
              blk_mq_sched_mark_restart_hctx()
          unlock(&dd->lock)
      
      Since the dispatch context finishes after the write request completion
      handling, marking the queue as needing a restart is not seen from
      __blk_mq_free_request() and blk_mq_sched_restart() is not executed,
      leading to a dispatch stall under 100% write workloads.
      
      Fix this by moving the call to blk_mq_sched_mark_restart_hctx() from
      dd_dispatch_request() into dd_finish_request() under the zone lock to
      ensure full mutual exclusion between write request dispatch selection
      and zone unlock on write request completion.
      
      Fixes: 7211aef86f79 ("block: mq-deadline: Fix write completion handling")
      Cc: stable@vger.kernel.org
      Reported-by: Hans Holmberg <Hans.Holmberg@wdc.com>
      Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      dbb7339c
    • A
      arm: use STACK_TOP when computing mmap base address · af10ffa6
      Alexandre Ghiti committed
      [ Upstream commit 86e568e9c0525fc40e76d827212d5e9721cf7504 ]
      
      mmap base address must be computed wrt stack top address, using TASK_SIZE
      is wrong since STACK_TOP and TASK_SIZE are not equivalent.
      
      Link: http://lkml.kernel.org/r/20190730055113.23635-8-alex@ghiti.fr
      Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
      Acked-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      af10ffa6
    • A
      arm: properly account for stack randomization and stack guard gap · f91a9c65
      Alexandre Ghiti committed
      [ Upstream commit af0f4297286f13a75edf93677b1fb2fc16c412a7 ]
      
      This commit takes care of stack randomization and stack guard gap when
      computing mmap base address and checks if the task asked for
      randomization.  This fixes the problem uncovered and not fixed for arm
      here: https://lkml.kernel.org/r/20170622200033.25714-1-riel@redhat.com
      
      Link: http://lkml.kernel.org/r/20190730055113.23635-7-alex@ghiti.fr
      Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
      Acked-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f91a9c65
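The two arm entries above describe computing the mmap base from the stack top while leaving room for the stack rlimit, the stack guard gap and, if the task asked for it, stack randomization. A userspace sketch of that calculation follows. Every constant (`STACK_TOP`, `STACK_RND_MASK`, `STACK_GUARD_GAP`, `MIN_GAP`, `MAX_GAP`) is an assumed illustrative value for a 32-bit style layout, and `mmap_base` here is a simplified stand-in rather than the kernel function.

```c
#include <stdbool.h>

/* Illustrative constants only; real values depend on architecture and config. */
#define PAGE_SHIFT      12ul
#define PAGE_SIZE       (1ul << PAGE_SHIFT)
#define STACK_TOP       0xbf000000ul
#define STACK_RND_MASK  0x7fful                 /* max stack jitter, in pages */
#define STACK_GUARD_GAP (256ul << PAGE_SHIFT)   /* guard gap below the stack */
#define MIN_GAP         (128ul << 20)
#define MAX_GAP         (STACK_TOP / 6 * 5)

/*
 * Compute the top-down mmap base.
 * stack_rlimit: the task's stack size limit in bytes
 * rnd:          mmap randomization offset in bytes
 * randomize:    whether the task asked for address space randomization
 */
unsigned long mmap_base(unsigned long stack_rlimit, unsigned long rnd,
                        bool randomize)
{
    unsigned long gap = stack_rlimit;
    unsigned long pad = STACK_GUARD_GAP;

    /* Only reserve room for stack randomization if the task uses it. */
    if (randomize)
        pad += STACK_RND_MASK << PAGE_SHIFT;

    /* Limits close to "infinity" would overflow; skip the padding then. */
    if (gap + pad > gap)
        gap += pad;

    if (gap < MIN_GAP)
        gap = MIN_GAP;
    else if (gap > MAX_GAP)
        gap = MAX_GAP;

    /* Measure from the stack top and round up to a page boundary. */
    return (STACK_TOP - gap - rnd + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
}
```

Under these assumptions, a 256 MiB stack limit gives a base of 0xaef00000 without randomization and a lower base of 0xae701000 with it.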
    • A
      mips: properly account for stack randomization and stack guard gap · 53ba8d43
      Alexandre Ghiti committed
      [ Upstream commit b1f61b5bde3a1f50392c97b4c8513d1b8efb1cf2 ]
      
      This commit takes care of stack randomization and stack guard gap when
      computing mmap base address and checks if the task asked for
      randomization.  This fixes the problem uncovered and not fixed for arm
      here: https://lkml.kernel.org/r/20170622200033.25714-1-riel@redhat.com
      
      Link: http://lkml.kernel.org/r/20190730055113.23635-10-alex@ghiti.fr
      Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
      Acked-by: Kees Cook <keescook@chromium.org>
      Acked-by: Paul Burton <paul.burton@mips.com>
      Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      53ba8d43
    • A
      arm64: consider stack randomization for mmap base only when necessary · e1b391ab
      Alexandre Ghiti committed
      [ Upstream commit e8d54b62c55ab6201de6d195fc2c276294c1f6ae ]
      
      Do not offset mmap base address because of stack randomization if current
      task does not want randomization.  Note that x86 already implements this
      behaviour.
      
      Link: http://lkml.kernel.org/r/20190730055113.23635-4-alex@ghiti.fr
      Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      e1b391ab
    • N
      kmemleak: increase DEBUG_KMEMLEAK_EARLY_LOG_SIZE default to 16K · 30ab799e
      Nicolas Boichat committed
      [ Upstream commit b751c52bb587ae66f773b15204ef7a147467f4c7 ]
      
      The current default value (400) is too low on many systems (e.g. some
      ARM64 platforms take up 1000+ entries).
      
      syzbot uses 16000 as its default value, which has proved to be enough
      on beefy configurations, so let's pick that value.
      
      This consumes more RAM on boot (each entry is 160 bytes, so in total
      ~2.5MB of RAM), but the memory would later be freed (early_log is
      __initdata).
      
      Link: http://lkml.kernel.org/r/20190730154027.101525-1-drinkcat@chromium.org
      Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
      Suggested-by: Dmitry Vyukov <dvyukov@google.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Joe Lawrence <joe.lawrence@redhat.com>
      Cc: Uladzislau Rezki <urezki@gmail.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      30ab799e
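The memory estimate quoted in the entry above is a simple product of entry count and entry size. A tiny sketch, with the hypothetical helper name `early_log_bytes` and the 160-byte entry size taken from the commit message:

```c
#include <stddef.h>

/* Boot-time footprint of an early log with `entries` fixed-size records. */
size_t early_log_bytes(size_t entries, size_t entry_size)
{
    return entries * entry_size;
}
```

For 16000 entries of 160 bytes this gives 2,560,000 bytes, in line with the "~2.5MB" figure; the old default of 400 entries needed only 64,000 bytes.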
    • C
      ocfs2: wait for recovering done after direct unlock request · 52132ff5
      Changwei Ge committed
      [ Upstream commit 0a3775e4f883912944481cf2ef36eb6383a9cc74 ]
      
      There is a scenario causing ocfs2 umount hang when multiple hosts are
      rebooting at the same time.
      
      NODE1                           NODE2               NODE3
      send unlock request to NODE2
                                      dies
                                                          become recovery master
                                                          recover NODE2
      find NODE2 dead
      mark resource RECOVERING
      directly remove lock from grant list
      calculate usage but RECOVERING marked
      **miss the window of purging
      clear RECOVERING
      
      To reproduce this issue, crash a host and then umount ocfs2
      from another node.
      
      To solve this, make the unlock process wait until recovery is done.
      
      Link: http://lkml.kernel.org/r/1550124866-20367-1-git-send-email-gechangwei@live.cn
      Signed-off-by: Changwei Ge <gechangwei@live.cn>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      52132ff5
    • G
      kbuild: clean compressed initramfs image · d4a54645
      Greg Thelen committed
      [ Upstream commit 6279eb3dd7946c69346a3b98473ed13d3a44adb5 ]
      
      Since 9e3596b0 ("kbuild: initramfs cleanup, set target from Kconfig")
      "make clean" leaves behind compressed initramfs images.  Example:
      
        $ make defconfig
        $ sed -i 's|CONFIG_INITRAMFS_SOURCE=""|CONFIG_INITRAMFS_SOURCE="/tmp/ir.cpio"|' .config
        $ make olddefconfig
        $ make -s
        $ make -s clean
        $ git clean -ndxf | grep initramfs
        Would remove usr/initramfs_data.cpio.gz
      
      clean rules do not have CONFIG_* context so they do not know which
      compression format was used.  Thus they don't know which files to delete.
      
      Tell clean to delete all possible compression formats.
      
      Once patched usr/initramfs_data.cpio.gz and friends are deleted by
      "make clean".
      
      Link: http://lkml.kernel.org/r/20190722063251.55541-1-gthelen@google.com
      Fixes: 9e3596b0 ("kbuild: initramfs cleanup, set target from Kconfig")
      Signed-off-by: Greg Thelen <gthelen@google.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d4a54645
    • Y
      crypto: hisilicon - Fix double free in sec_free_hw_sgl() · d983182d
      Yunfeng Ye committed
      [ Upstream commit 24fbf7bad888767bed952f540ac963bc57e47e15 ]
      
      There are two problems in sec_free_hw_sgl():
      
      First, when sgl_current->next is valid, @hw_sgl will be freed in the
      first loop iteration, but it is freed again after the loop.
      
      Second, sgl_current and sgl_current->next_sgl do not match when
      dma_pool_free() is invoked: the third parameter should be the dma
      address of sgl_current, but sgl_current->next_sgl is the dma address
      of the next chain, so using sgl_current->next_sgl is wrong.
      
      Fix this by deleting the last dma_pool_free() in sec_free_hw_sgl(),
      modifying the condition for while loop, and matching the address for
      dma_pool_free().
      
      Fixes: 915e4e84 ("crypto: hisilicon - SEC security accelerator driver")
      Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d983182d
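The two problems named in the entry above (a node freed twice, and a node freed with the next node's DMA address) can be avoided by walking the chain with a matched pair of CPU pointer and device address. The userspace sketch below is only an analogy: `pool_free` is a hypothetical stand-in for dma_pool_free(), and it pretends that a node's device address equals its pointer value so that mismatches can be counted.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical chained descriptor: CPU link plus device address of the next. */
struct hw_sgl {
    struct hw_sgl *next;    /* CPU pointer to the next chunk, or NULL */
    uint64_t next_dma;      /* device address of the next chunk */
};

int pool_frees;             /* number of chunks released */
int pool_mismatches;        /* releases with a non-matching address */

/* Stand-in for dma_pool_free(): expects a matching cpu/dma pair. */
void pool_free(struct hw_sgl *cpu, uint64_t dma)
{
    if ((uint64_t)(uintptr_t)cpu != dma)
        pool_mismatches++;
    pool_frees++;
    free(cpu);
}

/* Free every chunk exactly once, each with its own device address. */
void free_hw_sgl_chain(struct hw_sgl *head, uint64_t head_dma)
{
    struct hw_sgl *cur = head;
    uint64_t cur_dma = head_dma;

    while (cur) {
        /* Read the links before the node is released. */
        struct hw_sgl *next = cur->next;
        uint64_t next_dma = cur->next_dma;

        pool_free(cur, cur_dma);
        cur = next;
        cur_dma = next_dma;
    }
}
```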
    • D
      hypfs: Fix error number left in struct pointer member · 22c788ba
      David Howells committed
      [ Upstream commit b54c64f7adeb241423cd46598f458b5486b0375e ]
      
      In hypfs_fill_super(), if hypfs_create_update_file() fails,
      sbi->update_file is left holding an error number.  This is passed to
      hypfs_kill_super() which doesn't check for this.
      
      Fix this by not setting sbi->update_file until after we've checked for
      error.
      
      Fixes: 24bbb1fa ("[PATCH] s390_hypfs filesystem")
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      cc: linux-s390@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      22c788ba
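The fix described above follows a common pattern: keep a possibly-failed result in a local variable and store it in the long-lived structure only after the error check. The sketch below imitates the kernel's error-pointer convention in userspace; `err_ptr`, `is_err`, `ptr_err`, `create_update_file` and `fill_super` are hypothetical re-creations for illustration, not the kernel helpers themselves.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace imitation of encoding small negative errnos in a pointer. */
#define MAX_ERRNO 4095
static inline void *err_ptr(long err) { return (void *)(intptr_t)err; }
static inline int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}
static inline long ptr_err(const void *p) { return (long)(intptr_t)p; }

/* Hypothetical superblock info; update_file must be NULL or a valid pointer. */
struct sb_info { void *update_file; };

/* Hypothetical creator that either succeeds or returns an error pointer. */
static void *create_update_file(int fail)
{
    static int dummy_file;
    return fail ? err_ptr(-ENOMEM) : (void *)&dummy_file;
}

int fill_super(struct sb_info *sbi, int fail)
{
    void *file = create_update_file(fail);

    if (is_err(file))
        return (int)ptr_err(file);  /* sbi->update_file is left untouched */
    sbi->update_file = file;        /* publish only a known-good pointer */
    return 0;
}
```

A teardown routine that only tests `update_file` against NULL then never sees an encoded error value.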
    • J
      pktcdvd: remove warning on attempting to register non-passthrough dev · bbd76d95
      Jens Axboe committed
      [ Upstream commit eb09b3cc464d2c3bbde9a6648603c8d599ea8582 ]
      
      Anatoly reports that he gets the below warning when booting -git on
      a sparc64 box on debian unstable:
      
      ...
      [   13.352975] aes_sparc64: Using sparc64 aes opcodes optimized AES
      implementation
      [   13.428002] ------------[ cut here ]------------
      [   13.428081] WARNING: CPU: 21 PID: 586 at
      drivers/block/pktcdvd.c:2597 pkt_setup_dev+0x2e4/0x5a0 [pktcdvd]
      [   13.428147] Attempt to register a non-SCSI queue
      [   13.428184] Modules linked in: pktcdvd libdes cdrom aes_sparc64
      n2_rng md5_sparc64 sha512_sparc64 rng_core sha256_sparc64 flash
      sha1_sparc64 ip_tables x_tables ipv6 crc_ccitt nf_defrag_ipv6 autofs4
      ext4 crc16 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy
      async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear
      md_mod crc32c_sparc64
      [   13.428452] CPU: 21 PID: 586 Comm: pktsetup Not tainted
      5.3.0-10169-g574cc4539762 #1234
      [   13.428507] Call Trace:
      [   13.428542]  [00000000004635c0] __warn+0xc0/0x100
      [   13.428582]  [0000000000463634] warn_slowpath_fmt+0x34/0x60
      [   13.428626]  [000000001045b244] pkt_setup_dev+0x2e4/0x5a0 [pktcdvd]
      [   13.428674]  [000000001045ccf4] pkt_ctl_ioctl+0x94/0x220 [pktcdvd]
      [   13.428724]  [00000000006b95c8] do_vfs_ioctl+0x628/0x6e0
      [   13.428764]  [00000000006b96c8] ksys_ioctl+0x48/0x80
      [   13.428803]  [00000000006b9714] sys_ioctl+0x14/0x40
      [   13.428847]  [0000000000406294] linux_sparc_syscall+0x34/0x44
      [   13.428890] irq event stamp: 4181
      [   13.428924] hardirqs last  enabled at (4189): [<00000000004e0a74>]
      console_unlock+0x634/0x6c0
      [   13.428984] hardirqs last disabled at (4196): [<00000000004e0540>]
      console_unlock+0x100/0x6c0
      [   13.429048] softirqs last  enabled at (3978): [<0000000000b2e2d8>]
      __do_softirq+0x498/0x520
      [   13.429110] softirqs last disabled at (3967): [<000000000042cfb4>]
      do_softirq_own_stack+0x34/0x60
      [   13.429172] ---[ end trace 2220ca468f32967d ]---
      [   13.430018] pktcdvd: setup of pktcdvd device failed
      [   13.455589] des_sparc64: Using sparc64 des opcodes optimized DES
      implementation
      [   13.515334] camellia_sparc64: Using sparc64 camellia opcodes
      optimized CAMELLIA implementation
      [   13.522856] pktcdvd: setup of pktcdvd device failed
      [   13.529327] pktcdvd: setup of pktcdvd device failed
      [   13.532932] pktcdvd: setup of pktcdvd device failed
      [   13.536165] pktcdvd: setup of pktcdvd device failed
      [   13.539372] pktcdvd: setup of pktcdvd device failed
      [   13.542834] pktcdvd: setup of pktcdvd device failed
      [   13.546536] pktcdvd: setup of pktcdvd device failed
      [   15.431071] XFS (dm-0): Mounting V5 Filesystem
      ...
      
      Apparently debian auto-attaches any cdrom like device to pktcdvd, which
      can lead to the above warning. There's really no reason to warn for this
      situation, kill it.

      Reported-by: Anatoly Pugachev <matorola@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      bbd76d95
    • O
      fat: work around race with userspace's read via blockdev while mounting · 0840daee
      OGAWA Hirofumi committed
      [ Upstream commit 07bfa4415ab607e459b69bd86aa7e7602ce10b4f ]
      
      If userspace reads the buffer via blockdev while mounting,
      sb_getblk()+modify can race with buffer read via blockdev.
      
      For example,
      
                  FS                               userspace
          bh = sb_getblk()
          modify bh->b_data
                                        read
      				    ll_rw_block(bh)
      				      fill bh->b_data by on-disk data
      				      /* lost modified data by FS */
      				      set_buffer_uptodate(bh)
          set_buffer_uptodate(bh)
      
      Userspace should not use the blockdev while mounting, but udev
      seems to be doing this already.  Although I think udev should try to
      avoid this, work around the race at the cost of a small overhead.
      
      Link: http://lkml.kernel.org/r/87pnk7l3sw.fsf_-_@mail.parknet.co.jp
      Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Reported-by: Jan Stancek <jstancek@redhat.com>
      Tested-by: Jan Stancek <jstancek@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      0840daee
    • M
      ARM: 8903/1: ensure that usable memory in bank 0 starts from a PMD-aligned address · 297904ea
      Mike Rapoport committed
      [ Upstream commit 00d2ec1e6bd82c0538e6dd3e4a4040de93ba4fef ]
      
      The calculation of memblock_limit in adjust_lowmem_bounds() assumes that
      bank 0 starts from a PMD-aligned address. However, the beginning of the
      first bank may be NOMAP memory and the start of usable memory
      will not be aligned to a PMD boundary. In such a case the memblock_limit will
      be set to the end of the NOMAP region, which will prevent any memblock
      allocations.
      
      Mark the region between the end of the NOMAP area and the next PMD-aligned
      address as NOMAP as well, so that the usable memory will start at a
      PMD-aligned address.

      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      297904ea
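The alignment step in the entry above amounts to rounding the end of the NOMAP area up to the next PMD boundary and treating the bytes in between as NOMAP too. A small sketch of that arithmetic follows, assuming a 2 MiB PMD size; `PMD_SIZE`, `pmd_align_up` and `nomap_padding` are illustrative names, and the real PMD size depends on the kernel configuration.

```c
#include <stdint.h>

/* Assumed PMD size for illustration; must be a power of two. */
#define PMD_SIZE (2ull << 20)

/* Round an address up to the next PMD boundary (no change if aligned). */
uint64_t pmd_align_up(uint64_t addr)
{
    return (addr + PMD_SIZE - 1) & ~(PMD_SIZE - 1);
}

/* Number of bytes after nomap_end that would also be marked NOMAP. */
uint64_t nomap_padding(uint64_t nomap_end)
{
    return pmd_align_up(nomap_end) - nomap_end;
}
```

For a NOMAP area ending at 0x80100000, usable memory would then start at 0x80200000, with 1 MiB of padding marked NOMAP.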