1. 27 12月, 2019 40 次提交
    • D
      xfs: finobt AG reserves don't consider last AG can be a runt · 2f00d4bc
      Dave Chinner 提交于
      mainline inclusion
      from mainline-4.20-rc4
      commit c0876897
      category: bugfix
      bugzilla: 18898
      CVE: NA
      
      ---------------------------
      
      The last AG may be very small comapred to all other AGs, and hence
      AG reservations based on the superblock AG size may actually consume
      more space than the AG actually has. This results on assert failures
      like:
      
      XFS: Assertion failed: xfs_perag_resv(pag, XFS_AG_RESV_METADATA)->ar_reserved + xfs_perag_resv(pag, XFS_AG_RESV_RMAPBT)->ar_reserved <= pag->pagf_freeblks + pag->pagf_flcount, file: fs/xfs/libxfs/xfs_ag_resv.c, line: 319
      [   48.932891]  xfs_ag_resv_init+0x1bd/0x1d0
      [   48.933853]  xfs_fs_reserve_ag_blocks+0x37/0xb0
      [   48.934939]  xfs_mountfs+0x5b3/0x920
      [   48.935804]  xfs_fs_fill_super+0x462/0x640
      [   48.936784]  ? xfs_test_remount_options+0x60/0x60
      [   48.937908]  mount_bdev+0x178/0x1b0
      [   48.938751]  mount_fs+0x36/0x170
      [   48.939533]  vfs_kern_mount.part.43+0x54/0x130
      [   48.940596]  do_mount+0x20e/0xcb0
      [   48.941396]  ? memdup_user+0x3e/0x70
      [   48.942249]  ksys_mount+0xba/0xd0
      [   48.943046]  __x64_sys_mount+0x21/0x30
      [   48.943953]  do_syscall_64+0x54/0x170
      [   48.944835]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Hence we need to ensure the finobt per-ag space reservations take
      into account the size of the last AG rather than treat it like all
      the other full size AGs.
      
      Note that both refcountbt and rmapbt already take the size of the AG
      into account via reading the AGF length directly.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Nyu kuai <yukuai3@huawei.com>
      Reviewed-by: Nzhengbin <zhengbin13@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      2f00d4bc
    • D
      xfs: fix use-after-free race in xfs_buf_rele · 635a7979
      Dave Chinner 提交于
      mainline inclusion
      from mainline-4.20-rc1
      commit 37fd1678
      category: bugfix
      bugzilla: 18873
      CVE: NA
      
      ---------------------------
      
      When looking at a 4.18 based KASAN use after free report, I noticed
      that racing xfs_buf_rele() may race on dropping the last reference
      to the buffer and taking the buffer lock. This was the symptom
      displayed by the KASAN report, but the actual issue that was
      reported had already been fixed in 4.19-rc1 by commit e339dd8d
      ("xfs: use sync buffer I/O for sync delwri queue submission").
      
      Despite this, I think there is still an issue with xfs_buf_rele()
      in this code:
      
              release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock);
              spin_lock(&bp->b_lock);
              if (!release) {
      .....
      
      If two threads race on the b_lock after both dropping a reference
      and one getting dropping the last reference so release = true, we
      end up with:
      
      CPU 0				CPU 1
      atomic_dec_and_lock()
      				atomic_dec_and_lock()
      				spin_lock(&bp->b_lock)
      spin_lock(&bp->b_lock)
      <spins>
      				<release = true bp->b_lru_ref = 0>
      				<remove from lists>
      				freebuf = true
      				spin_unlock(&bp->b_lock)
      				xfs_buf_free(bp)
      <gets lock, reading and writing freed memory>
      <accesses freed memory>
      spin_unlock(&bp->b_lock) <reads/writes freed memory>
      
      IOWs, we can't safely take bp->b_lock after dropping the hold
      reference because the buffer may go away at any time after we
      drop that reference. However, this can be fixed simply by taking the
      bp->b_lock before we drop the reference.
      
      It is safe to nest the pag_buf_lock inside bp->b_lock as the
      pag_buf_lock is only used to serialise against lookup in
      xfs_buf_find() and no other locks are held over or under the
      pag_buf_lock there. Make this clear by documenting the buffer lock
      orders at the top of the file.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      Signed-off-by: Nyu kuai <yukuai3@huawei.com>
      Reviewed-by: Nzhengbin <zhengbin13@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      635a7979
    • D
      xfs: xrep_findroot_block should reject root blocks with siblings · c010876e
      Darrick J. Wong 提交于
      mainline inclusion
      from mainline-4.20-rc1
      commit 1002ff45
      category: bugfix
      bugzilla: 18865
      CVE: NA
      
      ---------------------------
      
      In xrep_findroot_block, if we find a candidate root block with sibling
      pointers or sibling blocks on the same tree level, we should not return
      that block as a tree root because root blocks cannot have siblings.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      Signed-off-by: Nyu kuai <yukuai3@huawei.com>
      Reviewed-by: Nzhengbin <zhengbin13@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      c010876e
    • M
      dm thin metadata: check if in fail_io mode when setting needs_check · 01fd6933
      Mike Snitzer 提交于
      mainline inclusion
      from mainline-v5.3-rc1
      commit 54fa16ee
      category: bugfix
      bugzilla: 18564
      CVE: NA
      
      -------------------------------------------------
      
      Check if in fail_io mode at start of dm_pool_metadata_set_needs_check().
      Otherwise dm_pool_metadata_set_needs_check()'s superblock_lock() can
      crash in dm_bm_write_lock() while accessing the block manager object
      that was previously destroyed as part of a failed
      dm_pool_abort_metadata() that ultimately set fail_io to begin with.
      
      Also, update DMERR() message to more accurately describe
      superblock_lock() failure.
      
      Cc: stable@vger.kernel.org
      Reported-by: NZdenek Kabelac <zkabelac@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      
      Conflicts:
      	drivers/md/dm-thin-metadata.c
      Signed-off-by: NZhangXiaoxu <zhangxiaoxu5@huawei.com>
      Reviewed-by: NYi Zhang <yi.zhang@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      01fd6933
    • Z
      SCSI: fix queue cleanup race before scsi_requeue_run_queue is done · 78b390b1
      zhengbin 提交于
      hulk inclusion
      category: bugfix
      bugzilla: 20127
      CVE: NA
      
      ---------------------------
      
      KASAN reports a use-after-free in dd_has_work, need to make sure
      scsi_requeue_run_queue is done before blk_cleanup_queue.
      
      BUG: KASAN: use-after-free in dd_has_work+0x50/0xe8
      Read of size 8 at addr ffff808b57c6f168 by task kworker/53:1H/6910
      
      CPU: 53 PID: 6910 Comm: kworker/53:1H Kdump: loaded Tainted: G
      Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.59 01/31/2019
      Workqueue: kblockd scsi_requeue_run_queue
      Call trace:
       dump_backtrace+0x0/0x270
       show_stack+0x24/0x30
       dump_stack+0xb4/0xe4
       print_address_description+0x68/0x278
       kasan_report+0x204/0x330
       __asan_load8+0x88/0xb0
       dd_has_work+0x50/0xe8
       blk_mq_run_hw_queue+0x19c/0x218
       blk_mq_run_hw_queues+0x7c/0xb0
       scsi_run_queue+0x3ec/0x520
       scsi_requeue_run_queue+0x2c/0x38
       process_one_work+0x2e4/0x6d8
       worker_thread+0x6c/0x6a8
       kthread+0x1b4/0x1c0
       ret_from_fork+0x10/0x18
      
      Allocated by task 46843:
       kasan_kmalloc+0xe0/0x190
       kmem_cache_alloc_node_trace+0x10c/0x258
       dd_init_queue+0x68/0x190
       blk_mq_init_sched+0x1cc/0x300
       elevator_init_mq+0x90/0xe0
       blk_mq_init_allocated_queue+0x700/0x728
       blk_mq_init_queue+0x48/0x90
       scsi_mq_alloc_queue+0x34/0xb0
       scsi_alloc_sdev+0x340/0x530
       scsi_probe_and_add_lun+0x46c/0x1260
       __scsi_scan_target+0x1b8/0x7b0
       scsi_scan_target+0x140/0x150
       fc_scsi_scan_rport+0x164/0x178 [scsi_transport_fc]
       process_one_work+0x2e4/0x6d8
       worker_thread+0x6c/0x6a8
       kthread+0x1b4/0x1c0
       ret_from_fork+0x10/0x18
      
      Freed by task 46843:
       __kasan_slab_free+0x120/0x228
       kasan_slab_free+0x10/0x18
       kfree+0x88/0x218
       dd_exit_queue+0x5c/0x78
       blk_mq_exit_sched+0x104/0x130
       elevator_exit+0xa8/0xc8
       blk_exit_queue+0x48/0x78
       blk_cleanup_queue+0x170/0x248
       __scsi_remove_device+0x84/0x1b0
       scsi_probe_and_add_lun+0xd00/0x1260
       __scsi_scan_target+0x1b8/0x7b0
       scsi_scan_target+0x140/0x150
       fc_scsi_scan_rport+0x164/0x178 [scsi_transport_fc]
       process_one_work+0x2e4/0x6d8
       worker_thread+0x6c/0x6a8
       kthread+0x1b4/0x1c0
       ret_from_fork+0x10/0x18
      
      Fixes: 8dc765d4 ("SCSI: fix queue cleanup race before queue initialization is done")
      Signed-off-by: Nzhengbin <zhengbin13@huawei.com>
      Reviewed-by: Nyangerkun <yangerkun@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      78b390b1
    • J
      scsi: hisi_sas: no delay after issue phy reset if sas_smp_phy_control() fail · c69f67b4
      Jiaxing Luo 提交于
      driver inclusion
      category: bugfix
      bugzilla: NA
      CVE: NA
      
      At expander ENV, we delay after issue phy reset to wait for hardware to
      handle phy reset. But if sas_smp_phy_control() fail, delay is unnecessary
      because we will continue controller reset.
      
      So we do not delay if sas_smp_phy_control() return error.
      
      Feature or Bugfix: Bugfix
      Signed-off-by: NJiaxing Luo <luojiaxing@huawei.com>
      Signed-off-by: NJohn Garry <john.garry@huawei.com>
      Signed-off-by: Nluojiaxing <luojiaxing@huawei.com>
      Reviewed-by: Nchenxiang <chenxiang66@hisilicon.com>
      Reviewed-by: NYang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      c69f67b4
    • T
      crypto/hisilicon/qm: fix qm/zip sparse check · 19baf087
      tanshukun 提交于
      driver inclusion
      category: bugfix
      bugzilla: NA
      CVE: NA
      
      Feature or Bugfix:Bugfix
      Signed-off-by: Ntanshukun (A) <tanshukun1@huawei.com>
      Reviewed-by: Nwangzhou <wangzhou1@hisilicon.com>
      Reviewed-by: NYang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      19baf087
    • G
      Linux 4.19.66 · fbb0b740
      Greg Kroah-Hartman 提交于
      Merge 46 patches from 4.19.66 stable
      branch (46 total) beside 0 already merged patches.
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      fbb0b740
    • L
      spi: bcm2835: Fix 3-wire mode if DMA is enabled · 6518374b
      Lukas Wunner 提交于
      commit 8d8bef50 upstream.
      
      Commit 6935224d ("spi: bcm2835: enable support of 3-wire mode")
      added 3-wire support to the BCM2835 SPI driver by setting the REN bit
      (Read Enable) in the CS register when receiving data.  The REN bit puts
      the transmitter in high-impedance state.  The driver recognizes that
      data is to be received by checking whether the rx_buf of a transfer is
      non-NULL.
      
      Commit 3ecd37ed ("spi: bcm2835: enable dma modes for transfers
      meeting certain conditions") subsequently broke 3-wire support because
      it set the SPI_MASTER_MUST_RX flag which causes spi_map_msg() to replace
      rx_buf with a dummy buffer if it is NULL.  As a result, rx_buf is
      *always* non-NULL if DMA is enabled.
      
      Reinstate 3-wire support by not only checking whether rx_buf is non-NULL,
      but also checking that it is not the dummy buffer.
      
      Fixes: 3ecd37ed ("spi: bcm2835: enable dma modes for transfers meeting certain conditions")
      Reported-by: NNuno Sá <nuno.sa@analog.com>
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Cc: stable@vger.kernel.org # v4.2+
      Cc: Martin Sperl <kernel@martin.sperl.org>
      Acked-by: NStefan Wahren <wahrenst@gmx.net>
      Link: https://lore.kernel.org/r/328318841455e505370ef8ecad97b646c033dc8a.1562148527.git.lukas@wunner.deSigned-off-by: NMark Brown <broonie@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      6518374b
    • T
      cgroup: Fix css_task_iter_advance_css_set() cset skip condition · 8c2bbb54
      Tejun Heo 提交于
      commit c596687a upstream.
      
      While adding handling for dying task group leaders c03cd773
      ("cgroup: Include dying leaders with live threads in PROCS
      iterations") added an inverted cset skip condition to
      css_task_iter_advance_css_set().  It should skip cset if it's
      completely empty but was incorrectly testing for the inverse condition
      for the dying_tasks list.  Fix it.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Fixes: c03cd773 ("cgroup: Include dying leaders with live threads in PROCS iterations")
      Reported-by: syzbot+d4bba5ccd4f9a2a68681@syzkaller.appspotmail.com
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      8c2bbb54
    • T
      cgroup: css_task_iter_skip()'d iterators must be advanced before accessed · 58614d47
      Tejun Heo 提交于
      commit cee0c33c upstream.
      
      b636fd38 ("cgroup: Implement css_task_iter_skip()") introduced
      css_task_iter_skip() which is used to fix task iterations skipping
      dying threadgroup leaders with live threads.  Skipping is implemented
      as a subportion of full advancing but css_task_iter_next() forgot to
      fully advance a skipped iterator before determining the next task to
      visit causing it to return invalid task pointers.
      
      Fix it by making css_task_iter_next() fully advance the iterator if it
      has been skipped since the previous iteration.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: syzbot
      Link: http://lkml.kernel.org/r/00000000000097025d058a7fd785@google.com
      Fixes: b636fd38 ("cgroup: Implement css_task_iter_skip()")
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      58614d47
    • T
      cgroup: Include dying leaders with live threads in PROCS iterations · 2cd06ce5
      Tejun Heo 提交于
      commit c03cd773 upstream.
      
      CSS_TASK_ITER_PROCS currently iterates live group leaders; however,
      this means that a process with dying leader and live threads will be
      skipped.  IOW, cgroup.procs might be empty while cgroup.threads isn't,
      which is confusing to say the least.
      
      Fix it by making cset track dying tasks and include dying leaders with
      live threads in PROCS iteration.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-and-tested-by: NTopi Miettinen <toiwoton@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      2cd06ce5
    • T
      cgroup: Implement css_task_iter_skip() · 0b0c6e8a
      Tejun Heo 提交于
      commit b636fd38 upstream.
      
      When a task is moved out of a cset, task iterators pointing to the
      task are advanced using the normal css_task_iter_advance() call.  This
      is fine but we'll be tracking dying tasks on csets and thus moving
      tasks from cset->tasks to (to be added) cset->dying_tasks.  When we
      remove a task from cset->tasks, if we advance the iterators, they may
      move over to the next cset before we had the chance to add the task
      back on the dying list, which can allow the task to escape iteration.
      
      This patch separates out skipping from advancing.  Skipping only moves
      the affected iterators to the next pointer rather than fully advancing
      it and the following advancing will recognize that the cursor has
      already been moved forward and do the rest of advancing.  This ensures
      that when a task moves from one list to another in its cset, as long
      as it moves in the right direction, it's always visible to iteration.
      
      This doesn't cause any visible behavior changes.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      0b0c6e8a
    • T
      cgroup: Call cgroup_release() before __exit_signal() · 4e15eff1
      Tejun Heo 提交于
      commit 6b115bf5 upstream.
      
      cgroup_release() calls cgroup_subsys->release() which is used by the
      pids controller to uncharge its pid.  We want to use it to manage
      iteration of dying tasks which requires putting it before
      __unhash_process().  Move cgroup_release() above __exit_signal().
      While this makes it uncharge before the pid is freed, pid is RCU freed
      anyway and the window is very narrow.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      4e15eff1
    • A
      compat_ioctl: pppoe: fix PPPOEIOCSFWD handling · f5ff908e
      Arnd Bergmann 提交于
      [ Upstream commit 055d88242a6046a1ceac3167290f054c72571cd9 ]
      
      Support for handling the PPPOEIOCSFWD ioctl in compat mode was added in
      linux-2.5.69 along with hundreds of other commands, but was always broken
      sincen only the structure is compatible, but the command number is not,
      due to the size being sizeof(size_t), or at first sizeof(sizeof((struct
      sockaddr_pppox)), which is different on 64-bit architectures.
      
      Guillaume Nault adds:
      
        And the implementation was broken until 2016 (see 29e73269 ("pppoe:
        fix reference counting in PPPoE proxy")), and nobody ever noticed. I
        should probably have removed this ioctl entirely instead of fixing it.
        Clearly, it has never been used.
      
      Fix it by adding a compat_ioctl handler for all pppoe variants that
      translates the command number and then calls the regular ioctl function.
      
      All other ioctl commands handled by pppoe are compatible between 32-bit
      and 64-bit, and require compat_ptr() conversion.
      
      This should apply to all stable kernels.
      Acked-by: NGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      f5ff908e
    • H
      r8169: don't use MSI before RTL8168d · 7b7b2eef
      Heiner Kallweit 提交于
      [ Upstream commit 003bd5b4 ]
      
      It was reported that after resuming from suspend network fails with
      error "do_IRQ: 3.38 No irq handler for vector", see [0]. Enabling WoL
      can work around the issue, but the only actual fix is to disable MSI.
      So let's mimic the behavior of the vendor driver and disable MSI on
      all chip versions before RTL8168d.
      
      [0] https://bugzilla.kernel.org/show_bug.cgi?id=204079
      
      Fixes: 6c6aa15f ("r8169: improve interrupt handling")
      Reported-by: NDušan Dragić <dragic.dusan@gmail.com>
      Tested-by: NDušan Dragić <dragic.dusan@gmail.com>
      Signed-off-by: NHeiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      7b7b2eef
    • A
      net/mlx5e: Prevent encap flow counter update async to user query · 703f039e
      Ariel Levkovich 提交于
      [ Upstream commit 90bb7692 ]
      
      This patch prevents a race between user invoked cached counters
      query and a neighbor last usage updater.
      
      The cached flow counter stats can be queried by calling
      "mlx5_fc_query_cached" which provides the number of bytes and
      packets that passed via this flow since the last time this counter
      was queried.
      It does so by reducting the last saved stats from the current, cached
      stats and then updating the last saved stats with the cached stats.
      It also provide the lastuse value for that flow.
      
      Since "mlx5e_tc_update_neigh_used_value" needs to retrieve the
      last usage time of encapsulation flows, it calls the flow counter
      query method periodically and async to user queries of the flow counter
      using cls_flower.
      This call is causing the driver to update the last reported bytes and
      packets from the cache and therefore, future user queries of the flow
      stats will return lower than expected number for bytes and packets
      since the last saved stats in the driver was updated async to the last
      saved stats in cls_flower.
      
      This causes wrong stats presentation of encapsulation flows to user.
      
      Since the neighbor usage updater only needs the lastuse stats from the
      cached counter, the fix is to use a dedicated lastuse query call that
      returns the lastuse value without synching between the cached stats and
      the last saved stats.
      
      Fixes: f6dfb4c3 ("net/mlx5e: Update neighbour 'used' state using HW flow rules counters")
      Signed-off-by: NAriel Levkovich <lariel@mellanox.com>
      Reviewed-by: NRoi Dayan <roid@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      703f039e
    • E
      net/mlx5: Fix modify_cq_in alignment · 993da109
      Edward Srouji 提交于
      [ Upstream commit 7a32f296 ]
      
      Fix modify_cq_in alignment to match the device specification.
      After this fix the 'cq_umem_valid' field will be in the right offset.
      
      Cc: <stable@vger.kernel.org> # 4.19
      Fixes: bd371975 ("net/mlx5: Update mlx5_ifc with DEVX UID bits")
      Signed-off-by: NEdward Srouji <edwards@mellanox.com>
      Reviewed-by: NYishai Hadas <yishaih@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      993da109
    • A
      tun: mark small packets as owned by the tap sock · ea79eedb
      Alexis Bauvin 提交于
      [ Upstream commit 4b663366 ]
      
      - v1 -> v2: Move skb_set_owner_w to __tun_build_skb to reduce patch size
      
      Small packets going out of a tap device go through an optimized code
      path that uses build_skb() rather than sock_alloc_send_pskb(). The
      latter calls skb_set_owner_w(), but the small packet code path does not.
      
      The net effect is that small packets are not owned by the userland
      application's socket (e.g. QEMU), while large packets are.
      This can be seen with a TCP session, where packets are not owned when
      the window size is small enough (around PAGE_SIZE), while they are once
      the window grows (note that this requires the host to support virtio
      tso for the guest to offload segmentation).
      All this leads to inconsistent behaviour in the kernel, especially on
      netfilter modules that uses sk->socket (e.g. xt_owner).
      
      Fixes: 66ccbc9c ("tap: use build_skb() for small packet")
      Signed-off-by: NAlexis Bauvin <abauvin@scaleway.com>
      Acked-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      ea79eedb
    • T
      tipc: compat: allow tipc commands without arguments · 043fcb87
      Taras Kondratiuk 提交于
      [ Upstream commit 4da5f001 ]
      
      Commit 2753ca5d ("tipc: fix uninit-value in tipc_nl_compat_doit")
      broke older tipc tools that use compat interface (e.g. tipc-config from
      tipcutils package):
      
      % tipc-config -p
      operation not supported
      
      The commit started to reject TIPC netlink compat messages that do not
      have attributes. It is too restrictive because some of such messages are
      valid (they don't need any arguments):
      
      % grep 'tx none' include/uapi/linux/tipc_config.h
      #define  TIPC_CMD_NOOP              0x0000    /* tx none, rx none */
      #define  TIPC_CMD_GET_MEDIA_NAMES   0x0002    /* tx none, rx media_name(s) */
      #define  TIPC_CMD_GET_BEARER_NAMES  0x0003    /* tx none, rx bearer_name(s) */
      #define  TIPC_CMD_SHOW_PORTS        0x0006    /* tx none, rx ultra_string */
      #define  TIPC_CMD_GET_REMOTE_MNG    0x4003    /* tx none, rx unsigned */
      #define  TIPC_CMD_GET_MAX_PORTS     0x4004    /* tx none, rx unsigned */
      #define  TIPC_CMD_GET_NETID         0x400B    /* tx none, rx unsigned */
      #define  TIPC_CMD_NOT_NET_ADMIN     0xC001    /* tx none, rx none */
      
      This patch relaxes the original fix and rejects messages without
      arguments only if such arguments are expected by a command (reg_type is
      non zero).
      
      Fixes: 2753ca5d ("tipc: fix uninit-value in tipc_nl_compat_doit")
      Cc: stable@vger.kernel.org
      Signed-off-by: NTaras Kondratiuk <takondra@cisco.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      043fcb87
    • C
      ocelot: Cancel delayed work before wq destruction · 537ba758
      Claudiu Manoil 提交于
      [ Upstream commit c5d13969 ]
      
      Make sure the delayed work for stats update is not pending before
      wq destruction.
      This fixes the module unload path.
      The issue is there since day 1.
      
      Fixes: a556c76a ("net: mscc: Add initial Ocelot switch support")
      Signed-off-by: NClaudiu Manoil <claudiu.manoil@nxp.com>
      Reviewed-by: NAlexandre Belloni <alexandre.belloni@bootlin.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      537ba758
    • J
      NFC: nfcmrvl: fix gpio-handling regression · 39c26a4a
      Johan Hovold 提交于
      [ Upstream commit c3953a3c ]
      
      Fix two reset-gpio sanity checks which were never converted to use
      gpio_is_valid(), and make sure to use -EINVAL to indicate a missing
      reset line also for the UART-driver module parameter and for the USB
      driver.
      
      This specifically prevents the UART and USB drivers from incidentally
      trying to request and use gpio 0, and also avoids triggering a WARN() in
      gpio_to_desc() during probe when no valid reset line has been specified.
      
      Fixes: e33a3f84 ("NFC: nfcmrvl: allow gpio 0 for reset signalling")
      Reported-by: syzbot+cf35b76f35e068a1107f@syzkaller.appspotmail.com
      Tested-by: syzbot+cf35b76f35e068a1107f@syzkaller.appspotmail.com
      Signed-off-by: NJohan Hovold <johan@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      39c26a4a
    • U
      net/smc: do not schedule tx_work in SMC_CLOSED state · 4b4cbfdf
      Ursula Braun 提交于
      [ Upstream commit f9cedf1a ]
      
      The setsockopts options TCP_NODELAY and TCP_CORK may schedule the
      tx worker. Make sure the socket is not yet moved into SMC_CLOSED
      state (for instance by a shutdown SHUT_RDWR call).
      
      Reported-by: syzbot+92209502e7aab127c75f@syzkaller.appspotmail.com
      Reported-by: syzbot+b972214bb803a343f4fe@syzkaller.appspotmail.com
      Fixes: 01d2f7e2 ("net/smc: sockopts TCP_NODELAY and TCP_CORK")
      Signed-off-by: NUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: NKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      4b4cbfdf
    • D
      net: sched: use temporary variable for actions indexes · 3eabff3e
      Dmytro Linkin 提交于
      [ Upstream commit 7be8ef2c ]
      
      Currently init call of all actions (except ipt) init their 'parm'
      structure as a direct pointer to nla data in skb. This leads to race
      condition when some of the filter actions were initialized successfully
      (and were assigned with idr action index that was written directly
      into nla data), but then were deleted and retried (due to following
      action module missing or classifier-initiated retry), in which case
      action init code tries to insert action to idr with index that was
      assigned on previous iteration. During retry the index can be reused
      by another action that was inserted concurrently, which causes
      unintended action sharing between filters.
      To fix described race condition, save action idr index to temporary
      stack-allocated variable instead on nla data.
      
      Fixes: 0190c1d4 ("net: sched: atomically check-allocate action")
      Signed-off-by: NDmytro Linkin <dmitrolin@mellanox.com>
      Signed-off-by: NVlad Buslov <vladbu@mellanox.com>
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      3eabff3e
    • R
      net sched: update vlan action for batched events operations · c4f0e90d
      Roman Mashak 提交于
      [ Upstream commit b35475c5491a14c8ce7a5046ef7bcda8a860581a ]
      
      Add get_fill_size() routine used to calculate the action size
      when building a batch of events.
      
      Fixes: c7e2b968 ("sched: introduce vlan action")
      Signed-off-by: NRoman Mashak <mrv@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      c4f0e90d
    • J
      net: sched: Fix a possible null-pointer dereference in dequeue_func() · 3107a825
      Jia-Ju Bai 提交于
      [ Upstream commit 051c7b39be4a91f6b7d8c4548444e4b850f1f56c ]
      
      In dequeue_func(), there is an if statement on line 74 to check whether
      skb is NULL:
          if (skb)
      
      When skb is NULL, it is used on line 77:
          prefetch(&skb->end);
      
      Thus, a possible null-pointer dereference may occur.
      
      To fix this bug, skb->end is used when skb is not NULL.
      
      This bug is found by a static analysis tool STCheck written by us.
      
      Fixes: 76e3cc12 ("codel: Controlled Delay AQM")
      Signed-off-by: NJia-Ju Bai <baijiaju1990@gmail.com>
      Reviewed-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      3107a825
    • S
      net: qualcomm: rmnet: Fix incorrect UL checksum offload logic · c42b9294
      Subash Abhinov Kasiviswanathan 提交于
      [ Upstream commit a7cf3d24 ]
      
      The udp_ip4_ind bit is set only for IPv4 UDP non-fragmented packets
      so that the hardware can flip the checksum to 0xFFFF if the computed
      checksum is 0 per RFC768.
      
      However, this bit had to be set for IPv6 UDP non fragmented packets
      as well per hardware requirements. Otherwise, IPv6 UDP packets
      with computed checksum as 0 were transmitted by hardware and were
      dropped in the network.
      
      In addition to setting this bit for IPv6 UDP, the field is also
      appropriately renamed to udp_ind as part of this change.
      
      Fixes: 5eb5f860 ("net: qualcomm: rmnet: Add support for TX checksum offload")
      Cc: Sean Tranchetti <stranche@codeaurora.org>
      Signed-off-by: NSubash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      c42b9294
    • R
      net: phylink: Fix flow control for fixed-link · e89d63d6
      René van Dorst 提交于
      [ Upstream commit 8aace4f3eba2a3ceb431e18683ea0e1ecbade5cd ]
      
      In phylink_parse_fixedlink() the pl->link_config.advertising bits are AND
      with pl->supported, pl->supported is zeroed and only the speed/duplex
      modes and MII bits are set.
      So pl->link_config.advertising always loses the flow control/pause bits.
      
      By setting Pause and Asym_Pause bits in pl->supported, the flow control
      work again when devicetree "pause" is set in fixes-link node and the MAC
      advertise that is supports pause.
      
      Results with this patch.
      
      Legend:
      - DT = 'Pause' is set in the fixed-link in devicetree.
      - validate() = ‘Yes’ means phylink_set(mask, Pause) is set in the
        validate().
      - flow = results reported my link is Up line.
      
      +-----+------------+-------+
      | DT  | validate() | flow  |
      +-----+------------+-------+
      | Yes | Yes        | rx/tx |
      | No  | Yes        | off   |
      | Yes | No         | off   |
      +-----+------------+-------+
      
      Fixes: 9525ae83 ("phylink: add phylink infrastructure")
      Signed-off-by: NRené van Dorst <opensource@vdorst.com>
      Acked-by: NRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      e89d63d6
    • M
      net/mlx5: Use reversed order when unregister devices · 7c29236e
      Mark Zhang 提交于
      [ Upstream commit 08aa5e7da6bce1a1963f63cf32c2e7ad434ad578 ]
      
      When lag is active, which is controlled by the bonded mlx5e netdev, mlx5
      interface unregestering must happen in the reverse order where rdma is
      unregistered (unloaded) first, to guarantee all references to the lag
      context in hardware is removed, then remove mlx5e netdev interface which
      will cleanup the lag context from hardware.
      
      Without this fix during destroy of LAG interface, we observed following
      errors:
       * mlx5_cmd_check:752:(pid 12556): DESTROY_LAG(0x843) op_mod(0x0) failed,
         status bad parameter(0x3), syndrome (0xe4ac33)
       * mlx5_cmd_check:752:(pid 12556): DESTROY_LAG(0x843) op_mod(0x0) failed,
         status bad parameter(0x3), syndrome (0xa5aee8).
      
      Fixes: a31208b1 ("net/mlx5_core: New init and exit flow for mlx5_core")
      Reviewed-by: NParav Pandit <parav@mellanox.com>
      Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NMark Zhang <markz@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      7c29236e
    • Q
      net/mlx5e: always initialize frag->last_in_page · a26f14e9
      Qian Cai 提交于
      [ Upstream commit 60d60c8fbd8d1acf25b041ecd72ae4fa16e9405b ]
      
      The commit 069d1146 ("net/mlx5e: RX, Enhance legacy Receive Queue
      memory scheme") introduced an undefined behaviour below due to
      "frag->last_in_page" is only initialized in mlx5e_init_frags_partition()
      when,
      
      if (next_frag.offset + frag_info[f].frag_stride > PAGE_SIZE)
      
      or after bailed out the loop,
      
      for (i = 0; i < mlx5_wq_cyc_get_size(&rq->wqe.wq); i++)
      
      As the result, there could be some "frag" have uninitialized
      value of "last_in_page".
      
      Later, get_frag() obtains those "frag" and check "frag->last_in_page" in
      mlx5e_put_rx_frag() and triggers the error during boot. Fix it by always
      initializing "frag->last_in_page" to "false" in
      mlx5e_init_frags_partition().
      
      UBSAN: Undefined behaviour in
      drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:325:12
      load of value 170 is not a valid value for type 'bool' (aka '_Bool')
      Call trace:
       dump_backtrace+0x0/0x264
       show_stack+0x20/0x2c
       dump_stack+0xb0/0x104
       __ubsan_handle_load_invalid_value+0x104/0x128
       mlx5e_handle_rx_cqe+0x8e8/0x12cc [mlx5_core]
       mlx5e_poll_rx_cq+0xca8/0x1a94 [mlx5_core]
       mlx5e_napi_poll+0x17c/0xa30 [mlx5_core]
       net_rx_action+0x248/0x940
       __do_softirq+0x350/0x7b8
       irq_exit+0x200/0x26c
       __handle_domain_irq+0xc8/0x128
       gic_handle_irq+0x138/0x228
       el1_irq+0xb8/0x140
       arch_cpu_idle+0x1a4/0x348
       do_idle+0x114/0x1b0
       cpu_startup_entry+0x24/0x28
       rest_init+0x1ac/0x1dc
       arch_call_rest_init+0x10/0x18
       start_kernel+0x4d4/0x57c
      
      Fixes: 069d1146 ("net/mlx5e: RX, Enhance legacy Receive Queue memory scheme")
      Signed-off-by: NQian Cai <cai@lca.pw>
      Reviewed-by: NTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      a26f14e9
    • J
      net: fix ifindex collision during namespace removal · d60b9685
      Jiri Pirko 提交于
      [ Upstream commit 55b40dbf0e76b4bfb9d8b3a16a0208640a9a45df ]
      
      Commit aca51397 ("netns: Fix arbitrary net_device-s corruptions
      on net_ns stop.") introduced a possibility to hit a BUG in case device
      is returning back to init_net and two following conditions are met:
      1) dev->ifindex value is used in a name of another "dev%d"
         device in init_net.
      2) dev->name is used by another device in init_net.
      
      Under real life circumstances this is hard to get. Therefore this has
      been present happily for over 10 years. To reproduce:
      
      $ ip a
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 127.0.0.1/8 scope host lo
             valid_lft forever preferred_lft forever
          inet6 ::1/128 scope host
             valid_lft forever preferred_lft forever
      2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 86:89:3f:86:61:29 brd ff:ff:ff:ff:ff:ff
      3: enp0s2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
      $ ip netns add ns1
      $ ip -n ns1 link add dummy1ns1 type dummy
      $ ip -n ns1 link add dummy2ns1 type dummy
      $ ip link set enp0s2 netns ns1
      $ ip -n ns1 link set enp0s2 name dummy0
      [  100.858894] virtio_net virtio0 dummy0: renamed from enp0s2
      $ ip link add dev4 type dummy
      $ ip -n ns1 a
      1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
      2: dummy1ns1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 16:63:4c:38:3e:ff brd ff:ff:ff:ff:ff:ff
      3: dummy2ns1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether aa:9e:86:dd:6b:5d brd ff:ff:ff:ff:ff:ff
      4: dummy0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
      $ ip a
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 127.0.0.1/8 scope host lo
             valid_lft forever preferred_lft forever
          inet6 ::1/128 scope host
             valid_lft forever preferred_lft forever
      2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 86:89:3f:86:61:29 brd ff:ff:ff:ff:ff:ff
      4: dev4: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 5a:e1:4a:b6:ec:f8 brd ff:ff:ff:ff:ff:ff
      $ ip netns del ns1
      [  158.717795] default_device_exit: failed to move dummy0 to init_net: -17
      [  158.719316] ------------[ cut here ]------------
      [  158.720591] kernel BUG at net/core/dev.c:9824!
      [  158.722260] invalid opcode: 0000 [#1] SMP KASAN PTI
      [  158.723728] CPU: 0 PID: 56 Comm: kworker/u2:1 Not tainted 5.3.0-rc1+ #18
      [  158.725422] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
      [  158.727508] Workqueue: netns cleanup_net
      [  158.728915] RIP: 0010:default_device_exit.cold+0x1d/0x1f
      [  158.730683] Code: 84 e8 18 c9 3e fe 0f 0b e9 70 90 ff ff e8 36 e4 52 fe 89 d9 4c 89 e2 48 c7 c6 80 d6 25 84 48 c7 c7 20 c0 25 84 e8 f4 c8 3e
      [  158.736854] RSP: 0018:ffff8880347e7b90 EFLAGS: 00010282
      [  158.738752] RAX: 000000000000003b RBX: 00000000ffffffef RCX: 0000000000000000
      [  158.741369] RDX: 0000000000000000 RSI: ffffffff8128013d RDI: ffffed10068fcf64
      [  158.743418] RBP: ffff888033550170 R08: 000000000000003b R09: fffffbfff0b94b9c
      [  158.745626] R10: fffffbfff0b94b9b R11: ffffffff85ca5cdf R12: ffff888032f28000
      [  158.748405] R13: dffffc0000000000 R14: ffff8880335501b8 R15: 1ffff110068fcf72
      [  158.750638] FS:  0000000000000000(0000) GS:ffff888036000000(0000) knlGS:0000000000000000
      [  158.752944] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  158.755245] CR2: 00007fe8b45d21d0 CR3: 00000000340b4005 CR4: 0000000000360ef0
      [  158.757654] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  158.760012] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  158.762758] Call Trace:
      [  158.763882]  ? dev_change_net_namespace+0xbb0/0xbb0
      [  158.766148]  ? devlink_nl_cmd_set_doit+0x520/0x520
      [  158.768034]  ? dev_change_net_namespace+0xbb0/0xbb0
      [  158.769870]  ops_exit_list.isra.0+0xa8/0x150
      [  158.771544]  cleanup_net+0x446/0x8f0
      [  158.772945]  ? unregister_pernet_operations+0x4a0/0x4a0
      [  158.775294]  process_one_work+0xa1a/0x1740
      [  158.776896]  ? pwq_dec_nr_in_flight+0x310/0x310
      [  158.779143]  ? do_raw_spin_lock+0x11b/0x280
      [  158.780848]  worker_thread+0x9e/0x1060
      [  158.782500]  ? process_one_work+0x1740/0x1740
      [  158.784454]  kthread+0x31b/0x420
      [  158.786082]  ? __kthread_create_on_node+0x3f0/0x3f0
      [  158.788286]  ret_from_fork+0x3a/0x50
      [  158.789871] ---[ end trace defd6c657c71f936 ]---
      [  158.792273] RIP: 0010:default_device_exit.cold+0x1d/0x1f
      [  158.795478] Code: 84 e8 18 c9 3e fe 0f 0b e9 70 90 ff ff e8 36 e4 52 fe 89 d9 4c 89 e2 48 c7 c6 80 d6 25 84 48 c7 c7 20 c0 25 84 e8 f4 c8 3e
      [  158.804854] RSP: 0018:ffff8880347e7b90 EFLAGS: 00010282
      [  158.807865] RAX: 000000000000003b RBX: 00000000ffffffef RCX: 0000000000000000
      [  158.811794] RDX: 0000000000000000 RSI: ffffffff8128013d RDI: ffffed10068fcf64
      [  158.816652] RBP: ffff888033550170 R08: 000000000000003b R09: fffffbfff0b94b9c
      [  158.820930] R10: fffffbfff0b94b9b R11: ffffffff85ca5cdf R12: ffff888032f28000
      [  158.825113] R13: dffffc0000000000 R14: ffff8880335501b8 R15: 1ffff110068fcf72
      [  158.829899] FS:  0000000000000000(0000) GS:ffff888036000000(0000) knlGS:0000000000000000
      [  158.834923] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  158.838164] CR2: 00007fe8b45d21d0 CR3: 00000000340b4005 CR4: 0000000000360ef0
      [  158.841917] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  158.845149] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Fix this by checking if a device with the same name exists in init_net
      and fallback to original code - dev%d to allocate name - in case it does.
      
      This was found using syzkaller.
      
      Fixes: aca51397 ("netns: Fix arbitrary net_device-s corruptions on net_ns stop.")
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      d60b9685
    • N
      net: bridge: mcast: don't delete permanent entries when fast leave is enabled · 3699bdc7
      Nikolay Aleksandrov 提交于
      [ Upstream commit 5c725b6b ]
      
      When permanent entries were introduced by the commit below, they were
      exempt from timing out and thus igmp leave wouldn't affect them unless
      fast leave was enabled on the port which was added before permanent
      entries existed. It shouldn't matter if fast leave is enabled or not
      if the user added a permanent entry it shouldn't be deleted on igmp
      leave.
      
      Before:
      $ echo 1 > /sys/class/net/eth4/brport/multicast_fast_leave
      $ bridge mdb add dev br0 port eth4 grp 229.1.1.1 permanent
      $ bridge mdb show
      dev br0 port eth4 grp 229.1.1.1 permanent
      
      < join and leave 229.1.1.1 on eth4 >
      
      $ bridge mdb show
      $
      
      After:
      $ echo 1 > /sys/class/net/eth4/brport/multicast_fast_leave
      $ bridge mdb add dev br0 port eth4 grp 229.1.1.1 permanent
      $ bridge mdb show
      dev br0 port eth4 grp 229.1.1.1 permanent
      
      < join and leave 229.1.1.1 on eth4 >
      
      $ bridge mdb show
      dev br0 port eth4 grp 229.1.1.1 permanent
      
      Fixes: ccb1c31a ("bridge: add flags to distinguish permanent mdb entires")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      3699bdc7
    • N
      net: bridge: delete local fdb on device init failure · 6b400bcc
      Nikolay Aleksandrov 提交于
      [ Upstream commit d7bae09f ]
      
      On initialization failure we have to delete the local fdb which was
      inserted due to the default pvid creation. This problem has been present
      since the inception of default_pvid. Note that currently there are 2 cases:
      1) in br_dev_init() when br_multicast_init() fails
      2) if register_netdevice() fails after calling ndo_init()
      
      This patch takes care of both since br_vlan_flush() is called on both
      occasions. Also the new fdb delete would be a no-op on normal bridge
      device destruction since the local fdb would've been already flushed by
      br_dev_delete(). This is not an issue for ports since nbp_vlan_init() is
      called last when adding a port thus nothing can fail after it.
      
      Reported-by: syzbot+88533dc8b582309bf3ee@syzkaller.appspotmail.com
      Fixes: 5be5a2df ("bridge: Add filtering support for default_pvid")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      6b400bcc
    • M
      mvpp2: refactor MTU change code · 18092f89
      Matteo Croce 提交于
      [ Upstream commit 230bd958c2c846ee292aa38bc6b006296c24ca01 ]
      
      The MTU change code can call napi_disable() with the device already down,
      leading to a deadlock. Also, lot of code is duplicated unnecessarily.
      
      Rework mvpp2_change_mtu() to avoid the deadlock and remove duplicated code.
      
      Fixes: 3f518509 ("ethernet: Add new driver for Marvell Armada 375 network unit")
      Signed-off-by: NMatteo Croce <mcroce@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      18092f89
    • M
      mvpp2: fix panic on module removal · 685fb60f
      Matteo Croce 提交于
      [ Upstream commit 944a83a2669ae8aa2c7664e79376ca7468eb0a2b ]
      
      mvpp2 uses a delayed workqueue to gather traffic statistics.
      On module removal the workqueue can be destroyed before calling
      cancel_delayed_work_sync() on its works.
      Fix it by moving the destroy_workqueue() call after mvpp2_port_remove().
      Also remove an unneeded call to flush_workqueue()
      
          # rmmod mvpp2
          [ 2743.311722] mvpp2 f4000000.ethernet eth1: phy link down 10gbase-kr/10Gbps/Full
          [ 2743.320063] mvpp2 f4000000.ethernet eth1: Link is Down
          [ 2743.572263] mvpp2 f4000000.ethernet eth2: phy link down sgmii/1Gbps/Full
          [ 2743.580076] mvpp2 f4000000.ethernet eth2: Link is Down
          [ 2744.102169] mvpp2 f2000000.ethernet eth0: phy link down 10gbase-kr/10Gbps/Full
          [ 2744.110441] mvpp2 f2000000.ethernet eth0: Link is Down
          [ 2744.115614] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
          [ 2744.115615] Mem abort info:
          [ 2744.115616]   ESR = 0x96000005
          [ 2744.115617]   Exception class = DABT (current EL), IL = 32 bits
          [ 2744.115618]   SET = 0, FnV = 0
          [ 2744.115619]   EA = 0, S1PTW = 0
          [ 2744.115620] Data abort info:
          [ 2744.115621]   ISV = 0, ISS = 0x00000005
          [ 2744.115622]   CM = 0, WnR = 0
          [ 2744.115624] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000422681000
          [ 2744.115626] [0000000000000000] pgd=0000000000000000, pud=0000000000000000
          [ 2744.115630] Internal error: Oops: 96000005 [#1] SMP
          [ 2744.115632] Modules linked in: mvpp2(-) algif_hash af_alg nls_iso8859_1 nls_cp437 vfat fat xhci_plat_hcd m25p80 spi_nor xhci_hcd mtd usbcore i2c_mv64xxx sfp usb_common marvell10g phy_generic spi_orion mdio_i2c i2c_core mvmdio phylink sbsa_gwdt ip_tables x_tables autofs4 [last unloaded: mvpp2]
          [ 2744.115654] CPU: 3 PID: 8357 Comm: kworker/3:2 Not tainted 5.3.0-rc2 #1
          [ 2744.115655] Hardware name: Marvell 8040 MACCHIATOBin Double-shot (DT)
          [ 2744.115665] Workqueue: events_power_efficient phylink_resolve [phylink]
          [ 2744.115669] pstate: a0000085 (NzCv daIf -PAN -UAO)
          [ 2744.115675] pc : __queue_work+0x9c/0x4d8
          [ 2744.115677] lr : __queue_work+0x170/0x4d8
          [ 2744.115678] sp : ffffff801001bd50
          [ 2744.115680] x29: ffffff801001bd50 x28: ffffffc422597600
          [ 2744.115684] x27: ffffff80109ae6f0 x26: ffffff80108e4018
          [ 2744.115688] x25: 0000000000000003 x24: 0000000000000004
          [ 2744.115691] x23: ffffff80109ae6e0 x22: 0000000000000017
          [ 2744.115694] x21: ffffffc42c030000 x20: ffffffc42209e8f8
          [ 2744.115697] x19: 0000000000000000 x18: 0000000000000000
          [ 2744.115699] x17: 0000000000000000 x16: 0000000000000000
          [ 2744.115701] x15: 0000000000000010 x14: ffffffffffffffff
          [ 2744.115702] x13: ffffff8090e2b95f x12: ffffff8010e2b967
          [ 2744.115704] x11: ffffff8010906000 x10: 0000000000000040
          [ 2744.115706] x9 : ffffff80109223b8 x8 : ffffff80109223b0
          [ 2744.115707] x7 : ffffffc42bc00068 x6 : 0000000000000000
          [ 2744.115709] x5 : ffffffc42bc00000 x4 : 0000000000000000
          [ 2744.115710] x3 : 0000000000000000 x2 : 0000000000000000
          [ 2744.115712] x1 : 0000000000000008 x0 : ffffffc42c030000
          [ 2744.115714] Call trace:
          [ 2744.115716]  __queue_work+0x9c/0x4d8
          [ 2744.115718]  delayed_work_timer_fn+0x28/0x38
          [ 2744.115722]  call_timer_fn+0x3c/0x180
          [ 2744.115723]  expire_timers+0x60/0x168
          [ 2744.115724]  run_timer_softirq+0xbc/0x1e8
          [ 2744.115727]  __do_softirq+0x128/0x320
          [ 2744.115731]  irq_exit+0xa4/0xc0
          [ 2744.115734]  __handle_domain_irq+0x70/0xc0
          [ 2744.115735]  gic_handle_irq+0x58/0xa8
          [ 2744.115737]  el1_irq+0xb8/0x140
          [ 2744.115738]  console_unlock+0x3a0/0x568
          [ 2744.115740]  vprintk_emit+0x200/0x2a0
          [ 2744.115744]  dev_vprintk_emit+0x1c8/0x1e4
          [ 2744.115747]  dev_printk_emit+0x6c/0x7c
          [ 2744.115751]  __netdev_printk+0x104/0x1d8
          [ 2744.115752]  netdev_printk+0x60/0x70
          [ 2744.115756]  phylink_resolve+0x38c/0x3c8 [phylink]
          [ 2744.115758]  process_one_work+0x1f8/0x448
          [ 2744.115760]  worker_thread+0x54/0x500
          [ 2744.115762]  kthread+0x12c/0x130
          [ 2744.115764]  ret_from_fork+0x10/0x1c
          [ 2744.115768] Code: aa1403e0 97fffbbe aa0003f5 b4000700 (f9400261)
      
      Fixes: 118d6298 ("net: mvpp2: add ethtool GOP statistics")
      Signed-off-by: NLorenzo Bianconi <lorenzo@kernel.org>
      Signed-off-by: NMatteo Croce <mcroce@redhat.com>
      Acked-by: NAntoine Tenart <antoine.tenart@bootlin.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      685fb60f
    • J
      mlxsw: spectrum: Fix error path in mlxsw_sp_module_init() · f53d4926
      Jiri Pirko 提交于
      [ Upstream commit 28fe79000e9b0a6f99959869947f1ca305f14599 ]
      
      In case of sp2 pci driver registration fail, fix the error path to
      start with sp1 pci driver unregister.
      
      Fixes: c3ab4354 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      f53d4926
    • H
      ipip: validate header length in ipip_tunnel_xmit · bdab0044
      Haishuang Yan 提交于
      [ Upstream commit 47d858d0 ]
      
      We need the same checks introduced by commit cb9f1b78
      ("ip: validate header length on virtual device xmit") for
      ipip tunnel.
      
      Fixes: cb9f1b78 ("ip: validate header length on virtual device xmit")
      Signed-off-by: NHaishuang Yan <yanhaishuang@cmss.chinamobile.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      bdab0044
    • H
      ip6_tunnel: fix possible use-after-free on xmit · 9bc847ac
      Haishuang Yan 提交于
      [ Upstream commit 01f5bffa ]
      
      ip4ip6/ip6ip6 tunnels run iptunnel_handle_offloads on xmit which
      can cause a possible use-after-free accessing iph/ipv6h pointer
      since the packet will be 'uncloned' running pskb_expand_head if
      it is a cloned gso skb.
      
      Fixes: 0e9a7095 ("ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets")
      Signed-off-by: NHaishuang Yan <yanhaishuang@cmss.chinamobile.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      9bc847ac
    • H
      ip6_gre: reload ipv6h in prepare_ip6gre_xmit_ipv6 · fbc862fb
      Haishuang Yan 提交于
      [ Upstream commit 3bc817d6 ]
      
      Since ip6_tnl_parse_tlv_enc_lim() can call pskb_may_pull()
      which may change skb->data, so we need to re-load ipv6h at
      the right place.
      
      Fixes: 898b2979 ("ip6_gre: Refactor ip6gre xmit codes")
      Cc: William Tu <u9012063@gmail.com>
      Signed-off-by: NHaishuang Yan <yanhaishuang@cmss.chinamobile.com>
      Acked-by: NWilliam Tu <u9012063@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      fbc862fb
    • C
      ife: error out when nla attributes are empty · 2cb1e09d
      Cong Wang 提交于
      [ Upstream commit c8ec4632 ]
      
      act_ife at least requires TCA_IFE_PARMS, so we have to bail out
      when there is no attribute passed in.
      
      Reported-by: syzbot+fbb5b288c9cb6a2eeac4@syzkaller.appspotmail.com
      Fixes: ef6980b6 ("introduce IFE action")
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      2cb1e09d