1. 17 Mar, 2020: 1 commit
    • pagecache: support percpu refcount to imporve performance · 8b9ea901
      Committed by Yunfeng Ye
      euleros inclusion
      category: feature
      feature: pagecache percpu refcount
      bugzilla: 31398
      CVE: NA
      
      -------------------------------------------------
      
      The page cache manages the physical pages that back files, and each
      page's life cycle is tracked with an atomic reference count. As the
      number of CPU cores grows, the cost of atomic counting becomes very
      high when the page cache is read with high concurrency.
      
      For example, when running an nginx HTTP workload, the biggest hotspot
      is the atomic operation in find_get_entry():
      
       11.94% [kernel] [k] find_get_entry
        7.45% [kernel] [k] do_tcp_sendpages
        6.12% [kernel] [k] generic_file_buffered_read
      
      So we use the percpu refcount mechanism to fix this problem. Test
      results show that nginx HTTP read performance improves by more than
      100%:

        worker   original(requests/sec)   percpu(requests/sec)   improvement
        64       759656.87                1627088.95             114.2%
      
      Note: we use page->lru to store the percpu information, so pages with
      the percpu attribute are not reclaimed by memory reclaim. Avoid
      letting such files grow without bound.
      Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
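The idea behind replacing one atomic counter with per-CPU counters can be sketched in userspace as follows. This is a single-threaded model under stated assumptions, not the code from the patch: `pcpu_ref_model`, `ref_get()`, `ref_put()`, `ref_switch_to_atomic()` and `NR_CPUS_MODEL` are hypothetical names, and the real kernel mechanism also has to synchronize the switch between modes, which this sketch leaves out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 4 /* stand-in for the number of CPUs (assumption) */

/* Hypothetical userspace model; not the kernel's per-CPU refcount type. */
struct pcpu_ref_model {
	long slot[NR_CPUS_MODEL]; /* per-CPU deltas, each written only by its own CPU */
	atomic_long shared;       /* single counter used once per-CPU mode is left */
	bool percpu_mode;
};

static void ref_init(struct pcpu_ref_model *r)
{
	for (int i = 0; i < NR_CPUS_MODEL; i++)
		r->slot[i] = 0;
	atomic_init(&r->shared, 1); /* the owner's initial reference */
	r->percpu_mode = true;
}

/* Fast path: touch only the local slot, so CPUs do not contend on one cache line. */
static void ref_get(struct pcpu_ref_model *r, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	if (r->percpu_mode)
		r->slot[cpu]++;
	else
		atomic_fetch_add(&r->shared, 1);
}

/* A put may run on a different CPU than the get, so a slot can go negative. */
static void ref_put(struct pcpu_ref_model *r, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	if (r->percpu_mode)
		r->slot[cpu]--;
	else
		atomic_fetch_sub(&r->shared, 1);
}

/* Slow path: fold every slot into the shared counter and return the total. */
static long ref_switch_to_atomic(struct pcpu_ref_model *r)
{
	long sum = 0;

	for (int i = 0; i < NR_CPUS_MODEL; i++) {
		sum += r->slot[i];
		r->slot[i] = 0;
	}
	r->percpu_mode = false;
	return atomic_fetch_add(&r->shared, sum) + sum;
}
```

Only the sum of all slots is meaningful, which is why the exact count is available only after folding. That trade-off (cheap get/put, expensive exact read) is what makes the approach attractive for a read-mostly hot path such as the one in the profile above.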
  2. 16 Mar, 2020: 2 commits
  3. 12 Mar, 2020: 1 commit
  4. 05 Mar, 2020: 36 commits
    • iscsi: add member for NUMA aware order workqueue · c8a2308c
      Committed by Yang Yingliang
      euleros inclusion
      category: feature
      feature: Implement NUMA affinity for order workqueue
      
      -------------------------------------------------
      
      Add member to struct iscsi_conn.
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Reviewed-By: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • Revert "debugfs: fix kabi for function debugfs_remove_recursive" · acd24e6d
      Committed by Yang Yingliang
      hulk inclusion
      category: bugfix
      bugzilla: 30939
      CVE: NA
      
      ---------------------------
      
      The kABI may be broken before the official release.

      This reverts commit ce620c1a6783b2341a376ef948484b5314ed064e.
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Reviewed-By: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • Revert "bdi: fix kabi for struct backing_dev_info" · 6cd5af2c
      Committed by Yang Yingliang
      hulk inclusion
      category: bugfix
      bugzilla: 30939
      CVE: NA
      ---------------------------
      
      The kABI may be broken before the official release.

      This reverts commit f8589079659b51222d86a1cb8fd9129752b0d97c.
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Reviewed-By: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • Revert "membarrier/kabi: fix kabi for membarrier_state" · 6597894c
      Committed by Yang Yingliang
      hulk inclusion
      category: bugfix
      bugzilla: 30939
      CVE: NA
      
      -------------------------------------------------
      
      The kABI may be broken before the official release.

      This reverts commit f316812150a4fbb52720fe7fb7702c5a52c37602.
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Reviewed-By: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • Revert "PCI: fix kabi change in struct pci_bus" · 6432d8c9
      Committed by Yang Yingliang
      hulk inclusion
      category: bugfix
      bugzilla: 30939
      CVE: NA
      
      ---------------------------
      
      The kABI may be broken before the official release.

      This reverts commit 8664b79edac95322379eee025763ba0840d458d1.
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Reviewed-By: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • bdi: get device name under rcu protect · de1b854e
      Committed by Yufen Yu
      hulk inclusion
      category: bugfix
      bugzilla: 30109
      CVE: NA
      ---------------------------
      
      bdi->dev may be set to NULL or freed by bdi_unregister().
      To avoid a NULL pointer dereference or use-after-free in callers,
      we add a common function bdi_get_dev_name(), in which dev is
      protected by the RCU read lock, so the caller can get the device
      name safely.

      Fixes: 5ca4579ae59b ("bdi: fix use-after-free for the bdi device")
      Signed-off-by: Yufen Yu <yuyufen@huawei.com>
      Reviewed-by: Hou Tao <houao1@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • iommu/iova: avoid softlockup in fq_flush_timeout · 90ba118b
      Committed by Li Bin
      hulk inclusion
      category: bugfix
      bugzilla: 30859
      CVE: NA
      
      ---------------------------
      
      A soft lockup occurs under an fio stress test with the SMMU enabled:
      watchdog: BUG: soft lockup - CPU#81 stuck for 22s!  [swapper/81:0]
      ...
      Call trace:
       fq_flush_timeout+0xc0/0x110
       call_timer_fn+0x34/0x178
       expire_timers+0xec/0x158
       run_timer_softirq+0xc0/0x1f8
       __do_softirq+0x120/0x324
       irq_exit+0x11c/0x140
       __handle_domain_irq+0x6c/0xc0
       gic_handle_irq+0x6c/0x170
       el1_irq+0xb8/0x140
       arch_cpu_idle+0x38/0x1c0
       default_idle_call+0x24/0x44
       do_idle+0x1f4/0x2d8
       cpu_startup_entry+0x2c/0x30
       secondary_start_kernel+0x17c/0x1c8
      
      This is because the timer callback fq_flush_timeout() may run for more
      than 10ms, and timers may be processed back to back in softirq context,
      which triggers the soft lockup. Use a work item to handle the per-CPU
      fq_ring_free() calls, which may take a long time, so that the soft
      lockup is no longer triggered.
      Signed-off-by: Li Bin <huawei.libin@huawei.com>
      Reviewed-By: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
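The split described in the iommu/iova message (a cheap timer callback that only hands the slow freeing to a worker) can be modelled in userspace roughly like this. It is a sketch under assumptions: `fq_model`, `flush_timeout_model()` and `run_pending_work_model()` are hypothetical names, the "worker" is just a function called later, and the message does not say which kernel workqueue calls the patch actually uses.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 4 /* stand-in for the number of CPUs (assumption) */

/* Hypothetical per-CPU flush queue model. */
struct fq_model {
	int entries; /* ranges waiting to be freed on this CPU */
	bool queued; /* a work item for this CPU is already pending */
};

/* The expensive part; plays the role of fq_ring_free() in the message. */
static int fq_ring_free_model(struct fq_model *fq)
{
	int freed = fq->entries;

	assert(freed >= 0);
	fq->entries = 0;
	return freed;
}

/* Timer callback: only marks work as pending, so its cost per CPU is constant. */
static int flush_timeout_model(struct fq_model fq[NR_CPUS_MODEL])
{
	int scheduled = 0;

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		if (fq[cpu].entries > 0 && !fq[cpu].queued) {
			fq[cpu].queued = true;
			scheduled++;
		}
	}
	return scheduled;
}

/* Worker: runs later, outside the timer path, and does the slow freeing. */
static int run_pending_work_model(struct fq_model fq[NR_CPUS_MODEL])
{
	int freed = 0;

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		if (fq[cpu].queued) {
			freed += fq_ring_free_model(&fq[cpu]);
			fq[cpu].queued = false;
		}
	}
	return freed;
}
```

The point of the design is that the time spent in softirq context no longer depends on how many entries are queued; the unbounded loop moves to a context that the scheduler can preempt.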
    • uacce: Remove uacce mode 1 relatives · f7acb516
      Committed by Yu'an Wang
      driver inclusion
      category: feature
      bugzilla: NA
      CVE: NA
      
      To keep things simple, this patch removes the uacce mode 1 related
      logic from uacce.c, because the open mainline version does not use
      this mode: mode 0 is used for the kernel and mode 2 is used for
      user space.
      The corresponding header file uacce.h is updated as well, and
      UACCE_QFRT_DKO is deleted from dummy_wd_dev.c and dummy_wd_v2.c.
      Signed-off-by: Yu'an Wang <wangyuan46@huawei.com>
      Reviewed-by: Hui Tang <tanghui20@huawei.com>
      Reviewed-by: Cheng Hu <hucheng.hu@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • debugfs: fix kabi for function debugfs_remove_recursive · 27d247b2
      Committed by yu kuai
      hulk inclusion
      category: bugfix
      bugzilla: 24454
      CVE: NA
      
      ---------------------------
      
      debugfs_remove_recursive was changed from a function to an alias to
      debugfs_remove in patch "simple_recursive_removal(): kernel-side rm -rf
      for ramfs-style filesystems". Change it back to a function.
      Signed-off-by: yu kuai <yukuai3@huawei.com>
      Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • simple_recursive_removal(): kernel-side rm -rf for ramfs-style filesystems · 3d1b056c
      Committed by yu kuai
      mainline inclusion
      from mainline-5.6-rc1
      commit a3d1e7eb5abe3aa1095bc75d1a6760d3809bd672
      category: bugfix
      bugzilla: 24454
      CVE: NA
      
      ---------------------------
      
      two requirements: no file creations in IS_DEADDIR and no cross-directory
      renames whatsoever.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      
      Conflicts:
       fs/debugfs/inode.c
       fs/libfs.c
       fs/tracefs/inode.c
       include/linux/debugfs.h
       include/linux/fs.h
       include/linux/tracefs.h
       kernel/trace/trace.c
      functional changes:
       replace current_time() with current_fs_time()
       remove call to fsnotify_rmdir() and fsnotify_unlink()
      Signed-off-by: yu kuai <yukuai3@huawei.com>
      Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • bdi: fix kabi for struct backing_dev_info · b45215d4
      Committed by Yufen Yu
      hulk inclusion
      category: bugfix
      bugzilla: 30109
      CVE: NA
      ---------------------------
      Signed-off-by: Yufen Yu <yuyufen@huawei.com>
      Reviewed-by: Jason Yan <yanaijie@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • bdi: fix use-after-free for the bdi device · 725ee753
      Committed by Yufen Yu
      hulk inclusion
      category: bugfix
      bugzilla: 30109
      CVE: NA
      ---------------------------
      
      We reported kernel crash:
      
      [201962.639350] Call trace:
      [201962.644403]  string+0x28/0xa0
      [201962.650501]  vsnprintf+0x5f0/0x748
      [201962.657472]  seq_vprintf+0x70/0x98
      [201962.664442]  seq_printf+0x7c/0xa0
      [201962.671238]  __blkg_prfill_rwstat+0x84/0x128
      [201962.679949]  blkg_prfill_rwstat_field+0x94/0xc0
      [201962.689182]  blkcg_print_blkgs+0xcc/0x140
      [201962.697370]  blkg_print_stat_bytes+0x4c/0x60
      [201962.706083]  cgroup_seqfile_show+0x58/0xc0
      [201962.714446]  kernfs_seq_show+0x44/0x50
      [201962.722112]  seq_read+0xd4/0x4a8
      [201962.728732]  kernfs_fop_read+0x16c/0x218
      [201962.736748]  __vfs_read+0x60/0x188
      [201962.743717]  vfs_read+0x94/0x150
      [201962.750338]  ksys_read+0x6c/0xd8
      [201962.756958]  __arm64_sys_read+0x24/0x30
      [201962.764800]  el0_svc_common+0x78/0x130
      [201962.772466]  el0_svc_handler+0x38/0x78
      [201962.780131]  el0_svc+0x8/0xc
      
      __blkg_prfill_rwstat() tried to get the device name through
      'bdi->dev' after 'dev' had already been freed by bdi_release().
      The race is as follows:
      
      blkg_print_stat_bytes         __scsi_remove_device
                                    del_gendisk
                                      bdi_unregister
      
                                      put_device(bdi->dev)
                                        kfree(bdi->dev)
      
      __blkg_prfill_rwstat
        blkg_dev_name
          //use the freed bdi->dev
          dev_name(blkg->q->backing_dev_info->dev)
      
                                      bdi->dev = NULL
      
      Since blkg_dev_name() is already covered by rcu_read_lock()/rcu_read_unlock(),
      we wait for all RCU readers to finish before freeing 'bdi->dev' to avoid
      the use-after-free.

      Link: https://lore.kernel.org/linux-block/20200211140038.146629-1-yuyufen@huawei.com/
      Signed-off-by: Yufen Yu <yuyufen@huawei.com>
      Reviewed-by: Jason Yan <yanaijie@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
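The ordering that the bdi use-after-free fix relies on (unpublish the pointer first, free it only after every reader that might still hold it has left) can be illustrated with a small single-threaded model. This is not how kernel RCU is implemented: RCU keeps no reader count like the one below, and `bdi_model`, `reader_lock()`, `reader_unlock()`, `reader_get_dev()` and `bdi_unregister_model()` are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stddef.h>

struct dev_model {
	const char *name;
	int freed; /* set instead of calling free(), so the test can observe it */
};

struct bdi_model {
	struct dev_model *dev;          /* published pointer; NULL after unregister */
	struct dev_model *pending_free; /* unpublished, waiting for readers to leave */
	int readers;                    /* stand-in for "readers inside a read-side section" */
};

static void reader_lock(struct bdi_model *b)
{
	b->readers++;
}

/* Readers fetch the pointer once and must tolerate NULL. */
static struct dev_model *reader_get_dev(struct bdi_model *b)
{
	assert(b->readers > 0); /* only valid inside a read-side section */
	return b->dev;
}

static void reader_unlock(struct bdi_model *b)
{
	assert(b->readers > 0);
	if (--b->readers == 0 && b->pending_free) {
		b->pending_free->freed = 1; /* the deferred release happens here */
		b->pending_free = NULL;
	}
}

static void bdi_unregister_model(struct bdi_model *b)
{
	struct dev_model *old = b->dev;

	b->dev = NULL; /* unpublish first so new readers see NULL */
	if (!old)
		return;
	if (b->readers == 0)
		old->freed = 1;        /* nobody can still hold the pointer */
	else
		b->pending_free = old; /* wait for the current readers */
}
```

In the race diagram above, the reader corresponds to `blkg_dev_name()` and the unregister path to `bdi_unregister()`; the model shows why a reader that fetched the pointer before it was cleared can still use it safely until it leaves its read-side section.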
    • jbd2: make jbd2_handle_buffer_credits() handle reserved handles · c7f65ce1
      Committed by Jan Kara
      mainline inclusion
      from mainline-5.5-rc1
      commit 3c845acd0237caef617f330a0e3b37ad8ae9fea5
      category: bugfix
      bugzilla: 25031
      CVE: NA
      ---------------------------
      
      The helper jbd2_handle_buffer_credits() doesn't correctly handle reserved
      handles, which can lead to crashes. Fix the lookup of the journal pointer
      so that it works for reserved handles as well.
      
      Fixes: a9a8344ee171 ("ext4, jbd2: Provide accessor function for handle credits")
      Reported-by: Eric Biggers <ebiggers@kernel.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20191115102210.29445-1-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
      Reviewed-by: Yang Erkun <yangerkun@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • jbd2: Provide trace event for handle restarts · 92a8f9f1
      Committed by Jan Kara
      mainline inclusion
      from mainline-5.5-rc1
      commit 0094f981bbaca3ae707c95c5e5977429d29c2dd0
      category: bugfix
      bugzilla: 25031
      CVE: NA
      ---------------------------
      
      Provide trace event for handle restarts to ease debugging.
      Reviewed-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20191105164437.32602-24-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
      Reviewed-by: Yang Erkun <yangerkun@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • ext4: Reserve revoke credits for freed blocks · a9e4f54d
      Committed by Jan Kara
      mainline inclusion
      from mainline-5.5-rc1
      commit 83448bdfb59731c2f54784ed3f4a93ff95be6e7e
      category: bugfix
      bugzilla: 25031
      CVE: NA
      ---------------------------
      
      So far we have reserved only relatively high fixed amount of revoke
      credits for each transaction. We over-reserved by large amount for most
      cases but when freeing large directories or files with data journalling,
      the fixed amount is not enough. In fact the worst case estimate is
      inconveniently large (maximum extent size) for freeing of one extent.
      
      We fix this by doing proper estimate of the amount of blocks that need
      to be revoked when removing blocks from the inode due to truncate or
      hole punching and otherwise reserve just a small amount of revoke
      credits for each transaction to accommodate freeing of xattrs block or
      so.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20191105164437.32602-23-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
      Reviewed-by: Yang Erkun <yangerkun@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • jbd2: Rename h_buffer_credits to h_total_credits · c38a5168
      Committed by Jan Kara
      mainline inclusion
      from mainline-5.5-rc1
      commit 933f1c1e0b75bbc29730eef07c9e196c6dfd37e5
      category: bugfix
      bugzilla: 25031
      CVE: NA
      ---------------------------
      
      The credit counter now contains both buffer and revoke descriptor block
      credits. Rename the counter to h_total_credits to reflect that. No
      functional change.
      Reviewed-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20191105164437.32602-21-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      
      Conflict:
        fs/jbd2/transaction.c
      Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
      Reviewed-by: Yang Erkun <yangerkun@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • jbd2: Reserve space for revoke descriptor blocks · 38d8c053
      Committed by Jan Kara
      mainline inclusion
      from mainline-5.5-rc1
      commit fdc3ef882a5d59c1709a13b5486ae2b1632e12b6
      category: bugfix
      bugzilla: 25031
      CVE: NA
      ---------------------------
      
      Extend functions for starting, extending, and restarting transaction
      handles to take number of revoke records handle must be able to
      accommodate. These functions then make sure transaction has enough
      credits to be able to store resulting revoke descriptor blocks. Also
      revoke code tracks number of revoke records created by a handle to catch
      situation where some place didn't reserve enough space for revoke
      records. Similarly to standard transaction credits, space for unused
      reserved revoke records is released when the handle is stopped.
      
      On the ext4 side we currently take a simplistic approach of reserving
      space for 1024 revoke records for any transaction. This grows amount of
      credits reserved for each handle only by a few and is enough for any
      normal workload so that we don't hit warnings in jbd2. We will refine
      the logic in following commits.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20191105164437.32602-20-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      
      Conflict:
        include/linux/jbd2.h
      Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
      Reviewed-by: Yang Erkun <yangerkun@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
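The claim that reserving room for 1024 revoke records costs "only a few" extra credits follows from simple arithmetic: one descriptor block holds many records, so the reservation is a rounded-up division. The sketch below works this through. The helper names are hypothetical, and the sizes used in the usage note (4096-byte blocks, a 16-byte header, a 4-byte checksum tail, 4- or 8-byte records) are illustrative assumptions, not values taken from the commit message.

```c
#include <assert.h>

/* How many revoke records fit into one descriptor block (all sizes in bytes). */
static unsigned int revoke_records_per_block(unsigned int blocksize,
					     unsigned int header_size,
					     unsigned int tail_size,
					     unsigned int record_size)
{
	assert(record_size > 0);
	assert(blocksize > header_size + tail_size);
	return (blocksize - header_size - tail_size) / record_size;
}

/* Descriptor blocks (credits) needed for a number of revoke records, rounded up. */
static unsigned int revoke_descriptor_blocks(unsigned int records,
					     unsigned int per_block)
{
	assert(per_block > 0);
	return (records + per_block - 1) / per_block;
}
```

With the assumed sizes, 8-byte records give 509 records per block, so 1024 records need 3 descriptor blocks; with 4-byte records the figures are 1019 per block and 2 blocks. Either way the extra reservation per handle is a handful of credits, which matches the wording in the message.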
    • jbd2: Drop jbd2_space_needed() · a5140ec0
      Committed by Jan Kara
      mainline inclusion
      from mainline-5.5-rc1
      commit 77444ac4f9537bc4211f928959d5231445e30c6e
      category: bugfix
      bugzilla: 25031
      CVE: NA
      ---------------------------
      
      The function is now just a trivial wrapper returning
      journal->j_max_transaction_buffers. Drop it.
      Reviewed-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20191105164437.32602-19-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
      Reviewed-by: Yang Erkun <yangerkun@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • jbd2: Account descriptor blocks into t_outstanding_credits · ad68953d
      Committed by Jan Kara
      mainline inclusion
      from mainline-5.5-rc1
      commit 9f356e5a4f12008fa0df8b6385fc0ab830416e72
      category: bugfix
      bugzilla: 25031
      CVE: NA
      ---------------------------
      
      Currently, journal descriptor blocks were not accounted in
      transaction->t_outstanding_credits and we were just leaving some slack
      space in the journal for them (in jbd2_log_space_left() and
      jbd2_space_needed()). This is making proper accounting (and reservation
      we want to add) of descriptor blocks difficult so switch to accounting
      descriptor blocks in transaction->t_outstanding_credits and just reserve
      the same amount of credits in t_outstanding_credits for journal
      descriptor blocks when creating transaction.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20191105164437.32602-18-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      
      Conflict:
        include/linux/jbd2.h
      Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
      Reviewed-by: Yang Erkun <yangerkun@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • ext4, jbd2: Provide accessor function for handle credits · 02b7342a
      Committed by Jan Kara
      mainline inclusion
      from mainline-5.5-rc1
      commit a9a8344ee1714f835ba394077e8c13d751e2f148
      category: bugfix
      bugzilla: 25031
      CVE: NA
      ---------------------------
      
      Provide accessor function to get number of credits available in a handle
      and use it from ext4. Later, computation of available credits won't be
      so straightforward.
      Reviewed-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20191105164437.32602-11-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
      Reviewed-by: Yang Erkun <yangerkun@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • membarrier/kabi: fix kabi for membarrier_state · 8b9958f0
      Committed by Cheng Jian
      hulk inclusion
      category: feature
      bugzilla: 28332
      CVE: NA
      
      -------------------------------------------------
      Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
      Reviewed-By: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • sched/membarrier: Fix p->mm->membarrier_state racy load · 08946ecc
      Committed by Mathieu Desnoyers
      mainline inclusion
      from mainline-5.4-rc1
      commit 227a4aadc75ba22fcb6c4e1c078817b8cbaae4ce
      category: bugfix
      bugzilla: 28332
      CVE: NA
      
      -------------------------------------------------
      
      The membarrier_state field is located within the mm_struct, which
      is not guaranteed to exist when used from runqueue-lock-free iteration
      on runqueues by the membarrier system call.
      
      Copy the membarrier_state from the mm_struct into the scheduler runqueue
      when the scheduler switches between mm.
      
      When registering membarrier for mm, after setting the registration bit
      in the mm membarrier state, issue a synchronize_rcu() to ensure the
      scheduler observes the change. In order to take care of the case
      where a runqueue keeps executing the target mm without swapping to
      other mm, iterate over each runqueue and issue an IPI to copy the
      membarrier_state from the mm_struct into each runqueue which has the
      same mm whose state has just been modified.
      
      Move the mm membarrier_state field closer to pgd in mm_struct to use
      a cache line already touched by the scheduler switch_mm.
      
      The membarrier_execve() (now membarrier_exec_mmap) hook now needs to
      clear the runqueue's membarrier state in addition to clearing the mm
      membarrier state, so move its implementation into the scheduler
      membarrier code so it can access the runqueue structure.
      
      Add memory barrier in membarrier_exec_mmap() prior to clearing
      the membarrier state, ensuring memory accesses executed prior to exec
      are not reordered with the stores clearing the membarrier state.
      
      As suggested by Linus, move all membarrier.c RCU read-side locks outside
      of the for each cpu loops.
      
      [Cheng Jian: use task_rcu_dereference in sync_runqueues_membarrier_state]
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Kirill Tkhai <tkhai@yandex.ru>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190919173705.2181-5-mathieu.desnoyers@efficios.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
      Reviewed-By: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
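The data flow in the sched/membarrier message (state lives in the mm, a copy is kept in each runqueue, and the copy is refreshed on mm switch, on registration and on exec) can be sketched as a plain userspace model. All names below (`mm_model`, `rq_model`, `rq_switch_mm_model()`, `mm_register_model()`, `exec_mmap_model()`) are hypothetical; the loop in `mm_register_model()` stands in for the IPIs mentioned in the message, and memory barriers and RCU are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

struct mm_model {
	int membarrier_state;
};

struct rq_model {
	struct mm_model *mm;  /* mm currently running on this runqueue */
	int membarrier_state; /* cached copy, readable without touching the mm */
};

/* On a switch between mms, the runqueue takes a fresh copy of the state. */
static void rq_switch_mm_model(struct rq_model *rq, struct mm_model *next)
{
	rq->mm = next;
	rq->membarrier_state = next ? next->membarrier_state : 0;
}

/*
 * Registration sets the bit in the mm, then refreshes every runqueue that
 * is already running this mm and therefore would not see a switch.
 * Returns the number of runqueues that were refreshed.
 */
static int mm_register_model(struct mm_model *mm, int bit,
			     struct rq_model *rqs, int nr_rqs)
{
	int synced = 0;

	assert(bit != 0);
	mm->membarrier_state |= bit;
	for (int i = 0; i < nr_rqs; i++) {
		if (rqs[i].mm == mm) {
			rqs[i].membarrier_state = mm->membarrier_state;
			synced++;
		}
	}
	return synced;
}

/* On exec, both the mm state and the local runqueue copy are cleared. */
static void exec_mmap_model(struct rq_model *rq, struct mm_model *mm)
{
	mm->membarrier_state = 0;
	rq->membarrier_state = 0;
}
```

Keeping a copy in the runqueue means the membarrier system call can inspect it without dereferencing an mm that may no longer exist, which is the racy load named in the commit title.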
    • PCI: fix kabi change in struct pci_bus · 822a5290
      Committed by Xiongfeng Wang
      hulk inclusion
      category: bugfix
      bugzilla: NA
      CVE: NA
      
      ---------------------------
      
      Fix the kABI change in struct pci_bus introduced by the following patch:
      12de28f380a9 ("PCI: add a member in 'struct pci_bus' to record the
      original 'pci_ops'")
      Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • PCI: add a member in 'struct pci_bus' to record the original 'pci_ops' · 03377a8a
      Committed by Xiongfeng Wang
      hulk inclusion
      category: bugfix
      bugzilla: NA
      CVE: NA
      
      -------------------------------------------------
      
      When I test 'aer-inject' with the following procedure:
      1. inject a fatal error into an upstream PCI bridge
      2. remove the upstream bridge through sysfs
      3. rescan the PCI tree with 'echo 1 > /sys/bus/pci/rescan'
      4. execute the command 'rmmod aer-inject'
      5. remove the upstream bridge through sysfs again

      I came across the following Oops:
      
      [  799.713238] Internal error: Oops: 96000007 [#1] SMP
      [  799.718099] Process bash (pid: 10683, stack limit = 0x00000000125a3b1b)
      [  799.724686] CPU: 108 PID: 10683 Comm: bash Kdump: loaded Not tainted 4.19.36 #2
      [  799.731962] Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 1.05 09/18/2019
      [  799.739325] pstate: 40400009 (nZcv daif +PAN -UAO)
      [  799.744104] pc : pci_remove_bus+0xc0/0x1c0
      [  799.748182] lr : pci_remove_bus+0x94/0x1c0
      [  799.752260] sp : ffffa02e335df940
      [  799.755560] x29: ffffa02e335df940 x28: ffff2000088216a8
      [  799.760849] x27: 1ffff405c66bbfbc x26: ffff20000a9518c0
      [  799.766139] x25: ffffa02dea6ec418 x24: 1ffff405bd4dd883
      [  799.771427] x23: ffffa02e72576628 x22: 1ffff405ce4aecc0
      [  799.776715] x21: ffffa02e72576608 x20: ffff200002e75080
      [  799.782003] x19: ffffa02e72576600 x18: 0000000000000000
      [  799.787291] x17: 0000000000000000 x16: 0000000000000000
      [  799.792578] x15: 0000000000000001 x14: dfff200000000000
      [  799.797866] x13: ffff20000a6dfaf0 x12: 0000000000000000
      [  799.803154] x11: 1fffe4000159b217 x10: ffff04000159b217
      [  799.808442] x9 : dfff200000000000 x8 : ffff20000acd90bf
      [  799.813730] x7 : 0000000000000000 x6 : 0000000000000000
      [  799.819017] x5 : 0000000000000001 x4 : 0000000000000000
      [  799.824306] x3 : 1ffff405dbe62603 x2 : 1fffe400005cea11
      [  799.829593] x1 : dfff200000000000 x0 : ffff200002e75088
      [  799.834882] Call trace:
      [  799.837323]  pci_remove_bus+0xc0/0x1c0
      [  799.841056]  pci_remove_bus_device+0xd0/0x2f0
      [  799.845392]  pci_stop_and_remove_bus_device_locked+0x2c/0x40
      [  799.851028]  remove_store+0x1b8/0x1d0
      [  799.854679]  dev_attr_store+0x60/0x80
      [  799.858330]  sysfs_kf_write+0x104/0x170
      [  799.862149]  kernfs_fop_write+0x23c/0x430
      [  799.866143]  __vfs_write+0xec/0x4e0
      [  799.869615]  vfs_write+0x12c/0x3d0
      [  799.873001]  ksys_write+0xd0/0x190
      [  799.876389]  __arm64_sys_write+0x70/0xa0
      [  799.880298]  el0_svc_common+0xfc/0x278
      [  799.884030]  el0_svc_handler+0x50/0xc0
      [  799.887764]  el0_svc+0x8/0xc
      [  799.890634] Code: d2c40001 f2fbffe1 91002280 d343fc02 (38e16841)
      [  799.896700] kernel fault(0x1) notification starting on CPU 108
      
      This happens because when a new bus is allocated during rescanning, the
      'pci_ops' of the newly allocated 'pci_bus' is inherited from its parent
      PCI bus, whereas the 'pci_ops' of the parent bus may have been changed to
      'aer_inj_pci_ops' in 'aer_inject()'. When the module 'aer_inject' is
      unloaded, 'aer_inject_exit' only restores the 'pci_ops' for the PCI bus
      of the error-injected device and the root port. After the module has
      been unloaded, the 'pci_ops' of the newly allocated PCI bus is still
      'aer_inj_pci_ops', and accessing it causes the Oops.

      This patch adds a member 'backup_ops' to 'struct pci_bus' to record the
      original 'ops'. When a child PCI bus is allocated, the 'backup_ops' of
      the parent bus is assigned to the 'ops' of the child bus.

      Maybe the best way would be to not modify the 'pci_ops' in
      'struct pci_bus' at all, but that would require refactoring much of the
      'aer_inject' framework. I haven't found a better way to handle it.
      Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
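The inheritance rule from the PCI message (a child bus takes its 'ops' from the parent's 'backup_ops', not from the parent's possibly replaced 'ops') can be shown with a small model. `ops_model`, `bus_model` and the three helpers are hypothetical names; setting the child's own `backup_ops` is an assumption, since the message only specifies the child's 'ops'.

```c
#include <assert.h>
#include <stddef.h>

struct ops_model {
	const char *name;
};

struct bus_model {
	const struct ops_model *ops;        /* may be swapped by the error injector */
	const struct ops_model *backup_ops; /* the original ops, never swapped */
};

static void bus_init_root(struct bus_model *bus, const struct ops_model *ops)
{
	bus->ops = ops;
	bus->backup_ops = ops;
}

/* What the injector does to a bus: replace 'ops' and leave 'backup_ops' alone. */
static void bus_inject_ops(struct bus_model *bus, const struct ops_model *inj)
{
	bus->ops = inj;
}

/* A new child inherits the parent's original ops, even while injection is active. */
static void bus_alloc_child(struct bus_model *child, const struct bus_model *parent)
{
	assert(parent->backup_ops != NULL);
	child->ops = parent->backup_ops;
	child->backup_ops = parent->backup_ops; /* assumption, not stated in the message */
}
```

Without `backup_ops`, the child would copy the injected ops and keep pointing at them after the injector module is gone, which is the situation behind the Oops above.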
    • PM / hibernate: introduce system_in_hibernation · ba4c6e55
      Committed by Hongbo Yao
      hulk inclusion
      category: bugfix
      bugzilla: 26326
      CVE: NA
      
      -------------------------------------------------
      Introduce the boolean function system_in_hibernation(), which returns
      'true' when the system is carrying out hibernation.

      Some device drivers or syscore code need such a function to check
      whether the system is in the hibernation phase.
      Signed-off-by: Hongbo Yao <yaohongbo@huawei.com>
      Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • f2fs: support swap file w/ DIO · c9c60491
      Committed by Jaegeuk Kim
      mainline inclusion
      from mainline-v5.3-rc1
      commit 4969c06a0d83c9c3dc50b8efcdc8eeedfce896f6
      category: bugfix
      bugzilla: 13690
      CVE: CVE-2019-19815
      
      -------------------------------------------------
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Conflicts:
        fs/f2fs/f2fs.h
        fs/f2fs/data.c
      [yyl: adjust context]
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Reviewed-by: Wei Fang <fangwei1@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • cfg80211/mac80211: make ieee80211_send_layer2_update a public function · d869e0d7
      Committed by Dedy Lansky
      mainline inclusion
      from mainline-v4.20-rc1
      commit 30ca1aa536211f5ac3de0173513a7a99a98a97f3
      category: bugfix
      bugzilla: 13690
      CVE: CVE-2019-5108
      
      This patch is preparation for fixing CVE-2019-5108.
      
      -------------------------------------------------
      
      Make ieee80211_send_layer2_update() a common function so other drivers
      can re-use it.
      Signed-off-by: Dedy Lansky <dlansky@codeaurora.org>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Conflicts:
        net/wireless/util.c
      [yyl: adjust context]
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Reviewed-by: Wenan Mao <maowenan@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • macvlan: do not assume mac_header is set in macvlan_broadcast() · b2b473d5
      Committed by Eric Dumazet
      [ Upstream commit 96cc4b69581db68efc9749ef32e9cf8e0160c509 ]
      
      Use of eth_hdr() in tx path is error prone.
      
      Many drivers call skb_reset_mac_header() before using it,
      but others do not.
      
      Commit 6d1ccff6 ("net: reset mac header in dev_start_xmit()")
      attempted to fix this generically, but commit d346a3fa
      ("packet: introduce PACKET_QDISC_BYPASS socket option") brought
      back the macvlan bug.
      
      Let's add a new helper, so that tx paths no longer have
      to call skb_reset_mac_header() only to get a pointer
      to skb->data.
      
      Hopefully we will be able to revert 6d1ccff6
      ("net: reset mac header in dev_start_xmit()") and save a few cycles
      in the transmit fast path.
      
      BUG: KASAN: use-after-free in __get_unaligned_cpu32 include/linux/unaligned/packed_struct.h:19 [inline]
      BUG: KASAN: use-after-free in mc_hash drivers/net/macvlan.c:251 [inline]
      BUG: KASAN: use-after-free in macvlan_broadcast+0x547/0x620 drivers/net/macvlan.c:277
      Read of size 4 at addr ffff8880a4932401 by task syz-executor947/9579
      
      CPU: 0 PID: 9579 Comm: syz-executor947 Not tainted 5.5.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x197/0x210 lib/dump_stack.c:118
       print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374
       __kasan_report.cold+0x1b/0x41 mm/kasan/report.c:506
       kasan_report+0x12/0x20 mm/kasan/common.c:639
       __asan_report_load_n_noabort+0xf/0x20 mm/kasan/generic_report.c:145
       __get_unaligned_cpu32 include/linux/unaligned/packed_struct.h:19 [inline]
       mc_hash drivers/net/macvlan.c:251 [inline]
       macvlan_broadcast+0x547/0x620 drivers/net/macvlan.c:277
       macvlan_queue_xmit drivers/net/macvlan.c:520 [inline]
       macvlan_start_xmit+0x402/0x77f drivers/net/macvlan.c:559
       __netdev_start_xmit include/linux/netdevice.h:4447 [inline]
       netdev_start_xmit include/linux/netdevice.h:4461 [inline]
       dev_direct_xmit+0x419/0x630 net/core/dev.c:4079
       packet_direct_xmit+0x1a9/0x250 net/packet/af_packet.c:240
       packet_snd net/packet/af_packet.c:2966 [inline]
       packet_sendmsg+0x260d/0x6220 net/packet/af_packet.c:2991
       sock_sendmsg_nosec net/socket.c:639 [inline]
       sock_sendmsg+0xd7/0x130 net/socket.c:659
       __sys_sendto+0x262/0x380 net/socket.c:1985
       __do_sys_sendto net/socket.c:1997 [inline]
       __se_sys_sendto net/socket.c:1993 [inline]
       __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1993
       do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x442639
      Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 5b 10 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffc13549e08 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000442639
      RDX: 000000000000000e RSI: 0000000020000080 RDI: 0000000000000003
      RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 0000000000403bb0 R14: 0000000000000000 R15: 0000000000000000
      
      Allocated by task 9389:
       save_stack+0x23/0x90 mm/kasan/common.c:72
       set_track mm/kasan/common.c:80 [inline]
       __kasan_kmalloc mm/kasan/common.c:513 [inline]
       __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:486
       kasan_kmalloc+0x9/0x10 mm/kasan/common.c:527
       __do_kmalloc mm/slab.c:3656 [inline]
       __kmalloc+0x163/0x770 mm/slab.c:3665
       kmalloc include/linux/slab.h:561 [inline]
       tomoyo_realpath_from_path+0xc5/0x660 security/tomoyo/realpath.c:252
       tomoyo_get_realpath security/tomoyo/file.c:151 [inline]
       tomoyo_path_perm+0x230/0x430 security/tomoyo/file.c:822
       tomoyo_inode_getattr+0x1d/0x30 security/tomoyo/tomoyo.c:129
       security_inode_getattr+0xf2/0x150 security/security.c:1222
       vfs_getattr+0x25/0x70 fs/stat.c:115
       vfs_statx_fd+0x71/0xc0 fs/stat.c:145
       vfs_fstat include/linux/fs.h:3265 [inline]
       __do_sys_newfstat+0x9b/0x120 fs/stat.c:378
       __se_sys_newfstat fs/stat.c:375 [inline]
       __x64_sys_newfstat+0x54/0x80 fs/stat.c:375
       do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 9389:
       save_stack+0x23/0x90 mm/kasan/common.c:72
       set_track mm/kasan/common.c:80 [inline]
       kasan_set_free_info mm/kasan/common.c:335 [inline]
       __kasan_slab_free+0x102/0x150 mm/kasan/common.c:474
       kasan_slab_free+0xe/0x10 mm/kasan/common.c:483
       __cache_free mm/slab.c:3426 [inline]
       kfree+0x10a/0x2c0 mm/slab.c:3757
       tomoyo_realpath_from_path+0x1a7/0x660 security/tomoyo/realpath.c:289
       tomoyo_get_realpath security/tomoyo/file.c:151 [inline]
       tomoyo_path_perm+0x230/0x430 security/tomoyo/file.c:822
       tomoyo_inode_getattr+0x1d/0x30 security/tomoyo/tomoyo.c:129
       security_inode_getattr+0xf2/0x150 security/security.c:1222
       vfs_getattr+0x25/0x70 fs/stat.c:115
       vfs_statx_fd+0x71/0xc0 fs/stat.c:145
       vfs_fstat include/linux/fs.h:3265 [inline]
       __do_sys_newfstat+0x9b/0x120 fs/stat.c:378
       __se_sys_newfstat fs/stat.c:375 [inline]
       __x64_sys_newfstat+0x54/0x80 fs/stat.c:375
       do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff8880a4932000
       which belongs to the cache kmalloc-4k of size 4096
      The buggy address is located 1025 bytes inside of
       4096-byte region [ffff8880a4932000, ffff8880a4933000)
      The buggy address belongs to the page:
      page:ffffea0002924c80 refcount:1 mapcount:0 mapping:ffff8880aa402000 index:0x0 compound_mapcount: 0
      raw: 00fffe0000010200 ffffea0002846208 ffffea00028f3888 ffff8880aa402000
      raw: 0000000000000000 ffff8880a4932000 0000000100000001 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8880a4932300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8880a4932380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff8880a4932400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                         ^
       ffff8880a4932480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8880a4932500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fixes: b863ceb7 ("[NET]: Add macvlan driver")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      b2b473d5
    • P
      netfilter: uapi: Avoid undefined left-shift in xt_sctp.h · fccad51d
      Phil Sutter committed
      [ Upstream commit 164166558aacea01b99c8c8ffb710d930405ba69 ]
      
      With 'bytes(__u32)' being 32, a left-shift of 31 may happen which is
      undefined for the signed 32-bit value 1. Avoid this by declaring 1 as
      unsigned.
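
A small sketch of the pattern, assuming a bitmap made of 32-bit words. The helper names and `BITS_PER_U32` are hypothetical; they only illustrate why the shifted constant has to be unsigned.

```c
#include <stdint.h>

/* Bits per 32-bit word; mirrors the value 32 quoted for 'bytes(__u32)'. */
#define BITS_PER_U32 32u

/* Set bit 'type' in a bitmap of 32-bit words (hypothetical helper). */
static inline void chunkmap_set(uint32_t *map, unsigned int type)
{
	/* 1u keeps the shift in unsigned arithmetic. With a plain signed 1,
	 * a shift count of 31 would be undefined behaviour. */
	map[type / BITS_PER_U32] |= 1u << (type % BITS_PER_U32);
}

/* Test bit 'type' using the same unsigned constant. */
static inline int chunkmap_is_set(const uint32_t *map, unsigned int type)
{
	return (map[type / BITS_PER_U32] & (1u << (type % BITS_PER_U32))) != 0;
}
```
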
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      fccad51d
    • Y
      block: fix use-after-free on cached last_lookup partition · f8096783
      Yufen Yu committed
      hulk inclusion
      category: bugfix
      bugzilla: 27962
      CVE: NA
      ---------------------------
      
      delete_partition() clears the cached last_lookup partition. However
      the .last_lookup cache may be overwritten by one IO path after
      it is cleared from delete_partition(). Then another IO path may
      use the cached deleting partition after __delete_partition() is
      called, then use-after-free is triggered on the cached partition.
      
      Fix the issue with the following approach:
      
      1) always get the partition's refcount via hd_struct_try_get() before
      setting .last_lookup
      
      2) move clearing .last_lookup from delete_partition() to
      __delete_partition(), which is the release handler of the partition's
      percpu-refcount, so that no IO path can overwrite .last_lookup after it
      is cleared in __delete_partition().
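
The two steps above can be sketched in user space as follows. This is a single-threaded illustration under stated assumptions: a plain counter stands in for the percpu refcount, and every type and function name here is hypothetical rather than taken from the block layer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in: a plain counter replaces the percpu refcount. */
struct part_sketch {
	long refs;                         /* 0 means the partition is released */
};

struct part_table_sketch {
	struct part_sketch *last_lookup;   /* lookup cache */
};

/* Take a reference only while the object is still live. */
static bool part_try_get(struct part_sketch *p)
{
	if (p->refs <= 0)
		return false;
	p->refs++;
	return true;
}

/* Step 1: cache a partition only after a reference was obtained. */
static struct part_sketch *lookup_and_cache(struct part_table_sketch *t,
					    struct part_sketch *found)
{
	if (!part_try_get(found))
		return NULL;               /* dying partition: never cached */
	t->last_lookup = found;
	return found;
}

/* Step 2: the release path, reached when the last reference drops,
 * is the only place that clears the cache. */
static void part_put(struct part_table_sketch *t, struct part_sketch *p)
{
	if (--p->refs == 0 && t->last_lookup == p)
		t->last_lookup = NULL;
}
```

Because a released partition can no longer be obtained through `part_try_get()`, nothing can write it back into the cache after the release path has cleared it.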
      
      This is an alternative to Yufen's patch [1], which adds overhead to
      the fast path through an indirect lookup that may touch one extra
      cacheline in the IO path. Also, this patch relies on percpu-refcount's
      protection, and it is easier to understand and verify.
      
      [1] https://lore.kernel.org/linux-block/20200109013551.GB9655@ming.t460p/T/#t
      
      Reported-by: Yufen Yu <yuyufen@huawei.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Conflict:
      	include/linux/genhd.h
      	block/blk-core.c
      Signed-off-by: Yufen Yu <yuyufen@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      f8096783
    • E
      net: add annotations on hh->hh_len lockless accesses · acae4f46
      Eric Dumazet committed
      [ Upstream commit c305c6ae79e2ce20c22660ceda94f0d86d639a82 ]
      
      KCSAN reported a data-race [1]
      
      While we can use READ_ONCE() on the read sides,
      we need to make sure hh->hh_len is written last.
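
A user-space analogue of this ordering rule, using C11 atomics as a stand-in for the kernel primitives. The struct, the function names and the choice of release/acquire ordering are assumptions made for the sketch; the commit message itself only names READ_ONCE() and the requirement that the length is written last.

```c
#include <stdatomic.h>
#include <string.h>

/* Hypothetical stand-in for a cached hardware header. */
struct hh_sketch {
	unsigned char data[16];
	atomic_uint   len;      /* 0 means "not initialised yet" */
};

/* Writer: fill the payload first, publish the length last. */
static void hh_publish(struct hh_sketch *hh, const unsigned char *hdr,
		       unsigned int len)
{
	if (len > sizeof(hh->data))
		return;         /* does not fit: leave the cache unset */
	memcpy(hh->data, hdr, len);
	atomic_store_explicit(&hh->len, len, memory_order_release);
}

/* Reader: one annotated load; a non-zero length means data is visible. */
static unsigned int hh_len_read(struct hh_sketch *hh)
{
	return atomic_load_explicit(&hh->len, memory_order_acquire);
}
```

A reader that sees a non-zero length can then copy `data` without observing a half-written header.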
      
      [1]
      
      BUG: KCSAN: data-race in eth_header_cache / neigh_resolve_output
      
      write to 0xffff8880b9dedcb8 of 4 bytes by task 29760 on cpu 0:
       eth_header_cache+0xa9/0xd0 net/ethernet/eth.c:247
       neigh_hh_init net/core/neighbour.c:1463 [inline]
       neigh_resolve_output net/core/neighbour.c:1480 [inline]
       neigh_resolve_output+0x415/0x470 net/core/neighbour.c:1470
       neigh_output include/net/neighbour.h:511 [inline]
       ip6_finish_output2+0x7a2/0xec0 net/ipv6/ip6_output.c:116
       __ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
       __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
       ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
       dst_output include/net/dst.h:436 [inline]
       NF_HOOK include/linux/netfilter.h:305 [inline]
       ndisc_send_skb+0x459/0x5f0 net/ipv6/ndisc.c:505
       ndisc_send_ns+0x207/0x430 net/ipv6/ndisc.c:647
       rt6_probe_deferred+0x98/0xf0 net/ipv6/route.c:615
       process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
       worker_thread+0xa0/0x800 kernel/workqueue.c:2415
       kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
      
      read to 0xffff8880b9dedcb8 of 4 bytes by task 29572 on cpu 1:
       neigh_resolve_output net/core/neighbour.c:1479 [inline]
       neigh_resolve_output+0x113/0x470 net/core/neighbour.c:1470
       neigh_output include/net/neighbour.h:511 [inline]
       ip6_finish_output2+0x7a2/0xec0 net/ipv6/ip6_output.c:116
       __ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
       __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
       ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
       dst_output include/net/dst.h:436 [inline]
       NF_HOOK include/linux/netfilter.h:305 [inline]
       ndisc_send_skb+0x459/0x5f0 net/ipv6/ndisc.c:505
       ndisc_send_ns+0x207/0x430 net/ipv6/ndisc.c:647
       rt6_probe_deferred+0x98/0xf0 net/ipv6/route.c:615
       process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
       worker_thread+0xa0/0x800 kernel/workqueue.c:2415
       kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 29572 Comm: kworker/1:4 Not tainted 5.4.0-rc6+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: events rt6_probe_deferred
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      acae4f46
    • T
      net: core: limit nested device depth · cdb5d42b
      Taehee Yoo committed
      [ Upstream commit 5343da4c17429efaa5fb1594ea96aee1a283e694 ]
      
      The current code doesn't limit the number of nested devices.
      Nested devices are handled recursively, which needs a lot of stack
      memory, so unlimited nesting could cause a stack overflow.
      
      This patch adds upper_level and lower_level, common variables that
      represent the maximum upper/lower depth.
      When an upper/lower device is attached or detached,
      {lower/upper}_level are updated. If the maximum depth is bigger than 8,
      the attach routine fails and returns -EMLINK.
      
      In addition, this patch converts the recursive routines
      netdev_walk_all_{lower/upper} into iterative ones.
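
A simplified sketch of a depth-limited attach with iterative walks. It assumes each device has at most one upper and one lower neighbour, which is far simpler than the kernel's adjacency lists; the names and the way depth is counted are hypothetical, and only the limit of 8 and the -EMLINK return value come from the text above.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_NEST_SKETCH 8   /* limit quoted in the commit message */

/* Hypothetical device with a single upper and a single lower link. */
struct dev_sketch {
	struct dev_sketch *upper;
	struct dev_sketch *lower;
};

/* Iterative walks: count the levels without recursion. */
static int upper_depth(const struct dev_sketch *d)
{
	int n = 0;

	for (; d->upper; d = d->upper)
		n++;
	return n;
}

static int lower_depth(const struct dev_sketch *d)
{
	int n = 0;

	for (; d->lower; d = d->lower)
		n++;
	return n;
}

/* Link 'up' on top of 'low'; refuse if the chain would exceed the limit. */
static int dev_attach(struct dev_sketch *up, struct dev_sketch *low)
{
	/* levels above 'up' + the new link + levels below 'low' */
	if (upper_depth(up) + 1 + lower_depth(low) > MAX_NEST_SKETCH)
		return -EMLINK;
	up->lower = low;
	low->upper = up;
	return 0;
}
```

Since both walks are plain loops, stack usage stays constant however long the chain is.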
      
      Test commands:
          ip link add dummy0 type dummy
          ip link add link dummy0 name vlan1 type vlan id 1
          ip link set vlan1 up
      
          for i in {2..55}
          do
      	    let A=$i-1
      
      	    ip link add vlan$i link vlan$A type vlan id $i
          done
          ip link del dummy0
      
      Splat looks like:
      [  155.513226][  T908] BUG: KASAN: use-after-free in __unwind_start+0x71/0x850
      [  155.514162][  T908] Write of size 88 at addr ffff8880608a6cc0 by task ip/908
      [  155.515048][  T908]
      [  155.515333][  T908] CPU: 0 PID: 908 Comm: ip Not tainted 5.4.0-rc3+ #96
      [  155.516147][  T908] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [  155.517233][  T908] Call Trace:
      [  155.517627][  T908]
      [  155.517918][  T908] Allocated by task 0:
      [  155.518412][  T908] (stack is not available)
      [  155.518955][  T908]
      [  155.519228][  T908] Freed by task 0:
      [  155.519885][  T908] (stack is not available)
      [  155.520452][  T908]
      [  155.520729][  T908] The buggy address belongs to the object at ffff8880608a6ac0
      [  155.520729][  T908]  which belongs to the cache names_cache of size 4096
      [  155.522387][  T908] The buggy address is located 512 bytes inside of
      [  155.522387][  T908]  4096-byte region [ffff8880608a6ac0, ffff8880608a7ac0)
      [  155.523920][  T908] The buggy address belongs to the page:
      [  155.524552][  T908] page:ffffea0001822800 refcount:1 mapcount:0 mapping:ffff88806c657cc0 index:0x0 compound_mapcount:0
      [  155.525836][  T908] flags: 0x100000000010200(slab|head)
      [  155.526445][  T908] raw: 0100000000010200 ffffea0001813808 ffffea0001a26c08 ffff88806c657cc0
      [  155.527424][  T908] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
      [  155.528429][  T908] page dumped because: kasan: bad access detected
      [  155.529158][  T908]
      [  155.529410][  T908] Memory state around the buggy address:
      [  155.530060][  T908]  ffff8880608a6b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  155.530971][  T908]  ffff8880608a6c00: fb fb fb fb fb f1 f1 f1 f1 00 f2 f2 f2 f3 f3 f3
      [  155.531889][  T908] >ffff8880608a6c80: f3 fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  155.532806][  T908]                                            ^
      [  155.533509][  T908]  ffff8880608a6d00: fb fb fb fb fb fb fb fb fb f1 f1 f1 f1 00 00 00
      [  155.534436][  T908]  ffff8880608a6d80: f2 f3 f3 f3 f3 fb fb fb 00 00 00 00 00 00 00 00
      [ ... ]
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      cdb5d42b
    • S
      regulator: ab8500: Remove AB8505 USB regulator · cbbb04a5
      Stephan Gerhold committed
      commit 99c4f70df3a6446c56ca817c2d0f9c12d85d4e7c upstream.
      
      The USB regulator was removed for AB8500 in
      commit 41a06aa7 ("regulator: ab8500: Remove USB regulator").
      It was then added for AB8505 in
      commit 547f384f ("regulator: ab8500: add support for ab8505").
      
      However, there was never an entry added for it in
      ab8505_regulator_match. This causes all regulators after it
      to be initialized with the wrong device tree data, eventually
      leading to an out-of-bounds array read.
      
      Given that it is not used anywhere in the kernel, it seems
      likely that similar arguments against supporting it exist for
      AB8505 (it is controlled by hardware).
      
      Therefore, simply remove it like for AB8500 instead of adding
      an entry in ab8505_regulator_match.
      
      Fixes: 547f384f ("regulator: ab8500: add support for ab8505")
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
      Link: https://lore.kernel.org/r/20191106173125.14496-1-stephan@gerhold.net
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      cbbb04a5
    • S
      libata: Fix retrieving of active qcs · c30da1f4
      Sascha Hauer committed
      commit 8385d756e114f2df8568e508902d5f9850817ffb upstream.
      
      ata_qc_complete_multiple() is called with a mask of the still active
      tags.
      
      mv_sata doesn't have this information directly and instead calculates
      the still active tags from the started tags (ap->qc_active) and the
      finished tags as (ap->qc_active ^ done_mask).
      
      Since 28361c40 the hw_tag and tag are no longer the same and the
      equation is no longer valid. In ata_exec_internal_sg() ap->qc_active is
      initialized as 1ULL << ATA_TAG_INTERNAL, but in hardware tag 0 is
      started and this will be in done_mask on completion. ap->qc_active ^
      done_mask becomes 0x100000000 ^ 0x1 = 0x100000001 and thus tag 0 used as
      the internal tag will never be reported as completed.
      
      This is fixed by introducing ata_qc_get_active() which returns the
      active hardware tags and calling it where appropriate.
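
The arithmetic above can be checked with a small sketch. The function name and the `ATA_TAG_INTERNAL_SKETCH` constant are hypothetical; the value 32 is inferred from the 0x100000000 mask quoted in the message, and the body is an assumption about what such a translation could look like, not the patch itself.

```c
#include <stdint.h>

#define ATA_TAG_INTERNAL_SKETCH 32   /* inferred from 0x100000000 above */

/* Translate the software view of active tags into hardware tags:
 * the internal command is issued to the hardware as tag 0. */
static uint64_t qc_get_active_sketch(uint64_t qc_active)
{
	const uint64_t internal = 1ULL << ATA_TAG_INTERNAL_SKETCH;

	if (qc_active & internal) {
		qc_active |= 1ULL << 0;    /* hardware sees tag 0 */
		qc_active &= ~internal;    /* drop the software-only bit */
	}
	return qc_active;
}
```

With the raw mask, `qc_active ^ done_mask` leaves both bit 32 and bit 0 set; after translation the same XOR yields 0, so the internal command is seen as completed.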
      
      This is tested on mv_sata, but sata_fsl and sata_nv suffer from the same
      problem. There is another case in sata_nv that most likely needs fixing
      as well, but this looks a little different, so I wasn't confident enough
      to change that.
      
      Fixes: 28361c40 ("libata: add extra internal command")
      Cc: stable@vger.kernel.org
      Tested-by: Pali Rohár <pali.rohar@gmail.com>
      Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      Add missing export of ata_qc_get_active(), as per Pali.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      c30da1f4
    • F
      ata: libahci_platform: Export again ahci_platform_<en/dis>able_phys() · e86dc8a4
      Florian Fainelli committed
      commit 84b032dbfdf1c139cd2b864e43959510646975f8 upstream.
      
      This reverts commit 6bb86fef
      ("libahci_platform: Staticize ahci_platform_<en/dis>able_phys()"),
      because we are going to need ahci_platform_{enable,disable}_phys() in a
      subsequent commit for ahci_brcm.c in order to properly control the PHY
      initialization order.
      
      Also make sure the function prototypes are declared in
      include/linux/ahci_platform.h as a result.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      e86dc8a4
    • L
      dmaengine: Fix access to uninitialized dma_slave_caps · 2f16dfc9
      Lukas Wunner committed
      commit 53a256a9b925b47c7e67fc1f16ca41561a7b877c upstream.
      
      dmaengine_desc_set_reuse() allocates a struct dma_slave_caps on the
      stack, populates it using dma_get_slave_caps() and then accesses one
      of its members.
      
      However dma_get_slave_caps() may fail and this isn't accounted for,
      leading to a legitimate warning of gcc-4.9 (but not newer versions):
      
         In file included from drivers/spi/spi-bcm2835.c:19:0:
         drivers/spi/spi-bcm2835.c: In function 'dmaengine_desc_set_reuse':
      >> include/linux/dmaengine.h:1370:10: warning: 'caps.descriptor_reuse' is used uninitialized in this function [-Wuninitialized]
           if (caps.descriptor_reuse) {
      
      Fix it, thereby also silencing the gcc-4.9 warning.
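
A minimal sketch of the pattern behind such a fix: check the return value of the capability query before reading the stack-allocated struct. All types, names, the mock query and the flag value are hypothetical stand-ins, not the dmaengine API.

```c
#include <errno.h>
#include <stdbool.h>

#define DMA_CTRL_REUSE_SKETCH 0x40u   /* hypothetical flag value */

struct caps_sketch { bool descriptor_reuse; };
struct desc_sketch { unsigned int flags; bool chan_ok; bool chan_reuse; };

/* Mock capability query: on failure it returns without touching *caps. */
static int get_slave_caps_sketch(const struct desc_sketch *tx,
				 struct caps_sketch *caps)
{
	if (!tx->chan_ok)
		return -ENXIO;
	caps->descriptor_reuse = tx->chan_reuse;
	return 0;
}

static int desc_set_reuse_sketch(struct desc_sketch *tx)
{
	struct caps_sketch caps;
	int ret;

	ret = get_slave_caps_sketch(tx, &caps);
	if (ret)
		return ret;   /* caps is uninitialised here: do not read it */

	if (!caps.descriptor_reuse)
		return -EPERM;

	tx->flags |= DMA_CTRL_REUSE_SKETCH;
	return 0;
}
```

Propagating the error keeps the read of `caps.descriptor_reuse` on a path where the struct is known to be populated.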
      
      The issue has been present for 4 years but surfaces only now that
      the first caller of dmaengine_desc_set_reuse() has been added in
      spi-bcm2835.c. Another user of reusable DMA descriptors has existed
      for a while in pxa_camera.c, but it sets the DMA_CTRL_REUSE flag
      directly instead of calling dmaengine_desc_set_reuse(). Nevertheless,
      tag this commit for stable in case there are out-of-tree users.
      
      Fixes: 27242021 ("dmaengine: Add DMA_CTRL_REUSE")
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Cc: stable@vger.kernel.org # v4.3+
      Link: https://lore.kernel.org/r/ca92998ccc054b4f2bfd60ef3adbab2913171eac.1575546234.git.lukas@wunner.de
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      2f16dfc9