Commits · 5.10.0-16.0.0 · openeuler / Kernel

30 Oct, 2021 · 20 commits

openeuler_defconfig: Enable CONFIG_HW_RANDOM_HISI_GM by default · 0c50f770

Committed by Yu'an Wang on Oct 30, 2021

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4FHUR

-----------------------------------------------------------

Update CONFIG_HW_RANDOM_HISI_V2 to CONFIG_HW_RANDOM_HISI_GM and enable it.
Then add CONFIG_CRYPTO_DEV_HISI_TRNG and enable it.

Signed-off-by: Yu'an Wang <wangyuan46@huawei.com>
Reviewed-by: Weili Qian <qianweili@huawei.com>
Reviewed-by: Longfang Liu <liulongfang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

hwrng: add hisilicon GM auth trng driver · 8116aced

Committed by Yu'an Wang on Oct 30, 2021

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4FHUR

----------------------------------------------------------

Provide kernel-side GM authentication support for the True Random Number
Generator hardware found on the HiSilicon KP920 SoC.

Signed-off-by: Yu'an Wang <wangyuan46@huawei.com>
Reviewed-by: Weili Qian <qianweili@huawei.com>
Reviewed-by: Longfang Liu <liulongfang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

cgroup/files: support boot parameter to control if disable files cgroup · 26ba3a84

Committed by Yang Yingliang on Oct 30, 2021

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4G4S5

--------------------------------

When the files cgroup is enabled, it leads to a syscall performance
regression in UnixBench. Add a helper, files_cgroup_enabled(), and
use it to decide whether the files cgroup is used; cgroup_disable=files
can then be passed on the kernel command line to disable the files cgroup.
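
The effect of the boot parameter can be modelled in user space. The sketch below is not the kernel implementation: `parse_cgroup_disable()` and the static flag are hypothetical stand-ins for the kernel's boot-parameter handling, and only the name `files_cgroup_enabled()` comes from the commit message. It assumes `cgroup_disable=` carries a comma-separated list of controller names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the controller's enabled state. */
static bool files_enabled = true;

/* Name taken from the commit message; the body is an illustrative model. */
static bool files_cgroup_enabled(void)
{
	return files_enabled;
}

/*
 * Model of handling "cgroup_disable=<list>", where <list> is assumed to be
 * a comma-separated set of controller names.
 */
static void parse_cgroup_disable(const char *list)
{
	const size_t want = strlen("files");

	assert(list != NULL);
	while (*list) {
		size_t len = strcspn(list, ",");

		if (len == want && strncmp(list, "files", len) == 0)
			files_enabled = false;
		list += len;
		if (*list == ',')
			list++;
	}
}
```

A hot path would test `files_cgroup_enabled()` first and skip the accounting work entirely when it returns false, which is presumably where the benchmark difference comes from.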

UnixBench syscall score (higher is better):
enable files cgroup:            2868.5
disable files cgroup:           3177.0
disable config of files cgroup: 3186.5

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Tao Hou <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Conflicts:
	Documentation/admin-guide/kernel-parameters.txt
Signed-off-by: Lu Jialin <lujialin4@huawei.com>
Reviewed-by: weiyang wang <wangweiyang2@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

files_cgroup: Fix soft lockup when refcnt overflow. · 7e485fec

Committed by Zhang Xiaoxu on Oct 30, 2021

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4G4S5

---------------------

There is a soft lockup call trace as below:
  CPU: 0 PID: 1360 Comm: imapsvcd Kdump: loaded Tainted: G           OE
  task: ffff8a7296e1eeb0 ti: ffff8a7296aa0000 task.ti: ffff8a7296aa0000
  RIP: 0010:[<ffffffffb691ecb4>]  [<ffffffffb691ecb4>]
  __css_tryget+0x24/0x50
  RSP: 0018:ffff8a7296aa3db8  EFLAGS: 00000a87
  RAX: 0000000080000000 RBX: ffff8a7296aa3df8 RCX: ffff8a72820d9a08
  RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8a72820d9a00
  RBP: ffff8a7296aa3db8 R08: 000000000001c360 R09: ffffffffb6a478f4
  R10: ffffffffb6935e83 R11: ffffffffffffffd0 R12: 0000000057d35cd8
  R13: 000000d000000002 R14: ffffffffb6892fbe R15: 000000d000000002
  FS:  0000000000000000(0000) GS:ffff8a72fec00000(0063)
  knlGS:00000000c6e65b40
  CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
  CR2: 0000000057d35cd8 CR3: 00000007e8008000 CR4: 00000000003607f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   [<ffffffffb6a93578>] files_cgroup_assign+0x48/0x60
   [<ffffffffb6a47972>] dup_fd+0xb2/0x2f0
   [<ffffffffb6935e83>] ? audit_alloc+0xe3/0x180
   [<ffffffffb6893a03>] copy_process+0xbd3/0x1a40
   [<ffffffffb6894a21>] do_fork+0x91/0x320
   [<ffffffffb6f329e6>] ? trace_do_page_fault+0x56/0x150
   [<ffffffffb6894d36>] SyS_clone+0x16/0x20
   [<ffffffffb6f3bf8c>] ia32_ptregs_common+0x4c/0xfc
   code: 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8d 4f 08 48 89 e5 8b
         47 08 8d 90 00 00 00 80 85 c0 0f 49 d0 8d 72 01 89 d0 f0 0f b1

When the child process exits, the refcnt is not decremented, so the
refcnt may overflow. 'task_get_css' then loops forever: 'css_refcnt'
returns an unbiased refcnt, so once the refcnt is negative
'__css_tryget' always returns false and 'task_get_css' never terminates.

The child process always calls 'close_files' on exit, so decrement the
refcnt there.
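
The failure mode can be reproduced with a small single-threaded model. This is a sketch under assumptions, not the kernel code: `model_unbias()` and `model_tryget()` are hypothetical names, and the bias value of 0x80000000 is inferred from the commit's statement that the refcnt is returned unbiased.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bias applied to a negative counter in this model (assumption). */
#define MODEL_DEACT_BIAS 0x80000000u

/* Map a possibly biased (negative) counter back to its plain value. */
static int32_t model_unbias(int32_t refcnt)
{
	if (refcnt >= 0)
		return refcnt;
	/* Unsigned arithmetic avoids signed-overflow undefined behaviour. */
	return (int32_t)((uint32_t)refcnt - MODEL_DEACT_BIAS);
}

/*
 * Single-threaded model of a compare-and-swap based tryget: it expects the
 * unbiased value and fails whenever the stored counter is negative.
 */
static bool model_tryget(int32_t *refcnt)
{
	int32_t expected, old;

	assert(refcnt != NULL);
	expected = model_unbias(*refcnt);
	old = *refcnt;			/* value a cmpxchg would report */
	if (old != expected)
		return false;		/* counter is negative: give up */
	/* Wraps to INT32_MIN at INT32_MAX on two's-complement targets. */
	*refcnt = (int32_t)((uint32_t)expected + 1u);
	return true;
}
```

Once the counter has wrapped, every later `model_tryget()` call fails, so a caller that retries until success never returns. Dropping the reference when the child exits keeps the counter from ever reaching that point.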

Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Lu Jialin <lujialin4@huawei.com>
Reviewed-by: weiyang wang <wangweiyang2@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

filescontrol: silence suspicious RCU warning · e2b24a5a

Committed by zhangyi (F) on Oct 30, 2021

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4G4S5

---------------------------

files_fdtable() in files_cgroup_count_fds() should be invoked under
files_struct->file_lock, otherwise a suspicious RCU usage warning
triggers below when CONFIG_PROVE_RCU and CONFIG_LOCKDEP are enabled.

  =============================
  WARNING: suspicious RCU usage
  ...
  -----------------------------
  fs/filescontrol.c:96 suspicious rcu_dereference_check() usage!
  ...
  stack backtrace:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted
  4.19.36-cph920-32bitc-vmalloc-binder-debugon.aarch64 #6
  Call trace:
   dump_backtrace+0x0/0x198
   show_stack+0x24/0x30
   dump_stack+0xd0/0x11c
   lockdep_rcu_suspicious+0xcc/0x110
   files_cgroup_count_fds+0xc0/0xe0
   dup_fd+0x234/0x448
   copy_process.isra.2.part.3+0x698/0x1490
   _do_fork+0xe8/0x728
   kernel_thread+0x48/0x58
   rest_init+0x34/0x2a0
   start_kernel+0x52c/0x558

Although 'newf' is newly created and will not be released in
parallel, silence the warning anyway by taking the spin_lock around the call.

Fixes: 52cc1eccf6de ("cgroups: Resource controller for open files")
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Conflict:
	fs/file.c
Signed-off-by: Lu Jialin <lujialin4@huawei.com>
Reviewed-by: weiyang wang <wangweiyang2@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook() · e08c8d34

Committed by Wang Hai on Oct 30, 2021

mainline inclusion
from mainline-v5.14-rc4
commit 121dffe2
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4C0GB
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=121dffe20b141c9b27f39d49b15882469cbebae7

----------------------------------------------------------------------

When I use kfree_rcu() to free a large memory allocated by kmalloc_node(),
the following dump occurs.

  BUG: kernel NULL pointer dereference, address: 0000000000000020
  [...]
  Oops: 0000 [#1] SMP
  [...]
  Workqueue: events kfree_rcu_work
  RIP: 0010:__obj_to_index include/linux/slub_def.h:182 [inline]
  RIP: 0010:obj_to_index include/linux/slub_def.h:191 [inline]
  RIP: 0010:memcg_slab_free_hook+0x120/0x260 mm/slab.h:363
  [...]
  Call Trace:
    kmem_cache_free_bulk+0x58/0x630 mm/slub.c:3293
    kfree_bulk include/linux/slab.h:413 [inline]
    kfree_rcu_work+0x1ab/0x200 kernel/rcu/tree.c:3300
    process_one_work+0x207/0x530 kernel/workqueue.c:2276
    worker_thread+0x320/0x610 kernel/workqueue.c:2422
    kthread+0x13d/0x160 kernel/kthread.c:313
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

When kmalloc_node() allocates a large block of memory, a page is allocated
rather than a slab object. When that memory is freed via kfree_rcu(), it
should not be passed through memcg_slab_free_hook(), because
memcg_slab_free_hook() is only meant for slab objects.

Use page_objcgs_check() instead of page_objcgs() in
memcg_slab_free_hook() to fix this bug.
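
The difference between the two accessors can be shown with a toy model. The types and the `model_` functions below are hypothetical stand-ins, not the kernel definitions; the sketch only assumes that the lowest bit of `memcg_data` marks an attached objcg vector, as described elsewhere in this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; the real kernel types are far richer. */
struct obj_cgroup { int id; };
struct page { uintptr_t memcg_data; };

#define MEMCG_DATA_OBJCGS ((uintptr_t)1)	/* low bit: objcg vector attached */

/* Unchecked variant: trusts the caller that this is a slab page. */
static struct obj_cgroup **model_page_objcgs(const struct page *page)
{
	return (struct obj_cgroup **)(page->memcg_data & ~MEMCG_DATA_OBJCGS);
}

/* Checked variant: returns NULL unless the vector flag is set. */
static struct obj_cgroup **model_page_objcgs_check(const struct page *page)
{
	assert(page != NULL);
	if (!(page->memcg_data & MEMCG_DATA_OBJCGS))
		return NULL;
	return model_page_objcgs(page);
}
```

With the checked variant, a page that backs a large allocation yields NULL, so the free hook can return early instead of treating the page as a slab.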

Link: https://lkml.kernel.org/r/20210728145655.274476-1-wanghai38@huawei.com
Fixes: 270c6a71 ("mm: memcontrol/slab: Use helpers to access slab page's memcg_data")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

mm: memcontrol: move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM · 6c27b037

Committed by Muchun Song on Oct 30, 2021

mainline inclusion
from mainline-v5.13-rc1
commit bd290e1e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4C0GB
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bd290e1e75d8a8b2d87031b63db56ae165677870

----------------------------------------------------------------------

A page can only be marked as kmem when CONFIG_MEMCG_KMEM is enabled.
So move PageMemcgKmem() to the scope of CONFIG_MEMCG_KMEM.

As a bonus, on !CONFIG_MEMCG_KMEM builds some code can be compiled out.

Link: https://lkml.kernel.org/r/20210319163821.20704-8-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

mm: memcontrol: inline __memcg_kmem_{un}charge() into obj_cgroup_{un}charge_pages() · 4b7c53a1

Committed by Muchun Song on Oct 30, 2021

mainline inclusion
from mainline-v5.13-rc1
commit f1286fae
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4C0GB
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f1286fae540697e0b4713a8262f4aab5cf65f1c5

----------------------------------------------------------------------

There is only one user of __memcg_kmem_charge(), so manually inline
__memcg_kmem_charge() to obj_cgroup_charge_pages().  Similarly manually
inline __memcg_kmem_uncharge() into obj_cgroup_uncharge_pages() and call
obj_cgroup_uncharge_pages() in obj_cgroup_release().

This is just code cleanup without any functionality changes.

Link: https://lkml.kernel.org/r/20210319163821.20704-7-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Conflicts:
	mm/memcontrol.c
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

mm: memcontrol: use obj_cgroup APIs to charge kmem pages · 63472377

Committed by Muchun Song on Oct 30, 2021

mainline inclusion
from mainline-v5.13-rc1
commit b4e0b68f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4C0GB
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b4e0b68fbd9d1fd7e31cbe8adca3ad6cf556e2ee

----------------------------------------------------------------------

Since Roman's series "The new cgroup slab memory controller" was applied,
all slab objects are charged via the new APIs of obj_cgroup.  The new
APIs introduce a struct obj_cgroup to charge slab objects.  It prevents
long-living objects from pinning the original memory cgroup in
memory.  But there are still some corner-case objects (e.g.  allocations
larger than an order-1 page on SLUB) which are not charged via the new
APIs.  Those objects (including the pages which are allocated directly from
the buddy allocator) are charged as kmem pages, which still hold a
reference to the memory cgroup.

We want to reuse the obj_cgroup APIs to charge the kmem pages.  If we do
that, we should store an object cgroup pointer to page->memcg_data for
the kmem pages.

Finally, page->memcg_data will have 3 different meanings.

  1) For the slab pages, page->memcg_data points to an object cgroups
     vector.

  2) For the kmem pages (exclude the slab pages), page->memcg_data
     points to an object cgroup.

  3) For the user pages (e.g. the LRU pages), page->memcg_data points
     to a memory cgroup.
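
The three cases above can be sketched as a small decoder over the tagged word. This is an illustrative model only: the `model_` helpers and the enum are hypothetical, and placing MEMCG_DATA_KMEM in the second-lowest bit is an assumption of the sketch rather than something this commit message states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEMCG_DATA_OBJCGS	((uintptr_t)1 << 0)	/* objcg vector (slab) */
#define MEMCG_DATA_KMEM		((uintptr_t)1 << 1)	/* assumed bit position */
#define MODEL_FLAGS_MASK	(MEMCG_DATA_OBJCGS | MEMCG_DATA_KMEM)

enum model_kind {
	MODEL_NONE,		/* nothing charged */
	MODEL_OBJCGS_VECTOR,	/* 1) slab page */
	MODEL_OBJCG,		/* 2) non-slab kmem page */
	MODEL_MEMCG,		/* 3) user page, e.g. LRU */
};

/* Combine an aligned pointer with flag bits into one word. */
static uintptr_t model_make_memcg_data(const void *ptr, uintptr_t flags)
{
	/* The pointer must be aligned so that the flag bits are free. */
	assert(((uintptr_t)ptr & MODEL_FLAGS_MASK) == 0);
	return (uintptr_t)ptr | flags;
}

/* Decide which of the three meanings a memcg_data word carries. */
static enum model_kind model_classify(uintptr_t memcg_data)
{
	if (!memcg_data)
		return MODEL_NONE;
	if (memcg_data & MEMCG_DATA_OBJCGS)
		return MODEL_OBJCGS_VECTOR;
	if (memcg_data & MEMCG_DATA_KMEM)
		return MODEL_OBJCG;
	return MODEL_MEMCG;
}

/* Strip the flag bits to recover the stored pointer. */
static void *model_pointer(uintptr_t memcg_data)
{
	return (void *)(memcg_data & ~MODEL_FLAGS_MASK);
}
```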

We do not change the behavior of page_memcg() and page_memcg_rcu().  They
are also suitable for LRU pages and kmem pages.  Why?

Because memory allocations that pin memcgs for a long time exist at a
larger scale and cause recurring problems in the real world: page
cache doesn't get reclaimed for a long time, or is used by the second,
third, fourth, ...  instance of the same job that was restarted into a new
cgroup every time.  Unreclaimable dying cgroups pile up, waste memory, and
make page reclaim very inefficient.

We can convert LRU pages and most other raw memcg pins to the objcg
direction to fix this problem, and then the page->memcg will always point
to an object cgroup pointer.  At that time, LRU pages and kmem pages will
be treated the same.  The implementation of page_memcg() will remove the
kmem page check.

This patch aims to charge the kmem pages by using the new APIs of
obj_cgroup.  Finally, the page->memcg_data of the kmem page points to an
object cgroup.  We can use the __page_objcg() to get the object cgroup
associated with a kmem page.  Or we can use page_memcg() to get the memory
cgroup associated with a kmem page, but caller must ensure that the
returned memcg won't be released (e.g.  acquire the rcu_read_lock or
css_set_lock).

  Link: https://lkml.kernel.org/r/20210401030141.37061-1-songmuchun@bytedance.com

Link: https://lkml.kernel.org/r/20210319163821.20704-6-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
[songmuchun@bytedance.com: fix forget to obtain the ref to objcg in split_page_memcg]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Conflicts:
	include/linux/memcontrol.h
	mm/memcontrol.c
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

mm: memcontrol: change ug->dummy_page only if memcg changed · 66a3bf90

Committed by Muchun Song on Oct 30, 2021

mainline inclusion
from mainline-v5.13-rc1
commit 7ab345a8
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4C0GB
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7ab345a8973017c89a1be87b6c8722d1fee1fd95

----------------------------------------------------------------------

Just like the assignment to ug->memcg, we only need to update ug->dummy_page
if the memcg changed.  So move it there.  This is a very small
optimization.

Link: https://lkml.kernel.org/r/20210319163821.20704-5-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

mm: memcontrol: directly access page->memcg_data in mm/page_alloc.c · 1abb5714

Committed by Muchun Song on Oct 30, 2021

mainline inclusion
from mainline-v5.13-rc1
commit 48060834
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4C0GB
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=48060834f2277374bb68c04c62de8b57e769f701

----------------------------------------------------------------------

page_memcg() is not suitable for use by page_expected_state() and
page_bad_reason(), because it can BUG_ON() for slab pages when
CONFIG_DEBUG_VM is enabled.  As neither lru, nor kmem, nor slab pages
should have anything left in there by the time the page is freed, what
we care about is whether the value of page->memcg_data is 0.  So just
directly access page->memcg_data here.

Link: https://lkml.kernel.org/r/20210319163821.20704-4-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

mm: memcontrol: introduce obj_cgroup_{un}charge_pages · e3f73e77

Committed by Muchun Song on Oct 30, 2021

mainline inclusion
from mainline-v5.13-rc1
commit e74d2259
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4C0GB
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e74d225910ec3a9999f06934afa068b6a30babf8

----------------------------------------------------------------------

The unit of slab object charging is bytes, while the unit of kmem
page charging is PAGE_SIZE.  If we want to reuse obj_cgroup APIs to
charge the kmem pages, we should pass PAGE_SIZE (as the third parameter) to
obj_cgroup_charge().  Because the size is already PAGE_SIZE, we can skip
touching the objcg stock.  So obj_cgroup_{un}charge_pages() are introduced
to charge in units of pages.

In a later patch, we can also reuse those two helpers to charge or
uncharge a number of kernel pages to an object cgroup.  This is just a
code movement without any functional changes.
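
The distinction between byte-granular and page-granular charging can be illustrated with a toy accounting model. Everything below is a hypothetical sketch: `struct objcg_model`, the `model_` functions and the 4096-byte page size are assumptions made for illustration, and the stock handling is deliberately simplified compared with the kernel's per-CPU stock.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for the sketch */

struct objcg_model {
	unsigned long charged_pages;	/* pages charged to the cgroup */
	unsigned int stock_bytes;	/* pre-charged bytes not yet handed out */
};

/* Page-granular charge: no stock involved. */
static void model_charge_pages(struct objcg_model *o, unsigned int nr_pages)
{
	assert(o != NULL);
	o->charged_pages += nr_pages;
}

/* Page-granular uncharge, the inverse of the above. */
static void model_uncharge_pages(struct objcg_model *o, unsigned int nr_pages)
{
	assert(o != NULL && o->charged_pages >= nr_pages);
	o->charged_pages -= nr_pages;
}

/* Byte-granular charge: served from the stock, refilled in whole pages. */
static void model_charge_bytes(struct objcg_model *o, unsigned int size)
{
	assert(o != NULL);
	if (o->stock_bytes < size) {
		unsigned int missing = size - o->stock_bytes;
		unsigned int nr_pages =
			(missing + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;

		model_charge_pages(o, nr_pages);
		o->stock_bytes += nr_pages * MODEL_PAGE_SIZE;
	}
	o->stock_bytes -= size;
}
```

In this model a whole-page charge never changes `stock_bytes`, which mirrors the commit's point that the stock can be skipped when the size is already a multiple of the page size.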

Link: https://lkml.kernel.org/r/20210319163821.20704-3-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

mm: Convert page kmemcg type to a page memcg flag · 492cf0b0

Committed by Roman Gushchin on Oct 30, 2021

mainline inclusion
from mainline-v5.11-rc1
commit 18b2db3b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4C0GB
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=18b2db3b0385226b71cb3288474fa5a6e4a45474

----------------------------------------------------------------------

PageKmemcg flag is currently defined as a page type (like buddy, offline,
table and guard).  Semantically it means that the page was accounted as a
kernel memory by the page allocator and has to be uncharged on the
release.

As a side effect of defining the flag as a page type, the accounted page
can't be mapped to userspace (look at page_has_type() and comments above).
In particular, this blocks the accounting of vmalloc-backed memory used
by some bpf maps, because these maps do map the memory to userspace.

One option is to fix it by complicating the access to page->mapcount,
which provides some free bits for page->page_type.

But it's way better to move this flag into page->memcg_data flags.
Indeed, the flag makes no sense without enabled memory cgroups and memory
cgroup pointer set in particular.

This commit replaces PageKmemcg() and __SetPageKmemcg() with
PageMemcgKmem() and an open-coded OR operation setting the memcg pointer
with the MEMCG_DATA_KMEM bit.  __ClearPageKmemcg() can simply be deleted,
as the whole memcg_data is zeroed at once.

As a bonus, on !CONFIG_MEMCG build the PageMemcgKmem() check will be
compiled out.

Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Link: https://lkml.kernel.org/r/20201027001657.3398190-5-guro@fb.com
Link: https://lore.kernel.org/bpf/20201201215900.3569844-5-guro@fb.com

Conflicts:
	mm/memcontrol.c
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

mm: Introduce page memcg flags · bfc245bc

Committed by Roman Gushchin on Oct 30, 2021

mainline inclusion
from mainline-v5.11-rc1
commit 87944e29
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4C0GB
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=87944e2992bd28098c6806086a1e96bb4d0e502b

----------------------------------------------------------------------

The lowest bit in page->memcg_data is used to distinguish between a struct
mem_cgroup pointer and a pointer to an objcgs array.  All checks and
modifications of this bit are open-coded.

Let's formalize it using page memcg flags, defined in enum
page_memcg_data_flags.

Additional flags might be added later.

Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Link: https://lkml.kernel.org/r/20201027001657.3398190-4-guro@fb.com
Link: https://lore.kernel.org/bpf/20201201215900.3569844-4-guro@fb.com
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

mm: memcontrol/slab: Use helpers to access slab page's memcg_data · dbaf3609

Committed by Roman Gushchin on Oct 30, 2021

mainline inclusion
from mainline-v5.11-rc1
commit 270c6a71
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4C0GB
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=270c6a71460e12b07b1dcadf7457ff95b6c6e8f4

----------------------------------------------------------------------

To gather all direct accesses to struct page's memcg_data field in one
place, let's introduce 3 new helpers to use in the slab accounting code:

  struct obj_cgroup **page_objcgs(struct page *page);
  struct obj_cgroup **page_objcgs_check(struct page *page);
  bool set_page_objcgs(struct page *page, struct obj_cgroup **objcgs);

They are similar to the corresponding API for generic pages, except that
the setter can return false, indicating that the value has been already
set from a different thread.
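
The "setter can return false" behaviour amounts to a compare-and-swap against an empty slot. The sketch below uses C11 atomics in user space; `struct page_model` and `model_set_page_objcgs()` are hypothetical names, and the kernel uses its own atomic primitives rather than `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct obj_cgroup { int id; };			/* toy stand-in */
struct page_model { _Atomic uintptr_t memcg_data; };

#define MEMCG_DATA_OBJCGS ((uintptr_t)1)	/* low bit: objcg vector */

/*
 * Install the vector only if memcg_data is still empty. Returns false
 * when another thread has already installed a value.
 */
static bool model_set_page_objcgs(struct page_model *page,
				  struct obj_cgroup **objcgs)
{
	uintptr_t expected = 0;

	/* The vector must be aligned so that the flag bit is free. */
	assert(((uintptr_t)objcgs & MEMCG_DATA_OBJCGS) == 0);
	return atomic_compare_exchange_strong(&page->memcg_data, &expected,
					      (uintptr_t)objcgs | MEMCG_DATA_OBJCGS);
}
```

A caller that loses the race would typically free its own freshly allocated vector and use the one that was installed first.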

Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lkml.kernel.org/r/20201027001657.3398190-3-guro@fb.com
Link: https://lore.kernel.org/bpf/20201201215900.3569844-3-guro@fb.com
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

mm: memcontrol: Use helpers to read page's memcg data · d1b942b7

Committed by Roman Gushchin on Oct 30, 2021

mainline inclusion
from mainline-v5.11-rc1
commit bcfe06bf
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4C0GB
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bcfe06bf2622f7c4899468e427683aec49070687

----------------------------------------------------------------------

Patch series "mm: allow mapping accounted kernel pages to userspace", v6.

Currently a non-slab kernel page which has been charged to a memory cgroup
can't be mapped to userspace.  The underlying reason is simple: PageKmemcg
flag is defined as a page type (like buddy, offline, etc), so it takes a
bit from a page->mapped counter.  Pages with a type set can't be mapped to
userspace.

But in general the kmemcg flag has nothing to do with mapping to
userspace.  It only means that the page has been accounted by the page
allocator, so it has to be properly uncharged on release.

Some bpf maps are mapping the vmalloc-based memory to userspace, and their
memory can't be accounted because of this implementation detail.

This patchset removes this limitation by moving the PageKmemcg flag into
one of the free bits of the page->mem_cgroup pointer.  Also it formalizes
accesses to the page->mem_cgroup and page->obj_cgroups using new helpers,
adds several checks and removes a couple of obsolete functions.  As a
result the code becomes more robust with fewer open-coded bit tricks.

This patch (of 4):

Currently there are many open-coded reads of the page->mem_cgroup pointer,
as well as a couple of read helpers, which are barely used.

This creates an obstacle to reusing some bits of the pointer for
storing additional bits of information.  In fact, we already do this for
slab pages, where the last bit indicates that a pointer has an attached
vector of objcg pointers instead of a regular memcg pointer.

This commit uses 2 existing helpers and introduces a new helper to
convert all read sides to calls of these helpers:
  struct mem_cgroup *page_memcg(struct page *page);
  struct mem_cgroup *page_memcg_rcu(struct page *page);
  struct mem_cgroup *page_memcg_check(struct page *page);

page_memcg_check() is intended to be used in cases when the page can be a
slab page and have a memcg pointer pointing at objcg vector.  It does
check the lowest bit, and if set, returns NULL.  page_memcg() contains a
VM_BUG_ON_PAGE() check for the page not being a slab page.

To make sure nobody uses a direct access, struct page's
mem_cgroup/obj_cgroups is converted to unsigned long memcg_data.

Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Link: https://lkml.kernel.org/r/20201027001657.3398190-1-guro@fb.com
Link: https://lkml.kernel.org/r/20201027001657.3398190-2-guro@fb.com
Link: https://lore.kernel.org/bpf/20201201215900.3569844-2-guro@fb.com

Conflicts:
	mm/memcontrol.c
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

scsi: spfc: initial commit the spfc module · dff67aa5

Committed by Yanling Song on Oct 30, 2021

Ramaxel inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4DBD7
CVE: NA

Initial commit of the spfc module for the Ramaxel Super FC adapter.

Signed-off-by: Yanling Song <songyl@ramaxel.com>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

mmap: userswap: fix some format issues · 7a99cdfb

Committed by Xiongfeng Wang on Oct 30, 2021

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4AHP2
CVE: NA

-------------------------------------------------

Fix some format issues in mm/mmap.c.

This patch also fixes the wrong address range passed to
mmu_notifier_range_init() in do_user_swap().

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

mmap: userswap: fix memory leak in do_mmap · 653e67ab

Committed by Xiongfeng Wang on Oct 30, 2021

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4AHP2
CVE: NA

-------------------------------------------------

When userswap is enabled, the memory pointed to by 'pages' is not freed on
the error path in do_mmap(). To fix the issue and keep do_mmap() mostly
unchanged, we rename do_mmap() to __do_mmap() and move the memory
allocation and free code out of __do_mmap(). When __do_mmap() returns an
error value, we jump to the error label to free the memory.
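
The wrapper pattern described above can be sketched in user space. The names `model_do_mmap()` and `model_inner_mmap()` and the two counters are hypothetical; the sketch also assumes the buffer is only needed for the duration of the call, so both paths release it, whereas the real patch may hand the buffer on in the success case.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int live_buffers;	/* outstanding allocations in this model */
static int installed;		/* successful mappings in this model */

/* Stand-in for the inner worker: it may fail and never frees 'pages'. */
static long model_inner_mmap(void *pages, int fail)
{
	assert(pages != NULL);
	if (fail)
		return -EINVAL;
	installed++;
	return 0x1000;		/* pretend mapping address */
}

/* Wrapper owning the buffer: allocate, call the worker, always release. */
static long model_do_mmap(int fail)
{
	long ret;
	void *pages = malloc(64);

	if (!pages)
		return -ENOMEM;
	live_buffers++;

	ret = model_inner_mmap(pages, fail);
	if (ret < 0)
		goto out_free;
	/* Success path: further work using 'pages' would go here. */

out_free:
	free(pages);
	live_buffers--;
	return ret;
}
```

Keeping allocation and release in the same function means the inner worker can return from any of its error branches without needing its own cleanup.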

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

perf stat: Add --quiet option · 67e65dde

Committed by Andi Kleen on Oct 30, 2021

mainline inclusion
from mainline-5.11
commit 55a4de94
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4CMQA
CVE: NA

--------------------------------

Add a new --quiet option to 'perf stat'. This is useful with 'perf stat
record' to write the data only to the perf.data file, which can lower
measurement overhead because the data doesn't need to be formatted.

On my 4C desktop:

  % time ./perf stat record  -e $(python -c 'print ",".join(["cycles"]*1000)')  -a -I 1000 sleep 5
  ...
  real    0m5.377s
  user    0m0.238s
  sys     0m0.452s
  % time ./perf stat record --quiet -e $(python -c 'print ",".join(["cycles"]*1000)')  -a -I 1000 sleep 5

  real    0m5.452s
  user    0m0.183s
  sys     0m0.423s

In this example it cuts the user time by 20%. On systems with more cores
the savings are higher.

Signed-off-by: Andi Kleen <andi@firstfloor.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Link: http://lore.kernel.org/lkml/20201027002737.30942-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: yin-xiujiang <yinxiujiang@kylinos.cn>
Reviewed-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Yang Jihong <yangjihong1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

67e65dde

Oct 21, 2021 · 20 commits

net: dsa: bcm_sf2: Fix array overrun in bcm_sf2_num_active_ports() · 3d7719ea

Committed by Florian Fainelli on Oct 21, 2021

stable inclusion
from stable-5.10.68
commit a23d3576215f7447c547976817b33cb975ecec84
bugzilla: 182671 https://gitee.com/openeuler/kernel/issues/I4EWUH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a23d3576215f7447c547976817b33cb975ecec84

--------------------------------

commit 02319bf1 upstream.

After d12e1c46 ("net: dsa: b53: Set correct number of ports in the
DSA struct") we stopped setting dsa_switch::num_ports to DSA_MAX_PORTS,
which created an off-by-one mismatch between num_ports and the statically
allocated bcm_sf2_priv::port_sts array (of size DSA_MAX_PORTS). When
dsa_is_cpu_port() is used, we end up accessing an out-of-bounds member
and causing a NULL pointer dereference (NPD).

Fix this by iterating with the appropriate port count using
ds->num_ports.
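
A minimal sketch of the loop-bound change, under stated assumptions: the types and helpers here (sketch_switch, sketch_to_port(), sketch_num_active_ports()) are hypothetical simplifications, and SKETCH_MAX_PORTS merely stands in for DSA_MAX_PORTS. The point it illustrates is that the loop is bounded by the number of ports the switch reports, not by the size of the statically allocated array.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_PORTS 12	/* hypothetical stand-in for DSA_MAX_PORTS */

struct sketch_port {
	bool is_cpu;
	bool enabled;
};

struct sketch_switch {
	unsigned int num_ports;	/* ports that actually exist */
	struct sketch_port ports[SKETCH_MAX_PORTS];
};

/* Returns NULL for ports the switch does not have, like a failed lookup. */
static const struct sketch_port *
sketch_to_port(const struct sketch_switch *sw, unsigned int p)
{
	return p < sw->num_ports ? &sw->ports[p] : NULL;
}

/* Count enabled non-CPU ports, bounded by num_ports, not the array size. */
static unsigned int sketch_num_active_ports(const struct sketch_switch *sw)
{
	unsigned int port, count = 0;

	for (port = 0; port < sw->num_ports; port++) {
		const struct sketch_port *dp = sketch_to_port(sw, port);

		if (dp->is_cpu)
			continue;
		if (dp->enabled)
			count++;
	}
	return count;
}
```

Had the loop run to SKETCH_MAX_PORTS instead, the lookup would return NULL for the trailing indices and the dereference of dp would fault.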

Fixes: d12e1c46 ("net: dsa: b53: Set correct number of ports in the DSA struct")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

3d7719ea

bnxt_en: Fix error recovery regression · d1840366

Committed by Michael Chan on Oct 21, 2021

stable inclusion
from stable-5.10.68
commit 9f2972e151dd16d3286c1407bec4e66395f30135
bugzilla: 182671 https://gitee.com/openeuler/kernel/issues/I4EWUH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9f2972e151dd16d3286c1407bec4e66395f30135

--------------------------------

commit eca4cf12 upstream.

The recent patch has introduced a regression by not reading the reset
count in the ERROR_RECOVERY async event handler.  We may have just
gone through a reset and the reset count has just incremented.  If
we don't update the reset count in the ERROR_RECOVERY event handler,
the health check timer will see that the reset count has changed and
will initiate an unintended reset.

Restore the unconditional update of the reset count in
bnxt_async_event_process() if error recovery watchdog is enabled.
Also, update the reset count at the end of the reset sequence to
make it even more robust.

Fixes: 1b2b9183 ("bnxt_en: Fix possible unintended driver initiated error recovery")
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

d1840366

x86/mce: Avoid infinite loop for copy from user recovery · c6a9d0e7

Committed by Tony Luck on Oct 21, 2021

stable inclusion
from stable-5.10.68
commit 619d747c1850bab61625ca9d8b4730f470a5947b
bugzilla: 182671 https://gitee.com/openeuler/kernel/issues/I4EWUH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=619d747c1850bab61625ca9d8b4730f470a5947b

--------------------------------

commit 81065b35 upstream.

There are two cases for machine check recovery:

1) The machine check was triggered by ring3 (application) code.
   This is the simpler case. The machine check handler simply queues
   work to be executed on return to user. That code unmaps the page
   from all users and arranges to send a SIGBUS to the task that
   triggered the poison.

2) The machine check was triggered in kernel code that is covered by
   an exception table entry. In this case the machine check handler
   still queues a work entry to unmap the page, etc. but this will
   not be called right away because the #MC handler returns to the
   fix up code address in the exception table entry.

Problems occur if the kernel triggers another machine check before the
return to user processes the first queued work item.

Specifically, the work is queued using the ->mce_kill_me callback
structure in the task struct for the current thread. Attempting to queue
a second work item using this same callback results in a loop in the
linked list of work functions to call. So when the kernel does return to
user, it enters an infinite loop processing the same entry forever.

There are some legitimate scenarios where the kernel may take a second
machine check before returning to the user.

1) Some code (e.g. futex) first tries a get_user() with page faults
   disabled. If this fails, the code retries with page faults enabled
   expecting that this will resolve the page fault.

2) Copy from user code retries a copy in byte-at-time mode to check
   whether any additional bytes can be copied.

On the other side of the fence are some bad drivers that do not check
the return value from individual get_user() calls and may access
multiple user addresses without noticing that some/all calls have
failed.

Fix by adding a counter (current->mce_count) to keep track of repeated
machine checks before task_work() is called. First machine check saves
the address information and calls task_work_add(). Subsequent machine
checks before that task_work call back is executed check that the address
is in the same page as the first machine check (since the callback will
offline exactly one page).

Expected worst case is four machine checks before moving on (e.g. one
user access with page faults disabled, then a repeat to the same address
with page faults enabled ... repeat in copy tail bytes). Just in case
there is some code that loops forever, enforce a limit of 10.
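
The counter logic described above can be sketched in plain C as follows. This is a simplified model, not the kernel code: struct sketch_task, the enum values, and the 4 KiB page size are assumptions made for illustration, and the two "reject" outcomes only mark where the kernel would refuse to continue (the bracketed note about panic messages hints at how, but the details are not given here).

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumes 4 KiB pages */
#define SKETCH_MCE_LIMIT  10	/* limit named in the commit message */

enum sketch_action {
	SKETCH_QUEUE_WORK,		/* first machine check: queue the work */
	SKETCH_ALREADY_QUEUED,		/* repeat on the same page: add nothing */
	SKETCH_REJECT_TOO_MANY,		/* more repeats than the limit */
	SKETCH_REJECT_OTHER_PAGE	/* repeat hit a different page */
};

struct sketch_task {
	int mce_count;		/* machine checks since the work last ran */
	uint64_t mce_addr;	/* address saved by the first machine check */
};

static enum sketch_action
sketch_queue_task_work(struct sketch_task *t, uint64_t addr)
{
	int count = ++t->mce_count;

	/* First call: save the details the queued work will need. */
	if (count == 1)
		t->mce_addr = addr;

	if (count > SKETCH_MCE_LIMIT)
		return SKETCH_REJECT_TOO_MANY;

	/* Later calls must hit the page recorded by the first call. */
	if (count > 1 &&
	    (t->mce_addr >> SKETCH_PAGE_SHIFT) != (addr >> SKETCH_PAGE_SHIFT))
		return SKETCH_REJECT_OTHER_PAGE;

	/* Never queue the same work item twice. */
	if (count > 1)
		return SKETCH_ALREADY_QUEUED;

	return SKETCH_QUEUE_WORK;
}

/* Called once the queued work has run; the next event starts afresh. */
static void sketch_task_work_done(struct sketch_task *t)
{
	t->mce_count = 0;
}
```

Because only the first call queues the work, the list of pending work items can never contain the same entry twice, which is what caused the infinite loop.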

 [ bp: Massage commit message, drop noinstr, fix typo, extend panic
   messages. ]

Fixes: 5567d11c ("x86/mce: Send #MC singal from task work")
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/YT/IJ9ziLqmtqEPu@agluck-desk2.amr.corp.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

c6a9d0e7

net: renesas: sh_eth: Fix freeing wrong tx descriptor · 205de40b

Committed by Yoshihiro Shimoda on Oct 21, 2021

stable inclusion
from stable-5.10.68
commit 47bc9c3929eb0bafe805bf87474615c8596d16bc
bugzilla: 182671 https://gitee.com/openeuler/kernel/issues/I4EWUH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=47bc9c3929eb0bafe805bf87474615c8596d16bc

--------------------------------

[ Upstream commit 0341d5e3 ]

The cur_tx counter must be incremented only after the TACT bit of
txdesc->status has been set. However, a CPU may reorder
instructions and/or memory accesses between cur_tx and
txdesc->status. If a TX interrupt arrives in that window,
sh_eth_tx_free() may free the wrong descriptor.
So, add wmb() before cur_tx++.
Otherwise a NETDEV WATCHDOG timeout may occur.
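
The ordering requirement can be illustrated with a userspace analogue built on C11 atomics. This is a sketch under stated assumptions, not the driver code: the ring layout, SKETCH_TACT, and both function names are hypothetical, and atomic_thread_fence(memory_order_release) is used here as a stand-in for the kernel's wmb().

```c
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_RING_SIZE 4
#define SKETCH_TACT 0x80000000u	/* hypothetical "descriptor active" bit */

struct sketch_ring {
	uint32_t status[SKETCH_RING_SIZE];
	atomic_uint cur_tx;	/* next descriptor the producer will fill */
};

/* Producer: mark the descriptor active, then publish it via cur_tx. */
static void sketch_start_xmit(struct sketch_ring *r)
{
	unsigned int entry =
		atomic_load_explicit(&r->cur_tx, memory_order_relaxed);

	r->status[entry % SKETCH_RING_SIZE] |= SKETCH_TACT;

	/* Stand-in for wmb(): the status store is ordered before cur_tx. */
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&r->cur_tx, entry + 1, memory_order_relaxed);
}

/* Consumer: a descriptor below cur_tx is expected to carry the bit. */
static int sketch_published_is_active(struct sketch_ring *r,
				      unsigned int entry)
{
	unsigned int cur =
		atomic_load_explicit(&r->cur_tx, memory_order_acquire);

	if (entry >= cur)
		return 0;
	return (r->status[entry % SKETCH_RING_SIZE] & SKETCH_TACT) != 0;
}
```

Without the fence, the consumer could observe the incremented counter while the status word still lacks the active bit, which is the window the commit message describes.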

Fixes: 86a74ff2 ("net: sh_eth: add support for Renesas SuperH Ethernet")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

205de40b

mfd: lpc_sch: Rename GPIOBASE to prevent build error · 16ae8613

Committed by Randy Dunlap on Oct 21, 2021

stable inclusion
from stable-5.10.68
commit b2f9b7455baf8ba12113520cdf0398b6141ae42f
bugzilla: 182671 https://gitee.com/openeuler/kernel/issues/I4EWUH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b2f9b7455baf8ba12113520cdf0398b6141ae42f

--------------------------------

[ Upstream commit cdff1eda ]

One MIPS platform (mach-rc32434) defines GPIOBASE. This macro
conflicts with one of the same name in lpc_sch.c. Rename the latter one
to prevent the build error.

../drivers/mfd/lpc_sch.c:25: error: "GPIOBASE" redefined [-Werror]
   25 | #define GPIOBASE        0x44
../arch/mips/include/asm/mach-rc32434/rb.h:32: note: this is the location of the previous definition
   32 | #define GPIOBASE        0x050000

Cc: Denis Turischev <denis@compulab.co.il>
Fixes: e82c60ae ("mfd: Introduce lpc_sch for Intel SCH LPC bridge")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

16ae8613

mfd: lpc_sch: Partially revert "Add support for Intel Quark X1000" · 0423c2ad

Committed by Andy Shevchenko on Oct 21, 2021

stable inclusion
from stable-5.10.68
commit 027c44b8c8e4140d8dc1633fb32586ab07f1ed3a
bugzilla: 182671 https://gitee.com/openeuler/kernel/issues/I4EWUH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=027c44b8c8e4140d8dc1633fb32586ab07f1ed3a

--------------------------------

[ Upstream commit 922e8ce8 ]

The IRQ support for SCH GPIO is not specific to the Intel Quark SoC.
Moreover, the IRQ routing there is quite interesting, so while it
needs special support, the driver does not have it yet anyway.

For these reasons, remove the essentially redundant IRQ support code.

This reverts commit ec689a8a.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

0423c2ad

bnxt_en: Fix possible unintended driver initiated error recovery · 4de9eb20

Committed by Michael Chan on Oct 21, 2021

stable inclusion
from stable-5.10.68
commit 52a7e6667133553a51f93076f96c9294314ae44f
bugzilla: 182671 https://gitee.com/openeuler/kernel/issues/I4EWUH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=52a7e6667133553a51f93076f96c9294314ae44f

--------------------------------

[ Upstream commit 1b2b9183 ]

If error recovery is already enabled, bnxt_timer() will periodically
check the heartbeat register and the reset counter.  If we get an
error recovery async. notification from the firmware (e.g. change in
primary/secondary role), we will immediately read and update the
heartbeat register and the reset counter.  If the timer for the next
health check expires soon after this, we may read the heartbeat register
again in quick succession and find that it hasn't changed.  This will
trigger error recovery unintentionally.

The likelihood is small because we also reset fw_health->tmr_counter
which will reset the interval for the next health check.  But the
update is not protected and bnxt_timer() can miss the update and
perform the health check without waiting for the full interval.

Fix it by only reading the heartbeat register and reset counter in
bnxt_async_event_process() if error recovery is transitioning to the
enabled state.  Also add proper memory barriers so that when enabling
for the first time, bnxt_timer() will see the tmr_counter interval and
perform the health check after the full interval has elapsed.

Fixes: 7e914027 ("bnxt_en: Enable health monitoring.")
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

4de9eb20

bnxt_en: Improve logging of error recovery settings information. · 480a39f7

Committed by Michael Chan on Oct 21, 2021

stable inclusion
from stable-5.10.68
commit 9a3f52f73c04dd8dddf9209fbf555f530b315ec3
bugzilla: 182671 https://gitee.com/openeuler/kernel/issues/I4EWUH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9a3f52f73c04dd8dddf9209fbf555f530b315ec3

--------------------------------

[ Upstream commit f4d95c3c ]

We currently only log the error recovery settings if it is enabled.
In some cases, firmware disables error recovery after it was
initially enabled.  Without logging anything, the user will not be
aware of this change in setting.

Log it when error recovery is disabled.  Also, change the reset count
value from hexadecimal to decimal.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

480a39f7

bnxt_en: Convert to use netif_level() helpers. · 5c58d46b

Committed by Michael Chan on Oct 21, 2021

stable inclusion
from stable-5.10.68
commit 639a2eddb7310bf6e76377a2e5610184712204a3
bugzilla: 182671 https://gitee.com/openeuler/kernel/issues/I4EWUH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=639a2eddb7310bf6e76377a2e5610184712204a3

--------------------------------

[ Upstream commit 871127e6 ]

Use the various netif_level() helpers to simplify the C code.  This was
suggested by Joe Perches.

Cc: Joe Perches <joe@perches.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/1611642024-3166-1-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

5c58d46b

bnxt_en: Consolidate firmware reset event logging. · cc0ca2d5

Committed by Michael Chan on Oct 21, 2021

stable inclusion
from stable-5.10.68
commit 01cad477a96834c16f76019698aac57c779893b4
bugzilla: 182671 https://gitee.com/openeuler/kernel/issues/I4EWUH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=01cad477a96834c16f76019698aac57c779893b4

--------------------------------

[ Upstream commit 5863b10a ]

Combine the three netdev_warn() calls into a single call, printed at
the NETIF_MSG_HW log level.
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

cc0ca2d5

bnxt_en: log firmware debug notifications · 5e58d42c

Committed by Edwin Peer on Oct 21, 2021

stable inclusion
from stable-5.10.68
commit fad75e046363ebf37c42c04dc8e3f68e0ed4c130
bugzilla: 182671 https://gitee.com/openeuler/kernel/issues/I4EWUH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=fad75e046363ebf37c42c04dc8e3f68e0ed4c130

--------------------------------

[ Upstream commit a44daa8f ]

Firmware is capable of generating asynchronous debug notifications.
The event data is opaque to the driver and is simply logged. Debug
notifications can be enabled by turning on hardware status messages
using the ethtool msglvl interface.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

5e58d42c

bnxt_en: Fix asic.rev in devlink dev info command · 9a675b6d

Committed by Michael Chan on Oct 21, 2021

stable inclusion
from stable-5.10.68
commit f90a34fabaa5f6c62fb0dcf4990c10ad84678eff
bugzilla: 182671 https://gitee.com/openeuler/kernel/issues/I4EWUH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f90a34fabaa5f6c62fb0dcf4990c10ad84678eff

--------------------------------

[ Upstream commit 6fdab8a3 ]

The current asic.rev is incomplete and does not include the metal
revision.  Add the metal revision and decode the complete asic
revision into the more common and readable form (A0, B0, etc).
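
One way such a decode could look is sketched here. The function name and the mapping (revision 0 becomes 'A', followed by the metal revision as a decimal digit) are assumptions made for illustration; the commit message only states that the result takes the form A0, B0, and so on.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format a chip revision plus metal revision as e.g. "A0" or "B1".
 * Returns the snprintf() result, or -1 if no letter is available.
 */
static int sketch_format_asic_rev(char *buf, size_t len,
				  unsigned int chip_rev,
				  unsigned int chip_metal)
{
	if (chip_rev > 25)
		return -1;	/* past 'Z' */
	return snprintf(buf, len, "%c%u", 'A' + (int)chip_rev, chip_metal);
}
```

With this mapping, a chip revision of 1 and a metal revision of 0 would be reported as "B0".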

Fixes: 7154917a ("bnxt_en: Refactor bnxt_dl_info_get().")
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

9a675b6d

bnxt_en: fix stored FW_PSID version masks · 0c9a39b2

Committed by Edwin Peer on Oct 21, 2021

stable inclusion
from stable-5.10.68
commit 72450231845518503833b8cc1892f44e59dd10e3
bugzilla: 182671 https://gitee.com/openeuler/kernel/issues/I4EWUH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=72450231845518503833b8cc1892f44e59dd10e3

--------------------------------

[ Upstream commit 1656db67 ]

The FW_PSID version components are 8 bits wide, not 4.
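
The effect of the mask width can be shown with a small sketch. The helper names and the packed example value are hypothetical; the exact layout of the stored version word is not given in the commit message, so only the 8-bit versus 4-bit difference is illustrated.

```c
#include <stdint.h>

/* Extract the 8-bit component that starts at bit position `shift`. */
static unsigned int sketch_ver_field(uint32_t ver, unsigned int shift)
{
	return (ver >> shift) & 0xffu;
}

/* The too-narrow variant, kept only to show what a 4-bit mask loses. */
static unsigned int sketch_ver_field_4bit(uint32_t ver, unsigned int shift)
{
	return (ver >> shift) & 0xfu;
}
```

For a packed value of 0x00011403, the middle component is 0x14 (20) with the 8-bit mask but is truncated to 4 with the 4-bit mask.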

Fixes: db28b6c7 ("bnxt_en: Fix devlink info's stored fw.psid version format.")
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

0c9a39b2

net: dsa: b53: Fix IMP port setup on BCM5301x · 26d2f875

Committed by Rafał Miłecki on Oct 21, 2021

stable inclusion
from stable-5.10.68
commit eb635e008cb1d9845e19ac5910c82ff77e660306
bugzilla: 182671 https://gitee.com/openeuler/kernel/issues/I4EWUH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=eb635e008cb1d9845e19ac5910c82ff77e660306

--------------------------------

[ Upstream commit 63f8428b ]

Broadcom's b53 switches have one IMP (Inband Management Port) that needs
to be programmed using its own dedicated register. The IMP port may
differ from the CPU port, especially on devices with multiple CPU ports.

For that reason it's required to explicitly note IMP port index and
check for it when choosing a register to use.

This commit fixes BCM5301x support. Those switches use CPU port 5 while
their IMP port is 8. Before this patch b53 was trying to program port 5
with B53_PORT_OVERRIDE_CTRL instead of B53_GMII_PORT_OVERRIDE_CTRL(5).
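
The register choice described above can be sketched as a comparison against the IMP port index instead of the CPU port index. The struct, the function name, and the register offsets here are placeholders for illustration and should not be read as the driver's actual definitions.

```c
/* Placeholder offsets standing in for the two override registers. */
#define SKETCH_PORT_OVERRIDE_CTRL		0x0e
#define SKETCH_GMII_PORT_OVERRIDE_CTRL(p)	(0x58 + (p))

struct sketch_b53 {
	unsigned int cpu_port;	/* port connected to the CPU */
	unsigned int imp_port;	/* in-band management port */
};

/* Only the IMP port uses the dedicated register; others use per-port ones. */
static unsigned int sketch_override_reg(const struct sketch_b53 *dev,
					unsigned int port)
{
	if (port == dev->imp_port)
		return SKETCH_PORT_OVERRIDE_CTRL;
	return SKETCH_GMII_PORT_OVERRIDE_CTRL(port);
}
```

With cpu_port = 5 and imp_port = 8, as on BCM5301x, port 5 now selects the per-port GMII register and only port 8 selects the IMP register.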

It may be possible to also replace "cpu_port" usages with
dsa_is_cpu_port() but that is out of the scope of this BCM5301x fix.

Fixes: 967dd82f ("net: dsa: b53: Add support for Broadcom RoboSwitch")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

26d2f875

ip_gre: validate csum_start only on pull · 6f4126b2

Committed by Willem de Bruijn on Oct 21, 2021

stable inclusion
from stable-5.10.68
commit 87b34cd6485192777f632f92d592f2a71d8801a6
bugzilla: 182671 https://gitee.com/openeuler/kernel/issues/I4EWUH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=87b34cd6485192777f632f92d592f2a71d8801a6

--------------------------------

[ Upstream commit 8a0ed250 ]

The GRE tunnel device can pull existing outer headers in ipgre_xmit.
This is a rare path, apparently unique to this device. The below
commit ensured that pulling does not move skb->data beyond csum_start.

But it has a false positive if ip_summed is not CHECKSUM_PARTIAL and
thus csum_start is irrelevant.

Refine to exclude this. At the same time simplify and strengthen the
test.

Simplify, by moving the check next to the offending pull, making it
more self documenting and removing an unnecessary branch from other
code paths.

Strengthen, by also ensuring that the transport header is correct and
therefore the inner headers will be after skb_reset_inner_headers.
The transport header is set to csum_start in skb_partial_csum_set.

Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/
Fixes: 1d011c48 ("ip_gre: add validation for csum_start")
Reported-by: Ido Schimmel <idosch@idosch.org>
Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

6f4126b2

qlcnic: Remove redundant unlock in qlcnic_pinit_from_rom · 08967c94

Committed by Dinghao Liu on Oct 21, 2021

stable inclusion
from stable-5.10.68
commit 9c98d2bd143420a26d3ba1b096326f28247a058d
bugzilla: 182671 https://gitee.com/openeuler/kernel/issues/I4EWUH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9c98d2bd143420a26d3ba1b096326f28247a058d

--------------------------------

[ Upstream commit 9ddbc2a0 ]

Previous commit 68233c58 removed the qlcnic_rom_lock() call
in qlcnic_pinit_from_rom() but left its corresponding
unlock call in place, which is odd. I'm not sure whether the
lock is missing or the unlock is redundant. This bug was
flagged by a static analysis tool; please advise.

Fixes: 68233c58 ("qlcnic: updated reset sequence")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

08967c94

fq_codel: reject silly quantum parameters · aee0bcbc

Committed by Eric Dumazet on Oct 21, 2021

stable inclusion
from stable-5.10.68
commit 8c01c620ae61142760d8924ee96a3ff8eec00aa6
bugzilla: 182671 https://gitee.com/openeuler/kernel/issues/I4EWUH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8c01c620ae61142760d8924ee96a3ff8eec00aa6

--------------------------------

[ Upstream commit c7c5e6ff ]

syzbot found that forcing a big quantum attribute would crash hosts fast,
essentially using this:

tc qd replace dev eth0 root fq_codel quantum 4294967295

This is because fq_codel_dequeue() would have to loop
~2^31 times in:

	if (flow->deficit <= 0) {
		flow->deficit += q->quantum;
		list_move_tail(&flow->flowchain, &q->old_flows);
		goto begin;
	}

The SFQ max quantum is 2^19 (half a megabyte).
Let's adopt a max quantum of one megabyte for FQ_CODEL.
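
A minimal sketch of such a bound check, assuming a hypothetical helper name and constant: SKETCH_QUANTUM_MAX encodes the one-megabyte limit from the text, and the use of -EINVAL for oversized values is an assumption about how a rejection might be reported.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_QUANTUM_MAX (1u << 20)	/* one megabyte, per the text above */

/* Returns 0 and stores the accepted quantum, or -EINVAL if too large. */
static int sketch_check_quantum(uint32_t requested, uint32_t *accepted)
{
	if (requested > SKETCH_QUANTUM_MAX)
		return -EINVAL;
	*accepted = requested;
	return 0;
}
```

Under this check, the value 4294967295 from the tc command above would be refused before it could ever reach the dequeue loop.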

Fixes: 4b549a2e ("fq_codel: Fair Queue Codel AQM")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

aee0bcbc

netfilter: socket: icmp6: fix use-after-scope · 9844020f

Committed by Benjamin Hesmans on Oct 21, 2021

stable inclusion
from stable-5.10.68
commit 6e2d36f2b1d19bee1096a53e6cbc9e0c0f0b05b4
bugzilla: 182671 https://gitee.com/openeuler/kernel/issues/I4EWUH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=6e2d36f2b1d19bee1096a53e6cbc9e0c0f0b05b4

--------------------------------

[ Upstream commit 730affed ]

Bug reported by KASAN:

BUG: KASAN: use-after-scope in inet6_ehashfn (net/ipv6/inet6_hashtables.c:40)
Call Trace:
(...)
inet6_ehashfn (net/ipv6/inet6_hashtables.c:40)
(...)
nf_sk_lookup_slow_v6 (net/ipv6/netfilter/nf_socket_ipv6.c:91
net/ipv6/netfilter/nf_socket_ipv6.c:146)

It seems that this bug has already been fixed by Eric Dumazet in the
past in:
commit 78296c97 ("netfilter: xt_socket: fix a stack corruption bug")

But a variant of the same issue has been introduced in
commit d64d80a2 ("netfilter: x_tables: don't extract flow keys on early demuxed sks in socket match")

`daddr` and `saddr` potentially hold a reference to ipv6_var that is no
longer in scope when the call to `nf_socket_get_sock_v6` is made.

Fixes: d64d80a2 ("netfilter: x_tables: don't extract flow keys on early demuxed sks in socket match")
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Benjamin Hesmans <benjamin.hesmans@tessares.net>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

9844020f

net: dsa: b53: Set correct number of ports in the DSA struct · 6079479a

Committed by Rafał Miłecki on Oct 21, 2021

stable inclusion
from stable-5.10.68
commit c361c955609a48d993f07caa93fdcc516de20849
bugzilla: 182671 https://gitee.com/openeuler/kernel/issues/I4EWUH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c361c955609a48d993f07caa93fdcc516de20849

--------------------------------

[ Upstream commit d12e1c46 ]

Setting DSA_MAX_PORTS caused DSA to call b53 callbacks (e.g.
b53_disable_port() during dsa_register_switch()) for invalid
(non-existent) ports. That made b53 modify unrelated registers and is
one of the reasons for broken BCM5301x support.

This problem has existed for years, but DSA_MAX_PORTS usage has changed a
few times. It seems most accurate to reference the commit dropping
dsa_switch_alloc() in the Fixes tag.

Fixes: 7e99e347 ("net: dsa: remove dsa_switch_alloc helper")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

6079479a

net: dsa: b53: Fix calculating number of switch ports · 846141b0

Committed by Rafał Miłecki on Oct 21, 2021

stable inclusion
from stable-5.10.68
commit 0db7e0d9f67da8789d2aea12e0464e40359f9bd4
bugzilla: 182671 https://gitee.com/openeuler/kernel/issues/I4EWUH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=0db7e0d9f67da8789d2aea12e0464e40359f9bd4

--------------------------------

[ Upstream commit cdb067d3 ]

It isn't true that the CPU port is always the last one. BCM5301x switches
have 9 ports (port 6 being inactive) and they use port 5 as the CPU port by
default (depending on the design, some other ports may be CPU ports too).

A more reliable way of determining number of ports is to check for the
last set bit in the "enabled_ports" bitfield.
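
The "last set bit" rule can be sketched as follows. The helper names are hypothetical, and the portable loop stands in for whatever find-last-set primitive the driver actually uses; the mask 0x1bf in the note after the code is an illustrative value for a 9-port switch with port 6 inactive.

```c
#include <stdint.h>

/* Position of the highest set bit, counting from 1; 0 if no bit is set. */
static unsigned int sketch_fls(uint32_t x)
{
	unsigned int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/* Number of ports = index of the last enabled port + 1. */
static unsigned int sketch_num_ports(uint32_t enabled_ports)
{
	return sketch_fls(enabled_ports);
}
```

For a mask of 0x1bf (ports 0 to 5, 7 and 8 enabled), the highest set bit is bit 8, so the sketch reports 9 ports regardless of which port acts as the CPU port.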

This fixes b53 internal state, allows providing accurate info to
the DSA, and is required to fix BCM5301x support.

Fixes: 967dd82f ("net: dsa: b53: Add support for Broadcom RoboSwitch")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

846141b0
