提交 · d933b9f1aea75c2862beac8b3249f549ed615c44 · openeuler / Kernel

10 5月, 2022 40 次提交

config: enable CONFIG_QOS_SCHED_SMT_EXPELLER by · d933b9f1

由 Guan Jing 提交于 5月 10, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I52611
CVE: NA
Signed-off-by: NGuan Jing <guanjing6@huawei.com>
Reviewed-by: NChen Hui <judy.chenhui@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

d933b9f1

sched: Add tracepoint for qos smt expeller · 8090ab77

由 Guan Jing 提交于 5月 10, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I52611
CVE: NA

--------------------------------

There are two caces that we add tracepoint:
a) while online task of sibling cpu is running, it is running
that offline task of local cpu will be set TIF_NEED_RESCHED;
b) while online task of sibling cpu is running, it will expell
that next picked offline task of local cpu.
Signed-off-by: NGuan Jing <guanjing6@huawei.com>
Reviewed-by: NChen Hui <judy.chenhui@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

8090ab77

sched: Add statistics for qos smt expeller · 42f42fee

由 Guan Jing 提交于 5月 10, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I52611
CVE: NA

--------------------------------

We have added two statistics for qos smt expeller:
a) nr_qos_smt_send_ipi:the times of ipi which online task expel offline tasks;
b) nr_qos_smt_expelled:the statistics that offline task will not be picked times.
Signed-off-by: NGuan Jing <guanjing6@huawei.com>
Reviewed-by: NChen Hui <judy.chenhui@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

42f42fee

sched: Implement the function of qos smt expeller · fd5207be

由 Guan Jing 提交于 5月 10, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I52611
CVE: NA

--------------------------------

We implement the function of qos smt expeller by this following two points：
a)when online tasks and offline tasks are running on the same physical cpu,
online tasks will send ipi to expel offline tasks on the smt sibling cpus.
b)when online tasks are running, the smt sibling cpus will not allow
offline tasks to be selected.
Signed-off-by: NGuan Jing <guanjing6@huawei.com>
Reviewed-by: NChen Hui <judy.chenhui@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

fd5207be

sched: Introduce qos smt expeller for co-location · 4e57e412

由 Guan Jing 提交于 5月 10, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I52611
CVE: NA

--------------------------------

We introduce the qos smt expeller, which lets
online tasks to expel offline tasks on the smt sibling cpus,
and exclusively occupy CPU resources.In this way we are
able to improve QOS of online tasks in co-location.
Signed-off-by: NGuan Jing <guanjing6@huawei.com>
Reviewed-by: NChen Hui <judy.chenhui@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

4e57e412

ext4: fix symlink file size not match to file content · 83d11b1f

由 Ye Bin 提交于 5月 10, 2022

mainline inclusion
from mainline-v5.18-rc4
commit a2b0b205
category: bugfix
bugzilla: 186450, https://gitee.com/openeuler/kernel/issues/I4YSJ7
CVE: NA

-----------------------------------------------

We got issue as follows:
[home]# fsck.ext4  -fn  ram0yb
e2fsck 1.45.6 (20-Mar-2020)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Symlink /p3/d14/d1a/l3d (inode #3494) is invalid.
Clear? no
Entry 'l3d' in /p3/d14/d1a (3383) has an incorrect filetype (was 7, should be 0).
Fix? no

As the symlink file size does not match the file content. If the writeback
of the symlink data block failed, ext4_finish_bio() handles the end of IO.
However this function fails to mark the buffer with BH_write_io_error and
so when unmount does journal checkpoint it cannot detect the writeback
error and will cleanup the journal. Thus we've lost the correct data in the
journal area. To solve this issue, mark the buffer as BH_write_io_error in
ext4_finish_bio().

Cc: stable@kernel.org
Signed-off-by: NYe Bin <yebin10@huawei.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220321144438.201685-1-yebin10@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NChenXiaoSong <chenxiaosong2@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

83d11b1f

ext4: fix bug_on in start_this_handle during umount filesystem · 2ecb9c43

由 Ye Bin 提交于 5月 10, 2022

mainline inclusion
from mainline-v5.18-rc4
commit b98535d0
category: bugfix
bugzilla: 186675, https://gitee.com/openeuler/kernel/issues/I55TUC
CVE: NA

-------------------------------------------------

We got issue as follows:
------------[ cut here ]------------
kernel BUG at fs/jbd2/transaction.c:389!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 9 PID: 131 Comm: kworker/9:1 Not tainted 5.17.0-862.14.0.6.x86_64-00001-g23f87daf7d74-dirty #197
Workqueue: events flush_stashed_error_work
RIP: 0010:start_this_handle+0x41c/0x1160
RSP: 0018:ffff888106b47c20 EFLAGS: 00010202
RAX: ffffed10251b8400 RBX: ffff888128dc204c RCX: ffffffffb52972ac
RDX: 0000000000000200 RSI: 0000000000000004 RDI: ffff888128dc2050
RBP: 0000000000000039 R08: 0000000000000001 R09: ffffed10251b840a
R10: ffff888128dc204f R11: ffffed10251b8409 R12: ffff888116d78000
R13: 0000000000000000 R14: dffffc0000000000 R15: ffff888128dc2000
FS:  0000000000000000(0000) GS:ffff88839d680000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000001620068 CR3: 0000000376c0e000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 jbd2__journal_start+0x38a/0x790
 jbd2_journal_start+0x19/0x20
 flush_stashed_error_work+0x110/0x2b3
 process_one_work+0x688/0x1080
 worker_thread+0x8b/0xc50
 kthread+0x26f/0x310
 ret_from_fork+0x22/0x30
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---

Above issue may happen as follows:
      umount            read procfs            error_work
ext4_put_super
  flush_work(&sbi->s_error_work);

                      ext4_mb_seq_groups_show
	                ext4_mb_load_buddy_gfp
			  ext4_mb_init_group
			    ext4_mb_init_cache
	                      ext4_read_block_bitmap_nowait
			        ext4_validate_block_bitmap
				  ext4_error
			            ext4_handle_error
			              schedule_work(&EXT4_SB(sb)->s_error_work);

  ext4_unregister_sysfs(sb);
  jbd2_journal_destroy(sbi->s_journal);
    journal_kill_thread
      journal->j_flags |= JBD2_UNMOUNT;

                                          flush_stashed_error_work
				            jbd2_journal_start
					      start_this_handle
					        BUG_ON(journal->j_flags & JBD2_UNMOUNT);

To solve this issue, we call 'ext4_unregister_sysfs() before flushing
s_error_work in ext4_put_super().
Signed-off-by: NYe Bin <yebin10@huawei.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Reviewed-by: NRitesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20220322012419.725457-1-yebin10@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

conflicts:
	fs/ext4/super.c
Signed-off-by: NChenXiaoSong <chenxiaosong2@huawei.com>
Reviewed-by: Nyebin <yebin10@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

2ecb9c43

ext4: fix use-after-free in ext4_search_dir · 9668ea4d

由 Ye Bin 提交于 5月 10, 2022

mainline inclusion
from mainline-v5.18-rc4
commit c186f088
category: bugfix
bugzilla: 186477, https://gitee.com/openeuler/kernel/issues/I55UHT
CVE: NA

-------------------------------------------------

We got issue as follows:
EXT4-fs (loop0): mounted filesystem without journal. Opts: ,errors=continue

==================================================================
BUG: KASAN: use-after-free in ext4_search_dir fs/ext4/namei.c:1394 [inline]
BUG: KASAN: use-after-free in search_dirblock fs/ext4/namei.c:1199 [inline]
BUG: KASAN: use-after-free in __ext4_find_entry+0xdca/0x1210 fs/ext4/namei.c:1553
Read of size 1 at addr ffff8881317c3005 by task syz-executor117/2331

CPU: 1 PID: 2331 Comm: syz-executor117 Not tainted 5.10.0+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:83 [inline]
 dump_stack+0x144/0x187 lib/dump_stack.c:124
 print_address_description+0x7d/0x630 mm/kasan/report.c:387
 __kasan_report+0x132/0x190 mm/kasan/report.c:547
 kasan_report+0x47/0x60 mm/kasan/report.c:564
 ext4_search_dir fs/ext4/namei.c:1394 [inline]
 search_dirblock fs/ext4/namei.c:1199 [inline]
 __ext4_find_entry+0xdca/0x1210 fs/ext4/namei.c:1553
 ext4_lookup_entry fs/ext4/namei.c:1622 [inline]
 ext4_lookup+0xb8/0x3a0 fs/ext4/namei.c:1690
 __lookup_hash+0xc5/0x190 fs/namei.c:1451
 do_rmdir+0x19e/0x310 fs/namei.c:3760
 do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x445e59
Code: 4d c7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 1b c7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fff2277fac8 EFLAGS: 00000246 ORIG_RAX: 0000000000000054
RAX: ffffffffffffffda RBX: 0000000000400280 RCX: 0000000000445e59
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000200000c0
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000002
R10: 00007fff2277f990 R11: 0000000000000246 R12: 0000000000000000
R13: 431bde82d7b634db R14: 0000000000000000 R15: 0000000000000000

The buggy address belongs to the page:
page:0000000048cd3304 refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x1317c3
flags: 0x200000000000000()
raw: 0200000000000000 ffffea0004526588 ffffea0004528088 0000000000000000
raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8881317c2f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff8881317c2f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff8881317c3000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                   ^
 ffff8881317c3080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff8881317c3100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================

ext4_search_dir:
  ...
  de = (struct ext4_dir_entry_2 *)search_buf;
  dlimit = search_buf + buf_size;
  while ((char *) de < dlimit) {
  ...
    if ((char *) de + de->name_len <= dlimit &&
	 ext4_match(dir, fname, de)) {
	    ...
    }
  ...
    de_len = ext4_rec_len_from_disk(de->rec_len, dir->i_sb->s_blocksize);
    if (de_len <= 0)
      return -1;
    offset += de_len;
    de = (struct ext4_dir_entry_2 *) ((char *) de + de_len);
  }

Assume:
de=0xffff8881317c2fff
dlimit=0x0xffff8881317c3000

If read 'de->name_len' which address is 0xffff8881317c3005, obviously is
out of range, then will trigger use-after-free.
To solve this issue, 'dlimit' must reserve 8 bytes, as we will read
'de->name_len' to judge if '(char *) de + de->name_len' out of range.
Signed-off-by: NYe Bin <yebin10@huawei.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220324064816.1209985-1-yebin10@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: NChenXiaoSong <chenxiaosong2@huawei.com>
Reviewed-by: Nyebin <yebin10@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

9668ea4d

KVM: s390: Return error on SIDA memop on normal guest · 0387db2d

由 Janis Schoetterl-Glausch 提交于 5月 10, 2022

stable inclusion
from stable-v5.10.100
commit b62267b8b06e9b8bb429ae8f962ee431e6535d60
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I4U746
CVE: CVE-2022-0516

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b62267b8b06e9b8bb429ae8f962ee431e6535d60

--------------------------------

commit 2c212e1b upstream.

Refuse SIDA memops on guests which are not protected.
For normal guests, the secure instruction data address designation,
which determines the location we access, is not under control of KVM.

Fixes: 19e12277 (KVM: S390: protvirt: Introduce instruction data area bounce buffer)
Signed-off-by: NJanis Schoetterl-Glausch <scgl@linux.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: NChristian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Reviewed-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

0387db2d

fs-writeback: writeback_sb_inodes：Recalculate 'wrote' according skipped pages · 6b48ea3c

由 Zhihao Cheng 提交于 5月 10, 2022

hulk inclusion
category: bugfix
bugzilla: 185955, https://gitee.com/openeuler/kernel/issues/I55AKK
CVE: NA
backport: openEuler-22.03-LTS

--------------------------------

Commit 505a666e ("writeback: plug writeback in wb_writeback() and
writeback_inodes_wb()") has us holding a plug during wb_writeback, which
may cause a potential ABBA dead lock:

    wb_writeback		fat_file_fsync
blk_start_plug(&plug)
for (;;) {
  iter i-1: some reqs have been added into plug->mq_list  // LOCK A
  iter i:
    progress = __writeback_inodes_wb(wb, work)
    . writeback_sb_inodes // fat's bdev
    .   __writeback_single_inode
    .   . generic_writepages
    .   .   __block_write_full_page
    .   .   . . 	    __generic_file_fsync
    .   .   . . 	      sync_inode_metadata
    .   .   . . 	        writeback_single_inode
    .   .   . . 		  __writeback_single_inode
    .   .   . . 		    fat_write_inode
    .   .   . . 		      __fat_write_inode
    .   .   . . 		        sync_dirty_buffer	// fat's bdev
    .   .   . . 			  lock_buffer(bh)	// LOCK B
    .   .   . . 			    submit_bh
    .   .   . . 			      blk_mq_get_tag	// LOCK A
    .   .   . trylock_buffer(bh)  // LOCK B
    .   .   .   redirty_page_for_writepage
    .   .   .     wbc->pages_skipped++
    .   .   --wbc->nr_to_write
    .   wrote += write_chunk - wbc.nr_to_write  // wrote > 0
    .   requeue_inode
    .     redirty_tail_locked
    if (progress)    // progress > 0
      continue;
  iter i+1:
      queue_io
      // similar process with iter i, infinite for-loop !
}
blk_finish_plug(&plug)   // flush plug won't be called

Above process triggers a hungtask like:
[  399.044861] INFO: task bb:2607 blocked for more than 30 seconds.
[  399.046824]       Not tainted 5.18.0-rc1-00005-gefae4d9eb6a2-dirty
[  399.051539] task:bb              state:D stack:    0 pid: 2607 ppid:
2426 flags:0x00004000
[  399.051556] Call Trace:
[  399.051570]  __schedule+0x480/0x1050
[  399.051592]  schedule+0x92/0x1a0
[  399.051602]  io_schedule+0x22/0x50
[  399.051613]  blk_mq_get_tag+0x1d3/0x3c0
[  399.051640]  __blk_mq_alloc_requests+0x21d/0x3f0
[  399.051657]  blk_mq_submit_bio+0x68d/0xca0
[  399.051674]  __submit_bio+0x1b5/0x2d0
[  399.051708]  submit_bio_noacct+0x34e/0x720
[  399.051718]  submit_bio+0x3b/0x150
[  399.051725]  submit_bh_wbc+0x161/0x230
[  399.051734]  __sync_dirty_buffer+0xd1/0x420
[  399.051744]  sync_dirty_buffer+0x17/0x20
[  399.051750]  __fat_write_inode+0x289/0x310
[  399.051766]  fat_write_inode+0x2a/0xa0
[  399.051783]  __writeback_single_inode+0x53c/0x6f0
[  399.051795]  writeback_single_inode+0x145/0x200
[  399.051803]  sync_inode_metadata+0x45/0x70
[  399.051856]  __generic_file_fsync+0xa3/0x150
[  399.051880]  fat_file_fsync+0x1d/0x80
[  399.051895]  vfs_fsync_range+0x40/0xb0
[  399.051929]  __x64_sys_fsync+0x18/0x30

In my test, 'need_resched()' (which is imported by 590dca3a "fs-writeback:
unplug before cond_resched in writeback_sb_inodes") in function
'writeback_sb_inodes()' seldom comes true, unless cond_resched() is deleted
from write_cache_pages().

Fix it by correcting wrote number according number of skipped pages
in writeback_sb_inodes().

Goto Link to find a reproducer.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215837
Cc: stable@vger.kernel.org # v4.3
Signed-off-by: NZhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

6b48ea3c

ubi: fastmap: Fix high cpu usage of ubi_bgt by making sure wl_pool not empty · c4d51010

由 Zhihao Cheng 提交于 5月 10, 2022

hulk inclusion
category: bugfix
bugzilla: 185955, https://gitee.com/openeuler/kernel/issues/I50DVI?from=project-issue

--------------------------------

There at least 6 PEBs reserved on UBI device:
1. EBA_RESERVED_PEBS[1]
2. WL_RESERVED_PEBS[1]
3. UBI_LAYOUT_VOLUME_EBS[2]
4. MIN_FASTMAP_RESERVED_PEBS[2]

When all ubi volumes take all their PEBs, there are 3 (EBA_RESERVED_PEBS +
WL_RESERVED_PEBS + MIN_FASTMAP_RESERVED_PEBS - MIN_FASTMAP_TAKEN_PEBS[1])
free PEBs. Since f9c34bb5 ("ubi: Fix producing anchor PEBs") and
4b68bf9a ("ubi: Select fastmap anchor PEBs considering wear level
rules") applied, there is only 1 (3 - FASTMAP_ANCHOR_PEBS[1] -
FASTMAP_NEXT_ANCHOR_PEBS[1]) free PEB to fill pool and wl_pool, after
filling pool, wl_pool is always empty. So, UBI could be stuck in an
infinite loop:

	ubi_thread	   system_wq
wear_leveling_worker <--------------------------------------------------
  get_peb_for_wl							|
    // fm_wl_pool, used = size = 0					|
    schedule_work(&ubi->fm_work)					|
									|
		    update_fastmap_work_fn				|
		      ubi_update_fastmap				|
			ubi_refill_pools				|
			// ubi->free_count - ubi->beb_rsvd_pebs < 5	|
			// wl_pool is not filled with any PEBs		|
			schedule_erase(old_fm_anchor)			|
			ubi_ensure_anchor_pebs				|
			  __schedule_ubi_work(wear_leveling_worker)	|
									|
__erase_worker								|
  ensure_wear_leveling							|
    __schedule_ubi_work(wear_leveling_worker) --------------------------

, which cause high cpu usage of ubi_bgt:
top - 12:10:42 up 5 min,  2 users,  load average: 1.76, 0.68, 0.27
Tasks: 123 total,   3 running,  54 sleeping,   0 stopped,   0 zombie

  PID USER PR   NI VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1589 root 20   0   0      0      0 R  45.0  0.0   0:38.86 ubi_bgt0d
  319 root 20   0   0      0      0 I  15.2  0.0   0:15.29 kworker/0:3-eve
  371 root 20   0   0      0      0 I  14.9  0.0   0:12.85 kworker/3:3-eve
   20 root 20   0   0      0      0 I  11.3  0.0   0:05.33 kworker/1:0-eve
  202 root 20   0   0      0      0 I  11.3  0.0   0:04.93 kworker/2:3-eve

In 4b68bf9a ("ubi: Select fastmap anchor PEBs considering wear level
rules"), there are three key changes:
  1) Choose the fastmap anchor when the most free PEBs are available.
  2) Enable anchor move within the anchor area again as it is useful
     for distributing wear.
  3) Import a candidate fm anchor and check this PEB's erase count during
     wear leveling. If the wear leveling limit is exceeded, use the used
     anchor area PEB with the lowest erase count to replace it.

The anchor candidate can be removed, we can check fm_anchor PEB's erase
count during wear leveling. Fix it by:
  1) Removing 'fm_next_anchor' and check 'fm_anchor' during wear leveling.
  2) Preferentially filling one free peb into fm_wl_pool in condition of
     ubi->free_count > ubi->beb_rsvd_pebs, then try to reserve enough
     free count for fastmap non anchor pebs after the above prerequisites
     are met.
Then, there are at least 1 PEB in pool and 1 PEB in wl_pool after calling
ubi_refill_pools() with all erase works done.

Fetch a reproducer in [Link].

Fixes: 4b68bf9a ("ubi: Select fastmap anchor PEBs ... rules")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215407Signed-off-by: NZhihao Cheng <chengzhihao1@huawei.com>
v1->v2:
  Update fm pool filling strategy, consider reserve enough free count
  for fastmap non anchor pebs while filling fm_wl_pool.
v2->v3:
  Remove 'fm_next_anchor' and check 'fm_anchor' during wear
  leveling.
v3->v4:
  Reserve 'fm_next_anchor' member in 'ubi_device' to keep kabi no
  changes.
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

c4d51010

perf c2c: Update documentation for display option 'all' · 5424c701

由 Yang Jihong 提交于 5月 10, 2022

maillist inclusion
category: Feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I53L83
CVE: NA

Reference: https://lore.kernel.org/all/20210104020930.GA4897@leoy-ThinkPad-X240s/

-------------------

Since the new display option 'all' is introduced, this patch is to
update the documentation to reflect it.
Signed-off-by: NLeo Yan <leo.yan@linaro.org>
Signed-off-by: NYang Jihong <yangjihong1@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

5424c701

perf c2c: Sort on all cache hit for load operations · c273fc36

由 Yang Jihong 提交于 5月 10, 2022

maillist inclusion
category: Feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I53L83
CVE: NA

Reference: https://lore.kernel.org/all/20210104020930.GA4897@leoy-ThinkPad-X240s/

-------------------

Except the existed three display options 'tot', 'rmt', 'lcl', this patch
adds option 'all' so can sort on the all cache hit for load operation.
This new introduced option can be a choice for profiling cache false
sharing if the memory event doesn't contain HITM tags.

For displaying with option 'all', the "Shared Data Cache Line Table" and
"Shared Cache Line Distribution Pareto" both have difference comparing
to other three display options.

For the "Shared Data Cache Line Table", instead of sorting HITM metrics,
it sorts with the metrics "tot_ld_hit" and "percent_tot_ld_hit".  If
without HITM metrics, users can analyze the load hit statistics for all
cache levels, so the dimensions of total load hit is used to replace
HITM dimensions.

For Pareto, every single cache line shows the metrics "cl_tot_ld_hit"
and "cl_tot_ld_miss" instead of "cl_rmt_hitm" and "percent_lcl_hitm",
and the single cache line view is sorted by metrics "tot_ld_hit".

As result, we can get the 'all' display as follows:

  # perf c2c report -d all --coalesce tid,pid,iaddr,dso --stdio

  [...]

  =================================================
             Shared Data Cache Line Table
  =================================================
  #
  #        ----------- Cacheline ----------  Load Hit  Load Hit    Total    Total    Total  ---- Stores ----  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
  # Index             Address  Node  PA cnt       Pct     Total  records    Loads   Stores    L1Hit   L1Miss       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
  # .....  ..................  ....  ......  ........  ........  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
  #
        0      0x556f25dff100     0    1895    75.73%      4591     7840     4591     3249     2633      616      849     2734       67        58      883         0        0         0         0
        1      0x556f25dff080     0       1    13.10%       794      794      794        0        0        0      164      486       28        20       96         0        0         0         0
        2      0x556f25dff0c0     0       1    10.01%       607      607      607        0        0        0      107        5        5       488        2         0        0         0         0

  =================================================
        Shared Cache Line Distribution Pareto
  =================================================
  #
  #        --  Load Refs --  -- Store Refs --  --------- Data address ---------                                                   ---------- cycles ----------    Total       cpu                                  Shared
  #   Num      Hit     Miss   L1 Hit  L1 Miss              Offset  Node  PA cnt      Pid                 Tid        Code address  rmt hitm  lcl hitm      load  records       cnt               Symbol             Object                  Source:Line  Node
  # .....  .......  .......  .......  .......  ..................  ....  ......  .......  ..................  ..................  ........  ........  ........  .......  ........  ...................  .................  ...........................  ....
  #
    -------------------------------------------------------------
        0     4591        0     2633      616      0x556f25dff100
    -------------------------------------------------------------
            20.52%    0.00%    0.00%    0.00%                 0x0     0       1    28079    28082:lock_th         0x556f25bfdc1d         0      2200      1276      942         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
            19.82%    0.00%   38.06%    0.00%                 0x0     0       1    28079    28082:lock_th         0x556f25bfdc16         0      2190      1130     1912         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
            18.25%    0.00%   56.63%    0.00%                 0x0     0       1    28079    28081:lock_th         0x556f25bfdc16         0      2173      1074     2329         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
            18.23%    0.00%    0.00%    0.00%                 0x0     0       1    28079    28081:lock_th         0x556f25bfdc1d         0      2013      1220      837         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
             0.00%    0.00%    3.11%   59.90%                 0x0     0       1    28079    28081:lock_th         0x556f25bfdc28         0         0         0      451         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
             0.00%    0.00%    2.20%   40.10%                 0x0     0       1    28079    28082:lock_th         0x556f25bfdc28         0         0         0      305         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
            12.00%    0.00%    0.00%    0.00%                0x20     0       1    28079    28083:reader_thd      0x556f25bfdc73         0       159       107      551         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155   0
            11.17%    0.00%    0.00%    0.00%                0x20     0       1    28079    28084:reader_thd      0x556f25bfdc73         0       148       108      513         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155   0

  [...]
Signed-off-by: NLeo Yan <leo.yan@linaro.org>

conflict:
        tools/perf/builtin-c2c.c
Signed-off-by: NYang Jihong <yangjihong1@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

c273fc36

perf c2c: Refactor node header · 483b3bb3

由 Yang Jihong 提交于 5月 10, 2022

maillist inclusion
category: Feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I53L83
CVE: NA

Reference: https://lore.kernel.org/all/20210104020930.GA4897@leoy-ThinkPad-X240s/

-------------------

The node header array contains 3 items, each item is used for one of
the 3 flavors for node accessing info.  To extend sorting on all load
references and not always stick to HITMs, the second header string
"Node{cpus %hitms %stores}" should be adjusted (e.g. it's changed as
"Node{cpus %loads %stores}").

For this reason, this patch changes the node header array to three
flat variables and uses switch-case in function setup_nodes_header(),
thus it is easier for altering the header string.
Signed-off-by: NLeo Yan <leo.yan@linaro.org>
Signed-off-by: NYang Jihong <yangjihong1@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

483b3bb3

perf c2c: Add dimensions for load miss · 6d9ee5b3

由 Yang Jihong 提交于 5月 10, 2022

maillist inclusion
category: Feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I53L83
CVE: NA

Reference: https://lore.kernel.org/all/20210104020930.GA4897@leoy-ThinkPad-X240s/

-------------------

Add dimensions for load miss and its percentage calculation, which is to
be displayed in the single cache line output.
Signed-off-by: NLeo Yan <leo.yan@linaro.org>
Signed-off-by: NYang Jihong <yangjihong1@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

6d9ee5b3

perf c2c: Add dimensions for load hit · d731e184

由 Yang Jihong 提交于 5月 10, 2022

maillist inclusion
category: Feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I53L83
CVE: NA

Reference: https://lore.kernel.org/all/20210104020930.GA4897@leoy-ThinkPad-X240s/

-------------------

Add dimensions for load hit and its percentage calculation, which is to
be displayed in the single cache line output.
Signed-off-by: NLeo Yan <leo.yan@linaro.org>
Signed-off-by: NYang Jihong <yangjihong1@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

d731e184

perf c2c: Add dimensions for total load hit · d2cdcdaf

由 Yang Jihong 提交于 5月 10, 2022

maillist inclusion
category: Feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I53L83
CVE: NA

Reference: https://lore.kernel.org/all/20210104020930.GA4897@leoy-ThinkPad-X240s/

-------------------

Arm SPE trace data doesn't support HITM, but we still want to explore
"perf c2c" tool to analyze cache false sharing.  If without HITM tag,
the tool cannot give out accurate result for cache false sharing, a
candidate solution is to sort the total load operations and connect with
the threads info, e.g. if multiple threads hit the same cache line for
many times, this can give out the hint that it's likely to cause cache
false sharing issue.

Unlike having HITM tag, the proposed solution is not accurate and might
introduce false positive reporting, but it's a pragmatic approach for
detecting false sharing if memory event doesn't support HITM.

To sort with the cache line hit, this patch adds dimensions for total
load hit and the associated percentage calculation.
Signed-off-by: NLeo Yan <leo.yan@linaro.org>
Signed-off-by: NYang Jihong <yangjihong1@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

d2cdcdaf

rcu/nocb: Make rcu_core() callbacks acceleration preempt-safe · 35054be4

由 Thomas Gleixner 提交于 5月 10, 2022

mainline inclusion
from mainline-v5.17-rc1
commit 24ee940d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I53K0E
CVE: NA

-------------------------------------------------------------------------

While reporting a quiescent state for a given CPU, rcu_core() takes
advantage of the freshly loaded grace period sequence number and the
locked rnp to accelerate the callbacks whose sequence number have been
assigned a stale value.

This action is only necessary when the rdp isn't offloaded, otherwise
the NOCB kthreads already take care of the callbacks progression.

However the check for the offloaded state is volatile because it is
performed outside the IRQs disabled section. It's possible for the
offloading process to preempt rcu_core() at that point on PREEMPT_RT.

This is dangerous because rcu_core() may end up accelerating callbacks
concurrently with NOCB kthreads without appropriate locking.

Fix this with moving the offloaded check inside the rnp locking section.
Reported-and-tested-by: NValentin Schneider <valentin.schneider@arm.com>
Reviewed-by: NValentin Schneider <valentin.schneider@arm.com>
Tested-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NFrederic Weisbecker <frederic@kernel.org>
Signed-off-by: NPaul E. McKenney <paulmck@kernel.org>
Conflicts:
	kernel/rcu/tree.c
Move "const bool offloaded = ..." down, so that it is within the irq
disabled protection range, and with minimal changes.
Signed-off-by: NZhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: NCheng Jian <cj.chengjian@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

35054be4

livepatch/arm64: Fix incorrect endian conversion when long jump · 74534bb4