- 02 9月, 2020 40 次提交
-
-
由 Joseph Qi 提交于
to #29357063 The blkg lookup or create logic may bring much overhead even iocost is disabled. So bypass it earlier in such case. Fixes: 9da41925 ("alinux: iocost: fix NULL pointer dereference in ioc_rqos_throttle") Reported-by: NHongnan Li <hongnan.li@linux.alibaba.com> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
-
由 youngjun 提交于
to #29273482 When "ovl_is_inuse" true case, trap inode reference not put. plus adding the comment explaining sequence of ovl_is_inuse after ovl_setup_trap. Fixes: 0be0bfd2de9d ("ovl: fix regression caused by overlapping layers detection") Cc: <stable@vger.kernel.org> # v4.19+ Reviewed-by: NAmir Goldstein <amir73il@gmail.com> Signed-off-by: Nyoungjun <her0gyugyu@gmail.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Link: https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git/commit/?h=overlayfs-next&id=24f14009b8f1754ec2ae4c168940c01259b0f88aSigned-off-by: NJeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Dust Li 提交于
fix #29262413 The original patch didn't fit 4.19 well since upstream kernel use #if HAVE_ATTR_TEST xxx #endif but in 4.19, we use #ifdef HAVE_ATTR_TEST xxx #endif As a result, the origin patch enabled the macro in #ifdef HAVE_ATTR_TEST and finnaly cause the build fail when run: $make M=samples/bpf make -C /mnt/nvme0/wuya/kernel/aliyunlinux2/ck/samples/bpf/../../tools/lib/bpf/ RM='rm -rf' LDFLAGS= srctree=/mnt/nvme0/wuya/kernel/aliyunlinux2/ck/samples/bpf/../../ O= Warning: Kernel ABI header at 'tools/include/uapi/linux/bpf.h' differs from latest version at 'include/uapi/linux/bpf.h' CC samples/bpf/syscall_nrs.s UPD samples/bpf/syscall_nrs.h HOSTCC samples/bpf/test_lru_dist HOSTCC samples/bpf/sock_example HOSTCC samples/bpf/bpf_load.o In file included from ./tools/perf/perf-sys.h:9:0, from samples/bpf/bpf_load.c:29: ./tools/perf/perf-sys.h: In function ‘sys_perf_event_open’: ./tools/perf/perf-sys.h:68:15: error: ‘test_attr__enabled’ undeclared (first use in this function) if (unlikely(test_attr__enabled)) ^ ./tools/include/linux/compiler.h:74:43: note: in definition of macro ‘unlikely’ # define unlikely(x) __builtin_expect(!!(x), 0) ^ ./tools/perf/perf-sys.h:68:15: note: each undeclared identifier is reported only once for each function it appears in if (unlikely(test_attr__enabled)) ^ ./tools/include/linux/compiler.h:74:43: note: in definition of macro ‘unlikely’ # define unlikely(x) __builtin_expect(!!(x), 0) ^ In file included from samples/bpf/bpf_load.c:29:0: ./tools/perf/perf-sys.h:69:3: warning: implicit declaration of function ‘test_attr__open’ [-Wimplicit-function-declaration] test_attr__open(attr, pid, cpu, fd, group_fd, flags); ^~~~~~~~~~~~~~~ make[1]: *** [samples/bpf/bpf_load.o] Error 1 make: *** [_module_samples/bpf] Error 2 This reverts commit 4665759a. Signed-off-by: NDust Li <dust.li@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 KP Singh 提交于
to #29262413 commit 98beb3edeb974e906a81f305d88f7bc96b2ec83e upstream. This was added in commit eb111869301e ("compiler-types.h: add asm_inline definition") and breaks samples/bpf as clang does not support asm __inline. Fixes: eb111869301e ("compiler-types.h: add asm_inline definition") Co-developed-by: NFlorent Revest <revest@google.com> Signed-off-by: NFlorent Revest <revest@google.com> Signed-off-by: NKP Singh <kpsingh@google.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NSong Liu <songliubraving@fb.com> Acked-by: NAndrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20191002191652.11432-1-kpsingh@chromium.orgSigned-off-by: NDust Li <dust.li@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Alexei Starovoitov 提交于
to #29262413 commit 636e78b1cdb40b77a79b143dbd9d94847b360efa upstream. clang started to error on invalid asm clobber usage in x86 headers and many bpf program samples failed to build with the message: CLANG-bpf /data/users/ast/bpf-next/samples/bpf/xdp_redirect_kern.o In file included from /data/users/ast/bpf-next/samples/bpf/xdp_redirect_kern.c:14: In file included from ../include/linux/in.h:23: In file included from ../include/uapi/linux/in.h:24: In file included from ../include/linux/socket.h:8: In file included from ../include/linux/uio.h:14: In file included from ../include/crypto/hash.h:16: In file included from ../include/linux/crypto.h:26: In file included from ../include/linux/uaccess.h:5: In file included from ../include/linux/sched.h:15: In file included from ../include/linux/sem.h:5: In file included from ../include/uapi/linux/sem.h:5: In file included from ../include/linux/ipc.h:9: In file included from ../include/linux/refcount.h:72: ../arch/x86/include/asm/refcount.h:72:36: error: asm-specifier for input or output variable conflicts with asm clobber list r->refs.counter, e, "er", i, "cx"); ^ ../arch/x86/include/asm/refcount.h:86:27: error: asm-specifier for input or output variable conflicts with asm clobber list r->refs.counter, e, "cx"); ^ 2 errors generated. Override volatile() to workaround the problem. Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDust Li <dust.li@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Yonghong Song 提交于
fix #29255532 commit 6bf3bbe1f4d4cf405e3c2bf07bbdff56d3223ec8 upstream. x86 compilation has required asm goto support since 4.17. Since clang does not support asm goto, at 4.17, Commit b1ae32db ("x86/cpufeature: Guard asm_volatile_goto usage for BPF compilation") worked around the issue by permitting an alternative implementation without asm goto for clang. At 5.0, more asm goto usages appeared. [yhs@148 x86]$ egrep -r asm_volatile_goto include/asm/cpufeature.h: asm_volatile_goto("1: jmp 6f\n" include/asm/jump_label.h: asm_volatile_goto("1:" include/asm/jump_label.h: asm_volatile_goto("1:" include/asm/rmwcc.h: asm_volatile_goto (fullop "; j" #cc " %l[cc_label]" \ include/asm/uaccess.h: asm_volatile_goto("\n" \ include/asm/uaccess.h: asm_volatile_goto("\n" \ [yhs@148 x86]$ Compiling samples/bpf directories, most bpf programs failed compilation with error messages like: In file included from /home/yhs/work/bpf-next/samples/bpf/xdp_sample_pkts_kern.c:2: In file included from /home/yhs/work/bpf-next/include/linux/ptrace.h:6: In file included from /home/yhs/work/bpf-next/include/linux/sched.h:15: In file included from /home/yhs/work/bpf-next/include/linux/sem.h:5: In file included from /home/yhs/work/bpf-next/include/uapi/linux/sem.h:5: In file included from /home/yhs/work/bpf-next/include/linux/ipc.h:9: In file included from /home/yhs/work/bpf-next/include/linux/refcount.h:72: /home/yhs/work/bpf-next/arch/x86/include/asm/refcount.h:70:9: error: 'asm goto' constructs are not supported yet return GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl", ^ /home/yhs/work/bpf-next/arch/x86/include/asm/rmwcc.h:67:2: note: expanded from macro 'GEN_BINARY_SUFFIXED_RMWcc' __GEN_RMWcc(op " %[val], %[var]\n\t" suffix, var, cc, \ ^ /home/yhs/work/bpf-next/arch/x86/include/asm/rmwcc.h:21:2: note: expanded from macro '__GEN_RMWcc' asm_volatile_goto (fullop "; j" #cc " %l[cc_label]" \ ^ /home/yhs/work/bpf-next/include/linux/compiler_types.h:188:37: note: expanded from macro 'asm_volatile_goto' #define asm_volatile_goto(x...) asm goto(x) Most implementation does not even provide an alternative implementation. And it is also not practical to make changes for each call site. This patch workarounded the asm goto issue by redefining the macro like below: #define asm_volatile_goto(x...) asm volatile("invalid use of asm_volatile_goto") If asm_volatile_goto is not used by bpf programs, which is typically the case, nothing bad will happen. If asm_volatile_goto is used by bpf programs, which is incorrect, the compiler will issue an error since "invalid use of asm_volatile_goto" is not valid assembly codes. With this patch, all bpf programs under samples/bpf can pass compilation. Note that bpf programs under tools/testing/selftests/bpf/ compiled fine as they do not access kernel internal headers. Fixes: e769742d3584 ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"") Fixes: 18fe5822 ("x86, asm: change the GEN_*_RMWcc() macros to not quote the condition") Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDust Li <dust.li@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Pavel Begunkov 提交于
to #29276773 commit 16d598030a37853a7a6b4384cad19c9c0af2f021 upstream. 59960b9deb535 ("io_uring: fix lazy work init") tried to fix missing io_req_init_async(), but left out work.flags and hash. Do it earlier. Fixes: 7cdaf587de7c ("io_uring: avoid whole io_wq_work copy for requests completed inline") Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com> Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Pavel Begunkov 提交于
to #29276773 commit dd821e0c95a64b5923a0c57f07d3f7563553e756 upstream. Ensure to set msg.msg_name for the async portion of send/recvmsg, as the header copy will copy to/from it. Cc: stable@vger.kernel.org # v5.5+ Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com> Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Jens Axboe 提交于
to #29276773 commit 309fc03a3284af62eb6082fb60327045a1dabf57 upstream. We currently account the memory after the exit work has been run, but that leaves a gap where a process has closed its ring and until the memory has been accounted as freed. If the memlocked ulimit is borderline, then that can introduce spurious setup errors returning -ENOMEM because the free work hasn't been run yet. Account this as freed when we close the ring, as not to expose a tiny gap where setting up a new ring can fail. Fixes: 85faa7b8346e ("io_uring: punt final io_ring_ctx wait-and-free to workqueue") Cc: stable@vger.kernel.org # v5.7 Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com> Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Yang Yingliang 提交于
to #29276773 commit 667e57da358f61b6966e12e925a69e42d912e8bb upstream. I got a memleak report when doing some fuzz test: BUG: memory leak unreferenced object 0x607eeac06e78 (size 8): comm "test", pid 295, jiffies 4294735835 (age 31.745s) hex dump (first 8 bytes): 00 00 00 00 00 00 00 00 ........ backtrace: [<00000000932632e6>] percpu_ref_init+0x2a/0x1b0 [<0000000092ddb796>] __io_uring_register+0x111d/0x22a0 [<00000000eadd6c77>] __x64_sys_io_uring_register+0x17b/0x480 [<00000000591b89a6>] do_syscall_64+0x56/0xa0 [<00000000864a281d>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Call percpu_ref_exit() on error path to avoid refcount memleak. Fixes: 05f3fb3c5397 ("io_uring: avoid ring quiesce for fixed file set unregister and update") Cc: stable@vger.kernel.org Reported-by: NHulk Robot <hulkci@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com> Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Yang Yingliang 提交于
to #29276773 commit f3bd9dae3708a0ff6b067e766073ffeb853301f9 upstream. I got a memleak report when doing some fuzz test: BUG: memory leak unreferenced object 0xffff888113e02300 (size 488): comm "syz-executor401", pid 356, jiffies 4294809529 (age 11.954s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ a0 a4 ce 19 81 88 ff ff 60 ce 09 0d 81 88 ff ff ........`....... backtrace: [<00000000129a84ec>] kmem_cache_zalloc include/linux/slab.h:659 [inline] [<00000000129a84ec>] __alloc_file+0x25/0x310 fs/file_table.c:101 [<000000003050ad84>] alloc_empty_file+0x4f/0x120 fs/file_table.c:151 [<000000004d0a41a3>] alloc_file+0x5e/0x550 fs/file_table.c:193 [<000000002cb242f0>] alloc_file_pseudo+0x16a/0x240 fs/file_table.c:233 [<00000000046a4baa>] anon_inode_getfile fs/anon_inodes.c:91 [inline] [<00000000046a4baa>] anon_inode_getfile+0xac/0x1c0 fs/anon_inodes.c:74 [<0000000035beb745>] __do_sys_perf_event_open+0xd4a/0x2680 kernel/events/core.c:11720 [<0000000049009dc7>] do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359 [<00000000353731ca>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 BUG: memory leak unreferenced object 0xffff8881152dd5e0 (size 16): comm "syz-executor401", pid 356, jiffies 4294809529 (age 11.954s) hex dump (first 16 bytes): 01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<0000000074caa794>] kmem_cache_zalloc include/linux/slab.h:659 [inline] [<0000000074caa794>] lsm_file_alloc security/security.c:567 [inline] [<0000000074caa794>] security_file_alloc+0x32/0x160 security/security.c:1440 [<00000000c6745ea3>] __alloc_file+0xba/0x310 fs/file_table.c:106 [<000000003050ad84>] alloc_empty_file+0x4f/0x120 fs/file_table.c:151 [<000000004d0a41a3>] alloc_file+0x5e/0x550 fs/file_table.c:193 [<000000002cb242f0>] alloc_file_pseudo+0x16a/0x240 fs/file_table.c:233 [<00000000046a4baa>] anon_inode_getfile fs/anon_inodes.c:91 [inline] [<00000000046a4baa>] anon_inode_getfile+0xac/0x1c0 fs/anon_inodes.c:74 [<0000000035beb745>] __do_sys_perf_event_open+0xd4a/0x2680 kernel/events/core.c:11720 [<0000000049009dc7>] do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359 [<00000000353731ca>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 If io_sqe_file_register() failed, we need put the file that get by fget() to avoid the memleak. Fixes: c3a31e605620 ("io_uring: add support for IORING_REGISTER_FILES_UPDATE") Cc: stable@vger.kernel.org Reported-by: NHulk Robot <hulkci@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com> Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 David Howells 提交于
task #29263287 commit 3f19b2ab97a97b413c24b66c67ae16daa4f56c35 upstream Make the inode hash table RCU searchable so that searches that want to access or modify an inode without taking a ref on that inode can do so without taking the inode hash table lock. The main thing this requires is some RCU annotation on the list manipulation operations. Inodes are already freed by RCU in most cases. Users of this interface must take care as the inode may be still under construction or may be being torn down around them. There are at least three instances where this can be of use: (1) Testing whether the inode number iunique() is going to return is currently unique (the iunique_lock is still held). (2) Ext4 date stamp updating. (3) AFS callback breaking. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> cc: linux-ext4@vger.kernel.org cc: linux-afs@lists.infradead.org [jeffle: resolve collision in afs_break_one_callback since code base change] Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Xiaoguang Wang 提交于
to #29233603 commit 6d5f904904608a9cd32854d7d0a4dd65b27f9935 upstream For those applications which are not willing to use io_uring_enter() to reap and handle cqes, they may completely rely on liburing's io_uring_peek_cqe(), but if cq ring has overflowed, currently because io_uring_peek_cqe() is not aware of this overflow, it won't enter kernel to flush cqes, below test program can reveal this bug: static void test_cq_overflow(struct io_uring *ring) { struct io_uring_cqe *cqe; struct io_uring_sqe *sqe; int issued = 0; int ret = 0; do { sqe = io_uring_get_sqe(ring); if (!sqe) { fprintf(stderr, "get sqe failed\n"); break;; } ret = io_uring_submit(ring); if (ret <= 0) { if (ret != -EBUSY) fprintf(stderr, "sqe submit failed: %d\n", ret); break; } issued++; } while (ret > 0); assert(ret == -EBUSY); printf("issued requests: %d\n", issued); while (issued) { ret = io_uring_peek_cqe(ring, &cqe); if (ret) { if (ret != -EAGAIN) { fprintf(stderr, "peek completion failed: %s\n", strerror(ret)); break; } printf("left requets: %d\n", issued); continue; } io_uring_cqe_seen(ring, cqe); issued--; printf("left requets: %d\n", issued); } } int main(int argc, char *argv[]) { int ret; struct io_uring ring; ret = io_uring_queue_init(16, &ring, 0); if (ret) { fprintf(stderr, "ring setup failed: %d\n", ret); return 1; } test_cq_overflow(&ring); return 0; } To fix this issue, export cq overflow status to userspace by adding new IORING_SQ_CQ_OVERFLOW flag, then helper functions() in liburing, such as io_uring_peek_cqe, can be aware of this cq overflow and do flush accordingly. Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Yazen Ghannam 提交于
fix #29035167 commit 466503d6b1b33be46ab87c6090f0ade6c6011cbc upstream The following commit introduced a warning on error reports without a non-zero grain value. 3724ace582d9 ("EDAC/mc: Fix grain_bits calculation") The amd64_edac_mod module does not provide a value, so the warning will be given on the first reported memory error. Set the grain per DIMM to cacheline size (64 bytes). This is the current recommendation. Fixes: 3724ace582d9 ("EDAC/mc: Fix grain_bits calculation") Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Robert Richter <rrichter@marvell.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191022203448.13962-7-Yazen.Ghannam@amd.comSigned-off-by: NZelin Deng <zelin.deng@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Yazen Ghannam 提交于
fix #29035167 commit e53a3b267fb0a79db9ca1f1e08b97889b22013e6 upstream Chip Select memory size reporting on AMD Family 17h was recently fixed in order to account for interleaving. However, the current method is not robust. The Chip Select Address Mask can be used to find the memory size. There are a couple of cases. 1) For single-rank and dual-rank non-interleaved, use the address mask plus 1 as the size. 2) For dual-rank interleaved, do #1 but "de-interleave" the address mask first. Always "de-interleave" the address mask in order to simplify the code flow. Bit mask manipulation is necessary to check for interleaving, so just go ahead and do the de-interleaving. In the non-interleaved case, the original and de-interleaved address masks will be the same. To de-interleave the mask, count the number of zero bits in the middle of the mask and swap them with the most significant bits. For example, Original=0xFFFF9FE, De-interleaved=0x3FFFFFE Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20190821235938.118710-5-Yazen.Ghannam@amd.comSigned-off-by: NZelin Deng <zelin.deng@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Yazen Ghannam 提交于
fix #29035167 commit 353a1fcb8f9e5857c0fb720b9e57a86c1fb7c17e upstream Currently, the DIMM info for AMD Family 17h systems is initialized in init_csrows(). This function is shared with legacy systems, and it has a limit of two channel support. This prevents initialization of the DIMM info for a number of ranks, so there will be missing ranks in the EDAC sysfs. Create a new init_csrows_df() for Family17h+ and revert init_csrows() back to pre-Family17h support. Loop over all channels in the new function in order to support systems with more than two channels. Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20190821235938.118710-4-Yazen.Ghannam@amd.comSigned-off-by: NZelin Deng <zelin.deng@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Yazen Ghannam 提交于
fix #29035167 commit 8de9930a4618811edfaebc4981a9fafff2af9170 upstream The struct chip_select array that's used for saving chip select bases and masks is fixed at length of two. There should be one struct chip_select for each controller, so this array should be increased to support systems that may have more than two controllers. Increase the size of the struct chip_select array to eight, which is the largest number of controllers per die currently supported on AMD systems. Fix number of DIMMs and Chip Select bases/masks on Family17h, because AMD Family 17h systems support 2 DIMMs, 4 CS bases, and 2 CS masks per channel. Also, carve out the Family 17h+ reading of the bases/masks into a separate function. This effectively reverts the original bases/masks reading code to before Family 17h support was added. Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20190821235938.118710-2-Yazen.Ghannam@amd.comSigned-off-by: NZelin Deng <zelin.deng@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Borislav Petkov 提交于
fix #29035167 commit 8de9930a4618811edfaebc4981a9fafff2af9170 upstream This reverts commit 0a227af521d6df5286550b62f4b591417170b4ea. Unfortunately, this commit caused wrong detection of chip select sizes on some F17h client machines: --- 00-rc6+ 2019-02-14 14:28:03.126622904 +0100 +++ 01-rc4+ 2019-04-14 21:06:16.060614790 +0200 EDAC amd64: MC: 0: 0MB 1: 0MB -EDAC amd64: MC: 2: 16383MB 3: 16383MB +EDAC amd64: MC: 2: 0MB 3: 2097151MB EDAC amd64: MC: 4: 0MB 5: 0MB EDAC amd64: MC: 6: 0MB 7: 0MB EDAC MC: UMC1 chip selects: EDAC amd64: MC: 0: 0MB 1: 0MB -EDAC amd64: MC: 2: 16383MB 3: 16383MB +EDAC amd64: MC: 2: 0MB 3: 2097151MB EDAC amd64: MC: 4: 0MB 5: 0MB EDAC amd64: MC: 6: 0MB 7: 0M Revert it for now until it has been solved properly. Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NZelin Deng <zelin.deng@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Kim Phillips 提交于
fix #29035100 commit e48667b865480d8bf0f1171a8b474ffc785b9ace upstream Family 19h introduces change in slice, core and thread specification in its L3 Performance Event Select (ChL3PmcCfg) h/w register. The change is incompatible with Family 17h's version of the register. Introduce a new path in l3_thread_slice_mask() to do things differently for Family 19h vs. Family 17h, otherwise the new hardware doesn't get programmed correctly. Instead of a linear core--thread bitmask, Family 19h takes an encoded core number, and a separate thread mask. There are new bits that are set for all cores and all slices, of which only the latter is used, since the driver counts events for all slices on behalf of the specified CPU. Also update amd_uncore_init() to base its L2/NB vs. L3/Data Fabric mode decision based on Family 17h or above, not just 17h and 18h: the Family 19h Data Fabric PMC is compatible with the Family 17h DF PMC. [ bp: Touchups. ] Signed-off-by: NKim Phillips <kim.phillips@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NPeter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200313231024.17601-3-kim.phillips@amd.comSigned-off-by: NPeng Wang <rocking@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Kim Phillips 提交于
fix #29035100 commit 9689dbbeaea884d19e3085439c6a247ef986b2af upstream Convert the l3_thread_slice_mask() function to use the more readable topology_* helper functions, more intuitive variable names like shift and thread_mask, and BIT_ULL(). No functional changes. Signed-off-by: NKim Phillips <kim.phillips@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NPeter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200313231024.17601-2-kim.phillips@amd.comSigned-off-by: NPeng Wang <rocking@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Kim Phillips 提交于
fix #29035100 commit 4dcc3df82573a946c620dda5fb00e27c7b080105 upstream In order to better accommodate the upcoming Family 19h, given the 80-char line limit, move the existing code into a new l3_thread_slice_mask() function. No functional changes. [ bp: Touchups. ] Signed-off-by: NKim Phillips <kim.phillips@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NPeter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200313231024.17601-1-kim.phillips@amd.comSigned-off-by: NPeng Wang <rocking@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Kim Phillips 提交于
fix #29035100 commit 753039ef8b2f1078e5bff8cd42f80578bf6385b0 upstream Family 19h CPUs are Zen-based and still share most architectural features with Family 17h CPUs, and therefore still need to call init_amd_zn() e.g., to set the RECLAIM_DISTANCE override. init_amd_zn() also sets X86_FEATURE_ZEN, which today is only used in amd_set_core_ssb_state(), which isn't called on some late model Family 17h CPUs, nor on any Family 19h CPUs: X86_FEATURE_AMD_SSBD replaces X86_FEATURE_LS_CFG_SSBD on those later model CPUs, where the SSBD mitigation is done via the SPEC_CTRL MSR instead of the LS_CFG MSR. Family 19h CPUs also don't have the erratum where the CPB feature bit isn't set, but that code can stay unchanged and run safely on Family 19h. Signed-off-by: NKim Phillips <kim.phillips@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200311191451.13221-1-kim.phillips@amd.comSigned-off-by: NPeng Wang <rocking@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Kim Phillips 提交于
fix #29035100 commit 5738891229a25e9e678122a843cbf0466a456d0c upstream Description of hardware operation --------------------------------- The core AMD PMU has a 4-bit wide per-cycle increment for each performance monitor counter. That works for most events, but now with AMD Family 17h and above processors, some events can occur more than 15 times in a cycle. Those events are called "Large Increment per Cycle" events. In order to count these events, two adjacent h/w PMCs get their count signals merged to form 8 bits per cycle total. In addition, the PERF_CTR count registers are merged to be able to count up to 64 bits. Normally, events like instructions retired, get programmed on a single counter like so: PERF_CTL0 (MSR 0xc0010200) 0x000000000053ff0c # event 0x0c, umask 0xff PERF_CTR0 (MSR 0xc0010201) 0x0000800000000001 # r/w 48-bit count The next counter at MSRs 0xc0010202-3 remains unused, or can be used independently to count something else. When counting Large Increment per Cycle events, such as FLOPs, however, we now have to reserve the next counter and program the PERF_CTL (config) register with the Merge event (0xFFF), like so: PERF_CTL0 (msr 0xc0010200) 0x000000000053ff03 # FLOPs event, umask 0xff PERF_CTR0 (msr 0xc0010201) 0x0000800000000001 # rd 64-bit cnt, wr lo 48b PERF_CTL1 (msr 0xc0010202) 0x0000000f004000ff # Merge event, enable bit PERF_CTR1 (msr 0xc0010203) 0x0000000000000000 # wr hi 16-bits count The count is widened from the normal 48-bits to 64 bits by having the second counter carry the higher 16 bits of the count in its lower 16 bits of its counter register. The odd counter, e.g., PERF_CTL1, is programmed with the enabled Merge event before the even counter, PERF_CTL0. The Large Increment feature is available starting with Family 17h. For more details, search any Family 17h PPR for the "Large Increment per Cycle Events" section, e.g., section 2.1.15.3 on p. 173 in this version: https://www.amd.com/system/files/TechDocs/56176_ppr_Family_17h_Model_71h_B0_pub_Rev_3.06.zip Description of software operation --------------------------------- The following steps are taken in order to support reserving and enabling the extra counter for Large Increment per Cycle events: 1. In the main x86 scheduler, we reduce the number of available counters by the number of Large Increment per Cycle events being scheduled, tracked by a new cpuc variable 'n_pair' and a new amd_put_event_constraints_f17h(). This improves the counter scheduler success rate. 2. In perf_assign_events(), if a counter is assigned to a Large Increment event, we increment the current counter variable, so the counter used for the Merge event is removed from assignment consideration by upcoming event assignments. 3. In find_counter(), if a counter has been found for the Large Increment event, we set the next counter as used, to prevent other events from using it. 4. We perform steps 2 & 3 also in the x86 scheduler fastpath, i.e., we add Merge event accounting to the existing used_mask logic. 5. Finally, we add on the programming of Merge event to the neighbouring PMC counters in the counter enable/disable{_all} code paths. Currently, software does not support a single PMU with mixed 48- and 64-bit counting, so Large increment event counts are limited to 48 bits. In set_period, we zero-out the upper 16 bits of the count, so the hardware doesn't copy them to the even counter's higher bits. Simple invocation example showing counting 8 FLOPs per 256-bit/%ymm vaddps instruction executed in a loop 100 million times: perf stat -e cpu/fp_ret_sse_avx_ops.all/,cpu/instructions/ <workload> Performance counter stats for '<workload>': 800,000,000 cpu/fp_ret_sse_avx_ops.all/u 300,042,101 cpu/instructions/u Prior to this patch, the reported SSE/AVX FLOPs retired count would be wrong. [peterz: lots of renames and edits to the code] Signed-off-by: NKim Phillips <kim.phillips@amd.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NPeng Wang <rocking@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Kim Phillips 提交于
fix #29035100 commit 471af006a747f1c535c8a8c6c0973c320fe01b22 upstream AMD Family 17h processors and above gain support for Large Increment per Cycle events. Unfortunately there is no CPUID or equivalent bit that indicates whether the feature exists or not, so we continue to determine eligibility based on a CPU family number comparison. For Large Increment per Cycle events, we add a f17h-and-compatibles get_event_constraints_f17h() that returns an even counter bitmask: Large Increment per Cycle events can only be placed on PMCs 0, 2, and 4 out of the currently available 0-5. The only currently public event that requires this feature to report valid counts is PMCx003 "Retired SSE/AVX Operations". Note that the CPU family logic in amd_core_pmu_init() is changed so as to be able to selectively add initialization for features available in ranges of backward-compatible CPU families. This Large Increment per Cycle feature is expected to be retained in future families. A side-effect of assigning a new get_constraints function for f17h disables calling the old (prior to f15h) amd_get_event_constraints implementation left enabled by commit e40ed154 ("perf/x86: Add perf support for AMD family-17h processors"), which is no longer necessary since those North Bridge event codes are obsoleted. Also fix a spelling mistake whilst in the area (calulating -> calculating). Fixes: e40ed154 ("perf/x86: Add perf support for AMD family-17h processors") Signed-off-by: NKim Phillips <kim.phillips@amd.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20191114183720.19887-2-kim.phillips@amd.comSigned-off-by: NPeng Wang <rocking@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Reinette Chatre 提交于
fix #29035100 commit 1182a49529edde899be4b4f0e1ab76e626976eb6 upstream perf_event_read_local() is the safest way to obtain measurements associated with performance events. In some cases the overhead introduced by perf_event_read_local() affects the measurements and the use of rdpmcl() is needed. rdpmcl() requires the index of the performance counter used so a helper is introduced to determine the index used by a provided performance event. The index used by a performance event may change when interrupts are enabled. A check is added to ensure that the index is only accessed with interrupts disabled. Even with this check the use of this counter needs to be done with care to ensure it is queried and used within the same disabled interrupts section. This change introduces a new checkpatch warning: CHECK: extern prototypes should be avoided in .h files +extern int x86_perf_rdpmc_index(struct perf_event *event); This warning was discussed and designated as a false positive in http://lkml.kernel.org/r/20180919091759.GZ24124@hirez.programming.kicks-ass.netSuggested-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NReinette Chatre <reinette.chatre@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Cc: acme@kernel.org Cc: gavin.hindman@intel.com Cc: jithu.joseph@intel.com Cc: dave.hansen@intel.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/b277ffa78a51254f5414f7b1bc1923826874566e.1537377064.git.reinette.chatre@intel.comSigned-off-by: NPeng Wang <rocking@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Dust Li 提交于
to #29272054 AF_XDP is a new AF family that support usespace applications communicate with XDP program directly. One promising use case is UDP Both x86_64 and aarch64 are enabled Signed-off-by: NDust Li <dust.li@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Kan Liang 提交于
fix #29130534 commit 2b3b76b5ec67568da4bb475d3ce8a92ef494b5de upstream. Backport summary: Backport to kernel 4.19.57 for ICX uncore support. The uncore subsystem in Ice Lake server is similar to previous server. There are some differences in config register encoding and pci device IDs. The uncore PMON units in Ice Lake server include Ubox, Chabox, IIO, IRP, M2PCIE, PCU, M2M, PCIE3 and IMC. - For CHA, filter 1 register has been removed. The filter 0 register can be used by and of CHA events to be filterd by Thread/Core-ID. To do so, the control register's tid_en bit must be set to 1. - For IIO, there are some changes on event constraints. The MSR address and MSR offsets among counters are also changed. - For IRP, the MSR address and MSR offsets among counters are changed. - For M2PCIE, the counters are accessed by MSR now. Add new MSR address and MSR offsets. Change event constraints. - To determine the number of CHAs, have to read CAPID6(Low) and CAPID7 (High) now. - For M2M, update the PCICFG address and Device ID. - For UPI, update the PCICFG address, Device ID and counter address. - For M3UPI, update the PCICFG address, Device ID, counter address and event constraints. - For IMC, update the formular to calculate MMIO BAR address, which is MMIO_BASE + specific MEM_BAR offset. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NIngo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/1585842411-150452-1-git-send-email-kan.liang@linux.intel.comSigned-off-by: NYunying Sun <yunying.sun@intel.com> Signed-off-by: NPeng Wang <rocking@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Kan Liang 提交于
fix #29130534 commit bc88a2fe216a51e8ab46d61f89d0c1b5a400470e upstream. Backport summary: Backport to kernel 4.19.57 for ICX uncore support. The offset between uncore boxes of free-running counters varies, e.g. IIO free-running counters on Ice Lake server. Add box_offsets, an array of offsets between adjacent uncore boxes. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1584470314-46657-1-git-send-email-kan.liang@linux.intel.comSigned-off-by: NYunying Sun <yunying.sun@intel.com> Signed-off-by: NPeng Wang <rocking@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Kan Liang 提交于
fix #29130534 commit 3442a9ecb8e72a33c28a2b969b766c659830e410 upstream. Backport summary: Backport to kernel 4.19.57 for ICX uncore support. The IMC uncore unit in Ice Lake server can only be accessed by MMIO, which is similar as Snow Ridge. Factor out __snr_uncore_mmio_init_box which can be shared with Ice Lake server in the following patch. No functional changes. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1584470314-46657-2-git-send-email-kan.liang@linux.intel.comSigned-off-by: NYunying Sun <yunying.sun@intel.com> Signed-off-by: NPeng Wang <rocking@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Kan Liang 提交于
fix #29130534 commit ee49532b38dd084650bf715eabe7e3828fb8d275 upstream. Backport summary: Backport to kernel 4.19.57 for ICX uncore support. IMC uncore unit can only be accessed via MMIO on Snow Ridge. The MMIO space of IMC uncore is at the specified offsets from the MEM0_BAR. Add snr_uncore_get_mc_dev() to locate the PCI device with MMIO_BASE and MEM0_BAR register. Add new ops to access the IMC registers via MMIO. Add 3 new free running counters for clocks, read and write bandwidth. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@kernel.org Cc: eranian@google.com Link: https://lkml.kernel.org/r/1556672028-119221-7-git-send-email-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NYunying Sun <yunying.sun@intel.com> Signed-off-by: NPeng Wang <rocking@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Kan Liang 提交于
fix #29130534 commit 07ce734dd8adc0f170d43c15a9b91b707a21b9d7 upstream. Backport summary: Backport to kernel 4.19.57 for ICX uncore support. The client IMC block is accessed by MMIO. Current code uses an informal way to access the block, which is not recommended. Clean up the code by using __iomem annotation and the accessor functions (read[lq]()). Move exit_box() and read_counter() to generic code, which can be shared with the server code later. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@kernel.org Cc: eranian@google.com Link: https://lkml.kernel.org/r/1556672028-119221-6-git-send-email-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NYunying Sun <yunying.sun@intel.com> Signed-off-by: NPeng Wang <rocking@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Kan Liang 提交于
fix #29130534 commit 3da04b8a00dd6d39970b9e764b78c5dfb40ec013 upstream. Backport summary: Backport to kernel 4.19.57 for ICX uncore support. A new MMIO type uncore box is introduced on Snow Ridge server. The counters of MMIO type uncore box can only be accessed by MMIO. Add a new uncore type, uncore_mmio_uncores, for MMIO type uncore blocks. Support MMIO type uncore blocks in CPU hot plug. The MMIO space has to be map/unmap for the first/last CPU. The context also need to be migrated if the bind CPU changes. Add mmio_init() to init and register PMUs for MMIO type uncore blocks. Add a helper to calculate the box_ctl address. The helpers which calculate ctl/ctr can be shared with PCI type uncore blocks. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@kernel.org Cc: eranian@google.com Link: https://lkml.kernel.org/r/1556672028-119221-5-git-send-email-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NYunying Sun <yunying.sun@intel.com> Signed-off-by: NPeng Wang <rocking@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Kan Liang 提交于
fix #29130534 commit c8872d90e0a3651a096860d3241625ccfa1647e0 upstream. Backport summary: Backport to kernel 4.19.57 for ICX uncore support. For uncore box which can only be accessed by MSR, its reference box->refcnt is updated in CPU hot plug. The uncore boxes need to be initalized and exited accordingly for the first/last CPU of a socket. Starts from Snow Ridge server, a new type of uncore box is introduced, which can only be accessed by MMIO. The driver needs to map/unmap MMIO space for the first/last CPU of a socket. Extract the codes of box ref/unref and init/exit for reuse later. There is no functional change. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@kernel.org Cc: eranian@google.com Link: https://lkml.kernel.org/r/1556672028-119221-4-git-send-email-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NYunying Sun <yunying.sun@intel.com> Signed-off-by: NPeng Wang <rocking@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Kan Liang 提交于
fix #29130534 commit 210cc5f9db7a5c66b7ca6290b7d35cc7db7e9dbd upstream. Backport summary: Backport to kernel 4.19.57 for ICX uncore support. The uncore subsystem on Snow Ridge is similar as previous SKX server. The uncore units on Snow Ridge include Ubox, Chabox, IIO, IRP, M2PCIE, PCU, M2M, PCIE3 and IMC. - The config register encoding and pci device IDs are changed. - For CHA, the umask_ext and filter_tid fields are changed. - For IIO, the ch_mask and fc_mask fields are changed. - For M2M, the mask_ext field is changed. - Add new PCIe3 unit for PCIe3 root port which provides the interface between PCIe devices, plugged into the PCIe port, and the components (in M2IOSF). - IMC can only be accessed via MMIO on Snow Ridge now. Current common code doesn't support it yet. IMC will be supported in following patches. - There are 9 free running counters for IIO CLOCKS and bandwidth In. - Full uncore event list is not published yet. Event constrain is not included in this patch. It will be added later separately. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@kernel.org Cc: eranian@google.com Link: https://lkml.kernel.org/r/1556672028-119221-3-git-send-email-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NYunying Sun <yunying.sun@intel.com> Signed-off-by: NPeng Wang <rocking@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Xiaoguang Wang 提交于
task #29063222 Current blk throttle codes always do io statistics even though users don't specify valid throttle rules, which will introduce significant overheads for applications that don't use blk throttle function and is wrose in arm, see below perf data captured in arm: sudo taskset -c 66 fio -ioengine=io_uring -sqthread_poll=1 -hipri=1 -sqthread_poll_cpu=65 -registerfiles=1 -fixedbufs=1 -direct=1 -filename=/dev/nvme0n1 -bs=4k -iodepth=8 -rw=randwrite -time_based -ramp_time=30 -runtime=60 -name="test" Samples: 25K of event 'cycles', Event count (approx.): 16586974662 Overhead Command Shared Object Symbol 3.54% io_uring-sq [kernel.kallsyms] [k] throtl_stats_update_completion 0.89% io_uring-sq [kernel.kallsyms] [k] throtl_bio_end_io 0.66% io_uring-sq [kernel.kallsyms] [k] blk_throtl_bio 0.05% io_uring-sq [kernel.kallsyms] [k] blk_throtl_stat_add 0.05% io_uring-sq [kernel.kallsyms] [k] throtl_track_latency 0.01% io_uring-sq [kernel.kallsyms] [k] blk_throtl_bio_endio Samples: 25K of event 'cycles', Event count (approx.): 16586974662 Overhead Command Shared Object Symbol 1.62% io_uring-sq [kernel.kallsyms] [k] io_submit_sqes 1.06% io_uring-sq [kernel.kallsyms] [k] io_issue_sqe 0.32% io_uring-sq [kernel.kallsyms] [k] __io_queue_sqe 0.06% io_uring-sq [kernel.kallsyms] [k] io_queue_sqe Above test doesn't set valid blk throttle rules, but the overhead introduced by blk throttle is even bigger than many io_uring framework functions, which is not acceptable. To improve this issue, only do do io statistics if users specify valid blk throttle rules, and this will also improve performance. Before this patch: clat (usec): min=5, max=6871, avg=18.70, stdev=17.89 lat (usec): min=9, max=6871, avg=18.84, stdev=17.89 WRITE: bw=1618MiB/s (1697MB/s), 1618MiB/s-1618MiB/s (1697MB/s-1697MB/s), io=94.8GiB (102GB), run=60001-60001msec With this patch: clat (usec): min=5, max=7554, avg=17.49, stdev=18.24 lat (usec): min=9, max=7554, avg=17.62, stdev=18.24 WRITE: bw=1727MiB/s (1810MB/s), 1727MiB/s-1727MiB/s (1810MB/s-1810MB/s), io=101GiB (109GB), run=60001-60001msec About 6.6% bps improvement and 6.4% latency reduction. Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Dust Li 提交于
fix #29180329 CONFIG_REFCOUNT_FULL is used for debugging mainly, for release kernel, it's better to diable it. This patch disables both x86 and aarch64 for release kernel. This has a pretty large performance penalty for will-it-scale:signal1_process when the process number are large. Signed-off-by: NDust Li <dust.li@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Shile Zhang 提交于
to #27305291 Enabled the following configs for NVDIMM support: - CONFIG_ACPI_NFIT=m - CONFIG_NVDIMM_KEYS=y Signed-off-by: NShile Zhang <shile.zhang@linux.alibaba.com> Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com>
-
由 Dave Jiang 提交于
to #27305291 commit 037c8489ade669e0f09ad40d5b91e5e1159a14b1 upstream. Add a zero key in order to standardize hardware that want a key of 0's to be passed. Some platforms defaults to a zero-key with security enabled rather than allow the OS to enable the security. The zero key would allow us to manage those platform as well. This also adds a fix to secure erase so it can use the zero key to do crypto erase. Some other security commands already use zero keys. This introduces a standard zero-key to allow unification of semantics cross nvdimm security commands. Signed-off-by: NDave Jiang <dave.jiang@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NShile Zhang <shile.zhang@linux.alibaba.com> Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com>
-
由 Dave Jiang 提交于
to #27305291 commit 1f4883f300da4f4d9d31eaa80f7debf6ce74843b upstream. Add theory of operation for the security support that's going into libnvdimm. Signed-off-by: NDave Jiang <dave.jiang@intel.com> Reviewed-by: NJing Lin <jing.lin@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NShile Zhang <shile.zhang@linux.alibaba.com> Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com>
-
由 Dave Jiang 提交于
to #27305291 commit ecaa4a97b3908be0bf3ad12181ae8c44d1816d40 upstream. Adding test support for new Intel DSM from v1.8. The ability of simulating master passphrase update and master secure erase have been added to nfit_test. Signed-off-by: NDave Jiang <dave.jiang@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NShile Zhang <shile.zhang@linux.alibaba.com> Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com>
-