- 02 9月, 2020 40 次提交
-
-
由 Xuan Zhuo 提交于
to #27804112 Signed-off-by: NXuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: NDust Li <dust.li@linux.alibaba.com> Reviewed-by: NYa Zhao <zhaoya123@linux.alibaba.com> Reviewed-by: NCambda Zhu <cambda@linux.alibaba.com>
-
由 Xuan Zhuo 提交于
to #27804112 The peer ports add server_time, recv_time, recv_data statistics, and modify the upload keyword to recv, which is more common for local and peer. Signed-off-by: NXuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: NDust Li <dust.li@linux.alibaba.com> Reviewed-by: NYa Zhao <zhaoya123@linux.alibaba.com> Reviewed-by: NCambda Zhu <cambda@linux.alibaba.com>
-
由 Xuan Zhuo 提交于
to #27804112 remove con_num from struct tcp_rt_stats, that is not used. Add new type TCPRT_TYPE_PEER_PORT_RANG, then stats_peer also use the two-dimensional array. Signed-off-by: NXuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: NDust Li <dust.li@linux.alibaba.com> Reviewed-by: NYa Zhao <zhaoya123@linux.alibaba.com> Reviewed-by: NCambda Zhu <cambda@linux.alibaba.com>
-
由 Xuan Zhuo 提交于
to #27804112 _tcp_rt_stats is used to save the values returned from tcp_rt_stats. These values were originally 64-bit, and now stored in u32, some larger variables will overflow, so they are modified to 64-bit. Signed-off-by: NXuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: NDust Li <dust.li@linux.alibaba.com> Reviewed-by: NYa Zhao <zhaoya123@linux.alibaba.com> Reviewed-by: NCambda Zhu <cambda@linux.alibaba.com>
-
由 Xuan Zhuo 提交于
to #27804112 1. Call atomic_read first and then atomic_set which will cause some statistics to be lost 2. The instruction of atomic class is relatively slow, and calling it twice for each variable in succession is a big harm to performance. Signed-off-by: NXuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: NDust Li <dust.li@linux.alibaba.com> Reviewed-by: NYa Zhao <zhaoya123@linux.alibaba.com> Reviewed-by: NCambda Zhu <cambda@linux.alibaba.com>
-
由 Xuan Zhuo 提交于
to #27804112 Signed-off-by: NXuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: NDust Li <dust.li@linux.alibaba.com> Reviewed-by: NYa Zhao <zhaoya123@linux.alibaba.com> Reviewed-by: NCambda Zhu <cambda@linux.alibaba.com>
-
由 Xuan Zhuo 提交于
to #27804112 Signed-off-by: NXuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: NDust Li <dust.li@linux.alibaba.com> Reviewed-by: NYa Zhao <zhaoya123@linux.alibaba.com> Reviewed-by: NCambda Zhu <cambda@linux.alibaba.com>
-
由 Xuan Zhuo 提交于
to #27804112 Use prefixes to distinguish between local and peer ports. Simplify the parameter length, and also consider that you can increase the filter conditions of the peer port range or peer address later. Signed-off-by: NXuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: NDust Li <dust.li@linux.alibaba.com> Reviewed-by: NYa Zhao <zhaoya123@linux.alibaba.com> Reviewed-by: NCambda Zhu <cambda@linux.alibaba.com>
-
由 Xuan Zhuo 提交于
to #27804112 Only count mrtt at the end of the connection. In the case of ports peer, if mrtt is counted in each output, it will cause repeated statistics. Signed-off-by: NXuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: NDust Li <dust.li@linux.alibaba.com> Reviewed-by: NYa Zhao <zhaoya123@linux.alibaba.com> Reviewed-by: NCambda Zhu <cambda@linux.alibaba.com>
-
由 Xuan Zhuo 提交于
to #27804112 1. Using relay's overwrite working mode, when the buffer is full, the old data is discarded, and the new data is written to the buffer. 2. Since stats is triggered by a timer, there will be no concurrency, so the method of outputting one relay file per CPU is not suitable, so it is modified to have only one "rt-network-stats" file. 3. Use the relay open to transfer the parent directory tcp-rt. Signed-off-by: NXuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: NDust Li <dust.li@linux.alibaba.com> Reviewed-by: NYa Zhao <zhaoya123@linux.alibaba.com> Reviewed-by: NCambda Zhu <cambda@linux.alibaba.com>
-
由 Xuan Zhuo 提交于
to #27804112 Uniformly modify "check" "statis" to stats. Since some users do not use the stats function, and the stats function takes up some resources, modifying stats is not enabled by default. Signed-off-by: NXuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: NDust Li <dust.li@linux.alibaba.com> Reviewed-by: NYa Zhao <zhaoya123@linux.alibaba.com> Reviewed-by: NCambda Zhu <cambda@linux.alibaba.com>
-
由 Xuan Zhuo 提交于
to #27804112 ports_range configured as '1000,2000,4000,6000', that means port range 1000-2000 or 4000-6000 will be monitored. If the user uses ports_range then the real function will have too many ports to count. If we apply space for all ports at once when loading the module, this is a waste of space, so we use a two-dimensional array to solve this problem. A continuous space is allocated as needed to save a certain amount of statistical information. Signed-off-by: NXuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: NDust Li <dust.li@linux.alibaba.com> Reviewed-by: NYa Zhao <zhaoya123@linux.alibaba.com> Reviewed-by: NCambda Zhu <cambda@linux.alibaba.com>
-
由 Xuan Zhuo 提交于
to #27804112 TCP-RT is a kernel module for monitoring services at the tcp level. TCP-RT is essentially a trace method. By burying points in advance in the corresponding position in the kernel tcp protocol stack, we can identify the request and response from the scenario where there is only one concurrent request for a single connection, then collect the time when the request is received in the protocol stack and the time-consuming information on the processing of the service process and so on. In addition, TCP-RT also supports some statistical analysis in the kernel and periodically outputs statistical information about the specified connection. Signed-off-by: NXuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: NDust Li <dust.li@linux.alibaba.com> Reviewed-by: NYa Zhao <zhaoya123@linux.alibaba.com> Reviewed-by: NCambda Zhu <cambda@linux.alibaba.com>
-
由 Zelin Deng 提交于
fix #29335392 DPC stands for downstream port containment, it will halt PCIE traffic below a downstream port after an unmasked uncorrectable error is detected at or below the port, so that it can avoid the potential spread of any data corruption. Signed-off-by: NZelin Deng <zelin.deng@linux.alibaba.com> Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
-
由 Joseph Qi 提交于
to #29357063 The blkg lookup or create logic may bring much overhead even iocost is disabled. So bypass it earlier in such case. Fixes: 9da41925 ("alinux: iocost: fix NULL pointer dereference in ioc_rqos_throttle") Reported-by: NHongnan Li <hongnan.li@linux.alibaba.com> Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
-
由 youngjun 提交于
to #29273482 When "ovl_is_inuse" true case, trap inode reference not put. plus adding the comment explaining sequence of ovl_is_inuse after ovl_setup_trap. Fixes: 0be0bfd2de9d ("ovl: fix regression caused by overlapping layers detection") Cc: <stable@vger.kernel.org> # v4.19+ Reviewed-by: NAmir Goldstein <amir73il@gmail.com> Signed-off-by: Nyoungjun <her0gyugyu@gmail.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Link: https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git/commit/?h=overlayfs-next&id=24f14009b8f1754ec2ae4c168940c01259b0f88aSigned-off-by: NJeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Dust Li 提交于
fix #29262413 The original patch didn't fit 4.19 well since upstream kernel use #if HAVE_ATTR_TEST xxx #endif but in 4.19, we use #ifdef HAVE_ATTR_TEST xxx #endif As a result, the origin patch enabled the macro in #ifdef HAVE_ATTR_TEST and finnaly cause the build fail when run: $make M=samples/bpf make -C /mnt/nvme0/wuya/kernel/aliyunlinux2/ck/samples/bpf/../../tools/lib/bpf/ RM='rm -rf' LDFLAGS= srctree=/mnt/nvme0/wuya/kernel/aliyunlinux2/ck/samples/bpf/../../ O= Warning: Kernel ABI header at 'tools/include/uapi/linux/bpf.h' differs from latest version at 'include/uapi/linux/bpf.h' CC samples/bpf/syscall_nrs.s UPD samples/bpf/syscall_nrs.h HOSTCC samples/bpf/test_lru_dist HOSTCC samples/bpf/sock_example HOSTCC samples/bpf/bpf_load.o In file included from ./tools/perf/perf-sys.h:9:0, from samples/bpf/bpf_load.c:29: ./tools/perf/perf-sys.h: In function ‘sys_perf_event_open’: ./tools/perf/perf-sys.h:68:15: error: ‘test_attr__enabled’ undeclared (first use in this function) if (unlikely(test_attr__enabled)) ^ ./tools/include/linux/compiler.h:74:43: note: in definition of macro ‘unlikely’ # define unlikely(x) __builtin_expect(!!(x), 0) ^ ./tools/perf/perf-sys.h:68:15: note: each undeclared identifier is reported only once for each function it appears in if (unlikely(test_attr__enabled)) ^ ./tools/include/linux/compiler.h:74:43: note: in definition of macro ‘unlikely’ # define unlikely(x) __builtin_expect(!!(x), 0) ^ In file included from samples/bpf/bpf_load.c:29:0: ./tools/perf/perf-sys.h:69:3: warning: implicit declaration of function ‘test_attr__open’ [-Wimplicit-function-declaration] test_attr__open(attr, pid, cpu, fd, group_fd, flags); ^~~~~~~~~~~~~~~ make[1]: *** [samples/bpf/bpf_load.o] Error 1 make: *** [_module_samples/bpf] Error 2 This reverts commit 4665759a. Signed-off-by: NDust Li <dust.li@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 KP Singh 提交于
to #29262413 commit 98beb3edeb974e906a81f305d88f7bc96b2ec83e upstream. This was added in commit eb111869301e ("compiler-types.h: add asm_inline definition") and breaks samples/bpf as clang does not support asm __inline. Fixes: eb111869301e ("compiler-types.h: add asm_inline definition") Co-developed-by: NFlorent Revest <revest@google.com> Signed-off-by: NFlorent Revest <revest@google.com> Signed-off-by: NKP Singh <kpsingh@google.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NSong Liu <songliubraving@fb.com> Acked-by: NAndrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20191002191652.11432-1-kpsingh@chromium.orgSigned-off-by: NDust Li <dust.li@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Alexei Starovoitov 提交于
to #29262413 commit 636e78b1cdb40b77a79b143dbd9d94847b360efa upstream. clang started to error on invalid asm clobber usage in x86 headers and many bpf program samples failed to build with the message: CLANG-bpf /data/users/ast/bpf-next/samples/bpf/xdp_redirect_kern.o In file included from /data/users/ast/bpf-next/samples/bpf/xdp_redirect_kern.c:14: In file included from ../include/linux/in.h:23: In file included from ../include/uapi/linux/in.h:24: In file included from ../include/linux/socket.h:8: In file included from ../include/linux/uio.h:14: In file included from ../include/crypto/hash.h:16: In file included from ../include/linux/crypto.h:26: In file included from ../include/linux/uaccess.h:5: In file included from ../include/linux/sched.h:15: In file included from ../include/linux/sem.h:5: In file included from ../include/uapi/linux/sem.h:5: In file included from ../include/linux/ipc.h:9: In file included from ../include/linux/refcount.h:72: ../arch/x86/include/asm/refcount.h:72:36: error: asm-specifier for input or output variable conflicts with asm clobber list r->refs.counter, e, "er", i, "cx"); ^ ../arch/x86/include/asm/refcount.h:86:27: error: asm-specifier for input or output variable conflicts with asm clobber list r->refs.counter, e, "cx"); ^ 2 errors generated. Override volatile() to workaround the problem. Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDust Li <dust.li@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Yonghong Song 提交于
fix #29255532 commit 6bf3bbe1f4d4cf405e3c2bf07bbdff56d3223ec8 upstream. x86 compilation has required asm goto support since 4.17. Since clang does not support asm goto, at 4.17, Commit b1ae32db ("x86/cpufeature: Guard asm_volatile_goto usage for BPF compilation") worked around the issue by permitting an alternative implementation without asm goto for clang. At 5.0, more asm goto usages appeared. [yhs@148 x86]$ egrep -r asm_volatile_goto include/asm/cpufeature.h: asm_volatile_goto("1: jmp 6f\n" include/asm/jump_label.h: asm_volatile_goto("1:" include/asm/jump_label.h: asm_volatile_goto("1:" include/asm/rmwcc.h: asm_volatile_goto (fullop "; j" #cc " %l[cc_label]" \ include/asm/uaccess.h: asm_volatile_goto("\n" \ include/asm/uaccess.h: asm_volatile_goto("\n" \ [yhs@148 x86]$ Compiling samples/bpf directories, most bpf programs failed compilation with error messages like: In file included from /home/yhs/work/bpf-next/samples/bpf/xdp_sample_pkts_kern.c:2: In file included from /home/yhs/work/bpf-next/include/linux/ptrace.h:6: In file included from /home/yhs/work/bpf-next/include/linux/sched.h:15: In file included from /home/yhs/work/bpf-next/include/linux/sem.h:5: In file included from /home/yhs/work/bpf-next/include/uapi/linux/sem.h:5: In file included from /home/yhs/work/bpf-next/include/linux/ipc.h:9: In file included from /home/yhs/work/bpf-next/include/linux/refcount.h:72: /home/yhs/work/bpf-next/arch/x86/include/asm/refcount.h:70:9: error: 'asm goto' constructs are not supported yet return GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl", ^ /home/yhs/work/bpf-next/arch/x86/include/asm/rmwcc.h:67:2: note: expanded from macro 'GEN_BINARY_SUFFIXED_RMWcc' __GEN_RMWcc(op " %[val], %[var]\n\t" suffix, var, cc, \ ^ /home/yhs/work/bpf-next/arch/x86/include/asm/rmwcc.h:21:2: note: expanded from macro '__GEN_RMWcc' asm_volatile_goto (fullop "; j" #cc " %l[cc_label]" \ ^ /home/yhs/work/bpf-next/include/linux/compiler_types.h:188:37: note: expanded from macro 'asm_volatile_goto' #define asm_volatile_goto(x...) asm goto(x) Most implementation does not even provide an alternative implementation. And it is also not practical to make changes for each call site. This patch workarounded the asm goto issue by redefining the macro like below: #define asm_volatile_goto(x...) asm volatile("invalid use of asm_volatile_goto") If asm_volatile_goto is not used by bpf programs, which is typically the case, nothing bad will happen. If asm_volatile_goto is used by bpf programs, which is incorrect, the compiler will issue an error since "invalid use of asm_volatile_goto" is not valid assembly codes. With this patch, all bpf programs under samples/bpf can pass compilation. Note that bpf programs under tools/testing/selftests/bpf/ compiled fine as they do not access kernel internal headers. Fixes: e769742d3584 ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"") Fixes: 18fe5822 ("x86, asm: change the GEN_*_RMWcc() macros to not quote the condition") Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDust Li <dust.li@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Pavel Begunkov 提交于
to #29276773 commit 16d598030a37853a7a6b4384cad19c9c0af2f021 upstream. 59960b9deb535 ("io_uring: fix lazy work init") tried to fix missing io_req_init_async(), but left out work.flags and hash. Do it earlier. Fixes: 7cdaf587de7c ("io_uring: avoid whole io_wq_work copy for requests completed inline") Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com> Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Pavel Begunkov 提交于
to #29276773 commit dd821e0c95a64b5923a0c57f07d3f7563553e756 upstream. Ensure to set msg.msg_name for the async portion of send/recvmsg, as the header copy will copy to/from it. Cc: stable@vger.kernel.org # v5.5+ Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com> Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Jens Axboe 提交于
to #29276773 commit 309fc03a3284af62eb6082fb60327045a1dabf57 upstream. We currently account the memory after the exit work has been run, but that leaves a gap where a process has closed its ring and until the memory has been accounted as freed. If the memlocked ulimit is borderline, then that can introduce spurious setup errors returning -ENOMEM because the free work hasn't been run yet. Account this as freed when we close the ring, as not to expose a tiny gap where setting up a new ring can fail. Fixes: 85faa7b8346e ("io_uring: punt final io_ring_ctx wait-and-free to workqueue") Cc: stable@vger.kernel.org # v5.7 Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com> Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Yang Yingliang 提交于
to #29276773 commit 667e57da358f61b6966e12e925a69e42d912e8bb upstream. I got a memleak report when doing some fuzz test: BUG: memory leak unreferenced object 0x607eeac06e78 (size 8): comm "test", pid 295, jiffies 4294735835 (age 31.745s) hex dump (first 8 bytes): 00 00 00 00 00 00 00 00 ........ backtrace: [<00000000932632e6>] percpu_ref_init+0x2a/0x1b0 [<0000000092ddb796>] __io_uring_register+0x111d/0x22a0 [<00000000eadd6c77>] __x64_sys_io_uring_register+0x17b/0x480 [<00000000591b89a6>] do_syscall_64+0x56/0xa0 [<00000000864a281d>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Call percpu_ref_exit() on error path to avoid refcount memleak. Fixes: 05f3fb3c5397 ("io_uring: avoid ring quiesce for fixed file set unregister and update") Cc: stable@vger.kernel.org Reported-by: NHulk Robot <hulkci@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com> Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Yang Yingliang 提交于
to #29276773 commit f3bd9dae3708a0ff6b067e766073ffeb853301f9 upstream. I got a memleak report when doing some fuzz test: BUG: memory leak unreferenced object 0xffff888113e02300 (size 488): comm "syz-executor401", pid 356, jiffies 4294809529 (age 11.954s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ a0 a4 ce 19 81 88 ff ff 60 ce 09 0d 81 88 ff ff ........`....... backtrace: [<00000000129a84ec>] kmem_cache_zalloc include/linux/slab.h:659 [inline] [<00000000129a84ec>] __alloc_file+0x25/0x310 fs/file_table.c:101 [<000000003050ad84>] alloc_empty_file+0x4f/0x120 fs/file_table.c:151 [<000000004d0a41a3>] alloc_file+0x5e/0x550 fs/file_table.c:193 [<000000002cb242f0>] alloc_file_pseudo+0x16a/0x240 fs/file_table.c:233 [<00000000046a4baa>] anon_inode_getfile fs/anon_inodes.c:91 [inline] [<00000000046a4baa>] anon_inode_getfile+0xac/0x1c0 fs/anon_inodes.c:74 [<0000000035beb745>] __do_sys_perf_event_open+0xd4a/0x2680 kernel/events/core.c:11720 [<0000000049009dc7>] do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359 [<00000000353731ca>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 BUG: memory leak unreferenced object 0xffff8881152dd5e0 (size 16): comm "syz-executor401", pid 356, jiffies 4294809529 (age 11.954s) hex dump (first 16 bytes): 01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<0000000074caa794>] kmem_cache_zalloc include/linux/slab.h:659 [inline] [<0000000074caa794>] lsm_file_alloc security/security.c:567 [inline] [<0000000074caa794>] security_file_alloc+0x32/0x160 security/security.c:1440 [<00000000c6745ea3>] __alloc_file+0xba/0x310 fs/file_table.c:106 [<000000003050ad84>] alloc_empty_file+0x4f/0x120 fs/file_table.c:151 [<000000004d0a41a3>] alloc_file+0x5e/0x550 fs/file_table.c:193 [<000000002cb242f0>] alloc_file_pseudo+0x16a/0x240 fs/file_table.c:233 [<00000000046a4baa>] anon_inode_getfile fs/anon_inodes.c:91 [inline] [<00000000046a4baa>] anon_inode_getfile+0xac/0x1c0 fs/anon_inodes.c:74 [<0000000035beb745>] __do_sys_perf_event_open+0xd4a/0x2680 kernel/events/core.c:11720 [<0000000049009dc7>] do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359 [<00000000353731ca>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 If io_sqe_file_register() failed, we need put the file that get by fget() to avoid the memleak. Fixes: c3a31e605620 ("io_uring: add support for IORING_REGISTER_FILES_UPDATE") Cc: stable@vger.kernel.org Reported-by: NHulk Robot <hulkci@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com> Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 David Howells 提交于
task #29263287 commit 3f19b2ab97a97b413c24b66c67ae16daa4f56c35 upstream Make the inode hash table RCU searchable so that searches that want to access or modify an inode without taking a ref on that inode can do so without taking the inode hash table lock. The main thing this requires is some RCU annotation on the list manipulation operations. Inodes are already freed by RCU in most cases. Users of this interface must take care as the inode may be still under construction or may be being torn down around them. There are at least three instances where this can be of use: (1) Testing whether the inode number iunique() is going to return is currently unique (the iunique_lock is still held). (2) Ext4 date stamp updating. (3) AFS callback breaking. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> cc: linux-ext4@vger.kernel.org cc: linux-afs@lists.infradead.org [jeffle: resolve collision in afs_break_one_callback since code base change] Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Xiaoguang Wang 提交于
to #29233603 commit 6d5f904904608a9cd32854d7d0a4dd65b27f9935 upstream For those applications which are not willing to use io_uring_enter() to reap and handle cqes, they may completely rely on liburing's io_uring_peek_cqe(), but if cq ring has overflowed, currently because io_uring_peek_cqe() is not aware of this overflow, it won't enter kernel to flush cqes, below test program can reveal this bug: static void test_cq_overflow(struct io_uring *ring) { struct io_uring_cqe *cqe; struct io_uring_sqe *sqe; int issued = 0; int ret = 0; do { sqe = io_uring_get_sqe(ring); if (!sqe) { fprintf(stderr, "get sqe failed\n"); break;; } ret = io_uring_submit(ring); if (ret <= 0) { if (ret != -EBUSY) fprintf(stderr, "sqe submit failed: %d\n", ret); break; } issued++; } while (ret > 0); assert(ret == -EBUSY); printf("issued requests: %d\n", issued); while (issued) { ret = io_uring_peek_cqe(ring, &cqe); if (ret) { if (ret != -EAGAIN) { fprintf(stderr, "peek completion failed: %s\n", strerror(ret)); break; } printf("left requets: %d\n", issued); continue; } io_uring_cqe_seen(ring, cqe); issued--; printf("left requets: %d\n", issued); } } int main(int argc, char *argv[]) { int ret; struct io_uring ring; ret = io_uring_queue_init(16, &ring, 0); if (ret) { fprintf(stderr, "ring setup failed: %d\n", ret); return 1; } test_cq_overflow(&ring); return 0; } To fix this issue, export cq overflow status to userspace by adding new IORING_SQ_CQ_OVERFLOW flag, then helper functions() in liburing, such as io_uring_peek_cqe, can be aware of this cq overflow and do flush accordingly. Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Yazen Ghannam 提交于
fix #29035167 commit 466503d6b1b33be46ab87c6090f0ade6c6011cbc upstream The following commit introduced a warning on error reports without a non-zero grain value. 3724ace582d9 ("EDAC/mc: Fix grain_bits calculation") The amd64_edac_mod module does not provide a value, so the warning will be given on the first reported memory error. Set the grain per DIMM to cacheline size (64 bytes). This is the current recommendation. Fixes: 3724ace582d9 ("EDAC/mc: Fix grain_bits calculation") Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Robert Richter <rrichter@marvell.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191022203448.13962-7-Yazen.Ghannam@amd.comSigned-off-by: NZelin Deng <zelin.deng@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Yazen Ghannam 提交于
fix #29035167 commit e53a3b267fb0a79db9ca1f1e08b97889b22013e6 upstream Chip Select memory size reporting on AMD Family 17h was recently fixed in order to account for interleaving. However, the current method is not robust. The Chip Select Address Mask can be used to find the memory size. There are a couple of cases. 1) For single-rank and dual-rank non-interleaved, use the address mask plus 1 as the size. 2) For dual-rank interleaved, do #1 but "de-interleave" the address mask first. Always "de-interleave" the address mask in order to simplify the code flow. Bit mask manipulation is necessary to check for interleaving, so just go ahead and do the de-interleaving. In the non-interleaved case, the original and de-interleaved address masks will be the same. To de-interleave the mask, count the number of zero bits in the middle of the mask and swap them with the most significant bits. For example, Original=0xFFFF9FE, De-interleaved=0x3FFFFFE Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20190821235938.118710-5-Yazen.Ghannam@amd.comSigned-off-by: NZelin Deng <zelin.deng@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Yazen Ghannam 提交于
fix #29035167 commit 353a1fcb8f9e5857c0fb720b9e57a86c1fb7c17e upstream Currently, the DIMM info for AMD Family 17h systems is initialized in init_csrows(). This function is shared with legacy systems, and it has a limit of two channel support. This prevents initialization of the DIMM info for a number of ranks, so there will be missing ranks in the EDAC sysfs. Create a new init_csrows_df() for Family17h+ and revert init_csrows() back to pre-Family17h support. Loop over all channels in the new function in order to support systems with more than two channels. Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20190821235938.118710-4-Yazen.Ghannam@amd.comSigned-off-by: NZelin Deng <zelin.deng@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Yazen Ghannam 提交于
fix #29035167 commit 8de9930a4618811edfaebc4981a9fafff2af9170 upstream The struct chip_select array that's used for saving chip select bases and masks is fixed at length of two. There should be one struct chip_select for each controller, so this array should be increased to support systems that may have more than two controllers. Increase the size of the struct chip_select array to eight, which is the largest number of controllers per die currently supported on AMD systems. Fix number of DIMMs and Chip Select bases/masks on Family17h, because AMD Family 17h systems support 2 DIMMs, 4 CS bases, and 2 CS masks per channel. Also, carve out the Family 17h+ reading of the bases/masks into a separate function. This effectively reverts the original bases/masks reading code to before Family 17h support was added. Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20190821235938.118710-2-Yazen.Ghannam@amd.comSigned-off-by: NZelin Deng <zelin.deng@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Borislav Petkov 提交于
fix #29035167 commit 8de9930a4618811edfaebc4981a9fafff2af9170 upstream This reverts commit 0a227af521d6df5286550b62f4b591417170b4ea. Unfortunately, this commit caused wrong detection of chip select sizes on some F17h client machines: --- 00-rc6+ 2019-02-14 14:28:03.126622904 +0100 +++ 01-rc4+ 2019-04-14 21:06:16.060614790 +0200 EDAC amd64: MC: 0: 0MB 1: 0MB -EDAC amd64: MC: 2: 16383MB 3: 16383MB +EDAC amd64: MC: 2: 0MB 3: 2097151MB EDAC amd64: MC: 4: 0MB 5: 0MB EDAC amd64: MC: 6: 0MB 7: 0MB EDAC MC: UMC1 chip selects: EDAC amd64: MC: 0: 0MB 1: 0MB -EDAC amd64: MC: 2: 16383MB 3: 16383MB +EDAC amd64: MC: 2: 0MB 3: 2097151MB EDAC amd64: MC: 4: 0MB 5: 0MB EDAC amd64: MC: 6: 0MB 7: 0M Revert it for now until it has been solved properly. Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NZelin Deng <zelin.deng@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Kim Phillips 提交于
fix #29035100 commit e48667b865480d8bf0f1171a8b474ffc785b9ace upstream Family 19h introduces change in slice, core and thread specification in its L3 Performance Event Select (ChL3PmcCfg) h/w register. The change is incompatible with Family 17h's version of the register. Introduce a new path in l3_thread_slice_mask() to do things differently for Family 19h vs. Family 17h, otherwise the new hardware doesn't get programmed correctly. Instead of a linear core--thread bitmask, Family 19h takes an encoded core number, and a separate thread mask. There are new bits that are set for all cores and all slices, of which only the latter is used, since the driver counts events for all slices on behalf of the specified CPU. Also update amd_uncore_init() to base its L2/NB vs. L3/Data Fabric mode decision based on Family 17h or above, not just 17h and 18h: the Family 19h Data Fabric PMC is compatible with the Family 17h DF PMC. [ bp: Touchups. ] Signed-off-by: NKim Phillips <kim.phillips@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NPeter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200313231024.17601-3-kim.phillips@amd.comSigned-off-by: NPeng Wang <rocking@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Kim Phillips 提交于
fix #29035100 commit 9689dbbeaea884d19e3085439c6a247ef986b2af upstream Convert the l3_thread_slice_mask() function to use the more readable topology_* helper functions, more intuitive variable names like shift and thread_mask, and BIT_ULL(). No functional changes. Signed-off-by: NKim Phillips <kim.phillips@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NPeter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200313231024.17601-2-kim.phillips@amd.comSigned-off-by: NPeng Wang <rocking@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Kim Phillips 提交于
fix #29035100 commit 4dcc3df82573a946c620dda5fb00e27c7b080105 upstream In order to better accommodate the upcoming Family 19h, given the 80-char line limit, move the existing code into a new l3_thread_slice_mask() function. No functional changes. [ bp: Touchups. ] Signed-off-by: NKim Phillips <kim.phillips@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NPeter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200313231024.17601-1-kim.phillips@amd.comSigned-off-by: NPeng Wang <rocking@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Kim Phillips 提交于
fix #29035100 commit 753039ef8b2f1078e5bff8cd42f80578bf6385b0 upstream Family 19h CPUs are Zen-based and still share most architectural features with Family 17h CPUs, and therefore still need to call init_amd_zn() e.g., to set the RECLAIM_DISTANCE override. init_amd_zn() also sets X86_FEATURE_ZEN, which today is only used in amd_set_core_ssb_state(), which isn't called on some late model Family 17h CPUs, nor on any Family 19h CPUs: X86_FEATURE_AMD_SSBD replaces X86_FEATURE_LS_CFG_SSBD on those later model CPUs, where the SSBD mitigation is done via the SPEC_CTRL MSR instead of the LS_CFG MSR. Family 19h CPUs also don't have the erratum where the CPB feature bit isn't set, but that code can stay unchanged and run safely on Family 19h. Signed-off-by: NKim Phillips <kim.phillips@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200311191451.13221-1-kim.phillips@amd.comSigned-off-by: NPeng Wang <rocking@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Kim Phillips 提交于
fix #29035100 commit 5738891229a25e9e678122a843cbf0466a456d0c upstream Description of hardware operation --------------------------------- The core AMD PMU has a 4-bit wide per-cycle increment for each performance monitor counter. That works for most events, but now with AMD Family 17h and above processors, some events can occur more than 15 times in a cycle. Those events are called "Large Increment per Cycle" events. In order to count these events, two adjacent h/w PMCs get their count signals merged to form 8 bits per cycle total. In addition, the PERF_CTR count registers are merged to be able to count up to 64 bits. Normally, events like instructions retired, get programmed on a single counter like so: PERF_CTL0 (MSR 0xc0010200) 0x000000000053ff0c # event 0x0c, umask 0xff PERF_CTR0 (MSR 0xc0010201) 0x0000800000000001 # r/w 48-bit count The next counter at MSRs 0xc0010202-3 remains unused, or can be used independently to count something else. When counting Large Increment per Cycle events, such as FLOPs, however, we now have to reserve the next counter and program the PERF_CTL (config) register with the Merge event (0xFFF), like so: PERF_CTL0 (msr 0xc0010200) 0x000000000053ff03 # FLOPs event, umask 0xff PERF_CTR0 (msr 0xc0010201) 0x0000800000000001 # rd 64-bit cnt, wr lo 48b PERF_CTL1 (msr 0xc0010202) 0x0000000f004000ff # Merge event, enable bit PERF_CTR1 (msr 0xc0010203) 0x0000000000000000 # wr hi 16-bits count The count is widened from the normal 48-bits to 64 bits by having the second counter carry the higher 16 bits of the count in its lower 16 bits of its counter register. The odd counter, e.g., PERF_CTL1, is programmed with the enabled Merge event before the even counter, PERF_CTL0. The Large Increment feature is available starting with Family 17h. For more details, search any Family 17h PPR for the "Large Increment per Cycle Events" section, e.g., section 2.1.15.3 on p. 173 in this version: https://www.amd.com/system/files/TechDocs/56176_ppr_Family_17h_Model_71h_B0_pub_Rev_3.06.zip Description of software operation --------------------------------- The following steps are taken in order to support reserving and enabling the extra counter for Large Increment per Cycle events: 1. In the main x86 scheduler, we reduce the number of available counters by the number of Large Increment per Cycle events being scheduled, tracked by a new cpuc variable 'n_pair' and a new amd_put_event_constraints_f17h(). This improves the counter scheduler success rate. 2. In perf_assign_events(), if a counter is assigned to a Large Increment event, we increment the current counter variable, so the counter used for the Merge event is removed from assignment consideration by upcoming event assignments. 3. In find_counter(), if a counter has been found for the Large Increment event, we set the next counter as used, to prevent other events from using it. 4. We perform steps 2 & 3 also in the x86 scheduler fastpath, i.e., we add Merge event accounting to the existing used_mask logic. 5. Finally, we add on the programming of Merge event to the neighbouring PMC counters in the counter enable/disable{_all} code paths. Currently, software does not support a single PMU with mixed 48- and 64-bit counting, so Large increment event counts are limited to 48 bits. In set_period, we zero-out the upper 16 bits of the count, so the hardware doesn't copy them to the even counter's higher bits. Simple invocation example showing counting 8 FLOPs per 256-bit/%ymm vaddps instruction executed in a loop 100 million times: perf stat -e cpu/fp_ret_sse_avx_ops.all/,cpu/instructions/ <workload> Performance counter stats for '<workload>': 800,000,000 cpu/fp_ret_sse_avx_ops.all/u 300,042,101 cpu/instructions/u Prior to this patch, the reported SSE/AVX FLOPs retired count would be wrong. [peterz: lots of renames and edits to the code] Signed-off-by: NKim Phillips <kim.phillips@amd.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NPeng Wang <rocking@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Kim Phillips 提交于
fix #29035100 commit 471af006a747f1c535c8a8c6c0973c320fe01b22 upstream AMD Family 17h processors and above gain support for Large Increment per Cycle events. Unfortunately there is no CPUID or equivalent bit that indicates whether the feature exists or not, so we continue to determine eligibility based on a CPU family number comparison. For Large Increment per Cycle events, we add a f17h-and-compatibles get_event_constraints_f17h() that returns an even counter bitmask: Large Increment per Cycle events can only be placed on PMCs 0, 2, and 4 out of the currently available 0-5. The only currently public event that requires this feature to report valid counts is PMCx003 "Retired SSE/AVX Operations". Note that the CPU family logic in amd_core_pmu_init() is changed so as to be able to selectively add initialization for features available in ranges of backward-compatible CPU families. This Large Increment per Cycle feature is expected to be retained in future families. A side-effect of assigning a new get_constraints function for f17h disables calling the old (prior to f15h) amd_get_event_constraints implementation left enabled by commit e40ed154 ("perf/x86: Add perf support for AMD family-17h processors"), which is no longer necessary since those North Bridge event codes are obsoleted. Also fix a spelling mistake whilst in the area (calulating -> calculating). Fixes: e40ed154 ("perf/x86: Add perf support for AMD family-17h processors") Signed-off-by: NKim Phillips <kim.phillips@amd.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20191114183720.19887-2-kim.phillips@amd.comSigned-off-by: NPeng Wang <rocking@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Reinette Chatre 提交于
fix #29035100 commit 1182a49529edde899be4b4f0e1ab76e626976eb6 upstream perf_event_read_local() is the safest way to obtain measurements associated with performance events. In some cases the overhead introduced by perf_event_read_local() affects the measurements and the use of rdpmcl() is needed. rdpmcl() requires the index of the performance counter used so a helper is introduced to determine the index used by a provided performance event. The index used by a performance event may change when interrupts are enabled. A check is added to ensure that the index is only accessed with interrupts disabled. Even with this check the use of this counter needs to be done with care to ensure it is queried and used within the same disabled interrupts section. This change introduces a new checkpatch warning: CHECK: extern prototypes should be avoided in .h files +extern int x86_perf_rdpmc_index(struct perf_event *event); This warning was discussed and designated as a false positive in http://lkml.kernel.org/r/20180919091759.GZ24124@hirez.programming.kicks-ass.netSuggested-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NReinette Chatre <reinette.chatre@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Cc: acme@kernel.org Cc: gavin.hindman@intel.com Cc: jithu.joseph@intel.com Cc: dave.hansen@intel.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/b277ffa78a51254f5414f7b1bc1923826874566e.1537377064.git.reinette.chatre@intel.comSigned-off-by: NPeng Wang <rocking@linux.alibaba.com> Reviewed-by: NArtie Ding <artie.ding@linux.alibaba.com>
-
由 Dust Li 提交于
to #29272054 AF_XDP is a new AF family that support usespace applications communicate with XDP program directly. One promising use case is UDP Both x86_64 and aarch64 are enabled Signed-off-by: NDust Li <dust.li@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-