Commits · e166c2000fd3c984378f955df7cdbb2dbcf6ef8a · openanolis / cloud-kernel

02 Sep, 2020 40 commits

alinux: blk-iocost: bypass IOs earlier if disabled · e166c200

Committed by Joseph Qi on Jul 16, 2020

to #29357063

The blkg lookup or create logic may add significant overhead even when
iocost is disabled, so bypass it earlier in that case.

Fixes: 9da41925 ("alinux: iocost: fix NULL pointer dereference in ioc_rqos_throttle")
Reported-by: Hongnan Li <hongnan.li@linux.alibaba.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>

ovl: inode reference leak in ovl_is_inuse true case. · b247d8a6

Committed by youngjun on Jun 16, 2020

to #29273482

When ovl_is_inuse() returns true, the trap inode reference is not put.
Fix that, and add a comment explaining why ovl_is_inuse() is called after
ovl_setup_trap().

Fixes: 0be0bfd2de9d ("ovl: fix regression caused by overlapping layers detection")
Cc: <stable@vger.kernel.org> # v4.19+
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: youngjun <her0gyugyu@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git/commit/?h=overlayfs-next&id=24f14009b8f1754ec2ae4c168940c01259b0f88a
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

Revert "samples/bpf: fix build by setting HAVE_ATTR_TEST to zero" · eaf8003a

Committed by Dust Li on Jul 10, 2020

fix #29262413

The original patch didn't fit 4.19 well, since the upstream kernel uses
  #if HAVE_ATTR_TEST
  xxx
  #endif

but in 4.19 we use
  #ifdef HAVE_ATTR_TEST
  xxx
  #endif

As a result, the original patch still enabled the code under
#ifdef HAVE_ATTR_TEST, which caused the build to fail when running:

$make M=samples/bpf
make -C /mnt/nvme0/wuya/kernel/aliyunlinux2/ck/samples/bpf/../../tools/lib/bpf/ RM='rm -rf' LDFLAGS= srctree=/mnt/nvme0/wuya/kernel/aliyunlinux2/ck/samples/bpf/../../ O=
Warning: Kernel ABI header at 'tools/include/uapi/linux/bpf.h' differs from latest version at 'include/uapi/linux/bpf.h'
  CC      samples/bpf/syscall_nrs.s
  UPD     samples/bpf/syscall_nrs.h
  HOSTCC  samples/bpf/test_lru_dist
  HOSTCC  samples/bpf/sock_example
  HOSTCC  samples/bpf/bpf_load.o
In file included from ./tools/perf/perf-sys.h:9:0,
                 from samples/bpf/bpf_load.c:29:
./tools/perf/perf-sys.h: In function ‘sys_perf_event_open’:
./tools/perf/perf-sys.h:68:15: error: ‘test_attr__enabled’ undeclared (first use in this function)
  if (unlikely(test_attr__enabled))
               ^
./tools/include/linux/compiler.h:74:43: note: in definition of macro ‘unlikely’
 # define unlikely(x)  __builtin_expect(!!(x), 0)
                                           ^
./tools/perf/perf-sys.h:68:15: note: each undeclared identifier is reported only once for each function it appears in
  if (unlikely(test_attr__enabled))
               ^
./tools/include/linux/compiler.h:74:43: note: in definition of macro ‘unlikely’
 # define unlikely(x)  __builtin_expect(!!(x), 0)
                                           ^
In file included from samples/bpf/bpf_load.c:29:0:
./tools/perf/perf-sys.h:69:3: warning: implicit declaration of function ‘test_attr__open’ [-Wimplicit-function-declaration]
   test_attr__open(attr, pid, cpu, fd, group_fd, flags);
   ^~~~~~~~~~~~~~~
make[1]: *** [samples/bpf/bpf_load.o] Error 1
make: *** [_module_samples/bpf] Error 2
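
The mismatch described above can be illustrated with a minimal sketch. It is not part of the original patch, and the function names are hypothetical. It assumes the macro is defined as 0, as the reverted patch's title suggests, and shows that #if tests the macro's value while #ifdef only tests whether it is defined.

```c
/* Assumption for illustration: the macro is defined, but with value 0. */
#define HAVE_ATTR_TEST 0

/* Upstream-style guard: checks the VALUE of the macro. */
static int guarded_by_if(void)
{
#if HAVE_ATTR_TEST
	return 1;	/* only compiled when the macro is non-zero */
#else
	return 0;
#endif
}

/* 4.19-style guard: checks only whether the macro is DEFINED. */
static int guarded_by_ifdef(void)
{
#ifdef HAVE_ATTR_TEST
	return 1;	/* compiled even though the value is 0 */
#else
	return 0;
#endif
}
```

Under these assumptions guarded_by_if() takes the disabled branch while guarded_by_ifdef() takes the enabled one, which matches the behaviour difference the commit message describes.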

This reverts commit 4665759a.

Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

samples/bpf: Add a workaround for asm_inline · 867c8a45

Committed by KP Singh on Oct 02, 2019

to #29262413

commit 98beb3edeb974e906a81f305d88f7bc96b2ec83e upstream.

This was added in commit eb111869301e ("compiler-types.h: add asm_inline
definition") and breaks samples/bpf as clang does not support asm __inline.

Fixes: eb111869301e ("compiler-types.h: add asm_inline definition")
Co-developed-by: Florent Revest <revest@google.com>
Signed-off-by: Florent Revest <revest@google.com>
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20191002191652.11432-1-kpsingh@chromium.org
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

samples/bpf: fix build with new clang · 3637cf04

Committed by Alexei Starovoitov on Apr 04, 2019

to #29262413

commit 636e78b1cdb40b77a79b143dbd9d94847b360efa upstream.

clang started to error on invalid asm clobber usage in x86 headers
and many bpf program samples failed to build with the message:

  CLANG-bpf  /data/users/ast/bpf-next/samples/bpf/xdp_redirect_kern.o
In file included from /data/users/ast/bpf-next/samples/bpf/xdp_redirect_kern.c:14:
In file included from ../include/linux/in.h:23:
In file included from ../include/uapi/linux/in.h:24:
In file included from ../include/linux/socket.h:8:
In file included from ../include/linux/uio.h:14:
In file included from ../include/crypto/hash.h:16:
In file included from ../include/linux/crypto.h:26:
In file included from ../include/linux/uaccess.h:5:
In file included from ../include/linux/sched.h:15:
In file included from ../include/linux/sem.h:5:
In file included from ../include/uapi/linux/sem.h:5:
In file included from ../include/linux/ipc.h:9:
In file included from ../include/linux/refcount.h:72:
../arch/x86/include/asm/refcount.h:72:36: error: asm-specifier for input or output variable conflicts with asm clobber list
                                         r->refs.counter, e, "er", i, "cx");
                                                                      ^
../arch/x86/include/asm/refcount.h:86:27: error: asm-specifier for input or output variable conflicts with asm clobber list
                                         r->refs.counter, e, "cx");
                                                             ^
2 errors generated.

Override volatile() to work around the problem.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

samples/bpf: workaround clang asm goto compilation errors · 81aa61c4

Committed by Yonghong Song on Jan 12, 2019

fix #29255532

commit 6bf3bbe1f4d4cf405e3c2bf07bbdff56d3223ec8 upstream.

x86 compilation has required asm goto support since 4.17.
Since clang does not support asm goto, at 4.17,
Commit b1ae32db ("x86/cpufeature: Guard asm_volatile_goto usage
for BPF compilation") worked around the issue by permitting an
alternative implementation without asm goto for clang.

At 5.0, more asm goto usages appeared.
  [yhs@148 x86]$ egrep -r asm_volatile_goto
  include/asm/cpufeature.h:     asm_volatile_goto("1: jmp 6f\n"
  include/asm/jump_label.h:     asm_volatile_goto("1:"
  include/asm/jump_label.h:     asm_volatile_goto("1:"
  include/asm/rmwcc.h:  asm_volatile_goto (fullop "; j" #cc " %l[cc_label]"     \
  include/asm/uaccess.h:        asm_volatile_goto("\n"                          \
  include/asm/uaccess.h:        asm_volatile_goto("\n"                          \
  [yhs@148 x86]$

When compiling the samples/bpf directory, most bpf programs failed
to compile with error messages like:
  In file included from /home/yhs/work/bpf-next/samples/bpf/xdp_sample_pkts_kern.c:2:
  In file included from /home/yhs/work/bpf-next/include/linux/ptrace.h:6:
  In file included from /home/yhs/work/bpf-next/include/linux/sched.h:15:
  In file included from /home/yhs/work/bpf-next/include/linux/sem.h:5:
  In file included from /home/yhs/work/bpf-next/include/uapi/linux/sem.h:5:
  In file included from /home/yhs/work/bpf-next/include/linux/ipc.h:9:
  In file included from /home/yhs/work/bpf-next/include/linux/refcount.h:72:
  /home/yhs/work/bpf-next/arch/x86/include/asm/refcount.h:70:9: error: 'asm goto' constructs are not supported yet
        return GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
               ^
  /home/yhs/work/bpf-next/arch/x86/include/asm/rmwcc.h:67:2: note: expanded from macro 'GEN_BINARY_SUFFIXED_RMWcc'
        __GEN_RMWcc(op " %[val], %[var]\n\t" suffix, var, cc,           \
        ^
  /home/yhs/work/bpf-next/arch/x86/include/asm/rmwcc.h:21:2: note: expanded from macro '__GEN_RMWcc'
        asm_volatile_goto (fullop "; j" #cc " %l[cc_label]"             \
        ^
  /home/yhs/work/bpf-next/include/linux/compiler_types.h:188:37: note: expanded from macro 'asm_volatile_goto'
  #define asm_volatile_goto(x...) asm goto(x)

Most of these usages do not even provide an alternative
implementation, and it is not practical to make changes
at each call site.

This patch works around the asm goto issue by redefining the macro as below:
  #define asm_volatile_goto(x...) asm volatile("invalid use of asm_volatile_goto")

If asm_volatile_goto is not used by bpf programs, which is typically the case, nothing bad
will happen. If asm_volatile_goto is used by bpf programs, which is incorrect, the compiler
will issue an error since "invalid use of asm_volatile_goto" is not valid assembly code.

With this patch, all bpf programs under samples/bpf can pass compilation.

Note that bpf programs under tools/testing/selftests/bpf/ compiled fine as
they do not access kernel internal headers.

Fixes: e769742d3584 ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"")
Fixes: 18fe5822 ("x86, asm: change the GEN_*_RMWcc() macros to not quote the condition")
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

io_uring: fix not initialised work->flags · 109831ff

Committed by Pavel Begunkov on Jul 12, 2020

to #29276773

commit 16d598030a37853a7a6b4384cad19c9c0af2f021 upstream.

59960b9deb535 ("io_uring: fix lazy work init") tried to fix missing
io_req_init_async(), but left out work.flags and hash. Do it earlier.

Fixes: 7cdaf587de7c ("io_uring: avoid whole io_wq_work copy for requests completed inline")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

io_uring: fix missing msg_name assignment · 589ee219

Committed by Pavel Begunkov on Jul 12, 2020

to #29276773

commit dd821e0c95a64b5923a0c57f07d3f7563553e756 upstream.

Make sure to set msg.msg_name for the async portion of send/recvmsg,
as the header copy will copy to/from it.

Cc: stable@vger.kernel.org # v5.5+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

io_uring: account user memory freed when exit has been queued · 8c9ebb73

Committed by Jens Axboe on Jul 10, 2020

to #29276773

commit 309fc03a3284af62eb6082fb60327045a1dabf57 upstream.

We currently account the memory after the exit work has been run, but
that leaves a gap between the time a process closes its ring and the
time the memory is accounted as freed. If the memlocked ulimit is
borderline, then that can introduce spurious setup errors returning
-ENOMEM because the free work hasn't been run yet.

Account this as freed when we close the ring, so as not to expose a tiny
gap where setting up a new ring can fail.

Fixes: 85faa7b8346e ("io_uring: punt final io_ring_ctx wait-and-free to workqueue")
Cc: stable@vger.kernel.org # v5.7
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

io_uring: fix memleak in io_sqe_files_register() · c1f5f815

Committed by Yang Yingliang on Jul 10, 2020

to #29276773

commit 667e57da358f61b6966e12e925a69e42d912e8bb upstream.

I got a memleak report when doing some fuzz testing:

BUG: memory leak
unreferenced object 0x607eeac06e78 (size 8):
  comm "test", pid 295, jiffies 4294735835 (age 31.745s)
  hex dump (first 8 bytes):
    00 00 00 00 00 00 00 00                          ........
  backtrace:
    [<00000000932632e6>] percpu_ref_init+0x2a/0x1b0
    [<0000000092ddb796>] __io_uring_register+0x111d/0x22a0
    [<00000000eadd6c77>] __x64_sys_io_uring_register+0x17b/0x480
    [<00000000591b89a6>] do_syscall_64+0x56/0xa0
    [<00000000864a281d>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Call percpu_ref_exit() on the error path to avoid
the refcount memleak.
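
The error-path pattern described above can be sketched in userspace as follows. This is only an illustration under stated assumptions: mock_ref, mock_ref_init(), mock_ref_exit(), mock_files_register(), demo_ref and live_refs are hypothetical stand-ins and not the kernel's percpu_ref API or the actual io_uring code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a percpu_ref: it owns one heap allocation. */
struct mock_ref {
	long *counters;
};

static int live_refs;			/* allocations not yet released */
static struct mock_ref demo_ref;	/* instance used for demonstration */

static int mock_ref_init(struct mock_ref *ref)
{
	ref->counters = calloc(4, sizeof(*ref->counters));
	if (!ref->counters)
		return -1;
	live_refs++;
	return 0;
}

static void mock_ref_exit(struct mock_ref *ref)
{
	assert(ref->counters);
	free(ref->counters);
	ref->counters = NULL;
	live_refs--;
}

/* Hypothetical register routine: a later step may fail after init. */
static int mock_files_register(struct mock_ref *ref, int later_step_fails)
{
	if (mock_ref_init(ref))
		return -1;
	if (later_step_fails) {
		/* Release what init allocated before returning the error. */
		mock_ref_exit(ref);
		return -1;
	}
	return 0;
}
```

In this sketch, calling mock_files_register(&demo_ref, 1) returns an error and leaves live_refs at 0. Without the mock_ref_exit() call on the failure branch, the allocation would stay unreferenced, which is the kind of leak the report shows.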

Fixes: 05f3fb3c5397 ("io_uring: avoid ring quiesce for fixed file set unregister and update")
Cc: stable@vger.kernel.org
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

io_uring: fix memleak in __io_sqe_files_update() · 4b392775

Committed by Yang Yingliang on Jul 09, 2020

to #29276773

commit f3bd9dae3708a0ff6b067e766073ffeb853301f9 upstream.

I got a memleak report when doing some fuzz testing:

BUG: memory leak
unreferenced object 0xffff888113e02300 (size 488):
comm "syz-executor401", pid 356, jiffies 4294809529 (age 11.954s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
a0 a4 ce 19 81 88 ff ff 60 ce 09 0d 81 88 ff ff ........`.......
backtrace:
[<00000000129a84ec>] kmem_cache_zalloc include/linux/slab.h:659 [inline]
[<00000000129a84ec>] __alloc_file+0x25/0x310 fs/file_table.c:101
[<000000003050ad84>] alloc_empty_file+0x4f/0x120 fs/file_table.c:151
[<000000004d0a41a3>] alloc_file+0x5e/0x550 fs/file_table.c:193
[<000000002cb242f0>] alloc_file_pseudo+0x16a/0x240 fs/file_table.c:233
[<00000000046a4baa>] anon_inode_getfile fs/anon_inodes.c:91 [inline]
[<00000000046a4baa>] anon_inode_getfile+0xac/0x1c0 fs/anon_inodes.c:74
[<0000000035beb745>] __do_sys_perf_event_open+0xd4a/0x2680 kernel/events/core.c:11720
[<0000000049009dc7>] do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359
[<00000000353731ca>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

BUG: memory leak
unreferenced object 0xffff8881152dd5e0 (size 16):
comm "syz-executor401", pid 356, jiffies 4294809529 (age 11.954s)
hex dump (first 16 bytes):
01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<0000000074caa794>] kmem_cache_zalloc include/linux/slab.h:659 [inline]
[<0000000074caa794>] lsm_file_alloc security/security.c:567 [inline]
[<0000000074caa794>] security_file_alloc+0x32/0x160 security/security.c:1440
[<00000000c6745ea3>] __alloc_file+0xba/0x310 fs/file_table.c:106
[<000000003050ad84>] alloc_empty_file+0x4f/0x120 fs/file_table.c:151
[<000000004d0a41a3>] alloc_file+0x5e/0x550 fs/file_table.c:193
[<000000002cb242f0>] alloc_file_pseudo+0x16a/0x240 fs/file_table.c:233
[<00000000046a4baa>] anon_inode_getfile fs/anon_inodes.c:91 [inline]
[<00000000046a4baa>] anon_inode_getfile+0xac/0x1c0 fs/anon_inodes.c:74
[<0000000035beb745>] __do_sys_perf_event_open+0xd4a/0x2680 kernel/events/core.c:11720
[<0000000049009dc7>] do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359
[<00000000353731ca>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

If io_sqe_file_register() fails, we need to put the file obtained by fget()
to avoid the memleak.

Fixes: c3a31e605620 ("io_uring: add support for IORING_REGISTER_FILES_UPDATE")
Cc: stable@vger.kernel.org
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

vfs, afs, ext4: Make the inode hash table RCU searchable · dc136109

Committed by David Howells on Dec 01, 2017

task #29263287

commit 3f19b2ab97a97b413c24b66c67ae16daa4f56c35 upstream

Make the inode hash table RCU searchable so that searches that want to
access or modify an inode without taking a ref on that inode can do so
without taking the inode hash table lock.

The main thing this requires is some RCU annotation on the list
manipulation operations.  Inodes are already freed by RCU in most cases.

Users of this interface must take care as the inode may be still under
construction or may be being torn down around them.

There are at least three instances where this can be of use:

 (1) Testing whether the inode number iunique() is going to return is
     currently unique (the iunique_lock is still held).

 (2) Ext4 date stamp updating.

 (3) AFS callback breaking.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
cc: linux-ext4@vger.kernel.org
cc: linux-afs@lists.infradead.org
[jeffle: resolve collision in afs_break_one_callback since code base change]
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

io_uring: export cq overflow status to userspace · c7f865a4

Committed by Xiaoguang Wang on Jul 09, 2020

to #29233603

commit 6d5f904904608a9cd32854d7d0a4dd65b27f9935 upstream

Applications that do not want to use io_uring_enter() to reap and
handle cqes may rely entirely on liburing's io_uring_peek_cqe().
However, if the cq ring has overflowed, io_uring_peek_cqe() is currently
not aware of the overflow and won't enter the kernel to flush cqes.
The test program below reveals this bug:

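#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <liburing.h>

static void test_cq_overflow(struct io_uring *ring)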
{
        struct io_uring_cqe *cqe;
        struct io_uring_sqe *sqe;
        int issued = 0;
        int ret = 0;

        do {
                sqe = io_uring_get_sqe(ring);
                if (!sqe) {
                        fprintf(stderr, "get sqe failed\n");
                        break;
                }
                ret = io_uring_submit(ring);
                if (ret <= 0) {
                        if (ret != -EBUSY)
                                fprintf(stderr, "sqe submit failed: %d\n", ret);
                        break;
                }
                issued++;
        } while (ret > 0);
        assert(ret == -EBUSY);

        printf("issued requests: %d\n", issued);

        while (issued) {
                ret = io_uring_peek_cqe(ring, &cqe);
                if (ret) {
                        if (ret != -EAGAIN) {
                                /* ret is a negative errno, so negate it for strerror() */
                                fprintf(stderr, "peek completion failed: %s\n",
                                        strerror(-ret));
                                break;
                        }
                        printf("left requests: %d\n", issued);
                        continue;
                }
                io_uring_cqe_seen(ring, cqe);
                issued--;
                printf("left requests: %d\n", issued);
        }
}

int main(int argc, char *argv[])
{
        int ret;
        struct io_uring ring;

        ret = io_uring_queue_init(16, &ring, 0);
        if (ret) {
                fprintf(stderr, "ring setup failed: %d\n", ret);
                return 1;
        }

        test_cq_overflow(&ring);
        return 0;
}

To fix this issue, export the cq overflow status to userspace by adding a new
IORING_SQ_CQ_OVERFLOW flag, so that helper functions in liburing, such as
io_uring_peek_cqe(), can be aware of the cq overflow and flush accordingly.
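
As a rough illustration of how a userspace helper might use the new flag, here is a minimal sketch. It is not liburing's implementation: mock_peek() and its parameters are hypothetical, and the flag value is an assumption intended to match the upstream uapi header.

```c
#include <stdbool.h>

/* Assumed to match the upstream uapi definition of the new flag. */
#define IORING_SQ_CQ_OVERFLOW	(1U << 1)

/* True when the kernel has signalled that completions overflowed. */
static bool mock_cq_overflowed(unsigned sq_flags)
{
	return (sq_flags & IORING_SQ_CQ_OVERFLOW) != 0;
}

/*
 * Hypothetical peek helper.
 * Returns  0 if a cqe is available in the ring,
 *          1 if the ring is empty but overflowed cqes need a kernel flush,
 *         -1 if the ring is empty and nothing is pending (like -EAGAIN).
 */
static int mock_peek(unsigned cq_head, unsigned cq_tail, unsigned sq_flags)
{
	if (cq_head != cq_tail)
		return 0;
	return mock_cq_overflowed(sq_flags) ? 1 : -1;
}
```

Without the flag, the empty-ring case cannot be distinguished from the overflowed case, which is why the test program above spins on -EAGAIN.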
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

EDAC/amd64: Set grain per DIMM · 9d090926

Committed by Yazen Ghannam on Oct 22, 2019

fix #29035167

commit 466503d6b1b33be46ab87c6090f0ade6c6011cbc upstream

The following commit introduced a warning on error reports without a
non-zero grain value.

  3724ace582d9 ("EDAC/mc: Fix grain_bits calculation")

The amd64_edac_mod module does not provide a value, so the warning will
be given on the first reported memory error.

Set the grain per DIMM to cacheline size (64 bytes). This is the current
recommendation.

Fixes: 3724ace582d9 ("EDAC/mc: Fix grain_bits calculation")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rrichter@marvell.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191022203448.13962-7-Yazen.Ghannam@amd.com
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

EDAC/amd64: Find Chip Select memory size using Address Mask · ac316351

Committed by Yazen Ghannam on Aug 21, 2019

fix #29035167

commit e53a3b267fb0a79db9ca1f1e08b97889b22013e6 upstream

Chip Select memory size reporting on AMD Family 17h was recently fixed
in order to account for interleaving. However, the current method is not
robust.

The Chip Select Address Mask can be used to find the memory size. There
are a couple of cases.

1) For single-rank and dual-rank non-interleaved, use the address mask
plus 1 as the size.

2) For dual-rank interleaved, do #1 but "de-interleave" the address mask
first.

Always "de-interleave" the address mask in order to simplify the code
flow. Bit mask manipulation is necessary to check for interleaving, so
just go ahead and do the de-interleaving. In the non-interleaved case,
the original and de-interleaved address masks will be the same.

To de-interleave the mask, count the number of zero bits in the middle
of the mask and swap them with the most significant bits.

For example,
Original=0xFFFF9FE, De-interleaved=0x3FFFFFE
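
The de-interleaving step described above can be sketched as follows. This is a userspace illustration, not the driver's code: the function names are hypothetical, and it assumes bit 0 of the mask is clear, as in the example values.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the most significant set bit, or -1 if v is 0. */
static int highest_set_bit(uint64_t v)
{
	int msb = -1;

	while (v) {
		v >>= 1;
		msb++;
	}
	return msb;
}

/* Number of set bits in v. */
static int count_set_bits(uint64_t v)
{
	int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/*
 * Hypothetical helper: move the zero bits found in the middle of the
 * mask to the top, leaving a contiguous run of ones starting at bit 1.
 */
static uint64_t deinterleave_mask(uint64_t mask)
{
	int msb, zeros, top;

	assert(mask != 0 && (mask & 1) == 0);
	msb = highest_set_bit(mask);
	assert(msb < 63);

	/* Zero bits below the MSB, not counting bit 0 (assumed clear). */
	zeros = msb - count_set_bits(mask);
	top = msb - zeros;

	/* Set bits top..1. */
	return ((UINT64_C(1) << (top + 1)) - 1) & ~UINT64_C(1);
}
```

With the example values, deinterleave_mask(0xFFFF9FE) yields 0x3FFFFFE, and a mask that is already contiguous, such as 0x3FFFFFE, is returned unchanged.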
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190821235938.118710-5-Yazen.Ghannam@amd.com
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

EDAC/amd64: Initialize DIMM info for systems with more than two channels · b635b9c9

Committed by Yazen Ghannam on Aug 21, 2019

fix #29035167

commit 353a1fcb8f9e5857c0fb720b9e57a86c1fb7c17e upstream

Currently, the DIMM info for AMD Family 17h systems is initialized in
init_csrows(). This function is shared with legacy systems, and it
supports at most two channels.

This prevents initialization of the DIMM info for a number of ranks, so
there will be missing ranks in the EDAC sysfs.

Create a new init_csrows_df() for Family17h+ and revert init_csrows()
back to pre-Family17h support.

Loop over all channels in the new function in order to support systems
with more than two channels.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190821235938.118710-4-Yazen.Ghannam@amd.com
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

EDAC/amd64: Support more than two controllers for chip selects handling · 47af478f

Committed by Yazen Ghannam on Aug 21, 2019

fix #29035167

commit 8de9930a4618811edfaebc4981a9fafff2af9170 upstream

The struct chip_select array that's used for saving chip select bases
and masks is fixed at length of two. There should be one struct
chip_select for each controller, so this array should be increased to
support systems that may have more than two controllers.

Increase the size of the struct chip_select array to eight, which is the
largest number of controllers per die currently supported on AMD
systems.

Fix number of DIMMs and Chip Select bases/masks on Family17h, because
AMD Family 17h systems support 2 DIMMs, 4 CS bases, and 2 CS masks per
channel.

Also, carve out the Family 17h+ reading of the bases/masks into a
separate function. This effectively reverts the original bases/masks
reading code to before Family 17h support was added.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190821235938.118710-2-Yazen.Ghannam@amd.com
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

Revert "EDAC/amd64: Support more than two controllers for chip select handling" · b7e38109

Committed by Borislav Petkov on Apr 25, 2019

fix #29035167

commit 8de9930a4618811edfaebc4981a9fafff2af9170 upstream

This reverts commit 0a227af521d6df5286550b62f4b591417170b4ea.

Unfortunately, this commit caused wrong detection of chip select sizes
on some F17h client machines:

  --- 00-rc6+     2019-02-14 14:28:03.126622904 +0100
  +++ 01-rc4+     2019-04-14 21:06:16.060614790 +0200
   EDAC amd64: MC: 0:     0MB 1:     0MB
  -EDAC amd64: MC: 2: 16383MB 3: 16383MB
  +EDAC amd64: MC: 2:     0MB 3: 2097151MB
   EDAC amd64: MC: 4:     0MB 5:     0MB
   EDAC amd64: MC: 6:     0MB 7:     0MB
   EDAC MC: UMC1 chip selects:
   EDAC amd64: MC: 0:     0MB 1:     0MB
  -EDAC amd64: MC: 2: 16383MB 3: 16383MB
  +EDAC amd64: MC: 2:     0MB 3: 2097151MB
   EDAC amd64: MC: 4:     0MB 5:     0MB
   EDAC amd64: MC: 6:     0MB 7:     0M

Revert it for now until it has been solved properly.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

perf/amd/uncore: Add support for Family 19h L3 PMU · efaf304e

Committed by Kim Phillips on Mar 13, 2020

fix #29035100

commit e48667b865480d8bf0f1171a8b474ffc785b9ace upstream

Family 19h introduces a change in the slice, core and thread specification
in its L3 Performance Event Select (ChL3PmcCfg) h/w register. The change is
incompatible with Family 17h's version of the register.

Introduce a new path in l3_thread_slice_mask() to do things differently
for Family 19h vs. Family 17h, otherwise the new hardware doesn't get
programmed correctly.

Instead of a linear core--thread bitmask, Family 19h takes an encoded
core number, and a separate thread mask. There are new bits that are set
for all cores and all slices, of which only the latter is used, since
the driver counts events for all slices on behalf of the specified CPU.

Also update amd_uncore_init() to base its L2/NB vs. L3/Data Fabric mode
decision on Family 17h or above, not just 17h and 18h: the Family
19h Data Fabric PMC is compatible with the Family 17h DF PMC.

 [ bp: Touchups. ]

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200313231024.17601-3-kim.phillips@amd.com
Signed-off-by: Peng Wang <rocking@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

perf/amd/uncore: Make L3 thread mask code more readable · e4e69222

Committed by Kim Phillips on Mar 13, 2020

fix #29035100

commit 9689dbbeaea884d19e3085439c6a247ef986b2af upstream

Convert the l3_thread_slice_mask() function to use the more readable
topology_* helper functions, more intuitive variable names like shift
and thread_mask, and BIT_ULL().

No functional changes.

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200313231024.17601-2-kim.phillips@amd.com
Signed-off-by: Peng Wang <rocking@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

perf/amd/uncore: Prepare L3 thread mask code for Family 19h · 4c6a1c15

Committed by Kim Phillips on Mar 13, 2020

fix #29035100

commit 4dcc3df82573a946c620dda5fb00e27c7b080105 upstream

In order to better accommodate the upcoming Family 19h, given
the 80-char line limit, move the existing code into a new
l3_thread_slice_mask() function.

No functional changes.

 [ bp: Touchups. ]

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200313231024.17601-1-kim.phillips@amd.com
Signed-off-by: Peng Wang <rocking@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

x86/cpu/amd: Call init_amd_zn() on Family 19h processors too · 696fcb77

Committed by Kim Phillips on Mar 11, 2020

fix #29035100

commit 753039ef8b2f1078e5bff8cd42f80578bf6385b0 upstream

Family 19h CPUs are Zen-based and still share most architectural
features with Family 17h CPUs, and therefore still need to call
init_amd_zn() e.g., to set the RECLAIM_DISTANCE override.

init_amd_zn() also sets X86_FEATURE_ZEN, which today is only used
in amd_set_core_ssb_state(), which isn't called on some late
model Family 17h CPUs, nor on any Family 19h CPUs:
X86_FEATURE_AMD_SSBD replaces X86_FEATURE_LS_CFG_SSBD on those
later model CPUs, where the SSBD mitigation is done via the
SPEC_CTRL MSR instead of the LS_CFG MSR.

Family 19h CPUs also don't have the erratum where the CPB feature
bit isn't set, but that code can stay unchanged and run safely
on Family 19h.

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200311191451.13221-1-kim.phillips@amd.com
Signed-off-by: Peng Wang <rocking@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

perf/x86/amd: Add support for Large Increment per Cycle Events · 52ef78f4

Committed by Kim Phillips on Nov 14, 2019

fix #29035100

commit 5738891229a25e9e678122a843cbf0466a456d0c upstream

Description of hardware operation
---------------------------------

The core AMD PMU has a 4-bit wide per-cycle increment for each
performance monitor counter. That works for most events, but
now with AMD Family 17h and above processors, some events can
occur more than 15 times in a cycle. Those events are called
"Large Increment per Cycle" events. In order to count these
events, two adjacent h/w PMCs get their count signals merged
to form 8 bits per cycle total. In addition, the PERF_CTR count
registers are merged to be able to count up to 64 bits.

Normally, events like instructions retired get programmed on a single
counter like so:

PERF_CTL0 (MSR 0xc0010200) 0x000000000053ff0c # event 0x0c, umask 0xff
PERF_CTR0 (MSR 0xc0010201) 0x0000800000000001 # r/w 48-bit count

The next counter at MSRs 0xc0010202-3 remains unused, or can be used
independently to count something else.

When counting Large Increment per Cycle events, such as FLOPs,
however, we now have to reserve the next counter and program the
PERF_CTL (config) register with the Merge event (0xFFF), like so:

PERF_CTL0 (msr 0xc0010200) 0x000000000053ff03 # FLOPs event, umask 0xff
PERF_CTR0 (msr 0xc0010201) 0x0000800000000001 # rd 64-bit cnt, wr lo 48b
PERF_CTL1 (msr 0xc0010202) 0x0000000f004000ff # Merge event, enable bit
PERF_CTR1 (msr 0xc0010203) 0x0000000000000000 # wr hi 16-bits count
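
To make the register values above easier to read, here is a small decoding sketch. The field positions are an assumption consistent with the values shown (event select low byte in bits 7:0, high nibble in bits 35:32, unit mask in bits 15:8, enable in bit 22), and the function names are hypothetical.

```c
#include <stdint.h>

/* 12-bit event select: bits 35:32 form the high nibble, bits 7:0 the low byte. */
static unsigned perf_ctl_event(uint64_t ctl)
{
	return (unsigned)((((ctl >> 32) & 0xf) << 8) | (ctl & 0xff));
}

/* 8-bit unit mask in bits 15:8. */
static unsigned perf_ctl_umask(uint64_t ctl)
{
	return (unsigned)((ctl >> 8) & 0xff);
}

/* Counter enable bit (bit 22). */
static int perf_ctl_enabled(uint64_t ctl)
{
	return (int)((ctl >> 22) & 1);
}
```

Under these assumptions, 0x000000000053ff0c decodes to event 0x0c with umask 0xff, and 0x0000000f004000ff decodes to the Merge event 0xFFF with the enable bit set.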

The count is widened from the normal 48-bits to 64 bits by having the
second counter carry the higher 16 bits of the count in its lower 16
bits of its counter register.
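
The split of a 64-bit count across the counter pair can be sketched as below. This is only an illustration of the bit layout described above; the function names are hypothetical and no MSR access is performed.

```c
#include <stdint.h>

#define LOW_COUNT_BITS	48	/* width of a normal, unmerged counter */

/* Part of a 64-bit count carried by the even counter: the low 48 bits. */
static uint64_t even_counter_part(uint64_t count)
{
	return count & ((UINT64_C(1) << LOW_COUNT_BITS) - 1);
}

/* Part carried by the odd counter: the high 16 bits, in its low 16 bits. */
static uint64_t odd_counter_part(uint64_t count)
{
	return (count >> LOW_COUNT_BITS) & 0xffff;
}

/* Reassemble the 64-bit count from the two parts. */
static uint64_t merged_count(uint64_t even_part, uint64_t odd_part)
{
	return ((odd_part & 0xffff) << LOW_COUNT_BITS) |
	       (even_part & ((UINT64_C(1) << LOW_COUNT_BITS) - 1));
}
```

For example, the count 0x1234800000000001 splits into 0x800000000001 for the even counter and 0x1234 for the odd counter, and merged_count() restores the original value.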

The odd counter, e.g., PERF_CTL1, is programmed with the enabled Merge
event before the even counter, PERF_CTL0.

The Large Increment feature is available starting with Family 17h.
For more details, search any Family 17h PPR for the "Large Increment
per Cycle Events" section, e.g., section 2.1.15.3 on p. 173 in this
version:

https://www.amd.com/system/files/TechDocs/56176_ppr_Family_17h_Model_71h_B0_pub_Rev_3.06.zip

Description of software operation
---------------------------------

The following steps are taken in order to support reserving and
enabling the extra counter for Large Increment per Cycle events:

1. In the main x86 scheduler, we reduce the number of available
counters by the number of Large Increment per Cycle events being
scheduled, tracked by a new cpuc variable 'n_pair' and a new
amd_put_event_constraints_f17h(). This improves the counter
scheduler success rate.

2. In perf_assign_events(), if a counter is assigned to a Large
Increment event, we increment the current counter variable, so the
counter used for the Merge event is removed from assignment
consideration by upcoming event assignments.

3. In find_counter(), if a counter has been found for the Large
Increment event, we set the next counter as used, to prevent other
events from using it.

4. We perform steps 2 & 3 also in the x86 scheduler fastpath, i.e.,
we add Merge event accounting to the existing used_mask logic.

5. Finally, we add the programming of the Merge event on the
neighbouring PMCs to the counter enable/disable{_all}
code paths.
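
The reservation idea in steps 2 and 3 can be sketched as a toy greedy assigner. This is only an illustration under stated assumptions: it is not the x86 scheduler, the function name `assign_counters` is hypothetical, six counters are assumed, and Large Increment events are assumed to start on an even counter (as the related constraint commit in this log describes).

```python
def assign_counters(events, num_counters=6):
    """Assign each (name, is_large) event a counter index.

    A Large Increment event takes an even counter and also marks the
    next (odd) counter as used, so no other event can land on it.
    Returns a dict name -> index, or None if scheduling fails.
    """
    used = [False] * num_counters
    assignment = {}
    for name, is_large in events:
        step = 2 if is_large else 1
        for idx in range(0, num_counters, step):
            if used[idx]:
                continue
            if is_large:
                if idx + 1 >= num_counters or used[idx + 1]:
                    continue
                used[idx + 1] = True  # reserve the Merge counter
            used[idx] = True
            assignment[name] = idx
            break
        else:
            return None  # no free counter for this event
    return assignment


print(assign_counters([("flops", True), ("instructions", False)]))
```

With one Large Increment event and one ordinary event, the ordinary event skips the reserved odd counter and lands on the next free one.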

Currently, software does not support a single PMU with mixed 48- and
64-bit counting, so Large increment event counts are limited to 48
bits. In set_period, we zero-out the upper 16 bits of the count, so
the hardware doesn't copy them to the even counter's higher bits.

Simple invocation example showing counting 8 FLOPs per 256-bit/%ymm
vaddps instruction executed in a loop 100 million times:

perf stat -e cpu/fp_ret_sse_avx_ops.all/,cpu/instructions/ <workload>

Performance counter stats for '<workload>':

800,000,000 cpu/fp_ret_sse_avx_ops.all/u
300,042,101 cpu/instructions/u

Prior to this patch, the reported SSE/AVX FLOPs retired count would
be wrong.
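
The expected count in the example follows from simple arithmetic, sketched here. The only assumption beyond the text is that each vaddps lane is a 32-bit single-precision float, which is what gives 8 operations per 256-bit register.

```python
YMM_BITS = 256      # width of a %ymm register
FLOAT_BITS = 32     # vaddps operates on packed single-precision floats

flops_per_vaddps = YMM_BITS // FLOAT_BITS
iterations = 100_000_000
expected_flops = flops_per_vaddps * iterations

# Largest per-cycle increment a counter can absorb.
max_inc_single = (1 << 4) - 1   # one PMC: 4-bit increment
max_inc_merged = (1 << 8) - 1   # two merged PMCs: 8-bit increment

print(flops_per_vaddps, expected_flops, max_inc_single, max_inc_merged)
```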

[peterz: lots of renames and edits to the code]
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Peng Wang <rocking@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

52ef78f4

perf/x86/amd: Constrain Large Increment per Cycle events · e29ecc26

Committed by Kim Phillips on Nov 14, 2019

fix #29035100

commit 471af006a747f1c535c8a8c6c0973c320fe01b22 upstream

AMD Family 17h processors and above gain support for Large Increment
per Cycle events.  Unfortunately there is no CPUID or equivalent bit
that indicates whether the feature exists or not, so we continue to
determine eligibility based on a CPU family number comparison.

For Large Increment per Cycle events, we add a f17h-and-compatibles
get_event_constraints_f17h() that returns an even counter bitmask:
Large Increment per Cycle events can only be placed on PMCs 0, 2,
and 4 out of the currently available 0-5.  The only currently
public event that requires this feature to report valid counts
is PMCx003 "Retired SSE/AVX Operations".
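
The even-counter bitmask mentioned above can be written out explicitly. This is a small sketch assuming six counters (0-5) as stated in the text; the names `EVEN_MASK` and `allowed_counters` are hypothetical and do not come from the kernel source.

```python
NUM_COUNTERS = 6  # PMCs 0-5, per the commit message

# One bit per counter; set bits mark counters the event may use.
EVEN_MASK = sum(1 << i for i in range(0, NUM_COUNTERS, 2))


def allowed_counters(mask: int):
    """List the counter indices whose bit is set in the mask."""
    return [i for i in range(NUM_COUNTERS) if (mask >> i) & 1]


print(bin(EVEN_MASK), allowed_counters(EVEN_MASK))
```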

Note that the CPU family logic in amd_core_pmu_init() is changed
so as to be able to selectively add initialization for features
available in ranges of backward-compatible CPU families.  This
Large Increment per Cycle feature is expected to be retained
in future families.

A side effect of assigning a new get_constraints function for f17h
is that it disables calls to the old (prior to f15h)
amd_get_event_constraints implementation left enabled by commit
e40ed154 ("perf/x86: Add perf support for AMD family-17h
processors"), which is no longer necessary since those North Bridge
event codes are obsoleted.

Also fix a spelling mistake whilst in the area (calulating ->
calculating).

Fixes: e40ed154 ("perf/x86: Add perf support for AMD family-17h processors")
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20191114183720.19887-2-kim.phillips@amd.com
Signed-off-by: Peng Wang <rocking@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

e29ecc26

perf/x86: Add helper to obtain performance counter index · bfd2cd67

Committed by Reinette Chatre on Sep 19, 2018

fix #29035100

commit 1182a49529edde899be4b4f0e1ab76e626976eb6 upstream

perf_event_read_local() is the safest way to obtain measurements
associated with performance events. In some cases the overhead
introduced by perf_event_read_local() affects the measurements and the
use of rdpmcl() is needed. rdpmcl() requires the index
of the performance counter used so a helper is introduced to determine
the index used by a provided performance event.

The index used by a performance event may change when interrupts are
enabled. A check is added to ensure that the index is only accessed
with interrupts disabled. Even with this check the use of this counter
needs to be done with care to ensure it is queried and used within the
same disabled interrupts section.

This change introduces a new checkpatch warning:
CHECK: extern prototypes should be avoided in .h files
+extern int x86_perf_rdpmc_index(struct perf_event *event);

This warning was discussed and designated as a false positive in
http://lkml.kernel.org/r/20180919091759.GZ24124@hirez.programming.kicks-ass.net

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: acme@kernel.org
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/b277ffa78a51254f5414f7b1bc1923826874566e.1537377064.git.reinette.chatre@intel.com
Signed-off-by: Peng Wang <rocking@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

bfd2cd67

configs: enable AF_XDP socket by default · ab4f8724

Committed by Dust Li on Jul 12, 2020

to #29272054

AF_XDP is a new address family that lets userspace applications
communicate with XDP programs directly.
One promising use case is UDP

It is enabled for both x86_64 and aarch64.

Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

ab4f8724

Intel: perf/x86/intel/uncore: Add Ice Lake server uncore support · c4af4e97

Committed by Kan Liang on Jun 12, 2020

fix #29130534

commit 2b3b76b5ec67568da4bb475d3ce8a92ef494b5de upstream.

Backport summary: Backport to kernel 4.19.57 for ICX uncore support.

The uncore subsystem in Ice Lake server is similar to that of previous
servers. There are some differences in config register encoding and pci
device IDs. The uncore PMON units in Ice Lake server include Ubox, Chabox,
IIO, IRP, M2PCIE, PCU, M2M, PCIE3 and IMC.

- For CHA, filter 1 register has been removed. The filter 0 register can
be used by any of the CHA events to filter by Thread/Core-ID. To do
so, the control register's tid_en bit must be set to 1.
- For IIO, there are some changes on event constraints. The MSR address
and MSR offsets among counters are also changed.
- For IRP, the MSR address and MSR offsets among counters are changed.
- For M2PCIE, the counters are accessed by MSR now. Add new MSR address
and MSR offsets. Change event constraints.
- To determine the number of CHAs, CAPID6 (Low) and CAPID7 (High) now
have to be read.
- For M2M, update the PCICFG address and Device ID.
- For UPI, update the PCICFG address, Device ID and counter address.
- For M3UPI, update the PCICFG address, Device ID, counter address and
event constraints.
- For IMC, update the formula to calculate MMIO BAR address, which is
MMIO_BASE + specific MEM_BAR offset.
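
As a rough illustration of the CHA-count bullet above, the sketch below combines a low and a high 32-bit capability word and counts the set bits. That each set bit marks one present CHA is an assumption made for this example; the commit message only says that both registers have to be read. The function name `count_chas` is hypothetical.

```python
def count_chas(capid6_low: int, capid7_high: int) -> int:
    """Count set bits across CAPID6 (low word) and CAPID7 (high word).

    Assumption: each set bit corresponds to one available CHA.
    """
    caps = ((capid7_high & 0xFFFFFFFF) << 32) | (capid6_low & 0xFFFFFFFF)
    return bin(caps).count("1")


print(count_chas(0x0000000F, 0x00000001))
```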

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/1585842411-150452-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Yunying Sun <yunying.sun@intel.com>
Signed-off-by: Peng Wang <rocking@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

c4af4e97

Intel: perf/x86/intel/uncore: Add box_offsets for free-running counters · 4b559ab4

Committed by Kan Liang on Jun 12, 2020

fix #29130534

commit bc88a2fe216a51e8ab46d61f89d0c1b5a400470e upstream.

Backport summary: Backport to kernel 4.19.57 for ICX uncore support.

The offset between uncore boxes of free-running counters varies, e.g.
IIO free-running counters on Ice Lake server.

Add box_offsets, an array of offsets between adjacent uncore boxes.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1584470314-46657-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Yunying Sun <yunying.sun@intel.com>
Signed-off-by: Peng Wang <rocking@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

4b559ab4

Intel: perf/x86/intel/uncore: Factor out __snr_uncore_mmio_init_box · 6b7f290f

Committed by Kan Liang on Jun 12, 2020

fix #29130534

commit 3442a9ecb8e72a33c28a2b969b766c659830e410 upstream.

Backport summary: Backport to kernel 4.19.57 for ICX uncore support.

The IMC uncore unit in Ice Lake server can only be accessed by MMIO,
which is similar to Snow Ridge.
Factor out __snr_uncore_mmio_init_box which can be shared with Ice Lake
server in the following patch.

No functional changes.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1584470314-46657-2-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Yunying Sun <yunying.sun@intel.com>
Signed-off-by: Peng Wang <rocking@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

6b7f290f

Intel: perf/x86/intel/uncore: Add IMC uncore support for Snow Ridge · 28661c6a

Committed by Kan Liang on Jun 12, 2020

fix #29130534

commit ee49532b38dd084650bf715eabe7e3828fb8d275 upstream.

Backport summary: Backport to kernel 4.19.57 for ICX uncore support.

IMC uncore unit can only be accessed via MMIO on Snow Ridge.
The MMIO space of IMC uncore is at the specified offsets from the
MEM0_BAR. Add snr_uncore_get_mc_dev() to locate the PCI device with
MMIO_BASE and MEM0_BAR register.

Add new ops to access the IMC registers via MMIO.

Add 3 new free running counters for clocks, read and write bandwidth.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: eranian@google.com
Link: https://lkml.kernel.org/r/1556672028-119221-7-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Yunying Sun <yunying.sun@intel.com>
Signed-off-by: Peng Wang <rocking@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

28661c6a

Intel: perf/x86/intel/uncore: Clean up client IMC · 4f42d8f8

Committed by Kan Liang on Jun 12, 2020

fix #29130534

commit 07ce734dd8adc0f170d43c15a9b91b707a21b9d7 upstream.

Backport summary: Backport to kernel 4.19.57 for ICX uncore support.

The client IMC block is accessed by MMIO. Current code uses an informal
way to access the block, which is not recommended.

Clean up the code by using __iomem annotation and the accessor
functions (read[lq]()).

Move exit_box() and read_counter() to generic code, which can be shared
with the server code later.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: eranian@google.com
Link: https://lkml.kernel.org/r/1556672028-119221-6-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Yunying Sun <yunying.sun@intel.com>
Signed-off-by: Peng Wang <rocking@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

4f42d8f8

Intel: perf/x86/intel/uncore: Support MMIO type uncore blocks · 4599feef

Committed by Kan Liang on Jun 12, 2020

fix #29130534

commit 3da04b8a00dd6d39970b9e764b78c5dfb40ec013 upstream.

Backport summary: Backport to kernel 4.19.57 for ICX uncore support.

A new MMIO type uncore box is introduced on Snow Ridge server. The
counters of MMIO type uncore box can only be accessed by MMIO.

Add a new uncore type, uncore_mmio_uncores, for MMIO type uncore blocks.

Support MMIO type uncore blocks in CPU hot plug. The MMIO space has to
be mapped/unmapped for the first/last CPU. The context also needs to be
migrated if the bound CPU changes.

Add mmio_init() to init and register PMUs for MMIO type uncore blocks.

Add a helper to calculate the box_ctl address.

The helpers which calculate ctl/ctr can be shared with PCI type uncore
blocks.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: eranian@google.com
Link: https://lkml.kernel.org/r/1556672028-119221-5-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Yunying Sun <yunying.sun@intel.com>
Signed-off-by: Peng Wang <rocking@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

4599feef

Intel: perf/x86/intel/uncore: Factor out box ref/unref functions · 3df4e38a

Committed by Kan Liang on Jun 12, 2020

fix #29130534

commit c8872d90e0a3651a096860d3241625ccfa1647e0 upstream.

Backport summary: Backport to kernel 4.19.57 for ICX uncore support.

For an uncore box which can only be accessed by MSR, its reference
count box->refcnt is updated in CPU hot plug. The uncore boxes need to be
initialized and exited accordingly for the first/last CPU of a socket.

Starting with Snow Ridge server, a new type of uncore box is introduced,
which can only be accessed by MMIO. The driver needs to map/unmap
MMIO space for the first/last CPU of a socket.

Extract the code for box ref/unref and init/exit for later reuse.

There is no functional change.
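
The first/last-CPU pattern described above can be sketched as a plain reference count. This is a simplified model, not the uncore driver code; the class and function names are hypothetical, and "init"/"exit" stand in for whatever setup and teardown (MSR init or MMIO map/unmap) a box type needs.

```python
class UncoreBox:
    """Toy model of a box shared by all CPUs of one socket."""

    def __init__(self):
        self.refcnt = 0
        self.events = []  # records when init/exit would run


def box_ref(box: UncoreBox) -> None:
    """Called when a CPU of the socket comes online."""
    box.refcnt += 1
    if box.refcnt == 1:          # first CPU: set the box up
        box.events.append("init")


def box_unref(box: UncoreBox) -> None:
    """Called when a CPU of the socket goes offline."""
    box.refcnt -= 1
    if box.refcnt == 0:          # last CPU: tear the box down
        box.events.append("exit")


box = UncoreBox()
box_ref(box)
box_ref(box)
box_unref(box)
box_unref(box)
print(box.events)
```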

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: eranian@google.com
Link: https://lkml.kernel.org/r/1556672028-119221-4-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Yunying Sun <yunying.sun@intel.com>
Signed-off-by: Peng Wang <rocking@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

3df4e38a

Intel: perf/x86/intel/uncore: Add uncore support for Snow Ridge server · 06253540

Committed by Kan Liang on Jun 12, 2020

fix #29130534

commit 210cc5f9db7a5c66b7ca6290b7d35cc7db7e9dbd upstream.

Backport summary: Backport to kernel 4.19.57 for ICX uncore support.

The uncore subsystem on Snow Ridge is similar to that of the previous SKX server.
The uncore units on Snow Ridge include Ubox, Chabox, IIO, IRP, M2PCIE,
PCU, M2M, PCIE3 and IMC.

- The config register encoding and pci device IDs are changed.
- For CHA, the umask_ext and filter_tid fields are changed.
- For IIO, the ch_mask and fc_mask fields are changed.
- For M2M, the mask_ext field is changed.
- Add new PCIe3 unit for PCIe3 root port which provides the interface
  between PCIe devices, plugged into the PCIe port, and the components
  (in M2IOSF).
- IMC can only be accessed via MMIO on Snow Ridge now. Current common
  code doesn't support it yet. IMC will be supported in following
  patches.
- There are 9 free running counters for IIO CLOCKS and bandwidth In.
- Full uncore event list is not published yet. Event constraints are not
  included in this patch. They will be added later separately.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: eranian@google.com
Link: https://lkml.kernel.org/r/1556672028-119221-3-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Yunying Sun <yunying.sun@intel.com>
Signed-off-by: Peng Wang <rocking@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

06253540

alinux: block-throttle: only do io statistics if needed · b8a94ed8

Committed by Xiaoguang Wang on Jul 04, 2020

task #29063222

The current blk throttle code always collects I/O statistics even when
users have not specified valid throttle rules. This introduces significant
overhead for applications that do not use the blk throttle function, and
it is worse on arm; see the perf data below, captured on arm:

sudo taskset -c 66 fio -ioengine=io_uring -sqthread_poll=1 -hipri=1
-sqthread_poll_cpu=65 -registerfiles=1 -fixedbufs=1 -direct=1
-filename=/dev/nvme0n1 -bs=4k -iodepth=8 -rw=randwrite  -time_based
-ramp_time=30 -runtime=60  -name="test"

Samples: 25K of event 'cycles', Event count (approx.): 16586974662
Overhead  Command      Shared Object      Symbol
   3.54%  io_uring-sq  [kernel.kallsyms]  [k] throtl_stats_update_completion
   0.89%  io_uring-sq  [kernel.kallsyms]  [k] throtl_bio_end_io
   0.66%  io_uring-sq  [kernel.kallsyms]  [k] blk_throtl_bio
   0.05%  io_uring-sq  [kernel.kallsyms]  [k] blk_throtl_stat_add
   0.05%  io_uring-sq  [kernel.kallsyms]  [k] throtl_track_latency
   0.01%  io_uring-sq  [kernel.kallsyms]  [k] blk_throtl_bio_endio

Samples: 25K of event 'cycles', Event count (approx.): 16586974662
Overhead  Command      Shared Object      Symbol
   1.62%  io_uring-sq  [kernel.kallsyms]  [k] io_submit_sqes
   1.06%  io_uring-sq  [kernel.kallsyms]  [k] io_issue_sqe
   0.32%  io_uring-sq  [kernel.kallsyms]  [k] __io_queue_sqe
   0.06%  io_uring-sq  [kernel.kallsyms]  [k] io_queue_sqe

The test above does not set any valid blk throttle rules, yet the overhead
introduced by blk throttle is even larger than that of many io_uring
framework functions, which is not acceptable.

To fix this, only collect I/O statistics if users specify valid
blk throttle rules; this also improves performance.

Before this patch:
clat (usec): min=5, max=6871, avg=18.70, stdev=17.89
 lat (usec): min=9, max=6871, avg=18.84, stdev=17.89
WRITE: bw=1618MiB/s (1697MB/s), 1618MiB/s-1618MiB/s (1697MB/s-1697MB/s),
io=94.8GiB (102GB), run=60001-60001msec

With this patch:
clat (usec): min=5, max=7554, avg=17.49, stdev=18.24
 lat (usec): min=9, max=7554, avg=17.62, stdev=18.24
WRITE: bw=1727MiB/s (1810MB/s), 1727MiB/s-1727MiB/s (1810MB/s-1810MB/s),
io=101GiB (109GB), run=60001-60001msec

About 6.6% bps improvement and 6.4% latency reduction.
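
The quoted percentages can be reproduced from the fio numbers above. This sketch assumes the bandwidth figure is taken from the MB/s column and the latency figure from the average "lat" values; the commit message does not say which columns were used, and the quoted values appear to be truncated rather than rounded.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


bw_gain = pct_change(1697, 1810)        # MB/s, before vs. with the patch
lat_drop = -pct_change(18.84, 17.62)    # avg lat in usec, reduction

print(f"bandwidth: +{bw_gain:.2f}%  latency: -{lat_drop:.2f}%")
```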

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

b8a94ed8

configs: disable CONFIG_REFCOUNT_FULL for release kernel · 897fa8eb

Committed by Dust Li on Jul 07, 2020

fix #29180329

CONFIG_REFCOUNT_FULL is mainly used for debugging, so it is
better to disable it for the release kernel.
This patch disables it on both x86 and aarch64 for the release
kernel.

The option has a pretty large performance penalty for
will-it-scale:signal1_process when the number of processes
is large.

Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

897fa8eb

configs: update configs to adapt nvdimm series · efa76a28

Committed by Shile Zhang on Apr 28, 2020

to #27305291

Enabled the following configs for NVDIMM support:
- CONFIG_ACPI_NFIT=m
- CONFIG_NVDIMM_KEYS=y

Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>

efa76a28

libnvdimm/security: provide fix for secure-erase to use zero-key · 8828833e

Committed by Dave Jiang on Mar 27, 2019

to #27305291

commit 037c8489ade669e0f09ad40d5b91e5e1159a14b1 upstream.

Add a zero key in order to standardize handling of hardware that wants a
key of 0's to be passed. Some platforms default to a zero-key with security
enabled rather than allowing the OS to enable the security. The zero key
allows us to manage those platforms as well. This also adds a fix to secure
erase so it can use the zero key to do crypto erase. Some other security
commands already use zero keys. This introduces a standard zero-key to allow
unification of semantics across nvdimm security commands.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>

8828833e

libnvdimm/security: Add documentation for nvdimm security support · b42cd6ea

Committed by Dave Jiang on Dec 10, 2018

to #27305291

commit 1f4883f300da4f4d9d31eaa80f7debf6ce74843b upstream.

Add theory of operation for the security support that's going into
libnvdimm.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jing Lin <jing.lin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>

b42cd6ea

tools/testing/nvdimm: add Intel DSM 1.8 support for nfit_test · 55039eda

Committed by Dave Jiang on Dec 10, 2018

to #27305291

commit ecaa4a97b3908be0bf3ad12181ae8c44d1816d40 upstream.

Add test support for the new Intel DSM v1.8. The ability to simulate
master passphrase update and master secure erase has been added to
nfit_test.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>

55039eda
