Commits · e26a55e2ba1b056f44070f794f09c5b164e96c46 · openanolis / cloud-kernel

02 Sep, 2020 40 commits

hookers: fix Kconfig dependency on INET · e26a55e2

Authored by Dust Li on Jul 17, 2020

fix #29372337

hookers needs CONFIG_INET for its basic functionality, so add the
dependency in Kconfig.

Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>

x86/resctrl: Fix memory bandwidth counter width for AMD · 9df89203

Authored by Babu Moger on Jun 04, 2020

fix #29035143

commit 2c18bd525c47f882f033b0a813ecd09c93e1ecdf upstream

Memory bandwidth is calculated by reading the monitoring counter
at two intervals and calculating the delta. It is the software's
responsibility to read the count often enough to avoid having
the count roll over _twice_ between reads.

The current code hardcodes the bandwidth monitoring counter's width
to 24 bits for AMD, because the default base counter width is 24.
Currently, AMD does not implement CPUID 0xF.[ECX=1]:EAX to adjust
the counter width. However, the AMD hardware supports a much wider
bandwidth counter, with a default width of 44 bits.

The kernel reads these monitoring counters every second and adjusts the
counter value for overflow. With 24 bits and a scale value of 64 for AMD,
it can only measure up to 1GB/s without overflowing. For rates above
1GB/s it fails to measure the bandwidth.

Fix the issue by setting the default width to 44 bits, which is done by
adjusting the offset.

Future AMD products will implement CPUID 0xF.[ECX=1]:EAX.
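
The arithmetic above can be checked with a short user-space sketch. It is
not the kernel's resctrl code: the helper names are invented for
illustration, and only the 24-bit and 44-bit widths and the scale value of
64 come from the text. A 24-bit counter covers 2^24 * 64 bytes = 1 GiB
between two reads, which is where the 1GB/s limit at one read per second
comes from.

```c
#include <stdint.h>

/* Bytes per counter unit; 64 is the AMD scale value quoted above. */
#define MBM_SCALE_BYTES 64ULL

/*
 * Delta between two raw reads of a `width`-bit counter (1 <= width <= 64),
 * correct as long as the counter rolled over at most once. Shifting both
 * reads to the top of a 64-bit word lets unsigned wraparound absorb the
 * rollover.
 */
static uint64_t mbm_delta(uint64_t prev, uint64_t cur, unsigned int width)
{
	unsigned int shift = 64 - width;

	return ((cur << shift) - (prev << shift)) >> shift;
}

/* Largest byte count a `width`-bit counter can represent between reads. */
static uint64_t mbm_max_bytes(unsigned int width)
{
	return (1ULL << width) * MBM_SCALE_BYTES;
}
```

With these helpers, mbm_max_bytes(24) is 2^30 bytes and mbm_max_bytes(44)
is 2^50 bytes, so the wider counter leaves far more headroom between reads.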

 [ bp: Let the line stick out and drop {}-brackets around a single
   statement. ]

Fixes: 4d05bf71f157 ("x86/resctrl: Introduce AMD QOS feature")
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/159129975546.62538.5656031125604254041.stgit@naples-babu.amd.com
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

x86/resctrl: Support CPUID enumeration of MBM counter width · 31df1a2e

Authored by Reinette Chatre on May 05, 2020

fix #29035143

commit f3d44f18b0662327c42128b9d3604489bdb6e36f upstream

The original Memory Bandwidth Monitoring (MBM) architectural
definition defines counters of up to 62 bits in the
IA32_QM_CTR MSR while the first-generation MBM implementation
uses statically defined 24 bit counters.

Expand the MBM CPUID enumeration properties to include the MBM
counter width. The previously undefined EAX output register contains,
in bits [7:0], the MBM counter width encoded as an offset from
24 bits. Enumerating this property is only specified for Intel
CPUs.
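
The encoding described above can be sketched in a few lines. This is only
an illustration: the function and macro names are assumptions, and the
caller is expected to have already executed CPUID 0xF with ECX=1 and to
pass in the resulting EAX value.

```c
#include <stdint.h>

/* Width of the first-generation MBM counters, in bits. */
#define MBM_WIDTH_BASE 24u

/*
 * Decode the MBM counter width from the EAX output of CPUID 0xF.[ECX=1].
 * Bits [7:0] hold the width as an offset from 24 bits; the other bits are
 * ignored here.
 */
static unsigned int mbm_width_from_eax(uint32_t eax)
{
	return MBM_WIDTH_BASE + (eax & 0xffu);
}
```

An EAX value of 0 therefore keeps the original 24-bit width, while an
offset of 20 yields a 44-bit counter.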

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/afa3af2f753f6bc301fb743bc8944e749cb24afa.1588715690.git.reinette.chatre@intel.com
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

x86/cpu: Move resctrl CPUID code to resctrl/ · 9cf1c9c1

Authored by Reinette Chatre on May 05, 2020

fix #29035143

commit 0118ad82c2a64ebcf15d7565ed35361407efadfa upstream

The function determining a platform's support and properties of cache
occupancy and memory bandwidth monitoring (properties of
X86_FEATURE_CQM_LLC) can be found among the common CPU code. After
the feature's properties are populated in the per-CPU data, the resctrl
subsystem is the only consumer (via boot_cpu_data).

Move the function that obtains the CPU information used by resctrl to
the resctrl subsystem and rename it from init_cqm() to
resctrl_cpu_detect(). The function continues to be called from the
common CPU code. This move is done in preparation for the addition of
some vendor-specific code.

No functional change.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/38433b99f9d16c8f4ee796f8cc42b871531fa203.1588715690.git.reinette.chatre@intel.com
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

x86/resctrl: Rename asm/resctrl_sched.h to asm/resctrl.h · 16cc4455

Authored by Reinette Chatre on May 05, 2020

fix #29035143

commit 8dd97c65185c5a63c668e5bd8a861c04f47a35ed upstream

asm/resctrl_sched.h is dedicated to the code used for configuration
of the CPU resource control state when a task is scheduled.

Rename resctrl_sched.h to resctrl.h in preparation for additions that
will no longer make this file dedicated to work done during scheduling.

No functional change.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/6914e0ef880b539a82a6d889f9423496d471ad1d.1588715690.git.reinette.chatre@intel.com
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

configs: enable config for TCP_RT module · 0996ce64

Authored by Xuan Zhuo on May 26, 2020

to #27804112

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>

alinux: tcp_rt: add Documentation for tcp-rt · 3719abbc

Authored by Xuan Zhuo on Jul 09, 2020

to #27804112

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>

alinux: tcp_rt module: peer ports add more statistics · 5efc57b3

Authored by Xuan Zhuo on Jun 05, 2020

to #27804112

Add server_time, recv_time and recv_data statistics for peer ports,
and rename the upload keyword to recv, which is common to both local
and peer ports.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>

alinux: tcp_rt module: support pports_range · ded7aa68

Authored by Xuan Zhuo on Jun 04, 2020

to #27804112

Remove the unused con_num field from struct tcp_rt_stats.

Add a new type, TCPRT_TYPE_PEER_PORT_RANG, so that stats_peer also
uses the two-dimensional array.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>

alinux: tcp_rt module: change the _tcp_rt_stats item type to u64 · 36e4a622

Authored by Xuan Zhuo on Jun 04, 2020

to #27804112

_tcp_rt_stats is used to save the values returned from tcp_rt_stats.
These values were originally 64-bit but were stored in u32 fields, so
larger values overflowed. Change the fields to 64-bit.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>

alinux: tcp_rt module: use atomic64_xchg replace atomic64_read and atomic64_set · 81be6f9d

Authored by Xuan Zhuo on Jun 04, 2020

to #27804112

1. Calling atomic64_read first and then atomic64_set leaves a window in
   which updates are lost from the statistics.
2. Atomic instructions are relatively slow, and issuing two of them in
   succession for each variable hurts performance.
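
The two points above can be illustrated with C11 atomics in user space.
This is an analogy rather than the module's code: atomic_exchange() stands
in for the kernel's atomic64_xchg(), and the function names are
hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Read-then-set: an increment that lands between the load and the store
 * is overwritten by the store and never reported.
 */
static uint64_t drain_read_then_set(_Atomic uint64_t *counter)
{
	uint64_t value = atomic_load(counter);

	atomic_store(counter, 0);
	return value;
}

/*
 * Exchange: fetch the old value and reset the counter in one atomic
 * operation, so no increment can fall into a gap.
 */
static uint64_t drain_xchg(_Atomic uint64_t *counter)
{
	return atomic_exchange(counter, 0);
}
```

Both functions return the same result when nothing else touches the
counter; they differ only under concurrent updates, and the exchange
variant also issues one atomic operation instead of two.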

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>

alinux: tcp_rt module: save tcp rtt when R record, change the unit to us · 404bc1e2

Authored by Xuan Zhuo on Jun 04, 2020

to #27804112

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>

alinux: tcp_rt module: P record add rt and tcp reorder info · 0c635e00

Authored by Xuan Zhuo on Jul 09, 2020

to #27804112

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>

alinux: tcp_rt module: simplify the parameter name · 13867aae

Authored by Xuan Zhuo on May 28, 2020

to #27804112

Use prefixes to distinguish between local and peer ports.
Shorten the parameter names, which also leaves room to add
filter conditions on a peer port range or peer address later.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>

alinux: tcp_rt module: fix repeat stats for mrtt · 8675ebe3

Authored by Xuan Zhuo on May 21, 2020

to #27804112

Only count mrtt at the end of the connection. In the peer ports case,
counting mrtt on every output would record the same value repeatedly.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>

alinux: tcp_rt module: change relay work mode · 2bac13c5

Authored by Xuan Zhuo on May 21, 2020

to #27804112

1. Use relay's overwrite mode: when the buffer is full, the old data
   is discarded and the new data is written to the buffer.
2. Stats output is triggered by a timer, so there is no concurrency and
   one relay file per CPU is unnecessary. Output a single
   "rt-network-stats" file instead.
3. Pass the tcp-rt parent directory when opening the relay channel.
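
The overwrite behaviour in item 1 resembles a ring buffer that drops its
oldest record when full. The sketch below is a user-space analogy with
hypothetical names. It does not use the kernel relay API and says nothing
about how relay lays out its sub-buffers.

```c
#include <stddef.h>

#define RING_CAP 4

struct ring {
	int slots[RING_CAP];
	size_t head;   /* index of the oldest record */
	size_t count;  /* number of valid records */
};

/* Append a record; when the ring is full, discard the oldest one first. */
static void ring_push(struct ring *r, int value)
{
	if (r->count == RING_CAP) {
		r->head = (r->head + 1) % RING_CAP;
		r->count--;
	}
	r->slots[(r->head + r->count) % RING_CAP] = value;
	r->count++;
}

/* Remove the oldest record; returns 0 when the ring is empty. */
static int ring_pop(struct ring *r, int *value)
{
	if (r->count == 0)
		return 0;
	*value = r->slots[r->head];
	r->head = (r->head + 1) % RING_CAP;
	r->count--;
	return 1;
}
```

A writer that never blocks suits periodic statistics: a slow reader loses
the oldest samples instead of stalling the timer that produces new ones.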

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>

alinux: tcp_rt module: change real to stats · 757ae940

Authored by Xuan Zhuo on May 21, 2020

to #27804112

Rename "check" and "statis" to "stats" throughout.

Some users do not use the stats function, and it takes up
resources, so stats is no longer enabled by default.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>

alinux: tcp_rt module: support ports_range · f486d9b9

Authored by Xuan Zhuo on May 21, 2020

to #27804112

When ports_range is configured as '1000,2000,4000,6000', ports in the
ranges 1000-2000 and 4000-6000 are monitored.

With ports_range, the real function may have too many ports to count.
Allocating space for every port when the module is loaded would waste
memory, so a two-dimensional array is used instead: a contiguous block
is allocated as needed to hold a certain amount of statistical
information.
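
One way to read the description above is sketched below in user-space C:
parse the comma-separated pairs into ranges and allocate the per-port
counters of a range only when a port in it is first seen. The structure,
function names and layout are assumptions made for illustration; the
module's actual data structures are not shown in this log.

```c
#include <stdint.h>
#include <stdlib.h>

#define MAX_RANGES 8

struct port_range {
	unsigned int lo, hi;
	uint64_t *stats;   /* one counter per port, allocated on first use */
};

/* Parse "lo,hi,lo,hi,..."; returns the number of ranges or -1 on error. */
static int parse_ports_range(const char *s, struct port_range *out, int max)
{
	int n = 0;

	while (*s && n < max) {
		char *end;
		unsigned long lo, hi;

		lo = strtoul(s, &end, 10);
		if (end == s || *end != ',')
			return -1;
		s = end + 1;
		hi = strtoul(s, &end, 10);
		if (end == s || lo > hi || hi > 65535)
			return -1;
		out[n].lo = (unsigned int)lo;
		out[n].hi = (unsigned int)hi;
		out[n].stats = NULL;
		n++;
		if (*end == ',')
			end++;
		else if (*end != '\0')
			return -1;
		s = end;
	}
	return *s ? -1 : n;
}

/*
 * Return the counter for `port`, allocating its range's row lazily.
 * Returns NULL if the port is not monitored or allocation fails.
 */
static uint64_t *port_stat(struct port_range *r, int n, unsigned int port)
{
	for (int i = 0; i < n; i++) {
		if (port < r[i].lo || port > r[i].hi)
			continue;
		if (!r[i].stats)
			r[i].stats = calloc(r[i].hi - r[i].lo + 1,
					    sizeof(uint64_t));
		return r[i].stats ? &r[i].stats[port - r[i].lo] : NULL;
	}
	return NULL;
}
```

Ranges that never see traffic cost only their descriptor, which is the
memory saving the commit message is after.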

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>

alinux: tcp_rt module: add tcp_rt module · 03fcbae2

Authored by Xuan Zhuo on May 19, 2020

to #27804112

TCP-RT is a kernel module for monitoring services at the TCP level.

TCP-RT is essentially a tracing method. With probe points placed in
advance at the relevant positions in the kernel TCP protocol stack, it can
identify requests and responses in scenarios where a single connection
carries only one concurrent request, and then collect information such as
the time the request was received in the protocol stack and the time the
service process spent handling it.

In addition, TCP-RT also supports some statistical analysis in the kernel
and periodically outputs statistical information about the specified
connection.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>

configs: Enabled CONFIG_PCIE_DPC · 4680a739

Authored by Zelin Deng on Jul 16, 2020

fix #29335392

DPC stands for Downstream Port Containment. It halts PCIe traffic
below a downstream port after an unmasked uncorrectable error is
detected at or below the port, which avoids the potential spread
of any data corruption.

Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>

alinux: blk-iocost: bypass IOs earlier if disabled · e166c200

Authored by Joseph Qi on Jul 16, 2020

to #29357063

The blkg lookup or create logic can add significant overhead even when
iocost is disabled, so bypass it earlier in that case.

Fixes: 9da41925 ("alinux: iocost: fix NULL pointer dereference in ioc_rqos_throttle")
Reported-by: Hongnan Li <hongnan.li@linux.alibaba.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>

ovl: inode reference leak in ovl_is_inuse true case. · b247d8a6

Authored by youngjun on Jun 16, 2020

to #29273482

In the case where ovl_is_inuse() returns true, the trap inode reference
is not put. Also add a comment explaining the ordering of ovl_is_inuse
after ovl_setup_trap.

Fixes: 0be0bfd2de9d ("ovl: fix regression caused by overlapping layers detection")
Cc: <stable@vger.kernel.org> # v4.19+
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: youngjun <her0gyugyu@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git/commit/?h=overlayfs-next&id=24f14009b8f1754ec2ae4c168940c01259b0f88a
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

Revert "samples/bpf: fix build by setting HAVE_ATTR_TEST to zero" · eaf8003a

Authored by Dust Li on Jul 10, 2020

fix #29262413

The original patch does not fit 4.19 well because the upstream kernel uses
  #if HAVE_ATTR_TEST
  xxx
  #endif

but in 4.19, we use
  #ifdef HAVE_ATTR_TEST
  xxx
  #endif

As a result, setting HAVE_ATTR_TEST to zero still defines the macro, so the
code under #ifdef HAVE_ATTR_TEST is enabled and the build finally fails when
running:

$make M=samples/bpf
make -C /mnt/nvme0/wuya/kernel/aliyunlinux2/ck/samples/bpf/../../tools/lib/bpf/ RM='rm -rf' LDFLAGS= srctree=/mnt/nvme0/wuya/kernel/aliyunlinux2/ck/samples/bpf/../../ O=
Warning: Kernel ABI header at 'tools/include/uapi/linux/bpf.h' differs from latest version at 'include/uapi/linux/bpf.h'
  CC      samples/bpf/syscall_nrs.s
  UPD     samples/bpf/syscall_nrs.h
  HOSTCC  samples/bpf/test_lru_dist
  HOSTCC  samples/bpf/sock_example
  HOSTCC  samples/bpf/bpf_load.o
In file included from ./tools/perf/perf-sys.h:9:0,
                 from samples/bpf/bpf_load.c:29:
./tools/perf/perf-sys.h: In function ‘sys_perf_event_open’:
./tools/perf/perf-sys.h:68:15: error: ‘test_attr__enabled’ undeclared (first use in this function)
  if (unlikely(test_attr__enabled))
               ^
./tools/include/linux/compiler.h:74:43: note: in definition of macro ‘unlikely’
 # define unlikely(x)  __builtin_expect(!!(x), 0)
                                           ^
./tools/perf/perf-sys.h:68:15: note: each undeclared identifier is reported only once for each function it appears in
  if (unlikely(test_attr__enabled))
               ^
./tools/include/linux/compiler.h:74:43: note: in definition of macro ‘unlikely’
 # define unlikely(x)  __builtin_expect(!!(x), 0)
                                           ^
In file included from samples/bpf/bpf_load.c:29:0:
./tools/perf/perf-sys.h:69:3: warning: implicit declaration of function ‘test_attr__open’ [-Wimplicit-function-declaration]
   test_attr__open(attr, pid, cpu, fd, group_fd, flags);
   ^~~~~~~~~~~~~~~
make[1]: *** [samples/bpf/bpf_load.o] Error 1
make: *** [_module_samples/bpf] Error 2
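
The difference between the two preprocessor forms can be reproduced in
isolation. In the sketch below only the macro name comes from the commit;
the two helper functions are hypothetical and exist purely to expose which
branch the preprocessor kept.

```c
/* The macro is defined, but with the value zero. */
#define HAVE_ATTR_TEST 0

/* #if evaluates the value: 0 disables the guarded code. */
static int enabled_under_if(void)
{
#if HAVE_ATTR_TEST
	return 1;
#else
	return 0;
#endif
}

/* #ifdef only asks whether the macro exists: the guarded code is kept. */
static int enabled_under_ifdef(void)
{
#ifdef HAVE_ATTR_TEST
	return 1;
#else
	return 0;
#endif
}
```

Here enabled_under_if() returns 0 while enabled_under_ifdef() returns 1,
which is why a "set to zero" fix written for #if has the opposite effect
on a tree that uses #ifdef.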

This reverts commit 4665759a.

Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

samples/bpf: Add a workaround for asm_inline · 867c8a45

Authored by KP Singh on Oct 02, 2019

to #29262413

commit 98beb3edeb974e906a81f305d88f7bc96b2ec83e upstream.

This was added in commit eb111869301e ("compiler-types.h: add asm_inline
definition") and breaks samples/bpf as clang does not support asm __inline.

Fixes: eb111869301e ("compiler-types.h: add asm_inline definition")
Co-developed-by: Florent Revest <revest@google.com>
Signed-off-by: Florent Revest <revest@google.com>
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20191002191652.11432-1-kpsingh@chromium.org
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

samples/bpf: fix build with new clang · 3637cf04

Authored by Alexei Starovoitov on Apr 04, 2019

to #29262413

commit 636e78b1cdb40b77a79b143dbd9d94847b360efa upstream.

clang started to error on invalid asm clobber usage in x86 headers
and many bpf program samples failed to build with the message:

  CLANG-bpf  /data/users/ast/bpf-next/samples/bpf/xdp_redirect_kern.o
In file included from /data/users/ast/bpf-next/samples/bpf/xdp_redirect_kern.c:14:
In file included from ../include/linux/in.h:23:
In file included from ../include/uapi/linux/in.h:24:
In file included from ../include/linux/socket.h:8:
In file included from ../include/linux/uio.h:14:
In file included from ../include/crypto/hash.h:16:
In file included from ../include/linux/crypto.h:26:
In file included from ../include/linux/uaccess.h:5:
In file included from ../include/linux/sched.h:15:
In file included from ../include/linux/sem.h:5:
In file included from ../include/uapi/linux/sem.h:5:
In file included from ../include/linux/ipc.h:9:
In file included from ../include/linux/refcount.h:72:
../arch/x86/include/asm/refcount.h:72:36: error: asm-specifier for input or output variable conflicts with asm clobber list
                                         r->refs.counter, e, "er", i, "cx");
                                                                      ^
../arch/x86/include/asm/refcount.h:86:27: error: asm-specifier for input or output variable conflicts with asm clobber list
                                         r->refs.counter, e, "cx");
                                                             ^
2 errors generated.

Override volatile() to work around the problem.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

samples/bpf: workaround clang asm goto compilation errors · 81aa61c4

Authored by Yonghong Song on Jan 12, 2019

fix #29255532

commit 6bf3bbe1f4d4cf405e3c2bf07bbdff56d3223ec8 upstream.

x86 compilation has required asm goto support since 4.17.
Since clang does not support asm goto, at 4.17,
Commit b1ae32db ("x86/cpufeature: Guard asm_volatile_goto usage
for BPF compilation") worked around the issue by permitting an
alternative implementation without asm goto for clang.

At 5.0, more asm goto usages appeared.
  [yhs@148 x86]$ egrep -r asm_volatile_goto
  include/asm/cpufeature.h:     asm_volatile_goto("1: jmp 6f\n"
  include/asm/jump_label.h:     asm_volatile_goto("1:"
  include/asm/jump_label.h:     asm_volatile_goto("1:"
  include/asm/rmwcc.h:  asm_volatile_goto (fullop "; j" #cc " %l[cc_label]"     \
  include/asm/uaccess.h:        asm_volatile_goto("\n"                          \
  include/asm/uaccess.h:        asm_volatile_goto("\n"                          \
  [yhs@148 x86]$

Compiling samples/bpf directories, most bpf programs failed
compilation with error messages like:
  In file included from /home/yhs/work/bpf-next/samples/bpf/xdp_sample_pkts_kern.c:2:
  In file included from /home/yhs/work/bpf-next/include/linux/ptrace.h:6:
  In file included from /home/yhs/work/bpf-next/include/linux/sched.h:15:
  In file included from /home/yhs/work/bpf-next/include/linux/sem.h:5:
  In file included from /home/yhs/work/bpf-next/include/uapi/linux/sem.h:5:
  In file included from /home/yhs/work/bpf-next/include/linux/ipc.h:9:
  In file included from /home/yhs/work/bpf-next/include/linux/refcount.h:72:
  /home/yhs/work/bpf-next/arch/x86/include/asm/refcount.h:70:9: error: 'asm goto' constructs are not supported yet
        return GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
               ^
  /home/yhs/work/bpf-next/arch/x86/include/asm/rmwcc.h:67:2: note: expanded from macro 'GEN_BINARY_SUFFIXED_RMWcc'
        __GEN_RMWcc(op " %[val], %[var]\n\t" suffix, var, cc,           \
        ^
  /home/yhs/work/bpf-next/arch/x86/include/asm/rmwcc.h:21:2: note: expanded from macro '__GEN_RMWcc'
        asm_volatile_goto (fullop "; j" #cc " %l[cc_label]"             \
        ^
  /home/yhs/work/bpf-next/include/linux/compiler_types.h:188:37: note: expanded from macro 'asm_volatile_goto'
  #define asm_volatile_goto(x...) asm goto(x)

Most of these usages do not even provide an alternative
implementation, and it is not practical to make changes
at each call site.

This patch works around the asm goto issue by redefining the macro as below:
  #define asm_volatile_goto(x...) asm volatile("invalid use of asm_volatile_goto")

If asm_volatile_goto is not used by bpf programs, which is typically the case, nothing bad
will happen. If asm_volatile_goto is used by bpf programs, which is incorrect, the compiler
will issue an error since "invalid use of asm_volatile_goto" is not valid assembly code.

With this patch, all bpf programs under samples/bpf can pass compilation.

Note that bpf programs under tools/testing/selftests/bpf/ compiled fine as
they do not access kernel internal headers.

Fixes: e769742d3584 ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"")
Fixes: 18fe5822 ("x86, asm: change the GEN_*_RMWcc() macros to not quote the condition")
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

io_uring: fix not initialised work->flags · 109831ff

Authored by Pavel Begunkov on Jul 12, 2020

to #29276773

commit 16d598030a37853a7a6b4384cad19c9c0af2f021 upstream.

59960b9deb535 ("io_uring: fix lazy work init") tried to fix missing
io_req_init_async(), but left out work.flags and hash. Do it earlier.

Fixes: 7cdaf587de7c ("io_uring: avoid whole io_wq_work copy for requests completed inline")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

io_uring: fix missing msg_name assignment · 589ee219

Authored by Pavel Begunkov on Jul 12, 2020

to #29276773

commit dd821e0c95a64b5923a0c57f07d3f7563553e756 upstream.

Make sure msg.msg_name is set for the async portion of send/recvmsg,
as the header copy will copy to/from it.

Cc: stable@vger.kernel.org # v5.5+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

io_uring: account user memory freed when exit has been queued · 8c9ebb73

Authored by Jens Axboe on Jul 10, 2020

to #29276773

commit 309fc03a3284af62eb6082fb60327045a1dabf57 upstream.

We currently account the memory after the exit work has been run, but
that leaves a gap where a process has closed its ring and until the
memory has been accounted as freed. If the memlocked ulimit is
borderline, then that can introduce spurious setup errors returning
-ENOMEM because the free work hasn't been run yet.

Account this as freed when we close the ring, as not to expose a tiny
gap where setting up a new ring can fail.

Fixes: 85faa7b8346e ("io_uring: punt final io_ring_ctx wait-and-free to workqueue")
Cc: stable@vger.kernel.org # v5.7
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

io_uring: fix memleak in io_sqe_files_register() · c1f5f815

Authored by Yang Yingliang on Jul 10, 2020

to #29276773

commit 667e57da358f61b6966e12e925a69e42d912e8bb upstream.

I got a memleak report when doing some fuzz test:

BUG: memory leak
unreferenced object 0x607eeac06e78 (size 8):
  comm "test", pid 295, jiffies 4294735835 (age 31.745s)
  hex dump (first 8 bytes):
    00 00 00 00 00 00 00 00                          ........
  backtrace:
    [<00000000932632e6>] percpu_ref_init+0x2a/0x1b0
    [<0000000092ddb796>] __io_uring_register+0x111d/0x22a0
    [<00000000eadd6c77>] __x64_sys_io_uring_register+0x17b/0x480
    [<00000000591b89a6>] do_syscall_64+0x56/0xa0
    [<00000000864a281d>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Call percpu_ref_exit() on error path to avoid
refcount memleak.
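
The shape of this fix, releasing on the error path what an earlier init
call allocated, can be shown with a user-space analogy. The fake_ref type
and all function names below are hypothetical stand-ins; this is not the
kernel's percpu_ref API or the io_uring code.

```c
#include <stdlib.h>

/* Number of initialised-but-not-released refs, for demonstration only. */
static int live_refs;

struct fake_ref {
	int *storage;
};

static int fake_ref_init(struct fake_ref *ref)
{
	ref->storage = malloc(sizeof(*ref->storage));
	if (!ref->storage)
		return -1;
	live_refs++;
	return 0;
}

static void fake_ref_exit(struct fake_ref *ref)
{
	free(ref->storage);
	ref->storage = NULL;
	live_refs--;
}

/* `fail_later` simulates a failure that happens after the ref is set up. */
static int register_files(struct fake_ref *ref, int fail_later)
{
	if (fake_ref_init(ref))
		return -1;
	if (fail_later) {
		/* Without this call the allocation made by init would leak. */
		fake_ref_exit(ref);
		return -1;
	}
	return 0;
}
```

After a failed register_files() call the live_refs counter is back at
zero, which is the property the leak report above shows was missing.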

Fixes: 05f3fb3c5397 ("io_uring: avoid ring quiesce for fixed file set unregister and update")
Cc: stable@vger.kernel.org
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

io_uring: fix memleak in __io_sqe_files_update() · 4b392775

Authored by Yang Yingliang on Jul 09, 2020

to #29276773

commit f3bd9dae3708a0ff6b067e766073ffeb853301f9 upstream.

I got a memleak report when doing some fuzz test:

BUG: memory leak
unreferenced object 0xffff888113e02300 (size 488):
comm "syz-executor401", pid 356, jiffies 4294809529 (age 11.954s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
a0 a4 ce 19 81 88 ff ff 60 ce 09 0d 81 88 ff ff ........`.......
backtrace:
[<00000000129a84ec>] kmem_cache_zalloc include/linux/slab.h:659 [inline]
[<00000000129a84ec>] __alloc_file+0x25/0x310 fs/file_table.c:101
[<000000003050ad84>] alloc_empty_file+0x4f/0x120 fs/file_table.c:151
[<000000004d0a41a3>] alloc_file+0x5e/0x550 fs/file_table.c:193
[<000000002cb242f0>] alloc_file_pseudo+0x16a/0x240 fs/file_table.c:233
[<00000000046a4baa>] anon_inode_getfile fs/anon_inodes.c:91 [inline]
[<00000000046a4baa>] anon_inode_getfile+0xac/0x1c0 fs/anon_inodes.c:74
[<0000000035beb745>] __do_sys_perf_event_open+0xd4a/0x2680 kernel/events/core.c:11720
[<0000000049009dc7>] do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359
[<00000000353731ca>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

BUG: memory leak
unreferenced object 0xffff8881152dd5e0 (size 16):
comm "syz-executor401", pid 356, jiffies 4294809529 (age 11.954s)
hex dump (first 16 bytes):
01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<0000000074caa794>] kmem_cache_zalloc include/linux/slab.h:659 [inline]
[<0000000074caa794>] lsm_file_alloc security/security.c:567 [inline]
[<0000000074caa794>] security_file_alloc+0x32/0x160 security/security.c:1440
[<00000000c6745ea3>] __alloc_file+0xba/0x310 fs/file_table.c:106
[<000000003050ad84>] alloc_empty_file+0x4f/0x120 fs/file_table.c:151
[<000000004d0a41a3>] alloc_file+0x5e/0x550 fs/file_table.c:193
[<000000002cb242f0>] alloc_file_pseudo+0x16a/0x240 fs/file_table.c:233
[<00000000046a4baa>] anon_inode_getfile fs/anon_inodes.c:91 [inline]
[<00000000046a4baa>] anon_inode_getfile+0xac/0x1c0 fs/anon_inodes.c:74
[<0000000035beb745>] __do_sys_perf_event_open+0xd4a/0x2680 kernel/events/core.c:11720
[<0000000049009dc7>] do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359
[<00000000353731ca>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

If io_sqe_file_register() fails, the file obtained by fget() must be put
to avoid the memleak.

Fixes: c3a31e605620 ("io_uring: add support for IORING_REGISTER_FILES_UPDATE")
Cc: stable@vger.kernel.org
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

vfs, afs, ext4: Make the inode hash table RCU searchable · dc136109

Authored by David Howells on Dec 01, 2017

task #29263287

commit 3f19b2ab97a97b413c24b66c67ae16daa4f56c35 upstream

Make the inode hash table RCU searchable so that searches that want to
access or modify an inode without taking a ref on that inode can do so
without taking the inode hash table lock.

The main thing this requires is some RCU annotation on the list
manipulation operations.  Inodes are already freed by RCU in most cases.

Users of this interface must take care as the inode may be still under
construction or may be being torn down around them.

There are at least three instances where this can be of use:

 (1) Testing whether the inode number iunique() is going to return is
     currently unique (the iunique_lock is still held).

 (2) Ext4 date stamp updating.

 (3) AFS callback breaking.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
cc: linux-ext4@vger.kernel.org
cc: linux-afs@lists.infradead.org
[jeffle: resolve collision in afs_break_one_callback since code base change]
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

io_uring: export cq overflow status to userspace · c7f865a4

Authored by Xiaoguang Wang on Jul 09, 2020

to #29233603

commit 6d5f904904608a9cd32854d7d0a4dd65b27f9935 upstream

Applications that do not want to use io_uring_enter() to reap and
handle cqes may rely entirely on liburing's io_uring_peek_cqe().
However, if the cq ring has overflowed, io_uring_peek_cqe() is
currently not aware of the overflow and will not enter the kernel to
flush cqes. The test program below reveals this bug:

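#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <liburing.h>

static void test_cq_overflow(struct io_uring *ring)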
{
        struct io_uring_cqe *cqe;
        struct io_uring_sqe *sqe;
        int issued = 0;
        int ret = 0;

        do {
                sqe = io_uring_get_sqe(ring);
                if (!sqe) {
                        fprintf(stderr, "get sqe failed\n");
                        break;
                }
                ret = io_uring_submit(ring);
                if (ret <= 0) {
                        if (ret != -EBUSY)
                                fprintf(stderr, "sqe submit failed: %d\n", ret);
                        break;
                }
                issued++;
        } while (ret > 0);
        assert(ret == -EBUSY);

        printf("issued requests: %d\n", issued);

        while (issued) {
                ret = io_uring_peek_cqe(ring, &cqe);
                if (ret) {
                        if (ret != -EAGAIN) {
                                /* ret is a negative errno; strerror() expects a positive one */
                                fprintf(stderr, "peek completion failed: %s\n",
                                        strerror(-ret));
                                break;
                        }
                        printf("left requests: %d\n", issued);
                        continue;
                }
                io_uring_cqe_seen(ring, cqe);
                issued--;
                printf("left requests: %d\n", issued);
        }
}

int main(int argc, char *argv[])
{
        int ret;
        struct io_uring ring;

        ret = io_uring_queue_init(16, &ring, 0);
        if (ret) {
                fprintf(stderr, "ring setup failed: %d\n", ret);
                return 1;
        }

        test_cq_overflow(&ring);
        return 0;
}

To fix this issue, export the cq overflow status to userspace by adding a
new IORING_SQ_CQ_OVERFLOW flag, so that liburing helper functions such as
io_uring_peek_cqe() can detect the overflow and flush accordingly.
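
As an illustration of how a userspace helper could use the exported flag, here is a minimal sketch. It is not liburing's actual code: every `sketch_`/`SKETCH_` name is hypothetical, and the flag values are assumed to match the upstream `<linux/io_uring.h>` UAPI header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror IORING_SQ_NEED_WAKEUP / IORING_SQ_CQ_OVERFLOW upstream. */
#define SKETCH_IORING_SQ_NEED_WAKEUP (1U << 0)
#define SKETCH_IORING_SQ_CQ_OVERFLOW (1U << 1)

/* Hypothetical result of one peek attempt. */
enum sketch_peek_action {
        SKETCH_RETURN_CQE,    /* a cqe is already visible in the cq ring */
        SKETCH_ENTER_KERNEL,  /* ring looks empty, but the kernel holds overflowed cqes */
        SKETCH_RETURN_EAGAIN  /* nothing available and nothing to flush */
};

/* True when the kernel has flagged overflowed cqes in the shared sq flags word. */
static bool sketch_cq_needs_flush(const uint32_t *sq_flags)
{
        /* Volatile read as a stand-in for the acquire load real code would use. */
        uint32_t flags = *(const volatile uint32_t *)sq_flags;

        return (flags & SKETCH_IORING_SQ_CQ_OVERFLOW) != 0;
}

/* Decide what a peek helper should do given the ring indices and the flags. */
static enum sketch_peek_action sketch_peek_decide(uint32_t cq_head,
                                                  uint32_t cq_tail,
                                                  const uint32_t *sq_flags)
{
        if (cq_head != cq_tail)
                return SKETCH_RETURN_CQE;
        if (sketch_cq_needs_flush(sq_flags))
                return SKETCH_ENTER_KERNEL;
        return SKETCH_RETURN_EAGAIN;
}
```

Without the flag check, the second branch cannot exist, so an empty-looking ring always yields -EAGAIN and the test program above spins forever.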

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

c7f865a4

EDAC/amd64: Set grain per DIMM · 9d090926

Committed by Yazen Ghannam on Oct 22, 2019

fix #29035167

commit 466503d6b1b33be46ab87c6090f0ade6c6011cbc upstream

The following commit introduced a warning on error reports without a
non-zero grain value.

  3724ace582d9 ("EDAC/mc: Fix grain_bits calculation")

The amd64_edac_mod module does not provide a value, so the warning will
be given on the first reported memory error.

Set the grain per DIMM to cacheline size (64 bytes). This is the current
recommendation.
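
To make the 64-byte figure concrete, the sketch below assumes that the "grain bits" are the smallest bit count whose power of two covers the grain, so a zero grain has no meaningful value. The helper name is hypothetical and this is not the kernel's implementation.

```c
#include <limits.h>

/*
 * Hypothetical helper: smallest b such that (1UL << b) >= grain.
 * A grain of 0 is treated as 1 byte, the case the warning is about.
 */
static unsigned int sketch_grain_bits(unsigned long grain)
{
        const unsigned int max_bits = sizeof(unsigned long) * CHAR_BIT - 1;
        unsigned int bits = 0;

        if (grain == 0)
                grain = 1;
        while (bits < max_bits && (1UL << bits) < grain)
                bits++;
        return bits;
}
```

Under that assumption, a cacheline-sized grain of 64 bytes corresponds to 6 bits.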

Fixes: 3724ace582d9 ("EDAC/mc: Fix grain_bits calculation")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rrichter@marvell.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191022203448.13962-7-Yazen.Ghannam@amd.com
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

9d090926

EDAC/amd64: Find Chip Select memory size using Address Mask · ac316351

Committed by Yazen Ghannam on Aug 21, 2019

fix #29035167

commit e53a3b267fb0a79db9ca1f1e08b97889b22013e6 upstream

Chip Select memory size reporting on AMD Family 17h was recently fixed
in order to account for interleaving. However, the current method is not
robust.

The Chip Select Address Mask can be used to find the memory size. There
are a couple of cases.

1) For single-rank and dual-rank non-interleaved, use the address mask
plus 1 as the size.

2) For dual-rank interleaved, do #1 but "de-interleave" the address mask
first.

Always "de-interleave" the address mask in order to simplify the code
flow. Bit mask manipulation is necessary to check for interleaving, so
just go ahead and do the de-interleaving. In the non-interleaved case,
the original and de-interleaved address masks will be the same.

To de-interleave the mask, count the number of zero bits in the middle
of the mask and swap them with the most significant bits.

For example,
Original=0xFFFF9FE, De-interleaved=0x3FFFFFE
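
The de-interleaving step described above can be sketched in standalone C as follows. This is an illustration under stated assumptions, not the driver's code: the function names are hypothetical, the mask is taken to be 32 bits wide, and bit 0 is assumed to be always clear, as in the example values.

```c
#include <stdint.h>

/* Index of the most significant set bit; v must be non-zero. */
static unsigned int sketch_msb(uint32_t v)
{
        unsigned int i = 0;

        while (v >>= 1)
                i++;
        return i;
}

/* Number of set bits in v. */
static unsigned int sketch_weight(uint32_t v)
{
        unsigned int count = 0;

        for (; v; v &= v - 1)
                count++;
        return count;
}

/*
 * Hypothetical helper: move the zero bits found in the middle of the mask
 * to the top, leaving a contiguous run of ones that starts at bit 1.
 */
static uint32_t sketch_deinterleave_mask(uint32_t mask)
{
        unsigned int msb, weight, num_zero_bits, top;

        if (mask == 0)
                return 0;

        msb = sketch_msb(mask);
        weight = sketch_weight(mask);
        /* Bit 0 is assumed clear, so a full mask has exactly msb set bits. */
        num_zero_bits = (weight < msb) ? msb - weight : 0;
        top = msb - num_zero_bits;

        /* Set bits [top:1]; 64-bit arithmetic avoids overflow when top is 31. */
        return (uint32_t)(((1ULL << (top + 1)) - 1) & ~1ULL);
}
```

For the example from the commit message, `sketch_deinterleave_mask(0xFFFF9FE)` yields `0x3FFFFFE`, and a mask with no interior zero bits is returned unchanged.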

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190821235938.118710-5-Yazen.Ghannam@amd.com
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

ac316351

EDAC/amd64: Initialize DIMM info for systems with more than two channels · b635b9c9

Committed by Yazen Ghannam on Aug 21, 2019

fix #29035167

commit 353a1fcb8f9e5857c0fb720b9e57a86c1fb7c17e upstream

Currently, the DIMM info for AMD Family 17h systems is initialized in
init_csrows(). This function is shared with legacy systems, and it
supports at most two channels.

This prevents initialization of the DIMM info for a number of ranks, so
there will be missing ranks in the EDAC sysfs.

Create a new init_csrows_df() for Family17h+ and revert init_csrows()
back to pre-Family17h support.

Loop over all channels in the new function in order to support systems
with more than two channels.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190821235938.118710-4-Yazen.Ghannam@amd.com
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

b635b9c9

EDAC/amd64: Support more than two controllers for chip selects handling · 47af478f

Committed by Yazen Ghannam on Aug 21, 2019

fix #29035167

commit 8de9930a4618811edfaebc4981a9fafff2af9170 upstream

The struct chip_select array that's used for saving chip select bases
and masks is fixed at length of two. There should be one struct
chip_select for each controller, so this array should be increased to
support systems that may have more than two controllers.

Increase the size of the struct chip_select array to eight, which is the
largest number of controllers per die currently supported on AMD
systems.

Fix number of DIMMs and Chip Select bases/masks on Family17h, because
AMD Family 17h systems support 2 DIMMs, 4 CS bases, and 2 CS masks per
channel.

Also, carve out the Family 17h+ reading of the bases/masks into a
separate function. This effectively reverts the original bases/masks
reading code to before Family 17h support was added.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190821235938.118710-2-Yazen.Ghannam@amd.com
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

47af478f

Revert "EDAC/amd64: Support more than two controllers for chip select handling" · b7e38109

Committed by Borislav Petkov on Apr 25, 2019

fix #29035167

commit 8de9930a4618811edfaebc4981a9fafff2af9170 upstream

This reverts commit 0a227af521d6df5286550b62f4b591417170b4ea.

Unfortunately, this commit caused wrong detection of chip select sizes
on some F17h client machines:

  --- 00-rc6+     2019-02-14 14:28:03.126622904 +0100
  +++ 01-rc4+     2019-04-14 21:06:16.060614790 +0200
   EDAC amd64: MC: 0:     0MB 1:     0MB
  -EDAC amd64: MC: 2: 16383MB 3: 16383MB
  +EDAC amd64: MC: 2:     0MB 3: 2097151MB
   EDAC amd64: MC: 4:     0MB 5:     0MB
   EDAC amd64: MC: 6:     0MB 7:     0MB
   EDAC MC: UMC1 chip selects:
   EDAC amd64: MC: 0:     0MB 1:     0MB
  -EDAC amd64: MC: 2: 16383MB 3: 16383MB
  +EDAC amd64: MC: 2:     0MB 3: 2097151MB
   EDAC amd64: MC: 4:     0MB 5:     0MB
   EDAC amd64: MC: 6:     0MB 7:     0MB

Revert it for now until it has been solved properly.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

b7e38109

perf/amd/uncore: Add support for Family 19h L3 PMU · efaf304e

Committed by Kim Phillips on Mar 13, 2020

fix #29035100

commit e48667b865480d8bf0f1171a8b474ffc785b9ace upstream

Family 19h introduces a change in the slice, core and thread specification
in its L3 Performance Event Select (ChL3PmcCfg) h/w register. The change is
incompatible with Family 17h's version of the register.

Introduce a new path in l3_thread_slice_mask() to do things differently
for Family 19h vs. Family 17h, otherwise the new hardware doesn't get
programmed correctly.

Instead of a linear core--thread bitmask, Family 19h takes an encoded
core number, and a separate thread mask. There are new bits that are set
for all cores and all slices, of which only the latter is used, since
the driver counts events for all slices on behalf of the specified CPU.

Also update amd_uncore_init() to make its L2/NB vs. L3/Data Fabric mode
decision for Family 17h or above, not just 17h and 18h: the Family
19h Data Fabric PMC is compatible with the Family 17h DF PMC.

 [ bp: Touchups. ]

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200313231024.17601-3-kim.phillips@amd.com
Signed-off-by: Peng Wang <rocking@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

efaf304e

perf/amd/uncore: Make L3 thread mask code more readable · e4e69222

Committed by Kim Phillips on Mar 13, 2020

fix #29035100

commit 9689dbbeaea884d19e3085439c6a247ef986b2af upstream

Convert the l3_thread_slice_mask() function to use the more readable
topology_* helper functions, more intuitive variable names like shift
and thread_mask, and BIT_ULL().

No functional changes.

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200313231024.17601-2-kim.phillips@amd.com
Signed-off-by: Peng Wang <rocking@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

e4e69222
