Commits · fe15220e558ba3581830dac7d3035987ef546453 · openanolis / cloud-kernel

02 Sep, 2020 40 commits

io_uring: fix recvmsg memory leak with buffer selection · fe15220e

Authored by Pavel Begunkov on Jul 15, 2020

to #29441901

commit 681fda8d27a66f7e65ff7f2d200d7635e64a8d05 upstream.

io_recvmsg() doesn't free memory allocated for struct io_buffer. This can
cause a leak when used with automatic buffer selection.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

fe15220e

iocost: protect iocg->abs_vdebt with iocg->waitq.lock · 1ec0deaf

Authored by Tejun Heo on May 04, 2020

to #29361128

commit 0b80f9866e6bbfb905140ed8787ff2af03652c0c upstream.

abs_vdebt is an atomic_64 which tracks how much over budget a given cgroup
is and controls the activation of use_delay mechanism. Once a cgroup goes
over budget from forced IOs, it has to pay it back with its future budget.
The progress guarantee on debt paying comes from the iocg being active -
active iocgs are processed by the periodic timer, which ensures that as time
passes the debts dissipate and the iocg returns to normal operation.

However, both iocg activation and vdebt handling are asynchronous and a
sequence like the following may happen.

1. The iocg is in the process of being deactivated by the periodic timer.

2. A bio enters ioc_rqos_throttle() and calls iocg_activate(), which returns
without doing anything because it still sees the iocg as active.

3. The iocg is deactivated.

4. The bio from #2 is over budget but needs to be forced. It increases
abs_vdebt and goes over the threshold and enables use_delay.

5. IO control is enabled for the iocg's subtree and now IOs are attributed
to the descendant cgroups and the iocg itself no longer issues IOs.

This leaves the iocg with a stuck abs_vdebt: it has debt but is inactive, and
no further IOs can activate it. This can end up unduly punishing all the
descendant cgroups.

The usual throttling path has the same issue - the iocg must be active while
throttled to ensure that a future event will wake it up - and solves the
problem by synchronizing the throttling path with a spinlock. abs_vdebt
handling is another form of overage handling and shares a lot of
characteristics including the fact that it isn't in the hottest path.

This patch fixes the above and other possible races by strictly
synchronizing abs_vdebt and use_delay handling with iocg->waitq.lock.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Vlad Dmitriev <vvd@fb.com>
Cc: stable@vger.kernel.org # v5.4+
Fixes: e1518f63f246 ("blk-iocost: Don't let merges push vtime into the future")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

1ec0deaf

iocost_monitor: drop string wrap around numbers when outputting json · 751b3a44

Authored by Tejun Heo on Apr 13, 2020

to #29361128

commit 21f3cfeab304fc07b90d93d98d4d2f62110fe6b2 upstream.

Wrapping numbers in strings is used by some to work around bit-width issues in
some environments. The problem isn't innate to json and the workaround seems to
cause more integration problems than help. Let's drop the string wrapping.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

751b3a44

iocost_monitor: exit successfully if interval is zero · c5eb33a2

Authored by Tejun Heo on Apr 13, 2020

to #29361128

commit f4fe3ea636385a51f1dfbb27c387a04b12b919e9 upstream.

This is to help external tools decide whether iocost_monitor has all its
requirements met or not based on the exit status of an -i0 run.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

c5eb33a2

blk-iocost: account for IO size when testing latencies · 8c8fb141

Authored by Tejun Heo on Apr 13, 2020

to #29361128

commit cd006509b0a93cb7ee9d9fd50ae274098997a460 upstream.

On each IO completion, iocost decides whether the IO met or missed its latency
target. Currently, the targets are fixed numbers per IO type. While this can be
good enough for loose latency targets way higher than typical completion
latencies, the effect of IO size makes it difficult to tighten the latency
target - a target adequate for 4k IOs might be too tight for 512k IOs and
vice-versa.

iocost already has all the necessary information to account for different IO
sizes when testing whether the latency target is met as iocost can calculate the
size vtime cost of a given IO. This patch updates the completion path to
calculate the size vtime cost of the IO, deduct the nsec equivalent from the
observed latency and use the adjusted value to decide whether the target is met.

This makes latency targets independent from IO size and enables determining
adequate latency targets with fixed size fio runs.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andy Newell <newella@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

8c8fb141
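
The size adjustment described in the "account for IO size when testing latencies" commit above can be sketched in plain C. This is only an illustration, not the code in block/blk-iocost.c: the helper name latency_target_met() and its parameters are hypothetical, and the size cost is assumed to have been converted to nanoseconds already.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from blk-iocost.c): decide whether an IO
 * met its latency target after deducting the time attributed to its size.
 *
 * observed_ns  - measured completion latency
 * size_cost_ns - nsec equivalent of the IO's size vtime cost (assumed given)
 * target_ns    - latency target for this IO type
 */
static bool latency_target_met(uint64_t observed_ns, uint64_t size_cost_ns,
                               uint64_t target_ns)
{
    /* Clamp at zero so a large size cost cannot underflow the latency. */
    uint64_t adjusted_ns = observed_ns > size_cost_ns ?
                           observed_ns - size_cost_ns : 0;

    return adjusted_ns <= target_ns;
}
```

With a 250 us target, an IO observed at 900 us whose size accounts for 700 us is adjusted to 200 us and counts as met, while the same observed latency with no size cost counts as a miss.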

block: make rq sector size accessible for block stats · 9bdcaff2

Authored by Hou Tao on May 21, 2019

to #29361128

commit 3d24430694077313c75c6b89f618db09943621e4 upstream.

Currently rq->data_len will be decreased by partial completion or
zeroed by completion, so when blk_stat_add() is invoked, data_len
will be zero and there will never be samples in poll_cb because
blk_mq_poll_stats_bkt() will return -1 if data_len is zero.

We could move blk_stat_add() back to __blk_mq_complete_request(),
but that would make the effort of trying to call ktime_get_ns()
once in vain. Instead we can reuse the throtl_size field, and use
it for both block stats and block throttle, and adjust the
logic in blk_mq_poll_stats_bkt() accordingly.

Fixes: 4bc6339a ("block: move blk_stat_add() to __blk_mq_end_request()")
Tested-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

9bdcaff2

blk-iocost: switch to fixed non-auto-decaying use_delay · fc94dc72

Authored by Tejun Heo on Apr 13, 2020

to #29361128

commit 54c52e10dc9b939084a7e6e3d32ce8fd8dee7898 upstream.

The use_delay mechanism was introduced by blk-iolatency to hold memory
allocators accountable for the reclaim and other shared IOs they cause. The
duration of the delay is dynamically balanced between iolatency increasing the
value on each target miss and it auto-decaying as time passes and threads get
delayed on it.

While this works well for iolatency, iocost's control model isn't compatible
with it. There are no repeated "violation" events which can be balanced against
auto-decaying. iocost instead knows how much a given cgroup is over budget and
wants to prevent that cgroup from issuing IOs while over budget. Until now,
iocost has been adding the cost of force-issued IOs. However, this doesn't
reflect the amount which is already over budget and is simply not enough to
counter the auto-decaying, allowing an anon-memory leaking low priority cgroup
to go over its allotted share of IOs.

As auto-decaying doesn't make much sense for iocost, this patch introduces a
different mode of operation for use_delay - when blkcg_set_delay() is used
instead of blkcg_add/use_delay(), the delay duration is not auto-decayed until it
is explicitly cleared with blkcg_clear_delay(). iocost is updated to keep the
delay duration synchronized to the budget overage amount.

With this change, iocost can effectively police cgroups which generate
a significant amount of force-issued IOs.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

fc94dc72

blk-iocost: Fix error on iocost_ioc_vrate_adj · d14b5329

Authored by Waiman Long on Apr 21, 2020

to #29361128

commit d6c8e949a35d6906d6c03a50e9a9cdf4e494528a upstream.

Systemtap 4.2 is unable to correctly interpret the "u32 (*missed_ppm)[2]"
argument of the iocost_ioc_vrate_adj trace entry defined in
include/trace/events/iocost.h leading to the following error:

  /tmp/stapAcz0G0/stap_c89c58b83cea1724e26395efa9ed4939_6321_aux_6.c:78:8:
  error: expected ‘;’, ‘,’ or ‘)’ before ‘*’ token
   , u32[]* __tracepoint_arg_missed_ppm

That argument type is indeed rather complex and hard to read. Looking
at block/blk-iocost.c, it is just a 2-entry u32 array. By simplifying
the argument to a simple "u32 *missed_ppm" and adjusting the trace
entry accordingly, the compilation error was gone.

Fixes: 7caa47151ab2 ("blkcg: implement blk-iocost")
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

d14b5329
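
The two argument styles discussed in the iocost_ioc_vrate_adj commit above can be compared side by side. The sketch below is an illustration in userspace C; the function names are made up for the example and uint32_t stands in for the kernel's u32.

```c
#include <stdint.h>

/* Original style: pointer to a 2-entry array, "u32 (*missed_ppm)[2]". */
static uint32_t sum_array_ptr(uint32_t (*missed_ppm)[2])
{
    return (*missed_ppm)[0] + (*missed_ppm)[1];
}

/* Simplified style: plain pointer to the first element, "u32 *missed_ppm". */
static uint32_t sum_plain_ptr(const uint32_t *missed_ppm)
{
    return missed_ppm[0] + missed_ppm[1];
}
```

A caller holding `uint32_t missed_ppm[2]` passes `&missed_ppm` to the first form and `missed_ppm` to the second. Both read the same two entries, but the second declaration is easier for tools to parse.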

blk-iocost: remove duplicated lines in comments · 4b3109d5

Authored by Weiping Zhang on Feb 27, 2020

to #29361128

commit fa800d73c8d0d36b1f5929198371f421b69e610e upstream.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

4b3109d5

blk-iocost: fix incorrect vtime comparison in iocg_is_idle() · e7c5f028

Authored by Tejun Heo on Mar 10, 2020

to #29361128

commit dcd6589b11d3b1e71f516a87a7b9646ed356b4c0 upstream.

vtimes may wrap and time_before/after64() should be used to determine
whether a given vtime is before or after another. iocg_is_idle() was
incorrectly using plain "<" comparison to determine whether done_vtime
is before vtime. Here, the only thing we're interested in is whether
done_vtime matches vtime which indicates that there's nothing in
flight. Let's test for inequality instead.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 7caa47151ab2 ("blkcg: implement blk-iocost")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

e7c5f028
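
The wrap problem behind the iocg_is_idle() fix above can be shown with a small userspace sketch. The helper names are hypothetical; vtime_before() follows the same signed-difference idea as the kernel's time_before64(), and the idle test mirrors the commit's description of comparing done_vtime and vtime for a match.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Wrap-safe "a is before b": the unsigned difference is reinterpreted as
 * signed, so ordering stays correct across a 64-bit wrap. This relies on
 * the usual two's complement conversion.
 */
static bool vtime_before(uint64_t a, uint64_t b)
{
    return (int64_t)(a - b) < 0;
}

/* Nothing is in flight when the completed vtime has caught up with vtime. */
static bool nothing_in_flight(uint64_t done_vtime, uint64_t vtime)
{
    return done_vtime == vtime;
}
```

For a value just below the wrap point and a small value just after it, plain "<" reports the wrong order while vtime_before() reports the right one. The equality test sidesteps ordering entirely.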

iocost: Fix iocost_monitor.py due to helper type mismatch · 80cefe0c

Authored by Tejun Heo on Jan 17, 2020

to #29361128

commit 9ea37e24d4a95dd934a0600d65caa25e409705bb upstream.

iocost_monitor.py broke with recent versions of drgn due to helper
being stricter about types.  Fix it so that it uses the correct type.
Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

80cefe0c

iocost: over-budget forced IOs should schedule async delay · 9ab225fe

Authored by Tejun Heo on Dec 16, 2019

to #29361128

commit d7bd15a138aef3be227818aad9c501e43c89c8c5 upstream.

When over-budget IOs are force-issued through root cgroup,
iocg_kick_delay() adjusts the async delay accordingly but doesn't
actually schedule async throttle for the issuing task.  This bug is
pretty well masked because sooner or later the offending threads are
gonna get directly throttled on regular IOs or have async delay
scheduled by mem_cgroup_throttle_swaprate().

However, it can affect control quality on filesystem metadata heavy
operations.  Let's fix it by invoking blkcg_schedule_throttle() when
iocg_kick_delay() says async delay is needed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 7caa47151ab2 ("blkcg: implement blk-iocost")
Cc: stable@vger.kernel.org
Reported-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

9ab225fe

alinux: sched: Add cpu_stress to show system-wide task waiting · ab81d2d9

Authored by Yihao Wu on Jun 01, 2020

to #28739709

/proc/loadavg can reflect the waiting tasks over a period of time
to some extent. But serving as an SLI requires better precision and
quicker response. Furthermore, I/O blocking is not considered here,
and bandwidth control is excluded from cpu_stress.

This patch adds a new interface /proc/cpu_stress. It's based on
task runtime tracking so we don't need to deal with complex state
transitions. And because task runtime tracking is done in most
scheduler events, the precision is sufficient.

Like loadavg, cpu_stress has 3 average windows too (1, 5, 15 min).
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>

ab81d2d9
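
The cpu_stress commit above does not show how its three windows are computed. As a rough guide, the sketch below shows the fixed-point exponential average that loadavg uses for its 1, 5 and 15 minute windows. The constants are the loadavg ones (5 second sampling, 11 fractional bits). Whether cpu_stress uses the same formula is an assumption, and calc_avg() is a made-up name.

```c
#define FSHIFT   11                 /* bits of fractional precision */
#define FIXED_1  (1UL << FSHIFT)    /* 1.0 in fixed point */
#define EXP_1    1884UL             /* 1/exp(5sec/1min) in fixed point */
#define EXP_5    2014UL             /* 1/exp(5sec/5min) */
#define EXP_15   2037UL             /* 1/exp(5sec/15min) */

/*
 * One averaging step: blend the previous average with a new sample.
 * avg and sample are fixed-point values scaled by FIXED_1.
 */
static unsigned long calc_avg(unsigned long avg, unsigned long exp,
                              unsigned long sample)
{
    return (avg * exp + sample * (FIXED_1 - exp)) >> FSHIFT;
}
```

Starting from zero, a single sample of 1.0 moves the 1 minute average much further than the 15 minute one, which is why the shorter window responds faster.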

perf vendor events amd: Add L3 cache events for Family 17h · 4d76552e

Authored by Kim Phillips on Sep 19, 2019

to #29276623

commit faef87494139cf2cc4d188d5730251ade9b2022d upstream

Allow users to symbolically specify L3 events for Family 17h processors
using the existing AMD Uncore driver.

The event descriptions are taken from section 2.1.15.4.1 "L3 Cache PMC
Events" of the latest Family 17h PPR, available here:

  https://www.amd.com/system/files/TechDocs/55570-B1_PUB.zip

Only BriefDescriptions are added, since they show with and without
the -v and --details flags.

Tested with:

 # perf stat -e l3_request_g1.caching_l3_cache_accesses,amd_l3/event=0x01,umask=0x80/,l3_comb_clstr_state.request_miss,amd_l3/event=0x06,umask=0x01/ perf bench mem memcpy -s 4mb -l 100 -f default
...
         7,006,831      l3_request_g1.caching_l3_cache_accesses
         7,006,830      amd_l3/event=0x01,umask=0x80/
           366,530      l3_comb_clstr_state.request_miss
           366,568      amd_l3/event=0x06,umask=0x01/
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Janakarajan Natarajan <janakarajan.natarajan@amd.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Luke Mujica <lukemujica@google.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20190919204306.12598-1-kim.phillips@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Peng Wang <rocking@linux.alibaba.com>
Acked-by: Shanpei Chen <shanpeic@linux.alibaba.com>

4d76552e

alinux: block: initialize io hang counter · 5d951856

Authored by Xiaoguang Wang on Jul 21, 2020

fix #29420707

Otherwise we'll get a stale io hang counter.
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

5d951856

configs: Enable CONFIG_RESCTRL to enable Intel RDT and AMD QoS · 71228589

Authored by Zelin Deng on Jul 20, 2020

fix #29334855

Intel RDT and AMD QoS are used for cache monitoring/allocation and memory
bandwidth monitoring and allocation. In order to enable Intel RDT and
AMD QoS, CONFIG_RESCTRL has to be set to Y.
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>

71228589

hookers: fix Kconfig dependency on INET · e26a55e2

Authored by Dust Li on Jul 17, 2020

fix #29372337

hookers needs CONFIG_INET for its basic functionality, so
add the dependency in Kconfig.
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>

e26a55e2

x86/resctrl: Fix memory bandwidth counter width for AMD · 9df89203

Authored by Babu Moger on Jun 04, 2020

fix #29035143

commit 2c18bd525c47f882f033b0a813ecd09c93e1ecdf upstream

Memory bandwidth is calculated by reading the monitoring counter
at two intervals and calculating the delta. It is the software’s
responsibility to read the count often enough to avoid having
the count roll over _twice_ between reads.

The current code hardcodes the bandwidth monitoring counter's width
to 24 bits for AMD. This is due to the default base counter width, which
is 24. Currently, AMD does not implement the CPUID 0xF.[ECX=1]:EAX
to adjust the counter width. But the AMD hardware supports a much
wider bandwidth counter with a default width of 44 bits.

The kernel reads these monitoring counters every 1 second and adjusts the
counter value for overflow. With 24 bits and a scale value of 64 for AMD,
it can only measure up to 1GB/s without overflowing. For rates
above 1GB/s this will fail to measure the bandwidth.

Fix the issue by setting the default width to 44 bits by adjusting the
offset.

Future AMD products will implement CPUID 0xF.[ECX=1]:EAX.

 [ bp: Let the line stick out and drop {}-brackets around a single
   statement. ]

Fixes: 4d05bf71f157 ("x86/resctrl: Introduce AMD QOS feature")
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/159129975546.62538.5656031125604254041.stgit@naples-babu.amd.com
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

9df89203
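
The arithmetic in the AMD counter width commit above can be checked with a short userspace sketch. The function names are illustrative and this is not the resctrl code itself: counter_delta() shows one way to take the difference of two reads of a counter that is only `width` bits wide, and max_bytes_before_rollover() shows why 24 bits with a scale of 64 tops out at 1GB.

```c
#include <stdint.h>

/*
 * Delta between two reads of a counter that is `width` bits wide.
 * Assumes 0 < width < 64. Shifting both values up discards the bits above
 * `width`, so a single rollover between the reads cancels out.
 */
static uint64_t counter_delta(uint64_t prev, uint64_t cur, unsigned int width)
{
    unsigned int shift = 64 - width;

    return ((cur << shift) - (prev << shift)) >> shift;
}

/* Bytes a counter can represent before rolling over (0 < width < 64). */
static uint64_t max_bytes_before_rollover(unsigned int width, uint64_t scale)
{
    return ((uint64_t)1 << width) * scale;
}
```

With width 24 and scale 64 the limit is 2^30 bytes, which matches the 1GB/s figure for a one second read interval. With width 44 it is 2^50 bytes.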

x86/resctrl: Support CPUID enumeration of MBM counter width · 31df1a2e

Authored by Reinette Chatre on May 05, 2020

fix #29035143

commit f3d44f18b0662327c42128b9d3604489bdb6e36f upstream

The original Memory Bandwidth Monitoring (MBM) architectural
definition defines counters of up to 62 bits in the
IA32_QM_CTR MSR while the first-generation MBM implementation
uses statically defined 24 bit counters.

Expand the MBM CPUID enumeration properties to include the MBM
counter width. The previously undefined EAX output register contains,
in bits [7:0], the MBM counter width encoded as an offset from
24 bits. Enumerating this property is only specified for Intel
CPUs.
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/afa3af2f753f6bc301fb743bc8944e749cb24afa.1588715690.git.reinette.chatre@intel.com
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

31df1a2e
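
The encoding described in the MBM counter width commit above (bits [7:0] of EAX hold the width as an offset from 24 bits) can be decoded as in the sketch below. The function name is hypothetical, and the raw EAX value is assumed to come from CPUID leaf 0xF, sub-leaf 1. The sketch does not execute CPUID itself.

```c
#include <stdint.h>

#define MBM_WIDTH_BASE 24u   /* first-generation counter width in bits */

/* Decode the MBM counter width from a raw CPUID.0xF[ECX=1]:EAX value. */
static unsigned int mbm_width_from_eax(uint32_t eax)
{
    return MBM_WIDTH_BASE + (eax & 0xffu);
}
```

An offset of 0 gives the original 24 bits, and an offset of 20 gives the 44 bits mentioned in the AMD fix listed earlier on this page.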

x86/cpu: Move resctrl CPUID code to resctrl/ · 9cf1c9c1

Authored by Reinette Chatre on May 05, 2020

fix #29035143

commit 0118ad82c2a64ebcf15d7565ed35361407efadfa upstream

The function determining a platform's support and properties of cache
occupancy and memory bandwidth monitoring (properties of
X86_FEATURE_CQM_LLC) can be found among the common CPU code. After
the feature's properties are populated in the per-CPU data, the resctrl
subsystem is the only consumer (via boot_cpu_data).

Move the function that obtains the CPU information used by resctrl to
the resctrl subsystem and rename it from init_cqm() to
resctrl_cpu_detect(). The function continues to be called from the
common CPU code. This move is done in preparation of the addition of some
vendor specific code.

No functional change.
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/38433b99f9d16c8f4ee796f8cc42b871531fa203.1588715690.git.reinette.chatre@intel.com
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

9cf1c9c1

x86/resctrl: Rename asm/resctrl_sched.h to asm/resctrl.h · 16cc4455

Authored by Reinette Chatre on May 05, 2020

fix #29035143

commit 8dd97c65185c5a63c668e5bd8a861c04f47a35ed upstream

asm/resctrl_sched.h is dedicated to the code used for configuration
of the CPU resource control state when a task is scheduled.

Rename resctrl_sched.h to resctrl.h in preparation of additions that
will no longer make this file dedicated to work done during scheduling.

No functional change.
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/6914e0ef880b539a82a6d889f9423496d471ad1d.1588715690.git.reinette.chatre@intel.com
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

16cc4455

configs: enable config for TCP_RT module · 0996ce64

Authored by Xuan Zhuo on May 26, 2020

to #27804112
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>

0996ce64

alinux: tcp_rt: add Documentation for tcp-rt · 3719abbc

Authored by Xuan Zhuo on Jul 09, 2020

to #27804112
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>

3719abbc

alinux: tcp_rt module: peer ports add more statistics · 5efc57b3

Authored by Xuan Zhuo on Jun 05, 2020

to #27804112

Add server_time, recv_time and recv_data statistics for peer ports,
and rename the upload keyword to recv, which applies more naturally
to both local and peer.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>

5efc57b3

alinux: tcp_rt module: support pports_range · ded7aa68

Authored by Xuan Zhuo on Jun 04, 2020

to #27804112

Remove con_num from struct tcp_rt_stats, which is not used.

Add a new type TCPRT_TYPE_PEER_PORT_RANG, so that stats_peer also
uses the two-dimensional array.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>

ded7aa68

alinux: tcp_rt module: change the _tcp_rt_stats item type to u64 · 36e4a622

Authored by Xuan Zhuo on Jun 04, 2020

to #27804112

_tcp_rt_stats is used to save the values returned from tcp_rt_stats.
These values are originally 64-bit but were stored in u32, so some larger
values would overflow. Change them to 64-bit.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>

36e4a622

alinux: tcp_rt module: use atomic64_xchg replace atomic64_read and atomic64_set · 81be6f9d

Authored by Xuan Zhuo on Jun 04, 2020

to #27804112

1. Calling atomic_read first and then atomic_set can cause some
   statistics to be lost.
2. Atomic instructions are relatively slow, and calling them twice
   in succession for each variable hurts performance.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>

81be6f9d
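
The difference between the two approaches in the atomic64_xchg commit above can be sketched in userspace with C11 atomics. This is an analogy rather than the module's code: the function names are made up, and atomic_exchange() plays the role of the kernel's atomic64_xchg().

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Read-then-set: two separate atomic operations. An increment that lands
 * between the load and the store is overwritten by the store and lost.
 */
static uint64_t drain_read_then_set(_Atomic uint64_t *counter)
{
    uint64_t val = atomic_load(counter);

    atomic_store(counter, 0);
    return val;
}

/* Exchange: one atomic operation returns the old value and resets it. */
static uint64_t drain_xchg(_Atomic uint64_t *counter)
{
    return atomic_exchange(counter, 0);
}
```

Both functions return the same result when nothing else touches the counter. Only the exchange version keeps every concurrent update, and it issues one atomic instruction instead of two.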

alinux: tcp_rt module: save tcp rtt when R record, change the unit to us · 404bc1e2

Authored by Xuan Zhuo on Jun 04, 2020

to #27804112
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>

404bc1e2

alinux: tcp_rt module: P record add rt and tcp reorder info · 0c635e00

Authored by Xuan Zhuo on Jul 09, 2020

to #27804112
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>

0c635e00

alinux: tcp_rt module: simplify the parameter name · 13867aae

Authored by Xuan Zhuo on May 28, 2020

to #27804112

Use prefixes to distinguish between local and peer ports.
Shorten the parameter names, which also leaves room to
add filter conditions for the peer port range or peer
address later.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>

13867aae

alinux: tcp_rt module: fix repeat stats for mrtt · 8675ebe3

Authored by Xuan Zhuo on May 21, 2020

to #27804112

Only count mrtt at the end of the connection. In the peer ports case,
if mrtt is counted on each output, it is counted repeatedly.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>

8675ebe3

alinux: tcp_rt module: change relay work mode · 2bac13c5

Authored by Xuan Zhuo on May 21, 2020

to #27804112

1. Use relay's overwrite mode: when the buffer is full, the old
   data is discarded and the new data is written to the buffer.
2. Since stats is triggered by a timer, there is no concurrency,
   so outputting one relay file per CPU is unnecessary. Change it
   to a single "rt-network-stats" file.
3. Pass the parent directory tcp-rt through relay open.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>

2bac13c5

alinux: tcp_rt module: change real to stats · 757ae940

Authored by Xuan Zhuo on May 21, 2020

to #27804112

Uniformly rename "check" and "statis" to "stats".

Since some users do not use the stats function,
and the stats function takes up some resources,
stats is changed to not be enabled by default.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>

757ae940

alinux: tcp_rt module: support ports_range · f486d9b9

Authored by Xuan Zhuo on May 21, 2020

to #27804112

When ports_range is configured as '1000,2000,4000,6000', ports in
the ranges 1000-2000 and 4000-6000 are monitored.

With ports_range, the real function may have a large number of ports
to count. Allocating space for all ports at once when loading the
module would waste memory, so a two-dimensional array is used
instead: contiguous chunks are allocated on demand to hold the
statistics.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>

f486d9b9
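
The ports_range format described in the commit above (a flat list of start,end pairs) lends itself to a simple membership check. The sketch below is an illustration only: port_in_ranges() is a hypothetical helper, the pairs are assumed to be already parsed into an array, and both bounds are assumed to be inclusive, which the commit message does not state.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * bounds holds n values laid out as start0, end0, start1, end1, ...
 * Returns true if port falls inside any [start, end] pair.
 * A trailing unpaired value is ignored.
 */
static bool port_in_ranges(uint16_t port, const uint16_t *bounds, size_t n)
{
    for (size_t i = 0; i + 1 < n; i += 2) {
        if (port >= bounds[i] && port <= bounds[i + 1])
            return true;
    }
    return false;
}
```

With the bounds {1000, 2000, 4000, 6000} from the commit message, ports 1500 and 4000 match while 3000 and 6001 do not.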

alinux: tcp_rt module: add tcp_rt module · 03fcbae2

Authored by Xuan Zhuo on May 19, 2020

to #27804112

TCP-RT is a kernel module for monitoring services at the tcp level.

TCP-RT is essentially a trace method. By placing trace points in advance at
the relevant positions in the kernel tcp protocol stack, we can identify
requests and responses in the scenario where a single connection has only
one request in flight at a time, then collect information such as the time
when the request is received in the protocol stack and the time spent by
the service process handling it.

In addition, TCP-RT also supports some statistical analysis in the kernel
and periodically outputs statistical information about the specified connection.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>

03fcbae2

configs: Enabled CONFIG_PCIE_DPC · 4680a739

Authored by Zelin Deng on Jul 16, 2020

fix #29335392

DPC stands for Downstream Port Containment. It halts PCIe traffic
below a downstream port after an unmasked uncorrectable error is
detected at or below the port, to avoid the potential spread
of any data corruption.
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>

4680a739

alinux: blk-iocost: bypass IOs earlier if disabled · e166c200

Authored by Joseph Qi on Jul 16, 2020

to #29357063

The blkg lookup or create logic may add significant overhead even when
iocost is disabled, so bypass it earlier in that case.

Fixes: 9da41925 ("alinux: iocost: fix NULL pointer dereference in ioc_rqos_throttle")
Reported-by: Hongnan Li <hongnan.li@linux.alibaba.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>

e166c200

ovl: inode reference leak in ovl_is_inuse true case. · b247d8a6

Authored by youngjun on Jun 16, 2020

to #29273482

In the case where ovl_is_inuse() returns true, the trap inode reference is
not put. Also add a comment explaining the ordering of ovl_is_inuse after
ovl_setup_trap.

Fixes: 0be0bfd2de9d ("ovl: fix regression caused by overlapping layers detection")
Cc: <stable@vger.kernel.org> # v4.19+
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: youngjun <her0gyugyu@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git/commit/?h=overlayfs-next&id=24f14009b8f1754ec2ae4c168940c01259b0f88a
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

b247d8a6

Revert "samples/bpf: fix build by setting HAVE_ATTR_TEST to zero" · eaf8003a

Authored by Dust Li on Jul 10, 2020

fix #29262413

The original patch didn't fit 4.19 well since the upstream kernel uses
  #if HAVE_ATTR_TEST
  xxx
  #endif

but in 4.19, we use
  #ifdef HAVE_ATTR_TEST
  xxx
  #endif

As a result, the original patch enabled the code under #ifdef HAVE_ATTR_TEST
and finally caused the build to fail when running:

$make M=samples/bpf
make -C /mnt/nvme0/wuya/kernel/aliyunlinux2/ck/samples/bpf/../../tools/lib/bpf/ RM='rm -rf' LDFLAGS= srctree=/mnt/nvme0/wuya/kernel/aliyunlinux2/ck/samples/bpf/../../ O=
Warning: Kernel ABI header at 'tools/include/uapi/linux/bpf.h' differs from latest version at 'include/uapi/linux/bpf.h'
  CC      samples/bpf/syscall_nrs.s
  UPD     samples/bpf/syscall_nrs.h
  HOSTCC  samples/bpf/test_lru_dist
  HOSTCC  samples/bpf/sock_example
  HOSTCC  samples/bpf/bpf_load.o
In file included from ./tools/perf/perf-sys.h:9:0,
                 from samples/bpf/bpf_load.c:29:
./tools/perf/perf-sys.h: In function ‘sys_perf_event_open’:
./tools/perf/perf-sys.h:68:15: error: ‘test_attr__enabled’ undeclared (first use in this function)
  if (unlikely(test_attr__enabled))
               ^
./tools/include/linux/compiler.h:74:43: note: in definition of macro ‘unlikely’
 # define unlikely(x)  __builtin_expect(!!(x), 0)
                                           ^
./tools/perf/perf-sys.h:68:15: note: each undeclared identifier is reported only once for each function it appears in
  if (unlikely(test_attr__enabled))
               ^
./tools/include/linux/compiler.h:74:43: note: in definition of macro ‘unlikely’
 # define unlikely(x)  __builtin_expect(!!(x), 0)
                                           ^
In file included from samples/bpf/bpf_load.c:29:0:
./tools/perf/perf-sys.h:69:3: warning: implicit declaration of function ‘test_attr__open’ [-Wimplicit-function-declaration]
   test_attr__open(attr, pid, cpu, fd, group_fd, flags);
   ^~~~~~~~~~~~~~~
make[1]: *** [samples/bpf/bpf_load.o] Error 1
make: *** [_module_samples/bpf] Error 2

This reverts commit 4665759a.
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

eaf8003a
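
The mismatch that the revert above describes comes down to how the preprocessor treats a macro that is defined as zero. The sketch below is a minimal illustration with made-up function names. It assumes the reverted patch defined HAVE_ATTR_TEST as 0, as its title suggests.

```c
/* The macro is defined, but with the value zero. */
#define HAVE_ATTR_TEST 0

/* Upstream style: the guarded code is compiled only for a non-zero value. */
static int guarded_by_if(void)
{
#if HAVE_ATTR_TEST
    return 1;
#else
    return 0;
#endif
}

/* 4.19 style: the guarded code is compiled whenever the macro is defined. */
static int guarded_by_ifdef(void)
{
#ifdef HAVE_ATTR_TEST
    return 1;
#else
    return 0;
#endif
}
```

Here guarded_by_if() returns 0 and guarded_by_ifdef() returns 1. Under #ifdef, defining the macro as zero therefore enables the guarded code instead of disabling it.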

samples/bpf: Add a workaround for asm_inline · 867c8a45

Authored by KP Singh on Oct 02, 2019

to #29262413

commit 98beb3edeb974e906a81f305d88f7bc96b2ec83e upstream.

This was added in commit eb111869301e ("compiler-types.h: add asm_inline
definition") and breaks samples/bpf as clang does not support asm __inline.

Fixes: eb111869301e ("compiler-types.h: add asm_inline definition")
Co-developed-by: Florent Revest <revest@google.com>
Signed-off-by: Florent Revest <revest@google.com>
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20191002191652.11432-1-kpsingh@chromium.org
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

867c8a45

openanolis / cloud-kernel synced successfully more than 1 year ago