- 02 Sep, 2020 40 commits
-
Committed by Tejun Heo
to #29361128

commit cd006509b0a93cb7ee9d9fd50ae274098997a460 upstream.

On each IO completion, iocost decides whether the IO met or missed its latency target. Currently, the targets are fixed numbers per IO type. While this can be good enough for loose latency targets way higher than typical completion latencies, the effect of IO size makes it difficult to tighten the latency target - a target adequate for 4k IOs might be too tight for 512k IOs and vice-versa.

iocost already has all the necessary information to account for different IO sizes when testing whether the latency target is met, as iocost can calculate the size vtime cost of a given IO. This patch updates the completion path to calculate the size vtime cost of the IO, deduct the nsec equivalent from the observed latency and use the adjusted value to decide whether the target is met.

This makes latency targets independent from IO size and enables determining adequate latency targets with fixed size fio runs.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andy Newell <newella@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
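As a rough illustration of the adjustment this commit describes, the sketch below subtracts a size-derived cost from the observed latency before comparing the result against the target. The function name, its parameters, and the clamp at zero are assumptions made for this example; they are not taken from block/blk-iocost.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide whether an IO met its latency target after
 * discounting the time attributable to its size.
 *
 * observed_ns  - completion latency as measured
 * size_cost_ns - nsec equivalent of the IO's size cost (assumed given)
 * target_ns    - latency target for this IO type
 */
static bool io_met_target(uint64_t observed_ns, uint64_t size_cost_ns,
			  uint64_t target_ns)
{
	/* Clamp at zero so a large size cost cannot underflow the result. */
	uint64_t adjusted_ns = observed_ns > size_cost_ns ?
			       observed_ns - size_cost_ns : 0;

	return adjusted_ns <= target_ns;
}
```

With a 150 us target, an IO that took 500 us but carries 400 us of size cost counts as met, while the same 500 us latency with no size cost counts as missed.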
-
Committed by Hou Tao
to #29361128

commit 3d24430694077313c75c6b89f618db09943621e4 upstream.

Currently rq->data_len will be decreased by partial completion or zeroed by completion, so when blk_stat_add() is invoked, data_len will be zero and there will never be samples in poll_cb because blk_mq_poll_stats_bkt() will return -1 if data_len is zero.

We could move blk_stat_add() back to __blk_mq_complete_request(), but that would make the effort of trying to call ktime_get_ns() once in vain. Instead we can reuse the throtl_size field, use it for both block stats and block throttle, and adjust the logic in blk_mq_poll_stats_bkt() accordingly.

Fixes: 4bc6339a ("block: move blk_stat_add() to __blk_mq_end_request()")
Tested-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Tejun Heo
to #29361128

commit 54c52e10dc9b939084a7e6e3d32ce8fd8dee7898 upstream.

The use_delay mechanism was introduced by blk-iolatency to hold memory allocators accountable for the reclaim and other shared IOs they cause. The duration of the delay is dynamically balanced between iolatency increasing the value on each target miss and it auto-decaying as time passes and threads get delayed on it.

While this works well for iolatency, iocost's control model isn't compatible with it. There are no repeated "violation" events which can be balanced against auto-decaying. iocost instead knows how much a given cgroup is over budget and wants to prevent that cgroup from issuing IOs while over budget. Until now, iocost has been adding the cost of force-issued IOs. However, this doesn't reflect the amount which is already over budget and is simply not enough to counter the auto-decaying, allowing an anon-memory-leaking low priority cgroup to go over its allotted share of IOs.

As auto-decaying doesn't make much sense for iocost, this patch introduces a different mode of operation for use_delay - when blkcg_set_delay() is used instead of blkcg_add/use_delay(), the delay duration is not auto-decayed until it is explicitly cleared with blkcg_clear_delay(). iocost is updated to keep the delay duration synchronized to the budget overage amount.

With this change, iocost can effectively police cgroups which generate a significant amount of force-issued IOs.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Waiman Long
to #29361128

commit d6c8e949a35d6906d6c03a50e9a9cdf4e494528a upstream.

Systemtap 4.2 is unable to correctly interpret the "u32 (*missed_ppm)[2]" argument of the iocost_ioc_vrate_adj trace entry defined in include/trace/events/iocost.h, leading to the following error:

    /tmp/stapAcz0G0/stap_c89c58b83cea1724e26395efa9ed4939_6321_aux_6.c:78:8: error: expected ‘;’, ‘,’ or ‘)’ before ‘*’ token
     , u32[]* __tracepoint_arg_missed_ppm

That argument type is indeed rather complex and hard to read. Looking at block/blk-iocost.c, it is just a 2-entry u32 array. By simplifying the argument to a simple "u32 *missed_ppm" and adjusting the trace entry accordingly, the compilation error was gone.

Fixes: 7caa47151ab2 ("blkcg: implement blk-iocost")
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Weiping Zhang
to #29361128

commit fa800d73c8d0d36b1f5929198371f421b69e610e upstream.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Tejun Heo
to #29361128

commit dcd6589b11d3b1e71f516a87a7b9646ed356b4c0 upstream.

vtimes may wrap and time_before/after64() should be used to determine whether a given vtime is before or after another. iocg_is_idle() was incorrectly using plain "<" comparison to determine whether done_vtime is before vtime. Here, the only thing we're interested in is whether done_vtime matches vtime, which indicates that there's nothing in flight. Let's test for inequality instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 7caa47151ab2 ("blkcg: implement blk-iocost")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
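The wrap problem this commit mentions can be shown with a small userspace sketch. The helpers below are illustrative stand-ins written for this example, not the kernel's definitions: vtime_before() follows the same signed-difference idea as time_before64(), and nothing_in_flight() is a hypothetical name for the equality test the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "a is before b": compare the signed difference, not the values. */
static bool vtime_before(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

/* Nothing is in flight once done_vtime has caught up with vtime. */
static bool nothing_in_flight(uint64_t vtime, uint64_t done_vtime)
{
	return vtime == done_vtime;
}
```

If done_vtime sits just below UINT64_MAX and vtime has already wrapped to a small value, a plain `done_vtime < vtime` is false even though done_vtime is logically earlier, whereas vtime_before() still reports the right order and the equality test is unaffected by wrapping.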
-
Committed by Tejun Heo
to #29361128

commit 9ea37e24d4a95dd934a0600d65caa25e409705bb upstream.

iocost_monitor.py broke with recent versions of drgn due to a helper being stricter about types. Fix it so that it uses the correct type.

Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Tejun Heo
to #29361128

commit d7bd15a138aef3be227818aad9c501e43c89c8c5 upstream.

When over-budget IOs are force-issued through root cgroup, iocg_kick_delay() adjusts the async delay accordingly but doesn't actually schedule async throttle for the issuing task. This bug is pretty well masked because sooner or later the offending threads are gonna get directly throttled on regular IOs or have async delay scheduled by mem_cgroup_throttle_swaprate().

However, it can affect control quality on filesystem metadata heavy operations. Let's fix it by invoking blkcg_schedule_throttle() when iocg_kick_delay() says async delay is needed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 7caa47151ab2 ("blkcg: implement blk-iocost")
Cc: stable@vger.kernel.org
Reported-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Yihao Wu
to #28739709

/proc/loadavg can reflect the waiting tasks over a period of time to some extent. But serving as an SLI requires better precision and quicker response. Furthermore, I/O blocking is not considered here, and bandwidth control is excluded from cpu_stress.

This patch adds a new interface, /proc/cpu_stress. It's based on task runtime tracking, so we don't need to deal with complex state transitions. And because task runtime tracking is done in most scheduler events, the precision is sufficient.

Like loadavg, cpu_stress has 3 average windows too (1, 5, 15 min).

Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
Committed by Kim Phillips
to #29276623

commit faef87494139cf2cc4d188d5730251ade9b2022d upstream

Allow users to symbolically specify L3 events for Family 17h processors using the existing AMD Uncore driver.

Event descriptions are taken from section 2.1.15.4.1 "L3 Cache PMC Events" of the latest Family 17h PPR, available here:

    https://www.amd.com/system/files/TechDocs/55570-B1_PUB.zip

Only BriefDescriptions are added, since they show with and without the -v and --details flags.

Tested with:

    # perf stat -e l3_request_g1.caching_l3_cache_accesses,amd_l3/event=0x01,umask=0x80/,l3_comb_clstr_state.request_miss,amd_l3/event=0x06,umask=0x01/ perf bench mem memcpy -s 4mb -l 100 -f default
    ...
    7,006,831 l3_request_g1.caching_l3_cache_accesses
    7,006,830 amd_l3/event=0x01,umask=0x80/
    366,530 l3_comb_clstr_state.request_miss
    366,568 amd_l3/event=0x06,umask=0x01/

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Janakarajan Natarajan <janakarajan.natarajan@amd.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Luke Mujica <lukemujica@google.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20190919204306.12598-1-kim.phillips@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Peng Wang <rocking@linux.alibaba.com>
Acked-by: Shanpei Chen <shanpeic@linux.alibaba.com>
-
Committed by Xiaoguang Wang
fix #29420707

Otherwise we'll get a stale io hang counter.

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Zelin Deng
fix #29334855

Intel RDT and AMD QoS are used for cache monitoring/allocation and memory bandwidth monitoring and allocation. In order to enable Intel RDT and AMD QoS, CONFIG_RESCTRL has to be configured as Y.

Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
-
Committed by Dust Li
fix #29372337

hookers needs CONFIG_INET for its basic functionality; add the dependency in Kconfig.

Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
-
Committed by Babu Moger
fix #29035143

commit 2c18bd525c47f882f033b0a813ecd09c93e1ecdf upstream

Memory bandwidth is calculated by reading the monitoring counter at two intervals and calculating the delta. It is the software’s responsibility to read the count often enough to avoid having the count roll over _twice_ between reads.

The current code hardcodes the bandwidth monitoring counter's width to 24 bits for AMD. This is due to the default base counter width, which is 24. Currently, AMD does not implement the CPUID 0xF.[ECX=1]:EAX to adjust the counter width. But the AMD hardware supports a much wider bandwidth counter, with a default width of 44 bits.

The kernel reads these monitoring counters every 1 second and adjusts the counter value for overflow. With 24 bits and a scale value of 64 for AMD, it can only measure up to 1GB/s without overflowing. For rates above 1GB/s this will fail to measure the bandwidth.

Fix the issue by setting the default width to 44 bits by adjusting the offset.

AMD future products will implement CPUID 0xF.[ECX=1]:EAX.

[ bp: Let the line stick out and drop {}-brackets around a single statement. ]

Fixes: 4d05bf71f157 ("x86/resctrl: Introduce AMD QOS feature")
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/159129975546.62538.5656031125604254041.stgit@naples-babu.amd.com
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>
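The 1GB/s limit quoted in this commit follows from simple arithmetic: a 24-bit counter counting 64-byte units wraps after 2^24 * 64 = 2^30 bytes. The sketch below works this through and shows one way to compute a wrap-tolerant delta. The function names are hypothetical and the mask-based delta is an assumption chosen for clarity; it is not claimed to match the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes that can be counted before a width-bit counter wraps. */
static uint64_t mbm_wrap_bytes(unsigned int width, uint64_t scale)
{
	assert(width > 0 && width < 64);
	return (UINT64_C(1) << width) * scale;
}

/*
 * Bytes moved between two reads of a width-bit counter, assuming the
 * counter wrapped at most once in between.
 */
static uint64_t mbm_delta_bytes(uint64_t prev, uint64_t cur,
				unsigned int width, uint64_t scale)
{
	assert(width > 0 && width < 64);
	uint64_t mask = (UINT64_C(1) << width) - 1;

	/* Modular subtraction keeps the delta correct across one wrap. */
	return ((cur - prev) & mask) * scale;
}
```

With width 24 and scale 64, mbm_wrap_bytes() returns 2^30 bytes, so a once-per-second read cannot distinguish rates above roughly 1GB/s; with width 44 the same calculation gives 2^50 bytes.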
-
Committed by Reinette Chatre
fix #29035143

commit f3d44f18b0662327c42128b9d3604489bdb6e36f upstream

The original Memory Bandwidth Monitoring (MBM) architectural definition defines counters of up to 62 bits in the IA32_QM_CTR MSR, while the first-generation MBM implementation uses statically defined 24 bit counters.

Expand the MBM CPUID enumeration properties to include the MBM counter width. The previously undefined EAX output register contains, in bits [7:0], the MBM counter width encoded as an offset from 24 bits.

Enumerating this property is only specified for Intel CPUs.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/afa3af2f753f6bc301fb743bc8944e749cb24afa.1588715690.git.reinette.chatre@intel.com
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>
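The encoding described in this commit (width stored in EAX bits [7:0] as an offset from 24) can be decoded as in the minimal sketch below. The macro and function names are made up for this example, and the EAX value is assumed to have been read from CPUID 0xF.[ECX=1] elsewhere.

```c
#include <assert.h>
#include <stdint.h>

#define MBM_BASE_WIDTH 24	/* hypothetical name for the 24-bit base width */

/* Decode the counter width from the low byte of the CPUID EAX output. */
static unsigned int mbm_counter_width(uint32_t eax)
{
	return MBM_BASE_WIDTH + (eax & 0xff);
}
```

An offset of 0 keeps the first-generation width of 24 bits, and an offset of 20 yields 44 bits.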
-
Committed by Reinette Chatre
fix #29035143

commit 0118ad82c2a64ebcf15d7565ed35361407efadfa upstream

The function determining a platform's support and properties of cache occupancy and memory bandwidth monitoring (properties of X86_FEATURE_CQM_LLC) can be found among the common CPU code. After the feature's properties are populated in the per-CPU data, the resctrl subsystem is the only consumer (via boot_cpu_data).

Move the function that obtains the CPU information used by resctrl to the resctrl subsystem and rename it from init_cqm() to resctrl_cpu_detect(). The function continues to be called from the common CPU code. This move is done in preparation for the addition of some vendor specific code.

No functional change.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/38433b99f9d16c8f4ee796f8cc42b871531fa203.1588715690.git.reinette.chatre@intel.com
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>
-
Committed by Reinette Chatre
fix #29035143

commit 8dd97c65185c5a63c668e5bd8a861c04f47a35ed upstream

asm/resctrl_sched.h is dedicated to the code used for configuration of the CPU resource control state when a task is scheduled.

Rename resctrl_sched.h to resctrl.h in preparation for additions that will no longer make this file dedicated to work done during scheduling.

No functional change.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/6914e0ef880b539a82a6d889f9423496d471ad1d.1588715690.git.reinette.chatre@intel.com
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>
-
Committed by Xuan Zhuo
to #27804112

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>
-
Committed by Xuan Zhuo
to #27804112

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>
-
Committed by Xuan Zhuo
to #27804112

Add server_time, recv_time and recv_data statistics for peer ports, and rename the upload keyword to recv, which applies more naturally to both local and peer ports.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>
-
Committed by Xuan Zhuo
to #27804112

Remove con_num from struct tcp_rt_stats, as it is not used. Add a new type, TCPRT_TYPE_PEER_PORT_RANG, so that stats_peer also uses the two-dimensional array.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>
-
Committed by Xuan Zhuo
to #27804112

_tcp_rt_stats is used to save the values returned from tcp_rt_stats. These values were originally 64-bit but were being stored in u32 fields, so some larger values would overflow. Change the fields to 64-bit.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>
-
Committed by Xuan Zhuo
to #27804112

1. Calling atomic_read first and then atomic_set causes some statistics to be lost.
2. Atomic instructions are relatively slow, and calling them twice in succession for each variable significantly hurts performance.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>
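The lost-statistics problem in point 1 is a read-then-reset race: any increment that lands between the read and the reset is overwritten. The userspace sketch below uses C11 atomics rather than the kernel's atomic_t API, and the function names are hypothetical. The single-exchange variant is shown only as one common way to close the window; the commit message does not say which approach the patch itself takes.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Racy drain: an increment arriving between the load and the store
 * is wiped out by the store and never reported.
 */
static int drain_read_then_set(atomic_int *counter)
{
	int value = atomic_load(counter);

	atomic_store(counter, 0);
	return value;
}

/* Single-step drain: read and reset happen as one atomic operation. */
static int drain_exchange(atomic_int *counter)
{
	return atomic_exchange(counter, 0);
}
```

The exchange form also touches the variable once instead of twice, which relates to the cost concern in point 2.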
-
Committed by Xuan Zhuo
to #27804112

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>
-
Committed by Xuan Zhuo
to #27804112

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>
-
Committed by Xuan Zhuo
to #27804112

Use prefixes to distinguish between local and peer ports. This shortens the parameters and leaves room to add filter conditions on the peer port range or peer address later.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>
-
Committed by Xuan Zhuo
to #27804112

Only count mrtt at the end of the connection. For peer ports, counting mrtt on every output would produce repeated statistics.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>
-
Committed by Xuan Zhuo
to #27804112

1. Use relay's overwrite working mode: when the buffer is full, the old data is discarded and the new data is written to the buffer.
2. Since stats is triggered by a timer, there is no concurrency, so outputting one relay file per CPU is not suitable. It is changed to use a single "rt-network-stats" file.
3. Use relay open to pass the parent directory tcp-rt.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>
-
Committed by Xuan Zhuo
to #27804112

Rename "check" and "statis" uniformly to stats. Since some users do not use the stats function, and it takes up some resources, stats is no longer enabled by default.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>
-
Committed by Xuan Zhuo
to #27804112

Configuring ports_range as '1000,2000,4000,6000' means that the port ranges 1000-2000 and 4000-6000 will be monitored.

When ports_range is used, there may be a very large number of ports to count. Allocating space for all ports at once when the module loads would waste memory, so a two-dimensional array is used instead: a contiguous block holding a certain amount of statistics is allocated only as needed.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>
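The allocate-as-needed idea behind the two-dimensional array can be sketched in userspace as follows. Every name here (struct port_stats, struct range_table, stats_for_port, MAX_RANGES) is hypothetical, the ranges are assumed to be inclusive, and allocating one row per configured range is a simplification chosen for this example; the TCP-RT module's real layout may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_RANGES 4

struct port_range { uint16_t lo, hi; };		/* inclusive bounds (assumed) */
struct port_stats { uint64_t requests; };	/* placeholder statistics */

struct range_table {
	size_t nr;				/* number of configured ranges */
	struct port_range range[MAX_RANGES];
	struct port_stats *rows[MAX_RANGES];	/* second dimension, lazily allocated */
};

/* Return the stats slot for a port, allocating its row on first use. */
static struct port_stats *stats_for_port(struct range_table *t, uint16_t port)
{
	assert(t->nr <= MAX_RANGES);

	for (size_t i = 0; i < t->nr; i++) {
		const struct port_range *r = &t->range[i];

		if (port < r->lo || port > r->hi)
			continue;
		if (!t->rows[i]) {
			size_t count = (size_t)(r->hi - r->lo) + 1;

			t->rows[i] = calloc(count, sizeof(*t->rows[i]));
			if (!t->rows[i])
				return NULL;
		}
		return &t->rows[i][port - r->lo];
	}
	return NULL;	/* port is not monitored */
}
```

With the ranges from the commit message, looking up port 4500 allocates only the row for 4000-6000 and leaves the row for 1000-2000 unallocated until a port in that range is seen.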
-
Committed by Xuan Zhuo
to #27804112

TCP-RT is a kernel module for monitoring services at the TCP level. TCP-RT is essentially a tracing method. By placing hooks in advance at the relevant points in the kernel TCP protocol stack, it can identify requests and responses in scenarios where a single connection carries only one request at a time, and then collect information such as the time the request was received by the protocol stack and the time the service process spent handling it.

In addition, TCP-RT also supports some statistical analysis in the kernel and periodically outputs statistical information about the specified connections.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Ya Zhao <zhaoya123@linux.alibaba.com>
Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>
-
Committed by Zelin Deng
fix #29335392

DPC stands for Downstream Port Containment. It halts PCIe traffic below a downstream port after an unmasked uncorrectable error is detected at or below the port, so that the potential spread of any data corruption is avoided.

Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
-
Committed by Joseph Qi
to #29357063

The blkg lookup or create logic may bring much overhead even when iocost is disabled. So bypass it earlier in that case.

Fixes: 9da41925 ("alinux: iocost: fix NULL pointer dereference in ioc_rqos_throttle")
Reported-by: Hongnan Li <hongnan.li@linux.alibaba.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
-
Committed by youngjun
to #29273482

When "ovl_is_inuse" is true, the trap inode reference is not put. Also add a comment explaining the sequence of ovl_is_inuse after ovl_setup_trap.

Fixes: 0be0bfd2de9d ("ovl: fix regression caused by overlapping layers detection")
Cc: <stable@vger.kernel.org> # v4.19+
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: youngjun <her0gyugyu@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git/commit/?h=overlayfs-next&id=24f14009b8f1754ec2ae4c168940c01259b0f88a
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Dust Li
fix #29262413

The original patch didn't fit 4.19 well, since the upstream kernel uses

    #if HAVE_ATTR_TEST
    xxx
    #endif

but in 4.19, we use

    #ifdef HAVE_ATTR_TEST
    xxx
    #endif

As a result, the original patch enabled the code under #ifdef HAVE_ATTR_TEST and finally caused the build to fail when running:

    $ make M=samples/bpf
    make -C /mnt/nvme0/wuya/kernel/aliyunlinux2/ck/samples/bpf/../../tools/lib/bpf/ RM='rm -rf' LDFLAGS= srctree=/mnt/nvme0/wuya/kernel/aliyunlinux2/ck/samples/bpf/../../ O=
    Warning: Kernel ABI header at 'tools/include/uapi/linux/bpf.h' differs from latest version at 'include/uapi/linux/bpf.h'
      CC samples/bpf/syscall_nrs.s
      UPD samples/bpf/syscall_nrs.h
      HOSTCC samples/bpf/test_lru_dist
      HOSTCC samples/bpf/sock_example
      HOSTCC samples/bpf/bpf_load.o
    In file included from ./tools/perf/perf-sys.h:9:0,
                     from samples/bpf/bpf_load.c:29:
    ./tools/perf/perf-sys.h: In function ‘sys_perf_event_open’:
    ./tools/perf/perf-sys.h:68:15: error: ‘test_attr__enabled’ undeclared (first use in this function)
      if (unlikely(test_attr__enabled))
                   ^
    ./tools/include/linux/compiler.h:74:43: note: in definition of macro ‘unlikely’
     # define unlikely(x) __builtin_expect(!!(x), 0)
                                               ^
    ./tools/perf/perf-sys.h:68:15: note: each undeclared identifier is reported only once for each function it appears in
      if (unlikely(test_attr__enabled))
                   ^
    ./tools/include/linux/compiler.h:74:43: note: in definition of macro ‘unlikely’
     # define unlikely(x) __builtin_expect(!!(x), 0)
                                               ^
    In file included from samples/bpf/bpf_load.c:29:0:
    ./tools/perf/perf-sys.h:69:3: warning: implicit declaration of function ‘test_attr__open’ [-Wimplicit-function-declaration]
       test_attr__open(attr, pid, cpu, fd, group_fd, flags);
       ^~~~~~~~~~~~~~~
    make[1]: *** [samples/bpf/bpf_load.o] Error 1
    make: *** [_module_samples/bpf] Error 2

This reverts commit 4665759a.

Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
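The difference between the two guards matters when a macro is defined with the value 0. The sketch below assumes, for illustration only, that the reverted patch defined HAVE_ATTR_TEST as 0; the commit message itself only says that the patch ended up enabling the guarded code. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption for this example: the macro is defined, but with value 0. */
#define HAVE_ATTR_TEST 0

/* #ifdef only asks whether the macro exists, so this branch is taken. */
static bool guarded_by_ifdef(void)
{
#ifdef HAVE_ATTR_TEST
	return true;
#else
	return false;
#endif
}

/* #if evaluates the macro's value, so a value of 0 disables the branch. */
static bool guarded_by_if(void)
{
#if HAVE_ATTR_TEST
	return true;
#else
	return false;
#endif
}
```

Under this assumption, a definition meant to switch the test code off under the upstream `#if` guard switches it on under the 4.19 `#ifdef` guard, which is consistent with the undeclared test_attr__enabled error in the build log.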
-
Committed by KP Singh
to #29262413

commit 98beb3edeb974e906a81f305d88f7bc96b2ec83e upstream.

This was added in commit eb111869301e ("compiler-types.h: add asm_inline definition") and breaks samples/bpf as clang does not support asm __inline.

Fixes: eb111869301e ("compiler-types.h: add asm_inline definition")
Co-developed-by: Florent Revest <revest@google.com>
Signed-off-by: Florent Revest <revest@google.com>
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20191002191652.11432-1-kpsingh@chromium.org
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Alexei Starovoitov
to #29262413

commit 636e78b1cdb40b77a79b143dbd9d94847b360efa upstream.

clang started to error on invalid asm clobber usage in x86 headers, and many bpf program samples failed to build with the message:

      CLANG-bpf /data/users/ast/bpf-next/samples/bpf/xdp_redirect_kern.o
    In file included from /data/users/ast/bpf-next/samples/bpf/xdp_redirect_kern.c:14:
    In file included from ../include/linux/in.h:23:
    In file included from ../include/uapi/linux/in.h:24:
    In file included from ../include/linux/socket.h:8:
    In file included from ../include/linux/uio.h:14:
    In file included from ../include/crypto/hash.h:16:
    In file included from ../include/linux/crypto.h:26:
    In file included from ../include/linux/uaccess.h:5:
    In file included from ../include/linux/sched.h:15:
    In file included from ../include/linux/sem.h:5:
    In file included from ../include/uapi/linux/sem.h:5:
    In file included from ../include/linux/ipc.h:9:
    In file included from ../include/linux/refcount.h:72:
    ../arch/x86/include/asm/refcount.h:72:36: error: asm-specifier for input or output variable conflicts with asm clobber list
                             r->refs.counter, e, "er", i, "cx");
                                                          ^
    ../arch/x86/include/asm/refcount.h:86:27: error: asm-specifier for input or output variable conflicts with asm clobber list
                             r->refs.counter, e, "cx");
                                                 ^
    2 errors generated.

Override volatile() to work around the problem.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Yonghong Song
fix #29255532

commit 6bf3bbe1f4d4cf405e3c2bf07bbdff56d3223ec8 upstream.

x86 compilation has required asm goto support since 4.17. Since clang does not support asm goto, at 4.17, commit b1ae32db ("x86/cpufeature: Guard asm_volatile_goto usage for BPF compilation") worked around the issue by permitting an alternative implementation without asm goto for clang.

At 5.0, more asm goto usages appeared.

    [yhs@148 x86]$ egrep -r asm_volatile_goto
    include/asm/cpufeature.h: asm_volatile_goto("1: jmp 6f\n"
    include/asm/jump_label.h: asm_volatile_goto("1:"
    include/asm/jump_label.h: asm_volatile_goto("1:"
    include/asm/rmwcc.h: asm_volatile_goto (fullop "; j" #cc " %l[cc_label]" \
    include/asm/uaccess.h: asm_volatile_goto("\n" \
    include/asm/uaccess.h: asm_volatile_goto("\n" \
    [yhs@148 x86]$

When compiling the samples/bpf directory, most bpf programs failed compilation with error messages like:

    In file included from /home/yhs/work/bpf-next/samples/bpf/xdp_sample_pkts_kern.c:2:
    In file included from /home/yhs/work/bpf-next/include/linux/ptrace.h:6:
    In file included from /home/yhs/work/bpf-next/include/linux/sched.h:15:
    In file included from /home/yhs/work/bpf-next/include/linux/sem.h:5:
    In file included from /home/yhs/work/bpf-next/include/uapi/linux/sem.h:5:
    In file included from /home/yhs/work/bpf-next/include/linux/ipc.h:9:
    In file included from /home/yhs/work/bpf-next/include/linux/refcount.h:72:
    /home/yhs/work/bpf-next/arch/x86/include/asm/refcount.h:70:9: error: 'asm goto' constructs are not supported yet
            return GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
                   ^
    /home/yhs/work/bpf-next/arch/x86/include/asm/rmwcc.h:67:2: note: expanded from macro 'GEN_BINARY_SUFFIXED_RMWcc'
            __GEN_RMWcc(op " %[val], %[var]\n\t" suffix, var, cc, \
            ^
    /home/yhs/work/bpf-next/arch/x86/include/asm/rmwcc.h:21:2: note: expanded from macro '__GEN_RMWcc'
            asm_volatile_goto (fullop "; j" #cc " %l[cc_label]" \
            ^
    /home/yhs/work/bpf-next/include/linux/compiler_types.h:188:37: note: expanded from macro 'asm_volatile_goto'
    #define asm_volatile_goto(x...) asm goto(x)

Most usages do not even provide an alternative implementation, and it is not practical to make changes for each call site.

This patch works around the asm goto issue by redefining the macro as below:

    #define asm_volatile_goto(x...) asm volatile("invalid use of asm_volatile_goto")

If asm_volatile_goto is not used by bpf programs, which is typically the case, nothing bad will happen. If asm_volatile_goto is used by bpf programs, which is incorrect, the compiler will issue an error since "invalid use of asm_volatile_goto" is not valid assembly code.

With this patch, all bpf programs under samples/bpf can pass compilation.

Note that bpf programs under tools/testing/selftests/bpf/ compiled fine as they do not access kernel internal headers.

Fixes: e769742d3584 ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"")
Fixes: 18fe5822 ("x86, asm: change the GEN_*_RMWcc() macros to not quote the condition")
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Pavel Begunkov
to #29276773

commit 16d598030a37853a7a6b4384cad19c9c0af2f021 upstream.

59960b9deb535 ("io_uring: fix lazy work init") tried to fix missing io_req_init_async(), but left out work.flags and hash. Do it earlier.

Fixes: 7cdaf587de7c ("io_uring: avoid whole io_wq_work copy for requests completed inline")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Pavel Begunkov
to #29276773

commit dd821e0c95a64b5923a0c57f07d3f7563553e756 upstream.

Ensure to set msg.msg_name for the async portion of send/recvmsg, as the header copy will copy to/from it.

Cc: stable@vger.kernel.org # v5.5+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-