- 27 May 2020, 9 commits
-
-
Committed by Eugene Syromiatnikov
to #26323578 commit 1292e972fff2b2d81e139e0c2fe5f50249e78c58 upstream. fds field of struct io_uring_files_update is problematic with regards to compat user space, as pointer size is different in 32-bit, 32-on-64-bit, and 64-bit user space. In order to avoid custom handling of compat in the syscall implementation, make fds __u64 and use u64_to_user_ptr in order to retrieve it. Also, align the field naturally and check that no garbage is passed there. Fixes: c3a31e605620c279 ("io_uring: add support for IORING_REGISTER_FILES_UPDATE") Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
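For illustration, a rough sketch of the resulting layout and pointer recovery, following the upstream uapi field names (files_update_sketch is an illustrative stand-in for the real syscall path):

```c
#include <linux/errno.h>
#include <linux/kernel.h>  /* u64_to_user_ptr() */
#include <linux/types.h>

/* One layout for 32-bit, 32-on-64-bit, and 64-bit user space. */
struct io_uring_files_update {
	__u32 offset;
	__u32 resv;        /* must be zero: reject garbage from user space */
	__aligned_u64 fds; /* really a __s32 __user *, naturally aligned */
};

/* Kernel side: recover the user pointer without compat special-casing. */
static int files_update_sketch(const struct io_uring_files_update *up)
{
	__s32 __user *fds;

	if (up->resv)
		return -EINVAL;
	fds = u64_to_user_ptr(up->fds);
	/* ... copy_from_user() each fd out of fds[] ... */
	return 0;
}
```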
-
Committed by Jens Axboe
to #26323578 commit 9e3aa61ae3e01ce1ce6361a41ef725e1f4d1d2bf upstream. If we submit an unknown opcode and have fd == -1, io_op_needs_file() will return true as we default to needing a file. Then when we go and assign the file, we find the 'fd' invalid and return -EBADF. We really should be returning -EINVAL for that case, as we normally do for unsupported opcodes. Change io_op_needs_file() to have the following return values: 0 - does not need a file 1 - does need a file < 0 - error value and use this to pass back the right value for this invalid case. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
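A sketch of the revised tri-state convention (the opcode lists are abbreviated for illustration, not the full upstream switch):

```c
#include <linux/errno.h>
#include <linux/io_uring.h>  /* uapi opcodes and struct io_uring_sqe */

/* Return values: 0 = needs no file, 1 = needs a file, <0 = error. */
static int io_op_needs_file_sketch(const struct io_uring_sqe *sqe)
{
	switch (sqe->opcode) {
	case IORING_OP_NOP:
	case IORING_OP_TIMEOUT:
		return 0;        /* no file backs this request */
	case IORING_OP_READV:
	case IORING_OP_WRITEV:
		return 1;        /* file required */
	default:
		return -EINVAL;  /* unknown opcode: -EINVAL, not -EBADF */
	}
}
```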
-
Committed by Jens Axboe
to #26323578 commit 4e88d6e7793f2f445f43bd608828541d7f43b608 upstream. Some commands will invariably end in a failure in the sense that the completion result will be less than zero. One such example is timeouts that don't have a completion count set, they will always complete with -ETIME unless cancelled. For linked commands, we sever links and fail the rest of the chain if the result is less than zero. Since we have commands where we know that will happen, add IOSQE_IO_HARDLINK as a stronger link that doesn't sever regardless of the completion result. Note that the link will still sever if we fail submitting the parent request, hard links are only resilient in the presence of completion results for requests that did submit correctly. Cc: stable@vger.kernel.org # v5.4 Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Reported-by: 李通洲 <carter.li@eoitek.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
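To see the difference from user space, a minimal liburing sketch (assuming liburing and a kernel carrying this patch): a pure timeout always completes with -ETIME, which would sever a plain IOSQE_IO_LINK chain, while a hard link keeps the follow-up request alive:

```c
#include <liburing.h>

/* Queue a 1s timeout hard-linked to a no-op; the no-op still runs even
 * though the timeout "fails" with -ETIME. Error handling abbreviated. */
static int queue_timed_nop(struct io_uring *ring)
{
	struct __kernel_timespec ts = { .tv_sec = 1, .tv_nsec = 0 };
	struct io_uring_sqe *sqe;

	sqe = io_uring_get_sqe(ring);
	if (!sqe)
		return -1;
	io_uring_prep_timeout(sqe, &ts, 0, 0); /* count 0: always -ETIME */
	sqe->flags |= IOSQE_IO_HARDLINK;       /* don't sever on failure */

	sqe = io_uring_get_sqe(ring);
	if (!sqe)
		return -1;
	io_uring_prep_nop(sqe);                /* still runs after -ETIME */

	return io_uring_submit(ring);
}
```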
-
Committed by Jens Axboe
to #26323578 commit da8c96906990f1108cb626ee7865e69267a3263b upstream. If this flag is set, applications can be certain that any data for async offload has been consumed when the kernel has consumed the SQE. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
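The flag in question is IORING_FEAT_SUBMIT_STABLE, reported via io_uring_params at ring setup; a small liburing probe, as a sketch:

```c
#include <liburing.h>
#include <string.h>

/* Returns 1 if SQE data may be recycled once the kernel has consumed
 * the SQE, 0 if not, -1 on setup failure. */
static int ring_has_stable_submit(void)
{
	struct io_uring ring;
	struct io_uring_params p;
	int stable;

	memset(&p, 0, sizeof(p));
	if (io_uring_queue_init_params(8, &ring, &p) < 0)
		return -1;
	stable = !!(p.features & IORING_FEAT_SUBMIT_STABLE);
	io_uring_queue_exit(&ring);
	return stable;
}
```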
-
Committed by Jens Axboe
to #26323578 commit f499a021ea8c9f70321fce3d674d8eca5bbeee2c upstream. Just like commit f67676d160c6 for read/write requests, this one ensures that the sockaddr data has been copied for IORING_OP_CONNECT if we need to punt the request to async context. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Jens Axboe
to #26323578 commit 03b1230ca12a12e045d83b0357792075bf94a1e0 upstream. Just like commit f67676d160c6 for read/write requests, this one ensures that the msghdr data is fully copied if we need to punt a recvmsg or sendmsg system call to async context. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Jens Axboe
to #26323578 commit f8e85cf255ad57d65eeb9a9d0e59e3dec55bdd9e upstream. This allows an application to call connect() in an async fashion. Like other opcodes, we first try a non-blocking connect, then punt to async context if we have to. Note that we can still return -EINPROGRESS, and in that case the caller should use IORING_OP_POLL_ADD to do an async wait for completion of the connect request (just like for regular connect(2), except we can do it async here too). Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
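A hedged liburing sketch of that pattern: submit IORING_OP_CONNECT, and if its CQE carries res == -EINPROGRESS, arm IORING_OP_POLL_ADD for writability (NULL checks elided for brevity):

```c
#include <liburing.h>
#include <poll.h>
#include <sys/socket.h>

/* Queue an async connect on sockfd. */
static void queue_connect(struct io_uring *ring, int sockfd,
			  const struct sockaddr *addr, socklen_t len)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

	io_uring_prep_connect(sqe, sockfd, addr, len);
}

/* On res == -EINPROGRESS, wait for the socket to become writable,
 * analogous to poll(2) after a nonblocking connect(2). */
static void queue_connect_poll(struct io_uring *ring, int sockfd)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

	io_uring_prep_poll_add(sqe, sockfd, POLLOUT);
}
```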
-
Committed by Jens Axboe
to #26323578 commit bd3ded3146daa2cbb57ed353749ef99cf75371b0 upstream. This is identical to __sys_connect(), except it takes a struct file instead of an fd, and it also allows passing in extra file->f_flags flags. The latter is done to support masking in O_NONBLOCK without manipulating the original file flags. No functional changes in this patch. Cc: netdev@vger.kernel.org Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
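The helper's shape, sketched from the commit description (the exact parameter types are an assumption modeled on __sys_connect()):

```c
#include <linux/file.h>
#include <linux/socket.h>

/* Like __sys_connect(), but takes a struct file plus extra flags that
 * are OR'd with file->f_flags only for the duration of the call: */
int __sys_connect_file(struct file *file, struct sockaddr __user *uservaddr,
		       int addrlen, int file_flags);

/* e.g. a nonblocking attempt without touching the file's real flags:
 *	ret = __sys_connect_file(file, addr, addrlen, O_NONBLOCK);
 */
```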
-
Committed by Jens Axboe
to #26323578 commit 915967f69c591b34c5a18d6618af021a81ffd700 upstream. We don't have shadow requests anymore, so get rid of the shadow argument. Add the user_data argument, as that's often useful to easily match up requests, instead of having to look at request pointers. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
- 15 May 2020, 1 commit
-
-
Committed by xuanzhuo
to #26353046 TcpRT: Instrument and Diagnostic Analysis System for Service Quality of Cloud Databases at Massive Scale in Real-time. It can also provide information for all request/response services, such as HTTP requests. This is the kernel framework for TcpRT; more work needs TcpRT module support. A TcpRT module should call tcp_unregister_rt before rmmod. TcpRT hooks are called when a sock is initialized, data is received, data is sent, packets are acked, and the socket is destroyed. The private data is saved to icsk->icsk_tcp_rt_priv. Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com> Acked-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: xuanzhuo <xuanzhuo@linux.alibaba.com>
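A hypothetical sketch of such a hook framework; the struct, field, and function names below are illustrative (suffixed _sketch) — the real ones live in this tree's TcpRT patches and may differ:

```c
#include <linux/skbuff.h>
#include <net/sock.h>

/* Callbacks fired at the points the commit message lists. */
struct tcp_rt_ops_sketch {
	void (*init)(struct sock *sk);                  /* sock init */
	void (*recv_data)(struct sock *sk, struct sk_buff *skb);
	void (*send_data)(struct sock *sk, struct sk_buff *skb);
	void (*pkts_acked)(struct sock *sk, u32 acked); /* packet acked */
	void (*release)(struct sock *sk);               /* sock destroyed */
};

int tcp_register_rt_sketch(struct tcp_rt_ops_sketch *ops);
/* Per the commit message, a module must unregister before rmmod: */
void tcp_unregister_rt_sketch(struct tcp_rt_ops_sketch *ops);
```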
-
- 08 May 2020, 1 commit
-
-
Committed by Naoya Horiguchi
to #26809468 commit 907ec5fca3dc38d37737de826f06f25b063aa08e upstream. Patch series "mm: Fix for movable_node boot option", v3. This patch series contains a fix for the movable_node boot option issue which was introduced by commit 124049de ("x86/e820: put !E820_TYPE_RAM regions into memblock.reserved"). The commit breaks the option because it changed the memory gap range to reserved memblock. So, the node is marked as Normal zone even if the SRAT has Hot pluggable affinity. First and second patch fix the original issue which the commit tried to fix, then revert the commit. This patch (of 3): There is a kernel panic that is triggered when reading /proc/kpageflags on the kernel booted with kernel parameter 'memmap=nn[KMG]!ss[KMG]': BUG: unable to handle kernel paging request at fffffffffffffffe PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 2 PID: 1728 Comm: page-types Not tainted 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 04/01/2014 RIP: 0010:stable_page_flags+0x27/0x3c0 Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 89 fc 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 c4 01 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7 RSP: 0018:ffffbbd44111fde0 EFLAGS: 00010202 RAX: fffffffffffffffe RBX: 00007fffffffeff9 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000202 RDI: ffffed1182fff5c0 RBP: ffffffffffffffff R08: 0000000000000001 R09: 0000000000000001 R10: ffffbbd44111fed8 R11: 0000000000000000 R12: ffffed1182fff5c0 R13: 00000000000bffd7 R14: 0000000002fff5c0 R15: ffffbbd44111ff10 FS: 00007efc4335a500(0000) GS:ffff93a5bfc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: fffffffffffffffe CR3: 00000000b2a58000 CR4: 00000000001406e0 Call Trace: kpageflags_read+0xc7/0x120 proc_reg_read+0x3c/0x60 __vfs_read+0x36/0x170 vfs_read+0x89/0x130 ksys_pread64+0x71/0x90 do_syscall_64+0x5b/0x160 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7efc42e75e23 Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 90 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24 According to kernel bisection, this problem became visible due to commit f7f99100 which changes how struct pages are initialized. Memblock layout affects the pfn ranges covered by node/zone. Consider that we have a VM with 2 NUMA nodes and each node has 4GB memory, and the default (no memmap= given) memblock layout is like below: MEMBLOCK configuration: memory size = 0x00000001fff75c00 reserved size = 0x000000000300c000 memory.cnt = 0x4 memory[0x0] [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0 memory[0x1] [0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0 memory[0x2] [0x0000000100000000-0x000000013fffffff], 0x0000000040000000 bytes on node 0 flags: 0x0 memory[0x3] [0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0 ... 
If you give memmap=1G!4G (so it just covers memory[0x2]), the range [0x100000000-0x13fffffff] is gone: MEMBLOCK configuration: memory size = 0x00000001bff75c00 reserved size = 0x000000000300c000 memory.cnt = 0x3 memory[0x0] [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0 memory[0x1] [0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0 memory[0x2] [0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0 ... This causes shrinking node 0's pfn range because it is calculated by the address range of memblock.memory. So some of the struct pages in the gap range are left uninitialized. We have a function zero_resv_unavail() which zeroes the struct pages outside memblock.memory, but currently it covers only the reserved unavailable range (i.e. memblock.memory && !memblock.reserved). This patch extends it to cover all unavailable range, which fixes the reported issue. Link: http://lkml.kernel.org/r/20181002143821.5112-2-msys.mizuma@gmail.com Fixes: f7f99100 ("mm: stop zeroing memory during allocation in vmemmap") Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Tested-by: Oscar Salvador <osalvador@suse.de> Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Xu Yu <xuyu@linux.alibaba.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Shile Zhang <shile.zhang@linux.alibaba.com>
-
- 06 May 2020, 3 commits
-
-
Committed by Jan Kara
to #24913189 commit c780e86dd48ef6467a1146cf7d0fe1e05a635039 upstream. KASAN is reporting that __blk_add_trace() has a use-after-free issue when accessing q->blk_trace. Indeed the switching of block tracing (and thus eventual freeing of q->blk_trace) is completely unsynchronized with the currently running tracing and thus it can happen that the blk_trace structure is being freed just while __blk_add_trace() works on it. Protect accesses to q->blk_trace by RCU during tracing and make sure we wait for the end of the RCU grace period when shutting down tracing. Luckily that is a rare enough event that we can afford it. Note that postponing the freeing of blk_trace to an RCU callback should better be avoided as it could have unexpected user-visible side effects, as the debugfs files would still exist for a short while after block tracing has been shut down. Link: https://bugzilla.kernel.org/show_bug.cgi?id=205711 CC: stable@vger.kernel.org Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Tested-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reported-by: Tristan Madani <tristmd@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> [bwh: Backported to 4.19: adjust context] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Sasha Levin <sashal@kernel.org> References: CVE-2019-19768 Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
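A hedged sketch of the pattern the fix introduces (names taken from the commit message; the real code also deals with the running/stopped state and queue locking):

```c
#include <linux/blkdev.h>
#include <linux/blktrace_api.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

/* Reader side, e.g. __blk_add_trace(): dereference only under RCU. */
static void add_trace_sketch(struct request_queue *q)
{
	struct blk_trace *bt;

	rcu_read_lock();
	bt = rcu_dereference(q->blk_trace);
	if (bt) {
		/* ... emit the trace record ... */
	}
	rcu_read_unlock();
}

/* Teardown: unpublish, wait out a grace period, then free. */
static void remove_trace_sketch(struct request_queue *q)
{
	struct blk_trace *bt = rcu_dereference_protected(q->blk_trace, 1);

	RCU_INIT_POINTER(q->blk_trace, NULL);
	synchronize_rcu(); /* wait for in-flight __blk_add_trace() readers */
	kfree(bt);
}
```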
-
Committed by Sabrina Dubroca
to #24913189 commit 6c8991f41546c3c472503dff1ea9daaddf9331c2 upstream. ipv6_stub uses the ip6_dst_lookup function to allow other modules to perform IPv6 lookups. However, this function skips the XFRM layer entirely. All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the ip_route_output_key and ip_route_output helpers) for their IPv4 lookups, which calls xfrm_lookup_route(). This patch fixes this inconsistent behavior by switching the stub to ip6_dst_lookup_flow, which also calls xfrm_lookup_route(). This requires some changes in all the callers, as these two functions take different arguments and have different return types. Fixes: 5f81bd2e ("ipv6: export a stub for IPv6 symbols used by vxlan") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 4.19: - Drop change in lwt_bpf.c - Delete now-unused "ret" in mlx5e_route_lookup_ipv6() - Initialise "out_dev" in mlx5e_create_encap_header_ipv6() to avoid introducing a spurious "may be used uninitialised" warning - Adjust filenames, context, indentation] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Sasha Levin <sashal@kernel.org> References: CVE-2020-1749 Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
-
Committed by Sabrina Dubroca
to #24913189 commit c4e85f73afb6384123e5ef1bba3315b2e3ad031e upstream. This will be used in the conversion of ipv6_stub to ip6_dst_lookup_flow, as some modules currently pass a net argument without a socket to ip6_dst_lookup. This is equivalent to commit 343d60aa ("ipv6: change ipv6_stub_impl.ipv6_dst_lookup to take net argument"). Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 4.19: adjust context] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Sasha Levin <sashal@kernel.org> References: CVE-2020-1749 [zsl: fixes conflicts in net/sctp/ipv6.c] Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
-
- 30 April 2020, 5 commits
-
-
Committed by Pankaj Gupta
fix #27138800 commit 8c2e408e73f735d2e6e8b43f9b038c9abb082939 upstream. This patch fixes the sparse warnings below, related to the __virtio type in the virtio pmem driver, reported by the Intel test bot on the linux-next tree. nd_virtio.c:56:28: warning: incorrect type in assignment (different base types) nd_virtio.c:56:28: expected unsigned int [unsigned] [usertype] type nd_virtio.c:56:28: got restricted __virtio32 nd_virtio.c:93:59: warning: incorrect type in argument 2 (different base types) nd_virtio.c:93:59: expected restricted __virtio32 [usertype] val nd_virtio.c:93:59: got unsigned int [unsigned] [usertype] ret Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Pankaj Gupta <pagupta@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
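The sparse-clean pattern, roughly (the struct and field here are illustrative, mirroring the warning text): device-endian fields carry the __virtio32 type and are converted explicitly:

```c
#include <linux/virtio.h>
#include <linux/virtio_config.h>

/* Illustrative request with a device-endian field. */
struct pmem_req_sketch {
	__virtio32 type;
};

static u32 fill_and_read_back(struct virtio_device *vdev,
			      struct pmem_req_sketch *req, u32 type)
{
	req->type = cpu_to_virtio32(vdev, type); /* CPU -> device endian */
	return virtio32_to_cpu(vdev, req->type); /* device -> CPU endian */
}
```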
-
Committed by Pankaj Gupta
fix #27138800 commit 32de1484648a837db5dea0a7007fe7136804e392 upstream. This patch introduces the 'daxdev_mapping_supported' helper, which checks if 'MAP_SYNC' is supported with the filesystem mapping. It also checks if the corresponding dax_device is synchronous. The virtio pmem device is asynchronous and does not support VM_SYNC. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Pankaj Gupta <pagupta@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
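Roughly how a filesystem mmap path would use the helper, as a sketch (surrounding DAX setup elided):

```c
#include <linux/dax.h>
#include <linux/errno.h>
#include <linux/mm.h>

/* Refuse MAP_SYNC mappings when the backing dax_device is asynchronous. */
static int dax_mmap_sketch(struct vm_area_struct *vma,
			   struct dax_device *dax_dev)
{
	if (!daxdev_mapping_supported(vma, dax_dev))
		return -EOPNOTSUPP; /* VM_SYNC set, device lacks sync flush */
	/* ... proceed with normal DAX mmap setup ... */
	return 0;
}
```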
-
Committed by Pankaj Gupta
fix #27138800 commit fefc1d97fa4b5e016bbe15447dc3edcd9e1bcb9f upstream. This patch adds the 'DAXDEV_SYNC' flag, which is set for an nd_region doing synchronous flush. This is later used to disable MAP_SYNC functionality for the ext4 and xfs filesystems on devices that don't support synchronous flush. Signed-off-by: Pankaj Gupta <pagupta@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
Committed by Pankaj Gupta
fix #27138800 commit 6e84200c0a2994b991259d19450eee561029bf70 upstream. This patch adds the virtio-pmem driver for KVM guests. The guest reads the persistent memory range information from Qemu over VIRTIO and registers it on nvdimm_bus. It also creates an nd_region object with the persistent memory range information so that the existing 'nvdimm/pmem' driver can reserve this into the system memory map. This way the 'virtio-pmem' driver uses existing functionality of the pmem driver to register persistent memory compatible with DAX-capable filesystems. This also provides a function to perform a guest flush over VIRTIO from the 'pmem' driver when userspace performs a flush on a DAX memory range. Signed-off-by: Pankaj Gupta <pagupta@redhat.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jakub Staron <jstaron@google.com> Tested-by: Jakub Staron <jstaron@google.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
Committed by Pankaj Gupta
fix #27138800 commit c5d4355d10d414a96ca870b731756b89d068d57a upstream. This patch adds functionality to perform flush from guest to host over VIRTIO. We register a callback based on the 'nd_region' type: the virtio_pmem driver requires this special flush function, while for the rest of the region types we register the existing flush function. Errors returned by host fsync failures are reported to userspace. Signed-off-by: Pankaj Gupta <pagupta@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
- 29 April 2020, 2 commits
-
-
Committed by Al Viro
fix #27211210 commit 6c2d4798a8d16cf4f3a28c3cd4af4f1dcbbb4d04 upstream. Most of the callers of lookup_one_len_unlocked() treat negatives as ERR_PTR(-ENOENT). Provide a helper that would do just that. Note that a pinned positive dentry remains positive - its ->d_inode is stable, etc.; a pinned _negative_ dentry can become positive at any point as long as you are not holding its parent at least shared. So using lookup_one_len_unlocked() needs to be careful; lookup_positive_unlocked() is safer and that's what the callers end up open-coding anyway. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
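The helper's behaviour, sketched (the real implementation reads ->d_flags with an acquire load; this version conveys the semantics):

```c
#include <linux/dcache.h>
#include <linux/err.h>
#include <linux/namei.h>

/* Like lookup_one_len_unlocked(), but a negative dentry becomes -ENOENT. */
static struct dentry *lookup_positive_unlocked_sketch(const char *name,
						      struct dentry *base,
						      int len)
{
	struct dentry *ret = lookup_one_len_unlocked(name, base, len);

	if (!IS_ERR(ret) && d_is_negative(ret)) {
		dput(ret);
		ret = ERR_PTR(-ENOENT);
	}
	return ret;
}
```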
-
Committed by Al Viro
fix #27211210 commit d41efb522e902364ab09c782d511c1bedc388ddd upstream. There are 4 callers; two proceed to check if the result is positive and fail with ENOENT if it isn't; one (in handle_lookup_down()) is guaranteed to yield a positive and one (in lookup_fast()) is _preceded_ by a positivity check. However, follow_managed() on a negative dentry is a (fairly cheap) no-op on anything other than autofs. And negative autofs dentries are never hashed, so lookup_fast() is not going to run into one of those. Moreover, successful follow_managed() on a _positive_ dentry never yields a negative one (and we significantly rely upon that in callers of lookup_fast()). In other words, we can easily transpose the positivity check and the call of follow_managed() in lookup_fast(). And that allows to fold the positivity check *into* follow_managed(), simplifying life for the code downstream of its calls. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
- 28 April 2020, 1 commit
-
-
Committed by Babu Moger
to #26613714 commit 6fe07ce35e8ad870ba1cf82e0481e0fc0f526eff upstream. The resource control feature is supported by both Intel and AMD. So, rename CONFIG_INTEL_RDT to the vendor-neutral CONFIG_RESCTRL. Now CONFIG_RESCTRL will be used for both Intel and AMD to enable Resource Control support. Update the texts in config and condition accordingly. [ bp: Simplify Kconfig text. ] Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: "Chang S. Bae" <chang.seok.bae@intel.com> Cc: David Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Dmitry Safonov <dima@arista.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: <linux-doc@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Pu Wen <puwen@hygon.cn> Cc: <qianyue.zj@alibaba-inc.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Rian Hunter <rian@alum.mit.edu> Cc: Sherry Hurwitz <sherry.hurwitz@amd.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Lendacky <Thomas.Lendacky@amd.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: <xiaochen.shen@intel.com> Link: https://lkml.kernel.org/r/20181121202811.4492-9-babu.moger@amd.com [ Shile: fixed conflict in arch/x86/Kconfig ] Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com> Tested-by: WANG Siyuan <Siyuan.Wang@amd.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
- 26 April 2020, 1 commit
-
-
Committed by zhongjiang-ali
to #24843736 Too many printed memcg OOM messages can trigger a softlockup. In general, the same ratelimit, oom_rs, is shared between system and memcg OOM to limit the printed messages, but the memcg side exceeds its limit far more frequently because memcg OOM happens much more easily, and the resulting flood of printed information is likely to trigger a softlockup. This patch uses separate ratelimits for memcg and system OOM. We tested the patch with the default values in the memcg, and the issue goes away. [xuyu: adjust corresponding sysctl indexes] Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com> Signed-off-by: zhongjiang-ali <zhongjiang-ali@linux.alibaba.com>
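A hedged sketch of the split (intervals and bursts below are illustrative, not the patch's actual defaults):

```c
#include <linux/ratelimit.h>

/* Separate ratelimit state for memcg OOM reports vs. global OOM
 * reports, so frequent memcg OOMs cannot flood the console. */
static DEFINE_RATELIMIT_STATE(oom_rs_sketch, DEFAULT_RATELIMIT_INTERVAL,
			      DEFAULT_RATELIMIT_BURST);
static DEFINE_RATELIMIT_STATE(memcg_oom_rs_sketch, 60 * HZ, 10);

static bool oom_report_allowed(bool is_memcg)
{
	return __ratelimit(is_memcg ? &memcg_oom_rs_sketch : &oom_rs_sketch);
}
```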
-
- 24 April 2020, 5 commits
-
-
Committed by Yihao Wu
to #26424323 We account iowait when the cgroup's se is idle and it has blocked tasks on the hierarchy of se->my_q. To achieve this, we also add cg_nr_running to track the hierarchical number of blocked tasks. We update it when a blocked task wakes up or a task blocks. Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com> Signed-off-by: Shanpei Chen <shanpeic@linux.alibaba.com> Acked-by: Michael Wang <yun.wang@linux.alibaba.com>
-
Committed by Yihao Wu
to #26424323 From the previous patch, we know there are 4 possible states. Since the steal state's transitions are complex, we choose to account its complement: steal = elapse - idle - sum_exec_raw - ineffective, where elapse is the time since the cgroup was created, sum_exec_raw is the running time including IRQ time, and ineffective is the total time during which the cpuacct-bound cpuset doesn't allow this cpu for the cgroup. Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com> Signed-off-by: Shanpei Chen <shanpeic@linux.alibaba.com> Acked-by: Michael Wang <yun.wang@linux.alibaba.com>
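In code form the accounting reduces to a subtraction; a sketch with illustrative names (all quantities in nanoseconds):

```c
#include <linux/types.h>

/* steal = elapse - idle - sum_exec_raw - ineffective */
static u64 cgroup_steal_sketch(u64 elapse, u64 idle,
			       u64 sum_exec_raw, u64 ineffective)
{
	return elapse - idle - sum_exec_raw - ineffective;
}
```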
-
Committed by Yihao Wu
to #26424323 Since we are concerned with idle, let's take idle as the center state and omit transitions between the other states. Below is the state transition graph: sleep->deque +-----------+ cpumask +-------+ exit->deque +-------+ |ineffective|-------- | idle | <-----------|running| +-----------+ +-------+ +-------+ ^ | unthrtl child -> deque | | wake -> deque | |thrtl child -> enque migrate -> deque | |migrate -> enque | v +-------+ | steal | +-------+ We conclude the idle state condition as: !se->on_rq && !my_q->throttled && cpu allowed. From this graph and condition, we can hook (de|en)queue_task_fair, update_cpumasks_hier, and (un|)throttle_cfs_rq to account the idle state. In the hooked functions, we also check the conditions, to avoid accounting unwanted cpu clocks. Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com> Signed-off-by: Shanpei Chen <shanpeic@linux.alibaba.com> Acked-by: Michael Wang <yun.wang@linux.alibaba.com>
-
Committed by Xunlei Pang
to #26424323 Add the cgroup file "cpuacct.proc_stat"; we'll export per-cgroup cpu usages and some other scheduler statistics in this interface. Reviewed-by: Michael Wang <yun.wang@linux.alibaba.com> Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com> Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
-
Committed by Xunlei Pang
to #26424323 It's relatively easy to maintain nr_uninterruptible in the scheduler compared to doing it in cpuacct. We assume that "cpu" and "cpuacct" are bound together, so that it can be used for per-cgroup load. This will be needed to calculate the per-cgroup load average later. Reviewed-by: Michael Wang <yun.wang@linux.alibaba.com> Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com> Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
-
- 23 April 2020, 3 commits
-
-
Committed by Mel Gorman
to #26255339 commit 5e1f0f098b4649fad53011246bcaeff011ffdf5d upstream Compaction is inherently race-prone as a suitable page freed during compaction can be allocated by any parallel task. This patch uses a capture_control structure to isolate a page immediately when it is freed by a direct compactor in the slow path of the page allocator. The intent is to avoid redundant scanning. 5.0.0-rc1 5.0.0-rc1 selective-v3r17 capture-v3r19 Amean fault-both-1 0.00 ( 0.00%) 0.00 * 0.00%* Amean fault-both-3 2582.11 ( 0.00%) 2563.68 ( 0.71%) Amean fault-both-5 4500.26 ( 0.00%) 4233.52 ( 5.93%) Amean fault-both-7 5819.53 ( 0.00%) 6333.65 ( -8.83%) Amean fault-both-12 9321.18 ( 0.00%) 9759.38 ( -4.70%) Amean fault-both-18 9782.76 ( 0.00%) 10338.76 ( -5.68%) Amean fault-both-24 15272.81 ( 0.00%) 13379.55 * 12.40%* Amean fault-both-30 15121.34 ( 0.00%) 16158.25 ( -6.86%) Amean fault-both-32 18466.67 ( 0.00%) 18971.21 ( -2.73%) Latency is only moderately affected but the devil is in the details. A closer examination indicates that base page fault latency is reduced but latency of huge pages is increased as it takes greater care to succeed. Part of the "problem" is that allocation success rates are close to 100% even when under pressure and compaction gets harder 5.0.0-rc1 5.0.0-rc1 selective-v3r17 capture-v3r19 Percentage huge-3 96.70 ( 0.00%) 98.23 ( 1.58%) Percentage huge-5 96.99 ( 0.00%) 95.30 ( -1.75%) Percentage huge-7 94.19 ( 0.00%) 97.24 ( 3.24%) Percentage huge-12 94.95 ( 0.00%) 97.35 ( 2.53%) Percentage huge-18 96.74 ( 0.00%) 97.30 ( 0.58%) Percentage huge-24 97.07 ( 0.00%) 97.55 ( 0.50%) Percentage huge-30 95.69 ( 0.00%) 98.50 ( 2.95%) Percentage huge-32 96.70 ( 0.00%) 99.27 ( 2.65%) And scan rates are reduced as expected by 6% for the migration scanner and 29% for the free scanner indicating that there is less redundant work. Compaction migrate scanned 20815362 19573286 Compaction free scanned 16352612 11510663 [mgorman@techsingularity.net: remove redundant check] Link: http://lkml.kernel.org/r/20190201143853.GH9565@techsingularity.net Link: http://lkml.kernel.org/r/20190118175136.31341-23-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
Committed by Mel Gorman
to #26255339 commit e332f741a8dd1ec9a6dc8aa997296ecbfe64323e upstream Pageblock hints are cleared when compaction restarts or kswapd makes enough progress that it can sleep but it's over-eager in that the bit is cleared for migration sources with no LRU pages and migration targets with no free pages. As pageblock skip hint flushes are relatively rare and out-of-band with respect to kswapd, this patch makes a few more expensive checks to see if it's appropriate to even clear the bit. Every pageblock that is not cleared will avoid 512 pages being scanned unnecessarily on x86-64. The impact is variable with different workloads showing small differences in latency, success rates and scan rates. This is expected as clearing the hints is not that common but doing a small amount of work out-of-band to avoid a large amount of work in-band later is generally a good thing. Link: http://lkml.kernel.org/r/20190118175136.31341-22-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: YueHaibing <yuehaibing@huawei.com> [cai@lca.pw: no stuck in __reset_isolation_pfn()] Link: http://lkml.kernel.org/r/20190206034732.75687-1-cai@lca.pw Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
Committed by Mel Gorman
to #26255339 commit a921444382b49cc7fdeca3fba3e278bc09484a27 upstream This is a preparation patch only, no functional change. Link: http://lkml.kernel.org/r/20181123114528.28802-3-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
- 22 April 2020, 2 commits
-
-
Committed by Suthikulpanit, Suravee
fix #26319040 commit b9c6ff94e43a0ee053e0c1d983fba1ac4953b762 upstream. Refactor the logic for activating/deactivating guest virtual APIC mode (GAM) into helper functions, and export them for other drivers (e.g. SVM) to support run-time activation/deactivation of SVM AVIC. Cc: Joerg Roedel <joro@8bytes.org> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: tianyi <fujunkang@linux.alibaba.com> Reviewed-by: zhangliguang <zhangliguang@linux.alibaba.com> Acked-by: zhangliguang <zhangliguang@linux.alibaba.com>
-
Committed by Jeremy Linton
to #25688970 commit 56855a99f3d0d1e9f1f4e24f5851f9bf14c83296 upstream ACPI 6.3 adds a flag to indicate that child nodes are all identical cores. This is useful to authoritatively determine if a set of (possibly offline) cores are identical or not. Since the flag doesn't give us a unique id, we can generate one and use it to create bitmaps of sibling nodes, or simply use it in a loop to determine if a subset of cores are identical. Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Hanjun Guo <hanjun.guo@linaro.org> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: luanshi <zhangliguang@linux.alibaba.com> Acked-by: zou cao <zoucao@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
-
- 17 April 2020, 2 commits
-
-
Committed by Xunlei Pang
to #26782094 Pin the code sections of a process in memory for the corresponding VMAs, like mlock does. Usage: - pin process "PID": echo PID > /proc/unevictable/add_pid - unpin it: echo PID > /proc/unevictable/del_pid - show all pinned process pids: cat /proc/unevictable/add_pid For easy maintenance, we place it in the kernel because it has no side effects if unused. Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
Committed by Xu Yu
to #26424368 This reworks the memsli "start", "end", and "update" interfaces to make them clearer and symmetrical, by merging the "update" action into "end", just like what psi_memstall_{enter, leave} does. Now the latency probe pattern of memsli is as follows: memcg_lat_stat_start(&start); /* kernel code being probed */ memcg_lat_stat_end(MEM_LAT_XXX, start); This also cleans up the code and fixes the warnings produced when CONFIG_MEMSLI is not set. Signed-off-by: Xu Yu <xuyu@linux.alibaba.com> Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
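The probe pattern from the commit message, laid out as a sketch (MEM_LAT_XXX stands in for one of the latency types; the u64 start type is an assumption):

```c
#include <linux/types.h>

static void memsli_probe_sketch(void)
{
	u64 start;

	memcg_lat_stat_start(&start);
	/* ... the kernel path being measured, e.g. direct reclaim ... */
	memcg_lat_stat_end(MEM_LAT_XXX, start); /* "end" also updates now */
}
```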
-
- 16 April 2020, 5 commits
-
-
Committed by Xu Yu
to #26424368 This introduces the new bool Kconfig option MEMSLI, determining whether the memsli (memory latency histogram) feature should be built in or not. Signed-off-by: Xu Yu <xuyu@linux.alibaba.com> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
Committed by Xu Yu
to #26424368 Since memsli also records latency histograms for swapout and swapin, which are NOT in the slow memory path, the overhead of memsli could be non-negligible in some specific scenarios. For example, in scenarios with frequent swapping out and in, memsli could introduce overhead of ~1% of the total run time of the synthetic testcase. This adds a procfs interface for the memsli switch. The memsli feature is enabled by default, and you can now disable it by: $ echo 0 > /proc/memsli/enabled Apparently, you can check the current memsli switch status by: $ cat /proc/memsli/enabled Note that disabling memsli at runtime will NOT clear the existing latency histograms. You still need to manually reset the specified latency histogram(s) by echoing 0 into the corresponding cgroup control file(s). Signed-off-by: Xu Yu <xuyu@linux.alibaba.com> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
Committed by Xu Yu
to #26424368 Probe and calculate the latency of global swapout, memcg swapout and swapin respectively, and then group into the latency histogram in struct mem_cgroup. Note that the latency in each memcg is aggregated from all child memcgs. Usage: $ cat memory.direct_swapout_global_latency 0-1ms: 98313 1-5ms: 0 5-10ms: 0 10-100ms: 0 100-500ms: 0 500-1000ms: 0 >=1000ms: 0 total(ms): 52 Each line is the count of global swapout within the appropriate latency range. To clear the latency histogram: $ echo 0 > memory.direct_swapout_global_latency $ cat memory.direct_swapout_global_latency 0-1ms: 0 1-5ms: 0 5-10ms: 0 10-100ms: 0 100-500ms: 0 500-1000ms: 0 >=1000ms: 0 total(ms): 0 The usage of memory.direct_swapout_memcg_latency and memory.direct_swapin_latency is the same as memory.direct_swapout_global_latency. Signed-off-by: Xu Yu <xuyu@linux.alibaba.com> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
Committed by Xu Yu
to #26424368 There is some duplicated code in the original implementation of the memory latency histogram, such as {x, y, z}_show and {x, y, z}_write, where x, y, z represent various types of memory latency. This reworks the common code of the memory latency histogram to make it easier to add more types of memory latency later. Signed-off-by: Xu Yu <xuyu@linux.alibaba.com> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
Committed by Xu Yu
to #26424368 Probe and calculate the latency of direct compact, and then group into the latency histogram in struct mem_cgroup. Note that the latency in each memcg is aggregated from all child memcgs. Usage: $ cat memory.direct_compact_latency 0-1ms: 1176 1-5ms: 259 5-10ms: 17 10-100ms: 10 100-500ms: 0 500-1000ms: 0 >=1000ms: 0 total(ms): 921 Each line is the count of direct compact within the appropriate latency range. To clear the latency histogram: $ echo 0 > memory.direct_compact_latency $ cat memory.direct_compact_latency 0-1ms: 0 1-5ms: 0 5-10ms: 0 10-100ms: 0 100-500ms: 0 500-1000ms: 0 >=1000ms: 0 total(ms): 0 Signed-off-by: Xu Yu <xuyu@linux.alibaba.com> Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
-