Commits · 55ce4ccf3f493f131dc704b70c818ce754f974be · openanolis / cloud-kernel

27 May, 2020 26 commits

io_uring: improve trace_io_uring_defer() trace point · 55ce4ccf

Authored by Jens Axboe on Nov 21, 2019

to #26323578

commit 915967f69c591b34c5a18d6618af021a81ffd700 upstream.

We don't have shadow requests anymore, so get rid of the shadow
argument. Add the user_data argument, as that's often useful to easily
match up requests, instead of having to look at request pointers.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

55ce4ccf

io_uring: drain next sqe instead of shadowing · 007d7c0a

Authored by Pavel Begunkov on Nov 21, 2019

to #26323578

commit 1b4a51b6d03d21f55effbcf609ba5526d87d9e9d upstream.

There's an issue with the shadow drain logic in that we drop the
completion lock after deciding to defer a request, then re-grab it later
and assume that the state is still the same. In the meantime, someone
else completing a request could have found and issued it. This can cause
a stall in the queue, by having a shadow request inserted that nobody is
going to drain.

Additionally, if we fail allocating the shadow request, we simply ignore
the drain.

Instead of using a shadow request, defer the next request/link instead.
This also has the following advantages:

- removes semi-duplicated code
- doesn't allocate memory for shadows
- works better if only the head is marked for drain
- doesn't need complex synchronisation

On the flip side, it removes the shadow->seq ==
last_drain_in_in_link->seq optimization. That shouldn't be a common
case, and can always be added back, if needed.

Fixes: 4fe2c963154c ("io_uring: add support for link with drain")
Cc: Jackie Liu <liuyun01@kylinos.cn>
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

007d7c0a

io_uring: close lookup gap for dependent next work · f4e3d2b8

Authored by Jens Axboe on Nov 20, 2019

to #26323578

commit b76da70fc3759df13e0991706451f1a2e06ba19e upstream.

When we find new work to process within the work handler, we queue the
linked timeout before we have issued the new work. This can be
problematic for very short timeouts, as we have a window where the new
work isn't visible.

Allow the work handler to store a callback function for this in the work
item, and flag it with IO_WQ_WORK_CB if the caller has done so. If that
is set, then io-wq will call the callback when it has set up the new work
item.
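The ordering this commit describes can be modelled in a few lines. This is only a sketch in Python, not the kernel code: the `Work` class, the `linked_timeout` attribute, the event names, and the bit value chosen for `IO_WQ_WORK_CB` are all assumptions made for illustration. It shows the handler storing a callback instead of queueing the timeout itself, and the worker invoking it only after the new work has been set up.

```python
IO_WQ_WORK_CB = 1 << 0  # illustrative bit value, not taken from the kernel


class Work:
    """Hypothetical stand-in for an io-wq work item."""

    def __init__(self, name, next_work=None, linked_timeout=False):
        self.name = name
        self.next_work = next_work
        self.linked_timeout = linked_timeout
        self.flags = 0
        self.cb = None


def handler(work, events):
    """Run one work item and hand back dependent work, if any."""
    events.append(("run", work.name))
    nxt = work.next_work
    if nxt is not None and nxt.linked_timeout:
        # Do not queue the timeout here; store a callback for the worker.
        nxt.cb = lambda w: events.append(("arm_timeout", w.name))
        nxt.flags |= IO_WQ_WORK_CB
    return nxt


def run_worker(work, events):
    """Worker loop: publish the new work first, then call the callback."""
    while work is not None:
        nxt = handler(work, events)
        if nxt is not None:
            events.append(("visible", nxt.name))  # new work is now set up
            if nxt.flags & IO_WQ_WORK_CB:
                nxt.cb(nxt)
        work = nxt
    return events


read = Work("read", linked_timeout=True)
write = Work("write", next_work=read)
print(run_worker(write, []))
```

In this model the "visible" event always precedes "arm_timeout", which is the window the commit closes for very short timeouts.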

Reported-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

f4e3d2b8

io_uring: allow finding next link independent of req reference count · d98ccaf3

Authored by Jens Axboe on Nov 20, 2019

to #26323578

commit 4d7dd462971405c65bfb3821dbb6b9ce13b5e8d6 upstream.

We currently try and start the next link when we put the request, and
only if we were going to free it. This means that the optimization to
continue executing requests from the same context often fails, as we're
not putting the final reference.

Add REQ_F_LINK_NEXT to keep track of this, and allow io_uring to find the
next request more efficiently.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

d98ccaf3

io_uring: io_allocate_scq_urings() should return a sane state · 4e56986f

Authored by Jens Axboe on Nov 20, 2019

to #26323578

commit eb065d301e8c83643367bdb0898becc364046bda upstream.

We currently rely on the ring destroy to clean things up in case of
failure, but io_allocate_scq_urings() can leave things half initialized
if only parts of it fail.

Be nice and either return with everything set up on success, or return
an error with things nicely cleaned up.

Reported-by: syzbot+0d818c0d39399188f393@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

4e56986f

io_uring: Always REQ_F_FREE_SQE for allocated sqe · f9d30ea7

Authored by Pavel Begunkov on Nov 19, 2019

to #26323578

commit bbad27b2f622fa26d107f8a72c0cd5cc102dc56e upstream.

Always mark requests with allocated sqe and deallocate it in
__io_free_req(). It's easier to follow and doesn't add edge cases.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

f9d30ea7

io_uring: io_fail_links() should only consider first linked timeout · 5234d231

Authored by Jens Axboe on Nov 19, 2019

to #26323578

commit 5d960724b0cb0d12469d1c62912e4a8c09c9fd92 upstream.

We currently clear the linked timeout field if we cancel such a timeout,
but we should only attempt to cancel if it's the first one we see.
Others should simply be freed like other requests, as they haven't
been started yet.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

5234d231

io_uring: Fix leaking linked timeouts · 76e1b540

Authored by Pavel Begunkov on Nov 19, 2019

to #26323578

commit 09fbb0a83ec6ab5a4037766261c031151985fff6 upstream.

Let's have a dependent link: REQ -> LINK_TIMEOUT -> LINK_TIMEOUT

1. submission stage: submission references for REQ and LINK_TIMEOUT
are dropped. So, references respectively (1,1,2)

2. io_put(REQ) + FAIL_LINKS stage: calls io_fail_links(), which for all
linked timeouts will call cancel_timeout() and drop 1 reference.
So, references after: (0,0,1). That's a leak.

Make it treat only the first linked timeout as such, and pass others
through __io_double_put_req().
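The reference arithmetic in the two stages can be replayed with a small model. This is a sketch of the counting only, written in Python under the assumption that each request starts with two references; the function name `fail_links` and its `fixed` switch are hypothetical and merely mirror the before and after behaviour described here.

```python
def fail_links(refs, fixed):
    """Drop references for the timeouts linked behind a failed REQ.

    refs holds the reference counts of the linked timeouts, in link order.
    Returns the counts left after io_fail_links()-style processing.
    """
    remaining = []
    for index, count in enumerate(refs):
        if fixed and index > 0:
            # Later timeouts were never started: drop both references,
            # as __io_double_put_req() would.
            count -= 2
        else:
            # Treated as an armed linked timeout: cancel it, drop one reference.
            count -= 1
        remaining.append(count)
    return remaining


# After the submission stage the chain REQ -> LINK_TIMEOUT -> LINK_TIMEOUT
# holds (1, 1, 2) references; putting REQ takes the first count to 0.
timeouts_after_submit = [1, 2]
print(fail_links(timeouts_after_submit, fixed=False))
print(fail_links(timeouts_after_submit, fixed=True))
```

With the old behaviour the second timeout is left holding one reference, which is the leak; with the fix every count reaches zero.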

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

76e1b540

io_uring: remove redundant check · 965269a5

Authored by Pavel Begunkov on Nov 19, 2019

to #26323578

commit f70193d6d8cad4cc614223fef349e6ea9d48c61f upstream.

Pass any IORING_OP_LINK_TIMEOUT request further, where it will
eventually fail in io_issue_sqe().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

965269a5

io_uring: break links for failed defer · a162e056

Authored by Pavel Begunkov on Nov 19, 2019

to #26323578

commit d3b35796b1e3f118017491d621f624e0de7ff9fb upstream.

If io_req_defer() failed, it needs to cancel a dependent link.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

a162e056

io-wq: remove extra space characters · 76333268

Authored by Dan Carpenter on Nov 19, 2019

to #26323578

commit b2e9c7d64b7ecacc1d0f15a6af88a73cab7d8db9 upstream.

These lines are indented an extra space character.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

76333268

io_uring: request cancellations should break links · 653a3c13

Authored by Jens Axboe on Nov 18, 2019

to #26323578

commit fba38c272a0385148935d6443cb9dc68cf1f37a7 upstream.

We currently don't explicitly break links if a request is cancelled, but
we should. Add explicit link breakage for all types of request
cancellations that we support.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

653a3c13

io_uring: correct poll cancel and linked timeout expiration completion · 95cecacf

Authored by Jens Axboe on Nov 18, 2019

to #26323578

commit b0dd8a412699afe3420a08f841333f3474ad45c5 upstream.

Currently a poll request fills a completion entry of 0, even if it got
cancelled. This is odd, and it makes it harder to support with chains.
Ensure that it returns -ECANCELED in the completion events if it got
cancelled, and furthermore ensure that the linked timeout that triggered
it completes with -ETIME if we did indeed trigger the completions
through a timeout.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

95cecacf

io_uring: remove dead REQ_F_SEQ_PREV flag · 468927d3

Authored by Jens Axboe on Nov 15, 2019

to #26323578

commit e0e328c4b330712e45ba799dc589bda751323110 upstream.

With the conversion to io-wq, we no longer use that flag. Kill it.

Fixes: 561fb04a6a22 ("io_uring: replace workqueue usage with io-wq")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

468927d3

io_uring: fix sequencing issues with linked timeouts · 9f421418

Authored by Jens Axboe on Nov 14, 2019

to #26323578

commit 94ae5e77a9150a8c6c57432e2db290c6868ddfad upstream.

We have an issue with timeout links that are deeper in the submit chain,
because we only handle it upfront, not from later submissions. Move the
prep + issue of the timeout link to the async work prep handler, and do
it normally for non-async queue. If we validate and prepare the timeout
links upfront when we first see them, there's nothing stopping us from
supporting any sort of nesting.

Fixes: 2665abfd757f ("io_uring: add support for linked SQE timeouts")
Reported-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

9f421418

io_uring: make req->timeout be dynamically allocated · 2daf4b5c

Authored by Jens Axboe on Nov 15, 2019

to #26323578

commit ad8a48acc23cb13cbf4332ebabb867b1baa81842 upstream.

There are a few reasons for this:

- As a prep to improving the linked timeout logic
- io_timeout is the biggest member in the io_kiocb opcode union

This also enables a few cleanups, like unifying the timer setup between
IORING_OP_TIMEOUT and IORING_OP_LINK_TIMEOUT, and not needing multiple
arguments to the link/prep helpers.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

2daf4b5c

io_uring: make io_double_put_req() use normal completion path · a9a99776

Authored by Jens Axboe on Nov 14, 2019

to #26323578

commit 978db57e2c329fc612ff669cab9bf0007efd3ca3 upstream.

If we don't use the normal completion path, we may skip killing links
that should be errored and freed. Add __io_double_put_req() for use
within the completion path itself; other calls should just use
io_double_put_req().

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

a9a99776

io_uring: cleanup return values from the queueing functions · 94453214

Authored by Jens Axboe on Nov 14, 2019

to #26323578

commit 0e0702dac26b282603261f04a62711a2d9aac17b upstream.

__io_queue_sqe(), io_queue_sqe(), io_queue_link_head() all return 0/err,
but the caller doesn't care since the errors are handled inline. Clean
these up and just make them void.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

94453214

io_uring: io_async_cancel() should pass in 'nxt' request pointer · a5a701a4

Authored by Jens Axboe on Nov 14, 2019

to #26323578

commit 95a5bbae05ef1ec1cceb8c1b04a482aa0b7c177c upstream.

If we have a linked request, this enables us to pass it back directly
without having to go through async context.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

a5a701a4

io_uring: make POLL_ADD/POLL_REMOVE scale better · f7860a3c

Authored by Jens Axboe on Nov 14, 2019

to #26323578

commit eac406c61cd0ec8fe7970ca46ddf23e40a86b579 upstream.

One of the obvious use cases for these commands is networking, where
it's not uncommon to have tons of sockets open and polled for. The
current implementation uses a list for insertion and lookup, which works
fine for file based use cases where the count is usually low, but it
breaks down somewhat for a higher number of files / sockets. A test case
with 30k sockets being polled for and cancelled takes:

real    0m6.968s
user    0m0.002s
sys     0m6.936s

with the patch it takes:

real    0m0.233s
user    0m0.010s
sys     0m0.176s

If you go to 50k sockets, it gets even more abysmal with the current
code:

real    0m40.602s
user    0m0.010s
sys     0m40.555s

with the patch it takes:

real    0m0.398s
user    0m0.000s
sys     0m0.341s

The change is pretty straightforward: just replace the cancel_list with
a red/black tree instead.
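A rough way to see why the list scales poorly is to count how many entries a lookup has to examine. The sketch below is Python, not kernel code, and it uses a binary search over sorted integer keys as a stand-in for the red/black tree; the key values and both function names are assumptions for illustration only.

```python
def list_lookup_steps(keys, target):
    """Linear scan, as with a plain list: count entries examined."""
    for steps, key in enumerate(keys, start=1):
        if key == target:
            return steps
    return len(keys)


def ordered_lookup_steps(sorted_keys, target):
    """Ordered lookup (stand-in for a balanced tree): count comparisons."""
    lo, hi, steps = 0, len(sorted_keys), 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_keys[mid] == target:
            return steps
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return steps


pending = list(range(30000))  # stand-in keys for 30k pending poll requests
print(list_lookup_steps(pending, 29999), ordered_lookup_steps(pending, 29999))
```

Cancelling every request repeats the lookup once per request, so the per-lookup cost difference is multiplied by the number of sockets.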

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

f7860a3c

io-wq: remove now redundant struct io_wq_nulls_list · f24ee8ad

Authored by Jens Axboe on Nov 14, 2019

to #26323578

commit 021d1cdda3875bf35edac9133335f622d7910abc upstream.

Since we don't iterate these lists anymore after commit:

e61df66c69b1 ("io-wq: ensure free/busy list browsing see all items")

we don't need to retain the nulls value we use for them. That means it's
pretty pointless to wrap the hlist_nulls_head in a structure, so get rid
of it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

f24ee8ad

io-wq: ensure free/busy list browsing see all items · ed0788d3

Authored by Jens Axboe on Nov 13, 2019

to #26323578

commit e61df66c69b11bc050d233dc95714a6339192c28 upstream.

We have two lists for workers in io-wq, a busy and a free list. For
certain operations we want to browse all workers, and we currently do
that by browsing the two separate lists. But since these lists are RCU
protected, we can potentially miss workers if they move between the two
lists while we're browsing them.

Add a third list, all_list, that simply holds all workers. A worker is
added to that list when it starts, and removed when it exits. This makes
the worker iteration cleaner, too.
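The missed-worker case can be reproduced deterministically in a toy model. The following is a Python sketch with hypothetical names (`browse_free_and_busy`, `pick_up_work`, the worker labels); it replaces real concurrency with an explicit callback that fires between the two passes, which is an assumption made to keep the example repeatable.

```python
def browse_free_and_busy(free, busy, concurrent_move):
    """Walk the busy list, then the free list, with a move in between."""
    seen = list(busy)      # first pass: busy list
    concurrent_move()      # a worker changes state between the two passes
    seen.extend(free)      # second pass: free list
    return seen


free, busy = ["worker-1"], ["worker-2"]
all_list = free + busy     # membership changes only on worker start/exit


def pick_up_work():
    # worker-1 leaves the free list and joins the busy list mid-walk.
    free.remove("worker-1")
    busy.append("worker-1")


missed = set(all_list) - set(browse_free_and_busy(free, busy, pick_up_work))
print(sorted(missed))
```

Walking `all_list` instead is unaffected by the move, because a worker stays on that list for its whole lifetime.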

Reported-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

ed0788d3

io-wq: ensure we have a stable view of ->cur_work for cancellations · 63209d2f

Authored by Jens Axboe on Nov 13, 2019

to #26323578

commit 36c2f9223e84c1aa84bfba90cb2e74b517c92a54 upstream.

worker->cur_work is currently protected by the lock of the wqe that the
worker belongs to. When we send a signal to a worker, we need a stable
view of ->cur_work, so we need to hold that lock. But this doesn't work
so well, since we have the opposite order potentially on queueing work.
If POLL_ADD is used with a signalfd, then io_poll_wake() is called with
the signal lock, and that sometimes needs to insert work items.

Add a specific worker lock that protects the current work item. Then we
can guarantee that the task we're sending a signal is currently
processing the exact work we think it is.

Reported-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

63209d2f

io_uring: Fix getting file for non-fd opcodes · d6fd7772

Authored by Pavel Begunkov on Nov 14, 2019

to #26323578

commit a320e9fa1e2680116d165b9369dfa41d7cc1e1d1 upstream.

For timeout requests and a bunch of others, io_uring tries to grab a file
with the specified fd, which is usually stdin/fd=0.
Update io_op_needs_file().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

d6fd7772

io_uring: introduce req_need_defer() · 54ad42f9

Authored by Bob Liu on Nov 13, 2019

to #26323578

commit 9d858b21483981db9c0cb4b184d4cdeb4bc525c2 upstream.

Makes the code easier to read.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

54ad42f9

io_uring: clean up io_uring_cancel_files() · 6ecd8586

Authored by Bob Liu on Nov 13, 2019

to #26323578

commit 2f6d9b9d6357ede64a29437676884ee263039910 upstream.

We don't use the return value anymore, drop it. Also drop the
unnecessary double cancel_req value check.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

6ecd8586

26 May, 2020 1 commit

alinux: sched: Fix regression caused by nr_uninterruptible · 1607a485

Authored by Yihao Wu on May 26, 2020

fix #27788368

Per-cgroup nr_uninterruptible tracking leads to a huge performance
regression in hackbench. This patch deletes the nr_uninterruptible related
code for now, to address the performance regression.

Fixes: 9410d314 ("alinux: cpuacct: Export nr_running & nr_uninterruptible")
Fixes: 36da4fe9 ("alinux: sched: Maintain "nr_uninterruptible" in runqueue")
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Acked-by: Shanpei Chen <shanpeic@linux.alibaba.com>

1607a485

18 May, 2020 4 commits

configs/x86: add some NET_EMATCH options as module · 1d6103ae

Authored by Dust Li on May 18, 2020

to #27793353

The following configs are set to 'm' to make x86 the same
as aarch64.
  CONFIG_NET_EMATCH_CMP=m
  CONFIG_NET_EMATCH_NBYTE=m
  CONFIG_NET_EMATCH_U32=m
  CONFIG_NET_EMATCH_META=m
  CONFIG_NET_EMATCH_TEXT=m

Some commonly used cases need CONFIG_NET_EMATCH_CMP,
for example:

tc qdisc add dev eth0 root handle 1: prio bands 4
tc qdisc add dev eth0 parent 1:4 handle 40: netem delay 20ms 2ms
tc filter add dev eth0 parent 1: protocol ip prio 4 basic match
			       "cmp(u16 at 2 layer transport eq 3306)
                            and cmp(u8 at 16 layer network eq 10)
                            and cmp(u8 at 17 layer network eq 0)
                            and cmp(u8 at 18 layer network eq 200)
                            and cmp(u8 at 19 layer network eq 45)" flowid 1:4
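To make the five `cmp` terms concrete: bytes 16 to 19 of an IPv4 header hold the destination address and bytes 2 to 3 of a TCP header hold the destination port, so the filter selects traffic to 10.0.200.45 port 3306. The Python sketch below evaluates the same comparisons on hand-built headers. It assumes big-endian reads and an IPv4 header without options, and the helper names (`cmp_u8`, `cmp_u16`, `ipv4_header`, `tcp_header`) are made up for illustration rather than taken from tc.

```python
import socket
import struct


def cmp_u8(layer, offset, value):
    """Compare one byte at the given offset within a header."""
    return layer[offset] == value


def cmp_u16(layer, offset, value):
    """Compare a big-endian 16-bit field at the given offset."""
    return struct.unpack_from("!H", layer, offset)[0] == value


def filter_matches(network, transport):
    """Evaluate the same five comparisons as the tc filter above."""
    return (cmp_u16(transport, 2, 3306)
            and cmp_u8(network, 16, 10)
            and cmp_u8(network, 17, 0)
            and cmp_u8(network, 18, 200)
            and cmp_u8(network, 19, 45))


def ipv4_header(src, dst):
    """Minimal 20-byte IPv4 header (no options, checksum left at zero)."""
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, 6, 0,
                       socket.inet_aton(src), socket.inet_aton(dst))


def tcp_header(sport, dport):
    """Source and destination port followed by zeroed remaining fields."""
    return struct.pack("!HH", sport, dport) + bytes(16)


print(filter_matches(ipv4_header("192.0.2.1", "10.0.200.45"),
                     tcp_header(40000, 3306)))
```

Packets that match are steered to flowid 1:4, where the netem qdisc from the second command applies its delay.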

Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

1d6103ae

configs/x86: align x86 NET_SCH configs to aarch64 · 0093b9dd

Authored by Dust Li on May 18, 2020

to #27778669

Aligned NET_SCHED configs with aarch64, except:

1. CONFIG_NET_SCH_ATM is not enabled since we don't use
   ATM on cloud, and CONFIG_ATM is not enabled
2. CONFIG_NET_SCH_DEFAULT is not set since we still use
   pfifo_fast as the default scheduler.
3. CONFIG_NET_SCH_FQ_CODEL set to 'm' since we don't use
   fq_codel as the default qdisc

Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

0093b9dd

alinux: sched: make SCHED_SLI dependent on FAIR_GROUP_SCHED · 1e0cec0b

Authored by Yihao Wu on May 09, 2020

fix #27497611

The sched SLI feature relies heavily on CFS group scheduling, so we add
"depends on FAIR_GROUP_SCHED" in Kconfig to avoid build issues where
FAIR_GROUP_SCHED is not turned on.

Suggested-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Acked-by: Shanpei Chen <shanpeic@linux.alibaba.com>

1e0cec0b

configs: aarch64: keep uniform configs between ARM and X86 · bc372163

Authored by Shile Zhang on May 15, 2020

to #27182371

1. build mouse driver as module;
2. disable RT_GROUP_SCHED;
3. set HZ=250.

Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Suggested-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

bc372163

15 May, 2020 3 commits

ipmi: fix hung processes in __get_guid() · f0845c9e

Authored by Wen Yang on Apr 03, 2020

fix #27563995

commit 32830a0534700f86366f371b150b17f0f0d140d7 upstream.

The wait_event() function is used to detect command completion.
When send_guid_cmd() returns an error, smi_send() has not been
called to send data. Therefore, wait_event() should not be used
on the error path, otherwise it will cause the following warning:

[ 1361.588808] systemd-udevd   D    0  1501   1436 0x00000004
[ 1361.588813]  ffff883f4b1298c0 0000000000000000 ffff883f4b188000 ffff887f7e3d9f40
[ 1361.677952]  ffff887f64bd4280 ffffc90037297a68 ffffffff8173ca3b ffffc90000000010
[ 1361.767077]  00ffc90037297ad0 ffff887f7e3d9f40 0000000000000286 ffff883f4b188000
[ 1361.856199] Call Trace:
[ 1361.885578]  [<ffffffff8173ca3b>] ? __schedule+0x23b/0x780
[ 1361.951406]  [<ffffffff8173cfb6>] schedule+0x36/0x80
[ 1362.010979]  [<ffffffffa071f178>] get_guid+0x118/0x150 [ipmi_msghandler]
[ 1362.091281]  [<ffffffff810d5350>] ? prepare_to_wait_event+0x100/0x100
[ 1362.168533]  [<ffffffffa071f755>] ipmi_register_smi+0x405/0x940 [ipmi_msghandler]
[ 1362.258337]  [<ffffffffa0230ae9>] try_smi_init+0x529/0x950 [ipmi_si]
[ 1362.334521]  [<ffffffffa022f350>] ? std_irq_setup+0xd0/0xd0 [ipmi_si]
[ 1362.411701]  [<ffffffffa0232bd2>] init_ipmi_si+0x492/0x9e0 [ipmi_si]
[ 1362.487917]  [<ffffffffa0232740>] ? ipmi_pci_probe+0x280/0x280 [ipmi_si]
[ 1362.568219]  [<ffffffff810021a0>] do_one_initcall+0x50/0x180
[ 1362.636109]  [<ffffffff812231b2>] ? kmem_cache_alloc_trace+0x142/0x190
[ 1362.714330]  [<ffffffff811b2ae1>] do_init_module+0x5f/0x200
[ 1362.781208]  [<ffffffff81123ca8>] load_module+0x1898/0x1de0
[ 1362.848069]  [<ffffffff811202e0>] ? __symbol_put+0x60/0x60
[ 1362.913886]  [<ffffffff8130696b>] ? security_kernel_post_read_file+0x6b/0x80
[ 1362.998514]  [<ffffffff81124465>] SYSC_finit_module+0xe5/0x120
[ 1363.068463]  [<ffffffff81124465>] ? SYSC_finit_module+0xe5/0x120
[ 1363.140513]  [<ffffffff811244be>] SyS_finit_module+0xe/0x10
[ 1363.207364]  [<ffffffff81003c04>] do_syscall_64+0x74/0x180
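The control flow that the description calls for (wait only when the command was actually sent) can be sketched as below. This is a Python illustration, not the driver code: `send_guid_cmd` is passed in as a callable, and `wait_for_guid` is a hypothetical stand-in for the `wait_event()` call.

```python
def get_guid(send_guid_cmd, wait_for_guid):
    """Return the GUID, or None when the command could not be sent."""
    rv = send_guid_cmd()
    if rv != 0:
        # Nothing was sent, so no response will ever arrive: do not wait.
        return None
    return wait_for_guid()


waits = []
print(get_guid(lambda: -5, lambda: waits.append("waited")), waits)
```

On the error path the wait callable is never invoked, so the caller cannot end up blocked on a completion that will not happen.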

Fixes: 50c812b2 ("[PATCH] ipmi: add full sysfs support")
Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
Cc: Corey Minyard <minyard@acm.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: openipmi-developer@lists.sourceforge.net
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org # 2.6.17-
Message-Id: <20200403090408.58745-1-wenyang@linux.alibaba.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>

f0845c9e

configs: enable support for TCP_RT · 42c27c4f

Authored by xuanzhuo on May 15, 2020

to #26353046

Acked-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: xuanzhuo <xuanzhuo@linux.alibaba.com>

42c27c4f

alinux: add tcprt framework to kernel · f84e8fa0

Authored by xuanzhuo on May 11, 2020

to #26353046

TcpRT: Instrument and Diagnostic Analysis System for Service Quality
of Cloud Databases at Massive Scale in Real-time.

It can also provide information for all request/response services, such
as HTTP requests.

This is the kernel framework for tcprt; further work requires tcprt
module support.

The TcpRt module should call tcp_unregitsert_rt before rmmod.

TcpRt hooks are called on socket init, data receive, data send, packet
ACK, and socket destroy. The private data is saved in
icsk->icsk_tcp_rt_priv.

Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: xuanzhuo <xuanzhuo@linux.alibaba.com>

f84e8fa0

14 May, 2020 3 commits

alinux: quota: fix unused label warning in dquot_load_quota_inode() · 4dc24f04

Authored by Jeffle Xu on May 14, 2020

fix #27211210

Fix the compile warning caused by the unused label 'out' since
commit ec6880e8 ("new helper: lookup_positive_unlocked()").

Fixes: ec6880e8 ("new helper: lookup_positive_unlocked()")
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

4dc24f04

alinux: mm: fix undefined reference to printk_ratelimit_state · 58713685

Authored by Xu Yu on May 09, 2020

fix #27508738

The variable printk_ratelimit_state is not defined if CONFIG_PRINTK is
not set, but is directly accessed in mm/oom_kill.c without considering
the config.

Consider CONFIG_PRINTK when accessing printk_ratelimit_state in
mm/oom_kill.c.

Fixes: 41a1a935 ("alinux: oom: add ratelimit printk to prevent softlockup")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Reviewed-by: zhongjiang-ali <zhongjiang-ali@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>

58713685

alinux: mm: fix undefined reference to mlock_fixup · 79e7c57a

Authored by Xu Yu on May 09, 2020

fix #27508674

The function mlock_fixup is not defined if CONFIG_MMU is not set, but is
directly invoked by mm/unevictable.c without considering the config.

Make unevictable.o depend on mmu-$(CONFIG_MMU), which is where
mlock_fixup is defined.

Fixes: 7d6cb94f ("alinux: mm: Pin code section of process in memory")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Acked-by: Xunlei Pang <xlpang@linux.alibaba.com>

79e7c57a

11 May, 2020 1 commit

configs: enable multipath for kernel selftests · 9ed6dd62

Authored by Joseph Qi on May 11, 2020

fix #27497636

Enable ip route multipath and dm multipath, for consistency with the arm
and physical kconfigs.

Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>

9ed6dd62

08 May, 2020 2 commits

mm: return zero_resv_unavail optimization · b68e2875

Authored by Pavel Tatashin on Oct 26, 2018

to #26809468

commit ec393a0f014eaf688a3dbe8c8a4cbb52d7f535f9 upstream.

When checking for valid pfns in zero_resv_unavail(), it is not necessary
to verify that pfns within pageblock_nr_pages ranges are valid, only the
first one needs to be checked.  This is because memory for pages is
allocated in contiguous chunks that contain pageblock_nr_pages struct
pages.
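The idea of checking only the first pfn of each pageblock can be sketched as follows. This is a Python model under stated assumptions, not the kernel loop: `PAGEBLOCK_NR_PAGES` is set to 512 purely for illustration, `block_is_valid` is a hypothetical stand-in for `pfn_valid()`, and "zeroing" is represented by collecting pfn numbers.

```python
PAGEBLOCK_NR_PAGES = 512  # assumed value for illustration


def zero_range(start_pfn, end_pfn, block_is_valid):
    """Collect the pfns to zero in [start_pfn, end_pfn).

    Validity is checked once per pageblock, on its first pfn, instead of
    once per pfn. Returns (zeroed_pfns, number_of_validity_checks).
    """
    zeroed = []
    checks = 0
    pfn = start_pfn
    while pfn < end_pfn:
        block_start = pfn - pfn % PAGEBLOCK_NR_PAGES
        block_end = block_start + PAGEBLOCK_NR_PAGES
        checks += 1
        if block_is_valid(block_start):
            zeroed.extend(range(pfn, min(block_end, end_pfn)))
        pfn = block_end  # jump straight to the next pageblock
    return zeroed, checks


# Pretend the second pageblock (pfns 512..1023) has no struct pages.
pfns, checks = zero_range(500, 1100, lambda first: first // PAGEBLOCK_NR_PAGES != 1)
print(len(pfns), checks)
```

For a range of 600 pfns this model performs three validity checks rather than 600, which is the saving the commit restores.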

Link: http://lkml.kernel.org/r/20181002143821.5112-3-msys.mizuma@gmail.com
Signed-off-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Shile Zhang <shile.zhang@linux.alibaba.com>

b68e2875

mm: zero remaining unavailable struct pages · f8521831

Authored by Naoya Horiguchi on Oct 26, 2018

to #26809468

commit 907ec5fca3dc38d37737de826f06f25b063aa08e upstream.

Patch series "mm: Fix for movable_node boot option", v3.

This patch series contains a fix for the movable_node boot option issue
which was introduced by commit 124049de ("x86/e820: put !E820_TYPE_RAM
regions into memblock.reserved").

The commit breaks the option because it changed the memory gap range to
reserved memblock.  So, the node is marked as Normal zone even if the SRAT
has Hot pluggable affinity.

First and second patch fix the original issue which the commit tried to
fix, then revert the commit.

This patch (of 3):

There is a kernel panic that is triggered when reading /proc/kpageflags on
the kernel booted with kernel parameter 'memmap=nn[KMG]!ss[KMG]':

  BUG: unable to handle kernel paging request at fffffffffffffffe
  PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0
  Oops: 0000 [#1] SMP PTI
  CPU: 2 PID: 1728 Comm: page-types Not tainted 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 04/01/2014
  RIP: 0010:stable_page_flags+0x27/0x3c0
  Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 89 fc 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 c4 01 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7
  RSP: 0018:ffffbbd44111fde0 EFLAGS: 00010202
  RAX: fffffffffffffffe RBX: 00007fffffffeff9 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: 0000000000000202 RDI: ffffed1182fff5c0
  RBP: ffffffffffffffff R08: 0000000000000001 R09: 0000000000000001
  R10: ffffbbd44111fed8 R11: 0000000000000000 R12: ffffed1182fff5c0
  R13: 00000000000bffd7 R14: 0000000002fff5c0 R15: ffffbbd44111ff10
  FS:  00007efc4335a500(0000) GS:ffff93a5bfc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: fffffffffffffffe CR3: 00000000b2a58000 CR4: 00000000001406e0
  Call Trace:
   kpageflags_read+0xc7/0x120
   proc_reg_read+0x3c/0x60
   __vfs_read+0x36/0x170
   vfs_read+0x89/0x130
   ksys_pread64+0x71/0x90
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7efc42e75e23
  Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 90 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24

According to kernel bisection, this problem became visible due to commit
f7f99100 which changes how struct pages are initialized.

Memblock layout affects the pfn ranges covered by node/zone.  Consider
that we have a VM with 2 NUMA nodes and each node has 4GB memory, and the
default (no memmap= given) memblock layout is like below:

  MEMBLOCK configuration:
   memory size = 0x00000001fff75c00 reserved size = 0x000000000300c000
   memory.cnt  = 0x4
   memory[0x0]     [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
   memory[0x1]     [0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0
   memory[0x2]     [0x0000000100000000-0x000000013fffffff], 0x0000000040000000 bytes on node 0 flags: 0x0
   memory[0x3]     [0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0
   ...

If you give memmap=1G!4G (so it just covers memory[0x2]),
the range [0x100000000-0x13fffffff] is gone:

  MEMBLOCK configuration:
   memory size = 0x00000001bff75c00 reserved size = 0x000000000300c000
   memory.cnt  = 0x3
   memory[0x0]     [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
   memory[0x1]     [0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0
   memory[0x2]     [0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0
   ...

This shrinks node 0's pfn range, because it is calculated from the
address range of memblock.memory.  So some of the struct pages in the gap
range are left uninitialized.
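The shrinkage can be checked with the numbers from the two MEMBLOCK dumps. The Python sketch below assumes 4 KiB pages (`PAGE_SHIFT = 12`) and derives a node's pfn span as the minimum start and maximum end of its regions; the function name `node_pfn_span` and the tuple layout are illustrative choices, not kernel interfaces.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages, as on x86-64

# (start, inclusive end, node) triples copied from the two dumps above.
DEFAULT_LAYOUT = [
    (0x0000000000001000, 0x000000000009efff, 0),
    (0x0000000000100000, 0x00000000bffd6fff, 0),
    (0x0000000100000000, 0x000000013fffffff, 0),
    (0x0000000140000000, 0x000000023fffffff, 1),
]
# With memmap=1G!4G the region starting at 4G is no longer in memblock.memory.
MEMMAP_LAYOUT = [r for r in DEFAULT_LAYOUT if r[0] != 0x100000000]


def node_pfn_span(regions, node):
    """Return (start_pfn, end_pfn) spanned by a node's memory regions."""
    starts = [start for start, _end, nid in regions if nid == node]
    ends = [end for _start, end, nid in regions if nid == node]
    return min(starts) >> PAGE_SHIFT, (max(ends) + 1) >> PAGE_SHIFT


print([hex(pfn) for pfn in node_pfn_span(DEFAULT_LAYOUT, 0)])
print([hex(pfn) for pfn in node_pfn_span(MEMMAP_LAYOUT, 0)])
```

Under these assumptions node 0's end pfn drops from 0x140000 to 0xbffd7, so the struct pages between those two pfns fall outside any node span.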

We have a function zero_resv_unavail() which zeroes the struct pages
outside memblock.memory, but currently it covers only the reserved
unavailable range (i.e.  memblock.reserved && !memblock.memory).  This
patch extends it to cover all unavailable ranges, which fixes the reported
issue.

Link: http://lkml.kernel.org/r/20181002143821.5112-2-msys.mizuma@gmail.com
Fixes: f7f99100 ("mm: stop zeroing memory during allocation in vmemmap")
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Tested-by: Oscar Salvador <osalvador@suse.de>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Shile Zhang <shile.zhang@linux.alibaba.com>

f8521831
