1. 12 December 2019 (1 commit)
  2. 10 December 2019 (1 commit)
  3. 07 December 2019 (4 commits)
  4. 06 December 2019 (2 commits)
    • Merge branch 'io_uring-5.5' into for-linus · 85394299
      Committed by Jens Axboe
      * io_uring-5.5:
        io_uring: fix a typo in a comment
        io_uring: hook all linked requests via link_list
        io_uring: fix error handling in io_queue_link_head
        io_uring: use hash table for poll command lookups
      85394299
    • block: fix memleak of bio integrity data · ece841ab
      Committed by Justin Tee
      Commit 7c20f116 ("bio-integrity: stop abusing bi_end_io") moved
      bio_integrity_free() from bio_uninit() to bio_integrity_verify_fn()
      and bio_endio(). That is wrong because a bio may be freed without
      bio_endio() ever being called; for example, blk_rq_unprep_clone()
      is called from dm_mq_queue_rq() when the underlying queue of
      dm-mpath is busy.

      So commit 7c20f116 leaks the bio integrity data.

      Fix this by re-adding bio_integrity_free() to bio_uninit().
      
      Fixes: 7c20f116 ("bio-integrity: stop abusing bi_end_io")
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Justin Tee <justin.tee@broadcom.com>

      Add commit log, and simplify/fix the original patch written by Justin.
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ece841ab
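      A minimal standalone C model of the fix (hypothetical types and
      names, not the actual block-layer code): freeing the optional
      integrity payload in the bio's own uninit routine means every free
      path releases it, whether or not bio_endio() ever ran.

          #include <stdlib.h>

          /* Hypothetical stand-ins for struct bio and its integrity payload. */
          struct bio_integrity_payload { unsigned char *bip_buf; };

          struct bio {
                  struct bio_integrity_payload *bi_integrity; /* may be NULL */
          };

          static void bio_integrity_free(struct bio *bio)
          {
                  if (!bio->bi_integrity)
                          return;
                  free(bio->bi_integrity->bip_buf);
                  free(bio->bi_integrity);
                  bio->bi_integrity = NULL;
          }

          /* Freeing here covers every free path, including bios torn down
           * without completing (e.g. a clone released by blk_rq_unprep_clone()). */
          static void bio_uninit(struct bio *bio)
          {
                  bio_integrity_free(bio);
          }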
  5. 05 December 2019 (11 commits)
    • io_uring: fix a typo in a comment · 0b4295b5
      Committed by LimingWu
      thatn -> than.
      Signed-off-by: Liming Wu <19092205@suning.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      0b4295b5
    • bfq-iosched: Ensure bio->bi_blkg is valid before using it · 08802ed6
      Committed by Hou Tao
      bio->bi_blkg will be NULL when the request was issued while
      bypassing the block layer, as shown in the following oops:
      
       Internal error: Oops: 96000005 [#1] SMP
       CPU: 17 PID: 2996 Comm: scsi_id Not tainted 5.4.0 #4
       Call trace:
        percpu_counter_add_batch+0x38/0x4c8
        bfqg_stats_update_legacy_io+0x9c/0x280
        bfq_insert_requests+0xbac/0x2190
        blk_mq_sched_insert_request+0x288/0x670
        blk_execute_rq_nowait+0x140/0x178
        blk_execute_rq+0x8c/0x140
        sg_io+0x604/0x9c0
        scsi_cmd_ioctl+0xe38/0x10a8
        scsi_cmd_blk_ioctl+0xac/0xe8
        sd_ioctl+0xe4/0x238
        blkdev_ioctl+0x590/0x20e0
        block_ioctl+0x60/0x98
        do_vfs_ioctl+0xe0/0x1b58
        ksys_ioctl+0x80/0xd8
        __arm64_sys_ioctl+0x40/0x78
        el0_svc_handler+0xc4/0x270
      
      So ensure that it is valid before using it.
      
      Fixes: fd41e603 ("bfq-iosched: stop using blkg->stat_bytes and ->stat_ios")
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      08802ed6
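      The shape of the fix, as a hedged standalone C sketch (hypothetical
      types; the real patch guards the bfqg stats update in the kernel):

          #include <stddef.h>

          struct blkg { long bytes; long ios; };
          struct bio  { struct blkg *bi_blkg; long bi_size; };

          static void stats_update(struct bio *bio)
          {
                  /* Requests issued around the block layer (e.g. SG_IO) can
                   * arrive with bi_blkg == NULL, so bail out early. */
                  if (!bio->bi_blkg)
                          return;

                  bio->bi_blkg->bytes += bio->bi_size;
                  bio->bi_blkg->ios++;
          }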
    • io_uring: hook all linked requests via link_list · 4493233e
      Committed by Pavel Begunkov
      Links are created by chaining requests through req->list, with the
      exception that the head uses req->link_list (e.g.
      link_list->list->list). Because of that, io_req_link_next() needs
      complex splicing to advance.

      Hook them all through link_list instead; it is also simpler and
      more consistent.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      4493233e
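      The idea, modeled in standalone C with a minimal circular
      doubly-linked list (hypothetical names, not the io_uring code):
      when the head and the members share the same kind of hook,
      advancing a link is a constant-time "take the next entry" with no
      splicing.

          #include <stddef.h>
          #include <stdio.h>

          struct list_head { struct list_head *next, *prev; };

          static void list_init(struct list_head *h) { h->next = h->prev = h; }

          static void list_add_tail(struct list_head *n, struct list_head *h)
          {
                  n->prev = h->prev;
                  n->next = h;
                  h->prev->next = n;
                  h->prev = n;
          }

          struct req {
                  int id;
                  struct list_head link_list; /* same hook for head and members */
          };

          int main(void)
          {
                  struct req head = { .id = 0 }, a = { .id = 1 }, b = { .id = 2 };

                  list_init(&head.link_list);
                  list_add_tail(&a.link_list, &head.link_list);
                  list_add_tail(&b.link_list, &head.link_list);

                  /* Walk the chain: each step is just ->next. */
                  for (struct list_head *p = head.link_list.next;
                       p != &head.link_list; p = p->next) {
                          struct req *r = (struct req *)((char *)p -
                                          offsetof(struct req, link_list));
                          printf("next in link: %d\n", r->id);
                  }
                  return 0;
          }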
    • io_uring: fix error handling in io_queue_link_head · 2e6e1fde
      Committed by Pavel Begunkov
      In case of an error, io_submit_sqe() drops the request and
      continues without it, even if the request was part of a link. Not
      only does this fail to cancel the link, it may also execute the
      wrong sequence of actions.

      Stop consuming sqes and let the user handle errors.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      2e6e1fde
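      A sketch of the general principle in standalone C (hypothetical
      types; the actual patch stops the submission loop on error): if the
      head of a chain fails, complete the whole chain with an error
      instead of silently dropping one request from the middle of it.

          /* Each request knows the next link and how to complete itself. */
          struct req {
                  struct req *next;
                  void (*complete)(struct req *, int);
          };

          static void fail_link(struct req *head, int err)
          {
                  while (head) {
                          struct req *next = head->next;

                          head->complete(head, err); /* e.g. err = -ECANCELED */
                          head = next;
                  }
          }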
    • io_uring: use hash table for poll command lookups · 78076bb6
      Committed by Jens Axboe
      We recently changed this from a single list to an rbtree, but for
      some real-life workloads the rbtree slows down the
      submission/insertion case enough that it becomes the top cycle
      consumer on the io_uring side. In testing, a hash table is a
      better-rounded compromise: it is fast for insertion and, as long as
      it's sized appropriately, it works well for the cancellation case
      too. Running TAO with a lot of network sockets, this change stops
      io_poll_req_insert() from consuming 2% of the CPU cycles.
      Reported-by: Dan Melnic <dmm@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      78076bb6
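      A standalone C sketch of the pattern (hypothetical names; the table
      size and hash function are illustrative): O(1) insertion into a
      bucket, and short chains for cancellation lookups as long as the
      table roughly matches the number of live requests.

          #include <stddef.h>
          #include <stdint.h>

          #define HASH_BITS 8
          #define HASH_SIZE (1u << HASH_BITS)

          struct poll_req {
                  uint64_t key;          /* what cancellation looks up by */
                  struct poll_req *next; /* bucket chain */
          };

          static struct poll_req *table[HASH_SIZE];

          static unsigned int hash(uint64_t key)
          {
                  return (unsigned int)((key * 0x9E3779B97F4A7C15ull)
                                        >> (64 - HASH_BITS));
          }

          /* Insertion: prepend to the bucket, O(1). */
          static void poll_insert(struct poll_req *req)
          {
                  unsigned int b = hash(req->key);

                  req->next = table[b];
                  table[b] = req;
          }

          /* Cancellation lookup: walk one (short) chain. */
          static struct poll_req *poll_lookup(uint64_t key)
          {
                  struct poll_req *r;

                  for (r = table[hash(key)]; r; r = r->next)
                          if (r->key == key)
                                  return r;
                  return NULL;
          }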
    • io-wq: clear node->next on list deletion · 08bdcc35
      Committed by Jens Axboe
      If someone removes a node from a list, and then later adds it back to
      a list, we can have invalid data in ->next. This can cause all sorts
      of issues. One such use case is the IORING_OP_POLL_ADD command, which
      will do just that if we race and get woken twice without any pending
      events. This is a pretty rare case, but can happen under extreme loads.
      Dan reports that he saw the following crash:
      
      BUG: kernel NULL pointer dereference, address: 0000000000000000
      PGD d283ce067 P4D d283ce067 PUD e5ca04067 PMD 0
      Oops: 0002 [#1] SMP
      CPU: 17 PID: 10726 Comm: tao:fast-fiber Kdump: loaded Not tainted 5.2.9-02851-gac7bc042d2d1 #116
      Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A17 05/03/2019
      RIP: 0010:io_wqe_enqueue+0x3e/0xd0
      Code: 34 24 74 55 8b 47 58 48 8d 6f 50 85 c0 74 50 48 89 df e8 35 7c 75 00 48 83 7b 08 00 48 8b 14 24 0f 84 84 00 00 00 48 8b 4b 10 <48> 89 11 48 89 53 10 83 63 20 fe 48 89 c6 48 89 df e8 0c 7a 75 00
      RSP: 0000:ffffc90006858a08 EFLAGS: 00010082
      RAX: 0000000000000002 RBX: ffff889037492fc0 RCX: 0000000000000000
      RDX: ffff888e40cc11a8 RSI: ffff888e40cc11a8 RDI: ffff889037492fc0
      RBP: ffff889037493010 R08: 00000000000000c3 R09: ffffc90006858ab8
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff888e40cc11a8
      R13: 0000000000000000 R14: 00000000000000c3 R15: ffff888e40cc1100
      FS:  00007fcddc9db700(0000) GS:ffff88903fa40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 0000000e479f5003 CR4: 00000000007606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       <IRQ>
       io_poll_wake+0x12f/0x2a0
       __wake_up_common+0x86/0x120
       __wake_up_common_lock+0x7a/0xc0
       sock_def_readable+0x3c/0x70
       tcp_rcv_established+0x557/0x630
       tcp_v6_do_rcv+0x118/0x3c0
       tcp_v6_rcv+0x97e/0x9d0
       ip6_protocol_deliver_rcu+0xe3/0x440
       ip6_input+0x3d/0xc0
       ? ip6_protocol_deliver_rcu+0x440/0x440
       ipv6_rcv+0x56/0xd0
       ? ip6_rcv_finish_core.isra.18+0x80/0x80
       __netif_receive_skb_one_core+0x50/0x70
       netif_receive_skb_internal+0x2f/0xa0
       napi_gro_receive+0x125/0x150
       mlx5e_handle_rx_cqe+0x1d9/0x5a0
       ? mlx5e_poll_tx_cq+0x305/0x560
       mlx5e_poll_rx_cq+0x49f/0x9c5
       mlx5e_napi_poll+0xee/0x640
       ? smp_reschedule_interrupt+0x16/0xd0
       ? reschedule_interrupt+0xf/0x20
       net_rx_action+0x286/0x3d0
       __do_softirq+0xca/0x297
       irq_exit+0x96/0xa0
       do_IRQ+0x54/0xe0
       common_interrupt+0xf/0xf
       </IRQ>
      RIP: 0033:0x7fdc627a2e3a
      Code: 31 c0 85 d2 0f 88 f6 00 00 00 55 48 89 e5 41 57 41 56 4c 63 f2 41 55 41 54 53 48 83 ec 18 48 85 ff 0f 84 c7 00 00 00 48 8b 07 <41> 89 d4 49 89 f5 48 89 fb 48 85 c0 0f 84 64 01 00 00 48 83 78 10
      
      when running a networked workload with about 5000 sockets being polled
      for. Fix this by clearing node->next when the node is being removed from
      the list.
      
      Fixes: 6206f0e1 ("io-wq: shrink io_wq_work a bit")
      Reported-by: Dan Melnic <dmm@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      08bdcc35
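      The fix, modeled as a standalone C singly-linked list (hypothetical
      names, mirroring the io-wq list): clear ->next at removal time so a
      node that is later re-queued cannot drag a stale pointer into the
      new list.

          #include <stddef.h>

          struct wq_node { struct wq_node *next; };
          struct wq_list { struct wq_node *first, *last; };

          static void wq_list_del(struct wq_list *list, struct wq_node *node,
                                  struct wq_node *prev)
          {
                  if (prev)
                          prev->next = node->next;
                  else
                          list->first = node->next;
                  if (list->last == node)
                          list->last = prev;
                  node->next = NULL; /* the fix: no stale pointer on re-use */
          }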
    • io_uring: ensure deferred timeouts copy necessary data · 2d28390a
      Committed by Jens Axboe
      If we defer a timeout, we should ensure that we copy the timespec
      when we have consumed the sqe. This is similar to commit f67676d1
      for read/write requests. We already did this correctly for timeouts
      deferred as links, but do it generally and use the infrastructure added
      by commit 1a6b74fc instead of having the timeout deferral use its
      own.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      2d28390a
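      The rule behind the fix, in a hedged standalone C sketch
      (hypothetical types): SQE memory may be reused as soon as
      submission returns, so a deferred timeout must copy the timespec
      into request-owned storage at prep time.

          #include <stdlib.h>
          #include <time.h>

          struct sqe { struct timespec ts; };  /* ring-owned, reused */
          struct req { struct timespec *ts; }; /* request-owned copy */

          static int prep_deferred_timeout(struct req *req,
                                           const struct sqe *sqe)
          {
                  req->ts = malloc(sizeof(*req->ts));
                  if (!req->ts)
                          return -1;
                  *req->ts = sqe->ts; /* copy now; sqe->ts may change later */
                  return 0;
          }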
    • io_uring: allow IO_SQE_* flags on IORING_OP_TIMEOUT · 901e59bb
      Committed by Jens Axboe
      There's really no reason why we forbid things like link/drain etc on
      regular timeout commands. Enable the usual SQE flags on timeouts.
      Reported-by: 李通洲 <carter.li@eoitek.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      901e59bb
    • null_blk: remove unused variable warning on !CONFIG_BLK_DEV_ZONED · bca1c43c
      Committed by Jens Axboe
      If BLK_DEV_ZONED isn't set, 'ret' isn't used. This makes gcc complain,
      rightfully. Move ret where it is used.
      
      Fixes: 979d5447 ("null_blk: cleanup null_gendisk_register")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      bca1c43c
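      The usual cure for this class of warning, in a standalone C sketch
      (stubs and names are hypothetical, not the null_blk code): declare
      the variable in the narrowest scope that uses it, so the build
      without the config option never sees it.

          /* Stubs so the sketch compiles on its own. */
          int setup_zones(void) { return 0; }
          void add_disk(void) { }

          #define CONFIG_BLK_DEV_ZONED /* comment out to model the other build */

          int register_disk(int zoned)
          {
          #ifdef CONFIG_BLK_DEV_ZONED
                  if (zoned) {
                          /* 'ret' lives only where it is used, so the
                           * !CONFIG_BLK_DEV_ZONED build never sees it. */
                          int ret = setup_zones();

                          if (ret)
                                  return ret;
                  }
          #else
                  (void)zoned;
          #endif
                  add_disk();
                  return 0;
          }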
    • brd: warn on un-aligned buffer · f1acbf21
      Committed by Ming Lei
      The queue dma alignment limit requires block layer users (fs,
      target, ...) to pass aligned buffers.

      So far brd doesn't support un-aligned buffers, even though
      supporting them would be easy.

      However, brd is often used for debugging, and there are other
      drivers that can't support un-aligned buffers either.

      So add a warning so that brd users know what to fix.
      Reported-by: Stephen Rust <srust@blockbridge.com>
      Cc: Stephen Rust <srust@blockbridge.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      f1acbf21
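      The check itself is cheap: test the buffer address against the
      alignment mask. A hedged standalone C sketch (the driver would use
      the kernel's WARN_ON_ONCE; names here are illustrative):

          #include <stdint.h>
          #include <stdio.h>

          #define DMA_ALIGN_MASK 0x1ffu /* e.g. 512-byte alignment - 1 */

          static void check_buffer_alignment(const void *buf)
          {
                  if ((uintptr_t)buf & DMA_ALIGN_MASK)
                          fprintf(stderr,
                                  "brd: un-aligned buffer %p, fix the caller\n",
                                  buf);
          }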
    • brd: remove max_hw_sectors queue limit · 36582a5a
      Committed by Ming Lei
      We now depend on blk_queue_split() to enforce most queue limits
      (the one possible exception being dma alignment); however,
      blk_queue_split() isn't used for brd, so this limit hasn't been
      respected since v4.3.

      The max_hw_sectors limit also doesn't play a big role for brd; it
      has been there since brd was first added to the tree, for no known
      reason.

      So remove it.
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      36582a5a
  6. 04 December 2019 (3 commits)
    • xen/blkback: Avoid unmapping unmapped grant pages · f9bd84a8
      Committed by SeongJae Park
      For each I/O request, blkback first maps the foreign pages for the
      request to its local pages.  If an allocation of a local page for the
      mapping fails, it should unmap every mapping already made for the
      request.
      
      However, blkback's handling mechanism for the allocation failure does
      not mark the remaining foreign pages as unmapped.  Therefore, the unmap
      function merely tries to unmap every valid grant page for the request,
      including the pages not mapped due to the allocation failure.  On a
      system that fails the allocation frequently, this problem leads to
      the following kernel crash.
      
        [  372.012538] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
        [  372.012546] IP: [<ffffffff814071ac>] gnttab_unmap_refs.part.7+0x1c/0x40
        [  372.012557] PGD 16f3e9067 PUD 16426e067 PMD 0
        [  372.012562] Oops: 0002 [#1] SMP
        [  372.012566] Modules linked in: act_police sch_ingress cls_u32
        ...
        [  372.012746] Call Trace:
        [  372.012752]  [<ffffffff81407204>] gnttab_unmap_refs+0x34/0x40
        [  372.012759]  [<ffffffffa0335ae3>] xen_blkbk_unmap+0x83/0x150 [xen_blkback]
        ...
        [  372.012802]  [<ffffffffa0336c50>] dispatch_rw_block_io+0x970/0x980 [xen_blkback]
        ...
        Decompressing Linux... Parsing ELF... done.
        Booting the kernel.
        [    0.000000] Initializing cgroup subsys cpuset
      
      This commit fixes the problem by marking the grant pages of the
      given request that weren't mapped due to the allocation failure as
      invalid.
      
      Fixes: c6cc142d ("xen-blkback: use balloon pages for all mappings")
      Reviewed-by: David Woodhouse <dwmw@amazon.de>
      Reviewed-by: Maximilian Heyne <mheyne@amazon.de>
      Reviewed-by: Paul Durrant <pdurrant@amazon.co.uk>
      Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: SeongJae Park <sjpark@amazon.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      f9bd84a8
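      The invariant the fix restores, in a standalone C model
      (hypothetical names and sentinel value): the unmap path may only
      touch slots that were actually mapped, so an allocation failure
      must mark the remaining slots invalid before bailing out.

          #include <stdlib.h>

          #define INVALID_HANDLE (-1)

          struct page_slot { int handle; };

          /* Stub for a local page allocation that can fail. */
          static void *alloc_local_page(void) { return malloc(64); }

          static int map_request(struct page_slot *slots, int nsegs)
          {
                  for (int i = 0; i < nsegs; i++) {
                          void *p = alloc_local_page();

                          if (!p) {
                                  /* The fix: mark everything not yet mapped. */
                                  for (int j = i; j < nsegs; j++)
                                          slots[j].handle = INVALID_HANDLE;
                                  return -1;
                          }
                          free(p);             /* toy model: only the outcome matters */
                          slots[i].handle = i; /* pretend the grant map succeeded */
                  }
                  return 0;
          }

          static void unmap_request(struct page_slot *slots, int nsegs)
          {
                  for (int i = 0; i < nsegs; i++) {
                          if (slots[i].handle == INVALID_HANDLE)
                                  continue; /* never mapped: skip, don't crash */
                          /* a gnttab-style unmap of slots[i].handle goes here */
                  }
          }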
    • io_uring: handle connect -EINPROGRESS like -EAGAIN · 87f80d62
      Committed by Jens Axboe
      Right now we return it to userspace, which means the application has
      to poll for the socket to be writeable. Let's just treat it like
      -EAGAIN and have io_uring handle it internally; this makes it much
      easier to use.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      87f80d62
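      The mapping is a one-liner in spirit: a non-blocking connect that
      returns -EINPROGRESS is in the same "retry when writeable" state
      that io_uring already handles for -EAGAIN. A hedged sketch
      (hypothetical wrapper, not the kernel code):

          #include <errno.h>

          static int issue_connect(int (*do_connect)(void *), void *args)
          {
                  int ret = do_connect(args);

                  if (ret == -EINPROGRESS)
                          ret = -EAGAIN; /* internal poll-and-retry takes over */
                  return ret;
          }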
    • block: set the zone size in blk_revalidate_disk_zones atomically · 6c6b3549
      Committed by Christoph Hellwig
      The current zone revalidation code has a major problem in that it
      doesn't update the zone size and q->nr_zones atomically, leading
      to a short window where an out of bounds access to the zone arrays
      is possible.
      
      To fix this, move the setting of the zone size into the critical
      section of blk_revalidate_disk_zones so that it gets updated
      together with the zone bitmaps and q->nr_zones.  This also slightly
      simplifies the caller, as it deduces the zone size from the
      report_zones results.

      This change also allows generic code to check for a power-of-two
      zone size.
      Reported-by: Hans Holmberg <hans@owltronix.com>
      Reviewed-by: Javier González <javier@javigon.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      6c6b3549
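      The race, modeled in standalone C with a mutex standing in for the
      revalidation critical section (hypothetical names; the kernel uses
      its own exclusion, not a pthread lock): readers compute zone
      indexes from the zone size and bound them with nr_zones, so both
      must change together with the bitmaps, never one before the other.

          #include <pthread.h>

          struct queue {
                  pthread_mutex_t lock;    /* stands in for the critical section */
                  unsigned long zone_size; /* sectors per zone */
                  unsigned int nr_zones;
                  unsigned long *seq_zones_bitmap;
          };

          /* Updating zone_size together with nr_zones and the bitmap closes
           * the window where a reader computes an out-of-bounds zone index. */
          static void revalidate_zones(struct queue *q, unsigned long zone_size,
                                       unsigned int nr_zones,
                                       unsigned long *bitmap)
          {
                  pthread_mutex_lock(&q->lock);
                  q->zone_size = zone_size;
                  q->nr_zones = nr_zones;
                  q->seq_zones_bitmap = bitmap;
                  pthread_mutex_unlock(&q->lock);
          }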
  7. 03 December 2019 (18 commits)