- 09 April 2020, 1 commit
-
-
Committed by Pavel Begunkov

If completion queue overflow occurs, __io_cqring_fill_event() will update req->cflags, which is in a union with req->work and happens to be aliased to req->work.fs. A following io_free_req() -> io_req_work_drop_env() may then hit a number of different problems (miscounted fs->users, segfault, etc.) when cleaning @fs.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
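The hazard is plain union aliasing; here is a standalone, compilable sketch of the bug class (struct and field names are made up for illustration, not the kernel's actual layout):

    #include <stdio.h>

    /* Two logically unrelated fields share storage through a union,
     * so writing one clobbers the other. */
    struct work {
        void *fs;               /* cleaned up on request teardown */
    };

    struct req {
        union {
            struct work work;       /* valid while queued */
            unsigned int cflags;    /* valid on completion */
        };
    };

    int main(void)
    {
        struct req r = { .work = { .fs = (void *)0x1000 } };

        r.cflags = 42;  /* the overflow path updates cflags... */
        /* ...and the cleanup path now reads a corrupted pointer */
        printf("work.fs is now %p\n", r.work.fs);
        return 0;
    }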
-
- 08 April 2020, 6 commits
-
-
Committed by Pavel Begunkov

Don't re-read the userspace-shared sqe->flags; it can be exploited. sqe->flags are copied into req->flags in io_submit_sqe(), so check them there instead.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
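This is the classic double-fetch rule: snapshot the shared field once, validate the private copy, and never consult the shared memory again. A minimal compilable sketch (names and the valid-flags mask are illustrative):

    #include <stdint.h>
    #include <stdio.h>

    struct shared_sqe { volatile uint32_t flags; }; /* userspace-writable */
    struct kernel_req { uint32_t flags; };          /* kernel-private    */

    static int submit_one(struct shared_sqe *sqe, struct kernel_req *req)
    {
        req->flags = sqe->flags;    /* single snapshot */
        if (req->flags & ~0x3fu)    /* validate the snapshot... */
            return -1;
        /* ...and only ever use req->flags from here on; re-reading
         * sqe->flags would let userspace change it after the check */
        return 0;
    }

    int main(void)
    {
        struct shared_sqe sqe = { .flags = 1 };
        struct kernel_req req;

        printf("%d\n", submit_one(&sqe, &req));
        return 0;
    }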
-
Committed by Pavel Begunkov

io_get_req() does two different things: io_kiocb allocation and initialisation. Move the init part out of it and rename it to io_alloc_req(). It's simpler this way and also has better data locality.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov

As io_get_sqe() is split into a two-stage get/consume, get an sqe before allocating the io_kiocb, so no free_req*() is needed for the failure case; also inline __io_req_do_free() back, since it has only one user.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov

Make io_get_sqring() care only about sqes themselves, not about initialising the io_kiocb. Also split it into get + consume; that will be helpful in the future.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Xiaoguang Wang

In io_read_prep() or io_write_prep(), io_req_map_rw() takes struct io_async_rw's fast_iov as an argument to call io_import_iovec(), and if io_import_iovec() uses fast_iov as the valid iovec array, io_req_map_rw() later does not need to do the memcpy, because they are the same pointers.

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe

OPENAT2 correctly sets O_LARGEFILE if it has to, but that escaped the OPENAT opcode. Dmitry reports that his test case, which compares openat() and IORING_OP_OPENAT, sees failures on large files:

*** sync openat
openat succeeded
sync write at offset 0
write succeeded
sync write at offset 4294967296
write succeeded

*** sync openat
openat succeeded
io_uring write at offset 0
write succeeded
io_uring write at offset 4294967296
write succeeded

*** io_uring openat
openat succeeded
sync write at offset 0
write succeeded
sync write at offset 4294967296
write failed: File too large

*** io_uring openat
openat succeeded
io_uring write at offset 0
write succeeded
io_uring write at offset 4294967296
write failed: File too large

Ensure we set O_LARGEFILE if force_o_largefile() is true.

Cc: stable@vger.kernel.org # v5.6
Fixes: 15b71abe ("io_uring: add support for IORING_OP_OPENAT")
Reported-by: Dmitry Kadashev <dkadashev@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
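The shape of the fix, as the message describes it, is a small flag adjustment in the OPENAT prep path. A minimal sketch with a hypothetical helper name (force_o_largefile() and O_LARGEFILE are real kernel symbols; the helper is not the verbatim patch):

    /* Augment the caller's open flags with O_LARGEFILE when the
     * architecture forces it, mirroring what openat2 already does. */
    static inline int io_openat_fix_flags(int flags)
    {
        if (force_o_largefile())
            flags |= O_LARGEFILE;
        return flags;
    }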
-
- 07 April 2020, 2 commits
-
-
Committed by Xiaoguang Wang

syzbot reports the warning below:

INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 1 PID: 7099 Comm: syz-executor897 Not tainted 5.6.0-next-20200406-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x188/0x20d lib/dump_stack.c:118
 assign_lock_key kernel/locking/lockdep.c:913 [inline]
 register_lock_class+0x1664/0x1760 kernel/locking/lockdep.c:1225
 __lock_acquire+0x104/0x4e00 kernel/locking/lockdep.c:4223
 lock_acquire+0x1f2/0x8f0 kernel/locking/lockdep.c:4923
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
 _raw_spin_lock_irqsave+0x8c/0xbf kernel/locking/spinlock.c:159
 io_sqe_files_register fs/io_uring.c:6599 [inline]
 __io_uring_register+0x1fe8/0x2f00 fs/io_uring.c:8001
 __do_sys_io_uring_register fs/io_uring.c:8081 [inline]
 __se_sys_io_uring_register fs/io_uring.c:8063 [inline]
 __x64_sys_io_uring_register+0x192/0x560 fs/io_uring.c:8063
 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
 entry_SYSCALL_64_after_hwframe+0x49/0xb3
RIP: 0033:0x440289
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffff1bbf558 EFLAGS: 00000246 ORIG_RAX: 00000000000001ab
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440289
RDX: 0000000020000280 RSI: 0000000000000002 RDI: 0000000000000003
RBP: 00000000006ca018 R08: 0000000000000000 R09: 00000000004002c8
R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000401b10
R13: 0000000000401ba0 R14: 0000000000000000 R15: 0000000000000000

Initialize struct fixed_file_data's lock to fix this issue.

Reported-by: syzbot+e6eeca4a035da76b3065@syzkaller.appspotmail.com
Fixes: 05589553 ("io_uring: refactor file register/unregister/update handling")
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
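The bug class here — zeroed memory is not an initialized lock — has a direct userspace analog. A compilable sketch (the struct name echoes the commit; pthread_spin_init() stands in for the kernel's spin_lock_init()):

    #include <pthread.h>
    #include <stdlib.h>

    struct fixed_file_data {
        pthread_spinlock_t lock;
        /* ... other fields ... */
    };

    int main(void)
    {
        struct fixed_file_data *data = calloc(1, sizeof(*data));

        if (!data)
            return 1;
        /* The fix: every dynamically allocated lock needs an explicit
         * init before first use (spin_lock_init() in the kernel). */
        pthread_spin_init(&data->lock, PTHREAD_PROCESS_PRIVATE);

        pthread_spin_lock(&data->lock);     /* now safe (and lockdep-clean) */
        pthread_spin_unlock(&data->lock);

        pthread_spin_destroy(&data->lock);
        free(data);
        return 0;
    }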
-
Committed by Colin Ian King

An earlier commit, "io_uring: remove @nxt from handlers", removed the setting of the pointer nxt, so it is now always null; hence the non-null check and the call to io_wq_assign_next are redundant and can be removed.

Addresses-Coverity: ("'Constant' variable guard")
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 06 April 2020, 1 commit
-
-
Committed by Pavel Begunkov

If io_get_req() fails, it drops a ref. Then, while keeping @submitted unmodified, io_submit_sqes() breaks the loop and puts (@nr - @submitted) refs. For each submitted req a ref is dropped in io_put_req() and friends. So, for @nr taken refs there will be (@nr - @submitted + @submitted + 1) dropped. Remove ctx refcounting from io_get_req(); that at the same time makes it clearer.

Fixes: 2b85edfc ("io_uring: batch getting pcpu references")
Cc: stable@vger.kernel.org # v5.6
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
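The arithmetic is easy to lose track of in prose; a toy model of the accounting described above makes the off-by-one concrete:

    #include <stdio.h>

    /* @nr refs are taken in a batch; the buggy io_get_req() dropped one
     * more on failure, the exit path drops (@nr - @submitted), and each
     * submitted request drops its own -- one put too many in total. */
    int main(void)
    {
        int nr = 8, submitted = 3;
        int refs = nr;              /* batched grab up front */

        refs -= 1;                  /* extra put inside failing io_get_req() */
        refs -= nr - submitted;     /* loop-exit put for unsubmitted sqes */
        refs -= submitted;          /* each submitted req puts one ref */

        printf("refs = %d (expected 0; -1 means underflow)\n", refs);
        return 0;
    }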
-
- 04 April 2020, 5 commits
-
-
Committed by Bijan Mottahedeh

A request that completes with an -EAGAIN result after it has been added to the poll list will not be removed from that list in io_do_iopoll(), because f_op->iopoll() will not succeed for that request. Maintain a retryable local list similar to the done list, and explicitly reissue requests with an -EAGAIN result.

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe

We already checked this limit when the file was opened, and we keep it open in the file table. Hence when we added unix_inflight to the count we want to register, we're doubly accounting these files. This results in -EMFILE for file registration if we're at half the limit.

Cc: stable@vger.kernel.org # v5.1+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe

If the original task is (or has) exited, then the task work will not get queued properly. Allow the io-wq manager task to queue this work for execution, and ensure that the io-wq manager notices and runs this work if woken up (or exiting).

Reported-by: Dan Melnic <dmm@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe

We can have a task exit if it's not the owner of the ring. Be safe and grab an actual reference to it, to avoid a potential use-after-free.

Reported-by: Dan Melnic <dmm@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe

If we get woken and the poll doesn't match our mask, re-add the task to the poll waitqueue and try again, instead of completing the request with a mask of 0.

Reported-by: Dan Melnic <dmm@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 01 April 2020, 1 commit
-
-
Committed by Hillf Danton

Add it to pair with prepare_to_wait() in an attempt to avoid anything weird in the field.

Fixes: b41e9852 ("io_uring: add per-task callback handler")
Reported-by: syzbot+0c3370f235b74b3cfd97@syzkaller.appspotmail.com
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
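For context, the standard counterpart to prepare_to_wait() is finish_wait(), and the canonical pairing looks like this (a generic kernel-idiom sketch, not the io_uring code itself — wq and condition are placeholders):

    /* Every path out of the loop must run finish_wait() so the task is
     * removed from the waitqueue and left in TASK_RUNNING. */
    DEFINE_WAIT(wait);

    for (;;) {
        prepare_to_wait(&wq, &wait, TASK_INTERRUPTIBLE);
        if (condition)
            break;
        schedule();
    }
    finish_wait(&wq, &wait);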
-
- 31 March 2020, 1 commit
-
-
Committed by Xiaoguang Wang

While diving into the io_uring fileset register/unregister/update code, we found a bug in the fileset update handling. The io_uring fileset update uses a percpu_ref variable to check whether we can put the previously registered file: only when the refcount of the percpu_ref reaches zero can we safely put these files. But this doesn't work so well. If applications always issue requests continually, this percpu_ref will never have a chance to reach zero, and it'll always be in atomic mode; this also defeats the gains introduced by the fileset register/unregister/update feature, which are meant to reduce the atomic operation overhead of fput/fget. To fix this issue, when applications do IORING_REGISTER_FILES or IORING_REGISTER_FILES_UPDATE operations, we allocate a new percpu_ref and kill the old one; new requests will use the new percpu_ref. Once all previous old requests complete, the old percpu_refs will be dropped and the registered files put safely.

Link: https://lore.kernel.org/io-uring/5a8dac33-4ca2-4847-b091-f7dcd3ad0ff3@linux.alibaba.com/T/#t
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
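A toy model of the swap scheme, with plain C11 atomics standing in for percpu_ref (compilable; names and structure are illustrative only): registration installs a fresh counter and "kills" the old one, which is freed — and its files become puttable — as soon as its last in-flight user drops, without waiting for a quiesce.

    #include <stdatomic.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct ref {
        atomic_int count;
    };

    static struct ref *ref_new(void)
    {
        struct ref *r = malloc(sizeof(*r));
        atomic_init(&r->count, 1);      /* self reference, dropped by "kill" */
        return r;
    }

    static void ref_put(struct ref *r)
    {
        if (atomic_fetch_sub(&r->count, 1) == 1) {
            printf("last user gone: put the old registered files\n");
            free(r);
        }
    }

    int main(void)
    {
        struct ref *cur = ref_new();

        atomic_fetch_add(&cur->count, 1);   /* one in-flight request */

        /* IORING_REGISTER_FILES_UPDATE: new requests use a new counter */
        struct ref *old = cur;
        cur = ref_new();
        ref_put(old);                       /* "kill": drop the self ref */

        ref_put(old);       /* the old in-flight request completes */
        ref_put(cur);       /* teardown of the new counter */
        return 0;
    }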
-
- 28 March 2020, 1 commit
-
-
Committed by Thomas Gleixner

Bit spinlocks are problematic if PREEMPT_RT is enabled, because they disable preemption, which is undesired for latency reasons and breaks when regular spinlocks are taken within the bit_spinlock locked region, because regular spinlocks are converted to 'sleeping spinlocks' on RT. PREEMPT_RT replaced the bit spinlocks with regular spinlocks to avoid this problem. The replacement was done conditionally at compile time, but Christoph requested an unconditional conversion. Jan suggested moving the spinlock into an existing padding hole, which avoids a size increase of struct buffer_head on production kernels. As a benefit the lock gains lockdep coverage.

[ bigeasy: Remove the wrapper and use always spinlock_t and move it into the padding hole ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@infradead.org>
Link: https://lkml.kernel.org/r/20191118132824.rclhrbujqh4b4g4d@linutronix.de
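Jan's padding-hole point can be demonstrated with generic structs (not buffer_head itself; this assumes a 4-byte lock, i.e. spinlock_t without debug options, on an LP64 target):

    #include <stdio.h>

    struct without_lock {
        void *ptr;
        unsigned long state;
        unsigned int count;
        /* 4-byte hole here on LP64: next member is 8-byte aligned */
        void *next;
    };

    struct with_lock {
        void *ptr;
        unsigned long state;
        unsigned int count;
        unsigned int lock;  /* the spinlock slots into the hole */
        void *next;
    };

    int main(void)
    {
        printf("without lock: %zu bytes\n", sizeof(struct without_lock));
        printf("with lock:    %zu bytes\n", sizeof(struct with_lock));
        return 0;   /* both print 32 on typical LP64 targets */
    }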
-
- 27 March 2020, 2 commits
-
-
Committed by Xiaoguang Wang

Clean up io_alloc_async_ctx() a bit and add a new __io_alloc_async_ctx(), so io_setup_async_rw() won't need to check whether async_ctx is true or false again.

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by David Howells

When it's probing all of a fileserver's interfaces to find which one is best to use, afs_do_probe_fileserver() takes a lock on the server record and notes the pointer to the address list. It doesn't, however, pin the address list, so as soon as it drops the lock, there's nothing to stop the address list from being freed under us. Fix this by taking a ref on the address list inside the locked section and dropping it at the end of the function.

Fixes: 3bf0fb6f ("afs: Probe multiple fileservers simultaneously")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 25 March 2020, 4 commits
-
-
Committed by Christoph Hellwig

These macros are only used by a few files. Move them out of genhd.h, which is included everywhere, into a new standalone header.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig

This is bio layer functionality and not related to buffer heads.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Chucheng Luo

The missing 'return' may make it hard for other developers to understand it.

Signed-off-by: Chucheng Luo <luochucheng@vivo.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Damien Le Moal

The write pointer of zones in the read-only condition is defined as invalid by the SCSI ZBC and ATA ZAC specifications. It is thus not possible to determine the correct size of a read-only zone file on mount. Fix this by handling read-only zones in the same manner as offline zones: disable all accesses to the zone (read and write) and initialize the inode size of the read-only zone to 0. For zones found to be in the read-only condition at runtime, only disable write access to the zone and keep the size of the zone file at its last updated value, to allow the user to recover previously written data. Also fix the zonefs documentation file to reflect this change.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
-
- 24 March 2020, 3 commits
-
-
Committed by Christoph Hellwig

There is no good reason for __bdevname to exist. Just open code printing the string in the callers. For three of them the format string can be trivially merged into existing printk statements, and in init/do_mounts.c we can at least do the scnprintf once at the start of the function, unconditionally of CONFIG_BLOCK, to make the output for tiny configs a little more helpful.

Acked-by: Theodore Ts'o <tytso@mit.edu> # for ext4
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Eric Biggers

Reading from a debugfs file at a nonzero position, without first reading at position 0, leaks uninitialized memory to userspace. It's a bit tricky to do this, since lseek() and pread() aren't allowed on these files, and write() doesn't update the position on them. But writing to them with splice() *does* update the position:

    #define _GNU_SOURCE 1
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main()
    {
        int pipes[2], fd, n, i;
        char buf[32];

        pipe(pipes);
        write(pipes[1], "0", 1);
        fd = open("/sys/kernel/debug/fault_around_bytes", O_RDWR);
        splice(pipes[0], NULL, fd, NULL, 1, 0);
        n = read(fd, buf, sizeof(buf));
        for (i = 0; i < n; i++)
            printf("%02x", buf[i]);
        printf("\n");
    }

Output:

    5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a30

Fix the infoleak by making simple_attr_read() always fill simple_attr::get_buf if it hasn't been filled yet.

Reported-by: syzbot+fcab69d1ada3e8d6f06b@syzkaller.appspotmail.com
Reported-by: Alexander Potapenko <glider@google.com>
Fixes: acaefc25 ("[PATCH] libfs: add simple attribute files")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200308023849.988264-1-ebiggers@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Committed by Pavel Begunkov

We always punt async buffered writes to an io-wq helper, as the core kernel does not have IOCB_NOWAIT support for that. Most buffered async writes complete very quickly, as it's just a copy operation. This means that doing multiple locking roundtrips on the shared wqe lock for each buffered write is wasteful. Additionally, buffered writes are hashed work items, which means that any buffered write to a given file is serialized. Keep identically hashed work items contiguous in @wqe->work_list, and track a tail for each hash bucket. On dequeue of a hashed item, splice all of the same hash in one go using the tracked tail. Until the batch is done, the caller doesn't have to synchronize with the wqe or worker locks again.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
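A toy model of the batching idea, in compilable userspace C (this is not io-wq's actual data structure — just the contiguous-run-plus-tail trick): same-hash items stay adjacent, and the per-bucket tail lets a dequeue splice the whole run with two pointer updates.

    #include <stdio.h>

    #define NR_BUCKETS 4

    struct work {
        int hash;
        struct work *next;
    };

    static struct work *head, *list_tail;
    static struct work *hash_tail[NR_BUCKETS];

    static void enqueue(struct work *w)
    {
        w->next = NULL;
        if (hash_tail[w->hash]) {
            /* keep same-hash items contiguous: insert after their tail */
            w->next = hash_tail[w->hash]->next;
            hash_tail[w->hash]->next = w;
        } else if (list_tail) {
            list_tail->next = w;
        } else {
            head = w;
        }
        hash_tail[w->hash] = w;
        if (!w->next)
            list_tail = w;
    }

    static struct work *dequeue_batch(void)
    {
        struct work *first = head, *last;

        if (!first)
            return NULL;
        last = hash_tail[first->hash];  /* splice the whole run in O(1) */
        head = last->next;
        last->next = NULL;
        hash_tail[first->hash] = NULL;
        if (!head)
            list_tail = NULL;
        return first;
    }

    int main(void)
    {
        struct work w[5] = { {1}, {2}, {1}, {1}, {2} };

        for (int i = 0; i < 5; i++)
            enqueue(&w[i]);
        for (struct work *b; (b = dequeue_batch()); printf("batch done\n"))
            for (struct work *p = b; p; p = p->next)
                printf("hash %d\n", p->hash);
        return 0;
    }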
-
- 23 March 2020, 6 commits
-
-
Committed by Hillf Danton

Sync removal of a file is only used in case of a GFP_KERNEL kmalloc failure, at the cost of io_file_put::done and a work flush, while a glitch like that can be handled at the call site without too much pain. That said, what is proposed is to drop sync removal of the file, and the kink in the neck as well.

Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Hillf Danton

A case of task hung was reported by syzbot:

INFO: task syz-executor975:9880 blocked for more than 143 seconds.
      Not tainted 5.6.0-rc6-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor975 D27576  9880   9878 0x80004000
Call Trace:
 schedule+0xd0/0x2a0 kernel/sched/core.c:4154
 schedule_timeout+0x6db/0xba0 kernel/time/timer.c:1871
 do_wait_for_common kernel/sched/completion.c:83 [inline]
 __wait_for_common kernel/sched/completion.c:104 [inline]
 wait_for_common kernel/sched/completion.c:115 [inline]
 wait_for_completion+0x26a/0x3c0 kernel/sched/completion.c:136
 io_queue_file_removal+0x1af/0x1e0 fs/io_uring.c:5826
 __io_sqe_files_update.isra.0+0x3a1/0xb00 fs/io_uring.c:5867
 io_sqe_files_update fs/io_uring.c:5918 [inline]
 __io_uring_register+0x377/0x2c00 fs/io_uring.c:7131
 __do_sys_io_uring_register fs/io_uring.c:7202 [inline]
 __se_sys_io_uring_register fs/io_uring.c:7184 [inline]
 __x64_sys_io_uring_register+0x192/0x560 fs/io_uring.c:7184
 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

and bisect pointed to 05f3fb3c ("io_uring: avoid ring quiesce for fixed file set unregister and update").

It is down to the order in which we wait for the work to be done before flushing it, while nobody is likely to wake us up. We can drop that completion on the stack, as flushing the work is itself the sync operation we need and nothing more is left behind it. To that end, io_file_put::done is re-used to indicate whether it can be freed in the workqueue worker context.

Reported-and-Inspired-by: syzbot <syzbot+538d1957ce178382a394@syzkaller.appspotmail.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>

Rename ->done to ->free_pfile

Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Luis Henriques

kmemleak reports the following memory leak:

unreferenced object 0xffff88821feac8a0 (size 96):
  comm "kworker/1:0", pid 17, jiffies 4294896362 (age 20.512s)
  hex dump (first 32 bytes):
    a0 c8 ea 1f 82 88 ff ff 00 c9 ea 1f 82 88 ff ff  ................
    00 00 00 00 00 00 00 00 00 01 00 00 00 00 ad de  ................
  backtrace:
    [<00000000b3ea77fb>] ceph_get_snapid_map+0x75/0x2a0
    [<00000000d4060942>] fill_inode+0xb26/0x1010
    [<0000000049da6206>] ceph_readdir_prepopulate+0x389/0xc40
    [<00000000e2fe2549>] dispatch+0x11ab/0x1521
    [<000000007700b894>] ceph_con_workfn+0xf3d/0x3240
    [<0000000039138a41>] process_one_work+0x24d/0x590
    [<00000000eb751f34>] worker_thread+0x4a/0x3d0
    [<000000007e8f0d42>] kthread+0xfb/0x130
    [<00000000d49bd1fa>] ret_from_fork+0x3a/0x50

A kfree is missing while looping the 'to_free' list of ceph_snapid_map objects.

Cc: stable@vger.kernel.org
Fixes: 75c9627e ("ceph: map snapid to anonymous bdev ID")
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Ilya Dryomov

CEPH_OSDMAP_FULL/NEARFULL aren't set since mimic, so we need to consult per-pool flags as well. Unfortunately the backwards compatibility here is lacking:

- the change that deprecated OSDMAP_FULL/NEARFULL went into mimic, but was guarded by require_osd_release >= RELEASE_LUMINOUS
- it was subsequently backported to luminous in v12.2.2, but that makes no difference to clients that only check OSDMAP_FULL/NEARFULL, because require_osd_release is not client-facing -- it is for OSDs

Since all kernels are affected, the best we can do here is just start checking both map flags and pool flags and send that to stable. These checks are best effort, so take osdc->lock and look up pool flags just once. Remove the FIXME, since filesystem quotas are checked above and RADOS quotas are reflected in POOL_FLAG_FULL: when the pool reaches its quota, both POOL_FLAG_FULL and POOL_FLAG_FULL_QUOTA are set.

Cc: stable@vger.kernel.org
Reported-by: Yanhu Cao <gmayyyha@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Sage Weil <sage@redhat.com>
-
Committed by Pavel Begunkov

work->data and work->list are shared in a union. io_wq_assign_next() sets ->data if a req has a linked_timeout, but then io-wq may want to use work->list, e.g. to re-enqueue a request, corrupting ->data. ->data is not necessary; just remove it and extract the linked_timeout through @link_list.

Fixes: 60cf46ae ("io-wq: hash dependent work")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov

After io_assign_current_work() of a linked work, it can be decided to offload it to another thread via io_wqe_enqueue(). However, until the next io_assign_current_work() it can be cancelled, which isn't handled. Don't assign it if it's not going to be executed.

Fixes: 60cf46ae ("io-wq: hash dependent work")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 22 March 2020, 1 commit
-
-
Committed by Roman Penyaev

This fixes a possible lost wakeup introduced by commit a218cc49. Originally modifications to ep->wq were serialized by ep->wq.lock, but in commit a218cc49 ("epoll: use rwlock in order to reduce ep_poll_callback() contention") a new rwlock was introduced in order to relax the fd event path, i.e. callers of the ep_poll_callback() function.

After the change, ep_modify and ep_insert (both called on the epoll_ctl() path) were switched to ep->lock, but ep_poll (epoll_wait) was still using ep->wq.lock for wqueue list modification. The bug doesn't lead to any wqueue list corruption, because the wake up path and list modifications were serialized by ep->wq.lock internally, but the actual waitqueue_active() check prior to the wake_up() call can be reordered with modifications of the ep ready list, so the wake up can be lost. And yes, it can be healed by an explicit smp_mb():

    list_add_tail(&epi->rdllink, &ep->rdllist);
    smp_mb();
    if (waitqueue_active(&ep->wq))
        wake_up(&ep->wq);

But let's make it simple: this patch replaces ep->wq.lock with ep->lock for wqueue modifications, so the wake up path always observes the activeness of the wqueue correctly.

Fixes: a218cc49 ("epoll: use rwlock in order to reduce ep_poll_callback() contention")
Reported-by: Max Neunhoeffer <max@arangodb.com>
Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Max Neunhoeffer <max@arangodb.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Christopher Kohlhoff <chris.kohlhoff@clearpool.io>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Jes Sorensen <jes.sorensen@gmail.com>
Cc: <stable@vger.kernel.org> [5.1+]
Link: http://lkml.kernel.org/r/20200214170211.561524-1-rpenyaev@suse.de
References: https://bugzilla.kernel.org/show_bug.cgi?id=205933
Bisected-by: Max Neunhoeffer <max@arangodb.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 21 March 2020, 2 commits
-
-
Committed by Filipe Manana

We are incorrectly dropping the raid56 and raid1c34 incompat flags when there are still raid56 and raid1c34 block groups, not when we no longer have any of them. The logic just got unintentionally broken after adding support for the raid1c34 modes. Fix this by clearing the flags only if we do not have block groups with the respective profiles.

Fixes: 9c907446 ("btrfs: drop incompat bit for raid1c34 after last block group is gone")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
-
Committed by Jens Axboe

With the previous fixes for the number-of-open-files checking, I added some debug code to see if we had other spots where we check rlimit() against the async io-wq workers. The only one I found was file size checking, which we should also honor. During write and fallocate prep, store the max file size and override that for the current ask if we're in io-wq worker context.

Cc: stable@vger.kernel.org # 5.1+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 20 March 2020, 2 commits
-
-
Committed by Jens Axboe

Just like commit 4022e7af, this fixes the fact that IORING_OP_ACCEPT ends up using get_unused_fd_flags(), which checks current->signal->rlim[] for limits. Add an extra argument to __sys_accept4_file() that allows us to pass in the proper nofile limit, and grab it at request prep time.

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe

Dmitry reports that a test case shows that io_uring isn't honoring a modified rlimit nofile setting. get_unused_fd_flags() checks the task's signal->rlim[] for the limits. As this isn't easily inheritable, provide a __get_unused_fd_flags() that takes the value instead. Then we can grab it when the request is prepared (from the original task), and pass that in when we do the async part of the open.

Reported-by: Dmitry Kadashev <dkadashev@gmail.com>
Tested-by: Dmitry Kadashev <dkadashev@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
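A sketch of the split the message describes (signatures inferred from the commit text, so treat them as assumptions rather than the verbatim patch): the old entry point keeps its behaviour by passing the caller's own limit, while io_uring can pass a limit captured at prep time.

    /* Hypothetical shape of the helper split: the value-taking variant
     * lets io_uring pass a nofile limit snapshotted from the submitting
     * task rather than the io-wq worker. */
    int __get_unused_fd_flags(unsigned flags, unsigned long nofile)
    {
        return __alloc_fd(current->files, 0, nofile, flags);
    }

    int get_unused_fd_flags(unsigned flags)
    {
        /* unchanged behaviour: use the current task's own limit */
        return __get_unused_fd_flags(flags, rlimit(RLIMIT_NOFILE));
    }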
-
- 19 March 2020, 1 commit
-
-
Committed by Linus Torvalds

There is a measurable performance impact in some synthetic tests due to commit 6d390e4b (locks: fix a potential use-after-free problem when wakeup a waiter). Fix the race condition instead by clearing the fl_blocker pointer after the wake_up, using explicit acquire/release semantics. This does mean that we can no longer use the clearing of fl_blocker as the wait condition, so switch the waiters over to checking whether the fl_blocked_member list_head is empty.

Reviewed-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Fixes: 6d390e4b (locks: fix a potential use-after-free problem when wakeup a waiter)
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
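The acquire/release idea, reduced to a generic compilable sketch (this is the memory-ordering pattern, not the locks.c code; it runs single-threaded purely to show the pairing): the waker publishes with release semantics after its writes, so a waiter that observes the cleared pointer with acquire semantics is guaranteed to also see everything written before it.

    #include <stdatomic.h>
    #include <stdio.h>

    static _Atomic(void *) fl_blocker;  /* stand-in for the cleared pointer */
    static int woken_data;              /* stand-in for the waker's writes  */

    static void waker(void)
    {
        woken_data = 42;                /* everything the waiter must see... */
        atomic_store_explicit(&fl_blocker, NULL, memory_order_release);
    }

    static int waiter(void)
    {
        /* acquire pairs with the release above: observing NULL implies
         * observing woken_data == 42 */
        while (atomic_load_explicit(&fl_blocker, memory_order_acquire))
            ;                           /* spin until the waker publishes */
        return woken_data;
    }

    int main(void)
    {
        static int token;

        atomic_store(&fl_blocker, &token);
        waker();
        printf("%d\n", waiter());
        return 0;
    }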
-
- 18 March 2020, 1 commit
-
-
Committed by Christoph Hellwig

Historically we only set the capacity to zero for devices that support partitions (independent of actually having partitions created). Doing that is rather inconsistent, but changing it broke legacy udisks polling for legacy ide-cdrom devices. Use a crude check for devices that either are non-removable or partitionable to get the sane behavior for most devices, while not breaking userspace for this particular setup.

Fixes: a1548b67 ("block: move rescan_partitions to fs/block_dev.c")
Reported-by: He Zhe <zhe.he@windriver.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: He Zhe <zhe.he@windriver.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-