1. 09 Apr 2020 (1 commit)
  2. 08 Apr 2020 (6 commits)
  3. 07 Apr 2020 (2 commits)
    • io_uring: initialize fixed_file_data lock · f7fe9346
      Committed by Xiaoguang Wang
      syzbot reports below warning:
      INFO: trying to register non-static key.
      the code is fine but needs lockdep annotation.
      turning off the locking correctness validator.
      CPU: 1 PID: 7099 Comm: syz-executor897 Not tainted 5.6.0-next-20200406-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x188/0x20d lib/dump_stack.c:118
       assign_lock_key kernel/locking/lockdep.c:913 [inline]
       register_lock_class+0x1664/0x1760 kernel/locking/lockdep.c:1225
       __lock_acquire+0x104/0x4e00 kernel/locking/lockdep.c:4223
       lock_acquire+0x1f2/0x8f0 kernel/locking/lockdep.c:4923
       __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
       _raw_spin_lock_irqsave+0x8c/0xbf kernel/locking/spinlock.c:159
       io_sqe_files_register fs/io_uring.c:6599 [inline]
       __io_uring_register+0x1fe8/0x2f00 fs/io_uring.c:8001
       __do_sys_io_uring_register fs/io_uring.c:8081 [inline]
       __se_sys_io_uring_register fs/io_uring.c:8063 [inline]
       __x64_sys_io_uring_register+0x192/0x560 fs/io_uring.c:8063
       do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
       entry_SYSCALL_64_after_hwframe+0x49/0xb3
      RIP: 0033:0x440289
      Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7
      48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
      ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffff1bbf558 EFLAGS: 00000246 ORIG_RAX: 00000000000001ab
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440289
      RDX: 0000000020000280 RSI: 0000000000000002 RDI: 0000000000000003
      RBP: 00000000006ca018 R08: 0000000000000000 R09: 00000000004002c8
      R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000401b10
      R13: 0000000000401ba0 R14: 0000000000000000 R15: 0000000000000000
      
      Initialize struct fixed_file_data's lock to fix this issue.
      
      Reported-by: syzbot+e6eeca4a035da76b3065@syzkaller.appspotmail.com
      Fixes: 05589553 ("io_uring: refactor file register/unregister/update handling")
      Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: remove redundant variable pointer nxt and io_wq_assign_next call · 211fea18
      Committed by Colin Ian King
      An earlier commit, "io_uring: remove @nxt from handlers", removed the
      setting of pointer nxt, so it is now always NULL; hence the non-NULL
      check and the call to io_wq_assign_next are redundant and can be removed.
      
      Addresses-Coverity: ("'Constant' variable guard")
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. 06 Apr 2020 (1 commit)
  5. 04 Apr 2020 (5 commits)
  6. 01 Apr 2020 (1 commit)
  7. 31 Mar 2020 (1 commit)
    • io_uring: refactor file register/unregister/update handling · 05589553
      Committed by Xiaoguang Wang
      While diving into the io_uring fileset register/unregister/update code,
      we found a bug in the fileset update handling. io_uring fileset update
      uses a percpu_ref variable to check whether we can put the previously
      registered files: only when the refcount of the percpu_ref variable
      reaches zero can we safely put these files. But this doesn't work so
      well. If applications issue requests continually, this percpu_ref never
      gets a chance to reach zero and stays in atomic mode forever, which
      defeats the gains of the fileset register/unregister/update feature,
      namely reducing the atomic operation overhead of fget/fput.
      
      To fix this issue, while applications do IORING_REGISTER_FILES or
      IORING_REGISTER_FILES_UPDATE operations, we allocate a new percpu_ref
      and kill the old percpu_ref, new requests will use the new percpu_ref.
      Once all previous old requests complete, old percpu_refs will be dropped
      and registered files will be put safely.
      
      Link: https://lore.kernel.org/io-uring/5a8dac33-4ca2-4847-b091-f7dcd3ad0ff3@linux.alibaba.com/T/#t
      Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  8. 28 Mar 2020 (1 commit)
  9. 27 Mar 2020 (2 commits)
  10. 25 Mar 2020 (4 commits)
  11. 24 Mar 2020 (3 commits)
    • block: remove __bdevname · ea3edd4d
      Committed by Christoph Hellwig
      There is no good reason for __bdevname to exist.  Just open code
      printing the string in the callers.  For three of them the format
      string can be trivially merged into existing printk statements,
      and in init/do_mounts.c we can at least do the scnprintf once at
      the start of the function, independent of CONFIG_BLOCK, to make
      the output for tiny configs a little more helpful.
      
      Acked-by: Theodore Ts'o <tytso@mit.edu> # for ext4
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • libfs: fix infoleak in simple_attr_read() · a65cab7d
      Committed by Eric Biggers
      Reading from a debugfs file at a nonzero position, without first reading
      at position 0, leaks uninitialized memory to userspace.
      
      It's a bit tricky to do this, since lseek() and pread() aren't allowed
      on these files, and write() doesn't update the position on them.  But
      writing to them with splice() *does* update the position:
      
      	#define _GNU_SOURCE 1
      	#include <fcntl.h>
      	#include <stdio.h>
      	#include <unistd.h>
      	int main()
      	{
      		int pipes[2], fd, n, i;
      		char buf[32];
      
      		pipe(pipes);
      		write(pipes[1], "0", 1);
      		fd = open("/sys/kernel/debug/fault_around_bytes", O_RDWR);
      		splice(pipes[0], NULL, fd, NULL, 1, 0);
      		n = read(fd, buf, sizeof(buf));
      		for (i = 0; i < n; i++)
      			printf("%02x", buf[i]);
      		printf("\n");
      	}
      
      Output:
      	5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a30
      
      Fix the infoleak by making simple_attr_read() always fill
      simple_attr::get_buf if it hasn't been filled yet.
      
      Reported-by: syzbot+fcab69d1ada3e8d6f06b@syzkaller.appspotmail.com
      Reported-by: Alexander Potapenko <glider@google.com>
      Fixes: acaefc25 ("[PATCH] libfs: add simple attribute files")
      Cc: stable@vger.kernel.org
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20200308023849.988264-1-ebiggers@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • io-wq: handle hashed writes in chains · 86f3cd1b
      Committed by Pavel Begunkov
      We always punt async buffered writes to an io-wq helper, as the core
      kernel does not have IOCB_NOWAIT support for that. Most buffered async
      writes complete very quickly, as it's just a copy operation. This means
      that doing multiple locking roundtrips on the shared wqe lock for each
      buffered write is wasteful. Additionally, buffered writes are hashed
      work items, which means that any buffered write to a given file is
      serialized.
      
      Keep identically hashed work items contiguous in @wqe->work_list, and
      track a tail for each hash bucket. On dequeue of a hashed item, splice
      all of the same hash in one go using the tracked tail. Until the batch
      is done, the caller doesn't have to synchronize with the wqe or worker
      locks again.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  12. 23 Mar 2020 (6 commits)
    • io-uring: drop 'free_pfile' in struct io_file_put · a5318d3c
      Committed by Hillf Danton
      Sync removal of a file is only used in case of a GFP_KERNEL kmalloc
      failure, at the cost of io_file_put::done and a work flush, while a
      glitch like that can be handled at the call site without too much pain.

      That said, what is proposed is to drop the sync removal of the file,
      and that kink in the neck along with it.
      Signed-off-by: Hillf Danton <hdanton@sina.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io-uring: drop completion when removing file · 4afdb733
      Committed by Hillf Danton
      A case of task hung was reported by syzbot,
      
      INFO: task syz-executor975:9880 blocked for more than 143 seconds.
            Not tainted 5.6.0-rc6-syzkaller #0
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      syz-executor975 D27576  9880   9878 0x80004000
      Call Trace:
       schedule+0xd0/0x2a0 kernel/sched/core.c:4154
       schedule_timeout+0x6db/0xba0 kernel/time/timer.c:1871
       do_wait_for_common kernel/sched/completion.c:83 [inline]
       __wait_for_common kernel/sched/completion.c:104 [inline]
       wait_for_common kernel/sched/completion.c:115 [inline]
       wait_for_completion+0x26a/0x3c0 kernel/sched/completion.c:136
       io_queue_file_removal+0x1af/0x1e0 fs/io_uring.c:5826
       __io_sqe_files_update.isra.0+0x3a1/0xb00 fs/io_uring.c:5867
       io_sqe_files_update fs/io_uring.c:5918 [inline]
       __io_uring_register+0x377/0x2c00 fs/io_uring.c:7131
       __do_sys_io_uring_register fs/io_uring.c:7202 [inline]
       __se_sys_io_uring_register fs/io_uring.c:7184 [inline]
       __x64_sys_io_uring_register+0x192/0x560 fs/io_uring.c:7184
       do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      and bisect pointed to 05f3fb3c ("io_uring: avoid ring quiesce for
      fixed file set unregister and update").
      
      It comes down to the ordering: we wait for the work to be done before
      flushing it, while nobody is likely to wake us up.

      We can drop that on-stack completion, as flushing the work is itself
      the sync operation we need, and nothing more is left behind it.

      To that end, io_file_put::done is reused to indicate whether the
      struct can be freed in workqueue worker context.
      Reported-and-inspired-by: syzbot <syzbot+538d1957ce178382a394@syzkaller.appspotmail.com>
      Signed-off-by: Hillf Danton <hdanton@sina.com>

      Rename ->done to ->free_pfile
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • ceph: fix memory leak in ceph_cleanup_snapid_map() · c8d6ee01
      Committed by Luis Henriques
      kmemleak reports the following memory leak:
      
      unreferenced object 0xffff88821feac8a0 (size 96):
        comm "kworker/1:0", pid 17, jiffies 4294896362 (age 20.512s)
        hex dump (first 32 bytes):
          a0 c8 ea 1f 82 88 ff ff 00 c9 ea 1f 82 88 ff ff  ................
          00 00 00 00 00 00 00 00 00 01 00 00 00 00 ad de  ................
        backtrace:
          [<00000000b3ea77fb>] ceph_get_snapid_map+0x75/0x2a0
          [<00000000d4060942>] fill_inode+0xb26/0x1010
          [<0000000049da6206>] ceph_readdir_prepopulate+0x389/0xc40
          [<00000000e2fe2549>] dispatch+0x11ab/0x1521
          [<000000007700b894>] ceph_con_workfn+0xf3d/0x3240
          [<0000000039138a41>] process_one_work+0x24d/0x590
          [<00000000eb751f34>] worker_thread+0x4a/0x3d0
          [<000000007e8f0d42>] kthread+0xfb/0x130
          [<00000000d49bd1fa>] ret_from_fork+0x3a/0x50
      
      A kfree is missing while looping over the 'to_free' list of
      ceph_snapid_map objects.
      
      Cc: stable@vger.kernel.org
      Fixes: 75c9627e ("ceph: map snapid to anonymous bdev ID")
      Signed-off-by: Luis Henriques <lhenriques@suse.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • ceph: check POOL_FLAG_FULL/NEARFULL in addition to OSDMAP_FULL/NEARFULL · 76142097
      Committed by Ilya Dryomov
      CEPH_OSDMAP_FULL/NEARFULL aren't set since mimic, so we need to consult
      per-pool flags as well.  Unfortunately the backwards compatibility here
      is lacking:
      
      - the change that deprecated OSDMAP_FULL/NEARFULL went into mimic, but
        was guarded by require_osd_release >= RELEASE_LUMINOUS
      - it was subsequently backported to luminous in v12.2.2, but that makes
        no difference to clients that only check OSDMAP_FULL/NEARFULL because
        require_osd_release is not client-facing -- it is for OSDs
      
      Since all kernels are affected, the best we can do here is just start
      checking both map flags and pool flags and send that to stable.
      
      These checks are best effort, so take osdc->lock and look up pool flags
      just once.  Remove the FIXME, since filesystem quotas are checked above
      and RADOS quotas are reflected in POOL_FLAG_FULL: when the pool reaches
      its quota, both POOL_FLAG_FULL and POOL_FLAG_FULL_QUOTA are set.
      
      Cc: stable@vger.kernel.org
      Reported-by: Yanhu Cao <gmayyyha@gmail.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Acked-by: Sage Weil <sage@redhat.com>
    • io_uring: Fix ->data corruption on re-enqueue · 18a542ff
      Committed by Pavel Begunkov
      work->data and work->list are shared in a union. io_wq_assign_next()
      sets ->data if a request has a linked timeout, but then io-wq may want
      to use work->list, e.g. to re-enqueue a request, thereby corrupting
      ->data.

      ->data is not necessary; just remove it and extract the linked
      timeout through @link_list.
      
      Fixes: 60cf46ae ("io-wq: hash dependent work")
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io-wq: close cancel gap for hashed linked work · f2cf1149
      Committed by Pavel Begunkov
      After io_assign_current_work() of a linked work, it may be decided to
      offload it to another thread via io_wqe_enqueue(). However, until the
      next io_assign_current_work() it can be cancelled, and that case isn't
      handled.

      Don't assign the work if it's not going to be executed.
      
      Fixes: 60cf46ae ("io-wq: hash dependent work")
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  13. 22 Mar 2020 (1 commit)
  14. 21 Mar 2020 (2 commits)
  15. 20 Mar 2020 (2 commits)
  16. 19 Mar 2020 (1 commit)
  17. 18 Mar 2020 (1 commit)