1. 28 Feb 2019: 3 commits
    • io_uring: add fsync support · c992fe29
      By Christoph Hellwig
      Add a new fsync opcode, which either syncs a range if one is passed,
      or the whole file if the offset and length fields are both zero.
      A flag is provided to use fdatasync semantics, that is, only force
      out the metadata required to retrieve the file data, and not other
      metadata such as timestamps.
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
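      For illustration, a sketch of how an application might fill an SQE for
      this opcode. The field names (IORING_OP_FSYNC, IORING_FSYNC_DATASYNC,
      sqe->fsync_flags) follow the io_uring uapi header; prep_fsync() itself
      is just a hypothetical helper, not part of the patch:

      #include <string.h>
      #include <linux/io_uring.h>

      /* Hypothetical helper: fill an SQE for an fsync over [off, off + len).
       * Leaving both off and len at zero syncs the whole file; setting
       * IORING_FSYNC_DATASYNC requests fdatasync semantics. */
      static void prep_fsync(struct io_uring_sqe *sqe, int fd,
      			__u64 off, __u32 len, int datasync)
      {
      	memset(sqe, 0, sizeof(*sqe));
      	sqe->opcode = IORING_OP_FSYNC;
      	sqe->fd = fd;
      	sqe->off = off;
      	sqe->len = len;
      	if (datasync)
      		sqe->fsync_flags = IORING_FSYNC_DATASYNC;
      }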
    • Add io_uring IO interface · 2b188cc1
      By Jens Axboe
      The submission queue (SQ) and completion queue (CQ) rings are shared
      between the application and the kernel. This eliminates the need to
      copy data back and forth to submit and complete IO.
      
      IO submissions use the io_uring_sqe data structure, and completions
      are generated in the form of io_uring_cqe data structures. The SQ
      ring is an index into the io_uring_sqe array, which makes it possible
      to submit a batch of IOs without them being contiguous in the ring.
      The CQ ring is always contiguous, as completion events are inherently
      unordered, and hence any io_uring_cqe entry can point back to an
      arbitrary submission.
      
      Two new system calls are added for this:
      
      io_uring_setup(entries, params)
      	Sets up an io_uring instance for doing async IO. On success,
      	returns a file descriptor that the application can mmap to
      	gain access to the SQ ring, CQ ring, and io_uring_sqes.
      
      io_uring_enter(fd, to_submit, min_complete, flags, sigset, sigsetsize)
      	Initiates IO against the rings mapped to this fd, or waits for
      	them to complete, or both. The behavior is controlled by the
      	parameters passed in. If 'to_submit' is non-zero, then we'll
      	try to submit new IO. If IORING_ENTER_GETEVENTS is set, the
      	kernel will wait for 'min_complete' events, if they aren't
      	already available. It's valid to set IORING_ENTER_GETEVENTS
      	and 'min_complete' == 0 at the same time; this allows the
      	kernel to return already completed events without waiting
      	for them. This is useful only for polling, as for IRQ
      	driven IO, the application can just check the CQ ring
      	without entering the kernel.
      
      With this setup, it's possible to do async IO with a single system
      call. Future developments will enable polled IO with this interface,
      and polled submission as well. The latter will enable an application
      to do IO without doing ANY system calls at all.
      
      For IRQ driven IO, an application only needs to enter the kernel for
      completions if it wants to wait for them to occur.
      
      Each io_uring is backed by a workqueue, to support buffered async IO
      as well. We only punt to an async context if the command would need
      to wait for IO on the device side. Any request whose data can be
      accessed directly in the page cache is handled inline. This avoids
      the slowness of the usual thread pools, since cached data is accessed
      as quickly as through a sync interface.
      
      Sample application: http://git.kernel.dk/cgit/fio/plain/t/io_uring.c
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
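      As a rough usage sketch (not from the patch itself): set up a ring,
      mmap what the fd exposes, and call io_uring_enter. The syscall numbers
      below are the x86-64 ones from the merged kernel and the mmap offsets
      come from the uapi header; treat both as assumptions if your tree
      differs:

      #define _GNU_SOURCE
      #include <stdio.h>
      #include <string.h>
      #include <sys/mman.h>
      #include <sys/syscall.h>
      #include <unistd.h>
      #include <linux/io_uring.h>

      #ifndef __NR_io_uring_setup
      #define __NR_io_uring_setup	425	/* x86-64, merged kernel */
      #define __NR_io_uring_enter	426
      #endif

      int main(void)
      {
      	struct io_uring_params p;
      	void *sq_ring, *sqes;
      	int fd;

      	memset(&p, 0, sizeof(p));
      	fd = syscall(__NR_io_uring_setup, 4, &p);	/* 4-entry SQ */
      	if (fd < 0) {
      		perror("io_uring_setup");
      		return 1;
      	}

      	/* The fd is mmap'ed to reach the SQ ring, CQ ring and SQE
      	 * array; only the SQ side is shown here. */
      	sq_ring = mmap(NULL, p.sq_off.array + p.sq_entries * sizeof(__u32),
      		       PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
      		       fd, IORING_OFF_SQ_RING);
      	sqes = mmap(NULL, p.sq_entries * sizeof(struct io_uring_sqe),
      		    PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
      		    fd, IORING_OFF_SQES);
      	if (sq_ring == MAP_FAILED || sqes == MAP_FAILED)
      		return 1;

      	/* to_submit == 0 and min_complete == 0 with GETEVENTS: reap
      	 * any already-completed events without waiting. */
      	syscall(__NR_io_uring_enter, fd, 0, 0,
      		IORING_ENTER_GETEVENTS, NULL, 0);
      	return 0;
      }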
    • block: introduce mp_bvec_for_each_page() for iterating over page · 594b9a89
      By Ming Lei
      mp_bvec_for_each_segment() is a bit heavyweight for this iteration,
      so introduce a lightweight helper for iterating over pages; this
      saves 32 bytes of stack space.
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
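      A sketch of what such a page-level iterator can look like; this is
      illustrative rather than the exact macro from the patch, and it
      leans on nth_page() from <linux/mm.h> (assumes bv_len > 0):

      #include <linux/mm.h>
      #include <linux/bvec.h>

      /* Walk every page touched by a (possibly multi-page) bvec: start
       * at the page containing bv_offset and stop at the page containing
       * the last byte. Illustrative only; assumes bv_len > 0. */
      #define mp_bvec_for_each_page(pg, bv, i)				\
      	for ((i) = (bv)->bv_offset / PAGE_SIZE;				\
      	     (i) <= ((bv)->bv_offset + (bv)->bv_len - 1) / PAGE_SIZE &&	\
      		((pg) = nth_page((bv)->bv_page, (i)));			\
      	     (i)++)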
  2. 27 Feb 2019: 3 commits
  3. 24 Feb 2019: 4 commits
  4. 23 Feb 2019: 2 commits
  5. 22 Feb 2019: 2 commits
    • block: bounce: make sure that bvec table is updated · 8f4e80da
      By Ming Lei
      Block bounce needs to allocate a new page for doing IO, and the new
      page has to be written back into the bvec table.
      
      Commit 6dc4f100 switches __blk_queue_bounce() to use the new
      bio_for_each_segment_all() interface. Unfortunately the new
      bio_for_each_segment_all() can't be used to update the bvec table.
      
      This patch fixes the issue by retrieving the bvec from the table
      directly, so the newly allocated page can be installed in the bio.
      This is safe because the cloned bio contains only single-page bvecs.
      
      Fixes: 6dc4f100 ("block: allow bio_for_each_segment_all() to iterate over multi-page bvec")
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
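      The shape of the fix, sketched below for the write direction: walk
      the cloned bio's bvec table by index so that assigning the bounce
      page really updates the table entry. needs_bounce() here is a
      stand-in for the real check against the queue's bounce limit:

      #include <linux/bio.h>
      #include <linux/gfp.h>
      #include <linux/highmem.h>
      #include <linux/string.h>

      /* Stand-in for the real check (the actual code tests the page
       * against the queue's bounce limit). */
      static bool needs_bounce(struct page *page)
      {
      	return PageHighMem(page);
      }

      static void bounce_write_pages_sketch(struct bio *bio)
      {
      	int i;

      	for (i = 0; i < bio->bi_vcnt; i++) {
      		struct bio_vec *to = &bio->bi_io_vec[i];
      		struct page *bounce_page;
      		void *src;

      		if (!needs_bounce(to->bv_page))
      			continue;

      		bounce_page = alloc_page(GFP_NOIO);
      		if (!bounce_page)
      			continue;
      		src = kmap_atomic(to->bv_page);	/* original may be highmem */
      		memcpy(page_address(bounce_page) + to->bv_offset,
      		       src + to->bv_offset, to->bv_len);
      		kunmap_atomic(src);

      		to->bv_page = bounce_page;	/* table updated in place */
      	}
      }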
    • Merge branch 'nvme-5.1' of git://git.infradead.org/nvme into for-5.1/block · 037b2625
      By Jens Axboe
      Pull NVMe changes for 5.1 from Christoph
      
      * 'nvme-5.1' of git://git.infradead.org/nvme: (22 commits)
        nvme-rdma: use nr_phys_segments when map rq to sgl
        nvmet: convert to SPDX identifiers
        nvmet-rdma: convert to SPDX identifiers
        nvme-loop: convert to SPDX identifiers
        nvmet-fcloop: convert to SPDX identifiers
        nvmet-fc: convert to SPDX identifiers
        nvme: convert to SPDX identifiers
        nvme-pci: convert to SPDX identifiers
        nvme-lightnvm: convert to SPDX identifiers
        nvme-rdma: convert to SPDX identifiers
        nvme-fc: convert to SPDX identifiers
        nvme-fabrics: convert to SPDX identifiers
        nvme-tcp.h: fix SPDX header
        nvme_ioctl.h: remove duplicate GPL boilerplate
        nvme: return error from nvme_alloc_ns()
        nvme: avoid that deleting a controller triggers a circular locking complaint
        nvme: introduce a helper function for controller deletion
        nvme: unexport nvme_delete_ctrl_sync()
        nvme-pci: check kstrtoint() return value in queue_count_set()
        nvme-fabrics: document the poll function argument
        ...
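      Most of the pulled commits replace the verbose GPL license
      boilerplate at the top of each file with a single machine-readable
      line, e.g.:

      // SPDX-License-Identifier: GPL-2.0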
  6. 21 Feb 2019: 1 commit
  7. 20 Feb 2019: 22 commits
  8. 15 Feb 2019: 3 commits
    • Merge tag 'v5.0-rc6' into for-5.1/block · 6fb845f0
      By Jens Axboe
      Pull in 5.0-rc6 to avoid a dumb merge conflict with fs/iomap.c.
      This is needed since io_uring is now based on the block branch,
      to avoid a conflict between the multi-page bvecs and the bits
      of io_uring that touch the core block parts.
      
      * tag 'v5.0-rc6': (525 commits)
        Linux 5.0-rc6
        x86/mm: Make set_pmd_at() paravirt aware
        MAINTAINERS: Update the ocores i2c bus driver maintainer, etc
        blk-mq: remove duplicated definition of blk_mq_freeze_queue
        Blk-iolatency: warn on negative inflight IO counter
        blk-iolatency: fix IO hang due to negative inflight counter
        MAINTAINERS: unify reference to xen-devel list
        x86/mm/cpa: Fix set_mce_nospec()
        futex: Handle early deadlock return correctly
        futex: Fix barrier comment
        net: dsa: b53: Fix for failure when irq is not defined in dt
        blktrace: Show requests without sector
        mips: cm: reprime error cause
        mips: loongson64: remove unreachable(), fix loongson_poweroff().
        sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach()
        geneve: should not call rt6_lookup() when ipv6 was disabled
        KVM: nVMX: unconditionally cancel preemption timer in free_nested (CVE-2019-7221)
        KVM: x86: work around leak of uninitialized stack contents (CVE-2019-7222)
        kvm: fix kvm_ioctl_create_device() reference counting (CVE-2019-6974)
        signal: Better detection of synchronous signals
        ...
    • block: kill BLK_MQ_F_SG_MERGE · 56d18f62
      By Ming Lei
      QUEUE_FLAG_NO_SG_MERGE has been killed, so kill BLK_MQ_F_SG_MERGE too.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: kill QUEUE_FLAG_NO_SG_MERGE · 2705c937
      By Ming Lei
      Since bdced438 ("block: setup bi_phys_segments after splitting"),
      the physical segment count is mainly figured out in blk_queue_split()
      for the fast path, and the BIO_SEG_VALID flag is set there too.
      
      Now only blk_recount_segments() and blk_recalc_rq_segments() use this
      flag.
      
      blk_recount_segments() is basically bypassed in the fast path, given
      that BIO_SEG_VALID is set in blk_queue_split().
      
      As for the other user, blk_recalc_rq_segments(), it runs:
      
      - in the partial completion branch of blk_update_request(), which is
      an unusual case
      
      - in blk_cloned_rq_check_limits(), which is still not a big problem
      if the flag is killed, since dm-rq is its only user.
      
      Multi-page bvecs are enabled now; not doing S/G merging is rather
      pointless with the current setup of the I/O path, as it isn't going
      to save a significant number of cycles.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
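      For context, the fast-path gating described above works roughly like
      this sketch; count_phys_segments() is a placeholder for the real
      segment-counting logic that lives in blk-merge.c:

      #include <linux/bio.h>
      #include <linux/blkdev.h>

      /* Placeholder: stands in for the real recount logic. */
      static unsigned int count_phys_segments(struct request_queue *q,
      					struct bio *bio);

      /* Simplified sketch: the segment count is computed once in
       * blk_queue_split(), which sets BIO_SEG_VALID; later callers can
       * then skip the recount entirely. */
      static void recount_segments_sketch(struct request_queue *q,
      				    struct bio *bio)
      {
      	if (bio_flagged(bio, BIO_SEG_VALID))
      		return;		/* fast path: set by blk_queue_split() */

      	bio->bi_phys_segments = count_phys_segments(q, bio);
      	bio_set_flag(bio, BIO_SEG_VALID);
      }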