    io_uring: support buffer selection for OP_READ and OP_RECV · 1d68f9f6
    Committed by Jens Axboe
    to #28170604
    
    commit bcda7baaa3f15c7a95db3c024bb046d6e298f76b upstream
    
    If a server process has tons of pending socket connections, generally
    it uses epoll to wait for activity. When the socket is ready for reading
    (or writing), the task can select a buffer and issue a recv/send on the
    given fd.
    
    Now that we have fast (non-async thread) support, a task can have tons
    of reads or writes pending. But that means they need buffers to
    back that data, and if the number of connections is high enough, having
    them preallocated for all possible connections is infeasible.
    
    With IORING_OP_PROVIDE_BUFFERS, an application can register buffers to
    use for any request. The request then sets IOSQE_BUFFER_SELECT in the
    sqe, and a given group ID in sqe->buf_group. When the fd becomes ready,
    a free buffer from the specified group is selected. If none are
    available, the request is terminated with -ENOBUFS. If successful, the
    CQE on completion will contain the buffer ID chosen in the cqe->flags
    member, encoded as:
    
    	(buffer_id << IORING_CQE_BUFFER_SHIFT) | IORING_CQE_F_BUFFER;
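
The encoding above can be unpacked on the completion side roughly as follows. This is a minimal sketch, not code from the commit: the helper names `cqe_flags_encode` and `cqe_flags_decode` are hypothetical, and the two constants are redefined locally (with the values used in the uapi header `linux/io_uring.h`) only so the example builds without kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the uapi constants, guarded in case <linux/io_uring.h>
 * is also included. */
#ifndef IORING_CQE_F_BUFFER
#define IORING_CQE_F_BUFFER     (1U << 0)
#endif
#ifndef IORING_CQE_BUFFER_SHIFT
#define IORING_CQE_BUFFER_SHIFT 16
#endif

/* Hypothetical helper: build a flags word as described in the commit message. */
static uint32_t cqe_flags_encode(uint16_t buffer_id)
{
	return ((uint32_t)buffer_id << IORING_CQE_BUFFER_SHIFT) | IORING_CQE_F_BUFFER;
}

/* Hypothetical helper: return true and store the buffer ID if the CQE
 * reports that a buffer was selected; return false otherwise. */
static bool cqe_flags_decode(uint32_t flags, uint16_t *buffer_id)
{
	assert(buffer_id != NULL);
	if (!(flags & IORING_CQE_F_BUFFER))
		return false;
	*buffer_id = (uint16_t)(flags >> IORING_CQE_BUFFER_SHIFT);
	return true;
}
```

An application would typically call something like `cqe_flags_decode(cqe->flags, &bid)` after reaping a completion, and only look up the buffer when it returns true.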
    
    Once a buffer has been consumed by a request, it is no longer available
    and must be registered again with IORING_OP_PROVIDE_BUFFERS.
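
The provide/consume/re-provide lifecycle described above can be illustrated with a small user-space model. This is a toy sketch under stated assumptions, not the kernel implementation: `struct buf_group`, `group_provide`, and `group_select` are hypothetical names, the fixed capacity is arbitrary, the lowest-free-ID selection order is chosen only for simplicity, and the `-EINVAL` return for an out-of-range ID is an assumption of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define GROUP_MAX 8 /* arbitrary capacity for this sketch */

/* Toy model of one buffer group: available[i] is true while buffer ID i
 * has been provided and not yet consumed by a request. */
struct buf_group {
	bool available[GROUP_MAX];
};

/* Models providing (or re-providing) a single buffer ID to the group. */
static int group_provide(struct buf_group *g, int bid)
{
	assert(g != NULL);
	if (bid < 0 || bid >= GROUP_MAX)
		return -EINVAL; /* assumption of this model */
	g->available[bid] = true;
	return 0;
}

/* Models buffer selection when the fd becomes ready: returns the chosen
 * buffer ID and marks it consumed, or -ENOBUFS if the group is empty. */
static int group_select(struct buf_group *g)
{
	assert(g != NULL);
	for (int bid = 0; bid < GROUP_MAX; bid++) {
		if (g->available[bid]) {
			g->available[bid] = false;
			return bid;
		}
	}
	return -ENOBUFS;
}
```

In this model, a second `group_select` call after the only buffer has been consumed returns `-ENOBUFS` until `group_provide` is called again for that ID, mirroring the re-registration requirement.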
    
    Requests need to support this feature. For now, IORING_OP_READ and
    IORING_OP_RECV support it. This is checked on SQE submission; a CQE with
    res == -EOPNOTSUPP will be posted if attempted on unsupported requests.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
io_uring.h 6.0 KB