    io_uring: improve task work cache utilization · 34d2bfe7
    Jens Axboe committed
    While profiling task_work intensive workloads, I noticed that most of
    the time in tctx_task_work() is spent stalled on loading 'req'. This
    is one of the unfortunate side effects of using linked lists,
    particularly when they end up being passed around.
    
    Prefetch the next request, if there is one. There's a sufficient amount
    of work in between that this makes it available for the next loop.
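
As a rough illustration of the prefetch idea (not the actual patch), the sketch below walks a singly linked list and issues a prefetch hint for the next node before doing the work for the current one. `struct demo_req` and `demo_run_task_work()` are hypothetical names invented for this example, and `__builtin_prefetch` is the GCC/Clang builtin, used here as a userspace stand-in for whatever prefetch helper the kernel code uses.

```c
#include <stddef.h>

/* Hypothetical stand-in for a queued request; not the kernel's request struct. */
struct demo_req {
    struct demo_req *next;  /* link to the next queued request */
    long payload;           /* stands in for per-request state */
};

/* Walk the list, prefetching the next node before handling the current one. */
static long demo_run_task_work(struct demo_req *head)
{
    long sum = 0;
    struct demo_req *req = head;

    while (req != NULL) {
        struct demo_req *next = req->next;

        /* Hint only: start pulling in the next node's cacheline early. */
        if (next != NULL)
            __builtin_prefetch(next);

        sum += req->payload;  /* placeholder for the real per-request work */
        req = next;
    }
    return sum;
}
```

The prefetch is purely a hint, so the result is the same with or without it; the benefit only shows up when the per-request work is long enough to hide the memory latency of the next node.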
    
    While fiddling with the cache layout, move the link outside of the
    hot completion cacheline. It's rarely used in hot workloads, so better
    to bring in kbuf which is used for networked loads with provided buffers.
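
The layout change can be pictured with a small sketch, assuming 64-byte cachelines. `struct demo_layout`, its fields, and `demo_cacheline_index()` are hypothetical and do not mirror the real request structure; the explicit `_Alignas` is only a device to make the example deterministic, whereas the commit itself describes moving the field rather than forcing alignment.

```c
#include <stddef.h>

#define DEMO_CACHELINE 64  /* assumed cacheline size in bytes */

/* Hypothetical request layout: hot completion fields first, cold link after. */
struct demo_layout {
    void *file;                    /* hot */
    unsigned long long user_data;  /* hot */
    int result;                    /* hot */
    unsigned int flags;            /* hot */
    void *kbuf;                    /* hot for provided-buffer network loads */

    /* Rarely used: pushed to the start of the next cacheline. */
    _Alignas(DEMO_CACHELINE) struct demo_layout *link;
};

/* Which cacheline (counted from the struct start) a byte offset falls in. */
static size_t demo_cacheline_index(size_t offset)
{
    return offset / DEMO_CACHELINE;
}

/* Compile-time checks that the intended placement holds. */
_Static_assert(offsetof(struct demo_layout, kbuf) < DEMO_CACHELINE,
               "kbuf should sit in the first (hot) cacheline");
_Static_assert(offsetof(struct demo_layout, link) >= DEMO_CACHELINE,
               "link should sit outside the first cacheline");
```

Checks of this kind make a layout assumption explicit, so a later field reordering that breaks it fails at build time instead of silently costing cache misses.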
    
    This reduces tctx_task_work() overhead from ~3% to 1-1.5% in my testing.
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_uring.c 296.8 KB