    9pfs: Correctly handle cancelled requests · fc78d5ee
    Committed by Keno Fischer
    # Background
    
    I was investigating spurious non-deterministic EINTR returns from
    various 9p file system operations in a Linux guest served from the
    qemu 9p server.
    
    ## EINTR, ERESTARTSYS and the Linux kernel
    
    When a signal arrives that the Linux kernel needs to deliver to user space
    while a given thread is blocked (in the 9p case, waiting for a reply to its
    request in p9_client_rpc -> wait_event_interruptible), it asks whatever
    driver is currently running to abort its current operation (in the 9p case
    causing the submission of a TFLUSH message) and return to user space.
    In these situations, the error code returned is generally ERESTARTSYS.
    If the user-space process specified SA_RESTART, the system call will be
    restarted once the signal handler returns (assuming the signal handler
    doesn't modify the process state in complicated ways not relevant here).
    If SA_RESTART is not specified, ERESTARTSYS is translated to EINTR and
    user space is expected to handle the restart itself.
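    To make the translation concrete, here is a minimal Python model of what
    user space observes when a blocked system call is interrupted by a caught
    signal. It is a deliberate simplification, not kernel code: it ignores the
    kernel's other restart codes (such as ERESTARTNOINTR), and the function
    name is hypothetical. ERESTARTSYS is a kernel-internal value (512 in
    include/linux/errno.h) that user space never sees directly.

    ```python
    import errno

    # Kernel-internal restart code (include/linux/errno.h); never returned to
    # user space as-is.
    ERESTARTSYS = 512


    def user_visible_result(kernel_err: int, sa_restart: bool) -> str:
        """Hypothetical model: what a caught signal does to a blocked syscall.

        Returns "restarted" when the kernel transparently re-issues the system
        call after the handler returns, otherwise the errno name user space sees.
        """
        if kernel_err == ERESTARTSYS:
            return "restarted" if sa_restart else errno.errorcode[errno.EINTR]
        # Any other error, including a plain EINTR, is passed through unchanged,
        # so SA_RESTART cannot help.
        return errno.errorcode[kernel_err]
    ```

    The last case is the one that matters for this bug: once a driver hands
    back EINTR instead of ERESTARTSYS, setting SA_RESTART no longer helps.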
    
    ## The 9p TFLUSH command
    
    The 9p TFLUSH command requests that the server abort an ongoing operation.
    The man page [1] specifies:
    
    ```
    If it recognizes oldtag as the tag of a pending transaction, it should
    abort any pending response and discard that tag.
    [...]
    When the client sends a Tflush, it must wait to receive the corresponding
    Rflush before reusing oldtag for subsequent messages. If a response to the
    flushed request is received before the Rflush, the client must honor the
    response as if it had not been flushed, since the completed request may
    signify a state change in the server
    ```
    
    In particular, this means that the server must not send a reply with the
    original tag in response to the cancellation request, because the client is
    obligated to interpret such a reply as a coincidental reply to the original
    request.
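    The client-side rule from the man page can be sketched as a small Python
    function that processes replies after a Tflush has been sent. This is an
    illustrative model, not code from Linux or qemu; the `Msg` type and the
    `after_flush` name are hypothetical, and message kinds are plain strings.

    ```python
    from dataclasses import dataclass
    from typing import Iterable, Optional, Tuple


    @dataclass
    class Msg:
        kind: str  # e.g. "Rerror", "Rwalk", "Rflush"
        tag: int


    def after_flush(oldtag: int, flushtag: int,
                    replies: Iterable[Msg]) -> Tuple[Optional[Msg], bool]:
        """Process replies after sending Tflush(oldtag) under tag flushtag.

        Returns (original_reply, oldtag_free). original_reply is any reply to
        oldtag that arrived before Rflush; per the man page it must be honored
        as if the request had never been flushed. oldtag_free becomes True
        only once the matching Rflush has been seen.
        """
        original = None
        for msg in replies:
            if msg.tag == oldtag and original is None:
                original = msg
            elif msg.tag == flushtag and msg.kind == "Rflush":
                return original, True
        return original, False  # no Rflush yet: oldtag stays reserved
    ```

    An Rerror for the old tag followed by Rflush yields that Rerror as the
    result of the original request; the client has no way to tell it apart
    from a genuine failure.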
    
    # The bug
    
    When qemu receives a TFlush request, it sets the `cancelled` flag on the
    relevant pdu. This flag is periodically checked, e.g. in
    `v9fs_co_name_to_path`, and if set, the operation is aborted and the error
    is set to EINTR. However, the server then violates the spec by returning
    an Rerror response to the client, rather than discarding the message
    entirely. As a result, the client is required to assume that said Rerror
    response is a result of the original request, not a result of the
    cancellation, and thus passes the EINTR error back to user space.
    This is not the worst possible outcome; however, as discussed above, the
    correct error code would have been ERESTARTSYS, so that user-space
    programs with SA_RESTART set are correctly restarted once the signal
    handler completes. Instead, such programs get spurious EINTR results that
    they were not expecting to handle.
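    Putting the two rules together, the following hypothetical Python model
    shows why the buggy server defeats SA_RESTART. It simplifies the client
    behavior described above and is not the actual Linux 9p client code: if a
    reply to the original tag arrived before Rflush, its error is returned;
    otherwise the interrupted call yields ERESTARTSYS. Function names are
    illustrative.

    ```python
    import errno
    from typing import List

    ERESTARTSYS = 512  # kernel-internal restart code (include/linux/errno.h)


    def client_error(errors_before_rflush: List[int]) -> int:
        """Error the (modelled) 9p client returns for an interrupted request.

        errors_before_rflush holds the errno of any reply to the original tag
        that arrived before Rflush.
        """
        if errors_before_rflush:
            # Must be honored as a genuine reply to the original request.
            return errors_before_rflush[0]
        # Cleanly flushed: let the kernel decide whether to restart.
        return ERESTARTSYS


    def seen_by_user(err: int, sa_restart: bool) -> str:
        if err == ERESTARTSYS:
            return "restarted" if sa_restart else "EINTR"
        return errno.errorcode[err]


    # Buggy server: answers the cancelled request with Rerror(EINTR).
    buggy = seen_by_user(client_error([errno.EINTR]), sa_restart=True)
    # Fixed server: discards the reply and only sends Rflush.
    fixed = seen_by_user(client_error([]), sa_restart=True)
    ```

    With SA_RESTART set, `buggy` is "EINTR" while `fixed` is "restarted".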
    
    It should be noted that there are plenty of user space programs that do not
    set SA_RESTART and do not correctly handle EINTR either. However, that is
    then a userspace bug. It should also be noted that this bug has been
    mitigated by a recent commit to the Linux kernel [2], which essentially
    prevents the kernel from sending Tflush requests unless the process is about
    to die (in which case the process likely doesn't care about the response).
    Nevertheless, for older kernels and to comply with the spec, I believe this
    change is beneficial.
    
    # Implementation
    
    The fix is fairly simple: skip sending a reply if the pdu was previously
    cancelled. We do, however, still notify the transport layer that we're
    doing this, so it can clean up any resources it may be holding. I also
    added a new trace event to distinguish operations that caused an error
    reply from those that were cancelled.
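    The completion logic described above can be sketched as follows. This is a
    hypothetical Python model, not qemu's C code: `Pdu`, `FakeTransport`,
    `complete` and the trace labels are illustrative names, errors use the
    negative-errno convention, and, following the reviewer note in the
    trailer, a discarded request is signalled to the transport as a
    zero-sized reply so it can release its buffer.

    ```python
    import errno
    from dataclasses import dataclass, field
    from typing import List, Tuple


    @dataclass
    class Pdu:
        tag: int
        cancelled: bool = False  # set when a Tflush names this tag


    @dataclass
    class FakeTransport:
        """Stand-in for a real transport; records (tag, size) pairs pushed."""
        pushed: List[Tuple[int, int]] = field(default_factory=list)

        def push_and_notify(self, tag: int, size: int) -> None:
            # size == 0: nothing goes to the client, just free the request slot.
            self.pushed.append((tag, size))


    def complete(pdu: Pdu, err: int, reply_size: int,
                 transport: FakeTransport, trace: list) -> None:
        """Finish a request; err is 0 or a negative errno."""
        if pdu.cancelled and err == -errno.EINTR:
            trace.append(("cancelled", pdu.tag))   # distinct from error replies
            transport.push_and_notify(pdu.tag, 0)  # no Rerror reaches the client
            return
        trace.append(("error" if err < 0 else "reply", pdu.tag))
        transport.push_and_notify(pdu.tag, reply_size)
    ```

    Keeping a separate "cancelled" trace entry makes it possible to tell
    flushed requests apart from ones that genuinely failed.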
    
    One complication is that we only omit sending the message on EINTR errors,
    in order to avoid confusing the rest of the code (which may assume that a
    client knows about a fid if it successfully passed it off to pdu_complete
    without checking for cancellation status). This does mean that if the
    server acts upon the cancellation flag, it always needs to set err to
    EINTR. I believe this is true of the current code.
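    The EINTR-only rule has a consequence worth illustrating: a request that
    finishes before it notices the flag is still answered normally. The
    hypothetical Python sketch below models a multi-step operation that polls
    the `cancelled` flag between steps; `walk_model` and `reply_is_sent` are
    illustrative names, and `cancel_at` merely simulates a Tflush arriving
    mid-operation.

    ```python
    import errno
    from typing import List, Optional, Tuple


    class Pdu:
        def __init__(self) -> None:
            self.cancelled = False  # set when a Tflush names this request's tag


    def walk_model(pdu: Pdu, names: List[str],
                   cancel_at: Optional[int] = None) -> Tuple[int, Optional[str]]:
        """Resolve path components one by one, checking for cancellation."""
        resolved = []
        for i, name in enumerate(names):
            if i == cancel_at:
                pdu.cancelled = True  # simulated Tflush arriving mid-operation
            if pdu.cancelled:
                # Acting on the flag: the error must be EINTR, because only
                # cancelled + EINTR suppresses the reply.
                return -errno.EINTR, None
            resolved.append(name)
        return 0, "/" + "/".join(resolved)


    def reply_is_sent(pdu: Pdu, err: int) -> bool:
        return not (pdu.cancelled and err == -errno.EINTR)
    ```

    If the Tflush only arrives after the walk has succeeded, the successful
    reply is still sent; the client receives it before Rflush and, per the
    spec, treats it as genuine, so both sides agree on the resulting state.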
    
    [1] https://9fans.github.io/plan9port/man/man9/flush.html
    [2] https://github.com/torvalds/linux/commit/9523feac272ccad2ad8186ba4fcc891

    Signed-off-by: Keno Fischer <keno@juliacomputing.com>
    Reviewed-by: Greg Kurz <groug@kaod.org>
    [groug, send a zero-sized reply instead of detaching the buffer]
    Signed-off-by: Greg Kurz <groug@kaod.org>
    Acked-by: Michael S. Tsirkin <mst@redhat.com>
    Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>