• P
    tcp: fix page frag corruption on page fault · dacb5d88
    Paolo Abeni 提交于
    Steffen reported a TCP stream corruption for HTTP requests
    served by the apache web-server using a cifs mount-point
    and memory mapping the relevant file.
    
    The root cause is quite similar to the one addressed by
    commit 20eb4f29 ("net: fix sk_page_frag() recursion from
    memory reclaim"). Here the nested access to the task page frag
    is caused by a page fault on the (mmapped) user-space memory
    buffer coming from the cifs file.
    
    The page fault handler performs an smb transaction on a different
    socket, inside the same process context. Since sk->sk_allaction
    for such socket does not prevent the usage for the task_frag,
    the nested allocation modify "under the hood" the page frag
    in use by the outer sendmsg call, corrupting the stream.
    
    The overall relevant stack trace looks like the following:
    
    httpd 78268 [001] 3461630.850950:      probe:tcp_sendmsg_locked:
            ffffffff91461d91 tcp_sendmsg_locked+0x1
            ffffffff91462b57 tcp_sendmsg+0x27
            ffffffff9139814e sock_sendmsg+0x3e
            ffffffffc06dfe1d smb_send_kvec+0x28
            [...]
            ffffffffc06cfaf8 cifs_readpages+0x213
            ffffffff90e83c4b read_pages+0x6b
            ffffffff90e83f31 __do_page_cache_readahead+0x1c1
            ffffffff90e79e98 filemap_fault+0x788
            ffffffff90eb0458 __do_fault+0x38
            ffffffff90eb5280 do_fault+0x1a0
            ffffffff90eb7c84 __handle_mm_fault+0x4d4
            ffffffff90eb8093 handle_mm_fault+0xc3
            ffffffff90c74f6d __do_page_fault+0x1ed
            ffffffff90c75277 do_page_fault+0x37
            ffffffff9160111e page_fault+0x1e
            ffffffff9109e7b5 copyin+0x25
            ffffffff9109eb40 _copy_from_iter_full+0xe0
            ffffffff91462370 tcp_sendmsg_locked+0x5e0
            ffffffff91462370 tcp_sendmsg_locked+0x5e0
            ffffffff91462b57 tcp_sendmsg+0x27
            ffffffff9139815c sock_sendmsg+0x4c
            ffffffff913981f7 sock_write_iter+0x97
            ffffffff90f2cc56 do_iter_readv_writev+0x156
            ffffffff90f2dff0 do_iter_write+0x80
            ffffffff90f2e1c3 vfs_writev+0xa3
            ffffffff90f2e27c do_writev+0x5c
            ffffffff90c042bb do_syscall_64+0x5b
            ffffffff916000ad entry_SYSCALL_64_after_hwframe+0x65
    
    The cifs filesystem rightfully sets sk_allocations to GFP_NOFS,
    we can avoid the nesting using the sk page frag for allocation
    lacking the __GFP_FS flag. Do not define an additional mm-helper
    for that, as this is strictly tied to the sk page frag usage.
    
    v1 -> v2:
     - use a stricted sk_page_frag() check instead of reordering the
       code (Eric)
    Reported-by: NSteffen Froemer <sfroemer@redhat.com>
    Fixes: 5640f768 ("net: use a per task frag allocator")
    Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
    Reviewed-by: NEric Dumazet <edumazet@google.com>
    Signed-off-by: NDavid S. Miller <davem@davemloft.net>
    dacb5d88
sock.h 80.3 KB