• J
    bpf: Sockmap, ensure sock lock held during tear down · 7e81a353
    John Fastabend 提交于
    The sock_map_free() and sock_hash_free() paths used to delete sockmap
    and sockhash maps walk the maps and destroy psock and bpf state associated
    with the socks in the map. When done the socks no longer have BPF programs
    attached and will function normally. This can happen while the socks in
    the map are still "live" meaning data may be sent/received during the walk.
    
    Currently, though we don't take the sock_lock when the psock and bpf state
    is removed through this path. Specifically, this means we can be writing
    into the ops structure pointers such as sendmsg, sendpage, recvmsg, etc.
    while they are also being called from the networking side. This is not
    safe, we never used proper READ_ONCE/WRITE_ONCE semantics here if we
    believed it was safe. Further its not clear to me its even a good idea
    to try and do this on "live" sockets while networking side might also
    be using the socket. Instead of trying to reason about using the socks
    from both sides lets realize that every use case I'm aware of rarely
    deletes maps, in fact kubernetes/Cilium case builds map at init and
    never tears it down except on errors. So lets do the simple fix and
    grab sock lock.
    
    This patch wraps sock deletes from maps in sock lock and adds some
    annotations so we catch any other cases easier.
    
    Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
    Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
    Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
    Acked-by: NSong Liu <songliubraving@fb.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/bpf/20200111061206.8028-3-john.fastabend@gmail.com
    7e81a353
skmsg.c 19.3 KB