• A
    bpf: Add ambient BPF runtime context stored in current · c7603cfa
    Andrii Nakryiko 提交于
    b910eaaa ("bpf: Fix NULL pointer dereference in bpf_get_local_storage()
    helper") fixed the problem with cgroup-local storage use in BPF by
    pre-allocating per-CPU array of 8 cgroup storage pointers to accommodate
    possible BPF program preemptions and nested executions.
    
    While this seems to work good in practice, it introduces new and unnecessary
    failure mode in which not all BPF programs might be executed if we fail to
    find an unused slot for cgroup storage, however unlikely it is. It might also
    not be so unlikely when/if we allow sleepable cgroup BPF programs in the
    future.
    
    Further, the way that cgroup storage is implemented as ambiently-available
    property during entire BPF program execution is a convenient way to pass extra
    information to BPF program and helpers without requiring user code to pass
    around extra arguments explicitly. So it would be good to have a generic
    solution that can allow implementing this without arbitrary restrictions.
    Ideally, such solution would work for both preemptable and sleepable BPF
    programs in exactly the same way.
    
    This patch introduces such solution, bpf_run_ctx. It adds one pointer field
    (bpf_ctx) to task_struct. This field is maintained by BPF_PROG_RUN family of
    macros in such a way that it always stays valid throughout BPF program
    execution. BPF program preemption is handled by remembering previous
    current->bpf_ctx value locally while executing nested BPF program and
    restoring old value after nested BPF program finishes. This is handled by two
    helper functions, bpf_set_run_ctx() and bpf_reset_run_ctx(), which are
    supposed to be used before and after BPF program runs, respectively.
    
    Restoring old value of the pointer handles preemption, while bpf_run_ctx
    pointer being a property of current task_struct naturally solves this problem
    for sleepable BPF programs by "following" BPF program execution as it is
    scheduled in and out of CPU. It would even allow CPU migration of BPF
    programs, even though it's not currently allowed by BPF infra.
    
    This patch cleans up cgroup local storage handling as a first application. The
    design itself is generic, though, with bpf_run_ctx being an empty struct that
    is supposed to be embedded into a specific struct for a given BPF program type
    (bpf_cg_run_ctx in this case). Follow up patches are planned that will expand
    this mechanism for other uses within tracing BPF programs.
    
    To verify that this change doesn't revert the fix to the original cgroup
    storage issue, I ran the same repro as in the original report ([0]) and didn't
    get any problems. Replacing bpf_reset_run_ctx(old_run_ctx) with
    bpf_reset_run_ctx(NULL) triggers the issue pretty quickly (so repro does work).
    
      [0] https://lore.kernel.org/bpf/YEEvBUiJl2pJkxTd@krava/
    
    Fixes: b910eaaa ("bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper")
    Signed-off-by: NAndrii Nakryiko <andrii@kernel.org>
    Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
    Acked-by: NYonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20210712230615.3525979-1-andrii@kernel.org
    c7603cfa
sched.h 61.1 KB