    perf, events: add non-linear data support for raw records · 7e3f977e
    Committed by Daniel Borkmann
    This patch adds support for non-linear data on raw records. It
    extends raw records to have one or multiple fragments that will
    be written linearly into the ring slot, where each fragment can
    optionally have a custom callback handler to walk and extract
    complex, possibly non-linear data.
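
The fragment chain described above can be modelled in user space. The following is a minimal sketch, not the kernel code: `raw_frag`, `frag_copy_fn`, `output_frags()`, `gather_copy()` and the `scattered` source type are hypothetical names chosen for illustration, and a plain NULL `next` pointer ends the chain here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback type: gather `len` bytes from a possibly
 * non-linear source into dst; returns the number of bytes NOT copied. */
typedef size_t (*frag_copy_fn)(void *dst, const void *src, size_t len);

/* Hypothetical user-space stand-in for one fragment of a raw record. */
struct raw_frag {
    struct raw_frag *next;  /* NULL ends the chain in this sketch */
    frag_copy_fn copy;      /* optional custom walker; NULL means memcpy */
    const void *data;
    uint32_t size;
};

/* A non-linear source: the payload is scattered over several chunks. */
struct chunk { const char *p; size_t len; };
struct scattered { const struct chunk *chunks; size_t n; };

/* Example custom handler: walks the chunks and copies them in order. */
static size_t gather_copy(void *dst, const void *src, size_t len)
{
    const struct scattered *s = src;
    uint8_t *out = dst;
    size_t done = 0;

    for (size_t i = 0; i < s->n && done < len; i++) {
        size_t take = s->chunks[i].len;

        if (take > len - done)
            take = len - done;
        memcpy(out + done, s->chunks[i].p, take);
        done += take;
    }
    return len - done;
}

/* Write all fragments back to back into `slot`. Returns the number of
 * bytes written, or 0 if the fragments do not fit into `cap` bytes. */
static size_t output_frags(uint8_t *slot, size_t cap,
                           const struct raw_frag *frag)
{
    size_t off = 0;

    for (; frag; frag = frag->next) {
        if (frag->size > cap - off)
            return 0;
        if (frag->copy)
            frag->copy(slot + off, frag->data, frag->size);
        else
            memcpy(slot + off, frag->data, frag->size);
        off += frag->size;
    }
    return off;
}
```

A linear head fragment followed by a fragment with a custom handler ends up as one contiguous region in the slot, which is the property the patch relies on.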
    
    If a callback handler is provided for a fragment, then the new
    __output_custom() will be used instead of __output_copy() for
    the perf_output_sample() part. perf_prepare_sample() does all
    the size calculation only once, so perf_output_sample() doesn't
    need to redo the same work anymore, meaning real_size and padding
    will be cached in the raw record. The raw record becomes 32 bytes
    in size without holes; to not increase it further and to avoid
    doing unnecessary recalculations in the fast path, we can reuse
    the next pointer of the last fragment to hold the padding. The
    idea is borrowed from ZERO_OR_NULL_PTR() and should keep the
    perf_output_sample() path for PERF_SAMPLE_RAW minimal.
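
The reuse of the last fragment's next pointer can be sketched as follows. This is a simplified user-space model under stated assumptions: the names `raw_frag`, `raw_record`, `frag_is_last()` and `prepare_raw()` are hypothetical, the struct layout does not reproduce the 32-byte packing mentioned above, and the sketch assumes that no valid pointer has a numeric value below 8, so a small value in the shared field can only be a padding count.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fragment: `next` and `pad` share storage (C11 anonymous
 * union). Every fragment but the last holds a real pointer; the last
 * one holds the trailing padding, which is always smaller than 8. */
struct raw_frag {
    union {
        struct raw_frag *next;
        uintptr_t pad;
    };
    const void *data;
    uint32_t size;
};

/* Hypothetical record: first fragment plus the cached payload size. */
struct raw_record {
    struct raw_frag frag;
    uint32_t size;          /* payload size including padding */
};

/* A value below sizeof(u64) cannot be a valid pointer in this model,
 * so it marks the last fragment (NULL included, as pad == 0). */
static int frag_is_last(const struct raw_frag *frag)
{
    return frag->pad < sizeof(uint64_t);
}

/* Do the size calculation once: sum all fragment sizes, round the
 * u32 length header plus payload up to a multiple of 8, and cache
 * both the payload size and the padding. Returns the total bytes
 * the raw sample occupies, header included. */
static uint32_t prepare_raw(struct raw_record *raw)
{
    struct raw_frag *frag = &raw->frag;
    uint32_t sum = 0, total;

    for (;;) {
        sum += frag->size;
        if (frag_is_last(frag))
            break;
        frag = frag->next;
    }

    total = (sum + (uint32_t)sizeof(uint32_t) + 7u) & ~7u;
    raw->size = total - (uint32_t)sizeof(uint32_t);
    frag->pad = raw->size - sum;    /* stored in the unused next slot */
    return total;
}
```

With the padding cached this way, an output path only has to walk the chain, copy each fragment, and skip `pad` bytes after the last one, without recomputing any sizes.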
    
    This facility is needed for BPF's event output helper as a first
    user that will, in a follow-up, add an additional perf_raw_frag
    to its perf_raw_record in order to be able to more efficiently
    dump skb context after the linear head of metadata related to it.
    skbs can be non-linear and thus need a custom output function to
    dump buffers. Currently, the skb data needs to be copied twice;
    with the help of __output_custom() this work only needs to be
    done once. Future users could be things like XDP/BPF programs
    that work on a different context and would thus also have a
    different callback function.
    
    The few users of raw records are adapted to initialize their frag
    data from the raw record itself; there is no change in behavior
    for them.
    The code is based upon a PoC diff provided by Peter Zijlstra [1].
    
      [1] http://thread.gmane.org/gmane.linux.network/421294

    Suggested-by: Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>