1. 15 5月, 2018 1 次提交
    • R
      vmcore: add API to collect hardware dump in second kernel · 2724273e
      Rahul Lakkireddy 提交于
      The sequence of actions done by device drivers to append their device
      specific hardware/firmware logs to /proc/vmcore are as follows:
      
      1. During probe (before hardware is initialized), device drivers
      register to the vmcore module (via vmcore_add_device_dump()), with
      callback function, along with buffer size and log name needed for
      firmware/hardware log collection.
      
      2. vmcore module allocates the buffer with requested size. It adds
      an Elf note and invokes the device driver's registered callback
      function.
      
      3. Device driver collects all hardware/firmware logs into the buffer
      and returns control back to vmcore module.
      
      Ensure that the device dump buffer size is always aligned to page size
      so that it can be mmaped.
      
      Also, rename alloc_elfnotes_buf() to vmcore_alloc_buf() to make it more
      generic and reserve NT_VMCOREDD note type to indicate vmcore device
      dump.
      
      Suggested-by: Eric Biederman <ebiederm@xmission.com>.
      Signed-off-by: NRahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: NGanesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2724273e
  2. 07 5月, 2018 5 次提交
  3. 04 5月, 2018 9 次提交
  4. 02 5月, 2018 4 次提交
    • M
      Revert "vhost: make msg padding explicit" · c818aa88
      Michael S. Tsirkin 提交于
      This reverts commit 93c0d549c4c5a7382ad70de6b86610b7aae57406.
      
      Unfortunately the padding will break 32 bit userspace.
      Ouch. Need to add some compat code, revert for now.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c818aa88
    • S
      tcp: send in-queue bytes in cmsg upon read · b75eba76
      Soheil Hassas Yeganeh 提交于
      Applications with many concurrent connections, high variance
      in receive queue length and tight memory bounds cannot
      allocate worst-case buffer size to drain sockets. Knowing
      the size of receive queue length, applications can optimize
      how they allocate buffers to read from the socket.
      
      The number of bytes pending on the socket is directly
      available through ioctl(FIONREAD/SIOCINQ) and can be
      approximated using getsockopt(MEMINFO) (rmem_alloc includes
      skb overheads in addition to application data). But, both of
      these options add an extra syscall per recvmsg. Moreover,
      ioctl(FIONREAD/SIOCINQ) takes the socket lock.
      
      Add the TCP_INQ socket option to TCP. When this socket
      option is set, recvmsg() relays the number of bytes available
      on the socket for reading to the application via the
      TCP_CM_INQ control message.
      
      Calculate the number of bytes after releasing the socket lock
      to include the processed backlog, if any. To avoid an extra
      branch in the hot path of recvmsg() for this new control
      message, move all cmsg processing inside an existing branch for
      processing receive timestamps. Since the socket lock is not held
      when calculating the size of receive queue, TCP_INQ is a hint.
      For example, it can overestimate the queue size by one byte,
      if FIN is received.
      
      With this method, applications can start reading from the socket
      using a small buffer, and then use larger buffers based on the
      remaining data when needed.
      
      V3 change-log:
      	As suggested by David Miller, added loads with barrier
      	to check whether we have multiple threads calling recvmsg
      	in parallel. When that happens we lock the socket to
      	calculate inq.
      V4 change-log:
      	Removed inline from a static function.
      Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Reviewed-by: NNeal Cardwell <ncardwell@google.com>
      Suggested-by: NDavid Miller <davem@davemloft.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b75eba76
    • S
      connector: add parent pid and tgid to coredump and exit events · b086ff87
      Stefan Strogin 提交于
      The intention is to get notified of process failures as soon
      as possible, before a possible core dumping (which could be very long)
      (e.g. in some process-manager). Coredump and exit process events
      are perfect for such use cases (see 2b5faa4c "connector: Added
      coredumping event to the process connector").
      
      The problem is that for now the process-manager cannot know the parent
      of a dying process using connectors. This could be useful if the
      process-manager should monitor for failures only children of certain
      parents, so we could filter the coredump and exit events by parent
      process and/or thread ID.
      
      Add parent pid and tgid to coredump and exit process connectors event
      data.
      Signed-off-by: NStefan Strogin <sstrogin@cisco.com>
      Acked-by: NEvgeniy Polyakov <zbr@ioremap.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b086ff87
    • M
      vhost: make msg padding explicit · de08481a
      Michael S. Tsirkin 提交于
      There's a 32 bit hole just after type. It's best to
      give it a name, this way compiler is forced to initialize
      it with rest of the structure.
      Reported-by: NKevin Easton <kevin@guarana.org>
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      de08481a
  5. 30 4月, 2018 3 次提交
    • Q
      bpf: fix formatting for bpf_get_stack() helper doc · 79552fbc
      Quentin Monnet 提交于
      Fix formatting (indent) for bpf_get_stack() helper documentation, so
      that the doc is rendered correctly with the Python script.
      
      Fixes: c195651e ("bpf: add bpf_get_stack helper")
      Cc: Yonghong Song <yhs@fb.com>
      Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      79552fbc
    • Q
      bpf: fix formatting for bpf_perf_event_read() helper doc · 3bd5a09b
      Quentin Monnet 提交于
      Some edits brought to the last iteration of BPF helper functions
      documentation introduced an error with RST formatting. As a result, most
      of one paragraph is rendered in bold text when only the name of a helper
      should be. Fix it, and fix formatting of another function name in the
      same paragraph.
      
      Fixes: c6b5fb86 ("bpf: add documentation for eBPF helpers (42-50)")
      Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      3bd5a09b
    • E
      tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive · 05255b82
      Eric Dumazet 提交于
      When adding tcp mmap() implementation, I forgot that socket lock
      had to be taken before current->mm->mmap_sem. syzbot eventually caught
      the bug.
      
      Since we can not lock the socket in tcp mmap() handler we have to
      split the operation in two phases.
      
      1) mmap() on a tcp socket simply reserves VMA space, and nothing else.
        This operation does not involve any TCP locking.
      
      2) getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) implements
       the transfert of pages from skbs to one VMA.
        This operation only uses down_read(&current->mm->mmap_sem) after
        holding TCP lock, thus solving the lockdep issue.
      
      This new implementation was suggested by Andy Lutomirski with great details.
      
      Benefits are :
      
      - Better scalability, in case multiple threads reuse VMAS
         (without mmap()/munmap() calls) since mmap_sem wont be write locked.
      
      - Better error recovery.
         The previous mmap() model had to provide the expected size of the
         mapping. If for some reason one part could not be mapped (partial MSS),
         the whole operation had to be aborted.
         With the tcp_zerocopy_receive struct, kernel can report how
         many bytes were successfuly mapped, and how many bytes should
         be read to skip the problematic sequence.
      
      - No more memory allocation to hold an array of page pointers.
        16 MB mappings needed 32 KB for this array, potentially using vmalloc() :/
      
      - skbs are freed while mmap_sem has been released
      
      Following patch makes the change in tcp_mmap tool to demonstrate
      one possible use of mmap() and setsockopt(... TCP_ZEROCOPY_RECEIVE ...)
      
      Note that memcg might require additional changes.
      
      Fixes: 93ab6cc6 ("tcp: implement mmap() for zero copy receive")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Suggested-by: NAndy Lutomirski <luto@kernel.org>
      Cc: linux-mm@kvack.org
      Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      05255b82
  6. 29 4月, 2018 2 次提交
    • A
      bpf: Fix helpers ctx struct types in uapi doc · a3ef8e9a
      Andrey Ignatov 提交于
      Helpers may operate on two types of ctx structures: user visible ones
      (e.g. `struct bpf_sock_ops`) when used in user programs, and kernel ones
      (e.g. `struct bpf_sock_ops_kern`) in kernel implementation.
      
      UAPI documentation must refer to only user visible structures.
      
      The patch replaces references to `_kern` structures in BPF helpers
      description by corresponding user visible structures.
      Signed-off-by: NAndrey Ignatov <rdna@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      a3ef8e9a
    • Y
      bpf: add bpf_get_stack helper · c195651e
      Yonghong Song 提交于
      Currently, stackmap and bpf_get_stackid helper are provided
      for bpf program to get the stack trace. This approach has
      a limitation though. If two stack traces have the same hash,
      only one will get stored in the stackmap table,
      so some stack traces are missing from user perspective.
      
      This patch implements a new helper, bpf_get_stack, will
      send stack traces directly to bpf program. The bpf program
      is able to see all stack traces, and then can do in-kernel
      processing or send stack traces to user space through
      shared map or bpf_perf_event_output.
      Acked-by: NAlexei Starovoitov <ast@fb.com>
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      c195651e
  7. 28 4月, 2018 1 次提交
  8. 27 4月, 2018 12 次提交
    • J
      tipc: introduce ioctl for fetching node identity · 3e5cf362
      Jon Maloy 提交于
      After the introduction of a 128-bit node identity it may be difficult
      for a user to correlate between this identity and the generated node
      hash address.
      
      We now try to make this easier by introducing a new ioctl() call for
      fetching a node identity by using the hash value as key. This will
      be particularly useful when we extend some of the commands in the
      'tipc' tool, but we also expect regular user applications to need
      this feature.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3e5cf362
    • Q
      bpf: add documentation for eBPF helpers (65-66) · 2d020dd7
      Quentin Monnet 提交于
      Add documentation for eBPF helper functions to bpf.h user header file.
      This documentation can be parsed with the Python script provided in
      another commit of the patch series, in order to provide a RST document
      that can later be converted into a man page.
      
      The objective is to make the documentation easily understandable and
      accessible to all eBPF developers, including beginners.
      
      This patch contains descriptions for the following helper functions:
      
      Helper from Nikita:
      - bpf_xdp_adjust_tail()
      
      Helper from Eyal:
      - bpf_skb_get_xfrm_state()
      
      v4:
      - New patch (helpers did not exist yet for previous versions).
      
      Cc: Nikita V. Shirokov <tehnerd@tehnerd.com>
      Cc: Eyal Birger <eyal.birger@gmail.com>
      Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      2d020dd7
    • Q
      bpf: add documentation for eBPF helpers (58-64) · ab127040
      Quentin Monnet 提交于
      Add documentation for eBPF helper functions to bpf.h user header file.
      This documentation can be parsed with the Python script provided in
      another commit of the patch series, in order to provide a RST document
      that can later be converted into a man page.
      
      The objective is to make the documentation easily understandable and
      accessible to all eBPF developers, including beginners.
      
      This patch contains descriptions for the following helper functions, all
      written by John:
      
      - bpf_redirect_map()
      - bpf_sk_redirect_map()
      - bpf_sock_map_update()
      - bpf_msg_redirect_map()
      - bpf_msg_apply_bytes()
      - bpf_msg_cork_bytes()
      - bpf_msg_pull_data()
      
      v4:
      - bpf_redirect_map(): Fix typos: "XDP_ABORT" changed to "XDP_ABORTED",
        "his" to "this". Also add a paragraph on performance improvement over
        bpf_redirect() helper.
      
      v3:
      - bpf_sk_redirect_map(): Improve description of BPF_F_INGRESS flag.
      - bpf_msg_redirect_map(): Improve description of BPF_F_INGRESS flag.
      - bpf_redirect_map(): Fix note on CPU redirection, not fully implemented
        for generic XDP but supported on native XDP.
      - bpf_msg_pull_data(): Clarify comment about invalidated verifier
        checks.
      
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      ab127040
    • Q
      bpf: add documentation for eBPF helpers (51-57) · 7aa79a86
      Quentin Monnet 提交于
      Add documentation for eBPF helper functions to bpf.h user header file.
      This documentation can be parsed with the Python script provided in
      another commit of the patch series, in order to provide a RST document
      that can later be converted into a man page.
      
      The objective is to make the documentation easily understandable and
      accessible to all eBPF developers, including beginners.
      
      This patch contains descriptions for the following helper functions:
      
      Helpers from Lawrence:
      - bpf_setsockopt()
      - bpf_getsockopt()
      - bpf_sock_ops_cb_flags_set()
      
      Helpers from Yonghong:
      - bpf_perf_event_read_value()
      - bpf_perf_prog_read_value()
      
      Helper from Josef:
      - bpf_override_return()
      
      Helper from Andrey:
      - bpf_bind()
      
      v4:
      - bpf_perf_event_read_value(): State that this helper should be
        preferred over bpf_perf_event_read().
      
      v3:
      - bpf_perf_event_read_value(): Fix time of selection for perf event type
        in description. Remove occurences of "cores" to avoid confusion with
        "CPU".
      - bpf_bind(): Remove last paragraph of description, which was off topic.
      
      Cc: Lawrence Brakmo <brakmo@fb.com>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com>
      Acked-by: NYonghong Song <yhs@fb.com>
      [for bpf_perf_event_read_value(), bpf_perf_prog_read_value()]
      Acked-by: NAndrey Ignatov <rdna@fb.com>
      [for bpf_bind()]
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      7aa79a86
    • Q
      bpf: add documentation for eBPF helpers (42-50) · c6b5fb86
      Quentin Monnet 提交于
      Add documentation for eBPF helper functions to bpf.h user header file.
      This documentation can be parsed with the Python script provided in
      another commit of the patch series, in order to provide a RST document
      that can later be converted into a man page.
      
      The objective is to make the documentation easily understandable and
      accessible to all eBPF developers, including beginners.
      
      This patch contains descriptions for the following helper functions:
      
      Helper from Kaixu:
      - bpf_perf_event_read()
      
      Helpers from Martin:
      - bpf_skb_under_cgroup()
      - bpf_xdp_adjust_head()
      
      Helpers from Sargun:
      - bpf_probe_write_user()
      - bpf_current_task_under_cgroup()
      
      Helper from Thomas:
      - bpf_skb_change_head()
      
      Helper from Gianluca:
      - bpf_probe_read_str()
      
      Helpers from Chenbo:
      - bpf_get_socket_cookie()
      - bpf_get_socket_uid()
      
      v4:
      - bpf_perf_event_read(): State that bpf_perf_event_read_value() should
        be preferred over this helper.
      - bpf_skb_change_head(): Clarify comment about invalidated verifier
        checks.
      - bpf_xdp_adjust_head(): Clarify comment about invalidated verifier
        checks.
      - bpf_probe_write_user(): Add that dst must be a valid user space
        address.
      - bpf_get_socket_cookie(): Improve description by making clearer that
        the cockie belongs to the socket, and state that it remains stable for
        the life of the socket.
      
      v3:
      - bpf_perf_event_read(): Fix time of selection for perf event type in
        description. Remove occurences of "cores" to avoid confusion with
        "CPU".
      
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Sargun Dhillon <sargun@sargun.me>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: Gianluca Borello <g.borello@gmail.com>
      Cc: Chenbo Feng <fengc@google.com>
      Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      [for bpf_skb_under_cgroup(), bpf_xdp_adjust_head()]
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      c6b5fb86
    • Q
      bpf: add documentation for eBPF helpers (33-41) · fa15601a
      Quentin Monnet 提交于
      Add documentation for eBPF helper functions to bpf.h user header file.
      This documentation can be parsed with the Python script provided in
      another commit of the patch series, in order to provide a RST document
      that can later be converted into a man page.
      
      The objective is to make the documentation easily understandable and
      accessible to all eBPF developers, including beginners.
      
      This patch contains descriptions for the following helper functions, all
      written by Daniel:
      
      - bpf_get_hash_recalc()
      - bpf_skb_change_tail()
      - bpf_skb_pull_data()
      - bpf_csum_update()
      - bpf_set_hash_invalid()
      - bpf_get_numa_node_id()
      - bpf_set_hash()
      - bpf_skb_adjust_room()
      - bpf_xdp_adjust_meta()
      
      v4:
      - bpf_skb_change_tail(): Clarify comment about invalidated verifier
        checks.
      - bpf_skb_pull_data(): Clarify the motivation for using this helper or
        bpf_skb_load_bytes(), on non-linear buffers. Fix RST formatting for
        *skb*. Clarify comment about invalidated verifier checks.
      - bpf_csum_update(): Fix description of checksum (entire packet, not IP
        checksum). Fix a typo: "header" instead of "helper".
      - bpf_set_hash_invalid(): Mention bpf_get_hash_recalc().
      - bpf_get_numa_node_id(): State that the helper is not restricted to
        programs attached to sockets.
      - bpf_skb_adjust_room(): Clarify comment about invalidated verifier
        checks.
      - bpf_xdp_adjust_meta(): Clarify comment about invalidated verifier
        checks.
      
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      fa15601a
    • Q
      bpf: add documentation for eBPF helpers (23-32) · 1fdd08be
      Quentin Monnet 提交于
      Add documentation for eBPF helper functions to bpf.h user header file.
      This documentation can be parsed with the Python script provided in
      another commit of the patch series, in order to provide a RST document
      that can later be converted into a man page.
      
      The objective is to make the documentation easily understandable and
      accessible to all eBPF developers, including beginners.
      
      This patch contains descriptions for the following helper functions, all
      written by Daniel:
      
      - bpf_get_prandom_u32()
      - bpf_get_smp_processor_id()
      - bpf_get_cgroup_classid()
      - bpf_get_route_realm()
      - bpf_skb_load_bytes()
      - bpf_csum_diff()
      - bpf_skb_get_tunnel_opt()
      - bpf_skb_set_tunnel_opt()
      - bpf_skb_change_proto()
      - bpf_skb_change_type()
      
      v4:
      - bpf_get_prandom_u32(): Warn that the prng is not cryptographically
        secure.
      - bpf_get_smp_processor_id(): Fix a typo (case).
      - bpf_get_cgroup_classid(): Clarify description. Add notes on the helper
        being limited to cgroup v1, and to egress path.
      - bpf_get_route_realm(): Add comparison with bpf_get_cgroup_classid().
        Add a note about usage with TC and advantage of clsact. Fix a typo in
        return value ("sdb" instead of "skb").
      - bpf_skb_load_bytes(): Make explicit loading large data loads it to the
        eBPF stack.
      - bpf_csum_diff(): Add a note on seed that can be cascaded. Link to
        bpf_l3|l4_csum_replace().
      - bpf_skb_get_tunnel_opt(): Add a note about usage with "collect
        metadata" mode, and example of this with Geneve.
      - bpf_skb_set_tunnel_opt(): Add a link to bpf_skb_get_tunnel_opt()
        description.
      - bpf_skb_change_proto(): Mention that the main use case is NAT64.
        Clarify comment about invalidated verifier checks.
      
      v3:
      - bpf_get_prandom_u32(): Fix helper name :(. Add description, including
        a note on the internal random state.
      - bpf_get_smp_processor_id(): Add description, including a note on the
        processor id remaining stable during program run.
      - bpf_get_cgroup_classid(): State that CONFIG_CGROUP_NET_CLASSID is
        required to use the helper. Add a reference to related documentation.
        State that placing a task in net_cls controller disables cgroup-bpf.
      - bpf_get_route_realm(): State that CONFIG_CGROUP_NET_CLASSID is
        required to use this helper.
      - bpf_skb_load_bytes(): Fix comment on current use cases for the helper.
      
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      1fdd08be
    • Q
      bpf: add documentation for eBPF helpers (12-22) · c456dec4
      Quentin Monnet 提交于
      Add documentation for eBPF helper functions to bpf.h user header file.
      This documentation can be parsed with the Python script provided in
      another commit of the patch series, in order to provide a RST document
      that can later be converted into a man page.
      
      The objective is to make the documentation easily understandable and
      accessible to all eBPF developers, including beginners.
      
      This patch contains descriptions for the following helper functions, all
      written by Alexei:
      
      - bpf_get_current_pid_tgid()
      - bpf_get_current_uid_gid()
      - bpf_get_current_comm()
      - bpf_skb_vlan_push()
      - bpf_skb_vlan_pop()
      - bpf_skb_get_tunnel_key()
      - bpf_skb_set_tunnel_key()
      - bpf_redirect()
      - bpf_perf_event_output()
      - bpf_get_stackid()
      - bpf_get_current_task()
      
      v4:
      - bpf_redirect(): Fix typo: "XDP_ABORT" changed to "XDP_ABORTED". Add
        note on bpf_redirect_map() providing better performance. Replace "Save
        for" with "Except for".
      - bpf_skb_vlan_push(): Clarify comment about invalidated verifier
        checks.
      - bpf_skb_vlan_pop(): Clarify comment about invalidated verifier
        checks.
      - bpf_skb_get_tunnel_key(): Add notes on tunnel_id, "collect metadata"
        mode, and example tunneling protocols with which it can be used.
      - bpf_skb_set_tunnel_key(): Add a reference to the description of
        bpf_skb_get_tunnel_key().
      - bpf_perf_event_output(): Specify that, and for what purpose, the
        helper can be used with programs attached to TC and XDP.
      
      v3:
      - bpf_skb_get_tunnel_key(): Change and improve description and example.
      - bpf_redirect(): Improve description of BPF_F_INGRESS flag.
      - bpf_perf_event_output(): Fix first sentence of description. Delete
        wrong statement on context being evaluated as a struct pt_reg. Remove
        the long yet incomplete example.
      - bpf_get_stackid(): Add a note about PERF_MAX_STACK_DEPTH being
        configurable.
      
      Cc: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      c456dec4
    • Q
      bpf: add documentation for eBPF helpers (01-11) · ad4a5223
      Quentin Monnet 提交于
      Add documentation for eBPF helper functions to bpf.h user header file.
      This documentation can be parsed with the Python script provided in
      another commit of the patch series, in order to provide a RST document
      that can later be converted into a man page.
      
      The objective is to make the documentation easily understandable and
      accessible to all eBPF developers, including beginners.
      
      This patch contains descriptions for the following helper functions, all
      written by Alexei:
      
      - bpf_map_lookup_elem()
      - bpf_map_update_elem()
      - bpf_map_delete_elem()
      - bpf_probe_read()
      - bpf_ktime_get_ns()
      - bpf_trace_printk()
      - bpf_skb_store_bytes()
      - bpf_l3_csum_replace()
      - bpf_l4_csum_replace()
      - bpf_tail_call()
      - bpf_clone_redirect()
      
      v4:
      - bpf_map_lookup_elem(): Add "const" qualifier for key.
      - bpf_map_update_elem(): Add "const" qualifier for key and value.
      - bpf_map_lookup_elem(): Add "const" qualifier for key.
      - bpf_skb_store_bytes(): Clarify comment about invalidated verifier
        checks.
      - bpf_l3_csum_replace(): Mention L3 instead of just IP, and add a note
        about bpf_csum_diff().
      - bpf_l4_csum_replace(): Mention L4 instead of just TCP/UDP, and add a
        note about bpf_csum_diff().
      - bpf_tail_call(): Bring minor edits to description.
      - bpf_clone_redirect(): Add a note about the relation with
        bpf_redirect(). Also clarify comment about invalidated verifier
        checks.
      
      v3:
      - bpf_map_lookup_elem(): Fix description of restrictions for flags
        related to the existence of the entry.
      - bpf_trace_printk(): State that trace_pipe can be configured. Fix
        return value in case an unknown format specifier is met. Add a note on
        kernel log notice when the helper is used. Edit example.
      - bpf_tail_call(): Improve comment on stack inheritance.
      - bpf_clone_redirect(): Improve description of BPF_F_INGRESS flag.
      
      Cc: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      ad4a5223
    • Q
      bpf: add script and prepare bpf.h for new helpers documentation · 56a092c8
      Quentin Monnet 提交于
      Remove previous "overview" of eBPF helpers from user bpf.h header.
      Replace it by a comment explaining how to process the new documentation
      (to come in following patches) with a Python script to produce RST, then
      man page documentation.
      
      Also add the aforementioned Python script under scripts/. It is used to
      process include/uapi/linux/bpf.h and to extract helper descriptions, to
      turn it into a RST document that can further be processed with rst2man
      to produce a man page. The script takes one "--filename <path/to/file>"
      option. If the script is launched from scripts/ in the kernel root
      directory, it should be able to find the location of the header to
      parse, and "--filename <path/to/file>" is then optional. If it cannot
      find the file, then the option becomes mandatory. RST-formatted
      documentation is printed to standard output.
      
      Typical workflow for producing the final man page would be:
      
          $ ./scripts/bpf_helpers_doc.py \
                  --filename include/uapi/linux/bpf.h > /tmp/bpf-helpers.rst
          $ rst2man /tmp/bpf-helpers.rst > /tmp/bpf-helpers.7
          $ man /tmp/bpf-helpers.7
      
      Note that the tool kernel-doc cannot be used to document eBPF helpers,
      whose signatures are not available directly in the header files
      (pre-processor directives are used to produce them at the beginning of
      the compilation process).
      
      v4:
      - Also remove overviews for newly added bpf_xdp_adjust_tail() and
        bpf_skb_get_xfrm_state().
      - Remove vague statement about what helpers are restricted to GPL
        programs in "LICENSE" section for man page footer.
      - Replace license boilerplate with SPDX tag for Python script.
      
      v3:
      - Change license for man page.
      - Remove "for safety reasons" from man page header text.
      - Change "packets metadata" to "packets" in man page header text.
      - Move and fix comment on helpers introducing no overhead.
      - Remove "NOTES" section from man page footer.
      - Add "LICENSE" section to man page footer.
      - Edit description of file include/uapi/linux/bpf.h in man page footer.
      Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      56a092c8
    • J
      bpf: Add gpl_compatible flag to struct bpf_prog_info · b85fab0e
      Jiri Olsa 提交于
      Adding gpl_compatible flag to struct bpf_prog_info
      so it can be dumped via bpf_prog_get_info_by_fd and
      displayed via bpftool progs dump.
      
      Alexei noticed 4-byte hole in struct bpf_prog_info,
      so we put the u32 flags field in there, and we can
      keep adding bit fields in there without breaking
      user space.
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      b85fab0e
    • W
      udp: generate gso with UDP_SEGMENT · bec1f6f6
      Willem de Bruijn 提交于
      Support generic segmentation offload for udp datagrams. Callers can
      concatenate and send at once the payload of multiple datagrams with
      the same destination.
      
      To set segment size, the caller sets socket option UDP_SEGMENT to the
      length of each discrete payload. This value must be smaller than or
      equal to the relevant MTU.
      
      A follow-up patch adds cmsg UDP_SEGMENT to specify segment size on a
      per send call basis.
      
      Total byte length may then exceed MTU. If not an exact multiple of
      segment size, the last segment will be shorter.
      
      The implementation adds a gso_size field to the udp socket, ip(v6)
      cmsg cookie and inet_cork structure to be able to set the value at
      setsockopt or cmsg time and to work with both lockless and corked
      paths.
      
      Initial benchmark numbers show UDP GSO about as expensive as TCP GSO.
      
          tcp tso
           3197 MB/s 54232 msg/s 54232 calls/s
               6,457,754,262      cycles
      
          tcp gso
           1765 MB/s 29939 msg/s 29939 calls/s
              11,203,021,806      cycles
      
          tcp without tso/gso *
            739 MB/s 12548 msg/s 12548 calls/s
              11,205,483,630      cycles
      
          udp
            876 MB/s 14873 msg/s 624666 calls/s
              11,205,777,429      cycles
      
          udp gso
           2139 MB/s 36282 msg/s 36282 calls/s
              11,204,374,561      cycles
      
         [*] after reverting commit 0a6b2a1d
             ("tcp: switch to GSO being always on")
      
      Measured total system cycles ('-a') for one core while pinning both
      the network receive path and benchmark process to that core:
      
        perf stat -a -C 12 -e cycles \
          ./udpgso_bench_tx -C 12 -4 -D "$DST" -l 4
      
      Note the reduction in calls/s with GSO. Bytes per syscall drops
      increases from 1470 to 61818.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bec1f6f6
  9. 26 4月, 2018 1 次提交
    • T
      Revert: Unify CLOCK_MONOTONIC and CLOCK_BOOTTIME · a3ed0e43
      Thomas Gleixner 提交于
      Revert commits
      
      92af4dcb ("tracing: Unify the "boot" and "mono" tracing clocks")
      127bfa5f ("hrtimer: Unify MONOTONIC and BOOTTIME clock behavior")
      7250a404 ("posix-timers: Unify MONOTONIC and BOOTTIME clock behavior")
      d6c7270e ("timekeeping: Remove boot time specific code")
      f2d6fdbf ("Input: Evdev - unify MONOTONIC and BOOTTIME clock behavior")
      d6ed449a ("timekeeping: Make the MONOTONIC clock behave like the BOOTTIME clock")
      72199320 ("timekeeping: Add the new CLOCK_MONOTONIC_ACTIVE clock")
      
      As stated in the pull request for the unification of CLOCK_MONOTONIC and
      CLOCK_BOOTTIME, it was clear that we might have to revert the change.
      
      As reported by several folks systemd and other applications rely on the
      documented behaviour of CLOCK_MONOTONIC on Linux and break with the above
      changes. After resume daemons time out and other timeout related issues are
      observed. Rafael compiled this list:
      
      * systemd kills daemons on resume, after >WatchdogSec seconds
        of suspending (Genki Sky).  [Verified that that's because systemd uses
        CLOCK_MONOTONIC and expects it to not include the suspend time.]
      
      * systemd-journald misbehaves after resume:
        systemd-journald[7266]: File /var/log/journal/016627c3c4784cd4812d4b7e96a34226/system.journal
      corrupted or uncleanly shut down, renaming and replacing.
        (Mike Galbraith).
      
      * NetworkManager reports "networking disabled" and networking is broken
        after resume 50% of the time (Pavel).  [May be because of systemd.]
      
      * MATE desktop dims the display and starts the screensaver right after
        system resume (Pavel).
      
      * Full system hang during resume (me).  [May be due to systemd or NM or both.]
      
      That happens on debian and open suse systems.
      
      It's sad, that these problems were neither catched in -next nor by those
      folks who expressed interest in this change.
      Reported-by: NRafael J. Wysocki <rjw@rjwysocki.net>
      Reported-by: Genki Sky <sky@genki.is>,
      Reported-by: NPavel Machek <pavel@ucw.cz>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kevin Easton <kevin@guarana.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salyzyn <salyzyn@android.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      a3ed0e43
  10. 25 4月, 2018 2 次提交
    • E
      bpf: add helper for getting xfrm states · 12bed760
      Eyal Birger 提交于
      This commit introduces a helper which allows fetching xfrm state
      parameters by eBPF programs attached to TC.
      
      Prototype:
      bpf_skb_get_xfrm_state(skb, index, xfrm_state, size, flags)
      
      skb: pointer to skb
      index: the index in the skb xfrm_state secpath array
      xfrm_state: pointer to 'struct bpf_xfrm_state'
      size: size of 'struct bpf_xfrm_state'
      flags: reserved for future extensions
      
      The helper returns 0 on success. Non zero if no xfrm state at the index
      is found - or non exists at all.
      
      struct bpf_xfrm_state currently includes the SPI, peer IPv4/IPv6
      address and the reqid; it can be further extended by adding elements to
      its end - indicating the populated fields by the 'size' argument -
      keeping backwards compatibility.
      
      Typical usage:
      
      struct bpf_xfrm_state x = {};
      bpf_skb_get_xfrm_state(skb, 0, &x, sizeof(x), 0);
      ...
      Signed-off-by: NEyal Birger <eyal.birger@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      12bed760
    • M
      virtio_balloon: add array of stat names · b4000032
      Michael S. Tsirkin 提交于
      Jason Wang points out that it's very hard for users to build an array of
      stat names. The naive thing is to use VIRTIO_BALLOON_S_NR but that
      breaks if we add more stats - as done e.g. recently by commit 6c64fe7f
      ("virtio_balloon: export hugetlb page allocation counts").
      
      Let's add an array of reasonably readable names.
      
      Fixes: 6c64fe7f ("virtio_balloon: export hugetlb page allocation counts")
      Cc: Jason Wang <jasowang@redhat.com>
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Reviewed-by: NJonathan Helman <jonathan.helman@oracle.com>
      b4000032