1. 03 Mar, 2019: 7 commits
    • selftests: bpf: break up test_progs - preparations · 3f306588
      Committed by Stanislav Fomichev
      Add a new prog_tests directory where tests are supposed to land.
      Each prog_tests/<filename>.c is expected to have a global function
      with the signature 'void test_<filename>(void)'. The Makefile automatically
      generates a prog_tests/tests.h file with an entry for each prog_tests file:
      
      	#ifdef DECLARE
      	extern void test_<filename>(void);
      	...
      	#endif
      
      	#ifdef CALL
      	test_<filename>();
      	...
      	#endif
      
      prog_tests/tests.h is included in test_progs.c in two places with the
      appropriate defines. This scheme allows us to move each function in
      a separate patch without breaking anything.
      
      Compared to the recent verifier split, each separate file here is
      a compilation unit and test_progs.[ch] is now used as a place to put
      some common routines that might be used by multiple tests.
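      The two-pass include scheme above can be illustrated in a single file
      with an X-macro list. This is a sketch, not the patch's code: the real
      Makefile generates prog_tests/tests.h and test_progs.c includes it twice
      under DECLARE and CALL, whereas here the list macro PROG_TESTS and the
      test names test_alpha/test_beta are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in for the generated prog_tests/tests.h:
 * one X(...) entry per prog_tests/<filename>.c file. */
#define PROG_TESTS(X) \
	X(alpha)      \
	X(beta)

/* DECLARE pass: expands to "extern void test_alpha(void); ..." */
#define DECLARE_TEST(name) extern void test_##name(void);
PROG_TESTS(DECLARE_TEST)
#undef DECLARE_TEST

/* Shared state, in the role of the common routines in test_progs.[ch]. */
static int tests_run;

/* In the real layout, each of these lives in its own compilation unit. */
void test_alpha(void) { tests_run++; }
void test_beta(void) { tests_run++; }

/* CALL pass: expands to "test_alpha(); test_beta();" */
int run_all_tests(void)
{
#define CALL_TEST(name) test_##name();
	PROG_TESTS(CALL_TEST)
#undef CALL_TEST
	return tests_run;
}
```

      Calling run_all_tests() runs every listed test once; adding a test only
      requires a new list entry, which mirrors how a new
      prog_tests/<filename>.c file is picked up by the generated header.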
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge branch 'bpf_skb_ecn_set_ce' · 0d7f6827
      Committed by Alexei Starovoitov
      Lawrence Brakmo says:
      
      ====================
      Host Bandwidth Manager is a framework for limiting the bandwidth used
      by v2 cgroups. It consists of one BPF helper, a sample BPF program to
      limit egress bandwidth, and a sample user program and script to
      simplify HBM testing.
      
      The sample HBM BPF program is not meant to be production quality; it is
      provided as a proof of concept. Much more information, including sample
      runs in some cases, is provided in the commit messages of the individual
      patches.
      
      A future patch will add support for reducing TCP's cwnd (we are evaluating
      alternatives). Another patch will add support for fair queueing's Earliest
      Departure Time. Until then, HBM is better suited for flows supporting ECN.
      
      In addition, a BPF program to limit ingress bandwidth will be provided in
      an upcoming patchset.
      
      Changes from v1 to v2:
        * bpf_tcp_enter_cwr can only be called from a cgroup skb egress BPF
          program (otherwise load or attach will fail) where we already hold
          the sk lock. Also only applies for ESTABLISHED state.
        * bpf_skb_ecn_set_ce uses INET_ECN_set_ce()
        * bpf_tcp_check_probe_timer now uses tcp_reset_xmit_timer. Can only be
          used by egress cgroup skb programs.
        * removed load_cg_skb user program.
        * nrm bpf egress program checks packet header in skb to determine
          ECN value. Now also works for ECN enabled UDP packets.
          Using ECN_ defines instead of integers.
        * NRM script test program now uses bpftool instead of load_cg_skb
      
      Changes from v2 to v3:
        * Changed name from NRM (Network Resource Manager) to HBM (Host
          Bandwidth Manager)
        * The bpf helper to set ECN ce now checks that the header is writeable
        * Removed helper bpf functions that modified TCP state due to a concern
          about whether the socket is locked by the current thread.
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: HBM test script · 4ffd44cf
      Committed by brakmo
      Script for testing HBM (Host Bandwidth Manager) framework.
      It creates a cgroup to use for testing and loads a BPF program to limit
      egress bandwidth. It then uses iperf3 or netperf to create
      loads. The output is the goodput in Mbps (unless -D is used).
      
      It can work on a single host using loopback or between two hosts (with netperf).
      When using loopback, it is recommended to also introduce a delay of at least
      1ms (-d=1), otherwise the assigned bandwidth is likely to be underutilized.
      
      USAGE: $name [out] [-b=<prog>|--bpf=<prog>] [-c=<cc>|--cc=<cc>] [-D]
                   [-d=<delay>|--delay=<delay>] [--debug] [-E]
                   [-f=<#flows>|--flows=<#flows>] [-h] [-i=<id>|--id=<id>] [-l]
                   [-N] [-p=<port>|--port=<port>] [-P] [-q=<qdisc>]
                   [-R] [-s=<server>|--server=<server>] [--stats]
                   [-t=<time>|--time=<time>] [-w] [cubic|dctcp]
        Where:
          out               Egress (default egress)
          -b or --bpf       BPF program filename to load and attach.
                          Default is hbm_out_kern.o for egress.
          -c or --cc        TCP congestion control (cubic or dctcp)
          -d or --delay     Add a delay in ms using netem
          -D                In addition to the goodput in Mbps, it also outputs
                            other detailed information. This information is
                            test dependent (i.e. iperf3 or netperf).
          --debug           Print BPF trace buffer
          -E                Enable ECN (not required for dctcp)
          -f or --flows     Number of concurrent flows (default=1)
          -i or --id        cgroup id (an integer, default is 1)
          -l                Do not limit flows using loopback
          -N                Use netperf instead of iperf3
          -h                Help
          -p or --port      iperf3 port (default is 5201)
          -P                Use an iperf3 instance for each flow
          -q                Use the specified qdisc.
          -r or --rate      Rate in Mbps (default is 1Gbps)
          -R                Use TCP_RR for netperf. 1st flow has req
                            size of 10KB, rest of 1MB. Reply in all
                            cases is 1 byte.
                            More detailed output for each flow can be found
                            in the files netperf.<cg>.<flow>, where <cg> is the
                            cgroup id as specified with the -i flag, and <flow>
                            is the flow id starting at 1 and increasing by 1 for
                            each flow (as specified by -f).
          -s or --server    Hostname of the netperf server. Used to create netperf
                            test traffic between two hosts (default is within host).
                            netserver must be running on that host.
          --stats           Get HBM stats (marked, dropped, etc.)
          -t or --time      duration of iperf3 in seconds (default=5)
          -w                Work conserving flag. cgroup can increase its
                            bandwidth beyond the rate limit specified
                            while there is available bandwidth. Current
                            implementation assumes there is only one NIC
                            (eth0), but can be extended to support multiple
                            NICs. This is just a proof of concept.
          cubic or dctcp    specify TCP CC to use
      
      Examples:
       ./do_hbm_test.sh -l -d=1 -D --stats
           Runs a 5 second test using a single iperf3 flow, the default
           rate limit of 1Gbps, a delay of 1ms (using netem) and the default
           TCP congestion control on the loopback device (hence we use "-l" to
           enforce the bandwidth limit on the loopback device). Since no direction
           is specified, it defaults to egress. Since no TCP CC algorithm is
           specified, it uses the system default (Cubic for this test).
           Without the -D flag, only the AGGREGATE_GOODPUT value would be shown.
           id refers to the cgroup id and is useful when running multi-cgroup
           tests (supported by a future patch).
           This patchset does not support calling TCP's congestion window
           reduction, even when packets are dropped by the BPF program, resulting
           in a large number of dropped packets. It is recommended that the current
           HBM implementation only be used with ECN enabled flows. A future patch
           will add support for reducing TCP's cwnd and will increase the
           performance of non-ECN enabled flows.
         Output:
           Details for HBM in cgroup 1
           id:1
           rate_mbps:493
           duration:4.8 secs
           packets:11355
           bytes_MB:590
           pkts_dropped:4497
           bytes_dropped_MB:292
           pkts_marked_percent: 39.60
           bytes_marked_percent: 49.49
           pkts_dropped_percent: 39.60
           bytes_dropped_percent: 49.49
           PING AVG DELAY:2.075
           AGGREGATE_GOODPUT:505
      
       ./do_hbm_test.sh -l -d=1 -D --stats dctcp
           Same as above but using dctcp. Note that fewer bytes are dropped
           (0.01% vs. 49%).
         Output:
           Details for HBM in cgroup 1
           id:1
           rate_mbps:945
           duration:4.9 secs
           packets:16859
           bytes_MB:578
           pkts_dropped:1
           bytes_dropped_MB:0
           pkts_marked_percent: 28.74
           bytes_marked_percent: 45.15
           pkts_dropped_percent:  0.01
           bytes_dropped_percent:  0.01
           PING AVG DELAY:2.083
           AGGREGATE_GOODPUT:965
      
       ./do_hbm_test.sh -d=1 -D --stats
           As in the first example, but without limiting the loopback device
           (i.e. no "-l" flag). Since there is no bandwidth limiting, no details
           for HBM are printed out.
         Output:
           Details for HBM in cgroup 1
           PING AVG DELAY:2.019
           AGGREGATE_GOODPUT:42655
      
       ./do_hbm_test.sh -l -d=1 -D --stats -f=2
           Uses iperf3 and does 2 flows.
       ./do_hbm_test.sh -l -d=1 -D --stats -f=4 -P
           Uses iperf3 and does 4 flows, each flow as a separate process.
       ./do_hbm_test.sh -l -d=1 -D --stats -f=4 -N
           Uses netperf, 4 flows.
       ./do_hbm_test.sh -f=1 -r=2000 -t=5 -N -D --stats dctcp -s=<server-name>
           Uses netperf between two hosts. The remote host name is specified
           with -s= and you need to start the program netserver manually on
           the remote host. It will use 1 flow, a rate limit of 2Gbps and dctcp.
       ./do_hbm_test.sh -f=1 -r=2000 -t=5 -N -D --stats -w dctcp \
           -s=<server-name>
           As previous, but allows use of extra bandwidth. For this test the
           rate is 8Gbps vs. 1Gbps of the previous test.
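      The percentage fields in the --stats output above appear to be plain
      ratios of the raw counters: 4497 / 11355 packets is 39.60% and
      292 / 590 MB is 49.49%. A minimal sketch of that arithmetic, assuming
      two-decimal rounding (hbm_percent is a hypothetical name; the actual
      computation in the script and in hbm.c is not shown in this message):

```c
#include <assert.h>

/* Hypothetical helper: `part` as a percentage of `total`, rounded half-up
 * to two decimals like the --stats output. Counters are non-negative. */
double hbm_percent(unsigned long long part, unsigned long long total)
{
	double scaled;

	if (total == 0)
		return 0.0;                   /* avoid dividing by zero */
	scaled = 10000.0 * (double)part / (double)total; /* percent * 100 */
	return (double)(long long)(scaled + 0.5) / 100.0;
}
```

      The same helper reproduces the 0.01% drop figure of the dctcp run
      (1 dropped packet out of 16859).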
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: User program for testing HBM · a1270fe9
      Committed by brakmo
      The program hbm creates a cgroup and attaches a BPF program to the
      cgroup for testing HBM (Host Bandwidth Manager) for egress traffic.
      One still needs to create network traffic. This can be done through
      netesto, netperf or iperf3.
      A follow-up patch contains a script to create traffic.
      
      USAGE: hbm [-d] [-l] [-n <id>] [-r <rate>] [-s] [-t <secs>]
                 [-w] [-h] [prog]
        Where:
         -d        Print BPF trace debug buffer
         -l        Also limit flows doing loopback
         -n <#>    To create cgroup "/hbm#" and attach prog. Default is /hbm1
                   This is convenient when testing HBM in more than 1 cgroup
         -r <rate> Rate limit in Mbps
         -s        Get HBM stats (marked, dropped, etc.)
         -t <time> Exit after specified seconds (default is 0)
         -w        Work conserving flag. cgroup can increase its bandwidth
                   beyond the rate limit specified while there is available
                   bandwidth. Current implementation assumes there is only
                   one NIC (eth0), but can be extended to support multiple NICs.
                   Currently only supported for egress. Note, this is just
                   a proof of concept.
         -h        Print this info
         prog      BPF program file name. Name defaults to hbm_out_kern.o
      
      More information about HBM can be found in the paper "BPF Host Resource
      Management" presented at the 2018 Linux Plumbers Conference, Networking Track
      (http://vger.kernel.org/lpc_net2018_talks/LPC%20BPF%20Network%20Resource%20Paper.pdf)
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Sample HBM BPF program to limit egress bw · 187d0738
      Committed by brakmo
      A cgroup skb BPF program to limit cgroup output bandwidth.
      It uses a modified virtual token bucket queue to limit average
      egress bandwidth. The implementation uses credits instead of tokens.
      Negative credits imply that queueing would have happened (this is
      a virtual queue, so it does no queueing itself; however, queueing may
      still occur at the actual qdisc, which is not used for rate limiting).
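      A minimal user-space sketch of such a credit-based virtual queue,
      assuming credit is kept in bytes, the rate in Mbps and time in
      microseconds, with a hypothetical max_credit cap. The struct and
      function names are illustrative and do not reproduce the sample
      program's actual map layout, units or constants:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-cgroup virtual queue state. */
struct vqueue {
	int64_t credit;     /* bytes; negative => a real queue would have formed */
	int64_t max_credit; /* assumed cap on accumulated credit */
	uint64_t last_us;   /* time of the previous update, in microseconds */
	uint32_t rate_mbps; /* configured rate limit */
};

/* Replenish credit for the elapsed time, then charge the packet.
 * Returns the new credit; the packet itself is never held back. */
int64_t vqueue_charge(struct vqueue *q, uint64_t now_us, uint32_t pkt_len)
{
	uint64_t elapsed = now_us - q->last_us;

	/* 1 Mbps is 1 bit per microsecond, i.e. rate_mbps / 8 bytes per us. */
	q->credit += (int64_t)(elapsed * q->rate_mbps / 8);
	if (q->credit > q->max_credit)
		q->credit = q->max_credit;
	q->last_us = now_us;

	q->credit -= pkt_len;
	return q->credit;
}
```

      The returned credit is what the thresholds described next would be
      compared against; nothing is delayed, which is what makes the queue
      virtual.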
      
      This implementation uses 3 thresholds, one to start marking packets and
      the other two to drop packets:
                                       CREDIT
             - <--------------------------|------------------------> +
                   |    |          |      0
                   |  Large pkt    |
                   |  drop thresh  |
        Small pkt drop             Mark threshold
            thresh
      
      The effect of marking depends on the type of packet:
      a) If the packet is ECN enabled, then the packet is ECN CE-marked.
         The current mark threshold is tuned for DCTCP.
      b) Else, it is dropped if it is a large packet.
      
      If the credit is below the drop threshold, the packet is dropped.
      Note that dropping a packet through the BPF program does not trigger CWR
      (Congestion Window Reduction) in TCP. A future patch will add
      support for triggering CWR.
      
      This BPF program actually uses 2 drop thresholds, one threshold
      for larger packets (>= 120 bytes) and another for smaller packets. This
      protects smaller packets such as SYNs, ACKs, etc.
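      The threshold logic can be sketched as a pure classification
      function. Only the 120-byte large-packet cutoff and the ordering of
      the thresholds come from the text; the threshold values and all names
      are hypothetical, and the order of checks is one reading of the
      diagram rather than the sample's exact code:

```c
#include <assert.h>
#include <stdbool.h>

enum hbm_verdict { HBM_PASS, HBM_MARK, HBM_DROP };

/* Hypothetical credit thresholds (bytes). Only the ordering
 * SMALL_DROP < LARGE_DROP < MARK < 0 follows the diagram. */
#define MARK_THRESH       (-20000)
#define LARGE_DROP_THRESH (-100000)
#define SMALL_DROP_THRESH (-200000)
#define LARGE_PKT_BYTES   120 /* packets >= 120 bytes count as large */

enum hbm_verdict hbm_classify(long credit, unsigned int len, bool ecn_capable)
{
	bool large = len >= LARGE_PKT_BYTES;

	if (credit < SMALL_DROP_THRESH)
		return HBM_DROP;          /* below the last threshold: drop all */
	if (large && credit < LARGE_DROP_THRESH)
		return HBM_DROP;          /* large packets are dropped earlier */
	if (credit < MARK_THRESH) {
		if (ecn_capable)
			return HBM_MARK;  /* would set ECN CE */
		return large ? HBM_DROP : HBM_PASS;
	}
	return HBM_PASS;
}
```

      Small packets such as pure ACKs keep passing (or get marked) until the
      deepest threshold, which is the protection the separate drop
      thresholds are meant to give.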
      
      The default bandwidth limit is set at 1Gbps but this can be changed by
      a user program through a shared BPF map. In addition, by default this BPF
      program does not limit connections using loopback. This behavior can be
      overridden by the user program. There is also an option to calculate
      some statistics, such as percent of packets marked or dropped, which
      the user program can access.
      
      A later patch provides such a program (hbm.c).
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: sync bpf.h to tools and update bpf_helpers.h · 5cce85c6
      Committed by brakmo
      This patch syncs the uapi bpf.h to tools/ and also updates
      bpf_helpers.h in tools/.
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: add bpf helper bpf_skb_ecn_set_ce · f7c917ba
      Committed by brakmo
      This patch adds a new bpf helper BPF_FUNC_skb_ecn_set_ce
      "int bpf_skb_ecn_set_ce(struct sk_buff *skb)". It is added to
      BPF_PROG_TYPE_CGROUP_SKB typed bpf_prog which currently can
      be attached to the ingress and egress path. The helper is needed
      because this type of bpf_prog cannot modify the skb directly.
      
      This helper is used to set the ECN field of ECN capable IP packets to CE
      (congestion encountered) in the IPv6 or IPv4 header of the skb. It can be
      used by a bpf_prog to manage egress or ingress network bandwidth limits
      per cgroupv2 by inducing an ECN response in the TCP sender.
      This works best when using DCTCP.
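      The codepoint logic behind setting CE can be sketched on a bare
      IPv4 TOS / IPv6 Traffic Class byte, using the RFC 3168 values for the
      low two bits (00 Not-ECT, 01 ECT(1), 10 ECT(0), 11 CE). This is a
      user-space illustration with hypothetical names; the kernel's
      INET_ECN_set_ce() also locates the header in the skb and, for IPv4,
      updates the header checksum, which this sketch omits:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* RFC 3168 ECN codepoints in the low two bits of TOS / Traffic Class. */
#define ECN_MASK    0x03
#define ECN_NOT_ECT 0x00
#define ECN_CE      0x03

/* Mark an ECN-capable packet as Congestion Experienced.
 * Returns true if the packet carries CE afterwards. */
bool ecn_set_ce(uint8_t *tos)
{
	uint8_t ecn = *tos & ECN_MASK;

	if (ecn == ECN_NOT_ECT)
		return false; /* sender is not ECN capable: leave it alone */
	*tos |= ECN_CE;       /* ECT(0)/ECT(1) become CE; CE stays CE */
	return true;
}
```

      The DSCP bits in the upper six bits are preserved, so only the
      congestion signal changes.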
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  2. 02 Mar, 2019: 7 commits
  3. 01 Mar, 2019: 16 commits
  4. 28 Feb, 2019: 9 commits
    • Merge branch 'inner_map_spin_lock-fix' · 3bcd6044
      Committed by Alexei Starovoitov
      Yonghong Song says:
      
      ====================
      The inner_map_meta->spin_lock_off is not set correctly during
      map creation for BPF_MAP_TYPE_ARRAY_OF_MAPS and BPF_MAP_TYPE_HASH_OF_MAPS.
      This may lead to verifier errors due to incorrect information.
      This patch set fixes the issue: Patch #1 contains the kernel change
      and Patch #2 enhances the test_maps selftest.
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • tools/bpf: selftests: add map lookup to test_map_in_map bpf prog · 9eca5083
      Committed by Yonghong Song
      A bpf_map_lookup_elem() call is added to the bpf program.
      Without the previous patch, this test change triggers the
      following error:
        $ ./test_maps
        ...
        ; value_p = bpf_map_lookup_elem(map, &key);
        20: (bf) r1 = r7
        21: (bf) r2 = r8
        22: (85) call bpf_map_lookup_elem#1
        ; if (!value_p || *value_p != 123)
        23: (15) if r0 == 0x0 goto pc+16
         R0=map_value(id=2,off=0,ks=4,vs=4,imm=0) R6=inv1 R7=map_ptr(id=0,off=0,ks=4,vs=4,imm=0)
         R8=fp-8,call_-1 R10=fp0,call_-1 fp-8=mmmmmmmm
        ; if (!value_p || *value_p != 123)
        24: (61) r1 = *(u32 *)(r0 +0)
         R0=map_value(id=2,off=0,ks=4,vs=4,imm=0) R6=inv1 R7=map_ptr(id=0,off=0,ks=4,vs=4,imm=0)
         R8=fp-8,call_-1 R10=fp0,call_-1 fp-8=mmmmmmmm
        bpf_spin_lock cannot be accessed directly by load/store
      
      With the kernel fix in the previous commit, the error goes away.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: set inner_map_meta->spin_lock_off correctly · a115d0ed
      Committed by Yonghong Song
      Commit d83525ca ("bpf: introduce bpf_spin_lock")
      introduced bpf_spin_lock, and the field spin_lock_off
      in the kernel-internal structure bpf_map has the following
      meaning:
        >=0 valid offset, <0 error
      
      For every map created, the kernel will ensure
      spin_lock_off has correct value.
      
      Currently, bpf_map->spin_lock_off is not copied
      from the inner map to the map_in_map inner_map_meta
      during a map_in_map type map creation, so
      inner_map_meta->spin_lock_off = 0.
      This gives the verifier the wrong information that
      inner_map has a bpf_spin_lock and that the bpf_spin_lock
      is defined at offset 0. An access to offset 0
      of a value pointer will trigger the following error:
         bpf_spin_lock cannot be accessed directly by load/store
      
      This patch fixes the issue by copying the inner map's spin_lock_off
      value to inner_map_meta->spin_lock_off.
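      A simplified sketch of the fix, using a hypothetical two-field struct
      in place of struct bpf_map. The meta template here starts out zeroed
      (consistent with the value 0 described above), so a missing copy
      leaves spin_lock_off at 0, which the verifier would read as
      "bpf_spin_lock at offset 0":

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct bpf_map. */
struct map_sketch {
	unsigned int value_size;
	int spin_lock_off; /* >= 0: offset of bpf_spin_lock, < 0: none/error */
};

/* Build the inner_map_meta template from the inner map. */
void build_inner_meta(struct map_sketch *meta, const struct map_sketch *inner)
{
	memset(meta, 0, sizeof(*meta));          /* spin_lock_off is now 0 ... */
	meta->value_size = inner->value_size;
	meta->spin_lock_off = inner->spin_lock_off; /* ... so copy it explicitly */
}
```

      Dropping the last assignment reproduces the bug: an inner map without
      a spin lock would yield a meta that claims one at offset 0.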
      
      Fixes: d83525ca ("bpf: introduce bpf_spin_lock")
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • samples: bpf: fix: broken sample regarding removed function · d2e614cb
      Committed by Daniel T. Lee
      Currently, running the samples "task_fd_query" and "tracex3" produces the
      following error. On kernel v5.0-rc* these samples are broken due to the
      removal of the function 'blk_start_request' in commit "a1ce35fa" (the
      function was removed because the "Single Queue IO scheduler" no longer exists).
      
      $ sudo ./task_fd_query
      failed to create kprobe 'blk_start_request' error 'No such file or directory'
      
      This commit changes the function 'blk_start_request' to
      'blk_mq_start_request' to fix the broken samples.
      Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • Merge branch 'bpf-prog-stats' · da4e023e
      Committed by Daniel Borkmann
      Alexei Starovoitov says:
      
      ====================
      Introduce per program stats to monitor the usage of BPF.
      
      v2->v3:
      - rename to run_time_ns/run_cnt everywhere
      
      v1->v2:
      - fixed u64 stats on 32-bit archs. Thanks Eric
      - use more verbose run_time_ns in json output as suggested by Andrii
      - refactored prog_alloc and clarified behavior of stats in subprogs
      ====================
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • tools/bpftool: recognize bpf_prog_info run_time_ns and run_cnt · 88ad472b
      Committed by Alexei Starovoitov
      $ bpftool p s
      1: kprobe  tag a56587d488d216c9  gpl run_time_ns 79786 run_cnt 8
      	loaded_at 2019-02-22T12:22:51-0800  uid 0
      	xlated 352B  not jited  memlock 4096B
      
      $ bpftool --json --pretty p s
      [{
              "id": 1,
              "type": "kprobe",
              "tag": "a56587d488d216c9",
              "gpl_compatible": true,
              "run_time_ns": 79786,
              "run_cnt": 8,
              "loaded_at": 1550866971,
              "uid": 0,
              "bytes_xlated": 352,
              "jited": false,
              "bytes_memlock": 4096
          }
      ]
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • tools/bpf: sync bpf.h into tools · b1eca86d
      Committed by Alexei Starovoitov
      Sync bpf.h into the tools directory.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: expose program stats via bpf_prog_info · 5f8f8b93
      Committed by Alexei Starovoitov
      Return bpf program run_time_ns and run_cnt via bpf_prog_info
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: enable program stats · 492ecee8
      Committed by Alexei Starovoitov
      JITed BPF programs are indistinguishable from kernel functions, but unlike
      kernel code, BPF code can be changed often.
      The typical approach of "perf record" + "perf report" profiling and tuning of
      kernel code works just as well for BPF programs, but kernel code doesn't
      need to be monitored whereas BPF programs do.
      Users load and run large numbers of BPF programs.
      These BPF stats allow tools to monitor the usage of BPF on the server.
      The monitoring tools will turn sysctl kernel.bpf_stats_enabled
      on and off for a few seconds to sample the average cost of the programs.
      Aggregated data over hours and days will provide insight into the cost of
      BPF, and alarms can trigger in case a given program suddenly gets more expensive.
      
      The cost of two sched_clock() calls per program invocation adds ~20 nsec.
      Fast BPF progs (like selftests/bpf/progs/test_pkt_access.c) will slow down
      from ~10 nsec to ~30 nsec.
      static_key minimizes the cost of the stats collection.
      There is no measurable difference before/after this patch
      with kernel.bpf_stats_enabled=0.
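      Because run_time_ns and run_cnt are cumulative counters, a monitoring
      tool can derive the average cost per invocation over a sampling window
      from two snapshots. A minimal sketch of that arithmetic; the struct and
      function names are hypothetical, and fetching the snapshots (for
      example through the BPF_OBJ_GET_INFO_BY_FD command) is left out:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of the two counters exposed via bpf_prog_info. */
struct prog_stats {
	uint64_t run_time_ns; /* cumulative time spent in the program */
	uint64_t run_cnt;     /* cumulative number of invocations */
};

/* Average ns per run between two snapshots; 0 if it did not run. */
uint64_t avg_run_ns(const struct prog_stats *before,
		    const struct prog_stats *after)
{
	uint64_t runs = after->run_cnt - before->run_cnt;

	if (runs == 0)
		return 0;
	return (after->run_time_ns - before->run_time_ns) / runs;
}
```

      With the bpftool figures shown earlier (run_time_ns 79786, run_cnt 8,
      starting from zero) this gives 9973 ns per run, using integer division.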
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  5. 26 Feb, 2019: 1 commit
    • Merge branch 'bpf-libbpf-af-xdp' · 143bdc2e
      Committed by Daniel Borkmann
      Magnus Karlsson says:
      
      ====================
      This patch proposes to add AF_XDP support to libbpf. The main reason
      for this is to facilitate writing applications that use AF_XDP by
      offering higher-level APIs that hide many of the details of the AF_XDP
      uapi. This is in the same vein as how libbpf facilitates XDP adoption by
      offering easy-to-use higher-level interfaces to XDP
      functionality. Hopefully this will facilitate adoption of AF_XDP, make
      applications using it simpler and smaller, and finally also make it
      possible for applications to benefit from optimizations in the AF_XDP
      user space access code. Previously, people just copied and pasted the
      code from the sample application into their application, which is not
      desirable.
      
      The proposed interface is composed of two parts:
      
      * Low-level access interface to the four rings and the packet
      * High-level control plane interface for creating and setting up umems
        and AF_XDP sockets. This interface also loads a simple XDP program
        that routes all traffic on a queue up to the AF_XDP socket.
      
      The sample program has been updated to use this new interface and in
      that process it lost roughly 300 lines of code. I cannot detect any
      performance degradations due to the use of this library instead of the
      previous functions that were inlined in the sample application. But I
      did measure this on a slower machine and not the Broadwell that we
      normally use.
      
      The rings are now called xsk_ring. When a producer operates on a ring,
      it is an xsk_ring_prod, and for a consumer it is an xsk_ring_cons. This
      way we get some compile-time error checking that the rings are
      used correctly.
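      The compile-time checking idea can be sketched with two distinct
      handle types over one ring. The names and layout below are
      hypothetical and much simpler than libbpf's xsk_ring_prod and
      xsk_ring_cons; in particular, real AF_XDP rings live in memory shared
      with the kernel and need memory barriers, which this single-threaded
      sketch does not model:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-producer/single-consumer ring (not libbpf's layout). */
struct ring {
	uint32_t size;   /* number of entries, must be a power of two */
	uint32_t prod;   /* free-running producer index */
	uint32_t cons;   /* free-running consumer index */
	uint64_t *descs; /* descriptor storage with `size` entries */
};

/* Distinct handle types over the same ring, so producer and consumer
 * operations cannot be mixed up without a compiler diagnostic. */
struct ring_prod { struct ring *r; };
struct ring_cons { struct ring *r; };

int ring_prod_push(struct ring_prod *p, uint64_t desc)
{
	struct ring *r = p->r;

	if (r->prod - r->cons == r->size)
		return -1;                          /* ring is full */
	r->descs[r->prod & (r->size - 1)] = desc;
	r->prod++;
	return 0;
}

int ring_cons_pop(struct ring_cons *c, uint64_t *desc)
{
	struct ring *r = c->r;

	if (r->prod == r->cons)
		return -1;                          /* ring is empty */
	*desc = r->descs[r->cons & (r->size - 1)];
	r->cons++;
	return 0;
}
```

      Calling ring_prod_push() with a struct ring_cons * triggers an
      incompatible-pointer-type diagnostic, which is the kind of misuse the
      separate types are meant to catch.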
      
      Comments and contemplations:
      
      * The current behaviour is that the library loads an XDP program (if
        requested to do so) but the clean up of this program is left to the
        application. It would be possible to implement this cleanup in the
        library, but it would require state to be kept at the netdev level,
        of which there is none at the moment, and synchronization of this
        state between processes. All of this adds complexity. But when we get an
        XDP program per queue id, then it becomes trivial to also remove the
        XDP program when the application exits. This proposal from Jesper,
        Björn and others will also improve the performance of libbpf, since
        most of the XDP program code can be removed when that feature is
        supported.
      
      * In a future release, I am planning on adding a higher level data
        plane interface too. This will be based around recvmsg and sendmsg
        with the use of struct iovec for batching, without the user having
        to know anything about the underlying four rings of an AF_XDP
        socket. There will be one semantic difference from the standard
        recvmsg, though: the kernel will fill in the iovecs instead of
        the application. But the rest should be the same as the
        libc versions so that application writers feel at home.
      
      Patch 1: adds AF_XDP support in libbpf
      Patch 2: updates the xdpsock sample application to use the libbpf functions
      Patch 3: Documentation update to help first time users
      
      Changes v5 to v6:
        * Fixed prog_fd bug found by Xiaolong Ye. Thanks!
      Changes v4 to v5:
        * Added a FAQ to the documentation
        * Removed xsk_umem__get_data and renamed xsk_umem__get_dat_raw to
          xsk_umem__get_data
        * Replaced the netlink code with bpf_get_link_xdp_id()
        * Dynamic allocation of the map sizes. They are now sized after
          the max number of queues on the netdev in question.
      Changes v3 to v4:
        * Dropped the pr_*() patch in favor of Yonghong Song's patch set
        * Addressed the review comments of Daniel Borkmann, mainly leaking
          of file descriptors at clean up and making the data plane APIs
          all static inline (with the exception of xsk_umem__get_data that
          uses an internal structure I do not want to expose).
        * Fixed the netlink callback as suggested by Maciej Fijalkowski.
        * Removed an unnecessary include in the sample program as spotted by
          Ilia Fillipov.
      Changes v2 to v3:
        * Added automatic loading of a simple XDP program that routes all
          traffic on a queue up to the AF_XDP socket. This program loading
          can be disabled.
        * Updated function names to be consistent with the libbpf naming
          convention
        * Moved all code to xsk.[ch]
        * Removed all the XDP program loading code from the sample since
          this is now done by libbpf
        * The initialization functions now return a handle as suggested by
          Alexei
        * const statements added in the API where applicable.
      Changes v1 to v2:
        * Fixed cleanup of library state on error.
        * Moved API to initial version
        * Prefixed all public functions by xsk__ instead of xsk_
        * Added comment about changed default ring sizes, batch size and umem
          size in the sample application commit message
        * The library now only creates an Rx or Tx ring if the respective
          parameter is != NULL
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>