1. 23 5月, 2018 4 次提交
    • M
      bpf: btf: Rename btf_key_id and btf_value_id in bpf_map_info · 9b2cf328
      Martin KaFai Lau 提交于
      In "struct bpf_map_info", the name "btf_id", "btf_key_id" and "btf_value_id"
      could cause confusion because the "id" of "btf_id" means the BPF obj id
      given to the BTF object while
      "btf_key_id" and "btf_value_id" means the BTF type id within
      that BTF object.
      
      To make it clear, btf_key_id and btf_value_id are
      renamed to btf_key_type_id and btf_value_type_id.
      Suggested-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      9b2cf328
    • M
      bpf: btf: Remove unused bits from uapi/linux/btf.h · aea2f7b8
      Martin KaFai Lau 提交于
      This patch does the followings:
      1. Limit BTF_MAX_TYPES and BTF_MAX_NAME_OFFSET to 64k.  We can
         raise it later.
      
      2. Remove the BTF_TYPE_PARENT and BTF_STR_TBL_ELF_ID.  They are
         currently encoded at the highest bit of a u32.
         It is because the current use case does not require supporting
         parent type (i.e type_id referring to a type in another BTF file).
         It also does not support referring to a string in ELF.
      
         The BTF_TYPE_PARENT and BTF_STR_TBL_ELF_ID checks are replaced
         by BTF_TYPE_ID_CHECK and BTF_STR_OFFSET_CHECK which are
         defined in btf.c instead of uapi/linux/btf.h.
      
      3. Limit the BTF_INFO_KIND from 5 bits to 4 bits which is enough.
         There is unused bits headroom if we ever needed it later.
      
      4. The root bit in BTF_INFO is also removed because it is not
         used in the current use case.
      
      5. Remove BTF_INT_VARARGS since func type is not supported now.
         The BTF_INT_ENCODING is limited to 4 bits instead of 8 bits.
      
      The above can be added back later because the verifier
      ensures the unused bits are zeros.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      aea2f7b8
    • M
      bpf: btf: Change how section is supported in btf_header · f80442a4
      Martin KaFai Lau 提交于
      There are currently unused section descriptions in the btf_header.  Those
      sections are here to support future BTF use cases.  For example, the
      func section (func_off) is to support function signature (e.g. the BPF
      prog function signature).
      
      Instead of spelling out all potential sections up-front in the btf_header.
      This patch makes changes to btf_header such that extending it (e.g. adding
      a section) is possible later.  The unused ones can be removed for now and
      they can be added back later.
      
      This patch:
      1. adds a hdr_len to the btf_header.  It will allow adding
      sections (and other info like parent_label and parent_name)
      later.  The check is similar to the existing bpf_attr.
      If a user passes in a longer hdr_len, the kernel
      ensures the extra tailing bytes are 0.
      
      2. allows the section order in the BTF object to be
      different from its sec_off order in btf_header.
      
      3. each sec_off is followed by a sec_len.  It must not have gap or
      overlapping among sections.
      
      The string section is ensured to be at the end due to the 4 bytes
      alignment requirement of the type section.
      
      The above changes will allow enough flexibility to
      add new sections (and other info) to the btf_header later.
      
      This patch also removes an unnecessary !err check
      at the end of btf_parse().
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      f80442a4
    • M
      bpf: Expose check_uarg_tail_zero() · dcab51f1
      Martin KaFai Lau 提交于
      This patch exposes check_uarg_tail_zero() which will
      be reused by a later BTF patch.  Its name is changed to
      bpf_check_uarg_tail_zero().
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      dcab51f1
  2. 22 5月, 2018 4 次提交
  3. 19 5月, 2018 1 次提交
  4. 18 5月, 2018 1 次提交
  5. 17 5月, 2018 3 次提交
  6. 16 5月, 2018 1 次提交
  7. 15 5月, 2018 3 次提交
    • J
      bpf: sockmap, refactor sockmap routines to work with hashmap · e5cd3abc
      John Fastabend 提交于
      This patch only refactors the existing sockmap code. This will allow
      much of the psock initialization code path and bpf helper codes to
      work for both sockmap bpf map types that are backed by an array, the
      currently supported type, and the new hash backed bpf map type
      sockhash.
      
      Most the fallout comes from three changes,
      
        - Pushing bpf programs into an independent structure so we
          can use it from the htab struct in the next patch.
        - Generalizing helpers to use void *key instead of the hardcoded
          u32.
        - Instead of passing map/key through the metadata we now do
          the lookup inline. This avoids storing the key in the metadata
          which will be useful when keys can be longer than 4 bytes. We
          rename the sk pointers to sk_redir at this point as well to
          avoid any confusion between the current sk pointer and the
          redirect pointer sk_redir.
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      e5cd3abc
    • M
      sched: cls: enable verbose logging · 81c7288b
      Marcelo Ricardo Leitner 提交于
      Currently, when the rule is not to be exclusively executed by the
      hardware, extack is not passed along and offloading failures don't
      get logged. The idea was that hardware failures are okay because the
      rule will get executed in software then and this way it doesn't confuse
      unware users.
      
      But this is not helpful in case one needs to understand why a certain
      rule failed to get offloaded. Considering it may have been a temporary
      failure, like resources exceeded or so, reproducing it later and knowing
      that it is triggering the same reason may be challenging.
      
      The ultimate goal is to improve Open vSwitch debuggability when using
      flower offloading.
      
      This patch adds a new flag to enable verbose logging. With the flag set,
      extack will be passed to the driver, which will be able to log the
      error. As the operation itself probably won't fail (not because of this,
      at least), current iproute will already log it as a Warning.
      
      The flag is generic, so it can be reused later. No need to restrict it
      just for HW offloading. The command line will follow the syntax that
      tc-ebpf already uses, tc ... [ verbose ] ... , and extend its meaning.
      
      For example:
      # ./tc qdisc add dev p7p1 ingress
      # ./tc filter add dev p7p1 parent ffff: protocol ip prio 1 \
      	flower verbose \
      	src_mac ed:13:db:00:00:00 dst_mac 01:80:c2:00:00:d0 \
      	src_ip 56.0.0.0 dst_ip 55.0.0.0 action drop
      Warning: TC offload is disabled on net device.
      # echo $?
      0
      # ./tc filter add dev p7p1 parent ffff: protocol ip prio 1 \
      	flower \
      	src_mac ff:13:db:00:00:00 dst_mac 01:80:c2:00:00:d0 \
      	src_ip 56.0.0.0 dst_ip 55.0.0.0 action drop
      # echo $?
      0
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      81c7288b
    • R
      vmcore: add API to collect hardware dump in second kernel · 2724273e
      Rahul Lakkireddy 提交于
      The sequence of actions done by device drivers to append their device
      specific hardware/firmware logs to /proc/vmcore are as follows:
      
      1. During probe (before hardware is initialized), device drivers
      register to the vmcore module (via vmcore_add_device_dump()), with
      callback function, along with buffer size and log name needed for
      firmware/hardware log collection.
      
      2. vmcore module allocates the buffer with requested size. It adds
      an Elf note and invokes the device driver's registered callback
      function.
      
      3. Device driver collects all hardware/firmware logs into the buffer
      and returns control back to vmcore module.
      
      Ensure that the device dump buffer size is always aligned to page size
      so that it can be mmaped.
      
      Also, rename alloc_elfnotes_buf() to vmcore_alloc_buf() to make it more
      generic and reserve NT_VMCOREDD note type to indicate vmcore device
      dump.
      
      Suggested-by: Eric Biederman <ebiederm@xmission.com>.
      Signed-off-by: NRahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: NGanesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2724273e
  8. 14 5月, 2018 1 次提交
  9. 12 5月, 2018 3 次提交
    • E
      tcp: switch pacing timer to softirq based hrtimer · 73a6bab5
      Eric Dumazet 提交于
      linux-4.16 got support for softirq based hrtimers.
      TCP can switch its pacing hrtimer to this variant, since this
      avoids going through a tasklet and some atomic operations.
      
      pacing timer logic looks like other (jiffies based) tcp timers.
      
      v2: use hrtimer_try_to_cancel() in tcp_clear_xmit_timers()
          to correctly release reference on socket if needed.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      73a6bab5
    • F
      net: dsa: Plug in PHYLINK support · aab9c406
      Florian Fainelli 提交于
      Add support for PHYLINK within the DSA subsystem in order to support more
      complex devices such as pluggable (SFP) and non-pluggable (SFF) modules, 10G
      PHYs, and traditional PHYs. Using PHYLINK allows us to drop some amount of
      complexity we had while probing fixed and non-fixed PHYs using Device Tree.
      
      Because PHYLINK separates the Ethernet MAC/port configuration into different
      stages, we let switch drivers implement those, and for now, we maintain
      functionality by calling dsa_slave_adjust_link() during
      phylink_mac_link_{up,down} which provides semantically equivalent steps.
      
      Drivers willing to take advantage of PHYLINK should implement the phylink_mac_*
      operations that DSA wraps.
      
      We cannot quite remove the adjust_link() callback just yet, because a number of
      drivers rely on that for configuring their "CPU" and "DSA" ports, this is done
      dsa_port_setup_phy_of() and dsa_port_fixed_link_register_of() still.
      
      Drivers that utilize fixed links for user-facing ports (e.g: bcm_sf2) will need
      to implement phylink_mac_ops from now on to preserve functionality, since PHYLINK
      *does not* create a phy_device instance for fixed links.
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aab9c406
    • F
      net: dsa: Add PHYLINK switch operations · 11d8f3dd
      Florian Fainelli 提交于
      In preparation for adding support for PHYLINK within DSA, define a number of
      operations that we will need and that switch drivers can start implementing.
      Proper integration with PHYLINK will follow in subsequent patches.
      
      We start selecting PHYLINK (which implies PHYLIB) in net/dsa/Kconfig
      such that drivers can be guaranteed that this dependency is properly
      taken care of and can start referencing PHYLINK helper functions without
      requiring stubs or anything.
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      11d8f3dd
  10. 11 5月, 2018 15 次提交
  11. 10 5月, 2018 2 次提交
  12. 09 5月, 2018 2 次提交
    • M
      bpf: btf: Add struct bpf_btf_info · 62dab84c
      Martin KaFai Lau 提交于
      During BPF_OBJ_GET_INFO_BY_FD on a btf_fd, the current bpf_attr's
      info.info is directly filled with the BTF binary data.  It is
      not extensible.  In this case, we want to add BTF ID.
      
      This patch adds "struct bpf_btf_info" which has the BTF ID as
      one of its member.  The BTF binary data itself is exposed through
      the "btf" and "btf_size" members.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NAlexei Starovoitov <ast@fb.com>
      Acked-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      62dab84c
    • M
      bpf: btf: Introduce BTF ID · 78958fca
      Martin KaFai Lau 提交于
      This patch gives an ID to each loaded BTF.  The ID is allocated by
      the idr like the existing prog-id and map-id.
      
      The bpf_put(map->btf) is moved to __bpf_map_put() so that the
      userspace can stop seeing the BTF ID ASAP when the last BTF
      refcnt is gone.
      
      It also makes BTF accessible from userspace through the
      1. new BPF_BTF_GET_FD_BY_ID command.  It is limited to CAP_SYS_ADMIN
         which is inline with the BPF_BTF_LOAD cmd and the existing
         BPF_[MAP|PROG]_GET_FD_BY_ID cmd.
      2. new btf_id (and btf_key_id + btf_value_id) in "struct bpf_map_info"
      
      Once the BTF ID handler is accessible from userspace, freeing a BTF
      object has to go through a rcu period.  The BPF_BTF_GET_FD_BY_ID cmd
      can then be done under a rcu_read_lock() instead of taking
      spin_lock.
      [Note: A similar rcu usage can be done to the existing
             bpf_prog_get_fd_by_id() in a follow up patch]
      
      When processing the BPF_BTF_GET_FD_BY_ID cmd,
      refcount_inc_not_zero() is needed because the BTF object
      could be already in the rcu dead row .  btf_get() is
      removed since its usage is currently limited to btf.c
      alone.  refcount_inc() is used directly instead.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NAlexei Starovoitov <ast@fb.com>
      Acked-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      78958fca