  1. 20 Jan 2018 · 1 commit
  2. 19 Jan 2018 · 15 commits
  3. 18 Jan 2018 · 24 commits
    • bpf: change fake_ip for bpf_trace_printk helper · eefa864a
      Yonghong Song committed
      Currently, the bpf_trace_printk helper uses the fake ip address 0x1,
      with a comment saying that the fake ip will not be printed.
      This is indeed true for 4.12 and earlier versions, but in
      4.13 and later versions the ip address is printed if
      it cannot be resolved with kallsyms. Running the samples/bpf/tracex5
      program produces the following in the debugfs
      trace_pipe output:
        ...
        <...>-1819  [003] ....   443.497877: 0x00000001: mmap
        <...>-1819  [003] ....   443.498289: 0x00000001: syscall=102 (one of get/set uid/pid/gid)
        ...
      
      The kernel commit that changed this behavior is:
        commit feaf1283
        Author: Steven Rostedt (VMware) <rostedt@goodmis.org>
        Date:   Thu Jun 22 17:04:55 2017 -0400
      
            tracing: Show address when function names are not found
        ...
      
      This patch changes the comment and also alters the fake ip
      address to 0x0, as users may think 0x1 has some special meaning
      while it doesn't. The new output:
        ...
        <...>-1799  [002] ....    25.953576: 0: mmap
        <...>-1799  [002] ....    25.953865: 0: read(fd=0, buf=00000000053936b5, size=512)
        ...
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • samples/bpf: xdp2skb_meta comment explain why pkt-data pointers are invalidated · e2e32241
      Jesper Dangaard Brouer committed
      Improve the 'unknown reason' comment with an actual explanation of why
      the ctx pkt-data pointers need to be loaded after the helper function
      bpf_xdp_adjust_meta(), based on the explanation Daniel gave.
      
      Fixes: 36e04a2d ("samples/bpf: xdp2skb_meta shows transferring info from XDP to SKB")
      Reported-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
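      A small userspace model may make the ordering constraint easier to see. It is a sketch under assumptions, not XDP program code: `struct xdp_ctx_model` and `adjust_meta_model()` are hypothetical stand-ins for the XDP context and bpf_xdp_adjust_meta(). In the model only data_meta moves, so a copy taken before the call no longer describes the metadata area afterwards. As far as we understand it, the real verifier is more conservative and invalidates every packet pointer after any helper that may change packet data, which is why the pointers must be reloaded from ctx after the call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the XDP context: three pointers into one buffer. */
struct xdp_ctx_model {
	unsigned char *data_meta; /* start of metadata (== data when empty) */
	unsigned char *data;      /* start of packet data */
	unsigned char *data_end;  /* one past the last packet byte */
};

/* Hypothetical model of bpf_xdp_adjust_meta(): a negative delta grows the
 * metadata area in front of data, a positive delta shrinks it.
 * Only ctx->data_meta changes. Returns 0 on success, -1 if out of range. */
static int adjust_meta_model(struct xdp_ctx_model *ctx,
			     const unsigned char *buf_start, int delta)
{
	ptrdiff_t headroom = ctx->data_meta - buf_start; /* room to grow */
	ptrdiff_t meta_len = ctx->data - ctx->data_meta; /* room to shrink */

	assert(headroom >= 0 && meta_len >= 0);
	if (delta < -headroom || delta > meta_len)
		return -1;
	ctx->data_meta += delta;
	return 0;
}
```

      A pointer saved before adjust_meta_model(&ctx, buf, -4) still describes an empty metadata area, while ctx.data_meta reloaded afterwards covers the four new bytes.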
    • Merge branch 'bpf-dump-and-disasm-nfp-jit' · cda18e97
      Daniel Borkmann committed
      Jakub Kicinski says:
      
      ====================
      Jiong says:
      
      Currently bpftool can disassemble host jited images, for example x86_64,
      using libbfd. However, it cannot disassemble offload jited images.
      
      There are two reasons:
      
        1. bpf_obj_get_info_by_fd/struct bpf_prog_info could not get the address
           and length of the jited image.
      
        2. Even after issue 1 is resolved, bpftool could not figure out what the
           offload arch is from bpf_prog_info, and therefore could not drive the
           libbfd disassembler correctly.
      
      This patch set resolves issue 1 by introducing two new fields, "jited_len"
      and "jited_image", in bpf_dev_offload. These two fields serve as the generic
      interface through which all offload targets communicate the jited image
      address and length to higher-level callers. For example, bpf_obj_get_info_by_fd
      can use them to fill the userspace-visible fields jited_prog_len and
      jited_prog_insns.
      
      This patch set resolves issue 2 by getting the bfd backend name through
      "ifindex", i.e. the network interface index.
      
      v1:
       - Deduce the bfd arch name through ifindex, i.e. the network interface index.
         First, map ifindex to devname through ifindex_to_name_ns, then get the
         PCI vendor id through /sys/class/dev/DEVNAME/device/vendor. (Daniel, Alexei)
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • tools: bpftool: improve architecture detection by using ifindex · e6593596
      Jiong Wang committed
      The current architecture detection method in bpftool is designed for the
      host case.
      
      For the offload case, we can't use the architecture of "bpftool" itself.
      Instead, we can call the existing "ifindex_to_name_ns" to get DEVNAME,
      then read the PCI vendor id from /sys/class/dev/DEVNAME/device/vendor, and
      finally map the vendor id to a bfd arch name, which is used to select the
      bfd backend for the disassembler.
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
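      The detection chain described above (ifindex, then device name, then PCI vendor id, then bfd arch name) can be sketched in plain userspace C. This is an illustration, not bpftool's code: the helper names are hypothetical, it uses if_indextoname() in the current network namespace rather than resolving the name inside the program's namespace as ifindex_to_name_ns does, it reads the standard sysfs location /sys/class/net/<dev>/device/vendor, and the "NFP-6xxx" arch string for Netronome's vendor id 0x19ee is an assumption about what the bfd backend expects.

```c
#include <assert.h>
#include <net/if.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Map a PCI vendor ID to a bfd arch name (hypothetical table, one entry). */
static const char *vendor_to_bfd_arch(unsigned long vendor_id)
{
	switch (vendor_id) {
	case 0x19ee: /* Netronome */
		return "NFP-6xxx"; /* assumed bfd arch name */
	default:
		return NULL; /* unknown vendor: caller keeps the host arch */
	}
}

/* Parse the text of a sysfs "vendor" file such as "0x19ee\n". */
static int parse_vendor_id(const char *text, unsigned long *vendor_id)
{
	char *end;
	unsigned long v;

	assert(text != NULL && vendor_id != NULL);
	v = strtoul(text, &end, 16); /* base 16 accepts an optional 0x prefix */
	if (end == text)
		return -1; /* no hex digits at all */
	*vendor_id = v;
	return 0;
}

/* ifindex -> device name -> sysfs vendor file -> bfd arch name.
 * Current network namespace only; returns NULL on any failure. */
static const char *ifindex_to_bfd_arch(unsigned int ifindex)
{
	char devname[IF_NAMESIZE];
	char path[64 + IF_NAMESIZE];
	char buf[32];
	unsigned long vendor_id;
	FILE *f;

	if (!if_indextoname(ifindex, devname))
		return NULL;
	snprintf(path, sizeof(path), "/sys/class/net/%s/device/vendor", devname);
	f = fopen(path, "r");
	if (!f)
		return NULL; /* e.g. virtual devices have no "device" link */
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return NULL;
	}
	fclose(f);
	if (parse_vendor_id(buf, &vendor_id) != 0)
		return NULL;
	return vendor_to_bfd_arch(vendor_id);
}
```

      Returning NULL for unknown vendors lets the caller fall back to the host architecture, which keeps the non-offload path unchanged.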
    • nfp: bpf: set new jit info fields · eb1d7db9
      Jiong Wang committed
      This patch sets the new jit info fields introduced earlier in this patch set.
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: add new jited info fields in bpf_dev_offload and bpf_prog_info · fcfb126d
      Jiong Wang committed
      For host JIT, there are "jited_len"/"bpf_func" fields in struct bpf_prog
      used by all host JIT targets to get the jited image and its length. For
      offload, however, targets are likely to have different offload mechanisms
      and keep this info in device-private data fields.
      
      Therefore, the BPF_OBJ_GET_INFO_BY_FD syscall needs a unified way to get JIT
      length and contents info for offload targets.
      
      One way is to introduce a new callback that parses device-private data and
      then fills those fields in bpf_prog_info; this might be a little heavy. The
      other way is to add generic fields which are initialized by all offload targets.
      
      This patch follows the second approach: it introduces two new fields in
      struct bpf_dev_offload and teaches bpf_prog_get_info_by_fd about them, so that
      it fills in the correct jited_prog_len and jited_prog_insns in bpf_prog_info.
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
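      The "generic fields" approach can be illustrated with a simplified model. The structs below are hypothetical, cut-down stand-ins for struct bpf_dev_offload and the JIT part of struct bpf_prog_info, not the kernel definitions. The sketch reports the full image length and copies at most as many bytes as the user buffer holds, which lets a caller query the size first and then fetch the image; whether bpf_prog_get_info_by_fd() follows exactly this pattern should be checked against the kernel source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct bpf_dev_offload: only the two generic
 * fields. An offload driver fills them after JITing the program. */
struct dev_offload_model {
	void *jited_image;  /* device-specific machine code */
	uint32_t jited_len; /* size of jited_image in bytes */
};

/* Hypothetical stand-in for the JIT part of struct bpf_prog_info. */
struct prog_info_model {
	uint32_t jited_prog_len;   /* in: user buffer size; out: full image size */
	uint8_t *jited_prog_insns; /* user buffer, may be NULL */
};

/* Generic fill step: the core reads only the generic fields, so no
 * per-driver callback is needed to parse device-private data. */
static void fill_jited_info(const struct dev_offload_model *off,
			    struct prog_info_model *info)
{
	uint32_t ulen;

	assert(off != NULL && info != NULL);
	ulen = info->jited_prog_len;
	info->jited_prog_len = off->jited_len; /* always report the full size */
	if (off->jited_len && ulen && info->jited_prog_insns) {
		uint32_t n = ulen < off->jited_len ? ulen : off->jited_len;

		memcpy(info->jited_prog_insns, off->jited_image, n);
	}
}
```

      A first call with jited_prog_len = 0 returns the image size; a second call with a buffer of that size retrieves the whole image.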
    • Merge tag 'linux-can-next-for-4.16-20180116' of... · 4f7d5851
      David S. Miller committed
      Merge tag 'linux-can-next-for-4.16-20180116' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can-next 2018-01-16
      
      This is a pull request for net-next/master consisting of 9 patches.
      
      This is a series of patches, some of them initially written by Franklin S
      Cooper Jr and later picked up by Faiz Abbas. Faiz Abbas added some patches
      while working on this series, and I contributed one as well.
      
      The first two patches add support to the CAN device infrastructure to limit
      the bitrate of a CAN adapter if the CAN transceiver in use has a certain
      maximum bitrate.
      
      The remaining patches improve the m_can driver. They add support for
      bitrate limiting to the driver, clean up the driver and add support for
      runtime PM.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
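      The bitrate limiting described above boils down to a validation step when a bitrate is configured. A minimal sketch, assuming hypothetical names (`can_limits_model`, `can_validate_bitrate_model`) and assuming that a transceiver maximum of 0 means "no known limit"; the real CAN core fields and how the limit is obtained (for example from the device tree) are not shown.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-device limits; 0 means no transceiver limit is known. */
struct can_limits_model {
	uint32_t transceiver_max_bitrate; /* bits per second */
};

/* Reject a requested bitrate the attached transceiver cannot handle.
 * Returns 0 if acceptable, -EINVAL otherwise. */
static int can_validate_bitrate_model(const struct can_limits_model *lim,
				      uint32_t requested_bitrate)
{
	assert(lim);
	if (requested_bitrate == 0)
		return -EINVAL; /* a bitrate must be given */
	if (lim->transceiver_max_bitrate &&
	    requested_bitrate > lim->transceiver_max_bitrate)
		return -EINVAL; /* faster than the transceiver allows */
	return 0;
}
```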
    • vxlan: Fix trailing semicolon · 5ef7e0ba
      Luis de Bethencourt committed
      The trailing semicolon is an empty statement that performs no operation
      and is completely stripped out by the compiler. Remove it, since it does
      nothing.
      
      Fixes: 5f35227e ("net: Generalize ndo_gso_check to ndo_features_check")
      Signed-off-by: Luis de Bethencourt <luisbg@kernel.org>
      Acked-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: restructure VF mgmt code · baf50868
      Ganesh Goudar committed
      Restructure the code that adds support for configuring
      PCIe VFs via the mgmt netdevice, which was added by
      commit 7829451c ("cxgb4: Add control net_device for
      configuring PCIe VF").
      
      Original work by: Casey Leedom <leedom@chelsio.com>
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Remove spinlock from get_net_ns_by_id() · 42157277
      Kirill Tkhai committed
      idr_find() is safe under rcu_read_lock(), and
      maybe_get_net() guarantees that net is alive, so the spinlock is not needed.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
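      The liveness guarantee relies on an "increment unless zero" reference count: RCU keeps the object's memory valid during the read-side section, and the conditional increment refuses to revive an object whose count has already dropped to zero. In the kernel, maybe_get_net() is built on refcount_inc_not_zero(). The sketch below is a userspace model using C11 atomics; `struct net_model` and both helpers are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct net: only the reference count. */
struct net_model {
	atomic_int count; /* 0 means the namespace is being torn down */
};

/* "Increment unless zero": take a reference only if still alive. */
static bool maybe_get_net_model(struct net_model *net)
{
	int old = atomic_load(&net->count);

	while (old != 0) {
		/* On failure, old is reloaded with the current value; retry. */
		if (atomic_compare_exchange_weak(&net->count, &old, old + 1))
			return true;
	}
	return false; /* already dying: treat the lookup as "not found" */
}

/* Drop a reference taken by maybe_get_net_model(). */
static void put_net_model(struct net_model *net)
{
	int prev = atomic_fetch_sub(&net->count, 1);

	assert(prev > 0);
	(void)prev; /* silence unused-variable warnings when NDEBUG is set */
}
```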
    • net: Fix possible race in peernet2id_alloc() · 0c06bea9
      Kirill Tkhai committed
      peernet2id_alloc() is racy without rtnl_lock(), as refcount_read(&peer->count)
      under net->nsid_lock does not guarantee that peer is alive:
      
      rcu_read_lock()
      peernet2id_alloc()                            ..
        spin_lock_bh(&net->nsid_lock)               ..
        refcount_read(&peer->count) (!= 0)          ..
        ..                                          put_net()
        ..                                            cleanup_net()
        ..                                              for_each_net(tmp)
        ..                                                spin_lock_bh(&tmp->nsid_lock)
        ..                                                __peernet2id(tmp, net) == -1
        ..                                                    ..
        ..                                                    ..
          __peernet2id_alloc(alloc == true)                   ..
        ..                                                    ..
      rcu_read_unlock()                                       ..
      ..                                                synchronize_rcu()
      ..                                                kmem_cache_free(net)
      
      After the above sequence, net::netns_id contains an id pointing to freed memory,
      and any other dereference through that id will operate on freed memory.
      
      Currently, peernet2id_alloc() is used under rtnl_lock() everywhere except
      ovs_vport_cmd_fill_info(), and this race can't occur. But peernet2id_alloc()
      is a generic interface, and it is better to fix it before someone actually
      starts using it in the wrong context.
      
      v2: Don't place refcount_read(&net->count) under net->nsid_lock
          as suggested by Eric W. Biederman <ebiederm@xmission.com>
      v3: Rebase on top of net-next
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tun-allow-to-attach-eBPF-filter' · a29ae44c
      David S. Miller committed
      Jason Wang says:
      
      ====================
      tun: allow to attach eBPF filter
      
      This series tries to implement an eBPF socket filter for tun. This could
      be used to implement an efficient virtio-net receive filter for
      vhost-net.
      
      Changes from V2:
      - fix typo
      - remove unnecessary double check
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tun: allow to attach ebpf socket filter · aff3d70a
      Jason Wang committed
      This patch allows userspace to attach an eBPF filter to tun. This makes it
      possible to implement VM dataplane filtering more efficiently than with a
      cBPF filter, by allowing either qemu or libvirt to attach an eBPF filter
      to tun.
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tuntap: rename struct tun_steering_prog to struct tun_prog · cd5681d7
      Jason Wang committed
      This lets the struct be reused by eBPF programs other than the queue selection one.
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-sched-allow-qdiscs-to-share-filter-block-instances' · ca46abd6
      David S. Miller committed
      Jiri Pirko says:
      
      ====================
      net: sched: allow qdiscs to share filter block instances
      
      Currently the filters added to qdiscs are independent. So, for example, if you
      have 2 netdevices, create an ingress qdisc on both, and want to add
      identical filter rules to both, you need to add them twice. This patchset
      makes this easier and mainly saves resources by allowing qdiscs to share all
      their filters; I call such a shared set a "filter block". This also helps to
      save resources when we offload to hw, for example to an expensive TCAM.
      
      Back to the example. First, we create 2 qdiscs. Both will share
      block number 22. "22" is just an identifier:
      $ tc qdisc add dev ens7 ingress_block 22 ingress
                              ^^^^^^^^^^^^^^^^
      $ tc qdisc add dev ens8 ingress_block 22 ingress
                              ^^^^^^^^^^^^^^^^
      
      If we don't specify the "ingress_block" option, no shared block is
      created:
      $ tc qdisc add dev ens9 ingress
      
      Now if we list the qdiscs, we will see the block index in the output:
      
      $ tc qdisc
      qdisc ingress ffff: dev ens7 parent ffff:fff1 ingress_block 22
      qdisc ingress ffff: dev ens8 parent ffff:fff1 ingress_block 22
      qdisc ingress ffff: dev ens9 parent ffff:fff1
      
      To make it more visual, the situation looks like this:
      
         ens7 ingress qdisc                 ens8 ingress qdisc
                |                                  |
                |                                  |
                +---------->  block 22  <----------+
      
      Unlimited number of qdiscs may share the same block.
      
      Note that this patchset introduces block sharing support also for clsact
      qdisc:
      $ tc qdisc add dev ens10 ingress_block 23 egress_block 24 clsact
      $ tc qdisc show dev ens10
      qdisc clsact ffff: dev ens10 parent ffff:fff1 ingress_block 23 egress_block 24
      
      We can add filter using the block index:
      
      $ tc filter add block 22 protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
      
      Note that we cannot use the qdisc to manipulate the filters of a shared block:
      
      $ tc filter add dev ens8 ingress protocol ip pref 1 flower dst_ip 192.168.100.2 action drop
      Error: This filter block is shared. Please use the block index to manipulate the filters.
      
      We will see the same output if we list the filters of the ingress qdisc of
      ens7, of ens8, or of block 22:
      
      $ tc filter show block 22
      filter block 22 protocol ip pref 25 flower chain 0
      filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
      ...
      
      $ tc filter show dev ens7 ingress
      filter block 22 protocol ip pref 25 flower chain 0
      filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
      ...
      
      $ tc filter show dev ens8 ingress
      filter block 22 protocol ip pref 25 flower chain 0
      filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
      ...
      
      ---
      v10->v11:
      - patch 2:
       - fixed the error path when register_pernet_subsys fails, as pointed out by Cong
      - patch 9:
       - rebased on top of the current net-next
      
      v9->v10:
      - patch 7:
       - fixed ifindex magic in the patch description
      - userspace patches:
       - added manpages and patch descriptions
      
      v8->v9:
      - patch "net: sched: add rt netlink message type for block get" was
        removed; userspace checks filter existence using a qdisc dump
      
      v7->v8:
      - patch 7:
       - added comment to ifindex block magic
      - patch 9:
       - new patch
      - patch 10:
       - base this on the patch that introduces qdisc-generic block index
         attributes parsing/dumping
      - patch 13:
       - rebased on top of current net-next
      
      v6->v7:
      - patch 1:
       - unsquashed shared block patch that was previously squashed by mistake
       - fixed error path in block create - freeing chain 0
      - patch 2:
       - new patch, split from the previous one as it got accidentally
         squashed in the rebasing process in the past
       - converted to idr extended
       - removed auto-generation of block indexes. Callers have to explicitly
         tell that the block is shared by passing a non-zero block index
       - fixed error path in block get ext - freeing chain 0
      - patch 7:
       - changed extack message for block index handle as suggested by DaveA
       - added extack message when block index does not exist
       - the block ifindex magic is now a define and was changed to 0xffffffff
         as suggested by Jamal
      - patch 8:
       - new patch implementing RTM_GETBLOCK in order to query if the block
         with some index exists
      - patch 9:
       - adjust to the core changes and check block index attributes for being 0
      
      v5->v6:
      - added patch 6 that introduces block handle
      
      v4->v5:
      - patch 5:
       - add tracking of binding of devs that are unable to offload and check
         that before block cbs call.
      
      v3->v4:
      - patch 1:
       - rebased on top of the current net-next
       - added some extack strings
      - patch 3:
       - rebased on top of the current net-next
      - patch 5:
       - propagate netdev_ops->ndo_setup_tc error up to tcf_block_offload_bind
         caller
      - patch 7:
       - rebased on top of the current net-next
      
      v2->v3:
      - removed original patch 1, removing tp->q cls_bpf dependency. Fixed by
        Jakub in the meantime.
      - patch 1:
       - rebased on top of the current net-next
      - patch 5:
       - new patch
      - patch 8:
       - removed "p_" prefix from block index function args
      - patch 10:
       - add tc offload feature handling
      ====================
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
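      The block-sharing behavior shown in the tc examples above can be modeled with a small registry keyed by block index. This is a hypothetical sketch, not the kernel's tcf_block code: binding a qdisc to an existing index takes another reference instead of creating a new block, filters live in the block so every bound qdisc sees them, and, following the description in this series, adding a filter through a qdisc is refused once at least 2 qdiscs share the block. The error code is chosen for the sketch only.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MAX_BLOCKS 8
#define MAX_FILTERS 16

/* Hypothetical model of a filter block shared by qdiscs. */
struct block_model {
	uint32_t index;       /* user-chosen block index, 0 = unused slot */
	unsigned int refcnt;  /* number of qdiscs bound to this block */
	unsigned int nfilters;
	uint32_t prefs[MAX_FILTERS]; /* filter priorities, e.g. "pref 25" */
};

static struct block_model blocks[MAX_BLOCKS];

/* Bind a qdisc to block @index, creating the block on first use. */
static struct block_model *block_get(uint32_t index)
{
	struct block_model *free_slot = NULL;

	assert(index != 0); /* 0 means "not shared" in this model */
	for (int i = 0; i < MAX_BLOCKS; i++) {
		if (blocks[i].index == index) {
			blocks[i].refcnt++;
			return &blocks[i];
		}
		if (!free_slot && blocks[i].index == 0)
			free_slot = &blocks[i];
	}
	if (!free_slot)
		return NULL;
	memset(free_slot, 0, sizeof(*free_slot));
	free_slot->index = index;
	free_slot->refcnt = 1;
	return free_slot;
}

/* Add a filter either by block index or via a qdisc (via_block_index == 0).
 * The qdisc path is refused once the block is shared. */
static int block_add_filter(struct block_model *b, uint32_t pref,
			    int via_block_index)
{
	if (!via_block_index && b->refcnt > 1)
		return -EOPNOTSUPP; /* "Please use the block index ..." */
	if (b->nfilters == MAX_FILTERS)
		return -ENOSPC;
	b->prefs[b->nfilters++] = pref;
	return 0;
}
```

      Binding ens7 and ens8 to block 22 and adding one filter by block index makes that filter visible through both qdiscs, which matches the identical `tc filter show` outputs above.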
    • mlxsw: spectrum_acl: Pass mlxsw_sp_port down to ruleset bind/unbind ops · 4b23258d
      Jiri Pirko committed
      No need to convert from mlxsw_sp_port to net_device and back again.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_acl: Implement TC block sharing · 3aaff323
      Jiri Pirko committed
      Benefit from the prepared TC and in-driver ACL infrastructure and
      introduce block sharing offload. For that, a new struct "block" is
      introduced in spectrum_acl in order to hold a list of specific
      block-port bindings.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_acl: Don't store netdev and ingress for ruleset unbind · 02caf499
      Jiri Pirko committed
      Instead, pass netdev and ingress flag to ruleset unbind op.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_acl: Reshuffle code around mlxsw_sp_acl_ruleset_create/destroy · 9fe5fdf2
      Jiri Pirko committed
      In order to prepare for follow-up changes, make the bind/unbind helpers
      very simple. That required moving the ht insertion/removal and bind/unbind
      calls into mlxsw_sp_acl_ruleset_create/destroy.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: allow ingress and clsact qdiscs to share filter blocks · 51ab2994
      Jiri Pirko committed
      Benefit from the previously introduced shared filter blocks
      infrastructure and allow ingress and clsact qdisc instances to share
      filter blocks. The block index is coming from userspace as qdisc option.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: introduce ingress/egress block index attributes for qdisc · d47a6b0e
      Jiri Pirko committed
      Introduce two new attributes to be used for qdisc creation and dumping:
      one for the ingress block and one for the egress block. Introduce a set of
      ops that qdiscs supporting block sharing would implement.
      
      Passing block indexes in qdisc change is not supported yet, so it is
      checked and forbidden.
      
      In the future, these attributes are to be reused for specifying block
      indexes for classes as well. At the moment, however, this is not
      supported, so a check is in place to forbid it.
      Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: use block index as a handle instead of qdisc when block is shared · 7960d1da
      Jiri Pirko committed
      As tcm_ifindex with the value TCM_IFINDEX_MAGIC_BLOCK is an invalid ifindex,
      use it to indicate that we work with a block instead of a qdisc.
      So if tcm_ifindex is set to TCM_IFINDEX_MAGIC_BLOCK, tcm_parent is used
      to carry block_index.
      
      If the block is set to be shared between at least 2 qdiscs, it is
      forbidden to use the qdisc handle to add/delete filters. In that case,
      userspace has to pass block_index.
      
      Also, when dumping filters, if the block is shared between at
      least 2 qdiscs, each filter is dumped with tcm_ifindex set to
      TCM_IFINDEX_MAGIC_BLOCK and tcm_parent set to block_index. That gives
      the user a clear indication that the filter belongs to a shared block
      and not only to the one qdisc under which it is dumped.
      Suggested-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
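      The addressing rule can be sketched as a decode/encode pair. The sketch uses a hypothetical two-field subset of the tc netlink header and its own constant, set to 0xffffffff as stated in the patch notes of this series; it is not the kernel's struct tcmsg or its TCM_IFINDEX_MAGIC_BLOCK definition. Decoding treats the magic ifindex as "tcm_parent carries a block index"; encoding a dumped filter switches to that form when the block is shared by at least 2 qdiscs.

```c
#include <assert.h>
#include <stdint.h>

/* Value taken from the patch notes ("changed to 0xffffffff"). */
#define MAGIC_BLOCK_IFINDEX_MODEL 0xFFFFFFFFu

/* Hypothetical subset of the tc netlink header. */
struct tcmsg_model {
	uint32_t ifindex; /* device index, or the block magic */
	uint32_t parent;  /* qdisc parent, or block index if ifindex is magic */
};

enum tc_target_kind { TARGET_QDISC, TARGET_BLOCK };

struct tc_target {
	enum tc_target_kind kind;
	uint32_t ifindex;     /* valid for TARGET_QDISC */
	uint32_t parent;      /* valid for TARGET_QDISC */
	uint32_t block_index; /* valid for TARGET_BLOCK */
};

/* Decode which object a filter request addresses. */
static struct tc_target tc_decode_target(const struct tcmsg_model *t)
{
	struct tc_target tgt = {0};

	if (t->ifindex == MAGIC_BLOCK_IFINDEX_MODEL) {
		tgt.kind = TARGET_BLOCK;
		tgt.block_index = t->parent;
	} else {
		tgt.kind = TARGET_QDISC;
		tgt.ifindex = t->ifindex;
		tgt.parent = t->parent;
	}
	return tgt;
}

/* Encode the header of a dumped filter: filters of a block shared by at
 * least 2 qdiscs are reported by block index instead of by qdisc. */
static struct tcmsg_model tc_encode_dump(uint32_t ifindex, uint32_t parent,
					 uint32_t block_index,
					 unsigned int block_refcnt)
{
	struct tcmsg_model t = { ifindex, parent };

	if (block_index && block_refcnt >= 2) {
		t.ifindex = MAGIC_BLOCK_IFINDEX_MODEL;
		t.parent = block_index;
	}
	return t;
}
```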
    • net: sched: keep track of offloaded filters and check tc offload feature · caa72601
      Jiri Pirko committed
      During block bind, we need to check the tc offload feature. If it is
      disabled but the block still contains offloaded filters, forbid the
      bind. Also forbid registering a callback for a block that already
      contains offloaded filters, as playback is not supported yet.
      To keep track of offloaded filters, a new counter is introduced,
      along with a couple of helpers called from cls_* code.
      These helpers set and clear the TCA_CLS_FLAGS_IN_HW flag.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
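      The bookkeeping described above can be modeled with one counter per block and a per-filter flag. All names here are hypothetical and the error codes are chosen for the sketch; `in_hw` stands in for the TCA_CLS_FLAGS_IN_HW flag. The set/clear helpers keep the counter consistent even if called twice, and the two checks encode the rules for device bind and callback registration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-block offload bookkeeping. */
struct block_offload_model {
	unsigned int offloadcnt; /* filters currently marked as in hardware */
};

struct filter_model {
	bool in_hw; /* stands in for TCA_CLS_FLAGS_IN_HW */
};

/* Called by a classifier when a filter is installed in hardware. */
static void filter_set_in_hw(struct block_offload_model *b, struct filter_model *f)
{
	if (!f->in_hw) {
		f->in_hw = true;
		b->offloadcnt++;
	}
}

/* Called by a classifier when a filter is removed from hardware. */
static void filter_clear_in_hw(struct block_offload_model *b, struct filter_model *f)
{
	if (f->in_hw) {
		f->in_hw = false;
		assert(b->offloadcnt > 0);
		b->offloadcnt--;
	}
}

/* Device bind: refuse if the device has tc offload disabled while the
 * block already holds offloaded filters. */
static int block_bind_check(const struct block_offload_model *b,
			    bool dev_tc_offload_enabled)
{
	if (!dev_tc_offload_enabled && b->offloadcnt > 0)
		return -EOPNOTSUPP;
	return 0;
}

/* Callback registration: refuse while offloaded filters exist, since
 * replaying them to the new callback is not supported. */
static int block_cb_register_check(const struct block_offload_model *b)
{
	return b->offloadcnt > 0 ? -EOPNOTSUPP : 0;
}
```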
    • net: sched: remove classid and q fields from tcf_proto · edf6711c
      Jiri Pirko committed
      Both are no longer used, so remove them.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>