1. 26 Sep, 2020 17 commits
    • bpf: Add comment to document BTF type PTR_TO_BTF_ID_OR_NULL · ba5f4cfe
      John Fastabend authored
      The meaning of PTR_TO_BTF_ID_OR_NULL differs slightly from other types
      denoted with the *_OR_NULL type. For example the types PTR_TO_SOCKET
      and PTR_TO_SOCKET_OR_NULL can be used for branch analysis because the
      type PTR_TO_SOCKET is guaranteed to _not_ have a null value.
      
      In contrast PTR_TO_BTF_ID and PTR_TO_BTF_ID_OR_NULL have slightly
      different meanings. A PTR_TO_BTF_ID may be a pointer to a NULL value,
      but it is safe to read this pointer in the program context because
      the program context will handle any faults. The fallout is that for
      PTR_TO_BTF_ID the verifier can assume reads are safe, but cannot
      use the type in branch analysis. Additionally, authors need to be
      extra careful when passing PTR_TO_BTF_ID into helpers. In general
      helpers consuming type PTR_TO_BTF_ID will need to assume it may
      be null.

      Since the above is not obvious to readers without the background
      knowledge, let's add a comment in the type definition.
      
      Editorial comment: as networking and tracing programs get closer
      and more tightly merged, we may need to consider a new type that we
      can ensure is non-null for branch analysis and also for passing into
      helpers.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Lorenz Bauer <lmb@cloudflare.com>
    • bpf: Add AND verifier test case where 32bit and 64bit bounds differ · 99d4def4
      John Fastabend authored
      If we AND two values together that are known in the 32bit subregs, but not
      known in the 64bit registers, we rely on the tnum value to report that the
      32bit subreg is known, and do not use mark_reg_known() directly from
      scalar32_min_max_and().
      
      Add an AND test to cover the case with known 32bit subreg, but unknown
      64bit reg.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
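The situation this test covers can be modelled in user space. The sketch below is an illustration only, not the selftest itself: it reuses the tnum_and() logic quoted elsewhere in this log, and tnum_low32(), tnum_is_const(), op_a and op_b are names assumed for this sketch (the kernel has its own subreg helpers). Both operands have known low 32 bits and unknown upper 32 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Tristate number: bits set in mask are unknown, all other bits equal value. */
struct tnum {
	uint64_t value;
	uint64_t mask;
};

/* Same logic as the tnum_and() quoted in the verifier cleanup commit. */
static struct tnum tnum_and(struct tnum a, struct tnum b)
{
	uint64_t alpha = a.value | a.mask;
	uint64_t beta = b.value | b.mask;
	uint64_t v = a.value & b.value;

	return (struct tnum){ .value = v, .mask = alpha & beta & ~v };
}

/* Hypothetical helper for this sketch: keep only the low 32 bits. */
static struct tnum tnum_low32(struct tnum a)
{
	return (struct tnum){ .value = (uint32_t)a.value,
			      .mask = (uint32_t)a.mask };
}

/* A tnum is a known constant when no bit is unknown. */
static bool tnum_is_const(struct tnum a)
{
	return a.mask == 0;
}

/* Low 32 bits known (0x3 and 0x5), upper 32 bits unknown. */
static const struct tnum op_a = { .value = 0x3, .mask = 0xffffffff00000000ULL };
static const struct tnum op_b = { .value = 0x5, .mask = 0xffffffff00000000ULL };
```

With these operands, tnum_and(op_a, op_b) is not a constant as a 64-bit value, while its low 32 bits are; that is the case in which the 32-bit result has to be taken from the tnum.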
    • bpf, verifier: Remove redundant var_off.value ops in scalar known reg cases · 4fbb38a3
      John Fastabend authored
      In the BPF_AND and BPF_OR alu cases we have this pattern when the src and
      dst tnums are constants.
      
       1 dst_reg->var_off = tnum_[op](dst_reg->var_off, src_reg.var_off)
       2 scalar32_min_max_[op]
       3       if (known) return
       4 scalar_min_max_[op]
       5       if (known)
       6          __mark_reg_known(dst_reg,
                         dst_reg->var_off.value [op] src_reg.var_off.value)
      
      The result is that in 1 we calculate the var_off value and store it in
      the dst_reg. Then in 6 we duplicate this logic, doing the op again on the
      value.

      The duplication comes from the tnum_[op] handlers because they have
      already done the value calculation. For example this is tnum_and().
      
       struct tnum tnum_and(struct tnum a, struct tnum b)
       {
      	u64 alpha, beta, v;
      
      	alpha = a.value | a.mask;
      	beta = b.value | b.mask;
      	v = a.value & b.value;
      	return TNUM(v, alpha & beta & ~v);
       }
      
      So let's remove the redundant op calculation. It's confusing for readers
      and unnecessary. It's also not harmful because those ops have the
      property r1 & r1 = r1 and r1 | r1 = r1.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
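A small user-space sketch can make the redundancy concrete. It is an illustration under assumptions, not verifier code: tnum_and() follows the function quoted in the commit message, tnum_or() is written by analogy, and tnum_const(), redo_and() and redo_or() are hypothetical helpers that stand in for step 6 of the pattern.

```c
#include <stdint.h>

/* Tristate number: bits set in mask are unknown, all other bits equal value. */
struct tnum {
	uint64_t value;
	uint64_t mask;
};

static struct tnum tnum_const(uint64_t v)
{
	return (struct tnum){ .value = v, .mask = 0 };
}

static struct tnum tnum_and(struct tnum a, struct tnum b)
{
	uint64_t alpha = a.value | a.mask;
	uint64_t beta = b.value | b.mask;
	uint64_t v = a.value & b.value;

	return (struct tnum){ .value = v, .mask = alpha & beta & ~v };
}

/* Written by analogy with tnum_and(); an assumption of this sketch. */
static struct tnum tnum_or(struct tnum a, struct tnum b)
{
	uint64_t v = a.value | b.value;
	uint64_t mu = a.mask | b.mask;

	return (struct tnum){ .value = v, .mask = mu & ~v };
}

/* Step 6 of the pattern: apply the op again to the already-computed value. */
static uint64_t redo_and(struct tnum dst, struct tnum src)
{
	return dst.value & src.value;
}

static uint64_t redo_or(struct tnum dst, struct tnum src)
{
	return dst.value | src.value;
}
```

For constant operands, the value returned by tnum_and() or tnum_or() is already the result of the op, so redo_and() and redo_or() return that same value again.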
    • Merge branch 'enable-bpf_skc-cast-for-networking-progs' · 84085f87
      Alexei Starovoitov authored
      Martin KaFai Lau says:
      
      ====================
      This set allows networking prog type to directly read fields from
      the in-kernel socket type, e.g. "struct tcp_sock".
      
      Patch 2 has the details on the use case.
      
      v4:
      - Pass arg_btf_id instead of fn into check_reg_type() in Patch 1 (Lorenz)
      - Move arg_btf_id from func_proto to struct bpf_reg_types in Patch 2 (Lorenz)
      - Remove test_sock_fields from .gitignore in Patch 8 (Andrii)
      - Add tests to have better coverage on the modified helpers (Alexei)
        Patch 13 is added.
      - Use "void *sk" as the helper argument in UAPI bpf.h
      
      v3:
      - ARG_PTR_TO_SOCK_COMMON_OR_NULL was attempted in v2.  The _OR_NULL was
        needed because the PTR_TO_BTF_ID could be NULL, but note that a
        could-be-NULL PTR_TO_BTF_ID is not a scalar NULL to the verifier.
        "_OR_NULL" implicitly gives an expectation that the helper can take a
        scalar NULL, which does not make sense in most (except one) helpers.
        Passing a scalar NULL should be rejected at verification time.

        Thus, this patch uses ARG_PTR_TO_BTF_ID_SOCK_COMMON to specify that the
        helper can take either the btf-id ptr or the legacy PTR_TO_SOCK_COMMON but
        not a scalar NULL.  It requires the func_proto to explicitly specify the
        arg_btf_id such that there is a very clear expectation that the helper
        can handle a NULL PTR_TO_BTF_ID.
      
      v2:
      - Add ARG_PTR_TO_SOCK_COMMON_OR_NULL (Lorenz)
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: selftest: Add test_btf_skc_cls_ingress · 9a856cae
      Martin KaFai Lau authored
      This patch attaches a classifier prog to the ingress filter.
      It exercises the following helpers with different socket pointer
      types in different logical branches:
      1. bpf_sk_release()
      2. bpf_sk_assign()
      3. bpf_skc_to_tcp_request_sock(), bpf_skc_to_tcp_sock()
      4. bpf_tcp_gen_syncookie, bpf_tcp_check_syncookie
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200925000458.3859627-1-kafai@fb.com
    • bpf: selftest: Remove enum tcp_ca_state from bpf_tcp_helpers.h · 0c402c6c
      Martin KaFai Lau authored
      The enum tcp_ca_state is available in <linux/tcp.h>.
      Remove it from bpf_tcp_helpers.h to avoid a conflict when the bpf prog
      needs to include both <linux/tcp.h> and bpf_tcp_helpers.h.

      Modify bpf_cubic.c and bpf_dctcp.c to use <linux/tcp.h> instead.
      The <linux/stddef.h> is needed by <linux/tcp.h>.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200925000452.3859313-1-kafai@fb.com
    • bpf: selftest: Use bpf_skc_to_tcp_sock() in the sock_fields test · edc2d66a
      Martin KaFai Lau authored
      This test uses bpf_skc_to_tcp_sock() to get a kernel tcp_sock ptr "ktp".
      Access the ktp->lsndtime and also pass ktp to bpf_sk_storage_get().
      
      It also exercises the bpf_sk_cgroup_id() and bpf_sk_ancestor_cgroup_id()
      with the "ktp".  To do that, a parent cgroup and a child cgroup are
      created.  The bpf prog is attached to the child cgroup.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200925000446.3858975-1-kafai@fb.com
    • bpf: selftest: Use network_helpers in the sock_fields test · c40a565a
      Martin KaFai Lau authored
      This patch uses start_server() and connect_to_fd() from network_helpers.h
      to remove the network testing boilerplate code.  epoll is also no longer
      needed since the timeout has already been taken care of.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200925000440.3858639-1-kafai@fb.com
    • bpf: selftest: Adapt sock_fields test to use skel and global variables · b18c1f0a
      Martin KaFai Lau authored
      skel is used.
      
      Global variables are used to store the result from bpf prog.
      addr_map, sock_result_map, and tcp_sock_result_map are gone.
      Instead, global variables listen_tp, srv_sa6, cli_tp, srv_tp,
      listen_sk, srv_sk, and cli_sk are added.
      Because of that, bpf_addr_array_idx and bpf_result_array_idx are also
      no longer needed.
      
      The CHECK() macro from test_progs.h is reused, and the test bails as
      soon as a CHECK fails.

      shutdown() is used to ensure the previous data-ack is received.
      The bytes_acked, bytes_received, and pkt_out_cnt checks
      use "<" to accommodate the case where the final ack may not have been
      received/sent.  It is enough since it is not the focus of this test.
      
      The sk local storage is all initialized to 0xeB9F now, so the
      check_sk_pkt_out_cnt() always checks with the 0xeB9F base.  It is to
      keep things simple.
      
      The next patch will reuse helpers from network_helpers.h to simplify
      things further.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200925000434.3858204-1-kafai@fb.com
    • bpf: selftest: Move sock_fields test into test_progs · 6f521a2b
      Martin KaFai Lau authored
      This is a mechanical change to
      1. move test_sock_fields.c to prog_tests/sock_fields.c
      2. rename progs/test_sock_fields_kern.c to progs/test_sock_fields.c
      
      Minimal change is made to the code itself.  Next patch will make
      changes to use new ways of writing test, e.g. use skel and global
      variables.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200925000427.3857814-1-kafai@fb.com
    • bpf: selftest: Add ref_tracking verifier test for bpf_skc casting · 5d13746d
      Martin KaFai Lau authored
      The patch tests for:
      1. bpf_sk_release() can be called on a tcp_sock btf_id ptr.
      
      2. Ensure the tcp_sock btf_id pointer cannot be used
         after bpf_sk_release().
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Lorenz Bauer <lmb@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20200925000421.3857616-1-kafai@fb.com
    • bpf: Change bpf_sk_assign to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON · 27e5203b
      Martin KaFai Lau authored
      This patch changes bpf_sk_assign() to take
      ARG_PTR_TO_BTF_ID_SOCK_COMMON such that it will also work with the
      pointer returned by the bpf_skc_to_*() helpers.

      The bpf_sk_lookup_assign() is taking ARG_PTR_TO_SOCKET_"OR_NULL", meaning
      it specifically takes a literal NULL.  ARG_PTR_TO_BTF_ID_SOCK_COMMON
      does not allow a literal NULL, so another ARG type is required
      for this purpose, and another follow-up patch can be used if
      there is such a need.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200925000415.3857374-1-kafai@fb.com
    • bpf: Change bpf_tcp_*_syncookie to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON · c0df236e
      Martin KaFai Lau authored
      This patch changes the bpf_tcp_*_syncookie() to take
      ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will work with the pointer
      returned by the bpf_skc_to_*() helpers also.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Lorenz Bauer <lmb@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20200925000409.3856725-1-kafai@fb.com
    • bpf: Change bpf_sk_storage_*() to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON · 592a3498
      Martin KaFai Lau authored
      This patch changes the bpf_sk_storage_*() to take
      ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will work with the pointer
      returned by the bpf_skc_to_*() helpers also.
      
      A micro benchmark has been done on a "cgroup_skb/egress" bpf program
      which does a bpf_sk_storage_get().  It was driven by netperf doing
      a 4096 connected UDP_STREAM test with 64-byte packets.
      The stats from "kernel.bpf_stats_enabled" show no meaningful difference.
      
      The sk_storage_get_btf_proto, sk_storage_delete_btf_proto,
      btf_sk_storage_get_proto, and btf_sk_storage_delete_proto are
      no longer needed, so they are removed.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Lorenz Bauer <lmb@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20200925000402.3856307-1-kafai@fb.com
    • bpf: Change bpf_sk_release and bpf_sk_*cgroup_id to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON · a5fa25ad
      Martin KaFai Lau authored
      The previous patch allows the networking bpf prog to use the
      bpf_skc_to_*() helpers to get a PTR_TO_BTF_ID socket pointer,
      e.g. "struct tcp_sock *".  It allows the bpf prog to read all the
      fields of the tcp_sock.
      
      This patch changes the bpf_sk_release() and bpf_sk_*cgroup_id()
      to take ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will
      work with the pointer returned by the bpf_skc_to_*() helpers
      also.  For example, the following will work:
      
      	sk = bpf_skc_lookup_tcp(skb, tuple, tuplen, BPF_F_CURRENT_NETNS, 0);
      	if (!sk)
      		return;
      	tp = bpf_skc_to_tcp_sock(sk);
      	if (!tp) {
      		bpf_sk_release(sk);
      		return;
      	}
      	lsndtime = tp->lsndtime;
      	/* Pass tp to bpf_sk_release() will also work */
      	bpf_sk_release(tp);
      
      Since PTR_TO_BTF_ID could be NULL, the helper taking
      ARG_PTR_TO_BTF_ID_SOCK_COMMON has to check for NULL at runtime.
      
      A btf_id of "struct sock" may not always mean a fullsock.  Regardless of
      whether the helper's running context may get a non-fullsock or not, and
      considering that the fullsock check/handling is pretty cheap, it is better
      to keep the same verifier expectation that a helper taking
      ARG_PTR_TO_BTF_ID* will be able to handle the minisock situation.  In the
      bpf_sk_*cgroup_id() case, it will try to get a fullsock by using
      sk_to_full_sk(), as its skb variant bpf_sk"b"_*cgroup_id() has already
      been doing.
      
      bpf_sk_release can already handle minisock, so nothing special has to
      be done.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200925000356.3856047-1-kafai@fb.com
    • bpf: Enable bpf_skc_to_* sock casting helper to networking prog type · 1df8f55a
      Martin KaFai Lau authored
      There is a constant need to add more fields into the bpf_tcp_sock
      for the bpf programs running at tc, sock_ops, etc.

      A current workaround could be to use bpf_probe_read_kernel().  However,
      besides requiring another helper call for reading each field and missing
      CO-RE, it is also not as intuitive to use as directly reading
      "tp->lsndtime", for example.  Given that the prog already has the perfmon
      cap to do bpf_probe_read_kernel(), it will be much easier if the bpf prog
      can directly read from the tcp_sock.
      
      This patch tries to do that by using the existing casting-helpers
      bpf_skc_to_*() whose func_proto returns a btf_id.  For example, the
      func_proto of bpf_skc_to_tcp_sock returns the btf_id of the
      kernel "struct tcp_sock".
      
      These helpers are also added to is_ptr_cast_function().
      It ensures the returning reg (BPF_REG_0) will also carry the ref_obj_id.
      That keeps the ref-tracking working properly.
      
      The bpf_skc_to_* helpers are made available to most of the bpf prog
      types in filter.c. The bpf_skc_to_* helpers will be limited by
      perfmon cap.
      
      This patch adds an ARG_PTR_TO_BTF_ID_SOCK_COMMON.  The helper accepting
      this arg can accept a btf-id-ptr (PTR_TO_BTF_ID + &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON])
      or a legacy-ctx-convert-skc-ptr (PTR_TO_SOCK_COMMON).  The bpf_skc_to_*()
      helpers are changed to take ARG_PTR_TO_BTF_ID_SOCK_COMMON such that
      they will accept a pointer obtained from skb->sk.

      Instead of specifying both arg_type and arg_btf_id in the same func_proto,
      which is how the current ARG_PTR_TO_BTF_ID works, the arg_btf_id of
      the new ARG_PTR_TO_BTF_ID_SOCK_COMMON is specified in the
      compatible_reg_types[] in verifier.c.  The reason is that the arg_btf_id is
      always the same.  Discussion in this thread:
      https://lore.kernel.org/bpf/20200922070422.1917351-1-kafai@fb.com/
      
      The ARG_PTR_TO_BTF_ID_ part gives a clear expectation that the helper is
      expecting a PTR_TO_BTF_ID which could be NULL.  This is the same
      behavior as the existing helper taking ARG_PTR_TO_BTF_ID.
      
      The _SOCK_COMMON part means the helper is also expecting the legacy
      SOCK_COMMON pointer.
      
      By excluding the _OR_NULL part, the bpf prog cannot call the helper
      with a literal NULL, which doesn't make sense in most cases.
      e.g. bpf_skc_to_tcp_sock(NULL) will be rejected.  Any PTR_TO_*_OR_NULL
      reg has to go through a NULL check first before being passed into the
      helper, or else the bpf prog will be rejected.  This behavior is nothing
      new and is consistent with the current expectation during bpf-prog-load.
      
      [ ARG_PTR_TO_BTF_ID_SOCK_COMMON will be used to replace
        ARG_PTR_TO_SOCK* of other existing helpers later such that
        those existing helpers can take the PTR_TO_BTF_ID returned by
        the bpf_skc_to_*() helpers.
      
        The only special case is bpf_sk_lookup_assign() which can accept a
        literal NULL ptr.  It has to be handled specially in another follow
        up patch if there is a need (e.g. by renaming ARG_PTR_TO_SOCKET_OR_NULL
        to ARG_PTR_TO_BTF_ID_SOCK_COMMON_OR_NULL). ]
      
      [ When converting the older helpers that take ARG_PTR_TO_SOCK* in
        the later patch, if the kernel does not support BTF,
        ARG_PTR_TO_BTF_ID_SOCK_COMMON will behave like ARG_PTR_TO_SOCK_COMMON
        because no reg->type could have PTR_TO_BTF_ID in this case.
      
        It is not a concern for the newer-btf-only helper like the bpf_skc_to_*()
        here though because these helpers must require BTF vmlinux to begin
        with. ]
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200925000350.3855720-1-kafai@fb.com
    • bpf: Move the PTR_TO_BTF_ID check to check_reg_type() · a968d5e2
      Martin KaFai Lau authored
      check_reg_type() checks whether a reg can be used as an arg of a
      func_proto.  For PTR_TO_BTF_ID, the check is actually not
      completely done until the reg->btf_id is pointing to a
      kernel struct that is acceptable by the func_proto.
      
      Thus, this patch moves the btf_id check into check_reg_type().
      "arg_type" and "arg_btf_id" are passed to check_reg_type() instead of
      "compatible".  The compatible_reg_types[] usage is localized in
      check_reg_type() now.
      
      The "if (!btf_id) verbose(...); " is also removed since it won't happen.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Lorenz Bauer <lmb@cloudflare.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200925000344.3854828-1-kafai@fb.com
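The argument check described in this series can be pictured with a much simplified user-space model. Everything below is an assumption made for illustration and is not the verifier's code: the enum values only borrow names from the commit messages, struct reg_types and its layout are invented here, the btf_id comparison is plain equality, and the id 42 is a made-up stand-in for the BTF id of the kernel's sock_common type. The model folds the expected btf_id into the compatible_reg_types[] table, as the later patch in the series does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for verifier register types (model only). */
enum reg_type { NOT_INIT, SCALAR_VALUE, PTR_TO_SOCK_COMMON, PTR_TO_BTF_ID };

/* Only one argument type is modelled. */
enum arg_type { ARG_PTR_TO_BTF_ID_SOCK_COMMON };

struct reg_state {
	enum reg_type type;
	uint32_t btf_id;	/* meaningful only for PTR_TO_BTF_ID */
};

/* Register types accepted for one argument type, plus the expected btf_id. */
struct reg_types {
	enum reg_type types[4];	/* NOT_INIT terminates the list */
	const uint32_t *btf_id;
};

/* Hypothetical id; a real kernel resolves this from BTF. */
static const uint32_t sock_common_btf_id = 42;

static const struct reg_types compatible_reg_types[] = {
	[ARG_PTR_TO_BTF_ID_SOCK_COMMON] = {
		.types = { PTR_TO_SOCK_COMMON, PTR_TO_BTF_ID },
		.btf_id = &sock_common_btf_id,
	},
};

/* Return true if reg may be passed for arg_type; the btf_id check lives here. */
static bool check_reg_type(const struct reg_state *reg, enum arg_type arg_type)
{
	const struct reg_types *compatible = &compatible_reg_types[arg_type];
	size_t i;

	for (i = 0; i < 4 && compatible->types[i] != NOT_INIT; i++) {
		if (reg->type != compatible->types[i])
			continue;
		/* A btf-id pointer is accepted only if it points to the expected type. */
		if (reg->type == PTR_TO_BTF_ID)
			return compatible->btf_id &&
			       reg->btf_id == *compatible->btf_id;
		return true;
	}
	return false;
}
```

In this model a PTR_TO_SOCK_COMMON register and a PTR_TO_BTF_ID register with the expected id are accepted, while a scalar (such as a literal NULL) or a PTR_TO_BTF_ID with a different id is rejected.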
  2. 24 Sep, 2020 23 commits
    • Merge branch 'rtt-speedup.2020.09.16a' of... · 182bf3f3
      Alexei Starovoitov authored
      Merge branch 'rtt-speedup.2020.09.16a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into bpf-next
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Revert "bpf: Fix potential call bpf_link_free() in atomic context" · f00f2f7f
      Alexei Starovoitov authored
      This reverts commit 31f23a6a.
      
      This change made many selftests/bpf flaky: flow_dissector, sk_lookup, sk_assign and others.
      There was no issue in the code.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge branch 'net-dsa-bcm_sf2-Additional-DT-changes' · 3fc826f1
      David S. Miller authored
      Florian Fainelli says:
      
      ====================
      net: dsa: bcm_sf2: Additional DT changes
      
      This patch series includes some additional changes to the bcm_sf2 in
      order to support the Device Tree firmwares provided on such platforms.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: bcm_sf2: Include address 0 for MDIO diversion · 0fa45ee3
      Florian Fainelli authored
      We need to include MDIO address 0, which is how our Device Tree blobs
      indicate where to find the external BCM53125 switches.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: bcm_sf2: Disallow port 5 to be a DSA CPU port · 8c280440
      Florian Fainelli authored
      While the switch driver is written such that port 5 or 8 could be CPU
      ports, the use case on Broadcom STB chips is to use port 8 exclusively.
      The platform firmware does make port 5 comply with a proper DSA CPU port
      binding by specifying an "ethernet" phandle. This is undesirable for
      now until we have a user-space configuration mechanism (such as
      devlink) which could support dynamically changing the port flavor at
      run time.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'octeontx2-Add-support-for-VLAN-based-flow-distribution' · 9d33ffaa
      David S. Miller authored
      George Cherian says:
      
      ====================
      octeontx2: Add support for VLAN based flow distribution
      
      This series adds support for VLAN based flow distribution for the
      octeontx2 netdev driver. This adds support for configuring the same via
      ethtool.

      The following tests have been done:
      	- Multi VLAN flow with same SD
      	- Multi VLAN flow with same SDFN
      	- Single VLAN flow with multi SD
      	- Single VLAN flow with multi SDFN
      All tests were done for udp/tcp, both v4 and v6.
      ====================
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • octeontx2-pf: Support to change VLAN based RSS hash options via ethtool · a55ff8ef
      George Cherian authored
      Add support to control rx-flow-hash based on VLAN.
      By default VLAN plus 4-tuple based hashing is enabled.
      Changes can be made at runtime using ethtool.

      To enable 2-tuple plus VLAN based flow distribution:
        # ethtool -N <intf> rx-flow-hash <prot> sdv
      To enable 4-tuple plus VLAN based flow distribution:
        # ethtool -N <intf> rx-flow-hash <prot> sdfnv
      Signed-off-by: George Cherian <george.cherian@marvell.com>
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • octeontx2-af: Add support for VLAN based RSS hashing · 8f900363
      George Cherian authored
      Added support for PF/VF drivers to choose RSS flow key algorithm
      with VLAN tag included in hashing input data. Only CTAG is considered.
      Signed-off-by: George Cherian <george.cherian@marvell.com>
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fix a new kernel-doc warning at dev.c · de2b541b
      Mauro Carvalho Chehab authored
      kernel-doc expects the function prototype to be just after
      the kernel-doc markup, as otherwise it will get it all wrong:
      
      	./net/core/dev.c:10036: warning: Excess function parameter 'dev' description in 'WAIT_REFS_MIN_MSECS'
      
      Fixes: 0e4be9e5 ("net: use exponential backoff in netdev_wait_allrefs")
      Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
      Reviewed-by: Francesco Ruggeri <fruggeri@arista.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
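The layout rule behind this warning can be shown with a small stand-alone sketch. The macro and function names below are hypothetical and chosen only for illustration; they are not the identifiers touched by the patch. The point is the ordering: the defines come first, and the kernel-doc comment sits directly above the function prototype it documents.

```c
/* Constants go before the kernel-doc block, not between it and the function. */
#define BACKOFF_MIN_MSECS 1
#define BACKOFF_MAX_MSECS 250

/**
 * next_backoff_msecs - compute the next wait interval
 * @cur: current wait interval in milliseconds, 0 on the first call
 *
 * Return: BACKOFF_MIN_MSECS when @cur is 0, otherwise @cur doubled and
 * capped at BACKOFF_MAX_MSECS.
 */
static unsigned int next_backoff_msecs(unsigned int cur)
{
	unsigned int next = cur ? cur * 2 : BACKOFF_MIN_MSECS;

	return next > BACKOFF_MAX_MSECS ? BACKOFF_MAX_MSECS : next;
}
```

If a #define were placed between the comment and the function, kernel-doc would try to match the comment against the macro, which is the kind of mismatch the warning quoted in the commit message reports.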
    • Merge branch 'net-mdio-ipq4019-add-Clause-45-support' · 774e9ea6
      David S. Miller authored
      Robert Marko says:
      
      ====================
      net: mdio-ipq4019: add Clause 45 support
      
      This patch series adds support for Clause 45 to the driver.
      
      While at it also change some defines to upper case to match rest of the driver.
      
      Changes since v4:
      * Rebase onto net-next.git
      
      Changes since v1:
      * Drop clock patches, these need further investigation and
      no user for non default configuration has been found
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mdio-ipq4019: add Clause 45 support · 06fb5606
      Robert Marko authored
      While up-streaming the IPQ4019 driver it was thought that the controller had no Clause 45 support,
      but it actually does, and it's activated by writing a bit to the mode register.

      So let's add it, as newer SoCs use the same controller and Clause 45 compliant PHYs.
      Signed-off-by: Robert Marko <robert.marko@sartura.hr>
      Cc: Luka Perkov <luka.perkov@sartura.hr>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mdio-ipq4019: change defines to upper case · b840ec1e
      Robert Marko authored
      In the commit adding the IPQ4019 MDIO driver, defines for timeout and sleep partially used lower case.
      Let's change them to upper case in line with the rest of the driver defines.
      Signed-off-by: Robert Marko <robert.marko@sartura.hr>
      Cc: Luka Perkov <luka.perkov@sartura.hr>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'Introduce-mbox-tracepoints-for-Octeontx2' · 35e3dbfa
      David S. Miller authored
      Subbaraya Sundeep says:
      
      ====================
      Introduce mbox tracepoints for Octeontx2
      
      This patchset adds tracepoints support for mailbox.
      In Octeontx2, PFs and VFs need to communicate with AF
      for allocating and freeing resources. Once all the
      configuration is done by AF for a PF/VF then packet I/O
      can happen on PF/VF queues. When an interface
      is brought up many mailbox messages are sent
      to AF for initializing queues. Say a VF is brought up
      then each message is sent to PF and PF forwards to
      AF and response also traverses from AF to PF and then VF.
      To aid debugging, tracepoints are added at places where
      messages are allocated and sent, and at message interrupts.
      Below is the trace of one of the messages from VF to AF
      and the AF response back to VF:
      
      ~ # echo 1 > /sys/kernel/tracing/events/rvu/enable
      ~ # ifconfig eth20 up
      [  279.379559] eth20 NIC Link is UP 10000 Mbps Full duplex
      ~ # cat /sys/kernel/tracing/trace
              ifconfig-171   [000] ....   275.753345: otx2_msg_alloc: [0002:02:00.1] msg:(0x400) size:40
      
              ifconfig-171   [000] ...1   275.753347: otx2_msg_send: [0002:02:00.1] sent 1 msg(s) of size:48
      
                <idle>-0     [001] dNh1   275.753356: otx2_msg_interrupt: [0002:02:00.0] mbox interrupt VF(s) to PF (0x1)
      
          kworker/u9:1-90    [001] ...1   275.753364: otx2_msg_send: [0002:02:00.0] sent 1 msg(s) of size:48
      
          kworker/u9:1-90    [001] d.h.   275.753367: otx2_msg_interrupt: [0002:01:00.0] mbox interrupt PF(s) to AF (0x2)
      
          kworker/u9:2-167   [002] ....   275.753535: otx2_msg_process: [0002:01:00.0] msg:(0x400) error:0
      
          kworker/u9:2-167   [002] ...1   275.753537: otx2_msg_send: [0002:01:00.0] sent 1 msg(s) of size:32
      
                <idle>-0     [003] d.h1   275.753543: otx2_msg_interrupt: [0002:02:00.0] mbox interrupt AF to PF (0x1)
      
                <idle>-0     [001] d.h2   275.754376: otx2_msg_interrupt: [0002:02:00.1] mbox interrupt PF to VF (0x1)
      
      v3 changes:
       Removed EXPORT_TRACEPOINT_SYMBOLS of otx2_msg_send and otx2_msg_check
       since they are called locally only
      
      v2 changes:
       Removed otx2_msg_err tracepoint since it is similar to devlink_hwerr
       and it will be used instead when devlink support is added.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • octeontx2-pf: Add tracepoints for PF/VF mailbox · 31a97460
      Subbaraya Sundeep authored
      With tracepoints support present in the mailbox
      code this patch adds tracepoints in PF and VF drivers
      at places where mailbox messages are allocated,
      sent and at message interrupts.
      Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • octeontx2-af: Introduce tracepoints for mailbox · 49142d12
      Subbaraya Sundeep authored
      Added tracepoints in the mailbox code so that
      mailbox operations like message allocation,
      message sending and message interrupts are traced.
      Mailbox errors such as timeouts or wrong responses
      are traced as well.
      These will help in debugging mailbox issues.
      
      Here's an example output showing one of the mailbox
      messages sent by PF to AF and AF responding to it:
      
      ~# mount -t tracefs none /sys/kernel/tracing/
      ~# echo 1 > /sys/kernel/tracing/events/rvu/enable
      ~# ifconfig eth0 up
      ~# cat /sys/kernel/tracing/trace
       tracer: nop
      
      		      _-----=> irqs-off
      		     / _----=> need-resched
      		    | / _---=> hardirq/softirq
      		    || / _--=> preempt-depth
      		    ||| /     delay
         TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
            | |       |   ||||       |         |
      ifconfig-2382  [002] ....   756.161892: otx2_msg_alloc: [0002:02:00.0] msg:(0x400) size:40
      
      ifconfig-2382  [002] ...1   756.161895: otx2_msg_send: [0002:02:00.0] sent 1 msg(s) of size:48
      
       <idle>-0     [000] d.h1   756.161902: otx2_msg_interrupt: [0002:01:00.0] mbox interrupt PF(s) to AF (0x2)
      
      kworker/u49:0-1165  [000] ....   756.162049: otx2_msg_process: [0002:01:00.0] msg:(0x400) error:0
      
      kworker/u49:0-1165  [000] ...1   756.162051: otx2_msg_send: [0002:01:00.0] sent 1 msg(s) of size:32
      
      kworker/u49:0-1165  [000] d.h.   756.162056: otx2_msg_interrupt: [0002:02:00.0] mbox interrupt AF to PF (0x1)
      Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • B
      net: allwinner: remove redundant irqsave and irqrestore in hardIRQ · 36493269
      Barry Song committed
      The comment "holders of db->lock must always block IRQs" and the related
      irqsave and irqrestore code don't make sense, since we are in an
      IRQ-disabled hardIRQ context.
      
      Cc: Maxime Ripard <mripard@kernel.org>
      Cc: Chen-Yu Tsai <wens@csie.org>
      Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
      Acked-by: Maxime Ripard <mripard@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      36493269
    • R
      net: hns3: Constify static structs · e4b91468
      Rikard Falkeborn committed
      A number of static variables were not modified. Make them const to allow
      the compiler to put them in read-only memory. In order to do so,
      constify a couple of input pointers as well as some local pointers.
      This moves about 35 kB (34560 bytes of data) to read-only memory, as
      seen in the output of the size command.
      
      Before:
         text    data     bss     dec     hex filename
       404938  111534     640  517112   7e3f8 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge.ko
      
      After:
         text    data     bss     dec     hex filename
       439499   76974     640  517113   7e3f9 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge.ko
      Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e4b91468
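      The figure quoted in the commit above can be checked from the two rows of
      size output. This is a small sketch with a hypothetical helper,
      `parse_size_row`, that assumes the column layout shown (text, data, bss,
      dec, hex, filename); it is not part of the commit.

```python
def parse_size_row(row):
    """Split one data row of `size` output into named integer fields."""
    text, data, bss, dec, hex_total, filename = row.split()
    return {
        "text": int(text),
        "data": int(data),
        "bss": int(bss),
        "dec": int(dec),
        "hex": int(hex_total, 16),
        "filename": filename,
    }


ko = "drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge.ko"
before = parse_size_row(f"404938  111534     640  517112   7e3f8 {ko}")
after = parse_size_row(f"439499   76974     640  517113   7e3f9 {ko}")

# Bytes that left the writable data section and bytes gained by text.
data_shrink = before["data"] - after["data"]
text_growth = after["text"] - before["text"]
print(data_shrink, text_growth)
```

      The data section shrinks by 34560 bytes while text grows by 34561 bytes,
      which matches the one-byte change in the dec and hex totals.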
    • D
      Merge branch 'net-bridge-mcast-IGMPv3-MLDv2-fast-path-part-2' · 68d4fd30
      David S. Miller committed
      Nikolay Aleksandrov says:
      
      ====================
      net: bridge: mcast: IGMPv3/MLDv2 fast-path (part 2)
      
      This is the second part of the IGMPv3/MLDv2 support, which adds support
      for the fast-path. To handle source entries we add mdb support for S,G
      entries (i.e. we add source address support to br_ip). That requires
      extending the current mdb netlink API; fortunately we only need to add
      one attribute, which will contain nested future mdb attributes, and we
      then use it to add support for user S,G add, del and dump. The lookup
      sequence is simple: when IGMPv3/MLDv2 are enabled, do the S,G lookup
      first and, if it fails, fall back to *,G. The more complex part is
      handling source lists, auto-installing S,G entries and *,G filter mode
      transitions. We have the following cases:
       1) *,G INCLUDE -> EXCLUDE transition: we need to install the port in
          all of *,G's installed S,G entries for proper replication (except
          the ones explicitly blocked), this is also necessary when adding a
          new *,G EXCLUDE port group
      
       2) *,G EXCLUDE -> INCLUDE transition: we need to remove the port from
          all of *,G's installed S,G entries, this is also necessary when
          removing a *,G port group
      
       3) New S,G port entry: we need to install all current *,G EXCLUDE ports
      
       4) Remove S,G port entry: if all other port groups were auto-installed we
          can safely remove them and delete the whole S,G entry
      
      Currently we compute these operations from the available ports, their
      source lists and their filter mode. In the future we can extend the port
      group structure and reduce the running time of these ops. One current
      limitation is that host-joined S,G entries are not supported, i.e. one
      cannot add "dev bridge port bridge" mdb S,G entries. The host join is
      currently considered an EXCLUDE {} join, so it's reflected in all of
      *,G's installed S,G entries. If an S,G,port entry is added as temporary,
      the kernel can take it over if a source shows up from a report;
      permanent entries are skipped. To properly handle blocked sources we add
      a new port group blocked flag to avoid forwarding to that port group in
      the S,G. Finally, when forwarding we use two things to decide if a port
      should be skipped: the port group filter mode (if it's INCLUDE and the
      port group is from a *,G then don't replicate to it; if it's EXCLUDE
      then forward) and the blocked flag (if it's set, skip that port unless
      it's a router port). Another limitation is that we can't do some of the
      above transitions without a small traffic drop while installing/removing
      entries. That will be taken care of when we add atomic swap of port
      replication lists later.
      
      Patch break down:
       patches 1-3: prepare the mdb code for better extack support which is
                    used in future patches to return a more meaningful error
       patches 4-6: add the source address field to struct br_ip, and do minor
                    cleanups around it
       patches 7-8: extend the mdb netlink API so we can send new mdb
                    attributes and use the new API for S,G entry add/del/dump
                    support
       patch     9: takes care of S,G entries when doing a lookup (first S,G
                    then *,G lookup)
       patch    10: adds a new port group field and attribute for origin
                    protocol; we use the already available RTPROT_
                    definitions. Currently user-space entries are added as
                    RTPROT_STATIC and kernel entries as RTPROT_KERNEL; we may
                    allow user-space to set custom values later (e.g. for
                    FRR, clag)
       patch    11: adds an internal S,G,port rhashtable to speed up filter
                    mode transitions
       patch    12: initial automatic install of S,G entries based on port
                    groups' source lists
       patch    13: handles port group modes on transitions or when new
                    port group entries are added
       patch    14: self-explanatory - adds support for blocked port group
                    entries needed to stop forwarding to particular S,G,port
                    entries
       patch    15: handles host-join/leave state changes, treats host-joins
                    as EXCLUDE {} groups (reflected in all *,G's S,G entries)
       patch    16: finally adds the fast-path filter mode and block flag
                    support
      
      Here are the sets that will come next (in order):
       - iproute2 support for IGMPv3/MLDv2
       - selftests for all mode transitions and group flags
       - explicit host tracking for proper fast-leave support
       - atomic port replication lists (these are also needed for broadcast
         forwarding optimizations)
       - mode transition optimization and removal of open-coded sorted lists
      
      Not implemented yet:
       - Host IGMPv3/MLDv2 filter support (currently we handle only join/leave
         as before)
       - Proper other querier source timer and value updates
       - IGMPv3/v2 MLDv2/v1 compat (I have a few rough patches for this one)
      
      v2: fix build with CONFIG_BATMAN_ADV_MCAST in patch 6
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      68d4fd30
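      The lookup order described in the cover letter above (S,G first, then
      *,G) can be illustrated with a toy user-space model. Everything here is
      an assumption made for illustration: the dict keyed by (source, group),
      the use of None for the *,G wildcard, and the name `mdb_lookup`. The
      kernel's actual data structures differ.

```python
def mdb_lookup(mdb, group, src, sg_enabled):
    """Return the entry for (src, group), falling back to the *,G entry.

    mdb        -- dict mapping (source, group) to an entry; source None is *,G
    sg_enabled -- stands in for "IGMPv3/MLDv2 are enabled"
    """
    if sg_enabled and src is not None:
        entry = mdb.get((src, group))
        if entry is not None:
            return entry
    # S,G lookup skipped or failed: fall back to the *,G entry.
    return mdb.get((None, group))


mdb = {
    (None, "239.1.1.1"): "star-g entry",
    ("10.0.0.5", "239.1.1.1"): "s-g entry",
}
print(mdb_lookup(mdb, "239.1.1.1", "10.0.0.5", sg_enabled=True))
print(mdb_lookup(mdb, "239.1.1.1", "10.0.0.9", sg_enabled=True))
```

      With S,G lookups disabled the model always returns the *,G entry, which
      corresponds to the behaviour before this series.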
    • N
      net: bridge: mcast: when forwarding handle filter mode and blocked flag · 36cfec73
      Nikolay Aleksandrov committed
      We need to avoid forwarding to ports in MCAST_INCLUDE filter mode when the
      mdst entry is a *,G or when the port has the blocked flag.
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      36cfec73
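      The per-port decision described in the commit above can be written as a
      small predicate. This is a sketch of the stated rule only; the function
      name and string constants are hypothetical, and router-port handling is
      deliberately left out of the model.

```python
INCLUDE = "INCLUDE"
EXCLUDE = "EXCLUDE"


def should_forward(filter_mode, entry_is_star_g, blocked):
    """Decide whether a port group receives a copy of the packet.

    filter_mode     -- INCLUDE or EXCLUDE, the port group's filter mode
    entry_is_star_g -- True when the matched mdb entry is a *,G entry
    blocked         -- True when the port group carries the blocked flag
    """
    if blocked:
        return False
    if entry_is_star_g and filter_mode == INCLUDE:
        return False
    return True


print(should_forward(INCLUDE, entry_is_star_g=True, blocked=False))
print(should_forward(EXCLUDE, entry_is_star_g=True, blocked=False))
```

      An INCLUDE-mode port is skipped only for *,G matches; when the match is
      an S,G entry the same port still receives traffic unless it is blocked.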
    • N
      net: bridge: mcast: handle host state · 094b82fd
      Nikolay Aleksandrov committed
      Since host joins are considered EXCLUDE {} joins, we need to reflect
      that in all of *,G ports' S,G entries. Since host_joined == true can
      only be set automatically on S,G entries, we can safely set it to false
      when removing all automatically added entries upon S,G delete.
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      094b82fd
    • N
      net: bridge: mcast: add support for blocked port groups · 9116ffbf
      Nikolay Aleksandrov committed
      When excluding S,G entries we need a way to block a particular S,G,port.
      The new port group flag is managed based on the source's timer as per
      RFCs 3376 and 3810. When a source expires and its port group is in
      EXCLUDE mode, it will be blocked.
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9116ffbf
    • N
      net: bridge: mcast: handle port group filter modes · 8266a049
      Nikolay Aleksandrov committed
      We need to handle group filter mode transitions and initial state.
      To change a port group's mode from INCLUDE to EXCLUDE (or when we have
      added a new port group in EXCLUDE mode) we need to add that port to all
      of *,G ports' S,G entries for proper replication. When the EXCLUDE state
      is changed from an IGMPv3 report, br_multicast_fwd_filter_exclude() must
      be called after the source list processing because the assumption is
      that all of the group's S,G entries will be created before transitioning
      to EXCLUDE mode, i.e. most importantly its blocked entries will already
      be added so it will not get automatically added to them.
      The EXCLUDE -> INCLUDE transition happens only when a port group timer
      expires; it requires us to remove that port from all of *,G ports' S,G
      entries where it was automatically added previously.
      Finally, when we are adding a new S,G entry we must add all of *,G's
      EXCLUDE ports to it.
      In order to distinguish automatically added *,G EXCLUDE ports we have a
      new port group flag - MDB_PG_FLAGS_STAR_EXCL.
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8266a049
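      The three operations described in the commit above can be modelled in a
      few lines. This is a toy sketch under stated assumptions: S,G entries are
      a dict mapping each source to a dict of port -> set of flags, the
      function names are hypothetical Python stand-ins, and STAR_EXCL stands in
      for MDB_PG_FLAGS_STAR_EXCL. It is not the kernel implementation.

```python
STAR_EXCL = "STAR_EXCL"  # stand-in for MDB_PG_FLAGS_STAR_EXCL
BLOCKED = "BLOCKED"      # stand-in for the blocked port group flag


def fwd_filter_exclude(sg_entries, port):
    """INCLUDE -> EXCLUDE: add the port to every S,G entry it is not in.

    Ports already present (for example blocked ones) are left untouched.
    """
    for ports in sg_entries.values():
        if port not in ports:
            ports[port] = {STAR_EXCL}


def fwd_filter_include(sg_entries, port):
    """EXCLUDE -> INCLUDE: remove the port where it was added automatically."""
    for ports in sg_entries.values():
        if STAR_EXCL in ports.get(port, set()):
            del ports[port]


def add_sg_entry(sg_entries, src, star_g_exclude_ports):
    """New S,G entry: install all current *,G EXCLUDE ports in it."""
    ports = sg_entries.setdefault(src, {})
    for port in star_g_exclude_ports:
        ports.setdefault(port, {STAR_EXCL})


# Source list processing has already created a blocked entry for p1.
sg = {"10.0.0.1": {"p1": {BLOCKED}}, "10.0.0.2": {}}
fwd_filter_exclude(sg, "p1")
print(sg)
```

      Running the source list processing first is what keeps the blocked entry
      for p1 intact here: the EXCLUDE transition only fills in S,G entries
      where the port is still missing.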
    • N
      net: bridge: mcast: install S,G entries automatically based on reports · b0812368
      Nikolay Aleksandrov committed
      This patch adds support for automatic install of S,G mdb entries based
      on the port group's source list and the source entry's timer.
      Once installed, the S,G will be used when forwarding packets if the
      appropriate multicast/mld versions are set. A new source flag called
      BR_SGRP_F_INSTALLED denotes if the source has a forwarding mdb entry
      installed.
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b0812368