1. 25 Mar, 2021 10 commits
  2. 24 Mar, 2021 6 commits
    • net: make unregister netdev warning timeout configurable · 5aa3afe1
      Dmitry Vyukov committed
      netdev_wait_allrefs() issues a warning if refcount does not drop to 0
      after 10 seconds. While 10 second wait generally should not happen
      under normal workload in normal environment, it seems to fire falsely
      very often during fuzzing and/or in qemu emulation (~10x slower).
      At least it's not possible to understand if it's really a false
      positive or not. Automated testing generally bumps all timeouts
      to very high values to avoid flake failures.
      Add net.core.netdev_unregister_timeout_secs sysctl to make
      the timeout configurable for automated testing systems.
      Lowering the timeout may also be useful for e.g. manual bisection.
      The default value matches the current behavior.
      Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
      Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=211877
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
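      A sysctl such as net.core.netdev_unregister_timeout_secs is normally reachable as a file under /proc/sys, with the dots in its name replaced by slashes. The following is a minimal sketch of how a test harness might raise the timeout from C. The helper names sysctl_to_proc_path() and set_unregister_timeout() are hypothetical and not part of the patch, and the write needs root on a kernel that carries this change.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: turn a dotted sysctl name into its /proc/sys path.
 * Returns 0 on success, -1 if the output buffer is too small. */
int sysctl_to_proc_path(const char *name, char *out, size_t len)
{
	const char *prefix = "/proc/sys/";
	int n = snprintf(out, len, "%s%s", prefix, name);
	char *p;

	if (n < 0 || (size_t)n >= len)
		return -1;
	/* Only the part after the prefix is rewritten. */
	for (p = out + strlen(prefix); *p; p++)
		if (*p == '.')
			*p = '/';
	return 0;
}

/* Hypothetical helper: write a new timeout in seconds.
 * Returns 0 on success, -1 on any failure (no such file, no permission). */
int set_unregister_timeout(int secs)
{
	char path[128];
	FILE *f;
	int ok;

	if (sysctl_to_proc_path("net.core.netdev_unregister_timeout_secs",
				path, sizeof(path)))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", secs) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

      The simple dot-to-slash mapping is enough for this particular name; sysctl names that embed an interface name containing dots would need extra care.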
    • net: bridge: add helper to replay VLANs installed on port · 22f67cdf
      Vladimir Oltean committed
      Currently this simple setup with DSA:
      
      ip link add br0 type bridge vlan_filtering 1
      ip link add bond0 type bond
      ip link set bond0 master br0
      ip link set swp0 master bond0
      
      will not work because the bridge has created the PVID in br_add_if ->
      nbp_vlan_init, and it has notified switchdev of the existence of VLAN 1,
      but that was too early, since swp0 was not yet a lower of bond0, so it
      had no reason to act upon that notification.
      
      We need a helper in the bridge to replay the switchdev VLAN objects that
      were notified since the bridge port creation, because some of them may
      have been missed.
      
      As opposed to the br_mdb_replay function, the vg->vlan_list write side
      protection is offered by the rtnl_mutex which is sleepable, so we don't
      need to queue up the objects in atomic context, we can replay them right
      away.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: add helper to replay port and local fdb entries · 04846f90
      Vladimir Oltean committed
      When a switchdev port starts offloading a LAG that is already in a
      bridge and has an FDB entry pointing to it:
      
      ip link set bond0 master br0
      bridge fdb add dev bond0 00:01:02:03:04:05 master static
      ip link set swp0 master bond0
      
      the switchdev driver will have no idea that this FDB entry is there,
      because it missed the switchdev event emitted at its creation.
      
      Ido Schimmel pointed this out during a discussion about challenges with
      switchdev offloading of stacked interfaces between the physical port and
      the bridge, and recommended to just catch that condition and deny the
      CHANGEUPPER event:
      https://lore.kernel.org/netdev/20210210105949.GB287766@shredder.lan/
      
      But in fact, we might need to deal with the hard thing anyway, which is
      to replay all FDB addresses relevant to this port, because it isn't just
      static FDB entries, but also local addresses (ones that are not
      forwarded but terminated by the bridge). There, we can't just say 'oh
      yeah, there was an upper already so I'm not joining that'.
      
      So, similar to the logic for replaying MDB entries, add a function that
      must be called by individual switchdev drivers and replays local FDB
      entries as well as ones pointing towards a bridge port. This time, we
      use the atomic switchdev notifier block, since that's what FDB entries
      expect for some reason.
      Reported-by: Ido Schimmel <idosch@idosch.org>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: add helper to replay port and host-joined mdb entries · 4f2673b3
      Vladimir Oltean committed
      I have a system with DSA ports, and udhcpcd is configured to bring
      interfaces up as soon as they are created.
      
      I create a bridge as follows:
      
      ip link add br0 type bridge
      
      As soon as I create the bridge and udhcpcd brings it up, I also have
      avahi which automatically starts sending IPv6 packets to advertise some
      local services, and because of that, the br0 bridge joins the following
      IPv6 groups due to the code path detailed below:
      
      33:33:ff:6d:c1:9c vid 0
      33:33:00:00:00:6a vid 0
      33:33:00:00:00:fb vid 0
      
      br_dev_xmit
      -> br_multicast_rcv
         -> br_ip6_multicast_add_group
            -> __br_multicast_add_group
               -> br_multicast_host_join
                  -> br_mdb_notify
      
      This is all fine, but inside br_mdb_notify we have br_mdb_switchdev_host
      hooked up, and switchdev will attempt to offload the host joined groups
      to an empty list of ports. Of course nobody offloads them.
      
      Then when we add a port to br0:
      
      ip link set swp0 master br0
      
      the bridge doesn't replay the host-joined MDB entries from br_add_if,
      and eventually the host joined addresses expire, and a switchdev
      notification for deleting it is emitted, but surprise, the original
      addition was already completely missed.
      
      The strategy to address this problem is to replay the MDB entries (both
      the port ones and the host joined ones) when the new port joins the
      bridge, similar to what vxlan_fdb_replay does (in that case, its FDB can
      be populated and only then attached to a bridge that you offload).
      However there are 2 possibilities: the addresses can be 'pushed' by the
      bridge into the port, or the port can 'pull' them from the bridge.
      
      Considering that in the general case, the new port can be really late to
      the party, and there may have been many other switchdev ports that
      already received the initial notification, we would like to avoid
      delivering duplicate events to them, since they might misbehave. And
      currently, the bridge calls the entire switchdev notifier chain, whereas
      for replaying it should just call the notifier block of the new guy.
      But the bridge doesn't know what is the new guy's notifier block, it
      just knows where the switchdev notifier chain is. So for simplification,
      we make this a driver-initiated pull for now, and the notifier block is
      passed as an argument.
      
      To emulate the calling context for mdb objects (deferred and put on the
      blocking notifier chain), we must iterate under RCU protection through
      the bridge's mdb entries, queue them, and only call them once we're out
      of the RCU read-side critical section.
      
      There was some opportunity for reuse between br_mdb_switchdev_host_port,
      br_mdb_notify and the newly added br_mdb_queue_one in how the switchdev
      mdb object is created, so a helper was created.
      Suggested-by: Ido Schimmel <idosch@idosch.org>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
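      The driver-initiated pull described in this commit message can be pictured with a small userspace model. This is only a sketch: every type and function name below (toy_bridge, toy_notifier, toy_mdb_replay, toy_count_entry) is made up for illustration and is not the bridge API. It shows the two ideas from the text: entries are first queued while walking the list (where the real code would sit inside the RCU read-side critical section) and delivered afterwards, and only the notifier passed as an argument is called, never the whole chain.

```c
#include <stddef.h>

#define TOY_MAX_MDB 8

/* Toy MDB entry: a multicast MAC address plus a VLAN ID. */
struct toy_mdb_entry {
	unsigned char addr[6];
	unsigned short vid;
};

/* Toy stand-in for a notifier block: one callback and a counter. */
struct toy_notifier {
	int (*call)(struct toy_notifier *nb, const struct toy_mdb_entry *e);
	int seen;
};

struct toy_bridge {
	struct toy_mdb_entry mdb[TOY_MAX_MDB];
	size_t n;
};

/* Replay all entries of br to nb only. Returns the first callback error. */
int toy_mdb_replay(const struct toy_bridge *br, struct toy_notifier *nb)
{
	struct toy_mdb_entry queued[TOY_MAX_MDB];
	size_t i, n = 0;

	/* Phase 1: snapshot the entries (RCU-protected walk in real code). */
	for (i = 0; i < br->n && n < TOY_MAX_MDB; i++)
		queued[n++] = br->mdb[i];

	/* Phase 2: outside the critical section, notify the new port only. */
	for (i = 0; i < n; i++) {
		int err = nb->call(nb, &queued[i]);

		if (err)
			return err;
	}
	return 0;
}

/* Example callback: count how many entries this port was told about. */
int toy_count_entry(struct toy_notifier *nb, const struct toy_mdb_entry *e)
{
	(void)e;
	nb->seen++;
	return 0;
}
```

      In this model a port that joined earlier keeps its own notifier untouched, so it never sees duplicate additions when a later port asks for a replay.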
    • net: bridge: add helper to retrieve the current ageing time · f1d42ea1
      Vladimir Oltean committed
      The SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute is only emitted from:
      
      sysfs/ioctl/netlink
      -> br_set_ageing_time
         -> __set_ageing_time
      
      therefore not at bridge port creation time, so:
      (a) switchdev drivers have to hardcode the initial value for the address
          ageing time, because they didn't get any notification
      (b) that hardcoded value can be out of sync, if the user changes the
          ageing time before enslaving the port to the bridge
      
      We need a helper in the bridge, such that switchdev drivers can query
      the current value of the bridge ageing time when they start offloading
      it.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: add helper for retrieving the current bridge port STP state · c0e715bb
      Vladimir Oltean committed
      It may happen that we have the following topology with DSA or any other
      switchdev driver with LAG offload:
      
      ip link add br0 type bridge stp_state 1
      ip link add bond0 type bond
      ip link set bond0 master br0
      ip link set swp0 master bond0
      ip link set swp1 master bond0
      
      STP decides that it should put bond0 into the BLOCKING state, and
      that's that. The ports that are actively listening for the switchdev
      port attributes emitted for the bond0 bridge port (because they are
      offloading it) and have the honor of seeing that switchdev port
      attribute can react to it, so we can program swp0 and swp1 into the
      BLOCKING state.
      
      But if then we do:
      
      ip link set swp2 master bond0
      
      then as far as the bridge is concerned, nothing has changed: it still
      has one bridge port. But this new bridge port will not see any STP state
      change notification and will remain FORWARDING, which is how the
      standalone code leaves it in.
      
      We need a function in the bridge driver which retrieves the current STP
      state, such that drivers can synchronize to it when they may have missed
      switchdev events.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
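      The swp2 scenario above can be modelled in a few lines of C. This is a sketch with invented names (toy_bridge_port, toy_hw_port, toy_port_get_stp_state, toy_hw_port_join); it does not use the real bridge helper, it only shows why a late joiner has to query the current state instead of waiting for the next notification.

```c
/* Toy STP states; the names are illustrative, not the kernel's constants. */
enum toy_stp_state {
	TOY_DISABLED,
	TOY_LISTENING,
	TOY_LEARNING,
	TOY_FORWARDING,
	TOY_BLOCKING
};

/* The bridge port (bond0 in the example) as the bridge sees it. */
struct toy_bridge_port {
	enum toy_stp_state state;
};

/* A physical switch port (swp0, swp1, swp2) as the hardware sees it. */
struct toy_hw_port {
	enum toy_stp_state state;
};

/* Stand-in for the new helper: report the bridge port's current state. */
enum toy_stp_state toy_port_get_stp_state(const struct toy_bridge_port *p)
{
	return p->state;
}

/* Standalone ports are left FORWARDING, as the commit message describes. */
void toy_hw_port_init(struct toy_hw_port *hw)
{
	hw->state = TOY_FORWARDING;
}

/* On joining the LAG, pull the state instead of waiting for an event. */
void toy_hw_port_join(struct toy_hw_port *hw,
		      const struct toy_bridge_port *upper)
{
	hw->state = toy_port_get_stp_state(upper);
}
```

      Without the call in toy_hw_port_join(), the late port would stay FORWARDING while its siblings are BLOCKING, which is the mismatch the helper is meant to close.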
  3. 23 Mar, 2021 6 commits
    • net: dsa: hellcreek: Report switch name and ID · 1ab568e9
      Kurt Kanzenbach committed
      Report the driver name, ASIC ID and the switch name via devlink. This is
      useful information for user space tooling.
      Signed-off-by: Kurt Kanzenbach <kurt@kmk-computers.de>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: set initial device refcount to 1 · add2d736
      Eric Dumazet committed
      When adding CONFIG_PCPU_DEV_REFCNT, I forgot that the
      initial net device refcount was 0.
      
      When CONFIG_PCPU_DEV_REFCNT is not set, this means
      the first dev_hold() triggers an illegal refcount
      operation (addition on 0)
      
      refcount_t: addition on 0; use-after-free.
      WARNING: CPU: 0 PID: 1 at lib/refcount.c:25 refcount_warn_saturate+0x128/0x1a4
      
      Fix is to change initial (and final) refcount to be 1.
      
      Also add a missing kerneldoc piece, as reported by
      Stephen Rothwell.
      
      Fixes: 919067cc ("net: add CONFIG_PCPU_DEV_REFCNT")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Guenter Roeck <groeck@google.com>
      Tested-by: Guenter Roeck <groeck@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
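      Why a counter that starts at 0 trips the "addition on 0" check, and why starting at 1 avoids it, can be seen in a toy model. The names below (toy_refcount, toy_ref_inc, and so on) are hypothetical. The real refcount_t warns and saturates the counter, whereas this sketch only records a flag and leaves the value alone.

```c
#include <stdbool.h>

/* Toy reference counter that remembers whether it was ever misused. */
struct toy_refcount {
	unsigned int refs;
	bool warned;
};

void toy_ref_set(struct toy_refcount *r, unsigned int n)
{
	r->refs = n;
	r->warned = false;
}

/* Taking a reference on a counter at 0 looks like a use-after-free,
 * so it is flagged instead of being performed. */
void toy_ref_inc(struct toy_refcount *r)
{
	if (r->refs == 0) {
		r->warned = true;
		return;
	}
	r->refs++;
}

void toy_ref_dec(struct toy_refcount *r)
{
	if (r->refs > 0)
		r->refs--;
}

/* With an initial value of 1, "no outstanding users" means refs == 1. */
bool toy_ref_is_idle(const struct toy_refcount *r)
{
	return r->refs == 1;
}
```

      Starting at 1 means the first hold goes from 1 to 2 and never touches the flagged 0 state, and the device is considered unreferenced once the count is back at 1, which matches the "initial (and final)" wording of the fix.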
    • net: move the ptype_all and ptype_base declarations to include/linux/netdevice.h · 744b8376
      Vladimir Oltean committed
      ptype_all and ptype_base are declared in net/core/dev.c as non-static,
      because they are used by net-procfs.c too. However, a "make W=1" build
      complains that there was no previous declaration of ptype_all and
      ptype_base in a header file, so this way of declaring things constitutes
      a violation of coding style.
      
      Let's move the extern declarations of ptype_all and ptype_base to the
      linux/netdevice.h file, which is included by net-procfs.c too.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • linux/qed: Mundane spelling fixes throughout the file · 405a129f
      Bhaskar Chowdhury committed
      s/unrequired/"not required"/
      s/consme/consume/ .....two different places
      s/accros/across/
      Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
      Acked-by: Igor Russkikh <irusskikh@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdev: add netdev_queue_set_dql_min_limit() · f57bac3c
      Vincent Mailhol committed
      Add a function to set the dynamic queue limit minimum value.
      
      Some specific drivers might have legitimate reasons to configure
      dql.min_limit to a given value. Typically, this is the case when the
      PDU of the protocol is smaller than the packet size used to
      carry those frames to the device.
      
      Concrete example: a CAN (Controller Area Network) device with a USB 2.0
      interface. The PDUs of the classical CAN protocol are roughly 16 bytes, but
      the USB packet size (which is used to carry the CAN frames to the
      device) might be up to 512 bytes. When a small traffic burst occurs, the
      BQL algorithm is not able to adjust immediately, and this results in
      having to send many small USB packets (i.e. one packet of 16 bytes for
      each CAN frame). Filling up the USB packet with CAN frames is relatively
      fast (small latency cost), but the gain of not having to send several
      small USB packets is huge (big throughput increase). In this case,
      forcing dql.min_limit to a value that allows the USB packet to be
      filled is always a win.
      
      This function is to be used by network drivers which are able to show,
      through a rationale and through empirical tests in several environments
      (with other applications, heavy context switching, virtualization...),
      that they consistently reach better performance with a specific
      predefined dql.min_limit value with no noticeable latency impact.
      Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
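      The numbers in the CAN-over-USB example can be worked through in a short sketch. Both functions below are hypothetical illustrations and not kernel code. The clamp is a simplified model that assumes the queue limit is never allowed to fall below min_limit, which is the effect the commit message relies on.

```c
/* How many PDUs of pdu_bytes fit into one transport packet. */
unsigned int toy_frames_per_packet(unsigned int packet_bytes,
				   unsigned int pdu_bytes)
{
	return pdu_bytes ? packet_bytes / pdu_bytes : 0;
}

/* Simplified model: keep a computed byte limit within [min, max]. */
unsigned int toy_clamp_limit(unsigned int computed,
			     unsigned int min_limit,
			     unsigned int max_limit)
{
	if (computed < min_limit)
		return min_limit;
	if (computed > max_limit)
		return max_limit;
	return computed;
}
```

      With 16-byte CAN frames and 512-byte USB packets, 32 frames fit into one packet. If the algorithm would otherwise settle on a limit of 16 bytes during a small burst, a min_limit of 512 keeps enough bytes in flight to fill a whole USB packet.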
    • ice: Enable FDIR Configure for AVF · 1f7ea1cd
      Qi Zhang committed
      The virtual channel is going to be extended to support FDIR and
      RSS configuration from AVF. New data structures and opcodes will be
      added; this patch enables the FDIR part.
      
      To support the above advanced AVF features, we need to figure out
      what kind of data structure should be passed from VF to PF to describe
      an FDIR rule or RSS config rule. The common part of the requirement is
      that we need a data structure to represent the input set selection of a
      rule's hash key.
      
      An input set selection is a group of fields selected from one or more
      network protocol layers that together identify a specific flow.
      For example, select the dst IP address from an IPv4 header combined with
      the dst port from the TCP header as the input set for an IPv4/TCP flow.
      
      The patch adds a new data structure, virtchnl_proto_hdrs, to abstract
      a group of network protocol headers, which is composed of layers of
      network protocol headers (virtchnl_proto_hdr).
      
      A protocol header contains a 32-bit mask (field_selector) to describe
      which fields are selected as input sets, as well as a header type
      (enum virtchnl_proto_hdr_type). Each bit is mapped to a field in
      enum virtchnl_proto_hdr_field guided by its header type.
      
      +------------+-----------+------------------------------+
      |            | Proto Hdr | Header Type A                |
      |            |           +------------------------------+
      |            |           | BIT 31 | ... | BIT 1 | BIT 0 |
      |            |-----------+------------------------------+
      |Proto Hdrs  | Proto Hdr | Header Type B                |
      |            |           +------------------------------+
      |            |           | BIT 31 | ... | BIT 1 | BIT 0 |
      |            |-----------+------------------------------+
      |            | Proto Hdr | Header Type C                |
      |            |           +------------------------------+
      |            |           | BIT 31 | ... | BIT 1 | BIT 0 |
      |            |-----------+------------------------------+
      |            |    ....                                  |
      +-------------------------------------------------------+
      
      All fields in enum virtchnl_proto_hdr_field are grouped by header type,
      and the value of the first field of a header type is always 32-aligned.
      
      enum proto_hdr_type {
              header_type_A = 0,
              header_type_B = 1,
              ....
      };
      
      enum proto_hdr_field {
              /* header type A */
              header_A_field_0 = 0,
              header_A_field_1 = 1,
              header_A_field_2 = 2,
              header_A_field_3 = 3,
      
              /* header type B */
              header_B_field_0 = 32, /* = header_type_B << 5 */
              header_B_field_1 = 33,
              header_B_field_2 = 34,
              header_B_field_3 = 35,
              ....
      };
      
      So we have:
      proto_hdr_type = proto_hdr_field / 32
      bit offset = proto_hdr_field % 32
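      The two formulas can be checked with a few lines of C. This is a sketch of the arithmetic only. The toy_ function names are made up for illustration and are not the virtchnl macros.

```c
#include <stdint.h>

/* Each header type owns a block of 32 consecutive field values. */
#define TOY_FIELDS_PER_HDR 32u

/* proto_hdr_type = proto_hdr_field / 32 */
unsigned int toy_field_to_hdr_type(unsigned int field)
{
	return field / TOY_FIELDS_PER_HDR;
}

/* bit offset = proto_hdr_field % 32 */
unsigned int toy_field_to_bit(unsigned int field)
{
	return field % TOY_FIELDS_PER_HDR;
}

/* First field value of a header type: hdr_type << 5, i.e. hdr_type * 32. */
unsigned int toy_first_field(unsigned int hdr_type)
{
	return hdr_type << 5;
}

/* Set the bit for one field in a 32-bit field selector mask. */
uint32_t toy_select_field(uint32_t selector, unsigned int field)
{
	return selector | (UINT32_C(1) << toy_field_to_bit(field));
}
```

      For example, field value 33 belongs to header type 1 and maps to bit 1 of that header's selector, so selecting it on an empty mask yields 0x2.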
      
      To simplify operations on the protocol headers, a couple of helper
      macros are added. For example, to select src IP and dst port as the
      input set for an IPv4/UDP flow, we have:
      struct virtchnl_proto_hdr hdr[2];
      
      VIRTCHNL_SET_PROTO_HDR_TYPE(&hdr[0], IPV4)
      VIRTCHNL_ADD_PROTO_HDR_FIELD(&hdr[0], IPV4, SRC)
      
      VIRTCHNL_SET_PROTO_HDR_TYPE(&hdr[1], UDP)
      VIRTCHNL_ADD_PROTO_HDR_FIELD(&hdr[1], UDP, DST)
      
      The byte array is used to store the protocol header of a training packet.
      The byte array must be in network byte order.
      
      The patch adds virtual channel support for iAVF FDIR add/validate/delete
      filter. iAVF FDIR is Flow Director for Intel Adaptive Virtual Function,
      which can direct Ethernet packets to the queues of the Network Interface
      Card. The add/delete command adds or deletes one rule per virtual
      channel message, while the validate command only verifies whether the
      rule is valid, without any other operations.
      
      To add or delete one rule, the driver needs to configure TCAM and Profile,
      build training packets which contain the input set value, and send
      the training packets through the FDIR Tx queue. In addition, the driver
      needs to manage the software context to avoid adding duplicate rules,
      deleting non-existent rules, input set conflicts and other invalid cases.
      
      NOTE:
      Supported patterns/actions and their parse functions are not included in
      this patch; they will be added in a separate one.
      Signed-off-by: Jeff Guo <jia.guo@intel.com>
      Signed-off-by: Yahui Cao <yahui.cao@intel.com>
      Signed-off-by: Simei Su <simei.su@intel.com>
      Signed-off-by: Beilei Xing <beilei.xing@intel.com>
      Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
      Tested-by: Chen Bo <BoX.C.Chen@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  4. 20 Mar, 2021 1 commit
    • net: add CONFIG_PCPU_DEV_REFCNT · 919067cc
      Eric Dumazet committed
      I was working on a syzbot issue, claiming one device could not be
      dismantled because its refcount was -1
      
      unregister_netdevice: waiting for sit0 to become free. Usage count = -1
      
      It would be nice if syzbot could trigger a warning at the time
      this reference count became negative.
      
      This patch adds the CONFIG_PCPU_DEV_REFCNT option, which defaults
      to per-cpu variables (as before this patch) on SMP builds.
      
      v2: free_dev label in alloc_netdev_mqs() is moved to avoid
          a compiler warning (-Wunused-label), as reported
          by kernel test robot <lkp@intel.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 19 Mar, 2021 4 commits
  6. 18 Mar, 2021 1 commit
  7. 17 Mar, 2021 6 commits
  8. 16 Mar, 2021 6 commits