1. 19 Sep, 2020 1 commit
    • devlink: add timeout information to status_notify · f92970c6
      Shannon Nelson authored
      Add a timeout element to the DEVLINK_CMD_FLASH_UPDATE_STATUS
      netlink message for use by a userland utility to show that
      a particular firmware flash activity may take a long but
      bounded time to finish.  Also add a handy helper for drivers
      to make use of the new timeout value.
      
      UI usage hints:
       - if non-zero, add timeout display to the end of the status line
       	[component] status_msg  ( Xm Ys : Am Bs )
           using the timeout value for Am Bs and updating the Xm Ys
           every second
       - if the timeout expires while awaiting the next update,
         display something like
       	[component] status_msg  ( timeout reached : Am Bs )
       - if new status notify messages are received, remove
         the timeout and start over
      Signed-off-by: Shannon Nelson <snelson@pensando.io>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
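The UI usage hints in the devlink status_notify commit above can be sketched in a few lines of Python. This is only an illustration of the suggested display format, not code from any userland tool; the function names and the `elapsed` argument (seconds since the last status notification) are assumptions made for the example.

```python
def fmt_mins_secs(seconds: int) -> str:
    """Render a duration in seconds as 'Xm Ys'."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}m {secs}s"


def status_line(component: str, status_msg: str, elapsed: int, timeout: int) -> str:
    """Build one status line following the UI usage hints.

    elapsed: seconds since the last status notification (hypothetical input)
    timeout: timeout value carried by the notification, 0 if none was given
    """
    line = f"[{component}] {status_msg}"
    if timeout == 0:
        # No timeout supplied: show the plain status line.
        return line
    if elapsed >= timeout:
        # Timeout expired while awaiting the next update.
        return f"{line}  ( timeout reached : {fmt_mins_secs(timeout)} )"
    # Timeout still running: show elapsed time against the bound.
    return f"{line}  ( {fmt_mins_secs(elapsed)} : {fmt_mins_secs(timeout)} )"
```

A caller would re-render the line once per second with an increasing `elapsed`, and reset `elapsed` to zero whenever a new status notification arrives.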
  2. 18 Sep, 2020 4 commits
  3. 16 Sep, 2020 8 commits
  4. 15 Sep, 2020 3 commits
  5. 14 Sep, 2020 1 commit
  6. 12 Sep, 2020 4 commits
    • net: phy: mchp: Add support for LAN8814 QUAD PHY · 1623ad8e
      Divya Koppera authored
      LAN8814 is a low-power, quad-port triple-speed (10BASE-T/100BASE-TX/1000BASE-T)
      Ethernet physical layer transceiver (PHY). It supports transmission and
      reception of data on standard CAT-5, as well as CAT-5e and CAT-6, unshielded
      twisted pair (UTP) cables.
      
      LAN8814 supports industry-standard QSGMII (Quad Serial Gigabit Media
      Independent Interface) and Q-USGMII (Quad Universal Serial Gigabit Media
      Independent Interface) providing chip-to-chip connection to four Gigabit
      Ethernet MACs using a single serialized link (differential pair) in each
      direction.
      
      The LAN8814 SKU supports high-accuracy timestamping functions to
      support IEEE-1588 solutions using Microchip Ethernet switches, as well as
      customer solutions based on SoCs and FPGAs.
      
      The LAN8804 SKU has the same features as the LAN8814 SKU, except that it
      does not support 1588, SyncE, or Q-USGMII with PCH/MCH.

      This adds support for 10BASE-T, 100BASE-TX, and 1000BASE-T, and for the
      QSGMII link with the MAC.
      
      Signed-off-by: Divya Koppera <divya.koppera@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: tag_8021q: add a context structure · 5899ee36
      Vladimir Oltean authored
      While working on another tag_8021q driver implementation, some things
      became apparent:
      
      - It is not mandatory for a DSA driver to offload the tag_8021q VLANs by
        using the VLAN table per se. For example, it can add custom TCAM rules
        that simply encapsulate RX traffic, and redirect & decapsulate rules
        for TX traffic. For such a driver, it makes no sense to receive the
        tag_8021q configuration through the same callback as it receives the
        VLAN configuration from the bridge and the 8021q modules.
      
      - Currently, sja1105 (the only tag_8021q user) sets a
        priv->expect_dsa_8021q variable to distinguish between the bridge
        calling, and tag_8021q calling. That can be improved, to say the
        least.
      
      - The crosschip bridging operations are, in fact, stateful already. The
        list of crosschip_links must be kept by the caller and passed to the
        relevant tag_8021q functions.
      
      So it would be nice if the tag_8021q configuration was more
      self-contained. This patch attempts to do that.
      
      Create a struct dsa_8021q_context which encapsulates a struct
      dsa_switch, and has 2 function pointers for adding and deleting a VLAN.
      These will replace the previous channel to the driver, which was through
      the .port_vlan_add and .port_vlan_del callbacks of dsa_switch_ops.
      
      Also put the list of crosschip_links into this dsa_8021q_context.
      Drivers that don't support cross-chip bridging can simply skip
      initializing this list, as long as they don't call any cross-chip function.
      
      The sja1105_vlan_add and sja1105_vlan_del functions are refactored into
      a smaller sja1105_vlan_add_one, which now has 2 entry points:
      - sja1105_vlan_add, from struct dsa_switch_ops
      - sja1105_dsa_8021q_vlan_add, from the tag_8021q ops
      But even this change is fairly trivial. It just reflects the fact that
      for sja1105, the VLANs from these 2 channels end up in the same hardware
      table. However that is not necessarily true in the general sense (and
      that's the reason for making this change).
      
      The rest of the patch is mostly plain refactoring of "ds" -> "ctx". The
      dsa_8021q_context structure needs to be propagated because adding a VLAN
      is now done through the ops function pointers inside of it.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
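The idea behind the context structure in the commit above can be modeled outside the kernel. The sketch below is a toy Python stand-in, not the kernel's struct dsa_8021q_context: the field names, the `setup_tagging` helper, and the per-port VID numbering are all assumptions chosen for illustration. It shows only that tag_8021q VLANs reach the driver through two callbacks held by the context rather than through the bridge VLAN callbacks.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass
class Dsa8021qContext:
    """Toy stand-in for a tag_8021q context (field names are assumptions)."""
    switch_name: str                        # stands in for the struct dsa_switch
    vlan_add: Callable[[int, int], int]     # (port, vid) -> 0 on success
    vlan_del: Callable[[int, int], int]     # (port, vid) -> 0 on success
    crosschip_links: List[tuple] = field(default_factory=list)


def setup_tagging(ctx: Dsa8021qContext, ports: List[int], base_vid: int) -> int:
    """Install one VLAN per port through the context's ops (hypothetical helper)."""
    for port in ports:
        rc = ctx.vlan_add(port, base_vid + port)  # VID scheme is illustrative only
        if rc:
            return rc
    return 0


# A hypothetical driver that keeps its tag_8021q VLANs in a private table.
hw_table: Set[Tuple[int, int]] = set()


def demo_vlan_add(port: int, vid: int) -> int:
    hw_table.add((port, vid))
    return 0


def demo_vlan_del(port: int, vid: int) -> int:
    hw_table.discard((port, vid))
    return 0


ctx = Dsa8021qContext("demo-switch", demo_vlan_add, demo_vlan_del)
setup_tagging(ctx, ports=[0, 1, 2], base_vid=1024)
```

A driver without cross-chip bridging support simply leaves `crosschip_links` empty here, which mirrors the commit's remark that such drivers need not initialize the list.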
    • net: dsa: tag_8021q: setup tagging via a single function call · 7e092af2
      Vladimir Oltean authored
      There is no point in calling dsa_port_setup_8021q_tagging for each
      individual port. Additionally, it will become more difficult to do that
      once tag_8021q has a context structure (next patch). So refactor this
      now.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: tag_8021q: include missing refcount.h · 568a36a6
      Vladimir Oltean authored
      The previous assumption was that the caller would already have this
      header file included.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 11 Sep, 2020 7 commits
    • tcp: reflect tos value received in SYN to the socket · ac8f1710
      Wei Wang authored
      This commit adds a new TCP feature to reflect the tos value received in
      SYN, and send it out on the SYN-ACK, and eventually set the tos value of
      the established socket with this reflected tos value. This provides a
      way to set the traffic class/QoS level for all traffic in the same
      connection to be the same as the incoming SYN request. It could be
      useful in data centers to provide equivalent QoS according to the
      incoming request.
      This feature is guarded by /proc/sys/net/ipv4/tcp_reflect_tos, and is by
      default turned off.
      Signed-off-by: Wei Wang <weiwan@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip: pass tos into ip_build_and_send_pkt() · de033b7d
      Wei Wang authored
      This commit adds tos as a new parameter to ip_build_and_send_pkt(),
      which will be used in a later commit.
      This is a pure restructuring with no functional change.
      Signed-off-by: Wei Wang <weiwan@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: record received TOS value in the request socket · e9b12edc
      Wei Wang authored
      A new field is added to the request sock to record the TOS value
      received on the listening socket during 3WHS:
      When not under syn flood, it is recording the TOS value sent in SYN.
      When under syn flood, it is recording the TOS value sent in the ACK.
      This is a preparation patch in order to do TOS reflection in the later
      commit.
      Signed-off-by: Wei Wang <weiwan@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: manage napi add/del idempotence explicitly · 4d092dd2
      Jakub Kicinski authored
      To RCUify napi->dev_list we need to replace list_del_init()
      with list_del_rcu(). There is no _init() version for RCU for
      obvious reasons. Up until now netif_napi_del() was idempotent,
      so to make sure it remains so, add a bit which is set when
      NAPI is listed and cleared when it is removed. Since we don't
      expect multiple calls to netif_napi_add() to be correct,
      add a warning on that side.
      
      Now that napi_hash_add / napi_hash_del are only called by
      napi_add / del we can actually steal its bit. We just need
      to make sure hash node is initialized correctly.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
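The "listed" bit described in the commit above can be illustrated with a small user-space model. This is a Python toy, not kernel code: `ToyNapi`, `ToyNetdev`, and the choice to ignore a repeated add after warning are assumptions made for the sketch. It shows how an explicit flag keeps deletion idempotent once the list removal itself no longer re-initializes the node.

```python
import warnings


class ToyNapi:
    """Toy NAPI instance carrying an explicit 'listed' flag."""

    def __init__(self, name: str):
        self.name = name
        self.listed = False


class ToyNetdev:
    """Toy device holding a list of NAPI instances."""

    def __init__(self):
        self.napi_list = []

    def napi_add(self, napi: ToyNapi) -> None:
        if napi.listed:
            # Adding twice is not expected to be correct, so warn and bail out.
            warnings.warn(f"{napi.name} added twice")
            return
        napi.listed = True
        self.napi_list.append(napi)

    def napi_del(self, napi: ToyNapi) -> bool:
        if not napi.listed:
            # Already removed: deleting again is a harmless no-op.
            return False
        napi.listed = False
        self.napi_list.remove(napi)
        return True
```

Calling `napi_del()` twice on the same instance removes it once and then does nothing, while a second `napi_add()` only emits a warning.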
    • net: remove napi_hash_del() from driver-facing API · 5198d545
      Jakub Kicinski authored
      We allow drivers to call napi_hash_del() before calling
      netif_napi_del() to batch RCU grace periods. This makes
      the API asymmetric and leaks internal implementation details.
      Soon we will want the grace period to protect more than just
      the NAPI hash table.
      
      Restructure the API and have drivers call a new function -
      __netif_napi_del() if they want to take care of RCU waits.
      
      Note that only core was checking the return status from
      napi_hash_del() so the new helper does not report if the
      NAPI was actually deleted.
      
      Some notes on driver oddness:
       - veth observed the grace period before calling netif_napi_del()
         but that should not matter
       - myri10ge observed normal RCU flavor
       - bnx2x and enic did not actually observe the grace period
         (unless they did so implicitly)
       - virtio_net and enic only unhashed Rx NAPIs
      
      The last two points seem to indicate that the calls to
      napi_hash_del() were a leftover rather than an optimization.
      Regardless, it's easy enough to correct them.
      
      This patch may introduce extra synchronize_net() calls for
      interfaces which set NAPI_STATE_NO_BUSY_POLL and depend on
      free_netdev() to call netif_napi_del(). This seems inevitable
      since we want to use RCU for netpoll dev->napi_list traversal,
      and almost no drivers set IFF_DISABLE_NETPOLL.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipmr: Add high byte of VIF ID to igmpmsg · c8715a8e
      Paul Davey authored
      Use the unused3 byte in struct igmpmsg to hold the high 8 bits of the
      VIF ID.
      
      If using more than 255 IPv4 multicast interfaces it is necessary to have
      access to a VIF ID for cache reports that is wider than 8 bits. The VIF
      ID present in the igmpmsg reports sent to mroute_sk was only 8 bits wide
      in the igmpmsg header. Adding the high 8 bits of the 16-bit VIF ID in
      the unused byte allows use of more than 255 IPv4 multicast interfaces.
      Signed-off-by: Paul Davey <paul.davey@alliedtelesis.co.nz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
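The byte arithmetic behind the igmpmsg change above is simple enough to show directly. The helpers below are an illustrative Python sketch with hypothetical names; they only demonstrate how a 16-bit VIF ID splits into the existing low byte and the formerly unused high byte, and how a receiver would recombine them.

```python
from typing import Tuple


def split_vif_id(vif_id: int) -> Tuple[int, int]:
    """Split a 16-bit VIF ID into (low byte, high byte)."""
    if not 0 <= vif_id <= 0xFFFF:
        raise ValueError("VIF ID must fit in 16 bits")
    return vif_id & 0xFF, vif_id >> 8


def join_vif_id(low: int, high: int) -> int:
    """Rebuild the 16-bit VIF ID from the two bytes carried in the message."""
    return (high << 8) | low
```

A receiver that ignores the high byte would still see the correct ID for interfaces 0 to 255, since the high byte is zero for those values.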
    • ipmr: Add route table ID to netlink cache reports · 501cb008
      Paul Davey authored
      Insert the multicast route table ID as a Netlink attribute to Netlink
      cache report notifications.
      
      When multiple route tables are in use it is necessary to have a way to
      determine which route table a given cache report belongs to when
      receiving the cache report.
      Signed-off-by: Paul Davey <paul.davey@alliedtelesis.co.nz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 10 Sep, 2020 4 commits
    • devlink: Introduce controller number · 3a2d9588
      Parav Pandit authored
      A devlink port may belong to a controller that consists of PCI devices.
      A devlink instance holds ports of two types of controllers.
      (1) controller discovered on same system where eswitch resides
      This is the case where PCI PF/VF of a controller and devlink eswitch
      instance both are located on a single system.
      (2) controller located on external host system.
      This is the case where a controller is located in one system and its
      devlink eswitch ports are located in a different system.
      
      When a devlink eswitch instance serves the devlink ports of both
      controllers together, PCI PF/VF numbers may overlap.
      Due to this a unique phys_port_name cannot be constructed.
      
      For example, in the system below, controller-0 and controller-1 each have
      a PCI PF pf0 whose eswitch ports can be present in controller-0.
      This results in phys_port_name "pf0" for both.
      A similar problem exists for VFs and upcoming subfunctions.
      
      An example view of two controller systems:
      
                   ---------------------------------------------------------
                   |                                                       |
                   |           --------- ---------         ------- ------- |
      -----------  |           | vf(s) | | sf(s) |         |vf(s)| |sf(s)| |
      | server  |  | -------   ----/---- ---/----- ------- ---/--- ---/--- |
      | pci rc  |=== | pf0 |______/________/       | pf1 |___/_______/     |
      | connect |  | -------                       -------                 |
      -----------  |     | controller_num=1 (no eswitch)                   |
                   ------|--------------------------------------------------
                   (internal wire)
                         |
                   ---------------------------------------------------------
                   | devlink eswitch ports and reps                        |
                   | ----------------------------------------------------- |
                   | |ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 |ctrl-0 | |
                   | |pf0    | pf0vfN | pf0sfN | pf1    | pf1vfN |pf1sfN | |
                   | ----------------------------------------------------- |
                   | |ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 |ctrl-1 | |
                   | |pf0    | pf0vfN | pf0sfN | pf1    | pf1vfN |pf1sfN | |
                   | ----------------------------------------------------- |
                   |                                                       |
                   |                                                       |
                   |           --------- ---------         ------- ------- |
                   |           | vf(s) | | sf(s) |         |vf(s)| |sf(s)| |
                   | -------   ----/---- ---/----- ------- ---/--- ---/--- |
                   | | pf0 |______/________/       | pf1 |___/_______/     |
                   | -------                       -------                 |
                   |                                                       |
                   |  local controller_num=0 (eswitch)                     |
                   ---------------------------------------------------------
      
      An example devlink port for external controller with controller
      number = 1 for a VF 1 of PF 0:
      
      $ devlink port show pci/0000:06:00.0/2
      pci/0000:06:00.0/2: type eth netdev ens2f0pf0vf1 flavour pcivf controller 1 pfnum 0 vfnum 1 external true splittable false
        function:
          hw_addr 00:00:00:00:00:00
      
      $ devlink port show pci/0000:06:00.0/2 -jp
      {
          "port": {
              "pci/0000:06:00.0/2": {
                  "type": "eth",
                  "netdev": "ens2f0pf0vf1",
                  "flavour": "pcivf",
                  "controller": 1,
                  "pfnum": 0,
                  "vfnum": 1,
                  "external": true,
                  "splittable": false,
                  "function": {
                      "hw_addr": "00:00:00:00:00:00"
                  }
              }
          }
      }
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
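The naming collision that the controller number resolves can be seen by keying ports on their attributes. The Python sketch below parses the JSON output quoted in the commit above; the helper names are hypothetical, and treating a missing "controller" attribute as controller 0 is an assumption made for this example rather than documented devlink behavior.

```python
import json

# JSON output quoted from the commit message above.
SAMPLE = """
{
    "port": {
        "pci/0000:06:00.0/2": {
            "type": "eth",
            "netdev": "ens2f0pf0vf1",
            "flavour": "pcivf",
            "controller": 1,
            "pfnum": 0,
            "vfnum": 1,
            "external": true,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00"
            }
        }
    }
}
"""


def port_identity(attrs: dict) -> tuple:
    """Return (controller, pfnum, vfnum) for one port's attributes.

    A missing controller is treated as 0 here (an assumption of this sketch).
    """
    return (attrs.get("controller", 0), attrs["pfnum"], attrs.get("vfnum"))


def index_ports(doc: dict) -> dict:
    """Map each port identity to its devlink port handle."""
    return {port_identity(attrs): handle for handle, attrs in doc["port"].items()}


ports = index_ports(json.loads(SAMPLE))
```

With the controller number in the key, VF 1 of PF 0 on controller 1 no longer collides with VF 1 of PF 0 on the local controller, which would otherwise share the pair (pfnum 0, vfnum 1).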
    • devlink: Introduce external controller flag · 05b595e9
      Parav Pandit authored
      A devlink eswitch port may represent PCI PF/VF ports of a controller.
      
      A controller is either located on the same system, or it is an external
      controller located in the host where such a NIC is plugged in.

      Add the ability for a driver to specify whether a port is for an external
      controller.

      Use this flag in the mlx5_core driver.

      An example of an external controller port, with VF1 of PF0 belonging to
      controller 1:
      
      $ devlink port show pci/0000:06:00.0/2
      pci/0000:06:00.0/2: type eth netdev ens2f0pf0vf1 flavour pcivf pfnum 0 vfnum 1 external true splittable false
        function:
          hw_addr 00:00:00:00:00:00
      $ devlink port show pci/0000:06:00.0/2 -jp
      {
          "port": {
              "pci/0000:06:00.0/2": {
                  "type": "eth",
                  "netdev": "ens2f0pf0vf1",
                  "flavour": "pcivf",
                  "pfnum": 0,
                  "vfnum": 1,
                  "external": true,
                  "splittable": false,
                  "function": {
                      "hw_addr": "00:00:00:00:00:00"
                  }
              }
          }
      }
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Move structure comments outside of structure · ff03e63a
      Parav Pandit authored
      To add more fields to the PCI PF and VF port attributes, follow standard
      structure comment format.
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Add comment block for missing port attributes · 2efbe6ae
      Parav Pandit authored
      Add comment block for physical, PF and VF port attributes.
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 09 Sep, 2020 1 commit
    • rxrpc: Rewrite the client connection manager · 245500d8
      David Howells authored
      Rewrite the rxrpc client connection manager so that it can support multiple
      connections for a given security key to a peer.  The following changes are
      made:
      
       (1) For each open socket, the code currently maintains an rbtree with the
           connections placed into it, keyed by communications parameters.  This
           is tricky to maintain as connections can be culled from the tree or
           replaced within it.  Connections can require replacement for a number
           of reasons, e.g. their IDs span too great a range for the IDR data
           type to represent efficiently, the call ID numbers on that conn would
           overflow or the conn got aborted.
      
           This is changed so that there's now a connection bundle object placed
           in the tree, keyed on the same parameters.  The bundle, however, does
           not need to be replaced.
      
       (2) An rxrpc_bundle object can now manage the available channels for a set
           of parallel connections.  The lock that manages this is moved there
           from the rxrpc_connection struct (channel_lock).
      
       (3) There's a dummy bundle for all incoming connections to share so that
           they have a channel_lock too.  It might be better to give each
           incoming connection its own bundle.  This bundle is not needed to
           manage which channels incoming calls are made on because that's
           solely at the whim of the client.
      
       (4) The restrictions on how many client connections are around are
           removed.  Instead, a previous patch limits the number of client calls
           that can be allocated.  Ordinarily, client connections are reaped
           after 2 minutes on the idle queue, but when more than a certain number
           of connections are in existence, the reaper starts reaping them after
           2s of idleness instead to get the numbers back down.
      
           It could also be made such that new call allocations are forced to
           wait until the number of outstanding connections subsides.
      Signed-off-by: David Howells <dhowells@redhat.com>
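The reaping policy in point (4) of the rxrpc commit above can be sketched as a small decision function. This Python model is illustrative only: the function names and the `reap_threshold` parameter are hypothetical, since the commit message says only "more than a certain number" of connections. The two expiry values (2 minutes and 2 seconds) come from the commit text.

```python
from typing import List, Tuple

NORMAL_IDLE_EXPIRY_S = 2 * 60  # ordinary case: reap after 2 minutes idle
FAST_IDLE_EXPIRY_S = 2         # too many connections: reap after 2s idle


def idle_expiry(nr_conns: int, reap_threshold: int) -> int:
    """Pick the idle expiry in seconds; reap_threshold is a hypothetical knob."""
    if nr_conns > reap_threshold:
        return FAST_IDLE_EXPIRY_S
    return NORMAL_IDLE_EXPIRY_S


def reapable(idle_conns: List[Tuple[str, int]], now: int,
             nr_conns: int, reap_threshold: int) -> List[str]:
    """Return IDs of idle connections whose idle time has reached the expiry.

    idle_conns holds (connection id, time it went idle) pairs, in seconds.
    """
    expiry = idle_expiry(nr_conns, reap_threshold)
    return [cid for cid, idle_since in idle_conns if now - idle_since >= expiry]
```

With three connections idle for 100, 5, and 1 seconds, nothing is reaped under the normal expiry, while the first two become reapable as soon as the connection count exceeds the threshold.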
  10. 08 Sep, 2020 3 commits
    • netfilter: nf_tables: add userdata support for nft_object · b131c964
      Jose M. Guisado Gomez authored
      Enables storing userdata for nft_object. Initially this will store an
      optional comment but can be extended in the future as needed.
      
      Adds new attribute NFTA_OBJ_USERDATA to nft_object.
      Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net: tighten the definition of interface statistics · 0db0c34c
      Jakub Kicinski authored
      This patch is born out of an investigation into which IEEE statistics
      correspond to which struct rtnl_link_stats64 members. Turns out that
      there seems to be reasonable consensus on the matter, among many drivers.
      To save others the time (and it took more time than I'm comfortable
      admitting) I'm adding comments referring to IEEE attributes to
      struct rtnl_link_stats64.
      
      Up until now we had two forms of documentation for stats - in
      Documentation/ABI/testing/sysfs-class-net-statistics and the comments
      on struct rtnl_link_stats64 itself. While the former is very cautious
      in defining the expected behavior, the latter feels quite dated and
      may not be easy for a modern-day driver author to understand
      (e.g. rx_over_errors). At the same time modern systems are far more
      complex and once-obvious definitions have lost their clarity. For example,
      does rx_packets count at the MAC layer (aFramesReceivedOK)?
      Packets processed correctly by hardware? Received by the driver?
      Or maybe received by the stack?
      
      I tried to clarify the expectations, further clarifications from
      others are very welcome.
      
      The part hardest to untangle is rx_over_errors vs rx_fifo_errors
      vs rx_missed_errors. After much deliberation I concluded that for
      modern HW only two of the counters will make sense. The distinction
      between internal FIFO overflow and packets dropped due to back-pressure
      from the host is likely too implementation (driver and device) specific
      to expose in the standard stats.
      
      Now - which two of those counters we select to use is anyone's pick:
      
      sysfs documentation suggests rx_over_errors counts packets which
      did not fit into buffers due to MTU being too small, which I reused.
      There don't seem to be many modern drivers using it (well, CAN drivers
      seem to love this statistic).
      
      Of the remaining two I picked rx_missed_errors to report device drops.
      bnxt reports it and it's folded into "drop"s in procfs (while
      rx_fifo_errors is an error, and modern devices usually receive the frame
      OK, they just can't admit it into the pipeline).
      
      Of the drivers I looked at only AMD Lance-like and NS8390-like use all
      three of these counters. rx_missed_errors counts missed frames,
      rx_over_errors counts overflow events, and rx_fifo_errors counts frames
      which were truncated because they didn't fit into buffers. This suggests
      that rx_fifo_errors may be the correct stat for truncated packets, but
      I'd think a FIFO stat counting truncated packets would be very confusing
      to a modern reader.
      
      v2:
       - add driver developer notes about ethtool stat count and reset
       - replace Ethernet with IEEE 802.3 to better indicate source of attrs
       - mention byte counters don't count FCS
       - clarify RX counter is from device to host
       - drop "sightly" from sysfs paragraph
       - add examples of ethtool stats
       - s/incoming/received/ s/incoming/transmitted/
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: bridge: mcast: add support for src list and filter mode dumping · 5205e919
      Nikolay Aleksandrov authored
      Support per port group src list (address and timer) and filter mode
      dumping. Protected by either multicast_lock or rcu.
      
      v3: add IPv6 support
      v2: require RCU or multicast_lock to traverse src groups
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  11. 06 Sep, 2020 1 commit
  12. 05 Sep, 2020 2 commits
  13. 04 Sep, 2020 1 commit