1. 21 Sep 2020: 6 commits
    • net: dsa: refuse configuration in prepare phase of dsa_port_vlan_filtering() · 707ec383
      Committed by Vladimir Oltean
      The current logic beats me a little bit. The comment that "bridge skips
      -EOPNOTSUPP, so skip the prepare phase" was introduced in commit
      fb2dabad ("net: dsa: support VLAN filtering switchdev attr").
      
      I'm not sure:
      (a) ok, the bridge skips -EOPNOTSUPP, but, so what, where are we
          returning -EOPNOTSUPP?
      (b) even if we are, and I'm just not seeing it, what is the causality
          relationship between the bridge skipping -EOPNOTSUPP and DSA
          skipping the prepare phase, and just returning zero?
      
      One thing is certain beyond doubt though, and that is that DSA currently
      refuses VLAN filtering from the "commit" phase instead of "prepare", and
      that this is not a good thing:
      
      ip link add br0 type bridge
      ip link add br1 type bridge vlan_filtering 1
      ip link set swp2 master br0
      ip link set swp3 master br1
      [ 3790.379389] 001: sja1105 spi0.1: VLAN filtering is a global setting
      [ 3790.379399] 001: ------------[ cut here ]------------
      [ 3790.379403] 001: WARNING: CPU: 1 PID: 515 at net/switchdev/switchdev.c:157 switchdev_port_attr_set_now+0x9c/0xa4
      [ 3790.379420] 001: swp3: Commit of attribute (id=6) failed.
      [ 3790.379533] 001: [<c11ff588>] (switchdev_port_attr_set_now) from [<c11b62e4>] (nbp_vlan_init+0x84/0x148)
      [ 3790.379544] 001: [<c11b62e4>] (nbp_vlan_init) from [<c11a2ff0>] (br_add_if+0x514/0x670)
      [ 3790.379554] 001: [<c11a2ff0>] (br_add_if) from [<c1031b5c>] (do_setlink+0x38c/0xab0)
      [ 3790.379565] 001: [<c1031b5c>] (do_setlink) from [<c1036fe8>] (__rtnl_newlink+0x44c/0x748)
      [ 3790.379573] 001: [<c1036fe8>] (__rtnl_newlink) from [<c1037328>] (rtnl_newlink+0x44/0x60)
      [ 3790.379580] 001: [<c1037328>] (rtnl_newlink) from [<c10315fc>] (rtnetlink_rcv_msg+0x124/0x2f8)
      [ 3790.379590] 001: [<c10315fc>] (rtnetlink_rcv_msg) from [<c10926b8>] (netlink_rcv_skb+0xb8/0x110)
      [ 3790.379806] 001: ---[ end trace 0000000000000002 ]---
      [ 3790.379819] 001: sja1105 spi0.1 swp3: failed to initialize vlan filtering on this port
      
      So move the current logic that may fail (except for
      ds->ops->port_vlan_filtering, which is way harder) into the prepare
      stage of the switchdev transaction.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
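The prepare/commit split can be illustrated with a small userspace model. This is a sketch under assumptions, not the switchdev or DSA API: `struct sketch_switch`, `port_vlan_filtering()` and the phase enum are hypothetical, and VLAN filtering is modeled as one setting shared by all bridged ports, as the sja1105 message above suggests. Every check that can fail runs in the prepare phase, so the commit phase only applies a setting that was already validated.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical two-phase transaction, loosely modeled on switchdev's
 * prepare/commit split. Not the kernel API. */
enum sketch_phase { SKETCH_PREPARE, SKETCH_COMMIT };

struct sketch_switch {
	int num_ports;
	bool bridged[8];        /* port is a member of some bridge */
	bool port_filtering[8]; /* per-port view of VLAN filtering */
	bool global_filtering;  /* hardware setting shared by all ports */
};

/* Returns 0 on success or a negative errno. Validation happens only in
 * the prepare phase; commit assumes prepare already succeeded. */
static int port_vlan_filtering(struct sketch_switch *sw, int port,
			       bool enable, enum sketch_phase phase)
{
	if (phase == SKETCH_PREPARE) {
		/* Filtering is global: refuse if another bridged port
		 * would be left with a different setting. */
		for (int i = 0; i < sw->num_ports; i++) {
			if (i == port || !sw->bridged[i])
				continue;
			if (sw->port_filtering[i] != enable)
				return -EINVAL;
		}
		return 0;
	}

	/* Commit phase: apply only, nothing here can fail. */
	sw->port_filtering[port] = enable;
	sw->global_filtering = enable;
	return 0;
}
```

With swp2 already in a VLAN-unaware bridge, asking to enable filtering on swp3 now fails in the prepare phase, before anything is committed.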
    • net: dsa: convert denying bridge VLAN with existing 8021q upper to PRECHANGEUPPER · 1ce39f0e
      Committed by Vladimir Oltean
      This is checking for the following order of operations, and makes sure
      to deny that configuration:
      
      ip link add link swp2 name swp2.100 type vlan id 100
      ip link add br0 type bridge vlan_filtering 1
      ip link set swp2 master br0
      bridge vlan add dev swp2 vid 100
      
      Instead of using vlan_for_each(), which looks at the VLAN filters
      installed with vlan_vid_add(), just track the 8021q uppers. This has the
      advantage of freeing up the vlan_vid_add() call for actual VLAN
      filtering.
      
      There is another change in this patch: the check is moved from switch.c
      into slave.c. I don't think it makes sense to run this 8021q upper
      check for each switch port that gets notified of that VLAN addition
      (these include DSA links and CPU ports, and we know those can't have
      8021q uppers because they don't have a net_device registered for them),
      so just do it in slave.c, for that one slave interface.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
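A minimal sketch of the per-port check described above, assuming each slave port keeps a list of its own 8021q uppers. The types, `sketch_check_8021q_uppers()` and the choice of `-EBUSY` are illustrative assumptions, not the actual DSA code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical record of one 802.1Q upper (e.g. swp2.100) of a port. */
struct sketch_vlan_upper {
	const char *name;
	unsigned short vid;
};

/* Hypothetical slave port that tracks its own 8021q uppers. */
struct sketch_port {
	const char *name;
	const struct sketch_vlan_upper *uppers;
	size_t num_uppers;
};

/* Called when the bridge wants to install @vid on @port. Only this
 * port's own uppers are consulted; CPU and DSA ports have no net_device,
 * so they cannot have uppers and never reach this check. */
static int sketch_check_8021q_uppers(const struct sketch_port *port,
				     unsigned short vid)
{
	for (size_t i = 0; i < port->num_uppers; i++)
		if (port->uppers[i].vid == vid)
			return -EBUSY; /* errno choice is an assumption */
	return 0;
}
```

For the sequence above, swp2.100 is recorded as an upper of swp2, so `bridge vlan add dev swp2 vid 100` would be refused while any other VID passes.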
    • net: dsa: convert check for 802.1Q upper when bridged into PRECHANGEUPPER · 2b138406
      Committed by Vladimir Oltean
      DSA tries to prevent having a VLAN added by a bridge and by an 802.1Q
      upper at the same time. It does that by checking the VID in
      .ndo_vlan_rx_add_vid(), since that's something that the 8021q module
      calls, via vlan_vid_add(). When a VLAN matches in both subsystems, this
      check returns -EBUSY.
      
      However the vlan_vid_add() function isn't specific to the 8021q module
      in any way at all. It is simply the kernel's way to tell an interface to
      add a VLAN to its RX filter and not drop that VLAN. So there's no reason
      to return -EBUSY when somebody tries to call vlan_vid_add() for a VLAN
      that was installed by the bridge. The proper behavior is to accept that
      configuration.
      
      So what's wrong is how DSA checks that it has an 8021q upper. It should
      look at the actual uppers for that, not just assume that the 8021q
      module was somewhere in the call stack of .ndo_vlan_rx_add_vid().
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: rename dsa_slave_upper_vlan_check to something more suggestive · eb46e8da
      Committed by Vladimir Oltean
      We'll be adding a new check in the PRECHANGEUPPER notifier, where we'll
      need to check some VLAN uppers. It is hard to do that when there is
      already a function named dsa_slave_upper_vlan_check. So rename this one.
      
      Not to mention that this function probably shouldn't have started with
      "dsa_slave_" in the first place, since the struct net_device argument
      isn't a DSA slave, but an 8021q upper of one.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: deny enslaving 802.1Q upper to VLAN-aware bridge from PRECHANGEUPPER · 83501299
      Committed by Vladimir Oltean
      There doesn't seem to be any strong technical reason for doing it this
      way, but we'll be adding more checks for invalid upper device
      configurations, and it will be easier to have them all grouped under
      PRECHANGEUPPER.
      
      Tested that it still works:
      ip link set br0 type bridge vlan_filtering 1
      ip link add link swp2 name swp2.100 type vlan id 100
      ip link set swp2.100 master br0
      [   20.321312] br0: port 5(swp2.100) entered blocking state
      [   20.326711] br0: port 5(swp2.100) entered disabled state
      Error: dsa_core: Cannot enslave VLAN device into VLAN aware bridge.
      [   20.346549] br0: port 5(swp2.100) entered blocking state
      [   20.351957] br0: port 5(swp2.100) entered disabled state
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
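The PRECHANGEUPPER check can be modeled as a pure function over the event. All names here are hypothetical, `extack_msg` merely stands in for the netlink extended ack, and returning `-EINVAL` is an assumption; only the error string is taken from the test output above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a PRECHANGEUPPER event. */
struct sketch_upper_info {
	bool dev_is_vlan_of_dsa_port; /* e.g. swp2.100 */
	bool upper_is_bridge;
	bool bridge_vlan_filtering;
	bool linking;                 /* joining (true) or leaving (false) */
	const char *extack_msg;       /* stands in for netlink extack */
};

/* Runs before the upper is linked, so a refusal leaves no state behind. */
static int sketch_prechangeupper(struct sketch_upper_info *info)
{
	if (!info->linking || !info->upper_is_bridge ||
	    !info->dev_is_vlan_of_dsa_port)
		return 0;

	if (info->bridge_vlan_filtering) {
		info->extack_msg =
			"Cannot enslave VLAN device into VLAN aware bridge";
		return -EINVAL; /* errno choice is an assumption */
	}
	return 0;
}
```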
    • net: remove unnecessary NULL checking in napi_consume_skb() · 1f14bd99
      Committed by Yunsheng Lin
      When budget is non-zero, skb_unref() has already handled the
      NULL checking.
      
      When budget is zero, dev_consume_skb_any() handles the NULL check
      either in __dev_kfree_skb_irq() or in dev_kfree_skb(), which also
      ultimately calls skb_unref().
      
      So remove the unnecessary checking in napi_consume_skb().
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
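A toy model of why the caller-side NULL check is redundant on the non-zero budget path: the unref helper already returns early on NULL. `struct toy_skb`, `toy_skb_unref()` and `toy_consume_skb()` are hypothetical stand-ins, loosely modeled on the behavior the message attributes to skb_unref(); the zero-budget path is not modeled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct sk_buff: only a reference count. */
struct toy_skb {
	int users;
};

/* Returns true when the caller held the last reference. NULL is
 * tolerated here, which is what makes a caller-side check redundant. */
static bool toy_skb_unref(struct toy_skb *skb)
{
	if (!skb)
		return false;
	if (skb->users == 1)
		return true;
	skb->users--;
	return false;
}

static int toy_freed; /* counts "freed" buffers for illustration */

/* Before the change, this would have started with "if (!skb) return;". */
static void toy_consume_skb(struct toy_skb *skb)
{
	if (!toy_skb_unref(skb))
		return;
	skb->users = 0;
	toy_freed++;
}
```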
  2. 20 Sep 2020: 2 commits
  3. 19 Sep 2020: 22 commits
    • net: dsa: wire up devlink info get · 0f06b855
      Committed by Andrew Lunn
      Allow the DSA drivers to implement the devlink call to get info,
      e.g. driver name, firmware version, ASIC ID, etc.
      
      v2:
      Combine declaration and the assignment on a single line.
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: Add devlink regions support to DSA · 97c82c23
      Committed by Andrew Lunn
      Allow DSA drivers to make use of devlink regions, via simple wrappers.
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: Add helper to convert from devlink to ds · ccc3e6b0
      Committed by Andrew Lunn
      Given a devlink instance, return the DSA switch it is associated with.
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
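The helper amounts to reading a switch pointer back out of the devlink instance's private data. In this sketch all types and functions are hypothetical stand-ins, and keeping the `ds` pointer in a small private struct is an assumption about how the association is stored.

```c
/* Minimal stand-ins for the kernel structures; all names hypothetical. */
struct toy_dsa_switch {
	int index;
};

/* Driver-private area attached to a devlink instance. */
struct toy_dsa_devlink_priv {
	struct toy_dsa_switch *ds;
};

/* Toy devlink instance; here the private area is a separate pointer
 * rather than memory allocated right after the structure. */
struct toy_devlink {
	void *priv;
};

static void *toy_devlink_priv(struct toy_devlink *dl)
{
	return dl->priv;
}

/* Given a devlink instance, return the switch it is associated with. */
static struct toy_dsa_switch *toy_devlink_to_ds(struct toy_devlink *dl)
{
	struct toy_dsa_devlink_priv *dl_priv = toy_devlink_priv(dl);

	return dl_priv->ds;
}
```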
    • net: devlink: region: Pass the region ops to the snapshot function · d4602a9f
      Committed by Andrew Lunn
      Pass the region to be snapshotted to the function performing the
      snapshot. This allows one function to operate on numerous regions.
      
      v4:
      Add missing kerneldoc for ICE
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
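A sketch of how passing the region ops lets one snapshot function serve several regions. The callback signature is simplified (the real devlink callback takes more arguments), and the region names, sizes and fill pattern are made up for illustration.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct toy_region_ops;

/* Snapshot callback that now receives the ops of the region to capture. */
typedef int (*toy_snapshot_fn)(const struct toy_region_ops *ops,
			       unsigned char **data);

struct toy_region_ops {
	const char *name;
	size_t size;
	toy_snapshot_fn snapshot;
};

/* One shared snapshot function: it uses ops->name and ops->size to
 * decide what to capture, instead of one function per region. */
static int toy_snapshot(const struct toy_region_ops *ops,
			unsigned char **data)
{
	unsigned char *buf = malloc(ops->size);

	if (!buf)
		return -ENOMEM;
	/* Fill with a per-region pattern to show which region was read. */
	memset(buf, strcmp(ops->name, "global") == 0 ? 0xAA : 0x55,
	       ops->size);
	*data = buf;
	return 0;
}

static const struct toy_region_ops toy_regions[] = {
	{ "global", 16, toy_snapshot },
	{ "atu", 32, toy_snapshot },
};
```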
    • net: tipc: Supply missing udp_media.h include file · 5f3666e8
      Committed by Wang Hai
      If the header file containing a function's prototype isn't included by
      the source file containing the associated function, the build system
      complains of missing prototypes.
      
      Fixes the following W=1 kernel build warning(s):
      
      net/tipc/udp_media.c:446:5: warning: no previous prototype for ‘tipc_udp_nl_dump_remoteip’ [-Wmissing-prototypes]
      net/tipc/udp_media.c:532:5: warning: no previous prototype for ‘tipc_udp_nl_add_bearer_data’ [-Wmissing-prototypes]
      net/tipc/udp_media.c:614:5: warning: no previous prototype for ‘tipc_udp_nl_bearer_add’ [-Wmissing-prototypes]
      Signed-off-by: Wang Hai <wanghai38@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: Remove unused macro CF_SERVER · 7eae7f72
      Committed by YueHaibing
      It is no longer used, so remove it.
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • l2tp: fix up inconsistent rx/tx statistics · f52e4b27
      Committed by Tom Parkin
      Historically L2TP core statistics count the L2TP header in the
      per-session and per-tunnel byte counts tracked for transmission and
      receipt.
      
      Now that l2tp_xmit_skb updates tx stats, it is necessary for
      l2tp_xmit_core to pass out the length of the transmitted packet so that
      the statistics can be updated correctly.
      Signed-off-by: Tom Parkin <tparkin@katalix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
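A sketch of the split described above: the core transmit function reports the transmitted length (header included) through an out-parameter, and its caller updates the counters. All names are hypothetical, and the core function here never fails.

```c
#include <stddef.h>

/* Hypothetical per-session transmit counters. */
struct sketch_stats {
	unsigned long tx_packets;
	unsigned long tx_bytes;
	unsigned long tx_errors;
};

/* Core transmit: "sends" hdr_len + payload_len bytes and reports that
 * length through *len, so the header is counted, as L2TP historically
 * does. Always succeeds in this sketch. */
static int sketch_xmit_core(size_t payload_len, size_t hdr_len, size_t *len)
{
	*len = hdr_len + payload_len;
	return 0;
}

/* The caller owns the statistics update and uses the length passed out. */
static int sketch_xmit_skb(struct sketch_stats *st, size_t payload_len,
			   size_t hdr_len)
{
	size_t len = 0;
	int ret = sketch_xmit_core(payload_len, hdr_len, &len);

	if (ret) {
		st->tx_errors++;
		return ret;
	}
	st->tx_packets++;
	st->tx_bytes += len;
	return 0;
}
```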
    • net: openswitch: reuse the helper variable to improve the code readability · 7b066d17
      Committed by Zeng Tao
      In the function ovs_ct_limit_exit, there is already a helper variable
      which can be reused to improve readability, so this patch reuses it.
      Signed-off-by: Zeng Tao <prime.zeng@hisilicon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: delete duplicated words · 4bbd026c
      Committed by Randy Dunlap
      Drop repeated words in net/bridge/.
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Roopa Prabhu <roopa@nvidia.com>
      Cc: Nikolay Aleksandrov <nikolay@nvidia.com>
      Cc: bridge@lists.linux-foundation.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: atm: delete duplicated words · 563f63e3
      Committed by Randy Dunlap
      Drop repeated words in net/atm/.
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Chas Williams <3chas3@gmail.com>
      Cc: linux-atm-general@lists.sourceforge.net
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: tipc: delete duplicated words · 60462191
      Committed by Randy Dunlap
      Drop repeated words in net/tipc/.
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Jon Maloy <jmaloy@redhat.com>
      Cc: Ying Xue <ying.xue@windriver.com>
      Cc: tipc-discussion@lists.sourceforge.net
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bluetooth: delete duplicated words · bb6d6895
      Committed by Randy Dunlap
      Drop repeated words in net/bluetooth/.
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv6: delete duplicated words · 634a63e7
      Committed by Randy Dunlap
      Drop repeated words in net/ipv6/.
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rds: delete duplicated words · d936b1d5
      Committed by Randy Dunlap
      Drop repeated words in net/rds/.
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Cc: linux-rdma@vger.kernel.org
      Cc: rds-devel@oss.oracle.com
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: core: delete duplicated words · 4250b75b
      Committed by Randy Dunlap
      Drop repeated words in net/core/.
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: add automatic rekeying for encryption key · 23700da2
      Committed by Tuong Lien
      Rekeying is required for security, since a key becomes less secure the
      longer it is used. Also, a key will be detached when its nonce value (or
      seqno ...) is exhausted. We now make the rekeying process automatic and
      configurable by the user.
      
      Basically, TIPC will at a specific interval generate a new key by using
      the kernel 'Random Number Generator' cipher, then attach it as the node
      TX key and securely distribute it to the other nodes in the cluster as
      RX keys (the key exchange). The automatic key switching will then take
      over and make the new key active shortly. Afterwards, the traffic from
      this node will be encrypted with the new session key. The same can
      happen in peer nodes, but not necessarily at the same time.
      
      For simplicity, the automatically generated key will be initiated as a
      per-node key. It is not too hard to also support cluster key rekeying
      (e.g. a given node would generate a unique cluster key and update the
      others in the cluster...), but that doesn't bring much benefit, while a
      per-node key is even more secure.
      
      We also enable the user to force a rekeying or change the rekeying
      interval via netlink; the new 'set key' command option
      'TIPC_NLA_NODE_REKEYING' is added for these purposes as follows:
      - A value >= 1 will be set as the rekeying interval (in minutes);
      - A value of 0 will disable the rekeying;
      - A value of 'TIPC_REKEYING_NOW' (~0) will force an immediate rekeying.
      
      The default rekeying interval is (60 * 24) minutes, i.e. once a day.
      There is no restriction on the value, but the user shouldn't set it too
      small or too large, which would result in "ineffective" rekeying (that's
      OK for testing, though).
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
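The value semantics of the rekeying attribute, as listed above, can be captured in a small parser. The enum and function names are hypothetical; treating the attribute as a 32-bit value, so that `~0` equals `UINT32_MAX`, is an assumption.

```c
#include <stdint.h>

/* Documented "rekey now" value: ~0, assumed here to be a 32-bit value. */
#define SKETCH_REKEYING_NOW UINT32_MAX

/* Default interval from the commit message: one day, in minutes. */
#define SKETCH_REKEYING_DEFAULT_MIN (60U * 24U)

enum sketch_rekey_action {
	SKETCH_REKEY_DISABLE,
	SKETCH_REKEY_NOW,
	SKETCH_REKEY_SET_INTERVAL,
};

/* Interprets a TIPC_NLA_NODE_REKEYING-style value. On
 * SKETCH_REKEY_SET_INTERVAL, *interval_min receives the minutes. */
static enum sketch_rekey_action sketch_parse_rekeying(uint32_t val,
						      uint32_t *interval_min)
{
	if (val == SKETCH_REKEYING_NOW)
		return SKETCH_REKEY_NOW;
	if (val == 0)
		return SKETCH_REKEY_DISABLE;
	*interval_min = val;
	return SKETCH_REKEY_SET_INTERVAL;
}
```

Checking the "now" value before the generic interval case matters: otherwise `~0` would be accepted as an interval of about 8000 years.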
    • tipc: add automatic session key exchange · 1ef6f7c9
      Committed by Tuong Lien
      With support from the master key option in the previous commit, it
      becomes easy to make frequent updates/exchanges of session keys between
      authenticated cluster nodes.
      
      Basically, there are two situations where the key exchange will take
      place:
      
      - When a new node joins the cluster (with the master key), it will need
        to get its peer's TX key, so that it is able to decrypt further
        messages from that peer.
      
      - When a new session key is generated (either manually by the user or
        by the later automatic rekeying feature), the key will be distributed
        to all peer nodes in the cluster.
      
      A key to be exchanged is encapsulated in the data part of a
      'MSG_CRYPTO/KEY_DISTR_MSG' TIPC v2 message, then xmit-ed as usual and
      encrypted using the master key before being sent out. Upon receipt, the
      message is decrypted in the same way as regular messages, and the key
      is then attached as the sender's RX key in the receiver node.
      
      In this way, the key exchange gets its reliability from the link layer,
      and its security, integrity and authenticity from the crypto layer.
      
      Also, forward security can easily be achieved by the user actively
      changing the master key, but this should not be required very often.
      
      The key exchange feature is independent of the presence of a master
      key. Note, however, that the master key is still needed for new nodes to
      be able to join the cluster. Key exchange is also optional and can be
      turned on/off via the sysctl 'net/tipc/key_exchange_enabled' [default 1:
      enabled].
      
      Backward compatibility is guaranteed because, on nodes without master
      key support, any key exchange message using the master key (i.e.
      tx_key = 0) will simply be discarded at the message validation step. In
      other words, the key exchange feature will be automatically disabled
      for those nodes.
      
      v2: fix the "implicit declaration of function 'tipc_crypto_key_flush'"
      error in node.c. The function only exists when built with the TIPC
      "CONFIG_TIPC_CRYPTO" option.
      
      v3: use 'info->extack' for a message emitted due to netlink operations
      instead (per David's comment).
      Reported-by: kernel test robot <lkp@intel.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: introduce encryption master key · daef1ee3
      Committed by Tuong Lien
      In addition to the supported cluster & per-node encryption keys for the
      en/decryption of TIPC messages, we now introduce an option for the user
      to set a cluster key as a 'master key', which is simply a symmetric key
      like the former but with a longer life cycle. It has two purposes:
      
      - Authentication of new member nodes in the cluster. New nodes, having
        no knowledge of the current session keys in the cluster, will still be
        able to join the cluster as long as they know the master key. This is
        because all neighbor discovery (LINK_CONFIG) messages must be
        encrypted with this key.
      
      - Encryption of session encryption keys during automatic exchange and
        update of those. This is a feature we will introduce in a later commit
        in this series.
      
      We insert the new key into the currently unused slot 0 in the key array
      and start using it immediately once the user has set it.
      After joining, a node that only knows the master key should be able to
      communicate fully with existing nodes in the cluster, although those
      nodes may have their own session keys activated (i.e. not the master
      one). To support this, we define a 'grace period', starting from the
      time a node itself reports having no RX keys, during which the existing
      nodes use the master key for encryption instead. The grace period can
      be extended but will automatically stop after e.g. 5 seconds without a
      new report. This is also the basis for the later key exchange feature,
      since without the master key the new node would be unable to decrypt
      anything.
      
      For the user to set a master key, we define a new netlink flag,
      'TIPC_NLA_NODE_KEY_MASTER', which can be added to the current 'set key'
      netlink command to mark the key being set as a master key.
      
      Above all, the traditional cluster/per-node key mechanism is guaranteed
      to work when the user chooses not to use this master key option. This is
      also compatible with legacy nodes that do not support the feature.
      
      The master key can also be updated without any interruption of cluster
      connectivity, but if this is needed, it has to be coordinated and set by
      the user.
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
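The grace-period rule can be sketched from an existing node's point of view. Time is passed in explicitly in milliseconds, the 5-second length is the example value from the message, and the structure and function names are hypothetical.

```c
#include <stdbool.h>

/* Grace period length; the commit gives "e.g. 5 seconds". */
#define SKETCH_GRACE_MS 5000UL

/* What an existing node remembers about one peer. */
struct sketch_peer {
	bool in_grace;               /* peer reported having no RX keys */
	unsigned long last_report_ms;
};

/* Peer reports it has no RX keys: start or extend the grace period. */
static void sketch_peer_no_rx_keys(struct sketch_peer *p,
				   unsigned long now_ms)
{
	p->in_grace = true;
	p->last_report_ms = now_ms;
}

/* Should traffic towards this peer be encrypted with the master key?
 * The grace period ends on its own once no new report has arrived for
 * SKETCH_GRACE_MS. */
static bool sketch_use_master_key(struct sketch_peer *p,
				  unsigned long now_ms)
{
	if (p->in_grace && now_ms - p->last_report_ms >= SKETCH_GRACE_MS)
		p->in_grace = false;
	return p->in_grace;
}
```

A report at t = 1 s keeps the master key in use until t = 6 s; a second report at t = 5 s extends that to t = 10 s.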
    • tipc: optimize key switching time and logic · f779bf79
      Committed by Tuong Lien
      We reduce the time it takes for a pending TX key to become active, as
      well as for a passive RX key to be freed, which generally helps speed up
      key switching. It is not expected to be too fast, but it should not be
      too slow either. The key handling logic is also simplified: a pending RX
      key will be removed automatically if it is found not to be working after
      a number of attempts; the probing for a pending TX key is now carried on
      a specific message user ('LINK_PROTOCOL' or 'LINK_CONFIG'), which is more
      efficient than using a timer on broadcast messages; the timer is
      reserved for later use as needed.
      
      The kernel logs, i.e. the 'pr***()' calls, are now made as clear as
      possible to the user. Some prints are added, removed or changed to the
      debug level. The 'TIPC_CRYPTO_DEBUG' definition is removed, and
      'pr_debug()' is used instead, which is much more helpful at runtime.
      
      Besides, we also optimize the code in some other places as a
      preparation for later commits.
      
      v2: silence more kernel logs, and use 'info->extack' for a message
      emitted due to netlink operations instead (per David's comments).
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: collect flash notify params into a struct · 6700acc5
      Committed by Shannon Nelson
      The dev flash status notify function parameter lists are getting
      rather long, so add a struct to be filled and passed rather than
      continuously changing the function signatures.
      Signed-off-by: Shannon Nelson <snelson@pensando.io>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
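A sketch of the refactoring pattern: callers fill one parameter struct instead of passing an ever-growing argument list. The struct fields and the formatting function are hypothetical and need not match the kernel's actual struct.

```c
#include <stdio.h>

/* Hypothetical parameter bundle for one flash status notification. */
struct sketch_flash_notify {
	const char *status_msg;
	const char *component;   /* may be NULL */
	unsigned long done;
	unsigned long total;
};

/* One stable signature: adding a field to the struct does not change
 * this function's parameter list or any of its callers. */
static int sketch_flash_status_format(char *buf, size_t len,
				      const struct sketch_flash_notify *p)
{
	return snprintf(buf, len, "[%s] %s %lu/%lu",
			p->component ? p->component : "-",
			p->status_msg, p->done, p->total);
}
```

Designated initializers keep call sites readable, e.g. `{ .status_msg = "Flashing", .component = "fw", .done = 10, .total = 100 }`, and any field a caller omits is zeroed.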
    • devlink: add timeout information to status_notify · f92970c6
      Committed by Shannon Nelson
      Add a timeout element to the DEVLINK_CMD_FLASH_UPDATE_STATUS
      netlink message for use by a userland utility to show that
      a particular firmware flash activity may take a long but
      bounded time to finish.  Also add a handy helper for drivers
      to make use of the new timeout value.
      
      UI usage hints:
       - if non-zero, add timeout display to the end of the status line
             [component] status_msg  ( Xm Ys : Am Bs )
         using the timeout value for Am Bs and updating the Xm Ys
         every second
       - if the timeout expires while awaiting the next update,
         display something like
             [component] status_msg  ( timeout reached : Am Bs )
       - if new status notify messages are received, remove
         the timeout and start over
      Signed-off-by: Shannon Nelson <snelson@pensando.io>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
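The status-line format from the UI hints can be sketched as a formatting helper. Elapsed time and timeout are assumed to be whole seconds, the function name is hypothetical, and the spacing follows the examples in the hints.

```c
#include <stdio.h>

/* Builds the status line suggested by the UI hints. elapsed_s and
 * timeout_s are in seconds; timeout_s == 0 means no timeout is shown. */
static int sketch_status_line(char *buf, size_t len, const char *component,
			      const char *msg, unsigned long elapsed_s,
			      unsigned long timeout_s)
{
	if (!timeout_s)
		return snprintf(buf, len, "[%s] %s", component, msg);

	if (elapsed_s >= timeout_s)
		return snprintf(buf, len,
				"[%s] %s  ( timeout reached : %lum %lus )",
				component, msg, timeout_s / 60,
				timeout_s % 60);

	/* Xm Ys is the elapsed time, Am Bs the timeout. */
	return snprintf(buf, len, "[%s] %s  ( %lum %lus : %lum %lus )",
			component, msg, elapsed_s / 60, elapsed_s % 60,
			timeout_s / 60, timeout_s % 60);
}
```

A UI would call this once per second while waiting and drop back to the zero-timeout form whenever a new status message arrives.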
    • net: use exponential backoff in netdev_wait_allrefs · 0e4be9e5
      Committed by Francesco Ruggeri
      The combination of aca_free_rcu, introduced in commit 2384d025
      ("net/ipv6: Add anycast addresses to a global hashtable"), and
      fib6_info_destroy_rcu, introduced in commit 9b0a8da8 ("net/ipv6:
      respect rcu grace period before freeing fib6_info"), can result in
      an extra rcu grace period being needed when deleting an interface,
      with the result that netdev_wait_allrefs ends up hitting the msleep(250),
      which is considerably longer than the required grace period.
      This can result in long delays when deleting a large number of interfaces,
      and it can be observed with this script:
      
      ns=dummy-ns
      NIFS=100
      
      ip netns add $ns
      ip netns exec $ns ip link set lo up
      ip netns exec $ns sysctl net.ipv6.conf.default.disable_ipv6=0
      ip netns exec $ns sysctl net.ipv6.conf.default.forwarding=1
      
      for ((i=0; i<$NIFS; i++))
      do
              if=eth$i
              ip netns exec $ns ip link add $if type dummy
              ip netns exec $ns ip link set $if up
              ip netns exec $ns ip -6 addr add 2021:$i::1/120 dev $if
      done
      
      for ((i=0; i<$NIFS; i++))
      do
              if=eth$i
              ip netns exec $ns ip link del $if
      done
      
      ip netns del $ns
      
      Instead of using a fixed msleep(250), this patch tries an extra
      rcu_barrier() followed by an exponential backoff.
      
      Time with this patch on a 5.4 kernel:
      
      real	0m7.704s
      user	0m0.385s
      sys	0m1.230s
      
      Time without this patch:
      
      real    0m31.522s
      user    0m0.438s
      sys     0m1.156s
      
      v2: use exponential backoff instead of trying to wake up
          netdev_wait_allrefs.
      v3: preserve reverse christmas tree ordering of local variables
      v4: try an extra rcu_barrier before the backoff, plus some
          cosmetic changes.
      Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
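The scheduling change can be sketched in userspace with a fake device and a simulated sleep, so no real time passes. The 1 ms starting delay and the 250 ms cap are assumptions (the cap matches the old fixed msleep(250)), the names are hypothetical, and the extra rcu_barrier() the patch performs first is not modeled.

```c
#define SKETCH_WAIT_MIN_MS 1U   /* assumed starting delay */
#define SKETCH_WAIT_MAX_MS 250U /* cap, matching the old fixed sleep */

/* Fake device whose last reference goes away after a number of polls. */
struct fake_dev {
	unsigned int polls_until_free;
	unsigned int slept_ms; /* total simulated sleep */
	unsigned int sleeps;   /* number of sleep calls */
};

/* Returns nonzero while references are still held. */
static int fake_refcnt(struct fake_dev *d)
{
	if (d->polls_until_free == 0)
		return 0;
	d->polls_until_free--;
	return 1;
}

static void fake_sleep_ms(struct fake_dev *d, unsigned int ms)
{
	d->slept_ms += ms;
	d->sleeps++;
}

/* Poll until the refcount drops to zero, doubling the sleep each round
 * up to SKETCH_WAIT_MAX_MS instead of always sleeping the maximum. */
static void wait_allrefs(struct fake_dev *d)
{
	unsigned int wait = SKETCH_WAIT_MIN_MS;

	while (fake_refcnt(d) != 0) {
		fake_sleep_ms(d, wait);
		wait = wait * 2 < SKETCH_WAIT_MAX_MS ? wait * 2
						     : SKETCH_WAIT_MAX_MS;
	}
}
```

If the references go away after three polls, this loop sleeps 1 + 2 + 4 = 7 ms in total, whereas a fixed 250 ms sleep per poll would have cost 750 ms. Long waits still converge to the old 250 ms polling interval.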
  4. 18 Sep 2020: 6 commits
  5. 17 Sep 2020: 2 commits
  6. 16 Sep 2020: 2 commits