1. 25 2月, 2020 2 次提交
  2. 15 1月, 2020 1 次提交
  3. 13 12月, 2019 1 次提交
  4. 02 10月, 2019 1 次提交
  5. 05 7月, 2019 1 次提交
    • V
      bonding: add an option to specify a delay between peer notifications · 07a4ddec
      Vincent Bernat 提交于
      Currently, gratuitous ARP/ND packets are sent every `miimon'
      milliseconds. This commit allows a user to specify a custom delay
      through a new option, `peer_notif_delay'.
      
      Like for `updelay' and `downdelay', this delay should be a multiple of
      `miimon' to avoid managing an additional work queue. The configuration
      logic is copied from `updelay' and `downdelay'. However, the default
      value cannot be set using a module parameter: Netlink or sysfs should
      be used to configure this feature.
      
      When setting `miimon' to 100 and `peer_notif_delay' to 500, we can
      observe the 500 ms delay is respected:
      
          20:30:19.354693 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28
          20:30:19.874892 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28
          20:30:20.394919 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28
          20:30:20.914963 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28
      
      In bond_mii_monitor(), I have tried to keep the lock logic readable.
      The change is due to the fact we cannot rely on a notification to
      lower the value of `bond->send_peer_notif' as `NETDEV_NOTIFY_PEERS' is
      only triggered once every N times, while we need to decrement the
      counter each time.
      
      iproute2 also needs to be updated to be able to specify this new
      attribute through `ip link'.
      Signed-off-by: NVincent Bernat <vincent@bernat.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      07a4ddec
  6. 19 6月, 2019 1 次提交
    • D
      ipoib: show VF broadcast address · 75345f88
      Denis Kirjanov 提交于
      in IPoIB case we can't see a VF broadcast address for but
      can see for PF
      
      Before:
      11: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast
      state UP mode DEFAULT group default qlen 256
          link/infiniband
      80:00:00:66:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a4:3e:7c brd
      00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
          vf 0 MAC 14:80:00:00:66:fe, spoof checking off, link-state disable,
      trust off, query_rss off
      ...
      
      After:
      11: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast
      state UP mode DEFAULT group default qlen 256
          link/infiniband
      80:00:00:66:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a4:3e:7c brd
      00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
          vf 0     link/infiniband
      80:00:00:66:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a4:3e:7c brd
      00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff, spoof
      checking off, link-state disable, trust off, query_rss off
      
      v1->v2: add the IFLA_VF_BROADCAST constant
      v2->v3: put IFLA_VF_BROADCAST at the end
      to avoid KABI breakage and set NLA_REJECT
      dev_setlink
      Signed-off-by: NDenis Kirjanov <kda@linux-powerpc.org>
      Acked-by: NDoug Ledford <dledford@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      75345f88
  7. 23 1月, 2019 1 次提交
    • N
      bonding: add support for xstats and export 3ad stats · a258aeac
      Nikolay Aleksandrov 提交于
      This patch adds support for extended statistics (xstats) call to the
      bonding. The first user would be the 3ad code which counts the following
      events:
       - LACPDU Rx/Tx
       - LACPDU unknown type Rx
       - LACPDU illegal Rx
       - Marker Rx/Tx
       - Marker response Rx/Tx
       - Marker unknown type Rx
      
      All of these are exported via netlink as separate attributes to be
      easily extensible as we plan to add more in the future.
      Similar to how the bridge and other xstats exports, the structure
      inside is:
       [ IFLA_STATS_LINK_XSTATS ]
         -> [ LINK_XSTATS_TYPE_BOND ]
              -> [ BOND_XSTATS_3AD ]
                   -> [ 3ad stats attributes ]
      
      With this structure it's easy to add more stat types later.
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a258aeac
  8. 28 11月, 2018 1 次提交
    • N
      net: bridge: add support for user-controlled bool options · a428afe8
      Nikolay Aleksandrov 提交于
      We have been adding many new bridge options, a big number of which are
      boolean but still take up netlink attribute ids and waste space in the skb.
      Recently we discussed learning from link-local packets[1] and decided
      yet another new boolean option will be needed, thus introducing this API
      to save some bridge nl space.
      The API supports changing the value of multiple boolean options at once
      via the br_boolopt_multi struct which has an optmask (which options to
      set, bit per opt) and optval (options' new values). Future boolean
      options will only be added to the br_boolopt_id enum and then will have
      to be handled in br_boolopt_toggle/get. The API will automatically
      add the ability to change and export them via netlink, sysfs can use the
      single boolopt function versions to do the same. The behaviour with
      failing/succeeding is the same as with normal netlink option changing.
      
      If an option requires mapping to internal kernel flag or needs special
      configuration to be enabled then it should be handled in
      br_boolopt_toggle. It should also be able to retrieve an option's current
      state via br_boolopt_get.
      
      v2: WARN_ON() on unsupported option as that shouldn't be possible and
          also will help catch people who add new options without handling
          them for both set and get. Pass down extack so if an option desires
          it could set it on error and be more user-friendly.
      
      [1] https://www.spinics.net/lists/netdev/msg532698.htmlSigned-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a428afe8
  9. 09 11月, 2018 2 次提交
    • S
      geneve: Allow configuration of DF behaviour · a025fb5f
      Stefano Brivio 提交于
      draft-ietf-nvo3-geneve-08 says:
      
         It is strongly RECOMMENDED that Path MTU Discovery ([RFC1191],
         [RFC1981]) be used by setting the DF bit in the IP header when Geneve
         packets are transmitted over IPv4 (this is the default with IPv6).
      
      Now that ICMP error handling is working for GENEVE, we can comply with
      this recommendation.
      
      Make this configurable, though, to avoid breaking existing setups. By
      default, DF won't be set. It can be set or inherited from inner IPv4
      packets. If it's configured to be inherited and we are encapsulating IPv6,
      it will be set.
      
      This only applies to non-lwt tunnels: if an external control plane is
      used, tunnel key will still control the DF flag.
      
      v2:
      - DF behaviour configuration only applies for non-lwt tunnels, apply DF
        setting only if (!geneve->collect_md) in geneve_xmit_skb()
        (Stephen Hemminger)
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a025fb5f
    • S
      vxlan: Allow configuration of DF behaviour · b4d30697
      Stefano Brivio 提交于
      Allow users to set the IPv4 DF bit in outgoing packets, or to inherit its
      value from the IPv4 inner header. If the encapsulated protocol is IPv6 and
      DF is configured to be inherited, always set it.
      
      For IPv4, inheriting DF from the inner header was probably intended from
      the very beginning judging by the comment to vxlan_xmit(), but it wasn't
      actually implemented -- also because it would have done more harm than
      good, without handling for ICMP Fragmentation Needed messages.
      
      According to RFC 7348, "Path MTU discovery MAY be used". An expired RFC
      draft, draft-saum-nvo3-pmtud-over-vxlan-05, whose purpose was to describe
      PMTUD implementation, says that "is a MUST that Vxlan gateways [...]
      SHOULD set the DF-bit [...]", whatever that means.
      
      Given this background, the only sane option is probably to let the user
      decide, and keep the current behaviour as default.
      
      This only applies to non-lwt tunnels: if an external control plane is
      used, tunnel key will still control the DF flag.
      
      v2:
      - DF behaviour configuration only applies for non-lwt tunnels, move DF
        setting to if (!info) block in vxlan_xmit_one() (Stephen Hemminger)
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b4d30697
  10. 13 10月, 2018 1 次提交
    • N
      net: bridge: add support for per-port vlan stats · 9163a0fc
      Nikolay Aleksandrov 提交于
      This patch adds an option to have per-port vlan stats instead of the
      default global stats. The option can be set only when there are no port
      vlans in the bridge since we need to allocate the stats if it is set
      when vlans are being added to ports (and respectively free them
      when being deleted). Also bump RTNL_MAX_TYPE as the bridge is the
      largest user of options. The current stats design allows us to add
      these without any changes to the fast-path, it all comes down to
      the per-vlan stats pointer which, if this option is enabled, will
      be allocated for each port vlan instead of using the global bridge-wide
      one.
      
      CC: bridge@lists.linux-foundation.org
      CC: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9163a0fc
  11. 13 9月, 2018 1 次提交
  12. 06 9月, 2018 1 次提交
  13. 30 7月, 2018 1 次提交
  14. 24 7月, 2018 1 次提交
    • N
      net: bridge: add support for backup port · 2756f68c
      Nikolay Aleksandrov 提交于
      This patch adds a new port attribute - IFLA_BRPORT_BACKUP_PORT, which
      allows to set a backup port to be used for known unicast traffic if the
      port has gone carrier down. The backup pointer is rcu protected and set
      only under RTNL, a counter is maintained so when deleting a port we know
      how many other ports reference it as a backup and we remove it from all.
      Also the pointer is in the first cache line which is hot at the time of
      the check and thus in the common case we only add one more test.
      The backup port will be used only for the non-flooding case since
      it's a part of the bridge and the flooded packets will be forwarded to it
      anyway. To remove the forwarding just send a 0/non-existing backup port.
      This is used to avoid numerous scalability problems when using MLAG most
      notably if we have thousands of fdbs one would need to change all of them
      on port carrier going down which takes too long and causes a storm of fdb
      notifications (and again when the port comes back up). In a Multi-chassis
      Link Aggregation setup usually hosts are connected to two different
      switches which act as a single logical switch. Those switches usually have
      a control and backup link between them called peerlink which might be used
      for communication in case a host loses connectivity to one of them.
      We need a fast way to failover in case a host port goes down and currently
      none of the solutions (like bond) cannot fulfill the requirements because
      the participating ports are actually the "master" devices and must have the
      same peerlink as their backup interface and at the same time all of them
      must participate in the bridge device. As Roopa noted it's normal practice
      in routing called fast re-route where a precalculated backup path is used
      when the main one is down.
      Another use case of this is with EVPN, having a single vxlan device which
      is backup of every port. Due to the nature of master devices it's not
      currently possible to use one device as a backup for many and still have
      all of them participate in the bridge (which is master itself).
      More detailed information about MLAG is available at the link below.
      https://docs.cumulusnetworks.com/display/DOCS/Multi-Chassis+Link+Aggregation+-+MLAG
      
      Further explanation and a diagram by Roopa:
      Two switches acting in a MLAG pair are connected by the peerlink
      interface which is a bridge port.
      
      the config on one of the switches looks like the below. The other
      switch also has a similar config.
      eth0 is connected to one port on the server. And the server is
      connected to both switches.
      
      br0 -- team0---eth0
            |
            -- switch-peerlink
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2756f68c
  15. 14 7月, 2018 2 次提交
  16. 23 6月, 2018 1 次提交
  17. 26 5月, 2018 1 次提交
  18. 18 4月, 2018 1 次提交
  19. 23 3月, 2018 1 次提交
  20. 17 2月, 2018 1 次提交
  21. 30 1月, 2018 1 次提交
  22. 23 1月, 2018 1 次提交
  23. 09 1月, 2018 1 次提交
  24. 05 11月, 2017 1 次提交
    • J
      rtnetlink: use netnsid to query interface · 79e1ad14
      Jiri Benc 提交于
      Currently, when an application gets netnsid from the kernel (for example as
      the result of RTM_GETLINK call on one end of the veth pair), it's not much
      useful. There's no reliable way to get to the netns fd from the netnsid, nor
      does any kernel API accept netnsid.
      
      Extend the RTM_GETLINK call to also accept netnsid. It will operate on the
      netns with the given netnsid in such case. Of course, the calling process
      needs to have enough capabilities in the target name space; for now, require
      CAP_NET_ADMIN. This can be relaxed in the future.
      
      To signal to the calling process that the kernel understood the new
      IFLA_IF_NETNSID attribute in the query, it will include it in the response.
      This is needed to detect older kernels, as they will just ignore
      IFLA_IF_NETNSID and query in the current name space.
      
      This patch implemetns IFLA_IF_NETNSID only for get and dump. For set
      operations, this can be extended later.
      Signed-off-by: NJiri Benc <jbenc@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      79e1ad14
  25. 02 11月, 2017 1 次提交
    • G
      License cleanup: add SPDX license identifier to uapi header files with no license · 6f52b16c
      Greg Kroah-Hartman 提交于
      Many user space API headers are missing licensing information, which
      makes it hard for compliance tools to determine the correct license.
      
      By default are files without license information under the default
      license of the kernel, which is GPLV2.  Marking them GPLV2 would exclude
      them from being included in non GPLV2 code, which is obviously not
      intended. The user space API headers fall under the syscall exception
      which is in the kernels COPYING file:
      
         NOTE! This copyright does *not* cover user programs that use kernel
         services by normal system calls - this is merely considered normal use
         of the kernel, and does *not* fall under the heading of "derived work".
      
      otherwise syscall usage would not be possible.
      
      Update the files which contain no license information with an SPDX
      license identifier.  The chosen identifier is 'GPL-2.0 WITH
      Linux-syscall-note' which is the officially assigned identifier for the
      Linux syscall exception.  SPDX license identifiers are a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.  See the previous patch in this series for the
      methodology of how this patch was researched.
      Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6f52b16c
  26. 29 10月, 2017 2 次提交
    • M
      ipvlan: implement VEPA mode · fe89aa6b
      Mahesh Bandewar 提交于
      This is very similar to the Macvlan VEPA mode, however, there is some
      difference. IPvlan uses the mac-address of the lower device, so the VEPA
      mode has implications of ICMP-redirects for packets destined for its
      immediate neighbors sharing same master since the packets will have same
      source and dest mac. The external switch/router will send redirect msg.
      
      Having said that, this will be useful tool in terms of debugging
      since IPvlan will not switch packets within its slaves and rely completely
      on the external entity as intended in 802.1Qbg.
      Signed-off-by: NMahesh Bandewar <maheshb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fe89aa6b
    • M
      ipvlan: introduce 'private' attribute for all existing modes. · a190d04d
      Mahesh Bandewar 提交于
      IPvlan has always operated in bridge mode. However there are scenarios
      where each slave should be able to talk through the master device but
      not necessarily across each other. Think of an environment where each
      of a namespace is a private and independant customer. In this scenario
      the machine which is hosting these namespaces neither want to tell who
      their neighbor is nor the individual namespaces care to talk to neighbor
      on short-circuited network path.
      
      This patch implements the mode that is very similar to the 'private' mode
      in macvlan where individual slaves can send and receive traffic through
      the master device, just that they can not talk among slave devices.
      Signed-off-by: NMahesh Bandewar <maheshb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a190d04d
  27. 09 10月, 2017 1 次提交
  28. 05 10月, 2017 1 次提交
    • N
      dev: advertise the new nsid when the netns iface changes · 6621dd29
      Nicolas Dichtel 提交于
      x-netns interfaces are bound to two netns: the link netns and the upper
      netns. Usually, this kind of interfaces is created in the link netns and
      then moved to the upper netns. At the end, the interface is visible only
      in the upper netns. The link nsid is advertised via netlink in the upper
      netns, thus the user always knows where is the link part.
      
      There is no such mechanism in the link netns. When the interface is moved
      to another netns, the user cannot "follow" it.
      This patch adds a new netlink attribute which helps to follow an interface
      which moves to another netns. When the interface is unregistered, the new
      nsid is advertised. If the interface is a x-netns interface (ie
      rtnl_link_ops->get_link_net is defined), the nsid is allocated if needed.
      
      CC: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6621dd29
  29. 29 9月, 2017 1 次提交
    • N
      net: bridge: add per-port group_fwd_mask with less restrictions · 5af48b59
      Nikolay Aleksandrov 提交于
      We need to be able to transparently forward most link-local frames via
      tunnels (e.g. vxlan, qinq). Currently the bridge's group_fwd_mask has a
      mask which restricts the forwarding of STP and LACP, but we need to be able
      to forward these over tunnels and control that forwarding on a per-port
      basis thus add a new per-port group_fwd_mask option which only disallows
      mac pause frames to be forwarded (they're always dropped anyway).
      The patch does not change the current default situation - all of the others
      are still restricted unless configured for forwarding.
      We have successfully tested this patch with LACP and STP forwarding over
      VxLAN and qinq tunnels.
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5af48b59
  30. 24 6月, 2017 2 次提交
  31. 16 6月, 2017 1 次提交
  32. 28 5月, 2017 1 次提交
  33. 12 5月, 2017 2 次提交
    • D
      xdp: refine xdp api with regards to generic xdp · d67b9cd2
      Daniel Borkmann 提交于
      While working on the iproute2 generic XDP frontend, I noticed that
      as of right now it's possible to have native *and* generic XDP
      programs loaded both at the same time for the case when a driver
      supports native XDP.
      
      The intended model for generic XDP from b5cdae32 ("net: Generic
      XDP") is, however, that only one out of the two can be present at
      once which is also indicated as such in the XDP netlink dump part.
      The main rationale for generic XDP is to ease accessibility (in
      case a driver does not yet have XDP support) and to generically
      provide a semantical model as an example for driver developers
      wanting to add XDP support. The generic XDP option for an XDP
      aware driver can still be useful for comparing and testing both
      implementations.
      
      However, it is not intended to have a second XDP processing stage
      or layer with exactly the same functionality of the first native
      stage. Only reason could be to have a partial fallback for future
      XDP features that are not supported yet in the native implementation
      and we probably also shouldn't strive for such fallback and instead
      encourage native feature support in the first place. Given there's
      currently no such fallback issue or use case, lets not go there yet
      if we don't need to.
      
      Therefore, change semantics for loading XDP and bail out if the
      user tries to load a generic XDP program when a native one is
      present and vice versa. Another alternative to bailing out would
      be to handle the transition from one flavor to another gracefully,
      but that would require to bring the device down, exchange both
      types of programs, and bring it up again in order to avoid a tiny
      window where a packet could hit both hooks. Given this complicates
      the logic for just a debugging feature in the native case, I went
      with the simpler variant.
      
      For the dump, remove IFLA_XDP_FLAGS that was added with b5cdae32
      and reuse IFLA_XDP_ATTACHED for indicating the mode. Dumping all
      or just a subset of flags that were used for loading the XDP prog
      is suboptimal in the long run since not all flags are useful for
      dumping and if we start to reuse the same flag definitions for
      load and dump, then we'll waste bit space. What we really just
      want is to dump the mode for now.
      
      Current IFLA_XDP_ATTACHED semantics are: nothing was installed (0),
      a program is running at the native driver layer (1). Thus, add a
      mode that says that a program is running at generic XDP layer (2).
      Applications will handle this fine in that older binaries will
      just indicate that something is attached at XDP layer, effectively
      this is similar to IFLA_XDP_FLAGS attr that we would have had
      modulo the redundancy.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d67b9cd2
    • D
      xdp: add flag to enforce driver mode · 0489df9a
      Daniel Borkmann 提交于
      After commit b5cdae32 ("net: Generic XDP") we automatically fall
      back to a generic XDP variant if the driver does not support native
      XDP. Allow for an option where the user can specify that always the
      native XDP variant should be selected and in case it's not supported
      by a driver, just bail out.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0489df9a
  34. 28 4月, 2017 1 次提交