1. 09 11月, 2018 2 次提交
    • S
      geneve: Allow configuration of DF behaviour · a025fb5f
      Stefano Brivio 提交于
      draft-ietf-nvo3-geneve-08 says:
      
         It is strongly RECOMMENDED that Path MTU Discovery ([RFC1191],
         [RFC1981]) be used by setting the DF bit in the IP header when Geneve
         packets are transmitted over IPv4 (this is the default with IPv6).
      
      Now that ICMP error handling is working for GENEVE, we can comply with
      this recommendation.
      
      Make this configurable, though, to avoid breaking existing setups. By
      default, DF won't be set. It can be set or inherited from inner IPv4
      packets. If it's configured to be inherited and we are encapsulating IPv6,
      it will be set.
      
      This only applies to non-lwt tunnels: if an external control plane is
      used, tunnel key will still control the DF flag.
      
      v2:
      - DF behaviour configuration only applies for non-lwt tunnels, apply DF
        setting only if (!geneve->collect_md) in geneve_xmit_skb()
        (Stephen Hemminger)
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a025fb5f
    • S
      vxlan: Allow configuration of DF behaviour · b4d30697
      Stefano Brivio 提交于
      Allow users to set the IPv4 DF bit in outgoing packets, or to inherit its
      value from the IPv4 inner header. If the encapsulated protocol is IPv6 and
      DF is configured to be inherited, always set it.
      
      For IPv4, inheriting DF from the inner header was probably intended from
      the very beginning judging by the comment to vxlan_xmit(), but it wasn't
      actually implemented -- also because it would have done more harm than
      good, without handling for ICMP Fragmentation Needed messages.
      
      According to RFC 7348, "Path MTU discovery MAY be used". An expired RFC
      draft, draft-saum-nvo3-pmtud-over-vxlan-05, whose purpose was to describe
      PMTUD implementation, says that "is a MUST that Vxlan gateways [...]
      SHOULD set the DF-bit [...]", whatever that means.
      
      Given this background, the only sane option is probably to let the user
      decide, and keep the current behaviour as default.
      
      This only applies to non-lwt tunnels: if an external control plane is
      used, tunnel key will still control the DF flag.
      
      v2:
      - DF behaviour configuration only applies for non-lwt tunnels, move DF
        setting to if (!info) block in vxlan_xmit_one() (Stephen Hemminger)
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b4d30697
  2. 13 10月, 2018 1 次提交
    • N
      net: bridge: add support for per-port vlan stats · 9163a0fc
      Nikolay Aleksandrov 提交于
      This patch adds an option to have per-port vlan stats instead of the
      default global stats. The option can be set only when there are no port
      vlans in the bridge since we need to allocate the stats if it is set
      when vlans are being added to ports (and respectively free them
      when being deleted). Also bump RTNL_MAX_TYPE as the bridge is the
      largest user of options. The current stats design allows us to add
      these without any changes to the fast-path, it all comes down to
      the per-vlan stats pointer which, if this option is enabled, will
      be allocated for each port vlan instead of using the global bridge-wide
      one.
      
      CC: bridge@lists.linux-foundation.org
      CC: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9163a0fc
  3. 13 9月, 2018 1 次提交
  4. 06 9月, 2018 1 次提交
  5. 30 7月, 2018 1 次提交
  6. 24 7月, 2018 1 次提交
    • N
      net: bridge: add support for backup port · 2756f68c
      Nikolay Aleksandrov 提交于
      This patch adds a new port attribute - IFLA_BRPORT_BACKUP_PORT, which
      allows to set a backup port to be used for known unicast traffic if the
      port has gone carrier down. The backup pointer is rcu protected and set
      only under RTNL, a counter is maintained so when deleting a port we know
      how many other ports reference it as a backup and we remove it from all.
      Also the pointer is in the first cache line which is hot at the time of
      the check and thus in the common case we only add one more test.
      The backup port will be used only for the non-flooding case since
      it's a part of the bridge and the flooded packets will be forwarded to it
      anyway. To remove the forwarding just send a 0/non-existing backup port.
      This is used to avoid numerous scalability problems when using MLAG most
      notably if we have thousands of fdbs one would need to change all of them
      on port carrier going down which takes too long and causes a storm of fdb
      notifications (and again when the port comes back up). In a Multi-chassis
      Link Aggregation setup usually hosts are connected to two different
      switches which act as a single logical switch. Those switches usually have
      a control and backup link between them called peerlink which might be used
      for communication in case a host loses connectivity to one of them.
      We need a fast way to failover in case a host port goes down and currently
      none of the solutions (like bond) cannot fulfill the requirements because
      the participating ports are actually the "master" devices and must have the
      same peerlink as their backup interface and at the same time all of them
      must participate in the bridge device. As Roopa noted it's normal practice
      in routing called fast re-route where a precalculated backup path is used
      when the main one is down.
      Another use case of this is with EVPN, having a single vxlan device which
      is backup of every port. Due to the nature of master devices it's not
      currently possible to use one device as a backup for many and still have
      all of them participate in the bridge (which is master itself).
      More detailed information about MLAG is available at the link below.
      https://docs.cumulusnetworks.com/display/DOCS/Multi-Chassis+Link+Aggregation+-+MLAG
      
      Further explanation and a diagram by Roopa:
      Two switches acting in a MLAG pair are connected by the peerlink
      interface which is a bridge port.
      
      the config on one of the switches looks like the below. The other
      switch also has a similar config.
      eth0 is connected to one port on the server. And the server is
      connected to both switches.
      
      br0 -- team0---eth0
            |
            -- switch-peerlink
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2756f68c
  7. 14 7月, 2018 2 次提交
  8. 23 6月, 2018 1 次提交
  9. 26 5月, 2018 1 次提交
  10. 18 4月, 2018 1 次提交
  11. 23 3月, 2018 1 次提交
  12. 17 2月, 2018 1 次提交
  13. 30 1月, 2018 1 次提交
  14. 23 1月, 2018 1 次提交
  15. 09 1月, 2018 1 次提交
  16. 05 11月, 2017 1 次提交
    • J
      rtnetlink: use netnsid to query interface · 79e1ad14
      Jiri Benc 提交于
      Currently, when an application gets netnsid from the kernel (for example as
      the result of RTM_GETLINK call on one end of the veth pair), it's not much
      useful. There's no reliable way to get to the netns fd from the netnsid, nor
      does any kernel API accept netnsid.
      
      Extend the RTM_GETLINK call to also accept netnsid. It will operate on the
      netns with the given netnsid in such case. Of course, the calling process
      needs to have enough capabilities in the target name space; for now, require
      CAP_NET_ADMIN. This can be relaxed in the future.
      
      To signal to the calling process that the kernel understood the new
      IFLA_IF_NETNSID attribute in the query, it will include it in the response.
      This is needed to detect older kernels, as they will just ignore
      IFLA_IF_NETNSID and query in the current name space.
      
      This patch implemetns IFLA_IF_NETNSID only for get and dump. For set
      operations, this can be extended later.
      Signed-off-by: NJiri Benc <jbenc@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      79e1ad14
  17. 02 11月, 2017 1 次提交
    • G
      License cleanup: add SPDX license identifier to uapi header files with no license · 6f52b16c
      Greg Kroah-Hartman 提交于
      Many user space API headers are missing licensing information, which
      makes it hard for compliance tools to determine the correct license.
      
      By default are files without license information under the default
      license of the kernel, which is GPLV2.  Marking them GPLV2 would exclude
      them from being included in non GPLV2 code, which is obviously not
      intended. The user space API headers fall under the syscall exception
      which is in the kernels COPYING file:
      
         NOTE! This copyright does *not* cover user programs that use kernel
         services by normal system calls - this is merely considered normal use
         of the kernel, and does *not* fall under the heading of "derived work".
      
      otherwise syscall usage would not be possible.
      
      Update the files which contain no license information with an SPDX
      license identifier.  The chosen identifier is 'GPL-2.0 WITH
      Linux-syscall-note' which is the officially assigned identifier for the
      Linux syscall exception.  SPDX license identifiers are a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.  See the previous patch in this series for the
      methodology of how this patch was researched.
      Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6f52b16c
  18. 29 10月, 2017 2 次提交
    • M
      ipvlan: implement VEPA mode · fe89aa6b
      Mahesh Bandewar 提交于
      This is very similar to the Macvlan VEPA mode, however, there is some
      difference. IPvlan uses the mac-address of the lower device, so the VEPA
      mode has implications of ICMP-redirects for packets destined for its
      immediate neighbors sharing same master since the packets will have same
      source and dest mac. The external switch/router will send redirect msg.
      
      Having said that, this will be useful tool in terms of debugging
      since IPvlan will not switch packets within its slaves and rely completely
      on the external entity as intended in 802.1Qbg.
      Signed-off-by: NMahesh Bandewar <maheshb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fe89aa6b
    • M
      ipvlan: introduce 'private' attribute for all existing modes. · a190d04d
      Mahesh Bandewar 提交于
      IPvlan has always operated in bridge mode. However there are scenarios
      where each slave should be able to talk through the master device but
      not necessarily across each other. Think of an environment where each
      of a namespace is a private and independant customer. In this scenario
      the machine which is hosting these namespaces neither want to tell who
      their neighbor is nor the individual namespaces care to talk to neighbor
      on short-circuited network path.
      
      This patch implements the mode that is very similar to the 'private' mode
      in macvlan where individual slaves can send and receive traffic through
      the master device, just that they can not talk among slave devices.
      Signed-off-by: NMahesh Bandewar <maheshb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a190d04d
  19. 09 10月, 2017 1 次提交
  20. 05 10月, 2017 1 次提交
    • N
      dev: advertise the new nsid when the netns iface changes · 6621dd29
      Nicolas Dichtel 提交于
      x-netns interfaces are bound to two netns: the link netns and the upper
      netns. Usually, this kind of interfaces is created in the link netns and
      then moved to the upper netns. At the end, the interface is visible only
      in the upper netns. The link nsid is advertised via netlink in the upper
      netns, thus the user always knows where is the link part.
      
      There is no such mechanism in the link netns. When the interface is moved
      to another netns, the user cannot "follow" it.
      This patch adds a new netlink attribute which helps to follow an interface
      which moves to another netns. When the interface is unregistered, the new
      nsid is advertised. If the interface is a x-netns interface (ie
      rtnl_link_ops->get_link_net is defined), the nsid is allocated if needed.
      
      CC: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6621dd29
  21. 29 9月, 2017 1 次提交
    • N
      net: bridge: add per-port group_fwd_mask with less restrictions · 5af48b59
      Nikolay Aleksandrov 提交于
      We need to be able to transparently forward most link-local frames via
      tunnels (e.g. vxlan, qinq). Currently the bridge's group_fwd_mask has a
      mask which restricts the forwarding of STP and LACP, but we need to be able
      to forward these over tunnels and control that forwarding on a per-port
      basis thus add a new per-port group_fwd_mask option which only disallows
      mac pause frames to be forwarded (they're always dropped anyway).
      The patch does not change the current default situation - all of the others
      are still restricted unless configured for forwarding.
      We have successfully tested this patch with LACP and STP forwarding over
      VxLAN and qinq tunnels.
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5af48b59
  22. 24 6月, 2017 2 次提交
  23. 16 6月, 2017 1 次提交
  24. 28 5月, 2017 1 次提交
  25. 12 5月, 2017 2 次提交
    • D
      xdp: refine xdp api with regards to generic xdp · d67b9cd2
      Daniel Borkmann 提交于
      While working on the iproute2 generic XDP frontend, I noticed that
      as of right now it's possible to have native *and* generic XDP
      programs loaded both at the same time for the case when a driver
      supports native XDP.
      
      The intended model for generic XDP from b5cdae32 ("net: Generic
      XDP") is, however, that only one out of the two can be present at
      once which is also indicated as such in the XDP netlink dump part.
      The main rationale for generic XDP is to ease accessibility (in
      case a driver does not yet have XDP support) and to generically
      provide a semantical model as an example for driver developers
      wanting to add XDP support. The generic XDP option for an XDP
      aware driver can still be useful for comparing and testing both
      implementations.
      
      However, it is not intended to have a second XDP processing stage
      or layer with exactly the same functionality of the first native
      stage. Only reason could be to have a partial fallback for future
      XDP features that are not supported yet in the native implementation
      and we probably also shouldn't strive for such fallback and instead
      encourage native feature support in the first place. Given there's
      currently no such fallback issue or use case, lets not go there yet
      if we don't need to.
      
      Therefore, change semantics for loading XDP and bail out if the
      user tries to load a generic XDP program when a native one is
      present and vice versa. Another alternative to bailing out would
      be to handle the transition from one flavor to another gracefully,
      but that would require to bring the device down, exchange both
      types of programs, and bring it up again in order to avoid a tiny
      window where a packet could hit both hooks. Given this complicates
      the logic for just a debugging feature in the native case, I went
      with the simpler variant.
      
      For the dump, remove IFLA_XDP_FLAGS that was added with b5cdae32
      and reuse IFLA_XDP_ATTACHED for indicating the mode. Dumping all
      or just a subset of flags that were used for loading the XDP prog
      is suboptimal in the long run since not all flags are useful for
      dumping and if we start to reuse the same flag definitions for
      load and dump, then we'll waste bit space. What we really just
      want is to dump the mode for now.
      
      Current IFLA_XDP_ATTACHED semantics are: nothing was installed (0),
      a program is running at the native driver layer (1). Thus, add a
      mode that says that a program is running at generic XDP layer (2).
      Applications will handle this fine in that older binaries will
      just indicate that something is attached at XDP layer, effectively
      this is similar to IFLA_XDP_FLAGS attr that we would have had
      modulo the redundancy.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d67b9cd2
    • D
      xdp: add flag to enforce driver mode · 0489df9a
      Daniel Borkmann 提交于
      After commit b5cdae32 ("net: Generic XDP") we automatically fall
      back to a generic XDP variant if the driver does not support native
      XDP. Allow for an option where the user can specify that always the
      native XDP variant should be selected and in case it's not supported
      by a driver, just bail out.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0489df9a
  26. 28 4月, 2017 1 次提交
  27. 26 4月, 2017 1 次提交
    • D
      net: Generic XDP · b5cdae32
      David S. Miller 提交于
      This provides a generic SKB based non-optimized XDP path which is used
      if either the driver lacks a specific XDP implementation, or the user
      requests it via a new IFLA_XDP_FLAGS value named XDP_FLAGS_SKB_MODE.
      
      It is arguable that perhaps I should have required something like
      this as part of the initial XDP feature merge.
      
      I believe this is critical for two reasons:
      
      1) Accessibility.  More people can play with XDP with less
         dependencies.  Yes I know we have XDP support in virtio_net, but
         that just creates another depedency for learning how to use this
         facility.
      
         I wrote this to make life easier for the XDP newbies.
      
      2) As a model for what the expected semantics are.  If there is a pure
         generic core implementation, it serves as a semantic example for
         driver folks adding XDP support.
      
      One thing I have not tried to address here is the issue of
      XDP_PACKET_HEADROOM, thanks to Daniel for spotting that.  It seems
      incredibly expensive to do a skb_cow(skb, XDP_PACKET_HEADROOM) or
      whatever even if the XDP program doesn't try to push headers at all.
      I think we really need the verifier to somehow propagate whether
      certain XDP helpers are used or not.
      
      v5:
       - Handle both negative and positive offset after running prog
       - Fix mac length in XDP_TX case (Alexei)
       - Use rcu_dereference_protected() in free_netdev (kbuild test robot)
      
      v4:
       - Fix MAC header adjustmnet before calling prog (David Ahern)
       - Disable LRO when generic XDP is installed (Michael Chan)
       - Bypass qdisc et al. on XDP_TX and record the event (Alexei)
       - Do not perform generic XDP on reinjected packets (DaveM)
      
      v3:
       - Make sure XDP program sees packet at MAC header, push back MAC
         header if we do XDP_TX.  (Alexei)
       - Elide GRO when generic XDP is in use.  (Alexei)
       - Add XDP_FLAG_SKB_MODE flag which the user can use to request generic
         XDP even if the driver has an XDP implementation.  (Alexei)
       - Report whether SKB mode is in use in rtnl_xdp_fill() via XDP_FLAGS
         attribute.  (Daniel)
      
      v2:
       - Add some "fall through" comments in switch statements based
         upon feedback from Andrew Lunn
       - Use RCU for generic xdp_prog, thanks to Johannes Berg.
      Tested-by: NAndy Gospodarek <andy@greyhouse.net>
      Tested-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Tested-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b5cdae32
  28. 10 4月, 2017 1 次提交
  29. 05 4月, 2017 1 次提交
    • V
      rtnl: Add support for netdev event to link messages · def12888
      Vlad Yasevich 提交于
      When netdev events happen, a rtnetlink_event() handler will send
      messages for every event in it's white list.  These messages contain
      current information about a particular device, but they do not include
      the iformation about which event just happened.  The consumer of
      the message has to try to infer this information.  In some cases
      (ex: NETDEV_NOTIFY_PEERS), that is not possible.
      
      This patch adds a new extension to RTM_NEWLINK message called IFLA_EVENT
      that would have an encoding of the which event triggered this
      message.  This would allow the the message consumer to easily determine
      if it is interested in a particular event or not.
      Signed-off-by: NVladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      def12888
  30. 26 3月, 2017 1 次提交
    • J
      gtp: support SGSN-side tunnels · 91ed81f9
      Jonas Bonn 提交于
      The GTP-tunnel driver is explicitly GGSN-side as it searches for PDP
      contexts based on the incoming packets _destination_ address.  If we
      want to place ourselves on the SGSN side of the  tunnel, then we want
      to be identifying PDP contexts based on _source_ address.
      
      Let it be noted that in a "real" configuration this module would never
      be used:  the SGSN normally does not see IP packets as input.  The
      justification for this functionality is for PGW load-testing applications
      where the input to the SGSN is locally generally IP traffic.
      
      This patch adds a "role" argument at GTP-link creation time to specify
      whether we are on the GGSN or SGSN side of the tunnel; this flag is then
      used to determine which part of the IP packet to use in determining
      the PDP context.
      Signed-off-by: NJonas Bonn <jonas@southpole.se>
      Acked-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: NHarald Welte <laforge@gnumonks.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      91ed81f9
  31. 04 2月, 2017 1 次提交
    • R
      bridge: uapi: add per vlan tunnel info · b3c7ef0a
      Roopa Prabhu 提交于
      New nested netlink attribute to associate tunnel info per vlan.
      This is used by bridge driver to send tunnel metadata to
      bridge ports in vlan tunnel mode. This patch also adds new per
      port flag IFLA_BRPORT_VLAN_TUNNEL to enable vlan tunnel mode.
      off by default.
      
      One example use for this is a vxlan bridging gateway or vtep
      which maps vlans to vn-segments (or vnis). User can configure
      per-vlan tunnel information which the bridge driver can use
      to bridge vlan into the corresponding vn-segment.
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b3c7ef0a
  32. 25 1月, 2017 1 次提交
    • F
      bridge: multicast to unicast · 6db6f0ea
      Felix Fietkau 提交于
      Implements an optional, per bridge port flag and feature to deliver
      multicast packets to any host on the according port via unicast
      individually. This is done by copying the packet per host and
      changing the multicast destination MAC to a unicast one accordingly.
      
      multicast-to-unicast works on top of the multicast snooping feature of
      the bridge. Which means unicast copies are only delivered to hosts which
      are interested in it and signalized this via IGMP/MLD reports
      previously.
      
      This feature is intended for interface types which have a more reliable
      and/or efficient way to deliver unicast packets than broadcast ones
      (e.g. wifi).
      
      However, it should only be enabled on interfaces where no IGMPv2/MLDv1
      report suppression takes place. This feature is disabled by default.
      
      The initial patch and idea is from Felix Fietkau.
      Signed-off-by: NFelix Fietkau <nbd@nbd.name>
      [linus.luessing@c0d3.blue: various bug + style fixes, commit message]
      Signed-off-by: NLinus Lüssing <linus.luessing@c0d3.blue>
      Reviewed-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6db6f0ea
  33. 18 1月, 2017 1 次提交
    • R
      net: AF-specific RTM_GETSTATS attributes · aefb4d4a
      Robert Shearman 提交于
      Add the functionality for including address-family-specific per-link
      stats in RTM_GETSTATS messages. This is done through adding a new
      IFLA_STATS_AF_SPEC attribute under which address family attributes are
      nested and then the AF-specific attributes can be further nested. This
      follows the model of IFLA_AF_SPEC on RTM_*LINK messages and it has the
      advantage of presenting an easily extended hierarchy. The rtnl_af_ops
      structure is extended to provide AFs with the opportunity to fill and
      provide the size of their stats attributes.
      
      One alternative would have been to provide AFs with the ability to add
      attributes directly into the RTM_GETSTATS message without a nested
      hierarchy. I discounted this approach as it increases the rate at
      which the 32 attribute number space is used up and it makes
      implementation a little more tricky for stats dump resuming (at the
      moment the order in which attributes are added to the message has to
      match the numeric order of the attributes).
      
      Another alternative would have been to register per-AF RTM_GETSTATS
      handlers. I discounted this approach as I perceived a common use-case
      to be getting all the stats for an interface and this approach would
      necessitate multiple requests/dumps to retrieve them all.
      Signed-off-by: NRobert Shearman <rshearma@brocade.com>
      Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aefb4d4a
  34. 30 11月, 2016 1 次提交
  35. 22 11月, 2016 1 次提交