1. 12 2月, 2021 4 次提交
  2. 09 2月, 2021 3 次提交
    • A
      IPv6: Add "offload failed" indication to routes · 0c5fcf9e
      Amit Cohen 提交于
      After installing a route to the kernel, user space receives an
      acknowledgment, which means the route was installed in the kernel, but not
      necessarily in hardware.
      
      The asynchronous nature of route installation in hardware can lead to a
      routing daemon advertising a route before it was actually installed in
      hardware. This can result in packet loss or mis-routed packets until the
      route is installed in hardware.
      
      To avoid such cases, previous patch set added the ability to emit
      RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags
      are changed, this behavior is controlled by sysctl.
      
      With the above mentioned behavior, it is possible to know from user-space
      if the route was offloaded, but if the offload fails there is no indication
      to user-space. Following a failure, a routing daemon will wait indefinitely
      for a notification that will never come.
      
      This patch adds an "offload_failed" indication to IPv6 routes, so that
      users will have better visibility into the offload process.
      
      'struct fib6_info' is extended with new field that indicates if route
      offload failed. Note that the new field is added using unused bit and
      therefore there is no need to increase struct size.
      Signed-off-by: NAmit Cohen <amcohen@nvidia.com>
      Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c5fcf9e
    • A
      IPv4: Add "offload failed" indication to routes · 36c5100e
      Amit Cohen 提交于
      After installing a route to the kernel, user space receives an
      acknowledgment, which means the route was installed in the kernel, but not
      necessarily in hardware.
      
      The asynchronous nature of route installation in hardware can lead to a
      routing daemon advertising a route before it was actually installed in
      hardware. This can result in packet loss or mis-routed packets until the
      route is installed in hardware.
      
      To avoid such cases, previous patch set added the ability to emit
      RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags
      are changed, this behavior is controlled by sysctl.
      
      With the above mentioned behavior, it is possible to know from user-space
      if the route was offloaded, but if the offload fails there is no indication
      to user-space. Following a failure, a routing daemon will wait indefinitely
      for a notification that will never come.
      
      This patch adds an "offload_failed" indication to IPv4 routes, so that
      users will have better visibility into the offload process.
      
      'struct fib_alias', and 'struct fib_rt_info' are extended with new field
      that indicates if route offload failed. Note that the new field is added
      using unused bit and therefore there is no need to increase structs size.
      Signed-off-by: NAmit Cohen <amcohen@nvidia.com>
      Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      36c5100e
    • H
      switchdev: mrp: Remove SWITCHDEV_ATTR_ID_MRP_PORT_STAT · 059d2a10
      Horatiu Vultur 提交于
      Now that MRP started to use also SWITCHDEV_ATTR_ID_PORT_STP_STATE to
      notify HW, then SWITCHDEV_ATTR_ID_MRP_PORT_STAT is not used anywhere
      else, therefore we can remove it.
      
      Fixes: c284b545 ("switchdev: mrp: Extend switchdev API to offload MRP")
      Signed-off-by: NHoratiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      059d2a10
  3. 05 2月, 2021 4 次提交
    • L
      netfilter: move handlers to net/ip_vs.h · edf597da
      Leon Romanovsky 提交于
      Fix the following compilation warnings:
      net/netfilter/ipvs/ip_vs_proto_tcp.c:147:1: warning: no previous prototype for 'tcp_snat_handler' [-Wmissing-prototypes]
        147 | tcp_snat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
            | ^~~~~~~~~~~~~~~~
      net/netfilter/ipvs/ip_vs_proto_udp.c:136:1: warning: no previous prototype for 'udp_snat_handler' [-Wmissing-prototypes]
        136 | udp_snat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
            | ^~~~~~~~~~~~~~~~
      Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      edf597da
    • L
      net/core: move gro function declarations to separate header · 04f00ab2
      Leon Romanovsky 提交于
      Fir the following compilation warnings:
       1031 | INDIRECT_CALLABLE_SCOPE void udp_v6_early_demux(struct sk_buff *skb)
      
      net/ipv6/ip6_offload.c:182:41: warning: no previous prototype for ‘ipv6_gro_receive’ [-Wmissing-prototypes]
        182 | INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
            |                                         ^~~~~~~~~~~~~~~~
      net/ipv6/ip6_offload.c:320:29: warning: no previous prototype for ‘ipv6_gro_complete’ [-Wmissing-prototypes]
        320 | INDIRECT_CALLABLE_SCOPE int ipv6_gro_complete(struct sk_buff *skb, int nhoff)
            |                             ^~~~~~~~~~~~~~~~~
      net/ipv6/ip6_offload.c:182:41: warning: no previous prototype for ‘ipv6_gro_receive’ [-Wmissing-prototypes]
        182 | INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
            |                                         ^~~~~~~~~~~~~~~~
      net/ipv6/ip6_offload.c:320:29: warning: no previous prototype for ‘ipv6_gro_complete’ [-Wmissing-prototypes]
        320 | INDIRECT_CALLABLE_SCOPE int ipv6_gro_complete(struct sk_buff *skb, int nhoff)
      Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      04f00ab2
    • L
      ipv6: move udp declarations to net/udp.h · f9a4719c
      Leon Romanovsky 提交于
      Fix the following compilation warning:
      
      net/ipv6/udp.c:1031:30: warning: no previous prototype for 'udp_v6_early_demux' [-Wmissing-prototypes]
       1031 | INDIRECT_CALLABLE_SCOPE void udp_v6_early_demux(struct sk_buff *skb)
            |                              ^~~~~~~~~~~~~~~~~~
      net/ipv6/udp.c:1072:29: warning: no previous prototype for 'udpv6_rcv' [-Wmissing-prototypes]
       1072 | INDIRECT_CALLABLE_SCOPE int udpv6_rcv(struct sk_buff *skb)
            |                             ^~~~~~~~~
      Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      f9a4719c
    • X
      udp: call udp_encap_enable for v6 sockets when enabling encap · a4a600dd
      Xin Long 提交于
      When enabling encap for a ipv6 socket without udp_encap_needed_key
      increased, UDP GRO won't work for v4 mapped v6 address packets as
      sk will be NULL in udp4_gro_receive().
      
      This patch is to enable it by increasing udp_encap_needed_key for
      v6 sockets in udp_tunnel_encap_enable(), and correspondingly
      decrease udp_encap_needed_key in udpv6_destroy_sock().
      
      v1->v2:
        - add udp_encap_disable() and export it.
      v2->v3:
        - add the change for rxrpc and bareudp into one patch, as Alex
          suggested.
      v3->v4:
        - move rxrpc part to another patch.
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      a4a600dd
  4. 04 2月, 2021 5 次提交
  5. 03 2月, 2021 3 次提交
    • A
      net: ipv6: Emit notification when fib hardware flags are changed · 907eea48
      Amit Cohen 提交于
      After installing a route to the kernel, user space receives an
      acknowledgment, which means the route was installed in the kernel,
      but not necessarily in hardware.
      
      The asynchronous nature of route installation in hardware can lead
      to a routing daemon advertising a route before it was actually installed in
      hardware. This can result in packet loss or mis-routed packets until the
      route is installed in hardware.
      
      It is also possible for a route already installed in hardware to change
      its action and therefore its flags. For example, a host route that is
      trapping packets can be "promoted" to perform decapsulation following
      the installation of an IPinIP/VXLAN tunnel.
      
      Emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags
      are changed. The aim is to provide an indication to user-space
      (e.g., routing daemons) about the state of the route in hardware.
      
      Introduce a sysctl that controls this behavior.
      
      Keep the default value at 0 (i.e., do not emit notifications) for several
      reasons:
      - Multiple RTM_NEWROUTE notification per-route might confuse existing
        routing daemons.
      - Convergence reasons in routing daemons.
      - The extra notifications will negatively impact the insertion rate.
      - Not all users are interested in these notifications.
      
      Move fib6_info_hw_flags_set() to C file because it is no longer a short
      function.
      Signed-off-by: NAmit Cohen <amcohen@nvidia.com>
      Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: NDavid Ahern <dsahern@kernel.org>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      907eea48
    • A
      net: Pass 'net' struct as first argument to fib6_info_hw_flags_set() · fbaca8f8
      Amit Cohen 提交于
      The next patch will emit notification when hardware flags are changed,
      in case that fib_notify_on_flag_change sysctl is set to 1.
      
      To know sysctl values, net struct is needed.
      This change is consistent with the IPv4 version, which gets 'net' struct
      as its first argument.
      
      Currently, the only callers of this function are mlxsw and netdevsim.
      Patch the callers to pass net.
      Signed-off-by: NAmit Cohen <amcohen@nvidia.com>
      Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: NDavid Ahern <dsahern@kernel.org>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      fbaca8f8
    • A
      net: ipv4: Emit notification when fib hardware flags are changed · 680aea08
      Amit Cohen 提交于
      After installing a route to the kernel, user space receives an
      acknowledgment, which means the route was installed in the kernel,
      but not necessarily in hardware.
      
      The asynchronous nature of route installation in hardware can lead to a
      routing daemon advertising a route before it was actually installed in
      hardware. This can result in packet loss or mis-routed packets until the
      route is installed in hardware.
      
      It is also possible for a route already installed in hardware to change
      its action and therefore its flags. For example, a host route that is
      trapping packets can be "promoted" to perform decapsulation following
      the installation of an IPinIP/VXLAN tunnel.
      
      Emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags
      are changed. The aim is to provide an indication to user-space
      (e.g., routing daemons) about the state of the route in hardware.
      
      Introduce a sysctl that controls this behavior.
      
      Keep the default value at 0 (i.e., do not emit notifications) for several
      reasons:
      - Multiple RTM_NEWROUTE notification per-route might confuse existing
        routing daemons.
      - Convergence reasons in routing daemons.
      - The extra notifications will negatively impact the insertion rate.
      - Not all users are interested in these notifications.
      Signed-off-by: NAmit Cohen <amcohen@nvidia.com>
      Acked-by: NRoopa Prabhu <roopa@nvidia.com>
      Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: NDavid Ahern <dsahern@kernel.org>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      680aea08
  6. 02 2月, 2021 4 次提交
  7. 30 1月, 2021 6 次提交
    • N
      tcp: shrink inet_connection_sock icsk_mtup enabled and probe_size · 14e8e0f6
      Neal Cardwell 提交于
      This commit shrinks inet_connection_sock by 4 bytes, by shrinking
      icsk_mtup.enabled from 32 bits to 1 bit, and shrinking
      icsk_mtup.probe_size from s32 to an unsuigned 31 bit field.
      
      This is to save space to compensate for the recent introduction of a
      new u32 in inet_connection_sock, icsk_probes_tstamp, in the recent bug
      fix commit 9d9b1ee0 ("tcp: fix TCP_USER_TIMEOUT with zero window").
      
      This should not change functionality, since icsk_mtup.enabled is only
      ever set to 0 or 1, and icsk_mtup.probe_size can only be either 0
      or a positive MTU value returned by tcp_mss_to_mtu()
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20210129185438.1813237-1-ncardwell.kernel@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      14e8e0f6
    • V
      net: dsa: add a second tagger for Ocelot switches based on tag_8021q · 7c83a7c5
      Vladimir Oltean 提交于
      There are use cases for which the existing tagger, based on the NPI
      (Node Processor Interface) functionality, is insufficient.
      
      Namely:
      - Frames injected through the NPI port bypass the frame analyzer, so no
        source address learning is performed, no TSN stream classification,
        etc.
      - Flow control is not functional over an NPI port (PAUSE frames are
        encapsulated in the same Extraction Frame Header as all other frames)
      - There can be at most one NPI port configured for an Ocelot switch. But
        in NXP LS1028A and T1040 there are two Ethernet CPU ports. The non-NPI
        port is currently either disabled, or operated as a plain user port
        (albeit an internally-facing one). Having the ability to configure the
        two CPU ports symmetrically could pave the way for e.g. creating a LAG
        between them, to increase bandwidth seamlessly for the system.
      
      So there is a desire to have an alternative to the NPI mode. This change
      keeps the default tagger for the Seville and Felix switches as "ocelot",
      but it can be changed via the following device attribute:
      
      echo ocelot-8021q > /sys/class/<dsa-master>/dsa/tagging
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      7c83a7c5
    • V
      net: dsa: allow changing the tag protocol via the "tagging" device attribute · 53da0eba
      Vladimir Oltean 提交于
      Currently DSA exposes the following sysfs:
      $ cat /sys/class/net/eno2/dsa/tagging
      ocelot
      
      which is a read-only device attribute, introduced in the kernel as
      commit 98cdb480 ("net: dsa: Expose tagging protocol to user-space"),
      and used by libpcap since its commit 993db3800d7d ("Add support for DSA
      link-layer types").
      
      It would be nice if we could extend this device attribute by making it
      writable:
      $ echo ocelot-8021q > /sys/class/net/eno2/dsa/tagging
      
      This is useful with DSA switches that can make use of more than one
      tagging protocol. It may be useful in dsa_loop in the future too, to
      perform offline testing of various taggers, or for changing between dsa
      and edsa on Marvell switches, if that is desirable.
      
      In terms of implementation, drivers can support this feature by
      implementing .change_tag_protocol, which should always leave the switch
      in a consistent state: either with the new protocol if things went well,
      or with the old one if something failed. Teardown of the old protocol,
      if necessary, must be handled by the driver.
      
      Some things remain as before:
      - The .get_tag_protocol is currently only called at probe time, to load
        the initial tagging protocol driver. Nonetheless, new drivers should
        report the tagging protocol in current use now.
      - The driver should manage by itself the initial setup of tagging
        protocol, no later than the .setup() method, as well as destroying
        resources used by the last tagger in use, no earlier than the
        .teardown() method.
      
      For multi-switch DSA trees, error handling is a bit more complicated,
      since e.g. the 5th out of 7 switches may fail to change the tag
      protocol. When that happens, a revert to the original tag protocol is
      attempted, but that may fail too, leaving the tree in an inconsistent
      state despite each individual switch implementing .change_tag_protocol
      transactionally. Since the intersection between drivers that implement
      .change_tag_protocol and drivers that support D in DSA is currently the
      empty set, the possibility for this error to happen is ignored for now.
      
      Testing:
      
      $ insmod mscc_felix.ko
      [   79.549784] mscc_felix 0000:00:00.5: Adding to iommu group 14
      [   79.565712] mscc_felix 0000:00:00.5: Failed to register DSA switch: -517
      $ insmod tag_ocelot.ko
      $ rmmod mscc_felix.ko
      $ insmod mscc_felix.ko
      [   97.261724] libphy: VSC9959 internal MDIO bus: probed
      [   97.267363] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 0
      [   97.274998] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 1
      [   97.282561] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 2
      [   97.289700] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 3
      [   97.599163] mscc_felix 0000:00:00.5 swp0 (uninitialized): PHY [0000:00:00.3:10] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
      [   97.862034] mscc_felix 0000:00:00.5 swp1 (uninitialized): PHY [0000:00:00.3:11] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
      [   97.950731] mscc_felix 0000:00:00.5 swp0: configuring for inband/qsgmii link mode
      [   97.964278] 8021q: adding VLAN 0 to HW filter on device swp0
      [   98.146161] mscc_felix 0000:00:00.5 swp2 (uninitialized): PHY [0000:00:00.3:12] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
      [   98.238649] mscc_felix 0000:00:00.5 swp1: configuring for inband/qsgmii link mode
      [   98.251845] 8021q: adding VLAN 0 to HW filter on device swp1
      [   98.433916] mscc_felix 0000:00:00.5 swp3 (uninitialized): PHY [0000:00:00.3:13] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
      [   98.485542] mscc_felix 0000:00:00.5: configuring for fixed/internal link mode
      [   98.503584] mscc_felix 0000:00:00.5: Link is Up - 2.5Gbps/Full - flow control rx/tx
      [   98.527948] device eno2 entered promiscuous mode
      [   98.544755] DSA: tree 0 setup
      
      $ ping 10.0.0.1
      PING 10.0.0.1 (10.0.0.1): 56 data bytes
      64 bytes from 10.0.0.1: seq=0 ttl=64 time=2.337 ms
      64 bytes from 10.0.0.1: seq=1 ttl=64 time=0.754 ms
      ^C
       -  10.0.0.1 ping statistics  -
      2 packets transmitted, 2 packets received, 0% packet loss
      round-trip min/avg/max = 0.754/1.545/2.337 ms
      
      $ cat /sys/class/net/eno2/dsa/tagging
      ocelot
      $ cat ./test_ocelot_8021q.sh
              #!/bin/bash
      
              ip link set swp0 down
              ip link set swp1 down
              ip link set swp2 down
              ip link set swp3 down
              ip link set swp5 down
              ip link set eno2 down
              echo ocelot-8021q > /sys/class/net/eno2/dsa/tagging
              ip link set eno2 up
              ip link set swp0 up
              ip link set swp1 up
              ip link set swp2 up
              ip link set swp3 up
              ip link set swp5 up
      $ ./test_ocelot_8021q.sh
      ./test_ocelot_8021q.sh: line 9: echo: write error: Protocol not available
      $ rmmod tag_ocelot.ko
      rmmod: can't unload module 'tag_ocelot': Resource temporarily unavailable
      $ insmod tag_ocelot_8021q.ko
      $ ./test_ocelot_8021q.sh
      $ cat /sys/class/net/eno2/dsa/tagging
      ocelot-8021q
      $ rmmod tag_ocelot.ko
      $ rmmod tag_ocelot_8021q.ko
      rmmod: can't unload module 'tag_ocelot_8021q': Resource temporarily unavailable
      $ ping 10.0.0.1
      PING 10.0.0.1 (10.0.0.1): 56 data bytes
      64 bytes from 10.0.0.1: seq=0 ttl=64 time=0.953 ms
      64 bytes from 10.0.0.1: seq=1 ttl=64 time=0.787 ms
      64 bytes from 10.0.0.1: seq=2 ttl=64 time=0.771 ms
      $ rmmod mscc_felix.ko
      [  645.544426] mscc_felix 0000:00:00.5: Link is Down
      [  645.838608] DSA: tree 0 torn down
      $ rmmod tag_ocelot_8021q.ko
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      53da0eba
    • V
      net: dsa: keep a copy of the tagging protocol in the DSA switch tree · 357f203b
      Vladimir Oltean 提交于
      Cascading DSA switches can be done multiple ways. There is the brute
      force approach / tag stacking, where one upstream switch, located
      between leaf switches and the host Ethernet controller, will just
      happily transport the DSA header of those leaf switches as payload.
      For this kind of setups, DSA works without any special kind of treatment
      compared to a single switch - they just aren't aware of each other.
      Then there's the approach where the upstream switch understands the tags
      it transports from its leaves below, as it doesn't push a tag of its own,
      but it routes based on the source port & switch id information present
      in that tag (as opposed to DMAC & VID) and it strips the tag when
      egressing a front-facing port. Currently only Marvell implements the
      latter, and Marvell DSA trees contain only Marvell switches.
      
      So it is safe to say that DSA trees already have a single tag protocol
      shared by all switches, and in fact this is what makes the switches able
      to understand each other. This fact is also implied by the fact that
      currently, the tagging protocol is reported as part of a sysfs installed
      on the DSA master and not per port, so it must be the same for all the
      ports connected to that DSA master regardless of the switch that they
      belong to.
      
      It's time to make this official and enforce it (yes, this also means we
      won't have any "switch understands tag to some extent but is not able to
      speak it" hardware oddities that we'll support in the future).
      
      This is needed due to the imminent introduction of the dsa_switch_ops::
      change_tag_protocol driver API. When that is introduced, we'll have
      to notify switches of the tagging protocol that they're configured to
      use. Currently the tag_ops structure pointer is held only for CPU ports.
      But there are switches which don't have CPU ports and nonetheless still
      need to be configured. These would be Marvell leaf switches whose
      upstream port is just a DSA link. How do we inform these of their
      tagging protocol setup/deletion?
      
      One answer to the above would be: iterate through the DSA switch tree's
      ports once, list the CPU ports, get their tag_ops, then iterate again
      now that we have it, and notify everybody of that tag_ops. But what to
      do if conflicts appear between one cpu_dp->tag_ops and another? There's
      no escaping the fact that conflict resolution needs to be done, so we
      can be upfront about it.
      
      Ease our work and just keep the master copy of the tag_ops inside the
      struct dsa_switch_tree. Reference counting is now moved to be per-tree
      too, instead of per-CPU port.
      
      There are many places in the data path that access master->dsa_ptr->tag_ops
      and we would introduce unnecessary performance penalty going through yet
      another indirection, so keep those right where they are.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      357f203b
    • X
      ip_gre: add csum offload support for gre header · efa1a65c
      Xin Long 提交于
      This patch is to add csum offload support for gre header:
      
      On the TX path in gre_build_header(), when CHECKSUM_PARTIAL's set
      for inner proto, it will calculate the csum for outer proto, and
      inner csum will be offloaded later. Otherwise, CHECKSUM_PARTIAL
      and csum_start/offset will be set for outer proto, and the outer
      csum will be offloaded later.
      
      On the GSO path in gre_gso_segment(), when CHECKSUM_PARTIAL is
      not set for inner proto and the hardware supports csum offload,
      CHECKSUM_PARTIAL and csum_start/offset will be set for outer
      proto, and outer csum will be offloaded later. Otherwise, it
      will do csum for outer proto by calling gso_make_checksum().
      
      Note that SCTP has to do the csum by itself for non GSO path in
      sctp_packet_pack(), as gre_build_header() can't handle the csum
      with CHECKSUM_PARTIAL set for SCTP CRC csum offload.
      
      v1->v2:
        - remove the SCTP part, as GRE dev doesn't support SCTP CRC CSUM
          and it will always do checksum for SCTP in sctp_packet_pack()
          when it's not a GSO packet.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      efa1a65c
    • P
      net: flow_offload: Add original direction flag to ct_metadata · 941eff5a
      Paul Blakey 提交于
      Give offloading drivers the direction of the offloaded ct flow,
      this will be used for matches on direction (ct_state +/-rpl).
      Signed-off-by: NPaul Blakey <paulb@nvidia.com>
      Reviewed-by: NJiri Pirko <jiri@nvidia.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      941eff5a
  8. 29 1月, 2021 5 次提交
  9. 28 1月, 2021 4 次提交
  10. 27 1月, 2021 2 次提交
    • P
      net: allow user to set metric on default route learned via Router Advertisement · 6b2e04bc
      Praveen Chaudhary 提交于
      For IPv4, default route is learned via DHCPv4 and user is allowed to change
      metric using config etc/network/interfaces. But for IPv6, default route can
      be learned via RA, for which, currently a fixed metric value 1024 is used.
      
      Ideally, user should be able to configure metric on default route for IPv6
      similar to IPv4. This patch adds sysctl for the same.
      
      Logs:
      
      For IPv4:
      
      Config in etc/network/interfaces:
      auto eth0
      iface eth0 inet dhcp
          metric 4261413864
      
      IPv4 Kernel Route Table:
      $ ip route list
      default via 172.21.47.1 dev eth0 metric 4261413864
      
      FRR Table, if a static route is configured:
      [In real scenario, it is useful to prefer BGP learned default route over DHCPv4 default route.]
      Codes: K - kernel route, C - connected, S - static, R - RIP,
             O - OSPF, I - IS-IS, B - BGP, P - PIM, E - EIGRP, N - NHRP,
             T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
             > - selected route, * - FIB route
      
      S>* 0.0.0.0/0 [20/0] is directly connected, eth0, 00:00:03
      K   0.0.0.0/0 [254/1000] via 172.21.47.1, eth0, 6d08h51m
      
      i.e. User can prefer Default Router learned via Routing Protocol in IPv4.
      Similar behavior is not possible for IPv6, without this fix.
      
      After fix [for IPv6]:
      sudo sysctl -w net.ipv6.conf.eth0.net.ipv6.conf.eth0.ra_defrtr_metric=1996489705
      
      IP monitor: [When IPv6 RA is received]
      default via fe80::xx16:xxxx:feb3:ce8e dev eth0 proto ra metric 1996489705  pref high
      
      Kernel IPv6 routing table
      $ ip -6 route list
      default via fe80::be16:65ff:feb3:ce8e dev eth0 proto ra metric 1996489705 expires 21sec hoplimit 64 pref high
      
      FRR Table, if a static route is configured:
      [In real scenario, it is useful to prefer BGP learned default route over IPv6 RA default route.]
      Codes: K - kernel route, C - connected, S - static, R - RIPng,
             O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
             v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
             > - selected route, * - FIB route
      
      S>* ::/0 [20/0] is directly connected, eth0, 00:00:06
      K   ::/0 [119/1001] via fe80::xx16:xxxx:feb3:ce8e, eth0, 6d07h43m
      
      If the metric is changed later, the effect will be seen only when next IPv6
      RA is received, because the default route must be fully controlled by RA msg.
      Below metric is changed from 1996489705 to 1996489704.
      
      $ sudo sysctl -w net.ipv6.conf.eth0.ra_defrtr_metric=1996489704
      net.ipv6.conf.eth0.ra_defrtr_metric = 1996489704
      
      IP monitor:
      [On next IPv6 RA msg, Kernel deletes prev route and installs new route with updated metric]
      
      Deleted default via fe80::xx16:xxxx:feb3:ce8e dev eth0 proto ra metric 1996489705 expires 3sec hoplimit 64 pref high
      default via fe80::xx16:xxxx:feb3:ce8e dev eth0 proto ra metric 1996489704 pref high
      Signed-off-by: NPraveen Chaudhary <pchaudhary@linkedin.com>
      Signed-off-by: NZhenggen Xu <zxu@linkedin.com>
      Reviewed-by: NDavid Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20210125214430.24079-1-pchaudhary@linkedin.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      6b2e04bc
    • X
      net: lapb: Add locking to the lapb module · b491e6a7
      Xie He 提交于
      In the lapb module, the timers may run concurrently with other code in
      this module, and there is currently no locking to prevent the code from
      racing on "struct lapb_cb". This patch adds locking to prevent racing.
      
      1. Add "spinlock_t lock" to "struct lapb_cb"; Add "spin_lock_bh" and
      "spin_unlock_bh" to APIs, timer functions and notifier functions.
      
      2. Add "bool t1timer_stop, t2timer_stop" to "struct lapb_cb" to make us
      able to ask running timers to abort; Modify "lapb_stop_t1timer" and
      "lapb_stop_t2timer" to make them able to abort running timers;
      Modify "lapb_t2timer_expiry" and "lapb_t1timer_expiry" to make them
      abort after they are stopped by "lapb_stop_t1timer", "lapb_stop_t2timer",
      and "lapb_start_t1timer", "lapb_start_t2timer".
      
      3. Let lapb_unregister wait for other API functions and running timers
      to stop.
      
      4. The lapb_device_event function calls lapb_disconnect_request. In
      order to avoid trying to hold the lock twice, add a new function named
      "__lapb_disconnect_request" which assumes the lock is held, and make
      it called by lapb_disconnect_request and lapb_device_event.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Cc: Martin Schiller <ms@dev.tdt.de>
      Signed-off-by: NXie He <xie.he.0141@gmail.com>
      Link: https://lore.kernel.org/r/20210126040939.69995-1-xie.he.0141@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      b491e6a7