1. 09 Feb, 2022 1 commit
  2. 11 Dec, 2021 1 commit
  3. 22 Jun, 2021 1 commit
    • xfrm: Fix xfrm offload fallback fail case · dd72fadf
      Ayush Sawal committed
      With xfrm offload, if the driver's xdo_dev_state_add() returns
      -EOPNOTSUPP, the xfrm offload fallback fails.
      In xfrm_dev_state_add(), both xso->dev and xso->real_dev are initialized
      to dev, and when the error (-EOPNOTSUPP) is returned only xso->dev is set
      to NULL.
      
      So in this scenario the condition in validate_xmit_xfrm(),
      if ((x->xso.dev != dev) && (x->xso.real_dev == dev))
                      return skb;
      evaluates to true, so the skb is returned without calling esp_xmit()
      below, which contains the fallback code. Hence CRYPTO_FALLBACK fails.
      
      Fix this by keeping x->xso.real_dev as NULL when an error is
      returned in xfrm_dev_state_add().
      
      Fixes: bdfd2d1f ("bonding/xfrm: use real_dev instead of slave_dev")
      Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
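The condition discussed in dd72fadf can be modelled in isolation. The sketch below is not kernel code: `struct xso_model`, `skips_esp_xmit()` and `state_add_refused()` are hypothetical stand-ins, assumed only for illustration, that mirror the quoted check and the error path described in the message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct net_device and the xso fields. */
struct net_device { int ifindex; };

struct xso_model {
    struct net_device *dev;      /* device the state is offloaded to */
    struct net_device *real_dev; /* underlying device */
};

/* Mirrors the condition quoted from validate_xmit_xfrm(): when it is
 * true the skb is returned early and esp_xmit() is never reached. */
bool skips_esp_xmit(const struct xso_model *xso, const struct net_device *dev)
{
    return xso->dev != dev && xso->real_dev == dev;
}

/* Models the error path after the driver returns -EOPNOTSUPP.
 * clear_real_dev == true corresponds to the fixed behaviour. */
void state_add_refused(struct xso_model *xso, struct net_device *dev,
                       bool clear_real_dev)
{
    xso->dev = dev;
    xso->real_dev = dev;
    xso->dev = NULL;              /* only this was reset before the fix */
    if (clear_real_dev)
        xso->real_dev = NULL;     /* the fix: leave real_dev NULL too */
}
```

With `clear_real_dev` false the early return triggers and the fallback is skipped; with it true the check no longer matches, so the fallback path stays reachable.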
  4. 29 Mar, 2021 1 commit
  5. 24 Jun, 2020 1 commit
    • bonding/xfrm: use real_dev instead of slave_dev · bdfd2d1f
      Jarod Wilson committed
      Rather than requiring every hw crypto capable NIC driver to do a check for
      slave_dev being set, set real_dev in the xfrm layer at xso init time, and
      then override it in the bonding driver as needed. Then NIC drivers can
      always use real_dev, and at the same time, we eliminate the use of a
      variable name that probably shouldn't have been used in the first place,
      particularly given recent current events.
      
      CC: Boris Pismenny <borisp@mellanox.com>
      CC: Saeed Mahameed <saeedm@mellanox.com>
      CC: Leon Romanovsky <leon@kernel.org>
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      CC: Jakub Kicinski <kuba@kernel.org>
      CC: Steffen Klassert <steffen.klassert@secunet.com>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      CC: netdev@vger.kernel.org
      Suggested-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 23 Jun, 2020 1 commit
    • xfrm: bail early on slave pass over skb · 272c2330
      Jarod Wilson committed
      This is prep work for initial bonding hardware encryption
      pass-through support. The bonding driver will fill in the slave_dev
      pointer, and we use that to know not to skb_push() again on a given
      skb that was already processed on the bond device.
      
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      CC: Jakub Kicinski <kuba@kernel.org>
      CC: Steffen Klassert <steffen.klassert@secunet.com>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      CC: netdev@vger.kernel.org
      CC: intel-wired-lan@lists.osuosl.org
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 04 Jun, 2020 1 commit
  8. 15 Apr, 2020 1 commit
    • xfrm: do pskb_pull properly in __xfrm_transport_prep · 06a0afcf
      Xin Long committed
      For transport mode, when ipv6 nexthdr is set, the packet format might
      be like:
      
          ----------------------------------------------------
          |        | dest |     |     |      |  ESP    | ESP |
          | IP6 hdr| opts.| ESP | TCP | Data | Trailer | ICV |
          ----------------------------------------------------
      
      and in __xfrm_transport_prep():
      
        pskb_pull(skb, skb->mac_len + sizeof(ip6hdr) + x->props.header_len);
      
      it will pull the data pointer to the wrong position, as it misses the
      nexthdrs/dest opts.
      
      This patch fixes it by using:
      
        pskb_pull(skb, skb_transport_offset(skb) + x->props.header_len);
      
      as we can be sure transport_header points to ESP header at that moment.
      
      It also fixes a panic when packets with ipv6 nexthdr are sent over
      esp6 transport mode:
      
        [  100.473845] kernel BUG at net/core/skbuff.c:4325!
        [  100.478517] RIP: 0010:__skb_to_sgvec+0x252/0x260
        [  100.494355] Call Trace:
        [  100.494829]  skb_to_sgvec+0x11/0x40
        [  100.495492]  esp6_output_tail+0x12e/0x550 [esp6]
        [  100.496358]  esp6_xmit+0x1d5/0x260 [esp6_offload]
        [  100.498029]  validate_xmit_xfrm+0x22f/0x2e0
        [  100.499604]  __dev_queue_xmit+0x589/0x910
        [  100.502928]  ip6_finish_output2+0x2a5/0x5a0
        [  100.503718]  ip6_output+0x6c/0x120
        [  100.505198]  xfrm_output_resume+0x4bf/0x530
        [  100.508683]  xfrm6_output+0x3a/0xc0
        [  100.513446]  inet6_csk_xmit+0xa1/0xf0
        [  100.517335]  tcp_sendmsg+0x27/0x40
        [  100.517977]  sock_sendmsg+0x3e/0x60
        [  100.518648]  __sys_sendto+0xee/0x160
      
      Fixes: c35fe410 ("xfrm: Add mode handlers for IPsec on layer 2")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
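The difference between the two pskb_pull() lengths in 06a0afcf can be worked through with plain arithmetic. This is a sketch only: `struct pkt_layout` and the helper names are hypothetical, and the sizes used in the usage note (14-byte MAC header, 8-byte destination options, 16 bytes for `x->props.header_len`) are assumed values, not taken from the commit.

```c
#include <stddef.h>

enum { IPV6_HDR_LEN = 40 }; /* fixed IPv6 header size */

/* Hypothetical description of the headers in front of the ESP payload. */
struct pkt_layout {
    size_t mac_len;        /* models skb->mac_len */
    size_t ext_hdr_len;    /* extension headers before ESP, e.g. dest opts */
    size_t esp_header_len; /* models x->props.header_len */
};

/* Offset of the ESP header from the start of the data, which is what
 * skb_transport_offset() yields once transport_header points at ESP. */
size_t transport_offset(const struct pkt_layout *p)
{
    return p->mac_len + IPV6_HDR_LEN + p->ext_hdr_len;
}

/* Old length: ignores any extension headers. */
size_t pull_old(const struct pkt_layout *p)
{
    return p->mac_len + IPV6_HDR_LEN + p->esp_header_len;
}

/* New length: derived from the transport header position. */
size_t pull_new(const struct pkt_layout *p)
{
    return transport_offset(p) + p->esp_header_len;
}
```

With the assumed sizes, the old computation stops 8 bytes short of the inner protocol whenever destination options are present, while both computations agree when there are no extension headers.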
  9. 26 Mar, 2020 1 commit
    • xfrm: add prep for esp beet mode offload · 30849175
      Xin Long committed
      Like __xfrm_transport_prep() and __xfrm_mode_tunnel_prep(), this patch
      adds __xfrm_mode_beet_prep() to fix the transport_header for gso
      segments, reset the skb mac_len, and pull the skb data to the
      proto inside esp.
      
      This patch also fixes a panic, reported by ltp:
      
        # modprobe esp4_offload
        # runltp -f net_stress.ipsec_tcp
      
        [ 2452.780511] kernel BUG at net/core/skbuff.c:109!
        [ 2452.799851] Call Trace:
        [ 2452.800298]  <IRQ>
        [ 2452.800705]  skb_push.cold.98+0x14/0x20
        [ 2452.801396]  esp_xmit+0x17b/0x270 [esp4_offload]
        [ 2452.802799]  validate_xmit_xfrm+0x22f/0x2e0
        [ 2452.804285]  __dev_queue_xmit+0x589/0x910
        [ 2452.806264]  __neigh_update+0x3d7/0xa50
        [ 2452.806958]  arp_process+0x259/0x810
        [ 2452.807589]  arp_rcv+0x18a/0x1c
      
      It was caused by the skb going to esp_xmit with a wrong transport
      header.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
  10. 04 Mar, 2020 1 commit
    • esp: remove the skb from the chain when it's enqueued in cryptd_wq · d1d17a35
      Xin Long committed
      Xiumei found a panic in esp offload:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
        RIP: 0010:esp_output_done+0x101/0x160 [esp4]
        Call Trace:
         ? esp_output+0x180/0x180 [esp4]
         cryptd_aead_crypt+0x4c/0x90
         cryptd_queue_worker+0x6e/0xa0
         process_one_work+0x1a7/0x3b0
         worker_thread+0x30/0x390
         ? create_worker+0x1a0/0x1a0
         kthread+0x112/0x130
         ? kthread_flush_work_fn+0x10/0x10
         ret_from_fork+0x35/0x40
      
      It was caused by the skb secpath being used in esp_output_done() after
      it has been released elsewhere.
      
      The tx path for esp offload is:
      
        __dev_queue_xmit()->
          validate_xmit_skb_list()->
            validate_xmit_xfrm()->
              esp_xmit()->
                esp_output_tail()->
                  aead_request_set_callback(esp_output_done) <--[1]
                  crypto_aead_encrypt()  <--[2]
      
      In [1], .callback is set, and [2] triggers the worker schedule;
      later on, a kernel thread will call .callback (esp_output_done), as the
      call trace shows.
      
      But in validate_xmit_xfrm():
      
        skb_list_walk_safe(skb, skb2, nskb) {
          ...
          err = x->type_offload->xmit(x, skb2, esp_features);  [esp_xmit]
          ...
        }
      
      When err is -EINPROGRESS, which means this skb2 will be enqueued and
      later encrypted and sent out by .callback in a kernel thread,
      skb2 should be removed from the skb chain. Otherwise, it will get processed
      again outside validate_xmit_xfrm(), which could release the skb secpath and
      cause the panic above.
      
      This patch removes the skb from the chain when it's enqueued in
      cryptd_wq. While at it, remove the unnecessary 'if (!skb)' check.
      
      Fixes: 3dca3f38 ("xfrm: Separate ESP handling from segmentation for GRO packets.")
      Reported-by: Xiumei Mu <xmu@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
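The unlinking step described in d1d17a35 can be sketched on a toy singly linked list. This is a simplified model, not the kernel implementation: `struct skb_model`, its `xmit_result` field and `filter_chain()` are hypothetical, and other error codes (which the real code handles separately) are ignored here.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical buffer: xmit_result holds what the offload xmit hook
 * would have returned for it. */
struct skb_model {
    struct skb_model *next;
    int xmit_result;
};

/* Walk the chain and unlink every buffer whose xmit returned
 * -EINPROGRESS, because the asynchronous crypto callback now owns it.
 * Returns the (possibly new) head of the chain. */
struct skb_model *filter_chain(struct skb_model *head)
{
    struct skb_model *skb, *next, *prev = NULL;

    for (skb = head; skb; skb = next) {
        next = skb->next; /* saved first, as a "safe" list walk does */
        if (skb->xmit_result == -EINPROGRESS) {
            if (prev)
                prev->next = next;
            else
                head = next;
            skb->next = NULL; /* detached; the callback sends it later */
            continue;
        }
        prev = skb;
    }
    return head;
}
```

Saving `next` before touching the current element is what makes it safe to detach that element during the walk.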
  11. 04 Feb, 2020 1 commit
  12. 15 Jan, 2020 1 commit
  13. 01 Jul, 2019 1 commit
  14. 31 May, 2019 1 commit
  15. 08 Apr, 2019 2 commits
    • xfrm: store xfrm_mode directly, not its address · c9500d7b
      Florian Westphal committed
      This structure is now only 4 bytes, so it's more efficient
      to cache a copy rather than its address.
      
      No significant size difference in allmodconfig vmlinux.
      
      With non-modular kernel that has all XFRM options enabled, this
      series reduces vmlinux image size by ~11kb. All xfrm_mode
      indirections are gone and all modes are built-in.
      
      before (ipsec-next master):
          text      data      bss         dec   filename
      21071494   7233140 11104324    39408958   vmlinux.master
      
      after this series:
      21066448   7226772 11104324    39397544   vmlinux.patched
      
      With an allmodconfig kernel, the size increase is only 362 bytes,
      even though all the xfrm config options removed in this series are
      modular.
      
      before:
          text      data      bss       dec   filename
      15731286   6936912  4046908  26715106   vmlinux.master
      
      after this series:
      15731492   6937068  4046908  26715468   vmlinux
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • xfrm: remove xmit indirection from xfrm_mode · 303c5fab
      Florian Westphal committed
      There are only two versions (tunnel and transport). The ip/ipv6 versions
      differ only in sizeof(iphdr) vs ipv6hdr.
      
      Place this in the core and use x->outer_mode->encap type to call the
      correct adjustment helper.
      
      Before:
          text     data      bss       dec   filename
      15730311  6937008  4046908  26714227   vmlinux
      
      After:
      15730428  6937008  4046908  26714344   vmlinux
      
      (about 117 byte increase)
      
      v2: use family from x->outer_mode, not inner
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
  16. 24 Mar, 2019 1 commit
  17. 21 Mar, 2019 1 commit
  18. 20 Dec, 2018 1 commit
  19. 11 Sep, 2018 1 commit
  20. 29 Aug, 2018 1 commit
    • xfrm: allow driver to quietly refuse offload · 4a132095
      Shannon Nelson committed
      If the "offload" attribute is used to create an IPsec SA
      and the .xdo_dev_state_add() fails, the SA creation fails.
      However, if the "offload" attribute is used on a device that
      doesn't offer it, the attribute is quietly ignored and the SA
      is created without an offload.
      
      Along the same lines as that second case, it would be good to
      have a way for the device to refuse to offload an SA without
      failing the whole SA creation.  This patch adds that feature
      by allowing the driver to return -EOPNOTSUPP as a signal that
      the SA may be fine, it just can't be offloaded.
      
      This allows the user a little more flexibility in requesting
      offloads and not needing to know every detail at all times about
      each specific NIC when trying to create SAs.
      Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
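The return-code convention introduced in 4a132095 can be illustrated with a small sketch. The names `struct sa_model` and `handle_state_add_result()` are hypothetical and assumed for illustration; only the meaning of -EOPNOTSUPP is taken from the commit message.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical SA: only tracks whether it ended up offloaded. */
struct sa_model {
    bool offloaded;
};

/* Interprets the driver's state-add result.
 * Returns 0 when SA creation may continue, or a negative errno
 * when the whole SA creation should fail. */
int handle_state_add_result(struct sa_model *sa, int driver_err)
{
    if (driver_err == 0) {
        sa->offloaded = true;
        return 0;
    }

    sa->offloaded = false;
    if (driver_err == -EOPNOTSUPP)
        return 0;      /* quiet refusal: keep the SA, skip the offload */

    return driver_err; /* any other error still fails SA creation */
}
```

The point of the design is that a driver can decline a single SA without the user having to know in advance what each NIC supports.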
  21. 19 Jul, 2018 1 commit
  22. 25 Jun, 2018 1 commit
    • xfrm: policy: remove pcpu policy cache · e4db5b61
      Florian Westphal committed
      Kristian Evensen says:
        In a project I am involved in, we are running ipsec (Strongswan) on
        different mt7621-based routers. Each router is configured as an
        initiator and has around ~30 tunnels to different responders (running
        on misc. devices). Before the flow cache was removed (kernel 4.9), we
        got a combined throughput of around 70Mbit/s for all tunnels on one
        router. However, we recently switched to kernel 4.14 (4.14.48), and
        the total throughput is somewhere around 57Mbit/s (best-case). I.e., a
        drop of around 20%. Reverting the flow cache removal restores, as
        expected, performance levels to that of kernel 4.9.
      
      When pcpu xdst exists, it has to be validated first before it can be
      used.
      
      A negative hit thus increases cost vs. no-cache.
      
      As the number of tunnels increases, the hit rate decreases, so this pcpu
      caching isn't a viable strategy.
      
      Furthermore, the xdst cache also needs to run with BH off, so when
      removing this the bh disable/enable pairs can be removed too.
      
      Kristian tested a 4.14.y backport of this change and reported
      increased performance:
      
        In our tests, the throughput reduction has been reduced from around -20%
        to -5%. We also see that the overall throughput is independent of the
        number of tunnels, while before the throughput was reduced as the number
        of tunnels increased.
      Reported-by: Kristian Evensen <kristian.evensen@gmail.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
  23. 23 Jun, 2018 1 commit
  24. 30 Mar, 2018 1 commit
  25. 05 Mar, 2018 1 commit
  26. 19 Jan, 2018 1 commit
  27. 18 Jan, 2018 1 commit
  28. 21 Dec, 2017 1 commit
    • xfrm: check for xdo_dev_ops add and delete · 92a23206
      Shannon Nelson committed
      This adds a check for the required add and delete functions up front
      at registration time to be sure both are defined.
      
      Since both the features check and the registration check are looking
      at the same things, break out the check for both to call.
      
      Lastly, for some reason the feature check was setting xfrmdev_ops to
      NULL if the NETIF_F_HW_ESP bit was missing, which would probably
      surprise the driver later if the driver turned its NETIF_F_HW_ESP bit
      back on.  We shouldn't be messing with the driver's callback list, so
      we stop doing that with this patch.
      Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
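The shared check described in 92a23206 might look roughly like the sketch below. It is a simplified model under stated assumptions: `struct xfrmdev_ops_model`, `struct dev_model`, `offload_ops_ok()` and the two no-op callbacks are hypothetical, and the `hw_esp` flag merely stands in for the NETIF_F_HW_ESP feature bit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for the driver's offload callbacks. */
struct xfrmdev_ops_model {
    int  (*state_add)(void *state);
    void (*state_delete)(void *state);
};

struct dev_model {
    bool hw_esp;                         /* models the NETIF_F_HW_ESP bit */
    const struct xfrmdev_ops_model *ops; /* never modified by the check */
};

/* No-op callbacks, present only so the example can build a valid device. */
int  demo_state_add(void *state)    { (void)state; return 0; }
void demo_state_delete(void *state) { (void)state; }

/* One helper for both the registration and the feature-change path:
 * a device advertising ESP offload must provide both callbacks. */
bool offload_ops_ok(const struct dev_model *dev)
{
    if (!dev->hw_esp)
        return true; /* nothing to validate; the ops pointer is left alone */
    return dev->ops && dev->ops->state_add && dev->ops->state_delete;
}
```

Taking the device as `const` reflects the last point of the message: the check only inspects the driver's callback list and never rewrites it.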
  29. 20 Dec, 2017 3 commits
  30. 01 Dec, 2017 1 commit
    • xfrm: Fix xfrm_dev_state_add to fail for unsupported HW SA option · 43024b9c
      Yossef Efraim committed
      The xfrm_dev_state_add() function returns success for unsupported HW SA
      options, so the caller creates a SW SA without a corresponding HW SA
      even though the IPsec device offload option was chosen.
      These unsupported HW SA options are hard-coded within xfrm_dev_state_add().
      SW backward compatibility will break if we add any of these options, as old
      HW will fail with new SW.
      
      This patch changes the behaviour to return -EINVAL when an unsupported
      option is chosen, notifying the user application of the failure and not
      breaking backward compatibility for newly added HW SA options.
      Signed-off-by: Yossef Efraim <yossefe@mellanox.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
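The behaviour change in 43024b9c can be contrasted in a few lines. This is a hedged sketch: `struct offload_request` and both functions are hypothetical, and the single flag stands for whichever options the offload path rejects, since the commit message does not enumerate them.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical request: the flag stands for any SA option that the
 * offload path cannot handle. */
struct offload_request {
    bool has_unsupported_option;
};

/* Behaviour before the patch: success is reported and the offload is
 * silently skipped, so the caller ends up with a SW-only SA. */
int state_add_before(const struct offload_request *req)
{
    (void)req;
    return 0;
}

/* Behaviour after the patch: the caller is told about the failure. */
int state_add_after(const struct offload_request *req)
{
    return req->has_unsupported_option ? -EINVAL : 0;
}
```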
  31. 30 Nov, 2017 2 commits
    • xfrm: Move dst->path into struct xfrm_dst · 0f6c480f
      David Miller committed
      The first member of an IPSEC route bundle chain sets its dst->path to
      the underlying ipv4/ipv6 route that carries the bundle.
      
      Stated another way, if one were to follow the xfrm_dst->child chain of
      the bundle, the final non-NULL pointer would be the path and point to
      either an ipv4 or an ipv6 route.
      
      This is largely used to make sure that PMTU events propagate down to
      the correct ipv4 or ipv6 route.
      
      When we don't have the top of an IPSEC bundle, 'dst->path == dst'.
      
      Move it down into xfrm_dst and key off of dst->xfrm.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
    • xfrm: Move child route linkage into xfrm_dst. · b6ca8bd5
      David Miller committed
      XFRM bundle child chains look like this:
      
      	xdst1 --> xdst2 --> xdst3 --> path_dst
      
      All of xdstN are xfrm_dst objects and xdst->u.dst.xfrm is non-NULL.
      The final child pointer in the chain, here called 'path_dst', is some
      other kind of route such as an ipv4 or ipv6 one.
      
      The xfrm output path pops routes, one at a time, via the child
      pointer, until we hit one which has a dst->xfrm pointer which
      is NULL.
      
      We can easily preserve the above mechanisms with child sitting
      only in the xfrm_dst structure.  All children in the chain
      before we break out of the xfrm_output() loop have dst->xfrm
      non-NULL and are therefore xfrm_dst objects.
      
      Since we break out of the loop when we find dst->xfrm NULL, we
      will not try to dereference 'dst' as if it were an xfrm_dst.
      Signed-off-by: David S. Miller <davem@davemloft.net>
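The chain walk described in b6ca8bd5 can be sketched as follows. `struct dst_model` and `find_path()` are hypothetical names assumed for illustration; the model only keeps the two properties the message relies on, namely that `child` is read only from entries whose `xfrm` pointer is non-NULL, and that the walk stops at the first entry where it is NULL.

```c
#include <stddef.h>

/* Hypothetical route entry. */
struct dst_model {
    void *xfrm;              /* non-NULL only for xfrm_dst-like entries */
    struct dst_model *child; /* meaningful only when xfrm is non-NULL */
};

/* Pop routes one at a time via child until an entry without an xfrm
 * pointer is reached; that entry is the underlying path route.
 * Optionally reports how many transform entries were traversed. */
const struct dst_model *find_path(const struct dst_model *dst, int *transforms)
{
    int n = 0;

    while (dst && dst->xfrm) {
        n++;
        dst = dst->child; /* safe: this entry is known to be an xfrm one */
    }
    if (transforms)
        *transforms = n;
    return dst;
}
```

For the chain `xdst1 --> xdst2 --> xdst3 --> path_dst`, the walk visits three transform entries and returns `path_dst` without ever reading its `child` member.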
  32. 11 Sep, 2017 1 commit
  33. 11 Aug, 2017 1 commit
    • net: xfrm: support setting an output mark. · 077fbac4
      Lorenzo Colitti committed
      On systems that use mark-based routing it may be necessary for
      routing lookups to use marks in order for packets to be routed
      correctly. An example of such a system is Android, which uses
      socket marks to route packets via different networks.
      
      Currently, routing lookups in tunnel mode always use a mark of
      zero, making routing incorrect on such systems.
      
      This patch adds a new output_mark element to the xfrm state and
      a corresponding XFRMA_OUTPUT_MARK netlink attribute. The output
      mark differs from the existing xfrm mark in two ways:
      
      1. The xfrm mark is used to match xfrm policies and states, while
         the xfrm output mark is used to set the mark (and influence
         the routing) of the packets emitted by those states.
      2. The existing mark is constrained to be a subset of the bits of
         the originating socket or transformed packet, but the output
         mark is arbitrary and depends only on the state.
      
      The use of a separate mark provides additional flexibility. For
      example:
      
      - A packet subject to two transforms (e.g., transport mode inside
        tunnel mode) can have two different output marks applied to it,
        one for the transport mode SA and one for the tunnel mode SA.
      - On a system where socket marks determine routing, the packets
        emitted by an IPsec tunnel can be routed based on a mark that
        is determined by the tunnel, not by the marks of the
        unencrypted packets.
      - Support for setting the output marks can be introduced without
        breaking any existing setups that employ both mark-based
        routing and xfrm tunnel mode. Simply changing the code to use
        the xfrm mark for routing output packets could change
        behaviour in a way that breaks these setups.
      
      If the output mark is unspecified or set to zero, the mark is not
      set or changed.
      
      Tested: make allyesconfig; make -j64
      Tested: https://android-review.googlesource.com/452776
      Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
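The zero-means-unchanged rule stated in 077fbac4 reduces to a one-line decision. The helper below is a hypothetical illustration, not the kernel code; `apply_output_mark()` and its parameters are assumed names.

```c
#include <stdint.h>

/* Returns the mark an emitted packet should carry: the state's output
 * mark when one is configured, otherwise the packet's existing mark. */
uint32_t apply_output_mark(uint32_t packet_mark, uint32_t output_mark)
{
    return output_mark ? output_mark : packet_mark;
}
```

Because an unset output mark leaves the packet untouched, existing setups that combine mark-based routing with tunnel mode keep their current behaviour.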
  34. 02 Aug, 2017 1 commit
  35. 19 Jul, 2017 2 commits
    • xfrm: add xdst pcpu cache · ec30d78c
      Florian Westphal committed
      Retain the last used xfrm_dst in a pcpu cache.
      On the next request, reuse this dst if the policies are the same.
      
      The cache will not help with strict RR workloads as there is no hit.
      
      The packet-path part of the cache is reasonably small. The notifier part
      is needed so we do not add long hangs when a device is dismantled but some
      pcpu xdst still holds a reference. There are also calls to the flush
      operation when userspace deletes SAs so modules can be removed.
      
      We need to run the dst_release on the correct cpu to avoid races with
      packet path.  This is done by adding a work_struct for each cpu and then
      doing the actual test/release on each affected cpu via schedule_work_on().
      
      Test results using 4 network namespaces and null encryption:
      
      ns1           ns2          -> ns3           -> ns4
      netperf -> xfrm/null enc   -> xfrm/null dec -> netserver
      
      what                    TCP_STREAM      UDP_STREAM      UDP_RR
      Flow cache:             14644.61        294.35          327231.64
      No flow cache:          14349.81        242.64          202301.72
      Pcpu cache:             14629.70        292.21          205595.22
      
      UDP tests used 64byte packets, tests ran for one minute each,
      value is average over ten iterations.
      
      'Flow cache' is 'net-next', 'No flow cache' is net-next plus this
      series but without this patch.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xfrm: remove flow cache · 09c75704
      Florian Westphal committed
      After the rcu conversions, the performance degradation in forward tests
      isn't that noticeable anymore.
      
      See next patch for some numbers.
      
      A followup patch could then also remove genid from the policies
      as we do not cache bundles anymore.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>