提交 · 7c76ecd9c99b6e9a771d813ab1aa7fa428b3ade1 · openeuler / Kernel

09 2月, 2022 1 次提交

xfrm: enforce validity of offload input flags · 7c76ecd9

由 Leon Romanovsky 提交于 2月 08, 2022

struct xfrm_user_offload has flags variable that received user input,
but kernel didn't check if valid bits were provided. It caused a situation
where not sanitized input was forwarded directly to the drivers.

For example, XFRM_OFFLOAD_IPV6 define that was exposed, was used by
strongswan, but not implemented in the kernel at all.

As a solution, check and sanitize input flags to forward
XFRM_OFFLOAD_INBOUND to the drivers.

Fixes: d77e38e6 ("xfrm: Add an IPsec hardware offloading API")
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

7c76ecd9

11 12月, 2021 1 次提交

xfrm: add net device refcount tracker to struct xfrm_state_offload · e1b539bd

由 Eric Dumazet 提交于 12月 09, 2021

Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NSteffen Klassert <steffen.klassert@secunet.com>
Link: https://lore.kernel.org/r/20211209154451.4184050-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

e1b539bd

22 6月, 2021 1 次提交

xfrm: Fix xfrm offload fallback fail case · dd72fadf

由 Ayush Sawal 提交于 6月 22, 2021

In case of xfrm offload, if xdo_dev_state_add() of driver returns
-EOPNOTSUPP, xfrm offload fallback is failed.
In xfrm state_add() both xso->dev and xso->real_dev are initialized to
dev and when err(-EOPNOTSUPP) is returned only xso->dev is set to null.

So in this scenario the condition in func validate_xmit_xfrm(),
if ((x->xso.dev != dev) && (x->xso.real_dev == dev))
                return skb;
returns true, due to which skb is returned without calling esp_xmit()
below which has fallback code. Hence the CRYPTO_FALLBACK is failing.

So fixing this with by keeping x->xso.real_dev as NULL when err is
returned in func xfrm_dev_state_add().

Fixes: bdfd2d1f ("bonding/xfrm: use real_dev instead of slave_dev")
Signed-off-by: NAyush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

dd72fadf

29 3月, 2021 1 次提交

xfrm: Provide private skb extensions for segmented and hw offloaded ESP packets · c7dbf4c0

由 Steffen Klassert 提交于 3月 26, 2021

Commit 94579ac3 ("xfrm: Fix double ESP trailer insertion in IPsec
crypto offload.") added a XFRM_XMIT flag to avoid duplicate ESP trailer
insertion on HW offload. This flag is set on the secpath that is shared
amongst segments. This lead to a situation where some segments are
not transformed correctly when segmentation happens at layer 3.

Fix this by using private skb extensions for segmented and hw offloaded
ESP packets.

Fixes: 94579ac3 ("xfrm: Fix double ESP trailer insertion in IPsec crypto offload.")
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

c7dbf4c0

24 6月, 2020 1 次提交

bonding/xfrm: use real_dev instead of slave_dev · bdfd2d1f

由 Jarod Wilson 提交于 6月 23, 2020

Rather than requiring every hw crypto capable NIC driver to do a check for
slave_dev being set, set real_dev in the xfrm layer and xso init time, and
then override it in the bonding driver as needed. Then NIC drivers can
always use real_dev, and at the same time, we eliminate the use of a
variable name that probably shouldn't have been used in the first place,
particularly given recent current events.

CC: Boris Pismenny <borisp@mellanox.com>
CC: Saeed Mahameed <saeedm@mellanox.com>
CC: Leon Romanovsky <leon@kernel.org>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Jakub Kicinski <kuba@kernel.org>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: netdev@vger.kernel.org
Suggested-by: NSaeed Mahameed <saeedm@mellanox.com>
Signed-off-by: NJarod Wilson <jarod@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bdfd2d1f

23 6月, 2020 1 次提交

xfrm: bail early on slave pass over skb · 272c2330

由 Jarod Wilson 提交于 6月 19, 2020

This is prep work for initial support of bonding hardware encryption
pass-through support. The bonding driver will fill in the slave_dev
pointer, and we use that to know not to skb_push() again on a given
skb that was already processed on the bond device.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Jakub Kicinski <kuba@kernel.org>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: netdev@vger.kernel.org
CC: intel-wired-lan@lists.osuosl.org
Signed-off-by: NJarod Wilson <jarod@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

272c2330

04 6月, 2020 1 次提交

xfrm: Fix double ESP trailer insertion in IPsec crypto offload. · 94579ac3

由 Huy Nguyen 提交于 6月 01, 2020

During IPsec performance testing, we see bad ICMP checksum. The error packet
has duplicated ESP trailer due to double validate_xmit_xfrm calls. The first call
is from ip_output, but the packet cannot be sent because
netif_xmit_frozen_or_stopped is true and the packet gets dev_requeue_skb. The second
call is from NET_TX softirq. However after the first call, the packet already
has the ESP trailer.

Fix by marking the skb with XFRM_XMIT bit after the packet is handled by
validate_xmit_xfrm to avoid duplicate ESP trailer insertion.

Fixes: f6e27114 ("net: Add a xfrm validate function to validate_xmit_skb")
Signed-off-by: NHuy Nguyen <huyn@mellanox.com>
Reviewed-by: NBoris Pismenny <borisp@mellanox.com>
Reviewed-by: NRaed Salem <raeds@mellanox.com>
Reviewed-by: NSaeed Mahameed <saeedm@mellanox.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

94579ac3

15 4月, 2020 1 次提交

xfrm: do pskb_pull properly in __xfrm_transport_prep · 06a0afcf

由 Xin Long 提交于 4月 10, 2020

For transport mode, when ipv6 nexthdr is set, the packet format might
be like:

    ----------------------------------------------------
    |        | dest |     |     |      |  ESP    | ESP |
    | IP6 hdr| opts.| ESP | TCP | Data | Trailer | ICV |
    ----------------------------------------------------

and in __xfrm_transport_prep():

  pskb_pull(skb, skb->mac_len + sizeof(ip6hdr) + x->props.header_len);

it will pull the data pointer to the wrong position, as it missed the
nexthdrs/dest opts.

This patch is to fix it by using:

  pskb_pull(skb, skb_transport_offset(skb) + x->props.header_len);

as we can be sure transport_header points to ESP header at that moment.

It also fixes a panic when packets with ipv6 nexthdr are sent over
esp6 transport mode:

  [  100.473845] kernel BUG at net/core/skbuff.c:4325!
  [  100.478517] RIP: 0010:__skb_to_sgvec+0x252/0x260
  [  100.494355] Call Trace:
  [  100.494829]  skb_to_sgvec+0x11/0x40
  [  100.495492]  esp6_output_tail+0x12e/0x550 [esp6]
  [  100.496358]  esp6_xmit+0x1d5/0x260 [esp6_offload]
  [  100.498029]  validate_xmit_xfrm+0x22f/0x2e0
  [  100.499604]  __dev_queue_xmit+0x589/0x910
  [  100.502928]  ip6_finish_output2+0x2a5/0x5a0
  [  100.503718]  ip6_output+0x6c/0x120
  [  100.505198]  xfrm_output_resume+0x4bf/0x530
  [  100.508683]  xfrm6_output+0x3a/0xc0
  [  100.513446]  inet6_csk_xmit+0xa1/0xf0
  [  100.517335]  tcp_sendmsg+0x27/0x40
  [  100.517977]  sock_sendmsg+0x3e/0x60
  [  100.518648]  __sys_sendto+0xee/0x160

Fixes: c35fe410 ("xfrm: Add mode handlers for IPsec on layer 2")
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

06a0afcf

26 3月, 2020 1 次提交

xfrm: add prep for esp beet mode offload · 30849175

由 Xin Long 提交于 3月 26, 2020

Like __xfrm_transport/mode_tunnel_prep(), this patch is to add
__xfrm_mode_beet_prep() to fix the transport_header for gso
segments, and reset skb mac_len, and pull skb data to the
proto inside esp.

This patch also fixes a panic, reported by ltp:

  # modprobe esp4_offload
  # runltp -f net_stress.ipsec_tcp

  [ 2452.780511] kernel BUG at net/core/skbuff.c:109!
  [ 2452.799851] Call Trace:
  [ 2452.800298]  <IRQ>
  [ 2452.800705]  skb_push.cold.98+0x14/0x20
  [ 2452.801396]  esp_xmit+0x17b/0x270 [esp4_offload]
  [ 2452.802799]  validate_xmit_xfrm+0x22f/0x2e0
  [ 2452.804285]  __dev_queue_xmit+0x589/0x910
  [ 2452.806264]  __neigh_update+0x3d7/0xa50
  [ 2452.806958]  arp_process+0x259/0x810
  [ 2452.807589]  arp_rcv+0x18a/0x1c

It was caused by the skb going to esp_xmit with a wrong transport
header.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

30849175

04 3月, 2020 1 次提交

esp: remove the skb from the chain when it's enqueued in cryptd_wq · d1d17a35

由 Xin Long 提交于 3月 04, 2020

Xiumei found a panic in esp offload:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
  RIP: 0010:esp_output_done+0x101/0x160 [esp4]
  Call Trace:
   ? esp_output+0x180/0x180 [esp4]
   cryptd_aead_crypt+0x4c/0x90
   cryptd_queue_worker+0x6e/0xa0
   process_one_work+0x1a7/0x3b0
   worker_thread+0x30/0x390
   ? create_worker+0x1a0/0x1a0
   kthread+0x112/0x130
   ? kthread_flush_work_fn+0x10/0x10
   ret_from_fork+0x35/0x40

It was caused by that skb secpath is used in esp_output_done() after it's
been released elsewhere.

The tx path for esp offload is:

  __dev_queue_xmit()->
    validate_xmit_skb_list()->
      validate_xmit_xfrm()->
        esp_xmit()->
          esp_output_tail()->
            aead_request_set_callback(esp_output_done) <--[1]
            crypto_aead_encrypt()  <--[2]

In [1], .callback is set, and in [2] it will trigger the worker schedule,
later on a kernel thread will call .callback(esp_output_done), as the call
trace shows.

But in validate_xmit_xfrm():

  skb_list_walk_safe(skb, skb2, nskb) {
    ...
    err = x->type_offload->xmit(x, skb2, esp_features);  [esp_xmit]
    ...
  }

When the err is -EINPROGRESS, which means this skb2 will be enqueued and
later gets encrypted and sent out by .callback later in a kernel thread,
skb2 should be removed fromt skb chain. Otherwise, it will get processed
again outside validate_xmit_xfrm(), which could release skb secpath, and
cause the panic above.

This patch is to remove the skb from the chain when it's enqueued in
cryptd_wq. While at it, remove the unnecessary 'if (!skb)' check.

Fixes: 3dca3f38 ("xfrm: Separate ESP handling from segmentation for GRO packets.")
Reported-by: NXiumei Mu <xmu@redhat.com>
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

d1d17a35

04 2月, 2020 1 次提交

xfrm: handle NETDEV_UNREGISTER for xfrm device · 03891f82

由 Raed Salem 提交于 2月 02, 2020

This patch to handle the asynchronous unregister
device event so the device IPsec offload resources
could be cleanly released.

Fixes: e4db5b61 ("xfrm: policy: remove pcpu policy cache")
Signed-off-by: NRaed Salem <raeds@mellanox.com>
Reviewed-by: NBoris Pismenny <borisp@mellanox.com>
Reviewed-by: NSaeed Mahameed <saeedm@mellanox.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

03891f82

15 1月, 2020 1 次提交

net: xfrm: use skb_list_walk_safe helper for gso segments · c3b18e0d

由 Jason A. Donenfeld 提交于 1月 13, 2020

This is converts xfrm segment iteration to use the new function, keeping
the flow of the existing code as intact as possible. One case is very
straight-forward, whereas the other case has some more subtle code that
likes to peak at ->next and relink skbs. By keeping the variables the
same as before, we can upgrade this code with minimal surgery required.
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c3b18e0d

01 7月, 2019 1 次提交

xfrm: remove get_mtu indirection from xfrm_type · c7b37c76

由 Florian Westphal 提交于 6月 24, 2019

esp4_get_mtu and esp6_get_mtu are exactly the same, the only difference
is a single sizeof() (ipv4 vs. ipv6 header).

Merge both into xfrm_state_mtu() and remove the indirection.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

c7b37c76

31 5月, 2019 1 次提交

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 · 2874c5fd

由 Thomas Gleixner 提交于 5月 27, 2019

Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NAllison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

2874c5fd

08 4月, 2019 2 次提交

xfrm: store xfrm_mode directly, not its address · c9500d7b

由 Florian Westphal 提交于 3月 29, 2019

This structure is now only 4 bytes, so its more efficient
to cache a copy rather than its address.

No significant size difference in allmodconfig vmlinux.

With non-modular kernel that has all XFRM options enabled, this
series reduces vmlinux image size by ~11kb. All xfrm_mode
indirections are gone and all modes are built-in.

before (ipsec-next master):
    text      data      bss         dec   filename
21071494   7233140 11104324    39408958   vmlinux.master

after this series:
21066448   7226772 11104324    39397544   vmlinux.patched

With allmodconfig kernel, the size increase is only 362 bytes,
even all the xfrm config options removed in this series are
modular.

before:
    text      data     bss      dec   filename
15731286   6936912 4046908 26715106   vmlinux.master

after this series:
15731492   6937068  4046908  26715468 vmlinux
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

c9500d7b

xfrm: remove xmit indirection from xfrm_mode · 303c5fab

由 Florian Westphal 提交于 3月 29, 2019

There are only two versions (tunnel and transport). The ip/ipv6 versions
are only differ in sizeof(iphdr) vs ipv6hdr.

Place this in the core and use x->outer_mode->encap type to call the
correct adjustment helper.

Before:
   text   data    bss     dec      filename
15730311  6937008 4046908 26714227 vmlinux

After:
15730428  6937008 4046908 26714344 vmlinux

(about 117 byte increase)

v2: use family from x->outer_mode, not inner
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

303c5fab

24 3月, 2019 1 次提交

xfrm: gso partial offload support · 65fd2c2a

由 Boris Pismenny 提交于 3月 21, 2019

This patch introduces support for gso partial ESP offload.
Signed-off-by: NBoris Pismenny <borisp@mellanox.com>
Signed-off-by: NRaed Salem <raeds@mellanox.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

65fd2c2a

21 3月, 2019 1 次提交

net: dev: rename queue selection helpers. · 4bd97d51

由 Paolo Abeni 提交于 3月 20, 2019

With the following patches, we are going to use __netdev_pick_tx() in
many modules. Rename it to netdev_pick_tx(), to make it clear is
a public API.

Also rename the existing netdev_pick_tx() to netdev_core_pick_tx(),
to avoid name clashes.
Suggested-by: NEric Dumazet <edumazet@google.com>
Suggested-by: NDavid Miller <davem@davemloft.net>
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4bd97d51

20 12月, 2018 1 次提交

net: use skb_sec_path helper in more places · 2294be0f

由 Florian Westphal 提交于 12月 18, 2018

skb_sec_path gains 'const' qualifier to avoid
xt_policy.c: 'skb_sec_path' discards 'const' qualifier from pointer target type

same reasoning as previous conversions: Won't need to touch these
spots anymore when skb->sp is removed.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2294be0f

11 9月, 2018 1 次提交

net: Add and use skb_mark_not_on_list(). · a8305bff

由 David S. Miller 提交于 7月 29, 2018

An SKB is not on a list if skb->next is NULL.

Codify this convention into a helper function and use it
where we are dequeueing an SKB and need to mark it as such.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a8305bff

29 8月, 2018 1 次提交

xfrm: allow driver to quietly refuse offload · 4a132095

由 Shannon Nelson 提交于 8月 22, 2018

If the "offload" attribute is used to create an IPsec SA
and the .xdo_dev_state_add() fails, the SA creation fails.
However, if the "offload" attribute is used on a device that
doesn't offer it, the attribute is quietly ignored and the SA
is created without an offload.

Along the same line of that second case, it would be good to
have a way for the device to refuse to offload an SA without
failing the whole SA creation.  This patch adds that feature
by allowing the driver to return -EOPNOTSUPP as a signal that
the SA may be fine, it just can't be offloaded.

This allows the user a little more flexibility in requesting
offloads and not needing to know every detail at all times about
each specific NIC when trying to create SAs.
Signed-off-by: NShannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

4a132095

19 7月, 2018 1 次提交

xfrm: don't check offload_handle for nonzero · fcb662de

由 Shannon Nelson 提交于 6月 26, 2018

The offload_handle should be an opaque data cookie for the driver
to use, much like the data cookie for a timer or alarm callback.
Thus, the XFRM stack should not be checking for non-zero, because
the driver might use that to store an array reference, which could
be zero, or some other zero but meaningful value.

We can remove the checks for non-zero because there are plenty
other attributes also being checked to see if there is an offload
in place for the SA in question.
Signed-off-by: NShannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

fcb662de

25 6月, 2018 1 次提交

xfrm: policy: remove pcpu policy cache · e4db5b61

由 Florian Westphal 提交于 6月 25, 2018

Kristian Evensen says:
  In a project I am involved in, we are running ipsec (Strongswan) on
  different mt7621-based routers. Each router is configured as an
  initiator and has around ~30 tunnels to different responders (running
  on misc. devices). Before the flow cache was removed (kernel 4.9), we
  got a combined throughput of around 70Mbit/s for all tunnels on one
  router. However, we recently switched to kernel 4.14 (4.14.48), and
  the total throughput is somewhere around 57Mbit/s (best-case). I.e., a
  drop of around 20%. Reverting the flow cache removal restores, as
  expected, performance levels to that of kernel 4.9.

When pcpu xdst exists, it has to be validated first before it can be
used.

A negative hit thus increases cost vs. no-cache.

As number of tunnels increases, hit rate decreases so this pcpu caching
isn't a viable strategy.

Furthermore, the xdst cache also needs to run with BH off, so when
removing this the bh disable/enable pairs can be removed too.

Kristian tested a 4.14.y backport of this change and reported
increased performance:

  In our tests, the throughput reduction has been reduced from around -20%
  to -5%. We also see that the overall throughput is independent of the
  number of tunnels, while before the throughput was reduced as the number
  of tunnels increased.
Reported-by: NKristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

e4db5b61

23 6月, 2018 1 次提交

xfrm: Extend the output_mark to support input direction and masking. · 9b42c1f1

由 Steffen Klassert 提交于 6月 12, 2018

We already support setting an output mark at the xfrm_state,
unfortunately this does not support the input direction and
masking the marks that will be applied to the skb. This change
adds support applying a masked value in both directions.

The existing XFRMA_OUTPUT_MARK number is reused for this purpose
and as it is now bi-directional, it is renamed to XFRMA_SET_MARK.

An additional XFRMA_SET_MARK_MASK attribute is added for setting the
mask. If the attribute mask not provided, it is set to 0xffffffff,
keeping the XFRMA_OUTPUT_MARK existing 'full mask' semantics.
Co-developed-by: NTobias Brunner <tobias@strongswan.org>
Co-developed-by: NEyal Birger <eyal.birger@gmail.com>
Co-developed-by: NLorenzo Colitti <lorenzo@google.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NTobias Brunner <tobias@strongswan.org>
Signed-off-by: NEyal Birger <eyal.birger@gmail.com>
Signed-off-by: NLorenzo Colitti <lorenzo@google.com>

9b42c1f1

30 3月, 2018 1 次提交

xfrm: Register xfrm_dev_notifier in appropriate place · e9a441b6

由 Kirill Tkhai 提交于 3月 29, 2018

Currently, driver registers it from pernet_operations::init method,
and this breaks modularity, because initialization of net namespace
and netdevice notifiers are orthogonal actions. We don't have
per-namespace netdevice notifiers; all of them are global for all
devices in all namespaces.
Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e9a441b6

05 3月, 2018 1 次提交

net: rename skb_gso_validate_mtu -> skb_gso_validate_network_len · 779b7931

由 Daniel Axtens 提交于 3月 01, 2018

If you take a GSO skb, and split it into packets, will the network
length (L3 headers + L4 headers + payload) of those packets be small
enough to fit within a given MTU?

skb_gso_validate_mtu gives you the answer to that question. However,
we recently added to add a way to validate the MAC length of a split GSO
skb (L2+L3+L4+payload), and the names get confusing, so rename
skb_gso_validate_mtu to skb_gso_validate_network_len
Signed-off-by: NDaniel Axtens <dja@axtens.net>
Reviewed-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

779b7931

19 1月, 2018 1 次提交

xfrm: fix error flow in case of add state fails · aa5dd6fa

由 Aviad Yehezkel 提交于 1月 18, 2018

If add state fails in case of device offload, netdev refcount
will be negative since gc task is attempting to dev_free this state.
This is fixed by putting NULL in state dev field.
Signed-off-by: NAviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: NBoris Pismeny <borisp@mellanox.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

aa5dd6fa

18 1月, 2018 1 次提交

xfrm: Add ESN support for IPSec HW offload · 50bd870a

由 Yossef Efraim 提交于 1月 14, 2018

This patch adds ESN support to IPsec device offload.
Adding new xfrm device operation to synchronize device ESN.
Signed-off-by: NYossef Efraim <yossefe@mellanox.com>
Signed-off-by: NShannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

50bd870a

21 12月, 2017 1 次提交

xfrm: check for xdo_dev_ops add and delete · 92a23206

由 Shannon Nelson 提交于 12月 19, 2017

This adds a check for the required add and delete functions up front
at registration time to be sure both are defined.

Since both the features check and the registration check are looking
at the same things, break out the check for both to call.

Lastly, for some reason the feature check was setting xfrmdev_ops to
NULL if the NETIF_F_HW_ESP bit was missing, which would probably
surprise the driver later if the driver turned its NETIF_F_HW_ESP bit
back on.  We shouldn't be messing with the driver's callback list, so
we stop doing that with this patch.
Signed-off-by: NShannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

92a23206

20 12月, 2017 3 次提交

xfrm: Allow to use the layer2 IPsec GSO codepath for software crypto. · 95bff4b5

由 Steffen Klassert 提交于 12月 20, 2017

We now have support for asynchronous crypto operations in the layer 2 TX
path. This was the missing part to allow the GSO codepath for software
crypto, so allow this codepath now.
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

95bff4b5

net: Add asynchronous callbacks for xfrm on layer 2. · f53c7239

由 Steffen Klassert 提交于 12月 20, 2017

This patch implements asynchronous crypto callbacks
and a backlog handler that can be used when IPsec
is done at layer 2 in the TX path. It also extends
the skb validate functions so that we can update
the driver transmit return codes based on async
crypto operation or to indicate that we queued the
packet in a backlog queue.

Joint work with: Aviv Heller <avivh@mellanox.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

f53c7239

xfrm: Separate ESP handling from segmentation for GRO packets. · 3dca3f38

由 Steffen Klassert 提交于 12月 20, 2017

We change the ESP GSO handlers to only segment the packets.
The ESP handling and encryption is defered to validate_xmit_xfrm()
where this is done for non GRO packets too. This makes the code
more robust and prepares for asynchronous crypto handling.
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

3dca3f38

01 12月, 2017 1 次提交

xfrm: Fix xfrm_dev_state_add to fail for unsupported HW SA option · 43024b9c

由 Yossef Efraim 提交于 11月 28, 2017

xfrm_dev_state_add function returns success for unsupported HW SA options.
Resulting the calling function to create SW SA without corrlating HW SA.
Desipte IPSec device offloading option was chosen.
These not supported HW SA options are hard coded within xfrm_dev_state_add
function.
SW backward compatibility will break if we add any of these option as old
HW will fail with new SW.

This patch changes the behaviour to return -EINVAL in case unsupported
option is chosen.
Notifying user application regarding failure and not breaking backward
compatibility for newly added HW SA options.
Signed-off-by: NYossef Efraim <yossefe@mellanox.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

43024b9c

30 11月, 2017 2 次提交

xfrm: Move dst->path into struct xfrm_dst · 0f6c480f

由 David Miller 提交于 11月 28, 2017

The first member of an IPSEC route bundle chain sets it's dst->path to
the underlying ipv4/ipv6 route that carries the bundle.

Stated another way, if one were to follow the xfrm_dst->child chain of
the bundle, the final non-NULL pointer would be the path and point to
either an ipv4 or an ipv6 route.

This is largely used to make sure that PMTU events propagate down to
the correct ipv4 or ipv6 route.

When we don't have the top of an IPSEC bundle 'dst->path == dst'.

Move it down into xfrm_dst and key off of dst->xfrm.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Reviewed-by: NEric Dumazet <edumazet@google.com>

0f6c480f

xfrm: Move child route linkage into xfrm_dst. · b6ca8bd5

由 David Miller 提交于 11月 28, 2017

XFRM bundle child chains look like this:

	xdst1 --> xdst2 --> xdst3 --> path_dst

All of xdstN are xfrm_dst objects and xdst->u.dst.xfrm is non-NULL.
The final child pointer in the chain, here called 'path_dst', is some
other kind of route such as an ipv4 or ipv6 one.

The xfrm output path pops routes, one at a time, via the child
pointer, until we hit one which has a dst->xfrm pointer which
is NULL.

We can easily preserve the above mechanisms with child sitting
only in the xfrm_dst structure.  All children in the chain
before we break out of the xfrm_output() loop have dst->xfrm
non-NULL and are therefore xfrm_dst objects.

Since we break out of the loop when we find dst->xfrm NULL, we
will not try to dereference 'dst' as if it were an xfrm_dst.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b6ca8bd5

11 9月, 2017 1 次提交

xfrm: Fix negative device refcount on offload failure. · 67a63387

由 Steffen Klassert 提交于 9月 04, 2017

Reset the offload device at the xfrm_state if the device was
not able to offload the state. Otherwise we drop the device
refcount twice.

Fixes: d77e38e6 ("xfrm: Add an IPsec hardware offloading API")
Reported-by: NShannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

67a63387

11 8月, 2017 1 次提交

net: xfrm: support setting an output mark. · 077fbac4

由 Lorenzo Colitti 提交于 8月 11, 2017

On systems that use mark-based routing it may be necessary for
routing lookups to use marks in order for packets to be routed
correctly. An example of such a system is Android, which uses
socket marks to route packets via different networks.

Currently, routing lookups in tunnel mode always use a mark of
zero, making routing incorrect on such systems.

This patch adds a new output_mark element to the xfrm state and
a corresponding XFRMA_OUTPUT_MARK netlink attribute. The output
mark differs from the existing xfrm mark in two ways:

1. The xfrm mark is used to match xfrm policies and states, while
the xfrm output mark is used to set the mark (and influence
the routing) of the packets emitted by those states.
2. The existing mark is constrained to be a subset of the bits of
the originating socket or transformed packet, but the output
mark is arbitrary and depends only on the state.

The use of a separate mark provides additional flexibility. For
example:

- A packet subject to two transforms (e.g., transport mode inside
tunnel mode) can have two different output marks applied to it,
one for the transport mode SA and one for the tunnel mode SA.
- On a system where socket marks determine routing, the packets
emitted by an IPsec tunnel can be routed based on a mark that
is determined by the tunnel, not by the marks of the
unencrypted packets.
- Support for setting the output marks can be introduced without
breaking any existing setups that employ both mark-based
routing and xfrm tunnel mode. Simply changing the code to use
the xfrm mark for routing output packets could xfrm mark could
change behaviour in a way that breaks these setups.

If the output mark is unspecified or set to zero, the mark is not
set or changed.

Tested: make allyesconfig; make -j64
Tested: https://android-review.googlesource.com/452776Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

077fbac4

02 8月, 2017 1 次提交

xfrm: Auto-load xfrm offload modules · ffdb5211

由 Ilan Tayari 提交于 8月 01, 2017

IPSec crypto offload depends on the protocol-specific
offload module (such as esp_offload.ko).

When the user installs an SA with crypto-offload, load
the offload module automatically, in the same way
that the protocol module is loaded (such as esp.ko)
Signed-off-by: NIlan Tayari <ilant@mellanox.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

ffdb5211

19 7月, 2017 2 次提交

xfrm: add xdst pcpu cache · ec30d78c

由 Florian Westphal 提交于 7月 17, 2017

retain last used xfrm_dst in a pcpu cache.
On next request, reuse this dst if the policies are the same.

The cache will not help with strict RR workloads as there is no hit.

The cache packet-path part is reasonably small, the notifier part is
needed so we do not add long hangs when a device is dismantled but some
pcpu xdst still holds a reference, there are also calls to the flush
operation when userspace deletes SAs so modules can be removed
(there is no hit.

We need to run the dst_release on the correct cpu to avoid races with
packet path.  This is done by adding a work_struct for each cpu and then
doing the actual test/release on each affected cpu via schedule_work_on().

Test results using 4 network namespaces and null encryption:

ns1           ns2          -> ns3           -> ns4
netperf -> xfrm/null enc   -> xfrm/null dec -> netserver

what                    TCP_STREAM      UDP_STREAM      UDP_RR
Flow cache:             14644.61        294.35          327231.64
No flow cache:		14349.81	242.64		202301.72
Pcpu cache:		14629.70	292.21		205595.22

UDP tests used 64byte packets, tests ran for one minute each,
value is average over ten iterations.

'Flow cache' is 'net-next', 'No flow cache' is net-next plus this
series but without this patch.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ec30d78c

xfrm: remove flow cache · 09c75704

由 Florian Westphal 提交于 7月 17, 2017

After rcu conversions performance degradation in forward tests isn't that
noticeable anymore.

See next patch for some numbers.

A followup patcg could then also remove genid from the policies
as we do not cache bundles anymore.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

09c75704

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功