1. 25 Oct, 2022 1 commit
  2. 03 Aug, 2022 1 commit
  3. 25 May, 2022 1 commit
    • xfrm: do not set IPv4 DF flag when encapsulating IPv6 frames <= 1280 bytes. · 6821ad87
      Maciej Żenczykowski committed
      One may want to have DF set on large packets to support discovering
      path mtu and limiting the size of generated packets (hence not
      setting the XFRM_STATE_NOPMTUDISC tunnel flag), while still
      supporting networks that are incapable of carrying even minimal
      sized IPv6 frames (post encapsulation).
      
      Having the IPv4 Don't Fragment bit set on encapsulated IPv6 frames
      that are not larger than the minimum IPv6 MTU of 1280 isn't useful:
      the resulting ICMP Fragmentation Required error isn't actionable
      (even assuming you receive it), since IPv6 will not drop its path
      MTU below 1280 anyway.  While the IPv4 stack could prefragment the
      packets post encapsulation, this requires the ICMP error to be
      successfully delivered and causes a loss of the original IPv6 frame
      (thus requiring a retransmit and a latency hit).  Luckily, with
      IPv4, if we simply don't set the DF flag, further fragmenting the
      packets just becomes some other router's problem.
      
      We'll still learn the correct IPv4 path mtu through encapsulation
      of larger IPv6 frames.
      
      I'm still not convinced this patch is entirely sufficient to make
      everything happy... but I don't see how it could possibly
      make things worse.
      
      See also recent:
        4ff2980b 'xfrm: fix tunnel model fragmentation behavior'
      and friends
      
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Lina Wang <lina.wang@mediatek.com>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Maciej Zenczykowski <maze@google.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
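The DF decision described in this commit message can be sketched in userspace C. This is a minimal sketch, not the kernel code: `outer_df_bit`, its parameters, and `SKETCH_IPV6_MIN_MTU` are hypothetical names introduced for illustration. Only the 1280-byte minimum IPv6 MTU and the role of the NOPMTUDISC flag come from the message; falling back to an `inner_df` value in the remaining case is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimum IPv6 MTU named in the commit message. */
#define SKETCH_IPV6_MIN_MTU 1280

/*
 * Hypothetical helper (not a kernel function): should the outer IPv4
 * header carry the Don't Fragment bit?
 *   nopmtudisc    - the state has the NOPMTUDISC tunnel flag set
 *   inner_is_ipv6 - the frame being encapsulated is IPv6
 *   inner_len     - length of the inner frame in bytes
 *   inner_df      - DF value that would otherwise be used (assumption)
 */
static bool outer_df_bit(bool nopmtudisc, bool inner_is_ipv6,
                         size_t inner_len, bool inner_df)
{
    bool small_ipv6 = inner_is_ipv6 && inner_len <= SKETCH_IPV6_MIN_MTU;

    if (nopmtudisc || small_ipv6)
        return false;   /* routers on the path may fragment if needed */
    return inner_df;    /* assumption: otherwise keep the usual choice */
}

/* Usage example: the cases discussed in the commit message. */
static void outer_df_bit_examples(void)
{
    assert(!outer_df_bit(false, true, 1280, true)); /* small IPv6: no DF */
    assert(outer_df_bit(false, true, 1281, true));  /* large IPv6: DF kept */
    assert(!outer_df_bit(true, true, 1500, true));  /* NOPMTUDISC: no DF */
}
```

Larger IPv6 frames still carry DF in this sketch, which matches the message's point that the IPv4 path MTU is still learned from them.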
  4. 05 Jan, 2022 1 commit
  5. 23 Dec, 2021 1 commit
  6. 23 Jun, 2021 1 commit
  7. 21 Jun, 2021 1 commit
  8. 16 Jun, 2021 1 commit
  9. 11 Jun, 2021 5 commits
  10. 01 Jun, 2021 1 commit
    • xfrm: remove the fragment check for ipv6 beet mode · eebd49a4
      Xin Long committed
      Commit 68dc022d ("xfrm: BEET mode doesn't support fragments
      for inner packets") tried to fix the issue that on the TX side the
      packet is fragmented before ESP encapsulation, while on the RX side
      the fragments always get reassembled before ESP decapsulation.
      
      This is not true for IPv6. IPv6 is different: it uses extension
      headers to carry the fragment info as well as the ESP info.
      Extension headers are added on TX and processed on RX, both in
      order. So in the above case, ESP decapsulation is done before
      fragment reassembly on the RX side.
      
      So just remove the fragment check for IPv6 inner packets to
      restore fragment support in BEET mode.
      
      Fixes: 68dc022d ("xfrm: BEET mode doesn't support fragments for inner packets")
      Reported-by: Xiumei Mu <xmu@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
  11. 24 Mar, 2021 1 commit
  12. 03 Mar, 2021 1 commit
    • xfrm: Use actual socket sk instead of skb socket for xfrm_output_resume · 9ab1265d
      Evan Nimmo committed
      A situation can occur where the interface bound to the sk differs
      from the interface bound to the sk attached to the skb. The interface
      bound to the sk is the correct one; however, this information is lost
      inside xfrm_output2, and the sk on the skb is used in
      xfrm_output_resume instead. This assumes that the two bound interfaces
      are the same, which can lead to lookup failures inside
      ip_route_me_harder, resulting in the packet being dropped.
      
      We have an l2tp v3 tunnel with ipsec protection. The tunnel is in the
      global VRF; however, we have an encapsulated dot1q tunnel interface
      that is within a different VRF. We also have a mangle rule that marks
      the packets, causing them to be processed inside ip_route_me_harder.
      
      Prior to commit 31c70d59 ("l2tp: keep original skb ownership") this
      worked fine, as the sk attached to the skb was changed from the dot1q
      encapsulated interface to the sk for the tunnel, which meant the
      interface bound to the sk and the interface bound to the skb were
      identical. Commit 46d6c5ae ("netfilter: use actual socket sk rather
      than skb sk when routing harder") fixed some of these issues; however,
      a similar problem existed in the xfrm code.
      
      Fixes: 31c70d59 ("l2tp: keep original skb ownership")
      Signed-off-by: Evan Nimmo <evan.nimmo@alliedtelesis.co.nz>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
  13. 05 Jun, 2020 1 commit
  14. 29 May, 2020 1 commit
    • xfrm: fix a NULL-ptr deref in xfrm_local_error · f6a23d85
      Xin Long committed
      This patch is to fix a crash:
      
        [ ] kasan: GPF could be caused by NULL-ptr deref or user memory access
        [ ] general protection fault: 0000 [#1] SMP KASAN PTI
        [ ] RIP: 0010:ipv6_local_error+0xac/0x7a0
        [ ] Call Trace:
        [ ]  xfrm6_local_error+0x1eb/0x300
        [ ]  xfrm_local_error+0x95/0x130
        [ ]  __xfrm6_output+0x65f/0xb50
        [ ]  xfrm6_output+0x106/0x46f
        [ ]  udp_tunnel6_xmit_skb+0x618/0xbf0 [ip6_udp_tunnel]
        [ ]  vxlan_xmit_one+0xbc6/0x2c60 [vxlan]
        [ ]  vxlan_xmit+0x6a0/0x4276 [vxlan]
        [ ]  dev_hard_start_xmit+0x165/0x820
        [ ]  __dev_queue_xmit+0x1ff0/0x2b90
        [ ]  ip_finish_output2+0xd3e/0x1480
        [ ]  ip_do_fragment+0x182d/0x2210
        [ ]  ip_output+0x1d0/0x510
        [ ]  ip_send_skb+0x37/0xa0
        [ ]  raw_sendmsg+0x1b4c/0x2b80
        [ ]  sock_sendmsg+0xc0/0x110
      
      This occurred when sending a v4 skb over vxlan6 over ipsec, in which case
      skb->protocol == htons(ETH_P_IPV6) while skb->sk->sk_family == AF_INET in
      xfrm_local_error(). It then goes to xfrm6_local_error(), where it tries
      to get ipv6 info from an ipv4 sk.
      
      This issue was actually fixed by commit 628e341f ("xfrm: make local
      error reporting more robust"), but brought back by commit 844d4874
      ("xfrm: choose protocol family by skb protocol").
      
      So to fix it, we should call xfrm6_local_error() only when skb->protocol
      is htons(ETH_P_IPV6) and skb->sk->sk_family is AF_INET6.
      
      Fixes: 844d4874 ("xfrm: choose protocol family by skb protocol")
      Reported-by: Xiumei Mu <xmu@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
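The family check stated in this commit message can be sketched as a small standalone C function. This is a sketch under stated assumptions, not the kernel implementation: `local_error_family` and the `SKETCH_` constants are hypothetical stand-ins, the protocol value is compared in host byte order for simplicity, and the IPv4 branch is an assumption, since the message only spells out the IPv6 condition.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins so the sketch builds anywhere (illustrative only). */
enum sketch_family {
    SKETCH_AF_NONE  = 0,   /* no handler is called */
    SKETCH_AF_INET  = 2,
    SKETCH_AF_INET6 = 10
};
#define SKETCH_ETH_P_IP   0x0800
#define SKETCH_ETH_P_IPV6 0x86DD

/*
 * Hypothetical helper: choose which local-error handler to call.
 * The IPv6 handler is chosen only when BOTH the packet protocol is IPv6
 * and the owning socket is an IPv6 socket, as the message requires.
 */
static enum sketch_family local_error_family(uint16_t proto, int sk_family)
{
    if (proto == SKETCH_ETH_P_IP)
        return SKETCH_AF_INET;          /* assumption: IPv4 case unchanged */
    if (proto == SKETCH_ETH_P_IPV6 && sk_family == SKETCH_AF_INET6)
        return SKETCH_AF_INET6;
    return SKETCH_AF_NONE;              /* e.g. v4 socket, v6 outer packet */
}

/* Usage example: the crashing case from the message is now skipped. */
static void local_error_family_examples(void)
{
    assert(local_error_family(SKETCH_ETH_P_IPV6, SKETCH_AF_INET)
           == SKETCH_AF_NONE);
    assert(local_error_family(SKETCH_ETH_P_IPV6, SKETCH_AF_INET6)
           == SKETCH_AF_INET6);
}
```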
  15. 06 May, 2020 3 commits
  16. 21 Apr, 2020 1 commit
    • xfrm: call xfrm_output_gso when inner_protocol is set in xfrm_output · a204aef9
      Xin Long committed
      A use-after-free crash can be triggered when sending big packets over
      vxlan over esp with esp offload enabled:
      
        [] BUG: KASAN: use-after-free in ipv6_gso_pull_exthdrs.part.8+0x32c/0x4e0
        [] Call Trace:
        []  dump_stack+0x75/0xa0
        []  kasan_report+0x37/0x50
        []  ipv6_gso_pull_exthdrs.part.8+0x32c/0x4e0
        []  ipv6_gso_segment+0x2c8/0x13c0
        []  skb_mac_gso_segment+0x1cb/0x420
        []  skb_udp_tunnel_segment+0x6b5/0x1c90
        []  inet_gso_segment+0x440/0x1380
        []  skb_mac_gso_segment+0x1cb/0x420
        []  esp4_gso_segment+0xae8/0x1709 [esp4_offload]
        []  inet_gso_segment+0x440/0x1380
        []  skb_mac_gso_segment+0x1cb/0x420
        []  __skb_gso_segment+0x2d7/0x5f0
        []  validate_xmit_skb+0x527/0xb10
        []  __dev_queue_xmit+0x10f8/0x2320 <---
        []  ip_finish_output2+0xa2e/0x1b50
        []  ip_output+0x1a8/0x2f0
        []  xfrm_output_resume+0x110e/0x15f0
        []  __xfrm4_output+0xe1/0x1b0
        []  xfrm4_output+0xa0/0x200
        []  iptunnel_xmit+0x5a7/0x920
        []  vxlan_xmit_one+0x1658/0x37a0 [vxlan]
        []  vxlan_xmit+0x5e4/0x3ec8 [vxlan]
        []  dev_hard_start_xmit+0x125/0x540
        []  __dev_queue_xmit+0x17bd/0x2320  <---
        []  ip6_finish_output2+0xb20/0x1b80
        []  ip6_output+0x1b3/0x390
        []  ip6_xmit+0xb82/0x17e0
        []  inet6_csk_xmit+0x225/0x3d0
        []  __tcp_transmit_skb+0x1763/0x3520
        []  tcp_write_xmit+0xd64/0x5fe0
        []  __tcp_push_pending_frames+0x8c/0x320
        []  tcp_sendmsg_locked+0x2245/0x3500
        []  tcp_sendmsg+0x27/0x40
      
      On the TX path of vxlan over esp, skb->inner_network_header is set
      in both vxlan_xmit() and xfrm4_tunnel_encap_add(), and the latter can
      overwrite the former. This causes skb_udp_tunnel_segment() to use a
      wrong skb->inner_network_header, and then the issue occurs.
      
      This patch fixes it by calling xfrm_output_gso() instead when the
      inner_protocol is set, where gso_segment of the inner_protocol is
      done first.
      
      While at it, also clean up some surrounding code.
      
      Fixes: 7862b405 ("esp: Add gso handlers for esp4 and esp6")
      Reported-by: Xiumei Mu <xmu@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
  17. 30 Mar, 2020 1 commit
  18. 15 Jan, 2020 1 commit
  19. 02 Oct, 2019 1 commit
    • netfilter: drop bridge nf reset from nf_reset · 895b5c9f
      Florian Westphal committed
      Commit 174e2381 ("sk_buff: drop all skb extensions on free and skb
      scrubbing") made napi recycle always drop skb extensions.  The
      additional skb_ext_del() that is performed via nf_reset on napi skb
      recycle is not needed anymore.
      
      Most nf_reset() calls in the stack are there so queued skb won't block
      'rmmod nf_conntrack' indefinitely.
      
      This removes the skb_ext_del from nf_reset, and renames it to a more
      fitting nf_reset_ct().
      
      In a few selected places, add a call to skb_ext_reset to make sure that
      no active extensions remain.
      
      I am submitting this for "net", because we're still early in the release
      cycle.  The patch applies to net-next too, but I think the rename causes
      needless divergence between those trees.

      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  20. 31 May, 2019 1 commit
  21. 08 Apr, 2019 5 commits
    • xfrm: store xfrm_mode directly, not its address · c9500d7b
      Florian Westphal committed
      This structure is now only 4 bytes, so it's more efficient
      to cache a copy rather than its address.
      
      No significant size difference in allmodconfig vmlinux.
      
      With non-modular kernel that has all XFRM options enabled, this
      series reduces vmlinux image size by ~11kb. All xfrm_mode
      indirections are gone and all modes are built-in.
      
      before (ipsec-next master):
          text      data      bss         dec   filename
      21071494   7233140 11104324    39408958   vmlinux.master
      
      after this series:
      21066448   7226772 11104324    39397544   vmlinux.patched
      
      With an allmodconfig kernel, the size increase is only 362 bytes,
      even though all the xfrm config options removed in this series are
      modular.
      
      before:
          text      data     bss      dec   filename
      15731286   6936912 4046908 26715106   vmlinux.master
      
      after this series:
      15731492   6937068 4046908 26715468   vmlinux

      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
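The trade-off behind storing a small struct by value instead of by address can be illustrated with a standalone C sketch. The struct layout and all names here (`sketch_mode`, `state_by_pointer`, `state_by_value`) are hypothetical and are not taken from the kernel headers; the only point carried over from the commit message is that a descriptor of a few bytes is no larger than a pointer to it, and reading it by value saves one dereference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical small mode descriptor; field names are illustrative. */
struct sketch_mode {
    uint8_t encap;
    uint8_t family;
    uint8_t flags;
};

/* Variant 1: the state caches the ADDRESS of a shared descriptor. */
struct state_by_pointer {
    const struct sketch_mode *mode;
};

/* Variant 2: the state caches a COPY of the descriptor. */
struct state_by_value {
    struct sketch_mode mode;
};

/* Reading through the pointer costs one extra memory dereference. */
static uint8_t encap_via_pointer(const struct state_by_pointer *s)
{
    return s->mode->encap;
}

/* Reading the embedded copy touches only the state itself. */
static uint8_t encap_via_value(const struct state_by_value *s)
{
    return s->mode.encap;
}

/* Usage example: both variants yield the same field value. */
static void mode_storage_examples(void)
{
    static const struct sketch_mode tunnel = { 1, 2, 0 };
    struct state_by_pointer p = { &tunnel };
    struct state_by_value v = { tunnel };

    assert(encap_via_pointer(&p) == encap_via_value(&v));
    assert(sizeof(v) <= sizeof(p)); /* the copy is no larger than a pointer */
}
```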
    • xfrm: make xfrm modes builtin · 4c145dce
      Florian Westphal committed
      After the previous changes, xfrm_mode contains no function pointers
      anymore, and the modules defining such a struct contain no code except
      init/exit functions that register the xfrm_mode struct with the xfrm
      core.
      
      Just place the xfrm modes in the core and remove the modules;
      the run-time xfrm_mode register/unregister functionality is removed.
      
      Before:
      
          text    data     bss      dec filename
          7523     200    2364    10087 net/xfrm/xfrm_input.o
         40003     628     440    41071 net/xfrm/xfrm_state.o
      15730338 6937080 4046908 26714326 vmlinux

      After:

          7389     200    2364    9953  net/xfrm/xfrm_input.o
         40574     656     440   41670  net/xfrm/xfrm_state.o
      15730084 6937068 4046908 26714060 vmlinux
      
      The xfrm*_mode_{transport,tunnel,beet} modules are gone.
      
      v2: replace CONFIG_INET6_XFRM_MODE_* IS_ENABLED guards with CONFIG_IPV6
          ones rather than removing them.

      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • xfrm: remove afinfo pointer from xfrm_mode · 733a5fac
      Florian Westphal committed
      Adds an EXPORT_SYMBOL for afinfo_get_rcu, as it will now be called from
      ipv6 in case of CONFIG_IPV6=m.
      
      This change has virtually no effect on vmlinux size, but it reduces
      afinfo size and allows followup patch to make xfrm modes const.
      
      v2: mark if (afinfo) tests as likely (Sabrina)
          re-fetch afinfo according to inner_mode in xfrm_prepare_input().

      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • xfrm: remove output2 indirection from xfrm_mode · 1de70830
      Florian Westphal committed
      Similar to the previous patch: there are no external module
      dependencies, so we can avoid the indirection by placing this in the
      core.
      
      This change removes the last indirection from xfrm_mode and the
      xfrm4|6_mode_{beet,tunnel}.c modules contain (almost) no code anymore.
      
      Before:
         text    data     bss     dec     hex filename
         3957     136       0    4093     ffd net/xfrm/xfrm_output.o
          587      44       0     631     277 net/ipv4/xfrm4_mode_beet.o
          649      32       0     681     2a9 net/ipv4/xfrm4_mode_tunnel.o
          625      44       0     669     29d net/ipv6/xfrm6_mode_beet.o
          599      32       0     631     277 net/ipv6/xfrm6_mode_tunnel.o
      After:
         text    data     bss     dec     hex filename
         5359     184       0    5543    15a7 net/xfrm/xfrm_output.o
          171      24       0     195      c3 net/ipv4/xfrm4_mode_beet.o
          171      24       0     195      c3 net/ipv4/xfrm4_mode_tunnel.o
          172      24       0     196      c4 net/ipv6/xfrm6_mode_beet.o
          172      24       0     196      c4 net/ipv6/xfrm6_mode_tunnel.o
      
      v2: fold the *encap_add functions into xfrm*_prepare_output
          preserve (move) output2 comment (Sabrina)
          use x->outer_mode->encap, not inner
          fix a build breakage on ppc (kbuild robot)

      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • xfrm: remove output indirection from xfrm_mode · 0c620e97
      Florian Westphal committed
      Same as the input indirection.  The only exception: we need to export
      xfrm_outer_mode_output for pktgen.
      
      Increases the size of vmlinux by about 163 bytes:
      Before:
         text    data     bss     dec      filename
      15730208  6936948 4046908 26714064   vmlinux
      
      After:
      15730311  6937008 4046908 26714227   vmlinux
      
      xfrm_inner_extract_output has no more external callers, make it static.
      
      v2: add IS_ENABLED(IPV6) guard in xfrm6_prepare_output
          add two missing breaks in xfrm_outer_mode_output (Sabrina Dubroca)
          add WARN_ON_ONCE for 'call AF_INET6 related output function, but
          CONFIG_IPV6=n' case.
          make xfrm_inner_extract_output static

      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
  22. 20 Dec, 2018 1 commit
    • xfrm: prefer secpath_set over secpath_dup · a84e3f53
      Florian Westphal committed
      secpath_set is a wrapper for secpath_dup that will not perform
      an allocation if the secpath attached to the skb has a reference count
      of one, i.e., it doesn't need to be COW'ed.
      
      Also, secpath_dup doesn't attach the secpath to the skb, it leaves
      this to the caller.
      
      Use secpath_set in places that immediately assign the return value to
      skb.
      
      This allows removing skb->sp without touching these spots again.
      
      secpath_dup can eventually be removed in a followup patch.

      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
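The difference between the two helpers described in this commit message can be sketched with a userspace analogue. All types and functions below (`sketch_path`, `sketch_buf`, `path_dup`, `path_set`, `path_put`) are hypothetical stand-ins, not the kernel API. The sketch only mirrors the two properties the message states: the "dup" helper always allocates and leaves attaching to the caller, while the "set" helper skips the allocation when the attached object has a reference count of one, and attaches the result itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object and the buffer it is attached to. */
struct sketch_path { int refcnt; int len; };
struct sketch_buf  { struct sketch_path *sp; };

/* Drop one reference; free the object when the last one goes away. */
static void path_put(struct sketch_path *sp)
{
    if (sp && --sp->refcnt == 0)
        free(sp);
}

/* "dup": always allocate a private copy; the caller must attach it. */
static struct sketch_path *path_dup(const struct sketch_path *src)
{
    struct sketch_path *sp = calloc(1, sizeof(*sp));

    if (!sp)
        return NULL;
    if (src)
        sp->len = src->len;
    sp->refcnt = 1;
    return sp;
}

/* "set": reuse an exclusively owned object, else copy and attach. */
static struct sketch_path *path_set(struct sketch_buf *buf)
{
    struct sketch_path *sp = buf->sp;

    if (sp && sp->refcnt == 1)
        return sp;              /* not shared: no copy-on-write needed */

    sp = path_dup(buf->sp);
    if (!sp)
        return NULL;
    path_put(buf->sp);          /* release our reference to the old one */
    buf->sp = sp;               /* attach, which "dup" leaves to callers */
    return sp;
}

/* Usage example: the second call reuses, a shared object gets copied. */
static void path_set_examples(void)
{
    struct sketch_buf b = { NULL };
    struct sketch_path *a = path_set(&b);

    assert(a && b.sp == a && a->refcnt == 1);
    assert(path_set(&b) == a);  /* refcount is one: no new allocation */

    a->refcnt++;                /* simulate a second holder */
    struct sketch_path *c = path_set(&b);
    assert(c && c != a && b.sp == c && a->refcnt == 1);

    path_put(a);
    path_put(c);
}
```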
  23. 28 Oct, 2018 1 commit
  24. 11 Sep, 2018 2 commits
  25. 23 Jun, 2018 1 commit
  26. 16 Mar, 2018 1 commit
  27. 30 Nov, 2017 1 commit
  28. 31 Oct, 2017 1 commit
  29. 11 Aug, 2017 1 commit
    • net: xfrm: support setting an output mark. · 077fbac4
      Lorenzo Colitti committed
      On systems that use mark-based routing it may be necessary for
      routing lookups to use marks in order for packets to be routed
      correctly. An example of such a system is Android, which uses
      socket marks to route packets via different networks.
      
      Currently, routing lookups in tunnel mode always use a mark of
      zero, making routing incorrect on such systems.
      
      This patch adds a new output_mark element to the xfrm state and
      a corresponding XFRMA_OUTPUT_MARK netlink attribute. The output
      mark differs from the existing xfrm mark in two ways:
      
      1. The xfrm mark is used to match xfrm policies and states, while
         the xfrm output mark is used to set the mark (and influence
         the routing) of the packets emitted by those states.
      2. The existing mark is constrained to be a subset of the bits of
         the originating socket or transformed packet, but the output
         mark is arbitrary and depends only on the state.
      
      The use of a separate mark provides additional flexibility. For
      example:
      
      - A packet subject to two transforms (e.g., transport mode inside
        tunnel mode) can have two different output marks applied to it,
        one for the transport mode SA and one for the tunnel mode SA.
      - On a system where socket marks determine routing, the packets
        emitted by an IPsec tunnel can be routed based on a mark that
        is determined by the tunnel, not by the marks of the
        unencrypted packets.
      - Support for setting the output marks can be introduced without
        breaking any existing setups that employ both mark-based
        routing and xfrm tunnel mode. Simply changing the code to use
        the xfrm mark for routing output packets could change
        behaviour in a way that breaks these setups.
      
      If the output mark is unspecified or set to zero, the mark is not
      set or changed.
      
      Tested: make allyesconfig; make -j64
      Tested: https://android-review.googlesource.com/452776

      Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
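The rule that an unspecified or zero output mark leaves the packet mark untouched can be sketched in a few lines of standalone C. The types and the function name (`sketch_state`, `sketch_packet`, `apply_output_mark`) are hypothetical and chosen for illustration; only the behaviour for zero versus non-zero marks is taken from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the state and the emitted packet. */
struct sketch_state  { uint32_t output_mark; };
struct sketch_packet { uint32_t mark; };

/*
 * Apply the state's output mark to a packet it emits.
 * A mark of zero means "unspecified": the packet mark is left as is.
 */
static void apply_output_mark(const struct sketch_state *x,
                              struct sketch_packet *pkt)
{
    if (x->output_mark)
        pkt->mark = x->output_mark;
}

/* Usage example: zero keeps the existing mark, non-zero replaces it. */
static void apply_output_mark_examples(void)
{
    struct sketch_packet pkt = { 0x10 };
    struct sketch_state unset = { 0 };
    struct sketch_state tunnel = { 0x2a };

    apply_output_mark(&unset, &pkt);
    assert(pkt.mark == 0x10);

    apply_output_mark(&tunnel, &pkt);
    assert(pkt.mark == 0x2a);
}
```

Because the mark depends only on the state, a packet that passes through two states in sequence would simply have the function applied once per state, which matches the transport-inside-tunnel example in the message.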