1. 03 3月, 2021 1 次提交
    • E
      xfrm: Use actual socket sk instead of skb socket for xfrm_output_resume · 9ab1265d
      Evan Nimmo 提交于
      A situation can occur where the interface bound to the sk is different
      to the interface bound to the sk attached to the skb. The interface
      bound to the sk is the correct one however this information is lost inside
      xfrm_output2 and instead the sk on the skb is used in xfrm_output_resume
      instead. This assumes that the sk bound interface and the bound interface
      attached to the sk within the skb are the same which can lead to lookup
      failures inside ip_route_me_harder resulting in the packet being dropped.
      
      We have an l2tp v3 tunnel with ipsec protection. The tunnel is in the
      global VRF however we have an encapsulated dot1q tunnel interface that
      is within a different VRF. We also have a mangle rule that marks the
      packets causing them to be processed inside ip_route_me_harder.
      
      Prior to commit 31c70d59 ("l2tp: keep original skb ownership") this
      worked fine as the sk attached to the skb was changed from the dot1q
      encapsulated interface to the sk for the tunnel which meant the interface
      bound to the sk and the interface bound to the skb were identical.
      Commit 46d6c5ae ("netfilter: use actual socket sk rather than skb sk
      when routing harder") fixed some of these issues however a similar
      problem existed in the xfrm code.
      
      Fixes: 31c70d59 ("l2tp: keep original skb ownership")
      Signed-off-by: NEvan Nimmo <evan.nimmo@alliedtelesis.co.nz>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      9ab1265d
  2. 05 6月, 2020 1 次提交
  3. 29 5月, 2020 1 次提交
    • X
      xfrm: fix a NULL-ptr deref in xfrm_local_error · f6a23d85
      Xin Long 提交于
      This patch is to fix a crash:
      
        [ ] kasan: GPF could be caused by NULL-ptr deref or user memory access
        [ ] general protection fault: 0000 [#1] SMP KASAN PTI
        [ ] RIP: 0010:ipv6_local_error+0xac/0x7a0
        [ ] Call Trace:
        [ ]  xfrm6_local_error+0x1eb/0x300
        [ ]  xfrm_local_error+0x95/0x130
        [ ]  __xfrm6_output+0x65f/0xb50
        [ ]  xfrm6_output+0x106/0x46f
        [ ]  udp_tunnel6_xmit_skb+0x618/0xbf0 [ip6_udp_tunnel]
        [ ]  vxlan_xmit_one+0xbc6/0x2c60 [vxlan]
        [ ]  vxlan_xmit+0x6a0/0x4276 [vxlan]
        [ ]  dev_hard_start_xmit+0x165/0x820
        [ ]  __dev_queue_xmit+0x1ff0/0x2b90
        [ ]  ip_finish_output2+0xd3e/0x1480
        [ ]  ip_do_fragment+0x182d/0x2210
        [ ]  ip_output+0x1d0/0x510
        [ ]  ip_send_skb+0x37/0xa0
        [ ]  raw_sendmsg+0x1b4c/0x2b80
        [ ]  sock_sendmsg+0xc0/0x110
      
      This occurred when sending a v4 skb over vxlan6 over ipsec, in which case
      skb->protocol == htons(ETH_P_IPV6) while skb->sk->sk_family == AF_INET in
      xfrm_local_error(). Then it will go to xfrm6_local_error() where it tries
      to get ipv6 info from a ipv4 sk.
      
      This issue was actually fixed by Commit 628e341f ("xfrm: make local
      error reporting more robust"), but brought back by Commit 844d4874
      ("xfrm: choose protocol family by skb protocol").
      
      So to fix it, we should call xfrm6_local_error() only when skb->protocol
      is htons(ETH_P_IPV6) and skb->sk->sk_family is AF_INET6.
      
      Fixes: 844d4874 ("xfrm: choose protocol family by skb protocol")
      Reported-by: NXiumei Mu <xmu@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      f6a23d85
  4. 06 5月, 2020 3 次提交
  5. 21 4月, 2020 1 次提交
    • X
      xfrm: call xfrm_output_gso when inner_protocol is set in xfrm_output · a204aef9
      Xin Long 提交于
      An use-after-free crash can be triggered when sending big packets over
      vxlan over esp with esp offload enabled:
      
        [] BUG: KASAN: use-after-free in ipv6_gso_pull_exthdrs.part.8+0x32c/0x4e0
        [] Call Trace:
        []  dump_stack+0x75/0xa0
        []  kasan_report+0x37/0x50
        []  ipv6_gso_pull_exthdrs.part.8+0x32c/0x4e0
        []  ipv6_gso_segment+0x2c8/0x13c0
        []  skb_mac_gso_segment+0x1cb/0x420
        []  skb_udp_tunnel_segment+0x6b5/0x1c90
        []  inet_gso_segment+0x440/0x1380
        []  skb_mac_gso_segment+0x1cb/0x420
        []  esp4_gso_segment+0xae8/0x1709 [esp4_offload]
        []  inet_gso_segment+0x440/0x1380
        []  skb_mac_gso_segment+0x1cb/0x420
        []  __skb_gso_segment+0x2d7/0x5f0
        []  validate_xmit_skb+0x527/0xb10
        []  __dev_queue_xmit+0x10f8/0x2320 <---
        []  ip_finish_output2+0xa2e/0x1b50
        []  ip_output+0x1a8/0x2f0
        []  xfrm_output_resume+0x110e/0x15f0
        []  __xfrm4_output+0xe1/0x1b0
        []  xfrm4_output+0xa0/0x200
        []  iptunnel_xmit+0x5a7/0x920
        []  vxlan_xmit_one+0x1658/0x37a0 [vxlan]
        []  vxlan_xmit+0x5e4/0x3ec8 [vxlan]
        []  dev_hard_start_xmit+0x125/0x540
        []  __dev_queue_xmit+0x17bd/0x2320  <---
        []  ip6_finish_output2+0xb20/0x1b80
        []  ip6_output+0x1b3/0x390
        []  ip6_xmit+0xb82/0x17e0
        []  inet6_csk_xmit+0x225/0x3d0
        []  __tcp_transmit_skb+0x1763/0x3520
        []  tcp_write_xmit+0xd64/0x5fe0
        []  __tcp_push_pending_frames+0x8c/0x320
        []  tcp_sendmsg_locked+0x2245/0x3500
        []  tcp_sendmsg+0x27/0x40
      
      As on the tx path of vxlan over esp, skb->inner_network_header would be
      set on vxlan_xmit() and xfrm4_tunnel_encap_add(), and the later one can
      overwrite the former one. It causes skb_udp_tunnel_segment() to use a
      wrong skb->inner_network_header, then the issue occurs.
      
      This patch is to fix it by calling xfrm_output_gso() instead when the
      inner_protocol is set, in which gso_segment of inner_protocol will be
      done first.
      
      While at it, also improve some code around.
      
      Fixes: 7862b405 ("esp: Add gso handlers for esp4 and esp6")
      Reported-by: NXiumei Mu <xmu@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      a204aef9
  6. 30 3月, 2020 1 次提交
  7. 15 1月, 2020 1 次提交
  8. 02 10月, 2019 1 次提交
    • F
      netfilter: drop bridge nf reset from nf_reset · 895b5c9f
      Florian Westphal 提交于
      commit 174e2381
      ("sk_buff: drop all skb extensions on free and skb scrubbing") made napi
      recycle always drop skb extensions.  The additional skb_ext_del() that is
      performed via nf_reset on napi skb recycle is not needed anymore.
      
      Most nf_reset() calls in the stack are there so queued skb won't block
      'rmmod nf_conntrack' indefinitely.
      
      This removes the skb_ext_del from nf_reset, and renames it to a more
      fitting nf_reset_ct().
      
      In a few selected places, add a call to skb_ext_reset to make sure that
      no active extensions remain.
      
      I am submitting this for "net", because we're still early in the release
      cycle.  The patch applies to net-next too, but I think the rename causes
      needless divergence between those trees.
      Suggested-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      895b5c9f
  9. 31 5月, 2019 1 次提交
  10. 08 4月, 2019 5 次提交
    • F
      xfrm: store xfrm_mode directly, not its address · c9500d7b
      Florian Westphal 提交于
      This structure is now only 4 bytes, so its more efficient
      to cache a copy rather than its address.
      
      No significant size difference in allmodconfig vmlinux.
      
      With non-modular kernel that has all XFRM options enabled, this
      series reduces vmlinux image size by ~11kb. All xfrm_mode
      indirections are gone and all modes are built-in.
      
      before (ipsec-next master):
          text      data      bss         dec   filename
      21071494   7233140 11104324    39408958   vmlinux.master
      
      after this series:
      21066448   7226772 11104324    39397544   vmlinux.patched
      
      With allmodconfig kernel, the size increase is only 362 bytes,
      even all the xfrm config options removed in this series are
      modular.
      
      before:
          text      data     bss      dec   filename
      15731286   6936912 4046908 26715106   vmlinux.master
      
      after this series:
      15731492   6937068  4046908  26715468 vmlinux
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      c9500d7b
    • F
      xfrm: make xfrm modes builtin · 4c145dce
      Florian Westphal 提交于
      after previous changes, xfrm_mode contains no function pointers anymore
      and all modules defining such struct contain no code except an init/exit
      functions to register the xfrm_mode struct with the xfrm core.
      
      Just place the xfrm modes core and remove the modules,
      the run-time xfrm_mode register/unregister functionality is removed.
      
      Before:
      
          text    data     bss      dec filename
          7523     200    2364    10087 net/xfrm/xfrm_input.o
         40003     628     440    41071 net/xfrm/xfrm_state.o
      15730338 6937080 4046908 26714326 vmlinux
      
          7389     200    2364    9953  net/xfrm/xfrm_input.o
         40574     656     440   41670  net/xfrm/xfrm_state.o
      15730084 6937068 4046908 26714060 vmlinux
      
      The xfrm*_mode_{transport,tunnel,beet} modules are gone.
      
      v2: replace CONFIG_INET6_XFRM_MODE_* IS_ENABLED guards with CONFIG_IPV6
          ones rather than removing them.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      4c145dce
    • F
      xfrm: remove afinfo pointer from xfrm_mode · 733a5fac
      Florian Westphal 提交于
      Adds an EXPORT_SYMBOL for afinfo_get_rcu, as it will now be called from
      ipv6 in case of CONFIG_IPV6=m.
      
      This change has virtually no effect on vmlinux size, but it reduces
      afinfo size and allows followup patch to make xfrm modes const.
      
      v2: mark if (afinfo) tests as likely (Sabrina)
          re-fetch afinfo according to inner_mode in xfrm_prepare_input().
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      733a5fac
    • F
      xfrm: remove output2 indirection from xfrm_mode · 1de70830
      Florian Westphal 提交于
      similar to previous patch: no external module dependencies,
      so we can avoid the indirection by placing this in the core.
      
      This change removes the last indirection from xfrm_mode and the
      xfrm4|6_mode_{beet,tunnel}.c modules contain (almost) no code anymore.
      
      Before:
         text    data     bss     dec     hex filename
         3957     136       0    4093     ffd net/xfrm/xfrm_output.o
          587      44       0     631     277 net/ipv4/xfrm4_mode_beet.o
          649      32       0     681     2a9 net/ipv4/xfrm4_mode_tunnel.o
          625      44       0     669     29d net/ipv6/xfrm6_mode_beet.o
          599      32       0     631     277 net/ipv6/xfrm6_mode_tunnel.o
      After:
         text    data     bss     dec     hex filename
         5359     184       0    5543    15a7 net/xfrm/xfrm_output.o
          171      24       0     195      c3 net/ipv4/xfrm4_mode_beet.o
          171      24       0     195      c3 net/ipv4/xfrm4_mode_tunnel.o
          172      24       0     196      c4 net/ipv6/xfrm6_mode_beet.o
          172      24       0     196      c4 net/ipv6/xfrm6_mode_tunnel.o
      
      v2: fold the *encap_add functions into xfrm*_prepare_output
          preserve (move) output2 comment (Sabrina)
          use x->outer_mode->encap, not inner
          fix a build breakage on ppc (kbuild robot)
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      1de70830
    • F
      xfrm: remove output indirection from xfrm_mode · 0c620e97
      Florian Westphal 提交于
      Same is input indirection.  Only exception: we need to export
      xfrm_outer_mode_output for pktgen.
      
      Increases size of vmlinux by about 163 byte:
      Before:
         text    data     bss     dec      filename
      15730208  6936948 4046908 26714064   vmlinux
      
      After:
      15730311  6937008 4046908 26714227   vmlinux
      
      xfrm_inner_extract_output has no more external callers, make it static.
      
      v2: add IS_ENABLED(IPV6) guard in xfrm6_prepare_output
          add two missing breaks in xfrm_outer_mode_output (Sabrina Dubroca)
          add WARN_ON_ONCE for 'call AF_INET6 related output function, but
          CONFIG_IPV6=n' case.
          make xfrm_inner_extract_output static
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      0c620e97
  11. 20 12月, 2018 1 次提交
    • F
      xfrm: prefer secpath_set over secpath_dup · a84e3f53
      Florian Westphal 提交于
      secpath_set is a wrapper for secpath_dup that will not perform
      an allocation if the secpath attached to the skb has a reference count
      of one, i.e., it doesn't need to be COW'ed.
      
      Also, secpath_dup doesn't attach the secpath to the skb, it leaves
      this to the caller.
      
      Use secpath_set in places that immediately assign the return value to
      skb.
      
      This allows to remove skb->sp without touching these spots again.
      
      secpath_dup can eventually be removed in followup patch.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a84e3f53
  12. 28 10月, 2018 1 次提交
  13. 11 9月, 2018 2 次提交
  14. 23 6月, 2018 1 次提交
  15. 16 3月, 2018 1 次提交
  16. 30 11月, 2017 1 次提交
  17. 31 10月, 2017 1 次提交
  18. 11 8月, 2017 1 次提交
    • L
      net: xfrm: support setting an output mark. · 077fbac4
      Lorenzo Colitti 提交于
      On systems that use mark-based routing it may be necessary for
      routing lookups to use marks in order for packets to be routed
      correctly. An example of such a system is Android, which uses
      socket marks to route packets via different networks.
      
      Currently, routing lookups in tunnel mode always use a mark of
      zero, making routing incorrect on such systems.
      
      This patch adds a new output_mark element to the xfrm state and
      a corresponding XFRMA_OUTPUT_MARK netlink attribute. The output
      mark differs from the existing xfrm mark in two ways:
      
      1. The xfrm mark is used to match xfrm policies and states, while
         the xfrm output mark is used to set the mark (and influence
         the routing) of the packets emitted by those states.
      2. The existing mark is constrained to be a subset of the bits of
         the originating socket or transformed packet, but the output
         mark is arbitrary and depends only on the state.
      
      The use of a separate mark provides additional flexibility. For
      example:
      
      - A packet subject to two transforms (e.g., transport mode inside
        tunnel mode) can have two different output marks applied to it,
        one for the transport mode SA and one for the tunnel mode SA.
      - On a system where socket marks determine routing, the packets
        emitted by an IPsec tunnel can be routed based on a mark that
        is determined by the tunnel, not by the marks of the
        unencrypted packets.
      - Support for setting the output marks can be introduced without
        breaking any existing setups that employ both mark-based
        routing and xfrm tunnel mode. Simply changing the code to use
        the xfrm mark for routing output packets could xfrm mark could
        change behaviour in a way that breaks these setups.
      
      If the output mark is unspecified or set to zero, the mark is not
      set or changed.
      
      Tested: make allyesconfig; make -j64
      Tested: https://android-review.googlesource.com/452776Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      077fbac4
  19. 14 4月, 2017 2 次提交
  20. 10 1月, 2017 1 次提交
    • F
      xfrm: remove xfrm_state_put_afinfo · af5d27c4
      Florian Westphal 提交于
      commit 44abdc30
      ("xfrm: replace rwlock on xfrm_state_afinfo with rcu") made
      xfrm_state_put_afinfo equivalent to rcu_read_unlock.
      
      Use spatch to replace it with direct calls to rcu_read_unlock:
      
      @@
      struct xfrm_state_afinfo *a;
      @@
      
      -  xfrm_state_put_afinfo(a);
      +  rcu_read_unlock();
      
      old:
       text    data     bss     dec     hex filename
      22570      72     424   23066    5a1a xfrm_state.o
       1612       0       0    1612     64c xfrm_output.o
      new:
      22554      72     424   23050    5a0a xfrm_state.o
       1596       0       0    1596     63c xfrm_output.o
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      af5d27c4
  21. 17 3月, 2016 1 次提交
  22. 16 1月, 2016 1 次提交
  23. 08 10月, 2015 3 次提交
  24. 18 9月, 2015 4 次提交
    • E
      netfilter: Add blank lines in callers of netfilter hooks · be10de0a
      Eric W. Biederman 提交于
      In code review it was noticed that I had failed to add some blank lines
      in places where they are customarily used.  Taking a second look at the
      code I have to agree blank lines would be nice so I have added them
      here.
      Reported-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      be10de0a
    • E
      netfilter: Pass net into okfn · 0c4b51f0
      Eric W. Biederman 提交于
      This is immediately motivated by the bridge code that chains functions that
      call into netfilter.  Without passing net into the okfns the bridge code would
      need to guess about the best expression for the network namespace to process
      packets in.
      
      As net is frequently one of the first things computed in continuation functions
      after netfilter has done it's job passing in the desired network namespace is in
      many cases a code simplification.
      
      To support this change the function dst_output_okfn is introduced to
      simplify passing dst_output as an okfn.  For the moment dst_output_okfn
      just silently drops the struct net.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c4b51f0
    • E
      netfilter: Pass struct net into the netfilter hooks · 29a26a56
      Eric W. Biederman 提交于
      Pass a network namespace parameter into the netfilter hooks.  At the
      call site of the netfilter hooks the path a packet is taking through
      the network stack is well known which allows the network namespace to
      be easily and reliabily.
      
      This allows the replacement of magic code like
      "dev_net(state->in?:state->out)" that appears at the start of most
      netfilter hooks with "state->net".
      
      In almost all cases the network namespace passed in is derived
      from the first network device passed in, guaranteeing those
      paths will not see any changes in practice.
      
      The exceptions are:
      xfrm/xfrm_output.c:xfrm_output_resume()         xs_net(skb_dst(skb)->xfrm)
      ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont()      ip_vs_conn_net(cp)
      ipvs/ip_vs_xmit.c:ip_vs_send_or_cont()          ip_vs_conn_net(cp)
      ipv4/raw.c:raw_send_hdrinc()                    sock_net(sk)
      ipv6/ip6_output.c:ip6_xmit()			sock_net(sk)
      ipv6/ndisc.c:ndisc_send_skb()                   dev_net(skb->dev) not dev_net(dst->dev)
      ipv6/raw.c:raw6_send_hdrinc()                   sock_net(sk)
      br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev
      
      In all cases these exceptions seem to be a better expression for the
      network namespace the packet is being processed in then the historic
      "dev_net(in?in:out)".  I am documenting them in case something odd
      pops up and someone starts trying to track down what happened.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      29a26a56
    • E
      net: Merge dst_output and dst_output_sk · 5a70649e
      Eric W. Biederman 提交于
      Add a sock paramter to dst_output making dst_output_sk superfluous.
      Add a skb->sk parameter to all of the callers of dst_output
      Have the callers of dst_output_sk call dst_output.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5a70649e
  25. 13 5月, 2015 1 次提交
  26. 08 4月, 2015 1 次提交
    • D
      netfilter: Pass socket pointer down through okfn(). · 7026b1dd
      David Miller 提交于
      On the output paths in particular, we have to sometimes deal with two
      socket contexts.  First, and usually skb->sk, is the local socket that
      generated the frame.
      
      And second, is potentially the socket used to control a tunneling
      socket, such as one the encapsulates using UDP.
      
      We do not want to disassociate skb->sk when encapsulating in order
      to fix this, because that would break socket memory accounting.
      
      The most extreme case where this can cause huge problems is an
      AF_PACKET socket transmitting over a vxlan device.  We hit code
      paths doing checks that assume they are dealing with an ipv4
      socket, but are actually operating upon the AF_PACKET one.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7026b1dd
  27. 21 10月, 2014 1 次提交
    • F
      net: make skb_gso_segment error handling more robust · 330966e5
      Florian Westphal 提交于
      skb_gso_segment has three possible return values:
      1. a pointer to the first segmented skb
      2. an errno value (IS_ERR())
      3. NULL.  This can happen when GSO is used for header verification.
      
      However, several callers currently test IS_ERR instead of IS_ERR_OR_NULL
      and would oops when NULL is returned.
      
      Note that these call sites should never actually see such a NULL return
      value; all callers mask out the GSO bits in the feature argument.
      
      However, there have been issues with some protocol handlers erronously not
      respecting the specified feature mask in some cases.
      
      It is preferable to get 'have to turn off hw offloading, else slow' reports
      rather than 'kernel crashes'.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      330966e5