1. 29 Aug, 2022 1 commit
    • E
      xfrm: interface: support collect metadata mode · abc340b3
      Eyal Birger committed
      This commit adds support for 'collect_md' mode on xfrm interfaces.
      
      Each net can have one collect_md device, created by providing the
      IFLA_XFRM_COLLECT_METADATA flag at creation. This device cannot be
      altered and has no if_id or link device attributes.
      
      On transmit to this device, the if_id is fetched from the attached dst
      metadata on the skb. If it exists, the link property is also fetched from
      the metadata. The dst metadata type used is METADATA_XFRM which holds
      these properties.
      
      On the receive side, xfrmi_rcv_cb() populates a dst metadata for each
      packet received and attaches it to the skb. The if_id used in this case is
      fetched from the xfrm state, and the link is fetched from the incoming
      device. This information can later be used by upper layers such as tc,
      ebpf, and ip rules.
      
      Because the skb is scrubbed in xfrmi_rcv_cb(), the attachment of the dst
      metadata is postponed until after scrubbing. Similarly, xfrm_input() is
      adapted to avoid dropping metadata dsts by only dropping 'valid'
      (skb_valid_dst(skb) == true) dsts.
      
      Policy matching on packets arriving from collect_md xfrmi devices is
      done by using the xfrm state existing in the skb's sec_path.
      The xfrm_if_cb.decode_cb() interface implemented by xfrmi_decode_session()
      is changed to keep the details of the if_id extraction tucked away
      in xfrm_interface.c.
      Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
  2. 23 Aug, 2022 1 commit
  3. 17 Aug, 2022 1 commit
    • N
      xfrm: policy: fix metadata dst->dev xmit null pointer dereference · 17ecd4a4
      Nikolay Aleksandrov committed
      When we try to transmit an skb with metadata_dst attached (i.e. dst->dev
      == NULL) through xfrm interface we can hit a null pointer dereference[1]
      in xfrmi_xmit2() -> xfrm_lookup_with_ifid() due to the check for a
      loopback skb device when there's no policy which dereferences dst->dev
      unconditionally. Not having dst->dev can be interpreted as it not being
      a loopback device, so just add a check for a null dst_orig->dev.
      
      With this fix xfrm interface's Tx error counters go up as usual.
      
      [1] net-next calltrace captured via netconsole:
        BUG: kernel NULL pointer dereference, address: 00000000000000c0
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD 0 P4D 0
        Oops: 0000 [#1] PREEMPT SMP
        CPU: 1 PID: 7231 Comm: ping Kdump: loaded Not tainted 5.19.0+ #24
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-1.fc36 04/01/2014
        RIP: 0010:xfrm_lookup_with_ifid+0x5eb/0xa60
        Code: 8d 74 24 38 e8 26 a4 37 00 48 89 c1 e9 12 fc ff ff 49 63 ed 41 83 fd be 0f 85 be 01 00 00 41 be ff ff ff ff 45 31 ed 48 8b 03 <f6> 80 c0 00 00 00 08 75 0f 41 80 bc 24 19 0d 00 00 01 0f 84 1e 02
        RSP: 0018:ffffb0db82c679f0 EFLAGS: 00010246
        RAX: 0000000000000000 RBX: ffffd0db7fcad430 RCX: ffffb0db82c67a10
        RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffb0db82c67a80
        RBP: ffffb0db82c67a80 R08: ffffb0db82c67a14 R09: 0000000000000000
        R10: 0000000000000000 R11: ffff8fa449667dc8 R12: ffffffff966db880
        R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000000
        FS:  00007ff35c83f000(0000) GS:ffff8fa478480000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00000000000000c0 CR3: 000000001ebb7000 CR4: 0000000000350ee0
        Call Trace:
         <TASK>
         xfrmi_xmit+0xde/0x460
         ? tcf_bpf_act+0x13d/0x2a0
         dev_hard_start_xmit+0x72/0x1e0
         __dev_queue_xmit+0x251/0xd30
         ip_finish_output2+0x140/0x550
         ip_push_pending_frames+0x56/0x80
         raw_sendmsg+0x663/0x10a0
         ? try_charge_memcg+0x3fd/0x7a0
         ? __mod_memcg_lruvec_state+0x93/0x110
         ? sock_sendmsg+0x30/0x40
         sock_sendmsg+0x30/0x40
         __sys_sendto+0xeb/0x130
         ? handle_mm_fault+0xae/0x280
         ? do_user_addr_fault+0x1e7/0x680
         ? kvm_read_and_reset_apf_flags+0x3b/0x50
         __x64_sys_sendto+0x20/0x30
         do_syscall_64+0x34/0x80
         entry_SYSCALL_64_after_hwframe+0x46/0xb0
        RIP: 0033:0x7ff35cac1366
        Code: eb 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 72 c3 90 55 48 83 ec 30 44 89 4c 24 2c 4c 89
        RSP: 002b:00007fff738e4028 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
        RAX: ffffffffffffffda RBX: 00007fff738e57b0 RCX: 00007ff35cac1366
        RDX: 0000000000000040 RSI: 0000557164e4b450 RDI: 0000000000000003
        RBP: 0000557164e4b450 R08: 00007fff738e7a2c R09: 0000000000000010
        R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
        R13: 00007fff738e5770 R14: 00007fff738e4030 R15: 0000001d00000001
         </TASK>
        Modules linked in: netconsole veth br_netfilter bridge bonding virtio_net [last unloaded: netconsole]
        CR2: 00000000000000c0
      
      CC: Steffen Klassert <steffen.klassert@secunet.com>
      CC: Daniel Borkmann <daniel@iogearbox.net>
      Fixes: 2d151d39 ("xfrm: Add possibility to set the default to block if we have no policy")
      Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
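
The guard described in this commit (treat a dst without a device as "not loopback" instead of dereferencing it) can be sketched in standalone C. The names `fake_dev`, `fake_dst`, `FAKE_IFF_LOOPBACK` and `dst_is_loopback` are hypothetical stand-ins, not the kernel's definitions; this is a minimal illustration of the idea rather than the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's net_device and dst_entry. */
#define FAKE_IFF_LOOPBACK 0x8u

struct fake_dev { unsigned int flags; };
struct fake_dst { struct fake_dev *dev; };  /* dev is NULL for a metadata dst */

/* A dst with no device is reported as "not loopback" rather than dereferenced. */
static bool dst_is_loopback(const struct fake_dst *dst)
{
    assert(dst != NULL);
    return dst->dev != NULL && (dst->dev->flags & FAKE_IFF_LOOPBACK) != 0;
}
```

Checking the pointer first keeps the loopback test a pure predicate, so callers on the error path do not need their own NULL handling.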
  4. 27 Jul, 2022 1 commit
  5. 02 Jun, 2022 1 commit
    • H
      xfrm: xfrm_policy: fix a possible double xfrm_pols_put() in xfrm_bundle_lookup() · f85daf0e
      Hangyu Hua committed
      xfrm_policy_lookup() will call xfrm_pol_hold_rcu() to get a refcount of
      pols[0]. This refcount can be dropped in xfrm_expand_policies() when
      xfrm_expand_policies() returns an error, so pols[0]'s refcount is already
      balanced there. But xfrm_bundle_lookup() will also call xfrm_pols_put() with
      num_pols == 1 to drop this refcount when xfrm_expand_policies() returns an
      error.
      
      This patch also fixes an illegal address access. pols[0] will hold an error
      pointer when xfrm_policy_lookup() fails. This leads xfrm_pols_put() to
      dereference an illegal address in xfrm_bundle_lookup()'s error path.
      
      Fix these by setting num_pols = 0 in xfrm_expand_policies()'s error path.
      
      Fixes: 80c802f3 ("xfrm: cache bundles instead of policies for outgoing flows")
      Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
  6. 17 May, 2022 1 commit
  7. 04 Apr, 2022 1 commit
  8. 18 Mar, 2022 1 commit
  9. 26 Jan, 2022 1 commit
    • Y
      xfrm: Check if_id in xfrm_migrate · c1aca308
      Yan Yan committed
      This patch enables distinguishing SAs and SPs based on if_id during
      the xfrm_migrate flow. This ensures support for xfrm interfaces
      throughout the SA/SP lifecycle.
      
      When there are multiple existing SPs with the same direction,
      the same xfrm_selector and different endpoint addresses,
      xfrm_migrate might fail with ENODATA.
      
      Specifically, the code path for performing xfrm_migrate is:
        Stage 1: find policy to migrate with
          xfrm_migrate_policy_find(sel, dir, type, net)
        Stage 2: find and update state(s) with
          xfrm_migrate_state_find(mp, net)
        Stage 3: update endpoint address(es) of template(s) with
          xfrm_policy_migrate(pol, m, num_migrate)
      
      Currently "Stage 1" always returns the first xfrm_policy that
      matches, and "Stage 3" looks for the xfrm_tmpl that matches the
      old endpoint address. Thus if there are multiple xfrm_policy
      with same selector, direction, type and net, "Stage 1" might
      return the wrong xfrm_policy and "Stage 3" will fail with ENODATA
      because it cannot find a xfrm_tmpl with the matching endpoint
      address.
      
      The fix is to allow userspace to pass an if_id and add if_id
      to the matching rule in Stage 1 and Stage 2 since if_id is a
      unique ID for xfrm_policy and xfrm_state. For compatibility,
      if_id will only be checked if the attribute is set.
      
      Tested with additions to Android's kernel unit test suite:
      https://android-review.googlesource.com/c/kernel/tests/+/1668886
      
      Signed-off-by: Yan Yan <evitayan@google.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
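
The "check if_id only when it is set" rule from this commit can be sketched as follows. `fake_policy`, `policy_matches` and `find_policy` are hypothetical names, a single `selector_id` stands in for selector, direction and type, and using 0 to mean "attribute not supplied" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified policy record. */
struct fake_policy {
    uint32_t selector_id;   /* stands in for selector, dir and type */
    uint32_t if_id;
};

/* if_id == 0 means "not supplied", so the if_id comparison is skipped. */
static bool policy_matches(const struct fake_policy *pol,
                           uint32_t selector_id, uint32_t if_id)
{
    assert(pol != NULL);
    if (pol->selector_id != selector_id)
        return false;
    return if_id == 0 || pol->if_id == if_id;
}

/* Return the first matching policy, or NULL if none matches. */
static const struct fake_policy *find_policy(const struct fake_policy *pols,
                                             size_t n, uint32_t selector_id,
                                             uint32_t if_id)
{
    for (size_t i = 0; i < n; i++)
        if (policy_matches(&pols[i], selector_id, if_id))
            return &pols[i];
    return NULL;
}
```

With two policies that share a selector, a lookup without an if_id still returns the first one (the old behaviour), while a lookup with an if_id selects the intended entry.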
  10. 12 Jan, 2022 1 commit
  11. 01 Dec, 2021 1 commit
  12. 23 Nov, 2021 1 commit
  13. 19 Nov, 2021 1 commit
  14. 19 Oct, 2021 1 commit
    • K
      xfrm: Use memset_after() to clear padding · caf283d0
      Kees Cook committed
      In preparation for FORTIFY_SOURCE performing compile-time and run-time
      field bounds checking for memset(), avoid intentionally writing across
      neighboring fields.
      
      Clear trailing padding bytes using the new helper so that memset()
      doesn't get confused about writing "past the end" of the last struct
      member. There is no change to the resulting machine code.
      
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: netdev@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
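
The idea of clearing only the bytes that follow the last struct member can be sketched in portable C. `OFFSET_END`, `CLEAR_AFTER`, `fake_flow` and `fake_flow_clear_tail` are hypothetical names for this illustration; the kernel's memset_after() helper is defined differently, so treat this as an approximation of the technique, not its implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Offset of the first byte after `member` within `type`. */
#define OFFSET_END(type, member) \
    (offsetof(type, member) + sizeof(((type *)0)->member))

/* Zero every byte of *objp that lies after `member` (e.g. trailing padding). */
#define CLEAR_AFTER(objp, type, member)                               \
    memset((unsigned char *)(objp) + OFFSET_END(type, member), 0,     \
           sizeof(type) - OFFSET_END(type, member))

/* Hypothetical struct: on common ABIs padding follows `proto`. */
struct fake_flow {
    unsigned long mark;
    unsigned char proto;
};

static void fake_flow_clear_tail(struct fake_flow *fl)
{
    assert(fl != NULL);
    CLEAR_AFTER(fl, struct fake_flow, proto);
}
```

Because the start address is computed from the whole object rather than from the member, the write never appears to run past the end of a single field, which is the property bounds-checked memset() needs.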
  15. 21 Jul, 2021 1 commit
  16. 02 Jul, 2021 2 commits
    • F
      xfrm: Fix RCU vs hash_resize_mutex lock inversion · 2580d3f4
      Frederic Weisbecker committed
      xfrm_bydst_resize() calls synchronize_rcu() while holding
      hash_resize_mutex. But then on PREEMPT_RT configurations,
      xfrm_policy_lookup_bytype() may acquire that mutex while running in an
      RCU read side critical section. This results in a deadlock.
      
      In fact the scope of hash_resize_mutex is way beyond the purpose of
      xfrm_policy_lookup_bytype() to just fetch a coherent and stable policy
      for a given destination/direction, along with other details.
      
      The lower level net->xfrm.xfrm_policy_lock, which among other things
      protects per destination/direction references to policy entries, is
      enough to serialize and benefit from priority inheritance against the
      write side. As a bonus, it makes it officially a per network namespace
      synchronization business where a policy table resize on namespace A
      shouldn't block a policy lookup on namespace B.
      
      Fixes: 77cc278f (xfrm: policy: Use sequence counters with associated lock)
      Cc: stable@vger.kernel.org
      Cc: Ahmed S. Darwish <a.darwish@linutronix.de>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Varad Gautam <varad.gautam@suse.com>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • S
      Revert "xfrm: policy: Read seqcount outside of rcu-read side in xfrm_policy_lookup_bytype" · eaf22826
      Steffen Klassert committed
      This reverts commit d7b04089.
      
      This commit tried to fix a locking bug introduced by commit 77cc278f
      ("xfrm: policy: Use sequence counters with associated lock"). As it
      turned out, this patch did not really fix the bug. A proper fix
      for this bug is applied on top of this revert.
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
  17. 11 Jun, 2021 1 commit
  18. 01 Jun, 2021 1 commit
    • V
      xfrm: policy: Read seqcount outside of rcu-read side in xfrm_policy_lookup_bytype · d7b04089
      Varad Gautam committed
      xfrm_policy_lookup_bytype loops on seqcount mutex xfrm_policy_hash_generation
      within an RCU read side critical section. Although ill advised, this is fine if
      the loop is bounded.
      
      xfrm_policy_hash_generation wraps mutex hash_resize_mutex, which is used to
      serialize writers (xfrm_hash_resize, xfrm_hash_rebuild). This is fine too.
      
      On PREEMPT_RT=y, the read_seqcount_begin call within xfrm_policy_lookup_bytype
      emits a mutex lock/unlock for hash_resize_mutex. Mutex locking is fine, since
      RCU read side critical sections are allowed to sleep with PREEMPT_RT.
      
      xfrm_hash_resize can, however, block on synchronize_rcu while holding
      hash_resize_mutex.
      
      This leads to the following situation on PREEMPT_RT, where the writer is
      blocked on RCU grace period expiry, while the reader is blocked on a lock held
      by the writer:
      
      Thread 1 (xfrm_hash_resize)	Thread 2 (xfrm_policy_lookup_bytype)
      
      				rcu_read_lock();
      mutex_lock(&hash_resize_mutex);
      				read_seqcount_begin(&xfrm_policy_hash_generation);
      				mutex_lock(&hash_resize_mutex); // block
      xfrm_bydst_resize();
      synchronize_rcu(); // block
      		<RCU stalls in xfrm_policy_lookup_bytype>
      
      Move the read_seqcount_begin call outside of the RCU read side critical section,
      and do an rcu_read_unlock/retry if we got stale data within the critical section.
      
      On non-PREEMPT_RT, this shortens the time spent within RCU read side critical
      section in case the seqcount needs a retry, and avoids unbounded looping.
      
      Fixes: 77cc278f ("xfrm: policy: Use sequence counters with associated lock")
      Signed-off-by: Varad Gautam <varad.gautam@suse.com>
      Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
      Cc: netdev@vger.kernel.org
      Cc: stable@vger.kernel.org # v4.9
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: "Ahmed S. Darwish" <a.darwish@linutronix.de>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Acked-by: Ahmed S. Darwish <a.darwish@linutronix.de>
  19. 11 May, 2021 1 commit
  20. 19 Apr, 2021 2 commits
  21. 29 Mar, 2021 1 commit
  22. 04 Jan, 2021 2 commits
    • V
      xfrm: Fix wraparound in xfrm_policy_addr_delta() · da64ae2d
      Visa Hankala committed
      Use three-way comparison for address components to avoid integer
      wraparound in the result of xfrm_policy_addr_delta(). This ensures
      that the search trees are built and traversed correctly.
      
      Treat IPv4 and IPv6 similarly by returning 0 when prefixlen == 0.
      Prefix /0 has only one equivalence class.
      
      Fixes: 9cf545eb ("xfrm: policy: store inexact policies in a tree ordered by destination address")
      Signed-off-by: Visa Hankala <visa@hankala.org>
      Acked-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
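
The three-way comparison from this commit can be illustrated for the IPv4 case. The function name `addr_delta_v4` and its host-byte-order arguments are assumptions of this sketch, not the kernel's signature. A subtraction-based delta such as 0xC0000000 - 0x0A000000 does not fit in a signed int and flips sign, whereas comparing the masked values directly always yields -1, 0 or 1.

```c
#include <assert.h>
#include <stdint.h>

/* Compare the first prefixlen bits of two host-order IPv4 addresses. */
static int addr_delta_v4(uint32_t a, uint32_t b, unsigned int prefixlen)
{
    assert(prefixlen <= 32);
    if (prefixlen == 0)
        return 0;                       /* /0 has a single equivalence class */

    uint32_t mask = (uint32_t)(~UINT32_C(0) << (32 - prefixlen));
    uint32_t ma = a & mask;
    uint32_t mb = b & mask;

    return (ma > mb) - (ma < mb);       /* -1, 0 or 1; cannot wrap around */
}
```

Returning only the sign is sufficient for ordering nodes in a search tree, which is all the caller described in the commit message needs.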
    • E
      xfrm: fix disable_xfrm sysctl when used on xfrm interfaces · 9f8550e4
      Eyal Birger committed
      The disable_xfrm flag signals that xfrm should not be performed during
      routing towards a device before reaching device xmit.
      
      For xfrm interfaces this is usually desired as they perform the outbound
      policy lookup as part of their xmit using their if_id.
      
      Before this change enabling this flag on xfrm interfaces prevented them
      from xmitting as xfrm_lookup_with_ifid() would not perform a policy lookup
      in case the original dst had the DST_NOXFRM flag.
      
      This optimization is incorrect when the lookup is done by the xfrm
      interface xmit logic.
      
      Fix by performing policy lookup when invoked by xfrmi as if_id != 0.
      
      Similarly it's unlikely for the 'no policy exists on net' check to yield
      any performance benefits when invoked from xfrmi.
      
      Fixes: f203b76d ("xfrm: Add virtual xfrm interfaces")
      Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
  23. 24 Aug, 2020 1 commit
  24. 29 Jul, 2020 1 commit
  25. 21 Jul, 2020 1 commit
  26. 17 Jul, 2020 1 commit
  27. 24 Jun, 2020 1 commit
    • X
      xfrm: policy: match with both mark and mask on user interfaces · 4f47e8ab
      Xin Long committed
      In commit ed17b8d3 ("xfrm: fix a warning in xfrm_policy_insert_list"),
      it would take 'priority' to make a policy unique, and allow duplicated
      policies with different 'priority' to be added, which is not expected
      by userland, as Tobias reported in strongswan.
      
      To fix this duplicated policies issue, and also fix the issue in
      commit ed17b8d3 ("xfrm: fix a warning in xfrm_policy_insert_list"),
      when doing add/del/get/update on user interfaces, this patch is to change
      to look up a policy with both mark and mask by doing:
      
        mark.v == pol->mark.v && mark.m == pol->mark.m
      
      and leave the check:
      
        (mark & pol->mark.m) == pol->mark.v
      
      for tx/rx path only.
      
      As the userland expects an exact mark and mask match to manage policies.
      
      v1->v2:
        - make xfrm_policy_mark_match inline and fix the changelog as
          Tobias suggested.
      
      Fixes: 295fae56 ("xfrm: Allow user space manipulation of SPD mark")
      Fixes: ed17b8d3 ("xfrm: fix a warning in xfrm_policy_insert_list")
      Reported-by: Tobias Brunner <tobias@strongswan.org>
      Tested-by: Tobias Brunner <tobias@strongswan.org>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
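
The two mark checks quoted in this commit message can be placed side by side in a small sketch. `fake_mark`, `mark_match_exact` and `mark_match_packet` are hypothetical names; only the two comparison expressions come from the commit text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a mark: value v and mask m. */
struct fake_mark { uint32_t v; uint32_t m; };

/* Management path (add/del/get/update): value and mask must both be equal. */
static bool mark_match_exact(const struct fake_mark *mark,
                             const struct fake_mark *pol_mark)
{
    assert(mark && pol_mark);
    return mark->v == pol_mark->v && mark->m == pol_mark->m;
}

/* Data path (tx/rx): the packet mark is masked, then compared with the value. */
static bool mark_match_packet(uint32_t pkt_mark, const struct fake_mark *pol_mark)
{
    assert(pol_mark);
    return (pkt_mark & pol_mark->m) == pol_mark->v;
}
```

For a policy with mark 0 and mask 0x10, a management request with mark 0 and mask 0x1 is a different policy under the exact check, even though a packet carrying mark 0x1 would match that policy on the data path.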
  28. 25 May, 2020 1 commit
    • X
      xfrm: fix a warning in xfrm_policy_insert_list · ed17b8d3
      Xin Long committed
      This warning can be triggered simply by:
      
        # ip xfrm policy update src 192.168.1.1/24 dst 192.168.1.2/24 dir in \
          priority 1 mark 0 mask 0x10  #[1]
        # ip xfrm policy update src 192.168.1.1/24 dst 192.168.1.2/24 dir in \
          priority 2 mark 0 mask 0x1   #[2]
        # ip xfrm policy update src 192.168.1.1/24 dst 192.168.1.2/24 dir in \
          priority 2 mark 0 mask 0x10  #[3]
      
      Then dmesg shows:
      
        [ ] WARNING: CPU: 1 PID: 7265 at net/xfrm/xfrm_policy.c:1548
        [ ] RIP: 0010:xfrm_policy_insert_list+0x2f2/0x1030
        [ ] Call Trace:
        [ ]  xfrm_policy_inexact_insert+0x85/0xe50
        [ ]  xfrm_policy_insert+0x4ba/0x680
        [ ]  xfrm_add_policy+0x246/0x4d0
        [ ]  xfrm_user_rcv_msg+0x331/0x5c0
        [ ]  netlink_rcv_skb+0x121/0x350
        [ ]  xfrm_netlink_rcv+0x66/0x80
        [ ]  netlink_unicast+0x439/0x630
        [ ]  netlink_sendmsg+0x714/0xbf0
        [ ]  sock_sendmsg+0xe2/0x110
      
      The issue was introduced by Commit 7cb8a939 ("xfrm: Allow inserting
      policies with matching mark and different priorities"). After that, the
      policies [1] and [2] would be able to be added with different priorities.
      
      However, policy [3] will actually match both [1] and [2]. Policy [1]
      was matched due to the 1st 'return true' in xfrm_policy_mark_match(),
      and policy [2] was matched due to the 2nd 'return true' in there. It
      caused WARN_ON() in xfrm_policy_insert_list().
      
      This patch fixes it by treating only policies with the same value and
      priority as the same policy in xfrm_policy_mark_match().
      
      Thanks to Yuehaibing, we could make this fix better.
      
      v1->v2:
        - check policy->mark.v == pol->mark.v only without mask.
      
      Fixes: 7cb8a939 ("xfrm: Allow inserting policies with matching mark and different priorities")
      Reported-by: Xiumei Mu <xmu@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
  29. 24 Mar, 2020 2 commits
  30. 04 Feb, 2020 1 commit
  31. 09 Dec, 2019 1 commit
  32. 02 Oct, 2019 1 commit
    • F
      netfilter: drop bridge nf reset from nf_reset · 895b5c9f
      Florian Westphal committed
      commit 174e2381
      ("sk_buff: drop all skb extensions on free and skb scrubbing") made napi
      recycle always drop skb extensions.  The additional skb_ext_del() that is
      performed via nf_reset on napi skb recycle is not needed anymore.
      
      Most nf_reset() calls in the stack are there so queued skb won't block
      'rmmod nf_conntrack' indefinitely.
      
      This removes the skb_ext_del from nf_reset, and renames it to a more
      fitting nf_reset_ct().
      
      In a few selected places, add a call to skb_ext_reset to make sure that
      no active extensions remain.
      
      I am submitting this for "net", because we're still early in the release
      cycle.  The patch applies to net-next too, but I think the rename causes
      needless divergence between those trees.
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  33. 25 Aug, 2019 1 commit
    • H
      xfrm/xfrm_policy: fix dst dev null pointer dereference in collect_md mode · c3b4c3a4
      Hangbin Liu committed
      In decode_session{4,6} there is a possibility that the skb dst dev is NULL,
      e.g. with tunnel collect_md mode, which will cause a kernel crash.
      Here is what the code path looks like, for GRE:
      
      - ip6gre_tunnel_xmit
        - ip6gre_xmit_ipv6
          - __gre6_xmit
            - ip6_tnl_xmit
              - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE
          - icmpv6_send
            - icmpv6_route_lookup
              - xfrm_decode_session_reverse
                - decode_session4
                  - oif = skb_dst(skb)->dev->ifindex; <-- here
                - decode_session6
                  - oif = skb_dst(skb)->dev->ifindex; <-- here
      
      The reason is __metadata_dst_init() init dst->dev to NULL by default.
      We could not fix it in __metadata_dst_init() as there is no dev supplied.
      On the other hand, the skb_dst(skb)->dev is actually not needed as we
      called decode_session{4,6} via xfrm_decode_session_reverse(), so oif is not
      used by: fl4->flowi4_oif = reverse ? skb->skb_iif : oif;
      
      So adding a dst dev check here should be clean and safe.
      
      v4: No changes.
      
      v3: No changes.
      
      v2: fix the issue in decode_session{4,6} instead of updating shared dst dev
      in {ip_md, ip6}_tunnel_xmit.
      
      Fixes: 8d79266b ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Tested-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  34. 20 Aug, 2019 1 commit
    • F
      xfrm: policy: avoid warning splat when merging nodes · 769a807d
      Florian Westphal committed
      syzbot reported a splat:
       xfrm_policy_inexact_list_reinsert+0x625/0x6e0 net/xfrm/xfrm_policy.c:877
       CPU: 1 PID: 6756 Comm: syz-executor.1 Not tainted 5.3.0-rc2+ #57
       Call Trace:
        xfrm_policy_inexact_node_reinsert net/xfrm/xfrm_policy.c:922 [inline]
        xfrm_policy_inexact_node_merge net/xfrm/xfrm_policy.c:958 [inline]
        xfrm_policy_inexact_insert_node+0x537/0xb50 net/xfrm/xfrm_policy.c:1023
        xfrm_policy_inexact_alloc_chain+0x62b/0xbd0 net/xfrm/xfrm_policy.c:1139
        xfrm_policy_inexact_insert+0xe8/0x1540 net/xfrm/xfrm_policy.c:1182
        xfrm_policy_insert+0xdf/0xce0 net/xfrm/xfrm_policy.c:1574
        xfrm_add_policy+0x4cf/0x9b0 net/xfrm/xfrm_user.c:1670
        xfrm_user_rcv_msg+0x46b/0x720 net/xfrm/xfrm_user.c:2676
        netlink_rcv_skb+0x1f0/0x460 net/netlink/af_netlink.c:2477
        xfrm_netlink_rcv+0x74/0x90 net/xfrm/xfrm_user.c:2684
        netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
        netlink_unicast+0x809/0x9a0 net/netlink/af_netlink.c:1328
        netlink_sendmsg+0xa70/0xd30 net/netlink/af_netlink.c:1917
        sock_sendmsg_nosec net/socket.c:637 [inline]
        sock_sendmsg net/socket.c:657 [inline]
      
      There is no reproducer, however, the warning can be reproduced
      by adding rules with ever smaller prefixes.
      
      The sanity check ("does the policy match the node") uses the prefix value
      of the node before it's updated to the smaller value.
      
      To fix this, update the prefix earlier.  The bug has no impact on tree
      correctness, this is only to prevent a false warning.
      
      Reported-by: syzbot+8cc27ace5f6972910b31@syzkaller.appspotmail.com
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
  35. 03 Jul, 2019 1 commit
    • F
      xfrm: policy: fix bydst hlist corruption on hash rebuild · fd709721
      Florian Westphal committed
      syzbot reported the following splat:
      
      BUG: KASAN: use-after-free in __write_once_size include/linux/compiler.h:221
      BUG: KASAN: use-after-free in hlist_del_rcu include/linux/rculist.h:455
      BUG: KASAN: use-after-free in xfrm_hash_rebuild+0xa0d/0x1000 net/xfrm/xfrm_policy.c:1318
      Write of size 8 at addr ffff888095e79c00 by task kworker/1:3/8066
      Workqueue: events xfrm_hash_rebuild
      Call Trace:
       __write_once_size include/linux/compiler.h:221 [inline]
       hlist_del_rcu include/linux/rculist.h:455 [inline]
       xfrm_hash_rebuild+0xa0d/0x1000 net/xfrm/xfrm_policy.c:1318
       process_one_work+0x814/0x1130 kernel/workqueue.c:2269
      Allocated by task 8064:
       __kmalloc+0x23c/0x310 mm/slab.c:3669
       kzalloc include/linux/slab.h:742 [inline]
       xfrm_hash_alloc+0x38/0xe0 net/xfrm/xfrm_hash.c:21
       xfrm_policy_init net/xfrm/xfrm_policy.c:4036 [inline]
       xfrm_net_init+0x269/0xd60 net/xfrm/xfrm_policy.c:4120
       ops_init+0x336/0x420 net/core/net_namespace.c:130
       setup_net+0x212/0x690 net/core/net_namespace.c:316
      
      The faulting address is the address of the old chain head,
      free'd by xfrm_hash_resize().
      
      In xfrm_hash_rehash(), chain heads get re-initialized without
      any hlist_del_rcu:
      
       for (i = hmask; i >= 0; i--)
          INIT_HLIST_HEAD(odst + i);
      
      Then, hlist_del_rcu() gets called on the about to-be-reinserted policy
      when iterating the per-net list of policies.
      
      hlist_del_rcu() will then make chain->first be nonzero again:
      
      static inline void __hlist_del(struct hlist_node *n)
      {
         struct hlist_node *next = n->next;   // address of next element in list
         struct hlist_node **pprev = n->pprev;// location of previous elem, this
                                              // can point at chain->first
              WRITE_ONCE(*pprev, next);       // chain->first points to next elem
              if (next)
                      next->pprev = pprev;
      
      Then, when we walk chainlist to find insertion point, we may find a
      non-empty list even though we're supposedly reinserting the first
      policy to an empty chain.
      
      To fix this first unlink all exact and inexact policies instead of
      zeroing the list heads.
      
      Add the commands equivalent to the syzbot reproducer to xfrm_policy.sh,
      without fix KASAN catches the corruption as it happens, SLUB poisoning
      detects it a bit later.
      
      Reported-by: syzbot+0165480d4ef07360eeda@syzkaller.appspotmail.com
      Fixes: 1548bc4e ("xfrm: policy: delete inexact policies from inexact list on hash rebuild")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
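
The corruption mechanism described in this commit can be reproduced with a minimal userspace list. The types and helpers below (`hnode`, `hhead`, `hnode_add_head`, `hnode_del`) are hypothetical simplifications of the kernel's hlist, without RCU or WRITE_ONCE; the sketch only shows that re-initializing a head without unlinking its nodes leaves stale `pprev` pointers, so a later delete writes a node address back into the "empty" head.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified hlist: each node stores the address of the pointer that points at it. */
struct hnode { struct hnode *next, **pprev; };
struct hhead { struct hnode *first; };

static void hhead_init(struct hhead *h) { h->first = NULL; }

static void hnode_add_head(struct hnode *n, struct hhead *h)
{
    n->next = h->first;
    if (h->first)
        h->first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Unlink n by writing through pprev, whatever it still points at. */
static void hnode_del(struct hnode *n)
{
    assert(n->pprev != NULL);
    *n->pprev = n->next;
    if (n->next)
        n->next->pprev = n->pprev;
}

/* Returns 1 if the head is non-empty after "re-init, then delete first node". */
static int reinit_then_del_leaves_stale_entry(void)
{
    struct hhead head;
    struct hnode a, b;

    hhead_init(&head);
    hnode_add_head(&b, &head);
    hnode_add_head(&a, &head);  /* chain: head -> a -> b */

    hhead_init(&head);          /* head zeroed, nodes still linked */
    hnode_del(&a);              /* stores a.next (&b) into head.first */

    return head.first == &b;
}
```

In this sketch the stale pointer still refers to live memory; in the scenario from the commit message it refers to a chain head that was already freed, which is why KASAN reports a use-after-free.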
  36. 02 Jul, 2019 1 commit