1. 10 Nov, 2020 1 commit
  2. 23 Oct, 2020 1 commit
    • net: xfrm: fix a race condition during allocing spi · a779d913
      zhuoliang zhang authored
      we found that the following race condition exists in
      xfrm_alloc_userspi flow:
      
      user thread                                     state_hash_work thread
      ----                                            ----
      xfrm_alloc_userspi()
       __find_acq_core()
         /*alloc new xfrm_state:x*/
         xfrm_state_alloc()
         /*schedule state_hash_work thread*/
         xfrm_hash_grow_check()                       xfrm_hash_resize()
       xfrm_alloc_spi()                                /*hold lock*/
         x->id.spi = htonl(spi)                        spin_lock_bh(&net->xfrm.xfrm_state_lock)
         /*waiting lock release*/                      xfrm_hash_transfer()
         spin_lock_bh(&net->xfrm.xfrm_state_lock)        /*add x into hlist:net->xfrm.state_byspi*/
                                                         hlist_add_head_rcu(&x->byspi)
                                                       spin_unlock_bh(&net->xfrm.xfrm_state_lock)
      
         /*add x into hlist:net->xfrm.state_byspi 2 times*/
         hlist_add_head_rcu(&x->byspi)
      
      1. a new state x is allocated in xfrm_state_alloc() and added to the bydst
      hlist in __find_acq_core() on the LHS;
      2. on the RHS, the state_hash_work thread traverses the old bydst hlist and
      transfers every xfrm_state (including x) into the new bydst hlist and the
      new byspi hlist;
      3. the user thread on the LHS gets the lock and adds x to the new byspi
      hlist again.
      
      So the same xfrm_state (x) is added to the same hash list
      (net->xfrm.state_byspi) twice, which turns that list into an
      infinite loop.
      
      To fix the race, the assignment x->id.spi = htonl(spi) in xfrm_alloc_spi()
      is moved after spin_lock_bh(), so that the state_hash_work thread no longer
      adds x, whose id.spi is still zero, to the byspi hash list.
      
      Fixes: f034b5d4 ("[XFRM]: Dynamic xfrm_state hash table sizing.")
      Signed-off-by: zhuoliang zhang <zhuoliang.zhang@mediatek.com>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
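The effect of inserting the same node twice at the head of a hash chain can be reproduced outside the kernel. The following is a minimal sketch, assuming simplified stand-ins (`struct hnode`, `struct hhead`, `hnode_add_head`, `hchain_length` are hypothetical names, not kernel APIs) that perform the same pointer updates as a non-RCU hlist head insertion. It is only meant to show why the second insertion of x makes the chain loop back on itself.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's hlist types (illustration only). */
struct hnode {
    struct hnode *next;
    struct hnode **pprev;
};

struct hhead {
    struct hnode *first;
};

/* Same pointer updates as a plain (non-RCU) hlist head insertion. */
static void hnode_add_head(struct hnode *n, struct hhead *h)
{
    struct hnode *first = h->first;

    n->next = first;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/*
 * Walk at most `limit` nodes. Returns the number of nodes visited, or -1
 * if the walk did not end within the limit, i.e. the chain has a cycle.
 */
static int hchain_length(const struct hhead *h, int limit)
{
    const struct hnode *pos;
    int count = 0;

    assert(limit > 0);
    for (pos = h->first; pos; pos = pos->next) {
        if (++count > limit)
            return -1;
    }
    return count;
}
```

If x is already the first node, the second call sets `x.next` to x itself, so any unbounded traversal of that chain never terminates.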
  3. 25 Sep, 2020 1 commit
    • xfrm: Use correct address family in xfrm_state_find · e94ee171
      Herbert Xu authored
      The struct flowi must never be interpreted by itself as its size
      depends on the address family.  Therefore it must always be grouped
      with its original family value.
      
      In this particular instance, the original family value is lost in
      the function xfrm_state_find.  Therefore we get a bogus read when
      it's coupled with the wrong family which would occur with inter-
      family xfrm states.
      
      This patch fixes it by keeping the original family value.
      
      Note that the same bug could potentially occur in LSM through
      the xfrm_state_pol_flow_match hook.  I checked the current code
      there and it seems to be safe for now as only secid is used which
      is part of struct flowi_common.  But that API should be changed
      so that we don't get new bugs in the future.  We could
      do that by replacing fl with just secid or adding a family field.
      
      Reported-by: syzbot+577fbac3145a6eb2e7a5@syzkaller.appspotmail.com
      Fixes: 48b8d783 ("[XFRM]: State selection update to use inner...")
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
  4. 24 Sep, 2020 2 commits
  5. 07 Sep, 2020 3 commits
  6. 25 Jul, 2020 1 commit
  7. 19 Feb, 2020 1 commit
  8. 09 Dec, 2019 1 commit
  9. 07 Nov, 2019 1 commit
  10. 01 Jul, 2019 1 commit
  11. 06 Jun, 2019 2 commits
  12. 05 Jun, 2019 3 commits
  13. 21 May, 2019 1 commit
  14. 23 Apr, 2019 1 commit
  15. 08 Apr, 2019 4 commits
  16. 26 Mar, 2019 1 commit
    • xfrm: clean up xfrm protocol checks · dbb2483b
      Cong Wang authored
      In commit 6a53b759 ("xfrm: check id proto in validate_tmpl()")
      I introduced a check for xfrm protocol, but according to Herbert
      IPSEC_PROTO_ANY should only be used as a wildcard for lookup, so
      it should be removed from validate_tmpl().
      
      Also, IPSEC_PROTO_ANY is expected to match only the 3 IPsec-specific
      protocols, which is why xfrm_state_flush() could still miss
      IPPROTO_ROUTING, leaving those entries in net->xfrm.state_all
      when the net namespace exits. Fix this by replacing
      IPSEC_PROTO_ANY with zero.
      
      This patch also extracts the check from validate_tmpl() to
      xfrm_id_proto_valid() and uses it in parse_ipsecrequest().
      With this, no other protocols should be added into xfrm.
      
      Fixes: 6a53b759 ("xfrm: check id proto in validate_tmpl()")
      Reported-by: syzbot+0bf0519d6e0de15914fe@syzkaller.appspotmail.com
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
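The difference between strict validation and wildcard matching described in this commit can be sketched in user space. This is a hedged illustration, not the kernel code: the `PROTO_*` constants and the `id_proto_valid`, `id_proto_match` and `count_flushed` helpers are hypothetical names, and the exact set of protocols accepted by the validator (AH, ESP, IPComp, plus IPv6 routing and destination options) is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* IANA protocol numbers, spelled out so the sketch needs no kernel headers. */
enum {
    PROTO_ROUTING = 43,  /* IPv6 routing header */
    PROTO_ESP     = 50,
    PROTO_AH      = 51,
    PROTO_DSTOPTS = 60,  /* IPv6 destination options */
    PROTO_COMP    = 108, /* IPComp */
    PROTO_ANY     = 255  /* wildcard, meant for lookups only */
};

/* Strict validation for user-supplied templates: the wildcard is rejected. */
static bool id_proto_valid(uint8_t proto)
{
    switch (proto) {
    case PROTO_AH:
    case PROTO_ESP:
    case PROTO_COMP:
    case PROTO_ROUTING:
    case PROTO_DSTOPTS:
        return true;
    default:
        return false;
    }
}

/*
 * Lookup-style match: a filter of 0 matches every protocol, while
 * PROTO_ANY matches only the three IPsec-specific protocols.
 */
static bool id_proto_match(uint8_t proto, uint8_t filter)
{
    if (filter == 0 || proto == filter)
        return true;
    return filter == PROTO_ANY &&
           (proto == PROTO_AH || proto == PROTO_ESP || proto == PROTO_COMP);
}

/* Count how many entries a flush with the given filter would remove. */
static size_t count_flushed(const uint8_t *protos, size_t n, uint8_t filter)
{
    size_t i, hits = 0;

    assert(protos != NULL);
    for (i = 0; i < n; i++)
        if (id_proto_match(protos[i], filter))
            hits++;
    return hits;
}
```

With a table holding ESP, AH and routing-header entries, a flush filtered by `PROTO_ANY` leaves the routing entry behind, whereas a filter of zero removes all three.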
  17. 22 Mar, 2019 1 commit
  18. 05 Feb, 2019 1 commit
    • xfrm: destroy xfrm_state synchronously on net exit path · f75a2804
      Cong Wang authored
      xfrm_state_put() moves struct xfrm_state to the GC list
      and schedules the GC work to clean it up. On net exit call
      path, xfrm_state_flush() is called to clean up and
      xfrm_flush_gc() is called to wait for the GC work to complete
      before exit.
      
      However, this doesn't work because one of the ->destructor(),
      ipcomp_destroy(), schedules the same GC work again inside
      the GC work. It is hard to wait for such a nested async
      callback. This is also why syzbot still reports the following
      warning:
      
       WARNING: CPU: 1 PID: 33 at net/ipv6/xfrm6_tunnel.c:351 xfrm6_tunnel_net_exit+0x2cb/0x500 net/ipv6/xfrm6_tunnel.c:351
       ...
        ops_exit_list.isra.0+0xb0/0x160 net/core/net_namespace.c:153
        cleanup_net+0x51d/0xb10 net/core/net_namespace.c:551
        process_one_work+0xd0c/0x1ce0 kernel/workqueue.c:2153
        worker_thread+0x143/0x14a0 kernel/workqueue.c:2296
        kthread+0x357/0x430 kernel/kthread.c:246
        ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
      
      In fact, it is perfectly fine to bypass GC and destroy xfrm_state
      synchronously on net exit call path, because it is in process context
      and doesn't need a work struct to do any blocking work.
      
      This patch introduces xfrm_state_put_sync(), which simply bypasses
      GC, and lets its callers decide whether to use this synchronous
      version. On the net exit path, xfrm_state_fini() and
      xfrm6_tunnel_net_exit() use it. And, as ipcomp_destroy() itself is
      blocking, it can use xfrm_state_put_sync() directly too.
      
      Also rename xfrm_state_gc_destroy() to ___xfrm_state_destroy() to
      reflect this change.
      
      Fixes: b48c05ab ("xfrm: Fix warning in xfrm6_tunnel_net_exit.")
      Reported-and-tested-by: syzbot+e9aebef558e3ed673934@syzkaller.appspotmail.com
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
  19. 23 Nov, 2018 1 commit
  20. 06 Nov, 2018 1 commit
  21. 01 Nov, 2018 1 commit
    • compat: Cleanup in_compat_syscall() callers · 98f76206
      Dmitry Safonov authored
      Now that in_compat_syscall() is consistent on all architectures and
      no longer reports true on native i686, the workarounds (ifdeffery and
      helpers) can be removed.
      
      Signed-off-by: Dmitry Safonov <dima@arista.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Andy Lutomirsky <luto@kernel.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: Stephen Boyd <sboyd@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: linux-efi@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181012134253.23266-3-dima@arista.com
  22. 20 Jul, 2018 2 commits
    • xfrm: Allow xfrmi if_id to be updated by UPDSA · 5baf4f9c
      Nathan Harold authored
      Allow attaching an SA to an xfrm interface id after
      the creation of the SA, so that tasks such as keying,
      which must be done as the SA is created, can remain
      separate from the decision on how to route traffic
      from an SA. This permits SA creation to be decomposed
      into three separate steps:
      1) allocation of a SPI
      2) algorithm and key negotiation
      3) insertion into the data path
      
      Signed-off-by: Nathan Harold <nharold@google.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • xfrm: Remove xfrmi interface ID from flowi · bc56b334
      Benedict Wong authored
      In order to remove the performance impact of having the extra u32 in every
      single flowi, this change removes the flowi_xfrm struct, preferring to
      take the if_id as a method parameter where needed.
      
      In the inbound direction, if_id is only needed during the
      __xfrm_check_policy() function, and the if_id can be determined at that
      point based on the skb. As such, xfrmi_decode_session() is only called
      with the skb in __xfrm_check_policy().
      
      In the outbound direction, the only place where if_id is needed is the
      xfrm_lookup() call in xfrmi_xmit2(). With this change, the if_id is
      directly passed into the xfrm_lookup_with_ifid() call. All existing
      callers can still call xfrm_lookup(), which uses a default if_id of 0.
      
      This change does not change any behavior of XFRMIs except for improving
      overall system performance via flowi size reduction.
      
      This change has been tested against the Android Kernel Networking Tests:
      
      https://android.googlesource.com/kernel/tests/+/master/net/test
      
      Signed-off-by: Benedict Wong <benedictwong@google.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
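The pattern of passing if_id as an explicit parameter while keeping the old entry point as a wrapper with a default of 0 can be sketched as follows. This is a toy model under stated assumptions: `struct toy_policy`, `toy_lookup_with_ifid` and `toy_lookup` are hypothetical names, and the linear table scan has nothing to do with how the kernel actually finds policies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy policy entry; only the interface id matters for this sketch. */
struct toy_policy {
    uint32_t if_id;
    int action;
};

/* Lookup that takes the interface id as an explicit argument. */
static const struct toy_policy *
toy_lookup_with_ifid(const struct toy_policy *tbl, size_t n, uint32_t if_id)
{
    size_t i;

    assert(tbl != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (tbl[i].if_id == if_id)
            return &tbl[i];
    return NULL;
}

/* Existing callers keep the old signature and get the default if_id of 0. */
static const struct toy_policy *
toy_lookup(const struct toy_policy *tbl, size_t n)
{
    return toy_lookup_with_ifid(tbl, n, 0);
}
```

The design choice mirrored here is that the flow key itself carries no interface id, so callers that never use one pay nothing for it.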
  23. 11 Jul, 2018 1 commit
    • xfrm: use time64_t for in-kernel timestamps · 386c5680
      Arnd Bergmann authored
      The lifetime management uses '__u64' timestamps on the user space
      interface, but 'unsigned long' for reading the current time in the kernel
      with get_seconds().
      
      While this is probably safe beyond y2038, it will still overflow in 2106,
      and the get_seconds() call is deprecated because of that.
      
      This changes the xfrm time handling to use time64_t consistently, along
      with reading the time using the safer ktime_get_real_seconds(). It still
      suffers from problems that can happen from a concurrent settimeofday()
      call or (to a lesser degree) a leap second update, but since the time
      stamps are part of the user API, there is nothing we can do to prevent
      that.
      
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
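The 2106 overflow mentioned above comes from an unsigned 32-bit seconds counter wrapping at 2^32. The sketch below is an illustration only, assuming hypothetical helpers (`clock_32bit`, `clock_64bit`, `expired`) rather than the kernel's lifetime code. It compares a wrapped 32-bit clock with a 64-bit clock against a 64-bit expiry deadline.

```c
#include <assert.h>
#include <stdint.h>

/* Current time as seen through a 32-bit unsigned counter (wraps at 2^32). */
static uint64_t clock_32bit(int64_t real_seconds)
{
    assert(real_seconds >= 0);
    return (uint32_t)real_seconds;
}

/* Current time kept in 64 bits; no wrap for any realistic date. */
static uint64_t clock_64bit(int64_t real_seconds)
{
    assert(real_seconds >= 0);
    return (uint64_t)real_seconds;
}

/* Expiry check against 64-bit timestamps, as on the user space interface. */
static int expired(uint64_t add_time, uint64_t lifetime, uint64_t now)
{
    return now >= add_time + lifetime;
}
```

For a state added shortly before the wrap, the 32-bit clock restarts near zero, so the deadline is never reached, while the 64-bit clock reports the expiry as expected.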
  24. 01 Jul, 2018 1 commit
    • xfrm: Allow Set Mark to be Updated Using UPDSA · 6d8e85ff
      Nathan Harold authored
      Allow UPDSA to change "set mark" to permit
      policy separation of packet routing decisions from
      SA keying in systems that use mark-based routing.
      
      The set mark, used as a routing and firewall mark
      for outbound packets, is made update-able which
      allows routing decisions to be handled independently
      of keying/SA creation. To maintain consistency with
      other optional attributes, the set mark is only
      updated if sent with a non-zero value.
      
      The per-SA lock and the xfrm_state_lock are taken in
      that order to avoid a deadlock with
      xfrm_timer_handler(), which also takes the locks in
      that order.
      
      Signed-off-by: Nathan Harold <nharold@google.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
  25. 25 Jun, 2018 1 commit
    • xfrm: policy: remove pcpu policy cache · e4db5b61
      Florian Westphal authored
      Kristian Evensen says:
        In a project I am involved in, we are running ipsec (Strongswan) on
        different mt7621-based routers. Each router is configured as an
        initiator and has around ~30 tunnels to different responders (running
        on misc. devices). Before the flow cache was removed (kernel 4.9), we
        got a combined throughput of around 70Mbit/s for all tunnels on one
        router. However, we recently switched to kernel 4.14 (4.14.48), and
        the total throughput is somewhere around 57Mbit/s (best-case). I.e., a
        drop of around 20%. Reverting the flow cache removal restores, as
        expected, performance levels to that of kernel 4.9.
      
      When pcpu xdst exists, it has to be validated first before it can be
      used.
      
      A negative hit thus increases cost vs. no-cache.
      
      As number of tunnels increases, hit rate decreases so this pcpu caching
      isn't a viable strategy.
      
      Furthermore, the xdst cache also needs to run with BH off, so when
      removing this the bh disable/enable pairs can be removed too.
      
      Kristian tested a 4.14.y backport of this change and reported
      increased performance:
      
        In our tests, the throughput reduction has been reduced from around -20%
        to -5%. We also see that the overall throughput is independent of the
        number of tunnels, while before the throughput was reduced as the number
        of tunnels increased.
      
      Reported-by: Kristian Evensen <kristian.evensen@gmail.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
  26. 23 Jun, 2018 1 commit
  27. 04 May, 2018 1 commit
    • xfrm: use a dedicated slab cache for struct xfrm_state · 565f0fa9
      Mathias Krause authored
      struct xfrm_state is rather large (768 bytes here) and therefore wastes
      quite a lot of memory as it falls into the kmalloc-1024 slab cache,
      leaving 256 bytes of unused memory per XFRM state object -- a net waste
      of 25%.
      
      Using a dedicated slab cache for struct xfrm_state reduces the level of
      internal fragmentation to a minimum.
      
      On my configuration SLUB chooses to create a slab cache covering 4
      pages holding 21 objects, resulting in an average memory waste of ~13
      bytes per object -- a net waste of only 1.6%.
      
      In my tests this led to memory savings of roughly 2.3MB for 10k XFRM
      states.
      
      Signed-off-by: Mathias Krause <minipli@googlemail.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
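The waste figures quoted in this commit can be checked with simple arithmetic. The sketch below assumes 4096-byte pages and a 768-byte object with no per-object metadata. `struct slab_fit` and `fit_objects` are hypothetical names, and this is not how SLUB itself computes its layout.

```c
#include <assert.h>
#include <stddef.h>

/* How many objects fit in a slab and how many bytes are left over. */
struct slab_fit {
    size_t objects;     /* whole objects that fit */
    size_t waste_total; /* unused bytes in the slab */
};

static struct slab_fit fit_objects(size_t slab_bytes, size_t obj_bytes)
{
    struct slab_fit f;

    assert(obj_bytes > 0 && obj_bytes <= slab_bytes);
    f.objects = slab_bytes / obj_bytes;
    f.waste_total = slab_bytes - f.objects * obj_bytes;
    return f;
}
```

Under these assumptions, `fit_objects(1024, 768)` yields one object with 256 bytes unused (25% of the 1024-byte slot), and `fit_objects(4 * 4096, 768)` yields 21 objects with 256 bytes unused in total, which is roughly 12 bytes per object and about 1.6% of the 16384-byte slab, in line with the numbers reported above.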
  28. 16 Apr, 2018 1 commit
  29. 02 Feb, 2018 1 commit
  30. 23 Jan, 2018 1 commit