1. 26 5月, 2019 1 次提交
  2. 24 3月, 2019 1 次提交
    • T
      xfrm: Fix inbound traffic via XFRM interfaces across network namespaces · 6ac400b7
      Tobias Brunner 提交于
      [ Upstream commit 660899ddf06ae8bb5bbbd0a19418b739375430c5 ]
      
      After moving an XFRM interface to another namespace it stays associated
      with the original namespace (net in `struct xfrm_if` and the list keyed
      with `xfrmi_net_id`), allowing processes in the new namespace to use
      SAs/policies that were created in the original namespace.  For instance,
      this allows a keying daemon in one namespace to establish IPsec SAs for
      other namespaces without processes there having access to the keys or IKE
      credentials.
      
      This worked fine for outbound traffic, however, for inbound traffic the
      lookup for the interfaces and the policies used the incorrect namespace
      (the one the XFRM interface was moved to).
      
      Fixes: f203b76d ("xfrm: Add virtual xfrm interfaces")
      Signed-off-by: NTobias Brunner <tobias@strongswan.org>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      6ac400b7
  3. 15 2月, 2019 1 次提交
  4. 11 10月, 2018 1 次提交
  5. 11 9月, 2018 1 次提交
  6. 26 7月, 2018 1 次提交
  7. 20 7月, 2018 1 次提交
    • B
      xfrm: Remove xfrmi interface ID from flowi · bc56b334
      Benedict Wong 提交于
      In order to remove performance impact of having the extra u32 in every
      single flowi, this change removes the flowi_xfrm struct, prefering to
      take the if_id as a method parameter where needed.
      
      In the inbound direction, if_id is only needed during the
      __xfrm_check_policy() function, and the if_id can be determined at that
      point based on the skb. As such, xfrmi_decode_session() is only called
      with the skb in __xfrm_check_policy().
      
      In the outbound direction, the only place where if_id is needed is the
      xfrm_lookup() call in xfrmi_xmit2(). With this change, the if_id is
      directly passed into the xfrm_lookup_with_ifid() call. All existing
      callers can still call xfrm_lookup(), which uses a default if_id of 0.
      
      This change does not change any behavior of XFRMIs except for improving
      overall system performance via flowi size reduction.
      
      This change has been tested against the Android Kernel Networking Tests:
      
      https://android.googlesource.com/kernel/tests/+/master/net/testSigned-off-by: NBenedict Wong <benedictwong@google.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      bc56b334
  8. 11 7月, 2018 1 次提交
    • A
      xfrm: use time64_t for in-kernel timestamps · 386c5680
      Arnd Bergmann 提交于
      The lifetime managment uses '__u64' timestamps on the user space
      interface, but 'unsigned long' for reading the current time in the kernel
      with get_seconds().
      
      While this is probably safe beyond y2038, it will still overflow in 2106,
      and the get_seconds() call is deprecated because fo that.
      
      This changes the xfrm time handling to use time64_t consistently, along
      with reading the time using the safer ktime_get_real_seconds(). It still
      suffers from problems that can happen from a concurrent settimeofday()
      call or (to a lesser degree) a leap second update, but since the time
      stamps are part of the user API, there is nothing we can do to prevent
      that.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      386c5680
  9. 25 6月, 2018 1 次提交
    • F
      xfrm: policy: remove pcpu policy cache · e4db5b61
      Florian Westphal 提交于
      Kristian Evensen says:
        In a project I am involved in, we are running ipsec (Strongswan) on
        different mt7621-based routers. Each router is configured as an
        initiator and has around ~30 tunnels to different responders (running
        on misc. devices). Before the flow cache was removed (kernel 4.9), we
        got a combined throughput of around 70Mbit/s for all tunnels on one
        router. However, we recently switched to kernel 4.14 (4.14.48), and
        the total throughput is somewhere around 57Mbit/s (best-case). I.e., a
        drop of around 20%. Reverting the flow cache removal restores, as
        expected, performance levels to that of kernel 4.9.
      
      When pcpu xdst exists, it has to be validated first before it can be
      used.
      
      A negative hit thus increases cost vs. no-cache.
      
      As number of tunnels increases, hit rate decreases so this pcpu caching
      isn't a viable strategy.
      
      Furthermore, the xdst cache also needs to run with BH off, so when
      removing this the bh disable/enable pairs can be removed too.
      
      Kristian tested a 4.14.y backport of this change and reported
      increased performance:
      
        In our tests, the throughput reduction has been reduced from around -20%
        to -5%. We also see that the overall throughput is independent of the
        number of tunnels, while before the throughput was reduced as the number
        of tunnels increased.
      Reported-by: NKristian Evensen <kristian.evensen@gmail.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      e4db5b61
  10. 23 6月, 2018 4 次提交
  11. 31 5月, 2018 1 次提交
  12. 30 3月, 2018 1 次提交
  13. 28 3月, 2018 1 次提交
  14. 07 3月, 2018 1 次提交
  15. 27 2月, 2018 1 次提交
  16. 20 2月, 2018 1 次提交
  17. 19 2月, 2018 1 次提交
  18. 13 2月, 2018 2 次提交
    • K
      net: Convert pernet_subsys, registered from inet_init() · f84c6821
      Kirill Tkhai 提交于
      arp_net_ops just addr/removes /proc entry.
      
      devinet_ops allocates and frees duplicate of init_net tables
      and (un)registers sysctl entries.
      
      fib_net_ops allocates and frees pernet tables, creates/destroys
      netlink socket and (un)initializes /proc entries. Foreign
      pernet_operations do not touch them.
      
      ip_rt_proc_ops only modifies pernet /proc entries.
      
      xfrm_net_ops creates/destroys /proc entries, allocates/frees
      pernet statistics, hashes and tables, and (un)initializes
      sysctl files. These are not touched by foreigh pernet_operations
      
      xfrm4_net_ops allocates/frees private pernet memory, and
      configures sysctls.
      
      sysctl_route_ops creates/destroys sysctls.
      
      rt_genid_ops only initializes fields of just allocated net.
      
      ipv4_inetpeer_ops allocated/frees net private memory.
      
      igmp_net_ops just creates/destroys /proc files and socket,
      noone else interested in.
      
      tcp_sk_ops seems to be safe, because tcp_sk_init() does not
      depend on any other pernet_operations modifications. Iteration
      over hash table in inet_twsk_purge() is made under RCU lock,
      and it's safe to iterate the table this way. Removing from
      the table happen from inet_twsk_deschedule_put(), but this
      function is safe without any extern locks, as it's synchronized
      inside itself. There are many examples, it's used in different
      context. So, it's safe to leave tcp_sk_exit_batch() unlocked.
      
      tcp_net_metrics_ops is synchronized on tcp_metrics_lock and safe.
      
      udplite4_net_ops only creates/destroys pernet /proc file.
      
      icmp_sk_ops creates percpu sockets, not touched by foreign
      pernet_operations.
      
      ipmr_net_ops creates/destroys pernet fib tables, (un)registers
      fib rules and /proc files. This seem to be safe to execute
      in parallel with foreign pernet_operations.
      
      af_inet_ops just sets up default parameters of newly created net.
      
      ipv4_mib_ops creates and destroys pernet percpu statistics.
      
      raw_net_ops, tcp4_net_ops, udp4_net_ops, ping_v4_net_ops
      and ip_proc_ops only create/destroy pernet /proc files.
      
      ip4_frags_ops creates and destroys sysctl file.
      
      So, it's safe to make the pernet_operations async.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: NAndrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f84c6821
    • S
      xfrm: Fix policy hold queue after flowcache removal. · 2471c981
      Steffen Klassert 提交于
      Now that the flowcache is removed we need to generate
      a new dummy bundle every time we check if the needed
      SAs are in place because the dummy bundle is not cached
      anymore. Fix it by passing the XFRM_LOOKUP_QUEUE flag
      to xfrm_lookup(). This makes sure that we get a dummy
      bundle in case the SAs are not yet in place.
      
      Fixes: 3ca28286 ("xfrm_policy: bypass flow_cache_lookup")
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      2471c981
  19. 10 1月, 2018 1 次提交
  20. 08 1月, 2018 1 次提交
  21. 30 12月, 2017 1 次提交
    • F
      xfrm: skip policies marked as dead while rehashing · 862591bf
      Florian Westphal 提交于
      syzkaller triggered following KASAN splat:
      
      BUG: KASAN: slab-out-of-bounds in xfrm_hash_rebuild+0xdbe/0xf00 net/xfrm/xfrm_policy.c:618
      read of size 2 at addr ffff8801c8e92fe4 by task kworker/1:1/23 [..]
      Workqueue: events xfrm_hash_rebuild [..]
       __asan_report_load2_noabort+0x14/0x20 mm/kasan/report.c:428
       xfrm_hash_rebuild+0xdbe/0xf00 net/xfrm/xfrm_policy.c:618
       process_one_work+0xbbf/0x1b10 kernel/workqueue.c:2112
       worker_thread+0x223/0x1990 kernel/workqueue.c:2246 [..]
      
      The reproducer triggers:
      1016                 if (error) {
      1017                         list_move_tail(&walk->walk.all, &x->all);
      1018                         goto out;
      1019                 }
      
      in xfrm_policy_walk() via pfkey (it sets tiny rcv space, dump
      callback returns -ENOBUFS).
      
      In this case, *walk is located the pfkey socket struct, so this socket
      becomes visible in the global policy list.
      
      It looks like this is intentional -- phony walker has walk.dead set to 1
      and all other places skip such "policies".
      
      Ccing original authors of the two commits that seem to expose this
      issue (first patch missed ->dead check, second patch adds pfkey
      sockets to policies dumper list).
      
      Fixes: 880a6fab ("xfrm: configure policy hash table thresholds by netlink")
      Fixes: 12a169e7 ("ipsec: Put dumpers on the dump list")
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Timo Teras <timo.teras@iki.fi>
      Cc: Christophe Gouault <christophe.gouault@6wind.com>
      Reported-by: Nsyzbot <bot+c028095236fcb6f4348811565b75084c754dc729@syzkaller.appspotmail.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      862591bf
  22. 12 12月, 2017 1 次提交
  23. 01 12月, 2017 1 次提交
  24. 30 11月, 2017 5 次提交
  25. 15 11月, 2017 1 次提交
  26. 14 11月, 2017 1 次提交
  27. 03 11月, 2017 2 次提交
    • S
      xfrm: Fix stack-out-of-bounds read in xfrm_state_find. · c9f3f813
      Steffen Klassert 提交于
      When we do tunnel or beet mode, we pass saddr and daddr from the
      template to xfrm_state_find(), this is ok. On transport mode,
      we pass the addresses from the flowi, assuming that the IP
      addresses (and address family) don't change during transformation.
      This assumption is wrong in the IPv4 mapped IPv6 case, packet
      is IPv4 and template is IPv6. Fix this by using the addresses
      from the template unconditionally.
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      c9f3f813
    • F
      xfrm: do unconditional template resolution before pcpu cache check · cf379667
      Florian Westphal 提交于
      Stephen Smalley says:
       Since 4.14-rc1, the selinux-testsuite has been encountering sporadic
       failures during testing of labeled IPSEC. git bisect pointed to
       commit ec30d ("xfrm: add xdst pcpu cache").
       The xdst pcpu cache is only checking that the policies are the same,
       but does not validate that the policy, state, and flow match with respect
       to security context labeling.
       As a result, the wrong SA could be used and the receiver could end up
       performing permission checking and providing SO_PEERSEC or SCM_SECURITY
       values for the wrong security context.
      
      This fix makes it so that we always do the template resolution, and
      then checks that the found states match those in the pcpu bundle.
      
      This has the disadvantage of doing a bit more work (lookup in state hash
      table) if we can reuse the xdst entry (we only avoid xdst alloc/free)
      but we don't add a lot of extra work in case we can't reuse.
      
      xfrm_pol_dead() check is removed, reasoning is that
      xfrm_tmpl_resolve does all needed checks.
      
      Cc: Paul Moore <paul@paul-moore.com>
      Fixes: ec30d78c ("xfrm: add xdst pcpu cache")
      Reported-by: NStephen Smalley <sds@tycho.nsa.gov>
      Tested-by: NStephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Acked-by: NPaul Moore <paul@paul-moore.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      cf379667
  28. 24 10月, 2017 1 次提交
  29. 18 10月, 2017 1 次提交
  30. 11 10月, 2017 1 次提交
  31. 24 8月, 2017 1 次提交