1. 25 6月, 2018 1 次提交
    • F
      xfrm: policy: remove pcpu policy cache · e4db5b61
      Florian Westphal 提交于
      Kristian Evensen says:
        In a project I am involved in, we are running ipsec (Strongswan) on
        different mt7621-based routers. Each router is configured as an
        initiator and has around ~30 tunnels to different responders (running
        on misc. devices). Before the flow cache was removed (kernel 4.9), we
        got a combined throughput of around 70Mbit/s for all tunnels on one
        router. However, we recently switched to kernel 4.14 (4.14.48), and
        the total throughput is somewhere around 57Mbit/s (best-case). I.e., a
        drop of around 20%. Reverting the flow cache removal restores, as
        expected, performance levels to that of kernel 4.9.
      
      When pcpu xdst exists, it has to be validated first before it can be
      used.
      
      A negative hit thus increases cost vs. no-cache.
      
      As number of tunnels increases, hit rate decreases so this pcpu caching
      isn't a viable strategy.
      
      Furthermore, the xdst cache also needs to run with BH off, so when
      removing this the bh disable/enable pairs can be removed too.
      
      Kristian tested a 4.14.y backport of this change and reported
      increased performance:
      
        In our tests, the throughput reduction has been reduced from around -20%
        to -5%. We also see that the overall throughput is independent of the
        number of tunnels, while before the throughput was reduced as the number
        of tunnels increased.
      Reported-by: NKristian Evensen <kristian.evensen@gmail.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      e4db5b61
  2. 23 6月, 2018 3 次提交
  3. 31 5月, 2018 1 次提交
  4. 30 3月, 2018 1 次提交
  5. 28 3月, 2018 1 次提交
  6. 07 3月, 2018 1 次提交
  7. 27 2月, 2018 1 次提交
  8. 20 2月, 2018 1 次提交
  9. 19 2月, 2018 1 次提交
  10. 13 2月, 2018 2 次提交
    • K
      net: Convert pernet_subsys, registered from inet_init() · f84c6821
      Kirill Tkhai 提交于
      arp_net_ops just addr/removes /proc entry.
      
      devinet_ops allocates and frees duplicate of init_net tables
      and (un)registers sysctl entries.
      
      fib_net_ops allocates and frees pernet tables, creates/destroys
      netlink socket and (un)initializes /proc entries. Foreign
      pernet_operations do not touch them.
      
      ip_rt_proc_ops only modifies pernet /proc entries.
      
      xfrm_net_ops creates/destroys /proc entries, allocates/frees
      pernet statistics, hashes and tables, and (un)initializes
      sysctl files. These are not touched by foreigh pernet_operations
      
      xfrm4_net_ops allocates/frees private pernet memory, and
      configures sysctls.
      
      sysctl_route_ops creates/destroys sysctls.
      
      rt_genid_ops only initializes fields of just allocated net.
      
      ipv4_inetpeer_ops allocated/frees net private memory.
      
      igmp_net_ops just creates/destroys /proc files and socket,
      noone else interested in.
      
      tcp_sk_ops seems to be safe, because tcp_sk_init() does not
      depend on any other pernet_operations modifications. Iteration
      over hash table in inet_twsk_purge() is made under RCU lock,
      and it's safe to iterate the table this way. Removing from
      the table happen from inet_twsk_deschedule_put(), but this
      function is safe without any extern locks, as it's synchronized
      inside itself. There are many examples, it's used in different
      context. So, it's safe to leave tcp_sk_exit_batch() unlocked.
      
      tcp_net_metrics_ops is synchronized on tcp_metrics_lock and safe.
      
      udplite4_net_ops only creates/destroys pernet /proc file.
      
      icmp_sk_ops creates percpu sockets, not touched by foreign
      pernet_operations.
      
      ipmr_net_ops creates/destroys pernet fib tables, (un)registers
      fib rules and /proc files. This seem to be safe to execute
      in parallel with foreign pernet_operations.
      
      af_inet_ops just sets up default parameters of newly created net.
      
      ipv4_mib_ops creates and destroys pernet percpu statistics.
      
      raw_net_ops, tcp4_net_ops, udp4_net_ops, ping_v4_net_ops
      and ip_proc_ops only create/destroy pernet /proc files.
      
      ip4_frags_ops creates and destroys sysctl file.
      
      So, it's safe to make the pernet_operations async.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: NAndrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f84c6821
    • S
      xfrm: Fix policy hold queue after flowcache removal. · 2471c981
      Steffen Klassert 提交于
      Now that the flowcache is removed we need to generate
      a new dummy bundle every time we check if the needed
      SAs are in place because the dummy bundle is not cached
      anymore. Fix it by passing the XFRM_LOOKUP_QUEUE flag
      to xfrm_lookup(). This makes sure that we get a dummy
      bundle in case the SAs are not yet in place.
      
      Fixes: 3ca28286 ("xfrm_policy: bypass flow_cache_lookup")
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      2471c981
  11. 10 1月, 2018 1 次提交
  12. 08 1月, 2018 1 次提交
  13. 30 12月, 2017 1 次提交
    • F
      xfrm: skip policies marked as dead while rehashing · 862591bf
      Florian Westphal 提交于
      syzkaller triggered following KASAN splat:
      
      BUG: KASAN: slab-out-of-bounds in xfrm_hash_rebuild+0xdbe/0xf00 net/xfrm/xfrm_policy.c:618
      read of size 2 at addr ffff8801c8e92fe4 by task kworker/1:1/23 [..]
      Workqueue: events xfrm_hash_rebuild [..]
       __asan_report_load2_noabort+0x14/0x20 mm/kasan/report.c:428
       xfrm_hash_rebuild+0xdbe/0xf00 net/xfrm/xfrm_policy.c:618
       process_one_work+0xbbf/0x1b10 kernel/workqueue.c:2112
       worker_thread+0x223/0x1990 kernel/workqueue.c:2246 [..]
      
      The reproducer triggers:
      1016                 if (error) {
      1017                         list_move_tail(&walk->walk.all, &x->all);
      1018                         goto out;
      1019                 }
      
      in xfrm_policy_walk() via pfkey (it sets tiny rcv space, dump
      callback returns -ENOBUFS).
      
      In this case, *walk is located the pfkey socket struct, so this socket
      becomes visible in the global policy list.
      
      It looks like this is intentional -- phony walker has walk.dead set to 1
      and all other places skip such "policies".
      
      Ccing original authors of the two commits that seem to expose this
      issue (first patch missed ->dead check, second patch adds pfkey
      sockets to policies dumper list).
      
      Fixes: 880a6fab ("xfrm: configure policy hash table thresholds by netlink")
      Fixes: 12a169e7 ("ipsec: Put dumpers on the dump list")
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Timo Teras <timo.teras@iki.fi>
      Cc: Christophe Gouault <christophe.gouault@6wind.com>
      Reported-by: Nsyzbot <bot+c028095236fcb6f4348811565b75084c754dc729@syzkaller.appspotmail.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      862591bf
  14. 12 12月, 2017 1 次提交
  15. 01 12月, 2017 1 次提交
  16. 30 11月, 2017 5 次提交
  17. 15 11月, 2017 1 次提交
  18. 14 11月, 2017 1 次提交
  19. 03 11月, 2017 2 次提交
    • S
      xfrm: Fix stack-out-of-bounds read in xfrm_state_find. · c9f3f813
      Steffen Klassert 提交于
      When we do tunnel or beet mode, we pass saddr and daddr from the
      template to xfrm_state_find(), this is ok. On transport mode,
      we pass the addresses from the flowi, assuming that the IP
      addresses (and address family) don't change during transformation.
      This assumption is wrong in the IPv4 mapped IPv6 case, packet
      is IPv4 and template is IPv6. Fix this by using the addresses
      from the template unconditionally.
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      c9f3f813
    • F
      xfrm: do unconditional template resolution before pcpu cache check · cf379667
      Florian Westphal 提交于
      Stephen Smalley says:
       Since 4.14-rc1, the selinux-testsuite has been encountering sporadic
       failures during testing of labeled IPSEC. git bisect pointed to
       commit ec30d ("xfrm: add xdst pcpu cache").
       The xdst pcpu cache is only checking that the policies are the same,
       but does not validate that the policy, state, and flow match with respect
       to security context labeling.
       As a result, the wrong SA could be used and the receiver could end up
       performing permission checking and providing SO_PEERSEC or SCM_SECURITY
       values for the wrong security context.
      
      This fix makes it so that we always do the template resolution, and
      then checks that the found states match those in the pcpu bundle.
      
      This has the disadvantage of doing a bit more work (lookup in state hash
      table) if we can reuse the xdst entry (we only avoid xdst alloc/free)
      but we don't add a lot of extra work in case we can't reuse.
      
      xfrm_pol_dead() check is removed, reasoning is that
      xfrm_tmpl_resolve does all needed checks.
      
      Cc: Paul Moore <paul@paul-moore.com>
      Fixes: ec30d78c ("xfrm: add xdst pcpu cache")
      Reported-by: NStephen Smalley <sds@tycho.nsa.gov>
      Tested-by: NStephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Acked-by: NPaul Moore <paul@paul-moore.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      cf379667
  20. 24 10月, 2017 1 次提交
  21. 18 10月, 2017 1 次提交
  22. 11 10月, 2017 1 次提交
  23. 24 8月, 2017 1 次提交
  24. 11 8月, 2017 1 次提交
    • L
      net: xfrm: support setting an output mark. · 077fbac4
      Lorenzo Colitti 提交于
      On systems that use mark-based routing it may be necessary for
      routing lookups to use marks in order for packets to be routed
      correctly. An example of such a system is Android, which uses
      socket marks to route packets via different networks.
      
      Currently, routing lookups in tunnel mode always use a mark of
      zero, making routing incorrect on such systems.
      
      This patch adds a new output_mark element to the xfrm state and
      a corresponding XFRMA_OUTPUT_MARK netlink attribute. The output
      mark differs from the existing xfrm mark in two ways:
      
      1. The xfrm mark is used to match xfrm policies and states, while
         the xfrm output mark is used to set the mark (and influence
         the routing) of the packets emitted by those states.
      2. The existing mark is constrained to be a subset of the bits of
         the originating socket or transformed packet, but the output
         mark is arbitrary and depends only on the state.
      
      The use of a separate mark provides additional flexibility. For
      example:
      
      - A packet subject to two transforms (e.g., transport mode inside
        tunnel mode) can have two different output marks applied to it,
        one for the transport mode SA and one for the tunnel mode SA.
      - On a system where socket marks determine routing, the packets
        emitted by an IPsec tunnel can be routed based on a mark that
        is determined by the tunnel, not by the marks of the
        unencrypted packets.
      - Support for setting the output marks can be introduced without
        breaking any existing setups that employ both mark-based
        routing and xfrm tunnel mode. Simply changing the code to use
        the xfrm mark for routing output packets could xfrm mark could
        change behaviour in a way that breaks these setups.
      
      If the output mark is unspecified or set to zero, the mark is not
      set or changed.
      
      Tested: make allyesconfig; make -j64
      Tested: https://android-review.googlesource.com/452776Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      077fbac4
  25. 08 8月, 2017 1 次提交
  26. 03 8月, 2017 1 次提交
  27. 19 7月, 2017 6 次提交