1. 29 12月, 2018 1 次提交
  2. 20 7月, 2018 2 次提交
    • N
      xfrm: Allow xfrmi if_id to be updated by UPDSA · 5baf4f9c
      Nathan Harold 提交于
      Allow attaching an SA to an xfrm interface id after
      the creation of the SA, so that tasks such as keying
      which must be done as the SA is created, can remain
      separate from the decision on how to route traffic
      from an SA. This permits SA creation to be decomposed
      in to three separate steps:
      1) allocation of a SPI
      2) algorithm and key negotiation
      3) insertion into the data path
      Signed-off-by: NNathan Harold <nharold@google.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      5baf4f9c
    • B
      xfrm: Remove xfrmi interface ID from flowi · bc56b334
      Benedict Wong 提交于
      In order to remove performance impact of having the extra u32 in every
      single flowi, this change removes the flowi_xfrm struct, prefering to
      take the if_id as a method parameter where needed.
      
      In the inbound direction, if_id is only needed during the
      __xfrm_check_policy() function, and the if_id can be determined at that
      point based on the skb. As such, xfrmi_decode_session() is only called
      with the skb in __xfrm_check_policy().
      
      In the outbound direction, the only place where if_id is needed is the
      xfrm_lookup() call in xfrmi_xmit2(). With this change, the if_id is
      directly passed into the xfrm_lookup_with_ifid() call. All existing
      callers can still call xfrm_lookup(), which uses a default if_id of 0.
      
      This change does not change any behavior of XFRMIs except for improving
      overall system performance via flowi size reduction.
      
      This change has been tested against the Android Kernel Networking Tests:
      
      https://android.googlesource.com/kernel/tests/+/master/net/testSigned-off-by: NBenedict Wong <benedictwong@google.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      bc56b334
  3. 11 7月, 2018 1 次提交
    • A
      xfrm: use time64_t for in-kernel timestamps · 386c5680
      Arnd Bergmann 提交于
      The lifetime managment uses '__u64' timestamps on the user space
      interface, but 'unsigned long' for reading the current time in the kernel
      with get_seconds().
      
      While this is probably safe beyond y2038, it will still overflow in 2106,
      and the get_seconds() call is deprecated because fo that.
      
      This changes the xfrm time handling to use time64_t consistently, along
      with reading the time using the safer ktime_get_real_seconds(). It still
      suffers from problems that can happen from a concurrent settimeofday()
      call or (to a lesser degree) a leap second update, but since the time
      stamps are part of the user API, there is nothing we can do to prevent
      that.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      386c5680
  4. 01 7月, 2018 1 次提交
    • N
      xfrm: Allow Set Mark to be Updated Using UPDSA · 6d8e85ff
      Nathan Harold 提交于
      Allow UPDSA to change "set mark" to permit
      policy separation of packet routing decisions from
      SA keying in systems that use mark-based routing.
      
      The set mark, used as a routing and firewall mark
      for outbound packets, is made update-able which
      allows routing decisions to be handled independently
      of keying/SA creation. To maintain consistency with
      other optional attributes, the set mark is only
      updated if sent with a non-zero value.
      
      The per-SA lock and the xfrm_state_lock are taken in
      that order to avoid a deadlock with
      xfrm_timer_handler(), which also takes the locks in
      that order.
      Signed-off-by: NNathan Harold <nharold@google.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      6d8e85ff
  5. 25 6月, 2018 1 次提交
    • F
      xfrm: policy: remove pcpu policy cache · e4db5b61
      Florian Westphal 提交于
      Kristian Evensen says:
        In a project I am involved in, we are running ipsec (Strongswan) on
        different mt7621-based routers. Each router is configured as an
        initiator and has around ~30 tunnels to different responders (running
        on misc. devices). Before the flow cache was removed (kernel 4.9), we
        got a combined throughput of around 70Mbit/s for all tunnels on one
        router. However, we recently switched to kernel 4.14 (4.14.48), and
        the total throughput is somewhere around 57Mbit/s (best-case). I.e., a
        drop of around 20%. Reverting the flow cache removal restores, as
        expected, performance levels to that of kernel 4.9.
      
      When pcpu xdst exists, it has to be validated first before it can be
      used.
      
      A negative hit thus increases cost vs. no-cache.
      
      As number of tunnels increases, hit rate decreases so this pcpu caching
      isn't a viable strategy.
      
      Furthermore, the xdst cache also needs to run with BH off, so when
      removing this the bh disable/enable pairs can be removed too.
      
      Kristian tested a 4.14.y backport of this change and reported
      increased performance:
      
        In our tests, the throughput reduction has been reduced from around -20%
        to -5%. We also see that the overall throughput is independent of the
        number of tunnels, while before the throughput was reduced as the number
        of tunnels increased.
      Reported-by: NKristian Evensen <kristian.evensen@gmail.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      e4db5b61
  6. 23 6月, 2018 1 次提交
  7. 04 5月, 2018 1 次提交
    • M
      xfrm: use a dedicated slab cache for struct xfrm_state · 565f0fa9
      Mathias Krause 提交于
      struct xfrm_state is rather large (768 bytes here) and therefore wastes
      quite a lot of memory as it falls into the kmalloc-1024 slab cache,
      leaving 256 bytes of unused memory per XFRM state object -- a net waste
      of 25%.
      
      Using a dedicated slab cache for struct xfrm_state reduces the level of
      internal fragmentation to a minimum.
      
      On my configuration SLUB chooses to create a slab cache covering 4
      pages holding 21 objects, resulting in an average memory waste of ~13
      bytes per object -- a net waste of only 1.6%.
      
      In my tests this led to memory savings of roughly 2.3MB for 10k XFRM
      states.
      Signed-off-by: NMathias Krause <minipli@googlemail.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      565f0fa9
  8. 16 4月, 2018 1 次提交
  9. 02 2月, 2018 1 次提交
  10. 23 1月, 2018 1 次提交
  11. 18 1月, 2018 1 次提交
  12. 31 12月, 2017 1 次提交
  13. 30 12月, 2017 1 次提交
    • H
      xfrm: Forbid state updates from changing encap type · 257a4b01
      Herbert Xu 提交于
      Currently we allow state updates to competely replace the contents
      of x->encap.  This is bad because on the user side ESP only sets up
      header lengths depending on encap_type once when the state is first
      created.  This could result in the header lengths getting out of
      sync with the actual state configuration.
      
      In practice key managers will never do a state update to change the
      encapsulation type.  Only the port numbers need to be changed as the
      peer NAT entry is updated.
      
      Therefore this patch adds a check in xfrm_state_update to forbid
      any changes to the encap_type.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      257a4b01
  14. 08 12月, 2017 1 次提交
  15. 30 11月, 2017 1 次提交
    • L
      net: xfrm: allow clearing socket xfrm policies. · be8f8284
      Lorenzo Colitti 提交于
      Currently it is possible to add or update socket policies, but
      not clear them. Therefore, once a socket policy has been applied,
      the socket cannot be used for unencrypted traffic.
      
      This patch allows (privileged) users to clear socket policies by
      passing in a NULL pointer and zero length argument to the
      {IP,IPV6}_{IPSEC,XFRM}_POLICY setsockopts. This results in both
      the incoming and outgoing policies being cleared.
      
      The simple approach taken in this patch cannot clear socket
      policies in only one direction. If desired this could be added
      in the future, for example by continuing to pass in a length of
      zero (which currently is guaranteed to return EMSGSIZE) and
      making the policy be a pointer to an integer that contains one
      of the XFRM_POLICY_{IN,OUT} enum values.
      
      An alternative would have been to interpret the length as a
      signed integer and use XFRM_POLICY_IN (i.e., 0) to clear the
      input policy and -XFRM_POLICY_OUT (i.e., -1) to clear the output
      policy.
      
      Tested: https://android-review.googlesource.com/539816Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      be8f8284
  16. 22 11月, 2017 1 次提交
    • K
      treewide: setup_timer() -> timer_setup() · e99e88a9
      Kees Cook 提交于
      This converts all remaining cases of the old setup_timer() API into using
      timer_setup(), where the callback argument is the structure already
      holding the struct timer_list. These should have no behavioral changes,
      since they just change which pointer is passed into the callback with
      the same available pointers after conversion. It handles the following
      examples, in addition to some other variations.
      
      Casting from unsigned long:
      
          void my_callback(unsigned long data)
          {
              struct something *ptr = (struct something *)data;
          ...
          }
          ...
          setup_timer(&ptr->my_timer, my_callback, ptr);
      
      and forced object casts:
      
          void my_callback(struct something *ptr)
          {
          ...
          }
          ...
          setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr);
      
      become:
      
          void my_callback(struct timer_list *t)
          {
              struct something *ptr = from_timer(ptr, t, my_timer);
          ...
          }
          ...
          timer_setup(&ptr->my_timer, my_callback, 0);
      
      Direct function assignments:
      
          void my_callback(unsigned long data)
          {
              struct something *ptr = (struct something *)data;
          ...
          }
          ...
          ptr->my_timer.function = my_callback;
      
      have a temporary cast added, along with converting the args:
      
          void my_callback(struct timer_list *t)
          {
              struct something *ptr = from_timer(ptr, t, my_timer);
          ...
          }
          ...
          ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback;
      
      And finally, callbacks without a data assignment:
      
          void my_callback(unsigned long data)
          {
          ...
          }
          ...
          setup_timer(&ptr->my_timer, my_callback, 0);
      
      have their argument renamed to verify they're unused during conversion:
      
          void my_callback(struct timer_list *unused)
          {
          ...
          }
          ...
          timer_setup(&ptr->my_timer, my_callback, 0);
      
      The conversion is done with the following Coccinelle script:
      
      spatch --very-quiet --all-includes --include-headers \
      	-I ./arch/x86/include -I ./arch/x86/include/generated \
      	-I ./include -I ./arch/x86/include/uapi \
      	-I ./arch/x86/include/generated/uapi -I ./include/uapi \
      	-I ./include/generated/uapi --include ./include/linux/kconfig.h \
      	--dir . \
      	--cocci-file ~/src/data/timer_setup.cocci
      
      @fix_address_of@
      expression e;
      @@
      
       setup_timer(
      -&(e)
      +&e
       , ...)
      
      // Update any raw setup_timer() usages that have a NULL callback, but
      // would otherwise match change_timer_function_usage, since the latter
      // will update all function assignments done in the face of a NULL
      // function initialization in setup_timer().
      @change_timer_function_usage_NULL@
      expression _E;
      identifier _timer;
      type _cast_data;
      @@
      
      (
      -setup_timer(&_E->_timer, NULL, _E);
      +timer_setup(&_E->_timer, NULL, 0);
      |
      -setup_timer(&_E->_timer, NULL, (_cast_data)_E);
      +timer_setup(&_E->_timer, NULL, 0);
      |
      -setup_timer(&_E._timer, NULL, &_E);
      +timer_setup(&_E._timer, NULL, 0);
      |
      -setup_timer(&_E._timer, NULL, (_cast_data)&_E);
      +timer_setup(&_E._timer, NULL, 0);
      )
      
      @change_timer_function_usage@
      expression _E;
      identifier _timer;
      struct timer_list _stl;
      identifier _callback;
      type _cast_func, _cast_data;
      @@
      
      (
      -setup_timer(&_E->_timer, _callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, &_callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, _callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, &_callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)_callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)&_callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, &_callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, &_callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
       _E->_timer@_stl.function = _callback;
      |
       _E->_timer@_stl.function = &_callback;
      |
       _E->_timer@_stl.function = (_cast_func)_callback;
      |
       _E->_timer@_stl.function = (_cast_func)&_callback;
      |
       _E._timer@_stl.function = _callback;
      |
       _E._timer@_stl.function = &_callback;
      |
       _E._timer@_stl.function = (_cast_func)_callback;
      |
       _E._timer@_stl.function = (_cast_func)&_callback;
      )
      
      // callback(unsigned long arg)
      @change_callback_handle_cast
       depends on change_timer_function_usage@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _origtype;
      identifier _origarg;
      type _handletype;
      identifier _handle;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *t
       )
       {
      (
      	... when != _origarg
      	_handletype *_handle =
      -(_handletype *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle =
      -(void *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle;
      	... when != _handle
      	_handle =
      -(_handletype *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle;
      	... when != _handle
      	_handle =
      -(void *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      )
       }
      
      // callback(unsigned long arg) without existing variable
      @change_callback_handle_cast_no_arg
       depends on change_timer_function_usage &&
                           !change_callback_handle_cast@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _origtype;
      identifier _origarg;
      type _handletype;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *t
       )
       {
      +	_handletype *_origarg = from_timer(_origarg, t, _timer);
      +
      	... when != _origarg
      -	(_handletype *)_origarg
      +	_origarg
      	... when != _origarg
       }
      
      // Avoid already converted callbacks.
      @match_callback_converted
       depends on change_timer_function_usage &&
                  !change_callback_handle_cast &&
      	    !change_callback_handle_cast_no_arg@
      identifier change_timer_function_usage._callback;
      identifier t;
      @@
      
       void _callback(struct timer_list *t)
       { ... }
      
      // callback(struct something *handle)
      @change_callback_handle_arg
       depends on change_timer_function_usage &&
      	    !match_callback_converted &&
                  !change_callback_handle_cast &&
                  !change_callback_handle_cast_no_arg@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _handletype;
      identifier _handle;
      @@
      
       void _callback(
      -_handletype *_handle
      +struct timer_list *t
       )
       {
      +	_handletype *_handle = from_timer(_handle, t, _timer);
      	...
       }
      
      // If change_callback_handle_arg ran on an empty function, remove
      // the added handler.
      @unchange_callback_handle_arg
       depends on change_timer_function_usage &&
      	    change_callback_handle_arg@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _handletype;
      identifier _handle;
      identifier t;
      @@
      
       void _callback(struct timer_list *t)
       {
      -	_handletype *_handle = from_timer(_handle, t, _timer);
       }
      
      // We only want to refactor the setup_timer() data argument if we've found
      // the matching callback. This undoes changes in change_timer_function_usage.
      @unchange_timer_function_usage
       depends on change_timer_function_usage &&
                  !change_callback_handle_cast &&
                  !change_callback_handle_cast_no_arg &&
      	    !change_callback_handle_arg@
      expression change_timer_function_usage._E;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type change_timer_function_usage._cast_data;
      @@
      
      (
      -timer_setup(&_E->_timer, _callback, 0);
      +setup_timer(&_E->_timer, _callback, (_cast_data)_E);
      |
      -timer_setup(&_E._timer, _callback, 0);
      +setup_timer(&_E._timer, _callback, (_cast_data)&_E);
      )
      
      // If we fixed a callback from a .function assignment, fix the
      // assignment cast now.
      @change_timer_function_assignment
       depends on change_timer_function_usage &&
                  (change_callback_handle_cast ||
                   change_callback_handle_cast_no_arg ||
                   change_callback_handle_arg)@
      expression change_timer_function_usage._E;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type _cast_func;
      typedef TIMER_FUNC_TYPE;
      @@
      
      (
       _E->_timer.function =
      -_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_timer.function =
      -&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_timer.function =
      -(_cast_func)_callback;
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_timer.function =
      -(_cast_func)&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -&_callback;
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -(_cast_func)_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -(_cast_func)&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      )
      
      // Sometimes timer functions are called directly. Replace matched args.
      @change_timer_function_calls
       depends on change_timer_function_usage &&
                  (change_callback_handle_cast ||
                   change_callback_handle_cast_no_arg ||
                   change_callback_handle_arg)@
      expression _E;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type _cast_data;
      @@
      
       _callback(
      (
      -(_cast_data)_E
      +&_E->_timer
      |
      -(_cast_data)&_E
      +&_E._timer
      |
      -_E
      +&_E->_timer
      )
       )
      
      // If a timer has been configured without a data argument, it can be
      // converted without regard to the callback argument, since it is unused.
      @match_timer_function_unused_data@
      expression _E;
      identifier _timer;
      identifier _callback;
      @@
      
      (
      -setup_timer(&_E->_timer, _callback, 0);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, _callback, 0L);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, _callback, 0UL);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, 0);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, 0L);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, 0UL);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_timer, _callback, 0);
      +timer_setup(&_timer, _callback, 0);
      |
      -setup_timer(&_timer, _callback, 0L);
      +timer_setup(&_timer, _callback, 0);
      |
      -setup_timer(&_timer, _callback, 0UL);
      +timer_setup(&_timer, _callback, 0);
      |
      -setup_timer(_timer, _callback, 0);
      +timer_setup(_timer, _callback, 0);
      |
      -setup_timer(_timer, _callback, 0L);
      +timer_setup(_timer, _callback, 0);
      |
      -setup_timer(_timer, _callback, 0UL);
      +timer_setup(_timer, _callback, 0);
      )
      
      @change_callback_unused_data
       depends on match_timer_function_unused_data@
      identifier match_timer_function_unused_data._callback;
      type _origtype;
      identifier _origarg;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *unused
       )
       {
      	... when != _origarg
       }
      Signed-off-by: NKees Cook <keescook@chromium.org>
      e99e88a9
  17. 26 10月, 2017 1 次提交
    • J
      xfrm: Clear sk_dst_cache when applying per-socket policy. · 2b06cdf3
      Jonathan Basseri 提交于
      If a socket has a valid dst cache, then xfrm_lookup_route will get
      skipped. However, the cache is not invalidated when applying policy to a
      socket (i.e. IPV6_XFRM_POLICY). The result is that new policies are
      sometimes ignored on those sockets. (Note: This was broken for IPv4 and
      IPv6 at different times.)
      
      This can be demonstrated like so,
      1. Create UDP socket.
      2. connect() the socket.
      3. Apply an outbound XFRM policy to the socket. (setsockopt)
      4. send() data on the socket.
      
      Packets will continue to be sent in the clear instead of matching an
      xfrm or returning a no-match error (EAGAIN). This affects calls to
      send() and not sendto().
      
      Invalidating the sk_dst_cache is necessary to correctly apply xfrm
      policies. Since we do this in xfrm_user_policy(), the sk_lock was
      already acquired in either do_ip_setsockopt() or do_ipv6_setsockopt(),
      and we may call __sk_dst_reset().
      
      Performance impact should be negligible, since this code is only called
      when changing xfrm policy, and only affects the socket in question.
      
      Fixes: 00bc0ef5 ("ipv6: Skip XFRM lookup if dst_entry in socket cache is valid")
      Tested: https://android-review.googlesource.com/517555
      Tested: https://android-review.googlesource.com/418659Signed-off-by: NJonathan Basseri <misterikkit@google.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      2b06cdf3
  18. 28 9月, 2017 1 次提交
  19. 02 8月, 2017 2 次提交
  20. 19 7月, 2017 1 次提交
    • F
      xfrm: add xdst pcpu cache · ec30d78c
      Florian Westphal 提交于
      retain last used xfrm_dst in a pcpu cache.
      On next request, reuse this dst if the policies are the same.
      
      The cache will not help with strict RR workloads as there is no hit.
      
      The cache packet-path part is reasonably small, the notifier part is
      needed so we do not add long hangs when a device is dismantled but some
      pcpu xdst still holds a reference, there are also calls to the flush
      operation when userspace deletes SAs so modules can be removed
      (there is no hit.
      
      We need to run the dst_release on the correct cpu to avoid races with
      packet path.  This is done by adding a work_struct for each cpu and then
      doing the actual test/release on each affected cpu via schedule_work_on().
      
      Test results using 4 network namespaces and null encryption:
      
      ns1           ns2          -> ns3           -> ns4
      netperf -> xfrm/null enc   -> xfrm/null dec -> netserver
      
      what                    TCP_STREAM      UDP_STREAM      UDP_RR
      Flow cache:             14644.61        294.35          327231.64
      No flow cache:		14349.81	242.64		202301.72
      Pcpu cache:		14629.70	292.21		205595.22
      
      UDP tests used 64byte packets, tests ran for one minute each,
      value is average over ten iterations.
      
      'Flow cache' is 'net-next', 'No flow cache' is net-next plus this
      series but without this patch.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ec30d78c
  21. 05 7月, 2017 1 次提交
  22. 30 6月, 2017 1 次提交
  23. 07 6月, 2017 2 次提交
  24. 19 5月, 2017 1 次提交
    • A
      xfrm: fix state migration copy replay sequence numbers · a486cd23
      Antony Antony 提交于
      During xfrm migration copy replay and preplay sequence numbers
      from the previous state.
      
      Here is a tcpdump output showing the problem.
      10.0.10.46 is running vanilla kernel, is the IKE/IPsec responder.
      After the migration it sent wrong sequence number, reset to 1.
      The migration is from 10.0.0.52 to 10.0.0.53.
      
      IP 10.0.0.52.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7cf), length 136
      IP 10.0.10.46.4500 > 10.0.0.52.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x7cf), length 136
      IP 10.0.0.52.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7d0), length 136
      IP 10.0.10.46.4500 > 10.0.0.52.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x7d0), length 136
      
      IP 10.0.0.53.4500 > 10.0.10.46.4500: NONESP-encap: isakmp: child_sa  inf2[I]
      IP 10.0.10.46.4500 > 10.0.0.53.4500: NONESP-encap: isakmp: child_sa  inf2[R]
      IP 10.0.0.53.4500 > 10.0.10.46.4500: NONESP-encap: isakmp: child_sa  inf2[I]
      IP 10.0.10.46.4500 > 10.0.0.53.4500: NONESP-encap: isakmp: child_sa  inf2[R]
      
      IP 10.0.0.53.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7d1), length 136
      
      NOTE: next sequence is wrong 0x1
      
      IP 10.0.10.46.4500 > 10.0.0.53.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x1), length 136
      IP 10.0.0.53.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7d2), length 136
      IP 10.0.10.46.4500 > 10.0.0.53.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x2), length 136
      Signed-off-by: NAntony Antony <antony@phenome.org>
      Reviewed-by: NRichard Guy Briggs <rgb@tricolour.ca>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      a486cd23
  25. 16 5月, 2017 1 次提交
  26. 14 4月, 2017 2 次提交
  27. 16 1月, 2017 1 次提交
    • F
      xfrm: fix possible null deref in xfrm_init_tempstate · 3819a35f
      Florian Westphal 提交于
      Dan reports following smatch warning:
       net/xfrm/xfrm_state.c:659
       error: we previously assumed 'afinfo' could be null (see line 651)
      
       649  struct xfrm_state_afinfo *afinfo = xfrm_state_afinfo_get_rcu(family);
       651  if (afinfo)
      		...
       658  }
       659  afinfo->init_temprop(x, tmpl, daddr, saddr);
      
      I am resonably sure afinfo cannot be NULL here.
      
      xfrm_state4.c and state6.c are both part of ipv4/ipv6 (depends on
      CONFIG_XFRM, a boolean) but even if ipv6 is a module state6.c can't
      be removed (ipv6 lacks module_exit so it cannot be removed).
      
      The only callers for xfrm6_fini that leads to state backend unregister
      are error unwinding paths that can be called during ipv6 init function.
      
      So after ipv6 module is loaded successfully the state backend cannot go
      away anymore.
      
      The family value from policy lookup path is taken from dst_entry, so
      that should always be AF_INET(6).
      
      However, since this silences the warning and avoids readers of this
      code wondering about possible null deref it seems preferrable to
      be defensive and just add the old check back.
      
      Fixes: 711059b9 ("xfrm: add and use xfrm_state_afinfo_get_rcu")
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      3819a35f
  28. 10 1月, 2017 4 次提交
  29. 06 1月, 2017 1 次提交
    • F
      xfrm: state: do not acquire lock in get_mtu helpers · b3b73b8e
      Florian Westphal 提交于
      Once flow cache gets removed the mtu initialisation happens for every skb
      that gets an xfrm attached, so this lock starts to show up in perf.
      
      It is not obvious why this lock is required -- the caller holds
      reference on the state struct, type->destructor is only called from the
      state gc worker (all state structs on gc list must have refcount 0).
      
      xfrm_init_state already has been called (else private data accessed
      by type->get_mtu() would not be set up).
      
      So just remove the lock -- the race on the state (DEAD?) doesn't
      matter (could change right after dropping the lock too).
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      b3b73b8e
  30. 04 1月, 2017 1 次提交
  31. 26 12月, 2016 1 次提交
    • T
      ktime: Cleanup ktime_set() usage · 8b0e1953
      Thomas Gleixner 提交于
      ktime_set(S,N) was required for the timespec storage type and is still
      useful for situations where a Seconds and Nanoseconds part of a time value
      needs to be converted. For anything where the Seconds argument is 0, this
      is pointless and can be replaced with a simple assignment.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      8b0e1953
  32. 25 12月, 2016 1 次提交
  33. 30 9月, 2016 1 次提交