1. 05 December 2017, 2 commits
  2. 30 November 2017, 6 commits
    • net/reuseport: drop legacy code · e94a62f5
      By Paolo Abeni
      Since commit e32ea7e7 ("soreuseport: fast reuseport UDP socket
      selection") and commit c125e80b ("soreuseport: fast reuseport
      TCP socket selection") the relevant reuseport socket matching the current
      packet is selected by the reuseport_select_sock() call. The only
      exceptions are invalid BPF filters/filters returning out-of-range
      indices.
      In the latter case the code implicitly falls back to hash
      demultiplexing, but instead of selecting the socket inside the
      reuseport_select_sock() function, it relies on the hash selection
      logic introduced with the early soreuseport implementation.
      
      With this patch, in case of a BPF filter returning a bad socket
      index value, we fall back to hash-based selection inside the
      reuseport_select_sock() body, so that we can drop some duplicate
      code in the ipv4 and ipv6 stack.
      
      This also allows faster lookups in the above scenario, and will let
      us avoid computing the hash value for successful BPF-based
      demultiplexing in a later patch.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e94a62f5
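      A minimal sketch of the consolidated fallback described above
      (hypothetical shape and names, not the actual kernel code):

          /* If the BPF program picked an out-of-range index, fall back to
           * hash-based selection right here instead of in the ipv4/ipv6
           * callers.
           */
          struct sock *reuseport_select_sock_sketch(struct sock_reuseport *reuse,
                                                    u32 hash, int bpf_index)
          {
                  u16 socks = reuse->num_socks;

                  if (bpf_index >= 0 && bpf_index < socks)
                          return reuse->socks[bpf_index]; /* valid BPF pick */

                  /* invalid BPF result: hash demultiplexing, in one place */
                  return reuse->socks[reciprocal_scale(hash, socks)];
          }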
    • net: Remove dst->next · 7149f813
      By David Miller
      There are no more users.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      7149f813
    • xfrm: Move dst->path into struct xfrm_dst · 0f6c480f
      By David Miller
      The first member of an IPSEC route bundle chain sets its dst->path to
      the underlying ipv4/ipv6 route that carries the bundle.
      
      Stated another way, if one were to follow the xfrm_dst->child chain of
      the bundle, the final non-NULL pointer would be the path and point to
      either an ipv4 or an ipv6 route.
      
      This is largely used to make sure that PMTU events propagate down to
      the correct ipv4 or ipv6 route.
      
      When a dst is not part of an IPSEC bundle, 'dst->path == dst'.
      
      Move it down into xfrm_dst and key off of dst->xfrm.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      0f6c480f
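      A sketch of what keying off dst->xfrm looks like after the move
      (assumed shape, mirroring the description rather than the exact patch):

          /* Plain routes behave as if path == dst; only an xfrm_dst carries
           * a separate path pointing at the underlying ipv4/ipv6 route.
           */
          static inline struct dst_entry *dst_path_sketch(struct dst_entry *dst)
          {
                  if (dst->xfrm) {
                          struct xfrm_dst *xdst = (struct xfrm_dst *)dst;

                          return xdst->path;      /* route carrying the bundle */
                  }
                  return dst;                     /* not a bundle: path == dst */
          }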
    • ipv6: Move dst->from into struct rt6_info. · 3a2232e9
      By David Miller
      The dst->from value is only used by ipv6 routes to track where
      a route "came from".
      
      Any time we clone or copy a core ipv6 route in the ipv6 routing
      tables, we have the copy/clone's ->from point to the base route.
      
      This is used to handle route expiration properly.
      
      Only ipv6 uses this mechanism, and only ipv6 code references
      it.  So it is safe to move it into rt6_info.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      3a2232e9
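      A hypothetical sketch of the linkage (names and expiry check assumed)
      showing why the field can live in rt6_info: expiration checks on a
      clone defer to the base route it was derived from.

          static bool rt6_expired_sketch(const struct rt6_info *rt)
          {
                  if (rt->from)           /* clone/copy: follow the base route */
                          rt = rt->from;

                  return time_after(jiffies, rt->dst.expires);
          }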
    • xfrm: Move child route linkage into xfrm_dst. · b6ca8bd5
      By David Miller
      XFRM bundle child chains look like this:
      
      	xdst1 --> xdst2 --> xdst3 --> path_dst
      
      All of xdstN are xfrm_dst objects and xdst->u.dst.xfrm is non-NULL.
      The final child pointer in the chain, here called 'path_dst', is some
      other kind of route such as an ipv4 or ipv6 one.
      
      The xfrm output path pops routes, one at a time, via the child
      pointer, until we hit one which has a dst->xfrm pointer which
      is NULL.
      
      We can easily preserve the above mechanisms with child sitting
      only in the xfrm_dst structure.  All children in the chain
      before we break out of the xfrm_output() loop have dst->xfrm
      non-NULL and are therefore xfrm_dst objects.
      
      Since we break out of the loop when we find dst->xfrm NULL, we
      will not try to dereference 'dst' as if it were an xfrm_dst.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b6ca8bd5
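      A minimal sketch of the output-path walk described above (assumed
      names; the real loop does per-level transforms):

          static void xfrm_output_walk_sketch(struct dst_entry *dst)
          {
                  while (dst->xfrm) {     /* still an xfrm_dst: safe to cast */
                          struct xfrm_dst *xdst = (struct xfrm_dst *)dst;

                          /* ... apply this bundle level's transform ... */
                          dst = xdst->child;      /* pop one route */
                  }
                  /* dst is now 'path_dst', a plain ipv4/ipv6 route */
          }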
    • net: Create and use new helper xfrm_dst_child(). · b92cf4aa
      By David Miller
      Only IPSEC routes have a non-NULL dst->child pointer.  And IPSEC
      routes are identified by a non-NULL dst->xfrm pointer.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b92cf4aa
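      The helper is small; a sketch consistent with the description, keyed
      off the dst->xfrm test (child linkage as it ends up after this series):

          static inline struct dst_entry *xfrm_dst_child(const struct dst_entry *dst)
          {
          #ifdef CONFIG_XFRM
                  if (dst->xfrm) {
                          const struct xfrm_dst *xdst = (const struct xfrm_dst *)dst;

                          return xdst->child;
                  }
          #endif
                  return NULL;    /* only IPSEC routes have a child */
          }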
  3. 24 November 2017, 1 commit
    • net: accept UFO datagrams from tuntap and packet · 0c19f846
      By Willem de Bruijn
      Tuntap and similar devices can inject GSO packets. Accept type
      VIRTIO_NET_HDR_GSO_UDP even though the stack no longer generates UFO
      natively.
      
      Processes are expected to use feature negotiation such as TUNSETOFFLOAD
      to detect supported offload types and refrain from injecting other
      packets. This process breaks down with live migration: guest kernels
      do not renegotiate flags, so destination hosts need to expose all
      features that the source host does.
      
      Partially revert the UFO removal from 182e0b6b~1..d9d30adf.
      This patch introduces nearly(*) no new code to simplify verification.
      It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP
      insertion and software UFO segmentation.
      
      It does not reinstate protocol stack support, hardware offload
      (NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception
      of VIRTIO_NET_HDR_GSO_UDP packets in tuntap.
      
      To support SKB_GSO_UDP reappearing in the stack, also reinstate
      logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD
      by squashing in commit 93991221 ("net: skb_needs_check() removes
      CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee6
      ("net: avoid skb_warn_bad_offload false positives on UFO").
      
      (*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id,
      ipv6_proxy_select_ident is changed to return a __be32 and this is
      assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted
      at the end of the enum to minimize code churn.
      
      Tested
        Booted a v4.13 guest kernel with QEMU. On a host kernel before this
        patch `ethtool -k eth0` shows UFO disabled. After the patch, it is
        enabled, same as on a v4.13 host kernel.
      
        A UFO packet sent from the guest appears on the tap device:
          host:
            nc -l -p -u 8000 &
            tcpdump -n -i tap0
      
          guest:
            dd if=/dev/zero of=payload.txt bs=1 count=2000
            nc -u 192.16.1.1 8000 < payload.txt
      
        Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds,
        packets arriving fragmented:
      
          ./with_tap_pair.sh ./tap_send_ufo tap0 tap1
          (from https://github.com/wdebruij/kerneltools/tree/master/tests)
      
      Changes
        v1 -> v2
          - simplified set_offload change (review comment)
          - documented test procedure
      
      Link: http://lkml.kernel.org/r/<CAF=yD-LuUeDuL9YWPJD9ykOZ0QCjNeznPDr6whqZ9NGMNF12Mw@mail.gmail.com>
      Fixes: fb652fdf ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.")
      Reported-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0c19f846
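      A hypothetical sketch (simplified, assumed helper name) of the
      acceptance this patch restores when parsing a virtio_net_hdr injected
      from tuntap/packet:

          static int gso_type_from_vnet_sketch(const struct virtio_net_hdr *hdr)
          {
                  switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
                  case VIRTIO_NET_HDR_GSO_TCPV4:
                          return SKB_GSO_TCPV4;
                  case VIRTIO_NET_HDR_GSO_TCPV6:
                          return SKB_GSO_TCPV6;
                  case VIRTIO_NET_HDR_GSO_UDP:    /* reinstated: UFO datagram */
                          return SKB_GSO_UDP;
                  default:
                          return -EINVAL;
                  }
          }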
  4. 23 November 2017, 1 commit
    • bpf: introduce ARG_PTR_TO_MEM_OR_NULL · db1ac496
      By Gianluca Borello
      With the current ARG_PTR_TO_MEM/ARG_PTR_TO_UNINIT_MEM semantics, a helper
      argument can be NULL when the next argument type is ARG_CONST_SIZE_OR_ZERO
      and the verifier can prove the value of this next argument is 0. However,
      most helpers are just interested in handling <!NULL, 0>, so forcing them to
      deal with <NULL, 0> makes the implementation of those helpers more
      complicated for no apparent benefits, requiring them to explicitly handle
      those corner cases with checks that bpf programs could start relying upon,
      preventing the possibility of removing them later.
      
      Solve this by making ARG_PTR_TO_MEM/ARG_PTR_TO_UNINIT_MEM never accept NULL
      even when ARG_CONST_SIZE_OR_ZERO is set, and introduce a new argument type
      ARG_PTR_TO_MEM_OR_NULL to explicitly deal with the NULL case.
      
      Currently, the only helper that needs this is bpf_csum_diff_proto(), so
      change arg1 and arg3 to this new type as well.
      
      Also add a new battery of tests that explicitly exercise the
      !ARG_PTR_TO_MEM_OR_NULL combination: all the current ones testing the
      various <NULL, 0> variations are focused on bpf_csum_diff, so also
      cover other helpers.
      Signed-off-by: Gianluca Borello <g.borello@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      db1ac496
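      Based on the description, the bpf_csum_diff() proto would then read
      roughly like this (a sketch; only arg1/arg3 opt in to NULL, sizes stay
      ARG_CONST_SIZE_OR_ZERO):

          static const struct bpf_func_proto bpf_csum_diff_proto = {
                  .func           = bpf_csum_diff,
                  .gpl_only       = false,
                  .pkt_access     = true,
                  .ret_type       = RET_INTEGER,
                  .arg1_type      = ARG_PTR_TO_MEM_OR_NULL,   /* from */
                  .arg2_type      = ARG_CONST_SIZE_OR_ZERO,   /* from_size */
                  .arg3_type      = ARG_PTR_TO_MEM_OR_NULL,   /* to */
                  .arg4_type      = ARG_CONST_SIZE_OR_ZERO,   /* to_size */
                  .arg5_type      = ARG_ANYTHING,             /* seed */
          };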
  5. 22 November 2017, 1 commit
    • treewide: setup_timer() -> timer_setup() · e99e88a9
      By Kees Cook
      This converts all remaining cases of the old setup_timer() API into using
      timer_setup(), where the callback argument is the structure already
      holding the struct timer_list. These should have no behavioral changes,
      since they just change which pointer is passed into the callback with
      the same available pointers after conversion. It handles the following
      examples, in addition to some other variations.
      
      Casting from unsigned long:
      
          void my_callback(unsigned long data)
          {
              struct something *ptr = (struct something *)data;
          ...
          }
          ...
          setup_timer(&ptr->my_timer, my_callback, ptr);
      
      and forced object casts:
      
          void my_callback(struct something *ptr)
          {
          ...
          }
          ...
          setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr);
      
      become:
      
          void my_callback(struct timer_list *t)
          {
              struct something *ptr = from_timer(ptr, t, my_timer);
          ...
          }
          ...
          timer_setup(&ptr->my_timer, my_callback, 0);
      
      Direct function assignments:
      
          void my_callback(unsigned long data)
          {
              struct something *ptr = (struct something *)data;
          ...
          }
          ...
          ptr->my_timer.function = my_callback;
      
      have a temporary cast added, along with converting the args:
      
          void my_callback(struct timer_list *t)
          {
              struct something *ptr = from_timer(ptr, t, my_timer);
          ...
          }
          ...
          ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback;
      
      And finally, callbacks without a data assignment:
      
          void my_callback(unsigned long data)
          {
          ...
          }
          ...
          setup_timer(&ptr->my_timer, my_callback, 0);
      
      have their argument renamed to verify they're unused during conversion:
      
          void my_callback(struct timer_list *unused)
          {
          ...
          }
          ...
          timer_setup(&ptr->my_timer, my_callback, 0);
      
      The conversion is done with the following Coccinelle script:
      
      spatch --very-quiet --all-includes --include-headers \
      	-I ./arch/x86/include -I ./arch/x86/include/generated \
      	-I ./include -I ./arch/x86/include/uapi \
      	-I ./arch/x86/include/generated/uapi -I ./include/uapi \
      	-I ./include/generated/uapi --include ./include/linux/kconfig.h \
      	--dir . \
      	--cocci-file ~/src/data/timer_setup.cocci
      
      @fix_address_of@
      expression e;
      @@
      
       setup_timer(
      -&(e)
      +&e
       , ...)
      
      // Update any raw setup_timer() usages that have a NULL callback, but
      // would otherwise match change_timer_function_usage, since the latter
      // will update all function assignments done in the face of a NULL
      // function initialization in setup_timer().
      @change_timer_function_usage_NULL@
      expression _E;
      identifier _timer;
      type _cast_data;
      @@
      
      (
      -setup_timer(&_E->_timer, NULL, _E);
      +timer_setup(&_E->_timer, NULL, 0);
      |
      -setup_timer(&_E->_timer, NULL, (_cast_data)_E);
      +timer_setup(&_E->_timer, NULL, 0);
      |
      -setup_timer(&_E._timer, NULL, &_E);
      +timer_setup(&_E._timer, NULL, 0);
      |
      -setup_timer(&_E._timer, NULL, (_cast_data)&_E);
      +timer_setup(&_E._timer, NULL, 0);
      )
      
      @change_timer_function_usage@
      expression _E;
      identifier _timer;
      struct timer_list _stl;
      identifier _callback;
      type _cast_func, _cast_data;
      @@
      
      (
      -setup_timer(&_E->_timer, _callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, &_callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, _callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, &_callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)_callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)&_callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, &_callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, &_callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
       _E->_timer@_stl.function = _callback;
      |
       _E->_timer@_stl.function = &_callback;
      |
       _E->_timer@_stl.function = (_cast_func)_callback;
      |
       _E->_timer@_stl.function = (_cast_func)&_callback;
      |
       _E._timer@_stl.function = _callback;
      |
       _E._timer@_stl.function = &_callback;
      |
       _E._timer@_stl.function = (_cast_func)_callback;
      |
       _E._timer@_stl.function = (_cast_func)&_callback;
      )
      
      // callback(unsigned long arg)
      @change_callback_handle_cast
       depends on change_timer_function_usage@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _origtype;
      identifier _origarg;
      type _handletype;
      identifier _handle;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *t
       )
       {
      (
      	... when != _origarg
      	_handletype *_handle =
      -(_handletype *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle =
      -(void *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle;
      	... when != _handle
      	_handle =
      -(_handletype *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle;
      	... when != _handle
      	_handle =
      -(void *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      )
       }
      
      // callback(unsigned long arg) without existing variable
      @change_callback_handle_cast_no_arg
       depends on change_timer_function_usage &&
                           !change_callback_handle_cast@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _origtype;
      identifier _origarg;
      type _handletype;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *t
       )
       {
      +	_handletype *_origarg = from_timer(_origarg, t, _timer);
      +
      	... when != _origarg
      -	(_handletype *)_origarg
      +	_origarg
      	... when != _origarg
       }
      
      // Avoid already converted callbacks.
      @match_callback_converted
       depends on change_timer_function_usage &&
                  !change_callback_handle_cast &&
      	    !change_callback_handle_cast_no_arg@
      identifier change_timer_function_usage._callback;
      identifier t;
      @@
      
       void _callback(struct timer_list *t)
       { ... }
      
      // callback(struct something *handle)
      @change_callback_handle_arg
       depends on change_timer_function_usage &&
      	    !match_callback_converted &&
                  !change_callback_handle_cast &&
                  !change_callback_handle_cast_no_arg@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _handletype;
      identifier _handle;
      @@
      
       void _callback(
      -_handletype *_handle
      +struct timer_list *t
       )
       {
      +	_handletype *_handle = from_timer(_handle, t, _timer);
      	...
       }
      
      // If change_callback_handle_arg ran on an empty function, remove
      // the added handler.
      @unchange_callback_handle_arg
       depends on change_timer_function_usage &&
      	    change_callback_handle_arg@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _handletype;
      identifier _handle;
      identifier t;
      @@
      
       void _callback(struct timer_list *t)
       {
      -	_handletype *_handle = from_timer(_handle, t, _timer);
       }
      
      // We only want to refactor the setup_timer() data argument if we've found
      // the matching callback. This undoes changes in change_timer_function_usage.
      @unchange_timer_function_usage
       depends on change_timer_function_usage &&
                  !change_callback_handle_cast &&
                  !change_callback_handle_cast_no_arg &&
      	    !change_callback_handle_arg@
      expression change_timer_function_usage._E;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type change_timer_function_usage._cast_data;
      @@
      
      (
      -timer_setup(&_E->_timer, _callback, 0);
      +setup_timer(&_E->_timer, _callback, (_cast_data)_E);
      |
      -timer_setup(&_E._timer, _callback, 0);
      +setup_timer(&_E._timer, _callback, (_cast_data)&_E);
      )
      
      // If we fixed a callback from a .function assignment, fix the
      // assignment cast now.
      @change_timer_function_assignment
       depends on change_timer_function_usage &&
                  (change_callback_handle_cast ||
                   change_callback_handle_cast_no_arg ||
                   change_callback_handle_arg)@
      expression change_timer_function_usage._E;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type _cast_func;
      typedef TIMER_FUNC_TYPE;
      @@
      
      (
       _E->_timer.function =
      -_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_timer.function =
      -&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_timer.function =
      -(_cast_func)_callback;
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_timer.function =
      -(_cast_func)&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -&_callback;
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -(_cast_func)_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -(_cast_func)&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      )
      
      // Sometimes timer functions are called directly. Replace matched args.
      @change_timer_function_calls
       depends on change_timer_function_usage &&
                  (change_callback_handle_cast ||
                   change_callback_handle_cast_no_arg ||
                   change_callback_handle_arg)@
      expression _E;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type _cast_data;
      @@
      
       _callback(
      (
      -(_cast_data)_E
      +&_E->_timer
      |
      -(_cast_data)&_E
      +&_E._timer
      |
      -_E
      +&_E->_timer
      )
       )
      
      // If a timer has been configured without a data argument, it can be
      // converted without regard to the callback argument, since it is unused.
      @match_timer_function_unused_data@
      expression _E;
      identifier _timer;
      identifier _callback;
      @@
      
      (
      -setup_timer(&_E->_timer, _callback, 0);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, _callback, 0L);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, _callback, 0UL);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, 0);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, 0L);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, 0UL);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_timer, _callback, 0);
      +timer_setup(&_timer, _callback, 0);
      |
      -setup_timer(&_timer, _callback, 0L);
      +timer_setup(&_timer, _callback, 0);
      |
      -setup_timer(&_timer, _callback, 0UL);
      +timer_setup(&_timer, _callback, 0);
      |
      -setup_timer(_timer, _callback, 0);
      +timer_setup(_timer, _callback, 0);
      |
      -setup_timer(_timer, _callback, 0L);
      +timer_setup(_timer, _callback, 0);
      |
      -setup_timer(_timer, _callback, 0UL);
      +timer_setup(_timer, _callback, 0);
      )
      
      @change_callback_unused_data
       depends on match_timer_function_unused_data@
      identifier match_timer_function_unused_data._callback;
      type _origtype;
      identifier _origarg;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *unused
       )
       {
      	... when != _origarg
       }
      Signed-off-by: Kees Cook <keescook@chromium.org>
      e99e88a9
  6. 21 November 2017, 2 commits
  7. 16 November 2017, 2 commits
    • mm: remove __GFP_COLD · 453f85d4
      By Mel Gorman
      As the page free path makes no distinction between cache hot and cold
      pages, there is no real useful ordering of pages in the free list that
      allocation requests can take advantage of.  Judging from the users of
      __GFP_COLD, it is likely that a number of them are the result of copying
      other sites instead of actually measuring the impact.  Remove the
      __GFP_COLD parameter which simplifies a number of paths in the page
      allocator.
      
      This is potentially controversial but bear in mind that the size of the
      per-cpu pagelists versus modern cache sizes means that the whole per-cpu
      list can often fit in the L3 cache.  Hence, there is only a potential
      benefit for microbenchmarks that alloc/free pages in a tight loop.  It's
      even worse when THP is taken into account which has little or no chance
      of getting a cache-hot page as the per-cpu list is bypassed and the
      zeroing of multiple pages will thrash the cache anyway.
      
      The truncate microbenchmarks are not shown as this patch affects the
      allocation path and not the free path.  A page fault microbenchmark was
      tested but it showed no significant difference, which is not surprising
      given that the __GFP_COLD branches are a minuscule percentage of the
      fault path.
      
      Link: http://lkml.kernel.org/r/20171018075952.10627-9-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      453f85d4
    • kmemcheck: remove annotations · 49502766
      By Levin, Alexander (Sasha Levin)
      Patch series "kmemcheck: kill kmemcheck", v2.
      
      As discussed at LSF/MM, kill kmemcheck.
      
      KASan is a replacement that works without the limitations of
      kmemcheck (single CPU, slow).  KASan is already upstream.
      
      We are also not aware of any users of kmemcheck (or users who don't
      consider KASan as a suitable replacement).
      
      The only objection was that since KASAN wasn't supported by all GCC
      versions provided by distros at that time we should hold off for 2
      years, and try again.
      
      Now that 2 years have passed, and all distros provide gcc that supports
      KASAN, kill kmemcheck again for the very same reasons.
      
      This patch (of 4):
      
      Remove kmemcheck annotations, and calls to kmemcheck from the kernel.
      
      [alexander.levin@verizon.com: correctly remove kmemcheck call from dma_map_sg_attrs]
        Link: http://lkml.kernel.org/r/20171012192151.26531-1-alexander.levin@verizon.com
      Link: http://lkml.kernel.org/r/20171007030159.22241-2-alexander.levin@verizon.com
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tim Hansen <devtimhansen@gmail.com>
      Cc: Vegard Nossum <vegardno@ifi.uio.no>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      49502766
  8. 15 November 2017, 1 commit
    • net: Protect iterations over net::fib_notifier_ops in fib_seq_sum() · 11bf284f
      By Kirill Tkhai
      There is at least one unlocked deletion of net->ipv4.fib_notifier_ops
      from net::fib_notifier_ops:
      
      ip_fib_net_exit()
        rtnl_unlock()
        fib4_notifier_exit()
          fib_notifier_ops_unregister(net->ipv4.notifier_ops)
            list_del_rcu(&ops->list)
      
      So fib_seq_sum() can't rely on rtnl_lock() alone for protection.
      
      One possible solution would be to take rtnl_lock()
      in fib_notifier_ops_unregister(), but this would add
      delay during net namespace creation, so we are better
      off using rcu_read_lock() until someone really needs
      the mutex (if that ever happens).
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      11bf284f
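      A sketch of the resulting iteration (details beyond the message, such
      as the module refcounting, are assumed):

          static unsigned int fib_seq_sum_sketch(struct net *net)
          {
                  struct fib_notifier_ops *ops;
                  unsigned int fib_seq = 0;

                  rtnl_lock();
                  rcu_read_lock();        /* entries may be list_del_rcu()'d */
                  list_for_each_entry_rcu(ops, &net->fib_notifier_ops, list) {
                          if (!try_module_get(ops->owner))
                                  continue;
                          fib_seq += ops->fib_seq_read(net);
                          module_put(ops->owner);
                  }
                  rcu_read_unlock();
                  rtnl_unlock();

                  return fib_seq;
          }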
  9. 14 November 2017, 12 commits
  10. 11 November 2017, 2 commits
    • sock: Remove the global prot_inuse counter. · 5290ada4
      By Tonghao Zhang
      The per-cpu counter for init_net is prepared in core_initcall.
      Patches 7d720c3e ("percpu: add __percpu sparse annotations to net")
      and d6d9ca0f ("net: this_cpu_xxx conversions") optimized these
      routines, so the old global counter can now be removed.
      
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5290ada4
    • tipc: improve link resiliency when rps is activated · 8d6e79d3
      By Jon Maloy
      Currently, the TIPC RPS dissector is based only on the incoming packets'
      source node address, hence steering all traffic from a node to the same
      core. We have seen that this makes the links vulnerable to starvation
      and unnecessary resets when we turn down the link tolerance to very low
      values.
      
      To reduce the risk of this happening, we exempt probe and probe reply
      packets from the convergence to one core per source node. Instead, we
      do the opposite: we try to diverge those packets across as many cores
      as possible by randomizing the flow selector key.
      
      To make such packets identifiable to the dissector, we add a new
      'is_keepalive' bit to word 0 of the LINK_PROTOCOL header. This bit is
      set both for PROBE and PROBE_REPLY messages, and only for those.
      
      It should be noted that these packets are not part of any flow anyway,
      and only constitute a minuscule fraction of all packets sent across a
      link. Hence, there is no risk that this will affect overall performance.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8d6e79d3
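      A hypothetical sketch of the steering rule (accessor names assumed,
      and the real patch randomizes the selector at send time):

          static u32 tipc_flow_selector_sketch(struct tipc_msg *hdr)
          {
                  if (msg_is_keepalive(hdr))      /* PROBE / PROBE_REPLY */
                          return prandom_u32();   /* diverge across cores */

                  return msg_prevnode(hdr);       /* converge per source node */
          }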
  11. 10 November 2017, 1 commit
  12. 08 November 2017, 3 commits
    • netpoll: Use lockdep to assert IRQs are disabled/enabled · af073393
      By Frederic Weisbecker
      Use lockdep to check that IRQs are enabled or disabled as expected. This
      way the sanity check only incurs overhead when concurrency-correctness
      debug code is enabled.
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tejun Heo <tj@kernel.org>
      Link: http://lkml.kernel.org/r/1509980490-4285-14-git-send-email-frederic@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      af073393
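      The pattern applied here, as a sketch (call sites vary):

          static void netpoll_poll_sketch(void)
          {
                  /* before: WARN_ON_ONCE(!irqs_disabled()); always compiled in */
                  lockdep_assert_irqs_disabled(); /* free unless lockdep is on */

                  /* ... polling work that requires IRQs off ... */
          }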
    • pktgen: document 32-bit timestamp overflow · 7f5d3f27
      By Arnd Bergmann
      Timestamps in pktgen are currently retrieved using the deprecated
      do_gettimeofday() function that wraps its signed 32-bit seconds in 2038
      (on 32-bit architectures) and requires a division operation to calculate
      microseconds.
      
      The pktgen header is also defined with the same limitations, hardcoding
      to a 32-bit seconds field that can be interpreted as unsigned to produce
      times that only wrap in 2106. Whatever code reads the timestamps should
      be aware of that problem in general, but probably doesn't care too
      much as we are mostly interested in the time passing between packets,
      and that is correctly represented.
      
      Using 64-bit nanoseconds would be cheaper and good for 584 years. Using
      monotonic times would also make this unambiguous by avoiding the overflow,
      but would make it harder to correlate to the times with those on remote
      machines. Either approach would require adding a new runtime flag and
      implementing the same thing on the remote side, which we probably don't
      want to do unless someone sees it as a real problem. Also, this should
      be coordinated with other pktgen implementations and might need a new
      magic number.
      
      For the moment, I'm documenting the overflow in the source code, and
      changing the implementation over to an open-coded ktime_get_real_ts64()
      plus division, so we don't have to look at it again while scanning for
      deprecated time interfaces.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7f5d3f27
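      A sketch of the replacement conversion (assumed helper name; the
      on-wire header keeps its 32-bit seconds field):

          static void pktgen_stamp_sketch(__be32 *sec, __be32 *usec)
          {
                  struct timespec64 ts;

                  ktime_get_real_ts64(&ts);
                  /* truncation is deliberate: unsigned readers wrap in 2106 */
                  *sec  = htonl((u32)ts.tv_sec);
                  *usec = htonl(ts.tv_nsec / NSEC_PER_USEC);
          }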
    • rtnetlink: fix missing size for IFLA_IF_NETNSID · 03ac738d
      By Colin Ian King
      The size for IFLA_IF_NETNSID is missing from the size calculation
      because the preceding semicolon was not removed. Fix this by removing
      the semicolon.
      
      Detected by CoverityScan, CID#1461135 ("Structurally dead code")
      
      Fixes: 79e1ad14 ("rtnetlink: use netnsid to query interface")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Acked-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      03ac738d
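      The bug shape, sketched (simplified from if_nlmsg_size(); attribute
      terms illustrative).  Before the fix, the stray ';' ends the summed
      expression early, leaving the next line as a valid but effect-free
      statement:

          /*     + nla_total_size(4); <-- terminates the return expression
           *     + nla_total_size(4)  <-- IFLA_IF_NETNSID term: dead code
           */
          static size_t netnsid_size_sketch(void)
          {
                  return nla_total_size(4)        /* preceding attribute */
                         + nla_total_size(4);     /* IFLA_IF_NETNSID, counted */
          }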
  13. 05 November 2017, 6 commits