1. 04 12月, 2017 1 次提交
    • E
      tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb() · eeea10b8
      Eric Dumazet 提交于
      James Morris reported kernel stack corruption bug [1] while
      running the SELinux testsuite, and bisected to a recent
      commit bffa72cf ("net: sk_buff rbnode reorg")
      
      We believe this commit is fine, but exposes an older bug.
      
      SELinux code runs from tcp_filter() and might send an ICMP,
      expecting IP options to be found in skb->cb[] using regular IPCB placement.
      
      We need to defer TCP mangling of skb->cb[] after tcp_filter() calls.
      
      This patch adds tcp_v4_fill_cb()/tcp_v4_restore_cb() in a very
      similar way we added them for IPv6.
      
      [1]
      [  339.806024] SELinux: failure in selinux_parse_skb(), unable to parse packet
      [  339.822505] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81745af5
      [  339.822505]
      [  339.852250] CPU: 4 PID: 3642 Comm: client Not tainted 4.15.0-rc1-test #15
      [  339.868498] Hardware name: LENOVO 10FGS0VA1L/30BC, BIOS FWKT68A   01/19/2017
      [  339.885060] Call Trace:
      [  339.896875]  <IRQ>
      [  339.908103]  dump_stack+0x63/0x87
      [  339.920645]  panic+0xe8/0x248
      [  339.932668]  ? ip_push_pending_frames+0x33/0x40
      [  339.946328]  ? icmp_send+0x525/0x530
      [  339.958861]  ? kfree_skbmem+0x60/0x70
      [  339.971431]  __stack_chk_fail+0x1b/0x20
      [  339.984049]  icmp_send+0x525/0x530
      [  339.996205]  ? netlbl_skbuff_err+0x36/0x40
      [  340.008997]  ? selinux_netlbl_err+0x11/0x20
      [  340.021816]  ? selinux_socket_sock_rcv_skb+0x211/0x230
      [  340.035529]  ? security_sock_rcv_skb+0x3b/0x50
      [  340.048471]  ? sk_filter_trim_cap+0x44/0x1c0
      [  340.061246]  ? tcp_v4_inbound_md5_hash+0x69/0x1b0
      [  340.074562]  ? tcp_filter+0x2c/0x40
      [  340.086400]  ? tcp_v4_rcv+0x820/0xa20
      [  340.098329]  ? ip_local_deliver_finish+0x71/0x1a0
      [  340.111279]  ? ip_local_deliver+0x6f/0xe0
      [  340.123535]  ? ip_rcv_finish+0x3a0/0x3a0
      [  340.135523]  ? ip_rcv_finish+0xdb/0x3a0
      [  340.147442]  ? ip_rcv+0x27c/0x3c0
      [  340.158668]  ? inet_del_offload+0x40/0x40
      [  340.170580]  ? __netif_receive_skb_core+0x4ac/0x900
      [  340.183285]  ? rcu_accelerate_cbs+0x5b/0x80
      [  340.195282]  ? __netif_receive_skb+0x18/0x60
      [  340.207288]  ? process_backlog+0x95/0x140
      [  340.218948]  ? net_rx_action+0x26c/0x3b0
      [  340.230416]  ? __do_softirq+0xc9/0x26a
      [  340.241625]  ? do_softirq_own_stack+0x2a/0x40
      [  340.253368]  </IRQ>
      [  340.262673]  ? do_softirq+0x50/0x60
      [  340.273450]  ? __local_bh_enable_ip+0x57/0x60
      [  340.285045]  ? ip_finish_output2+0x175/0x350
      [  340.296403]  ? ip_finish_output+0x127/0x1d0
      [  340.307665]  ? nf_hook_slow+0x3c/0xb0
      [  340.318230]  ? ip_output+0x72/0xe0
      [  340.328524]  ? ip_fragment.constprop.54+0x80/0x80
      [  340.340070]  ? ip_local_out+0x35/0x40
      [  340.350497]  ? ip_queue_xmit+0x15c/0x3f0
      [  340.361060]  ? __kmalloc_reserve.isra.40+0x31/0x90
      [  340.372484]  ? __skb_clone+0x2e/0x130
      [  340.382633]  ? tcp_transmit_skb+0x558/0xa10
      [  340.393262]  ? tcp_connect+0x938/0xad0
      [  340.403370]  ? ktime_get_with_offset+0x4c/0xb0
      [  340.414206]  ? tcp_v4_connect+0x457/0x4e0
      [  340.424471]  ? __inet_stream_connect+0xb3/0x300
      [  340.435195]  ? inet_stream_connect+0x3b/0x60
      [  340.445607]  ? SYSC_connect+0xd9/0x110
      [  340.455455]  ? __audit_syscall_entry+0xaf/0x100
      [  340.466112]  ? syscall_trace_enter+0x1d0/0x2b0
      [  340.476636]  ? __audit_syscall_exit+0x209/0x290
      [  340.487151]  ? SyS_connect+0xe/0x10
      [  340.496453]  ? do_syscall_64+0x67/0x1b0
      [  340.506078]  ? entry_SYSCALL64_slow_path+0x25/0x25
      
      Fixes: 971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NJames Morris <james.l.morris@oracle.com>
      Tested-by: NJames Morris <james.l.morris@oracle.com>
      Tested-by: NCasey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eeea10b8
  2. 30 11月, 2017 2 次提交
  3. 24 11月, 2017 3 次提交
    • W
      net: accept UFO datagrams from tuntap and packet · 0c19f846
      Willem de Bruijn 提交于
      Tuntap and similar devices can inject GSO packets. Accept type
      VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively.
      
      Processes are expected to use feature negotiation such as TUNSETOFFLOAD
      to detect supported offload types and refrain from injecting other
      packets. This process breaks down with live migration: guest kernels
      do not renegotiate flags, so destination hosts need to expose all
      features that the source host does.
      
      Partially revert the UFO removal from 182e0b6b~1..d9d30adf.
      This patch introduces nearly(*) no new code to simplify verification.
      It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP
      insertion and software UFO segmentation.
      
      It does not reinstate protocol stack support, hardware offload
      (NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception
      of VIRTIO_NET_HDR_GSO_UDP packets in tuntap.
      
      To support SKB_GSO_UDP reappearing in the stack, also reinstate
      logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD
      by squashing in commit 93991221 ("net: skb_needs_check() removes
      CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee6
      ("net: avoid skb_warn_bad_offload false positives on UFO").
      
      (*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id,
      ipv6_proxy_select_ident is changed to return a __be32 and this is
      assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted
      at the end of the enum to minimize code churn.
      
      Tested
        Booted a v4.13 guest kernel with QEMU. On a host kernel before this
        patch `ethtool -k eth0` shows UFO disabled. After the patch, it is
        enabled, same as on a v4.13 host kernel.
      
        A UFO packet sent from the guest appears on the tap device:
          host:
            nc -l -p -u 8000 &
            tcpdump -n -i tap0
      
          guest:
            dd if=/dev/zero of=payload.txt bs=1 count=2000
            nc -u 192.16.1.1 8000 < payload.txt
      
        Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds,
        packets arriving fragmented:
      
          ./with_tap_pair.sh ./tap_send_ufo tap0 tap1
          (from https://github.com/wdebruij/kerneltools/tree/master/tests)
      
      Changes
        v1 -> v2
          - simplified set_offload change (review comment)
          - documented test procedure
      
      Link: http://lkml.kernel.org/r/<CAF=yD-LuUeDuL9YWPJD9ykOZ0QCjNeznPDr6whqZ9NGMNF12Mw@mail.gmail.com>
      Fixes: fb652fdf ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.")
      Reported-by: NMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Acked-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c19f846
    • D
      net: ipv6: Fixup device for anycast routes during copy · 98d11291
      David Ahern 提交于
      Florian reported a breakage with anycast routes due to commit
      4832c30d ("net: ipv6: put host and anycast routes on device with
      address"). Prior to this commit anycast routes were added against the
      loopback device causing repetitive route entries with no insight into
      why they existed. e.g.:
        $ ip -6 ro ls  table local type anycast
        anycast 2001:db8:1:: dev lo proto kernel metric 0 pref medium
        anycast 2001:db8:2:: dev lo proto kernel metric 0 pref medium
        anycast fe80:: dev lo proto kernel metric 0 pref medium
        anycast fe80:: dev lo proto kernel metric 0 pref medium
      
      The point of commit 4832c30d is to add the routes using the device
      with the address which is causing the route to be added. e.g.,:
        $ ip -6 ro ls  table local type anycast
        anycast 2001:db8:1:: dev eth1 proto kernel metric 0 pref medium
        anycast 2001:db8:2:: dev eth2 proto kernel metric 0 pref medium
        anycast fe80:: dev eth2 proto kernel metric 0 pref medium
        anycast fe80:: dev eth1 proto kernel metric 0 pref medium
      
      For traffic to work as it did before, the dst device needs to be switched
      to the loopback when the copy is created similar to local routes.
      
      Fixes: 4832c30d ("net: ipv6: put host and anycast routes on device with address")
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      98d11291
    • I
      ipv6: Do not consider linkdown nexthops during multipath · bbfcd776
      Ido Schimmel 提交于
      When the 'ignore_routes_with_linkdown' sysctl is set, we should not
      consider linkdown nexthops during route lookup.
      
      While the code correctly verifies that the initially selected route
      ('match') has a carrier, it does not perform the same check in the
      subsequent multipath selection, resulting in a potential packet loss.
      
      In case the chosen route does not have a carrier and the sysctl is set,
      choose the initially selected route.
      
      Fixes: 35103d11 ("net: ipv6 sysctl option to ignore routes when nexthop link is down")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Acked-by: NAndy Gospodarek <andy@greyhouse.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bbfcd776
  4. 22 11月, 2017 3 次提交
    • K
      treewide: setup_timer() -> timer_setup() (2 field) · 86cb30ec
      Kees Cook 提交于
      This converts all remaining setup_timer() calls that use a nested field
      to reach a struct timer_list. Coccinelle does not have an easy way to
      match multiple fields, so a new script is needed to change the matches of
      "&_E->_timer" into "&_E->_field1._timer" in all the rules.
      
      spatch --very-quiet --all-includes --include-headers \
      	-I ./arch/x86/include -I ./arch/x86/include/generated \
      	-I ./include -I ./arch/x86/include/uapi \
      	-I ./arch/x86/include/generated/uapi -I ./include/uapi \
      	-I ./include/generated/uapi --include ./include/linux/kconfig.h \
      	--dir . \
      	--cocci-file ~/src/data/timer_setup-2fields.cocci
      
      @fix_address_of depends@
      expression e;
      @@
      
       setup_timer(
      -&(e)
      +&e
       , ...)
      
      // Update any raw setup_timer() usages that have a NULL callback, but
      // would otherwise match change_timer_function_usage, since the latter
      // will update all function assignments done in the face of a NULL
      // function initialization in setup_timer().
      @change_timer_function_usage_NULL@
      expression _E;
      identifier _field1;
      identifier _timer;
      type _cast_data;
      @@
      
      (
      -setup_timer(&_E->_field1._timer, NULL, _E);
      +timer_setup(&_E->_field1._timer, NULL, 0);
      |
      -setup_timer(&_E->_field1._timer, NULL, (_cast_data)_E);
      +timer_setup(&_E->_field1._timer, NULL, 0);
      |
      -setup_timer(&_E._field1._timer, NULL, &_E);
      +timer_setup(&_E._field1._timer, NULL, 0);
      |
      -setup_timer(&_E._field1._timer, NULL, (_cast_data)&_E);
      +timer_setup(&_E._field1._timer, NULL, 0);
      )
      
      @change_timer_function_usage@
      expression _E;
      identifier _field1;
      identifier _timer;
      struct timer_list _stl;
      identifier _callback;
      type _cast_func, _cast_data;
      @@
      
      (
      -setup_timer(&_E->_field1._timer, _callback, _E);
      +timer_setup(&_E->_field1._timer, _callback, 0);
      |
      -setup_timer(&_E->_field1._timer, &_callback, _E);
      +timer_setup(&_E->_field1._timer, _callback, 0);
      |
      -setup_timer(&_E->_field1._timer, _callback, (_cast_data)_E);
      +timer_setup(&_E->_field1._timer, _callback, 0);
      |
      -setup_timer(&_E->_field1._timer, &_callback, (_cast_data)_E);
      +timer_setup(&_E->_field1._timer, _callback, 0);
      |
      -setup_timer(&_E->_field1._timer, (_cast_func)_callback, _E);
      +timer_setup(&_E->_field1._timer, _callback, 0);
      |
      -setup_timer(&_E->_field1._timer, (_cast_func)&_callback, _E);
      +timer_setup(&_E->_field1._timer, _callback, 0);
      |
      -setup_timer(&_E->_field1._timer, (_cast_func)_callback, (_cast_data)_E);
      +timer_setup(&_E->_field1._timer, _callback, 0);
      |
      -setup_timer(&_E->_field1._timer, (_cast_func)&_callback, (_cast_data)_E);
      +timer_setup(&_E->_field1._timer, _callback, 0);
      |
      -setup_timer(&_E._field1._timer, _callback, (_cast_data)_E);
      +timer_setup(&_E._field1._timer, _callback, 0);
      |
      -setup_timer(&_E._field1._timer, _callback, (_cast_data)&_E);
      +timer_setup(&_E._field1._timer, _callback, 0);
      |
      -setup_timer(&_E._field1._timer, &_callback, (_cast_data)_E);
      +timer_setup(&_E._field1._timer, _callback, 0);
      |
      -setup_timer(&_E._field1._timer, &_callback, (_cast_data)&_E);
      +timer_setup(&_E._field1._timer, _callback, 0);
      |
      -setup_timer(&_E._field1._timer, (_cast_func)_callback, (_cast_data)_E);
      +timer_setup(&_E._field1._timer, _callback, 0);
      |
      -setup_timer(&_E._field1._timer, (_cast_func)_callback, (_cast_data)&_E);
      +timer_setup(&_E._field1._timer, _callback, 0);
      |
      -setup_timer(&_E._field1._timer, (_cast_func)&_callback, (_cast_data)_E);
      +timer_setup(&_E._field1._timer, _callback, 0);
      |
      -setup_timer(&_E._field1._timer, (_cast_func)&_callback, (_cast_data)&_E);
      +timer_setup(&_E._field1._timer, _callback, 0);
      |
       _E->_field1._timer@_stl.function = _callback;
      |
       _E->_field1._timer@_stl.function = &_callback;
      |
       _E->_field1._timer@_stl.function = (_cast_func)_callback;
      |
       _E->_field1._timer@_stl.function = (_cast_func)&_callback;
      |
       _E._field1._timer@_stl.function = _callback;
      |
       _E._field1._timer@_stl.function = &_callback;
      |
       _E._field1._timer@_stl.function = (_cast_func)_callback;
      |
       _E._field1._timer@_stl.function = (_cast_func)&_callback;
      )
      
      // callback(unsigned long arg)
      @change_callback_handle_cast
       depends on change_timer_function_usage@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._field1;
      identifier change_timer_function_usage._timer;
      type _origtype;
      identifier _origarg;
      type _handletype;
      identifier _handle;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *t
       )
       {
      (
      	... when != _origarg
      	_handletype *_handle =
      -(_handletype *)_origarg;
      +from_timer(_handle, t, _field1._timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle =
      -(void *)_origarg;
      +from_timer(_handle, t, _field1._timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle;
      	... when != _handle
      	_handle =
      -(_handletype *)_origarg;
      +from_timer(_handle, t, _field1._timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle;
      	... when != _handle
      	_handle =
      -(void *)_origarg;
      +from_timer(_handle, t, _field1._timer);
      	... when != _origarg
      )
       }
      
      // callback(unsigned long arg) without existing variable
      @change_callback_handle_cast_no_arg
       depends on change_timer_function_usage &&
                           !change_callback_handle_cast@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._field1;
      identifier change_timer_function_usage._timer;
      type _origtype;
      identifier _origarg;
      type _handletype;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *t
       )
       {
      +	_handletype *_origarg = from_timer(_origarg, t, _field1._timer);
      +
      	... when != _origarg
      -	(_handletype *)_origarg
      +	_origarg
      	... when != _origarg
       }
      
      // Avoid already converted callbacks.
      @match_callback_converted
       depends on change_timer_function_usage &&
                  !change_callback_handle_cast &&
      	    !change_callback_handle_cast_no_arg@
      identifier change_timer_function_usage._callback;
      identifier t;
      @@
      
       void _callback(struct timer_list *t)
       { ... }
      
      // callback(struct something *handle)
      @change_callback_handle_arg
       depends on change_timer_function_usage &&
      	    !match_callback_converted &&
                  !change_callback_handle_cast &&
                  !change_callback_handle_cast_no_arg@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._field1;
      identifier change_timer_function_usage._timer;
      type _handletype;
      identifier _handle;
      @@
      
       void _callback(
      -_handletype *_handle
      +struct timer_list *t
       )
       {
      +	_handletype *_handle = from_timer(_handle, t, _field1._timer);
      	...
       }
      
      // If change_callback_handle_arg ran on an empty function, remove
      // the added handler.
      @unchange_callback_handle_arg
       depends on change_timer_function_usage &&
      	    change_callback_handle_arg@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._field1;
      identifier change_timer_function_usage._timer;
      type _handletype;
      identifier _handle;
      identifier t;
      @@
      
       void _callback(struct timer_list *t)
       {
      -	_handletype *_handle = from_timer(_handle, t, _field1._timer);
       }
      
      // We only want to refactor the setup_timer() data argument if we've found
      // the matching callback. This undoes changes in change_timer_function_usage.
      @unchange_timer_function_usage
       depends on change_timer_function_usage &&
                  !change_callback_handle_cast &&
                  !change_callback_handle_cast_no_arg &&
      	    !change_callback_handle_arg@
      expression change_timer_function_usage._E;
      identifier change_timer_function_usage._field1;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type change_timer_function_usage._cast_data;
      @@
      
      (
      -timer_setup(&_E->_field1._timer, _callback, 0);
      +setup_timer(&_E->_field1._timer, _callback, (_cast_data)_E);
      |
      -timer_setup(&_E._field1._timer, _callback, 0);
      +setup_timer(&_E._field1._timer, _callback, (_cast_data)&_E);
      )
      
      // If we fixed a callback from a .function assignment, fix the
      // assignment cast now.
      @change_timer_function_assignment
       depends on change_timer_function_usage &&
                  (change_callback_handle_cast ||
                   change_callback_handle_cast_no_arg ||
                   change_callback_handle_arg)@
      expression change_timer_function_usage._E;
      identifier change_timer_function_usage._field1;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type _cast_func;
      typedef TIMER_FUNC_TYPE;
      @@
      
      (
       _E->_field1._timer.function =
      -_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_field1._timer.function =
      -&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_field1._timer.function =
      -(_cast_func)_callback;
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_field1._timer.function =
      -(_cast_func)&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._field1._timer.function =
      -_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._field1._timer.function =
      -&_callback;
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._field1._timer.function =
      -(_cast_func)_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._field1._timer.function =
      -(_cast_func)&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      )
      
      // Sometimes timer functions are called directly. Replace matched args.
      @change_timer_function_calls
       depends on change_timer_function_usage &&
                  (change_callback_handle_cast ||
                   change_callback_handle_cast_no_arg ||
                   change_callback_handle_arg)@
      expression _E;
      identifier change_timer_function_usage._field1;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type _cast_data;
      @@
      
       _callback(
      (
      -(_cast_data)_E
      +&_E->_field1._timer
      |
      -(_cast_data)&_E
      +&_E._field1._timer
      |
      -_E
      +&_E->_field1._timer
      )
       )
      
      // If a timer has been configured without a data argument, it can be
      // converted without regard to the callback argument, since it is unused.
      @match_timer_function_unused_data@
      expression _E;
      identifier _field1;
      identifier _timer;
      identifier _callback;
      @@
      
      (
      -setup_timer(&_E->_field1._timer, _callback, 0);
      +timer_setup(&_E->_field1._timer, _callback, 0);
      |
      -setup_timer(&_E->_field1._timer, _callback, 0L);
      +timer_setup(&_E->_field1._timer, _callback, 0);
      |
      -setup_timer(&_E->_field1._timer, _callback, 0UL);
      +timer_setup(&_E->_field1._timer, _callback, 0);
      |
      -setup_timer(&_E._field1._timer, _callback, 0);
      +timer_setup(&_E._field1._timer, _callback, 0);
      |
      -setup_timer(&_E._field1._timer, _callback, 0L);
      +timer_setup(&_E._field1._timer, _callback, 0);
      |
      -setup_timer(&_E._field1._timer, _callback, 0UL);
      +timer_setup(&_E._field1._timer, _callback, 0);
      |
      -setup_timer(&_field1._timer, _callback, 0);
      +timer_setup(&_field1._timer, _callback, 0);
      |
      -setup_timer(&_field1._timer, _callback, 0L);
      +timer_setup(&_field1._timer, _callback, 0);
      |
      -setup_timer(&_field1._timer, _callback, 0UL);
      +timer_setup(&_field1._timer, _callback, 0);
      |
      -setup_timer(_field1._timer, _callback, 0);
      +timer_setup(_field1._timer, _callback, 0);
      |
      -setup_timer(_field1._timer, _callback, 0L);
      +timer_setup(_field1._timer, _callback, 0);
      |
      -setup_timer(_field1._timer, _callback, 0UL);
      +timer_setup(_field1._timer, _callback, 0);
      )
      
      @change_callback_unused_data
       depends on match_timer_function_unused_data@
      identifier match_timer_function_unused_data._callback;
      type _origtype;
      identifier _origarg;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *unused
       )
       {
      	... when != _origarg
       }
      Signed-off-by: NKees Cook <keescook@chromium.org>
      86cb30ec
    • K
      treewide: setup_timer() -> timer_setup() · e99e88a9
      Kees Cook 提交于
      This converts all remaining cases of the old setup_timer() API into using
      timer_setup(), where the callback argument is the structure already
      holding the struct timer_list. These should have no behavioral changes,
      since they just change which pointer is passed into the callback with
      the same available pointers after conversion. It handles the following
      examples, in addition to some other variations.
      
      Casting from unsigned long:
      
          void my_callback(unsigned long data)
          {
              struct something *ptr = (struct something *)data;
          ...
          }
          ...
          setup_timer(&ptr->my_timer, my_callback, ptr);
      
      and forced object casts:
      
          void my_callback(struct something *ptr)
          {
          ...
          }
          ...
          setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr);
      
      become:
      
          void my_callback(struct timer_list *t)
          {
              struct something *ptr = from_timer(ptr, t, my_timer);
          ...
          }
          ...
          timer_setup(&ptr->my_timer, my_callback, 0);
      
      Direct function assignments:
      
          void my_callback(unsigned long data)
          {
              struct something *ptr = (struct something *)data;
          ...
          }
          ...
          ptr->my_timer.function = my_callback;
      
      have a temporary cast added, along with converting the args:
      
          void my_callback(struct timer_list *t)
          {
              struct something *ptr = from_timer(ptr, t, my_timer);
          ...
          }
          ...
          ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback;
      
      And finally, callbacks without a data assignment:
      
          void my_callback(unsigned long data)
          {
          ...
          }
          ...
          setup_timer(&ptr->my_timer, my_callback, 0);
      
      have their argument renamed to verify they're unused during conversion:
      
          void my_callback(struct timer_list *unused)
          {
          ...
          }
          ...
          timer_setup(&ptr->my_timer, my_callback, 0);
      
      The conversion is done with the following Coccinelle script:
      
      spatch --very-quiet --all-includes --include-headers \
      	-I ./arch/x86/include -I ./arch/x86/include/generated \
      	-I ./include -I ./arch/x86/include/uapi \
      	-I ./arch/x86/include/generated/uapi -I ./include/uapi \
      	-I ./include/generated/uapi --include ./include/linux/kconfig.h \
      	--dir . \
      	--cocci-file ~/src/data/timer_setup.cocci
      
      @fix_address_of@
      expression e;
      @@
      
       setup_timer(
      -&(e)
      +&e
       , ...)
      
      // Update any raw setup_timer() usages that have a NULL callback, but
      // would otherwise match change_timer_function_usage, since the latter
      // will update all function assignments done in the face of a NULL
      // function initialization in setup_timer().
      @change_timer_function_usage_NULL@
      expression _E;
      identifier _timer;
      type _cast_data;
      @@
      
      (
      -setup_timer(&_E->_timer, NULL, _E);
      +timer_setup(&_E->_timer, NULL, 0);
      |
      -setup_timer(&_E->_timer, NULL, (_cast_data)_E);
      +timer_setup(&_E->_timer, NULL, 0);
      |
      -setup_timer(&_E._timer, NULL, &_E);
      +timer_setup(&_E._timer, NULL, 0);
      |
      -setup_timer(&_E._timer, NULL, (_cast_data)&_E);
      +timer_setup(&_E._timer, NULL, 0);
      )
      
      @change_timer_function_usage@
      expression _E;
      identifier _timer;
      struct timer_list _stl;
      identifier _callback;
      type _cast_func, _cast_data;
      @@
      
      (
      -setup_timer(&_E->_timer, _callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, &_callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, _callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, &_callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)_callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)&_callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, &_callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, &_callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
       _E->_timer@_stl.function = _callback;
      |
       _E->_timer@_stl.function = &_callback;
      |
       _E->_timer@_stl.function = (_cast_func)_callback;
      |
       _E->_timer@_stl.function = (_cast_func)&_callback;
      |
       _E._timer@_stl.function = _callback;
      |
       _E._timer@_stl.function = &_callback;
      |
       _E._timer@_stl.function = (_cast_func)_callback;
      |
       _E._timer@_stl.function = (_cast_func)&_callback;
      )
      
      // callback(unsigned long arg)
      @change_callback_handle_cast
       depends on change_timer_function_usage@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _origtype;
      identifier _origarg;
      type _handletype;
      identifier _handle;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *t
       )
       {
      (
      	... when != _origarg
      	_handletype *_handle =
      -(_handletype *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle =
      -(void *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle;
      	... when != _handle
      	_handle =
      -(_handletype *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle;
      	... when != _handle
      	_handle =
      -(void *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      )
       }
      
      // callback(unsigned long arg) without existing variable
      @change_callback_handle_cast_no_arg
       depends on change_timer_function_usage &&
                           !change_callback_handle_cast@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _origtype;
      identifier _origarg;
      type _handletype;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *t
       )
       {
      +	_handletype *_origarg = from_timer(_origarg, t, _timer);
      +
      	... when != _origarg
      -	(_handletype *)_origarg
      +	_origarg
      	... when != _origarg
       }
      
      // Avoid already converted callbacks.
      @match_callback_converted
       depends on change_timer_function_usage &&
                  !change_callback_handle_cast &&
      	    !change_callback_handle_cast_no_arg@
      identifier change_timer_function_usage._callback;
      identifier t;
      @@
      
       void _callback(struct timer_list *t)
       { ... }
      
      // callback(struct something *handle)
      @change_callback_handle_arg
       depends on change_timer_function_usage &&
      	    !match_callback_converted &&
                  !change_callback_handle_cast &&
                  !change_callback_handle_cast_no_arg@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _handletype;
      identifier _handle;
      @@
      
       void _callback(
      -_handletype *_handle
      +struct timer_list *t
       )
       {
      +	_handletype *_handle = from_timer(_handle, t, _timer);
      	...
       }
      
      // If change_callback_handle_arg ran on an empty function, remove
      // the added handler.
      @unchange_callback_handle_arg
       depends on change_timer_function_usage &&
      	    change_callback_handle_arg@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _handletype;
      identifier _handle;
      identifier t;
      @@
      
       void _callback(struct timer_list *t)
       {
      -	_handletype *_handle = from_timer(_handle, t, _timer);
       }
      
      // We only want to refactor the setup_timer() data argument if we've found
      // the matching callback. This undoes changes in change_timer_function_usage.
      @unchange_timer_function_usage
       depends on change_timer_function_usage &&
                  !change_callback_handle_cast &&
                  !change_callback_handle_cast_no_arg &&
      	    !change_callback_handle_arg@
      expression change_timer_function_usage._E;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type change_timer_function_usage._cast_data;
      @@
      
      (
      -timer_setup(&_E->_timer, _callback, 0);
      +setup_timer(&_E->_timer, _callback, (_cast_data)_E);
      |
      -timer_setup(&_E._timer, _callback, 0);
      +setup_timer(&_E._timer, _callback, (_cast_data)&_E);
      )
      
      // If we fixed a callback from a .function assignment, fix the
      // assignment cast now.
      @change_timer_function_assignment
       depends on change_timer_function_usage &&
                  (change_callback_handle_cast ||
                   change_callback_handle_cast_no_arg ||
                   change_callback_handle_arg)@
      expression change_timer_function_usage._E;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type _cast_func;
      typedef TIMER_FUNC_TYPE;
      @@
      
      (
       _E->_timer.function =
      -_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_timer.function =
      -&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_timer.function =
      -(_cast_func)_callback;
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_timer.function =
      -(_cast_func)&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -&_callback;
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -(_cast_func)_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -(_cast_func)&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      )
      
      // Sometimes timer functions are called directly. Replace matched args.
      @change_timer_function_calls
       depends on change_timer_function_usage &&
                  (change_callback_handle_cast ||
                   change_callback_handle_cast_no_arg ||
                   change_callback_handle_arg)@
      expression _E;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type _cast_data;
      @@
      
       _callback(
      (
      -(_cast_data)_E
      +&_E->_timer
      |
      -(_cast_data)&_E
      +&_E._timer
      |
      -_E
      +&_E->_timer
      )
       )
      
      // If a timer has been configured without a data argument, it can be
      // converted without regard to the callback argument, since it is unused.
      @match_timer_function_unused_data@
      expression _E;
      identifier _timer;
      identifier _callback;
      @@
      
      (
      -setup_timer(&_E->_timer, _callback, 0);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, _callback, 0L);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, _callback, 0UL);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, 0);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, 0L);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, 0UL);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_timer, _callback, 0);
      +timer_setup(&_timer, _callback, 0);
      |
      -setup_timer(&_timer, _callback, 0L);
      +timer_setup(&_timer, _callback, 0);
      |
      -setup_timer(&_timer, _callback, 0UL);
      +timer_setup(&_timer, _callback, 0);
      |
      -setup_timer(_timer, _callback, 0);
      +timer_setup(_timer, _callback, 0);
      |
      -setup_timer(_timer, _callback, 0L);
      +timer_setup(_timer, _callback, 0);
      |
      -setup_timer(_timer, _callback, 0UL);
      +timer_setup(_timer, _callback, 0);
      )
      
      @change_callback_unused_data
       depends on match_timer_function_unused_data@
      identifier match_timer_function_unused_data._callback;
      type _origtype;
      identifier _origarg;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *unused
       )
       {
      	... when != _origarg
       }
      Signed-off-by: NKees Cook <keescook@chromium.org>
      e99e88a9
    • K
      treewide: Switch DEFINE_TIMER callbacks to struct timer_list * · 24ed960a
      Kees Cook 提交于
      This changes all DEFINE_TIMER() callbacks to use a struct timer_list
      pointer instead of unsigned long. Since the data argument has already been
      removed, none of these callbacks are using their argument currently, so
      this renames the argument to "unused".
      
      Done using the following semantic patch:
      
      @match_define_timer@
      declarer name DEFINE_TIMER;
      identifier _timer, _callback;
      @@
      
       DEFINE_TIMER(_timer, _callback);
      
      @change_callback depends on match_define_timer@
      identifier match_define_timer._callback;
      type _origtype;
      identifier _origarg;
      @@
      
       void
      -_callback(_origtype _origarg)
      +_callback(struct timer_list *unused)
       { ... }
      Signed-off-by: NKees Cook <keescook@chromium.org>
      24ed960a
  5. 19 11月, 2017 1 次提交
  6. 15 11月, 2017 2 次提交
    • S
      tcp: Namespace-ify sysctl_tcp_default_congestion_control · 6670e152
      Stephen Hemminger 提交于
      Make default TCP default congestion control to a per namespace
      value. This changes default congestion control to a pointer to congestion ops
      (rather than implicit as first element of available lsit).
      
      The congestion control setting of new namespaces is inherited
      from the current setting of the root namespace.
      Signed-off-by: NStephen Hemminger <sthemmin@microsoft.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6670e152
    • N
      ipv6: set all.accept_dad to 0 by default · 09400953
      Nicolas Dichtel 提交于
      With commits 35e015e1 and a2d3f3e3, the global 'accept_dad' flag
      is also taken into account (default value is 1). If either global or
      per-interface flag is non-zero, DAD will be enabled on a given interface.
      
      This is not backward compatible: before those patches, the user could
      disable DAD just by setting the per-interface flag to 0. Now, the
      user instead needs to set both flags to 0 to actually disable DAD.
      
      Restore the previous behaviour by setting the default for the global
      'accept_dad' flag to 0. This way, DAD is still enabled by default,
      as per-interface flags are set to 1 on device creation, but setting
      them to 0 is enough to disable DAD on a given interface.
      
      - Before 35e015e1f57a7 and a2d3f3e3:
                global    per-interface    DAD enabled
      [default]   1             1              yes
                  X             0              no
                  X             1              yes
      
      - After 35e015e1 and a2d3f3e3:
                global    per-interface    DAD enabled
      [default]   1             1              yes
                  0             0              no
                  0             1              yes
                  1             0              yes
      
      - After this fix:
                global    per-interface    DAD enabled
                  1             1              yes
                  0             0              no
      [default]   0             1              yes
                  1             0              yes
      
      Fixes: 35e015e1 ("ipv6: fix net.ipv6.conf.all interface DAD handlers")
      Fixes: a2d3f3e3 ("ipv6: fix net.ipv6.conf.all.accept_dad behaviour for real")
      CC: Stefano Brivio <sbrivio@redhat.com>
      CC: Matteo Croce <mcroce@redhat.com>
      CC: Erik Kline <ek@google.com>
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      09400953
  7. 14 11月, 2017 1 次提交
  8. 13 11月, 2017 6 次提交
  9. 11 11月, 2017 2 次提交
    • M
      net: Remove unused skb_shared_info member · 39b17521
      Mat Martineau 提交于
      ip6_frag_id was only used by UFO, which has been removed.
      ipv6_proxy_select_ident() only existed to set ip6_frag_id and has no
      in-tree callers.
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      39b17521
    • M
      net: ipv6: sysctl to specify IPv6 ND traffic class · 2210d6b2
      Maciej Żenczykowski 提交于
      Add a per-device sysctl to specify the default traffic class to use for
      kernel originated IPv6 Neighbour Discovery packets.
      
      Currently this includes:
      
        - Router Solicitation (ICMPv6 type 133)
          ndisc_send_rs() -> ndisc_send_skb() -> ip6_nd_hdr()
      
        - Neighbour Solicitation (ICMPv6 type 135)
          ndisc_send_ns() -> ndisc_send_skb() -> ip6_nd_hdr()
      
        - Neighbour Advertisement (ICMPv6 type 136)
          ndisc_send_na() -> ndisc_send_skb() -> ip6_nd_hdr()
      
        - Redirect (ICMPv6 type 137)
          ndisc_send_redirect() -> ndisc_send_skb() -> ip6_nd_hdr()
      
      and if the kernel ever gets around to generating RA's,
      it would presumably also include:
      
        - Router Advertisement (ICMPv6 type 134)
          (radvd daemon could pick up on the kernel setting and use it)
      
      Interface drivers may examine the Traffic Class value and translate
      the DiffServ Code Point into a link-layer appropriate traffic
      prioritization scheme.  An example of mapping IETF DSCP values to
      IEEE 802.11 User Priority values can be found here:
      
          https://tools.ietf.org/html/draft-ietf-tsvwg-ieee-802-11
      
      The expected primary use case is to properly prioritize ND over wifi.
      
      Testing:
        jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
        0
        jzem22:~# echo -1 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
        -bash: echo: write error: Invalid argument
        jzem22:~# echo 256 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
        -bash: echo: write error: Invalid argument
        jzem22:~# echo 0 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
        jzem22:~# echo 255 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
        jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
        255
        jzem22:~# echo 34 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
        jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
        34
      
        jzem22:~# echo $[0xDC] > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
        jzem22:~# tcpdump -v -i eth0 icmp6 and src host jzem22.pgc and dst host fe80::1
        tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
        IP6 (class 0xdc, hlim 255, next-header ICMPv6 (58) payload length: 24)
        jzem22.pgc > fe80::1: [icmp6 sum ok] ICMP6, neighbor advertisement,
        length 24, tgt is jzem22.pgc, Flags [solicited]
      
      (based on original change written by Erik Kline, with minor changes)
      
      v2: fix 'suspicious rcu_dereference_check() usage'
          by explicitly grabbing the rcu_read_lock.
      
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Signed-off-by: NErik Kline <ek@google.com>
      Signed-off-by: NMaciej Żenczykowski <maze@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2210d6b2
  10. 10 11月, 2017 1 次提交
  11. 08 11月, 2017 4 次提交
    • T
      ila: Add a hook type for LWT routes · fddb231e
      Tom Herbert 提交于
      In LWT tunnels both an input and output route method is defined.
      If both of these are executed in the same path then double translation
      happens and the effect is not correct.
      
      This patch adds a new attribute that indicates the hook type. Two
      values are defined for route output and route output. ILA
      translation is only done for the one that is set. The default is
      to enable ILA on route output.
      Signed-off-by: NTom Herbert <tom@quantonium.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fddb231e
    • T
      ila: allow configuration of identifier type · 70d5aef4
      Tom Herbert 提交于
      Allow identifier to be explicitly configured for a mapping.
      This can either be one of the identifier types specified in the
      ILA draft or a value of ILA_ATYPE_USE_FORMAT which means the
      identifier type is inferred from the identifier type field.
      If a value other than ILA_ATYPE_USE_FORMAT is set for a
      mapping then it is assumed that the identifier type field is
      not present in an identifier.
      Signed-off-by: NTom Herbert <tom@quantonium.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      70d5aef4
    • T
      ila: add checksum neutral map auto · 84287bb3
      Tom Herbert 提交于
      Add checksum neutral auto that performs checksum neutral mapping
      without using the C-bit. This is enabled by configuration of
      a mapping.
      
      The checksum neutral function has been split into
      ila_csum_do_neutral_fmt and ila_csum_do_neutral_nofmt. The former
      handles the C-bit and includes it in the adjustment value. The latter
      just sets the adjustment value on the locator diff only.
      
      Added configuration for checksum neutral map aut in ila_lwt
      and ila_xlat.
      Signed-off-by: NTom Herbert <tom@quantonium.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      84287bb3
    • T
      ila: cleanup checksum diff · 80661e76
      Tom Herbert 提交于
      Consolidate computing checksum diff into one function.
      
      Add get_csum_diff_iaddr that computes the checksum diff between
      an address argument and locator being written. get_csum_diff
      calls this using the destination address in the IP header as
      the argument.
      
      Also moved ila_init_saved_csum to be close to the checksum
      
      diff functions.
      Signed-off-by: NTom Herbert <tom@quantonium.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      80661e76
  12. 07 11月, 2017 1 次提交
    • E
      ipv6: addrconf: fix a lockdep splat · fffcefe9
      Eric Dumazet 提交于
      Fixes a case where GFP_ATOMIC allocation must be used instead of
      GFP_KERNEL one.
      
      [   54.891146]  lock_acquire+0xb3/0x2f0
      [   54.891153]  ? fs_reclaim_acquire.part.60+0x5/0x30
      [   54.891165]  fs_reclaim_acquire.part.60+0x29/0x30
      [   54.891170]  ? fs_reclaim_acquire.part.60+0x5/0x30
      [   54.891178]  kmem_cache_alloc_trace+0x3f/0x500
      [   54.891186]  ? cyc2ns_read_end+0x1e/0x30
      [   54.891196]  ipv6_add_addr+0x15a/0xc30
      [   54.891217]  ? ipv6_create_tempaddr+0x2ea/0x5d0
      [   54.891223]  ipv6_create_tempaddr+0x2ea/0x5d0
      [   54.891238]  ? manage_tempaddrs+0x195/0x220
      [   54.891249]  ? addrconf_prefix_rcv_add_addr+0x1c0/0x4f0
      [   54.891255]  addrconf_prefix_rcv_add_addr+0x1c0/0x4f0
      [   54.891268]  addrconf_prefix_rcv+0x2e5/0x9b0
      [   54.891279]  ? neigh_update+0x446/0xb90
      [   54.891298]  ? ndisc_router_discovery+0x5ab/0xf00
      [   54.891303]  ndisc_router_discovery+0x5ab/0xf00
      [   54.891311]  ? retint_kernel+0x2d/0x2d
      [   54.891331]  ndisc_rcv+0x1b6/0x270
      [   54.891340]  icmpv6_rcv+0x6aa/0x9f0
      [   54.891345]  ? ipv6_chk_mcast_addr+0x176/0x530
      [   54.891351]  ? do_csum+0x17b/0x260
      [   54.891360]  ip6_input_finish+0x194/0xb20
      [   54.891372]  ip6_input+0x5b/0x2c0
      [   54.891380]  ? ip6_rcv_finish+0x320/0x320
      [   54.891389]  ip6_mc_input+0x15a/0x250
      [   54.891396]  ipv6_rcv+0x772/0x1050
      [   54.891403]  ? consume_skb+0xbe/0x2d0
      [   54.891412]  ? ip6_make_skb+0x2a0/0x2a0
      [   54.891418]  ? ip6_input+0x2c0/0x2c0
      [   54.891425]  __netif_receive_skb_core+0xa0f/0x1600
      [   54.891436]  ? process_backlog+0xac/0x400
      [   54.891441]  process_backlog+0xfa/0x400
      [   54.891448]  ? net_rx_action+0x145/0x1130
      [   54.891456]  net_rx_action+0x310/0x1130
      [   54.891524]  ? RTUSBBulkReceive+0x11d/0x190 [mt7610u_sta]
      [   54.891538]  __do_softirq+0x140/0xaba
      [   54.891553]  irq_exit+0x10b/0x160
      [   54.891561]  do_IRQ+0xbb/0x1b0
      
      Fixes: f3d9832e ("ipv6: addrconf: cleanup locking in ipv6_add_addr")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NValdis Kletnieks <valdis.kletnieks@vt.edu>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Tested-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fffcefe9
  13. 06 11月, 2017 1 次提交
  14. 05 11月, 2017 1 次提交
  15. 03 11月, 2017 2 次提交
    • G
      net: use -ENOSPC for transient busy indication · 068c2e70
      Gilad Ben-Yossef 提交于
      Replace -EBUSY with -ENOSPC when handling transient busy
      indication in the absence of backlog.
      Signed-off-by: NGilad Ben-Yossef <gilad@benyossef.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      068c2e70
    • T
      ipv6: Implement limits on Hop-by-Hop and Destination options · 47d3d7ac
      Tom Herbert 提交于
      RFC 8200 (IPv6) defines Hop-by-Hop options and Destination options
      extension headers. Both of these carry a list of TLVs which is
      only limited by the maximum length of the extension header (2048
      bytes). By the spec a host must process all the TLVs in these
      options, however these could be used as a fairly obvious
      denial of service attack. I think this could in fact be
      a significant DOS vector on the Internet, one mitigating
      factor might be that many FWs drop all packets with EH (and
      obviously this is only IPv6) so an Internet wide attack might not
      be so effective (yet!).
      
      By my calculation, the worse case packet with TLVs in a standard
      1500 byte MTU packet that would be processed by the stack contains
      1282 invidual TLVs (including pad TLVS) or 724 two byte TLVs. I
      wrote a quick test program that floods a whole bunch of these
      packets to a host and sure enough there is substantial time spent
      in ip6_parse_tlv. These packets contain nothing but unknown TLVS
      (that are ignored), TLV padding, and bogus UDP header with zero
      payload length.
      
        25.38%  [kernel]                    [k] __fib6_clean_all
        21.63%  [kernel]                    [k] ip6_parse_tlv
         4.21%  [kernel]                    [k] __local_bh_enable_ip
         2.18%  [kernel]                    [k] ip6_pol_route.isra.39
         1.98%  [kernel]                    [k] fib6_walk_continue
         1.88%  [kernel]                    [k] _raw_write_lock_bh
         1.65%  [kernel]                    [k] dst_release
      
      This patch adds configurable limits to Destination and Hop-by-Hop
      options. There are three limits that may be set:
        - Limit the number of options in a Hop-by-Hop or Destination options
          extension header.
        - Limit the byte length of a Hop-by-Hop or Destination options
          extension header.
        - Disallow unrecognized options in a Hop-by-Hop or Destination
          options extension header.
      
      The limits are set in corresponding sysctls:
      
        ipv6.sysctl.max_dst_opts_cnt
        ipv6.sysctl.max_hbh_opts_cnt
        ipv6.sysctl.max_dst_opts_len
        ipv6.sysctl.max_hbh_opts_len
      
      If a max_*_opts_cnt is less than zero then unknown TLVs are disallowed.
      The number of known TLVs that are allowed is the absolute value of
      this number.
      
      If a limit is exceeded when processing an extension header the packet is
      dropped.
      
      Default values are set to 8 for options counts, and set to INT_MAX
      for maximum length. Note the choice to limit options to 8 is an
      arbitrary guess (roughly based on the fact that the stack supports
      three HBH options and just one destination option).
      
      These limits have being proposed in draft-ietf-6man-rfc6434-bis.
      
      Tested (by Martin Lau)
      
      I tested out 1 thread (i.e. one raw_udp process).
      
      I changed the net.ipv6.max_dst_(opts|hbh)_number between 8 to 2048.
      With sysctls setting to 2048, the softirq% is packed to 100%.
      With 8, the softirq% is almost unnoticable from mpstat.
      
      v2;
        - Code and documention cleanup.
        - Change references of RFC2460 to be RFC8200.
        - Add reference to RFC6434-bis where the limits will be in standard.
      Signed-off-by: NTom Herbert <tom@quantonium.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      47d3d7ac
  16. 02 11月, 2017 1 次提交
    • G
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman 提交于
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information it it.
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from of the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there was new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
  17. 01 11月, 2017 4 次提交
    • E
      ipv6: addrconf: increment ifp refcount before ipv6_del_addr() · e669b869
      Eric Dumazet 提交于
      In the (unlikely) event fixup_permanent_addr() returns a failure,
      addrconf_permanent_addr() calls ipv6_del_addr() without the
      mandatory call to in6_ifa_hold(), leading to a refcount error,
      spotted by syzkaller :
      
      WARNING: CPU: 1 PID: 3142 at lib/refcount.c:227 refcount_dec+0x4c/0x50
      lib/refcount.c:227
      Kernel panic - not syncing: panic_on_warn set ...
      
      CPU: 1 PID: 3142 Comm: ip Not tainted 4.14.0-rc4-next-20171009+ #33
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       panic+0x1e4/0x41c kernel/panic.c:181
       __warn+0x1c4/0x1e0 kernel/panic.c:544
       report_bug+0x211/0x2d0 lib/bug.c:183
       fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178
       do_trap_no_signal arch/x86/kernel/traps.c:212 [inline]
       do_trap+0x260/0x390 arch/x86/kernel/traps.c:261
       do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298
       do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311
       invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
      RIP: 0010:refcount_dec+0x4c/0x50 lib/refcount.c:227
      RSP: 0018:ffff8801ca49e680 EFLAGS: 00010286
      RAX: 000000000000002c RBX: ffff8801d07cfcdc RCX: 0000000000000000
      RDX: 000000000000002c RSI: 1ffff10039493c90 RDI: ffffed0039493cc4
      RBP: ffff8801ca49e688 R08: ffff8801ca49dd70 R09: 0000000000000000
      R10: ffff8801ca49df58 R11: 0000000000000000 R12: 1ffff10039493cd9
      R13: ffff8801ca49e6e8 R14: ffff8801ca49e7e8 R15: ffff8801d07cfcdc
       __in6_ifa_put include/net/addrconf.h:369 [inline]
       ipv6_del_addr+0x42b/0xb60 net/ipv6/addrconf.c:1208
       addrconf_permanent_addr net/ipv6/addrconf.c:3327 [inline]
       addrconf_notify+0x1c66/0x2190 net/ipv6/addrconf.c:3393
       notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394 [inline]
       raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x32/0x60 net/core/dev.c:1697
       call_netdevice_notifiers net/core/dev.c:1715 [inline]
       __dev_notify_flags+0x15d/0x430 net/core/dev.c:6843
       dev_change_flags+0xf5/0x140 net/core/dev.c:6879
       do_setlink+0xa1b/0x38e0 net/core/rtnetlink.c:2113
       rtnl_newlink+0xf0d/0x1a40 net/core/rtnetlink.c:2661
       rtnetlink_rcv_msg+0x733/0x1090 net/core/rtnetlink.c:4301
       netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2408
       rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4313
       netlink_unicast_kernel net/netlink/af_netlink.c:1273 [inline]
       netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1299
       netlink_sendmsg+0xa4a/0xe70 net/netlink/af_netlink.c:1862
       sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       ___sys_sendmsg+0x75b/0x8a0 net/socket.c:2049
       __sys_sendmsg+0xe5/0x210 net/socket.c:2083
       SYSC_sendmsg net/socket.c:2094 [inline]
       SyS_sendmsg+0x2d/0x50 net/socket.c:2090
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x7fa9174d3320
      RSP: 002b:00007ffe302ae9e8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007ffe302b2ae0 RCX: 00007fa9174d3320
      RDX: 0000000000000000 RSI: 00007ffe302aea20 RDI: 0000000000000016
      RBP: 0000000000000082 R08: 0000000000000000 R09: 000000000000000f
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe302b32a0
      R13: 0000000000000000 R14: 00007ffe302b2ab8 R15: 00007ffe302b32b8
      
      Fixes: f1705ec1 ("net: ipv6: Make address flushing on ifdown optional")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e669b869
    • V
      net: display hw address of source machine during ipv6 DAD failure · da13c59b
      Vishwanath Pai 提交于
      This patch updates the error messages displayed in kernel log to include
      hwaddress of the source machine that caused ipv6 duplicate address
      detection failures.
      
      Examples:
      
      a) When we receive a NA packet from another machine advertising our
      address:
      
      ICMPv6: NA: 34:ab:cd:56:11:e8 advertised our address 2001:db8:: on eth0!
      
      b) When we detect DAD failure during address assignment to an interface:
      
      IPv6: eth0: IPv6 duplicate address 2001:db8:: used by 34:ab:cd:56:11:e8
      detected!
      
      v2:
          Changed %pI6 to %pI6c in ndisc_recv_na()
          Chaged the v6 address in the commit message to 2001:db8::
      Suggested-by: NIgor Lubashev <ilubashe@akamai.com>
      Signed-off-by: NVishwanath Pai <vpai@akamai.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      da13c59b
    • D
      net: sit: Update lookup to handle links set to L3 slave · 3051fbec
      David Ahern 提交于
      Using SIT tunnels with VRFs works fine if the underlay device is in a
      VRF and the link parameter is set to the VRF device. e.g.,
      
          ip tunnel add jtun mode sit remote <addr> local <addr> dev myvrf
      
      Update the device check to allow the link to be the enslaved device as
      well. e.g.,
      
          ip tunnel add jtun mode sit remote <addr> local <addr> dev eth4
      
      where eth4 is enslaved to myvrf.
      Reported-by: NJeff Barnhill <0xeffeff@gmail.com>
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3051fbec
    • D
      net: Add extack to fib_notifier_info · 6c31e5a9
      David Ahern 提交于
      Add extack to fib_notifier_info and plumb through stack to
      call_fib_rule_notifiers, call_fib_entry_notifiers and
      call_fib6_entry_notifiers. This allows notifer handlers to
      return messages to user.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6c31e5a9
  18. 29 10月, 2017 1 次提交
    • W
      ipv6: prevent user from adding cached routes · 2ea2352e
      Wei Wang 提交于
      Cached routes should only be created by the system when receiving pmtu
      discovery or ip redirect msg. Users should not be allowed to create
      cached routes.
      
      Furthermore, after the patch series to move cached routes into exception
      table, user added cached routes will trigger the following warning in
      fib6_add():
      
      WARNING: CPU: 0 PID: 2985 at net/ipv6/ip6_fib.c:1137
      fib6_add+0x20d9/0x2c10 net/ipv6/ip6_fib.c:1137
      Kernel panic - not syncing: panic_on_warn set ...
      
      CPU: 0 PID: 2985 Comm: syzkaller320388 Not tainted 4.14.0-rc3+ #74
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       panic+0x1e4/0x417 kernel/panic.c:181
       __warn+0x1c4/0x1d9 kernel/panic.c:542
       report_bug+0x211/0x2d0 lib/bug.c:183
       fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178
       do_trap_no_signal arch/x86/kernel/traps.c:212 [inline]
       do_trap+0x260/0x390 arch/x86/kernel/traps.c:261
       do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298
       do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311
       invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
      RIP: 0010:fib6_add+0x20d9/0x2c10 net/ipv6/ip6_fib.c:1137
      RSP: 0018:ffff8801cf09f6a0 EFLAGS: 00010297
      RAX: ffff8801ce45e340 RBX: 1ffff10039e13eec RCX: ffff8801d749c814
      RDX: 0000000000000000 RSI: ffff8801d749c700 RDI: ffff8801d749c780
      RBP: ffff8801cf09fa08 R08: 0000000000000000 R09: ffff8801cf09f360
      R10: ffff8801cf09f2d8 R11: 1ffff10039c8befb R12: 0000000000000001
      R13: dffffc0000000000 R14: ffff8801d749c700 R15: ffffffff860655c0
       __ip6_ins_rt+0x6c/0x90 net/ipv6/route.c:1011
       ip6_route_add+0x148/0x1a0 net/ipv6/route.c:2782
       ipv6_route_ioctl+0x4d5/0x690 net/ipv6/route.c:3291
       inet6_ioctl+0xef/0x1e0 net/ipv6/af_inet6.c:521
       sock_do_ioctl+0x65/0xb0 net/socket.c:961
       sock_ioctl+0x2c2/0x440 net/socket.c:1058
       vfs_ioctl fs/ioctl.c:45 [inline]
       do_vfs_ioctl+0x1b1/0x1530 fs/ioctl.c:685
       SYSC_ioctl fs/ioctl.c:700 [inline]
       SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      So we fix this by failing the attemp to add cached routes from userspace
      with returning EINVAL error.
      
      Fixes: 2b760fcf ("ipv6: hook up exception table to store dst cache")
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2ea2352e
  19. 28 10月, 2017 2 次提交
  20. 27 10月, 2017 1 次提交
    • X
      ip6_gre: update dst pmtu if dev mtu has been updated by toobig in __gre6_xmit · 8aec4959
      Xin Long 提交于
      When receiving a Toobig icmpv6 packet, ip6gre_err would just set
      tunnel dev's mtu, that's not enough. For skb_dst(skb)'s pmtu may
      still be using the old value, it has no chance to be updated with
      tunnel dev's mtu.
      
      Jianlin found this issue by reducing route's mtu while running
      netperf, the performance went to 0.
      
      ip6ip6 and ip4ip6 tunnel can work well with this, as they lookup
      the upper dst and update_pmtu it's pmtu or icmpv6_send a Toobig
      to upper socket after setting tunnel dev's mtu.
      
      We couldn't do that for ip6_gre, as gre's inner packet could be
      any protocol, it's difficult to handle them (like lookup upper
      dst) in a good way.
      
      So this patch is to fix it by updating skb_dst(skb)'s pmtu when
      dev->mtu < skb_dst(skb)'s pmtu in tx path. It's safe to do this
      update there, as usually dev->mtu <= skb_dst(skb)'s pmtu and no
      performance regression can be caused by this.
      
      Fixes: c12b395a ("gre: Support GRE over IPv6")
      Reported-by: NJianlin Shi <jishi@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8aec4959