1. 24 10月, 2017 1 次提交
  2. 23 10月, 2017 1 次提交
  3. 22 10月, 2017 15 次提交
    • E
      ipv6: flowlabel: do not leave opt->tot_len with garbage · 864e2a1f
      Eric Dumazet 提交于
      When syzkaller team brought us a C repro for the crash [1] that
      had been reported many times in the past, I finally could find
      the root cause.
      
      If FlowLabel info is merged by fl6_merge_options(), we leave
      part of the opt_space storage provided by udp/raw/l2tp with random value
      in opt_space.tot_len, unless a control message was provided at sendmsg()
      time.
      
      Then ip6_setup_cork() would use this random value to perform a kzalloc()
      call. Undefined behavior and crashes.
      
      Fix is to properly set tot_len in fl6_merge_options()
      
      At the same time, we can also avoid consuming memory and cpu cycles
      to clear it, if every option is copied via a kmemdup(). This is the
      change in ip6_setup_cork().
      
      [1]
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 0 PID: 6613 Comm: syz-executor0 Not tainted 4.14.0-rc4+ #127
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      task: ffff8801cb64a100 task.stack: ffff8801cc350000
      RIP: 0010:ip6_setup_cork+0x274/0x15c0 net/ipv6/ip6_output.c:1168
      RSP: 0018:ffff8801cc357550 EFLAGS: 00010203
      RAX: dffffc0000000000 RBX: ffff8801cc357748 RCX: 0000000000000010
      RDX: 0000000000000002 RSI: ffffffff842bd1d9 RDI: 0000000000000014
      RBP: ffff8801cc357620 R08: ffff8801cb17f380 R09: ffff8801cc357b10
      R10: ffff8801cb64a100 R11: 0000000000000000 R12: ffff8801cc357ab0
      R13: ffff8801cc357b10 R14: 0000000000000000 R15: ffff8801c3bbf0c0
      FS:  00007f9c5c459700(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020324000 CR3: 00000001d1cf2000 CR4: 00000000001406f0
      DR0: 0000000020001010 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
      Call Trace:
       ip6_make_skb+0x282/0x530 net/ipv6/ip6_output.c:1729
       udpv6_sendmsg+0x2769/0x3380 net/ipv6/udp.c:1340
       inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762
       sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       SYSC_sendto+0x358/0x5a0 net/socket.c:1750
       SyS_sendto+0x40/0x50 net/socket.c:1718
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x4520a9
      RSP: 002b:00007f9c5c458c08 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 00000000004520a9
      RDX: 0000000000000001 RSI: 0000000020fd1000 RDI: 0000000000000016
      RBP: 0000000000000086 R08: 0000000020e0afe4 R09: 000000000000001c
      R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004bb1ee
      R13: 00000000ffffffff R14: 0000000000000016 R15: 0000000000000029
      Code: e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ea 0f 00 00 48 8d 79 04 48 b8 00 00 00 00 00 fc ff df 45 8b 74 24 04 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85
      RIP: ip6_setup_cork+0x274/0x15c0 net/ipv6/ip6_output.c:1168 RSP: ffff8801cc357550
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      864e2a1f
    • L
      bpf: Add BPF_SOCKET_OPS_BASE_RTT support to tcp_nv · 85cce215
      Lawrence Brakmo 提交于
      TCP_NV will try to get the base RTT from a socket_ops BPF program if one
      is loaded. NV will then use the base RTT to bound its min RTT (its
      notion of the base RTT). It uses the base RTT as an upper bound and 80%
      of the base RTT as its lower bound.
      
      In other words, NV will consider filtered RTTs larger than base RTT as a
      sign of congestion. As a result, there is no minRTT inflation when there
      is a lot of congestion. For example, in a DC where the RTTs are less
      than 40us when there is no congestion, a base RTT value of 80us improves
      the performance of NV. The difference between the uncongested RTT and
      the base RTT provided represents how much queueing we are willing to
      have (in practice it can be higher).
      
      NV has been tunned to reduce congestion when there are many flows at the
      cost of one flow not achieving full bandwith utilization. When a
      reasonable base RTT is provided, one NV flow can now fully utilize the
      full bandwidth. In addition, the performance is also improved when there
      are many flows.
      
      In the following examples the NV results are using a kernel with this
      patch set (i.e. both NV results are using the new nv_loss_dec_factor).
      
      With one host sending to another host and only one flow the
      goodputs are:
        Cubic: 9.3 Gbps, NV: 5.5 Gbps, NV (baseRTT=80us): 9.2 Gbps
      
      With 2 hosts sending to one host (1 flow per host, the goodput per flow
      is:
        Cubic: 4.6 Gbps, NV: 4.5 Gbps, NV (baseRTT=80us)L 4.6 Gbps
      
      But the RTTs seen by a ping process in the sender is:
        Cubic: 3.3ms  NV: 97us,  NV (baseRTT=80us): 146us
      
      With a lot of flows things look even better for NV with baseRTT. Here we
      have 3 hosts sending to one host. Each sending host has 6 flows: 1
      stream, 4x1MB RPC, 1x10KB RPC. Cubic, NV and NV with baseRTT all fully
      utilize the full available bandwidth. However, the distribution of
      bandwidth among the flows is very different. For the 10KB RPC flow:
        Cubic: 27Mbps, NV: 111Mbps, NV (baseRTT=80us): 222Mbps
      
      The 99% latencies for the 10KB flows are:
        Cubic: 26ms,  NV: 1ms,  NV (baseRTT=80us): 500us
      
      The RTT seen by a ping process at the senders:
        Cubic: 3.2ms  NV: 720us,  NV (baseRTT=80us): 330us
      Signed-off-by: NLawrence Brakmo <brakmo@fb.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      85cce215
    • L
      bpf: Adding helper function bpf_getsockops · cd86d1fd
      Lawrence Brakmo 提交于
      Adding support for helper function bpf_getsockops to socket_ops BPF
      programs. This patch only supports TCP_CONGESTION.
      Signed-off-by: NVlad Vysotsky <vlad@cs.ucla.edu>
      Acked-by: NLawrence Brakmo <brakmo@fb.com>
      Acked-by: NAlexei Starovoitov <ast@fb.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cd86d1fd
    • G
      net: x25: mark expected switch fall-throughs · 0cea8e28
      Gustavo A. R. Silva 提交于
      In preparation to enabling -Wimplicit-fallthrough, mark switch cases
      where we are expecting to fall through.
      Signed-off-by: NGustavo A. R. Silva <garsilva@embeddedor.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0cea8e28
    • G
      net: af_unix: mark expected switch fall-through · 110af3ac
      Gustavo A. R. Silva 提交于
      In preparation to enabling -Wimplicit-fallthrough, mark switch cases
      where we are expecting to fall through.
      Signed-off-by: NGustavo A. R. Silva <garsilva@embeddedor.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      110af3ac
    • D
      rxrpc: Don't release call mutex on error pointer · 6cb3ece9
      David Howells 提交于
      Don't release call mutex at the end of rxrpc_kernel_begin_call() if the
      call pointer actually holds an error value.
      
      Fixes: 540b1c48 ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg")
      Reported-by: NMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6cb3ece9
    • J
      tipc: refactor tipc_sk_timeout() function · 0d5fcebf
      Jon Maloy 提交于
      The function tipc_sk_timeout() is more complex than necessary, and
      even seems to contain an undetected bug. At one of the occurences
      where we renew the timer we just order it with (HZ / 20), instead
      of (jiffies + HZ / 20);
      
      In this commit we clean up the function.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0d5fcebf
    • N
      net: ethtool: remove error check for legacy setting transceiver type · 95491e3c
      Niklas Söderlund 提交于
      Commit 9cab88726929605 ("net: ethtool: Add back transceiver type")
      restores the transceiver type to struct ethtool_link_settings and
      convert_link_ksettings_to_legacy_settings() but forgets to remove the
      error check for the same in convert_legacy_settings_to_link_ksettings().
      This prevents older versions of ethtool to change link settings.
      
          # ethtool --version
          ethtool version 3.16
      
          # ethtool -s eth0 autoneg on speed 100 duplex full
          Cannot set new settings: Invalid argument
            not setting speed
            not setting duplex
            not setting autoneg
      
      While newer versions of ethtool works.
      
          # ethtool --version
          ethtool version 4.10
      
          # ethtool -s eth0 autoneg on speed 100 duplex full
          [   57.703268] sh-eth ee700000.ethernet eth0: Link is Down
          [   59.618227] sh-eth ee700000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
      
      Fixes: 19cab887 ("net: ethtool: Add back transceiver type")
      Signed-off-by: NNiklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
      Reported-by: NRenjith R V <renjith.rv@quest-global.com>
      Tested-by: NGeert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      95491e3c
    • G
      net: sched: mark expected switch fall-throughs · f3ae608e
      Gustavo A. R. Silva 提交于
      In preparation to enabling -Wimplicit-fallthrough, mark switch cases
      where we are expecting to fall through.
      Signed-off-by: NGustavo A. R. Silva <garsilva@embeddedor.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f3ae608e
    • C
      soreuseport: fix initialization race · 1b5f962e
      Craig Gallek 提交于
      Syzkaller stumbled upon a way to trigger
      WARNING: CPU: 1 PID: 13881 at net/core/sock_reuseport.c:41
      reuseport_alloc+0x306/0x3b0 net/core/sock_reuseport.c:39
      
      There are two initialization paths for the sock_reuseport structure in a
      socket: Through the udp/tcp bind paths of SO_REUSEPORT sockets or through
      SO_ATTACH_REUSEPORT_[CE]BPF before bind.  The existing implementation
      assumedthat the socket lock protected both of these paths when it actually
      only protects the SO_ATTACH_REUSEPORT path.  Syzkaller triggered this
      double allocation by running these paths concurrently.
      
      This patch moves the check for double allocation into the reuseport_alloc
      function which is protected by a global spin lock.
      
      Fixes: e32ea7e7 ("soreuseport: fast reuseport UDP socket selection")
      Fixes: c125e80b ("soreuseport: fast reuseport TCP socket selection")
      Signed-off-by: NCraig Gallek <kraig@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1b5f962e
    • G
      net: rose: mark expected switch fall-throughs · a05b8c43
      Gustavo A. R. Silva 提交于
      In preparation to enabling -Wimplicit-fallthrough, mark switch cases
      where we are expecting to fall through.
      Signed-off-by: NGustavo A. R. Silva <garsilva@embeddedor.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a05b8c43
    • G
      openvswitch: conntrack: mark expected switch fall-through · 279badc2
      Gustavo A. R. Silva 提交于
      In preparation to enabling -Wimplicit-fallthrough, mark switch cases
      where we are expecting to fall through.
      
      Notice that in this particular case I placed a "fall through" comment on
      its own line, which is what GCC is expecting to find.
      Signed-off-by: NGustavo A. R. Silva <garsilva@embeddedor.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      279badc2
    • G
      net: netrom: nr_in: mark expected switch fall-through · e28101a3
      Gustavo A. R. Silva 提交于
      In preparation to enabling -Wimplicit-fallthrough, mark switch cases
      where we are expecting to fall through.
      Signed-off-by: NGustavo A. R. Silva <garsilva@embeddedor.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e28101a3
    • N
      net: bridge: fix returning of vlan range op errors · 66c54517
      Nikolay Aleksandrov 提交于
      When vlan tunnels were introduced, vlan range errors got silently
      dropped and instead 0 was returned always. Restore the previous
      behaviour and return errors to user-space.
      
      Fixes: efa5356b ("bridge: per vlan dst_metadata netlink support")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      66c54517
    • W
      sock: correct sk_wmem_queued accounting on efault in tcp zerocopy · 54d43117
      Willem de Bruijn 提交于
      Syzkaller hits WARN_ON(sk->sk_wmem_queued) in sk_stream_kill_queues
      after triggering an EFAULT in __zerocopy_sg_from_iter.
      
      On this error, skb_zerocopy_stream_iter resets the skb to its state
      before the operation with __pskb_trim. It cannot kfree_skb like
      datagram callers, as the skb may have data from a previous send call.
      
      __pskb_trim calls skb_condense for unowned skbs, which adjusts their
      truesize. These tcp skbuffs are owned and their truesize must add up
      to sk_wmem_queued. But they match because their skb->sk is NULL until
      tcp_transmit_skb.
      
      Temporarily set skb->sk when calling __pskb_trim to signal that the
      skbuffs are owned and avoid the skb_condense path.
      
      Fixes: 52267790 ("sock: add MSG_ZEROCOPY")
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      54d43117
  4. 21 10月, 2017 23 次提交