1. 02 Sep, 2017 5 commits
  2. 31 Aug, 2017 3 commits
  3. 30 Aug, 2017 1 commit
  4. 29 Aug, 2017 5 commits
    • net: Add comment that early_demux can change via sysctl · a8e3bb34
      David Ahern committed
      Patches trying to constify inet{6}_protocol have been reverted twice:
      39294c3d ("Revert "ipv6: constify inet6_protocol structures"") reverted
      3a3a4e30, and 03157937 ("Revert "ipv4: make net_protocol const"")
      reverted aa8db499.
      
      Add a comment that the structures cannot be const because the
      early_demux field can change based on a sysctl.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gre: add collect_md mode to ERSPAN tunnel · 1a66a836
      William Tu committed
      Similar to gre, vxlan, geneve, ipip tunnels, allow ERSPAN tunnels to
      operate in 'collect metadata' mode.  bpf_skb_[gs]et_tunnel_key() helpers
      can make use of it right away.  OVS can use it as well in the future.
      Signed-off-by: William Tu <u9012063@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gre: refactor the gre_fb_xmit · 862a03c3
      William Tu committed
      This patch refactors the gre_fb_xmit function by factoring out a
      prepare_fb_xmit function, for use by the later ERSPAN collect_md mode patch.
      Signed-off-by: William Tu <u9012063@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "ipv4: make net_protocol const" · 03157937
      David Ahern committed
      This reverts commit aa8db499.
      
      Early demux structs cannot be made const. Doing so results in:
      [   84.967355] BUG: unable to handle kernel paging request at ffffffff81684b10
      [   84.969272] IP: proc_configure_early_demux+0x1e/0x3d
      [   84.970544] PGD 1a0a067
      [   84.970546] P4D 1a0a067
      [   84.971212] PUD 1a0b063
      [   84.971733] PMD 80000000016001e1
      
      [   84.972669] Oops: 0003 [#1] SMP
      [   84.973065] Modules linked in: ip6table_filter ip6_tables veth vrf
      [   84.973833] CPU: 0 PID: 955 Comm: sysctl Not tainted 4.13.0-rc6+ #22
      [   84.974612] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
      [   84.975855] task: ffff88003854ce00 task.stack: ffffc900005a4000
      [   84.976580] RIP: 0010:proc_configure_early_demux+0x1e/0x3d
      [   84.977253] RSP: 0018:ffffc900005a7dd0 EFLAGS: 00010246
      [   84.977891] RAX: ffffffff81684b10 RBX: 0000000000000001 RCX: 0000000000000000
      [   84.978759] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000000
      [   84.979628] RBP: ffffc900005a7dd0 R08: 0000000000000000 R09: 0000000000000000
      [   84.980501] R10: 0000000000000001 R11: 0000000000000008 R12: 0000000000000001
      [   84.981373] R13: ffffffffffffffea R14: ffffffff81a9b4c0 R15: 0000000000000002
      [   84.982249] FS:  00007feb237b7700(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
      [   84.983231] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   84.983941] CR2: ffffffff81684b10 CR3: 0000000038492000 CR4: 00000000000406f0
      [   84.984817] Call Trace:
      [   84.985133]  proc_tcp_early_demux+0x29/0x30
      
      I think this is the second time such a patch has been reverted.
      
      Cc: Bhumika Goyal <bhumirks@gmail.com>
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: make net_protocol const · aa8db499
      Bhumika Goyal committed
      Make these const as they are only passed to a const argument of the
      function inet_add_protocol.
      Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 26 Aug, 2017 3 commits
    • udp6: set rx_dst_cookie on rx_dst updates · 64f0f5d1
      Paolo Abeni committed
      Currently, in the udp6 code, the dst cookie is not initialized/updated
      concurrently with the RX dst used by early demux.
      
      As a result, the dst_check() in the early_demux path always fails,
      the rx dst cache is always invalidated, and we can't really
      leverage significant gain from the demux lookup.
      
      Fix it by adding a udp6-specific variant of sk_rx_dst_set() and using it
      to set the dst cookie when the dst entry is really changed.
      
      The issue has been there since the introduction of early demux for ipv6.
      
      Fixes: 5425077d ("net: ipv6: Add early demux handler for UDP unicast")
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
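The caching problem described in 64f0f5d1 can be modelled in a few lines. This is a Python sketch, not kernel code, and every name in it (`Route`, `RxDstCache`, `store`, `lookup`) is hypothetical. It only assumes that a cached route counts as valid when the cookie saved alongside it matches the route's current cookie.

```python
class Route:
    """Hypothetical stand-in for a dst entry; `cookie` changes when the route does."""

    def __init__(self, cookie):
        self.cookie = cookie


class RxDstCache:
    """Hypothetical per-socket RX dst cache that is validated by a cookie."""

    def __init__(self):
        self.dst = None
        self.cookie = 0

    def store(self, dst, update_cookie):
        # The bug modelled here: the dst is cached but the cookie is not updated.
        self.dst = dst
        if update_cookie:
            self.cookie = dst.cookie

    def lookup(self):
        # Valid only if the saved cookie still matches; otherwise invalidate.
        if self.dst is not None and self.dst.cookie == self.cookie:
            return self.dst
        self.dst = None
        return None


route = Route(cookie=7)

stale = RxDstCache()
stale.store(route, update_cookie=False)
stale_hit = stale.lookup()   # check fails, cache is invalidated

fixed = RxDstCache()
fixed.store(route, update_cookie=True)
fixed_hit = fixed.lookup()   # check passes, cached route is reused
```

In this model, a store that skips the cookie makes every later lookup miss, which mirrors the "always invalidated" behaviour the commit message reports.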
    • tcp: fix hang in tcp_sendpage_locked() · bd9dfc54
      Eric Dumazet committed
      syzkaller got a hang in the TCP stack, related to a bug in
      tcp_sendpage_locked():
      
      root@syzkaller:~# cat /proc/3059/stack
      [<ffffffff83de926c>] __lock_sock+0x1dc/0x2f0
      [<ffffffff83de9473>] lock_sock_nested+0xf3/0x110
      [<ffffffff8408ce01>] tcp_sendmsg+0x21/0x50
      [<ffffffff84163b6f>] inet_sendmsg+0x11f/0x5e0
      [<ffffffff83dd8eea>] sock_sendmsg+0xca/0x110
      [<ffffffff83dd9547>] kernel_sendmsg+0x47/0x60
      [<ffffffff83de35dc>] sock_no_sendpage+0x1cc/0x280
      [<ffffffff8408916b>] tcp_sendpage_locked+0x10b/0x160
      [<ffffffff84089203>] tcp_sendpage+0x43/0x60
      [<ffffffff841641da>] inet_sendpage+0x1aa/0x660
      [<ffffffff83dd4fcd>] kernel_sendpage+0x8d/0xe0
      [<ffffffff83dd50ac>] sock_sendpage+0x8c/0xc0
      [<ffffffff81b63300>] pipe_to_sendpage+0x290/0x3b0
      [<ffffffff81b67243>] __splice_from_pipe+0x343/0x750
      [<ffffffff81b6a459>] splice_from_pipe+0x1e9/0x330
      [<ffffffff81b6a5e0>] generic_splice_sendpage+0x40/0x50
      [<ffffffff81b6b1d7>] SyS_splice+0x7b7/0x1610
      [<ffffffff84d77a01>] entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Fixes: 306b13eb ("proto_ops: Add locked held versions of sendmsg and sendpage")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Tom Herbert <tom@quantonium.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
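The stack in bd9dfc54 shows tcp_sendpage() reaching tcp_sendmsg(), which then waits in lock_sock_nested(). The Python sketch below models that shape with a non-reentrant lock. All function names are hypothetical, and the "fixed" branch assumes the remedy is to fall back to a variant that does not take the lock again; the commit message itself does not spell out the change.

```python
import threading

sock_lock = threading.Lock()  # non-reentrant, standing in for the socket lock


def sendmsg_locked(data):
    """Caller already holds sock_lock."""
    return len(data)


def sendmsg(data):
    """Takes sock_lock itself; gives up after a short timeout instead of hanging."""
    if not sock_lock.acquire(timeout=0.05):
        return None  # would be a hang with a blocking acquire
    try:
        return sendmsg_locked(data)
    finally:
        sock_lock.release()


def sendpage_locked(data, use_locked_fallback):
    """Caller already holds sock_lock, so any fallback must not take it again."""
    if use_locked_fallback:
        return sendmsg_locked(data)
    return sendmsg(data)


def sendpage(data, use_locked_fallback):
    with sock_lock:
        return sendpage_locked(data, use_locked_fallback)


buggy = sendpage(b"abc", use_locked_fallback=False)
fixed = sendpage(b"abc", use_locked_fallback=True)
```

The timeout exists only so the sketch terminates; with a blocking acquire the first call would never return.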
    • tcp: fix refcnt leak with ebpf congestion control · ebfa00c5
      Sabrina Dubroca committed
      There are a few bugs around refcnt handling in the new BPF congestion
      control setsockopt:
      
       - The new ca is assigned to icsk->icsk_ca_ops even in the case where we
         cannot get a reference on it. This would lead to a use after free,
         since that ca is going away soon.
      
       - Changing the congestion control case doesn't release the refcnt on
         the previous ca.
      
       - In the reinit case, we first leak a reference on the old ca, then we
         call tcp_reinit_congestion_control on the ca that we have just
         assigned, leading to deinitializing the wrong ca (->release of the
         new ca on the old ca's data) and releasing the refcount on the ca
         that we actually want to use.
      
      This can be reproduced by building (for example) BIC as a module,
      setting net.ipv4.tcp_congestion_control=bic, and using tcp_cong_kern.c
      from samples/bpf.
      
      This patch fixes the refcount issues, and moves reinit back into tcp
      core to avoid passing a ca pointer back to BPF.
      
      Fixes: 91b5b21c ("bpf: Add support for changing congestion control")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Acked-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
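The first two refcount bugs listed in ebfa00c5 come down to ordering: take the reference on the new congestion control before assigning it, and drop the reference on the old one after the swap. A minimal Python sketch of that ordering follows; the classes and `set_congestion_control` are hypothetical names for illustration, not the kernel's functions.

```python
class CongestionControl:
    """Hypothetical congestion control module with a reference count."""

    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive  # False models a module that is going away
        self.refs = 0

    def try_get(self):
        if not self.alive:
            return False
        self.refs += 1
        return True

    def put(self):
        self.refs -= 1


class Sock:
    def __init__(self, ca):
        ca.try_get()
        self.ca = ca


def set_congestion_control(sk, new_ca):
    # Only assign the new ca once a reference has actually been taken.
    if not new_ca.try_get():
        return False
    old_ca = sk.ca
    sk.ca = new_ca
    # Release the reference held on the previous ca.
    old_ca.put()
    return True


cubic = CongestionControl("cubic")
bic = CongestionControl("bic")
dying = CongestionControl("dying", alive=False)

sk = Sock(cubic)
switched = set_congestion_control(sk, bic)
rejected = set_congestion_control(sk, dying)
```

After both calls the socket still points at `bic`, `cubic` holds no leaked reference, and the unavailable module was never assigned.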
  6. 25 Aug, 2017 2 commits
  7. 24 Aug, 2017 3 commits
  8. 23 Aug, 2017 5 commits
  9. 19 Aug, 2017 5 commits
    • net: inet: diag: expose sockets cgroup classid · 0888e372
      Levin, Alexander (Sasha Levin) committed
      This is useful for directly looking up a task based on class id rather than
      having to scan through all open file descriptors.
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP · cdbeb633
      Neal Cardwell committed
      In some situations tcp_send_loss_probe() can realize that it's unable
      to send a loss probe (TLP), and falls back to calling tcp_rearm_rto()
      to schedule an RTO timer. In such cases, sometimes tcp_rearm_rto()
      realizes that the RTO was eligible to fire immediately or at some
      point in the past (delta_us <= 0). Previously in such cases
      tcp_rearm_rto() was scheduling such "overdue" RTOs to happen at now +
      icsk_rto, which caused needless delays of hundreds of milliseconds
      (and non-linear behavior that made reproducible testing
      difficult). This commit changes the logic to schedule "overdue" RTOs
      ASAP, rather than at now + icsk_rto.
      
      Fixes: 6ba8a3b1 ("tcp: Tail loss probe (TLP)")
      Suggested-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
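The timer arithmetic described in cdbeb633 can be sketched as below. The function name and parameters are hypothetical, and the 1 microsecond floor used for "ASAP" is an assumption of this sketch; the commit message only says that overdue RTOs are scheduled as soon as possible instead of at now + icsk_rto.

```python
def rto_timer_delay_us(rto_us, elapsed_us, fire_overdue_asap=True):
    """Delay until the RTO timer should fire, in microseconds.

    rto_us:     the current retransmission timeout (icsk_rto in the text)
    elapsed_us: time that has already passed toward that timeout
    """
    delta_us = rto_us - elapsed_us
    if delta_us > 0:
        # RTO is still in the future: wait for the remainder.
        return delta_us
    # RTO is overdue (delta_us <= 0).
    if fire_overdue_asap:
        return 1       # new behaviour: fire as soon as possible
    return rto_us      # old behaviour: wait a full extra RTO


old_delay = rto_timer_delay_us(200_000, 250_000, fire_overdue_asap=False)
new_delay = rto_timer_delay_us(200_000, 250_000)
not_overdue = rto_timer_delay_us(200_000, 50_000)
```

With a 200 ms RTO that is already 50 ms overdue, the old logic waits another 200 ms while the new logic fires almost immediately; a timer that is not yet due is unaffected.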
    • net: check and errout if res->fi is NULL when RTM_F_FIB_MATCH is set · bc3aae2b
      Roopa Prabhu committed
      Syzkaller hit a 'general protection fault in fib_dump_info' bug on
      4.13-rc5.
      
      Guilty file: net/ipv4/fib_semantics.c
      
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Modules linked in:
      CPU: 0 PID: 2808 Comm: syz-executor0 Not tainted 4.13.0-rc5 #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
      Ubuntu-1.8.2-1ubuntu1 04/01/2014
      task: ffff880078562700 task.stack: ffff880078110000
      RIP: 0010:fib_dump_info+0x388/0x1170 net/ipv4/fib_semantics.c:1314
      RSP: 0018:ffff880078117010 EFLAGS: 00010206
      RAX: dffffc0000000000 RBX: 00000000000000fe RCX: 0000000000000002
      RDX: 0000000000000006 RSI: ffff880078117084 RDI: 0000000000000030
      RBP: ffff880078117268 R08: 000000000000000c R09: ffff8800780d80c8
      R10: 0000000058d629b4 R11: 0000000067fce681 R12: 0000000000000000
      R13: ffff8800784bd540 R14: ffff8800780d80b5 R15: ffff8800780d80a4
      FS:  00000000022fa940(0000) GS:ffff88007fc00000(0000)
      knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000004387d0 CR3: 0000000079135000 CR4: 00000000000006f0
      Call Trace:
        inet_rtm_getroute+0xc89/0x1f50 net/ipv4/route.c:2766
        rtnetlink_rcv_msg+0x288/0x680 net/core/rtnetlink.c:4217
        netlink_rcv_skb+0x340/0x470 net/netlink/af_netlink.c:2397
        rtnetlink_rcv+0x28/0x30 net/core/rtnetlink.c:4223
        netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline]
        netlink_unicast+0x4c4/0x6e0 net/netlink/af_netlink.c:1291
        netlink_sendmsg+0x8c4/0xca0 net/netlink/af_netlink.c:1854
        sock_sendmsg_nosec net/socket.c:633 [inline]
        sock_sendmsg+0xca/0x110 net/socket.c:643
        ___sys_sendmsg+0x779/0x8d0 net/socket.c:2035
        __sys_sendmsg+0xd1/0x170 net/socket.c:2069
        SYSC_sendmsg net/socket.c:2080 [inline]
        SyS_sendmsg+0x2d/0x50 net/socket.c:2076
        entry_SYSCALL_64_fastpath+0x1a/0xa5
        RIP: 0033:0x4512e9
        RSP: 002b:00007ffc75584cc8 EFLAGS: 00000216 ORIG_RAX:
        000000000000002e
        RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00000000004512e9
        RDX: 0000000000000000 RSI: 0000000020f2cfc8 RDI: 0000000000000003
        RBP: 000000000000000e R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000216 R12: fffffffffffffffe
        R13: 0000000000718000 R14: 0000000020c44ff0 R15: 0000000000000000
        Code: 00 0f b6 8d ec fd ff ff 48 8b 85 f0 fd ff ff 88 48 17 48 8b 45
        28 48 8d 78 30 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03
        <0f>
        b6 04 02 84 c0 74 08 3c 03 0f 8e cb 0c 00 00 48 8b 45 28 44
        RIP: fib_dump_info+0x388/0x1170 net/ipv4/fib_semantics.c:1314 RSP:
        ffff880078117010
      ---[ end trace 254a7af28348f88b ]---
      
      This patch adds a res->fi NULL check.
      
      Example run:
      $ ip route get 0.0.0.0 iif virt1-0
      broadcast 0.0.0.0 dev lo
          cache <local,brd> iif virt1-0
      
      $ ip route get 0.0.0.0 iif virt1-0 fibmatch
      RTNETLINK answers: No route to host
      Reported-by: idaifish <idaifish@gmail.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Fixes: b6179813 ("net: ipv4: RTM_GETROUTE: return matched fib result when requested")
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
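The guard added by bc3aae2b can be illustrated with a small Python model. The function and its arguments are hypothetical; the sketch only assumes that a fibmatch request with no fib info attached to the lookup result should fail with EHOSTUNREACH, which matches the "No route to host" answer in the example run.

```python
import errno


def get_route_reply(res_fi, fibmatch):
    """Return (err, reply) for a route lookup result.

    res_fi:   the fib info attached to the lookup result, or None
    fibmatch: True when the caller asked for the matched fib entry
    """
    if fibmatch:
        if res_fi is None:
            # Nothing to dump: error out instead of dereferencing None.
            return -errno.EHOSTUNREACH, None
        return 0, {"fib_info": res_fi}
    # Without fibmatch the reply is built from the route cache entry.
    return 0, {"cache": True}


plain = get_route_reply(None, fibmatch=False)
matched = get_route_reply(None, fibmatch=True)
```

The first call corresponds to the plain `ip route get` in the example run and still succeeds; the second corresponds to the `fibmatch` variant and returns an error instead of crashing.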
    • ipv4: convert dst_metrics.refcnt from atomic_t to refcount_t · 9620fef2
      Eric Dumazet committed
      The refcount_t type and its corresponding API should be
      used instead of atomic_t when the variable is used as
      a reference counter. This avoids accidental
      refcounter overflows that might lead to use-after-free
      situations.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
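Why an overflowing plain counter can end in a use-after-free, and how a saturating counter avoids it, can be shown with a simplified Python model. `SaturatingRefcount` and `wrapping_inc` are hypothetical names, and the 32-bit ceiling is an assumption of the sketch rather than a statement about the kernel's exact refcount_t behaviour.

```python
def wrapping_inc(value, bits=32):
    """Plain fixed-width counter: incrementing past the maximum wraps to 0."""
    return (value + 1) % (1 << bits)


class SaturatingRefcount:
    """Reference counter that sticks at a ceiling instead of wrapping."""

    SATURATED = 0xFFFFFFFF  # assumed ceiling for this sketch

    def __init__(self, value=1):
        self.value = value

    def inc(self):
        if self.value < self.SATURATED:
            self.value += 1
        # At the ceiling the counter stays put.

    def dec_and_test(self):
        """Return True when the last reference was dropped."""
        if self.value == self.SATURATED:
            return False  # saturated objects are leaked, never freed
        self.value -= 1
        return self.value == 0


wrapped = wrapping_inc(0xFFFFFFFF)  # looks like "no references left"

ref = SaturatingRefcount(0xFFFFFFFE)
ref.inc()
ref.inc()                           # second increment has no effect
freed = ref.dec_and_test()
```

A wrapped counter reads as zero while references still exist, so the object may be freed while in use; the saturating counter trades that for a bounded leak.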
    • datagram: When peeking datagrams with offset < 0 don't skip empty skbs · a0917e0b
      Matthew Dawson committed
      Due to commit e6afc8ac ("udp: remove
      headers from UDP packets before queueing"), when udp packets are being
      peeked the requested extra offset is always 0 as there is no need to skip
      the udp header.  However, when the offset is 0 and the next skb is
      of length 0, it is only returned once.  The behaviour can be seen with
      the following python script:
      
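      from socket import AF_INET6, MSG_PEEK, SOCK_DGRAM, SOCK_NONBLOCK, socket
      
      # Receiver (f) and sender (g): non-blocking UDP sockets over IPv6.
      f = socket(AF_INET6, SOCK_DGRAM | SOCK_NONBLOCK, 0)
      g = socket(AF_INET6, SOCK_DGRAM | SOCK_NONBLOCK, 0)
      f.bind(('::', 0))
      addr = ('::1', f.getsockname()[1])
      # Queue an empty datagram first, then a one-byte datagram.
      g.sendto(b'', addr)
      g.sendto(b'b', addr)
      # Peek twice without consuming anything from the queue.
      print(f.recvfrom(10, MSG_PEEK))
      print(f.recvfrom(10, MSG_PEEK))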
      
      The expected output is the empty string twice.
      
      To fix this, make sk_peek_offset return negative values, and pass those values
      to __skb_try_recv_datagram/__skb_try_recv_from_queue.  If the passed offset
      to __skb_try_recv_from_queue is negative, the checked skb is never skipped.
      __skb_try_recv_from_queue will then ensure the offset is reset back to 0
      if a peek is requested without an offset, unless no packets are found.
      
      Also simplify the if condition in __skb_try_recv_from_queue.  If _off is
      greater than 0, and off is greater than or equal to skb->len, then
      (_off || skb->len) must always be true assuming skb->len >= 0 is always
      true.
      
      Also remove a redundant check around a call to sk_peek_offset in af_unix.c,
      as it double checked if MSG_PEEK was set in the flags.
      
      V2:
       - Moved the negative fixup into __skb_try_recv_from_queue, and removed
         now redundant checks
       - Fix peeking in udp{,v6}_recvmsg to report the right value when the
         offset is 0
      
      V3:
       - Marked new branch in __skb_try_recv_from_queue as unlikely.
      Signed-off-by: Matthew Dawson <matthew@mjdsystems.ca>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
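The skip rule that a0917e0b describes can be modelled on a list of skb lengths. This is a Python sketch under stated assumptions: `pick_skb_for_peek` is a hypothetical name, and the exact skip condition, including the use of a per-skb "already peeked" flag, is an assumption of the model. The message only guarantees that a negative offset never skips the checked skb.

```python
def pick_skb_for_peek(skb_lens, peeked, off):
    """Return the index of the skb a MSG_PEEK would return, or None.

    skb_lens: lengths of the queued skbs, in queue order
    peeked:   per-skb flag, True if that skb was already peeked once
    off:      peek offset; a negative value means "no offset requested"
    """
    peek_at_off = off >= 0
    _off = off if peek_at_off else 0
    for i, length in enumerate(skb_lens):
        # Skip only when an offset was really requested and it lies past this skb.
        if peek_at_off and _off >= length and (_off or peeked[i]):
            _off -= length
            continue
        return i
    return None


queue = [0, 1]                  # an empty datagram, then a one-byte datagram
already_peeked = [True, False]  # state after the first peek

with_zero_offset = pick_skb_for_peek(queue, already_peeked, 0)
with_negative_offset = pick_skb_for_peek(queue, already_peeked, -1)
```

With an offset of 0 the already-peeked empty skb is skipped on the second peek, which is the behaviour the script in the commit message exposes; with a negative offset the empty skb is returned again.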
  10. 17 Aug, 2017 3 commits
  11. 16 Aug, 2017 2 commits
  12. 15 Aug, 2017 3 commits
    • tcp: fix possible deadlock in TCP stack vs BPF filter · d624d276
      Eric Dumazet committed
      The ACK packet filter was not put in the right place.
      
      At this point, we have already allocated a child and put it
      into the accept queue.
      
      We absolutely need to call tcp_child_process() to release
      its spinlock, or we will deadlock at accept() or close() time.
      
      Found by the syzkaller team (thanks a lot!)
      
      Fixes: 8fac365f ("tcp: Add a tcp_filter hook before handle ack packet")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Chenbo Feng <fengc@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: ulp: avoid module refcnt leak in tcp_set_ulp · 539a06ba
      Sabrina Dubroca committed
      __tcp_ulp_find_autoload returns tcp_ulp_ops after taking a reference on
      the module. Then, if ->init fails, tcp_set_ulp propagates the error but
      nothing releases that reference.
      
      Fixes: 734942cc ("tcp: ULP infrastructure")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
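The leak in 539a06ba is an error path that forgets to undo a successful lookup. A minimal Python sketch of the corrected flow follows; `UlpOps`, `find_autoload`, `module_put` and `set_ulp` are hypothetical stand-ins that only mirror the roles named in the commit message.

```python
class UlpOps:
    """Hypothetical ULP ops whose owning module carries a reference count."""

    def __init__(self, init_ok):
        self.init_ok = init_ok
        self.module_refs = 0

    def init(self, sk):
        return 0 if self.init_ok else -1


def find_autoload(ops):
    # The lookup hands back the ops with a module reference already taken.
    ops.module_refs += 1
    return ops


def module_put(ops):
    ops.module_refs -= 1


def set_ulp(sk, ops):
    ulp = find_autoload(ops)
    err = ulp.init(sk)
    if err:
        # Drop the reference taken by the lookup before propagating the error.
        module_put(ulp)
        return err
    sk["ulp_ops"] = ulp
    return 0


good, bad = UlpOps(init_ok=True), UlpOps(init_ok=False)
sk_good, sk_bad = {}, {}
err_good = set_ulp(sk_good, good)
err_bad = set_ulp(sk_bad, bad)
```

On success the socket keeps the reference; on a failed init the count returns to zero, so the module can still be unloaded.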
    • ipv4: route: fix inet_rtm_getroute induced crash · 2c87d63a
      Florian Westphal committed
      "ip route get $daddr iif eth0 from $saddr" causes:
       BUG: KASAN: use-after-free in ip_route_input_rcu+0x1535/0x1b50
       Call Trace:
        ip_route_input_rcu+0x1535/0x1b50
        ip_route_input_noref+0xf9/0x190
        tcp_v4_early_demux+0x1a4/0x2b0
        ip_rcv+0xbcb/0xc05
        __netif_receive_skb+0x9c/0xd0
        netif_receive_skb_internal+0x5a8/0x890
      
      The problem is that inet_rtm_getroute calls either ip_route_input_rcu (if an
      iif was provided) or ip_route_output_key_hash_rcu.
      
      But ip_route_input_rcu, unlike ip_route_output_key_hash_rcu, already
      associates the dst_entry with the skb.  This clears the SKB_DST_NOREF
      bit (i.e. skb_dst_drop will release/free the entry while it should not).
      
      Thus only set the dst if we called ip_route_output_key_hash_rcu().
      
      I tested this patch by running:
       while true;do ip r get 10.0.1.2;done > /dev/null &
       while true;do ip r get 10.0.1.2 iif eth0  from 10.0.1.1;done > /dev/null &
      ... and saw no crash or memory leak.
      
      Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
      Cc: David Ahern <dsahern@gmail.com>
      Fixes: ba52d61e ("ipv4: route: restore skb_dst_set in inet_rtm_getroute")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>