1. 31 7月, 2018 1 次提交
  2. 22 7月, 2018 1 次提交
  3. 19 7月, 2018 1 次提交
  4. 17 7月, 2018 1 次提交
  5. 13 7月, 2018 1 次提交
  6. 10 7月, 2018 1 次提交
  7. 08 7月, 2018 3 次提交
  8. 07 7月, 2018 1 次提交
  9. 29 6月, 2018 1 次提交
    • L
      Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL · a11e1d43
      Linus Torvalds 提交于
      The poll() changes were not well thought out, and completely
      unexplained.  They also caused a huge performance regression, because
      "->poll()" was no longer a trivial file operation that just called down
      to the underlying file operations, but instead did at least two indirect
      calls.
      
      Indirect calls are sadly slow now with the Spectre mitigation, but the
      performance problem could at least be largely mitigated by changing the
      "->get_poll_head()" operation to just have a per-file-descriptor pointer
      to the poll head instead.  That gets rid of one of the new indirections.
      
      But that doesn't fix the new complexity that is completely unwarranted
      for the regular case.  The (undocumented) reason for the poll() changes
      was some alleged AIO poll race fixing, but we don't make the common case
      slower and more complex for some uncommon special case, so this all
      really needs way more explanations and most likely a fundamental
      redesign.
      
      [ This revert is a revert of about 30 different commits, not reverted
        individually because that would just be unnecessarily messy  - Linus ]
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a11e1d43
  10. 22 6月, 2018 1 次提交
  11. 11 6月, 2018 1 次提交
  12. 26 5月, 2018 1 次提交
  13. 18 5月, 2018 1 次提交
    • E
      tcp: add SACK compression · 5d9f4262
      Eric Dumazet 提交于
      When TCP receives an out-of-order packet, it immediately sends
      a SACK packet, generating network load but also forcing the
      receiver to send 1-MSS pathological packets, increasing its
      RTX queue length/depth, and thus processing time.
      
      Wifi networks suffer from this aggressive behavior, but generally
      speaking, all these SACK packets add fuel to the fire when networks
      are under congestion.
      
      This patch adds a high resolution timer and tp->compressed_ack counter.
      
      Instead of sending a SACK, we program this timer with a small delay,
      based on RTT and capped to 1 ms :
      
      	delay = min ( 5 % of RTT, 1 ms)
      
      If subsequent SACKs need to be sent while the timer has not yet
      expired, we simply increment tp->compressed_ack.
      
      When timer expires, a SACK is sent with the latest information.
      Whenever an ACK is sent (if data is sent, or if in-order
      data is received) timer is canceled.
      
      Note that tcp_sack_new_ofo_skb() is able to force a SACK to be sent
      if the sack blocks need to be shuffled, even if the timer has not
      expired.
      
      A new SNMP counter is added in the following patch.
      
      Two other patches add sysctls to allow changing the 1,000,000 and 44
      values that this commit hard-coded.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Acked-by: NToke Høiland-Jørgensen <toke@toke.dk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5d9f4262
  14. 03 5月, 2018 1 次提交
    • E
      tcp: restore autocorking · 114f39fe
      Eric Dumazet 提交于
      When adding rb-tree for TCP retransmit queue, we inadvertently broke
      TCP autocorking.
      
      tcp_should_autocork() should really check if the rtx queue is not empty.
      
      Tested:
      
      Before the fix :
      $ nstat -n;./netperf -H 10.246.7.152 -Cc -- -m 500;nstat | grep AutoCork
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
      540000 262144    500    10.00      2682.85   2.47     1.59     3.618   2.329
      TcpExtTCPAutoCorking            33                 0.0
      
      // Same test, but forcing TCP_NODELAY
      $ nstat -n;./netperf -H 10.246.7.152 -Cc -- -D -m 500;nstat | grep AutoCork
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET : nodelay
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
      540000 262144    500    10.00      1408.75   2.44     2.96     6.802   8.259
      TcpExtTCPAutoCorking            1                  0.0
      
      After the fix :
      $ nstat -n;./netperf -H 10.246.7.152 -Cc -- -m 500;nstat | grep AutoCork
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
      540000 262144    500    10.00      5472.46   2.45     1.43     1.761   1.027
      TcpExtTCPAutoCorking            361293             0.0
      
      // With TCP_NODELAY option
      $ nstat -n;./netperf -H 10.246.7.152 -Cc -- -D -m 500;nstat | grep AutoCork
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET : nodelay
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
      540000 262144    500    10.00      5454.96   2.46     1.63     1.775   1.174
      TcpExtTCPAutoCorking            315448             0.0
      
      Fixes: 75c119af ("tcp: implement rb-tree based retransmit queue")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NMichael Wenig <mwenig@vmware.com>
      Tested-by: NMichael Wenig <mwenig@vmware.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NMichael Wenig <mwenig@vmware.com>
      Tested-by: NMichael Wenig <mwenig@vmware.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      114f39fe
  15. 02 5月, 2018 2 次提交
    • S
      tcp: send in-queue bytes in cmsg upon read · b75eba76
      Soheil Hassas Yeganeh 提交于
      Applications with many concurrent connections, high variance
      in receive queue length and tight memory bounds cannot
      allocate worst-case buffer size to drain sockets. Knowing
      the size of receive queue length, applications can optimize
      how they allocate buffers to read from the socket.
      
      The number of bytes pending on the socket is directly
      available through ioctl(FIONREAD/SIOCINQ) and can be
      approximated using getsockopt(MEMINFO) (rmem_alloc includes
      skb overheads in addition to application data). But, both of
      these options add an extra syscall per recvmsg. Moreover,
      ioctl(FIONREAD/SIOCINQ) takes the socket lock.
      
      Add the TCP_INQ socket option to TCP. When this socket
      option is set, recvmsg() relays the number of bytes available
      on the socket for reading to the application via the
      TCP_CM_INQ control message.
      
      Calculate the number of bytes after releasing the socket lock
      to include the processed backlog, if any. To avoid an extra
      branch in the hot path of recvmsg() for this new control
      message, move all cmsg processing inside an existing branch for
      processing receive timestamps. Since the socket lock is not held
      when calculating the size of receive queue, TCP_INQ is a hint.
      For example, it can overestimate the queue size by one byte,
      if FIN is received.
      
      With this method, applications can start reading from the socket
      using a small buffer, and then use larger buffers based on the
      remaining data when needed.
      
      V3 change-log:
      	As suggested by David Miller, added loads with barrier
      	to check whether we have multiple threads calling recvmsg
      	in parallel. When that happens we lock the socket to
      	calculate inq.
      V4 change-log:
      	Removed inline from a static function.
      Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Reviewed-by: NNeal Cardwell <ncardwell@google.com>
      Suggested-by: NDavid Miller <davem@davemloft.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b75eba76
    • E
      tcp: fix TCP_REPAIR_QUEUE bound checking · bf2acc94
      Eric Dumazet 提交于
      syzbot is able to produce a nasty WARN_ON() in tcp_verify_left_out()
      with following C-repro :
      
      socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
      setsockopt(3, SOL_TCP, TCP_REPAIR, [1], 4) = 0
      setsockopt(3, SOL_TCP, TCP_REPAIR_QUEUE, [-1], 4) = 0
      bind(3, {sa_family=AF_INET, sin_port=htons(20002), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
      sendto(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
      	1242, MSG_FASTOPEN, {sa_family=AF_INET, sin_port=htons(20002), sin_addr=inet_addr("127.0.0.1")}, 16) = 1242
      setsockopt(3, SOL_TCP, TCP_REPAIR_WINDOW, "\4\0\0@+\205\0\0\377\377\0\0\377\377\377\177\0\0\0\0", 20) = 0
      writev(3, [{"\270", 1}], 1)             = 1
      setsockopt(3, SOL_TCP, TCP_REPAIR_OPTIONS, "\10\0\0\0\0\0\0\0\0\0\0\0|\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 386) = 0
      writev(3, [{"\210v\r[\226\320t\231qwQ\204\264l\254\t\1\20\245\214p\350H\223\254;\\\37\345\307p$"..., 3144}], 1) = 3144
      
      The 3rd system call looks odd :
      setsockopt(3, SOL_TCP, TCP_REPAIR_QUEUE, [-1], 4) = 0
      
      This patch makes sure bound checking is using an unsigned compare.
      
      Fixes: ee995283 ("tcp: Initial repair mode")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bf2acc94
  16. 30 4月, 2018 1 次提交
    • E
      tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive · 05255b82
      Eric Dumazet 提交于
      When adding tcp mmap() implementation, I forgot that socket lock
      had to be taken before current->mm->mmap_sem. syzbot eventually caught
      the bug.
      
      Since we can not lock the socket in tcp mmap() handler we have to
      split the operation in two phases.
      
      1) mmap() on a tcp socket simply reserves VMA space, and nothing else.
        This operation does not involve any TCP locking.
      
      2) getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) implements
       the transfert of pages from skbs to one VMA.
        This operation only uses down_read(&current->mm->mmap_sem) after
        holding TCP lock, thus solving the lockdep issue.
      
      This new implementation was suggested by Andy Lutomirski with great details.
      
      Benefits are :
      
      - Better scalability, in case multiple threads reuse VMAS
         (without mmap()/munmap() calls) since mmap_sem wont be write locked.
      
      - Better error recovery.
         The previous mmap() model had to provide the expected size of the
         mapping. If for some reason one part could not be mapped (partial MSS),
         the whole operation had to be aborted.
         With the tcp_zerocopy_receive struct, kernel can report how
         many bytes were successfuly mapped, and how many bytes should
         be read to skip the problematic sequence.
      
      - No more memory allocation to hold an array of page pointers.
        16 MB mappings needed 32 KB for this array, potentially using vmalloc() :/
      
      - skbs are freed while mmap_sem has been released
      
      Following patch makes the change in tcp_mmap tool to demonstrate
      one possible use of mmap() and setsockopt(... TCP_ZEROCOPY_RECEIVE ...)
      
      Note that memcg might require additional changes.
      
      Fixes: 93ab6cc6 ("tcp: implement mmap() for zero copy receive")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Suggested-by: NAndy Lutomirski <luto@kernel.org>
      Cc: linux-mm@kvack.org
      Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      05255b82
  17. 27 4月, 2018 1 次提交
  18. 20 4月, 2018 2 次提交
  19. 17 4月, 2018 3 次提交
    • E
      tcp: implement mmap() for zero copy receive · 93ab6cc6
      Eric Dumazet 提交于
      Some networks can make sure TCP payload can exactly fit 4KB pages,
      with well chosen MSS/MTU and architectures.
      
      Implement mmap() system call so that applications can avoid
      copying data without complex splice() games.
      
      Note that a successful mmap( X bytes) on TCP socket is consuming
      bytes, as if recvmsg() has been done. (tp->copied += X)
      
      Only PROT_READ mappings are accepted, as skb page frags
      are fundamentally shared and read only.
      
      If tcp_mmap() finds data that is not a full page, or a patch of
      urgent data, -EINVAL is returned, no bytes are consumed.
      
      Application must fallback to recvmsg() to read the problematic sequence.
      
      mmap() wont block,  regardless of socket being in blocking or
      non-blocking mode. If not enough bytes are in receive queue,
      mmap() would return -EAGAIN, or -EIO if socket is in a state
      where no other bytes can be added into receive queue.
      
      An application might use SO_RCVLOWAT, poll() and/or ioctl( FIONREAD)
      to efficiently use mmap()
      
      On the sender side, MSG_EOR might help to clearly separate unaligned
      headers and 4K-aligned chunks if necessary.
      
      Tested:
      
      mlx4 (cx-3) 40Gbit NIC, with tcp_mmap program provided in following patch.
      MTU set to 4168  (4096 TCP payload, 40 bytes IPv6 header, 32 bytes TCP header)
      
      Without mmap() (tcp_mmap -s)
      
      received 32768 MB (0 % mmap'ed) in 8.13342 s, 33.7961 Gbit,
        cpu usage user:0.034 sys:3.778, 116.333 usec per MB, 63062 c-switches
      received 32768 MB (0 % mmap'ed) in 8.14501 s, 33.748 Gbit,
        cpu usage user:0.029 sys:3.997, 122.864 usec per MB, 61903 c-switches
      received 32768 MB (0 % mmap'ed) in 8.11723 s, 33.8635 Gbit,
        cpu usage user:0.048 sys:3.964, 122.437 usec per MB, 62983 c-switches
      received 32768 MB (0 % mmap'ed) in 8.39189 s, 32.7552 Gbit,
        cpu usage user:0.038 sys:4.181, 128.754 usec per MB, 55834 c-switches
      
      With mmap() on receiver (tcp_mmap -s -z)
      
      received 32768 MB (100 % mmap'ed) in 8.03083 s, 34.2278 Gbit,
        cpu usage user:0.024 sys:1.466, 45.4712 usec per MB, 65479 c-switches
      received 32768 MB (100 % mmap'ed) in 7.98805 s, 34.4111 Gbit,
        cpu usage user:0.026 sys:1.401, 43.5486 usec per MB, 65447 c-switches
      received 32768 MB (100 % mmap'ed) in 7.98377 s, 34.4296 Gbit,
        cpu usage user:0.028 sys:1.452, 45.166 usec per MB, 65496 c-switches
      received 32768 MB (99.9969 % mmap'ed) in 8.01838 s, 34.281 Gbit,
        cpu usage user:0.02 sys:1.446, 44.7388 usec per MB, 65505 c-switches
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      93ab6cc6
    • E
      tcp: avoid extra wakeups for SO_RCVLOWAT users · 03f45c88
      Eric Dumazet 提交于
      SO_RCVLOWAT is properly handled in tcp_poll(), so that POLLIN is only
      generated when enough bytes are available in receive queue, after
      David change (commit c7004482 "tcp: Respect SO_RCVLOWAT in tcp_poll().")
      
      But TCP still calls sk->sk_data_ready() for each chunk added in receive
      queue, meaning thread is awaken, and goes back to sleep shortly after.
      
      Tested:
      
      tcp_mmap test program, receiving 32768 MB of data with SO_RCVLOWAT set to 512KB
      
      -> Should get ~2 wakeups (c-switches) per MB, regardless of how many
      (tiny or big) packets were received.
      
      High speed (mostly full size GRO packets)
      
      received 32768 MB (100 % mmap'ed) in 8.03112 s, 34.2266 Gbit,
        cpu usage user:0.037 sys:1.404, 43.9758 usec per MB, 65497 c-switches
      
      received 32768 MB (99.9954 % mmap'ed) in 7.98453 s, 34.4263 Gbit,
        cpu usage user:0.03 sys:1.422, 44.3115 usec per MB, 65485 c-switches
      
      Low speed (sender is ratelimited and sends 1-MSS at a time, so GRO is not helping)
      
      received 22474.5 MB (100 % mmap'ed) in 6015.35 s, 0.0313414 Gbit,
        cpu usage user:0.05 sys:1.586, 72.7952 usec per MB, 44950 c-switches
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      03f45c88
    • E
      tcp: fix SO_RCVLOWAT and RCVBUF autotuning · d1361840
      Eric Dumazet 提交于
      Applications might use SO_RCVLOWAT on TCP socket hoping to receive
      one [E]POLLIN event only when a given amount of bytes are ready in socket
      receive queue.
      
      Problem is that receive autotuning is not aware of this constraint,
      meaning sk_rcvbuf might be too small to allow all bytes to be stored.
      
      Add a new (struct proto_ops)->set_rcvlowat method so that a protocol
      can override the default setsockopt(SO_RCVLOWAT) behavior.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d1361840
  20. 16 4月, 2018 1 次提交
  21. 13 4月, 2018 1 次提交
    • E
      tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets · 72123032
      Eric Dumazet 提交于
      syzbot/KMSAN reported an uninit-value in tcp_parse_options() [1]
      
      I believe this was caused by a TCP_MD5SIG being set on live
      flow.
      
      This is highly unexpected, since TCP option space is limited.
      
      For instance, presence of TCP MD5 option automatically disables
      TCP TimeStamp option at SYN/SYNACK time, which we can not do
      once flow has been established.
      
      Really, adding/deleting an MD5 key only makes sense on sockets
      in CLOSE or LISTEN state.
      
      [1]
      BUG: KMSAN: uninit-value in tcp_parse_options+0xd74/0x1a30 net/ipv4/tcp_input.c:3720
      CPU: 1 PID: 6177 Comm: syzkaller192004 Not tainted 4.16.0+ #83
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x185/0x1d0 lib/dump_stack.c:53
       kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
       __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
       tcp_parse_options+0xd74/0x1a30 net/ipv4/tcp_input.c:3720
       tcp_fast_parse_options net/ipv4/tcp_input.c:3858 [inline]
       tcp_validate_incoming+0x4f1/0x2790 net/ipv4/tcp_input.c:5184
       tcp_rcv_established+0xf60/0x2bb0 net/ipv4/tcp_input.c:5453
       tcp_v4_do_rcv+0x6cd/0xd90 net/ipv4/tcp_ipv4.c:1469
       sk_backlog_rcv include/net/sock.h:908 [inline]
       __release_sock+0x2d6/0x680 net/core/sock.c:2271
       release_sock+0x97/0x2a0 net/core/sock.c:2786
       tcp_sendmsg+0xd6/0x100 net/ipv4/tcp.c:1464
       inet_sendmsg+0x48d/0x740 net/ipv4/af_inet.c:764
       sock_sendmsg_nosec net/socket.c:630 [inline]
       sock_sendmsg net/socket.c:640 [inline]
       SYSC_sendto+0x6c3/0x7e0 net/socket.c:1747
       SyS_sendto+0x8a/0xb0 net/socket.c:1715
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      RIP: 0033:0x448fe9
      RSP: 002b:00007fd472c64d38 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 00000000006e5a30 RCX: 0000000000448fe9
      RDX: 000000000000029f RSI: 0000000020a88f88 RDI: 0000000000000004
      RBP: 00000000006e5a34 R08: 0000000020e68000 R09: 0000000000000010
      R10: 00000000200007fd R11: 0000000000000216 R12: 0000000000000000
      R13: 00007fff074899ef R14: 00007fd472c659c0 R15: 0000000000000009
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
       kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
       kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
       kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321
       slab_post_alloc_hook mm/slab.h:445 [inline]
       slab_alloc_node mm/slub.c:2737 [inline]
       __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369
       __kmalloc_reserve net/core/skbuff.c:138 [inline]
       __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206
       alloc_skb include/linux/skbuff.h:984 [inline]
       tcp_send_ack+0x18c/0x910 net/ipv4/tcp_output.c:3624
       __tcp_ack_snd_check net/ipv4/tcp_input.c:5040 [inline]
       tcp_ack_snd_check net/ipv4/tcp_input.c:5053 [inline]
       tcp_rcv_established+0x2103/0x2bb0 net/ipv4/tcp_input.c:5469
       tcp_v4_do_rcv+0x6cd/0xd90 net/ipv4/tcp_ipv4.c:1469
       sk_backlog_rcv include/net/sock.h:908 [inline]
       __release_sock+0x2d6/0x680 net/core/sock.c:2271
       release_sock+0x97/0x2a0 net/core/sock.c:2786
       tcp_sendmsg+0xd6/0x100 net/ipv4/tcp.c:1464
       inet_sendmsg+0x48d/0x740 net/ipv4/af_inet.c:764
       sock_sendmsg_nosec net/socket.c:630 [inline]
       sock_sendmsg net/socket.c:640 [inline]
       SYSC_sendto+0x6c3/0x7e0 net/socket.c:1747
       SyS_sendto+0x8a/0xb0 net/socket.c:1715
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      Fixes: cfb6eeb4 ("[TCP]: MD5 Signature Option (RFC2385) support.")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      72123032
  22. 30 3月, 2018 1 次提交
    • J
      bpf: sockmap redirect ingress support · 8934ce2f
      John Fastabend 提交于
      Add support for the BPF_F_INGRESS flag in sk_msg redirect helper.
      To do this add a scatterlist ring for receiving socks to check
      before calling into regular recvmsg call path. Additionally, because
      the poll wakeup logic only checked the skb recv queue we need to
      add a hook in TCP stack (similar to write side) so that we have
      a way to wake up polling socks when a scatterlist is redirected
      to that sock.
      
      After this all that is needed is for the redirect helper to
      push the scatterlist into the psock receive queue.
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      8934ce2f
  23. 20 3月, 2018 1 次提交
  24. 17 3月, 2018 1 次提交
  25. 08 3月, 2018 1 次提交
  26. 05 3月, 2018 2 次提交
  27. 22 2月, 2018 4 次提交
  28. 12 2月, 2018 1 次提交
    • L
      vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds 提交于
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But they keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a9a08845
  29. 30 1月, 2018 1 次提交
  30. 26 1月, 2018 1 次提交
    • L
      bpf: Add BPF_SOCK_OPS_STATE_CB · d4487491
      Lawrence Brakmo 提交于
      Adds support for calling sock_ops BPF program when there is a TCP state
      change. Two arguments are used; one for the old state and another for
      the new state.
      
      There is a new enum in include/uapi/linux/bpf.h that exports the TCP
      states that prepends BPF_ to the current TCP state names. If it is ever
      necessary to change the internal TCP state values (other than adding
      more to the end), then it will become necessary to convert from the
      internal TCP state value to the BPF value before calling the BPF
      sock_ops function. There are a set of compile checks added in tcp.c
      to detect if the internal and BPF values differ so we can make the
      necessary fixes.
      
      New op: BPF_SOCK_OPS_STATE_CB.
      Signed-off-by: NLawrence Brakmo <brakmo@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      d4487491