1. 02 Apr, 2019 1 commit
    • kcm: switch order of device registration to fix a crash · 3c446e6f
      Authored by Jiri Slaby
      When kcm is loaded while many processes try to create a KCM socket, a
      crash occurs:
       BUG: unable to handle kernel NULL pointer dereference at 000000000000000e
       IP: mutex_lock+0x27/0x40 kernel/locking/mutex.c:240
       PGD 8000000016ef2067 P4D 8000000016ef2067 PUD 3d6e9067 PMD 0
       Oops: 0002 [#1] SMP KASAN PTI
       CPU: 0 PID: 7005 Comm: syz-executor.5 Not tainted 4.12.14-396-default #1 SLE15-SP1 (unreleased)
       RIP: 0010:mutex_lock+0x27/0x40 kernel/locking/mutex.c:240
       RSP: 0018:ffff88000d487a00 EFLAGS: 00010246
       RAX: 0000000000000000 RBX: 000000000000000e RCX: 1ffff100082b0719
       ...
       CR2: 000000000000000e CR3: 000000004b1bc003 CR4: 0000000000060ef0
       Call Trace:
        kcm_create+0x600/0xbf0 [kcm]
        __sock_create+0x324/0x750 net/socket.c:1272
       ...
      
      This is due to a race between sock_create and an unfinished
      register_pernet_device. kcm_create tries to do "net_generic(net,
      kcm_net_id)", but kcm_net_id is not initialized yet.
      
      So switch the order of the two to close the race.
      
      This can be reproduced with multiple processes doing socket(PF_KCM, ...)
      and one process doing module removal.
      
      Fixes: ab7ac4eb ("kcm: Kernel Connection Multiplexor module")
      Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
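The ordering problem in this commit can be modelled in user space. The sketch below is not the kernel code: `fake_register_pernet_device`, `fake_sock_register`, and `fake_kcm_create` are hypothetical single-threaded stand-ins that only illustrate why the per-net id has to exist before the socket family becomes visible to `socket()` callers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model {
    bool family_visible;  /* true once socket(PF_KCM, ...) can reach create() */
    bool net_id_ready;    /* true once the per-net data has been registered */
};

static void fake_register_pernet_device(struct model *m) { m->net_id_ready = true; }
static void fake_sock_register(struct model *m) { m->family_visible = true; }

/* Returns 0 on success, -1 if the family is not visible yet, and
 * -2 if create() would run against uninitialized per-net data. */
static int fake_kcm_create(const struct model *m)
{
    assert(m != NULL);
    if (!m->family_visible)
        return -1;
    if (!m->net_id_ready)
        return -2;  /* the crash window described in the commit message */
    return 0;
}
```

With the family registered first, a caller can land in the `-2` window; with the per-net registration first, a caller either fails cleanly or succeeds.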
  2. 24 Feb, 2019 1 commit
  3. 18 Sep, 2018 2 commits
    • Revert "kcm: remove any offset before parsing messages" · 3275b4df
      Authored by David S. Miller
      This reverts commit 072222b4.
      
      I just read that this causes regressions.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • kcm: remove any offset before parsing messages · 072222b4
      Authored by Dominique Martinet
      The current code assumes kcm users know they need to look for the
      strparser offset within their bpf program, which is not documented
      anywhere and which the examples lying around do not do.
      
      The actual recv function does handle the offset well, so we can create a
      temporary clone of the skb and pull that one up as required for parsing.
      
      The pull itself has a cost if we are pulling beyond the head data,
      measured at 2-3% of latency in a noisy VM with a local client stressing
      that path. The clone's impact seemed too small to measure.
      
      This bug can be exhibited easily by implementing a "trivial" kcm parser
      taking the first bytes as size, and on the client sending at least two
      such packets in a single write().
      
      Note that bpf sockmap has the same problem, both for parse and for recv,
      so fixing it would mean pulling twice, or doing a real pull within the
      strparser logic, if anyone cares about that.
      Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
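The failure mode described in this commit can be shown with a plain C parser outside the kernel. This is a minimal sketch, assuming a hypothetical framing of a 2-byte big-endian length header followed by the payload; `read_len_at` and `read_len_ignoring_offset` are illustrative names, not part of KCM or strparser.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 2-byte big-endian length header that starts at `offset`.
 * Returns the header value, or -1 if the header is not fully in the buffer. */
static int read_len_at(const uint8_t *buf, size_t buf_len, size_t offset)
{
    assert(buf != NULL);
    if (offset > buf_len || buf_len - offset < 2)
        return -1;
    return (buf[offset] << 8) | buf[offset + 1];
}

/* A parser that ignores the offset always re-reads the first header,
 * even when the message it was asked about starts later in the buffer. */
static int read_len_ignoring_offset(const uint8_t *buf, size_t buf_len)
{
    return read_len_at(buf, buf_len, 0);
}
```

When two messages arrive in one buffer, the second header sits at a non-zero offset, so only the offset-aware reader returns the correct length for it.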
  4. 25 Jul, 2018 1 commit
  5. 29 Jun, 2018 1 commit
    • Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL · a11e1d43
      Authored by Linus Torvalds
      The poll() changes were not well thought out, and completely
      unexplained.  They also caused a huge performance regression, because
      "->poll()" was no longer a trivial file operation that just called down
      to the underlying file operations, but instead did at least two indirect
      calls.
      
      Indirect calls are sadly slow now with the Spectre mitigation, but the
      performance problem could at least be largely mitigated by changing the
      "->get_poll_head()" operation to just have a per-file-descriptor pointer
      to the poll head instead.  That gets rid of one of the new indirections.
      
      But that doesn't fix the new complexity that is completely unwarranted
      for the regular case.  The (undocumented) reason for the poll() changes
      was some alleged AIO poll race fixing, but we don't make the common case
      slower and more complex for some uncommon special case, so this all
      really needs way more explanations and most likely a fundamental
      redesign.
      
      [ This revert is a revert of about 30 different commits, not reverted
        individually because that would just be unnecessarily messy  - Linus ]
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 01 Jun, 2018 1 commit
  7. 26 May, 2018 1 commit
  8. 28 Mar, 2018 1 commit
  9. 16 Mar, 2018 1 commit
  10. 28 Feb, 2018 1 commit
  11. 15 Feb, 2018 1 commit
  12. 12 Feb, 2018 1 commit
    • vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Authored by Linus Torvalds
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But the keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 25 Jan, 2018 2 commits
  14. 06 Dec, 2017 2 commits
  15. 27 Sep, 2017 1 commit
  16. 31 Aug, 2017 1 commit
  17. 25 Aug, 2017 1 commit
    • strparser: initialize all callbacks · 3fd87127
      Authored by Eric Biggers
      commit bbb03029 ("strparser: Generalize strparser") added more
      function pointers to 'struct strp_callbacks'; however, kcm_attach() was
      not updated to initialize them.  This could cause the ->lock() and/or
      ->unlock() function pointers to be set to garbage values, causing a
      crash in strp_work().
      
      Fix the bug by moving the callback structs into static memory, so
      unspecified members are zeroed.  Also constify them while we're at it.
      
      This bug was found by syzkaller, which encountered the following splat:
      
          IP: 0x55
          PGD 3b1ca067
          P4D 3b1ca067
          PUD 3b12f067
          PMD 0
      
          Oops: 0010 [#1] SMP KASAN
          Dumping ftrace buffer:
             (ftrace buffer empty)
          Modules linked in:
          CPU: 2 PID: 1194 Comm: kworker/u8:1 Not tainted 4.13.0-rc4-next-20170811 #2
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
          Workqueue: kstrp strp_work
          task: ffff88006bb0e480 task.stack: ffff88006bb10000
          RIP: 0010:0x55
          RSP: 0018:ffff88006bb17540 EFLAGS: 00010246
          RAX: dffffc0000000000 RBX: ffff88006ce4bd60 RCX: 0000000000000000
          RDX: 1ffff1000d9c97bd RSI: 0000000000000000 RDI: ffff88006ce4bc48
          RBP: ffff88006bb17558 R08: ffffffff81467ab2 R09: 0000000000000000
          R10: ffff88006bb17438 R11: ffff88006bb17940 R12: ffff88006ce4bc48
          R13: ffff88003c683018 R14: ffff88006bb17980 R15: ffff88003c683000
          FS:  0000000000000000(0000) GS:ffff88006de00000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 0000000000000055 CR3: 000000003c145000 CR4: 00000000000006e0
          DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
          DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
          Call Trace:
           process_one_work+0xbf3/0x1bc0 kernel/workqueue.c:2098
           worker_thread+0x223/0x1860 kernel/workqueue.c:2233
           kthread+0x35e/0x430 kernel/kthread.c:231
           ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
          Code:  Bad RIP value.
          RIP: 0x55 RSP: ffff88006bb17540
          CR2: 0000000000000055
          ---[ end trace f0e4920047069cee ]---
      
      Here is a C reproducer (requires CONFIG_BPF_SYSCALL=y and
      CONFIG_AF_KCM=y):
      
          #include <linux/bpf.h>
          #include <linux/kcm.h>
          #include <linux/types.h>
          #include <stdint.h>
          #include <sys/ioctl.h>
          #include <sys/socket.h>
          #include <sys/syscall.h>
          #include <unistd.h>
      
          static const struct bpf_insn bpf_insns[3] = {
              { .code = 0xb7 }, /* BPF_MOV64_IMM(0, 0) */
              { .code = 0x95 }, /* BPF_EXIT_INSN() */
          };
      
          static const union bpf_attr bpf_attr = {
              .prog_type = 1,
              .insn_cnt = 2,
              .insns = (uintptr_t)&bpf_insns,
              .license = (uintptr_t)"",
          };
      
          int main(void)
          {
              int bpf_fd = syscall(__NR_bpf, BPF_PROG_LOAD,
                                   &bpf_attr, sizeof(bpf_attr));
              int inet_fd = socket(AF_INET, SOCK_STREAM, 0);
              int kcm_fd = socket(AF_KCM, SOCK_DGRAM, 0);
      
              ioctl(kcm_fd, SIOCKCMATTACH,
                    &(struct kcm_attach) { .fd = inet_fd, .bpf_fd = bpf_fd });
          }
      
      Fixes: bbb03029 ("strparser: Generalize strparser")
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Tom Herbert <tom@quantonium.net>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
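The fix relies on a C guarantee: an object with static storage duration has every member that is not named in its initializer set to zero, so unset function pointers are null rather than garbage. A minimal user-space sketch follows; `struct callbacks`, `demo_cb`, and `lock_if_present` are hypothetical and only loosely echo the shape of `struct strp_callbacks`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback table; member names are illustrative only. */
struct callbacks {
    int  (*parse)(int);
    void (*ready)(void);
    void (*lock)(void);    /* optional */
    void (*unlock)(void);  /* optional */
};

static int  demo_parse(int len) { return len; }
static void demo_ready(void) { }

/* Static storage with designated initializers: the members that are not
 * named here (lock, unlock) are guaranteed to be null pointers. */
static const struct callbacks demo_cb = {
    .parse = demo_parse,
    .ready = demo_ready,
};

/* Call the optional hook only when one was provided.
 * Returns 1 if the hook ran, 0 if it was absent. */
static int lock_if_present(const struct callbacks *cb)
{
    assert(cb != NULL);
    if (cb->lock == NULL)
        return 0;
    cb->lock();
    return 1;
}
```

By contrast, an automatic (stack) struct declared with no initializer and filled in member by member leaves the remaining members indeterminate, which is the situation the commit message describes.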
  18. 02 Aug, 2017 1 commit
  19. 16 May, 2017 1 commit
  20. 18 Apr, 2017 1 commit
  21. 25 Mar, 2017 1 commit
  22. 02 Mar, 2017 1 commit
  23. 15 Feb, 2017 1 commit
  24. 10 Feb, 2017 1 commit
    • kcm: fix 0-length case for kcm_sendmsg() · 98e3862c
      Authored by WANG Cong
      Dmitry reported a kernel warning:
      
       WARNING: CPU: 3 PID: 2936 at net/kcm/kcmsock.c:627
       kcm_write_msgs+0x12e3/0x1b90 net/kcm/kcmsock.c:627
       CPU: 3 PID: 2936 Comm: a.out Not tainted 4.10.0-rc6+ #209
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
       Call Trace:
        __dump_stack lib/dump_stack.c:15 [inline]
        dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
        panic+0x1fb/0x412 kernel/panic.c:179
        __warn+0x1c4/0x1e0 kernel/panic.c:539
        warn_slowpath_null+0x2c/0x40 kernel/panic.c:582
        kcm_write_msgs+0x12e3/0x1b90 net/kcm/kcmsock.c:627
        kcm_sendmsg+0x163a/0x2200 net/kcm/kcmsock.c:1029
        sock_sendmsg_nosec net/socket.c:635 [inline]
        sock_sendmsg+0xca/0x110 net/socket.c:645
        sock_write_iter+0x326/0x600 net/socket.c:848
        new_sync_write fs/read_write.c:499 [inline]
        __vfs_write+0x483/0x740 fs/read_write.c:512
        vfs_write+0x187/0x530 fs/read_write.c:560
        SYSC_write fs/read_write.c:607 [inline]
        SyS_write+0xfb/0x230 fs/read_write.c:599
        entry_SYSCALL_64_fastpath+0x1f/0xc2
      
      when calling syscall(__NR_write, sock2, 0x208aaf27ul, 0x0ul) on a KCM
      seqpacket socket. It appears that kcm_sendmsg() does not handle the
      len==0 case correctly, which causes an empty skb to be allocated and
      queued. Fix this by skipping the skb allocation for the len==0 case.
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Tom Herbert <tom@herbertland.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  25. 04 Oct, 2016 1 commit
    • skb_splice_bits(): get rid of callback · 25869262
      Authored by Al Viro
      since pipe_lock is the outermost now, we don't need to drop/regain
      socket locks around the call of splice_to_pipe() from skb_splice_bits(),
      which kills the need to have a socket-specific callback; we can just
      call splice_to_pipe() and be done with that.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  26. 01 Sep, 2016 1 commit
    • kcm: fix a socket double free · c0338aff
      Authored by WANG Cong
      Dmitry reported a double free on kcm socket, which could
      be easily reproduced by:
      
      	#include <unistd.h>
      	#include <sys/syscall.h>
      
      	int main()
      	{
      	  int fd = syscall(SYS_socket, 0x29ul, 0x5ul, 0x0ul, 0, 0, 0);
      	  syscall(SYS_ioctl, fd, 0x89e2ul, 0x20a98000ul, 0, 0, 0);
      	  return 0;
      	}
      
      This is because on the error path, after we install
      the new socket file, we call sock_release() to clean
      up the socket, which leaves the fd pointing to a freed
      socket. Fix this by calling sys_close() on that fd
      directly.
      
      Fixes: ab7ac4eb ("kcm: Kernel Connection Multiplexor module")
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Tom Herbert <tom@herbertland.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  27. 29 Aug, 2016 1 commit
  28. 24 Aug, 2016 1 commit
  29. 18 Aug, 2016 1 commit
  30. 02 Jul, 2016 1 commit
    • bpf: refactor bpf_prog_get and type check into helper · 113214be
      Authored by Daniel Borkmann
      Since bpf_prog_get() and the program type check are used in a couple of
      places, refactor this into a small helper function that we can make use
      of. Since the non RO prog->aux part is not used in performance critical
      paths and a program destruction via RCU is very unlikely when doing the
      put, we shouldn't have an issue just doing the bpf_prog_get() +
      prog->type != type check, but actually not taking the ref at all (due to
      being in fdget() / fdput() section of the bpf fd) is even cleaner and
      makes the diff smaller as well, so just go for that. Callsites are
      changed to make use of the new helper where possible.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  31. 20 May, 2016 1 commit
  32. 10 Mar, 2016 6 commits
    • kcm: Add receive message timeout · 29152a34
      Authored by Tom Herbert
      This patch adds a receive timeout for message assembly on the attached TCP
      sockets. The timeout is set when a new message is started and the whole
      message has not been received by TCP (not in the receive queue). If the
      complete message is subsequently received, the timer is cancelled; if the
      timer expires, the RX side is aborted.
      
      The timeout value is taken from the socket timeout (SO_RCVTIMEO) that is
      set on a TCP socket (i.e. set by setsockopt before attaching a TCP socket
      to KCM).
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
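From user space, the timeout this commit refers to is the ordinary `SO_RCVTIMEO` socket option, set on the TCP socket before it is attached. The sketch below only covers that step; the helper names `tcp_socket_with_rcvtimeo` and `get_rcvtimeo_seconds` are hypothetical, and the KCM attach itself is deliberately left out.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Create a TCP socket and set its receive timeout to `seconds`.
 * Returns the fd, or -1 on failure. Attaching the socket to KCM
 * (SIOCKCMATTACH) is not part of this sketch. */
static int tcp_socket_with_rcvtimeo(long seconds)
{
    assert(seconds >= 0);
    struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Read the timeout back; returns whole seconds, or -1 on failure. */
static long get_rcvtimeo_seconds(int fd)
{
    struct timeval tv;
    socklen_t len = sizeof(tv);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, &len) < 0)
        return -1;
    return (long)tv.tv_sec;
}
```

A caller would create the socket with the desired timeout, connect it, and only then hand it to KCM, since the commit message says the value is taken from the TCP socket.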
    • kcm: Add memory limit for receive message construction · 7ced95ef
      Authored by Tom Herbert
      Message assembly is performed on the TCP socket. This is the logical
      equivalent of an application that performs a peek on the socket to find
      out how much memory is needed for a receive buffer. The receive socket
      buffer also provides the maximum message size, which is checked.
      
      The receive algorithm is something like:
      
         1) Receive the first skbuf for a message (or skbufs if multiple are
            needed to determine message length).
         2) Check the message length against the number of bytes in the TCP
            receive queue (tcp_inq()).
              - If all the bytes of the message are in the queue (including the
                skbuf received), then proceed with message assembly (it should
                complete with the tcp_read_sock).
              - Else, mark the psock with the number of bytes needed to
                complete the message.
         3) In the TCP data ready function, if the psock indicates that we are
            waiting for the rest of the bytes of a message, check the number
            of queued bytes against that.
              - If there are still not enough bytes for the message, just
                return.
              - Else, clear the waiting bytes and proceed to receive the
                skbufs.  The message should now be received in one
                tcp_read_sock.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
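The receive steps in this commit message can be sketched as a small user-space state machine. This is a simplified model, not the kernel implementation: `struct psock_model`, `on_length_known`, and `on_data_ready` are hypothetical names, and `queued` is assumed to be the total number of bytes of the current message that are available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the per-socket receive state. */
struct psock_model {
    size_t need_bytes;  /* 0 means "not waiting for the rest of a message" */
};

/* Step 2: the message length is known. Returns true if all bytes are
 * already queued and assembly can proceed; otherwise records how many
 * bytes must be queued before assembly is attempted again. */
static bool on_length_known(struct psock_model *p, size_t msg_len, size_t queued)
{
    assert(p != NULL);
    if (queued >= msg_len) {
        p->need_bytes = 0;
        return true;
    }
    p->need_bytes = msg_len;
    return false;
}

/* Step 3: data-ready notification. Returns true if receiving should
 * proceed, false if there are still not enough queued bytes. */
static bool on_data_ready(struct psock_model *p, size_t queued)
{
    assert(p != NULL);
    if (p->need_bytes != 0) {
        if (queued < p->need_bytes)
            return false;   /* still waiting, just return */
        p->need_bytes = 0;  /* clear the waiting bytes */
    }
    return true;
}
```

The point of the design is that a partially received message stays in the TCP receive queue, where the socket buffer limit applies, instead of being accumulated separately.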
    • kcm: Sendpage support · f29698fc
      Authored by Tom Herbert
      Implement kcm_sendpage. Set sendpage to kcm_sendpage in both
      dgram and seqpacket ops.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • kcm: Splice support · 91687355
      Authored by Tom Herbert
      Implement kcm_splice_read. This is supported only for seqpacket.
      Add kcm_seqpacket_ops and set splice read to kcm_splice_read.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • kcm: Add statistics and proc interfaces · cd6e111b
      Authored by Tom Herbert
      This patch adds various counters for KCM. These include counters for
      messages and bytes received or sent, as well as counters for number of
      attached/unattached TCP sockets and other error or edge events.
      
      The statistics are exposed via a proc interface. /proc/net/kcm provides
      statistics per KCM socket and per psock (attached TCP sockets).
      /proc/net/kcm_stats provides aggregate statistics.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • kcm: Kernel Connection Multiplexor module · ab7ac4eb
      Authored by Tom Herbert
      This module implements the Kernel Connection Multiplexor.
      
      Kernel Connection Multiplexor (KCM) is a facility that provides a
      message based interface over TCP for generic application protocols.
      With KCM an application can efficiently send and receive application
      protocol messages over TCP using datagram sockets.
      
      For more information see the included Documentation/networking/kcm.txt
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
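As a rough illustration of the datagram-socket interface this module introduces, the sketch below only tries to open a KCM socket and reports whether the running kernel supports it. The helper names `open_kcm_socket` and `kcm_supported` are hypothetical; the fallback value 41 for `AF_KCM` matches the `0x29` used in the reproducer elsewhere in this history, and protocol 0 follows the reproducer's `socket(AF_KCM, SOCK_DGRAM, 0)` call.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef AF_KCM
#define AF_KCM 41  /* fallback for older libc headers */
#endif

/* Try to open a KCM datagram socket.
 * Returns the fd, or -1 with errno set (for example when the kernel
 * was built without CONFIG_AF_KCM). */
static int open_kcm_socket(void)
{
    return socket(AF_KCM, SOCK_DGRAM, 0);
}

/* Returns 1 if a KCM socket could be created, 0 otherwise. */
static int kcm_supported(void)
{
    int fd = open_kcm_socket();
    if (fd < 0)
        return 0;
    close(fd);
    return 1;
}
```

Attaching TCP sockets and a BPF parser program to the KCM socket is the next step in real use and is not shown here.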