1. 15 Dec 2022, 1 commit
      unix: Fix race in SOCK_SEQPACKET's unix_dgram_sendmsg() · 3ff8bff7
      Kirill Tkhai committed
      There is a race that can make a live SOCK_SEQPACKET socket
      change its state from TCP_ESTABLISHED to TCP_CLOSE:
      
      unix_release_sock(peer)                  unix_dgram_sendmsg(sk)
        sock_orphan(peer)
          sock_set_flag(peer, SOCK_DEAD)
                                                 sock_alloc_send_pskb()
                                                   if !(sk->sk_shutdown & SEND_SHUTDOWN)
                                                     OK
                                                 if sock_flag(peer, SOCK_DEAD)
                                                   sk->sk_state = TCP_CLOSE
        sk->sk_shutdown = SHUTDOWN_MASK
      
      After that, socket sk remains almost normal: it is able to connect, listen, accept
      and recvmsg, but it can't sendmsg.
      
      Since this is the only way a live SOCK_SEQPACKET socket can change
      its state like this, we had better fix this strange and potentially
      dangerous corner case.
      
      Note that we return EPIPE here, as is normally done in sock_alloc_send_pskb().
      The originally used ECONNREFUSED looks strange, since it makes little sense to
      return a specific retval depending on a race in the kernel that the user cannot affect.
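The user-visible side of this choice can be checked from userspace. The following is a minimal sketch that assumes a Linux host. Once the peer of a SOCK_SEQPACKET pair has been closed, send() is expected to fail with EPIPE. The helper name is hypothetical, and the sketch does not reproduce the in-kernel race itself.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns the errno seen when sending on a
 * SOCK_SEQPACKET socket whose peer is already closed, 0 if the send
 * unexpectedly succeeded, or -1 if the pair could not be created. */
static int seqpacket_send_errno_after_peer_close(void)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) < 0)
        return -1;

    /* Releasing the peer marks this end as shut down for writing. */
    close(sv[1]);

    /* MSG_NOSIGNAL turns the would-be SIGPIPE into a plain errno. */
    ssize_t n = send(sv[0], "x", 1, MSG_NOSIGNAL);
    int err = (n < 0) ? errno : 0;

    close(sv[0]);
    return err;
}
```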
      
      Also, move the TCP_CLOSE assignment for SOCK_DGRAM sockets under the state lock
      to fix a race with unix_dgram_connect():
      
      unix_dgram_connect(other)            unix_dgram_sendmsg(sk)
                                             unix_peer(sk) = NULL
                                             unix_state_unlock(sk)
        unix_state_double_lock(sk, other)
        sk->sk_state  = TCP_ESTABLISHED
        unix_peer(sk) = other
        unix_state_double_unlock(sk, other)
                                                    sk->sk_state  = TCP_CLOSE
      
      This patch fixes both of these races.
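The following userspace model illustrates the locking rule behind the second fix. It is only a sketch. The peer pointer and the state must change inside one critical section, so that a concurrent connect() cannot leave a socket with a peer but a TCP_CLOSE-like state. Everything in it is hypothetical. A C11 spinlock stands in for unix_state_lock(), and only one socket is locked, whereas the kernel's connect path uses unix_state_double_lock().

```c
#include <stdatomic.h>
#include <stddef.h>

enum model_state { MODEL_CLOSE, MODEL_ESTABLISHED };

/* Hypothetical, simplified stand-in for a datagram socket. */
struct model_sock {
    atomic_flag lock;             /* plays the role of unix_state_lock() */
    enum model_state state;
    struct model_sock *peer;
};

static void model_lock(struct model_sock *sk)
{
    while (atomic_flag_test_and_set_explicit(&sk->lock, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static void model_unlock(struct model_sock *sk)
{
    atomic_flag_clear_explicit(&sk->lock, memory_order_release);
}

/* connect(): publish the peer and the state together. */
static void model_connect(struct model_sock *sk, struct model_sock *other)
{
    model_lock(sk);
    sk->state = MODEL_ESTABLISHED;
    sk->peer = other;
    model_unlock(sk);
}

/* sendmsg() found 'dead' gone: clear the peer AND the state in the same
 * critical section (the buggy order set the state after unlocking), and
 * only if the socket has not been reconnected meanwhile. */
static void model_drop_dead_peer(struct model_sock *sk, struct model_sock *dead)
{
    model_lock(sk);
    if (sk->peer == dead) {
        sk->peer = NULL;
        sk->state = MODEL_CLOSE;
    }
    model_unlock(sk);
}
```

The second call in the test below shows why the `sk->peer == dead` check matters. A socket that was reconnected in between keeps its new peer and its established state.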
      
      Fixes: 83301b53 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too")
      Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
      Link: https://lore.kernel.org/r/135fda25-22d5-837a-782b-ceee50e19844@ya.ru
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  2. 12 Dec 2022, 1 commit
  3. 12 Oct 2022, 1 commit
  4. 03 Oct 2022, 1 commit
      af_unix: Fix memory leaks of the whole sk due to OOB skb. · 7a62ed61
      Kuniyuki Iwashima committed
      syzbot reported a sequence of memory leaks, and one of them indicated we
      failed to free a whole sk:
      
        unreferenced object 0xffff8880126e0000 (size 1088):
          comm "syz-executor419", pid 326, jiffies 4294773607 (age 12.609s)
          hex dump (first 32 bytes):
            00 00 00 00 00 00 00 00 7d 00 00 00 00 00 00 00  ........}.......
            01 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00  ...@............
          backtrace:
            [<000000006fefe750>] sk_prot_alloc+0x64/0x2a0 net/core/sock.c:1970
            [<0000000074006db5>] sk_alloc+0x3b/0x800 net/core/sock.c:2029
            [<00000000728cd434>] unix_create1+0xaf/0x920 net/unix/af_unix.c:928
            [<00000000a279a139>] unix_create+0x113/0x1d0 net/unix/af_unix.c:997
            [<0000000068259812>] __sock_create+0x2ab/0x550 net/socket.c:1516
            [<00000000da1521e1>] sock_create net/socket.c:1566 [inline]
            [<00000000da1521e1>] __sys_socketpair+0x1a8/0x550 net/socket.c:1698
            [<000000007ab259e1>] __do_sys_socketpair net/socket.c:1751 [inline]
            [<000000007ab259e1>] __se_sys_socketpair net/socket.c:1748 [inline]
            [<000000007ab259e1>] __x64_sys_socketpair+0x97/0x100 net/socket.c:1748
            [<000000007dedddc1>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
            [<000000007dedddc1>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
            [<000000009456679f>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      We can reproduce this issue by creating two AF_UNIX SOCK_STREAM sockets,
      send()ing an OOB skb to each other, and close()ing them without consuming
      the OOB skbs.
      
        int skpair[2];
      
        socketpair(AF_UNIX, SOCK_STREAM, 0, skpair);
      
        send(skpair[0], "x", 1, MSG_OOB);
        send(skpair[1], "x", 1, MSG_OOB);
      
        close(skpair[0]);
        close(skpair[1]);
      
      Currently, we free an OOB skb in unix_sock_destructor() which is called via
      __sk_free(), but it's too late because the receiver's unix_sk(sk)->oob_skb
      is accounted against the sender's sk->sk_wmem_alloc and __sk_free() is
      called only when sk->sk_wmem_alloc is 0.
      
      In the repro sequence, we do not consume the OOB skbs, so neither sk's
      sock_put() ever reaches __sk_free() due to the positive sk->sk_wmem_alloc.
      Then, no one can consume the OOB skbs or call __sk_free(), and we finally
      leak both whole sks.
      
      Thus, we must free the unconsumed OOB skb earlier when close()ing the
      socket.
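The reference cycle can be modeled in a few lines. This is a simplified sketch with hypothetical types. Each socket's wmem_alloc starts at 1, an OOB skb is charged to its sender, and the final free happens only when wmem_alloc drops to 0. The model assumes oob_skb is the skb's only reference and ignores the receive queue.

```c
#include <stdbool.h>
#include <stddef.h>

struct msock;

/* An skb stays charged to its sender until it is freed. */
struct mskb {
    struct msock *owner;
};

struct msock {
    int wmem_alloc;          /* starts at 1, like sk->sk_wmem_alloc */
    struct mskb *oob_skb;    /* unconsumed OOB skb received from the peer */
    struct mskb storage;     /* backing store for the one skb we send */
    bool freed;              /* set when __sk_free() would have run */
};

static void msk_free_final(struct msock *sk);

/* Drop one wmem reference; the last drop frees the socket. */
static void msk_wmem_put(struct msock *sk)
{
    if (--sk->wmem_alloc == 0)
        msk_free_final(sk);
}

/* Freeing an skb uncharges it from its sender (like sock_wfree()). */
static void mskb_free(struct mskb *skb)
{
    msk_wmem_put(skb->owner);
}

/* The final free runs the destructor: the OLD place the OOB skb was freed. */
static void msk_free_final(struct msock *sk)
{
    sk->freed = true;
    if (sk->oob_skb) {
        struct mskb *skb = sk->oob_skb;
        sk->oob_skb = NULL;
        mskb_free(skb);
    }
}

static void msk_send_oob(struct msock *from, struct msock *to)
{
    from->storage.owner = from;
    from->wmem_alloc++;              /* charge the skb to the sender */
    to->oob_skb = &from->storage;
}

/* close(): with the fix, drop the unconsumed OOB skb before sk_free(). */
static void msk_close(struct msock *sk, bool free_oob_on_close)
{
    if (free_oob_on_close && sk->oob_skb) {
        struct mskb *skb = sk->oob_skb;
        sk->oob_skb = NULL;
        mskb_free(skb);
    }
    msk_wmem_put(sk);                /* like sk_free() */
}
```

Without the early free, each socket waits for the other's destructor, and neither count reaches 0. With it, the two closes release the cross-charges and both sockets are freed.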
      
      Fixes: 314001f0 ("af_unix: Add OOB support")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 27 Sep 2022, 1 commit
  6. 08 Sep 2022, 1 commit
      freezer,sched: Rewrite core freezer logic · f5d39b02
      Peter Zijlstra committed
      Rewrite the core freezer to behave better wrt thawing and be simpler
      in general.
      
      By replacing PF_FROZEN with TASK_FROZEN, a special block state, it is
      ensured frozen tasks stay frozen until thawed and don't randomly wake
      up early, as is currently possible.
      
      As such, it does away with PF_FROZEN and PF_FREEZER_SKIP, freeing up
      two PF_flags (yay!).
      
      Specifically, the current scheme works a little like this:
      
      	freezer_do_not_count();
      	schedule();
      	freezer_count();
      
      And either the task is blocked, or it lands in try_to_freeze()
      through freezer_count(). Now, when it is blocked, the freezer
      considers it frozen and continues.
      
      However, on thawing, once pm_freezing is cleared, freezer_count()
      stops working, and any random/spurious wakeup will let a task run
      before its time.
      
      That is, thawing tries to thaw things in an explicit order: kernel
      threads and workqueues first, then bringing SMP back, then userspace,
      etc. However, due to the above-mentioned races, it is entirely possible
      for userspace tasks to thaw (by accident) before SMP is back.
      
      This can be a fatal problem in asymmetric ISA architectures (e.g. ARMv9)
      where the userspace task requires a special CPU to run.
      
      As said, replace this with a special task state TASK_FROZEN and add
      the following state transitions:
      
      	TASK_FREEZABLE	-> TASK_FROZEN
      	__TASK_STOPPED	-> TASK_FROZEN
      	__TASK_TRACED	-> TASK_FROZEN
      
      The new TASK_FREEZABLE can be set on any state that is part of TASK_NORMAL
      (IOW, TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE); any such state
      is already required to deal with spurious wakeups and the freezer
      causes one such when thawing the task (since the original state is
      lost).
      
      The special __TASK_{STOPPED,TRACED} states *can* be restored since
      their canonical state is in ->jobctl.
      
      With this, frozen tasks need an explicit TASK_FROZEN wakeup and are
      free of undue (early / spurious) wakeups.
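The following sketch shows why a frozen task ignores spurious wakeups. It assumes a wakeup succeeds only when the task's state overlaps the wakeup mask, which is loosely how try_to_wake_up() filters on its state argument. The M_ names and bit values are illustrative and are not copied from the kernel headers.

```c
#include <stdbool.h>

/* Illustrative bit values for this sketch only. */
#define M_TASK_RUNNING         0x0000u
#define M_TASK_INTERRUPTIBLE   0x0001u
#define M_TASK_UNINTERRUPTIBLE 0x0002u
#define M_TASK_NORMAL  (M_TASK_INTERRUPTIBLE | M_TASK_UNINTERRUPTIBLE)
#define M_TASK_FREEZABLE       0x2000u
#define M_TASK_FROZEN          0x8000u

struct mtask { unsigned int state; };

/* A wakeup only takes effect if the task's state matches the mask. */
static bool m_wake_up_state(struct mtask *t, unsigned int mask)
{
    if (!(t->state & mask))
        return false;
    t->state = M_TASK_RUNNING;
    return true;
}

/* Freeze a sleeping task that opted in with TASK_FREEZABLE; the original
 * sleep state is overwritten (lost), as the commit message notes. */
static bool m_freeze(struct mtask *t)
{
    if (!(t->state & M_TASK_FREEZABLE))
        return false;
    t->state = M_TASK_FROZEN;
    return true;
}

/* Thawing is an explicit TASK_FROZEN wakeup. */
static bool m_thaw(struct mtask *t)
{
    return m_wake_up_state(t, M_TASK_FROZEN);
}
```

Because M_TASK_FROZEN is outside M_TASK_NORMAL, an ordinary wakeup aimed at sleeping tasks leaves a frozen task alone. Only the thaw path wakes it.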
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Link: https://lore.kernel.org/r/20220822114649.055452969@infradead.org
  7. 22 Aug 2022, 1 commit
  8. 07 Jul 2022, 1 commit
      af_unix: Optimise hash table layout. · cf21b355
      Kuniyuki Iwashima committed
      Commit 6dd4142f ("Merge branch 'af_unix-per-netns-socket-hash'") and
      commit 51bae889 ("af_unix: Put pathname sockets in the global hash
      table.") changed a hash table layout.
      
        Before:
          unix_socket_table [0   - 255] : abstract & pathname sockets
                            [256 - 511] : unnamed sockets
      
        After:
          per-netns table   [0   - 255] : abstract & pathname sockets
                            [256 - 511] : unnamed sockets
          bsd_socket_table  [0   - 255] : pathname sockets (sk_bind_node)
      
      Now, while looking up sockets, we traverse the global table for the
      pathname sockets and the first half of each per-netns hash table for
      abstract sockets, where pathname sockets are also linked.  Thus, the
      more pathname sockets we have, the longer we take to look up abstract
      sockets.  This characteristic existed before the layout change,
      but we can improve it now.
      
      This patch changes the per-netns hash table's layout so that sockets not
      requiring lookup reside in the first half and do not impact the lookup of
      abstract sockets.
      
          per-netns table   [0   - 255] : pathname & unnamed sockets
                            [256 - 511] : abstract sockets
          bsd_socket_table  [0   - 255] : pathname sockets (sk_bind_node)
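The following sketch shows the bucket split in the new layout. Pathname and unnamed sockets hash into [0, 255] and abstract sockets into [256, 511], so abstract lookups never walk chains that hold the other kinds. The FNV-1a hash and all M_/m_ names are stand-ins, not the kernel's hash functions.

```c
#include <stddef.h>
#include <stdint.h>

#define M_HASH_MOD 255u   /* 256 buckets per half (illustrative) */

/* FNV-1a, used here only as a stand-in hash function. */
static unsigned int m_fnv1a(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Pathname and unnamed sockets: first half, [0, 255]. */
static unsigned int m_bsd_or_unbound_bucket(unsigned int hash)
{
    return hash & M_HASH_MOD;
}

/* Abstract sockets: second half, [256, 511]. */
static unsigned int m_abstract_bucket(const char *name, size_t len)
{
    return M_HASH_MOD + 1 + (m_fnv1a(name, len) & M_HASH_MOD);
}
```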
      
      We ran a test that bind()s 100,000 abstract and 100,000 pathname sockets,
      then bind()s an abstract socket 100,000 times and measures the time spent
      in __unix_find_socket_byname().  The result shows that the patch makes
      each lookup faster.
      
        Without this patch:
          $ sudo ./funclatency -p 2278 --microseconds __unix_find_socket_byname.isra.44
           usec                : count    distribution
               0 -> 1          : 0        |                                        |
               2 -> 3          : 0        |                                        |
               4 -> 7          : 0        |                                        |
               8 -> 15         : 126      |                                        |
              16 -> 31         : 1438     |*                                       |
              32 -> 63         : 4150     |***                                     |
              64 -> 127        : 9049     |*******                                 |
             128 -> 255        : 37704    |*******************************         |
             256 -> 511        : 47533    |****************************************|
      
        With this patch:
          $ sudo ./funclatency -p 3648 --microseconds __unix_find_socket_byname.isra.46
           usec                : count    distribution
               0 -> 1          : 109      |                                        |
               2 -> 3          : 318      |                                        |
               4 -> 7          : 725      |                                        |
               8 -> 15         : 2501     |*                                       |
              16 -> 31         : 3061     |**                                      |
              32 -> 63         : 4028     |***                                     |
              64 -> 127        : 9312     |*******                                 |
             128 -> 255        : 51372    |****************************************|
             256 -> 511        : 28574    |**********************                  |
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20220705233715.759-1-kuniyu@amazon.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  9. 05 Jul 2022, 1 commit
      af_unix: Put pathname sockets in the global hash table. · 51bae889
      Kuniyuki Iwashima committed
      Commit cf2f225e ("af_unix: Put a socket into a per-netns hash table.")
      accidentally broke user API for pathname sockets.  A socket was able to
      connect() to a pathname socket whose file was visible even if they were in
      different network namespaces.
      
      The commit puts all sockets into a per-netns hash table.  As a result,
      connect() to a pathname socket in a different netns fails to find it in the
      caller's per-netns hash table and returns -ECONNREFUSED even when the task
      can view the peer socket file.
      
      We can reproduce this issue by:
      
        Console A:
      
          # python3
          >>> from socket import *
          >>> s = socket(AF_UNIX, SOCK_STREAM, 0)
          >>> s.bind('test')
          >>> s.listen(32)
      
        Console B:
      
          # ip netns add test
          # ip netns exec test sh
          # python3
          >>> from socket import *
          >>> s = socket(AF_UNIX, SOCK_STREAM, 0)
          >>> s.connect('test')
      
      Note that when dumping sockets via sock_diag, procfs, and bpf_iter, they are
      filtered only by netns.  In other words, even if they are visible and
      connect()able, all sockets in different netns are skipped while iterating
      sockets.  Thus, we need a fix only for finding a peer pathname socket.
      
      This patch adds a global hash table for pathname sockets, links them with
      sk_bind_node, and uses it in unix_find_socket_byinode().  By doing so, we
      can keep sockets in per-netns hash tables and dump them easily.
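The following toy model shows the split lookup, using hypothetical names. connect() to a pathname finds the socket by inode regardless of netns, which is the global table's role. Dumping filters by netns only, which is the per-netns tables' role. A flat array replaces both hash tables for brevity.

```c
#include <stddef.h>

#define M_MAX_SOCKS 8

struct m_sock {
    int netns;             /* owning network namespace, as a plain id */
    unsigned long inode;   /* inode of the bound socket file; 0 if none */
};

static struct m_sock m_socks[M_MAX_SOCKS];
static size_t m_nsocks;

static struct m_sock *m_add(int netns, unsigned long inode)
{
    if (m_nsocks == M_MAX_SOCKS)
        return NULL;
    m_socks[m_nsocks].netns = netns;
    m_socks[m_nsocks].inode = inode;
    return &m_socks[m_nsocks++];
}

/* connect() to a pathname: search by inode only; the caller's netns is
 * deliberately not compared, since file visibility is what matters. */
static struct m_sock *m_find_by_inode(unsigned long inode)
{
    for (size_t i = 0; i < m_nsocks; i++)
        if (inode != 0 && m_socks[i].inode == inode)
            return &m_socks[i];
    return NULL;
}

/* Dumping (sock_diag, procfs, bpf_iter): filter by netns only. */
static size_t m_count_in_netns(int netns)
{
    size_t n = 0;
    for (size_t i = 0; i < m_nsocks; i++)
        if (m_socks[i].netns == netns)
            n++;
    return n;
}
```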
      
      Thanks to Sachin Sant and Leonard Crestez for reports, logs and a reproducer.
      
      Fixes: cf2f225e ("af_unix: Put a socket into a per-netns hash table.")
      Reported-by: Sachin Sant <sachinp@linux.ibm.com>
      Reported-by: Leonard Crestez <cdleonard@gmail.com>
      Tested-by: Sachin Sant <sachinp@linux.ibm.com>
      Tested-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Tested-by: Leonard Crestez <cdleonard@gmail.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  10. 22 Jun 2022, 6 commits
  11. 20 Jun 2022, 1 commit
  12. 10 Jun 2022, 1 commit
  13. 07 Jun 2022, 1 commit
  14. 17 May 2022, 1 commit
  15. 12 Apr 2022, 1 commit
      net: remove noblock parameter from recvmsg() entities · ec095263
      Oliver Hartkopp committed
      The internal recvmsg() functions have two parameters 'flags' and 'noblock'
      that were merged inside skb_recv_datagram(). As a follow up patch to commit
      f4b41f06 ("net: remove noblock parameter from skb_recv_datagram()")
      this patch removes the separate 'noblock' parameter for recvmsg().
      
      Analogous to the referenced patch for skb_recv_datagram(), the 'flags' and
      'noblock' parameters are unnecessarily split up with e.g.
      
      err = sk->sk_prot->recvmsg(sk, msg, size, flags & MSG_DONTWAIT,
                                 flags & ~MSG_DONTWAIT, &addr_len);
      
      or in
      
      err = INDIRECT_CALL_2(sk->sk_prot->recvmsg, tcp_recvmsg, udp_recvmsg,
                            sk, msg, size, flags & MSG_DONTWAIT,
                            flags & ~MSG_DONTWAIT, &addr_len);
      
      instead of simply always using flags and checking for MSG_DONTWAIT where
      needed (to preserve the formerly separate no(n)block condition).
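The patch changes only in-kernel plumbing. From userspace, MSG_DONTWAIT has always travelled in flags. The following small sketch, which assumes Linux, shows the behavior the change must preserve. A recv() with MSG_DONTWAIT on an empty, otherwise blocking AF_UNIX socket returns EAGAIN (or EWOULDBLOCK) instead of sleeping. The helper name is hypothetical.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns errno from a non-blocking recv() on an
 * empty AF_UNIX socket, 0 if recv() unexpectedly succeeded, or -1 if
 * the socket pair could not be created. */
static int recv_dontwait_errno_on_empty(void)
{
    int sv[2];
    char buf[1];

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    /* The socket itself is blocking; MSG_DONTWAIT in flags alone makes
     * this particular call non-blocking. */
    ssize_t n = recv(sv[0], buf, sizeof(buf), MSG_DONTWAIT);
    int err = (n < 0) ? errno : 0;

    close(sv[0]);
    close(sv[1]);
    return err;
}
```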
      Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Link: https://lore.kernel.org/r/20220411124955.154876-1-socketcan@hartkopp.net
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  16. 06 Apr 2022, 1 commit
      net: remove noblock parameter from skb_recv_datagram() · f4b41f06
      Oliver Hartkopp committed
      skb_recv_datagram() has two parameters 'flags' and 'noblock' that are
      merged inside skb_recv_datagram() by 'flags | (noblock ? MSG_DONTWAIT : 0)'
      
      As 'flags' may contain MSG_DONTWAIT as a value, most callers split 'flags'
      into 'flags' and 'noblock' with ultimately redundant bit operations like this:
      
      skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT, &rc);
      
      And this is not even done consistently with the 'flags' parameter.
      
      This patch removes the obsolete and costly splitting into two parameters
      and only performs bit operations when really needed on the caller side.
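The following quick check uses two hypothetical helpers to show that the split-and-remerge pattern is a no-op. Splitting flags into (flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT) and merging them back with `flags | (noblock ? MSG_DONTWAIT : 0)` reproduces the original flags. The callee can therefore test MSG_DONTWAIT in flags directly.

```c
#include <stdbool.h>
#include <sys/socket.h>

/* Old convention (hypothetical helper): the callee re-merges the
 * split arguments, exactly as skb_recv_datagram() did. */
static int m_merge_old(int flags, int noblock)
{
    return flags | (noblock ? MSG_DONTWAIT : 0);
}

/* New convention (hypothetical helper): flags alone decides. */
static bool m_nonblocking(int flags)
{
    return (flags & MSG_DONTWAIT) != 0;
}
```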
      
      One missing conversion was thankfully reported by the kernel test robot;
      I had forgotten to enable the kunit tests needed to build the mctp code.
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 19 Mar 2022, 1 commit
  18. 18 Mar 2022, 2 commits
      af_unix: Support POLLPRI for OOB. · d9a232d4
      Kuniyuki Iwashima committed
      The commit 314001f0 ("af_unix: Add OOB support") introduced OOB for
      AF_UNIX, but it lacks some changes for POLLPRI.  Let's add the missing
      piece.
      
      In the selftest, normal datagrams are sent followed by OOB data, so this
      commit replaces `POLLIN | POLLPRI` with just `POLLPRI` in the first test
      case.
      
      Fixes: 314001f0 ("af_unix: Add OOB support")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      af_unix: Fix some data-races around unix_sk(sk)->oob_skb. · e82025c6
      Kuniyuki Iwashima committed
      Out-of-band data automatically places a "mark" showing where in the
      sequence the out-of-band data would have been.  If the out-of-band data
      implies cancelling everything sent so far, the "mark" is helpful to flush
      them.  When the socket's read pointer reaches the "mark", the SIOCATMARK
      ioctl() sets the argument `atmark` to a non-zero value.
      
      The out-of-band data is queued in sk->sk_receive_queue as well as ordinary
      data and also saved in unix_sk(sk)->oob_skb.  It can be used to test if the
      head of the receive queue is the out-of-band data meaning the socket is at
      the "mark".
      
      While testing that, unix_ioctl() reads unix_sk(sk)->oob_skb locklessly.
      Thus, all accesses to oob_skb need some basic protection to avoid
      load/store tearing which KCSAN detects when these are called concurrently:
      
        - ioctl(fd_a, SIOCATMARK, &atmark, sizeof(atmark))
        - send(fd_b_connected_to_a, buf, sizeof(buf), MSG_OOB)
      
      BUG: KCSAN: data-race in unix_ioctl / unix_stream_sendmsg
      
      write to 0xffff888003d9cff0 of 8 bytes by task 175 on cpu 1:
       unix_stream_sendmsg (net/unix/af_unix.c:2087 net/unix/af_unix.c:2191)
       sock_sendmsg (net/socket.c:705 net/socket.c:725)
       __sys_sendto (net/socket.c:2040)
       __x64_sys_sendto (net/socket.c:2048)
       do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
       entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
      
      read to 0xffff888003d9cff0 of 8 bytes by task 176 on cpu 0:
       unix_ioctl (net/unix/af_unix.c:3101 (discriminator 1))
       sock_do_ioctl (net/socket.c:1128)
       sock_ioctl (net/socket.c:1242)
       __x64_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:874 fs/ioctl.c:860 fs/ioctl.c:860)
       do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
       entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
      
      value changed: 0xffff888003da0c00 -> 0xffff888003da0d00
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 176 Comm: unix_race_oob_i Not tainted 5.17.0-rc5-59529-g83dc4c2a #12
      Hardware name: Red Hat KVM, BIOS 1.11.0-2.amzn2 04/01/2014
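In userspace C11, the usual analog of READ_ONCE()/WRITE_ONCE() for a lockless pointer like this is a relaxed atomic load or store. It rules out torn accesses without adding ordering. The following sketch uses hypothetical types. Its at-mark test compares the receive-queue head with oob_skb, as the commit message describes.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct m_skb { int len; };

struct m_usock {
    _Atomic(struct m_skb *) oob_skb;  /* read locklessly by the ioctl path */
    struct m_skb *queue_head;         /* head of the receive queue */
};

static void m_usock_init(struct m_usock *u, struct m_skb *head)
{
    atomic_init(&u->oob_skb, NULL);
    u->queue_head = head;
}

/* Send path: a single untorn store, similar in spirit to WRITE_ONCE(). */
static void m_set_oob(struct m_usock *u, struct m_skb *skb)
{
    atomic_store_explicit(&u->oob_skb, skb, memory_order_relaxed);
}

/* SIOCATMARK path: a single untorn load, similar in spirit to READ_ONCE(). */
static bool m_at_mark(struct m_usock *u)
{
    struct m_skb *oob = atomic_load_explicit(&u->oob_skb, memory_order_relaxed);
    return oob != NULL && oob == u->queue_head;
}
```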
      
      Fixes: 314001f0 ("af_unix: Add OOB support")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  19. 19 Jan 2022, 3 commits
  20. 30 Dec 2021, 1 commit
  21. 27 Nov 2021, 12 commits
      af_unix: Relax race in unix_autobind(). · 9acbc584
      Kuniyuki Iwashima committed
      When we bind an AF_UNIX socket without a name specified, the kernel selects
      an available one from 0x00000 to 0xFFFFF.  unix_autobind() starts searching
      from a number in the 'static' variable and increments it after acquiring
      two locks.
      
      If multiple processes try autobind, they obtain the same lock and check if
      a socket in the hash list has the same name.  If not, one process uses it,
      and all the others end up retrying with the _next_ number (not exactly,
      since it may also have been incremented by the other processes).  The more
      sockets we autobind in parallel, the longer the latency gets.  We can avoid
      such a race by starting the search for a name from a random number.
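The following sketch shows the random-start search, using hypothetical names. It probes names from a start value up through 0xFFFFF, wraps around, and gives up after one full lap. The caller supplies the random start, for example from rand(). The kernel's own random source is not modeled.

```c
#include <stdbool.h>

#define M_ORDERNUM_MAX 0xFFFFFu   /* names 0x00000 .. 0xFFFFF */

/* Hypothetical "is this name taken?" predicate supplied by the caller. */
typedef bool (*m_name_in_use_fn)(unsigned int ordernum, void *ctx);

/* Probe successive names from 'start', wrapping around; returns the
 * first free name, or -1 after one full lap with everything taken. */
static long m_autobind_pick(m_name_in_use_fn in_use, void *ctx,
                            unsigned int start)
{
    unsigned int ordernum = start & M_ORDERNUM_MAX;
    unsigned int last = ordernum;

    do {
        if (!in_use(ordernum, ctx))
            return (long)ordernum;
        ordernum = (ordernum + 1) & M_ORDERNUM_MAX;
    } while (ordernum != last);

    return -1;
}

/* Example predicate: every name below *(unsigned int *)ctx is taken. */
static bool m_taken_below(unsigned int ordernum, void *ctx)
{
    return ordernum < *(unsigned int *)ctx;
}
```

With a random start, concurrent callers usually begin probing in different places, so they rarely collide on the same candidate name.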
      
      These histograms show the latency in unix_autobind() while 64 CPUs
      simultaneously autobind 1024 sockets each.
      
        Without this patch:
      
           usec          : count     distribution
              0          : 1176     |***                                     |
              2          : 3655     |***********                             |
              4          : 4094     |*************                           |
              6          : 3831     |************                            |
              8          : 3829     |************                            |
              10         : 3844     |************                            |
              12         : 3638     |***********                             |
              14         : 2992     |*********                               |
              16         : 2485     |*******                                 |
              18         : 2230     |*******                                 |
              20         : 2095     |******                                  |
              22         : 1853     |*****                                   |
              24         : 1827     |*****                                   |
              26         : 1677     |*****                                   |
              28         : 1473     |****                                    |
              30         : 1573     |*****                                   |
              32         : 1417     |****                                    |
              34         : 1385     |****                                    |
              36         : 1345     |****                                    |
              38         : 1344     |****                                    |
              40         : 1200     |***                                     |
      
        With this patch:
      
           usec          : count     distribution
              0          : 1855     |******                                  |
              2          : 6464     |*********************                   |
              4          : 9936     |********************************        |
              6          : 12107    |****************************************|
              8          : 10441    |**********************************      |
              10         : 7264     |***********************                 |
              12         : 4254     |**************                          |
              14         : 2538     |********                                |
              16         : 1596     |*****                                   |
              18         : 1088     |***                                     |
              20         : 800      |**                                      |
              22         : 670      |**                                      |
              24         : 601      |*                                       |
              26         : 562      |*                                       |
              28         : 525      |*                                       |
              30         : 446      |*                                       |
              32         : 378      |*                                       |
              34         : 337      |*                                       |
              36         : 317      |*                                       |
              38         : 314      |*                                       |
              40         : 298      |                                        |
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      af_unix: Replace the big lock with small locks. · afd20b92
      Kuniyuki Iwashima committed
      The hash table of AF_UNIX sockets is protected by a single lock.  This
      patch replaces it with per-hash locks.
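The following is a minimal per-bucket locking sketch. It assumes a chained hash table and uses C11 spinlocks with hypothetical names, not the kernel's spinlock_t. An insert takes only the lock of its own chain, so inserts into different chains do not contend on one global lock.

```c
#include <stdatomic.h>
#include <stddef.h>

#define M_BUCKETS 256

struct m_node {
    struct m_node *next;
    unsigned int hash;
};

struct m_bucket {
    atomic_flag lock;        /* one lock per chain, not one global lock */
    struct m_node *head;
};

static struct m_bucket m_table[M_BUCKETS];

/* atomic_flag must be explicitly cleared before first use. */
static void m_table_init(void)
{
    for (size_t i = 0; i < M_BUCKETS; i++) {
        atomic_flag_clear(&m_table[i].lock);
        m_table[i].head = NULL;
    }
}

static void m_bucket_lock(struct m_bucket *b)
{
    while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
        ;   /* spin */
}

static void m_bucket_unlock(struct m_bucket *b)
{
    atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Only the target chain is locked while linking the node in. */
static void m_insert(struct m_node *n, unsigned int hash)
{
    struct m_bucket *b = &m_table[hash % M_BUCKETS];

    n->hash = hash;
    m_bucket_lock(b);
    n->next = b->head;
    b->head = n;
    m_bucket_unlock(b);
}
```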
      
      The effect is noticeable when we handle multiple sockets simultaneously.
      Here is a test result on an EC2 c5.24xlarge instance.  It shows latency
      (under 10us only) in unix_insert_unbound_socket() while 64 CPUs create
      1024 sockets each in parallel.
      
        Without this patch:
      
           nsec          : count     distribution
              0          : 179      |                                        |
              500        : 3021     |*********                               |
              1000       : 6271     |*******************                     |
              1500       : 6318     |*******************                     |
              2000       : 5828     |*****************                       |
              2500       : 5124     |***************                         |
              3000       : 4426     |*************                           |
              3500       : 3672     |***********                             |
              4000       : 3138     |*********                               |
              4500       : 2811     |********                                |
              5000       : 2384     |*******                                 |
              5500       : 2023     |******                                  |
              6000       : 1954     |*****                                   |
              6500       : 1737     |*****                                   |
              7000       : 1749     |*****                                   |
              7500       : 1520     |****                                    |
              8000       : 1469     |****                                    |
              8500       : 1394     |****                                    |
              9000       : 1232     |***                                     |
              9500       : 1138     |***                                     |
              10000      : 994      |***                                     |
      
        With this patch:
      
           nsec          : count     distribution
              0          : 1634     |****                                    |
              500        : 13170    |****************************************|
              1000       : 13156    |*************************************** |
              1500       : 9010     |***************************             |
              2000       : 6363     |*******************                     |
              2500       : 4443     |*************                           |
              3000       : 3240     |*********                               |
              3500       : 2549     |*******                                 |
              4000       : 1872     |*****                                   |
              4500       : 1504     |****                                    |
              5000       : 1247     |***                                     |
              5500       : 1035     |***                                     |
              6000       : 889      |**                                      |
              6500       : 744      |**                                      |
              7000       : 634      |*                                       |
              7500       : 498      |*                                       |
              8000       : 433      |*                                       |
              8500       : 355      |*                                       |
              9000       : 336      |*                                       |
              9500       : 284      |                                        |
              10000      : 243      |                                        |
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      af_unix: Save hash in sk_hash. · e6b4b873
      Kuniyuki Iwashima committed
      To replace unix_table_lock with per-hash locks in the next patch, we need
      to save a hash in each socket because /proc/net/unix or BPF prog iterate
      sockets while holding a hash table lock and release it later in a different
      function.
      
      Currently, we store a real/pseudo hash in struct unix_address.  However, we
      do not allocate it for unbound sockets, nor should we do so just for that.  For
      this purpose, we can use sk_hash.  Then, we no longer use the hash field in
      struct unix_address and can remove it.
      
      Also, this patch does
        - rename unix_insert_socket() to unix_insert_unbound_socket()
        - remove the redundant list argument from __unix_insert_socket() and
           unix_insert_unbound_socket()
        - use 'unsigned int' instead of 'unsigned' in __unix_set_addr_hash()
        - remove 'inline' from unix_remove_socket() and
           unix_insert_unbound_socket().
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      af_unix: Add helpers to calculate hashes. · f452be49
      Kuniyuki Iwashima committed
      This patch adds three helper functions that calculate hashes for unbound
      sockets and bound sockets with BSD/abstract addresses.
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      af_unix: Remove UNIX_ABSTRACT() macro and test sun_path[0] instead. · 5ce7ab49
      Kuniyuki Iwashima committed
      In BSD and abstract address cases, we store sockets in the hash table with
      keys between 0 and UNIX_HASH_SIZE - 1.  However, the hash saved in a socket
      varies depending on its address type; sockets with BSD addresses always
      have UNIX_HASH_SIZE in their unix_sk(sk)->addr->hash.
      
      This is just for the UNIX_ABSTRACT() macro used to check the address type.
      The difference between the saved hashes stems from the first byte of the
      address in the first place.  So, we can test it directly.
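The following sketch shows the direct test. It uses the real struct sockaddr_un from <sys/un.h> and a hypothetical classifier. An address with no sun_path bytes is unnamed, a leading NUL byte marks an abstract name, and anything else is a filesystem (BSD) path.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/un.h>

enum m_addr_type { M_ADDR_UNNAMED, M_ADDR_ABSTRACT, M_ADDR_PATHNAME };

/* Hypothetical classifier: decides the type from the address length
 * and the first byte of sun_path, with no stored marker needed. */
static enum m_addr_type m_addr_type(const struct sockaddr_un *sun,
                                    socklen_t len)
{
    /* No bytes of sun_path at all: an unnamed socket. */
    if (len <= offsetof(struct sockaddr_un, sun_path))
        return M_ADDR_UNNAMED;

    /* A leading NUL marks an abstract name; otherwise it is a path. */
    return sun->sun_path[0] == '\0' ? M_ADDR_ABSTRACT : M_ADDR_PATHNAME;
}
```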
      
      Then we can keep a real hash in each socket and replace unix_table_lock
      with per-hash locks in the later patch.
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      af_unix: Allocate unix_address in unix_bind_(bsd|abstract)(). · 12f21c49
      Kuniyuki Iwashima committed
      To terminate the address with '\0' in unix_bind_bsd(), we add
      unix_create_addr() and call it in unix_bind_bsd() and unix_bind_abstract().
      
      Also, unix_bind_abstract() does not return -EEXIST.  Only
      kern_path_create() and vfs_mknod() in unix_bind_bsd() can return it,
      so we move the last error check in unix_bind() to unix_bind_bsd().
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      af_unix: Remove unix_mkname(). · 5c32a3ed
      Kuniyuki Iwashima committed
      This patch removes unix_mkname() and postpones calculating a hash to
      unix_bind_abstract().  Some BSD-specific code still remains in unix_bind(),
      though; the next patch packs it into unix_bind_bsd().
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      af_unix: Copy unix_mkname() into unix_find_(bsd|abstract)(). · d2d8c9fd
      Kuniyuki Iwashima committed
      We should not call unix_mkname() before unix_find_other() and instead do
      the same thing where necessary based on the address type:
      
        - terminating the address with '\0' in unix_find_bsd()
        - calculating the hash in unix_find_abstract().
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      af_unix: Cut unix_validate_addr() out of unix_mkname(). · b8a58aa6
      Kuniyuki Iwashima committed
      unix_mkname() tests socket address length and family and does some
      processing based on the address type.  It is called at an early stage,
      so some of its instructions are redundant and can end up wasted.
      
      The address length/family tests are done twice in unix_bind().  Also, the
      address type is rechecked later in unix_bind() and unix_find_other(), where
      we can do the same processing.  Moreover, in the BSD address case, the hash
      is set to 0 but never used and confusing.
      
      This patch moves the address tests out of unix_mkname(), and the following
      patches move the other part into appropriate places and remove
      unix_mkname() finally.
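The following sketch shows up-front address validation with a hypothetical helper. Its exact bounds are an assumption modeled on the usual AF_UNIX rules. The length must cover at least one byte of sun_path and must not exceed struct sockaddr_un, and the family must be AF_UNIX. Per-type processing (BSD path versus abstract name) can then happen later, where each address type is handled.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: the length and family checks, done exactly once. */
static int m_validate_addr(const struct sockaddr_un *sun, socklen_t len)
{
    if (len <= offsetof(struct sockaddr_un, sun_path) || len > sizeof(*sun))
        return -EINVAL;
    if (sun->sun_family != AF_UNIX)
        return -EINVAL;
    return 0;
}
```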
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      af_unix: Return an error as a pointer in unix_find_other(). · aed26f55
      Kuniyuki Iwashima committed
      We can return an error as a pointer and need not pass an additional
      argument to unix_find_other().
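The kernel idiom here is ERR_PTR()/IS_ERR()/PTR_ERR(). A small negative errno is stored in the pointer itself, and the top 4095 addresses are reserved for it. The following userspace re-creation uses hypothetical m_ names and a hypothetical lookup function.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define M_MAX_ERRNO 4095

/* Encode a negative errno as a pointer in the reserved top range. */
static void *m_err_ptr(long error) { return (void *)(intptr_t)error; }
static long m_ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
static bool m_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-M_MAX_ERRNO;
}

struct m_sock { int id; };
static struct m_sock m_listener = { 1 };

/* Hypothetical lookup: a valid pointer or an encoded -errno, so no
 * separate "int *error" out-parameter is needed. */
static struct m_sock *m_find_other(int want_id)
{
    if (want_id == m_listener.id)
        return &m_listener;
    return m_err_ptr(-ECONNREFUSED);
}
```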
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      af_unix: Factorise unix_find_other() based on address types. · fa39ef0e
      Kuniyuki Iwashima committed
      As done in the commit fa42d910 ("unix_bind(): take BSD and abstract
      address cases into new helpers"), this patch moves BSD and abstract address
      cases from unix_find_other() into unix_find_bsd() and unix_find_abstract().
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      af_unix: Pass struct sock to unix_autobind(). · f7ed31f4
      Kuniyuki Iwashima committed
      We do not use struct socket in unix_autobind() and pass struct sock to
      unix_bind_bsd() and unix_bind_abstract().  Let's pass it to unix_autobind()
      as well.
      
      Also, this patch fixes these errors reported by checkpatch.pl.
      
        ERROR: do not use assignment in if condition
        #1795: FILE: net/unix/af_unix.c:1795:
        +	if (test_bit(SOCK_PASSCRED, &sock->flags) && !u->addr
      
        CHECK: Logical continuations should be on the previous line
        #1796: FILE: net/unix/af_unix.c:1796:
        +	if (test_bit(SOCK_PASSCRED, &sock->flags) && !u->addr
        +	    && (err = unix_autobind(sock)) != 0)
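The following sketch shows one checkpatch-clean shape for that condition, using hypothetical stand-in types. There is no assignment inside the if condition and no continuation line starting with `&&`.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the pieces named in the checkpatch output. */
struct m_sk {
    bool passcred;
    bool has_addr;
    int autobind_calls;
};

static int m_autobind(struct m_sk *sk)
{
    sk->autobind_calls++;
    sk->has_addr = true;
    return 0;
}

/* Assignment moved out of the condition; error checked separately. */
static int m_maybe_autobind(struct m_sk *sk)
{
    int err;

    if (sk->passcred && !sk->has_addr) {
        err = m_autobind(sk);
        if (err)
            return err;
    }
    return 0;
}
```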
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>