1. 07 7月, 2022 1 次提交
    • K
      af_unix: Optimise hash table layout. · cf21b355
      Kuniyuki Iwashima 提交于
      Commit 6dd4142f ("Merge branch 'af_unix-per-netns-socket-hash'") and
      commit 51bae889 ("af_unix: Put pathname sockets in the global hash
      table.") changed a hash table layout.
      
        Before:
          unix_socket_table [0   - 255] : abstract & pathname sockets
                            [256 - 511] : unnamed sockets
      
        After:
          per-netns table   [0   - 255] : abstract & pathname sockets
                            [256 - 511] : unnamed sockets
          bsd_socket_table  [0   - 255] : pathname sockets (sk_bind_node)
      
      Now, while looking up sockets, we traverse the global table for the
      pathname sockets and the first half of each per-netns hash table for
      abstract sockets, where pathname sockets are also linked.  Thus, the
      more pathname sockets we have, the longer we take to look up abstract
      sockets.  This characteristic has been there before the layout change,
      but we can improve it now.
      
      This patch changes the per-netns hash table's layout so that sockets not
      requiring lookup reside in the first half and do not impact the lookup of
      abstract sockets.
      
          per-netns table   [0   - 255] : pathname & unnamed sockets
                            [256 - 511] : abstract sockets
          bsd_socket_table  [0   - 255] : pathname sockets (sk_bind_node)
      
      We have run a test that bind()s 100,000 abstract/pathname sockets for
      each, bind()s an abstract socket 100,000 times and measures the time
      on __unix_find_socket_byname().  The result shows that the patch makes
      each lookup faster.
      
        Without this patch:
          $ sudo ./funclatency -p 2278 --microseconds __unix_find_socket_byname.isra.44
           usec                : count    distribution
               0 -> 1          : 0        |                                        |
               2 -> 3          : 0        |                                        |
               4 -> 7          : 0        |                                        |
               8 -> 15         : 126      |                                        |
              16 -> 31         : 1438     |*                                       |
              32 -> 63         : 4150     |***                                     |
              64 -> 127        : 9049     |*******                                 |
             128 -> 255        : 37704    |*******************************         |
             256 -> 511        : 47533    |****************************************|
      
        With this patch:
          $ sudo ./funclatency -p 3648 --microseconds __unix_find_socket_byname.isra.46
           usec                : count    distribution
               0 -> 1          : 109      |                                        |
               2 -> 3          : 318      |                                        |
               4 -> 7          : 725      |                                        |
               8 -> 15         : 2501     |*                                       |
              16 -> 31         : 3061     |**                                      |
              32 -> 63         : 4028     |***                                     |
              64 -> 127        : 9312     |*******                                 |
             128 -> 255        : 51372    |****************************************|
             256 -> 511        : 28574    |**********************                  |
      Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20220705233715.759-1-kuniyu@amazon.comSigned-off-by: NPaolo Abeni <pabeni@redhat.com>
      cf21b355
  2. 05 7月, 2022 1 次提交
    • K
      af_unix: Put pathname sockets in the global hash table. · 51bae889
      Kuniyuki Iwashima 提交于
      Commit cf2f225e ("af_unix: Put a socket into a per-netns hash table.")
      accidentally broke user API for pathname sockets.  A socket was able to
      connect() to a pathname socket whose file was visible even if they were in
      different network namespaces.
      
      The commit puts all sockets into a per-netns hash table.  As a result,
      connect() to a pathname socket in a different netns fails to find it in the
      caller's per-netns hash table and returns -ECONNREFUSED even when the task
      can view the peer socket file.
      
      We can reproduce this issue by:
      
        Console A:
      
          # python3
          >>> from socket import *
          >>> s = socket(AF_UNIX, SOCK_STREAM, 0)
          >>> s.bind('test')
          >>> s.listen(32)
      
        Console B:
      
          # ip netns add test
          # ip netns exec test sh
          # python3
          >>> from socket import *
          >>> s = socket(AF_UNIX, SOCK_STREAM, 0)
          >>> s.connect('test')
      
      Note when dumping sockets by sock_diag, procfs, and bpf_iter, they are
      filtered only by netns.  In other words, even if they are visible and
      connect()able, all sockets in different netns are skipped while iterating
      sockets.  Thus, we need a fix only for finding a peer pathname socket.
      
      This patch adds a global hash table for pathname sockets, links them with
      sk_bind_node, and uses it in unix_find_socket_byinode().  By doing so, we
      can keep sockets in per-netns hash tables and dump them easily.
      
      Thanks to Sachin Sant and Leonard Crestez for reports, logs and a reproducer.
      
      Fixes: cf2f225e ("af_unix: Put a socket into a per-netns hash table.")
      Reported-by: NSachin Sant <sachinp@linux.ibm.com>
      Reported-by: NLeonard Crestez <cdleonard@gmail.com>
      Tested-by: NSachin Sant <sachinp@linux.ibm.com>
      Tested-by: NNathan Chancellor <nathan@kernel.org>
      Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
      Tested-by: NLeonard Crestez <cdleonard@gmail.com>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      51bae889
  3. 22 6月, 2022 6 次提交
  4. 10 6月, 2022 1 次提交
  5. 07 6月, 2022 1 次提交
  6. 17 5月, 2022 1 次提交
  7. 12 4月, 2022 1 次提交
    • O
      net: remove noblock parameter from recvmsg() entities · ec095263
      Oliver Hartkopp 提交于
      The internal recvmsg() functions have two parameters 'flags' and 'noblock'
      that were merged inside skb_recv_datagram(). As a follow up patch to commit
      f4b41f06 ("net: remove noblock parameter from skb_recv_datagram()")
      this patch removes the separate 'noblock' parameter for recvmsg().
      
      Analogue to the referenced patch for skb_recv_datagram() the 'flags' and
      'noblock' parameters are unnecessarily split up with e.g.
      
      err = sk->sk_prot->recvmsg(sk, msg, size, flags & MSG_DONTWAIT,
                                 flags & ~MSG_DONTWAIT, &addr_len);
      
      or in
      
      err = INDIRECT_CALL_2(sk->sk_prot->recvmsg, tcp_recvmsg, udp_recvmsg,
                            sk, msg, size, flags & MSG_DONTWAIT,
                            flags & ~MSG_DONTWAIT, &addr_len);
      
      instead of simply using only flags all the time and check for MSG_DONTWAIT
      where needed (to preserve for the formerly separated no(n)block condition).
      Signed-off-by: NOliver Hartkopp <socketcan@hartkopp.net>
      Link: https://lore.kernel.org/r/20220411124955.154876-1-socketcan@hartkopp.netSigned-off-by: NPaolo Abeni <pabeni@redhat.com>
      ec095263
  8. 06 4月, 2022 1 次提交
    • O
      net: remove noblock parameter from skb_recv_datagram() · f4b41f06
      Oliver Hartkopp 提交于
      skb_recv_datagram() has two parameters 'flags' and 'noblock' that are
      merged inside skb_recv_datagram() by 'flags | (noblock ? MSG_DONTWAIT : 0)'
      
      As 'flags' may contain MSG_DONTWAIT as value most callers split the 'flags'
      into 'flags' and 'noblock' with finally obsolete bit operations like this:
      
      skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT, &rc);
      
      And this is not even done consistently with the 'flags' parameter.
      
      This patch removes the obsolete and costly splitting into two parameters
      and only performs bit operations when really needed on the caller side.
      
      One missing conversion thankfully reported by kernel test robot. I missed
      to enable kunit tests to build the mctp code.
      Reported-by: Nkernel test robot <lkp@intel.com>
      Signed-off-by: NOliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f4b41f06
  9. 19 3月, 2022 1 次提交
  10. 18 3月, 2022 2 次提交
    • K
      af_unix: Support POLLPRI for OOB. · d9a232d4
      Kuniyuki Iwashima 提交于
      The commit 314001f0 ("af_unix: Add OOB support") introduced OOB for
      AF_UNIX, but it lacks some changes for POLLPRI.  Let's add the missing
      piece.
      
      In the selftest, normal datagrams are sent followed by OOB data, so this
      commit replaces `POLLIN | POLLPRI` with just `POLLPRI` in the first test
      case.
      
      Fixes: 314001f0 ("af_unix: Add OOB support")
      Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d9a232d4
    • K
      af_unix: Fix some data-races around unix_sk(sk)->oob_skb. · e82025c6
      Kuniyuki Iwashima 提交于
      Out-of-band data automatically places a "mark" showing wherein the
      sequence the out-of-band data would have been.  If the out-of-band data
      implies cancelling everything sent so far, the "mark" is helpful to flush
      them.  When the socket's read pointer reaches the "mark", the ioctl() below
      sets a non zero value to the arg `atmark`:
      
      The out-of-band data is queued in sk->sk_receive_queue as well as ordinary
      data and also saved in unix_sk(sk)->oob_skb.  It can be used to test if the
      head of the receive queue is the out-of-band data meaning the socket is at
      the "mark".
      
      While testing that, unix_ioctl() reads unix_sk(sk)->oob_skb locklessly.
      Thus, all accesses to oob_skb need some basic protection to avoid
      load/store tearing which KCSAN detects when these are called concurrently:
      
        - ioctl(fd_a, SIOCATMARK, &atmark, sizeof(atmark))
        - send(fd_b_connected_to_a, buf, sizeof(buf), MSG_OOB)
      
      BUG: KCSAN: data-race in unix_ioctl / unix_stream_sendmsg
      
      write to 0xffff888003d9cff0 of 8 bytes by task 175 on cpu 1:
       unix_stream_sendmsg (net/unix/af_unix.c:2087 net/unix/af_unix.c:2191)
       sock_sendmsg (net/socket.c:705 net/socket.c:725)
       __sys_sendto (net/socket.c:2040)
       __x64_sys_sendto (net/socket.c:2048)
       do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
       entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
      
      read to 0xffff888003d9cff0 of 8 bytes by task 176 on cpu 0:
       unix_ioctl (net/unix/af_unix.c:3101 (discriminator 1))
       sock_do_ioctl (net/socket.c:1128)
       sock_ioctl (net/socket.c:1242)
       __x64_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:874 fs/ioctl.c:860 fs/ioctl.c:860)
       do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
       entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
      
      value changed: 0xffff888003da0c00 -> 0xffff888003da0d00
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 176 Comm: unix_race_oob_i Not tainted 5.17.0-rc5-59529-g83dc4c2a #12
      Hardware name: Red Hat KVM, BIOS 1.11.0-2.amzn2 04/01/2014
      
      Fixes: 314001f0 ("af_unix: Add OOB support")
      Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e82025c6
  11. 19 1月, 2022 3 次提交
  12. 30 12月, 2021 1 次提交
  13. 27 11月, 2021 13 次提交
    • K
      af_unix: Relax race in unix_autobind(). · 9acbc584
      Kuniyuki Iwashima 提交于
      When we bind an AF_UNIX socket without a name specified, the kernel selects
      an available one from 0x00000 to 0xFFFFF.  unix_autobind() starts searching
      from a number in the 'static' variable and increments it after acquiring
      two locks.
      
      If multiple processes try autobind, they obtain the same lock and check if
      a socket in the hash list has the same name.  If not, one process uses it,
      and all except one end up retrying the _next_ number (actually not, it may
      be incremented by the other processes).  The more we autobind sockets in
      parallel, the longer the latency gets.  We can avoid such a race by
      searching for a name from a random number.
      
      These show latency in unix_autobind() while 64 CPUs are simultaneously
      autobind-ing 1024 sockets for each.
      
        Without this patch:
      
           usec          : count     distribution
              0          : 1176     |***                                     |
              2          : 3655     |***********                             |
              4          : 4094     |*************                           |
              6          : 3831     |************                            |
              8          : 3829     |************                            |
              10         : 3844     |************                            |
              12         : 3638     |***********                             |
              14         : 2992     |*********                               |
              16         : 2485     |*******                                 |
              18         : 2230     |*******                                 |
              20         : 2095     |******                                  |
              22         : 1853     |*****                                   |
              24         : 1827     |*****                                   |
              26         : 1677     |*****                                   |
              28         : 1473     |****                                    |
              30         : 1573     |*****                                   |
              32         : 1417     |****                                    |
              34         : 1385     |****                                    |
              36         : 1345     |****                                    |
              38         : 1344     |****                                    |
              40         : 1200     |***                                     |
      
        With this patch:
      
           usec          : count     distribution
              0          : 1855     |******                                  |
              2          : 6464     |*********************                   |
              4          : 9936     |********************************        |
              6          : 12107    |****************************************|
              8          : 10441    |**********************************      |
              10         : 7264     |***********************                 |
              12         : 4254     |**************                          |
              14         : 2538     |********                                |
              16         : 1596     |*****                                   |
              18         : 1088     |***                                     |
              20         : 800      |**                                      |
              22         : 670      |**                                      |
              24         : 601      |*                                       |
              26         : 562      |*                                       |
              28         : 525      |*                                       |
              30         : 446      |*                                       |
              32         : 378      |*                                       |
              34         : 337      |*                                       |
              36         : 317      |*                                       |
              38         : 314      |*                                       |
              40         : 298      |                                        |
      Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      9acbc584
    • K
      af_unix: Replace the big lock with small locks. · afd20b92
      Kuniyuki Iwashima 提交于
      The hash table of AF_UNIX sockets is protected by the single lock.  This
      patch replaces it with per-hash locks.
      
      The effect is noticeable when we handle multiple sockets simultaneously.
      Here is a test result on an EC2 c5.24xlarge instance.  It shows latency
      (under 10us only) in unix_insert_unbound_socket() while 64 CPUs creating
      1024 sockets for each in parallel.
      
        Without this patch:
      
           nsec          : count     distribution
              0          : 179      |                                        |
              500        : 3021     |*********                               |
              1000       : 6271     |*******************                     |
              1500       : 6318     |*******************                     |
              2000       : 5828     |*****************                       |
              2500       : 5124     |***************                         |
              3000       : 4426     |*************                           |
              3500       : 3672     |***********                             |
              4000       : 3138     |*********                               |
              4500       : 2811     |********                                |
              5000       : 2384     |*******                                 |
              5500       : 2023     |******                                  |
              6000       : 1954     |*****                                   |
              6500       : 1737     |*****                                   |
              7000       : 1749     |*****                                   |
              7500       : 1520     |****                                    |
              8000       : 1469     |****                                    |
              8500       : 1394     |****                                    |
              9000       : 1232     |***                                     |
              9500       : 1138     |***                                     |
              10000      : 994      |***                                     |
      
        With this patch:
      
           nsec          : count     distribution
              0          : 1634     |****                                    |
              500        : 13170    |****************************************|
              1000       : 13156    |*************************************** |
              1500       : 9010     |***************************             |
              2000       : 6363     |*******************                     |
              2500       : 4443     |*************                           |
              3000       : 3240     |*********                               |
              3500       : 2549     |*******                                 |
              4000       : 1872     |*****                                   |
              4500       : 1504     |****                                    |
              5000       : 1247     |***                                     |
              5500       : 1035     |***                                     |
              6000       : 889      |**                                      |
              6500       : 744      |**                                      |
              7000       : 634      |*                                       |
              7500       : 498      |*                                       |
              8000       : 433      |*                                       |
              8500       : 355      |*                                       |
              9000       : 336      |*                                       |
              9500       : 284      |                                        |
              10000      : 243      |                                        |
      Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      afd20b92
    • K
      af_unix: Save hash in sk_hash. · e6b4b873
      Kuniyuki Iwashima 提交于
      To replace unix_table_lock with per-hash locks in the next patch, we need
      to save a hash in each socket because /proc/net/unix or BPF prog iterate
      sockets while holding a hash table lock and release it later in a different
      function.
      
      Currently, we store a real/pseudo hash in struct unix_address.  However, we
      do not allocate it to unbound sockets, nor should we do just for that.  For
      this purpose, we can use sk_hash.  Then, we no longer use the hash field in
      struct unix_address and can remove it.
      
      Also, this patch does
        - rename unix_insert_socket() to unix_insert_unbound_socket()
        - remove the redundant list argument from __unix_insert_socket() and
           unix_insert_unbound_socket()
        - use 'unsigned int' instead of 'unsigned' in __unix_set_addr_hash()
        - remove 'inline' from unix_remove_socket() and
           unix_insert_unbound_socket().
      Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      e6b4b873
    • K
      af_unix: Add helpers to calculate hashes. · f452be49
      Kuniyuki Iwashima 提交于
      This patch adds three helper functions that calculate hashes for unbound
      sockets and bound sockets with BSD/abstract addresses.
      Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      f452be49
    • K
      af_unix: Remove UNIX_ABSTRACT() macro and test sun_path[0] instead. · 5ce7ab49
      Kuniyuki Iwashima 提交于
      In BSD and abstract address cases, we store sockets in the hash table with
      keys between 0 and UNIX_HASH_SIZE - 1.  However, the hash saved in a socket
      varies depending on its address type; sockets with BSD addresses always
      have UNIX_HASH_SIZE in their unix_sk(sk)->addr->hash.
      
      This is just for the UNIX_ABSTRACT() macro used to check the address type.
      The difference of the saved hashes comes from the first byte of the address
      in the first place.  So, we can test it directly.
      
      Then we can keep a real hash in each socket and replace unix_table_lock
      with per-hash locks in the later patch.
      Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      5ce7ab49
    • K
      af_unix: Allocate unix_address in unix_bind_(bsd|abstract)(). · 12f21c49
      Kuniyuki Iwashima 提交于
      To terminate address with '\0' in unix_bind_bsd(), we add
      unix_create_addr() and call it in unix_bind_bsd() and unix_bind_abstract().
      
      Also, unix_bind_abstract() does not return -EEXIST.  Only
      kern_path_create() and vfs_mknod() in unix_bind_bsd() can return it,
      so we move the last error check in unix_bind() to unix_bind_bsd().
      Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      12f21c49
    • K
      af_unix: Remove unix_mkname(). · 5c32a3ed
      Kuniyuki Iwashima 提交于
      This patch removes unix_mkname() and postpones calculating a hash to
      unix_bind_abstract().  Some BSD stuffs still remain in unix_bind()
      though, the next patch packs them into unix_bind_bsd().
      Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      5c32a3ed
    • K
      af_unix: Copy unix_mkname() into unix_find_(bsd|abstract)(). · d2d8c9fd
      Kuniyuki Iwashima 提交于
      We should not call unix_mkname() before unix_find_other() and instead do
      the same thing where necessary based on the address type:
      
        - terminating the address with '\0' in unix_find_bsd()
        - calculating the hash in unix_find_abstract().
      Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      d2d8c9fd
    • K
      af_unix: Cut unix_validate_addr() out of unix_mkname(). · b8a58aa6
      Kuniyuki Iwashima 提交于
      unix_mkname() tests socket address length and family and does some
      processing based on the address type.  It is called in the early stage,
      and therefore some instructions are redundant and can end up in vain.
      
      The address length/family tests are done twice in unix_bind().  Also, the
      address type is rechecked later in unix_bind() and unix_find_other(), where
      we can do the same processing.  Moreover, in the BSD address case, the hash
      is set to 0 but never used and confusing.
      
      This patch moves the address tests out of unix_mkname(), and the following
      patches move the other part into appropriate places and remove
      unix_mkname() finally.
      Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      b8a58aa6
    • K
      af_unix: Return an error as a pointer in unix_find_other(). · aed26f55
      Kuniyuki Iwashima 提交于
      We can return an error as a pointer and need not pass an additional
      argument to unix_find_other().
      Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      aed26f55
    • K
      af_unix: Factorise unix_find_other() based on address types. · fa39ef0e
      Kuniyuki Iwashima 提交于
      As done in the commit fa42d910 ("unix_bind(): take BSD and abstract
      address cases into new helpers"), this patch moves BSD and abstract address
      cases from unix_find_other() into unix_find_bsd() and unix_find_abstract().
      Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      fa39ef0e
    • K
      af_unix: Pass struct sock to unix_autobind(). · f7ed31f4
      Kuniyuki Iwashima 提交于
      We do not use struct socket in unix_autobind() and pass struct sock to
      unix_bind_bsd() and unix_bind_abstract().  Let's pass it to unix_autobind()
      as well.
      
      Also, this patch fixes these errors by checkpatch.pl.
      
        ERROR: do not use assignment in if condition
        #1795: FILE: net/unix/af_unix.c:1795:
        +	if (test_bit(SOCK_PASSCRED, &sock->flags) && !u->addr
      
        CHECK: Logical continuations should be on the previous line
        #1796: FILE: net/unix/af_unix.c:1796:
        +	if (test_bit(SOCK_PASSCRED, &sock->flags) && !u->addr
        +	    && (err = unix_autobind(sock)) != 0)
      Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      f7ed31f4
    • K
      af_unix: Use offsetof() instead of sizeof(). · 755662ce
      Kuniyuki Iwashima 提交于
      The length of the AF_UNIX socket address contains an offset to the member
      sun_path of struct sockaddr_un.
      
      Currently, the preceding member is just sun_family, and its type is
      sa_family_t and resolved to short.  Therefore, the offset is represented by
      sizeof(short).  However, it is not clear and fragile to changes in struct
      sockaddr_storage or sockaddr_un.
      
      This commit makes it clear and robust by rewriting sizeof() with
      offsetof().
      Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      755662ce
  14. 20 11月, 2021 1 次提交
  15. 16 11月, 2021 1 次提交
  16. 27 10月, 2021 1 次提交
  17. 12 10月, 2021 1 次提交
  18. 06 10月, 2021 1 次提交
  19. 30 9月, 2021 1 次提交
    • E
      af_unix: fix races in sk_peer_pid and sk_peer_cred accesses · 35306eb2
      Eric Dumazet 提交于
      Jann Horn reported that SO_PEERCRED and SO_PEERGROUPS implementations
      are racy, as af_unix can concurrently change sk_peer_pid and sk_peer_cred.
      
      In order to fix this issue, this patch adds a new spinlock that needs
      to be used whenever these fields are read or written.
      
      Jann also pointed out that l2cap_sock_get_peer_pid_cb() is currently
      reading sk->sk_peer_pid which makes no sense, as this field
      is only possibly set by AF_UNIX sockets.
      We will have to clean this in a separate patch.
      This could be done by reverting b48596d1 "Bluetooth: L2CAP: Add get_peer_pid callback"
      or implementing what was truly expected.
      
      Fixes: 109f6e39 ("af_unix: Allow SO_PEERCRED to work across namespaces.")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NJann Horn <jannh@google.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Cc: Marcel Holtmann <marcel@holtmann.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      35306eb2
  20. 28 9月, 2021 1 次提交
    • K
      af_unix: Return errno instead of NULL in unix_create1(). · f4bd73b5
      Kuniyuki Iwashima 提交于
      unix_create1() returns NULL on error, and the callers assume that it never
      fails for reasons other than out of memory.  So, the callers always return
      -ENOMEM when unix_create1() fails.
      
      However, it also returns NULL when the number of af_unix sockets exceeds
      twice the limit controlled by sysctl: fs.file-max.  In this case, the
      callers should return -ENFILE like alloc_empty_file().
      
      This patch changes unix_create1() to return the correct error value instead
      of NULL on error.
      
      Out of curiosity, the assumption has been wrong since 1999 due to this
      change introduced in 2.2.4 [0].
      
        diff -u --recursive --new-file v2.2.3/linux/net/unix/af_unix.c linux/net/unix/af_unix.c
        --- v2.2.3/linux/net/unix/af_unix.c	Tue Jan 19 11:32:53 1999
        +++ linux/net/unix/af_unix.c	Sun Mar 21 07:22:00 1999
        @@ -388,6 +413,9 @@
         {
         	struct sock *sk;
      
        +	if (atomic_read(&unix_nr_socks) >= 2*max_files)
        +		return NULL;
        +
         	MOD_INC_USE_COUNT;
         	sk = sk_alloc(PF_UNIX, GFP_KERNEL, 1);
         	if (!sk) {
      
      [0]: https://cdn.kernel.org/pub/linux/kernel/v2.2/patch-2.2.4.gz
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f4bd73b5