1. 30 Mar, 2021 1 commit
    • net:tipc: Fix a double free in tipc_sk_mcast_rcv · 6bf24dc0
      Lv Yunlong committed
      In the if (skb_peek(arrvq) == skb) branch, __skb_dequeue(arrvq) unlinks
      the head of arrvq and returns it. Because that head is the same buffer
      as skb (skb_peek(arrvq) == skb), kfree_skb(__skb_dequeue(arrvq)) frees
      the skb a first time.
      
      Unfortunately, the same skb is freed a second time by kfree_skb(skb) after
      the branch completes.
      
      My patch removes kfree_skb() from the if (skb_peek(arrvq) == skb) branch,
      because this skb is freed by the final kfree_skb(skb) anyway.
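A minimal user-space sketch of the ownership rule described above. The names `struct buf`, `queue_peek()`, `queue_dequeue()`, `buf_free()` and `free_count` are hypothetical stand-ins for `sk_buff`, `skb_peek()`, `__skb_dequeue()` and `kfree_skb()`; this is an illustration, not the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff: a singly linked queue node. */
struct buf {
    struct buf *next;
};

struct queue {
    struct buf *head;
};

static int free_count; /* number of frees performed, for checking */

static struct buf *queue_peek(struct queue *q)
{
    return q->head;
}

/* Unlink and return the head, like __skb_dequeue(); does not free it. */
static struct buf *queue_dequeue(struct queue *q)
{
    struct buf *b = q->head;

    if (b)
        q->head = b->next;
    return b;
}

static void buf_free(struct buf *b)
{
    if (!b)
        return;
    free_count++;
    free(b);
}

/*
 * Fixed pattern: if the buffer is still at the head of the queue,
 * only unlink it. The single free happens once, at the end.
 */
static void consume(struct queue *q, struct buf *b)
{
    if (queue_peek(q) == b)
        queue_dequeue(q); /* unlink only; the buggy version also freed here */
    buf_free(b);          /* the one and only free */
}

/* Build a one-element queue, consume it, and report how many frees ran. */
int demo_single_free(void)
{
    struct queue q = { NULL };
    struct buf *b = calloc(1, sizeof(*b));

    if (!b)
        return -1;
    q.head = b;
    free_count = 0;
    consume(&q, b);
    return (q.head == NULL) ? free_count : -1;
}
```

Calling `demo_single_free()` is expected to report exactly one free with the queue left empty; freeing inside the branch as well would release the same buffer twice.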
      
      Fixes: cb1b7280 ("tipc: eliminate race condition at multicast reception")
      Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 02 Dec, 2020 3 commits
    • net/tipc: fix all function Return: notation · 637b77fd
      Randy Dunlap committed
      Fix Return: kernel-doc notation in all net/tipc/ source files.
      Also keep ReST list notation intact for output formatting.
      Fix a few typos in comments.
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/tipc: fix socket.c kernel-doc · f172f4b8
      Randy Dunlap committed
      Fix socket.c kernel-doc warnings in preparation for adding to the
      networking docbook.
      
      Also, for rcvbuf_limit(), use bullet notation so that the lines do
      not run together.
      
      ../net/tipc/socket.c:130: warning: Function parameter or member 'cong_links' not described in 'tipc_sock'
      ../net/tipc/socket.c:130: warning: Function parameter or member 'probe_unacked' not described in 'tipc_sock'
      ../net/tipc/socket.c:130: warning: Function parameter or member 'snd_win' not described in 'tipc_sock'
      ../net/tipc/socket.c:130: warning: Function parameter or member 'peer_caps' not described in 'tipc_sock'
      ../net/tipc/socket.c:130: warning: Function parameter or member 'rcv_win' not described in 'tipc_sock'
      ../net/tipc/socket.c:130: warning: Function parameter or member 'group' not described in 'tipc_sock'
      ../net/tipc/socket.c:130: warning: Function parameter or member 'oneway' not described in 'tipc_sock'
      ../net/tipc/socket.c:130: warning: Function parameter or member 'nagle_start' not described in 'tipc_sock'
      ../net/tipc/socket.c:130: warning: Function parameter or member 'snd_backlog' not described in 'tipc_sock'
      ../net/tipc/socket.c:130: warning: Function parameter or member 'msg_acc' not described in 'tipc_sock'
      ../net/tipc/socket.c:130: warning: Function parameter or member 'pkt_cnt' not described in 'tipc_sock'
      ../net/tipc/socket.c:130: warning: Function parameter or member 'expect_ack' not described in 'tipc_sock'
      ../net/tipc/socket.c:130: warning: Function parameter or member 'nodelay' not described in 'tipc_sock'
      ../net/tipc/socket.c:130: warning: Function parameter or member 'group_is_open' not described in 'tipc_sock'
      ../net/tipc/socket.c:267: warning: Function parameter or member 'sk' not described in 'tsk_advance_rx_queue'
      ../net/tipc/socket.c:295: warning: Function parameter or member 'sk' not described in 'tsk_rej_rx_queue'
      ../net/tipc/socket.c:295: warning: Function parameter or member 'error' not described in 'tsk_rej_rx_queue'
      ../net/tipc/socket.c:894: warning: Function parameter or member 'tsk' not described in 'tipc_send_group_msg'
      ../net/tipc/socket.c:1187: warning: Function parameter or member 'net' not described in 'tipc_sk_mcast_rcv'
      ../net/tipc/socket.c:1323: warning: Function parameter or member 'inputq' not described in 'tipc_sk_conn_proto_rcv'
      ../net/tipc/socket.c:1323: warning: Function parameter or member 'xmitq' not described in 'tipc_sk_conn_proto_rcv'
      ../net/tipc/socket.c:1885: warning: Function parameter or member 'sock' not described in 'tipc_recvmsg'
      ../net/tipc/socket.c:1993: warning: Function parameter or member 'sock' not described in 'tipc_recvstream'
      ../net/tipc/socket.c:2313: warning: Function parameter or member 'xmitq' not described in 'tipc_sk_filter_rcv'
      ../net/tipc/socket.c:2404: warning: Function parameter or member 'xmitq' not described in 'tipc_sk_enqueue'
      ../net/tipc/socket.c:2456: warning: Function parameter or member 'net' not described in 'tipc_sk_rcv'
      ../net/tipc/socket.c:2693: warning: Function parameter or member 'kern' not described in 'tipc_accept'
      ../net/tipc/socket.c:3816: warning: Excess function parameter 'sysctl_tipc_sk_filter' description in 'tipc_sk_filtering'
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/tipc: fix various kernel-doc warnings · 5fcb7d47
      Randy Dunlap committed
      kernel-doc and Sphinx fixes to eliminate lots of warnings
      in preparation for adding to the networking docbook.
      
      ../net/tipc/crypto.c:57: warning: cannot understand function prototype: 'enum '
      ../net/tipc/crypto.c:69: warning: cannot understand function prototype: 'enum '
      ../net/tipc/crypto.c:130: warning: Function parameter or member 'tfm' not described in 'tipc_tfm'
      ../net/tipc/crypto.c:130: warning: Function parameter or member 'list' not described in 'tipc_tfm'
      ../net/tipc/crypto.c:172: warning: Function parameter or member 'stat' not described in 'tipc_crypto_stats'
      ../net/tipc/crypto.c:232: warning: Function parameter or member 'flags' not described in 'tipc_crypto'
      ../net/tipc/crypto.c:329: warning: Function parameter or member 'ukey' not described in 'tipc_aead_key_validate'
      ../net/tipc/crypto.c:329: warning: Function parameter or member 'info' not described in 'tipc_aead_key_validate'
      ../net/tipc/crypto.c:482: warning: Function parameter or member 'aead' not described in 'tipc_aead_tfm_next'
      ../net/tipc/trace.c:43: warning: cannot understand function prototype: 'unsigned long sysctl_tipc_sk_filter[5] __read_mostly = '
      
      Documentation/networking/tipc:57: ../net/tipc/msg.c:584: WARNING: Unexpected indentation.
      Documentation/networking/tipc:63: ../net/tipc/name_table.c:536: WARNING: Unexpected indentation.
      Documentation/networking/tipc:63: ../net/tipc/name_table.c:537: WARNING: Block quote ends without a blank line; unexpected unindent.
      Documentation/networking/tipc:78: ../net/tipc/socket.c:3809: WARNING: Unexpected indentation.
      Documentation/networking/tipc:78: ../net/tipc/socket.c:3807: WARNING: Inline strong start-string without end-string.
      Documentation/networking/tipc:72: ../net/tipc/node.c:904: WARNING: Unexpected indentation.
      Documentation/networking/tipc:39: ../net/tipc/crypto.c:97: WARNING: Block quote ends without a blank line; unexpected unindent.
      Documentation/networking/tipc:39: ../net/tipc/crypto.c:98: WARNING: Block quote ends without a blank line; unexpected unindent.
      Documentation/networking/tipc:39: ../net/tipc/crypto.c:141: WARNING: Inline strong start-string without end-string.
      
      ../net/tipc/discover.c:82: warning: Function parameter or member 'skb' not described in 'tipc_disc_init_msg'
      
      ../net/tipc/msg.c:69: warning: Function parameter or member 'gfp' not described in 'tipc_buf_acquire'
      ../net/tipc/msg.c:382: warning: Function parameter or member 'offset' not described in 'tipc_msg_build'
      ../net/tipc/msg.c:708: warning: Function parameter or member 'net' not described in 'tipc_msg_lookup_dest'
      
      ../net/tipc/subscr.c:65: warning: Function parameter or member 'seq' not described in 'tipc_sub_check_overlap'
      ../net/tipc/subscr.c:65: warning: Function parameter or member 'found_lower' not described in 'tipc_sub_check_overlap'
      ../net/tipc/subscr.c:65: warning: Function parameter or member 'found_upper' not described in 'tipc_sub_check_overlap'
      
      ../net/tipc/udp_media.c:75: warning: Function parameter or member 'proto' not described in 'udp_media_addr'
      ../net/tipc/udp_media.c:75: warning: Function parameter or member 'port' not described in 'udp_media_addr'
      ../net/tipc/udp_media.c:75: warning: Function parameter or member 'ipv4' not described in 'udp_media_addr'
      ../net/tipc/udp_media.c:75: warning: Function parameter or member 'ipv6' not described in 'udp_media_addr'
      ../net/tipc/udp_media.c:98: warning: Function parameter or member 'rcast' not described in 'udp_bearer'
      
      Also fixed a typo of "duest" to "dest".
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  3. 28 Nov, 2020 2 commits
  4. 30 Oct, 2020 1 commit
    • tipc: add stricter control of reserved service types · 72671b35
      Jon Maloy committed
      TIPC reserves 64 service types for current and future internal use.
      Therefore, the bind() function is meant to block regular user sockets
      from being bound to these values, while it should let through such
      bindings from internal users.
      
      However, since at design time we saw no way to distinguish between
      regular and internal users, the filter function ended up allowing
      all bindings of the reserved types that were actually in use ([0,1])
      and blocking all the rest ([2,63]).
      
      This is risky, since a regular user may bind to the service type
      representing the topology server (TIPC_TOP_SRV == 1) or the one used
      for indicating neighboring node status (TIPC_CFG_SRV == 0), and wreak
      havoc for users of those services, i.e., most users.
      
      The reality is however that TIPC_CFG_SRV never is bound through the
      bind() function, since it doesn't represent a regular socket, and
      TIPC_TOP_SRV can also be made to bypass the checks in tipc_bind()
      by introducing a different entry function, tipc_sk_bind().
      
      It should be noted that although this is a change of the API semantics,
      there is no risk we will break any currently working applications by
      doing this. Any application trying to bind to the values in question
      would be badly broken from the outset, so there is no chance we would
      find any such applications in real-world production systems.
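A small sketch of the two entry points described above, under stated assumptions: `bind_user()` and `bind_internal()` are hypothetical names modelling the roles of tipc_bind() and tipc_sk_bind(), the constant 64 comes from the commit text, and the `-EACCES` return value is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define RESERVED_TYPES 64u /* service types 0..63 are reserved (from the text) */
#define CFG_SRV        0u  /* neighboring node status, per the text */
#define TOP_SRV        1u  /* topology server, per the text */

/*
 * Internal entry point (role of tipc_sk_bind()): no reserved-type
 * filtering, so kernel-internal users such as the topology server
 * can still bind. The real function does the actual binding work.
 */
int bind_internal(uint32_t type)
{
    (void)type;
    return 0;
}

/*
 * User entry point (role of tipc_bind()): reject every reserved
 * service type, then delegate to the internal entry point.
 * The error code is an assumption of this sketch.
 */
int bind_user(uint32_t type)
{
    if (type < RESERVED_TYPES)
        return -EACCES;
    return bind_internal(type);
}
```

With this split, a regular socket can no longer bind to `TOP_SRV` or `CFG_SRV`, while the internal path is unaffected.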
      
      v2: Added warning printout when a user is blocked from binding,
          as suggested by Jakub Kicinski
      Acked-by: Yung Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jmaloy@redhat.com>
      Link: https://lore.kernel.org/r/20201030012938.489557-1-jmaloy@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  5. 19 Sep, 2020 1 commit
  6. 11 Sep, 2020 1 commit
    • tipc: fix shutdown() of connection oriented socket · a4b5cc9e
      Tetsuo Handa committed
      I confirmed that the problem fixed by commit 2a63866c ("tipc: fix
      shutdown() of connectionless socket") also applies to stream socket.
      
      ----------
      #include <sys/socket.h>
      #include <unistd.h>
      #include <sys/wait.h>
      
      int main(int argc, char *argv[])
      {
              int fds[2] = { -1, -1 };
              socketpair(PF_TIPC, SOCK_STREAM /* or SOCK_DGRAM */, 0, fds);
              if (fork() == 0)
                      _exit(read(fds[0], NULL, 1));
              shutdown(fds[0], SHUT_RDWR); /* This must make read() return. */
              wait(NULL); /* To be woken up by _exit(). */
              return 0;
      }
      ----------
      
      Since shutdown(SHUT_RDWR) should affect all processes sharing that socket,
      unconditionally setting sk->sk_shutdown to SHUTDOWN_MASK will be the right
      behavior.
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 03 Sep, 2020 1 commit
    • tipc: fix shutdown() of connectionless socket · 2a63866c
      Tetsuo Handa committed
      syzbot is reporting hung task at nbd_ioctl() [1], for there are two
      problems regarding TIPC's connectionless socket's shutdown() operation.
      
      ----------
      #include <fcntl.h>
      #include <sys/socket.h>
      #include <sys/ioctl.h>
      #include <linux/nbd.h>
      #include <unistd.h>
      
      int main(int argc, char *argv[])
      {
              const int fd = open("/dev/nbd0", 3);
              alarm(5);
              ioctl(fd, NBD_SET_SOCK, socket(PF_TIPC, SOCK_DGRAM, 0));
              ioctl(fd, NBD_DO_IT, 0); /* To be interrupted by SIGALRM. */
              return 0;
      }
      ----------
      
      One problem is that wait_for_completion() from flush_workqueue() from
      nbd_start_device_ioctl() from nbd_ioctl() cannot be completed when
      nbd_start_device_ioctl() received a signal at wait_event_interruptible(),
      for tipc_shutdown() from kernel_sock_shutdown(SHUT_RDWR) from
      nbd_mark_nsock_dead() from sock_shutdown() from nbd_start_device_ioctl()
      is failing to wake up a WQ thread sleeping at wait_woken() from
      tipc_wait_for_rcvmsg() from sock_recvmsg() from sock_xmit() from
      nbd_read_stat() from recv_work() scheduled by nbd_start_device() from
      nbd_start_device_ioctl(). Fix this problem by always invoking
      sk->sk_state_change() (like inet_shutdown() does) when tipc_shutdown() is
      called.
      
      The other problem is that tipc_wait_for_rcvmsg() cannot return when
      tipc_shutdown() is called, for tipc_shutdown() sets sk->sk_shutdown to
      SEND_SHUTDOWN (even though "how" is SHUT_RDWR) while tipc_wait_for_rcvmsg()
      needs sk->sk_shutdown set to RCV_SHUTDOWN or SHUTDOWN_MASK. Fix this
      problem by setting sk->sk_shutdown to SHUTDOWN_MASK (like inet_shutdown()
      does) when the socket is connectionless.
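The second problem can be illustrated with a tiny model of the flag test. This is a sketch only: the `SK_*` constants are local copies whose values are assumed to match the kernel's shutdown bits, and `receiver_may_return()` is a hypothetical simplification of the exit condition in the receive wait loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies of the shutdown bits; values assumed to match the kernel's. */
#define SK_RCV_SHUTDOWN  1u
#define SK_SEND_SHUTDOWN 2u
#define SK_SHUTDOWN_MASK 3u

/*
 * Hypothetical model of the receiver's exit test: the waiter only
 * gives up once the receive direction is marked as shut down.
 */
bool receiver_may_return(unsigned int sk_shutdown)
{
    return (sk_shutdown & SK_RCV_SHUTDOWN) != 0;
}

/* Flags stored by shutdown() before the fix: send direction only. */
unsigned int shutdown_flags_before_fix(void)
{
    return SK_SEND_SHUTDOWN;
}

/* Flags stored after the fix: both directions. */
unsigned int shutdown_flags_after_fix(void)
{
    return SK_SHUTDOWN_MASK;
}
```

With only `SK_SEND_SHUTDOWN` set, the receiver's test never becomes true, which matches the hang described above; with `SK_SHUTDOWN_MASK` it does.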
      
      [1] https://syzkaller.appspot.com/bug?id=3fe51d307c1f0a845485cf1798aa059d12bf18b2
      Reported-by: syzbot <syzbot+e36f41d207137b5d12f7@syzkaller.appspotmail.com>
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 01 Sep, 2020 1 commit
  9. 24 Aug, 2020 1 commit
  10. 19 Aug, 2020 1 commit
  11. 25 Jul, 2020 1 commit
  12. 14 Jul, 2020 1 commit
  13. 12 Jun, 2020 1 commit
    • tipc: fix kernel WARNING in tipc_msg_append() · c9aa81fa
      Tuong Lien committed
      syzbot found the following issue:
      
      WARNING: CPU: 0 PID: 6808 at include/linux/thread_info.h:150 check_copy_size include/linux/thread_info.h:150 [inline]
      WARNING: CPU: 0 PID: 6808 at include/linux/thread_info.h:150 copy_from_iter include/linux/uio.h:144 [inline]
      WARNING: CPU: 0 PID: 6808 at include/linux/thread_info.h:150 tipc_msg_append+0x49a/0x5e0 net/tipc/msg.c:242
      Kernel panic - not syncing: panic_on_warn set ...
      
      This happens after commit 5e9eeccc ("tipc: fix NULL pointer
      dereference in streaming"), which tried to build at least one buffer
      even when the message data length is zero. However, it now exposes
      another bug: 'mss' can be zero, in which case 'cpy' becomes negative,
      and the above kernel WARNING appears.
      A zero 'mss' is never expected because it means Nagle is not enabled
      for the socket (the socket type was actually 'SOCK_SEQPACKET'), so
      'tipc_msg_append()' must not be called at all. It was called in this
      particular case because the message data length was zero, so the
      'send <= maxnagle' check became true.
      
      We resolve the issue by explicitly checking that Nagle is enabled for
      the socket, i.e. 'maxnagle != 0', before calling 'tipc_msg_append()'.
      We also harden the function against such negative values, if any.
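The two guards can be sketched as pure functions. `should_bundle()` and `append_room()` are hypothetical helpers written for illustration; only the 'maxnagle != 0' and 'send <= maxnagle' conditions come from the text, and the room calculation is an assumed simplification of how 'cpy' is derived.

```c
#include <assert.h>

/*
 * Decide whether a message of `len` bytes goes through the Nagle
 * append path. Before the fix only `len <= maxnagle` was tested,
 * which is true for len == 0 and maxnagle == 0 (Nagle disabled).
 */
int should_bundle(int len, int maxnagle)
{
    return maxnagle != 0 && len <= maxnagle;
}

/*
 * Number of payload bytes that may be copied into a buffer that
 * already holds `used` bytes when the limit is `mss`.
 * Never returns a negative count, even for mss == 0.
 */
int append_room(int remaining, int mss, int used)
{
    int room = mss - used;

    if (room <= 0 || remaining <= 0)
        return 0;
    return remaining < room ? remaining : room;
}
```

The first helper keeps a Nagle-disabled socket off the append path; the second is the defensive clamp, so a zero 'mss' yields a zero copy length rather than a negative one.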
      
      Reported-by: syzbot+75139a7d2605236b0b7f@syzkaller.appspotmail.com
      Fixes: c0bceb97 ("tipc: add smart nagle feature")
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 02 Jun, 2020 1 commit
  15. 29 May, 2020 1 commit
  16. 27 May, 2020 1 commit
    • tipc: add test for Nagle algorithm effectiveness · 0a3e060f
      Tuong Lien committed
      When streaming in Nagle mode, we try to bundle as many small user
      messages as possible while there is one outstanding buffer, i.e. one
      not yet ACK-ed by the receiving side, which helps boost the overall
      throughput. So, the algorithm's effectiveness really depends on when
      the Nagle ACK comes, or on what the specific network latency (RTT) is,
      compared to the user's message sending rate.
      
      In a bad case, where the user's sending rate is low or the network
      latency is small, there will not be many bundles, so making a Nagle ACK
      or waiting for it is not meaningful.
      For example: a user sends a message every 100ms and the RTT is 50ms.
      Each message then requires one Nagle ACK, yet only one user message is
      sent each time, without any bundling.
      
      In a better case, we may have a few bundles (e.g. with RTT = 300ms),
      but if the user sends medium-sized messages there will be no difference
      at all: 3 x 1000-byte data messages, if bundled, still result in 3
      bundles with MTU = 1500.
      
      When Nagle is ineffective, the delay it adds to user message sending is
      clearly wasted compared to sending directly.
      
      Besides, adding Nagle ACKs consumes some processor load on both the
      sending and receiving sides.
      
      This commit adds a test on the effectiveness of the Nagle algorithm for
      an individual connection in the network on which it actually runs.
      Particularly, upon receipt of a Nagle ACK we will compare the number of
      bundles in the backlog queue to the number of user messages which would
      be sent directly without Nagle. If the ratio is good (e.g. >= 2), Nagle
      mode will be kept for further message sending. Otherwise, we will leave
      Nagle and put a 'penalty' on the connection, so it will have to spend
      more 'one-way' messages before being able to re-enter Nagle.
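A rough sketch of the effectiveness check described above. Everything here is hypothetical: the `struct nagle_state` fields, the `NAGLE_START_*` limits and the doubling of the threshold as the 'penalty' are assumptions made for illustration; only the ratio test (>= 2) comes from the text.

```c
#include <assert.h>

#define NAGLE_START_INIT 4u    /* assumed initial re-entry threshold */
#define NAGLE_START_MAX  1024u /* assumed upper bound for the penalty */

/* Hypothetical per-connection state. */
struct nagle_state {
    unsigned int oneway; /* one-way messages counted so far */
    unsigned int start;  /* one-way messages needed to (re-)enter Nagle */
};

/*
 * Called when a Nagle ACK arrives. `user_msgs` is the number of user
 * messages accumulated, `bundles` the number of buffers they were
 * packed into. Returns 1 if Nagle mode is kept, 0 if it is left.
 */
int nagle_ack_check(struct nagle_state *s, unsigned int user_msgs,
                    unsigned int bundles)
{
    if (bundles != 0 && user_msgs / bundles >= 2) {
        s->start = NAGLE_START_INIT; /* effective: reset the penalty */
        return 1;
    }
    s->oneway = 0;                   /* leave Nagle mode */
    if (s->start < NAGLE_START_MAX)
        s->start *= 2;               /* penalty: harder to re-enter */
    return 0;
}
```

For the example in the text, three 1000-byte messages in three bundles give a ratio of 1, so the connection would leave Nagle mode and need more one-way messages before re-entering it.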
      
      In addition, the 'ack-required' bit is only set when really needed, so
      that the number of Nagle ACKs is reduced during Nagle mode.
      
      Testing with a benchmark showed that with the patch there was not much
      difference in throughput for small messages, since the tool continuously
      sends messages without a break, so Nagle still takes effect.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 14 May, 2020 1 commit
    • tipc: fix large latency in smart Nagle streaming · c7268589
      Tuong Lien committed
      Currently when a connection is in Nagle mode, we set the 'ack_required'
      bit in the last sending buffer and wait for the corresponding ACK prior
      to pushing more data. However, on the receiving side, the ACK is issued
      only when the application has really read the whole data. If only part
      of the last buffer has been received, we do not send the ACK as required.
      This might cause an unnecessary delay since the receiver does not always
      fetch the message as fast as the sender, resulting in a large latency in
      the user message sending, which is: [one RTT + the receiver processing
      time].
      
      This commit issues the Nagle ACK as soon as possible, i.e. when a message
      with the 'ack_required' bit arrives in the receiving side's stack, even
      before it is processed or put in the socket receive queue.
      This way, we can limit the streaming latency to one RTT, as promised for
      Nagle mode.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 27 Mar, 2020 1 commit
  19. 10 Feb, 2020 1 commit
    • tipc: fix successful connect() but timed out · 5391a877
      Tuong Lien committed
      In commit 9546a0b7 ("tipc: fix wrong connect() return code"), we
      fixed the issue of 'connect()' returning zero even though the
      connection attempt had failed, by waiting until the connection is
      really 'ESTABLISHED'. However, the approach has one drawback in
      conjunction with our 'lightweight' connection setup mechanism: the
      following scenario can happen:
      
                (server)                        (client)
      
         +- accept()|                      |             wait_for_conn()
         |          |                      |connect() -------+
         |          |<-------[SYN]---------|                 > sleeping
         |          |                      *CONNECTING       |
         |--------->*ESTABLISHED           |                 |
                    |--------[ACK]-------->*ESTABLISHED      > wakeup()
              send()|--------[DATA]------->|\                > wakeup()
              send()|--------[DATA]------->| |               > wakeup()
                .   .          .           . |-> recvq       .
                .   .          .           . |               .
              send()|--------[DATA]------->|/                > wakeup()
             close()|--------[FIN]-------->*DISCONNECTING    |
                    *DISCONNECTING         |                 |
                    |                      ~~~~~~~~~~~~~~~~~~> schedule()
                                                             | wait again
                                                             .
                                                             .
                                                             | ETIMEDOUT
      
      Upon receipt of the server 'ACK', the client becomes 'ESTABLISHED'
      and the 'wait_for_conn()' process is woken up but not run. Meanwhile,
      the server starts to send a number of data messages, followed shortly
      by a 'close()' without waiting for any response from the client, which
      then forces the client socket to be 'DISCONNECTING' immediately. When
      the wait process is switched to running, it continues to wait until the
      timer expires because of the unexpected socket state. The client
      'connect()' finally gets '-ETIMEDOUT' and is forced to release the
      socket while messages remain in its receive queue.
      
      Obviously the issue would not happen if the server had some delay prior
      to its 'close()' (or if the number of 'DATA' messages were large enough),
      but any kind of delay would make the connection setup/shutdown "heavy".
      We solve this by simply allowing 'connect()' to return zero in this
      particular case. The socket is already 'DISCONNECTING', so any further
      write will get '-EPIPE', but the socket is still able to read the
      messages existing in its receive queue.
      
      Note: This solution doesn't break the previous one, as it deals with a
      different situation, in which the socket state is 'DISCONNECTING' but
      there is no error (i.e. sk->sk_err = 0).
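The resulting decision can be sketched as a pure function. This is a simplified, hypothetical model, not the kernel's wait loop: `connect_result()`, `enum conn_state` and `CONNECT_KEEP_WAITING` are names invented for the example, and signal handling ('-EINTR') is left out.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical socket states mirroring the ones named in the text. */
enum conn_state { ST_CONNECTING, ST_ESTABLISHED, ST_DISCONNECTING };

#define CONNECT_KEEP_WAITING 1 /* sentinel invented for this sketch */

/*
 * What the connect() wait step should conclude, given the current
 * socket state, the pending socket error (0 if none, positive errno
 * otherwise) and whether the timer has expired.
 */
int connect_result(enum conn_state st, int sk_err, int timed_out)
{
    if (sk_err)
        return -sk_err;          /* a real failure is always reported */
    if (st == ST_ESTABLISHED)
        return 0;                /* the normal success case */
    if (st == ST_DISCONNECTING)
        return 0;                /* peer connected, sent, then closed:
                                    queued messages are still readable */
    if (timed_out)
        return -ETIMEDOUT;
    return CONNECT_KEEP_WAITING; /* still connecting */
}
```

The `ST_DISCONNECTING` case with no error is the one added by this fix; with an error set, the earlier behaviour of reporting `sk_err` is unchanged.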
      
      Fixes: 9546a0b7 ("tipc: fix wrong connect() return code")
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 09 Jan, 2020 2 commits
    • tipc: fix wrong connect() return code · 9546a0b7
      Tuong Lien committed
      The current 'tipc_wait_for_connect()' function runs a wait-loop on the
      condition 'sk->sk_state != TIPC_CONNECTING' to conclude whether the
      socket has finished connecting. However, when the condition is met, it
      returns '0' even when the connection attempt has actually failed and
      the socket state is set to 'TIPC_DISCONNECTING' (e.g. when the server
      socket has closed). This results in a wrong return code for the user's
      'connect()' call, making it believe that the connection is established
      and go ahead with building and sending a message, etc., only to fail
      in the end, e.g. with '-EPIPE'.
      
      This commit fixes the issue by changing the wait condition to
      'tipc_sk_connected(sk)', so the function returns '0' only when the
      connection is really established. Otherwise, either the socket 'sk_err',
      if any, or '-ETIMEDOUT'/'-EINTR' is returned correspondingly.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: fix link overflow issue at socket shutdown · 49afb806
      Tuong Lien committed
      When a socket is suddenly shut down or released, it rejects all the
      unreceived messages in its receive queue. This applies to a connected
      socket too, although only one 'FIN' message needs to be sent back to
      its peer in this case.
      
      If there are many messages in the queue and/or several connections
      with such messages are shut down at the same time, the link layer
      easily overflows at the 'TIPC_SYSTEM_IMPORTANCE' backlog level
      because of the message rejections. As a result, the link is taken
      down. Moreover, as soon as the link is re-established, the socket
      layer can continue to reject the messages and the same issue recurs.
      
      This commit refactors the '__tipc_shutdown()' function to send only one
      'FIN' in the situation mentioned above. For the connectionless case,
      rejection is unavoidable, but usually there are no rejections for such
      socket messages because they are 'dest-droppable' by default.
      
      In addition, the new code handles the other socket states
      (e.g. 'TIPC_LISTEN') explicitly, as separate cases, to avoid misbehaving.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 11 Dec, 2019 1 commit
    • tipc: fix retrans failure due to wrong destination · abc9b4e0
      Tuong Lien committed
      When a user message is sent, TIPC checks whether the socket is facing
      congestion at the link layer. If so, it sleeps to wait for the
      congestion to disappear. This leaves a gap for other users to take
      over the socket (e.g. multiple threads), since the socket is released
      as well. Also, in the connectionless case (e.g. SOCK_RDM), the user is
      free to send messages to various destinations (e.g. via 'sendto()'),
      so the socket's preformatted header has to be updated correspondingly
      prior to the actual payload message building.
      
      Unfortunately, the latter action is done before the former, which
      causes a race condition: the destination of a certain message can
      be modified incorrectly in the middle, leading to a wrong destination
      when that message is built. Consequently, when the message is sent to
      the link layer, it gets stuck there forever because the peer node
      simply rejects it. After a number of retransmission attempts, the link
      is eventually taken down and the retransmission failure is reported.
      
      This commit fixes the problem by rearranging the order of actions to
      prevent the race condition from occurring, so the message building is
      'atomic' and its header will not be modified by anyone.
      
      Fixes: 365ad353 ("tipc: reduce risk of user starvation during link congestion")
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 29 Nov, 2019 4 commits
  23. 24 Nov, 2019 1 commit
  24. 31 Oct, 2019 1 commit
    • tipc: add smart nagle feature · c0bceb97
      Jon Maloy committed
      We introduce a feature that works like a combination of TCP_NAGLE and
      TCP_CORK, but without some of the weaknesses of those. In particular,
      we will not observe long delivery delays because of delayed acks, since
      the algorithm itself decides if and when acks are to be sent from the
      receiving peer.
      
      - The nagle property as such is determined by manipulating a new
        'maxnagle' field in struct tipc_sock. If certain conditions are met,
        'maxnagle' will define max size of the messages which can be bundled.
        If it is set to zero no messages are ever bundled, implying that the
        nagle property is disabled.
      - A socket with the nagle property enabled enters nagle mode when more
        than 4 messages have been sent out without receiving any data message
        from the peer.
      - A socket leaves nagle mode whenever it receives a data message from
        the peer.
      
      In nagle mode, messages smaller than 'maxnagle' are accumulated in the
      socket write queue. The last buffer in the queue is marked with a new
      'ack_required' bit, which forces the receiving peer to send a CONN_ACK
      message back to the sender upon reception.
      
      The accumulated contents of the write queue are transmitted when one of
      the following events or conditions occurs.
      
      - A CONN_ACK message is received from the peer.
      - A data message is received from the peer.
      - A SOCK_WAKEUP pseudo message is received from the link level.
      - The write queue contains more than 64 1k blocks of data.
      - The connection is being shut down.
      - There is no CONN_ACK message to expect. I.e., there is currently
        no outstanding message where the 'ack_required' bit was set. As a
        consequence, the first message added after we enter nagle mode
        is always sent directly with this bit set.
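The flush conditions listed above can be collected into one predicate. This is a sketch under stated assumptions: `struct wq_state` and its field names are hypothetical and simply mirror the bullets, one field per condition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the events and state named in the list. */
struct wq_state {
    bool got_conn_ack;        /* CONN_ACK received from the peer */
    bool got_data;            /* data message received from the peer */
    bool got_sock_wakeup;     /* SOCK_WAKEUP pseudo message from link level */
    unsigned int queued_1k;   /* 1k blocks currently in the write queue */
    bool shutting_down;       /* connection is being shut down */
    bool ack_outstanding;     /* a message with 'ack_required' is unacked */
};

/* True when the accumulated write queue should be transmitted now. */
bool must_flush(const struct wq_state *s)
{
    return s->got_conn_ack ||
           s->got_data ||
           s->got_sock_wakeup ||
           s->queued_1k > 64 ||
           s->shutting_down ||
           !s->ack_outstanding; /* nothing to wait for: send directly */
}
```

The last term captures why the first message after entering nagle mode is sent directly: at that point no 'ack_required' message is outstanding yet.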
      
      This new feature gives a 50-100% improvement of throughput for small
      (i.e., less than MTU size) messages, while it might add up to one RTT
      to latency time when the socket is in nagle mode.
      Acked-by: Ying Xue <ying.xue@windreiver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  25. 30 Oct, 2019 1 commit
    • tipc: improve throughput between nodes in netns · f73b1281
      Hoang Le committed
      Currently, TIPC transports intra-node user data messages directly
      socket to socket, hence shortcutting all the lower layers of the
      communication stack. This gives TIPC very good intra node performance,
      both regarding throughput and latency.
      
      We now introduce a similar mechanism for TIPC data traffic across
      network namespaces located in the same kernel. On the send path, the
      call chain is as always accompanied by the sending node's network
      namespace pointer. However, once we have reliably established that the
      receiving node is represented by a namespace on the same host, we just
      replace the namespace pointer with the receiving node/namespace's
      ditto, and follow the regular socket receive path through the receiving
      node. This technique gives us a throughput similar to the node internal
      throughput, several times larger than if we let the traffic go through
      the full network stacks. As a comparison, max throughput for 64k
      messages is four times larger than TCP throughput for the same type of
      traffic.
      
      To meet any security concerns, the following should be noted.
      
      - All nodes joining a cluster are supposed to have been certified
      and authenticated by mechanisms outside TIPC. This is no different for
      nodes/namespaces on the same host; they have to auto discover each
      other using the attached interfaces, and establish links which are
      supervised via the regular link monitoring mechanism. Hence, a kernel
      local node has no other way to join a cluster than any other node, and
      has to obey the policies set in the IP or device layers of the stack.
      
      - Only when a sender has established with 100% certainty that the peer
      node is located in a kernel local namespace does it choose to let user
      data messages, and only those, take the crossover path to the receiving
      node/namespace.
      
      - If the receiving node/namespace is removed, its namespace pointer
      is invalidated at all peer nodes, and their neighbor link monitoring
      will eventually note that this node is gone.
      
      - To ensure the "100% certainty" criterion, and prevent any possible
      spoofing, received discovery messages must contain a proof that the
      sender knows a common secret. We use the hash mix of the sending
      node/namespace for this purpose, since it can be accessed directly by
      all other namespaces in the kernel. Upon reception of a discovery
      message, the receiver checks this proof against the hash mixes of all
      the local namespaces. If it finds a match, this, along with a
      matching node id and cluster id, is deemed sufficient proof that
      the peer node in question is in a local namespace, and a wormhole can
      be opened.
      
      - We should also consider that TIPC is intended to be a cluster local
      IPC mechanism (just like e.g. UNIX sockets) rather than a network
      protocol, and hence we think it can be justified to allow it to shortcut
      the lower protocol layers.
      
      Regarding traceability, we should note that since commit 6c9081a3
      ("tipc: add loopback device tracking") it is possible to follow the node
      internal packet flow by just activating tcpdump on the loopback
      interface. This will be true even for this mechanism; by activating
      tcpdump on the involved nodes' loopback interfaces their inter-namespace
      messaging can easily be tracked.
      
      v2:
      - update 'net' pointer when node left/rejoined
      v3:
      - grab read/write lock when using node ref obj
      v4:
      - clone traffics between netns to loopback
      Suggested-by: Jon Maloy <jon.maloy@ericsson.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f73b1281
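The proof check described in the commit message above can be illustrated with a minimal userspace sketch. It is not the TIPC kernel code: the types `local_ns` and `disc_msg`, the function `find_local_peer`, and the 16-byte node id length are all hypothetical names and values chosen for illustration. The sketch only shows the stated rule that a peer is treated as kernel-local when the proof matches a local namespace's hash mix and the node id and cluster id match as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NODE_ID_LEN 16 /* assumed length, for illustration only */

/* Hypothetical stand-in for a local network namespace hosting a node. */
struct local_ns {
    uint32_t hash_mix;               /* per-namespace secret */
    uint32_t cluster_id;
    uint8_t  node_id[NODE_ID_LEN];
};

/* Hypothetical subset of the fields carried by a discovery message. */
struct disc_msg {
    uint32_t proof;                  /* sender's claimed hash mix */
    uint32_t cluster_id;
    uint8_t  node_id[NODE_ID_LEN];
};

/*
 * Return the local namespace that the sender of 'm' lives in, or NULL
 * if no namespace matches on all three of proof, cluster id and node id.
 */
const struct local_ns *find_local_peer(const struct local_ns *tbl, size_t n,
                                       const struct disc_msg *m)
{
    assert(m != NULL);
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].hash_mix != m->proof)
            continue;
        if (tbl[i].cluster_id != m->cluster_id)
            continue;
        if (memcmp(tbl[i].node_id, m->node_id, NODE_ID_LEN) != 0)
            continue;
        return &tbl[i];              /* all three matched */
    }
    return NULL;                     /* treat the peer as external */
}
```

A NULL result corresponds to the ordinary case where traffic goes through the regular link layer; only a non-NULL result would justify the crossover path.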
  26. 29 Oct, 2019 1 commit
  27. 10 Oct, 2019 2 commits
    • E
      net: silence KCSAN warnings about sk->sk_backlog.len reads · 70c26558
      Eric Dumazet committed
      sk->sk_backlog.len can be written by BH handlers, and read
      from process contexts in a lockless way.
      
      Note the write side should also use WRITE_ONCE() or a variant.
      We need some agreement about the best way to do this.
      
      syzbot reported :
      
      BUG: KCSAN: data-race in tcp_add_backlog / tcp_grow_window.isra.0
      
      write to 0xffff88812665f32c of 4 bytes by interrupt on cpu 1:
       sk_add_backlog include/net/sock.h:934 [inline]
       tcp_add_backlog+0x4a0/0xcc0 net/ipv4/tcp_ipv4.c:1737
       tcp_v4_rcv+0x1aba/0x1bf0 net/ipv4/tcp_ipv4.c:1925
       ip_protocol_deliver_rcu+0x51/0x470 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5118
       netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208
       napi_skb_finish net/core/dev.c:5671 [inline]
       napi_gro_receive+0x28f/0x330 net/core/dev.c:5704
       receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
       virtnet_receive drivers/net/virtio_net.c:1323 [inline]
       virtnet_poll+0x436/0x7d0 drivers/net/virtio_net.c:1428
       napi_poll net/core/dev.c:6352 [inline]
       net_rx_action+0x3ae/0xa50 net/core/dev.c:6418
      
      read to 0xffff88812665f32c of 4 bytes by task 7292 on cpu 0:
       tcp_space include/net/tcp.h:1373 [inline]
       tcp_grow_window.isra.0+0x6b/0x480 net/ipv4/tcp_input.c:413
       tcp_event_data_recv+0x68f/0x990 net/ipv4/tcp_input.c:717
       tcp_rcv_established+0xbfe/0xf50 net/ipv4/tcp_input.c:5618
       tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1542
       sk_backlog_rcv include/net/sock.h:945 [inline]
       __release_sock+0x135/0x1e0 net/core/sock.c:2427
       release_sock+0x61/0x160 net/core/sock.c:2943
       tcp_recvmsg+0x63b/0x1a30 net/ipv4/tcp.c:2181
       inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
       sock_recvmsg_nosec net/socket.c:871 [inline]
       sock_recvmsg net/socket.c:889 [inline]
       sock_recvmsg+0x92/0xb0 net/socket.c:885
       sock_read_iter+0x15f/0x1e0 net/socket.c:967
       call_read_iter include/linux/fs.h:1864 [inline]
       new_sync_read+0x389/0x4f0 fs/read_write.c:414
       __vfs_read+0xb1/0xc0 fs/read_write.c:427
       vfs_read fs/read_write.c:461 [inline]
       vfs_read+0x143/0x2c0 fs/read_write.c:446
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 7292 Comm: syz-fuzzer Not tainted 5.3.0+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      70c26558
    • E
      net: silence KCSAN warnings around sk_add_backlog() calls · 8265792b
      Eric Dumazet committed
      sk_add_backlog() callers usually read sk->sk_rcvbuf without
      owning the socket lock. This means sk_rcvbuf value can
      be changed by other cpus, and KCSAN complains.
      
      Add READ_ONCE() annotations to document the lockless nature
      of these reads.
      
      Note that writes over sk_rcvbuf should also use WRITE_ONCE(),
      but this will be done in separate patches to ease stable
      backports (if we decide this is relevant for stable trees).
      
      BUG: KCSAN: data-race in tcp_add_backlog / tcp_recvmsg
      
      write to 0xffff88812ab369f8 of 8 bytes by interrupt on cpu 1:
       __sk_add_backlog include/net/sock.h:902 [inline]
       sk_add_backlog include/net/sock.h:933 [inline]
       tcp_add_backlog+0x45a/0xcc0 net/ipv4/tcp_ipv4.c:1737
       tcp_v4_rcv+0x1aba/0x1bf0 net/ipv4/tcp_ipv4.c:1925
       ip_protocol_deliver_rcu+0x51/0x470 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5118
       netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208
       napi_skb_finish net/core/dev.c:5671 [inline]
       napi_gro_receive+0x28f/0x330 net/core/dev.c:5704
       receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
       virtnet_receive drivers/net/virtio_net.c:1323 [inline]
       virtnet_poll+0x436/0x7d0 drivers/net/virtio_net.c:1428
       napi_poll net/core/dev.c:6352 [inline]
       net_rx_action+0x3ae/0xa50 net/core/dev.c:6418
      
      read to 0xffff88812ab369f8 of 8 bytes by task 7271 on cpu 0:
       tcp_recvmsg+0x470/0x1a30 net/ipv4/tcp.c:2047
       inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
       sock_recvmsg_nosec net/socket.c:871 [inline]
       sock_recvmsg net/socket.c:889 [inline]
       sock_recvmsg+0x92/0xb0 net/socket.c:885
       sock_read_iter+0x15f/0x1e0 net/socket.c:967
       call_read_iter include/linux/fs.h:1864 [inline]
       new_sync_read+0x389/0x4f0 fs/read_write.c:414
       __vfs_read+0xb1/0xc0 fs/read_write.c:427
       vfs_read fs/read_write.c:461 [inline]
       vfs_read+0x143/0x2c0 fs/read_write.c:446
       ksys_read+0xd5/0x1b0 fs/read_write.c:587
       __do_sys_read fs/read_write.c:597 [inline]
       __se_sys_read fs/read_write.c:595 [inline]
       __x64_sys_read+0x4c/0x60 fs/read_write.c:595
       do_syscall_64+0xcf/0x2f0 arch/x86/entry/common.c:296
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 7271 Comm: syz-fuzzer Not tainted 5.3.0+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      8265792b
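The lockless-read annotation discussed in the two commits above can be sketched in userspace C. The macros below are simplified approximations of the kernel's `READ_ONCE()` and `WRITE_ONCE()` and rely on the GNU C `__typeof__` extension; the struct `sock_sketch` and the functions `backlog_full` and `set_rcvbuf` are hypothetical and do not reproduce the kernel's socket code. The point is only that each shared field is loaded exactly once through a volatile access, so the compiler cannot re-read or tear it.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified userspace approximations of the kernel macros (GNU C). */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical stand-in for the two socket fields named in the commits. */
struct sock_sketch {
    int rcvbuf;                /* may be changed by another CPU at any time */
    unsigned int backlog_len;  /* written from BH context, read locklessly */
};

/* Lockless check: each shared field is read exactly once. */
bool backlog_full(struct sock_sketch *sk, unsigned int skb_size)
{
    unsigned int limit = (unsigned int)READ_ONCE(sk->rcvbuf);

    return READ_ONCE(sk->backlog_len) + skb_size > limit;
}

/* Write side, annotated as the commit messages say it eventually should be. */
void set_rcvbuf(struct sock_sketch *sk, int val)
{
    assert(val >= 0);
    WRITE_ONCE(sk->rcvbuf, val);
}
```

These annotations document the intentional race and prevent compiler-introduced surprises; they do not add ordering or mutual exclusion.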
  28. 06 Oct, 2019 1 commit
  29. 19 Aug, 2019 1 commit
    • J
      tipc: clean up skb list lock handling on send path · e654f9f5
      Jon Maloy committed
      The policy for handling the skb list locks on the send and receive paths
      is simple.
      
      - On the send path we never need to grab the lock on the 'xmitq' list
        when the destination is an external node.
      
      - On the receive path we always need to grab the lock on the 'inputq'
        list, irrespective of source node.
      
      However, when transmitting node local messages those will eventually
      end up on the receive path of a local socket, meaning that the argument
      'xmitq' in tipc_node_xmit() will become the 'inputq' argument in the
      function tipc_sk_rcv(). This has been handled by always initializing
      the spinlock of the 'xmitq' list at message creation, just in case it
      may end up on the receive path later, and despite knowing that the lock
      in most cases never will be used.
      
      This approach is inaccurate and confusing, and has also concealed the
      fact that the stated 'no lock grabbing' policy for the send path is
      violated in some cases.
      
      We now clean up this by never initializing the lock at message creation,
      instead doing this at the moment we find that the message actually will
      enter the receive path. At the same time we fix the four locations
      where we incorrectly access the spinlock on the send/error path.
      
      This patch also reverts commit d12cffe9 ("tipc: ensure head->lock
      is initialised") which has now become redundant.
      
      CC: Eric Dumazet <edumazet@google.com>
      Reported-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e654f9f5
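The locking policy described in the commit message above (no lock on the send path to external nodes, lock initialized only once a message list is known to enter the receive path) can be sketched in userspace C. This is a toy model, not the TIPC implementation: `msg_list`, `list_init_nolock`, `list_lock_init`, `recv_path` and `node_xmit` are hypothetical names, and a C11 `atomic_flag` stands in for the kernel spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical message list; 'lock' is valid only once lock_ready is set. */
struct msg_list {
    atomic_flag lock;
    bool lock_ready;
    int qlen;
};

/* Message creation: the list is set up without touching the lock. */
void list_init_nolock(struct msg_list *l)
{
    l->lock_ready = false;
    l->qlen = 0;
}

/* Called only when the list is about to enter the receive path. */
void list_lock_init(struct msg_list *l)
{
    atomic_flag_clear(&l->lock);
    l->lock_ready = true;
}

/* Receive path: always takes the lock, irrespective of source node. */
int recv_path(struct msg_list *inputq)
{
    int n;

    assert(inputq->lock_ready);
    while (atomic_flag_test_and_set(&inputq->lock))
        ; /* spin until the toy lock is free */
    n = inputq->qlen;
    inputq->qlen = 0;
    atomic_flag_clear(&inputq->lock);
    return n;
}

/*
 * Send path: returns -1 when the list is handed to an external link
 * (lock never touched), else the number of messages delivered locally.
 */
int node_xmit(struct msg_list *xmitq, bool dest_is_local)
{
    if (!dest_is_local)
        return -1;
    list_lock_init(xmitq); /* the list now becomes an 'inputq' */
    return recv_path(xmitq);
}
```

Deferring the initialization makes any accidental lock use on the send path visible, here as a failed assertion, instead of silently working.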
  30. 31 Jul, 2019 1 commit
    • J
      tipc: fix unitilized skb list crash · 2948a1fc
      Jon Maloy committed
      Our test suite sometimes provokes the following crash:
      
      Description of problem:
      [ 1092.597234] BUG: unable to handle kernel NULL pointer dereference at 00000000000000e8
      [ 1092.605072] PGD 0 P4D 0
      [ 1092.607620] Oops: 0000 [#1] SMP PTI
      [ 1092.611118] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Not tainted 4.18.0-122.el8.x86_64 #1
      [ 1092.619724] Hardware name: Dell Inc. PowerEdge R740/08D89F, BIOS 1.3.7 02/08/2018
      [ 1092.627215] RIP: 0010:tipc_mcast_filter_msg+0x93/0x2d0 [tipc]
      [ 1092.632955] Code: 0f 84 aa 01 00 00 89 cf 4d 01 ca 4c 8b 26 c1 ef 19 83 e7 0f 83 ff 0c 4d 0f 45 d1 41 8b 6a 10 0f cd 4c 39 e6 0f 84 81 01 00 00 <4d> 8b 9c 24 e8 00 00 00 45 8b 13 41 0f ca 44 89 d7 c1 ef 13 83 e7
      [ 1092.651703] RSP: 0018:ffff929e5fa83a18 EFLAGS: 00010282
      [ 1092.656927] RAX: ffff929e3fb38100 RBX: 00000000069f29ee RCX: 00000000416c0045
      [ 1092.664058] RDX: ffff929e5fa83a88 RSI: ffff929e31a28420 RDI: 0000000000000000
      [ 1092.671209] RBP: 0000000029b11821 R08: 0000000000000000 R09: ffff929e39b4407a
      [ 1092.678343] R10: ffff929e39b4407a R11: 0000000000000007 R12: 0000000000000000
      [ 1092.685475] R13: 0000000000000001 R14: ffff929e3fb38100 R15: ffff929e39b4407a
      [ 1092.692614] FS:  0000000000000000(0000) GS:ffff929e5fa80000(0000) knlGS:0000000000000000
      [ 1092.700702] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1092.706447] CR2: 00000000000000e8 CR3: 000000031300a004 CR4: 00000000007606e0
      [ 1092.713579] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 1092.720712] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 1092.727843] PKRU: 55555554
      [ 1092.730556] Call Trace:
      [ 1092.733010]  <IRQ>
      [ 1092.735034]  tipc_sk_filter_rcv+0x7ca/0xb80 [tipc]
      [ 1092.739828]  ? __kmalloc_node_track_caller+0x1cb/0x290
      [ 1092.744974]  ? dev_hard_start_xmit+0xa5/0x210
      [ 1092.749332]  tipc_sk_rcv+0x389/0x640 [tipc]
      [ 1092.753519]  tipc_sk_mcast_rcv+0x23c/0x3a0 [tipc]
      [ 1092.758224]  tipc_rcv+0x57a/0xf20 [tipc]
      [ 1092.762154]  ? ktime_get_real_ts64+0x40/0xe0
      [ 1092.766432]  ? tpacket_rcv+0x50/0x9f0
      [ 1092.770098]  tipc_l2_rcv_msg+0x4a/0x70 [tipc]
      [ 1092.774452]  __netif_receive_skb_core+0xb62/0xbd0
      [ 1092.779164]  ? enqueue_entity+0xf6/0x630
      [ 1092.783084]  ? kmem_cache_alloc+0x158/0x1c0
      [ 1092.787272]  ? __build_skb+0x25/0xd0
      [ 1092.790849]  netif_receive_skb_internal+0x42/0xf0
      [ 1092.795557]  napi_gro_receive+0xba/0xe0
      [ 1092.799417]  mlx5e_handle_rx_cqe+0x83/0xd0 [mlx5_core]
      [ 1092.804564]  mlx5e_poll_rx_cq+0xd5/0x920 [mlx5_core]
      [ 1092.809536]  mlx5e_napi_poll+0xb2/0xce0 [mlx5_core]
      [ 1092.814415]  ? __wake_up_common_lock+0x89/0xc0
      [ 1092.818861]  net_rx_action+0x149/0x3b0
      [ 1092.822616]  __do_softirq+0xe3/0x30a
      [ 1092.826193]  irq_exit+0x100/0x110
      [ 1092.829512]  do_IRQ+0x85/0xd0
      [ 1092.832483]  common_interrupt+0xf/0xf
      [ 1092.836147]  </IRQ>
      [ 1092.838255] RIP: 0010:cpuidle_enter_state+0xb7/0x2a0
      [ 1092.843221] Code: e8 3e 79 a5 ff 80 7c 24 03 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 d7 01 00 00 31 ff e8 a0 6b ab ff fb 66 0f 1f 44 00 00 <48> b8 ff ff ff ff f3 01 00 00 4c 29 f3 ba ff ff ff 7f 48 39 c3 7f
      [ 1092.861967] RSP: 0018:ffffaa5ec6533e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
      [ 1092.869530] RAX: ffff929e5faa3100 RBX: 000000fe63dd2092 RCX: 000000000000001f
      [ 1092.876665] RDX: 000000fe63dd2092 RSI: 000000003a518aaa RDI: 0000000000000000
      [ 1092.883795] RBP: 0000000000000003 R08: 0000000000000004 R09: 0000000000022940
      [ 1092.890929] R10: 0000040cb0666b56 R11: ffff929e5faa20a8 R12: ffff929e5faade78
      [ 1092.898060] R13: ffffffffb59258f8 R14: 000000fe60f3228d R15: 0000000000000000
      [ 1092.905196]  ? cpuidle_enter_state+0x92/0x2a0
      [ 1092.909555]  do_idle+0x236/0x280
      [ 1092.912785]  cpu_startup_entry+0x6f/0x80
      [ 1092.916715]  start_secondary+0x1a7/0x200
      [ 1092.920642]  secondary_startup_64+0xb7/0xc0
      [...]
      
      The reason is that the skb list tipc_socket::mc_method.deferredq only
      is initialized for connectionless sockets, while nothing stops arriving
      multicast messages from being filtered by connection oriented sockets,
      with subsequent access to the said list.
      
      We fix this by initializing the list unconditionally at socket creation.
      This eliminates the crash, while the message still is dropped further
      down in tipc_sk_filter_rcv() as it should be.
      Reported-by: Li Shuang <shuali@redhat.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2948a1fc
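The fix described in the commit message above, initializing the deferred queue for every socket type at creation, can be shown with a small userspace sketch. The types `qhead` and `sock_sketch`, the enum values, and the functions below are hypothetical; the real field is `mc_method.deferredq` and uses the kernel's skb list helpers, which are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical circular list head; an empty list points at itself. */
struct qhead {
    struct qhead *next, *prev;
    unsigned int qlen;
};

enum sock_kind { KIND_CONNECTIONLESS, KIND_CONNECTION_ORIENTED };

struct sock_sketch {
    enum sock_kind kind;
    struct qhead deferredq;
};

static void qhead_init(struct qhead *h)
{
    h->next = h->prev = h;
    h->qlen = 0;
}

/* Socket creation: the queue is initialized regardless of socket kind. */
void sock_create_sketch(struct sock_sketch *sk, enum sock_kind kind)
{
    memset(sk, 0, sizeof(*sk));
    sk->kind = kind;
    qhead_init(&sk->deferredq); /* unconditional, per the fix */
}

/* Safe on any created socket; a zeroed, uninitialized head would not be. */
bool deferredq_empty(const struct sock_sketch *sk)
{
    return sk->deferredq.next == &sk->deferredq;
}
```

With a conditional initialization, a connection oriented socket would keep a zeroed head whose `next` pointer is NULL, which is the kind of state that leads to the NULL dereference in the trace.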
  31. 10 May, 2019 1 commit
  32. 28 Apr, 2019 1 commit
    • J
      netlink: make validation more configurable for future strictness · 8cb08174
      Johannes Berg committed
      We currently have two levels of strict validation:
      
       1) liberal (default)
           - undefined (type >= max) & NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
           - garbage at end of message accepted
       2) strict (opt-in)
           - NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
      
      Split out parsing strictness into four different options:
       * TRAILING     - check that there's no trailing data after parsing
                        attributes (in message or nested)
       * MAXTYPE      - reject attrs > max known type
       * UNSPEC       - reject attributes with NLA_UNSPEC policy entries
       * STRICT_ATTRS - strictly validate attribute size
      
      The default for future things should be *everything*.
      The current *_strict() is a combination of TRAILING and MAXTYPE,
      and is renamed to _deprecated_strict().
      The current regular parsing has none of this, and is renamed to
      *_parse_deprecated().
      
      Additionally it allows us to selectively set one of the new flags
      even on old policies. Notably, the UNSPEC flag could be useful in
      this case, since it can be arranged (by filling in the policy) to
      not be an incompatible userspace ABI change, but would then going
      forward prevent forgetting attribute entries. Similar can apply
      to the POLICY flag.
      
      We end up with the following renames:
       * nla_parse           -> nla_parse_deprecated
       * nla_parse_strict    -> nla_parse_deprecated_strict
       * nlmsg_parse         -> nlmsg_parse_deprecated
       * nlmsg_parse_strict  -> nlmsg_parse_deprecated_strict
       * nla_parse_nested    -> nla_parse_nested_deprecated
       * nla_validate_nested -> nla_validate_nested_deprecated
      
      Using spatch, of course:
          @@
          expression TB, MAX, HEAD, LEN, POL, EXT;
          @@
          -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
          +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression TB, MAX, NLA, POL, EXT;
          @@
          -nla_parse_nested(TB, MAX, NLA, POL, EXT)
          +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)
      
          @@
          expression START, MAX, POL, EXT;
          @@
          -nla_validate_nested(START, MAX, POL, EXT)
          +nla_validate_nested_deprecated(START, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, MAX, POL, EXT;
          @@
          -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
          +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)
      
      For this patch, don't actually add the strict, non-renamed versions
      yet so that it breaks compile if I get it wrong.
      
      Also, while at it, make nla_validate and nla_parse go down to a
      common __nla_validate_parse() function to avoid code duplication.
      
      Ultimately, this allows us to have very strict validation for every
      new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
      next patch, while existing things will continue to work as is.
      
      In effect then, this adds fully strict validation for any new command.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8cb08174
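The four strictness options listed in the commit message above compose naturally as a bitmask. The sketch below is a toy userspace validator, not the kernel's netlink parser: the flag names, `struct attr`, `struct policy_entry` and `validate_attrs` are hypothetical and only mirror the behavior the message describes (liberal accepts everything listed, the deprecated strict mode is TRAILING plus MAXTYPE, and the new default is everything).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag names mirroring the four options in the commit text. */
enum {
    VALIDATE_LIBERAL      = 0,
    VALIDATE_TRAILING     = 1 << 0, /* reject trailing data */
    VALIDATE_MAXTYPE      = 1 << 1, /* reject attrs > max known type */
    VALIDATE_UNSPEC       = 1 << 2, /* reject attrs without a policy */
    VALIDATE_STRICT_ATTRS = 1 << 3  /* require exact attribute size */
};
#define VALIDATE_DEPRECATED_STRICT (VALIDATE_TRAILING | VALIDATE_MAXTYPE)
#define VALIDATE_STRICT (VALIDATE_TRAILING | VALIDATE_MAXTYPE | \
                         VALIDATE_UNSPEC | VALIDATE_STRICT_ATTRS)

struct attr { unsigned int type; unsigned int len; };
struct policy_entry { bool specified; unsigned int expected_len; };

/* 'policy' must have maxtype + 1 entries. Returns 0 if accepted, else -1. */
int validate_attrs(const struct attr *attrs, size_t n, size_t trailing_bytes,
                   const struct policy_entry *policy, unsigned int maxtype,
                   unsigned int flags)
{
    assert(policy != NULL);
    for (size_t i = 0; i < n; i++) {
        const struct attr *a = &attrs[i];

        if (a->type > maxtype) {
            if (flags & VALIDATE_MAXTYPE)
                return -1;
            continue; /* liberal: unknown attribute ignored */
        }
        if (!policy[a->type].specified) {
            if (flags & VALIDATE_UNSPEC)
                return -1;
            continue; /* liberal: unspecified attribute accepted */
        }
        if (a->len < policy[a->type].expected_len)
            return -1; /* too short is rejected in every mode */
        if ((flags & VALIDATE_STRICT_ATTRS) &&
            a->len != policy[a->type].expected_len)
            return -1;
    }
    if (trailing_bytes && (flags & VALIDATE_TRAILING))
        return -1;
    return 0;
}
```

Keeping the options as independent bits is what lets an old policy opt in to a single check, such as the UNSPEC one, without adopting the rest.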