1. 20 12月, 2018 2 次提交
    • T
      tipc: add trace_events for tipc socket · 01e661eb
      Tuong Lien 提交于
      The commit adds the new trace_events for TIPC socket object:
      
      trace_tipc_sk_create()
      trace_tipc_sk_poll()
      trace_tipc_sk_sendmsg()
      trace_tipc_sk_sendmcast()
      trace_tipc_sk_sendstream()
      trace_tipc_sk_filter_rcv()
      trace_tipc_sk_advance_rx()
      trace_tipc_sk_rej_msg()
      trace_tipc_sk_drop_msg()
      trace_tipc_sk_release()
      trace_tipc_sk_shutdown()
      trace_tipc_sk_overlimit1()
      trace_tipc_sk_overlimit2()
      
      Also, enables the traces for the following cases:
      - When user creates a TIPC socket;
      - When user calls poll() on TIPC socket;
      - When user sends a dgram/mcast/stream message.
      - When a message is put into the socket 'sk_receive_queue';
      - When a message is released from the socket 'sk_receive_queue';
      - When a message is rejected (e.g. due to no port, invalid, etc.);
      - When a message is dropped (e.g. due to wrong message type);
      - When socket is released;
      - When socket is shutdown;
      - When socket rcvq's allocation is overlimit (> 90%);
      - When socket rcvq + bklq's allocation is overlimit (> 90%);
      - When the 'TIPC_ERR_OVERLOAD/2' issue happens;
      
      Note:
      a) All the socket traces are designed to be able to trace on a specific
      socket by either using the 'event filtering' feature on a known socket
      'portid' value or the sysctl file:
      
      /proc/sys/net/tipc/sk_filter
      
      The file determines a 'tuple' for what socket should be traced:
      
      (portid, sock type, name type, name lower, name upper)
      
      where:
      + 'portid' is the socket portid generated at socket creating, can be
      found in the trace outputs or the 'tipc socket list' command printouts;
      + 'sock type' is the socket type (1 = SOCK_TREAM, ...);
      + 'name type', 'name lower' and 'name upper' are the service name being
      connected to or published by the socket.
      
      Value '0' means 'ANY', the default tuple value is (0, 0, 0, 0, 0) i.e.
      the traces happen for every sockets with no filter.
      
      b) The 'tipc_sk_overlimit1/2' event is also a conditional trace_event
      which happens when the socket receive queue (and backlog queue) is
      about to be overloaded, when the queue allocation is > 90%. Then, when
      the trace is enabled, the last skbs leading to the TIPC_ERR_OVERLOAD/2
      issue can be traced.
      
      The trace event is designed as an 'upper watermark' notification that
      the other traces (e.g. 'tipc_sk_advance_rx' vs 'tipc_sk_filter_rcv') or
      actions can be triggerred in the meanwhile to see what is going on with
      the socket queue.
      
      In addition, the 'trace_tipc_sk_dump()' is also placed at the
      'TIPC_ERR_OVERLOAD/2' case, so the socket and last skb can be dumped
      for post-analysis.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Tested-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      01e661eb
    • T
      tipc: enable tracepoints in tipc · b4b9771b
      Tuong Lien 提交于
      As for the sake of debugging/tracing, the commit enables tracepoints in
      TIPC along with some general trace_events as shown below. It also
      defines some 'tipc_*_dump()' functions that allow to dump TIPC object
      data whenever needed, that is, for general debug purposes, ie. not just
      for the trace_events.
      
      The following trace_events are now available:
      
      - trace_tipc_skb_dump(): allows to trace and dump TIPC msg & skb data,
        e.g. message type, user, droppable, skb truesize, cloned skb, etc.
      
      - trace_tipc_list_dump(): allows to trace and dump any TIPC buffers or
        queues, e.g. TIPC link transmq, socket receive queue, etc.
      
      - trace_tipc_sk_dump(): allows to trace and dump TIPC socket data, e.g.
        sk state, sk type, connection type, rmem_alloc, socket queues, etc.
      
      - trace_tipc_link_dump(): allows to trace and dump TIPC link data, e.g.
        link state, silent_intv_cnt, gap, bc_gap, link queues, etc.
      
      - trace_tipc_node_dump(): allows to trace and dump TIPC node data, e.g.
        node state, active links, capabilities, link entries, etc.
      
      How to use:
      Put the trace functions at any places where we want to dump TIPC data
      or events.
      
      Note:
      a) The dump functions will generate raw data only, that is, to offload
      the trace event's processing, it can require a tool or script to parse
      the data but this should be simple.
      
      b) The trace_tipc_*_dump() should be reserved for a failure cases only
      (e.g. the retransmission failure case) or where we do not expect to
      happen too often, then we can consider enabling these events by default
      since they will almost not take any effects under normal conditions,
      but once the rare condition or failure occurs, we get the dumped data
      fully for post-analysis.
      
      For other trace purposes, we can reuse these trace classes as template
      but different events.
      
      c) A trace_event is only effective when we enable it. To enable the
      TIPC trace_events, echo 1 to 'enable' files in the events/tipc/
      directory in the 'debugfs' file system. Normally, they are located at:
      
      /sys/kernel/debug/tracing/events/tipc/
      
      For example:
      
      To enable the tipc_link_dump event:
      
      echo 1 > /sys/kernel/debug/tracing/events/tipc/tipc_link_dump/enable
      
      To enable all the TIPC trace_events:
      
      echo 1 > /sys/kernel/debug/tracing/events/tipc/enable
      
      To collect the trace data:
      
      cat trace
      
      or
      
      cat trace_pipe > /trace.out &
      
      To disable all the TIPC trace_events:
      
      echo 0 > /sys/kernel/debug/tracing/events/tipc/enable
      
      To clear the trace buffer:
      
      echo > trace
      
      d) Like the other trace_events, the feature like 'filter' or 'trigger'
      is also usable for the tipc trace_events.
      For more details, have a look at:
      
      Documentation/trace/ftrace.txt
      
      MAINTAINERS | add two new files 'trace.h' & 'trace.c' in tipc
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Tested-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b4b9771b
  2. 18 11月, 2018 1 次提交
  3. 24 10月, 2018 1 次提交
    • K
      Revert "net: simplify sock_poll_wait" · 89ab066d
      Karsten Graul 提交于
      This reverts commit dd979b4d.
      
      This broke tcp_poll for SMC fallback: An AF_SMC socket establishes an
      internal TCP socket for the initial handshake with the remote peer.
      Whenever the SMC connection can not be established this TCP socket is
      used as a fallback. All socket operations on the SMC socket are then
      forwarded to the TCP socket. In case of poll, the file->private_data
      pointer references the SMC socket because the TCP socket has no file
      assigned. This causes tcp_poll to wait on the wrong socket.
      Signed-off-by: NKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      89ab066d
  4. 11 10月, 2018 1 次提交
  5. 30 9月, 2018 4 次提交
  6. 26 9月, 2018 1 次提交
  7. 07 9月, 2018 1 次提交
    • C
      tipc: call start and done ops directly in __tipc_nl_compat_dumpit() · 8f5c5fcf
      Cong Wang 提交于
      __tipc_nl_compat_dumpit() uses a netlink_callback on stack,
      so the only way to align it with other ->dumpit() call path
      is calling tipc_dump_start() and tipc_dump_done() directly
      inside it. Otherwise ->dumpit() would always get NULL from
      cb->args[].
      
      But tipc_dump_start() uses sock_net(cb->skb->sk) to retrieve
      net pointer, the cb->skb here doesn't set skb->sk, the net pointer
      is saved in msg->net instead, so introduce a helper function
      __tipc_dump_start() to pass in msg->net.
      
      Ying pointed out cb->args[0...3] are already used by other
      callbacks on this call path, so we can't use cb->args[0] any
      more, use cb->args[4] instead.
      
      Fixes: 9a07efa9 ("tipc: switch to rhashtable iterator")
      Reported-and-tested-by: syzbot+e93a2c41f91b8e2c7d9b@syzkaller.appspotmail.com
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Cc: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8f5c5fcf
  8. 06 9月, 2018 1 次提交
  9. 30 8月, 2018 2 次提交
    • C
      tipc: switch to rhashtable iterator · 9a07efa9
      Cong Wang 提交于
      syzbot reported a use-after-free in tipc_group_fill_sock_diag(),
      where tipc_group_fill_sock_diag() still reads tsk->group meanwhile
      tipc_group_delete() just deletes it in tipc_release().
      
      tipc_nl_sk_walk() aims to lock this sock when walking each sock
      in the hash table to close race conditions with sock changes like
      this one, by acquiring tsk->sk.sk_lock.slock spinlock, unfortunately
      this doesn't work at all. All non-BH call path should take
      lock_sock() instead to make it work.
      
      tipc_nl_sk_walk() brutally iterates with raw rht_for_each_entry_rcu()
      where RCU read lock is required, this is the reason why lock_sock()
      can't be taken on this path. This could be resolved by switching to
      rhashtable iterator API's, where taking a sleepable lock is possible.
      Also, the iterator API's are friendly for restartable calls like
      diag dump, the last position is remembered behind the scence,
      all we need to do here is saving the iterator into cb->args[].
      
      I tested this with parallel tipc diag dump and thousands of tipc
      socket creation and release, no crash or memory leak.
      
      Reported-by: syzbot+b9c8f3ab2994b7cd1625@syzkaller.appspotmail.com
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Cc: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9a07efa9
    • C
      tipc: fix a missing rhashtable_walk_exit() · bd583fe3
      Cong Wang 提交于
      rhashtable_walk_exit() must be paired with rhashtable_walk_enter().
      
      Fixes: 40f9f439 ("tipc: Fix tipc_sk_reinit race conditions")
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bd583fe3
  10. 02 8月, 2018 1 次提交
  11. 31 7月, 2018 1 次提交
  12. 30 6月, 2018 1 次提交
  13. 29 6月, 2018 1 次提交
    • L
      Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL · a11e1d43
      Linus Torvalds 提交于
      The poll() changes were not well thought out, and completely
      unexplained.  They also caused a huge performance regression, because
      "->poll()" was no longer a trivial file operation that just called down
      to the underlying file operations, but instead did at least two indirect
      calls.
      
      Indirect calls are sadly slow now with the Spectre mitigation, but the
      performance problem could at least be largely mitigated by changing the
      "->get_poll_head()" operation to just have a per-file-descriptor pointer
      to the poll head instead.  That gets rid of one of the new indirections.
      
      But that doesn't fix the new complexity that is completely unwarranted
      for the regular case.  The (undocumented) reason for the poll() changes
      was some alleged AIO poll race fixing, but we don't make the common case
      slower and more complex for some uncommon special case, so this all
      really needs way more explanations and most likely a fundamental
      redesign.
      
      [ This revert is a revert of about 30 different commits, not reverted
        individually because that would just be unnecessarily messy  - Linus ]
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a11e1d43
  14. 26 5月, 2018 1 次提交
  15. 11 5月, 2018 1 次提交
    • E
      tipc: fix one byte leak in tipc_sk_set_orig_addr() · 09c8b971
      Eric Dumazet 提交于
      sysbot/KMSAN reported an uninit-value in recvmsg() that
      I tracked down to tipc_sk_set_orig_addr(), missing
      srcaddr->member.scope initialization.
      
      This patches moves srcaddr->sock.scope init to follow
      fields order and ease future verifications.
      
      BUG: KMSAN: uninit-value in copy_to_user include/linux/uaccess.h:184 [inline]
      BUG: KMSAN: uninit-value in move_addr_to_user+0x32e/0x530 net/socket.c:226
      CPU: 0 PID: 4549 Comm: syz-executor287 Not tainted 4.17.0-rc3+ #88
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x185/0x1d0 lib/dump_stack.c:113
       kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
       kmsan_internal_check_memory+0x135/0x1e0 mm/kmsan/kmsan.c:1157
       kmsan_copy_to_user+0x69/0x160 mm/kmsan/kmsan.c:1199
       copy_to_user include/linux/uaccess.h:184 [inline]
       move_addr_to_user+0x32e/0x530 net/socket.c:226
       ___sys_recvmsg+0x4e2/0x810 net/socket.c:2285
       __sys_recvmsg net/socket.c:2328 [inline]
       __do_sys_recvmsg net/socket.c:2338 [inline]
       __se_sys_recvmsg net/socket.c:2335 [inline]
       __x64_sys_recvmsg+0x325/0x460 net/socket.c:2335
       do_syscall_64+0x154/0x220 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x4455e9
      RSP: 002b:00007fe3bd36ddb8 EFLAGS: 00000246 ORIG_RAX: 000000000000002f
      RAX: ffffffffffffffda RBX: 00000000006dac24 RCX: 00000000004455e9
      RDX: 0000000000002002 RSI: 0000000020000400 RDI: 0000000000000003
      RBP: 00000000006dac20 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 00007fff98ce4b6f R14: 00007fe3bd36e9c0 R15: 0000000000000003
      
      Local variable description: ----addr@___sys_recvmsg
      Variable was created at:
       ___sys_recvmsg+0xd5/0x810 net/socket.c:2246
       __sys_recvmsg net/socket.c:2328 [inline]
       __do_sys_recvmsg net/socket.c:2338 [inline]
       __se_sys_recvmsg net/socket.c:2335 [inline]
       __x64_sys_recvmsg+0x325/0x460 net/socket.c:2335
      
      Byte 19 of 32 is uninitialized
      
      Fixes: 31c82a2d ("tipc: add second source address to recvmsg()/recvfrom()")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Cc: Ying Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      09c8b971
  16. 27 4月, 2018 1 次提交
  17. 13 4月, 2018 1 次提交
    • J
      tipc: fix missing initializer in tipc_sendmsg() · 335b929b
      Jon Maloy 提交于
      The stack variable 'dnode' in __tipc_sendmsg() may theoretically
      end up tipc_node_get_mtu() as an unitilalized variable.
      
      We fix this by intializing the variable at declaration. We also add
      a default else clause to the two conditional ones already there, so
      that we never end up in the named function if the given address
      type is illegal.
      
      Reported-by: syzbot+b0975ce9355b347c1546@syzkaller.appspotmail.com
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      335b929b
  18. 09 4月, 2018 1 次提交
    • C
      tipc: use the right skb in tipc_sk_fill_sock_diag() · e41f0548
      Cong Wang 提交于
      Commit 4b2e6877 ("tipc: Fix namespace violation in tipc_sk_fill_sock_diag")
      tried to fix the crash but failed, the crash is still 100% reproducible
      with it.
      
      In tipc_sk_fill_sock_diag(), skb is the diag dump we are filling, it is not
      correct to retrieve its NETLINK_CB(), instead, like other protocol diag,
      we should use NETLINK_CB(cb->skb).sk here.
      
      Reported-by: <syzbot+326e587eff1074657718@syzkaller.appspotmail.com>
      Fixes: 4b2e6877 ("tipc: Fix namespace violation in tipc_sk_fill_sock_diag")
      Fixes: c30b70de (tipc: implement socket diagnostics for AF_TIPC)
      Cc: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Cc: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e41f0548
  19. 04 4月, 2018 1 次提交
  20. 01 4月, 2018 1 次提交
    • J
      tipc: permit overlapping service ranges in name table · 37922ea4
      Jon Maloy 提交于
      With the new RB tree structure for service ranges it becomes possible to
      solve an old problem; - we can now allow overlapping service ranges in
      the table.
      
      When inserting a new service range to the tree, we use 'lower' as primary
      key, and when necessary 'upper' as secondary key.
      
      Since there may now be multiple service ranges matching an indicated
      'lower' value, we must also add the 'upper' value to the functions
      used for removing publications, so that the correct, corresponding
      range item can be found.
      
      These changes guarantee that a well-formed publication/withdrawal item
      from a peer node never will be rejected, and make it possible to
      eliminate the problematic backlog functionality we currently have for
      handling such cases.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      37922ea4
  21. 24 3月, 2018 1 次提交
  22. 23 3月, 2018 3 次提交
  23. 18 3月, 2018 2 次提交
    • J
      tipc: some name changes · e50e73e1
      Jon Maloy 提交于
      We rename some lists and fields in struct publication both to make
      the naming more consistent and to better reflect their roles. We
      also update the descriptions of those lists.
      
      node_list -> local_publ
      cluster_list -> all_publ
      pport_list -> binding_sock
      ref -> port
      
      There are no functional changes in this commit.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e50e73e1
    • J
      tipc: obsolete TIPC_ZONE_SCOPE · 928df188
      Jon Maloy 提交于
      Publications for TIPC_CLUSTER_SCOPE and TIPC_ZONE_SCOPE are in all
      aspects handled the same way, both on the publishing node and on the
      receiving nodes.
      
      Despite previous ambitions to the contrary, this is never going to change,
      so we take the conseqeunce of this and obsolete TIPC_ZONE_SCOPE and related
      macros/functions. Whenever a user is doing a bind() or a sendmsg() attempt
      using ZONE_SCOPE we translate this internally to CLUSTER_SCOPE, while we
      remain compatible with users and remote nodes still using ZONE_SCOPE.
      
      Furthermore, the non-formalized scope value 0 has always been permitted
      for use during lookup, with the same meaning as ZONE_SCOPE/CLUSTER_SCOPE.
      We now permit it even as binding scope, but for compatibility reasons we
      choose to not change the value of TIPC_CLUSTER_SCOPE.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      928df188
  24. 28 2月, 2018 1 次提交
    • J
      tipc: correct initial value for group congestion flag · 1b22bcad
      Jon Maloy 提交于
      In commit 60c25306 ("tipc: fix race between poll() and
      setsockopt()") we introduced a pointer from struct tipc_group to the
      'group_is_connected' flag in struct tipc_sock, so that this field can
      be checked without dereferencing the group pointer of the latter struct.
      
      The initial value for this flag is correctly set to 'false' when a
      group is created, but we miss the case when no group is created at
      all, in which case the initial value should be 'true'. This has the
      effect that SOCK_RDM/DGRAM sockets sending datagrams never receive
      POLLOUT if they request so.
      
      This commit corrects this bug.
      
      Fixes: 60c25306 ("tipc: fix race between poll() and setsockopt()")
      Reported-by: NHoang Le <hoang.h.le@dektek.com.au>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1b22bcad
  25. 13 2月, 2018 1 次提交
    • D
      net: make getname() functions return length rather than use int* parameter · 9b2c45d4
      Denys Vlasenko 提交于
      Changes since v1:
      Added changes in these files:
          drivers/infiniband/hw/usnic/usnic_transport.c
          drivers/staging/lustre/lnet/lnet/lib-socket.c
          drivers/target/iscsi/iscsi_target_login.c
          drivers/vhost/net.c
          fs/dlm/lowcomms.c
          fs/ocfs2/cluster/tcp.c
          security/tomoyo/network.c
      
      Before:
      All these functions either return a negative error indicator,
      or store length of sockaddr into "int *socklen" parameter
      and return zero on success.
      
      "int *socklen" parameter is awkward. For example, if caller does not
      care, it still needs to provide on-stack storage for the value
      it does not need.
      
      None of the many FOO_getname() functions of various protocols
      ever used old value of *socklen. They always just overwrite it.
      
      This change drops this parameter, and makes all these functions, on success,
      return length of sockaddr. It's always >= 0 and can be differentiated
      from an error.
      
      Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.
      
      rpc_sockname() lost "int buflen" parameter, since its only use was
      to be passed to kernel_getsockname() as &buflen and subsequently
      not used in any way.
      
      Userspace API is not changed.
      
          text    data     bss      dec     hex filename
      30108430 2633624  873672 33615726 200ef6e vmlinux.before.o
      30108109 2633612  873672 33615393 200ee21 vmlinux.o
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: linux-kernel@vger.kernel.org
      CC: netdev@vger.kernel.org
      CC: linux-bluetooth@vger.kernel.org
      CC: linux-decnet-user@lists.sourceforge.net
      CC: linux-wireless@vger.kernel.org
      CC: linux-rdma@vger.kernel.org
      CC: linux-sctp@vger.kernel.org
      CC: linux-nfs@vger.kernel.org
      CC: linux-x25@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9b2c45d4
  26. 12 2月, 2018 1 次提交
    • L
      vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds 提交于
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But they keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a9a08845
  27. 20 1月, 2018 1 次提交
    • J
      tipc: fix race between poll() and setsockopt() · 60c25306
      Jon Maloy 提交于
      Letting tipc_poll() dereference a socket's pointer to struct tipc_group
      entails a race risk, as the group item may be deleted in a concurrent
      tipc_sk_join() or tipc_sk_leave() thread.
      
      We now move the 'open' flag in struct tipc_group to struct tipc_sock,
      and let the former retain only a pointer to the moved field. This will
      eliminate the race risk.
      
      Reported-by: syzbot+799dafde0286795858ac@syzkaller.appspotmail.com
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      60c25306
  28. 16 1月, 2018 2 次提交
    • J
      tipc: fix bug during lookup of multicast destination nodes · e9a03445
      Jon Maloy 提交于
      In commit 232d07b7 ("tipc: improve groupcast scope handling") we
      inadvertently broke non-group multicast transmission when changing the
      parameter 'domain' to 'scope' in the function
      tipc_nametbl_lookup_dst_nodes(). We missed to make the corresponding
      change in the calling function, with the result that the lookup always
      fails.
      
      A closer anaysis reveals that this parameter is not needed at all.
      Non-group multicast is hard coded to use CLUSTER_SCOPE, and in the
      current implementation this will be delivered to all matching
      destinations except those which are published with NODE_SCOPE on other
      nodes. Since such publications never will be visible on the sending node
      anyway, it makes no sense to discriminate by scope at all.
      
      We now remove this parameter altogether.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e9a03445
    • J
      tipc: fix a potental access after delete in tipc_sk_join() · febafc84
      Jon Maloy 提交于
      In commit d12d2e12 "tipc: send out join messages as soon as new
      member is discovered") we added a call to the function tipc_group_join()
      without considering the case that the preceding tipc_sk_publish() might
      have failed, and the group item already deleted.
      
      We fix this by returning from tipc_sk_join() directly after the
      failed tipc_sk_publish.
      
      Reported-by: syzbot+e3eeae78ea88b8d6d858@syzkaller.appspotmail.com
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      febafc84
  29. 10 1月, 2018 3 次提交
    • J
      tipc: improve poll() for group member socket · eb929a91
      Jon Maloy 提交于
      The current criteria for returning POLLOUT from a group member socket is
      too simplistic. It basically returns POLLOUT as soon as the group has
      external destinations, something obviously leading to a lot of spinning
      during destination congestion situations. At the same time, the internal
      congestion handling is unnecessarily complex.
      
      We now change this as follows.
      
      - We introduce an 'open' flag in  struct tipc_group. This flag is used
        only to help poll() get the setting of POLLOUT right, and *not* for
        congeston handling as such. This means that a user can choose to
        ignore an  EAGAIN for a destination and go on sending messages to
        other destinations in the group if he wants to.
      
      - The flag is set to false every time we return EAGAIN on a send call.
      
      - The flag is set to true every time any member, i.e., not necessarily
        the member that caused EAGAIN, is removed from the small_win list.
      
      - We remove the group member 'usr_pending' flag. The size of the send
        window and presence in the 'small_win' list is sufficient criteria
        for recognizing congestion.
      
      This solution seems to be a reasonable compromise between 'anycast',
      which is normally not waiting for POLLOUT for a specific destination,
      and the other three send modes, which are.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eb929a91
    • J
      tipc: improve groupcast scope handling · 232d07b7
      Jon Maloy 提交于
      When a member joins a group, it also indicates a binding scope. This
      makes it possible to create both node local groups, invisible to other
      nodes, as well as cluster global groups, visible everywhere.
      
      In order to avoid that different members end up having permanently
      differing views of group size and memberhip, we must inhibit locally
      and globally bound members from joining the same group.
      
      We do this by using the binding scope as an additional separator between
      groups. I.e., a member must ignore all membership events from sockets
      using a different scope than itself, and all lookups for message
      destinations must require an exact match between the message's lookup
      scope and the potential target's binding scope.
      
      Apart from making it possible to create local groups using the same
      identity on different nodes, a side effect of this is that it now also
      becomes possible to create a cluster global group with the same identity
      across the same nodes, without interfering with the local groups.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      232d07b7
    • J
      tipc: send out join messages as soon as new member is discovered · d12d2e12
      Jon Maloy 提交于
      When a socket is joining a group, we look up in the binding table to
      find if there are already other members of the group present. This is
      used for being able to return EAGAIN instead of EHOSTUNREACH if the
      user proceeds directly to a send attempt.
      
      However, the information in the binding table can be used to directly
      set the created member in state MBR_PUBLISHED and send a JOIN message
      to the peer, instead of waiting for a topology PUBLISH event to do this.
      When there are many members in a group, the propagation time for such
      events can be significant, and we can save time during the join
      operation if we use the initial lookup result fully.
      
      In this commit, we eliminate the member state MBR_DISCOVERED which has
      been the result of the initial lookup, and do instead go directly to
      MBR_PUBLISHED, which initiates the setup.
      
      After this change, the tipc_member FSM looks as follows:
      
           +-----------+
      ---->| PUBLISHED |-----------------------------------------------+
      PUB- +-----------+                                 LEAVE/WITHRAW |
      LISH       |JOIN                                                 |
                 |     +-------------------------------------------+   |
                 |     |                            LEAVE/WITHDRAW |   |
                 |     |                +------------+             |   |
                 |     |   +----------->|  PENDING   |---------+   |   |
                 |     |   |msg/maxactv +-+---+------+  LEAVE/ |   |   |
                 |     |   |              |   |       WITHDRAW |   |   |
                 |     |   |   +----------+   |                |   |   |
                 |     |   |   |revert/maxactv|                |   |   |
                 |     |   |   V              V                V   V   V
                 |   +----------+  msg  +------------+       +-----------+
                 +-->|  JOINED  |------>|   ACTIVE   |------>|  LEAVING  |--->
                 |   +----------+       +--- -+------+ LEAVE/+-----------+DOWN
                 |        A   A               |      WITHDRAW A   A    A   EVT
                 |        |   |               |RECLAIM        |   |    |
                 |        |   |REMIT          V               |   |    |
                 |        |   |== adv   +------------+        |   |    |
                 |        |   +---------| RECLAIMING |--------+   |    |
                 |        |             +-----+------+  LEAVE/    |    |
                 |        |                   |REMIT   WITHDRAW   |    |
                 |        |                   |< adv              |    |
                 |        |msg/               V            LEAVE/ |    |
                 |        |adv==ADV_IDLE+------------+   WITHDRAW |    |
                 |        +-------------|  REMITTED  |------------+    |
                 |                      +------------+                 |
                 |PUBLISH                                              |
      JOIN +-----------+                                LEAVE/WITHDRAW |
      ---->|  JOINING  |-----------------------------------------------+
           +-----------+
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d12d2e12