1. 01 4月, 2018 1 次提交
    • J
      tipc: replace name table service range array with rb tree · 218527fe
      Jon Maloy 提交于
      The current design of the binding table has an unnecessary memory
      consuming and complex data structure. It aggregates the service range
      items into an array, which is expanded by a factor two every time it
      becomes too small to hold a new item. Furthermore, the arrays never
      shrink when the number of ranges diminishes.
      
      We now replace this array with an RB tree that is holding the range
      items as tree nodes, each range directly holding a list of bindings.
      
      This, along with a few name changes, improves both readability and
      volume of the code, as well as reducing memory consumption and hopefully
      improving cache hit rate.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      218527fe
  2. 28 3月, 2018 1 次提交
  3. 27 3月, 2018 2 次提交
  4. 26 3月, 2018 1 次提交
  5. 24 3月, 2018 8 次提交
    • J
      tipc: obtain node identity from interface by default · 52dfae5c
      Jon Maloy 提交于
      Selecting and explicitly configuring a TIPC node identity may be
      unwanted in some cases.
      
      In this commit we introduce a default setting if the identity has not
      been set at the moment the first bearer is enabled. We do this by
      using a raw copy of a unique identifier from the used interface: MAC
      address in the case of an L2 bearer, IPv4/IPv6 address in the case
      of a UDP bearer.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52dfae5c
    • J
      tipc: handle collisions of 32-bit node address hash values · 25b0b9c4
      Jon Maloy 提交于
      When a 32-bit node address is generated from a 128-bit identifier,
      there is a risk of collisions which must be discovered and handled.
      
      We do this as follows:
      - We don't apply the generated address immediately to the node, but do
        instead initiate a 1 sec trial period to allow other cluster members
        to discover and handle such collisions.
      
      - During the trial period the node periodically sends out a new type
        of message, DSC_TRIAL_MSG, using broadcast or emulated broadcast,
        to all the other nodes in the cluster.
      
      - When a node is receiving such a message, it must check that the
        presented 32-bit identifier either is unused, or was used by the very
        same peer in a previous session. In both cases it accepts the request
        by not responding to it.
      
      - If it finds that the same node has been up before using a different
        address, it responds with a DSC_TRIAL_FAIL_MSG containing that
        address.
      
      - If it finds that the address has already been taken by some other
        node, it generates a new, unused address and returns it to the
        requester.
      
      - During the trial period the requesting node must always be prepared
        to accept a failure message, i.e., a message where a peer suggests a
        different (or equal)  address to the one tried. In those cases it
        must apply the suggested value as trial address and restart the trial
        period.
      
      This algorithm ensures that in the vast majority of cases a node will
      have the same address before and after a reboot. If a legacy user
      configures the address explicitly, there will be no trial period and
      messages, so this protocol addition is completely backwards compatible.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      25b0b9c4
    • J
      tipc: add 128-bit node identifier · d50ccc2d
      Jon Maloy 提交于
      We add a 128-bit node identity, as an alternative to the currently used
      32-bit node address.
      
      For the sake of compatibility and to minimize message header changes
      we retain the existing 32-bit address field. When not set explicitly by
      the user, this field will be filled with a hash value generated from the
      much longer node identity, and be used as a shorthand value for the
      latter.
      
      We permit either the address or the identity to be set by configuration,
      but not both, so when the address value is set by a legacy user the
      corresponding 128-bit node identity is generated based on the that value.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d50ccc2d
    • J
      tipc: remove direct accesses to own_addr field in struct tipc_net · 23fd3eac
      Jon Maloy 提交于
      As a preparation to changing the addressing structure of TIPC we replace
      all direct accesses to the tipc_net::own_addr field with the function
      dedicated for this, tipc_own_addr().
      
      There are no changes to program logics in this commit.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      23fd3eac
    • J
      tipc: allow closest-first lookup algorithm when legacy address is configured · b89afb11
      Jon Maloy 提交于
      The removal of an internal structure of the node address has an unwanted
      side effect.
      - Currently, if a user is sending an anycast message with destination
        domain 0, the tipc_namebl_translate() function will use the 'closest-
        first' algorithm to first look for a node local destination, and only
        when no such is found, will it resort to the cluster global 'round-
        robin' lookup algorithm.
      - Current users can get around this, and enforce unconditional use of
        global round-robin by indicating a destination as Z.0.0 or Z.C.0.
      - This option disappears when we make the node address flat, since the
        lookup algorithm has no way of recognizing this case. So, as long as
        there are node local destinations, the algorithm will always select
        one of those, and there is nothing the sender can do to change this.
      
      We solve this by eliminating the 'closest-first' option, which was never
      a good idea anyway, for non-legacy users, but only for those. To
      distinguish between legacy users and non-legacy users we introduce a new
      flag 'legacy_addr_format' in struct tipc_core, to be set when the user
      configures a legacy-style Z.C.N node address. Hence, when a legacy user
      indicates a zero lookup domain 'closest-first' is selected, and in all
      other cases we use 'round-robin'.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b89afb11
    • J
      tipc: remove restrictions on node address values · 20263641
      Jon Maloy 提交于
      Nominally, TIPC organizes network nodes into a three-level network
      hierarchy consisting of the levels 'zone', 'cluster' and 'node'. This
      hierarchy is reflected in the node address format, - it is sub-divided
      into an 8-bit zone id, and 12 bit cluster id, and a 12-bit node id.
      
      However, the 'zone' and 'cluster' levels have in reality never been
      fully implemented,and never will be. The result of this has been
      that the first 20 bits the node identity structure have been wasted,
      and the usable node identity range within a cluster has been limited
      to 12 bits. This is starting to become a problem.
      
      In the following commits, we will need to be able to connect between
      nodes which are using the whole 32-bit value space of the node address.
      We therefore remove the restrictions on which values can be assigned
      to node identity, -it is from now on only a 32-bit integer with no
      assumed internal structure.
      
      Isolation between clusters is now achieved only by setting different
      values for the 'network id' field used during neighbor discovery, in
      practice leading to the latter becoming the new cluster identity.
      
      The rules for accepting discovery requests/responses from neighboring
      nodes now become:
      
      - If the user is using legacy address format on both peers, reception
        of discovery messages is subject to the legacy lookup domain check
        in addition to the cluster id check.
      
      - Otherwise, the discovery request/response is always accepted, provided
        both peers have the same network id.
      
      This secures backwards compatibility for users who have been using zone
      or cluster identities as cluster separators, instead of the intended
      'network id'.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      20263641
    • J
      tipc: some cleanups in the file discover.c · b39e465e
      Jon Maloy 提交于
      To facilitate the coming changes in the neighbor discovery functionality
      we make some renaming and refactoring of that code. The functional changes
      in this commit are trivial, e.g., that we move the message sending call in
      tipc_disc_timeout() outside the spinlock protected region.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b39e465e
    • J
      tipc: refactor function tipc_enable_bearer() · cb30a633
      Jon Maloy 提交于
      As a preparation for the next commits we try to reduce the footprint of
      the function tipc_enable_bearer(), while hopefully making is simpler to
      follow.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cb30a633
  6. 23 3月, 2018 3 次提交
  7. 18 3月, 2018 5 次提交
  8. 13 3月, 2018 1 次提交
  9. 08 3月, 2018 1 次提交
  10. 28 2月, 2018 1 次提交
    • J
      tipc: correct initial value for group congestion flag · 1b22bcad
      Jon Maloy 提交于
      In commit 60c25306 ("tipc: fix race between poll() and
      setsockopt()") we introduced a pointer from struct tipc_group to the
      'group_is_connected' flag in struct tipc_sock, so that this field can
      be checked without dereferencing the group pointer of the latter struct.
      
      The initial value for this flag is correctly set to 'false' when a
      group is created, but we miss the case when no group is created at
      all, in which case the initial value should be 'true'. This has the
      effect that SOCK_RDM/DGRAM sockets sending datagrams never receive
      POLLOUT if they request so.
      
      This commit corrects this bug.
      
      Fixes: 60c25306 ("tipc: fix race between poll() and setsockopt()")
      Reported-by: NHoang Le <hoang.h.le@dektek.com.au>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1b22bcad
  11. 20 2月, 2018 2 次提交
    • P
      tipc: don't call sock_release() in atomic context · 26736a08
      Paolo Abeni 提交于
      syzbot reported a scheduling while atomic issue at netns
      destruction time:
      
      BUG: sleeping function called from invalid context at net/core/sock.c:2769
      in_atomic(): 1, irqs_disabled(): 0, pid: 85, name: kworker/u4:3
      5 locks held by kworker/u4:3/85:
        #0:  ((wq_completion)"%s""netns"){+.+.}, at: [<00000000c9792deb>]
      process_one_work+0xaaf/0x1af0 kernel/workqueue.c:2084
        #1:  (net_cleanup_work){+.+.}, at: [<00000000adc12e2a>]
      process_one_work+0xb01/0x1af0 kernel/workqueue.c:2088
        #2:  (net_sem){++++}, at: [<000000009ccb5669>] cleanup_net+0x23f/0xd20
      net/core/net_namespace.c:494
        #3:  (net_mutex){+.+.}, at: [<00000000a92767d9>] cleanup_net+0xa7d/0xd20
      net/core/net_namespace.c:496
        #4:  (&(&srv->idr_lock)->rlock){+...}, at: [<000000001343e568>]
      spin_lock_bh include/linux/spinlock.h:315 [inline]
        #4:  (&(&srv->idr_lock)->rlock){+...}, at: [<000000001343e568>]
      tipc_topsrv_stop+0x231/0x610 net/tipc/topsrv.c:685
      CPU: 0 PID: 85 Comm: kworker/u4:3 Not tainted 4.16.0-rc1+ #230
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Workqueue: netns cleanup_net
      Call Trace:
        __dump_stack lib/dump_stack.c:17 [inline]
        dump_stack+0x194/0x257 lib/dump_stack.c:53
        ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6128
        __might_sleep+0x95/0x190 kernel/sched/core.c:6081
        lock_sock_nested+0x37/0x110 net/core/sock.c:2769
        lock_sock include/net/sock.h:1463 [inline]
        tipc_release+0x103/0xff0 net/tipc/socket.c:572
        sock_release+0x8d/0x1e0 net/socket.c:594
        tipc_topsrv_stop+0x3c0/0x610 net/tipc/topsrv.c:696
        tipc_exit_net+0x15/0x40 net/tipc/core.c:96
        ops_exit_list.isra.6+0xae/0x150 net/core/net_namespace.c:148
        cleanup_net+0x6ba/0xd20 net/core/net_namespace.c:529
        process_one_work+0xbbf/0x1af0 kernel/workqueue.c:2113
        worker_thread+0x223/0x1990 kernel/workqueue.c:2247
        kthread+0x33c/0x400 kernel/kthread.c:238
        ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:429
      
      This is caused by tipc_topsrv_stop() releasing the listener socket
      with the idr lock held. This changeset addresses the issue moving
      the release operation outside such lock.
      
      Reported-and-tested-by: syzbot+749d9d87c294c00ca856@syzkaller.appspotmail.com
      Fixes: 0ef897be ("tipc: separate topology server listener socket from subcsriber sockets")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Acked-by:  ///jon
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      26736a08
    • J
      tipc: fix bug on error path in tipc_topsrv_kern_subscr() · 96c252bf
      Jon Maloy 提交于
      In commit cc1ea9ffadf7 ("tipc: eliminate struct tipc_subscriber") we
      re-introduced an old bug on the error path in the function
      tipc_topsrv_kern_subscr(). We now re-introduce the correction too.
      
      Reported-by: syzbot+f62e0f2a0ef578703946@syzkaller.appspotmail.com
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      96c252bf
  12. 17 2月, 2018 10 次提交
  13. 15 2月, 2018 4 次提交
    • J
      tipc: apply bearer link tolerance on running links · 37c64cf6
      Jon Maloy 提交于
      Currently, the default link tolerance set in struct tipc_bearer only
      has effect on links going up after that moment. I.e., a user has to
      reset all the node's links across that bearer to have the new value
      applied. This is too limiting and disturbing on a running cluster to
      be useful.
      
      We now change this so that also already existing links are updated
      dynamically, without any need for a reset, when the bearer value is
      changed. We leverage the already existing per-link functionality
      for this to achieve the wanted effect.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      37c64cf6
    • Y
      tipc: Fix missing RTNL lock protection during setting link properties · ed4ffdfe
      Ying Xue 提交于
      Currently when user changes link properties, TIPC first checks if
      user's command message contains media name or bearer name through
      tipc_media_find() or tipc_bearer_find() which is protected by RTNL
      lock. But when tipc_nl_compat_link_set() conducts the checking with
      the two functions, it doesn't hold RTNL lock at all, as a result,
      the following complaints were reported:
      
      audit: type=1400 audit(1514679888.244:9): avc:  denied  { write } for
      pid=3194 comm="syzkaller021477" path="socket:[11143]" dev="sockfs"
      ino=11143 scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023
      tcontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023
      tclass=netlink_generic_socket permissive=1
      Reviewed-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      
      =============================
      WARNING: suspicious RCU usage
      4.15.0-rc5+ #152 Not tainted
      -----------------------------
      net/tipc/bearer.c:177 suspicious rcu_dereference_protected() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      2 locks held by syzkaller021477/3194:
        #0:  (cb_lock){++++}, at: [<00000000d20133ea>] genl_rcv+0x19/0x40
      net/netlink/genetlink.c:634
        #1:  (genl_mutex){+.+.}, at: [<00000000fcc5d1bc>] genl_lock
      net/netlink/genetlink.c:33 [inline]
        #1:  (genl_mutex){+.+.}, at: [<00000000fcc5d1bc>] genl_rcv_msg+0x115/0x140
      net/netlink/genetlink.c:622
      
      stack backtrace:
      CPU: 1 PID: 3194 Comm: syzkaller021477 Not tainted 4.15.0-rc5+ #152
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:17 [inline]
        dump_stack+0x194/0x257 lib/dump_stack.c:53
        lockdep_rcu_suspicious+0x123/0x170 kernel/locking/lockdep.c:4585
        tipc_bearer_find+0x2b4/0x3b0 net/tipc/bearer.c:177
        tipc_nl_compat_link_set+0x329/0x9f0 net/tipc/netlink_compat.c:729
        __tipc_nl_compat_doit net/tipc/netlink_compat.c:288 [inline]
        tipc_nl_compat_doit+0x15b/0x660 net/tipc/netlink_compat.c:335
        tipc_nl_compat_handle net/tipc/netlink_compat.c:1119 [inline]
        tipc_nl_compat_recv+0x112f/0x18f0 net/tipc/netlink_compat.c:1201
        genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:599
        genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:624
        netlink_rcv_skb+0x21e/0x460 net/netlink/af_netlink.c:2408
        genl_rcv+0x28/0x40 net/netlink/genetlink.c:635
        netlink_unicast_kernel net/netlink/af_netlink.c:1275 [inline]
        netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1301
        netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1864
        sock_sendmsg_nosec net/socket.c:636 [inline]
        sock_sendmsg+0xca/0x110 net/socket.c:646
        sock_write_iter+0x31a/0x5d0 net/socket.c:915
        call_write_iter include/linux/fs.h:1772 [inline]
        new_sync_write fs/read_write.c:469 [inline]
        __vfs_write+0x684/0x970 fs/read_write.c:482
        vfs_write+0x189/0x510 fs/read_write.c:544
        SYSC_write fs/read_write.c:589 [inline]
        SyS_write+0xef/0x220 fs/read_write.c:581
        do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
        do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
        entry_SYSENTER_compat+0x54/0x63 arch/x86/entry/entry_64_compat.S:129
      
      In order to correct the mistake, __tipc_nl_compat_doit() has been
      protected by RTNL lock, which means the whole operation of setting
      bearer/media properties is under RTNL protection.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reported-by: Nsyzbot <syzbot+6345fd433db009b29413@syzkaller.appspotmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ed4ffdfe
    • Y
      tipc: Introduce __tipc_nl_net_set · 5631f65d
      Ying Xue 提交于
      Introduce __tipc_nl_net_set() which doesn't hold RTNL lock.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5631f65d
    • Y
      tipc: Introduce __tipc_nl_media_set · 07ffb223
      Ying Xue 提交于
      Introduce __tipc_nl_media_set() which doesn't hold RTNL lock.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      07ffb223