1. 24 Mar, 2018 6 commits
    • tipc: add 128-bit node identifier · d50ccc2d
      Jon Maloy committed
      We add a 128-bit node identity, as an alternative to the currently used
      32-bit node address.
      
      For the sake of compatibility and to minimize message header changes
      we retain the existing 32-bit address field. When not set explicitly by
      the user, this field will be filled with a hash value generated from the
      much longer node identity, and be used as a shorthand value for the
      latter.
      
      We permit either the address or the identity to be set by configuration,
      but not both, so when the address value is set by a legacy user the
      corresponding 128-bit node identity is generated based on that value.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
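The relationship between the 128-bit identity and the 32-bit shorthand address described in this commit can be sketched as follows. This is a minimal illustration only: `node_id_to_addr()` and `addr_to_node_id()` are hypothetical names, and the XOR fold is a stand-in for whatever hash TIPC actually uses.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fold a 128-bit node identity into a 32-bit
 * shorthand address by XOR-ing its four 32-bit words. The real hash
 * used by TIPC may differ; this only shows the idea. */
static uint32_t node_id_to_addr(const uint8_t id[16])
{
    uint32_t words[4];

    memcpy(words, id, sizeof(words)); /* copy to avoid unaligned reads */
    return words[0] ^ words[1] ^ words[2] ^ words[3];
}

/* Hypothetical helper for the legacy direction: derive a 128-bit
 * identity from a configured 32-bit address by storing the address in
 * the first four bytes and zeroing the rest. */
static void addr_to_node_id(uint32_t addr, uint8_t id[16])
{
    memset(id, 0, 16);
    memcpy(id, &addr, sizeof(addr));
}
```

With these two stand-ins, an identity generated from a legacy address folds back to that same address, which is the property a shorthand value needs.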
    • tipc: remove direct accesses to own_addr field in struct tipc_net · 23fd3eac
      Jon Maloy committed
      As a preparation to changing the addressing structure of TIPC we replace
      all direct accesses to the tipc_net::own_addr field with the function
      dedicated for this, tipc_own_addr().
      
      There are no changes to program logic in this commit.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: allow closest-first lookup algorithm when legacy address is configured · b89afb11
      Jon Maloy committed
      The removal of an internal structure of the node address has an unwanted
      side effect.
      - Currently, if a user is sending an anycast message with destination
        domain 0, the tipc_nametbl_translate() function will use the
        'closest-first' algorithm to first look for a node local destination,
        and only when no such is found, will it resort to the cluster global
        'round-robin' lookup algorithm.
      - Current users can get around this, and enforce unconditional use of
        global round-robin by indicating a destination as Z.0.0 or Z.C.0.
      - This option disappears when we make the node address flat, since the
        lookup algorithm has no way of recognizing this case. So, as long as
        there are node local destinations, the algorithm will always select
        one of those, and there is nothing the sender can do to change this.
      
      We solve this by eliminating the 'closest-first' option, which was never
      a good idea anyway, for non-legacy users, but only for those. To
      distinguish between legacy users and non-legacy users we introduce a new
      flag 'legacy_addr_format' in struct tipc_core, to be set when the user
      configures a legacy-style Z.C.N node address. Hence, when a legacy user
      indicates a zero lookup domain, 'closest-first' is selected, and in all
      other cases we use 'round-robin'.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
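The selection rule stated at the end of this commit message can be sketched as a small decision function. The enum and the function name `select_lookup()` are hypothetical; only the `legacy_addr_format` flag and the "zero lookup domain" condition come from the message.

```c
#include <stdbool.h>
#include <stdint.h>

enum lookup_algo { LOOKUP_ROUND_ROBIN, LOOKUP_CLOSEST_FIRST };

/* Hypothetical decision helper: 'closest-first' is chosen only when
 * the node was configured with a legacy Z.C.N address AND the sender
 * indicated lookup domain 0. Every other case uses round-robin. */
static enum lookup_algo select_lookup(bool legacy_addr_format,
                                      uint32_t domain)
{
    if (legacy_addr_format && domain == 0)
        return LOOKUP_CLOSEST_FIRST;
    return LOOKUP_ROUND_ROBIN;
}
```

A non-legacy user therefore gets round-robin even with domain 0, which is the behavior change the commit describes.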
    • tipc: remove restrictions on node address values · 20263641
      Jon Maloy committed
      Nominally, TIPC organizes network nodes into a three-level network
      hierarchy consisting of the levels 'zone', 'cluster' and 'node'. This
      hierarchy is reflected in the node address format: it is sub-divided
      into an 8-bit zone id, a 12-bit cluster id, and a 12-bit node id.
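The 8/12/12-bit split described above can be illustrated with a few bit operations. The helper names below are hypothetical and chosen for this sketch; only the field widths come from the text.

```c
#include <stdint.h>

/* Legacy Z.C.N layout: | zone: 8 bits | cluster: 12 bits | node: 12 bits | */
static uint32_t legacy_zone(uint32_t addr)    { return addr >> 24; }
static uint32_t legacy_cluster(uint32_t addr) { return (addr >> 12) & 0xfff; }
static uint32_t legacy_node(uint32_t addr)    { return addr & 0xfff; }

/* Compose a legacy address from its three fields. Callers are assumed
 * to pass values that fit the field widths. */
static uint32_t legacy_addr(uint32_t zone, uint32_t cluster, uint32_t node)
{
    return (zone << 24) | (cluster << 12) | node;
}
```

For example, the address 1.1.10 packs to 0x0100100a. Once the address is treated as a flat 32-bit integer, none of these accessors carry any meaning for non-legacy users.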
      
      However, the 'zone' and 'cluster' levels have in reality never been
      fully implemented, and never will be. The result of this has been
      that the first 20 bits of the node identity structure have been wasted,
      and the usable node identity range within a cluster has been limited
      to 12 bits. This is starting to become a problem.

      In the following commits, we will need to be able to connect between
      nodes which are using the whole 32-bit value space of the node address.
      We therefore remove the restrictions on which values can be assigned
      to node identity; it is from now on only a 32-bit integer with no
      assumed internal structure.
      
      Isolation between clusters is now achieved only by setting different
      values for the 'network id' field used during neighbor discovery, in
      practice leading to the latter becoming the new cluster identity.
      
      The rules for accepting discovery requests/responses from neighboring
      nodes now become:
      
      - If the user is using legacy address format on both peers, reception
        of discovery messages is subject to the legacy lookup domain check
        in addition to the cluster id check.
      
      - Otherwise, the discovery request/response is always accepted, provided
        both peers have the same network id.
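The two acceptance rules above can be sketched as a predicate. Everything here is illustrative: `struct peer`, `accept_discovery()` and `in_legacy_domain()` are hypothetical, and the domain check shown (same zone and cluster bits) is only an assumed stand-in for the real legacy lookup domain check.

```c
#include <stdbool.h>
#include <stdint.h>

struct peer {
    uint32_t net_id;          /* 'network id' used during discovery */
    uint32_t addr;            /* 32-bit node address */
    bool legacy_addr_format;  /* configured with a Z.C.N address? */
};

/* Assumed stand-in for the legacy domain check: both addresses share
 * the same upper 20 bits (zone and cluster). */
static bool in_legacy_domain(uint32_t self, uint32_t other)
{
    return (self >> 12) == (other >> 12);
}

static bool accept_discovery(const struct peer *self,
                             const struct peer *other)
{
    if (self->net_id != other->net_id)
        return false;               /* different cluster identity */
    if (self->legacy_addr_format && other->legacy_addr_format)
        return in_legacy_domain(self->addr, other->addr);
    return true;                    /* flat addressing: net id suffices */
}
```

The network id comparison applies in both branches; the extra domain check only applies when both peers are legacy users.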
      
      This secures backwards compatibility for users who have been using zone
      or cluster identities as cluster separators, instead of the intended
      'network id'.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: some cleanups in the file discover.c · b39e465e
      Jon Maloy committed
      To facilitate the coming changes in the neighbor discovery functionality
      we do some renaming and refactoring of that code. The functional changes
      in this commit are trivial, e.g., we move the message sending call in
      tipc_disc_timeout() outside the spinlock protected region.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: refactor function tipc_enable_bearer() · cb30a633
      Jon Maloy committed
      As a preparation for the next commits we try to reduce the footprint of
      the function tipc_enable_bearer(), while hopefully making it simpler to
      follow.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 23 Mar, 2018 3 commits
  3. 18 Mar, 2018 5 commits
  4. 13 Mar, 2018 1 commit
  5. 08 Mar, 2018 1 commit
  6. 28 Feb, 2018 1 commit
    • tipc: correct initial value for group congestion flag · 1b22bcad
      Jon Maloy committed
      In commit 60c25306 ("tipc: fix race between poll() and
      setsockopt()") we introduced a pointer from struct tipc_group to the
      'group_is_connected' flag in struct tipc_sock, so that this field can
      be checked without dereferencing the group pointer of the latter struct.
      
      The initial value for this flag is correctly set to 'false' when a
      group is created, but we miss the case when no group is created at
      all, in which case the initial value should be 'true'. This has the
      effect that SOCK_RDM/DGRAM sockets sending datagrams never receive
      POLLOUT if they request so.
      
      This commit corrects this bug.
      
      Fixes: 60c25306 ("tipc: fix race between poll() and setsockopt()")
      Reported-by: Hoang Le <hoang.h.le@dektek.com.au>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 20 Feb, 2018 2 commits
    • tipc: don't call sock_release() in atomic context · 26736a08
      Paolo Abeni committed
      syzbot reported a scheduling while atomic issue at netns
      destruction time:
      
      BUG: sleeping function called from invalid context at net/core/sock.c:2769
      in_atomic(): 1, irqs_disabled(): 0, pid: 85, name: kworker/u4:3
      5 locks held by kworker/u4:3/85:
        #0:  ((wq_completion)"%s""netns"){+.+.}, at: [<00000000c9792deb>]
      process_one_work+0xaaf/0x1af0 kernel/workqueue.c:2084
        #1:  (net_cleanup_work){+.+.}, at: [<00000000adc12e2a>]
      process_one_work+0xb01/0x1af0 kernel/workqueue.c:2088
        #2:  (net_sem){++++}, at: [<000000009ccb5669>] cleanup_net+0x23f/0xd20
      net/core/net_namespace.c:494
        #3:  (net_mutex){+.+.}, at: [<00000000a92767d9>] cleanup_net+0xa7d/0xd20
      net/core/net_namespace.c:496
        #4:  (&(&srv->idr_lock)->rlock){+...}, at: [<000000001343e568>]
      spin_lock_bh include/linux/spinlock.h:315 [inline]
        #4:  (&(&srv->idr_lock)->rlock){+...}, at: [<000000001343e568>]
      tipc_topsrv_stop+0x231/0x610 net/tipc/topsrv.c:685
      CPU: 0 PID: 85 Comm: kworker/u4:3 Not tainted 4.16.0-rc1+ #230
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Workqueue: netns cleanup_net
      Call Trace:
        __dump_stack lib/dump_stack.c:17 [inline]
        dump_stack+0x194/0x257 lib/dump_stack.c:53
        ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6128
        __might_sleep+0x95/0x190 kernel/sched/core.c:6081
        lock_sock_nested+0x37/0x110 net/core/sock.c:2769
        lock_sock include/net/sock.h:1463 [inline]
        tipc_release+0x103/0xff0 net/tipc/socket.c:572
        sock_release+0x8d/0x1e0 net/socket.c:594
        tipc_topsrv_stop+0x3c0/0x610 net/tipc/topsrv.c:696
        tipc_exit_net+0x15/0x40 net/tipc/core.c:96
        ops_exit_list.isra.6+0xae/0x150 net/core/net_namespace.c:148
        cleanup_net+0x6ba/0xd20 net/core/net_namespace.c:529
        process_one_work+0xbbf/0x1af0 kernel/workqueue.c:2113
        worker_thread+0x223/0x1990 kernel/workqueue.c:2247
        kthread+0x33c/0x400 kernel/kthread.c:238
        ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:429
      
      This is caused by tipc_topsrv_stop() releasing the listener socket
      with the idr lock held. This changeset addresses the issue by moving
      the release operation outside that lock.

      Reported-and-tested-by: syzbot+749d9d87c294c00ca856@syzkaller.appspotmail.com
      Fixes: 0ef897be ("tipc: separate topology server listener socket from subcsriber sockets")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by:  ///jon
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: fix bug on error path in tipc_topsrv_kern_subscr() · 96c252bf
      Jon Maloy committed
      In commit cc1ea9ffadf7 ("tipc: eliminate struct tipc_subscriber") we
      re-introduced an old bug on the error path in the function
      tipc_topsrv_kern_subscr(). We now re-introduce the correction too.

      Reported-by: syzbot+f62e0f2a0ef578703946@syzkaller.appspotmail.com
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 17 Feb, 2018 10 commits
  9. 15 Feb, 2018 8 commits
    • tipc: apply bearer link tolerance on running links · 37c64cf6
      Jon Maloy committed
      Currently, the default link tolerance set in struct tipc_bearer only
      has effect on links going up after that moment. I.e., a user has to
      reset all the node's links across that bearer to have the new value
      applied. This is too limiting and disturbing on a running cluster to
      be useful.
      
      We now change this so that already existing links are also updated
      dynamically, without any need for a reset, when the bearer value is
      changed. We leverage the already existing per-link functionality
      to achieve the wanted effect.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: Fix missing RTNL lock protection during setting link properties · ed4ffdfe
      Ying Xue committed
      Currently, when a user changes link properties, TIPC first checks
      whether the user's command message contains a media name or a bearer
      name through tipc_media_find() or tipc_bearer_find(), both of which
      are protected by the RTNL lock. But when tipc_nl_compat_link_set()
      performs this check with the two functions, it doesn't hold the RTNL
      lock at all. As a result, the following complaints were reported:
      
      audit: type=1400 audit(1514679888.244:9): avc:  denied  { write } for
      pid=3194 comm="syzkaller021477" path="socket:[11143]" dev="sockfs"
      ino=11143 scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023
      tcontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023
      tclass=netlink_generic_socket permissive=1
      Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      
      =============================
      WARNING: suspicious RCU usage
      4.15.0-rc5+ #152 Not tainted
      -----------------------------
      net/tipc/bearer.c:177 suspicious rcu_dereference_protected() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      2 locks held by syzkaller021477/3194:
        #0:  (cb_lock){++++}, at: [<00000000d20133ea>] genl_rcv+0x19/0x40
      net/netlink/genetlink.c:634
        #1:  (genl_mutex){+.+.}, at: [<00000000fcc5d1bc>] genl_lock
      net/netlink/genetlink.c:33 [inline]
        #1:  (genl_mutex){+.+.}, at: [<00000000fcc5d1bc>] genl_rcv_msg+0x115/0x140
      net/netlink/genetlink.c:622
      
      stack backtrace:
      CPU: 1 PID: 3194 Comm: syzkaller021477 Not tainted 4.15.0-rc5+ #152
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:17 [inline]
        dump_stack+0x194/0x257 lib/dump_stack.c:53
        lockdep_rcu_suspicious+0x123/0x170 kernel/locking/lockdep.c:4585
        tipc_bearer_find+0x2b4/0x3b0 net/tipc/bearer.c:177
        tipc_nl_compat_link_set+0x329/0x9f0 net/tipc/netlink_compat.c:729
        __tipc_nl_compat_doit net/tipc/netlink_compat.c:288 [inline]
        tipc_nl_compat_doit+0x15b/0x660 net/tipc/netlink_compat.c:335
        tipc_nl_compat_handle net/tipc/netlink_compat.c:1119 [inline]
        tipc_nl_compat_recv+0x112f/0x18f0 net/tipc/netlink_compat.c:1201
        genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:599
        genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:624
        netlink_rcv_skb+0x21e/0x460 net/netlink/af_netlink.c:2408
        genl_rcv+0x28/0x40 net/netlink/genetlink.c:635
        netlink_unicast_kernel net/netlink/af_netlink.c:1275 [inline]
        netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1301
        netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1864
        sock_sendmsg_nosec net/socket.c:636 [inline]
        sock_sendmsg+0xca/0x110 net/socket.c:646
        sock_write_iter+0x31a/0x5d0 net/socket.c:915
        call_write_iter include/linux/fs.h:1772 [inline]
        new_sync_write fs/read_write.c:469 [inline]
        __vfs_write+0x684/0x970 fs/read_write.c:482
        vfs_write+0x189/0x510 fs/read_write.c:544
        SYSC_write fs/read_write.c:589 [inline]
        SyS_write+0xef/0x220 fs/read_write.c:581
        do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
        do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
        entry_SYSENTER_compat+0x54/0x63 arch/x86/entry/entry_64_compat.S:129
      
      In order to correct the mistake, __tipc_nl_compat_doit() has been
      protected by the RTNL lock, which means the whole operation of setting
      bearer/media properties is under RTNL protection.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Reported-by: syzbot <syzbot+6345fd433db009b29413@syzkaller.appspotmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
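The locking pattern behind this fix and the `__tipc_nl_*` helper commits in this series, a lock-free inner function plus wrappers that take the lock once around the whole operation, can be sketched in userspace C. All names here are hypothetical, the `atomic_flag` is only a stand-in for RTNL, and the `_locked` suffix replaces the kernel's `__` prefix because double-underscore names are reserved in ordinary C programs. The valid range is an arbitrary value for the sketch.

```c
#include <stdatomic.h>
#include <stdlib.h>

static atomic_flag cfg_lock = ATOMIC_FLAG_INIT; /* stand-in for RTNL */
static int link_tolerance;                      /* protected by cfg_lock */

static void cfg_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&cfg_lock, memory_order_acquire))
        ;
}

static void cfg_lock_release(void)
{
    atomic_flag_clear_explicit(&cfg_lock, memory_order_release);
}

/* Inner variant: the caller must already hold cfg_lock. */
static int set_tolerance_locked(long value)
{
    if (value <= 0 || value > 30000)  /* range is an assumption */
        return -1;
    link_tolerance = (int)value;
    return 0;
}

/* Regular entry point: takes the lock itself. */
static int set_tolerance(long value)
{
    int err;

    cfg_lock_acquire();
    err = set_tolerance_locked(value);
    cfg_lock_release();
    return err;
}

/* Compat-style entry point: both the parsing ("transcode") step and
 * the update ("doit") step run under one lock acquisition. */
static int compat_set_tolerance(const char *text)
{
    long value;
    int err;

    cfg_lock_acquire();
    value = strtol(text, NULL, 10);
    err = set_tolerance_locked(value);
    cfg_lock_release();
    return err;
}
```

Splitting out the inner variant lets the compat path hold the lock across the entire operation without trying to take it a second time.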
    • tipc: Introduce __tipc_nl_net_set · 5631f65d
      Ying Xue committed
      Introduce __tipc_nl_net_set() which doesn't hold RTNL lock.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: Introduce __tipc_nl_media_set · 07ffb223
      Ying Xue committed
      Introduce __tipc_nl_media_set() which doesn't hold RTNL lock.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: Introduce __tipc_nl_bearer_set · 93532bb1
      Ying Xue committed
      Introduce __tipc_nl_bearer_set() which doesn't hold RTNL lock.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: Introduce __tipc_nl_bearer_enable · 45cf7edf
      Ying Xue committed
      Introduce __tipc_nl_bearer_enable() which doesn't hold RTNL lock.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: Introduce __tipc_nl_bearer_disable · d59d8b77
      Ying Xue committed
      Introduce __tipc_nl_bearer_disable() which doesn't hold RTNL lock.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: Refactor __tipc_nl_compat_doit · e5d1a1ee
      Ying Xue committed
      As preparation for adding RTNL to make (*cmd->transcode)() and
      (*cmd->transcode)() constantly protected by RTNL lock, we move out as
      many as possible of the memory allocations existing between them, so
      that the time of holding RTNL can be minimized in
      __tipc_nl_compat_doit().
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 13 Feb, 2018 1 commit
    • net: make getname() functions return length rather than use int* parameter · 9b2c45d4
      Denys Vlasenko committed
      Changes since v1:
      Added changes in these files:
          drivers/infiniband/hw/usnic/usnic_transport.c
          drivers/staging/lustre/lnet/lnet/lib-socket.c
          drivers/target/iscsi/iscsi_target_login.c
          drivers/vhost/net.c
          fs/dlm/lowcomms.c
          fs/ocfs2/cluster/tcp.c
          security/tomoyo/network.c
      
      Before:
      All these functions either return a negative error indicator,
      or store length of sockaddr into "int *socklen" parameter
      and return zero on success.
      
      "int *socklen" parameter is awkward. For example, if caller does not
      care, it still needs to provide on-stack storage for the value
      it does not need.
      
      None of the many FOO_getname() functions of various protocols
      ever used old value of *socklen. They always just overwrite it.
      
      This change drops this parameter, and makes all these functions, on success,
      return length of sockaddr. It's always >= 0 and can be differentiated
      from an error.
      
      Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.
      
      rpc_sockname() lost "int buflen" parameter, since its only use was
      to be passed to kernel_getsockname() as &buflen and subsequently
      not used in any way.
      
      Userspace API is not changed.
      
          text    data     bss      dec     hex filename
      30108430 2633624  873672 33615726 200ef6e vmlinux.before.o
      30108109 2633612  873672 33615393 200ee21 vmlinux.o
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: linux-kernel@vger.kernel.org
      CC: netdev@vger.kernel.org
      CC: linux-bluetooth@vger.kernel.org
      CC: linux-decnet-user@lists.sourceforge.net
      CC: linux-wireless@vger.kernel.org
      CC: linux-rdma@vger.kernel.org
      CC: linux-sctp@vger.kernel.org
      CC: linux-nfs@vger.kernel.org
      CC: linux-x25@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
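The calling-convention change described in this commit can be illustrated with a toy pair of functions. `struct demo_addr`, `demo_getname_old()` and `demo_getname()` are hypothetical and unrelated to any real protocol; they only contrast the out-parameter style with the return-length style.

```c
#include <errno.h>
#include <stddef.h>

struct demo_addr {
    unsigned short family;
    unsigned int port;
};

/* Old convention: 0 on success, length stored through *socklen. */
static int demo_getname_old(const struct demo_addr *src,
                            struct demo_addr *dst, int *socklen)
{
    if (!src)
        return -ENOTCONN;
    *dst = *src;
    *socklen = (int)sizeof(*dst);
    return 0;
}

/* New convention: negative errno on failure, length on success. */
static int demo_getname(const struct demo_addr *src, struct demo_addr *dst)
{
    if (!src)
        return -ENOTCONN;
    *dst = *src;
    return (int)sizeof(*dst);
}
```

A caller of the new form has to test `if (err < 0)` rather than `if (err)`, since a successful call now returns a positive length.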
  11. 12 Feb, 2018 1 commit
    • vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds committed
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But the keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 09 Feb, 2018 1 commit
    • tipc: fix skb truesize/datasize ratio control · 55b3280d
      Hoang Le committed
      In commit d618d09a ("tipc: enforce valid ratio between skb truesize
      and contents") we introduced a test for ensuring that the condition
      truesize/datasize <= 4 is true for a received buffer. Unfortunately this
      test has two problems.
      
      - Because of the integer arithmetic, the test
        if (skb->truesize / buf_roundup_len(skb) > 4) will miss all
        ratios in the range 4 < ratio < 5, which was not the intention.
      - The buffer returned by skb_copy() inherits skb->truesize of the
        original buffer, which doesn't help the situation at all.

      In this commit, we change the ratio condition and replace skb_copy()
      with a call to skb_copy_expand() to finally get this right.
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
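The first problem listed in this commit, integer division hiding ratios between 4 and 5, can be reproduced with plain integers. Both function names are hypothetical, and the multiplication form is just one way to avoid the truncation; it is not claimed to be the condition the commit actually adopted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Truncating test: 4500 / 1000 evaluates to 4, so a real ratio of 4.5
 * is not seen as "greater than 4". len must be non-zero. */
static bool ratio_exceeds_div(uint32_t truesize, uint32_t len)
{
    return truesize / len > 4;
}

/* Truncation-free alternative: compare truesize against 4 * len,
 * widened to 64 bits so the multiplication cannot overflow. */
static bool ratio_exceeds_mul(uint32_t truesize, uint32_t len)
{
    return (uint64_t)truesize > 4 * (uint64_t)len;
}
```

With truesize 4500 and length 1000, the division form reports false while the multiplication form reports true; at 4000 both report false and at 5000 both report true.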