1. 10 Aug, 2017 1 commit
    • J
      tipc: remove premature ESTABLISH FSM event at link synchronization · ed43594a
      Jon Paul Maloy committed
      When a link between two nodes comes up, both endpoints will initially
      send out a STATE message to the peer, to increase the probability that
      the peer endpoint also is up when the first traffic message arrives.
      Thereafter, if the establishing link is the second link between two
      nodes, this first "traffic" message is a TUNNEL_PROTOCOL/SYNCH message,
      helping the peer to perform initial synchronization between the two
      links.
      
      However, the initial STATE message may be lost, in which case the SYNCH
      message will be the first one arriving at the peer. This should also
      work, as the SYNCH message itself will be used to bring up the link
      endpoint before initializing synchronization.
      
      Unfortunately the code for this case is broken. Currently, the link is
      brought up through a tipc_link_fsm_evt(ESTABLISHED) when a SYNCH
      arrives, whereupon __tipc_node_link_up() is called to distribute the
      link slots and take the link into traffic. However, __tipc_node_link_up()
      itself starts with a test for whether the link is up and, if so,
      returns without action. Clearly, the tipc_link_fsm_evt(ESTABLISHED) call
      is unnecessary, since tipc_node_link_up() itself issues such an
      event. It is also harmful, since it prevents tipc_node_link_up() from
      performing the rest of its tasks, so the link endpoint in question
      is never taken into traffic.
      
      This problem was exposed when we set up dual links between pre-
      and post-4.4 kernels, because the former ones don't send out the
      initial STATE message described above.
      
      We fix this by removing the unnecessary event call.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
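The failure mode described in the commit above can be sketched with a toy model. All names below (`toy_link`, `toy_fsm_establish`, `toy_node_link_up`, `toy_rcv_synch`) are hypothetical stand-ins, not the kernel's functions; the sketch only assumes what the commit message states, namely that the link-up routine returns early when the link is already up and otherwise raises the event itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; these names are not the kernel's. */
enum toy_state { TOY_ESTABLISHING, TOY_ESTABLISHED };

struct toy_link {
	enum toy_state state;
	bool in_traffic;	/* set once the link is taken into traffic */
};

/* Models the FSM event: only moves the endpoint to ESTABLISHED. */
static void toy_fsm_establish(struct toy_link *l)
{
	l->state = TOY_ESTABLISHED;
}

/* Models the link-up routine: returns early if the link is already
 * up, otherwise raises the event itself and takes the link into traffic.
 */
static void toy_node_link_up(struct toy_link *l)
{
	if (l->state == TOY_ESTABLISHED)
		return;
	toy_fsm_establish(l);
	l->in_traffic = true;
}

/* A SYNCH message arrives first. With premature_evt set, this mimics
 * the broken flow; without it, the fixed flow.
 */
static bool toy_rcv_synch(struct toy_link *l, bool premature_evt)
{
	assert(l);
	if (premature_evt)
		toy_fsm_establish(l);
	toy_node_link_up(l);
	return l->in_traffic;
}
```

In this model, calling `toy_rcv_synch()` with the premature event leaves `in_traffic` false, while omitting the event lets the link-up routine complete.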
  2. 01 Jul, 2017 1 commit
  3. 11 Jun, 2017 1 commit
  4. 12 May, 2017 1 commit
    • J
      tipc: make macro tipc_wait_for_cond() smp safe · 844cf763
      Jon Paul Maloy committed
      The macro tipc_wait_for_cond() embeds the macro sk_wait_event()
      to fulfil its task. The latter, in turn, evaluates the stated
      condition outside the socket lock context. This is problematic if
      the condition accesses non-trivial data structures which may be
      altered by incoming interrupts, as is the case with the cong_links()
      linked list, used by the socket to keep track of the current set of
      congested links. We sometimes see crashes when this list is accessed
      by a condition function at the same time as a SOCK_WAKEUP interrupt
      is removing an element from the list.
      
      We fix this by expanding selected parts of sk_wait_event() into the
      outer macro, while ensuring that all evaluations of a given condition
      are performed under socket lock protection.
      
      Fixes: commit 365ad353 ("tipc: reduce risk of user starvation during link congestion")
      Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
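The idea of evaluating a wait condition only while the lock is held can be illustrated with a single-threaded toy. This is a sketch under stated assumptions, not the kernel macro: `toy_sock`, `toy_lock`, `toy_release`, `toy_sleep` and `toy_wait_locked` are hypothetical names, the "lock" is a plain flag, and the wakeup is simulated inside `toy_sleep()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy socket; not the kernel's struct sock. */
struct toy_sock {
	bool locked;		/* stand-in for the socket lock */
	int cong_link_cnt;	/* stand-in for the congested-links list */
	int unlocked_evals;	/* condition checks made without the lock */
};

static void toy_lock(struct toy_sock *sk)
{
	assert(!sk->locked);
	sk->locked = true;
}

static void toy_release(struct toy_sock *sk)
{
	assert(sk->locked);
	sk->locked = false;
}

/* The wait condition; it records any evaluation done without the lock. */
static bool toy_no_congestion(struct toy_sock *sk)
{
	if (!sk->locked)
		sk->unlocked_evals++;
	return sk->cong_link_cnt == 0;
}

/* Stand-in for sleeping: the lock is dropped while "asleep", a simulated
 * wakeup removes one congested link, and the lock is re-taken before
 * returning to the caller.
 */
static void toy_sleep(struct toy_sock *sk)
{
	toy_release(sk);
	if (sk->cong_link_cnt > 0)
		sk->cong_link_cnt--;
	toy_lock(sk);
}

/* Must be entered with the lock held; every evaluation of cond happens
 * with the lock held, because toy_sleep() re-locks before returning.
 */
#define toy_wait_locked(sk, cond)		\
	do {					\
		while (!(cond))			\
			toy_sleep(sk);		\
	} while (0)
```

A caller would take the lock, invoke `toy_wait_locked(&sk, toy_no_congestion(&sk))`, and release the lock afterwards; `unlocked_evals` then stays at zero.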
  5. 03 May, 2017 2 commits
  6. 29 Apr, 2017 3 commits
  7. 25 Apr, 2017 3 commits
  8. 14 Apr, 2017 2 commits
    • J
      netlink: pass extended ACK struct where available · fe52145f
      Johannes Berg committed
      This is an add-on to the previous patch that passes the extended ACK
      structure where it's already available by existing genl_info or extack
      function arguments.
      
      This was done with this spatch (with some manual adjustment of
      indentation):
      
      @@
      expression A, B, C, D, E;
      identifier fn, info;
      @@
      fn(..., struct genl_info *info, ...) {
      ...
      -nlmsg_parse(A, B, C, D, E, NULL)
      +nlmsg_parse(A, B, C, D, E, info->extack)
      ...
      }
      
      @@
      expression A, B, C, D, E;
      identifier fn, info;
      @@
      fn(..., struct genl_info *info, ...) {
      <...
      -nla_parse_nested(A, B, C, D, NULL)
      +nla_parse_nested(A, B, C, D, info->extack)
      ...>
      }
      
      @@
      expression A, B, C, D, E;
      identifier fn, extack;
      @@
      fn(..., struct netlink_ext_ack *extack, ...) {
      <...
      -nlmsg_parse(A, B, C, D, E, NULL)
      +nlmsg_parse(A, B, C, D, E, extack)
      ...>
      }
      
      @@
      expression A, B, C, D, E;
      identifier fn, extack;
      @@
      fn(..., struct netlink_ext_ack *extack, ...) {
      <...
      -nla_parse(A, B, C, D, E, NULL)
      +nla_parse(A, B, C, D, E, extack)
      ...>
      }
      
      @@
      expression A, B, C, D, E;
      identifier fn, extack;
      @@
      fn(..., struct netlink_ext_ack *extack, ...) {
      ...
      -nlmsg_parse(A, B, C, D, E, NULL)
      +nlmsg_parse(A, B, C, D, E, extack)
      ...
      }
      
      @@
      expression A, B, C, D;
      identifier fn, extack;
      @@
      fn(..., struct netlink_ext_ack *extack, ...) {
      <...
      -nla_parse_nested(A, B, C, D, NULL)
      +nla_parse_nested(A, B, C, D, extack)
      ...>
      }
      
      @@
      expression A, B, C, D;
      identifier fn, extack;
      @@
      fn(..., struct netlink_ext_ack *extack, ...) {
      <...
      -nlmsg_validate(A, B, C, D, NULL)
      +nlmsg_validate(A, B, C, D, extack)
      ...>
      }
      
      @@
      expression A, B, C, D;
      identifier fn, extack;
      @@
      fn(..., struct netlink_ext_ack *extack, ...) {
      <...
      -nla_validate(A, B, C, D, NULL)
      +nla_validate(A, B, C, D, extack)
      ...>
      }
      
      @@
      expression A, B, C;
      identifier fn, extack;
      @@
      fn(..., struct netlink_ext_ack *extack, ...) {
      <...
      -nla_validate_nested(A, B, C, NULL)
      +nla_validate_nested(A, B, C, extack)
      ...>
      }
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
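Why forwarding an available extended ACK pointer matters, rather than passing NULL, can be shown with a small mock. Everything here is hypothetical (`toy_ext_ack`, `toy_parse`, `toy_handler`); it does not reproduce the netlink API, only the pattern the semantic patch above applies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an extended ACK struct. */
struct toy_ext_ack {
	const char *msg;
};

/* Toy parser: rejects lengths below min_len. The reason is reported
 * through extack only if the caller supplied one.
 */
static int toy_parse(int len, int min_len, struct toy_ext_ack *extack)
{
	assert(min_len >= 0);
	if (len < min_len) {
		if (extack)
			extack->msg = "message too short";
		return -1;
	}
	return 0;
}

/* A caller that already receives an extack argument forwards it
 * instead of passing NULL, so the failure reason is not lost.
 */
static int toy_handler(int len, struct toy_ext_ack *extack)
{
	return toy_parse(len, 4, extack);
}
```

With NULL the caller still gets the error code, but no explanation; with a forwarded pointer the message is filled in.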
    • J
      netlink: pass extended ACK struct to parsing functions · fceb6435
      Johannes Berg committed
      Pass the new extended ACK reporting struct to all of the generic
      netlink parsing functions. For now, pass NULL in almost all callers
      (except for some in the core.)
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 30 Mar, 2017 2 commits
  10. 29 Mar, 2017 2 commits
  11. 23 Mar, 2017 1 commit
    • Y
      tipc: fix nametbl deadlock at tipc_nametbl_unsubscribe · 557d054c
      Ying Xue committed
      Until now, tipc_nametbl_unsubscribe() has been called at subscription
      reference count cleanup. Usually the subscription cleanup is
      triggered at subscription timeout, at subscription cancel or at
      subscriber delete.
      
      We have ignored the possibility of this being called from other
      locations, which causes a deadlock as we try to grab the
      tn->nametbl_lock while holding it already.
      
         CPU1:                             CPU2:
      ----------                     ----------------
      tipc_nametbl_publish
      spin_lock_bh(&tn->nametbl_lock)
      tipc_nametbl_insert_publ
      tipc_nameseq_insert_publ
      tipc_subscrp_report_overlap
      tipc_subscrp_get
      tipc_subscrp_send_event
                                   tipc_close_conn
                                   tipc_subscrb_release_cb
                                   tipc_subscrb_delete
                                   tipc_subscrp_put
      tipc_subscrp_put
      tipc_subscrp_kref_release
      tipc_nametbl_unsubscribe
      spin_lock_bh(&tn->nametbl_lock)
      <<grab nametbl_lock again>>
      
         CPU1:                              CPU2:
      ----------                     ----------------
      tipc_nametbl_stop
      spin_lock_bh(&tn->nametbl_lock)
      tipc_purge_publications
      tipc_nameseq_remove_publ
      tipc_subscrp_report_overlap
      tipc_subscrp_get
      tipc_subscrp_send_event
                                   tipc_close_conn
                                   tipc_subscrb_release_cb
                                   tipc_subscrb_delete
                                   tipc_subscrp_put
      tipc_subscrp_put
      tipc_subscrp_kref_release
      tipc_nametbl_unsubscribe
      spin_lock_bh(&tn->nametbl_lock)
      <<grab nametbl_lock again>>
      
      In this commit, we move the call to tipc_nametbl_unsubscribe()
      from the refcount cleanup to the intended callers.
      
      Fixes: d094c4d5 ("tipc: add subscription refcount to avoid invalid delete")
      Reported-by: John Thompson <thompa.atl@gmail.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
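The self-deadlock in the traces above, and the effect of moving the unsubscribe call to explicit callers, can be sketched with a toy non-recursive lock. The names (`toy_tbl`, `toy_lock`, `toy_unsubscribe`, `toy_publish_old`, `toy_delete_new`) are hypothetical, and the lock is a flag that reports a would-be deadlock instead of spinning.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy name table; not the kernel's data structure. */
struct toy_tbl {
	bool lock_held;		/* models a non-recursive spinlock */
	int subscriptions;
};

/* Returns false where a real non-recursive lock would self-deadlock. */
static bool toy_lock(struct toy_tbl *t)
{
	if (t->lock_held)
		return false;
	t->lock_held = true;
	return true;
}

static void toy_unlock(struct toy_tbl *t)
{
	assert(t->lock_held);
	t->lock_held = false;
}

/* Takes the table lock itself, like the unsubscribe call in the traces. */
static bool toy_unsubscribe(struct toy_tbl *t)
{
	if (!toy_lock(t))
		return false;	/* would deadlock */
	t->subscriptions--;
	toy_unlock(t);
	return true;
}

/* Old flow: the last reference is dropped while the table lock is
 * held, and the refcount release path calls unsubscribe.
 */
static bool toy_publish_old(struct toy_tbl *t)
{
	bool ok;

	if (!toy_lock(t))
		return false;
	ok = toy_unsubscribe(t);	/* called from refcount release */
	toy_unlock(t);
	return ok;
}

/* New flow: the refcount release no longer unsubscribes; the deleting
 * caller does so explicitly, without holding the table lock.
 */
static bool toy_delete_new(struct toy_tbl *t)
{
	return toy_unsubscribe(t);
}
```

In this model the old flow reports the would-be deadlock and leaves the subscription in place, while the new flow unsubscribes cleanly.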
  12. 10 Mar, 2017 1 commit
    • D
      net: Work around lockdep limitation in sockets that use sockets · cdfbabfb
      David Howells committed
      Lockdep issues a circular dependency warning when AFS issues an operation
      through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.
      
      The theory lockdep comes up with is as follows:
      
       (1) If the pagefault handler decides it needs to read pages from AFS, it
           calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
           creating a call requires the socket lock:
      
      	mmap_sem must be taken before sk_lock-AF_RXRPC
      
       (2) afs_open_socket() opens an AF_RXRPC socket and binds it.  rxrpc_bind()
           binds the underlying UDP socket whilst holding its socket lock.
           inet_bind() takes its own socket lock:
      
      	sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET
      
       (3) Reading from a TCP socket into a userspace buffer might cause a fault
           and thus cause the kernel to take the mmap_sem, but the TCP socket is
           locked whilst doing this:
      
      	sk_lock-AF_INET must be taken before mmap_sem
      
      However, lockdep's theory is wrong in this instance because it deals only
      with lock classes and not individual locks.  The AF_INET lock in (2) isn't
      really equivalent to the AF_INET lock in (3) as the former deals with a
      socket entirely internal to the kernel that never sees userspace.  This is
      a limitation in the design of lockdep.
      
      Fix the general case by:
      
       (1) Double up all the locking keys used in sockets so that one set is
           used if the socket is created by userspace and the other set is used
           if the socket is created by the kernel.
      
       (2) Store the kern parameter passed to sk_alloc() in a variable in the
           sock struct (sk_kern_sock).  This informs sock_lock_init(),
           sock_init_data() and sk_clone_lock() as to the lock keys to be used.
      
           Note that the child created by sk_clone_lock() inherits the parent's
           kern setting.
      
       (3) Add a 'kern' parameter to ->accept() that is analogous to the one
           passed in to ->create() that distinguishes whether kernel_accept() or
           sys_accept4() was the caller and can be passed to sk_alloc().
      
           Note that a lot of accept functions merely dequeue an already
           allocated socket.  I haven't touched these as the new socket already
           exists before we get the parameter.
      
           Note also that there are a couple of places where I've made the accepted
           socket unconditionally kernel-based:
      
      	irda_accept()
      	rds_tcp_accept_one()
      	tcp_accept_from_sock()
      
           because they follow a sock_create_kern() and accept off of that.
      
      Whilst creating this, I noticed that lustre and ocfs don't create sockets
      through sock_create_kern() and thus they aren't marked as for-kernel,
      though they appear to be internal.  I wonder if these should do that so
      that they use the new set of lock keys.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
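The "double up the keys" approach in the fix list above can be sketched as two parallel key tables selected by a stored kern flag. This is only an illustration: `toy_lock_key`, `toy_user_keys`, `toy_kern_keys`, `toy_sock_lock_init` and `toy_sock_clone` are hypothetical names, and `TOY_AF_MAX` is an arbitrary small bound, not the kernel's address family count.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_AF_MAX 4	/* hypothetical; far smaller than the real limit */

/* Hypothetical lock-class key; only its address matters here. */
struct toy_lock_key {
	int dummy;
};

/* Two parallel key sets: one for sockets created by userspace and
 * one for sockets created by the kernel.
 */
static struct toy_lock_key toy_user_keys[TOY_AF_MAX];
static struct toy_lock_key toy_kern_keys[TOY_AF_MAX];

struct toy_sock {
	int family;
	bool kern_sock;		/* remembered from allocation time */
	const struct toy_lock_key *key;
};

/* Pick the key set according to who created the socket. */
static void toy_sock_lock_init(struct toy_sock *sk, int family, bool kern)
{
	assert(family >= 0 && family < TOY_AF_MAX);
	sk->family = family;
	sk->kern_sock = kern;
	sk->key = kern ? &toy_kern_keys[family] : &toy_user_keys[family];
}

/* A cloned child inherits the parent's kern setting. */
static void toy_sock_clone(struct toy_sock *child, const struct toy_sock *parent)
{
	toy_sock_lock_init(child, parent->family, parent->kern_sock);
}
```

Two sockets of the same family then end up in different lock classes when one is kernel-internal, which is the distinction lockdep could not otherwise make.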
  13. 02 Mar, 2017 1 commit
  14. 25 Feb, 2017 1 commit
  15. 18 Feb, 2017 1 commit
  16. 16 Feb, 2017 1 commit
  17. 14 Feb, 2017 1 commit
  18. 26 Jan, 2017 1 commit
  19. 25 Jan, 2017 6 commits
  20. 21 Jan, 2017 4 commits
  21. 17 Jan, 2017 1 commit
  22. 04 Jan, 2017 3 commits
    • J
      tipc: reduce risk of user starvation during link congestion · 365ad353
      Jon Paul Maloy committed
      The socket code currently handles link congestion by either blocking
      and trying to send again when the congestion has abated, or just
      returning to the user with -EAGAIN and letting the caller retry later.
      
      This mechanism is prone to starvation, because the wakeup algorithm is
      non-atomic. During the time the link issues a wakeup signal, until the
      socket wakes up and re-attempts sending, other senders may have come
      in between and occupied the free buffer space in the link. This in turn
      may lead to a socket having to make many send attempts before it is
      successful. In extremely loaded systems we have observed latency times
      of several seconds before a low-priority socket is able to send out a
      message.
      
      In this commit, we simplify this mechanism and reduce the risk of the
      described scenario happening. When an attempt is made to send a message
      via a congested link, we now add it to the link's backlog queue
      anyway, thus permitting an oversubscription of one message per source
      socket. We still create a wakeup item and return an error code, hence
      instructing the sender to block or stop sending. Only when enough space
      has been freed up in the link's backlog queue do we issue a wakeup event
      that allows the sender to continue with the next message, if any.
      
      The fact that a socket now can consider a message sent even when the
      link returns a congestion code means that the sending socket code can
      be simplified. Also, since this is a good opportunity to get rid of the
      obsolete 'mtu change' condition in the three socket send functions, we
      now choose to refactor those functions completely.
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
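The oversubscription scheme described above can be sketched as a toy backlog queue. The names (`toy_link`, `toy_link_xmit`, `toy_link_advance`, `TOY_ELINKCONG`) and the limits are hypothetical. One simplification is an assumption of this sketch: the toy link refuses further messages from a sender that is already waiting, whereas the commit describes the sender itself blocking or ceasing to send.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_ELINKCONG (-1)	/* hypothetical congestion code */
#define TOY_MAX_SOCKS 8		/* hypothetical number of sender slots */

struct toy_link {
	int backlog_len;
	int backlog_limit;
	bool waiting[TOY_MAX_SOCKS];	/* senders holding a wakeup item */
};

/* Queue one message from sender sock_id. */
static int toy_link_xmit(struct toy_link *l, int sock_id)
{
	assert(sock_id >= 0 && sock_id < TOY_MAX_SOCKS);
	if (l->waiting[sock_id])
		return TOY_ELINKCONG;	/* sketch assumption: not queued */
	l->backlog_len++;		/* the message is queued in any case */
	if (l->backlog_len > l->backlog_limit) {
		l->waiting[sock_id] = true;
		return TOY_ELINKCONG;	/* queued, but the sender must pause */
	}
	return 0;
}

/* 'sent' messages leave the backlog. Waiting senders are woken only
 * once the backlog has dropped below its limit. Returns the number woken.
 */
static int toy_link_advance(struct toy_link *l, int sent)
{
	int i, woken = 0;

	assert(sent >= 0);
	l->backlog_len = sent > l->backlog_len ? 0 : l->backlog_len - sent;
	if (l->backlog_len >= l->backlog_limit)
		return 0;
	for (i = 0; i < TOY_MAX_SOCKS; i++) {
		if (l->waiting[i]) {
			l->waiting[i] = false;
			woken++;
		}
	}
	return woken;
}
```

Each sender can thus overshoot the limit by at most one message, and the congestion code tells it to pause even though that message was accepted.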
    • J
      tipc: modify struct tipc_plist to be more versatile · 4d8642d8
      Jon Paul Maloy committed
      During multicast reception we currently use a simple linked list with
      push/pop semantics to store port numbers.
      
      We now see a need for a more generic list for storing values of type
      u32. We therefore make some modifications to this list, while replacing
      the prefix 'tipc_plist_' with 'u32_'. We also add a couple of new
      functions which will be used in the coming commits.
      Acked-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
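A generic list of u32 values with push/pop semantics plus a duplicate-free add might look roughly like the userspace sketch below. The `toy_u32_*` names and the singly linked layout are assumptions made for illustration; the commit message only states that the prefix becomes 'u32_' and that a couple of functions are added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical list node holding one 32-bit value. */
struct toy_u32_item {
	struct toy_u32_item *next;
	uint32_t value;
};

/* Push a value onto the head. Returns false on allocation failure. */
static bool toy_u32_push(struct toy_u32_item **head, uint32_t value)
{
	struct toy_u32_item *item;

	assert(head);
	item = malloc(sizeof(*item));
	if (!item)
		return false;
	item->value = value;
	item->next = *head;
	*head = item;
	return true;
}

/* Pop the head value. Returns 0 if the list is empty, so a stored 0
 * cannot be told apart from "empty" in this sketch.
 */
static uint32_t toy_u32_pop(struct toy_u32_item **head)
{
	struct toy_u32_item *item = *head;
	uint32_t value;

	if (!item)
		return 0;
	value = item->value;
	*head = item->next;
	free(item);
	return value;
}

/* Linear search for a value. */
static bool toy_u32_find(const struct toy_u32_item *head, uint32_t value)
{
	for (; head; head = head->next)
		if (head->value == value)
			return true;
	return false;
}

/* Add a value only if it is not already present. */
static bool toy_u32_add(struct toy_u32_item **head, uint32_t value)
{
	if (toy_u32_find(*head, value))
		return false;
	return toy_u32_push(head, value);
}
```

Nothing in the list is specific to port numbers, which is the point of the more generic naming.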
    • J
      tipc: unify tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg() functions · 8c44e1af
      Jon Paul Maloy committed
      The functions tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg() are very
      similar. The latter function is also called from two locations, and
      there will be more in the coming commits, which will all need to test on
      different conditions.
      
      Instead of making yet another duplicate of the function, we now
      introduce a new macro tipc_wait_for_cond() where the wakeup condition
      can be stated as an argument to the call. This macro replaces all
      current and future uses of the two functions, which can now be
      eliminated.
      Acked-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
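Replacing several near-identical wait functions with one macro that takes the wakeup condition as an argument can be sketched as follows. This is a single-threaded illustration with hypothetical names (`toy_sock`, `toy_wait_once`, `toy_wait_for_cond`); the bounded retry count stands in for a timeout and is an assumption of the sketch, not something stated in the commit message.

```c
#include <assert.h>

/* Hypothetical toy socket state. */
struct toy_sock {
	int queued;	/* items still pending */
	int wakeups;	/* number of simulated sleep/wakeup cycles */
};

/* Stand-in for one sleep/wakeup cycle: each wakeup drains one item. */
static void toy_wait_once(struct toy_sock *sk)
{
	assert(sk);
	sk->wakeups++;
	if (sk->queued > 0)
		sk->queued--;
}

/* Sets rc to 0 once cond holds, or to -1 if max_waits cycles pass
 * first. cond is an arbitrary expression, re-evaluated on each pass,
 * so every call site can state its own wakeup condition.
 */
#define toy_wait_for_cond(sk, max_waits, cond, rc)		\
	do {							\
		int left_ = (max_waits);			\
		(rc) = 0;					\
		while (!(cond)) {				\
			if (left_-- <= 0) {			\
				(rc) = -1;			\
				break;				\
			}					\
			toy_wait_once(sk);			\
		}						\
	} while (0)
```

One call site might wait for `sk.queued == 0` and another for `sk.queued < 2`, without either needing its own copy of the loop.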