1. 03 May, 2017 - 1 commit
  2. 25 Apr, 2017 - 3 commits
  3. 14 Apr, 2017 - 2 commits
    • netlink: pass extended ACK struct where available · fe52145f
      Johannes Berg authored
      This is an add-on to the previous patch; it passes the extended ACK
      structure where it is already available via existing genl_info or
      extack function arguments.
      
      This was done with this spatch (with some manual adjustment of
      indentation):
      
      @@
      expression A, B, C, D, E;
      identifier fn, info;
      @@
      fn(..., struct genl_info *info, ...) {
      ...
      -nlmsg_parse(A, B, C, D, E, NULL)
      +nlmsg_parse(A, B, C, D, E, info->extack)
      ...
      }
      
      @@
      expression A, B, C, D, E;
      identifier fn, info;
      @@
      fn(..., struct genl_info *info, ...) {
      <...
      -nla_parse_nested(A, B, C, D, NULL)
      +nla_parse_nested(A, B, C, D, info->extack)
      ...>
      }
      
      @@
      expression A, B, C, D, E;
      identifier fn, extack;
      @@
      fn(..., struct netlink_ext_ack *extack, ...) {
      <...
      -nlmsg_parse(A, B, C, D, E, NULL)
      +nlmsg_parse(A, B, C, D, E, extack)
      ...>
      }
      
      @@
      expression A, B, C, D, E;
      identifier fn, extack;
      @@
      fn(..., struct netlink_ext_ack *extack, ...) {
      <...
      -nla_parse(A, B, C, D, E, NULL)
      +nla_parse(A, B, C, D, E, extack)
      ...>
      }
      
      @@
      expression A, B, C, D, E;
      identifier fn, extack;
      @@
      fn(..., struct netlink_ext_ack *extack, ...) {
      ...
      -nlmsg_parse(A, B, C, D, E, NULL)
      +nlmsg_parse(A, B, C, D, E, extack)
      ...
      }
      
      @@
      expression A, B, C, D;
      identifier fn, extack;
      @@
      fn(..., struct netlink_ext_ack *extack, ...) {
      <...
      -nla_parse_nested(A, B, C, D, NULL)
      +nla_parse_nested(A, B, C, D, extack)
      ...>
      }
      
      @@
      expression A, B, C, D;
      identifier fn, extack;
      @@
      fn(..., struct netlink_ext_ack *extack, ...) {
      <...
      -nlmsg_validate(A, B, C, D, NULL)
      +nlmsg_validate(A, B, C, D, extack)
      ...>
      }
      
      @@
      expression A, B, C, D;
      identifier fn, extack;
      @@
      fn(..., struct netlink_ext_ack *extack, ...) {
      <...
      -nla_validate(A, B, C, D, NULL)
      +nla_validate(A, B, C, D, extack)
      ...>
      }
      
      @@
      expression A, B, C;
      identifier fn, extack;
      @@
      fn(..., struct netlink_ext_ack *extack, ...) {
      <...
      -nla_validate_nested(A, B, C, NULL)
      +nla_validate_nested(A, B, C, extack)
      ...>
      }
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fe52145f
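      For illustration, a minimal sketch of what a generic netlink doit handler
      can look like after this change; the FOO_* attributes, policy and handler
      names are hypothetical, not taken from the patch:

      #include <net/genetlink.h>
      #include <net/netlink.h>

      enum {
              FOO_ATTR_UNSPEC,
              FOO_ATTR_NEST,
              FOO_ATTR_VAL,
              __FOO_ATTR_MAX,
      };
      #define FOO_ATTR_MAX (__FOO_ATTR_MAX - 1)

      static const struct nla_policy foo_nest_policy[FOO_ATTR_MAX + 1] = {
              [FOO_ATTR_VAL] = { .type = NLA_U32 },
      };

      /* The extack that genetlink already carries in genl_info is forwarded to
       * the parser, so malformed attributes are reported back to userspace. */
      static int foo_cmd_doit(struct sk_buff *skb, struct genl_info *info)
      {
              struct nlattr *tb[FOO_ATTR_MAX + 1];
              int err;

              if (!info->attrs[FOO_ATTR_NEST])
                      return -EINVAL;

              err = nla_parse_nested(tb, FOO_ATTR_MAX, info->attrs[FOO_ATTR_NEST],
                                     foo_nest_policy, info->extack);
              if (err)
                      return err;
              /* ... act on tb[] ... */
              return 0;
      }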
    • netlink: pass extended ACK struct to parsing functions · fceb6435
      Johannes Berg authored
      Pass the new extended ACK reporting struct to all of the generic
      netlink parsing functions. For now, pass NULL in almost all callers
      (except for some in the core).
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fceb6435
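      For illustration, a typical caller after this change simply passes NULL as
      the new last argument until an extack is plumbed through to it (the foo_*
      names here are placeholders, not from the patch):

      struct nlattr *tb[FOO_MAX + 1];
      int err;

      /* nlmsg_parse()/nla_parse() now take a struct netlink_ext_ack * as
       * their final argument; most callers were converted mechanically. */
      err = nlmsg_parse(nlh, sizeof(struct foo_hdr), tb, FOO_MAX,
                        foo_policy, NULL);
      if (err < 0)
              return err;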
  4. 30 Mar, 2017 - 2 commits
  5. 29 Mar, 2017 - 2 commits
  6. 23 Mar, 2017 - 1 commit
    • tipc: fix nametbl deadlock at tipc_nametbl_unsubscribe · 557d054c
      Ying Xue authored
      Until now, tipc_nametbl_unsubscribe() has been called as part of the
      subscription's reference count cleanup. Usually this cleanup happens
      at subscription timeout, at subscription cancel, or at subscriber
      delete.

      We had ignored the possibility of the cleanup being triggered from
      other locations, which causes a deadlock as we try to grab
      tn->nametbl_lock while already holding it.
      
         CPU1:                             CPU2:
      ----------                     ----------------
      tipc_nametbl_publish
      spin_lock_bh(&tn->nametbl_lock)
      tipc_nametbl_insert_publ
      tipc_nameseq_insert_publ
      tipc_subscrp_report_overlap
      tipc_subscrp_get
      tipc_subscrp_send_event
                                   tipc_close_conn
                                   tipc_subscrb_release_cb
                                   tipc_subscrb_delete
                                   tipc_subscrp_put
      tipc_subscrp_put
      tipc_subscrp_kref_release
      tipc_nametbl_unsubscribe
      spin_lock_bh(&tn->nametbl_lock)
      <<grab nametbl_lock again>>
      
         CPU1:                              CPU2:
      ----------                     ----------------
      tipc_nametbl_stop
      spin_lock_bh(&tn->nametbl_lock)
      tipc_purge_publications
      tipc_nameseq_remove_publ
      tipc_subscrp_report_overlap
      tipc_subscrp_get
      tipc_subscrp_send_event
                                   tipc_close_conn
                                   tipc_subscrb_release_cb
                                   tipc_subscrb_delete
                                   tipc_subscrp_put
      tipc_subscrp_put
      tipc_subscrp_kref_release
      tipc_nametbl_unsubscribe
      spin_lock_bh(&tn->nametbl_lock)
      <<grab nametbl_lock again>>
      
      In this commit, we move the call to tipc_nametbl_unsubscribe()
      out of the refcount cleanup and into the intended callers.
      
      Fixes: d094c4d5 ("tipc: add subscription refcount to avoid invalid delete")
      Reported-by: John Thompson <thompa.atl@gmail.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      557d054c
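      As an illustration of the pattern being fixed (simplified names, not the
      actual TIPC code): the table unsubscribe moves out of the kref release
      handler, which may run with tn->nametbl_lock already held, and into the
      callers that intentionally delete the subscription:

      static void subscrp_kref_release(struct kref *kref)
      {
              struct tipc_subscription *sub =
                      container_of(kref, struct tipc_subscription, kref);

              /* Before the fix this also called tipc_nametbl_unsubscribe(sub),
               * which takes tn->nametbl_lock - a deadlock whenever the final
               * kref_put() happens under that lock, as in the traces above. */
              kfree(sub);
      }

      static void subscrb_delete_subscription(struct tipc_subscription *sub)
      {
              /* After the fix the intended callers unsubscribe explicitly,
               * outside any nametbl_lock-holding context, before dropping
               * their reference. */
              tipc_nametbl_unsubscribe(sub);
              kref_put(&sub->kref, subscrp_kref_release);
      }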
  7. 10 Mar, 2017 - 1 commit
    • net: Work around lockdep limitation in sockets that use sockets · cdfbabfb
      David Howells authored
      Lockdep issues a circular dependency warning when AFS issues an operation
      through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.
      
      The theory lockdep comes up with is as follows:
      
       (1) If the pagefault handler decides it needs to read pages from AFS, it
           calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
           creating a call requires the socket lock:
      
      	mmap_sem must be taken before sk_lock-AF_RXRPC
      
       (2) afs_open_socket() opens an AF_RXRPC socket and binds it.  rxrpc_bind()
           binds the underlying UDP socket whilst holding its socket lock.
           inet_bind() takes its own socket lock:
      
      	sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET
      
       (3) Reading from a TCP socket into a userspace buffer might cause a fault
           and thus cause the kernel to take the mmap_sem, but the TCP socket is
           locked whilst doing this:
      
      	sk_lock-AF_INET must be taken before mmap_sem
      
      However, lockdep's theory is wrong in this instance because it deals only
      with lock classes and not individual locks.  The AF_INET lock in (2) isn't
      really equivalent to the AF_INET lock in (3) as the former deals with a
      socket entirely internal to the kernel that never sees userspace.  This is
      a limitation in the design of lockdep.
      
      Fix the general case by:
      
       (1) Double up all the locking keys used in sockets so that one set are
           used if the socket is created by userspace and the other set is used
           if the socket is created by the kernel.
      
       (2) Store the kern parameter passed to sk_alloc() in a variable in the
           sock struct (sk_kern_sock).  This informs sock_lock_init(),
           sock_init_data() and sk_clone_lock() as to the lock keys to be used.
      
           Note that the child created by sk_clone_lock() inherits the parent's
           kern setting.
      
       (3) Add a 'kern' parameter to ->accept() that is analogous to the one
           passed in to ->create() that distinguishes whether kernel_accept() or
           sys_accept4() was the caller and can be passed to sk_alloc().
      
           Note that a lot of accept functions merely dequeue an already
           allocated socket.  I haven't touched these as the new socket already
           exists before we get the parameter.
      
           Note also that there are a couple of places where I've made the accepted
           socket unconditionally kernel-based:
      
      	irda_accept()
	rds_tcp_accept_one()
      	tcp_accept_from_sock()
      
           because they follow a sock_create_kern() and accept off of that.
      
      Whilst creating this, I noticed that lustre and ocfs don't create sockets
      through sock_create_kern() and thus they aren't marked as for-kernel,
      though they appear to be internal.  I wonder if these should do that so
      that they use the new set of lock keys.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cdfbabfb
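      A rough sketch of points (1) and (2) above (the key arrays and helper are
      illustrative; sk_kern_sock and sk_family are the real fields): keep two
      parallel sets of lockdep class keys and pick one depending on whether the
      socket was created by the kernel:

      static struct lock_class_key af_user_keys[AF_MAX];
      static struct lock_class_key af_kern_keys[AF_MAX];

      /* Run when a socket's lock is initialised: kernel-internal sockets get
       * their own lock classes, so lockdep no longer conflates them with
       * userspace sockets of the same address family. */
      static void example_sock_lock_init(struct sock *sk, bool kern)
      {
              struct lock_class_key *key = kern ? &af_kern_keys[sk->sk_family]
                                                : &af_user_keys[sk->sk_family];

              sk->sk_kern_sock = kern;        /* consulted by sk_clone_lock() */
              lockdep_set_class(&sk->sk_lock.slock, key);
      }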
  8. 02 Mar, 2017 - 1 commit
  9. 25 Feb, 2017 - 1 commit
  10. 18 Feb, 2017 - 1 commit
  11. 16 Feb, 2017 - 1 commit
  12. 14 Feb, 2017 - 1 commit
  13. 26 Jan, 2017 - 1 commit
  14. 25 Jan, 2017 - 6 commits
  15. 21 Jan, 2017 - 4 commits
  16. 17 Jan, 2017 - 1 commit
  17. 04 Jan, 2017 - 3 commits
    • tipc: reduce risk of user starvation during link congestion · 365ad353
      Jon Paul Maloy authored
      The socket code currently handles link congestion by either blocking
      and trying to send again when the congestion has abated, or just
      returning -EAGAIN to the user and letting them retry later.
      
      This mechanism is prone to starvation, because the wakeup algorithm is
      non-atomic. Between the moment the link issues a wakeup signal and the
      moment the socket wakes up and re-attempts sending, other senders may
      have come in and occupied the free buffer space in the link. This in turn
      may lead to a socket having to make many send attempts before it is
      successful. In extremely loaded systems we have observed latency times
      of several seconds before a low-priority socket is able to send out a
      message.
      
      In this commit, we simplify this mechanism and reduce the risk of the
      described scenario happening. When an attempt is made to send a message
      via a congested link, we now let it be added to the link's backlog queue
      anyway, thus permitting an oversubscription of one message per source
      socket. We still create a wakeup item and return an error code, hence
      instructing the sender to block or stop sending. Only when enough space
      has been freed up in the link's backlog queue do we issue a wakeup event
      that allows the sender to continue with the next message, if any.
      
      The fact that a socket now can consider a message sent even when the
      link returns a congestion code means that the sending socket code can
      be simplified. Also, since this is a good opportunity to get rid of the
      obsolete 'mtu change' condition in the three socket send functions, we
      now choose to refactor those functions completely.
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      365ad353
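      Illustratively (a simplified fragment, not necessarily the exact code),
      the send path now treats a congestion return code as success for the
      current message and only records that the socket must wait before its
      next send:

      rc = tipc_node_xmit(net, &pkts, dnode, tsk->portid);
      if (unlikely(rc == -ELINKCONG)) {
              /* The message was still appended to the link backlog (one message
               * of oversubscription per source socket); remember the congestion
               * so the next send attempt blocks until the link's wakeup event. */
              tsk->cong_link_cnt = 1;
              rc = 0;
      }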
    • tipc: modify struct tipc_plist to be more versatile · 4d8642d8
      Jon Paul Maloy authored
      During multicast reception we currently use a simple linked list with
      push/pop semantics to store port numbers.
      
      We now see a need for a more generic list for storing values of type
      u32. We therefore make some modifications to this list, while replacing
      the prefix 'tipc_plist_' with 'u32_'. We also add a couple of new
      functions which will come into use in the following commits.
      Acked-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4d8642d8
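      A sketch of what such a generic u32 list can look like (the function
      bodies here are illustrative, not copied from the patch):

      struct u32_item {
              struct list_head list;
              u32 value;
      };

      /* Add a value to the list; returns false on allocation failure. */
      static bool u32_push(struct list_head *l, u32 value)
      {
              struct u32_item *item = kmalloc(sizeof(*item), GFP_ATOMIC);

              if (!item)
                      return false;
              item->value = value;
              list_add(&item->list, l);
              return true;
      }

      /* Remove and return one value, or 0 if the list is empty. */
      static u32 u32_pop(struct list_head *l)
      {
              struct u32_item *item;
              u32 value = 0;

              if (!list_empty(l)) {
                      item = list_first_entry(l, struct u32_item, list);
                      value = item->value;
                      list_del(&item->list);
                      kfree(item);
              }
              return value;
      }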
    • tipc: unify tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg() functions · 8c44e1af
      Jon Paul Maloy authored
      The functions tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg() are very
      similar. The latter function is also called from two locations, and
      there will be more in the coming commits, all of which will need to
      test for different conditions.
      
      Instead of making yet another duplicate of the function, we now
      introduce a new macro tipc_wait_for_cond() where the wakeup condition
      can be stated as an argument to the call. This macro replaces all
      current and future uses of the two functions, which can now be
      eliminated.
      Acked-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8c44e1af
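      A simplified sketch of a wait macro of this kind (not the verbatim macro
      from the commit): the caller supplies the wakeup condition as an
      expression, so one macro can serve every send path:

      #define example_wait_for_cond(sock_, timeo_, condition_)                \
      ({                                                                       \
              int rc_ = 0;                                                     \
              DEFINE_WAIT_FUNC(wait_, woken_wake_function);                    \
                                                                               \
              while (!(condition_)) {                                          \
                      if (!*(timeo_)) {                                        \
                              rc_ = -EAGAIN;                                   \
                              break;                                           \
                      }                                                        \
                      if (signal_pending(current)) {                           \
                              rc_ = sock_intr_errno(*(timeo_));                \
                              break;                                           \
                      }                                                        \
                      add_wait_queue(sk_sleep((sock_)->sk), &wait_);           \
                      release_sock((sock_)->sk);                               \
                      *(timeo_) = wait_woken(&wait_, TASK_INTERRUPTIBLE,       \
                                             *(timeo_));                       \
                      lock_sock((sock_)->sk);                                  \
                      remove_wait_queue(sk_sleep((sock_)->sk), &wait_);        \
              }                                                                \
              rc_;                                                             \
      })

      A caller would then write something like
      rc = example_wait_for_cond(sock, &timeout, !link_congested);
      where link_congested is whatever condition that send path cares about.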
  18. 24 Dec, 2016 - 1 commit
  19. 06 Dec, 2016 - 1 commit
    • [iov_iter] new primitives - copy_from_iter_full() and friends · cbbd26b8
      Al Viro authored
      copy_from_iter_full(), copy_from_iter_full_nocache() and
      csum_and_copy_from_iter_full() - counterparts of copy_from_iter()
      et al., advancing the iterator only in the case of a successful full
      copy and returning whether the copy was successful or not.
      
      Convert some obvious users.  *NOTE* - do not blindly assume that
      something is a good candidate for these unless you are sure that
      not advancing the iov_iter in the failure case is the right thing
      for it.  Anything that does short-read/short-write kinds of
      things (or is in a loop, etc.) is unlikely to be a good candidate.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      cbbd26b8
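      An illustrative before/after for a caller (buffer, length and iterator
      names are placeholders):

      /* Old style: copy_from_iter() returns the number of bytes copied and
       * advances the iterator even when the copy comes up short. */
      if (copy_from_iter(buf, len, &iter) != len)
              return -EFAULT;

      /* New style: copy_from_iter_full() returns a bool and only advances
       * the iterator when the whole requested length was copied. */
      if (!copy_from_iter_full(buf, len, &iter))
              return -EFAULT;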
  20. 03 Dec, 2016 - 1 commit
    • tipc: check minimum bearer MTU · 3de81b75
      Michal Kubeček authored
      Qian Zhang (张谦) reported a potential socket buffer overflow in
      tipc_msg_build(), tracked as CVE-2016-8632: due to insufficient
      checks, a buffer overflow can occur if the MTU is too short to hold
      even the TIPC headers. As anyone can set a device MTU in a user/net
      namespace, this issue can be abused by a regular user.
      
      As agreed in the discussion on Ben Hutchings' original patch, we should
      check the MTU at the moment a bearer is attached rather than for each
      processed packet. We also need to repeat the check when the bearer MTU
      is adjusted to a new device MTU. The UDP case also needs a check to
      avoid overflow when calculating the bearer MTU.
      
      Fixes: b97bf3fd ("[TIPC] Initial merge")
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Reported-by: Qian Zhang (张谦) <zhangqian-c@360.cn>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3de81b75
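      A sketch of the kind of check described (the helper name and minimum value
      are assumptions, not necessarily what the patch introduces):

      #define EXAMPLE_TIPC_MIN_BEARER_MTU 60  /* placeholder for the real minimum */

      /* Reject a device whose MTU cannot carry the TIPC headers plus any
       * encapsulation overhead ('reserve', e.g. for the UDP bearer). */
      static bool example_bearer_mtu_bad(struct net_device *dev,
                                         unsigned int reserve)
      {
              if (dev->mtu >= EXAMPLE_TIPC_MIN_BEARER_MTU + reserve)
                      return false;

              netdev_warn(dev, "MTU too low for tipc bearer\n");
              return true;
      }

      Such a check would run both when a bearer is attached and again when the
      underlying device MTU changes.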
  21. 28 Nov, 2016 - 1 commit
    • tipc: fix link statistics counter errors · 95901122
      Jon Paul Maloy authored
      In commit e4bf4f76 ("tipc: simplify packet sequence number
      handling") we changed the internal representation of the packet
      sequence number counters from u32 to u16, reflecting what is really
      sent over the wire.
      
      Since then some link statistics counters have been displaying incorrect
      values, partially because the counters meant to be used as sequence
      number snapshots are now used as direct counters, stored as u32, and
      partially because some counter updates are just missing in the code.
      
      In this commit we correct this in two ways. First, we base the
      displayed packet sent/received values on direct counters instead of,
      as previously, on a calculated difference between the current sequence
      number and a snapshot. Second, we add the missing updates of the
      counters.
      
      This change is compatible with the current netlink API, and requires
      no changes to the user space tools.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      95901122
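      Illustratively (with simplified field names, not the actual code), the
      change amounts to keeping 32-bit running counters at send/receive time
      rather than deriving the totals from 16-bit sequence numbers that wrap:

      /* old: derived at dump time; breaks once the u16 seqno wraps */
      sent_total = (u32)(l->snd_nxt - l->stats_snapshot_snd);

      /* new: counted directly where packets are actually sent/received */
      l->stats.sent_pkts += pkt_cnt;   /* in the transmit path */
      l->stats.recv_pkts++;            /* in the receive path  */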
  22. 26 Nov, 2016 - 3 commits
    • tipc: resolve connection flow control compatibility problem · 6998cc6e
      Jon Paul Maloy authored
      In commit 10724cc7 ("tipc: redesign connection-level flow control")
      we replaced the previous message based flow control with one based on
      1k blocks. In order to ensure backwards compatibility the mechanism
      falls back to using message as base unit when it senses that the peer
      doesn't support the new algorithm. The default flow control window,
      i.e., how many units can be sent before the sender blocks and waits
      for an acknowledge (aka advertisement) is 512. This was tested against
      the previous version, which uses an acknowledge frequency of one ack per
      256 received messages, and found to work fine.
      
      However, we missed the fact that versions older than Linux 3.15 use an
      acknowledge frequency of 512, which is exactly the limit where a 4.6+
      sender will stop and wait for acknowledge. This would also work fine if
      it weren't for the fact that if the first message sent on the 4.6+ server
      side is an empty SYNACK, it is also counted as a sent message, while it
      is not counted as a received message by a legacy 3.15 receiver.
      This leads to the sender always being one step ahead of the receiver, a
      scenario causing the sender to block after 512 sent messages, while the
      receiver has only registered 511 read messages. Hence, the legacy
      receiver is never triggered to send an acknowledge, with a permanently
      blocked sender as the result.
      
      We solve this deadlock by simply allowing the sender to send one more
      message before it blocks, i.e., by making a minimal change to the
      condition used for determining connection congestion.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6998cc6e
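      An illustrative sketch of the idea (not the verbatim change): the
      connection-level congestion test is relaxed so that a sender may have one
      more unacknowledged unit in flight before it blocks:

      /* The connection only counts as congested once the unacked amount
       * exceeds the advertised window, leaving room for the extra unit that
       * keeps a legacy 512-message-ack peer from deadlocking the sender. */
      static bool example_conn_cong(struct tipc_sock *tsk)
      {
              return tsk->snt_unacked > tsk->snd_win;
      }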
    • tipc: improve sanity check for received domain records · d876a4d2
      Jon Paul Maloy authored
      In commit 35c55c98 ("tipc: add neighbor monitoring framework") we
      added a data area to the link monitor STATE messages under the
      assumption that previous versions did not use any such data area.
      
      For versions older than Linux 4.3 this assumption is not correct. In
      those versions, all STATE messages sent out from a node inadvertently
      contain a 16-byte data area carrying a string - a leftover from
      previous RESET messages, which used it during the setup phase.
      This string serves no purpose in STATE messages and should not be there.
      
      Unfortunately, this data area is delivered to the link monitor
      framework, where a sanity check catches that it is not a correct domain
      record, and drops it. It also issues a rate limited warning about the
      event.
      
      Since such events occur much more frequently than anticipated, we now
      choose to remove the warning in order to not fill the kernel log with
      useless contents. We also make the sanity check stricter, to further
      reduce the risk that such data is inadvertently admitted.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d876a4d2
    • tipc: fix compatibility bug in link monitoring · f7967556
      Jon Paul Maloy authored
      commit 81729810 ("tipc: fix link priority propagation") introduced a
      compatibility problem between TIPC versions newer than Linux 4.6 and
      those older than Linux 4.4. In versions later than 4.4, link STATE
      messages only contain a non-zero link priority value when the sender
      wants the receiver to change its priority. This has the effect that the
      receiver resets itself in order to apply the new priority. This works
      well, and is consistent with the said commit.
      
      However, in versions older than 4.4 a valid link priority is present in
      all sent link STATE messages, leading to cyclic link establishment and
      reset on the 4.6+ node.
      
      We fix this by adding a test: the received value must not only be
      valid, but also differ from the current value, in order to cause the
      receiving link endpoint to reset.
      Reported-by: Amar Nv <amar.nv005@gmail.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f7967556
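      Illustratively, the receive-side test described above changes along these
      lines (a sketch only; the reset helper is hypothetical):

      peers_prio = msg_linkprio(hdr);

      /* Before: any valid advertised priority triggered a reset, which loops
       * forever against pre-4.4 peers that always advertise their priority.
       * After: reset only when the value is both valid and different. */
      if (peers_prio && peers_prio != l->priority)
              example_reset_link_with_priority(l, peers_prio);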
  23. 20 Nov, 2016 - 1 commit