1. 13 10月, 2017 16 次提交
    • J
      tipc: guarantee delivery of UP event before first broadcast · 399574d4
      Jon Maloy 提交于
      The following scenario is possible:
      - A user joins a group, and immediately sends out a broadcast message
        to its members.
      - The broadcast message, following a different data path than the
        initial JOIN message sent out during the joining procedure, arrives
        to a receiver before the latter..
      - The receiver drops the message, since it is not ready to accept any
        messages until the JOIN has arrived.
      
      We avoid this by treating group protocol JOIN messages like unicast
      messages.
      - We let them pass through the recipient's multicast input queue, just
        like ordinary unicasts.
      - We force the first following broadacst to be sent as replicated
        unicast and being acknowledged by the recipient before accepting
        any more broadcast transmissions.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      399574d4
    • J
      tipc: guarantee that group broadcast doesn't bypass group unicast · 2f487712
      Jon Maloy 提交于
      We need a mechanism guaranteeing that group unicasts sent out from a
      socket are not bypassed by later sent broadcasts from the same socket.
      We do this as follows:
      
      - Each time a unicast is sent, we set a the broadcast method for the
        socket to "replicast" and "mandatory". This forces the first
        subsequent broadcast message to follow the same network and data path
        as the preceding unicast to a destination, hence preventing it from
        overtaking the latter.
      
      - In order to make the 'same data path' statement above true, we let
        group unicasts pass through the multicast link input queue, instead
        of as previously through the unicast link input queue.
      
      - In the first broadcast following a unicast, we set a new header flag,
        requiring all recipients to immediately acknowledge its reception.
      
      - During the period before all the expected acknowledges are received,
        the socket refuses to accept any more broadcast attempts, i.e., by
        blocking or returning EAGAIN. This period should typically not be
        longer than a few microseconds.
      
      - When all acknowledges have been received, the sending socket will
        open up for subsequent broadcasts, this time giving the link layer
        freedom to itself select the best transmission method.
      
      - The forced and/or abrupt transmission method changes described above
        may lead to broadcasts arriving out of order to the recipients. We
        remedy this by introducing code that checks and if necessary
        re-orders such messages at the receiving end.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2f487712
    • J
      tipc: guarantee group unicast doesn't bypass group broadcast · b87a5ea3
      Jon Maloy 提交于
      Group unicast messages don't follow the same path as broadcast messages,
      and there is a high risk that unicasts sent from a socket might bypass
      previously sent broadcasts from the same socket.
      
      We fix this by letting all unicast messages carry the sequence number of
      the next sent broadcast from the same node, but without updating this
      number at the receiver. This way, a receiver can check and if necessary
      re-order such messages before they are added to the socket receive buffer.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b87a5ea3
    • J
      tipc: introduce group multicast messaging · 5b8dddb6
      Jon Maloy 提交于
      The previously introduced message transport to all group members is
      based on the tipc multicast service, but is logically a broadcast
      service within the group, and that is what we call it.
      
      We now add functionality for sending messages to all group members
      having a certain identity. Correspondingly, we call this feature 'group
      multicast'. The service is using unicast when only one destination is
      found, otherwise it will use the bearer broadcast service to transfer
      the messages. In the latter case, the receiving members filter arriving
      messages by looking at the intended destination instance. If there is
      no match, the message will be dropped, while still being considered
      received and read as seen by the flow control mechanism.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5b8dddb6
    • J
      tipc: introduce group anycast messaging · ee106d7f
      Jon Maloy 提交于
      In this commit, we make it possible to send connectionless unicast
      messages to any member corresponding to the given member identity,
      when there is more than one such member. The sender must use a
      TIPC_ADDR_NAME address to achieve this effect.
      
      We also perform load balancing between the destinations, i.e., we
      primarily select one which has advertised sufficient send window
      to not cause a block/EAGAIN delay, if any. This mechanism is
      overlayed on the always present round-robin selection.
      
      Anycast messages are subject to the same start synchronization
      and flow control mechanism as group broadcast messages.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ee106d7f
    • J
      tipc: introduce group unicast messaging · 27bd9ec0
      Jon Maloy 提交于
      We now make it possible to send connectionless unicast messages
      within a communication group. To send a message, the sender can use
      either a direct port address, aka port identity, or an indirect port
      name to be looked up.
      
      This type of messages are subject to the same start synchronization
      and flow control mechanism as group broadcast messages.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      27bd9ec0
    • J
      tipc: introduce flow control for group broadcast messages · b7d42635
      Jon Maloy 提交于
      We introduce an end-to-end flow control mechanism for group broadcast
      messages. This ensures that no messages are ever lost because of
      destination receive buffer overflow, with minimal impact on performance.
      For now, the algorithm is based on the assumption that there is only one
      active transmitter at any moment in time.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b7d42635
    • J
      tipc: receive group membership events via member socket · ae236fb2
      Jon Maloy 提交于
      Like with any other service, group members' availability can be
      subscribed for by connecting to be topology server. However, because
      the events arrive via a different socket than the member socket, there
      is a real risk that membership events my arrive out of synch with the
      actual JOIN/LEAVE action. I.e., it is possible to receive the first
      messages from a new member before the corresponding JOIN event arrives,
      just as it is possible to receive the last messages from a leaving
      member after the LEAVE event has already been received.
      
      Since each member socket is internally also subscribing for membership
      events, we now fix this problem by passing those events on to the user
      via the member socket. We leverage the already present member synch-
      ronization protocol to guarantee correct message/event order. An event
      is delivered to the user as an empty message where the two source
      addresses identify the new/lost member. Furthermore, we set the MSG_OOB
      bit in the message flags to mark it as an event. If the event is an
      indication about a member loss we also set the MSG_EOR bit, so it can
      be distinguished from a member addition event.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ae236fb2
    • J
      tipc: add second source address to recvmsg()/recvfrom() · 31c82a2d
      Jon Maloy 提交于
      With group communication, it becomes important for a message receiver to
      identify not only from which socket (identfied by a node:port tuple) the
      message was sent, but also the logical identity (type:instance) of the
      sending member.
      
      We fix this by adding a second instance of struct sockaddr_tipc to the
      source address area when a message is read. The extra address struct
      is filled in with data found in the received message header (type,) and
      in the local member representation struct (instance.)
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      31c82a2d
    • J
      tipc: introduce communication groups · 75da2163
      Jon Maloy 提交于
      As a preparation for introducing flow control for multicast and datagram
      messaging we need a more strictly defined framework than we have now. A
      socket must be able keep track of exactly how many and which other
      sockets it is allowed to communicate with at any moment, and keep the
      necessary state for those.
      
      We therefore introduce a new concept we have named Communication Group.
      Sockets can join a group via a new setsockopt() call TIPC_GROUP_JOIN.
      The call takes four parameters: 'type' serves as group identifier,
      'instance' serves as an logical member identifier, and 'scope' indicates
      the visibility of the group (node/cluster/zone). Finally, 'flags' makes
      it possible to set certain properties for the member. For now, there is
      only one flag, indicating if the creator of the socket wants to receive
      a copy of broadcast or multicast messages it is sending via the socket,
      and if wants to be eligible as destination for its own anycasts.
      
      A group is closed, i.e., sockets which have not joined a group will
      not be able to send messages to or receive messages from members of
      the group, and vice versa.
      
      Any member of a group can send multicast ('group broadcast') messages
      to all group members, optionally including itself, using the primitive
      send(). The messages are received via the recvmsg() primitive. A socket
      can only be member of one group at a time.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      75da2163
    • J
      tipc: improve destination linked list · a80ae530
      Jon Maloy 提交于
      We often see a need for a linked list of destination identities,
      sometimes containing a port number, sometimes a node identity, and
      sometimes both. The currently defined struct u32_list is not generic
      enough to cover all cases, so we extend it to contain two u32 integers
      and rename it to struct tipc_dest_list.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a80ae530
    • J
      tipc: add new function for sending multiple small messages · f70d37b7
      Jon Maloy 提交于
      We see an increasing need to send multiple single-buffer messages
      of TIPC_SYSTEM_IMPORTANCE to different individual destination nodes.
      Instead of looping over the send queue and sending each buffer
      individually, as we do now, we add a new help function
      tipc_node_distr_xmit() to do this.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f70d37b7
    • J
      tipc: refactor function filter_rcv() · 64ac5f59
      Jon Maloy 提交于
      In the following commits we will need to handle multiple incoming and
      rejected/returned buffers in the function socket.c::filter_rcv().
      As a preparation for this, we generalize the function by handling
      buffer queues instead of individual buffers. We also introduce a
      help function tipc_skb_reject(), and rename filter_rcv() to
      tipc_sk_filter_rcv() in line with other functions in socket.c.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      64ac5f59
    • J
      tipc: add ability to obtain node availability status from other files · 38077b8e
      Jon Maloy 提交于
      In the coming commits, functions at the socket level will need the
      ability to read the availability status of a given node. We therefore
      introduce a new function for this purpose, while renaming the existing
      static function currently having the wanted name.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      38077b8e
    • J
      tipc: improve address sanity check in tipc_connect() · 23998835
      Jon Maloy 提交于
      The address given to tipc_connect() is not completely sanity checked,
      under the assumption that this will be done later in the function
      __tipc_sendmsg() when the address is used there.
      
      However, the latter functon will in the next commits serve as caller
      to several other send functions, so we want to move the corresponding
      sanity check there to the beginning of that function, before we possibly
      need to grab the address stored by tipc_connect(). We must therefore
      be able to trust that this address already has been thoroughly checked.
      
      We do this in this commit.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      23998835
    • J
      tipc: add ability to order and receive topology events in driver · 14c04493
      Jon Maloy 提交于
      As preparation for introducing communication groups, we add the ability
      to issue topology subscriptions and receive topology events from kernel
      space. This will make it possible for group member sockets to keep track
      of other group members.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      14c04493
  2. 09 10月, 2017 2 次提交
    • J
      tipc: Unclone message at secondary destination lookup · a9e2971b
      Jon Maloy 提交于
      When a bundling message is received, the function tipc_link_input()
      calls function tipc_msg_extract() to unbundle all inner messages of
      the bundling message before adding them to input queue.
      
      The function tipc_msg_extract() just clones all inner skb for all
      inner messagges from the bundling skb. This means that the skb
      headroom of an inner message overlaps with the data part of the
      preceding message in the bundle.
      
      If the message in question is a name addressed message, it may be
      subject to a secondary destination lookup, and eventually be sent out
      on one of the interfaces again. But, since what is perceived as headroom
      by the device driver in reality is the last bytes of the preceding
      message in the bundle, the latter will be overwritten by the MAC
      addresses of the L2 header. If the preceding message has not yet been
      consumed by the user, it will evenually be delivered with corrupted
      contents.
      
      This commit fixes this by uncloning all messages passing through the
      function tipc_msg_lookup_dest(), hence ensuring that the headroom
      is always valid when the message is passed on.
      Signed-off-by: NTung Nguyen <tung.q.nguyen@dektech.com.au>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a9e2971b
    • J
      tipc: correct initialization of skb list · 3382605f
      Jon Maloy 提交于
      We change the initialization of the skb transmit buffer queues
      in the functions tipc_bcast_xmit() and tipc_rcast_xmit() to also
      initialize their spinlocks. This is needed because we may, during
      error conditions, need to call skb_queue_purge() on those queues
      further down the stack.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3382605f
  3. 01 10月, 2017 1 次提交
  4. 07 9月, 2017 1 次提交
  5. 30 8月, 2017 1 次提交
  6. 25 8月, 2017 4 次提交
  7. 24 8月, 2017 1 次提交
  8. 23 8月, 2017 2 次提交
    • Y
      tipc: fix a race condition of releasing subscriber object · fd849b7c
      Ying Xue 提交于
      No matter whether a request is inserted into workqueue as a work item
      to cancel a subscription or to delete a subscription's subscriber
      asynchronously, the work items may be executed in different workers.
      As a result, it doesn't mean that one request which is raised prior to
      another request is definitely handled before the latter. By contrast,
      if the latter request is executed before the former request, below
      error may happen:
      
      [  656.183644] BUG: spinlock bad magic on CPU#0, kworker/u8:0/12117
      [  656.184487] general protection fault: 0000 [#1] SMP
      [  656.185160] Modules linked in: tipc ip6_udp_tunnel udp_tunnel 9pnet_virtio 9p 9pnet virtio_net virtio_pci virtio_ring virtio [last unloaded: ip6_udp_tunnel]
      [  656.187003] CPU: 0 PID: 12117 Comm: kworker/u8:0 Not tainted 4.11.0-rc7+ #6
      [  656.187920] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  656.188690] Workqueue: tipc_rcv tipc_recv_work [tipc]
      [  656.189371] task: ffff88003f5cec40 task.stack: ffffc90004448000
      [  656.190157] RIP: 0010:spin_bug+0xdd/0xf0
      [  656.190678] RSP: 0018:ffffc9000444bcb8 EFLAGS: 00010202
      [  656.191375] RAX: 0000000000000034 RBX: ffff88003f8d1388 RCX: 0000000000000000
      [  656.192321] RDX: ffff88003ba13708 RSI: ffff88003ba0cd08 RDI: ffff88003ba0cd08
      [  656.193265] RBP: ffffc9000444bcd0 R08: 0000000000000030 R09: 000000006b6b6b6b
      [  656.194208] R10: ffff8800bde3e000 R11: 00000000000001b4 R12: 6b6b6b6b6b6b6b6b
      [  656.195157] R13: ffffffff81a3ca64 R14: ffff88003f8d1388 R15: ffff88003f8d13a0
      [  656.196101] FS:  0000000000000000(0000) GS:ffff88003ba00000(0000) knlGS:0000000000000000
      [  656.197172] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  656.197935] CR2: 00007f0b3d2e6000 CR3: 000000003ef9e000 CR4: 00000000000006f0
      [  656.198873] Call Trace:
      [  656.199210]  do_raw_spin_lock+0x66/0xa0
      [  656.199735]  _raw_spin_lock_bh+0x19/0x20
      [  656.200258]  tipc_subscrb_subscrp_delete+0x28/0xf0 [tipc]
      [  656.200990]  tipc_subscrb_rcv_cb+0x45/0x260 [tipc]
      [  656.201632]  tipc_receive_from_sock+0xaf/0x100 [tipc]
      [  656.202299]  tipc_recv_work+0x2b/0x60 [tipc]
      [  656.202872]  process_one_work+0x157/0x420
      [  656.203404]  worker_thread+0x69/0x4c0
      [  656.203898]  kthread+0x138/0x170
      [  656.204328]  ? process_one_work+0x420/0x420
      [  656.204889]  ? kthread_create_on_node+0x40/0x40
      [  656.205527]  ret_from_fork+0x29/0x40
      [  656.206012] Code: 48 8b 0c 25 00 c5 00 00 48 c7 c7 f0 24 a3 81 48 81 c1 f0 05 00 00 65 8b 15 61 ef f5 7e e8 9a 4c 09 00 4d 85 e4 44 8b 4b 08 74 92 <45> 8b 84 24 40 04 00 00 49 8d 8c 24 f0 05 00 00 eb 8d 90 0f 1f
      [  656.208504] RIP: spin_bug+0xdd/0xf0 RSP: ffffc9000444bcb8
      [  656.209798] ---[ end trace e2a800e6eb0770be ]---
      
      In above scenario, the request of deleting subscriber was performed
      earlier than the request of canceling a subscription although the
      latter was issued before the former, which means tipc_subscrb_delete()
      was called before tipc_subscrp_cancel(). As a result, when
      tipc_subscrb_subscrp_delete() called by tipc_subscrp_cancel() was
      executed to cancel a subscription, the subscription's subscriber
      refcnt had been decreased to 1. After tipc_subscrp_delete() where
      the subscriber was freed because its refcnt was decremented to zero,
      but the subscriber's lock had to be released, as a consequence, panic
      happened.
      
      By contrast, if we increase subscriber's refcnt before
      tipc_subscrb_subscrp_delete() is called in tipc_subscrp_cancel(),
      the panic issue can be avoided.
      
      Fixes: d094c4d5 ("tipc: add subscription refcount to avoid invalid delete")
      Reported-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fd849b7c
    • P
      tipc: remove subscription references only for pending timers · 458be024
      Parthasarathy Bhuvaragan 提交于
      In commit, 139bb36f ("tipc: advance the time of deleting
      subscription from subscriber->subscrp_list"), we delete the
      subscription from the subscribers list and from nametable
      unconditionally. This leads to the following bug if the timer
      running tipc_subscrp_timeout() in another CPU accesses the
      subscription list after the subscription delete request.
      
      [39.570] general protection fault: 0000 [#1] SMP
      ::
      [39.574] task: ffffffff81c10540 task.stack: ffffffff81c00000
      [39.575] RIP: 0010:tipc_subscrp_timeout+0x32/0x80 [tipc]
      [39.576] RSP: 0018:ffff88003ba03e90 EFLAGS: 00010282
      [39.576] RAX: dead000000000200 RBX: ffff88003f0f3600 RCX: 0000000000000101
      [39.577] RDX: dead000000000100 RSI: 0000000000000201 RDI: ffff88003f0d7948
      [39.578] RBP: ffff88003ba03ea0 R08: 0000000000000001 R09: ffff88003ba03ef8
      [39.579] R10: 000000000000014f R11: 0000000000000000 R12: ffff88003f0d7948
      [39.580] R13: ffff88003f0f3618 R14: ffffffffa006c250 R15: ffff88003f0f3600
      [39.581] FS:  0000000000000000(0000) GS:ffff88003ba00000(0000) knlGS:0000000000000000
      [39.582] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [39.583] CR2: 00007f831c6e0714 CR3: 000000003d3b0000 CR4: 00000000000006f0
      [39.584] Call Trace:
      [39.584]  <IRQ>
      [39.585]  call_timer_fn+0x3d/0x180
      [39.585]  ? tipc_subscrb_rcv_cb+0x260/0x260 [tipc]
      [39.586]  run_timer_softirq+0x168/0x1f0
      [39.586]  ? sched_clock_cpu+0x16/0xc0
      [39.587]  __do_softirq+0x9b/0x2de
      [39.587]  irq_exit+0x60/0x70
      [39.588]  smp_apic_timer_interrupt+0x3d/0x50
      [39.588]  apic_timer_interrupt+0x86/0x90
      [39.589] RIP: 0010:default_idle+0x20/0xf0
      [39.589] RSP: 0018:ffffffff81c03e58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
      [39.590] RAX: 0000000000000000 RBX: ffffffff81c10540 RCX: 0000000000000000
      [39.591] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      [39.592] RBP: ffffffff81c03e68 R08: 0000000000000000 R09: 0000000000000000
      [39.593] R10: ffffc90001cbbe00 R11: 0000000000000000 R12: 0000000000000000
      [39.594] R13: ffffffff81c10540 R14: 0000000000000000 R15: 0000000000000000
      [39.595]  </IRQ>
      ::
      [39.603] RIP: tipc_subscrp_timeout+0x32/0x80 [tipc] RSP: ffff88003ba03e90
      [39.604] ---[ end trace 79ce94b7216cb459 ]---
      
      Fixes: 139bb36f ("tipc: advance the time of deleting subscription from subscriber->subscrp_list")
      Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      458be024
  9. 22 8月, 2017 1 次提交
    • J
      tipc: don't reset stale broadcast send link · 40501f90
      Jon Paul Maloy 提交于
      When the broadcast send link after 100 attempts has failed to
      transfer a packet to all peers, we consider it stale, and reset
      it. Thereafter it needs to re-synchronize with the peers, something
      currently done by just resetting and re-establishing all links to
      all peers. This has turned out to be overkill, with potentially
      unwanted consequences for the remaining cluster.
      
      A closer analysis reveals that this can be done much simpler. When
      this kind of failure happens, for reasons that may lie outside the
      TIPC protocol, it is typically only one peer which is failing to
      receive and acknowledge packets. It is hence sufficient to identify
      and reset the links only to that peer to resolve the situation, without
      having to reset the broadcast link at all. This solution entails a much
      lower risk of negative consequences for the own node as well as for
      the overall cluster.
      
      We implement this change in this commit.
      Reviewed-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      40501f90
  10. 19 8月, 2017 1 次提交
    • E
      tipc: fix use-after-free · 5bfd37b4
      Eric Dumazet 提交于
      syszkaller reported use-after-free in tipc [1]
      
      When msg->rep skb is freed, set the pointer to NULL,
      so that caller does not free it again.
      
      [1]
      
      ==================================================================
      BUG: KASAN: use-after-free in skb_push+0xd4/0xe0 net/core/skbuff.c:1466
      Read of size 8 at addr ffff8801c6e71e90 by task syz-executor5/4115
      
      CPU: 1 PID: 4115 Comm: syz-executor5 Not tainted 4.13.0-rc4+ #32
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       print_address_description+0x73/0x250 mm/kasan/report.c:252
       kasan_report_error mm/kasan/report.c:351 [inline]
       kasan_report+0x24e/0x340 mm/kasan/report.c:409
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
       skb_push+0xd4/0xe0 net/core/skbuff.c:1466
       tipc_nl_compat_recv+0x833/0x18f0 net/tipc/netlink_compat.c:1209
       genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:598
       genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:623
       netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2397
       genl_rcv+0x28/0x40 net/netlink/genetlink.c:634
       netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline]
       netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1291
       netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1854
       sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       sock_write_iter+0x31a/0x5d0 net/socket.c:898
       call_write_iter include/linux/fs.h:1743 [inline]
       new_sync_write fs/read_write.c:457 [inline]
       __vfs_write+0x684/0x970 fs/read_write.c:470
       vfs_write+0x189/0x510 fs/read_write.c:518
       SYSC_write fs/read_write.c:565 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:557
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x4512e9
      RSP: 002b:00007f3bc8184c08 EFLAGS: 00000216 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 00000000004512e9
      RDX: 0000000000000020 RSI: 0000000020fdb000 RDI: 0000000000000006
      RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004b5e76
      R13: 00007f3bc8184b48 R14: 00000000004b5e86 R15: 0000000000000000
      
      Allocated by task 4115:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       save_stack+0x43/0xd0 mm/kasan/kasan.c:447
       set_track mm/kasan/kasan.c:459 [inline]
       kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
       kmem_cache_alloc_node+0x13d/0x750 mm/slab.c:3651
       __alloc_skb+0xf1/0x740 net/core/skbuff.c:219
       alloc_skb include/linux/skbuff.h:903 [inline]
       tipc_tlv_alloc+0x26/0xb0 net/tipc/netlink_compat.c:148
       tipc_nl_compat_dumpit+0xf2/0x3c0 net/tipc/netlink_compat.c:248
       tipc_nl_compat_handle net/tipc/netlink_compat.c:1130 [inline]
       tipc_nl_compat_recv+0x756/0x18f0 net/tipc/netlink_compat.c:1199
       genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:598
       genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:623
       netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2397
       genl_rcv+0x28/0x40 net/netlink/genetlink.c:634
       netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline]
       netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1291
       netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1854
       sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       sock_write_iter+0x31a/0x5d0 net/socket.c:898
       call_write_iter include/linux/fs.h:1743 [inline]
       new_sync_write fs/read_write.c:457 [inline]
       __vfs_write+0x684/0x970 fs/read_write.c:470
       vfs_write+0x189/0x510 fs/read_write.c:518
       SYSC_write fs/read_write.c:565 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:557
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Freed by task 4115:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       save_stack+0x43/0xd0 mm/kasan/kasan.c:447
       set_track mm/kasan/kasan.c:459 [inline]
       kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
       __cache_free mm/slab.c:3503 [inline]
       kmem_cache_free+0x77/0x280 mm/slab.c:3763
       kfree_skbmem+0x1a1/0x1d0 net/core/skbuff.c:622
       __kfree_skb net/core/skbuff.c:682 [inline]
       kfree_skb+0x165/0x4c0 net/core/skbuff.c:699
       tipc_nl_compat_dumpit+0x36a/0x3c0 net/tipc/netlink_compat.c:260
       tipc_nl_compat_handle net/tipc/netlink_compat.c:1130 [inline]
       tipc_nl_compat_recv+0x756/0x18f0 net/tipc/netlink_compat.c:1199
       genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:598
       genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:623
       netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2397
       genl_rcv+0x28/0x40 net/netlink/genetlink.c:634
       netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline]
       netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1291
       netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1854
       sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       sock_write_iter+0x31a/0x5d0 net/socket.c:898
       call_write_iter include/linux/fs.h:1743 [inline]
       new_sync_write fs/read_write.c:457 [inline]
       __vfs_write+0x684/0x970 fs/read_write.c:470
       vfs_write+0x189/0x510 fs/read_write.c:518
       SYSC_write fs/read_write.c:565 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:557
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      The buggy address belongs to the object at ffff8801c6e71dc0
       which belongs to the cache skbuff_head_cache of size 224
      The buggy address is located 208 bytes inside of
       224-byte region [ffff8801c6e71dc0, ffff8801c6e71ea0)
      The buggy address belongs to the page:
      page:ffffea00071b9c40 count:1 mapcount:0 mapping:ffff8801c6e71000 index:0x0
      flags: 0x200000000000100(slab)
      raw: 0200000000000100 ffff8801c6e71000 0000000000000000 000000010000000c
      raw: ffffea0007224a20 ffff8801d98caf48 ffff8801d9e79040 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8801c6e71d80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
       ffff8801c6e71e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff8801c6e71e80: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
                               ^
       ffff8801c6e71f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff8801c6e71f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      ==================================================================
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NDmitry Vyukov  <dvyukov@google.com>
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Cc: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5bfd37b4
  11. 15 8月, 2017 2 次提交
    • J
      tipc: avoid inheriting msg_non_seq flag when message is returned · 59a361bc
      Jon Paul Maloy 提交于
      In the function msg_reverse(), we reverse the header while trying to
      reuse the original buffer whenever possible. Those rejected/returned
      messages are always transmitted as unicast, but the msg_non_seq field
      is not explicitly set to zero as it should be.
      
      We have seen cases where multicast senders set the message type to
      "NOT dest_droppable", meaning that a multicast message shorter than
      one MTU will be returned, e.g., during receive buffer overflow, by
      reusing the original buffer. This has the effect that even the
      'msg_non_seq' field is inadvertently inherited by the rejected message,
      although it is now sent as a unicast message. This again leads the
      receiving unicast link endpoint to steer the packet toward the broadcast
      link receive function, where it is dropped. The affected unicast link is
      thereafter (after 100 failed retransmissions) declared 'stale' and
      reset.
      
      We fix this by unconditionally setting the 'msg_non_seq' flag to zero
      for all rejected/returned messages.
      Reported-by: NCanh Duc Luu <canh.d.luu@dektech.com.au>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      59a361bc
    • J
      tipc: accept PACKET_MULTICAST packets · fed5f571
      Jon Paul Maloy 提交于
      On L2 bearers, the TIPC broadcast function is sending out packets using
      the corresponding L2 broadcast address. At reception, we filter such
      packets under the assumption that they will also be delivered as
      broadcast packets.
      
      This assumption doesn't always hold true. Under high load, we have seen
      that a switch may convert the destination address and deliver the packet
      as a PACKET_MULTICAST, something leading to inadvertently dropped
      packets and a stale and reset broadcast link.
      
      We fix this by extending the reception filtering to accept packets of
      type PACKET_MULTICAST.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fed5f571
  12. 10 8月, 2017 1 次提交
    • J
      tipc: remove premature ESTABLISH FSM event at link synchronization · ed43594a
      Jon Paul Maloy 提交于
      When a link between two nodes come up, both endpoints will initially
      send out a STATE message to the peer, to increase the probability that
      the peer endpoint also is up when the first traffic message arrives.
      Thereafter, if the establishing link is the second link between two
      nodes, this first "traffic" message is a TUNNEL_PROTOCOL/SYNCH message,
      helping the peer to perform initial synchronization between the two
      links.
      
      However, the initial STATE message may be lost, in which case the SYNCH
      message will be the first one arriving at the peer. This should also
      work, as the SYNCH message itself will be used to take up the link
      endpoint before  initializing synchronization.
      
      Unfortunately the code for this case is broken. Currently, the link is
      brought up through a tipc_link_fsm_evt(ESTABLISHED) when a SYNCH
      arrives, whereupon __tipc_node_link_up() is called to distribute the
      link slots and take the link into traffic. But, __tipc_node_link_up() is
      itself starting with a test for whether the link is up, and if true,
      returns without action. Clearly, the tipc_link_fsm_evt(ESTABLISHED) call
      is unnecessary, since tipc_node_link_up() is itself issuing such an
      event, but also harmful, since it inhibits tipc_node_link_up() to
      perform the test of its tasks, and the link endpoint in question hence
      is never taken into traffic.
      
      This problem has been exposed when we set up dual links between pre-
      and post-4.4 kernels, because the former ones don't send out the
      initial STATE message described above.
      
      We fix this by removing the unnecessary event call.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ed43594a
  13. 01 7月, 2017 1 次提交
  14. 11 6月, 2017 1 次提交
  15. 12 5月, 2017 1 次提交
    • J
      tipc: make macro tipc_wait_for_cond() smp safe · 844cf763
      Jon Paul Maloy 提交于
      The macro tipc_wait_for_cond() is embedding the macro sk_wait_event()
      to fulfil its task. The latter, in turn, is evaluating the stated
      condition outside the socket lock context. This is problematic if
      the condition is accessing non-trivial data structures which may be
      altered by incoming interrupts, as is the case with the cong_links()
      linked list, used by socket to keep track of the current set of
      congested links. We sometimes see crashes when this list is accessed
      by a condition function at the same time as a SOCK_WAKEUP interrupt
      is removing an element from the list.
      
      We fix this by expanding selected parts of sk_wait_event() into the
      outer macro, while ensuring that all evaluations of a given condition
      are performed under socket lock protection.
      
      Fixes: commit 365ad353 ("tipc: reduce risk of user starvation during link congestion")
      Reviewed-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      844cf763
  16. 03 5月, 2017 2 次提交
  17. 29 4月, 2017 2 次提交