1. 28 6月, 2014 7 次提交
    • J
      tipc: introduce message evaluation function · 5a379074
      Jon Paul Maloy 提交于
      When a message arrives in a node and finds no destination
      socket, we may need to drop it, reject it, or forward it after
      a secondary destination lookup. The latter two cases currently
      results in a code path that is perceived as complex, because it
      follows a deep call chain via obscure functions such as
      net_route_named_msg() and net_route_msg().
      
      We now introduce a function, tipc_msg_eval(), that takes the
      decision about whether such a message should be rejected or
      forwarded, but leaves it to the caller to actually perform
      the indicated action.
      
      If the decision is 'reject', it is still the task of the recently
      introduced function tipc_msg_reverse() to take the final decision
      about whether the message is rejectable or not. In the latter case
      it drops the message.
      
      As a result of this change, we can finally eliminate the function
      net_route_named_msg(), and hence become independent of net_route_msg().
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5a379074
    • J
      tipc: separate building and sending of rejected messages · 8db1bae3
      Jon Paul Maloy 提交于
      The way we build and send rejected message is currenty perceived as
      hard to follow, partly because we let the transmission go via deep
      call chains through functions such as tipc_reject_msg() and
      net_route_msg().
      
      We want to remove those functions, and make the call sequences shallower
      and simpler. For this purpose, we separate building and sending of
      rejected messages. We build the reject message using the new function
      tipc_msg_reverse(), and let the transmission go via the newly introduced
      tipc_link_xmit2() function, as all transmission eventually will do. We
      also ensure that all calls to tipc_link_xmit2() are made outside
      port_lock/bh_lock_sock.
      
      Finally, we replace all calls to tipc_reject_msg() with the two new
      calls at all locations in the code that we want to keep. The remaining
      calls are made from code that we are planning to remove, along with
      tipc_reject_msg() itself.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8db1bae3
    • J
      tipc: introduce direct iovec to buffer chain fragmentation function · 067608e9
      Jon Paul Maloy 提交于
      Fragmentation at message sending is currently performed in two
      places in link.c, depending on whether data to be transmitted
      is delivered in the form of an iovec or as a big sk_buff. Those
      functions are also tightly entangled with the send functions
      that are using them.
      
      We now introduce a re-entrant, standalone function, tipc_msg_build2(),
      that builds a packet chain directly from an iovec. Each fragment is
      sized according to the MTU value given by the caller, and is prepended
      with a correctly built fragment header, when needed. The function is
      independent from who is calling and where the chain will be delivered,
      as long as the caller is able to indicate a correct MTU.
      
      The function is tested, but not called by anybody yet. Since it is
      incompatible with the existing tipc_msg_build(), and we cannot yet
      remove that function, we have given it a temporary name.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      067608e9
    • J
      tipc: make link mtu easily accessible from socket · 16e166b8
      Jon Paul Maloy 提交于
      Message fragmentation is currently performed at link level, inside
      the protection of node_lock. This potentially binds up the sending
      link structure for a long time, instead of letting it do other tasks,
      such as handle reception of new packets.
      
      In this commit, we make the MTUs of each active link become easily
      accessible from the socket level, i.e., without taking any spinlock
      or dereferencing the target link pointer. This way, we make it possible
      to perform fragmentation in the sending socket, before sending the
      whole fragment chain to the link for transport.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      16e166b8
    • J
      tipc: introduce send functions for chained buffers in link · 4f1688b2
      Jon Paul Maloy 提交于
      The current link implementation provides several different transmit
      functions, depending on the characteristics of the message to be
      sent: if it is an iovec or an sk_buff, if it needs fragmentation or
      not, if the caller holds the node_lock or not. The permutation of
      these options gives us an unwanted amount of unnecessarily complex
      code.
      
      As a first step towards simplifying the send path for all messages,
      we introduce two new send functions at link level, tipc_link_xmit2()
      and __tipc_link_xmit2(). The former looks up a link to the message
      destination, and if one is found, it grabs the node lock and calls
      the second function, which works exclusively inside the node lock
      protection. If no link is found, and the destination is on the same
      node, it delivers the message directly to the local destination
      socket.
      
      The new functions take a buffer chain where all packet headers are
      already prepared, and the correct MTU has been used. These two
      functions will later replace all other link-level transmit functions.
      
      The functions are not backwards compatible, so we have added them
      as new functions with temporary names. They are tested, but have no
      users yet. Those will be added later in this series.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4f1688b2
    • J
      tipc: use negative error return values in functions · e4de5fab
      Jon Paul Maloy 提交于
      In some places, TIPC functions returns positive integers as return
      codes. This goes against standard Linux coding practice, and may
      even cause problems in some cases.
      
      We now change the return values of the functions filter_rcv()
      and filter_connect() to become signed integers, and return
      negative error codes when needed. The codes we use in these
      particular cases are still TIPC specific, since they are both
      part of the TIPC API and have no correspondence in errno.h
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e4de5fab
    • J
      tipc: eliminate case of writing to freed memory · 3d09fc42
      Jon Paul Maloy 提交于
      In the function tipc_nodesub_notify() we call a function pointer
      aggregated into the object to be notified, whereafter we set
      the function pointer to NULL. However, in some cases the function
      pointed to will free the struct containing the function pointer,
      resulting in a write to already freed memory.
      
      This bug seems to always have been there, without causing any
      notable harm.
      
      In this commit we fix the problem by inverting the order of the
      zeroing and the function call.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3d09fc42
  2. 12 6月, 2014 2 次提交
    • O
      net: add __pskb_copy_fclone and pskb_copy_for_clone · bad93e9d
      Octavian Purdila 提交于
      There are several instances where a pskb_copy or __pskb_copy is
      immediately followed by an skb_clone.
      
      Add a couple of new functions to allow the copy skb to be allocated
      from the fclone cache and thus speed up subsequent skb_clone calls.
      
      Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
      Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
      Cc: Marek Lindner <mareklindner@neomailbox.ch>
      Cc: Simon Wunderlich <sw@simonwunderlich.de>
      Cc: Antonio Quartulli <antonio@meshcoding.com>
      Cc: Marcel Holtmann <marcel@holtmann.org>
      Cc: Gustavo Padovan <gustavo@padovan.org>
      Cc: Johan Hedberg <johan.hedberg@gmail.com>
      Cc: Arvid Brodin <arvid.brodin@alten.se>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org>
      Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org>
      Cc: Samuel Ortiz <sameo@linux.intel.com>
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Cc: Allan Stephens <allan.stephens@windriver.com>
      Cc: Andrew Hendry <andrew.hendry@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Reviewed-by: NChristoph Paasch <christoph.paasch@uclouvain.be>
      Signed-off-by: NOctavian Purdila <octavian.purdila@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bad93e9d
    • J
      tipc: fix potential bug in function tipc_backlog_rcv · 02c00c2a
      Jon Paul Maloy 提交于
      In commit 4f4482dc ("tipc: compensate
      for double accounting in socket rcv buffer") we access 'truesize' of
      a received buffer after it might have been released by the function
      filter_rcv().
      
      In this commit we correct this by reading the value of 'truesize' to
      the stack before delivering the buffer to filter_rcv().
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      02c00c2a
  3. 25 5月, 2014 1 次提交
  4. 15 5月, 2014 8 次提交
    • J
      tipc: merge port message reception into socket reception function · 9816f061
      Jon Paul Maloy 提交于
      In order to reduce complexity and save a call level during message
      reception at port/socket level, we remove the function tipc_port_rcv()
      and merge its functionality into tipc_sk_rcv().
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9816f061
    • J
      tipc: clean up neigbor discovery message reception · c82910e2
      Jon Paul Maloy 提交于
      The function tipc_disc_rcv(), which is handling received neighbor
      discovery messages, is perceived as messy, and it is hard to verify
      its correctness by code inspection. The fact that the task it is set
      to resolve is fairly complex does not make the situation better.
      
      In this commit we try to take a more systematic approach to the
      problem. We define a decision machine which takes three state flags
       as input, and produces three action flags as output. We then walk
      through all permutations of the state flags, and for each of them we
      describe verbally what is going on, plus that we set zero or more of
      the action flags. The action flags indicate what should be done once
      the decision machine has finished its job, while the last part of the
      function deals with performing those actions.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c82910e2
    • J
      tipc: improve and extend media address conversion functions · 38504c28
      Jon Paul Maloy 提交于
      TIPC currently handles two media specific addresses: Ethernet MAC
      addresses and InfiniBand addresses. Those are kept in three different
      formats:
      
      1) A "raw" format as obtained from the device. This format is known
         only by the media specific adapter code in eth_media.c and
         ib_media.c.
      2) A "generic" internal format, in the form of struct tipc_media_addr,
         which can be referenced and passed around by the generic media-
         unaware code.
      3) A serialized version of the latter, to be conveyed in neighbor
         discovery messages.
      
      Conversion between the three formats can only be done by the media
      specific code, so we have function pointers for this purpose in
      struct tipc_media. Here, the media adapters can install their own
      conversion functions at startup.
      
      We now introduce a new such function, 'raw2addr()', whose purpose
      is to convert from format 1 to format 2 above. We also try to as far
      as possible uniform commenting, variable names and usage of these
      functions, with the purpose of making them more comprehensible.
      
      We can now also remove the function tipc_l2_media_addr_set(), whose
      job is done better by the new function.
      
      Finally, we expand the field for serialized addresses (format 3)
      in discovery messages from 20 to 32 bytes. This is permitted
      according to the spec, and reduces the risk of problems when we
      add new media in the future.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      38504c28
    • J
      tipc: rename and move message reassembly function · 37e22164
      Jon Paul Maloy 提交于
      The function tipc_link_frag_rcv() is in reality a re-entrant generic
      message reassemby function that has nothing in particular to do with
      the link, where it is defined now. This becomes obvious when we see
      the need to call the function from other places in the code.
      
      In this commit rename it to tipc_buf_append() and move it to the file
      msg.c. We also simplify its signature by moving the tail pointer to
      the control block of the head buffer, hence making the head buffer
      self-contained.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      37e22164
    • J
      tipc: mark head of reassembly buffer as non-linear · 5074ab89
      Jon Paul Maloy 提交于
      The message reassembly function does not update the 'len' and 'data_len'
      fields of the head skbuff correctly when fragments are chained to it.
      This may sometimes lead to obsure errors, such as fragment reordering
      when we receive fragments which are cloned buffers.
      
      This commit fixes this, by ensuring that the two fields are updated
      correctly.
      Suggested-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5074ab89
    • J
      tipc: don't record link RESET or ACTIVATE messages as traffic · ec37dcd3
      Jon Paul Maloy 提交于
      In the current code, all incoming LINK_PROTOCOL messages, irrespective
      of type, nudge the "last message received" checkpoint, informing the
      link state machine that a message was received from the peer since last
      supervision timeout event. This inhibits the link from starting probing
      the peer unnecessarily.
      
      However, not only STATE messages are recorded as legitimate incoming
      traffic this way, but even RESET and ACTIVATE messages, which in
      reality are there to inform the link that the peer endpoint has been
      reset. At the same time, some RESET messages may be dropped instead
      of causing a link reset. This happens when the link endpoint thinks
      it is fully up and working, and the session number of the RESET is
      lower than or equal to the current link session. In such cases the
      RESET is perceived as a delayed remnant from an earlier session, or
      the current one, and dropped.
      
      Now, if a TIPC module is removed and then immediately reinserted, e.g.
      when using a script, RESET messages may arrive at the peer link endpoint
      before this one has had time to discover the failure. The RESET may be
      dropped because of the session number, but only after it has been
      recorded as a legitimate traffic event. Hence, the receiving link will
      not start probing, and not discover that the peer endpoint is down, at
      the same time ignoring the periodic RESET messages coming from that
      endpoint. We have ended up in a stale state where a failed link cannot
      be re-established.
      
      In this commit, we remedy this by nudging the checkpoint only for
      received STATE messages, not for RESET or ACTIVATE messages.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ec37dcd3
    • J
      tipc: compensate for double accounting in socket rcv buffer · 4f4482dc
      Jon Paul Maloy 提交于
      The function net/core/sock.c::__release_sock() runs a tight loop
      to move buffers from the socket backlog queue to the receive queue.
      
      As a security measure, sk_backlog.len of the receiving socket
      is not set to zero until after the loop is finished, i.e., until
      the whole backlog queue has been transferred to the receive queue.
      During this transfer, the data that has already been moved is counted
      both in the backlog queue and the receive queue, hence giving an
      incorrect picture of the available queue space for new arriving buffers.
      
      This leads to unnecessary rejection of buffers by sk_add_backlog(),
      which in TIPC leads to unnecessarily broken connections.
      
      In this commit, we compensate for this double accounting by adding
      a counter that keeps track of it. The function socket.c::backlog_rcv()
      receives buffers one by one from __release_sock(), and adds them to the
      socket receive queue. If the transfer is successful, it increases a new
      atomic counter 'tipc_sock::dupl_rcvcnt' with 'truesize' of the
      transferred buffer. If a new buffer arrives during this transfer and
      finds the socket busy (owned), we attempt to add it to the backlog.
      However, when sk_add_backlog() is called, we adjust the 'limit'
      parameter with the value of the new counter, so that the risk of
      inadvertent rejection is eliminated.
      
      It should be noted that this change does not invalidate the original
      purpose of zeroing 'sk_backlog.len' after the full transfer. We set an
      upper limit for dupl_rcvcnt, so that if a 'wild' sender (i.e., one that
      doesn't respect the send window) keeps pumping in buffers to
      sk_add_backlog(), he will eventually reach an upper limit,
      (2 x TIPC_CONN_OVERLOAD_LIMIT). After that, no messages can be added
      to the backlog, and the connection will be broken. Ordinary, well-
      behaved senders will never reach this buffer limit at all.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4f4482dc
    • J
      tipc: decrease connection flow control window · 6163a194
      Jon Paul Maloy 提交于
      Memory overhead when allocating big buffers for data transfer may
      be quite significant. E.g., truesize of a 64 KB buffer turns out
      to be 132 KB, 2 x the requested size.
      
      This invalidates the "worst case" calculation we have been
      using to determine the default socket receive buffer limit,
      which is based on the assumption that 1024x64KB = 67MB buffers
      may be queued up on a socket.
      
      Since TIPC connections cannot survive hitting the buffer limit,
      we have to compensate for this overhead.
      
      We do that in this commit by dividing the fix connection flow
      control window from 1024 (2*512) messages to 512 (2*256). Since
      older version nodes send out acks at 512 message intervals,
      compatibility with such nodes is guaranteed, although performance
      may be non-optimal in such cases.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6163a194
  5. 09 5月, 2014 2 次提交
  6. 06 5月, 2014 10 次提交
  7. 01 5月, 2014 1 次提交
  8. 29 4月, 2014 2 次提交
  9. 28 4月, 2014 1 次提交
  10. 27 4月, 2014 2 次提交
  11. 25 4月, 2014 1 次提交
  12. 23 4月, 2014 3 次提交