1. 09 Feb, 2012 3 commits
    • ipv6: Implement IPV6_UNICAST_IF socket option. · c4062dfc
      Erich E. Hoover committed
      The IPV6_UNICAST_IF feature is the IPv6 complement to IP_UNICAST_IF.
      Signed-off-by: Erich E. Hoover <ehoover@mines.edu>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c4062dfc
    • ipv4: Implement IP_UNICAST_IF socket option. · 76e21053
      Erich E. Hoover committed
      The IP_UNICAST_IF feature is needed by the Wine project.  This patch
      implements the feature by setting the outgoing interface in a similar
      fashion to that of IP_MULTICAST_IF.  A separate option is needed to
      handle this feature since the existing options do not provide all of
      the characteristics required by IP_UNICAST_IF; a summary is provided
      below.
      
      SO_BINDTODEVICE:
      * SO_BINDTODEVICE requires administrative privileges; IP_UNICAST_IF
      does not.  From reading some old mailing list articles, my
      understanding is that SO_BINDTODEVICE requires administrative
      privileges because it can override the administrator's routing
      settings.
      * The SO_BINDTODEVICE option restricts both outbound and inbound
      traffic; IP_UNICAST_IF only impacts outbound traffic.
      
      IP_PKTINFO:
      * Since IP_PKTINFO and IP_UNICAST_IF are independent options,
      implementing IP_UNICAST_IF with IP_PKTINFO will likely break some
      applications.
      * Implementing IP_UNICAST_IF on top of IP_PKTINFO significantly
      complicates the Wine codebase and reduces the socket performance
      (doing this requires a lot of extra communication between the
      "server" and "user" layers).
      
      bind():
      * bind() does not work on broadcast packets; IP_UNICAST_IF is
      specifically intended to work with broadcast packets.
      * Like SO_BINDTODEVICE, bind() restricts both outbound and inbound
      traffic.
      Signed-off-by: Erich E. Hoover <ehoover@mines.edu>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      76e21053
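A minimal user-space sketch of how an application might use the option described in this commit. The helper name set_unicast_if() is hypothetical, and the sketch assumes that the option value is the interface index in network byte order; the fallback definition is only for libc headers that do not yet expose the constant.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>

/* Fallback for libc headers that predate the option; 50 is the value
 * assigned in the Linux UAPI header <linux/in.h>. */
#ifndef IP_UNICAST_IF
#define IP_UNICAST_IF 50
#endif

/* Hypothetical helper: ask the kernel to send this IPv4 socket's unicast
 * traffic out of the interface with the given index. Unlike
 * SO_BINDTODEVICE, this is described as affecting outbound traffic only.
 * Returns 0 on success, or -1 with errno set by setsockopt(). */
int set_unicast_if(int fd, unsigned int ifindex)
{
    /* Assumption: the value is passed in network byte order. */
    uint32_t value = htonl(ifindex);

    return setsockopt(fd, IPPROTO_IP, IP_UNICAST_IF, &value, sizeof(value));
}
```

A caller would typically obtain the index with if_nametoindex() and call set_unicast_if(fd, index) before sending; no administrative privileges should be needed, according to the commit message.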
    • gro: more generic L2 header check · 43480aec
      Eric Dumazet committed
      Shlomo Pongratz reported GRO L2 header check was suited for Ethernet
      only, and failed on IB/ipoib traffic.
      
      He provided a patch faking a zeroed header to let GRO aggregate frames.
      
      Roland Dreier, Herbert Xu, and others suggested we change the GRO L2 header
      check to be more generic, i.e. not assuming the L2 header is 14 bytes, but
      taking into account hard_header_len.
      
      __napi_gro_receive() has special handling for the common case (Ethernet)
      to avoid a memcmp() call and use an inline optimized function instead.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Reported-by: Shlomo Pongratz <shlomop@mellanox.com>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: Or Gerlitz <ogerlitz@mellanox.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Tested-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      43480aec
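The idea in this commit can be sketched in user-space C: compare link-layer headers over hard_header_len bytes, with a fixed-size fast path for the 14-byte Ethernet case. This is an illustration only, not the kernel code; the function names and the XOR-based fast path are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_HEADER_LEN 14u /* two 6-byte addresses plus a 2-byte type */

/* Fast path for Ethernet: compare 8 + 4 + 2 bytes with fixed-size loads
 * and XOR, so no variable-length memcmp() call is needed. */
static bool eth_header_differs(const unsigned char *a, const unsigned char *b)
{
    uint64_t a8, b8;
    uint32_t a4, b4;
    uint16_t a2, b2;

    memcpy(&a8, a, 8);
    memcpy(&b8, b, 8);
    memcpy(&a4, a + 8, 4);
    memcpy(&b4, b + 8, 4);
    memcpy(&a2, a + 12, 2);
    memcpy(&b2, b + 12, 2);
    return ((a8 ^ b8) | (a4 ^ b4) | (a2 ^ b2)) != 0;
}

/* Returns true when the two L2 headers differ, in which case the frames
 * must not be aggregated. hard_header_len is the device's header length. */
bool l2_header_differs(const unsigned char *a, const unsigned char *b,
                       unsigned int hard_header_len)
{
    assert(a != NULL && b != NULL);
    if (hard_header_len == ETH_HEADER_LEN)
        return eth_header_differs(a, b);
    /* Generic path for other link layers, such as IPoIB. */
    return memcmp(a, b, hard_header_len) != 0;
}
```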
  2. 08 Feb, 2012 2 commits
    • caif: remove duplicate initialization · af2ce213
      Dan Carpenter committed
      "priv" is initialized twice.  I kept the second one, because it is next
      to the check for NULL.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      af2ce213
    • net/sched: sch_plug - Queue traffic until an explicit release command · c3059be1
      Shriram Rajagopalan committed
      The qdisc supports two operations - plug and unplug. When the
      qdisc receives a plug command via netlink request, packets arriving
      henceforth are buffered until a corresponding unplug command is received.
      Depending on the type of unplug command, the queue can be unplugged
      indefinitely or selectively.
      
      This qdisc can be used to implement output buffering, an essential
      functionality required for consistent recovery in checkpoint based
      fault-tolerance systems. Output buffering enables speculative execution
      by allowing generated network traffic to be rolled back. It is used to
      provide network protection for Xen Guests in the Remus high availability
      project, available as part of Xen.
      
      This module is generic enough to be used by any other system that wishes
      to add speculative execution and output buffering to its applications.
      
      This module was originally available in the linux 2.6.32 PV-OPS tree,
      used as dom0 for Xen.
      
      For more information, please refer to http://nss.cs.ubc.ca/remus/
      and http://wiki.xensource.com/xenwiki/Remus
      
      Changes in V3:
        * Removed debug output (printk) on queue overflow
        * Added TCQ_PLUG_RELEASE_INDEFINITE, which allows the user to
          use this qdisc for simple plug/unplug operations.
        * Use packet counts instead of pointers to keep track of
          the buffers in the queue.
      Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
      Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
      [author of the code in the linux 2.6.32 pvops tree]
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c3059be1
  3. 07 Feb, 2012 16 commits
    • tipc: Minor optimization to rejection of connection-based messages · dff10e9e
      Allan Stephens committed
      Modifies message rejection logic so that TIPC doesn't attempt to
      send a FIN message to the rejecting port if it is known in advance
      that there is no such message because the rejecting port doesn't exist.
      Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      dff10e9e
    • tipc: Eliminate alteration of publication key during name table purging · 3175bd9a
      Allan Stephens committed
      Removes code that alters the publication key of a name table entry
      that is being forcibly purged from TIPC's name table after contact
      with the publishing node has been lost.
      
      Current TIPC ensures that all defunct names are purged before
      re-establishing contact with a failed node.  There used to be a risk
      that the publication might be accidentally deleted because it might be
      re-added to the name table before the purge operation was completed.
      But now there is no longer a need to ensure that the new key is different
      than the old one.
      Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      3175bd9a
    • tipc: Prevent loss of fragmented messages over broadcast link · 63e7f1ac
      Allan Stephens committed
      Modifies broadcast link so that an incoming fragmented message is not
      lost if reassembly cannot begin because there currently is no buffer
      big enough to hold the entire reassembled message. The broadcast link
      now ignores the first fragment completely, which causes the sending node
      to retransmit the first fragment so that reassembly can be re-attempted.
      
      Previously, the sender would have had no reason to retransmit the 1st
      fragment, so we would never have a chance to re-try the allocation.
      
      To do this cleanly without duplication, a new bclink_accept_pkt()
      function is introduced.
      Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      63e7f1ac
    • tipc: Prevent loss of fragmented messages over unicast links · b76b27ca
      Allan Stephens committed
      Modifies unicast link endpoint logic so an incoming fragmented message
      is not lost if reassembly cannot begin because there is no buffer big
      enough to hold the entire reassembled message. The link endpoint now
      ignores the first fragment completely, which causes the sending node to
      retransmit the first fragment so that reassembly can be re-attempted.
      
      Previously, the sender would have had no reason to retransmit the 1st
      fragment, so we would never have a chance to re-try the allocation.
      Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
      b76b27ca
    • tipc: Remove obsolete broadcast tag capability · 1ec2bb08
      Allan Stephens committed
      Eliminates support for the broadcast tag field, which is no longer
      used by broadcast link NACK messages.
      Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      1ec2bb08
    • tipc: Major redesign of broadcast link ACK/NACK algorithms · 7a54d4a9
      Allan Stephens committed
      Completely redesigns broadcast link ACK and NACK mechanisms to prevent
      spurious retransmit requests in dual LAN networks, and to prevent the
      broadcast link from stalling due to the failure of a receiving node to
      acknowledge receiving a broadcast message or request its retransmission.
      
      Note: These changes only impact the timing of when ACK and NACK messages
      are sent, and not the basic broadcast link protocol itself, so
      interoperability with nodes using the "classic" algorithms is maintained.
      
      The revised algorithms are as follows:
      
      1) An explicit ACK message is still sent after receiving 16 in-sequence
      messages, and implicit ACK information continues to be carried in other
      unicast link message headers (including link state messages).  However,
      the timing of explicit ACKs is now based on the receiving node's absolute
      network address rather than its relative network address to ensure that
      the failure of another node does not delay the ACK beyond its 16 message
      target.
      
      2) A NACK message is now typically sent only when a message gap persists
      for two consecutive incoming link state messages; this ensures that a
      suspected gap is not confirmed until both LANs in a dual LAN network have
      had an opportunity to deliver the message, thereby preventing spurious NACKs.
      A NACK message can also be generated by the arrival of a single link state
      message, if the deferred queue is so big that the current message gap
      cannot be the result of "normal" mis-ordering due to the use of dual LANs
      (or one LAN using a bonded interface). Since link state messages typically
      arrive at different nodes at different times the problem of multiple nodes
      issuing identical NACKs simultaneously is inherently avoided.
      
      3) Nodes continue to "peek" at NACK messages sent by other nodes. If
      another node requests retransmission of a message gap suspected (but not
      yet confirmed) by the peeking node, the peeking node forgets about the
      gap and does not generate a duplicate retransmit request. (If the peeking
      node subsequently fails to receive the lost message, later link state
      messages will cause it to rediscover and confirm the gap and send another
      NACK.)
      
      4) Message gap "equality" is now determined by the start of the gap only.
      This is sufficient to deal with the most common cases of message loss,
      and eliminates the need for complex end of gap computations.
      
      5) A peeking node no longer tries to determine whether it should send a
      complementary NACK, since the most common cases of message loss don't
      require it to be sent. Consequently, the node no longer examines the
      "broadcast tag" field of a NACK message when peeking.
      Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      7a54d4a9
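A rough user-space sketch of three of the rules listed in this commit: staggering explicit ACKs by the node's own address (rule 1), confirming a gap only after two consecutive link state messages (rule 2), and forgetting a suspected gap when another node's NACK for the same gap start is seen (rules 3 and 4). All names are hypothetical, and the modulo formula for ACK timing is an assumption about one way such staggering could be done, not a quotation of the TIPC code. The large deferred queue exception in rule 2 is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ACK_INTERVAL 16u /* explicit ACK after 16 in-sequence messages */

/* Rule 1 (assumed formula): offsetting by the node's own address spreads
 * the ACKs of different nodes across different sequence numbers. */
bool explicit_ack_due(uint32_t seqno, uint32_t own_addr)
{
    return ((seqno - own_addr) % ACK_INTERVAL) == 0;
}

struct gap_tracker {
    bool suspected;     /* a gap was seen in the previous state message */
    uint32_t gap_start; /* gaps are compared by their start only (rule 4) */
};

/* Rule 2: called for each incoming link state message. Returns true when
 * the same gap has now been seen twice in a row and a NACK should go out. */
bool gap_seen_in_state_msg(struct gap_tracker *t, bool gap_present,
                           uint32_t gap_start)
{
    assert(t != NULL);
    if (!gap_present) {
        t->suspected = false;
        return false;
    }
    if (t->suspected && t->gap_start == gap_start) {
        t->suspected = false; /* confirmed: send NACK, then start over */
        return true;
    }
    t->suspected = true;
    t->gap_start = gap_start;
    return false;
}

/* Rule 3: another node already requested retransmission of this gap, so
 * forget the suspicion instead of sending a duplicate NACK. */
void peeked_nack(struct gap_tracker *t, uint32_t gap_start)
{
    assert(t != NULL);
    if (t->suspected && t->gap_start == gap_start)
        t->suspected = false;
}
```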
    • tipc: Add missing locks in broadcast link statistics accumulation · b98158e3
      Allan Stephens committed
      Ensures that all attempts to update broadcast link statistics are done
      only while holding the lock that protects the link's main data structures,
      to prevent interference by simultaneous updates caused by messages
      arriving on other interfaces.
      Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      b98158e3
    • tipc: Fix bug in broadcast link duplicate message statistics · 0232c5a5
      Allan Stephens committed
      Modifies broadcast link so that it increments the "received duplicate
      message" count if an incoming message cannot be added to the deferred
      message queue because it is already present in the queue. (This aligns
      broadcast link behavior with that of TIPC's unicast links.)
      Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      0232c5a5
    • tipc: Fix node lock reclamation issues in broadcast link reception · 8a275a6a
      Allan Stephens committed
      Fixes a pair of problems in broadcast link message reception code
      relating to the reclamation of the node lock after consuming an
      in-sequence message.
      
      1) Now retests to see if the sending node is still up after reclaiming
         the node lock, and bails out if it is non-operational.
      
      2) Now manipulates the node's deferred message queue only after
         reclaiming the node lock, rather than using queue head pointer
         information that was cached previously.
      Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      8a275a6a
    • tipc: Add missing broadcast link lock when sending NACK · 57732560
      Allan Stephens committed
      Ensures that any attempt to send a NACK message over TIPC's broadcast
      link has exclusive access to the link's main data structures, to prevent
      interference with a simultaneous attempt to send other broadcast link
      traffic (such as application-generated multicast messages).
      Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      57732560
    • tipc: Fix problem with broadcast link synchronization between nodes · 47361c87
      Allan Stephens committed
      Corrects a problem in which a link endpoint that activates as the
      result of receiving a RESET/STATE sequence of link protocol messages
      fails to properly record the broadcast link status information about
      the node with which it is now communicating. (The problem does
      not occur with the more common RESET/ACTIVATE sequence of messages.)
      The fix ensures that the broadcast link status info is updated after
      the RESET message resets the link endpoint, rather than before, thereby
      preventing new information from being overwritten by the reset operation.
      Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      47361c87
    • tipc: Ensure broadcast link re-acquires node after link failure · 93499313
      Allan Stephens committed
      Fix a bug that can prevent TIPC from sending broadcast messages to a node
      if contact with the node is lost and then regained. The problem occurs if
      the broadcast link first clears the flag indicating the node is part of the
      link's distribution set (when it loses contact with the node), and later
      fails to restore the flag (when contact is regained); restoration fails
      if contact with the node is regained by implicit unicast link activation
      triggered by the arrival of a data message, rather than explicitly by the
      arrival of a link activation message.
      
      The broadcast link now uses separate fields to track whether a node is
      theoretically capable of receiving broadcast messages versus whether it is
      actually part of the link's distribution set. The former member is updated
      by the receipt of link protocol messages, which can occur at any time; the
      latter member is updated only when contact with the node is gained or lost.
      This change also permits the simplification of several conditional
      expressions since the broadcast link's "supported" field can now only be
      set if there are working links to the associated node.
      Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      93499313
    • tipc: Prevent broadcast link stalling in dual LAN environments · 4d75313c
      Allan Stephens committed
      Ensure that sequence number information about incoming broadcast link
      messages is initialized only by the activation of the first link to a
      given cluster node.  Previously, a race condition allowed reset and/or
      activation messages for a second link to re-initialize this sequence
      number information with obsolete values. This could trigger TIPC to
      request the retransmission of previously acknowledged broadcast link
      messages from that node, resulting in broadcast link processing becoming
      stalled if the node had already released one or more of those messages
      and was unable to perform the required retransmission.
      
      Thanks to Laser <gotolaser@gmail.com> for identifying this problem
      and assisting in the development of this fix.
      Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      4d75313c
    • tipc: Prevent transmission of outdated link protocol messages · 92d2c905
      Allan Stephens committed
      Ensures that a link endpoint discards any previously deferred link
      protocol message whenever it attempts to send a new one.
      
      Previously, it was possible for a link protocol message that was unsent
      due to congestion to be transmitted after newer protocol messages had
      been sent. The stale link protocol message might then cause the receiving
      link endpoint to malfunction because of its outdated content.
      
      Thanks to Osamu Kaminuma [okaminum@avaya.com] for diagnosing the problem
      and contributing a prototype patch.
      Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      92d2c905
    • tipc: improve the link deferred queue insertion algorithm · 8809b255
      Allan Stephens committed
      Re-code the algorithm for inserting an out-of-sequence message into
      a unicast or broadcast link's deferred message queue.  It remains
      functionally equivalent but should be easier to understand/maintain.
      Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      8809b255
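For readers unfamiliar with this kind of queue, here is a minimal sketch of inserting an out-of-sequence message into a list kept sorted by sequence number, rejecting duplicates. It is not the TIPC implementation: the struct, the function names, and the assumption of 16-bit wrapping sequence numbers are all illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical deferred message: only the fields the sketch needs. */
struct deferred_msg {
    uint16_t seqno;
    struct deferred_msg *next;
};

/* Wrap-aware "left precedes right" for 16-bit sequence numbers. */
static bool seq_less(uint16_t left, uint16_t right)
{
    return left != right && (uint16_t)(right - left) < 0x8000u;
}

/* Inserts msg so the list stays sorted by sequence number. Returns false
 * (and leaves the list untouched) if that sequence number is already
 * queued, so the caller can count a duplicate and discard the message. */
bool deferred_queue_insert(struct deferred_msg **head, struct deferred_msg *msg)
{
    struct deferred_msg **link = head;

    assert(head != NULL && msg != NULL);
    /* Walk to the first entry that does not precede the new message. */
    while (*link != NULL && seq_less((*link)->seqno, msg->seqno))
        link = &(*link)->next;
    if (*link != NULL && (*link)->seqno == msg->seqno)
        return false;
    msg->next = *link;
    *link = msg;
    return true;
}
```

Walking a pointer-to-pointer avoids separate cases for an empty list, insertion at the head, and insertion at the tail.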
    • net: Make qdisc_skb_cb upper size bound explicit. · a0417fa3
      David S. Miller committed
      Just like skb->cb[], so that qdisc_skb_cb can be encapsulated inside
      of other data structures.
      
      This is intended to be used by IPoIB so that it can remember
      addressing information stored at hard_header_ops->create() time that
      it can fetch when the packet gets to the transmit routine.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a0417fa3
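The pattern this commit describes, a control block with an explicit size bound that other structures can embed, can be sketched with compile-time assertions. The struct names below are hypothetical models, the 24-byte data area is an illustrative value, and the 20-byte address field stands in for the IPoIB addressing information mentioned above; only the 48-byte size of skb->cb[] is taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define SKB_CB_SIZE 48 /* size of the skb->cb[] control buffer */

/* Model of a qdisc control block with an explicit private data area. */
struct qdisc_cb_model {
    unsigned int pkt_len;
    unsigned char data[24]; /* illustrative upper bound for qdisc use */
};

/* Because the bound is explicit, another layer can embed the qdisc block
 * and place its own state after it. */
struct ipoib_cb_model {
    struct qdisc_cb_model qdisc_cb;
    unsigned char hwaddr[20]; /* saved when the hard header is created */
};

/* Hypothetical per-qdisc private state that must fit in data[]. */
struct sched_private_model {
    unsigned long long time_to_send;
};

/* Both checks fail at compile time if a structure outgrows its slot. */
static_assert(sizeof(struct ipoib_cb_model) <= SKB_CB_SIZE,
              "embedding structure must fit in skb->cb[]");
static_assert(sizeof(struct sched_private_model) <=
              sizeof(((struct qdisc_cb_model *)0)->data),
              "qdisc private state must fit in the data[] area");
```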
  4. 06 Feb, 2012 2 commits
  5. 05 Feb, 2012 3 commits
  6. 03 Feb, 2012 3 commits
  7. 02 Feb, 2012 10 commits
  8. 01 Feb, 2012 1 commit
    • xfrm6: remove unneeded NULL check in __xfrm6_output() · 5b11b2e4
      Dan Carpenter committed
      We don't check for NULL consistently in __xfrm6_output().  If "x" were
      NULL here it would lead to an Oops later.  I asked Steffen Klassert
      about this and he suggested that we remove the NULL check.
      
      On 10/29/11, Steffen Klassert <steffen.klassert@secunet.com> wrote:
      >> net/ipv6/xfrm6_output.c
      >>    148
      >>    149		if ((x && x->props.mode == XFRM_MODE_TUNNEL) &&
      >>                           ^
      >
      > x can't be null here. It would be a bug if __xfrm6_output() is called
      > without a xfrm_state attached to the skb. I think we can just remove
      > this null check.
      
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5b11b2e4