1. 16 Dec 2008, 3 commits
    • net: Add Generic Receive Offload infrastructure · d565b0a1
      Authored by Herbert Xu
      This patch adds the top-level GRO (Generic Receive Offload) infrastructure.
      This is pretty similar to LRO except that this is protocol-independent.
      Instead of holding packets in an lro_mgr structure, they're now held in
      napi_struct.
      
      Drivers that intend to use this can set the NETIF_F_GRO bit and call
      napi_gro_receive instead of netif_receive_skb, or just call netif_rx;
      the latter will call netif_receive_skb automatically.  When napi_gro_receive
      is used, the driver must either call napi_complete/napi_rx_complete, or
      call napi_gro_flush from softirq context if it uses the primitives
      __napi_complete/__napi_rx_complete.
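
      As a concrete illustration, here is a minimal sketch of a driver NAPI poll
      routine under this scheme; the my_drv_* names are hypothetical, not part
      of the patch:

      	/* Hypothetical driver poll routine feeding packets to GRO. */
      	static int my_drv_poll(struct napi_struct *napi, int budget)
      	{
      		struct sk_buff *skb;
      		int work = 0;

      		while (work < budget && (skb = my_drv_next_rx_skb(napi))) {
      			napi_gro_receive(napi, skb);	/* instead of netif_receive_skb */
      			work++;
      		}
      		if (work < budget)
      			napi_complete(napi);	/* also flushes held GRO packets */
      		return work;
      	}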
      
      Protocols will set the gro_receive and gro_complete function pointers in
      order to participate in this scheme.
      
      In addition to the packet, gro_receive will get a list of currently held
      packets.  Each packet in the list has a same_flow field which is non-zero
      if it is a potential match for the new packet.  Each potentially matching
      packet also has a flush field which is non-zero if the held packet must
      not be merged with the new packet.
      
      Once gro_receive has determined that the new skb matches a held packet,
      the held packet may be processed immediately if the new skb cannot be
      merged with it.  In this case gro_receive should return the pointer to
      the existing skb in gro_list.  Otherwise the new skb should be merged into
      the existing packet and NULL should be returned, unless the new skb makes
      it impossible for any further merges to be made (e.g., FIN packet) where
      the merged skb should be returned.
      
      Whenever the skb is merged into an existing entry, the gro_receive
      function should set NAPI_GRO_CB(skb)->same_flow.  Note that if an skb
      merely matches an existing entry but can't be merged with it, then
      this shouldn't be set.
      
      If gro_receive finds it pointless to hold the new skb for future merging,
      it should set NAPI_GRO_CB(skb)->flush.
      
      Held packets will be flushed by napi_gro_flush which is called by
      napi_complete and napi_rx_complete.
      
      Currently held packets are stored in a singly linked list, just like LRO.
      The list is limited to a maximum of 8 entries.  In the future, this may be
      expanded to use a hash table to allow more flows to be held for merging.
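
      Putting the rules above together, a hedged skeleton of a protocol
      gro_receive handler might look as follows; the myproto_* helpers are
      illustrative stand-ins for the protocol's own matching/merging logic:

      	static struct sk_buff **myproto_gro_receive(struct sk_buff **head,
      						    struct sk_buff *skb)
      	{
      		struct sk_buff **pp = NULL;
      		struct sk_buff *p;

      		for (; (p = *head); head = &p->next) {
      			if (!NAPI_GRO_CB(p)->same_flow)
      				continue;
      			if (!myproto_same_flow(p, skb)) {
      				/* Matched lower layers, but not this one. */
      				NAPI_GRO_CB(p)->same_flow = 0;
      				continue;
      			}
      			if (NAPI_GRO_CB(p)->flush || !myproto_merge(p, skb)) {
      				/* Cannot merge: process the held packet now. */
      				pp = head;
      			} else {
      				/* Merged into p; returning NULL keeps it held. */
      				NAPI_GRO_CB(skb)->same_flow = 1;
      				if (myproto_blocks_merging(skb))	/* e.g. FIN */
      					pp = head;	/* flush the merged skb */
      			}
      			break;
      		}

      		if (myproto_never_mergeable(skb))
      			NAPI_GRO_CB(skb)->flush = 1;
      		return pp;
      	}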
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Add frag_list support to GSO · 1a881f27
      Authored by Herbert Xu
      This patch allows GSO to handle frag_list in a limited way for the
      purposes of allowing packets merged by GRO to be refragmented on
      output.
      
      Most hardware won't (and isn't expected to) support handling GRO
      frag_list packets directly.  Therefore we will perform GSO in
      software for those cases.
      
      However, for drivers that can support it (such as virtual NICs) we
      may not have to segment the packets at all.
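
      A hedged sketch of the resulting decision, not the literal patch: a packet
      merged by GRO carries its original segments on skb_shinfo(skb)->frag_list,
      and only a device advertising NETIF_F_FRAGLIST may transmit it as-is:

      	/* Illustrative helper; the patch itself extends software GSO
      	 * (skb_segment) to split such packets back apart. */
      	static bool needs_sw_gso_for_fraglist(const struct sk_buff *skb,
      					      const struct net_device *dev)
      	{
      		return skb_shinfo(skb)->frag_list != NULL &&
      		       !(dev->features & NETIF_F_FRAGLIST);
      	}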
      
      Whether the added overhead of GRO/GSO is worthwhile for bridges
      and routers when weighed against the benefit of potentially
      increasing the MTU within the host is still an open question.
      However, for the case of host nodes this is undoubtedly a win.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Define smp_call_function_many for UP · d2ff9118
      Authored by Rusty Russell
      Otherwise those using it in transition patches (e.g. kvm) can't compile
      with CONFIG_SMP=n:
      
      arch/x86/kvm/../../../virt/kvm/kvm_main.c: In function 'make_all_cpus_request':
      arch/x86/kvm/../../../virt/kvm/kvm_main.c:380: error: implicit declaration of function 'smp_call_function_many'
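
      The fix amounts to a UP stub in include/linux/smp.h.  A sketch, assuming
      it mirrors the existing smp_call_function/smp_call_function_mask UP
      fallbacks built on the up_smp_call_function() helper:

      	/* UP: there are no other CPUs to call, so this is a no-op,
      	 * just like smp_call_function() with CONFIG_SMP=n. */
      	#define smp_call_function_many(mask, func, info, wait) \
      				(up_smp_call_function(func, info))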
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 13 Dec 2008, 6 commits
  3. 11 Dec 2008, 15 commits
  4. 10 Dec 2008, 1 commit
    • netpoll: fix race on poll_list resulting in garbage entry · 7b363e44
      Authored by Neil Horman
      A few months back, a race was discussed between the netpoll napi service
      path and the fast path through net_rx_action:
      http://kerneltrap.org/mailarchive/linux-netdev/2007/10/16/345470
      
      A patch was submitted for that bug, but I think we missed a case.
      
      Consider the following scenario:
      
      INITIAL STATE
      CPU0 has one napi_struct A on its poll_list
      CPU1 is calling netpoll_send_skb and needs to call poll_napi on the same
      napi_struct A that CPU0 has on its list

      CPU0						CPU1
      net_rx_action					poll_napi
      !list_empty (returns true)			locks poll_lock for A
      						 poll_one_napi
      						  napi->poll
      						   netif_rx_complete
      						    __napi_complete
      						    (removes A from poll_list)
      list_entry(list->next)
      
      
      In the above scenario, net_rx_action assumes that the per-cpu poll_list is
      exclusive to that cpu.  netpoll of course violates that, and because the
      netpoll path can dequeue from the poll list, it's possible for CPU0 to detect
      a non-empty list at the top of the while loop in net_rx_action, but have it
      become empty by the time it calls list_entry.  Since the poll_list isn't
      surrounded by any other structure, the data returned by that list_entry call
      in this situation is garbage, and any number of crashes can result depending
      on what exactly that garbage is.
      
      Given that it's not feasible for performance reasons to place exclusive locks
      around each cpu's poll list to provide that mutual exclusion, I think the best
      solution is to modify the netpoll path in such a way that we continue to
      guarantee that the poll_list for a cpu is in fact exclusive to that cpu.  To
      do this I've implemented the patch below.  It adds an additional bit to the
      state field in the napi_struct.  When executing napi->poll from the netpoll
      path, this bit will be set.  When a driver calls netif_rx_complete, if that
      bit is set, it will not remove the napi_struct from the poll_list.  That work
      will be saved for the next iteration of net_rx_action.
      
      I've tested this and it seems to work well.  About the biggest drawback I can
      see is that it might result in an extra loop through net_rx_action in the
      event that the device is actually contended for (i.e. the netpoll path
      performs all the needed work on the device, and the call to net_rx_action
      winds up doing nothing except removing the napi_struct from the poll_list).
      However, I think this is a small price to pay, given that the alternative is
      a crash.
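
      A hedged sketch of the mechanism (the bit is NAPI_STATE_NPSVC; the code
      around it is condensed here, not the literal diff).  Netpoll marks the
      napi_struct while servicing it:

      	set_bit(NAPI_STATE_NPSVC, &napi->state);	/* "netpoll service" bit */
      	work = napi->poll(napi, budget);
      	clear_bit(NAPI_STATE_NPSVC, &napi->state);

      and the completion path skips the dequeue when the bit is set, leaving
      the entry on poll_list for the owning CPU's next net_rx_action pass:

      	if (!test_bit(NAPI_STATE_NPSVC, &napi->state)) {
      		list_del(&napi->poll_list);
      		clear_bit(NAPI_STATE_SCHED, &napi->state);
      	}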
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 09 Dec 2008, 3 commits
  6. 08 Dec 2008, 4 commits
    • dccp ccid-2: Phase out the use of boolean Ack Vector sysctl · 6fdd34d4
      Authored by Gerrit Renker
      This removes the use of the sysctl and the minisock variable for the Send Ack
      Vector feature, as it is now handled fully dynamically via feature negotiation
      (i.e. when CCID-2 is enabled, Ack Vectors are automatically enabled, as per
      RFC 4341, sec. 4).
      
      Using a sysctl in parallel to this implementation would open the door to
      crashes, since much of the code relies on tests of the boolean minisock /
      sysctl variable. Thus, this patch replaces all tests of type
      
      	if (dccp_msk(sk)->dccpms_send_ack_vector)
      		/* ... */
      with
      	if (dp->dccps_hc_rx_ackvec != NULL)
      		/* ... */
      
      The dccps_hc_rx_ackvec is allocated by dccp_hdlr_ackvec() when feature
      negotiation has concluded that Ack Vectors are to be used on the
      half-connection.  Otherwise it is NULL (due to
      dccp_init_sock/dccp_create_openreq_child), so the test is a valid one.
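
      For reference, a sketch of what such an activation handler plausibly
      looks like (close in shape to, though not necessarily identical with,
      the actual code):

      	static int dccp_hdlr_ackvec(struct sock *sk, u64 enable, bool rx)
      	{
      		struct dccp_sock *dp = dccp_sk(sk);

      		if (!rx)	/* Ack Vectors concern the RX half-connection */
      			return 0;
      		if (enable && dp->dccps_hc_rx_ackvec == NULL) {
      			dp->dccps_hc_rx_ackvec = dccp_ackvec_alloc(gfp_any());
      			if (dp->dccps_hc_rx_ackvec == NULL)
      				return -ENOMEM;
      		} else if (!enable) {
      			dccp_ackvec_free(dp->dccps_hc_rx_ackvec);
      			dp->dccps_hc_rx_ackvec = NULL;
      		}
      		return 0;
      	}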
      
      The activation handler for Ack Vectors is called as soon as the feature
      negotiation has concluded at the
       * server when the Ack marking the transition RESPOND => OPEN arrives;
       * client after it has sent its ACK, marking the transition REQUEST => PARTOPEN.
      
      Adding the sequence number of the Response packet to the Ack Vector has been
      removed, since
       (a) connection establishment implies that the Response has been received;
       (b) the CCIDs only look at packets received in the (PART)OPEN state, i.e.
           this entry will always be ignored;
       (c) it cannot be used for anything useful - to detect loss, for instance,
           only packets received after the loss can serve as pseudo-dupacks.
      
      There was a FIXME to change the error code when dccp_ackvec_add() fails.
      I removed this after finding out that:
       * the check whether ackno < ISN is already made earlier,
       * this Response is likely the 1st packet with an Ackno that the client gets,
       * so when dccp_ackvec_add() fails, the reason is likely not a packet error.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dccp: Remove manual influence on NDP Count feature · 4098dce5
      Authored by Gerrit Renker
      Updating the NDP count feature is handled automatically now:
       * for CCID-2 it is disabled, since the code does not use NDP counts;
       * for CCID-3 it is enabled, as NDP counts are used to determine loss lengths.
      
      Allowing the user to change NDP values leads to unpredictable and failing
      behaviour, since it is then possible to disable NDP counts even when they
      are needed (e.g. in CCID-3).
      
      This means that the only sensible user settings are those that agree with
      the values for Send NDP Count implied by the choice of CCID. But those
      settings are already activated by the feature negotiation (CCID dependency
      tracking), hence this form of support is redundant.
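
      To illustrate the dependency tracking, a hypothetical entry in a CCID
      dependency table; the layout and values here are illustrative only, not
      quoted from the tree:

      	static const struct ccid_dependency ccid3_deps[] = {
      		{
      			.dependent_feat	= DCCPF_SEND_NDP_COUNT,
      			.is_local	= true,
      			.is_mandatory	= false,
      			.val		= 1,	/* CCID-3 uses NDP counts */
      		},
      	};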
      
      At startup the initialisation of the NDP count feature uses the default
      value of 0, which is done implicitly by the zeroing-out of the socket when
      it is allocated. If the choice of CCID or feature negotiation enables NDP
      count, this will then be updated via the NDP activation handler.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dccp: Remove obsolete parts of the old CCID interface · 0049bab5
      Authored by Gerrit Renker
      The TX/RX CCIDs of the minisock are now redundant: as in the Ack Vector
      case, their value initially equals that of the sysctl, but may be
      something different by the end of feature negotiation.
      
      The old interface removed by this patch has thus been replaced by the
      newer interface for dynamically querying the currently loaded CCIDs.
      
      Also removed are the constructors for the TX CCID and the RX CCID, since the
      switch "rx <-> non-rx" is done by the handler in minisocks.c (and the handler
      is the only place in the code where CCIDs are loaded).
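
      From user space, the CCIDs actually in use can then be queried
      dynamically; a sketch, assuming the DCCP_SOCKOPT_TX_CCID/RX_CCID
      options introduced in this patch series:

      	uint8_t tx_ccid, rx_ccid;
      	socklen_t len = sizeof(tx_ccid);

      	if (getsockopt(fd, SOL_DCCP, DCCP_SOCKOPT_TX_CCID, &tx_ccid, &len) == 0 &&
      	    getsockopt(fd, SOL_DCCP, DCCP_SOCKOPT_RX_CCID, &rx_ccid, &len) == 0)
      		printf("negotiated CCIDs: TX %u, RX %u\n", tx_ccid, rx_ccid);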
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdevice: Kill netdev->priv · b74ca3a8
      Authored by Wang Chen
      This is the last shot of this series.
      Having removed all direct references to netdev->priv, this patch kills
      "priv" in "struct net_device" and fixes the related comments/docs.
      
      Nobody is allowed to reference netdev->priv directly any more.
      If you want to reference the memory of the private data, use netdev_priv()
      instead.
      If the private data is not allocated by alloc_netdev(), use
      netdev->ml_priv to point to that memory after you create the private
      data.
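
      A minimal sketch of the sanctioned pattern (my_priv and my_setup are
      hypothetical driver names, not from the patch):

      	struct my_priv {
      		int link_state;		/* hypothetical driver state */
      	};

      	/* Private area is allocated together with the net_device ... */
      	struct net_device *dev = alloc_netdev(sizeof(struct my_priv),
      					      "my%d", my_setup);
      	/* ... and reached via the accessor, never via dev->priv. */
      	struct my_priv *priv = netdev_priv(dev);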
      Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 06 Dec 2008, 1 commit
  8. 05 Dec 2008, 7 commits