1. 03 Sep 2014, 18 commits
  2. 02 Sep 2014, 22 commits
    • W
      sock: deduplicate errqueue dequeue · 364a9e93
      Committed by Willem de Bruijn
      sk->sk_error_queue is dequeued in four locations. All share the
      exact same logic. Deduplicate.
      
      Also collapse the two critical sections for dequeue (at the top of
      the recv handler) and signal (at the bottom).
      
      This moves signal generation for the next packet forward, which should
      be harmless.
      
      It also changes the behavior if the recv handler exits early with an
      error. Previously, a signal for follow-up packets on the errqueue
      would then not be scheduled. The new behavior, to always signal, is
      arguably a bug fix.
      
      For rxrpc, the change causes the same function to be called repeatedly
      for each queued packet (because the recv handler == sk_error_report).
      It is likely that all packets will fail for the same reason (e.g.,
      memory exhaustion).
      
      This code runs without sk_lock held, so it is not safe to trust that
      sk->sk_err is immutable in between releasing q->lock and the subsequent
      test. Introduce int err just to avoid this potential race.
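      A minimal userspace sketch of the deduplicated dequeue described above, with the pop and the "more pending, so signal" test collapsed into one critical section. The struct and function names are illustrative, not the kernel's; the lock is shown as comments so the sketch stays self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative model of the deduplicated errqueue dequeue. The real
 * code holds q->lock around this whole body. */
struct skb { struct skb *next; int id; };

struct err_queue { struct skb *head; };

/* Pop the head and, in the same critical section, decide whether a
 * signal for the next queued packet is needed (always, per the new
 * behavior, even if the caller later exits with an error). */
static struct skb *errqueue_dequeue(struct err_queue *q, bool *signal_next)
{
    /* lock(q->lock) would go here */
    struct skb *skb = q->head;
    if (skb)
        q->head = skb->next;
    *signal_next = (q->head != NULL);
    /* unlock(q->lock) */
    return skb;
}
```

      Computing `signal_next` under the same lock as the dequeue is what makes the two critical sections collapsible.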
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      364a9e93
    • W
      net-timestamp: expand documentation · 8fe2f761
      Committed by Willem de Bruijn
      Expand Documentation/networking/timestamping.txt with new
      interfaces and bytestream timestamping. Also minor
      cleanup of the other text.
      
      Import txtimestamp.c test of the new features.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8fe2f761
    • D
      Merge branch 'csums-next' · c5a65680
      Committed by David S. Miller
      Tom Herbert says:
      
      ====================
      net: Checksum offload changes - Part VI
      
      I am working on overhauling RX checksum offload. Goals of this effort
      are:
      
      - Specify what exactly it means when driver returns CHECKSUM_UNNECESSARY
      - Preserve CHECKSUM_COMPLETE through encapsulation layers
      - Don't do skb_checksum more than once per packet
      - Unify GRO and non-GRO csum verification as much as possible
      - Unify the checksum functions (checksum_init)
      - Simplify code
      
      What is in this patch set:
      
      - Add skb->csum_bad. This allows a device or GRO to indicate that an
        invalid checksum was detected.
      - Checksum unnecessary to checksum complete conversions.
      
      With these changes, I believe that the third goal of the overhaul is
      now mostly achieved. In the case of no encapsulation or one layer of
      encapsulation, there should only be at most one skb_checksum over
      each packet (between GRO and normal path). In the case of two layers
      of encapsulation, it is still possible with the right combination of
      non-zero and zero UDP checksums to have >1 skb_checksum. For instance:
      IP>GRE(with csum)>IP>UDP(zero csum)>VXLAN>IP>UDP(non-zero csum),
      would likely necessitate an skb_checksum in both the GRO and normal paths.
      This doesn't seem like a common scenario at all, so I'm inclined not
      to address it now; if multiple layers of encapsulation become
      popular, we can reassess.
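      For context on why avoiding repeat skb_checksum passes matters, here is a standalone RFC 1071 ones'-complement checksum over a buffer, the same arithmetic skb_checksum performs per packet. This is illustrative userspace code, not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum: 16-bit ones'-complement sum of the
 * buffer, folded, then complemented. One full pass over the data. */
static uint16_t csum_fold_buf(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    while (len > 1) {
        sum += (uint32_t)data[0] << 8 | data[1];
        data += 2;
        len -= 2;
    }
    if (len) /* odd trailing byte, conceptually padded with zero */
        sum += (uint32_t)data[0] << 8;
    while (sum >> 16) /* fold carries back into the low 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

      CHECKSUM_COMPLETE means the device already did this pass; the overhaul's goal is that software never has to repeat it.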
      
      Note that checksum conversion shows a nice improvement for RX VXLAN when
      outer UDP checksum is enabled (12.65% CPU compared to 20.94%). This
      is not only because we avoid the checksum calculation on the host,
      but also because GRO is enabled for VXLAN in this case. Checksum
      conversion does not help send side (which still needs to perform
      a checksum on host). For that we will implement remote checksum offload
      in a later patch
      (http://tools.ietf.org/html/draft-herbert-remotecsumoffload-00).
      
      Please review carefully and test if possible, mucking with basic
      checksum functions is always a little precarious :-)
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c5a65680
    • T
      gre: Add support for checksum unnecessary conversions · 884d338c
      Committed by Tom Herbert
      Call skb_checksum_try_convert and skb_gro_checksum_try_convert
      after checksum is found present and validated in the GRE header
      for normal and GRO paths respectively.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      884d338c
    • T
      udp: Add support for doing checksum unnecessary conversion · 2abb7cdc
      Committed by Tom Herbert
      Add support for doing CHECKSUM_UNNECESSARY to CHECKSUM_COMPLETE
      conversion in UDP tunneling path.
      
      In the normal UDP path, we call skb_checksum_try_convert after locating
      the UDP socket. The check is that checksum conversion is enabled for
      the socket (new flag in UDP socket) and that checksum field is
      non-zero.
      
      In the UDP GRO path, we call skb_gro_checksum_try_convert after
      checksum is validated and checksum field is non-zero. Since this is
      already in GRO we assume that checksum conversion is always wanted.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2abb7cdc
    • T
      net: Infrastructure for checksum unnecessary conversions · d96535a1
      Committed by Tom Herbert
      For normal path, added skb_checksum_try_convert which is called
      to attempt to convert CHECKSUM_UNNECESSARY to CHECKSUM_COMPLETE. The
      primary condition to allow this is that ip_summed is CHECKSUM_NONE
      and csum_valid is true, which will be the state after consuming
      a CHECKSUM_UNNECESSARY.
      
      For GRO path, added skb_gro_checksum_try_convert which is the GRO
      analogue of skb_checksum_try_convert. The primary condition to allow
      this is that NAPI_GRO_CB(skb)->csum_cnt == 0 and
      NAPI_GRO_CB(skb)->csum_valid is set. This implies that we have consumed
      all available CHECKSUM_UNNECESSARY checksums in the GRO path.
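      The gating condition for the normal-path conversion can be modeled in a few lines of userspace C. The enum values and helper below mirror the description above, not the actual kernel signatures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the skb_checksum_try_convert condition:
 * ip_summed must be CHECKSUM_NONE (the state after consuming a
 * CHECKSUM_UNNECESSARY) and the consumed checksum must be valid. */
enum ip_summed { CHECKSUM_NONE, CHECKSUM_UNNECESSARY, CHECKSUM_COMPLETE };

static bool checksum_try_convert(enum ip_summed *ip_summed, bool csum_valid)
{
    if (*ip_summed != CHECKSUM_NONE || !csum_valid)
        return false;
    *ip_summed = CHECKSUM_COMPLETE; /* caller would also fill skb->csum */
    return true;
}
```

      The GRO analogue substitutes NAPI_GRO_CB(skb)->csum_cnt == 0 and csum_valid for the two conditions.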
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d96535a1
    • T
      net: Support for csum_bad in skbuff · 5a212329
      Committed by Tom Herbert
      This flag indicates that an invalid checksum was detected in the
      packet. __skb_mark_checksum_bad helper function was added to set this.
      
      Checksums can be marked bad from a driver or the GRO path (the latter
      is implemented in this patch). csum_bad is checked in
      __skb_checksum_validate_complete (i.e. calling that when ip_summed ==
      CHECKSUM_NONE).
      
      csum_bad works in conjunction with ip_summed value. In the case that
      ip_summed is CHECKSUM_NONE and csum_bad is set, this implies that the
      first (or next) checksum encountered in the packet is bad. When
      ip_summed is CHECKSUM_UNNECESSARY, the first checksum after the last
      one validated is bad. For example, if ip_summed == CHECKSUM_UNNECESSARY,
      csum_level == 1, and csum_bad is set, then the third checksum in the
      packet is bad. In the normal path, the packet will be dropped when
      processing the protocol layer of the bad checksum:
      __skb_decr_checksum_unnecessary is called twice for the good checksums,
      changing ip_summed to CHECKSUM_NONE, so that
      __skb_checksum_validate_complete is called to validate the third
      checksum; that validation will fail since csum_bad is set.
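      The indexing rule above can be captured as a small helper, mapping the state to the zero-based position of the bad checksum in the packet. The function is invented for illustration; it is not a kernel API.

```c
#include <assert.h>
#include <stdbool.h>

enum ip_summed { CHECKSUM_NONE, CHECKSUM_UNNECESSARY };

/* Hypothetical mapping of (ip_summed, csum_level, csum_bad) to the
 * zero-based index of the bad checksum; -1 if nothing is marked bad. */
static int bad_csum_index(enum ip_summed ip_summed, int csum_level,
                          bool csum_bad)
{
    if (!csum_bad)
        return -1;
    if (ip_summed == CHECKSUM_NONE)
        return 0;               /* first (or next) checksum is bad */
    return csum_level + 1;      /* first after the validated ones */
}
```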
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5a212329
    • H
      r8152: rename rx_buf_sz · 52aec126
      Committed by hayeswang
      The variable "rx_buf_sz" is used by both tx and rx buffers. Replace
      it with "agg_buf_sz".
      Signed-off-by: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      52aec126
    • F
      net: phy: mdio-bcm-unimac: NULL-terminate unimac_mdio_ids · 4559154a
      Committed by Florian Fainelli
      drivers/net/phy/mdio-bcm-unimac.c:195:37-38: unimac_mdio_ids is not NULL
      terminated at line 195
      
      Make sure of_device_id tables are NULL terminated
      Generated by: scripts/coccinelle/misc/of_table.cocci
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4559154a
    • F
      net: dsa: make dsa_pack_type static · 61b7363f
      Committed by Florian Fainelli
      net/dsa/dsa.c:624:20: sparse: symbol 'dsa_pack_type' was not declared.
      Should it be static?
      
      Fixes: 3e8a72d1 ("net: dsa: reduce number of protocol hooks")
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      61b7363f
    • N
      bonding: add slave_changelink support and use it for queue_id · 0f23124a
      Committed by Nikolay Aleksandrov
      This patch adds support for slave_changelink to the bonding and uses it
      to give the ability to change the queue_id of the enslaved devices via
      netlink. It sets slave_maxtype and uses bond_changelink as a prototype for
      bond_slave_changelink.
      Example/test command after the iproute2 patch:
       ip link set eth0 type bond_slave queue_id 10
      
      CC: David S. Miller <davem@davemloft.net>
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      Suggested-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0f23124a
    • S
      tcp: whitespace fixes · 688d1945
      Committed by stephen hemminger
      Fix places where there is space before tab, long lines, awkward
      if(){ placement, double spacing, etc. Add a blank line after
      declarations/initializations.
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      688d1945
    • F
      net: systemport: tell RXCHK if we are using Broadcom tags · d09d3038
      Committed by Florian Fainelli
      When Broadcom tags are enabled, e.g: when interfaced to an Ethernet
      switch, make sure that we tell the RXCHK engine that it should be
      expecting a 4-bytes Broadcom tag after the Ethernet MAC Source Address.
      
      Use netdev_uses_dsa() to check for that condition since that will tell
      us if a switch is attached to our network interface.
      
      Fixes: 80105bef ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d09d3038
    • J
      pktgen: add flag NO_TIMESTAMP to disable timestamping · afb84b62
      Committed by Jesper Dangaard Brouer
      When testing the TX limits of the stack, it is useful to be able
      to disable the do_gettimeofday() timestamping on every packet.
      
      This implements a pktgen flag NO_TIMESTAMP which will disable this
      call to do_gettimeofday().
      
      On my system (E5-2695), with skb_clone=0, performance goes from
      TX 2,423,751 pps to 2,567,165 pps with flag NO_TIMESTAMP. Thus,
      skipping do_gettimeofday() saves approximately 23 ns per packet.
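      The 23 ns figure follows directly from the two packet rates; a tiny check of that arithmetic:

```c
#include <assert.h>

/* Convert a packets-per-second rate to nanoseconds per packet; the
 * difference between the two figures above is the per-packet cost
 * of the do_gettimeofday() call. */
static double ns_per_packet(double pps)
{
    return 1e9 / pps;
}
```

      1e9/2423751 is about 412.6 ns and 1e9/2567165 about 389.5 ns, so the saving is roughly 23 ns per packet, matching the commit's claim.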
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      afb84b62
    • D
      bnx2x: fix tunneled GSO over IPv6 · 05f8461b
      Committed by Dmitry Kravkov
      Set the correct bit in the packet description.
      
      Introduced in e42780b6
          bnx2x: Utilize FW 7.10.51
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Dmitry Kravkov <Dmitry.Kravkov@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      05f8461b
    • D
      bnx2x: prevent incorrect byte-swap in BE · 55ef5c89
      Committed by Dmitry Kravkov
      Fixes an incorrectly defined struct in the FW HSI for big-endian
      platforms. Affects tunneling, tx-switching and anti-spoofing.
      
      Introduced in e42780b6
          bnx2x: Utilize FW 7.10.51
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Dmitry Kravkov <Dmitry.Kravkov@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      55ef5c89
    • E
      tipc: add name distributor resiliency queue · a5325ae5
      Committed by Erik Hugne
      TIPC name table updates are distributed asynchronously in a cluster,
      entailing a risk of certain race conditions. E.g., if two nodes
      simultaneously issue conflicting (overlapping) publications, this may
      not be detected until both publications have reached a third node, in
      which case one of the publications will be silently dropped on that
      node. Hence, we end up with an inconsistent name table.
      
      In most cases this conflict is just a temporary race, e.g., one
      node is issuing a publication under the assumption that a previous,
      conflicting, publication has already been withdrawn by the other node.
      However, because of the (rtt related) distributed update delay, this
      may not yet hold true on all nodes. The symptom of this failure is a
      syslog message: "tipc: Cannot publish {%u,%u,%u}, overlap error".
      
      In this commit we add a resiliency queue at the receiving end of
      the name table distributor. When insertion of an arriving publication
      fails, we retain it in this queue for a short amount of time, assuming
      that another update will arrive very soon and clear the conflict. If it
      does, we insert the publication; otherwise we drop it.
      
      The (configurable) retention value defaults to 2000 ms. Knowing from
      experience that the situation described above is extremely rare, there
      is no risk that the queue will accumulate any large number of items.
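      A rough userspace model of this retry-then-drop behavior, with every name and the conflict flag invented purely for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of the resiliency queue: a publication that
 * fails to insert is retained until an expiry time; later updates
 * retry it, and anything past the (configurable, 2000 ms default)
 * retention period is dropped. */
struct deferred_publ {
    int key;                  /* stand-in for the {type,lower,upper} triple */
    long expires_ms;
    struct deferred_publ *next;
};

static struct deferred_publ *defer_queue;
static bool conflict_cleared; /* models the conflicting entry being withdrawn */

static bool try_insert(int key)
{
    (void)key;
    return conflict_cleared;  /* insertion succeeds once the conflict clears */
}

static void defer_publ(int key, long now_ms, long retention_ms)
{
    struct deferred_publ *d = malloc(sizeof(*d));
    d->key = key;
    d->expires_ms = now_ms + retention_ms;
    d->next = defer_queue;
    defer_queue = d;
}

/* On every name-table update, retry deferred publications: insert
 * those that now succeed, drop those past retention, keep the rest. */
static void process_deferred(long now_ms)
{
    struct deferred_publ **p = &defer_queue;
    while (*p) {
        struct deferred_publ *d = *p;
        if (try_insert(d->key) || now_ms >= d->expires_ms) {
            *p = d->next;
            free(d);
        } else {
            p = &d->next;
        }
    }
}
```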
      Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a5325ae5
    • E
      tipc: refactor name table updates out of named packet receive routine · f4ad8a4b
      Committed by Erik Hugne
      We need to perform the same actions when processing deferred name
      table updates, so this functionality is moved to a separate
      function.
      Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f4ad8a4b
    • H
      r8152: reduce the number of Tx · 1764bcd9
      Committed by hayeswang
      Because the Tx path has queue stopping and aggregation, we don't
      need many tx buffers. Change the number of tx buffers from 10 to 4
      to reduce memory usage. This saves 6 buffers * 16 KB of memory.
      Signed-off-by: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1764bcd9
    • D
      Merge branch 'xmit_list' · 53fda7f7
      Committed by David S. Miller
      David Miller says:
      
      ====================
      net: Make dev_hard_start_xmit() work fundamentally on lists
      
      After this patch set, dev_hard_start_xmit() will work fundamentally on
      any and all SKB lists.
      
      This opens the path for a clean implementation of pulling multiple
      packets out during qdisc_restart(), and then passing that blob in one
      shot to dev_hard_start_xmit().
      
      There were two main architectural blockers to this:
      
      1) The GSO handling, we kept the original GSO head SKB around simply
         because dev_hard_start_xmit() had no way to communicate to the
         caller how far into the segmented list it was able to go.  Now it
         can, so the head GSO can be liberated immediately.
      
         All of the special GSO head SKB destructor et al. handling goes
         away too.
      
      2) Validation of VLAN, CSUM, and segmentation characteristics was being
         performed inside of dev_hard_start_xmit().  If we want to truly batch,
         we have to let the higher levels do this.  In particular, this is
         now dequeue_skb()'s job.
      
      And with those two issues out of the way, it should now be trivial to
      build experiments on top of this patch set, all of the framework
      should be there now.  You could do something as simple as:
      
      	skb = q->dequeue(q);
      	if (skb)
      		skb = validate_xmit_skb(skb, qdisc_dev(q));
      	if (skb) {
      		struct sk_buff *new, *head = skb;
      		int limit = 5;
      
      		do {
      			new = q->dequeue(q);
      			if (new)
      				new = validate_xmit_skb(new, qdisc_dev(q));
      			if (new) {
      				skb->next = new;
      				skb = new;
      			}
      		} while (new && --limit);
      		skb = head;
      	}
      
      inside of the else branch of dequeue_skb().
      Signed-off-by: David S. Miller <davem@davemloft.net>
      53fda7f7