    [PATCH] S2io: Large Receive Offload (LRO) feature(v2) for Neterion (s2io)... · 7d3d0439
    Ravinandan Arakali committed
    [PATCH] S2io: Large Receive Offload (LRO) feature(v2) for Neterion (s2io) 10GbE Xframe PCI-X and PCI-E NICs
    
    Hi,
    Below is a patch for the Large Receive Offload feature.
    Please review and let us know your comments.
    
    LRO algorithm was described in an OLS 2005 presentation, located at
    ftp.s2io.com
    user: linuxdocs
    password: HALdocs
    
    The same ftp site has the Programming Manual for the Xframe-I ASIC.
    The LRO feature is supported on Neterion Xframe-I, Xframe-II and
    Xframe-Express 10GbE NICs.
    
    Brief description:
    The Large Receive Offload (LRO) feature is a stateless offload
    that is complementary to the TSO feature, but on the receive path.
    The idea is to combine and collapse, in the driver, in-sequence
    TCP packets belonging to the same session (up to a 64K maximum).
    It is mainly designed to improve receive performance at MTU 1500,
    since jumbo frame performance is already close to 10GbE line
    rate. Some performance numbers are attached below.
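For a rough sense of how many segments one aggregate can hold, here is a small sketch in C. It is not part of the patch. It assumes that the 64K cap corresponds to the 65535-byte IPv4 total-length limit with headers counted once, that the IP header is 20 bytes (no options), and that the TCP header is 20 bytes, or 32 bytes when the timestamp option is present and padded to 12 bytes. The name lro_max_segments is hypothetical.

```c
#include <stdint.h>

/* Assumed cap on the aggregated packet: the IPv4 total-length maximum. */
#define LRO_ASSUMED_MAX_LEN 65535u

/*
 * Hypothetical helper: how many full-sized TCP segments fit in one
 * aggregate, given the link MTU and the IP and TCP header sizes.
 * The aggregate carries a single IP+TCP header plus concatenated payloads.
 */
static unsigned int lro_max_segments(unsigned int mtu,
                                     unsigned int ip_hdr_len,
                                     unsigned int tcp_hdr_len)
{
    unsigned int hdr_len = ip_hdr_len + tcp_hdr_len;

    /* No room for payload, or headers alone exceed the cap. */
    if (mtu <= hdr_len || hdr_len >= LRO_ASSUMED_MAX_LEN)
        return 0;

    /* Payload budget of the aggregate divided by payload per segment. */
    return (LRO_ASSUMED_MAX_LEN - hdr_len) / (mtu - hdr_len);
}
```

Under these assumptions, lro_max_segments(1500, 20, 32) gives 45 and lro_max_segments(1500, 20, 20) gives 44, so one aggregate at MTU 1500 replaces roughly 45 trips up the stack with one.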
    
    Implementation details:
    1. Handle packet chains from multiple sessions (current default
    MAX_LRO_SESSIONS=32).
    2. Examine each packet for eligibility to aggregate. A packet is
    considered eligible if it meets all of the criteria below.
      a. It is a TCP/IP packet and the L2 type is not LLC or SNAP.
      b. The packet has no checksum errors (L3 and L4).
      c. There are no IP options. The only TCP option supported is timestamps.
      d. The LRO object corresponding to this socket can be located
         and the packet is in TCP sequence.
      e. It's not a special packet (SYN, FIN, RST, URG, PSH etc. flags are not set).
      f. The TCP payload is non-zero (it's not a pure ACK).
      g. It's not an IP-fragmented packet.
    3. If a packet is found eligible, the LRO object is updated with
       information such as the next sequence number expected, the current
       length of the aggregated packet and so on. If the packet is not
       eligible, or the maximum number of packets has been reached, the IP
       and TCP headers of the first packet in the chain are updated and
       the chain is passed up to the stack.
    4. The frag_list in the skb structure is used to chain packets into one
       large packet.
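The eligibility test (step 2) and the per-session update (step 3) can be sketched in C as follows. This is a simplified illustration, not the driver's code: struct pkt_info, struct lro_session, the function names and the LRO_MAX_PKTS value are all hypothetical, and the packet fields are assumed to have been parsed already. Session lookup (2d) is reduced to comparing against one session's expected sequence number.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard TCP flag bits. */
#define TCP_FIN 0x01
#define TCP_SYN 0x02
#define TCP_RST 0x04
#define TCP_PSH 0x08
#define TCP_URG 0x20
#define TCP_SPECIAL_FLAGS (TCP_FIN | TCP_SYN | TCP_RST | TCP_PSH | TCP_URG)

#define LRO_MAX_AGGR_LEN 65535u /* "up to 64K maximum" */
#define LRO_MAX_PKTS     64u    /* hypothetical per-session packet limit */

/* Hypothetical pre-parsed view of one received packet. */
struct pkt_info {
    bool     is_tcp_ip;       /* 2a: TCP over IP */
    bool     l2_llc_or_snap;  /* 2a: LLC or SNAP framing */
    bool     csum_ok;         /* 2b: L3 and L4 checksums verified */
    bool     has_ip_options;  /* 2c */
    bool     tcp_opts_ok;     /* 2c: no TCP options, or timestamps only */
    bool     ip_fragment;     /* 2g */
    uint8_t  tcp_flags;
    uint32_t seq;             /* TCP sequence number */
    uint16_t payload_len;     /* TCP payload bytes */
};

/* Hypothetical per-session LRO state. */
struct lro_session {
    uint32_t next_seq;   /* next sequence number expected */
    uint32_t total_len;  /* aggregated payload length so far */
    uint16_t pkt_count;  /* packets chained so far */
};

/* Step 2: apply criteria (a) through (g) to one packet. */
static bool lro_eligible(const struct pkt_info *p, const struct lro_session *s)
{
    if (!p->is_tcp_ip || p->l2_llc_or_snap)     return false; /* a */
    if (!p->csum_ok)                            return false; /* b */
    if (p->has_ip_options || !p->tcp_opts_ok)   return false; /* c */
    if (p->seq != s->next_seq)                  return false; /* d */
    if (p->tcp_flags & TCP_SPECIAL_FLAGS)       return false; /* e */
    if (p->payload_len == 0)                    return false; /* f */
    if (p->ip_fragment)                         return false; /* g */
    return true;
}

/*
 * Step 3: update the session if the packet can be appended.
 * Returns false when the caller should instead fix up the headers
 * of the first packet and pass the chain up to the stack.
 */
static bool lro_try_aggregate(struct lro_session *s, const struct pkt_info *p)
{
    if (!lro_eligible(p, s))
        return false;
    if (s->pkt_count >= LRO_MAX_PKTS)
        return false;
    if (s->total_len + p->payload_len > LRO_MAX_AGGR_LEN)
        return false;

    s->next_seq  += p->payload_len; /* unsigned wraparound is intended */
    s->total_len += p->payload_len;
    s->pkt_count++;
    return true;
}
```

A caller would invoke lro_try_aggregate() for each received packet and flush the session whenever it returns false. How the ineligible packet itself is then handled is left out of this sketch.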
    
    Kernel changes required: None
    
    Performance results:
    The main focus of the initial testing was on an MTU 1500 receiver, since
    this is a bottleneck not covered by the existing stateless offloads.

    There are a couple of disclaimers about the performance results below:
    1. Your mileage will vary! We initially concentrated on a couple of PCI-X
    2.0 platforms that are powerful enough to drive a 10GbE NIC and do not
    have bottlenecks other than CPU utilization; testing on other platforms
    is still in progress. On some lower-end systems we are seeing lower gains.

    2. The current LRO implementation is still (for the most part) software
    based, and therefore the performance potential of the feature is far from
    being realized. A full hardware implementation of LRO is expected in the
    next version of the Xframe ASIC.
    
    Performance delta (with MTU=1500) going from LRO disabled to enabled:
    IBM 2-way Xeon (x366) : 3.5 to 7.1 Gbps
    2-way Opteron : 4.5 to 6.1 Gbps

    Signed-off-by: Ravinandan Arakali <ravinandan.arakali@neterion.com>
    Signed-off-by: Jeff Garzik <jgarzik@pobox.com>