1. 10 2月, 2009 1 次提交
  2. 09 2月, 2009 2 次提交
  3. 07 2月, 2009 1 次提交
  4. 06 2月, 2009 2 次提交
  5. 05 2月, 2009 2 次提交
    • H
      net: Reexport sock_alloc_send_pskb · 4cc7f68d
      Herbert Xu 提交于
      The function sock_alloc_send_pskb is completely useless if not
      exported since most of the code in it won't be used as is.  In
      fact, this code has already been duplicated in the tun driver.
      
      Now that we need accounting in the tun driver, we can in fact
      use this function as is.  So this patch marks it for export again.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4cc7f68d
    • H
      net: Partially allow skb destructors to be used on receive path · 9a279bcb
      Herbert Xu 提交于
      As it currently stands, skb destructors are forbidden on the
      receive path because the protocol end-points will overwrite
      any existing destructor with their own.
      
      This is the reason why we have to call skb_orphan in the loopback
      driver before we reinject the packet back into the stack, thus
      creating a period during which loopback traffic isn't charged
      to any socket.
      
      With virtualisation, we have a similar problem in that traffic
      is reinjected into the stack without being associated with any
      socket entity, thus providing no natural congestion push-back
      for those poor folks still stuck with UDP.
      
      Now had we been consistent in telling them that UDP simply has
      no congestion feedback, I could just fob them off.  Unfortunately,
      we appear to have gone to some length in catering for this on
      the standard UDP path, with skb/socket accounting so that has
      created a very unhealthy dependency.
      
      Alas habits are difficult to break out of, so we may just have
      to allow skb destructors on the receive path.
      
      It turns out that making skb destructors useable on the receive path
      isn't as easy as it seems.  For instance, simply adding skb_orphan
      to skb_set_owner_r isn't enough.  This is because we assume all
      over the IP stack that skb->sk is an IP socket if present.
      
      The new transparent proxy code goes one step further and assumes
      that skb->sk is the receiving socket if present.
      
      Now all of this can be dealt with by adding simple checks such
      as only treating skb->sk as an IP socket if skb->sk->sk_family
      matches.  However, it turns out that for bridging at least we
      don't need to do all of this work.
      
      This is of interest because most virtualisation setups use bridging
      so we don't actually go through the IP stack on the host (with
      the exception of our old nemesis the bridge netfilter, but that's
      easily taken care of).
      
      So this patch simply adds skb_orphan to the point just before we
      enter the IP stack, but after we've gone through the bridge on the
      receive path.  It also adds an skb_orphan to the one place in
      netfilter that touches skb->sk/skb->destructor, that is, tproxy.
      
      One word of caution, because of the internal code structure, anyone
      wishing to deploy this must use skb_set_owner_w as opposed to
      skb_set_owner_r since many functions that create a new skb from
      an existing one will invoke skb_set_owner_w on the new skb.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9a279bcb
  6. 01 2月, 2009 2 次提交
  7. 30 1月, 2009 6 次提交
    • H
      gro: Open-code memcpy in napi_fraginfo_skb · 80595d59
      Herbert Xu 提交于
      This patch optimises napi_fraginfo_skb to only copy the bits
      necessary.  We also open-code the memcpy so that the alignment
      information is always available to gcc.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      80595d59
    • H
      gro: Do not merge paged packets into frag_list · 81705ad1
      Herbert Xu 提交于
      gro: Do not merge paged packets into frag_list
      
      Bigger is not always better :)
      
      It was easy to continue to merged packets into frag_list after the
      page array is full.  However, this turns out to be worse than LRO
      because frag_list is a much less efficient form of storage than the
      page array.  So we're better off stopping the merge and starting
      a new entry with an empty page array.
      
      In future we can optimise this further by doing frag_list merging
      but making sure that we continue to fill in the page array.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      81705ad1
    • H
      gro: Avoid copying headers of unmerged packets · 86911732
      Herbert Xu 提交于
      Unfortunately simplicity isn't always the best.  The fraginfo
      interface turned out to be suboptimal.  The problem was quite
      obvious.  For every packet, we have to copy the headers from
      the frags structure into skb->head, even though for 99% of the
      packets this part is immediately thrown away after the merge.
      
      LRO didn't have this problem because it directly read the headers
      from the frags structure.
      
      This patch attempts to address this by creating an interface
      that allows GRO to access the headers in the first frag without
      having to copy it.  Because all drivers that use frags place the
      headers in the first frag this optimisation should be enough.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      86911732
    • H
      gro: Move common completion code into helpers · 5d0d9be8
      Herbert Xu 提交于
      Currently VLAN still has a bit of common code handling the aftermath
      of GRO that's shared with the common path.  This patch moves them
      into shared helpers to reduce code duplication.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5d0d9be8
    • S
      net: Fix OOPS in skb_seq_read(). · 71b3346d
      Shyam Iyer 提交于
      It oopsd for me in skb_seq_read. addr2line said it was
      linux-2.6/net/core/skbuff.c:2228, which is this line:
      
      
      	while (st->frag_idx < skb_shinfo(st->cur_skb)->nr_frags) {
      
      
      I added some printks in there and it looks like we hit this:
      
              } else if (st->root_skb == st->cur_skb &&
                         skb_shinfo(st->root_skb)->frag_list) {
                       st->cur_skb = skb_shinfo(st->root_skb)->frag_list;
                       st->frag_idx = 0;
                       goto next_skb;
              }
      
      
      
      Actually I did some testing and added a few printks and found that the
      st->cur_skb->data was 0 and hence the ptr used by iscsi_tcp was null.
      This caused the kernel panic.
      
       	if (abs_offset < block_limit) {
      -		*data = st->cur_skb->data + abs_offset;
      +		*data = st->cur_skb->data + (abs_offset - st->stepped_offset);
      
      I enabled the debug_tcp and with a few printks found that the code did
      not go to the next_skb label and could find that the sequence being
      followed was this -
      
      It hit this if condition -
      
              if (st->cur_skb->next) {
                      st->cur_skb = st->cur_skb->next;
                      st->frag_idx = 0;
                      goto next_skb;
      
      And so, now the st pointer is shifted to the next skb whereas actually
      it should have hit the second else if first since the data is in the
      frag_list.
      
              else if (st->root_skb == st->cur_skb &&
                       skb_shinfo(st->root_skb)->frag_list) {
                      st->cur_skb = skb_shinfo(st->root_skb)->frag_list;
                      goto next_skb;
              }
      
      Reversing the two conditions the attached patch fixes the issue for me
      on top of Herbert's patches. 
      Signed-off-by: NShyam Iyer <shyam_iyer@dell.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      71b3346d
    • H
      net: Fix frag_list handling in skb_seq_read · 95e3b24c
      Herbert Xu 提交于
      The frag_list handling was broken in skb_seq_read:
      
      1) We didn't add the stepped offset when looking at the head
      are of fragments other than the first.
      
      2) We didn't take the stepped offset away when setting the data
      pointer in the head area.
      
      3) The frag index wasn't reset.
      
      This patch fixes both issues.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      95e3b24c
  8. 28 1月, 2009 3 次提交
  9. 21 1月, 2009 3 次提交
  10. 20 1月, 2009 2 次提交
    • J
      net: Fix data corruption when splicing from sockets. · 8b9d3728
      Jarek Poplawski 提交于
      The trick in socket splicing where we try to convert the skb->data
      into a page based reference using virt_to_page() does not work so
      well.
      
      The idea is to pass the virt_to_page() reference via the pipe
      buffer, and refcount the buffer using a SKB reference.
      
      But if we are splicing from a socket to a socket (via sendpage)
      this doesn't work.
      
      The from side processing will grab the page (and SKB) references.
      The sendpage() calls will grab page references only, return, and
      then the from side processing completes and drops the SKB ref.
      
      The page based reference to skb->data is not enough to keep the
      kmalloc() buffer backing it from being reused.  Yet, that is
      all that the socket send side has at this point.
      
      This leads to data corruption if the skb->data buffer is reused
      by SLAB before the send side socket actually gets the TX packet
      out to the device.
      
      The fix employed here is to simply allocate a page and copy the
      skb->data bytes into that page.
      
      This will hurt performance, but there is no clear way to fix this
      properly without a copy at the present time, and it is important
      to get rid of the data corruption.
      
      With fixes from Herbert Xu.
      Tested-by: NWilly Tarreau <w@1wt.eu>
      Foreseen-by: NChangli Gao <xiaosuo@gmail.com>
      Diagnosed-by: NWilly Tarreau <w@1wt.eu>
      Reported-by: NWilly Tarreau <w@1wt.eu>
      Fixed-by: NJens Axboe <jens.axboe@oracle.com>
      Signed-off-by: NJarek Poplawski <jarkao2@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8b9d3728
    • H
      net: Add debug info to track down GSO checksum bug · 67fd1a73
      Herbert Xu 提交于
      I'm trying to track down why people're hitting the checksum warning
      in skb_gso_segment.  As the problem seems to be hitting lots of
      people and I can't reproduce it or locate the bug, here is a patch
      to print out more details which hopefully should help us to track
      this down.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      67fd1a73
  11. 15 1月, 2009 3 次提交
  12. 11 1月, 2009 1 次提交
  13. 07 1月, 2009 5 次提交
  14. 06 1月, 2009 1 次提交
  15. 05 1月, 2009 3 次提交
    • M
      net: Fix for initial link state in 2.6.28 · 22604c86
      Michael Marineau 提交于
      From: Michael Marineau <mike@marineau.org>
      
      Commit b4730016 "Do not fire linkwatch
      events until the device is registered." was made as a workaround for
      drivers that call netif_carrier_off before registering the device.
      Unfortunately this causes these drivers to incorrectly report their
      link status as IF_OPER_UNKNOWN which can falsely set the IFF_RUNNING
      flag when the interface is first brought up. This issues was
      previously pointed out[1] but was dismissed saying that IFF_RUNNING is
      not related to the link status. From my digging IFF_RUNNING, as
      reported to userspace, is based on the link state. It is set based on
      __LINK_STATE_START and IF_OPER_UP or IF_OPER_UNKNOWN. See [2], [3],
      and [4]. (Whether or not the kernel has IFF_RUNNING set in flags is
      not reported to user space so it may well be independent of the link,
      I don't know if and when it may get set.)
      
      The end result depends slightly depending on the driver. The the two I
      tested were e1000e and b44. With e1000e if the system is booted
      without a network cable attached the interface will falsely report
      RUNNING when it is brought up causing NetworkManager to attempt to
      start it and eventually time out. With b44 when the system is booted
      with a network cable attached and brought up with dhcpcd it will time
      out the first time.
      
      The attached patch that will still set the operstate variable
      correctly to IF_OPER_UP/DOWN/etc when linkwatch_fire_event is called
      but then return rather than skipping the linkwatch_fire_event call
      entirely as the previous fix did. (sorry it isn't inline, I don't have
      a patch friendly email client at the moment)
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      22604c86
    • H
      gro: Add page frag support · 5d38a079
      Herbert Xu 提交于
      This patch allows GRO to merge page frags (skb_shinfo(skb)->frags)
      in one skb, rather than using the less efficient frag_list.
      
      It also adds a new interface, napi_gro_frags to allow drivers
      to inject page frags directly into the stack without allocating
      an skb.  This is intended to be the GRO equivalent for LRO's
      lro_receive_frags interface.
      
      The existing GSO interface can already handle page frags with
      or without an appended frag_list so nothing needs to be changed
      there.
      
      The merging itself is rather simple.  We store any new frag entries
      after the last existing entry, without checking whether the first
      new entry can be merged with the last existing entry.  Making this
      check would actually be easy but since no existing driver can
      produce contiguous frags anyway it would just be mental masturbation.
      
      If the total number of entries would exceed the capacity of a
      single skb, we simply resort to using frag_list as we do now.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5d38a079
    • H
      gro: Use gso_size to store MSS · b530256d
      Herbert Xu 提交于
      In order to allow GRO packets without frag_list at all, we need to
      store the MSS in the packet itself.  The obvious place is gso_size.
      The only thing to watch out for is if the packet ends up not being
      GRO then we need to clear gso_size before pushing the packet into
      the stack.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b530256d
  16. 30 12月, 2008 2 次提交
  17. 27 12月, 2008 1 次提交