1. 01 3月, 2009 1 次提交
    • H
      netpoll: Add drop checks to all entry points · 4ead4431
      Herbert Xu 提交于
      The netpoll entry checks are required to ensure that we don't
      receive normal packets when invoked via netpoll.  Unfortunately
      it only ever worked for the netif_receive_skb/netif_rx entry
      points.  The VLAN (and subsequently GRO) entry point didn't
      have the check and therefore can trigger all sorts of weird
      problems.
      
      This patch adds the netpoll check to all entry points.
      
      I'm still uneasy with receiving at all under netpoll (which
      apparently is only used by the out-of-tree kdump code).  The
      reason is it is perfectly legal to receive all data including
      headers into highmem if netpoll is off, but if you try to do
      that with netpoll on and someone gets a printk in an IRQ handler                                             
      you're going to get a nice BUG_ON.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4ead4431
  2. 24 2月, 2009 2 次提交
  3. 22 2月, 2009 1 次提交
    • D
      netns: fix double free at netns creation · 486a87f1
      Daniel Lezcano 提交于
      This patch fix a double free when a network namespace fails.
      The previous code does a kfree of the net_generic structure when
      one of the init subsystem initialization fails.
      The 'setup_net' function does kfree(ng) and returns an error.
      The caller, 'copy_net_ns', call net_free on error, and this one
      calls kfree(net->gen), making this pointer freed twice.
      
      This patch make the code symetric, the net_alloc does the net_generic
      allocation and the net_free frees the net_generic.
      Signed-off-by: NDaniel Lezcano <daniel.lezcano@free.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      486a87f1
  4. 18 2月, 2009 1 次提交
    • D
      net: Kill skb_truesize_check(), it only catches false-positives. · 92a0acce
      David S. Miller 提交于
      A long time ago we had bugs, primarily in TCP, where we would modify
      skb->truesize (for TSO queue collapsing) in ways which would corrupt
      the socket memory accounting.
      
      skb_truesize_check() was added in order to try and catch this error
      more systematically.
      
      However this debugging check has morphed into a Frankenstein of sorts
      and these days it does nothing other than catch false-positives.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      92a0acce
  5. 13 2月, 2009 1 次提交
  6. 07 2月, 2009 1 次提交
  7. 06 2月, 2009 1 次提交
  8. 30 1月, 2009 2 次提交
    • S
      net: Fix OOPS in skb_seq_read(). · 71b3346d
      Shyam Iyer 提交于
      It oopsd for me in skb_seq_read. addr2line said it was
      linux-2.6/net/core/skbuff.c:2228, which is this line:
      
      
      	while (st->frag_idx < skb_shinfo(st->cur_skb)->nr_frags) {
      
      
      I added some printks in there and it looks like we hit this:
      
              } else if (st->root_skb == st->cur_skb &&
                         skb_shinfo(st->root_skb)->frag_list) {
                       st->cur_skb = skb_shinfo(st->root_skb)->frag_list;
                       st->frag_idx = 0;
                       goto next_skb;
              }
      
      
      
      Actually I did some testing and added a few printks and found that the
      st->cur_skb->data was 0 and hence the ptr used by iscsi_tcp was null.
      This caused the kernel panic.
      
       	if (abs_offset < block_limit) {
      -		*data = st->cur_skb->data + abs_offset;
      +		*data = st->cur_skb->data + (abs_offset - st->stepped_offset);
      
      I enabled the debug_tcp and with a few printks found that the code did
      not go to the next_skb label and could find that the sequence being
      followed was this -
      
      It hit this if condition -
      
              if (st->cur_skb->next) {
                      st->cur_skb = st->cur_skb->next;
                      st->frag_idx = 0;
                      goto next_skb;
      
      And so, now the st pointer is shifted to the next skb whereas actually
      it should have hit the second else if first since the data is in the
      frag_list.
      
              else if (st->root_skb == st->cur_skb &&
                       skb_shinfo(st->root_skb)->frag_list) {
                      st->cur_skb = skb_shinfo(st->root_skb)->frag_list;
                      goto next_skb;
              }
      
      Reversing the two conditions the attached patch fixes the issue for me
      on top of Herbert's patches. 
      Signed-off-by: NShyam Iyer <shyam_iyer@dell.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      71b3346d
    • H
      net: Fix frag_list handling in skb_seq_read · 95e3b24c
      Herbert Xu 提交于
      The frag_list handling was broken in skb_seq_read:
      
      1) We didn't add the stepped offset when looking at the head
      are of fragments other than the first.
      
      2) We didn't take the stepped offset away when setting the data
      pointer in the head area.
      
      3) The frag index wasn't reset.
      
      This patch fixes both issues.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      95e3b24c
  9. 21 1月, 2009 3 次提交
  10. 20 1月, 2009 2 次提交
    • J
      net: Fix data corruption when splicing from sockets. · 8b9d3728
      Jarek Poplawski 提交于
      The trick in socket splicing where we try to convert the skb->data
      into a page based reference using virt_to_page() does not work so
      well.
      
      The idea is to pass the virt_to_page() reference via the pipe
      buffer, and refcount the buffer using a SKB reference.
      
      But if we are splicing from a socket to a socket (via sendpage)
      this doesn't work.
      
      The from side processing will grab the page (and SKB) references.
      The sendpage() calls will grab page references only, return, and
      then the from side processing completes and drops the SKB ref.
      
      The page based reference to skb->data is not enough to keep the
      kmalloc() buffer backing it from being reused.  Yet, that is
      all that the socket send side has at this point.
      
      This leads to data corruption if the skb->data buffer is reused
      by SLAB before the send side socket actually gets the TX packet
      out to the device.
      
      The fix employed here is to simply allocate a page and copy the
      skb->data bytes into that page.
      
      This will hurt performance, but there is no clear way to fix this
      properly without a copy at the present time, and it is important
      to get rid of the data corruption.
      
      With fixes from Herbert Xu.
      Tested-by: NWilly Tarreau <w@1wt.eu>
      Foreseen-by: NChangli Gao <xiaosuo@gmail.com>
      Diagnosed-by: NWilly Tarreau <w@1wt.eu>
      Reported-by: NWilly Tarreau <w@1wt.eu>
      Fixed-by: NJens Axboe <jens.axboe@oracle.com>
      Signed-off-by: NJarek Poplawski <jarkao2@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8b9d3728
    • H
      net: Add debug info to track down GSO checksum bug · 67fd1a73
      Herbert Xu 提交于
      I'm trying to track down why people're hitting the checksum warning
      in skb_gso_segment.  As the problem seems to be hitting lots of
      people and I can't reproduce it or locate the bug, here is a patch
      to print out more details which hopefully should help us to track
      this down.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      67fd1a73
  11. 15 1月, 2009 3 次提交
  12. 11 1月, 2009 1 次提交
  13. 07 1月, 2009 5 次提交
  14. 06 1月, 2009 1 次提交
  15. 05 1月, 2009 3 次提交
    • M
      net: Fix for initial link state in 2.6.28 · 22604c86
      Michael Marineau 提交于
      From: Michael Marineau <mike@marineau.org>
      
      Commit b4730016 "Do not fire linkwatch
      events until the device is registered." was made as a workaround for
      drivers that call netif_carrier_off before registering the device.
      Unfortunately this causes these drivers to incorrectly report their
      link status as IF_OPER_UNKNOWN which can falsely set the IFF_RUNNING
      flag when the interface is first brought up. This issues was
      previously pointed out[1] but was dismissed saying that IFF_RUNNING is
      not related to the link status. From my digging IFF_RUNNING, as
      reported to userspace, is based on the link state. It is set based on
      __LINK_STATE_START and IF_OPER_UP or IF_OPER_UNKNOWN. See [2], [3],
      and [4]. (Whether or not the kernel has IFF_RUNNING set in flags is
      not reported to user space so it may well be independent of the link,
      I don't know if and when it may get set.)
      
      The end result depends slightly depending on the driver. The the two I
      tested were e1000e and b44. With e1000e if the system is booted
      without a network cable attached the interface will falsely report
      RUNNING when it is brought up causing NetworkManager to attempt to
      start it and eventually time out. With b44 when the system is booted
      with a network cable attached and brought up with dhcpcd it will time
      out the first time.
      
      The attached patch that will still set the operstate variable
      correctly to IF_OPER_UP/DOWN/etc when linkwatch_fire_event is called
      but then return rather than skipping the linkwatch_fire_event call
      entirely as the previous fix did. (sorry it isn't inline, I don't have
      a patch friendly email client at the moment)
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      22604c86
    • H
      gro: Add page frag support · 5d38a079
      Herbert Xu 提交于
      This patch allows GRO to merge page frags (skb_shinfo(skb)->frags)
      in one skb, rather than using the less efficient frag_list.
      
      It also adds a new interface, napi_gro_frags to allow drivers
      to inject page frags directly into the stack without allocating
      an skb.  This is intended to be the GRO equivalent for LRO's
      lro_receive_frags interface.
      
      The existing GSO interface can already handle page frags with
      or without an appended frag_list so nothing needs to be changed
      there.
      
      The merging itself is rather simple.  We store any new frag entries
      after the last existing entry, without checking whether the first
      new entry can be merged with the last existing entry.  Making this
      check would actually be easy but since no existing driver can
      produce contiguous frags anyway it would just be mental masturbation.
      
      If the total number of entries would exceed the capacity of a
      single skb, we simply resort to using frag_list as we do now.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5d38a079
    • H
      gro: Use gso_size to store MSS · b530256d
      Herbert Xu 提交于
      In order to allow GRO packets without frag_list at all, we need to
      store the MSS in the packet itself.  The obvious place is gso_size.
      The only thing to watch out for is if the packet ends up not being
      GRO then we need to clear gso_size before pushing the packet into
      the stack.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b530256d
  16. 30 12月, 2008 2 次提交
  17. 27 12月, 2008 1 次提交
  18. 26 12月, 2008 1 次提交
  19. 23 12月, 2008 1 次提交
  20. 18 12月, 2008 3 次提交
  21. 16 12月, 2008 4 次提交
    • H
      ethtool: Add GGRO and SGRO ops · b240a0e5
      Herbert Xu 提交于
      This patch adds the ethtool ops to enable and disable GRO.  It also
      makes GRO depend on RX checksum offload much the same as how TSO
      depends on SG support.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b240a0e5
    • H
      net: Add skb_gro_receive · 71d93b39
      Herbert Xu 提交于
      This patch adds the helper skb_gro_receive to merge packets for
      GRO.  The current method is to allocate a new header skb and then
      chain the original packets to its frag_list.  This is done to
      make it easier to integrate into the existing GSO framework.
      
      In future as GSO is moved into the drivers, we can undo this and
      simply chain the original packets together.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      71d93b39
    • H
      net: Add Generic Receive Offload infrastructure · d565b0a1
      Herbert Xu 提交于
      This patch adds the top-level GRO (Generic Receive Offload) infrastructure.
      This is pretty similar to LRO except that this is protocol-independent.
      Instead of holding packets in an lro_mgr structure, they're now held in
      napi_struct.
      
      For drivers that intend to use this, they can set the NETIF_F_GRO bit and
      call napi_gro_receive instead of netif_receive_skb or just call netif_rx.
      The latter will call napi_receive_skb automatically.  When napi_gro_receive
      is used, the driver must either call napi_complete/napi_rx_complete, or
      call napi_gro_flush in softirq context if the driver uses the primitives
      __napi_complete/__napi_rx_complete.
      
      Protocols will set the gro_receive and gro_complete function pointers in
      order to participate in this scheme.
      
      In addition to the packet, gro_receive will get a list of currently held
      packets.  Each packet in the list has a same_flow field which is non-zero
      if it is a potential match for the new packet.  For each packet that may
      match, they also have a flush field which is non-zero if the held packet
      must not be merged with the new packet.
      
      Once gro_receive has determined that the new skb matches a held packet,
      the held packet may be processed immediately if the new skb cannot be
      merged with it.  In this case gro_receive should return the pointer to
      the existing skb in gro_list.  Otherwise the new skb should be merged into
      the existing packet and NULL should be returned, unless the new skb makes
      it impossible for any further merges to be made (e.g., FIN packet) where
      the merged skb should be returned.
      
      Whenever the skb is merged into an existing entry, the gro_receive
      function should set NAPI_GRO_CB(skb)->same_flow.  Note that if an skb
      merely matches an existing entry but can't be merged with it, then
      this shouldn't be set.
      
      If gro_receive finds it pointless to hold the new skb for future merging,
      it should set NAPI_GRO_CB(skb)->flush.
      
      Held packets will be flushed by napi_gro_flush which is called by
      napi_complete and napi_rx_complete.
      
      Currently held packets are stored in a singly liked list just like LRO.
      The list is limited to a maximum of 8 entries.  In future, this may be
      expanded to use a hash table to allow more flows to be held for merging.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d565b0a1
    • H
      net: Add frag_list support to GSO · 1a881f27
      Herbert Xu 提交于
      This patch allows GSO to handle frag_list in a limited way for the
      purposes of allowing packets merged by GRO to be refragmented on
      output.
      
      Most hardware won't (and aren't expected to) support handling GRO
      frag_list packets directly.  Therefore we will perform GSO in
      software for those cases.
      
      However, for drivers that can support it (such as virtual NICs) we
      may not have to segment the packets at all.
      
      Whether the added overhead of GRO/GSO is worthwhile for bridges
      and routers when weighed against the benefit of potentially
      increasing the MTU within the host is still an open question.
      However, for the case of host nodes this is undoubtedly a win.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1a881f27