1. 10 Mar, 2009 1 commit
  2. 05 Mar, 2009 2 commits
  3. 04 Mar, 2009 1 commit
  4. 03 Mar, 2009 2 commits
  5. 01 Mar, 2009 1 commit
    • netpoll: Add drop checks to all entry points · 4ead4431
      Herbert Xu committed
      The netpoll entry checks are required to ensure that we don't
      receive normal packets when invoked via netpoll.  Unfortunately
      it only ever worked for the netif_receive_skb/netif_rx entry
      points.  The VLAN (and subsequently GRO) entry point didn't
      have the check and therefore can trigger all sorts of weird
      problems.
      
      This patch adds the netpoll check to all entry points.
      
      I'm still uneasy with receiving at all under netpoll (which
      apparently is only used by the out-of-tree kdump code).  The
      reason is it is perfectly legal to receive all data including
      headers into highmem if netpoll is off, but if you try to do
      that with netpoll on and someone gets a printk in an IRQ handler
      you're going to get a nice BUG_ON.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
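The idea of guarding every receive entry point with the same check can be sketched in user-space C as follows. This is only an illustration under stated assumptions: `netpoll_trapped`, `netpoll_should_drop()` and the three `model_*` entry points are hypothetical names, not the kernel's functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for "netpoll currently owns the device". */
static bool netpoll_trapped;

enum rx_verdict { RX_DELIVERED, RX_DROPPED };

/* Shared check: true when a normal packet must not be delivered. */
static bool netpoll_should_drop(void)
{
    return netpoll_trapped;
}

/* Common tail that every entry point reaches once the check passed. */
static enum rx_verdict deliver_to_stack(void)
{
    return RX_DELIVERED;
}

/* Models the classic entry point, which already had the check. */
static enum rx_verdict model_netif_receive(void)
{
    if (netpoll_should_drop())
        return RX_DROPPED;
    return deliver_to_stack();
}

/* Models the VLAN entry point, which now performs the same check. */
static enum rx_verdict model_vlan_receive(void)
{
    if (netpoll_should_drop())
        return RX_DROPPED;
    return deliver_to_stack();
}

/* Models the GRO entry point, which now performs the same check. */
static enum rx_verdict model_gro_receive(void)
{
    if (netpoll_should_drop())
        return RX_DROPPED;
    return deliver_to_stack();
}
```

With the check in every entry point, no path can bypass it while netpoll is active.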
  6. 27 Feb, 2009 4 commits
  7. 25 Feb, 2009 1 commit
    • netlink: change nlmsg_notify() return value logic · 1ce85fe4
      Pablo Neira Ayuso committed
      This patch changes the return value of nlmsg_notify() as follows:
      
      If NETLINK_BROADCAST_ERROR is set by any of the listeners and
      an error in the delivery happened, return the broadcast error;
      else if there are no listeners apart from the socket that
      requested a change with the echo flag, return the result of the
      unicast notification. Thus, with this patch, the unicast
      notification is handled in the same way as a broadcast listener
      that has set the NETLINK_BROADCAST_ERROR socket flag.
      
      This patch is useful when the caller of nlmsg_notify() wants to
      know the result of the delivery of a netlink notification
      (including the broadcast delivery) and take action if the
      delivery failed. For example, ctnetlink can drop packets when
      event delivery fails, which provides reliable logging and
      state synchronization at the cost of dropping packets.
      
      This patch also modifies the rtnetlink code to ignore the return
      value of rtnl_notify() in all callers. Before this patch,
      rtnl_notify() returned the error of the unicast notification,
      which made rtnl_set_sk_err() report errors to all listeners. This
      is of no help, since the origin of the change (the socket that
      requested the echoing) notices the ENOBUFS error if the notification
      fails and should resync itself.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
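The return-value rule described above can be modelled as a small pure function. This is a sketch, not the kernel's nlmsg_notify(): `notify_result()` and its parameters are hypothetical, and it assumes that the broadcast step reports -ESRCH when no listener received the message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of how the two delivery results are combined.
 *   has_group      - a multicast group was given, so a broadcast is attempted
 *   broadcast_err  - result of the broadcast (0, or a negative errno)
 *   echo_requested - the requesting socket asked for an echo (unicast)
 *   unicast_err    - result of the unicast notification
 */
static int notify_result(bool has_group, int broadcast_err,
                         bool echo_requested, int unicast_err)
{
    int err = 0;

    if (has_group)
        err = broadcast_err;

    if (echo_requested) {
        /*
         * Assumption: -ESRCH means "no broadcast listeners". In that
         * case, or when the broadcast succeeded, the unicast result
         * is what the caller gets to see.
         */
        if (err == 0 || err == -ESRCH)
            err = unicast_err;
    }

    return err;
}
```

A broadcast error therefore takes priority, and the unicast result is reported only when the broadcast had nothing to complain about.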
  8. 24 Feb, 2009 2 commits
  9. 23 Feb, 2009 1 commit
    • netns: Remove net_alive · ce16c533
      Eric W. Biederman committed
      It turns out that net_alive is unnecessary, and the original problem
      that led to it being added was simply that the icmp code thought
      it was a network device and wound up being unable to handle packets
      while there were still packets in the network namespace.
      
      Now that icmp and tcp have been fixed to properly register themselves
      this problem is no longer present and we have a stronger guarantee
      that packets will not arrive in a network namespace than that provided
      by net_alive in netif_receive_skb.  So remove net_alive, allowing
      packet reception to run a little faster.
      
      Additionally document the strong reason why network namespace cleanup
      is safe so that if something happens again someone else will have
      a chance of figuring it out.
      Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 22 Feb, 2009 1 commit
    • netns: fix double free at netns creation · 486a87f1
      Daniel Lezcano committed
      This patch fixes a double free when network namespace creation fails.
      The previous code does a kfree of the net_generic structure when
      the initialization of one of the subsystems fails.
      The 'setup_net' function does kfree(ng) and returns an error.
      The caller, 'copy_net_ns', calls net_free on error, which in turn
      calls kfree(net->gen), so this pointer is freed twice.
      
      This patch makes the code symmetric: net_alloc does the net_generic
      allocation and net_free frees the net_generic.
      Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
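The symmetric ownership rule (the allocator of the outer object also allocates the inner one, and only the matching free releases it) can be sketched in user-space C. All `*_model` names and the `gen_frees` counter are hypothetical; malloc/free stand in for the kernel allocators.

```c
#include <assert.h>
#include <stdlib.h>

struct net_generic_model { int len; };
struct net_model { struct net_generic_model *gen; };

/* Counts how often a net_generic_model was released (for inspection). */
static int gen_frees;

/* Allocates the namespace and its generic array together. */
static struct net_model *net_alloc_model(void)
{
    struct net_model *net = calloc(1, sizeof(*net));

    if (!net)
        return NULL;
    net->gen = calloc(1, sizeof(*net->gen));
    if (!net->gen) {
        free(net);
        return NULL;
    }
    return net;
}

/* The only place that frees net->gen. */
static void net_free_model(struct net_model *net)
{
    if (!net)
        return;
    free(net->gen);
    gen_frees++;
    free(net);
}

/* Subsystem setup; on failure it reports an error and frees nothing. */
static int setup_net_model(struct net_model *net, int fail)
{
    (void)net;
    return fail ? -1 : 0;
}

/* Caller: cleans up through net_free_model() on error. */
static struct net_model *copy_net_ns_model(int fail)
{
    struct net_model *net = net_alloc_model();

    if (!net)
        return NULL;
    if (setup_net_model(net, fail) < 0) {
        net_free_model(net);
        return NULL;
    }
    return net;
}
```

Because the setup step no longer frees anything, the error path releases the generic array exactly once.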
  11. 21 Feb, 2009 1 commit
  12. 20 Feb, 2009 1 commit
  13. 19 Feb, 2009 1 commit
  14. 18 Feb, 2009 1 commit
    • net: Kill skb_truesize_check(), it only catches false-positives. · 92a0acce
      David S. Miller committed
      A long time ago we had bugs, primarily in TCP, where we would modify
      skb->truesize (for TSO queue collapsing) in ways which would corrupt
      the socket memory accounting.
      
      skb_truesize_check() was added in order to try and catch this error
      more systematically.
      
      However this debugging check has morphed into a Frankenstein of sorts
      and these days it does nothing other than catch false-positives.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 16 Feb, 2009 3 commits
    • d24fff22
    • net: socket infrastructure for SO_TIMESTAMPING · 20d49473
      Patrick Ohly committed
      The overlap with the old SO_TIMESTAMP[NS] options is handled so
      that time stamping in software (net_enable_timestamp()) is
      enabled when SO_TIMESTAMP[NS] and/or SO_TIMESTAMPING_RX_SOFTWARE
      is set.  It's disabled if all of these are off.
      Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
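The enable/disable rule for software time stamping can be sketched as a transition on a per-socket flag set. This is a hypothetical user-space model: `WANT_TIMESTAMP`, `WANT_RX_SOFTWARE`, `ts_sock_model` and the `netstamp_users` counter are illustrative names, with the counter standing in for the effect of net_enable_timestamp() and its counterpart.

```c
#include <assert.h>
#include <stdbool.h>

enum {
    WANT_TIMESTAMP   = 1u << 0, /* stands in for SO_TIMESTAMP[NS] */
    WANT_RX_SOFTWARE = 1u << 1  /* stands in for SO_TIMESTAMPING_RX_SOFTWARE */
};

struct ts_sock_model { unsigned int ts_flags; };

/* Number of sockets that currently need software time stamps. */
static int netstamp_users;

static bool wants_sw_stamp(const struct ts_sock_model *sk)
{
    return (sk->ts_flags & (WANT_TIMESTAMP | WANT_RX_SOFTWARE)) != 0;
}

/* Set or clear one flag; adjust the global count only on a transition. */
static void set_ts_flag(struct ts_sock_model *sk, unsigned int flag, bool on)
{
    bool before = wants_sw_stamp(sk);
    bool after;

    if (on)
        sk->ts_flags |= flag;
    else
        sk->ts_flags &= ~flag;
    after = wants_sw_stamp(sk);

    if (!before && after)
        netstamp_users++;   /* first option switched on */
    else if (before && !after)
        netstamp_users--;   /* last option switched off */
}
```

Setting a second option on the same socket does not enable time stamping twice, and it stays enabled until the last option is cleared.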
    • net: infrastructure for hardware time stamping · ac45f602
      Patrick Ohly committed
      The additional per-packet information (16 bytes for time stamps, 1
      byte for flags) is stored for all packets in the skb_shared_info
      struct. This implementation detail is hidden from users of that
      information via skb_* accessor functions. A separate struct and
      union, respectively, are used for the additional information so
      that it can be stored/copied easily outside of skb_shared_info.
      
      Compared to previous implementations (reusing the tstamp field
      depending on the context, optional additional structures) this
      is the simplest solution. It does not extend sk_buff itself.
      
      TX time stamping is implemented in software if the device driver
      doesn't support hardware time stamping.
      
      The new semantics for hardware/software time stamping around
      ndo_start_xmit() are based on two assumptions about existing
      network device drivers which don't support hardware time
      stamping and know nothing about it:
       - they leave the new skb_shared_tx unmodified
       - they keep the connection to the originating socket in skb->sk
         alive, i.e., don't call skb_orphan()
      
      Given that skb_shared_tx is new, the first assumption is safe.
      The second is only true for some drivers. As a result, software
      TX time stamping currently works with the bnx2 driver, but not
      with the unmodified igb driver (the two drivers this patch series
      was tested with).
      Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
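The storage layout described above (16 bytes of time stamps plus 1 byte of flags, kept in the shared info and reached only through accessors) can be sketched as follows. Every name here is hypothetical, including the field names and flag bits; a plain one-byte struct is used for the flags instead of the union mentioned in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Two 8-byte time stamps, 16 bytes in total (field names are assumed). */
struct hwtstamps_model {
    int64_t stamp_hw;
    int64_t stamp_sys;
};

/* One byte of per-packet TX flags (bit meanings are assumed). */
#define TX_FLAG_HARDWARE 0x01u
#define TX_FLAG_SOFTWARE 0x02u
struct tx_flags_model { uint8_t flags; };

/* Stand-in for the shared info that holds the extra data. */
struct shared_info_model {
    int nr_frags;
    struct hwtstamps_model hwtstamps;
    struct tx_flags_model tx;
};

struct packet_model { struct shared_info_model shinfo; };

/* Accessors hide where the data lives from their callers. */
static struct hwtstamps_model *packet_hwtstamps(struct packet_model *p)
{
    return &p->shinfo.hwtstamps;
}

static struct tx_flags_model *packet_tx(struct packet_model *p)
{
    return &p->shinfo.tx;
}
```

Since the stamps and flags are separate types, a caller can copy them out by plain assignment without knowing about the enclosing structure.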
  16. 13 Feb, 2009 2 commits
  17. 10 Feb, 2009 1 commit
  18. 09 Feb, 2009 2 commits
  19. 07 Feb, 2009 1 commit
  20. 06 Feb, 2009 2 commits
  21. 05 Feb, 2009 2 commits
    • net: Reexport sock_alloc_send_pskb · 4cc7f68d
      Herbert Xu committed
      The function sock_alloc_send_pskb is completely useless if not
      exported since most of the code in it won't be used as is.  In
      fact, this code has already been duplicated in the tun driver.
      
      Now that we need accounting in the tun driver, we can in fact
      use this function as is.  So this patch marks it for export again.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Partially allow skb destructors to be used on receive path · 9a279bcb
      Herbert Xu committed
      As it currently stands, skb destructors are forbidden on the
      receive path because the protocol end-points will overwrite
      any existing destructor with their own.
      
      This is the reason why we have to call skb_orphan in the loopback
      driver before we reinject the packet back into the stack, thus
      creating a period during which loopback traffic isn't charged
      to any socket.
      
      With virtualisation, we have a similar problem in that traffic
      is reinjected into the stack without being associated with any
      socket entity, thus providing no natural congestion push-back
      for those poor folks still stuck with UDP.
      
      Now had we been consistent in telling them that UDP simply has
      no congestion feedback, I could just fob them off.  Unfortunately,
      we appear to have gone to some length in catering for this on
      the standard UDP path, with skb/socket accounting so that has
      created a very unhealthy dependency.
      
      Alas habits are difficult to break out of, so we may just have
      to allow skb destructors on the receive path.
      
      It turns out that making skb destructors useable on the receive path
      isn't as easy as it seems.  For instance, simply adding skb_orphan
      to skb_set_owner_r isn't enough.  This is because we assume all
      over the IP stack that skb->sk is an IP socket if present.
      
      The new transparent proxy code goes one step further and assumes
      that skb->sk is the receiving socket if present.
      
      Now all of this can be dealt with by adding simple checks such
      as only treating skb->sk as an IP socket if skb->sk->sk_family
      matches.  However, it turns out that for bridging at least we
      don't need to do all of this work.
      
      This is of interest because most virtualisation setups use bridging
      so we don't actually go through the IP stack on the host (with
      the exception of our old nemesis the bridge netfilter, but that's
      easily taken care of).
      
      So this patch simply adds skb_orphan to the point just before we
      enter the IP stack, but after we've gone through the bridge on the
      receive path.  It also adds an skb_orphan to the one place in
      netfilter that touches skb->sk/skb->destructor, that is, tproxy.
      
      One word of caution: because of the internal code structure, anyone
      wishing to deploy this must use skb_set_owner_w as opposed to
      skb_set_owner_r since many functions that create a new skb from
      an existing one will invoke skb_set_owner_w on the new skb.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
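The role of orphaning in socket accounting can be sketched in user-space C: a buffer charged to a socket carries a destructor that uncharges it, and orphaning runs that destructor before the buffer moves on. The `*_model` types and functions are hypothetical and only mirror the behaviour the commit message describes.

```c
#include <assert.h>
#include <stddef.h>

struct acct_sock_model { int wmem; };   /* bytes charged to the socket */

struct skb_model {
    struct acct_sock_model *sk;
    void (*destructor)(struct skb_model *skb);
    int truesize;
};

/* Destructor: give the charged bytes back to the owning socket. */
static void wfree_model(struct skb_model *skb)
{
    skb->sk->wmem -= skb->truesize;
}

/* Drop any owner: run the destructor once, then forget the socket. */
static void orphan_model(struct skb_model *skb)
{
    if (skb->destructor)
        skb->destructor(skb);
    skb->destructor = NULL;
    skb->sk = NULL;
}

/* Charge the buffer to a socket, releasing a previous owner first. */
static void set_owner_w_model(struct skb_model *skb,
                              struct acct_sock_model *sk)
{
    orphan_model(skb);
    skb->sk = sk;
    skb->destructor = wfree_model;
    sk->wmem += skb->truesize;
}
```

Orphaning just before the packet enters the IP stack means later code never sees a foreign socket in the buffer, while the charge stays in place for as long as the packet is still in the bridge path.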
  22. 01 Feb, 2009 2 commits
  23. 30 Jan, 2009 5 commits
    • gro: Open-code memcpy in napi_fraginfo_skb · 80595d59
      Herbert Xu committed
      This patch optimises napi_fraginfo_skb to only copy the bits
      necessary.  We also open-code the memcpy so that the alignment
      information is always available to gcc.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gro: Do not merge paged packets into frag_list · 81705ad1
      Herbert Xu committed
      Bigger is not always better :)
      
      It was easy to continue to merge packets into frag_list after the
      page array is full.  However, this turns out to be worse than LRO
      because frag_list is a much less efficient form of storage than the
      page array.  So we're better off stopping the merge and starting
      a new entry with an empty page array.
      
      In future we can optimise this further by doing frag_list merging
      but making sure that we continue to fill in the page array.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
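The "stop merging when the page array is full" policy can be sketched as below. This is a simplified model with hypothetical names; `MODEL_MAX_FRAGS` is an arbitrary illustrative capacity, not the kernel's limit.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_FRAGS 4   /* hypothetical capacity of the page array */

struct gro_entry_model {
    int nr_frags;           /* page-array slots already used */
    int merged_packets;     /* packets merged into this entry */
};

/* Merge only if the page array can hold the packet's frags. */
static bool gro_try_merge(struct gro_entry_model *e, int pkt_frags)
{
    if (e->nr_frags + pkt_frags > MODEL_MAX_FRAGS)
        return false;
    e->nr_frags += pkt_frags;
    e->merged_packets++;
    return true;
}

/*
 * Receive one packet. Returns the number of entries flushed (0 or 1).
 * When the array is full, the old entry is flushed and the packet
 * starts a new entry with an empty page array instead of spilling
 * into a secondary list.
 */
static int gro_receive_model(struct gro_entry_model *e, int pkt_frags)
{
    assert(pkt_frags > 0 && pkt_frags <= MODEL_MAX_FRAGS);

    if (gro_try_merge(e, pkt_frags))
        return 0;

    e->nr_frags = 0;
    e->merged_packets = 0;
    gro_try_merge(e, pkt_frags);
    return 1;
}
```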
    • gro: Avoid copying headers of unmerged packets · 86911732
      Herbert Xu committed
      Unfortunately simplicity isn't always the best.  The fraginfo
      interface turned out to be suboptimal.  The problem was quite
      obvious.  For every packet, we have to copy the headers from
      the frags structure into skb->head, even though for 99% of the
      packets this part is immediately thrown away after the merge.
      
      LRO didn't have this problem because it directly read the headers
      from the frags structure.
      
      This patch attempts to address this by creating an interface
      that allows GRO to access the headers in the first frag without
      having to copy it.  Because all drivers that use frags place the
      headers in the first frag this optimisation should be enough.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gro: Move common completion code into helpers · 5d0d9be8
      Herbert Xu committed
      Currently VLAN still has a bit of common code handling the aftermath
      of GRO that's shared with the common path.  This patch moves it
      into shared helpers to reduce code duplication.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Fix OOPS in skb_seq_read(). · 71b3346d
      Shyam Iyer committed
      It oopsed for me in skb_seq_read. addr2line said it was
      linux-2.6/net/core/skbuff.c:2228, which is this line:
      
      	while (st->frag_idx < skb_shinfo(st->cur_skb)->nr_frags) {
      
      I added some printks in there and it looks like we hit this:
      
              } else if (st->root_skb == st->cur_skb &&
                         skb_shinfo(st->root_skb)->frag_list) {
                      st->cur_skb = skb_shinfo(st->root_skb)->frag_list;
                      st->frag_idx = 0;
                      goto next_skb;
              }
      
      Actually I did some testing and added a few printks and found that the
      st->cur_skb->data was 0 and hence the ptr used by iscsi_tcp was null.
      This caused the kernel panic.
      
       	if (abs_offset < block_limit) {
      -		*data = st->cur_skb->data + abs_offset;
      +		*data = st->cur_skb->data + (abs_offset - st->stepped_offset);
      
      I enabled debug_tcp and, with a few printks, found that the code did
      not go to the next_skb label. The sequence being followed was this:
      
      It hit this if condition:
      
              if (st->cur_skb->next) {
                      st->cur_skb = st->cur_skb->next;
                      st->frag_idx = 0;
                      goto next_skb;
      
      And so, now the st pointer is shifted to the next skb whereas actually
      it should have hit the second else if first since the data is in the
      frag_list.
      
              else if (st->root_skb == st->cur_skb &&
                       skb_shinfo(st->root_skb)->frag_list) {
                      st->cur_skb = skb_shinfo(st->root_skb)->frag_list;
                      goto next_skb;
              }
      
      By reversing the two conditions, the attached patch fixes the issue
      for me on top of Herbert's patches.
      Signed-off-by: Shyam Iyer <shyam_iyer@dell.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
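The offset arithmetic in the one-line diff above (subtracting the bytes already stepped over before indexing into the current buffer) can be sketched with a simplified sequential reader over a chain of buffers. The `seg_model` and `seq_state_model` types and `seq_read_model()` are hypothetical; the sketch assumes callers pass non-decreasing absolute offsets.

```c
#include <assert.h>
#include <stddef.h>

struct seg_model {
    const char *data;
    size_t len;
    const struct seg_model *next;
};

struct seq_state_model {
    const struct seg_model *cur;   /* buffer currently being read */
    size_t stepped_offset;         /* bytes in buffers already passed */
};

/*
 * Point *data at the byte with absolute offset `consumed` and return
 * how many contiguous bytes are available there, or 0 at the end.
 */
static size_t seq_read_model(size_t consumed, const char **data,
                             struct seq_state_model *st)
{
    while (st->cur) {
        size_t block_limit = st->stepped_offset + st->cur->len;

        if (consumed < block_limit) {
            /* Index relative to this buffer, not the whole chain. */
            *data = st->cur->data + (consumed - st->stepped_offset);
            return block_limit - consumed;
        }
        st->stepped_offset += st->cur->len;
        st->cur = st->cur->next;
    }
    return 0;
}
```

Without the subtraction, a read at absolute offset 3 in the second buffer would index 3 bytes into that buffer rather than at its start.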