1. 13 Apr 2010, 1 commit
    • net: sk_dst_cache RCUification · b6c6712a
      Committed by Eric Dumazet
      With the latest CONFIG_PROVE_RCU stuff, I felt more comfortable making
      this work.
      
      sk->sk_dst_cache is currently protected by a rwlock (sk_dst_lock).
      
      This rwlock is read-locked for a very short time, and dst entries are
      already freed after an RCU grace period. This calls for RCU again :)
      
      This patch converts sk_dst_lock to a spinlock and uses RCU for readers.
      
      __sk_dst_get() is supposed to be called either under rcu_read_lock() or
      with the socket locked by the user, so use the appropriate
      rcu_dereference_check() condition:
      (rcu_read_lock_held() || sock_owned_by_user(sk)).
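      A toy Python model of the access rule described above, for illustration
      only. Python has no RCU, so a per-thread nesting counter stands in for
      rcu_read_lock(); Sock, sk_dst_get_unlocked() and sk_dst_set() are
      hypothetical names, not kernel APIs. Readers take no lock and change no
      refcount, while writers still serialize on the (now spin)lock.

```python
import threading
from contextlib import contextmanager

# Toy stand-in for RCU read-side tracking: nesting depth per thread.
_rcu_state = threading.local()


@contextmanager
def rcu_read_lock():
    """Mark the current thread as inside an RCU read-side critical section."""
    _rcu_state.depth = getattr(_rcu_state, "depth", 0) + 1
    try:
        yield
    finally:
        _rcu_state.depth -= 1


def rcu_read_lock_held():
    return getattr(_rcu_state, "depth", 0) > 0


class Sock:
    """Hypothetical stand-in for struct sock with only the fields used here."""

    def __init__(self):
        self.dst_cache = None             # models sk->sk_dst_cache
        self.dst_lock = threading.Lock()  # models sk_dst_lock (now a spinlock)
        self.owned_by_user = False        # models sock_owned_by_user(sk)


def sk_dst_get_unlocked(sk):
    """Reader path, modeled on __sk_dst_get(): no lock, no refcount change."""
    # Same condition the commit passes to rcu_dereference_check().
    if not (rcu_read_lock_held() or sk.owned_by_user):
        raise RuntimeError("sk_dst_cache read outside RCU or socket lock")
    return sk.dst_cache


def sk_dst_set(sk, dst):
    """Writer path: writers still serialize on dst_lock."""
    with sk.dst_lock:
        old = sk.dst_cache
        sk.dst_cache = dst  # single reference store; stands in for rcu_assign_pointer()
    # In the kernel the old entry's reference is dropped and the entry itself
    # is freed only after an RCU grace period; this toy just returns it.
    return old
```

      The toy does not model grace periods; what makes the lock-free read safe
      in the kernel is that a replaced dst entry is not freed until every
      reader that might still see it has left its RCU read-side section.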
      
      For example, this patch avoids two atomic ops per transmitted packet on
      connected UDP sockets, and it means sk_dst_lock is dirtied much less often.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 12 Apr 2010, 3 commits
  3. 11 Apr 2010, 1 commit
  4. 09 Apr 2010, 2 commits
    • tcp: Set CHECKSUM_UNNECESSARY in tcp_init_nondata_skb · 2626419a
      Committed by David S. Miller
      Back in commit 04a0551c
      ("loopback: Drop obsolete ip_summed setting") we stopped
      setting CHECKSUM_UNNECESSARY in the loopback xmit.
      
      This is because such a setting was a lie since it implies that the
      checksum field of the packet is properly filled in.
      
      Instead what happens normally is that CHECKSUM_PARTIAL is set and
      skb->csum is calculated as needed.
      
      But this was only happening for TCP data packets (via the
      skb->ip_summed assignment done in tcp_sendmsg()).  It doesn't
      happen for non-data packets like ACKs etc.
      
      Fix this by setting skb->ip_summed in the common non-data packet
      constructor.  It already is setting skb->csum to zero.
      
      But this reminds us that we still have things like ip_output.c's
      ip_dev_loopback_xmit() which sets skb->ip_summed to the value
      CHECKSUM_UNNECESSARY, which Herbert's patch teaches us is not
      valid.  So we'll have to address that at some point too.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: fix for unicast RX path optimization · 1223c67c
      Committed by Jorge Boncompte [DTI2]
      Commits 5051ebd2 and
      5051ebd2 ("ipv[46]: udp: optimize unicast RX
      path") broke some programs.
      
      After upgrading an L2TP server to 2.6.33, it started to fail once the
      10th tunnel came up, with tunnels going up and down. My modified rp-l2tp
      uses a global unconnected socket bound to (INADDR_ANY, 1701) and one
      connected socket per tunnel after parameter negotiation.
      
      Once ten sockets were open, the kernel started to drop packets because
      of mixed-up parameters passed to udp[46]_lib_lookup2().
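      The lookup involved picks the most specific socket matching a packet, so
      a connected per-tunnel socket should win over the unconnected socket
      bound to (INADDR_ANY, 1701). The sketch below is a hypothetical,
      simplified scoring model (not the kernel's lookup code; the fields and
      weights are assumptions) that shows why the order of source and
      destination arguments matters. As far as we know, kernels of that era
      only took the address-hashed lookup path once a port's hash slot held
      more than ten sockets, which fits the "10th tunnel" symptom.

```python
from dataclasses import dataclass


@dataclass
class UdpSock:
    """Hypothetical simplified UDP socket; "0.0.0.0" / port 0 mean wildcard."""
    name: str
    local_addr: str
    local_port: int
    remote_addr: str = "0.0.0.0"
    remote_port: int = 0


def compute_score(sk, saddr, sport, daddr, dport):
    """Score how specifically sk matches a packet saddr:sport -> daddr:dport.
    Returns -1 if the socket cannot match; each bound field adds 2."""
    if sk.local_port != dport:
        return -1
    score = 0
    if sk.local_addr != "0.0.0.0":
        if sk.local_addr != daddr:
            return -1
        score += 2
    if sk.remote_addr != "0.0.0.0":  # connected socket: peer must match
        if sk.remote_addr != saddr:
            return -1
        score += 2
    if sk.remote_port != 0:
        if sk.remote_port != sport:
            return -1
        score += 2
    return score


def udp_lookup(socks, saddr, sport, daddr, dport):
    """Return the highest-scoring matching socket, or None."""
    best, best_score = None, -1
    for sk in socks:
        score = compute_score(sk, saddr, sport, daddr, dport)
        if score > best_score:
            best, best_score = sk, score
    return best
```

      With the arguments in the right order a packet from the tunnel peer
      reaches the connected socket; with source and destination swapped, the
      connected socket no longer matches at all.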
      Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 07 Apr 2010, 1 commit
    • xfrm: cache bundles instead of policies for outgoing flows · 80c802f3
      Committed by Timo Teräs
      __xfrm_lookup() is called for each packet transmitted out of the
      system. xfrm_find_bundle() does a linear search, which can kill
      system performance depending on how many bundles are required per
      policy.
      
      This modifies __xfrm_lookup() to store bundles directly in the flow
      cache. If we do not get a hit, we simply create a new bundle instead
      of doing a slow search. This means that we can now get multiple
      xfrm_dst's for the same flow (on a per-cpu basis).
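      A toy Python model of the idea: a per-CPU cache maps a flow key straight
      to a bundle, and a miss simply builds a new bundle instead of scanning
      existing ones. PerCpuFlowCache and Bundle are hypothetical names; flow
      cache invalidation and bundle validation are not modeled.

```python
import itertools
from collections import defaultdict

_bundle_ids = itertools.count(1)


class Bundle:
    """Hypothetical stand-in for an xfrm_dst bundle."""

    def __init__(self, flow):
        self.flow = flow
        self.id = next(_bundle_ids)


class PerCpuFlowCache:
    """Toy per-CPU flow cache: flow key -> bundle, one table per CPU."""

    def __init__(self):
        self.tables = defaultdict(dict)  # cpu -> {flow: Bundle}
        self.created = 0

    def lookup(self, cpu, flow):
        table = self.tables[cpu]
        bundle = table.get(flow)         # hash lookup, no linear scan
        if bundle is None:
            # Miss: create a fresh bundle rather than searching for one.
            bundle = Bundle(flow)
            table[flow] = bundle
            self.created += 1
        return bundle
```

      Because each CPU keeps its own table, the same flow seen on two CPUs
      ends up with two bundles, which is the "multiple xfrm_dst's for the same
      flow" trade-off the commit accepts in exchange for constant-time hits.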
      Signed-off-by: Timo Teras <timo.teras@iki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 04 Apr 2010, 2 commits
    • icmp: Account for ICMP out errors · 1f8438a8
      Committed by Eric Dumazet
      When ip_append_data() fails because of the socket limit or a memory
      shortage, increment the ICMP_MIB_OUTERRORS counter so that "netstat -s"
      can report these errors.
      
      LANG=C netstat -s | grep "ICMP messages failed"
          0 ICMP messages failed
      
      For IPv6, implement the ICMP6_MIB_OUTERRORS counter as well.
      
      # grep Icmp6OutErrors /proc/net/dev_snmp6/*
      /proc/net/dev_snmp6/eth0:Icmp6OutErrors                   	0
      /proc/net/dev_snmp6/lo:Icmp6OutErrors                   	0
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: convert multicast list to list_head · 22bedad3
      Committed by Jiri Pirko
      Converts the list, and the core code that manipulates it, to work the
      same way as uc_list.
      
      + uses two functions for adding/removing an mc address (a normal and a
        "global" variant) instead of a function parameter.
      + removes dev_mcast.c completely.
      + exposes netdev_hw_addr_list_* macros along with __hw_addr_* functions
        for manipulating lists in a sandbox (used in bonding and 80211 drivers).
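      A hypothetical Python sketch of a reference-counted address list that
      exposes separate "normal" and "global" entry points instead of a boolean
      parameter. The rule that a global user holds at most one reference per
      address is an assumption modeled loosely on the kernel's __hw_addr_*
      helpers; every name here is illustrative.

```python
class HwAddr:
    """One list entry: an address, its refcount and a global-use flag."""

    def __init__(self, addr):
        self.addr = addr
        self.refcount = 0
        self.global_use = False


class HwAddrList:
    """Toy ordered address list (a Python list stands in for list_head)."""

    def __init__(self):
        self._entries = []

    def find(self, addr):
        return next((ha for ha in self._entries if ha.addr == addr), None)

    def _add(self, addr, is_global):
        ha = self.find(addr)
        if ha is None:
            ha = HwAddr(addr)
            self._entries.append(ha)
        if is_global:
            if ha.global_use:
                return  # global users share a single reference
            ha.global_use = True
        ha.refcount += 1

    def _del(self, addr, is_global):
        ha = self.find(addr)
        if ha is None or (is_global and not ha.global_use):
            raise KeyError(addr)
        if is_global:
            ha.global_use = False
        ha.refcount -= 1
        if ha.refcount == 0:
            self._entries.remove(ha)

    # Named variants instead of a "global" flag at every call site.
    def mc_add(self, addr):
        self._add(addr, False)

    def mc_add_global(self, addr):
        self._add(addr, True)

    def mc_del(self, addr):
        self._del(addr, False)

    def mc_del_global(self, addr):
        self._del(addr, True)

    def addrs(self):
        return [ha.addr for ha in self._entries]
```

      Splitting the variants keeps call sites self-describing
      (mc_add_global(addr) rather than mc_add(addr, True)) while both still
      share one private implementation.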
      Signed-off-by: Jiri Pirko <jpirko@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 02 Apr 2010, 2 commits
  8. 31 Mar 2010, 1 commit
  9. 30 Mar 2010, 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Committed by Tejun Heo
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      The percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming availability.  As this
      conversion needs to touch a large number of source files, the following
      script is used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  That is, if only gfp is
        used, gfp.h; if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surroundings.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
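      A much-reduced, hypothetical Python sketch of the first two rules above
      (pick slab.h or gfp.h, then place the include). It is not the
      slabh-sweep.py script linked above: the symbol lists are small assumed
      samples, and only the "alphabetical" and "append at the end" placement
      rules are implemented, not Christmas-tree ordering or include removal.

```python
import re

# Assumed, deliberately incomplete samples of slab and gfp API names.
SLAB_SYMBOLS = ("kmalloc", "kzalloc", "kcalloc", "krealloc", "kfree",
                "kmem_cache_create", "kmem_cache_alloc", "kmem_cache_free")
GFP_SYMBOLS = ("GFP_KERNEL", "GFP_ATOMIC", "GFP_NOFS", "gfp_t",
               "alloc_pages", "__get_free_pages", "free_pages")

INCLUDE_RE = re.compile(r"^#include <(linux/[\w./-]+)>[ \t]*$", re.MULTILINE)


def uses_any(source, symbols):
    """True if any symbol appears in source as a whole word."""
    return any(re.search(r"\b%s\b" % re.escape(s), source) for s in symbols)


def needed_header(source):
    """slab.h if slab is used, gfp.h if only gfp is used, else None."""
    if uses_any(source, SLAB_SYMBOLS):
        return "linux/slab.h"
    if uses_any(source, GFP_SYMBOLS):
        return "linux/gfp.h"
    return None


def add_include(source):
    """Return (new_source, error); error is set if no include block exists."""
    header = needed_header(source)
    present = INCLUDE_RE.findall(source)
    if header is None or header in present:
        return source, None
    if not present:
        return source, "no include block; add <%s> by hand" % header
    lines = source.splitlines(keepends=True)
    include_idx = [i for i, line in enumerate(lines) if INCLUDE_RE.match(line)]
    if present == sorted(present):
        # Alphabetical block: insert before the first header that sorts later.
        pos = next((i for i, h in zip(include_idx, present) if h > header),
                   include_idx[-1] + 1)
    else:
        # No recognisable order: put it after the last include.
        pos = include_idx[-1] + 1
    lines.insert(pos, "#include <%s>\n" % header)
    return "".join(lines), None
```

      Files with no linux/ include block are reported rather than edited,
      mirroring the "prints out an error message" behavior described above.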
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  10. 27 Mar 2010, 4 commits
  11. 25 Mar 2010, 1 commit
  12. 22 Mar 2010, 5 commits
  13. 21 Mar 2010, 1 commit
    • NET_DMA: free skbs periodically · 73852e81
      Committed by Steven J. Magnani
      Under NET_DMA, data transfer can grind to a halt when userland issues a
      large read on a socket with a high RCVLOWAT (e.g., 512 KB for both).
      This appears to be because the NET_DMA design queues up lots of memcpy
      operations, but doesn't issue or wait for them (and thus free the
      associated skbs) until it is time for tcp_recvmsg() to return.
      The socket hangs when its TCP window goes to zero before enough data is
      available to satisfy the read.
      
      Periodically issue asynchronous memcpy operations, and free skbs for ones
      that have completed, to prevent sockets from going into zero-window mode.
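      A toy Python model of why periodic issuing helps: one async copy is
      queued per skb, and an skb can only be freed once its copy has
      completed. ToyDmaEngine, recv_large_read() and the free_every threshold
      are hypothetical, and the toy treats every issued copy as complete. The
      returned peak count of held skbs is a rough stand-in for the receive
      buffer space that stays pinned during the read.

```python
from collections import deque


class ToyDmaEngine:
    """Hypothetical DMA engine: a queued copy counts as complete once issued."""

    def __init__(self):
        self.next_cookie = 0  # cookie handed to the next queued copy
        self.issued_upto = 0  # copies with cookie < issued_upto are done

    def queue_copy(self):
        cookie = self.next_cookie
        self.next_cookie += 1
        return cookie

    def issue_pending(self):
        self.issued_upto = self.next_cookie

    def is_complete(self, cookie):
        return cookie < self.issued_upto


def recv_large_read(skbs, engine, free_every=None):
    """Queue one async copy per skb and return the peak number of skbs held.

    Without free_every, nothing is issued or freed until the read returns.
    With free_every=N, every N skbs the queued copies are issued and the
    skbs whose copies have completed are freed."""
    held = deque()  # (cookie, skb) pairs awaiting copy completion
    peak = 0
    for n, skb in enumerate(skbs, 1):
        held.append((engine.queue_copy(), skb))
        peak = max(peak, len(held))
        if free_every and n % free_every == 0:
            engine.issue_pending()
            while held and engine.is_complete(held[0][0]):
                held.popleft()  # skb freed
    engine.issue_pending()  # final flush when the read returns
    held.clear()
    return peak
```

      Holding every skb until the end makes the pinned amount grow with the
      read size; issuing periodically bounds it by the issue interval.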
      Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 20 Mar 2010, 4 commits
  15. 19 Mar 2010, 2 commits
  16. 17 Mar 2010, 3 commits
  17. 12 Mar 2010, 1 commit
    • ipconfig: Handle devices which take some time to come up. · 964ad81c
      Committed by David S. Miller
      Some network devices, particularly USB ones, take several seconds to
      fully init and appear in the device list.
      
      If the user turned ipconfig on, they are using it for NFS root or some
      other early booting purpose.  So it makes no sense to just flat out
      fail immediately if the device isn't found.
      
      It also doesn't make sense to just jack up the initial wait to
      something crazy like 10 seconds.
      
      Instead, poll immediately, and then periodically once a second,
      waiting for a usable device to appear.  Fail after 12 seconds.
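      A minimal Python sketch of that polling policy, assuming a hypothetical
      find_device() callback that returns a device or None. The clock and
      sleep functions are injectable so the one-second interval and the
      12-second limit can be checked without actually waiting.

```python
import time


def wait_for_device(find_device, timeout=12.0, interval=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll immediately, then every `interval` seconds, until find_device()
    returns a device or `timeout` seconds have elapsed (then return None)."""
    deadline = clock() + timeout
    while True:
        dev = find_device()
        if dev is not None:
            return dev
        if clock() >= deadline:
            return None
        sleep(interval)
```

      Polling first means a device that is already present costs no delay,
      while a slow USB device still gets the full window.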
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Tested-by: Christian Pellegrin <chripell@fsfe.org>
  18. 10 Mar 2010, 1 commit
  19. 09 Mar 2010, 3 commits
    • tcp: Fix tcp_make_synack() · 28b2774a
      Committed by Eric Dumazet
      Commit 4957faad (TCPCT part 1g: Responder Cookie => Initiator), part
      of the TCP_COOKIE_TRANSACTION implementation, forgot to correctly size
      the synack skb when user data must be included.
      
      Many thanks to Mika Pentillä for spotting this error.
      Reported-by: Penttillä Mika <mika.penttila@ixonos.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fix route cache rebuilds · 98376387
      Committed by Eric Dumazet
      We added automatic route cache rebuilding in commit 1080d709 but had
      to correct a few bugs. One of the assumptions of the original patch
      was that entries were kept sorted in a given way.
      
      This assumption is known to be wrong (commit 1ddbcb00 gave an
      explanation of this and corrected a leak) and expensive to respect.
      
      Paweł Staszewski reported to me that one of his machines got its
      routing cache disabled after a few messages like:
      
      [ 2677.850065] Route hash chain too long!
      [ 2677.850080] Adjust your secret_interval!
      [82839.662993] Route hash chain too long!
      [82839.662996] Adjust your secret_interval!
      [155843.731650] Route hash chain too long!
      [155843.731664] Adjust your secret_interval!
      [155843.811881] Route hash chain too long!
      [155843.811891] Adjust your secret_interval!
      [155843.858209] vlan0811: 5 rebuilds is over limit, route caching disabled
      [155843.858212] Route hash chain too long!
      [155843.858213] Adjust your secret_interval!
      
      This is because rt_intern_hash() can be fooled when computing a chain
      length: multiple entries with the same keys can differ only in their
      TOS (or mark/oif) bits.
      
      In the rare case where the fast algorithm sees a too-long chain, and
      before taking the expensive path, we call a helper function so that
      duplicates of the same route that differ only in tos/mark/oif bits are
      not counted. This helper works with data already in the CPU cache and
      is not very expensive, despite its O(N^2) implementation.
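      A hypothetical Python sketch of such a helper: an entry is counted only
      if no earlier entry in the chain has the same key apart from its
      tos/mark/oif bits. The RouteKey fields (dst, src, iif, oif, tos, mark)
      are assumptions chosen for illustration, not the kernel's route cache
      structure.

```python
from typing import NamedTuple


class RouteKey(NamedTuple):
    """Hypothetical flattened key of one route cache entry."""
    dst: str
    src: str
    iif: int
    oif: int
    tos: int
    mark: int


def same_route(a, b):
    """True when two entries differ at most in their tos/mark/oif bits."""
    return a.dst == b.dst and a.src == b.src and a.iif == b.iif


def slow_chain_length(chain):
    """Count chain entries, skipping any entry that duplicates an earlier
    one apart from tos/mark/oif. O(N^2), but only run on rare long chains."""
    length = 0
    for i, entry in enumerate(chain):
        if not any(same_route(entry, prev) for prev in chain[:i]):
            length += 1
    return length
```

      A chain holding one destination under four TOS values plus one other
      destination has five entries, but counts as a chain of length two.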
      
      Paweł Staszewski successfully tested this patch on his loaded router.
      Reported-and-tested-by: Paweł Staszewski <pstaszewski@itcare.pl>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: Add SNMP counters for backlog and min_ttl drops · 6cce09f8
      Committed by Eric Dumazet
      Commit 6b03a53a (tcp: use limited socket backlog) added the possibility
      of dropping frames when the backlog queue is full.
      
      Commit d218d111 (tcp: Generalized TTL Security Mechanism) added the
      possibility of dropping frames when the TTL is under a given limit.
      
      This patch adds new SNMP MIB entries, named TCPBacklogDrop and
      TCPMinTTLDrop, published on the TcpExt: line of /proc/net/netstat.
      
      netstat -s | egrep "TCPBacklogDrop|TCPMinTTLDrop"
          TCPBacklogDrop: 0
          TCPMinTTLDrop: 0
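      /proc/net/netstat stores each group (TcpExt:, IpExt:, ...) as a line of
      counter names followed by a line of values with the same prefix, so the
      new counters can also be read without netstat. A small Python sketch;
      parse_proc_net_netstat() and tcp_drop_counters() are illustrative helper
      names, not part of any existing tool.

```python
def parse_proc_net_netstat(text):
    """Parse /proc/net/netstat-style text into {group: {counter: value}}.

    Each group is a header line of counter names followed by a line of
    values carrying the same "Group:" prefix."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    counters = {}
    for names, values in zip(rows[0::2], rows[1::2]):
        if names[0] != values[0] or len(names) != len(values):
            raise ValueError("malformed pair for %s" % names[0])
        group = names[0].rstrip(":")
        counters[group] = {n: int(v) for n, v in zip(names[1:], values[1:])}
    return counters


def tcp_drop_counters(path="/proc/net/netstat"):
    """Read the two counters added by this commit (None if absent)."""
    with open(path) as f:
        tcpext = parse_proc_net_netstat(f.read()).get("TcpExt", {})
    return {name: tcpext.get(name) for name in ("TCPBacklogDrop", "TCPMinTTLDrop")}
```

      On a kernel without this patch, tcp_drop_counters() reports None for
      both names instead of failing.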
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 06 Mar 2010, 1 commit