1. 16 Jun 2010, 3 commits
    • ip_frag: Remove some atomic ops · d27f9b35
      Committed by Eric Dumazet
      Instead of doing one atomic operation per frag, we can factorize them
      into a single batched update (see the sketch after this entry).
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d27f9b35
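      A minimal userspace sketch of the batching idea, using C11 atomics in
      place of the kernel's atomic_t; the structure and function names are
      illustrative, not the kernel's:

      #include <stdatomic.h>
      #include <stdlib.h>

      struct frag {
          struct frag *next;
          long truesize;
      };

      struct frag_queue {
          struct frag *fragments;
          atomic_long mem;            /* shared memory-accounting counter */
      };

      static void frag_queue_destroy(struct frag_queue *q)
      {
          struct frag *f = q->fragments;
          long sum = 0;

          while (f) {
              struct frag *next = f->next;

              sum += f->truesize;     /* accumulate locally, no atomic per frag */
              free(f);
              f = next;
          }
          q->fragments = NULL;

          /* one atomic update for the whole queue, not one per fragment */
          atomic_fetch_sub(&q->mem, sum);
      }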
    • inetpeer: RCU conversion · aa1039e7
      Committed by Eric Dumazet
      inetpeer currently uses an AVL tree protected by an rwlock.
      
      It's possible to make most lookups use RCU:
      
      1) Add a struct rcu_head to struct inet_peer
      
      2) Add a lookup_rcu_bh() helper to perform a lockless, opportunistic
      lookup. This is a normal function, not a macro like lookup().
      
      3) Add a limit to the number of links followed by lookup_rcu_bh(), in
      case we fall into a loop.
      
      4) Add an smp_wmb() in link_to_pool() right before the node insert.
      
      5) Make unlink_from_pool() use atomic_cmpxchg() so that it can safely
      take the last reference to an inet_peer, since lockless readers could
      increase the refcount even while we hold peers.lock (see the sketch
      after this entry).
      
      6) Delay struct inet_peer freeing after rcu grace period so that
      lookup_rcu_bh() cannot crash.
      
      7) inet_getpeer() first attempts a lockless lookup.
         Note this lookup can fail even if the target is in the AVL tree,
      because a concurrent writer can leave the tree in a transiently
      inconsistent form.
         If this attempt fails, the lock is taken and a regular lookup is
      performed again.
      
      8) Convert peers.lock from an rwlock to a spinlock.
      
      9) Remove SLAB_HWCACHE_ALIGN when peer_cachep is created, because
      rcu_head adds 16 bytes on 64-bit arches, doubling the effective object
      size (64 -> 128 bytes).
      In a future patch it should be possible to revert this part by putting
      the rcu field in a union sharing space with rid, ip_id_count, tcp_ts &
      tcp_ts_stamp, since these fields are only manipulated while refcnt > 0.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aa1039e7
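      To make steps 5 and 7 above concrete, here is a small userspace sketch
      of the two refcount patterns involved, written with C11 atomics rather
      than the kernel's atomic_t API; peer_get_rcu() and peer_try_unlink()
      are illustrative names, not the patch's:

      #include <stdatomic.h>
      #include <stdbool.h>

      struct peer {
          atomic_int refcnt;    /* -1 marks an entry claimed for destruction */
          /* key, AVL links and an rcu_head live here in the real struct */
      };

      /* Lockless reader (step 7): take a reference only while the entry is live. */
      static bool peer_get_rcu(struct peer *p)
      {
          int c = atomic_load(&p->refcnt);

          while (c > 0) {
              /* try to move refcnt from c to c + 1; on failure c is reloaded */
              if (atomic_compare_exchange_weak(&p->refcnt, &c, c + 1))
                  return true;
          }
          return false;         /* dead or being freed; fall back to the lock */
      }

      /* Writer (step 5): claim the entry only if we hold the last reference,
       * so a reader that already bumped the count keeps the entry alive. */
      static bool peer_try_unlink(struct peer *p)
      {
          int expected = 1;

          return atomic_compare_exchange_strong(&p->refcnt, &expected, -1);
      }

      A reader that obtains a reference this way keeps the entry alive even if
      the unlink path runs concurrently, and the writer only frees the entry
      once its compare-and-swap from 1 to -1 succeeds, after an RCU grace
      period.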
    • tcp: unify tcp flag macros · a3433f35
      Committed by Changli Gao
      Unify the TCP flag macros: TCPHDR_FIN, TCPHDR_SYN, TCPHDR_RST, TCPHDR_PSH,
      TCPHDR_ACK, TCPHDR_URG, TCPHDR_ECE and TCPHDR_CWR. TCPCB_FLAG_* are replaced
      with the corresponding TCPHDR_* (the definitions are sketched after this entry).
      Signed-off-by: Changli Gao <xiaosuo@gmail.com>
      ----
       include/net/tcp.h                      |   24 ++++++-------
       net/ipv4/tcp.c                         |    8 ++--
       net/ipv4/tcp_input.c                   |    2 -
       net/ipv4/tcp_output.c                  |   59 ++++++++++++++++-----------------
       net/netfilter/nf_conntrack_proto_tcp.c |   32 ++++++-----------
       net/netfilter/xt_TCPMSS.c              |    4 --
       6 files changed, 58 insertions(+), 71 deletions(-)
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a3433f35
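      The unified macros follow the bit layout of the flags byte in the TCP
      header, so they can be compared directly against the raw on-wire flags;
      a sketch of what the definitions look like (the authoritative ones live
      in include/net/tcp.h):

      /* TCP flag bits as they appear on the wire (byte 13 of the TCP header) */
      #define TCPHDR_FIN 0x01
      #define TCPHDR_SYN 0x02
      #define TCPHDR_RST 0x04
      #define TCPHDR_PSH 0x08
      #define TCPHDR_ACK 0x10
      #define TCPHDR_URG 0x20
      #define TCPHDR_ECE 0x40
      #define TCPHDR_CWR 0x80

      A SYN/ACK, for example, can then be written as
      (TCPHDR_SYN | TCPHDR_ACK) wherever the old TCPCB_FLAG_* names were used.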
  2. 15 Jun 2010, 2 commits
    • netfilter: CLUSTERIP: RCU conversion · d73f33b1
      Committed by Eric Dumazet
      - clusterip_lock becomes a spinlock
      - lockless lookups
      - kfree() deferred after RCU grace period
      - rcu_barrier_bh() inserted in clusterip_tg_exit()
      
      v2)
      - As Patrick pointed out, we use atomic_inc_not_zero() in
      clusterip_config_find_get().
      - list_add_rcu() and list_del_rcu() variants are used.
      - atomic_dec_and_lock() used in clusterip_config_entry_put() (sketched
      after this entry)
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      d73f33b1
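      A userspace sketch of the atomic_dec_and_lock() pattern mentioned above,
      using C11 atomics and a pthread mutex in place of atomic_t and the
      spinlock; dec_and_lock() here is illustrative, not the kernel
      implementation:

      #include <stdatomic.h>
      #include <stdbool.h>
      #include <pthread.h>

      /* Decrement *cnt; return true with *lock held iff the count reached zero. */
      static bool dec_and_lock(atomic_int *cnt, pthread_mutex_t *lock)
      {
          int c = atomic_load(cnt);

          /* Common case: count stays above zero, decrement without the lock. */
          while (c > 1) {
              if (atomic_compare_exchange_weak(cnt, &c, c - 1))
                  return false;
          }

          /* The count may hit zero: take the lock, then do the final decrement. */
          pthread_mutex_lock(lock);
          if (atomic_fetch_sub(cnt, 1) == 1)
              return true;           /* last reference dropped, lock held */
          pthread_mutex_unlock(lock);
          return false;
      }

      When it returns true, the caller holds the lock, unlinks the entry
      (list_del_rcu() in the patch), drops the lock, and defers the kfree()
      until after an RCU grace period, matching the changelog points above.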
    • inetpeer: various changes · d6cc1d64
      Committed by Eric Dumazet
      Try to reduce cache line contention in peer management, to reduce IP
      defragmentation overhead.
      
      - peer_fake_node is marked 'const' to make sure it's not modified
        (tested with CONFIG_DEBUG_RODATA=y).
      
      - Group variables into two structures to reduce the number of dirtied
      cache lines. One, named "peers", holds the AVL tree root, its number of
      entries, and the associated lock (a candidate for RCU conversion); see
      the sketch after this entry.
      
      - A second one, named "unused_peers", holds the unused list and its lock.
      
      - Add a !list_empty() test in unlink_from_unused() to avoid taking the
      lock when the entry is not on the unused list.
      
      - Use atomic_dec_and_lock() in inet_putpeer() to avoid taking the lock
      in some cases.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d6cc1d64
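      A compact sketch of the grouping and the lock-avoidance test described
      above. Pthread mutexes stand in for the spinlocks, the 64-byte alignment
      of each group is an assumption added for illustration (the patch itself
      only groups the fields), and all names are illustrative:

      #include <stdalign.h>
      #include <stdbool.h>
      #include <pthread.h>

      /* Minimal circular list node: an unlinked node points at itself. */
      struct lnode { struct lnode *next, *prev; };

      static bool lnode_on_list(const struct lnode *n) { return n->next != n; }

      static void lnode_del_init(struct lnode *n)
      {
          n->prev->next = n->next;
          n->next->prev = n->prev;
          n->next = n->prev = n;
      }

      /* Group 1 ("peers"): AVL root, entry count and their lock together ... */
      struct peers_group {
          alignas(64) pthread_mutex_t lock;
          void *avl_root;
          int total;
      };

      /* ... group 2 ("unused_peers"): the unused list and its own lock. */
      struct unused_group {
          alignas(64) pthread_mutex_t lock;
          struct lnode head;
      };

      struct peer {
          struct lnode unused_link;
          /* ... */
      };

      static void unlink_from_unused(struct peer *p, struct unused_group *u)
      {
          /* Fast path: skip the lock when the peer is not on the unused list. */
          if (!lnode_on_list(&p->unused_link))
              return;

          pthread_mutex_lock(&u->lock);
          if (lnode_on_list(&p->unused_link))   /* recheck under the lock */
              lnode_del_init(&p->unused_link);
          pthread_mutex_unlock(&u->lock);
      }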
  3. 14 Jun 2010, 1 commit
  4. 11 Jun 2010, 2 commits
  5. 10 Jun 2010, 1 commit
  6. 09 Jun 2010, 1 commit
  7. 08 Jun 2010, 5 commits
  8. 07 Jun 2010, 3 commits
    • ipmr: dont corrupt lists · 035320d5
      Committed by Eric Dumazet
      ipmr_rules_exit() and ip6mr_rules_exit() free a list of items, but
      forget to properly remove these items from the list. The list head is
      not changed and still points to freed memory.
      
      This can trigger a fault later when icmpv6_sk_exit() is called.
      
      The fix is to either reinit the list, or use list_del() to properly
      remove the items from the list before freeing them (see the sketch
      after this entry).
      
      bugzilla report : https://bugzilla.kernel.org/show_bug.cgi?id=16120
      
      Introduced by commit d1db275d (ipv6: ip6mr: support multiple
      tables) and commit f0ad0860 (ipv4: ipmr: support multiple tables)
      Reported-by: Alex Zhavnerchik <alex.vizor@gmail.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      035320d5
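      The kernel side of the fix uses list_del() (or reinits the list head);
      the same bug class can be shown with a minimal userspace list, where
      the illustrative teardown_fixed() unlinks each node before freeing it:

      #include <stdlib.h>

      struct item { struct item *next; };

      struct mr_list { struct item *head; };

      /* Buggy teardown (the bug class fixed here): the nodes are freed but
       * the head is never updated, so it keeps pointing into freed memory. */
      static void teardown_buggy(struct mr_list *l)
      {
          struct item *i = l->head, *next;

          while (i) {
              next = i->next;
              free(i);             /* l->head still references freed memory */
              i = next;
          }
      }

      /* Fixed teardown: unlink each node before freeing it, so a later
       * caller sees an empty, valid list. */
      static void teardown_fixed(struct mr_list *l)
      {
          struct item *i;

          while ((i = l->head) != NULL) {
              l->head = i->next;   /* unlink first ... */
              free(i);             /* ... then free */
          }
      }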
    • raw: avoid two atomics in xmit · 1789a640
      Committed by Eric Dumazet
      Avoid two atomic ops per raw_send_hdrinc() call
      
      Avoid two atomic ops per raw6_send_hdrinc() call
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1789a640
    • tcp: Fix slowness in read /proc/net/tcp · a8b690f9
      Committed by Tom Herbert
      This patch addresses a serious performance issue in reading the
      TCP sockets table (/proc/net/tcp).
      
      Reading the full table is done by a number of sequential read
      operations.  At each read operation, a seek is done to find the
      last socket that was previously read.  This seek requires counting
      the sockets in the table up to the current file position, and
      counting them requires taking a lock for each non-empty bucket.
      The whole algorithm is O(n^2).
      
      The fix is to cache the last bucket value, the offset within the
      bucket, and the file position returned by the last read operation.
      On the next sequential read, the bucket and offset are used to find
      the last read socket immediately, without needing to scan the
      previous buckets of the table.  With this algorithm, reading the
      whole table is O(n) (a sketch of the idea follows this entry).
      
      The improvement offered by this patch is easily shown by cat'ing
      /proc/net/tcp on a machine with a lot of connections.  With about
      182K connections in the table, I see the following:
      
      - Without patch
      time cat /proc/net/tcp > /dev/null
      
      real	1m56.729s
      user	0m0.214s
      sys	1m56.344s
      
      - With patch
      time cat /proc/net/tcp > /dev/null
      
      real	0m0.894s
      user	0m0.290s
      sys	0m0.594s
      Signed-off-by: Tom Herbert <therbert@google.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a8b690f9
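      A userspace sketch of the caching idea over a generic hash table: the
      iterator remembers the bucket, in-bucket offset and table-wide position
      of the last entry returned, so a sequential reader resumes from that
      bucket instead of rescanning from bucket 0. The struct and field names
      loosely mirror the patch but are illustrative:

      #include <stddef.h>

      struct entry { struct entry *next; };

      struct table {
          struct entry **buckets;
          size_t nr_buckets;
      };

      struct iter_state {
          size_t last_pos;   /* table-wide position returned by the last read */
          size_t bucket;     /* bucket that entry was found in */
          size_t offset;     /* offset of that entry within its bucket */
      };

      /* Return the entry at table-wide position 'pos'.  When pos does not go
       * backwards, resume from the cached bucket instead of bucket 0, so a
       * full sequential scan of the table is O(n) instead of O(n^2). */
      static struct entry *seek(struct table *t, struct iter_state *st, size_t pos)
      {
          size_t b = 0, off = 0, cur = 0;

          if (pos >= st->last_pos && st->bucket < t->nr_buckets) {
              b = st->bucket;
              /* position of the first entry in the cached bucket */
              cur = st->last_pos - st->offset;
          }

          for (; b < t->nr_buckets; b++, off = 0) {
              for (struct entry *e = t->buckets[b]; e; e = e->next, off++, cur++) {
                  if (cur == pos) {
                      st->last_pos = pos;   /* remember where this read stopped */
                      st->bucket = b;
                      st->offset = off;
                      return e;
                  }
              }
          }
          return NULL;
      }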
  9. 05 Jun 2010, 6 commits
  10. 04 Jun 2010, 3 commits
  11. 03 Jun 2010, 2 commits
  12. 02 Jun 2010, 2 commits
  13. 01 Jun 2010, 3 commits
  14. 31 May 2010, 4 commits
  15. 29 May 2010, 1 commit
  16. 27 May 2010, 1 commit
    • net: fix lock_sock_bh/unlock_sock_bh · 8a74ad60
      Committed by Eric Dumazet
      This new sock lock primitive was introduced to speed up some user-context
      socket manipulation. But it is unsafe when protecting two threads, one
      using regular lock_sock()/release_sock(), the other using
      lock_sock_bh()/unlock_sock_bh().
      
      This patch changes lock_sock_bh() to be careful about the 'owned' state.
      If owned is found to be set, we must take the slow path.
      lock_sock_bh() now returns a boolean to say whether the slow path was
      taken, and this boolean is used at unlock_sock_bh() time to call the
      appropriate unlock function (see the sketch after this entry).
      
      After this change, BHs are either disabled or enabled during the
      lock_sock_bh/unlock_sock_bh protected section, depending on the path
      taken. This might be misleading, so we rename these functions to
      lock_sock_fast()/unlock_sock_fast().
      Reported-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Tested-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8a74ad60
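      A userspace analogue of the fast/slow locking scheme: a pthread mutex
      stands in for the BH-disabling socket spinlock and an 'owned' flag for
      sk->sk_lock.owned. The kernel's real helpers are lock_sock_fast() and
      unlock_sock_fast(); the names and types below are illustrative:

      #include <pthread.h>
      #include <stdbool.h>

      struct fsock {
          pthread_mutex_t lock;     /* stands in for the BH-disabling spinlock */
          pthread_cond_t cond;
          bool owned;               /* stands in for sk->sk_lock.owned */
      };

      /* Returns true if the slow path was taken (we became the "owner"). */
      static bool lock_fast(struct fsock *s)
      {
          pthread_mutex_lock(&s->lock);
          if (!s->owned)
              return false;         /* fast path: keep the low-level lock */

          while (s->owned)          /* slow path: wait for the current owner */
              pthread_cond_wait(&s->cond, &s->lock);
          s->owned = true;          /* take ownership ourselves */
          pthread_mutex_unlock(&s->lock);
          return true;
      }

      static void unlock_fast(struct fsock *s, bool slow)
      {
          if (!slow) {
              pthread_mutex_unlock(&s->lock);   /* fast path: just drop the lock */
              return;
          }
          pthread_mutex_lock(&s->lock);
          s->owned = false;                     /* release ownership */
          pthread_cond_signal(&s->cond);
          pthread_mutex_unlock(&s->lock);
      }

      The calling pattern mirrors the renamed kernel helpers: the boolean
      returned by lock_fast() is passed back to unlock_fast(), e.g.
      bool slow = lock_fast(&s); /* short critical section */ unlock_fast(&s, slow);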