1. 06 10月, 2010 1 次提交
    • E
      net neigh: RCU conversion of neigh hash table · d6bf7817
      Eric Dumazet 提交于
      David
      
      This is the first step for RCU conversion of neigh code.
      
      Next patches will convert hash_buckets[] and "struct neighbour" to RCU
      protected objects.
      
      Thanks
      
      [PATCH net-next] net neigh: RCU conversion of neigh hash table
      
      Instead of storing hash_buckets, hash_mask and hash_rnd in "struct
      neigh_table", a new structure is defined :
      
      struct neigh_hash_table {
             struct neighbour        **hash_buckets;
             unsigned int            hash_mask;
             __u32                   hash_rnd;
             struct rcu_head         rcu;
      };
      
      And "struct neigh_table" has an RCU protected pointer to such a
      neigh_hash_table.
      
      This means the signature of (*hash)() function changed: We need to add a
      third parameter with the actual hash_rnd value, since this is not
      anymore a neigh_table field.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d6bf7817
  2. 05 10月, 2010 2 次提交
  3. 04 10月, 2010 1 次提交
    • E
      net: introduce DST_NOCACHE flag · c7d4426a
      Eric Dumazet 提交于
      While doing stress tests with IP route cache disabled, and multi queue
      devices, I noticed a very high contention on one rwlock used in
      neighbour code.
      
      When many cpus are trying to send frames (possibly using a high
      performance multiqueue device) to the same neighbour, they fight for the
      neigh->lock rwlock in order to call neigh_hh_init(), and fight on
      hh->hh_refcnt (a pair of atomic_inc/atomic_dec_and_test())
      
      But we dont need to call neigh_hh_init() for dst that are used only
      once. It costs four atomic operations at least, on two contended cache
      lines, plus the high contention on neigh->lock rwlock.
      
      Introduce a new dst flag, DST_NOCACHE, that is set when dst was not
      inserted in route cache.
      
      With the stress test bench, sending 160000000 frames on one neighbour,
      results are :
      
      Before patch:
      
      real	2m28.406s
      user	0m11.781s
      sys	36m17.964s
      
      
      After patch:
      
      real	1m26.532s
      user	0m12.185s
      sys	20m3.903s
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c7d4426a
  4. 01 10月, 2010 1 次提交
  5. 30 9月, 2010 2 次提交
  6. 29 9月, 2010 1 次提交
    • T
      ipv4: Allow configuring subnets as local addresses · 4465b469
      Tom Herbert 提交于
      This patch allows a host to be configured to respond to any address in
      a specified range as if it were local, without actually needing to
      configure the address on an interface.  This is done through routing
      table configuration.  For instance, to configure a host to respond
      to any address in 10.1/16 received on eth0 as a local address we can do:
      
      ip rule add from all iif eth0 lookup 200
      ip route add local 10.1/16 dev lo proto kernel scope host src 127.0.0.1 table 200
      
      This host is now reachable by any 10.1/16 address (route lookup on
      input for packets received on eth0 can find the route).  On output, the
      rule will not be matched so that this host can still send packets to
      10.1/16 (not sent on loopback).  Presumably, external routing can be
      configured to make sense out of this.
      
      To make this work, we needed to modify the logic in finding the
      interface which is assigned a given source address for output
      (dev_ip_find).  We perform a normal fib_lookup instead of just a
      lookup on the local table, and in the lookup we ignore the input
      interface for matching.
      
      This patch is useful to implement IP-anycast for subnets of virtual
      addresses.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4465b469
  7. 28 9月, 2010 5 次提交
  8. 27 9月, 2010 3 次提交
  9. 25 9月, 2010 1 次提交
  10. 24 9月, 2010 1 次提交
  11. 22 9月, 2010 1 次提交
  12. 21 9月, 2010 1 次提交
    • T
      xfrm: Allow different selector family in temporary state · 8444cf71
      Thomas Egerer 提交于
      The family parameter xfrm_state_find is used to find a state matching a
      certain policy. This value is set to the template's family
      (encap_family) right before xfrm_state_find is called.
      The family parameter is however also used to construct a temporary state
      in xfrm_state_find itself which is wrong for inter-family scenarios
      because it produces a selector for the wrong family. Since this selector
      is included in the xfrm_user_acquire structure, user space programs
      misinterpret IPv6 addresses as IPv4 and vice versa.
      This patch splits up the original init_tempsel function into a part that
      initializes the selector respectively the props and id of the temporary
      state, to allow for differing ip address families whithin the state.
      Signed-off-by: NThomas Egerer <thomas.egerer@secunet.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8444cf71
  13. 17 9月, 2010 2 次提交
  14. 16 9月, 2010 4 次提交
  15. 15 9月, 2010 2 次提交
  16. 09 9月, 2010 3 次提交
    • E
      udp: add rehash on connect() · 719f8358
      Eric Dumazet 提交于
      commit 30fff923 introduced in linux-2.6.33 (udp: bind() optimisation)
      added a secondary hash on UDP, hashed on (local addr, local port).
      
      Problem is that following sequence :
      
      fd = socket(...)
      connect(fd, &remote, ...)
      
      not only selects remote end point (address and port), but also sets
      local address, while UDP stack stored in secondary hash table the socket
      while its local address was INADDR_ANY (or ipv6 equivalent)
      
      Sequence is :
       - autobind() : choose a random local port, insert socket in hash tables
                    [while local address is INADDR_ANY]
       - connect() : set remote address and port, change local address to IP
                    given by a route lookup.
      
      When an incoming UDP frame comes, if more than 10 sockets are found in
      primary hash table, we switch to secondary table, and fail to find
      socket because its local address changed.
      
      One solution to this problem is to rehash datagram socket if needed.
      
      We add a new rehash(struct socket *) method in "struct proto", and
      implement this method for UDP v4 & v6, using a common helper.
      
      This rehashing only takes care of secondary hash table, since primary
      hash (based on local port only) is not changed.
      Reported-by: NKrzysztof Piotr Oledzki <ole@ans.pl>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Tested-by: NKrzysztof Piotr Oledzki <ole@ans.pl>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      719f8358
    • J
      e3634169
    • J
      ipvs: fix active FTP · 6523ce15
      Julian Anastasov 提交于
      - Do not create expectation when forwarding the PORT
        command to avoid blocking the connection. The problem is that
        nf_conntrack_ftp.c:help() tries to create the same expectation later in
        POST_ROUTING and drops the packet with "dropping packet" message after
        failure in nf_ct_expect_related.
      
      - Change ip_vs_update_conntrack to alter the conntrack
        for related connections from real server. If we do not alter the reply in
        this direction the next packet from client sent to vport 20 comes as NEW
        connection. We alter it but may be some collision happens for both
        conntracks and the second conntrack gets destroyed immediately. The
        connection stucks too.
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NSimon Horman <horms@verge.net.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6523ce15
  17. 04 9月, 2010 1 次提交
  18. 31 8月, 2010 3 次提交
    • G
      TCP: update initial windows according to RFC 5681 · 3d5b99ae
      Gerrit Renker 提交于
      This updates the use of larger initial windows, as originally specified in
      RFC 3390, to use the newer IW values specified in RFC 5681, section 3.1.
      
      The changes made in RFC 5681 are:
       a) the setting now is more clearly specified in units of segments (as the
          comments  by John Heffner emphasized, this was not very clear in RFC 3390);
       b) for connections with 1095 < SMSS <= 2190 there is now a change:
          - RFC 3390 says that IW <= 4380,
          - RFC 5681 says that IW = 3 * SMSS <= 6570.
      
      Since RFC 3390 is older and "only" proposed standard, whereas the newer RFC 5681
      is already draft standard, it seems preferable to use the newer IW variant.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3d5b99ae
    • G
      tcp/dccp: Consolidate common code for RFC 3390 conversion · 22b71c8f
      Gerrit Renker 提交于
      This patch consolidates initial-window code common to TCP and CCID-2:
       * TCP uses RFC 3390 in a packet-oriented manner (tcp_input.c) and
       * CCID-2 uses RFC 3390 in packet-oriented manner (RFC 4341).
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      22b71c8f
    • J
      tcp: Add TCP_USER_TIMEOUT socket option. · dca43c75
      Jerry Chu 提交于
      This patch provides a "user timeout" support as described in RFC793. The
      socket option is also needed for the the local half of RFC5482 "TCP User
      Timeout Option".
      
      TCP_USER_TIMEOUT is a TCP level socket option that takes an unsigned int,
      when > 0, to specify the maximum amount of time in ms that transmitted
      data may remain unacknowledged before TCP will forcefully close the
      corresponding connection and return ETIMEDOUT to the application. If
      0 is given, TCP will continue to use the system default.
      
      Increasing the user timeouts allows a TCP connection to survive extended
      periods without end-to-end connectivity. Decreasing the user timeouts
      allows applications to "fail fast" if so desired. Otherwise it may take
      upto 20 minutes with the current system defaults in a normal WAN
      environment.
      
      The socket option can be made during any state of a TCP connection, but
      is only effective during the synchronized states of a connection
      (ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, or LAST-ACK).
      Moreover, when used with the TCP keepalive (SO_KEEPALIVE) option,
      TCP_USER_TIMEOUT will overtake keepalive to determine when to close a
      connection due to keepalive failure.
      
      The option does not change in anyway when TCP retransmits a packet, nor
      when a keepalive probe will be sent.
      
      This option, like many others, will be inherited by an acceptor from its
      listener.
      Signed-off-by: NH.K. Jerry Chu <hkchu@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dca43c75
  19. 28 8月, 2010 3 次提交
  20. 27 8月, 2010 1 次提交
  21. 26 8月, 2010 1 次提交