1. 01 12月, 2011 1 次提交
  2. 14 11月, 2011 1 次提交
    • E
      neigh: new unresolved queue limits · 8b5c171b
      Eric Dumazet 提交于
      Le mercredi 09 novembre 2011 à 16:21 -0500, David Miller a écrit :
      > From: David Miller <davem@davemloft.net>
      > Date: Wed, 09 Nov 2011 16:16:44 -0500 (EST)
      >
      > > From: Eric Dumazet <eric.dumazet@gmail.com>
      > > Date: Wed, 09 Nov 2011 12:14:09 +0100
      > >
      > >> unres_qlen is the number of frames we are able to queue per unresolved
      > >> neighbour. Its default value (3) was never changed and is responsible
      > >> for strange drops, especially if IP fragments are used, or multiple
      > >> sessions start in parallel. Even a single tcp flow can hit this limit.
      > >  ...
      > >
      > > Ok, I've applied this, let's see what happens :-)
      >
      > Early answer, build fails.
      >
      > Please test build this patch with DECNET enabled and resubmit.  The
      > decnet neigh layer still refers to the removed ->queue_len member.
      >
      > Thanks.
      
      Ouch, this was fixed on one machine yesterday, but not the other one I
      used this morning, sorry.
      
      [PATCH V5 net-next] neigh: new unresolved queue limits
      
      unres_qlen is the number of frames we are able to queue per unresolved
      neighbour. Its default value (3) was never changed and is responsible
      for strange drops, especially if IP fragments are used, or multiple
      sessions start in parallel. Even a single tcp flow can hit this limit.
      
      $ arp -d 192.168.20.108 ; ping -c 2 -s 8000 192.168.20.108
      PING 192.168.20.108 (192.168.20.108) 8000(8028) bytes of data.
      8008 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.322 ms
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8b5c171b
  3. 09 11月, 2011 1 次提交
  4. 30 9月, 2011 1 次提交
  5. 17 9月, 2011 1 次提交
    • T
      ipv6: Send ICMPv6 RSes only when RAs are accepted · 026359bc
      Tore Anderson 提交于
      This patch improves the logic determining when to send ICMPv6 Router
      Solicitations, so that they are 1) always sent when the kernel is
      accepting Router Advertisements, and 2) never sent when the kernel is
      not accepting RAs. In other words, the operational setting of the
      "accept_ra" sysctl is used.
      
      The change also makes the special "Hybrid Router" forwarding mode
      ("forwarding" sysctl set to 2) operate exactly the same as the standard
      Router mode (forwarding=1). The only difference between the two was
      that RSes was being sent in the Hybrid Router mode only. The sysctl
      documentation describing the special Hybrid Router mode has therefore
      been removed.
      
      Rationale for the change:
      
      Currently, the value of forwarding sysctl is the only thing determining
      whether or not to send RSes. If it has the value 0 or 2, they are sent,
      otherwise they are not. This leads to inconsistent behaviour in the
      following cases:
      
      * accept_ra=0, forwarding=0
      * accept_ra=0, forwarding=2
      * accept_ra=1, forwarding=2
      * accept_ra=2, forwarding=1
      
      In the first three cases, the kernel will send RSes, even though it will
      not accept any RAs received in reply. In the last case, it will not send
      any RSes, even though it will accept and process any RAs received. (Most
      routers will send unsolicited RAs periodically, so suppressing RSes in
      the last case will merely delay auto-configuration, not prevent it.)
      
      Also, it is my opinion that having the forwarding sysctl control RS
      sending behaviour (completely independent of whether RAs are being
      accepted or not) is simply not what most users would intuitively expect
      to be the case.
      Signed-off-by: NTore Anderson <tore@fud.no>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      026359bc
  6. 23 8月, 2011 1 次提交
  7. 09 7月, 2011 1 次提交
  8. 05 7月, 2011 2 次提交
  9. 09 6月, 2011 1 次提交
    • E
      inetpeer: remove unused list · 4b9d9be8
      Eric Dumazet 提交于
      Andi Kleen and Tim Chen reported huge contention on inetpeer
      unused_peers.lock, on memcached workload on a 40 core machine, with
      disabled route cache.
      
      It appears we constantly flip peers refcnt between 0 and 1 values, and
      we must insert/remove peers from unused_peers.list, holding a contended
      spinlock.
      
      Remove this list completely and perform a garbage collection on-the-fly,
      at lookup time, using the expired nodes we met during the tree
      traversal.
      
      This removes a lot of code, makes locking more standard, and obsoletes
      two sysctls (inet_peer_gc_mintime and inet_peer_gc_maxtime). This also
      removes two pointers in inet_peer structure.
      
      There is still a false sharing effect because refcnt is in first cache
      line of object [were the links and keys used by lookups are located], we
      might move it at the end of inet_peer structure to let this first cache
      line mostly read by cpus.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      CC: Andi Kleen <andi@firstfloor.org>
      CC: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4b9d9be8
  10. 21 2月, 2011 1 次提交
  11. 03 2月, 2011 1 次提交
  12. 14 12月, 2010 1 次提交
  13. 29 11月, 2010 1 次提交
  14. 18 11月, 2010 1 次提交
  15. 13 11月, 2010 1 次提交
  16. 04 9月, 2010 1 次提交
  17. 31 5月, 2010 1 次提交
  18. 16 5月, 2010 1 次提交
  19. 23 2月, 2010 1 次提交
    • B
      IPv6: better document max_addresses parameter · e79dc484
      Brian Haley 提交于
      Andrew Morton wrote:
      >> >From ip-sysctl.txt file in kernel documentation I can see following description
      >> for max_addresses:
      >> max_addresses - INTEGER
      >>         Number of maximum addresses per interface.  0 disables limitation.
      >>         It is recommended not set too large value (or 0) because it would
      >>         be too easy way to crash kernel to allow to create too much of
      >>         autoconfigured addresses.
                 ^^^^^^^^^^^^^^
      
      >> If this parameter applies only for auto-configured IP addressed, please state
      >> it more clearly in docs or rename the parameter to show that it refers to
      >> auto-configuration.
      
      It did mention autoconfigured in the text, but the below makes it more obvious.
      
      More clearly document IPv6 max_addresses parameter.
      Signed-off-by: NBrian Haley <brian.haley@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e79dc484
  20. 19 2月, 2010 2 次提交
    • A
      net: TCP thin dupack · 7e380175
      Andreas Petlund 提交于
      This patch enables fast retransmissions after one dupACK for
      TCP if the stream is identified as thin. This will reduce
      latencies for thin streams that are not able to trigger fast
      retransmissions due to high packet interarrival time. This
      mechanism is only active if enabled by iocontrol or syscontrol
      and the stream is identified as thin.
      Signed-off-by: NAndreas Petlund <apetlund@simula.no>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7e380175
    • A
      net: TCP thin linear timeouts · 36e31b0a
      Andreas Petlund 提交于
      This patch will make TCP use only linear timeouts if the
      stream is thin. This will help to avoid the very high latencies
      that thin stream suffer because of exponential backoff. This
      mechanism is only active if enabled by iocontrol or syscontrol
      and the stream is identified as thin. A maximum of 6 linear
      timeouts is tried before exponential backoff is resumed.
      Signed-off-by: NAndreas Petlund <apetlund@simula.no>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      36e31b0a
  21. 19 1月, 2010 1 次提交
  22. 07 1月, 2010 1 次提交
    • J
      net: RFC3069, private VLAN proxy arp support · 65324144
      Jesper Dangaard Brouer 提交于
      This is to be used together with switch technologies, like RFC3069,
      that where the individual ports are not allowed to communicate with
      each other, but they are allowed to talk to the upstream router.  As
      described in RFC 3069, it is possible to allow these hosts to
      communicate through the upstream router by proxy_arp'ing.
      
      This patch basically allow proxy arp replies back to the same
      interface (from which the ARP request/solicitation was received).
      
      Tunable per device via proc "proxy_arp_pvlan":
        /proc/sys/net/ipv4/conf/*/proxy_arp_pvlan
      
      This switch technology is known by different vendor names:
       - In RFC 3069 it is called VLAN Aggregation.
       - Cisco and Allied Telesyn call it Private VLAN.
       - Hewlett-Packard call it Source-Port filtering or port-isolation.
       - Ericsson call it MAC-Forced Forwarding (RFC Draft).
      Signed-off-by: NJesper Dangaard Brouer <hawk@comx.dk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      65324144
  23. 04 12月, 2009 1 次提交
  24. 03 12月, 2009 2 次提交
  25. 07 10月, 2009 1 次提交
    • O
      make TLLAO option for NA packets configurable · f7734fdf
      Octavian Purdila 提交于
      On Friday 02 October 2009 20:53:51 you wrote:
      
      > This is good although I would have shortened the name.
      
      Ah, I knew I forgot something :) Here is v4.
      
      tavi
      
      >From 24d96d825b9fa832b22878cc6c990d5711968734 Mon Sep 17 00:00:00 2001
      From: Octavian Purdila <opurdila@ixiacom.com>
      Date: Fri, 2 Oct 2009 00:51:15 +0300
      Subject: [PATCH] ipv6: new sysctl for sending TLLAO with unicast NAs
      
      Neighbor advertisements responding to unicast neighbor solicitations
      did not include the target link-layer address option. This patch adds
      a new sysctl option (disabled by default) which controls whether this
      option should be sent even with unicast NAs.
      
      The need for this arose because certain routers expect the TLLAO in
      some situations even as a response to unicast NS packets.
      
      Moreover, RFC 2461 recommends sending this to avoid a race condition
      (section 4.4, Target link-layer address)
      Signed-off-by: NCosmin Ratiu <cratiu@ixiacom.com>
      Signed-off-by: NOctavian Purdila <opurdila@ixiacom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f7734fdf
  26. 05 9月, 2009 1 次提交
    • B
      sctp: Sysctl configuration for IPv4 Address Scoping · 72388433
      Bhaskar Dutta 提交于
      This patch introduces a new sysctl option to make IPv4 Address Scoping
      configurable <draft-stewart-tsvwg-sctp-ipv4-00.txt>.
      
      In networking environments where DNAT rules in iptables prerouting
      chains convert destination IP's to link-local/private IP addresses,
      SCTP connections fail to establish as the INIT chunk is dropped by the
      kernel due to address scope match failure.
      For example to support overlapping IP addresses (same IP address with
      different vlan id) a Layer-5 application listens on link local IP's,
      and there is a DNAT rule that maps the destination IP to a link local
      IP. Such applications never get the SCTP INIT if the address-scoping
      draft is strictly followed.
      
      This sysctl configuration allows SCTP to function in such
      unconventional networking environments.
      
      Sysctl options:
      0 - Disable IPv4 address scoping draft altogether
      1 - Enable IPv4 address scoping (default, current behavior)
      2 - Enable address scoping but allow IPv4 private addresses in init/init-ack
      3 - Enable address scoping but allow IPv4 link local address in init/init-ack
      Signed-off-by: NBhaskar Dutta <bhaskar.dutta@globallogic.com>
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      72388433
  27. 02 9月, 2009 1 次提交
  28. 01 6月, 2009 1 次提交
  29. 18 5月, 2009 1 次提交
  30. 05 5月, 2009 1 次提交
    • I
      tcp: extend ECN sysctl to allow server-side only ECN · 255cac91
      Ilpo Järvinen 提交于
      This should be very safe compared with full enabled, so I see
      no reason why it shouldn't be done right away. As ECN can only
      be negotiated if the SYN sending party is also supporting it,
      somebody in the loop probably knows what he/she is doing. If
      SYN does not ask for ECN, the server side SYN-ACK is identical
      to what it is without ECN. Thus it's quite safe.
      
      The chosen value is safe w.r.t to existing configs which
      choose to currently set manually either 0 or 1 but
      silently upgrades those who have not explicitly requested
      ECN off.
      
      Whether to just enable both sides comes up time to time but
      unless that gets done now we can at least make the servers
      aware of ECN already. As there are some known problems to occur
      if ECN is enabled, it's currently questionable whether there's
      any real gain from enabling clients as servers mostly won't
      support it anyway (so we'd hit just the negative sides). After
      enabling the servers and getting that deployed, the client end
      enable really has some potential gain too.
      Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      255cac91
  31. 19 3月, 2009 1 次提交
    • B
      ipv6: Fix incorrect disable_ipv6 behavior · 9bdd8d40
      Brian Haley 提交于
      Fix the behavior of allowing both sysctl and addrconf_dad_failure()
      to set the disable_ipv6 parameter without any bad side-effects.
      If DAD fails and accept_dad > 1, we will still set disable_ipv6=1,
      but then instead of allowing an RA to add an address then
      immediately fail DAD, we simply don't allow the address to be
      added in the first place.  This also lets the user set this flag
      and disable all IPv6 addresses on the interface, or on the entire
      system.
      Signed-off-by: NBrian Haley <brian.haley@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9bdd8d40
  32. 24 2月, 2009 2 次提交
  33. 23 2月, 2009 1 次提交
  34. 01 2月, 2009 1 次提交
  35. 28 10月, 2008 1 次提交
    • N
      net: implement emergency route cache rebulds when gc_elasticity is exceeded · 1080d709
      Neil Horman 提交于
      This is a patch to provide on demand route cache rebuilding.  Currently, our
      route cache is rebulid periodically regardless of need.  This introduced
      unneeded periodic latency.  This patch offers a better approach.  Using code
      provided by Eric Dumazet, we compute the standard deviation of the average hash
      bucket chain length while running rt_check_expire.  Should any given chain
      length grow to larger that average plus 4 standard deviations, we trigger an
      emergency hash table rebuild for that net namespace.  This allows for the common
      case in which chains are well behaved and do not grow unevenly to not incur any
      latency at all, while those systems (which may be being maliciously attacked),
      only rebuild when the attack is detected.  This patch take 2 other factors into
      account:
      1) chains with multiple entries that differ by attributes that do not affect the
      hash value are only counted once, so as not to unduly bias system to rebuilding
      if features like QOS are heavily used
      2) if rebuilding crosses a certain threshold (which is adjustable via the added
      sysctl in this patch), route caching is disabled entirely for that net
      namespace, since constant rebuilding is less efficient that no caching at all
      
      Tested successfully by me.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1080d709
  36. 11 7月, 2008 1 次提交