1. 05 Jan 2013 (1 commit)
  2. 11 Dec 2012 (1 commit)
  3. 08 Dec 2012 (1 commit)
  4. 06 Dec 2012 (1 commit)
  5. 30 Nov 2012 (1 commit)
  6. 26 Oct 2012 (1 commit)
    • sctp: Make hmac algorithm selection for cookie generation dynamic · 3c68198e
      Committed by Neil Horman
      Currently sctp allows for the optional use of md5 or sha1 hmac algorithms to
      generate cookie values when establishing new connections, selected via two
      build time config options.  There's no real reason to make this a static
      selection.  We can add a sysctl that allows for dynamic selection of these
      algorithms at run time, with the default value determined by the
      availability of the corresponding crypto libraries.
      This comes in handy when, for example, running a system in FIPS mode, where
      use of md5 is disallowed but SHA1 is permitted.
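
      For illustration, a minimal usage sketch, assuming the new sysctl is exposed
      as net.sctp.cookie_hmac_alg and that the valid values depend on which
      algorithms were enabled at build time:

        # prefer sha1 for cookie generation, e.g. on a FIPS-enabled system
        $ sysctl -w net.sctp.cookie_hmac_alg=sha1
        $ sysctl net.sctp.cookie_hmac_alg     # check the current selection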
      
      Note: This new sysctl has no corresponding socket option to select the cookie
      hmac algorithm.  I intentionally chose not to implement one, as RFC 6458
      contains no option for this value, and I opted not to pollute the socket
      option namespace.
      
      Change notes:
      v2)
      	* Updated subject to have the proper sctp prefix as per Dave M.
      	* Replaced default selection options with new options that allow
      	  developers to explicitly select the available hmac algs at build time,
      	  as per suggestion by Vlad Y.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: netdev@vger.kernel.org
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 01 Sep 2012 (2 commits)
    • tcp: TCP Fast Open Server - header & support functions · 10467163
      Committed by Jerry Chu
      This patch adds all the necessary data structures and support
      functions to implement the TFO server side. It also documents a number
      of flags for the sysctl_tcp_fastopen knob, and adds a few Linux
      extension MIBs.
      
      In addition, it includes the following:
      
      1. A new TCP_FASTOPEN socket option that an application must set,
      supplying the maximum backlog allowed, in order to enable TFO on its listener.
      
      2. A number of key data structures:
      "fastopen_rsk" in tcp_sock - for a big socket to access its
      request_sock for retransmission and ack processing purpose. It is
      non-NULL iff 3WHS not completed.
      
      "fastopenq" in request_sock_queue - points to a per Fast Open
      listener data structure "fastopen_queue" to keep track of qlen (# of
      outstanding Fast Open requests) and max_qlen, among other things.
      
      "listener" in tcp_request_sock - to point to the original listener
      for book-keeping purpose, i.e., to maintain qlen against max_qlen
      as part of defense against IP spoofing attack.
      
      3. Various data structures and functions, many in tcp_fastopen.c, to
      support server-side Fast Open cookie operations, including
      /proc/sys/net/ipv4/tcp_fastopen_key to allow manual rekeying.
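
      As a rough sketch of how the server-side knobs fit together (the flag value
      for sysctl_tcp_fastopen and the key format below are assumptions based on
      this series, not restated from it):

        # enable server-side Fast Open (0x2); 0x1 enables the client side
        $ sysctl -w net.ipv4.tcp_fastopen=2
        # manually rekey the cookie-generation key (4 x 32-bit hex words)
        $ echo 01234567-89abcdef-01234567-89abcdef > /proc/sys/net/ipv4/tcp_fastopen_key
        # each listener still opts in with setsockopt(TCP_FASTOPEN, <max backlog>)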
      Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: Increase timeout for SYN segments · 6c9ff979
      Committed by Alex Bergmann
      Commit 9ad7c049 ("tcp: RFC2988bis + taking RTT sample from 3WHS for
      the passive open side") changed the initRTO from 3 secs to 1 sec in
      accordance with RFC 6298 (formerly RFC2988bis). This reduced the time
      until the last SYN retransmission packet gets sent from 93 secs to 31 secs.
      
      RFC 1122 states that retransmission should be done for at least 3
      minutes, but this seems to be quite high.
      
        "However, the values of R1 and R2 may be different for SYN
        and data segments.  In particular, R2 for a SYN segment MUST
        be set large enough to provide retransmission of the segment
        for at least 3 minutes.  The application can close the
        connection (i.e., give up on the open attempt) sooner, of
        course."
      
      This patch increases the value of TCP_SYN_RETRIES to 6,
      providing a retransmission window of 63 secs.
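
      The 63 secs follow from exponential backoff with initRTO = 1 sec; a quick
      sketch using the corresponding sysctl:

        # SYN retransmissions go out at t = 1, 3, 7, 15, 31 secs and the 6th
        # (last) one at t = 63 secs, since 1 + 2 + 4 + 8 + 16 + 32 = 63
        $ sysctl -w net.ipv4.tcp_syn_retries=6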
      
      The comments for SYN and SYNACK retries have also been updated to
      describe the current settings. The same goes for the documentation file
      "Documentation/networking/ip-sysctl.txt".
      Signed-off-by: Alexander Bergmann <alex@linlab.net>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 31 Jul 2012 (1 commit)
  9. 23 Jul 2012 (1 commit)
    • sctp: Implement quick failover draft from tsvwg · 5aa93bcf
      Committed by Neil Horman
      I've seen several attempts recently made to do quick failover of sctp transports
      by reducing various retransmit timers and counters.  While it's possible to
      implement a faster failover on multihomed sctp associations, it's not
      particularly robust, in that it can lead to unneeded retransmits, as well as
      false connection failures due to intermittent latency on a network.
      
      Instead, let's implement the new IETF quick failover draft found here:
      http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05
      
      This will let the sctp stack identify transports that have had a small number
      of errors and quickly avoid using them until their reliability can be
      re-established.  I've tested this out on two virt guests connected via multiple
      isolated virt networks and believe it's in compliance with the above draft and
      works well.
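
      A minimal tuning sketch, assuming the error threshold introduced here is
      exposed as net.sctp.pf_retrans (the "potentially failed" count):

        # mark a transport as Potentially Failed after 2 errors, well before the
        # path failure threshold (net.sctp.path_max_retrans)
        $ sysctl -w net.sctp.pf_retrans=2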
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: Sridhar Samudrala <sri@us.ibm.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: linux-sctp@vger.kernel.org
      CC: joe@perches.com
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 20 Jul 2012 (2 commits)
    • net-tcp: Fast Open client - cookie-less mode · 67da22d2
      Committed by Yuchung Cheng
      In trusted networks, e.g., an intranet or data center, the client does not
      need to use the Fast Open cookie to mitigate DoS attacks. In cookie-less
      mode, sendmsg() with the MSG_FASTOPEN flag will send SYN-data regardless
      of cookie availability.
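
      As a sketch, assuming cookie-less mode is selected via an extra bit in the
      existing net.ipv4.tcp_fastopen bitmap (bit value assumed from this series):

        # 0x1 = client Fast Open, 0x4 = client may skip the cookie on trusted networks
        $ sysctl -w net.ipv4.tcp_fastopen=5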
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN) · cf60af03
      Committed by Yuchung Cheng
      sendmsg() (or sendto()) with MSG_FASTOPEN is a combo of connect(2)
      and write(2). The application should replace connect() with it to
      send data in the opening SYN packet.
      
      For a blocking socket, sendmsg() blocks until all the data is buffered
      locally and the handshake is completed, like a connect() call. It
      returns a similar errno to connect() if the TCP handshake fails.
      
      For a non-blocking socket, it returns the number of bytes queued (and
      transmitted in the SYN-data packet) if a cookie is available. If no cookie
      is available, it transmits a data-less SYN packet with the Fast Open
      cookie request option and returns -EINPROGRESS, like connect().
      
      Using MSG_FASTOPEN on a connecting or connected socket will result in a
      similar errno to repeated connect() calls. Therefore the application
      should only use this flag on new sockets.
      
      The buffer size of sendmsg() is independent of the MSS of the connection.
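
      A rough way to observe the client-side behaviour from userspace (the curl
      option and the TCPFastOpen* counter names are assumptions, not part of this
      patch):

        $ sysctl -w net.ipv4.tcp_fastopen=1
        $ curl --tcp-fastopen http://server.example/ -o /dev/null   # 1st: requests a cookie
        $ curl --tcp-fastopen http://server.example/ -o /dev/null   # 2nd: data rides in the SYN
        $ nstat -az | grep -i TCPFastOpen                           # Fast Open MIB counters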
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 17 Jul 2012 (1 commit)
    • tcp: implement RFC 5961 3.2 · 282f23c6
      Committed by Eric Dumazet
      Implement the RFC 5961 mitigation against the Blind
      Reset attack using the RST bit.
      
      The idea is to validate the incoming RST sequence number against the
      exact RCV.NXT value, instead of the previously accepted
      window: (RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND).
      
      If the sequence is in the window but not an exact match, send
      a "challenge ACK", so that the other party can resend an
      RST with the appropriate sequence number.
      
      Add a new sysctl, tcp_challenge_ack_limit, to limit the
      number of challenge ACKs sent per second.
      
      Add a new SNMP counter to count the number of challenge ACKs sent
      (netstat -s | grep TCPChallengeACK).
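
      A quick sketch of the new knob and counter (the limit value below is only an
      example):

        # cap the rate of challenge ACKs sent in response to in-window RSTs
        $ sysctl -w net.ipv4.tcp_challenge_ack_limit=100
        $ netstat -s | grep -i TCPChallengeACK   # challenge ACKs sent so far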
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Kiran Kumar Kella <kkiran@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 12 Jul 2012 (1 commit)
    • tcp: TCP Small Queues · 46d3ceab
      Committed by Eric Dumazet
      This introduces TSQ (TCP Small Queues).

      TSQ's goal is to reduce the number of TCP packets in xmit queues (qdisc &
      device queues), to reduce RTT and cwnd bias, part of the bufferbloat
      problem.
      
      sk->sk_wmem_alloc is not allowed to grow above a given limit,
      allowing no more than ~128KB [1] per tcp socket in the qdisc/dev layers
      at any given time.
      
      TSO packets are sized/capped to half the limit, so that we have two
      TSO packets in flight, allowing better bandwidth use.
      
      As a side effect, setting the limit to 40000 automatically reduces the
      standard gso max limit (65536) to 40000/2: it can help reduce
      latencies of high-priority packets by using smaller TSO packets.
      
      This means we divert sock_wfree() to a tcp_wfree() handler, to
      queue/send following frames when skb_orphan() [2] is called for the
      already queued skbs.
      
      Results on my dev machines (tg3/ixgbe nics) are really impressive,
      using standard pfifo_fast, and with or without TSO/GSO.
      
      Without reduction of nominal bandwidth, we have reduction of buffering
      per bulk sender :
      < 1ms on Gbit (instead of 50ms with TSO)
      < 8ms on 100Mbit (instead of 132 ms)
      
      I no longer have 4 MBytes backlogged in qdisc by a single netperf
      session, and both side socket autotuning no longer use 4 Mbytes.
      
      As the skb destructor cannot restart xmit itself (the qdisc lock might be
      taken at this point), we delegate the work to a tasklet. We use one
      tasklet per cpu for performance reasons.
      
      If the tasklet finds a socket owned by the user, it sets the TSQ_OWNED flag.
      This flag is tested in a new protocol method called from release_sock(),
      to eventually send new segments.
      
      [1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
      [2] skb_orphan() is usually called at TX completion time,
        but some drivers call it in their start_xmit() handler.
        These drivers should at least use BQL, or else a single TCP
        session can still fill the whole NIC TX ring, since TSQ will
        have no effect.
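
      A minimal tuning sketch using the new tunable from [1]:

        # allow at most ~256KB of queued data per socket in the qdisc/dev layers
        $ sysctl -w net.ipv4.tcp_limit_output_bytes=262144
        $ sysctl net.ipv4.tcp_limit_output_bytes   # default is ~128KB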
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Dave Taht <dave.taht@bufferbloat.net>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Matt Mathis <mattmathis@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 01 Jul 2012 (1 commit)
  14. 13 Jun 2012 (1 commit)
    • ipv4: Add interface option to enable routing of 127.0.0.0/8 · d0daebc3
      Committed by Thomas Graf
      Routing of 127/8 is traditionally forbidden; we consider
      packets from that address block martian when routing and do
      not process corresponding ARP requests.
      
      This is a sane default but renders a huge address space
      practically unusable.
      
      The RFC states that no address within the 127/8 block should
      ever appear on any network anywhere, but it does not specifically
      forbid the use of such addresses outside of the loopback device,
      for example to address a pool of virtual guests
      behind a load balancer.
      
      This patch adds a new interface option 'route_localnet'
      enabling routing of the 127/8 address block and processing
      of ARP requests on a specific interface.
      
      Note that for the feature to work, the default local route
      covering 127/8 dev lo needs to be removed.
      
      Example:
        $ sysctl -w net.ipv4.conf.eth0.route_localnet=1
        $ ip route del 127.0.0.0/8 dev lo table local
        $ ip addr add 127.1.0.1/16 dev eth0
        $ ip route flush cache
      
      V2: Fix invalid check to auto flush cache (thanks davem)
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 09 May 2012 (1 commit)
  16. 03 May 2012 (2 commits)
    • tcp: change tcp_adv_win_scale and tcp_rmem[2] · b49960a0
      Committed by Eric Dumazet
      The tcp_adv_win_scale default value is 2, meaning we expect a good-citizen
      skb to have a skb->len / skb->truesize ratio of 75% (3/4).
      
      In 2.6 kernels we (mis)accounted a typical MSS=1460 frame as:
      1536 + 64 + 256 = 1856 'estimated truesize', and 1856 * 3/4 = 1392.
      So these skbs were not considered bloated.
      
      With recent truesize fixes, a typical MSS=1460 frame truesize is now the
      more precise:
      2048 + 256 = 2304. But 2304 * 3/4 = 1728,
      so these skbs are no longer good citizens, because 1460 < 1728.
      
      (GRO can escape this problem because it builds skbs with a too-low
      truesize.)
      
      This also means tcp advertises too optimistic a window for a given
      allocated rcvspace: when receiving frames, sk_rmem_alloc can hit the
      sk_rcvbuf limit and we call tcp_prune_queue()/tcp_collapse() too often,
      especially when the application is slow to drain its receive queue or in
      case of losses (netperf is fast, scp is slow). This is a major latency
      source.
      
      We should adjust the len/truesize ratio to 50% instead of 75%
      
      This patch :
      
      1) changes tcp_adv_win_scale default to 1 instead of 2
      
      2) increases the tcp_rmem[2] limit from 4MB to 6MB to take into account
      better truesize tracking and to allow the autotuned tcp receive window to
      reach the same value as before. Note that the same amount of kernel memory
      is consumed as in 2.6 kernels.
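
      Expressed as the corresponding sysctls (shown read-only here; the exact
      tcp_rmem min/default values are not restated):

        $ sysctl net.ipv4.tcp_adv_win_scale   # now 1, i.e. assume len/truesize = 1/2
        $ sysctl net.ipv4.tcp_rmem            # max raised from 4MB to 6MB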
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: early retransmit · eed530b6
      Committed by Yuchung Cheng
      This patch implements RFC 5827 early retransmit (ER) for TCP.
      It reduces the DUPACK threshold (dupthresh) when fewer than 4 packets are
      outstanding, so losses are recovered by fast recovery instead of a timeout.
      
      While the algorithm is simple, small but frequent network reordering
      makes this feature dangerous: the connection repeatedly enters
      false recovery and performance degrades. Therefore we implement
      a mitigation suggested in the appendix of the RFC that delays
      entering fast recovery by a small interval, i.e., RTT/4. Currently
      ER is conservative and is disabled for the rest of the connection
      after the first reordering event. A large-scale web server
      experiment on the performance impact of ER is summarized in
      section 6 of the paper "Proportional Rate Reduction for TCP",
      IMC 2011. http://conferences.sigcomm.org/imc/2011/docs/p155.pdf
      
      Note that Linux has a similar feature called THIN_DUPACK. The
      differences are that THIN_DUPACK does not mitigate reordering and is only
      used after slow start. Currently ER is disabled if THIN_DUPACK is
      enabled. I would be happy to merge the THIN_DUPACK feature with ER if
      people think it's a good idea.
      
      ER is enabled by sysctl_tcp_early_retrans:
        0: Disables ER
      
        1: Reduce dupthresh to packets_out - 1 when outstanding packets < 4.
      
        2: (Default) reduce dupthresh like mode 1. In addition, delay
           entering fast recovery by RTT/4.
      
      Note: mode 2 is implemented in the third part of this patch series.
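
      For illustration:

        $ sysctl -w net.ipv4.tcp_early_retrans=2   # default: lowered dupthresh + RTT/4 delay
        $ sysctl -w net.ipv4.tcp_early_retrans=0   # disable ER entirely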
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 27 Apr 2012 (1 commit)
  18. 04 Apr 2012 (1 commit)
  19. 07 Dec 2011 (1 commit)
  20. 01 Dec 2011 (1 commit)
  21. 14 Nov 2011 (1 commit)
    • neigh: new unresolved queue limits · 8b5c171b
      Committed by Eric Dumazet
      On Wednesday 09 November 2011 at 16:21 -0500, David Miller wrote:
      > From: David Miller <davem@davemloft.net>
      > Date: Wed, 09 Nov 2011 16:16:44 -0500 (EST)
      >
      > > From: Eric Dumazet <eric.dumazet@gmail.com>
      > > Date: Wed, 09 Nov 2011 12:14:09 +0100
      > >
      > >> unres_qlen is the number of frames we are able to queue per unresolved
      > >> neighbour. Its default value (3) was never changed and is responsible
      > >> for strange drops, especially if IP fragments are used, or multiple
      > >> sessions start in parallel. Even a single tcp flow can hit this limit.
      > >  ...
      > >
      > > Ok, I've applied this, let's see what happens :-)
      >
      > Early answer, build fails.
      >
      > Please test build this patch with DECNET enabled and resubmit.  The
      > decnet neigh layer still refers to the removed ->queue_len member.
      >
      > Thanks.
      
      Ouch, this was fixed on one machine yesterday, but not the other one I
      used this morning, sorry.
      
      [PATCH V5 net-next] neigh: new unresolved queue limits
      
      unres_qlen is the number of frames we are able to queue per unresolved
      neighbour. Its default value (3) was never changed and is responsible
      for strange drops, especially if IP fragments are used, or multiple
      sessions start in parallel. Even a single tcp flow can hit this limit.
      
      $ arp -d 192.168.20.108 ; ping -c 2 -s 8000 192.168.20.108
      PING 192.168.20.108 (192.168.20.108) 8000(8028) bytes of data.
      8008 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.322 ms
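
      (Only the second ping reply arrives: the fragments of the first 8000-byte
      echo request overflow the 3-packet unresolved-neighbour queue.)

      A tuning sketch, assuming the limit is now also exposed as a byte-based
      knob, unres_qlen_bytes, alongside the legacy packet-count unres_qlen
      (names assumed from this series):

        # allow more pending frames per unresolved neighbour (old default: 3 packets)
        $ sysctl -w net.ipv4.neigh.default.unres_qlen_bytes=131072
        $ sysctl net.ipv4.neigh.default.unres_qlen   # legacy packet-count view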
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 09 Nov 2011 (1 commit)
  23. 30 Sep 2011 (1 commit)
  24. 17 Sep 2011 (1 commit)
    • ipv6: Send ICMPv6 RSes only when RAs are accepted · 026359bc
      Committed by Tore Anderson
      This patch improves the logic determining when to send ICMPv6 Router
      Solicitations, so that they are 1) always sent when the kernel is
      accepting Router Advertisements, and 2) never sent when the kernel is
      not accepting RAs. In other words, the operational setting of the
      "accept_ra" sysctl is used.
      
      The change also makes the special "Hybrid Router" forwarding mode
      ("forwarding" sysctl set to 2) operate exactly the same as the standard
      Router mode (forwarding=1). The only difference between the two was
      that RSes were sent in Hybrid Router mode only. The sysctl
      documentation describing the special Hybrid Router mode has therefore
      been removed.
      
      Rationale for the change:
      
      Currently, the value of the forwarding sysctl is the only thing determining
      whether or not to send RSes. If it has the value 0 or 2, they are sent;
      otherwise they are not. This leads to inconsistent behaviour in the
      following cases:
      
      * accept_ra=0, forwarding=0
      * accept_ra=0, forwarding=2
      * accept_ra=1, forwarding=2
      * accept_ra=2, forwarding=1
      
      In the first three cases, the kernel will send RSes, even though it will
      not accept any RAs received in reply. In the last case, it will not send
      any RSes, even though it will accept and process any RAs received. (Most
      routers will send unsolicited RAs periodically, so suppressing RSes in
      the last case will merely delay auto-configuration, not prevent it.)
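
      For example, the fourth combination (previously: RSes suppressed even though
      RAs were accepted) now behaves as one would expect:

        $ sysctl -w net.ipv6.conf.eth0.forwarding=1
        $ sysctl -w net.ipv6.conf.eth0.accept_ra=2   # accept RAs even while forwarding
        # with this patch, Router Solicitations are still sent on eth0, because the
        # operational accept_ra setting (not forwarding) now controls RS transmission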
      
      Also, it is my opinion that having the forwarding sysctl control RS
      sending behaviour (completely independent of whether RAs are being
      accepted or not) is simply not what most users would intuitively expect
      to be the case.
      Signed-off-by: Tore Anderson <tore@fud.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  25. 23 Aug 2011 (1 commit)
  26. 09 Jul 2011 (1 commit)
  27. 05 Jul 2011 (2 commits)
  28. 09 Jun 2011 (1 commit)
    • inetpeer: remove unused list · 4b9d9be8
      Committed by Eric Dumazet
      Andi Kleen and Tim Chen reported huge contention on the inetpeer
      unused_peers.lock, on a memcached workload on a 40-core machine with
      the route cache disabled.
      
      It appears we constantly flip peer refcnts between 0 and 1, and we
      must insert/remove peers from unused_peers.list while holding a
      contended spinlock.
      
      Remove this list completely and perform a garbage collection on-the-fly,
      at lookup time, using the expired nodes we met during the tree
      traversal.
      
      This removes a lot of code, makes locking more standard, and obsoletes
      two sysctls (inet_peer_gc_mintime and inet_peer_gc_maxtime). This also
      removes two pointers in inet_peer structure.
      
      There is still a false sharing effect because refcnt is in the first cache
      line of the object [where the links and keys used by lookups are located]; we
      might move it to the end of the inet_peer structure to keep this first cache
      line mostly read-only for cpus.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Andi Kleen <andi@firstfloor.org>
      CC: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  29. 21 Feb 2011 (1 commit)
  30. 03 Feb 2011 (1 commit)
  31. 14 Dec 2010 (1 commit)
  32. 29 Nov 2010 (1 commit)
  33. 18 Nov 2010 (1 commit)
  34. 13 Nov 2010 (1 commit)
  35. 04 Sep 2010 (1 commit)
  36. 31 May 2010 (1 commit)