提交 · 434d305405ab86414f6ea3f261307d443a2c3506 · openeuler / Kernel

28 7月, 2014 1 次提交

inet: frag: don't account number of fragment queues · 434d3054

由 Florian Westphal 提交于 7月 24, 2014

The 'nqueues' counter is protected by the lru list lock,
once thats removed this needs to be converted to atomic
counter.  Given this isn't used for anything except for
reporting it to userspace via /proc, just remove it.

We still report the memory currently used by fragment
reassembly queues.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

434d3054

08 5月, 2014 1 次提交

net: clean up snmp stats code · 698365fa

由 WANG Cong 提交于 5月 05, 2014

commit 8f0ea0fe (snmp: reduce percpu needs by 50%)
reduced snmp array size to 1, so technically it doesn't have to be
an array any more. What's more, after the following commit:

	commit 933393f5
	Date:   Thu Dec 22 11:58:51 2011 -0600

	    percpu: Remove irqsafe_cpu_xxx variants

	    We simply say that regular this_cpu use must be safe regardless of
	    preemption and interrupt state.  That has no material change for x86
	    and s390 implementations of this_cpu operations.  However, arches that
	    do not provide their own implementation for this_cpu operations will
	    now get code generated that disables interrupts instead of preemption.

probably no arch wants to have SNMP_ARRAY_SZ == 2. At least after
almost 3 years, no one complains.

So, just convert the array to a single pointer and remove snmp_mib_init()
and snmp_mib_free() as well.

Cc: Christoph Lameter <cl@linux.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

698365fa

04 3月, 2014 1 次提交

tcp: snmp stats for Fast Open, SYN rtx, and data pkts · f19c29e3

由 Yuchung Cheng 提交于 3月 03, 2014

Add the following snmp stats:

TCPFastOpenActiveFail: Fast Open attempts (SYN/data) failed beacuse
the remote does not accept it or the attempts timed out.

TCPSynRetrans: number of SYN and SYN/ACK retransmits to break down
retransmissions into SYN, fast-retransmits, timeout retransmits, etc.

TCPOrigDataSent: number of outgoing packets with original data (excluding
retransmission but including data-in-SYN). This counter is different from
TcpOutSegs because TcpOutSegs also tracks pure ACKs. TCPOrigDataSent is
more useful to track the TCP retransmission rate.

Change TCPFastOpenActive to track only successful Fast Opens to be symmetric to
TCPFastOpenPassive.
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NNandita Dukkipati <nanditad@google.com>
Signed-off-by: NLawrence Brakmo <brakmo@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f19c29e3

27 2月, 2014 1 次提交

net: tcp: add mib counters to track zero window transitions · 8e165e20

由 Florian Westphal 提交于 2月 25, 2014

Three counters are added:
- one to track when we went from non-zero to zero window
- one to track the reverse
- one counter incremented when we want to announce zero window,
  but can't because we would shrink current window.
Suggested-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8e165e20

02 1月, 2014 1 次提交

ipv4: spaces required around that '=' · 442c67f8

由 Weilong Chen 提交于 12月 31, 2013

Signed-off-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

442c67f8

07 12月, 2013 1 次提交

tcp: auto corking · f54b3111

由 Eric Dumazet 提交于 12月 05, 2013

With the introduction of TCP Small Queues, TSO auto sizing, and TCP
pacing, we can implement Automatic Corking in the kernel, to help
applications doing small write()/sendmsg() to TCP sockets.

Idea is to change tcp_push() to check if the current skb payload is
under skb optimal size (a multiple of MSS bytes)

If under 'size_goal', and at least one packet is still in Qdisc or
NIC TX queues, set the TCP Small Queue Throttled bit, so that the push
will be delayed up to TX completion time.

This delay might allow the application to coalesce more bytes
in the skb in following write()/sendmsg()/sendfile() system calls.

The exact duration of the delay is depending on the dynamics
of the system, and might be zero if no packet for this flow
is actually held in Qdisc or NIC TX ring.

Using FQ/pacing is a way to increase the probability of
autocorking being triggered.

Add a new sysctl (/proc/sys/net/ipv4/tcp_autocorking) to control
this feature and default it to 1 (enabled)

Add a new SNMP counter : nstat -a | grep TcpExtTCPAutoCorking
This counter is incremented every time we detected skb was under used
and its flush was deferred.

Tested:

Interesting effects when using line buffered commands under ssh.

Excellent performance results in term of cpu usage and total throughput.

lpq83:~# echo 1 >/proc/sys/net/ipv4/tcp_autocorking
lpq83:~# perf stat ./super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128
9410.39

Performance counter stats for './super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128':

35209.439626 task-clock # 2.901 CPUs utilized
2,294 context-switches # 0.065 K/sec
101 CPU-migrations # 0.003 K/sec
4,079 page-faults # 0.116 K/sec
97,923,241,298 cycles # 2.781 GHz [83.31%]
51,832,908,236 stalled-cycles-frontend # 52.93% frontend cycles idle [83.30%]
25,697,986,603 stalled-cycles-backend # 26.24% backend cycles idle [66.70%]
102,225,978,536 instructions # 1.04 insns per cycle
# 0.51 stalled cycles per insn [83.38%]
18,657,696,819 branches # 529.906 M/sec [83.29%]
91,679,646 branch-misses # 0.49% of all branches [83.40%]

12.136204899 seconds time elapsed

lpq83:~# echo 0 >/proc/sys/net/ipv4/tcp_autocorking
lpq83:~# perf stat ./super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128
6624.89

Performance counter stats for './super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128':
40045.864494 task-clock # 3.301 CPUs utilized
171 context-switches # 0.004 K/sec
53 CPU-migrations # 0.001 K/sec
4,080 page-faults # 0.102 K/sec
111,340,458,645 cycles # 2.780 GHz [83.34%]
61,778,039,277 stalled-cycles-frontend # 55.49% frontend cycles idle [83.31%]
29,295,522,759 stalled-cycles-backend # 26.31% backend cycles idle [66.67%]
108,654,349,355 instructions # 0.98 insns per cycle
# 0.57 stalled cycles per insn [83.34%]
19,552,170,748 branches # 488.244 M/sec [83.34%]
157,875,417 branch-misses # 0.81% of all branches [83.34%]

12.130267788 seconds time elapsed
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f54b3111

10 8月, 2013 1 次提交

net: rename busy poll MIB counter · 288a9376

由 Eliezer Tamir 提交于 8月 07, 2013

Rename mib counter from "low latency" to "busy poll"

v1 also moved the counter to the ip MIB (suggested by Shawn Bohrer)
Eric Dumazet suggested that the current location is better.

So v2 just renames the counter to fit the new naming convention.
Signed-off-by: NEliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

288a9376

09 8月, 2013 1 次提交

net: add SNMP counters tracking incoming ECN bits · 1f07d03e

由 Eric Dumazet 提交于 8月 06, 2013

With GRO/LRO processing, there is a problem because Ip[6]InReceives SNMP
counters do not count the number of frames, but number of aggregated
segments.

Its probably too late to change this now.

This patch adds four new counters, tracking number of frames, regardless
of LRO/GRO, and on a per ECN status basis, for IPv4 and IPv6.

Ip[6]NoECTPkts : Number of packets received with NOECT
Ip[6]ECT1Pkts  : Number of packets received with ECT(1)
Ip[6]ECT0Pkts  : Number of packets received with ECT(0)
Ip[6]CEPkts    : Number of packets received with Congestion Experienced

lph37:~# nstat | egrep "Pkts|InReceive"
IpInReceives                    1634137            0.0
Ip6InReceives                   3714107            0.0
Ip6InNoECTPkts                  19205              0.0
Ip6InECT0Pkts                   52651828           0.0
IpExtInNoECTPkts                33630              0.0
IpExtInECT0Pkts                 15581379           0.0
IpExtInCEPkts                   6                  0.0
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1f07d03e

11 6月, 2013 1 次提交

net: add low latency socket poll · 06021292

由 Eliezer Tamir 提交于 6月 10, 2013

Adds an ndo_ll_poll method and the code that supports it.
This method can be used by low latency applications to busy-poll
Ethernet device queues directly from the socket code.
sysctl_net_ll_poll controls how many microseconds to poll.
Default is zero (disabled).
Individual protocol support will be added by subsequent patches.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: NEliezer Tamir <eliezer.tamir@linux.intel.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Tested-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

06021292

30 4月, 2013 1 次提交

net: Add MIB counters for checksum errors · 6a5dc9e5

由 Eric Dumazet 提交于 4月 29, 2013

Add MIB counters for checksum errors in IP layer,
and TCP/UDP/ICMP layers, to help diagnose problems.

$ nstat -a | grep  Csum
IcmpInCsumErrors                72                 0.0
TcpInCsumErrors                 382                0.0
UdpInCsumErrors                 463221             0.0
Icmp6InCsumErrors               75                 0.0
Udp6InCsumErrors                173442             0.0
IpExtInCsumErrors               10884              0.0
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6a5dc9e5

19 4月, 2013 1 次提交

tcp: introduce TCPSpuriousRtxHostQueues SNMP counter · 0e280af0

由 Eric Dumazet 提交于 4月 18, 2013

Host queues (Qdisc + NIC) can hold packets so long that TCP can
eventually retransmit a packet before the first transmit even left
the host.

Its not clear right now if we could avoid this in the first place :

- We could arm RTO timer not at the time we enqueue packets, but
  at the time we TX complete them (tcp_wfree())

- Cancel the sending of the new copy of the packet if prior one
  is still in queue.

This patch adds instrumentation so that we can at least see how
often this problem happens.

TCPSpuriousRtxHostQueues SNMP counter is incremented every time
we detect the fast clone is not yet freed in tcp_transmit_skb()
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0e280af0

12 3月, 2013 2 次提交

tcp: TLP loss detection. · 9b717a8d

由 Nandita Dukkipati 提交于 3月 11, 2013

This is the second of the TLP patch series; it augments the basic TLP
algorithm with a loss detection scheme.

This patch implements a mechanism for loss detection when a Tail
loss probe retransmission plugs a hole thereby masking packet loss
from the sender. The loss detection algorithm relies on counting
TLP dupacks as outlined in Sec. 3 of:
http://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01

The basic idea is: Sender keeps track of TLP "episode" upon
retransmission of a TLP packet. An episode ends when the sender receives
an ACK above the SND.NXT (tracked by tlp_high_seq) at the time of the
episode. We want to make sure that before the episode ends the sender
receives a "TLP dupack", indicating that the TLP retransmission was
unnecessary, so there was no loss/hole that needed plugging. If the
sender gets no TLP dupack before the end of the episode, then it reduces
ssthresh and the congestion window, because the TLP packet arriving at
the receiver probably plugged a hole.
Signed-off-by: NNandita Dukkipati <nanditad@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9b717a8d

tcp: Tail loss probe (TLP) · 6ba8a3b1

由 Nandita Dukkipati 提交于 3月 11, 2013

This patch series implement the Tail loss probe (TLP) algorithm described
in http://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01. The
first patch implements the basic algorithm.

TLP's goal is to reduce tail latency of short transactions. It achieves
this by converting retransmission timeouts (RTOs) occuring due
to tail losses (losses at end of transactions) into fast recovery.
TLP transmits one packet in two round-trips when a connection is in
Open state and isn't receiving any ACKs. The transmitted packet, aka
loss probe, can be either new or a retransmission. When there is tail
loss, the ACK from a loss probe triggers FACK/early-retransmit based
fast recovery, thus avoiding a costly RTO. In the absence of loss,
there is no change in the connection state.

PTO stands for probe timeout. It is a timer event indicating
that an ACK is overdue and triggers a loss probe packet. The PTO value
is set to max(2*SRTT, 10ms) and is adjusted to account for delayed
ACK timer when there is only one oustanding packet.

TLP Algorithm

On transmission of new data in Open state:
  -> packets_out > 1: schedule PTO in max(2*SRTT, 10ms).
  -> packets_out == 1: schedule PTO in max(2*RTT, 1.5*RTT + 200ms)
  -> PTO = min(PTO, RTO)

Conditions for scheduling PTO:
  -> Connection is in Open state.
  -> Connection is either cwnd limited or no new data to send.
  -> Number of probes per tail loss episode is limited to one.
  -> Connection is SACK enabled.

When PTO fires:
  new_segment_exists:
    -> transmit new segment.
    -> packets_out++. cwnd remains same.

  no_new_packet:
    -> retransmit the last segment.
       Its ACK triggers FACK or early retransmit based recovery.

ACK path:
  -> rearm RTO at start of ACK processing.
  -> reschedule PTO if need be.

In addition, the patch includes a small variation to the Early Retransmit
(ER) algorithm, such that ER and TLP together can in principle recover any
N-degree of tail loss through fast recovery. TLP is controlled by the same
sysctl as ER, tcp_early_retrans sysctl.
tcp_early_retrans==0; disables TLP and ER.
		 ==1; enables RFC5827 ER.
		 ==2; delayed ER.
		 ==3; TLP and delayed ER. [DEFAULT]
		 ==4; TLP only.

The TLP patch series have been extensively tested on Google Web servers.
It is most effective for short Web trasactions, where it reduced RTOs by 15%
and improved HTTP response time (average by 6%, 99th percentile by 10%).
The transmitted probes account for <0.5% of the overall transmissions.
Signed-off-by: NNandita Dukkipati <nanditad@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Acked-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6ba8a3b1

19 2月, 2013 2 次提交

net: proc: change proc_net_remove to remove_proc_entry · ece31ffd

由 Gao feng 提交于 2月 18, 2013

proc_net_remove is only used to remove proc entries
that under /proc/net,it's not a general function for
removing proc entries of netns. if we want to remove
some proc entries which under /proc/net/stat/, we still
need to call remove_proc_entry.

this patch use remove_proc_entry to replace proc_net_remove.
we can remove proc_net_remove after this patch.
Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ece31ffd

net: proc: change proc_net_fops_create to proc_create · d4beaa66

由 Gao feng 提交于 2月 18, 2013

Right now, some modules such as bonding use proc_create
to create proc entries under /proc/net/, and other modules
such as ipv4 use proc_net_fops_create.

It looks a little chaos.this patch changes all of
proc_net_fops_create to proc_create. we can remove
proc_net_fops_create after this patch.
Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d4beaa66

01 9月, 2012 1 次提交

tcp: TCP Fast Open Server - header & support functions · 10467163

由 Jerry Chu 提交于 8月 31, 2012

This patch adds all the necessary data structure and support
functions to implement TFO server side. It also documents a number
of flags for the sysctl_tcp_fastopen knob, and adds a few Linux
extension MIBs.

In addition, it includes the following:

1. a new TCP_FASTOPEN socket option an application must call to
supply a max backlog allowed in order to enable TFO on its listener.

2. A number of key data structures:
"fastopen_rsk" in tcp_sock - for a big socket to access its
request_sock for retransmission and ack processing purpose. It is
non-NULL iff 3WHS not completed.

"fastopenq" in request_sock_queue - points to a per Fast Open
listener data structure "fastopen_queue" to keep track of qlen (# of
outstanding Fast Open requests) and max_qlen, among other things.

"listener" in tcp_request_sock - to point to the original listener
for book-keeping purpose, i.e., to maintain qlen against max_qlen
as part of defense against IP spoofing attack.

3. various data structure and functions, many in tcp_fastopen.c, to
support server side Fast Open cookie operations, including
/proc/sys/net/ipv4/tcp_fastopen_key to allow manual rekeying.
Signed-off-by: NH.K. Jerry Chu <hkchu@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

10467163

20 7月, 2012 1 次提交

net-tcp: Fast Open client - sending SYN-data · 783237e8

由 Yuchung Cheng 提交于 7月 19, 2012

This patch implements sending SYN-data in tcp_connect(). The data is
from tcp_sendmsg() with flag MSG_FASTOPEN (implemented in a later patch).

The length of the cookie in tcp_fastopen_req, init'd to 0, controls the
type of the SYN. If the cookie is not cached (len==0), the host sends
data-less SYN with Fast Open cookie request option to solicit a cookie
from the remote. If cookie is not available (len > 0), the host sends
a SYN-data with Fast Open cookie option. If cookie length is negative,
the SYN will not include any Fast Open option (for fall back operations).

To deal with middleboxes that may drop SYN with data or experimental TCP
option, the SYN-data is only sent once. SYN retransmits do not include
data or Fast Open options. The connection will fall back to regular TCP
handshake.
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

783237e8

17 7月, 2012 3 次提交

tcp: implement RFC 5961 4.2 · 0c24604b

由 Eric Dumazet 提交于 7月 17, 2012

Implement the RFC 5691 mitigation against Blind
Reset attack using SYN bit.

Section 4.2 of RFC 5961 advises to send a Challenge ACK and drop
incoming packet, instead of resetting the session.

Add a new SNMP counter to count number of challenge acks sent
in response to SYN packets.
(netstat -s | grep TCPSYNChallenge)

Remove obsolete TCPAbortOnSyn, since we no longer abort a TCP session
because of a SYN flag.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Kiran Kumar Kella <kkiran@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0c24604b

tcp: implement RFC 5961 3.2 · 282f23c6

由 Eric Dumazet 提交于 7月 17, 2012

Implement the RFC 5691 mitigation against Blind
Reset attack using RST bit.

Idea is to validate incoming RST sequence,
to match RCV.NXT value, instead of previouly accepted
window : (RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND)

If sequence is in window but not an exact match, send
a "challenge ACK", so that the other part can resend an
RST with the appropriate sequence.

Add a new sysctl, tcp_challenge_ack_limit, to limit
number of challenge ACK sent per second.

Add a new SNMP counter to count number of challenge acks sent.
(netstat -s | grep TCPChallengeACK)
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Kiran Kumar Kella <kkiran@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

282f23c6

tcp: add OFO snmp counters · a6df1ae9

由 Eric Dumazet 提交于 7月 16, 2012

Add three SNMP TCP counters, to better track TCP behavior
at global stage (netstat -s), when packets are received
Out Of Order (OFO)

TCPOFOQueue : Number of packets queued in OFO queue

TCPOFODrop  : Number of packets meant to be queued in OFO
              but dropped because socket rcvbuf limit hit.

TCPOFOMerge : Number of packets in OFO that were merged with
              other packets.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a6df1ae9

20 3月, 2012 1 次提交

tcp: reduce out_of_order memory use · c8628155

由 Eric Dumazet 提交于 3月 18, 2012

With increasing receive window sizes, but speed of light not improved
that much, out of order queue can contain a huge number of skbs, waiting
to be moved to receive_queue when missing packets can fill the holes.

Some devices happen to use fat skbs (truesize of 4096 + sizeof(struct
sk_buff)) to store regular (MTU <= 1500) frames. This makes highly
probable sk_rmem_alloc hits sk_rcvbuf limit, which can be 4Mbytes in
many cases.

When limit is hit, tcp stack calls tcp_collapse_ofo_queue(), a true
latency killer and cpu cache blower.

Doing the coalescing attempt each time we add a frame in ofo queue
permits to keep memory use tight and in many cases avoid the
tcp_collapse() thing later.

Tested on various wireless setups (b43, ath9k, ...) known to use big skb
truesize, this patch removed the "packets collapsed in receive queue due
to low socket buffer" I had before.

This also reduced average memory used by tcp sockets.

With help from Neal Cardwell.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c8628155

27 1月, 2012 1 次提交

tcp: add LINUX_MIB_TCPRETRANSFAIL counter · 09e9b813

由 Eric Dumazet 提交于 1月 25, 2012

It might be useful to get a counter of failed tcp_retransmit_skb()
calls.
Reported-by: NSatoru Moriya <satoru.moriya@hds.com>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

09e9b813

23 1月, 2012 1 次提交

tcp: detect loss above high_seq in recovery · 974c1236

由 Yuchung Cheng 提交于 1月 19, 2012

Correctly implement a loss detection heuristic: New sequences (above
high_seq) sent during the fast recovery are deemed lost when higher
sequences are SACKed.

Current code does not catch these losses, because tcp_mark_head_lost()
does not check packets beyond high_seq. The fix is straight-forward by
checking packets until the highest sacked packet. In addition, all the
FLAG_DATA_LOST logic are in-effective and redundant and can be removed.

Update the loss heuristic comments. The algorithm above is documented
as heuristic B, but it is redundant too because heuristic A already
covers B.

Note that this change only marks some forward-retransmitted packets LOST.
It does NOT forbid TCP performing further CWR on new losses. A potential
follow-up patch under preparation is to perform another CWR on "new"
losses such as
1) sequence above high_seq is lost (by resetting high_seq to snd_nxt)
2) retransmission is lost.
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

974c1236

13 12月, 2011 1 次提交

foundations of per-cgroup memory pressure controlling. · 180d8cd9

由 Glauber Costa 提交于 12月 11, 2011

This patch replaces all uses of struct sock fields' memory_pressure,
memory_allocated, sockets_allocated, and sysctl_mem to acessor
macros. Those macros can either receive a socket argument, or a mem_cgroup
argument, depending on the context they live in.

Since we're only doing a macro wrapping here, no performance impact at all is
expected in the case where we don't have cgroups disabled.
Signed-off-by: NGlauber Costa <glommer@parallels.com>
Reviewed-by: NHiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com>
CC: David S. Miller <davem@davemloft.net>
CC: Eric W. Biederman <ebiederm@xmission.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

180d8cd9

10 11月, 2011 1 次提交

ipv4: reduce percpu needs for icmpmsg mibs · acb32ba3

由 Eric Dumazet 提交于 11月 08, 2011

Reading /proc/net/snmp on a machine with a lot of cpus is very expensive
(can be ~88000 us).

This is because ICMPMSG MIB uses 4096 bytes per cpu, and folding values
for all possible cpus can read 16 Mbytes of memory.

ICMP messages are not considered as fast path on a typical server, and
eventually few cpus handle them anyway. We can afford an atomic
operation instead of using percpu data.

This saves 4096 bytes per cpu and per network namespace.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

acb32ba3

01 11月, 2011 1 次提交

net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules · bc3b2d7f

由 Paul Gortmaker 提交于 7月 15, 2011

These files are non modular, but need to export symbols using
the macros now living in export.h -- call out the include so
that things won't break when we remove the implicit presence
of module.h from everywhere.
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>

bc3b2d7f

16 9月, 2011 1 次提交

tcp: Change possible SYN flooding messages · 946cedcc

由 Eric Dumazet 提交于 8月 30, 2011

"Possible SYN flooding on port xxxx " messages can fill logs on servers.

Change logic to log the message only once per listener, and add two new
SNMP counters to track :

TCPReqQFullDoCookies : number of times a SYNCOOKIE was replied to client

TCPReqQFullDrop : number of times a SYN request was dropped because
syncookies were not enabled.

Based on a prior patch from Tom Herbert, and suggestions from David.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
CC: Tom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

946cedcc

09 12月, 2010 1 次提交

tcp: Replace time wait bucket msg by counter · 67631510

由 Tom Herbert 提交于 12月 08, 2010

Rather than printing the message to the log, use a mib counter to keep
track of the count of occurences of time wait bucket overflow.  Reduces
spam in logs.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

67631510

11 11月, 2010 1 次提交

net: avoid limits overflow · 8d987e5c

由 Eric Dumazet 提交于 11月 09, 2010

Robin Holt tried to boot a 16TB machine and found some limits were
reached : sysctl_tcp_mem[2], sysctl_udp_mem[2]

We can switch infrastructure to use long "instead" of "int", now
atomic_long_t primitives are available for free.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Reported-by: NRobin Holt <holt@sgi.com>
Reviewed-by: NRobin Holt <holt@sgi.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8d987e5c

01 7月, 2010 1 次提交

snmp: 64bit ipstats_mib for all arches · 4ce3c183

由 Eric Dumazet 提交于 6月 30, 2010

/proc/net/snmp and /proc/net/netstat expose SNMP counters.

Width of these counters is either 32 or 64 bits, depending on the size
of "unsigned long" in kernel.

This means user program parsing these files must already be prepared to
deal with 64bit values, regardless of user program being 32 or 64 bit.

This patch introduces 64bit snmp values for IPSTAT mib, where some
counters can wrap pretty fast if they are 32bit wide.

# netstat -s|egrep "InOctets|OutOctets"
    InOctets: 244068329096
    OutOctets: 244069348848
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4ce3c183

03 6月, 2010 1 次提交

ipv4: add LINUX_MIB_IPRPFILTER snmp counter · b5f7e755

由 Eric Dumazet 提交于 6月 02, 2010

Christoph Lameter mentioned that packets could be dropped in input path
because of rp_filter settings, without any SNMP counter being
incremented. System administrator can have a hard time to track the
problem.

This patch introduces a new counter, LINUX_MIB_IPRPFILTER, incremented
each time we drop a packet because Reverse Path Filter triggers.

(We receive an IPv4 datagram on a given interface, and find the route to
send an answer would use another interface)

netstat -s | grep IPReversePathFilter
    IPReversePathFilter: 21714
Reported-by: NChristoph Lameter <cl@linux-foundation.org>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b5f7e755

22 3月, 2010 1 次提交

tcp: Add SNMP counter for DEFER_ACCEPT · 907cdda5

由 Eric Dumazet 提交于 3月 19, 2010

Its currently hard to diagnose when ACK frames are dropped because an
application set TCP_DEFER_ACCEPT on its listening socket.

See http://bugzilla.kernel.org/show_bug.cgi?id=15507

This patch adds a SNMP value, named TCPDeferAcceptDrop

netstat -s | grep TCPDeferAcceptDrop
    TCPDeferAcceptDrop: 0

This counter is incremented every time we drop a pure ACK frame received
by a socket in SYN_RECV state because its SYNACK retrans count is lower
than defer_accept value.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

907cdda5

09 3月, 2010 1 次提交

tcp: Add SNMP counters for backlog and min_ttl drops · 6cce09f8

由 Eric Dumazet 提交于 3月 07, 2010

Commit 6b03a53a (tcp: use limited socket backlog) added the possibility
of dropping frames when backlog queue is full.

Commit d218d111 (tcp: Generalized TTL Security Mechanism) added the
possibility of dropping frames when TTL is under a given limit.

This patch adds new SNMP MIB entries, named TCPBacklogDrop and
TCPMinTTLDrop, published in /proc/net/netstat in TcpExt: line

netstat -s | egrep "TCPBacklogDrop|TCPMinTTLDrop"
    TCPBacklogDrop: 0
    TCPMinTTLDrop: 0
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6cce09f8

17 2月, 2010 1 次提交

percpu: add __percpu sparse annotations to net · 7d720c3e

由 Tejun Heo 提交于 2月 16, 2010

Add __percpu sparse annotations to net.

These annotations are to make sparse consider percpu variables to be
in a different address space and warn if accessed without going
through percpu accessors.  This patch doesn't affect normal builds.

The macro and type tricks around snmp stats make things a bit
interesting.  DEFINE/DECLARE_SNMP_STAT() macros mark the target field
as __percpu and SNMP_UPD_PO_STATS() macro is updated accordingly.  All
snmp_mib_*() users which used to cast the argument to (void **) are
updated to cast it to (void __percpu **).
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: netdev@vger.kernel.org
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7d720c3e

23 1月, 2010 1 次提交

net: constify MIB name tables · 5833929c

由 Alexey Dobriyan 提交于 1月 22, 2010

Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5833929c

27 4月, 2009 1 次提交

snmp: add missing counters for RFC 4293 · edf391ff

由 Neil Horman 提交于 4月 27, 2009

The IP MIB (RFC 4293) defines stats for InOctets, OutOctets, InMcastOctets and
OutMcastOctets:
http://tools.ietf.org/html/rfc4293
But it seems we don't track those in any way that easy to separate from other
protocols.  This patch adds those missing counters to the stats file. Tested
successfully by me

With help from Eric Dumazet.
Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

edf391ff

16 2月, 2009 1 次提交

net: replace commatas with semicolons · 1c10c49d

由 Thomas Gleixner 提交于 2月 16, 2009

Impact: syntax fix

Interestingly enough this compiles w/o any complaints:

	orphans = percpu_counter_sum_positive(&tcp_orphan_count),
	sockets = percpu_counter_sum_positive(&tcp_sockets_allocated),
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1c10c49d

30 12月, 2008 1 次提交

net: Fix percpu counters deadlock · eb4dea58

由 Herbert Xu 提交于 12月 29, 2008

When we converted the protocol atomic counters such as the orphan
count and the total socket count deadlocks were introduced due to
the mismatch in BH status of the spots that used the percpu counter
operations.

Based on the diagnosis and patch by Peter Zijlstra, this patch
fixes these issues by disabling BH where we may be in process
context.
Reported-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eb4dea58

26 11月, 2008 2 次提交

net: Use a percpu_counter for orphan_count · dd24c001

由 Eric Dumazet 提交于 11月 25, 2008

Instead of using one atomic_t per protocol, use a percpu_counter
for "orphan_count", to reduce cache line contention on
heavy duty network servers. 
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dd24c001

net: Use a percpu_counter for sockets_allocated · 1748376b

由 Eric Dumazet 提交于 11月 25, 2008

Instead of using one atomic_t per protocol, use a percpu_counter
for "sockets_allocated", to reduce cache line contention on
heavy duty network servers. 

Note : We revert commit (248969ae
net: af_unix can make unix_nr_socks visbile in /proc),
since it is not anymore used after sock_prot_inuse_add() addition
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1748376b

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功