提交 · ebfbd2cff4e374ea243d58d42ab075c4e9cb6f68 · openanolis / cloud-kernel

04 6月, 2019 1 次提交

net/tcp: Support tunable tcp timeout value in TIME-WAIT state · ebfbd2cf

由 George Zhang 提交于 3月 28, 2018

By default the tcp_tw_timeout value is 60 seconds. The minimum is
1 second and the maximum is 600. This setting is useful on system under
heavy tcp load.

NOTE: set the tcp_tw_timeout below 60 seconds voilates the "quiet time"
restriction, and make your system into the risk of causing some old data
to be accepted as new or new data rejected as old duplicated by some
receivers.

Link: http://web.archive.org/web/20150102003320/http://tools.ietf.org/html/rfc793Signed-off-by: NGeorge Zhang <georgezhang@linux.alibaba.com>
Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

ebfbd2cf

02 5月, 2019 1 次提交

ipv4: set the tcp_min_rtt_wlen range from 0 to one day · 250e51f8

由 ZhangXiaoxu 提交于 4月 16, 2019

[ Upstream commit 19fad20d15a6494f47f85d869f00b11343ee5c78 ]

There is a UBSAN report as below:
UBSAN: Undefined behaviour in net/ipv4/tcp_input.c:2877:56
signed integer overflow:
2147483647 * 1000 cannot be represented in type 'int'
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.1.0-rc4-00058-g582549e #1
Call Trace:
 <IRQ>
 dump_stack+0x8c/0xba
 ubsan_epilogue+0x11/0x60
 handle_overflow+0x12d/0x170
 ? ttwu_do_wakeup+0x21/0x320
 __ubsan_handle_mul_overflow+0x12/0x20
 tcp_ack_update_rtt+0x76c/0x780
 tcp_clean_rtx_queue+0x499/0x14d0
 tcp_ack+0x69e/0x1240
 ? __wake_up_sync_key+0x2c/0x50
 ? update_group_capacity+0x50/0x680
 tcp_rcv_established+0x4e2/0xe10
 tcp_v4_do_rcv+0x22b/0x420
 tcp_v4_rcv+0xfe8/0x1190
 ip_protocol_deliver_rcu+0x36/0x180
 ip_local_deliver+0x15b/0x1a0
 ip_rcv+0xac/0xd0
 __netif_receive_skb_one_core+0x7f/0xb0
 __netif_receive_skb+0x33/0xc0
 netif_receive_skb_internal+0x84/0x1c0
 napi_gro_receive+0x2a0/0x300
 receive_buf+0x3d4/0x2350
 ? detach_buf_split+0x159/0x390
 virtnet_poll+0x198/0x840
 ? reweight_entity+0x243/0x4b0
 net_rx_action+0x25c/0x770
 __do_softirq+0x19b/0x66d
 irq_exit+0x1eb/0x230
 do_IRQ+0x7a/0x150
 common_interrupt+0xf/0xf
 </IRQ>

It can be reproduced by:
  echo 2147483647 > /proc/sys/net/ipv4/tcp_min_rtt_wlen

Fixes: f6722583 ("tcp: track min RTT using windowed min-filter")
Signed-off-by: NZhangXiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

250e51f8

27 9月, 2018 1 次提交

net-tcp: /proc/sys/net/ipv4/tcp_probe_interval is a u32 not int · d4ce5808

由 Maciej Żenczykowski 提交于 9月 25, 2018

(fix documentation and sysctl access to treat it as such)

Tested:
  # zcat /proc/config.gz | egrep ^CONFIG_HZ
  CONFIG_HZ_1000=y
  CONFIG_HZ=1000
  # echo $[(1<<32)/1000 + 1] | tee /proc/sys/net/ipv4/tcp_probe_interval
  4294968
  tee: /proc/sys/net/ipv4/tcp_probe_interval: Invalid argument
  # echo $[(1<<32)/1000] | tee /proc/sys/net/ipv4/tcp_probe_interval
  4294967
  # echo 0 | tee /proc/sys/net/ipv4/tcp_probe_interval
  # echo -1 | tee /proc/sys/net/ipv4/tcp_probe_interval
  -1
  tee: /proc/sys/net/ipv4/tcp_probe_interval: Invalid argument
Signed-off-by: NMaciej Żenczykowski <maze@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d4ce5808

17 8月, 2018 2 次提交

Documentation: networking: ti-cpsw: correct cbs parameters for Eth1 100Mb · 70fd8036

由 Ivan Khoronzhuk 提交于 8月 15, 2018

If set cbs parameters calculated for 1000Mb, but use on 100Mb port
w/o h/w offload (for cpsw offload it doesn't matter), it works
incorrectly. According to the example and testing board, second port
is 100Mb interface. Correct them on recalculated for 100Mb interface.
It allows to use the same command for CBS software implementation for
board in example.
Signed-off-by: NIvan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

70fd8036

netfilter: doc: Add nf_tables part in tproxy.txt · 1bfc2bc7

由 Máté Eckl 提交于 8月 10, 2018

Recently, transparent proxy support has been added to nf_tables so that
this document should be updated with the new information.

- Nft commands are added as alternatives to iptables ones.
- The link for a patched iptables is removed as it is already part of
  the mainline iptables implementation (and the link is dead).
- tcprdr is added as an example implementation of a transparent proxy

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Florian Westphal <fw@strlen.de>
Cc: KOVACS Krisztian <hidden@sch.bme.hu>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: linux-doc@vger.kernel.org
Signed-off-by: NMáté Eckl <ecklm94@gmail.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

1bfc2bc7

13 8月, 2018 1 次提交

ipv6: Add icmp_echo_ignore_all support for ICMPv6 · e6f86b0f

由 Virgile Jarry 提交于 8月 10, 2018

Preventing the kernel from responding to ICMP Echo Requests messages
can be useful in several ways. The sysctl parameter
'icmp_echo_ignore_all' can be used to prevent the kernel from
responding to IPv4 ICMP echo requests. For IPv6 pings, such
a sysctl kernel parameter did not exist.

Add the ability to prevent the kernel from responding to IPv6
ICMP echo requests through the use of the following sysctl
parameter : /proc/sys/net/ipv6/icmp/echo_ignore_all.
Update the documentation to reflect this change.
Signed-off-by: NVirgile Jarry <virgile@acceis.fr>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e6f86b0f

02 8月, 2018 2 次提交

net: ipv4: Control SKB reprioritization after forwarding · 432e05d3

由 Petr Machata 提交于 8月 01, 2018

After IPv4 packets are forwarded, the priority of the corresponding SKB
is updated according to the TOS field of IPv4 header. This overrides any
prioritization done earlier by e.g. an skbedit action or ingress-qos-map
defined at a vlan device.

Such overriding may not always be desirable. Even if the packet ends up
being routed, which implies this is an L3 network node, an administrator
may wish to preserve whatever prioritization was done earlier on in the
pipeline.

Therefore introduce a sysctl that controls this behavior. Keep the
default value at 1 to maintain backward-compatible behavior.
Signed-off-by: NPetr Machata <petrm@mellanox.com>
Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

432e05d3

Documentation: dpaa2: Use correct heading adornment · e02ee981

由 Ioana Ciornei 提交于 7月 31, 2018

Add overline heading adornment to document title in order to comply
with kernel doc requirements.

Fixes: 60b91319 staging: fsl-mc: Convert documentation to rst format
Signed-off-by: NIoana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e02ee981

27 7月, 2018 2 次提交

can: ucan: add driver for Theobroma Systems UCAN devices · 9f2d3eae

由 Jakob Unterwurzacher 提交于 4月 11, 2018

The UCAN driver supports the microcontroller-based USB/CAN
adapters from Theobroma Systems. There are two form-factors
that run essentially the same firmware:

* Seal: standalone USB stick ( https://www.theobroma-systems.com/seal )

* Mule: integrated on the PCB of various System-on-Modules from
  Theobroma Systems like the A31-µQ7 and the RK3399-Q7
  ( https://www.theobroma-systems.com/rk3399-q7 )

The USB wire protocol has been designed to be as generic and
hardware-indendent as possible in the hope of being useful for
implementation on other microcontrollers.
Signed-off-by: NMartin Elshuber <martin.elshuber@theobroma-systems.com>
Signed-off-by: NJakob Unterwurzacher <jakob.unterwurzacher@theobroma-systems.com>
Signed-off-by: NPhilipp Tomsich <philipp.tomsich@theobroma-systems.com>
Acked-by: NWolfgang Grandegger <wg@grandegger.com>
Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>

9f2d3eae

docs: net: Convert netdev-FAQ to restructured text · 96398ddf

由 Tobin C. Harding 提交于 7月 26, 2018

Preferred kernel docs format is now restructured text.  Convert
netdev-FAQ.txt to restructured text.

 - Add SPDX license identifier.

 - Change file heading 'Information you need to know about netdev' to
  'netdev FAQ' to better suit displayed index (in HTML).

 - Change question/answer layout to suit rst.  Copy format in
   Documentation/bpf/bpf_devel_QA.rst

 - Fix indentation of code snippets

 - If multiple consecutive URLs appear put them in a list (to maintain
  whitespace).

 - Use uniform spelling of 'bug fix' throughout document (not bugfix or
   bug-fix).

 - Add double back ticks to 'net' and 'net-next' when referring to the
   trees.

 - Use rst references for Documentation/ links.

 - Add rst label 'netdev-FAQ' for referencing by other docs files.

 - Remove stale entry from Documentation/networking/00-INDEX
Signed-off-by: NTobin C. Harding <me@tobin.cc>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

96398ddf

25 7月, 2018 1 次提交

soc: fsl: dpio: Convert DPIO documentation to .rst · d8e516ba

由 Roy Pledge 提交于 7月 24, 2018

Convert the Datapath I/O documentation to .rst format
and move to the Documation/networking/dpaa2 directory
Signed-off-by: NRoy Pledge <roy.pledge@nxp.com>
Reviewed-by: NHoria Geantă <horia.geanta@nxp.com>
Reviewed-by: NIoana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: NLi Yang <leoyang.li@nxp.com>

d8e516ba

24 7月, 2018 1 次提交

Documentation: networking: cpsw: add MQPRIO & CBS offload examples · ae62372f

由 Ivan Khoronzhuk 提交于 7月 24, 2018

This document describes MQPRIO and CBS Qdisc offload configuration
for cpsw driver based on examples. It potentially can be used in
audio video bridging (AVB) and time sensitive networking (TSN).
Reviewed-by: NIlias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: NGrygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: NIvan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ae62372f

19 7月, 2018 2 次提交

docs: networking: Convert bridge.txt to rst · 6b335f82

由 Tobin C. Harding 提交于 7月 18, 2018

The kernel documentation is now restructured text. Convert the Ethernet
Bridge documentation and include it in the toplevel kernel
documentation.

 - Fix heading adornments.
 - Add license identifier.
Signed-off-by: NTobin C. Harding <me@tobin.cc>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6b335f82

docs: networking: Convert alias.txt to rst · 735dadf8

由 Tobin C. Harding 提交于 7月 18, 2018

The kernel documentation is now restructured text. Convert the IP
aliasing documentation and include it in the toplevel kernel
documentation.

 - Fix heading adornments.
 - Correctly indent code snippets.
 - Limit line length to 72 characters inline with kernel documentation
   standards.
 - Add license identifier.
Signed-off-by: NTobin C. Harding <me@tobin.cc>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

735dadf8

17 7月, 2018 3 次提交

bonding: Fix a typo in bonding.txt · 9f80a072

由 Masanari Iida 提交于 7月 13, 2018

This patch fixes a spelling typo in bonding.txt
Signed-off-by: NMasanari Iida <standby24x7@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9f80a072

docs: networking: Fix failover build warnings · 28809849

由 Tobin C. Harding 提交于 7月 12, 2018

Currently building the net_failover docs causes a bunch of warnings to
be emitted.  These warnings are all related to indentation and correctly
highlight missing '::' (for code sections).  It looks, from other rst
files in Documentation, that the first column should be indented 2
spaces.

Add '::' before code snippets and indent all snippets uniformly starting
with 2 spaces.

Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: NTobin C. Harding <me@tobin.cc>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

28809849

docs: networking: Add failover docs to index · d95768d3

由 Tobin C. Harding 提交于 7月 12, 2018

Currently we have rst format docs for the failover and net_failover
modules however these docs are not linked to within the index.

Add `failover` and `net_failover` to the networking documentation index.
Signed-off-by: NTobin C. Harding <me@tobin.cc>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d95768d3

12 7月, 2018 3 次提交

networking: e1000.rst: Get rid of Sphinx warnings · 8dc4b1a7

由 Mauro Carvalho Chehab 提交于 6月 26, 2018

Documentation/networking/e1000.rst:83: ERROR: Unexpected indentation.
Documentation/networking/e1000.rst:84: WARNING: Block quote ends without a blank line; unexpected unindent.
Documentation/networking/e1000.rst:173: WARNING: Definition list ends without a blank line; unexpected unindent.
Documentation/networking/e1000.rst:236: WARNING: Definition list ends without a blank line; unexpected unindent.

While here, fix highlights and mark a table as such.
Signed-off-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

8dc4b1a7

networking: e100.rst: Get rid of Sphinx warnings · b203cc7a

由 Mauro Carvalho Chehab 提交于 6月 26, 2018

Documentation/networking/e100.rst:57: WARNING: Literal block expected; none found.
Documentation/networking/e100.rst:68: WARNING: Literal block expected; none found.
Documentation/networking/e100.rst:75: WARNING: Literal block expected; none found.
Documentation/networking/e100.rst:84: WARNING: Literal block expected; none found.
Documentation/networking/e100.rst:93: WARNING: Inline emphasis start-string without end-string.

While here, fix some highlights.
Signed-off-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

b203cc7a

Documentation: ip-sysctl.txt: document addr_gen_mode · f168db5e

由 Sabrina Dubroca 提交于 7月 09, 2018

addr_gen_mode was introduced in without documentation, add it now.

Fixes: d35a00b8 ("net/ipv6: allow sysctl to change link-local address generation mode")
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Reviewed-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f168db5e

02 7月, 2018 1 次提交
- A
  Documentation: Add explanation for XPS using Rx-queue(s) map · a4fd1f4b
  由 Amritha Nambiar 提交于 6月 29, 2018
```
Signed-off-by: NAmritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
  a4fd1f4b
28 6月, 2018 1 次提交

skbuff: preserve sock reference when scrubbing the skb. · 9c4c3252

由 Flavio Leitner 提交于 6月 27, 2018

The sock reference is lost when scrubbing the packet and that breaks
TSQ (TCP Small Queues) and XPS (Transmit Packet Steering) causing
performance impacts of about 50% in a single TCP stream when crossing
network namespaces.

XPS breaks because the queue mapping stored in the socket is not
available, so another random queue might be selected when the stack
needs to transmit something like a TCP ACK, or TCP Retransmissions.
That causes packet re-ordering and/or performance issues.

TSQ breaks because it orphans the packet while it is still in the
host, so packets are queued contributing to the buffer bloat problem.

Preserving the sock reference fixes both issues. The socket is
orphaned anyways in the receiving path before any relevant action
and on TX side the netfilter checks if the reference is local before
use it.
Signed-off-by: NFlavio Leitner <fbl@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9c4c3252

24 6月, 2018 1 次提交

strparser: Corrected typo in documentation. · 3531456a

由 Vakul Garg 提交于 6月 24, 2018

Replaced strp_pause() with strp_unpause() to correct a seemingly copy
paste documentation mistake.
Signed-off-by: NVakul Garg <vakul.garg@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3531456a

23 6月, 2018 4 次提交

Documentation: e1000: Fix docs build error · 805f16a5

由 Tobin C. Harding 提交于 6月 22, 2018

Recent patch updated e1000 docs to rst format.  Docs build (`make
htmldocs`) is currently failing due to this file with error:

        (SEVERE/4) Unexpected section title.

This is because a section of the file is indented 2 spaces.  Build error
can be cleared by aligning the text with column 0.  While we are changing
these lines we can make sure line length does not exceed 72, that
newlines following headings are uniform, and that full stops are
followed by two spaces.

Align text with column 0, limit line length to 72, ensure two spaces
follow all full stops, ensure uniform use of newlines after heading.

Fixes commit (228046e7 Documentation: e1000: Update kernel documentation)

CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: NTobin C. Harding <me@tobin.cc>
Acked-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

805f16a5

Documentation: e100: Fix docs build error · 3b0c3ebe

由 Tobin C. Harding 提交于 6月 22, 2018

Recent patch updated e100 docs to rst format.  Docs build (`make
htmldocs`) is currently failing due to this file with error:

	(SEVERE/4) Unexpected section title.

This is because a section of the file is indented 2 spaces.  Build error
can be cleared by aligning the text with column 0.  While we are changing
these lines we can make sure line length does not exceed 72, that
newlines following headings are uniform, and that full stops are
followed by two spaces.

Align text with column 0, limit line length to 72, ensure two spaces
follow all full stops, ensure uniform use of newlines after heading.

Fixes commit (85d63445 Documentation: e100: Update the Intel 10/100 driver doc)

CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: NTobin C. Harding <me@tobin.cc>
Acked-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3b0c3ebe

Documentation: e1000: Use correct heading adornment · 3be40e54

由 Tobin C. Harding 提交于 6月 22, 2018

Recently documentation file was converted to rst.  The document title
has the incorrect heading adornment.  From kernel docs:

	* Please stick to this order of heading adornments:

	  1. ``=`` with overline for document title::

	       ==============
	       Document title
	       ==============

Add  overline heading adornment to document title.

Fixes commit (228046e7 Documentation: e1000: Update kernel documentation)

CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: NTobin C. Harding <me@tobin.cc>
Acked-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3be40e54

Documentation: e100: Use correct heading adornment · 32e6996c

由 Tobin C. Harding 提交于 6月 22, 2018

Recently documentation file was converted to rst.  The document title
has the incorrect heading adornment.  From kernel docs:

	* Please stick to this order of heading adornments:

	  1. ``=`` with overline for document title::

	       ==============
	       Document title
	       ==============

Add  overline heading adornment to document title.

Fixes commit (85d63445 Documentation: e100: Update the Intel 10/100 driver doc)

CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: NTobin C. Harding <me@tobin.cc>
Acked-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

32e6996c

15 6月, 2018 1 次提交

docs: can.rst: fix a footnote reference · b7017450

由 Mauro Carvalho Chehab 提交于 5月 06, 2018

As stated at:
	http://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#footnotes

A footnote should contain either a number, a reference or
an auto number, e. g.:
        [1], [#f1] or [#].

While using [*] accidentaly works for html, it fails for other
document outputs. In particular, it causes an error with LaTeX
output, causing all books after networking to not be built.

So, replace it by a valid syntax.
Acked-by: NOliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: NJonathan Corbet <corbet@lwn.net>

b7017450

06 6月, 2018 1 次提交

netdev-FAQ: clarify DaveM's position for stable backports · 75d4e704

由 Cong Wang 提交于 6月 05, 2018

Per discussion with David at netconf 2018, let's clarify
DaveM's position of handling stable backports in netdev-FAQ.

This is important for people relying on upstream -stable
releases.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

75d4e704

05 6月, 2018 4 次提交

docs: networking: fix minor typos in various documentation files · bb38ccce

由 Olivier Gayot 提交于 6月 04, 2018

This patch fixes some typos/misspelling errors in the
Documentation/networking files.
Signed-off-by: NOlivier Gayot <olivier.gayot@sigexec.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bb38ccce

net-tcp: extend tcp_tw_reuse sysctl to enable loopback only optimization · 79e9fed4

由 Maciej Żenczykowski 提交于 6月 03, 2018

This changes the /proc/sys/net/ipv4/tcp_tw_reuse from a boolean
to an integer.

It now takes the values 0, 1 and 2, where 0 and 1 behave as before,
while 2 enables timewait socket reuse only for sockets that we can
prove are loopback connections:
  ie. bound to 'lo' interface or where one of source or destination
  IPs is 127.0.0.0/8, ::ffff:127.0.0.0/104 or ::1.

This enables quicker reuse of ephemeral ports for loopback connections
- where tcp_tw_reuse is 100% safe from a protocol perspective
(this assumes no artificially induced packet loss on 'lo').

This also makes estblishing many loopback connections *much* faster
(allocating ports out of the first half of the ephemeral port range
is significantly faster, then allocating from the second half)

Without this change in a 32K ephemeral port space my sample program
(it just establishes and closes [::1]:ephemeral -> [::1]:server_port
connections in a tight loop) fails after 32765 connections in 24 seconds.
With it enabled 50000 connections only take 4.7 seconds.

This is particularly problematic for IPv6 where we only have one local
address and cannot play tricks with varying source IP from 127.0.0.0/8
pool.
Signed-off-by: NMaciej Żenczykowski <maze@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Wei Wang <weiwan@google.com>
Change-Id: I0377961749979d0301b7b62871a32a4b34b654e1
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

79e9fed4

Documentation: e1000: Update kernel documentation · 228046e7

由 Jeff Kirsher 提交于 5月 10, 2018

Updated the e1000.txt kernel documentation with the latest information.

Also convert the text file to reStructuredText (RST) format, since the
Linux kernel documentation now uses this format for documentation.
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: NAaron Brown <aaron.f.brown@intel.com>

228046e7

Documentation: e100: Update the Intel 10/100 driver doc · 85d63445

由 Jeff Kirsher 提交于 5月 10, 2018

Over the years, several of the links have changed or are no longer valid
so update them.  In addition, the default values were incorrect for a
couple of parameters.

Converted the text file to the reStructuredText (RST) format, since the
Linux kernel documentation now uses this format for documentation.
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: NAaron Brown <aaron.f.brown@intel.com>

85d63445

04 6月, 2018 1 次提交

xsk: new descriptor addressing scheme · bbff2f32

由 Björn Töpel 提交于 6月 04, 2018

Currently, AF_XDP only supports a fixed frame-size memory scheme where
each frame is referenced via an index (idx). A user passes the frame
index to the kernel, and the kernel acts upon the data.  Some NICs,
however, do not have a fixed frame-size model, instead they have a
model where a memory window is passed to the hardware and multiple
frames are filled into that window (referred to as the "type-writer"
model).

By changing the descriptor format from the current frame index
addressing scheme, AF_XDP can in the future be extended to support
these kinds of NICs.

In the index-based model, an idx refers to a frame of size
frame_size. Addressing a frame in the UMEM is done by offseting the
UMEM starting address by a global offset, idx * frame_size + offset.
Communicating via the fill- and completion-rings are done by means of
idx.

In this commit, the idx is removed in favor of an address (addr),
which is a relative address ranging over the UMEM. To convert an
idx-based address to the new addr is simply: addr = idx * frame_size +
offset.

We also stop referring to the UMEM "frame" as a frame. Instead it is
simply called a chunk.

To transfer ownership of a chunk to the kernel, the addr of the chunk
is passed in the fill-ring. Note, that the kernel will mask addr to
make it chunk aligned, so there is no need for userspace to do
that. E.g., for a chunk size of 2k, passing an addr of 2048, 2050 or
3000 to the fill-ring will refer to the same chunk.

On the completion-ring, the addr will match that of the Tx descriptor,
passed to the kernel.

Changing the descriptor format to use chunks/addr will allow for
future changes to move to a type-writer based model, where multiple
frames can reside in one chunk. In this model passing one single chunk
into the fill-ring, would potentially result in multiple Rx
descriptors.

This commit changes the uapi of AF_XDP sockets, and updates the
documentation.
Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

bbff2f32

29 5月, 2018 3 次提交

virtio_net: Extend virtio to use VF datapath when available · ba5e4426

由 Sridhar Samudrala 提交于 5月 24, 2018

This patch enables virtio_net to switch over to a VF datapath when STANDBY
feature is enabled and a VF netdev is present with the same MAC address.
It allows live migration of a VM with a direct attached VF without the need
to setup a bond/team between a VF and virtio net device in the guest.

It uses the API that is exported by the net_failover driver to create and
and destroy a master failover netdev. When STANDBY feature is enabled, an
additional netdev(failover netdev) is created that acts as a master device
and tracks the state of the 2 lower netdevs. The original virtio_net netdev
is marked as 'standby' netdev and a passthru device with the same MAC is
registered as 'primary' netdev.

The hypervisor needs to unplug the VF device from the guest on the source
host and reset the MAC filter of the VF to initiate failover of datapath
to virtio before starting the migration. After the migration is completed,
the destination hypervisor sets the MAC filter on the VF and plugs it back
to the guest to switch over to VF datapath.

This patch is based on the discussion initiated by Jesse on this thread.
https://marc.info/?l=linux-virtualization&m=151189725224231&w=2Signed-off-by: NSridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ba5e4426

net: Introduce net_failover driver · cfc80d9a

由 Sridhar Samudrala 提交于 5月 24, 2018

The net_failover driver provides an automated failover mechanism via APIs
to create and destroy a failover master netdev and manages a primary and
standby slave netdevs that get registered via the generic failover
infrastructure.

The failover netdev acts a master device and controls 2 slave devices. The
original paravirtual interface gets registered as 'standby' slave netdev and
a passthru/vf device with the same MAC gets registered as 'primary' slave
netdev. Both 'standby' and 'failover' netdevs are associated with the same
'pci' device. The user accesses the network interface via 'failover' netdev.
The 'failover' netdev chooses 'primary' netdev as default for transmits when
it is available with link up and running.

This can be used by paravirtual drivers to enable an alternate low latency
datapath. It also enables hypervisor controlled live migration of a VM with
direct attached VF by failing over to the paravirtual datapath when the VF
is unplugged.
Signed-off-by: NSridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cfc80d9a

net: Introduce generic failover module · 30c8bd5a

由 Sridhar Samudrala 提交于 5月 24, 2018

The failover module provides a generic interface for paravirtual drivers
to register a netdev and a set of ops with a failover instance. The ops
are used as event handlers that get called to handle netdev register/
unregister/link change/name change events on slave pci ethernet devices
with the same mac address as the failover netdev.

This enables paravirtual drivers to use a VF as an accelerated low latency
datapath. It also allows migration of VMs with direct attached VFs by
failing over to the paravirtual datapath when the VF is unplugged.
Signed-off-by: NSridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

30c8bd5a

25 5月, 2018 1 次提交

ppp: remove the PPPIOCDETACH ioctl · af8d3c7c

由 Eric Biggers 提交于 5月 23, 2018

The PPPIOCDETACH ioctl effectively tries to "close" the given ppp file
before f_count has reached 0, which is fundamentally a bad idea.  It
does check 'f_count < 2', which excludes concurrent operations on the
file since they would only be possible with a shared fd table, in which
case each fdget() would take a file reference.  However, it fails to
account for the fact that even with 'f_count == 1' the file can still be
linked into epoll instances.  As reported by syzbot, this can trivially
be used to cause a use-after-free.

Yet, the only known user of PPPIOCDETACH is pppd versions older than
ppp-2.4.2, which was released almost 15 years ago (November 2003).
Also, PPPIOCDETACH apparently stopped working reliably at around the
same time, when the f_count check was added to the kernel, e.g. see
https://lkml.org/lkml/2002/12/31/83.  Also, the current 'f_count < 2'
check makes PPPIOCDETACH only work in single-threaded applications; it
always fails if called from a multithreaded application.

All pppd versions released in the last 15 years just close() the file
descriptor instead.

Therefore, instead of hacking around this bug by exporting epoll
internals to modules, and probably missing other related bugs, just
remove the PPPIOCDETACH ioctl and see if anyone actually notices.  Leave
a stub in place that prints a one-time warning and returns EINVAL.

Reported-by: syzbot+16363c99d4134717c05b@syzkaller.appspotmail.com
Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: NEric Biggers <ebiggers@google.com>
Acked-by: NPaul Mackerras <paulus@ozlabs.org>
Reviewed-by: NGuillaume Nault <g.nault@alphalink.fr>
Tested-by: NGuillaume Nault <g.nault@alphalink.fr>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

af8d3c7c

18 5月, 2018 2 次提交

tcp: add tcp_comp_sack_nr sysctl · 9c21d2fc

由 Eric Dumazet 提交于 5月 17, 2018

This per netns sysctl allows for TCP SACK compression fine-tuning.

This limits number of SACK that can be compressed.
Using 0 disables SACK compression.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9c21d2fc

tcp: add tcp_comp_sack_delay_ns sysctl · 6d82aa24

由 Eric Dumazet 提交于 5月 17, 2018

This per netns sysctl allows for TCP SACK compression fine-tuning.

Its default value is 1,000,000, or 1 ms to meet TSO autosizing period.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6d82aa24

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功