Commits · ef5c0e253d0bde6e11a7f94e58a68c3b354afc6b · openeuler / Kernel

09 Feb, 2016 · 10 commits

Merge branch 'tpacket-gso-csum-offload' · ef5c0e25

Committed by David S. Miller on Feb 09, 2016

Willem de Bruijn says:

====================
packet: tpacket gso and csum offload

Extend PACKET_VNET_HDR socket option support to packet sockets with
memory mapped rings.

Patches 2 and 4 add support to tpacket_rcv and tpacket_snd.

Patch 1 prepares for this by moving the relevant virtio_net_hdr
logic out of packet_snd and packet_rcv into helper functions.

GSO transmission requires all headers in the skb linear section.
Patch 3 moves parsing of tx_ring slot headers before skb allocation
to enable allocation with sufficient linear size.

Changes
  v1->v2:
    - fix bounds checks:
      - subtract sizeof(vnet_hdr) before comparing tp_len to size_max
      - compare tp_len to size_max also with GSO, just do not truncate to MTU
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
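
The v1->v2 bounds-check notes can be read as a small rule. The sketch below assumes that tp_len counts the virtio_net_hdr in front of the frame, that size_max and mtu_limit are limits computed elsewhere, and that the helper name tx_len_ok() is hypothetical; it illustrates the checks as stated, not the code in tpacket_snd().

```c
#include <stdbool.h>

/* Size of the legacy struct virtio_net_hdr (10 bytes). */
#define VNET_HDR_LEN 10

/*
 * Hypothetical helper: apply the v2 bounds checks to one TX ring slot.
 * tp_len is assumed to include the vnet header, so the header is
 * subtracted before any comparison. size_max is always enforced; the
 * MTU-based limit is skipped for GSO packets, which are segmented later.
 */
static bool tx_len_ok(long tp_len, long size_max, long mtu_limit,
		      bool has_vnet_hdr, bool is_gso)
{
	if (has_vnet_hdr) {
		tp_len -= VNET_HDR_LEN;
		if (tp_len < 0)
			return false;	/* slot too short to hold the header */
	}
	if (tp_len > size_max)
		return false;		/* checked with and without GSO */
	if (!is_gso && tp_len > mtu_limit)
		return false;		/* only non-GSO frames are MTU-bound */
	return true;
}
```

For example, a 5010-byte slot with a vnet header carries a 5000-byte frame: as a GSO packet it passes with size_max = 65536, but the same frame without GSO fails against an mtu_limit of 1514.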

packet: tpacket_snd gso and checksum offload · 1d036d25

Committed by Willem de Bruijn on Feb 03, 2016

Support socket option PACKET_VNET_HDR together with PACKET_TX_RING.

When enabled, a struct virtio_net_hdr is expected to precede the data
in the ring. The vnet option must be set before the ring is created.

The implementation reuses the existing skb_copy_bits code that is used
when dev->hard_header_len is non-zero. Move this ll_header check to
before the skb alloc and combine it with a test for vnet_hdr->hdr_len.
Allocate and copy the max of the two.

Verified with test program at
github.com/wdebruij/kerneltools/blob/master/tests/psock_txring_vnet.c
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
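
The allocation rule ("allocate and copy the max of the two") amounts to choosing the larger of the link-layer header length and the vnet header's hdr_len as the linear copy length. A minimal sketch with a hypothetical helper name; in the kernel the inputs would correspond to dev->hard_header_len and the hdr_len field of struct virtio_net_hdr.

```c
#include <stddef.h>

/*
 * Hypothetical helper: number of bytes to copy into the skb linear area.
 * ll_hdr_len plays the role of dev->hard_header_len; vnet_hdr_len is the
 * hdr_len field from struct virtio_net_hdr (0 when no vnet header is used).
 * GSO needs all protocol headers in the linear area, so the larger of the
 * two wins; the remaining payload can be placed in page fragments.
 */
static size_t linear_copy_len(size_t ll_hdr_len, size_t vnet_hdr_len)
{
	return ll_hdr_len > vnet_hdr_len ? ll_hdr_len : vnet_hdr_len;
}
```

With a 14-byte Ethernet header and an hdr_len of 66 (Ethernet + IPv4 + TCP with options), 66 bytes would be copied linearly.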

packet: parse tpacket header before skb alloc · 8d39b4a6

Committed by Willem de Bruijn on Feb 03, 2016

GSO packet headers must be stored in the linear skb segment.
Move tpacket header parsing before sock_alloc_send_skb. The GSO
follow-on patch will later increase the skb linear argument to
sock_alloc_send_skb if needed for large packets.

The header parsing code does not require an allocated skb, so it is
safe to move. Then pass the computed data start and length to
tpacket_fill_skb.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

packet: vnet_hdr support for tpacket_rcv · 58d19b19

Committed by Willem de Bruijn on Feb 03, 2016

Support socket option PACKET_VNET_HDR together with PACKET_RX_RING.
When enabled, a struct virtio_net_hdr will precede the data in the
packet ring slots.

Verified with test program at
github.com/wdebruij/kerneltools/blob/master/tests/psock_rxring_vnet.c

  pkt: 1454269209.798420 len=5066
  vnet: gso_type=tcpv4 gso_size=1448 hlen=66 ecn=off
  csum: start=34 off=16
  eth: proto=0x800
  ip: src=<masked> dst=<masked> proto=6 len=5052
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
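
The slot layout described above (a struct virtio_net_hdr followed by the frame) can be decoded in userspace roughly as follows. This is a sketch: struct vnet_hdr is a local mirror of the legacy 10-byte struct virtio_net_hdr from <linux/virtio_net.h>, the VNET_* constants mirror the VIRTIO_NET_HDR_* values from that header, fields are read in host byte order, and the output format only imitates the test program's log line.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Mirrors of VIRTIO_NET_HDR_F_NEEDS_CSUM and VIRTIO_NET_HDR_GSO_* */
#define VNET_F_NEEDS_CSUM 1
#define VNET_GSO_NONE     0
#define VNET_GSO_TCPV4    1
#define VNET_GSO_UDP      3
#define VNET_GSO_TCPV6    4
#define VNET_GSO_ECN      0x80

/* Local mirror of the legacy struct virtio_net_hdr (10 bytes, no padding). */
struct vnet_hdr {
	uint8_t  flags;
	uint8_t  gso_type;
	uint16_t hdr_len;	/* length of all headers, e.g. eth+ip+tcp */
	uint16_t gso_size;	/* payload bytes per segment (MSS) */
	uint16_t csum_start;	/* offset where checksumming starts */
	uint16_t csum_offset;	/* checksum field offset, relative to csum_start */
};

static const char *gso_name(uint8_t gso_type)
{
	switch (gso_type & ~VNET_GSO_ECN) {
	case VNET_GSO_NONE:  return "none";
	case VNET_GSO_TCPV4: return "tcpv4";
	case VNET_GSO_UDP:   return "udp";
	case VNET_GSO_TCPV6: return "tcpv6";
	default:             return "unknown";
	}
}

/* Copy the header out of the start of an RX ring slot's data area. */
static struct vnet_hdr read_vnet_hdr(const uint8_t *slot_data)
{
	struct vnet_hdr h;

	memcpy(&h, slot_data, sizeof(h));
	return h;
}

static int format_vnet_hdr(const struct vnet_hdr *h, char *buf, size_t n)
{
	return snprintf(buf, n, "vnet: gso_type=%s gso_size=%u hlen=%u ecn=%s",
			gso_name(h->gso_type), (unsigned)h->gso_size,
			(unsigned)h->hdr_len,
			(h->gso_type & VNET_GSO_ECN) ? "on" : "off");
}
```

The Ethernet frame then starts at slot_data + sizeof(struct vnet_hdr). In the sample output, csum start=34 is 14 bytes of Ethernet plus a 20-byte IPv4 header, and off=16 is the position of the checksum field inside the TCP header.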

packet: move vnet_hdr code to helper functions · 16cc1400

Committed by Willem de Bruijn on Feb 03, 2016

packet_snd and packet_rcv support virtio net headers for GSO.
Move this logic into helper functions to be able to reuse it in
tpacket_snd and tpacket_rcv.

This is a straightforward code move with one exception. Instead of
creating and passing a separate gso_type variable, reuse
vnet_hdr.gso_type after conversion from virtio to kernel gso type.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: 3ad: apply ad_actor settings changes immediately · 5ee14e6d

Committed by Nikolay Aleksandrov on Feb 03, 2016

Currently, bonding allows ad_actor_system and prio to be set while the
bond device is down, but these values are applied only if there are no
slaves yet (to the bond device when the first slave shows up, and to
slaves at 3ad bind time). After this patch, changes are applied immediately
and the new values can be used/seen after the bond is brought up, so it is
no longer necessary to release all slaves and enslave them again to see the
changes.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bridge-mdb-entry-offload-flag' · a1b486ae

Committed by David S. Miller on Feb 09, 2016

Jiri Pirko says:

====================
bridge: mdb: flag offloaded mdb entries

This patchset extends uapi to let the user know if an mdb entry is offloaded.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: mdb: Passing the port-group pointer to br_mdb module · 9e8430f8

Committed by Elad Raz on Feb 03, 2016

Pass the port-group to br_mdb to allow direct access to the
structure. br_mdb will later use the structure to reflect the hardware
status via the "state" variable.
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: mdb: Separate br_mdb_entry->state from net_bridge_port_group->state · 9d06b6d8

Committed by Elad Raz on Feb 03, 2016

Change the net_bridge_port_group 'state' member to 'flags' and define a new
set of flags internal to the kernel.
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: mdb: add support for offloaded mdb entries · 157ede67

Committed by Elad Raz on Feb 03, 2016

Add a new bitmask member 'flags' to the br_mdb_entry structure, and add
the MDB_FLAGS_OFFLOAD bit, which indicates that an MDB entry is offloaded
to hardware.
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
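
The split between kernel-internal port-group flags and the user-visible br_mdb_entry fields can be sketched as a translation step. The PG_FLAG_* names and struct mdb_entry_view are hypothetical; MDB_TEMPORARY, MDB_PERMANENT and MDB_FLAGS_OFFLOAD mirror the uapi <linux/if_bridge.h> definitions as we understand them, so check them against your headers.

```c
#include <stdint.h>

/* Hypothetical kernel-internal port-group flags. */
#define PG_FLAG_PERMANENT (1u << 0)
#define PG_FLAG_OFFLOAD   (1u << 1)

/* Mirrors of the uapi values: 'state' keeps its old meaning,
 * 'flags' carries the new offload bit. */
#define MDB_TEMPORARY     0
#define MDB_PERMANENT     1
#define MDB_FLAGS_OFFLOAD (1u << 0)

/* Hypothetical subset of struct br_mdb_entry. */
struct mdb_entry_view {
	uint8_t state;
	uint8_t flags;
};

/* Translate internal flags into what userspace sees in a dump. */
static struct mdb_entry_view to_uapi(uint8_t pg_flags)
{
	struct mdb_entry_view e = { 0, 0 };

	e.state = (pg_flags & PG_FLAG_PERMANENT) ? MDB_PERMANENT : MDB_TEMPORARY;
	if (pg_flags & PG_FLAG_OFFLOAD)
		e.flags |= MDB_FLAGS_OFFLOAD;
	return e;
}
```

Keeping the internal bits separate from the uapi values lets the kernel add private flags without changing what 'state' means to existing userspace tools.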

08 Feb, 2016 · 30 commits

bonding: trivial: style fixes · d66bd905

Committed by Zhang Shengju on Feb 03, 2016

Remove some redundant brackets; use sizeof(*) instead of sizeof(struct x).
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Fix syncookies sysctl default. · 0aca737d

Committed by David S. Miller on Feb 08, 2016

Unintentionally the default was changed to zero, fix
that.

Fixes: 12ed8244 ("ipv4: Namespaceify tcp syncookies sysctl knob")
Signed-off-by: David S. Miller <davem@davemloft.net>
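
The regression fixed here is a typical hazard when a global sysctl becomes a per-namespace field: the per-namespace structure starts out zeroed, so any non-zero default must be set explicitly when the namespace is initialized. A minimal sketch with hypothetical names (struct netns_tcp_sketch, netns_tcp_create()); only the default of 1 for tcp_syncookies is taken from the kernel's documented behavior.

```c
#include <stdlib.h>

/* Hypothetical per-namespace TCP settings (one field shown). */
struct netns_tcp_sketch {
	int sysctl_tcp_syncookies;
};

/* Set defaults explicitly: calloc() leaves every field at 0, which for
 * tcp_syncookies would silently mean "disabled". */
static void netns_tcp_init(struct netns_tcp_sketch *ns)
{
	ns->sysctl_tcp_syncookies = 1;
}

/* Called once per new network namespace in this sketch. */
static struct netns_tcp_sketch *netns_tcp_create(void)
{
	struct netns_tcp_sketch *ns = calloc(1, sizeof(*ns));

	if (ns)
		netns_tcp_init(ns);
	return ns;
}
```

Forgetting the assignment in netns_tcp_init() would compile cleanly and reproduce exactly the "default was changed to zero" symptom.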

Merge branch 'ns-tcp-sysctls' · 7158ce80

Committed by David S. Miller on Feb 07, 2016

Nikolay Borisov says:

====================
Namespaceify more of the tcp sysctl knobs

This patch series continues making more of the tcp-related
sysctl knobs be per net-namespace. Most of these apply per
socket and have global defaults so should be safe and I
don't expect any breakages.

Having those per net-namespace is useful when multiple
containers are hosted and it is required to tune the
tcp settings for each independently of the host node.

I've split the patches to be per-sysctl but after
the review if the outcome is positive I'm happy
to either send it in one big blob or just.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Namespaceify tcp_notsent_lowat sysctl knob · 4979f2d9

Committed by Nikolay Borisov on Feb 03, 2016

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Namespaceify tcp_fin_timeout sysctl knob · 1e579caa

Committed by Nikolay Borisov on Feb 03, 2016

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Namespaceify tcp_orphan_retries sysctl knob · c402d9be

Committed by Nikolay Borisov on Feb 03, 2016

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Namespaceify tcp_retries2 sysctl knob · c6214a97

Committed by Nikolay Borisov on Feb 03, 2016

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Namespaceify tcp_retries1 sysctl knob · ae5c3f40

Committed by Nikolay Borisov on Feb 03, 2016

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Namespaceify tcp reordering sysctl knob · 1043e25f

Committed by Nikolay Borisov on Feb 03, 2016

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Namespaceify tcp syncookies sysctl knob · 12ed8244

Committed by Nikolay Borisov on Feb 03, 2016

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Namespaceify tcp synack retries sysctl knob · 7c083ecb

Committed by Nikolay Borisov on Feb 03, 2016

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Namespaceify tcp syn retries sysctl knob · 6fa25166

Committed by Nikolay Borisov on Feb 03, 2016

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge · 9d1eb21b

Committed by David S. Miller on Feb 07, 2016

Antonio Quartulli says:

====================
This batch of patches includes a number of corrections and
improvements for our kernel-doc. These changes also make sure
that our doc is now properly processed by the kernel-doc
parsing tool.

Other than that you have a patch updating all the copyright
lines to 2016 and another patch switching the URLs in our
readme, Kconfig and MAINTAINERS file from "http" to "https".
Both by Sven Eckelmann.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'virtio_net_ethtool_settings' · e6359193

Committed by David S. Miller on Feb 07, 2016

Nikolay Aleksandrov says:

====================
virtio_net: add ethtool get/set settings support

Patch 1 adds ethtool speed/duplex validation functions which check if the
value is defined. Patch 2 adds support for ethtool (get|set)_settings and
uses the validation functions to check the user-supplied values.

v2: split in 2 patches to allow everyone to make use of the validation
functions and allow virtio_net devices to be half duplex
v3: added a check to return error if the user tries to change anything else
besides duplex/speed as per Michael's comment
v4: Set port type to PORT_OTHER
v5: clear diff1.port (ignore port) when checking for changes since we set
it now and ethtool uses it in the set request

Sorry about the pointless iterations, should've all covered now.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
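
The v3 and v5 notes describe a "nothing else may change" check: copy the requested and current settings, blank out the fields the user may change (speed, duplex) and the one that is ignored (port), and compare what remains. A sketch with a hypothetical struct link_settings standing in for the ethtool request; its layout (one 32-bit field followed by four bytes) has no padding on common ABIs, which the static assertion checks, so memcmp() compares only real data.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of a legacy ethtool settings request. */
struct link_settings {
	uint32_t speed;
	uint8_t  duplex;
	uint8_t  port;
	uint8_t  autoneg;
	uint8_t  phy_address;
};
_Static_assert(sizeof(struct link_settings) == 8, "unexpected padding");

/* True if the request differs from the current settings only in
 * speed, duplex, or the ignored port field. */
static bool only_speed_duplex_changed(const struct link_settings *cur,
				      const struct link_settings *req)
{
	struct link_settings diff1 = *req, diff2 = *cur;

	/* blank out the fields that are allowed to differ */
	diff1.speed = diff2.speed = 0;
	diff1.duplex = diff2.duplex = 0;
	diff1.port = diff2.port = 0;	/* v5: port is set by the driver, ignore it */
	return memcmp(&diff1, &diff2, sizeof(diff1)) == 0;
}
```

A request that also flips autoneg would be rejected, which matches the v3 intent of returning an error for changes other than speed and duplex.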

virtio_net: add ethtool support for set and get of settings · 16032be5

Committed by Nikolay Aleksandrov on Feb 03, 2016

This patch allows the user to set and retrieve speed and duplex of the
virtio_net device via ethtool. Having this functionality is very helpful
for simulating different environments and also enables the virtio_net
device to participate in operations where proper speed and duplex are
required (e.g. currently bonding LACP mode requires full duplex). Custom
speed and duplex values are not allowed; the user-supplied settings are
validated before being applied.

Example:
$ ethtool eth1
Settings for eth1:
...
	Speed: Unknown!
	Duplex: Unknown! (255)
$ ethtool -s eth1 speed 1000 duplex full
$ ethtool eth1
Settings for eth1:
...
	Speed: 1000Mb/s
	Duplex: Full

Based on a patch by Roopa Prabhu.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: add speed/duplex validation functions · 103a8ad1

Committed by Nikolay Aleksandrov on Feb 03, 2016

Add functions which check if the speed/duplex are defined.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
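
A standalone sketch of such "is this value defined?" checks. Upstream, the helpers are ethtool_validate_speed() and ethtool_validate_duplex() in the uapi ethtool header; the functions below are not their code. Only a subset of the SPEED_* values is listed, and the constants mirror <linux/ethtool.h>.

```c
#include <stdbool.h>
#include <stdint.h>

#define SPEED_10        10
#define SPEED_100       100
#define SPEED_1000      1000
#define SPEED_10000     10000
#define SPEED_UNKNOWN   (-1)
#define DUPLEX_HALF     0x00
#define DUPLEX_FULL     0x01
#define DUPLEX_UNKNOWN  0xff

/* True only for speeds with a defined constant (subset shown). */
static bool speed_is_defined(uint32_t speed)
{
	switch (speed) {
	case SPEED_10:
	case SPEED_100:
	case SPEED_1000:
	case SPEED_10000:
	case (uint32_t)SPEED_UNKNOWN:	/* shown as "Unknown!" by ethtool */
		return true;
	default:
		return false;
	}
}

static bool duplex_is_defined(uint8_t duplex)
{
	switch (duplex) {
	case DUPLEX_HALF:
	case DUPLEX_FULL:
	case DUPLEX_UNKNOWN:		/* shown as "Unknown! (255)" */
		return true;
	default:
		return false;
	}
}
```

With these checks, `ethtool -s eth1 speed 1000 duplex full` is accepted, while an arbitrary speed such as 1234 is rejected before it reaches the device.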

Merge branch 'sunvnet-tracepoints' · 13340a0a

Committed by David S. Miller on Feb 07, 2016

Sowmini Varadhan says:

====================
sunvnet: perf tracepoint hooks

Added some perf tracepoints to help track and debug sunvnet
descriptor state at run-time.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

sunvnet: perf tracepoint invocations to trace LDC state machine · 365a1028

Committed by Sowmini Varadhan on Feb 02, 2016

Use sunvnet perf trace macros to monitor LDC message exchange state.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sunvnet: Add support for perf LDC event tracing · 46fcc6ef

Committed by Sowmini Varadhan on Feb 02, 2016

Add perf event macros to support tracing and instrumentation
of the LDC state machine.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tcp_cong_ctrl_refactoring' · ffeb6437

Committed by David S. Miller on Feb 07, 2016

Yuchung Cheng says:

====================
tcp: congestion control refactoring

This patch set refactors the sequence of congestion control,
loss recovery, and transmission logic in TCP ack processing.

The design goal is to decouple and sequence them in the following order:

  0. ACK accounting: free or tag sent packets [unchanged]

  1. loss recovery: identify lost/ecn packets and update congestion state

  2. congestion control: up/down cwnd and pacing rate based on (1)

  3. transmission: send new or retransmit old based on (1) and (2)

This refactoring makes cwnd changes clearer because they are made
in one place. The packet accounting is also more robust, especially
for connections that do not support SACK. Patches 1-4 and 6 are
refactoring, and patch 5 improves TCP performance under reordering.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
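
The four-stage order can be illustrated with a toy model. Everything here is hypothetical and deliberately simplified (no exit from recovery, halving on loss, slow-start-like growth otherwise); it is not the kernel's logic, only a way to see that the state is decided before cwnd changes, and cwnd before anything is sent.

```c
#include <stdbool.h>

enum ca_state { CA_OPEN, CA_RECOVERY };

/* Toy sender state; not the kernel's struct tcp_sock. */
struct toy_tcp {
	unsigned cwnd;		/* congestion window, in packets */
	unsigned inflight;	/* packets sent but not yet delivered */
	unsigned delivered;	/* running count of delivered packets */
	enum ca_state state;
};

/* Process one ACK in the four-stage order; returns packets to send now. */
static unsigned toy_on_ack(struct toy_tcp *tp, unsigned acked, bool loss)
{
	bool entering_recovery;
	unsigned quota;

	/* 0. ACK accounting: retire delivered packets */
	if (acked > tp->inflight)
		acked = tp->inflight;
	tp->inflight -= acked;
	tp->delivered += acked;

	/* 1. loss recovery: decide the state only, cwnd is not touched */
	entering_recovery = loss && tp->state == CA_OPEN;
	if (loss)
		tp->state = CA_RECOVERY;

	/* 2. congestion control: the single place where cwnd changes */
	if (entering_recovery)
		tp->cwnd = tp->cwnd / 2 ? tp->cwnd / 2 : 1;
	else if (tp->state == CA_OPEN)
		tp->cwnd += acked;	/* slow-start-like growth */

	/* 3. transmission: how much to send follows from (1) and (2) */
	quota = tp->cwnd > tp->inflight ? tp->cwnd - tp->inflight : 0;
	tp->inflight += quota;
	return quota;
}
```

Because stage 3 runs last, the amount sent after a loss already reflects the reduced cwnd, which is the point of moving retransmission after congestion control.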

tcp: tcp_cong_control helper · d452e6ca

Committed by Yuchung Cheng on Feb 02, 2016

Refactor and consolidate cwnd and rate updates into a new function
tcp_cong_control().
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: make congestion control more robust against reordering · 2d14a4de

Committed by Yuchung Cheng on Feb 02, 2016

This change enables congestion control to update cwnd based not
only on packets cumulatively acked but also on packets delivered
out of order. This makes congestion control robust against packet
reordering, because it may raise cwnd as long as packets are being
delivered once reordering has been detected (i.e., it only cares
about the number of packets delivered, not the order among them).
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: refactor pkts acked accounting · 3ebd8871

Committed by Yuchung Cheng on Feb 02, 2016

A small refactoring that gets the number of packets cumulatively acked
directly from tcp_clean_rtx_queue().
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: new delivery accounting · ddf1af6f

Committed by Yuchung Cheng on Feb 02, 2016

This patch changes the accounting of how many packets are
newly acked or sacked when the sender receives an ACK.

The current approach basically computes

   newly_acked_sacked = (prior_packets - prior_sacked) -
                        (tp->packets_out - tp->sacked_out)

   where prior_packets and prior_sacked are snapshots taken
   at the beginning of ACK processing.

The new approach tracks the delivery information via a new
TCP state variable "delivered", which monotonically increases
as new packets are delivered in order or out of order.

The reason for this change is that the current approach is
brittle and can produce negative or inaccurate estimates.

   1) For non-SACK connections, an ACK that advances the SND.UNA
   could reset the DUPACK counters (tp->sacked_out) in
   tcp_process_loss() or tcp_fastretrans_alert(). This suddenly
   inflates the inflight count and causes an underestimate or even
   a negative estimate. Here is a real example:

                   before   after (processing ACK)
   packets_out     75       73
   sacked_out      23        0
   ca state        Loss     Open

   The old approach computes (75-23) - (73 - 0) = -21 delivered,
   while the new approach computes 1 delivered since it
   considers the 2nd-24th packets to have been delivered OOO.

   2) An MSS change re-counts packets_out and sacked_out, so
   the estimate is inaccurate and can even become negative.
   E.g., the inflight is doubled when MSS is halved.

   3) Spurious retransmission signaled by DSACK is not accounted for.

The new approach is simpler and more robust. For SACK connections,
tp->delivered increments as packets are being acked or sacked in
SACK and ACK processing.

For non-SACK connections, it's done in tcp_remove_reno_sacks() and
tcp_add_reno_sack(). When an ACK advances the SND.UNA, tp->delivered
is incremented by the number of packets ACKed (less the current
number of DUPACKs received plus one packet hole).  Upon receiving
a DUPACK, tp->delivered is incremented assuming one out-of-order
packet is delivered.

Upon receiving a DSACK, tp->delivered is incremented assuming one
retransmission is delivered in tcp_sacktag_write_queue().
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
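
The numeric example in this commit message can be checked directly. The sketch computes the old snapshot-based estimate and, for the non-SACK case, the new rule as we read the description: packets ACKed minus the DUPACKs already counted, but never less than one packet (the hole). The function names are hypothetical; this is not the code in tcp_remove_reno_sacks().

```c
/* Old estimate: difference of two snapshots; can go negative when
 * sacked_out is reset while the ACK is being processed. */
static int old_newly_acked_sacked(int prior_packets, int prior_sacked,
				  int packets_out, int sacked_out)
{
	return (prior_packets - prior_sacked) - (packets_out - sacked_out);
}

/* New non-SACK rule when an ACK advances SND.UNA: 'acked' packets left
 * the retransmit queue, 'dupacks' of them were already counted as
 * delivered when their DUPACKs arrived, and at least the hole is new. */
static int reno_delivered_on_ack(int acked, int dupacks)
{
	int d = acked - dupacks;

	return d > 1 ? d : 1;
}
```

For the example, acked is 75 - 73 = 2 and 23 DUPACKs had been counted, so old_newly_acked_sacked(75, 23, 73, 0) yields -21 while reno_delivered_on_ack(2, 23) yields 1.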

tcp: move cwnd reduction after recovery state procesing · 31ba0c10

Committed by Yuchung Cheng on Feb 02, 2016

Currently the cwnd is reduced and increased in various
places. The reduction happens at several points in the recovery
state processing (tcp_fastretrans_alert), while the increase
happens afterward.

A better sequence is to identify lost packets and update
the congestion control state (icsk_ca_state) first. Then, based
on the new state, raise or lower the cwnd in one central place. This
makes it easier to reason about cwnd changes.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: retransmit after recovery processing and congestion control · e662ca40

Committed by Yuchung Cheng on Feb 02, 2016

The retransmission and F-RTO transmission currently happen inside
recovery state processing (tcp_fastretrans_alert) but before
congestion control. This refactoring moves the logic after both,
so that we can determine how much to send (cwnd) before deciding
what to send.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: drop write-only stack variable · 3575dbf2

Committed by David Herrmann on Feb 02, 2016

Remove a write-only stack variable from unix_attach_fds(). This is a
left-over from the security fix in:

    commit 712f4aad
    Author: willy tarreau <w@1wt.eu>
    Date:   Sun Jan 10 07:54:56 2016 +0100

        unix: properly account for FDs passed over unix sockets
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add support for fill_slave_info to VRF device · 67eb0331

Committed by David Ahern on Feb 02, 2016

This gives userspace direct access to the VRF table association
instead of having to look up the master device and its table.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen-netback: implement dynamic multicast control · 22fae97d

Committed by Paul Durrant on Feb 02, 2016

My recent patch to the Xen Project documents a protocol for 'dynamic
multicast control' in netif.h. This extends the previous multicast control
protocol to not require a shared ring reconnection to turn the feature off.
Instead the backend watches the "request-multicast-control" key in xenstore
and turns the feature off if the key value is written to zero.

This patch adds support for dynamic multicast control in xen-netback.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
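
On the backend side, dynamic multicast control reduces to re-evaluating one xenstore key whenever its watch fires. A minimal sketch: struct vif_sketch and both functions are hypothetical, the key's value is passed in as a string (NULL when the key is absent, which this sketch treats as "off"), and reading xenstore itself is out of scope.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-interface backend state. */
struct vif_sketch {
	bool multicast_control;
};

/* Interpret the value of "request-multicast-control": non-zero enables
 * the feature; zero (or, in this sketch, a missing key) disables it. */
static bool multicast_control_requested(const char *value)
{
	if (value == NULL)
		return false;
	return strtol(value, NULL, 10) != 0;
}

/* Watch callback: update the feature in place, so turning it off does
 * not require reconnecting the shared rings. */
static void on_multicast_control_watch(struct vif_sketch *vif,
				       const char *value)
{
	vif->multicast_control = multicast_control_requested(value);
}
```

The design point from the commit message is that the state lives in a watched key rather than being negotiated once at connection time, so the frontend can clear it at any moment.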

Merge branch 'be2net-non-critical-fixes' · ce30905a

Committed by David S. Miller on Feb 07, 2016

Sriharsha Basavapatna says:

====================
be2net patch-set

v2 changes:
	Patch-4:	Changed a tab to space in be.h
	Patches-6,7,8:	Updated commit log summary line: benet --> be2net

Hi David,

The following patch set contains a few non-critical bug fixes. Please
consider applying this to the net-next tree. Thanks.

Patch-1 fixes be_set_phys_id() ethtool function to return an error code.
Patch-2 fixes a warning when some commands fail for VFs.
Patch-3 fixes be_vlan_rem_vid() to verify vlan being removed is in the list.
Patch-4 improves SRIOV queue distribution logic.
Patch-5 avoids running self test on VFs.
Patch-6 fixes error recovery in Lancer to clean up after moving to ready state.
Patch-7 adds retry logic to error recovery in case of recovery failures
Patch-8 fixes time interval used in eq delay computation routine
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
