提交 · ca065d0cf80fa547724440a8bf37f1e674d917c0 · openeuler / Kernel

05 4月, 2016 40 次提交

udp: no longer use SLAB_DESTROY_BY_RCU · ca065d0c

由 Eric Dumazet 提交于 4月 01, 2016

Tom Herbert would like not touching UDP socket refcnt for encapsulated
traffic. For this to happen, we need to use normal RCU rules, with a grace
period before freeing a socket. UDP sockets are not short lived in the
high usage case, so the added cost of call_rcu() should not be a concern.

This actually removes a lot of complexity in UDP stack.

Multicast receives no longer need to hold a bucket spinlock.

Note that ip early demux still needs to take a reference on the socket.

Same remark for functions used by xt_socket and xt_PROXY netfilter modules,
but this might be changed later.

Performance for a single UDP socket receiving flood traffic from
many RX queues/cpus.

Simple udp_rx using simple recvfrom() loop :
438 kpps instead of 374 kpps : 17 % increase of the peak rate.

v2: Addressed Willem de Bruijn feedback in multicast handling
 - keep early demux break in __udp4_lib_demux_lookup()
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Cc: Willem de Bruijn <willemb@google.com>
Tested-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ca065d0c

net: add SOCK_RCU_FREE socket flag · a4298e45

由 Eric Dumazet 提交于 4月 01, 2016

We want a generic way to insert an RCU grace period before socket
freeing for cases where RCU_SLAB_DESTROY_BY_RCU is adding too
much overhead.

SLAB_DESTROY_BY_RCU strict rules force us to take a reference
on the socket sk_refcnt, and it is a performance problem for UDP
encapsulation, or TCP synflood behavior, as many CPUs might
attempt the atomic operations on a shared sk_refcnt

UDP sockets and TCP listeners can set SOCK_RCU_FREE so that their
lookup can use traditional RCU rules, without refcount changes.
They can set the flag only once hashed and visible by other cpus.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Tested-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a4298e45

Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 43e2dfb2

由 David S. Miller 提交于 4月 04, 2016

Jeff Kirsher says:

====================
10GbE Intel Wired LAN Driver Updates 2016-04-04

This series contains updates to ixgbe and ixgbevf.

Pavel Tikhomirov fixes a typo where we were incrementing transmit stats
instead of receive stats on the receive side.

Emil updates the ixgbevf driver to use bit operations for setting and
checking the adapter state.

Chas Williams adds the new NDO trust feature check so that the VF guest
has the ability to set the unicast address of the interface, if it is a
trusted VF.

Alex cleans up the driver to that the only time we add a PF entry to the
VLVF is either for VLAN 0 or if the PF has requested a VLAN that a VF
is already using.  Also adds support for generic transmit checksums,
giving the added advantage is that we can support inner checksum offloads
for tunnels and MPLS while still being able to transparently insert
VLAN tags.  Lastly, changed ixgbe so that we can use the ethtool
rx-vlan-filter flag to toggle receive VLAN filtering on and off.

Mark cleans up the ixgbe driver by making all op structures that do not
change constants.  Also fixed flow control for Xeon D KR backplanes, since
we cannot use auto-negotiation to determine the mode, we have to use
whatever the user configured.

Sowmini Varadhan updates ixgbe to use eth_platform_get_mac_address()
instead of the arch specific solution that was added by a previous
commit.

Don fixed an issue where it was possible that a system reset could occur
when we were holding the SWFW semaphore lock, which the next time the
driver loaded would see it incorrectly as locked.

v2: updated patch 8 of the series to include a minor flags issue where
    we had lost NETIF_F_HW_TC and we were setting NETIF_F_SCTP_CRC in
    two different areas, when we only needed/wanted it in one spot.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

43e2dfb2

Merge branch 'mv88e6131-hw-bridging-6185' · 6e338048

由 David S. Miller 提交于 4月 04, 2016

Vivien Didelot says:

====================
net: dsa: mv88e6131: HW bridging support for 6185

All packets passing through a switch of the 6185 family are currently all
directed to the CPU port. This means that port bridging is software driven.

To enable hardware bridging for this switch family, we need to implement the
port mapping operations, the FDB operations, and optionally the VLAN operations
(for 802.1Q and VLAN filtering aware systems).

However this family only has 256 FDBs indexed by 8-bit identifiers, opposed to
4096 FDBs with 12-bit identifiers for other families such as 6352. It also
doesn't have dedicated FID registers for ATU and VTU operations.

This patchset fixes these differences, and enable hardware bridging for 6185.

Changes v1 -> v2:
 - Describe the different numbers of databases and prefer a feature-based logic
   over the current ID/family-based logic.
====================
Tested-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6e338048

net: dsa: mv88e6131: enable hardware bridging · 26892ffc

由 Vivien Didelot 提交于 3月 31, 2016

By adding support for bridge operations, FDB operations, and optionally
VLAN operations (for 802.1Q and VLAN filtering aware systems), the
switch bridges ports correctly, the CPU is able to populate the hardware
address databases, and thus hardware bridging becomes functional within
the 88E6185 family of switches.
Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

26892ffc

net: dsa: mv88e6xxx: map destination addresses for 6185 · f93dd042

由 Vivien Didelot 提交于 3月 31, 2016

The 88E6185 switch also has a MapDA bit in its Port Control 2 register.
When this bit is cleared, all frames are sent out to the CPU port.

Set this bit to rely on address databases (ATU) hits and direct frames
out of the correct ports, and thus allow hardware bridging.
Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f93dd042

net: dsa: mv88e6xxx: support 256 databases · 11ea809f

由 Vivien Didelot 提交于 3月 31, 2016

The 6185 family of devices has only 256 address databases. Their 8-bit
FID for ATU and VTU operations are split into ATU Control and ATU/VTU
Operation registers.
Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

11ea809f

net: dsa: mv88e6xxx: variable number of databases · f74df0be

由 Vivien Didelot 提交于 3月 31, 2016

Marvell switch chips have different number of address databases.

The code currently only supports models with 4096 databases. Such switch
has dedicated FID registers for ATU and VTU operations. Models with
fewer databases have their FID split in several registers.

List them all but only support models with 4096 databases at the moment.
Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f74df0be

net: dsa: mv88e6xxx: protect FID registers access · b426e5f7

由 Vivien Didelot 提交于 3月 31, 2016

Only switch families with 4096 address databases have dedicated FID
registers for ATU and VTU operations.

Factorize the access to the GLOBAL_ATU_FID register and introduce a
mv88e6xxx_has_fid_reg() helper function to protect the access to
GLOBAL_ATU_FID and GLOBAL_VTU_FID.
Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b426e5f7

net: dsa: mv88e6xxx: protect SID register access · 2e7bd5ef

由 Vivien Didelot 提交于 3月 31, 2016

Introduce a mv88e6xxx_has_stu() helper to protect the access to the
GLOBAL_VTU_SID register, instead of checking switch families.
Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2e7bd5ef

ixgbe: Add support for toggling VLAN filtering flag via ethtool · 0c5a6166

由 Alexander Duyck 提交于 3月 10, 2016

This change makes it so that we can use the ethtool rx-vlan-filter flag to
toggle Rx VLAN filtering on and off.  This is basically just an extension
of the existing VLAN promisc work in that it just adds support for the
additional ethtool flag.
Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

0c5a6166

ixgbe: Extend cls_u32 offload to support UDP headers · 4ae78342

由 Amritha Nambiar 提交于 3月 09, 2016

Added support to match on UDP fields in the transport layer.
Extended core logic to support multiple headers.

Verified with the following filters :

	handle 1: u32 divisor 1
	u32 ht 800: order 1 link 1: \
	offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff
	u32 ht 1: order 2 \
	match tcp src 1024 ffff match tcp dst 23 ffff action drop
	handle 2: u32 divisor 1
	u32 ht 800: order 3 link 2: \
	offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 17 ff
	u32 ht 2: order 4 \
	match udp src 1025 ffff match udp dst 24 ffff action drop
Signed-off-by: NAmritha Nambiar <amritha.nambiar@intel.com>
Acked-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NSridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

4ae78342

ixgbe: Place SWFW semaphore in known valid state at probe · dbd15b8f

由 Don Skidmore 提交于 3月 09, 2016

It is possible on some HW that a system reset could occur when we are
holding the SWFW semaphore lock.  So next time the driver was loaded we
would see it incorrectly as locked. This patch will recover from that state
by: Attempting to acquire the semaphore and then regardless of whether or
not it was acquire we immediately release it. This will force us into
a known good state.
Signed-off-by: NDon Skidmore <donald.c.skidmore@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

dbd15b8f

ixgbe: add a callback to set the maximum transmit bitrate · c04f90e5

由 Rostislav Pehlivanov 提交于 1月 27, 2016

This commit adds a callback which allows to adjust the maximum transmit
bitrate the card can output. This makes it possible to get a smooth
traffic instead of the default burst-y behaviour when trying to output
e.g. a video stream.

Much of the logic needed to get a correct bcnrc_val was taken from the
ixgbe_set_vf_rate_limit() function.
Signed-off-by: NRostislav Pehlivanov <atomnuker@gmail.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

c04f90e5

ixgbe: Fix flow control for Xeon D KR backplane · afdc71e4

由 Mark Rustad 提交于 1月 25, 2016

Xeon D KR backplane is different from other backplanes,
in that we can't use auto-negotiation to determine the
mode. Instead, use whatever the user configured.
Signed-off-by: NMark Rustad <mark.d.rustad@intel.com>
Tested-by: NPhil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

afdc71e4

ixgbevf: Add support for generic Tx checksums · cb2b3edb

由 Alexander Duyck 提交于 1月 13, 2016

This patch adds support for generic Tx checksums to the ixgbevf driver.  It
turns out this is actually pretty easy after going over the datasheet as we
were doing a number of steps we didn't need to.

In order to perform a Tx checksum for an L4 header we need to fill in the
following fields in the Tx descriptor:
  MACLEN (maximum of 127), retrieved from:
		skb_network_offset()
  IPLEN  (maximum of 511), retrieved from:
		skb_checksum_start_offset() - skb_network_offset()
  TUCMD.L4T indicates offset and if checksum or crc32c, based on:
		skb->csum_offset

The added advantage to doing this is that we can support inner checksum
offloads for tunnels and MPLS while still being able to transparently
insert VLAN tags.

I also took the opportunity to clean-up many of the feature flag
configuration bits to make them a bit more consistent between drivers.  In
the case of the VF drivers this meant adding support for SCTP CRCs, and
inner checksum offloads for MPLS and various tunnel types.
Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

cb2b3edb

ixgbe: Add support for generic Tx checksums · 49763de0

由 Alexander Duyck 提交于 1月 13, 2016

This patch adds support for generic Tx checksums to the ixgbe driver.  It
turns out this is actually pretty easy after going over the datasheet as we
were doing a number of steps we didn't need to.

In order to perform a Tx checksum for an L4 header we need to fill in the
following fields in the Tx descriptor:
  MACLEN (maximum of 127), retrieved from:
		skb_network_offset()
  IPLEN  (maximum of 511), retrieved from:
		skb_checksum_start_offset() - skb_network_offset()
  TUCMD.L4T indicates offset and if checksum or crc32c, based on:
		skb->csum_offset

The added advantage to doing this is that we can support inner checksum
offloads for tunnels and MPLS while still being able to transparently
insert VLAN tags.

I also took the opportunity to clean-up many of the feature flag
configuration bits to make them a bit more consistent between drivers.
Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

49763de0

ixgbe: use eth_platform_get_mac_address() · c7374b5a

由 Sowmini Varadhan 提交于 1月 12, 2016

This commit converts commit c762dff2 ("ixgbe: Look up MAC address in
Open Firmware or IDPROM") to use eth_platform_get_mac_address()
added by commit c7f5d105 ("net: Add eth_platform_get_mac_address()
helper.")
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Tested-by: NPhil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

c7374b5a

ixgbe: Make all unchanging ops structures const · 37689010

由 Mark Rustad 提交于 1月 07, 2016

The source for the ops structure contents are const, so make them
so. Copy them in place with structure assignments instead of memcpys.
Make the mbx_ops accessed by reference instead of making a copy of
the source structure. Update copyright date on the touched files.
Reported-by: NJulia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: NMark Rustad <mark.d.rustad@intel.com>
Acked-by: NJulia Lawall <julia.lawall@lip6.fr>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

37689010

ixgbe: Avoid adding VLAN 0 twice to VLVF and VFTA · 06bb1c39

由 Alexander Duyck 提交于 1月 06, 2016

We were adding VLAN 0 twice each time we restored the VLAN configuration.
Instead of doing it twice we can just start working through the active
VLANs from ID 1 on and skip the double write.
Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
Tested-by: NPhil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

06bb1c39

irda: sh_irda: remove driver · 9ef280c6

由 Simon Horman 提交于 4月 04, 2016

Remove the sh-irda driver as it appears to be unused since
c0bb9b30 ("ARCH: ARM: shmobile: Remove ag5evm board support").
Signed-off-by: NSimon Horman <horms+renesas@verge.net.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9ef280c6

Merge branch 'macb-coding-style' · acf195a9

由 David S. Miller 提交于 4月 04, 2016

Moritz Fischer says:

====================
macb: Codingstyle cleanups

resending almost unchanged v2 here:

Changes from v2:
* Rebased onto net-next
* Changed 5th patches commit message
* Added Nicholas' and Michal's Acked-Bys

Changes from v1:
* Backed out variable scope changes
* Separated out ether_addr_copy into it's own commit
* Fixed typo in comments as suggested by Joe
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

acf195a9

net: macb: Fix simple typo · 88023beb

由 Moritz Fischer 提交于 3月 29, 2016

Acked-by: NMichal Simek <michal.simek@xilinx.com>
Acked-by: NNicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: NMoritz Fischer <moritz.fischer@ettus.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

88023beb

net: macb: Use ether_addr_copy over memcpy · eefb52d1

由 Moritz Fischer 提交于 3月 29, 2016

Checkpatch suggests using ether_addr_copy over memcpy
to copy the mac address.
Acked-by: NMichal Simek <michal.simek@xilinx.com>
Acked-by: NNicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: NMoritz Fischer <moritz.fischer@ettus.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eefb52d1

net: macb: Fix coding style suggestions · aa50b552

由 Moritz Fischer 提交于 3月 29, 2016

This commit deals with a bunch of checkpatch suggestions
that without changing behavior make checkpatch happier.
Acked-by: NMichal Simek <michal.simek@xilinx.com>
Acked-by: NNicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: NMoritz Fischer <moritz.fischer@ettus.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aa50b552

net: macb: Fix coding style warnings · 64ec42fe

由 Moritz Fischer 提交于 3月 29, 2016

This commit takes care of the coding style warnings
that are mostly due to a different comment style and
lines over 80 chars, as well as a dangling else.
Acked-by: NNicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: NMoritz Fischer <moritz.fischer@ettus.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

64ec42fe

net: macb: Fix coding style error message · 96ec6310

由 Moritz Fischer 提交于 3月 29, 2016

checkpatch.pl gave the following error:

ERROR: space required before the open parenthesis '('
+	for(; p < end; p++, offset += 4)
Acked-by: NNicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: NMichal Simek <michal.simek@xilinx.com>
Signed-off-by: NMoritz Fischer <moritz.fischer@ettus.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

96ec6310

ravb: Add dma queue interrupt support · f51bdc23

由 Kazuya Mizuguchi 提交于 4月 03, 2016

This patch supports the following interrupts.

- One interrupt for multiple (timestamp, error, gPTP)
- One interrupt for emac
- Four interrupts for dma queue (best effort rx/tx, network control rx/tx)

This patch improve efficiency of the interrupt handler by adding the
interrupt handler corresponding to each interrupt source described
above. Additionally, it reduces the number of times of the access to
EthernetAVB IF.
Also this patch prevent this driver depends on the whim of a boot loader.

[ykaneko0929@gmail.com: define bit names of registers]
[ykaneko0929@gmail.com: add comment for gen3 only registers]
[ykaneko0929@gmail.com: fix coding style]
[ykaneko0929@gmail.com: update changelog]
[ykaneko0929@gmail.com: gen3: fix initialization of interrupts]
[ykaneko0929@gmail.com: gen3: fix clearing interrupts]
[ykaneko0929@gmail.com: gen3: add helper function for request_irq()]
[ykaneko0929@gmail.com: gen3: remove IRQF_SHARED flag for request_irq()]
[ykaneko0929@gmail.com: revert ravb_close() and ravb_ptp_stop()]
[ykaneko0929@gmail.com: avoid calling free_irq() to non-hooked interrupts]
[ykaneko0929@gmail.com: make NC/BE interrupt handler a function]
[ykaneko0929@gmail.com: make timestamp interrupt handler a function]
[ykaneko0929@gmail.com: timestamp interrupt is handled in multiple
 interrupt handler instead of dma queue interrupt handler]
Signed-off-by: NKazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
Signed-off-by: NYoshihiro Kaneko <ykaneko0929@gmail.com>
Acked-by: NSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f51bdc23

ixgbe: Do not allow PF to add VLVF entry unless it actually needs it · 18be4fce

由 Alexander Duyck 提交于 1月 06, 2016

While doing the work on igb I realized there were a few cases where we were
still adding VLANs to the VLVF entries for the PF when they were not
needed.  This patch cleans that up so that the only time we add a PF entry
to the VLVF is either for VLAN 0 or if the PF has requested a VLAN that a VF
is already using.
Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
Tested-by: NPhil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

18be4fce

ixgbe: Extend trust to allow guest to set unicast address · 1d96cf98

由 chas williams 提交于 1月 05, 2016

When running certain routing protocols like VRRP, VF guests need the
ability to set the unicast address of the interface.  Extend the new ndo
trust feature to let the hypervisor trust a guest to set/update its own
unicast address.
Signed-off-by: NChas Williams <3chas3@gmail.com>
Tested-by: NPhil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

1d96cf98

ixgbevf: use bit operations for setting and checking resets · d5dd7c3f

由 Emil Tantilov 提交于 12月 17, 2015

Move the reset flags to adapter->state in order to make use of bit
operations.

This is an alternative patch to the one previously submitted by
John Greene.
Suggested-by: NAlexander Duyck <aduyck@mirantis.com>
Reported-by: NScott Otto <otts62@yahoo.com>
Reported-by: NJohn Greene <jogreene@redhat.com>
Signed-off-by: NEmil Tantilov <emil.s.tantilov@intel.com>
Tested-by: NPhil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

d5dd7c3f

Merge branch 'cmsg_timestamp' · 9d2355ba

由 David S. Miller 提交于 4月 04, 2016

Soheil Hassas Yeganeh says:

====================
add TX timestamping via cmsg

This patch series aim at enabling TX timestamping via cmsg.

Currently, to occasionally sample TX timestamping on a socket,
applications need to call setsockopt twice: first for enabling
timestamps and then for disabling them. This is an unnecessary
overhead. With cmsg, in contrast, applications can sample TX
timestamps per sendmsg().

This patch series adds the code for processing SO_TIMESTAMPING
for cmsg's of the SOL_SOCKET level, and adds the glue code for
TCP, UDP, and RAW for both IPv4 and IPv6. This implementation
supports overriding timestamp generation flags (i.e.,
SOF_TIMESTAMPING_TX_*) but not timestamp reporting flags.
Applications must still enable timestamp reporting via
setsockopt to receive timestamps.

This series does not change existing timestamping behavior for
applications that are using socket options.

I will follow up with another patch to enable timestamping for
active TFO (client-side TCP Fast Open) and also setting packet
mark via cmsgs.

Thanks!

Changes in v2:
        - Replace u32 with __u32 in the documentation.

Changes in v3:
	- Fix the broken build for L2TP (due to changes
	  in IPv6).
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9d2355ba

sock: document timestamping via cmsg in Documentation · fd91e12f

由 Soheil Hassas Yeganeh 提交于 4月 02, 2016

Update docs and add code snippet for using cmsg for timestamping.
Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fd91e12f

sock: enable timestamping using control messages · c14ac945

由 Soheil Hassas Yeganeh 提交于 4月 02, 2016

Currently, SOL_TIMESTAMPING can only be enabled using setsockopt.
This is very costly when users want to sample writes to gather
tx timestamps.

Add support for enabling SO_TIMESTAMPING via control messages by
using tsflags added in `struct sockcm_cookie` (added in the previous
patches in this series) to set the tx_flags of the last skb created in
a sendmsg. With this patch, the timestamp recording bits in tx_flags
of the skbuff is overridden if SO_TIMESTAMPING is passed in a cmsg.

Please note that this is only effective for overriding the recording
timestamps flags. Users should enable timestamp reporting (e.g.,
SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_OPT_ID) using
socket options and then should ask for SOF_TIMESTAMPING_TX_*
using control messages per sendmsg to sample timestamps for each
write.
Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c14ac945

ipv6: process socket-level control messages in IPv6 · ad1e46a8

由 Soheil Hassas Yeganeh 提交于 4月 02, 2016

Process socket-level control messages by invoking
__sock_cmsg_send in ip6_datagram_send_ctl for control messages on
the SOL_SOCKET layer.

This makes sure whenever ip6_datagram_send_ctl is called for
udp and raw, we also process socket-level control messages.

This is a bit uglier than IPv4, since IPv6 does not have
something like ipcm_cookie. Perhaps we can later create
a control message cookie for IPv6?

Note that this commit interprets new control messages that
were ignored before. As such, this commit does not change
the behavior of IPv6 control messages.
Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ad1e46a8

ipv4: process socket-level control messages in IPv4 · 24025c46

由 Soheil Hassas Yeganeh 提交于 4月 02, 2016

Process socket-level control messages by invoking
__sock_cmsg_send in ip_cmsg_send for control messages on
the SOL_SOCKET layer.

This makes sure whenever ip_cmsg_send is called in udp, icmp,
and raw, we also process socket-level control messages.

Note that this commit interprets new control messages that
were ignored before. As such, this commit does not change
the behavior of IPv4 control messages.
Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

24025c46

sock: accept SO_TIMESTAMPING flags in socket cmsg · 3dd17e63

由 Soheil Hassas Yeganeh 提交于 4月 02, 2016

Accept SO_TIMESTAMPING in control messages of the SOL_SOCKET level
as a basis to accept timestamping requests per write.

This implementation only accepts TX recording flags (i.e.,
SOF_TIMESTAMPING_TX_HARDWARE, SOF_TIMESTAMPING_TX_SOFTWARE,
SOF_TIMESTAMPING_TX_SCHED, and SOF_TIMESTAMPING_TX_ACK) in
control messages. Users need to set reporting flags (e.g.,
SOF_TIMESTAMPING_OPT_ID) per socket via socket options.

This commit adds a tsflags field in sockcm_cookie which is
set in __sock_cmsg_send. It only override the SOF_TIMESTAMPING_TX_*
bits in sockcm_cookie.tsflags allowing the control message
to override the recording behavior per write, yet maintaining
the value of other flags.

This patch implements validating the control message and setting
tsflags in struct sockcm_cookie. Next commits in this series will
actually implement timestamping per write for different protocols.
Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3dd17e63

tcp: use one bit in TCP_SKB_CB to mark ACK timestamps · 6b084928

由 Soheil Hassas Yeganeh 提交于 4月 02, 2016

Currently, to avoid a cache line miss for accessing skb_shinfo,
tcp_ack_tstamp skips socket that do not have
SOF_TIMESTAMPING_TX_ACK bit set in sk_tsflags. This is
implemented based on an implicit assumption that the
SOF_TIMESTAMPING_TX_ACK is set via socket options for the
duration that ACK timestamps are needed.

To implement per-write timestamps, this check should be
removed and replaced with a per-packet alternative that
quickly skips packets missing ACK timestamps marks without
a cache-line miss.

To enable per-packet marking without a cache line miss, use
one bit in TCP_SKB_CB to mark a whether a SKB might need a
ack tx timestamp or not. Further checks in tcp_ack_tstamp are not
modified and work as before.
Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6b084928

tcp: accept SOF_TIMESTAMPING_OPT_ID for passive TFO · 6db8b963

由 Soheil Hassas Yeganeh 提交于 4月 02, 2016

SOF_TIMESTAMPING_OPT_ID is set to get data-independent IDs
to associate timestamps with send calls. For TCP connections,
tp->snd_una is used as the starting point to calculate
relative IDs.

This socket option will fail if set before the handshake on a
passive TCP fast open connection with data in SYN or SYN/ACK,
since setsockopt requires the connection to be in the
ESTABLISHED state.

To address these, instead of limiting the option to the
ESTABLISHED state, accept the SOF_TIMESTAMPING_OPT_ID option as
long as the connection is not in LISTEN or CLOSE states.
Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Acked-by: NYuchung Cheng <ycheng@google.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6db8b963

sock: break up sock_cmsg_snd into __sock_cmsg_snd and loop · 39771b12

由 Willem de Bruijn 提交于 4月 02, 2016

To process cmsg's of the SOL_SOCKET level in addition to
cmsgs of another level, protocols can call sock_cmsg_send().
This causes a double walk on the cmsghdr list, one for SOL_SOCKET
and one for the other level.

Extract the inner demultiplex logic from the loop that walks the list,
to allow having this called directly from a walker in the protocol
specific code.
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

39771b12

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功