Commits · cdba756f5803a2f0a8bbc6605acc166dd817979e · openeuler / Kernel

07 Jan, 2016 · 14 commits

net: move ndo_features_check() close to ndo_start_xmit() · cdba756f

Authored by Eric Dumazet on Jan 06, 2016

TX fast path uses ndo_start_xmit(), ndo_features_check() and
ndo_select_queue().

Move ndo_features_check() close to ndo_start_xmit() to increase
data locality.

All "struct net_device_ops" should now be using C99 initializers.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fsl/fman: double free on probe failure · 9e02d8ca

Authored by Dan Carpenter on Jan 06, 2016

"priv" is allocated with devm_kzalloc() so freeing it here with kfree()
will lead to a double free.

Fixes: 39339616 ('fsl/fman: Add FMan MAC driver')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fsl/fman: fix the pause_time test · e06a03bd

Authored by Dan Carpenter on Jan 06, 2016

pause_time is unsigned so it can't be less than zero.  The bug means
that we allow invalid pause-times.

Fixes: 57ba4c9b ('fsl/fman: Add FMan MAC support')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
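The point about unsigned values can be illustrated with a minimal Python sketch. It assumes a 16-bit unsigned pause_time (the width is an assumption; the message only says the value is unsigned), and both helper names are hypothetical.

```python
import ctypes


def as_u16(value):
    """Store `value` the way a 16-bit unsigned C variable would hold it."""
    return ctypes.c_uint16(value).value


def less_than_zero_test_fires(value):
    """Model of a `pause_time < 0` check applied to an unsigned variable."""
    return as_u16(value) < 0


# A negative input wraps to a large positive number, so the check never fires.
print(as_u16(-1), less_than_zero_test_fires(-1))
```

Because the comparison is always false, a validity test written this way lets every value through, which matches the symptom described in the message.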

mlxsw: core: remove an unnecessary condition · 719255d0

Authored by Dan Carpenter on Jan 06, 2016

We checked "err" on the lines before so we know it's zero here.

These cause a static checker warning because checking known things can
indicate a bug.  Maybe there is a missing assignment or we are checking
the wrong variable.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet/atheros/alx: sanitize buffer sizing and padding · c406700c

Authored by Jarod Wilson on Jan 06, 2016

This is based on the work done by Przemek Rudy in bug 70761 at
bugzilla.kernel.org, but with some work done to disentangle and clarify
things a bit.

Similar to Przemek's work and other drivers, we're adding a padding of 16
here, but we're also disentangling mtu size calculations from max buffer
size calculations a bit, and adding ETH_HLEN to the value written into
ALX_MTU. Hopefully, with a bit more consistency and clarity, things behave
better here. Sadly, I can only test on my alx-driven E2200, which worked
just fine before this patch.

In comment #58 of bug 70761, Eugene A. Shatokhin reports that this patch
does help considerably for a ROSA Linux user of his with an AR8162 network
adapter when patched into a 4.1.x-based kernel, with several days of
normal operation where wired network previously wasn't usable without
setting MTU to 9000 as a work-around.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=70761
CC: "Eugene A. Shatokhin" <eugene.shatokhin@rosalab.ru>
CC: Przemek Rudy <prudy1@o2.pl>
CC: Jay Cliburn <jcliburn@gmail.com>
CC: Chris Snook <chris.snook@gmail.com>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlxsw-vlan_filtering-offload' · f637941b

Authored by David S. Miller on Jan 06, 2016

Jiri Pirko says:

====================
mlxsw: add offload support for vlan_filtering option

Elad says:

This patch adds the SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING port attribute.
When a bridge is offloaded to hardware, the hardware can learn whether the
bridge is a .1Q (VLAN-aware) bridge or a non-VLAN-aware bridge.
To toggle the mode, a user can use sysfs:
$ echo 1 > /sys/devices/virtual/net/br0/bridge/vlan_filtering
or via iproute2:
$ ip link set dev br0 type bridge vlan_filtering 1

---
v1->v2: small fix in patch #1
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
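The sysfs toggle quoted in the cover letter can also be driven from a script. The following is a minimal Python sketch, assuming the sysfs layout shown in the echo command; the function names and the sysfs_root parameter are hypothetical and exist only to make the path testable.

```python
from pathlib import Path

DEFAULT_SYSFS_ROOT = "/sys/devices/virtual/net"


def vlan_filtering_path(bridge, sysfs_root=DEFAULT_SYSFS_ROOT):
    """Build the path of the per-bridge vlan_filtering attribute."""
    return Path(sysfs_root) / bridge / "bridge" / "vlan_filtering"


def set_vlan_filtering(bridge, enabled, sysfs_root=DEFAULT_SYSFS_ROOT):
    """Write 1 or 0 to the attribute, like `echo 1 > .../vlan_filtering`."""
    vlan_filtering_path(bridge, sysfs_root).write_text("1\n" if enabled else "0\n")


def get_vlan_filtering(bridge, sysfs_root=DEFAULT_SYSFS_ROOT):
    """Read the attribute back as a boolean."""
    return vlan_filtering_path(bridge, sysfs_root).read_text().strip() == "1"
```

Writing to the real sysfs file needs root privileges and an existing bridge such as br0, for example `set_vlan_filtering("br0", True)`.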

mlxsw: Remember untagged VLANs · fc1273af

Authored by Elad Raz on Jan 06, 2016

When a VLAN is being configured, remember the untagged mode of the VLAN.
When displaying the list of configured VLANs, show the untagged attribute.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Disable vlan_filtering for non .1D bridge · 26a4ea0f

Authored by Elad Raz on Jan 06, 2016

When a port is bridged, the bridge must be a VLAN-aware bridge (.1Q),
or the bridging should be on top of VLAN interfaces (.1D bridge).

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Renaming local variable names for consistency · e4a13055

Authored by Elad Raz on Jan 06, 2016

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Fixing vlans init range · 29edf44f

Authored by Elad Raz on Jan 06, 2016

Initialize VLANs 0..4095 (Remove init for VID 4096).

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
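The range in this fix follows from the 12-bit width of the VLAN ID field. A minimal Python sketch of the off-by-one, assuming an inclusive loop bound was the cause (the message does not say how the extra VID was reached), with hypothetical helper names:

```python
VLAN_VID_BITS = 12
VLAN_N_VID = 1 << VLAN_VID_BITS  # 4096 possible IDs: 0..4095


def vids_to_init():
    """VIDs covered after the fix: 0 through 4095 inclusive."""
    return range(0, VLAN_N_VID)


def vids_to_init_off_by_one():
    """An inclusive upper bound would also touch VID 4096, which does not exist."""
    return range(0, VLAN_N_VID + 1)


def is_valid_vid(vid):
    """True when `vid` fits in the 12-bit VLAN ID field."""
    return 0 <= vid < VLAN_N_VID
```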

bridge: add vlan filtering change for new bridged device · 404cdbf0

Authored by Elad Raz on Jan 06, 2016

Notify hardware of VLAN-aware changes on a newly bridged port.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: add vlan filtering change notification · 6b72a770

Authored by Elad Raz on Jan 06, 2016

Notify hardware of bridge VLAN-aware changes.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

switchdev: add bridge vlan_filtering attribute · 81435c33

Authored by Elad Raz on Jan 06, 2016

Add a vlan_filtering attribute to allow hardware vendors to support
VLAN-aware bridges. vlan_filtering is a per-bridge attribute.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: Propagate vlan add failure to user · 08474cc1

Authored by Elad Raz on Jan 06, 2016

Disallow adding interfaces to a bridge when the vlan filtering operation
fails. Send the failure code to the user.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

06 Jan, 2016 · 18 commits

soreuseport: change consume_skb to kfree_skb in error case · 00ce3a15

Authored by Craig Gallek on Jan 05, 2016

Fixes: 538950a1 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

soreuseport: pass skb to secondary UDP socket lookup · 1134158b

Authored by Craig Gallek on Jan 05, 2016

This socket-lookup path did not pass along the skb in question
in my original BPF-based socket selection patch.  The skb in the
udpN_lib_lookup2 path can be used for BPF-based socket selection just
like it is in the 'traditional' udpN_lib_lookup path.

udpN_lib_lookup2 kicks in when there are more than 10 sockets in
the same hlist slot.  Coincidentally, I chose 10 sockets per
reuseport group in my functional test, so the lookup2 path was not
exercised. This adds an additional set of tests with 20 sockets.

Fixes: 538950a1 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
Fixes: 3ca8e402 ("soreuseport: BPF selection functional test")
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
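Why a 10-socket test missed the secondary path can be shown with a tiny Python model. It only encodes the threshold stated in the message (more than 10 sockets in one hlist slot); the function name and the returned labels are hypothetical, and it assumes all sockets of the test group land in the same slot.

```python
LOOKUP2_THRESHOLD = 10  # from the message: lookup2 is used above 10 sockets per slot


def lookup_path(sockets_in_slot):
    """Return which lookup path a slot of this size would take in the model."""
    return "lookup2" if sockets_in_slot > LOOKUP2_THRESHOLD else "lookup"


# The original test used 10 sockets; the added tests use 20.
print(lookup_path(10), lookup_path(20))
```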

mlxsw: pci: Adjust value of CPU egress traffic class · f0138e25

Authored by Ido Schimmel on Jan 05, 2016

During initialization, when creating the send descriptor queues (SDQs),
we specify the CPU egress traffic class of each SDQ. The maximum number
of classes of this type is different in the two ASICs supported by this
PCI driver.

New firmware versions check that this value is set correctly, which causes
errors on the Spectrum ASIC, as its max exposed egress traffic class is
lower than 7.

Solve this by setting this field to 3, which is an acceptable value for
both ASICs.

Note that we currently do not expose the QoS capabilities of the ASICs,
so setting this to a hardcoded value is OK for now.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'wireless-drivers-next-for-davem-2016-01-05' of... · 56b87180

Authored by David S. Miller on Jan 06, 2016

Merge tag 'wireless-drivers-next-for-davem-2016-01-05' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
brcmfmac

* fix IBSS which got broken over time
* new USB id for bcm43242 dongle
* arp offload configuration through inet notifier

ath9k

* add random number generator support (CONFIG_ATH9K_HWRNG)

iwlwifi

* Make scan parameters low latency aware
* Fix in the NL80211_FEATURE_FULL_AP_CLIENT_STATE state case
* Fix enable injection mode (Chaya Rachel)
* Various cleanups (Dan / Julia / myself)
* Allow to stay more time on popular channels (David Spinadel)
* Bug fixes for D0i3 (Eliad / Luca)
* Fixes for GO uAPSD (myself)
* Start of TSO support (myself)
* Rate control bug fixes (Eyal / Gregory)
* Start the work on 9000 devices (Johannes / Sara / Oren)
* Start the work on a new Tx queue allocation model (Liad)
* Debug infrastructure enhancements (Golan)

mwifiex

* add a debugfs file for chip reset
* advertise SMS4 cipher suite
* increase ap and station interface limit to 3
* enable MSI support on newer pcie devices (8897 onwards)

rtlwifi

* fix lots of module parameter usage
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns: avoid uninitialized variable warning: · be78a690

Authored by Arnd Bergmann on Jan 01, 2016

gcc fails to see that the 'last_offset' variable in
hns_nic_reuse_page() is used correctly and issues a bogus
warning:

drivers/net/ethernet/hisilicon/hns/hns_enet.c: In function 'hns_nic_reuse_page':
drivers/net/ethernet/hisilicon/hns/hns_enet.c:541:6: warning: 'last_offset' may be used uninitialized in this function [-Wmaybe-uninitialized]

This simplifies the function to make it more obvious what is
going on to both readers and compilers, which makes the warning
go away.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: kill unused skb_free op · a72a5e2d

Authored by Florian Westphal on Jan 05, 2016

The only user was removed in commit
029f7f3b ("netfilter: ipv6: nf_defrag: avoid/free clone operations").

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

include/uapi/linux/sockios.h: mark SIOCRTMSG unused · 2fbf5758

Authored by xypron.glpk@gmx.de on Jan 05, 2016

IOCTL SIOCRTMSG does nothing but return EINVAL.

So comment it as unused.

SIOCRTMSG is only used in:
* net/ipv4/af_inet.c
* include/uapi/linux/sockios.h

inet_ioctl calls ip_rt_ioctl.
ip_rt_ioctl only handles SIOCADDRT and SIOCDELRT and returns -EINVAL
otherwise.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
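The dispatch described above can be modelled in a few lines of Python. This is only a sketch of the control flow as the message states it: the ioctl numbers are the ones defined in include/uapi/linux/sockios.h, the function name is hypothetical, and returning 0 is a stand-in for the real route add/delete handling.

```python
import errno

SIOCADDRT = 0x890B  # add routing table entry
SIOCDELRT = 0x890C  # delete routing table entry
SIOCRTMSG = 0x890D  # unused


def ip_rt_ioctl_model(cmd):
    """Model: only SIOCADDRT and SIOCDELRT are handled, the rest get -EINVAL."""
    if cmd in (SIOCADDRT, SIOCDELRT):
        return 0  # placeholder for the real route handling
    return -errno.EINVAL
```

In this model SIOCRTMSG falls through to the -EINVAL branch, which is why the commit marks it as unused.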

Merge branch 'mlx5e-tstamp' · 1633bf11

Authored by David S. Miller on Jan 05, 2016

Saeed Mahameed says:

====================
Introduce mlx5 ethernet timestamping

This patch series introduces the support for ConnectX-4 timestamping
and the PTP kernel interface.

Changes from V2:
net/mlx5_core: Introduce access function to read internal_timer
	- Remove one line function
	- Change function name

net/mlx5e: Add HW timestamping (TS) support:
	- Data path performance optimization (caching tstamp struct in rq,sq)
	- Change read/write_lock_irqsave to read/write_lock
	- Move ioctl functions to en_clock file
	- Changed overflow start algorithm according to comments from Richard
	- Move timestamp init/cleanup to open/close ndos.

In detail:

1st patch prevents the driver from modifying skb->data and SKB CB in
device xmit function.

2nd patch adds the needed low level helpers for:
	- Fetching the hardware clock (hardware internal timer)
	- Parsing CQEs timestamps
	- Device frequency capability

3rd patch adds new en_clock.c file that handles all needed timestamping
operations:
	- Internal clock structure initialization and other helper functions
	- Added the needed ioctl for setting/getting the current timestamping
	  configuration.
	- used this configuration in RX/TX data path to fill the SKB with
	  the timestamp.

4th patch introduces PTP (PHC) support.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Add PTP Hardware Clock (PHC) support · 3d8c38af

Authored by Eran Ben Elisha on Dec 29, 2015

Add PHC support to the mlx5_en driver. Use reader/writer spinlocks to
protect the timecounter since every packet received needs to call
timecounter_cycle2time() when timestamping is enabled.  This can become
a performance bottleneck with RSS and multiple receive queues if normal
spinlocks are used.

The driver has been tested with both Documentation/ptp/testptp and the
linuxptp project (http://linuxptp.sourceforge.net/) on a Mellanox
ConnectX-4 card.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Add HW timestamping (TS) support · ef9814de

Authored by Eran Ben Elisha on Dec 29, 2015

Add support for enabling/disabling HW timestamping for incoming and/or
outgoing packets. To enable/disable HW timestamping, the appropriate
ioctl should be used. Currently only HWTSTAMP_FILTER_ALL/NONE and
HWTSTAMP_TX_ON/OFF are supported. Make all relevant changes in
RX/TX flows to consider TS request and plant HW timestamps into
relevant structures.

Add internal clock for converting hardware timestamp to nanoseconds. In
addition, add a service task to catch internal clock overflow, to make
sure timestamping is accurate.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
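The idea of an internal clock that converts a free-running hardware counter to nanoseconds, and why overflow has to be caught, can be sketched in Python. This is a simplified model in the spirit of the kernel's cyclecounter/timecounter pair, not the driver's code: it drops the fractional remainder, and the 8-bit counter width, the 1000 ns tick, and the class names are assumptions chosen to keep the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass
class CycleCounter:
    mult: int   # fixed-point multiplier: ns = (cycles * mult) >> shift
    shift: int
    mask: int   # bit mask matching the width of the free-running counter


@dataclass
class TimeCounter:
    cc: CycleCounter
    cycle_last: int = 0
    nsec: int = 0

    def read(self, cycle_now: int) -> int:
        """Fold the cycles elapsed since the last read into nanoseconds."""
        # Masked subtraction stays correct across a single counter wrap.
        delta = (cycle_now - self.cycle_last) & self.cc.mask
        self.nsec += (delta * self.cc.mult) >> self.cc.shift
        self.cycle_last = cycle_now
        return self.nsec


# Toy 8-bit counter ticking once per microsecond (1000 ns per cycle).
clock = TimeCounter(CycleCounter(mult=1000, shift=0, mask=0xFF), cycle_last=250)
wrapped_once = clock.read(4)  # counter went 250 -> 255 -> 0 -> 4: 10 cycles
```

If the counter wraps a full extra turn between two reads, the masked delta cannot see it and that time is lost. A periodic service task that reads the clock more often than the wrap period avoids this, which is the role the message gives to the overflow task.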

net/mlx5_core: Introduce access function to read internal timer · b0844444

Authored by Eran Ben Elisha on Dec 29, 2015

A preparation step which adds support for reading the hardware
internal timer and the hardware timestamping from the CQE.
In addition, advertise device_frequency_khz HCA capability.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Do not modify the TX SKB · 34802a42

Authored by Achiad Shochat on Dec 29, 2015

If the SKB is cloned, or has an elevated users count, someone else
can be looking at it at the same time.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sctp-transport-rhashtable' · 33c15297

Authored by David S. Miller on Jan 05, 2016

Xin Long says:

====================
sctp: use transport hashtable to replace association's with rhashtable

For a telecom center, the usual case is that a server is connected to by
thousands of clients. But if a server with only one endpoint (udp style) uses
the same sport and dport to communicate with every client, every assoc on the
server will be hashed into the same chain of the global assoc hashtable,
because we currently choose dport and sport as the hash key.

When a packet is received, sctp_rcv tries to find the assoc by sport and dport.
Since that chain is too long to search quickly, performance becomes very low.
Some test data follows:

in server:
$./ss [start a udp style server there]
in client:
$./cc [start 2500 sockets to connect server with same port and different ip,
       and use one of them to send data to server]

===== test on net-next
-- perf top
server:
  55.73%  [kernel]             [k] sctp_assoc_is_match
   6.80%  [kernel]             [k] sctp_assoc_lookup_paddr
   4.81%  [kernel]             [k] sctp_v4_cmp_addr
   3.12%  [kernel]             [k] _raw_spin_unlock_irqrestore
   1.94%  [kernel]             [k] sctp_cmp_addr_exact

client:
  46.01%  [kernel]                    [k] sctp_endpoint_lookup_assoc
   5.55%  libc-2.17.so                [.] __libc_calloc
   5.39%  libc-2.17.so                [.] _int_free
   3.92%  libc-2.17.so                [.] _int_malloc
   3.23%  [kernel]                    [k] __memset

-- spent time
time is 487s, send pkt is 10000000

We need to change the way the hash key is calculated: using lport +
rport + paddr as the hash key avoids this issue.

Besides, this patchset uses a transport hashtable, accessed through the
rhashtable api, to replace the association hashtable for lookups: get
the transport first, then get the association by t->asoc. It will also
make tcp style work better.

===== test with this patchset:
-- perf top
server:
  15.98%  [kernel]                 [k] _raw_spin_unlock_irqrestore
   9.92%  [kernel]                 [k] __pv_queued_spin_lock_slowpath
   7.22%  [kernel]                 [k] copy_user_generic_string
   2.38%  libpthread-2.17.so       [.] __recvmsg_nocancel
   1.88%  [kernel]                 [k] sctp_recvmsg

client:
  11.90%  [kernel]                   [k] sctp_hash_cmp
   8.52%  [kernel]                   [k] rht_deferred_worker
   4.94%  [kernel]                   [k] __pv_queued_spin_lock_slowpath
   3.95%  [kernel]                   [k] sctp_bind_addr_match
   2.49%  [kernel]                   [k] __memset

-- spent time
time is 22s, send pkt is 10000000
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
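The effect of the hash key choice can be reproduced with a small Python model. It mirrors the test setup in the cover letter (2500 clients, same ports, different IPs) but the port numbers and addresses are made up, and it counts entries per key rather than per hash bucket, so real tables could show extra collisions on top of this.

```python
from collections import Counter

SERVER_PORT = 5000  # hypothetical lport
CLIENT_PORT = 6000  # hypothetical rport


def longest_chain(assocs, key):
    """Largest number of associations that share one hash key."""
    return max(Counter(key(a) for a in assocs).values())


def old_key(assoc):
    """Key built from the ports only (lport, rport)."""
    return assoc[0], assoc[1]


def new_key(assoc):
    """Key built from lport + rport + paddr."""
    return assoc


# 2500 clients with identical ports and distinct peer addresses.
assocs = [
    (SERVER_PORT, CLIENT_PORT, f"10.0.{i // 250}.{i % 250 + 1}")
    for i in range(2500)
]
print(longest_chain(assocs, old_key), longest_chain(assocs, new_key))
```

With ports only, every association shares one key and the lookup degenerates into a linear scan, consistent with sctp_assoc_is_match dominating the first perf profile.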

sctp: remove the local_bh_disable/enable in sctp_endpoint_lookup_assoc · c79c0666

Authored by Xin Long on Dec 30, 2015

sctp_endpoint_lookup_assoc is called under the protection of the sock
lock, so there is no need to call local_bh_disable in this function.
Remove the calls.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: drop the old assoc hashtable of sctp · b5eff712

Authored by Xin Long on Dec 30, 2015

The transport hashtable will replace the association hashtable,
so the association hashtable is no longer used in sctp. Drop
the code for it.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: apply rhashtable api to sctp procfs · 39f66a7d

Authored by Xin Long on Dec 30, 2015

Traverse the transport rhashtable and get each association only once
through the condition assoc->peer.primary_path != transport.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: apply rhashtable api to send/recv path · 4f008781

Authored by Xin Long on Dec 30, 2015

Apply the lookup apis to two functions. For __sctp_endpoint_lookup_assoc
and __sctp_lookup_association, the lookup is invoked under the protection
of the sock lock, so it is safe; but sctp_lookup_association needs to call
rcu_read_lock() and check t->dead to protect it.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: add the rhashtable apis for sctp global transport hashtable · d6c0256a

Authored by Xin Long on Dec 30, 2015

The transport hashtable will replace the association hashtable for
transport lookup; the association is then obtained through t->asoc. The
rhashtable apis are used because rhashtable is resizable, scalable, and
uses rcu.

lport + rport + paddr form the base hash key used to locate the chain,
with net included to keep one netns separate from another; laddr is then
also compared to find the target.

This patch provides the lookup functions:
- sctp_epaddr_lookup_transport
- sctp_addrs_lookup_transport

hash/unhash functions:
- sctp_hash_transport
- sctp_unhash_transport

init/destroy functions:
- sctp_transport_hashtable_init
- sctp_transport_hashtable_destroy

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

05 Jan, 2016 · 8 commits

Merge branch 'faster-soreuseport' · 6a5ef90c

Authored by David S. Miller on Jan 04, 2016

Craig Gallek says:

====================
Faster SO_REUSEPORT

This series contains two optimizations for the SO_REUSEPORT feature:
Faster lookup when selecting a socket for an incoming packet and
the ability to select the socket from the group using a BPF program.

This series only includes the UDP path.  I plan to submit a follow-up
including the TCP path if the implementation in this series is
acceptable.

Changes in v4:
- pskb_may_pull is unnecessary with pskb_pull (per Alexei Starovoitov)

Changes in v3:
- skb_pull_inline -> pskb_pull (per Alexei Starovoitov)
- reuseport_attach* -> sk_reuseport_attach* and simple return statement
  syntax change (per Daniel Borkmann)

Changes in v2:
- Fix ARM build; remove unnecessary include.
- Handle case where protocol header is not in linear section (per
  Alexei Starovoitov).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

soreuseport: BPF selection functional test · 3ca8e402

Authored by Craig Gallek on Jan 04, 2016

This program will build classic and extended BPF programs and
validate the socket selection logic when used with
SO_ATTACH_REUSEPORT_CBPF and SO_ATTACH_REUSEPORT_EBPF.

It also validates the re-programming flow and several edge cases.

Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF · 538950a1

Authored by Craig Gallek on Jan 04, 2016

Expose socket options for setting a classic or extended BPF program
for use when selecting sockets in an SO_REUSEPORT group.  These options
can be used on the first socket to belong to a group before bind or
on any socket in the group after bind.

This change includes refactoring of the existing sk_filter code to
allow reuse of the existing BPF filter validation checks.

Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
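From user space, attaching a classic BPF program to a reuseport group might look like the Python sketch below. It is untested against a live kernel and rests on several assumptions: the option value 51 is taken from the generic Linux socket.h and may differ on some architectures, the one-instruction program simply returns a fixed socket index, and both function names are hypothetical.

```python
import ctypes
import socket
import struct

# Value from the generic Linux socket.h; some architectures may differ (assumption).
SO_ATTACH_REUSEPORT_CBPF = 51
BPF_RET_K = 0x06  # BPF_RET | BPF_K: return the constant k


def cbpf_select_index(index):
    """One classic BPF instruction that always returns `index`."""
    # struct sock_filter { __u16 code; __u8 jt; __u8 jf; __u32 k; }
    return struct.pack("HBBI", BPF_RET_K, 0, 0, index)


def attach_reuseport_cbpf(sock, program):
    """Attach a packed classic BPF program to a socket's reuseport group."""
    buf = ctypes.create_string_buffer(program, len(program))
    # struct sock_fprog { unsigned short len; struct sock_filter *filter; }
    fprog = struct.pack("HP", len(program) // 8, ctypes.addressof(buf))
    sock.setsockopt(socket.SOL_SOCKET, SO_ATTACH_REUSEPORT_CBPF, fprog)
```

Per the message, the option can be set on the first socket of a group before bind or on any socket in the group after bind, for example `attach_reuseport_cbpf(sock, cbpf_select_index(0))` on a socket that already has SO_REUSEPORT enabled.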

soreuseport: fast reuseport UDP socket selection · e32ea7e7

Authored by Craig Gallek on Jan 04, 2016

Include a struct sock_reuseport instance when a UDP socket binds to
a specific address for the first time with the reuseport flag set.
When selecting a socket for an incoming UDP packet, use the information
available in sock_reuseport if present.

This required adding an additional field to the UDP source address
equality function to differentiate between exact and wildcard matches.
The original use case allowed wildcard matches when checking for
existing port uses during bind.  The new use case of adding a socket
to a reuseport group requires exact address matching.

Performance test (using a machine with 2 CPU sockets and a total of
48 cores):  Create reuseport groups of varying size.  Use one socket
from this group per user thread (pinning each thread to a different
core) calling recvmmsg in a tight loop.  Record number of messages
received per second while saturating a 10G link.
  10 sockets: 18% increase (~2.8M -> 3.3M pkts/s)
  20 sockets: 14% increase (~2.9M -> 3.3M pkts/s)
  40 sockets: 13% increase (~3.0M -> 3.4M pkts/s)

This work is based off a similar implementation written by
Ying Cai <ycai@google.com> for implementing policy-based reuseport
selection.

Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
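For readers unfamiliar with reuseport groups, the Python sketch below shows how several UDP sockets end up bound to the same address and port, which is the situation this commit optimizes. It assumes a Linux host where socket.SO_REUSEPORT is available; the function name and the loopback default are hypothetical, and this is not the benchmark used for the numbers above.

```python
import socket


def open_reuseport_group(count, host="127.0.0.1"):
    """Bind `count` UDP sockets to one shared address/port using SO_REUSEPORT."""
    socks = []
    port = 0  # let the kernel pick a free port for the first socket
    for _ in range(count):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        s.bind((host, port))
        port = s.getsockname()[1]  # later sockets reuse the same port
        socks.append(s)
    return socks
```

Each incoming datagram for that port is delivered to exactly one socket of the group, so a typical design gives every worker thread its own socket.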

soreuseport: define reuseport groups · ef456144

Authored by Craig Gallek on Jan 04, 2016

struct sock_reuseport is an optional shared structure referenced by each
socket belonging to a reuseport group.  When a socket is bound to an
address/port not yet in use and the reuseport flag has been set, the
structure will be allocated and attached to the newly bound socket.
When subsequent calls to bind are made for the same address/port, the
shared structure will be updated to include the new socket and the
newly bound socket will reference the group structure.

Usually, when an incoming packet was destined for a reuseport group,
all sockets in the same group needed to be considered before a
dispatching decision was made.  With this structure, an appropriate
socket can be found after looking up just one socket in the group.

This shared structure will also allow for more complicated decisions to
be made when selecting a socket (e.g. a BPF filter).

This work is based off a similar implementation written by
Ying Cai <ycai@google.com> for implementing policy-based reuseport
selection.

Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
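The shared group structure can be pictured with a small Python model. This is a sketch of the idea only, not the kernel's struct sock_reuseport: the class and method names are hypothetical, the multiply-and-shift mapping is an assumption about how a hash is scaled onto the socket array, and the optional `program` callable stands in for a BPF filter that returns a socket index.

```python
def reciprocal_scale(value, n):
    """Map a 32-bit value onto [0, n) without a division."""
    return ((value & 0xFFFFFFFF) * n) >> 32


class ReuseportGroup:
    """One shared object referenced by every socket bound to the same address/port."""

    def __init__(self):
        self.socks = []

    def add(self, sock):
        """Called for each later bind to the same address/port."""
        self.socks.append(sock)

    def select(self, flow_hash, program=None):
        """Pick one socket directly, without scanning all group members."""
        if program is not None:
            index = program(len(self.socks))
            if 0 <= index < len(self.socks):
                return self.socks[index]
        # No program, or an out-of-range result: fall back to the flow hash.
        return self.socks[reciprocal_scale(flow_hash, len(self.socks))]
```

Once any one socket of the group is found, the decision is a single array index, which is what removes the need to consider every member.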

Merge branch 'mlxsw-fixes' · ebb3cf41

Authored by David S. Miller on Jan 04, 2016

Jiri Pirko says:

====================
mlxsw: couple of fixes

Couple of fixes from Ido.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Change bridge port attributes only when bridged · 6c72a3d0

Authored by Ido Schimmel on Jan 04, 2016

Bridge port attributes are offloaded to hardware when invoked with SELF
flag set, but it really makes no sense to reflect them when port is not
bridged.

Allow a user to change these attributes only when the port is bridged and
initialize them correctly when joining or leaving a bridge.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Set bridge status in appropriate functions · 5a8f4525

Authored by Ido Schimmel on Jan 04, 2016

Set the bridge status of physical ports in the appropriate functions, to
be consistent with LAG join/leave and vPorts joining/leaving bridge.

Also, remove the error messages in these two functions, as we already
emit errors in the single function that each of them calls.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openeuler / Kernel · synced successfully more than 1 year ago
