Commits · b073ac1fcf42376018f6db6acc885dfd2cc9ff02 · openanolis / cloud-kernel

09 Nov, 2016 3 commits

brcmfmac: proto: add callback for queuing TX data · b073ac1f

Authored by Rafał Miłecki on Sep 26, 2016

So far our core code was calling brcmf_fws_process_skb, which wasn't
the proper thing to do. For devices using the msgbuf protocol, fwsignal
shouldn't be used. It was an unnecessary extra layer simply calling
a protocol-specific txdata function.

Please note we already have a txdata callback, but it's used for calls
between bcdc and fwsignal, so it couldn't simply be used there.

This makes core code more generic (instead of bcdc/fwsignal specific).
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
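The idea of a per-protocol callback for queuing TX data can be illustrated with a small userspace sketch. It is not the driver's code: the struct layout, the counters and every function name below are hypothetical and only model the dispatch through a function pointer.

```c
#include <assert.h>
#include <stddef.h>

struct packet { int id; };

/* Hypothetical protocol layer with a per-protocol "queue TX data" hook. */
struct proto {
    int (*tx_queue_data)(struct proto *proto, struct packet *pkt);
    int via_fwsignal; /* packets handed to the fwsignal-like layer */
    int direct;       /* packets queued by the protocol itself */
};

/* bcdc-like protocol: passes the packet to its firmware-signalling layer. */
static int bcdc_like_tx_queue_data(struct proto *proto, struct packet *pkt)
{
    (void)pkt;
    proto->via_fwsignal++;
    return 0;
}

/* msgbuf-like protocol: queues the packet itself, no fwsignal involved. */
static int msgbuf_like_tx_queue_data(struct proto *proto, struct packet *pkt)
{
    (void)pkt;
    proto->direct++;
    return 0;
}

/* Core code stays protocol-agnostic: it only calls the hook. */
static int core_start_xmit(struct proto *proto, struct packet *pkt)
{
    assert(proto != NULL && proto->tx_queue_data != NULL);
    return proto->tx_queue_data(proto, pkt);
}
```

With this shape the core never needs to know which protocol is in use; each protocol decides whether an extra layer sits between it and the bus.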

rt2x00: add support for mac addr from device tree · 9766cb70

Authored by Mathias Kresin on Aug 26, 2016

On some devices the EEPROMs of Ralink Wi-Fi chips have a default Ralink
MAC address set (RT3062F: 00:0C:43:30:62:00, RT3060F:
00:0C:43:30:60:00). Using several of these devices in the same network
gives them identical MAC addresses, which can cause nasty issues.

Allow overriding the MAC address in the EEPROM with a known-good one set
in the device tree to work around the issue.
Signed-off-by: Mathias Kresin <dev@kresin.me>
Acked-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

Merge ath-next from git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git · b9824cb8

Authored by Kalle Valo on Nov 09, 2016

ath.git patches for 4.10. Major changes:

ath10k

* allow setting coverage class for first generation cards
* read regulatory domain from ACPI

ath9k

* disable RNG by default


27 Oct, 2016 16 commits

Merge tag 'iwlwifi-next-for-kalle-2016-10-25-2' of... · 3f8247c8

Authored by Kalle Valo on Oct 27, 2016

Merge tag 'iwlwifi-next-for-kalle-2016-10-25-2' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next

* Finalize and enable dynamic queue allocation;
* Use dev_coredumpsg() to prevent locking the driver;
* Small fix to pass the AID to the FW;
* Use FW PS decisions with multi-queue;

devlink: Prevent port_type_set() callback when it's not needed · 6edf1017

Authored by Elad Raz on Oct 23, 2016

When port_type_set() is called and the new port type is the same as the
old one, just return success without invoking the callback.
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
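A minimal userspace sketch of this early-return pattern follows. The enum, struct and function names are stand-ins invented for illustration, not the devlink definitions, and the -1 error value is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the devlink types. */
enum port_type { PORT_TYPE_NOTSET, PORT_TYPE_ETH, PORT_TYPE_IB };

struct port {
    enum port_type type;
    /* Driver callback that reconfigures the hardware; 0 on success. */
    int (*port_type_set)(struct port *port, enum port_type type);
    int callback_calls; /* how often the driver callback was invoked */
};

/* Example driver callback that just records the call. */
static int demo_driver_type_set(struct port *port, enum port_type type)
{
    port->callback_calls++;
    port->type = type;
    return 0;
}

/* Set the port type, skipping the driver callback when nothing changes. */
static int port_type_set(struct port *port, enum port_type new_type)
{
    assert(port != NULL);
    if (!port->port_type_set)
        return -1; /* operation not supported by this driver */
    if (new_type == port->type)
        return 0;  /* same type as before: report success, do nothing */
    return port->port_type_set(port, new_type);
}
```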

firewire: net: set initial MTU = 1500 unconditionally, fix IPv6 on some CardBus cards · 89ab88b0

Authored by Stefan Richter on Oct 23, 2016

firewire-net, like the older eth1394 driver, reduced the initial MTU to
less than 1500 octets if the local link layer controller's asynchronous
packet reception limit was lower.

This is bogus, since this reception limit does not have anything to do
with the transmission limit.  Neither did this reduction affect the TX
path positively, nor could it prevent link fragmentation at the RX path.

Many FireWire CardBus cards have a max_rec of 9, causing an initial MTU
of 1024 - 16 = 1008.  RFC 2734 and RFC 3146 allow a minimum max_rec = 8,
which would result in an initial MTU of 512 - 16 = 496.  On such cards,
IPv6 could only be employed if the MTU was manually increased to 1280 or
more, i.e. IPv6 would not work without intervention from userland.

We now always initialize the MTU to 1500, which is the default according
to RFC 2734 and RFC 3146.

On a VIA VT6316 based CardBus card which was affected by this, changing
the MTU from 1008 to 1500 also increases TX bandwidth by 6 %.
RX remains unaffected.

CC: netdev@vger.kernel.org
CC: linux1394-devel@lists.sourceforge.net
CC: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
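The arithmetic in this message can be reproduced with a short sketch. It assumes the reception limit is 2^(max_rec + 1) octets, which is consistent with the 1024 and 512 figures quoted above, and takes the 16-octet deduction directly from the commit text. The function names are hypothetical and the clamp models the old behaviour only as described here, not the driver source.

```c
#include <assert.h>

#define DEFAULT_MTU  1500 /* default cited from RFC 2734 / RFC 3146 above */
#define IPV6_MIN_MTU 1280 /* minimum link MTU IPv6 requires */
#define OVERHEAD     16   /* the 16 octets subtracted in the commit message */

/* Asynchronous reception limit in octets: 2^(max_rec + 1). */
static unsigned int max_rec_to_octets(unsigned int max_rec)
{
    assert(max_rec < 16);
    return 1u << (max_rec + 1);
}

/* Old behaviour as described: clamp the initial MTU to the RX limit. */
static unsigned int old_initial_mtu(unsigned int max_rec)
{
    unsigned int limit = max_rec_to_octets(max_rec) - OVERHEAD;

    return limit < DEFAULT_MTU ? limit : DEFAULT_MTU;
}

/* New behaviour: always start at the default. */
static unsigned int new_initial_mtu(void)
{
    return DEFAULT_MTU;
}

/* IPv6 can only run on a link whose MTU is at least 1280. */
static int ipv6_usable(unsigned int mtu)
{
    return mtu >= IPV6_MIN_MTU;
}
```

For max_rec = 9 the old rule yields 1008 and for max_rec = 8 it yields 496, both below the IPv6 minimum, which is why IPv6 needed a manual MTU change on such cards.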

firewire: net: fix maximum possible MTU · 5d48f00d

Authored by Stefan Richter on Oct 23, 2016

Commit b3e3893e ("net: use core MTU range checking in misc drivers")
mistakenly introduced an upper limit for firewire-net's MTU based on the
local link layer controller's reception capability.  Revert this.  Neither
RFC 2734 nor our implementation impose any particular upper limit.

Actually, to be on the safe side and to make the code explicit, set
ETH_MAX_MTU = 65535 as upper limit now.

(I replaced sizeof(struct rfc2734_header) by the equivalent
RFC2374_FRAG_HDR_SIZE in order to avoid distracting long/int conversions.)

Fixes: b3e3893e('net: use core MTU range checking in misc drivers')
CC: netdev@vger.kernel.org
CC: linux1394-devel@lists.sourceforge.net
CC: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Acked-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: netcp: add missing of_node_put() in netcp_probe() · e2897b82

Authored by Wei Yongjun on Oct 22, 2016

This node pointer is returned by of_get_child_by_name() with its refcount
incremented. Call of_node_put() on it before exiting this function.

This was detected by a Coccinelle semantic patch.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
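The leak can be modelled in userspace with a toy reference count. The types and helpers below are hypothetical and only mimic the get/put pairing of of_get_child_by_name() and of_node_put(); they are not the device tree API.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a reference-counted device tree node. */
struct node {
    int refcount;
    struct node *child;
};

/* Like of_get_child_by_name(): returns the child with its refcount raised. */
static struct node *get_child(struct node *parent)
{
    if (!parent->child)
        return NULL;
    parent->child->refcount++;
    return parent->child;
}

/* Like of_node_put(): drops one reference; NULL is accepted. */
static void node_put(struct node *n)
{
    if (!n)
        return;
    assert(n->refcount > 0);
    n->refcount--;
}

/* Leaky shape: the reference taken by get_child() is never dropped. */
static int probe_leaky(struct node *parent)
{
    struct node *child = get_child(parent);

    if (!child)
        return -1;
    /* ... use child ... */
    return 0;
}

/* Fixed shape: drop the reference before leaving the function. */
static int probe_fixed(struct node *parent)
{
    struct node *child = get_child(parent);

    if (!child)
        return -1;
    /* ... use child ... */
    node_put(child);
    return 0;
}
```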

net: ena: use setup_timer() and mod_timer() · f850b4a7

Authored by Wei Yongjun on Oct 22, 2016

Use setup_timer() instead of init_timer(), being the preferred/standard
way to set a timer up.

Also, quoting the mod_timer() function comment:
-> mod_timer() is a more efficient way to update the expire field of an
   active timer (if the timer is inactive it will be activated).

Use setup_timer and mod_timer to setup and arm a timer, to make the code
cleaner and easier to read.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Fix error return code in xgbe_probe() · 675a6cee

Authored by Wei Yongjun on Oct 22, 2016

Return the error code -ENODEV from the "DMA is not supported" error
handling case instead of 0, as done elsewhere in this function.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
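This class of bug usually arises when an error path jumps to a shared label while the return variable still holds 0 from an earlier successful step. The sketch below shows that shape with hypothetical probe functions; it is not the xgbe_probe() code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical earlier probe step that can fail on its own. */
static int map_registers(int ok)
{
    return ok ? 0 : -ENOMEM;
}

/* Buggy shape: 'ret' still holds 0 when the DMA check fails. */
static int probe_buggy(int regs_ok, int dma_supported)
{
    int ret;

    ret = map_registers(regs_ok);
    if (ret)
        goto err;
    if (!dma_supported)
        goto err; /* ret is 0 here, so the failure looks like success */
    return 0;
err:
    return ret;
}

/* Fixed shape: set an explicit error code before jumping. */
static int probe_fixed(int regs_ok, int dma_supported)
{
    int ret;

    ret = map_registers(regs_ok);
    if (ret)
        goto err;
    if (!dma_supported) {
        ret = -ENODEV;
        goto err;
    }
    return 0;
err:
    return ret;
}

/* Self-check contrasting the two variants. */
static void check_probe_examples(void)
{
    assert(probe_buggy(1, 0) == 0);       /* failure silently reported as 0 */
    assert(probe_fixed(1, 0) == -ENODEV); /* failure now visible to callers */
}
```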

net: ns83820: use dev_kfree_skb_irq instead of kfree_skb · 0942170f

Authored by Wei Yongjun on Oct 22, 2016

It is not allowed to call kfree_skb() from hardware interrupt
context or with interrupts disabled, and spin_lock_irqsave()
guarantees that interrupts are disabled here. So kfree_skb()
should be replaced with dev_kfree_skb_irq().

This was detected by a Coccinelle semantic patch.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: eth: altera: Fix error return code in altera_tse_probe() · a24a9d7a

Authored by Wei Yongjun on Oct 22, 2016

Return the error code -EINVAL from the error handling
case instead of 0, as done elsewhere in this function.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: use setup_timer to simplify the code · 68497a87

Authored by Wei Yongjun on Oct 22, 2016

Use the setup_timer() function instead of initializing the timer's
function and data fields by hand.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: netcp: drop kfree for memory allocated with devm_kzalloc · 1aaa87af

Authored by Wei Yongjun on Oct 22, 2016

It's not necessary to free memory allocated with devm_kzalloc in the
remove path, and using kfree leads to a double free.

Fixes: 84640e27 ("net: netcp: Add Keystone NetCP core ethernet driver")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

batman-adv: Revert "use core MTU range checking in misc drivers" · 701470ba

Authored by Sven Eckelmann on Oct 22, 2016

The maximum MTU is defined via the slave devices of a batman-adv
interface. Thus it is not possible to calculate the max_mtu during the
creation of the batman-adv device, when no slave devices are attached. Doing
so would, for example, break non-fragmentation setups, which would then
(incorrectly) allow an MTU of 1500 even when the underlying device cannot
transport 1500 bytes + batman-adv headers.

Checking the dynamically calculated max_mtu via the minimum of the slave
devices' MTU during .ndo_change_mtu is also what the bridge interface does.

Cc: Jarod Wilson <jarod@redhat.com>
Fixes: b3e3893e ("net: use core MTU range checking in misc drivers")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
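A rough userspace sketch of checking the MTU against a limit computed at change time rather than at device creation is shown below. The helper names, the header length passed in and the 1500-byte fallback with no slaves attached are assumptions made for illustration; they are not taken from the batman-adv source.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define ETH_MIN_MTU  68   /* lower bound used for Ethernet-like devices */
#define ETH_DATA_LEN 1500 /* standard Ethernet payload size */

/* Largest usable MTU: the minimum over all slaves of (MTU - header). */
static int max_mtu_from_slaves(const int *slave_mtu, size_t n, int header_len)
{
    int limit = ETH_DATA_LEN; /* assumed fallback while no slave is attached */
    size_t i;

    for (i = 0; i < n; i++) {
        int usable = slave_mtu[i] - header_len;

        if (usable < limit)
            limit = usable;
    }
    return limit;
}

/* Change-MTU handler: the limit is recomputed on every call. */
static int change_mtu(int *mtu, int new_mtu,
                      const int *slave_mtu, size_t n, int header_len)
{
    assert(mtu != NULL);
    if (new_mtu < ETH_MIN_MTU ||
        new_mtu > max_mtu_from_slaves(slave_mtu, n, header_len))
        return -EINVAL;
    *mtu = new_mtu;
    return 0;
}
```

Because the limit depends on which slaves are attached at the moment of the call, a static max_mtu fixed at creation time cannot express it.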

Merge branch 'BCM54612E' · f806f772

Authored by David S. Miller on Oct 26, 2016

Xo Wang says:

====================
Broadcom BCM54612E support

This series is based on tip of torvalds/master.

The first patch adds register definitions from Broadcom docs.

The second patch adds the BCM54612E PHY ID, flags, and device-specific
RGMII internal delay initialization.

I tested on a custom board with an Aspeed AST2500 SOC with its second
MAC connected to this PHY.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: broadcom: Add support for BCM54612E · d92ead16

Authored by Xo Wang on Oct 21, 2016

This PHY has internal delays enabled after reset. This clears the
internal delay enables unless the interface specifically requests them.
Signed-off-by: Xo Wang <xow@google.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: broadcom: Update Auxiliary Control Register macros · 3cf25904

Authored by Xo Wang on Oct 21, 2016

Add the RXD-to-RXC skew (delay) time bit in the Miscellaneous Control
shadow register and a mask for the shadow selector field.

Remove a re-definition of MII_BCM54XX_AUXCTL_SHDWSEL_AUXCTL.
Signed-off-by: Xo Wang <xow@google.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

sch_htb: do not report fake rate estimators · 73e42ff7

Authored by Eric Dumazet on Oct 21, 2016

When I prepared commit d250a5f9 ("pkt_sched: gen_estimator: Dont
report fake rate estimators"), htb still had an implicit rate estimator
for all its classes.

Then later, I made this rate estimator optional in commit 64153ce0
("net_sched: htb: do not setup default rate estimators"), but I forgot
to update htb's use of gnet_stats_copy_rate_est().

After this patch, "tc -s qdisc ..." no longer reports fake rate
estimators for HTB classes.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

26 Oct, 2016 2 commits

iwlwifi: mvm: use dev_coredumpsg() · 7e62a699

Authored by Aviya Erenfeld on Sep 20, 2016

iwlmvm currently uses dev_coredumpm() to collect multiple
buffers, but this has the downside of pinning the module
until the coredump expires, if the data isn't read by any
userspace.

Avoid this by using the new dev_coredumpsg() method. We
still copy the data from the old way of generating it, but
neither hold on to vmalloc'ed data for a long time, nor do
we pin the module now.
Signed-off-by: Aviya Erenfeld <aviya.erenfeld@intel.com>
[rewrite commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: mvm: enable dynamic queue allocation mode · 7948b873

Authored by Liad Kaufman on Sep 22, 2016

New firmwares support dynamic queue allocation (DQA), which enables
on-demand allocation of queues per RA/TID, instead of allocating them
statically per vif.  This allows an AP to send, for instance, BE
traffic to STA2 even if it also needs to send traffic to a sleeping
STA1, without being blocked by the sleeping station.

The implementation in the driver is now ready, so we can enable this
feature by default when running firmwares that support it.
Signed-off-by: Liad Kaufman <liad.kaufman@intel.com>
[reworded the commit message]
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

24 Oct, 2016 11 commits

net: ip, diag -- Add diag interface for raw sockets · 432490f9

Authored by Cyrill Gorcunov on Oct 21, 2016

In criu we actively use the diag interface to collect the sockets
present in the system when dumping applications. While it works as
expected for unix, tcp, udp[lite], packet and netlink sockets,
raw sockets do not have such an interface. Thus add it.

v2:
 - add missing sock_put calls in raw_diag_dump_one (by eric.dumazet@)
 - implement @destroy for diag requests (by dsa@)

v3:
 - add export of raw_abort for IPv6 (by dsa@)
 - pass net-admin flag into inet_sk_diag_fill due to
   changes in net-next branch (by dsa@)

v4:
 - use @pad in struct inet_diag_req_v2 for raw socket
   protocol specification: raw module carries sockets
   which may have custom protocol passed from socket()
   syscall and sole @sdiag_protocol is not enough to
   match underlying ones
 - start reporting protocol specified in socket() call
   when sockets are raw ones for the same reason: user
   space tools like ss may parse this attribute and use
   it for socket matching

v5 (by eric.dumazet@):
 - use sock_hold in raw_sock_get instead of atomic_inc,
   we're holding (raw_v4_hashinfo|raw_v6_hashinfo)->lock
   when looking up so counter won't be zero here.

v6:
 - use sdiag_raw_protocol() helper which will access @pad
   structure used for raw sockets protocol specification:
   we can't simply rename this member without breaking uapi

v7:
 - since the sdiag_raw_protocol() helper is not suitable for
   uapi, let's rather make an alias structure with proper
   names. The __check_inet_diag_req_raw helper will catch
   any unintentional change to either structure.

CC: David S. Miller <davem@davemloft.net>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: David Ahern <dsa@cumulusnetworks.com>
CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
CC: James Morris <jmorris@namei.org>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Patrick McHardy <kaber@trash.net>
CC: Andrey Vagin <avagin@openvz.org>
CC: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

lwt: Remove unused len field · f76a9db3

Authored by Thomas Graf on Oct 21, 2016

The field is initialized by ILA and MPLS but never used. Remove it.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: allow to kill a task which waits net_mutex in copy_new_ns · 7281a665

Authored by Andrey Vagin on Oct 20, 2016

net_mutex can be locked for a long time. It may be because many
namespaces are being destroyed or many processes decide to create
a network namespace.

Both these operations are heavy, so it is better to have an ability to
kill a process which is waiting net_mutex.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: em_meta: Fix 'meta vlan' to correctly recognize zero VID frames · d65f2fa6

Authored by Shmulik Ladkani on Oct 21, 2016

META_COLLECTOR int_vlan_tag() assumes that if the accel tag (vlan_tci)
is zero, then no vlan accel tag is present.

This is incorrect for zero VID vlan accel packets, making the following
match fail:
  tc filter add ... basic match 'meta(vlan mask 0xfff eq 0)' ...

Apparently 'int_vlan_tag' was implemented before VLAN_TAG_PRESENT was
introduced in 05423b24 "vlan: allow null VLAN ID to be used"
(and at the time it was introduced, the 'vlan_tx_tag_get' call in em_meta
was not adapted).

Fix this by testing skb_vlan_tag_present instead of testing
skb_vlan_tag_get's value.

Fixes: 05423b24 ("vlan: allow null VLAN ID to be used")
Fixes: 1a31f204 ("netsched: Allow meta match on vlan tag on receive")
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
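The difference between testing the tag value and testing the presence flag can be shown with a small userspace model. It assumes the layout in use at the time of this commit, where a flag bit (0x1000) stored alongside the tag marks an accelerated tag as present. The struct and collector functions are hypothetical simplifications, not the em_meta code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VLAN_VID_MASK    0x0fff
#define VLAN_TAG_PRESENT 0x1000 /* presence flag kept next to the tag */

struct pkt { uint16_t vlan_tci; };

static int tag_present(const struct pkt *p)
{
    return (p->vlan_tci & VLAN_TAG_PRESENT) != 0;
}

static uint16_t tag_get(const struct pkt *p)
{
    return (uint16_t)(p->vlan_tci & ~VLAN_TAG_PRESENT);
}

/* Old logic: a tag value of 0 is mistaken for "no tag". */
static int collect_old(const struct pkt *p, uint16_t *out)
{
    uint16_t tag = tag_get(p);

    assert(out != NULL);
    if (!tag)
        return -1;
    *out = tag;
    return 0;
}

/* Fixed logic: decide on the presence flag, then read the tag. */
static int collect_fixed(const struct pkt *p, uint16_t *out)
{
    assert(out != NULL);
    if (!tag_present(p))
        return -1;
    *out = tag_get(p);
    return 0;
}
```

A frame tagged with VID 0 and priority 0 has a tag value of 0 but the presence flag set, so only the fixed variant lets a filter such as 'meta(vlan mask 0xfff eq 0)' see it.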

Merge branch 'mlxsw-cosmetics-plus-res-mgmt-rewrite' · 1bac9381

Authored by David S. Miller on Oct 23, 2016

Jiri Pirko says:

====================
mlxsw: Driver update

Mostly cosmetics and small resource values management rewrite.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Convert resources into array · c1a38311

Authored by Jiri Pirko on Oct 21, 2016

Since the number of resources is going to get much bigger, ease
their addition by simply defining IDs. Convert the existing structure
members to a pair of arrays, one for validity and one for values.
Introduce a set of getters and setters for easy access.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
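The ID-indexed layout described here might look roughly like the following userspace sketch. The resource IDs, struct and accessor names are hypothetical and chosen only to show the validity/value array pair.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical resource IDs; a new resource is one more enumerator. */
enum res_id {
    RES_ID_MAX_PORTS,
    RES_ID_MAX_VLANS,
    RES_ID_COUNT /* keep last: number of known resources */
};

struct res {
    bool valid[RES_ID_COUNT];      /* was this resource reported? */
    uint64_t values[RES_ID_COUNT]; /* its value, if valid */
};

static bool res_valid(const struct res *r, enum res_id id)
{
    return r->valid[id];
}

/* Reading a resource that was never set is a caller bug in this sketch. */
static uint64_t res_get(const struct res *r, enum res_id id)
{
    assert(res_valid(r, id));
    return r->values[id];
}

static void res_set(struct res *r, enum res_id id, uint64_t value)
{
    r->valid[id] = true;
    r->values[id] = value;
}
```

Compared with one struct member per resource, adding a resource touches only the enum, and generic code can loop over all IDs.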

mlxsw: cmd: Push resource query defines to cmd.h · f38a2314

Authored by Jiri Pirko on Oct 21, 2016

Push cmd resource query related defines to cmd.h where they belong.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: reg: Generate register names automatically · 8e9658d5

Authored by Jiri Pirko on Oct 21, 2016

Extend the MLXSW_REG_DEFINE macro to store the register name in string form.
Use this string later on instead of hard-coded string values.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
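Storing a name automatically in a definition macro is typically done with the preprocessor's stringification operator. The sketch below uses a hypothetical REG_DEFINE macro, struct and register values; it is not the actual MLXSW_REG_DEFINE definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct reg_info {
    uint16_t id;
    uint16_t len;
    const char *name; /* filled in by the macro, never typed by hand */
};

/* Hypothetical definition macro: #_name turns the identifier into a string. */
#define REG_DEFINE(_name, _id, _len)                   \
    static const struct reg_info reg_##_name = {       \
        .id = (_id), .len = (_len), .name = #_name,    \
    }

/* Illustrative registers; names, IDs and lengths are made up. */
REG_DEFINE(foo, 0x1000, 0x10);
REG_DEFINE(bar, 0x1001, 0x20);

/* Example consumer that matches a register by its generated name. */
static int reg_has_name(const struct reg_info *reg, const char *name)
{
    assert(reg != NULL && name != NULL);
    return strcmp(reg->name, name) == 0;
}
```

Because the string comes from the same token as the identifier, the two cannot drift apart the way hand-written string literals can.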

mlxsw: reg: Use helper macro to define registers · 21978dcf

Authored by Jiri Pirko on Oct 21, 2016

Save some code and also prepare to easily carry name in string form.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: item: Make char *buf arg constant for getters · 412791df

Authored by Jiri Pirko on Oct 21, 2016

Enforce const for getter buf args.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: item: Make struct mlxsw_item args const · fe0612dc

Authored by Jiri Pirko on Oct 21, 2016

These should be const, so enforce it.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

23 Oct, 2016 7 commits

Merge branch 'bpf-numa-id' · 67dc1596

Authored by David S. Miller on Oct 22, 2016

Daniel Borkmann says:

====================
Add BPF numa id helper

This patch set adds a helper for retrieving current numa node
id and a test case for SO_REUSEPORT.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

reuseport, bpf: add test case for bpf_get_numa_node_id · 3c2c3c16

Authored by Daniel Borkmann on Oct 21, 2016

The test case is very similar to reuseport_bpf_cpu, only that here
we select socket members based on current numa node id.

  # numactl -H
  available: 2 nodes (0-1)
  node 0 cpus: 0 1 2 3 4 5 12 13 14 15 16 17
  node 0 size: 128867 MB
  node 0 free: 120080 MB
  node 1 cpus: 6 7 8 9 10 11 18 19 20 21 22 23
  node 1 size: 96765 MB
  node 1 free: 87504 MB
  node distances:
  node   0   1
    0:  10  20
    1:  20  10

  # ./reuseport_bpf_numa
  ---- IPv4 UDP ----
  send node 0, receive socket 0
  send node 1, receive socket 1
  send node 1, receive socket 1
  send node 0, receive socket 0
  ---- IPv6 UDP ----
  send node 0, receive socket 0
  send node 1, receive socket 1
  send node 1, receive socket 1
  send node 0, receive socket 0
  ---- IPv4 TCP ----
  send node 0, receive socket 0
  send node 1, receive socket 1
  send node 1, receive socket 1
  send node 0, receive socket 0
  ---- IPv6 TCP ----
  send node 0, receive socket 0
  send node 1, receive socket 1
  send node 1, receive socket 1
  send node 0, receive socket 0
  SUCCESS
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: add helper for retrieving current numa node id · 2d0e30c3

Authored by Daniel Borkmann on Oct 21, 2016

The main use case is for soreuseport to select sockets for the local
numa node, but since the helper is generic, let's also add it for other
networking and tracing program types.
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'udpmem' · a10b91b8

Authored by David S. Miller on Oct 22, 2016

Paolo Abeni says:

====================
udp: refactor memory accounting

This patch series refactors the udp memory accounting, replacing the
generic implementation with a custom one, in order to remove the need for
locking the socket on the enqueue and dequeue operations. The socket backlog
usage is dropped as well.

The first patch factors out pieces of some queue and memory management
socket helpers, so that they can later be used by the udp memory accounting
functions.
The second patch adds the memory accounting helpers, without using them.
The third patch replaces the old rx memory accounting path for udp over ipv4 and
udp over ipv6. In-kernel UDP users are updated as well.

The memory accounting schema is described in detail in the individual patch
commit messages.

The performance gain depends on the specific scenario; with few flows (and
little contention in the original code) the differences are in the noise range,
while with several flows contending the same socket, the measured speed-up
is relevant (e.g. even over 100% in case of extreme contention)

Many thanks to Eric Dumazet for the reiterated reviews and suggestions.

v5 -> v6:
 - do not orphan the skb on enqueue, skb_steal_sock() already did
   the work for us

v4 -> v5:
 - use the receive queue spin lock to protect the memory accounting
 - several minor clean-up

v3 -> v4:
 - simplified the locking schema, always use a plain spinlock

v2 -> v3:
 - do not set the now unused backlog_rcv callback

v1 -> v2:
 - slightly changed the memory accounting schema; we now perform lazy reclaim
 - fixed forward_alloc updating issue
 - fixed memory counter integer overflows
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: use its own memory accounting schema · 850cbadd

Authored by Paolo Abeni on Oct 21, 2016

Completely avoid default sock memory accounting and replace it
with udp-specific accounting.

Since the new memory accounting model encapsulates completely
the required locking, remove the socket lock on both enqueue and
dequeue, and avoid using the backlog on enqueue.

Be sure to clean up rx queue memory on socket destruction, using
udp's own sk_destruct.

Tested using pktgen with random src port, 64 bytes packet,
wire-speed on a 10G link as sender and udp_sink as the receiver,
using an l4 tuple rxhash to stress the contention, and one or more
udp_sink instances with reuseport.

nr readers      Kpps (vanilla)  Kpps (patched)
1               170             440
3               1250            2150
6               3000            3650
9               4200            4450
12              5700            6250

v4 -> v5:
  - avoid unneeded test in first_packet_length

v3 -> v4:
  - remove useless sk_rcvqueues_full() call

v2 -> v3:
  - do not set the now unused backlog_rcv callback

v1 -> v2:
  - add memory pressure support
  - fixed dropwatch accounting for ipv6
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: implement memory accounting helpers · f970bd9e

Authored by Paolo Abeni on Oct 21, 2016

Avoid using the generic helpers.
Use the receive queue spin lock to protect the memory
accounting operation, both on enqueue and on dequeue.

On dequeue perform partial memory reclaiming, trying to
leave a quantum of forward allocated memory.

On enqueue use a custom helper, to allow some optimizations:
- use a plain spin_lock() variant instead of the slightly
  costly spin_lock_irqsave(),
- avoid dst_force check, since the calling code has already
  dropped the skb dst
- avoid orphaning the skb, since skb_steal_sock() already did
  the work for us

The above needs custom memory reclaiming on shutdown, provided
by the udp_destruct_sock().

v5 -> v6:
  - don't orphan the skb on enqueue

v4 -> v5:
  - replace the mem_lock with the receive queue spin lock
  - ensure that the bh is always allowed to enqueue at least
    a skb, even if sk_rcvbuf is exceeded

v3 -> v4:
  - reworked memory accounting, simplifying the schema
  - provide a helper for both memory scheduling and enqueuing

v1 -> v2:
  - use a udp specific destructor to perform memory reclaiming
  - remove a couple of helpers, unneeded after the above cleanup
  - do not reclaim memory on dequeue if not under memory
    pressure
  - reworked the fwd accounting schema to avoid potential
    integer overflow
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/socket: factor out helpers for memory and queue manipulation · f8c3bf00

Authored by Paolo Abeni on Oct 21, 2016

Basic sock operations that udp code can use with its own
memory accounting schema. No functional change is introduced
in the existing APIs.

v4 -> v5:
  - avoid whitespace changes

v2 -> v4:
  - avoid exporting __sock_enqueue_skb

v1 -> v2:
  - avoid export sock_rmem_free
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

22 Oct, 2016 1 commit

net: remove MTU limits on a few ether_setup callers · 8b1efc0f

Authored by Jarod Wilson on Oct 20, 2016

These few drivers call ether_setup(), but have no ndo_change_mtu, and thus
were overlooked for changes to MTU range checking behavior. They
previously had no range checks, so for feature-parity, set their min_mtu
to 0 and max_mtu to ETH_MAX_MTU (65535), instead of the 68 and 1500
inherited from the ether_setup() changes. Fine-tuning can come after we get
back to full feature-parity here.

CC: netdev@vger.kernel.org
Reported-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st>
CC: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st>
CC: R Parameswaran <parameswaran.r7@gmail.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
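The effect of the inherited 68/1500 limits, and of lifting them to 0/65535, can be sketched in userspace as follows. The struct and function names are hypothetical, and treating a max_mtu of 0 as "no upper bound" is an assumption of this model; only the numeric limits come from the message above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define ETH_MIN_MTU  68
#define ETH_DATA_LEN 1500
#define ETH_MAX_MTU  0xFFFFU /* 65535 */

struct netdev {
    unsigned int mtu;
    unsigned int min_mtu;
    unsigned int max_mtu;
};

/* Limits a driver inherits from an ether_setup()-style initializer. */
static void ether_defaults(struct netdev *dev)
{
    dev->mtu = ETH_DATA_LEN;
    dev->min_mtu = ETH_MIN_MTU;
    dev->max_mtu = ETH_DATA_LEN;
}

/* What this commit does for drivers that had no range checks before. */
static void lift_limits(struct netdev *dev)
{
    dev->min_mtu = 0;
    dev->max_mtu = ETH_MAX_MTU;
}

/* Core-style range check; max_mtu == 0 is treated as "no upper bound". */
static int set_mtu(struct netdev *dev, int new_mtu)
{
    assert(dev != NULL);
    if (new_mtu < 0 || (unsigned int)new_mtu < dev->min_mtu)
        return -EINVAL;
    if (dev->max_mtu > 0 && (unsigned int)new_mtu > dev->max_mtu)
        return -EINVAL;
    dev->mtu = (unsigned int)new_mtu;
    return 0;
}
```

With only the inherited defaults, a request for a 9000-byte MTU is rejected even though the driver never restricted it before; after lifting the limits it is accepted again.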

openanolis / cloud-kernel: last synced successfully more than 1 year ago
