提交 · ab256e5ad02b36951f01bf6b5cfda25f14820847 · openanolis / cloud-kernel

12 12月, 2014 8 次提交

net/mlx4: Add a check if there are too many reserved QPs · ab256e5a

由 Dotan Barak 提交于 12月 11, 2014

The number of reserved QPs is affected both from the firmware and
from the driver's requirements. This patch adds a check that
validates that this number is indeed feasable.
Signed-off-by: NDotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ab256e5a

net/mlx4: Change QP allocation scheme · ddae0349

由 Eugenia Emantayev 提交于 12月 11, 2014

When using BF (Blue-Flame), the QPN overrides the VLAN, CV, and SV fields
in the WQE. Thus, BF may only be used for QPNs with bits 6,7 unset.

The current Ethernet driver code reserves a Tx QP range with 256b alignment.

This is wrong because if there are more than 64 Tx QPs in use,
QPNs >= base + 65 will have bits 6/7 set.

This problem is not specific for the Ethernet driver, any entity that
tries to reserve more than 64 BF-enabled QPs should fail. Also, using
ranges is not necessary here and is wasteful.

The new mechanism introduced here will support reservation for
"Eth QPs eligible for BF" for all drivers: bare-metal, multi-PF, and VFs
(when hypervisors support WC in VMs). The flow we use is:

1. In mlx4_en, allocate Tx QPs one by one instead of a range allocation,
and request "BF enabled QPs" if BF is supported for the function

2. In the ALLOC_RES FW command, change param1 to:
a. param1[23:0] - number of QPs
b. param1[31-24] - flags controlling QPs reservation

Bit 31 refers to Eth blueflame supported QPs. Those QPs must have
bits 6 and 7 unset in order to be used in Ethernet.

Bits 24-30 of the flags are currently reserved.

When a function tries to allocate a QP, it states the required attributes
for this QP. Those attributes are considered "best-effort". If an attribute,
such as Ethernet BF enabled QP, is a must-have attribute, the function has
to check that attribute is supported before trying to do the allocation.

In a lower layer of the code, mlx4_qp_reserve_range masks out the bits
which are unsupported. If SRIOV is used, the PF validates those attributes
and masks out unsupported attributes as well. In order to notify VFs which
attributes are supported, the VF uses QUERY_FUNC_CAP command. This command's
mailbox is filled by the PF, which notifies which QP allocation attributes
it supports.
Signed-off-by: NEugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ddae0349

net/mlx4_core: Use tasklet for user-space CQ completion events · 3dca0f42

由 Matan Barak 提交于 12月 11, 2014

Previously, we've fired all our completion callbacks straight from our ISR.

Some of those callbacks were lightweight (for example, mlx4_en's and
IPoIB napi callbacks), but some of them did more work (for example,
the user-space RDMA stack uverbs' completion handler). Besides that,
doing more than the minimal work in ISR is generally considered wrong,
it could even lead to a hard lockup of the system. Since when a lot
of completion events are generated by the hardware, the loop over those
events could be so long, that we'll get into a hard lockup by the system
watchdog.

In order to avoid that, add a new way of invoking completion events
callbacks. In the interrupt itself, we add the CQs which receive completion
event to a per-EQ list and schedule a tasklet. In the tasklet context
we loop over all the CQs in the list and invoke the user callback.
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3dca0f42

net/mlx4_core: Mask out host side virtualization features for guests · 383677da

由 Or Gerlitz 提交于 12月 11, 2014

When VFs (guests in this context) issue the QUERY_DEV_CAP command, they
need not be told that host side virtualization features such as VST, FSM
(MAC anti-spoofing) and running > 80 VFs are supported by the device.
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

383677da

net/mlx4_en: Set csum level for encapsulated packets · c58942f2

由 Or Gerlitz 提交于 12月 11, 2014

This was dropped by mistake for the napi_gro_frags flow, fix that.

Fixes: dd65beac ('net/mlx4_en: Extend usage of napi_gro_frags')
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c58942f2

be2net: Export tunnel offloads only when a VxLAN tunnel is created · 630f4b70

由 Sriharsha Basavapatna 提交于 12月 11, 2014

The encapsulated offload flags shouldn't be unconditionally exported
to the stack. The stack expects offloading to work across all tunnel
types when those flags are set. This would break other tunnels (like
GRE) since be2net currently supports tunnel offload for VxLAN only.

Also, with VxLANs Skyhawk-R can offload only 1 UDP dport. If more
than 1 UDP port is added, we should disable offloads in that case too.
Signed-off-by: NSriharsha Basavapatna <sriharsha.basavapatna@emulex.com>
Signed-off-by: NSathya Perla <sathya.perla@emulex.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

630f4b70

gianfar: Fix dma check map error when DMA_API_DEBUG is enabled · 0a4b5a24

由 Kevin Hao 提交于 12月 11, 2014

We need to use dma_mapping_error() to check the dma address returned
by dma_map_single/page(). Otherwise we would get warning like this:
  WARNING: at lib/dma-debug.c:1140
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc2-next-20141029 #196
  task: c0834300 ti: effe6000 task.ti: c0874000
  NIP: c02b2c98 LR: c02b2c98 CTR: c030abc4
  REGS: effe7d70 TRAP: 0700   Not tainted  (3.18.0-rc2-next-20141029)
  MSR: 00021000 <CE,ME>  CR: 22044022  XER: 20000000

  GPR00: c02b2c98 effe7e20 c0834300 00000098 00021000 00000000 c030b898 00000003
  GPR08: 00000001 00000000 00000001 749eec9d 22044022 1001abe0 00000020 ef278678
  GPR16: ef278670 ef278668 ef278660 070a8040 c087f99c c08cdc60 00029000 c0840d44
  GPR24: c08be6e8 c0840000 effe7e78 ef041340 00000600 ef114e10 00000000 c08be6e0
  NIP [c02b2c98] check_unmap+0x51c/0x9e4
  LR [c02b2c98] check_unmap+0x51c/0x9e4
  Call Trace:
  [effe7e20] [c02b2c98] check_unmap+0x51c/0x9e4 (unreliable)
  [effe7e70] [c02b31d8] debug_dma_unmap_page+0x78/0x8c
  [effe7ed0] [c03d1640] gfar_clean_rx_ring+0x208/0x488
  [effe7f40] [c03d1a9c] gfar_poll_rx_sq+0x3c/0xa8
  [effe7f60] [c04f8714] net_rx_action+0xc0/0x178
  [effe7f90] [c00435a0] __do_softirq+0x100/0x1fc
  [effe7fe0] [c0043958] irq_exit+0xa4/0xc8
  [effe7ff0] [c000d14c] call_do_irq+0x24/0x3c
  [c0875e90] [c00048a0] do_IRQ+0x8c/0xf8
  [c0875eb0] [c000ed10] ret_from_except+0x0/0x18

For TX, we need to unmap the pages which has already been mapped and
free the skb before return.

For RX, move the dma mapping and error check to gfar_new_skb(). We
would reuse the original skb in the rx ring when either allocating
skb failure or dma mapping error.
Signed-off-by: NKevin Hao <haokexin@gmail.com>
Signed-off-by: NClaudiu Manoil <claudiu.manoil@freescale.com>
Reviewed-by: NClaudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0a4b5a24

cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call · 666224d4

由 Hariprasad Shenai 提交于 12月 11, 2014

Remove use of calls into t4_fw_hello() with MASTER_MUST, which results in
FW_HELLO_CMD_MASTERFORCE being set. The firmware doesn't support this and of
course any existing PF Drivers will totally go for a toss.
Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

666224d4

11 12月, 2014 32 次提交

Merge branch 'fec-next' · 52c9b12d

由 David S. Miller 提交于 12月 10, 2014

Fugang Duan says:

====================
net: fec: driver code clean and bug fix

The patch serial include code clean and bug fix:
Patch#1: avoid dummy operation during suspend/resume test.
Patch#2: bug fix for i.MX6SX SOC that clean all interrupt events during MAC initial process.
Patch#3: before phy device link status is up, only enable MDIO bus interrupt.

V2:
- Modify the comment form from David's suggestion.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

52c9b12d

net: fec: only enable mdio interrupt before phy device link up · 0c5a3aef

由 Nimrod Andy 提交于 12月 11, 2014

Before phy device link up, we only enable FEC mdio interrupt, which
is more reasonable.
Signed-off-by: NFugang Duan <B38611@freescale.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0c5a3aef

net: fec: clear all interrupt events to support i.MX6SX · e17f7fec

由 Nimrod Andy 提交于 12月 11, 2014

For i.MX6SX FEC controller, there have interrupt mask and event
field extension. To support all SOCs FEC, we clear all interrupt
events during MAVC initial process.
Signed-off-by: NFugang Duan <B38611@freescale.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e17f7fec

net: fec: reset fep link status in suspend function · 858eeb7d

由 Nimrod Andy 提交于 12月 11, 2014

On some i.MX6 serial boards, phy power and refrence clock are supplied
or controlled by SOC. When do suspend/resume test, the power and clock
are disabled, so phy device link down.

For current driver, fep->link is still up status, which cause extra operation
like below code. To avoid the dumy operation, we set fep->link to down when
phy device is real down.
...
if (fep->link) {
	napi_disable(&fep->napi);
	netif_tx_lock_bh(ndev);
	fec_stop(ndev);
	netif_tx_unlock_bh(ndev);
	napi_enable(&fep->napi);
	fep->link = phy_dev->link;
	status_change = 1;
}
...
Signed-off-by: NFugang Duan <B38611@freescale.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

858eeb7d

net: sock: fix access via invalid file descriptor · 198bf1b0

由 Alexei Starovoitov 提交于 12月 10, 2014

0day robot reported the following crash:
[   21.233581] BUG: unable to handle kernel NULL pointer dereference at 0000000000000007
[   21.234709] IP: [<ffffffff8156ebda>] sk_attach_bpf+0x39/0xc2

It's due to bpf_prog_get() returning ERR_PTR.
Check it properly.
Reported-by: NFengguang Wu <fengguang.wu@intel.com>
Fixes: 89aa0758 ("net: sock: allow eBPF programs to be attached to sockets")
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

198bf1b0

net: introduce helper macro for_each_cmsghdr · f95b414e

由 Gu Zheng 提交于 12月 11, 2014

Introduce helper macro for_each_cmsghdr as a wrapper of the enumerating
cmsghdr from msghdr, just cleanup.
Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f95b414e

cxgb4/cxgb4vf: global named must be unique · dd0bcc0b

由 Stephen Rothwell 提交于 12月 10, 2014

Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dd0bcc0b

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 22f10923

由 David S. Miller 提交于 12月 10, 2014

Conflicts:
	drivers/net/ethernet/amd/xgbe/xgbe-desc.c
	drivers/net/ethernet/renesas/sh_eth.c

Overlapping changes in both conflict cases.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

22f10923

irda: Convert function pointer arrays and uses to const · 785c20a0

由 Joe Perches 提交于 12月 10, 2014

Making things const is a good thing.

(x86-64 defconfig with all irda)
$ size net/irda/built-in.o*
   text	   data	    bss	    dec	    hex	filename
 109276	   1868	    244	 111388	  1b31c	net/irda/built-in.o.new
 108828	   2316	    244	 111388	  1b31c	net/irda/built-in.o.old
Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

785c20a0

llc: Make llc_sap_action_t function pointer arrays const · 22bbf5f3

由 Joe Perches 提交于 12月 10, 2014

It's better when function pointer arrays aren't modifiable.

Net change:

$ size net/llc/built-in.o.*
   text	   data	    bss	    dec	    hex	filename
  61193	  12758	   1344	  75295	  1261f	net/llc/built-in.o.new
  47113	  27030	   1344	  75487	  126df	net/llc/built-in.o.old
Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

22bbf5f3

llc: Make llc_conn_ev_qfyr_t function pointer arrays const · 9b373069

由 Joe Perches 提交于 12月 10, 2014

It's better when function pointer arrays aren't modifiable.

Net change from original:

$ size net/llc/built-in.o.*
   text	   data	    bss	    dec	    hex	filename
  61065	  12886	   1344	  75295	  1261f	net/llc/built-in.o.new
  47113	  27030	   1344	  75487	  126df	net/llc/built-in.o.old
Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9b373069

llc: Make function pointer arrays const · 14b7d95f

由 Joe Perches 提交于 12月 10, 2014

It's better when function pointer arrays aren't modifiable.
Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

14b7d95f

Merge branch 'kill_arch_fast_hash' · b95bf1e0

由 David S. Miller 提交于 12月 10, 2014

Daniel Borkmann says:

====================
Kill arch_fast_hash

Due to the size of changes I have based this against net-next,
also given 3.18 is already out. I've split this into 3 parts,
the first two to remove existing users (so they can optionally
go to stable) and the last one to kill the remaining library bits.

Let me know if there are any issues.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b95bf1e0

net, lib: kill arch_fast_hash library bits · 0cb6c969

由 Daniel Borkmann 提交于 12月 10, 2014

As there are now no remaining users of arch_fast_hash(), lets kill
it entirely.

This basically reverts commit 71ae8aac ("lib: introduce arch
optimized hash library") and follow-up work, that is f.e., commit
23721754 ("lib: hash: follow-up fixups for arch hash"),
commit e3fec2f7 ("lib: Add missing arch generic-y entries for
asm-generic/hash.h") and last but not least commit 6a02652d
("perf tools: Fix include for non x86 architectures").

Cc: Francesco Fusco <fusco@ntop.org>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0cb6c969

net: replace remaining users of arch_fast_hash with jhash · 87545899

由 Daniel Borkmann 提交于 12月 10, 2014

This patch effectively reverts commit 500f8087 ("net: ovs: use CRC32
accelerated flow hash if available"), and other remaining arch_fast_hash()
users such as from nfsd via commit 6282cd56 ("NFSD: Don't hand out
delegations for 30 seconds after recalling them.") where it has been used
as a hash function for bloom filtering.

While we think that these users are actually not much of concern, it has
been requested to remove the arch_fast_hash() library bits that arose
from [1] entirely as per recent discussion [2]. The main argument is that
using it as a hash may introduce bias due to its linearity (see avalanche
criterion) and thus makes it less clear (though we tried to document that)
when this security/performance trade-off is actually acceptable for a
general purpose library function.

Lets therefore avoid any further confusion on this matter and remove it to
prevent any future accidental misuse of it. For the time being, this is
going to make hashing of flow keys a bit more expensive in the ovs case,
but future work could reevaluate a different hashing discipline.

  [1] https://patchwork.ozlabs.org/patch/299369/
  [2] https://patchwork.ozlabs.org/patch/418756/

Cc: Neil Brown <neilb@suse.de>
Cc: Francesco Fusco <fusco@ntop.org>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

87545899

netlink: use jhash as hashfn for rhashtable · 7f19fc5e

由 Daniel Borkmann 提交于 12月 10, 2014

For netlink, we shouldn't be using arch_fast_hash() as a hashing
discipline, but rather jhash() instead.

Since netlink sockets can be opened by any user, a local attacker
would be able to easily create collisions with the DPDK-derived
arch_fast_hash(), which trades off performance for security by
using crc32 CPU instructions on x86_64.

While it might have a legimite use case in other places, it should
be avoided in netlink context, though. As rhashtable's API is very
flexible, we could later on still decide on other hashing disciplines,
if legitimate.

Reference: http://thread.gmane.org/gmane.linux.kernel/1844123
Fixes: e341694e ("netlink: Convert netlink_lookup() to use RCU protected hash table")
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Acked-by: NThomas Graf <tgraf@suug.ch>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7f19fc5e

Merge branch 'isdn-next' · 7d339890

由 David S. Miller 提交于 12月 10, 2014

Tilman Schmidt says:

====================
ISDN patches for net-next

Here's a series of patches for the Gigaset ISDN driver and one for
the ISDN CAPI subsystem.  Please merge as appropriate.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7d339890

isdn/capi: correct argument types of command_2_index · 330078ab

由 Tilman Schmidt 提交于 12月 10, 2014

Utility function command_2_index is always called with arguments of
type u8. Adapt its declaration accordingly.
Signed-off-by: NTilman Schmidt <tilman@imap.cc>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

330078ab

isdn/gigaset: enable Kernel CAPI support by default · f6a68e69

由 Tilman Schmidt 提交于 12月 10, 2014

Kernel CAPI has been the recommended ISDN subsystem for the Gigaset
driver since kernel release 2.6.34.2. It provides full backwards
compatibility to the old I4L subsystem thanks to the capidrv module.
I4L has been marked as deprecated for more than seven years.
Signed-off-by: NTilman Schmidt <tilman@imap.cc>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f6a68e69

isdn/gigaset: elliminate unnecessary argument from send_cb() · f99a6fde

由 Tilman Schmidt 提交于 12月 10, 2014

No need to pass a member of the cardstate structure as a separate
argument if the entire structure is already passed.
Signed-off-by: NTilman Schmidt <tilman@imap.cc>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f99a6fde

isdn/gigaset: clarify gigaset_modem_fill control structure · f650dd28

由 Tilman Schmidt 提交于 12月 10, 2014

Replace the flag-controlled retry loop by explicit goto statements
in the error branches to make the control structure easier to
understand.
Signed-off-by: NTilman Schmidt <tilman@imap.cc>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f650dd28

isdn/gigaset: drop duplicate declaration · ff66fc3a

由 Tilman Schmidt 提交于 12月 10, 2014

Function gigaset_skb_sent was declared twice, identically, in gigaset.h.
Signed-off-by: NTilman Schmidt <tilman@imap.cc>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ff66fc3a

tipc: fix broadcast wakeup contention after congestion · 340b6e59

由 Richard Alpe 提交于 12月 10, 2014

commit 908344cd ("tipc: fix bug in multicast congestion handling")
introduced a race in the broadcast link wakeup functionality.

This patch eliminates this broadcast link wakeup race caused by
operation on the wakeup list without proper locking. If this race
hit and corrupted the list all subsequent wakeup messages would be
lost, resulting in a considerable memory leak.
Signed-off-by: NRichard Alpe <richard.alpe@ericsson.com>
Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

340b6e59

enic: add support for set/get rss hash key · 4f675eb2

由 Govindarajulu Varadarajan 提交于 12月 10, 2014

This patch adds support for setting/getting rss hash key using ethtool.

v2:
respin patch to support RSS hash function changes.
Signed-off-by: NGovindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4f675eb2

Merge branch 'napi_page_frags' · 7dbea3e8

由 David S. Miller 提交于 12月 10, 2014

Alexander Duyck says:

====================
net: Alloc NAPI page frags from their own pool

This patch series implements a means of allocating page fragments without
the need for the local_irq_save/restore in __netdev_alloc_frag.  By doing
this I am able to decrease packet processing time by 11ns per packet in my
test environment.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7dbea3e8

ethernet/broadcom: Use napi_alloc_skb instead of netdev_alloc_skb_ip_align · 45abfb10

由 Alexander Duyck 提交于 12月 09, 2014

This patch replaces the calls to netdev_alloc_skb_ip_align in the
copybreak paths.

Cc: Gary Zambrano <zambrano@broadcom.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Ariel Elior <ariel.elior@qlogic.com>
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

45abfb10

ethernet/realtek: use napi_alloc_skb instead of netdev_alloc_skb_ip_align · e2338f86

由 Alexander Duyck 提交于 12月 09, 2014

This replaces most of the calls to netdev_alloc_skb_ip_align in the Realtek
drivers.  The one instance I didn't replace in 8139cp.c is because it was
called as a part of init and as such is not always accessed from the
softirq context.

Cc: Realtek linux nic maintainers <nic_swsd@realtek.com>
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e2338f86

cxgb: Use napi_alloc_skb instead of netdev_alloc_skb_ip_align · e0e31216

由 Alexander Duyck 提交于 12月 09, 2014

In order to use napi_alloc_skb I needed to pass a pointer to struct adapter
instead of struct pci_dev. This allowed me to access &adapter->napi.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e0e31216

ethernet/intel: Use napi_alloc_skb · 67fd893e

由 Alexander Duyck 提交于 12月 09, 2014

This change replaces calls to netdev_alloc_skb_ip_align with
napi_alloc_skb.  The advantage of napi_alloc_skb is currently the fact that
the page allocation doesn't make use of any irq disable calls.

There are few spots where I couldn't replace the calls as the buffer
allocation routine is called as a part of init which is outside of the
softirq context.

Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

67fd893e

net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb · fd11a83d

由 Alexander Duyck 提交于 12月 09, 2014

This change pulls the core functionality out of __netdev_alloc_skb and
places them in a new function named __alloc_rx_skb. The reason for doing
this is to make these bits accessible to a new function __napi_alloc_skb.
In addition __alloc_rx_skb now has a new flags value that is used to
determine which page frag pool to allocate from. If the SKB_ALLOC_NAPI
flag is set then the NAPI pool is used. The advantage of this is that we
do not have to use local_irq_save/restore when accessing the NAPI pool from
NAPI context.

In my test setup I saw at least 11ns of savings using the napi_alloc_skb
function versus the netdev_alloc_skb function, most of this being due to
the fact that we didn't have to call local_irq_save/restore.

The main use case for napi_alloc_skb would be for things such as copybreak
or page fragment based receive paths where an skb is allocated after the
data has been received instead of before.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fd11a83d

net: Split netdev_alloc_frag into __alloc_page_frag and add __napi_alloc_frag · ffde7328

由 Alexander Duyck 提交于 12月 09, 2014

This patch splits the netdev_alloc_frag function up so that it can be used
on one of two page frag pools instead of being fixed on the
netdev_alloc_cache.  By doing this we can add a NAPI specific function
__napi_alloc_frag that accesses a pool that is only used from softirq
context.  The advantage to this is that we do not need to call
local_irq_save/restore which can be a significant savings.

I also took the opportunity to refactor the core bits that were placed in
__alloc_page_frag.  First I updated the allocation to do either a 32K
allocation or an order 0 page.  This is based on the changes in commmit
d9b2938a where it was found that latencies could be reduced in case of
failures.  Then I also rewrote the logic to work from the end of the page to
the start.  By doing this the size value doesn't have to be used unless we
have run out of space for page fragments.  Finally I cleaned up the atomic
bits so that we just do an atomic_sub_and_test and if that returns true then
we set the page->_count via an atomic_set.  This way we can remove the extra
conditional for the atomic_read since it would have led to an atomic_inc in
the case of success anyway.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ffde7328

D
Merge branch 'for-davem-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 6e5f59aa
由 David S. Miller 提交于 12月 10, 2014
```
More iov_iter work for the networking from Al Viro.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
6e5f59aa

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功