1. 13 Dec 2018, 1 commit
  2. 26 Oct 2018, 1 commit
  3. 10 Jul 2018, 2 commits
  4. 30 Jun 2018, 1 commit
  5. 29 May 2018, 1 commit
  6. 27 Mar 2018, 2 commits
  7. 09 Jan 2018, 1 commit
  8. 28 Dec 2017, 1 commit
  9. 19 Dec 2017, 1 commit
    • bnx2x: Use NETIF_F_GRO_HW. · 3c3def5f
      Committed by Michael Chan
      Advertise NETIF_F_GRO_HW and turn on TPA_MODE_GRO when NETIF_F_GRO_HW
      is set.  Disable NETIF_F_GRO_HW in bnx2x_fix_features() if the MTU
      does not support TPA_MODE_GRO or GRO is not set.  bnx2x_change_mtu() also
      needs to disable NETIF_F_GRO_HW if the MTU does not support it.
      
      The original disable_tpa parameter will continue to disable both LRO and GRO_HW.
      
      The original behavior of enabling LRO by default is preserved.  The user
      has to run ethtool -K to enable GRO_HW explicitly.
      
      Cc: Ariel Elior <Ariel.Elior@cavium.com>
      Cc: everest-linux-l2@cavium.com
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Acked-by: Manish Chopra <manish.chopra@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3c3def5f
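      A minimal sketch of the feature clamping described above, with
      hypothetical example_* names standing in for the driver's real helpers
      (the actual logic lives in bnx2x_fix_features()):

          #include <linux/netdevice.h>

          /* Stand-in for the driver's MTU/TPA_MODE_GRO compatibility check;
           * the threshold is illustrative, not bnx2x's. */
          static bool example_mtu_allows_gro(int mtu)
          {
                  return mtu <= 4096;
          }

          static netdev_features_t example_fix_features(struct net_device *dev,
                                                        netdev_features_t features)
          {
                  /* GRO_HW requires software GRO and a compatible MTU. */
                  if (!(features & NETIF_F_GRO) ||
                      !example_mtu_allows_gro(dev->mtu))
                          features &= ~NETIF_F_GRO_HW;
                  return features;
          }

      With the flag advertised, a user would enable it explicitly with
      something like "ethtool -K eth0 rx-gro-hw on" (rx-gro-hw being the
      feature string NETIF_F_GRO_HW maps to).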
  10. 08 Nov 2017, 1 commit
  11. 08 Aug 2017, 4 commits
  12. 11 Jun 2017, 1 commit
  13. 08 Jun 2017, 1 commit
  14. 02 Jun 2017, 1 commit
    • bnx2x: Fix Multi-Cos · 3968d389
      Committed by Mintz, Yuval
      Apparently multi-CoS hasn't been working for bnx2x for quite some time -
      the driver implements ndo_select_queue() to allow queue selection
      for FCoE, but the regular L2 flow would cause it to take the
      fallback's result modulo the number of queues.
      The fallback would return a queue matching the needed TC
      [via __skb_tx_hash()], but since the modulo is taken by the number of
      TSS queues, without accounting for the number of TCs, transmission
      would always be done by a queue configured for TC0.
      
      Fixes: ada7c19e ("bnx2x: use XPS if possible for bnx2x_select_queue instead of pure hash")
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3968d389
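      A worked sketch of the modulo bug with illustrative numbers (none of
      these constants come from the driver):

          /* __skb_tx_hash() returns a queue that already encodes the TC:
           * queue = tc_offset + (hash % queues_per_tc). With 8 TSS queues
           * per TC, TC2's range is queues 16-23. */
          u16 queues_per_tc = 8;
          u16 fallback_q = 19;                      /* inside TC2's range */

          u16 broken = fallback_q % queues_per_tc;  /* 3: back in TC0     */
          u16 fixed  = fallback_q;                  /* 19: stays in TC2   */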
  15. 01 May 2017, 1 commit
    • bnx2x: Align RX buffers · 9b70de6d
      Committed by Scott Wood
      The bnx2x driver is not providing proper alignment on the receive buffers it
      passes to build_skb(), causing skb_shared_info to be misaligned.
      skb_shared_info contains an atomic, and while PPC normally supports
      unaligned accesses, it does not support unaligned atomics.
      
      Aligning the size of rx buffers will ensure that page_frag_alloc() returns
      aligned addresses.
      
      This can be reproduced on PPC by setting the network MTU to 1450 (or
      another value that is not a multiple of 4) and then generating sufficient
      inbound network traffic (one or two large "wget"s usually does it),
      producing the following oops:
      
      Unable to handle kernel paging request for unaligned access at address 0xc00000ffc43af656
      Faulting instruction address: 0xc00000000080ef8c
      Oops: Kernel access of bad area, sig: 7 [#1]
      SMP NR_CPUS=2048
      NUMA
      PowerNV
      Modules linked in: vmx_crypto powernv_rng rng_core powernv_op_panel leds_powernv led_class nfsd ip_tables x_tables autofs4 xfs lpfc bnx2x mdio libcrc32c crc_t10dif crct10dif_generic crct10dif_common
      CPU: 104 PID: 0 Comm: swapper/104 Not tainted 4.11.0-rc8-00088-g4c761daf #2
      task: c00000ffd4892400 task.stack: c00000ffd4920000
      NIP: c00000000080ef8c LR: c00000000080eee8 CTR: c0000000001f8320
      REGS: c00000ffffc33710 TRAP: 0600   Not tainted  (4.11.0-rc8-00088-g4c761daf)
      MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
        CR: 24082042  XER: 00000000
      CFAR: c00000000080eea0 DAR: c00000ffc43af656 DSISR: 00000000 SOFTE: 1
      GPR00: c000000000907f64 c00000ffffc33990 c000000000dd3b00 c00000ffcaf22100
      GPR04: c00000ffcaf22e00 0000000000000000 0000000000000000 0000000000000000
      GPR08: 0000000000b80008 c00000ffc43af636 c00000ffc43af656 0000000000000000
      GPR12: c0000000001f6f00 c00000000fe1a000 000000000000049f 000000000000c51f
      GPR16: 00000000ffffef33 0000000000000000 0000000000008a43 0000000000000001
      GPR20: c00000ffc58a90c0 0000000000000000 000000000000dd86 0000000000000000
      GPR24: c000007fd0ed10c0 00000000ffffffff 0000000000000158 000000000000014a
      GPR28: c00000ffc43af010 c00000ffc9144000 c00000ffcaf22e00 c00000ffcaf22100
      NIP [c00000000080ef8c] __skb_clone+0xdc/0x140
      LR [c00000000080eee8] __skb_clone+0x38/0x140
      Call Trace:
      [c00000ffffc33990] [c00000000080fb74] skb_clone+0x74/0x110 (unreliable)
      [c00000ffffc339c0] [c000000000907f64] packet_rcv+0x144/0x510
      [c00000ffffc33a40] [c000000000827b64] __netif_receive_skb_core+0x5b4/0xd80
      [c00000ffffc33b00] [c00000000082b2bc] netif_receive_skb_internal+0x2c/0xc0
      [c00000ffffc33b40] [c00000000082c49c] napi_gro_receive+0x11c/0x260
      [c00000ffffc33b80] [d000000066483d68] bnx2x_poll+0xcf8/0x17b0 [bnx2x]
      [c00000ffffc33d00] [c00000000082babc] net_rx_action+0x31c/0x480
      [c00000ffffc33e10] [c0000000000d5a44] __do_softirq+0x164/0x3d0
      [c00000ffffc33f00] [c0000000000d60a8] irq_exit+0x108/0x120
      [c00000ffffc33f20] [c000000000015b98] __do_irq+0x98/0x200
      [c00000ffffc33f90] [c000000000027f14] call_do_irq+0x14/0x24
      [c00000ffd4923a90] [c000000000015d94] do_IRQ+0x94/0x110
      [c00000ffd4923ae0] [c000000000008d90] hardware_interrupt_common+0x150/0x160
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9b70de6d
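      A sketch of the alignment idea using the stock kernel helpers
      (rx_buf_size is a hypothetical variable; this is the spirit of the fix,
      not the exact diff):

          #include <linux/skbuff.h>

          /* SKB_DATA_ALIGN() rounds a size up to SMP_CACHE_BYTES. If every
           * requested fragment size is aligned, the offsets that
           * page_frag_alloc() hands back stay aligned too, keeping the
           * skb_shared_info at the end of the buffer properly aligned. */
          unsigned int frag_size =
                  SKB_DATA_ALIGN(rx_buf_size + NET_SKB_PAD) +
                  SKB_DATA_ALIGN(sizeof(struct skb_shared_info));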
  16. 16 Mar 2017, 1 commit
    • mqprio: Modify mqprio to pass user parameters via ndo_setup_tc. · 56f36acd
      Committed by Amritha Nambiar
      The configurable priority to traffic class mapping and the user specified
      queue ranges are used to configure the traffic class, overriding the
      hardware defaults when the 'hw' option is set to 0. However, when the 'hw'
      option is non-zero, the hardware QoS defaults are used.
      
      This patch makes it so that we can pass the data the user provided to
      ndo_setup_tc. This allows us to pull in the queue configuration if the
      user requested it as well as any additional hardware offload type
      requested by using a value other than 1 for the hw value.
      
      Finally, it also provides a means for the device driver to return the
      level supported for the offload type via the qopt->hw value. Previously
      we were always assuming the value to be 1; in the future, values beyond
      1 may be supported.
      Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      56f36acd
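      A hedged sketch of a driver consuming these parameters through the
      2017-era ndo_setup_tc() plumbing (struct tc_mqprio_qopt and the
      TC_MQPRIO_HW_OFFLOAD_* values are kernel definitions; the driver body
      itself is illustrative):

          static int example_setup_tc(struct net_device *dev, u32 handle,
                                      __be16 proto, struct tc_to_netdev *tc)
          {
                  struct tc_mqprio_qopt *qopt = tc->mqprio;
                  int i;

                  if (tc->type != TC_SETUP_MQPRIO)
                          return -EINVAL;

                  /* User-requested queue ranges, per traffic class. */
                  for (i = 0; i < qopt->num_tc; i++)
                          netdev_info(dev, "tc %d: %u queues at offset %u\n",
                                      i, qopt->count[i], qopt->offset[i]);

                  /* Report back the offload level actually programmed. */
                  qopt->hw = TC_MQPRIO_HW_OFFLOAD_TCS;
                  return 0;
          }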
  17. 31 Jan 2017, 1 commit
  18. 24 Jan 2017, 1 commit
  19. 04 Dec 2016, 1 commit
  20. 17 Nov 2016, 1 commit
    • bnx2x: switch to napi_complete_done() · 80f1c21c
      Committed by Eric Dumazet
      Switch from napi_complete() to napi_complete_done()
      for better GRO support (gro_flush_timeout) and core NAPI
      features.
      
      Do not rearm interrupts if we are busy polling,
      to reduce bus and interrupt overhead.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Adam Belay <abelay@google.com>
      Cc: Tariq Toukan <tariqt@mellanox.com>
      Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
      Cc: Ariel Elior <ariel.elior@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      80f1c21c
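      A minimal sketch of the conversion in a NAPI poll callback (the
      function around the call is hypothetical):

          static int example_poll(struct napi_struct *napi, int budget)
          {
                  int work_done = 0;

                  /* ... process up to 'budget' rx packets, counting them
                   * in work_done ... */

                  if (work_done < budget) {
                          /* Reporting work_done (instead of a bare
                           * napi_complete()) lets the core apply
                           * gro_flush_timeout and track busy polling. */
                          napi_complete_done(napi, work_done);
                          /* rearm device interrupts here, but not while
                           * busy polling */
                  }
                  return work_done;
          }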
  21. 18 Oct 2016, 1 commit
    • ethernet/broadcom: use core min/max MTU checking · e1c6dcca
      Committed by Jarod Wilson
      tg3: min_mtu 60, max_mtu 9000/1500
      
      bnxt: min_mtu 60, max_mtu 9000
      
      bnx2x: min_mtu 46, max_mtu 9600
      - Fix up ETH_OVREHEAD -> ETH_OVERHEAD while we're in here, remove
        duplicated defines from bnx2x_link.c.
      
      bnx2: min_mtu 46, max_mtu 9000
      - Use more standard ETH_* defines while we're at it.
      
      bcm63xx_enet: min_mtu 46, max_mtu 2028
      - compute_hw_mtu was made largely pointless, and thus merged back into
        bcm_enet_change_mtu.
      
      b44: min_mtu 60, max_mtu 1500
      
      CC: netdev@vger.kernel.org
      CC: Michael Chan <michael.chan@broadcom.com>
      CC: Sony Chacko <sony.chacko@qlogic.com>
      CC: Ariel Elior <ariel.elior@qlogic.com>
      CC: Dept-HSGLinuxNICDev@qlogic.com
      CC: Siva Reddy Kallam <siva.kallam@broadcom.com>
      CC: Prashant Sreedharan <prashant@broadcom.com>
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e1c6dcca
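      A sketch of what the conversion looks like on the driver side, using
      the bnx2x bounds quoted above (dev->min_mtu and dev->max_mtu are the
      real net_device members this series relies on):

          /* At probe time: declare the range once, then drop the driver's
           * private range checks from its ndo_change_mtu() handler. */
          dev->min_mtu = 46;    /* bnx2x minimum from the list above */
          dev->max_mtu = 9600;  /* bnx2x maximum from the list above */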
  22. 17 Mar 2016, 1 commit
    • bnx2x: don't wait for Tx completion on recovery · d78a1f08
      Committed by Yuval Mintz
      When the driver has hit a parity event, the HW can no longer write to
      host memory. As a result, Tx completions cannot be written to the host
      SB memory, and waiting for Tx completions eventually times out.
      Since the driver is willing to delay as much as 1-2 seconds per Tx
      queue while it drains, and this delay is sequential, the time to
      recover might be greatly and needlessly lengthened when recovery is
      done under multi-connection traffic.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d78a1f08
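      A sketch of the shortcut this implies (the condition name is
      illustrative; the real check sits in the driver's Tx-drain path):

          /* Under parity recovery the device cannot DMA completions into
           * host SB memory, so each queue would burn its full 1-2 s
           * timeout; drained sequentially, dozens of queues add minutes. */
          if (recovery_in_progress)
                  return 0;   /* skip waiting for Tx completions */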
  23. 04 Mar 2016, 1 commit
    • net: relax setup_tc ndo op handle restriction · 5eb4dce3
      Committed by John Fastabend
      I added this check in setup_tc to multiple drivers,
      
       if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
      
      Unfortunately restricting to TC_H_ROOT like this breaks the old
      instantiation of mqprio to setup a hardware qdisc. This patch
      relaxes the test to only check the type to make it equivalent
      to the check before I broke it. With this the old instantiation
      continues to work.
      
      A good smoke test is to setup mqprio with,
      
      # tc qdisc add dev eth4 root mqprio num_tc 8 \
        map 0 1 2 3 4 5 6 7 \
        queues 0@0 1@1 2@2 3@3 4@4 5@5 6@6 7@7
      
      Fixes: e4c6734e ("net: rework ndo tc op to consume additional qdisc handle parameter")
      Reported-by: Singh Krishneil <krishneil.k.singh@intel.com>
      Reported-by: Jake Keller <jacob.e.keller@intel.com>
      CC: Murali Karicheri <m-karicheri2@ti.com>
      CC: Shradha Shah <sshah@solarflare.com>
      CC: Or Gerlitz <ogerlitz@mellanox.com>
      CC: Ariel Elior <ariel.elior@qlogic.com>
      CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      CC: Bruce Allan <bruce.w.allan@intel.com>
      CC: Jesse Brandeburg <jesse.brandeburg@intel.com>
      CC: Don Skidmore <donald.c.skidmore@intel.com>
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5eb4dce3
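      A sketch of the relaxed test, per the description above: the TC_H_ROOT
      restriction is dropped and only the offload type is checked.

          /* Accept any qdisc handle, not just TC_H_ROOT, so the old mqprio
           * instantiation of a hardware qdisc keeps working. */
          if (tc->type != TC_SETUP_MQPRIO)
                  return -EINVAL;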
  24. 17 Feb 2016, 3 commits
  25. 19 Dec 2015, 1 commit
  26. 09 Dec 2015, 2 commits
  27. 06 Dec 2015, 2 commits
  28. 19 Nov 2015, 3 commits
    • net: provide generic busy polling to all NAPI drivers · 93d05d4a
      Committed by Eric Dumazet
      NAPI drivers no longer need to observe a particular protocol
      to benefit from busy polling (CONFIG_NET_RX_BUSY_POLL=y).

      napi_hash_add() and napi_hash_del() are automatically called
      from the core networking stack, from netif_napi_add() and
      netif_napi_del() respectively.

      This patch depends on free_netdev() and netif_napi_del() being
      called from process context, which seems to be the norm.

      Drivers might still prefer to call napi_hash_del() on their
      own, since they might combine all the RCU grace periods into
      a single one, knowing their NAPI structures' lifetime, while the
      core networking stack has no idea of a possible combining.

      Once this patch proves not to bring serious regressions,
      we will clean up drivers to either remove napi_hash_del()
      or provide appropriate RCU grace-period combining.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      93d05d4a
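      After this change a driver only performs the usual NAPI registration
      and teardown; the core hashes the context for busy polling. A sketch
      (priv and example_poll are hypothetical):

          netif_napi_add(dev, &priv->napi, example_poll,
                         NAPI_POLL_WEIGHT);   /* napi_hash_add() implied */
          /* ... */
          netif_napi_del(&priv->napi);        /* napi_hash_del() implied */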
    • net: move skb_mark_napi_id() into core networking stack · 93f93a44
      Committed by Eric Dumazet
      We would like to automatically provide busy polling support
      to all NAPI drivers, without them having to implement anything.
      
      skb_mark_napi_id() can be called from napi_gro_receive() and
      napi_get_frags().
      
      A few drivers are still calling skb_mark_napi_id() because
      they use netif_receive_skb(). They should eventually call
      napi_gro_receive() instead. I will leave this to the drivers'
      maintainers.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      93f93a44
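      A sketch of the preferred receive-completion call (fp->napi is named
      after the bnx2x per-queue NAPI context but used illustratively here):

          skb->protocol = eth_type_trans(skb, dev);
          napi_gro_receive(&fp->napi, skb);  /* marks the napi id for busy
                                              * polling; a bare
                                              * netif_receive_skb() won't */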
    • bnx2x: remove bnx2x_low_latency_recv() support · b59768c6
      Committed by Eric Dumazet
      Switch to native NAPI polling, as this reduces overhead and complexity.
      
      The normal path is faster, since one cmpxchg() is no longer required,
      and busy polling through NAPI polling has the same performance.
      
      Tested:
      lpk50:~# cat /proc/sys/net/core/busy_read
      70
      lpk50:~# nstat >/dev/null;./netperf -H lpk55 -t TCP_RR;nstat
      MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpk55.prod.google.com () port 0 AF_INET : first burst 0
      Local /Remote
      Socket Size   Request  Resp.   Elapsed  Trans.
      Send   Recv   Size     Size    Time     Rate
      bytes  Bytes  bytes    bytes   secs.    per sec
      
      16384  87380  1        1       10.00    40095.07
      16384  87380
      IpInReceives                    401062             0.0
      IpInDelivers                    401062             0.0
      IpOutRequests                   401079             0.0
      TcpActiveOpens                  7                  0.0
      TcpPassiveOpens                 3                  0.0
      TcpAttemptFails                 3                  0.0
      TcpEstabResets                  5                  0.0
      TcpInSegs                       401036             0.0
      TcpOutSegs                      401052             0.0
      TcpOutRsts                      38                 0.0
      UdpInDatagrams                  26                 0.0
      UdpOutDatagrams                 27                 0.0
      Ip6OutNoRoutes                  1                  0.0
      TcpExtDelayedACKs               1                  0.0
      TcpExtTCPPrequeued              98                 0.0
      TcpExtTCPDirectCopyFromPrequeue 98                 0.0
      TcpExtTCPHPHits                 4                  0.0
      TcpExtTCPHPHitsToUser           98                 0.0
      TcpExtTCPPureAcks               5                  0.0
      TcpExtTCPHPAcks                 101                0.0
      TcpExtTCPAbortOnData            6                  0.0
      TcpExtBusyPollRxPackets         400832             0.0
      TcpExtTCPOrigDataSent           400983             0.0
      IpExtInOctets                   21273867           0.0
      IpExtOutOctets                  21261254           0.0
      IpExtInNoECTPkts                401064             0.0
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b59768c6
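      A sketch of what removing the hook looks like (illustrative ops table;
      ndo_busy_poll was the era's net_device_ops member for this):

          static const struct net_device_ops example_netdev_ops = {
                  .ndo_open       = example_open,
                  .ndo_start_xmit = example_start_xmit,
                  /* .ndo_busy_poll deliberately absent: the core now busy
                   * polls through the regular NAPI poll callback */
          };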
  29. 07 Nov 2015, 1 commit
    • mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd · d0164adc
      Committed by Mel Gorman
      
      __GFP_WAIT has been used to identify atomic context in callers that hold
      spinlocks or are in interrupts.  They are expected to be high priority and
      have access to one of two watermarks lower than "min", which can be
      referred to as the "atomic reserve".  __GFP_HIGH users get access to the
      first lower watermark and can be called the "high priority reserve".

      Over time, callers had a requirement to not block when fallback options
      were available.  Some have abused __GFP_WAIT, leading to a situation where
      an optimistic allocation with a fallback option can access atomic
      reserves.

      This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
      cannot sleep and have no alternative.  High priority users continue to use
      __GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
      are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM identifies
      callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
      redefined as a caller that is willing to enter direct reclaim and wake
      kswapd for background reclaim.
      
      This patch then converts a number of sites:
      
      o __GFP_ATOMIC is used by callers that are high priority and have memory
        pools for those requests. GFP_ATOMIC uses this flag.
      
      o Callers that have a limited mempool to guarantee forward progress clear
        __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
        into this category where kswapd will still be woken but atomic reserves
        are not used as there is a one-entry mempool to guarantee progress.
      
      o Callers that are checking if they are non-blocking should use the
        helper gfpflags_allow_blocking() where possible. This is because
        checking for __GFP_WAIT, as was done historically, can now trigger
        false positives. Some exceptions like dm-crypt.c exist, where the
        code intent is clearer if __GFP_DIRECT_RECLAIM is used instead of
        the helper due to flag manipulations.
      
      o Callers that built their own GFP flags instead of starting with GFP_KERNEL
        and friends now also need to specify __GFP_KSWAPD_RECLAIM.
      
      The first key hazard to watch out for is callers that removed __GFP_WAIT
      and were depending on access to atomic reserves for inconspicuous reasons.
      In some cases it may be appropriate for them to use __GFP_HIGH.
      
      The second key hazard is callers that assembled their own combination of
      GFP flags instead of starting with something like GFP_KERNEL.  They may
      now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
      if it's missed in most cases as other activity will wake kswapd.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vitaly Wool <vitalywool@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d0164adc
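      A sketch of the recommended helper in use (the function around it is
      hypothetical):

          #include <linux/gfp.h>
          #include <linux/slab.h>

          static void *example_alloc(size_t len, gfp_t gfp)
          {
                  /* Prefer the helper over testing __GFP_WAIT directly,
                   * which can now produce false positives. */
                  if (gfpflags_allow_blocking(gfp))
                          return kmalloc(len, gfp);  /* may sleep/reclaim */
                  return kmalloc(len, gfp | __GFP_NOWARN);  /* atomic path */
          }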